Creating an Eval

How to create an eval on the Datalab.

Portex makes it easy to turn your expert domain knowledge into custom evaluations, or "evals," that you can offer to model builders on the Datalab. Follow this guide to get started writing and building evals. First, we walk through what makes an eval effective on Portex.

Details

Evaluations, or "evals," have become essential tools for measuring the progress and capabilities of AI. Think of evals simply as well-designed tests that determine the subjects and domains where AI systems excel or falter.

Writing effective evals on Portex

Follow these guidelines to write and design good evals on Portex.

What types of questions/tasks should I ask?

Good evals test capabilities that are difficult within your field or subject of expertise and that require reasoning rather than brute-force memorization. Avoid trivia-style questions or questions that can be answered easily via web search.

What format does my answer need to be in? Do I need to include a detailed rationale?

Answers do not need to follow a strict format (e.g. exact match, multiple choice). The only requirement is that the answer key contains no extraneous information that was not explicitly asked for in the task or question itself. Detailed rationales are not required in the answer key.

Can I include images and supporting files to accompany my tasks?

Yes. Evals on Portex support multiple modalities, including image and text. You can upload a single file alongside each task. The accepted file types are .json, .csv, .md, .html, .txt, .webp, .jpg, .jpeg, .png, .gif, and .pdf.

How do I know if my questions are hard enough?

For every eval on Portex, we maintain leaderboards showing how state-of-the-art models are performing on your tasks. You can use these as a guideline for your eval and can always add more difficult questions if the models are doing very well on the first batch.

How many questions should I include?

Aim for quality over quantity. Starting with 5-10 questions is great, but the more questions the better.

There are two ways to create evals on Portex: the Eval Builder and the Eval Dataset uploader. We walk through each below.

Writing an Eval with the Eval Builder

The Eval Builder allows experts to write evals directly in the platform without needing any existing files.

You may optionally upload a reference file to accompany each task; make sure it is in one of the following accepted formats:

.json, .csv, .md, .html, .txt, .webp, .jpg, .jpeg, .png, .gif, .pdf

Creating an Eval Dataset Bundle

If you instead have files that you want to work with in an editor or IDE, you can upload those directly with the Eval Dataset option.

To get started, create a new dataset and choose Eval Dataset; from there, you will be prompted to upload your files. Eval datasets on Portex are uploaded as a bundle of four files (two optional). Together, these form a Core Dataset that can be offered for sale to model builders in addition to per-eval runs. We walk through each file in the bundle below.

Formatting your Eval Dataset Bundle

Eval datasets on Portex consist of four parts, two of which are required.

Upload Checklist

  1. Required: tasks.json

  2. Required: answers.json

  3. Optional: reference_files.zip (or .tar.gz)

  4. Optional (but recommended for premium offering): core_dataset.zip (or .tar.gz)

Rule: every task_id in answers.json must also exist in tasks.json.
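
Before uploading, you can sanity-check this rule locally. The snippet below is a minimal sketch, assuming the checklist file names above and that each file is a JSON array of records with a task_id field (the exact file layout is an assumption):

```python
import json

# Load the two required files (assumed here to be JSON arrays of records).
with open("tasks.json") as f:
    tasks = json.load(f)
with open("answers.json") as f:
    answers = json.load(f)

task_ids = {record["task_id"] for record in tasks}
answer_ids = {record["task_id"] for record in answers}

# Every task_id in answers.json must also exist in tasks.json.
missing = answer_ids - task_ids
if missing:
    raise ValueError(f"answers.json references unknown task_ids: {sorted(missing)}")
print("All answer task_ids match a task in tasks.json.")
```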

🧩 Task List

A JSON file containing your tasks. Each record must include:

  • task_id: unique identifier

  • task_prompt: the question for the model

  • reference_file: if your task requires a reference file, include the file name here.

Example:
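
The sketch below is illustrative only: the prompts and file names are hypothetical, and the top-level array layout is an assumption based on the record fields above.

```json
[
  {
    "task_id": "task_001",
    "task_prompt": "Using the attached income statement, estimate the company's year-over-year revenue growth as a percentage.",
    "reference_file": "income_statement_2023.pdf"
  },
  {
    "task_id": "task_002",
    "task_prompt": "Which accounting treatment applies to the lease described in the prompt, and why?"
  }
]
```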

This file will be downloadable by eval buyers so they can generate model responses.


🧠 Answer Keys

A JSON file mapping each task_id to the correct output. Each record must include:

  • task_id: unique identifier used above

  • answer: the expected result (numerical, textual, objective criteria, etc.)

Optional fields you may include as a supplement:

  • rationale: explanation of how the answer was derived, reasoning traces

  • answer_type: e.g. "percentage", "multiple_choice", "exact_match"

Example:
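
Again, an illustrative sketch: the answers and optional fields below are hypothetical, and the array-of-records layout is an assumption.

```json
[
  {
    "task_id": "task_001",
    "answer": "12.4%",
    "answer_type": "percentage",
    "rationale": "Revenue grew from $8.2M in FY2022 to $9.2M in FY2023."
  },
  {
    "task_id": "task_002",
    "answer": "Operating lease",
    "answer_type": "exact_match"
  }
]
```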

By default, your answer key is blinded from eval buyers. You may optionally add it to the Core Dataset below for purchase.


📂 Reference Files (Optional)

If your tasks refer to text documents, images, or other supporting files, bundle them into a .zip or .tar.gz archive.

Example archive/zip structure:
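
A hypothetical layout; the file names are placeholders and should match the reference_file values in your task list.

```
reference_files.zip
├── income_statement_2023.pdf
├── sensor_readings.csv
└── circuit_diagram.png
```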

Buyers will extract this archive to access the referenced files and generate model responses.

Accepted file types include: .json, .csv, .md, .html, .txt, .webp, .jpg, .jpeg, .png, .gif, .pdf


📘 Core Dataset (Optional)

A .zip or .tar.gz archive that may optionally contain the answer key and/or additional files that support your reasoning or calculations (analyst notes, data tables, or expert commentary).

Example archive structure:
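
A hypothetical layout; the exact contents are up to you, but a bundle that includes the answer key and supporting material might look like this:

```
core_dataset.zip
├── answers.json
├── analyst_notes.md
└── data_tables/
    └── revenue_model.csv
```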

Buyers can use this Core Dataset to validate or audit the answer logic, or even to refine their models using reinforcement learning methods. The contents of this dataset are up to you as the expert to decide, but including the answer key can generally support a higher price point.


Creating an Eval Listing

Once you've uploaded your eval dataset, you can create a listing for it.

The first step will be setting two prices:

  1. A per-eval price: a set price model builders pay each time they submit model responses and receive a performance report.

  2. (If there is a Core Dataset) Core Dataset price and minimum bid: a fixed "buy now" price and minimum bid for model builders to access the Core Dataset (tasks and answers, as well as reference files and knowledge reference if applicable).

The next step is configuring your license and all other relevant details on the listing editor. Once you publish your listing, your eval will be ready for model builders to access.
