Creating an Eval

How to create an eval using the Eval Builder.

The Eval Builder is the primary way to create evals on Portex. It lets you write tasks, answers, and grading criteria directly in the platform without preparing files externally.

If you prefer working with JSON files in an editor or IDE, see Importing an Eval (Q&A evals only).

Choose an Eval Type

From the Data Studio, navigate to Evals > Eval Builder. You will see a type selection screen with two options:

  • Agentic, Multi-Turn: Task-driven evals with configurable tools and iterative execution. The agent runs in a sandboxed container and can execute code, use tools, and iterate across multiple turns. Built on the Harbor framework.

  • Q&A, Single-Turn: Prompt-driven evals focused on direct model reasoning. The model receives a prompt and produces a single response.

Select the type that matches your eval. See Eval Design Guide for guidance on when to use each type. The type applies to the entire eval and cannot be mixed.

Open the Eval Builder

After selecting a type, the Eval Builder opens. At the top, give your eval a name. The builder auto-saves as you work.


You can view an example eval by toggling "View Example" in the Eval Builder header. For agentic evals, the example shows a TerminalBench task with environment configuration and Exact Match criteria.

Write a Task

The left panel shows your task list. Click "+ Add Task" to create a new task.

In the right panel, the Task tab provides a markdown editor with LaTeX support. Write your task prompt here. Be specific about the expected output format and any constraints (see the Eval Design Guide for best practices).

Attach reference files

If your task requires the model to analyze a document, image, or dataset, click "Attach Reference File" below the editor.

  • Q&A evals: one reference file per task. Accepted formats: PDF, images (JPG, PNG, WebP, GIF), CSV, TXT, JSON, Markdown, HTML.

  • Agentic evals: multiple reference files per task. Files are mounted into the agent's container. The total size of all reference files must fit within the container's memory allocation. Agentic tasks accept most file extensions.

Write the Answer (Golden Reference Solution)

Switch to the Answer tab and enter the correct answer for the task. This field is required and serves as the golden reference solution. Think: what is the most important, salient output from this task?

The answer key is kept private by default. Models and buyers do not see it unless you sell the Core Dataset.

Define Grading Criteria (Rubric)

Switch to the Criterion tab. Here you define a rubric of criteria.


Click "+ Add Grading Criterion" to add a criterion. For each criterion, fill in:

  • Criterion Name (required): a short label describing what is being checked

  • Grader Type: choose between LLM-as-a-Judge (semantic evaluation by an LLM jury) or Exact Match (string comparison, agentic evals only)

  • Weight: the percentage weight for this criterion

  • Description (optional): context for what you are checking

  • Explicit Grading Criterion (required): the specific details the judge should check. For LLM-as-a-Judge, this is the prompt sent to the jury. For Exact Match, this is the expected string value. Include exact numbers and figures when relevant. Supports markdown and LaTeX.
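To make the Exact Match grader concrete, here is a minimal sketch of the comparison it performs. The function name and the whitespace-trimming behavior are assumptions for illustration; the platform's actual matching rules (case sensitivity, whitespace handling) are not specified here.

```python
def exact_match(expected: str, actual: str) -> bool:
    """Hypothetical sketch of an Exact Match check: compare the
    expected string from the criterion against the agent's output.
    Trimming surrounding whitespace is an assumed normalization."""
    return expected.strip() == actual.strip()

# An output that differs only in surrounding whitespace still matches.
print(exact_match("42", " 42\n"))  # True
print(exact_match("42", "forty-two"))  # False
```

By contrast, an LLM-as-a-Judge criterion sends the Explicit Grading Criterion text to the jury as a prompt, so it tolerates semantically equivalent phrasings that a string comparison would reject.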

Weight distribution

The bottom of the Criterion tab shows a weight distribution bar and the total weight sum. Click "Auto-normalize" if your weights do not sum to 100%.
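Auto-normalize presumably rescales your weights proportionally so they sum to 100% while preserving their ratios. A sketch of that assumed behavior:

```python
def auto_normalize(weights: list[float]) -> list[float]:
    """Assumed behavior of the Auto-normalize button: scale each
    weight by 100 / total so the weights sum to 100 while keeping
    their relative proportions."""
    total = sum(weights)
    return [w * 100 / total for w in weights]

# Three criteria entered as 30, 30, 60 (sum 120) become 25, 25, 50.
print(auto_normalize([30, 30, 60]))  # [25.0, 25.0, 50.0]
```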

Pass threshold

Set the pass threshold as a percentage (e.g., 70%). A task is marked as passed if the weighted score across all criteria meets or exceeds this value.
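The pass decision can be sketched as a weighted sum. Assuming each criterion is scored 0–100 and weights are percentages summing to 100:

```python
def task_passed(scores: list[float], weights: list[float],
                threshold: float = 70.0) -> bool:
    """Sketch of the assumed pass rule: the weighted score is the
    sum of each criterion score (0-100) times its percentage weight;
    the task passes if that total meets or exceeds the threshold."""
    weighted_score = sum(s * w / 100 for s, w in zip(scores, weights))
    return weighted_score >= threshold

# Two criteria weighted 60% and 40%, scored 100 and 50:
# weighted score = 60 + 20 = 80, which clears a 70% threshold.
print(task_passed([100, 50], [60, 40]))  # True
```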

Configure the Environment (Agentic Only)

For agentic evals, the Environment tab appears in the task editor. This is where you define the sandboxed container that the agent will run in.

Environment presets

Start by selecting an environment preset that matches your domain. Presets pre-configure the base image, packages, resources, and timeouts for common use cases:

  • General Researcher: Web research, document parsing, light analysis

  • Finance Analyst: Tabular analytics, stats, spreadsheets

  • Legal Review: OCR, PDF extraction, legal document workflows

  • Chemistry: Cheminformatics with RDKit and Open Babel

  • Data Science: General ML and analysis

See Harbor: Environment Presets for full details on each preset.

Container configuration

You can configure the container in two modes:

Simple mode: Select a base image from the dropdown (e.g., python:3.11-slim, ubuntu:24.04, node:20-slim) and add system packages (via apt-get) and Python packages (via pip). You can pin package versions (e.g., pandas==2.2.3).

Dockerfile mode: Write a custom Dockerfile for full control over the build process. The Dockerfile must contain a FROM instruction and has a 10,000 character limit.
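As an illustration of Dockerfile mode, the fragment below satisfies the stated constraints (it contains a FROM instruction and is well under the 10,000 character limit). The base image and package choices are illustrative, not prescribed by the platform:

```dockerfile
# Illustrative Dockerfile for an agentic eval container.
# Base image and packages are example choices, not requirements.
FROM python:3.11-slim

# System packages via apt-get (equivalent to Simple mode's
# system-package list).
RUN apt-get update && apt-get install -y --no-install-recommends \
        git curl \
    && rm -rf /var/lib/apt/lists/*

# Python packages via pip, with pinned versions as in Simple mode.
RUN pip install --no-cache-dir pandas==2.2.3 numpy

WORKDIR /workspace
```

Dockerfile mode is worth the extra effort when you need build steps Simple mode cannot express, such as multi-step installs or custom environment variables.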

Resources

Set the compute resources for the container:

  • CPUs: Number of CPU cores

  • Memory: RAM in MB

  • Storage: Disk space in MB

  • GPUs: Number of GPUs (0 for CPU-only tasks)

Resource presets are available: Standard (1 CPU, 2 GB RAM), Boosted (2 CPU, 4 GB), Heavy (4 CPU, 8 GB).

Timeouts

Configure how long each phase of execution can run:

  • Agent timeout: How long the agent can execute (default: 900 seconds / 15 minutes)

  • Verifier timeout: How long the verifier can run to check output (default: 900 seconds)

  • Build timeout: How long the container build can take (default: 600 seconds / 10 minutes)

Task metadata

For agentic evals, you can also set task metadata (helpful for open sourcing and for contributing to benchmarks):

  • Difficulty: easy, medium, or hard

  • Category: a free-text label (e.g., "software-engineering")

  • Tags: multiple tags for organization

  • Time estimates: expected completion time for an expert and a junior practitioner, in minutes

Preview

The Preview tab renders the task as it will appear to model builders, including formatted markdown, reference files, tools, and (for agentic evals) environment configuration.

Create the Eval Dataset

Once you have written all your tasks with answers and criteria, click "Create Eval Dataset" in the top right. This packages your work into an eval dataset on the Datalab.

From here, you can publish a listing to make the eval available to model builders.
