Importing an Eval

Import an eval from JSON files for advanced users.

If you prefer authoring evals in an editor or IDE, you can upload structured JSON files directly to the Datalab. This path is intended for advanced users who want full control over the eval schema or already have files ready to go locally.

Upload Flow

From the Data Studio, go to Datasets > Upload a File, then select "Eval Dataset." You will be prompted to upload your files.

Eval Dataset Bundle

An eval dataset consists of three parts:

  1. tasks.json (required)

  2. answers.json (required)

  3. Reference files as a .zip archive (optional)

Every task_id in answers.json must also exist in tasks.json, and each task_id must be unique.

tasks.json

A JSON array of task objects. Each task must include:

  • task_id: a unique identifier (string)

  • task_prompt: the full prompt for the model

  • reference_file: filename of the attached reference file (or empty string if none)

This file is downloadable by eval buyers so they can generate model responses.
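
For illustration, a minimal tasks.json might look like this (the task IDs, prompts, and filenames below are placeholders, not required values):

[
  {
    "task_id": "task-001",
    "task_prompt": "Using the attached CSV, calculate the total number of units sold in Q3 2024.",
    "reference_file": "sales_data.csv"
  },
  {
    "task_id": "task-002",
    "task_prompt": "List three risks a company should consider before adopting a four-day work week.",
    "reference_file": ""
  }
]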

answers.json

A JSON array of answer objects. Each answer must include:

  • task_id: matches the task_id in tasks.json

  • answer: the correct output (text, number, or structured content)

  • reference_file: the associated reference file name (if any)

  • criteria: an array of grading criteria (see below)

  • passThreshold: minimum weighted score (0-100) for the task to be considered passed

Optional fields:

  • tools: array of tool names if the task involves tool use (can be empty)

Criteria schema

Each criterion in the criteria array has the following fields:

  • id (string, required): Unique identifier (UUID recommended)

  • name (string, required): Short label for the criterion

  • type (string, required): One of semantic, lexical, binary, ordinal, numeric, or regex

  • description (string, required): What constitutes a correct response

  • weight (number, required): Percentage weight (all criteria weights for a task should sum to 100)

  • rationale (string, optional): Explanation of how the correct answer is derived

  • examples (array, optional): Example responses (can be empty)

  • semanticPrompt (string, required for semantic criteria): The prompt sent to the LLM jury to evaluate the response (must contain exact figures)
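
Putting the answer and criteria fields together, a single answers.json entry might look like this (the IDs, values, and weights are illustrative only):

[
  {
    "task_id": "task-001",
    "answer": "1,284 units",
    "reference_file": "sales_data.csv",
    "passThreshold": 70,
    "tools": [],
    "criteria": [
      {
        "id": "9b2d6c1e-3f4a-4b5c-8d7e-1f2a3b4c5d6e",
        "name": "Correct total",
        "type": "numeric",
        "description": "The response states a Q3 2024 total of 1,284 units.",
        "weight": 60,
        "rationale": "Summing the Q3 2024 rows of sales_data.csv gives 1,284.",
        "examples": []
      },
      {
        "id": "5a1b2c3d-4e5f-4789-abcd-ef0123456789",
        "name": "Shows working",
        "type": "semantic",
        "description": "The response explains how the total was calculated.",
        "weight": 40,
        "semanticPrompt": "Does the response explain that the total of 1,284 units was obtained by summing the Q3 2024 rows of the attached CSV?",
        "examples": []
      }
    ]
  }
]

Assuming, for this sketch, that a criterion contributes its full weight when satisfied, a response meeting only the first criterion scores 60 and fails the passThreshold of 70, while a response meeting both criteria scores 100 and passes.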

The answer key is private by default. Buyers never see it unless you sell the Core Dataset.

Reference Files (Optional)

Bundle any reference files into a .zip archive. Each task can reference at most one file. Accepted formats: PDF, images (JPG, PNG, WebP, GIF), CSV, TXT.

Video and audio files are not supported.

The archive structure should be flat (files at the root level):
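
For example, a reference archive might contain (filenames are illustrative):

reference_files.zip
  sales_data.csv
  floorplan.png
  contract_2023.pdf

Each filename should correspond to the reference_file values used in tasks.json and answers.json.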

After Upload

Once your eval dataset is created, proceed to Publishing Your Eval to create a listing.
