Importing an Eval
Import an eval from JSON files for advanced users.
If you prefer authoring evals in an editor or IDE, you can upload structured JSON files directly to the Datalab. This path is intended for advanced users who want full control over the eval schema or who already have files prepared locally.
Upload Flow
From the Data Studio, go to Datasets > Upload a File, then select "Eval Dataset." You will be prompted to upload your files.

Eval Dataset Bundle
An eval dataset consists of three parts:
tasks.json (required)
answers.json (required)
Reference files as a .zip archive (optional)
Every task_id in answers.json must also exist in tasks.json, and task IDs must be unique across the dataset.
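Before uploading, you can sanity-check these constraints locally. The sketch below is a hypothetical helper (not part of the platform); it assumes the two files are JSON arrays of objects with a task_id field, as described in this page:

```python
import json

def validate_bundle(tasks_path="tasks.json", answers_path="answers.json"):
    """Raise ValueError if task IDs are duplicated or answers reference unknown tasks."""
    with open(tasks_path) as f:
        tasks = json.load(f)
    with open(answers_path) as f:
        answers = json.load(f)

    # task_id must be unique within tasks.json
    task_ids = [t["task_id"] for t in tasks]
    duplicates = {tid for tid in task_ids if task_ids.count(tid) > 1}
    if duplicates:
        raise ValueError(f"Duplicate task_ids in tasks.json: {sorted(duplicates)}")

    # every task_id in answers.json must also exist in tasks.json
    known = set(task_ids)
    missing = [a["task_id"] for a in answers if a["task_id"] not in known]
    if missing:
        raise ValueError(f"answers.json references unknown task_ids: {missing}")
```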
tasks.json
A JSON array of task objects. Each task must include:
task_id: a unique identifier (string)
task_prompt: the full prompt for the model
reference_file: filename of the attached reference file (or empty string if none)
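A minimal tasks.json might look like this (the IDs, prompts, and filename are illustrative):

```json
[
  {
    "task_id": "task-001",
    "task_prompt": "Using the attached report, state the company's Q3 revenue in USD.",
    "reference_file": "q3_report.pdf"
  },
  {
    "task_id": "task-002",
    "task_prompt": "What is 17 * 24?",
    "reference_file": ""
  }
]
```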
This file is downloadable by eval buyers so they can generate model responses.
answers.json
A JSON array of answer objects. Each answer must include:
task_id: matches the task_id in tasks.json
answer: the correct output (text, number, or structured content)
reference_file: the associated reference file name (if any)
criteria: an array of grading criteria (see below)
passThreshold: minimum weighted score (0-100) for the task to be considered passed
Optional fields:
tools: array of tool names if the task involves tool use (can be empty)
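Putting these fields together, an answers.json entry might look like this (all values are illustrative; the criterion fields are defined in the criteria schema below):

```json
[
  {
    "task_id": "task-001",
    "answer": "$4.2 million",
    "reference_file": "q3_report.pdf",
    "criteria": [
      {
        "id": "3f8a2c1e-0b7d-4e9a-9c31-5d2f8e6a1b4c",
        "name": "Correct revenue figure",
        "type": "numeric",
        "description": "Response states Q3 revenue as $4.2 million",
        "weight": 100,
        "examples": []
      }
    ],
    "passThreshold": 80,
    "tools": []
  }
]
```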
Criteria schema
Each criterion in the criteria array has the following fields:
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Unique identifier (UUID recommended) |
| name | string | Yes | Short label for the criterion |
| type | string | Yes | One of: semantic, lexical, binary, ordinal, numeric, regex |
| description | string | Yes | What constitutes a correct response |
| weight | number | Yes | Percentage weight (all criteria weights for a task should sum to 100) |
| rationale | string | No | Explanation of how the correct answer is derived |
| examples | array | No | Example responses (can be empty) |
| semanticPrompt | string | For semantic | The prompt sent to the LLM jury to evaluate the response (must contain exact figures) |
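For instance, a semantic criterion might look like this (values illustrative); note that the semanticPrompt spells out the exact figure the jury should check for:

```json
{
  "id": "9b1f4d2a-6c3e-4f8b-a57d-0e2c9a8b7f61",
  "name": "Attributes growth correctly",
  "type": "semantic",
  "description": "Response attributes the revenue increase to the new product line",
  "weight": 40,
  "rationale": "Page 3 of the report credits the launch for the increase",
  "examples": [],
  "semanticPrompt": "Does the response state that Q3 revenue grew to exactly $4.2 million, and attribute the growth to the new product line?"
}
```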
The answer key is private by default. Buyers never see it unless you sell the Core Dataset.
Reference Files (Optional)
Bundle any reference files into a .zip archive. Each task can reference at most one file. Accepted formats: PDF, images (JPG, PNG, WebP, GIF), CSV, and TXT. Video and audio files are not supported.
The archive structure should be flat (files at the root level):
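For example, a valid archive might look like this (filenames illustrative), with no nested directories:

```
reference_files.zip
├── q3_report.pdf
├── sales_data.csv
└── chart.png
```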
After Upload
Once your eval dataset is created, proceed to Publishing Your Eval to create a listing.