# Importing an Eval

If you prefer authoring Q\&A evals in an editor or IDE, you can upload structured JSON files directly to the Datalab. This path is intended for advanced users who want full control over the eval schema or who already have files prepared locally.

{% hint style="info" %}
JSON import is currently available for Q\&A (single-turn) evals only. To create an agentic eval, use the [Eval Builder](https://docs.portexai.com/portex-docs/for-experts/creating-an-eval).
{% endhint %}

## Upload Flow

From the Data Studio, go to Datasets > Upload a File, then select "Eval Dataset." You will be prompted to upload your files.

<figure><img src="https://867705781-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FUkUyaZptb9tX5Pbk7oma%2Fuploads%2Fgit-blob-deb66578a127796b5e268ffd15cfdabf3145e2ca%2FScreenshot%202026-02-13%20at%2010.01.29%20PM.png?alt=media" alt=""><figcaption></figcaption></figure>

## Eval Dataset Bundle

An eval dataset consists of up to four parts:

1. tasks.json (required)
2. answers.json (required)
3. Reference files as a .zip archive (optional)
4. Knowledge reference as a .zip archive (optional, for premium offerings)

Every `task_id` in answers.json must also exist in tasks.json, and every `task_id` must be unique.
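A quick pre-upload check can catch duplicate or mismatched IDs before the Datalab rejects the bundle. The helper below is an illustrative sketch (not part of the platform); it assumes you have already parsed tasks.json and answers.json into Python lists:

```python
import json

def validate_ids(tasks: list[dict], answers: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the IDs are consistent."""
    problems = []
    task_ids = [t["task_id"] for t in tasks]

    # Every task_id must be unique.
    dupes = sorted({i for i in task_ids if task_ids.count(i) > 1})
    if dupes:
        problems.append(f"duplicate task_ids: {dupes}")

    # Every task_id in answers.json must also exist in tasks.json.
    known = set(task_ids)
    orphans = [a["task_id"] for a in answers if a["task_id"] not in known]
    if orphans:
        problems.append(f"answers with no matching task: {orphans}")
    return problems

# Example usage:
# with open("tasks.json") as f: tasks = json.load(f)
# with open("answers.json") as f: answers = json.load(f)
# print(validate_ids(tasks, answers) or "OK")
```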

## tasks.json

A JSON array of task objects. Each task must include:

* `task_id`: a unique identifier (string)
* `task_prompt`: the full prompt for the model
* `reference_file`: filename of the attached reference file (or empty string if none)

{% code title="// An example task from the AI Productivity Index (APEX), which measures performance in finance, law, consulting, and medicine." overflow="wrap" expandable="true" %}

```json
[
  {
    "task_id": "828",
    "task_prompt": "I'm working to optimize the distribution of marketing spend for my client for the next cycle to better fit their Family Man target group. They want to focus on campaigns that they have experience with, meaning only Partner + Category combos that they've done more than 2 campaigns with, previously. Among those, disregard campaigns where Family Man is not in top 2 in the attached dataset. The ones left are our campaigns of interest for this analysis. First off, what is the total campaign cost and total impressions of the campaigns of interest?",
    "reference_file": "Target_Group (1).csv"
  }
]
```

{% endcode %}

This file is downloadable by eval buyers so they can generate model responses.

## answers.json

A JSON array of answer objects. Each answer must include:

* `task_id`: must match a `task_id` in tasks.json
* `answer`: the correct output (text, number, or structured content)
* `reference_file`: the associated reference file name (if any)
* `criteria`: an array of grading criteria (see below)
* `passThreshold`: minimum weighted score (0-100) for the task to be considered passed

Optional fields:

* `tools`: array of tool names if the task involves tool use (can be empty)

{% code title="// An example answer from the AI Productivity Index (APEX), which measures performance in finance, law, consulting, and medicine." overflow="wrap" expandable="true" %}

```json
[
  {
    "task_id": "828",
    "answer": "Calculates the total campaign costs for campaigns of interest as $1,591,118.",
    "reference_file": "Target_Group (1).csv",
    "tools": [],
    "criteria": [
      {
        "id": "70093557-b436-4700-804a-51a489b949ad",
        "name": "Calculates the total campaign costs",
        "type": "semantic",
        "description": "Calculates the total campaign costs for campaigns of interest as $1,591,118.",
        "weight": 14.26,
        "rationale": "Using the dataset Target_Group.csv, filter for campaigns with more than 2 campaigns with the same Category + Partners combo. Then filter for campaigns with Family Man in either Target_Group_1 or Target_Group_2. Sum all campaign costs, yielding $1,591,118.",
        "examples": [],
        "semanticPrompt": "Calculates the total campaign costs for campaigns of interest as $1,591,118. (Acceptable value is $1,591,118)"
      }
    ],
    "passThreshold": 70
  }
]
```

{% endcode %}

### Criteria schema

Each criterion in the `criteria` array has the following fields:

| Field            | Type   | Required     | Description                                                                           |
| ---------------- | ------ | ------------ | ------------------------------------------------------------------------------------- |
| `id`             | string | Yes          | Unique identifier (UUID recommended)                                                  |
| `name`           | string | Yes          | Short label for the criterion                                                         |
| `type`           | string | Yes          | One of: `semantic`, `lexical`, `binary`, `ordinal`, `numeric`, `regex`                |
| `description`    | string | Yes          | What constitutes a correct response                                                   |
| `weight`         | number | Yes          | Percentage weight (all criteria weights for a task should sum to 100)                 |
| `rationale`      | string | No           | Explanation of how the correct answer is derived                                      |
| `examples`       | array  | No           | Example responses (can be empty)                                                      |
| `semanticPrompt` | string | For semantic | The prompt sent to the LLM jury to evaluate the response (must contain exact figures) |
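Because all criteria weights for a task should sum to 100, it is worth checking for rounding drift before upload. This is an illustrative helper, not part of the platform; the small tolerance allows for fractional weights like the `14.26` in the example above:

```python
def check_weights(answer: dict, tolerance: float = 0.01) -> bool:
    """True if the criterion weights for one answer sum to 100 within tolerance."""
    total = sum(c["weight"] for c in answer["criteria"])
    return abs(total - 100.0) <= tolerance

# Example usage:
# bad = [a["task_id"] for a in answers if not check_weights(a)]
```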

The answer key is private by default. Buyers never see it unless you sell the Core Dataset.

## Reference Files (Optional)

Bundle any reference files into a .zip archive. Each task can reference at most one file. Accepted formats: PDF, images (JPG, PNG, WebP, GIF), CSV, TXT.

Video and audio files are not supported.

The archive structure should be flat (files at the root level):

```
refs.zip/
  Target_Group (1).csv
```
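One way to produce a flat archive is with Python's standard `zipfile` module, writing each file at the archive root so no directory components leak in. The function name and paths below are illustrative:

```python
import zipfile
from pathlib import Path

def bundle_refs(files: list[str], archive: str = "refs.zip") -> None:
    """Zip reference files with a flat structure (all files at the root)."""
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            # arcname=Path(path).name strips any directory prefix,
            # so "data/Target_Group (1).csv" is stored as "Target_Group (1).csv".
            zf.write(path, arcname=Path(path).name)

# Example usage:
# bundle_refs(["data/Target_Group (1).csv"])
```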

## Knowledge Reference (Optional)

A .zip archive containing supplementary expert materials such as answer keys, analyst notes, calculation workbooks, or other resources that support the eval. This forms part of the Core Dataset that can be sold separately to model builders for reinforcement learning.

## After Upload

Once your eval dataset is created, proceed to [Publishing Your Eval](https://docs.portexai.com/portex-docs/for-experts/publishing-your-eval) to create a listing.
