# Running an Eval

## Download the Task Bundle

From the eval's detail page, open the Task Bundles tab. Click "Download All" to get the tasks.json file and any reference files.

<figure><img src="https://867705781-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FUkUyaZptb9tX5Pbk7oma%2Fuploads%2Fgit-blob-119ab368d8e1bb996f9f1aeb5bb7c5884291cc5c%2FScreenshot%202026-02-13%20at%2010.05.40%20PM.png?alt=media" alt=""><figcaption></figcaption></figure>

The task bundle contains:

* tasks.json: the prompts your model needs to respond to
* Reference files (if any): PDFs, images, CSVs, or other supporting documents

## Run Your Model

Run your model locally against each task in tasks.json. For each task, produce a `model_response`.
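
The loop described above can be sketched in Python. This is a minimal sketch, not the platform's official client. It assumes tasks.json is a JSON array of objects and that each object carries its prompt under a `prompt` key. Only `task_id` is confirmed by this guide, so check the field names in your downloaded file. `placeholder_model`, `run_tasks`, and `write_responses` are hypothetical names used for illustration.

```python
import json
from pathlib import Path
from typing import Callable


def run_tasks(tasks: list[dict], generate: Callable[[str], str]) -> list[dict]:
    """Call `generate` once per task and collect one record per task."""
    responses = []
    for task in tasks:
        responses.append({
            "task_id": task["task_id"],
            # ASSUMPTION: the prompt lives under "prompt"; adjust to your tasks.json.
            "model_response": generate(task["prompt"]),
        })
    return responses


def placeholder_model(prompt: str) -> str:
    """Stand-in for your model; replace the body with a real inference call."""
    return f"TODO: answer for {prompt!r}"


def write_responses(tasks_path: str, out_path: str,
                    generate: Callable[[str], str]) -> int:
    """Read tasks, run the model on each, write the responses file.

    Returns the number of records written.
    """
    # ASSUMPTION: the top-level JSON value in tasks.json is an array of tasks.
    tasks = json.loads(Path(tasks_path).read_text(encoding="utf-8"))
    responses = run_tasks(tasks, generate)
    Path(out_path).write_text(
        json.dumps(responses, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return len(responses)
```

Calling `write_responses("tasks.json", "model_responses.json", placeholder_model)` from the bundle directory would produce a file in the shape described in the next section, once `placeholder_model` is swapped for a real model call.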

## Prepare model\_responses.json

Create a JSON file with your model's responses. Each record must include:

* `task_id`: the `task_id` of the corresponding task in tasks.json
* `model_response`: your model's output for that task

```json
[
  {
    "task_id": "apple_net_margin_2024",
    "model_response": "Net income $93,736M / Revenue $391,035M = 23.97%"
  }
]
```
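
Before uploading, it can help to check the file against the two required fields. The sketch below is one way to do that locally. The function name `validate_responses` is hypothetical, and the checks for duplicate, unknown, or missing task IDs are precautions assumed here, not documented platform rules.

```python
def validate_responses(records, expected_task_ids):
    """Return a list of human-readable problems; an empty list means no issues found."""
    if not isinstance(records, list):
        return ["top-level JSON value must be an array"]

    expected = set(expected_task_ids)
    problems = []
    seen = set()
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: not a JSON object")
            continue
        # The two fields this guide lists as required.
        for key in ("task_id", "model_response"):
            if key not in rec:
                problems.append(f"record {i}: missing {key}")
        task_id = rec.get("task_id")
        if not isinstance(task_id, str):
            continue
        # ASSUMPTION: one response per task, and only for tasks in the bundle.
        if task_id in seen:
            problems.append(f"record {i}: duplicate task_id {task_id!r}")
        elif task_id not in expected:
            problems.append(f"record {i}: unknown task_id {task_id!r}")
        seen.add(task_id)

    for missing in sorted(expected - seen):
        problems.append(f"no response for task_id {missing!r}")
    return problems
```

Pass it the parsed contents of model\_responses.json and the set of `task_id` values read from tasks.json, then fix anything it reports before checkout.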

## Submit and Pay

Click "Run Eval" on the eval's detail page. This opens the checkout window.

<figure><img src="https://867705781-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FUkUyaZptb9tX5Pbk7oma%2Fuploads%2Fh382xW1mbXOg8WKVAPve%2FScreenshot%202026-02-14%20at%2012.50.13%E2%80%AFPM.png?alt=media&#x26;token=54dec01a-7447-45de-8551-8ac291ed7f81" alt=""><figcaption></figcaption></figure>

1. Upload your model\_responses.json (max 5MB)
2. Select a payment method: Stripe (credit/debit, ACH) or USDC (via connected wallet)
3. Complete checkout

Your eval job starts after payment is confirmed.
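
The 5MB upload limit from step 1 can be checked locally before opening checkout. This is a small sketch with one stated assumption: the guide does not say whether "5MB" means 5,000,000 or 5,242,880 bytes, so the smaller, more conservative value is used. `within_upload_limit` is a hypothetical helper name.

```python
import os

# ASSUMPTION: "5MB" is read conservatively as 5,000,000 bytes.
MAX_UPLOAD_BYTES = 5 * 1000 * 1000


def within_upload_limit(path, limit=MAX_UPLOAD_BYTES):
    """Return (ok, size_in_bytes) for the file at `path`."""
    size = os.path.getsize(path)
    return size <= limit, size
```

For example, `within_upload_limit("model_responses.json")` returns a pair whose first element is `False` when the file is too large to upload under this reading of the limit.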

## Get Results

Results appear in the Data Studio under Evals > Results. While the eval is running, the status shows "Running." Once complete, it changes to "Completed" and you can download the report.

<figure><img src="https://867705781-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FUkUyaZptb9tX5Pbk7oma%2Fuploads%2FfcDU8GCubwR8ceW7wFV2%2FScreenshot%202025-10-15%20at%203.45.13%E2%80%AFPM.png?alt=media&#x26;token=0b6b6114-8ae8-462f-8582-84f4a8ef6102" alt=""><figcaption></figcaption></figure>

The eval report includes:

* Summary statistics (overall score, pass rate)
* Per-task scores and pass/fail status
* Grader notes for each criterion
