Running an Eval

Learn how to run an eval on the PortexAI Datalab.

How to run an eval on Portex

Downloading the Task Bundle

Once you've found an eval you'd like to benchmark your model against, start by downloading the task bundle from the Task Bundles tab. The Download All button downloads the tasks along with any reference files your model needs to respond.
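As a rough sketch, here is one way to load a downloaded bundle in Python. The directory layout, the file name tasks.json, and the prompt field are assumptions for illustration; check the contents of the bundle you actually download.

import json
from pathlib import Path

# Assumed layout: the bundle unpacks into a directory containing a
# tasks.json file plus any reference files. Adjust paths and field
# names to match the real bundle.
bundle_dir = Path("task_bundle")

with open(bundle_dir / "tasks.json") as f:
    tasks = json.load(f)

for task in tasks:
    # Each task is assumed to carry a task_id and a prompt field.
    print(task["task_id"], "->", task.get("prompt", "")[:80])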

Eval Checkout

Once you've run your model locally against the task list, you can begin your eval run by clicking Run Eval. This will open the eval checkout window. Here, you can start by uploading your model responses as a JSON file.

Your model_responses.json must contain, at a minimum, one record per task, where each record includes:

  • task_id: the unique identifier for the task

  • model_response: your model's response to the task

Example:

[
  {
    "task_id": "apple_net_margin_2024",
    "model_response": "Net income $93,736M ÷ Revenue $391,035M = 23.97%"
  }
]
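As a minimal sketch, you could generate this file by running your model over the task list from the bundle. Here run_model is a placeholder for your own inference code, and the tasks.json layout is the same assumption as above:

import json

def run_model(prompt: str) -> str:
    # Placeholder: replace with a call to your model
    # (local inference or an API).
    return "model output for: " + prompt

with open("task_bundle/tasks.json") as f:  # assumed bundle layout
    tasks = json.load(f)

responses = [
    {"task_id": t["task_id"], "model_response": run_model(t["prompt"])}
    for t in tasks
]

# Write the upload file in the format shown above.
with open("model_responses.json", "w") as f:
    json.dump(responses, f, indent=2)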

Next, proceed to checkout to request a report graded against the answer key. You can pay with Stripe or USDC; once your payment is processed, your eval job will start.

Getting Your Results

Once you've submitted your model responses for evaluation, you can track the requested run in the Data Studio under the Evals tab. When the eval is complete, you can download a report showing your model's accuracy against the task answers.
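The report format below is an assumption for illustration: a JSON array with one record per task and a boolean correct field. A short Python sketch for summarizing such a report:

import json

# Assumption: the downloaded report is a JSON array with one record per
# task, each carrying a boolean "correct" field. Adapt to the real schema.
with open("eval_report.json") as f:
    report = json.load(f)

correct = sum(1 for r in report if r.get("correct"))
print(f"Accuracy: {correct / len(report):.1%} ({correct}/{len(report)})")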

Purchasing the Core Dataset

Model builders can also choose to purchase the core dataset underpinning each eval, for example to analyze which specific questions the model answered poorly. If the core dataset includes a knowledge reference, model builders can use it to refine their models with reinforcement learning.
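As a sketch of that kind of error analysis, the snippet below joins your responses with a hypothetical answer key from the core dataset. The file names, the answer_key.json schema, and the exact-match comparison are all assumptions; the Datalab's own grading is likely more lenient:

import json

# Assumption: the purchased core dataset includes an answer key keyed by
# task_id. File names and fields here are illustrative only.
with open("core_dataset/answer_key.json") as f:
    answers = {row["task_id"]: row["answer"] for row in json.load(f)}

with open("model_responses.json") as f:
    responses = json.load(f)

# Naive exact-match comparison to surface likely misses.
misses = [
    r for r in responses
    if r["task_id"] in answers
    and r["model_response"].strip() != answers[r["task_id"]].strip()
]
for r in misses:
    print(r["task_id"], "| expected:", answers[r["task_id"]])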

Purchasing the core dataset is similar to purchasing any other dataset on the Datalab.
