Running an Eval
Learn how to run an eval on the PortexAI Datalab.
How to run an eval on Portex
Downloading the Task Bundle
Once you've found an eval you'd like to benchmark your model against, you can start by downloading the task bundle within the Task Bundles tab. The Download All button will allow you to download the tasks and any reference files needed for your model to respond.
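If your bundle includes a machine-readable task list, a loop like the following can generate responses in the model_responses.json shape described below. This is a minimal sketch: the tasks.json file name, the prompt field, and the run_model function are assumptions standing in for your bundle's actual structure and your own inference code.

    # Minimal sketch: run your own model over the downloaded task bundle.
    # "tasks.json" and the "prompt" field are assumptions; use whatever
    # structure your downloaded bundle actually contains.
    import json

    def run_model(prompt: str) -> str:
        # Placeholder: call your own model or inference API here.
        raise NotImplementedError("plug in your model")

    with open("tasks.json") as f:
        tasks = json.load(f)

    # Build records in the shape the eval checkout expects.
    responses = [
        {"task_id": task["task_id"], "model_response": run_model(task["prompt"])}
        for task in tasks
    ]

    with open("model_responses.json", "w") as f:
        json.dump(responses, f, indent=2)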

Eval Checkout
Once you've run your model locally against the task list, you can begin your eval run by clicking Run Eval. This will open the eval checkout window. Here, you can start by uploading your model responses as a JSON file.

Your model_responses.json needs to include, at a minimum, the following for each record:
- task_id: a unique identifier for the task
- model_response: your model's response to the task
Example:
    [
      {
        "task_id": "apple_net_margin_2024",
        "model_response": "Net income $93,736M ÷ Revenue $391,035M = 23.97%"
      }
    ]
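Before uploading, it can help to sanity-check the file locally. Below is a minimal sketch of such a check; the uniqueness check on task_id follows from it being described as a unique identifier, not from a documented upload requirement.

    # Sanity-check model_responses.json against the required shape:
    # a JSON array of records, each with "task_id" and "model_response".
    import json

    with open("model_responses.json") as f:
        records = json.load(f)

    assert isinstance(records, list), "top level must be a JSON array"
    seen = set()
    for r in records:
        # Every record needs both required fields.
        assert "task_id" in r and "model_response" in r, f"missing field in {r}"
        # task_id is described as a unique identifier, so flag duplicates.
        assert r["task_id"] not in seen, f"duplicate task_id: {r['task_id']}"
        seen.add(r["task_id"])
    print(f"{len(records)} responses look well-formed")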
Next, you can proceed to checkout to request a report grading your responses against the answer key. You can pay with Stripe or USDC. After your payment is processed, your eval job will start.
Getting your Results
Once you've submitted your model responses for evaluation, you can track your request in the Data Studio under the Evals tab. Once your eval is complete, you can download a report with your model's accuracy against the task answers.
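The report format isn't specified here, but as an illustration, if the downloaded report were a JSON array with a per-task correctness flag, you could summarize it like this. The eval_report.json file name and the correct field are hypothetical.

    # Hypothetical: assumes the downloaded report is a JSON array with a
    # boolean "correct" field per task. Adjust to the real report format.
    import json

    with open("eval_report.json") as f:
        report = json.load(f)

    correct = sum(1 for row in report if row["correct"])
    print(f"accuracy: {correct}/{len(report)} = {correct / len(report):.1%}")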

Purchasing the Core Dataset
Model builders can also choose to purchase the core dataset underpinning each eval, for example to analyze which specific questions the model answered poorly. If the core dataset includes a knowledge reference, model builders can use this data to refine their models with reinforcement learning.
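As an illustration of that kind of error analysis, the sketch below joins your responses against a purchased answer key. The core_dataset.json file name, the answer field, and the exact-match comparison are all assumptions; the eval's actual grading may differ.

    # Hypothetical error analysis: compare your responses against the core
    # dataset's answer key to see which tasks the model missed. File names
    # and the "answer" field are assumptions; adapt them to the real dataset.
    import json

    with open("model_responses.json") as f:
        responses = {r["task_id"]: r["model_response"] for r in json.load(f)}
    with open("core_dataset.json") as f:
        answer_key = {r["task_id"]: r["answer"] for r in json.load(f)}

    # Exact string match is a naive stand-in for however the eval grades.
    misses = [
        tid for tid, answer in answer_key.items()
        if responses.get(tid) != answer
    ]
    print(f"{len(misses)} of {len(answer_key)} tasks missed")
    for tid in misses[:10]:
        print(tid, "expected:", answer_key[tid], "| got:", responses.get(tid))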

Purchasing the core dataset is similar to purchasing any other dataset on the Datalab.