# Reviewing Model Responses

After you create an eval dataset, Portex runs state-of-the-art (SOTA) models against it and maintains a leaderboard for it. You can review model scores, inspect individual responses, and leave feedback on the Annotate page in the Data Studio.

The Annotate page is visible only to you. Use it to iterate on your evals and improve them.

{% hint style="success" %}
These eval runs are internally maintained by Portex for leaderboards. Payment is received when external model builders submit model runs.
{% endhint %}

## Annotate Overview

Navigate to Evals > Results > Annotate in the Data Studio sidebar. Select your eval to see results.

<figure><img src="/files/vcLXeIk0kZ2l0B7bBO76" alt=""><figcaption></figcaption></figure>

The results table shows eval name, latest version, task count, average performance, total run time, model count, and top model.

Click into an eval to see the detailed pass/fail grid across models and criteria.

<figure><img src="/files/LCmgC1X7bhmfkwACsxYh" alt=""><figcaption></figcaption></figure>

You can toggle between summary and detailed views. The summary view shows aggregate scores per model; the detailed view shows per-task, per-criterion breakdowns with pass/fail badges for each model.
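To make the relationship between the two views concrete, here is a minimal Python sketch of how a per-model summary score could be derived from per-task, per-criterion pass/fail results. The record layout and the `summarize` function are hypothetical illustrations, not a Portex export format or API, and the scoring Portex actually uses may weight tasks or criteria differently.

```python
from collections import defaultdict

# Hypothetical detailed-view records: (model, task, criterion, passed).
# These names and values are made up for illustration.
detailed = [
    ("model-a", "task-1", "accuracy", True),
    ("model-a", "task-1", "citations", False),
    ("model-a", "task-2", "accuracy", True),
    ("model-b", "task-1", "accuracy", True),
    ("model-b", "task-1", "citations", True),
    ("model-b", "task-2", "accuracy", True),
]

def summarize(records):
    """Return the unweighted fraction of criteria passed per model."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for model, _task, _criterion, ok in records:
        total[model] += 1
        passed[model] += int(ok)
    return {model: passed[model] / total[model] for model in total}

summary = summarize(detailed)
for model, score in sorted(summary.items()):
    print(f"{model}: {score:.0%} of criteria passed")
```

In this sketch, a low summary score tells you which model to open in the detailed view, where the individual pass/fail badges show which criteria failed.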

## Annotate

The Annotate task-level view provides a side-by-side interface for inspecting individual model responses against your tasks.

<figure><img src="/files/vQTYKcTvsC6u0CEje8AJ" alt=""><figcaption></figcaption></figure>

### Layout

* Left panel: a selected SOTA model's response and reasoning
* Right panel: your task prompt, answers, and a Notes section for annotating model responses

### Comparing models

Use the model selector tabs at the top of the right panel to switch between different model submissions. You can toggle between "Model Responses" and "Model Reasoning" (to see the model's chain of thought, if available).

Tabs within the right panel let you view the Task Prompt, Answer, and Grading Criteria for the selected task.

### Annotate

Highlight a passage in the model response to attach specific feedback or commentary to it.

### Ranking and tagging

You can rank your top three model responses by dragging them into position. This is useful for comparative analysis across models.

You can also tag responses with labels like "Hallucination," "Logical Errors," or "Surprising Result" for tracking patterns across submissions.

### Versioning

If you have [edited your eval](/portex-docs/for-experts/editing-an-eval.md) and created new versions, the Annotate view lets you review results per version.

## Leaderboards

Each published eval has a public leaderboard showing model performance. The leaderboard is visible on the eval's detail page under the Leaderboards tab, showing model name, score, run time, and relative cost.

Use leaderboard data to calibrate your eval's difficulty and identify areas where you might add harder tasks.
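One way to approach this calibration is to look at per-task pass rates across models. The following Python sketch assumes you have transcribed per-task results by hand into a dictionary. The data layout, model names, and helper function are hypothetical and do not represent a Portex API or export format.

```python
# Hypothetical per-task results: task -> {model: passed all criteria?}
results = {
    "task-1": {"model-a": True, "model-b": True, "model-c": True},
    "task-2": {"model-a": True, "model-b": False, "model-c": False},
    "task-3": {"model-a": False, "model-b": False, "model-c": False},
}

def pass_rates(task_results):
    """Return the fraction of models that passed each task."""
    return {
        task: sum(by_model.values()) / len(by_model)
        for task, by_model in task_results.items()
    }

rates = pass_rates(results)

# Tasks every model passes may be too easy to separate models.
saturated = [task for task, rate in rates.items() if rate == 1.0]
# Tasks no model passes are worth checking for ambiguity or grading issues.
unsolved = [task for task, rate in rates.items() if rate == 0.0]

print("Saturated tasks:", saturated)
print("Unsolved tasks:", unsolved)
```

Tasks that every model passes are candidates for harder variants, while tasks that no model passes may be either genuinely difficult or underspecified, so they are worth inspecting in the Annotate view before you add more like them.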


