What is Portex?
Expert-authored evals for state-of-the-art AI models.
PortexAI builds evaluations ("evals") for state-of-the-art AI models and agents. Evals have become a bedrock of the AI ecosystem, increasingly doing double duty: they contextualize model performance in benchmarks and provide reward signals for post-training and reinforcement learning.
Portex Evals are expert-authored, domain-specific evaluation datasets and grading rubrics designed to measure AI models on frontier, economically relevant work. Each eval is a set of procedural tasks (with optional reference files) plus a private answer key and an explicit rubric, which our AsymmetryZero LLM-jury protocol or a lexical judge uses to produce standardized scores and reports.
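Concretely, the pieces described above fit together roughly like this. This is an illustrative sketch only; the class and field names (`EvalTask`, `EvalBundle`, `answer_key`, `rubric`) are assumptions for exposition, not the actual Portex schema.

```python
from dataclasses import dataclass, field

# Hypothetical shape of one procedural task (names are illustrative).
@dataclass
class EvalTask:
    task_id: str
    prompt: str                   # the instruction given to the model
    reference_files: list = field(default_factory=list)  # optional supporting files

# Hypothetical shape of a full eval: public tasks plus the private
# answer key and rubric consumed by the judge.
@dataclass
class EvalBundle:
    name: str
    domain: str
    tasks: list                   # public: what the model under test sees
    answer_key: dict              # private: task_id -> reference answer
    rubric: str                   # explicit grading criteria for the judge

bundle = EvalBundle(
    name="contract-review-v1",
    domain="legal",
    tasks=[EvalTask("t1", "Summarize the indemnification clause.")],
    answer_key={"t1": "Covers third-party IP claims only."},
    rubric="Award one point per correctly identified clause element.",
)
```

The key separation is that `tasks` can be shared with model builders while `answer_key` and `rubric` stay private to the grading step.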
The PortexAI Datalab is where experts design, publish, and commercialize evals and accompanying datasets, and where model builders can license task bundles or evaluate their model's responses.

These docs cover how to create evals, run them, and use the Datalab as either an expert or a model builder.
New here? Start with Creating an Account or read How Evals Work for a conceptual overview.