Quickstart

This walkthrough takes you from an empty organization to your first scored session. Plan on building one small scenario and running it yourself before involving anyone else.

Create a role

A role, like Claims Adjuster or Case Manager, is the container that organizes your scenarios, agents, and benchmarks. Pick the job whose judgment you want to capture and create a role for it. Everything else in this walkthrough happens inside that role. See roles and members.

Build your first scenario

Start small. A decision task is the fastest first scenario: present the operator with a set of inputs, such as a claim file or an application, and ask for one judgment. If your domain is conversation-driven, build a simple conversation scenario instead: a persona holds the facts, and the operator has to ask for them.

Either way, define at least one success metric (what the operator should accomplish) and one failure metric (the error they must avoid). These are what the session gets scored against. See defining good.

Base the scenario on a real case you have seen handled well and handled badly. If you can say what separated the two, you have your scoring criteria.
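To make the shape of a first scenario concrete, here is a minimal sketch in Python. It is only an illustration of the idea above, not the platform's data model or API: the `Scenario` and `Metric` classes, their fields, and the example claim are all hypothetical. The point it shows is the rule from this section: a scenario is ready to score once it has at least one success metric and one failure metric.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    # Hypothetical structure; the real platform may model metrics differently.
    name: str
    kind: str  # "success" (what to accomplish) or "failure" (error to avoid)
    description: str


@dataclass
class Scenario:
    # Hypothetical structure used only for illustration.
    title: str
    role: str
    briefing: str
    metrics: list[Metric] = field(default_factory=list)

    def is_scorable(self) -> bool:
        """True when at least one success and one failure metric are defined."""
        kinds = {m.kind for m in self.metrics}
        return {"success", "failure"} <= kinds


scenario = Scenario(
    title="Water damage claim",
    role="Claims Adjuster",
    briefing="Review the claim file and make one decision: approve, deny, or escalate.",
    metrics=[
        Metric("verified_coverage", "success",
               "Checks that the policy covers the reported cause of loss."),
        Metric("premature_approval", "failure",
               "Approves the claim before coverage has been confirmed."),
    ],
)

print(scenario.is_scorable())
```

Writing the two metrics down this plainly is a useful check on the real case you started from: the success metric should describe what the well-handled case did, and the failure metric what went wrong in the badly handled one.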

Run a session yourself

Before assigning the scenario to anyone, complete it yourself. You will see exactly what an operator sees: the briefing, then the task or the conversation. Running it yourself surfaces problems fast: a briefing that gives the answer away, a persona that volunteers too much, a judgment that cannot be made from the inputs provided.

Review the scored result

When your session ends, it is scored against the criteria you defined. The report shows which criteria were met, which were missed, and why. Check that the score matches your own sense of how you did. If it does not, the criteria need work, not the scoring. Tighten them and run again. See results.
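The check described above, comparing the scored result with your own sense of how you did, can be sketched as a simple per-criterion comparison. This is an illustrative sketch under assumptions: the report is represented here as a plain dictionary of criterion name to met or missed, which is a hypothetical format and not the platform's actual report or export. The function name `criteria_to_review` is likewise invented for the example.

```python
def criteria_to_review(report, self_assessment):
    """Return the criteria where the scored result disagrees with your own judgment.

    Both arguments map a criterion name to True (met) or False (missed).
    Criteria you did not assess yourself are skipped.
    """
    return sorted(
        name
        for name, met in report.items()
        if name in self_assessment and self_assessment[name] != met
    )


# Hypothetical scored result for your own session.
report = {"verified_coverage": True, "avoided_premature_approval": False}

# How you would have scored yourself on the same criteria.
self_assessment = {"verified_coverage": True, "avoided_premature_approval": True}

mismatches = criteria_to_review(report, self_assessment)
print(mismatches)  # ['avoided_premature_approval']
```

Any criterion that shows up in the mismatch list is a candidate for tightening before you run the scenario again.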

Run it with people, artificial intelligence (AI) agents, or both

Once the scenario scores you the way you would score yourself, put real operators in the seat:

Your people

Create a cohort: a batch of scenario assignments sent to your team. Each person’s session is scored against the same criteria.

Your AI

Configure an agent and assign it the same scenario. Its sessions are scored identically, so you can compare it directly against your people.
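Because people and agents are scored against the same criteria, their results can be compared criterion by criterion. The sketch below shows one way to do that outside the platform, assuming you have session results available as simple records. The record format, the operator labels `"person"` and `"agent"`, and the `pass_rates` function are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def pass_rates(sessions):
    """Compute the share of sessions that met each criterion, per operator kind.

    `sessions` is a list of (operator_kind, results) pairs, where `results`
    maps a criterion name to True (met) or False (missed).
    """
    # (operator_kind, criterion) -> [met_count, total_count]
    tally = defaultdict(lambda: [0, 0])
    for kind, results in sessions:
        for criterion, met in results.items():
            tally[(kind, criterion)][0] += int(met)
            tally[(kind, criterion)][1] += 1
    return {key: met / total for key, (met, total) in tally.items()}


# Hypothetical results: two people and one agent on the same scenario.
sessions = [
    ("person", {"verified_coverage": True, "avoided_premature_approval": True}),
    ("person", {"verified_coverage": False, "avoided_premature_approval": True}),
    ("agent", {"verified_coverage": True, "avoided_premature_approval": False}),
]

rates = pass_rates(sessions)
for (kind, criterion), rate in sorted(rates.items()):
    print(f"{kind:7s} {criterion:28s} {rate:.0%}")
```

Comparing per criterion rather than by a single overall score makes it easier to see where an agent and your team differ, for example an agent that always gathers the right facts but commits the error your failure metric is meant to catch.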

Where to go next

Read core concepts for the platform’s full vocabulary, or go deeper on scenario design with personas, state, and outputs.

What Can You Build? Three scenario shapes, from one-shot decisions to multi-week simulations.
