Configuring an agent
An agent is a saved configuration with four parts:

| Setting | What it is |
|---|---|
| Name | A label so you can tell agents apart in run results |
| Model source | A predefined model, or a custom endpoint (see below) |
| System prompt | Written instructions that define how the agent behaves: the role it plays, how it decides, and what it refuses to do |
| Sampling settings | Controls over how varied the model’s responses are |
- Predefined model: Pick a model from the providers available in the app. The available list evolves, so check the agent creation form for current options.
- Custom endpoint
Agent creation requires admin access. See roles and members.
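To make the four settings concrete, here is a minimal sketch of how an agent configuration could be modelled in code. It is illustrative only: the class names, the `model_id` and `url` fields, and the `temperature` control are assumptions made for this example, not the app's actual schema.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass(frozen=True)
class PredefinedModel:
    # Hypothetical identifier for a model picked from the app's provider list.
    model_id: str


@dataclass(frozen=True)
class CustomEndpoint:
    # Hypothetical field: the address of your own model endpoint.
    url: str


@dataclass(frozen=True)
class SamplingSettings:
    # Hypothetical control: higher values give more varied responses.
    temperature: float = 0.7


@dataclass(frozen=True)
class AgentConfig:
    name: str
    model_source: Union[PredefinedModel, CustomEndpoint]
    system_prompt: str
    sampling: SamplingSettings = field(default_factory=SamplingSettings)

    def __post_init__(self) -> None:
        # A name is needed to tell agents apart in run results.
        if not self.name.strip():
            raise ValueError("name must not be empty")
        # The system prompt defines the agent's behavior, so it cannot be blank.
        if not self.system_prompt.strip():
            raise ValueError("system_prompt must not be empty")


agent = AgentConfig(
    name="support-agent-v1",
    model_source=PredefinedModel(model_id="example-model"),
    system_prompt="You are a support agent. Decline requests outside policy.",
)
```

Modelling the model source as one of two distinct types mirrors the either/or choice in the table: an agent uses a predefined model or a custom endpoint, never both.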
Training tiers
Agents progress through three tiers, each building on the expert sessions you have captured:

- Prompt - you write a system prompt yourself; self-serve and available now.
- Optimized - your prompts are refined against your captured expert data; a bespoke service.
- Custom - a model fine-tuned on your expert data; a bespoke service.
Benchmark runs
A benchmark run executes an agent against one or more scenarios and scores the results.

Start a run
Pick an agent and the scenarios to test. The run is queued and executes asynchronously, so you can start it and come back later.
Watch it progress
The run moves through statuses: queued, running (agent-vs-persona sessions in progress), scoring (sessions finished, scoring against the scenario’s criteria), and completed. Runs that hit an error show failed, and you can cancel a run manually. Each scenario’s progress updates individually.
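The status lifecycle above can be sketched as a small enumeration plus a polling loop. This is a hedged illustration, not the app's API: the `fetch_status` callable is a hypothetical stand-in for however you read a run's status, and the `cancelled` status name is an assumption, since the text only says a run can be cancelled manually.

```python
import time
from enum import Enum
from typing import Callable


class RunStatus(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"      # agent-vs-persona sessions in progress
    SCORING = "scoring"      # sessions finished, scoring against criteria
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"  # assumed name for a manually cancelled run


# Statuses after which a run will not change again.
TERMINAL = {RunStatus.COMPLETED, RunStatus.FAILED, RunStatus.CANCELLED}


def wait_for_run(
    fetch_status: Callable[[], RunStatus],
    poll_seconds: float = 5.0,
    max_polls: int = 120,
) -> RunStatus:
    """Poll until the run reaches a terminal status or the poll budget runs out."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("run did not reach a terminal status in time")


# Usage with a fake status source that walks through the normal lifecycle.
fake_statuses = iter(
    [RunStatus.QUEUED, RunStatus.RUNNING, RunStatus.SCORING, RunStatus.COMPLETED]
)
final = wait_for_run(lambda: next(fake_statuses), poll_seconds=0)
```

Because runs execute asynchronously, a bounded poll like this avoids waiting forever on a run that never leaves the queue.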
Testing a chatbot
If the thing you want to evaluate is itself a conversational AI product, you do not need anything special: that is a conversation scenario with your agent in the operator seat. The persona plays the customer, your chatbot plays the operator, and the session is scored against the same criteria you would apply to a human agent handling the conversation. The comparison between your chatbot and your people is direct because nothing about the measurement changes, only who is in the seat.

Run scenarios with your people
Capture expert sessions and measure your team at scale.
Scenario types
Decision tasks, conversation scenarios, and journey simulations.