Conversation Scenarios

A conversation scenario takes a decision task and removes one convenience: the inputs. The facts still exist, but they are held by a simulated persona, and the operator has to elicit them, turn by turn, before there is anything to decide. That one change is what makes the shape worth building. In real work, the case file does not arrive complete. Someone asked the right follow-up question, noticed the hesitation, built enough trust for the client to mention the thing they were not going to mention. Conversation scenarios measure that layer, with the decision task still sitting at the core.

How the persona holds the facts

The scenario’s state is split by how it can be reached:

Known state is what the persona knows about themselves. Each item carries a reveal trigger: some facts are volunteered freely, some surface only under a direct question, and some are shared only after rapport is built. Two operators running the same scenario can walk away with different information because they asked differently.
Hidden state is what the persona genuinely cannot know: test results they have not seen, records they cannot access, inconsistencies only a professional would spot. It rewards the operator who investigates rather than accepts.
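
The split between known and hidden state can be pictured as a small data structure. What follows is a minimal sketch, not the product's actual schema: the names Trigger, KnownFact, surfaced, the topic labels, and the numeric rapport threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Trigger(Enum):
    """Hypothetical reveal triggers, one per disclosure mode described above."""
    VOLUNTEERED = "volunteered"          # shared without being asked
    DIRECT_QUESTION = "direct_question"  # shared only when the topic is asked about
    RAPPORT = "rapport"                  # shared only once enough trust exists


@dataclass(frozen=True)
class KnownFact:
    topic: str
    trigger: Trigger


def surfaced(known, hidden_topics, asked, investigated, rapport, rapport_needed=2):
    """Return the set of topics an operator ends the session with.

    rapport and rapport_needed are an assumed integer scale; the real
    persona reacts to the conversation rather than to a counter.
    """
    found = set()
    for fact in known:
        if fact.trigger is Trigger.VOLUNTEERED:
            found.add(fact.topic)
        elif fact.trigger is Trigger.DIRECT_QUESTION and fact.topic in asked:
            found.add(fact.topic)
        elif fact.trigger is Trigger.RAPPORT and rapport >= rapport_needed:
            found.add(fact.topic)
    # Hidden state never comes from the persona; it appears only if investigated.
    found |= hidden_topics & investigated
    return found


known = [
    KnownFact("volunteered_fact", Trigger.VOLUNTEERED),
    KnownFact("asked_fact", Trigger.DIRECT_QUESTION),
    KnownFact("trusted_fact", Trigger.RAPPORT),
]
hidden = {"hidden_record"}

careful = surfaced(known, hidden, asked={"asked_fact"},
                   investigated={"hidden_record"}, rapport=3)
rushed = surfaced(known, hidden, asked=set(), investigated=set(), rapport=0)
```

Here careful ends up with all four topics while rushed holds only the volunteered one, which is the sense in which two operators leave the same scenario with different information.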

The persona itself is built from a reusable identity (who they are) and personality (how they communicate), and it reacts to the operator: a dismissive opening produces a more guarded persona, a well-handled moment opens them up. The conversation is generated fresh each session, not scripted.

What you configure

Component	Role in a conversation
Persona	Identity, personality, and how readily they disclose
State	Known state with reveal triggers; hidden state for what only investigation surfaces
Briefing	What the operator knows walking in, and who speaks first
Outputs	The decision and artifacts the conversation should produce, same as a decision task
Criteria	Success and failure metrics, plus scope boundaries (what the operator can do, must escalate, must never do) and the terminology they should use correctly

Scoring covers both layers: did the operator surface what was there to surface, and was the resulting judgment right. A confident decision built on facts the operator never elicited scores differently from the same decision built on a complete picture.
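
One way to keep the two layers apart is to report elicitation coverage and decision correctness as separate fields. This is a hedged sketch only: the function score_session, its inputs, and the simple coverage ratio are assumptions, not the scoring method the product uses.

```python
def score_session(surfaceable, elicited, decision, correct_decision):
    """Score one session on two independent layers (hypothetical scheme).

    surfaceable: set of facts that were there to be found
    elicited:    set of facts the operator actually surfaced
    """
    found = elicited & surfaceable
    coverage = len(found) / len(surfaceable) if surfaceable else 1.0
    return {
        "coverage": coverage,                          # elicitation layer
        "decision_correct": decision == correct_decision,  # judgment layer
        "missed": sorted(surfaceable - elicited),      # what was never surfaced
    }


facts = {"fact_a", "fact_b", "fact_c", "fact_d"}

# Same decision in both sessions; only the picture behind it differs.
complete = score_session(facts, facts, "refer", "refer")
thin = score_session(facts, {"fact_a"}, "refer", "refer")
```

Both sessions get the decision right, but thin reports a coverage of 0.25 and three missed facts, so the report can distinguish a well-grounded judgment from a lucky one.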

Worked example

A lender wants to know whether loan officers uncover affordability risks during an application call.

Persona: a self-employed applicant, friendly and talkative, whose stated income is accurate but has only a short history behind it
Known state: changed business structure eight months ago (direct question); a second loan application declined elsewhere last month (rapport required); monthly figures (volunteered)
Hidden state: the bank statements, which the operator can request, show irregular deposits that do not match the stated monthly figure
Outputs: proceed, decline, or refer to a senior assessor, plus a written application summary
Failure metric: proceeds without asking how long the business has operated in its current form

A skilled officer finds the declined application. A rushed one gets a pleasant call and a clean-looking summary, and the score shows exactly which questions were never asked.
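
Written out as data, the example above might look like the following. The field names, topic keys, and the hits_failure_metric helper are illustrative assumptions rather than the product's configuration format; only the facts themselves come from the example.

```python
# Hypothetical configuration for the lending scenario described above.
SCENARIO = {
    "persona": {
        "identity": "self-employed applicant",
        "personality": "friendly and talkative",
    },
    "known_state": [
        {"topic": "business_structure",
         "fact": "changed business structure eight months ago",
         "reveal": "direct_question"},
        {"topic": "declined_application",
         "fact": "a second loan application declined elsewhere last month",
         "reveal": "rapport"},
        {"topic": "monthly_figures",
         "fact": "monthly figures",
         "reveal": "volunteered"},
    ],
    "hidden_state": [
        {"topic": "bank_statements",
         "fact": "irregular deposits that do not match the stated monthly figure",
         "surfaced_by": "operator requests the statements"},
    ],
    "outputs": {
        "decision": ["proceed", "decline", "refer"],
        "artifacts": ["application_summary"],
    },
}


def hits_failure_metric(decision, asked_topics):
    """True when the operator proceeds without asking about the business structure."""
    return decision == "proceed" and "business_structure" not in asked_topics


rushed_call = hits_failure_metric("proceed", {"monthly_figures"})
careful_call = hits_failure_metric("proceed", {"business_structure", "monthly_figures"})
```

In this sketch the rushed call trips the failure metric and the careful one does not, even though both end in the same decision.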

Testing a chatbot

If the thing you want to evaluate is itself an AI, the shape does not change. A chatbot evaluation is a conversation scenario with your agent in the operator seat: the agent faces the same persona, the same withheld facts, and the same criteria as your people. That symmetry is the point. “Is the bot ready?” becomes a comparison you can read off a report, not an opinion.
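
The symmetry can be sketched as a single operator interface that either a person or an agent fills. Everything here is assumed for illustration: the Operator protocol, ScriptedOperator, run_session, and the stub persona are not part of any documented API, and the real persona generates its replies fresh rather than from a lookup.

```python
from typing import Callable, List, Protocol, Tuple

Transcript = List[Tuple[str, str]]


class Operator(Protocol):
    """Anything that can take the operator seat: a human front end or an agent."""
    def next_message(self, transcript: Transcript) -> str: ...


class ScriptedOperator:
    """Test double standing in for whoever, or whatever, occupies the seat."""
    def __init__(self, lines: List[str]) -> None:
        self._lines = list(lines)

    def next_message(self, transcript: Transcript) -> str:
        # An empty string signals that the operator has finished.
        return self._lines.pop(0) if self._lines else ""


def run_session(operator: Operator,
                persona_reply: Callable[[str], str],
                max_turns: int = 10) -> Transcript:
    """Run one session; the loop does not care who the operator is."""
    transcript: Transcript = []
    for _ in range(max_turns):
        message = operator.next_message(transcript)
        if not message:
            break
        transcript.append(("operator", message))
        transcript.append(("persona", persona_reply(message)))
    return transcript


def stub_persona(message: str) -> str:
    # Placeholder persona: discloses only when asked directly.
    return "Eight months ago." if "structure" in message else "Happy to help."


human_run = run_session(ScriptedOperator(["Hello", "When did the structure change?"]), stub_persona)
agent_run = run_session(ScriptedOperator(["Hello"]), stub_persona)
```

Because both runs pass through the same run_session loop and face the same persona function, their transcripts can be scored against identical criteria and compared side by side.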

Moving up the ladder

A conversation measures one session. Some work only reveals its quality over time: the plan made in week one either holds or unravels by week six. When the question is longitudinal, extend the conversation into a journey simulation.
