Decision Tasks

A decision task hands the operator everything up front: the case file, the context, the constraints. The operator reads it and produces a judgment, with the reasoning behind it. There is no dialogue to manage and no information to chase down. What gets measured is the decision itself. This is the simplest scenario shape, and the right place to start if your question is “do our people (or our AI) make the right call when the facts are in front of them?”

When to use a decision task

Classification problems: approve or deny, eligible or ineligible, urgent or routine
Triage: which queue, which specialist, which priority
Escalation calls: handle it, refer it, or stop and flag it
Document review: is this application complete, is this report consistent, does this claim hold together

If the hard part of the job is the judgment rather than the interaction, a decision task isolates exactly that.

What you configure

A decision task uses the standard scenario anatomy with the conversational parts switched off or minimized:

Component	Role in a decision task
Documents	Carry the case material itself: forms, reports, transcripts, records
Briefing	Frames the operator’s role and what they are being asked to decide
Decision definitions	The choice the operator must make, with its available options
Artifact definitions	The written rationale, assessment note, or recommendation that accompanies the decision
Criteria	Success metrics, failure metrics, and scope boundaries that define the right call and the calls that must never be made

The decision and its rationale are both scored. A correct decision reached by the wrong reasoning is a finding, not a pass: the rationale is where you see whether the judgment will transfer to the next case.
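
The components above can be pictured as a small data model. What follows is a minimal sketch in Python, not the product's actual schema: every class, field, and function name is hypothetical and chosen only to mirror the table. The "fail" label for an incorrect decision is also an assumption, since the text above only defines the difference between a pass and a finding.

```python
from dataclasses import dataclass, field


@dataclass
class DecisionDefinition:
    """The choice the operator must make, with its available options."""
    prompt: str
    options: list[str]


@dataclass
class Criteria:
    """Success metrics, failure metrics, and scope boundaries."""
    success_metrics: list[str] = field(default_factory=list)
    failure_metrics: list[str] = field(default_factory=list)
    scope_boundaries: list[str] = field(default_factory=list)


@dataclass
class DecisionTask:
    """Hypothetical container mirroring the component table."""
    documents: list[str]           # case material: forms, reports, transcripts
    briefing: str                  # the operator's role and what to decide
    decision: DecisionDefinition   # the choice and its options
    artifact: str                  # description of the required written rationale
    criteria: Criteria             # what defines the right call


def outcome(decision_correct: bool, rationale_sound: bool) -> str:
    """Combine the two scored parts into one label.

    A correct decision with unsound reasoning is a "finding", not a "pass".
    The "fail" label for an incorrect decision is an assumption of this sketch.
    """
    if decision_correct and rationale_sound:
        return "pass"
    if decision_correct:
        return "finding"
    return "fail"
```

Keeping the decision check and the rationale check as separate inputs makes the middle case visible, rather than folding it into a single score.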

Worked example

An income protection insurer wants to know whether claims assessors apply the policy’s work-capacity test consistently.

Documents: a claim file containing the claimant’s statement, an employer report, and two medical assessments that point in slightly different directions
Decision: a single three-option decision (continue payments, suspend payments, or request an independent medical examination)
Artifact: a written assessment note justifying the choice against the policy wording
Success metrics: identifies the conflict between the two medical assessments; applies the work-capacity test from the policy, not a general impression of severity
Failure metric: suspends payments based on the employer report alone

Every assessor sees the identical file. Every AI agent sees the identical file. The scored results show who applied the test, who pattern-matched on severity words, and how the reasoning differed, case by case.
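
To make the worked example concrete, here is a minimal Python sketch of how its options and metrics might be encoded and checked against a submission. It assumes a reviewer (human or automated) has already recorded which metrics are evidenced in each assessment note. The identifiers, the `Submission` shape, and the two sample submissions are invented for illustration; they are not real results or the product's data format.

```python
from dataclasses import dataclass

# Options and metrics taken from the worked example; identifiers are hypothetical.
OPTIONS = ("continue_payments", "suspend_payments", "request_independent_exam")
SUCCESS_METRICS = (
    "identifies_conflict_between_medical_assessments",
    "applies_policy_work_capacity_test",
)
FAILURE_METRICS = ("suspends_on_employer_report_alone",)


@dataclass
class Submission:
    operator: str
    decision: str
    observed: frozenset[str]  # metrics a reviewer marked as evidenced in the note


def score(sub: Submission) -> dict:
    """Report which success metrics were met and which failure metrics fired."""
    if sub.decision not in OPTIONS:
        raise ValueError(f"unknown option: {sub.decision}")
    return {
        "operator": sub.operator,
        "decision": sub.decision,
        "success_met": [m for m in SUCCESS_METRICS if m in sub.observed],
        "failures_triggered": [m for m in FAILURE_METRICS if m in sub.observed],
    }


# Invented submissions, for illustration only.
assessor = Submission(
    "assessor_a", "request_independent_exam", frozenset(SUCCESS_METRICS)
)
agent = Submission(
    "agent_b", "suspend_payments", frozenset({"suspends_on_employer_report_alone"})
)

for sub in (assessor, agent):
    print(score(sub))
```

Because every operator is scored against the same fixed metric lists, the per-operator dictionaries can be compared directly, case by case.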

Running it

With your people: assign the task through a cohort. Operators complete it in one sitting; you see decisions, rationales, and scores per operator.
With an AI agent: run it in an automated benchmark. Because the inputs are fixed, decision tasks are the cheapest shape to run at volume, which makes them well suited to comparing several agents or prompt versions on the same case set.
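
The comparison described above can be sketched as a simple loop: every agent or prompt version receives the identical case files, and the share of expected decisions is tallied per agent. This is an illustrative Python sketch under stated assumptions, not the product's benchmark runner. The agents are stand-in functions, the case set is a toy, and only the decision is checked here; scoring the rationale would need a separate step.

```python
from typing import Callable

Case = dict[str, str]          # hypothetical shape: {"id", "file", "expected"}
Agent = Callable[[str], str]   # takes the case file text, returns a decision


def run_benchmark(agents: dict[str, Agent], cases: list[Case]) -> dict[str, float]:
    """Run every agent on the same fixed cases; return the share of expected decisions."""
    if not cases:
        raise ValueError("case set is empty")
    results: dict[str, float] = {}
    for name, agent in agents.items():
        correct = sum(1 for case in cases if agent(case["file"]) == case["expected"])
        results[name] = correct / len(cases)
    return results


# Toy case set and stand-in agents, invented for illustration.
cases = [
    {"id": "c1", "file": "medical assessments conflict", "expected": "request_independent_exam"},
    {"id": "c2", "file": "medical assessments agree", "expected": "continue_payments"},
]


def baseline(file_text: str) -> str:
    # Always picks the same option, regardless of the file.
    return "continue_payments"


def prompt_v2(file_text: str) -> str:
    # Stand-in for a second prompt version; a real agent would call a model here.
    return "request_independent_exam" if "conflict" in file_text else "continue_payments"


results = run_benchmark({"baseline": baseline, "prompt_v2": prompt_v2}, cases)
print(results)
```

Since the inputs never change between runs, adding another agent or prompt version is one more entry in the `agents` dictionary.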

Moving up the ladder

A decision task measures the judgment but assumes the facts arrive complete. In most real work, they do not: someone has to ask the right questions first. When you want to measure that part too, wrap the same judgment in a conversation scenario, where the persona holds the facts and reveals them only to operators who earn them.
