Defining good

A scenario is only as useful as its definition of a good performance. Tacit gives you six families of criteria, each answering a different question about the session. You do not need all six on every scenario; pick the ones that match what you want to measure.
All three scenario shapes use these criteria. Decision tasks lean on success and failure metrics around the judgment itself. Conversation scenarios and journey simulations tend to use the full range, because dialogue gives scope boundaries, terminology, and best practices more room to show up.

Success metrics

What the operator should accomplish. Each success metric names one concrete outcome the session should produce.
  • “Determine whether the claim is valid”
  • “Identify a return-to-work timeline”
  • “Uncover the side work the claimant has been doing”
Keep each metric to a single accomplishment. “Gather history and assess risk and document the plan” is three metrics wearing one label, and the report cannot tell you which of the three was missed.

Failure metrics

Critical errors the operator must avoid. Failure metrics name the moves that sink a session regardless of everything else done well.
  • “Approves return to work without reviewing the imaging”
  • “Quotes a payout figure before eligibility is confirmed”
  • “Dismisses a safety concern the persona raises”
A session can hit every success metric and still fail on one of these. That is the point: failure metrics encode the lines your organization does not let anyone cross.
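As a rough illustration of how the two kinds of metric interact, the sketch below treats any triggered failure metric as overriding the success metrics. It is not Tacit's implementation: `SessionOutcome`, `session_passes`, and `missed_metrics` are hypothetical names, and the rule that a pass also requires every success metric to be met is an assumption made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SessionOutcome:
    """Hypothetical record of what a scored session found (not a Tacit type)."""
    success_met: set[str] = field(default_factory=set)
    failures_triggered: set[str] = field(default_factory=set)


def session_passes(outcome: SessionOutcome, success_metrics: list[str]) -> bool:
    # One triggered failure metric sinks the session, whatever else went well.
    if outcome.failures_triggered:
        return False
    # Assumption for this sketch: a pass also needs every success metric met.
    return all(metric in outcome.success_met for metric in success_metrics)


def missed_metrics(outcome: SessionOutcome, success_metrics: list[str]) -> list[str]:
    # Single-accomplishment metrics let the report name exactly what was missed.
    return [m for m in success_metrics if m not in outcome.success_met]


success_metrics = [
    "Determine whether the claim is valid",
    "Identify a return-to-work timeline",
]

# Every success metric met, but one critical error committed.
outcome = SessionOutcome(
    success_met=set(success_metrics),
    failures_triggered={"Quotes a payout figure before eligibility is confirmed"},
)

passed = session_passes(outcome, success_metrics)
missed = missed_metrics(outcome, success_metrics)
```

Here `passed` is `False` even though `missed` is empty, which mirrors the case described above: every success metric hit, one line crossed.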

Rubrics

Graded quality, not pass or fail. A rubric measures how well something was done on a scale, whereas metrics ask only whether it was done at all. Each rubric has a three-level hierarchy:
  1. Rubric - a named evaluation framework, such as “Claims Assessment Quality”
  2. Criteria - the individual dimensions evaluated, such as “Information Gathering” or “Decision Quality”
  3. Levels - the scoring levels for each criterion, each with a label such as “Exceptional” or “Proficient”, a score value, a description, and observable indicators that mark the level in a real session
Rubrics live at the organization level and link to scenarios, so one rubric can score many scenarios consistently. A scenario can have several rubrics attached, each covering a different dimension of the work.
Write observable indicators as things you could point to in a transcript. “Asked about home duties before recommending restrictions” is observable. “Showed good judgment” is not.
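The three-level hierarchy can be pictured as nested records. The sketch below is only a way to visualize the shape, not Tacit's schema: the class names, field names, score values, and level descriptions are all hypothetical, and only the rubric name, criterion name, level labels, and the sample indicator come from the text above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Level:
    label: str            # e.g. "Exceptional" or "Proficient"
    score: int            # hypothetical numeric value for the level
    description: str
    indicators: list[str] = field(default_factory=list)  # observable in a transcript


@dataclass
class Criterion:
    name: str             # e.g. "Information Gathering"
    levels: list[Level]


@dataclass
class Rubric:
    name: str             # e.g. "Claims Assessment Quality"
    criteria: list[Criterion]


rubric = Rubric(
    name="Claims Assessment Quality",
    criteria=[
        Criterion(
            name="Information Gathering",
            levels=[
                Level(
                    label="Exceptional",
                    score=4,  # assumed value
                    description="Gathers the full picture before recommending.",  # assumed wording
                    indicators=["Asked about home duties before recommending restrictions"],
                ),
                Level(
                    label="Proficient",
                    score=3,  # assumed value
                    description="Gathers most of the relevant history.",  # assumed wording
                ),
            ],
        ),
    ],
)


def max_score(r: Rubric) -> int:
    # Highest achievable total: the best level of each criterion, summed.
    return sum(max(level.score for level in c.levels) for c in r.criteria)


best = max_score(rubric)
```

Keeping indicators on the level, rather than on the criterion, reflects the idea that each level is recognized by different observable behavior in the session.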

Scope boundaries

What this role is allowed to do. Scope boundaries draw three lines:
Boundary | Meaning | Example
Can do | Within the operator’s authority | Adjust a return-to-work date
Must refer or escalate | Allowed to recognize, not to resolve | A request for a lump-sum settlement
Cannot do | Outside the role entirely | Give a medical diagnosis
Staying inside scope is itself a measured skill. An operator who answers a question they should have escalated has made an error, even if the answer happened to be right.
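One way to picture the three lines is as a lookup from request type to boundary, with a check on how the operator handled the request. This is a minimal sketch under stated assumptions: `Boundary`, `Action`, `SCOPE`, and `stayed_in_scope` are hypothetical names, and the rule that a "cannot do" request is handled correctly by anything short of resolving it is an assumption, not something the product documents.

```python
from enum import Enum


class Boundary(Enum):
    CAN_DO = "can do"
    MUST_ESCALATE = "must refer or escalate"
    CANNOT_DO = "cannot do"


class Action(Enum):
    """Hypothetical ways an operator might handle a request."""
    RESOLVED = "resolved"
    ESCALATED = "escalated"
    DECLINED = "declined"


# Example scope for one role, taken from the table above.
SCOPE = {
    "adjust a return-to-work date": Boundary.CAN_DO,
    "request for a lump-sum settlement": Boundary.MUST_ESCALATE,
    "give a medical diagnosis": Boundary.CANNOT_DO,
}


def stayed_in_scope(request: str, action: Action) -> bool:
    boundary = SCOPE[request]
    if boundary is Boundary.CAN_DO:
        # Within authority: this sketch accepts any handling.
        return True
    if boundary is Boundary.MUST_ESCALATE:
        # Recognizing the request is fine; resolving it is the error,
        # whether or not the answer given was correct.
        return action is Action.ESCALATED
    # CANNOT_DO (assumption): anything except resolving it is acceptable.
    return action is not Action.RESOLVED


answered_settlement = stayed_in_scope("request for a lump-sum settlement", Action.RESOLVED)
escalated_settlement = stayed_in_scope("request for a lump-sum settlement", Action.ESCALATED)
```

Note that the check never looks at whether the operator's answer was right, only at whether the operator was entitled to give it.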

Terminology

The domain language the operator should use correctly. Terminology entries define the words and phrases that carry precise meaning in your domain, so the session can be checked for whether the operator used them properly. If “incapacity” and “impairment” mean different things in your domain, an operator who swaps them is making a real error, and terminology entries make that error visible.

Best practice sets

Curated patterns of competent behavior. A best practice set collects the moves your strongest people make: confirm understanding before moving on, check medication history before discussing treatment, summarize agreed actions at the close. Each practice can carry optional exemplars, real excerpts showing the pattern done well, so the standard is concrete rather than aspirational.

How criteria become results

Every session is scored against the criteria the scenario defines, and the report shows which criteria were met and why.

Next steps

Outputs

The artifacts and decisions these criteria are applied to.

Results

How scored sessions are reported and compared.