Daggerheart Live Mechanics Matrix

Latest live acceptance matrix for the Daggerheart AI mechanics surface. This table is tool-centric so future model runs can add new columns without changing the shape of the document.

For the follow-on AI-service design guidance derived from these results, see Campaign AI Mechanics Quality.

Status enums

  • clean_pass: lane passed with no recorded tool-call errors
  • pass_with_tool_errors: lane passed, but the accepted run included at least one failed tool call before recovery
  • fail_tool_loop_limit: lane failed because the orchestration loop exhausted its tool budget
  • fail_timeout: lane failed because the live run timed out
  • fail_precondition_gap: lane failed because setup/runtime preconditions did not permit the intended mechanic path
  • not_run: no accepted live run recorded for that model yet

Latest accepted runs

Post-fix live acceptance was executed on March 22, 2026 with gpt-5-mini. Accepted artifacts are written locally under .tmp/ai-live-captures/ as raw captures, markdown reports, and .summary.json summaries.

Tool Source scenario gpt-5-mini Notes
character_sheet_read CapabilityLookup clean_pass Character-capability read succeeded before scene framing.
interaction_state_read AttackReview clean_pass Added as the primary diagnosis read for board-sensitive attack lanes.
daggerheart_combat_board_read SpotlightBoardReview clean_pass Clean board-control run used no reference lookups and re-read the board after adversary/countdown updates.
daggerheart_action_roll_resolve MechanicsReview clean_pass Action resolution plus GM review handoff completed cleanly.
daggerheart_gm_move_apply GMMovePlacementReview clean_pass Clean placement run used a single direct-move spend target with no recovery call.
daggerheart_adversary_create GMMovePlacementReview clean_pass Same clean placement lane as daggerheart_gm_move_apply.
daggerheart_scene_countdown_create CountdownTriggerReview clean_pass Latest accepted run used fixed-start countdown creation and full trigger lifecycle.
daggerheart_scene_countdown_advance CountdownTriggerReview clean_pass Countdown advanced to TRIGGER_PENDING cleanly.
daggerheart_scene_countdown_resolve_trigger CountdownTriggerReview clean_pass Trigger resolution and board reread both completed in the accepted run.
daggerheart_adversary_update SpotlightBoardReview clean_pass Same clean board-control lane as daggerheart_combat_board_read.
daggerheart_attack_flow_resolve AttackReview clean_pass Default-profile attack flow succeeded after board/state diagnosis hardening.
daggerheart_adversary_attack_flow_resolve AdversaryAttackReview clean_pass Adversary attack, memory update, and review resolution completed cleanly.
daggerheart_group_action_flow_resolve GroupActionReview clean_pass Cooperative action lane passed once fixture names and extra-character setup were corrected.
daggerheart_reaction_flow_resolve ReactionReview clean_pass Reaction flow completed cleanly.
daggerheart_tag_team_flow_resolve TagTeamReview clean_pass Tag-team lane passed after prompt alignment with fixture character names.
system_reference_search PlaybookAttackReview clean_pass Clean playbook lane used exactly one intentional search before the combat flow.
system_reference_read PlaybookAttackReview clean_pass Same clean playbook lane as system_reference_search.

Scenario cleanliness

These rows are scenario-level, not tool-level. They exist so a lane can be tracked as clean even when other runs for the same tools had recoverable errors.

Scenario gpt-5-mini Reference usage Notes
PlaybookAttackReview clean_pass 1 search / 1 read Uses the bounded playbook lane: one explicit playbook consult, then sheet, board, attack flow, and review resolution.
SpotlightBoardReview clean_pass 0 search / 0 read Uses the short always-on guidance only; no reference lookup.
GMMovePlacementReview clean_pass 0 search / 0 read Uses the short always-on guidance only; no reference lookup.

Notes for future runs

  • Add one column per model; do not replace older model columns.
  • Use the latest accepted .summary.json artifact for each scenario/model pair.
  • For scenario-level cleanliness, prefer the latest clean accepted summary, not merely the latest attempted run.
  • If a newer run fails, do not overwrite a previous accepted status without an explicit product or engineering decision.