Daggerheart Live Mechanics Matrix
Latest live acceptance matrix for the Daggerheart AI mechanics surface. The table is tool-centric, so future model runs can add new columns without changing the document's shape.
For the follow-on AI-service design guidance derived from these results, see Campaign AI Mechanics Quality.
Status enums
- clean_pass: lane passed with no recorded tool-call errors
- pass_with_tool_errors: lane passed, but the accepted run included at least one failed tool call before recovery
- fail_tool_loop_limit: lane failed because the orchestration loop exhausted its tool budget
- fail_timeout: lane failed because the live run timed out
- fail_precondition_gap: lane failed because setup/runtime preconditions did not permit the intended mechanic path
- not_run: no accepted live run recorded for that model yet
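
A minimal Python sketch of these statuses as an enum, for tooling that reads or writes the matrix. The class name LaneStatus and its passed/failed helpers are hypothetical; only the string values come from the list above. In this sketch not_run counts as neither a pass nor a failure.

```python
from enum import Enum


class LaneStatus(str, Enum):
    """Hypothetical enum whose values match the documented status strings."""

    CLEAN_PASS = "clean_pass"
    PASS_WITH_TOOL_ERRORS = "pass_with_tool_errors"
    FAIL_TOOL_LOOP_LIMIT = "fail_tool_loop_limit"
    FAIL_TIMEOUT = "fail_timeout"
    FAIL_PRECONDITION_GAP = "fail_precondition_gap"
    NOT_RUN = "not_run"

    @property
    def passed(self) -> bool:
        # Both pass variants count as a lane pass; only clean_pass has zero tool-call errors.
        return self in (LaneStatus.CLEAN_PASS, LaneStatus.PASS_WITH_TOOL_ERRORS)

    @property
    def failed(self) -> bool:
        # Every fail_* status is a failure; not_run is neither a pass nor a failure.
        return self.value.startswith("fail_")
```

For example, LaneStatus("pass_with_tool_errors").passed is True even though the lane is not clean.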
Latest accepted runs
Post-fix live acceptance was executed on March 22, 2026 with gpt-5-mini. Accepted artifacts are written locally under .tmp/ai-live-captures/ as raw captures, markdown reports, and .summary.json summaries.
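
A minimal sketch for collecting those summaries, assuming each one is a file whose name ends in .summary.json somewhere beneath .tmp/ai-live-captures/. The function name find_summaries is hypothetical, and file modification time is only a recency proxy: it does not tell you whether a run was accepted.

```python
from pathlib import Path


def find_summaries(root: str = ".tmp/ai-live-captures") -> list[Path]:
    """Return every *.summary.json file under root, newest first by modification time."""
    base = Path(root)
    if not base.is_dir():
        # No captures directory yet means no recorded runs.
        return []
    summaries = [p for p in base.rglob("*.summary.json") if p.is_file()]
    return sorted(summaries, key=lambda p: p.stat().st_mtime, reverse=True)
```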
| Tool | Source scenario | gpt-5-mini | Notes |
|---|---|---|---|
| character_sheet_read | CapabilityLookup | clean_pass | Character-capability read succeeded before scene framing. |
| interaction_state_read | AttackReview | clean_pass | Added as the primary diagnosis read for board-sensitive attack lanes. |
| daggerheart_combat_board_read | SpotlightBoardReview | clean_pass | Clean board-control run used no reference lookups and re-read the board after adversary/countdown updates. |
| daggerheart_action_roll_resolve | MechanicsReview | clean_pass | Action resolution plus GM review handoff completed cleanly. |
| daggerheart_gm_move_apply | GMMovePlacementReview | clean_pass | Clean placement run used a single direct-move spend target with no recovery call. |
| daggerheart_adversary_create | GMMovePlacementReview | clean_pass | Same clean placement lane as daggerheart_gm_move_apply. |
| daggerheart_scene_countdown_create | CountdownTriggerReview | clean_pass | Latest accepted run used fixed-start countdown creation and full trigger lifecycle. |
| daggerheart_scene_countdown_advance | CountdownTriggerReview | clean_pass | Countdown advanced to TRIGGER_PENDING cleanly. |
| daggerheart_scene_countdown_resolve_trigger | CountdownTriggerReview | clean_pass | Trigger resolution and board reread both completed in the accepted run. |
| daggerheart_adversary_update | SpotlightBoardReview | clean_pass | Same clean board-control lane as daggerheart_combat_board_read. |
| daggerheart_attack_flow_resolve | AttackReview | clean_pass | Default-profile attack flow succeeded after board/state diagnosis hardening. |
| daggerheart_adversary_attack_flow_resolve | AdversaryAttackReview | clean_pass | Adversary attack, memory update, and review resolution completed cleanly. |
| daggerheart_group_action_flow_resolve | GroupActionReview | clean_pass | Cooperative action lane passed once fixture names and extra-character setup were corrected. |
| daggerheart_reaction_flow_resolve | ReactionReview | clean_pass | Reaction flow completed cleanly. |
| daggerheart_tag_team_flow_resolve | TagTeamReview | clean_pass | Tag-team lane passed after prompt alignment with fixture character names. |
| system_reference_search | PlaybookAttackReview | clean_pass | Clean playbook lane used exactly one intentional search before the combat flow. |
| system_reference_read | PlaybookAttackReview | clean_pass | Same clean playbook lane as system_reference_search. |
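
Because the table is tool-centric, adding a model only appends a column. A minimal sketch of that shape, assuming a hypothetical ToolRow record and render_matrix helper (neither is part of any tooling described here); a model with no recorded status for a tool renders as not_run.

```python
from dataclasses import dataclass, field


@dataclass
class ToolRow:
    tool: str
    scenario: str
    notes: str = ""
    # Maps model name -> status string; models with no accepted run are simply absent.
    statuses: dict[str, str] = field(default_factory=dict)


def render_matrix(rows: list[ToolRow], models: list[str]) -> str:
    """Render fixed Tool/Source scenario/Notes columns plus one column per model."""
    header = ["Tool", "Source scenario", *models, "Notes"]
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "---|" * len(header),
    ]
    for row in rows:
        # A model with no recorded status for this tool renders as not_run.
        cells = [row.tool, row.scenario, *(row.statuses.get(m, "not_run") for m in models), row.notes]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

Calling render_matrix(rows, ["gpt-5-mini", "model-b"]), where model-b is a placeholder, keeps the gpt-5-mini column intact and fills the new column with not_run until an accepted run exists.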
Scenario cleanliness
These rows are scenario-level, not tool-level. They exist so a lane can be tracked as clean even when other runs for the same tools had recoverable errors.
| Scenario | gpt-5-mini | Reference usage | Notes |
|---|---|---|---|
| PlaybookAttackReview | clean_pass | 1 search / 1 read | Uses the bounded playbook lane: one explicit playbook consult, then sheet, board, attack flow, and review resolution. |
| SpotlightBoardReview | clean_pass | 0 search / 0 read | Uses the short always-on guidance only; no reference lookup. |
| GMMovePlacementReview | clean_pass | 0 search / 0 read | Uses the short always-on guidance only; no reference lookup. |
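
A sketch of how one row of this table could be derived from a passing run, assuming the run is available as an ordered list of (tool name, succeeded) pairs; that capture shape and the function name are hypothetical. Reference usage counts system_reference_search and system_reference_read calls, and any failed call, even a recovered one, downgrades the lane to pass_with_tool_errors.

```python
from collections import Counter


def summarize_passing_run(calls: list[tuple[str, bool]]) -> tuple[str, str]:
    """Return (status, reference usage) for a run whose lane passed.

    calls: (tool name, succeeded) pairs in call order; a hypothetical capture shape.
    """
    counts = Counter(name for name, _ in calls)
    usage = f"{counts['system_reference_search']} search / {counts['system_reference_read']} read"
    # Any failed call, even one the model recovered from, rules out clean_pass.
    status = "clean_pass" if all(ok for _, ok in calls) else "pass_with_tool_errors"
    return status, usage
```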
Notes for future runs
- Add one column per model; do not replace older model columns.
- Use the latest accepted .summary.json artifact for each scenario/model pair.
- For scenario-level cleanliness, prefer the latest clean accepted summary, not merely the latest attempted run.
- If a newer run fails, do not overwrite a previous accepted status without an explicit product or engineering decision.
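
The selection and overwrite rules above can be sketched as follows. AcceptedSummary, its fields, and both functions are hypothetical stand-ins, since the .summary.json schema is not described here; the approved_override flag represents the explicit product or engineering decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcceptedSummary:
    # Hypothetical fields; the real .summary.json schema may differ.
    status: str
    run_date: str  # ISO date such as "2026-03-22", so string order matches date order


def scenario_status(summaries: list[AcceptedSummary]) -> str:
    """Prefer the latest clean accepted summary; otherwise use the latest accepted one."""
    if not summaries:
        return "not_run"
    clean = [s for s in summaries if s.status == "clean_pass"]
    pool = clean or summaries
    return max(pool, key=lambda s: s.run_date).status


def next_matrix_status(current: str, newer: str, approved_override: bool = False) -> str:
    """Return the status to record when a newer run arrives for the same cell."""
    if newer.startswith("fail_") and current != "not_run" and not approved_override:
        # A newer failure never silently replaces an existing accepted status.
        return current
    return newer
```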