Campaign AI GM Guardrails

Purpose

Define the behavioral guardrails for AI agents acting as RPG Game Masters. Each guardrail specifies the rule, enforcement level, and mechanism.

Enforcement levels: Runtime (orchestration loop), Eval (promptfoo assertions), Instruction (agent instructions only).

ID	Guardrail	Level	Mechanism
T1	Must commit authoritative interaction before final text	Runtime	`TurnController.HasCommittedOrResolvedInteraction()`
T2	Must open player phase, conclude session, or OOC before completion	Runtime	`TurnController.ReadyForCompletion()`
T3	Must not commit narration after opening player phase	Runtime	`TurnController.PlayerHandoffRegressed()`
T5	Beat ordering follows interaction contract	Eval	`required_beat_types` / `forbidden_beat_types`
T8	Prompt beats ask what the PC does, not what NPCs say	Eval + Instruction	`forbidden_prompt_phrases` + `skills.md`

ID	Guardrail	Level	Mechanism
U1	Only tools in the curated GM-safe allowlist	Runtime	`ToolPolicy.Allows()`
U2	Max 8 provider loop steps per turn	Runtime	Step counter in `runner.go`
U4	Reference lookup bounded per scenario	Eval	`max_reference_searches` / `max_reference_reads`
U5	Sheet read before capability-sensitive mechanics	Instruction + Eval	`daggerheart/skills.md` + tool order assertions

ID	Guardrail	Level	Mechanism
N1	Prompt beats for PC intention only	Eval + Instruction	`forbidden_prompt_phrases`
N4	No engine vocabulary in player-facing beats	Instruction	`campaign-ai-agent-system.md`
N5	Impossible declarations trigger clarification	Eval	`IntentImpossibleAction` scenario
N6	Ambiguous intent triggers clarification	Eval	`IntentAmbiguousAction` scenario

ID	Guardrail	Level	Mechanism
M1	Resource accounting matches tool outcomes	Eval	`expected_hope`, `expect_action_roll_hope_spend_source`
M2	Character capability verification before adjudication	Instruction + Eval	Sheet-first rule, tool order prefix
M5	Action roll modifier source matches declared action	Eval	`expect_action_roll_modifier_source`

ID	Guardrail	Level	Mechanism
C1	OpenViking non-authoritative only	Runtime	Architecture boundary
C2	Graceful degradation when OpenViking unavailable	Runtime	Nil checks, error handling
C3	Phase-aware context access map	Runtime	`BuildPhaseGuide()` + `BuildContextAccessMap()`
C5	skills.md read-only	Eval	`expect_skills_read_only`

ID	Guardrail	Level	Mechanism
S1	Session grant validation	Runtime	`auth_reference_policy.go` + signed JWT
S2	Credential/grant revocation blocked while bound	Runtime	`usage_policy.go`
S4	Prompt injection resistance	Eval	`RedTeamPromptInjection` scenario
S5	Jailbreak resistance	Eval	`RedTeamJailbreak` scenario
S6	Hallucination resistance	Eval	`RedTeamHallucination` scenario
S7	Hijacking resistance	Eval	`RedTeamHijacking` scenario
S8	Overreliance resistance	Eval	`RedTeamOverreliance` scenario
S9	Excessive agency resistance	Eval	`RedTeamExcessiveAgency` scenario