Replay Operations

Operational runbook for replay execution, projection repair, and integrity troubleshooting.

For replay architecture and invariants, start with Event replay.

When to use this runbook

Use replay operations when:

projection state is stale or inconsistent with the event journal sequence
adapter changes require a deterministic read-model rebuild
integrity verification detects replay-path failures

Replay modes in operations

Full replay

Use for maximum-confidence rebuilds after major projection or adapter changes.

Tradeoff: highest runtime cost.

Snapshot-accelerated replay

Use for routine catch-up and faster recovery when snapshot/checkpoint state is trusted.

Tradeoff: relies on snapshot/checkpoint correctness.

Partial replay

Use when the replay scope is known and can be bounded by a sequence range.

Tradeoff: requires confidence in start sequence and campaign scope.
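
The choice between the three modes can be written down as a small decision function. The following is a minimal sketch, not the system's actual selection logic: `ReplayMode`, `ReplayContext`, and `choose_replay_mode` are hypothetical names, and the precedence (major change first, then bounded scope, then trusted snapshot) is an assumption drawn from the tradeoffs listed above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReplayMode(Enum):
    FULL = "full"
    SNAPSHOT = "snapshot"
    PARTIAL = "partial"


@dataclass
class ReplayContext:
    # Hypothetical inputs an operator would gather before a run.
    major_change: bool             # projection or adapter logic changed substantially
    snapshot_trusted: bool         # snapshot/checkpoint state is believed correct
    start_sequence: Optional[int]  # trusted start of the affected range, if known


def choose_replay_mode(ctx: ReplayContext) -> ReplayMode:
    # Major changes call for the highest-confidence (and most expensive) rebuild.
    if ctx.major_change:
        return ReplayMode.FULL
    # A trusted start sequence bounds the work to the affected range.
    if ctx.start_sequence is not None:
        return ReplayMode.PARTIAL
    # Otherwise resume from the snapshot, but only if it is trusted.
    if ctx.snapshot_trusted:
        return ReplayMode.SNAPSHOT
    # Nothing to anchor on: fall back to replaying from the first event.
    return ReplayMode.FULL
```

Falling back to full replay when neither a start sequence nor a trusted snapshot is available keeps the default on the safe side at the cost of runtime.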

Operator workflow

Identify affected campaign IDs and failure symptoms.
Confirm latest checkpoint and expected sequence head.
Choose replay mode (full/snapshot/partial).
Execute replay run.
Validate projection parity and checkpoint progression.
Re-run affected integration/smoke checks if the change was broad.

Detecting projection gaps

Gap indicators:

non-contiguous projection sequence markers
checkpoint stagnation despite new events
entity state that is known to mismatch the latest event-derived facts

Primary checks:

compare campaign event head sequence vs projection/checkpoint sequences
verify adapter routing coverage for event types in affected interval
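
The indicators and checks above reduce to a few comparisons over sequence numbers and event types. This is an illustrative sketch only: the function names are hypothetical, and it assumes sequence markers are plain integers that the operator has already read from the projection, checkpoint, and journal.

```python
from typing import Iterable, List, Set, Tuple


def find_sequence_gaps(applied: Iterable[int]) -> List[Tuple[int, int]]:
    """Return inclusive (first_missing, last_missing) ranges between applied sequences."""
    ordered = sorted(set(applied))
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    return gaps


def is_checkpoint_stalled(head_before: int, head_after: int,
                          checkpoint_before: int, checkpoint_after: int) -> bool:
    """True when the event head advanced but the checkpoint did not."""
    return head_after > head_before and checkpoint_after <= checkpoint_before


def uncovered_event_types(types_in_interval: Iterable[str],
                          routed_types: Set[str]) -> List[str]:
    """Event types seen in the affected interval that no adapter is routed for."""
    return sorted(set(types_in_interval) - routed_types)
```

For example, `find_sequence_gaps([1, 2, 3, 6, 7, 10])` returns `[(4, 5), (8, 9)]`, which gives the sequence ranges a partial replay would need to cover.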

Repairing projection gaps

stop unsafe writes for affected scope if required
run replay in chosen mode
verify contiguous sequence application through target head
compare critical projection entities against expected event outcomes
restore normal writes after parity checks pass
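
The repair steps above can be sketched as a single guarded procedure. Everything here is hypothetical: `repair_projection` and its callable parameters stand in for whatever pause, replay, and parity tooling the deployment provides, and the default `start=1` assumes sequences begin at 1.

```python
from typing import Callable, List


def is_contiguous(applied: List[int], start: int, target_head: int) -> bool:
    """True when every sequence from start through target_head was applied once, in order."""
    return applied == list(range(start, target_head + 1))


def repair_projection(
    campaign_id: str,
    target_head: int,
    pause_writes: Callable[[str], None],
    run_replay: Callable[[str], List[int]],  # assumed to return the applied sequences
    parity_ok: Callable[[str], bool],
    resume_writes: Callable[[str], None],
    start: int = 1,
) -> bool:
    """Return True if the repair passed all checks and writes were restored."""
    pause_writes(campaign_id)
    applied = run_replay(campaign_id)
    if is_contiguous(applied, start, target_head) and parity_ok(campaign_id):
        resume_writes(campaign_id)
        return True
    # Checks failed: leave writes paused so an operator can investigate.
    return False
```

Writes are deliberately not resumed in a `finally` block, because the runbook restores writes only after parity checks pass.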

Post-persist fold/apply failures

If event append succeeded but fold/apply failed:

treat journal event as authoritative
fix failing adapter/folder path
rerun replay to reconcile derived state

Do not delete authoritative events to “repair” projections.
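
A toy example may help show why the journal stays untouched. This is a sketch under assumed names: the event shapes, `apply_before_fix`, and `apply_after_fix` are invented for illustration and do not reflect the real adapter interface. The journal already holds an event the old fold cannot handle; the repair is to fix the fold and replay, not to remove the event.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
State = Dict[str, int]


def replay(journal: List[Event], apply: Callable[[State, Event], State]) -> State:
    """Rebuild derived state from scratch by folding events in sequence order."""
    state: State = {}
    for event in sorted(journal, key=lambda e: e["seq"]):
        state = apply(state, event)
    return state


def apply_before_fix(state: State, event: Event) -> State:
    # Knows only one event type; anything else fails after the append succeeded.
    if event["type"] == "counter_set":
        return {**state, event["key"]: event["value"]}
    raise ValueError(f"no fold for event type {event['type']!r}")


def apply_after_fix(state: State, event: Event) -> State:
    # The fix adds the missing fold and delegates everything else unchanged.
    if event["type"] == "counter_added":
        current = state.get(event["key"], 0)
        return {**state, event["key"]: current + event["value"]}
    return apply_before_fix(state, event)


# Hypothetical journal: both events were appended successfully.
journal = [
    {"seq": 1, "type": "counter_set", "key": "turns", "value": 5},
    {"seq": 2, "type": "counter_added", "key": "turns", "value": 2},
]
```

With this data, `replay(journal, apply_before_fix)` raises `ValueError` on the second event, while `replay(journal, apply_after_fix)` returns `{"turns": 7}` from the same, unmodified journal.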

Integrity checks and constraints

sequence continuity is mandatory
hash/signature verification failures are blocking
unknown system module/adapter routing is fail-fast
replay operations must not bypass canonical apply paths
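
The first three constraints can be illustrated with a fail-fast verifier. This is a sketch only: the hash-chain scheme (SHA-256 over the previous hash plus a canonical JSON body), the event field names, and `IntegrityError` are assumptions made for the example, not the system's actual integrity format.

```python
import hashlib
import json
from typing import Any, Dict, List, Set


class IntegrityError(Exception):
    """Raised on the first integrity violation; replay must not continue."""


def event_hash(prev_hash: str, seq: int, event_type: str, payload: Dict[str, Any]) -> str:
    # Assumed scheme: chain each event to its predecessor's hash.
    body = json.dumps({"seq": seq, "type": event_type, "payload": payload}, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def verify_journal(events: List[Dict[str, Any]], known_types: Set[str],
                   genesis_hash: str = "", first_seq: int = 1) -> int:
    """Verify continuity, routing, and hashes; return the last verified sequence."""
    prev_hash = genesis_hash
    expected_seq = first_seq
    for event in events:
        if event["seq"] != expected_seq:
            raise IntegrityError(f"sequence gap: expected {expected_seq}, got {event['seq']}")
        if event["type"] not in known_types:
            raise IntegrityError(f"no adapter routing for event type {event['type']!r}")
        expected = event_hash(prev_hash, event["seq"], event["type"], event["payload"])
        if event["hash"] != expected:
            raise IntegrityError(f"hash mismatch at sequence {event['seq']}")
        prev_hash = event["hash"]
        expected_seq += 1
    return expected_seq - 1
```

Raising on the first violation, rather than collecting errors and continuing, mirrors the blocking and fail-fast wording of the constraints.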

Practical safeguards

run in smaller campaign batches when diagnosing failures
capture before/after checkpoint and head sequences per campaign
log reason codes and failing event types for repeated failures
keep a rollback plan for operational windows that include a write pause
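
The first three safeguards can be combined into a small batch driver. This is a hedged sketch: `replay_in_batches`, `read_markers`, and `run_replay` are hypothetical stand-ins for the real tooling, and markers are assumed to be `(checkpoint_sequence, head_sequence)` pairs.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Markers = Tuple[int, int]  # assumed shape: (checkpoint sequence, head sequence)


def batched(ids: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` campaign IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(ids)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def replay_in_batches(
    campaign_ids: Iterable[str],
    size: int,
    read_markers: Callable[[str], Markers],
    run_replay: Callable[[str], None],
) -> List[Dict[str, Any]]:
    """Replay batch by batch, recording before/after markers and any failure reason."""
    report: List[Dict[str, Any]] = []
    for batch in batched(campaign_ids, size):
        batch_failed = False
        for cid in batch:
            before = read_markers(cid)
            error = None
            try:
                run_replay(cid)
            except Exception as exc:  # record the reason instead of losing it
                error = f"{type(exc).__name__}: {exc}"
                batch_failed = True
            report.append({"campaign_id": cid, "before": before,
                           "after": read_markers(cid), "error": error})
        if batch_failed:
            break  # stop after the failing batch so it can be diagnosed
    return report
```

Stopping after the first failing batch keeps the affected scope small; the returned report holds the before/after markers needed to confirm checkpoint progression for each campaign.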