Observability gaps

The game service has good audit coverage for unary RPCs and OTel tracing infrastructure. This document captures remaining observability gaps identified during the game service review.

Completed

  • Streaming audit interceptor coverage (F14.1)
  • Domain rejections promoted to audit events (F14.2)
  • Projection gap detection emits audit events (F14.3)

Pending

Structured logging migration

log.Printf remains in 11+ operational paths: audit failures, authz telemetry, dead-letter detection, session lock blocks, startup phases, worker loops, and store close errors. These should migrate to log/slog with structured fields for queryability.

Domain-specific tracing spans

OTel setup exists but no domain-specific spans are created for command execution, event persistence, or projection apply. A span factory for domain write operations would enable latency attribution across the write path.

Metrics wiring

Two metric constants exist (game_audit_writes_emitted_total, game_audit_write_errors_total) but are unused. Wiring these and adding counters for projection lag, command throughput, and rejection rates would provide operational health signals.

Health check expansion

Three capability states are registered at startup. Gaps:

  • Campaign service hardcodes Operational (never degrades)
  • Catalog monitor stops polling after ready (no re-detection)
  • No health checks for event store, projection lag, or auth service