Observability
Trace a single request end-to-end and know when the service is healthy.
Audience: operators monitoring the backend. What you will accomplish: wire logs to an aggregator and use the health/readiness probes correctly.
Correlation IDs
Every request is tagged with a correlation ID (X-Correlation-Id) that propagates
through all log calls, graph nodes, and service layers — so one request can be traced across
Redis operations, ChromaDB retrievals, LLM calls, and ingest steps.
- The correlation-ID middleware injects or preserves the X-Correlation-Id header and stores it in a contextvars.ContextVar for async-safe propagation.
- The value also appears as meta.correlation_id in chat responses; quote it in support requests.
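The propagation described above can be sketched with the standard library alone. This is a minimal illustration, not the service's actual middleware: the names correlation_id_var, CorrelationIdFilter, resolve_correlation_id, and handle_request are hypothetical, and the use of a UUID for newly minted IDs is an assumption.

```python
import asyncio
import contextvars
import logging
import uuid

# Hypothetical module-level variable; "-" is shown when no request is active.
correlation_id_var = contextvars.ContextVar("correlation_id", default="-")


class CorrelationIdFilter(logging.Filter):
    """Copy the current correlation ID onto every log record."""

    def filter(self, record):
        record.correlation_id = correlation_id_var.get()
        return True


def resolve_correlation_id(headers):
    """Preserve an incoming X-Correlation-Id, or mint a new one (assumed: UUID hex)."""
    for name, value in headers.items():
        if name.lower() == "x-correlation-id" and value:
            return value
    return uuid.uuid4().hex


async def handle_request(headers):
    """Stand-in for the middleware: bind the ID, run the handler, echo the header."""
    token = correlation_id_var.set(resolve_correlation_id(headers))
    try:
        await asyncio.sleep(0)  # placeholder for the real request handler
        # Any code awaited from here reads the same value without passing it around.
        return {"X-Correlation-Id": correlation_id_var.get()}
    finally:
        # Restore the previous value so the ID never leaks into another request.
        correlation_id_var.reset(token)
```

Because each asyncio task runs in its own copy of the context, concurrent requests see only their own ID, which is what makes a ContextVar safer here than a module-level global.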
Request timing
The request-timing middleware logs method, path, status code, duration (ms), and the correlation ID for every request.
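As a rough sketch of what such a middleware might look like, the example below wraps a generic ASGI application. It assumes an ASGI server, which the source does not state; the class name TimingMiddleware, the logger name, and the log line layout are all hypothetical. For self-containment it reads the correlation ID straight from the request headers rather than from a shared ContextVar.

```python
import logging
import time

logger = logging.getLogger("request_timing")  # hypothetical logger name


class TimingMiddleware:
    """Log method, path, status code, duration (ms), and correlation ID."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Pass lifespan and websocket traffic through untouched.
            await self.app(scope, receive, send)
            return

        start = time.perf_counter()
        status = {"code": 500}  # reported if the app fails before responding

        async def send_wrapper(message):
            # The status code travels in the first response message.
            if message["type"] == "http.response.start":
                status["code"] = message["status"]
            await send(message)

        # ASGI header names are lowercase bytes.
        headers = dict(scope.get("headers", []))
        cid = headers.get(b"x-correlation-id", b"-").decode("latin-1")
        try:
            await self.app(scope, receive, send_wrapper)
        finally:
            duration_ms = (time.perf_counter() - start) * 1000
            logger.info(
                "%s %s %s %.1fms correlation_id=%s",
                scope["method"], scope["path"], status["code"], duration_ms, cid,
            )
```

Logging in the finally block means a request that raises still produces a timing line, which helps when correlating errors with slow dependencies.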
Structured logging
Set LOG_FORMAT=json for Datadog, CloudWatch, or ELK ingestion. Each log line includes
timestamp, level, correlation_id, and message fields. The default is text.
LOG_FORMAT=json # "text" (default) or "json" for log aggregators
LOG_LEVEL=INFO
Logs go to the console and a rotating file: logs/app.log is capped at 10 MB with
5 rotated backups.
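For readers who want to reproduce this behavior elsewhere, here is a minimal standard-library sketch of an equivalent setup. It is an illustration under stated assumptions, not the service's own code: JsonFormatter, build_handlers, and configure_logging are hypothetical names, and the text-mode format string is invented for the example.

```python
import json
import logging
import os
from logging.handlers import RotatingFileHandler


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line with the four documented fields."""

    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            # Falls back to "-" when no filter has attached an ID.
            "correlation_id": getattr(record, "correlation_id", "-"),
            "message": record.getMessage(),
        })


def build_handlers(log_format, log_path="logs/app.log"):
    """Console handler plus a rotating file capped at 10 MB with 5 backups."""
    os.makedirs(os.path.dirname(log_path) or ".", exist_ok=True)
    if log_format == "json":
        formatter = JsonFormatter()
    else:
        # Assumed text layout; the real default format is not documented here.
        formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    handlers = [
        logging.StreamHandler(),
        RotatingFileHandler(log_path, maxBytes=10 * 1024 * 1024, backupCount=5),
    ]
    for handler in handlers:
        handler.setFormatter(formatter)
    return handlers


def configure_logging():
    """Read LOG_FORMAT and LOG_LEVEL from the environment, as described above."""
    log_format = os.environ.get("LOG_FORMAT", "text")
    log_level = os.environ.get("LOG_LEVEL", "INFO")
    logging.basicConfig(level=log_level, handlers=build_handlers(log_format), force=True)
```

With backupCount=5, the handler keeps app.log plus app.log.1 through app.log.5, so disk usage for logs stays bounded at roughly 60 MB.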
Health vs readiness probes
Two distinct probes:
- GET /health: cached startup flags; returns ok or degraded based on Redis and ChromaDB connectivity at startup time.
- GET /ready: live probe; returns 200 with {"status": "ready"} only if both Redis and ChromaDB respond right now, or 503 with dependency-specific error detail if either is down. Use this for Kubernetes readiness probes / load-balancer health checks.
curl http://127.0.0.1:8000/health
# → ok (or degraded if Redis/ChromaDB unreachable at startup)
curl -i http://127.0.0.1:8000/ready
# → 200 {"status": "ready"} when both deps respond
# → 503 {... dependency detail ...} when one is down
If it fails: Connection refused means the server isn't running. A 503 from /ready names the failing dependency.
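The same check can be scripted, for example in a deploy gate or smoke test. The sketch below uses only the standard library; check_ready is a hypothetical helper, and it treats the response body as opaque text because the exact shape of the 503 detail is not specified here.

```python
import urllib.error
import urllib.request


def check_ready(base_url="http://127.0.0.1:8000", timeout=2.0):
    """Return (is_ready, detail) based on GET /ready."""
    try:
        with urllib.request.urlopen(f"{base_url}/ready", timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status == 200, body
    except urllib.error.HTTPError as exc:
        # Non-2xx responses such as 503 land here; the body names the failing dependency.
        return False, exc.read().decode("utf-8", errors="replace")
    except urllib.error.URLError as exc:
        # Covers connection refused, i.e. the server is not running.
        return False, f"unreachable: {exc.reason}"


if __name__ == "__main__":
    ready, detail = check_ready()
    print("ready" if ready else "not ready", detail)
```

HTTPError is caught before URLError because it is a subclass of it; reversing the order would hide the 503 body behind the generic "unreachable" message.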
Verify your result
- Verify: Your client logs X-Correlation-Id from every response.
- Verify: Log aggregation is enabled with LOG_FORMAT=json.
- Verify: Kubernetes readiness uses /ready; a quick liveness check can use /health.
Common failure modes
Related next steps
- Plan replicas and proxies in Deployment.
- Harden access in Security.
- See the headers to log in the API summary.