
Deployment

Run the service reliably in production: the compose topology, durable ingestion, and how to scale.

Audience: operators running the backend. What you will accomplish: choose an ingestion mode, scale workers, and serve streaming through a proxy.

Compose topology (api + worker + redis)

docker-compose.yml runs three services:

  • api — the FastAPI app serving /api/v1.
  • worker — python -m ingest.worker, the durable ingest consumer.
  • redis — memory, rate limiting, the ingest registry, and the durable ingest queue.

The cloud compose file sets INGEST_MODE=queue so ingestion survives API restarts. For a single-process setup, set INGEST_MODE=inline (the code default) and drop the worker. For a zero-cloud stack, see docker-compose.local.yml in Installation.

Durable ingestion (INGEST_MODE)

INGEST_MODE controls how documents submitted for ingestion are processed:

Mode | Behavior
inline (default) | Processed in-process via FastAPI BackgroundTasks. Simple; suited to small deployments. Ingestion does not survive an API restart mid-job.
queue | Jobs are enqueued onto a Redis list consumed by python -m ingest.worker. Survives API restarts and retries transient failures.
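
The mode switch described above can be sketched roughly as follows. This is an illustration only, not the service's actual code: resolve_ingest_mode, submit_document, and the in-memory ingest_queue are hypothetical names, the deque stands in for the Redis list, and rejecting unknown values is an assumption about how a deployment might want to fail.

```python
import os
from collections import deque
from typing import Callable, Deque, Mapping, Optional

# Hypothetical stand-in for the Redis list used in queue mode.
ingest_queue: Deque[str] = deque()


def resolve_ingest_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Read INGEST_MODE, falling back to 'inline' (the documented default)."""
    env = os.environ if env is None else env
    mode = env.get("INGEST_MODE", "inline").strip().lower()
    if mode not in {"inline", "queue"}:
        # Assumption: fail fast on unknown values instead of guessing.
        raise ValueError(f"Unsupported INGEST_MODE: {mode!r}")
    return mode


def submit_document(
    doc_id: str, mode: str, run_in_background: Callable[[str], None]
) -> str:
    """Route one document according to the selected mode."""
    if mode == "queue":
        # Durable path: a separate worker process picks the job up later.
        ingest_queue.append(doc_id)
        return "enqueued"
    # Inline path: handled inside the API process, lost if it restarts mid-job.
    run_in_background(doc_id)
    return "scheduled-inline"
```

In a real API process, run_in_background would typically wrap something like FastAPI's BackgroundTasks.add_task, and the queue branch would push to Redis instead of a local deque.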

Key properties of queue mode:

  • Retries — failed jobs are retried up to INGEST_MAX_ATTEMPTS (default 3).
  • Idempotency — ingestion is incremental and content-hashed, so re-running a job does not duplicate chunks; unchanged content is skipped.
  • Shared upload staging — uploaded files are staged to INGEST_INCOMING_DIR (default ./ingest_incoming), which must be a volume shared between the api and worker containers in queue mode.
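
To make the interaction between retries and idempotency concrete, here is a minimal sketch under stated assumptions. TransientError, run_with_retries, and ingest_chunks are hypothetical names; the dict stands in for whatever store the service uses; SHA-256 is an assumed hash choice. The worker's real retry policy and hashing scheme may differ.

```python
import hashlib
from typing import Callable, Dict, List


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a timeout)."""


def run_with_retries(job: Callable[[], int], max_attempts: int = 3) -> int:
    """Call job until it succeeds; re-raise once max_attempts is exhausted."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure
    raise AssertionError("unreachable")


def ingest_chunks(chunks: List[str], index: Dict[str, str]) -> int:
    """Write only chunks whose content hash is not already indexed."""
    written = 0
    for chunk in chunks:
        digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if digest in index:
            continue  # unchanged content is skipped, so retries are safe
        index[digest] = chunk
        written += 1
    return written
```

Because each chunk is keyed by its content hash, a job that fails partway and is retried only writes what is still missing, which is why retrying does not produce duplicates.
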
Terminal
# Run the durable worker (queue mode)
python -m ingest.worker

If it fails: uploads that never process usually mean the worker cannot see the staged files. Confirm that INGEST_INCOMING_DIR points to a volume mounted in both the api and worker containers.
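
One way to test the shared-volume requirement is to write a marker file from the api container and look for it from the worker container. The helper names below (staging_dir, write_probe, probe_visible) are hypothetical and not part of the service; only the INGEST_INCOMING_DIR variable and its default come from this page.

```python
import os
import uuid
from pathlib import Path


def staging_dir() -> Path:
    """Resolve the staging directory, using the documented default."""
    return Path(os.environ.get("INGEST_INCOMING_DIR", "./ingest_incoming"))


def write_probe(directory: Path) -> str:
    """Run in the api container: drop a uniquely named marker file."""
    directory.mkdir(parents=True, exist_ok=True)
    name = f".probe-{uuid.uuid4().hex}"
    (directory / name).write_text("ok", encoding="utf-8")
    return name


def probe_visible(directory: Path, name: str) -> bool:
    """Run in the worker container: True only if the marker is visible here."""
    return (directory / name).is_file()
```

If probe_visible returns False in the worker for a name that write_probe returned in the api, the two containers are not looking at the same volume.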

Scaling

  • Redis-shared state — memory, rate-limit counters, and queues live in Redis, so multiple API replicas share state correctly behind a load balancer.
  • More API workers — run Uvicorn/Gunicorn with --workers N to use multiple processes.
  • More ingest throughput — run additional ingest.worker processes against the same Redis queue.
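
The reason extra ingest workers add throughput safely is that each job popped from a shared queue goes to exactly one consumer. The sketch below models that with threads and the standard-library queue.Queue as a stand-in for the Redis list. It is not the worker's implementation, and the function names are hypothetical.

```python
import queue
import threading
from typing import List


def drain(jobs: "queue.Queue[str]", done: List[str], lock: threading.Lock) -> None:
    """One worker: keep taking jobs until the shared queue is empty."""
    while True:
        try:
            job = jobs.get_nowait()  # each job is handed to a single consumer
        except queue.Empty:
            return
        with lock:
            done.append(job)


def run_workers(job_ids: List[str], worker_count: int) -> List[str]:
    """Process job_ids with worker_count concurrent consumers."""
    jobs: "queue.Queue[str]" = queue.Queue()
    for job_id in job_ids:
        jobs.put(job_id)
    done: List[str] = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=drain, args=(jobs, done, lock))
        for _ in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return done
```

With Redis, the same property comes from list pop commands being atomic, so running more python -m ingest.worker processes divides the backlog between them without coordination in the workers themselves.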

Verify your result

  • Verify: You chose inline (single process) or queue (durable worker) deliberately.
  • Verify: In queue mode, INGEST_INCOMING_DIR is a shared volume and the worker is running.
  • Verify: Multiple replicas share Redis state; SSE buffering is disabled at the proxy.

Common failure modes