Ingesting documents

Give the assistant something to ground its answers on.

Audience: operators and developers managing the knowledge base.

What you will accomplish: ingest a document (by URL or local upload) and confirm it is searchable.

Supported formats

PDF, TXT, Markdown (.md / .markdown), DOCX, and HTML (.html / .htm). The format is inferred from the file or URL extension.
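
To check a file or URL before submitting it, the extension rule above can be sketched client-side. This is a minimal illustration, not the server's code: the helper name `infer_format` is hypothetical, and lowercasing the extension is an assumption (the server may be case-sensitive).

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Extensions taken from the supported-formats list above.
SUPPORTED_EXTENSIONS = {
    ".pdf": "PDF",
    ".txt": "TXT",
    ".md": "Markdown",
    ".markdown": "Markdown",
    ".docx": "DOCX",
    ".html": "HTML",
    ".htm": "HTML",
}


def infer_format(path_or_url: str) -> Optional[str]:
    """Return the format name for a path or URL, or None if unsupported."""
    # urlparse strips any query string or fragment from a URL; a plain
    # path without '?' or '#' passes through unchanged in .path.
    path = urlparse(path_or_url).path
    # Assumption: extensions are matched case-insensitively.
    suffix = PurePosixPath(path).suffix.lower()
    return SUPPORTED_EXTENSIONS.get(suffix)
```

For example, `infer_format("https://example.com/returns.pdf")` returns `"PDF"`, while a name without a supported extension returns `None`.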

How ingestion works

Ingestion on /api/v1 is asynchronous: the server responds with 202 Accepted, a Location header, and status: "queued", then processes the document in the background. Poll the status endpoint to track progress.

  1. Submit the document

    Either point the server at a public HTTPS URL or upload a local file directly. Uploading needs no public URL, which keeps private documents off the public web.

  2. Poll the status

    Call the status endpoint until the document reaches a terminal state.

  3. Confirm it is searchable

    Once the status is done, ask a question the document should answer and check that the citations point at your document.
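
The polling step can be sketched as a small loop. This is an illustration under stated assumptions rather than a reference client: it assumes the status endpoint returns JSON with a top-level "status" field, and it treats done, skipped, and failed as the terminal states. The names `fetch_status` and `wait_until_terminal`, the polling interval, and the timeout are all hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://127.0.0.1:8000/api/v1"  # host and port from the curl examples
# Assumption: every status other than "queued" is terminal.
TERMINAL_STATES = {"done", "skipped", "failed"}


def fetch_status(doc_id: str) -> str:
    """GET the status endpoint and return the status string.

    Assumption: the body is JSON with a top-level "status" field.
    """
    with urllib.request.urlopen(f"{BASE_URL}/ingest/status/{doc_id}") as resp:
        return json.load(resp)["status"]


def wait_until_terminal(
    doc_id: str,
    fetch: Callable[[str], str] = fetch_status,
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
) -> str:
    """Poll until the document reaches a terminal state, then return it."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch(doc_id)
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{doc_id} still {status!r} after {timeout_s}s")
        time.sleep(interval_s)
```

Passing the fetch function as a parameter keeps the loop testable without a running server; in normal use, `wait_until_terminal("returns")` calls the real endpoint.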

Ingest by URL

curl -X POST http://127.0.0.1:8000/api/v1/ingest \
-H "Content-Type: application/json" \
-d '{"file_name":"returns","s3_url":"https://example.com/returns.pdf"}'

If it fails: The URL must be a public HTTPS link ending in a supported extension. A 401 means an API key is required (see Authentication below).
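
The same request can be made from a script. The sketch below mirrors the curl call with Python's standard library; the function names `build_ingest_request` and `submit` are hypothetical, and the optional `api_key` argument only matters when the server requires a key for ingest.

```python
import json
import urllib.request
from typing import Optional, Tuple

BASE_URL = "http://127.0.0.1:8000/api/v1"  # host and port from the curl example


def build_ingest_request(
    file_name: str, url: str, api_key: Optional[str] = None
) -> urllib.request.Request:
    """Build the POST /ingest request without sending it."""
    body = json.dumps({"file_name": file_name, "s3_url": url}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["X-API-Key"] = api_key
    return urllib.request.Request(
        f"{BASE_URL}/ingest", data=body, headers=headers, method="POST"
    )


def submit(req: urllib.request.Request) -> Tuple[int, Optional[str]]:
    """Send the request and return (HTTP status code, Location header).

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so a 401
    surfaces as an exception rather than a return value.
    """
    with urllib.request.urlopen(req) as resp:
        return resp.status, resp.headers.get("Location")
```

A successful call such as `submit(build_ingest_request("returns", "https://example.com/returns.pdf"))` should return 202 and the Location value described earlier.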

Upload a local file

No public URL is required; the file never has to be hosted anywhere.

curl -X POST http://127.0.0.1:8000/api/v1/ingest/upload \
-F "file=@/path/to/returns.docx"
# optional explicit id: -F "file_name=returns"

If it fails: The file must be a supported format (validated by extension; PDFs also get a %PDF header check) and within the server’s MAX_FILE_SIZE_MB.
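
A client can run similar checks locally before uploading, to fail fast on obvious problems. The sketch below covers only the PDF header and size rules and is not the server's validation code. The name `check_upload` is hypothetical, the limit must be supplied by the caller because the server's MAX_FILE_SIZE_MB value is not stated here, and counting a megabyte as 1024 * 1024 bytes is an assumption.

```python
from typing import List


def check_upload(file_name: str, content: bytes, max_file_size_mb: float) -> List[str]:
    """Return a list of problems found; an empty list means no problem found."""
    problems = []

    # Assumption: the server counts 1 MB as 1024 * 1024 bytes.
    limit_bytes = max_file_size_mb * 1024 * 1024
    if len(content) > limit_bytes:
        problems.append(
            f"file is {len(content)} bytes, over the {max_file_size_mb} MB limit"
        )

    # PDFs get an extra check: the file must begin with the %PDF marker.
    if file_name.lower().endswith(".pdf") and not content.startswith(b"%PDF"):
        problems.append("file has a .pdf extension but no %PDF header")

    return problems
```

Usage might look like `check_upload("returns.pdf", open("returns.pdf", "rb").read(), limit)`, where `limit` matches the value configured on your server.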

Poll status, list, and delete

# Poll status
curl http://127.0.0.1:8000/api/v1/ingest/status/returns

# List (paginated)
curl "http://127.0.0.1:8000/api/v1/ingest/docs?limit=50&cursor=0"

# Delete (needs X-API-Key whenever API_KEY is set)
curl -X DELETE http://127.0.0.1:8000/api/v1/ingest/returns -H "X-API-Key: <key>"

Status values

| Status | Meaning |
| --- | --- |
| queued | Accepted; processing in the background. |
| done | Embedded and searchable. |
| skipped | Unchanged or duplicate content (the same file under a different name is caught by content hash). |
| failed | Processing failed; resubmit or check the document. |

Authentication

| Endpoint | Key required? |
| --- | --- |
| DELETE /api/v1/ingest/{doc_id} | Always (when API_KEY is set). |
| POST /api/v1/ingest, POST /api/v1/ingest/upload, GET /api/v1/ingest/docs | Only when REQUIRE_AUTH_FOR_INGEST=true. |

Pass the key in the X-API-Key header. When API_KEY is empty, auth is skipped (dev mode).
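
The rules in this section can be summarized as a small decision function. It is a sketch of the documented behaviour, not the server's implementation. The name `key_required` is hypothetical, paths are written relative to /api/v1, and endpoints that this section does not list (such as the status endpoint) return None because their behaviour is not documented here.

```python
from typing import Optional

# Endpoints gated by REQUIRE_AUTH_FOR_INGEST, relative to /api/v1.
GATED_BY_FLAG = {
    ("POST", "/ingest"),
    ("POST", "/ingest/upload"),
    ("GET", "/ingest/docs"),
}


def key_required(
    method: str, path: str, api_key: str, require_auth_for_ingest: bool
) -> Optional[bool]:
    """Return True/False per the rules above, or None if the endpoint is not listed."""
    if not api_key:
        return False  # dev mode: an empty API_KEY disables auth entirely
    method = method.upper()
    if method == "DELETE" and path.startswith("/ingest/"):
        return True  # DELETE always needs the key once one is configured
    if (method, path) in GATED_BY_FLAG:
        return require_auth_for_ingest
    return None  # not covered by this section
```

For example, `key_required("POST", "/ingest", "secret", False)` is `False`, while the same call with `"DELETE", "/ingest/returns"` is `True`.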

Verify your result

  • The ingest call returns 202 with a Location header and status: "queued".
  • Polling the status endpoint eventually returns done (or skipped for duplicates).
  • A question the document answers now returns it in sources[].
  • DELETE with a valid X-API-Key removes the document.

Common mistakes and fixes

  • 401 on DELETE → DELETE always requires X-API-Key when a key is configured.
  • skipped instead of done → the content is unchanged or a duplicate; this is expected and not an error.
  • Invalid file_name → remove dots/slashes and keep it ≤128 chars.
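
The file_name rule in the last item can be checked before submitting. This sketch encodes only what is stated above (no dots or slashes, at most 128 characters); the server may enforce further rules. The name `file_name_problems` is hypothetical, and rejecting backslashes as well as forward slashes is an assumption.

```python
from typing import List

MAX_FILE_NAME_LENGTH = 128
# Assumption: "slashes" covers both "/" and "\".
FORBIDDEN_CHARACTERS = "./\\"


def file_name_problems(file_name: str) -> List[str]:
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    if len(file_name) > MAX_FILE_NAME_LENGTH:
        problems.append(
            f"{len(file_name)} characters long, limit is {MAX_FILE_NAME_LENGTH}"
        )
    found = sorted({ch for ch in file_name if ch in FORBIDDEN_CHARACTERS})
    if found:
        problems.append("contains forbidden characters: " + " ".join(found))
    return problems
```

For instance, `file_name_problems("returns")` returns an empty list, while `"returns.pdf"` is flagged for the dot.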