User Guide

The AI Chatbot Backend, explained for the people who use it

A retrieval-augmented (RAG) knowledge assistant that answers from your approved documents, shows the citations behind every answer, and tells you honestly how well each answer is grounded. Multi-mode, multilingual, and streamable.

Modes: strict · open · learning · learning_review
Languages: EN · AR · PT
SSE streaming
RFC 9457 errors

What this assistant is

The service is a document-grounded question-answering API. You send a question to POST /api/v1/chat (base URL http://127.0.0.1:8000); it retrieves the most relevant chunks from your ingested documents, drafts an answer, and verifies that the answer is actually supported by those chunks before returning it.
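
A minimal sketch of calling the chat endpoint from Python with only the standard library. The base URL and the path come from this guide. Two things are assumptions: the request field name `message`, and sending the optional retrieval overrides `top_k` and `score_threshold` as top-level body fields. Check the Chatting page for the real schema.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://127.0.0.1:8000"  # local server URL from this guide


def build_chat_request(
    question: str,
    mode: str = "strict",
    top_k: Optional[int] = None,
    score_threshold: Optional[float] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /api/v1/chat request.

    ASSUMPTION: the question travels in a field named "message", and the
    retrieval overrides are top-level fields; confirm on the Chatting page.
    """
    body = {"message": question, "mode": mode}  # "message" is hypothetical
    # Only send retrieval overrides when the caller actually sets them.
    if top_k is not None:
        body["top_k"] = top_k
    if score_threshold is not None:
        body["score_threshold"] = score_threshold
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request to a running server and decode the JSON reply."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Against a running server, `send(build_chat_request("What is our refund policy?", top_k=5))` returns the decoded reply. Keeping the build and send steps separate lets you inspect or log the exact request before it goes out.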

Three properties make it trustworthy rather than just fluent:

Citations on every answer. Each reply carries a structured sources[] array (label, doc_id, score, page, and snippet), so a reader can check the answer against the passages it came from.
Honest grounding. meta.grounded reports whether the answer is supported, partial, or unsupported. In strict mode an unsupported answer is replaced by a refusal, so a confident-sounding hallucination cannot ship.
You stay in control. Pick a mode per request, force a language, and tune retrieval (top_k, score_threshold) without changing server config.
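
When a reply comes back, a UI should surface the citation and grounding fields listed above. This sketch formats `meta.grounded` and the `sources[]` entries into a plain-text footer. It assumes the answer text sits under a top-level `answer` key and that `score` is numeric. This overview confirms neither.

```python
def summarize_answer(response: dict) -> str:
    """Format an answer with its grounding status and numbered citations."""
    # meta.grounded and sources[] are documented; the "answer" key is an ASSUMPTION.
    grounded = response.get("meta", {}).get("grounded", "unknown")
    lines = [response.get("answer", ""), f"[grounding: {grounded}]"]
    for i, src in enumerate(response.get("sources", []), start=1):
        where = str(src.get("doc_id", "?"))
        if src.get("page") is not None:
            where += f" p.{src['page']}"
        score = float(src.get("score", 0.0))
        lines.append(f"[{i}] {src.get('label', '')} ({where}, score {score:.2f})")
    return "\n".join(lines)
```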

Choose your path

Get started

Installation

Run the server from source (conda + pip), with Docker Compose, or fully locally with Ollama + FastEmbed and no cloud API keys.

15 min

Quickstart

Get a first grounded answer back from a running server with one curl command, then verify the response.

Concept

Architecture

FastAPI + middleware + the 8-node LangGraph pipeline + Redis + ChromaDB, and the sync / streaming / async-ingest lifecycles.

How-to

Chatting

The chat endpoint, request fields, the four modes, and how to read the answer + sources + grounding.

How-to

Streaming

Render tokens as they arrive over Server-Sent Events, with the token → sources → done event order.
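
A sketch of a client that reads the stream and enforces the token → sources → done order. It parses the standard SSE wire format (`event:` and `data:` lines, a blank line ending each event, lines starting with `:` ignored) from any iterable of text lines. It assumes each `token` event carries a plain text fragment and `sources` carries a single payload. If the service wraps these in JSON, decode `data` before using it.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from Server-Sent Events text lines."""
    event, data, seen = "message", [], False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line dispatches the pending event
            # Unlike a browser EventSource, also report events with no data,
            # so a bare "event: done" is not lost.
            if seen:
                yield event, "\n".join(data)
            event, data, seen = "message", [], False
            continue
        if line.startswith(":"):  # comment or keep-alive line
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):  # the spec strips one leading space
            value = value[1:]
        if field == "event":
            event, seen = value, True
        elif field == "data":
            data.append(value)
            seen = True


def collect_answer(events: Iterable[Tuple[str, str]]) -> Tuple[str, str]:
    """Join token fragments and keep the sources payload, checking event order."""
    tokens: List[str] = []
    sources, stage = "", "token"
    for event, data in events:
        if event == "token" and stage == "token":
            tokens.append(data)
        elif event == "sources" and stage == "token":
            sources, stage = data, "sources"
        elif event == "done":
            return "".join(tokens), sources
        else:
            raise ValueError(f"unexpected {event!r} event after {stage!r}")
    raise ValueError("stream ended without a done event")
```

In a real UI you would render each `token` as it arrives instead of joining them at the end. The order check is what lets you attach citations only after the text is complete.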

Concept

Languages

English, Arabic, and European Portuguese — auto-detection, RTL rendering, and multi-turn follow-ups.
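
For RTL rendering, a UI can choose the text direction from the answer itself. This sketch applies the first-strong-character rule (the same idea behind HTML `dir="auto"`) using Python's `unicodedata.bidirectional`. It assumes you set the direction on the client side rather than reading it from a language field in the response, which this overview does not describe.

```python
import unicodedata


def text_direction(text: str) -> str:
    """Return "rtl" if the first strongly directional character is right-to-left."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # Hebrew-type and Arabic-letter classes
            return "rtl"
        if bidi == "L":  # Latin and other left-to-right scripts
            return "ltr"
    return "ltr"  # no strong characters (digits, punctuation): default to LTR
```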

Concept

Trust & citations

Read citations, turn grounding scores into a confidence indicator, and treat refusal as a feature.
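
One way to turn grounding into a badge is to let `meta.grounded` set the ceiling and the best citation score choose between high and medium. The `STRONG_SCORE` threshold is a hypothetical UI choice. The sketch also assumes a higher `score` means a more relevant source.

```python
from typing import Sequence

# HYPOTHETICAL UI threshold; tune it against answers your users have rated.
STRONG_SCORE = 0.75


def confidence_badge(grounded: str, source_scores: Sequence[float]) -> str:
    """Map meta.grounded plus citation scores to a UI confidence label."""
    if grounded not in ("supported", "partial", "unsupported"):
        return "unknown"  # unexpected value: do not guess
    if grounded == "unsupported" or not source_scores:
        return "low"  # in strict mode such an answer is replaced by a refusal
    if grounded == "partial":
        return "medium"
    return "high" if max(source_scores) >= STRONG_SCORE else "medium"
```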

How-to

Ingesting documents

Add knowledge by URL or local upload, poll ingest status, and understand API-key auth.
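
Ingestion runs asynchronously, so a client polls until the job finishes. This is a minimal sketch built on several hypothetical pieces. The caller supplies a `fetch_status` function, which would GET the ingest status endpoint with your API key. The terminal states `completed` and `failed` are also assumed names. The function doubles the polling interval up to a cap so long jobs do not flood the server with requests.

```python
import time
from typing import Callable

TERMINAL = {"completed", "failed"}  # HYPOTHETICAL status names


def wait_for_ingest(
    fetch_status: Callable[[], str],
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
    max_interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll an ingest job until it reaches a terminal state or times out."""
    waited = 0.0
    while True:
        status = fetch_status()
        if status in TERMINAL:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"ingest still {status!r} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
        interval_s = min(interval_s * 2, max_interval_s)  # back off gently
```

The `sleep` parameter exists so tests can replace real waiting with a no-op.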

How-to

Rating answers

Submit up/down feedback with a correlation id so bad answers feed the review queue.
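
A sketch of a feedback body that links a rating to the answer it rates. The field names `correlation_id`, `rating`, and `comment` and the `"up"`/`"down"` values are hypothetical. What the sketch illustrates is the documented requirement: every rating carries the correlation id of its chat turn, which is what lets a bad answer reach the review queue.

```python
import json
from typing import Optional


def build_feedback(correlation_id: str, thumbs_up: bool, comment: Optional[str] = None) -> bytes:
    """Build a JSON feedback body tied to the chat turn it rates.

    ASSUMPTION: all field names here are hypothetical; use the names from
    the Rating answers page.
    """
    if not correlation_id:
        raise ValueError("feedback needs the correlation id of the rated answer")
    body = {"correlation_id": correlation_id, "rating": "up" if thumbs_up else "down"}
    if comment:
        body["comment"] = comment
    return json.dumps(body).encode("utf-8")
```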

Providers

LLM & embeddings

14 LLM providers via a universal OpenAI-compatible adapter, plus OpenAI / FastEmbed / HuggingFace embeddings.

Operations

Deploy & secure

Compose topology, durable ingestion, scaling, the security model and hardening checklist, observability, and evaluation.

Reference

Configuration

Every environment variable, grouped as in .env.example, with its default and purpose for LLM, retrieval, security, and more.

Reference

API summary

Endpoint list, request headers, and the response/rate-limit headers you should log.
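
A small helper that selects the response headers worth logging. The names in `HEADERS_TO_LOG` are hypothetical placeholders (a request id plus common rate-limit headers), not the service's documented list. Replace them with the names from the API summary page. Matching is case-insensitive because HTTP header names are.

```python
from typing import Mapping

# HYPOTHETICAL header names; replace with the service's documented list.
HEADERS_TO_LOG = (
    "x-request-id",
    "retry-after",
    "ratelimit-limit",
    "ratelimit-remaining",
    "ratelimit-reset",
)


def headers_for_log(headers: Mapping[str, str]) -> dict:
    """Pick the response headers worth logging, matching names case-insensitively."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return {name: lowered[name] for name in HEADERS_TO_LOG if name in lowered}
```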

Reference

Troubleshooting

Empty strict answers, 401 on ingest, 422 validation, 429 backoff, and degraded health.
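
Errors follow RFC 9457, so a client can read `status`, `title`, and `detail` from an `application/problem+json` body instead of guessing what went wrong. This sketch does two things. It summarizes such a body, and it picks a retry delay after a 429. A numeric `Retry-After` header is honored as sent; otherwise the delay falls back to exponential backoff with full jitter. Whether this service sends `Retry-After` is an assumption, and the `base` and `cap` values are illustrative.

```python
import json
import random
from typing import Optional


def problem_summary(body: bytes) -> str:
    """Summarize an RFC 9457 problem-details body as "status | title | detail"."""
    try:
        problem = json.loads(body)
    except ValueError:  # not JSON at all: show the raw text
        return body.decode("utf-8", "replace")
    if not isinstance(problem, dict):
        return str(problem)
    parts = [str(problem.get("status", "?")), str(problem.get("title", "error"))]
    if problem.get("detail"):
        parts.append(str(problem["detail"]))
    return " | ".join(parts)


def retry_delay(attempt: int, retry_after: Optional[str], base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retrying a 429 response."""
    # Honor a delta-seconds Retry-After exactly; the HTTP-date form is not
    # parsed here and falls through to backoff.
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after)
    # Full jitter: a random delay up to the capped exponential bound.
    return random.uniform(0, min(cap, base * 2 ** attempt))
```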

Related next steps

New here? Begin with the Quickstart.
Building a chat UI? Read Streaming and Trust & citations.
Operating the service? See Ingesting documents and Errors & rate limits.
