Retrieval

How the system decides what context to feed the model — and what to do when nothing is relevant.

Audience: developers tuning answer quality. What you will accomplish: understand the score gate, the retrieval strategies, and query rewriting.

The relevance score gate

Retrieval runs in two steps. Step 1 fetches the single closest chunk and checks its cosine similarity against RETRIEVAL_SCORE_THRESHOLD (default 0.3). Behavior then depends on CHAT_MODE:

strict — if even the best match is below threshold, the question is off-topic: no context is sent to the LLM and a refusal prompt is used.
open / learning — below-threshold matches are still passed as weak grounding signals; the prompt tells the model to use general knowledge when context is weak and to be honest about provenance (best-available behavior).
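The gate logic can be sketched in Python as follows. `GateDecision` and `apply_score_gate` are hypothetical names. Letting a score exactly equal to the threshold pass is an assumption, and the project's own implementation may differ.

```python
from dataclasses import dataclass


@dataclass
class GateDecision:
    send_context: bool  # whether any retrieved context reaches the LLM
    weak: bool          # True when context is sent but the best match is below threshold
    refuse: bool        # True when the refusal prompt should be used


def apply_score_gate(best_score: float, chat_mode: str, threshold: float = 0.3) -> GateDecision:
    """Decide what to do based on the closest chunk's cosine similarity."""
    # Assumption: a score equal to the threshold counts as passing.
    below = best_score < threshold
    if chat_mode == "strict":
        # Off-topic: send no context and switch to the refusal prompt.
        return GateDecision(send_context=not below, weak=False, refuse=below)
    if chat_mode in ("open", "learning"):
        # Below-threshold matches still go through, flagged as weak grounding.
        return GateDecision(send_context=True, weak=below, refuse=False)
    raise ValueError(f"unknown CHAT_MODE: {chat_mode!r}")
```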

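ChromaDB is configured with cosine distance (hnsw:space: cosine). Chroma reports cosine distance as 1 - cosine similarity, so each distance converts back to a similarity score that can be compared directly against the 0..1 threshold.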
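The following minimal pure-Python sketch shows the distance Chroma reports in its cosine space and the conversion back to a similarity score. The helper names are hypothetical. In ChromaDB itself, the space is chosen when the collection is created, for example `client.get_or_create_collection("docs", metadata={"hnsw:space": "cosine"})`. The collection name there is illustrative.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def cosine_distance(a: list[float], b: list[float]) -> float:
    # Chroma's "cosine" space: 1 - cosine similarity, so the range is [0, 2].
    return 1.0 - cosine_similarity(a, b)


def distance_to_score(distance: float) -> float:
    # Recover the similarity that is compared against RETRIEVAL_SCORE_THRESHOLD.
    return 1.0 - distance
```

For text embeddings, similarities in practice usually fall in the 0..1 band, which is why a single 0..1 threshold is workable.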

Retrieval strategies (MMR vs hybrid)

Step 2 runs only when the gate passes. The strategy is set by RETRIEVAL_STRATEGY:

Value	What it does
`mmr` (default)	Maximal Marginal Relevance — fetches candidates and picks chunks that are both relevant and diverse, avoiding near-identical paragraphs.
`hybrid`	Dense vectors + BM25 lexical recall fused via Reciprocal Rank Fusion (RRF). Recovers acronyms, SKUs, and exact phrases that pure dense search misses.
`hybrid_rerank`	Hybrid candidates passed through a reranker integration point (default passthrough until a concrete reranker is wired in).
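The three strategies can be sketched in plain Python. This is an illustration under stated assumptions, not the project's code:

* `mmr_select`, `rrf_fuse`, and `passthrough_rerank` are hypothetical names.
* `lambda_mult=0.5` is an assumed MMR default.
* `k=60` is the constant commonly used with RRF, not necessarily the value this system uses.
* BM25 scoring is omitted. `rrf_fuse` takes the already-ranked dense and lexical result lists as chunk ids.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def _cos(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(query: Vector, candidates: list[Vector], k: int, lambda_mult: float = 0.5) -> list[int]:
    """Greedily pick k candidate indices that are relevant to the query but not redundant."""
    selected: list[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def mmr_score(i: int) -> float:
            relevance = _cos(query, candidates[i])
            # Penalize similarity to the closest chunk already selected.
            redundancy = max((_cos(candidates[i], candidates[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)


def passthrough_rerank(query: str, doc_ids: list[str]) -> list[str]:
    # hybrid_rerank hook: returns candidates unchanged until a real reranker is wired in.
    return doc_ids
```

Because RRF looks only at ranks, dense and BM25 scores never need to be put on a common scale. A chunk that ranks first lexically (for example, an exact SKU match) but low in the dense list still accumulates score from both lists.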

Context-aware query rewriting (#1)

When QUERY_REWRITE_ENABLED=true (default), the condense_query node rewrites a multi-turn follow-up into a standalone search query using the rolling summary plus recent turns — before retrieval. This fixes the classic failure where “what about overseas?” retrieves nothing because it has no subject on its own.

Two important properties:

It is skipped on the first turn (there is nothing to condense).
Only retrieval uses the rewritten query — generation still uses the original question, so the answer stays faithful to what the user actually asked.
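The two properties above can be sketched as a small pipeline function. Everything here is hypothetical:

* `build_condense_prompt` and `answer` are illustrative names.
* The injected `rewrite_fn`, `retrieve_fn`, and `generate_fn` callables stand in for the condense_query node's LLM call, retrieval, and generation.
* The prompt wording and the four-turn window are assumptions.

```python
from typing import Callable, Sequence

Turn = tuple[str, str]  # (role, text)


def build_condense_prompt(summary: str, recent_turns: Sequence[Turn], question: str) -> str:
    """Assemble the rewrite request from the rolling summary and recent turns."""
    turns = "\n".join(f"{role}: {text}" for role, text in recent_turns)
    return (
        "Rewrite the follow-up question as a standalone search query.\n"
        f"Conversation summary: {summary}\n"
        f"Recent turns:\n{turns}\n"
        f"Follow-up question: {question}\n"
        "Standalone query:"
    )


def answer(
    question: str,
    history: Sequence[Turn],
    summary: str,
    rewrite_fn: Callable[[str], str],
    retrieve_fn: Callable[[str], list[str]],
    generate_fn: Callable[[str, list[str]], str],
    rewrite_enabled: bool = True,  # mirrors QUERY_REWRITE_ENABLED
    recent_window: int = 4,        # assumed number of recent turns to include
) -> str:
    if rewrite_enabled and history:
        prompt = build_condense_prompt(summary, history[-recent_window:], question)
        search_query = rewrite_fn(prompt)
    else:
        # First turn (or rewriting disabled): nothing to condense.
        search_query = question
    chunks = retrieve_fn(search_query)  # only retrieval sees the rewritten query
    # Generation receives the user's original question, keeping the answer faithful to it.
    return generate_fn(question, chunks)
```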

Verify your result

Verify: You can explain why strict mode returns empty sources on an off-topic question.
Verify: You know RETRIEVAL_STRATEGY accepts mmr | hybrid | hybrid_rerank and that mmr is the default.
Verify: You understand that query rewriting affects retrieval only, not the generated answer.

Common failure modes

See how a passing retrieval is then checked in Groundedness.
Compare per-mode gate behavior in Chat modes.
Tune the knobs in Configuration.
