Embeddings
Embeddings turn document chunks and questions into vectors for retrieval. Three providers
are supported via EMBEDDING_PROVIDER.
Audience: operators tuning retrieval quality, cost, and locality.
What you will accomplish: choose an embedding provider/model and avoid the re-ingest trap.
The three providers
- OpenAI (default): no extra dependencies. Models: text-embedding-3-small / text-embedding-3-large. Best when you don’t want to manage local inference.
- FastEmbed (recommended for local): ONNX Runtime, no torch dependency. ~50MB download vs ~2GB for torch-based alternatives, zero CVEs (pure Python + ONNX). Any FastEmbed-compatible model works; unknown models trigger a warning but still attempt to load.
- HuggingFace (optional): torch-based (sentence-transformers/transformers). Only install if you need a HuggingFace-specific model unavailable in FastEmbed.
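To make the provider choice concrete, here is a minimal sketch of how the two settings could be resolved from the environment. It is not this project's actual loader: the function name `resolve_embedding_config`, the provider strings `openai` and `huggingface`, and the per-provider fallback models are assumptions for illustration (only `fastembed` and the `EMBEDDING_PROVIDER` / `EMBEDDING_MODEL` variable names come from this page).

```python
import os
from typing import Mapping, Optional, Tuple

# Hypothetical fallback model per provider. Only "fastembed" is spelled this
# way in the page; "openai" and "huggingface" are assumed identifiers.
DEFAULT_MODELS = {
    "openai": "text-embedding-3-small",
    "fastembed": "BAAI/bge-small-en-v1.5",
    "huggingface": "sentence-transformers/all-MiniLM-L6-v2",
}


def resolve_embedding_config(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str]:
    """Return (provider, model) from EMBEDDING_PROVIDER / EMBEDDING_MODEL."""
    env = os.environ if env is None else env
    # OpenAI is the documented default provider.
    provider = env.get("EMBEDDING_PROVIDER", "openai").strip().lower()
    if provider not in DEFAULT_MODELS:
        raise ValueError(
            f"Unknown EMBEDDING_PROVIDER {provider!r}; "
            f"expected one of {sorted(DEFAULT_MODELS)}"
        )
    # An empty or missing EMBEDDING_MODEL falls back to the provider default.
    model = env.get("EMBEDDING_MODEL", "").strip() or DEFAULT_MODELS[provider]
    return provider, model
```

Passing a plain dict instead of reading `os.environ` directly keeps the function easy to test, for example `resolve_embedding_config({"EMBEDDING_PROVIDER": "fastembed"})`.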
# FastEmbed is already in requirements.txt:
# EMBEDDING_PROVIDER=fastembed
# EMBEDDING_MODEL=BAAI/bge-small-en-v1.5
# HuggingFace (extra deps, pulls torch):
pip install langchain-huggingface sentence-transformers transformers numpy
FastEmbed model registry
| Model | Provider | Dimensions | Download | Context | Best For |
|---|---|---|---|---|---|
| text-embedding-3-small | OpenAI | 1536 | API-only | 8191 | Default, production reliability |
| text-embedding-3-large | OpenAI | 3072 | API-only | 8191 | Maximum accuracy, higher cost |
| BAAI/bge-small-en-v1.5 | FastEmbed | 384 | ~50MB | 512 | Prototyping, small datasets, low memory |
| BAAI/bge-base-en-v1.5 | FastEmbed | 768 | ~120MB | 512 | Balanced speed/quality (recommended) |
| BAAI/bge-large-en-v1.5 | FastEmbed | 1024 | ~430MB | 512 | Highest local quality, slower inference |
| sentence-transformers/all-MiniLM-L6-v2 | FastEmbed | 384 | ~30MB | 256 | Fast semantic search, versatile |
| sentence-transformers/all-MiniLM-L12-v2 | FastEmbed | 384 | ~60MB | 256 | Slightly better quality than L6 |
| BAAI/bge-m3 | FastEmbed | 1024 | ~570MB | 8192 | Arabic/English mixed content, multilingual |
| nomic-ai/nomic-embed-text-v1.5 | FastEmbed | 768 | ~130MB | 8192 | Long documents (>256 tokens) |
| sentence-transformers/all-MiniLM-L6-v2 | HuggingFace | 384 | ~2GB+ | 256 | Same model, torch-based (AVOID if FastEmbed works) |
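The Dimensions column is where the re-ingest trap shows up: vectors written by one model cannot be searched with queries embedded by another. The sketch below illustrates one way to detect that, assuming you record the model name used at ingest time. The helper `reingest_reason` is hypothetical and not part of this project; the dimension values are copied from the table above.

```python
from typing import Optional

# Output dimensions copied from the registry table.
MODEL_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "BAAI/bge-small-en-v1.5": 384,
    "BAAI/bge-base-en-v1.5": 768,
    "BAAI/bge-large-en-v1.5": 1024,
    "sentence-transformers/all-MiniLM-L6-v2": 384,
    "sentence-transformers/all-MiniLM-L12-v2": 384,
    "BAAI/bge-m3": 1024,
    "nomic-ai/nomic-embed-text-v1.5": 768,
}


def reingest_reason(indexed_model: str, configured_model: str) -> Optional[str]:
    """Return why a re-ingest is needed, or None if the index is still valid.

    indexed_model is the model recorded when documents were ingested
    (hypothetical bookkeeping); configured_model is the current EMBEDDING_MODEL.
    """
    if indexed_model == configured_model:
        return None
    old_dim = MODEL_DIMENSIONS.get(indexed_model)
    new_dim = MODEL_DIMENSIONS.get(configured_model)
    if old_dim is not None and new_dim is not None and old_dim != new_dim:
        return f"dimension mismatch: index has {old_dim}, model produces {new_dim}"
    # Same dimension (or unknown model) still needs a re-ingest, because
    # different models place text in different vector spaces.
    return "model changed: vectors from different models are not comparable"
```

Note that a matching dimension is not enough: switching from BAAI/bge-small-en-v1.5 to sentence-transformers/all-MiniLM-L6-v2 keeps 384 dimensions, so nothing fails loudly, but retrieval quality degrades until every document is re-ingested.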
Verify your result
- Verify: You set EMBEDDING_PROVIDER and a matching EMBEDDING_MODEL.
- Verify: You prefer FastEmbed for local/zero-CVE embeddings over HuggingFace+torch.
- Verify: After changing the model, you re-ingested every document.
Common failure modes
Related next steps
- Pair embeddings with a chat model in LLM providers.
- See how vectors are used in Retrieval.
- Add documents in Ingesting documents.