Embedding Models
Configure embedding models for semantic search and similarity comparisons
The Embedding Service transforms text content into vector embeddings that enable semantic search and similarity comparisons in your RAG implementation.
Overview
Embeddings are numerical representations of text that capture semantic meaning. The application uses these vectors to power search functionality and document comparisons. By default, a local embedding model is used, but you can configure the system to use OpenAI’s embedding API or custom ONNX models.
Quick Setup
For most users, the default embedding configuration works out of the box; you can customize it through environment variables in your deployment.
Environment Variables
Embedding Configuration Options
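The exact variable names depend on your deployment; the sketch below shows the kinds of options typically involved, using illustrative names rather than the application's actual keys:

```
# Illustrative variable names only - check your deployment's
# configuration reference for the exact keys.
EMBEDDING_MODEL_TYPE=local                    # local | openai | onnx
OPENAI_API_KEY=                               # required when EMBEDDING_MODEL_TYPE=openai
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
ONNX_MODEL_PATH=/models/model.onnx            # required when EMBEDDING_MODEL_TYPE=onnx
ONNX_TOKENIZER_PATH=/models/tokenizer.json
```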
Supported Models
Default Local Model
By default, the application uses the AllMiniLmL6V2 model, which offers:
- Fast, efficient embedding generation
- 384-dimensional vectors
- Good balance of performance and quality
- No external API dependencies
Example docker-compose setup for the default model (the service layout, image name, and variable names below are an illustrative sketch, not the application's exact keys):
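```yaml
# Minimal sketch - image name and environment keys are assumptions
services:
  app:
    image: your-org/your-app:latest
    environment:
      # Select the bundled local model (assumed to be the default anyway)
      EMBEDDING_MODEL_TYPE: local
    ports:
      - "8080:8080"
```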
Or, equivalently, with docker run (same illustrative names):
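```bash
# Same assumptions as above: image and variable names are illustrative
docker run -d \
  -p 8080:8080 \
  -e EMBEDDING_MODEL_TYPE=local \
  your-org/your-app:latest
```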
OpenAI Models
For higher quality embeddings, you can use OpenAI’s embedding models, for example (again with illustrative variable names):
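```yaml
# Sketch only - the environment keys are assumptions;
# text-embedding-3-small is a real OpenAI model name
services:
  app:
    image: your-org/your-app:latest
    environment:
      EMBEDDING_MODEL_TYPE: openai
      OPENAI_API_KEY: ${OPENAI_API_KEY}    # read from the host environment
      OPENAI_EMBEDDING_MODEL: text-embedding-3-small
```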
Or, equivalently, with docker run (same illustrative names):
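```bash
# Same assumptions as the compose sketch above
docker run -d \
  -p 8080:8080 \
  -e EMBEDDING_MODEL_TYPE=openai \
  -e OPENAI_API_KEY="$OPENAI_API_KEY" \
  -e OPENAI_EMBEDDING_MODEL=text-embedding-3-small \
  your-org/your-app:latest
```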
Benefits of OpenAI models:
- Higher quality embeddings
- More dimensions (1536 for text-embedding-3-small)
- Better semantic understanding
Trade-offs:
- Requires internet connectivity
- Incurs API usage costs
- Adds network latency
Custom ONNX Models
For advanced users, custom ONNX models can be used. A typical setup mounts the model files into the container; the variable names and paths below are illustrative:
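```yaml
# Sketch only - variable names and paths are assumptions
services:
  app:
    image: your-org/your-app:latest
    environment:
      EMBEDDING_MODEL_TYPE: onnx
      ONNX_MODEL_PATH: /models/custom-model.onnx
      ONNX_TOKENIZER_PATH: /models/tokenizer.json
    volumes:
      - ./models:/models:ro   # mount the model files read-only
```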
Or, equivalently, with docker run (same illustrative names):
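```bash
# Same assumptions as above
docker run -d \
  -p 8080:8080 \
  -v "$(pwd)/models:/models:ro" \
  -e EMBEDDING_MODEL_TYPE=onnx \
  -e ONNX_MODEL_PATH=/models/custom-model.onnx \
  -e ONNX_TOKENIZER_PATH=/models/tokenizer.json \
  your-org/your-app:latest
```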
Performance Considerations
- OpenAI models: higher quality, but add network latency and API cost
- Local models: faster and work offline, but may produce lower-quality embeddings
- Custom ONNX: flexible and configurable for specific use cases
Embedding generation happens when documents are uploaded and indexed. Vector similarity search performance depends on the vector database implementation in use.