# Embedding Models

Configure embedding models for semantic search and similarity comparisons.
The Embedding Service transforms text content into vector embeddings that enable semantic search and similarity comparisons in your RAG implementation.
## Overview
Embeddings are numerical representations of text that capture semantic meaning. The application uses these vectors to power search functionality and document comparisons. By default, a local embedding model is used, but you can configure the system to use OpenAI’s embedding API or custom ONNX models.
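To make "similarity comparisons" concrete, here is a small self-contained sketch of cosine similarity, the usual metric for comparing embedding vectors. The toy 3-dimensional vectors stand in for real embeddings (e.g. the 384-dimensional vectors produced by the default local model):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings"; a real model emits hundreds of dimensions.
query = [0.1, 0.9, 0.2]
doc_similar = [0.15, 0.85, 0.25]
doc_unrelated = [0.9, 0.0, -0.4]

# Search results are ranked by similarity to the query vector.
print(cosine_similarity(query, doc_similar) > cosine_similarity(query, doc_unrelated))  # prints True
```

The vector database performs this comparison (or an approximate version of it) across all indexed document embeddings at query time.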
## Quick Setup
For most users, the default embedding configuration works out of the box. You can easily customize it using environment variables in your deployment.
### Environment Variables

Embedding configuration options are set as environment variables in your deployment.
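Of the variables in this section, only `OPEN_RESPONSES_EMBEDDINGS_API_KEY` and `OPEN_RESPONSES_EMBEDDINGS_URL` are confirmed names (they appear under Troubleshooting below); any other variable names shown are illustrative assumptions. A hypothetical `.env`-style sketch:

```bash
# Confirmed variables (used for the OpenAI backend):
OPEN_RESPONSES_EMBEDDINGS_URL=https://api.openai.com/v1
OPEN_RESPONSES_EMBEDDINGS_API_KEY=sk-...

# Hypothetical variable for selecting the backend (assumed values: local | openai | onnx):
OPEN_RESPONSES_EMBEDDINGS_MODEL_TYPE=local
```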
## Supported Models
### Default Local Model
By default, the application uses the AllMiniLmL6V2 model, which offers:
- Fast, efficient embedding generation
- 384-dimensional vectors
- Good balance of performance and quality
- No external API dependencies
Example docker-compose setup for the default model:
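A minimal compose sketch; the service name, image name, and the `OPEN_RESPONSES_EMBEDDINGS_MODEL_TYPE` variable are illustrative assumptions, not confirmed names:

```yaml
services:
  app:
    image: your-org/your-app:latest  # placeholder image name
    environment:
      # Hypothetical variable; the default local model needs no API key or URL
      OPEN_RESPONSES_EMBEDDINGS_MODEL_TYPE: local
```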
Or using a `docker run` command:
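A sketch of the equivalent `docker run` invocation; the image name and the `OPEN_RESPONSES_EMBEDDINGS_MODEL_TYPE` variable are illustrative assumptions:

```bash
docker run -d \
  -e OPEN_RESPONSES_EMBEDDINGS_MODEL_TYPE=local \
  your-org/your-app:latest
```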
### OpenAI Models
For higher quality embeddings, you can use OpenAI’s embedding models:
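A hedged compose sketch; `OPEN_RESPONSES_EMBEDDINGS_URL` and `OPEN_RESPONSES_EMBEDDINGS_API_KEY` are the variables referenced under Troubleshooting below, while the service name, image, model-type, and model-name variables are illustrative assumptions:

```yaml
services:
  app:
    image: your-org/your-app:latest  # placeholder image name
    environment:
      OPEN_RESPONSES_EMBEDDINGS_MODEL_TYPE: openai             # hypothetical name
      OPEN_RESPONSES_EMBEDDINGS_URL: https://api.openai.com/v1
      OPEN_RESPONSES_EMBEDDINGS_API_KEY: ${OPENAI_API_KEY}
      OPEN_RESPONSES_EMBEDDINGS_MODEL: text-embedding-3-small  # hypothetical name
```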
Or using a `docker run` command:
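A sketch of the equivalent `docker run` invocation; `OPEN_RESPONSES_EMBEDDINGS_URL` and `OPEN_RESPONSES_EMBEDDINGS_API_KEY` are confirmed variable names, while the image and the other variable names are illustrative assumptions:

```bash
docker run -d \
  -e OPEN_RESPONSES_EMBEDDINGS_MODEL_TYPE=openai \
  -e OPEN_RESPONSES_EMBEDDINGS_URL=https://api.openai.com/v1 \
  -e OPEN_RESPONSES_EMBEDDINGS_API_KEY="$OPENAI_API_KEY" \
  -e OPEN_RESPONSES_EMBEDDINGS_MODEL=text-embedding-3-small \
  your-org/your-app:latest
```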
Benefits of OpenAI models:
- Higher quality embeddings
- More dimensions (1536 for text-embedding-3-small)
- Better semantic understanding
Trade-offs:
- Requires internet connectivity
- Incurs API usage costs
- Adds network latency
### Custom ONNX Models
Advanced users can supply their own ONNX models:
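A hedged compose sketch; the service name, image, variable names, and mount paths are all illustrative assumptions for how a custom model and tokenizer might be provided:

```yaml
services:
  app:
    image: your-org/your-app:latest  # placeholder image name
    environment:
      OPEN_RESPONSES_EMBEDDINGS_MODEL_TYPE: onnx                          # hypothetical name
      OPEN_RESPONSES_EMBEDDINGS_ONNX_MODEL_PATH: /models/model.onnx       # hypothetical name
      OPEN_RESPONSES_EMBEDDINGS_ONNX_TOKENIZER_PATH: /models/tokenizer.json
    volumes:
      - ./models:/models:ro  # mount the model files into the container read-only
```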
Or using a `docker run` command:
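A sketch of the equivalent `docker run` invocation, under the same assumptions (image, variable names, and paths are hypothetical):

```bash
docker run -d \
  -e OPEN_RESPONSES_EMBEDDINGS_MODEL_TYPE=onnx \
  -e OPEN_RESPONSES_EMBEDDINGS_ONNX_MODEL_PATH=/models/model.onnx \
  -e OPEN_RESPONSES_EMBEDDINGS_ONNX_TOKENIZER_PATH=/models/tokenizer.json \
  -v "$(pwd)/models:/models:ro" \
  your-org/your-app:latest
```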
## Performance Considerations
- **OpenAI models**: higher quality, but add network latency and API cost
- **Local models**: faster and work offline, but may produce lower-quality embeddings
- **Custom ONNX**: flexible and configurable for specific use cases
Embedding generation happens when documents are uploaded and indexed. The vector similarity search performance depends on the vector database implementation used.
## Troubleshooting

### Common Issues
#### OpenAI Connection Errors
- Check that your `OPEN_RESPONSES_EMBEDDINGS_API_KEY` is correct
- Verify network connectivity to `OPEN_RESPONSES_EMBEDDINGS_URL`
- Confirm your OpenAI account has available quota
#### Local Model Performance
- The default model requires approximately 150 MB of RAM
- Ensure your container has sufficient memory allocated
#### Custom ONNX Model Issues
- Verify file paths are correct and the files are accessible
- Ensure your model is compatible with the application
- Check logs for specific error messages