The Embedding Service transforms text content into vector embeddings that enable semantic search and similarity comparisons in your RAG implementation.

Overview

Embeddings are numerical representations of text that capture semantic meaning. The application uses these vectors to power search functionality and document comparisons. By default, a local embedding model is used, but you can configure the system to use OpenAI’s embedding API or custom ONNX models.
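
As a concrete illustration (a toy sketch, not the service's actual code), cosine similarity over embedding vectors is the usual way such semantic comparisons are scored:

# Illustrative sketch only: how embedding vectors support semantic comparison.
# The vectors here are made-up stand-ins; the service produces real ones
# (e.g. 384 dimensions for the default local model).
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-dimensional embeddings for a query and two documents.
query      = [0.9, 0.1, 0.0, 0.2]
doc_close  = [0.8, 0.2, 0.1, 0.3]   # semantically close to the query
doc_far    = [0.1, 0.9, 0.7, 0.0]   # semantically distant

print(cosine_similarity(query, doc_close))  # high score -> relevant
print(cosine_similarity(query, doc_far))    # low score  -> less relevant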

Quick Setup

For most users, the default embedding configuration works out of the box. You can easily customize it using environment variables in your deployment.

Environment Variables

  • OPEN_RESPONSES_EMBEDDINGS_MODEL_TYPE (default: all-minilm-l6-v2): The type of embedding model to use (all-minilm-l6-v2 or onnx)
  • OPEN_RESPONSES_EMBEDDINGS_HTTP_ENABLED (default: false): Whether to use the OpenAI API (true) or the local model (false)
  • OPEN_RESPONSES_EMBEDDINGS_API_KEY: Your OpenAI API key (required when HTTP_ENABLED is true)
  • OPEN_RESPONSES_EMBEDDINGS_MODEL: The OpenAI model to use (when HTTP_ENABLED is true)
  • OPEN_RESPONSES_EMBEDDINGS_URL: The base URL for the OpenAI API (when HTTP_ENABLED is true)
  • OPEN_RESPONSES_EMBEDDINGS_ONNX_MODEL_PATH: Path to a custom ONNX model file (when MODEL_TYPE is onnx)
  • OPEN_RESPONSES_EMBEDDINGS_TOKENIZER_PATH: Path to a custom tokenizer JSON file (when MODEL_TYPE is onnx)
  • OPEN_RESPONSES_EMBEDDINGS_POOLING_MODE (default: mean): Pooling mode for ONNX models: mean, cls, or max

Embedding Configuration Options

Supported Models

Default Local Model

By default, the application uses the AllMiniLmL6V2 model, which offers:

  • Fast, efficient embedding generation
  • 384-dimensional vectors
  • Good balance of performance and quality
  • No external API dependencies

Example docker-compose setup for the default model:

services:
  app:
    image: masaicai/open-responses:latest
    # No specific embedding environment variables needed for default setup

Or using the docker run command:

docker run -p 8080:8080 masaicai/open-responses:latest
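
The default model corresponds to the widely used all-MiniLM-L6-v2 sentence transformer. If you want to inspect the kind of vectors it produces outside the service, a quick check with the sentence-transformers Python package works; this assumes the built-in model matches the public checkpoint:

# Optional sanity check outside the service, using the sentence-transformers
# package (pip install sentence-transformers). Assumes the service's default
# model is equivalent to the public all-MiniLM-L6-v2 checkpoint.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
vector = model.encode("Semantic search turns text into vectors.")
print(len(vector))  # 384, matching the dimensionality noted above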

OpenAI Models

For higher-quality embeddings, you can use OpenAI’s embedding models:

services:
  app:
    image: masaicai/open-responses:latest
    environment:
      - OPEN_RESPONSES_EMBEDDINGS_HTTP_ENABLED=true
      - OPEN_RESPONSES_EMBEDDINGS_API_KEY=your-openai-api-key
      - OPEN_RESPONSES_EMBEDDINGS_MODEL=text-embedding-3-small

Or using the docker run command:

docker run -p 8080:8080 \
  -e OPEN_RESPONSES_EMBEDDINGS_HTTP_ENABLED=true \
  -e OPEN_RESPONSES_EMBEDDINGS_API_KEY=your-openai-api-key \
  -e OPEN_RESPONSES_EMBEDDINGS_MODEL=text-embedding-3-small \
  masaicai/open-responses:latest

Benefits of OpenAI models:

  • Higher quality embeddings
  • More dimensions (1536 for text-embedding-3-small)
  • Better semantic understanding

Trade-offs:

  • Requires internet connectivity
  • Incurs API usage costs
  • Adds network latency
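
Independent of this service, you can verify your API key and model choice with a direct call through the openai Python SDK; a minimal sketch, assuming the openai package is installed and OPENAI_API_KEY is set in your environment:

# Minimal sketch of a direct OpenAI embeddings request, useful for checking
# your API key and model name before wiring them into the service.
# Assumes: pip install openai, and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Semantic search turns text into vectors.",
)
print(len(response.data[0].embedding))  # 1536 dimensions for this model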

Custom ONNX Models

Advanced users can supply their own ONNX models:

services:
  app:
    image: masaicai/open-responses:latest
    environment:
      - OPEN_RESPONSES_EMBEDDINGS_MODEL_TYPE=onnx
      - OPEN_RESPONSES_EMBEDDINGS_ONNX_MODEL_PATH=/models/your-model.onnx
      - OPEN_RESPONSES_EMBEDDINGS_TOKENIZER_PATH=/models/your-tokenizer.json
      - OPEN_RESPONSES_EMBEDDINGS_POOLING_MODE=mean
    volumes:
      - ./models:/models

Or using the docker run command:

docker run -p 8080:8080 \
  -e OPEN_RESPONSES_EMBEDDINGS_MODEL_TYPE=onnx \
  -e OPEN_RESPONSES_EMBEDDINGS_ONNX_MODEL_PATH=/models/your-model.onnx \
  -e OPEN_RESPONSES_EMBEDDINGS_TOKENIZER_PATH=/models/your-tokenizer.json \
  -e OPEN_RESPONSES_EMBEDDINGS_POOLING_MODE=mean \
  -v "$(pwd)/models:/models" \
  masaicai/open-responses:latest
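
The POOLING_MODE setting determines how the ONNX model's per-token outputs are collapsed into a single sentence vector. A toy sketch of what the three modes compute (random numbers stand in for real token embeddings):

# Illustrative only: what the three pooling modes do to a model's per-token
# outputs. Shapes are (tokens, hidden_size); real values come from the model.
import numpy as np

token_embeddings = np.random.rand(7, 384)  # 7 tokens, 384-dim hidden states

mean_pooled = token_embeddings.mean(axis=0)  # POOLING_MODE=mean: average over all tokens
cls_pooled  = token_embeddings[0]            # POOLING_MODE=cls: take the first ([CLS]) token
max_pooled  = token_embeddings.max(axis=0)   # POOLING_MODE=max: element-wise maximum

print(mean_pooled.shape, cls_pooled.shape, max_pooled.shape)  # all (384,)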

Performance Considerations

  • OpenAI models: higher quality, but they add network latency and API usage costs
  • Local models: faster and work offline, but embedding quality may be lower
  • Custom ONNX models: flexible and configurable for specific use cases

Embedding generation happens when documents are uploaded and indexed. Vector similarity search performance depends on the vector database implementation in use.
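
As intuition for why the vector store matters, a naive search must score the query against every stored vector; real vector databases avoid this linear scan with specialized indexes. An illustrative brute-force sketch, not the service's implementation:

# Illustrative brute-force search: score a query against every stored vector.
# Real vector databases replace this linear scan with approximate indexes.
import numpy as np

def top_k(query: np.ndarray, vectors: np.ndarray, k: int = 3) -> np.ndarray:
    # Cosine similarity via normalized dot products, then take the k best.
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = v @ q
    return np.argsort(scores)[::-1][:k]  # indices of the k most similar vectors

store = np.random.rand(10_000, 384)  # 10k stored document embeddings
print(top_k(np.random.rand(384), store))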

Troubleshooting

Common Issues

Further Resources