The Embedding Service transforms text content into vector embeddings that enable semantic search and similarity comparisons in your RAG implementation.

Overview

Embeddings are numerical representations of text that capture semantic meaning. The application uses these vectors to power search functionality and document comparisons. By default, a local embedding model is used, but you can configure the system to use OpenAI’s embedding API or custom ONNX models.
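
As a concrete illustration (a toy sketch, not the service's actual code), cosine similarity over embedding vectors is the usual way such semantic comparisons are scored:

# Illustrative sketch only: how embedding vectors support semantic comparison.
# The vectors here are made-up stand-ins; the service produces real ones
# (e.g. 384 dimensions for the default local model).
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-dimensional embeddings for a query and two documents.
query      = [0.9, 0.1, 0.0, 0.2]
doc_close  = [0.8, 0.2, 0.1, 0.3]   # semantically close to the query
doc_far    = [0.1, 0.9, 0.7, 0.0]   # semantically distant

print(cosine_similarity(query, doc_close))  # high score -> relevant
print(cosine_similarity(query, doc_far))    # low score  -> less relevant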

Quick Setup

For most users, the default embedding configuration works out of the box. You can easily customize it using environment variables in your deployment.

Environment Variables

  • OPEN_RESPONSES_EMBEDDINGS_MODEL_TYPE (default: all-minilm-l6-v2): The type of embedding model to use (all-minilm-l6-v2 or onnx)
  • OPEN_RESPONSES_EMBEDDINGS_HTTP_ENABLED (default: false): Whether to use the OpenAI API (true) or the local model (false)
  • OPEN_RESPONSES_EMBEDDINGS_API_KEY: Your OpenAI API key (required when HTTP_ENABLED is true)
  • OPEN_RESPONSES_EMBEDDINGS_MODEL: The OpenAI model to use (when HTTP_ENABLED is true)
  • OPEN_RESPONSES_EMBEDDINGS_URL: The base URL for the OpenAI API (when HTTP_ENABLED is true)
  • OPEN_RESPONSES_EMBEDDINGS_ONNX_MODEL_PATH: Path to a custom ONNX model file (when MODEL_TYPE is onnx)
  • OPEN_RESPONSES_EMBEDDINGS_TOKENIZER_PATH: Path to a custom tokenizer JSON file (when MODEL_TYPE is onnx)
  • OPEN_RESPONSES_EMBEDDINGS_POOLING_MODE (default: mean): Pooling mode for ONNX models: mean, cls, or max

Embedding Configuration Options

Supported Models

Default Local Model

By default, the application uses the AllMiniLmL6V2 model, which offers:

  • Fast, efficient embedding generation
  • 384-dimensional vectors
  • Good balance of performance and quality
  • No external API dependencies

Example docker-compose setup for the default model:

services:
  app:
    image: masaicai/open-responses:latest
    # No specific embedding environment variables needed for default setup

Or using the docker run command:

docker run -p 8080:8080 masaicai/open-responses:latest
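
The default model corresponds to the widely used all-MiniLM-L6-v2 sentence transformer. If you want to inspect the kind of vectors it produces outside the service, a quick check with the sentence-transformers Python package works; this assumes the built-in model matches the public checkpoint:

# Optional sanity check outside the service, using the sentence-transformers
# package (pip install sentence-transformers). Assumes the service's default
# model is equivalent to the public all-MiniLM-L6-v2 checkpoint.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
vector = model.encode("Semantic search turns text into vectors.")
print(len(vector))  # 384, matching the dimensionality noted above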

OpenAI Models

For higher-quality embeddings, you can use OpenAI’s embedding models:

services:
  app:
    image: masaicai/open-responses:latest
    environment:
      - OPEN_RESPONSES_EMBEDDINGS_HTTP_ENABLED=true
      - OPEN_RESPONSES_EMBEDDINGS_API_KEY=your-openai-api-key
      - OPEN_RESPONSES_EMBEDDINGS_MODEL=text-embedding-3-small

Or using the docker run command:

docker run -p 8080:8080 \
  -e OPEN_RESPONSES_EMBEDDINGS_HTTP_ENABLED=true \
  -e OPEN_RESPONSES_EMBEDDINGS_API_KEY=your-openai-api-key \
  -e OPEN_RESPONSES_EMBEDDINGS_MODEL=text-embedding-3-small \
  masaicai/open-responses:latest

Benefits of OpenAI models:

  • Higher quality embeddings
  • More dimensions (1536 for text-embedding-3-small)
  • Better semantic understanding

Trade-offs:

  • Requires internet connectivity
  • Incurs API usage costs
  • Adds network latency
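
Independent of this service, you can verify your API key and model choice with a direct call through the openai Python SDK; a minimal sketch, assuming the openai package is installed and OPENAI_API_KEY is set in your environment:

# Minimal sketch of a direct OpenAI embeddings request, useful for checking
# your API key and model name before wiring them into the service.
# Assumes: pip install openai, and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Semantic search turns text into vectors.",
)
print(len(response.data[0].embedding))  # 1536 dimensions for this model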

Custom ONNX Models

Advanced users can supply their own ONNX models:

services:
  app:
    image: masaicai/open-responses:latest
    environment:
      - OPEN_RESPONSES_EMBEDDINGS_MODEL_TYPE=onnx
      - OPEN_RESPONSES_EMBEDDINGS_ONNX_MODEL_PATH=/models/your-model.onnx
      - OPEN_RESPONSES_EMBEDDINGS_TOKENIZER_PATH=/models/your-tokenizer.json
      - OPEN_RESPONSES_EMBEDDINGS_POOLING_MODE=mean
    volumes:
      - ./models:/models

Or using the docker run command:

docker run -p 8080:8080 \
  -e OPEN_RESPONSES_EMBEDDINGS_MODEL_TYPE=onnx \
  -e OPEN_RESPONSES_EMBEDDINGS_ONNX_MODEL_PATH=/models/your-model.onnx \
  -e OPEN_RESPONSES_EMBEDDINGS_TOKENIZER_PATH=/models/your-tokenizer.json \
  -e OPEN_RESPONSES_EMBEDDINGS_POOLING_MODE=mean \
  -v "$(pwd)/models:/models" \
  masaicai/open-responses:latest
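
The POOLING_MODE setting determines how the ONNX model's per-token outputs are collapsed into a single sentence vector. A toy sketch of what the three modes compute (random numbers stand in for real token embeddings):

# Illustrative only: what the three pooling modes do to a model's per-token
# outputs. Shapes are (tokens, hidden_size); real values come from the model.
import numpy as np

token_embeddings = np.random.rand(7, 384)  # 7 tokens, 384-dim hidden states

mean_pooled = token_embeddings.mean(axis=0)  # POOLING_MODE=mean: average over all tokens
cls_pooled  = token_embeddings[0]            # POOLING_MODE=cls: take the first ([CLS]) token
max_pooled  = token_embeddings.max(axis=0)   # POOLING_MODE=max: element-wise maximum

print(mean_pooled.shape, cls_pooled.shape, max_pooled.shape)  # all (384,)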

Performance Considerations

  • OpenAI models: higher quality, but they add network latency and API usage costs
  • Local models: faster and work offline, but embedding quality may be lower
  • Custom ONNX models: flexible and configurable for specific use cases

Embedding generation happens when documents are uploaded and indexed. Vector similarity search performance depends on the vector database implementation in use.
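
As intuition for why the vector store matters, a naive search must score the query against every stored vector; real vector databases avoid this linear scan with specialized indexes. An illustrative brute-force sketch, not the service's implementation:

# Illustrative brute-force search: score a query against every stored vector.
# Real vector databases replace this linear scan with approximate indexes.
import numpy as np

def top_k(query: np.ndarray, vectors: np.ndarray, k: int = 3) -> np.ndarray:
    # Cosine similarity via normalized dot products, then take the k best.
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = v @ q
    return np.argsort(scores)[::-1][:k]  # indices of the k most similar vectors

store = np.random.rand(10_000, 384)  # 10k stored document embeddings
print(top_k(np.random.rand(384), store))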

Troubleshooting

Common Issues

Further Resources