The RAG (Retrieval-Augmented Generation) system enhances your AI model outputs with relevant information retrieved from your document collections, enabling more accurate, informative, and contextually relevant responses.

System Overview

The RAG system exposes several APIs that you can use in your applications:

| Feature | Files API | Vector Store API | Search API |
| --- | --- | --- | --- |
| Purpose | Document management | Embedding storage | Content retrieval |
| Endpoints | /v1/files | /v1/vector_stores | /v1/vector_stores/:id/search |
| Input | PDF, TXT, etc. | File IDs | Query, filters |
| Output | File metadata | Store metadata | Content chunks |

RAG System API Overview

How RAG Works

  1. You upload documents via the Files API
  2. You add these documents to vector stores using the Vector Store API
  3. The system automatically processes documents, extracting and embedding their content
  4. Your application or AI assistants query for relevant information using the Search API
  5. You enhance AI responses with the retrieved information

Detailed RAG API Data Flow
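The flow above can be sketched as a small Python client. This is a minimal sketch using only the standard library; the base URL and endpoint paths follow the curl examples in this guide, while the API_KEY placeholder and the main() helper are assumptions you would adapt to your deployment.

```python
"""Minimal end-to-end RAG workflow sketch (stdlib only)."""
import json
import urllib.request

BASE = "http://localhost:8080"
API_KEY = "your_api_key"  # assumption: replace with your provider's key


def post_json(path, payload):
    """POST a JSON payload to the Open Responses server and decode the reply."""
    req = urllib.request.Request(
        BASE + path,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def search_payload(query, max_results=5):
    """Build the request body for /v1/vector_stores/:id/search."""
    return {"query": query, "max_num_results": max_results}


def main():
    # Call this against a running server; it mirrors steps 2-4 above.
    store = post_json("/v1/vector_stores", {"name": "My Knowledge Base"})
    results = post_json(
        f"/v1/vector_stores/{store['id']}/search",
        search_payload("How do I configure the system?"),
    )
    print(results)
```

File upload (step 1) uses multipart form data, which is easier with the curl command shown later in this guide.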

Getting Started

Prerequisites

  • An API key from your LLM provider for authentication
  • Access to the Open Responses API endpoints
  • Documents you want to make searchable

Step-by-Step Integration Guide

1. Configure Your Environment

Set up your environment variables:

# For OpenAI integration
export OPENAI_API_KEY=your_openai_api_key

# For Groq integration
export GROQ_API_KEY=your_groq_api_key

# For embedding configuration (optional)
export OPEN_RESPONSES_EMBEDDINGS_HTTP_ENABLED=true
export OPEN_RESPONSES_EMBEDDINGS_API_KEY=your_openai_api_key
export OPEN_RESPONSES_EMBEDDINGS_MODEL=text-embedding-3-small

For detailed embedding configuration options, see the Embedding Models documentation.

2. Upload Documents

Upload documents using the Files API:

curl --location 'http://localhost:8080/v1/files' \
--header "Authorization: Bearer $YOUR_API_KEY" \
--form 'file=@/path/to/your/document.pdf' \
--form 'purpose=user_data'

Response:

{
  "id": "file_abc123",
  "object": "file",
  "bytes": 12345,
  "created_at": 1677610602,
  "filename": "document.pdf",
  "purpose": "user_data"
}

3. Create a Vector Store

Create a vector store to hold document embeddings:

curl --location 'http://localhost:8080/v1/vector_stores' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $YOUR_API_KEY" \
--data '{
  "name": "My Knowledge Base"
}'

Response:

{
  "id": "vs_def456",
  "object": "vector_store",
  "created_at": 1677610602,
  "name": "My Knowledge Base",
  "file_count": 0,
  "metadata": {}
}

4. Add Files to the Vector Store

Add your uploaded files to the vector store:

curl --location 'http://localhost:8080/v1/vector_stores/vs_def456/files' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $YOUR_API_KEY" \
--data '{
  "file_id": "file_abc123",
  "chunking_strategy": {
    "type": "static",
    "static": {
      "max_chunk_size_tokens": 1000,
      "chunk_overlap_tokens": 200
    }
  },
  "attributes": {
    "category": "documentation",
    "language": "en"
  }
}'

Response:

{
  "id": "vsfile_ghi789",
  "object": "vector_store.file",
  "created_at": 1677610603,
  "vector_store_id": "vs_def456",
  "status": "processing",
  "usage_bytes": 0,
  "chunking_strategy": {
    "type": "static",
    "static": {
      "max_chunk_size_tokens": 1000,
      "chunk_overlap_tokens": 200
    }
  },
  "attributes": {
    "category": "documentation",
    "language": "en"
  }
}

5. Search the Vector Store

Search for relevant content:

curl --location 'http://localhost:8080/v1/vector_stores/vs_def456/search' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $YOUR_API_KEY" \
--data '{
  "query": "How do I configure the system?",
  "max_num_results": 5,
  "filters": {
    "type": "eq",
    "key": "language",
    "value": "en"
  }
}'
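Once results come back, your application typically concatenates the retrieved chunks into context for the model (step 5 of the workflow). The sketch below shows this assembly step; note that the response shape used here (a data list with content text and a relevance score) is an assumption for illustration, not a documented schema.

```python
# Assumed example response shape -- check the Search API documentation
# for the actual fields returned by your server version.
sample_response = {
    "data": [
        {"content": "Set OPENAI_API_KEY before starting the server.", "score": 0.91},
        {"content": "Vector stores are created via /v1/vector_stores.", "score": 0.84},
    ]
}


def build_context(response, min_score=0.5):
    """Join result chunks above a relevance threshold into one context block."""
    chunks = [
        r["content"]
        for r in response["data"]
        if r.get("score", 0) >= min_score
    ]
    return "\n\n".join(chunks)


context = build_context(sample_response)
# Pass `context` to your model alongside the user's question.
```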

Integration with AI Models

Using the file_search Tool with OpenAI

You can integrate RAG with OpenAI models by using the file_search tool:

curl --location 'http://localhost:8080/v1/responses' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $OPENAI_API_KEY" \
--header 'x-model-provider: openai' \
--data '{
    "model": "gpt-4o",
    "store": true,
    "tools": [{
      "type": "file_search",
      "vector_store_ids": ["vs_def456"],
      "max_num_results": 5,
      "filters": {
        "type": "and",
        "filters": [
          {
            "type": "eq",
            "key": "language",
            "value": "en"
          },
          {
            "type": "eq",
            "key": "category",
            "value": "documentation"
          }
        ]
      }
    }],
    "input": "How do I configure the system?",
    "instructions": "Answer questions using information from the provided documents.",
    "stream": false
  }'
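If you build this request body in code rather than by hand, small helpers keep the nested tool and filter structure readable. This is a hedged sketch: the eq, all_of, and file_search_tool helper names are illustrative, but the dictionaries they produce match the file_search tool payload shown above.

```python
# Hypothetical helpers for composing the file_search tool configuration.
def eq(key, value):
    """A single equality filter clause."""
    return {"type": "eq", "key": key, "value": value}


def all_of(*clauses):
    """Combine filter clauses with a logical AND."""
    return {"type": "and", "filters": list(clauses)}


def file_search_tool(store_ids, max_results=5, filters=None):
    """Build the tools[] entry for a file_search request."""
    tool = {
        "type": "file_search",
        "vector_store_ids": list(store_ids),
        "max_num_results": max_results,
    }
    if filters:
        tool["filters"] = filters
    return tool


tool = file_search_tool(
    ["vs_def456"],
    filters=all_of(eq("language", "en"), eq("category", "documentation")),
)
```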

Advanced Usage

Chunking Strategies

When adding files to a vector store, you can specify how the documents should be chunked:

"chunking_strategy": {
  "type": "static",
  "static": {
    "max_chunk_size_tokens": 1000,
    "chunk_overlap_tokens": 200
  }
}

Document Chunking and Embedding Process
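The static strategy above amounts to sliding a fixed-size window over the document's tokens with a fixed overlap. The sketch below illustrates the idea; it is not the server's implementation, and it counts list elements rather than real model tokens, which depend on the embedding model's tokenizer.

```python
# Illustrative static chunking: fixed-size windows with overlap.
def chunk_tokens(tokens, max_chunk_size=1000, overlap=200):
    """Split a token sequence into overlapping chunks."""
    assert overlap < max_chunk_size
    step = max_chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_chunk_size])
        if start + max_chunk_size >= len(tokens):
            break  # last window already reached the end
    return chunks


# A 2200-token document with the settings above yields three chunks,
# each consecutive pair sharing 200 tokens.
chunks = chunk_tokens(list(range(2200)), max_chunk_size=1000, overlap=200)
```

Larger chunks preserve more context per result; smaller chunks make individual matches more precise, at the cost of more embeddings.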

Filtering Results

You can filter search results using file attributes with a structured filter format:

"filters": {
  "type": "and",
  "filters": [
    {
      "type": "eq",
      "key": "language",
      "value": "en"
    },
    {
      "type": "eq",
      "key": "category",
      "value": "documentation"
    },
    {
      "type": "gte",
      "key": "version",
      "value": 2.0
    }
  ]
}

The filter system supports comparison operators (eq, ne, gt, gte, lt, lte) and logical operators (and, or) for powerful and flexible search filtering. For detailed information, see the Search API documentation.
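To make the filter semantics concrete, here is a small evaluator that applies a filter of this format to a file's attributes. It is a hypothetical helper for reasoning about filters client-side, not the server's implementation.

```python
import operator

# Comparison operators from the filter format mapped to Python operators.
_OPS = {
    "eq": operator.eq, "ne": operator.ne,
    "gt": operator.gt, "gte": operator.ge,
    "lt": operator.lt, "lte": operator.le,
}


def matches(filt, attributes):
    """Return True if `attributes` satisfies the structured filter `filt`."""
    kind = filt["type"]
    if kind in ("and", "or"):
        results = (matches(f, attributes) for f in filt["filters"])
        return all(results) if kind == "and" else any(results)
    return _OPS[kind](attributes.get(filt["key"]), filt["value"])


doc = {"language": "en", "category": "documentation", "version": 2.1}
combined = {
    "type": "and",
    "filters": [
        {"type": "eq", "key": "language", "value": "en"},
        {"type": "gte", "key": "version", "value": 2.0},
    ],
}
ok = matches(combined, doc)  # the document satisfies both clauses
```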

Best Practices

1. Document Preparation

Use clear, well-structured documents for better search results. Split large documents into logical sections before uploading.

2. Vector Store Organization

Create separate vector stores for different domains or use cases. Use file attributes to organize documents within a vector store.

3. Query Optimization

Craft clear, specific queries for better results. Use filters to narrow the search scope. Adjust the chunking strategy based on your content and query patterns.

4. Performance Considerations

Balance chunk size for optimal search precision and context. Limit the number of search results to what you actually need. Consider caching frequent search results.

API Reference

For detailed information on individual APIs, refer to the dedicated documentation: