The Search API lets you run semantic searches across your vector stores to find relevant document content. It is the core of the retrieval-augmented generation (RAG) system, retrieving information by meaning rather than by keywords alone.

The official documentation for the Search API is available here.

Key Benefits

  • Semantic Understanding: Find content based on meaning, not just exact keyword matches
  • Contextual Results: Retrieve document chunks with surrounding context preserved
  • Flexible Filtering: Narrow results by document attributes and metadata
  • AI Integration: Connect search results directly to AI responses

API Endpoints

Search a Vector Store

Search for relevant content within a vector store.

Endpoint: POST /v1/vector_stores/{vector_store_id}/search
Content-Type: application/json

Path Parameters:

  • vector_store_id: The ID of the vector store to search (required)

Request Body:

  • query: The search query string (required)
  • max_num_results: Maximum number of results to return (default: 10)
  • filters: Optional filters to apply based on file attributes
  • rewrite_query: Whether to rewrite the query for better vector search (default: false)
  • ranking_options: Options for ranking/scoring results

Example Request:

curl --location 'http://localhost:8080/v1/vector_stores/vs_67f8046b1ad5d3000012/search' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $YOUR_API_KEY" \
--data '{
  "query": "When did empress matilda reached england?",
  "max_num_results": 2,
  "filters": {
    "type": "and",
    "filters": [
      {
        "type": "eq",
        "key": "language",
        "value": "en"
      },
      {
        "type": "eq",
        "key": "category",
        "value": "api_reference" 
      }
    ]
  },
  "ranking_options": {
    "score_threshold": 0.7
  }
}'

Example Response:

{
    "object": "vector_store.search_results.page",
    "search_query": "When did empress matilda reached england?",
    "data": [
        {
            "file_id": "open-responses-file_67f803701ad5d3000002.pdf",
            "filename": "Empress Matilda.pdf",
            "score": 0.8449070359549928,
            "attributes": {
                "category": "api_reference",
                "language": "en",
                "version": "2.0",
                "filename": "Empress Matilda.pdf",
                "chunk_index": 10,
                "total_chunks": 14,
                "chunk_id": "c_67f8084e1ad5d3000039",
                "file_id": "open-responses-file_67f803701ad5d3000002.pdf"
            },
            "content": [
                {
                    "type": "text",
                    "text": "In 1108, Henry left Matilda..."
                }
            ]
        },
        {
            "file_id": "open-responses-file_67f803701ad5d3000002.pdf",
            "filename": "Empress Matilda.pdf",
            "score": 0.8283641913829363,
            "attributes": {
                "category": "api_reference",
                "language": "en",
                "version": "2.0",
                "filename": "Empress Matilda.pdf",
                "chunk_index": 0,
                "total_chunks": 14,
                "chunk_id": "c_67f8084d1ad5d300002f",
                "file_id": "open-responses-file_67f803701ad5d3000002.pdf"
            },
            "content": [
                {
                    "type": "text",
                    "text": "Matilda Depiction of Matilda in the 12th-century..."
                }
            ]
        }
    ],
    "has_more": false,
    "next_page": null
}
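
The same request can also be built from Python. The following is a minimal sketch using only the standard library. The helper name build_search_request is hypothetical, and the base URL, vector store ID, and API key are the placeholder values from the curl example above. The sketch only constructs the request; sending it requires a running server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumed local deployment, as in the curl example


def build_search_request(vector_store_id, query, api_key, max_num_results=10,
                         filters=None, score_threshold=None):
    """Build (but do not send) a POST request for the vector store search endpoint."""
    body = {"query": query, "max_num_results": max_num_results}
    if filters is not None:
        body["filters"] = filters
    if score_threshold is not None:
        body["ranking_options"] = {"score_threshold": score_threshold}
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/vector_stores/{vector_store_id}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_search_request(
    "vs_67f8046b1ad5d3000012",
    "When did empress matilda reached england?",
    api_key="YOUR_API_KEY",  # placeholder, replace with a real key
    max_num_results=2,
    score_threshold=0.7,
)

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     results = json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test before any network call is made.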

Request Parameters in Detail

Query

The query parameter is a natural language question or statement that describes the information you’re looking for. The system converts this query into a vector embedding and finds document chunks with similar embeddings.

"query": "How do I reset my password?"

Straightforward queries work well for specific information needs.

Max Number of Results

The max_num_results parameter controls how many results are returned. The default is 10, but you can adjust this based on your needs:

"max_num_results": 5

For applications that need more context, increase this value, but be aware that higher values may include less relevant results.

Filters

The filters parameter lets you narrow search results based on file attributes. Filters follow a structured format that enables both simple and complex filtering conditions.

Filter Types

The API supports two types of filters:

  • Comparison Filters: Compare a property value against a specific value
  • Compound Filters: Combine multiple filters using logical operators

Comparison Operators

Operator  Description               Example
eq        Equal to                  Match where language is “en”
ne        Not equal to              Match where status is not “archived”
gt        Greater than              Match where version > 1.0
gte       Greater than or equal to  Match where created_at ≥ 1672531200
lt        Less than                 Match where priority < 3
lte       Less than or equal to     Match where size ≤ 1024

Basic Filter Examples

"filters": {
  "type": "eq",
  "key": "language",
  "value": "en"
}

Find documents where the language attribute equals “en”.

Compound Filters

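Combine multiple conditions using logical operators. A compound filter has a type naming the operator and a filters array of nested filters, as in the and filter of the search request example above.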
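
If you build filters programmatically, small helpers can keep the nesting readable. The sketch below is illustrative only: the function names comparison and all_of are hypothetical, and it covers just the and operator because that is the only logical operator shown on this page.

```python
COMPARISON_OPERATORS = {"eq", "ne", "gt", "gte", "lt", "lte"}


def comparison(operator, key, value):
    """Return a comparison filter such as {"type": "eq", "key": ..., "value": ...}."""
    if operator not in COMPARISON_OPERATORS:
        raise ValueError(f"unsupported comparison operator: {operator}")
    return {"type": operator, "key": key, "value": value}


def all_of(*filters):
    """Return a compound filter that matches only when every nested filter matches."""
    return {"type": "and", "filters": list(filters)}


# Same filter as in the search request example: English API reference documents.
language_and_category = all_of(
    comparison("eq", "language", "en"),
    comparison("eq", "category", "api_reference"),
)
```

The resulting dictionary can be passed as the filters value of a search request body.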

Advanced Use Cases

Date Range Filtering

"filters": {
  "type": "and",
  "filters": [
    {
      "type": "gte",
      "key": "created_at",
      "value": 1672531200
    },
    {
      "type": "lt",
      "key": "created_at",
      "value": 1704067200
    }
  ]
}

Filter documents created in 2023 (using Unix timestamps).
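
The two timestamps above are the first second of 2023 and the first second of 2024 in UTC. A minimal sketch for deriving such a range is shown below. It assumes that created_at is stored as Unix seconds in UTC, and the helper name year_range_filter is hypothetical.

```python
from datetime import datetime, timezone


def year_range_filter(key, year):
    """Build an and-filter matching timestamps inside the given calendar year (UTC)."""
    start = int(datetime(year, 1, 1, tzinfo=timezone.utc).timestamp())
    end = int(datetime(year + 1, 1, 1, tzinfo=timezone.utc).timestamp())
    return {
        "type": "and",
        "filters": [
            {"type": "gte", "key": key, "value": start},  # inclusive lower bound
            {"type": "lt", "key": key, "value": end},      # exclusive upper bound
        ],
    }


filters_2023 = year_range_filter("created_at", 2023)
```

Using gte for the start and lt for the end gives a half-open range, so consecutive years never overlap.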

Boolean Filters

"filters": {
  "type": "and",
  "filters": [
    {
      "type": "eq",
      "key": "is_approved",
      "value": true
    },
    {
      "type": "eq",
      "key": "is_public",
      "value": true
    }
  ]
}

Find approved public documents.

When designing your document attributes, consider what filtering capabilities you’ll need and ensure your attributes are structured accordingly.

Ranking Options

The ranking_options parameter configures how results are scored and ranked:

"ranking_options": {
  "score_threshold": 0.7
}

  • score_threshold: Minimum similarity score (0.0-1.0) a result must reach to be included

Response Format in Detail

The search response includes:

Search Query

The original query string used for the search:

"search_query": "How do I configure the system?"

Data Array

An array of search results, each containing:

  • file_id: ID of the file containing the result
  • filename: Name of the file
  • score: Similarity score (0.0-1.0)
  • attributes: File attributes for filtering
  • content: Array of content segments

The results are ordered by descending score (most relevant first).

Pagination Information

  • has_more: Boolean indicating if there are more results available
  • next_page: Token for retrieving the next page of results (if available)
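
A client can use these two fields to walk through all pages. This page does not specify how the next_page token is sent back to the server, so the sketch below leaves that to a hypothetical fetch_page callable, which takes a token (None for the first page) and returns one parsed response. The in-memory pages are made-up stand-ins so the loop can be tried offline.

```python
def iter_all_results(fetch_page):
    """Yield every result across pages, following has_more and next_page."""
    token = None
    while True:
        page = fetch_page(token)
        yield from page["data"]
        if not page.get("has_more") or page.get("next_page") is None:
            break
        token = page["next_page"]


# Stand-in for real HTTP calls: two fake pages keyed by page token.
fake_pages = {
    None: {"data": [{"chunk_index": 0}, {"chunk_index": 1}],
           "has_more": True, "next_page": "page_2"},
    "page_2": {"data": [{"chunk_index": 2}],
               "has_more": False, "next_page": None},
}

all_results = list(iter_all_results(fake_pages.__getitem__))
```

Because the function is a generator, a caller can stop early without fetching the remaining pages.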

Using the file_search Tool

The file_search tool provides an alternative way to search vector stores: through the Responses API (POST /v1/responses) instead of the search endpoint. This is especially useful for integrating search into conversational AI flows.

Tool Usage with OpenAI Models

curl --location 'http://localhost:8080/v1/responses' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $OPENAI_API_KEY" \
--header 'x-model-provider: openai' \
--data '{
    "model": "gpt-4o",
    "tools": [{
      "type": "file_search",
      "vector_store_ids": ["vs_abc123"],
      "max_num_results": 5,
      "filters": {
        "type": "and",
        "filters": [
          {
            "type": "eq",
            "key": "category",
            "value": "documentation"
          },
          {
            "type": "eq",
            "key": "language",
            "value": "en"
          }
        ]
      }
    }],
    "input": "How do I configure the system?",
    "instructions": "Answer questions using information from the provided documents."
}'

The file_search tool automatically performs the search and provides the results to the AI model in the same request. This creates a seamless RAG experience.
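
The request body from the curl example can be assembled in Python as follows. This is a sketch only: the helper name build_file_search_payload is hypothetical, and the field names are taken directly from the example above. Sending it requires the same headers as the curl call, including x-model-provider.

```python
import json


def build_file_search_payload(model, user_input, vector_store_ids,
                              max_num_results=5, filters=None, instructions=None):
    """Assemble a /v1/responses request body that includes a file_search tool."""
    tool = {
        "type": "file_search",
        "vector_store_ids": list(vector_store_ids),
        "max_num_results": max_num_results,
    }
    if filters is not None:
        tool["filters"] = filters
    payload = {"model": model, "tools": [tool], "input": user_input}
    if instructions is not None:
        payload["instructions"] = instructions
    return payload


payload = build_file_search_payload(
    model="gpt-4o",
    user_input="How do I configure the system?",
    vector_store_ids=["vs_abc123"],
    filters={"type": "eq", "key": "language", "value": "en"},
    instructions="Answer questions using information from the provided documents.",
)
request_body = json.dumps(payload)  # string to send as the POST body
```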

Advanced Search Techniques

Query Rewriting (Coming Soon)

Query rewriting is not yet available. Once it is released, you will be able to enable it to improve search results:

"rewrite_query": true

When enabled, the system will automatically reformulate the query to improve vector similarity matching. This is particularly helpful for:

  • Queries with ambiguous terms
  • Questions with implicit context
  • Queries that could benefit from expansion with related terms

Best Practices

Query Optimization

  1. Be specific and clear: Craft precise queries rather than broad ones. “How do I reset a user password?” is better than “password reset.”
  2. Use natural language: Frame queries as complete questions or statements rather than keyword lists.
  3. Include context: If searching for domain-specific information, include relevant context in the query.
  4. Use filters effectively: Combine query text with appropriate filters to narrow down the search space.

Performance Considerations

  • Keep max_num_results reasonable (5-20) for better performance
  • Use filters to reduce the search space when applicable
  • Consider implementing client-side caching for frequent queries
  • For very large vector stores, use pagination to handle results efficiently
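
As one possible shape for the client-side caching mentioned above, the sketch below memoizes responses by their request parameters. CachedSearch and the search_fn callable are hypothetical names. search_fn stands for whatever function performs the real request, and fake_search replaces it here so the example runs offline.

```python
import json


class CachedSearch:
    """Memoize search responses by (vector store, query, filters, max results)."""

    def __init__(self, search_fn):
        self._search_fn = search_fn  # hypothetical callable performing the real request
        self._cache = {}

    def search(self, vector_store_id, query, filters=None, max_num_results=10):
        # Serializing with sort_keys gives a stable, hashable key for nested filters.
        key = json.dumps([vector_store_id, query, filters, max_num_results],
                         sort_keys=True)
        if key not in self._cache:
            self._cache[key] = self._search_fn(
                vector_store_id, query, filters, max_num_results)
        return self._cache[key]


calls = []


def fake_search(vector_store_id, query, filters, max_num_results):
    """Stand-in for a real search call; records each invocation."""
    calls.append(query)
    return {"search_query": query, "data": []}


cached = CachedSearch(fake_search)
first = cached.search("vs_abc123", "How do I reset my password?")
second = cached.search("vs_abc123", "How do I reset my password?")  # cache hit
```

This sketch never evicts entries, so a real deployment would likely add a size limit or expiry and clear the cache when documents in the vector store change.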

Integration Patterns

Direct API Integration

Integrate the Search API directly into your application’s backend:

  1. Capture user query from your application UI
  2. Send search request to the Search API
  3. Process and display results in your application
  4. Optionally use results to enhance subsequent AI interactions
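
Steps 3 and 4 above might look like the following sketch, which turns a parsed search response into a context string that can be displayed or passed to a later AI call. The function name results_to_context and the output layout are assumptions. The sample data is abbreviated from the example response earlier on this page.

```python
def results_to_context(search_response):
    """Join the text segments of each result into one line per result."""
    lines = []
    for result in search_response["data"]:
        text = " ".join(
            segment["text"]
            for segment in result["content"]
            if segment.get("type") == "text"
        )
        lines.append(f"[{result['filename']} | score {result['score']:.2f}] {text}")
    return "\n".join(lines)


sample_response = {
    "data": [
        {
            "filename": "Empress Matilda.pdf",
            "score": 0.8449070359549928,
            "content": [{"type": "text", "text": "In 1108, Henry left Matilda..."}],
        },
        {
            "filename": "Empress Matilda.pdf",
            "score": 0.8283641913829363,
            "content": [{"type": "text",
                         "text": "Matilda Depiction of Matilda in the 12th-century..."}],
        },
    ]
}

context = results_to_context(sample_response)
```

Including the file name and score with each line keeps the source of every passage visible when the context is shown to users or models.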

AI-Driven Search with file_search Tool

Let the AI assistant drive the search interaction:

  1. Configure AI with the file_search tool
  2. User asks a question to the AI
  3. AI determines when to use the search tool and formulates appropriate queries
  4. AI incorporates search results into its response
  5. User receives a cohesive answer combining search results with AI capabilities