The File Search tool provides direct access to your vector stores, enabling semantic search to find relevant content based on the meaning of your query rather than just keyword matching.
What It Is
The tool searches your document collections in a single pass and returns the matching passages directly, which makes it best suited to simple, well-defined queries that need immediate results.
Query: “What is the maximum connection timeout value?”
Results:
From: config/database.md
The maximum connection timeout value is 300 seconds (5 minutes). This can be configured in the database settings with the DB_CONNECTION_TIMEOUT environment variable. For high-latency networks, consider increasing this value.
File Search Tool in Action
Configuration
Basic configuration requires a query parameter:
Example Query
curl --location 'http://localhost:8080/v1/responses' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer $YOUR_API_KEY" \
  --data '{
    "model": "openai@gpt-4o",
    "tools": [
      {
        "type": "file_search",
        "vector_store_ids": ["vs_product_docs"]
      }
    ],
    "input": "What is the recommended database connection timeout?",
    "instructions": "Search the product documentation to find specific recommendations."
  }'
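The same request can be assembled from Python. The following is a minimal sketch using only the standard library; the endpoint, model name, and vector store ID are copied from the curl example above, while `build_request` is a hypothetical helper name introduced here for illustration. It only builds the request, so it can be inspected without a running server.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "http://localhost:8080/v1/responses"


def build_request(question: str, vector_store_ids: list[str], api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for a file_search-backed response.

    Hypothetical helper: the name and signature are illustrative only.
    """
    payload = {
        "model": "openai@gpt-4o",
        "tools": [{"type": "file_search", "vector_store_ids": vector_store_ids}],
        "input": question,
        "instructions": "Search the product documentation to find specific recommendations.",
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_request(
    "What is the recommended database connection timeout?",
    ["vs_product_docs"],
    api_key="sk-placeholder",  # placeholder value; supply your real key
)
# To send it against a running server:
#     with urllib.request.urlopen(request) as response:
#         result = json.load(response)
```

Keeping construction separate from sending makes the payload easy to log or unit-test before any network call is made.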
Configuration Parameters
| Parameter | Description | Default | Required |
| --- | --- | --- | --- |
| query | The search query | - | Yes |
| vector_store_ids | List of vector store IDs to search | - | Yes |
| filters | Optional filters to narrow search results | null | No |
| max_num_results | Maximum number of results to return | 20 | No |
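As an illustration of how these parameters fit together, here is a small sketch of a function that assembles a `file_search` tool entry and applies the defaults listed above. The function name `build_file_search_tool` is hypothetical, and the check that `max_num_results` is positive is an assumption rather than a documented rule.

```python
from typing import Any, Optional


def build_file_search_tool(
    query: str,
    vector_store_ids: list[str],
    filters: Optional[dict[str, Any]] = None,
    max_num_results: int = 20,
) -> dict[str, Any]:
    """Return a file_search tool entry using the defaults from the parameter table.

    Hypothetical helper: the name and validation rules are illustrative only.
    """
    if not query.strip():
        raise ValueError("query is required")
    if not vector_store_ids:
        raise ValueError("vector_store_ids must contain at least one ID")
    if max_num_results < 1:
        # Assumption: the documentation does not state a lower bound.
        raise ValueError("max_num_results must be a positive integer")

    tool: dict[str, Any] = {
        "type": "file_search",
        "query": query,
        "vector_store_ids": list(vector_store_ids),
        "max_num_results": max_num_results,
    }
    # filters defaults to null, so it is only included when provided.
    if filters is not None:
        tool["filters"] = filters
    return tool


tool = build_file_search_tool(
    "What is the maximum connection timeout value?",
    ["vs_product_docs"],
)
```

The resulting dictionary can be placed in the `tools` array of a request body.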
The response includes relevant content chunks with metadata:
{
  "data": [
    {
      "file_id": "file_abc123",
      "filename": "config/database.md",
      "score": 0.92,
      "content": "The maximum connection timeout value is 300 seconds (5 minutes). This can be configured in the database settings with the `DB_CONNECTION_TIMEOUT` environment variable. For high-latency networks, consider increasing this value.",
      "annotations": [
        {
          "type": "file_citation",
          "index": 0,
          "file_id": "file_abc123",
          "filename": "config/database.md"
        }
      ]
    },
    {
      "file_id": "file_def456",
      "filename": "deployment/networking.md",
      "score": 0.87,
      "content": "Connection timeout settings are crucial for system stability. Default values vary by service: database (300s), API services (60s), worker processes (120s).",
      "annotations": [
        {
          "type": "file_citation",
          "index": 0,
          "file_id": "file_def456",
          "filename": "deployment/networking.md"
        }
      ]
    }
  ]
}
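A response of this shape is straightforward to post-process. The sketch below, which assumes the structure shown above (a `data` list whose entries carry `filename`, `score`, and `content`), keeps only results above a score threshold and formats a short citation for each. The helper names `top_results` and `format_citation` are hypothetical, and the sample data is a trimmed copy of the response above.

```python
from typing import Any

# Trimmed copy of the sample response shown above.
sample_response: dict[str, Any] = {
    "data": [
        {
            "file_id": "file_abc123",
            "filename": "config/database.md",
            "score": 0.92,
            "content": "The maximum connection timeout value is 300 seconds (5 minutes).",
        },
        {
            "file_id": "file_def456",
            "filename": "deployment/networking.md",
            "score": 0.87,
            "content": "Connection timeout settings are crucial for system stability.",
        },
    ]
}


def top_results(response: dict[str, Any], min_score: float = 0.0) -> list[dict[str, Any]]:
    """Return results scoring at least min_score, highest score first."""
    results = [r for r in response.get("data", []) if r.get("score", 0.0) >= min_score]
    return sorted(results, key=lambda r: r.get("score", 0.0), reverse=True)


def format_citation(result: dict[str, Any]) -> str:
    """Render a one-line source reference for a single result."""
    return f'{result["filename"]} (score {result["score"]:.2f})'


for result in top_results(sample_response, min_score=0.9):
    print(format_citation(result))  # prints: config/database.md (score 0.92)
```

A threshold such as 0.9 is only an example; a suitable cutoff depends on your documents and embedding model.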
When to Use It
The File Search tool excels in specific scenarios where direct, immediate access to information is required.
| Recommended For | Consider Alternatives For |
| --- | --- |
| Simple, factual queries | Complex research questions |
| Finding specific documentation | Multi-faceted analysis |
| Known-item searches | Questions requiring synthesis of multiple sources |
| Code examples for specific tasks | Exploratory research without clear goals |
| Looking up configurations or settings | |
Use Cases for File Search
For complex questions requiring deep research and multiple search iterations, consider using the Agentic Search Tool instead.
Example Queries
Simple Factual Queries
{
  "query": "What is the maximum file upload size?"
}
Finding Code Examples
{
  "query": "Show me examples of error handling in async functions"
}
Looking Up Configuration
{
  "query": "Default Redis cache expiration settings"
}
Advanced Usage
The filters parameter allows you to narrow search results based on document metadata:
curl --location 'http://localhost:8080/v1/responses' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer $OPENAI_API_KEY" \
  --data '{
    "model": "openai@gpt-4o",
    "tools": [{
      "type": "file_search",
      "vector_store_ids": ["vs_security_docs"],
      "max_num_results": 5,
      "query": "Authentication best practices",
      "filters": {
        "type": "and",
        "filters": [
          {
            "type": "eq",
            "key": "compliance",
            "value": "GDPR"
          },
          {
            "type": "gt",
            "key": "updated_date",
            "value": "2023-01-01"
          }
        ]
      }
    }],
    "input": "What are the best practices for authentication that comply with GDPR?",
    "instructions": "Answer questions using information from the provided documents."
  }'
Filters improve search relevance by narrowing the search space to documents whose metadata matches. Use them whenever you know specific attributes of the documents you’re looking for.
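Nested filter objects are easy to get wrong by hand, so it can help to build them programmatically. The sketch below reproduces the filter from the example above using two hypothetical helpers, `comparison` and `all_of`. Only the `eq`, `gt`, and `and` types that appear on this page are handled; whether other operators exist is not assumed here.

```python
from typing import Any

Filter = dict[str, Any]

# Only the comparison types that appear in the examples on this page.
KNOWN_COMPARISONS = {"eq", "gt"}


def comparison(op: str, key: str, value: Any) -> Filter:
    """Build a single metadata comparison such as eq or gt (hypothetical helper)."""
    if op not in KNOWN_COMPARISONS:
        raise ValueError(f"unsupported comparison type in this sketch: {op}")
    return {"type": op, "key": key, "value": value}


def all_of(*filters: Filter) -> Filter:
    """Combine filters with a logical AND (hypothetical helper)."""
    if not filters:
        raise ValueError("all_of needs at least one filter")
    return {"type": "and", "filters": list(filters)}


# Same filter as the GDPR example above.
gdpr_since_2023 = all_of(
    comparison("eq", "compliance", "GDPR"),
    comparison("gt", "updated_date", "2023-01-01"),
)
```

The resulting dictionary can be passed as the `filters` value of a `file_search` tool entry.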
Multi-Store Searching
Searching across multiple vector stores allows you to find information across different document collections:
curl --location 'http://localhost:8080/v1/responses' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer $OPENAI_API_KEY" \
  --data '{
    "model": "openai@gpt-4o",
    "tools": [{
      "type": "file_search",
      "vector_store_ids": ["vs_technical_docs", "vs_code_examples", "vs_architecture"],
      "max_num_results": 5,
      "query": "API rate limiting implementation"
    }],
    "input": "Explain how to implement API rate limiting",
    "instructions": "Answer questions using information from the provided documents."
  }'
Best Practices
More specific queries yield better results than vague ones.
Good Example:
{
  "query": "How to configure JWT token expiration in the auth service"
}
Less Effective:
{
  "query": "JWT settings"
}
The vector search works best with natural language queries rather than keyword lists.
Good Example:
{
  "query": "What are the steps to set up a development environment for the project?"
}
Less Effective:
{
  "query": "setup development environment steps"
}
For the most precise results, combine semantic search with metadata filters.
curl --location 'http://localhost:8080/v1/responses' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer $OPENAI_API_KEY" \
  --data '{
    "model": "openai@gpt-4o",
    "tools": [{
      "type": "file_search",
      "vector_store_ids": ["vs_engineering_docs"],
      "max_num_results": 5,
      "query": "Database migration best practices",
      "filters": {
        "type": "and",
        "filters": [
          {
            "type": "eq",
            "key": "department",
            "value": "backend"
          },
          {
            "type": "eq",
            "key": "classification",
            "value": "internal"
          }
        ]
      }
    }],
    "input": "What are the best practices for database migrations?",
    "instructions": "Answer questions using information from the provided documents."
  }'
Integration Examples
curl --location 'http://localhost:8080/v1/responses' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer $OPENAI_API_KEY" \
  --data '{
    "model": "openai@gpt-4o",
    "tools": [{
      "type": "file_search",
      "vector_store_ids": ["vs_technical_docs"],
      "max_num_results": 5,
      "filters": {
        "type": "and",
        "filters": [
          {
            "type": "eq",
            "key": "category",
            "value": "documentation"
          },
          {
            "type": "eq",
            "key": "updated_after",
            "value": "2023-01-01"
          }
        ]
      }
    }],
    "input": "How do I implement rate limiting?",
    "instructions": "Answer questions using information from the provided documents."
  }'
The file_search tool automatically performs the search and provides the results to the AI model in the same request. This creates a seamless retrieval-augmented generation (RAG) experience in which the model can access and use the information without additional API calls.
GitHub Example Implementation
For a complete working example of the File Search tool, check out this Python example on GitHub. This example demonstrates:
Uploading files and creating vector stores
Setting up and configuring the FileSearchTool
Performing direct searches with different parameters
Comparing results with the Agentic Search approach
Integration with the OpenAI Agents SDK
Using with Custom Search Logic
For applications that need more control over the search process, you can implement custom search logic:
import OpenAI from "openai";

// Reads OPENAI_API_KEY from the environment; pass baseURL to target another server.
const openai = new OpenAI();

async function enhancedResponseWithSearch(query, topic) {
  // Restrict the search to documents tagged with the requested topic
  const searchFilters = {
    type: "and",
    filters: [
      { type: "eq", key: "topic", value: topic }
    ]
  };

  // The file_search tool runs the filtered search and passes the results to the model
  const response = await openai.responses.create({
    model: "gpt-4o",
    tools: [{
      type: "file_search",
      vector_store_ids: ["vs_docs", "vs_code_examples"],
      max_num_results: 3,
      filters: searchFilters
    }],
    input: query,
    instructions: "Provide a detailed answer based on the documentation."
  });

  return response;
}
| Feature | File Search | Agentic Search |
| --- | --- | --- |
| Query complexity | Simple | Complex |
| Search iterations | Single | Multiple |
| Response time | Fast | Slower |
| Result quality | Direct matches | Comprehensive |
| Best for | Known-item searches | Research questions |
File Search vs. Agentic Search