Vector Store API
Create and manage collections of document embeddings for semantic search
The Vector Store API enables you to create and manage vector stores—specialized collections that transform your documents into searchable embeddings for semantic retrieval.
The official documentation for the Vector Store API is available here.
Key Concepts
Vector Stores
Collections that organize and index document embeddings
Document Embeddings
Numerical representations of document content that capture semantic meaning
Chunking Strategies
Methods for dividing documents into semantically meaningful segments
File Attributes
Metadata tags that enable filtering and organizing documents
API Endpoints
Create a Vector Store
Create a new vector store to hold document embeddings.
Endpoint: POST /v1/vector_stores
Content-Type: application/json
Request Body:
- name: Name of the vector store (required)
- metadata: Custom metadata for the vector store (optional)
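As a minimal sketch of the request described above, the following builds (without sending) a create call using only the Python standard library. The host `https://api.example.com`, the `API_KEY` placeholder, and the bearer-token `Authorization` header are assumptions for illustration; only the path, method, content type, and the `name` and `metadata` fields come from this page.

```python
import json
import urllib.request

# Assumptions: placeholder host and API key; substitute your provider's values.
BASE_URL = "https://api.example.com"
API_KEY = "YOUR_API_KEY"


def build_create_request(name, metadata=None):
    """Build (but do not send) a POST /v1/vector_stores request."""
    body = {"name": name}  # required field
    if metadata is not None:
        body["metadata"] = metadata  # optional field
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/vector_stores",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
        },
    )


request = build_create_request(
    "Technical Documentation", {"department": "engineering"}
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be run without credentials.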
Example Response:
List Vector Stores
Retrieve a list of all your vector stores.
Endpoint: GET /v1/vector_stores
Query Parameters:
- limit: Maximum number of vector stores to return (default: 20, range: 1-100)
- after: Return vector stores after this ID for pagination (optional)
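The `limit` and `after` parameters suggest cursor-style pagination, which could be driven by a loop like the sketch below. The response fields `data` and `has_more` are assumptions (this page does not show the response body), and `fake_fetch_page` is a hypothetical stand-in for the real HTTP call so the example runs offline.

```python
from typing import Callable, Iterator, Optional


def iter_vector_stores(
    fetch_page: Callable[[int, Optional[str]], dict],
    limit: int = 20,
) -> Iterator[dict]:
    """Yield every vector store, following the `after` cursor page by page."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    after = None
    while True:
        page = fetch_page(limit, after)
        items = page.get("data", [])  # assumed response field
        yield from items
        if not items or not page.get("has_more", False):  # assumed field
            return
        after = items[-1]["id"]  # the last ID seen becomes the next cursor


# Hypothetical stand-in for GET /v1/vector_stores?limit=...&after=...
_FAKE_STORES = [{"id": f"vs_{i}"} for i in range(5)]


def fake_fetch_page(limit, after):
    start = 0
    if after is not None:
        start = [s["id"] for s in _FAKE_STORES].index(after) + 1
    chunk = _FAKE_STORES[start:start + limit]
    return {"data": chunk, "has_more": start + limit < len(_FAKE_STORES)}


print([s["id"] for s in iter_vector_stores(fake_fetch_page, limit=2)])
```

Keeping the HTTP call behind `fetch_page` means the same loop can be reused for any listing endpoint that follows the same cursor convention.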
Example Response:
Retrieve a Vector Store
Get details about a specific vector store.
Endpoint: GET /v1/vector_stores/{vector_store_id}
Path Parameters:
- vector_store_id: The ID of the vector store to retrieve (required)
Example Response:
Update a Vector Store
Update the name or metadata of a vector store.
Endpoint: PATCH /v1/vector_stores/{vector_store_id}
Content-Type: application/json
Path Parameters:
- vector_store_id: The ID of the vector store to update (required)
Request Body:
- name: New name for the vector store (optional)
- metadata: New custom metadata for the vector store (optional)
Example Response:
Delete a Vector Store
Delete a vector store and all its files.
Endpoint: DELETE /v1/vector_stores/{vector_store_id}
Path Parameters:
- vector_store_id: The ID of the vector store to delete (required)
Example Response:
Deleting a vector store permanently removes all associated file embeddings. The original files in the Files API are not affected.
Example Implementation
For a complete working example of how to use the Vector Store API, check out this Python example on GitHub. This example demonstrates:
- Creating vector stores for document collections
- Adding files to vector stores with custom chunking strategies
- Managing vector store metadata and attributes
- Integrating with search tools like AgenticSearchTool and FileSearchTool
- Working with the OpenAI Agents SDK for RAG applications
Working with Vector Store Files
Add a File to a Vector Store
Add a file to a vector store for embedding and semantic search.
Endpoint: POST /v1/vector_stores/{vector_store_id}/files
Content-Type: application/json
Path Parameters:
- vector_store_id: The ID of the vector store to add the file to (required)
Request Body:
- file_id: ID of the file to add (required)
- chunking_strategy: Strategy for dividing the document (optional)
- attributes: Custom attributes for filtering (optional)
Example Response:
File processing happens asynchronously. The initial status will be “in_progress”, and you’ll need to check again later for the completed status.
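Because processing is asynchronous, a client typically polls the file until its status changes. The sketch below shows one way to do that; `get_file` is a hypothetical callable that would wrap the retrieve-file endpoint, and the `"completed"` value used in the demo is an assumption about the final status string.

```python
import time
from typing import Callable


def wait_for_file(
    get_file: Callable[[], dict],
    interval_seconds: float = 2.0,
    timeout_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the file's status is no longer "in_progress"."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        file = get_file()
        if file.get("status") != "in_progress":
            return file  # finished, whether successfully or not
        if time.monotonic() >= deadline:
            raise TimeoutError("file is still in_progress")
        sleep(interval_seconds)


# Offline demo: a canned sequence of responses replaces the real API call.
_responses = iter(
    [{"status": "in_progress"}, {"status": "in_progress"}, {"status": "completed"}]
)
result = wait_for_file(lambda: next(_responses), sleep=lambda seconds: None)
print(result["status"])
```

The function returns on any status other than "in_progress", so the caller should still inspect the returned status before relying on the file being searchable.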
List Files in a Vector Store
Retrieve a list of all files in a vector store.
Endpoint: GET /v1/vector_stores/{vector_store_id}/files
Path Parameters:
- vector_store_id: The ID of the vector store to list files from (required)
Query Parameters:
- limit: Maximum number of files to return (default: 20, range: 1-100)
- after: Return files after this ID for pagination (optional)
Example Response:
Retrieve a File in a Vector Store
Get details about a specific file in a vector store.
Endpoint: GET /v1/vector_stores/{vector_store_id}/files/{file_id}
Path Parameters:
- vector_store_id: The ID of the vector store (required)
- file_id: The ID of the file to retrieve (required)
Example Response:
Delete a File from a Vector Store
Remove a file from a vector store.
Endpoint: DELETE /v1/vector_stores/{vector_store_id}/files/{file_id}
Path Parameters:
- vector_store_id: The ID of the vector store (required)
- file_id: The ID of the file to remove (required)
Example Response:
Advanced Features
Chunking Strategies
Chunking strategies determine how documents are divided for embedding and retrieval:
Divides documents into fixed-size chunks with configurable overlap:
Good for general-purpose document processing.
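To make the idea of fixed-size chunks with overlap concrete, here is a small client-side illustration. It is not the service's own chunker: the function name `chunk_tokens` is hypothetical, and whitespace-separated words stand in for real tokens.

```python
def chunk_tokens(tokens, chunk_size, overlap):
    """Split a token list into fixed-size chunks that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this chunk already reaches the end of the document
    return chunks


words = "the quick brown fox jumps over the lazy dog".split()
for chunk in chunk_tokens(words, chunk_size=4, overlap=1):
    print(" ".join(chunk))
```

Each chunk begins with the last word of the previous one, which is how overlap carries context across chunk boundaries.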
Attempts to divide documents along natural sentence boundaries:
Better for technical documents with code blocks and equations.
Attempts to divide documents along natural paragraph boundaries:
Better for preserving context in narrative documents.
File Attributes
File attributes enable filtering and organization of documents within a vector store:
These attributes can later be used as filters in search queries to narrow down results.
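The effect of attribute filtering can be illustrated locally. This sketch only demonstrates the concept of equality matching on attributes; the server-side filter syntax is not shown on this page, and the file IDs, attribute keys, and helper names below are hypothetical.

```python
def matches(attributes: dict, required: dict) -> bool:
    """True when every required key is present with exactly the required value."""
    return all(attributes.get(key) == value for key, value in required.items())


def filter_files(files: list, required: dict) -> list:
    """Keep only the files whose attributes satisfy every required pair."""
    return [f for f in files if matches(f.get("attributes", {}), required)]


# Hypothetical files with a consistent attribute scheme.
files = [
    {"file_id": "file_1",
     "attributes": {"category": "api-reference", "version": "2.0", "language": "en"}},
    {"file_id": "file_2",
     "attributes": {"category": "api-reference", "version": "1.0", "language": "en"}},
    {"file_id": "file_3",
     "attributes": {"category": "user-guide", "version": "2.0", "language": "de"}},
]

selected = filter_files(files, {"category": "api-reference", "version": "2.0"})
print([f["file_id"] for f in selected])
```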
Best Practices
Vector Store Organization
Create separate vector stores for different domains
Having distinct vector stores for different subject areas improves search relevance. For example, separate “Technical Documentation” from “Marketing Materials.”
Use meaningful names and descriptions
Descriptive names and detailed descriptions make vector stores easier to manage as their number grows.
Leverage metadata for additional context
Store information like department ownership, data sensitivity level, or content type in the vector store metadata.
File Management
Use consistent attribute schemes
Develop a standardized set of attributes (like categories, versions, languages) and apply them consistently across files.
Include version information
When dealing with versioned documentation, include version numbers in attributes to enable version-specific searches.
Update attributes as files change
Keep attributes up-to-date as document content evolves to maintain search accuracy.
Chunking Strategy
Technical documentation
For API references, code documentation, and technical guides, use 1000-1500 tokens per chunk.
Narrative content
For user guides, case studies, and explanatory content, use 500-1000 tokens per chunk.
Tabular/structured data
For content with tables, lists, or structured formats, use smaller chunks (300-500 tokens).
Include overlap between chunks
Always include some overlap to maintain context between chunks, typically 10-20% of the chunk size.
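The size and overlap guidance above can be captured in a small helper. This is only a sketch: the content-type keys, the choice of the midpoint of each range, and the returned dictionary shape are assumptions for illustration, not the API's `chunking_strategy` schema.

```python
# Token ranges taken from the recommendations above; the keys are hypothetical.
RECOMMENDED_CHUNK_TOKENS = {
    "technical": (1000, 1500),
    "narrative": (500, 1000),
    "structured": (300, 500),
}


def suggest_chunking(content_type: str, overlap_percent: int = 15) -> dict:
    """Pick the midpoint of the recommended range and a 10-20% overlap."""
    if not 10 <= overlap_percent <= 20:
        raise ValueError("overlap_percent should be between 10 and 20")
    low, high = RECOMMENDED_CHUNK_TOKENS[content_type]
    chunk_size = (low + high) // 2
    overlap = chunk_size * overlap_percent // 100  # integer math, rounds down
    return {"chunk_size": chunk_size, "overlap": overlap}


for kind in RECOMMENDED_CHUNK_TOKENS:
    print(kind, suggest_chunking(kind))
```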
Performance Considerations
- Keep the file count per vector store reasonable (fewer than 10,000 files for optimal performance)
- Use pagination when listing files in large vector stores
- Add files in batches rather than one by one for large collections
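One simple way to work in batches is to group file IDs before submitting them. The helper below only does the grouping; how each batch is submitted (for example, a loop of add-file calls with a pause between batches) is left to the caller, since this page does not describe a dedicated batch endpoint. The name `batched` and the sample IDs are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


file_ids = [f"file_{i}" for i in range(7)]  # hypothetical IDs
for batch in batched(file_ids, batch_size=3):
    print(batch)
```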