zvec_db.embedders.dense.openai
OpenAI-compatible API embeddings using /embeddings endpoint.
This module provides dense embedding generation using OpenAI-compatible APIs, which works with:
- OpenAI API (text-embedding-3-small, text-embedding-3-large, etc.)
- vLLM serving open-source embedding models
- Any OpenAI-compatible API endpoint
Available Classes
- OpenAIEmbedder
Uses the /v1/embeddings endpoint for dense vector generation. Supports query/passage prefixes for asymmetric embedding models.
Example Usage
from zvec_db.embedders.dense import OpenAIEmbedder

# OpenAI API
embedder = OpenAIEmbedder(
    model="text-embedding-3-small",
    api_key="sk-..."
)
vector = embedder.embed("search query")

# vLLM local with asymmetric model (e.g., E5, GTE)
embedder = OpenAIEmbedder(
    base_url="http://localhost:8000/v1",
    model="intfloat/e5-large-v2",
    query_prefix="query: ",
    passage_prefix="passage: "
)
query_vector = embedder.embed_query("What is machine learning?")
doc_vector = embedder.embed_passage("ML is a subset of AI.")
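The query/passage prefixes used above presumably amount to plain string concatenation before the request is sent. A minimal sketch of that behavior (the helper apply_prefix is hypothetical, not part of the zvec_db API):

```python
def apply_prefix(texts, prefix):
    """Prepend an asymmetric-model prefix (e.g. "query: ") to each text.

    `texts` may be a single string or a list of strings; the shape of
    the input is preserved in the output. A None prefix is a no-op.
    """
    if prefix is None:
        return texts
    if isinstance(texts, str):
        return prefix + texts
    return [prefix + t for t in texts]

# E5-style usage: queries and passages get different markers
print(apply_prefix("What is machine learning?", "query: "))
print(apply_prefix(["ML is a subset of AI."], "passage: "))
```

Asymmetric models such as E5 are trained with these markers, so omitting them at query time typically degrades retrieval quality.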
Classes
OpenAIEmbedder – Dense embedder using OpenAI-compatible /embeddings endpoint.
- class zvec_db.embedders.dense.openai.OpenAIEmbedder(model='text-embedding-3-small', base_url='https://api.openai.com/v1', api_key=None, dimensions=None, timeout=30.0, encoding_format='float', max_batch_size=None, truncate_prompt_tokens=None, query_prefix=None, passage_prefix=None, model_kwargs=None, model_name=None, max_retries=3, initial_delay=1.0, max_delay=60.0, exponential_base=2.0, jitter=0.1, retry_config=None)[source]
Dense embedder using OpenAI-compatible /embeddings endpoint.
This embedder uses the /v1/embeddings endpoint to compute dense vector representations of texts. It’s compatible with OpenAI’s embedding API format and supports batch processing.
Works with:
- OpenAI API (text-embedding-3-small, text-embedding-3-large, etc.)
- vLLM serving open-source embedding models
- Any OpenAI-compatible API endpoint
- Parameters:
model (str, optional) – Model name to use. For OpenAI: "text-embedding-3-small", "text-embedding-3-large", etc. For vLLM: the model name configured in vLLM. Defaults to "text-embedding-3-small".
base_url (str, optional) – API base URL. For OpenAI: "https://api.openai.com/v1". For local vLLM: "http://localhost:8000/v1". Defaults to "https://api.openai.com/v1".
api_key (Optional[str], optional) – API key for authentication. Defaults to None (reads from OPENAI_API_KEY env var).
dimensions (Optional[int], optional) – Output embedding dimensions. Only supported by some models (e.g., text-embedding-3-small). Defaults to None (use model default).
timeout (float, optional) – HTTP request timeout in seconds. Defaults to 30.0.
encoding_format (str, optional) – Encoding format for embeddings. “float” for float32 vectors, “base64” for base64-encoded. Defaults to “float”.
max_batch_size (Optional[int], optional) – Maximum number of texts to embed in a single batch. None means no limit. Defaults to None.
truncate_prompt_tokens (Optional[int], optional) – Maximum number of tokens for prompt truncation. When set, prompts exceeding this limit are truncated. By default, APIs reject prompts exceeding max_model_len unless this is set. Defaults to None (no truncation).
query_prefix (Optional[str], optional) – Prefix to add to query texts. Useful for asymmetric embedding models like E5, GTE, etc. Example: "query: " for E5 models. Defaults to None (no prefix).
passage_prefix (Optional[str], optional) – Prefix to add to passage/document texts. Useful for asymmetric embedding models like E5, GTE, etc. Example: "passage: " for E5 models. Defaults to None (no prefix).
model_kwargs (Optional[Mapping[str, Any]], optional) – Additional keyword arguments passed to the API request. Useful for options such as:
- user: unique identifier for monitoring and abuse detection
- extra_headers: additional HTTP headers
- extra_query_params: additional query parameters
Defaults to None (no additional kwargs).
model_name (str, optional) – Deprecated. Use model instead. This parameter is kept for backward compatibility. Defaults to None.
max_retries (int, optional) – Maximum number of retry attempts for transient failures. Set to 0 to disable retries. Defaults to 3.
initial_delay (float, optional) – Initial delay before first retry in seconds. Defaults to 1.0.
max_delay (float, optional) – Maximum delay cap in seconds. Defaults to 60.0.
exponential_base (float, optional) – Base for exponential backoff. Defaults to 2.0.
jitter (float, optional) – Random jitter factor (0.0-1.0) to avoid thundering herd. Defaults to 0.1.
retry_config (Optional[RetryConfig], optional) – Pre-configured retry settings. If provided, overrides individual retry parameters. Defaults to None.
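The retry parameters above describe a standard exponential-backoff-with-jitter scheme. A minimal sketch of how the delay between attempts is typically computed from these parameters (the exact formula used internally is an assumption):

```python
import random

def backoff_delay(attempt, initial_delay=1.0, exponential_base=2.0,
                  max_delay=60.0, jitter=0.1):
    """Delay in seconds before retry number `attempt` (0-based).

    The base delay grows exponentially and is capped at `max_delay`.
    A random perturbation of up to +/- `jitter` * delay spreads out
    retries so many clients failing at once don't retry in lockstep.
    """
    delay = min(initial_delay * (exponential_base ** attempt), max_delay)
    return delay * (1.0 + random.uniform(-jitter, jitter))

# With the defaults, base delays grow 1s, 2s, 4s, ... up to the 60s cap.
for attempt in range(4):
    print(f"attempt {attempt}: ~{backoff_delay(attempt):.2f}s")
```

Setting jitter=0.0 makes the schedule deterministic, which can be convenient in tests.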
Example
>>> # OpenAI API
>>> embedder = OpenAIEmbedder(
...     model="text-embedding-3-small",
...     api_key="sk-..."
... )
>>> vector = embedder.embed("search query")

>>> # vLLM local
>>> embedder = OpenAIEmbedder(
...     base_url="http://localhost:8000/v1",
...     api_key="not-needed",
...     model="BAAI/bge-m3"
... )
>>> vector = embedder.embed("search query")

>>> # With truncation to handle long prompts
>>> embedder = OpenAIEmbedder(
...     base_url="http://localhost:8000/v1",
...     model="embedding",
...     truncate_prompt_tokens=512
... )

>>> # With prefixes for asymmetric models (e.g., E5, GTE)
>>> embedder = OpenAIEmbedder(
...     base_url="http://localhost:8000/v1",
...     model="intfloat/e5-large-v2",
...     query_prefix="query: ",
...     passage_prefix="passage: "
... )
>>> query_vector = embedder.embed_query("What is machine learning?")
>>> doc_vector = embedder.embed_passage("ML is a subset of AI.")

>>> # With custom retry settings for production
>>> embedder = OpenAIEmbedder(
...     model="text-embedding-3-small",
...     max_retries=5,
...     initial_delay=2.0,
...     max_delay=120.0,
... )
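When max_batch_size is set, a large input list has to be split into chunks before being sent to the /embeddings endpoint, one request per chunk. A minimal sketch of that chunking (chunk_texts is a hypothetical helper, not the library's actual internals):

```python
def chunk_texts(texts, max_batch_size=None):
    """Split `texts` into batches of at most `max_batch_size` items.

    With max_batch_size=None (the default) everything goes into a
    single request, mirroring the "no limit" behavior documented above.
    """
    if max_batch_size is None:
        return [texts]
    return [texts[i:i + max_batch_size]
            for i in range(0, len(texts), max_batch_size)]

texts = [f"doc {i}" for i in range(10)]
print([len(b) for b in chunk_texts(texts, max_batch_size=4)])  # [4, 4, 2]
```

Capping batch size is mainly useful against servers that reject oversized requests or to bound per-request latency.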
See also
SentenceTransformersEmbedder: Local dense embeddings using HuggingFace models.
RetryConfig: Configuration class for retry behavior.
- __init__(model='text-embedding-3-small', base_url='https://api.openai.com/v1', api_key=None, dimensions=None, timeout=30.0, encoding_format='float', max_batch_size=None, truncate_prompt_tokens=None, query_prefix=None, passage_prefix=None, model_kwargs=None, model_name=None, max_retries=3, initial_delay=1.0, max_delay=60.0, exponential_base=2.0, jitter=0.1, retry_config=None)[source]
- Parameters:
model (str)
base_url (str)
api_key (str | None)
dimensions (int | None)
timeout (float)
encoding_format (str)
max_batch_size (int | None)
truncate_prompt_tokens (int | None)
query_prefix (str | None)
passage_prefix (str | None)
model_name (str | None)
max_retries (int)
initial_delay (float)
max_delay (float)
exponential_base (float)
jitter (float)
retry_config (RetryConfig | None)
- property truncate_prompt_tokens: int | None
Maximum number of tokens for prompt truncation.
- Type:
Optional[int]
- property model_kwargs: Mapping[str, Any]
Additional kwargs passed to the API.
- Type:
Mapping[str, Any]
- fit(documents)[source]
Initialize the embedder.
For API-based embedder, this is a no-op as the model is pre-trained. This method exists for API compatibility.
- embed(input_text, prefix=None)[source]
Embed texts into dense vectors.
- Parameters:
input_text (Union[str, List[str]]) – Text or list of texts to embed.
prefix (Optional[str], optional) – Prefix prepended to each text before embedding. Defaults to None.
- Returns:
If single text: np.ndarray of shape (embedding_dim,)
If multiple texts: List[np.ndarray], one array of shape (embedding_dim,) per text
- Return type:
Union[np.ndarray, List[np.ndarray]]
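The str-vs-list return behavior above can be sketched as a small dispatch wrapper around a batch call (embed_batch here is a hypothetical stand-in for the actual /v1/embeddings request, returning fixed-length plain lists instead of np.ndarray):

```python
def embed_batch(texts):
    # Hypothetical stand-in for the /v1/embeddings request: one
    # fixed-length vector (a plain list here) per input text.
    return [[0.0, 0.0, 0.0] for _ in texts]

def embed(input_text):
    """Single text -> one vector; list of texts -> list of vectors."""
    if isinstance(input_text, str):
        # Wrap the single text, embed, then unwrap the single result.
        return embed_batch([input_text])[0]
    return embed_batch(input_text)

print(embed("one"))          # one vector
print(len(embed(["a", "b"])))  # 2
```

Callers that always pass lists get a stable List return type, which is often easier to handle downstream than the union.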
- embed_passage(passage)[source]
Embed a passage/document or list of passages with the passage prefix.