zvec_db.embedders.dense.openai

OpenAI-compatible API embeddings using /embeddings endpoint.

This module provides dense embedding generation using OpenAI-compatible APIs. It works with:

  • OpenAI API (text-embedding-3-small, text-embedding-3-large, etc.)

  • vLLM serving open-source embedding models

  • Any OpenAI-compatible API endpoint

Available Classes

OpenAIEmbedder

Uses the /v1/embeddings endpoint for dense vector generation. Supports query/passage prefixes for asymmetric embedding models.

Example Usage

from zvec_db.embedders.dense import OpenAIEmbedder

# OpenAI API
embedder = OpenAIEmbedder(
    model="text-embedding-3-small",
    api_key="sk-..."
)
vector = embedder.embed("search query")

# vLLM local with asymmetric model (e.g., E5, GTE)
embedder = OpenAIEmbedder(
    base_url="http://localhost:8000/v1",
    model="intfloat/e5-large-v2",
    query_prefix="query: ",
    passage_prefix="passage: "
)
query_vector = embedder.embed_query("What is machine learning?")
doc_vector = embedder.embed_passage("ML is a subset of AI.")

Classes

OpenAIEmbedder([model, base_url, api_key, ...])

Dense embedder using OpenAI-compatible /embeddings endpoint.

class zvec_db.embedders.dense.openai.OpenAIEmbedder(model='text-embedding-3-small', base_url='https://api.openai.com/v1', api_key=None, dimensions=None, timeout=30.0, encoding_format='float', max_batch_size=None, truncate_prompt_tokens=None, query_prefix=None, passage_prefix=None, model_kwargs=None, model_name=None, max_retries=3, initial_delay=1.0, max_delay=60.0, exponential_base=2.0, jitter=0.1, retry_config=None)[source]

Dense embedder using OpenAI-compatible /embeddings endpoint.

This embedder uses the /v1/embeddings endpoint to compute dense vector representations of texts. It’s compatible with OpenAI’s embedding API format and supports batch processing.

Works with:

  • OpenAI API (text-embedding-3-small, text-embedding-3-large, etc.)

  • vLLM serving open-source embedding models

  • Any OpenAI-compatible API endpoint
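Under the hood, every call is an HTTP POST in the OpenAI embeddings wire format. The sketch below (stdlib only, with a hard-coded sample response standing in for a live server) shows the request body such an embedder would send and how vectors come back; it illustrates the public API shape, not this library's internal code:

```python
import json

def build_embeddings_request(model, texts, encoding_format="float"):
    """Build the JSON body for a POST to /v1/embeddings (OpenAI wire format)."""
    return json.dumps({
        "model": model,
        "input": texts,
        "encoding_format": encoding_format,
    })

def parse_embeddings_response(body):
    """Extract vectors from an /v1/embeddings response, preserving input order."""
    data = json.loads(body)["data"]
    return [item["embedding"] for item in sorted(data, key=lambda d: d["index"])]

# A trimmed sample response in the documented format:
sample = json.dumps({
    "object": "list",
    "data": [
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2, 0.3]},
    ],
    "model": "text-embedding-3-small",
})
vectors = parse_embeddings_response(sample)
print(len(vectors), len(vectors[0]))  # 1 3
```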

Parameters:
  • model (str) – Model name to use. For OpenAI: “text-embedding-3-small” or “text-embedding-3-large”. For vLLM: the model name configured in vLLM. Defaults to “text-embedding-3-small”.

  • base_url (str, optional) – API base URL. For OpenAI: “https://api.openai.com/v1”; for a local vLLM server: “http://localhost:8000/v1”. Defaults to “https://api.openai.com/v1”.

  • api_key (Optional[str], optional) – API key for authentication. Defaults to None (reads from OPENAI_API_KEY env var).

  • dimensions (Optional[int], optional) – Output embedding dimensions. Only supported by some models (e.g., text-embedding-3-small). Defaults to None (use model default).

  • timeout (float, optional) – HTTP request timeout in seconds. Defaults to 30.0.

  • encoding_format (str, optional) – Encoding format for embeddings. “float” for float32 vectors, “base64” for base64-encoded. Defaults to “float”.

  • max_batch_size (Optional[int], optional) – Maximum number of texts to embed in a single batch. None means no limit. Defaults to None.

  • truncate_prompt_tokens (Optional[int], optional) – Maximum number of tokens before prompt truncation. When set, prompts exceeding this limit are truncated; when unset, the server typically rejects prompts longer than its max_model_len. Defaults to None (no truncation).

  • query_prefix (str, optional) – Prefix to add to query texts. Useful for asymmetric embedding models such as E5 and GTE. Example: “query: ” for E5 models. Defaults to None (no prefix).

  • passage_prefix (str, optional) – Prefix to add to passage/document texts. Useful for asymmetric embedding models such as E5 and GTE. Example: “passage: ” for E5 models. Defaults to None (no prefix).

  • model_kwargs (Optional[Mapping[str, Any]], optional) – Additional keyword arguments passed to the API request. Useful for options such as user (a unique identifier for monitoring and abuse detection), extra_headers (additional HTTP headers), and extra_query_params (additional query parameters). Defaults to None (no additional kwargs).

  • model_name (str, optional) – Deprecated. Use model instead. This parameter is kept for backward compatibility. Defaults to None.

  • max_retries (int, optional) – Maximum number of retry attempts for transient failures. Set to 0 to disable retries. Defaults to 3.

  • initial_delay (float, optional) – Initial delay before first retry in seconds. Defaults to 1.0.

  • max_delay (float, optional) – Maximum delay cap in seconds. Defaults to 60.0.

  • exponential_base (float, optional) – Base for exponential backoff. Defaults to 2.0.

  • jitter (float, optional) – Random jitter factor (0.0-1.0) to avoid thundering herd. Defaults to 0.1.

  • retry_config (Optional[RetryConfig], optional) – Pre-configured retry settings. If provided, overrides individual retry parameters. Defaults to None.
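The retry parameters above describe a standard exponential-backoff-with-jitter schedule. A minimal sketch of how such delays are typically computed from the documented parameters (illustrative only, not the library's actual implementation):

```python
import random

def retry_delays(max_retries=3, initial_delay=1.0, max_delay=60.0,
                 exponential_base=2.0, jitter=0.1, seed=None):
    """Yield the wait time (seconds) before each retry attempt."""
    rng = random.Random(seed)
    for attempt in range(max_retries):
        # Exponential growth: initial_delay * base**attempt, capped at max_delay.
        delay = min(initial_delay * exponential_base ** attempt, max_delay)
        # Spread retries by +/- jitter to avoid thundering-herd retry storms.
        delay *= 1 + rng.uniform(-jitter, jitter)
        yield delay

print([round(d, 2) for d in retry_delays(max_retries=3, jitter=0.0)])  # [1.0, 2.0, 4.0]
```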

Example

>>> # OpenAI API
>>> embedder = OpenAIEmbedder(
...     model="text-embedding-3-small",
...     api_key="sk-..."
... )
>>> vector = embedder.embed("search query")
>>> # vLLM local
>>> embedder = OpenAIEmbedder(
...     base_url="http://localhost:8000/v1",
...     api_key="not-needed",
...     model="BAAI/bge-m3"
... )
>>> vector = embedder.embed("search query")
>>> # With truncation to handle long prompts
>>> embedder = OpenAIEmbedder(
...     base_url="http://localhost:8000/v1",
...     model="embedding",
...     truncate_prompt_tokens=512
... )
>>> # With prefixes for asymmetric models (e.g., E5, GTE)
>>> embedder = OpenAIEmbedder(
...     base_url="http://localhost:8000/v1",
...     model="intfloat/e5-large-v2",
...     query_prefix="query: ",
...     passage_prefix="passage: "
... )
>>> query_vector = embedder.embed_query("What is machine learning?")
>>> doc_vector = embedder.embed_passage("ML is a subset of AI.")
>>> # With custom retry settings for production
>>> embedder = OpenAIEmbedder(
...     model="text-embedding-3-small",
...     max_retries=5,
...     initial_delay=2.0,
...     max_delay=120.0,
... )

See also

SentenceTransformersEmbedder: Local dense embeddings using HuggingFace models.

RetryConfig: Configuration class for retry behavior.

__init__(model='text-embedding-3-small', base_url='https://api.openai.com/v1', api_key=None, dimensions=None, timeout=30.0, encoding_format='float', max_batch_size=None, truncate_prompt_tokens=None, query_prefix=None, passage_prefix=None, model_kwargs=None, model_name=None, max_retries=3, initial_delay=1.0, max_delay=60.0, exponential_base=2.0, jitter=0.1, retry_config=None)[source]
Parameters:
  • model (str)

  • base_url (str)

  • api_key (str | None)

  • dimensions (int | None)

  • timeout (float)

  • encoding_format (str)

  • max_batch_size (int | None)

  • truncate_prompt_tokens (int | None)

  • query_prefix (str | None)

  • passage_prefix (str | None)

  • model_kwargs (Mapping[str, Any] | None)

  • model_name (str | None)

  • max_retries (int)

  • initial_delay (float)

  • max_delay (float)

  • exponential_base (float)

  • jitter (float)

  • retry_config (RetryConfig | None)

property model_name: str

Model identifier (alias for model for backward compatibility).

Type:

str

property model: str

Model identifier (OpenAI API naming).

Type:

str

property base_url: str

Base URL for the API.

Type:

str

property api_key: str | None

API key for authentication.

Type:

Optional[str]

property dimensions: int | None

Output embedding dimensions.

Type:

Optional[int]

property timeout: float

HTTP request timeout in seconds.

Type:

float

property encoding_format: str

Encoding format for embeddings.

Type:

str

property max_batch_size: int | None

Maximum batch size for embedding.

Type:

Optional[int]

property truncate_prompt_tokens: int | None

Maximum number of tokens for prompt truncation.

Type:

Optional[int]

property query_prefix: str

Prefix added to query texts.

Type:

str

property passage_prefix: str

Prefix added to passage/document texts.

Type:

str

property model_kwargs: Mapping[str, Any]

Additional kwargs passed to the API.

Type:

Mapping[str, Any]

property embedding_dim: int

Dimension of embeddings (available after fit or first embed).

Type:

int

property is_fitted: bool

Whether the embedder has been fitted.

Type:

bool

fit(documents)[source]

Initialize the embedder.

For API-based embedder, this is a no-op as the model is pre-trained. This method exists for API compatibility.

Parameters:

documents (List[str]) – List of documents (not used, for API compatibility).

Returns:

The embedder instance, for method chaining.

Return type:

self

embed(input_text, prefix=None)[source]

Embed texts into dense vectors.

Parameters:
  • input_text (Union[str, List[str]]) – Single text or list of texts to embed.

  • prefix (Optional[str], optional) – Prefix to add to each text. Defaults to None (no prefix).

Returns:

  • If single text: np.ndarray of shape (embedding_dim,)

  • If multiple texts: List[np.ndarray], one vector of shape (embedding_dim,) per text

Return type:

Union[np.ndarray, List[np.ndarray]]
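The return convention can be seen with a toy stand-in; embed_like below is hypothetical (it returns deterministic random vectors instead of calling an API) but reproduces the documented shapes:

```python
import numpy as np

def embed_like(input_text, dim=4):
    """Toy stand-in for embed(): fake vectors with the documented return shapes."""
    single = isinstance(input_text, str)
    texts = [input_text] if single else list(input_text)
    vectors = [np.random.default_rng(len(t)).random(dim, dtype=np.float32)
               for t in texts]
    # Single str in -> one (dim,) array out; list in -> list of (dim,) arrays.
    return vectors[0] if single else vectors

v = embed_like("search query")
print(v.shape)  # (4,)
vs = embed_like(["a", "bb"])
print(len(vs), vs[0].shape)  # 2 (4,)
```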

embed_query(query)[source]

Embed a query or list of queries with the query prefix.

Parameters:

query (Union[str, List[str]]) – Single query or list of queries to embed.

Returns:

  • If single query: np.ndarray of shape (embedding_dim,)

  • If multiple queries: List[np.ndarray], one vector of shape (embedding_dim,) per query

Return type:

Union[np.ndarray, List[np.ndarray]]

embed_passage(passage)[source]

Embed a passage/document or list of passages with the passage prefix.

Parameters:

passage (Union[str, List[str]]) – Single passage or list of passages to embed.

Returns:

  • If single passage: np.ndarray of shape (embedding_dim,)

  • If multiple passages: List[np.ndarray], one vector of shape (embedding_dim,) per passage

Return type:

Union[np.ndarray, List[np.ndarray]]
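Query and passage vectors are typically compared with cosine similarity. The sketch below uses synthetic vectors in place of real embed_query/embed_passage output, so it runs without an API server:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two dense vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Synthetic stand-ins for embed_query(...) / embed_passage(...) output:
query_vector = np.array([0.2, 0.8, 0.1], dtype=np.float32)
doc_vectors = [
    np.array([0.1, 0.9, 0.0], dtype=np.float32),  # on-topic passage
    np.array([0.9, 0.0, 0.4], dtype=np.float32),  # off-topic passage
]
scores = [cosine_similarity(query_vector, d) for d in doc_vectors]
best = int(np.argmax(scores))
print(best)  # 0 (the on-topic passage scores highest)
```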

embed_batch(documents, show_progress=False, prefix=None)[source]

Embed a batch of documents.

Parameters:
  • documents (List[str]) – List of documents to embed.

  • show_progress (bool, optional) – Show progress bar. Not used for API-based embedding. Defaults to False.

  • prefix (Optional[str], optional) – Prefix to add to each document. Defaults to None (no prefix).

Returns:

List of embedding vectors.

Return type:

List[np.ndarray]
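When max_batch_size is set, a long document list must be split into API-sized chunks before embedding. A minimal sketch of that chunking (the chunked helper is hypothetical, not part of the library):

```python
def chunked(items, max_batch_size=None):
    """Split a list into consecutive chunks of at most max_batch_size items."""
    if max_batch_size is None:
        # No limit: send everything as one batch (or nothing at all).
        return [items] if items else []
    return [items[i:i + max_batch_size]
            for i in range(0, len(items), max_batch_size)]

docs = [f"doc {i}" for i in range(7)]
batches = chunked(docs, max_batch_size=3)
print([len(b) for b in batches])  # [3, 3, 1]
```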