zvec_db.embedders.dense

Dense embedders for semantic search.

class zvec_db.embedders.dense.BaseDenseEmbedder(model_name, max_length=512, normalize=True)[source]

Base class for dense embedding models.

Dense embedders generate fixed-size vector representations of text, as opposed to sparse embedders, whose output dimensionality depends on the vocabulary learned from the corpus.

Parameters:
  • model_name (str) – Name or path of the model to use.

  • max_length (Optional[int]) – Maximum sequence length. Defaults to 512.

  • normalize (bool) – Whether to normalize embeddings to unit length. Defaults to True for cosine similarity compatibility.

Example

>>> embedder = SentenceTransformersEmbedder("all-MiniLM-L6-v2")
>>> embedder.fit(["document 1", "document 2"])
>>> vector = embedder.embed("query")
>>> len(vector)  # Fixed size
384
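With normalize=True, embeddings are scaled to unit length so that cosine similarity reduces to a plain dot product. A minimal numpy sketch of that property, independent of any particular embedder:

```python
import numpy as np

# Two toy "embeddings"
a = np.array([3.0, 4.0])
b = np.array([1.0, 0.0])

# Normalize to unit length, as normalize=True does
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)

# On unit vectors, cosine similarity is just a dot product
cosine = float(a_unit @ b_unit)
print(round(cosine, 2))  # 0.6
```

This is why normalize defaults to True: downstream indexes can use inner-product search and still get cosine-ranked results.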
__init__(model_name, max_length=512, normalize=True)[source]
Parameters:
  • model_name (str)

  • max_length (int | None)

  • normalize (bool)

abstractmethod fit(documents)[source]

Initialize the embedder on a corpus.

For dense models, this is typically optional and just initializes the model. Unlike sparse models, no vocabulary is learned.

Parameters:

documents (List[str]) – List of documents for initialization.

Returns:

For method chaining.

Return type:

self

abstractmethod embed(input_text)[source]

Generate dense embeddings for text.

Parameters:

input_text (str | List[str]) – Single document or batch of documents.

Returns:

Numpy array for single input, or list of numpy arrays for batch input.

Return type:

ndarray | List[ndarray]

__call__(input_text)[source]

Call shortcut that delegates to embed().

This allows the embedder to be called like a function:

embedder = SentenceTransformersEmbedder()
embedder.fit(documents)
vector = embedder("query text")  # equivalent to embedder.embed(...)
Parameters:

input_text (str | List[str]) – Single document or batch of documents.

Returns:

Numpy array for single input, or list of numpy arrays for batch input.

Return type:

ndarray | List[ndarray]

transform(input_text)[source]

Variant of embed() that always returns a 2D numpy array.

For single input, returns 2D array with shape (1, dim). For batch input, returns 2D array with shape (n, dim).

Parameters:

input_text (str | List[str]) – Single document or batch.

Returns:

2D numpy array of embeddings.

Return type:

ndarray
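The 2D contract of transform() is convenient when feeding embeddings straight into an index builder. Conceptually, the reshaping amounts to the following sketch (to2d is a hypothetical helper, not part of the library):

```python
import numpy as np

def to_2d(vectors):
    """Stack embeddings into an (n, dim) array; a single (dim,) vector becomes (1, dim)."""
    arr = np.asarray(vectors)
    return arr.reshape(1, -1) if arr.ndim == 1 else arr

single = to_2d(np.zeros(384))                      # single input -> (1, 384)
batch = to_2d([np.zeros(384), np.ones(384)])       # batch input  -> (2, 384)
print(single.shape, batch.shape)  # (1, 384) (2, 384)
```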

save(path)[source]

Save embedder configuration.

Dense models typically don’t need saving as they load pre-trained weights. This saves configuration only.

Parameters:

path (str) – Path to save configuration.

Return type:

None

load(path)[source]

Load embedder configuration.

Parameters:

path (str) – Path to configuration file.

Return type:

None
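Because dense models load pre-trained weights, save() and load() only round-trip configuration. A hedged sketch of what such a round-trip might look like; the actual on-disk format is an implementation detail of zvec_db:

```python
import json
import os
import tempfile

# Hypothetical configuration matching BaseDenseEmbedder's constructor arguments
config = {"model_name": "all-MiniLM-L6-v2", "max_length": 512, "normalize": True}

path = os.path.join(tempfile.mkdtemp(), "embedder.json")
with open(path, "w") as f:
    json.dump(config, f)        # what save(path) might persist

with open(path) as f:
    restored = json.load(f)     # what load(path) would read back

print(restored == config)  # True
```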

property is_fitted: bool

True if embedder is initialized.

Type:

bool

class zvec_db.embedders.dense.SentenceTransformersEmbedder(model_name='all-MiniLM-L6-v2', device=None, max_length=512, normalize=True, trust_remote_code=False, model_kwargs=None)[source]

Dense embeddings using Sentence Transformers models locally.

This embedder uses pre-trained models from the sentence-transformers library to generate semantic embeddings. It supports hundreds of models available on HuggingFace.

Parameters:
  • model_name (str, optional) – Name of the model from HuggingFace. Examples: “all-MiniLM-L6-v2” (384 dims, fast), “all-mpnet-base-v2” (768 dims, best quality), “BAAI/bge-small-en-v1.5” (384 dims, good quality). Defaults to “all-MiniLM-L6-v2”.

  • device (Optional[str], optional) – Device to run model on. “cpu”, “cuda”, or None for auto-detect. Defaults to None.

  • max_length (Optional[int], optional) – Maximum sequence length. Defaults to 512.

  • normalize (bool, optional) – Normalize embeddings to unit length. Defaults to True for cosine similarity compatibility.

  • trust_remote_code (bool, optional) – Trust remote code in model. Defaults to False.

  • model_kwargs (Optional[Mapping[str, Any]], optional) – Additional keyword arguments passed to the SentenceTransformer constructor. Useful for options such as:
      - torch_dtype: model dtype (torch.float16, torch.bfloat16, “auto”)
      - trust_remote_code: trust remote code from the HuggingFace Hub
      - token: HuggingFace API token for private models
      - revision: model revision to load
      - cache_dir: custom cache directory
      - local_files_only: load only local files
      - attn_implementation: attention implementation (e.g., “flash_attention_2”)
    Defaults to None (no additional kwargs).

Example

>>> # Standard embedding
>>> embedder = SentenceTransformersEmbedder(
...     model_name="all-MiniLM-L6-v2",
...     device="cpu"
... )
>>> embedder.fit(["document 1", "document 2"])
>>> vector = embedder.embed("search query")
>>> print(vector.shape)
(384,)
>>> # With model_kwargs for private models
>>> embedder = SentenceTransformersEmbedder(
...     model_name="org/private-model",
...     model_kwargs={"token": "hf_..."}
... )
>>> # With float16 for reduced memory
>>> import torch
>>> embedder = SentenceTransformersEmbedder(
...     model_name="all-MiniLM-L6-v2",
...     model_kwargs={"torch_dtype": torch.float16}
... )

Note

  • Requires the sentence-transformers package

  • Models are downloaded automatically on first use

  • GPU acceleration available if CUDA is installed

See also

OpenAIEmbedder: Dense embeddings via OpenAI-compatible API.

__init__(model_name='all-MiniLM-L6-v2', device=None, max_length=512, normalize=True, trust_remote_code=False, model_kwargs=None)[source]
Parameters:
  • model_name (str)

  • device (str | None)

  • max_length (int | None)

  • normalize (bool)

  • trust_remote_code (bool)

  • model_kwargs (Mapping[str, Any] | None)

property device: str | None

Device to run model on.

Type:

Optional[str]

property trust_remote_code: bool

Trust remote code in model.

Type:

bool

property model_kwargs: Mapping[str, Any]

Additional kwargs passed to the model.

Type:

Mapping[str, Any]

property embedding_dim: int

Dimension of the embedding vectors.

Type:

int

property is_fitted: bool

Whether the embedder has been fitted.

Type:

bool

fit(documents)[source]

Initialize the embedder by loading the model.

For Sentence Transformers, this loads the model. No training is performed as models are pre-trained.

Parameters:

documents (List[str]) – List of documents (used for initialization only).

Returns:

For method chaining.

Return type:

self

embed(input_text)[source]

Generate embeddings for text.

Parameters:

input_text (str | List[str]) – Single document or batch.

Returns:

Numpy array for single input, or list of numpy arrays for batch input.

Raises:

RuntimeError – If model loading fails.

Return type:

ndarray | List[ndarray]

embed_batch(documents, batch_size=32, show_progress=False)[source]

Embed a large batch of documents with optional progress bar.

This method is optimized for processing large corpora by embedding documents in smaller batches. It supports an optional progress bar for tracking long-running operations.

Parameters:
  • documents (List[str]) – List of documents to embed.

  • batch_size (int, optional) – Number of documents per batch. Defaults to 32.

  • show_progress (bool, optional) – Show progress bar. Defaults to False.

Returns:

List of embedding arrays, one per document.

Return type:

List[np.ndarray]

Example

>>> embedder = SentenceTransformersEmbedder().fit(corpus)
>>> vectors = embedder.embed_batch(
...     large_corpus,
...     batch_size=64,
...     show_progress=True
... )

Note

For single documents or small batches, use embed() instead.
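Internally, embed_batch amounts to slicing the corpus into consecutive chunks of batch_size documents and embedding each chunk in turn. The chunking logic can be sketched as follows (a hypothetical helper; the real implementation may differ):

```python
def chunks(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

docs = [f"doc {i}" for i in range(10)]
sizes = [len(batch) for batch in chunks(docs, 4)]
print(sizes)  # [4, 4, 2]
```

The last chunk may be smaller than batch_size; every document is embedded exactly once.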

class zvec_db.embedders.dense.OpenAIEmbedder(model='text-embedding-3-small', base_url='https://api.openai.com/v1', api_key=None, dimensions=None, timeout=30.0, encoding_format='float', max_batch_size=None, truncate_prompt_tokens=None, query_prefix=None, passage_prefix=None, model_kwargs=None, model_name=None, max_retries=3, initial_delay=1.0, max_delay=60.0, exponential_base=2.0, jitter=0.1, retry_config=None)[source]

Dense embedder using OpenAI-compatible /embeddings endpoint.

This embedder uses the /v1/embeddings endpoint to compute dense vector representations of texts. It’s compatible with OpenAI’s embedding API format and supports batch processing.

Works with:

  • OpenAI API (text-embedding-3-small, text-embedding-3-large, etc.)

  • vLLM serving open-source embedding models

  • Any OpenAI-compatible API endpoint

Parameters:
  • model (str) – Model name to use. For OpenAI: “text-embedding-3-small” or “text-embedding-3-large”. For vLLM: the model name configured in vLLM.

  • base_url (str, optional) – API base URL. For OpenAI: “https://api.openai.com/v1”. For a local vLLM server: “http://localhost:8000/v1”. Defaults to “https://api.openai.com/v1”.

  • api_key (Optional[str], optional) – API key for authentication. Defaults to None (reads from OPENAI_API_KEY env var).

  • dimensions (Optional[int], optional) – Output embedding dimensions. Only supported by some models (e.g., text-embedding-3-small). Defaults to None (use model default).

  • timeout (float, optional) – HTTP request timeout in seconds. Defaults to 30.0.

  • encoding_format (str, optional) – Encoding format for embeddings. “float” for float32 vectors, “base64” for base64-encoded. Defaults to “float”.

  • max_batch_size (Optional[int], optional) – Maximum number of texts to embed in a single batch. None means no limit. Defaults to None.

  • truncate_prompt_tokens (Optional[int], optional) – Maximum number of tokens for prompt truncation. When set, prompts exceeding this limit are truncated. By default, APIs reject prompts exceeding max_model_len unless this is set. Defaults to None (no truncation).

  • query_prefix (Optional[str], optional) – Prefix to add to query texts. Useful for asymmetric embedding models such as E5 and GTE. Example: “query: ” for E5 models. Defaults to None (no prefix).

  • passage_prefix (Optional[str], optional) – Prefix to add to passage/document texts. Useful for asymmetric embedding models such as E5 and GTE. Example: “passage: ” for E5 models. Defaults to None (no prefix).

  • model_kwargs (Optional[Mapping[str, Any]], optional) – Additional keyword arguments passed to the API request. Useful for options such as:
      - user: unique identifier for monitoring and abuse detection
      - extra_headers: additional HTTP headers
      - extra_query_params: additional query parameters
    Defaults to None (no additional kwargs).

  • model_name (str, optional) – Deprecated. Use model instead. This parameter is kept for backward compatibility. Defaults to None.

  • max_retries (int, optional) – Maximum number of retry attempts for transient failures. Set to 0 to disable retries. Defaults to 3.

  • initial_delay (float, optional) – Initial delay before first retry in seconds. Defaults to 1.0.

  • max_delay (float, optional) – Maximum delay cap in seconds. Defaults to 60.0.

  • exponential_base (float, optional) – Base for exponential backoff. Defaults to 2.0.

  • jitter (float, optional) – Random jitter factor (0.0-1.0) to avoid thundering herd. Defaults to 0.1.

  • retry_config (Optional[RetryConfig], optional) – Pre-configured retry settings. If provided, overrides individual retry parameters. Defaults to None.

Example

>>> # OpenAI API
>>> embedder = OpenAIEmbedder(
...     model="text-embedding-3-small",
...     api_key="sk-..."
... )
>>> vector = embedder.embed("search query")
>>> # vLLM local
>>> embedder = OpenAIEmbedder(
...     base_url="http://localhost:8000/v1",
...     api_key="not-needed",
...     model="BAAI/bge-m3"
... )
>>> vector = embedder.embed("search query")
>>> # With truncation to handle long prompts
>>> embedder = OpenAIEmbedder(
...     base_url="http://localhost:8000/v1",
...     model="embedding",
...     truncate_prompt_tokens=512
... )
>>> # With prefixes for asymmetric models (e.g., E5, GTE)
>>> embedder = OpenAIEmbedder(
...     base_url="http://localhost:8000/v1",
...     model="intfloat/e5-large-v2",
...     query_prefix="query: ",
...     passage_prefix="passage: "
... )
>>> query_vector = embedder.embed_query("What is machine learning?")
>>> doc_vector = embedder.embed_passage("ML is a subset of AI.")
>>> # With custom retry settings for production
>>> embedder = OpenAIEmbedder(
...     model="text-embedding-3-small",
...     max_retries=5,
...     initial_delay=2.0,
...     max_delay=120.0,
... )
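With the defaults (initial_delay=1.0, exponential_base=2.0, max_delay=60.0, jitter=0.1), retry delays grow roughly as initial_delay * exponential_base ** attempt, capped at max_delay, with random jitter added to avoid thundering herd. A sketch of that schedule; the exact formula used by RetryConfig is an implementation detail:

```python
import random

def backoff_delay(attempt, initial_delay=1.0, exponential_base=2.0,
                  max_delay=60.0, jitter=0.1):
    """Exponential backoff: delay grows per attempt, capped, with ±jitter noise."""
    delay = min(initial_delay * exponential_base ** attempt, max_delay)
    return delay * (1 + random.uniform(-jitter, jitter))

# Base delays (before jitter) for attempts 0..6 under the defaults
base_delays = [min(1.0 * 2.0 ** a, 60.0) for a in range(7)]
print(base_delays)  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0]
```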

See also

SentenceTransformersEmbedder: Local dense embeddings using HuggingFace models. RetryConfig: Configuration class for retry behavior.

__init__(model='text-embedding-3-small', base_url='https://api.openai.com/v1', api_key=None, dimensions=None, timeout=30.0, encoding_format='float', max_batch_size=None, truncate_prompt_tokens=None, query_prefix=None, passage_prefix=None, model_kwargs=None, model_name=None, max_retries=3, initial_delay=1.0, max_delay=60.0, exponential_base=2.0, jitter=0.1, retry_config=None)[source]
Parameters:
  • model (str)

  • base_url (str)

  • api_key (str | None)

  • dimensions (int | None)

  • timeout (float)

  • encoding_format (str)

  • max_batch_size (int | None)

  • truncate_prompt_tokens (int | None)

  • query_prefix (str | None)

  • passage_prefix (str | None)

  • model_kwargs (Mapping[str, Any] | None)

  • model_name (str | None)

  • max_retries (int)

  • initial_delay (float)

  • max_delay (float)

  • exponential_base (float)

  • jitter (float)

  • retry_config (RetryConfig | None)

property model_name: str

Model identifier (alias for model for backward compatibility).

Type:

str

property model: str

Model identifier (OpenAI API naming).

Type:

str

property base_url: str

Base URL for the API.

Type:

str

property api_key: str | None

API key for authentication.

Type:

Optional[str]

property dimensions: int | None

Output embedding dimensions.

Type:

Optional[int]

property timeout: float

HTTP request timeout in seconds.

Type:

float

property encoding_format: str

Encoding format for embeddings.

Type:

str

property max_batch_size: int | None

Maximum batch size for embedding.

Type:

Optional[int]

property truncate_prompt_tokens: int | None

Maximum number of tokens for prompt truncation.

Type:

Optional[int]

property query_prefix: str

Prefix added to query texts.

Type:

str

property passage_prefix: str

Prefix added to passage/document texts.

Type:

str

property model_kwargs: Mapping[str, Any]

Additional kwargs passed to the API.

Type:

Mapping[str, Any]

property embedding_dim: int

Dimension of embeddings (available after fit or first embed).

Type:

int

property is_fitted: bool

Whether the embedder has been fitted.

Type:

bool

fit(documents)[source]

Initialize the embedder.

For an API-based embedder, this is a no-op since the model is pre-trained. The method exists for interface compatibility with other embedders.

Parameters:

documents (List[str]) – List of documents (not used, for API compatibility).

Returns:

For method chaining.

Return type:

self

embed(input_text, prefix=None)[source]

Embed texts into dense vectors.

Parameters:
  • input_text (Union[str, List[str]]) – Single text or list of texts to embed.

  • prefix (Optional[str], optional) – Prefix to add to each text. Defaults to None (no prefix).

Returns:

  • If single text: np.ndarray of shape (embedding_dim,)

  • If multiple texts: List[np.ndarray] of n_texts arrays, each of shape (embedding_dim,)

Return type:

Union[np.ndarray, List[np.ndarray]]

embed_query(query)[source]

Embed a query or list of queries with the query prefix.

Parameters:

query (Union[str, List[str]]) – Single query or list of queries to embed.

Returns:

  • If single query: np.ndarray of shape (embedding_dim,)

  • If multiple queries: List[np.ndarray] of n_queries arrays, each of shape (embedding_dim,)

Return type:

Union[np.ndarray, List[np.ndarray]]

embed_passage(passage)[source]

Embed a passage/document or list of passages with the passage prefix.

Parameters:

passage (Union[str, List[str]]) – Single passage or list of passages to embed.

Returns:

  • If single passage: np.ndarray of shape (embedding_dim,)

  • If multiple passages: List[np.ndarray] of n_passages arrays, each of shape (embedding_dim,)

Return type:

Union[np.ndarray, List[np.ndarray]]
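embed_query() and embed_passage() simply prepend the configured prefixes before delegating to embed(). Conceptually (a hedged sketch with a hypothetical helper, not the library's actual code):

```python
def apply_prefix(texts, prefix):
    """Prepend an asymmetric-model prefix (e.g. E5's 'query: ') to each text."""
    if isinstance(texts, str):
        return prefix + texts
    return [prefix + t for t in texts]

print(apply_prefix("What is ML?", "query: "))
print(apply_prefix(["ML is a subset of AI."], "passage: "))
# query: What is ML?
# ['passage: ML is a subset of AI.']
```

Asymmetric models such as E5 are trained with these prefixes, so omitting them at query time typically degrades retrieval quality.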

embed_batch(documents, show_progress=False, prefix=None)[source]

Embed a batch of documents.

Parameters:
  • documents (List[str]) – List of documents to embed.

  • show_progress (bool, optional) – Show progress bar. Not used for API-based embedding. Defaults to False.

  • prefix (Optional[str], optional) – Prefix to add to each document. Defaults to None (no prefix).

Returns:

List of embedding vectors.

Return type:

List[np.ndarray]
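Under the hood, each embed call presumably issues a POST to {base_url}/embeddings with an OpenAI-style JSON body. A hedged sketch of the payload assembly; field names follow the public OpenAI embeddings API, and build_payload is a hypothetical helper (zvec_db's actual request construction may differ):

```python
import json

def build_payload(texts, model, dimensions=None, encoding_format="float"):
    """Assemble an OpenAI-style /v1/embeddings request body."""
    payload = {"model": model, "input": texts, "encoding_format": encoding_format}
    if dimensions is not None:
        payload["dimensions"] = dimensions  # only some models support this field
    return payload

body = build_payload(["search query"], "text-embedding-3-small", dimensions=256)
print(json.dumps(body, indent=2))
```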

Modules

embedders

Dense embedding base classes.

openai

OpenAI-compatible API embeddings using /embeddings endpoint.

sentence_transformers

Sentence Transformers embeddings using local models.