zvec_db.embedders.dense
Dense embedders for semantic search.
- class zvec_db.embedders.dense.BaseDenseEmbedder(model_name, max_length=512, normalize=True)[source]
Base class for dense embedding models.
Dense embedders generate fixed-size vector representations of text, as opposed to sparse embeddings, which have variable dimensions.
- Parameters:
model_name (str) – Name of the embedding model.
max_length (Optional[int], optional) – Maximum sequence length. Defaults to 512.
normalize (bool, optional) – Normalize embeddings to unit length. Defaults to True.
Example
>>> embedder = SentenceTransformersEmbedder("all-MiniLM-L6-v2")
>>> embedder.fit(["document 1", "document 2"])
>>> vector = embedder.embed("query")
>>> len(vector)  # Fixed size
384
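The fixed-size property can be illustrated with plain numpy (an illustrative sketch only, using random values in place of a real model; the dimensionalities shown assume a 384-dim model such as all-MiniLM-L6-v2):

```python
import numpy as np

# Dense embeddings map any text, short or long, to a vector of one
# fixed dimensionality chosen by the model.
dense_short = np.random.rand(384)  # embedding of a two-word query
dense_long = np.random.rand(384)   # embedding of a long paragraph
assert dense_short.shape == dense_long.shape == (384,)

# Sparse representations (e.g. bag-of-words), by contrast, have as many
# active dimensions as there are distinct terms in the text.
sparse_short = {"hi": 1}                     # one non-zero term
sparse_long = {"machine": 2, "learning": 1}  # term count varies per text
```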
- abstractmethod fit(documents)[source]
Initialize the embedder on a corpus.
For dense models, this is typically optional and just initializes the model. Unlike sparse models, no vocabulary is learned.
- __call__(input_text)[source]
Call shortcut that delegates to embed(). This allows the embedder to be called like a function:
embedder = SentenceTransformersEmbedder()
embedder.fit(documents)
vector = embedder("query text")  # equivalent to embedder.embed(...)
- transform(input_text)[source]
Alias for embed() returning numpy array.
For single input, returns 2D array with shape (1, dim). For batch input, returns 2D array with shape (n, dim).
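The shape contract described above can be sketched as follows (illustrative only; `transform_sketch` is a stand-in returning zero vectors, not the real model, and the 384-dim size is an assumption matching all-MiniLM-L6-v2):

```python
import numpy as np

DIM = 384  # e.g. all-MiniLM-L6-v2

def transform_sketch(input_text):
    """Illustrates transform()'s shape contract, not its internals:
    a single string and a list of strings both yield a 2D array."""
    texts = [input_text] if isinstance(input_text, str) else list(input_text)
    return np.zeros((len(texts), DIM), dtype=np.float32)

assert transform_sketch("one query").shape == (1, DIM)
assert transform_sketch(["a", "b", "c"]).shape == (3, DIM)
```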
- save(path)[source]
Save embedder configuration.
Dense models typically don’t need saving as they load pre-trained weights. This saves configuration only.
- Parameters:
path (str) – Path to save configuration.
- Return type:
None
- class zvec_db.embedders.dense.SentenceTransformersEmbedder(model_name='all-MiniLM-L6-v2', device=None, max_length=512, normalize=True, trust_remote_code=False, model_kwargs=None)[source]
Dense embeddings using Sentence Transformers models locally.
This embedder uses pre-trained models from the sentence-transformers library to generate semantic embeddings. It supports hundreds of models available on HuggingFace.
- Parameters:
model_name (str, optional) – Name of the model from HuggingFace. Examples:
- “all-MiniLM-L6-v2” (384 dims, fast)
- “all-mpnet-base-v2” (768 dims, best quality)
- “BAAI/bge-small-en-v1.5” (384 dims, good quality)
Defaults to “all-MiniLM-L6-v2”.
device (Optional[str], optional) – Device to run model on. “cpu”, “cuda”, or None for auto-detect. Defaults to None.
max_length (Optional[int], optional) – Maximum sequence length. Defaults to 512.
normalize (bool, optional) – Normalize embeddings to unit length. Defaults to True for cosine similarity compatibility.
trust_remote_code (bool, optional) – Trust remote code in model. Defaults to False.
model_kwargs (Optional[Mapping[str, Any]], optional) – Additional keyword arguments passed to the SentenceTransformer constructor. Useful for options like:
- torch_dtype: Model dtype (torch.float16, torch.bfloat16, “auto”)
- trust_remote_code: Trust remote code from HuggingFace Hub
- token: HuggingFace API token for private models
- revision: Model revision to load
- cache_dir: Custom cache directory
- local_files_only: Load only local files
- attn_implementation: Attention implementation (e.g., “flash_attention_2”)
Defaults to None (no additional kwargs).
Example
>>> # Standard embedding
>>> embedder = SentenceTransformersEmbedder(
...     model_name="all-MiniLM-L6-v2",
...     device="cpu"
... )
>>> embedder.fit(["document 1", "document 2"])
>>> vector = embedder.embed("search query")
>>> print(vector.shape)
(384,)
>>> # With model_kwargs for private models
>>> embedder = SentenceTransformersEmbedder(
...     model_name="org/private-model",
...     model_kwargs={"token": "hf_..."}
... )
>>> # With float16 for reduced memory
>>> import torch
>>> embedder = SentenceTransformersEmbedder(
...     model_name="all-MiniLM-L6-v2",
...     model_kwargs={"torch_dtype": torch.float16}
... )
Note
Requires the sentence-transformers package
Models are downloaded automatically on first use
GPU acceleration available if CUDA is installed
See also
OpenAIEmbedder: Dense embeddings via OpenAI-compatible API.
- __init__(model_name='all-MiniLM-L6-v2', device=None, max_length=512, normalize=True, trust_remote_code=False, model_kwargs=None)[source]
- property model_kwargs: Mapping[str, Any]
Additional kwargs passed to the model.
- Type:
Mapping[str, Any]
- fit(documents)[source]
Initialize the embedder by loading the model.
For Sentence Transformers, this loads the model. No training is performed as models are pre-trained.
- embed_batch(documents, batch_size=32, show_progress=False)[source]
Embed a large batch of documents with optional progress bar.
This method is optimized for processing large corpora by embedding documents in smaller batches. It supports an optional progress bar for tracking long-running operations.
- Parameters:
documents (List[str]) – Documents to embed.
batch_size (int, optional) – Number of documents per batch. Defaults to 32.
show_progress (bool, optional) – Show a progress bar. Defaults to False.
- Returns:
List of embedding arrays, one per document.
- Return type:
List[np.ndarray]
Example
>>> embedder = SentenceTransformersEmbedder().fit(corpus)
>>> vectors = embedder.embed_batch(
...     large_corpus,
...     batch_size=64,
...     show_progress=True
... )
Note
For single documents or small batches, use embed() instead.
- class zvec_db.embedders.dense.OpenAIEmbedder(model='text-embedding-3-small', base_url='https://api.openai.com/v1', api_key=None, dimensions=None, timeout=30.0, encoding_format='float', max_batch_size=None, truncate_prompt_tokens=None, query_prefix=None, passage_prefix=None, model_kwargs=None, model_name=None, max_retries=3, initial_delay=1.0, max_delay=60.0, exponential_base=2.0, jitter=0.1, retry_config=None)[source]
Dense embedder using OpenAI-compatible /embeddings endpoint.
This embedder uses the /v1/embeddings endpoint to compute dense vector representations of texts. It’s compatible with OpenAI’s embedding API format and supports batch processing.
Works with:
- OpenAI API (text-embedding-3-small, text-embedding-3-large, etc.)
- vLLM serving open-source embedding models
- Any OpenAI-compatible API endpoint
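The wire format this embedder targets is the standard OpenAI embeddings request/response shape. A minimal sketch of that payload (the field names follow the published API contract; the values here are placeholders, and the response shown is abbreviated, not real output):

```python
import json

# Request body sent to POST {base_url}/embeddings
request = {
    "model": "text-embedding-3-small",
    "input": ["search query"],   # a string or a list of strings
    "encoding_format": "float",  # or "base64"
}

# Typical response shape: one embedding per input, in input order
response = {
    "object": "list",
    "data": [{"object": "embedding", "index": 0, "embedding": [0.01, -0.02]}],
    "model": "text-embedding-3-small",
}
vectors = [item["embedding"] for item in response["data"]]
body = json.dumps(request)  # what actually goes over the wire
```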
- Parameters:
model (str) – Model name to use. OpenAI: “text-embedding-3-small”, “text-embedding-3-large” vLLM: Model name configured in vLLM
base_url (str, optional) – API base URL. For OpenAI: “https://api.openai.com/v1” For vLLM local: “http://localhost:8000/v1” Defaults to “https://api.openai.com/v1”.
api_key (Optional[str], optional) – API key for authentication. Defaults to None (reads from OPENAI_API_KEY env var).
dimensions (Optional[int], optional) – Output embedding dimensions. Only supported by some models (e.g., text-embedding-3-small). Defaults to None (use model default).
timeout (float, optional) – HTTP request timeout in seconds. Defaults to 30.0.
encoding_format (str, optional) – Encoding format for embeddings. “float” for float32 vectors, “base64” for base64-encoded. Defaults to “float”.
max_batch_size (Optional[int], optional) – Maximum number of texts to embed in a single batch. None means no limit. Defaults to None.
truncate_prompt_tokens (Optional[int], optional) – Maximum number of tokens for prompt truncation. When set, prompts exceeding this limit are truncated. By default, APIs reject prompts exceeding max_model_len unless this is set. Defaults to None (no truncation).
query_prefix (Optional[str], optional) – Prefix to add to query texts. Useful for asymmetric embedding models like E5, GTE, etc. Example: “query: “ for E5 models. Defaults to None (no prefix).
passage_prefix (Optional[str], optional) – Prefix to add to passage/document texts. Useful for asymmetric embedding models like E5, GTE, etc. Example: “passage: “ for E5 models. Defaults to None (no prefix).
model_kwargs (Optional[Mapping[str, Any]], optional) – Additional keyword arguments passed to the API request. Useful for options like:
- user: Unique identifier for monitoring and abuse detection
- extra_headers: Additional HTTP headers
- extra_query_params: Additional query parameters
Defaults to None (no additional kwargs).
model_name (str, optional) – Deprecated. Use model instead. This parameter is kept for backward compatibility. Defaults to None.
max_retries (int, optional) – Maximum number of retry attempts for transient failures. Set to 0 to disable retries. Defaults to 3.
initial_delay (float, optional) – Initial delay before first retry in seconds. Defaults to 1.0.
max_delay (float, optional) – Maximum delay cap in seconds. Defaults to 60.0.
exponential_base (float, optional) – Base for exponential backoff. Defaults to 2.0.
jitter (float, optional) – Random jitter factor (0.0-1.0) to avoid thundering herd. Defaults to 0.1.
retry_config (Optional[RetryConfig], optional) – Pre-configured retry settings. If provided, overrides individual retry parameters. Defaults to None.
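How these retry parameters combine follows the usual exponential-backoff-with-jitter scheme; the sketch below shows the likely formula implied by the parameter names and defaults, not necessarily the library's exact code:

```python
import random

def backoff_delay(attempt, initial_delay=1.0, exponential_base=2.0,
                  max_delay=60.0, jitter=0.1):
    """Delay before retry `attempt` (0-based): exponential growth from
    initial_delay, capped at max_delay, with +/- jitter applied to
    avoid a thundering herd of simultaneous retries."""
    delay = min(initial_delay * exponential_base ** attempt, max_delay)
    return delay * (1 + random.uniform(-jitter, jitter))

# With the defaults, base delays grow 1s, 2s, 4s, ... until max_delay.
```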
Example
>>> # OpenAI API
>>> embedder = OpenAIEmbedder(
...     model="text-embedding-3-small",
...     api_key="sk-..."
... )
>>> vector = embedder.embed("search query")
>>> # vLLM local
>>> embedder = OpenAIEmbedder(
...     base_url="http://localhost:8000/v1",
...     api_key="not-needed",
...     model="BAAI/bge-m3"
... )
>>> vector = embedder.embed("search query")
>>> # With truncation to handle long prompts
>>> embedder = OpenAIEmbedder(
...     base_url="http://localhost:8000/v1",
...     model="embedding",
...     truncate_prompt_tokens=512
... )
>>> # With prefixes for asymmetric models (e.g., E5, GTE)
>>> embedder = OpenAIEmbedder(
...     base_url="http://localhost:8000/v1",
...     model="intfloat/e5-large-v2",
...     query_prefix="query: ",
...     passage_prefix="passage: "
... )
>>> query_vector = embedder.embed_query("What is machine learning?")
>>> doc_vector = embedder.embed_passage("ML is a subset of AI.")
>>> # With custom retry settings for production
>>> embedder = OpenAIEmbedder(
...     model="text-embedding-3-small",
...     max_retries=5,
...     initial_delay=2.0,
...     max_delay=120.0,
... )
See also
SentenceTransformersEmbedder: Local dense embeddings using HuggingFace models. RetryConfig: Configuration class for retry behavior.
- __init__(model='text-embedding-3-small', base_url='https://api.openai.com/v1', api_key=None, dimensions=None, timeout=30.0, encoding_format='float', max_batch_size=None, truncate_prompt_tokens=None, query_prefix=None, passage_prefix=None, model_kwargs=None, model_name=None, max_retries=3, initial_delay=1.0, max_delay=60.0, exponential_base=2.0, jitter=0.1, retry_config=None)[source]
- Parameters:
model (str)
base_url (str)
api_key (str | None)
dimensions (int | None)
timeout (float)
encoding_format (str)
max_batch_size (int | None)
truncate_prompt_tokens (int | None)
query_prefix (str | None)
passage_prefix (str | None)
model_name (str | None)
max_retries (int)
initial_delay (float)
max_delay (float)
exponential_base (float)
jitter (float)
retry_config (RetryConfig | None)
- property truncate_prompt_tokens: int | None
Maximum number of tokens for prompt truncation.
- Type:
Optional[int]
- property model_kwargs: Mapping[str, Any]
Additional kwargs passed to the API.
- Type:
Mapping[str, Any]
- fit(documents)[source]
Initialize the embedder.
For API-based embedder, this is a no-op as the model is pre-trained. This method exists for API compatibility.
- embed(input_text, prefix=None)[source]
Embed texts into dense vectors.
- Parameters:
input_text (Union[str, List[str]]) – Text or list of texts to embed.
prefix (Optional[str], optional) – Prefix prepended to each text before embedding. Defaults to None.
- Returns:
If single text: np.ndarray of shape (embedding_dim,)
If multiple texts: List[np.ndarray] of shape (n_texts, embedding_dim)
- Return type:
Union[np.ndarray, List[np.ndarray]]
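Because the return type depends on whether the input was a single string or a list, callers often normalize the result to a 2-D (n, dim) array before indexing. A small helper sketch (an illustration built on the documented return types, not part of the library):

```python
import numpy as np

def as_matrix(result):
    """Normalize embed()'s output, either a single (dim,) vector or a
    list of (dim,) vectors, into one 2-D (n, dim) array."""
    if isinstance(result, np.ndarray) and result.ndim == 1:
        return result[np.newaxis, :]  # single text -> (1, dim)
    return np.stack(result)           # list of texts -> (n, dim)

assert as_matrix(np.zeros(384)).shape == (1, 384)
assert as_matrix([np.zeros(384), np.ones(384)]).shape == (2, 384)
```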
- embed_passage(passage)[source]
Embed a passage/document or list of passages with the passage prefix.
Modules
- Dense embedding base classes.
- OpenAI-compatible API embeddings using /embeddings endpoint.
- Sentence Transformers embeddings using local models.