zvec_db.embedders.dense.sentence_transformers
Sentence Transformers embeddings using local models.
This module provides dense embedding generation using Sentence Transformers models from HuggingFace. These models run locally on CPU or GPU.
Available Classes
- SentenceTransformersEmbedder
Uses local Sentence Transformers models for dense vector generation. Supports hundreds of pre-trained models from HuggingFace.
Example Usage
from zvec_db.embedders.dense import SentenceTransformersEmbedder
# Standard embedding
embedder = SentenceTransformersEmbedder(
    model_name="all-MiniLM-L6-v2",
    device="cpu"
)
embedder.fit(documents)
vector = embedder.embed("search query")

# With model_kwargs for private models or custom options
embedder = SentenceTransformersEmbedder(
    model_name="org/private-model",
    model_kwargs={"token": "hf_...", "trust_remote_code": True}
)

# With float16 for reduced memory
import torch
embedder = SentenceTransformersEmbedder(
    model_name="all-MiniLM-L6-v2",
    model_kwargs={"torch_dtype": torch.float16}
)
Classes
SentenceTransformersEmbedder – Dense embeddings using Sentence Transformers models locally.
- class zvec_db.embedders.dense.sentence_transformers.SentenceTransformersEmbedder(model_name='all-MiniLM-L6-v2', device=None, max_length=512, normalize=True, trust_remote_code=False, model_kwargs=None)[source]
Dense embeddings using Sentence Transformers models locally.
This embedder uses pre-trained models from the sentence-transformers library to generate semantic embeddings. It supports hundreds of models available on HuggingFace.
- Parameters:
model_name (str, optional) – Name of the model from HuggingFace. Examples:
- "all-MiniLM-L6-v2" (384 dims, fast)
- "all-mpnet-base-v2" (768 dims, best quality)
- "BAAI/bge-small-en-v1.5" (384 dims, good quality)
Defaults to "all-MiniLM-L6-v2".
device (Optional[str], optional) – Device to run model on. “cpu”, “cuda”, or None for auto-detect. Defaults to None.
max_length (Optional[int], optional) – Maximum sequence length. Defaults to 512.
normalize (bool, optional) – Normalize embeddings to unit length. Defaults to True for cosine similarity compatibility.
trust_remote_code (bool, optional) – Trust remote code in model. Defaults to False.
model_kwargs (Optional[Mapping[str, Any]], optional) – Additional keyword arguments passed to the SentenceTransformer constructor. Useful for options like:
- torch_dtype: Model dtype (torch.float16, torch.bfloat16, "auto")
- trust_remote_code: Trust remote code from HuggingFace Hub
- token: HuggingFace API token for private models
- revision: Model revision to load
- cache_dir: Custom cache directory
- local_files_only: Load only local files
- attn_implementation: Attention implementation (e.g., "flash_attention_2")
Defaults to None (no additional kwargs).
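Because normalize=True rescales each embedding to unit length, the dot product of two normalized vectors equals their cosine similarity. A minimal pure-Python illustration of this property (not using the library itself):

```python
import math

def unit_normalize(vec):
    # Rescale a vector to unit length, as normalize=True does for embeddings.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

a = unit_normalize([3.0, 4.0])  # -> [0.6, 0.8]
b = unit_normalize([4.0, 3.0])  # -> [0.8, 0.6]

# For unit-length vectors, the plain dot product *is* the cosine similarity,
# so downstream indexes can use the cheaper inner-product metric.
dot = sum(x * y for x, y in zip(a, b))  # ~0.96
```

This is why normalization is the default: it lets cosine-similarity search be implemented as a simple inner product.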
Example
>>> # Standard embedding
>>> embedder = SentenceTransformersEmbedder(
...     model_name="all-MiniLM-L6-v2",
...     device="cpu"
... )
>>> embedder.fit(["document 1", "document 2"])
>>> vector = embedder.embed("search query")
>>> print(vector.shape)
(384,)

>>> # With model_kwargs for private models
>>> embedder = SentenceTransformersEmbedder(
...     model_name="org/private-model",
...     model_kwargs={"token": "hf_..."}
... )

>>> # With float16 for reduced memory
>>> import torch
>>> embedder = SentenceTransformersEmbedder(
...     model_name="all-MiniLM-L6-v2",
...     model_kwargs={"torch_dtype": torch.float16}
... )
Note
Requires the sentence-transformers package
Models are downloaded automatically on first use
GPU acceleration available if CUDA is installed
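The device=None auto-detection mentioned above can be approximated as follows. This is a sketch of the typical behavior, not the library's exact selection logic (which may also consider other backends such as Apple MPS):

```python
def pick_device():
    # Prefer CUDA when an installed torch reports an available GPU;
    # fall back to CPU otherwise (including when torch is absent).
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"

device = pick_device()
```

Passing an explicit device string bypasses this detection entirely.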
See also
OpenAIEmbedder: Dense embeddings via OpenAI-compatible API.
- __init__(model_name='all-MiniLM-L6-v2', device=None, max_length=512, normalize=True, trust_remote_code=False, model_kwargs=None)[source]
- property model_kwargs: Mapping[str, Any]
Additional kwargs passed to the model.
- Type:
Mapping[str, Any]
- fit(documents)[source]
Initialize the embedder by loading the model.
For Sentence Transformers, this loads the model. No training is performed as models are pre-trained.
- embed_batch(documents, batch_size=32, show_progress=False)[source]
Embed a large batch of documents with optional progress bar.
This method is optimized for processing large corpora by embedding documents in smaller batches. It supports an optional progress bar for tracking long-running operations.
- Parameters:
documents (List[str]) – Documents to embed.
batch_size (int, optional) – Number of documents to embed per batch. Defaults to 32.
show_progress (bool, optional) – Display a progress bar during embedding. Defaults to False.
- Returns:
List of embedding arrays, one per document.
- Return type:
List[np.ndarray]
Example
>>> embedder = SentenceTransformersEmbedder().fit(corpus)
>>> vectors = embedder.embed_batch(
...     large_corpus,
...     batch_size=64,
...     show_progress=True
... )
Note
For single documents or small batches, use embed() instead.
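The batch splitting performed by embed_batch can be sketched in plain Python. This is a simplified illustration of the slicing behavior, not the library's implementation:

```python
def iter_batches(documents, batch_size=32):
    # Yield successive corpus slices, one per underlying embedding call.
    for start in range(0, len(documents), batch_size):
        yield documents[start:start + batch_size]

corpus = [f"doc {i}" for i in range(10)]
sizes = [len(batch) for batch in iter_batches(corpus, batch_size=4)]
print(sizes)  # [4, 4, 2]
```

Note that the final batch may be smaller than batch_size when the corpus length is not an exact multiple of it.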