zvec_db.embedders.dense.embedders

Dense embedding base classes.

This module provides the base class for dense embedding models.

Main Classes

BaseDenseEmbedder

Abstract base class for all dense embedders.

Note

Concrete implementations are in separate modules:

  • openai.py: OpenAIEmbedder (API-based)

  • sentence_transformers.py: SentenceTransformersEmbedder (local)

Example Usage

from zvec_db.embedders.dense import SentenceTransformersEmbedder, OpenAIEmbedder

# Sentence Transformers embedding
embedder = SentenceTransformersEmbedder(
    model_name="all-MiniLM-L6-v2",
    device="cpu"
)
vector = embedder.embed("search query")

# OpenAI embedding
embedder = OpenAIEmbedder(
    base_url="http://localhost:8000/v1",
    model="BAAI/bge-m3"
)
# fit() not needed for API-based embedders
vector = embedder.embed("search query")

Classes

BaseDenseEmbedder(model_name[, max_length, ...])

Base class for dense embedding models.

class zvec_db.embedders.dense.embedders.BaseDenseEmbedder(model_name, max_length=512, normalize=True)[source]

Base class for dense embedding models.

Dense embedders generate fixed-size vector representations of text, as opposed to sparse embeddings, which have variable dimensions.

Parameters:
  • model_name (str) – Name or path of the model to use.

  • max_length (Optional[int]) – Maximum sequence length. Defaults to 512.

  • normalize (bool) – Whether to normalize embeddings to unit length. Defaults to True for cosine similarity compatibility.

Example

>>> embedder = SentenceTransformersEmbedder("all-MiniLM-L6-v2")
>>> embedder.fit(["document 1", "document 2"])
>>> vector = embedder.embed("query")
>>> len(vector)  # Fixed size
384
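
The interface above can be illustrated with a toy stand-in subclass. Everything below (the ToyDenseEmbedder class and its hashing-based pseudo-vectors) is a hypothetical sketch for illustration only, not zvec_db code:

```python
import numpy as np

class ToyDenseEmbedder:
    """Illustrative stand-in for a BaseDenseEmbedder subclass (not real zvec_db code)."""

    def __init__(self, model_name, max_length=512, normalize=True):
        self.model_name = model_name
        self.max_length = max_length
        self.normalize = normalize
        self._fitted = False

    def fit(self, documents):
        # Dense models learn no vocabulary; fit() only marks initialization.
        self._fitted = True
        return self  # return self to enable method chaining

    def embed(self, input_text):
        # Single string -> one vector; list of strings -> list of vectors.
        if isinstance(input_text, str):
            return self._embed_one(input_text)
        return [self._embed_one(t) for t in input_text]

    def _embed_one(self, text, dim=384):
        # Deterministic pseudo-embedding seeded from the text (toy only).
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        vec = rng.standard_normal(dim).astype(np.float32)
        if self.normalize:
            vec = vec / np.linalg.norm(vec)  # unit length for cosine similarity
        return vec

# fit() returns self, so initialization and embedding can be chained.
vector = ToyDenseEmbedder("toy-model").fit(["doc 1", "doc 2"]).embed("query")
```

Note how normalize=True keeps every vector at unit length, which is what makes dot products equal to cosine similarity.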
__init__(model_name, max_length=512, normalize=True)[source]
Parameters:
  • model_name (str)

  • max_length (int | None)

  • normalize (bool)

abstractmethod fit(documents)[source]

Initialize the embedder on a corpus.

For dense models, this is typically optional and just initializes the model. Unlike sparse models, no vocabulary is learned.

Parameters:

documents (List[str]) – List of documents for initialization.

Returns:

The embedder instance, for method chaining.

Return type:

self

abstractmethod embed(input_text)[source]

Generate dense embeddings for text.

Parameters:

input_text (str | List[str]) – Single document or batch of documents.

Returns:

Numpy array for single input, or list of numpy arrays for batch input.

Return type:

ndarray | List[ndarray]
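
The single-versus-batch contract described above can be sketched as a plain dispatch on the input type. The helper below is illustrative only, not zvec_db's implementation:

```python
import numpy as np

def embed(input_text, dim=4):
    # Toy embedding: the input's character count, repeated (illustration only).
    def _one(text):
        return np.full(dim, float(len(text)))

    if isinstance(input_text, str):
        return _one(input_text)               # single input -> one ndarray
    return [_one(t) for t in input_text]      # batch input -> list of ndarrays

single = embed("query")          # ndarray of shape (dim,)
batch = embed(["a", "bb"])       # list of two ndarrays
```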

__call__(input_text)[source]

Call shortcut that delegates to embed().

This allows the embedder to be called like a function:

embedder = SentenceTransformersEmbedder()
embedder.fit(documents)
vector = embedder("query text")  # equivalent to embedder.embed(...)
Parameters:

input_text (str | List[str]) – Single document or batch of documents.

Returns:

Numpy array for single input, or list of numpy arrays for batch input.

Return type:

ndarray | List[ndarray]

transform(input_text)[source]

Alias for embed() that always returns a 2D numpy array.

For single input, returns 2D array with shape (1, dim). For batch input, returns 2D array with shape (n, dim).

Parameters:

input_text (str | List[str]) – Single document or batch.

Returns:

2D numpy array of embeddings.

Return type:

ndarray
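
The shape contract of transform() can be reproduced with numpy alone. This is a sketch under the documented behavior, not the actual implementation:

```python
import numpy as np

def to_2d(vectors):
    # Promote one vector or a list of vectors to a 2D (n, dim) array,
    # mirroring transform(): single input -> (1, dim), batch -> (n, dim).
    if isinstance(vectors, np.ndarray) and vectors.ndim == 1:
        return vectors[np.newaxis, :]    # (dim,) -> (1, dim)
    return np.vstack(vectors)            # list of (dim,) -> (n, dim)

single_out = to_2d(np.zeros(3))
batch_out = to_2d([np.zeros(3), np.ones(3)])
```

A consistently 2D return value is convenient for vector stores, which usually expect an (n, dim) matrix regardless of batch size.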

save(path)[source]

Save embedder configuration.

Dense models typically don’t need saving, since they load pre-trained weights; this method saves configuration only.

Parameters:

path (str) – Path to save configuration.

Return type:

None

load(path)[source]

Load embedder configuration.

Parameters:

path (str) – Path to configuration file.

Return type:

None
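
Since only configuration is persisted (no model weights), save() and load() can be thought of as a round trip of the constructor arguments. The JSON layout below is an assumption for illustration; the actual file format may differ:

```python
import json
import os
import tempfile

# Configuration mirroring BaseDenseEmbedder's constructor arguments.
config = {"model_name": "all-MiniLM-L6-v2", "max_length": 512, "normalize": True}

path = os.path.join(tempfile.mkdtemp(), "embedder.json")

with open(path, "w") as f:
    json.dump(config, f)      # save(): write configuration only, no weights

with open(path) as f:
    loaded = json.load(f)     # load(): restore configuration
```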

property is_fitted: bool

True if the embedder has been initialized.

Type:

bool