Loaders

Uncompressed File Loaders

class pysvs.VectorDataLoader

Handle representing an uncompressed vector data file.

__init__(self: pysvs.VectorDataLoader, path: str, data_type: pysvs.DataType | None = None, dims: int | None = None) -> None

Construct a new pysvs.VectorDataLoader.

Parameters:
  • path (str) –

    The path to the file to load. This can either be:

    • The path to the directory where a previous vector dataset was saved (preferred).

    • The direct path to the vector data file itself. In this case, the loader tries to infer the file type from its extension. Recognized extensions: “.[b/i/f]vecs”, “.bin”, and “.svs”.

  • data_type (pysvs.DataType) – The native type of the elements in the dataset.

  • dims (int) – The expected dimensionality of the dataset. While this argument is generally optional, providing it may yield runtime speedups.
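As an illustration of the direct-file case, the sketch below writes a tiny ".fvecs" file by hand and then points a loader at it. The helper names (write_fvecs, make_loader) and the toy data are hypothetical and not part of pysvs. The fvecs layout used here (an int32 length followed by that many float32 values, repeated per vector) is the common convention for this extension. The pysvs import is deferred so the file-writing part runs even where pysvs is not installed.

```python
import os
import struct
import tempfile


def write_fvecs(path, vectors):
    """Write vectors in fvecs layout: int32 length, then that many float32 values."""
    with open(path, "wb") as f:
        for vec in vectors:
            f.write(struct.pack("<i", len(vec)))
            f.write(struct.pack(f"<{len(vec)}f", *vec))


def make_loader(path, dims=None):
    """Build a loader for an uncompressed float32 file (requires pysvs)."""
    import pysvs  # deferred so write_fvecs works without pysvs

    return pysvs.VectorDataLoader(path, pysvs.DataType.float32, dims=dims)


# Hypothetical toy dataset: 3 vectors of dimension 4.
vectors = [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0], [8.0, 9.0, 10.0, 11.0]]
path = os.path.join(tempfile.mkdtemp(), "toy.fvecs")
write_fvecs(path, vectors)

try:
    loader = make_loader(path, dims=4)
    print(loader.filepath, loader.data_type, loader.dims)
except ImportError:
    print("pysvs is not installed; wrote", path)
```

Passing dims is optional here. It is supplied only because the description above says it may yield runtime speedups.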

property data_type

Access the assigned data type.

Type:

Read/Write (pysvs.DataType)

property dims

Access the expected dimensionality.

Type:

Read/Write (int)

property filepath

Access the underlying file path.

Type:

Read/Write (str)

LVQ Loader

The LVQ loader provides lazy compression of uncompressed data and reloading of previously saved LVQ data.

class pysvs.LVQLoader

Generic LVQ Loader

__init__(*args, **kwargs)

Overloaded function.

  1. __init__(self: pysvs.LVQLoader, datafile: pysvs.VectorDataLoader, primary: int, residual: int = 0, padding: int = 0, strategy: pysvs.LVQStrategy = <LVQStrategy.Auto: 0>) -> None

Construct a loader that will lazily compress the results of the data loader. Requires an appropriate back-end to be compiled for all combinations of primary and residual bits.

Parameters:
  • datafile (pysvs.VectorDataLoader) – The uncompressed dataset to compress in-memory.

  • primary (int) – The number of bits to use for compression in the primary dataset.

  • residual (int) – The number of bits to use for compression in the residual dataset. Default: 0.

  • padding (int) – The alignment (in bytes) for the beginning of each compressed vector. Values of 32 or 64 may offer the best performance at the cost of a lower compression ratio. A value of 0 implies no special alignment. Default: 0.

  • strategy (pysvs.LVQStrategy) – The packing strategy to use for the compressed codes. See the documentation for pysvs.LVQStrategy.

  2. __init__(self: pysvs.LVQLoader, directory: str, padding: int = 0, strategy: pysvs.LVQStrategy = <LVQStrategy.Auto: 0>) -> None

Reload a compressed dataset from a previously saved dataset. Requires an appropriate back-end to be compiled for all combinations of primary and residual bits.

Parameters:
  • directory (str) – The directory where the dataset was previously saved.

  • primary (int) – The number of bits to use for compression in the primary dataset.

  • residual (int) – The number of bits to use for compression in the residual dataset. Default: 0.

  • dims (int) – The number of dimensions in the dataset. May provide a performance boost if given if a specialization has been compiled. Default: Dynamic (any dimension).

  • padding (int) – The alignment (in bytes) for the beginning of each compressed vector. Values of 32 or 64 may offer the best performance at the cost of a lower compression ratio. A value of 0 implies no special alignment. Default: 0.

  • strategy (pysvs.LVQStrategy) – The packing strategy to use for the compressed codes. See the documentation for pysvs.LVQStrategy.

  3. __init__(self: pysvs.LVQLoader, legacy: pysvs.LVQ4) -> None

  4. __init__(self: pysvs.LVQLoader, legacy: pysvs.LVQ8) -> None

  5. __init__(self: pysvs.LVQLoader, legacy: pysvs.LVQ4x4) -> None

  6. __init__(self: pysvs.LVQLoader, legacy: pysvs.LVQ4x8) -> None

  7. __init__(self: pysvs.LVQLoader, legacy: pysvs.LVQ8x8) -> None

property dims

The number of dimensions.

property primary_bits

The number of bits used for the primary encoding.

reload_from(self: pysvs.LVQLoader, directory: str) -> pysvs.LVQLoader

Create a copy of the argument loader configured to reload a previously saved LVQ dataset from the given directory.

property residual_bits

The number of bits used for the residual encoding.

property strategy

The packing strategy to use.
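The following sketch shows one way the first overload and reload_from might be used together, along with a rough calculation of how padding affects the size of each compressed vector. The helper names and the example numbers are hypothetical. The size arithmetic is a simplification that counts only the packed codes and ignores any per-vector constants LVQ stores alongside them, so treat it as an illustration of the compression-ratio trade-off rather than the library's exact layout.

```python
def padded_bytes(dims, bits, padding):
    """Bytes for `dims` codes of `bits` bits, rounded up to a multiple of `padding`.

    Simplified model: ignores any per-vector metadata stored with the codes.
    """
    raw = (dims * bits + 7) // 8  # round bits up to whole bytes
    if padding == 0:
        return raw  # no special alignment
    return -(-raw // padding) * padding  # ceiling division, then scale back


def make_lvq_loader(path, dims, primary=4, residual=8, padding=32):
    """Lazily compress an uncompressed float32 file (requires pysvs and a
    compiled back-end for the chosen primary/residual combination)."""
    import pysvs  # deferred so padded_bytes works without pysvs

    data = pysvs.VectorDataLoader(path, pysvs.DataType.float32, dims=dims)
    return pysvs.LVQLoader(
        data,
        primary=primary,
        residual=residual,
        padding=padding,
        strategy=pysvs.LVQStrategy.Auto,
    )


def reload_lvq(loader, directory):
    """Return a copy of `loader` configured to reload data saved in `directory`."""
    return loader.reload_from(directory)


# Hypothetical case: 100 dimensions, 4 bits per code.
for padding in (0, 32, 64):
    print(padding, padded_bytes(100, 4, padding))
```

In this simplified model, 100 four-bit codes occupy 50 bytes unpadded, and either alignment rounds that up to 64, which is the lower compression ratio the padding description refers to.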

Strategy Selection

The strategy argument of the LVQ loader provides a way of overriding the default selection of the packing strategy used by an LVQ backend.

Note that overriding the default strategy requires the corresponding backend to be compiled in the pysvs shared library component.

class pysvs.LVQStrategy

Select the packing mode for LVQ

Members:

Auto : Let SVS decide the best strategy.

Sequential : Use the Sequential packing strategy.

Turbo : Use the best Turbo packing strategy for this architecture.
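When the strategy comes from a configuration file or command-line flag, it has to be mapped from a string to an enum member. The sketch below is one possible way to do that. The resolve_strategy helper is hypothetical, and the stand-in enum exists only so the snippet runs without pysvs. Its member names mirror the list above, but the numeric values for Sequential and Turbo are made up. With pysvs installed, pysvs.LVQStrategy would be passed instead, assuming it exposes __members__ as pybind11 enums normally do.

```python
import enum


def resolve_strategy(strategy_enum, name):
    """Look up an enum member by exact name, with a readable error on failure."""
    members = strategy_enum.__members__
    try:
        return members[name]
    except KeyError:
        raise ValueError(
            f"unknown strategy {name!r}; expected one of {sorted(members)}"
        ) from None


class StandInStrategy(enum.Enum):
    """Stand-in for pysvs.LVQStrategy; values other than Auto are hypothetical."""

    Auto = 0
    Sequential = 1
    Turbo = 2


print(resolve_strategy(StandInStrategy, "Turbo"))
```

Choosing anything other than Auto still requires the corresponding backend to be compiled into the pysvs shared library, as noted above.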

LeanVec Loader

The LeanVec loader provides a way to use dimensionality reduction to improve performance on high dimensional datasets.

Internally, a LeanVec dataset consists of the dimensionality reduced primary dataset (over which the bulk of the index search is conducted) and a full dimensional secondary dataset used to rerank and refine candidates returned from the initial search.

pysvs allows selection of the storage format using the pysvs.LeanVecKind enum, enabling float16 and LVQ compression for either the primary or the secondary dataset.

class pysvs.LeanVecLoader

Generic LeanVec Loader

__init__(*args, **kwargs)

Overloaded function.

  1. __init__(self: pysvs.LeanVecLoader, datafile: pysvs.VectorDataLoader, leanvec_dims: int, primary_kind: pysvs.LeanVecKind = <LeanVecKind.lvq8: 2>, secondary_kind: pysvs.LeanVecKind = <LeanVecKind.lvq8: 2>, data_matrix: Optional[numpy.ndarray[numpy.float32]] = None, query_matrix: Optional[numpy.ndarray[numpy.float32]] = None, alignment: int = 32) -> None

Construct a loader that will lazily reduce the dimensionality of the data loader. Requires an appropriate back-end to be compiled for all combinations of primary and secondary types.

Parameters:
  • datafile (pysvs.VectorDataLoader) – The uncompressed original dataset.

  • leanvec_dims (int) – The number of dimensions after reduction.

  • primary_kind (LeanVecKind) – Type of dataset used for Primary (Default: lvq8)

  • secondary_kind (LeanVecKind) – Type of dataset used for Secondary (Default: lvq8)

  • data_matrix (Optional[numpy.ndarray[numpy.float32]]) – Matrix for data transformation [see note 1] (Default: None).

  • query_matrix (Optional[numpy.ndarray[numpy.float32]]) – Matrix for query transformation [see note 1] (Default: None).

  • alignment (int) – Alignment/padding used in LVQ data types (Default: 32)

Note 1: The arguments data_matrix and query_matrix are optional and have the following requirements for valid combinations:

  1. Neither matrix provided: Transform dataset and queries using a default PCA-based transformation.

  2. Only data_matrix provided: The provided matrix is used to transform both the queries and the original dataset.

  3. Both arguments are provided: Use the respective matrices for transformation.
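The three combinations in Note 1 can be expressed as a small check that runs before constructing the loader. The function name and the returned labels are hypothetical. Rejecting a query_matrix supplied without a data_matrix is an assumption based on that combination not appearing in the list above.

```python
def transform_mode(data_matrix=None, query_matrix=None):
    """Classify which transformation the given matrix arguments would select."""
    if data_matrix is None and query_matrix is None:
        return "default-pca"  # case 1: neither matrix provided
    if query_matrix is None:
        return "shared-matrix"  # case 2: data_matrix used for data and queries
    if data_matrix is None:
        # Assumption: not listed as a valid combination in Note 1.
        raise ValueError("query_matrix was given without data_matrix")
    return "separate-matrices"  # case 3: both provided


# Placeholder objects stand in for numpy float32 arrays.
print(transform_mode())
print(transform_mode(data_matrix=object()))
print(transform_mode(data_matrix=object(), query_matrix=object()))
```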

  2. __init__(self: pysvs.LeanVecLoader, directory: str, alignment: int = 32) -> None

Reload a LeanVec dataset from a previously saved dataset. Requires an appropriate back-end to be compiled for all combinations of primary and secondary types.

Parameters:
  • directory (str) – The directory where the dataset was previously saved.

  • leanvec_dims (int) – The number of dimensions after reduction. Default: Dynamic (any dimension).

  • dims (int) – The number of dimensions in the original dataset. Default: Dynamic (any dimension).

  • primary (LeanVecKind) – Type of dataset used for Primary. Default: pysvs.LeanVecKind.lvq8.

  • secondary (LeanVecKind) – Type of dataset used for Secondary. Default: pysvs.LeanVecKind.lvq8.

  • alignment (int) – Alignment/padding used in LVQ data types. Default: 32.

property alignment

The alignment to use for LVQ encoded data.

property dims

The full-dimensionality.

property leanvec_dims

The reduced dimensionality.

property primary_kind

The encoding of the reduced dimensional dataset.

reload_from(self: pysvs.LeanVecLoader, directory: str) -> pysvs.LeanVecLoader

Create a copy of the argument loader configured to reload a previously saved LeanVec dataset from the given directory.

property secondary_kind

The encoding of the full-dimensional dataset.
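A minimal sketch of building a LeanVec loader from an uncompressed file, and of deriving a reload loader from it, might look like the following. The helper names are hypothetical. The check that leanvec_dims does not exceed dims is a sanity check added for this example, not something the library is documented to enforce. The kinds are left at the documented defaults (lvq8 for both), and a compiled back-end for that combination is assumed.

```python
def make_leanvec_loader(path, dims, leanvec_dims):
    """Lazily reduce a float32 dataset from `dims` to `leanvec_dims` dimensions."""
    # Example-only sanity check: a reduction should not increase dimensionality.
    if not 0 < leanvec_dims <= dims:
        raise ValueError(
            f"leanvec_dims must be in 1..{dims}, got {leanvec_dims}"
        )

    import pysvs  # deferred so the check above runs without pysvs

    data = pysvs.VectorDataLoader(path, pysvs.DataType.float32, dims=dims)
    return pysvs.LeanVecLoader(
        data,
        leanvec_dims,
        primary_kind=pysvs.LeanVecKind.lvq8,
        secondary_kind=pysvs.LeanVecKind.lvq8,
        alignment=32,
    )


def reload_leanvec(loader, directory):
    """Return a copy of `loader` configured to reload data saved in `directory`."""
    return loader.reload_from(directory)
```

Because neither data_matrix nor query_matrix is passed, this sketch relies on the default PCA-based transformation described in Note 1.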

class pysvs.LeanVecKind

LeanVec primary and secondary types

Members:

float32 : Uncompressed float32

float16 : Uncompressed float16

lvq8 : Compressed with 8-bit LVQ

lvq4 : Compressed with 4-bit LVQ
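To get a feel for how the choice of kind affects storage, the sketch below estimates the bytes per vector for the primary (reduced) and secondary (full-dimensional) datasets. The function name and the example dimensions are hypothetical. The bit widths follow directly from the member names, but the estimate ignores alignment and any per-vector constants that LVQ stores, so actual sizes will be somewhat larger.

```python
# Bits per element implied by each LeanVecKind member name.
BITS_PER_ELEMENT = {"float32": 32, "float16": 16, "lvq8": 8, "lvq4": 4}


def leanvec_bytes_per_vector(dims, leanvec_dims, primary_kind, secondary_kind):
    """Rough (primary, secondary) bytes per vector, ignoring padding and metadata."""
    primary = (leanvec_dims * BITS_PER_ELEMENT[primary_kind] + 7) // 8
    secondary = (dims * BITS_PER_ELEMENT[secondary_kind] + 7) // 8
    return primary, secondary


# Hypothetical dataset: 768 dimensions reduced to 160.
for kinds in (("lvq8", "lvq8"), ("lvq4", "float16"), ("float32", "float32")):
    print(kinds, leanvec_bytes_per_vector(768, 160, *kinds))
```

Since the bulk of the search runs over the primary dataset, its per-vector size is the figure that most directly reflects the benefit of the reduction.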