Common Python API

Memory Allocators

class pysvs.DRAM

Small class for an allocator capable of using huge pages. Page sizes are used in order of preference: 1 GiB, 2 MiB, then 4 KiB. See Huge Pages for more information on what huge pages are and how to allocate them on your system.

Enums

class pysvs.DistanceType

Select which distance function to use.

Members:

L2 : Euclidean Distance (minimize)

MIP : Maximum Inner Product (maximize)

Cosine : Cosine similarity (maximize)
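The minimize/maximize annotations above determine which candidate counts as "nearest". The following is a minimal NumPy sketch of that ranking rule, not pysvs code: `score` is a hypothetical helper, and metrics are named by strings so it runs without pysvs installed.

```python
import numpy as np

def score(query, vectors, metric):
    """Score each row of `vectors` against `query` (hypothetical helper).

    Returns (scores, best_index). "L2" picks the smallest score; "MIP" and
    "Cosine" pick the largest, mirroring pysvs.DistanceType.
    """
    query = np.asarray(query, dtype=np.float32)
    vectors = np.asarray(vectors, dtype=np.float32)
    if metric == "L2":
        # Squared Euclidean distance; the square root would not change the ranking.
        scores = np.sum((vectors - query) ** 2, axis=1)
        return scores, int(np.argmin(scores))
    if metric == "MIP":
        scores = vectors @ query
        return scores, int(np.argmax(scores))
    if metric == "Cosine":
        # Zero-length vectors are not handled in this sketch (division by zero).
        norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(query)
        scores = (vectors @ query) / norms
        return scores, int(np.argmax(scores))
    raise ValueError(f"unknown metric: {metric}")

vectors = [[1.0, 0.5], [5.0, 5.0], [0.2, 0.0]]
query = [1.0, 0.0]
for metric in ("L2", "MIP", "Cosine"):
    print(metric, score(query, vectors, metric)[1])
# L2 0      (closest point)
# MIP 1     (largest inner product)
# Cosine 2  (same direction as the query)
```

The three metrics pick three different vectors here, which is why the distance type should match how the dataset was meant to be searched.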

class pysvs.DataType

Datatype Selector

Members:

uint8 : 8-bit unsigned integer.

uint16 : 16-bit unsigned integer.

uint32 : 32-bit unsigned integer.

uint64 : 64-bit unsigned integer.

int8 : 8-bit signed integer.

int16 : 16-bit signed integer.

int32 : 32-bit signed integer.

int64 : 64-bit signed integer.

float16 : 16-bit IEEE floating point.

float32 : 32-bit IEEE floating point.

float64 : 64-bit IEEE floating point.
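Each member name corresponds to the NumPy dtype of the same name. The sketch below makes that correspondence explicit; `DATATYPE_TO_NUMPY` is a hypothetical table keyed by member names rather than by the enum itself, so it runs without pysvs.

```python
import numpy as np

# Hypothetical lookup from pysvs.DataType member names to NumPy dtypes.
DATATYPE_TO_NUMPY = {
    "uint8": np.uint8,
    "uint16": np.uint16,
    "uint32": np.uint32,
    "uint64": np.uint64,
    "int8": np.int8,
    "int16": np.int16,
    "int32": np.int32,
    "int64": np.int64,
    "float16": np.float16,
    "float32": np.float32,
    "float64": np.float64,
}

for name, dtype in DATATYPE_TO_NUMPY.items():
    # itemsize is the element width in bytes, e.g. 2 for float16.
    print(f"{name:8s} -> {np.dtype(dtype).name}, {np.dtype(dtype).itemsize} byte(s)")
```

With pysvs installed, a name from this table could be turned into the enum with `getattr(pysvs.DataType, name)`.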

Helper Functions

pysvs.read_vecs(filename)

Read a file in the bvecs/fvecs/ivecs format and return a NumPy array with the results.

The data type of the returned array is determined by the file extension with the following mapping:

  • bvecs: 8-bit unsigned integers.

  • fvecs: 32-bit floating point numbers.

  • ivecs: 32-bit signed integers.

Parameters:

filename (str) – The file to read.

Returns:

NumPy array with the results.
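To show what read_vecs decodes, here is a pure-NumPy sketch of a *vecs reader. It is not pysvs.read_vecs; it assumes the common *vecs layout in which every record is a little-endian int32 dimension `d` followed by `d` elements, with all records sharing the same `d`. `read_vecs_sketch` is a hypothetical name.

```python
import os
import tempfile
import numpy as np

# Element type per extension, following the mapping listed above.
VECS_DTYPES = {".bvecs": np.uint8, ".fvecs": np.float32, ".ivecs": np.int32}

def read_vecs_sketch(filename):
    """Hypothetical pure-NumPy *vecs reader (not pysvs.read_vecs)."""
    dtype = np.dtype(VECS_DTYPES[os.path.splitext(filename)[1]]).newbyteorder("<")
    raw = np.fromfile(filename, dtype=np.uint8)
    if raw.size == 0:
        return np.empty((0, 0), dtype=dtype)
    # The first 4 bytes hold the dimension of the first (and every) record.
    dim = int(raw[:4].view("<i4")[0])
    record_bytes = 4 + dim * dtype.itemsize
    records = raw.reshape(-1, record_bytes)
    # Drop each 4-byte header and reinterpret the payload bytes as `dtype`.
    return records[:, 4:].copy().view(dtype)

# Usage: write a tiny fvecs file by hand, then read it back.
vecs = np.arange(6, dtype=np.float32).reshape(2, 3)
path = os.path.join(tempfile.mkdtemp(), "example.fvecs")
with open(path, "wb") as f:
    for row in vecs:
        f.write(np.array([row.size], dtype="<i4").tobytes())
        f.write(row.astype("<f4").tobytes())
print(read_vecs_sketch(path))
```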

pysvs.write_vecs(array, filename, skip_check=False)

Save a NumPy array to a file in the *vecs format.

Parameters:
  • array (array) – The raw array to save.

  • filename (str) – The file where the results will be saved.

  • skip_check (bool) –

    By default, this function checks whether the file extension of the vecs file is appropriate for the given array (see list below).

    Passing skip_check = True overrides this logic and forces creation of the file.

Result:

The array is saved to the requested file.

File extension to array element type:

  • fvecs: np.float32

  • hvecs: np.float16

  • ivecs: np.uint32

  • bvecs: np.uint8

Warning

The user must specify the file extension corresponding to the desired file format in the filename argument of pysvs.write_vecs().
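The extension check and the record layout can be illustrated with a pure-NumPy writer. This is a sketch, not pysvs.write_vecs: `write_vecs_sketch` is hypothetical, it assumes each row is stored as a little-endian int32 dimension followed by the row's elements, and it raises `TypeError` on a mismatch (the error pysvs actually raises is not documented here).

```python
import os
import tempfile
import numpy as np

# Expected element type per extension, following the table above.
WRITE_DTYPES = {".fvecs": np.float32, ".hvecs": np.float16,
                ".ivecs": np.uint32, ".bvecs": np.uint8}

def write_vecs_sketch(array, filename, skip_check=False):
    """Hypothetical pure-NumPy *vecs writer (not pysvs.write_vecs)."""
    array = np.asarray(array)
    if array.ndim != 2:
        raise ValueError("expected a 2-D array (one vector per row)")
    ext = os.path.splitext(filename)[1]
    if not skip_check:
        expected = WRITE_DTYPES.get(ext)
        if expected is None or array.dtype != expected:
            raise TypeError(f"dtype {array.dtype} does not match extension {ext!r}")
    header = np.array([array.shape[1]], dtype="<i4").tobytes()
    payload = array.astype(array.dtype.newbyteorder("<"))
    with open(filename, "wb") as f:
        for row in payload:
            # Each record: int32 dimension, then the row's raw elements.
            f.write(header)
            f.write(row.tobytes())

data = np.random.default_rng(0).random((4, 8), dtype=np.float32)
path = os.path.join(tempfile.mkdtemp(), "data.fvecs")
write_vecs_sketch(data, path)
try:
    write_vecs_sketch(data, path.replace(".fvecs", ".bvecs"))
except TypeError as err:
    print("rejected:", err)
```

Passing `skip_check=True` to the sketch bypasses the dtype test, analogous to the documented parameter.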

pysvs.read_svs(filename, dtype=<class 'numpy.float32'>)

Read the pysvs native data file as a NumPy array. Note: no type checking is currently performed, so make sure the requested dtype actually matches the contents of the file.

Parameters:
  • filename (str) – The file to read.

  • dtype – The data type of the encoded vectors in the file.

Result:

A NumPy matrix with the results.
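Because read_svs does not verify `dtype`, a mismatch fails silently. The NumPy-only sketch below (it does not use the pysvs native format, whose layout is not described here) shows what happens when the same bytes are reinterpreted with the wrong dtype: values change, and a dtype of a different width also changes the element count.

```python
import numpy as np

# The same 8 bytes: two float32 values, 1.0 and 2.0.
raw = np.array([1.0, 2.0], dtype=np.float32).tobytes()

as_float32 = np.frombuffer(raw, dtype=np.float32)  # correct dtype
as_uint32 = np.frombuffer(raw, dtype=np.uint32)    # wrong dtype, no error raised
as_float16 = np.frombuffer(raw, dtype=np.float16)  # wrong width: 4 elements

print(as_float32)        # [1. 2.]
print(as_uint32)         # [1065353216 1073741824]
print(as_float16.shape)  # (4,)
```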

pysvs.convert_fvecs_to_float16(source_file: str, destination_file: str) -> None

Convert the fvecs file on disk with 32-bit floating point entries to an fvecs file with 16-bit floating point entries.

Parameters:
  • source_file – The source file path to convert.

  • destination_file – The destination file to generate.
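The conversion can be approximated in pure NumPy. This sketch is not the pysvs implementation: `convert_fvecs_to_float16_sketch` is hypothetical, and it assumes the output keeps the int32 dimension header of each record while storing the payload as float16 (the same element type that write_vecs associates with hvecs).

```python
import os
import tempfile
import numpy as np

def convert_fvecs_to_float16_sketch(source_file, destination_file):
    """Hypothetical pure-NumPy analogue of pysvs.convert_fvecs_to_float16."""
    # An fvecs record is one int32 dimension plus `dim` float32 values,
    # so the whole file can be read as 4-byte words.
    words = np.fromfile(source_file, dtype="<i4")
    if words.size == 0:
        open(destination_file, "wb").close()
        return
    dim = int(words[0])
    records = words.reshape(-1, dim + 1)
    floats = records[:, 1:].view("<f4")  # reinterpret payload words as float32
    halves = floats.astype("<f2")        # round to the nearest float16
    header = np.array([dim], dtype="<i4").tobytes()
    with open(destination_file, "wb") as out:
        for row in halves:
            out.write(header)
            out.write(row.tobytes())

# Usage: 3 vectors of dimension 4.
src_data = np.linspace(-1, 1, 12, dtype=np.float32).reshape(3, 4)
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "data.fvecs")
dst = os.path.join(tmp, "data_fp16.fvecs")
with open(src, "wb") as f:
    for row in src_data:
        f.write(np.array([row.size], dtype="<i4").tobytes())
        f.write(row.astype("<f4").tobytes())
convert_fvecs_to_float16_sketch(src, dst)
print(os.path.getsize(src), os.path.getsize(dst))  # 60 36
```

Halving the element width roughly halves the file size; the 4-byte header per record is unchanged.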

pysvs.generate_test_dataset(nvectors, nqueries, ndims, directory, data_seed=None, query_seed=None, num_threads=1, num_neighbors=100, distance=<DistanceType.L2: 0>)

Generate a sample dataset consisting of the base data, queries, and groundtruth, all in the standard *vecs format.

Parameters:
  • nvectors (int) – The number of base vectors in the generated dataset.

  • nqueries (int) – The number of query vectors in the generated dataset.

  • ndims (int) – The number of dimensions per vector in the dataset.

  • directory (str) – The directory in which to generate the dataset.

  • data_seed (optional) – The seed to use for random number generation in the dataset.

  • query_seed (optional) – The seed to use for random number generation for the queries.

  • num_threads (optional) – Number of threads to use to generate the groundtruth.

  • num_neighbors (int) – The number of neighbors to compute for the groundtruth.

  • distance (optional) – The distance metric to use for groundtruth generation.

Creates directory if it didn’t already exist. The following files are generated:

  • $(directory)/data.fvecs: The dataset encoded using float32 as fvecs.

  • $(directory)/queries.fvecs: The queries encoded using float32 as fvecs.

  • $(directory)/groundtruth.ivecs: The computed num_neighbors nearest neighbors of the queries in the dataset with respect to the provided distance.
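The groundtruth file holds, for every query, the indices of its `num_neighbors` exact nearest base vectors. A minimal in-memory sketch of that computation follows. It is hypothetical, not pysvs code: it assumes standard-normal float32 data (the distribution pysvs uses is not documented here), supports L2 only, and returns arrays instead of writing files.

```python
import numpy as np

def generate_test_dataset_sketch(nvectors, nqueries, ndims, data_seed=None,
                                 query_seed=None, num_neighbors=100):
    """Hypothetical in-memory analogue of pysvs.generate_test_dataset (L2 only)."""
    data = np.random.default_rng(data_seed).standard_normal(
        (nvectors, ndims)).astype(np.float32)
    queries = np.random.default_rng(query_seed).standard_normal(
        (nqueries, ndims)).astype(np.float32)
    # Exact squared L2 distances by broadcasting, shape (nqueries, nvectors).
    # Memory grows as nqueries * nvectors * ndims, so keep inputs small.
    d2 = ((queries[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)
    k = min(num_neighbors, nvectors)
    # Row-wise sort; the first k columns are the k nearest base vectors.
    groundtruth = np.argsort(d2, axis=1, kind="stable")[:, :k]
    return data, queries, groundtruth

data, queries, groundtruth = generate_test_dataset_sketch(
    1000, 10, 16, data_seed=1, query_seed=2, num_neighbors=5)
print(data.shape, queries.shape, groundtruth.shape)  # (1000, 16) (10, 16) (10, 5)
```

Brute force is the natural choice for groundtruth because it is exact; its cost is what makes an approximate index worth benchmarking against it.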

pysvs.convert_vecs_to_svs(vecs_file: str, pysvs_file: str, dtype: pysvs.DataType = <DataType.float32: 9>) -> None

Convert the vecs file (containing the specified element types) to the pysvs native format.

Parameters:
  • vecs_file – The source [f/h/i/b]vecs file.

  • pysvs_file – The destination native file.

  • dtype – The pysvs.DataType of the vecs file. Supported types: float32, float16, uint32, and uint8.

File extension type map:

  • fvecs = pysvs.DataType.float32

  • hvecs = pysvs.DataType.float16

  • ivecs = pysvs.DataType.uint32

  • bvecs = pysvs.DataType.uint8
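Since `dtype` must agree with the file extension, it can be derived from the table above instead of being typed by hand. The helper below is a hypothetical sketch that returns the DataType member *name*, so it runs without pysvs installed.

```python
import os

# Extension-to-DataType table from above, keyed by pysvs.DataType member name.
VECS_EXTENSION_TO_DATATYPE = {
    ".fvecs": "float32",
    ".hvecs": "float16",
    ".ivecs": "uint32",
    ".bvecs": "uint8",
}

def datatype_name_for(vecs_file):
    """Return the DataType member name for a *vecs file (hypothetical helper)."""
    ext = os.path.splitext(vecs_file)[1].lower()
    try:
        return VECS_EXTENSION_TO_DATATYPE[ext]
    except KeyError:
        raise ValueError(f"unsupported vecs extension: {ext!r}") from None

print(datatype_name_for("queries.hvecs"))  # float16
```

With pysvs installed, the name could be converted with `getattr(pysvs.DataType, name)` and passed as the `dtype` argument of `pysvs.convert_vecs_to_svs`.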