Dynamic Vamana Graph Index

In this section, we cover the API and usage of the DynamicVamana graph-based index.

class pysvs.DynamicVamana

Top level class for the dynamic Vamana graph index.

__init__(self: pysvs.DynamicVamana, config_path: str, graph_loader: pysvs.GraphLoader, data_loader: Union[pysvs.VectorDataLoader, pysvs.LVQLoader], distance: pysvs.DistanceType = <DistanceType.L2: 0>, query_type: pysvs.DataType = <DataType.float32: 9>, enforce_dims: bool = False, num_threads: int = 1, debug_load_from_static: bool = False) → None

add(self: pysvs.DynamicVamana, points: numpy.ndarray[numpy.float32], ids: numpy.ndarray[numpy.uint64]) → None

Add every point in points to the index, assigning each point the ID at the corresponding position in ids.

Parameters:

points – A matrix of data whose rows, corresponding to points in R^n, will be added to the index.
ids – Vector of ids to assign to each row in points. Must have the same number of elements as points has rows.

Furthermore, all entries in ids must be unique and not already exist in the index. If either of these does not hold, an exception will be thrown without mutating the underlying index.
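
Because add rejects the whole batch when any ID is duplicated or already present, it can help to check a batch up front and get a more specific error message. The following is a minimal sketch and not part of pysvs: check_new_ids is a hypothetical helper that relies only on the has_id method documented in this section.

```python
def check_new_ids(index, points, ids):
    """Hypothetical pre-check mirroring the documented preconditions of add().

    `index` is any object exposing has_id(id) -> bool, such as a
    pysvs.DynamicVamana instance. Raises ValueError on the first violation.
    """
    ids = [int(i) for i in ids]
    # One ID per row of `points`.
    if len(ids) != len(points):
        raise ValueError(f"{len(points)} points but {len(ids)} ids")
    # All IDs in the batch must be distinct.
    if len(set(ids)) != len(ids):
        raise ValueError("ids contains duplicates")
    # None of the IDs may already be stored in the index.
    present = [i for i in ids if index.has_id(i)]
    if present:
        raise ValueError(f"ids already in the index: {present}")
```

A typical call sequence would be check_new_ids(index, points, ids) followed by index.add(points, ids). The helper calls has_id once per ID, so for very large batches a set built from all_ids() may be the cheaper comparison.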

all_ids(self: pysvs.DynamicVamana) → numpy.ndarray[numpy.uint64]: Return a Numpy vector of all IDs currently in the index.

property alpha

Get/set the alpha value used when adding and deleting points.

Type: Read/Write (float)

static build(parameters: pysvs.VamanaBuildParameters, data: numpy.ndarray[numpy.float32], ids: numpy.ndarray[numpy.uint64], distance_type: pysvs.DistanceType, num_threads: int) → pysvs.DynamicVamana

Construct a Vamana index over the given data, returning a searchable index.

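Parameters:

parameters – Parameters controlling graph construction. See below for the documentation of this class.
data – The dataset to index. NOTE: PySVS will maintain an internal copy of the dataset. This may change in future releases.
ids – Vector of IDs to assign to each row in data. Must have the same number of elements as data has rows.
distance_type – The distance type to use for this dataset.
num_threads – The number of threads to use for index construction.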

compact(self: pysvs.DynamicVamana, arg0: int) → pysvs.DynamicVamana: Remove any holes created in the graph and data by renumbering internal IDs, and shrink the underlying data structures. When called after consolidate, this can reduce the memory footprint of the index if a sufficient number of points were deleted.

consolidate(self: pysvs.DynamicVamana) → pysvs.DynamicVamana: Remove and patch around all deleted entries in the graph. Should be called after a sufficient number of deletions to keep the memory consumption of the index from growing monotonically.

property construction_window_size

Get/set the window size used when adding and deleting points.

Type: Read/Write (int)

delete(self: pysvs.DynamicVamana, ids: numpy.ndarray[numpy.uint64]) → None

Soft delete the IDs from the index. Soft deletion does not remove the IDs from the graph, but prevents them from being returned from future searches.

Parameters: ids – The IDs to delete.

Each element in ids must be unique and must correspond to a valid ID stored in the index. Otherwise, an exception will be thrown, and the index will be left unchanged from before the function call.
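
The relationship between delete, consolidate, and compact is easier to see on a small model. The sketch below is a toy illustration only and makes no claim about how pysvs stores its data: ToyIndex and its attributes are hypothetical, and graph edges are left out, so the patching step performed by consolidate is not shown.

```python
class ToyIndex:
    """Toy model of the deletion life cycle; NOT the pysvs implementation."""

    def __init__(self, ids):
        self.slots = list(ids)      # internal slot -> external ID (None = hole)
        self.soft_deleted = set()   # IDs hidden from search but still stored

    def live_ids(self):
        return [i for i in self.slots
                if i is not None and i not in self.soft_deleted]

    def delete(self, ids):
        ids = list(ids)
        live = set(self.live_ids())
        # Validate everything first so a failure leaves the index unchanged.
        if len(set(ids)) != len(ids) or not set(ids) <= live:
            raise ValueError("ids must be unique and present in the index")
        self.soft_deleted.update(ids)

    def consolidate(self):
        # Physically drop soft-deleted entries, leaving holes behind.
        self.slots = [None if i in self.soft_deleted else i
                      for i in self.slots]
        self.soft_deleted.clear()

    def compact(self):
        # Renumber internal slots so that no holes remain.
        self.slots = [i for i in self.slots if i is not None]


index = ToyIndex([10, 20, 30, 40])
index.delete([20, 40])
print(index.live_ids(), len(index.slots))   # [10, 30] 4
index.consolidate()
print(index.slots)                          # [10, None, 30, None]
index.compact()
print(index.slots)                          # [10, 30]
```

In this model, storage shrinks only after both steps have run, which matches the documented advice to call compact following consolidate.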

property dimensions: Return the logical number of dimensions for each vector in the dataset.

property experimental_backend_string

Get a string identifying the full type of the backend implementation.

This property is experimental and subject to change without a deprecation warning.

Type: Read Only (str)

experimental_calibrate(*args, **kwargs)

Overloaded function.

experimental_calibrate(self: pysvs.DynamicVamana, queries: numpy.ndarray[float16], groundtruth: numpy.ndarray[numpy.uint32], num_neighbors: int, target_recall: float, calibration_parameters: pysvs.VamanaCalibrationParameters = <pysvs.VamanaCalibrationParameters object at 0x7f4120623530>) -> pysvs.VamanaSearchParameters

NOTE: This method is experimental and subject to change or removal without notice.

Run an experimental calibration routine to select the best search parameters.

Parameters:

queries – Queries used to drive the calibration process.
groundtruth – The groundtruth for the given query set.
num_neighbors – The number of nearest neighbors to calibrate for.
target_recall – The target num_neighbors-recall-at-num_neighbors. If such a recall is possible, then calibration will find parameters that optimize performance at this recall level.
calibration_parameters – The hyper-parameters to use during calibration.

Returns:

The best pysvs.VamanaSearchParameters found.

The calibration routine will also configure the index with the best parameters found. Note that calibration uses the number of threads already assigned to the index and can therefore be used to tune the algorithm for different thread counts.

See also: pysvs.VamanaCalibrationParameters
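
For reference, the k-recall-at-k metric named by target_recall is commonly computed as the fraction of the true k nearest neighbors that appear among the k returned neighbors, averaged over all queries. The sketch below assumes that standard definition; recall_at_k is a hypothetical helper, not a pysvs function, and it accepts any row-indexable inputs such as nested lists or 2-D arrays.

```python
def recall_at_k(found, groundtruth, k):
    """Mean k-recall-at-k over all queries.

    found:       one row of returned neighbor IDs per query
    groundtruth: one row of true neighbor IDs per query, nearest first
    """
    hits = 0
    for found_row, truth_row in zip(found, groundtruth):
        # Count the true top-k neighbors that appear in the returned top-k.
        hits += len(set(found_row[:k]) & set(truth_row[:k]))
    return hits / (k * len(found))


found = [[1, 2, 3], [4, 5, 6]]
truth = [[1, 3, 9], [7, 8, 4]]
print(recall_at_k(found, truth, k=3))  # 0.5
```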

experimental_calibrate(self: pysvs.DynamicVamana, queries: numpy.ndarray[numpy.float32], groundtruth: numpy.ndarray[numpy.uint32], num_neighbors: int, target_recall: float, calibration_parameters: pysvs.VamanaCalibrationParameters = <pysvs.VamanaCalibrationParameters object at 0x7f412061b0f0>) -> pysvs.VamanaSearchParameters

The parameters, return value, and behavior are identical to the first overload; only the element type of queries differs.

experimental_reset_performance_parameters(self: pysvs.DynamicVamana) → None

Reset the internal performance-only parameters to built-in heuristics. This can be useful if experimenting with different dataset implementations which may need different values for performance-only parameters (such as prefetchers).

Calling this method should not affect recall.

has_id(self: pysvs.DynamicVamana, id: int) → bool: Return whether the ID exists in the index.

property num_threads

Get/set the number of threads used to process queries.

Type: Read/Write (int)
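
Since num_threads is writable, one way to choose a value is to time the same query batch at several thread counts. The following is a rough sketch under that assumption: time_search is a hypothetical helper that uses only the num_threads property and the search method documented in this section, and a single timed call per setting is noisy, so real measurements should repeat each run.

```python
import time


def time_search(index, queries, n_neighbors, thread_counts):
    """Return {thread_count: seconds} for one search call per setting."""
    original = index.num_threads
    timings = {}
    try:
        for n in thread_counts:
            index.num_threads = n
            start = time.perf_counter()
            index.search(queries, n_neighbors)
            timings[n] = time.perf_counter() - start
    finally:
        # Restore the caller's setting even if a search raises.
        index.num_threads = original
    return timings
```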

property query_types: Return the query element types this index is specialized for.

reconstruct(self: pysvs.DynamicVamana, ids: numpy.ndarray[numpy.uint64]) → numpy.ndarray[numpy.float32]

save(self: pysvs.DynamicVamana, config_directory: str, graph_directory: str, data_directory: str) → None

Save a constructed index to disk (useful following index construction).

Parameters:

config_directory – Directory where index configuration information will be saved.
graph_directory – Directory where graph will be saved.
data_directory – Directory where the dataset will be saved.

Note: All directories should be separate to avoid accidental name collision with any auxiliary files that are needed when saving the various components of the index.

If a directory does not exist, it will be created if its parent exists.

It is the caller’s responsibility to ensure that no existing data will be overwritten when saving the index to these directories.
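
The directory requirements above can be handled with a small amount of standard-library code. This is a minimal sketch, not part of pysvs: prepare_save_dirs and the subdirectory names config, graph, and data are hypothetical choices made for illustration.

```python
from pathlib import Path


def prepare_save_dirs(root, overwrite=False):
    """Create three separate directories under `root` for use with save().

    Refuses to reuse a non-empty directory unless overwrite=True, since
    save() leaves overwrite protection to the caller.
    """
    root = Path(root)
    dirs = {name: root / name for name in ("config", "graph", "data")}
    for path in dirs.values():
        if path.exists() and any(path.iterdir()) and not overwrite:
            raise FileExistsError(f"{path} is not empty")
        path.mkdir(parents=True, exist_ok=True)
    return dirs
```

With an existing index, the result could then be passed on as index.save(str(dirs["config"]), str(dirs["graph"]), str(dirs["data"])).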

search(self: pysvs.DynamicVamana, queries: numpy.ndarray[numpy.float32], n_neighbors: int) → tuple

Perform a search to return the n_neighbors approximate nearest neighbors to the query.

Parameters:

queries – Numpy vector or matrix representing the queries. If the argument is a vector, it will be treated as a single query. If the argument is a matrix, individual queries are assumed to be the rows of the matrix. Returned results will have a position-wise correspondence with the queries. That is, the N-th row of the returned IDs and distances will correspond to the N-th row in the query matrix.
n_neighbors – The number of neighbors to return for this search job.

Returns:

A tuple (I, D) where I contains the n_neighbors approximate (or exact) nearest neighbors to the queries and D contains the approximate distances.

Note: This form is returned regardless of whether the given query was a vector or a matrix.
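
Because I and D share the same row-per-query layout, they can be zipped together to read off each query's neighbors. The sketch below uses hand-written stand-in values in place of a real search result; neighbors_per_query is a hypothetical helper, not a pysvs function.

```python
def neighbors_per_query(ids, distances):
    """Return, for each query, a list of (id, distance) pairs."""
    return [
        [(int(i), float(d)) for i, d in zip(id_row, dist_row)]
        for id_row, dist_row in zip(ids, distances)
    ]


# Stand-in values shaped like the documented return for 2 queries and
# n_neighbors = 3. With a real index: I, D = index.search(queries, 3)
I = [[7, 2, 9], [4, 7, 1]]
D = [[0.1, 0.4, 0.9], [0.0, 0.3, 0.5]]

for row in neighbors_per_query(I, D):
    print(row)
```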

property search_parameters

Get/set the current search parameters for the index. These parameters modify both the algorithmic properties of search (affecting recall) and non-algorithmic properties of search (affecting queries-per-second).