Unconventional application of k-means for distributed approximate similarity search
Fecha
2022
Título de la revista
ISSN de la revista
Título del volumen
Editor
Elsevier
Resumen
Similarity search based on a distance function in metric spaces is a fundamental problem
for many applications. Queries for similar objects lead to the well-known machine learning
task of nearest-neighbours identification. Many data indexing strategies, collectively
known as Metric Access Methods (MAM), have been proposed to speed up these queries.
Moreover, since exact approaches to solving similarity queries can be complex and timeconsuming, alternative options have emerged to reduce query execution time, such as
returning approximate results or resorting to distributed computing platforms. In this
paper, we introduce MASK (Multilevel Approximate Similarity search with k-means), an
unconventional application of the k-means algorithm as the foundation of a multilevel
index structure for approximate similarity search suitable for metric spaces. We show that
this method leverages inherent properties of k-means for this purpose, like representing
high-density data areas with fewer prototypes. An implementation of this new indexing
procedure is evaluated using a synthetic dataset and two real-world datasets in highdimensional and high-sparsity spaces. Experimental tests show that MASK performs better
than alternative algorithms for approximate similarity search. Results are promising and
underpin the applicability of this novel indexing method in multiple domains.
Descripción
Palabras clave
Citación
Felipe Ortega, Maria Jesus Algar, Isaac Martín de Diego, Javier M. Moguerza, Unconventional application of k-means for distributed approximate similarity search, Information Sciences, Volume 619, 2023, Pages 208-234, ISSN 0020-0255, https://doi.org/10.1016/j.ins.2022.11.024
Colecciones
Excepto si se señala otra cosa, la licencia del ítem se describe como Atribución 4.0 Internacional