Unconventional application of k-means for distributed approximate similarity search

dc.contributor.authorOrtega, Felipe
dc.contributor.authorAlgar, Maria Jesus
dc.contributor.authorMartín de Diego, Isaac
dc.contributor.authorMartínez Moguerza, Javier
dc.date.accessioned2023-09-22T11:25:58Z
dc.date.available2023-09-22T11:25:58Z
dc.date.issued2022
dc.description.abstractSimilarity search based on a distance function in metric spaces is a fundamental problem for many applications. Queries for similar objects lead to the well-known machine learning task of nearest-neighbours identification. Many data indexing strategies, collectively known as Metric Access Methods (MAM), have been proposed to speed up these queries. Moreover, since exact approaches to solving similarity queries can be complex and timeconsuming, alternative options have emerged to reduce query execution time, such as returning approximate results or resorting to distributed computing platforms. In this paper, we introduce MASK (Multilevel Approximate Similarity search with k-means), an unconventional application of the k-means algorithm as the foundation of a multilevel index structure for approximate similarity search suitable for metric spaces. We show that this method leverages inherent properties of k-means for this purpose, like representing high-density data areas with fewer prototypes. An implementation of this new indexing procedure is evaluated using a synthetic dataset and two real-world datasets in highdimensional and high-sparsity spaces. Experimental tests show that MASK performs better than alternative algorithms for approximate similarity search. Results are promising and underpin the applicability of this novel indexing method in multiple domains.es
dc.identifier.citationFelipe Ortega, Maria Jesus Algar, Isaac Martín de Diego, Javier M. Moguerza, Unconventional application of k-means for distributed approximate similarity search, Information Sciences, Volume 619, 2023, Pages 208-234, ISSN 0020-0255, https://doi.org/10.1016/j.ins.2022.11.024es
dc.identifier.doi10.1016/j.ins.2022.11.024es
dc.identifier.issn0020-0255
dc.identifier.urihttps://hdl.handle.net/10115/24488
dc.language.isoenges
dc.publisherElsevieres
dc.rightsAtribución 4.0 Internacional*
dc.rights.accessRightsinfo:eu-repo/semantics/openAccesses
dc.rights.urihttp://creativecommons.org/licenses/by/4.0/*
dc.subjectData indexinges
dc.subjectApproximate similarity searches
dc.subjectMetric distancees
dc.subjectUnsupervised learninges
dc.subjectDistributed computinges
dc.subjectk-meanses
dc.titleUnconventional application of k-means for distributed approximate similarity searches
dc.typeinfo:eu-repo/semantics/articlees

Archivos

Bloque original

Mostrando 1 - 1 de 1
Cargando...
Miniatura
Nombre:
1-s2.0-S0020025522013056-main.pdf
Tamaño:
4.82 MB
Formato:
Adobe Portable Document Format
Descripción:

Bloque de licencias

Mostrando 1 - 1 de 1
No hay miniatura disponible
Nombre:
license.txt
Tamaño:
2.67 KB
Formato:
Item-specific license agreed upon to submission
Descripción: