New Strategies for Global Optimum Search in Variance-Based k-Clustering
Fecha
2025
Autores
Título de la revista
ISSN de la revista
Título del volumen
Editor
Universidad Rey Juan Carlos
Resumen
Partitioning a dataset into subsets, or clustering, is a fundamental technique in
data mining for grouping similar data points. A common approach is to formulate
clustering as an optimization problem, where an objective function evaluates
how well the data are grouped. A variance-based objective function sums the
dispersions of all clusters, which are measured as the sum of the squared distances
of all points in the cluster to the center of mass.
Finding the globally optimal partition into k clusters with this objective func tion is known to be NP-hard, even for k = 2 in dimensions greater than one.
These results fostered the proposal of heuristic algorithms, which outputs good
candidates for solution. However, these heuristic approaches do not guarantee
an optimal solution. In fact, it remains an open problem to ensure that the so lution found is close to the global optimum, even for small to moderately sized
datasets. This fundamental challenge motivates the development of exact algo rithms, which explicitly find the globally optimal clustering and can be used to
measure the fitness of heuristic algorithms—this being the central research topic
of this thesis.
In this work, we propose new strategies to find the globally optimal solution,
extending previous ideas and perspectives in the literature. Our analysis shows
that the complexity of our approach improves in terms of theoretical bounds and
experimental simulations. The strategy reduces the solution space, with a careful
enumeration of all possible candidates jointly with a tailored branch-and-bound
method.
Building upon this core approach, we propose additional strategies that in corporate randomization, specifically through the generation of Gaussian pseu dorandom numbers. One of these strategies consistently provides the exact solu tion efficiently, while others rely on random initialization within a metaheuristic
framework. Motivated by this approach, this thesis also investigates new algebraic
constructions for the efficient generation of pseudorandom number sequences with
strong statistical properties. Additionally, we offer new insights into the selection
of parameters for these generators, which may be useful for practitioners working
on similar problems.
Descripción
Tesis Doctoral leída en la Universidad Rey Juan Carlos de Madrid en 2025. Directores: Ana Isabel Gómez Pérez
Domingo Gómez Pérez
Palabras clave
Citación
Soto Sánchez, F. J. (2025). New strategies for global optimum search in variance-based k-clustering (Tesis doctoral). Escuela Internacional de Doctorado, Programa de Doctorado en Tecnologías de la Información y las Comunicaciones.
Colecciones

Excepto si se señala otra cosa, la licencia del ítem se describe como Attribution 4.0 International