New Strategies for Global Optimum Search in Variance-Based k-Clustering

Soto Sánchez, Francisco Javier

New Strategies for Global Optimum Search in Variance-Based k-Clustering

Archivos

Tesis_Javier_Soto_URJC.pdf (5.3 MB)

Fecha

2025

Autores

Soto Sánchez, Francisco Javier

Editor

Universidad Rey Juan Carlos

URI

https://hdl.handle.net/10115/97318

Citas

0 citas en

Resumen

Partitioning a dataset into subsets, or clustering, is a fundamental technique in data mining for grouping similar data points. A common approach is to formulate clustering as an optimization problem, where an objective function evaluates how well the data are grouped. A variance-based objective function sums the dispersions of all clusters, which are measured as the sum of the squared distances of all points in the cluster to the center of mass. Finding the globally optimal partition into k clusters with this objective func tion is known to be NP-hard, even for k = 2 in dimensions greater than one. These results fostered the proposal of heuristic algorithms, which outputs good candidates for solution. However, these heuristic approaches do not guarantee an optimal solution. In fact, it remains an open problem to ensure that the so lution found is close to the global optimum, even for small to moderately sized datasets. This fundamental challenge motivates the development of exact algo rithms, which explicitly find the globally optimal clustering and can be used to measure the fitness of heuristic algorithms—this being the central research topic of this thesis. In this work, we propose new strategies to find the globally optimal solution, extending previous ideas and perspectives in the literature. Our analysis shows that the complexity of our approach improves in terms of theoretical bounds and experimental simulations. The strategy reduces the solution space, with a careful enumeration of all possible candidates jointly with a tailored branch-and-bound method. Building upon this core approach, we propose additional strategies that in corporate randomization, specifically through the generation of Gaussian pseu dorandom numbers. One of these strategies consistently provides the exact solu tion efficiently, while others rely on random initialization within a metaheuristic framework. Motivated by this approach, this thesis also investigates new algebraic constructions for the efficient generation of pseudorandom number sequences with strong statistical properties. Additionally, we offer new insights into the selection of parameters for these generators, which may be useful for practitioners working on similar problems.

Descripción

Tesis Doctoral leída en la Universidad Rey Juan Carlos de Madrid en 2025. Directores: Ana Isabel Gómez Pérez Domingo Gómez Pérez

Citación

Soto Sánchez, F. J. (2025). New strategies for global optimum search in variance-based k-clustering (Tesis doctoral). Escuela Internacional de Doctorado, Programa de Doctorado en Tecnologías de la Información y las Comunicaciones.

Colecciones

Tesis Doctorales

Página completa del ítem

Excepto si se señala otra cosa, la licencia del ítem se describe como Attribution 4.0 International

New Strategies for Global Optimum Search in Variance-Based k-Clustering

Archivos

Fecha

Autores

Título de la revista

ISSN de la revista

Título del volumen

Editor

URI

Citas

Resumen

Descripción

Palabras clave

Citación

Colecciones