New Strategies for Global Optimum Search in Variance-Based k-Clustering
dc.contributor.author | Soto Sánchez, Francisco Javier | |
dc.date.accessioned | 2025-08-01T07:22:37Z | |
dc.date.available | 2025-08-01T07:22:37Z | |
dc.date.issued | 2025 | |
dc.description | Tesis Doctoral leída en la Universidad Rey Juan Carlos de Madrid en 2025. Directores: Ana Isabel Gómez Pérez Domingo Gómez Pérez | |
dc.description.abstract | Partitioning a dataset into subsets, or clustering, is a fundamental technique in data mining for grouping similar data points. A common approach is to formulate clustering as an optimization problem, where an objective function evaluates how well the data are grouped. A variance-based objective function sums the dispersions of all clusters, which are measured as the sum of the squared distances of all points in the cluster to the center of mass. Finding the globally optimal partition into k clusters with this objective func tion is known to be NP-hard, even for k = 2 in dimensions greater than one. These results fostered the proposal of heuristic algorithms, which outputs good candidates for solution. However, these heuristic approaches do not guarantee an optimal solution. In fact, it remains an open problem to ensure that the so lution found is close to the global optimum, even for small to moderately sized datasets. This fundamental challenge motivates the development of exact algo rithms, which explicitly find the globally optimal clustering and can be used to measure the fitness of heuristic algorithms—this being the central research topic of this thesis. In this work, we propose new strategies to find the globally optimal solution, extending previous ideas and perspectives in the literature. Our analysis shows that the complexity of our approach improves in terms of theoretical bounds and experimental simulations. The strategy reduces the solution space, with a careful enumeration of all possible candidates jointly with a tailored branch-and-bound method. Building upon this core approach, we propose additional strategies that in corporate randomization, specifically through the generation of Gaussian pseu dorandom numbers. One of these strategies consistently provides the exact solu tion efficiently, while others rely on random initialization within a metaheuristic framework. Motivated by this approach, this thesis also investigates new algebraic constructions for the efficient generation of pseudorandom number sequences with strong statistical properties. Additionally, we offer new insights into the selection of parameters for these generators, which may be useful for practitioners working on similar problems. | |
dc.identifier.citation | Soto Sánchez, F. J. (2025). New strategies for global optimum search in variance-based k-clustering (Tesis doctoral). Escuela Internacional de Doctorado, Programa de Doctorado en Tecnologías de la Información y las Comunicaciones. | |
dc.identifier.uri | https://hdl.handle.net/10115/97318 | |
dc.language.iso | en_US | |
dc.publisher | Universidad Rey Juan Carlos | |
dc.rights | Attribution 4.0 International | en |
dc.rights.accessRights | info:eu-repo/semantics/openAccess | |
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | |
dc.title | New Strategies for Global Optimum Search in Variance-Based k-Clustering | |
dc.type | Thesis |
Archivos
Bloque original
1 - 1 de 1
Cargando...
- Nombre:
- Tesis_Javier_Soto_URJC.pdf
- Tamaño:
- 5.3 MB
- Formato:
- Adobe Portable Document Format