Hostility measure for multi-level study of data complexity
Date
2022
Publisher
Springer
Abstract
Complexity measures aim to characterize the underlying complexity of supervised data. These measures address factors
that hinder the performance of Machine Learning (ML) classifiers, such as overlap, density, and linearity. The state of the art
has mainly focused on the dataset perspective of complexity, i.e., estimating the complexity of the whole
dataset. Recently, the instance perspective has also been addressed. This paper proposes the hostility measure, a complexity
measure offering a multi-level (instance, class, and dataset) perspective of data complexity. The proposal is built by
estimating the novel notion of hostility: the difficulty of correctly classifying a point, a class, or a whole dataset given their
corresponding neighborhoods. The proposed measure is estimated at the instance level by applying the k-means algorithm
in a recursive and hierarchical way, which makes it possible to analyze how points from different classes are naturally grouped
together across partitions. The instance information is then aggregated to provide complexity knowledge at the class and
dataset levels. The validity of the proposal is evaluated through a variety of experiments covering the three perspectives,
together with the corresponding comparison with state-of-the-art measures. Throughout the experiments, the hostility measure
has shown promising results and has proven to be competitive, stable, and robust.
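The multi-level idea in the abstract can be illustrated with a small sketch. It is a simplified illustration, not the estimator defined in the paper, and it rests on several assumptions that this record does not state: the number of clusters is halved at each level (`shrink`), k-means is initialised deterministically from evenly spaced points, an instance's score is the share of other-class points in its cluster averaged over all levels, and class and dataset scores are plain means of the instance scores. The function names `kmeans` and `hostility` are hypothetical.

```python
from collections import Counter


def _dist2(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: _dist2(p, centers[c]))
            for p in points]


def kmeans(points, k, iters=25):
    """Plain Lloyd's k-means with a deterministic start (an assumption)."""
    centers = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iters):
        groups = {}
        for p, lab in zip(points, _assign(points, centers)):
            groups.setdefault(lab, []).append(p)
        # Recompute centers as group means; empty clusters are dropped.
        centers = [tuple(sum(col) / len(g) for col in zip(*g))
                   for g in groups.values()]
    return _assign(points, centers), centers


def hostility(points, classes, shrink=2, min_clusters=2):
    """Simplified multi-level score: instance, per-class, and dataset."""
    membership = list(range(len(points)))  # cluster id of each original point
    nodes = [tuple(p) for p in points]
    levels = []
    k = len(nodes) // shrink
    while k >= min_clusters:
        # Cluster the current nodes, then re-cluster their centers next round.
        labels, nodes = kmeans(nodes, k)
        membership = [labels[m] for m in membership]
        size = Counter(membership)
        same = Counter(zip(membership, classes))
        # Share of other-class points in each point's cluster at this level.
        levels.append([1 - same[(m, c)] / size[m]
                       for m, c in zip(membership, classes)])
        k = len(nodes) // shrink
    if not levels:
        raise ValueError("too few points for a single partition level")
    instance = [sum(v) / len(levels) for v in zip(*levels)]
    by_class = {}
    for c in set(classes):
        vals = [h for h, ci in zip(instance, classes) if ci == c]
        by_class[c] = sum(vals) / len(vals)
    return instance, by_class, sum(instance) / len(instance)


# Two well-separated groups; the fourth point carries the "wrong" label.
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (100, 100), (100, 101), (101, 100), (101, 101)]
inst, by_class, dataset = hostility(pts, [0, 0, 0, 1, 1, 1, 1, 1])
print(inst, by_class, dataset)
```

In this toy run the mislabelled fourth point receives the highest instance score, while the four points in the pure group score 0. This shows how instance-level information can be rolled up into class-level and dataset-level summaries.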
Description
This research has been supported by grants from Rey Juan Carlos University (Ref: C1PREDOC2020), Madrid Autonomous Community (Ref: IND2019/TIC-17194), and the Spanish Ministry of Science and Innovation, under the Retos-Investigación program: MODAS-IN (Ref: RTI-2018-094269-B-I00). We would like to thank Antonio Alonso Ayuso and Emilio López Cano from the Data Science Laboratory at Rey Juan Carlos University for checking the mathematics.
Funding
Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.
Citation
Lancho, C., Martín De Diego, I., Cuesta, M. et al. Hostility measure for multi-level study of data complexity. Appl Intell 53, 8073–8096 (2023). https://doi.org/10.1007/s10489-022-03793-w
Except where otherwise noted, this item's license is described as Attribution 4.0 International