An index of effective number of variables for uncertainty and reliability analysis in model selection problems

Abstract

An index of the effective number of variables (ENV) is introduced for model selection among nested models. This setting arises, for instance, when deciding the order of a polynomial function or the number of bases in a nonlinear regression, choosing the number of clusters in a clustering problem, or choosing the number of features in a variable selection application (to name a few examples). The index is inspired by the idea of the maximum area under the curve (AUC). The ENV index is interpreted in the same way as the effective sample size (ESS) indices for a set of samples. It overcomes drawbacks of the elbow detectors described in the literature and introduces different confidence measures for the proposed solution. These novel measures can also be employed jointly with different information criteria, such as the well-known AIC and BIC, or with any other model selection procedure. Comparisons with classical and recent schemes are provided in different experiments involving real datasets. Related Matlab code is provided.

Description

Citation

Luca Martino, Eduardo Morgado, Roberto San Millán Castillo, An index of effective number of variables for uncertainty and reliability analysis in model selection problems, Signal Processing, 2024, 109735, ISSN 0165-1684, https://doi.org/10.1016/j.sigpro.2024.109735
Except where otherwise noted, this item's license is described as Attribution-NonCommercial 4.0 International