Assessment of Classification Models and Relevant Features on Nonalcoholic Steatohepatitis Using Random Forest
Date
2021-06
Publisher
MDPI
Abstract
Nonalcoholic fatty liver disease (NAFLD) is the hepatic manifestation of metabolic syndrome and is the most common cause of chronic liver disease in developed countries. Certain
conditions, including mild inflammation biomarkers, dyslipidemia, and insulin resistance, can trigger a progression to nonalcoholic steatohepatitis (NASH), a condition characterized by inflammation
and liver cell damage. We demonstrate the usefulness of machine learning with a case study to
analyze the most important features in random forest (RF) models for predicting patients at risk
of developing NASH. We collected data from patients who attended the Cardiovascular Risk Unit
of Mostoles University Hospital (Madrid, Spain) from 2005 to 2021. We reviewed electronic health
records to assess the presence of NASH, which was used as the outcome. We chose RF as the algorithm to develop six models using different pre-processing strategies. The performance metrics were
evaluated to choose an optimized model. Finally, several interpretability techniques, such as feature
importance, contribution of each feature to predictions, and partial dependence plots, were used to
understand and explain the model to help obtain a better understanding of machine learning-based
predictions. In total, 1525 patients met the inclusion criteria. The mean age was 57.3 years, and 507 patients had NASH (prevalence of 33.2%). Filter methods (the chi-square and Mann–Whitney–Wilcoxon
tests) did not produce additional insight in terms of interactions, contributions, or relationships
among variables and their outcomes. The random forest models classified patients with
NASH with an accuracy of 0.87 in the best model and 0.79 in the worst one. Four features were the
most relevant: insulin resistance, ferritin, serum levels of insulin, and triglycerides. The contribution
of each feature was assessed via partial dependence plots. Random forest-based modeling demonstrated that machine learning can be used to improve interpretability, clarify the
modeled behavior, and show how much certain features contribute to predictions.
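As a reading aid for the figures in the abstract, the short Python sketch below recomputes the reported prevalence and compares the reported accuracies with a trivial majority-class classifier. The baseline is an assumption introduced here for context and is not a model from the paper. It is also computed on the full cohort, whereas the reported accuracies may refer to a held-out subset with a slightly different class balance.

```python
# Figures quoted in the abstract.
n_patients = 1525       # patients meeting the inclusion criteria
n_nash = 507            # patients with NASH (the outcome)
best_accuracy = 0.87    # best of the six random forest models
worst_accuracy = 0.79   # worst of the six random forest models

# Prevalence: share of patients with the outcome.
prevalence = n_nash / n_patients

# Accuracy of a hypothetical classifier that labels every patient "no NASH".
# This reference point is an assumption for illustration, not part of the study.
majority_baseline = 1 - prevalence

print(f"prevalence        = {prevalence:.3f}")
print(f"majority baseline = {majority_baseline:.3f}")
print(f"best model gain   = {best_accuracy - majority_baseline:+.3f}")
print(f"worst model gain  = {worst_accuracy - majority_baseline:+.3f}")
```

Under these assumptions, both reported accuracies sit above what the class imbalance alone would yield, which is the comparison that makes a raw accuracy figure interpretable.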
Citation
García-Carretero, R.; Holgado-Cuadrado, R.; Barquero-Pérez, Ó. Assessment of Classification Models and Relevant Features on Nonalcoholic Steatohepatitis Using Random Forest. Entropy 2021, 23, 763. https://doi.org/10.3390/e23060763
Except where otherwise noted, this item's license is described as Attribution 4.0 International