Unsupervised Learning Methods to Extract Clinical Knowledge of Patients with Chronic Diseases
Abstract
Over the last decades, life expectancy has increased significantly worldwide. Recent demographic trends indicate that the number of elderly people will continue to rise, yielding populations at higher risk of developing chronic diseases. The number of chronic patients is growing yearly, entailing a significant health burden and a growing demand for medical care services and resources. Diabetes and hypertension are two of the most prevalent chronic conditions, mainly showing patterns of associative multimorbidity among elderly people.

The widespread adoption of Electronic Health Records (EHRs) in national health systems has generated an unprecedented amount of clinical data. EHRs make it possible to record data on different aspects of care, collecting a wealth of patient information and becoming a valuable source for data-driven approaches, especially those based on Machine Learning (ML). These methods have revolutionized both academia and industry, substantially outperforming prior results in different domains. ML models have been used in conjunction with EHRs for different clinical applications, including patient mortality prediction, hospital readmission prediction, and identification of adverse events, among others. The insights obtained from these models have the potential to drive an important transformation in traditional health care, shifting from expert-guided approaches to data-driven ones.

Despite the noteworthy benefits of using ML methods in the clinical setting, data extracted from EHRs raise important challenges. EHR data exhibit high levels of heterogeneity and high dimensionality that substantially affect the learning process of statistical and conventional ML methods. Furthermore, in many applications, the data labels may be unavailable or unreliable. Unsupervised learning methods provide a way to reveal the underlying structure of complex datasets, allowing us to discover unknown patterns and characterize clusters associated with chronic conditions.
The main goal of this Dissertation is to apply and adapt unsupervised learning methods to automatically extract clinical knowledge of patients with chronic diseases. The following specific objectives are proposed: (i) to develop a data-driven approach enabling the clinical characterization of the health status associated with different chronic populations; (ii) to build new representations associated with chronic patients through dimensionality-reduction techniques enabling the visualization and identification of clusters of patients with specific chronic conditions; and (iii) to design a methodology based on probabilistic methods for supporting interpretability of black-box models when used in the clinical setting. From a clinical point of view, we seek to determine factors associated with the onset and progression of chronic conditions, which are crucial for planning resources, early diagnosis, and prevention. Note that early interventions and appropriate treatments can help to reduce the economic burden associated with chronic diseases.

This Thesis contributes to the bioengineering field by providing effective unsupervised learning methodologies for extracting clinical knowledge from real-world patient data, allowing us to address the main challenges raised by EHR data and improving pattern recognition, visualization, and clinical interpretation.
Description
Doctoral thesis defended at the Universidad Rey Juan Carlos de Madrid in 2022. Thesis supervisors: Inmaculada Mora Jiménez and Cristina Soguero Ruiz
Collections
- Tesis Doctorales [1490]