Browsing by Author "Chushig-Muzo, David"
Showing 1 - 6 of 6
Item Genetic algorithms for illumination enhancement in macroscopic images and neural-network-based models for skin lesion segmentation and detection (Sociedad Española de Ingeniería Biomédica, 2024-11-13) Gómez-Martínez, Vanesa; Chushig-Muzo, David; Soguero-Ruiz, Cristina
Skin cancer is one of the most common and fastest-growing forms of cancer worldwide. Traditionally, dermoscopic images have been the standard for evaluating skin lesions because of their high resolution and detail. However, macroscopic images are gaining popularity in clinical practice owing to their accessibility and ease of use, although they tend to be of lower quality, which can affect diagnostic accuracy. This study proposes and evaluates techniques for improving the illumination of macroscopic images using genetic algorithms (GAs), U-Net models for skin lesion segmentation, and convolutional neural networks for melanoma detection. Applying GAs to adjust contrast and brightness improved the visual quality of the images compared with state-of-the-art methods. These enhanced images yielded the best segmentation results with the Attention U-Net model, reaching a Dice index of 0.871 and outperforming both the original images and those enhanced with state-of-the-art methods. In addition, three image approaches were evaluated for melanoma detection: original images, GA-enhanced images, and GA-enhanced and segmented images. The enhanced-and-segmented approach combined with ResNet-50 achieved an AUCROC of 0.80, an improvement of 4% over the original images and 2% over the enhanced-only images. These results highlight the effectiveness of combining enhancement and segmentation techniques to improve the accuracy of melanoma detection.

Item Ensemble feature selection and tabular data augmentation with generative adversarial networks to enhance cutaneous melanoma identification and interpretability (BioMed Central, 2024-10-30) Gómez Martínez, Vanesa; Chushig-Muzo, David; Veierød, Marit B.; Granja, Conceição; Soguero Ruiz, Cristina
Background: Cutaneous melanoma is the most aggressive form of skin cancer, responsible for most skin cancer-related deaths. Recent advances in artificial intelligence, together with the availability of public dermoscopy image datasets, have made it possible to assist dermatologists in melanoma identification. While image feature extraction holds potential for melanoma detection, it often leads to high-dimensional data. Furthermore, most image datasets present the class imbalance problem, where a few classes have numerous samples whereas others are under-represented. Methods: In this paper, we propose to combine ensemble feature selection (FS) methods and data augmentation with conditional tabular generative adversarial networks (CTGAN) to enhance melanoma identification in imbalanced datasets. We employed dermoscopy images from two public datasets, PH2 and Derm7pt, which contain melanoma and not-melanoma lesions. To capture intrinsic information from skin lesions, we conducted two feature extraction (FE) approaches: handcrafted features and embedding features. For the former, color, geometric, and first-, second-, and higher-order texture features were extracted, whereas for the latter, embeddings were obtained using ResNet-based models. To alleviate the high dimensionality resulting from FE, ensemble FS with filter methods was used and evaluated. For data augmentation, we conducted a progressive analysis of the imbalance ratio (IR), which is related to the number of synthetic samples created, and evaluated its impact on the predictive results.
To gain interpretability on predictive models, we used SHAP, bootstrap resampling statistical tests, and UMAP visualizations. Results: The combination of ensemble FS, CTGAN, and linear models achieved the best predictive results, yielding AUCROC values of 87% (with support vector machine and IR=0.9) and 76% (with LASSO and IR=1.0) for the PH2 and Derm7pt datasets, respectively. We also identified that melanoma lesions were mainly characterized by features related to color, while not-melanoma lesions were characterized by texture features. Conclusions: Our results demonstrate the effectiveness of ensemble FS and synthetic data in the development of models that accurately identify melanoma. This research advances skin lesion analysis, contributing to both melanoma detection and the interpretation of the main features for its identification.

Item Evaluation of Synthetic Categorical Data Generation Techniques for Predicting Cardiovascular Diseases and Post-Hoc Interpretability of the Risk Factors (MDPI, 2023-03-23) García Vicente, Clara; Chushig-Muzo, David; Mora Jiménez, Inmaculada; Fabelo, Himar; Torhild Gram, Inger; Løchen, Maja-Lisa; Granja, Conceição; Soguero Ruiz, Cristina
Machine Learning (ML) methods have become important for enhancing the performance of decision-support predictive models. However, class imbalance is one of the main challenges in developing ML models, because it may bias the learning process and the generalization ability of the model. In this paper, we consider oversampling methods for generating synthetic categorical clinical data, aiming to improve the predictive performance of ML models and the identification of risk factors for cardiovascular diseases (CVDs). We performed a comparative study of several categorical synthetic data generation methods, including Synthetic Minority Oversampling Technique Nominal (SMOTEN), Tabular Variational Autoencoder (TVAE), and Conditional Tabular Generative Adversarial Networks (CTGANs).
Then, we assessed the impact of combining oversampling strategies with linear and nonlinear supervised ML methods. Lastly, we conducted a post-hoc model interpretability analysis based on the importance of the risk factors. Experimental results show the potential of GAN-based models for generating high-quality categorical synthetic data, yielding probability mass functions that are very close to those provided by real data, maintaining relevant insights, and contributing to increasing the predictive performance. The GAN-based model combined with a linear classifier outperforms other oversampling techniques, improving the area under the curve by 2%. These results demonstrate the capability of synthetic data to help with both determining risk factors and building models for CVD prediction.

Item Interpreting clinical latent representations using autoencoders and probabilistic models (Elsevier, 2021) Chushig-Muzo, David; Soguero-Ruiz, Cristina; Bohoyo, Pablo de Miguel; Mora-Jiménez, Inmaculada
Electronic health records (EHRs) are a valuable data source that, in conjunction with deep learning (DL) methods, have provided important outcomes in different domains, contributing to supporting decision-making. Owing to the remarkable advancements achieved by DL-based models, autoencoders (AEs) are becoming extensively used in health care. Nevertheless, AE-based models rely on nonlinear transformations, which makes them black-box models that lack interpretability, a property that is vital in the clinical setting. To obtain insights from AE latent representations, we propose a methodology that combines probabilistic models based on Gaussian mixture models with hierarchical clustering supported by the Kullback-Leibler divergence. To validate the methodology from a clinical viewpoint, we used real-world data extracted from EHRs of the University Hospital of Fuenlabrada (Spain). Records were associated with healthy patients and with chronic hypertensive and diabetic patients.
Experimental outcomes showed that our approach can find groups of patients with similar health conditions by identifying patterns associated with diagnosis and drug codes. This work opens up promising opportunities for interpreting representations obtained by AE-based models, shedding light on the decision-making process carried out by clinical experts in daily practice.

Item Learning and visualizing chronic latent representations using electronic health records (BMC, 2022-09-05) Chushig-Muzo, David; Soguero-Ruiz, Cristina; Miguel Bohoyo, Pablo; Mora-Jiménez, Inmaculada
Background: Nowadays, patients with chronic diseases such as diabetes and hypertension have reached alarming numbers worldwide. These diseases increase the risk of developing acute complications and involve a substantial economic burden and demand for health resources. The widespread adoption of Electronic Health Records (EHRs) is opening great opportunities for supporting decision-making. Nevertheless, data extracted from EHRs are complex (heterogeneous, high-dimensional and usually noisy), hampering knowledge extraction with conventional approaches. Methods: We propose the use of the Denoising Autoencoder (DAE), a Machine Learning (ML) technique that transforms high-dimensional data into latent representations (LRs), thus addressing the main challenges with clinical data. We explore in this work how the combination of LRs with a visualization method can be used to map the patient data in a two-dimensional space, gaining knowledge about the distribution of patients with different chronic conditions. Furthermore, this representation can also be used to characterize the evolution of the patient's health status, which is of paramount importance in the clinical setting. Results: To obtain clinical LRs, we considered real-world data extracted from EHRs linked to the University Hospital of Fuenlabrada in Spain.
Experimental results showed the great potential of DAEs to identify patients with clinical patterns linked to hypertension, diabetes and multimorbidity. The procedure allowed us to find patients with the same main chronic disease but different clinical characteristics. Thus, we identified two kinds of diabetic patients with differences in their drug therapy (insulin and non-insulin dependent), and also a group of women affected by hypertension and gestational diabetes. We also present a proof of concept for mapping the health status evolution of synthetic patients when considering the most significant diagnoses and drugs associated with chronic patients. Conclusion: Our results highlighted the value of ML techniques to extract clinical knowledge, supporting the identification of patients with certain chronic conditions. Furthermore, the patient's health status progression on the two-dimensional space might be used as a tool for clinicians aiming to characterize health conditions and identify their more relevant clinical codes.

Item Learning and visualizing chronic latent representations using electronic health records (Springer, 2022-09-05) Chushig-Muzo, David; Soguero Ruiz, Cristina; de Miguel Bohoyo, Pablo; Mora Jiménez, Inmaculada
Nowadays, patients with chronic diseases such as diabetes and hypertension have reached alarming numbers worldwide. These diseases increase the risk of developing acute complications and involve a substantial economic burden and demand for health resources. The widespread adoption of Electronic Health Records (EHRs) is opening great opportunities for supporting decision-making. Nevertheless, data extracted from EHRs are complex (heterogeneous, high-dimensional and usually noisy), hampering knowledge extraction with conventional approaches.
We propose the use of the Denoising Autoencoder (DAE), a Machine Learning (ML) technique that transforms high-dimensional data into latent representations (LRs), thus addressing the main challenges with clinical data. We explore in this work how the combination of LRs with a visualization method can be used to map the patient data in a two-dimensional space, gaining knowledge about the distribution of patients with different chronic conditions. Furthermore, this representation can also be used to characterize the evolution of the patient's health status, which is of paramount importance in the clinical setting. To obtain clinical LRs, we considered real-world data extracted from EHRs linked to the University Hospital of Fuenlabrada in Spain. Experimental results showed the great potential of DAEs to identify patients with clinical patterns linked to hypertension, diabetes and multimorbidity. The procedure allowed us to find patients with the same main chronic disease but different clinical characteristics. Thus, we identified two kinds of diabetic patients with differences in their drug therapy (insulin and non-insulin dependent), and also a group of women affected by hypertension and gestational diabetes. We also present a proof of concept for mapping the health status evolution of synthetic patients when considering the most significant diagnoses and drugs associated with chronic patients. Our results highlighted the value of ML techniques to extract clinical knowledge, supporting the identification of patients with certain chronic conditions. Furthermore, the patient's health status progression on the two-dimensional space might be used as a tool for clinicians aiming to characterize health conditions and identify their more relevant clinical codes.
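
The denoising autoencoder idea mentioned in the abstract above (corrupt the input, reconstruct the clean version, and read off a low-dimensional latent representation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the toy binary "clinical code" records, the masking noise, the single hidden layer, the choice of a two-unit latent layer (the abstract instead combines latent representations with a separate visualization method), and all function names and hyperparameters are assumptions made only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_toy_records(n_per_group=50, seed=0):
    """Hypothetical binary records: rows are patients, columns are 8 codes.
    Three made-up groups: few codes, mostly codes 0-3, mostly codes 4-7."""
    rng = np.random.default_rng(seed)
    probs = np.array([[0.05] * 8,
                      [0.9] * 4 + [0.05] * 4,
                      [0.05] * 4 + [0.9] * 4])
    P = np.repeat(probs, n_per_group, axis=0)
    return (rng.random(P.shape) < P).astype(float)

def train_dae(X, n_latent=2, noise=0.3, lr=0.5, epochs=300, seed=0):
    """Train a one-hidden-layer denoising autoencoder by full-batch
    gradient descent. Returns the parameters and the loss per epoch."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_latent))
    b1 = np.zeros(n_latent)
    W2 = rng.normal(0.0, 0.1, (n_latent, d))
    b2 = np.zeros(d)
    losses = []
    eps = 1e-9
    for _ in range(epochs):
        # Masking noise: randomly zero out a fraction of the input entries.
        mask = rng.random(X.shape) > noise
        X_noisy = X * mask
        H = sigmoid(X_noisy @ W1 + b1)   # latent representation
        Y = sigmoid(H @ W2 + b2)         # reconstruction
        # Binary cross-entropy against the CLEAN input,
        # summed over codes and averaged over patients.
        bce = X * np.log(Y + eps) + (1 - X) * np.log(1 - Y + eps)
        losses.append(-bce.sum(axis=1).mean())
        # Backpropagation (sigmoid output + cross-entropy gives Y - X).
        dZ2 = (Y - X) / n
        dW2 = H.T @ dZ2
        db2 = dZ2.sum(axis=0)
        dZ1 = (dZ2 @ W2.T) * H * (1 - H)
        dW1 = X_noisy.T @ dZ1
        db1 = dZ1.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses

def encode(X, params):
    """Map (uncorrupted) records to their latent coordinates."""
    W1, b1, _, _ = params
    return sigmoid(X @ W1 + b1)

if __name__ == "__main__":
    X = make_toy_records()
    params, losses = train_dae(X)
    Z = encode(X, params)
    print("latent shape:", Z.shape)
    print("first and last loss:", losses[0], losses[-1])
```

Each row of `Z` gives one patient's coordinates in the latent space, which could then be plotted or passed to a visualization method; with real EHR data, the architecture, noise level and training procedure would need to be chosen and validated separately.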