Multivariate feature selection and autoencoder embeddings of ovarian cancer clinical and genetic data
Date
2022
Publisher
Elsevier
Abstract
Although certain genetic alterations have been defined as predictive and prognostic biomarkers in the context
of ovarian cancer (OC), data science methods represent alternative approaches to identify novel correlations
and define relevant markers in these gynecological tumors. Considering this potential, our work focused on
both clinical and genomic data collected from patients with OC to identify relationships between
clinical and genetic factors and disease progression-related variables. To this end, we proposed two analyses:
(1) a nonlinear exploration of an OC dataset using autoencoders, a type of neural network that can be used as
a feature extraction tool to represent a dataset in a 3-dimensional latent space, so that we could assess whether
there are intrinsic or natural nonlinear separability patterns between disease progression groups (in our case,
platinum-sensitive and platinum-resistant groups); and (2) the identification of relevant variable relationships
by means of an adaptation of the informative variable identifier (IVI), a feature selection method that labels
each input feature as informative or noisy with respect to the task at hand, identifies the relationships among
features, and builds a ranking of features, allowing us to study which input features and relationships may be
most informative for the OC disease progression classification to define new biomarkers involved in disease
progression. We examined clinical factors, genetic factors, and the combination of clinical features
and genetic profile. Results with autoencoders suggest a pattern of separability between disease progression
groups in the clinical part and in the combination of genes and clinical features of the OC dataset, which is
strengthened by supervised fine-tuning. In the genetic part, this pattern of separability is not observed, but it
becomes more defined when supervised fine-tuning is performed. Results of the IVI-mediated feature selection method
show significance for relevant clinical variables (such as type of surgery and neoadjuvant chemotherapy), some
mutated genes, and low-risk genetic features. These results highlight the efficacy of the considered approaches
to better understand the clinical course of OC.
Description
This work has been supported by the meHeart-RisBi and Beyond grants (PID2019-104356RB-C42 and PID2019-106623RB-C41) from the Spanish Ministry of Science and Innovation and by the SC-LEARNING-CM grant supported by the REACT-EU programme (Next Generation EU funds) from the Community of Madrid and Rey Juan Carlos University, Spain. All authors read and approved the final manuscript.
Citation
Luis Bote-Curiel, Sergio Ruiz-Llorente, Sergio Muñoz-Romero, Mónica Yagüe-Fernández, Arantzazu Barquín, Jesús García-Donas, José Luis Rojo-Álvarez, Multivariate feature selection and autoencoder embeddings of ovarian cancer clinical and genetic data, Expert Systems with Applications, Volume 206, 2022, 117865, ISSN 0957-4174, https://doi.org/10.1016/j.eswa.2022.117865
Except where otherwise noted, this item's license is described as Attribution 4.0 International