Multivariate feature selection and autoencoder embeddings of ovarian cancer clinical and genetic data
Although certain genetic alterations have been established as predictive and prognostic biomarkers in ovarian cancer (OC), data science methods offer an alternative way to identify novel correlations and define relevant markers in these gynecological tumors. With this potential in mind, our work used clinical and genomic data collected from patients with OC to identify relationships between clinical and genetic factors and variables related to disease progression. To this end, we proposed two analyses: (1) a nonlinear exploration of an OC dataset using autoencoders, a type of neural network that can serve as a feature extraction tool, to represent the dataset in a 3-dimensional latent space and assess whether there are intrinsic nonlinear separability patterns between disease progression groups (in our case, platinum-sensitive and platinum-resistant groups); and (2) the identification of relevant variable relationships by means of an adaptation of the informative variable identifier (IVI), a feature selection method that labels each input feature as informative or noisy with respect to the task at hand, identifies the relationships among features, and builds a ranking of features. This allowed us to study which input features and relationships may be most informative for classifying OC disease progression, with a view to defining new biomarkers involved in disease progression. We analyzed clinical factors, genetic factors, and the combination of clinical features and genetic profile. Results with autoencoders suggest a pattern of separability between disease progression groups in the clinical part and in the combination of genes and clinical features of the OC dataset, which becomes more pronounced with supervised fine-tuning. In the genetic part, this pattern of separability is not observed without fine-tuning, although it becomes more defined when supervised fine-tuning is performed.
Results of the IVI-mediated feature selection method show significance for relevant clinical variables (such as type of surgery and neoadjuvant chemotherapy), some mutated genes, and low-risk genetic features. These results highlight the efficacy of the considered approaches to better understand the clinical course of OC.
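As a rough illustration of the first analysis, the following minimal sketch trains an unsupervised autoencoder with a 3-dimensional bottleneck and returns the latent coordinates of each sample. It is not the architecture used in this work: the toy data, the single tanh encoder layer, the linear decoder, the plain stochastic gradient descent, and all names (`make_toy_data`, `train_autoencoder`, the learning rate, and the number of epochs) are assumptions made only for illustration, and the supervised fine-tuning step is not shown.

```python
import math
import random


def make_toy_data(n_per_group=20, n_features=6, seed=1):
    """Synthetic stand-in for a patient table: two groups with shifted means.
    (Hypothetical data; it does not reproduce the OC dataset.)"""
    rng = random.Random(seed)
    X, labels = [], []
    for label, mean in ((0, -1.0), (1, 1.0)):
        for _ in range(n_per_group):
            X.append([rng.gauss(mean, 0.3) for _ in range(n_features)])
            labels.append(label)
    return X, labels


def train_autoencoder(X, latent_dim=3, epochs=100, lr=0.05, seed=0):
    """Train a tanh-encoder / linear-decoder autoencoder with plain SGD.
    Returns the trained encoder function and the mean loss per epoch."""
    rng = random.Random(seed)
    d = len(X[0])
    W_enc = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(latent_dim)]
    b_enc = [0.0] * latent_dim
    W_dec = [[rng.uniform(-0.5, 0.5) for _ in range(latent_dim)] for _ in range(d)]
    b_dec = [0.0] * d

    def encode(x):
        # Map one sample to its latent_dim-dimensional code.
        return [math.tanh(sum(W_enc[k][j] * x[j] for j in range(d)) + b_enc[k])
                for k in range(latent_dim)]

    def decode(z):
        # Linear reconstruction of the input from the latent code.
        return [sum(W_dec[i][k] * z[k] for k in range(latent_dim)) + b_dec[i]
                for i in range(d)]

    losses = []
    for _ in range(epochs):
        total = 0.0
        for x in X:
            z = encode(x)
            err = [y_i - x_i for y_i, x_i in zip(decode(z), x)]
            total += sum(e * e for e in err) / d  # mean squared error
            # Gradients of the MSE w.r.t. the output and the latent pre-activation.
            g_out = [2.0 * e / d for e in err]
            g_lat = [sum(g_out[i] * W_dec[i][k] for i in range(d)) * (1.0 - z[k] ** 2)
                     for k in range(latent_dim)]
            for i in range(d):
                for k in range(latent_dim):
                    W_dec[i][k] -= lr * g_out[i] * z[k]
                b_dec[i] -= lr * g_out[i]
            for k in range(latent_dim):
                for j in range(d):
                    W_enc[k][j] -= lr * g_lat[k] * x[j]
                b_enc[k] -= lr * g_lat[k]
        losses.append(total / len(X))
    return encode, losses


X, labels = make_toy_data()
encode, losses = train_autoencoder(X)
embedding = [encode(x) for x in X]  # one 3-D point per sample
print("first/last epoch loss:", losses[0], losses[-1])
print("latent code of first sample:", embedding[0])
```

Plotting `embedding` colored by `labels` would be the analogue of inspecting the latent space for separability between progression groups; the labels are not used during training in this sketch.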
This work has been supported by the meHeart-RisBi and Beyond grants (PID2019-104356RB-C42 and PID2019-106623RB-C41) from the Spanish Ministry of Science and Innovation and by the SC-LEARNING-CM grant supported by the REACT-EU programme (Next Generation EU funds) from the Community of Madrid and Rey Juan Carlos University, Spain. All authors read and approved the final manuscript.