Text Analytics and Mixed Feature Extraction in Ovarian Cancer Clinical and Genetic Data

dc.contributor.author: Bote-Curiel, Luis
dc.contributor.author: Ruiz-Llorente, Sergio
dc.contributor.author: Muñoz-Romero, Sergio
dc.contributor.author: Yagüe-Fernández, Mónica
dc.contributor.author: Barquín, Arantzazu
dc.contributor.author: García-Donas, Jesús
dc.contributor.author: Rojo-Álvarez, José Luis
dc.date.accessioned: 2025-01-24T11:01:07Z
dc.date.available: 2025-01-24T11:01:07Z
dc.date.issued: 2021
dc.description.abstract: Richer integrative analysis methods are needed in oncological studies to leverage efficiently the amount of clinical and genetic data available and to provide clinicians with better information. However, analyses of this nature often require mixing data of different types, which are not straightforward to handle jointly with classical methods. In this work, our aim is to find relationships between clinical and genetic features of different types (metric, categorical, and text) and ovarian cancer (OC) disease progression. To this end, we first propose a univariate statistical method for text-type features that applies bootstrap resampling to Bag of Words and Latent Dirichlet Allocation, in order to include the free-text fields of the health records as features. Secondly, we extend bootstrap resampling to metric and categorical feature extraction with Principal Component Analysis (PCA) and Multiple Correspondence Analysis (MCA), respectively. We subsequently formulate a novel and integrative method for jointly considering metric, categorical, and text features. Results obtained in the text analysis indicate individual differences in some words between two groups of OC patients categorised according to their sensitivity to platinum drugs. These results indicate separability between both groups for text features. Regarding the multivariate analysis, results on clinical data showed separability patterns for the three methods analysed according to the degree of platinum sensitivity. The use of these analytical tools in our OC cohort has allowed us to demonstrate their strengths by confirming the predictive and prognostic role of widely known clinical and genetic variables (BRCA status, value of adjuvant therapy and optimal resection, or family history) and by demonstrating significant associations for other variables whose role in OC development has been studied to a lesser extent (such as the PMS1, GPC3, and SLX4 genes). These results highlight the value of implementing these approaches for the identification of novel biomarkers in the context of OC.
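The abstract mentions applying bootstrap resampling to Bag of Words features to detect words that differ between two patient groups. The following is only a minimal sketch of that general idea, not the authors' implementation: it assumes a binary word-presence indicator per record and a percentile confidence interval for the difference in presence rates, neither of which is specified in this record. The function name `bootstrap_diff_ci` and the toy data are hypothetical.

```python
import numpy as np


def bootstrap_diff_ci(x_a, x_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the difference in mean word presence.

    x_a, x_b: 0/1 indicators (word present in each record) for two groups.
    Returns (lower, upper, significant), where `significant` is True when
    the interval does not contain zero.
    """
    rng = np.random.default_rng(seed)
    x_a = np.asarray(x_a, dtype=float)
    x_b = np.asarray(x_b, dtype=float)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        # Resample each group independently, with replacement.
        res_a = rng.choice(x_a, size=x_a.size, replace=True)
        res_b = rng.choice(x_b, size=x_b.size, replace=True)
        diffs[b] = res_a.mean() - res_b.mean()
    lower, upper = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    significant = not (lower <= 0.0 <= upper)
    return float(lower), float(upper), significant


# Synthetic toy data (not from the study): presence of one word in the
# free-text field of 40 records per group.
present_group_a = np.array([1] * 30 + [0] * 10)
present_group_b = np.array([1] * 5 + [0] * 35)

lower, upper, significant = bootstrap_diff_ci(present_group_a, present_group_b)
```

In this sketch a word would be flagged as differing between groups when the interval excludes zero; repeating the test over many words would call for some multiple-comparison handling, which is left out here.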
dc.identifier.citation: Bote-Curiel, L., Ruiz-Llorente, S., Muñoz-Romero, S., Yagüe-Fernández, M., Barquín, A., García-Donas, J., & Rojo-Álvarez, J. L. (2021). Text analytics and mixed feature extraction in ovarian cancer clinical and genetic data. IEEE Access, 9, 58034–58051. https://doi.org/10.1109/ACCESS.2021.3072941
dc.identifier.doi: 10.1109/ACCESS.2021.3072941
dc.identifier.issn: 2169-3536
dc.identifier.uri: https://hdl.handle.net/10115/63198
dc.language.iso: en
dc.publisher: IEEE
dc.rights: Attribution 4.0 International
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: Measurement
dc.subject: Genetics
dc.subject: Tumors
dc.subject: Chemotherapy
dc.subject: Principal component analysis
dc.subject: Feature extraction
dc.subject: Cancer
dc.title: Text Analytics and Mixed Feature Extraction in Ovarian Cancer Clinical and Genetic Data
dc.type: Article

Files

Original bundle

Name: Text analytics and mixed feature extraction in ovarian cancer clinical and genetic data.pdf
Size: 11.45 MB
Format: Adobe Portable Document Format