Browsing by Author "Soguero-Ruiz, Cristina"

Showing 1 - 19 of 19

A Big Data Approach to Customer Relationship Management Strategy in Hospitality Using Multiple Correspondence Domain Description
(MDPI, 2020-12-29) González-Serrano, Lydia; Talón-Ballestero, Pilar; Muñoz-Romero, Sergio; Soguero-Ruiz, Cristina; Rojo-Álvarez, José Luis
COVID-19 has hit the hotel sector in a hitherto unknown way. This situation is producing a fundamental change in client behavior that makes adequate knowledge of client profiles crucial to overcoming an uncertain environment. Customer Relationship Management (CRM) can provide key strategies in the hospitality industry by generating a great amount of valuable information about clients, whereas Big Data tools are providing unprecedented facilities to conduct massive analysis and to focus the client-to-business relationship. However, few instruments have been proposed to handle categorical features, which are the most usual in CRMs, aiming to combine statistical robustness with the best interpretability for managers. Therefore, our aim was to identify the profiles of clients from an international hotel chain using the overall data in its CRM system. An analysis method was created involving three elements: First, Multiple Correspondence Analysis provides us with a statistical description of the interactions among categories and features. Second, bootstrap resampling techniques give us information about the statistical variability of the feature maps. Third, kernel methods provide easy-to-visualize domain descriptions based on confidence areas in the maps. The proposed methodology can provide an operative and statistically principled way to scrutinize the CRM profiles in hospitality.
A streaming data visualization framework for supporting decision-making in the Intensive Care Unit
(Elsevier, 2023) Mohedano-Munoz, Miguel A.; Soguero-Ruiz, Cristina; Mora-Jiménez, Inmaculada; Rubio-Sánchez, Manuel; Álvarez-Rodríguez, Joaquín; Sanchez, Alberto
The number of reporting activities in real time has increased over the last years. This situation has pushed the need for providing real time analysis and visualizations to support decision-making. We propose a visualization framework for exploratory data analysis of multivariate data streams that relies on dimensionality reduction and machine learning techniques for plotting the data in two dimensions. Users can demarcate regions of interest for their study, and use them to make predictions or to decide when to train a new model. The knowledge gained from these visualizations allows users to: (i) characterize the data stream scenario; (ii) track the evolution of a case of interest; and (iii) configure and raise alarms according to the user-defined regions. We illustrate the effectiveness of our proposal through a case study analyzing real-world streaming data to identify patients with multi-drug resistant bacteria when they are in a hospital intensive care unit. Our visualization framework enables the patient follow-up which can allow clinicians to support decisions about the health status evolution of a particular patient. This could provide information for deciding on a particular treatment or whether to isolate patients with a high risk of having multi-drug resistant bacteria since their presence boosts infections in intensive care units.
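Algoritmos genéticos para la mejora de iluminación en imágenes macroscópicas y modelos basados en redes neuronales para la segmentación y detección de lesiones cutáneas [Genetic algorithms for illumination enhancement in macroscopic images and neural-network-based models for skin lesion segmentation and detection]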
(Sociedad Española de Ingeniería Biomédica, 2024-11-13) Gómez-Martínez, Vanesa; Chushig-Muzo, David; Soguero-Ruiz, Cristina
Skin cancer is one of the most common and fastest-growing forms of cancer worldwide. Traditionally, dermoscopic images have been the standard for evaluating skin lesions because of their high resolution and detail. However, macroscopic images are gaining popularity in clinical practice because of their accessibility and ease of use, although they tend to be of lower quality, which can affect diagnostic accuracy. This study proposes and evaluates techniques for improving the illumination of macroscopic images using genetic algorithms (GAs), U-Net models for skin lesion segmentation, and convolutional neural networks for melanoma detection. Applying GAs to adjust contrast and brightness improved the visual quality of the images compared with state-of-the-art methods. These enhanced images yielded the best segmentation results with the Attention U-Net model, reaching a Dice index of 0.871 and outperforming both the original images and those enhanced with state-of-the-art methods. In addition, three image approaches were evaluated for melanoma detection: original, GA-enhanced, and GA-enhanced and segmented. The enhanced-and-segmented approach combined with ResNet-50 achieved an AUCROC of 0.80, an improvement of 4% over the original images and 2% over the enhanced-only images. These results highlight the effectiveness of combining enhancement and segmentation techniques to improve the accuracy of melanoma detection.
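For readers unfamiliar with the Dice index reported in this abstract, the following is a minimal sketch of how it is commonly computed for two binary segmentation masks. The function name `dice_index`, the flat 0/1 mask representation, and the sample masks are illustrative assumptions; the abstract does not describe the authors' implementation.

```python
def dice_index(pred, target):
    """Dice index between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked as lesion in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total lesion pixels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Made-up masks: 2 overlapping pixels, 3 lesion pixels in each mask.
predicted = [0, 1, 1, 1, 0, 0]
reference = [0, 1, 1, 0, 0, 1]
print(dice_index(predicted, reference))
```

A value of 1 means the predicted and reference masks coincide exactly, and 0 means they do not overlap at all.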
Cystatin C as a predictor of cardiovascular outcomes in a hypertensive population
(Springer Science and Business Media LLC, 2017-12) Garcia-Carretero, Rafael; Vigil-Medina, Luis; Barquero-Perez, Oscar; Goya-Esteban, Rebeca; Mora-Jimenez, Inmaculada; Soguero-Ruiz, Cristina; Ramos-Lopez, Javier
Dimensionality reduction and ensemble of LSTMs for antimicrobial resistance prediction
(Elsevier, 2023-02-14) Hernàndez-Carnerero, Àlvar; Sànchez-Marrè, Miquel; Mora-Jiménez, Inmaculada; Soguero-Ruiz, Cristina; Martínez-Agüero, Sergio; Álvarez-Rodríguez, Joaquín
Bacterial resistance to antibiotics has been rapidly increasing, resulting in low antibiotic effectiveness even when treating common infections. The presence of resistant pathogens in environments such as a hospital Intensive Care Unit (ICU) exacerbates the critical admission-acquired infections. This work focuses on the prediction of antibiotic resistance in Pseudomonas aeruginosa nosocomial infections at the ICU, using Long Short-Term Memory (LSTM) artificial neural networks as the predictive method. The analyzed data were extracted from the Electronic Health Records (EHR) of patients admitted to the University Hospital of Fuenlabrada from 2004 to 2019 and were modeled as Multivariate Time Series. A data-driven dimensionality reduction method is built by adapting three feature importance techniques from the literature to the considered data and proposing an algorithm for selecting the most appropriate number of features. This is done using LSTM sequential capabilities so that the temporal aspect of features is taken into account. Furthermore, an ensemble of LSTMs is used to reduce the variance in performance. Our results indicate that the patient’s admission information, the antibiotics administered during the ICU stay, and the previous antimicrobial resistance are the most important risk factors. Compared to other conventional dimensionality reduction schemes, our approach is able to improve performance while reducing the number of features for most of the experiments. In essence, the proposed framework achieves, in a computationally cost-efficient manner, promising results for supporting decisions in this clinical task, characterized by high dimensionality, data scarcity, and concept drift.
dtwParallel: A Python package to efficiently compute dynamic time warping between time series
(Elsevier, 2023) Escudero-Arnanz, Óscar; G. Marques, Antonio; Soguero-Ruiz, Cristina; Mora-Jiménez, Inmaculada; Robles, Gregorio
dtwParallel is a Python package that computes the Dynamic Time Warping (DTW) distance between a collection of (multivariate) time series (MTS). dtwParallel incorporates the main functionalities available in current DTW libraries and novel functionalities such as parallelization, computation of similarity (kernel-based) values, and consideration of data with different types of features (categorical, real-valued, ...). A low-floor, high-ceiling, and wide-walls software design principle has been adopted, envisioning uses in education, research, and industry. The source code and documentation of the package are available at https://github.com/oscarescuderoarnanz/dtwParallel.
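As background for the DTW distance mentioned above, here is a minimal pure-Python sketch of the classic dynamic-programming recurrence for two multivariate time series. It is not the dtwParallel API: the function name `dtw_distance`, the Euclidean local cost, and the toy series are assumptions chosen for illustration, and the sketch does no parallelization.

```python
import math


def dtw_distance(a, b):
    """Classic DTW between two series given as lists of equal-length tuples."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the best accumulated cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Assumed local cost: Euclidean distance between the two samples.
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays on the same sample
                cost[i][j - 1],      # b advances, a stays on the same sample
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


# Two univariate toy series with the same shape but different lengths.
x = [(0.0,), (1.0,), (2.0,)]
y = [(0.0,), (1.0,), (1.0,), (2.0,)]
print(dtw_distance(x, y))
```

Because the warping path may repeat samples, the two toy series align with zero cost even though their lengths differ. The table costs O(n·m) time and memory per pair of series, which is one reason a package might parallelize the computation across pairs.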
Entropic Statistical Description of Big Data Quality in Hotel Customer Relationship Management
(MDPI, 2019) González-Serrano, Lydia; Talón-Ballestero, Pilar; Muñoz-Romero, Sergio; Soguero-Ruiz, Cristina; Rojo-Álvarez, José Luis
Customer Relationship Management (CRM) is a fundamental tool in the hospitality industry nowadays, which can be seen as a big-data scenario due to the large number of records that are handled annually by managers. Data quality is crucial for the success of these systems, and one of the main issues to be solved by businesses in general and by hospitality businesses in particular in this setting is the identification of duplicated customers, which has not received much attention in recent literature, probably and partly because it is not an easy-to-state problem in statistical terms. In the present work, we address the problem statement of duplicated customer identification as a large-scale data analysis, and we propose and benchmark a general-purpose solution for it. Our system consists of four basic elements: (a) A generic feature representation for the customer fields in a simple table-shape database; (b) An efficient distance for comparison among feature values, in terms of the Wagner-Fischer algorithm to calculate the Levenshtein distance; (c) A big-data implementation using basic map-reduce techniques to readily support the comparison of strategies; (d) An X-from-M criterion to identify those possible neighbors to a duplicated-customer candidate. We analyze the mass density function of the distances in the CRM text-based fields and characterize their behavior and consistency in terms of the entropy and of the mutual information for these fields. Our experiments in a large CRM from a multinational hospitality chain show that the distance distributions are statistically consistent for each feature, and that neighbourhood thresholds are automatically adjusted by the system at a first step and can subsequently be more finely tuned according to the manager's experience.
The entropy distributions for the different variables, as well as the mutual information between pairs, are characterized by multimodal profiles, where a wide gap between close and far fields is often present. This motivates the proposal of the so-called X-from-M strategy, which is shown to be computationally affordable, and can provide the expert with a reduced number of duplicated candidates to supervise, with low X values being enough to warrant the sensitivity required at the automatic detection stage. The proposed system again encourages and supports the benefits of big-data technologies in CRM scenarios for hotel chains, and rather than the use of ad-hoc heuristic rules, it promotes the research and development of theoretically principled approaches.
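Element (b) of the system above relies on the Wagner-Fischer algorithm for the Levenshtein distance. The following is a minimal sketch of that standard algorithm using two rolling rows; the function name `levenshtein` and the sample customer names are made up for illustration and do not come from the paper's data or code.

```python
def levenshtein(s, t):
    """Levenshtein distance via the Wagner-Fischer dynamic program."""
    # prev[j] is the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to the empty string
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute, free when equal
            ))
        prev = curr
    return prev[-1]


# Hypothetical near-duplicate entries in a customer name field.
print(levenshtein("Jon Smith", "John Smith"))
print(levenshtein("kitten", "sitting"))
```

Keeping only two rows reduces memory from O(len(s)·len(t)) to O(len(t)), which matters when many field pairs are compared at scale.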
Explainable Temporal Inference for Irregular Multivariate Time Series. A Case Study for Early Prediction of Multidrug Resistance
(Institute of Electrical and Electronics Engineers, 2025-07-23) Escudero-Arnanz, Óscar; Soguero-Ruiz, Cristina; Álvarez-Rodríguez, Joaquín; G. Marques, Antonio
Objective: Many healthcare problems involve complex patient trajectories represented as Multivariate Time Series (MTS), with predictions often coming as Time Series (TS) outputs. Despite recent advances, these “MTS-to-TS” inference tasks remain challenging due to data irregularity, temporal dependencies, and the need for clinical explainability. To address these demands, we propose novel eXplainable Artificial Intelligence (XAI) methods for “MTS-to-TS” architectures, enabling tracking of patient evolution and identification of key variable patterns associated with adverse outcomes. We evaluate our approach on private ICU data from the University Hospital of Fuenlabrada (UHF) for Multidrug Resistance (MDR) prediction and the public HiRID dataset (circulatory failure). Methods: We introduce three XAI techniques: i) Irregular Time SHapley Additive exPlanation (IT-SHAP), a post-hoc extension of TimeSHAP to TS outputs; ii) Hadamard Attention, an intrinsic mechanism for capturing temporal dependencies; and iii) Causal Conditional Mutual Information, a pre-hoc approach for feature selection. Results: MDR prediction achieved highest performance with a GRU using Hadamard Attention (ROC-AUC=0.783±0.023), while circulatory failure was best predicted with LSTM (ROC–AUC of 0.9970±1.6e−3). In terms of explainability, IT-SHAP uncovered clinically relevant risk factors—early antibiotic use and bacterial cultures—later validated by UHF clinicians. Conclusion: Our framework offers temporal explainability in “MTS-to-TS” architectures, allowing clinicians to trace disease trajectories and understand the contribution of each variable at each time step. Significance: Integrating explainable MDR risk predictions into EHR systems enables early interventions, improved antimicrobial stewardship, and infection control. The framework's scalability to other ICU challenges underscores its clinical impact.
Identification of clinically relevant features in hypertensive patients using penalized regression: a case study of cardiovascular events
(Springer Science and Business Media LLC, 2019-07-25) Garcia-Carretero, Rafael; Barquero-Perez, Oscar; Mora-Jimenez, Inmaculada; Soguero-Ruiz, Cristina; Goya-Esteban, Rebeca; Ramos-Lopez, Javier
Interpretable clinical time-series modeling with intelligent feature selection for early prediction of antimicrobial multidrug resistance
(Elsevier, 2022) Martínez-Agüero, Sergio; Soguero-Ruiz, Cristina; Alonso-Moral, Jose M.; Mora-Jiménez, Inmaculada; Álvarez-Rodríguez, Joaquín; Marques, Antonio G.
Electronic health records provide rich, heterogeneous data about the evolution of the patients’ health status. However, such data need to be processed carefully, with the aim of extracting meaningful information for clinical decision support. In this paper, we leverage interpretable (deep) learning and signal processing tools to deal with multivariate time-series data collected from the Intensive Care Unit (ICU) of the University Hospital of Fuenlabrada (Madrid, Spain). The presence of antimicrobial multidrug-resistant (AMR) bacteria is one of the greatest threats to the health system in general and to the ICUs in particular due to the critical health status of the patients therein. Thus, early identification of bacteria at the ICU and early prediction of their antibiotic resistance are key for the patients’ prognosis. While intelligent data-based processing and learning schemes can contribute to this early prediction, their acceptance and deployment in the ICUs require the automatic schemes to be not only accurate but also understandable by clinicians. Accordingly, we have designed trustworthy intelligent models for the early prediction of AMR based on the combination of meaningful feature selection with interpretable recurrent neural networks. These models were created using irregularly sampled clinical measurements, both considering the health status of the patient and the global ICU environment. We explored several strategies to cope with strongly imbalanced data, since only a few ICU patients are infected by AMR bacteria. It is worth noting that our approach exhibits a good balance between performance and interpretability, especially when considering the difficulty of the classification task at hand. A multitude of factors are involved in the emergence of AMR (several of them not fully understood), and the records only contain a subset of them.
In addition, the limited number of patients, the imbalance between classes, and the irregularity of the data render the problem harder to solve. Our models are also enriched with SHAP post-hoc interpretability and validated by clinicians who considered model understandability and trustworthiness of paramount concern for pragmatic purposes. Moreover, we use linguistic fuzzy systems to provide clinicians with explanations in natural language. Such explanations are automatically generated from a pool of interpretable rules that describe the interaction among the most relevant features identified by SHAP. Notice that clinicians were especially satisfied with new insights provided by our models. Such insights helped them to trust the automatic schemes and use them to make (better) decisions to mitigate AMR spreading in the ICU. All in all, this work paves the way towards more comprehensible time-series analysis in the context of early AMR prediction in ICUs and reduces the time of detection of infectious diseases, opening the door to better hospital care.
Interpreting clinical latent representations using autoencoders and probabilistic models
(Elsevier, 2021) Chushig-Muzo, David; Soguero-Ruiz, Cristina; Bohoyo, Pablo de Miguel; Mora-Jiménez, Inmaculada
Electronic health records (EHRs) are a valuable data source that, in conjunction with deep learning (DL) methods, have provided important outcomes in different domains, contributing to supporting decision-making. Owing to the remarkable advancements achieved by DL-based models, autoencoders (AE) are becoming extensively used in health care. Nevertheless, AE-based models are based on nonlinear transformations, resulting in black-box models leading to a lack of interpretability, which is vital in the clinical setting. To obtain insights from AE latent representations, we propose a methodology by combining probabilistic models based on Gaussian mixture models and hierarchical clustering supported by Kullback-Leibler divergence. To validate the methodology from a clinical viewpoint, we used real-world data extracted from EHRs of the University Hospital of Fuenlabrada (Spain). Records were associated with healthy and chronic hypertensive and diabetic patients. Experimental outcomes showed that our approach can find groups of patients with similar health conditions by identifying patterns associated with diagnosis and drug codes. This work opens up promising opportunities for interpreting representations obtained by the AE-based model, bringing some light to the decision-making process made by clinical experts in daily practice.
Learning and visualizing chronic latent representations using electronic health records
(BMC, 2022-09-05) Chushig-Muzo, David; Soguero-Ruiz, Cristina; Miguel Bohoyo, Pablo; Mora-Jiménez, Inmaculada
Background: Nowadays, patients with chronic diseases such as diabetes and hypertension have reached alarming numbers worldwide. These diseases increase the risk of developing acute complications and involve a substantial economic burden and demand for health resources. The widespread adoption of Electronic Health Records (EHRs) is opening great opportunities for supporting decision-making. Nevertheless, data extracted from EHRs are complex (heterogeneous, high-dimensional and usually noisy), hampering the knowledge extraction with conventional approaches. Methods: We propose the use of the Denoising Autoencoder (DAE), a Machine Learning (ML) technique that transforms high-dimensional data into latent representations (LRs), thus addressing the main challenges with clinical data. We explore in this work how the combination of LRs with a visualization method can be used to map the patient data in a two-dimensional space, gaining knowledge about the distribution of patients with different chronic conditions. Furthermore, this representation can also be used to characterize the patient’s health status evolution, which is of paramount importance in the clinical setting. Results: To obtain clinical LRs, we considered real-world data extracted from EHRs linked to the University Hospital of Fuenlabrada in Spain. Experimental results showed the great potential of DAEs to identify patients with clinical patterns linked to hypertension, diabetes and multimorbidity. The procedure allowed us to find patients with the same main chronic disease but different clinical characteristics. Thus, we identified two kinds of diabetic patients with differences in their drug therapy (insulin and non-insulin dependent), and also a group of women affected by hypertension and gestational diabetes. We also present a proof of concept for mapping the health status evolution of synthetic patients when considering the most significant diagnoses and drugs associated with chronic patients.
Conclusion: Our results highlighted the value of ML techniques to extract clinical knowledge, supporting the identification of patients with certain chronic conditions. Furthermore, the patient’s health status progression on the two-dimensional space might be used as a tool for clinicians aiming to characterize health conditions and identify their more relevant clinical codes.
Noisy multi-label semi-supervised dimensionality reduction
(Elsevier, 2019-02-05) Mikalsen, Karl Øyvind; Soguero-Ruiz, Cristina; Bianchi, Filippo Maria; Jenssen, Robert
Noisy labeled data represent a rich source of information that often are easily accessible and cheap to obtain, but label noise might also have many negative consequences if not accounted for. How to fully utilize noisy labels has been studied extensively within the framework of standard supervised machine learning over a period of several decades. However, very little research has been conducted on solving the challenge posed by noisy labels in non-standard settings. This includes situations where only a fraction of the samples are labeled (semi-supervised) and each high-dimensional sample is associated with multiple labels. In this work, we present a novel semi-supervised and multi-label dimensionality reduction method that effectively utilizes information from both noisy multi-labels and unlabeled data. With the proposed Noisy multi-label semi-supervised dimensionality reduction (NMLSDR) method, the noisy multi-labels are denoised and unlabeled data are labeled simultaneously via a specially designed label propagation algorithm. NMLSDR then learns a projection matrix for reducing the dimensionality by maximizing the dependence between the enlarged and denoised multi-label space and the features in the projected space. Extensive experiments on synthetic data, benchmark datasets, as well as a real-world case study, demonstrate the effectiveness of the proposed algorithm and show that it outperforms state-of-the-art multi-label feature extraction algorithms.
Scaled Radial Axes for Interactive Visual Feature Selection: A Case Study for Analyzing Chronic Conditions
(2018-06-15) Sanchez, Alberto; Soguero-Ruiz, Cristina; Mora-Jiménez, Inmaculada; Rivas-Flores, Francisco Javier; Lehmann, Dirk Joachim; Rubio-Sánchez, Manuel
In statistics, machine learning, and related fields, feature selection is the process of choosing a smaller subset of features to work with. This is an important topic since selecting a subset of features can help analysts to interpret models and data, and to decrease computational runtimes. While many techniques are purely automatic, the data visualization community has produced a number of interactive approaches where users can make decisions taking into account their domain knowledge. In this paper we propose a new visualization technique based on radial axes that allows analysts to perform feature selection effectively, in contrast to previous radial axes methods. This is achieved by employing alternative scaled axes that provide insight regarding the features that have a smaller contribution to the visualizations. Therefore, analysts can use the technique to carry out interactive backwards feature elimination, by discarding the least relevant features according to the information on the plots and their expertise. Our approach can be coupled with any linear dimensionality reduction method, and can be used when performing analyses of cluster structure, correlations, class separability, etc. Specifically, in this paper we focus on combining the proposed technique with methods designed for classification. Lastly, we illustrate the effectiveness of our proposal through a case study analyzing high-dimensional medical chronic conditions data. In particular, clinicians have used the technique for determining the most important features that discriminate between patients with diabetes and high blood pressure.
Time series cluster kernel for learning similarities between multivariate time series with missing data
(Elsevier, 2018-04) Mikalsen, Karl Øyvind; Bianchi, Filippo Maria; Soguero-Ruiz, Cristina; Jenssen, Robert
Similarity-based approaches represent a promising direction for time series analysis. However, many such methods rely on parameter tuning, and some have shortcomings if the time series are multivariate (MTS), due to dependencies between attributes, or the time series contain missing data. In this paper, we address these challenges within the powerful context of kernel methods by proposing the robust time series cluster kernel (TCK). The approach taken leverages the missing data handling properties of Gaussian mixture models (GMM) augmented with informative prior distributions. An ensemble learning approach is exploited to ensure robustness to parameters by combining the clustering results of many GMM to form the final kernel. We evaluate the TCK on synthetic and real data and compare to other state-of-the-art techniques. The experimental results demonstrate that the TCK is robust to parameter choices, provides competitive results for MTS without missing data and outstanding results for missing data.
Transfer learning for a tabular-to-image approach: A case study for cardiovascular disease prediction
(Elsevier, 2025-05) Lara-Abelenda, Francisco J.; Chushig-Muzo, David; Peiro-Corbacho, Pablo; Gómez-Martínez, Vanesa; Wägner, Ana M.; Granja, Conceição; Soguero-Ruiz, Cristina
Objective: Machine learning (ML) models have been extensively used for tabular data classification, but recent works have been developed to transform tabular data into images, aiming to leverage the predictive performance of convolutional neural networks (CNNs). However, most of these approaches fail to convert data with a low number of samples and mixed-type features. This study aims to evaluate the performance of the tabular-to-image method named low mixed-image generator for tabular data (LM-IGTD), and to assess the effectiveness of transfer learning and fine-tuning for improving predictions on tabular data. Methods: We employed two public tabular datasets with patients diagnosed with cardiovascular diseases (CVDs): Framingham and Steno. First, both datasets were transformed into images using LM-IGTD. Then, Framingham, which contains a larger set of samples than Steno, was used to train CNN-based models. Finally, we performed transfer learning and fine-tuning using the pre-trained CNN on the Steno dataset to predict CVD risk. Results: The CNN-based model with transfer learning achieved the highest AUCROC in Steno (0.855), outperforming ML models such as decision trees, K-nearest neighbors, least absolute shrinkage and selection operator (LASSO), support vector machine and TabPFN. This approach improved accuracy by 2% over the best-performing traditional model, TabPFN. Conclusion: To the best of our knowledge, this is the first study that evaluates the effectiveness of applying transfer learning and fine-tuning to tabular data using tabular-to-image approaches. Through the use of CNNs’ predictive capabilities, our work also advances the diagnosis of CVD by providing a framework for early clinical intervention and decision-making support.
Use of a K-nearest neighbors model to predict the development of type 2 diabetes within 2 years in an obese, hypertensive population
(Springer Science and Business Media LLC, 2020-05) Garcia-Carretero, Rafael; Vigil-Medina, Luis; Mora-Jimenez, Inmaculada; Soguero-Ruiz, Cristina; Barquero-Perez, Oscar; Ramos-Lopez, Javier
Prediabetes is a type of hyperglycemia in which patients have blood glucose levels above normal but below the threshold for type 2 diabetes mellitus (T2DM). Prediabetic patients are considered to be at high risk for developing T2DM, but not all will eventually do so. Because it is difficult to identify which patients have an increased risk of developing T2DM, we developed a model of several clinical and laboratory features to predict the development of T2DM within a 2-year period. We used a supervised machine learning algorithm to identify at-risk patients from among 1647 obese, hypertensive patients. The study period began in 2005 and ended in 2018. We constrained data up to 2 years before the development of T2DM. Then, using a time series analysis with the features of every patient, we calculated one linear regression line and one slope per feature. Features were then included in a K-nearest neighbors classification model. Feature importance was assessed using the random forest algorithm. The K-nearest neighbors model accurately classified patients in 96% of cases, with a sensitivity of 99%, specificity of 78%, positive predictive value of 96%, and negative predictive value of 94%. The random forest algorithm selected the homeostatic model assessment–estimated insulin resistance, insulin levels, and body mass index as the most important factors, which in combination with KNN had an accuracy of 99% with a sensitivity of 99% and specificity of 97%. We built a prognostic model that accurately identified obese, hypertensive patients at risk for developing T2DM within a 2-year period. Clinicians may use machine learning approaches to better assess risk for T2DM and better manage hypertensive patients. Machine learning algorithms may help health care providers make more informed decisions.
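The abstract above summarizes each patient's time series by one regression slope per feature before classification. A minimal sketch of that slope computation, assuming ordinary least squares over (time, value) pairs, is shown here; the function name `ols_slope` and the glucose readings are hypothetical, and the paper's actual preprocessing may differ.

```python
def ols_slope(times, values):
    """Least-squares slope of values regressed on times."""
    n = len(times)
    if n != len(values) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    # Spread of the time points around their mean.
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all time points are identical")
    # Co-variation of time and value around their means.
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


# Made-up yearly fasting glucose readings (mg/dL) for one patient.
years = [0, 1, 2]
glucose = [100, 104, 108]
print(ols_slope(years, glucose))
```

Each feature's slope then becomes one coordinate of the patient's feature vector, so a distance-based classifier such as K-nearest neighbors compares trends rather than raw measurements.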
Using big data from Customer Relationship Management information systems to determine the client profile in the hotel sector
(Elsevier Ltd., 2018) Talón Ballestero, Pilar; González-Serrano, Lydia; Soguero-Ruiz, Cristina; Muñoz-Romero, Sergio; Rojo-Álvarez, José Luis
Client knowledge remains a key strategic point in hospitality management. However, the role that can be played by large amounts of available information in the Customer Relationship Management (CRM) systems, when addressed by using emerging Big Data techniques for efficient client profiling, is still in its early stages. In this work, we addressed the client profile of the data in a CRM system of an international hotel chain, by using Big Data technology and Bootstrap resampling techniques for Proportion Tests. Strong consistency was found for the most representative feature of repeaters, namely traveling without children. Profiles were more similar for British and German clients, and their main differences with Spanish clients were in the stay duration and in age. For a vacation chain, these results suggest further analysis on the target orientation towards new market segments. Big Data technologies can be extremely useful for analyzing indoor data available in CRM information systems from the hospitality industry.
Visually guided classification trees for analyzing chronic patients
(2020-03-11) Soguero-Ruiz, Cristina; Mora-Jiménez, Inmaculada; Mohedano-Munoz, Miguel Ángel; Rubio-Sánchez, Manuel; de Miguel-Bohoyo, Pablo; Sanchez, Alberto
Background: Chronic diseases are becoming more widespread each year in developed countries, mainly due to increasing life expectancy. Among them, diabetes mellitus (DM) and essential hypertension (EH) are two of the most prevalent ones. Furthermore, they can be the onset of other chronic conditions such as kidney or obstructive pulmonary diseases. The need to comprehend the factors related to such complex diseases motivates the development of interpretative and visual analysis methods, such as classification trees, which not only provide predictive models for diagnosing patients, but can also help to discover new clinical insights. Results: In this paper, we analyzed healthy and chronic (diabetic, hypertensive) patients associated with the University Hospital of Fuenlabrada in Spain. Each patient was classified into a single health status according to clinical risk groups (CRGs). The CRGs characterize a patient through features such as age, gender, diagnosis codes, and drug codes. Based on these features and the CRGs, we have designed classification trees to determine the most discriminative decision features among different health statuses. In particular, we propose to make use of statistical data visualizations to guide the selection of features in each node when constructing a tree. We created several classification trees to distinguish among patients with different health statuses. We analyzed their performance in terms of classification accuracy, and drew clinical conclusions regarding the decision features considered in each tree. As expected, healthy patients and patients with a single chronic condition were better classified than patients with comorbidities. The constructed classification trees also show that the use of antipsychotics and the diagnosis of chronic airway obstruction are relevant for classifying patients with more than one chronic condition, in conjunction with the usual DM and/or EH diagnoses. 
Conclusions: We propose a methodology for constructing classification trees in a visually guided manner. The approach allows clinicians to progressively select the decision features at each of the tree nodes. The process is guided by exploratory data analysis visualizations, which may provide new insights and unexpected clinical information.