Entropic Statistical Description of Big Data Quality in Hotel Customer Relationship Management
Date
2019
Publisher
MDPI
Abstract
Customer Relationship Management (CRM) is a fundamental tool in the hospitality industry
nowadays, and it can be seen as a big-data scenario due to the large number of records that
managers handle annually. Data quality is crucial for the success of these systems, and one
of the main issues to be solved in this setting, by businesses in general and by hospitality
businesses in particular, is the identification of duplicated customers. This problem has not
received much attention in recent literature, probably in part because it is not easy to state
in statistical terms. In the present work, we formulate duplicated customer identification
as a large-scale data analysis problem, and we propose and benchmark a general-purpose solution for it.
Our system consists of four basic elements: (a) A generic feature representation for the customer
fields in a simple table-shaped database; (b) An efficient distance for comparison among feature
values, in terms of the Wagner-Fischer algorithm to calculate the Levenshtein distance; (c) A big-data
implementation using basic map-reduce techniques to readily support the comparison of strategies;
(d) An X-from-M criterion to identify those possible neighbors to a duplicated-customer candidate.
We analyze the mass density function of the distances in the CRM text-based fields and characterize
their behavior and consistency in terms of the entropy and of the mutual information for these
fields. Our experiments in a large CRM from a multinational hospitality chain show that the distance
distributions are statistically consistent for each feature, and that neighborhood thresholds are
adjusted automatically by the system in a first step and can subsequently be fine-tuned
according to the manager's experience. The entropy distributions for the different variables, as well
as the mutual information between pairs, are characterized by multimodal profiles, where a wide
gap between close and far fields is often present. This motivates the proposal of the so-called
X-from-M strategy, which is shown to be computationally affordable, and can provide the expert with
a reduced number of duplicated candidates to supervise, with low X values being enough to warrant
the sensitivity required at the automatic detection stage. The proposed system again supports
the benefits of big-data technologies in CRM scenarios for hotel chains and, rather
than relying on ad-hoc heuristic rules, promotes the research and development of theoretically
principled approaches.
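Illustrative sketch
The elements named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The Wagner-Fischer recurrence for the Levenshtein distance is standard; the reading of X-from-M as "at least X of the M compared fields fall within their distance threshold" is an assumption, as are the function names, the field names, and the threshold values. The entropy helper computes the Shannon entropy of an empirical distribution of distances, in bits.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance via the Wagner-Fischer dynamic programme (two rows)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def x_from_m_neighbors(candidate: Dict[str, str],
                       records: Sequence[Dict[str, str]],
                       thresholds: Dict[str, int],
                       x: int) -> List[int]:
    """Indices of records that are close to the candidate in at least x
    of the M fields listed in `thresholds` (assumed reading of X-from-M)."""
    neighbors = []
    for idx, record in enumerate(records):
        close_fields = sum(
            levenshtein(candidate[field], record[field]) <= limit
            for field, limit in thresholds.items()
        )
        if close_fields >= x:
            neighbors.append(idx)
    return neighbors


def empirical_entropy(values: Sequence[int]) -> float:
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical records and thresholds, for illustration only.
candidate = {"name": "Maria", "surname": "Lopez", "email": "mlopez@example.com"}
records = [
    {"name": "Maria", "surname": "Lopes", "email": "mlopez@example.com"},
    {"name": "John", "surname": "Smith", "email": "jsmith@example.com"},
]
thresholds = {"name": 1, "surname": 1, "email": 1}
duplicates = x_from_m_neighbors(candidate, records, thresholds, x=2)
```

In this sketch every candidate is compared against every record, which is quadratic in the number of customers; the abstract indicates that the actual system distributes the comparisons with map-reduce techniques.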
Citation
González-Serrano, L., Talón-Ballestero, P., Muñoz-Romero, S., Soguero-Ruiz, C., & Rojo-Álvarez, J. L. (2019). Entropic statistical description of big data quality in hotel customer relationship management. Entropy, 21(4), 419.
Except where otherwise noted, this item's license is described as Attribution 4.0 International