An Analysis of Software Parallelism in Big Data Technologies for Data-Intensive Architectures

Cerezo, Felipe; Cuesta, Carlos E.; Vela, Belén

Files

ECSA_2021_CameraReady_18.pdf (101.37 KB)

Date

2021-08-26

Authors

Cerezo, Felipe

Cuesta, Carlos E.

Vela, Belén

Publisher

Springer Nature

URI

https://hdl.handle.net/10115/30154

DOI

https://doi.org/10.1007/978-3-030-86044-8_13

Abstract

Data-intensive architectures handle enormous amounts of information, which requires the use of big data technologies. These tools include parallelization mechanisms that speed up data processing. However, the increasing volume of data affects both this parallelism and resource usage. The traditional strategy for increasing processing power has been to add more resources in order to exploit parallelism; this is, however, not always feasible in real projects, principally owing to the cost involved. This paper therefore analyzes how parallelism can be exploited from a software perspective, focusing specifically on whether big data tools behave as ideally expected: a linear increase in performance with respect to the degree of parallelism and the data load rate. The analysis covers, on the one hand, the impact of the internal data partitioning mechanisms of big data tools and, on the other, the impact of an increasing data load on performance, while keeping the hardware resources constant. To this end, we have conducted an experiment with two consolidated big data tools, Kafka and Elasticsearch. Our goal is to analyze the performance obtained when varying the degree of parallelism and the data load rate without ever reaching the limit of the available hardware resources. The results of these experiments lead us to conclude that the performance obtained is far from the ideal speedup, but that software parallelism still has a significant impact.
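
The "ideal speedup" mentioned in the abstract can be made concrete with a small calculation. The sketch below is illustrative only: it assumes throughput is measured at several degrees of parallelism (for example, partition counts) on fixed hardware, and it compares each measurement with the linear ideal. The function names and the throughput figures are hypothetical and are not taken from the paper.

```python
def speedup(baseline_throughput, throughput):
    """Speedup relative to the baseline run (degree of parallelism 1)."""
    return throughput / baseline_throughput


def efficiency(baseline_throughput, throughput, degree):
    """Fraction of the ideal linear speedup achieved (1.0 means ideal)."""
    return speedup(baseline_throughput, throughput) / degree


# Hypothetical throughput figures (messages/s) keyed by degree of parallelism.
# These are made-up numbers for illustration, NOT results from the paper.
measurements = {1: 1000.0, 2: 1800.0, 4: 3000.0, 8: 4000.0}

baseline = measurements[1]
report = {
    degree: (speedup(baseline, t), efficiency(baseline, t, degree))
    for degree, t in measurements.items()
}

for degree, (s, e) in report.items():
    # Under ideal linear scaling the speedup would equal the degree itself.
    print(f"degree={degree} speedup={s:.2f} ideal={degree} efficiency={e:.2f}")
```

An efficiency that stays close to 1.0 as the degree grows would indicate near-linear scaling; a falling efficiency with a speedup still above 1 corresponds to the situation the abstract describes, where parallelism helps but stays short of the ideal.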

Keywords

Data-intensive architecture, Software parallelism, Partitioning, Big data technologies, Linear scalability

Citation

Software Architecture, 15th European Conference Proceedings (ECSA 2021), Lecture Notes in Computer Science, vol. 12857, 181-188, September 2021

Collections

Conference Papers

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 4.0 International
