An Analysis of Software Parallelism in Big Data Technologies for Data-Intensive Architectures

Cerezo, Felipe; Cuesta, Carlos E.; Vela, Belén

doi:10.1007/978-3-030-86044-8_13

dc.contributor.author	Cerezo, Felipe
dc.contributor.author	Cuesta, Carlos E.
dc.contributor.author	Vela, Belén
dc.date.accessioned	2024-02-09T08:26:21Z
dc.date.available	2024-02-09T08:26:21Z
dc.date.issued	2021-08-26
dc.identifier.citation	Software Architecture, 15th European Conference Proceedings (ECSA 2021), Lecture Notes in Computer Science, vol. 12857, pp. 181-188, September 2021
dc.identifier.issn	0302-9743
dc.identifier.uri	https://hdl.handle.net/10115/30154
dc.description.abstract	Data-intensive architectures handle an enormous amount of information, which requires the use of big data technologies. These tools include parallelization mechanisms employed to speed up data processing. However, the increasing volume of these data has an impact on this parallelism and on resource usage. The strategy traditionally employed to increase processing power has been to add more resources in order to exploit parallelism; however, this strategy is not always feasible in real projects, mainly because of its cost. The intention of this paper is, therefore, to analyze how this parallelism can be exploited from a software perspective, focusing specifically on whether big data tools behave as ideally expected: a linear increase in performance with respect to the degree of parallelism and the data load rate. We therefore analyze, on the one hand, the impact of the internal data partitioning mechanisms of big data tools and, on the other, the impact of an increasing data load on performance, while keeping hardware resources constant. To this end, we conducted an experiment with two well-established big data tools, Kafka and Elasticsearch. Our goal is to analyze the performance obtained when varying the degree of parallelism and the data load rate without ever reaching the limit of the available hardware resources. The results of these experiments lead us to conclude that the performance obtained is far from the ideal speedup, but that software parallelism still has a significant impact.
dc.language.iso	eng
dc.publisher	Springer Nature
dc.rights	Attribution-NonCommercial-NoDerivs 4.0 International
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject	Data-intensive architecture
dc.subject	Software parallelism
dc.subject	Partitioning
dc.subject	Big data technologies
dc.subject	Linear scalability
dc.title	An Analysis of Software Parallelism in Big Data Technologies for Data-Intensive Architectures
dc.type	info:eu-repo/semantics/bookPart
dc.identifier.doi	10.1007/978-3-030-86044-8_13
dc.rights.accessRights	info:eu-repo/semantics/restrictedAccess
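
The abstract frames the ideal behaviour as a linear increase in performance with the degree of parallelism. As a minimal sketch, assuming performance is measured as throughput X (for example, messages produced or documents indexed per second) at one partition and at higher degrees of parallelism p on fixed hardware, speedup is S(p) = X(p) / X(1) and parallel efficiency is E(p) = S(p) / p, so ideal linear scaling corresponds to E(p) = 1. The function names and the sample numbers below are hypothetical illustrations, not the authors' measurement code or results.

```python
"""Speedup and parallel efficiency from throughput measurements (illustrative sketch)."""


def speedup(baseline_throughput: float, parallel_throughput: float) -> float:
    """Ratio of throughput at degree of parallelism p to throughput at p = 1."""
    if baseline_throughput <= 0:
        raise ValueError("baseline throughput must be positive")
    return parallel_throughput / baseline_throughput


def efficiency(baseline_throughput: float, parallel_throughput: float, parallelism: int) -> float:
    """Fraction of ideal linear speedup achieved; 1.0 means perfectly linear scaling."""
    if parallelism < 1:
        raise ValueError("degree of parallelism must be >= 1")
    return speedup(baseline_throughput, parallel_throughput) / parallelism


def efficiency_curve(throughput_by_parallelism: dict[int, float]) -> dict[int, float]:
    """Map each degree of parallelism (e.g. number of Kafka partitions or
    Elasticsearch shards) to its efficiency, using the p = 1 run as baseline."""
    baseline = throughput_by_parallelism[1]
    return {
        p: efficiency(baseline, x, p)
        for p, x in sorted(throughput_by_parallelism.items())
    }


if __name__ == "__main__":
    # Hypothetical numbers for illustration only; NOT results from the paper.
    measurements = {1: 1000.0, 2: 1800.0, 4: 2600.0}
    for p, e in efficiency_curve(measurements).items():
        s = speedup(measurements[1], measurements[p])
        print(f"p={p}: speedup={s:.2f}, efficiency={e:.2f}")
```

With the hypothetical measurements above, efficiency drops from 1.00 at p = 1 to 0.65 at p = 4, which shows what a sublinear scaling curve looks like when compared against the ideal E(p) = 1. Using throughput rather than elapsed time is a design choice here; for a fixed workload, X(p) / X(1) is equivalent to the classic T(1) / T(p) definition.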

Files in this item

Name:: ECSA_2021_CameraReady_18.pdf
Size:: 101.3Kb
Format:: PDF