An Analysis of Software Parallelism in Big Data Technologies for Data-Intensive Architectures

dc.contributor.author: Cerezo, Felipe
dc.contributor.author: Cuesta, Carlos E.
dc.contributor.author: Vela, Belén
dc.date.accessioned: 2024-02-09T08:26:21Z
dc.date.available: 2024-02-09T08:26:21Z
dc.date.issued: 2021-08-26
dc.description.abstract: Data-intensive architectures handle an enormous amount of information, which requires the use of big data technologies. These tools include the parallelization mechanisms employed to speed up data processing. However, the increasing volume of these data has an impact on this parallelism and on resource usage. The strategy traditionally employed to increase the processing power has usually been that of adding more resources in order to exploit the parallelism; this strategy is, however, not always feasible in real projects, principally owing to the cost implied. The intention of this paper is, therefore, to analyze how this parallelism can be exploited from a software perspective, focusing specifically on whether big data tools behave as ideally expected: a linear increase in performance with respect to the degree of parallelism and the data load rate. Analysis is consequently carried out of, on the one hand, the impact of the internal data partitioning mechanisms of big data tools and, on the other, the impact on the performance of an increasing data load, while keeping the hardware resources constant. We have, therefore, conducted an experiment with two consolidated big data tools, Kafka and Elasticsearch. Our goal is to analyze the performance obtained when varying the degree of parallelism and the data load rate without ever reaching the limit of hardware resources available. The results of these experiments lead us to conclude that the performance obtained is far from being the ideal speedup, but that software parallelism still has a significant impact.
dc.identifier.citation: Software Architecture, 15th European Conference Proceedings (ECSA 2021), Lecture Notes in Computer Science, vol. 12857, 181-188, September 2021
dc.identifier.doi: 10.1007/978-3-030-86044-8_13
dc.identifier.issn: 0302-9743
dc.identifier.uri: https://hdl.handle.net/10115/30154
dc.language.iso: eng
dc.publisher: Springer Nature
dc.rights: Attribution-NonCommercial-NoDerivs 4.0 International
dc.rights.accessRights: info:eu-repo/semantics/restrictedAccess
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: Data-intensive architecture
dc.subject: Software parallelism
dc.subject: Partitioning
dc.subject: Big data technologies
dc.subject: Linear scalability
dc.title: An Analysis of Software Parallelism in Big Data Technologies for Data-Intensive Architectures
dc.type: info:eu-repo/semantics/bookPart

Files

Original bundle

Name: ECSA_2021_CameraReady_18.pdf
Size: 101.37 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.67 KB
Format: Item-specific license agreed upon to submission