
An Analysis of Software Parallelism in Big Data Technologies for Data-Intensive Architectures

dc.contributor.author: Cerezo, Felipe
dc.contributor.author: Cuesta, Carlos E.
dc.contributor.author: Vela, Belén
dc.date.accessioned: 2024-02-09T08:26:21Z
dc.date.available: 2024-02-09T08:26:21Z
dc.date.issued: 2021-08-26
dc.identifier.citation: Software Architecture, 15th European Conference Proceedings (ECSA 2021), Lecture Notes in Computer Science, vol. 12857, 181-188, September 2021 [es]
dc.identifier.issn: 0302-9743
dc.identifier.uri: https://hdl.handle.net/10115/30154
dc.description.abstract: Data-intensive architectures handle an enormous amount of information, which requires the use of big data technologies. These tools include parallelization mechanisms that speed up data processing. However, the increasing volume of data affects both this parallelism and resource usage. The traditional strategy for increasing processing power has been to add more resources in order to exploit the parallelism; this strategy is, however, not always feasible in real projects, principally owing to the cost involved. This paper therefore analyzes how this parallelism can be exploited from a software perspective, focusing specifically on whether big data tools behave as ideally expected: a linear increase in performance with respect to the degree of parallelism and the data load rate. The analysis covers, on the one hand, the impact of the internal data partitioning mechanisms of big data tools and, on the other, the impact on performance of an increasing data load, while keeping the hardware resources constant. To this end, we conducted an experiment with two consolidated big data tools, Kafka and Elasticsearch. Our goal is to analyze the performance obtained when varying the degree of parallelism and the data load rate without ever reaching the limit of the available hardware resources. The results of these experiments lead us to conclude that the performance obtained is far from the ideal speedup, but that software parallelism still has a significant impact. [es]
dc.language.iso: eng [es]
dc.publisher: Springer Nature [es]
dc.rights: Attribution-NonCommercial-NoDerivs 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: Data-intensive architecture [es]
dc.subject: Software parallelism [es]
dc.subject: Partitioning [es]
dc.subject: Big data technologies [es]
dc.subject: Linear scalability [es]
dc.title: An Analysis of Software Parallelism in Big Data Technologies for Data-Intensive Architectures [es]
dc.type: info:eu-repo/semantics/bookPart [es]
dc.identifier.doi: 10.1007/978-3-030-86044-8_13 [es]
dc.rights.accessRights: info:eu-repo/semantics/restrictedAccess [es]



Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 4.0 International