Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.12188/20794
DC Field | Value | Language
dc.contributor.author | Grzegorowski, Marek | en_US
dc.contributor.author | Zdravevski, Eftim | en_US
dc.contributor.author | Janusz, Andrzej | en_US
dc.contributor.author | Lameski, Petre | en_US
dc.contributor.author | Apanowicz, Cas | en_US
dc.contributor.author | Ślęzak, Dominik | en_US
dc.date.accessioned | 2022-07-15T09:51:57Z | -
dc.date.available | 2022-07-15T09:51:57Z | -
dc.date.issued | 2021-07-15 | -
dc.identifier.uri | http://hdl.handle.net/20.500.12188/20794 | -
dc.description.abstract | Analytical data processing has become the cornerstone of today's businesses' success, and it is facilitated by Big Data platforms that offer virtually limitless scalability. However, minimizing the total cost of ownership (TCO) for the infrastructure can be challenging. We propose a novel method to build resilient clusters on cloud resources that are fine-tuned to the particular data processing task. The presented architecture follows the infrastructure-as-code paradigm so that the cluster can be dynamically configured and managed. It first identifies the optimal cluster size to perform a job in the required time. Then, by analyzing spot instance price history and using ARIMA models, it optimizes the schedule of the job execution to leverage the discounted prices of the cloud spot market. In particular, we evaluated savings opportunities when using Amazon EC2 spot instances compared to on-demand resources. The experiments confirmed that the prediction module significantly improved the cost-effectiveness of the solution: up to 80% savings compared to the on-demand prices and, in the worst case, only 1% more cost than the absolute minimum. The production deployments of the architecture show that it is invaluable for minimizing the total cost of ownership of analytical data processing solutions. | en_US
dc.publisher | Elsevier | en_US
dc.relation.ispartof | Big Data Research | en_US
dc.subject | Big Data, ETL, Cloud computing, Spot price prediction, ARIMA, Spark | en_US
dc.title | Cost optimization for big data workloads based on dynamic scheduling and cluster-size tuning | en_US
dc.type | Article | en_US
item.grantfulltext | open | -
item.fulltext | With Fulltext | -
crisitem.author.dept | Faculty of Computer Science and Engineering | -
crisitem.author.dept | Faculty of Computer Science and Engineering | -
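The abstract describes two steps: choosing a cluster size that meets the required completion time, then scheduling the job in the cheapest window of forecast spot prices. The sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. It assumes perfectly linear scaling (runtime = work / nodes), hourly price slots, and an already-computed price forecast. In the paper that forecast comes from ARIMA models (for example, statsmodels' `ARIMA(history, order=(p, d, q)).fit().forecast(steps=h)` could produce one). The function names `min_cluster_size`, `cheapest_start`, and `savings` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def min_cluster_size(work_node_hours: float, deadline_hours: float, max_nodes: int) -> int:
    """Smallest node count that finishes the job by the deadline.

    Assumption: perfectly linear scaling (runtime = work / nodes). Real jobs
    often scale sub-linearly, so this is a simplification.
    """
    nodes = max(1, math.ceil(work_node_hours / deadline_hours))
    if nodes > max_nodes:
        raise ValueError("deadline cannot be met within max_nodes")
    return nodes


def cheapest_start(forecast: Sequence[float], duration: int) -> Tuple[int, float]:
    """Pick the start slot whose contiguous window of `duration` hourly
    forecast prices has the lowest sum; returns (start, price per node)."""
    if not 0 < duration <= len(forecast):
        raise ValueError("duration must fit inside the forecast horizon")
    starts = range(len(forecast) - duration + 1)
    # Brute-force scan over every feasible start slot (fine for short horizons).
    best = min(starts, key=lambda s: sum(forecast[s:s + duration]))
    return best, sum(forecast[best:best + duration])


def savings(spot_cost: float, on_demand_cost: float) -> float:
    """Fraction of cost saved relative to running the same job on demand."""
    return 1.0 - spot_cost / on_demand_cost
```

As a usage example with made-up numbers: a 12 node-hour job with a 4-hour deadline gets 3 nodes and runs for 4 hours. With the hypothetical hourly forecast `[0.30, 0.28, 0.12, 0.10, 0.11, 0.13, 0.25, 0.27]`, the cheapest 4-hour window starts at slot 2. The brute-force window scan keeps the sketch short; a production scheduler would also account for forecast uncertainty and spot interruptions.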
Appears in Collections: Faculty of Computer Science and Engineering: Journal Articles

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.