Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.12188/20774
DC Field | Value | Language
dc.contributor.author | Zdravevski, Eftim | en_US
dc.contributor.author | Lameski, Petre | en_US
dc.contributor.author | Apanowicz, Cas | en_US
dc.contributor.author | Ślȩzak, Dominik | en_US
dc.date.accessioned | 2022-07-14T11:16:42Z | -
dc.date.available | 2022-07-14T11:16:42Z | -
dc.date.issued | 2020-05-01 | -
dc.identifier.uri | http://hdl.handle.net/20.500.12188/20774 | -
dc.description.abstract | The success of companies hugely depends on how well they can analyze the available data and extract meaningful knowledge. The Extract-Transform-Load (ETL) process is instrumental in accomplishing these goals, but requires significant effort, especially for Big Data. Previous works have failed to formalize, integrate, and evaluate the ETL process for Big Data problems in a scalable and cost-effective way. In this paper, we propose a cloud-based ETL framework for data fusion and aggregation from a variety of sources. Next, we define three scenarios regarding data aggregation during ETL: (i) ETL with no aggregation; (ii) aggregation based on predefined columns or time intervals; and (iii) aggregation within single user sessions spanning over arbitrary time intervals. The third scenario is very valuable in the context of feature engineering, making it possible to define features as “the time since the last occurrence of event X”. The scalability was evaluated on Amazon AWS Hadoop clusters by processing user logs collected with Kinesis streams with datasets ranging from 30 GB to 2.6 TB. The business value of the architecture was demonstrated with applications in churn prediction, service-outage prediction, fraud detection, and more generally – decision support and recommendation systems. In the churn prediction case, we showed that over 98% of churners could be detected, while identifying the individual reason. This allowed support and sales teams to perform targeted retention measures. | en_US
dc.publisher | Elsevier | en_US
dc.relation.ispartof | Applied Soft Computing | en_US
dc.subject | Data streams; ETL; Business analytics; Hadoop; Spark; Churn Prediction | en_US
dc.title | From Big Data to business analytics: The case study of churn prediction | en_US
dc.type | Article | en_US
item.fulltext | With Fulltext | -
item.grantfulltext | open | -
crisitem.author.dept | Faculty of Computer Science and Engineering | -
crisitem.author.dept | Faculty of Computer Science and Engineering | -
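
The abstract's third aggregation scenario mentions features such as “the time since the last occurrence of event X” within a user session. The following is a minimal plain-Python sketch of that idea, not the paper's Hadoop/Spark implementation. The event layout `(session_id, timestamp_seconds, event_name)`, the function name `time_since_last`, and the sample data are assumptions made for illustration only.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical event layout: (session_id, timestamp_seconds, event_name).
Event = Tuple[str, float, str]


def time_since_last(
    events: Iterable[Event], target: str
) -> List[Tuple[str, float, str, Optional[float]]]:
    """Attach to each event the seconds elapsed since the previous
    `target` event in the same session (None if there was none yet).

    Events are processed in timestamp order; the result follows that order.
    """
    last_seen: Dict[str, float] = {}  # session_id -> timestamp of last target event
    rows: List[Tuple[str, float, str, Optional[float]]] = []
    for session_id, ts, name in sorted(events, key=lambda e: e[1]):
        prev = last_seen.get(session_id)
        feature = None if prev is None else ts - prev
        rows.append((session_id, ts, name, feature))
        # Update only after computing the feature, so a target event
        # measures the gap to the *previous* target event, not itself.
        if name == target:
            last_seen[session_id] = ts
    return rows


# Made-up sample log for two sessions.
sample = [
    ("s1", 0.0, "login"),
    ("s1", 10.0, "error"),
    ("s2", 12.0, "login"),
    ("s1", 25.0, "click"),
    ("s1", 40.0, "error"),
    ("s2", 50.0, "error"),
]
features = [row[3] for row in time_since_last(sample, "error")]
```

Because the state is keyed by session, the same logic maps naturally onto a per-session grouping step in a distributed engine; how the paper actually partitions and aggregates the data is not stated in this record.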
Appears in Collections: Faculty of Computer Science and Engineering: Journal Articles
Files in This Item:
File | Description | Size | Format
FromBigDatatobusinessanalytics-Thecasestudyofchurnprediction-accepted.pdf | | 990.05 kB | Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.