Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.12188/22307
DC Field | Value | Language
dc.contributor.author | Zdravevski, Eftim | en_US
dc.contributor.author | Apanowicz, Cas | en_US
dc.contributor.author | Stencel, Krzysztof | en_US
dc.contributor.author | Slezak, Dominik | en_US
dc.date.accessioned | 2022-08-16T08:01:22Z | -
dc.date.available | 2022-08-16T08:01:22Z | -
dc.date.issued | 2019 | -
dc.identifier.uri | http://hdl.handle.net/20.500.12188/22307 | -
dc.description.abstract | Nowadays, companies must inevitably analyze the available data and extract meaningful knowledge. As an essential prerequisite, Extract-Transform-Load (ETL) requires significant effort, especially for Big Data. The existing solutions fail to formalize, integrate and evaluate the ETL process for Big Data in a scalable and cost-effective way. In this paper, we introduce a cloud-based architecture for data fusion and aggregation from a variety of sources. We identify three scenarios that generalize data aggregation during ETL. They are particularly valuable in the context of machine learning, as they facilitate feature engineering even in complex cases when the data from an extended time period has to be processed. In our experiments, we investigate user logs collected with Kinesis streams on Amazon AWS Hadoop clusters and demonstrate the scalability of our solution. The considered datasets range from 30 GB to 2.5 TB. The results were deployed in domains such as churn prediction, fraud detection, service outage prediction, and, more generally, decision support and recommendation systems. | en_US
dc.subject | Data warehouses, Data streams, ETL, Business analytics | en_US
dc.title | Scalable Cloud-based ETL for Self-serving Analytics | en_US
dc.type | Proceedings | en_US
dc.relation.conference | ICDM | en_US
item.grantfulltext | open | -
item.fulltext | With Fulltext | -
crisitem.author.dept | Faculty of Computer Science and Engineering | -
Appears in Collections: Faculty of Computer Science and Engineering: Conference papers
Files in This Item:
File | Description | Size | Format
2019_07_ICDM_Cloud-basedscalableETL.pdf | | 607.65 kB | Adobe PDF

Page view(s): 65 (checked on 9.11.2024)
Download(s): 44 (checked on 9.11.2024)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.