Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.12188/17065
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Zdravevski, Eftim | en_US |
| dc.contributor.author | Lameski, Petre | en_US |
| dc.contributor.author | Kulakov, Andrea | en_US |
| dc.contributor.author | Jakimovski, Boro | en_US |
| dc.contributor.author | Filiposka, Sonja | en_US |
| dc.contributor.author | Trajanov, Dimitar | en_US |
| dc.date.accessioned | 2022-03-25T12:09:54Z | - |
| dc.date.available | 2022-03-25T12:09:54Z | - |
| dc.date.issued | 2015-08 | - |
| dc.identifier.uri | http://hdl.handle.net/20.500.12188/17065 | - |
| dc.description.abstract | In classification problems, a large number of features can pose significant challenges in many respects, particularly in the context of Big Data. To address this issue, we propose a distributed and parallel computation of information gain based on MapReduce. The proposed implementation on Hadoop can be used for ranking the features of large datasets and, furthermore, for feature selection. Data parallelism is achieved by distributing the data uniformly across HBase tables with appropriate row keys. Performance is evaluated by estimating the speedup of multi-node clusters relative to a one-node cluster. The framework was deployed on an on-premises Hadoop cluster. The results show that parallelizing and distributing the computations on a cluster yields significant speedup. The main contribution of this paper is that we have demonstrated how the higher-level scripting language Pig Latin can be used to write MapReduce jobs instead of directly writing separate map and reduce functions. Additionally, we have proposed using manually pre-split HBase tables instead of HDFS files for data fragmentation, in order to set the degree of parallelism at a higher level. | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | IEEE | en_US |
| dc.title | Feature Ranking Based on Information Gain for Large Classification Problems with MapReduce | en_US |
| dc.type | Article | en_US |
| dc.relation.conference | 2015 IEEE Trustcom/BigDataSE/ISPA | en_US |
| dc.identifier.doi | 10.1109/trustcom.2015.580 | - |
| dc.identifier.url | http://xplorestaging.ieee.org/ielx7/7293439/7345453/07345493.pdf?arnumber=7345493 | - |
| item.grantfulltext | open | - |
| item.fulltext | With Fulltext | - |
| crisitem.author.dept | Faculty of Computer Science and Engineering | - |
| crisitem.author.dept | Faculty of Computer Science and Engineering | - |
| crisitem.author.dept | Faculty of Computer Science and Engineering | - |
| crisitem.author.dept | Faculty of Computer Science and Engineering | - |
| crisitem.author.dept | Faculty of Computer Science and Engineering | - |
| crisitem.author.dept | Faculty of Computer Science and Engineering | - |
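The abstract describes computing information gain, IG(C; F) = H(C) - Σ_v P(F = v) · H(C | F = v), as MapReduce jobs (written in Pig Latin in the paper) over data spread across pre-split HBase tables. The Python sketch below mimics those steps in a single process so the data flow is easy to follow: a map step emits one count per (feature, value, class) cell, a reduce step sums the counts, and a final step turns each feature's contingency table into an information gain score used for ranking. It assumes discrete (already discretized) feature values and no missing values. All function names, the record layout, and the salted row-key scheme are hypothetical illustrations, not the authors' Pig Latin scripts or HBase schema.

```python
import hashlib
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, Sequence, Tuple

# Hypothetical record layout: (discrete feature values, class label).
Record = Tuple[Sequence[str], str]
Key = Tuple[int, str, str]  # (feature index, feature value, class label)


def map_phase(records: Iterable[Record]) -> Iterator[Tuple[Key, int]]:
    """Map step: emit ((feature, value, class), 1) for every feature cell."""
    for values, label in records:
        for j, v in enumerate(values):
            yield (j, v, label), 1


def reduce_phase(pairs: Iterable[Tuple[Key, int]]) -> Counter:
    """Reduce step: sum the emitted counts per (feature, value, class) key."""
    counts: Counter = Counter()
    for key, c in pairs:
        counts[key] += c
    return counts


def entropy(class_counts: Iterable[int]) -> float:
    """Shannon entropy in bits of a class distribution given as raw counts."""
    counts = [c for c in class_counts if c > 0]
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts)


def information_gain(counts: Counter) -> Dict[int, float]:
    """Compute IG(C; F) for every feature from the reduced counts."""
    # feature -> value -> Counter(class -> count), i.e. one contingency table per feature
    tables: Dict[int, Dict[str, Counter]] = defaultdict(lambda: defaultdict(Counter))
    for (j, v, label), c in counts.items():
        tables[j][v][label] += c

    gains: Dict[int, float] = {}
    for j, table in tables.items():
        # Class totals for this feature; equal to the global class counts
        # because every record contributes exactly one value per feature.
        class_totals: Counter = Counter()
        for by_class in table.values():
            class_totals.update(by_class)
        n = sum(class_totals.values())
        h_c = entropy(class_totals.values())
        # Conditional entropy H(C | F) = sum_v P(F = v) * H(C | F = v)
        h_c_given_f = sum(
            (sum(by_class.values()) / n) * entropy(by_class.values())
            for by_class in table.values()
        )
        gains[j] = h_c - h_c_given_f
    return gains


def salted_row_key(row_id: str, num_regions: int) -> str:
    """Hypothetical row key: a hash-derived bucket prefix spreads rows evenly
    over regions pre-split at the boundaries "001", "002", ... (md5 is used
    only for uniform spreading, not for security)."""
    bucket = int(hashlib.md5(row_id.encode("utf-8")).hexdigest(), 16) % num_regions
    return f"{bucket:03d}-{row_id}"


if __name__ == "__main__":
    records = [
        (["sunny", "hot"], "no"),
        (["sunny", "mild"], "no"),
        (["rainy", "hot"], "yes"),
        (["rainy", "mild"], "yes"),
    ]
    gains = information_gain(reduce_phase(map_phase(records)))
    ranking = sorted(gains, key=gains.get, reverse=True)
    print("IG per feature:", gains)
    print("Ranking (best first):", ranking)
    print("Example row key:", salted_row_key("row-42", 16))
```

In this toy dataset, feature 0 determines the class exactly, so its gain equals H(C) = 1 bit, while feature 1 carries no information about the class and scores 0. In a real cluster the map and reduce steps would run on separate nodes, and only the small per-feature contingency tables need to be gathered to compute the final scores.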
Appears in Collections: Faculty of Computer Science and Engineering: Conference papers
Files in This Item:

| File | Description | Size | Format |
|---|---|---|---|
| 2015_InfoGain_EftimZdravevski.pdf | | 650.44 kB | Adobe PDF |

Page view(s): 50 (checked on Apr 25, 2024)
Download(s): 31 (checked on Apr 25, 2024)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.