Simplifying parallel implementation of algorithms on Hadoop with Pig Latin
Date Issued
2015
Author(s)
Abstract
In this paper we present a general technique for
parallelizing regular algorithms with the tools the Hadoop ecosystem offers: MapReduce, HDFS, HBase and Pig. This framework
can be applied to parallelize algorithms for feature selection,
clustering, machine learning, and similar tasks. It consists of several steps: load
the datasets into HDFS, apply transformations if they are
needed, store the datasets in HBase, and implement the algorithm
in Pig with the help of User Defined Functions.
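The last step of the abstract (grouping records and applying a User Defined Function to each group) can be illustrated with a minimal local sketch in plain Python. It does not use Pig, HDFS, or HBase and is not taken from the paper: the names `feature_variance`, `group_by_feature`, and `score_features` are hypothetical, and variance is an assumed stand-in for whatever per-feature score a real feature selection algorithm would compute.

```python
from collections import defaultdict
from statistics import pvariance


def feature_variance(bag):
    """Hypothetical UDF body: score one feature from its bag of (feature, value) tuples."""
    values = [value for (_, value) in bag]
    # Variance is only an illustrative score, not the paper's method.
    return pvariance(values) if len(values) > 1 else 0.0


def group_by_feature(rows):
    """Local stand-in for a group-by step over (feature, value) rows."""
    groups = defaultdict(list)
    for feature, value in rows:
        groups[feature].append((feature, value))
    return groups


def score_features(rows):
    """Apply the UDF-like function to every group independently."""
    return {f: feature_variance(bag) for f, bag in group_by_feature(rows).items()}


# Toy rows standing in for a dataset that would be loaded from storage.
rows = [("f1", 1.0), ("f1", 3.0), ("f2", 2.0), ("f2", 2.0)]
scores = score_features(rows)
```

Because each group is scored independently of the others, this is the kind of per-group work that a framework such as the one described could distribute across nodes.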
Subjects
File(s)
Name
SimplifyingMapReducedevelopmentonHadoopandHBasewithPigLatin-EftimZdravevski.pdf
Size
312.49 KB
Format
Adobe PDF
Checksum
(MD5):afdc559704abce1ba429cff236d0b649
