Row key designs of NoSQL database tables and their impact on write performance
Date Issued
2016-02-17
Author(s)
Abstract
In several NoSQL database systems, HBase among them, each table
has only one index: the row key, which also serves as the
clustered index. Other indexes are not available out of the box.
As a result, row key design is the most important decision when
designing tables, because an inappropriate design can have
detrimental consequences for performance and cost. Different row
key designs suit different problems, and in this paper we analyze
the performance, characteristics and applicability of each of them.
In particular, we investigate the effect of using various techniques
for modeling row keys: sequences, salting, padding, hashing, and
modulo operations. We propose four different designs based on
these techniques and analyze their performance on different
HBase clusters when loading HDFS files of various sizes. The
experiments show that particular designs consistently outperform
others on differently sized clusters, both in execution time and
in the evenness of load distribution across nodes.
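
The techniques named in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation, and the four designs the authors propose are not described in this record; the function names, the bucket count, the key width, and the choice of MD5 are all assumptions made only for illustration.

```python
import hashlib

# Hypothetical bucket count, e.g. one bucket per region server (an assumption).
NUM_BUCKETS = 8


def padded_key(seq: int, width: int = 10) -> str:
    """Padding: zero-pad a sequence number so that the lexicographic
    order of the keys matches the numeric order of the sequence."""
    return str(seq).zfill(width)


def modulo_salted_key(seq: int, buckets: int = NUM_BUCKETS, width: int = 10) -> str:
    """Salting with a modulo operation: prefix the padded sequence with
    seq % buckets so that consecutive writes land in different key ranges."""
    salt = seq % buckets
    return f"{salt:02d}-{padded_key(seq, width)}"


def hashed_key(seq: int) -> str:
    """Hashing: use a digest of the sequence as the key. Writes are spread
    across the key space, but the original ordering is lost."""
    return hashlib.md5(str(seq).encode("utf-8")).hexdigest()


def hash_prefixed_key(seq: int, prefix_len: int = 4, width: int = 10) -> str:
    """Hash prefix plus padded sequence: spreads writes while keeping the
    original sequence number readable in the key."""
    return f"{hashed_key(seq)[:prefix_len]}-{padded_key(seq, width)}"


if __name__ == "__main__":
    for seq in (1, 2, 9, 10, 11):
        print(padded_key(seq), modulo_salted_key(seq), hash_prefixed_key(seq))
```

A plain padded sequence sends every new row to the end of the key space, so a single node takes all of the writes; the salted and hashed variants trade ordered scans for a more even write distribution, which is the trade-off the abstract refers to.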
Subjects
File(s)
Name
2015_HBase_Rowkeys_PDP_2016_EftimZdravevski.pdf
Size
1.32 MB
Format
Adobe PDF
Checksum
(MD5):966090a9821bc3893b9caef4b3b6f064
