HConfig: Resource Adaptive Fast Bulk Loading in HBase

NoSQL (Not only SQL) data stores have become vital components of many big data computing platforms because of their inherent horizontal scalability. HBase is an open-source distributed NoSQL store that many Internet enterprises use for their big data computing applications (e.g., Facebook handles millions of messages each day with HBase). Optimizations that enhance the performance of HBase are of paramount interest to big data applications that use HBase or Bigtable-like key-value stores. In this paper…


Self-Adjusting Slot Configurations for Homogeneous and Heterogeneous Hadoop Clusters

The MapReduce framework and its open-source implementation, Hadoop, have in recent years become the de facto platform for scalable analysis of large data sets. One of the primary concerns in Hadoop is how to minimize the completion length (i.e., makespan) of a set of MapReduce jobs. Hadoop currently allows only a static slot configuration, i.e., fixed numbers of map slots and reduce slots throughout the lifetime of a cluster. However, we found that such a static configuration may lead to low…
