Diving Into is a blog post series where we dig into a specific part of a project's codebase. In this episode, we will dig into the implementation behind Hbase's MemStore.
tl;dr:
Hbase is using the ConcurrentSkipListMap.
🔗What is the MemStore?
The memtable from the official BigTable paper is the equivalent of the MemStore in Hbase.
As rows are sorted lexicographically in Hbase, you need some kind of in-memory buffer to order keys as data comes in. This is where the MemStore comes in. It absorbs the most recent write (or put, in Hbase semantics) operations. Everything else lives in immutable files called HFiles, stored in HDFS. There is one MemStore per column family.
Let's dig into how the MemStore works internally in Hbase 1.X.
🔗Hbase 1
All code extracts in this section are taken from the rel/1.4.9 tag.
🔗in-memory storage
The MemStore interface gives us some insight into how it works internally.
/**
* Write an update
* @param cell
* @return approximate size of the passed cell.
*/
long add(final Cell cell);
-- add function on the MemStore
The implementation lives in DefaultMemStore. add is wrapped by several functions, but in the end, we arrive here:
private boolean addToCellSet(Cell e) {
  boolean b = this.activeSection.getCellSkipListSet().add(e);
  // ... (rest of the method omitted in this extract)
}
-- addToCellSet on the DefaultMemStore
The CellSkipListSet class is built on top of ConcurrentSkipListMap, which provides some nice features:
- concurrency
- sorted elements
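To make those two properties concrete, here is a minimal sketch that uses ConcurrentSkipListMap directly, outside of Hbase. The class name SkipListDemo, the helper insertConcurrently and the row keys are made up for illustration; only the JDK map itself is real.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

// Illustration only: SkipListDemo is a hypothetical class, not Hbase code.
class SkipListDemo {

    // Inserts the row keys from several threads, then returns them in map order.
    static List<String> insertConcurrently(List<String> rowKeys, int threads) {
        ConcurrentSkipListMap<String, String> map = new ConcurrentSkipListMap<>();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int offset = t;
            Thread worker = new Thread(() -> {
                // Each thread inserts every `threads`-th key, with no external locking.
                for (int i = offset; i < rowKeys.size(); i += threads) {
                    map.put(rowKeys.get(i), "value-" + rowKeys.get(i));
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for writers", e);
            }
        }
        // Iterating the key set always follows the sorted (here: lexicographic) order.
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("row-c", "row-a", "row-d", "row-b");
        System.out.println(insertConcurrently(keys, 2)); // [row-a, row-b, row-c, row-d]
    }
}
```

Whatever the insertion order or the thread doing the insert, reading the map back yields the keys sorted, which is the behaviour a buffer of lexicographically ordered rows needs.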
🔗Flush on HDFS
As we have seen above, the MemStore absorbs all the puts. When asked to flush, the current memstore is moved to a snapshot and is cleared. Flushed files are called HFiles, and they are similar to the SSTables introduced by the official BigTable paper. HFiles are flushed to HDFS, the Hadoop Distributed File System.
If you want deeper insight into SSTables, I recommend reading Table Format from the awesome RocksDB wiki.
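The "move to snapshot, then clear" step described above can be sketched with a toy model. This is not the Hbase implementation: ToyMemStore and its methods are hypothetical names, a coarse lock replaces whatever synchronization the real code uses, and a plain list of strings stands in for the HFile.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

// Toy model, not Hbase code: every name in this class is hypothetical.
class ToyMemStore {
    private ConcurrentSkipListMap<String, String> active = new ConcurrentSkipListMap<>();
    private ConcurrentSkipListMap<String, String> snapshot = new ConcurrentSkipListMap<>();

    // Writes always go to the active buffer.
    synchronized void put(String rowKey, String value) {
        active.put(rowKey, value);
    }

    // Step 1 of a flush: freeze the current buffer and start a fresh, empty one.
    synchronized void snapshot() {
        snapshot = active;
        active = new ConcurrentSkipListMap<>();
    }

    // Step 2: emit the frozen entries in sorted order, then drop the snapshot.
    // The returned list stands in for the content of a flushed file.
    synchronized List<String> flushSnapshot() {
        List<String> fileLines = new ArrayList<>();
        snapshot.forEach((key, value) -> fileLines.add(key + "=" + value));
        snapshot = new ConcurrentSkipListMap<>();
        return fileLines;
    }

    synchronized int activeSize() {
        return active.size();
    }

    public static void main(String[] args) {
        ToyMemStore store = new ToyMemStore();
        store.put("row-b", "2");
        store.put("row-a", "1");
        store.snapshot();          // freeze what we have so far
        store.put("row-c", "3");   // lands in the fresh active buffer
        System.out.println(store.flushSnapshot()); // [row-a=1, row-b=2]
        System.out.println(store.activeSize());    // 1
    }
}
```

The point of the swap is that writes can keep landing in a fresh buffer while the frozen one is written out, and the flushed output is already sorted because the buffer was.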
🔗Compaction
Compactions only run on HFiles. This means that if hot data is continuously updated, we overuse memory because of duplicate entries per row in each MemStore. Accordion aims to solve this problem through in-memory compactions. Let's have a look at Hbase 2.X!
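To see why repeated updates pile up, here is a small sketch under simplifying assumptions: each entry is keyed by a row plus a write timestamp, and "compacting" means keeping only the newest entry per row. VersionedBuffer, Key and compact are hypothetical names, and the real rules deciding which versions survive in Hbase are not modelled here.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Toy model, not Hbase code: every name in this class is hypothetical.
class VersionedBuffer {

    // Hypothetical key: a row plus the timestamp of the write.
    static final class Key {
        final String row;
        final long timestamp;

        Key(String row, long timestamp) {
            this.row = row;
            this.timestamp = timestamp;
        }
    }

    // Rows ascending, then newest timestamp first within a row.
    static final Comparator<Key> ORDER = (a, b) -> {
        int byRow = a.row.compareTo(b.row);
        return byRow != 0 ? byRow : Long.compare(b.timestamp, a.timestamp);
    };

    final ConcurrentSkipListMap<Key, String> cells = new ConcurrentSkipListMap<>(ORDER);

    // Every update to the same row adds a new entry instead of replacing the old one.
    void put(String row, long timestamp, String value) {
        cells.put(new Key(row, timestamp), value);
    }

    // Simplified in-memory compaction: keep only the newest entry of each row.
    ConcurrentSkipListMap<Key, String> compact() {
        ConcurrentSkipListMap<Key, String> compacted = new ConcurrentSkipListMap<>(ORDER);
        String previousRow = null;
        for (Map.Entry<Key, String> entry : cells.entrySet()) {
            // Entries of a row are adjacent and newest first, so the first one wins.
            if (!entry.getKey().row.equals(previousRow)) {
                compacted.put(entry.getKey(), entry.getValue());
                previousRow = entry.getKey().row;
            }
        }
        return compacted;
    }

    public static void main(String[] args) {
        VersionedBuffer buffer = new VersionedBuffer();
        buffer.put("row-a", 1L, "v1");
        buffer.put("row-a", 2L, "v2");
        buffer.put("row-a", 3L, "v3");
        buffer.put("row-b", 1L, "x");
        System.out.println(buffer.cells.size());     // 4 entries held in memory
        System.out.println(buffer.compact().size()); // 2 entries after compaction
    }
}
```

Three updates to the same hot row cost three entries until something removes the older ones. Without in-memory compaction, that only happens after the data has been flushed and the HFiles are compacted.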
🔗Hbase 2
🔗storing data
All code extracts from here on are taken from the rel/2.1.2 tag.
Did the MemStore interface change?
/**
* Write an update
* @param cell
* @param memstoreSizing The delta in memstore size will be passed back via this.
* This will include both data size and heap overhead delta.
*/
void add(final Cell cell, MemStoreSizing memstoreSizing);
-- add function in MemStore interface
The signature changed a bit: instead of returning a long, add now takes an object through which the size delta is passed back. Moving on.
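One plausible reading of that change, sketched below, is that a single returned long can only carry one number, whereas an accumulator object can carry both the data size and the heap overhead mentioned in the Javadoc. The names SizingDemo, SizeAccumulator and ASSUMED_ENTRY_OVERHEAD are hypothetical stand-ins and do not mirror the actual MemStoreSizing API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are not the Hbase classes.
class SizingDemo {

    // Accumulator handed to add(); it can track more than one number.
    static final class SizeAccumulator {
        long dataSize;
        long heapOverhead;

        void increment(long dataDelta, long heapDelta) {
            dataSize += dataDelta;
            heapOverhead += heapDelta;
        }
    }

    // Made-up per-entry overhead, only here to have a second number to report.
    static final long ASSUMED_ENTRY_OVERHEAD = 64;

    // 1.x style: the caller gets back one approximate size.
    static long addReturningSize(List<byte[]> store, byte[] cell) {
        store.add(cell);
        return cell.length + ASSUMED_ENTRY_OVERHEAD;
    }

    // 2.x style: the deltas are reported through the object passed in.
    static void addWithSizing(List<byte[]> store, byte[] cell, SizeAccumulator sizing) {
        store.add(cell);
        sizing.increment(cell.length, ASSUMED_ENTRY_OVERHEAD);
    }

    public static void main(String[] args) {
        List<byte[]> store = new ArrayList<>();
        SizeAccumulator sizing = new SizeAccumulator();
        addWithSizing(store, new byte[3], sizing);
        addWithSizing(store, new byte[5], sizing);
        System.out.println(sizing.dataSize);     // 8
        System.out.println(sizing.heapOverhead); // 128
    }
}
```

With the accumulator, a caller that performs many adds in a row keeps one running total per kind of size instead of summing return values itself.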
The new structure implementing MemStore is called AbstractMemStore. Again, there are a few layers: AbstractMemStore writes to a MutableSegment, which itself builds on Segment. If you dig far enough, you will find that the data is stored in the CellSet class, which is also built on top of ConcurrentSkipListMap!
🔗in-memory Compactions
Hbase 2.0 introduces a big change to the original memstore called Accordion, which is the codename for in-memory compactions. An awesome blog post is available here: Accordion: HBase Breathes with In-Memory Compaction, and the design document is also available.
Thank you for reading my post! Feel free to react to this article. I'm also available on Twitter if needed.