Pierre Zemb

Engineering Manager @ Clever Cloud
Distributed and Database systems

Diving into HBase's MemStore

Diving Into is a blog post series where we dig into a specific part of a project’s codebase. In this episode, we will dig into the implementation behind HBase’s MemStore. tl;dr: HBase uses the ConcurrentSkipListMap. What is the MemStore? The MemStore in HBase is the equivalent of the memtable from the official Bigtable paper. As rows are sorted lexicographically in HBase, when data comes in, you need some kind of in-memory buffer to keep those keys ordered.
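To make the idea concrete, here is a minimal sketch of an in-memory buffer that keeps keys ordered as they arrive, built on `java.util.concurrent.ConcurrentSkipListMap`. The class `SortedWriteBuffer` and its methods are hypothetical names chosen for illustration; this is not HBase's actual MemStore code. It also assumes `String` keys, whose ordering is close to, but not the same as, the byte-wise ordering HBase applies to row keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical toy buffer for illustration; not an HBase class.
class SortedWriteBuffer {
    // The skip list keeps entries sorted by key and allows concurrent writers.
    private final ConcurrentSkipListMap<String, String> cells = new ConcurrentSkipListMap<>();

    // Insert or overwrite a value; the map places the key in sorted position.
    void put(String rowKey, String value) {
        cells.put(rowKey, value);
    }

    // Keys in ascending order, the order a flush to disk would read them in.
    List<String> sortedKeys() {
        return new ArrayList<>(cells.keySet());
    }

    public static void main(String[] args) {
        SortedWriteBuffer buffer = new SortedWriteBuffer();
        buffer.put("row-c", "3");
        buffer.put("row-a", "1");
        buffer.put("row-b", "2");
        System.out.println(buffer.sortedKeys()); // prints [row-a, row-b, row-c]
    }
}
```

Writes arrive in arbitrary order, yet iteration is always sorted, which is the property an ordered in-memory buffer needs before its content is written out sequentially.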

What can be gleaned about GFS's successor, codenamed Colossus?

In the last few months, there have been numerous blog posts about the end of the Hadoop era. It is true that the health of Hadoop-based companies is publicly bad, and that Hadoop has bad publicity, with headlines like ‘What does the death of Hadoop mean for big data?’ Hadoop, as a distributed system, is hard to operate, but can be essential for some types of workload. As Hadoop is based on GFS, we can wonder how GFS evolved inside Google.

Playing with TTL in HBase

Among all the features provided by HBase, there is one that is pretty handy for dealing with your data’s lifecycle: the fact that every cell version can have a Time To Live, or TTL. Let’s dive into the feature! Time To Live (TTL) Let’s read the doc first! “ColumnFamilies can set a TTL length in seconds, and HBase will automatically delete rows once the expiration time is reached.” (HBase Book: Time To Live (TTL))
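The expiration rule quoted above can be sketched as a simple comparison between a cell's timestamp and the current time. The class `TtlCell` and the method `isExpired` are hypothetical names for illustration, not part of the HBase API, and the exact boundary behaviour (strictly older than the TTL) is an assumption of this sketch.

```java
// Hypothetical model of a single cell version; not an HBase class.
class TtlCell {
    final long timestampMillis; // write time of this cell version

    TtlCell(long timestampMillis) {
        this.timestampMillis = timestampMillis;
    }

    // The TTL is given in seconds, as in the quoted documentation.
    // Assumption: a cell expires once its age is strictly greater than the TTL.
    boolean isExpired(long ttlSeconds, long nowMillis) {
        return nowMillis - timestampMillis > ttlSeconds * 1000L;
    }

    public static void main(String[] args) {
        TtlCell cell = new TtlCell(0L);
        System.out.println(cell.isExpired(60, 30_000L)); // prints false
        System.out.println(cell.isExpired(60, 61_000L)); // prints true
    }
}
```

Passing the current time as a parameter instead of reading the system clock keeps the check deterministic and easy to test.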

Handling OVH's alerts with Apache Flink

This is a repost from OVH’s official blog post. Thanks Horacio Gonzalez for the awesome drawings! Handling OVH’s alerts with Apache Flink OVH relies extensively on metrics to effectively monitor its entire stack. Whether they are low-level or business-centric, they allow teams to gain insight into how our services are operating on a daily basis. The need to store millions of datapoints per second led to the creation of a dedicated team to build and operate a product to handle that load: Metrics Data Platform.

What are ACID transactions?

Transaction? "Programming should be about transforming data" (Programming Elixir 1.3 by Dave Thomas). As developers, we often interact with data, whether it comes from an API or a messaging consumer. To store it, we started to create software called relational database management systems, or RDBMS. Thanks to them, we, as developers, can develop applications pretty easily, without the need to implement our own storage solution. Interacting with MySQL or PostgreSQL has now become a commodity.

HBase Data Model

HBase? Apache HBase™ is a type of “NoSQL” database. “NoSQL” is a general term meaning that the database isn’t an RDBMS which supports SQL as its primary access language. Technically speaking, HBase is really more a “Data Store” than “Data Base” because it lacks many of the features you find in an RDBMS, such as typed columns, secondary indexes, triggers, and advanced query languages, etc. (HBase architecture overview) HBase data model. The data model is simple: it’s like a multi-dimensional map:
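One way to picture this multi-dimensional map is as nested sorted maps: row key, then column family, then column qualifier, then timestamp, then value. The sketch below is a toy model under that assumption; `ToyTable` and its methods are hypothetical names, and `String` keys stand in for the byte arrays HBase actually stores.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical toy model of the data model; not an HBase class.
class ToyTable {
    // row key -> column family -> qualifier -> timestamp (newest first) -> value
    private final TreeMap<String, TreeMap<String, TreeMap<String, TreeMap<Long, String>>>> rows =
            new TreeMap<>();

    void put(String row, String family, String qualifier, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            // Reverse ordering keeps the most recent version first.
            .computeIfAbsent(qualifier, q -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // Returns the most recent version of a cell, or null if the cell is absent.
    String get(String row, String family, String qualifier) {
        TreeMap<String, TreeMap<String, TreeMap<Long, String>>> families = rows.get(row);
        if (families == null) return null;
        TreeMap<String, TreeMap<Long, String>> qualifiers = families.get(family);
        if (qualifiers == null) return null;
        TreeMap<Long, String> versions = qualifiers.get(qualifier);
        if (versions == null || versions.isEmpty()) return null;
        return versions.firstEntry().getValue();
    }

    public static void main(String[] args) {
        ToyTable table = new ToyTable();
        table.put("row1", "cf", "name", 1L, "old");
        table.put("row1", "cf", "name", 2L, "new");
        System.out.println(table.get("row1", "cf", "name")); // prints new
    }
}
```

Each level is sorted, so a lookup walks down one map per dimension, and the descending timestamp order means the first entry of the innermost map is the latest version.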