Pierre Zemb

Engineering Manager @ Clever Cloud
Distributed and Database systems

Notes about FoundationDB

Pierre Zemb published on 2020-01-30 included in Foundationdb Notesabout

Notes About is a blog post series where you will find lots of links, videos, quotes, and podcasts to click on about a specific topic. Today we will discover FoundationDB.

Overview of FoundationDB

As stated in the official documentation: FoundationDB is a distributed database designed to handle large volumes of structured data across clusters of commodity servers. It organizes data as an ordered key-value store and employs ACID transactions for all operations.

Diving into Kafka's Protocol

Pierre Zemb published on 2019-12-08 included in Kafka Diving Into

Diving Into is a blog post series where we dig into a specific part of a project's codebase. In this episode, we will dig into Kafka's protocol.

The protocol reference

For the last few months, I have worked a lot with Kafka's protocol, first by creating a fully async Kafka-to-Pulsar proxy in Rust, and now by contributing directly to KoP (Kafka On Pulsar). The full Kafka protocol documentation is available here, but it does not offer a global view of what happens during a classic Producer and Consumer exchange.

Diving into HBase's MemStore

Pierre Zemb published on 2019-11-17 included in Hbase Diving Into

Diving Into is a blog post series where we dig into a specific part of a project's codebase. In this episode, we will dig into the implementation behind HBase's MemStore. tl;dr: HBase uses ConcurrentSkipListMap.

What is the MemStore?

The memtable from the official Bigtable paper is the equivalent of the MemStore in HBase. As rows are sorted lexicographically in HBase, when data comes in, you need some kind of in-memory buffer to order those keys.
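To make the idea of an in-memory buffer that keeps keys in lexicographic order more concrete, here is a minimal Python sketch. It is not HBase code: HBase relies on Java's ConcurrentSkipListMap, whereas this stand-in uses a plain sorted list and the standard `bisect` module, and it is not thread-safe. The `SortedBuffer` class and its `put` and `scan` methods are hypothetical names chosen for illustration.

```python
import bisect


class SortedBuffer:
    """Toy in-memory write buffer that keeps row keys in lexicographic order.

    Hypothetical stand-in for a skip-list-backed buffer; not HBase's API.
    """

    def __init__(self):
        self._keys = []    # row keys (bytes), always kept sorted
        self._values = {}  # row key -> latest value written

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._values:
            # Insert at the sorted position. This is O(n) for a list;
            # a skip list would do the same job in O(log n).
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start: bytes, stop: bytes):
        """Yield (key, value) pairs with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]


buffer = SortedBuffer()
# Writes arrive in arbitrary order...
for row in (b"row-3", b"row-1", b"row-2"):
    buffer.put(row, b"value-for-" + row)

# ...but a range scan still returns keys in sorted order.
ordered = [key for key, _ in buffer.scan(b"row-1", b"row-3")]
```

Because the keys are always sorted, a range scan only needs two binary searches to find its bounds, which is the property a sorted buffer is meant to provide.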

What can be gleaned about GFS successor codenamed Colossus?

Pierre Zemb published on 2019-08-04 included in Distributed-Systems Hadoop Colossus

In the last few months, there have been numerous blog posts about the end of the Hadoop era. It is true that:

- The health of Hadoop-based companies is publicly bad
- Hadoop has bad publicity, with headlines like ‘What does the death of Hadoop mean for big data?’

Hadoop, as a distributed system, is hard to operate, but can be essential for some types of workload. As Hadoop is based on GFS, we can wonder how GFS evolved inside Google.

Playing with TTL in HBase

Pierre Zemb published on 2019-05-27 included in Distributed-Systems Hbase

Among all the features provided by HBase, there is one that is pretty handy for dealing with your data's lifecycle: every cell version can have a Time to Live (TTL). Let's dive into the feature!

Time To Live (TTL)

Let's read the doc first!

ColumnFamilies can set a TTL length in seconds, and HBase will automatically delete rows once the expiration time is reached.

HBase Book: Time To Live (TTL)
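As a rough illustration of the expiry rule in the quote above, a cell can be treated as expired once its age reaches the column family's TTL. The following is a plain Python sketch, not the HBase API: `Cell`, `is_expired`, and `live_cells` are hypothetical names, and both the millisecond timestamps and the choice to expire a cell exactly at the TTL boundary are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    value: str
    timestamp_ms: int  # write time in milliseconds (assumed unit)


def is_expired(cell: Cell, ttl_seconds: int, now_ms: int) -> bool:
    """Return True once the cell's age reaches the TTL.

    Treating age == TTL as expired is an assumption of this sketch.
    """
    return now_ms - cell.timestamp_ms >= ttl_seconds * 1000


def live_cells(cells, ttl_seconds: int, now_ms: int):
    """Keep only the cells that have not expired yet."""
    return [c for c in cells if not is_expired(c, ttl_seconds, now_ms)]


cells = [
    Cell("row-1", "old", timestamp_ms=0),
    Cell("row-2", "fresh", timestamp_ms=90_000),
]

# With a 60-second TTL and the clock at 100 s, only the cell written
# at 90 s is still young enough to be kept.
remaining = live_cells(cells, ttl_seconds=60, now_ms=100_000)
```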

Handling OVH's alerts with Apache Flink

Pierre Zemb published on 2019-02-03 included in Distributed-Systems Flink

This is a repost from OVH's official blog post. Thanks to Horacio Gonzalez for the awesome drawings!

Handling OVH's alerts with Apache Flink

OVH relies extensively on metrics to effectively monitor its entire stack. Whether they are low-level or business-centric, they allow teams to gain insight into how our services are operating on a daily basis. The need to store millions of datapoints per second led to the creation of a dedicated team to build and operate a product that handles that load: Metrics Data Platform.