
Introducing Kafka

Posted on: June 9, 2016

What is Kafka

Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system.

That is a lot of keywords, but we will get into each of them later.

Basic terminology

Before we get into the details of Kafka, let's get to know the common terminology.

Figure 1: Basic terminologies

Kafka Topic

Kafka scales a topic by splitting it into multiple partitions. Within a partition, each message has an incremental sequence number called an offset. Messages in a partition are ordered in the way they were pushed to Kafka.

Figure 2: Topics are distributed in partitions
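
To make partitions and offsets concrete, here is a minimal in-memory sketch. It is a toy model, not the Kafka client API: the class name ToyTopic, the "orders" topic, and the CRC32-based choice of partition are all assumptions made for illustration.

```python
import zlib


class ToyTopic:
    """In-memory stand-in for a topic (hypothetical, not the Kafka API)."""

    def __init__(self, name, num_partitions):
        self.name = name
        # One append-only list per partition.
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Assumed partitioner: stable hash of the key modulo partition count.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        """Append a message and return (partition, offset)."""
        partition = self.partition_for(key)
        log = self.partitions[partition]
        offset = len(log)  # offsets grow by one within a partition
        log.append((offset, key, value))
        return partition, offset

    def read(self, partition, from_offset=0):
        """Return messages of one partition in the order they were appended."""
        return self.partitions[partition][from_offset:]


topic = ToyTopic("orders", num_partitions=3)
first = topic.append("user-1", "created")
second = topic.append("user-1", "paid")
```

In this sketch, both messages for "user-1" land in the same partition, and the second one gets the next offset. Ordering holds only inside a single partition, not across the whole topic.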

Each partition can then be replicated across different nodes for high availability (HA) and fault tolerance. Each partition has a single leader node, and different partitions can have different leaders. Only the leader can write to its partition. The leader writes each message to a commit log before it is replicated to the other nodes, which is why messages in Kafka are persistent. Because the cluster has multiple leaders for different partitions, several commit logs are being written at the same time. That is why it is called a distributed commit log.

Figure 3: Partitions are replicated in the cluster
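
The leader-and-replica idea can be sketched in a few lines. This is again a toy model under stated assumptions: ToyReplicatedPartition and the broker names are hypothetical, and the leader copies entries to the followers directly for brevity, whereas real Kafka followers fetch from the leader.

```python
class ToyReplicatedPartition:
    """One partition replicated on several brokers (hypothetical model)."""

    def __init__(self, partition_id, replica_brokers, leader):
        if leader not in replica_brokers:
            raise ValueError("leader must be one of the replicas")
        self.partition_id = partition_id
        self.leader = leader
        # broker id -> that broker's copy of the commit log
        self.logs = {broker: [] for broker in replica_brokers}

    def write(self, broker, message):
        """Accept a write only on the leader, then replicate it."""
        if broker != self.leader:
            raise PermissionError(
                f"{broker} is not the leader of partition {self.partition_id}"
            )
        leader_log = self.logs[self.leader]
        offset = len(leader_log)
        leader_log.append(message)  # the leader's commit log is written first
        for follower, log in self.logs.items():
            if follower != self.leader:
                # Copy whatever the follower is still missing.
                log.extend(leader_log[len(log):])
        return offset


brokers = ["broker-1", "broker-2", "broker-3"]
# Different partitions have different leaders, so several logs
# are written at the same time across the cluster.
p0 = ToyReplicatedPartition(0, brokers, leader="broker-1")
p1 = ToyReplicatedPartition(1, brokers, leader="broker-2")
p0.write("broker-1", "a")
p1.write("broker-2", "b")
```

Calling p0.write("broker-2", "x") raises PermissionError in this sketch, because broker-2 holds only a replica of partition 0.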

Consumers and consumer groups

Within a consumer group, each partition is consumed by exactly one consumer. Consumers in different consumer groups, however, can consume the same partition.

Figure 4: Consumer groups
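
A small sketch of how partitions might be divided among consumers. The round-robin strategy, the function name assign_partitions, and the group and consumer names are assumptions for illustration, not a description of Kafka's own assignment logic.

```python
def assign_partitions(partitions, consumers):
    """Round-robin: every partition goes to exactly one consumer of the group."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2]
# Two hypothetical consumer groups reading the same topic.
groups = {
    "billing": ["billing-1", "billing-2"],
    "analytics": ["analytics-1"],
}

# Each group is assigned independently, so every group sees every partition.
assignments = {
    group: assign_partitions(partitions, members)
    for group, members in groups.items()
}
```

Here the "billing" group splits the three partitions between its two consumers, while the single consumer in "analytics" reads all three. Partition 0 is therefore consumed once per group, but never by two consumers of the same group.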

Handling failure

Figure 5: Handling failure

Reference