KRaft overview
Overview
Kafka Raft (KRaft) is a protocol based on the Raft consensus algorithm that manages Kafka cluster metadata. KRaft organizes the storage of information about the cluster configuration and keeps Kafka brokers synchronized during data replication without relying on the leader election and metadata storage mechanisms provided by ZooKeeper.
NOTE
The ability to enable KRaft mode has been available since Kafka version 3.6.2.
KRaft benefits
The main characteristic of KRaft is the absence of an intermediary service, which entails the following advantages:
-
Fast and efficient metadata storage and distribution — updates are sent directly to the controller quorum, eliminating an additional transmission node (ZooKeeper).
-
Brokers and controllers can store metadata locally in their cache and access it without using a connection to an external service.
-
Reduced delays when a broker stops.
-
It is expected that the KRaft protocol will significantly increase the limit on the number of partitions both per node and per cluster.
Controller quorum
The main concept of KRaft is based on the creation of a controller quorum — a set of specialized brokers that are responsible for storing and timely updating cluster metadata.
To assign the controller role in Kafka, the process.roles parameter is set to the controller value.
The controller quorum serves only the internal __cluster_metadata topic with Kafka metadata. The topic contains a single partition that records all information about the current state of Kafka brokers, topics, and partitions.
The leader of this partition, called the active controller, is elected based on the Raft consensus algorithm by voting for the leader epoch. The active controller is responsible for writing metadata changes to the __cluster_metadata topic.
Controllers that are not the leader store an exact copy of the metadata topic as in-sync replicas (ISR) and have the right to vote for a new leader epoch. As a result of an election, any of these voters can become the leader.
Brokers whose process.roles parameter is set to broker cannot vote for the leader epoch; they are observers. Observers work like regular Kafka brokers but also store a copy of the __cluster_metadata topic, rewriting the cluster metadata state into their own cache whenever the topic changes.
The active controller also monitors the state of all controllers and brokers using timeouts and heartbeat messages sent by brokers. When a broker stops, the leader redistributes partitions to other brokers and updates information about them in the metadata topic.
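As a minimal illustration of this monitoring step, the Python sketch below tracks the last heartbeat time per broker and reports brokers whose heartbeats have timed out; the class name, field names, and timeout value are assumptions made for the example, not part of Kafka's code.

import time

# Illustrative sketch: the active controller records the time of the last
# heartbeat received from each broker and treats a broker as failed once it
# has been silent for longer than the session timeout.
class HeartbeatMonitor:
    def __init__(self, session_timeout_s=9.0):
        self.session_timeout_s = session_timeout_s
        self.last_heartbeat = {}  # broker_id -> timestamp of last heartbeat

    def on_heartbeat(self, broker_id):
        self.last_heartbeat[broker_id] = time.monotonic()

    def expired_brokers(self):
        now = time.monotonic()
        return [b for b, t in self.last_heartbeat.items()
                if now - t > self.session_timeout_s]

monitor = HeartbeatMonitor()
monitor.on_heartbeat(1)
monitor.on_heartbeat(2)
# Brokers returned here are the ones whose partitions would be reassigned
# and whose state would be updated in the metadata topic.
print(monitor.expired_brokers())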
To coordinate replicas, the high water mark (HWM) marker is used — the largest offset in the log that was written to all ISRs. Followers cache its value and update it whenever the leader’s log changes, thus maintaining a consistent state of the cluster. When a leader changes, all follower partitions truncate their logs to the HWM recorded by the new leader, and all messages above the HWM become unreadable. After the log is truncated, the follower can continue to replicate any new messages from the leader. Once all new messages are replicated to a majority of nodes, the HWM is increased.
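The truncation step can be sketched in Python as follows; the list-based log and the function name are illustrative only and assume the HWM is the offset of the last committed message, as described above.

# Simplified follower behavior at leader change: truncate the local log to the
# HWM reported by the new leader, then resume replication from that point.
def truncate_to_hwm(follower_log, leader_hwm):
    # Keep offsets 0..leader_hwm; everything above the HWM was never
    # confirmed by all ISRs and is discarded (becomes unreadable).
    del follower_log[leader_hwm + 1:]
    return follower_log

follower_log = ["m0", "m1", "m2", "m3", "m4"]       # offsets 0..4
print(truncate_to_hwm(follower_log, leader_hwm=2))  # ['m0', 'm1', 'm2']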
In addition to the process.roles parameter, the following parameters must be defined for Kafka cluster brokers configured to work in KRaft mode (an illustrative configuration example follows this list):
-
controller.quorum.voters — addresses of the nodes participating in the controller quorum, specified as a comma-separated list in the id@host:port format.
-
node.id — a node ID, must be a unique integer and is assigned during installation/extension. Cannot be the same as the ID of any other broker or controller in the cluster.
-
controller.listener.names — a comma-separated list of listener names used by the controller. When communicating with a controller quorum, a broker in KRaft mode will always use the first listener in this list.
-
metadata.log.dir — a directory for storing cluster metadata. The directory name must not be the same as any broker directory; the default is /kafka-meta.
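For illustration, a hypothetical set of values for these parameters on a quorum controller node is shown below as a Python dict whose key=value pairs mirror what would go into the node's server.properties; the host names, port, and node IDs are made up, and in ADS these values are set by ADCM.

# Hypothetical KRaft controller settings; the keys are the parameter names
# discussed above, the values are illustrative only.
kraft_controller_config = {
    "process.roles": "controller",
    "node.id": 1,  # unique integer, distinct from every other broker and controller
    "controller.quorum.voters": "1@host-1:9093,2@host-2:9093,3@host-3:9093",
    "controller.listener.names": "CONTROLLER",
    "metadata.log.dir": "/kafka-meta",  # must differ from broker log directories
}

# Render the settings as server.properties-style lines.
print("\n".join(f"{key}={value}" for key, value in kraft_controller_config.items()))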
Raft algorithm
Overview
The Raft and ZAB (used in ZooKeeper) consensus algorithms have the following conceptual similarities:
-
a single leader responsible for data processing at any given time;
-
separate processes for leader elections and log replication;
-
general focus on ensuring data consistency in distributed systems.
The Raft architecture has a simpler implementation, in which the following roles are allocated for the operation of nodes:
-
Leader of the cluster, which processes client requests and controls data replication.
-
Candidate trying to become a leader.
-
Follower recording messages and processing requests from the leader and candidates.
Vote
The leader is chosen randomly on the first run, and the remaining nodes start as followers.
The leader monitors the health of followers using timeouts and periodic heartbeats, sending heartbeat messages with or without new log entries attached.
If a follower does not receive heartbeats from the leader during the timeout, it becomes a leader candidate. Each candidate sends a request for leadership to the other nodes and also receives such requests from them. The candidate that receives a majority of votes becomes the leader. The leader is elected for a period called a term. The term is identified by an epoch, a number that is incremented with each new leader.
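The voting step can be modeled with a short Python sketch: a candidate increments the epoch, collects votes, and becomes leader only if it gains a majority. This is a toy model of the election rule, not Kafka's actual implementation, and all names in it are illustrative.

# Toy model of a Raft-style election: each node votes at most once per epoch,
# and a candidate needs a majority of the quorum to become leader.
def run_election(candidate_id, nodes, current_epoch, voted_in_epoch):
    epoch = current_epoch + 1            # new leader epoch (term)
    votes = 1                            # the candidate votes for itself
    voted_in_epoch[(candidate_id, epoch)] = candidate_id
    for node in nodes:
        if node == candidate_id:
            continue
        # A node grants its vote if it has not voted yet in this epoch.
        if (node, epoch) not in voted_in_epoch:
            voted_in_epoch[(node, epoch)] = candidate_id
            votes += 1
    is_leader = votes > len(nodes) // 2
    return is_leader, epoch

voted = {}
print(run_election(candidate_id=2, nodes=[1, 2, 3], current_epoch=7, voted_in_epoch=voted))
# (True, 8): node 2 becomes the leader for epoch 8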
Message replication
The leader sends a request to all followers to add a new record and considers the transaction successful only after receiving confirmation of the message record from the majority of nodes.
Messages are always added sequentially.
If an external client connects to the cluster through a follower, the request is still forwarded to the leader.
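As a sketch of this commit rule, the Python fragment below counts acknowledgments and treats a record as committed only once a majority of the nodes have stored it; the function and variable names are illustrative.

# Illustrative commit rule: a record written by the leader is considered
# committed only after a majority of the quorum has acknowledged it.
def is_committed(acks, quorum_size):
    return len(acks) > quorum_size // 2

acks = {"node-1", "node-3", "node-4"}     # nodes that confirmed the write
print(is_committed(acks, quorum_size=5))  # True: 3 out of 5 is a majority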
KRaft
Additionally, the KRaft protocol introduces a method for storing metadata based on Kafka's own log replication protocol, using the controller quorum. The basic principle is that each cluster broker keeps a replica of the metadata topic and does not need a connection to ZooKeeper to obtain the state of the cluster (topic partition leaders, new brokers, and so on). This avoids the extra latency and memory overhead of an external service, since storing metadata in a log is a natural part of how Kafka already works.
Metadata management
Write metadata
Each new broker or controller that joins the cluster requests a copy of the internal topic __cluster_metadata and creates a replica of it, simultaneously caching the topic data, i.e. saving the complete state of the cluster.
If the broker starts after a failure, it reads the most recent changes available in its metadata cache.
After that, it queries the active controller for events that have occurred since the last update in __cluster_metadata and in response receives the latest changes in the form of snapshots of the cluster state.
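A rough Python sketch of this catch-up step is given below: the broker starts from the last offset recorded in its local cache and asks the active controller only for the records it missed. The fetch_changes_since function is a stand-in for the real request and, like all names here, is purely illustrative.

# Hypothetical stand-in for the request a restarting broker sends to the
# active controller: return metadata records appended after last_offset.
def fetch_changes_since(controller_log, last_offset):
    return controller_log[last_offset + 1:]

# The broker's cache as it looked before the restart (offsets 0..2 applied).
broker_cache = {
    "applied_offset": 2,
    "state": {"topic-a": {"partitions": 3}, "topic-b": {"partitions": 2}},
}

controller_log = [
    (0, {"topic-a": {"partitions": 3}}),
    (1, {"topic-b": {"partitions": 1}}),
    (2, {"topic-b": {"partitions": 2}}),
    (3, {"topic-c": {"partitions": 6}}),  # appended while the broker was down
]

# After restart: read the local cache, then apply only the missing records.
for offset, delta in fetch_changes_since(controller_log, broker_cache["applied_offset"]):
    broker_cache["state"].update(delta)
    broker_cache["applied_offset"] = offset

print(broker_cache)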
Metadata replication
For Kafka running in KRaft mode, the leader tracks the HWM across all replicas of the metadata topic (not only across ISR replicas, as for regular Kafka topics).
The HWM cannot be increased until all messages written by the leader have been stored by a majority of the replicas. If there is a change of leader, the epoch increases. To reconcile their logs with the leader, followers send a request containing the last offset and the last epoch during which messages were recorded.
Based on such requests, the leader monitors the progress of brokers and responds with information about the new epoch and the last offset. The replicas then catch up with the leader or truncate their logs to the HWM sent by the leader. After a majority of brokers confirm the last offset entry, the HWM is incremented. After the log is truncated, the follower can continue to replicate any new messages from the leader.
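The reconciliation exchange can be modeled in a simplified Python sketch: the follower reports its last epoch and offset, and the leader answers with the highest offset the follower may keep, after which the follower truncates and catches up. This is a simplification of the real protocol, and all names are illustrative.

# Simplified divergence check: the leader's log is a list of (epoch, record)
# entries indexed by offset. Given the follower's last epoch and offset, the
# leader returns the highest offset the follower is allowed to keep.
def leader_divergence_point(leader_log, follower_epoch, follower_offset):
    offset = min(follower_offset, len(leader_log) - 1)
    while offset >= 0 and leader_log[offset][0] > follower_epoch:
        offset -= 1
    return offset   # the follower truncates everything above this offset

leader_log = [(1, "a"), (1, "b"), (2, "c"), (3, "d")]  # offsets 0..3
# The follower wrote up to offset 3 during epoch 2, then the leader changed.
print(leader_divergence_point(leader_log, follower_epoch=2, follower_offset=3))  # 2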
KRaft in ADS
Starting from ADS version 3.6.2.2.b1, installation of the Kafka service in KRaft mode is available via the ADCM interface. This method automatically sets the necessary parameters for brokers.
CAUTION