ADS architecture

Overview

The architecture of ADS is defined for each use case by the consumer, depending on the tasks to be solved. A high-level view of one possible ADS architecture option is shown below.

ADS architecture diagram

Kafka

  • Every named topic in Kafka consists of one or more partitions distributed among brokers within one ADS cluster.

  • When a new message is added to a topic, it is written to one of that topic’s partitions. Messages with the same key are always written to the same partition, which guarantees the order of writes and reads for that key (see the producer sketch after this list).

  • To ensure fault tolerance, each partition in Kafka can be replicated several times, so that multiple copies of a message are stored on different brokers.

  • Every partition has a leader: the broker that serves clients for that partition. Producers always send messages to the leader, and consumers, as a rule, read from the leader as well.

  • Kafka Connect is used to stream data scalably and reliably between Kafka and other data systems. It can ingest entire databases or collect metrics from all your application servers into Kafka topics, and it can also deliver data from Kafka topics to, for example, data analysis and retrieval systems (ADLS) or a database management system (ADB).

  • Confluent Schema Registry ensures schema compatibility between Kafka producers and consumers.
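
The following is a minimal Java sketch of these concepts: it creates a topic with several partitions and a replication factor greater than one, then writes keyed messages so that records with the same key land in the same partition. The broker address (broker1:9092), the topic name (events), the keys, and the class name are illustrative placeholders, not values prescribed by ADS.

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;
    import java.util.Collections;
    import java.util.Properties;

    public class KeyedTopicExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder address

            // Create a topic with 3 partitions; each partition is replicated to 3 brokers
            try (AdminClient admin = AdminClient.create(props)) {
                admin.createTopics(Collections.singleton(new NewTopic("events", 3, (short) 3)))
                     .all().get();
            }

            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            // Messages with the same key are always routed to the same partition,
            // which preserves the write/read order within that key
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("events", "sensor-1", "reading A"));
                producer.send(new ProducerRecord<>("events", "sensor-1", "reading B")); // same partition as above
            }
        }
    }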

ZooKeeper

  • ZooKeeper determines which of the brokers is the controller (that is, the broker responsible for electing partition leaders). It also tracks the state of partition leaders and their replicas.

  • ZooKeeper also acts as consistent metadata storage and a distributed log service. If a broker crashes, it is in ZooKeeper that the controller records information about the new partition leaders (see the sketch after this list).
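
As an illustration of ZooKeeper’s role as metadata storage, the sketch below reads the /controller znode, where a ZooKeeper-based Kafka cluster records which broker is the current controller. The ensemble address (zk1:2181) and the class name are placeholders.

    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class ControllerLookup {
        public static void main(String[] args) throws Exception {
            // Connect to the ZooKeeper ensemble of the Kafka cluster (address is a placeholder)
            ZooKeeper zk = new ZooKeeper("zk1:2181", 10000, event -> { });
            // The elected controller is recorded under the /controller znode,
            // e.g. {"version":1,"brokerid":1,"timestamp":"..."}
            byte[] data = zk.getData("/controller", false, new Stat());
            System.out.println(new String(data));
            zk.close();
        }
    }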

NiFi

  • To work with flow files, NiFi uses many handlers (processors): self-contained pieces of code, each of which performs a specific operation on flow files.

  • Every Kafka consumer is part of a Consumer Group; in NiFi, the ConsumeKafka processor configures it with a specific Kafka broker and group name.

  • Every Consumer Group has a unique name and is registered by brokers in the Kafka cluster. Data from the same topic can be read by multiple Consumer Groups at the same time.

  • When several consumers read data from Kafka as members of the same group, each of them receives messages from different partitions of the topic, thus distributing the load (a minimal consumer sketch follows this list). NiFi in this case acts as a Kafka consumer and implements all the data processing logic.

  • It is also possible to transfer data from Kafka to another system using NiFi.

  • The NiFi web user interface allows development, management and monitoring in a single interface.
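
The consumer-group behavior that NiFi’s ConsumeKafka processor relies on can be sketched with the plain Java Consumer API: every instance started with the same group.id joins the same group, and the topic’s partitions are divided among the running instances. The broker address, group name, topic, and class name below are placeholders.

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    public class GroupConsumerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092"); // placeholder address
            props.put("group.id", "example-group");         // members of one group share the topic's partitions
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singleton("events"));
                while (true) {
                    // Each instance receives messages only from the partitions assigned to it
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("partition=%d offset=%d value=%s%n",
                                record.partition(), record.offset(), record.value());
                    }
                }
            }
        }
    }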

API

  • Producer API allows an application to publish a message stream to one or more topics of the Kafka platform.

  • Consumer API allows an application to subscribe to one or more topics and process the streams of messages they contain.

  • Streams API allows an application to act as a stream processor: it consumes an input stream from one or more topics and produces an output stream to one or more topics, effectively transforming input streams into output streams (see the sketch after this list).
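
Producing and consuming are sketched in the earlier examples; the sketch below illustrates the Streams API. It reads records from one topic, transforms each value, and writes the result to another topic. The application id, broker address, topic names, and class name are placeholders.

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;
    import java.util.Properties;

    public class UppercaseStream {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app");   // placeholder id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder address
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            // Consume an input stream, transform every record, and produce an output stream
            KStream<String, String> input = builder.stream("input-topic");
            input.mapValues(value -> value.toUpperCase()).to("output-topic");

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }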
