Use cases
This article describes typical use cases for ADS.
Messaging
Message brokers are used to organize message exchange efficiently. A message broker is an intermediary program module that translates a message from the formal messaging protocol of the sender to the formal messaging protocol of the receiver. Kafka has several advantages over most message brokers:
- High throughput allows processing of a large amount of data in a short period of time.
- Built-in partitioning makes it possible to control the storage time and amount of data, because messages are divided into segments by partition rather than stored in one large file.
- Replication of partitions provides automatic failover to partition replicas when a server in the cluster fails, so messages remain available even in case of failures.
- Fault tolerance ensures that the messaging system keeps messages in the absence of active subscribers until a consumer subscribes to the message delivery queue.
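The partitioning point above can be sketched in a few lines. This is an illustrative model, not the broker's real implementation: messages are routed to partitions by key hash (Kafka uses murmur2; `crc32` stands in here), and each partition keeps its own ordered log with monotonically growing offsets. The names `partition_for` and `produce` are ours.

```python
# Minimal sketch of Kafka-style partitioning (illustrative only).
import zlib

NUM_PARTITIONS = 3
partitions = [[] for _ in range(NUM_PARTITIONS)]  # one append-only log per partition

def partition_for(key: str) -> int:
    # Deterministic routing: the same key always lands in the same partition,
    # which preserves per-key ordering.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

def produce(key: str, value: str) -> tuple:
    p = partition_for(key)
    partitions[p].append((key, value))
    return p, len(partitions[p]) - 1  # (partition, offset)

p1, off1 = produce("user-42", "click")
p2, off2 = produce("user-42", "scroll")
assert p1 == p2          # same key -> same partition
assert off2 == off1 + 1  # offsets grow monotonically within a partition
```

Splitting a topic into partition segments like this is what lets the broker expire or compact old segments per partition instead of rewriting one large file.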
Service | Interoperability with Kafka | Advantages of use |
---|---|---|
NiFi | NiFi as producer — takes data from sources directly to a central NiFi instance that delivers data to the appropriate Kafka topic. NiFi as consumer — takes data from Kafka and transfers it to another system | Dynamic self-adjusting data flow — combining the power of NiFi, Kafka, and an analytic platform |
ZooKeeper | Informs each Kafka broker about the current state of the cluster | Automatic updating of metadata in the Kafka client when connecting to any broker |
Confluent Schema Registry | Implements a mechanism for working with different Kafka data schemas | |
Website activity tracking
Kafka is a great tool for tracking website activity. When a new user registers on the website, their activity can be tracked as follows:
- The user presses a button in the web page interface.
- The web application creates a message with the metadata of this button.
- Messages with the button metadata are collected into a data packet and sent to Kafka, creating commit logs.
- On the next user action with this button, a message about the action is appended to the commit log, and the message offset in the queue increases.
This data can then be consumed from Kafka for analytics, providing a real-time view of website usage.
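The steps above can be sketched as an in-memory commit log. The names (`ClickLog`, `record_click`) are illustrative, not a Kafka API: each click becomes a message appended to the log, and each new message receives the next offset.

```python
# Illustrative sketch: button clicks appended to a commit log with offsets.
import json
import time

class ClickLog:
    """An in-memory stand-in for a Kafka topic's commit log."""
    def __init__(self):
        self.log = []

    def append(self, message: dict) -> int:
        self.log.append(json.dumps(message))
        return len(self.log) - 1  # offset of the appended message

def record_click(log: ClickLog, user: str, button: str) -> int:
    # The web app wraps the button metadata in a message and appends it.
    return log.append({"user": user, "button": button, "ts": time.time()})

topic = ClickLog()
first = record_click(topic, "alice", "signup")
second = record_click(topic, "alice", "signup")
assert second == first + 1  # the offset increases with each action
```

An analytics consumer would read this log from a given offset onward, which is how the same activity data can feed multiple downstream systems.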
Metrics
Kafka can be used to quickly collect metrics of various applications and operational data:
- technological processes;
- audit statistics and data collection;
- system activity;
- aggregation of statistics from running applications and infrastructure;
- real-time tracking of data stream consumption by users.
Kafka has several advantages when used in monitoring:
- Ability to connect new producers to send metrics, and then use the monitoring data in several different systems.
- Ability to perform real-time analysis on a large data set while collecting metrics.
- The consumer application needs only a small amount of code to process the data.
- Ability to route data to different modules according to their purpose.
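The "small amount of consumer code" point can be illustrated with a few lines of Python that aggregate application metrics from a message stream. The metric messages here are invented sample data, not a real Kafka feed.

```python
# Sketch: aggregating per-application metrics from a message stream.
from collections import defaultdict

metrics_stream = [
    {"app": "web", "metric": "latency_ms", "value": 120},
    {"app": "web", "metric": "latency_ms", "value": 80},
    {"app": "db",  "metric": "latency_ms", "value": 40},
]

# Running aggregation, updated as each message arrives: app -> [sum, count].
totals = defaultdict(lambda: [0, 0])
for msg in metrics_stream:
    s = totals[msg["app"]]
    s[0] += msg["value"]
    s[1] += 1

averages = {app: s[0] / s[1] for app, s in totals.items()}
assert averages == {"web": 100.0, "db": 40.0}
```

A second consumer reading the same stream could route the raw values to an alerting module instead, which is the "different modules by purpose" advantage.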
Log aggregation
Log aggregation is another possible use case for ADS. Kafka is well suited to log aggregation in distributed environments with different architectures. Kafka collects physical log files from servers and places them in a central location (such as a file server or HDFS) for processing. Compared to other systems, Kafka has a number of advantages:
- Kafka abstracts away the details of files and presents log data or events as a clean message stream. This enables lower-latency processing and makes it easier to support multiple data sources and distributed data consumption.
- Kafka offers the same high performance as other systems, but with stronger reliability guarantees through replication and much lower end-to-end latency.
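The file-vs-stream abstraction above can be sketched as follows. Instead of shipping whole log files, each server's log lines become individual messages tagged with their source and merged into one central stream; the server names and log lines are made-up sample data.

```python
# Sketch: log lines from several servers merged into one message stream.
server_logs = {
    "web-1": ["GET /  200", "GET /x 404"],
    "web-2": ["POST /login 200"],
}

# Abstract away the file: one event stream of (source, line) messages.
central_stream = [
    {"source": host, "line": line}
    for host, lines in server_logs.items()
    for line in lines
]

# A consumer can now filter events without knowing which file they came from.
errors = [m for m in central_stream if " 404" in m["line"]]
assert len(central_stream) == 3
assert errors[0]["source"] == "web-1"
```

Because each line is an independent message, consumers can process events as they arrive rather than waiting for a file rotation, which is where the lower end-to-end latency comes from.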
Stream processing
Stream processing is used to improve messaging performance. Benefits of stream processing using Kafka over batch pipelines:
- Message streams are consumed in real time.
- Transformation, filtering, aggregation, or join operations are applied to messages in order to publish the processed messages to another stream.
- Stream pipelines reduce the load on the data source, because instead of executing full queries you can extract data from the log files of the DBMS or other systems.
- Messages can be retained as long as needed.
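The listed operations can be sketched as a per-message pipeline: each message is filtered, transformed, and folded into a running aggregate as it arrives, and the result is published to another stream. The input is sample data; this is not the Kafka Streams API.

```python
# Sketch: transform, filter, and aggregate applied message by message.
input_stream = [
    {"user": "alice", "amount": 30},
    {"user": "bob",   "amount": -5},   # invalid, filtered out
    {"user": "alice", "amount": 70},
]

output_stream = []   # processed messages published to another stream
totals = {}          # running aggregation, updated per message

for msg in input_stream:
    if msg["amount"] <= 0:                 # filter
        continue
    enriched = {**msg, "currency": "USD"}  # transform / enrich
    totals[msg["user"]] = totals.get(msg["user"], 0) + msg["amount"]  # aggregate
    output_stream.append(enriched)

assert len(output_stream) == 2
assert totals == {"alice": 100}
```

The contrast with a batch pipeline is that `totals` is always up to date after each message, rather than only after a full query over the source completes.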
Tool or Service | Role in Stream Processing | Advantages of use |
---|---|---|
Kafka Streams | Client library for developing streaming Big Data applications that work with data stored in Kafka topics | Powerful and flexible API with all the benefits of Kafka (scalability, reliability, minimum latency, analytic query mechanisms); allows the developer to write code in local mode (outside the cluster) |
NiFi | Uses the concept of a stream as a sequence of operations: transfer, transformation, and enrichment of data over a sequence of individual events | A stream is NOT treated as a large batch operation that requires an initial load of all data before processing can begin. For example, a SQL database with millions of rows is treated as millions of individual rows that need to be processed |
ksqlDB | Platform for streaming Big Data processing in Kafka using structured SQL queries | Data structures in Kafka SQL are software units that are able to store and process a lot of data, linked by ksqlDB logic |