Use cases

This article describes the use cases for ADS.

Messaging

Message brokers organize the exchange of messages between systems. A message broker is an intermediary computer program module that translates a message from the formal messaging protocol of the sender to the formal messaging protocol of the receiver. Kafka has several advantages over most message brokers (a minimal producer sketch follows the list below):

  • High throughput allows large volumes of data to be processed in a short period of time.

  • Built-in partitioning allows control over retention time and data volume: messages are divided into segments by partition rather than stored in one large file.

  • Replication of partitions provides automatic failover to partition replicas when a server in the cluster fails, so messages remain available even in case of failures.

  • Fault tolerance ensures that the messaging system keeps messages in the absence of active subscribers until a consumer subscribes to the message delivery queue.
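To make the messaging role concrete, below is a minimal Java producer sketch. The broker address (broker1:9092), topic name (orders), and key are hypothetical placeholders, not part of ADS itself.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class OrderProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");  // hypothetical broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        // acks=all waits for all in-sync replicas, trading latency for durability.
        props.put("acks", "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The key determines the partition, so all events for one order stay ordered.
            producer.send(new ProducerRecord<>("orders", "order-42", "created"));
        }
    }
}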

Services that complement and optimize the work of Kafka
Each service below is described by its interoperability with Kafka and by the advantages of its use.

NiFi

NiFi as producer — takes data from sources directly into a central NiFi instance, which delivers the data to the appropriate Kafka topic

  • Data transfer to Kafka without special software.

  • Ability to visually monitor and control the data pipeline.

NiFi as consumer — takes data from Kafka and transfers it to another system

  • Data transfer from Kafka to storage systems (HDFS, HBase) without special software.

  • Ability to visually monitor and control the data pipeline.

Dynamic self-adjusting data flow — combining the power of NiFi, Kafka, and an analytics platform:

  1. NiFi sends data to Kafka.

  2. Data becomes available for analytical platforms.

  3. Results are written back to another Kafka topic.

  4. The processed data returns to NiFi.

  • Ability to further use the data received by NiFi without special software.

  • Ability to visually monitor and control the data pipeline.

ZooKeeper

Informs each Kafka broker about the current state of the cluster

  • Automatic updating of metadata in the Kafka client when connecting to any broker (see the sketch below).
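As an illustration of this metadata discovery, the following sketch uses Kafka's Java AdminClient to list the brokers of a cluster after connecting to a single bootstrap address; the address itself is a hypothetical placeholder.

import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class ClusterInfo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Any single reachable broker is enough; the client discovers the rest.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            admin.describeCluster().nodes().get()
                 .forEach(node -> System.out.println("Broker: " + node));
        }
    }
}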

Confluent Schema Registry

Implements a mechanism for working with different Kafka data schemas (a producer sketch follows the list below)

  • Defines the fields required to describe an event and the type of each field.

  • Documents events and the values of each of their fields in an accessible form.

  • Prevents consumers from receiving corrupted data.

  • Supports different data formats from different information sources.
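Below is a minimal producer sketch using the Confluent Avro serializer; the broker and Schema Registry addresses, topic name, and schema fields are illustrative assumptions. The serializer registers the schema with Schema Registry and rejects records that do not match it.

import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroEventProducer {
    public static void main(String[] args) {
        // Field names and types are defined once in the schema and enforced on write.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"PageView\","
          + "\"fields\":[{\"name\":\"user\",\"type\":\"string\"},"
          + "{\"name\":\"page\",\"type\":\"string\"}]}");

        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");  // hypothetical address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // The Confluent Avro serializer validates each record against Schema Registry.
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://schema-registry:8081");  // hypothetical address

        GenericRecord view = new GenericData.Record(schema);
        view.put("user", "alice");
        view.put("page", "/home");

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("page-views", "alice", view));
        }
    }
}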

Website activity tracking

Kafka is a great tool for tracking website activity. When a new user registers on the website, their activity can be tracked as follows:

  1. The user presses a button in the web page interface.

  2. The web application creates a message with the metadata of this button.

  3. Messages with the button metadata are batched and sent to Kafka, where they are appended to the commit log.

  4. On each subsequent user action with this button, a new message about the action is appended to the commit log, and the offset of the message in the queue increases.

This data can then be consumed from Kafka for analytics, giving a real-time picture of website usage.
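A minimal Java sketch of steps 2 and 3 might look as follows; the topic name site-activity, the JSON layout, and the helper class are hypothetical.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ClickTracker {
    private final KafkaProducer<String, String> producer;

    public ClickTracker(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        this.producer = new KafkaProducer<>(props);
    }

    // Called by the web application whenever the user presses the button.
    public void onButtonClick(String userId, String buttonId) {
        String event = String.format(
            "{\"user\":\"%s\",\"button\":\"%s\",\"ts\":%d}",
            userId, buttonId, System.currentTimeMillis());
        // Keying by user keeps all of one user's actions in a single partition,
        // so their offsets grow in order within the commit log.
        producer.send(new ProducerRecord<>("site-activity", userId, event));
    }
}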

Metrics

Kafka can be used to quickly collect metrics from various applications and operational data:

  • technological processes;

  • audit statistics and data collection;

  • system activity;

  • aggregation of statistics of running applications and infrastructure;

  • real-time tracking of users' consumption of data streams.


Kafka has several advantages when used in monitoring:

  • Ability to connect new producers that send metrics and then use the monitoring data in several different systems.

  • Ability to perform real-time analysis on a large data set while metrics are being collected.

  • Ability to process the data in a consumer application with a small amount of code (see the sketch after this list).

  • Ability to route data to different modules according to its purpose.
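The sketch below illustrates the "small amount of code" point: a Java consumer that aggregates metric events per host in real time. The topic name metrics, group id, and key layout (key = host name) are hypothetical.

import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class MetricsConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");  // hypothetical address
        props.put("group.id", "monitoring");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        Map<String, Long> countsPerHost = new HashMap<>();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("metrics"));
            while (true) {
                for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofSeconds(1))) {
                    // Key = host name, value = metric payload; count events per host.
                    countsPerHost.merge(rec.key(), 1L, Long::sum);
                    System.out.println(rec.key() + " -> " + countsPerHost.get(rec.key()));
                }
            }
        }
    }
}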

Log aggregation

Log aggregation is one possible use case for ADS. Kafka is well suited to log aggregation in distributed environments with different architectures: it collects physical log files from servers and places them in a central location (for example, a file server or HDFS) for processing (a shipping sketch follows the list below). Compared to other systems, Kafka has a number of advantages:

  • Kafka abstracts away the details of files and gives a clearer abstraction of log data or events as a message stream. This provides lower latency processing and makes it easier to support multiple data sources and distributed data consumption.

  • Kafka offers equally high performance, stronger durability guarantees thanks to replication, and much lower end-to-end latency.
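As a rough sketch of the producing side of log aggregation, the following Java snippet ships the lines of a log file to a Kafka topic. The file path, topic name, and host key are hypothetical, and a production setup would more likely use a long-running agent or Kafka Connect; this one-shot reader is only an illustration.

import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class LogShipper {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");  // hypothetical address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
             BufferedReader log = Files.newBufferedReader(Path.of("/var/log/app.log"))) {
            String line;
            while ((line = log.readLine()) != null) {
                // Each log line becomes one message; the originating host is the key.
                producer.send(new ProducerRecord<>("app-logs", "web-01", line));
            }
        }
    }
}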

Stream processing

Stream processing handles messages continuously as they arrive rather than in periodic batches. Benefits of stream processing using Kafka over batch pipelines:

  • Message streams are consumed in real time.

  • Transformation, filtering, aggregation, or join operations are applied to messages in order to publish the processed messages to another stream.

  • Stream pipelines reduce the load on the data source: instead of executing full queries, data can be extracted from the logs of the DBMS or other systems.

  • Messages can be kept as long as needed.

Tools and services for stream processing
Each tool or service below is described by its role in stream processing and by the advantages of its use.

Kafka Streams

Client library for developing streaming Big Data applications that work with data stored in Kafka topics

Powerful and flexible API with all the benefits of Kafka (scalability, reliability, minimal latency, analytic query mechanisms) that allows the developer to write code in local mode, outside the cluster (see the sketch after this table)

NiFi

Uses the concept of a stream as a sequence of operations: transfer, transformation and enrichment of data over a sequence of individual events

A stream is NOT treated as a large batch operation that requires an initial load of all data before processing can begin. For example, a SQL database with millions of rows is treated as millions of individual rows that need to be processed

ksqlDB

Platform for streaming Big Data processing in Kafka using structured SQL queries

Data structures in ksqlDB (streams and tables) are built on top of Kafka topics; they can store and process large amounts of data and are linked together by ksqlDB logic
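The following Kafka Streams sketch shows such a pipeline in local-mode Java code: it consumes one topic, applies filter and transform operations, and publishes the processed messages to another topic. The topic names, application id, and filter condition are hypothetical.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class PurchaseFilter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "purchase-filter");  // hypothetical id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");  // hypothetical address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> events = builder.stream("site-activity");
        // Filter and transform each message as it arrives, then publish
        // the processed stream to another topic.
        events.filter((user, event) -> event.contains("\"button\":\"buy\""))
              .mapValues(event -> event.toUpperCase())
              .to("purchases");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}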

Event sourcing

Kafka is a strong option for applications built with the event sourcing pattern, in which every state change is recorded as an event in a time-ordered log. Advantages of using Kafka in event sourcing (a replay sketch follows the list):

  • Support for storing very large volumes of log data.

  • The ability to work with delayed events due to the queue and offset mechanisms in Kafka.
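Below is a Java sketch of replaying an event-sourced topic: it rebuilds in-memory state by consuming a hypothetical account-events topic from the earliest offset. The fresh group id forces a full replay, and stopping on the first empty batch is a simplification for the sketch.

import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class StateRebuilder {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");  // hypothetical address
        props.put("group.id", "rebuild-" + System.currentTimeMillis());  // fresh group => full replay
        props.put("auto.offset.reset", "earliest");  // start from the first stored event
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        Map<String, String> state = new HashMap<>();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("account-events"));
            ConsumerRecords<String, String> batch;
            while (!(batch = consumer.poll(Duration.ofSeconds(2))).isEmpty()) {
                // Applying events in offset order reproduces the current state.
                for (ConsumerRecord<String, String> event : batch) {
                    state.put(event.key(), event.value());
                }
            }
        }
        System.out.println("Rebuilt state for " + state.size() + " keys");
    }
}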

Commit log

Kafka can serve as an external commit log for a distributed system. Advantages of using Kafka:

  • The ability to replicate data between nodes.

  • Possibility of data recovery (re-synchronization mechanism for failed nodes).

  • The log compaction feature in Kafka (illustrated in the sketch after this list).
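As a sketch of enabling log compaction, the snippet below creates a compacted topic with Kafka's Java AdminClient; the topic name, partition count, and replication factor are illustrative.

import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");  // hypothetical address
        try (AdminClient admin = AdminClient.create(props)) {
            // cleanup.policy=compact keeps only the latest value per key,
            // which is exactly what an external commit log of current state needs.
            NewTopic topic = new NewTopic("node-state", 3, (short) 3)
                .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}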
