Flink overview
The Flink service provides an engine for stream and batch data processing. Similar to Spark in some respects, Flink offers stateful and stateless computations, an SQL abstraction layer, high availability support, and robust clustering capabilities. However, Flink was designed as a stream-first framework that supports both bounded and unbounded data streams, making it versatile for various applications. Typical Flink use cases include real-time analytics, event-driven applications, pattern/anomaly/fraud detection, and high-load ETL pipelines.
Flink provides the following capabilities:
- Unified API for stream and batch processing flows. Although Flink was primarily designed for stream processing, it can also handle batch processing tasks.
- Stateful computations. Flink supports stateful processing, remembering the data at various processing stages. Flink can guarantee exactly-once processing semantics, ensuring that data is processed once and only once, even in case of failures or restarts.
- Windowing and time-based operations. Flink has built-in support for various types of windows, which are a key tool for processing unbounded streams. Windows allow grouping and aggregating data over arbitrary time intervals. Flink also supports running aggregations like SUM, AVG, and COUNT within a window based on event time or processing time (see the sketch after this list).
- Fault tolerance and recovery. Flink provides powerful checkpointing capabilities to ensure fault tolerance. With checkpoints, the system periodically persists its state and can recover from the last successful checkpoint in case of a failure. A similar mechanism, savepoints, allows triggering manual snapshots of Flink jobs, letting you pause, update, and restart jobs while preserving their state.
- SQL over streams. Flink provides an SQL abstraction over incoming data streams, allowing you to run SQL queries against a data stream as if it were a regular table. You can run operations like filters, joins, aggregations, and windowing on data streams directly in SQL.
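The following is a minimal sketch of the windowing and fault-tolerance capabilities using the Java DataStream API. The class name, host, port, and intervals are illustrative assumptions, not values prescribed by Flink or ADH:

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class WindowedWordCount {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Persist state every 10 seconds so the job can recover from the last successful checkpoint
        env.enableCheckpointing(10_000);

        // An unbounded source: a socket emitting lines of text (for testing, run: nc -lk 9999)
        env.socketTextStream("localhost", 9999)
           .flatMap((String line, Collector<Tuple2<String, Integer>> out) -> {
               for (String word : line.toLowerCase().split("\\W+")) {
                   out.collect(Tuple2.of(word, 1));
               }
           })
           .returns(Types.TUPLE(Types.STRING, Types.INT))              // lambdas lose generic types, so declare them
           .keyBy(t -> t.f0)                                           // stateful: one counter per word
           .window(TumblingProcessingTimeWindows.of(Time.seconds(30))) // 30-second processing-time windows
           .sum(1)                                                     // running SUM within each window
           .print();                                                   // sink: TaskManager stdout

        env.execute("Windowed word count");
    }
}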
Architecture
The high-level Flink architecture is shown below.
The main diagram components are described below.
Flink Client
Compiles Flink application code into a logical graph (also called a Dataflow graph or JobGraph) and submits it to the JobManager. The client is not part of the Flink runtime. Several implementations can act as a client in a Flink cluster:
- REST endpoint. Flink can receive JARs with tasks within a REST request payload.
- SQL client. Allows running Flink jobs using a built-in SQL CLI.
- Java/Python client applications that programmatically submit new jobs using the Flink API.
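As an illustration of the last option, below is a minimal sketch of a Java client that submits a job to a remote JobManager. The host name, port, and JAR path are assumptions for the example; createRemoteEnvironment ships the listed JAR to the cluster, and execute() submits the compiled JobGraph:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RemoteSubmitExample {
    public static void main(String[] args) throws Exception {
        // Connect to the JobManager REST endpoint; the JAR contains this job's classes
        StreamExecutionEnvironment env = StreamExecutionEnvironment.createRemoteEnvironment(
                "jobmanager-host.example.com", 8081, "/path/to/my-flink-job.jar");

        env.fromElements("a", "b", "c")
           .map(s -> s.toUpperCase())
           .print();

        // execute() compiles the dataflow into a JobGraph and submits it for execution
        env.execute("Programmatic submission example");
    }
}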
JobManager
The key component that coordinates the lifecycle of Flink applications. It manages task scheduling, monitors the state of tasks, coordinates checkpoints, handles recovery, and so on. JobManager is a single entity per cluster that directly controls one or more TaskManagers.
The JobManager process comprises the following components:
- Resource Manager. Responsible for resource provisioning in a Flink cluster. It manages task slots, which are the unit of resource scheduling in Flink.
- JobMaster. A per-job object responsible for managing the execution of a single Dataflow graph. Multiple jobs can run in parallel in a Flink cluster, each having its own JobMaster.
- Dispatcher. Creates a new JobMaster for each submitted job. Exposes REST endpoints to receive Flink application JARs and also provides the Flink web UI with information about job execution.
TaskManager
A JVM process capable of running one or more subtasks in separate threads. A Flink cluster includes one or more TaskManagers. TaskManagers buffer data streams and exchange data with each other. The smallest unit of resource scheduling in a TaskManager is a task slot.
TIP
In ADH, you can submit Flink jobs to a YARN cluster, which automatically creates containers for Flink JobManagers/TaskManagers on hosts controlled by YARN.
For more information about supported modes and usage examples, see Run Flink on YARN.
Concepts
The following are major Flink-specific concepts frequently used in related articles. For more details, see the Flink documentation.
- Bounded and unbounded streams
A bounded data stream is a stream of data that has a well-defined, finite size. Reading a CSV file and processing its records with the Flink DataStream API is an example of bounded processing. This mode assumes that the entire dataset can be ingested (possibly sorted or pre-processed) before applying any computations.
An unbounded stream has a starting point but may never end, continuously delivering data records at arbitrary intervals. Examples of unbounded data streams include a Kafka topic, a web socket emitting JSON records, or the digital interface of a sensor.
- State and stateful processing
State refers to the data that Flink retains between individual processing stages (for example, between two aggregation operators). Examples of stateful processing include searching for patterns in the input data stream, aggregating events within a specific time frame (minute/hour/day), etc. Stateful processing incurs additional costs for managing and storing the state data. However, it is the state data that enables complex computations like aggregations, deduplication, pattern matching, and window-based computations.
In Flink, the state data is managed by a component called the state backend. Choosing a state backend implementation that best fits a particular Flink application can help minimize state data access costs.
- Sources and sinks
A typical Flink application dealing with data streams has the structure presented below. This is a simplified representation, since Flink can automatically create several source/sink/operator instances in the JVM depending on many factors, such as parallelism settings, enabled optimization mechanisms, etc.
Sources and sinks
The major flow components are as follows:
- Source. The entry point of a Flink job. It reads data from an external system, such as a message queue, database, or socket, and creates a DataStream entity, which is further processed by operators. Flink comes with built-in source connectors for Kafka, JDBC, file systems, etc.
- Operators. Apply transformations on DataStreams.
- Sink. The endpoint of a Flink application flow. Data sinks consume DataStreams and forward their data to files, sockets, queues, and other external systems.
- Tasks and operators
Operators consume one or more DataStreams, apply transformations, and output new DataStreams. Examples of transformation operators are map(), flatMap(), and keyBy(). A task is a basic unit of work: each parallel instance of an operator is executed by its own task. For example, an operator with parallelism set to 3 has each of its three instances executed by a separate task (a runnable sketch follows this list).
Tasks and operators
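To tie these concepts together, the following is a minimal sketch of a source → operators → sink pipeline over a bounded stream, run in batch execution mode. The class name and sample data are illustrative assumptions:

import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BoundedPipelineExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // The source below is bounded (finite and known in advance),
        // so the job can run in BATCH execution mode
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        env.fromElements("error: disk full", "info: started", "error: timeout") // source
           .filter(line -> line.startsWith("error"))                            // operator
           .map(line -> line.toUpperCase())                                     // operator
           .print();                                                            // sink

        // Each parallel operator instance is executed by a task; with parallelism 1,
        // Flink may chain these operators into a single task
        env.execute("Bounded source-operator-sink example");
    }
}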
Flink History Server
The Flink service comes with the Flink History Server component that provides a web UI with statistics about completed Flink applications.
NOTE
The Flink History Server component is available starting ADH 4.0.0.
The statistics about completed jobs are generated by a JobManager and are stored in a configurable location (defaults to /apps/flink/completed-jobs in HDFS). The History Server scans this location periodically and renders information about completed jobs in the web UI. The archive directory locations and scan interval can be configured using Flink service configuration parameters.
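For reference, below is a sketch of the underlying Flink options as they would appear in flink-conf.yaml. In ADH, these values are managed through the Flink service configuration in ADCM; the paths and interval shown are illustrative defaults:

# Where JobManagers upload archives of completed jobs
jobmanager.archive.fs.dir: hdfs:///apps/flink/completed-jobs

# Directories that the History Server scans for job archives
historyserver.archive.fs.dir: hdfs:///apps/flink/completed-jobs

# How often (in milliseconds) the History Server re-scans the archive location
historyserver.archive.fs.refresh-interval: 10000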
History Server REST API
Apart from the web UI, Flink History Server exposes a REST API, which can be used to fetch information about completed jobs. The REST endpoints do not require authentication, accept only GET requests, and can be requested as follows:
GET http://<history_server_host>:<port>/<endpoint_path> HTTP/1.1
where:
- <history_server_host> is the name or IP address of the host with the Flink History Server component.
- <port> is the port number to accept REST requests. Defaults to 8081 and can be changed using the historyserver.web.port parameter of the Flink service in ADCM.
The following endpoints are available:
/config
Returns general information about the Flink cluster.
Sample request:
$ curl 'http://ka-adh-3.ru-central1.internal:8082/config'
Sample response:
{
"refresh-interval": 10000,
"timezone-name": "Coordinated Universal Time",
"timezone-offset": 0,
"flink-version": "1.20.1",
"flink-revision": "ae77a60 @ 2025-04-28T10:41:27+02:00",
"features": {
"web-submit": false,
"web-cancel": false,
"web-rescale": false,
"web-history": true
}
}
/jobs/overview
Returns a list of submitted Flink jobs.
Sample request:
$ curl 'http://ka-adh-3.ru-central1.internal:8082/jobs/overview'
Sample response:
{
"jobs": [
{
"jid": "547cc23f3e85744bffbfcafe7c06471a",
"name": "Flink Java Job at Sun Jun 15 23:25:44 UTC 2025",
"start-time": 1750029945589,
"end-time": 1750029946616,
"duration": 1027,
"state": "FINISHED",
"last-modification": 1750029946616,
"tasks": {
"running": 0,
"canceling": 0,
"canceled": 0,
"total": 3,
"created": 0,
"scheduled": 0,
"deploying": 0,
"reconciling": 0,
"finished": 3,
"initializing": 0,
"failed": 0
}
}
]
}
/jobs/<jobid>
Returns details on a specific Flink job, including the job execution plan, vertices, subtasks, etc.
Sample request:
$ curl 'http://ka-adh-3.ru-central1.internal:8082/jobs/547cc23f3e85744bffbfcafe7c06471a'
Sample response:
{
"jid": "547cc23f3e85744bffbfcafe7c06471a",
"name": "Flink Java Job at Sun Jun 15 23:25:44 UTC 2025",
"isStoppable": false,
"state": "FINISHED",
"job-type": "BATCH",
"start-time": 1750029945589,
"end-time": 1750029946616,
"duration": 1027,
"maxParallelism": -1,
"now": 1750029946692,
"timestamps": {
"FAILED": 0,
"SUSPENDED": 0,
"CANCELLING": 0,
"INITIALIZING": 1750029945589,
"FAILING": 0,
"CANCELED": 0,
"RUNNING": 1750029946033,
"FINISHED": 1750029946616,
"CREATED": 1750029945752,
"RESTARTING": 0,
"RECONCILING": 0
},
"vertices": [
...
],
"status-counts": {
"INITIALIZING": 0,
"CANCELING": 0,
"DEPLOYING": 0,
"FAILED": 0,
"SCHEDULED": 0,
"RUNNING": 0,
"RECONCILING": 0,
"FINISHED": 3,
"CREATED": 0,
"CANCELED": 0
},
"plan": {
...
}
}
/jobs/<jobid>/config
Returns the execution configuration of a given Flink job.
Sample request:
$ curl 'http://ka-adh-3.ru-central1.internal:8082/jobs/547cc23f3e85744bffbfcafe7c06471a/config'
Sample response:
{
"jid": "547cc23f3e85744bffbfcafe7c06471a",
"name": "Flink Java Job at Sun Jun 15 23:25:44 UTC 2025",
"execution-config": {
"execution-mode": "PIPELINED",
"restart-strategy": "Cluster level default restart strategy",
"job-parallelism": 1,
"object-reuse-mode": false,
"user-config": {}
}
}
/jobs/<jobid>/vertices/<vertexid>
Returns details about a particular vertex of a Flink job.
Sample request:
$ curl 'http://ka-adh-3.ru-central1.internal:8082/jobs/547cc23f3e85744bffbfcafe7c06471a/vertices/33b1d60e4c0d4656355755296d6794e0'
Sample response:
{
"id": "33b1d60e4c0d4656355755296d6794e0",
"name": "CHAIN DataSource (at getDefaultTextLineDataSet(WordCountData.java:70) (org.apache.flink.api.java.io.CollectionInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:97)) -> Combine (SUM(1), at main(WordCount.java:100)",
"parallelism": 1,
"maxParallelism": 1,
"now": 1750029946787,
"subtasks": [
{
...
}
],
"aggregated": {
...
}
}
/jobs/<jobid>/exceptions
Returns the exceptions raised during the job execution.
Sample request:
$ curl 'http://ka-adh-3.ru-central1.internal:8082/jobs/57a9afc1904b14f808c80378ee20ffeb/exceptions'
Sample response:
{
"root-exception": null,
"timestamp": null,
"all-exceptions": [],
"truncated": false,
"exceptionHistory": {
"entries": [],
"truncated": false
}
}
/jobs/<jobid>/accumulators
Returns a list of accumulator objects involved in processing the given job.
Sample request:
$ curl 'http://ka-adh-3.ru-central1.internal:8082/jobs/57a9afc1904b14f808c80378ee20ffeb/accumulators'
Sample response:
{
"job-accumulators": [],
"user-task-accumulators": [
{
"name": "85d49c256c7a7e0370ebba1e88e8f199",
"type": "SerializedListAccumulator",
"value": "[[B@7b294b0f, [B@2a90f4d2, [B@2a86b8be, ..."
}
],
"serialized-user-task-accumulators": {
"85d49c256c7a7e0370ebba1e88e8f199": "rO0ABXNyACVvcmcuYXBhY2hlLmZsaW5rLnV..."
}
}
/jobs/<jobid>/vertices/<vertexid>/subtasktimes
Returns timing information for the subtasks created to complete a given job vertex.
Sample request:
$ curl 'http://ka-adh-3.ru-central1.internal:8082/jobs/547cc23f3e85744bffbfcafe7c06471a/vertices/33b1d60e4c0d4656355755296d6794e0/subtasktimes'
Sample response:
{
"id": "33b1d60e4c0d4656355755296d6794e0",
"name": "CHAIN DataSource (at getDefaultTextLineDataSet(WordCountData.java:70) (org.apache.flink.api.java.io.CollectionInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:97)) -> Combine (SUM(1), at main(WordCount.java:100)",
"now": 1750029946723,
"subtasks": [
{
"subtask": 0,
"host": "ka-adh-3",
"endpoint": "ka-adh-3.ru-central1.internal:38443",
"duration": 565,
"timestamps": {
"INITIALIZING": 1750029946391,
"CANCELING": 0,
"DEPLOYING": 1750029946222,
"FAILED": 0,
"SCHEDULED": 1750029946037,
"RUNNING": 1750029946393,
"RECONCILING": 0,
"FINISHED": 1750029946602,
"CREATED": 1750029945835,
"CANCELED": 0
}
}
]
}
/jobs/<jobid>/plan
Returns the execution plan for a given Flink job.
Sample request:
$ curl 'http://ka-adh-3.ru-central1.internal:8082/jobs/57a9afc1904b14f808c80378ee20ffeb/plan'
Sample response:
{
"plan": {
"jid": "57a9afc1904b14f808c80378ee20ffeb",
"name": "Flink Java Job at Sun Jun 15 23:49:15 UTC 2025",
"type": "BATCH",
"nodes": [
{
"id": "5b6d423820643b538f78784054364b75",
"parallelism": 1,
"operator": "Data Sink",
"operator_strategy": "(none)",
"description": "DataSink (collect())",
"inputs": [
{
"num": 0,
"id": "fa8699328ec101fd063912c9734556b7",
"ship_strategy": "Forward",
"exchange": "pipelined"
}
],
"optimizer_properties": {
...
}
},
{
"id": "fa8699328ec101fd063912c9734556b7",
"parallelism": 1,
"operator": "GroupReduce",
"operator_strategy": "Sorted Group Reduce",
"description": "Reduce (SUM(1), at main(WordCount.java:100)",
"inputs": [
{
"num": 0,
"id": "9d782237f8c637c037444a774f60e6e7",
"ship_strategy": "Hash Partition on [0]",
"local_strategy": "Sort (combining) on [0:ASC]",
"exchange": "pipelined"
}
],
"optimizer_properties": {
...
}
},
{
"id": "9d782237f8c637c037444a774f60e6e7",
"parallelism": 1,
"operator": "Data Source -> FlatMap -> GroupCombine",
"operator_strategy": "(none)<br/> -> FlatMap<br/> -> Sorted Combine",
"description": "DataSource (at getDefaultTextLineDataSet(WordCountData.java:70) (org.apache.flink.api.java.io.CollectionInputFormat))<br/> -> FlatMap (FlatMap at main(WordCount.java:97))<br/> -> Combine (SUM(1), at main(WordCount.java:100)",
"optimizer_properties": {
...
}
}
]
}
}