Flink overview
Flink is a processing engine designed to process continuous data streams at high speed. You can install Flink during the ADH cluster deployment or add it to an existing cluster.
The simplest way to configure Flink is by using ADCM. However, if you want a customized configuration, you can edit the configuration files directly. The default configuration files are located in the /etc/flink/conf/ directory, with flink-conf.yaml being the main configuration file.
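As a minimal sketch, flink-conf.yaml may contain entries like the following; the keys are standard Flink options, while the host name and values are placeholders for illustration:

    jobmanager.rpc.address: adh-master-1.example.com
    jobmanager.rpc.port: 6123
    taskmanager.numberOfTaskSlots: 2
    parallelism.default: 1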
The configuration is parsed and evaluated when the Flink processes start. Changes to a configuration file take effect only after all the relevant processes are restarted.
The default configuration uses your default Java installation. If you want to override the Java runtime, you can manually set the JAVA_HOME environment variable or specify the env.java.home configuration key in flink-conf.yaml.
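For example (the JDK path below is a placeholder; use the path of the runtime actually installed on your hosts):

    # Option 1: environment variable
    export JAVA_HOME=/usr/lib/jvm/java-11-openjdk

    # Option 2: a key in flink-conf.yaml
    env.java.home: /usr/lib/jvm/java-11-openjdk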
You can specify a different configuration directory location by defining the FLINK_CONF_DIR environment variable.
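For example, assuming a hypothetical custom directory /opt/flink/custom-conf:

    export FLINK_CONF_DIR=/opt/flink/custom-conf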
All parameters are described in the Flink documentation.
Flink can execute applications in one of three ways (sample submission commands are shown after the list):
- Application Mode. Creates a cluster per application and executes the application’s main() method on that cluster.
- Per-Job Mode. Uses an available resource provider framework (e.g. YARN, Kubernetes) to spin up a cluster for every submitted job. In this case, the life cycle of the cluster is bound to the life cycle of the job.
- Session Mode. Uses an already running cluster and the resources of that cluster to execute any submitted application. The cluster life cycle is independent of that of any job that is running on the cluster. The resources are shared across all jobs.
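As an illustration, the commands below submit the TopSpeedWindowing example JAR that ships with Flink to a YARN-backed cluster in each mode; the JAR path and the YARN application ID are placeholders:

    # Application Mode: a dedicated cluster is created for this application
    ./bin/flink run-application -t yarn-application ./examples/streaming/TopSpeedWindowing.jar

    # Per-Job Mode: a cluster is spun up for the submitted job and torn down with it
    ./bin/flink run -t yarn-per-job --detached ./examples/streaming/TopSpeedWindowing.jar

    # Session Mode: start a long-running session, then submit jobs to it
    ./bin/yarn-session.sh --detached
    ./bin/flink run -t yarn-session -Dyarn.application.id=application_XXXX_YY ./examples/streaming/TopSpeedWindowing.jar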
For details on how to run a Flink app, see Connect to Flink via CLI and Flink UI overview.
Flink’s JobManager components can operate in high availability (HA) mode so that application execution is not interrupted when a JobManager instance goes down. HA mode activates automatically when two or more JobManager components are installed in the cluster. In this case, the port numbers used to access the JobManagers are generated by ZooKeeper.
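With HA enabled, the relevant flink-conf.yaml settings look similar to the sketch below; the ZooKeeper quorum hosts and the storage path are placeholders, and in ADH these values are typically managed through ADCM:

    high-availability: zookeeper
    high-availability.zookeeper.quorum: zk-host-1:2181,zk-host-2:2181,zk-host-3:2181
    high-availability.storageDir: hdfs:///flink/ha/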