Configuration parameters
This topic describes the parameters that can be configured for ADO services via ADCM. To read about the configuration process, refer to the relevant articles: Online installation, Offline installation.
ADPG
| Parameter | Description | Default value |
|---|---|---|
| Data directory | Directories that are used to store data on the ADPG hosts | /pg_data1 |
| Parameter | Description | Default value |
|---|---|---|
| listen_addresses | Specifies the TCP/IP address(es) on which the server listens for connections from client applications (requires a restart when changed) | * |
| port | The TCP port the server listens on | 5432 |
| max_connections | Determines the maximum number of concurrent connections to the server. For a replica host, the value of this parameter must be greater than or equal to the value on the leader host. If this requirement is not met, the replica host will reject all requests | 100 |
| shared_buffers | Sets the amount of memory for the shared memory buffers. The higher the value of this parameter, the lower the load on the host hard drives | 128 MB |
| max_worker_processes | Sets the maximum number of background processes that the system can support | 8 |
| max_parallel_workers | Sets the maximum number of workers that the system can support for parallel operations | 8 |
| max_parallel_workers_per_gather | Sets the maximum number of workers that can be started by a single Gather or Gather Merge node | 2 |
| max_parallel_maintenance_workers | Sets the maximum number of parallel workers that can be started by a single utility command | 2 |
| effective_cache_size | Sets the planner's assumption about the effective size of the disk cache that is available to a single query. This is taken into account when estimating the cost of using an index: a higher value makes index scans more likely, a lower value makes sequential scans more likely. When setting this parameter, consider both PostgreSQL shared buffers and the portion of the kernel's disk cache that will be used for PostgreSQL data files, though some data might exist in both places. Also, take into account the expected number of concurrent queries to different tables, since they will have to share the available space. This parameter does not affect the size of shared memory allocated by PostgreSQL, and it does not reserve kernel disk cache; it is used only for estimation purposes. The system also does not assume data remains in the disk cache between queries. If this value is specified without units, it is taken as blocks, that is, BLCKSZ bytes (typically 8 kB) | 4096 MB |
| maintenance_work_mem | Specifies the maximum amount of memory to be used by maintenance operations, such as VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY | 64 MB |
| work_mem | Sets the base maximum amount of memory to be used by a query operation (such as a sort or hash table) before writing to temporary disk files. Note that for a complex query, several sort or hash operations might be running in parallel. Each operation is allowed to use as much memory as this value specifies before it starts to write data into temporary files. Several running sessions can also perform such operations concurrently, so the total memory used can be many times greater than the value of work_mem | 4 MB |
| min_wal_size | As long as WAL disk usage stays below this value, old WAL files are recycled for future use at a checkpoint rather than removed | 80 MB |
| max_wal_size | Sets the maximum size to which the WAL can grow between automatic checkpoints. Increasing this setting may increase the recovery time after a failure. The specified limit can be exceeded automatically under a high load on ADPG | 1024 MB |
| wal_keep_size | Sets the minimum size of segments retained in the pg_wal directory, in case a standby server needs to fetch them for streaming replication. If a standby server connected to the sending server falls behind by more than wal_keep_size, the sending server might remove WAL segments still needed by the standby, in which case the replication connection is terminated | 128 MB |
| huge_pages | Defines whether huge pages can be requested for the main shared memory area. Valid values are try, on, and off | try |
| superuser_reserved_connections | Determines the number of connection "slots" that are reserved for PostgreSQL superuser connections | 3 |
| logging_collector | Enables the logging collector. The logging collector is a background process that captures log messages sent to stderr and redirects them into log files | true |
| log_directory | Determines the directory that contains log files. It can be specified as an absolute path or relative to the cluster data directory | log |
| log_filename | Specifies the log file name pattern. The value can include strftime %-escapes to specify time-varying file names | postgresql-%a.log |
| log_rotation_age | Determines the maximum period of time to use a log file, after which a new log file is created. If this value is specified without units, it is taken as minutes. Set this parameter to 0 to disable time-based creation of new log files | 1d |
| log_rotation_size | Determines the maximum size of a log file. After a log file reaches the specified size, a new log file is created. If the value is set without units, it is taken as kilobytes. Set this parameter to 0 to disable size-based creation of new log files | 0 |
| log_min_messages | Specifies the minimum severity level of messages that are written to a log file. Valid values are DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1, INFO, NOTICE, WARNING, ERROR, LOG, FATAL, and PANIC | warning |
| log_min_error_statement | Specifies which SQL statements that cause errors are logged. Valid values are DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1, INFO, NOTICE, WARNING, ERROR, LOG, FATAL, and PANIC | error |
You can use the Custom postgresql.conf field to set configuration parameters for specific ADPG nodes using ADCM configuration groups. The settings specified in this field have higher priority than the settings specified in postgresql.conf. To switch to editing mode, click Custom postgresql.conf in the Configuration tree.
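For example, a Custom postgresql.conf entry for a configuration group might look like the following (a minimal sketch; the values are illustrative, not tuning recommendations):

shared_buffers = 512MB
work_mem = 16MB
max_connections = 200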
This section allows you to add lines to the pg_hba.conf file, which configures client authentication.
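For example, the following lines (a sketch; the subnet, database, and user names are placeholders) allow password-authenticated TCP connections from an application subnet:

host    airflow    airflow    10.0.0.0/24      scram-sha-256
host    all        all        192.168.1.0/24   md5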
Airflow2
| Parameter | Description | Default value |
|---|---|---|
| Manage sensitive configuration data | When enabled, ADO takes over the creation of secrets (transferring them from configurations to Vault) as well as updating them. Requires the right to create secrets. Affects the Rotate fernet key action (see fernet key rotation) | true |
| Secrets backend | A secrets backend to use | airflow.providers.hashicorp.secrets.vault.VaultBackend |
| url | Base URL of the Vault instance being addressed. Has to include protocol and port (e.g. http://127.0.0.1:8200) | — |
| auth_type | Authentication type for Vault. Possible values: approle, aws_iam, azure, github, gcp, kubernetes, ldap, radius, token, userpass | token |
| mount_point | The path the secrets engine was mounted on. Note that this mount point is not used for authentication if authentication is done via a different engine (see auth_mount_point) | secret |
| config_path | Specifies the path of the secret to read Airflow configurations from. If set to an empty value, configurations are not read from Vault | config |
| connections_path | Specifies the path of the secret to read connections from. If set to an empty value, connections are not read from Vault | connections |
| variables_path | Specifies the path of the secret to read variables from. If set to an empty value, variables are not read from Vault | variables |
| auth_mount_point | Defines a mount point for the chosen authentication type. The default value depends on the authentication method used | — |
| kv_engine_version | The KV engine version to use (1 or 2) | 2 |
| token | Authentication token to include in requests sent to Vault (for the token and github authentication types) | — |
| token_path | Path to the file containing the authentication token to include in requests sent to Vault (for the token and github authentication types) | — |
| username | Username for the userpass and ldap authentication types | — |
| password | Password for the userpass and ldap authentication types | — |
| secret_id | Secret ID for the approle and aws_iam authentication types | — |
| role_id | Role ID for the approle and aws_iam authentication types | — |
| kubernetes_role | Role for the kubernetes authentication type | — |
| kubernetes_jwt_path | Path to the Kubernetes JWT token for the kubernetes authentication type | — |
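In airflow.cfg terms, these settings map to the [secrets] section, with everything except the backend itself passed via backend_kwargs as JSON (a hedged sketch; the Vault address and token are placeholders):

[secrets]
backend = airflow.providers.hashicorp.secrets.vault.VaultBackend
backend_kwargs = {"url": "http://127.0.0.1:8200", "auth_type": "token", "token": "<vault-token>", "mount_point": "secret", "connections_path": "connections", "variables_path": "variables", "config_path": "config"}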
| Parameter | Description | Default value |
|---|---|---|
| admin_password | The password of the webserver's admin user | — |
| db_user | The name of the metadata DB user | airflow |
| db_password | The password of the metadata DB user | — |
| Database type | The external database type | PostgreSQL |
| Hostname | The external database host | {{groups['adpg.adpg'][0]\|d(omit)}} |
| Port | The external database port | 5432 |
| Airflow database name | The external database name | airflow |
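Taken together, these values produce a SQLAlchemy connection string of the following form (a sketch with placeholders for the password and host):

postgresql+psycopg2://airflow:<db_password>@<database-host>:5432/airflow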
| Parameter | Description | Default value |
|---|---|---|
| dags_folder | The absolute path to the Airflow pipelines directory | /opt/airflow/dags |
| hostname_callable | A path to a callable which resolves the hostname. The format is "package.function" | airflow.utils.net.getfqdn |
| might_contain_dag_callable | A callable to check if a Python file has Airflow DAGs defined or not, with the arguments (file_path: str, zip_file: zipfile.ZipFile \| None = None); it should return a boolean value | airflow.utils.file.might_contain_dag_via_default_heuristic |
| default_timezone | Default timezone. Can be utc, system, or any IANA timezone string (e.g. Europe/Amsterdam) | utc |
| executor | The executor class that Airflow should use. Choices include SequentialExecutor, LocalExecutor, CeleryExecutor, DaskExecutor, KubernetesExecutor, CeleryKubernetesExecutor, LocalKubernetesExecutor, or the full import path to a custom executor class | CeleryExecutor |
| parallelism | This defines the maximum number of task instances that can run concurrently per scheduler in Airflow, regardless of the worker count. Generally this value, multiplied by the number of schedulers in your cluster, is the maximum number of task instances with the running state in the metadata database | 32 |
| max_active_tasks_per_dag | The maximum number of task instances allowed to run concurrently in each DAG. To calculate the number of tasks that are running concurrently for a DAG, add up the number of running tasks for all DAG runs of the DAG. This is configurable at the DAG level with max_active_tasks | 16 |
| dags_are_paused_at_creation | The flag that indicates if DAGs are paused by default at creation | true |
| max_active_runs_per_dag | The maximum number of active DAG runs per DAG. The scheduler will not create more DAG runs if it reaches the limit. This is configurable at the DAG level with max_active_runs | 16 |
| mp_start_method | The name of the method used to start Python processes via the multiprocessing module. This corresponds directly to the options available in the Python docs and must be one of the values returned by multiprocessing.get_all_start_methods() | — |
| load_examples | Whether to load the DAG examples that ship with Airflow | true |
| plugins_folder | Path to the folder containing Airflow plugins | /opt/airflow/plugins |
| execute_tasks_new_python_interpreter | Whether tasks should be executed by spawning a new Python interpreter (true) instead of forking the parent process (false, the speedier option). Spawning is slower, but it means that plugin changes are picked up by tasks immediately | false |
| fernet_key | The secret key to save connection passwords in the database | — |
| donot_pickle | Whether to disable pickling DAGs | true |
| dagbag_import_timeout | How long before timing out a Python file import, in seconds | 30 |
| dagbag_import_error_tracebacks | Whether a traceback should be shown in the UI for dagbag import errors instead of just the exception message | true |
| dagbag_import_error_traceback_depth | If tracebacks are shown, how many entries from the traceback should be shown | 2 |
| dag_file_processor_timeout | How long before timing out a DagFileProcessor, which processes a DAG file, in seconds | 50 |
| task_runner | The class to use for running task instances in a subprocess. Choices include StandardTaskRunner, CgroupTaskRunner, or the full import path to a custom class | StandardTaskRunner |
| default_impersonation | If set, tasks without a run_as_user argument will be run with this user | — |
| security | Defines which security module to use. For example, kerberos | — |
| unit_test_mode | Turn unit test mode on (overwrites many configuration options with test values at runtime) | false |
| enable_xcom_pickling | Whether to enable pickling for XCom (note that this is insecure and allows for RCE exploits) | false |
| allowed_deserialization_classes | What classes can be imported during deserialization. This is a multi-line value; the individual items are parsed as regexps. Python built-in classes (like dict) are always allowed | airflow\..* |
| killed_task_cleanup_time | When a task is killed forcefully, this is the amount of time in seconds that it has to clean up after it is sent a SIGTERM, before it is SIGKILLed | 60 |
| dag_run_conf_overrides_params | Whether to override params with dag_run.conf. If you pass key-value pairs through airflow dags backfill -c or airflow dags trigger -c, they override the existing ones in params | true |
| dag_discovery_safe_mode | If enabled, Airflow will only scan files containing both DAG and airflow (case-insensitive) | true |
| dag_ignore_file_syntax | The pattern syntax used in the .airflowignore files in the DAG directories. Valid values are regexp and glob | regexp |
| default_task_retries | The number of retries each task is going to have by default. Can be overridden at the DAG or task level | 0 |
| default_task_retry_delay | The number of seconds each task is going to wait by default between retries. Can be overridden at the DAG or task level | 300 |
| max_task_retry_delay | The maximum delay (in seconds) each task is going to wait by default between retries. This is a global setting and cannot be overridden at the task or DAG level | 86400 |
| default_task_weight_rule | The weighting method used for the effective total priority weight of the task | downstream |
| default_task_execution_timeout | The default task execution_timeout value for operators, in seconds. If not specified, no timeout is applied | — |
| min_serialized_dag_update_interval | A serialized DAG cannot be updated more often than this minimum interval (in seconds), to reduce the database write rate | 30 |
| compress_serialized_dags | If true, serialized DAGs are compressed before writing to the database. Note that this disables the DAG dependencies view | false |
| min_serialized_dag_fetch_interval | A serialized DAG cannot be fetched more often than this minimum interval (in seconds), to reduce the database read rate. This config controls when your DAGs are updated in the webserver | 10 |
| max_num_rendered_ti_fields_per_task | Maximum number of rendered task instance fields (template fields) per task to store in the database. All the template_fields for each task instance are stored in the database | 30 |
| check_slas | On each dagrun, check against defined SLAs | true |
| xcom_backend | Path to a custom XCom class that will be used to store and resolve operator results | airflow.models.xcom.BaseXCom |
| lazy_load_plugins | By default, Airflow plugins are lazily loaded (only loaded when required). Set it to false to load plugins whenever airflow is invoked via the CLI or loaded from a module | true |
| lazy_discover_providers | By default, Airflow providers are lazily discovered (discovery and imports happen only when required). Set it to false to discover providers whenever airflow is invoked via the CLI or loaded from a module | true |
| hide_sensitive_var_conn_fields | Hide sensitive variables or extra JSON connection keys from the UI and task logs when set to true | true |
| sensitive_var_conn_names | A comma-separated list of extra sensitive keywords to look for in variable names or a connection's extra JSON | — |
| default_pool_task_slot_count | Task slot count for the default_pool | 128 |
| max_map_length | The maximum list/dict length an XCom can push to trigger task mapping. If the pushed list/dict has a length exceeding this value, the task pushing the XCom will be failed automatically to prevent the mapped tasks from clogging the scheduler | 1024 |
| daemon_umask | The default umask to use for processes run in daemon mode (scheduler, worker, etc.). This controls the file-creation mode mask, which determines the initial value of file permission bits for newly created files. This value is treated as an octal integer | 0o077 |
| dataset_manager_class | Class to use as the dataset manager | — |
| dataset_manager_kwargs | Kwargs to supply to the dataset manager | — |
| database_access_isolation | Experimental feature. The flag that indicates whether components should use the Airflow Internal API for DB connectivity | false |
| internal_api_url | Experimental feature. Airflow Internal API URL. Only used if database_access_isolation is set to true | — |
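As a quick illustration, several of these options as they would appear in the [core] section of airflow.cfg (the values are illustrative):

[core]
dags_folder = /opt/airflow/dags
executor = CeleryExecutor
parallelism = 32
max_active_tasks_per_dag = 16
load_examples = False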
| Parameter | Description | Default value |
|---|---|---|
| sql_alchemy_conn | The SQLAlchemy connection string to the metadata database. The value of the parameter is automatically populated based on the input values in the Database settings section. It is not displayed in the UI for security reasons. SQLAlchemy supports many different database engines | — |
| sql_alchemy_engine_args | Extra engine-specific keyword args passed to SQLAlchemy's create_engine, as a JSON-encoded value | — |
| sql_engine_encoding | The encoding for the databases | utf-8 |
| sql_engine_collation_for_ids | Collation for the dag_id, task_id, key, and external_executor_id columns in case they have different encodings | — |
| sql_alchemy_pool_enabled | Whether SQLAlchemy should pool database connections | true |
| sql_alchemy_pool_size | The SQLAlchemy pool size is the maximum number of database connections in the pool | 5 |
| sql_alchemy_max_overflow | The maximum overflow size of the pool. When the number of checked-out connections reaches the size set in sql_alchemy_pool_size, additional connections are returned up to this limit | 10 |
| sql_alchemy_pool_recycle | The SQLAlchemy pool recycle is the number of seconds a connection can be idle in the pool before it is invalidated. This config does not apply to SQLite. If the number of DB connections is ever exceeded, a lower config value will allow the system to recover faster | 1800 |
| sql_alchemy_pool_pre_ping | Check connection at the start of each connection pool checkout | true |
| sql_alchemy_schema | The schema to use for the metadata database. SQLAlchemy supports databases with the concept of multiple schemas | — |
| sql_alchemy_connect_args | Import path for connection arguments in SQLAlchemy. Defaults to an empty dictionary. This is useful when you want to configure DB engine arguments that SQLAlchemy won't parse in the connection string | — |
| load_default_connections | Whether to load the default connections that ship with Airflow | true |
| max_db_retries | Number of times the code should be retried in case of DB operational errors. Not all transactions will be retried, as that can cause undesired state. Currently, it is only used in DagFileProcessor.process_file to retry dagbag.sync_to_db | 3 |
| check_migrations | Whether to run alembic migrations during Airflow startup. Sometimes this operation can be expensive, and the users can assert the correct version through other means (e.g. through a Helm chart). Accepts true or false | true |
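For instance, sql_alchemy_engine_args expects a JSON-encoded dictionary of create_engine keyword arguments; the keys below are an illustrative sketch:

{"pool_timeout": 30, "echo": false}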
| Parameter | Description | Default value |
|---|---|---|
| base_log_folder | The absolute path to the Airflow log files directory. There are a few existing configurations that assume this is set to the default. If you choose to override this, you may need to update the dag_processor_manager_log_location and child_process_log_directory settings as well | /var/log/airflow |
| remote_logging | Airflow can store logs remotely in AWS S3, Google Cloud Storage, or Elasticsearch. Set this to true to enable remote logging | false |
| remote_log_conn_id | Users must supply an Airflow connection ID that provides access to the storage location. Depending on your remote logging service, this may only be used for reading logs, not writing them | — |
| delete_local_logs | Whether the local log files for GCS, S3, WASB, and OSS remote logging should be deleted after they are uploaded to the remote location | false |
| google_key_path | Path to the Google Credential JSON file. If omitted, authorization based on the Application Default Credentials will be used | — |
| remote_base_log_folder | Storage bucket URL for remote logging. S3 buckets should start with s3://, Cloudwatch log groups should start with cloudwatch://, GCS buckets should start with gs://, WASB buckets should start with wasb://, and Stackdriver logs should start with stackdriver:// | — |
| remote_task_handler_kwargs | The remote_task_handler_kwargs param is loaded into a dictionary and passed to the __init__ of the remote task handler | — |
| encrypt_s3_logs | Use server-side encryption for logs stored in S3 | false |
| logging_level | Logging level. Supported values: CRITICAL, ERROR, WARNING, INFO, DEBUG | INFO |
| celery_logging_level | Logging level for Celery | WARNING |
| fab_logging_level | Logging level for the Flask-AppBuilder UI. Supported values: CRITICAL, ERROR, WARNING, INFO, DEBUG | WARNING |
| logging_config_class | The name of the class that specifies the logging configuration. This class has to be on the Python classpath | — |
| colored_console_log | Flag to enable/disable colored logs | true |
| colored_log_format | The log format for colored logs if they are enabled. The value must be wrapped in {% raw %}…{% endraw %} tags | {% raw %}[%%(blue)s%%(asctime)s%%(reset)s] {%%(blue)s%%(filename)s:%%(reset)s%%(lineno)d} %%(log_color)s%%(levelname)s%%(reset)s - %%(log_color)s%%(message)s%%(reset)s{% endraw %} |
| colored_formatter_class | Specifies the class utilized by Airflow to implement colored logging | airflow.utils.log.colored_log.CustomTTYColoredFormatter |
| log_format | Format of a log line. The value must be wrapped in {% raw %}…{% endraw %} tags | {% raw %}[%%(asctime)s] {%%(filename)s:%%(lineno)d} %%(levelname)s - %%(message)s{% endraw %} |
| simple_log_format | Defines the format of log messages for the simple logging configuration | %%(asctime)s %%(levelname)s - %%(message)s |
| dag_processor_log_target | Where to store DAG parser logs. If set to file, logs are sent to the log files defined by child_process_log_directory | file |
| dag_processor_log_format | DAG processor log line format. The value must be wrapped in {% raw %}…{% endraw %} tags | {% raw %}[%%(asctime)s] [SOURCE:DAG_PROCESSOR]{{%%(filename)s:%%(lineno)d}} %%(levelname)s - %%(message)s{% endraw %} |
| log_formatter_class | Determines the formatter class used by Airflow for structuring its log messages. The default formatter class is timezone-aware, which means that timestamps attached to log entries will be adjusted to reflect the local timezone of the Airflow instance | airflow.utils.log.timezone_aware.TimezoneAware |
| secret_mask_adapter | An import path to a function to add adaptations of each secret added with airflow.utils.log.secrets_masker.mask_secret to be masked in log messages | — |
| task_log_prefix_template | Prefix pattern used with the stream handler TaskHandlerWithCustomFormatter | — |
| log_filename_template | The format of generated Airflow file and path names for each task run. The value must be wrapped in {% raw %}…{% endraw %} tags | {% raw %}dag_id={{ ti.dag_id }}/run_id={{ ti.run_id }}/task_id={{ ti.task_id }}/{%% if ti.map_index >= 0 %%}map_index={{ ti.map_index }}/{%% endif %%}attempt={{ try_number }}.log{% endraw %} |
| log_processor_filename_template | The format of generated Airflow file and path names for logs. The value must be wrapped in {% raw %}…{% endraw %} tags | {% raw %}{{ filename }}.log{% endraw %} |
| dag_processor_manager_log_location | Full path of the dag_processor_manager log file | /var/log/airflow/dag_processor_manager/dag_processor_manager.log |
| task_log_reader | Name of the handler to read task instance logs. Defaults to the task handler | task |
| extra_logger_names | A comma-separated list of third-party logger names that will be configured to print messages to consoles | — |
| worker_log_server_port | When you start an Airflow worker, the service starts a tiny web server subprocess to serve the worker's local log files to the Airflow main web server, which then builds pages and sends them to users. This defines the port on which the logs are served. The port must be unused, open, and visible from the main web server to connect to the workers | 8793 |
| trigger_log_server_port | Port to serve logs from for the triggerer. See the worker_log_server_port description for more details | 8794 |
| interleave_timestamp_parser | Import path to a callable which takes a string log line and returns the timestamp (datetime.datetime compatible) | — |
| file_task_handler_new_folder_permissions | Permissions in the form of an octal string, as understood by chmod. The permissions are important when you use impersonation, when logs are written by a different user than airflow | 0o775 |
| file_task_handler_new_file_permissions | Permissions in the form of an octal string, as understood by chmod. The permissions are important when you use impersonation, when logs are written by a different user than airflow | 0o664 |
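A hedged example of enabling S3 remote logging with these [logging] options (the bucket name and connection ID are placeholders):

[logging]
remote_logging = True
remote_base_log_folder = s3://my-airflow-logs
remote_log_conn_id = aws_default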
| Parameter | Description | Default value |
|---|---|---|
| metrics_allow_list | If you want to avoid emitting all the available metrics, you can configure a list of prefixes (comma-separated) to send only the metrics that start with the elements of the list (e.g. scheduler,executor,dagrun) | — |
| metrics_block_list | If you want to avoid emitting all the available metrics, you can configure a list of prefixes (comma-separated) to filter out metrics that start with the elements of the list (e.g. scheduler,executor,dagrun) | — |
| statsd_on | Enables sending metrics to StatsD | true |
| statsd_host | Specifies the host address where the StatsD daemon (or server) is running | localhost |
| statsd_port | Specifies the port on which the StatsD daemon (or server) is listening | 8125 |
| statsd_prefix | Defines the namespace for all metrics sent from Airflow to StatsD | airflow |
| stat_name_handler | A function that validates the StatsD stat name, applies changes to the stat name if necessary, and returns the transformed stat name. The function should have the following signature: def func_name(stat_name: str) -> str | — |
| statsd_datadog_enabled | Enables Datadog integration to send Airflow metrics | false |
| statsd_datadog_tags | List of Datadog tags attached to all metrics (e.g. key1:value1,key2:value2) | — |
| statsd_datadog_metrics_tags | Set to false to disable metadata tags for some of the emitted metrics | true |
| statsd_custom_client_path | If you want to use your own custom StatsD client, set the relevant module path in this value. The module path must exist on your PYTHONPATH for Airflow to pick it up | — |
| statsd_disabled_tags | If you want to avoid sending all the available metrics tags to StatsD, you can configure a list of prefixes (comma-separated) to filter out metric tags that start with the elements of the list (e.g. job_id,run_id) | job_id,run_id |
| statsd_influxdb_enabled | Enables sending Airflow metrics with the StatsD-InfluxDB tagging convention | false |
| otel_on | Enables sending metrics to OpenTelemetry | false |
| otel_host | Specifies the hostname or IP address of the OpenTelemetry Collector to which Airflow sends traces | localhost |
| otel_port | Specifies the port of the OpenTelemetry Collector that is listening | 8889 |
| otel_prefix | The prefix for the Airflow metrics | airflow |
| otel_interval_milliseconds | Defines the interval, in milliseconds, at which Airflow sends batches of metrics and traces to the configured OpenTelemetry Collector | 60000 |
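For example, to send only scheduler and executor metrics to a local StatsD daemon (a sketch; the prefix list is illustrative):

[metrics]
statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow
metrics_allow_list = scheduler,executor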
| Parameter | Description | Default value |
|---|---|---|
| api_client | Defines in what way the CLI accesses the API. The local client uses the database directly, while the json client uses the API running on the webserver | airflow.api.client.local_client |
| endpoint_url | The URL of the API endpoint. If you set web_server_url_prefix, do not forget to append it here as well | http://localhost:8080 |
| Parameter | Description | Default value |
|---|---|---|
| fail_fast | Used only with DebugExecutor. If set to true, the DAG will fail with the first failed task. Helpful for debugging purposes | false |
| Parameter | Description | Default value |
|---|---|---|
| enable_experimental_api | Enables the experimental REST API, deprecated since version 2.0. These APIs do not have access control; the authenticated user has full access. Please consider using the stable REST API. For more information on migration, see RELEASE_NOTES.rst | false |
| auth_backends | Comma-separated list of auth backends to authenticate users of the API | airflow.api.auth.backend.session,airflow.api.auth.backend.basic_auth |
| maximum_page_limit | Used to set the maximum page limit for API requests. If the limit passed is greater than the maximum page limit, it will be ignored and the maximum page limit value will be used as the limit | 100 |
| fallback_page_limit | Used to set the default page limit when the limit param is zero or not provided in API requests. Otherwise, if a positive integer is passed as the limit, the smaller of the user-given limit and the maximum page limit is used | 100 |
| google_oauth2_audience | The intended audience for JWT token credentials used for authorization. This value must match on the client and server sides. If empty, the audience will not be tested | — |
| google_key_path | Path to the Google Cloud Service Account key file (JSON). If omitted, authorization based on the Application Default Credentials will be used | — |
| access_control_allow_headers | Used in response to a preflight request to indicate which HTTP headers can be used when making the actual request. This header is the server-side response to the browser's Access-Control-Request-Headers header | — |
| access_control_allow_methods | Specifies the method or methods allowed when accessing the resource | — |
| access_control_allow_origins | Indicates whether the response can be shared with requesting code from the given origins. Separate URLs with spaces | — |
| Parameter | Description | Default value |
|---|---|---|
| backend | What lineage backend to use | — |
| Parameter | Description | Default value |
|---|---|---|
| sasl_enabled | Enables SASL authentication for connecting to Atlas | false |
| host | Atlas host | — |
| port | Atlas connection port | 21000 |
| username | Username for connecting to Atlas | — |
| password | Password for connecting to Atlas | — |
| Parameter | Description | Default value |
|---|---|---|
| default_owner | The default owner assigned to each new operator, unless provided explicitly or passed via default_args | airflow |
| default_cpus | Indicates the default number of CPU units allocated to each operator when no specific CPU request is specified in the operator's configuration | 1 |
| default_ram | Indicates the default amount of RAM allocated to each operator when no specific RAM request is specified in the operator's configuration | 512 |
| default_disk | Indicates the default amount of disk storage allocated to each operator when no specific disk request is specified in the operator's configuration | 512 |
| default_gpus | Indicates the default number of GPUs allocated to each operator when no specific GPU request is specified in the operator's configuration | 0 |
| default_queue | Default queue that tasks get assigned to and that workers listen on | default |
| allow_illegal_arguments | Whether it is allowed to pass additional/unused arguments (args, kwargs) to the BaseOperator operator. If set to false, an exception is thrown; otherwise only a console message is displayed | false |
| Parameter | Description | Default value |
|---|---|---|
| default_hive_mapred_queue | Default MapReduce queue for HiveOperator tasks | — |
| mapred_job_name_template | Template for mapred_job_name in HiveOperator. Supports the following named parameters: hostname, dag_id, task_id, execution_date | — |
| Parameter | Description | Default value |
|---|---|---|
| base_url | The base URL of your website, as Airflow cannot guess what domain or cname you are using. This is used in automated emails that Airflow sends to point links to the right webserver | — |
| default_ui_timezone | Default timezone to display all dates in the UI. Can be UTC, system, or any IANA timezone string (e.g. Europe/Amsterdam) | UTC |
| web_server_host | The IP specified when starting the webserver | 0.0.0.0 |
| web_server_port | The port on which to run the webserver | 8080 |
| web_server_ssl_cert | Paths to the SSL certificate and key for the webserver. When both are provided, SSL will be enabled. This does not change the webserver port | — |
| web_server_ssl_key | Paths to the SSL certificate and key for the webserver. When both are provided, SSL will be enabled. This does not change the webserver port | — |
| session_backend | The type of backend used to store web session data. Can be database or securecookie | database |
| web_server_master_timeout | Number of seconds the webserver waits before killing a gunicorn master that doesn't respond | 120 |
| web_server_worker_timeout | Number of seconds the Gunicorn webserver waits before timing out on a worker | 120 |
| worker_refresh_batch_size | Number of workers to refresh at a time. When set to 0, worker refresh is disabled. When nonzero, Airflow periodically refreshes webserver workers by bringing up new ones and killing old ones | 1 |
| worker_refresh_interval | Number of seconds to wait before refreshing a batch of workers | 6000 |
| reload_on_plugin_change | If set to true, Airflow will track files in the plugins_folder directory. When it detects changes, the webserver is reloaded | false |
| secret_key | Secret key used to run your flask app. It should be as random as possible. However, when running more than one instance of the webserver, make sure all of them use the same secret_key; otherwise some requests will fail with CSRF errors | — |
| workers | Number of workers to run the Gunicorn webserver | 4 |
| worker_class | The worker class Gunicorn should use. Choices include sync, eventlet, and gevent | sync |
| access_logfile | Log files for the Gunicorn webserver. The "-" value means log to stderr | — |
| error_logfile | Log files for the Gunicorn webserver. The "-" value means log to stderr | — |
| access_logformat | Access log format for the Gunicorn webserver. The default format is %%(h)s %%(l)s %%(u)s %%(t)s "%%(r)s" %%(s)s %%(b)s "%%(f)s" "%%(a)s" | — |
| expose_config | Expose the configuration file in the webserver. Set to non-sensitive-only to show all values except those that have security implications. true shows all values; false hides the configuration completely | false |
| expose_hostname | Whether to expose hostname in the webserver | false |
| expose_stacktrace | Whether to expose stacktrace in the webserver | false |
| dag_default_view | Default DAG view. Valid values are: grid, graph, duration, gantt, landing_times | grid |
| dag_orientation | Default DAG orientation. Valid values are: LR (left to right), TB (top to bottom), RL (right to left), BT (bottom to top) | LR |
| log_fetch_timeout_sec | The amount of time (in seconds) the webserver will wait for the initial handshake while fetching logs from another worker machine | 5 |
| log_fetch_delay_sec | Time interval (in seconds) to wait before the next log fetch | 2 |
| log_auto_tailing_offset | Distance away from the page bottom to enable auto tailing | 30 |
| log_animation_speed | Animation speed for the auto-tailing log display | 1000 |
| hide_paused_dags_by_default | By default, the webserver shows paused DAGs. Flip this to hide paused DAGs by default | false |
| page_size | Consistent page size across all listing views in the UI | 100 |
| navbar_color | Defines the color of the navigation bar | #fff |
| default_dag_run_display_number | Default number of DAG runs to show in the UI | 25 |
| enable_proxy_fix | Enables the werkzeug ProxyFix middleware for reverse proxies | false |
| proxy_fix_x_for | Number of values to trust for X-Forwarded-For | 1 |
| proxy_fix_x_proto | Number of values to trust for X-Forwarded-Proto | 1 |
| proxy_fix_x_host | Number of values to trust for X-Forwarded-Host | 1 |
| proxy_fix_x_port | Number of values to trust for X-Forwarded-Port | 1 |
| proxy_fix_x_prefix | Number of values to trust for X-Forwarded-Prefix | 1 |
| cookie_secure | Sets the secure flag on the session cookie | false |
| cookie_samesite | Sets the same-site policy on the session cookie | Lax |
| default_wrap | Default setting for the wrap toggle on DAG code and TI log views | false |
| x_frame_enabled | Allows the UI to be rendered in a frame | true |
| analytics_tool | Whether to send anonymous user activity to your analytics tool. Supported values: google_analytics, segment, metarouter | — |
| analytics_id | Unique ID of your account in the analytics tool | — |
| show_recent_stats_for_completed_runs | Recent Tasks stats will show for old DagRuns if set | true |
| update_fab_perms | Whether to update FAB permissions and sync security manager roles on webserver startup | true |
| session_lifetime_minutes | The UI cookie lifetime in minutes. The user will be logged out from the UI after session_lifetime_minutes of inactivity | 43200 |
| instance_name | Sets a custom page title for the DAGs overview page and site title for all pages | — |
| instance_name_has_markup | Whether the custom page title for the DAGs overview page contains any markup language | false |
| auto_refresh_interval | How frequently, in seconds, the DAG data will auto-refresh in graph or grid view when auto-refresh is turned on | 3 |
| warn_deployment_exposure | Boolean for displaying a warning for a publicly viewable deployment | true |
| audit_view_excluded_events | Comma-separated string of view events to exclude from the DAG audit view. All other events will be added minus the ones passed here. The audit logs in the DB will not be affected by this parameter | gantt,landing_times,tries,duration,calendar,graph,grid,tree,tree_data |
| audit_view_included_events | Comma-separated string of view events to include in the DAG audit view. If passed, only these events will populate the DAG audit view. The audit logs in the DB will not be affected by this parameter | — |
| enable_swagger_ui | Boolean for running SwaggerUI in the webserver | true |
| run_internal_api | Boolean for running the Internal API in the webserver | false |
| auth_rate_limited | Boolean for enabling rate limiting on authentication endpoints | true |
| auth_rate_limit | Rate limit for authentication endpoints | 5 per 40 second |
| caching_hash_method | The caching algorithm used by the webserver. Must be a valid hashlib function name | md5 |
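For instance, to serve the web UI over HTTPS, both SSL options must be set together (the paths and host name are placeholders):

[webserver]
base_url = https://airflow.example.com:8080
web_server_ssl_cert = /etc/ssl/certs/airflow.crt
web_server_ssl_key = /etc/ssl/private/airflow.key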
| Parameter | Description | Default value |
|---|---|---|
| email_backend | Email backend to use | airflow.utils.email.send_email_smtp |
| email_conn_id | An Airflow connection that contains SMTP credentials | smtp_default |
| default_email_on_retry | Whether email alerts should be sent when a task is retried | true |
| default_email_on_failure | Whether email alerts should be sent when a task fails | true |
| subject_template | File that will be used as the template for the email subject (which will be rendered using Jinja2). If not set, Airflow uses a base template | — |
| html_content_template | File that will be used as the template for the email content (which will be rendered using Jinja2). If not set, Airflow uses a base template | — |
| from_email | Email address that will be used as the sender address. It can be either a raw email address or the complete address in the format Sender Name <sender@email.com> | — |
| Parameter | Description | Default value |
|---|---|---|
| smtp_host | Specifies the host server address used by Airflow when sending out email notifications via SMTP | localhost |
| smtp_starttls | Determines whether to use the STARTTLS command when connecting to the SMTP server | true |
| smtp_ssl | Determines whether to use an SSL connection when talking to the SMTP server | false |
| smtp_user | Username to authenticate with when connecting to the SMTP server | — |
| smtp_password | Password to authenticate with when connecting to the SMTP server | — |
| smtp_port | Defines the port number on which Airflow connects to the SMTP server to send email notifications | 25 |
| smtp_mail_from | Specifies the default sender email address used when Airflow sends email notifications | airflow@example.com |
| smtp_timeout | Determines the maximum time (in seconds) Airflow will wait for a connection to the SMTP server to be established | 30 |
| smtp_retry_limit | Defines the maximum number of times Airflow will attempt to connect to the SMTP server | 5 |
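A hedged [smtp] example for a relay that requires STARTTLS (the host and sender address are placeholders):

[smtp]
smtp_host = mail.example.com
smtp_starttls = True
smtp_ssl = False
smtp_port = 587
smtp_mail_from = airflow@example.com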
| Parameter | Description | Default value |
|---|---|---|
| sentry_on | Enables error reporting to Sentry | false |
| sentry_dsn | A Sentry DSN URL | — |
| before_send | Dotted path to a before_send function that the Sentry SDK should be configured to use | — |
| Parameter | Description | Default value |
|---|---|---|
| kubernetes_queue | Defines when to send a task to KubernetesExecutor when using LocalKubernetesExecutor. When a task's queue is the value of kubernetes_queue, the task is executed via KubernetesExecutor; otherwise it is executed via LocalExecutor | kubernetes |
| Parameter | Description | Default value |
|---|---|---|
| kubernetes_queue | Defines when to send a task to KubernetesExecutor when using CeleryKubernetesExecutor. When a task's queue is the value of kubernetes_queue, the task is executed via KubernetesExecutor; otherwise it is executed via CeleryExecutor | kubernetes |
| Parameter | Description | Default value |
|---|---|---|
| celery_app_name | The app name that will be used by Celery | airflow.executors.celery_executor |
| worker_concurrency | The concurrency that will be used when starting workers with the airflow celery worker command. This defines the number of task instances that a worker will take, so size up your workers based on the resources on your worker box and the nature of your tasks | 16 |
| worker_autoscale | The maximum and minimum concurrency that will be used when starting workers with the airflow celery worker command. When this option is set, worker_concurrency is ignored | — |
| worker_prefetch_multiplier | Used to increase the number of tasks that a worker prefetches, which can improve performance. The number of processes multiplied by worker_prefetch_multiplier is the number of tasks that are prefetched by a worker | 1 |
| worker_enable_remote_control | Specifies if remote control of the workers is enabled. In some cases, when the broker does not support remote control, Celery creates lots of .*reply-celery-pidbox queues. You can prevent this by setting this parameter to false | true |
| broker_url | The Celery broker URL. Celery supports RabbitMQ, Redis, and, experimentally, a SQLAlchemy database. Refer to the Celery documentation for more information | redis://{{groups['redis.server'][0]\|d(omit)}}:6379/0 |
| result_backend | The Celery backend for storing job metadata. When a job finishes, it needs to update the metadata of the job. Therefore, it will post a message on a message bus or insert it into a database (depending on the backend). This status is used by the scheduler to update the state of the task. The use of a database is highly recommended. When not specified, sql_alchemy_conn with a db+ scheme prefix is used | — |
| result_backend_sqlalchemy_engine_options | Optional configuration dictionary to pass to the Celery result backend SQLAlchemy engine | — |
| flower_host | Celery Flower is a sweet UI for Celery. Airflow has a shortcut to start it: airflow celery flower. This defines the IP that Celery Flower runs on | 0.0.0.0 |
| flower_url_prefix | The root URL for Flower | — |
| flower_port | The port that Celery Flower runs on | 5555 |
| flower_basic_auth | Enables basic authentication for Flower. This parameter takes a string of user:password pairs separated by a comma | — |
| sync_parallelism | How many processes CeleryExecutor uses to sync task state. 0 means to use max(1, number of cores - 1) processes | 0 |
| celery_config_options | Import path for Celery configuration options | airflow.config_templates.default_celery.DEFAULT_CELERY_CONFIG |
| ssl_active | Defines if SSL is active for Airflow | false |
| ssl_key | Path to the client key | — |
| ssl_cert | Path to the client certificate | — |
| ssl_cacert | Path to the CA certificate | — |
| pool | Celery pool implementation. Possible choices are: prefork (default), eventlet, gevent, solo | prefork |
| operation_timeout | The number of seconds to wait before timing out send_task_to_executor or fetch_celery_task_state operations | 1 |
| task_track_started | Celery task will report its status as started when the task is executed by a worker | true |
| task_publish_max_retries | The maximum number of retries for publishing task messages to the broker when failing due to AirflowTaskTimeout errors | 3 |
| worker_precheck | Worker initialization check to validate the metadata database connection | false |
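A minimal [celery] sketch that points workers at a Redis broker and stores results in the metadata database (the addresses and credentials are placeholders):

[celery]
broker_url = redis://redis-host:6379/0
result_backend = db+postgresql://airflow:<password>@<database-host>:5432/airflow
worker_concurrency = 16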
| Parameter | Description | Default value |
|---|---|---|
| visibility_timeout | The visibility timeout defines the number of seconds to wait for the worker to acknowledge the task before the message is redelivered to another worker. Make sure to increase the visibility timeout to match the time of the longest ETA you are planning to use | — |
| Parameter | Description | Default value |
|---|---|---|
| cluster_address | The IP address and port of the Dask cluster's scheduler | 127.0.0.1:8786 |
| tls_ca | TLS/SSL settings to access a secured Dask scheduler: the CA certificate | — |
| tls_cert | TLS certificate | — |
| tls_key | TLS certificate key | — |
| Parameter | Description | Default value |
|---|---|---|
| job_heartbeat_sec | Defines the frequency (in seconds) at which task instances should listen for an external kill signal (when you clear tasks from the CLI or the UI) | 5 |
| scheduler_heartbeat_sec | The scheduler constantly tries to trigger new tasks. This defines how often the scheduler should run (in seconds) | 5 |
| num_runs | The number of times to try to schedule each DAG file. -1 indicates an unlimited number | -1 |
| scheduler_idle_sleep_time | Controls how long the scheduler will sleep between loops when there was nothing to do. If something was scheduled, the next loop iteration starts straight away | 1 |
| min_file_process_interval | Number of seconds after which a DAG file is parsed. The DAG file is parsed every min_file_process_interval seconds; updates to DAGs are reflected after this interval | 30 |
| parsing_cleanup_interval | How often (in seconds) to check for stale DAGs (DAGs which are no longer present in the expected files) which should be deactivated, as well as datasets that are no longer referenced and should be marked as orphaned | 60 |
| stale_dag_threshold | How long (in seconds) to wait after we have re-parsed a DAG file before deactivating stale DAGs (DAGs which are no longer present in the expected files). The absolute maximum that this could take is dag_file_processor_timeout | 50 |
| dag_dir_list_interval | How often (in seconds) to scan the DAGs directory for new files. Defaults to 5 minutes | 300 |
| print_stats_interval | How often should stats be printed to the logs. Setting to 0 disables printing stats | 30 |
| pool_metrics_interval | How often (in seconds) should pool usage stats be sent to StatsD (if statsd_on is enabled) | 5 |
| scheduler_health_check_threshold | If the last scheduler heartbeat happened more than scheduler_health_check_threshold seconds ago, the scheduler is considered unhealthy | 30 |
| enable_health_check | When you start a scheduler, Airflow starts a tiny webserver subprocess to serve a health check if this is set to true | false |
| scheduler_health_check_server_port | When you start a scheduler, Airflow starts a tiny webserver subprocess to serve a health check on this port | 8974 |
| orphaned_tasks_check_interval | How often (in seconds) should the scheduler check for orphaned tasks and SchedulerJobs | 300 |
| child_process_log_directory | Determines the directory where logs for the child processes of the scheduler will be stored | /var/log/airflow/scheduler |
| scheduler_zombie_task_threshold | Local task jobs periodically heartbeat to the DB. If the job has not heartbeat in this many seconds, the scheduler will mark the associated task instance as failed and will re-schedule the task | 300 |
| zombie_detection_interval | How often (in seconds) should the scheduler check for zombie tasks | 10 |
| catchup_by_default | Turn off scheduler catchup by setting this to false. Command-line backfills still work; the setting can also be overridden on a per-DAG basis via the catchup argument | true |
| ignore_first_depends_on_past_by_default | Setting this to true makes the first task instance of a DAG ignore the depends_on_past setting | true |
| max_tis_per_query | This changes the batch size of queries in the scheduling main loop. If this is too high, SQL query performance may be impacted by the complexity of the query predicate and/or excessive locking. Additionally, you may hit the maximum allowable query length for your db | 512 |
| use_row_level_locking | Should the scheduler issue SELECT ... FOR UPDATE in relevant queries. If this is set to false, you should not run more than a single scheduler at once | true |
| max_dagruns_to_create_per_loop | Max number of DAGs to create DagRuns for per scheduler loop | 10 |
| max_dagruns_per_loop_to_schedule | How many DagRuns should a scheduler examine (and lock) when scheduling and queuing tasks | 20 |
| schedule_after_task_execution | Should the Task supervisor process perform a "mini scheduler" to attempt to schedule more tasks of the same DAG. Leaving this on will mean tasks in the same DAG execute quicker, but might starve out other DAGs in some circumstances | true |
| parsing_pre_import_modules | The scheduler reads DAG files to extract the Airflow modules that are going to be used, and imports them ahead of time to avoid having to re-do it for each parsing process. This flag can be set to false to disable this behavior in case an Airflow module needs to be freshly imported by each parsing process | true |
| parsing_processes | The scheduler can run multiple processes in parallel to parse DAGs. This defines how many processes will run | 2 |
| file_parsing_sort_mode | Determines how the scheduler lists and sorts DAG files to decide the parsing order. One of three values can be specified: modified_time, random_seeded_by_host, alphabetical | modified_time |
| standalone_dag_processor | Whether the DAG processor is running as a standalone process or is a subprocess of a scheduler job | true |
| max_callbacks_per_loop | Only applicable if standalone_dag_processor is used. The maximum number of callbacks that are fetched during a single loop | 20 |
| dag_stale_not_seen_duration | Only applicable if standalone_dag_processor is used. Time in seconds after which DAGs that were not updated by the DAG processor are deactivated | 600 |
| use_job_schedule | Turn off scheduler use of cron intervals by setting this to false. DAGs submitted manually in the web UI or with trigger_dag will still run | true |
| allow_trigger_in_future | Allows externally triggered DagRuns for execution dates in the future. Only has effect if schedule_interval is set to None in the DAG | false |
| trigger_timeout_check_interval | How often to check for expired trigger requests that have not run yet | 15 |
| task_queued_timeout | Amount of time a task can be in the queued state before being retried or set to failed | 600 |
| task_queued_timeout_check_interval | How often to check for tasks that have been in the queued state for longer than task_queued_timeout | 120 |
| allowed_run_id_pattern | The pattern used to validate the run_id of user-provided DAG runs | ^[A-Za-z0-9_.~:+-]+$ |
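For example, DAG parsing behavior is governed by a handful of these [scheduler] options (the values shown mirror the defaults in the table):

[scheduler]
min_file_process_interval = 30
dag_dir_list_interval = 300
parsing_processes = 2
standalone_dag_processor = True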
| Parameter | Description | Default value |
|---|---|---|
| default_capacity | How many triggers a single Triggerer will run at once, by default | 1000 |
| job_heartbeat_sec | How often to heartbeat the Triggerer job to ensure it hasn't been killed | 5 |
| Parameter | Description | Default value |
|---|---|---|
| ccache | Location of your ccache file once kinit has been performed | /opt/airflow/krb5_ccache |
| principal | Kerberos principal | — |
| reinit_frequency | Kerberos reinit frequency, in seconds | 3600 |
| kinit_path | Path to the kinit executable | kinit |
| keytab | Designates the path to the Kerberos keytab file for the Airflow user | — |
| forwardable | Allows you to disable ticket forwardability | true |
| include_ip | Allows you to remove the source IP from the token, which is useful when using the token behind a NATted Docker host | true |
| Parameter | Description | Default value |
|---|---|---|
| host | Elasticsearch host | — |
| log_id_template | Format of the log_id used when retrieving logs from Elasticsearch. The value must be wrapped in {% raw %}…{% endraw %} tags | {% raw %}{dag_id}-{task_id}-{run_id}-{map_index}-{try_number}{% endraw %} |
| end_of_log_mark | Used to mark the end of a log stream for a task | end_of_log |
| frontend | Qualified URL for an Elasticsearch frontend (like Kibana) with a template argument for log_id | — |
| write_stdout | Write the task logs to the stdout of the worker, rather than the default files | false |
| json_format | Instead of the default log formatter, write the log lines as JSON | false |
| json_fields | Log fields to attach to the JSON output, if enabled | asctime, filename, lineno, levelname, message |
| host_field | The field where the host name is stored (normally either host or host.name) | host |
| offset_field | The field where the offset is stored (normally either offset or log.offset) | offset |
| index_patterns | Comma-separated list of index patterns to use when searching for logs | _all |
| Parameter | Description | Default value |
|---|---|---|
| use_ssl | Whether to use SSL for connections | false |
| verify_certs | Whether to verify SSL certificates. Set to false to disable certificate verification | true |
| Parameter | Description | Default value |
|---|---|---|
| api_client_retry_configuration | Kwargs to override the default urllib3 Retry used in the Kubernetes API client | — |
| pod_template_file | Path to the YAML pod file that forms the basis for KubernetesExecutor workers | — |
| worker_container_repository | The repository of the Kubernetes image for the worker to run | — |
| worker_container_tag | The tag of the Kubernetes image for the worker to run | — |
| namespace | The Kubernetes namespace where Airflow workers should be created. Defaults to default | default |
| delete_worker_pods | If true, all worker pods will be deleted upon termination | true |
| delete_worker_pods_on_failure | If false (and delete_worker_pods is true), failed worker pods are not deleted, so users can investigate them. This only prevents removal of worker pods where the worker itself failed, not when the task it ran failed | false |
| worker_pods_creation_batch_size | Number of Kubernetes worker pod creation calls per scheduler loop. The current default of 1 launches only a single pod per loop; it is recommended to increase this number to match the tolerance of your Kubernetes cluster | 1 |
| multi_namespace_mode | Allows users to launch pods in multiple namespaces. Will require creating a cluster role for the scheduler, or use the multi_namespace_mode_namespace_list configuration | false |
| multi_namespace_mode_namespace_list | If multi_namespace_mode is true while the scheduler does not have a cluster role, give the list of namespaces where the scheduler will schedule jobs | — |
| in_cluster | Whether to use the service account that Kubernetes gives to pods to connect to the Kubernetes cluster. It is intended for clients that expect to be running inside a pod running on Kubernetes. It will raise an exception if called from a process not running in a Kubernetes environment | true |
| cluster_context | When running with in_cluster set to false, change the default cluster_context or config_file options to the Kubernetes client. Leave blank to use the default behavior | — |
| config_file | Path to the Kubernetes configfile to be used when in_cluster is set to false | — |
| kube_client_request_args | Keyword parameters to pass while calling Kubernetes client core_v1_api methods from the Kubernetes executor | — |
| delete_option_kwargs | Optional keyword arguments to pass to the delete_namespaced_pod Kubernetes client core_v1_api method when deleting worker pods | — |
| enable_tcp_keepalive | Enables the TCP keepalive mechanism. This prevents Kubernetes API requests from hanging indefinitely when an idle connection is timed out by services like cloud load balancers or firewalls | true |
| tcp_keep_idle | When the enable_tcp_keepalive option is enabled, TCP probes a connection that has been idle for this amount of time (in seconds) | 120 |
| tcp_keep_intvl | When the enable_tcp_keepalive option is enabled, TCP retries sending the probe every tcp_keep_intvl seconds if the connection has not responded | 30 |
| tcp_keep_cnt | When the enable_tcp_keepalive option is enabled, TCP retries sending the probe tcp_keep_cnt times before the connection is considered broken | 6 |
| verify_ssl | Set this to false to skip verifying the SSL certificate of the Kubernetes Python client | true |
| worker_pods_queued_check_interval | How often in seconds to check for task instances stuck in the queued status without pods | 60 |
| ssl_ca_cert | Path to a CA certificate to be used by the Kubernetes client to verify the server's SSL certificate | — |
| Parameter | Description | Default value |
|---|---|---|
| default_timeout | Sensor default timeout, 7 days by default (7 * 24 * 60 * 60 seconds) | 604800 |
This field enables adding custom parameters to the airflow.cfg configuration file.
| Parameter | Description | Default value |
|---|---|---|
| AIRFLOW_HOME | The home directory for the Airflow service | /opt/airflow |
| AIRFLOW_CONFIG | The location of the Airflow configuration file | /opt/airflow/airflow.cfg |
| AIRFLOW_PYTHON_PATH | The location of the Python interpreter used by Airflow | /opt/airflow/bin/python3.10 |
| DAG_PROCESSOR_SUBDIR | The location of Airflow stored DAGs | /opt/airflow/dags |
This field enables adding custom parameters to the airflow.cfg configuration file.
| Parameter | Description | Default value |
|---|---|---|
| AUTH_LDAP_SERVER | The LDAP server URI | — |
| AUTH_LDAP_BIND_USER | The path of the LDAP proxy user to bind on to the top level | — |
| AUTH_LDAP_BIND_PASSWORD | The password of the bind user | — |
| AUTH_LDAP_SEARCH | The LDAP path under which you'd like the users to have access to Airflow | — |
| AUTH_LDAP_UID_FIELD | The UID (unique identifier) field in LDAP | — |
| AUTH_ROLES_MAPPING | The parameter for mapping the internal roles to the LDAP Active Directory groups | — |
| AUTH_LDAP_GROUP_FIELD | The LDAP user attribute which has their role DNs | — |
| AUTH_ROLES_SYNC_AT_LOGIN | A flag that indicates if all the user's roles should be replaced on each login, or only on registration | true |
| PERMANENT_SESSION_LIFETIME | Sets an inactivity timeout after which users have to re-authenticate (to keep roles in sync) | 1800 |
| AUTH_LDAP_USE_TLS | Boolean indicating whether TLS is used | false |
| AUTH_LDAP_ALLOW_SELF_SIGNED | Boolean to allow self-signed certificates | true |
| AUTH_LDAP_TLS_CACERTFILE | Location of the CA certificate | — |
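As these are Flask-AppBuilder settings, they are usually written as plain assignments; a hedged sketch with placeholder server address and DNs:

AUTH_LDAP_SERVER = "ldap://ldap.example.com:389"
AUTH_LDAP_BIND_USER = "cn=proxyuser,dc=example,dc=com"
AUTH_LDAP_SEARCH = "ou=users,dc=example,dc=com"
AUTH_LDAP_UID_FIELD = "uid"
AUTH_ROLES_SYNC_AT_LOGIN = True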
| Parameter | Description | Default value |
|---|---|---|
| Extra requirements | List of Python packages to be installed on Airflow hosts. Uses the standard requirements.txt format | — |
| index-url | Base URL of the Python Package Index (default: https://pypi.org/simple). The URL must point to a repository that complies with PEP 503 (the simple API) or to a local directory with the same structure | — |
| index-url-user | Username used for authenticating with the repository specified in index-url | — |
| index-url-password | Password used for authenticating with the repository specified in index-url | — |
| proxy | Address of the proxy server through which package installation requests will be routed | — |
| proxy-user | Username for authenticating with the proxy server | — |
| proxy-password | Password used for authenticating with the proxy server | — |
| trusted-host | IP address of the host or the host:port pair to be marked as trusted, even though it does not have valid or any HTTPS | — |
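The Extra requirements field follows the standard requirements.txt syntax; the package names and version pins below are illustrative:

apache-airflow-providers-ssh
pandas==2.0.3
requests>=2.28,<3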
| Parameter | Description | Default value |
|---|---|---|
| Enable custom ulimits | Switch on the corresponding toggle button to specify resource limits (ulimits) for the current process. If you do not set these values, the default system settings are used. Ulimit settings are described in the Ulimit settings table | — |
| Parameter | Description | Default value |
|---|---|---|
| auto_refresh | Enables automatic refresh for the Workers view. By default, the Workers view automatically refreshes at regular intervals to provide up-to-date information about the workers. Set this option to false to disable the automatic refresh | true |
| ca_cert | Sets the path to the ca_certs file containing a set of concatenated certification authority certificates | — |
| cert_file | Sets the path to the SSL certificate file | — |
| keyfile | Sets the path to the SSL key file | — |
| db | Sets the database file to use if persistent mode is enabled | flower |
| tasks_columns | Specifies the list of comma-separated columns to display on the Tasks page | name,uuid,state,args,kwargs,result,received,started,runtime,worker |
| persistent | When persistent mode is enabled, Flower saves its current state and reloads it upon restart. This ensures that Flower retains its state and configuration across restarts. Flower stores its state in a database file specified by the db option | false |
| debug | Enables the debug mode | false |
| enable_events | When enabled, Flower periodically enables Celery events | false |
| inspect_timeout | Sets the timeout for the worker inspect commands, in milliseconds | 1000 |
| max_workers | Sets the maximum number of workers to keep in memory | 5000 |
| max_tasks | Sets the maximum number of tasks to keep in memory | 100000 |
| natural_time | Enables showing time relative to the page refresh time in a human-readable format | false |
| state_save_interval | Sets the interval for saving the Flower state. The Flower state includes information about workers and tasks. The state is saved periodically to ensure data persistence and recovery upon restart | 100000 |
| xheaders | Enables support for the X-Real-Ip and X-Scheme headers | false |
| purge_offline_workers | Time (in seconds) after which offline workers are automatically removed from the Workers view. By default, offline workers remain on the dashboard indefinitely | — |
| task_runtime_metric_buckets | Sets the task runtime latency buckets. You can provide the buckets value as a comma-separated list of values | — |
| auth_provider | Sets the authentication provider for Flower | — |
| auth | Enables authentication. The value is a regexp of email addresses to grant access to | — |
| oauth2_key | Sets the OAuth 2.0 key (client ID) issued by the OAuth 2.0 provider | — |
| oauth2_secret | Sets the OAuth 2.0 secret issued by the OAuth 2.0 provider | — |
| oauth2_redirect_uri | Sets the URI to which an OAuth 2.0 server redirects the user after successful authentication and authorization | — |
| cookie_secret | Sets a secret key for signing cookies | — |
| Enable custom ulimits | Switch on the corresponding toggle button to specify resource limits (ulimits) for the current process. If you do not set these values, the default system settings are used. Ulimit settings are described in the Ulimit settings table | — |
| Parameter | Description | Corresponding option of the ulimit command in CentOS |
|---|---|---|
| LimitCPU | A limit, in seconds, on the amount of CPU time that a process can consume | cpu time (-t) |
| LimitFSIZE | The maximum size of files that a process can create, in 512-byte blocks | file size (-f) |
| LimitDATA | The maximum size of a process's data segment, in kilobytes | data seg size (-d) |
| LimitSTACK | The maximum stack size allocated to a process, in kilobytes | stack size (-s) |
| LimitCORE | The maximum size of a core dump file allowed for a process, in 512-byte blocks | core file size (-c) |
| LimitRSS | The maximum amount of RAM (resident set size) that can be allocated to a process, in kilobytes | max memory size (-m) |
| LimitNOFILE | The maximum number of open file descriptors allowed for the process | open files (-n) |
| LimitAS | The maximum size of the process virtual memory (address space), in kilobytes | virtual memory (-v) |
| LimitNPROC | The maximum number of processes | max user processes (-u) |
| LimitMEMLOCK | The maximum memory size that can be locked for the process, in kilobytes. Memory locking ensures the memory is always in RAM and a swap file is not used | max locked memory (-l) |
| LimitLOCKS | The maximum number of files locked by a process | file locks (-x) |
| LimitSIGPENDING | The maximum number of signals that are pending for delivery to the calling thread | pending signals (-i) |
| LimitMSGQUEUE | The maximum number of bytes in POSIX message queues. POSIX message queues allow processes to exchange data in the form of messages | POSIX message queues (-q) |
| LimitNICE | The maximum NICE priority level that can be assigned to a process | scheduling priority (-e) |
| LimitRTPRIO | The maximum real-time scheduling priority level | real-time priority (-r) |
| LimitRTTIME | A limit, in microseconds, on the amount of CPU time that a process scheduled under a real-time policy can consume without making a blocking system call | — |
GitSync
A JSON configuration with repository parameters for GitSync.
| Parameter | Description | Default value |
|---|---|---|
| url | Git repository URL | git@ssh.gitlab.example.io:org/repo.git |
| files | File filter pattern | *.py |
| branch | Git branch | main |
| directory | Path inside the repository | ./src |
| sync_interval | Interval between sync cycles (seconds) | 60 |
| sync_timeout | Timeout for a sync operation | 120 |
| target_folder | Destination directory. Each repository must use a unique target_folder | /path/to/target/folder |
| delete_old_files | Whether to remove outdated files | true |
| ssh_key | Name of the private key. When using SSH repositories, private keys are managed by the GitSync service and do not need to be manually placed on the host. Required for SSH repositories | — |
| tag | Git tag to use instead of a branch for repository synchronization | v1.0.0 |
| sync_requirements | Enables synchronization and installation of Python dependencies from a requirements.txt file | true |
| requirements_path | Path to the requirements file | ./requirements.txt |
| access_token | Access token for authentication when using HTTPS repositories | — |
| https_username | Username for HTTPS authentication when using access tokens | oauth2 |
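A hedged example of a single repository entry combining the parameters above (the URL and paths are placeholders; the exact top-level structure of config.json may differ):

{
  "url": "git@ssh.gitlab.example.io:org/repo.git",
  "branch": "main",
  "files": "*.py",
  "directory": "./src",
  "target_folder": "/opt/airflow/dags/repo",
  "sync_interval": 60,
  "sync_timeout": 120,
  "delete_old_files": true,
  "sync_requirements": true,
  "requirements_path": "./requirements.txt"
}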
| Parameter | Description | Default value |
|---|---|---|
TARGET_PYTHON |
Python binary used for synchronizing dependencies. Defined at the service level and shared across all repositories. Separate Python environments per repository are not supported |
/usr/lib/ado-python310/bin/python3 |
WORK_DIR |
Internal directory used by the GitSync application for storing temporary data and repository copies |
/usr/lib/ad-gitsync |
LOG_DIR |
Directory for storing application logs |
/var/log/ad-gitsync |
CREDENTIALS_DIRECTORY |
Directory containing SSH keys and credentials for accessing Git repositories |
/etc/ad-gitsync/ssh_key |
CONFIG_FILE |
Path to the JSON configuration file that defines repository settings and synchronization parameters |
/etc/ad-gitsync/conf/config.json |
WORKER_COUNT |
Number of parallel workers that can process multiple repositories simultaneously |
1 |
You can use the Custom gitsync_env.sh field to set configuration parameters for GitSync. The settings specified in this field have higher priority than the settings specified in gitsync_env.sh.
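A minimal sketch of such an override (the value is illustrative):
# Process up to 4 repositories in parallel (the default is 1)
WORKER_COUNT=4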
Switch on the corresponding toggle button to specify resource limits (ulimits) for the current process. If you do not set these values, the default system settings are used. Ulimit settings are described in the Ulimit settings table.
[Service]
LimitCPU=
LimitFSIZE=
LimitDATA=
LimitSTACK=
LimitCORE=
LimitRSS=
LimitNOFILE=
LimitAS=
LimitNPROC=
LimitMEMLOCK=
LimitLOCKS=
LimitSIGPENDING=
LimitMSGQUEUE=
LimitNICE=
LimitRTPRIO=
LimitRTTIME=
| Parameter | Description | Corresponding option of the ulimit command in CentOS |
|---|---|---|
LimitCPU |
A limit in seconds on the amount of CPU time that a process can consume |
cpu time ( -t) |
LimitFSIZE |
The maximum size of files that a process can create, in 512-byte blocks |
file size ( -f) |
LimitDATA |
The maximum size of a process’s data segment, in kilobytes |
data seg size ( -d) |
LimitSTACK |
The maximum stack size allocated to a process, in kilobytes |
stack size ( -s) |
LimitCORE |
The maximum size of a core dump file allowed for a process, in 512-byte blocks |
core file size ( -c) |
LimitRSS |
The maximum amount of RAM (resident set size) that can be allocated to a process, in kilobytes |
max memory size ( -m) |
LimitNOFILE |
The maximum number of open file descriptors allowed for the process |
open files ( -n) |
LimitAS |
The maximum size of the process virtual memory (address space), in kilobytes |
virtual memory ( -v) |
LimitNPROC |
The maximum number of processes |
max user processes ( -u) |
LimitMEMLOCK |
The maximum memory size that can be locked for the process, in kilobytes. Memory locking ensures the memory is always in RAM and a swap file is not used |
max locked memory ( -l) |
LimitLOCKS |
The maximum number of files locked by a process |
file locks ( -x) |
LimitSIGPENDING |
The maximum number of signals that are pending for delivery to the calling thread |
pending signals ( -i) |
LimitMSGQUEUE |
The maximum number of bytes in POSIX message queues. POSIX message queues allow processes to exchange data in the form of messages |
POSIX message queues ( -q) |
LimitNICE |
The maximum NICE priority level that can be assigned to a process |
scheduling priority ( -e) |
LimitRTPRIO |
The maximum real-time scheduling priority level |
real-time priority ( -r) |
LimitRTTIME |
A limit, in microseconds, on the amount of CPU time that a process scheduled under a real-time policy can consume without making a blocking system call |
— |
Monitoring
| Parameter | Description | Default value |
|---|---|---|
scrape_interval |
Specifies how frequently to scrape targets |
1m |
scrape_timeout |
Specifies how long to wait until a scrape request times out |
10s |
Password for Grafana connection |
Password of a Grafana user to connect to Prometheus |
— |
Prometheus users to login/logout to Prometheus |
User credentials for logging into the Prometheus web interface |
— |
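The scrape settings correspond to the global section of prometheus.yml, for example:
global:
  # How often Prometheus scrapes its targets
  scrape_interval: 1m
  # How long a single scrape request may take
  scrape_timeout: 10s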
| Parameter | Description | Default value |
|---|---|---|
config.file |
Path to the main Prometheus configuration file, which defines scrape jobs, alerting rules, and other settings |
/etc/admprom/prometheus/prometheus.yml |
storage.tsdb.path |
Directory where Prometheus stores its time series database (TSDB) files |
/var/lib/admprom/prometheus |
web.console.libraries |
Location of console library files used for rendering the Prometheus UI consoles |
/usr/share/admprom/prometheus/console_libraries |
web.console.templates |
Directory containing console templates for the Prometheus UI |
/usr/share/admprom/prometheus/consoles |
web.config.file |
Path to the web configuration file, used for authentication, TLS, and other web server settings |
/etc/admprom/prometheus/prometheus-auth.yml |
storage.tsdb.retention.time |
Defines how long to retain data in the time series database before deletion |
15d |
web.listen-address |
IP address and port where the Prometheus web interface and API listen for incoming connections |
0.0.0.0:11200 |
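These parameters are passed to Prometheus as command-line flags; an illustrative invocation with the defaults above:
prometheus \
  --config.file=/etc/admprom/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/admprom/prometheus \
  --storage.tsdb.retention.time=15d \
  --web.listen-address=0.0.0.0:11200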
| Parameter | Description | Default value |
|---|---|---|
Grafana administrator’s password |
Password of a Grafana administrator user |
— |
Grafana listen port |
Port to access the Grafana web interface |
11210 |
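The listen port maps to the http_port setting in the [server] section of grafana.ini, for example:
[server]
http_port = 11210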
| Parameter | Description | Default value |
|---|---|---|
Listen port |
Port on which a host’s system metrics are exposed in the Prometheus format |
11203 |
Metrics endpoint |
Endpoint to retrieve system metrics |
/metrics |
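These two parameters correspond to the standard Node Exporter flags; an illustrative invocation:
node_exporter \
  --web.listen-address=:11203 \
  --web.telemetry-path=/metrics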
| Parameter | Description | Default value |
|---|---|---|
[Prometheus] → Enable SSL |
Defines whether SSL is enabled for Prometheus |
false |
[Prometheus] → Certificate file |
Path to the Prometheus server SSL certificate file in the PEM format |
/etc/admprom/prometheus/server.crt |
[Prometheus] → Private key file |
Path to the private key file of the Prometheus server SSL certificate |
/etc/admprom/prometheus/server.key |
[Prometheus] → Certificate authority file |
Path to the certificate authority file |
/etc/admprom/prometheus/ca.crt |
[Grafana] → Enable SSL |
Defines whether SSL is enabled for Grafana |
false |
[Grafana] → Certificate file |
Path to the Grafana server SSL certificate file in the PEM format |
/etc/admprom/grafana/server.crt |
[Grafana] → Private key file |
Path to the private key file of the Grafana server SSL certificate |
/etc/admprom/grafana/server.key |
[Grafana] → Certificate authority file |
Path to the certificate authority file |
/etc/admprom/grafana/ca.crt |
[Node-exporter] → Enable SSL |
Defines whether SSL is enabled for Node Exporter |
false |
[Node-exporter] → Certificate file |
Path to the Node Exporter server SSL certificate file in the PEM format |
/etc/ssl/server.crt |
[Node-exporter] → Private key file |
Path to the private key file of the Node Exporter server SSL certificate |
/etc/ssl/server.key |
Set SSL rights for certs/key |
Enables changing the owner and permissions of the SSL certificate and key files |
false |
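When SSL is enabled for Prometheus, the certificate and key paths are typically referenced from the web configuration file (web.config.file above). The snippet below uses the standard Prometheus web config syntax; its exact placement in prometheus-auth.yml is an assumption:
tls_server_config:
  # Server certificate and private key in the PEM format
  cert_file: /etc/admprom/prometheus/server.crt
  key_file: /etc/admprom/prometheus/server.key
  # CA used to verify client certificates (optional)
  client_ca_file: /etc/admprom/prometheus/ca.crt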
| Parameter | Description | Default value |
|---|---|---|
job_name |
The name of the job within which metrics will be collected |
statsd_exporter |
scrape_interval |
Specifies how frequently to scrape targets |
5s |
scrape_timeout |
Specifies how long to wait until a scrape request times out. Cannot be greater than the value of the scrape_interval parameter |
— |
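In prometheus.yml terms, such a job looks as follows (the target address is illustrative; the flower_exporter job below is configured the same way):
scrape_configs:
  - job_name: statsd_exporter
    scrape_interval: 5s
    static_configs:
      - targets: ["<host>:9102"]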
| Parameter | Description | Default value |
|---|---|---|
job_name |
The name of the job within which metrics will be collected |
flower_exporter |
scrape_interval |
Specifies how frequently to scrape targets |
5s |
scrape_timeout |
Specifies how long to wait until a scrape request times out. Cannot be greater than the value of the scrape_interval parameter |
— |
| Parameter | Description | Default value |
|---|---|---|
Mapping config |
Airflow StatsD metrics mapping |
— |
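The mapping config is a YAML file in the statsd_exporter mapping format. A minimal sketch; the metric name and label below are hypothetical:
mappings:
  # Convert a StatsD metric into a labeled Prometheus metric
  - match: "airflow.operator_successes_*"
    name: "airflow_operator_successes"
    labels:
      operator: "$1"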
| Parameter | Description | Default value |
|---|---|---|
web.listen-address |
Port on which the web interface and generated Prometheus metrics are exposed |
9102 |
statsd.mapping-config |
The name of the metrics mapping configuration file |
/etc/statsd-exporter/conf/statsd-mapping.yml |
statsd.listen-udp |
The UDP port on which to receive statsd metric lines. Filled from the statsd_port parameter in airflow.cfg [metrics] |
8125 |
web.enable-lifecycle |
Enables shutdown and reload via HTTP request |
true |
statsd.cache-size |
Maximum size of the metric mapping cache. If max size is reached, the service will rely on the least recently used replacement policy |
— |
statsd.listen-tcp |
The TCP port on which to receive statsd metric lines. Leave the value empty to disable it |
— |
web.telemetry-path |
Path under which to expose metrics |
— |
statsd.listen-unixgram |
The Unixgram socket path to receive statsd metric lines in datagram. Leave the value empty to disable it |
— |
statsd.unixsocket-mode |
The permission mode of the Unix socket |
— |
statsd.read-buffer |
The size (in bytes) of the operating system’s transmit read buffer associated with the UDP or Unixgram connection. Ensure that the net.core.rmem_max kernel parameter is set to a value greater than the specified size |
— |
statsd.cache-type |
Metric mapping cache type. Valid options are lru and random |
— |
statsd.event-queue-size |
The size of internal queue for processing events |
— |
statsd.event-flush-threshold |
The number of events to hold in the queue before flushing |
— |
statsd.event-flush-interval |
Maximum time between event queue flushes |
— |
debug.dump-fsm |
The path where to dump internal FSM generated for glob matching (as a Dot file) |
— |
statsd.parse-dogstatsd-tags |
Indicates whether to parse DogStatsd style tags |
true |
statsd.parse-influxdb-tags |
Indicates whether to parse InfluxDB style tags |
true |
statsd.parse-librato-tags |
Indicates whether to parse Librato style tags |
true |
statsd.parse-signalfx-tags |
Indicates whether to parse SignalFX style tags |
true |
statsd.relay.address |
The UDP relay target address in the host:port format |
— |
statsd.relay.packet-length |
Maximum relay output packet length to avoid fragmentation |
— |
statsd.udp-packet-queue-size |
Size of internal queue for processing UDP packets |
— |
log.level |
The logging level. Supported values: debug, info, warn, error |
— |
log.format |
The output format of logs. Supported values: logfmt, json |
— |
Custom statsd-options.env |
This field enables adding custom parameters to the statsd-options.env configuration file |
— |
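An illustrative statsd_exporter invocation combining the options above:
statsd_exporter \
  --web.listen-address=:9102 \
  --statsd.listen-udp=:8125 \
  --statsd.mapping-config=/etc/statsd-exporter/conf/statsd-mapping.yml \
  --web.enable-lifecycle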
Redis
| Parameter | Description | Default value |
|---|---|---|
redis.conf |
Redis configuration file |
— |
redis_port |
Redis broker listen port |
6379 |
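The listen port corresponds to the port directive in redis.conf, for example:
# TCP port for the Redis broker
port 6379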
| Parameter | Description | Default value |
|---|---|---|
sentinel.conf |
Sentinel configuration file |
— |
sentinel_port |
Sentinel port |
26379 |
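Similarly, the Sentinel port is set by the port directive in sentinel.conf. The master name, host, and quorum below are illustrative:
port 26379
# Monitor a master named mymaster on <host>:6379 with a quorum of 2
sentinel monitor mymaster <host> 6379 2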
| Parameter | Description | Default value |
|---|---|---|
Enable custom ulimits |
Displays an editable ulimits config for the Redis Server |
— |
DBT
| Parameter | Description | Default value |
|---|---|---|
DBT_PROFILES_DIR |
Path to the profiles.yml configuration file |
/etc/ad-dbt/conf |
DBT_LOG_PATH |
Directory for logs |
— |
DBT_TARGET |
Specifies the default dbt target defined in profiles.yml |
— |
You can use the Custom dbt-env.sh field to set configuration parameters for DBT. The settings specified in this field have higher priority than the settings specified in dbt-env.sh.
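A minimal sketch of such overrides (the log directory and target name are illustrative):
# Keep dbt logs in a dedicated directory
DBT_LOG_PATH=/var/log/ad-dbt
# Select the dev target defined in profiles.yml
DBT_TARGET=dev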
The docs component has the docs_projects parameter, which maps dbt project paths to the ports used by the documentation web UI.
Configuration template:
docs_projects:
  <project_dir>: <port>
where:

- project_dir — path to a dbt project;
- port — port used to access the documentation.
Example:
docs_projects:
  /opt/test/dbt/dbt_people_lab: 8092
The documentation will be available at http://<host>:<port>.