Configuration parameters
This topic describes the parameters that can be configured for ADO services via ADCM. To read about the configuration process, refer to the relevant articles: Online installation, Offline installation.
ADPG
| Parameter | Description | Default value |
|---|---|---|
| Data directory | Directories that are used to store data on the ADPG hosts | /pg_data1 |
| Parameter | Description | Default value |
|---|---|---|
| listen_addresses | Specifies the TCP/IP address(es) on which the server listens for connections from client applications (requires a restart when changed) | * |
| port | The TCP port the server listens on | 5432 |
| max_connections | Determines the maximum number of concurrent connections to the server. For a replica host, the value of this parameter must be greater than or equal to the value on the leader host. If this requirement is not met, the replica host will reject all requests | 100 |
| shared_buffers | Sets the amount of memory for the shared memory buffers. The higher the value of this parameter, the lower the load on the host hard drives | 128 MB |
| max_worker_processes | Sets the maximum number of background processes that the system can support | 8 |
| max_parallel_workers | Sets the maximum number of workers that the system can support for parallel operations | 8 |
| max_parallel_workers_per_gather | Sets the maximum number of workers that can be started by a single Gather or Gather Merge node | 2 |
| max_parallel_maintenance_workers | Sets the maximum number of parallel workers that can be started by a single utility command | 2 |
| effective_cache_size | Sets the planner's assumption about the effective size of the disk cache that is available to a single query. This is taken into account when estimating the cost of using an index: a higher value makes index scans more likely, a lower value makes sequential scans more likely. When setting this parameter, consider both PostgreSQL shared buffers and the portion of the kernel's disk cache that will be used for PostgreSQL data files, though some data might exist in both places. Also, take into account the expected number of concurrent queries to different tables, since they will have to share the available space. This parameter does not affect the size of shared memory allocated by PostgreSQL, and it does not reserve kernel disk cache; it is used only for estimation purposes. The system also does not assume data remains in the disk cache between queries. If this value is specified without units, it is taken as blocks, that is, BLCKSZ bytes (typically 8 kB) | 4096 MB |
| maintenance_work_mem | Specifies the maximum amount of memory to be used by maintenance operations, such as VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY | 64 MB |
| work_mem | Sets the base maximum amount of memory to be used by a query operation (such as a sort or hash table) before writing to temporary disk files. Note that for a complex query, several sort or hash operations might be running in parallel. Each operation is allowed to use as much memory as this value specifies before it starts to write data into temporary files. Several running sessions can also perform such operations concurrently, so the total memory used can be many times greater than the value of work_mem | 4 MB |
| min_wal_size | As long as WAL disk usage stays below this value, old WAL files are recycled for future use at a checkpoint rather than removed | 80 MB |
| max_wal_size | Sets the maximum size to which the WAL can grow between automatic checkpoints. Increasing this setting may increase the recovery time after a failure. The specified limit can be exceeded automatically under a high load on ADPG | 1024 MB |
| wal_keep_size | Sets the minimum size of segments retained in the pg_wal directory, in case a standby server needs to fetch them for streaming replication. If a standby server connected to the sending server falls behind by more than wal_keep_size, the sending server might remove WAL segments still needed by the standby, in which case the replication connection is terminated | 128 MB |
| huge_pages | Defines whether huge pages can be requested for the main shared memory area. Valid values are try, on, and off | try |
| superuser_reserved_connections | Determines the number of connection "slots" that are reserved for PostgreSQL superuser connections | 3 |
| logging_collector | Enables the logging collector. The logging collector is a background process that captures log messages sent to stderr and redirects them into log files | true |
| log_directory | Determines the directory that contains log files. It can be specified as an absolute path or relative to the cluster data directory | log |
| log_filename | Specifies the log file name pattern. The value can include strftime %-escapes to specify time-varying file names | postgresql-%a.log |
| log_rotation_age | Determines the maximum period of time to use a log file, after which a new log file is created. If this value is specified without units, it is taken as minutes. Set this parameter to 0 to disable time-based creation of new log files | 1d |
| log_rotation_size | Determines the maximum size of a log file. After a log file reaches the specified size, a new log file is created. If the value is set without units, it is taken as kilobytes. Set this parameter to 0 to disable size-based creation of new log files | 0 |
| log_min_messages | Specifies the minimum severity level of messages that are written to a log file. Valid values are DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1, INFO, NOTICE, WARNING, ERROR, LOG, FATAL, and PANIC | warning |
| log_min_error_statement | Specifies which SQL statements that cause errors are logged. Valid values are DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1, INFO, NOTICE, WARNING, ERROR, LOG, FATAL, and PANIC | error |
You can use the Custom postgresql.conf field to set configuration parameters for specific ADPG nodes using ADCM configuration groups. The settings specified in this field have higher priority than the settings specified in postgresql.conf. To switch to editing mode, click Custom postgresql.conf in the Configuration tree.
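For example, a Custom postgresql.conf entry for a configuration group might look like the following (a minimal sketch; the values are illustrative, not tuning recommendations):

shared_buffers = 512MB
work_mem = 16MB
max_connections = 200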
This section allows you to add lines to the pg_hba.conf file, which configures client authentication.
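For example, the following lines (a sketch; the subnet, database, and user names are placeholders) allow password-authenticated TCP connections from an application subnet:

host    airflow    airflow    10.0.0.0/24      scram-sha-256
host    all        all        192.168.1.0/24   md5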
Airflow2
| Parameter | Description | Default value |
|---|---|---|
| Manage sensitive configuration data | When enabled, ADO takes over the creation of secrets (transferring them from configurations to Vault) as well as updating them. Requires the right to create secrets. Affects the Rotate fernet key action (see fernet key rotation) | true |
| Secrets backend | A secrets backend to use | airflow.providers.hashicorp.secrets.vault.VaultBackend |
| url | Base URL of the Vault instance being addressed. Has to include protocol and port (e.g. http://127.0.0.1:8200) | — |
| auth_type | Authentication type for Vault. Possible values: approle, aws_iam, azure, github, gcp, kubernetes, ldap, radius, token, userpass | token |
| mount_point | The path the secrets engine was mounted on. Note that this mount point is not used for authentication if authentication is done via a different engine (see auth_mount_point) | secret |
| config_path | Specifies the path of the secret to read Airflow configurations from. If set to an empty value, configurations are not read from Vault | config |
| connections_path | Specifies the path of the secret to read connections from. If set to an empty value, connections are not read from Vault | connections |
| variables_path | Specifies the path of the secret to read variables from. If set to an empty value, variables are not read from Vault | variables |
| auth_mount_point | Defines a mount point for the chosen authentication type. The default value depends on the authentication method used | — |
| kv_engine_version | The KV engine version to use (1 or 2) | 2 |
| token | Authentication token to include in requests sent to Vault (for the token and github authentication types) | — |
| token_path | Path to the file containing the authentication token to include in requests sent to Vault (for the token and github authentication types) | — |
| username | Username for the userpass and ldap authentication types | — |
| password | Password for the userpass and ldap authentication types | — |
| secret_id | Secret ID for the approle and aws_iam authentication types | — |
| role_id | Role ID for the approle and aws_iam authentication types | — |
| kubernetes_role | Role for the kubernetes authentication type | — |
| kubernetes_jwt_path | Path to the Kubernetes JWT token for the kubernetes authentication type | — |
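In airflow.cfg terms, these settings map to the [secrets] section, with everything except the backend itself passed via backend_kwargs as JSON (a hedged sketch; the Vault address and token are placeholders):

[secrets]
backend = airflow.providers.hashicorp.secrets.vault.VaultBackend
backend_kwargs = {"url": "http://127.0.0.1:8200", "auth_type": "token", "token": "<vault-token>", "mount_point": "secret", "connections_path": "connections", "variables_path": "variables", "config_path": "config"}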
| Parameter | Description | Default value |
|---|---|---|
| admin_password | The password of the webserver's admin user | — |
| db_user | The name of the metadata DB user | airflow |
| db_password | The password of the metadata DB user | — |
| Database type | The external database type | PostgreSQL |
| Hostname | The external database host | {{groups['adpg.adpg'][0]\|d(omit)}} |
| Port | The external database port | 5432 |
| Airflow database name | The external database name | airflow |
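Taken together, these values produce a SQLAlchemy connection string of the following form (a sketch with placeholders for the password and host):

postgresql+psycopg2://airflow:<db_password>@<database-host>:5432/airflow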
| Parameter | Description | Default value |
|---|---|---|
| dags_folder | The absolute path to the Airflow pipelines directory | /opt/airflow/dags |
| hostname_callable | A path to a callable which resolves the hostname. The format is "package.function" | airflow.utils.net.getfqdn |
| might_contain_dag_callable | A callable to check if a Python file has Airflow DAGs defined or not, with the arguments (file_path: str, zip_file: zipfile.ZipFile \| None = None); it should return a boolean value | airflow.utils.file.might_contain_dag_via_default_heuristic |
| default_timezone | Default timezone. Can be utc, system, or any IANA timezone string (e.g. Europe/Amsterdam) | utc |
| executor | The executor class that Airflow should use. Choices include SequentialExecutor, LocalExecutor, CeleryExecutor, DaskExecutor, KubernetesExecutor, CeleryKubernetesExecutor, LocalKubernetesExecutor, or the full import path to a custom executor class | CeleryExecutor |
| parallelism | This defines the maximum number of task instances that can run concurrently per scheduler in Airflow, regardless of the worker count. Generally this value, multiplied by the number of schedulers in your cluster, is the maximum number of task instances with the running state in the metadata database | 32 |
| max_active_tasks_per_dag | The maximum number of task instances allowed to run concurrently in each DAG. To calculate the number of tasks that are running concurrently for a DAG, add up the number of running tasks for all DAG runs of the DAG. This is configurable at the DAG level with max_active_tasks | 16 |
| dags_are_paused_at_creation | The flag that indicates if DAGs are paused by default at creation | true |
| max_active_runs_per_dag | The maximum number of active DAG runs per DAG. The scheduler will not create more DAG runs if it reaches the limit. This is configurable at the DAG level with max_active_runs | 16 |
| mp_start_method | The name of the method used to start Python processes via the multiprocessing module. This corresponds directly to the options available in the Python docs and must be one of the values returned by multiprocessing.get_all_start_methods() | — |
| load_examples | Whether to load the DAG examples that ship with Airflow | true |
| plugins_folder | Path to the folder containing Airflow plugins | /opt/airflow/plugins |
| execute_tasks_new_python_interpreter | Whether tasks should be executed by spawning a new Python interpreter (true) instead of forking the parent process (false, the speedier option). Spawning is slower, but it means that plugin changes are picked up by tasks immediately | false |
| fernet_key | The secret key to save connection passwords in the database | — |
| donot_pickle | Whether to disable pickling DAGs | true |
| dagbag_import_timeout | How long before timing out a Python file import, in seconds | 30 |
| dagbag_import_error_tracebacks | Whether a traceback should be shown in the UI for dagbag import errors instead of just the exception message | true |
| dagbag_import_error_traceback_depth | If tracebacks are shown, how many entries from the traceback should be shown | 2 |
| dag_file_processor_timeout | How long before timing out a DagFileProcessor, which processes a DAG file, in seconds | 50 |
| task_runner | The class to use for running task instances in a subprocess. Choices include StandardTaskRunner, CgroupTaskRunner, or the full import path to a custom class | StandardTaskRunner |
| default_impersonation | If set, tasks without a run_as_user argument will be run with this user | — |
| security | Defines which security module to use. For example, kerberos | — |
| unit_test_mode | Turn unit test mode on (overwrites many configuration options with test values at runtime) | false |
| enable_xcom_pickling | Whether to enable pickling for XCom (note that this is insecure and allows for RCE exploits) | false |
| allowed_deserialization_classes | What classes can be imported during deserialization. This is a multi-line value; the individual items are parsed as regexps. Python built-in classes (like dict) are always allowed | airflow\..* |
| killed_task_cleanup_time | When a task is killed forcefully, this is the amount of time in seconds that it has to clean up after it is sent a SIGTERM, before it is SIGKILLed | 60 |
| dag_run_conf_overrides_params | Whether to override params with dag_run.conf. If you pass key-value pairs through airflow dags backfill -c or airflow dags trigger -c, they override the existing ones in params | true |
| dag_discovery_safe_mode | If enabled, Airflow will only scan files containing both DAG and airflow (case-insensitive) | true |
| dag_ignore_file_syntax | The pattern syntax used in the .airflowignore files in the DAG directories. Valid values are regexp and glob | regexp |
| default_task_retries | The number of retries each task is going to have by default. Can be overridden at the DAG or task level | 0 |
| default_task_retry_delay | The number of seconds each task is going to wait by default between retries. Can be overridden at the DAG or task level | 300 |
| max_task_retry_delay | The maximum delay (in seconds) each task is going to wait by default between retries. This is a global setting and cannot be overridden at the task or DAG level | 86400 |
| default_task_weight_rule | The weighting method used for the effective total priority weight of the task | downstream |
| default_task_execution_timeout | The default task execution_timeout value for operators, in seconds. If not specified, no timeout is applied | — |
| min_serialized_dag_update_interval | A serialized DAG cannot be updated more often than this minimum interval (in seconds), to reduce the database write rate | 30 |
| compress_serialized_dags | If true, serialized DAGs are compressed before writing to the database. Note that this disables the DAG dependencies view | false |
| min_serialized_dag_fetch_interval | A serialized DAG cannot be fetched more often than this minimum interval (in seconds), to reduce the database read rate. This config controls when your DAGs are updated in the webserver | 10 |
| max_num_rendered_ti_fields_per_task | Maximum number of rendered task instance fields (template fields) per task to store in the database. All the template_fields for each task instance are stored in the database | 30 |
| check_slas | On each dagrun, check against defined SLAs | true |
| xcom_backend | Path to a custom XCom class that will be used to store and resolve operator results | airflow.models.xcom.BaseXCom |
| lazy_load_plugins | By default, Airflow plugins are lazily loaded (only loaded when required). Set it to false to load plugins whenever airflow is invoked via the CLI or loaded from a module | true |
| lazy_discover_providers | By default, Airflow providers are lazily discovered (discovery and imports happen only when required). Set it to false to discover providers whenever airflow is invoked via the CLI or loaded from a module | true |
| hide_sensitive_var_conn_fields | Hide sensitive variables or extra JSON connection keys from the UI and task logs when set to true | true |
| sensitive_var_conn_names | A comma-separated list of extra sensitive keywords to look for in variable names or a connection's extra JSON | — |
| default_pool_task_slot_count | Task slot count for the default_pool | 128 |
| max_map_length | The maximum list/dict length an XCom can push to trigger task mapping. If the pushed list/dict has a length exceeding this value, the task pushing the XCom will be failed automatically to prevent the mapped tasks from clogging the scheduler | 1024 |
| daemon_umask | The default umask to use for processes run in daemon mode (scheduler, worker, etc.). This controls the file-creation mode mask, which determines the initial value of file permission bits for newly created files. This value is treated as an octal integer | 0o077 |
| dataset_manager_class | Class to use as the dataset manager | — |
| dataset_manager_kwargs | Kwargs to supply to the dataset manager | — |
| database_access_isolation | Experimental feature. The flag that indicates whether components should use the Airflow Internal API for DB connectivity | false |
| internal_api_url | Experimental feature. Airflow Internal API URL. Only used if database_access_isolation is set to true | — |
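As a quick illustration, several of these options as they would appear in the [core] section of airflow.cfg (the values are illustrative):

[core]
dags_folder = /opt/airflow/dags
executor = CeleryExecutor
parallelism = 32
max_active_tasks_per_dag = 16
load_examples = False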
| Parameter | Description | Default value |
|---|---|---|
| sql_alchemy_conn | The SQLAlchemy connection string to the metadata database. The value of the parameter is automatically populated based on the input values in the Database settings section. It is not displayed in the UI for security reasons. SQLAlchemy supports many different database engines | — |
| sql_alchemy_engine_args | Extra engine-specific keyword args passed to SQLAlchemy's create_engine, as a JSON-encoded value | — |
| sql_engine_encoding | The encoding for the databases | utf-8 |
| sql_engine_collation_for_ids | Collation for the dag_id, task_id, key, and external_executor_id columns in case they have different encodings | — |
| sql_alchemy_pool_enabled | Whether SQLAlchemy should pool database connections | true |
| sql_alchemy_pool_size | The SQLAlchemy pool size is the maximum number of database connections in the pool | 5 |
| sql_alchemy_max_overflow | The maximum overflow size of the pool. When the number of checked-out connections reaches the size set in sql_alchemy_pool_size, additional connections are returned up to this limit | 10 |
| sql_alchemy_pool_recycle | The SQLAlchemy pool recycle is the number of seconds a connection can be idle in the pool before it is invalidated. This config does not apply to SQLite. If the number of DB connections is ever exceeded, a lower config value will allow the system to recover faster | 1800 |
| sql_alchemy_pool_pre_ping | Check connection at the start of each connection pool checkout | true |
| sql_alchemy_schema | The schema to use for the metadata database. SQLAlchemy supports databases with the concept of multiple schemas | — |
| sql_alchemy_connect_args | Import path for connection arguments in SQLAlchemy. Defaults to an empty dictionary. This is useful when you want to configure DB engine arguments that SQLAlchemy won't parse in the connection string | — |
| load_default_connections | Whether to load the default connections that ship with Airflow | true |
| max_db_retries | Number of times the code should be retried in case of DB operational errors. Not all transactions will be retried, as that can cause undesired state. Currently, it is only used in DagFileProcessor.process_file to retry dagbag.sync_to_db | 3 |
| check_migrations | Whether to run alembic migrations during Airflow startup. Sometimes this operation can be expensive, and the users can assert the correct version through other means (e.g. through a Helm chart). Accepts true or false | true |
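For instance, sql_alchemy_engine_args expects a JSON-encoded dictionary of create_engine keyword arguments; the keys below are an illustrative sketch:

{"pool_timeout": 30, "echo": false}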
| Parameter | Description | Default value |
|---|---|---|
| base_log_folder | The absolute path to the Airflow log files directory. There are a few existing configurations that assume this is set to the default. If you choose to override this, you may need to update the dag_processor_manager_log_location and child_process_log_directory settings as well | /var/log/airflow |
| remote_logging | Airflow can store logs remotely in AWS S3, Google Cloud Storage, or Elasticsearch. Set this to true to enable remote logging | false |
| remote_log_conn_id | Users must supply an Airflow connection ID that provides access to the storage location. Depending on your remote logging service, this may only be used for reading logs, not writing them | — |
| delete_local_logs | Whether the local log files for GCS, S3, WASB, and OSS remote logging should be deleted after they are uploaded to the remote location | false |
| google_key_path | Path to the Google Credential JSON file. If omitted, authorization based on the Application Default Credentials will be used | — |
| remote_base_log_folder | Storage bucket URL for remote logging. S3 buckets should start with s3://, Cloudwatch log groups should start with cloudwatch://, GCS buckets should start with gs://, WASB buckets should start with wasb://, and Stackdriver logs should start with stackdriver:// | — |
| remote_task_handler_kwargs | The remote_task_handler_kwargs param is loaded into a dictionary and passed to the __init__ of the remote task handler | — |
| encrypt_s3_logs | Use server-side encryption for logs stored in S3 | false |
| logging_level | Logging level. Supported values: CRITICAL, ERROR, WARNING, INFO, DEBUG | INFO |
| celery_logging_level | Logging level for Celery | WARNING |
| fab_logging_level | Logging level for the Flask-AppBuilder UI. Supported values: CRITICAL, ERROR, WARNING, INFO, DEBUG | WARNING |
| logging_config_class | The name of the class that specifies the logging configuration. This class has to be on the Python classpath | — |
| colored_console_log | Flag to enable/disable colored logs | true |
| colored_log_format | The log format for colored logs if they are enabled. The value must be wrapped in {% raw %}…{% endraw %} tags | {% raw %}[%%(blue)s%%(asctime)s%%(reset)s] {%%(blue)s%%(filename)s:%%(reset)s%%(lineno)d} %%(log_color)s%%(levelname)s%%(reset)s - %%(log_color)s%%(message)s%%(reset)s{% endraw %} |
| colored_formatter_class | Specifies the class utilized by Airflow to implement colored logging | airflow.utils.log.colored_log.CustomTTYColoredFormatter |
| log_format | Format of a log line. The value must be wrapped in {% raw %}…{% endraw %} tags | {% raw %}[%%(asctime)s] {%%(filename)s:%%(lineno)d} %%(levelname)s - %%(message)s{% endraw %} |
| simple_log_format | Defines the format of log messages for the simple logging configuration | %%(asctime)s %%(levelname)s - %%(message)s |
| dag_processor_log_target | Where to store DAG parser logs. If set to file, logs are sent to the log files defined by child_process_log_directory | file |
| dag_processor_log_format | DAG processor log line format. The value must be wrapped in {% raw %}…{% endraw %} tags | {% raw %}[%%(asctime)s] [SOURCE:DAG_PROCESSOR]{{%%(filename)s:%%(lineno)d}} %%(levelname)s - %%(message)s{% endraw %} |
| log_formatter_class | Determines the formatter class used by Airflow for structuring its log messages. The default formatter class is timezone-aware, which means that timestamps attached to log entries will be adjusted to reflect the local timezone of the Airflow instance | airflow.utils.log.timezone_aware.TimezoneAware |
| secret_mask_adapter | An import path to a function to add adaptations of each secret added with airflow.utils.log.secrets_masker.mask_secret to be masked in log messages | — |
| task_log_prefix_template | Prefix pattern used with the stream handler TaskHandlerWithCustomFormatter | — |
| log_filename_template | The format of generated Airflow file and path names for each task run. The value must be wrapped in {% raw %}…{% endraw %} tags | {% raw %}dag_id={{ ti.dag_id }}/run_id={{ ti.run_id }}/task_id={{ ti.task_id }}/{%% if ti.map_index >= 0 %%}map_index={{ ti.map_index }}/{%% endif %%}attempt={{ try_number }}.log{% endraw %} |
| log_processor_filename_template | The format of generated Airflow file and path names for logs. The value must be wrapped in {% raw %}…{% endraw %} tags | {% raw %}{{ filename }}.log{% endraw %} |
| dag_processor_manager_log_location | Full path of the dag_processor_manager log file | /var/log/airflow/dag_processor_manager/dag_processor_manager.log |
| task_log_reader | Name of the handler to read task instance logs. Defaults to the task handler | task |
| extra_logger_names | A comma-separated list of third-party logger names that will be configured to print messages to consoles | — |
| worker_log_server_port | When you start an Airflow worker, the service starts a tiny web server subprocess to serve the worker's local log files to the Airflow main web server, which then builds pages and sends them to users. This defines the port on which the logs are served. The port must be unused, open, and visible from the main web server to connect to the workers | 8793 |
| trigger_log_server_port | Port to serve logs from for the triggerer. See the worker_log_server_port description for more details | 8794 |
| interleave_timestamp_parser | Import path to a callable which takes a string log line and returns the timestamp (datetime.datetime compatible) | — |
| file_task_handler_new_folder_permissions | Permissions in the form of an octal string, as understood by chmod. The permissions are important when you use impersonation, when logs are written by a different user than airflow | 0o775 |
| file_task_handler_new_file_permissions | Permissions in the form of an octal string, as understood by chmod. The permissions are important when you use impersonation, when logs are written by a different user than airflow | 0o664 |
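A hedged example of enabling S3 remote logging with these [logging] options (the bucket name and connection ID are placeholders):

[logging]
remote_logging = True
remote_base_log_folder = s3://my-airflow-logs
remote_log_conn_id = aws_default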
| Parameter | Description | Default value |
|---|---|---|
| metrics_allow_list | If you want to avoid emitting all the available metrics, you can configure a list of prefixes (comma-separated) to send only the metrics that start with the elements of the list (e.g. scheduler,executor,dagrun) | — |
| metrics_block_list | If you want to avoid emitting all the available metrics, you can configure a list of prefixes (comma-separated) to filter out metrics that start with the elements of the list (e.g. scheduler,executor,dagrun) | — |
| statsd_on | Enables sending metrics to StatsD | true |
| statsd_host | Specifies the host address where the StatsD daemon (or server) is running | localhost |
| statsd_port | Specifies the port on which the StatsD daemon (or server) is listening | 8125 |
| statsd_prefix | Defines the namespace for all metrics sent from Airflow to StatsD | airflow |
| stat_name_handler | A function that validates the StatsD stat name, applies changes to the stat name if necessary, and returns the transformed stat name. The function should have the following signature: def func_name(stat_name: str) -> str | — |
| statsd_datadog_enabled | Enables Datadog integration to send Airflow metrics | false |
| statsd_datadog_tags | List of Datadog tags attached to all metrics (e.g. key1:value1,key2:value2) | — |
| statsd_datadog_metrics_tags | Set to false to disable metadata tags for some of the emitted metrics | true |
| statsd_custom_client_path | If you want to use your own custom StatsD client, set the relevant module path in this value. The module path must exist on your PYTHONPATH for Airflow to pick it up | — |
| statsd_disabled_tags | If you want to avoid sending all the available metrics tags to StatsD, you can configure a list of prefixes (comma-separated) to filter out metric tags that start with the elements of the list (e.g. job_id,run_id) | job_id,run_id |
| statsd_influxdb_enabled | Enables sending Airflow metrics with the StatsD-InfluxDB tagging convention | false |
| otel_on | Enables sending metrics to OpenTelemetry | false |
| otel_host | Specifies the hostname or IP address of the OpenTelemetry Collector to which Airflow sends traces | localhost |
| otel_port | Specifies the port of the OpenTelemetry Collector that is listening | 8889 |
| otel_prefix | The prefix for the Airflow metrics | airflow |
| otel_interval_milliseconds | Defines the interval, in milliseconds, at which Airflow sends batches of metrics and traces to the configured OpenTelemetry Collector | 60000 |
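For example, to send only scheduler and executor metrics to a local StatsD daemon (a sketch; the prefix list is illustrative):

[metrics]
statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow
metrics_allow_list = scheduler,executor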
| Parameter | Description | Default value |
|---|---|---|
| api_client | Defines in what way the CLI accesses the API. The local client uses the database directly, while the json client uses the API running on the webserver | airflow.api.client.local_client |
| endpoint_url | The URL of the API endpoint. If you set web_server_url_prefix, do not forget to append it here as well | http://localhost:8080 |
| Parameter | Description | Default value |
|---|---|---|
| fail_fast | Used only with DebugExecutor. If set to true, the DAG will fail with the first failed task. Helpful for debugging purposes | false |
| Parameter | Description | Default value |
|---|---|---|
| enable_experimental_api | Enables the experimental REST API, deprecated since version 2.0. These APIs do not have access control; the authenticated user has full access. Please consider using the stable REST API. For more information on migration, see RELEASE_NOTES.rst | false |
| auth_backends | Comma-separated list of auth backends to authenticate users of the API | airflow.api.auth.backend.session,airflow.api.auth.backend.basic_auth |
| maximum_page_limit | Used to set the maximum page limit for API requests. If the limit passed is greater than the maximum page limit, it will be ignored and the maximum page limit value will be used as the limit | 100 |
| fallback_page_limit | Used to set the default page limit when the limit param is zero or not provided in API requests. Otherwise, if a positive integer is passed as the limit, the smaller of the user-given limit and the maximum page limit is used | 100 |
| google_oauth2_audience | The intended audience for JWT token credentials used for authorization. This value must match on the client and server sides. If empty, the audience will not be tested | — |
| google_key_path | Path to the Google Cloud Service Account key file (JSON). If omitted, authorization based on the Application Default Credentials will be used | — |
| access_control_allow_headers | Used in response to a preflight request to indicate which HTTP headers can be used when making the actual request. This header is the server-side response to the browser's Access-Control-Request-Headers header | — |
| access_control_allow_methods | Specifies the method or methods allowed when accessing the resource | — |
| access_control_allow_origins | Indicates whether the response can be shared with requesting code from the given origins. Separate URLs with spaces | — |
| Parameter | Description | Default value |
|---|---|---|
| backend | What lineage backend to use | — |
| Parameter | Description | Default value |
|---|---|---|
| sasl_enabled | Enables SASL authentication for connecting to Atlas | false |
| host | Atlas host | — |
| port | Atlas connection port | 21000 |
| username | Username for connecting to Atlas | — |
| password | Password for connecting to Atlas | — |
| Parameter | Description | Default value |
|---|---|---|
| default_owner | The default owner assigned to each new operator, unless provided explicitly or passed via default_args | airflow |
| default_cpus | Indicates the default number of CPU units allocated to each operator when no specific CPU request is specified in the operator's configuration | 1 |
| default_ram | Indicates the default amount of RAM allocated to each operator when no specific RAM request is specified in the operator's configuration | 512 |
| default_disk | Indicates the default amount of disk storage allocated to each operator when no specific disk request is specified in the operator's configuration | 512 |
| default_gpus | Indicates the default number of GPUs allocated to each operator when no specific GPU request is specified in the operator's configuration | 0 |
| default_queue | Default queue that tasks get assigned to and that workers listen on | default |
| allow_illegal_arguments | Whether it is allowed to pass additional/unused arguments (args, kwargs) to the BaseOperator operator. If set to false, an exception is thrown; otherwise only a console message is displayed | false |
| Parameter | Description | Default value |
|---|---|---|
| default_hive_mapred_queue | Default MapReduce queue for HiveOperator tasks | — |
| mapred_job_name_template | Template for mapred_job_name in HiveOperator. Supports the following named parameters: hostname, dag_id, task_id, execution_date | — |
| Parameter | Description | Default value |
|---|---|---|
| base_url | The base URL of your website, as Airflow cannot guess what domain or cname you are using. This is used in automated emails that Airflow sends to point links to the right webserver | — |
| default_ui_timezone | Default timezone to display all dates in the UI. Can be UTC, system, or any IANA timezone string (e.g. Europe/Amsterdam) | UTC |
| web_server_host | The IP specified when starting the webserver | 0.0.0.0 |
| web_server_port | The port on which to run the webserver | 8080 |
| web_server_ssl_cert | Paths to the SSL certificate and key for the webserver. When both are provided, SSL will be enabled. This does not change the webserver port | — |
| web_server_ssl_key | Paths to the SSL certificate and key for the webserver. When both are provided, SSL will be enabled. This does not change the webserver port | — |
| session_backend | The type of backend used to store web session data. Can be database or securecookie | database |
| web_server_master_timeout | Number of seconds the webserver waits before killing a gunicorn master that doesn't respond | 120 |
| web_server_worker_timeout | Number of seconds the Gunicorn webserver waits before timing out on a worker | 120 |
| worker_refresh_batch_size | Number of workers to refresh at a time. When set to 0, worker refresh is disabled. When nonzero, Airflow periodically refreshes webserver workers by bringing up new ones and killing old ones | 1 |
| worker_refresh_interval | Number of seconds to wait before refreshing a batch of workers | 6000 |
| reload_on_plugin_change | If set to true, Airflow will track files in the plugins_folder directory. When it detects changes, the webserver is reloaded | false |
| secret_key | Secret key used to run your flask app. It should be as random as possible. However, when running more than one instance of the webserver, make sure all of them use the same secret_key; otherwise some requests will fail with CSRF errors | — |
| workers | Number of workers to run the Gunicorn webserver | 4 |
| worker_class | The worker class Gunicorn should use. Choices include sync, eventlet, and gevent | sync |
| access_logfile | Log files for the Gunicorn webserver. The "-" value means log to stderr | — |
| error_logfile | Log files for the Gunicorn webserver. The "-" value means log to stderr | — |
| access_logformat | Access log format for the Gunicorn webserver. The default format is %%(h)s %%(l)s %%(u)s %%(t)s "%%(r)s" %%(s)s %%(b)s "%%(f)s" "%%(a)s" | — |
| expose_config | Expose the configuration file in the webserver. Set to non-sensitive-only to show all values except those that have security implications. true shows all values; false hides the configuration completely | false |
| expose_hostname | Whether to expose hostname in the webserver | false |
| expose_stacktrace | Whether to expose stacktrace in the webserver | false |
| dag_default_view | Default DAG view. Valid values are: grid, graph, duration, gantt, landing_times | grid |
| dag_orientation | Default DAG orientation. Valid values are: LR (left to right), TB (top to bottom), RL (right to left), BT (bottom to top) | LR |
| log_fetch_timeout_sec | The amount of time (in seconds) the webserver will wait for the initial handshake while fetching logs from another worker machine | 5 |
| log_fetch_delay_sec | Time interval (in seconds) to wait before the next log fetch | 2 |
| log_auto_tailing_offset | Distance away from the page bottom to enable auto tailing | 30 |
| log_animation_speed | Animation speed for the auto-tailing log display | 1000 |
| hide_paused_dags_by_default | By default, the webserver shows paused DAGs. Flip this to hide paused DAGs by default | false |
| page_size | Consistent page size across all listing views in the UI | 100 |
| navbar_color | Defines the color of the navigation bar | #fff |
| default_dag_run_display_number | Default number of DAG runs to show in the UI | 25 |
| enable_proxy_fix | Enables the werkzeug ProxyFix middleware for reverse proxies | false |
| proxy_fix_x_for | Number of values to trust for X-Forwarded-For | 1 |
| proxy_fix_x_proto | Number of values to trust for X-Forwarded-Proto | 1 |
| proxy_fix_x_host | Number of values to trust for X-Forwarded-Host | 1 |
| proxy_fix_x_port | Number of values to trust for X-Forwarded-Port | 1 |
| proxy_fix_x_prefix | Number of values to trust for X-Forwarded-Prefix | 1 |
| cookie_secure | Sets the secure flag on the session cookie | false |
| cookie_samesite | Sets the same-site policy on the session cookie | Lax |
| default_wrap | Default setting for the wrap toggle on DAG code and TI log views | false |
| x_frame_enabled | Allows the UI to be rendered in a frame | true |
| analytics_tool | Whether to send anonymous user activity to your analytics tool. Supported values: google_analytics, segment, metarouter | — |
| analytics_id | Unique ID of your account in the analytics tool | — |
| show_recent_stats_for_completed_runs | Recent Tasks stats will show for old DagRuns if set | true |
| update_fab_perms | Whether to update FAB permissions and sync security manager roles on webserver startup | true |
| session_lifetime_minutes | The UI cookie lifetime in minutes. The user will be logged out from the UI after session_lifetime_minutes of inactivity | 43200 |
| instance_name | Sets a custom page title for the DAGs overview page and site title for all pages | — |
| instance_name_has_markup | Whether the custom page title for the DAGs overview page contains any markup language | false |
| auto_refresh_interval | How frequently, in seconds, the DAG data will auto-refresh in graph or grid view when auto-refresh is turned on | 3 |
| warn_deployment_exposure | Boolean for displaying a warning for a publicly viewable deployment | true |
| audit_view_excluded_events | Comma-separated string of view events to exclude from the DAG audit view. All other events will be added minus the ones passed here. The audit logs in the DB will not be affected by this parameter | gantt,landing_times,tries,duration,calendar,graph,grid,tree,tree_data |
| audit_view_included_events | Comma-separated string of view events to include in the DAG audit view. If passed, only these events will populate the DAG audit view. The audit logs in the DB will not be affected by this parameter | — |
| enable_swagger_ui | Boolean for running SwaggerUI in the webserver | true |
| run_internal_api | Boolean for running the Internal API in the webserver | false |
| auth_rate_limited | Boolean for enabling rate limiting on authentication endpoints | true |
| auth_rate_limit | Rate limit for authentication endpoints | 5 per 40 second |
| caching_hash_method | The caching algorithm used by the webserver. Must be a valid hashlib function name | md5 |
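For instance, to serve the web UI over HTTPS, both SSL options must be set together (the paths and host name are placeholders):

[webserver]
base_url = https://airflow.example.com:8080
web_server_ssl_cert = /etc/ssl/certs/airflow.crt
web_server_ssl_key = /etc/ssl/private/airflow.key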
| Parameter | Description | Default value |
|---|---|---|
| email_backend | Email backend to use | airflow.utils.email.send_email_smtp |
| email_conn_id | An Airflow connection that contains SMTP credentials | smtp_default |
| default_email_on_retry | Whether email alerts should be sent when a task is retried | true |
| default_email_on_failure | Whether email alerts should be sent when a task fails | true |
| subject_template | File that will be used as the template for the email subject (which will be rendered using Jinja2). If not set, Airflow uses a base template | — |
| html_content_template | File that will be used as the template for the email content (which will be rendered using Jinja2). If not set, Airflow uses a base template | — |
| from_email | Email address that will be used as the sender address. It can be either a raw email address or the complete address in the format Sender Name <sender@email.com> | — |
| Parameter | Description | Default value |
|---|---|---|
| smtp_host | Specifies the host server address used by Airflow when sending out email notifications via SMTP | localhost |
| smtp_starttls | Determines whether to use the STARTTLS command when connecting to the SMTP server | true |
| smtp_ssl | Determines whether to use an SSL connection when talking to the SMTP server | false |
| smtp_user | Username to authenticate with when connecting to the SMTP server | — |
| smtp_password | Password to authenticate with when connecting to the SMTP server | — |
| smtp_port | Defines the port number on which Airflow connects to the SMTP server to send email notifications | 25 |
| smtp_mail_from | Specifies the default sender email address used when Airflow sends email notifications | airflow@example.com |
| smtp_timeout | Determines the maximum time (in seconds) Airflow will wait for a connection to the SMTP server to be established | 30 |
| smtp_retry_limit | Defines the maximum number of times Airflow will attempt to connect to the SMTP server | 5 |
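A hedged [smtp] example for a relay that requires STARTTLS (the host and sender address are placeholders):

[smtp]
smtp_host = mail.example.com
smtp_starttls = True
smtp_ssl = False
smtp_port = 587
smtp_mail_from = airflow@example.com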
| Parameter | Description | Default value |
|---|---|---|
| sentry_on | Enables error reporting to Sentry | false |
| sentry_dsn | A Sentry DSN URL | — |
| before_send | Dotted path to a before_send function that the Sentry SDK should be configured to use | — |
| Parameter | Description | Default value |
|---|---|---|
| kubernetes_queue | Defines when to send a task to KubernetesExecutor when using LocalKubernetesExecutor. When a task's queue is the value of kubernetes_queue, the task is executed via KubernetesExecutor; otherwise it is executed via LocalExecutor | kubernetes |
| Parameter | Description | Default value |
|---|---|---|
| kubernetes_queue | Defines when to send a task to KubernetesExecutor when using CeleryKubernetesExecutor. When a task's queue is the value of kubernetes_queue, the task is executed via KubernetesExecutor; otherwise it is executed via CeleryExecutor | kubernetes |
| Parameter | Description | Default value |
|---|---|---|
| celery_app_name | The app name that will be used by Celery | airflow.executors.celery_executor |
| worker_concurrency | The concurrency that will be used when starting workers with the airflow celery worker command. This defines the number of task instances that a worker will take, so size up your workers based on the resources on your worker box and the nature of your tasks | 16 |
| worker_autoscale | The maximum and minimum concurrency that will be used when starting workers with the airflow celery worker command. When this option is set, worker_concurrency is ignored | — |
| worker_prefetch_multiplier | Used to increase the number of tasks that a worker prefetches, which can improve performance. The number of processes multiplied by worker_prefetch_multiplier is the number of tasks that are prefetched by a worker | 1 |
| worker_enable_remote_control | Specifies if remote control of the workers is enabled. In some cases, when the broker does not support remote control, Celery creates lots of .*reply-celery-pidbox queues. You can prevent this by setting this parameter to false | true |
| broker_url | The Celery broker URL. Celery supports RabbitMQ, Redis, and, experimentally, a SQLAlchemy database. Refer to the Celery documentation for more information | redis://{{groups['redis.server'][0]\|d(omit)}}:6379/0 |
| result_backend | The Celery backend for storing job metadata. When a job finishes, it needs to update the metadata of the job. Therefore, it will post a message on a message bus or insert it into a database (depending on the backend). This status is used by the scheduler to update the state of the task. The use of a database is highly recommended. When not specified, sql_alchemy_conn with a db+ scheme prefix is used | — |
| result_backend_sqlalchemy_engine_options | Optional configuration dictionary to pass to the Celery result backend SQLAlchemy engine | — |
| flower_host | Celery Flower is a sweet UI for Celery. Airflow has a shortcut to start it: airflow celery flower. This defines the IP that Celery Flower runs on | 0.0.0.0 |
| flower_url_prefix | The root URL for Flower | — |
| flower_port | The port that Celery Flower runs on | 5555 |
| flower_basic_auth | Enables basic authentication for Flower. This parameter takes a string of user:password pairs separated by a comma | — |
| sync_parallelism | How many processes CeleryExecutor uses to sync task state. 0 means to use max(1, number of cores - 1) processes | 0 |
| celery_config_options | Import path for Celery configuration options | airflow.config_templates.default_celery.DEFAULT_CELERY_CONFIG |
| ssl_active | Defines if SSL is active for Airflow | false |
| ssl_key | Path to the client key | — |
| ssl_cert | Path to the client certificate | — |
| ssl_cacert | Path to the CA certificate | — |
| pool | Celery pool implementation. Possible choices are: prefork (default), eventlet, gevent, solo | prefork |
| operation_timeout | The number of seconds to wait before timing out send_task_to_executor or fetch_celery_task_state operations | 1 |
| task_track_started | Celery task will report its status as started when the task is executed by a worker | true |
| task_publish_max_retries | The maximum number of retries for publishing task messages to the broker when failing due to AirflowTaskTimeout errors | 3 |
| worker_precheck | Worker initialization check to validate the metadata database connection | false |
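A minimal [celery] sketch that points workers at a Redis broker and stores results in the metadata database (the addresses and credentials are placeholders):

[celery]
broker_url = redis://redis-host:6379/0
result_backend = db+postgresql://airflow:<password>@<database-host>:5432/airflow
worker_concurrency = 16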
| Parameter | Description | Default value |
|---|---|---|
| visibility_timeout | The visibility timeout defines the number of seconds to wait for the worker to acknowledge the task before the message is redelivered to another worker. Make sure to increase the visibility timeout to match the time of the longest ETA you are planning to use | — |
| Parameter | Description | Default value |
|---|---|---|
| cluster_address | The IP address and port of the Dask cluster's scheduler | 127.0.0.1:8786 |
| tls_ca | TLS/SSL settings to access a secured Dask scheduler: the CA certificate | — |
| tls_cert | TLS certificate | — |
| tls_key | TLS certificate key | — |
| Parameter | Description | Default value |
|---|---|---|
| job_heartbeat_sec | Defines the frequency (in seconds) at which task instances should listen for an external kill signal (when you clear tasks from the CLI or the UI) | 5 |
| scheduler_heartbeat_sec | The scheduler constantly tries to trigger new tasks. This defines how often the scheduler should run (in seconds) | 5 |
| num_runs | The number of times to try to schedule each DAG file. -1 indicates an unlimited number | -1 |
| scheduler_idle_sleep_time | Controls how long the scheduler will sleep between loops when there was nothing to do. If something was scheduled, the next loop iteration starts straight away | 1 |
| min_file_process_interval | Number of seconds after which a DAG file is parsed. The DAG file is parsed every min_file_process_interval seconds; updates to DAGs are reflected after this interval | 30 |
| parsing_cleanup_interval | How often (in seconds) to check for stale DAGs (DAGs which are no longer present in the expected files) which should be deactivated, as well as datasets that are no longer referenced and should be marked as orphaned | 60 |
| stale_dag_threshold | How long (in seconds) to wait after we have re-parsed a DAG file before deactivating stale DAGs (DAGs which are no longer present in the expected files). The absolute maximum that this could take is dag_file_processor_timeout | 50 |
| dag_dir_list_interval | How often (in seconds) to scan the DAGs directory for new files. Defaults to 5 minutes | 300 |
| print_stats_interval | How often should stats be printed to the logs. Setting to 0 disables printing stats | 30 |
| pool_metrics_interval | How often (in seconds) should pool usage stats be sent to StatsD (if statsd_on is enabled) | 5 |
| scheduler_health_check_threshold | If the last scheduler heartbeat happened more than scheduler_health_check_threshold seconds ago, the scheduler is considered unhealthy | 30 |
| enable_health_check | When you start a scheduler, Airflow starts a tiny webserver subprocess to serve a health check if this is set to true | false |
| scheduler_health_check_server_port | When you start a scheduler, Airflow starts a tiny webserver subprocess to serve a health check on this port | 8974 |
| orphaned_tasks_check_interval | How often (in seconds) should the scheduler check for orphaned tasks and SchedulerJobs | 300 |
| child_process_log_directory | Determines the directory where logs for the child processes of the scheduler will be stored | /var/log/airflow/scheduler |
| scheduler_zombie_task_threshold | Local task jobs periodically heartbeat to the DB. If the job has not heartbeat in this many seconds, the scheduler will mark the associated task instance as failed and will re-schedule the task | 300 |
| zombie_detection_interval | How often (in seconds) should the scheduler check for zombie tasks | 10 |
| catchup_by_default | Turn off scheduler catchup by setting this to false. Command-line backfills still work; the setting can also be overridden on a per-DAG basis via the catchup argument | true |
| ignore_first_depends_on_past_by_default | Setting this to true makes the first task instance of a DAG ignore the depends_on_past setting | true |
| max_tis_per_query | This changes the batch size of queries in the scheduling main loop. If this is too high, SQL query performance may be impacted by the complexity of the query predicate and/or excessive locking. Additionally, you may hit the maximum allowable query length for your db | 512 |
| use_row_level_locking | Should the scheduler issue SELECT ... FOR UPDATE in relevant queries. If this is set to false, you should not run more than a single scheduler at once | true |
| max_dagruns_to_create_per_loop | Max number of DAGs to create DagRuns for per scheduler loop | 10 |
| max_dagruns_per_loop_to_schedule | How many DagRuns should a scheduler examine (and lock) when scheduling and queuing tasks | 20 |
| schedule_after_task_execution | Should the Task supervisor process perform a "mini scheduler" to attempt to schedule more tasks of the same DAG. Leaving this on will mean tasks in the same DAG execute quicker, but might starve out other DAGs in some circumstances | true |
| parsing_pre_import_modules | The scheduler reads DAG files to extract the Airflow modules that are going to be used, and imports them ahead of time to avoid having to re-do it for each parsing process. This flag can be set to false to disable this behavior in case an Airflow module needs to be freshly imported by each parsing process | true |
| parsing_processes | The scheduler can run multiple processes in parallel to parse DAGs. This defines how many processes will run | 2 |
| file_parsing_sort_mode | Determines how the scheduler lists and sorts DAG files to decide the parsing order. One of three values can be specified: modified_time, random_seeded_by_host, alphabetical | modified_time |
| standalone_dag_processor | Whether the DAG processor is running as a standalone process or is a subprocess of a scheduler job | true |
| max_callbacks_per_loop | Only applicable if standalone_dag_processor is used. The maximum number of callbacks that are fetched during a single loop | 20 |
| dag_stale_not_seen_duration | Only applicable if standalone_dag_processor is used. Time in seconds after which DAGs that were not updated by the DAG processor are deactivated | 600 |
| use_job_schedule | Turn off scheduler use of cron intervals by setting this to false. DAGs submitted manually in the web UI or with trigger_dag will still run | true |
| allow_trigger_in_future | Allows externally triggered DagRuns for execution dates in the future. Only has effect if schedule_interval is set to None in the DAG | false |
| trigger_timeout_check_interval | How often to check for expired trigger requests that have not run yet | 15 |
| task_queued_timeout | Amount of time a task can be in the queued state before being retried or set to failed | 600 |
| task_queued_timeout_check_interval | How often to check for tasks that have been in the queued state for longer than task_queued_timeout | 120 |
| allowed_run_id_pattern | The pattern used to validate the run_id of user-provided DAG runs | ^[A-Za-z0-9_.~:+-]+$ |
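For example, DAG parsing behavior is governed by a handful of these [scheduler] options (the values shown mirror the defaults in the table):

[scheduler]
min_file_process_interval = 30
dag_dir_list_interval = 300
parsing_processes = 2
standalone_dag_processor = True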
| Parameter | Description | Default value |
|---|---|---|
| default_capacity | How many triggers a single Triggerer will run at once, by default | 1000 |
| job_heartbeat_sec | How often to heartbeat the Triggerer job to ensure it hasn't been killed | 5 |
| Parameter | Description | Default value |
|---|---|---|
| ccache | Location of your ccache file once kinit has been performed | /opt/airflow/krb5_ccache |
| principal | Kerberos principal | — |
| reinit_frequency | Kerberos reinit frequency, in seconds | 3600 |
| kinit_path | Path to the kinit executable | kinit |
| keytab | Designates the path to the Kerberos keytab file for the Airflow user | — |
| forwardable | Allows you to disable ticket forwardability | true |
| include_ip | Allows you to remove the source IP from the token, which is useful when using the token behind a NATted Docker host | true |
| Parameter | Description | Default value |
|---|---|---|
| host | Elasticsearch host | — |
| log_id_template | Format of the log_id used when retrieving logs from Elasticsearch. The value must be wrapped in {% raw %}…{% endraw %} tags | {% raw %}{dag_id}-{task_id}-{run_id}-{map_index}-{try_number}{% endraw %} |
| end_of_log_mark | Used to mark the end of a log stream for a task | end_of_log |
| frontend | Qualified URL for an Elasticsearch frontend (like Kibana) with a template argument for log_id | — |
| write_stdout | Write the task logs to the stdout of the worker, rather than the default files | false |
| json_format | Instead of the default log formatter, write the log lines as JSON | false |
| json_fields | Log fields to attach to the JSON output, if enabled | asctime, filename, lineno, levelname, message |
| host_field | The field where the host name is stored (normally either host or host.name) | host |
| offset_field | The field where the offset is stored (normally either offset or log.offset) | offset |
| index_patterns | Comma-separated list of index patterns to use when searching for logs | _all |
| Parameter | Description | Default value |
|---|---|---|
| use_ssl | Whether to use SSL for connections | false |
| verify_certs | Whether to verify SSL certificates. Set to false to disable certificate verification | true |
| Parameter | Description | Default value |
|---|---|---|
| api_client_retry_configuration | Kwargs to override the default urllib3 Retry used in the Kubernetes API client | — |
| pod_template_file | Path to the YAML pod file that forms the basis for KubernetesExecutor workers | — |
| worker_container_repository | The repository of the Kubernetes image for the worker to run | — |
| worker_container_tag | The tag of the Kubernetes image for the worker to run | — |
| namespace | The Kubernetes namespace where Airflow workers should be created. Defaults to default | default |
| delete_worker_pods | If true, all worker pods will be deleted upon termination | true |
| delete_worker_pods_on_failure | If false (and delete_worker_pods is true), failed worker pods are not deleted, so users can investigate them. This only prevents removal of worker pods where the worker itself failed, not when the task it ran failed | false |
| worker_pods_creation_batch_size | Number of Kubernetes worker pod creation calls per scheduler loop. The current default of 1 launches only a single pod per loop; it is recommended to increase this number to match the tolerance of your Kubernetes cluster | 1 |
| multi_namespace_mode | Allows users to launch pods in multiple namespaces. Will require creating a cluster role for the scheduler, or use the multi_namespace_mode_namespace_list configuration | false |
| multi_namespace_mode_namespace_list | If multi_namespace_mode is true while the scheduler does not have a cluster role, give the list of namespaces where the scheduler will schedule jobs | — |
| in_cluster | Whether to use the service account that Kubernetes gives to pods to connect to the Kubernetes cluster. It is intended for clients that expect to be running inside a pod running on Kubernetes. It will raise an exception if called from a process not running in a Kubernetes environment | true |
| cluster_context | When running with in_cluster set to false, change the default cluster_context or config_file options to the Kubernetes client. Leave blank to use the default behavior | — |
| config_file | Path to the Kubernetes configfile to be used when in_cluster is set to false | — |
| kube_client_request_args | Keyword parameters to pass while calling Kubernetes client core_v1_api methods from the Kubernetes executor | — |
| delete_option_kwargs | Optional keyword arguments to pass to the delete_namespaced_pod Kubernetes client core_v1_api method when deleting worker pods | — |
| enable_tcp_keepalive | Enables the TCP keepalive mechanism. This prevents Kubernetes API requests from hanging indefinitely when an idle connection is timed out by services like cloud load balancers or firewalls | true |
| tcp_keep_idle | When the enable_tcp_keepalive option is enabled, TCP probes a connection that has been idle for this amount of time (in seconds) | 120 |
| tcp_keep_intvl | When the enable_tcp_keepalive option is enabled, TCP retries sending the probe every tcp_keep_intvl seconds if the connection has not responded | 30 |
| tcp_keep_cnt | When the enable_tcp_keepalive option is enabled, TCP retries sending the probe tcp_keep_cnt times before the connection is considered broken | 6 |
| verify_ssl | Set this to false to skip verifying the SSL certificate of the Kubernetes Python client | true |
| worker_pods_queued_check_interval | How often in seconds to check for task instances stuck in the queued status without pods | 60 |
| ssl_ca_cert | Path to a CA certificate to be used by the Kubernetes client to verify the server's SSL certificate | — |
| Parameter | Description | Default value |
|---|---|---|
| default_timeout | Sensor default timeout, 7 days by default (7 * 24 * 60 * 60 seconds) | 604800 |
This field enables adding custom parameters to the airflow.cfg configuration file.
| Parameter | Description | Default value |
|---|---|---|
| AIRFLOW_HOME | The home directory for the Airflow service | /opt/airflow |
| AIRFLOW_CONFIG | The location of the Airflow configuration file | /opt/airflow/airflow.cfg |
| AIRFLOW_PYTHON_PATH | The location of the Python interpreter used by Airflow | /opt/airflow/bin/python3.10 |
| DAG_PROCESSOR_SUBDIR | The location of Airflow stored DAGs | /opt/airflow/dags |
This field enables adding custom parameters to the airflow.cfg configuration file.
| Parameter | Description | Default value |
|---|---|---|
| AUTH_LDAP_SERVER | The LDAP server URI | — |
| AUTH_LDAP_BIND_USER | The path of the LDAP proxy user to bind on to the top level | — |
| AUTH_LDAP_BIND_PASSWORD | The password of the bind user | — |
| AUTH_LDAP_SEARCH | The LDAP path under which you'd like the users to have access to Airflow | — |
| AUTH_LDAP_UID_FIELD | The UID (unique identifier) field in LDAP | — |
| AUTH_ROLES_MAPPING | The parameter for mapping the internal roles to the LDAP Active Directory groups | — |
| AUTH_LDAP_GROUP_FIELD | The LDAP user attribute which has their role DNs | — |
| AUTH_ROLES_SYNC_AT_LOGIN | A flag that indicates if all the user's roles should be replaced on each login, or only on registration | true |
| PERMANENT_SESSION_LIFETIME | Sets an inactivity timeout after which users have to re-authenticate (to keep roles in sync) | 1800 |
| AUTH_LDAP_USE_TLS | Boolean indicating whether TLS is used | false |
| AUTH_LDAP_ALLOW_SELF_SIGNED | Boolean to allow self-signed certificates | true |
| AUTH_LDAP_TLS_CACERTFILE | Location of the CA certificate | — |
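As these are Flask-AppBuilder settings, they are usually written as plain assignments; a hedged sketch with placeholder server address and DNs:

AUTH_LDAP_SERVER = "ldap://ldap.example.com:389"
AUTH_LDAP_BIND_USER = "cn=proxyuser,dc=example,dc=com"
AUTH_LDAP_SEARCH = "ou=users,dc=example,dc=com"
AUTH_LDAP_UID_FIELD = "uid"
AUTH_ROLES_SYNC_AT_LOGIN = True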
| Parameter | Description | Default value |
|---|---|---|
| Extra requirements | List of Python packages to be installed on Airflow hosts. Uses the standard requirements.txt format | — |
| index-url | Base URL of the Python Package Index (default: https://pypi.org/simple). The URL must point to a repository that complies with PEP 503 (the simple API) or to a local directory with the same structure | — |
| index-url-user | Username used for authenticating with the repository specified in index-url | — |
| index-url-password | Password used for authenticating with the repository specified in index-url | — |
| proxy | Address of the proxy server through which package installation requests will be routed | — |
| proxy-user | Username for authenticating with the proxy server | — |
| proxy-password | Password used for authenticating with the proxy server | — |
| trusted-host | IP address of the host or the host:port pair to be marked as trusted, even though it does not have valid or any HTTPS | — |
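The Extra requirements field follows the standard requirements.txt syntax; the package names and version pins below are illustrative:

apache-airflow-providers-ssh
pandas==2.0.3
requests>=2.28,<3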
| Parameter | Description | Default value |
|---|---|---|
| Enable custom ulimits | Switch on the corresponding toggle button to specify resource limits (ulimits) for the current process. If you do not set these values, the default system settings are used. Ulimit settings are described in the Ulimit settings table | — |
| Parameter | Description | Default value |
|---|---|---|
| auto_refresh | Enables automatic refresh for the Workers view. By default, the Workers view automatically refreshes at regular intervals to provide up-to-date information about the workers. Set this option to false to disable the automatic refresh | true |
| ca_cert | Sets the path to the ca_certs file containing a set of concatenated certification authority certificates | — |
| cert_file | Sets the path to the SSL certificate file | — |
| keyfile | Sets the path to the SSL key file | — |
| db | Sets the database file to use if persistent mode is enabled | flower |
| tasks_columns | Specifies the list of comma-separated columns to display on the Tasks page | name,uuid,state,args,kwargs,result,received,started,runtime,worker |
| persistent | When persistent mode is enabled, Flower saves its current state and reloads it upon restart. This ensures that Flower retains its state and configuration across restarts. Flower stores its state in a database file specified by the db option | false |
| debug | Enables the debug mode | false |
| enable_events | When enabled, Flower periodically enables Celery events | false |
| inspect_timeout | Sets the timeout for the worker inspect commands, in milliseconds | 1000 |
| max_workers | Sets the maximum number of workers to keep in memory | 5000 |
| max_tasks | Sets the maximum number of tasks to keep in memory | 100000 |
| natural_time | Enables showing time relative to the page refresh time in a human-readable format | false |
| state_save_interval | Sets the interval for saving the Flower state. The Flower state includes information about workers and tasks. The state is saved periodically to ensure data persistence and recovery upon restart | 100000 |
| xheaders | Enables support for the X-Real-Ip and X-Scheme headers | false |
| purge_offline_workers | Time (in seconds) after which offline workers are automatically removed from the Workers view. By default, offline workers remain on the dashboard indefinitely | — |
| task_runtime_metric_buckets | Sets the task runtime latency buckets. You can provide the buckets value as a comma-separated list of values | — |
| auth_provider | Sets the authentication provider for Flower | — |
| auth | Enables authentication. The value is a regexp of email addresses to grant access to | — |
| oauth2_key | Sets the OAuth 2.0 key (client ID) issued by the OAuth 2.0 provider | — |
| oauth2_secret | Sets the OAuth 2.0 secret issued by the OAuth 2.0 provider | — |
| oauth2_redirect_uri | Sets the URI to which an OAuth 2.0 server redirects the user after successful authentication and authorization | — |
| cookie_secret | Sets a secret key for signing cookies | — |
| Enable custom ulimits | Switch on the corresponding toggle button to specify resource limits (ulimits) for the current process. If you do not set these values, the default system settings are used. Ulimit settings are described in the Ulimit settings table | — |
| Parameter | Description | Corresponding option of the ulimit command in CentOS |
|---|---|---|
| LimitCPU | A limit, in seconds, on the amount of CPU time that a process can consume | cpu time (-t) |
| LimitFSIZE | The maximum size of files that a process can create, in 512-byte blocks | file size (-f) |
| LimitDATA | The maximum size of a process's data segment, in kilobytes | data seg size (-d) |
| LimitSTACK | The maximum stack size allocated to a process, in kilobytes | stack size (-s) |
| LimitCORE | The maximum size of a core dump file allowed for a process, in 512-byte blocks | core file size (-c) |
| LimitRSS | The maximum amount of RAM (resident set size) that can be allocated to a process, in kilobytes | max memory size (-m) |
| LimitNOFILE | The maximum number of open file descriptors allowed for the process | open files (-n) |
| LimitAS | The maximum size of the process virtual memory (address space), in kilobytes | virtual memory (-v) |
| LimitNPROC | The maximum number of processes | max user processes (-u) |
| LimitMEMLOCK | The maximum memory size that can be locked for the process, in kilobytes. Memory locking ensures the memory is always in RAM and a swap file is not used | max locked memory (-l) |
| LimitLOCKS | The maximum number of files locked by a process | file locks (-x) |
| LimitSIGPENDING | The maximum number of signals that are pending for delivery to the calling thread | pending signals (-i) |
| LimitMSGQUEUE | The maximum number of bytes in POSIX message queues. POSIX message queues allow processes to exchange data in the form of messages | POSIX message queues (-q) |
| LimitNICE | The maximum NICE priority level that can be assigned to a process | scheduling priority (-e) |
| LimitRTPRIO | The maximum real-time scheduling priority level | real-time priority (-r) |
| LimitRTTIME | A limit, in microseconds, on the amount of CPU time that a process scheduled under a real-time policy can consume without making a blocking system call | — |
GitSync
A JSON configuration with repository parameters for GitSync.
| Parameter | Description | Default value |
|---|---|---|
| url | Git repository URL | git@ssh.gitlab.example.io:org/repo.git |
| files | File filter pattern | *.py |
| branch | Git branch | main |
| directory | Path inside the repository | ./src |
| sync_interval | Interval between sync cycles (seconds) | 60 |
| sync_timeout | Timeout for a sync operation | 120 |
| target_folder | Destination directory. Each repository must use a unique target_folder | /path/to/target/folder |
| delete_old_files | Whether to remove outdated files | true |
| ssh_key | Name of the private key. When using SSH repositories, private keys are managed by the GitSync service and do not need to be manually placed on the host. Required for SSH repositories | — |
| tag | Git tag to use instead of a branch for repository synchronization | v1.0.0 |
| sync_requirements | Enables synchronization and installation of Python dependencies from a requirements.txt file | true |
| requirements_path | Path to the requirements file | ./requirements.txt |
| access_token | Access token for authentication when using HTTPS repositories | — |
| https_username | Username for HTTPS authentication when using access tokens | oauth2 |
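A hedged example of a single repository entry combining the parameters above (the URL and paths are placeholders; the exact top-level structure of config.json may differ):

{
  "url": "git@ssh.gitlab.example.io:org/repo.git",
  "branch": "main",
  "files": "*.py",
  "directory": "./src",
  "target_folder": "/opt/airflow/dags/repo",
  "sync_interval": 60,
  "sync_timeout": 120,
  "delete_old_files": true,
  "sync_requirements": true,
  "requirements_path": "./requirements.txt"
}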
| Parameter | Description | Default value |
|---|---|---|
TARGET_PYTHON |
Python binary used for synchronizing dependencies. Defined at the service level and shared across all repositories. Separate Python environments per repository are not supported |
/usr/lib/ado-python310/bin/python3 |
WORK_DIR |
Internal directory used by the GitSync application for storing temporary data and repository copies |
/usr/lib/ad-gitsync |
LOG_DIR |
Directory for storing application logs |
/var/log/ad-gitsync |
CREDENTIALS_DIRECTORY |
Directory containing SSH keys and credentials for accessing Git repositories |
/etc/ad-gitsync/ssh_key |
CONFIG_FILE |
Path to the JSON configuration file that defines repository settings and synchronization parameters |
/etc/ad-gitsync/conf/config.json |
WORKER_COUNT |
Number of parallel workers that can process multiple repositories simultaneously |
1 |
You can use the Custom gitsync_env.sh field to set configuration parameters for GitSync. The settings specified in this field have higher priority than the settings specified in gitsync_env.sh.
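A minimal sketch of such an override (the value is illustrative):
# Process up to 4 repositories in parallel (the default is 1)
WORKER_COUNT=4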
Switch on the corresponding toggle button to specify resource limits (ulimits) for the current process. If you do not set these values, the default system settings are used. Ulimit settings are described in the Ulimit settings table.
[Service]
LimitCPU=
LimitFSIZE=
LimitDATA=
LimitSTACK=
LimitCORE=
LimitRSS=
LimitNOFILE=
LimitAS=
LimitNPROC=
LimitMEMLOCK=
LimitLOCKS=
LimitSIGPENDING=
LimitMSGQUEUE=
LimitNICE=
LimitRTPRIO=
LimitRTTIME=
| Parameter | Description | Corresponding option of the ulimit command in CentOS |
|---|---|---|
LimitCPU |
A limit in seconds on the amount of CPU time that a process can consume |
cpu time ( -t) |
LimitFSIZE |
The maximum size of files that a process can create, in 512-byte blocks |
file size ( -f) |
LimitDATA |
The maximum size of a process’s data segment, in kilobytes |
data seg size ( -d) |
LimitSTACK |
The maximum stack size allocated to a process, in kilobytes |
stack size ( -s) |
LimitCORE |
The maximum size of a core dump file allowed for a process, in 512-byte blocks |
core file size ( -c) |
LimitRSS |
The maximum amount of RAM (resident set size) that can be allocated to a process, in kilobytes |
max memory size ( -m) |
LimitNOFILE |
The maximum number of open file descriptors allowed for the process |
open files ( -n) |
LimitAS |
The maximum size of the process virtual memory (address space), in kilobytes |
virtual memory ( -v) |
LimitNPROC |
The maximum number of processes |
max user processes ( -u) |
LimitMEMLOCK |
The maximum memory size that can be locked for the process, in kilobytes. Memory locking ensures the memory is always in RAM and a swap file is not used |
max locked memory ( -l) |
LimitLOCKS |
The maximum number of files locked by a process |
file locks ( -x) |
LimitSIGPENDING |
The maximum number of signals that are pending for delivery to the calling thread |
pending signals ( -i) |
LimitMSGQUEUE |
The maximum number of bytes in POSIX message queues. POSIX message queues allow processes to exchange data in the form of messages |
POSIX message queues ( -q) |
LimitNICE |
The maximum NICE priority level that can be assigned to a process |
scheduling priority ( -e) |
LimitRTPRIO |
The maximum real-time scheduling priority level |
real-time priority ( -r) |
LimitRTTIME |
A limit, in microseconds, on the amount of CPU time that a process scheduled under a real-time policy can consume without making a blocking system call |
— |
Monitoring
| Parameter | Description | Default value |
|---|---|---|
scrape_interval |
Specifies how frequently to scrape targets |
1m |
scrape_timeout |
Specifies how long to wait until a scrape request times out |
10s |
Password for Grafana connection |
Password of a Grafana user to connect to Prometheus |
— |
Prometheus users to login/logout to Prometheus |
User credentials for logging into the Prometheus web interface |
— |
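The scrape settings correspond to the global section of prometheus.yml, for example:
global:
  # How often Prometheus scrapes its targets
  scrape_interval: 1m
  # How long a single scrape request may take
  scrape_timeout: 10s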
| Parameter | Description | Default value |
|---|---|---|
config.file |
Path to the main Prometheus configuration file, which defines scrape jobs, alerting rules, and other settings |
/etc/admprom/prometheus/prometheus.yml |
storage.tsdb.path |
Directory where Prometheus stores its time series database (TSDB) files |
/var/lib/admprom/prometheus |
web.console.libraries |
Location of console library files used for rendering the Prometheus UI consoles |
/usr/share/admprom/prometheus/console_libraries |
web.console.templates |
Directory containing console templates for the Prometheus UI |
/usr/share/admprom/prometheus/consoles |
web.config.file |
Path to the web configuration file, used for authentication, TLS, and other web server settings |
/etc/admprom/prometheus/prometheus-auth.yml |
storage.tsdb.retention.time |
Defines how long to retain data in the time series database before deletion |
15d |
web.listen-address |
IP address and port where the Prometheus web interface and API listen for incoming connections |
0.0.0.0:11200 |
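These parameters are passed to Prometheus as command-line flags; an illustrative invocation with the defaults above:
prometheus \
  --config.file=/etc/admprom/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/admprom/prometheus \
  --storage.tsdb.retention.time=15d \
  --web.listen-address=0.0.0.0:11200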
| Parameter | Description | Default value |
|---|---|---|
Grafana administrator’s password |
Password of a Grafana administrator user |
— |
Grafana listen port |
Port to access the Grafana web interface |
11210 |
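The listen port maps to the http_port setting in the [server] section of grafana.ini, for example:
[server]
http_port = 11210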
| Parameter | Description | Default value |
|---|---|---|
Listen port |
Port on which a host’s system metrics are exposed in the Prometheus format |
11203 |
Metrics endpoint |
Endpoint to retrieve system metrics |
/metrics |
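These two parameters correspond to the standard Node Exporter flags; an illustrative invocation:
node_exporter \
  --web.listen-address=:11203 \
  --web.telemetry-path=/metrics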
| Parameter | Description | Default value |
|---|---|---|
[Prometheus] → Enable SSL |
Defines whether SSL is enabled for Prometheus |
false |
[Prometheus] → Certificate file |
Path to the Prometheus server SSL certificate file in the PEM format |
/etc/admprom/prometheus/server.crt |
[Prometheus] → Private key file |
Path to the private key file of the Prometheus server SSL certificate |
/etc/admprom/prometheus/server.key |
[Prometheus] → Certificate authority file |
Path to the certificate authority file |
/etc/admprom/prometheus/ca.crt |
[Grafana] → Enable SSL |
Defines whether SSL is enabled for Grafana |
false |
[Grafana] → Certificate file |
Path to the Grafana server SSL certificate file in the PEM format |
/etc/admprom/grafana/server.crt |
[Grafana] → Private key file |
Path to the private key file of the Grafana server SSL certificate |
/etc/admprom/grafana/server.key |
[Grafana] → Certificate authority file |
Path to the certificate authority file |
/etc/admprom/grafana/ca.crt |
[Node-exporter] → Enable SSL |
Defines whether SSL is enabled for Node Exporter |
false |
[Node-exporter] → Certificate file |
Path to the Node Exporter server SSL certificate file in the PEM format |
/etc/ssl/server.crt |
[Node-exporter] → Private key file |
Path to the private key file of the Node Exporter server SSL certificate |
/etc/ssl/server.key |
Set SSL rights for certs/key |
Enables changing the owner and permissions of the SSL certificate and key files |
false |
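When SSL is enabled for Prometheus, the certificate and key paths are typically referenced from the web configuration file (web.config.file above). The snippet below uses the standard Prometheus web config syntax; its exact placement in prometheus-auth.yml is an assumption:
tls_server_config:
  # Server certificate and private key in the PEM format
  cert_file: /etc/admprom/prometheus/server.crt
  key_file: /etc/admprom/prometheus/server.key
  # CA used to verify client certificates (optional)
  client_ca_file: /etc/admprom/prometheus/ca.crt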
| Parameter | Description | Default value |
|---|---|---|
job_name |
The name of the job within which metrics will be collected |
statsd_exporter |
scrape_interval |
Specifies how frequently to scrape targets |
5s |
scrape_timeout |
Specifies how long to wait until a scrape request times out. Cannot be greater than the value of the scrape_interval parameter |
— |
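In prometheus.yml terms, such a job looks as follows (the target address is illustrative; the flower_exporter job below is configured the same way):
scrape_configs:
  - job_name: statsd_exporter
    scrape_interval: 5s
    static_configs:
      - targets: ["<host>:9102"]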
| Parameter | Description | Default value |
|---|---|---|
job_name |
The name of the job within which metrics will be collected |
flower_exporter |
scrape_interval |
Specifies how frequently to scrape targets |
5s |
scrape_timeout |
Specifies how long to wait until a scrape request times out. Cannot be greater than the value of the scrape_interval parameter |
— |
| Parameter | Description | Default value |
|---|---|---|
Mapping config |
Airflow StatsD metrics mapping |
— |
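The mapping config is a YAML file in the statsd_exporter mapping format. A minimal sketch; the metric name and label below are hypothetical:
mappings:
  # Convert a StatsD metric into a labeled Prometheus metric
  - match: "airflow.operator_successes_*"
    name: "airflow_operator_successes"
    labels:
      operator: "$1"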
| Parameter | Description | Default value |
|---|---|---|
web.listen-address |
Port on which the web interface and generated Prometheus metrics are exposed |
9102 |
statsd.mapping-config |
The name of the metrics mapping configuration file |
/etc/statsd-exporter/conf/statsd-mapping.yml |
statsd.listen-udp |
The UDP port on which to receive statsd metric lines. Filled from the statsd_port parameter in airflow.cfg [metrics] |
8125 |
web.enable-lifecycle |
Enables shutdown and reload via HTTP request |
true |
statsd.cache-size |
Maximum size of the metric mapping cache. If max size is reached, the service will rely on the least recently used replacement policy |
— |
statsd.listen-tcp |
The TCP port on which to receive statsd metric lines. Leave the value empty to disable it |
— |
web.telemetry-path |
Path under which to expose metrics |
— |
statsd.listen-unixgram |
The Unixgram socket path to receive statsd metric lines in datagram. Leave the value empty to disable it |
— |
statsd.unixsocket-mode |
The permission mode of the Unix socket |
— |
statsd.read-buffer |
The size (in bytes) of the operating system’s transmit read buffer associated with the UDP or Unixgram connection. Ensure that the net.core.rmem_max kernel parameter is set to a value greater than the specified size |
— |
statsd.cache-type |
Metric mapping cache type. Valid options are lru and random |
— |
statsd.event-queue-size |
The size of internal queue for processing events |
— |
statsd.event-flush-threshold |
The number of events to hold in the queue before flushing |
— |
statsd.event-flush-interval |
Maximum time between event queue flushes |
— |
debug.dump-fsm |
The path where to dump internal FSM generated for glob matching (as a Dot file) |
— |
statsd.parse-dogstatsd-tags |
Indicates whether to parse DogStatsd style tags |
true |
statsd.parse-influxdb-tags |
Indicates whether to parse InfluxDB style tags |
true |
statsd.parse-librato-tags |
Indicates whether to parse Librato style tags |
true |
statsd.parse-signalfx-tags |
Indicates whether to parse SignalFX style tags |
true |
statsd.relay.address |
The UDP relay target address in the host:port format |
— |
statsd.relay.packet-length |
Maximum relay output packet length to avoid fragmentation |
— |
statsd.udp-packet-queue-size |
Size of internal queue for processing UDP packets |
— |
log.level |
The logging level. Supported values: debug, info, warn, error |
— |
log.format |
The output format of logs. Supported values: logfmt, json |
— |
Custom statsd-options.env |
This field enables adding custom parameters to the statsd-options.env configuration file |
— |
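An illustrative statsd_exporter invocation combining the options above:
statsd_exporter \
  --web.listen-address=:9102 \
  --statsd.listen-udp=:8125 \
  --statsd.mapping-config=/etc/statsd-exporter/conf/statsd-mapping.yml \
  --web.enable-lifecycle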
Redis
| Parameter | Description | Default value |
|---|---|---|
redis.conf |
Redis configuration file |
— |
redis_port |
Redis broker listen port |
6379 |
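The listen port corresponds to the port directive in redis.conf, for example:
# TCP port for the Redis broker
port 6379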
| Parameter | Description | Default value |
|---|---|---|
sentinel.conf |
Sentinel configuration file |
— |
sentinel_port |
Sentinel port |
26379 |
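Similarly, the Sentinel port is set by the port directive in sentinel.conf. The master name, host, and quorum below are illustrative:
port 26379
# Monitor a master named mymaster on <host>:6379 with a quorum of 2
sentinel monitor mymaster <host> 6379 2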
| Parameter | Description | Default value |
|---|---|---|
Enable custom ulimits |
Displays an editable ulimits config for the Redis Server |
— |
DBT
| Parameter | Description | Default value |
|---|---|---|
DBT_PROFILES_DIR |
Path to the profiles.yml configuration file |
/etc/ad-dbt/conf |
DBT_LOG_PATH |
Directory for logs |
— |
DBT_TARGET |
Specifies the default dbt target defined in profiles.yml |
— |
You can use the Custom dbt-env.sh field to set configuration parameters for DBT. The settings specified in this field have higher priority than the settings specified in dbt-env.sh.
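A minimal sketch of such overrides (the log directory and target name are illustrative):
# Keep dbt logs in a dedicated directory
DBT_LOG_PATH=/var/log/ad-dbt
# Select the dev target defined in profiles.yml
DBT_TARGET=dev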
The docs component has the docs_projects parameter, which maps dbt project paths to the ports used by the documentation web UI.
Configuration template:
docs_projects:
  <project_dir>: <port>
where:

- project_dir — path to a dbt project;
- port — port used to access the documentation.
Example:
docs_projects:
  /opt/test/dbt/dbt_people_lab: 8092
The documentation will be available at http://<host>:<port>.