Airflow performance tuning
This article describes optimization techniques and best practices that can help you improve the performance of the Airflow service in Arenadata Orchestrator clusters. The described approaches are based on current Apache Airflow recommendations and apply to Airflow version 2.10.5 and higher.
Scheduler tuning
The scheduler is the primary throughput limiter for Airflow at scale. When the scheduler cannot keep up with DAG parsing and task scheduling, queues grow, task wait times increase, and the metadata DB becomes overloaded. Tuning the scheduler first usually yields the largest throughput gains.
If the scheduler loop time increases, look for bottlenecks by checking DAG parse time, top-level imports in DAGs, and the values of the store_serialized_dags and min_file_process_interval parameters.
If tasks stay in a queue for a long time, check the Executor state, the Celery broker queue length, the worker_concurrency value, and whether the DB connection pool is exhausted.
Key settings
| Configuration property | Description | Suggested starting value |
|---|---|---|
| store_serialized_dags | Stores serialized DAGs in the metadata DB so the scheduler reads pre-parsed DAGs instead of re-parsing Python on every loop. Reduces parse CPU and I/O by letting the scheduler load a compact serialized DAG model from the DB instead of running Python code. This is very effective for large numbers of DAG files | True |
| min_file_process_interval | Minimum number of seconds between DAG file parsing cycles. Increases parse efficiency at the expense of immediate DAG update propagation | 30s |
| dag_dir_list_interval | Interval for scanning the dags folder for new or removed files. Set a higher value to reduce I/O overhead | 60s |
| max_tis_per_query | Batch size of TaskInstance rows fetched per scheduling loop. Lower values make the scheduler more responsive. Experiment to find a value that balances DB query cost against scheduler throughput. Recent Airflow releases changed the default to favor responsiveness. Must be lower than core.parallelism | 16–128 |
| max_dagruns_to_create_per_loop | Limits the number of DAGs for which runs are created per loop (helps distribute work across multiple schedulers) | 10–50 |
Scheduler parallelism, HA, and worker coordination
For better performance, run schedulers on CPU-optimized nodes. For high scale, run multiple schedulers in HA mode with use_row_level_locking=True and tune max_dagruns_to_create_per_loop so creation of DAG runs is spread across schedulers.
Increase scheduler CPU and memory if you parse many DAGs or use dynamic DAG generation, but first refactor DAGs to minimize top-level work.
DAG design recommendations
Flawed DAG design can multiply scheduler and worker costs. To make sure that DAGs do not impede your Airflow optimization efforts, follow the best practices for writing DAGs described in this section.
Parsing and top-level code
When writing DAG files in Airflow, it is important to avoid heavy imports or blocking calls at the module’s top level (outside of task functions or operators). This is because the scheduler and Web Server need to continuously parse and load DAG files to keep track of available workflows.
Every time a DAG is parsed, all the top-level code in the file is executed. If the file contains expensive imports, database queries, or network calls at the top level, this significantly increases the parsing time and can put unnecessary strain on the scheduler and Web Server. Under high load with many DAGs, these delays compound, leading to slower scheduling loops, missed SLAs, and reduced cluster throughput.
Another risk is that blocking calls or large dependencies at the module level can introduce instability. For example, if a DAG file imports a library that takes several seconds to load or attempts to fetch data from an external service, the scheduler may hang or even fail to parse the DAG, marking it as broken. This can prevent critical workflows from being scheduled at all.
To avoid these issues, heavy operations should be moved inside task functions or operators, where they will only execute when the task runs. Keeping DAG files lightweight ensures that parsing remains fast, predictable, and resilient even in environments with thousands of workflows and frequent code updates.
An example of a problematic DAG:
```python
from datetime import datetime

import pandas as pd  # (1)

from airflow import DAG
from airflow.decorators import task
from mylib import heavy_initialization

heavy = heavy_initialization()  # (2)

with DAG(
    dag_id="bad_dag",
    start_date=datetime(2025, 1, 1),
    schedule_interval=None,
    catchup=False,
):
    @task()
    def do_work_bad():
        heavy.process()

    do_work_bad()
```
| 1 | Placing an import here will slow down the parsing. |
| 2 | This top-level function will be executed during parsing, which will also slow it down. |
An example of how to fix that DAG:
```python
from datetime import datetime

from airflow import DAG
from airflow.decorators import task

with DAG(
    dag_id="good_dag",
    start_date=datetime(2025, 1, 1),
    schedule_interval=None,
    catchup=False,
):
    @task()
    def do_work_good():
        from mylib import heavy_initialization  # (1)

        heavy = heavy_initialization()  # (2)
        heavy.process()

    do_work_good()
```
| 1 | Put the import inside the task. |
| 2 | Put the function call inside the task. |
Removing top-level work in this way reduces parse latency and scheduler CPU spikes.
Dynamic task mapping
Dynamic task mapping helps optimize Airflow performance in environments with many parameterized or repetitive tasks. It creates tasks at runtime rather than generating every task instance at parse time.
With this approach, you define a single mapped task and supply it with a list of inputs instead of creating all task objects manually. When the DAG runs, Airflow automatically expands that task into multiple parallel instances and only then creates the individual task instances in the metadata database. This reduces scheduler load, speeds up DAG parsing, and keeps the DAG lightweight.
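A minimal sketch of a dynamically mapped task; the file names and the processing logic are illustrative placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.decorators import task

with DAG(
    dag_id="mapped_dag",
    start_date=datetime(2025, 1, 1),
    schedule_interval=None,
    catchup=False,
):
    @task()
    def list_files():
        # In a real DAG this could query object storage or a database
        return ["file_a.csv", "file_b.csv", "file_c.csv"]

    @task()
    def process_file(path: str):
        print(f"processing {path}")

    # expand() creates one task instance per input element at runtime,
    # so the DAG file itself stays small and fast to parse
    process_file.expand(path=list_files())
```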
XComs and inter-task data transfer
Avoid pushing large payloads to XCom, since XCom values are stored in the metadata DB. For large artifacts, use object storage (for example, S3 or HDFS) and push only pointers, such as paths or URIs, via XCom.
If you need small structured messages, use XCom JSON serialization or a remote XCom backend designed for larger payloads.
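A minimal sketch of the pointer pattern; the bucket name, connection ID, and key are hypothetical, and the Amazon provider package is assumed to be installed:

```python
from datetime import datetime

from airflow import DAG
from airflow.decorators import task
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

BUCKET = "my-bucket"   # hypothetical bucket
CONN_ID = "my_s3"      # hypothetical Airflow connection

with DAG(
    dag_id="pointer_xcom_dag",
    start_date=datetime(2025, 1, 1),
    schedule_interval=None,
    catchup=False,
):
    @task()
    def extract_data() -> str:
        key = "exports/2025-01-01/data.parquet"
        # Write the large artifact to object storage instead of XCom
        S3Hook(aws_conn_id=CONN_ID).load_string(
            "...large payload...", key=key, bucket_name=BUCKET
        )
        # Return only the pointer; this small string is what ends up in XCom
        return f"s3://{BUCKET}/{key}"

    @task()
    def load_data(path: str):
        # Downstream tasks receive the pointer, not the data itself
        print(f"loading from {path}")

    load_data(extract_data())
```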
Executor and worker tuning
The Executor determines how Airflow runs tasks in parallel. In ADO, the recommended Executors are CeleryExecutor and KubernetesExecutor, depending on workload characteristics.
Key parameters for tuning Executors in ADO are listed below.
| Parameter | Description | Default value |
|---|---|---|
| parallelism | Global limit for the number of task instances that can run in parallel across the entire Airflow environment | 32 |
| dag_concurrency | Maximum number of task instances allowed to run concurrently per DAG. Prevents a single DAG from exhausting cluster resources | 16 |
| max_active_runs_per_dag | Maximum number of active DAG runs for each DAG. Setting this value too high may lead to scheduler and worker overload | 16 |
| worker_concurrency | Maximum number of task instances a single worker can execute simultaneously. Adjust based on available CPU cores and memory | 16 |
Worker concurrency and resource sizing parameters directly determine how efficiently tasks are executed across your cluster.
The configuration of workers defines the throughput, responsiveness, and stability of the entire system. Worker concurrency controls how many tasks a single worker process can execute at the same time. If concurrency is set too low, your cluster might underutilize available CPU and memory resources, leading to slow DAG runs and idle infrastructure. If it’s set too high, the worker may attempt to run more tasks than the host machine can handle, causing CPU thrashing, memory exhaustion, and task failures.
For example, a worker with 8 CPU cores and a concurrency setting (worker_concurrency) of 64 might try to run far too many Python processes in parallel, leading to context switching overhead and degraded performance. But setting concurrency to 2 on the same machine would leave resources underused, forcing tasks to queue unnecessarily.
For optimal performance, set concurrency to a level that matches the available hardware resources (cores, memory) and the typical resource profile of your tasks. For lightweight tasks (like database queries or API calls), higher concurrency may work well, but for heavy ETL or ML tasks, lower concurrency ensures stability and better throughput.
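As a rough illustration of this sizing logic (the per-task estimates below are assumptions to adjust for your own workload, not ADO defaults), you can derive a starting worker_concurrency value from the host resources and the typical footprint of one task:

```python
# Illustrative heuristic for picking a starting worker_concurrency value
cpu_cores = 8
memory_gb = 32

cpu_per_task = 0.5      # lightweight tasks rarely saturate a full core
memory_per_task_gb = 2  # typical peak memory of a single task

# Take the stricter of the CPU-based and memory-based limits
worker_concurrency = min(
    int(cpu_cores / cpu_per_task),
    int(memory_gb / memory_per_task_gb),
)
print(worker_concurrency)  # 16 for this example host
```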
Airflow workers can be scaled both vertically (increasing CPU, memory, or disk resources per worker) and horizontally (adding more worker nodes). It's recommended to align the capacity of your hardware with the type of your workloads:

- CPU-intensive tasks (e.g. data transformations, Spark job submission) benefit from more cores but require careful concurrency tuning so that CPU-bound tasks don't overwhelm the infrastructure.
- Memory-heavy tasks (e.g. large Pandas dataframes) require larger memory allocations and lower concurrency settings to prevent out-of-memory errors.
- Mixed workloads often benefit from splitting workers into pools or queues, so that different task types run on workers optimized for their needs (see the sketch below).
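A minimal sketch of routing tasks to dedicated worker queues with CeleryExecutor; the queue name and the commands are illustrative assumptions, and the heavy_etl queue is expected to be served by workers started with that queue name:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="queue_routing_dag",
    start_date=datetime(2025, 1, 1),
    schedule_interval=None,
    catchup=False,
):
    # A light API call runs on workers listening to the default queue
    quick_check = BashOperator(
        task_id="quick_check",
        bash_command="curl -s https://example.com/health",
    )

    # A heavy transformation is routed to workers that consume the heavy_etl queue
    heavy_transform = BashOperator(
        task_id="heavy_transform",
        bash_command="python /opt/jobs/transform.py",
        queue="heavy_etl",
    )

    quick_check >> heavy_transform
```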
Metadata database (PostgreSQL) tuning
The metadata database is often the first and most critical bottleneck when scaling Airflow. Every scheduler heartbeat, task state update, log reference, and DAG parse results in reads and writes to this database. Because Airflow’s workload consists of a very high volume of small, frequent inserts, updates, and queries, the database must be tuned specifically for high write and read concurrency.
Recommendations for DB optimization:
- Set the value for max_connections high enough to handle Airflow components (scheduler, Web Server, workers, CLI), but balance it with connection pooling to avoid exhausting memory.
- Increase wal_buffers and set the right value for checkpoint_completion_target so that write-ahead log (WAL) flushing doesn't block frequent updates.
- Ensure the storage is optimized for fast random writes (e.g. SSDs).
- Airflow's metadata tables, such as task_instance, dag_run, and log, grow quickly. If autovacuum isn't aggressive enough, table bloat might cause slow queries. Increase the autovacuum frequency on these large tables, or schedule manual VACUUM/ANALYZE for predictable performance.
- Partition or periodically archive old task history and logs to keep table sizes manageable.
It is also important to scale the database vertically (CPU, memory, IOPS) as workflows grow. For more information on optimizing Postgres, see the Performance tuning article.
Connection pooling & SQLAlchemy
Set SQLAlchemy pool values so connections are available for scheduler, Web Server, workers, and external tooling.
| Parameter | Description | Suggested starting value |
|---|---|---|
| sql_alchemy_pool_size | Maximum number of database connections kept in the pool | 20 |
| sql_alchemy_max_overflow | Maximum overflow size of the pool. When the number of checked-out connections reaches the size set in sql_alchemy_pool_size, additional connections can be opened up to this limit | 30 |
| sql_alchemy_pool_recycle | Number of seconds a connection can be idle in the pool before it is invalidated. If the number of DB connections is ever exceeded, a lower value allows the system to recover faster | 1800 |
Increasing sql_alchemy_pool_size reduces wait time for DB connections. Ensure your Postgres max_connections is sized to accommodate total Airflow connections plus other DB users.
Concurrency controls, pools, and priorities
Pools
By default, Airflow will schedule as many tasks as allowed by the Executor’s capacity and worker concurrency. While this maximizes throughput, it can easily lead to resource contention. For example, too many tasks hammering a database, API rate limits being exceeded, or I/O-heavy jobs monopolizing a worker. Pools act as a concurrency throttle: they group tasks into resource categories and limit how many of them can run at once across the entire Airflow cluster.
One of the better ways to use this feature is to create pools for resource-bound operations (e.g. hadoop_jobs, db_ingest, api_calls), set explicit slot counts, and assign tasks to pools via the pool argument in operators.
You can configure pools using Airflow CLI or UI.
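A minimal sketch of assigning a task to a pool, assuming a pool named db_ingest has already been created (for example, via the UI or with `airflow pools set db_ingest 5 "DB ingest"` on the CLI):

```python
from datetime import datetime

from airflow import DAG
from airflow.decorators import task

with DAG(
    dag_id="pooled_ingest_dag",
    start_date=datetime(2025, 1, 1),
    schedule_interval=None,
    catchup=False,
):
    # Only as many tasks as the db_ingest pool has slots can run at once
    # across the whole cluster, regardless of total worker capacity
    @task(pool="db_ingest")
    def load_orders():
        # The actual ingest logic is a placeholder here
        print("loading orders into the reporting database")

    load_orders()
```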
Priorities and SLA
Use priority_weight on tasks to influence slot allocation when pools are saturated. Combine with pools to ensure high-priority jobs get scheduled sooner.
Using DAG-level caps such as max_active_runs_per_dag together with pools helps to smooth out bursts of task executions and prevents resource exhaustion. By default, if a DAG is triggered frequently or has a large backlog of catchup runs, Airflow may attempt to schedule many DAG runs at once. Setting max_active_runs_per_dag limits how many runs of that DAG can execute in parallel, ensuring that execution is paced and system resources are not overwhelmed. When combined with pools, DAG-level settings control how a single workflow consumes cluster capacity, while pools coordinate concurrency across different workflows and resource types.
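A minimal sketch combining a DAG-level cap with task priorities; the api_calls pool, the weights, and the commands are illustrative assumptions:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="capped_priority_dag",
    start_date=datetime(2025, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
    max_active_runs=2,  # at most two runs of this DAG execute in parallel
):
    # When the api_calls pool is saturated, the higher weight gets a slot first
    critical_sync = BashOperator(
        task_id="critical_sync",
        bash_command="python /opt/jobs/critical_sync.py",
        pool="api_calls",
        priority_weight=10,
    )

    best_effort_sync = BashOperator(
        task_id="best_effort_sync",
        bash_command="python /opt/jobs/best_effort_sync.py",
        pool="api_calls",
        priority_weight=1,
    )
```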
You can also implement a short task-rate limiter via an upstream DAG or queue consumer that writes to an intermediate queue (e.g. Kafka) for some cases.
Web Server sizing
The Airflow Web Server serves the UI, API requests, and DAG/task metadata queries, all of which grow with the number of users and DAG activity. An under-provisioned Web Server can lead to slow page loads, timeouts, or even service unavailability during peak usage.
To optimize performance, scale the Web Server both vertically (allocating sufficient CPU, memory, and network bandwidth) and horizontally (running multiple replicas behind a load balancer). Additionally, tune the Gunicorn workers: adjust the number of worker processes and threads to match the available cores. This ensures that requests are served efficiently without overloading the CPU or exhausting memory.
For very large deployments, caching static content, limiting the number of DAGs loaded per user session, and offloading heavy API queries to dedicated endpoints can further reduce load, ensuring that the Airflow UI remains responsive even under high user concurrency.
Logging
Avoid local-only logs if you scale workers: configure remote logging (S3/GCS/Elasticsearch) to centralize logs and reduce disk I/O on workers. Configure remote_base_log_folder and remote_log_conn_id parameters via ADCM.
Evaluate tuning results
Before making large config changes (e.g. raising parallelism or DB pool sizes), it is recommended to run load tests with representative DAGs to measure the effects on the scheduler, DB, and workers. The best way to evaluate the effectiveness of Airflow performance tuning is to run A/B comparison tests that isolate the impact of each change.