Manage dependencies via ADCM

Overview

Some Airflow DAGs require additional Python packages that are not included in the base environment. These packages may come from the public Python Package Index (PyPI), private repositories, or local distributions.

ADCM provides a centralized way to manage these dependencies. You can define and update Python package requirements directly from the ADCM UI using configuration parameters available for the Airflow service. ADCM ensures that these configurations are applied to all hosts in the cluster.

This feature supports:

  • public packages from PyPI (for example, requests and pandas);

  • private or internal Python packages hosted in authenticated repositories;

  • accessing repositories through a proxy;

  • dependency version constraints.

The Airflow configuration in ADCM provides editable fields that map directly to the parameters used by pip install.
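As an illustration, the sketch below assumes that each field maps one-to-one to the pip install option of the same name and assembles the resulting command. The helper build_pip_args, the file names requirements.txt and constraints.txt, and the repository URL are hypothetical. This guide does not describe how ADCM actually invokes pip. The options themselves (-r, -c, --index-url, --proxy, --trusted-host) are standard pip install options.

```python
from __future__ import annotations

# Assumed mapping: ADCM field name -> pip install option of the same name.
FIELD_TO_PIP_OPTION = {
    "index-url": "--index-url",
    "proxy": "--proxy",
    "trusted-host": "--trusted-host",
}


def build_pip_args(fields: dict[str, str],
                   requirements_path: str,
                   constraints_path: str | None = None) -> list[str]:
    """Build a pip install argument list from ADCM-style field values (hypothetical helper)."""
    args = ["pip", "install", "-r", requirements_path]
    if constraints_path:
        # -c applies a constraints file without installing anything listed in it.
        args += ["-c", constraints_path]
    for field, option in FIELD_TO_PIP_OPTION.items():
        value = fields.get(field)
        if value:  # empty or missing fields are skipped
            args += [option, value]
    return args


if __name__ == "__main__":
    print(build_pip_args(
        {"index-url": "https://pypi.example.com/simple", "proxy": ""},
        "requirements.txt",
        "constraints.txt",
    ))
```

Empty fields are skipped, so pip falls back to its own defaults (for example, the public PyPI index) for anything that is not set.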

This guide describes how to configure dependency management for Airflow using ADCM.

Define dependencies in ADCM

Let’s consider a DAG that has the following dependencies:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

import requests  # Python package
import pandas as pd  # Python package of a specific version
import my_private_module  # package from a private repository
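After the dependencies are installed, one way to confirm that the DAG's imports resolve on a host is to look each top-level module up with importlib.util.find_spec from the Python standard library. This is a sketch, not an ADCM feature. It must be run with the Python interpreter of the Airflow environment, and my_private_module is the placeholder name from the DAG above.

```python
import importlib.util

# Top-level modules imported by the example DAG.
# my_private_module is the placeholder name used in this guide.
DAG_MODULES = ["airflow", "requests", "pandas", "my_private_module"]


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level modules that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(DAG_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All DAG dependencies are importable.")
```

find_spec only locates modules without importing them, so the check has no side effects.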

Add a Python package

The Airflow service configuration in ADCM includes the Extra requirements field, which works like a requirements.txt file for managing dependencies.

To install the required Python packages for DAGs via ADCM:

  1. On the Clusters page, select your ADO cluster.

  2. Go to the Services tab and select the Airflow service.

  3. Toggle the Show advanced option and unfold the Dependency Management section.

  4. In the Extra requirements field, list the packages one per line in the <package-name>==<version> format. For example:

    requests
    pandas==2.2.2

    If the version is omitted, the latest available version is installed. For more information on how to specify packages, see the Requirements File Format description.

  5. Confirm changes to the configuration by clicking Save.

  6. In the Actions drop-down menu, select Sync requirements and click Run.

  7. In the Actions drop-down menu, select Restart, make sure the Apply configs from ADCM option is set to true, and click Run.
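To see which entries in the Extra requirements field are pinned and which resolve to the latest available version, the field content can be split into names and version specifiers. The parser below is a simplified sketch and not pip's own: it skips blank lines and comments but does not handle pip options, extras, environment markers, or URLs. The function name parse_requirements is hypothetical.

```python
import re

# Simplified requirement line: a package name followed by an optional
# specifier such as ==2.2.2 or <3.0.0. Extras, markers, and URLs are not handled.
REQUIREMENT_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*?)\s*$")


def parse_requirements(text: str) -> dict[str, str]:
    """Map each package name to its version specifier ('' if none is given)."""
    result = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = REQUIREMENT_RE.match(line)
        if match is None:
            raise ValueError(f"Unsupported requirement line: {line!r}")
        name, spec = match.groups()
        result[name.lower()] = spec
    return result


if __name__ == "__main__":
    extra_requirements = "requests\npandas==2.2.2\n"
    for name, spec in parse_requirements(extra_requirements).items():
        if not spec:
            status = "no version given, latest available"
        elif spec.startswith("=="):
            status = f"pinned ({spec})"
        else:
            status = f"range ({spec})"
        print(f"{name}: {status}")
```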

Add a package from a private repository

When your DAGs rely on a package hosted in a private repository, you can specify the repository URL and authentication credentials in ADCM.

To install the required Python packages from a private repository via ADCM:

  1. On the Clusters page, select your ADO cluster.

  2. Go to the Services tab and select the Airflow service.

  3. Toggle the Show advanced option and unfold the Dependency Management section.

  4. In the Extra requirements field, list the packages one per line in the <package-name>==<version> format. For example:

    my-private-pkg==0.1.0

  5. Configure the following fields:

    • index-url — repository URL;

    • index-url-user — username for authentication;

    • index-url-password — password for authentication.

  6. Confirm changes to the configuration by clicking Save.

  7. In the Actions drop-down menu, select Sync requirements and click Run.

  8. In the Actions drop-down menu, select Restart, make sure the Apply configs from ADCM option is set to true, and click Run.
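pip accepts basic-authentication credentials embedded in an index URL in the form https://user:password@host/simple. Whether ADCM combines index-url, index-url-user, and index-url-password this way is an assumption. The sketch only shows why special characters in credentials must be percent-encoded before they are placed in a URL. The helper index_url_with_credentials, the repository URL, and the credentials are hypothetical.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def index_url_with_credentials(index_url: str, user: str, password: str) -> str:
    """Embed percent-encoded credentials into an index URL (hypothetical helper)."""
    parts = urlsplit(index_url)
    # safe="" makes quote() encode reserved characters such as '@', ':' and '/',
    # which would otherwise break the user:password@host structure.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    netloc = f"{credentials}@{parts.netloc}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(index_url_with_credentials(
        "https://repo.example.com/simple", "ci-user", "p@ss:word"))
```

Without encoding, the @ in the password would be read as the end of the credentials and the host would be parsed incorrectly.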

Optionally, if your cluster requires proxy access to fetch public packages, you can enter the proxy parameters in the following fields:

  • proxy — proxy server address;

  • proxy-user — username for authentication via proxy;

  • proxy-password — password for authentication via proxy.

If the proxy does not support HTTPS, enter the IP address of the host or the <host IP>:<port> pair in the trusted-host field.
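pip documents the --proxy value as scheme://[user:passwd@]proxy.server:port and accepts one --trusted-host option per host or host:port pair. The sketch below combines the proxy, proxy-user, proxy-password, and trusted-host values into those options under two assumptions: the proxy field holds an address with or without a scheme, and a missing scheme means plain HTTP. The helper proxy_pip_args and the addresses are hypothetical.

```python
from __future__ import annotations

from urllib.parse import quote


def proxy_pip_args(proxy: str, user: str = "", password: str = "",
                   trusted_hosts: list[str] | None = None) -> list[str]:
    """Turn proxy and trusted-host field values into pip options (hypothetical helper)."""
    args: list[str] = []
    if proxy:
        scheme, sep, address = proxy.partition("://")
        if not sep:
            # No scheme given: assume plain HTTP (an assumption of this sketch).
            scheme, address = "http", proxy
        credentials = ""
        if user:
            credentials = f"{quote(user, safe='')}:{quote(password, safe='')}@"
        # Form documented by pip: scheme://[user:passwd@]proxy.server:port
        args += ["--proxy", f"{scheme}://{credentials}{address}"]
    for host in trusted_hosts or []:
        # Each host or host:port pair gets its own --trusted-host option.
        args += ["--trusted-host", host]
    return args


if __name__ == "__main__":
    print(proxy_pip_args("10.0.0.5:3128", "svc", "s3cret", ["10.0.0.20:8080"]))
```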

Dependency constraints

ADCM lets you control the versions of critical Python packages with constraints files. This helps prevent conflicts when new dependencies are added, as well as unintended upgrades of essential components such as Apache Airflow.

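Constraints files differ from requirements files: they restrict which versions of a package may be installed, but they do not cause the package to be installed. A package listed only in a constraints file is installed only if another requirement depends on it.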
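The difference can be modeled in a few lines. This is a simplified model of pip install -r requirements.txt -c constraints.txt, not pip's resolver: only required packages are installed, and a constraint only narrows the version range of a package that is installed anyway. Transitive dependencies, which pip also constrains, are not modeled. The function name install_plan is hypothetical.

```python
def install_plan(requirements: dict[str, str],
                 constraints: dict[str, str]) -> dict[str, str]:
    """Return the packages that would be installed and their effective specifiers."""
    plan = {}
    for name, spec in requirements.items():
        constraint = constraints.get(name, "")
        # Both specifiers must hold, so join them as a comma-separated set.
        plan[name] = ",".join(s for s in (spec, constraint) if s)
    return plan


if __name__ == "__main__":
    requirements = {"requests": ""}  # from the Extra requirements field
    constraints = {"requests": "<3.0.0", "pandas": "==2.2.2"}
    # pandas is constrained but not required, so it does not appear in the plan.
    print(install_plan(requirements, constraints))  # → {'requests': '<3.0.0'}
```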

The Dependency Management group of the Airflow configuration includes the following fields for version control:

  • Constraints file — editable list of version constraints defined by the user;

  • Base constraints file — read-only list of constraints required for proper Airflow operation that the Constraints file cannot override.
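One way to picture how the two fields interact is a merge in which base entries always win. The sketch below is a hypothetical model, not ADCM's documented algorithm. It treats any attempt to redefine a base-constrained package with a different specifier as a conflict, which is stricter than real version-range checking. The base entry shown is a placeholder because this guide does not list the actual content of the Base constraints file.

```python
class ConstraintConflict(ValueError):
    """Raised when a user constraint tries to redefine a base constraint."""


def combine_constraints(base: dict[str, str],
                        user: dict[str, str]) -> dict[str, str]:
    """Merge user constraints into the read-only base constraints.

    Hypothetical, deliberately strict policy: a package that already appears in
    the base constraints may only be repeated with the identical specifier.
    """
    combined = dict(base)
    for name, spec in user.items():
        if name in base and base[name] != spec:
            raise ConstraintConflict(
                f"{name}{spec} conflicts with base constraint {name}{base[name]}"
            )
        combined[name] = spec
    return combined


if __name__ == "__main__":
    base = {"apache-airflow": "==2.10.5"}  # placeholder base entry
    user = {"pandas": "==2.2.2", "requests": "<3.0.0"}
    print(combine_constraints(base, user))
```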

To define version constraints via ADCM:

  1. On the Clusters page, select your ADO cluster.

  2. Go to the Services tab and select the Airflow service.

  3. Toggle the Show advanced option and unfold the Dependency Management section.

  4. In the Constraints file field, list one constraint per line as a package name followed by a version specifier (==, <, >=, and so on). For example:

    apache-airflow==2.10.5
    pandas==2.2.2
    requests<3.0.0

  5. Confirm changes to the configuration by clicking Save.

  6. In the Actions drop-down menu, select Sync requirements and click Run.

  7. In the Actions drop-down menu, select Restart, make sure the Apply configs from ADCM option is set to true, and click Run.

During the Sync requirements action, ADCM validates all dependencies against the combined constraints, that is, the Constraints file together with the Base constraints file.

If installing a package would violate the defined constraints, synchronization fails and an error is logged.

Examples of constraint violations include:

  • attempting to install a package that requires a higher version of Apache Airflow than allowed;

  • specifying a package version that conflicts with the base constraints;

  • manually installing incompatible packages in the Airflow virtual environment.

In all such cases, dependency synchronization will be blocked to maintain cluster stability.
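The check that blocks synchronization can be pictured as follows: every resolved package version must satisfy all specifiers listed for it in the combined constraints. This simplified sketch compares only numeric release segments (for example, 2.10.5) and supports ==, !=, <, <=, >, and >=. pip itself follows PEP 440, which also covers pre-releases, wildcards, ~=, and local versions. The function names and version numbers are illustrative.

```python
import operator

# Supported operators; two-character operators come first so that "<=" is
# not mistaken for "<" when matching a clause prefix.
OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<=": operator.le,
    ">=": operator.ge,
    "<": operator.lt,
    ">": operator.gt,
}


def padded(a: str, b: str) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Parse two numeric versions and pad them to equal length (2.2 == 2.2.0)."""
    left = tuple(int(part) for part in a.split("."))
    right = tuple(int(part) for part in b.split("."))
    width = max(len(left), len(right))
    return left + (0,) * (width - len(left)), right + (0,) * (width - len(right))


def satisfies(version: str, specifier: str) -> bool:
    """Check a version against a comma-separated specifier such as '>=2.0,<3'."""
    for clause in (c.strip() for c in specifier.split(",")):
        if not clause:
            continue
        op = next((o for o in OPERATORS if clause.startswith(o)), None)
        if op is None:
            raise ValueError(f"Unsupported specifier: {clause!r}")
        left, right = padded(version, clause[len(op):].strip())
        if not OPERATORS[op](left, right):
            return False
    return True


def find_violations(versions: dict[str, str],
                    constraints: dict[str, str]) -> list[str]:
    """List constraint violations; an empty list means the sync may proceed."""
    violations = []
    for name, version in versions.items():
        spec = constraints.get(name)
        if spec and not satisfies(version, spec):
            violations.append(f"{name}=={version} violates {name}{spec}")
    return violations


if __name__ == "__main__":
    combined = {"apache-airflow": "==2.10.5", "pandas": "==2.2.2",
                "requests": "<3.0.0"}
    # A new dependency pulls in a higher Airflow version than allowed.
    resolved = {"apache-airflow": "3.0.0", "pandas": "2.2.2", "requests": "2.32.3"}
    problems = find_violations(resolved, combined)
    print("Sync blocked: " + "; ".join(problems) if problems else "Sync allowed")
```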
