GitSync overview
GitSync is an ADO service that loads and synchronizes Airflow DAGs from remote Git repositories. It enables Git-based DAG management and integrates directly with Airflow environments.
Main features of GitSync:
- Repository synchronization: cloning and updating files from one or multiple Git repositories.
- Automated delivery: synchronization of DAG files to the target Airflow DAG directory.
- Flexible filtering: selection of files using pattern-based filters.
- Parallel processing: handling multiple repositories simultaneously using workers.
- Cleanup support: optional removal of outdated files from target directories.
- SSH key management: centralized handling of SSH credentials through service actions in ADCM.
Workflow
GitSync operates as a standalone service and has only one component (gitsync).
The synchronization process consists of the following steps:
- DAG source code is stored in one or more Git repositories.
- GitSync clones or updates the repositories.
- Files are filtered based on GitSync's configuration.
- DAG files are copied to the target directory.
- Optional cleanup removes outdated files.
- Logs and metrics are generated.
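The filter, copy, and cleanup steps can be illustrated with a short Python sketch. This is not GitSync's implementation: it assumes the clone/update step has already produced a local checkout, that the files pattern is a glob matched against file names with fnmatch, and that cleanup deletes every target file not selected in the current pass. The function name sync_checkout is hypothetical.

```python
import fnmatch
import shutil
from pathlib import Path


def sync_checkout(checkout: Path, directory: str, pattern: str,
                  target: Path, delete_old_files: bool) -> set[str]:
    """Hypothetical sketch of the filter, copy, and cleanup steps.

    Assumes the Git clone/pull already produced `checkout`. Returns the
    POSIX-style relative paths of the files copied to `target`.
    """
    source_root = (checkout / directory).resolve()
    target.mkdir(parents=True, exist_ok=True)

    # Filter: keep files whose name matches the glob pattern (assumption),
    # skipping Git metadata in case `directory` is the repository root.
    selected = set()
    for path in source_root.rglob("*"):
        rel = path.relative_to(source_root)
        if (path.is_file() and ".git" not in rel.parts
                and fnmatch.fnmatch(path.name, pattern)):
            selected.add(rel.as_posix())

    # Copy: mirror the repository layout under the target folder.
    for rel in selected:
        dest = target / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source_root / rel, dest)

    # Cleanup: delete target files that were not selected in this pass,
    # including files generated at runtime inside the target folder.
    if delete_old_files:
        for path in list(target.rglob("*")):
            if path.is_file() and path.relative_to(target).as_posix() not in selected:
                path.unlink()

    return selected
```

Note that this naive cleanup cannot tell an outdated DAG from a runtime-generated file, which mirrors the delete_old_files behavior described under Limitations.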
Airflow automatically discovers updated DAGs by scanning the configured DAG directory. It recursively scans all subdirectories inside the DAG folder (for example, /opt/airflow/dags).
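A simplified model of this recursive discovery shows why per-repository subfolders such as /opt/airflow/dags/marketing are picked up without extra configuration. It is an approximation only: real Airflow also honors .airflowignore files and its DAG discovery safe mode, neither of which is modeled here, and discover_dag_files is a hypothetical name.

```python
from pathlib import Path


def discover_dag_files(dags_folder: Path) -> list[str]:
    """Simplified model of recursive DAG file discovery.

    Lists every .py file below dags_folder, including nested
    per-repository subfolders, as sorted POSIX-style relative paths.
    """
    return sorted(
        path.relative_to(dags_folder).as_posix()
        for path in dags_folder.rglob("*.py")
        if path.is_file()
    )
```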
Configuration
GitSync configuration consists of two levels:
Service-level configuration
Service-level parameters define the global behavior of the GitSync service. They are set in the gitsync-env.sh option in ADCM.
Key parameters include:
- number of parallel workers;
- synchronization interval and timeout;
- logging configuration.
Repository-level configuration
Repository settings are defined in the config.json option in ADCM. It contains the parameters for connecting to the repositories and the options that select which DAG files to synchronize.
Example repository configuration:
{
  "url": "git@ssh.gitlab.example.io:org/repo.git", (1)
  "branch": "main", (2)
  "directory": "./dags",
  "files": "*.py", (3)
  "sync_interval": 60, (4)
  "sync_timeout": 120,
  "ssh_key": "my-git-key", (5)
  "target_folder": "/opt/airflow/dags/project", (6)
  "delete_old_files": true (7)
}
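| 1 | Git repository URL. |
| 2 | Branch to synchronize and the directory in the repository that contains the DAG files. |
| 3 | File filter pattern that selects the files to synchronize. |
| 4 | Synchronization interval and timeout. |
| 5 | Name of an SSH key uploaded via the Upload private key action (for SSH repositories). |
| 6 | Target directory for DAG files in Airflow. |
| 7 | Whether to remove files from the target directory that no longer exist in the repository. |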
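Before saving config.json, an entry can be sanity-checked with a small script. The rules below are illustrative assumptions, not GitSync's documented validation: the value may be a single object (as in the example above) or a list of objects, url and target_folder are treated as mandatory, sync_interval and sync_timeout must be positive integers, and SSH URLs need an ssh_key. Both function names are hypothetical.

```python
import json


def load_repositories(text: str) -> list[dict]:
    """Parse config.json text into a list of repository entries (sketch)."""
    data = json.loads(text)
    # Accept a single object or a list of objects (assumption).
    return [data] if isinstance(data, dict) else list(data)


def validate_repository(repo: dict) -> list[str]:
    """Return problems found in one entry, using hypothetical rules."""
    problems = []
    for key in ("url", "target_folder"):
        if not repo.get(key):
            problems.append(f"missing {key}")
    for key in ("sync_interval", "sync_timeout"):
        value = repo.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if value is not None and (not isinstance(value, int)
                                  or isinstance(value, bool) or value <= 0):
            problems.append(f"{key} must be a positive integer")
    url = repo.get("url") or ""
    if (url.startswith("git@") or url.startswith("ssh://")) and not repo.get("ssh_key"):
        problems.append("SSH URL without ssh_key")
    return problems
```

An empty list means the entry passed these checks; anything else is a list of messages to fix before applying the configuration.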
GitSync supports synchronization of multiple repositories simultaneously.
Each repository is processed independently by the worker pool and must use a unique target_folder to avoid conflicts. Example configuration with four repositories:
[
  {
    "url": "git@ssh.gitlab.example.io:org/marketing-dags.git",
    "sync_interval": 60,
    "target_folder": "/opt/airflow/dags/marketing",
    "branch": "main",
    "tag": null,
    "directory": "./dags",
    "files": "*.py",
    "sync_requirements": false,
    "requirements_path": null,
    "sync_timeout": 120,
    "ssh_key": "ssh_key_marketing",
    "delete_old_files": true
  },
  {
    "url": "git@ssh.gitlab.example.io:org/finance-dags.git",
    "sync_interval": 120,
    "target_folder": "/opt/airflow/dags/finance",
    "branch": "main",
    "tag": null,
    "directory": "./dags",
    "files": "*.py",
    "sync_requirements": false,
    "requirements_path": null,
    "sync_timeout": 300,
    "ssh_key": "ssh_key_finance",
    "delete_old_files": true
  },
  {
    "url": "git@ssh.gitlab.example.io:org/sales-dags.git",
    "sync_interval": 180,
    "target_folder": "/opt/airflow/dags/sales",
    "branch": "main",
    "tag": null,
    "directory": "./dags",
    "files": "*.py",
    "sync_requirements": false,
    "requirements_path": null,
    "sync_timeout": 300,
    "ssh_key": "ssh_key_sales",
    "delete_old_files": true
  },
  {
    "url": "https://github.com/org/shared-dags.git",
    "sync_interval": 300,
    "target_folder": "/opt/airflow/dags/shared",
    "branch": "main",
    "tag": null,
    "directory": "./",
    "files": "*.py",
    "sync_requirements": false,
    "requirements_path": null,
    "sync_timeout": 300,
    "access_token": "******",
    "https_username": "oauth2",
    "delete_old_files": false
  }
]
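Because every entry must have its own target_folder, a quick pre-flight check can catch accidental reuse. This sketch is a hypothetical helper, not part of GitSync; it normalizes paths with posixpath so that a trailing slash does not hide a duplicate.

```python
import posixpath
from collections import Counter


def duplicate_target_folders(repos: list[dict]) -> list[str]:
    """Return target_folder values shared by more than one entry."""
    # Normalize so "/opt/airflow/dags/x/" and "/opt/airflow/dags/x" match.
    counts = Counter(posixpath.normpath(r["target_folder"]) for r in repos)
    return sorted(folder for folder, n in counts.items() if n > 1)
```

Run against the four-repository example above, it returns an empty list, since each entry writes to its own subfolder of /opt/airflow/dags.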
SSH authentication
For SSH-based repositories, GitSync provides built-in key management:
- SSH keys are uploaded via the Upload private key action.
- Keys are stored and managed by GitSync according to the service configuration.
- Repository configurations reference keys by name (the ssh_key parameter).
- Keys are injected at runtime.
The same SSH key can be reused across multiple repositories.
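One common way to inject a named key at runtime is Git's GIT_SSH_COMMAND environment variable, which tells git which ssh command line to use. The sketch below shows that general technique; whether GitSync uses it is not stated here, and both the key directory (/var/lib/gitsync/keys) and the function name git_env are hypothetical.

```python
import os
import shlex
from pathlib import Path

# Hypothetical directory holding uploaded keys; the real location is
# defined by the GitSync service configuration.
KEY_DIR = Path("/var/lib/gitsync/keys")


def git_env(repo: dict, key_dir: Path = KEY_DIR) -> dict[str, str]:
    """Build an environment for a git call that uses the named SSH key."""
    env = dict(os.environ)
    key_name = repo.get("ssh_key")
    if not key_name:
        return env  # HTTPS or unauthenticated repository: nothing to inject
    # Refuse names such as "../x" that would escape the key directory.
    if Path(key_name).name != key_name:
        raise ValueError(f"invalid key name: {key_name!r}")
    key_path = shlex.quote(str(key_dir / key_name))
    # IdentitiesOnly stops ssh from trying other keys from the agent.
    env["GIT_SSH_COMMAND"] = f"ssh -i {key_path} -o IdentitiesOnly=yes"
    return env
```

The returned dictionary would be passed as env= to subprocess.run(["git", ...]). Since keys are resolved by name, two repositories referencing the same ssh_key simply get the same key path.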
Usage
To start using GitSync:
- Configure the service parameters.
- Upload SSH keys (if required) via the Upload private key GitSync action.
- Define repository configurations.
- Ensure that target DAG directories are accessible by Airflow.
Once these steps are complete, GitSync keeps DAGs synchronized automatically at the configured intervals.
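Interval-based synchronization with a worker pool can be sketched as a scheduling pass that picks the repositories whose sync_interval has elapsed and hands them to parallel workers. This is an illustrative model only: it assumes sync_interval is in seconds, keys state by target_folder (unique per entry), does not model sync_timeout, and the names due_repositories and run_cycle are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def due_repositories(repos: list[dict], last_sync: dict[str, float],
                     now: float) -> list[dict]:
    """Select entries whose sync_interval (assumed seconds) has elapsed."""
    due = []
    for repo in repos:
        last = last_sync.get(repo["target_folder"])
        if last is None or now - last >= repo["sync_interval"]:
            due.append(repo)
    return due


def run_cycle(repos: list[dict], last_sync: dict[str, float], now: float,
              sync_one: Callable[[dict], str], workers: int = 4) -> list[str]:
    """Run one scheduling pass, syncing due repositories in parallel."""
    due = due_repositories(repos, last_sync, now)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as `due`.
        results = list(pool.map(sync_one, due))
    for repo in due:
        last_sync[repo["target_folder"]] = now
    return results
```

A long-running service would call run_cycle in a loop with the current time; each repository then advances on its own interval independently of the others.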
Limitations
Consider the following limitations when configuring GitSync:
- Python environment. TARGET_PYTHON is defined at the service level and shared across all repositories. Using separate Python environments for different repositories is not supported.
- Repository configuration flexibility. The following cases are not supported and may cause undefined behavior:
  - synchronization of multiple directories from the same repository and branch;
  - synchronization of the same repository from multiple branches.
- dbt project support. This is not a primary use case and has not been fully validated. Runtime-generated artifacts (for example, target/ and logs/) may be removed if GitSync is not configured correctly. Additionally, using dbt requires the following configuration:
  - files = "*";
  - delete_old_files = false.
- File synchronization behavior. delete_old_files removes files based on the repository state and does not distinguish between outdated and runtime-generated files.
- General limitations:
  - Network access to the Git repositories is required.
  - SSH access requires a correct key configuration.
  - Duplicate dag_id values across repositories lead to conflicts.
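Two of these limitations can be checked before deployment with a small lint script. The sketch below is hypothetical and heuristic: repeated_urls flags any url that appears in more than one entry (which covers both unsupported repository layouts, but compares URLs as exact strings), and duplicate_dag_ids only finds literal dag_id="..." keyword arguments, so DAGs whose IDs are passed positionally or built dynamically are missed.

```python
import re
from collections import defaultdict
from pathlib import Path

# Heuristic: matches literal dag_id="..." or dag_id='...' arguments only.
DAG_ID_RE = re.compile(r"""dag_id\s*=\s*["']([^"']+)["']""")


def repeated_urls(repos: list[dict]) -> list[str]:
    """Return repository URLs that appear in more than one entry."""
    seen, repeated = set(), set()
    for repo in repos:
        url = repo["url"]
        if url in seen:
            repeated.add(url)
        seen.add(url)
    return sorted(repeated)


def duplicate_dag_ids(target_folders: list[Path]) -> dict[str, list[str]]:
    """Map each dag_id literal found in several files to those files."""
    owners = defaultdict(list)
    for folder in target_folders:
        for path in sorted(folder.rglob("*.py")):
            text = path.read_text(encoding="utf-8")
            # A set avoids counting one file twice for the same dag_id.
            for dag_id in set(DAG_ID_RE.findall(text)):
                owners[dag_id].append(path.as_posix())
    return {dag_id: files for dag_id, files in owners.items() if len(files) > 1}
```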