DBT service overview

Overview

The DBT service is an ADO service that provides a pre-installed dbt runtime integrated into Arenadata Orchestrator.

dbt (data build tool) is a framework for transforming data in data warehouses using SQL and modular models. It lets you define transformations as code, manage dependencies between datasets, and apply testing and documentation practices.

The DBT service allows you to:

  • execute dbt transformations;

  • run tests;

  • build models;

  • generate documentation.

The service can be used via both ADCM actions and Airflow DAGs.

The DBT service consists of two components:

  • DBT — provides dbt runtime and execution capabilities;

  • DBT Docs — provides a web UI for dbt documentation.

Configuration

The DBT service and its components can be configured via ADCM.

The variables defined in configuration parameters are applied during execution in both ADCM actions and Airflow DAGs.

When executing dbt commands, parameters are resolved in the following order of precedence, from highest to lowest:

  1. CLI parameters (from the Run dbt command action or a DAG).

  2. Service configuration (environment variables).

  3. dbt default values.
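
The order above can be illustrated with a minimal Python sketch. It is not the service's actual implementation: the function name resolve_options and the sample values are hypothetical and only show how the first source that defines a parameter wins.

```python
from collections import ChainMap


def resolve_options(cli_args, service_env, dbt_defaults):
    """Return a mapping in which the first source defining a key wins."""
    # Drop unset CLI values so they do not shadow lower-priority sources.
    cli = {key: value for key, value in cli_args.items() if value is not None}
    # Lookup order: CLI parameters, then service configuration, then defaults.
    return ChainMap(cli, service_env, dbt_defaults)


# Hypothetical sample values, used for illustration only.
dbt_defaults = {"target_path": "target", "log_path": "logs"}
service_env = {"target_path": "/var/dbt/target"}
cli_args = {"target_path": None, "log_path": "/tmp/dbt-logs"}

options = resolve_options(cli_args, service_env, dbt_defaults)
print(options["target_path"])  # comes from the service configuration
print(options["log_path"])     # comes from the CLI parameters
```

In this sketch, a value that is not passed on the command line falls back to the service configuration, and only then to the dbt default.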

Permissions

To run dbt successfully, the executing user (for example, airflow) must have appropriate file system permissions:

  • The user must have read access to the dbt project directory.

  • The user must have write access to the following directories:

    • custom paths (--target-path, --log-path), if specified;

    • default paths (<project_dir>/target, <project_dir>/logs).

If existing files (for example, manifest.json, dbt.log) are not writable, the execution fails.

All parent directories must allow traversal (+x permission). Missing execute permissions on parent directories prevent access even if file permissions are correct.
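
The requirements above can be verified before a run with a small pre-flight check. The following Python sketch is an illustration only: the function check_dbt_permissions is hypothetical, it assumes that custom paths are given as absolute paths, and it must be run as the executing user (for example, airflow) to be meaningful.

```python
import os
import tempfile
from pathlib import Path


def check_dbt_permissions(project_dir, target_path=None, log_path=None):
    """Return a list of permission problems found for the current user."""
    project = Path(project_dir).resolve()
    problems = []

    # Every parent directory must allow traversal (+x).
    for parent in project.parents:
        if not os.access(parent, os.X_OK):
            problems.append(f"cannot traverse {parent}")

    # The project directory must be readable (and traversable).
    if not os.access(project, os.R_OK | os.X_OK):
        problems.append(f"cannot read {project}")

    if problems:
        return problems  # deeper checks would fail anyway

    # Custom output paths if specified, otherwise the default ones.
    out_dirs = [
        Path(target_path) if target_path else project / "target",
        Path(log_path) if log_path else project / "logs",
    ]
    for out_dir in out_dirs:
        if not out_dir.is_dir():
            continue  # not created yet; this sketch does not check it
        if not os.access(out_dir, os.R_OK | os.W_OK | os.X_OK):
            problems.append(f"insufficient access to {out_dir}")
            continue
        # Existing files such as manifest.json or dbt.log must be writable.
        for entry in out_dir.iterdir():
            if entry.is_file() and not os.access(entry, os.W_OK):
                problems.append(f"cannot overwrite {entry}")
    return problems


# Demo on a throwaway project directory created just for this example.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "target").mkdir()
    (Path(tmp) / "logs").mkdir()
    (Path(tmp) / "target" / "manifest.json").write_text("{}")
    problems = check_dbt_permissions(tmp)

print(problems or "no permission problems found")
```

An empty list means that none of the checked conditions failed. It does not guarantee that dbt itself will succeed.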

Usage

dbt commands are executed using the Run dbt command action.

To run a dbt command via ADCM:

  1. On the Clusters page, select the desired cluster.

  2. In the actions menu for the DBT service, select the Run dbt command action.

  3. Enter the path to the project’s directory in the Project dir field.

  4. Select the command from the Operation menu.

  5. Fill in other parameters if required.

  6. Click Next, toggle Raise non-blocking concern if needed, and confirm the action start by clicking Run.

To run a dbt command for a specific host:

  1. On the Clusters page, select the desired cluster.

  2. Go to the Hosts tab and select the desired host with a DBT component.

  3. In the list of components installed on that host, select the Run dbt command action for the DBT component.

  4. Fill in the necessary fields and run the action.

All commands are executed via /usr/bin/dbt. It’s a binary wrapper that initializes environment variables defined at the service level.

Use dbt in Airflow

When using dbt in Airflow DAGs:

  • Use the /usr/bin/dbt binary.

  • Provide required parameters. For example:

    $ /usr/bin/dbt
        --project-dir <project_dir>
        ...

  • Provide overrides, if needed:

    $ /usr/bin/dbt
        --profiles-dir
        --target
        --log-path
        --target-path
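
Since Airflow DAGs are written in Python, the command line can be assembled in code. The sketch below is one possible approach, not a prescribed one: the helper build_dbt_command, the project path /opt/dbt/my_project, and the target name dev are hypothetical, and placing the operation before the options is an assumption of this example.

```python
import shlex

DBT_BIN = "/usr/bin/dbt"  # wrapper provided by the DBT service


def build_dbt_command(operation, project_dir, **overrides):
    """Build an argument list for the dbt wrapper.

    Override names use underscores (for example, target_path) and are
    converted to the matching long options (--target-path).
    """
    argv = [DBT_BIN, operation, "--project-dir", project_dir]
    for name, value in overrides.items():
        if value is not None:  # skip overrides that are not set
            argv += ["--" + name.replace("_", "-"), str(value)]
    return argv


# Hypothetical project path and target name, used for illustration only.
argv = build_dbt_command(
    "run",
    "/opt/dbt/my_project",
    target="dev",
    log_path="/var/log/dbt",
)
command = shlex.join(argv)  # safely quoted single string
print(command)
```

The resulting string could then be passed, for example, as the bash_command argument of an Airflow BashOperator task. Overrides that are left unset are omitted, so the service configuration and dbt defaults still apply to them.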

DBT documentation

The DBT Docs component provides access to generated dbt documentation.

For each configured project, documentation is generated using dbt docs generate. The generated documentation is available at http://<host>:<port>, where <host> is the IP address of the host with DBT and <port> is the port configured for DBT Docs in ADCM.

After changing the configuration, restart the component.
