Trino architecture

Overview

Trino is an open-source SQL query engine used for processing data in parallel, distributed over multiple storages, such as object storages, databases, and file systems.

It was designed for OLAP (Online Analytical Processing) type of workloads and is not efficient for OLTP (Online Transaction Processing).

A single server is enough to run all Trino components. Multiple nodes running Trino, which are configured to collaborate with each other, make up a Trino cluster.

Concepts

Trino is based on the following concepts:

  • Connector

    A connector is a Trino plugin that creates a connection to a data source using Trino’s service provider interface (SPI). A single connector can be used to access one or more data sources of the same type.

  • Catalog

    A catalog is a Trino abstraction to access a specific data source (e.g. MySQL database). It contains connection settings, credentials, URLs, etc. Every catalog uses a specific connector specified in the connector.name property. A catalog contains one or more schemas, which also contain objects such as tables or views. For more information about catalogs, refer to the Catalog management in Trino article.

  • Schema

    A schema defines the data that can be queried. For a relational database, a schema translates to the same concept in the target database, but for other data sources, a schema might be represented as a different type of table organization.

  • Statement

    In Trino, a statement is the textual representation of a statement written in SQL. When a statement is executed, Trino creates a query plan consisting of a series of interconnected stages running on Trino workers.

  • Stage

    In Trino, each execution is comprised of several stages that have a tree hierarchy. When Trino aggregates data from a data source, it creates a root stage to aggregate the output of several other stages, and other stages implement different sections of a distributed query plan.

  • Task

    A Trino task is a part of a query that is executed on a Trino worker. Each stage of execution may be represented by several tasks.

  • Split

    A split is a smaller part of a target dataset defined in the connector. When Trino is scheduling a query, the coordinator queries a connector for a list of all splits that are available for the table.

Components

Trino architecture is based on the following components:

  • Trino Coordinator

    A coordinator is a Trino server that handles incoming queries and manages the workers that execute them. The coordinator keeps track of which nodes are running which tasks and what splits are being processed by tasks.

  • Trino Worker

    A worker is a Trino server responsible for executing tasks and processing data.

Trino architecture
Trino architecture
Trino architecture
Trino architecture

Trino cluster includes one coordinator and multiple worker nodes. All communication and data transfer between Trino components happens via HTTP/HTTPS.

Connectors

Trino contains many built-in connectors for different data sources:

  • Data lakes and warehouses: HDFS, Hive, Iceberg, and other connectors.

  • Relational databases: MariaDB, MySQL, PostgreSQL, Oracle, ADPG, etc.

  • Other systems and utilities: Cassandra, ClickHouse, OpenSearch, Prometheus, Snowflake, JMX, etc.

For more information about connectors, see the Overview of Trino connectors.

Found a mistake? Seleсt text and press Ctrl+Enter to report it