Catalog management in Trino

Alpashkin Konstantin

Contents

Catalog configuration
View available catalogs
Preconfigured catalogs
Catalog management via ADCM
Catalog management properties

A Trino catalog is a collection of configuration properties that represent a single data source (for example, a database) to be accessed by the Trino engine. Every catalog configuration specifies a connector for accessing the underlying data source as well as other details required to establish the connection.

With Trino catalogs, you can refer to the underlying data source in your SQL queries by using a catalog name. For example:

SELECT * FROM "hive-adh"."default"."transactions";

Where:

hive-adh is the name of the built-in Hive catalog.
default is the name of the schema, used to group tables, views, etc. For example, a schema in Trino maps to a Hive database. For other data sources, the schema can map to a different entity.
transactions is the table to fetch data from.

Although the example above refers to Hive, catalogs can use any other connector to work with many data storage systems. With catalogs, the underlying data store is therefore abstracted for Trino clients.

You can use several catalogs within a single SQL statement to retrieve information from different data sources with one query. For example, the following SQL makes Trino fetch data from two independent data sources — a MySQL database and a Hive warehouse. After getting the result sets from both sources using corresponding connectors, Trino applies a JOIN operation and returns the final result set to the client.

SELECT txns.*, ords.*
FROM "mysql_test_catalog"."demo_schema"."transactions" txns
JOIN "hive-adh"."demo_schema"."orders" ords
ON txns.acc_id = ords.account_id;

Catalog configuration

Trino catalogs are defined using <catalog_name>.properties files stored in the Trino catalog directory. By default, this directory is located at /etc/trino/conf/catalog/ on the hosts with Trino components. To change the default catalog directory, use the catalog.config-dir setting in ADCM (Clusters → <clusterName> → Services → Trino → Components → Trino Coordinator → node.properties). The example of /etc/trino/conf/catalog/ content is below:

$ ls -l /etc/trino/conf/catalog

-rw-r--r--. 1 trino hadoop 433 Dec 19 14:34 hive-adh.properties
-rw-r--r--. 1 trino hadoop 280 Dec 20 10:21 iceberg-adh.properties

The .properties catalog configuration files include the following information:

The connector used by the catalog (connector.name). This is a mandatory property in a <catalog_name>.properties file.
Connectivity details to access the data source, such as connection URLs, credentials, etc.
Security parameters.
Other properties required for interaction with the data source. For example, data formats, compression codecs, etc.

Below is an example of the built-in Hive catalog configuration (hive-adh.properties), which is used by the Trino service to work with Hive:

connector.name=hive
fs.hadoop.enabled=True
hive.compression-codec=SNAPPY
hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
hive.hdfs.impersonation.enabled=False
hive.metastore.thrift.client.ssl.enabled=False
hive.metastore.thrift.impersonation.enabled=True
hive.metastore.uri=thrift://ka-adh-3.ru-central1.internal:9083
hive.non-managed-table-writes-enabled=True
hive.storage-format=PARQUET

The name of the <catalog_name>.properties file determines the name of the catalog. For example, hive-adh.properties stores settings for the catalog named hive-adh.

View available catalogs

To get a list of catalogs available in the Trino service, use the command:

SHOW CATALOGS;

Sample output:

Catalog
---------
hive-adh
iceberg-adh
test-postgresql-catalog
system
(4 rows)

If you need to get all catalogs that use a specific connector, submit the following query to Trino:

SELECT *
FROM system.metadata.catalogs
WHERE connector_name='hive';

The sample result:

catalog_name|connector_id|connector_name|
------------+------------+--------------+
hive-adh    |hive-adh    |hive          |

Preconfigured catalogs

The Trino service comes with the following preconfigured catalogs out-of-the-box:

A Hive catalog named hive-adh. Used for working with the Hive service deployed in the ADH cluster.
An Iceberg catalog named iceberg-adh. Used for working with Iceberg tables stored in the ADH cluster.

If Hive is installed in the ADH cluster, the configurations for the built-in catalogs are generated automatically using up-to-date values for accessing your Hive service. You can edit the configurations of the built-in Hive/Iceberg catalogs in ADCM. For this, use the Hive configuration and Iceberg configuration sections in ADCM (Clusters → <clusterName> → Services → Trino → Primary configuration).

TIP

To overwrite configuration properties for the default Hive/Iceberg catalogs, use Custom Hive configuration and Custom Iceberg configuration fields respectively.

Catalog management via ADCM

In Trino, adding a new catalog assumes creating a <catalog_name>.properties file in the Trino catalog directory (/etc/trino/conf/catalog/) on the hosts of a Trino cluster. For this, ADCM provides a convenient UI for adding new catalogs instead of manual manipulations with .properties files.

The following steps show how to create a sample Trino catalog that interacts with Arenadata Postgres (ADPG) through the PostgreSQL connector:

In ADCM, go to Clusters → <clusterName> → Services → Trino → Primary configuration and enable the Custom catalogs option.
Expand Custom catalogs and specify the properties listed in the table.

catalog_name key value

test_catalog_adpg

connector.name

postgresql

connection-url

jdbc:postgresql://<ADPG_master_host>:5432

connection-user

postgres

connection-password

postgres
Save the configuration and restart the Trino service. During the restart, ADCM creates test_catalog_adpg.properties files on all ADH hosts with Trino components. Verify that the .properties files have been created:
$ ls /etc/trino/conf/catalog/
The output:
hive-adh.properties iceberg-adh.properties test_catalog_adpg.properties
Connect to Trino (for example, via Trino CLI or an SQL client, like DBeaver) and query available catalogs:
SHOW CATALOGS;
The result set includes the new catalog:
Catalog | -----------------+ hive-adh | iceberg-adh | system | test_catalog_adpg|

Now the data on the target ADPG server can be accessed via the newly created catalog. For example:

SELECT * FROM "test_catalog_adpg"."public"."book";

The result:

book_id|title                |author_id|in_stock|
-------+---------------------+---------+--------+
      1|Mrs. Dalloway        |        1|      25|
      2|To the Lighthouse    |        1|      13|
      3|To Kill a Mockingbird|        2|       0|
      4|The Great Gatsby     |        3|       3|
      5|The Lord of the Rings|        4|      45|

Catalog management properties

The following properties are used for managing catalogs in Trino. By default, the Trino service starts using the default parameter values. To override a configuration parameter, use the Custom config.properties section of the Trino Coordinator component in ADCM.

Property name Description Default value

catalog.management

Specifies the catalog management mode. The possible values are:

static. Reads configuration files and creates catalog entities only on the server startup. Adding a catalog requires a restart of the Trino service.
dynamic. Allows managing catalogs dynamically using the CREATE CATALOG and DROP CATALOG commands.
DBCatalogManager. Allows managing catalogs dynamically and stores catalog metadata in an ADPG/PostgreSQL database. Allows updating catalog configurations without restarting the Trino service.

This property is known to have limitations for some connectors. For more information, see Trino documentation

static

catalog.prune.update-interval

Requires catalog.management=dynamic. Specifies an interval for pruning dropped catalogs. Dropping a catalog does not interrupt running queries (if any), but makes the catalog unavailable for new queries

catalog.store

Possible values: file, memory. Requires catalog.management=dynamic. If set to file, creating and dropping catalogs using SQL commands also creates/removes <catalog_name>.properties files. Existing catalog files are also read on the Trino service startup. When set to memory, catalogs exist only in memory, and the .properties files are ignored on startup

file

catalog.config-dir

Requires catalog.management=static or catalog.store=file. Specifies the directory with catalog property files

/etc/trino/conf/catalog

catalog.disabled-catalogs

Requires catalog.management=static or catalog.store=file. Specifies a comma-separated list of catalogs to ignore while reading <catalog_name>.properties during the startup

—

catalog.read-only

Requires catalog.store=file. If set to true, the existing <catalog_name>.properties files cannot be removed with the DROP CATALOG command, and no new catalog files can be written with identical names using CREATE CATALOG operation. As a result, the restart of the Trino service resets all the catalogs to the existing files only

false

Found a mistake? Seleсt text and press Ctrl+Enter to report it