Catalog management in Trino
A Trino catalog is a collection of configuration properties that represent a single data source (for example, a database) to be accessed by the Trino engine. Every catalog configuration specifies a connector for accessing the underlying data source as well as other details required to establish the connection.
With Trino catalogs, you can refer to the underlying data source in your SQL queries by using a catalog name. For example:
SELECT * FROM "hive-adh"."default"."transactions";
Where:
-
hive-adh
is the name of the built-in Hive catalog. -
default
is the name of the schema, used to group tables, views, etc. For example, a schema in Trino maps to a Hive database. For other data sources, the schema can map to a different entity. -
transactions
is the table to fetch data from.
Although the example above refers to Hive, catalogs can use any other connector to work with many data storage systems. With catalogs, the underlying data store is therefore abstracted for Trino clients.
You can use several catalogs within a single SQL statement to retrieve information from different data sources with one query.
For example, the following SQL makes Trino fetch data from two independent data sources — a MySQL database and a Hive warehouse.
After getting the result sets from both sources using corresponding connectors, Trino applies a JOIN
operation and returns the final result set to the client.
SELECT txns.*, ords.*
FROM "mysql_test_catalog"."demo_schema"."transactions" txns
JOIN "hive-adh"."demo_schema"."orders" ords
ON txns.acc_id = ords.account_id;
Catalog configuration
Trino catalogs are defined using <catalog_name>.properties files stored in the Trino catalog directory. By default, this directory is located at /etc/trino/conf/catalog/ on the hosts with Trino components. To change the default catalog directory, use the catalog.config-dir setting in ADCM (Clusters → <clusterName> → Services → Trino → Components → Trino Coordinator → node.properties). The example of /etc/trino/conf/catalog/ content is below:
$ ls -l /etc/trino/conf/catalog
-rw-r--r--. 1 trino hadoop 433 Dec 19 14:34 hive-adh.properties -rw-r--r--. 1 trino hadoop 280 Dec 20 10:21 iceberg-adh.properties
The .properties catalog configuration files include the following information:
-
The connector used by the catalog (
connector.name
). This is a mandatory property in a <catalog_name>.properties file. -
Connectivity details to access the data source, such as connection URLs, credentials, etc.
-
Security parameters.
-
Other properties required for interaction with the data source. For example, data formats, compression codecs, etc.
Below is an example of the built-in Hive catalog configuration (hive-adh.properties), which is used by the Trino service to work with Hive:
connector.name=hive
fs.hadoop.enabled=True
hive.compression-codec=SNAPPY
hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
hive.hdfs.impersonation.enabled=False
hive.metastore.thrift.client.ssl.enabled=False
hive.metastore.thrift.impersonation.enabled=True
hive.metastore.uri=thrift://ka-adh-3.ru-central1.internal:9083
hive.non-managed-table-writes-enabled=True
hive.storage-format=PARQUET
The name of the <catalog_name>.properties file determines the name of the catalog.
For example, hive-adh.properties stores settings for the catalog named hive-adh
.
View available catalogs
To get a list of catalogs available in the Trino service, use the command:
SHOW CATALOGS;
Sample output:
Catalog --------- hive-adh iceberg-adh test-postgresql-catalog system (4 rows)
If you need to get all catalogs that use a specific connector, submit the following query to Trino:
SELECT *
FROM system.metadata.catalogs
WHERE connector_name='hive';
The sample result:
catalog_name|connector_id|connector_name| ------------+------------+--------------+ hive-adh |hive-adh |hive |
Preconfigured catalogs
The Trino service comes with the following preconfigured catalogs out-of-the-box:
-
A Hive catalog named
hive-adh
. Used for working with the Hive service deployed in the ADH cluster. -
An Iceberg catalog named
iceberg-adh
. Used for working with Iceberg tables stored in the ADH cluster.
If Hive is installed in the ADH cluster, the configurations for the built-in catalogs are generated automatically using up-to-date values for accessing your Hive service. You can edit the configurations of the built-in Hive/Iceberg catalogs in ADCM. For this, use the Hive configuration and Iceberg configuration sections in ADCM (Clusters → <clusterName> → Services → Trino → Primary configuration).
TIP
To overwrite configuration properties for the default Hive/Iceberg catalogs, use Custom Hive configuration and Custom Iceberg configuration fields respectively.
|
Catalog management via ADCM
In Trino, adding a new catalog assumes creating a <catalog_name>.properties file in the Trino catalog directory (/etc/trino/conf/catalog/) on the hosts of a Trino cluster. For this, ADCM provides a convenient UI for adding new catalogs instead of manual manipulations with .properties files.
The following steps show how to create a sample Trino catalog that interacts with Arenadata Postgres (ADPG) through the PostgreSQL connector:
-
In ADCM, go to Clusters → <clusterName> → Services → Trino → Primary configuration and enable the Custom catalogs option.
-
Expand Custom catalogs and specify the properties listed in the table.
catalog_name key value test_catalog_adpg
connector.name
postgresql
connection-url
jdbc:postgresql://<ADPG_master_host>:5432
connection-user
postgres
connection-password
postgres
-
Save the configuration and restart the Trino service. During the restart, ADCM creates test_catalog_adpg.properties files on all ADH hosts with Trino components. Verify that the .properties files have been created:
$ ls /etc/trino/conf/catalog/
The output:
hive-adh.properties iceberg-adh.properties test_catalog_adpg.properties
-
Connect to Trino (for example, via Trino CLI or an SQL client, like DBeaver) and query available catalogs:
SHOW CATALOGS;
The result set includes the new catalog:
Catalog | -----------------+ hive-adh | iceberg-adh | system | test_catalog_adpg|
Now the data on the target ADPG server can be accessed via the newly created catalog. For example:
SELECT * FROM "test_catalog_adpg"."public"."book";
The result:
book_id|title |author_id|in_stock| -------+---------------------+---------+--------+ 1|Mrs. Dalloway | 1| 25| 2|To the Lighthouse | 1| 13| 3|To Kill a Mockingbird| 2| 0| 4|The Great Gatsby | 3| 3| 5|The Lord of the Rings| 4| 45|
You can also add catalogs manually by placing <catalog_name>.properties files under the catalog directory (/etc/trino/conf/catalog/) on all hosts with Trino components. An example of test_catalog_adpg.properties is below:
connector.name = postgresql
connection-url = jdbc:postgresql://10.92.40.105:5432/postgres
connection-user = postgres
connection-password = postgres
After placing the <catalog_name>.properties file to the catalog directory, restart the Trino service.
Catalog management properties
The following properties are used for managing catalogs in Trino. By default, the Trino service starts using the default parameter values. To override a configuration parameter, use the Custom config.properties section of the Trino Coordinator component in ADCM.
Property name | Description | Default value |
---|---|---|
catalog.management |
Possible values: This property is known to have limitations for some connectors. For more information, see Trino documentation |
static |
catalog.prune.update-interval |
Requires |
5s |
catalog.store |
Possible values: |
file |
catalog.config-dir |
Requires |
/etc/trino/conf/catalog |
catalog.disabled-catalogs |
Requires |
— |
catalog.read-only |
Requires |
false |