Iceberg catalog in Trino

The Trino service comes with a built-in catalog for working with Iceberg tables. This catalog uses the Iceberg connector and is ready out of the box to work with Iceberg tables stored in your ADH cluster. The built-in Iceberg catalog uses Hive Metastore as an Iceberg catalog implementation. That is, to execute a query on an Iceberg table, Trino requests metadata about the table from Hive Metastore.

In the built-in Iceberg catalog, all the configuration properties are auto-generated to ensure access to Hive Metastore and Hive warehouse directories of your ADH cluster. The built-in Iceberg catalog name is iceberg-adh. ADH forms this catalog automatically and generates the iceberg-adh.properties file at /etc/trino/conf/catalog/ if:

  • Your ADH cluster has the Hive service installed. If Hive was installed after Trino, the Trino service must be restarted.

  • The Iceberg catalog configuration is active.
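For reference, the generated iceberg-adh.properties file typically contains properties similar to the following sketch (the Metastore URI is a placeholder; the actual values are generated by ADH):

# Assumed minimal contents; the actual generated values may differ
connector.name=iceberg
iceberg.catalog.type=hive_metastore
hive.metastore.uri=thrift://<metastore-host>:9083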

Enable Iceberg catalog

By default, the built-in Iceberg catalog is disabled and can be activated using ADCM. For this, toggle the Iceberg configuration option of the Trino service (Clusters → <clusterName> → Services → Trino → Configuration) and then restart the service.

Once enabled, check the catalog availability. For this, connect to Trino Coordinator (for example, via Trino CLI or an SQL client like DBeaver) and run the command:

SHOW CATALOGS;

The built-in Iceberg catalog appears in the list:

Catalog    |
-----------+
iceberg-adh|
system     |
TIP
The up-to-date JDBC connection string for connecting to the Trino service is available on the service configuration page in ADCM (Clusters → <clusterName> → Services → Trino → Info).
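For reference, a Trino JDBC URL generally has the following shape (the host and port are placeholders; copy the exact string from ADCM):

jdbc:trino://<coordinator-host>:8080/iceberg-adh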

Supported table formats

Trino’s Iceberg connector can work with Iceberg tables in the following formats:

  • Parquet

  • ORC

  • Avro
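The format is chosen per table via the format table property. A minimal sketch, assuming the demo schema created later in this article:

CREATE TABLE "iceberg-adh"."trino_demo_schema_iceberg"."events_avro" (
    id BIGINT,
    payload VARCHAR
) WITH (format = 'AVRO'); -- also accepts 'PARQUET' (the default) and 'ORC'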

Data type mapping

When reading data from or writing data to Iceberg tables, Trino automatically converts certain data types to adhere to the Iceberg specification and the Avro/ORC/Parquet formats. The mapping rules may vary depending on the direction (Iceberg-to-Trino or Trino-to-Iceberg).

For information on mapping Iceberg types to Parquet/ORC/Avro types, see the Iceberg spec.

Iceberg to Trino type mapping

Iceberg type | Trino type
-------------+----------------------------
BOOLEAN      | BOOLEAN
INT          | INTEGER
LONG         | BIGINT
FLOAT        | REAL
DOUBLE       | DOUBLE
DECIMAL(p,s) | DECIMAL(p,s)
DATE         | DATE
TIME         | TIME(6)
TIMESTAMP    | TIMESTAMP(6)
TIMESTAMPTZ  | TIMESTAMP(6) WITH TIME ZONE
STRING       | VARCHAR
UUID         | UUID
BINARY       | VARBINARY
FIXED(L)     | VARBINARY
STRUCT(...)  | ROW(...)
LIST(e)      | ARRAY(e)
MAP(k,v)     | MAP(k,v)

Trino to Iceberg type mapping

Trino type                  | Iceberg type
----------------------------+-------------
BOOLEAN                     | BOOLEAN
INTEGER                     | INT
BIGINT                      | LONG
REAL                        | FLOAT
DOUBLE                      | DOUBLE
DECIMAL(p,s)                | DECIMAL(p,s)
DATE                        | DATE
TIME(6)                     | TIME
TIMESTAMP(6)                | TIMESTAMP
TIMESTAMP(6) WITH TIME ZONE | TIMESTAMPTZ
VARCHAR                     | STRING
UUID                        | UUID
VARBINARY                   | BINARY
ROW(...)                    | STRUCT(...)
ARRAY(e)                    | LIST(e)
MAP(k,v)                    | MAP(k,v)
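To see these rules in action, you can create a table and inspect the column types Trino reports. A minimal sketch; the table and column names are illustrative:

-- VARCHAR is stored as the Iceberg string type;
-- TIMESTAMP(6) is stored as the Iceberg timestamp type
CREATE TABLE "iceberg-adh"."trino_demo_schema_iceberg"."type_mapping_demo" (
    label    VARCHAR,
    event_ts TIMESTAMP(6)
);

-- Reports the columns using Trino types
DESCRIBE "iceberg-adh"."trino_demo_schema_iceberg"."type_mapping_demo";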

Iceberg catalog configuration properties

Trino’s Iceberg catalog supports numerous configuration properties for fine-tuning interaction with Iceberg tables. The up-to-date list of configuration properties is available in Trino documentation. You can specify new property values or override the defaults using ADCM. For this, use the Custom Iceberg configuration settings section (Clusters → <clusterName> → Services → Trino → Configuration).
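For example, the following custom properties could be set in that section (the values are illustrative; verify each property against the Trino documentation for your Trino version):

# Default file format for newly created Iceberg tables
iceberg.file-format=ORC
# Minimum allowed retention for the expire_snapshots command
iceberg.expire-snapshots.min-retention=7d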

SQL support

IMPORTANT

To allow Trino to write to Iceberg tables, the user name specified when connecting to Trino (for example, via Trino CLI or DBeaver) must match a Unix user on the ADH host with sufficient permissions to write to the Hive warehouse directories in HDFS. Otherwise, you may observe an error similar to the following:

Failed to create external path hdfs://adh/apps/hive/warehouse/trino_demo_schema1.db for database trino_demo_schema1. This may result in access not being allowed if the StorageBasedAuthorizationProvider is enabled: null
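For example, when connecting via Trino CLI, you can set the user explicitly. In this sketch, the host and port are placeholders, and the hdfs user is assumed to have write access to the warehouse directories:

trino --server http://<coordinator-host>:8080 --user hdfs --catalog iceberg-adh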

Schema management

A schema in Trino is used to group several tables/views and is equivalent to a database in Hive parlance. Below are basic commands for managing Trino schemas:

CREATE SCHEMA "iceberg-adh"."trino_demo_schema_iceberg"; (1)
CREATE SCHEMA "iceberg-adh"."trino_demo_schema_iceberg_with_loc"
    WITH (location='/user/hive/warehouse/some_custom_path'); (2)
SHOW SCHEMAS;
USE "iceberg-adh"."trino_demo_schema_iceberg"; (3)
DROP SCHEMA "trino_demo_schema_iceberg"; (4)
1 Creates a new schema (Hive database). iceberg-adh is the name of the built-in Iceberg catalog. Notice the double quotes used for identifiers.
2 Creates a schema (Hive database) using a custom HDFS location.
3 Specifies a session schema. All the subsequent SQL commands submitted to Trino within the current session will run on the selected Hive database.
4 Deleting a non-empty schema is prohibited. To delete a schema, you have to remove all nested tables first.

To run Trino queries on arbitrary tables that belong to different schemas, you can specify a catalog, schema, and table name within your SQL command, for example:

SELECT * FROM "iceberg-adh"."trino_demo_schema_iceberg"."transactions"
TIP
Using Trino UI, you can view and analyze SQL queries submitted to Trino.

DDL examples

Trino supports common DDL operations like CREATE, DROP, ALTER, etc., for managing Iceberg tables and materialized views. It also supports a set of Iceberg table properties that specify metadata for Iceberg tables; these properties can be applied to a table using the WITH clause, as shown in the examples below. The following snippet provides examples of basic DDL operations using the Iceberg catalog.

CREATE TABLE "iceberg-adh"."trino_demo_schema_iceberg"."transactions" ( (1)
    txn_id BIGINT NOT NULL,
    txn_value DOUBLE,
    acc_id BIGINT,
    txn_date DATE,
    comment VARCHAR
);

CREATE TABLE "iceberg-adh"."trino_demo_schema_iceberg"."transactions_pttnd" ( (2)
    txn_id BIGINT,
    txn_value DOUBLE,
    acc_id BIGINT,
    txn_date DATE,
    comment VARCHAR
) WITH (
    format = 'PARQUET',
    partitioning = ARRAY['acc_id']
);

CREATE TABLE "iceberg-adh"."trino_demo_schema_iceberg"."transactions_pttnd_ts" ( (3)
    txn_id BIGINT,
    txn_value DOUBLE,
    acc_id BIGINT,
    txn_date DATE,
    comment VARCHAR
) WITH (
    partitioning = ARRAY['month(txn_date)', 'bucket(acc_id, 5)']
);

CREATE TABLE "iceberg-adh"."trino_demo_schema_iceberg"."transactions_sorted" ( (4)
    txn_id BIGINT,
    txn_value DOUBLE,
    acc_id BIGINT,
    txn_date DATE,
    comment VARCHAR
) WITH (
    sorted_by = ARRAY['txn_value']
);

CREATE TABLE "iceberg-adh"."trino_demo_schema_iceberg"."transactions_orc_as" (5)
WITH (
    format = 'ORC',
    orc_bloom_filter_columns = ARRAY['acc_id','txn_date'],
    orc_bloom_filter_fpp = 0.1
) AS
    SELECT * FROM "iceberg-adh"."trino_demo_schema_iceberg"."transactions"
    WHERE txn_id < 10;

ALTER TABLE "iceberg-adh"."trino_demo_schema_iceberg"."transactions_pttnd" (6)
SET PROPERTIES partitioning = ARRAY['acc_id', 'txn_date'];

ALTER TABLE "iceberg-adh"."trino_demo_schema_iceberg"."transactions" (7)
EXECUTE expire_snapshots(retention_threshold => '7d');


CREATE MATERIALIZED VIEW "iceberg-adh"."trino_demo_schema_iceberg"."transactions_mv"  (8)
WITH (
    partitioning = ARRAY['txn_date']
) AS
    SELECT * FROM "iceberg-adh"."trino_demo_schema_iceberg"."transactions"
    WHERE txn_value > 0;
1 Creates an Iceberg table (table_type=ICEBERG) in the default Hive warehouse location. Notice that Trino accepts queries in ANSI SQL with corresponding data types, whereas the Iceberg specification may use slightly different data types. For example, when running this query, Trino’s VARCHAR data type is mapped to Iceberg’s STRING.
2 Creates a Parquet-formatted Iceberg table with partitions. Notice the ARRAY['col1', 'col2'] syntax for defining partitions.
3 Creates a table partitioned by months of txn_date and also by hash buckets of acc_id (5 buckets).
4 Creates a table whose content is stored in Iceberg files sorted by a column (txn_value). Storing sorted table data can improve performance of future read operations.
5 Creates an ORC-formatted table using the CTAS (CREATE TABLE AS) syntax. Additionally, sets ORC-specific bloom filtering properties.
6 Adds more partition columns to a partitioned table using ALTER TABLE SET. For more information about Iceberg table properties that can be set via ALTER TABLE SET, see Trino documentation.
7 Removes all snapshots, metadata files, and data files of an Iceberg table that are older than the specified retention value (7 days in this example). Running this command periodically cleans up unnecessary historical data files and helps keep your table metadata small. For information on other Iceberg-specific commands (remove_orphan_files, drop_extended_stats, etc.) that can be invoked via ALTER TABLE EXECUTE, see Trino documentation.
8 Creates a materialized view based on the results of a SELECT query.
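To load data into the materialized view and to update it later as the base table changes, use the refresh command:

REFRESH MATERIALIZED VIEW "iceberg-adh"."trino_demo_schema_iceberg"."transactions_mv";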

For more information on DDL commands supported by the Iceberg catalog in Trino, see Trino documentation.

DML examples

Trino supports common DML operations like SELECT, INSERT, UPDATE, DELETE, MERGE, etc. The following snippet provides examples of DML operations via the Iceberg catalog.

SELECT * FROM "iceberg-adh"."trino_demo_schema_iceberg"."transactions_pttnd$partitions"; (1)

SELECT * FROM "iceberg-adh"."trino_demo_schema_iceberg"."transactions_pttnd$snapshots"; (2)

SELECT * FROM TABLE( (3)
    system.table_changes(
      schema_name => 'trino_demo_schema_iceberg',
      table_name => 'transactions',
      start_snapshot_id => 3196536194725230246,
      end_snapshot_id => 248634242462154958
    )
  );

SELECT * FROM "iceberg-adh"."trino_demo_schema_iceberg"."transactions_pttnd"
FOR VERSION AS OF 2602443793554441031; (4)

SELECT * FROM "iceberg-adh"."trino_demo_schema_iceberg"."transactions_pttnd" FOR TIMESTAMP AS OF TIMESTAMP '2025-01-17 01:41:38.000 +0300' (5)

SELECT * FROM "iceberg-adh"."trino_demo_schema_iceberg"."transactions" txns_icb
JOIN "hive-adh"."trino_demo_schema_hive"."transactions" txns_hive
ON txns_icb.txn_id = txns_hive.txn_id; (6)

DELETE FROM "iceberg-adh"."trino_demo_schema_iceberg"."transactions_pttnd"
WHERE acc_id = 1001; (7)
1 Retrieves detailed information about the partitions of the given Iceberg table by using the hidden $partitions metadata table. Similarly, you can make use of other metadata tables like $manifests, $files, $history, etc. For more information on fetching from Iceberg metadata tables, see Trino documentation.
2 Gets detailed information about the snapshots available for the given Iceberg table.
3 Returns row-level changes between two versions of an Iceberg table.
4 A time-travel SELECT using a snapshot ID. The state of the table to read data from is determined by an exact snapshot. Such a query ignores all the changes made to the table after the given snapshot was created.
5 A time-travel SELECT using a timestamp. The state of the table to read data from is determined by a snapshot closest to the provided timestamp. Such a query ignores all the changes made to the table after the given timestamp.
6 Selects data from two independent Trino catalogs — an Iceberg catalog (iceberg-adh) and a Hive catalog (hive-adh). After Trino gets result sets from both catalogs, it applies a JOIN operation and returns the final result set to the client. For this sample query to work, both catalogs must be enabled.
7 Deletes rows that match the WHERE clause. If the filter exactly matches the boundaries of a partition, Trino removes the entire partition as a fast, metadata-only operation.
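The snippet above focuses on reads, metadata, and time travel. For completeness, below is a minimal sketch of the write operations mentioned earlier; all values are illustrative:

INSERT INTO "iceberg-adh"."trino_demo_schema_iceberg"."transactions"
VALUES (1, 150.25, 1001, DATE '2025-01-17', 'initial txn');

UPDATE "iceberg-adh"."trino_demo_schema_iceberg"."transactions"
SET comment = 'corrected txn'
WHERE txn_id = 1;

MERGE INTO "iceberg-adh"."trino_demo_schema_iceberg"."transactions" t
USING "iceberg-adh"."trino_demo_schema_iceberg"."transactions_pttnd" s
ON t.txn_id = s.txn_id
WHEN MATCHED THEN UPDATE SET txn_value = s.txn_value
WHEN NOT MATCHED THEN INSERT VALUES (s.txn_id, s.txn_value, s.acc_id, s.txn_date, s.comment);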