Iceberg tables
Overview
Apache Iceberg is an open-source table format for data lakes that enables ACID transactions, time travel, schema evolution, partition evolution, and other features.
It acts as an abstraction layer between a query engine and the data. Iceberg tables work similar to SQL tables and can be integrated with compute engines like Spark, Hive, Impala, etc.
Iceberg solves issues with traditional table formats, like the Hive table format, including data consistency, performance issues, the lack of schema evolution, and consistent concurrent writes in parallel.
Another important function of Iceberg tables is reliability. It allows tracking changes in a dataset and rolling back to the desired database state.
NOTE
In the ADCM paradigm, Iceberg is not a service. It’s a set of parameters for services like Hive, Spark, and Impala. These parameters can be configured to alter the way these services interact with data. By default, it’s enabled. |
For more information about Iceberg tables, refer to the Iceberg tables specification.
Differences from Hive tables
Traditionally, the Hive tables format has been utilized for data lakes. This format monitors data at the directory level, which can lead to performance issues when numerous file list operations are required. Additionally, there’s a risk of data loss when executing file list operations on an object store like S3, which is only eventually consistent.
In contrast, Iceberg maintains a comprehensive list of all files within a table by using a persistent tree structure. Any changes to a table are executed via an atomic object/file-level commit, which subsequently updates the path to a new metadata file. This metadata file contains the locations of all individual data files.
Features
The main features of Iceberg tables are:
-
Expressive SQL
Iceberg supports a wide range of SQL operations, enabling tasks like row updates, data merging, and targeted deletes. It allows for data file rewrites and supports such queries as, for example,
INSERT INTO
,MERGE INTO
, andINSERT OVERWRITE
. -
Schema evolution
Changes to the Iceberg schema only affect the metadata, not the actual data files. This supports seamless modifications to the data structure, such as adding, dropping, updating, reordering, and renaming columns.
-
Partition evolution
Unlike standard partitioning, Iceberg tables allow the alteration of the data’s partitioning schema without a need for a rewrite of all previously partitioned data.
-
Snapshots
A snapshot represents the table’s state at a particular moment. Iceberg maintains a log of past snapshots, enabling time-travel queries.
-
Time travel and rollback
Iceberg’s time travel functionality enables data querying at a specific point in time. The rollback feature lets users restore tables to their prior state.
-
Transactional consistency
Iceberg supports ACID transactions, ensuring safe concurrent writes in the cluster and preventing read operations from being impacted by writes. Whenever changes occur, Iceberg generates a new, immutable version of the table’s data files and metadata.
-
Faster querying
Iceberg provides several features aimed at boosting query speed and efficiency. These include the possibility of incremental data processing, fast scan planning, filtering metadata files, and the ability to filter out data files that don’t contain matching data.
NOTE
The available Iceberg features depend on the service you are using and its version. To find out more information about the available Iceberg features in a specific service, please refer to the documentation for that service. |
Architecture
There are three main layers in the Iceberg table architecture:
Iceberg catalog
In Iceberg, a group of tables is usually grouped into namespaces. An Iceberg catalog keeps track of those namespaces and tables, along with references or pointers to the corresponding metadata files.
The catalog is an add-on to an existing metadata service, such as HDFS, Hive Metastore, or a relational database (JDBC). ADH uses Hive Metastore as the Iceberg catalog implementation.
The metadata layer
Apache Iceberg uses three types of metadata files to maintain a file location map:
-
Manifest files — each manifest file tracks a subset of the files in a table for a single snapshot. These files track the individual files, their partitions, and column metrics.
-
Manifest lists — define a snapshot of the table and list all the manifest files that make up that snapshot. Additionally, they store metadata for every manifest file in a given snapshot.
-
Metadata files — define the table and track manifest lists, current and previous snapshots, the table and partition schemas.
The data layer
Iceberg tracks each data file in a table and saves information about its partitions and columns.
There are two categories of files in the data layer:
-
Data files — actual data in such file formats as Parquet, Avro, or ORC.
-
Delete files — the records about the files that have been marked as deleted.
Query lifecycle
To illustrate how a simple query operates within a cluster utilizing Iceberg tables, we can break down the lifecycle into the following steps:
-
When a query is submitted, the engine needs data from the Iceberg catalog and sends a request to retrieve the file path of the table’s current metadata file.
-
The engine gets the following information from the metadata file:
-
The table’s schema.
-
The partitioning of the table. It will help to determine how the data is organized so it can eliminate files that are not relevant to the query.
-
The current snapshot’s manifest list. It will help to determine what files to scan.
-
-
Next, the engine retrieves a list of manifest files from the manifest list. Each manifest file contains a list of data files that comprise the data in the selected snapshot. Key details include the
partition-spec-id
, which indicates the partition schema used when writing the snapshot, and the partition field summaries (PartitionFieldSummary
), which can help to determine whether a manifest can be skipped. -
The engine then verifies in parallel each of the selected manifest files, reading the metadata. This metadata includes the
schema-id
andpartition-id
values for each data file and the type of content (checks if it’s a data file or a delete file) to determine whether a data file is relevant to the query. -
The engine scans the relevant files and returns the results.
The process of running a query on an Iceberg table is considerably quicker than on other table formats, as the engine can disregard irrelevant files, resulting in a much smaller portion of the table that needs to be scanned.
Use cases
Below are examples of SQL requests that illustrate some of the Iceberg table features.
Branching
The branching is similar to version control systems like Git. You can create a local branch, commit necessary changes, and merge to the main branch.
An example statement for creating a branch:
ALTER TABLE test.db.table CREATE BRANCH 'task_1';
This syntax creates a branch with the same state as the current table, but you can create a branch of a specific table state using the FOR
clause.
An example statement for adding new values to the created branch:
INSERT INTO test.db.table.task_1 VALUES (1, 'a'), (2, 'b');
An example statement for reading from a specific branch:
SELECT column1, column2 FROM test.db.table.task_1;
For more information on branching, see the Branching and Tagging article.
Tagging
In Iceberg, you can mark a snapshot to return to it later. You can create, query, or delete a tag.
An example of syntax for creating a tag:
ALTER TABLE test.db.table CREATE TAG `test` RETAIN 10 DAYS;
Time travel
Using the time travel feature, you can make a request to a specific state of the table in time.
An example of syntax for querying a table at a specific time:
SELECT * FROM test.db.table TIMESTAMP AS OF '2007-09-03 04:21:00';
An example of syntax for querying a specific snapshot:
SELECT * FROM test.db.table VERSION AS OF 10726387402803;
An example of syntax for querying the head of a specific branch:
SELECT * FROM test.db.table VERSION AS OF 'task_1';
Read more about the time travel feature in the Time travel section.
Rollback
Using the rollback feature, you can return a table to a previous state. The state can be specified either as a snapshot ID or a timestamp.
An example of syntax for rolling back to a snapshot:
ROLLBACK TABLE test.db.table TO SNAPSHOT '2894821635228246189';
Schema evolution
With Iceberg tables, you can alter a table structure by adding, dropping, renaming, or reordering columns.
Since Iceberg only updates metadata when table schema is altered, the data doesn’t change when you make a schema update.
An example of syntax for changing a column name:
ALTER TABLE test.db.table CHANGE COLUMN names employee_names STRING;