Use TTL to manage data
Overview
For MergeTree tables, it is possible to set TTL (time-to-live) — a time interval after which old data is deleted from a table in the background. TTL can be set for a whole table (entire data rows with outdated values will be deleted) and for each individual column (outdated values in a column will be replaced with default values for the column data type).
Table-level TTL also allows you to schedule the following actions to be performed automatically on expired data rows:
- Move data between disks or volumes of an ADQM storage — this can be useful when you deploy an architecture that separates data into "hot", "warm", and "cold" tiers (see Configure data storages).
- Roll up data into aggregations and computations — you can store old data with a lower level of detail to save resources.
- Apply a higher level of compression to old data.
Thus, you can optimize storage and improve query performance by managing your data lifecycle with the TTL mechanism.
When TTL events are triggered
ADQM deletes data with expired TTL when merging data parts (not immediately). It periodically performs "off-schedule" merges to delete or compress data according to TTL rules. You can control the frequency of such merges with special parameters of a MergeTree table:
- merge_with_ttl_timeout — minimum delay in seconds between merges that delete data based on TTL;
- merge_with_recompression_ttl_timeout — minimum delay in seconds between merges that recompress data based on TTL.
By default, these parameters are set to 14400 seconds. This means that deletion/compression of outdated data is performed every 4 hours or when a background merge of data parts occurs. If you want TTL rules to apply more frequently, change the corresponding setting for your table. Note that a value that is too low can lead to many "off-schedule" merges that consume a lot of server resources, so it is not recommended to use values smaller than 300 seconds.
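For example, a minimal sketch (with an illustrative table name) that lowers the TTL merge delay to one hour when creating a table:
CREATE TABLE ttl_timeout_example (time DateTime, value UInt64)
ENGINE = MergeTree
ORDER BY time
TTL time + INTERVAL 1 DAY
SETTINGS merge_with_ttl_timeout = 3600;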
A MergeTree table also has the ttl_only_drop_parts setting that affects the expired data deletion as follows (depending on this parameter value):
- 0 (default) — ADQM deletes only rows that are expired according to their TTL;
- 1 — ADQM deletes a whole data part when all rows in it are expired.
Deleting whole parts instead of individual rows allows you to set a smaller merge_with_ttl_timeout value and reduce the impact of TTL operations on system performance.
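For example, a sketch of enabling this mode for an existing table (the table name is illustrative):
ALTER TABLE ttl_timeout_example MODIFY SETTING ttl_only_drop_parts = 1;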
Operations that move data between disks/volumes or recompress data are performed only when all rows in a data part are expired.
When you run a SELECT query between merges, you can get expired data in a table. To avoid this, run the OPTIMIZE … FINAL query before selecting (to force merging). However, it is not recommended to run this query frequently, especially for large tables.
Table TTL
Create a table with TTL
To set TTL for a table when creating it, use the TTL clause at the end of the table definition in the CREATE TABLE query. After the TTL clause, you can specify one rule to delete expired rows and/or multiple TTL rules to process data in different ways (for example, to move data to another disk/volume or recompress it).
Below is the basic syntax of a query for creating a table with TTL:
CREATE TABLE <table_name>
( <timestamp_column> Date|DateTime ...,
<column1> ...,
<column2> ...,
...)
ENGINE = MergeTree()
TTL <ttl_expr> [<ttl_action>] [, <ttl_expr2> [<ttl_action2>]] [, ...];
where each TTL rule includes:
- <ttl_expr> — TTL expression that specifies when data is to be considered outdated;
- <ttl_action> — action to be performed after the time specified by the TTL expression has passed.
A TTL expression should return a value of the Date or DateTime type and is usually specified in the following format:
<timestamp_column> [+ INTERVAL <time_interval> <interval_type>]
where:
- <timestamp_column> — name of a column of the Date or DateTime type with timestamps relative to which the time to live of data is determined;
- INTERVAL <time_interval> <interval_type> — value of the Interval type that describes how long a data row should live relative to the timestamp from the <timestamp_column> column. This value consists of a time interval as a positive integer and an interval type — for example, 1 MONTH, 3 DAY, 5 HOUR.
To specify a time interval, you can use a toInterval conversion function instead of the INTERVAL operator. In this case, a TTL expression can look like one of the following:
<timestamp_column> + toIntervalMonth(1)
<timestamp_column> + toIntervalDay(3)
<timestamp_column> + toIntervalHour(5)
To specify actions to be performed on data rows that have expired according to their TTL expressions, use the clauses listed in the table below.
Clause | Basic TTL syntax | Action description
---|---|---
DELETE | TTL <ttl_expr> DELETE | Delete expired data rows. This is the default action (in other words, expired rows are removed if another action is not specified explicitly after the TTL expression).
TO DISK | TTL <ttl_expr> TO DISK '<disk_name>' | Move expired data to the specified disk.
TO VOLUME | TTL <ttl_expr> TO VOLUME '<volume_name>' | Move expired data to the specified volume.
RECOMPRESS | TTL <ttl_expr> RECOMPRESS CODEC(<codec>) | Compress expired data using the specified codec.
GROUP BY | TTL <ttl_expr> GROUP BY <group_keys> SET <column> = <aggr_expr>, ... | Aggregate expired data. When defining a rule for aggregating old data, take into account the following points: the GROUP BY key columns should be a prefix of the table primary key, and the SET clause defines how values of particular columns are aggregated within each group (columns not listed in SET get a value from one of the grouped rows).
For the DELETE or GROUP BY action, you can specify a filtering condition with the WHERE clause to delete/aggregate only some of the expired rows. The WHERE clause is not applicable to moving and recompressing data.
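For example, a sketch (with illustrative names) that combines several rules in one TTL clause — Info rows are deleted a week after their timestamps, and all remaining rows a month after:
CREATE TABLE ttl_rules_example (time DateTime, alert String, value UInt64)
ENGINE = MergeTree
ORDER BY time
TTL time + INTERVAL 1 WEEK DELETE WHERE alert = 'Info',
    time + INTERVAL 1 MONTH DELETE;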
Modify TTL
You can change the TTL rules of an existing table using the following query:
ALTER TABLE <table_name> MODIFY TTL <new_ttl_rules>;
After this query is executed, each time new data is inserted or a new data part is created as a result of a background merge, TTL will be calculated according to the new rules.
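For example, a sketch that replaces the TTL rules of the illustrative ttl_rules_example table above with a single 30-day deletion rule:
ALTER TABLE ttl_rules_example MODIFY TTL time + INTERVAL 30 DAY;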
To recalculate TTL for existing data, TTL materialization is required. The materialize_ttl_after_modify setting (enabled by default) controls whether TTL materialization is performed automatically after the ALTER TABLE … MODIFY TTL query execution. You can also force TTL materialization with the following query:
ALTER TABLE <table_name> MATERIALIZE TTL;
TTL materialization can be quite a heavy operation (especially for large tables), as it recalculates TTL in all data parts (updates ttl.txt files) and rewrites data for all columns, which can result in a lot of read/write operations. To lighten the TTL materialization operation, you can set materialize_ttl_recalculate_only = 1 in the MergeTree table settings. In this case, only the ttl.txt recalculation is performed (a lightweight operation, as only columns used in the TTL expression are read), and all columns are copied via hard links. Some time later, expired data is deleted during a merge with ttl_only_drop_parts = 1.
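A minimal sketch of this lightweight approach (the table name is illustrative):
-- recalculate only ttl.txt on materialization; drop whole expired parts later
ALTER TABLE ttl_rules_example MODIFY SETTING materialize_ttl_recalculate_only = 1, ttl_only_drop_parts = 1;
ALTER TABLE ttl_rules_example MODIFY TTL time + INTERVAL 1 WEEK;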
In some cases, an optimal solution can be to disable TTL materialization (materialize_ttl_after_modify = 0) and delete/move old partitions manually (using the ALTER TABLE … DROP|MOVE PARTITION queries — see Manipulate partitions and data parts for details).
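A sketch of this manual approach, assuming an illustrative table partitioned by PARTITION BY toYYYYMM(time):
SET materialize_ttl_after_modify = 0;
ALTER TABLE partitioned_example MODIFY TTL time + INTERVAL 1 MONTH;
-- old data is then removed manually, partition by partition
ALTER TABLE partitioned_example DROP PARTITION 202405;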
Examples
- Create a table with a TTL rule to remove rows that will be considered outdated when 10 minutes have passed since the time in the time column:
CREATE TABLE test_table (time DateTime, value UInt64)
ENGINE = MergeTree
ORDER BY time
TTL time + INTERVAL 10 MINUTE;
Fill the table with test data:
INSERT INTO test_table SELECT now() - toIntervalMinute(number), number FROM numbers(10);
SELECT * FROM test_table;
    ┌────────────────time─┬─value─┐
 1. │ 2024-06-07 13:15:49 │     9 │
 2. │ 2024-06-07 13:16:49 │     8 │
 3. │ 2024-06-07 13:17:49 │     7 │
 4. │ 2024-06-07 13:18:49 │     6 │
 5. │ 2024-06-07 13:19:49 │     5 │
 6. │ 2024-06-07 13:20:49 │     4 │
 7. │ 2024-06-07 13:21:49 │     3 │
 8. │ 2024-06-07 13:22:49 │     2 │
 9. │ 2024-06-07 13:23:49 │     1 │
10. │ 2024-06-07 13:24:49 │     0 │
    └─────────────────────┴───────┘
- Wait for 5 minutes after inserting data and run the query:
OPTIMIZE TABLE test_table FINAL;
Repeat the query to select data from the table:
SELECT * FROM test_table;
You can see that the first five rows have been removed from the table (for each row, 10 minutes have passed since the time specified in the time column):
   ┌────────────────time─┬─value─┐
1. │ 2024-06-07 13:20:49 │     4 │
2. │ 2024-06-07 13:21:49 │     3 │
3. │ 2024-06-07 13:22:49 │     2 │
4. │ 2024-06-07 13:23:49 │     1 │
5. │ 2024-06-07 13:24:49 │     0 │
   └─────────────────────┴───────┘
If you do not use the OPTIMIZE TABLE … FINAL query and run SELECT between merges, outdated records can still exist in the table.
- Create the alerts table and define a TTL rule to remove rows that have the Critical value in the alert column once 10 minutes have passed since the time in the time column:
CREATE TABLE alerts (time DateTime, value UInt64, alert String)
ENGINE = MergeTree
ORDER BY time
TTL time + INTERVAL 10 MINUTE DELETE WHERE alert = 'Critical';
Insert test data into the table:
INSERT INTO alerts SELECT now() - toIntervalMinute(number), number, ['Warning', 'Critical', 'Info'][(rand() % 3) + 1] FROM numbers(10);
SELECT * FROM alerts;
    ┌────────────────time─┬─value─┬─alert────┐
 1. │ 2024-06-10 07:14:56 │     9 │ Critical │
 2. │ 2024-06-10 07:15:56 │     8 │ Warning  │
 3. │ 2024-06-10 07:16:56 │     7 │ Info     │
 4. │ 2024-06-10 07:17:56 │     6 │ Critical │
 5. │ 2024-06-10 07:18:56 │     5 │ Warning  │
 6. │ 2024-06-10 07:19:56 │     4 │ Info     │
 7. │ 2024-06-10 07:20:56 │     3 │ Critical │
 8. │ 2024-06-10 07:21:56 │     2 │ Warning  │
 9. │ 2024-06-10 07:22:56 │     1 │ Warning  │
10. │ 2024-06-10 07:23:56 │     0 │ Critical │
    └─────────────────────┴───────┴──────────┘
- 10 minutes after inserting the data, initiate a merge of data parts and look at the table contents:
OPTIMIZE TABLE alerts FINAL;
SELECT * FROM alerts;
All old Critical alerts have been removed from the table:
   ┌────────────────time─┬─value─┬─alert───┐
1. │ 2024-06-10 07:15:56 │     8 │ Warning │
2. │ 2024-06-10 07:16:56 │     7 │ Info    │
3. │ 2024-06-10 07:18:56 │     5 │ Warning │
4. │ 2024-06-10 07:19:56 │     4 │ Info    │
5. │ 2024-06-10 07:21:56 │     2 │ Warning │
6. │ 2024-06-10 07:22:56 │     1 │ Warning │
   └─────────────────────┴───────┴─────────┘
If you need to move data to another table before it is deleted from the main table, you can use TTL in combination with a materialized view.
- Create:
  - the alerts table with a TTL rule to remove alerts after 10 minutes relative to the time from the time column;
  - the alerts_history table to store Critical alerts from the alerts table.
DROP TABLE IF EXISTS alerts;
CREATE TABLE alerts (time DateTime, value UInt64, alert String)
ENGINE = MergeTree
ORDER BY time
TTL time + INTERVAL 10 MINUTE DELETE;
CREATE TABLE alerts_history (time DateTime, value UInt64, alert String) ENGINE = MergeTree ORDER BY time;
Note that the alerts_history table has the same structure as alerts, but do not use the CREATE TABLE alerts_history AS alerts query as it would also copy the TTL rule, which is not needed.
- Create a materialized view that will be triggered on inserting data into the alerts table and automatically load Critical alerts into alerts_history:
CREATE MATERIALIZED VIEW alerts_history_mv TO alerts_history
AS SELECT * FROM alerts WHERE alert = 'Critical';
- Insert test data into the alerts table:
INSERT INTO alerts SELECT now() - toIntervalMinute(number), number, ['Warning', 'Critical', 'Info'][(rand() % 3) + 1] FROM numbers(10);
SELECT * FROM alerts;
    ┌────────────────time─┬─value─┬─alert────┐
 1. │ 2024-06-10 08:02:16 │     9 │ Warning  │
 2. │ 2024-06-10 08:03:16 │     8 │ Warning  │
 3. │ 2024-06-10 08:04:16 │     7 │ Info     │
 4. │ 2024-06-10 08:05:16 │     6 │ Critical │
 5. │ 2024-06-10 08:06:16 │     5 │ Warning  │
 6. │ 2024-06-10 08:07:16 │     4 │ Warning  │
 7. │ 2024-06-10 08:08:16 │     3 │ Critical │
 8. │ 2024-06-10 08:09:16 │     2 │ Warning  │
 9. │ 2024-06-10 08:10:16 │     1 │ Critical │
10. │ 2024-06-10 08:11:16 │     0 │ Warning  │
    └─────────────────────┴───────┴──────────┘
- Wait for 10 minutes and run the query:
OPTIMIZE TABLE alerts FINAL;
TTL will delete records from the alerts table, but rows with the Critical value in the alert column will be retained in the alerts_history table:
SELECT * FROM alerts_history;
   ┌────────────────time─┬─value─┬─alert────┐
1. │ 2024-06-10 08:05:16 │     6 │ Critical │
2. │ 2024-06-10 08:08:16 │     3 │ Critical │
3. │ 2024-06-10 08:10:16 │     1 │ Critical │
   └─────────────────────┴───────┴──────────┘
- Create the alerts table and fill it with test data:
DROP TABLE IF EXISTS alerts;
CREATE TABLE alerts (time DateTime, value UInt64, alert String)
ENGINE = MergeTree
ORDER BY (toHour(time), alert);
INSERT INTO alerts SELECT now() - toIntervalMinute(number), toInt64(randUniform(0, 100)), ['Warning', 'Critical', 'Info'][(rand() % 3) + 1] FROM numbers(15);
SELECT * FROM alerts;
    ┌────────────────time─┬─value─┬─alert────┐
 1. │ 2024-07-26 21:51:41 │     5 │ Critical │
 2. │ 2024-07-26 21:48:41 │    18 │ Critical │
 3. │ 2024-07-26 21:46:41 │    53 │ Critical │
 4. │ 2024-07-26 21:43:41 │    33 │ Critical │
 5. │ 2024-07-26 21:50:41 │    98 │ Info     │
 6. │ 2024-07-26 21:49:41 │    74 │ Info     │
 7. │ 2024-07-26 21:47:41 │    73 │ Info     │
 8. │ 2024-07-26 21:42:41 │    52 │ Info     │
 9. │ 2024-07-26 21:41:41 │    92 │ Info     │
10. │ 2024-07-26 21:39:41 │    70 │ Info     │
11. │ 2024-07-26 21:38:41 │    81 │ Info     │
12. │ 2024-07-26 21:37:41 │    72 │ Info     │
13. │ 2024-07-26 21:45:41 │     7 │ Warning  │
14. │ 2024-07-26 21:44:41 │    96 │ Warning  │
15. │ 2024-07-26 21:40:41 │     0 │ Warning  │
    └─────────────────────┴───────┴──────────┘
- Five minutes after inserting data (at 21:56:41 relative to the timestamps generated above), modify the table to add a TTL rule that groups outdated data rows (if 10 minutes have passed since the time in the time column) by hour and alert type, with each group storing the maximum value in the value column and the most recent time in the time column:
ALTER TABLE alerts MODIFY TTL time + INTERVAL 10 MINUTE GROUP BY toHour(time), alert SET value = max(value), time = max(time);
Note that GROUP BY columns in TTL should be a prefix of the table primary key (ORDER BY if PRIMARY KEY is not specified separately).
- Perform TTL materialization to apply the TTL rule to existing data:
ALTER TABLE alerts MATERIALIZE TTL;
- Check the contents of the table:
SELECT * FROM alerts;
Rows for which the TTL rule was met (10 minutes have passed since their time values) were grouped, while the rest of the rows remained as before:
   ┌────────────────time─┬─value─┬─alert────┐
1. │ 2024-07-26 21:51:41 │     5 │ Critical │
2. │ 2024-07-26 21:48:41 │    18 │ Critical │
3. │ 2024-07-26 21:46:41 │    53 │ Critical │
4. │ 2024-07-26 21:50:41 │    98 │ Info     │
5. │ 2024-07-26 21:49:41 │    74 │ Info     │
6. │ 2024-07-26 21:47:41 │    73 │ Info     │
7. │ 2024-07-26 21:42:41 │    92 │ Info     │
8. │ 2024-07-26 21:45:41 │    96 │ Warning  │
   └─────────────────────┴───────┴──────────┘
If you run optimization after another 10 minutes, the data will be aggregated further — the table will have one row for the most recent alert of each type:
OPTIMIZE TABLE alerts FINAL;
   ┌────────────────time─┬─value─┬─alert────┐
1. │ 2024-07-26 21:51:41 │    53 │ Critical │
2. │ 2024-07-26 21:50:41 │    98 │ Info     │
3. │ 2024-07-26 21:45:41 │    96 │ Warning  │
   └─────────────────────┴───────┴──────────┘
- Create a table with a TTL rule to compress old data with the LZ4HC(10) codec, and also set merge_with_recompression_ttl_timeout = 1200 (in seconds) in the table settings to specify how often data should be recompressed according to TTL (in this example, every 20 minutes):
DROP TABLE IF EXISTS alerts;
CREATE TABLE alerts (time DateTime, value UInt64, alert String)
ENGINE = MergeTree
ORDER BY time
TTL time + INTERVAL 10 MINUTE RECOMPRESS CODEC(LZ4HC(10))
SETTINGS merge_with_recompression_ttl_timeout = 1200;
- Insert data:
INSERT INTO alerts SELECT now() - toIntervalMinute(number), number, ['Warning', 'Critical', 'Info'][(rand() % 3) + 1] FROM numbers(100);
- View information about data parts of the alerts table in the system.parts system table:
SELECT name, active, rows, data_compressed_bytes, data_uncompressed_bytes
FROM system.parts WHERE table = 'alerts'
FORMAT Vertical;
Inserted data rows were written as one part:
Row 1:
──────
name:                    all_1_1_0
active:                  1
rows:                    100
data_compressed_bytes:   1114
data_uncompressed_bytes: 1933
- Repeat the above INSERT INTO and SELECT … FROM system.parts queries. Now the table has two data parts:
Row 1:
──────
name:                    all_1_1_0
active:                  1
rows:                    100
data_compressed_bytes:   1114
data_uncompressed_bytes: 1933

Row 2:
──────
name:                    all_2_2_0
active:                  1
rows:                    100
data_compressed_bytes:   1125
data_uncompressed_bytes: 1931
- Some time later, repeat the SELECT … FROM system.parts query. In the query results, you can see that:
  - the initial data parts have been merged into one part (all_1_2_1) containing 200 rows;
  - the size of uncompressed data in the resulting part is equal to the total amount of uncompressed data in the two merged parts;
  - the size of compressed data in the resulting part is smaller than the total size of compressed data in the two initial parts because the outdated data has been recompressed with the LZ4HC(10) codec during merging.
Row 1:
──────
name:                    all_1_2_1
active:                  1
rows:                    200
data_compressed_bytes:   1509
data_uncompressed_bytes: 3864
- In this example, separate disks are used to store "hot" and "cold" data — SSD and HDD disks mounted to the following directories:
  - /mnt/ssd/adqm/
  - /mnt/hdd/adqm/
Each directory is assigned the clickhouse owner:
$ sudo chown clickhouse:clickhouse -R /mnt/ssd/adqm/ /mnt/hdd/adqm/
- In the ADCM interface, configure a tiered storage for ADQM as described below.
  - Specify local disks in the Enable additional local storage section (previously enabled) of the ADQMDB service's configuration page.
  - In the Storage policies section, create a policy (hot_to_cold) that includes two volumes for storing "hot" and "cold" data.
  - Click Save and run the Reconfig and restart action for the service to apply the changes.
After that, information about the disks and policy will appear in the appropriate tags inside the <storage_configuration> section of the /etc/clickhouse-server/config.d/storage.xml file:
<storage_configuration>
    <disks>
        <disk_ssd>
            <type>local</type>
            <path>/mnt/ssd/adqm/</path>
        </disk_ssd>
        <disk_hdd>
            <type>local</type>
            <path>/mnt/hdd/adqm/</path>
        </disk_hdd>
    </disks>
    <policies>
        <hot_to_cold>
            <volumes>
                <volume_hot>
                    <disk>disk_ssd</disk>
                </volume_hot>
                <volume_cold>
                    <disk>disk_hdd</disk>
                </volume_cold>
            </volumes>
        </hot_to_cold>
    </policies>
</storage_configuration>
You can view available disks in the system.disks system table and storage policies in the system.storage_policies table:
SELECT name, path, free_space, total_space FROM system.disks;
   ┌─name─────┬─path─────────────────┬──free_space─┬─total_space─┐
1. │ default  │ /var/lib/clickhouse/ │ 45148381184 │ 53674487808 │
2. │ disk_hdd │ /mnt/hdd/adqm/       │ 45148381184 │ 53674487808 │
3. │ disk_ssd │ /mnt/ssd/adqm/       │ 45148381184 │ 53674487808 │
   └──────────┴──────────────────────┴─────────────┴─────────────┘
SELECT policy_name, volume_name, disks FROM system.storage_policies;
   ┌─policy_name─┬─volume_name─┬─disks────────┐
1. │ default     │ default     │ ['default']  │
2. │ hot_to_cold │ volume_hot  │ ['disk_ssd'] │
3. │ hot_to_cold │ volume_cold │ ['disk_hdd'] │
   └─────────────┴─────────────┴──────────────┘
- Create a table with a TTL rule to move data between the volume_hot and volume_cold volumes:
DROP TABLE IF EXISTS alerts;
CREATE TABLE alerts (time DateTime, value UInt64, alert String)
ENGINE = MergeTree
ORDER BY time
TTL time TO VOLUME 'volume_hot', time + INTERVAL 5 MINUTE TO VOLUME 'volume_cold'
SETTINGS storage_policy = 'hot_to_cold';
- Insert test data into the table:
INSERT INTO alerts SELECT now() - toIntervalMinute(number), number, ['Warning', 'Critical', 'Info'][(rand() % 3) + 1] FROM numbers(5);
Repeat the INSERT INTO query some time later.
As a result, the table will have two data parts located on the disk_ssd disk (the volume_hot volume):
SELECT name, rows, active, disk_name FROM system.parts WHERE table = 'alerts';
   ┌─name──────┬─rows─┬─active─┬─disk_name─┐
1. │ all_1_1_0 │    5 │      1 │ disk_ssd  │
2. │ all_2_2_0 │    5 │      1 │ disk_ssd  │
   └───────────┴──────┴────────┴───────────┘
- 5 minutes after the first insertion, the first data part will be moved to the disk_hdd disk (the volume_cold volume). To verify this, repeat the SELECT … FROM system.parts query:
   ┌─name──────┬─rows─┬─active─┬─disk_name─┐
1. │ all_1_1_0 │    5 │      1 │ disk_hdd  │
2. │ all_2_2_0 │    5 │      1 │ disk_ssd  │
   └───────────┴──────┴────────┴───────────┘
Column TTL
When values in a column with TTL expire, ADQM replaces them with the default value for the column data type (or the value specified via DEFAULT). If all values in a column expire, the column is removed from the data part in the file system.
IMPORTANT
TTL cannot be used for key columns.
Assign TTL to columns
To specify a time to live for column values, use the TTL clause in the column description when creating a table with the CREATE TABLE query:
CREATE TABLE <table_name>
( <timestamp_column> Date|DateTime,
...,
<column_name1> [<data_type1>] TTL <timestamp_column> + INTERVAL <time_interval1> <interval_type1>,
<column_name2> [<data_type2>] TTL <timestamp_column> + INTERVAL <time_interval2> <interval_type2>,
...)
ENGINE = MergeTree()
...;
A TTL expression specifying when column values should be deleted is described in the same way as a table-level TTL expression.
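For example, a hedged sketch (with illustrative names) of a column definition that uses the toInterval form:
CREATE TABLE column_ttl_sketch (time DateTime, value UInt64 TTL time + toIntervalDay(30))
ENGINE = MergeTree
ORDER BY time;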
Modify TTL
To add a TTL to a column of an existing table or change the existing TTL of a column, use the ALTER TABLE … MODIFY COLUMN query:
ALTER TABLE <table_name> MODIFY COLUMN <column_name> <data_type> TTL <new_ttl_expr>;
Remove TTL
The following query allows you to delete TTL from a column:
ALTER TABLE <table_name> MODIFY COLUMN <column_name> REMOVE TTL;
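For example, to drop the column TTL assigned to the value2 column in the examples below:
ALTER TABLE column_ttl_test MODIFY COLUMN value2 REMOVE TTL;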
Examples
- Create a table and set the following TTL rules for the columns:
  - value1 — a value in the column should be replaced with 100 if 5 minutes have passed since the time in the time column;
  - value2 — a column value should be replaced with the default one (0 for the Int data type) if 10 minutes have passed since the time in the time column.
CREATE TABLE column_ttl_test (
    time DateTime,
    value1 Int DEFAULT 100 TTL time + INTERVAL 5 MINUTE,
    value2 Int TTL time + INTERVAL 10 MINUTE,
    alert String
)
ENGINE = MergeTree
ORDER BY time
SETTINGS min_rows_for_wide_part = 0, min_bytes_for_wide_part = 0;
The min_rows_for_wide_part and min_bytes_for_wide_part settings are used to store data parts in the Wide format (for testing purposes) — each column in a separate file.
- Insert data into the table:
INSERT INTO column_ttl_test SELECT now() - toIntervalMinute(number), number, number + 10, ['Warning', 'Critical', 'Info'][(rand() % 3) + 1] FROM numbers(10);
The first five rows in the value1 column contain 100 because their values in the time column were already outdated at the time of insertion:
SELECT * FROM column_ttl_test;
    ┌────────────────time─┬─value1─┬─value2─┬─alert────┐
 1. │ 2024-06-25 11:42:36 │    100 │     19 │ Warning  │
 2. │ 2024-06-25 11:43:36 │    100 │     18 │ Critical │
 3. │ 2024-06-25 11:44:36 │    100 │     17 │ Warning  │
 4. │ 2024-06-25 11:45:36 │    100 │     16 │ Info     │
 5. │ 2024-06-25 11:46:36 │    100 │     15 │ Info     │
 6. │ 2024-06-25 11:47:36 │      4 │     14 │ Info     │
 7. │ 2024-06-25 11:48:36 │      3 │     13 │ Warning  │
 8. │ 2024-06-25 11:49:36 │      2 │     12 │ Info     │
 9. │ 2024-06-25 11:50:36 │      1 │     11 │ Critical │
10. │ 2024-06-25 11:51:36 │      0 │     10 │ Critical │
    └─────────────────────┴────────┴────────┴──────────┘
- 3 minutes after inserting data into the table, run the query:
OPTIMIZE TABLE column_ttl_test FINAL;
Select data from the table again. In the results, you can see that values of the value1 column in rows 6-8 and values of the value2 column in rows 1-3 have been replaced:
    ┌────────────────time─┬─value1─┬─value2─┬─alert────┐
 1. │ 2024-06-25 11:42:36 │    100 │      0 │ Warning  │
 2. │ 2024-06-25 11:43:36 │    100 │      0 │ Critical │
 3. │ 2024-06-25 11:44:36 │    100 │      0 │ Warning  │
 4. │ 2024-06-25 11:45:36 │    100 │     16 │ Info     │
 5. │ 2024-06-25 11:46:36 │    100 │     15 │ Info     │
 6. │ 2024-06-25 11:47:36 │    100 │     14 │ Info     │
 7. │ 2024-06-25 11:48:36 │    100 │     13 │ Warning  │
 8. │ 2024-06-25 11:49:36 │    100 │     12 │ Info     │
 9. │ 2024-06-25 11:50:36 │      1 │     11 │ Critical │
10. │ 2024-06-25 11:51:36 │      0 │     10 │ Critical │
    └─────────────────────┴────────┴────────┴──────────┘
- After another 2 minutes (5 minutes after the initial data insert), all values in the value1 column will be considered outdated and replaced with 100:
OPTIMIZE TABLE column_ttl_test FINAL;
SELECT * FROM column_ttl_test;
    ┌────────────────time─┬─value1─┬─value2─┬─alert────┐
 1. │ 2024-06-25 11:42:36 │    100 │      0 │ Warning  │
 2. │ 2024-06-25 11:43:36 │    100 │      0 │ Critical │
 3. │ 2024-06-25 11:44:36 │    100 │      0 │ Warning  │
 4. │ 2024-06-25 11:45:36 │    100 │      0 │ Info     │
 5. │ 2024-06-25 11:46:36 │    100 │      0 │ Info     │
 6. │ 2024-06-25 11:47:36 │    100 │     14 │ Info     │
 7. │ 2024-06-25 11:48:36 │    100 │     13 │ Warning  │
 8. │ 2024-06-25 11:49:36 │    100 │     12 │ Info     │
 9. │ 2024-06-25 11:50:36 │    100 │     11 │ Critical │
10. │ 2024-06-25 11:51:36 │    100 │     10 │ Critical │
    └─────────────────────┴────────┴────────┴──────────┘
The value1 column has been removed from the data part directory in the file system. You can check this as follows.
Obtain the path to the directory with files of the active data part (all_1_1_3) from the system.parts table:
SELECT name, active, path FROM system.parts WHERE table = 'column_ttl_test';
   ┌─name──────┬─active─┬─path──────────────────────────────────────────────────────────────────────────┐
1. │ all_1_1_0 │      0 │ /var/lib/clickhouse/store/62a/62a16a55-c91c-4c6b-9386-f4347ffac137/all_1_1_0/ │
2. │ all_1_1_1 │      0 │ /var/lib/clickhouse/store/62a/62a16a55-c91c-4c6b-9386-f4347ffac137/all_1_1_1/ │
3. │ all_1_1_2 │      0 │ /var/lib/clickhouse/store/62a/62a16a55-c91c-4c6b-9386-f4347ffac137/all_1_1_2/ │
4. │ all_1_1_3 │      1 │ /var/lib/clickhouse/store/62a/62a16a55-c91c-4c6b-9386-f4347ffac137/all_1_1_3/ │
   └───────────┴────────┴────────────────────────────────────────────────────────────────────────────────┘
Make sure that there are no files with data of the value1 column (value1.bin and value1.cmrk2) in this directory:
$ sudo ls /var/lib/clickhouse/store/62a/62a16a55-c91c-4c6b-9386-f4347ffac137/all_1_1_3/
alert.bin    checksums.txt  count.txt                      metadata_version.txt  serialization.json  time.cmrk2  value2.bin
alert.cmrk2  columns.txt    default_compression_codec.txt  primary.cidx          time.bin            ttl.txt     value2.cmrk2
The columns.txt file also has no information about the value1 column:
$ sudo cat /var/lib/clickhouse/store/62a/62a16a55-c91c-4c6b-9386-f4347ffac137/all_1_1_3/columns.txt
columns format version: 1
3 columns:
`time` DateTime
`value2` Int32
`alert` String
Use the ALTER TABLE … MODIFY COLUMN … TTL query to change the TTL for the value2 column of the table created in the previous example:
ALTER TABLE column_ttl_test MODIFY COLUMN value2 Int TTL time + INTERVAL 1 HOUR;
Check the current TTL settings with the SHOW CREATE TABLE query:
SHOW CREATE TABLE column_ttl_test;
As you can see, the new TTL rule is applied to the value2 column:
CREATE TABLE default.column_ttl_test
(
    `time` DateTime,
    `value1` Int32 DEFAULT 100 TTL time + toIntervalMinute(5),
    `value2` Int32 TTL time + toIntervalHour(1),
    `alert` String
)
ENGINE = MergeTree
ORDER BY time
SETTINGS min_rows_for_wide_part = 0, min_bytes_for_wide_part = 0, index_granularity = 8192