Row update & deletion

Although ADQM is not designed to update and delete data, it supports the ability to execute mutations — the ALTER TABLE … UPDATE/DELETE operations to update/delete existing data. The scope of mutations is changes that affect a large number of data rows in a table. These operations are heavyweight and not suitable for frequent use (best to avoid them if possible).

Update data

To update data in a table, use the ALTER TABLE…​UPDATE query:

ALTER TABLE [<db_name>.]<table_name> [ON CLUSTER <cluster_name>] UPDATE <column_name> = <expression>[, ...] [IN PARTITION <partition_expr>] WHERE <filter_expression>;

In this query:

  • [<db_name>.]<table_name> — name of a table to be updated. If no database name is specified, the table is searched in the current database.

  • [ON CLUSTER <cluster_name>] — name of a cluster to run the query on all servers of that cluster. If the ON CLUSTER clause is not used, the query affects only the server where it is executed.

  • <column_name> = <expression> — command to update data values in a column. Values of the <column_name> column are replaced by the <expression> expression value in each row for which the <filter_expression> expression returns a non-null value.

     
    The <expression> expression value should be of the same data type as the <column_name> column, or it should be cast to the same type via the CAST operator.

    A single query can include multiple <column_name> = <expression> commands separated by commas. Modifying values of multiple columns with a single query can be more efficient than running multiple queries (one query for updating each column).

    Updating columns that are used for calculating the primary key or sorting key is not supported.

  • [IN PARTITION <partition_expr>] — partition that contains data to be updated. The IN PARTITION clause helps speed up UPDATE operations when you need to perform targeted updates for a table divided into many partitions.

  • <filter_expression> — expression that returns a value of the UInt8 type for each row of the table to determine in which rows the data should be updated.

Example

 
Initial table:

--a---b----c---------date--
│ 1 │ a │ 10 │ 2022-12-10 │
│ 2 │ b │ 20 │ 2022-12-11 │
│ 3 │ c │ 30 │ 2022-12-12 │
│ 4 │ d │ 40 │ 2022-12-13 │
│ 5 │ e │ 50 │ 2022-12-14 │
│ 6 │ f │ 60 │ 2022-12-15 │
---------------------------

The following query replaces values of the b and c columns in rows where the date column’s value meets the specified condition:

ALTER TABLE test_table UPDATE b = repeat(b, 3), c = c/10 WHERE date > '2022-12-13';

The result is:

--a---b------c---------date--
│ 1 │ a   │ 10 │ 2022-12-10 │
│ 2 │ b   │ 20 │ 2022-12-11 │
│ 3 │ c   │ 30 │ 2022-12-12 │
│ 4 │ d   │ 40 │ 2022-12-13 │
│ 5 │ eee │  5 │ 2022-12-14 │
│ 6 │ fff │  6 │ 2022-12-15 │
-----------------------------

For replicated tables, ALTER TABLE…​UPDATE queries are executed asynchronously. If necessary, you can use the mutations_sync setting to synchronize data updates on replicas.

Delete data

To delete data rows from a table, you can use the following queries:

In most cases, DELETE FROM is faster than ALTER TABLE…​DELETE (this operation requires data to be merged before it is deleted). DELETE FROM is a lightweight operation that is effective for deleting a small number of data rows. For some scenarios (for example, for infrequent bulk deletes), ALTER TABLE…​DELETE may be more suitable.

ALTER TABLE…​DELETE

Query syntax:

ALTER TABLE [<db_name>.]<table_name> [ON CLUSTER <cluster_name>] DELETE [IN PARTITION <partition_expr>] WHERE <filter_expression>;

The query removes data rows for which the <filter_expression> expression of the UInt8 type returns a non-zero value. To remove rows only from an individual partition, specify the partition using the IN PARTITION expression.

Example

 
Initial table:

--a---b------c---------date--
│ 1 │ a   │ 10 │ 2022-12-10 │
│ 2 │ b   │ 20 │ 2022-12-11 │
│ 3 │ c   │ 30 │ 2022-12-12 │
│ 4 │ d   │ 40 │ 2022-12-13 │
│ 5 │ eee │  5 │ 2022-12-14 │
│ 6 │ fff │  6 │ 2022-12-15 │
-----------------------------

The following query removes rows in which values of the date and a columns satisfy the specified conditions:

ALTER TABLE test_table DELETE WHERE date < '2022-12-13' AND a = 2;

The result is:

--a---b------c---------date--
│ 1 │ a   │ 10 │ 2022-12-10 │
│ 3 │ c   │ 30 │ 2022-12-12 │
│ 4 │ d   │ 40 │ 2022-12-13 │
│ 5 │ eee │  5 │ 2022-12-14 │
│ 6 │ fff │  6 │ 2022-12-15 │
-----------------------------

For replicated tables, ALTER TABLE…​DELETE queries are asynchronous. If you need to synchronize the delete operation on replicas, change the mutations_sync parameter value.

DELETE FROM

Query syntax:

DELETE FROM [<db_name>.]<table_name> [WHERE <filter_expression>];

In the <table_name> table, the query removes rows for which the <filter_expression> expression of the UInt8 type returns a non-zero value. Rows to be deleted are immediately marked as deleted and automatically excluded from all subsequent requests. Data deletion is performed asynchronously in the background (the mutations_sync setting controls the synchronicity of query processing).

To use the DELETE FROM query, enable the allow_experimental_lightweight_delete setting:

SET allow_experimental_lightweight_delete = true;
Example

 
Initial table:

--a---b-------c---------date--
│ 1 │ a    │ 10 │ 2022-12-10 │
│ 3 │ c    │ 30 │ 2022-12-12 │
│ 4 │ d    │ 40 │ 2022-12-13 │
│ 5 │ eee  │  5 │ 2022-12-14 │
│ 6 │ fff  │  6 │ 2022-12-15 │
│ 7 │ aabb │ 10 │ 2022-12-16 │
│ 8 │ bbaa │ 20 │ 2022-12-17 │
------------------------------

The following query removes all rows from the test_table table where the b column value includes the aa substring:

DELETE FROM test_table WHERE b LIKE '%aa%';

The result is:

--a---b------c---------date--
│ 1 │ a   │ 10 │ 2022-12-10 │
│ 3 │ c   │ 30 │ 2022-12-12 │
│ 4 │ d   │ 40 │ 2022-12-13 │
│ 5 │ eee │  5 │ 2022-12-14 │
│ 6 │ fff │  6 │ 2022-12-15 │
-----------------------------
Found a mistake? Seleсt text and press Ctrl+Enter to report it