JOIN clause
Overview
The JOIN clause combines columns from two tables into a new table. A set of data rows in a resulting table depends on the JOIN type and the specified join conditions.
The general JOIN syntax is:
FROM <left_table> <JOIN_type> <right_table> <join_condition>
where:
-
<JOIN_type>— JOIN type. ADQM supports all standard SQL JOIN types and additional specialized JOIN types that can be useful for various analytical queries and time-series analysis.
Standard SQL JOIN types:
Additional types of JOIN available in ADQM:
If the JOIN type is not specified explicitly,INNER JOINis used. -
<join_condition>— condition that determines which rows from two source tables match each other. A condition (or multiple conditions) should be specified asON <expr_list>orUSING <column_list>for allJOINtypes exceptCROSS JOIN— see the JOIN conditions section for details.
The join_algorithm setting allows you to choose an algorithm for join execution. You can specify multiple algorithms (one of the available algorithms will be selected for each specific query based on its type/strictness and table engine) or allow the system to select and dynamically change the algorithm during query execution depending on resource availability and usage.
To reproduce the examples provided below in this article and explore different types of JOIN, create two tables:
-
usersCREATE TABLE users (user_id Int32, name String, role_id Int32) ENGINE = MergeTree ORDER BY user_id;INSERT INTO users VALUES (1, 'john', 10), (1, 'john', 20), (1, 'john', 30), (2, 'mary', 30), (2, 'mary', 40), (3, 'andrew', 40), (4, 'harry', 70), (5, 'ann', 50);┌─user_id─┬─name───┬─role_id─┐ │ 1 │ john │ 10 │ │ 2 │ mike │ 20 │ │ 3 │ tom │ 30 │ │ 4 │ mary │ 30 │ │ 5 │ ada │ 40 │ │ 6 │ andrew │ 40 │ │ 7 │ harry │ 70 │ │ 8 │ ann │ 50 │ └─────────┴────────┴─────────┘
-
rolesCREATE TABLE roles (id Int32, title String) ENGINE = MergeTree ORDER BY id;INSERT INTO roles VALUES (10, 'admin'), (20, 'owner'), (30, 'author'), (40, 'reviewer'), (50, 'editor'), (60, 'view only');┌─id─┬─title──────┐ │ 10 │ admin │ │ 20 │ owner │ │ 30 │ author │ │ 40 │ reviewer │ │ 50 │ editor │ │ 60 │ view only │ └────┴────────────┘
The users table’s role_id field is a key that refers to the id field in the roles table and serves for defining relationships between tables.
JOIN types
INNER JOIN
For each row in the left table, INNER JOIN checks whether the right table contains a row that satisfies the JOIN condition. If a corresponding row is found, a new row with columns from both tables is created and added to the result set. If there are more than one match for a row, then all matches are added to the resulting table (it means that the Cartesian product is produced for rows that match on the join condition).
The INNER keyword can be omitted in a query.
SELECT users.name AS user, roles.title AS role FROM users INNER JOIN roles ON users.role_id=roles.id;
The query joins the users and roles tables and returns combined rows matching on identifiers of user roles. The harry user is not included in the result because the identifier of a role assigned to him was not found in the roles table. The result also does not include the view only role since it is not assigned to any user.
┌─user───┬─role─────┐ │ john │ admin │ │ mike │ owner │ │ tom │ author │ │ mary │ author │ │ ada │ reviewer │ │ andrew │ reviewer │ │ ann │ editor │ └────────┴──────────┘
OUTER JOIN
LEFT/RIGHT/FULL OUTER JOIN queries extend INNER JOIN with rows from source tables that do not match the specified join condition. The OUTER keyword can be omitted.
-
The
LEFT OUTER JOINclause performs theINNER JOINoperation first. Then, each row of the left table, which is not included in theINNER JOINresult, is added to theLEFT OUTER JOINresult with default values for the corresponding data types in columns of the right table. The joined table always has at least one row for each row of the left table.ExampleSELECT users.name AS user, roles.title AS role FROM users LEFT OUTER JOIN roles ON users.role_id=roles.id;The result contains role titles for all users from the
userstable, and also includes a row for theharryuser whose role was not defined in therolestable. In this row, theroles.titlecolumn (with therolealias) contains an empty string — the default value for theStringtype:┌─user───┬─role─────┐ │ john │ admin │ │ mike │ owner │ │ tom │ author │ │ mary │ author │ │ ada │ reviewer │ │ andrew │ reviewer │ │ harry │ │ │ ann │ editor │ └────────┴──────────┘
-
RIGHT OUTER JOINworks in a similar way — performsINNER JOINfirst, and then adds all non-matching rows from the right table with default values in columns of the left table to the joined table, so that the joined table has at least one row for each row of the right table.ExampleSELECT users.name AS user, roles.title AS role FROM users RIGHT OUTER JOIN roles ON users.role_id=roles.id;The query returns all matches from the
usersandrolestables on role identifiers, as well as a row for theview onlyrole that is not assigned to any user (with an empty string in theusercolumn):┌─user───┬─role─────┐ │ john │ admin │ │ mike │ owner │ │ tom │ author │ │ mary │ author │ │ ada │ reviewer │ │ andrew │ reviewer │ │ ann │ editor │ └────────┴──────────┘ ┌─user─┬─role──────┐ │ │ view only │ └──────┴───────────┘
-
FULL OUTER JOINcombinesLEFT OUTER JOINandRIGHT OUTER JOIN— it returns theINNER JOINresult and all non-matching rows from both tables with default values in empty cells.ExampleSELECT users.name AS user, roles.title AS role FROM users FULL OUTER JOIN roles ON users.role_id=roles.id;The result includes all users from the
userstable with their roles (including theharryuser whose role is not defined in therolestable), as well as a role from therolestable that is not assigned to any user:┌─user───┬─role─────┐ │ john │ admin │ │ mike │ owner │ │ tom │ author │ │ mary │ author │ │ ada │ reviewer │ │ andrew │ reviewer │ │ harry │ │ │ ann │ editor │ └────────┴──────────┘ ┌─user─┬─role──────┐ │ │ view only │ └──────┴───────────┘
To use NULL values in empty cells instead of default values (as in standard SQL), change the join_use_nulls setting value.
CROSS JOIN
The CROSS JOIN query combines each row of the left table with each row of the right table (creates the Cartesian product). The column set of the joined table includes all columns of the left table followed by all columns of the right table. If tables have N and M rows respectively, a joined table has N * M rows. CROSS JOIN is equivalent to FROM <left_table> INNER JOIN <right_table> ON TRUE or FROM <left_table>, <right_table>.
SELECT * FROM users CROSS JOIN roles;
The result contains all possible combinations of rows from the users and roles tables:
┌─user_id─┬─name───┬─role_id─┬─id─┬─title──────┐ │ 1 │ john │ 10 │ 10 │ admin │ │ 1 │ john │ 10 │ 20 │ owner │ │ 1 │ john │ 10 │ 30 │ author │ │ 1 │ john │ 10 │ 40 │ reviewer │ │ 1 │ john │ 10 │ 50 │ editor │ │ 1 │ john │ 10 │ 60 │ view only │ │ 2 │ mike │ 20 │ 10 │ admin │ │ 2 │ mike │ 20 │ 20 │ owner │ │ 2 │ mike │ 20 │ 30 │ author │ │ 2 │ mike │ 20 │ 40 │ reviewer │ │ 2 │ mike │ 20 │ 50 │ editor │ │ 2 │ mike │ 20 │ 60 │ view only │ │ 3 │ tom │ 30 │ 10 │ admin │ │ 3 │ tom │ 30 │ 20 │ owner │ │ 3 │ tom │ 30 │ 30 │ author │ │ 3 │ tom │ 30 │ 40 │ reviewer │ │ 3 │ tom │ 30 │ 50 │ editor │ │ 3 │ tom │ 30 │ 60 │ view only │ │ 4 │ mary │ 30 │ 10 │ admin │ │ 4 │ mary │ 30 │ 20 │ owner │ │ 4 │ mary │ 30 │ 30 │ author │ │ 4 │ mary │ 30 │ 40 │ reviewer │ │ 4 │ mary │ 30 │ 50 │ editor │ │ 4 │ mary │ 30 │ 60 │ view only │ │ 5 │ ada │ 40 │ 10 │ admin │ │ 5 │ ada │ 40 │ 20 │ owner │ │ 5 │ ada │ 40 │ 30 │ author │ │ 5 │ ada │ 40 │ 40 │ reviewer │ │ 5 │ ada │ 40 │ 50 │ editor │ │ 5 │ ada │ 40 │ 60 │ view only │ │ 6 │ andrew │ 40 │ 10 │ admin │ │ 6 │ andrew │ 40 │ 20 │ owner │ │ 6 │ andrew │ 40 │ 30 │ author │ │ 6 │ andrew │ 40 │ 40 │ reviewer │ │ 6 │ andrew │ 40 │ 50 │ editor │ │ 6 │ andrew │ 40 │ 60 │ view only │ │ 7 │ harry │ 70 │ 10 │ admin │ │ 7 │ harry │ 70 │ 20 │ owner │ │ 7 │ harry │ 70 │ 30 │ author │ │ 7 │ harry │ 70 │ 40 │ reviewer │ │ 7 │ harry │ 70 │ 50 │ editor │ │ 7 │ harry │ 70 │ 60 │ view only │ │ 8 │ ann │ 50 │ 10 │ admin │ │ 8 │ ann │ 50 │ 20 │ owner │ │ 8 │ ann │ 50 │ 30 │ author │ │ 8 │ ann │ 50 │ 40 │ reviewer │ │ 8 │ ann │ 50 │ 50 │ editor │ │ 8 │ ann │ 50 │ 60 │ view only │ └─────────┴────────┴─────────┴────┴────────────┘
LEFT/RIGHT SEMI JOIN
The LEFT SEMI JOIN query returns column values for each row from the left table that has at least one match on the join condition in the right table. If more than one match is found for a row, only the first found match is returned (Cartesian product is disabled).
RIGHT SEMI JOIN works similarly — it returns column values for all rows from the right table that have at least one match on the join condition in the left table. If multiple matches are found for a row, only the first found match is added to the result.
SELECT roles.title AS role, users.name AS user FROM users RIGHT SEMI JOIN roles ON users.role_id=roles.id;
For each role from the right roles table, the RIGHT SEMI JOIN query returns only one user that has been assigned the role. A view only role that has no match in the users table is not included in the resulting table:
┌─role─────┬─user─┐ │ admin │ john │ │ owner │ mike │ │ author │ tom │ │ reviewer │ ada │ │ editor │ ann │ └──────────┴──────┘
LEFT/RIGHT ANTI JOIN
LEFT ANTI JOIN returns column values for all rows of the left table that do not match any rows in the right table on the specified join condition. Similarly, RIGHT ANTI JOIN returns column values for the right table’s rows that have no matches in the left table.
SELECT users.name AS user, roles.title AS role FROM users LEFT ANTI JOIN roles ON users.role_id=roles.id;
The LEFT ANTI JOIN query returns a data row only for the harry user, whose role is not found in the roles table on the specified identifier:
┌─user──┬─role─┐ │ harry │ │ └───────┴──────┘
LEFT/RIGHT/INNER ANY JOIN
The LEFT ANY JOIN query returns column values of each row from the left table, combining them either with column values of matching rows from the right table, or with default values in the right table columns (if no matches were found). If a row of the left table has more than one match in the right table, the query returns the first found match only. RIGHT ANY JOIN works similarly for the right table’s rows.
In other words, the ANY keyword in queries indicates the strictness of the JOIN operation - the Cartesian product for rows of source tables matching on the join condition is not created and only one of the found matches for each row is included in the result. The default strictness is ALL — all matching rows from source tables are joined and written to the resulting table (see the LEFT/RIGHT OUTER JOIN and INNER JOIN query examples above). So, LEFT/RIGHT ANY JOIN joins tables as LEFT/RIGHT OUTER JOIN, and INNER ANY JOIN as INNER JOIN, but with disabled Cartesian product.
The default strictness value can be adjusted via the join_default_strictness setting.
SELECT users.name AS user, roles.title AS role FROM users RIGHT ANY JOIN roles ON users.role_id=roles.id;
For each role from the right roles table, the RIGHT ANY JOIN query returns only one user (the first one found) to which this role is assigned. The result also includes the view only role that is not assigned to any user:
┌─user─┬─role─────┐ │ john │ admin │ │ mike │ owner │ │ tom │ author │ │ ada │ reviewer │ │ ann │ editor │ └──────┴──────────┘ ┌─user─┬─role──────┐ │ │ view only │ └──────┴───────────┘
ASOF JOIN
The ASOF JOIN and LEFT ASOF JOIN queries are used when it is necessary to join rows that do not have an exact match. If a row from the left table does not have an exact match with any row in the right table, then the closest matching row from the right table is used as a match instead. The closest match condition should be based on a special column (of the Int, UInt, Float, Date, DateTime, or Decimal type) that both tables contain.
General query syntax:
-
ASOF JOIN … ON:<left_table> [LEFT] ASOF JOIN <right_table> ON <exact_match_condition> AND <closest_match_condition>where:
-
<exact_match_condition>— condition for joining by equality (multiple conditions can be set); -
<closest_match_condition>— condition for joining by the closest match (one condition should be set). The closest match condition supports the>,>=,<, and<=operators.
-
-
ASOF JOIN … USING:<left_table> [LEFT] ASOF JOIN <right_table> USING <exact_match_column1>, ... <exact_match_columnN>, <closest_match_column>where
-
<exact_match_column1>, … <exact_match_columnN>— columns for joining by equality of values; -
<closest_match_column>— column (should be the last one in theUSINGlist) for joining by the closest match. A join condition is<left_table>.<closest_match_column> >= <right_table>.<closest_match_column>.
-
Create two tables for test examples:
-
table1CREATE TABLE table1 (id Int32, time DateTime, text String) ENGINE = MergeTree ORDER BY id;INSERT INTO table1 VALUES (50, '2023-03-10 14:55:00', 'text1_0'), (50, '2023-03-10 15:00:00', 'text1_1'), (50, '2023-03-10 15:03:00', 'text1_2'), (50, '2023-03-10 15:10:00', 'text1_3'), (50, '2023-03-10 15:14:00', 'text1_4');┌─id─┬────────────────time─┬─text────┐ │ 50 │ 2023-03-10 14:55:00 │ text1_0 │ │ 50 │ 2023-03-10 15:00:00 │ text1_1 │ │ 50 │ 2023-03-10 15:03:00 │ text1_2 │ │ 50 │ 2023-03-10 15:10:00 │ text1_3 │ │ 50 │ 2023-03-10 15:14:00 │ text1_4 │ └────┴─────────────────────┴─────────┘
-
table2CREATE TABLE table2 (id Int32, time DateTime, text String) ENGINE = MergeTree ORDER BY id;INSERT INTO table2 VALUES (50, '2023-03-10 15:00:00', 'text2_1'), (50, '2023-03-10 15:07:00', 'text2_2'), (50, '2023-03-10 15:11:00', 'text2_3'), (50, '2023-03-10 15:20:00', 'text2_4');┌─id─┬────────────────time─┬─text────┐ │ 50 │ 2023-03-10 15:00:00 │ text2_1 │ │ 50 │ 2023-03-10 15:07:00 │ text2_2 │ │ 50 │ 2023-03-10 15:11:00 │ text2_3 │ │ 50 │ 2023-03-10 15:20:00 │ text2_4 │ └────┴─────────────────────┴─────────┘
Run the ASOF LEFT JOIN query:
SELECT
table1.text AS table1_text,
table1.time AS table1_time,
table2.text AS table2_text,
table2.time AS table2_time
FROM table1
ASOF LEFT JOIN table2 ON (table1.id = table2.id) AND (table1.time >= table2.time);
In the resulting table, each row of table1 is combined with a row from table2 with the closest value in the time column according to the table1.time >= table2.time condition. For the first row (text1_0), no matching row was found in table2, so the table2_text and table2_time columns contain default values corresponding to column data types:
┌─table1_text─┬─────────table1_time─┬─table2_text─┬─────────table2_time─┐ │ text1_0 │ 2023-03-10 14:55:00 │ │ 1970-01-01 00:00:00 │ │ text1_1 │ 2023-03-10 15:00:00 │ text2_1 │ 2023-03-10 15:00:00 │ │ text1_2 │ 2023-03-10 15:03:00 │ text2_1 │ 2023-03-10 15:00:00 │ │ text1_3 │ 2023-03-10 15:10:00 │ text2_2 │ 2023-03-10 15:07:00 │ │ text1_4 │ 2023-03-10 15:14:00 │ text2_3 │ 2023-03-10 15:11:00 │ └─────────────┴─────────────────────┴─────────────┴─────────────────────┘
Run the ASOF LEFT JOIN query with the USING clause:
SELECT
table1.text AS table1_text,
table1.time AS table1_time,
table2.text AS table2_text,
table2.time AS table2_time
FROM table1
ASOF LEFT JOIN table2 USING (id, time);
The resulting table is the same:
┌─table1_text─┬─────────table1_time─┬─table2_text─┬─────────table2_time─┐ │ text1_0 │ 2023-03-10 14:55:00 │ │ 1970-01-01 00:00:00 │ │ text1_1 │ 2023-03-10 15:00:00 │ text2_1 │ 2023-03-10 15:00:00 │ │ text1_2 │ 2023-03-10 15:03:00 │ text2_1 │ 2023-03-10 15:00:00 │ │ text1_3 │ 2023-03-10 15:10:00 │ text2_2 │ 2023-03-10 15:07:00 │ │ text1_4 │ 2023-03-10 15:14:00 │ text2_3 │ 2023-03-10 15:11:00 │ └─────────────┴─────────────────────┴─────────────┴─────────────────────┘
JOIN conditions
JOIN combines rows of two tables based on the specified columns — join keys. Use the ON or USING clause for the JOIN operator to set join keys and additional join conditions if required.
CROSS JOIN queries do not require join conditions.
ON
The ON clause returns a boolean value. A row from one table matches a row from another table if the ON expression returns true.
You can specify multiple conditions using the AND and/or OR operators — in this case, rows are joined if the entire compound condition is met. If a condition is not met, rows can still be included in the result depending on the JOIN type .
A condition that determines a join key should use the equality operator (=) to compare columns of the left and right tables. Other conditions may use other logical operators, and each individual condition should be specified either for the left or right table.
The following query specifies two conditions for joining tables:
-
users.role_id=roles.id— join key; -
roles.id>20— additional condition for therolestable.
SELECT
users.name AS user,
roles.title AS role,
roles.id AS role_id
FROM users
LEFT JOIN roles ON users.role_id = roles.id AND roles.id > 20;
The result:
┌─user───┬─role─────┬─role_id─┐ │ john │ │ 0 │ │ mike │ │ 0 │ │ tom │ author │ 30 │ │ mary │ author │ 30 │ │ ada │ reviewer │ 40 │ │ andrew │ reviewer │ 40 │ │ harry │ │ 0 │ │ ann │ editor │ 50 │ └────────┴──────────┴─────────┘
USING
You can utilize the USING clause instead of ON when columns in both tables have the same names. Pass a comma-separated list of column names as a parameter to the USING clause — the equality of these columns will be considered a condition for joining.
The result of the JOIN clause with USING contains a single column for each of the listed column pairs in the specified order, followed by all remaining columns from the left and right tables.
Create two test tables:
-
test_table1CREATE TABLE test_table1 (num Int32, name String) ENGINE = MergeTree ORDER BY num;INSERT INTO test_table1 VALUES (1, 'a'), (2, 'b'), (3, 'c'); -
test_table2CREATE TABLE test_table2 (num Int32, value String) ENGINE = MergeTree ORDER BY num;INSERT INTO test_table2 VALUES (0, 'value1'), (1, 'value2'), (2, 'value3');
test_table1 test_table2 ┌─num─┬─name─┐ ┌─num─┬─value──┐ | 1 | a | │ 0 │ value1 │ | 2 | b | │ 1 │ value2 │ | 3 | c | │ 2 │ value3 │ └─────┴──────┘ └─────┴────────┘
Execute the following query with the JOIN … USING clause to join the test_table2 and test_table2 tables on the num column:
SELECT * FROM test_table1 INNER JOIN test_table2 USING num;
The output:
┌─num─┬─name─┬─value──┐ │ 1 │ a │ value2 │ │ 2 │ b │ value3 │ └─────┴──────┴────────┘