JOIN clause

Overview

The JOIN clause combines columns from two tables into a new table. A set of data rows in a resulting table depends on the JOIN type and the specified join conditions.

The general JOIN syntax is:

FROM <left_table> <JOIN_type> <right_table> <join_condition>

where:

The join_algorithm setting allows you to choose an algorithm for join execution. You can specify multiple algorithms (one of the available algorithms will be selected for each specific query based on its type/strictness and table engine) or allow the system to select and dynamically change the algorithm during query execution depending on resource availability and usage.

Create tables for test examples

 
To reproduce the examples provided below in this article and explore different types of JOIN, create two tables:

  • users

    CREATE TABLE users (user_id Int32, name String, role_id Int32) ENGINE = MergeTree ORDER BY user_id;
    INSERT INTO users VALUES (1, 'john', 10), (1, 'john', 20), (1, 'john', 30), (2, 'mary', 30), (2, 'mary', 40), (3, 'andrew', 40), (4, 'harry', 70), (5, 'ann', 50);
    ┌─user_id─┬─name───┬─role_id─┐
    │       1 │ john   │      10 │
    │       2 │ mike   │      20 │
    │       3 │ tom    │      30 │
    │       4 │ mary   │      30 │
    │       5 │ ada    │      40 │
    │       6 │ andrew │      40 │
    │       7 │ harry  │      70 │
    │       8 │ ann    │      50 │
    └─────────┴────────┴─────────┘
  • roles

    CREATE TABLE roles (id Int32, title String) ENGINE = MergeTree ORDER BY id;
    INSERT INTO roles VALUES (10, 'admin'), (20, 'owner'), (30, 'author'), (40, 'reviewer'), (50, 'editor'), (60, 'view only');
    ┌─id─┬─title──────┐
    │ 10 │ admin      │
    │ 20 │ owner      │
    │ 30 │ author     │
    │ 40 │ reviewer   │
    │ 50 │ editor     │
    │ 60 │ view only  │
    └────┴────────────┘

The users table’s role_id field is a key that refers to the id field in the roles table and serves for defining relationships between tables.

JOIN types

INNER JOIN

For each row in the left table, INNER JOIN checks whether the right table contains a row that satisfies the JOIN condition. If a corresponding row is found, a new row with columns from both tables is created and added to the result set. If there are more than one match for a row, then all matches are added to the resulting table (it means that the Cartesian product is produced for rows that match on the join condition).

The INNER keyword can be omitted in a query.

Example
SELECT users.name AS user, roles.title AS role FROM users INNER JOIN roles ON users.role_id=roles.id;

The query joins the users and roles tables and returns combined rows matching on identifiers of user roles. The harry user is not included in the result because the identifier of a role assigned to him was not found in the roles table. The result also does not include the view only role since it is not assigned to any user.

┌─user───┬─role─────┐
│ john   │ admin    │
│ mike   │ owner    │
│ tom    │ author   │
│ mary   │ author   │
│ ada    │ reviewer │
│ andrew │ reviewer │
│ ann    │ editor   │
└────────┴──────────┘

OUTER JOIN

LEFT/RIGHT/FULL OUTER JOIN queries extend INNER JOIN with rows from source tables that do not match the specified join condition. The OUTER keyword can be omitted.

  • The LEFT OUTER JOIN clause performs the INNER JOIN operation first. Then, each row of the left table, which is not included in the INNER JOIN result, is added to the LEFT OUTER JOIN result with default values for the corresponding data types in columns of the right table. The joined table always has at least one row for each row of the left table.

    Example
    SELECT users.name AS user, roles.title AS role FROM users LEFT OUTER JOIN roles ON users.role_id=roles.id;

    The result contains role titles for all users from the users table, and also includes a row for the harry user whose role was not defined in the roles table. In this row, the roles.title column (with the role alias) contains an empty string — the default value for the String type:

    ┌─user───┬─role─────┐
    │ john   │ admin    │
    │ mike   │ owner    │
    │ tom    │ author   │
    │ mary   │ author   │
    │ ada    │ reviewer │
    │ andrew │ reviewer │
    │ harry  │          │
    │ ann    │ editor   │
    └────────┴──────────┘
  • RIGHT OUTER JOIN works in a similar way — performs INNER JOIN first, and then adds all non-matching rows from the right table with default values in columns of the left table to the joined table, so that the joined table has at least one row for each row of the right table.

    Example
    SELECT users.name AS user, roles.title AS role FROM users RIGHT OUTER JOIN roles ON users.role_id=roles.id;

    The query returns all matches from the users and roles tables on role identifiers, as well as a row for the view only role that is not assigned to any user (with an empty string in the user column):

    ┌─user───┬─role─────┐
    │ john   │ admin    │
    │ mike   │ owner    │
    │ tom    │ author   │
    │ mary   │ author   │
    │ ada    │ reviewer │
    │ andrew │ reviewer │
    │ ann    │ editor   │
    └────────┴──────────┘
    ┌─user─┬─role──────┐
    │      │ view only │
    └──────┴───────────┘
  • FULL OUTER JOIN combines LEFT OUTER JOIN and RIGHT OUTER JOIN — it returns the INNER JOIN result and all non-matching rows from both tables with default values in empty cells.

    Example
    SELECT users.name AS user, roles.title AS role FROM users FULL OUTER JOIN roles ON users.role_id=roles.id;

    The result includes all users from the users table with their roles (including the harry user whose role is not defined in the roles table), as well as a role from the roles table that is not assigned to any user:

    ┌─user───┬─role─────┐
    │ john   │ admin    │
    │ mike   │ owner    │
    │ tom    │ author   │
    │ mary   │ author   │
    │ ada    │ reviewer │
    │ andrew │ reviewer │
    │ harry  │          │
    │ ann    │ editor   │
    └────────┴──────────┘
    ┌─user─┬─role──────┐
    │      │ view only │
    └──────┴───────────┘

To use NULL values in empty cells instead of default values (as in standard SQL), change the join_use_nulls setting value.

CROSS JOIN

The CROSS JOIN query combines each row of the left table with each row of the right table (creates the Cartesian product). The column set of the joined table includes all columns of the left table followed by all columns of the right table. If tables have N and M rows respectively, a joined table has N * M rows. CROSS JOIN is equivalent to FROM <left_table> INNER JOIN <right_table> ON TRUE or FROM <left_table>, <right_table>.

Example
SELECT * FROM users CROSS JOIN roles;

The result contains all possible combinations of rows from the users and roles tables:

┌─user_id─┬─name───┬─role_id─┬─id─┬─title──────┐
│       1 │ john   │      10 │ 10 │ admin      │
│       1 │ john   │      10 │ 20 │ owner      │
│       1 │ john   │      10 │ 30 │ author     │
│       1 │ john   │      10 │ 40 │ reviewer   │
│       1 │ john   │      10 │ 50 │ editor     │
│       1 │ john   │      10 │ 60 │ view only  │
│       2 │ mike   │      20 │ 10 │ admin      │
│       2 │ mike   │      20 │ 20 │ owner      │
│       2 │ mike   │      20 │ 30 │ author     │
│       2 │ mike   │      20 │ 40 │ reviewer   │
│       2 │ mike   │      20 │ 50 │ editor     │
│       2 │ mike   │      20 │ 60 │ view only  │
│       3 │ tom    │      30 │ 10 │ admin      │
│       3 │ tom    │      30 │ 20 │ owner      │
│       3 │ tom    │      30 │ 30 │ author     │
│       3 │ tom    │      30 │ 40 │ reviewer   │
│       3 │ tom    │      30 │ 50 │ editor     │
│       3 │ tom    │      30 │ 60 │ view only  │
│       4 │ mary   │      30 │ 10 │ admin      │
│       4 │ mary   │      30 │ 20 │ owner      │
│       4 │ mary   │      30 │ 30 │ author     │
│       4 │ mary   │      30 │ 40 │ reviewer   │
│       4 │ mary   │      30 │ 50 │ editor     │
│       4 │ mary   │      30 │ 60 │ view only  │
│       5 │ ada    │      40 │ 10 │ admin      │
│       5 │ ada    │      40 │ 20 │ owner      │
│       5 │ ada    │      40 │ 30 │ author     │
│       5 │ ada    │      40 │ 40 │ reviewer   │
│       5 │ ada    │      40 │ 50 │ editor     │
│       5 │ ada    │      40 │ 60 │ view only  │
│       6 │ andrew │      40 │ 10 │ admin      │
│       6 │ andrew │      40 │ 20 │ owner      │
│       6 │ andrew │      40 │ 30 │ author     │
│       6 │ andrew │      40 │ 40 │ reviewer   │
│       6 │ andrew │      40 │ 50 │ editor     │
│       6 │ andrew │      40 │ 60 │ view only  │
│       7 │ harry  │      70 │ 10 │ admin      │
│       7 │ harry  │      70 │ 20 │ owner      │
│       7 │ harry  │      70 │ 30 │ author     │
│       7 │ harry  │      70 │ 40 │ reviewer   │
│       7 │ harry  │      70 │ 50 │ editor     │
│       7 │ harry  │      70 │ 60 │ view only  │
│       8 │ ann    │      50 │ 10 │ admin      │
│       8 │ ann    │      50 │ 20 │ owner      │
│       8 │ ann    │      50 │ 30 │ author     │
│       8 │ ann    │      50 │ 40 │ reviewer   │
│       8 │ ann    │      50 │ 50 │ editor     │
│       8 │ ann    │      50 │ 60 │ view only  │
└─────────┴────────┴─────────┴────┴────────────┘

LEFT/RIGHT SEMI JOIN

The LEFT SEMI JOIN query returns column values for each row from the left table that has at least one match on the join condition in the right table. If more than one match is found for a row, only the first found match is returned (Cartesian product is disabled).

RIGHT SEMI JOIN works similarly — it returns column values for all rows from the right table that have at least one match on the join condition in the left table. If multiple matches are found for a row, only the first found match is added to the result.

Example (RIGHT SEMI JOIN)
SELECT roles.title AS role, users.name AS user FROM users RIGHT SEMI JOIN roles ON users.role_id=roles.id;

For each role from the right roles table, the RIGHT SEMI JOIN query returns only one user that has been assigned the role. A view only role that has no match in the users table is not included in the resulting table:

┌─role─────┬─user─┐
│ admin    │ john │
│ owner    │ mike │
│ author   │ tom  │
│ reviewer │ ada  │
│ editor   │ ann  │
└──────────┴──────┘

LEFT/RIGHT ANTI JOIN

LEFT ANTI JOIN returns column values for all rows of the left table that do not match any rows in the right table on the specified join condition. Similarly, RIGHT ANTI JOIN returns column values for the right table’s rows that have no matches in the left table.

Example (LEFT ANTI JOIN)
SELECT users.name AS user, roles.title AS role FROM users LEFT ANTI JOIN roles ON users.role_id=roles.id;

The LEFT ANTI JOIN query returns a data row only for the harry user, whose role is not found in the roles table on the specified identifier:

┌─user──┬─role─┐
│ harry │      │
└───────┴──────┘

LEFT/RIGHT/INNER ANY JOIN

The LEFT ANY JOIN query returns column values of each row from the left table, combining them either with column values of matching rows from the right table, or with default values in the right table columns (if no matches were found). If a row of the left table has more than one match in the right table, the query returns the first found match only. RIGHT ANY JOIN works similarly for the right table’s rows.

In other words, the ANY keyword in queries indicates the strictness of the JOIN operation - the Cartesian product for rows of source tables matching on the join condition is not created and only one of the found matches for each row is included in the result. The default strictness is ALL — all matching rows from source tables are joined and written to the resulting table (see the LEFT/RIGHT OUTER JOIN and INNER JOIN query examples above). So, LEFT/RIGHT ANY JOIN joins tables as LEFT/RIGHT OUTER JOIN, and INNER ANY JOIN as INNER JOIN, but with disabled Cartesian product.

The default strictness value can be adjusted via the join_default_strictness setting.

Example (RIGHT ANY JOIN)
SELECT users.name AS user, roles.title AS role FROM users RIGHT ANY JOIN roles ON users.role_id=roles.id;

For each role from the right roles table, the RIGHT ANY JOIN query returns only one user (the first one found) to which this role is assigned. The result also includes the view only role that is not assigned to any user:

┌─user─┬─role─────┐
│ john │ admin    │
│ mike │ owner    │
│ tom  │ author   │
│ ada  │ reviewer │
│ ann  │ editor   │
└──────┴──────────┘
┌─user─┬─role──────┐
│      │ view only │
└──────┴───────────┘

ASOF JOIN

The ASOF JOIN and LEFT ASOF JOIN queries are used when it is necessary to join rows that do not have an exact match. If a row from the left table does not have an exact match with any row in the right table, then the closest matching row from the right table is used as a match instead. The closest match condition should be based on a special column (of the Int, UInt, Float, Date, DateTime, or Decimal type) that both tables contain.

General query syntax:

  • ASOF JOIN …​ ON:

    <left_table> [LEFT] ASOF JOIN <right_table> ON <exact_match_condition> AND <closest_match_condition>

    where:

    • <exact_match_condition> — condition for joining by equality (multiple conditions can be set);

    • <closest_match_condition> — condition for joining by the closest match (one condition should be set). The closest match condition supports the >, >=, <, and <= operators.

  • ASOF JOIN …​ USING:

    <left_table> [LEFT] ASOF JOIN <right_table> USING <exact_match_column1>, ... <exact_match_columnN>, <closest_match_column>

    where

    • <exact_match_column1>, …​ <exact_match_columnN> — columns for joining by equality of values;

    • <closest_match_column> — column (should be the last one in the USING list) for joining by the closest match. A join condition is <left_table>.<closest_match_column> >= <right_table>.<closest_match_column>.

Example (LEFT ASOF JOIN)

 
Create two tables for test examples:

  • table1

    CREATE TABLE table1 (id Int32, time DateTime, text String) ENGINE = MergeTree ORDER BY id;
    INSERT INTO table1 VALUES
        (50, '2023-03-10 14:55:00', 'text1_0'),
        (50, '2023-03-10 15:00:00', 'text1_1'),
        (50, '2023-03-10 15:03:00', 'text1_2'),
        (50, '2023-03-10 15:10:00', 'text1_3'),
        (50, '2023-03-10 15:14:00', 'text1_4');
    ┌─id─┬────────────────time─┬─text────┐
    │ 50 │ 2023-03-10 14:55:00 │ text1_0 │
    │ 50 │ 2023-03-10 15:00:00 │ text1_1 │
    │ 50 │ 2023-03-10 15:03:00 │ text1_2 │
    │ 50 │ 2023-03-10 15:10:00 │ text1_3 │
    │ 50 │ 2023-03-10 15:14:00 │ text1_4 │
    └────┴─────────────────────┴─────────┘
  • table2

    CREATE TABLE table2 (id Int32, time DateTime, text String) ENGINE = MergeTree ORDER BY id;
    INSERT INTO table2 VALUES
        (50, '2023-03-10 15:00:00', 'text2_1'),
        (50, '2023-03-10 15:07:00', 'text2_2'),
        (50, '2023-03-10 15:11:00', 'text2_3'),
        (50, '2023-03-10 15:20:00', 'text2_4');
    ┌─id─┬────────────────time─┬─text────┐
    │ 50 │ 2023-03-10 15:00:00 │ text2_1 │
    │ 50 │ 2023-03-10 15:07:00 │ text2_2 │
    │ 50 │ 2023-03-10 15:11:00 │ text2_3 │
    │ 50 │ 2023-03-10 15:20:00 │ text2_4 │
    └────┴─────────────────────┴─────────┘

Run the ASOF LEFT JOIN query:

SELECT
    table1.text AS table1_text,
    table1.time AS table1_time,
    table2.text AS table2_text,
    table2.time AS table2_time
FROM table1
ASOF LEFT JOIN table2 ON (table1.id = table2.id) AND (table1.time >= table2.time);

In the resulting table, each row of table1 is combined with a row from table2 with the closest value in the time column according to the table1.time >= table2.time condition. For the first row (text1_0), no matching row was found in table2, so the table2_text and table2_time columns contain default values corresponding to column data types:

┌─table1_text─┬─────────table1_time─┬─table2_text─┬─────────table2_time─┐
│ text1_0     │ 2023-03-10 14:55:00 │             │ 1970-01-01 00:00:00 │
│ text1_1     │ 2023-03-10 15:00:00 │ text2_1     │ 2023-03-10 15:00:00 │
│ text1_2     │ 2023-03-10 15:03:00 │ text2_1     │ 2023-03-10 15:00:00 │
│ text1_3     │ 2023-03-10 15:10:00 │ text2_2     │ 2023-03-10 15:07:00 │
│ text1_4     │ 2023-03-10 15:14:00 │ text2_3     │ 2023-03-10 15:11:00 │
└─────────────┴─────────────────────┴─────────────┴─────────────────────┘

Run the ASOF LEFT JOIN query with the USING clause:

SELECT
    table1.text AS table1_text,
    table1.time AS table1_time,
    table2.text AS table2_text,
    table2.time AS table2_time
FROM table1
ASOF LEFT JOIN table2 USING (id, time);

The resulting table is the same:

┌─table1_text─┬─────────table1_time─┬─table2_text─┬─────────table2_time─┐
│ text1_0     │ 2023-03-10 14:55:00 │             │ 1970-01-01 00:00:00 │
│ text1_1     │ 2023-03-10 15:00:00 │ text2_1     │ 2023-03-10 15:00:00 │
│ text1_2     │ 2023-03-10 15:03:00 │ text2_1     │ 2023-03-10 15:00:00 │
│ text1_3     │ 2023-03-10 15:10:00 │ text2_2     │ 2023-03-10 15:07:00 │
│ text1_4     │ 2023-03-10 15:14:00 │ text2_3     │ 2023-03-10 15:11:00 │
└─────────────┴─────────────────────┴─────────────┴─────────────────────┘

JOIN conditions

JOIN combines rows of two tables based on the specified columns — join keys. Use the ON or USING clause for the JOIN operator to set join keys and additional join conditions if required.

CROSS JOIN queries do not require join conditions.

ON

The ON clause returns a boolean value. A row from one table matches a row from another table if the ON expression returns true.

You can specify multiple conditions using the AND and/or OR operators — in this case, rows are joined if the entire compound condition is met. If a condition is not met, rows can still be included in the result depending on the JOIN type .

A condition that determines a join key should use the equality operator (=) to compare columns of the left and right tables. Other conditions may use other logical operators, and each individual condition should be specified either for the left or right table.

Example

 
The following query specifies two conditions for joining tables:

  • users.role_id=roles.id — join key;

  • roles.id>20 — additional condition for the roles table.

SELECT
    users.name AS user,
    roles.title AS role,
    roles.id AS role_id
FROM users
LEFT JOIN roles ON users.role_id = roles.id AND roles.id > 20;

The result:

┌─user───┬─role─────┬─role_id─┐
│ john   │          │       0 │
│ mike   │          │       0 │
│ tom    │ author   │      30 │
│ mary   │ author   │      30 │
│ ada    │ reviewer │      40 │
│ andrew │ reviewer │      40 │
│ harry  │          │       0 │
│ ann    │ editor   │      50 │
└────────┴──────────┴─────────┘

USING

You can utilize the USING clause instead of ON when columns in both tables have the same names. Pass a comma-separated list of column names as a parameter to the USING clause — the equality of these columns will be considered a condition for joining.

The result of the JOIN clause with USING contains a single column for each of the listed column pairs in the specified order, followed by all remaining columns from the left and right tables.

Example

 
Create two test tables:

  • test_table1

    CREATE TABLE test_table1 (num Int32, name String) ENGINE = MergeTree ORDER BY num;
    INSERT INTO test_table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
  • test_table2

    CREATE TABLE test_table2 (num Int32, value String) ENGINE = MergeTree ORDER BY num;
    INSERT INTO test_table2 VALUES (0, 'value1'), (1, 'value2'), (2, 'value3');
  test_table1        test_table2

┌─num─┬─name─┐     ┌─num─┬─value──┐
|   1 | a    |     │   0 │ value1 │
|   2 | b    |     │   1 │ value2 │
|   3 | c    |     │   2 │ value3 │
└─────┴──────┘     └─────┴────────┘

Execute the following query with the JOIN …​ USING clause to join the test_table2 and test_table2 tables on the num column:

SELECT * FROM test_table1 INNER JOIN test_table2 USING num;

The output:

┌─num─┬─name─┬─value──┐
│   1 │ a    │ value2 │
│   2 │ b    │ value3 │
└─────┴──────┴────────┘
Found a mistake? Seleсt text and press Ctrl+Enter to report it