Subqueries

Overview

A subquery is an SQL query nested within another (parent) query or subquery. A subquery is executed first, and then its results are passed to the parent query. Thus, subqueries can be useful when you need to use the results of one query as the input for another query. For example, you can use subqueries in SELECT queries: in the FROM and JOIN clauses to specify a source from which data should be read or in the WHERE clause to filter data.

Operations for which a subquery can be used depend on the type of data the subquery returns. The result of a subquery can be:

  • Scalar value — when a subquery returns a value of one column from one row or a single calculated value (for example, an aggregate function result — sum, average/minimum/maximum value of a column). Such a subquery can be used with the comparison operators >, <, =, and others (see Comparison Operators in the ClickHouse documentation), passed as an argument to a function, or used as a column value in the SELECT clause.

  • Table value — when a subquery returns multiple rows (possibly zero rows). In this case, a subquery can be used with operators such as IN, NOT IN, ANY, ALL, EXISTS to check the occurrence of a row from the parent query in the subquery results. Also, a table value from a subquery can be used in the FROM or JOIN clause to specify a source table.

NOTE
  • Syntactically, a subquery is a SELECT query enclosed in parentheses.

  • Aliases are not visible in subqueries and between subqueries.

Below are various examples of using subqueries in SELECT queries.

Create tables for test examples

 
To reproduce examples from the article, create the following tables first:

  • employees

    CREATE TABLE employees (
        id UInt16,
        full_name String,
        hire_date Date,
        position Enum('c developer' = 1,
                      'python developer' = 2,
                      'java developer' = 3,
                      'frontend developer' = 4,
                      'test automation engineer' = 5,
                      'manual testing engineer' = 6,
                      'load testing engineer' = 7,
                      'support engineer 1st line' = 8,
                      'support engineer 2nd line' = 9),
        department Enum('development' = 10, 'qa' = 20, 'tech support' = 30),
        location String,
        email String)
    ENGINE = MergeTree
    ORDER BY id;
    INSERT INTO employees VALUES
        (1, 'James Wilson', '2020-11-23', 1, 10, 'New York', 'jwilson@example.com'),
        (2, 'Mary Burton', '2023-05-06', 1, 10, 'Chicago', 'mburton@example.com'),
        (3, 'Patricia Robinson', '2021-10-11', 2, 10, 'Los Angeles', 'probinson@example.com'),
        (4, 'Robert Gray', '2019-07-03', 2, 10, 'Seattle', 'rgray@example.com'),
        (5, 'Elizabeth Tucker', '2024-02-03', 4, 10, 'Chicago', 'etucker@example.com'),
        (6, 'Joseph Lewis', '2023-04-05', 4, 10, 'New York', 'jlewis@example.com'),
        (7, 'Paul Brown', '2024-02-03', 3, 10, 'Chicago', 'pbrown@example.com'),
        (8, 'Andrew Clarke', '2024-11-23', 3, 10, 'Los Angeles', 'aclarke@example.com')
        (9, 'William Ferguson', '2023-11-12', 5, 20, 'Chicago', 'wferguson@example.com'),
        (10, 'Linda Black', '2021-07-20', 6, 20, 'New York', 'lblack@example.com'),
        (11, 'David Green', '2019-10-10', 7, 20, 'Chicago', 'dgreen@example.com'),
        (12, 'Daniel Gray', '2020-08-03', 7, 20, 'New York', 'dgray@example.com'),
        (13, 'Mark Armstrong', '2020-04-05', 8, 30, 'New York', 'marmstrong@example.com'),
        (14, 'Donald Carter', '2021-07-07', 8, 30, 'Seattle', 'dcarter@example.com'),
        (15, 'Elizabeth Collins', '2022-01-22', 9, 30, 'Chicago', 'ecollins@example.com');
        ┌─id─┬─full_name─────────┬──hire_date─┬─position──────────────────┬─department───┬─location────┬─email──────────────────┐
     1. │  1 │ James Wilson      │ 2020-11-23 │ c developer               │ development  │ New York    │ jwilson@example.com    │
     2. │  2 │ Mary Burton       │ 2023-05-06 │ c developer               │ development  │ Chicago     │ mburton@example.com    │
     3. │  3 │ Patricia Robinson │ 2021-10-11 │ python developer          │ development  │ Los Angeles │ probinson@example.com  │
     4. │  4 │ Robert Gray       │ 2019-07-03 │ python developer          │ development  │ Seattle     │ rgray@example.com      │
     5. │  5 │ Elizabeth Tucker  │ 2024-02-03 │ frontend developer        │ development  │ Chicago     │ etucker@example.com    │
     6. │  6 │ Joseph Lewis      │ 2023-04-05 │ frontend developer        │ development  │ New York    │ jlewis@example.com     │
     7. │  7 │ Paul Brown        │ 2024-02-03 │ java developer            │ development  │ Chicago     │ pbrown@example.com     │
     8. │  8 │ Andrew Clarke     │ 2024-11-23 │ java developer            │ development  │ Los Angeles │ aclarke@example.com    │
     9. │  9 │ William Ferguson  │ 2023-11-12 │ test automation engineer  │ qa           │ Chicago     │ wferguson@example.com  │
    10. │ 10 │ Linda Black       │ 2021-07-20 │ manual testing engineer   │ qa           │ New York    │ lblack@example.com     │
    11. │ 11 │ David Green       │ 2019-10-10 │ load testing engineer     │ qa           │ Chicago     │ dgreen@example.com     │
    12. │ 12 │ Daniel Gray       │ 2020-08-03 │ load testing engineer     │ qa           │ New York    │ dgray@example.com      │
    13. │ 13 │ Mark Armstrong    │ 2020-04-05 │ support engineer 1st line │ tech support │ New York    │ marmstrong@example.com │
    14. │ 14 │ Donald Carter     │ 2021-07-07 │ support engineer 1st line │ tech support │ Seattle     │ dcarter@example.com    │
    15. │ 15 │ Elizabeth Collins │ 2022-01-22 │ support engineer 2nd line │ tech support │ Chicago     │ ecollins@example.com   │
        └────┴───────────────────┴────────────┴───────────────────────────┴──────────────┴─────────────┴────────────────────────┘
  • employee_bonus_oct_24

    CREATE TABLE employee_bonus_oct_24 (employee_id UInt16, bonus UInt32) ENGINE = MergeTree ORDER BY employee_id;
    INSERT INTO employee_bonus_oct_24
    VALUES (1, 700), (2, 450), (3, 500), (4, 950), (5, 370), (6, 600),(7, 1000), (8, 950),
           (9, 770), (10, 350), (11, 500), (12, 670), (13, 600), (14, 600), (15, 480);
        ┌─employee_id─┬─bonus─┐
     1. │           1 │   700 │
     2. │           2 │   450 │
     3. │           3 │   500 │
     4. │           4 │   950 │
     5. │           5 │   370 │
     6. │           6 │   600 │
     7. │           7 │  1000 │
     8. │           8 │   950 │
     9. │           9 │   770 │
    10. │          10 │   350 │
    11. │          11 │   500 │
    12. │          12 │   670 │
    13. │          13 │   600 │
    14. │          14 │   600 │
    15. │          15 │   480 │
        └─────────────┴───────┘
  • employee_bonus_oct_24_full

    CREATE TABLE employee_bonus_oct_24_full (
        employee_id UInt16,
        full_name String,
        position String,
        department String,
        bonus UInt32)
    ENGINE = MergeTree
    ORDER BY employee_id;
    INSERT INTO employee_bonus_oct_24_full SELECT
        employee_bonus_oct_24.employee_id,
        employees.full_name,
        employees.position,
        employees.department,
        employee_bonus_oct_24.bonus
    FROM employee_bonus_oct_24
    INNER JOIN employees ON employee_bonus_oct_24.employee_id = employees.id;
        ┌─employee_id─┬─full_name─────────┬─position──────────────────┬─department───┬─bonus─┐
     1. │           1 │ James Wilson      │ c developer               │ development  │   700 │
     2. │           2 │ Mary Burton       │ c developer               │ development  │   450 │
     3. │           3 │ Patricia Robinson │ python developer          │ development  │   500 │
     4. │           4 │ Robert Gray       │ python developer          │ development  │   950 │
     5. │           5 │ Elizabeth Tucker  │ frontend developer        │ development  │   370 │
     6. │           6 │ Joseph Lewis      │ frontend developer        │ development  │   600 │
     7. │           7 │ Paul Brown        │ java developer            │ development  │  1000 │
     8. │           8 │ Andrew Clarke     │ java developer            │ development  │   950 │
     9. │           9 │ William Ferguson  │ test automation engineer  │ qa           │   770 │
    10. │          10 │ Linda Black       │ manual testing engineer   │ qa           │   350 │
    11. │          11 │ David Green       │ load testing engineer     │ qa           │   500 │
    12. │          12 │ Daniel Gray       │ load testing engineer     │ qa           │   670 │
    13. │          13 │ Mark Armstrong    │ support engineer 1st line │ tech support │   600 │
    14. │          14 │ Donald Carter     │ support engineer 1st line │ tech support │   600 │
    15. │          15 │ Elizabeth Collins │ support engineer 2nd line │ tech support │   480 │
        └─────────────┴───────────────────┴───────────────────────────┴──────────────┴───────┘

Subqueries in FROM

Instead of a table name or table function, you can use a subquery in the FROM clause to specify a data source for a SELECT query:

SELECT <expression> FROM (<subquery>) [AS <alias>]

In contrast to standard SQL, it is not necessary to specify an alias after a subquery (AS <alias>). You can do this for compatibility, but the specified name is not used anywhere.

A subquery in the following query gets bonuses of employees in the development department for October and calculates the value of each employee’s bonus relative to the amount of the maximum bonus in the company:

SELECT employee, bonus, relative_to_max
FROM
(
    SELECT
        full_name AS employee,
        bonus,
        round((bonus/(SELECT max(bonus) FROM employee_bonus_oct_24_full)), 2) AS relative_to_max
    FROM employee_bonus_oct_24_full
    WHERE department = 'development'
)
WHERE relative_to_max < 0.5;
   ┌─employee─────────┬─bonus─┬─relative_to_max─┐
1. │ Mary Burton      │   450 │            0.45 │
2. │ Elizabeth Tucker │   370 │            0.37 │
   └──────────────────┴───────┴─────────────────┘

Subqueries in JOIN

In a SELECT query with the JOIN clause, there is no optimization of the order in which the join operation is executed relative to other stages of the query. The join operation (search in the right table) is performed before WHERE filtering and before aggregation. To explicitly specify the order of calculations and improve query performance, it is recommended to apply a filter as early as possible — for this purpose, you can join subqueries.

For example, the following query first evaluates the subqueries in the FROM (number of employees in each department) and LEFT ANY JOIN (number of employees in each department who received bonuses less than 500) clauses with grouping and filtering, and then joins the results of these subqueries:

SELECT department, number_of_employees, bonus_less_500
FROM
(
    SELECT department, count() AS number_of_employees
    FROM employees
    GROUP BY department
)
LEFT ANY JOIN
(
    SELECT department, count() AS bonus_less_500
    FROM employee_bonus_oct_24_full
    WHERE bonus < 500
    GROUP BY department
) USING (department)
SETTINGS joined_subquery_requires_alias = 0;
   ┌─department───┬─number_of_employees─┬─bonus_less_500─┐
1. │ development  │                   8 │              2 │
2. │ qa           │                   4 │              1 │
3. │ tech support │                   3 │              1 │
   └──────────────┴─────────────────────┴────────────────┘

Subqueries in WHERE

Subqueries are often used within the WHERE clause to filter data. As mentioned above, subqueries can be used with various comparison operators to define data filtering conditions:

  • subqueries that return scalar values can be used before or after comparison operators such as = (==), != (<>), >, >=, <, <=. For example, the following query finds all employees whose bonuses in October are greater than the average:

    SELECT full_name AS employee, position, department
    FROM employee_bonus_oct_24_full
    WHERE bonus > (SELECT avg(bonus) FROM employee_bonus_oct_24_full);
       ┌─employee─────────┬─position─────────────────┬─department──┐
    1. │ James Wilson     │ c developer              │ development │
    2. │ Robert Gray      │ python developer         │ development │
    3. │ Paul Brown       │ java developer           │ development │
    4. │ Andrew Clarke    │ java developer           │ development │
    5. │ William Ferguson │ test automation engineer │ qa          │
    6. │ Daniel Gray      │ load testing engineer    │ qa          │
       └──────────────────┴──────────────────────────┴─────────────┘
  • subqueries that return multiple data rows can be used with the IN, NOT IN, ANY, ALL, EXISTS operators, allowing you to check for the existence of a row in a set of rows. Examples of using subqueries with these operators are given in the sections below.

IN

Syntax of the IN operator with a subquery:

<IN_expression> IN (<IN_subquery>)

The IN operator evaluates the IN_expression expression, compares it with values ​​in all rows of the IN_subquery result, and returns:

  • 1 — if a row equal to IN_expression is found in the results of IN_subquery;

  • 0 — in the following cases:

    • no rows equal to IN_expression were found in the results of IN_subquery;

    • IN_subquery does not return any rows;

    • there is a NULL value in the right or left part of the IN operator and the transform_null_in parameter is set to 0 (by default).

The following sample query uses the IN operator to display information about bonuses paid in October only to employees living in New York:

SELECT full_name AS employee, bonus FROM employee_bonus_oct_24_full
WHERE employee_id IN (SELECT id FROM employees WHERE location='New York');
   ┌─employee───────┬─bonus─┐
1. │ James Wilson   │   700 │
2. │ Joseph Lewis   │   600 │
3. │ Linda Black    │   350 │
4. │ Daniel Gray    │   670 │
5. │ Mark Armstrong │   600 │
   └────────────────┴───────┘
NOTE
  • A subquery can specify more than one column to filter tuples (types of columns on the left and right of the IN operator should be the same):

    (<column1>, <column2>) IN (SELECT <IN_subquery_result_column1>, <IN_subquery_result_column2> FROM ...)
  • The IN operator with a subquery can be used in any part of the parent query, including in aggregate functions and lambda functions.

  • For processing a distributed query, it is often more optimal to use the GLOBAL IN operator instead of IN with a subquery — see detailed information in the Distributed Subqueries section of the ClickHouse documentation.

NOT IN

Syntax of the NOT IN operator with a subquery:

<NOT_IN_expression> NOT IN (<NOT_IN_subquery>)

The NOT IN operator evaluates NOT_IN_expression, compares it with each row of the NOT_IN_subquery result, and returns:

  • 1 — if no matches were found in the results of NOT_IN_expression and NOT_IN_subquery, or the subquery returns no rows;

  • 0 — if the NOT_IN_subquery results contain at least one row equal to NOT_IN_expression.

For example, the following query returns data on bonuses received by employees who do not live in New York:

SELECT full_name AS employee, bonus FROM employee_bonus_oct_24_full
WHERE employee_id NOT IN (SELECT id FROM employees WHERE location='New York');
    ┌─employee──────────┬─bonus─┐
 1. │ Mary Burton       │   450 │
 2. │ Patricia Robinson │   500 │
 3. │ Robert Gray       │   950 │
 4. │ Elizabeth Tucker  │   370 │
 5. │ Paul Brown        │  1000 │
 6. │ Andrew Clarke     │   950 │
 7. │ William Ferguson  │   770 │
 8. │ David Green       │   500 │
 9. │ Donald Carter     │   600 │
10. │ Elizabeth Collins │   480 │
    └───────────────────┴───────┘

ANY

Syntax of the ANY operator with a subquery:

<ANY_expression> <operator> ANY (<ANY_subquery>)

where operator can be =, <>, <, <=, >, >=.

The ANY operator compares the results of ANY_expression and ANY_subquery using operator and returns 1 if at least one element from the subquery result satisfies the specified condition.

The following query finds employees not from the development department whose bonuses in October are greater than the bonus of at least one employee in the development department in the same month:

SELECT full_name AS employee, bonus, department FROM employee_bonus_oct_24_full
WHERE bonus > ANY (SELECT bonus FROM employee_bonus_oct_24_full WHERE department='development') AND department!='development';
   ┌─employee──────────┬─bonus─┬─department───┐
1. │ William Ferguson  │   770 │ qa           │
2. │ David Green       │   500 │ qa           │
3. │ Daniel Gray       │   670 │ qa           │
4. │ Mark Armstrong    │   600 │ tech support │
5. │ Donald Carter     │   600 │ tech support │
6. │ Elizabeth Collins │   480 │ tech support │
   └───────────────────┴───────┴──────────────┘

= ANY is equivalent to IN. If you rewrite the above example with IN as follows, the query result will be the same:

SELECT full_name AS employee, bonus FROM employee_bonus_oct_24_full
WHERE employee_id = ANY (SELECT id FROM employees WHERE location='New York');
   ┌─employee───────┬─bonus─┐
1. │ James Wilson   │   700 │
2. │ Joseph Lewis   │   600 │
3. │ Linda Black    │   350 │
4. │ Daniel Gray    │   670 │
5. │ Mark Armstrong │   600 │
   └────────────────┴───────┘

ALL

Syntax of the ALL operator with a subquery:

<ALL_expression> <operator> ALL (<ALL_subquery>)

where operator can be =, <>, <, <=, >, >=.

The ALL operator compares the value of ALL_expression with each value in the set returned by ALL_subquery and returns 1 if the comparison condition is true for all values ​​in the set.

For example, the following query returns employees whose bonuses are less than the bonus of any employee in the tech support department:

SELECT full_name AS employee, bonus, department FROM employee_bonus_oct_24_full
WHERE bonus < ALL (SELECT bonus FROM employee_bonus_oct_24_full WHERE department='tech support');
   ┌─employee─────────┬─bonus─┬─department──┐
1. │ Mary Burton      │   450 │ development │
2. │ Elizabeth Tucker │   370 │ development │
3. │ Linda Black      │   350 │ qa          │
   └──────────────────┴───────┴─────────────┘

!= ALL is equivalent to NOT IN. The above example with NOT IN can be rewritten as follows:

SELECT full_name AS employee, bonus FROM employee_bonus_oct_24_full
WHERE employee_id != ALL (SELECT id FROM employees WHERE location='New York');

The result will be the same:

    ┌─employee──────────┬─bonus─┐
 1. │ Mary Burton       │   450 │
 2. │ Patricia Robinson │   500 │
 3. │ Robert Gray       │   950 │
 4. │ Elizabeth Tucker  │   370 │
 5. │ Paul Brown        │  1000 │
 6. │ Andrew Clarke     │   950 │
 7. │ William Ferguson  │   770 │
 8. │ David Green       │   500 │
 9. │ Donald Carter     │   600 │
10. │ Elizabeth Collins │   480 │
    └───────────────────┴───────┘

EXISTS

Syntax of the EXISTS operator:

EXISTS (<subquery>)

It checks how many rows the subquery result contains. If the result is empty, the operator returns 0, otherwise — 1.

CAUTION
References to tables and columns of a parent query are not supported in a subquery.

For example, the following query displays the employees table only if it contains at least one employee who lives in Chicago:

SELECT full_name AS employee, location AS city
FROM employees
WHERE EXISTS (SELECT * FROM employees WHERE location='Chicago');
    ┌─employee──────────┬─city────────┐
 1. │ James Wilson      │ New York    │
 2. │ Mary Burton       │ Chicago     │
 3. │ Patricia Robinson │ Los Angeles │
 4. │ Robert Gray       │ Seattle     │
 5. │ Elizabeth Tucker  │ Chicago     │
 6. │ Joseph Lewis      │ New York    │
 7. │ Paul Brown        │ Chicago     │
 8. │ Andrew Clarke     │ Los Angeles │
 9. │ William Ferguson  │ Chicago     │
10. │ Linda Black       │ New York    │
11. │ David Green       │ Chicago     │
12. │ Daniel Gray       │ New York    │
13. │ Mark Armstrong    │ New York    │
14. │ Donald Carter     │ Seattle     │
15. │ Elizabeth Collins │ Chicago     │
    └───────────────────┴─────────────┘
Found a mistake? Seleсt text and press Ctrl+Enter to report it