Subqueries
Overview
A subquery is an SQL query nested within another (parent) query or subquery. A subquery is executed first, and then its results are passed to the parent query. Thus, subqueries can be useful when you need to use the results of one query as the input for another query. For example, you can use subqueries in SELECT
queries: in the FROM
and JOIN
clauses to specify a source from which data should be read or in the WHERE
clause to filter data.
Operations for which a subquery can be used depend on the type of data the subquery returns. The result of a subquery can be:
-
Scalar value — when a subquery returns a value of one column from one row or a single calculated value (for example, an aggregate function result — sum, average/minimum/maximum value of a column). Such a subquery can be used with the comparison operators
>
,<
,=
, and others (see Comparison Operators in the ClickHouse documentation), passed as an argument to a function, or used as a column value in theSELECT
clause. -
Table value — when a subquery returns multiple rows (possibly zero rows). In this case, a subquery can be used with operators such as
IN
,NOT IN
,ANY
,ALL
,EXISTS
to check the occurrence of a row from the parent query in the subquery results. Also, a table value from a subquery can be used in theFROM
orJOIN
clause to specify a source table.
NOTE
|
Below are various examples of using subqueries in SELECT
queries.
To reproduce examples from the article, create the following tables first:
-
employees
CREATE TABLE employees ( id UInt16, full_name String, hire_date Date, position Enum('c developer' = 1, 'python developer' = 2, 'java developer' = 3, 'frontend developer' = 4, 'test automation engineer' = 5, 'manual testing engineer' = 6, 'load testing engineer' = 7, 'support engineer 1st line' = 8, 'support engineer 2nd line' = 9), department Enum('development' = 10, 'qa' = 20, 'tech support' = 30), location String, email String) ENGINE = MergeTree ORDER BY id;
INSERT INTO employees VALUES (1, 'James Wilson', '2020-11-23', 1, 10, 'New York', 'jwilson@example.com'), (2, 'Mary Burton', '2023-05-06', 1, 10, 'Chicago', 'mburton@example.com'), (3, 'Patricia Robinson', '2021-10-11', 2, 10, 'Los Angeles', 'probinson@example.com'), (4, 'Robert Gray', '2019-07-03', 2, 10, 'Seattle', 'rgray@example.com'), (5, 'Elizabeth Tucker', '2024-02-03', 4, 10, 'Chicago', 'etucker@example.com'), (6, 'Joseph Lewis', '2023-04-05', 4, 10, 'New York', 'jlewis@example.com'), (7, 'Paul Brown', '2024-02-03', 3, 10, 'Chicago', 'pbrown@example.com'), (8, 'Andrew Clarke', '2024-11-23', 3, 10, 'Los Angeles', 'aclarke@example.com') (9, 'William Ferguson', '2023-11-12', 5, 20, 'Chicago', 'wferguson@example.com'), (10, 'Linda Black', '2021-07-20', 6, 20, 'New York', 'lblack@example.com'), (11, 'David Green', '2019-10-10', 7, 20, 'Chicago', 'dgreen@example.com'), (12, 'Daniel Gray', '2020-08-03', 7, 20, 'New York', 'dgray@example.com'), (13, 'Mark Armstrong', '2020-04-05', 8, 30, 'New York', 'marmstrong@example.com'), (14, 'Donald Carter', '2021-07-07', 8, 30, 'Seattle', 'dcarter@example.com'), (15, 'Elizabeth Collins', '2022-01-22', 9, 30, 'Chicago', 'ecollins@example.com');
┌─id─┬─full_name─────────┬──hire_date─┬─position──────────────────┬─department───┬─location────┬─email──────────────────┐ 1. │ 1 │ James Wilson │ 2020-11-23 │ c developer │ development │ New York │ jwilson@example.com │ 2. │ 2 │ Mary Burton │ 2023-05-06 │ c developer │ development │ Chicago │ mburton@example.com │ 3. │ 3 │ Patricia Robinson │ 2021-10-11 │ python developer │ development │ Los Angeles │ probinson@example.com │ 4. │ 4 │ Robert Gray │ 2019-07-03 │ python developer │ development │ Seattle │ rgray@example.com │ 5. │ 5 │ Elizabeth Tucker │ 2024-02-03 │ frontend developer │ development │ Chicago │ etucker@example.com │ 6. │ 6 │ Joseph Lewis │ 2023-04-05 │ frontend developer │ development │ New York │ jlewis@example.com │ 7. │ 7 │ Paul Brown │ 2024-02-03 │ java developer │ development │ Chicago │ pbrown@example.com │ 8. │ 8 │ Andrew Clarke │ 2024-11-23 │ java developer │ development │ Los Angeles │ aclarke@example.com │ 9. │ 9 │ William Ferguson │ 2023-11-12 │ test automation engineer │ qa │ Chicago │ wferguson@example.com │ 10. │ 10 │ Linda Black │ 2021-07-20 │ manual testing engineer │ qa │ New York │ lblack@example.com │ 11. │ 11 │ David Green │ 2019-10-10 │ load testing engineer │ qa │ Chicago │ dgreen@example.com │ 12. │ 12 │ Daniel Gray │ 2020-08-03 │ load testing engineer │ qa │ New York │ dgray@example.com │ 13. │ 13 │ Mark Armstrong │ 2020-04-05 │ support engineer 1st line │ tech support │ New York │ marmstrong@example.com │ 14. │ 14 │ Donald Carter │ 2021-07-07 │ support engineer 1st line │ tech support │ Seattle │ dcarter@example.com │ 15. │ 15 │ Elizabeth Collins │ 2022-01-22 │ support engineer 2nd line │ tech support │ Chicago │ ecollins@example.com │ └────┴───────────────────┴────────────┴───────────────────────────┴──────────────┴─────────────┴────────────────────────┘
-
employee_bonus_oct_24
CREATE TABLE employee_bonus_oct_24 (employee_id UInt16, bonus UInt32) ENGINE = MergeTree ORDER BY employee_id;
INSERT INTO employee_bonus_oct_24 VALUES (1, 700), (2, 450), (3, 500), (4, 950), (5, 370), (6, 600),(7, 1000), (8, 950), (9, 770), (10, 350), (11, 500), (12, 670), (13, 600), (14, 600), (15, 480);
┌─employee_id─┬─bonus─┐ 1. │ 1 │ 700 │ 2. │ 2 │ 450 │ 3. │ 3 │ 500 │ 4. │ 4 │ 950 │ 5. │ 5 │ 370 │ 6. │ 6 │ 600 │ 7. │ 7 │ 1000 │ 8. │ 8 │ 950 │ 9. │ 9 │ 770 │ 10. │ 10 │ 350 │ 11. │ 11 │ 500 │ 12. │ 12 │ 670 │ 13. │ 13 │ 600 │ 14. │ 14 │ 600 │ 15. │ 15 │ 480 │ └─────────────┴───────┘
-
employee_bonus_oct_24_full
CREATE TABLE employee_bonus_oct_24_full ( employee_id UInt16, full_name String, position String, department String, bonus UInt32) ENGINE = MergeTree ORDER BY employee_id;
INSERT INTO employee_bonus_oct_24_full SELECT employee_bonus_oct_24.employee_id, employees.full_name, employees.position, employees.department, employee_bonus_oct_24.bonus FROM employee_bonus_oct_24 INNER JOIN employees ON employee_bonus_oct_24.employee_id = employees.id;
┌─employee_id─┬─full_name─────────┬─position──────────────────┬─department───┬─bonus─┐ 1. │ 1 │ James Wilson │ c developer │ development │ 700 │ 2. │ 2 │ Mary Burton │ c developer │ development │ 450 │ 3. │ 3 │ Patricia Robinson │ python developer │ development │ 500 │ 4. │ 4 │ Robert Gray │ python developer │ development │ 950 │ 5. │ 5 │ Elizabeth Tucker │ frontend developer │ development │ 370 │ 6. │ 6 │ Joseph Lewis │ frontend developer │ development │ 600 │ 7. │ 7 │ Paul Brown │ java developer │ development │ 1000 │ 8. │ 8 │ Andrew Clarke │ java developer │ development │ 950 │ 9. │ 9 │ William Ferguson │ test automation engineer │ qa │ 770 │ 10. │ 10 │ Linda Black │ manual testing engineer │ qa │ 350 │ 11. │ 11 │ David Green │ load testing engineer │ qa │ 500 │ 12. │ 12 │ Daniel Gray │ load testing engineer │ qa │ 670 │ 13. │ 13 │ Mark Armstrong │ support engineer 1st line │ tech support │ 600 │ 14. │ 14 │ Donald Carter │ support engineer 1st line │ tech support │ 600 │ 15. │ 15 │ Elizabeth Collins │ support engineer 2nd line │ tech support │ 480 │ └─────────────┴───────────────────┴───────────────────────────┴──────────────┴───────┘
Subqueries in FROM
Instead of a table name or table function, you can use a subquery in the FROM
clause to specify a data source for a SELECT
query:
SELECT <expression> FROM (<subquery>) [AS <alias>]
In contrast to standard SQL, it is not necessary to specify an alias after a subquery (AS <alias>
). You can do this for compatibility, but the specified name is not used anywhere.
A subquery in the following query gets bonuses of employees in the development
department for October and calculates the value of each employee’s bonus relative to the amount of the maximum bonus in the company:
SELECT employee, bonus, relative_to_max
FROM
(
SELECT
full_name AS employee,
bonus,
round((bonus/(SELECT max(bonus) FROM employee_bonus_oct_24_full)), 2) AS relative_to_max
FROM employee_bonus_oct_24_full
WHERE department = 'development'
)
WHERE relative_to_max < 0.5;
┌─employee─────────┬─bonus─┬─relative_to_max─┐ 1. │ Mary Burton │ 450 │ 0.45 │ 2. │ Elizabeth Tucker │ 370 │ 0.37 │ └──────────────────┴───────┴─────────────────┘
Subqueries in JOIN
In a SELECT
query with the JOIN clause, there is no optimization of the order in which the join operation is executed relative to other stages of the query. The join operation (search in the right table) is performed before WHERE
filtering and before aggregation. To explicitly specify the order of calculations and improve query performance, it is recommended to apply a filter as early as possible — for this purpose, you can join subqueries.
For example, the following query first evaluates the subqueries in the FROM
(number of employees in each department) and LEFT ANY JOIN
(number of employees in each department who received bonuses less than 500) clauses with grouping and filtering, and then joins the results of these subqueries:
SELECT department, number_of_employees, bonus_less_500
FROM
(
SELECT department, count() AS number_of_employees
FROM employees
GROUP BY department
)
LEFT ANY JOIN
(
SELECT department, count() AS bonus_less_500
FROM employee_bonus_oct_24_full
WHERE bonus < 500
GROUP BY department
) USING (department)
SETTINGS joined_subquery_requires_alias = 0;
┌─department───┬─number_of_employees─┬─bonus_less_500─┐ 1. │ development │ 8 │ 2 │ 2. │ qa │ 4 │ 1 │ 3. │ tech support │ 3 │ 1 │ └──────────────┴─────────────────────┴────────────────┘
Subqueries in WHERE
Subqueries are often used within the WHERE
clause to filter data. As mentioned above, subqueries can be used with various comparison operators to define data filtering conditions:
-
subqueries that return scalar values can be used before or after comparison operators such as
=
(==
),!=
(<>
),>
,>=
,<
,<=
. For example, the following query finds all employees whose bonuses in October are greater than the average:SELECT full_name AS employee, position, department FROM employee_bonus_oct_24_full WHERE bonus > (SELECT avg(bonus) FROM employee_bonus_oct_24_full);
┌─employee─────────┬─position─────────────────┬─department──┐ 1. │ James Wilson │ c developer │ development │ 2. │ Robert Gray │ python developer │ development │ 3. │ Paul Brown │ java developer │ development │ 4. │ Andrew Clarke │ java developer │ development │ 5. │ William Ferguson │ test automation engineer │ qa │ 6. │ Daniel Gray │ load testing engineer │ qa │ └──────────────────┴──────────────────────────┴─────────────┘
-
subqueries that return multiple data rows can be used with the
IN
,NOT IN
,ANY
,ALL
,EXISTS
operators, allowing you to check for the existence of a row in a set of rows. Examples of using subqueries with these operators are given in the sections below.
IN
Syntax of the IN
operator with a subquery:
<IN_expression> IN (<IN_subquery>)
The IN
operator evaluates the IN_expression
expression, compares it with values in all rows of the IN_subquery
result, and returns:
-
1
— if a row equal toIN_expression
is found in the results ofIN_subquery
; -
0
— in the following cases:-
no rows equal to
IN_expression
were found in the results ofIN_subquery
; -
IN_subquery
does not return any rows; -
there is a
NULL
value in the right or left part of theIN
operator and the transform_null_in parameter is set to0
(by default).
-
The following sample query uses the IN
operator to display information about bonuses paid in October only to employees living in New York:
SELECT full_name AS employee, bonus FROM employee_bonus_oct_24_full
WHERE employee_id IN (SELECT id FROM employees WHERE location='New York');
┌─employee───────┬─bonus─┐ 1. │ James Wilson │ 700 │ 2. │ Joseph Lewis │ 600 │ 3. │ Linda Black │ 350 │ 4. │ Daniel Gray │ 670 │ 5. │ Mark Armstrong │ 600 │ └────────────────┴───────┘
NOTE
|
NOT IN
Syntax of the NOT IN
operator with a subquery:
<NOT_IN_expression> NOT IN (<NOT_IN_subquery>)
The NOT IN
operator evaluates NOT_IN_expression
, compares it with each row of the NOT_IN_subquery
result, and returns:
-
1
— if no matches were found in the results ofNOT_IN_expression
andNOT_IN_subquery
, or the subquery returns no rows; -
0
— if theNOT_IN_subquery
results contain at least one row equal toNOT_IN_expression
.
For example, the following query returns data on bonuses received by employees who do not live in New York:
SELECT full_name AS employee, bonus FROM employee_bonus_oct_24_full
WHERE employee_id NOT IN (SELECT id FROM employees WHERE location='New York');
┌─employee──────────┬─bonus─┐ 1. │ Mary Burton │ 450 │ 2. │ Patricia Robinson │ 500 │ 3. │ Robert Gray │ 950 │ 4. │ Elizabeth Tucker │ 370 │ 5. │ Paul Brown │ 1000 │ 6. │ Andrew Clarke │ 950 │ 7. │ William Ferguson │ 770 │ 8. │ David Green │ 500 │ 9. │ Donald Carter │ 600 │ 10. │ Elizabeth Collins │ 480 │ └───────────────────┴───────┘
ANY
Syntax of the ANY
operator with a subquery:
<ANY_expression> <operator> ANY (<ANY_subquery>)
where operator
can be =
, <>
, <
, <=
, >
, >=
.
The ANY
operator compares the results of ANY_expression
and ANY_subquery
using operator
and returns 1
if at least one element from the subquery result satisfies the specified condition.
The following query finds employees not from the development
department whose bonuses in October are greater than the bonus of at least one employee in the development
department in the same month:
SELECT full_name AS employee, bonus, department FROM employee_bonus_oct_24_full
WHERE bonus > ANY (SELECT bonus FROM employee_bonus_oct_24_full WHERE department='development') AND department!='development';
┌─employee──────────┬─bonus─┬─department───┐ 1. │ William Ferguson │ 770 │ qa │ 2. │ David Green │ 500 │ qa │ 3. │ Daniel Gray │ 670 │ qa │ 4. │ Mark Armstrong │ 600 │ tech support │ 5. │ Donald Carter │ 600 │ tech support │ 6. │ Elizabeth Collins │ 480 │ tech support │ └───────────────────┴───────┴──────────────┘
= ANY
is equivalent to IN. If you rewrite the above example with IN
as follows, the query result will be the same:
SELECT full_name AS employee, bonus FROM employee_bonus_oct_24_full
WHERE employee_id = ANY (SELECT id FROM employees WHERE location='New York');
┌─employee───────┬─bonus─┐ 1. │ James Wilson │ 700 │ 2. │ Joseph Lewis │ 600 │ 3. │ Linda Black │ 350 │ 4. │ Daniel Gray │ 670 │ 5. │ Mark Armstrong │ 600 │ └────────────────┴───────┘
ALL
Syntax of the ALL
operator with a subquery:
<ALL_expression> <operator> ALL (<ALL_subquery>)
where operator
can be =
, <>
, <
, <=
, >
, >=
.
The ALL
operator compares the value of ALL_expression
with each value in the set returned by ALL_subquery
and returns 1
if the comparison condition is true for all values in the set.
For example, the following query returns employees whose bonuses are less than the bonus of any employee in the tech support
department:
SELECT full_name AS employee, bonus, department FROM employee_bonus_oct_24_full
WHERE bonus < ALL (SELECT bonus FROM employee_bonus_oct_24_full WHERE department='tech support');
┌─employee─────────┬─bonus─┬─department──┐ 1. │ Mary Burton │ 450 │ development │ 2. │ Elizabeth Tucker │ 370 │ development │ 3. │ Linda Black │ 350 │ qa │ └──────────────────┴───────┴─────────────┘
!= ALL
is equivalent to NOT IN. The above example with NOT IN
can be rewritten as follows:
SELECT full_name AS employee, bonus FROM employee_bonus_oct_24_full
WHERE employee_id != ALL (SELECT id FROM employees WHERE location='New York');
The result will be the same:
┌─employee──────────┬─bonus─┐ 1. │ Mary Burton │ 450 │ 2. │ Patricia Robinson │ 500 │ 3. │ Robert Gray │ 950 │ 4. │ Elizabeth Tucker │ 370 │ 5. │ Paul Brown │ 1000 │ 6. │ Andrew Clarke │ 950 │ 7. │ William Ferguson │ 770 │ 8. │ David Green │ 500 │ 9. │ Donald Carter │ 600 │ 10. │ Elizabeth Collins │ 480 │ └───────────────────┴───────┘
EXISTS
Syntax of the EXISTS
operator:
EXISTS (<subquery>)
It checks how many rows the subquery result contains. If the result is empty, the operator returns 0
, otherwise — 1
.
CAUTION
References to tables and columns of a parent query are not supported in a subquery.
|
For example, the following query displays the employees
table only if it contains at least one employee who lives in Chicago:
SELECT full_name AS employee, location AS city
FROM employees
WHERE EXISTS (SELECT * FROM employees WHERE location='Chicago');
┌─employee──────────┬─city────────┐ 1. │ James Wilson │ New York │ 2. │ Mary Burton │ Chicago │ 3. │ Patricia Robinson │ Los Angeles │ 4. │ Robert Gray │ Seattle │ 5. │ Elizabeth Tucker │ Chicago │ 6. │ Joseph Lewis │ New York │ 7. │ Paul Brown │ Chicago │ 8. │ Andrew Clarke │ Los Angeles │ 9. │ William Ferguson │ Chicago │ 10. │ Linda Black │ New York │ 11. │ David Green │ Chicago │ 12. │ Daniel Gray │ New York │ 13. │ Mark Armstrong │ New York │ 14. │ Donald Carter │ Seattle │ 15. │ Elizabeth Collins │ Chicago │ └───────────────────┴─────────────┘