Common table expressions
A common table expression (CTE) is a temporary result of an SQL statement that can be referenced in another SQL statement. CTEs allow you to split complex SQL queries into simpler parts.
CTE has the following syntax:
WITH <CTE_name> (<column_list>) AS (
<CTE_query_definition>
)
<statement>;
In the statement above, the WITH
clause creates CTE using the following:
-
CTE_name is an identifier for CTE;
-
column_list is an optional parameter that contains a list of CTE columns;
-
CTE_query_definition specifies an SQL statement to populate CTE;
-
statement is an SQL expression that uses CTE.
You can also define several CTEs with a single WITH
clause.
The SELECT statement in CTE
The example below displays sale totals per book in the top sale store. The WITH
clause defines two auxiliary statements named store_sales
and top_stores
. The store_sales
output is used in top_stores
, and the top_stores
output is used in the parent SELECT
query.
WITH store_sales AS (
SELECT store, SUM(amount) AS total_sales
FROM orders
GROUP BY store
), top_stores AS (
SELECT store
FROM store_sales
WHERE total_sales > (SELECT SUM(total_sales)/10 FROM store_sales)
)
SELECT store,
book,
SUM(quantity) AS book_quantity,
SUM(amount) AS book_sales
FROM orders
WHERE store IN (SELECT store FROM top_stores)
GROUP BY store, book;
If rewrite this query without WITH
, it will be a complex statement with two levels of nested SELECT subqueries.
Recursive queries
Recursive queries are used to process hierarchical or tree-structured data. You can use the RECURSIVE
modifier in a WITH
query to refer to the query’s own output.
A recursive CTE has the following syntax:
WITH RECURSIVE <CTE_name> AS(
<CTE_query_definition> -- non-recursive term
UNION [ALL]
<CTE_query definion> -- recursive term
A recursive WITH
query contains a non-recursive term, UNION
(or UNION ALL
), and a recursive term. Only the recursive term can contain a reference to the query’s own output.
PostgreSQL executes a recursive query in the following sequence:
-
Execute the non-recursive term to create the base result set (R0).
-
Execute the recursive term with Ri as an input to return the result set Ri+1 as the output.
-
Repeat step 2 until an empty set is returned.
-
Return the final result set that is a
UNION
orUNION ALL
of the result sets R0, R1,… Rn.
Create a new table to demonstrate a recursive query:
CREATE TABLE employees (
employee_id serial PRIMARY KEY,
full_name VARCHAR NOT NULL,
manager_id INT
);
Insert data to the table:
INSERT INTO employees (employee_id, full_name, manager_id)
VALUES
(1, 'James Wilson', NULL),
(2, 'Mary Burton', 1),
(3, 'Patricia Robinson', 1),
(4, 'Robert Gray', 1),
(5, 'Elizabeth Tucker', 2),
(6, 'Joseph Lewis', 2),
(7, 'William Ferguson', 2),
(8, 'Linda Black', 3),
(9, 'David Green', 3),
(10, 'Daniel Gray', 5),
(11, 'Mark Armstrong', 4),
(12, 'Donald Carter', 7),
(13, 'Elizabeth Collins', 7),
(14, 'Paul Brown', 8),
(15, 'Andrew Clarke', 8);
The following recursive query returns all the subordinates of the manager with employee_id
equal to 2
:
WITH RECURSIVE subordinates (employee_id, full_name, manager_id) AS (
SELECT employee_id, manager_id, full_name
FROM employees WHERE employee_id = 2
UNION
SELECT e.employee_id, e.manager_id, e.full_name
FROM employees e
INNER JOIN subordinates s ON s.employee_id = e.manager_id
)
SELECT * FROM subordinates;
The non-recursive term returns the base result set (R0):
employee_id | manager_id | full_name -------------+------------+------------- 2 | 1 | Mary Burton
The recursive term returns the subordinates of the employee from the first result. This is the JOIN
result of the employees
table and the subordinates
CTE. The first iteration of the recursive term returns the following result:
employee_id | manager_id | full_name -------------+------------+------------------ 5 | 2 | Elizabeth Tucker 6 | 2 | Joseph Lewis 7 | 2 | William Ferguson
The second iteration uses the result set above as the input, and returns the result set:
employee_id | manager_id | full_name -------------+------------+------------------- 12 | 7 | Donald Carter 13 | 7 | Elizabeth Collins 10 | 5 | Daniel Gray
The third iteration returns an empty result set because employees with employee_id
10, 12, and 13 do not have subordinates.
The final result set is the union of all result sets in the first and second iterations generated by the non-recursive and recursive terms.
employee_id | manager_id | full_name -------------+------------+------------------- 2 | 1 | Mary Burton 5 | 2 | Elizabeth Tucker 6 | 2 | Joseph Lewis 7 | 2 | William Ferguson 12 | 7 | Donald Carter 13 | 7 | Elizabeth Collins 10 | 5 | Daniel Gray
Search order
If you need to get the results in the depth-first or breadth-first order, you can use the SEARCH
clause.
Add SEARCH DEPTH FIRST BY
to display the result in the depth-first order. As a result, a column that contains the path to the row is added to the output.
WITH RECURSIVE subordinates(employee_id, manager_id, full_name) AS (
SELECT employee_id, manager_id, full_name
FROM employees WHERE employee_id = 2
UNION
SELECT e.employee_id, e.manager_id, e.full_name
FROM employees e
INNER JOIN subordinates s ON s.employee_id = e.manager_id
) SEARCH DEPTH FIRST BY employee_id SET ordercol
SELECT * FROM subordinates ORDER BY ordercol;
The result:
employee_id | manager_id | full_name | ordercol -------------+------------+-------------------+---------------- 2 | 1 | Mary Burton | {(2)} 5 | 2 | Elizabeth Tucker | {(2),(5)} 10 | 5 | Daniel Gray | {(2),(5),(10)} 6 | 2 | Joseph Lewis | {(2),(6)} 7 | 2 | William Ferguson | {(2),(7)} 12 | 7 | Donald Carter | {(2),(7),(12)} 13 | 7 | Elizabeth Collins | {(2),(7),(13)}
Use SEARCH BREADTH FIRST BY
to display the result in the breadth-first order. As a result, a column that contains the nested level of the recursive query is added to the output.
WITH RECURSIVE subordinates(employee_id, manager_id, full_name) AS (
SELECT employee_id, manager_id, full_name
FROM employees WHERE employee_id = 2
UNION
SELECT e.employee_id, e.manager_id, e.full_name
FROM employees e
INNER JOIN subordinates s ON s.employee_id = e.manager_id
) SEARCH BREADTH FIRST BY employee_id SET ordercol
SELECT * FROM subordinates ORDER BY ordercol;
The result:
employee_id | manager_id | full_name | ordercol -------------+------------+-------------------+---------- 2 | 1 | Mary Burton | (0,2) 5 | 2 | Elizabeth Tucker | (1,5) 6 | 2 | Joseph Lewis | (1,6) 7 | 2 | William Ferguson | (1,7) 10 | 5 | Daniel Gray | (2,10) 12 | 7 | Donald Carter | (2,12) 13 | 7 | Elizabeth Collins | (2,13)
Cycle Detection
When working with recursive queries, it is important to make sure that the recursive part of the query eventually returns no items, otherwise the query loops indefinitely. You can use the CYCLE
clause to detect cycles.
WITH RECURSIVE subordinates(employee_id, manager_id, full_name) AS (
SELECT employee_id, manager_id, full_name
FROM employees WHERE employee_id=2
UNION
SELECT e.employee_id, e.manager_id, e.full_name
FROM employees e
INNER JOIN subordinates s ON s.employee_id = e.manager_id
) CYCLE employee_id SET is_cycle USING path
SELECT * FROM subordinates;
The CYCLE
clause contains the list of columns to track for cycle detection (employee_id
— in the example above) a name of column that shows whether a cycle is detected (is_cycle
), and the name of another column that tracks the path (path
). The is_cycle
and path
columns are added to the query output.
The result:
employee_id | manager_id | full_name | is_cycle | path -------------+------------+-------------------+----------+---------------- 2 | 1 | Mary Burton | f | {(2)} 5 | 2 | Elizabeth Tucker | f | {(2),(5)} 6 | 2 | Joseph Lewis | f | {(2),(6)} 7 | 2 | William Ferguson | f | {(2),(7)} 12 | 7 | Donald Carter | f | {(2),(7),(12)} 13 | 7 | Elizabeth Collins | f | {(2),(7),(13)} 10 | 5 | Daniel Gray | f | {(2),(5),(10)}
TIP
A query can include both SEARCH and CYCLE clauses. The cycle path column is computed in the same way as the depth-first ordering column. The SEARCH DEPTH FIRST BY and CYCLE clauses used together create redundant computations. To avoid it, use the CYCLE clause and order by the path column. If you need to apply the breadth-first order, specify both SEARCH and CYCLE clauses.
|
CTE materialization
You can define whether the result of the СTE subquery (starting with AS
) is materialized. It means that the subquery is executed separately from the parent query and only once during its execution, and its result is written to a temporary table stored in memory. It can be useful if this subquery contains resource intensive calculations.
Use the MATERIALIZED
modifier to force the subquery separate calculation from the parent query:
WITH employees_data AS MATERIALIZED (
SELECT * FROM employees
)
SELECT full_name FROM employees_data WHERE employee_id = 5;
The following subqueries are materialized by default:
-
subqueries from recursive CTE queries;
-
subqueries that are referenced more than once.
Since a materialized subquery is executed and optimized separately from the parent query it may cause the performance problems. That is why the default behavior was changed starting with PostgreSQL 12. In recent PostgreSQL versions, CTE subqueries referenced once are not materialized. They are merged into the parent queries and the optimizer processes both query simultaneously. For example, the optimizer can combine conditions from two queries into one filter to search by index.
You can use the NOT MATERIALIZED
modifier to force the joint optimization of the subquery and parent query:
WITH employees_data AS NOT MATERIALIZED (
SELECT * FROM employees
)
SELECT (
SELECT full_name FROM employees_data WHERE employee_id = 5),
(SELECT full_name FROM employees_data WHERE employee_id = 6);
Note, the NOT MATERIALIZED
modifier has no effect on recursive CTE queries, they are always materialized.
Data-modifying statements
You can use the INSERT
, UPDATE
, and DELETE
statements in CTEs. It allows you to perform multiple different operations in the same query. The example below moves two rows from employees
to retired_employees
:
WITH moved_rows AS (
DELETE FROM employees
WHERE
employee_id = 4 OR
employee_id = 5
RETURNING *
)
INSERT INTO retired_employee
SELECT * FROM moved_rows;
DELETE
removes the specified rows from employees
, returns their content by the RETURNING clause. The parent query reads the subquery output and inserts it into retired_employee
.
Data-modifying statements in WITH
should contain the RETURNING
clause to refer to their result sets in the parent query.
Note that the CTE statement execution order is not determined. The query below returns not modified data from the employees
table, because the UPDATE
statement is executed later than SELECT
.
WITH employees_data AS (
UPDATE employees SET manager_id = 1 WHERE employee_id =10
RETURNING *
)
SELECT * FROM employees;
To get the modified data, refer to the subquery result:
WITH employees_data AS (
UPDATE employees SET manager_id = 1 WHERE employee_id =10
RETURNING *
)
SELECT * FROM employees_data;
Recursive self-references in data-modifying statements are not allowed. You can include data-modifying statement in the parent query. The following recursive query deletes all subordinates of the manager with the id 2
and this manager from employees
:
WITH RECURSIVE subordinates (employee_id, full_name, manager_id) AS (
SELECT employee_id, manager_id, full_name
FROM employees WHERE employee_id = 2
UNION
SELECT e.employee_id, e.manager_id, e.full_name
FROM employees e
INNER JOIN subordinates s ON s.employee_id = e.manager_id
)
DELETE FROM employees
WHERE employee_id IN (SELECT employee_id FROM subordinates);