Common table expressions

A common table expression (CTE) is a temporary result of an SQL statement that can be referenced in another SQL statement. CTEs allow you to split complex SQL queries into simpler parts.

CTE has the following syntax:

WITH <CTE_name> (<column_list>) AS (
<CTE_query_definition>
)
<statement>;

In the statement above, the WITH clause creates CTE using the following:

  • CTE_name is an identifier for CTE;

  • column_list is an optional parameter that contains a list of CTE columns;

  • CTE_query_definition specifies an SQL statement to populate CTE;

  • statement is an SQL expression that uses CTE.

You can also define several CTEs with a single WITH clause.

The SELECT statement in CTE

The example below displays sale totals per book in the top sale store. The WITH clause defines two auxiliary statements named store_sales and top_stores. The store_sales output is used in top_stores, and the top_stores output is used in the parent SELECT query.

WITH store_sales AS (
    SELECT store, SUM(amount) AS total_sales
    FROM orders
    GROUP BY store
), top_stores AS (
    SELECT store
    FROM store_sales
    WHERE total_sales > (SELECT SUM(total_sales)/10 FROM store_sales)
)
SELECT store,
       book,
       SUM(quantity) AS book_quantity,
       SUM(amount) AS book_sales
FROM orders
WHERE store IN (SELECT store FROM top_stores)
GROUP BY store, book;

If rewrite this query without WITH, it will be a complex statement with two levels of nested SELECT subqueries.

Recursive queries

Recursive queries are used to process hierarchical or tree-structured data. You can use the RECURSIVE modifier in a WITH query to refer to the query’s own output.

A recursive CTE has the following syntax:

WITH RECURSIVE <CTE_name> AS(
    <CTE_query_definition> -- non-recursive term
    UNION [ALL]
    <CTE_query definion> -- recursive term

A recursive WITH query contains a non-recursive term, UNION (or UNION ALL), and a recursive term. Only the recursive term can contain a reference to the query’s own output.

PostgreSQL executes a recursive query in the following sequence:

  1. Execute the non-recursive term to create the base result set (R0).

  2. Execute the recursive term with Ri as an input to return the result set Ri+1 as the output.

  3. Repeat step 2 until an empty set is returned.

  4. Return the final result set that is a UNION or UNION ALL of the result sets R0, R1,…​ Rn.

Create a new table to demonstrate a recursive query:

CREATE TABLE employees (
employee_id serial PRIMARY KEY,
full_name VARCHAR NOT NULL,
manager_id INT
);

Insert data to the table:

INSERT INTO employees (employee_id, full_name, manager_id)
    VALUES
    (1, 'James Wilson', NULL),
    (2, 'Mary Burton', 1),
    (3, 'Patricia Robinson', 1),
    (4, 'Robert Gray', 1),
    (5, 'Elizabeth Tucker', 2),
    (6, 'Joseph Lewis', 2),
    (7, 'William Ferguson', 2),
    (8, 'Linda Black', 3),
    (9, 'David Green', 3),
    (10, 'Daniel Gray', 5),
    (11, 'Mark Armstrong', 4),
    (12, 'Donald Carter', 7),
    (13, 'Elizabeth Collins', 7),
    (14, 'Paul Brown', 8),
    (15, 'Andrew Clarke', 8);

The following recursive query returns all the subordinates of the manager with employee_id equal to 2:

WITH RECURSIVE subordinates (employee_id, full_name, manager_id) AS (
    SELECT employee_id, manager_id, full_name
    FROM employees WHERE employee_id = 2
    UNION
        SELECT e.employee_id, e.manager_id, e.full_name
        FROM employees e
    INNER JOIN subordinates s ON s.employee_id = e.manager_id
)
SELECT * FROM subordinates;

The non-recursive term returns the base result set (R0):

 employee_id | manager_id |  full_name
-------------+------------+-------------
           2 |          1 | Mary Burton

The recursive term returns the subordinates of the employee from the first result. This is the JOIN result of the employees table and the subordinates CTE. The first iteration of the recursive term returns the following result:

 employee_id | manager_id |    full_name
-------------+------------+------------------
           5 |          2 | Elizabeth Tucker
           6 |          2 | Joseph Lewis
           7 |          2 | William Ferguson

The second iteration uses the result set above as the input, and returns the result set:

 employee_id | manager_id |     full_name
-------------+------------+-------------------
          12 |          7 | Donald Carter
          13 |          7 | Elizabeth Collins
          10 |          5 | Daniel Gray

The third iteration returns an empty result set because employees with employee_id 10, 12, and 13 do not have subordinates.

The final result set is the union of all result sets in the first and second iterations generated by the non-recursive and recursive terms.

 employee_id | manager_id |     full_name
-------------+------------+-------------------
           2 |          1 | Mary Burton
           5 |          2 | Elizabeth Tucker
           6 |          2 | Joseph Lewis
           7 |          2 | William Ferguson
          12 |          7 | Donald Carter
          13 |          7 | Elizabeth Collins
          10 |          5 | Daniel Gray

Search order

If you need to get the results in the depth-first or breadth-first order, you can use the SEARCH clause.

Add SEARCH DEPTH FIRST BY to display the result in the depth-first order. As a result, a column that contains the path to the row is added to the output.

WITH RECURSIVE subordinates(employee_id, manager_id, full_name) AS (
    SELECT employee_id, manager_id, full_name
    FROM employees WHERE employee_id = 2
    UNION
        SELECT e.employee_id, e.manager_id, e.full_name
        FROM employees e
    INNER JOIN subordinates s ON s.employee_id = e.manager_id
) SEARCH DEPTH FIRST BY employee_id SET ordercol
SELECT * FROM subordinates ORDER BY ordercol;

The result:

 employee_id | manager_id |     full_name     |    ordercol
-------------+------------+-------------------+----------------
           2 |          1 | Mary Burton       | {(2)}
           5 |          2 | Elizabeth Tucker  | {(2),(5)}
          10 |          5 | Daniel Gray       | {(2),(5),(10)}
           6 |          2 | Joseph Lewis      | {(2),(6)}
           7 |          2 | William Ferguson  | {(2),(7)}
          12 |          7 | Donald Carter     | {(2),(7),(12)}
          13 |          7 | Elizabeth Collins | {(2),(7),(13)}

Use SEARCH BREADTH FIRST BY to display the result in the breadth-first order. As a result, a column that contains the nested level of the recursive query is added to the output.

WITH RECURSIVE subordinates(employee_id, manager_id, full_name) AS (
    SELECT employee_id, manager_id, full_name
    FROM employees WHERE employee_id = 2
    UNION
        SELECT e.employee_id, e.manager_id, e.full_name
        FROM employees e
    INNER JOIN subordinates s ON s.employee_id = e.manager_id
) SEARCH BREADTH FIRST BY employee_id SET ordercol
SELECT * FROM subordinates ORDER BY ordercol;

The result:

 employee_id | manager_id |     full_name     | ordercol
-------------+------------+-------------------+----------
           2 |          1 | Mary Burton       | (0,2)
           5 |          2 | Elizabeth Tucker  | (1,5)
           6 |          2 | Joseph Lewis      | (1,6)
           7 |          2 | William Ferguson  | (1,7)
          10 |          5 | Daniel Gray       | (2,10)
          12 |          7 | Donald Carter     | (2,12)
          13 |          7 | Elizabeth Collins | (2,13)

Cycle Detection

When working with recursive queries, it is important to make sure that the recursive part of the query eventually returns no items, otherwise the query loops indefinitely. You can use the CYCLE clause to detect cycles.

WITH RECURSIVE subordinates(employee_id, manager_id, full_name) AS (
    SELECT employee_id, manager_id, full_name
    FROM employees WHERE employee_id=2
    UNION
        SELECT e.employee_id, e.manager_id, e.full_name
        FROM employees e
    INNER JOIN subordinates s ON s.employee_id = e.manager_id
)  CYCLE employee_id SET is_cycle USING path
SELECT * FROM subordinates;

The CYCLE clause contains the list of columns to track for cycle detection (employee_id — in the example above) a name of column that shows whether a cycle is detected (is_cycle), and the name of another column that tracks the path (path). The is_cycle and path columns are added to the query output.

The result:

 employee_id | manager_id |     full_name     | is_cycle |      path
-------------+------------+-------------------+----------+----------------
           2 |          1 | Mary Burton       | f        | {(2)}
           5 |          2 | Elizabeth Tucker  | f        | {(2),(5)}
           6 |          2 | Joseph Lewis      | f        | {(2),(6)}
           7 |          2 | William Ferguson  | f        | {(2),(7)}
          12 |          7 | Donald Carter     | f        | {(2),(7),(12)}
          13 |          7 | Elizabeth Collins | f        | {(2),(7),(13)}
          10 |          5 | Daniel Gray       | f        | {(2),(5),(10)}
TIP
A query can include both SEARCH and CYCLE clauses. The cycle path column is computed in the same way as the depth-first ordering column. The SEARCH DEPTH FIRST BY and CYCLE clauses used together create redundant computations. To avoid it, use the CYCLE clause and order by the path column. If you need to apply the breadth-first order, specify both SEARCH and CYCLE clauses.

CTE materialization

You can define whether the result of the СTE subquery (starting with AS) is materialized. It means that the subquery is executed separately from the parent query and only once during its execution, and its result is written to a temporary table stored in memory. It can be useful if this subquery contains resource intensive calculations.

Use the MATERIALIZED modifier to force the subquery separate calculation from the parent query:

WITH employees_data AS MATERIALIZED (
    SELECT * FROM employees
)
SELECT full_name FROM employees_data WHERE employee_id = 5;

The following subqueries are materialized by default:

  • subqueries from recursive CTE queries;

  • subqueries that are referenced more than once.

Since a materialized subquery is executed and optimized separately from the parent query it may cause the performance problems. That is why the default behavior was changed starting with PostgreSQL 12. In recent PostgreSQL versions, CTE subqueries referenced once are not materialized. They are merged into the parent queries and the optimizer processes both query simultaneously. For example, the optimizer can combine conditions from two queries into one filter to search by index.

You can use the NOT MATERIALIZED modifier to force the joint optimization of the subquery and parent query:

WITH employees_data AS NOT MATERIALIZED (
    SELECT * FROM employees
)
SELECT (
	SELECT full_name FROM employees_data WHERE employee_id = 5),
    (SELECT full_name FROM employees_data WHERE employee_id = 6);

Note, the NOT MATERIALIZED modifier has no effect on recursive CTE queries, they are always materialized.

Data-modifying statements

You can use the INSERT, UPDATE, and DELETE statements in CTEs. It allows you to perform multiple different operations in the same query. The example below moves two rows from employees to retired_employees:

WITH moved_rows AS (
    DELETE FROM employees
    WHERE
        employee_id = 4 OR
        employee_id = 5
    RETURNING *
)
INSERT INTO retired_employee
SELECT * FROM moved_rows;

DELETE removes the specified rows from employees, returns their content by the RETURNING clause. The parent query reads the subquery output and inserts it into retired_employee.

Data-modifying statements in WITH should contain the RETURNING clause to refer to their result sets in the parent query.

Note that the CTE statement execution order is not determined. The query below returns not modified data from the employees table, because the UPDATE statement is executed later than SELECT.

WITH employees_data AS (
    UPDATE employees SET manager_id = 1 WHERE employee_id =10
    RETURNING *
)
SELECT * FROM employees;

To get the modified data, refer to the subquery result:

WITH employees_data AS (
    UPDATE employees SET manager_id = 1 WHERE employee_id =10
    RETURNING *
)
SELECT * FROM employees_data;

Recursive self-references in data-modifying statements are not allowed. You can include data-modifying statement in the parent query. The following recursive query deletes all subordinates of the manager with the id 2 and this manager from employees:

WITH RECURSIVE subordinates (employee_id, full_name, manager_id) AS (
    SELECT employee_id, manager_id, full_name
    FROM employees WHERE employee_id = 2
    UNION
        SELECT e.employee_id, e.manager_id, e.full_name
        FROM employees e
    INNER JOIN subordinates s ON s.employee_id = e.manager_id
)
DELETE FROM employees
  WHERE employee_id IN (SELECT employee_id FROM subordinates);
Found a mistake? Seleсt text and press Ctrl+Enter to report it