Window functions
A window function performs a calculation across a set of rows: all rows in a query are divided into groups (windows), and each group has its own aggregates. It does not return a single output row as an aggregate function. A window function adds an aggregated value to each row of the selection result in a separate column.
To follow the examples in this article and see how window functions work, first create the wnd_func_table
table:
CREATE TABLE wnd_func_table (group_id Int32, sort_id Int32, value Int32) ENGINE = MergeTree ORDER BY group_id;
INSERT INTO wnd_func_table FORMAT Values
(1,1,10), (1,2,20), (1,3,30), (1,4,40), (1,5,50),
(2,1,1), (2,2,2), (2,3,3), (2,4,4), (2,4,5), (2,4,6), (2,5,7), (2,6,8);
┌─group_id─┬─sort_id─┬─value─┐ │ 1 │ 1 │ 10 │ │ 1 │ 2 │ 20 │ │ 1 │ 3 │ 30 │ │ 1 │ 4 │ 40 │ │ 1 │ 5 │ 50 │ │ 2 │ 1 │ 1 │ │ 2 │ 2 │ 2 │ │ 2 │ 3 │ 3 │ │ 2 │ 4 │ 4 │ │ 2 │ 4 │ 5 │ │ 2 │ 4 │ 6 │ │ 2 │ 5 │ 7 │ │ 2 │ 6 │ 8 │ └──────────┴─────────┴───────┘
Syntax overview
The basic syntax for using a window function can be written as follows:
SELECT <window_function_name>([<argument_list>]) OVER {() | <window_name> | (<window_spec>)}
FROM <table_name>
WINDOW <window_name> AS (<window_spec>);
Use the OVER
clause to define a window for a function — a set of rows over which the function is calculated. After the OVER
keyword, you can specify:
-
()
— empty parentheses, if you want to pass all rows from a dataset to a window function;Example
The following query reads all rows from thewnd_func_table
table and adds row numbers and the sum of all values in thevalue
column to the additionalnumber
andsum
columns:SELECT group_id, sort_id, value, row_number() OVER() AS number, sum(value) OVER() AS sum FROM wnd_func_table;
┌─group_id─┬─sort_id─┬─value─┬─number─┬─sum─┐ │ 1 │ 1 │ 10 │ 1 │ 186 │ │ 1 │ 2 │ 20 │ 2 │ 186 │ │ 1 │ 3 │ 30 │ 3 │ 186 │ │ 1 │ 4 │ 40 │ 4 │ 186 │ │ 1 │ 5 │ 50 │ 5 │ 186 │ │ 2 │ 1 │ 1 │ 6 │ 186 │ │ 2 │ 2 │ 2 │ 7 │ 186 │ │ 2 │ 3 │ 3 │ 8 │ 186 │ │ 2 │ 4 │ 4 │ 9 │ 186 │ │ 2 │ 4 │ 5 │ 10 │ 186 │ │ 2 │ 4 │ 6 │ 11 │ 186 │ │ 2 │ 5 │ 7 │ 12 │ 186 │ │ 2 │ 6 │ 8 │ 13 │ 186 │ └──────────┴─────────┴───────┴────────┴─────┘
-
<window_name>
— name assigned to a window defined via theWINDOW
clause within a query (in other words, theWINDOW
clause allows you to describe a window once and then reuse it for multiple window functions in theSELECT
query); -
<window_spec>
— window specification in parentheses as a set of optional clauses (see the syntax below).
TIP
If you need to filter your query data based on values of window functions, calculate window functions within CTE or subquery.
|
Window specification
Below is the basic syntax of the window specification (<window_spec>
):
[<window_name>]
[PARTITION BY <partition_column> [, ...]]
[ORDER BY <order_column> [ASC|DESC] [, ...]]
[{RANGE | ROWS} {<frame_start> | BETWEEN <frame_start> AND <frame_end>}]
where:
-
PARTITION BY
— column that contains values used to divide rows into windows. A window function calculates a value for each row of the source dataset over rows of the window to which this row belongs. If you omitPARTITION BY
in a query, a window function is evaluated over all rows.Example
The following query uses thePARTITION BY
clause to divide data rows into windows by values of thegroup_id
column:SELECT group_id, sort_id, value, row_number() OVER (w) AS number, sum(value) OVER (w) AS sum FROM wnd_func_table WINDOW w AS (PARTITION BY group_id);
The
row_number
andsum
functions are calculated from data of each window separately (in this example, a window is a set of rows with the same values in thegroup_id
column):┌─group_id─┬─sort_id─┬─value─┬─number─┬─sum─┐ │ 1 │ 1 │ 10 │ 1 │ 150 │ │ 1 │ 2 │ 20 │ 2 │ 150 │ │ 1 │ 3 │ 30 │ 3 │ 150 │ │ 1 │ 4 │ 40 │ 4 │ 150 │ │ 1 │ 5 │ 50 │ 5 │ 150 │ │ 2 │ 1 │ 1 │ 1 │ 36 │ │ 2 │ 2 │ 2 │ 2 │ 36 │ │ 2 │ 3 │ 3 │ 3 │ 36 │ │ 2 │ 4 │ 4 │ 4 │ 36 │ │ 2 │ 4 │ 5 │ 5 │ 36 │ │ 2 │ 4 │ 6 │ 6 │ 36 │ │ 2 │ 5 │ 7 │ 7 │ 36 │ │ 2 │ 6 │ 8 │ 8 │ 36 │ └──────────┴─────────┴───────┴────────┴─────┘
-
ORDER BY
— column that contains values used to order rows inside a window during a window function calculation.If
ORDER BY
is specified, the function is calculated for rows from the beginning of the window to the current row (including all rows with the same value in the field specified in theORDER BY
clause) according to the specified sort order. That is, for each row in the source dataset, it is selected a frame — a set of rows over which the function should be calculated (for example, aggregate functions used as window functions operate with frame rows, rather than with the entire window). You can use the RANGE or ROWS clause to specify another rule for defining a frame.Example
The following query sets sorting by values of thesort_id
column:SELECT group_id, sort_id, value, row_number() OVER (w) AS number, groupArray(value) OVER (w) AS frame_values, sum(value) OVER (w) AS sum FROM wnd_func_table WINDOW w AS (PARTITION BY group_id ORDER BY sort_id ASC);
The output table includes the following additional columns with results of window functions:
-
number
— row number within a window; -
frame_values
— thevalue
column’s values in rows that belong to the frame for the current row; -
sum
— sum of thevalue
column’s values in rows included into the frame to which the current row belongs (in this example — each value is summed up with all previous ones).
┌─group_id─┬─sort_id─┬─value─┬─number─┬─frame_values──────┬─sum─┐ │ 1 │ 1 │ 10 │ 1 │ [10] │ 10 │ │ 1 │ 2 │ 20 │ 2 │ [10,20] │ 30 │ │ 1 │ 3 │ 30 │ 3 │ [10,20,30] │ 60 │ │ 1 │ 4 │ 40 │ 4 │ [10,20,30,40] │ 100 │ │ 1 │ 5 │ 50 │ 5 │ [10,20,30,40,50] │ 150 │ │ 2 │ 1 │ 1 │ 1 │ [1] │ 1 │ │ 2 │ 2 │ 2 │ 2 │ [1,2] │ 3 │ │ 2 │ 3 │ 3 │ 3 │ [1,2,3] │ 6 │ │ 2 │ 4 │ 4 │ 4 │ [1,2,3,4,5,6] │ 21 │ │ 2 │ 4 │ 5 │ 5 │ [1,2,3,4,5,6] │ 21 │ │ 2 │ 4 │ 6 │ 6 │ [1,2,3,4,5,6] │ 21 │ │ 2 │ 5 │ 7 │ 7 │ [1,2,3,4,5,6,7] │ 28 │ │ 2 │ 6 │ 8 │ 8 │ [1,2,3,4,5,6,7,8] │ 36 │ └──────────┴─────────┴───────┴────────┴───────────────────┴─────┘
The sort order of rows in a window is not related to the query-level sort order.
-
ROWS
The ROWS
clause allows you to define a frame by specifying a fixed number of rows preceding or following the current row in one of the following ways:
ROWS <frame_start>
ROWS BETWEEN <frame_start> AND <frame_end>
Bounds of a frame (<frame_start>
and <frame_end>
) can be defined with the following expressions.
CURRENT ROW |
Frame starts or ends with the current row |
UNBOUNDED PRECEDING |
Frame starts with the first row of the window |
UNBOUNDED FOLLOWING |
Frame ends with the last row of the window |
<N> PRECEDING |
Frame includes |
<M> FOLLOWING |
Frame includes |
If <frame_end>
is omitted, a frame ends with CURRENT ROW
by default.
A frame includes the current row and two previous ones:
SELECT
group_id,
sort_id,
value,
groupArray(value) OVER (PARTITION BY group_id ORDER BY sort_id ASC
ROWS 2 PRECEDING) AS frame_values
FROM wnd_func_table;
The frame_values
column contains values of the value
column in all rows that fall into a frame defined for each row separately:
┌─group_id─┬─sort_id─┬─value─┬─frame_values─┐ │ 1 │ 1 │ 10 │ [10] │ │ 1 │ 2 │ 20 │ [10,20] │ │ 1 │ 3 │ 30 │ [10,20,30] │ │ 1 │ 4 │ 40 │ [20,30,40] │ │ 1 │ 5 │ 50 │ [30,40,50] │ │ 2 │ 1 │ 1 │ [1] │ │ 2 │ 2 │ 2 │ [1,2] │ │ 2 │ 3 │ 3 │ [1,2,3] │ │ 2 │ 4 │ 4 │ [2,3,4] │ │ 2 │ 4 │ 5 │ [3,4,5] │ │ 2 │ 4 │ 6 │ [4,5,6] │ │ 2 │ 5 │ 7 │ [5,6,7] │ │ 2 │ 6 │ 8 │ [6,7,8] │ └──────────┴─────────┴───────┴──────────────┘
A frame includes the current row and all subsequent rows until the end of the window:
SELECT
group_id,
sort_id,
value,
groupArray(value) OVER (PARTITION BY group_id ORDER BY sort_id ASC
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS frame_values
FROM wnd_func_table;
┌─group_id─┬─sort_id─┬─value─┬─frame_values──────┐ │ 1 │ 1 │ 10 │ [10,20,30,40,50] │ │ 1 │ 2 │ 20 │ [20,30,40,50] │ │ 1 │ 3 │ 30 │ [30,40,50] │ │ 1 │ 4 │ 40 │ [40,50] │ │ 1 │ 5 │ 50 │ [50] │ │ 2 │ 1 │ 1 │ [1,2,3,4,5,6,7,8] │ │ 2 │ 2 │ 2 │ [2,3,4,5,6,7,8] │ │ 2 │ 3 │ 3 │ [3,4,5,6,7,8] │ │ 2 │ 4 │ 4 │ [4,5,6,7,8] │ │ 2 │ 4 │ 5 │ [5,6,7,8] │ │ 2 │ 4 │ 6 │ [6,7,8] │ │ 2 │ 5 │ 7 │ [7,8] │ │ 2 │ 6 │ 8 │ [8] │ └──────────┴─────────┴───────┴───────────────────┘
RANGE
The RANGE
clause is also intended to specify a window frame. Unlike ROWS
that operates with physical rows, RANGE
operates with row ranges defined by the ORDER BY
clause. This means that rows with the same values in ORDER BY
columns are counted as one row. For the ROWS
clause, each physical record in the dataset is considered a separate row.
There are two ways to use the RANGE
clause:
RANGE <frame_start>
RANGE BETWEEN <frame_start> AND <frame_end>
You can use expressions listed in the table below to set frame bounds (<frame_start>
and <frame_end>
).
CURRENT ROW |
Frame starts or ends with a range of rows with the same values in |
UNBOUNDED PRECEDING |
Frame starts with the first row of the window |
UNBOUNDED FOLLOWING |
Frame ends with the last row of the window |
<offset> PRECEDING |
Frame includes rows with the |
<offset> FOLLOWING |
Frame includes rows with the |
If <frame_end>
is omitted, a frame ends with CURRENT ROW
by default.
If ORDER BY
is specified in a query, but the frame is not explicitly described, the default frame is applied: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
.
A frame includes only the current row:
SELECT
group_id,
sort_id,
value,
groupArray(value) OVER (PARTITION BY group_id ORDER BY sort_id ASC
RANGE BETWEEN CURRENT ROW AND CURRENT ROW) AS frame_values
FROM wnd_func_table;
In the second window (group_id=2
), a range of rows with the same value in the sort_id
field is treated as a single row:
┌─group_id─┬─sort_id─┬─value─┬─frame_values─┐ │ 1 │ 1 │ 10 │ [10] │ │ 1 │ 2 │ 20 │ [20] │ │ 1 │ 3 │ 30 │ [30] │ │ 1 │ 4 │ 40 │ [40] │ │ 1 │ 5 │ 50 │ [50] │ │ 2 │ 1 │ 1 │ [1] │ │ 2 │ 2 │ 2 │ [2] │ │ 2 │ 3 │ 3 │ [3] │ │ 2 │ 4 │ 4 │ [4,5,6] │ │ 2 │ 4 │ 5 │ [4,5,6] │ │ 2 │ 4 │ 6 │ [4,5,6] │ │ 2 │ 5 │ 7 │ [7] │ │ 2 │ 6 │ 8 │ [8] │ └──────────┴─────────┴───────┴──────────────┘
A frame for each row includes rows where the number
value (the ORDER BY
sort field) is at most 10
less than and at most 5
greater than the number
value in the current row:
SELECT
number,
groupArray(number) OVER (ORDER BY number ASC
RANGE BETWEEN 10 PRECEDING AND 5 FOLLOWING) AS frame_values
FROM values('number Int8', 10, 20, 25, 27, 30, 40, 15, 50, 60, 7, 5, 2);
In the resulting table, the frame_values
column contains arrays of number
values for the frame of each row:
┌─number─┬─frame_values─────┐ │ 2 │ [2,5,7] │ │ 5 │ [2,5,7,10] │ │ 7 │ [2,5,7,10] │ │ 10 │ [2,5,7,10,15] │ │ 15 │ [5,7,10,15,20] │ │ 20 │ [10,15,20,25] │ │ 25 │ [15,20,25,27,30] │ │ 27 │ [20,25,27,30] │ │ 30 │ [20,25,27,30] │ │ 40 │ [30,40] │ │ 50 │ [40,50] │ │ 60 │ [50,60] │ └────────┴──────────────────┘
Function types
Window functions can be combined into the following groups:
-
Aggregate — aggregate functions (for example,
sum
,avg
,count
,min
,max
) can be used as window functions. -
Ranking — functions that rank rows in a window. For example, such functions can be used to assign a sequence number to a row or to rank rows.
Function Return value row_number()
Row number within a window. Numbering starts from
1
and does not depend on whether rows have the same values in theORDER BY
columnrank()
Row rank. The same rank is returned for rows with the same values in the
ORDER BY
column, andN-1
rank values are skipped for the following rows, whereN
is the number of rows in the previous group of rows with the same values in theORDER BY
column (that is, ranking is with gaps)dense_rank()
Row rank. The same rank is returned for rows with the same values in the
ORDER BY
column, and the numeric rank for the following rows is not skipped (that is, ranking is sequential) -
Value functions — functions that allow you to access different rows of a frame relative to the current row, as well as values of rows at the beginning or end of the frame.
Function Return value lagInFrame(value, [N])
leadInFrame(value, [N])
Value of the
value
column from a row that precedes or follows the current row byN
rows within the frame. The default value ofN
is1
first_value(value)
last_value(value)
Value of the
value
column from the first or last row of the framenth_value(value, N)
Value of the
value
column from the N-th row of the frame