Aggregate functions
Overview
Aggregate functions compute a single result from a set of input values. For example, you can compute the count, sum, average, maximum, and minimum over a set of rows.
Example:
SELECT MIN(public_year) FROM book;
The result:
min ------ 1869
To utilize an aggregate function in the WHERE
clause, use a subquery. The following example returns the title of the oldest book:
SELECT title FROM book WHERE public_year = (SELECT min(public_year) FROM book);
The result:
title --------------- War and Peace
You can also combine aggregate functions with the GROUP BY
clauses.
For example, we can get the oldest book of each genre:
SELECT genre, min(public_year) public_year FROM book GROUP BY genre;
The result:
genre | public_year -----------------+------------- science fiction | 1895 fantasy | 1937 novel | 1869
You can use the HAVING
clause to filter the result set:
SELECT genre, min(public_year) public_year
FROM book GROUP BY genre HAVING min(public_year) < 1900;
The result:
genre | min -----------------+------ science fiction | 1895 novel | 1869
NOTE
There is a fundamental difference between the WHERE and HAVING clauses. WHERE selects input rows before evaluating groups and aggregate functions, it determines which rows are included in aggregate calculations. HAVING selects group rows after groups and aggregate functions are calculated. The WHERE clause cannot contain aggregate functions, and the HAVING clause always contains aggregate functions.
If you need to use additional conditions, add them to the WHERE clause, not HAVING . This approach avoids aggregation calculations for rows that do not meet the additional conditions.
|
To obtain the full list of predefined aggregate functions, refer to the following link: Aggregate functions.
Aggregate expression syntax
The syntax of an aggregate expression can be one of the following:
-
The first form of aggregate expression invokes the aggregate function once for each input row according to the specified expression:
<aggregate_name> (<expression> [ , ... ] [<order_by_clause>] ) [ FILTER ( WHERE <filter_clause> ) ]
Where:
-
aggregate_name is an aggregate function;
-
expression is a value expression that does not contain any aggregate expression or window function call;
-
order_by_clause is a clause that specifies the sort order (optional);
-
filter_clause is a clause that defines a filter (optional).
Example:
SELECT MIN(public_year) FROM book;
-
-
The second form has the
ALL
modifier. SinceALL
is used by default, the second form is the same as the first.<aggregate_name> (ALL <expression> [ , ... ] [<order_by_clause>] ) [ FILTER (WHERE <filter_clause>) ]
-
The third form invokes the aggregate function once for each distinct value of the expression (or distinct set of values, for multiple expressions).
<aggregate_name> (DISTINCT <expression> [ , ... ] [<order_by_clause>] ) [ FILTER (WHERE <filter_clause>) ]
The following statement returns the number of distinct book titles:
SELECT COUNT (DISTINCT(book_title)) FROM orders;
-
The fourth form invokes the aggregate function once for each row. Since no particular input value is specified, this form only applies to the
COUNT(*)
function.<aggregate_name> ( * ) [ FILTER (WHERE <filter_clause>) ]
-
The fifth form is used with ordered-set aggregate functions (for example,
array_agg
,string_agg
,rank
, andpercentile
). The result of these functions depends on the order of the input rows. Theorder_by_clause
parameter is required for these functions.<aggregate_name> ( [ <expression> [ , ... ] ] ) WITHIN GROUP ( <order_by_clause>) [ FILTER (WHERE <filter_clause>) ]
The
ORDER BY
clause has the same syntax as a query-levelORDER BY
clause, but cannot contain output-column numbers and names.Example:
SELECT array_agg(field1 ORDER BY field2 DESC) FROM table1;
If you call multiple-argument aggregate functions,
ORDER BY
must be placed after all aggregate arguments:SELECT string_agg(field1, ',' ORDER BY field1) FROM table1;
If
DISTINCT
is specified in addition toORDER BY
, all theORDER BY
expressions must match regular arguments of the aggregate function. You cannot sort by an expression that is not included in the DISTINCT list.For an ordered-set aggregate function, you can put
order_by_clause
insideWITHIN GROUP(…)
. The expressions inorder_by_clause
are evaluated once per input row as regular aggregate arguments, sorted according toorder_by_clause
requirements, and passed to the aggregate function as input arguments. Whenorder_by_clause
is not inWITHIN GROUP
, it is not treated as an aggregate function argument. The argument expressions precedingWITHIN GROUP
, if any, are called direct arguments to distinguish them from the aggregated arguments listed in theorder_by_clause
. Direct arguments are evaluated once per aggregate call. They can contain variables only if these variables are grouped byGROUP BY
. Direct arguments are typically used for a single value per aggregation calculation (for example, percentile fractions). If the direct argument list is empty, write()
.Example:
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY income) FROM households;
For all types of aggregate expression listed above, filter_clause
is an optional parameter. If FILTER
is specified, only the input rows for which the filter_clause
evaluates to true
are passed to the aggregate function.
We have the following table:
id | book_title | public_year ----+-------------------------------------+------------- 1 | Mrs. Dalloway | 1925 2 | To the Lighthouse | 1927 3 | To Kill a Mockingbird | 1960 4 | The Great Gatsby | 1925 5 | The Lord of the Rings | 1955 8 | War and Peace | 1869 7 | The Hobbit, or There and Back Again | 1937 9 | Hyperion | 1989 10 | The Time Machine | 1895 6 | 1984 | 1949
Execute query with the specified FILTER
:
SELECT
COUNT(*) AS unfiltered,
COUNT(*) FILTER (WHERE public_year < 1900) AS filtered
FROM book;
The result:
unfiltered | filtered ------------+---------- 10 | 2
Most aggregate functions ignore NULL input, and rows where one or more expressions evaluate to NULL are discarded. This is true for all built-in aggregate functions unless otherwise specified. For example, COUNT(*)
returns the total number of input rows, COUNT(field1)
returns the number of input rows where field1
is not null, and COUNT(DISTINCT field1)
returns the number of distinct non-null values of field1
.