Aggregate functions

Overview

Aggregate functions compute a single result from a set of input values. For example, you can compute the count, sum, average, maximum, and minimum over a set of rows.

Example:

SELECT MIN(public_year) FROM book;

The result:

 min
------
 1869

To utilize an aggregate function in the WHERE clause, use a subquery. The following example returns the title of the oldest book:

SELECT title FROM book WHERE public_year = (SELECT min(public_year) FROM book);

The result:

     title
---------------
 War and Peace

You can also combine aggregate functions with the GROUP BY clauses. For example, we can get the oldest book of each genre:

SELECT genre, min(public_year) public_year FROM book GROUP BY genre;

The result:

      genre      | public_year
-----------------+-------------
 science fiction |        1895
 fantasy         |        1937
 novel           |        1869

You can use the HAVING clause to filter the result set:

SELECT genre, min(public_year) public_year
    FROM book GROUP BY genre HAVING min(public_year) < 1900;

The result:

      genre      | min
-----------------+------
 science fiction | 1895
 novel           | 1869
NOTE
There is a fundamental difference between the WHERE and HAVING clauses. WHERE selects input rows before evaluating groups and aggregate functions, it determines which rows are included in aggregate calculations. HAVING selects group rows after groups and aggregate functions are calculated. The WHERE clause cannot contain aggregate functions, and the HAVING clause always contains aggregate functions.
If you need to use additional conditions, add them to the WHERE clause, not HAVING. This approach avoids aggregation calculations for rows that do not meet the additional conditions.

To obtain the full list of predefined aggregate functions, refer to the following link: Aggregate functions.

Aggregate expression syntax

The syntax of an aggregate expression can be one of the following:

  1. The first form of aggregate expression invokes the aggregate function once for each input row according to the specified expression:

    <aggregate_name> (<expression> [ , ... ] [<order_by_clause>] ) [ FILTER ( WHERE <filter_clause> ) ]

    Where:

    • aggregate_name is an aggregate function;

    • expression is a value expression that does not contain any aggregate expression or window function call;

    • order_by_clause is a clause that specifies the sort order (optional);

    • filter_clause is a clause that defines a filter (optional).

       

      Example:

      SELECT MIN(public_year) FROM book;
  2. The second form has the ALL modifier. Since ALL is used by default, the second form is the same as the first.

    <aggregate_name> (ALL <expression> [ , ... ] [<order_by_clause>] )
        [ FILTER (WHERE <filter_clause>) ]
  3. The third form invokes the aggregate function once for each distinct value of the expression (or distinct set of values, for multiple expressions).

     <aggregate_name> (DISTINCT <expression> [ , ... ] [<order_by_clause>] )
        [ FILTER (WHERE <filter_clause>) ]

    The following statement returns the number of distinct book titles:

    SELECT COUNT (DISTINCT(book_title)) FROM orders;
  4. The fourth form invokes the aggregate function once for each row. Since no particular input value is specified, this form only applies to the COUNT(*) function.

    <aggregate_name> ( * ) [ FILTER (WHERE <filter_clause>) ]
  5. The fifth form is used with ordered-set aggregate functions (for example, array_agg, string_agg, rank, and percentile). The result of these functions depends on the order of the input rows. The order_by_clause parameter is required for these functions.

    <aggregate_name> ( [ <expression> [ , ... ] ] ) WITHIN GROUP ( <order_by_clause>)
        [ FILTER (WHERE <filter_clause>) ]

    The ORDER BY clause has the same syntax as a query-level ORDER BY clause, but cannot contain output-column numbers and names.

    Example:

    SELECT array_agg(field1 ORDER BY field2 DESC) FROM table1;

    If you call multiple-argument aggregate functions, ORDER BY must be placed after all aggregate arguments:

    SELECT string_agg(field1, ',' ORDER BY field1) FROM table1;

    If DISTINCT is specified in addition to ORDER BY, all the ORDER BY expressions must match regular arguments of the aggregate function. You cannot sort by an expression that is not included in the DISTINCT list.

    For an ordered-set aggregate function, you can put order_by_clause inside WITHIN GROUP(…​). The expressions in order_by_clause are evaluated once per input row as regular aggregate arguments, sorted according to order_by_clause requirements, and passed to the aggregate function as input arguments. When order_by_clause is not in WITHIN GROUP, it is not treated as an aggregate function argument. The argument expressions preceding WITHIN GROUP, if any, are called direct arguments to distinguish them from the aggregated arguments listed in the order_by_clause. Direct arguments are evaluated once per aggregate call. They can contain variables only if these variables are grouped by GROUP BY. Direct arguments are typically used for a single value per aggregation calculation (for example, percentile fractions). If the direct argument list is empty, write ().

    Example:

    SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY income) FROM households;

For all types of aggregate expression listed above, filter_clause is an optional parameter. If FILTER is specified, only the input rows for which the filter_clause evaluates to true are passed to the aggregate function.

We have the following table:

 id |            book_title               | public_year
----+-------------------------------------+-------------
  1 | Mrs. Dalloway                       |        1925
  2 | To the Lighthouse                   |        1927
  3 | To Kill a Mockingbird               |        1960
  4 | The Great Gatsby                    |        1925
  5 | The Lord of the Rings               |        1955
  8 | War and Peace                       |        1869
  7 | The Hobbit, or There and Back Again |        1937
  9 | Hyperion                            |        1989
 10 | The Time Machine                    |        1895
  6 | 1984                                |        1949

Execute query with the specified FILTER:

SELECT
    COUNT(*) AS unfiltered,
    COUNT(*) FILTER (WHERE public_year < 1900) AS filtered
FROM book;

The result:

 unfiltered | filtered
------------+----------
         10 |        2

Most aggregate functions ignore NULL input, and rows where one or more expressions evaluate to NULL are discarded. This is true for all built-in aggregate functions unless otherwise specified. For example, COUNT(*) returns the total number of input rows, COUNT(field1) returns the number of input rows where field1 is not null, and COUNT(DISTINCT field1) returns the number of distinct non-null values of field1.

Found a mistake? Seleсt text and press Ctrl+Enter to report it