# 9.20. 彙總函數

彙總函數將一組輸入值計算為單一個結果。[Table 9.52](#table-9-52-general-purpose-aggregate-functions) 中列出了內建的通用彙總函數，[Table 9.53](#table-9-53-aggregate-functions-for-statistics) 中列出了統計的彙總函數。[Table 9.54](#table-9-54-ordered-set-aggregate-functions) 中列出了內建的群組內有序集合函數，而內建的群組內假設集合函數列於 [Table 9.55](#table-9-55-hypothetical-set-aggregate-functions) 中。[Table 9.56](#table-9-56-grouping-operations) 列出了與彙總函數密切相關的分組操作。[第 4.2.7 節](https://docs.postgresql.tw/11/sql-syntax/value-expressions#4-2-7-biao-shi-shi)介紹了彙總函數的特殊語法注意事項。有關其他介紹性資訊，請參閱[第 2.7 節](https://docs.postgresql.tw/11/tutorial/the-sql-language/aggregate-functions)。

#### **Table 9.52. General-Purpose Aggregate Functions**

| Function                                   | Argument Type(s)                                                                           | Return Type                                                                                                                            | Partial Mode | Description                                                                                                                           |
| ------------------------------------------ | ------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------- | ------------ | ------------------------------------------------------------------------------------------------------------------------------------- |
| `array_agg(`*`expression`*)                | any non-array type                                                                         | array of the argument type                                                                                                             | No           | input values, including nulls, concatenated into an array                                                                             |
| `array_agg(`*`expression`*)                | any array type                                                                             | same as argument data type                                                                                                             | No           | input arrays concatenated into array of one higher dimension (inputs must all have same dimensionality, and cannot be empty or NULL)  |
| `avg(`*`expression`*)                      | `smallint`, `int`, `bigint`, `real`, `double precision`, `numeric`, or `interval`          | `numeric` for any integer-type argument, `double precision`for a floating-point argument, otherwise the same as the argument data type | Yes          | the average (arithmetic mean) of all input values                                                                                     |
| `bit_and(`*`expression`*)                  | `smallint`, `int`, `bigint`, or `bit`                                                      | same as argument data type                                                                                                             | Yes          | the bitwise AND of all non-null input values, or null if none                                                                         |
| `bit_or(`*`expression`*)                   | `smallint`, `int`, `bigint`, or `bit`                                                      | same as argument data type                                                                                                             | Yes          | the bitwise OR of all non-null input values, or null if none                                                                          |
| `bool_and(`*`expression`*)                 | `bool`                                                                                     | `bool`                                                                                                                                 | Yes          | true if all input values are true, otherwise false                                                                                    |
| `bool_or(`*`expression`*)                  | `bool`                                                                                     | `bool`                                                                                                                                 | Yes          | true if at least one input value is true, otherwise false                                                                             |
| `count(*)`                                 |                                                                                            | `bigint`                                                                                                                               | Yes          | number of input rows                                                                                                                  |
| `count(`*`expression`*)                    | any                                                                                        | `bigint`                                                                                                                               | Yes          | number of input rows for which the value of *`expression`*&#x69;s not null                                                            |
| `every(`*`expression`*)                    | `bool`                                                                                     | `bool`                                                                                                                                 | Yes          | equivalent to `bool_and`                                                                                                              |
| `json_agg(`*`expression`*)                 | `any`                                                                                      | `json`                                                                                                                                 | No           | aggregates values as a JSON array                                                                                                     |
| `jsonb_agg(`*`expression`*)                | `any`                                                                                      | `jsonb`                                                                                                                                | No           | aggregates values as a JSON array                                                                                                     |
| `json_object_agg(`*`name`*,*`value`*)      | `(any, any)`                                                                               | `json`                                                                                                                                 | No           | aggregates name/value pairs as a JSON object                                                                                          |
| `jsonb_object_agg(`*`name`*,*`value`*)     | `(any, any)`                                                                               | `jsonb`                                                                                                                                | No           | aggregates name/value pairs as a JSON object                                                                                          |
| `max(`*`expression`*)                      | any numeric, string, date/time, network, or enum type, or arrays of these types            | same as argument type                                                                                                                  | Yes          | maximum value of *`expression`* across all input values                                                                               |
| `min(`*`expression`*)                      | any numeric, string, date/time, network, or enum type, or arrays of these types            | same as argument type                                                                                                                  | Yes          | minimum value of *`expression`* across all input values                                                                               |
| `string_agg(`*`expression`*,*`delimiter`*) | (`text`, `text`) or (`bytea`, `bytea`)                                                     | same as argument types                                                                                                                 | No           | input values concatenated into a string, separated by delimiter                                                                       |
| `sum(`*`expression`*)                      | `smallint`, `int`, `bigint`, `real`, `double precision`, `numeric`, `interval`, or `money` | `bigint` for `smallint` or `int` arguments, `numeric` for`bigint` arguments, otherwise the same as the argument data type              | Yes          | sum of *`expression`* across all input values                                                                                         |
| `xmlagg(`*`expression`*)                   | `xml`                                                                                      | `xml`                                                                                                                                  | No           | concatenation of XML values (see also [Section 9.14.1.7](https://www.postgresql.org/docs/11/functions-xml.html#FUNCTIONS-XML-XMLAGG)) |

It should be noted that except for `count`, these functions return a null value when no rows are selected. In particular, `sum` of no rows returns null, not zero as one might expect, and `array_agg` returns null rather than an empty array when there are no input rows. The `coalesce` function can be used to substitute zero or an empty array for null when necessary.

Aggregate functions which support *Partial Mode* are eligible to participate in various optimizations, such as parallel aggregation.

#### Note

Boolean aggregates `bool_and` and `bool_or` correspond to standard SQL aggregates `every` and `any` or `some`. As for `any` and `some`, it seems that there is an ambiguity built into the standard syntax:

```
SELECT b1 = ANY((SELECT b2 FROM t2 ...)) FROM t1 ...;
```

Here `ANY` can be considered either as introducing a subquery, or as being an aggregate function, if the subquery returns one row with a Boolean value. Thus the standard name cannot be given to these aggregates.

#### Note

Users accustomed to working with other SQL database management systems might be disappointed by the performance of the `count` aggregate when it is applied to the entire table. A query like:

```
SELECT count(*) FROM sometable;
```

will require effort proportional to the size of the table: PostgreSQL will need to scan either the entire table or the entirety of an index which includes all rows in the table.

The aggregate functions `array_agg`, `json_agg`, `jsonb_agg`, `json_object_agg`, `jsonb_object_agg`, `string_agg`, and `xmlagg`, as well as similar user-defined aggregate functions, produce meaningfully different result values depending on the order of the input values. This ordering is unspecified by default, but can be controlled by writing an `ORDER BY` clause within the aggregate call, as shown in [Section 4.2.7](https://www.postgresql.org/docs/11/sql-expressions.html#SYNTAX-AGGREGATES). Alternatively, supplying the input values from a sorted subquery will usually work. For example:

```
SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
```

Beware that this approach can fail if the outer query level contains additional processing, such as a join, because that might cause the subquery's output to be reordered before the aggregate is computed.

[Table 9.53](https://www.postgresql.org/docs/11/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE) shows aggregate functions typically used in statistical analysis. (These are separated out merely to avoid cluttering the listing of more-commonly-used aggregates.) Where the description mentions *`N`*, it means the number of input rows for which all the input expressions are non-null. In all cases, null is returned if the computation is meaningless, for example when *`N`* is zero.

#### **Table 9.53. Aggregate Functions for Statistics**

| Function                       | Argument Type                                                         | Return Type                                                          | Partial Mode | Description                                                                                                      |
| ------------------------------ | --------------------------------------------------------------------- | -------------------------------------------------------------------- | ------------ | ---------------------------------------------------------------------------------------------------------------- |
| `corr(`*`Y`*, *`X`*)           | `double precision`                                                    | `double precision`                                                   | Yes          | correlation coefficient                                                                                          |
| `covar_pop(`*`Y`*, *`X`*)      | `double precision`                                                    | `double precision`                                                   | Yes          | population covariance                                                                                            |
| `covar_samp(`*`Y`*, *`X`*)     | `double precision`                                                    | `double precision`                                                   | Yes          | sample covariance                                                                                                |
| `regr_avgx(`*`Y`*, *`X`*)      | `double precision`                                                    | `double precision`                                                   | Yes          | average of the independent variable (`sum(`*`X`*)/*`N`*)                                                         |
| `regr_avgy(`*`Y`*, *`X`*)      | `double precision`                                                    | `double precision`                                                   | Yes          | average of the dependent variable (`sum(`*`Y`*)/*`N`*)                                                           |
| `regr_count(`*`Y`*, *`X`*)     | `double precision`                                                    | `bigint`                                                             | Yes          | number of input rows in which both expressions are nonnull                                                       |
| `regr_intercept(`*`Y`*, *`X`*) | `double precision`                                                    | `double precision`                                                   | Yes          | y-intercept of the least-squares-fit linear equation determined by the (*`X`*, *`Y`*) pairs                      |
| `regr_r2(`*`Y`*, *`X`*)        | `double precision`                                                    | `double precision`                                                   | Yes          | square of the correlation coefficient                                                                            |
| `regr_slope(`*`Y`*, *`X`*)     | `double precision`                                                    | `double precision`                                                   | Yes          | slope of the least-squares-fit linear equation determined by the (*`X`*, *`Y`*) pairs                            |
| `regr_sxx(`*`Y`*, *`X`*)       | `double precision`                                                    | `double precision`                                                   | Yes          | `sum(`*`X`*^2) - sum(*`X`*)^2/*`N`* (“sum of squares” of the independent variable)                               |
| `regr_sxy(`*`Y`*, *`X`*)       | `double precision`                                                    | `double precision`                                                   | Yes          | `sum(`*`X`*\**`Y`*) - sum(*`X`*) \* sum(*`Y`*)/*`N`* (“sum of products” of independent times dependent variable) |
| `regr_syy(`*`Y`*, *`X`*)       | `double precision`                                                    | `double precision`                                                   | Yes          | `sum(`*`Y`*^2) - sum(*`Y`*)^2/*`N`* (“sum of squares” of the dependent variable)                                 |
| `stddev(`*`expression`*)       | `smallint`, `int`, `bigint`, `real`, `double precision`, or `numeric` | `double precision` for floating-point arguments, otherwise `numeric` | Yes          | historical alias for `stddev_samp`                                                                               |
| `stddev_pop(`*`expression`*)   | `smallint`, `int`, `bigint`, `real`, `double precision`, or `numeric` | `double precision` for floating-point arguments, otherwise `numeric` | Yes          | population standard deviation of the input values                                                                |
| `stddev_samp(`*`expression`*)  | `smallint`, `int`, `bigint`, `real`, `double precision`, or `numeric` | `double precision` for floating-point arguments, otherwise `numeric` | Yes          | sample standard deviation of the input values                                                                    |
| `variance`(*`expression`*)     | `smallint`, `int`, `bigint`, `real`, `double precision`, or `numeric` | `double precision` for floating-point arguments, otherwise `numeric` | Yes          | historical alias for `var_samp`                                                                                  |
| `var_pop`(*`expression`*)      | `smallint`, `int`, `bigint`, `real`, `double precision`, or `numeric` | `double precision` for floating-point arguments, otherwise `numeric` | Yes          | population variance of the input values (square of the population standard deviation)                            |
| `var_samp`(*`expression`*)     | `smallint`, `int`, `bigint`, `real`, `double precision`, or `numeric` | `double precision` for floating-point arguments, otherwise `numeric` | Yes          | sample variance of the input values (square of the sample standard deviation)                                    |

[Table 9.54](https://www.postgresql.org/docs/11/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE) shows some aggregate functions that use the *ordered-set aggregate* syntax. These functions are sometimes referred to as “inverse distribution” functions.

#### **Table 9.54. Ordered-Set Aggregate Functions**

| Function                                                                     | Direct Argument Type(s) | Aggregated Argument Type(s)     | Return Type                     | Partial Mode | Description                                                                                                                                                                                              |
| ---------------------------------------------------------------------------- | ----------------------- | ------------------------------- | ------------------------------- | ------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `mode() WITHIN GROUP (ORDER BY`*`sort_expression`*)                          |                         | any sortable type               | same as sort expression         | No           | returns the most frequent input value (arbitrarily choosing the first one if there are multiple equally-frequent results)                                                                                |
| `percentile_cont(`*`fraction`*) WITHIN GROUP (ORDER BY *`sort_expression`*)  | `double precision`      | `double precision` or`interval` | same as sort expression         | No           | continuous percentile: returns a value corresponding to the specified fraction in the ordering, interpolating between adjacent input items if needed                                                     |
| `percentile_cont(`*`fractions`*) WITHIN GROUP (ORDER BY *`sort_expression`*) | `double precision[]`    | `double precision` or`interval` | array of sort expression's type | No           | multiple continuous percentile: returns an array of results matching the shape of the *`fractions`*&#x70;arameter, with each non-null element replaced by the value corresponding to that percentile     |
| `percentile_disc(`*`fraction`*) WITHIN GROUP (ORDER BY *`sort_expression`*)  | `double precision`      | any sortable type               | same as sort expression         | No           | discrete percentile: returns the first input value whose position in the ordering equals or exceeds the specified fraction                                                                               |
| `percentile_disc(`*`fractions`*) WITHIN GROUP (ORDER BY *`sort_expression`*) | `double precision[]`    | any sortable type               | array of sort expression's type | No           | multiple discrete percentile: returns an array of results matching the shape of the *`fractions`*&#x70;arameter, with each non-null element replaced by the input value corresponding to that percentile |

All the aggregates listed in [Table 9.54](https://www.postgresql.org/docs/11/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE) ignore null values in their sorted input. For those that take a *`fraction`* parameter, the fraction value must be between 0 and 1; an error is thrown if not. However, a null fraction value simply produces a null result.

Each of the aggregates listed in [Table 9.55](https://www.postgresql.org/docs/11/functions-aggregate.html#FUNCTIONS-HYPOTHETICAL-TABLE) is associated with a window function of the same name defined in [Section 9.21](https://www.postgresql.org/docs/11/functions-window.html). In each case, the aggregate result is the value that the associated window function would have returned for the “hypothetical” row constructed from *`args`*, if such a row had been added to the sorted group of rows computed from the *`sorted_args`*.

#### **Table 9.55. Hypothetical-Set Aggregate Functions**

| Function                                                             | Direct Argument Type(s) | Aggregated Argument Type(s) | Return Type        | Partial Mode | Description                                                      |
| -------------------------------------------------------------------- | ----------------------- | --------------------------- | ------------------ | ------------ | ---------------------------------------------------------------- |
| `rank(`*`args`*) WITHIN GROUP (ORDER BY *`sorted_args`*)             | `VARIADIC` `"any"`      | `VARIADIC` `"any"`          | `bigint`           | No           | rank of the hypothetical row, with gaps for duplicate rows       |
| `dense_rank(`*`args`*) WITHIN GROUP (ORDER B&#x59;*`sorted_args`*)   | `VARIADIC` `"any"`      | `VARIADIC` `"any"`          | `bigint`           | No           | rank of the hypothetical row, without gaps                       |
| `percent_rank(`*`args`*) WITHIN GROUP (ORDER B&#x59;*`sorted_args`*) | `VARIADIC` `"any"`      | `VARIADIC` `"any"`          | `double precision` | No           | relative rank of the hypothetical row, ranging from 0 to 1       |
| `cume_dist(`*`args`*) WITHIN GROUP (ORDER B&#x59;*`sorted_args`*)    | `VARIADIC` `"any"`      | `VARIADIC` `"any"`          | `double precision` | No           | relative rank of the hypothetical row, ranging from 1/*`N`* to 1 |

For each of these hypothetical-set aggregates, the list of direct arguments given in *`args`* must match the number and types of the aggregated arguments given in *`sorted_args`*. Unlike most built-in aggregates, these aggregates are not strict, that is they do not drop input rows containing nulls. Null values sort according to the rule specified in the `ORDER BY` clause.

#### **Table 9.56. Grouping Operations**

| Function                | Return Type | Description                                                                                    |
| ----------------------- | ----------- | ---------------------------------------------------------------------------------------------- |
| `GROUPING(`*`args...`*) | `integer`   | Integer bit mask indicating which arguments are not being included in the current grouping set |

Grouping operations are used in conjunction with grouping sets (see [Section 7.2.4](https://www.postgresql.org/docs/11/queries-table-expressions.html#QUERIES-GROUPING-SETS)) to distinguish result rows. The arguments to the `GROUPING` operation are not actually evaluated, but they must match exactly expressions given in the `GROUP BY` clause of the associated query level. Bits are assigned with the rightmost argument being the least-significant bit; each bit is 0 if the corresponding expression is included in the grouping criteria of the grouping set generating the result row, and 1 if it is not. For example:

```
=> SELECT * FROM items_sold;
 make  | model | sales
-------+-------+-------
 Foo   | GT    |  10
 Foo   | Tour  |  20
 Bar   | City  |  15
 Bar   | Sport |  5
(4 rows)

=> SELECT make, model, GROUPING(make,model), sum(sales) FROM items_sold GROUP BY ROLLUP(make,model);
 make  | model | grouping | sum
-------+-------+----------+-----
 Foo   | GT    |        0 | 10
 Foo   | Tour  |        0 | 20
 Bar   | City  |        0 | 15
 Bar   | Sport |        0 | 5
 Foo   |       |        1 | 30
 Bar   |       |        1 | 20
       |       |        3 | 50
(7 rows)
```
