Value expressions are used in a variety of contexts, such as in the target list of the SELECT
command, as new column values in INSERT
or UPDATE
, or in search conditions in a number of commands. The result of a value expression is sometimes called a scalar, to distinguish it from the result of a table expression (which is a table). Value expressions are therefore also called scalar expressions (or even simply expressions). The expression syntax allows the calculation of values from primitive parts using arithmetic, logical, set, and other operations.
A value expression is one of the following:
A constant or literal value
A column reference
A positional parameter reference, in the body of a function definition or prepared statement
A subscripted expression
A field selection expression
An operator invocation
A function call
An aggregate expression
A window function call
A type cast
A collation expression
A scalar subquery
An array constructor
A row constructor
Another value expression in parentheses (used to group subexpressions and override precedence)
In addition to this list, there are a number of constructs that can be classified as an expression but do not follow any general syntax rules. These generally have the semantics of a function or operator and are explained in the appropriate location in Chapter 9. An example is the IS NULL
clause.
We have already discussed constants in Section 4.1.2. The following sections discuss the remaining options.
A column can be referenced in the form:
correlation
.columnname
correlation
is the name of a table (possibly qualified with a schema name), or an alias for a table defined by means of a FROM
clause. The correlation name and separating dot can be omitted if the column name is unique across all the tables being used in the current query. (See also Chapter 7.)
A positional parameter reference is used to indicate a value that is supplied externally to an SQL statement. Parameters are used in SQL function definitions and in prepared queries. Some client libraries also support specifying data values separately from the SQL command string, in which case parameters are used to refer to the out-of-line data values. The form of a parameter reference is:
$number
For example, consider the definition of a function, dept
, as:
CREATE FUNCTION dept(text) RETURNS dept AS $$ SELECT * FROM dept WHERE name = $1 $$ LANGUAGE SQL;
Here the $1
references the value of the first function argument whenever the function is invoked.
If an expression yields a value of an array type, then a specific element of the array value can be extracted by writing
expression
[subscript
]
or multiple adjacent elements (an “array slice”) can be extracted by writing
expression
[lower_subscript
:upper_subscript
]
(Here, the brackets [ ]
are meant to appear literally.) Each subscript
is itself an expression, which will be rounded to the nearest integer value.
In general the array expression
must be parenthesized, but the parentheses can be omitted when the expression to be subscripted is just a column reference or positional parameter. Also, multiple subscripts can be concatenated when the original array is multidimensional. For example:
mytable.arraycolumn[4] mytable.two_d_column[17][34] $1[10:42] (arrayfunction(a,b))[42]
The parentheses in the last example are required. See Section 8.15 for more about arrays.
If an expression yields a value of a composite type (row type), then a specific field of the row can be extracted by writing
expression
.fieldname
In general the row expression
must be parenthesized, but the parentheses can be omitted when the expression to be selected from is just a table reference or positional parameter. For example:
mytable.mycolumn $1.somecolumn (rowfunction(a,b)).col3
(Thus, a qualified column reference is actually just a special case of the field selection syntax.) An important special case is extracting a field from a table column that is of a composite type:
(compositecol).somefield (mytable.compositecol).somefield
The parentheses are required here to show that compositecol
is a column name not a table name, or that mytable
is a table name not a schema name in the second case.
You can ask for all fields of a composite value by writing .*
:
(compositecol).*
This notation behaves differently depending on context; see Section 8.16.5 for details.
There are three possible syntaxes for an operator invocation:
expression operator expression (binary infix operator) |
operator expression (unary prefix operator) |
expression operator (unary postfix operator) |
where the operator
token follows the syntax rules of Section 4.1.3, or is one of the key words AND
, OR
, and NOT
, or is a qualified operator name in the form:
OPERATOR(
schema
.
operatorname
)
Which particular operators exist and whether they are unary or binary depends on what operators have been defined by the system or the user. Chapter 9 describes the built-in operators.
The syntax for a function call is the name of a function (possibly qualified with a schema name), followed by its argument list enclosed in parentheses:
function_name
([expression
[,expression
... ]] )
For example, the following computes the square root of 2:
sqrt(2)
The list of built-in functions is in Chapter 9. Other functions can be added by the user.
When issuing queries in a database where some users mistrust other users, observe security precautions from Section 10.3 when writing function calls.
The arguments can optionally have names attached. See Section 4.3 for details.
A function that takes a single argument of composite type can optionally be called using field-selection syntax, and conversely field selection can be written in functional style. That is, the notations col(table)
and table.col
are interchangeable. This behavior is not SQL-standard but is provided in PostgreSQL because it allows use of functions to emulate “computed fields”. For more information see Section 8.16.5.
An aggregate expression represents the application of an aggregate function across the rows selected by a query. An aggregate function reduces multiple inputs to a single output value, such as the sum or average of the inputs. The syntax of an aggregate expression is one of the following:
aggregate_name
(expression
[ , ... ] [order_by_clause
] ) [ FILTER ( WHEREfilter_clause
) ]aggregate_name
(ALLexpression
[ , ... ] [order_by_clause
] ) [ FILTER ( WHEREfilter_clause
) ]aggregate_name
(DISTINCTexpression
[ , ... ] [order_by_clause
] ) [ FILTER ( WHEREfilter_clause
) ]aggregate_name
( * ) [ FILTER ( WHEREfilter_clause
) ]aggregate_name
( [expression
[ , ... ] ] ) WITHIN GROUP (order_by_clause
) [ FILTER ( WHEREfilter_clause
) ]
where aggregate_name
is a previously defined aggregate (possibly qualified with a schema name) and expression
is any value expression that does not itself contain an aggregate expression or a window function call. The optional order_by_clause
and filter_clause
are described below.
The first form of aggregate expression invokes the aggregate once for each input row. The second form is the same as the first, since ALL
is the default. The third form invokes the aggregate once for each distinct value of the expression (or distinct set of values, for multiple expressions) found in the input rows. The fourth form invokes the aggregate once for each input row; since no particular input value is specified, it is generally only useful for the count(*)
aggregate function. The last form is used with ordered-set aggregate functions, which are described below.
Most aggregate functions ignore null inputs, so that rows in which one or more of the expression(s) yield null are discarded. This can be assumed to be true, unless otherwise specified, for all built-in aggregates.
For example, count(*)
yields the total number of input rows; count(f1)
yields the number of input rows in which f1
is non-null, since count
ignores nulls; and count(distinct f1)
yields the number of distinct non-null values of f1
.
Ordinarily, the input rows are fed to the aggregate function in an unspecified order. In many cases this does not matter; for example, min
produces the same result no matter what order it receives the inputs in. However, some aggregate functions (such as array_agg
and string_agg
) produce results that depend on the ordering of the input rows. When using such an aggregate, the optional order_by_clause
can be used to specify the desired ordering. The order_by_clause
has the same syntax as for a query-level ORDER BY
clause, as described in Section 7.5, except that its expressions are always just expressions and cannot be output-column names or numbers. For example:
SELECT array_agg(a ORDER BY b DESC) FROM table;
When dealing with multiple-argument aggregate functions, note that the ORDER BY
clause goes after all the aggregate arguments. For example, write this:
SELECT string_agg(a, ',' ORDER BY a) FROM table;
not this:
SELECT string_agg(a ORDER BY a, ',') FROM table; -- incorrect
The latter is syntactically valid, but it represents a call of a single-argument aggregate function with two ORDER BY
keys (the second one being rather useless since it's a constant).
If DISTINCT
is specified in addition to an order_by_clause
, then all the ORDER BY
expressions must match regular arguments of the aggregate; that is, you cannot sort on an expression that is not included in the DISTINCT
list.
The ability to specify both DISTINCT
and ORDER BY
in an aggregate function is a PostgreSQL extension.
Placing ORDER BY
within the aggregate's regular argument list, as described so far, is used when ordering the input rows for general-purpose and statistical aggregates, for which ordering is optional. There is a subclass of aggregate functions called ordered-set aggregates for which an order_by_clause
is required, usually because the aggregate's computation is only sensible in terms of a specific ordering of its input rows. Typical examples of ordered-set aggregates include rank and percentile calculations. For an ordered-set aggregate, the order_by_clause
is written inside WITHIN GROUP (...)
, as shown in the final syntax alternative above. The expressions in the order_by_clause
are evaluated once per input row just like regular aggregate arguments, sorted as per the order_by_clause
's requirements, and fed to the aggregate function as input arguments. (This is unlike the case for a non-WITHIN GROUP
order_by_clause
, which is not treated as argument(s) to the aggregate function.) The argument expressions preceding WITHIN GROUP
, if any, are called direct arguments to distinguish them from the aggregated arguments listed in the order_by_clause
. Unlike regular aggregate arguments, direct arguments are evaluated only once per aggregate call, not once per input row. This means that they can contain variables only if those variables are grouped by GROUP BY
; this restriction is the same as if the direct arguments were not inside an aggregate expression at all. Direct arguments are typically used for things like percentile fractions, which only make sense as a single value per aggregation calculation. The direct argument list can be empty; in this case, write just ()
not (*)
. (PostgreSQL will actually accept either spelling, but only the first way conforms to the SQL standard.)
An example of an ordered-set aggregate call is:
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY income) FROM households; percentile_cont ----------------- 50489
which obtains the 50th percentile, or median, value of the income
column from table households
. Here, 0.5
is a direct argument; it would make no sense for the percentile fraction to be a value varying across rows.
If FILTER
is specified, then only the input rows for which the filter_clause
evaluates to true are fed to the aggregate function; other rows are discarded. For example:
SELECT count(*) AS unfiltered, count(*) FILTER (WHERE i < 5) AS filtered FROM generate_series(1,10) AS s(i); unfiltered | filtered ------------+---------- 10 | 4 (1 row)
The predefined aggregate functions are described in Section 9.20. Other aggregate functions can be added by the user.
An aggregate expression can only appear in the result list or HAVING
clause of a SELECT
command. It is forbidden in other clauses, such as WHERE
, because those clauses are logically evaluated before the results of aggregates are formed.
When an aggregate expression appears in a subquery (see Section 4.2.11 and Section 9.22), the aggregate is normally evaluated over the rows of the subquery. But an exception occurs if the aggregate's arguments (and filter_clause
if any) contain only outer-level variables: the aggregate then belongs to the nearest such outer level, and is evaluated over the rows of that query. The aggregate expression as a whole is then an outer reference for the subquery it appears in, and acts as a constant over any one evaluation of that subquery. The restriction about appearing only in the result list or HAVING
clause applies with respect to the query level that the aggregate belongs to.