A composite type represents the structure of a row or record; it is essentially just a list of field names and their data types. PostgreSQL allows composite types to be used in many of the same ways that simple types can be used. For example, a column of a table can be declared to be of a composite type.
Here are two simple examples of defining composite types:
The syntax is comparable to CREATE TABLE
, except that only field names and types can be specified; no constraints (such as NOT NULL
) can presently be included. Note that the AS
keyword is essential; without it, the system will think a different kind of CREATE TYPE
command is meant, and you will get odd syntax errors.
Having defined the types, we can use them to create tables:
or functions:
Whenever you create a table, a composite type is also automatically created, with the same name as the table, to represent the table's row type. For example, had we said:
then the same inventory_item
composite type shown above would come into being as a byproduct, and could be used just as above. Note however an important restriction of the current implementation: since no constraints are associated with a composite type, the constraints shown in the table definition do not apply to values of the composite type outside the table. (To work around this, create a domain over the composite type, and apply the desired constraints as CHECK
constraints of the domain.)
To write a composite value as a literal constant, enclose the field values within parentheses and separate them by commas. You can put double quotes around any field value, and must do so if it contains commas or parentheses. (More details appear below.) Thus, the general format of a composite constant is the following:
An example is:
which would be a valid value of the inventory_item
type defined above. To make a field be NULL, write no characters at all in its position in the list. For example, this constant specifies a NULL third field:
If you want an empty string rather than NULL, write double quotes:
Here the first field is a non-NULL empty string, the third is NULL.
(These constants are actually only a special case of the generic type constants discussed in Section 4.1.2.7. The constant is initially treated as a string and passed to the composite-type input conversion routine. An explicit type specification might be necessary to tell which type to convert the constant to.)
The ROW
expression syntax can also be used to construct composite values. In most cases this is considerably simpler to use than the string-literal syntax since you don't have to worry about multiple layers of quoting. We already used this method above:
The ROW keyword is actually optional as long as you have more than one field in the expression, so these can be simplified to:
The ROW
expression syntax is discussed in more detail in Section 4.2.13.
To access a field of a composite column, one writes a dot and the field name, much like selecting a field from a table name. In fact, it's so much like selecting from a table name that you often have to use parentheses to keep from confusing the parser. For example, you might try to select some subfields from our on_hand
example table with something like:
This will not work since the name item
is taken to be a table name, not a column name of on_hand
, per SQL syntax rules. You must write it like this:
or if you need to use the table name as well (for instance in a multitable query), like this:
Now the parenthesized object is correctly interpreted as a reference to the item
column, and then the subfield can be selected from it.
Similar syntactic issues apply whenever you select a field from a composite value. For instance, to select just one field from the result of a function that returns a composite value, you'd need to write something like:
Without the extra parentheses, this will generate a syntax error.
The special field name *
means “all fields”, as further explained in Section 8.16.5.
Here are some examples of the proper syntax for inserting and updating composite columns. First, inserting or updating a whole column:
The first example omits ROW
, the second uses it; we could have done it either way.
We can update an individual subfield of a composite column:
Notice here that we don't need to (and indeed cannot) put parentheses around the column name appearing just after SET
, but we do need parentheses when referencing the same column in the expression to the right of the equal sign.
And we can specify subfields as targets for INSERT
, too:
Had we not supplied values for all the subfields of the column, the remaining subfields would have been filled with null values.
There are various special syntax rules and behaviors associated with composite types in queries. These rules provide useful shortcuts, but can be confusing if you don't know the logic behind them.
In PostgreSQL, a reference to a table name (or alias) in a query is effectively a reference to the composite value of the table's current row. For example, if we had a table inventory_item
as shown above, we could write:
This query produces a single composite-valued column, so we might get output like:
Note however that simple names are matched to column names before table names, so this example works only because there is no column named c
in the query's tables.
The ordinary qualified-column-name syntax table_name
.
column_name
can be understood as applying field selection to the composite value of the table's current row. (For efficiency reasons, it's not actually implemented that way.)
When we write
then, according to the SQL standard, we should get the contents of the table expanded into separate columns:
as if the query were
PostgreSQL will apply this expansion behavior to any composite-valued expression, although as shown above, you need to write parentheses around the value that .*
is applied to whenever it's not a simple table name. For example, if myfunc()
is a function returning a composite type with columns a
, b
, and c
, then these two queries have the same result:
PostgreSQL handles column expansion by actually transforming the first form into the second. So, in this example, myfunc()
would get invoked three times per row with either syntax. If it's an expensive function you may wish to avoid that, which you can do with a query like:
Placing the function in a LATERAL
FROM
item keeps it from being invoked more than once per row. m.*
is still expanded into m.a, m.b, m.c
, but now those variables are just references to the output of the FROM
item. (The LATERAL
keyword is optional here, but we show it to clarify that the function is getting x
from some_table
.)
The composite_value
.*
syntax results in column expansion of this kind when it appears at the top level of a SELECT
output list, a RETURNING
list in INSERT
/UPDATE
/DELETE
, a VALUES
clause, or a row constructor. In all other contexts (including when nested inside one of those constructs), attaching .*
to a composite value does not change the value, since it means “all columns” and so the same composite value is produced again. For example, if somefunc()
accepts a composite-valued argument, these queries are the same:
In both cases, the current row of inventory_item
is passed to the function as a single composite-valued argument. Even though .*
does nothing in such cases, using it is good style, since it makes clear that a composite value is intended. In particular, the parser will consider c
in c.*
to refer to a table name or alias, not to a column name, so that there is no ambiguity; whereas without .*
, it is not clear whether c
means a table name or a column name, and in fact the column-name interpretation will be preferred if there is a column named c
.
Another example demonstrating these concepts is that all these queries mean the same thing:
All of these ORDER BY
clauses specify the row's composite value, resulting in sorting the rows according to the rules described in Section 9.23.6. However, if inventory_item
contained a column named c
, the first case would be different from the others, as it would mean to sort by that column only. Given the column names previously shown, these queries are also equivalent to those above:
(The last case uses a row constructor with the key word ROW
omitted.)
Another special syntactical behavior associated with composite values is that we can use functional notation for extracting a field of a composite value. The simple way to explain this is that the notations field
(table
) and table
.field
are interchangeable. For example, these queries are equivalent:
Moreover, if we have a function that accepts a single argument of a composite type, we can call it with either notation. These queries are all equivalent:
This equivalence between functional notation and field notation makes it possible to use functions on composite types to implement “computed fields”. An application using the last query above wouldn't need to be directly aware that somefunc
isn't a real column of the table.
Because of this behavior, it's unwise to give a function that takes a single composite-type argument the same name as any of the fields of that composite type. If there is ambiguity, the field-name interpretation will be chosen if field-name syntax is used, while the function will be chosen if function-call syntax is used. However, PostgreSQL versions before 11 always chose the field-name interpretation, unless the syntax of the call required it to be a function call. One way to force the function interpretation in older versions is to schema-qualify the function name, that is, write schema
.func
(compositevalue
).
The external text representation of a composite value consists of items that are interpreted according to the I/O conversion rules for the individual field types, plus decoration that indicates the composite structure. The decoration consists of parentheses ((
and )
) around the whole value, plus commas (,
) between adjacent items. Whitespace outside the parentheses is ignored, but within the parentheses it is considered part of the field value, and might or might not be significant depending on the input conversion rules for the field data type. For example, in:
the whitespace will be ignored if the field type is integer, but not if it is text.
As shown previously, when writing a composite value you can write double quotes around any individual field value. You must do so if the field value would otherwise confuse the composite-value parser. In particular, fields containing parentheses, commas, double quotes, or backslashes must be double-quoted. To put a double quote or backslash in a quoted composite field value, precede it with a backslash. (Also, a pair of double quotes within a double-quoted field value is taken to represent a double quote character, analogously to the rules for single quotes in SQL literal strings.) Alternatively, you can avoid quoting and use backslash-escaping to protect all data characters that would otherwise be taken as composite syntax.
A completely empty field value (no characters at all between the commas or parentheses) represents a NULL. To write a value that is an empty string rather than NULL, write ""
.
The composite output routine will put double quotes around field values if they are empty strings or contain parentheses, commas, double quotes, backslashes, or white space. (Doing so for white space is not essential, but aids legibility.) Double quotes and backslashes embedded in field values will be doubled.
Remember that what you write in an SQL command will first be interpreted as a string literal, and then as a composite. This doubles the number of backslashes you need (assuming escape string syntax is used). For example, to insert a text
field containing a double quote and a backslash in a composite value, you'd need to write:
The string-literal processor removes one level of backslashes, so that what arrives at the composite-value parser looks like ("\"\\")
. In turn, the string fed to the text
data type's input routine becomes "\
. (If we were working with a data type whose input routine also treated backslashes specially, bytea
for example, we might need as many as eight backslashes in the command to get one backslash into the stored composite field.) Dollar quoting (see Section 4.1.2.4) can be used to avoid the need to double backslashes.
The ROW
constructor syntax is usually easier to work with than the composite-literal syntax when writing composite values in SQL commands. In ROW
, individual field values are written the same way they would be written when not members of a composite.\