PostgreSQL has a rich set of native data types available to users. Users can also add new types to PostgreSQL using the CREATE TYPE command.

Table 8.1 shows all the built-in general-purpose data types. Most of the alternative names listed in the "Aliases" column are the names used internally by PostgreSQL for historical reasons. In addition, some internally used or deprecated types are available, but they are not listed here.
Table 8.1. Data Types

| Name | Aliases | Description |
| --- | --- | --- |
| bigint | int8 | signed eight-byte integer |
| bigserial | serial8 | autoincrementing eight-byte integer |
| bit [ (n) ] |  | fixed-length bit string |
| bit varying [ (n) ] | varbit | variable-length bit string |
| boolean | bool | logical Boolean (true/false) |
| box |  | rectangular box on a plane |
| bytea |  | binary data ("byte array") |
| character [ (n) ] | char [ (n) ] | fixed-length character string |
| character varying [ (n) ] | varchar [ (n) ] | variable-length character string |
| cidr |  | IPv4 or IPv6 network address |
| circle |  | circle on a plane |
| date |  | calendar date (year, month, day) |
| double precision | float8 | double precision floating-point number (8 bytes) |
| inet |  | IPv4 or IPv6 host address |
| integer | int, int4 | signed four-byte integer |
| interval [ fields ] [ (p) ] |  | time span |
| json |  | textual JSON data |
| jsonb |  | binary JSON data, decomposed |
| line |  | infinite line on a plane |
| lseg |  | line segment on a plane |
| macaddr |  | MAC (Media Access Control) address |
| macaddr8 |  | MAC (Media Access Control) address (EUI-64 format) |
| money |  | currency amount |
| numeric [ (p, s) ] | decimal [ (p, s) ] | exact numeric of selectable precision |
| path |  | geometric path on a plane |
| pg_lsn |  | PostgreSQL Log Sequence Number |
| point |  | geometric point on a plane |
| polygon |  | closed geometric path on a plane |
| real | float4 | single precision floating-point number (4 bytes) |
| smallint | int2 | signed two-byte integer |
| smallserial | serial2 | autoincrementing two-byte integer |
| serial | serial4 | autoincrementing four-byte integer |
| text |  | variable-length character string |
| time [ (p) ] [ without time zone ] |  | time of day (no time zone) |
| time [ (p) ] with time zone | timetz | time of day, including time zone |
| timestamp [ (p) ] [ without time zone ] |  | date and time (no time zone) |
| timestamp [ (p) ] with time zone | timestamptz | date and time, including time zone |
| tsquery |  | text search query |
| tsvector |  | text search document |
| txid_snapshot |  | user-level transaction ID snapshot |
| uuid |  | universally unique identifier |
| xml |  | XML data |
The following types (or spellings thereof) are specified by SQL: bigint, bit, bit varying, boolean, char, character varying, character, varchar, date, double precision, integer, interval, numeric, decimal, real, smallint, time (with or without time zone), timestamp (with or without time zone), xml.
Each data type has an external representation determined by its input and output functions. Many of the built-in types have obvious external formats. However, several types are either unique to PostgreSQL, such as geometric paths, or have several possible formats, such as the date and time types. Some of the input and output functions are not invertible, i.e., the result of an output function might lose accuracy when compared to the original input.
The money type stores a currency amount with a fixed fractional precision; see Table 8.3. The fractional precision is determined by the database's lc_monetary setting. The range shown in the table assumes there are two fractional digits. Input is accepted in a variety of formats, including integer and floating-point literals, as well as typical currency formatting, such as "$1,000.00". Output is generally in the latter form, but depends on the locale.
Table 8.3. Monetary Types

| Name | Storage Size | Description | Range |
| --- | --- | --- | --- |
| money | 8 bytes | currency amount | -92233720368547758.08 to +92233720368547758.07 |
Since the output of this data type is locale-sensitive, it might not work to load money data into a database that has a different setting of lc_monetary. To avoid problems, before restoring a dump into a new database make sure lc_monetary has the same or equivalent value as in the database that was dumped.

Values of the numeric, int, and bigint data types can be cast to money. Conversion from the real and double precision data types can be done by casting to numeric first, for example:
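```sql
SELECT '12.34'::float8::numeric::money;
```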
However, this is not recommended. Floating-point numbers should not be used to handle money due to the potential for rounding errors.

A money value can be cast to numeric without loss of precision. Conversion to other types could potentially lose precision, and must also be done in two stages:
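```sql
SELECT '52093.89'::money::numeric::float8;
```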
When a money value is divided by another money value, the result is double precision (i.e., a pure number, not money); the currency units cancel each other out in the division.

Numeric types consist of two-, four-, and eight-byte integers, four- and eight-byte floating-point numbers, and selectable-precision decimals. Table 8.2 lists the available types.
Table 8.2. Numeric Types

| Name | Storage Size | Description | Range |
| --- | --- | --- | --- |
| smallint | 2 bytes | small-range integer | -32768 to +32767 |
| integer | 4 bytes | typical choice for integer | -2147483648 to +2147483647 |
| bigint | 8 bytes | large-range integer | -9223372036854775808 to +9223372036854775807 |
| decimal | variable | user-specified precision, exact | up to 131072 digits before the decimal point; up to 16383 digits after the decimal point |
| numeric | variable | user-specified precision, exact | up to 131072 digits before the decimal point; up to 16383 digits after the decimal point |
| real | 4 bytes | variable-precision, inexact | 6 decimal digits precision |
| double precision | 8 bytes | variable-precision, inexact | 15 decimal digits precision |
| smallserial | 2 bytes | small autoincrementing integer | 1 to 32767 |
| serial | 4 bytes | autoincrementing integer | 1 to 2147483647 |
| bigserial | 8 bytes | large autoincrementing integer | 1 to 9223372036854775807 |
The syntax of constant values for the numeric types is described in Section 4.1.2. The numeric types have a full set of corresponding arithmetic operators and functions. Refer to Chapter 9 for more information. The following sections describe the types in detail.
The types smallint, integer, and bigint store whole numbers, that is, numbers without fractional components, of various ranges. Attempts to store values outside of the allowed range will result in an error.

The type integer is the common choice, as it offers the best balance between range, storage size, and performance. The smallint type is generally only used if disk space is at a premium. The bigint type is designed to be used when the range of the integer type is insufficient.

SQL only specifies the integer types integer (or int), smallint, and bigint. The type names int2, int4, and int8 are extensions, which are also used by some other SQL database systems.
The type numeric can store numbers with a very large number of digits. It is especially recommended for storing monetary amounts and other quantities where exactness is required. Calculations with numeric values yield exact results where possible, e.g., addition, subtraction, multiplication. However, calculations on numeric values are very slow compared to the integer types, or to the floating-point types described in the next section.

We use the following terms below: the scale of a numeric is the count of decimal digits in the fractional part, to the right of the decimal point. The precision of a numeric is the total count of significant digits in the whole number, that is, the number of digits to both sides of the decimal point. So the number 23.5141 has a precision of 6 and a scale of 4. Integers can be considered to have a scale of zero.

Both the maximum precision and the maximum scale of a numeric column can be configured. To declare a column of type numeric, use the syntax:
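```
NUMERIC(precision, scale)
```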
The precision must be positive, and the scale zero or positive. Alternatively:
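```
NUMERIC(precision)
```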
selects a scale of 0. Specifying:
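```
NUMERIC
```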
without any precision or scale creates a column in which numeric values of any precision and scale can be stored, up to the implementation limit on precision. A column of this kind will not coerce input values to any particular scale, whereas numeric columns with a declared scale will coerce input values to that scale. (The SQL standard requires a default scale of 0, i.e., coercion to integer precision. We find this a bit useless. If you're concerned about portability, always specify the precision and scale explicitly.)
Note: The maximum allowed precision when explicitly specified in the type declaration is 1000; NUMERIC without a specified precision is subject to the limits described in Table 8.2.
If the scale of a value to be stored is greater than the declared scale of the column, the system will round the value to the specified number of fractional digits. Then, if the number of digits to the left of the decimal point exceeds the declared precision minus the declared scale, an error is raised.
Numeric values are physically stored without any extra leading or trailing zeroes. Thus, the declared precision and scale of a column are maximums, not fixed allocations. (In this sense the numeric type is more akin to varchar(n) than to char(n).) The actual storage requirement is two bytes for each group of four decimal digits, plus three to eight bytes overhead.
In addition to ordinary numeric values, the numeric type allows the special value NaN, meaning "not-a-number". Any operation on NaN yields another NaN. When writing this value as a constant in an SQL command, you must put quotes around it, for example UPDATE table SET x = 'NaN'. On input, the string NaN is recognized in a case-insensitive manner.
Note: In most implementations of the "not-a-number" concept, NaN is not considered equal to any other numeric value (including NaN). In order to allow numeric values to be sorted and used in tree-based indexes, PostgreSQL treats NaN values as equal, and greater than all non-NaN values.
The types decimal and numeric are equivalent. Both types are part of the SQL standard.
When rounding values, the numeric type rounds ties away from zero, while (on most machines) the real and double precision types round ties to the nearest even number. For example:
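```sql
SELECT x,
  round(x::numeric) AS num_round,
  round(x::double precision) AS dbl_round
FROM generate_series(-3.5, 3.5, 1) as x;
  x   | num_round | dbl_round
------+-----------+-----------
 -3.5 |        -4 |        -4
 -2.5 |        -3 |        -2
 -1.5 |        -2 |        -2
 -0.5 |        -1 |        -0
  0.5 |         1 |         0
  1.5 |         2 |         2
  2.5 |         3 |         2
  3.5 |         4 |         4
(8 rows)
```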
The data types real and double precision are inexact, variable-precision numeric types. In practice, these types are usually implementations of IEEE Standard 754 for Binary Floating-Point Arithmetic (single and double precision, respectively), to the extent that the underlying processor, operating system, and compiler support it.

Inexact means that some values cannot be converted exactly to the internal format and are stored as approximations, so that storing and retrieving a value might show slight discrepancies. Managing these errors and how they propagate through calculations is the subject of an entire branch of mathematics and computer science and will not be discussed here, except for the following points:
If you require exact storage and calculations (such as for monetary amounts), use the numeric type instead.

If you want to do complicated calculations with these types for anything important, especially if you rely on certain behavior in boundary cases (infinity, underflow), you should evaluate the implementation carefully.

Comparing two floating-point values for equality might not always work as expected.
On most platforms, the real type has a range of at least 1E-37 to 1E+37 with a precision of at least 6 decimal digits. The double precision type typically has a range of around 1E-307 to 1E+308 with a precision of at least 15 digits. Values that are too large or too small will cause an error. Rounding might take place if the precision of an input number is too high. Numbers too close to zero that are not representable as distinct from zero will cause an underflow error.
Note: The extra_float_digits setting controls the number of extra significant digits included when a floating-point value is converted to text for output. With the default value of 0, the output is the same on every platform supported by PostgreSQL. Increasing it will produce output that more accurately represents the stored value, but may be unportable across platforms.
In addition to ordinary numeric values, the floating-point types have several special values:
Infinity
-Infinity
NaN
These represent the IEEE 754 special values "infinity", "negative infinity", and "not-a-number", respectively. (On a machine whose floating-point arithmetic does not follow IEEE 754, these values will probably not work as expected.) When writing these values as constants in an SQL command, you must put quotes around them, for example UPDATE table SET x = '-Infinity'. On input, these strings are recognized in a case-insensitive manner.
Note: IEEE 754 specifies that NaN should not compare equal to any other floating-point value (including NaN). In order to allow floating-point values to be sorted and used in tree-based indexes, PostgreSQL treats NaN values as equal, and greater than all non-NaN values.
PostgreSQL also supports the SQL-standard notations float and float(p) for specifying inexact numeric types. Here, p specifies the minimum acceptable precision in binary digits. PostgreSQL accepts float(1) to float(24) as selecting the real type, while float(25) to float(53) select double precision. Values of p outside the allowed range draw an error. float with no precision specified is taken to mean double precision.
Note: The assumption that real and double precision have exactly 24 and 53 bits in the mantissa respectively is correct for IEEE-standard floating-point implementations. On non-IEEE platforms it might be off a little, but for simplicity the same ranges of p are used on all platforms.
Note: This section describes a PostgreSQL-specific way to create an auto-incrementing column. Another way is to use the SQL-standard identity column feature, described in CREATE TABLE.
The data types smallserial, serial and bigserial are not true types, but merely a notational convenience for creating unique identifier columns (similar to the AUTO_INCREMENT property supported by some other databases). In the current implementation, specifying:
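```sql
CREATE TABLE tablename (
    colname SERIAL
);
```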
is equivalent to specifying:
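```sql
CREATE SEQUENCE tablename_colname_seq AS integer;
CREATE TABLE tablename (
    colname integer NOT NULL DEFAULT nextval('tablename_colname_seq')
);
ALTER SEQUENCE tablename_colname_seq OWNED BY tablename.colname;
```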
Thus, we have created an integer column and arranged for its default values to be assigned from a sequence generator. A NOT NULL constraint is applied to ensure that a null value cannot be inserted. (In most cases you would also want to attach a UNIQUE or PRIMARY KEY constraint to prevent duplicate values from being inserted by accident, but this is not automatic.) Lastly, the sequence is marked as "owned by" the column, so that it will be dropped if the column or table is dropped.
Note: Because smallserial, serial and bigserial are implemented using sequences, there may be "holes" or gaps in the sequence of values which appears in the column, even if no rows are ever deleted. A value allocated from the sequence is still "used up" even if a row containing that value is never successfully inserted into the table column. This may happen, for example, if the inserting transaction rolls back. See nextval() in Section 9.16 for details.
To insert the next value of the sequence into the serial column, specify that the serial column should be assigned its default value. This can be done either by excluding the column from the list of columns in the INSERT statement, or through the use of the DEFAULT key word.
The type names serial and serial4 are equivalent: both create integer columns. The type names bigserial and serial8 work the same way, except that they create a bigint column. bigserial should be used if you anticipate the use of more than 2^31 identifiers over the lifetime of the table. The type names smallserial and serial2 also work the same way, except that they create a smallint column.
The sequence created for a serial column is automatically dropped when the owning column is dropped. You can drop the sequence without dropping the column, but this will force removal of the column default expression.
PostgreSQL offers data types to store IPv4, IPv6, and MAC addresses, as shown in Table 8.21. It is better to use these types instead of plain text types to store network addresses, because these types offer input error checking and specialized operators and functions (see Section 9.12).
Table 8.21. Network Address Types

| Name | Storage Size | Description |
| --- | --- | --- |
| cidr | 7 or 19 bytes | IPv4 and IPv6 networks |
| inet | 7 or 19 bytes | IPv4 and IPv6 hosts and networks |
| macaddr | 6 bytes | MAC addresses |
| macaddr8 | 8 bytes | MAC addresses (EUI-64 format) |
When sorting inet or cidr data types, IPv4 addresses will always sort before IPv6 addresses, including IPv4 addresses encapsulated or mapped to IPv6 addresses, such as ::10.2.3.4 or ::ffff:10.4.3.2.
inet

The inet type holds an IPv4 or IPv6 host address, and optionally its subnet, all in one field. The subnet is represented by the number of network address bits present in the host address (the "netmask"). If the netmask is 32 and the address is IPv4, then the value does not indicate a subnet, only a single host. In IPv6, the address length is 128 bits, so 128 bits specify a unique host address. Note that if you want to accept only networks, you should use the cidr type rather than inet.
The input format for this type is address/y where address is an IPv4 or IPv6 address and y is the number of bits in the netmask. If the /y portion is missing, the netmask is 32 for IPv4 and 128 for IPv6, so the value represents just a single host. On display, the /y portion is suppressed if the netmask specifies a single host.
cidr

The cidr type holds an IPv4 or IPv6 network specification. Input and output formats follow Classless Internet Domain Routing conventions. The format for specifying networks is address/y where address is the network represented as an IPv4 or IPv6 address, and y is the number of bits in the netmask. If y is omitted, it is calculated using assumptions from the older classful network numbering system, except it will be at least large enough to include all of the octets written in the input. It is an error to specify a network address that has bits set to the right of the specified netmask.
Table 8.22 shows some examples.
Table 8.22. cidr Type Input Examples

| cidr Input | cidr Output | abbrev(cidr) |
| --- | --- | --- |
| 192.168.100.128/25 | 192.168.100.128/25 | 192.168.100.128/25 |
| 192.168/24 | 192.168.0.0/24 | 192.168.0/24 |
| 192.168/25 | 192.168.0.0/25 | 192.168.0.0/25 |
| 192.168.1 | 192.168.1.0/24 | 192.168.1/24 |
| 192.168 | 192.168.0.0/24 | 192.168.0/24 |
| 128.1 | 128.1.0.0/16 | 128.1/16 |
| 128 | 128.0.0.0/16 | 128.0/16 |
| 128.1.2 | 128.1.2.0/24 | 128.1.2/24 |
| 10.1.2 | 10.1.2.0/24 | 10.1.2/24 |
| 10.1 | 10.1.0.0/16 | 10.1/16 |
| 10 | 10.0.0.0/8 | 10/8 |
| 10.1.2.3/32 | 10.1.2.3/32 | 10.1.2.3/32 |
| 2001:4f8:3:ba::/64 | 2001:4f8:3:ba::/64 | 2001:4f8:3:ba::/64 |
| 2001:4f8:3:ba:2e0:81ff:fe22:d1f1/128 | 2001:4f8:3:ba:2e0:81ff:fe22:d1f1/128 | 2001:4f8:3:ba:2e0:81ff:fe22:d1f1 |
| ::ffff:1.2.3.0/120 | ::ffff:1.2.3.0/120 | ::ffff:1.2.3/120 |
| ::ffff:1.2.3.0/128 | ::ffff:1.2.3.0/128 | ::ffff:1.2.3.0/128 |
inet vs. cidr

The essential difference between inet and cidr data types is that inet accepts values with nonzero bits to the right of the netmask, whereas cidr does not. For example, 192.168.0.1/24 is valid for inet but not for cidr.

If you do not like the output format for inet or cidr values, try the functions host, text, and abbrev.
macaddr

The macaddr type stores MAC addresses, known for example from Ethernet card hardware addresses (although MAC addresses are used for other purposes as well). Input is accepted in the following formats:
'08:00:2b:01:02:03'
'08-00-2b-01-02-03'
'08002b:010203'
'08002b-010203'
'0800.2b01.0203'
'0800-2b01-0203'
'08002b010203'
These examples would all specify the same address. Upper and lower case is accepted for the digits a through f. Output is always in the first of the forms shown.
IEEE Std 802-2001 specifies the second shown form (with hyphens) as the canonical form for MAC addresses, and specifies the first form (with colons) as the bit-reversed notation, so that 08-00-2b-01-02-03 = 01:00:4D:08:04:0C. This convention is widely ignored nowadays, and it is relevant only for obsolete network protocols (such as Token Ring). PostgreSQL makes no provisions for bit reversal, and all accepted formats use the canonical LSB order.
The remaining five input formats are not part of any standard.
macaddr8

The macaddr8 type stores MAC addresses in EUI-64 format, known for example from Ethernet card hardware addresses (although MAC addresses are used for other purposes as well). This type can accept both 6 and 8 byte length MAC addresses and stores them in 8 byte length format. MAC addresses given in 6 byte format will be stored in 8 byte length format with the 4th and 5th bytes set to FF and FE, respectively. Note that IPv6 uses a modified EUI-64 format where the 7th bit should be set to one after the conversion from EUI-48. The function macaddr8_set7bit is provided to make this change. Generally speaking, any input which is comprised of pairs of hex digits (on byte boundaries), optionally separated consistently by one of ':', '-' or '.', is accepted. The number of hex digits must be either 16 (8 bytes) or 12 (6 bytes). Leading and trailing whitespace is ignored. The following are examples of input formats that are accepted:
'08:00:2b:01:02:03:04:05'
'08-00-2b-01-02-03-04-05'
'08002b:0102030405'
'08002b-0102030405'
'0800.2b01.0203.0405'
'0800-2b01-0203-0405'
'08002b01:02030405'
'08002b0102030405'
These examples would all specify the same address. Upper and lower case is accepted for the digits a through f. Output is always in the first of the forms shown. The last six input formats that are mentioned above are not part of any standard. To convert a traditional 48 bit MAC address in EUI-48 format to modified EUI-64 format to be included as the host portion of an IPv6 address, use macaddr8_set7bit as shown:
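```sql
SELECT macaddr8_set7bit('08:00:2b:01:02:03');

    macaddr8_set7bit
-------------------------
 0a:00:2b:ff:fe:01:02:03
(1 row)
```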
Bit strings are strings of 1's and 0's. They can be used to store or visualize bit masks. There are two SQL bit types: bit(n) and bit varying(n), where n is a positive integer.
bit type data must match the length n exactly; it is an error to attempt to store shorter or longer bit strings. bit varying data is of variable length up to the maximum length n; longer strings will be rejected. Writing bit without a length is equivalent to bit(1), while bit varying without a length specification means unlimited length.
If one explicitly casts a bit-string value to bit(n), it will be truncated or zero-padded on the right to be exactly n bits, without raising an error. Similarly, if one explicitly casts a bit-string value to bit varying(n), it will be truncated on the right if it is more than n bits.
Refer to Section 4.1.2.5 for information about the syntax of bit string constants. Bit-logical operators and string manipulation functions are available; see Section 9.6.
Example 8.3. Using the Bit String Types
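```sql
CREATE TABLE test (a BIT(3), b BIT VARYING(5));
INSERT INTO test VALUES (B'101', B'00');
INSERT INTO test VALUES (B'10', B'101');

ERROR:  bit string length 2 does not match type bit(3)

INSERT INTO test VALUES (B'10'::bit(3), B'101');
SELECT * FROM test;

  a  |  b
-----+-----
 101 | 00
 100 | 101
```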
A bit string value requires 1 byte for each group of 8 bits, plus 5 or 8 bytes overhead depending on the length of the string (but long values may be compressed or moved out-of-line, as explained in Section 8.3 for character strings).
Geometric data types represent two-dimensional spatial objects. Table 8.20 shows the geometric types available in PostgreSQL.
Table 8.20. Geometric Types
| Name | Storage Size | Description | Representation |
| --- | --- | --- | --- |
| point | 16 bytes | Point on a plane | (x,y) |
| line | 32 bytes | Infinite line | {A,B,C} |
| lseg | 32 bytes | Finite line segment | ((x1,y1),(x2,y2)) |
| box | 32 bytes | Rectangular box | ((x1,y1),(x2,y2)) |
| path | 16+16n bytes | Closed path (similar to polygon) | ((x1,y1),...) |
| path | 16+16n bytes | Open path | [(x1,y1),...] |
| polygon | 40+16n bytes | Polygon (similar to closed path) | ((x1,y1),...) |
| circle | 24 bytes | Circle | <(x,y),r> (center point and radius) |
A rich set of functions and operators is available to perform various geometric operations such as scaling, translation, rotation, and determining intersections. They are explained in Section 9.11.
Points are the fundamental two-dimensional building block for geometric types. Values of type point are specified using either of the following syntaxes:
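```
( x , y )
  x , y
```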
where x and y are the respective coordinates, as floating-point numbers.
Points are output using the first syntax.
Lines are represented by the linear equation Ax + By + C = 0, where A and B are not both zero. Values of type line are input and output in the following form:
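```
{ A, B, C }
```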
Alternatively, any of the following forms can be used for input:
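```
[ ( x1 , y1 ) , ( x2 , y2 ) ]
( ( x1 , y1 ) , ( x2 , y2 ) )
  ( x1 , y1 ) , ( x2 , y2 )
    x1 , y1   ,   x2 , y2
```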
where (x1,y1) and (x2,y2) are two different points on the line.
Line segments are represented by pairs of points that are the endpoints of the segment. Values of type lseg are specified using any of the following syntaxes:
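```
[ ( x1 , y1 ) , ( x2 , y2 ) ]
( ( x1 , y1 ) , ( x2 , y2 ) )
  ( x1 , y1 ) , ( x2 , y2 )
    x1 , y1   ,   x2 , y2
```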
where (x1,y1) and (x2,y2) are the end points of the line segment.
Line segments are output using the first syntax.
Boxes are represented by pairs of points that are opposite corners of the box. Values of type box are specified using any of the following syntaxes:
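```
( ( x1 , y1 ) , ( x2 , y2 ) )
  ( x1 , y1 ) , ( x2 , y2 )
    x1 , y1   ,   x2 , y2
```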
where (x1,y1) and (x2,y2) are any two opposite corners of the box.
Boxes are output using the second syntax.
Any two opposite corners can be supplied on input, but the values will be reordered as needed to store the upper right and lower left corners, in that order.
Paths are represented by lists of connected points. Paths can be open, where the first and last points in the list are considered not connected, or closed, where the first and last points are considered connected.
Values of type path are specified using any of the following syntaxes:
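```
[ ( x1 , y1 ) , ... , ( xn , yn ) ]
( ( x1 , y1 ) , ... , ( xn , yn ) )
  ( x1 , y1 ) , ... , ( xn , yn )
  ( x1 , y1   , ... ,   xn , yn )
    x1 , y1   , ... ,   xn , yn
```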
where the points are the end points of the line segments comprising the path. Square brackets ([]) indicate an open path, while parentheses (()) indicate a closed path. When the outermost parentheses are omitted, as in the third through fifth syntaxes, a closed path is assumed.
Paths are output using the first or second syntax, as appropriate.
Polygons are represented by lists of points (the vertexes of the polygon). Polygons are very similar to closed paths, but are stored differently and have their own set of support routines.
Values of type polygon are specified using any of the following syntaxes:
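```
( ( x1 , y1 ) , ... , ( xn , yn ) )
  ( x1 , y1 ) , ... , ( xn , yn )
  ( x1 , y1   , ... ,   xn , yn )
    x1 , y1   , ... ,   xn , yn
```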
where the points are the end points of the line segments comprising the boundary of the polygon.
Polygons are output using the first syntax.
Circles are represented by a center point and radius. Values of type circle are specified using any of the following syntaxes:
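```
< ( x , y ) , r >
( ( x , y ) , r )
  ( x , y ) , r
    x , y   , r
```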
where (x,y) is the center point and r is the radius of the circle.
Circles are output using the first syntax.
The data type uuid stores Universally Unique Identifiers (UUID) as defined by RFC 4122, ISO/IEC 9834-8:2005, and related standards. (Some systems refer to this data type as a globally unique identifier, or GUID, instead.) This identifier is a 128-bit quantity that is generated by an algorithm chosen to make it very unlikely that the same identifier will be generated by anyone else in the known universe using the same algorithm. Therefore, for distributed systems, these identifiers provide a better uniqueness guarantee than sequence generators, which are only unique within a single database.

A UUID is written as a sequence of lower-case hexadecimal digits, in several groups separated by hyphens, specifically a group of 8 digits followed by three groups of 4 digits followed by a group of 12 digits, for a total of 32 digits representing the 128 bits. An example of a UUID in this standard form is:
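```
a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11
```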
PostgreSQL also accepts the following alternative forms for input: use of upper-case digits, the standard format surrounded by braces, omitting some or all hyphens, adding a hyphen after any group of four digits. Examples are:
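```
A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11
{a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11}
a0eebc999c0b4ef8bb6d6bb9bd380a11
a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11
{a0eebc99-9c0b4ef8-bb6d6bb9-bd380a11}
```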
Output is always in the standard form.
See Section 9.14 for how to generate a UUID in PostgreSQL.
The bytea data type allows storage of binary strings; see Table 8.6.

Table 8.6. Binary Data Types

| Name | Storage Size | Description |
| --- | --- | --- |
| bytea | 1 or 4 bytes plus the actual binary string | variable-length binary string |
A binary string is a sequence of octets (or bytes). Binary strings are distinguished from character strings in two ways. First, binary strings specifically allow storing octets of value zero and other "non-printable" octets (usually, octets outside the range 32 to 126). Character strings disallow zero octets, and also disallow any other octet values and sequences of octet values that are invalid according to the database's selected character set encoding. Second, operations on binary strings process the actual bytes, whereas the processing of character strings depends on locale settings. In short, binary strings are appropriate for storing data that the programmer thinks of as "raw bytes", whereas character strings are appropriate for storing text.

The bytea type supports two external formats for input and output: PostgreSQL's historical "escape" format, and "hex" format. Both of these are always accepted on input. The output format depends on the configuration parameter bytea_output; the default is hex. (Note that the hex format was introduced in PostgreSQL 9.0; earlier versions and some tools don't understand it.)

The SQL standard defines a different binary string type, called BLOB or BINARY LARGE OBJECT. The input format is different from bytea, but the provided functions and operators are mostly the same.
bytea Hex Format

The "hex" format encodes binary data as 2 hexadecimal digits per byte, most significant nibble first. The entire string is preceded by the sequence \x (to distinguish it from the escape format). In some contexts, the initial backslash may need to be escaped by doubling it, in the same cases in which backslashes have to be doubled in escape format; details appear below. The hexadecimal digits can be either upper or lower case, and whitespace is permitted between digit pairs (but not within a digit pair nor in the starting \x sequence). The hex format is compatible with a wide range of external applications and protocols, and it tends to be faster to convert than the escape format, so its use is preferred.
Example:
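```sql
SELECT E'\\xDEADBEEF';
```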
bytea Escape Format

The "escape" format is the traditional PostgreSQL format for the bytea type. It takes the approach of representing a binary string as a sequence of ASCII characters, while converting those bytes that cannot be represented as an ASCII character into special escape sequences. If, from the point of view of the application, representing bytes as characters makes sense, then this representation can be convenient. But in practice it is usually confusing because it fuzzes up the distinction between binary strings and character strings, and also the particular escape mechanism that was chosen is somewhat unwieldy. Therefore, this format should probably be avoided for most new applications.

When entering bytea values in escape format, octets of certain values must be escaped, while all octet values can be escaped. In general, to escape an octet, convert it into its three-digit octal value and precede it by a backslash (or two backslashes, if writing the value as a literal using escape string syntax). Backslash itself (octet decimal value 92) can alternatively be represented by double backslashes. Table 8.7 shows the characters that must be escaped, and gives the alternative escape sequences where applicable.
Table 8.7. bytea Literal Escaped Octets

| Decimal Octet Value | Description | Escaped Input Representation | Example | Hex Representation |
| --- | --- | --- | --- | --- |
| 0 | zero octet | E'\\000' | SELECT E'\\000'::bytea; | \000 |
| 39 | single quote | '''' or E'\\047' | SELECT E'\''::bytea; | ' |
| 92 | backslash | E'\\\\' or E'\\134' | SELECT E'\\\\'::bytea; | \\ |
| 0 to 31 and 127 to 255 | "non-printable" octets | E'\\xxx' (octal value) | SELECT E'\\001'::bytea; | \001 |
The requirement to escape non-printable octets varies depending on locale settings. In some instances you can get away with leaving them unescaped. Note that the result in each of the examples in Table 8.7 was exactly one octet in length, even though the output representation is sometimes more than one character.

The reason multiple backslashes are required, as shown in Table 8.7, is that an input string written as a string literal must pass through two parse phases in the PostgreSQL server. The first backslash of each pair is interpreted as an escape character by the string-literal parser (assuming escape string syntax is used) and is therefore consumed, leaving the second backslash of the pair. (Dollar-quoted strings can be used to avoid this level of escaping.) The remaining backslash is then recognized by the bytea input function as starting either a three-digit octal value or escaping another backslash. For example, a string literal passed to the server as E'\\001' becomes \001 after passing through the escape string parser. The \001 is then sent to the bytea input function, where it is converted to a single octet with a decimal value of 1. Note that the single-quote character is not treated specially by bytea, so it follows the normal rules for string literals. (See also Section 4.1.2.1.)

Bytea octets are sometimes escaped when output. In general, each "non-printable" octet is converted into its equivalent three-digit octal value and preceded by one backslash. Most "printable" octets are represented by their standard representation in the client character set. The octet with decimal value 92 (backslash) is doubled in the output. Details are in Table 8.8.
Table 8.8. bytea Output Escaped Octets

| Decimal Octet Value | Description | Escaped Output Representation | Example | Output Result |
| --- | --- | --- | --- | --- |
| 92 | backslash | \\ | SELECT E'\\134'::bytea; | \\ |
| 0 to 31 and 127 to 255 | "non-printable" octets | \xxx (octal value) | SELECT E'\\001'::bytea; | \001 |
| 32 to 126 | "printable" octets | client character set representation | SELECT E'\\176'::bytea; | ~ |
Depending on the front end to PostgreSQL you use, you might have additional work to do in terms of escaping and unescaping bytea strings. For example, you might also have to escape line feeds and carriage returns if your interface automatically translates these.
The pg_lsn data type is used to store LSN (Log Sequence Number) data, which is a pointer to a location in the WAL. This type is a representation of XLogRecPtr and an internal system type of PostgreSQL.
Internally, an LSN is a 64-bit integer, representing a byte position in the write-ahead log stream. It is printed as two hexadecimal numbers of up to 8 digits each, separated by a slash; for example, 16/B374D848. The pg_lsn type supports the standard comparison operators, like = and >. Two LSNs can be subtracted using the - operator; the result is the number of bytes separating those write-ahead log locations.
Table 8.4. Character Types

| Name | Description |
| --- | --- |
| character varying(n), varchar(n) | variable-length with limit |
| character(n), char(n) | fixed-length, blank padded |
| text | variable unlimited length |
Table 8.4 shows the general-purpose character types available in PostgreSQL.
SQL defines two primary character types: character varying(n) and character(n), where n is a positive integer. Both of these types can store strings up to n characters (not bytes) in length. An attempt to store a longer string into a column of these types will result in an error, unless the excess characters are all spaces, in which case the string will be truncated to the maximum length. (This somewhat bizarre exception is required by the SQL standard.) If the string to be stored is shorter than the declared length, values of type character will be space-padded; values of type character varying will simply store the shorter string.
If one explicitly casts a value to character varying(n) or character(n), then an over-length value will be truncated to n characters without raising an error. (This too is required by the SQL standard.)
The notations varchar(n) and char(n) are aliases for character varying(n) and character(n), respectively. character without a length specifier is equivalent to character(1). If character varying is used without a length specifier, the type accepts strings of any size. The latter is a PostgreSQL extension.
In addition, PostgreSQL provides the text type, which stores strings of any length. Although the type text is not in the SQL standard, several other SQL database management systems support it as well.
Values of type character are physically padded with spaces to the specified width n, and are stored and displayed that way. However, trailing spaces are treated as semantically insignificant and disregarded when comparing two values of type character. In collations where whitespace is significant, this behavior can produce unexpected results; for example SELECT 'a '::CHAR(2) collate "C" < E'a\n'::CHAR(2) returns true, even though the C locale would consider a space to be greater than a newline. Trailing spaces are removed when converting a character value to one of the other string types. Note that trailing spaces are semantically significant in character varying and text values, and when using pattern matching, that is LIKE and regular expressions.
The storage requirement for a short string (up to 126 bytes) is 1 byte plus the actual string, which includes the space padding in the case of character. Longer strings have 4 bytes of overhead instead of 1. Long strings are compressed by the system automatically, so the physical requirement on disk might be less. Very long values are also stored in background tables so that they do not interfere with rapid access to shorter column values. In any case, the longest possible character string that can be stored is about 1 GB. (The maximum value that will be allowed for n in the data type declaration is less than that. It wouldn't be useful to change this because with multibyte character encodings the number of characters and bytes can be quite different. If you desire to store long strings with no specific upper limit, use text or character varying without a length specifier, rather than making up an arbitrary length limit.)
There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used instead.
Refer to Section 4.1.2.1 for information about the syntax of string literals, and to Chapter 9 for information about available operators and functions. The database character set determines the character set used to store textual values; for more information on character set support, refer to Section 23.3.
Example 8.1. Using the Character Types
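```sql
CREATE TABLE test1 (a character(4));
INSERT INTO test1 VALUES ('ok');
SELECT a, char_length(a) FROM test1; -- (1)

  a   | char_length
------+-------------
 ok   |           4


CREATE TABLE test2 (b varchar(5));
INSERT INTO test2 VALUES ('ok');
INSERT INTO test2 VALUES ('good      ');
INSERT INTO test2 VALUES ('too long');
ERROR:  value too long for type character varying(5)
INSERT INTO test2 VALUES ('too long'::varchar(5)); -- explicit truncation
SELECT b, char_length(b) FROM test2;

   b   | char_length
-------+-------------
 ok    |           2
 good  |           5
 too l |           5
```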
(1) The char_length function is discussed in Section 9.4.
There are two other fixed-length character types in PostgreSQL, shown in Table 8.5. The name type exists only for the storage of identifiers in the internal system catalogs and is not intended for use by the general user. Its length is currently defined as 64 bytes (63 usable characters plus terminator) but should be referenced using the constant NAMEDATALEN in C source code. The length is set at compile time (and is therefore adjustable for special uses); the default maximum length might change in a future release. The type "char" (note the quotes) is different from char(1) in that it only uses one byte of storage. It is internally used in the system catalogs as a simplistic enumeration type.
Table 8.5. Special Character Types

| Name | Storage Size | Description |
| --- | --- | --- |
| "char" | 1 byte | single-byte internal type |
| name | 64 bytes | internal type for object names |
PostgreSQL supports the full set of SQL date and time types, shown in Table 8.9. The operations available on these data types are described in Section 9.9.

Table 8.9. Date/Time Types
The SQL standard requires that writing just timestamp be equivalent to timestamp without time zone, and PostgreSQL honors that behavior. In addition, PostgreSQL accepts timestamptz as an abbreviation for timestamp with time zone; this is a PostgreSQL extension.
time, timestamp, and interval accept an optional precision value p, which specifies the number of fractional digits retained in the seconds field. By default, there is no explicit bound on precision. The allowed range of p is from 0 to 6.
The interval type has an additional option, which is to restrict the set of stored fields by writing one of these phrases:
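```
YEAR
MONTH
DAY
HOUR
MINUTE
SECOND
YEAR TO MONTH
DAY TO HOUR
DAY TO MINUTE
DAY TO SECOND
HOUR TO MINUTE
HOUR TO SECOND
MINUTE TO SECOND
```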
Note that if both fields and p are specified, the fields must include SECOND, since the precision applies only to the seconds.
The type time with time zone is defined by the SQL standard, but the definition exhibits properties which lead to questionable usefulness. In most cases, a combination of date, time, timestamp without time zone, and timestamp with time zone should provide a complete range of date/time functionality required by any application.
The types abstime and reltime are lower precision types which are used internally. You are discouraged from using these types in applications; these internal types might disappear in a future release.
where p is an optional precision specification giving the number of fractional digits in the seconds field. Precision can be specified for time, timestamp, and interval types, and can range from 0 to 6. If no precision is specified in a constant specification, it defaults to the precision of the literal value (but not more than 6 digits).
Table 8.10. Date Input
The time-of-day types are time [ (p) ] without time zone and time [ (p) ] with time zone. time alone is equivalent to time without time zone.
Table 8.11. Time Input

Table 8.12. Time Zone Input
Valid input for the time stamp types consists of the concatenation of a date and a time, followed by an optional time zone, followed by an optional AD or BC. (Alternatively, AD/BC can appear before the time zone, but this is not the preferred ordering.) Thus:
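```
1999-01-08 04:05:06
```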
and:
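```
1999-01-08 04:05:06 -8:00
```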
are valid values, which follow the ISO 8601 standard. In addition, the common format:
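```
January 8 04:05:06 1999 PST
```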
is supported.
The SQL standard differentiates timestamp without time zone and timestamp with time zone literals by the presence of a "+" or "-" symbol and time zone offset after the time. Hence, according to the standard,
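```
TIMESTAMP '2004-10-19 10:23:54'
```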
is a timestamp without time zone, while
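```
TIMESTAMP '2004-10-19 10:23:54+02'
```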
is a timestamp with time zone. PostgreSQL never examines the content of a literal string before determining its type, and therefore will treat both of the above as timestamp without time zone. To ensure that a literal is treated as timestamp with time zone, give it the correct explicit type:
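```
TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02'
```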
In a literal that has been determined to be timestamp without time zone, PostgreSQL will silently ignore any time zone indication. That is, the resulting value is derived from the date/time fields in the input value, and is not adjusted for time zone.
Conversions between timestamp without time zone and timestamp with time zone normally assume that the timestamp without time zone value should be taken or given as local time in the session's time zone. A different time zone can be specified for the conversion using AT TIME ZONE.
PostgreSQL supports several special date/time input values for convenience, as shown in Table 8.13. The values infinity and -infinity are specially represented inside the system and will be displayed unchanged; the others are simply notational shorthands that will be converted to ordinary date/time values when read. (In particular, now and related strings are converted to a specific time value as soon as they are read.) All of these values need to be enclosed in single quotes when used as constants in SQL commands.
Table 8.13. Special Date/Time Inputs
While the input strings now, today, tomorrow, and yesterday are fine to use in interactive SQL commands, they can have surprising behavior when the command is saved to be executed later, for example in prepared statements, views, and function definitions. The string can be converted to a specific time value that continues to be used long after it becomes stale. Use one of the SQL functions instead in such contexts. For example, CURRENT_DATE + 1 is safer than 'tomorrow'::date.
Table 8.14. Date/Time Output Styles
Table 8.15. Date Order Conventions
In the ISO style, the time zone is always shown as a signed numeric offset from UTC, with positive sign used for zones east of Greenwich. The offset will be shown as hh (hours only) if it is an integral number of hours, else as hh:mm if it is an integral number of minutes, else as hh:mm:ss. (The third case is not possible with any modern time zone standard, but it can appear when working with timestamps that predate the adoption of standardized time zones.) In the other date styles, the time zone is shown as an alphabetic abbreviation if one is in common use in the current zone. Otherwise it appears as a signed numeric offset in ISO 8601 basic format (hh or hhmm).
Time zones, and time-zone conventions, are influenced by political decisions, not just earth geometry. Time zones around the world became somewhat standardized during the 1900s, but continue to be prone to arbitrary changes, particularly with respect to daylight-savings rules. PostgreSQL uses the widely-used IANA (Olson) time zone database for information about historical time zone rules. For times in the future, the assumption is that the latest known rules for a given time zone will continue to be observed indefinitely far into the future.
PostgreSQL endeavors to be compatible with the SQL standard definitions for typical usage. However, the SQL standard has an odd mix of date and time types and capabilities. Two obvious problems are:
Although the date type cannot have an associated time zone, the time type can. Time zones in the real world have little meaning unless associated with a date as well as a time, since the offset can vary through the year with daylight-saving time boundaries.
The default time zone is specified as a constant numeric offset from UTC. It is therefore impossible to adapt to daylight-saving time when doing date/time arithmetic across DST boundaries.
To address these difficulties, we recommend using date/time types that contain both date and time when using time zones. We do not recommend using the type time with time zone (though it is supported by PostgreSQL for legacy applications and for compliance with the SQL standard). PostgreSQL assumes your local time zone for any type containing only date or time.
PostgreSQL allows you to specify time zones in three different forms:
In short, this is the difference between abbreviations and full names: abbreviations represent a specific offset from UTC, whereas many of the full names imply a local daylight-savings time rule, and so have two possible UTC offsets. As an example, 2014-06-04 12:00 America/New_York represents noon local time in New York, which for this particular date was Eastern Daylight Time (UTC-4). So 2014-06-04 12:00 EDT specifies that same time instant. But 2014-06-04 12:00 EST specifies noon Eastern Standard Time (UTC-5), regardless of whether daylight savings was nominally in effect on that date.
To complicate matters, some jurisdictions have used the same timezone abbreviation to mean different UTC offsets at different times; for example, in Moscow MSK has meant UTC+3 in some years and UTC+4 in others. PostgreSQL interprets such abbreviations according to whatever they meant (or had most recently meant) on the specified date; but, as with the EST example above, this is not necessarily the same as local civil time on that date.
In all cases, timezone names and abbreviations are recognized case-insensitively. (This is a change from PostgreSQL versions prior to 8.2, which were case-sensitive in some contexts but not others.)
The SQL command SET TIME ZONE sets the time zone for the session. This is an alternative spelling of SET TIMEZONE TO with a more SQL-spec-compatible syntax.
The PGTZ environment variable is used by libpq clients to send a SET TIME ZONE command to the server upon connection.
interval values can be written using the following verbose syntax:
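```
[@] quantity unit [quantity unit...] [direction]
```

where quantity is a number (possibly signed); unit is microsecond, millisecond, second, minute, hour, day, week, month, year, decade, century, millennium, or abbreviations or plurals of these units; direction can be ago or empty. The at sign (@) is optional noise.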
Quantities of days, hours, minutes, and seconds can be specified without explicit unit markings. For example, '1 12:59:10' is read the same as '1 day 12 hours 59 min 10 sec'. Also, a combination of years and months can be specified with a dash; for example '200-10' is read the same as '200 years 10 months'. (These shorter forms are in fact the only ones allowed by the SQL standard, and are used for output when IntervalStyle is set to sql_standard.)
Interval values can also be written as ISO 8601 time intervals, using either the “format with designators” of the standard's section 4.4.3.2 or the “alternative format” of section 4.4.3.3. The format with designators looks like this:
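```
P quantity unit [ quantity unit ...] [ T [ quantity unit ...]]
```

where quantity is a number and unit is one of the unit abbreviations in Table 8.16. Units can be omitted, and can be specified in any order, but units smaller than a day must appear after T.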
Table 8.16. ISO 8601 Interval Unit Abbreviations
In the alternative format:
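```
P [ years-months-days ] [ T hours:minutes:seconds ]
```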
the string must begin with P, and a T separates the date and time parts of the interval. The values are given as numbers similar to ISO 8601 dates.
When writing an interval constant with a fields specification, or when assigning a string to an interval column that was defined with a fields specification, the interpretation of unmarked quantities depends on the fields. For example INTERVAL '1' YEAR is read as 1 year, whereas INTERVAL '1' means 1 second. Also, field values "to the right" of the least significant field allowed by the fields specification are silently discarded. For example, writing INTERVAL '1 day 2:03:04' HOUR TO MINUTE results in dropping the seconds field, but not the day field.
According to the SQL standard all fields of an interval value must have the same sign, so a leading negative sign applies to all fields; for example the negative sign in the interval literal '-1 2:03:04' applies to both the days and hour/minute/second parts. PostgreSQL allows the fields to have different signs, and traditionally treats each field in the textual representation as independently signed, so that the hour/minute/second part is considered positive in this example. If IntervalStyle is set to sql_standard then a leading sign is considered to apply to all fields (but only if no additional signs appear). Otherwise the traditional PostgreSQL interpretation is used. To avoid ambiguity, it's recommended to attach an explicit sign to each field if any field is negative.
Field values can have fractional parts: for example, '1.5 weeks' or '01:02:03.45'. However, because interval internally stores only three integer units (months, days, microseconds), fractional units must be spilled to smaller units. Fractional parts of units greater than months are rounded to be an integer number of months, e.g. '1.5 years' becomes '1 year 6 mons'. Fractional parts of weeks and days are computed to be an integer number of days and microseconds, assuming 30 days per month and 24 hours per day, e.g., '1.75 months' becomes 1 mon 22 days 12:00:00. Only seconds will ever be shown as fractional on output.
Internally interval values are stored as months, days, and microseconds. This is done because the number of days in a month varies, and a day can have 23 or 25 hours if a daylight savings time adjustment is involved. The months and days fields are integers while the microseconds field can store fractional seconds. Because intervals are usually created from constant strings or timestamp subtraction, this storage method works well in most cases, but can cause unexpected results:
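```sql
SELECT EXTRACT(hours from '80 minutes'::interval);
 date_part
-----------
         1

SELECT EXTRACT(days from '80 hours'::interval);
 date_part
-----------
         0
```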
Functions justify_days and justify_hours are available for adjusting days and hours that overflow their normal ranges.
The sql_standard style produces output that conforms to the SQL standard's specification for interval literal strings, if the interval value meets the standard's restrictions (either year-month only or day-time only, with no mixing of positive and negative components). Otherwise the output looks like a standard year-month literal string followed by a day-time literal string, with explicit signs added to disambiguate mixed-sign intervals.
The output of the postgres_verbose style matches the output of PostgreSQL releases prior to 8.4 when the DateStyle parameter was set to non-ISO output.
The output of the iso_8601 style matches the "format with designators" described in section 4.4.3.2 of the ISO 8601 standard.
The xml data type can be used to store XML data. Its advantage over storing XML data in a text field is that it checks the input values for well-formedness, and there are support functions to perform type-safe operations on it; see the XML functions in Chapter 9. Use of this data type requires the installation to have been built with configure --with-libxml.
The xml type can store well-formed "documents", as defined by the XML standard, as well as "content" fragments, which are defined by reference to the more permissive XQuery and XPath data model. Roughly, this means that content fragments can have more than one top-level element or character node. The expression xmlvalue IS DOCUMENT can be used to evaluate whether a particular xml value is a full document or only a content fragment.
Limits and compatibility notes for the xml data type can be found in Appendix D.
To produce a value of type xml from character data, use the function xmlparse:
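```
XMLPARSE ( { DOCUMENT | CONTENT } value)
```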
Examples:
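```sql
XMLPARSE (DOCUMENT '<?xml version="1.0"?><book><title>Manual</title><chapter>...</chapter></book>')
XMLPARSE (CONTENT 'abc<foo>bar</foo><bar>foo</bar>')
```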
While this is the only way to convert character strings into XML values according to the SQL standard, the PostgreSQL-specific syntaxes:
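```sql
xml '<foo>bar</foo>'
'<foo>bar</foo>'::xml
```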
can also be used.
The xml type does not validate input values against a document type declaration (DTD), even when the input value specifies a DTD. There is also currently no built-in support for validating against other XML schema languages such as XML Schema.
The inverse operation, producing a character string value from xml, uses the function xmlserialize:
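```
XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type )
```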
type can be character, character varying, or text (or an alias for one of those). Again, according to the SQL standard, this is the only way to convert between type xml and character types, but PostgreSQL also allows you to simply cast the value.
When a character string value is cast to or from type xml without going through XMLPARSE or XMLSERIALIZE, respectively, the choice of DOCUMENT versus CONTENT is determined by the "XML option" session configuration parameter, which can be set using the standard command:
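```sql
SET XML OPTION { DOCUMENT | CONTENT };
```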
or the more PostgreSQL-like syntax
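```sql
SET xmloption TO { DOCUMENT | CONTENT };
```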
The default is CONTENT, so all forms of XML data are allowed.
When using binary mode to pass query parameters to the server and query results back to the client, no encoding conversion is performed, so the situation is different. In this case, an encoding declaration in the XML data will be observed, and if it is absent, the data will be assumed to be in UTF-8 (as required by the XML standard; note that PostgreSQL does not support UTF-16). On output, data will have an encoding declaration specifying the client encoding, unless the client encoding is UTF-8, in which case it will be omitted.
Needless to say, processing XML data with PostgreSQL will be less error-prone and more efficient if the XML data encoding, client encoding, and server encoding are the same. Since XML data is internally processed in UTF-8, computations will be most efficient if the server encoding is also UTF-8.
Some XML-related functions may not work at all on non-ASCII data when the server encoding is not UTF-8. This is known to be an issue for xmltable() and xpath() in particular.
The xml data type is unusual in that it does not provide any comparison operators. This is because there is no well-defined and universally useful comparison algorithm for XML data. One consequence of this is that you cannot retrieve rows by comparing an xml column against a search value. XML values should therefore typically be accompanied by a separate key field such as an ID. An alternative solution for comparing XML values is to convert them to character strings first, but note that character string comparison has little to do with a useful XML comparison method.
Since there are no comparison operators for the xml data type, it is not possible to create an index directly on a column of this type. If speedy searches in XML data are desired, possible workarounds include casting the expression to a character string type and indexing that, or indexing an XPath expression. Of course, the actual query would have to be adjusted to search by the indexed expression.
The text-search functionality in PostgreSQL can also be used to speed up full-document searches of XML data. The necessary preprocessing support is, however, not yet available in the PostgreSQL distribution.
A domain is a user-defined data type that is based on another underlying type. Optionally, it can have constraints that restrict its valid values to a subset of what the underlying type would allow. Otherwise it behaves like the underlying type — for example, any operator or function that can be applied to the underlying type will work on the domain type. The underlying type can be any built-in or user-defined base type, enum type, array type, composite type, range type, or another domain.
For example, we could create a domain over integers that accepts only positive integers:
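```sql
CREATE DOMAIN posint AS integer CHECK (VALUE > 0);
CREATE TABLE mytable (id posint);
INSERT INTO mytable VALUES(1);   -- works
INSERT INTO mytable VALUES(-1);  -- fails
```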
When an operator or function of the underlying type is applied to a domain value, the domain is automatically down-cast to the underlying type. Thus, for example, the result of mytable.id - 1 is considered to be of type integer, not posint. We could write (mytable.id - 1)::posint to cast the result back to posint, causing the domain's constraints to be rechecked. In this case, that would result in an error if the expression had been applied to an id value of 1. Assigning a value of the underlying type to a field or variable of the domain type is allowed without writing an explicit cast, but the domain's constraints will be checked.
For more information see CREATE DOMAIN.
PostgreSQL provides two data types that are designed to support full text search, which is the activity of searching through a collection of natural-language documents to locate those that best match a query. The tsvector type represents a document in a form optimized for text search; the tsquery type similarly represents a text query. Chapter 12 provides a detailed explanation of this facility, and Section 9.13 summarizes the related functions and operators.
tsvector

A tsvector value is a sorted list of distinct lexemes, which are words that have been normalized to merge different variants of the same word (see Chapter 12 for details). Sorting and duplicate-elimination are done automatically during input, as shown in this example:
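```sql
SELECT 'a fat cat sat on a mat and ate a fat rat'::tsvector;
                      tsvector
----------------------------------------------------
 'a' 'and' 'ate' 'cat' 'fat' 'mat' 'on' 'rat' 'sat'
```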
To represent lexemes containing whitespace or punctuation, surround them with quotes:
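```sql
SELECT $$the lexeme '    ' contains spaces$$::tsvector;
                 tsvector
-------------------------------------------
 '    ' 'contains' 'lexeme' 'spaces' 'the'
```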
(We use dollar-quoted string literals in this example and the next one to avoid the confusion of having to double quote marks within the literals.) Embedded quotes and backslashes must be doubled:
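```sql
SELECT $$the lexeme 'Joe''s' contains a quote$$::tsvector;
                    tsvector
------------------------------------------------
 'Joe''s' 'a' 'contains' 'lexeme' 'quote' 'the'
```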
Optionally, integer positions can be attached to lexemes:
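```sql
SELECT 'a:1 fat:2 cat:3 sat:4 on:5 a:6 mat:7 and:8 ate:9 a:10 fat:11 rat:12'::tsvector;
                                  tsvector
-------------------------------------------------------------------------------
 'a':1,6,10 'and':8 'ate':9 'cat':3 'fat':2,11 'mat':7 'on':5 'rat':12 'sat':4
```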
A position normally indicates the source word's location in the document. Positional information can be used for proximity ranking. Position values can range from 1 to 16383; larger numbers are silently set to 16383. Duplicate positions for the same lexeme are discarded.
Lexemes that have positions can further be labeled with a weight, which can be A, B, C, or D. D is the default and hence is not shown on output:
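```sql
SELECT 'a:1A fat:2B,4C cat:5D'::tsvector;
          tsvector
----------------------------
 'a':1A 'cat':5 'fat':2B,4C
```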
Weights are typically used to reflect document structure, for example by marking title words differently from body words. Text search ranking functions can assign different priorities to the different weight markers.
It is important to understand that the tsvector type itself does not perform any word normalization; it assumes the words it is given are normalized appropriately for the application. For example,
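```sql
SELECT 'The Fat Rats'::tsvector;
      tsvector
--------------------
 'Fat' 'Rats' 'The'
```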
For most English-text-searching applications the above words would be considered non-normalized, but tsvector doesn't care. Raw document text should usually be passed through to_tsvector to normalize the words appropriately for searching:
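```sql
SELECT to_tsvector('english', 'The Fat Rats');
   to_tsvector
-----------------
 'fat':2 'rat':3
```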
tsquery

A tsquery value stores lexemes that are to be searched for, and can combine them using the Boolean operators & (AND), | (OR), and ! (NOT), as well as the phrase search operator <-> (FOLLOWED BY). There is also a variant <N> of the FOLLOWED BY operator, where N is an integer constant that specifies the distance between the two lexemes being searched for. <-> is equivalent to <1>.
Parentheses can be used to enforce grouping of these operators. In the absence of parentheses, ! (NOT) binds most tightly, <-> (FOLLOWED BY) next most tightly, then & (AND), with | (OR) binding the least tightly.
Here are some examples:
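```sql
SELECT 'fat & rat'::tsquery;
    tsquery
---------------
 'fat' & 'rat'

SELECT 'fat & (rat | cat)'::tsquery;
          tsquery
---------------------------
 'fat' & ( 'rat' | 'cat' )

SELECT 'fat & rat & ! cat'::tsquery;
        tsquery
------------------------
 'fat' & 'rat' & !'cat'
```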
Optionally, lexemes in a tsquery can be labeled with one or more weight letters, which restricts them to match only tsvector lexemes with one of those weights:
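```sql
SELECT 'fat:ab & cat'::tsquery;
     tsquery
------------------
 'fat':AB & 'cat'
```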
Also, lexemes in a tsquery can be labeled with * to specify prefix matching:
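```sql
SELECT 'super:*'::tsquery;
  tsquery
-----------
 'super':*
```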
This query will match any word in a tsvector that begins with "super".
Quoting rules for lexemes are the same as described previously for lexemes in tsvector; and, as with tsvector, any required normalization of words must be done before converting to the tsquery type. The to_tsquery function is convenient for performing such normalization:
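```sql
SELECT to_tsquery('Fat:ab & Cats');
    to_tsquery
------------------
 'fat':AB & 'cat'
```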
Note that to_tsquery will process prefixes in the same way as other words, which means this comparison returns true:
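```sql
SELECT to_tsvector( 'postgraduate' ) @@ to_tsquery( 'postgres:*' );
 ?column?
----------
 t
```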
because postgres gets stemmed to postgr:
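```sql
SELECT to_tsquery('postgres:*');
 to_tsquery
------------
 'postgr':*
```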
which will match the stemmed form of postgraduate.
A composite type represents the structure of a row or record; it is essentially just a list of field names and their data types. PostgreSQL allows composite types to be used in many of the same ways that simple types can be used. For example, a column of a table can be declared to be of a composite type.
Here are two simple examples of defining composite types:
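```sql
CREATE TYPE complex AS (
    r       double precision,
    i       double precision
);

CREATE TYPE inventory_item AS (
    name            text,
    supplier_id     integer,
    price           numeric
);
```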
The syntax is comparable to CREATE TABLE, except that only field names and types can be specified; no constraints (such as NOT NULL) can presently be included. Note that the AS keyword is essential; without it, the system will think a different kind of CREATE TYPE command is meant, and you will get odd syntax errors.
Having defined the types, we can use them to create tables:
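```sql
CREATE TABLE on_hand (
    item      inventory_item,
    count     integer
);

INSERT INTO on_hand VALUES (ROW('fuzzy dice', 42, 1.99), 1000);
```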
or functions:
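```sql
CREATE FUNCTION price_extension(inventory_item, integer) RETURNS numeric
AS 'SELECT $1.price * $2' LANGUAGE SQL;

SELECT price_extension(item, 10) FROM on_hand;
```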
Whenever you create a table, a composite type is also automatically created, with the same name as the table, to represent the table's row type. For example, had we said:
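```sql
CREATE TABLE inventory_item (
    name            text,
    supplier_id     integer REFERENCES suppliers,
    price           numeric CHECK (price > 0)
);
```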
then the same inventory_item composite type shown above would come into being as a byproduct, and could be used just as above. Note however an important restriction of the current implementation: since no constraints are associated with a composite type, the constraints shown in the table definition do not apply to values of the composite type outside the table. (To work around this, create a domain over the composite type, and apply the desired constraints as CHECK constraints of the domain.)
To write a composite value as a literal constant, enclose the field values within parentheses and separate them by commas. You can put double quotes around any field value, and must do so if it contains commas or parentheses. An example is:
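```
'("fuzzy dice",42,1.99)'
```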
which would be a valid value of the inventory_item type defined above. To make a field be NULL, write no characters at all in its position in the list. For example, this constant specifies a NULL third field:
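```
'("fuzzy dice",42,)'
```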
If you want an empty string rather than NULL, write double quotes:
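```
'("",42,)'
```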
Here the first field is a non-NULL empty string, the third is NULL.
The ROW expression syntax can also be used to construct composite values. In most cases this is considerably simpler to use than the string-literal syntax since you don't have to worry about multiple layers of quoting. We already used this method above:
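```sql
ROW('fuzzy dice', 42, 1.99)
ROW('', 42, NULL)
```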
The ROW keyword is actually optional as long as you have more than one field in the expression, so these can be simplified to:
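```sql
('fuzzy dice', 42, 1.99)
('', 42, NULL)
```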
To access a field of a composite column, one writes a dot and the field name, much like selecting a field from a table name. In fact, it's so much like selecting from a table name that you often have to use parentheses to keep from confusing the parser. For example, you might try to select some subfields from our on_hand example table with something like:
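```sql
SELECT item.name FROM on_hand WHERE item.price > 9.99;
```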
This will not work since the name item is taken to be a table name, not a column name of on_hand, per SQL syntax rules. You must write it like this:
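```sql
SELECT (item).name FROM on_hand WHERE (item).price > 9.99;
```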
or if you need to use the table name as well (for instance in a multitable query), like this:
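```sql
SELECT (on_hand.item).name FROM on_hand WHERE (on_hand.item).price > 9.99;
```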
Now the parenthesized object is correctly interpreted as a reference to the item column, and then the subfield can be selected from it.
Similar syntactic issues apply whenever you select a field from a composite value. For instance, to select just one field from the result of a function that returns a composite value, you'd need to write something like:
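```sql
SELECT (my_func(...)).field FROM ...
```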
Without the extra parentheses, this will generate a syntax error.
Here are some examples of the proper syntax for inserting and updating composite columns. First, inserting or updating a whole column:
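```sql
INSERT INTO mytab (complex_col) VALUES((1.1,2.2));
UPDATE mytab SET complex_col = ROW(1.1,2.2) WHERE ...;
```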
The first example omits ROW, the second uses it; we could have done it either way.
We can update an individual subfield of a composite column:
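```sql
UPDATE mytab SET complex_col.r = (complex_col).r + 1 WHERE ...;
```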
Notice here that we don't need to (and indeed cannot) put parentheses around the column name appearing just after SET, but we do need parentheses when referencing the same column in the expression to the right of the equal sign.
And we can specify subfields as targets for INSERT, too:
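```sql
INSERT INTO mytab (complex_col.r, complex_col.i) VALUES(1.1, 2.2);
```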
Had we not supplied values for all the subfields of the column, the remaining subfields would have been filled with null values.
There are various special syntax rules and behaviors associated with composite types in queries. These rules provide useful shortcuts, but can be confusing if you don't know the logic behind them.
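For example:

```sql
SELECT c FROM inventory_item c;
```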
This query produces a single composite-valued column, so we might get output like:
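```
           c
------------------------
 ("fuzzy dice",42,1.99)
(1 row)
```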
Note however that simple names are matched to column names before table names, so this example works only because there is no column named c in the query's tables.
When we write
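```sql
SELECT c.* FROM inventory_item c;
```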
then, according to the SQL standard, we should get the contents of the table expanded into separate columns:
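```
    name    | supplier_id | price
------------+-------------+-------
 fuzzy dice |          42 |  1.99
(1 row)
```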
as if the query were
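```sql
SELECT c.name, c.supplier_id, c.price FROM inventory_item c;
```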
PostgreSQL handles column expansion by actually transforming the first form into the second. So, in this example, myfunc() would get invoked three times per row with either syntax. If it's an expensive function you may wish to avoid that, which you can do with a query like:
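```sql
SELECT m.* FROM some_table, LATERAL (SELECT myfunc(x) AS m) ss;
```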
Placing the function in a LATERAL FROM item keeps it from being invoked more than once per row. m.* is still expanded into m.a, m.b, m.c, but now those variables are just references to the output of the FROM item. (The LATERAL keyword is optional here, but we show it to clarify that the function is getting x from some_table.)
In both cases, the current row of inventory_item is passed to the function as a single composite-valued argument. Even though .* does nothing in such cases, using it is good style, since it makes clear that a composite value is intended. In particular, the parser will consider c in c.* to refer to a table name or alias, not to a column name, so that there is no ambiguity; whereas without .*, it is not clear whether c means a table name or a column name, and in fact the column-name interpretation will be preferred if there is a column named c.
Another example demonstrating these concepts is that all these queries mean the same thing:
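```sql
SELECT * FROM inventory_item c ORDER BY c;
SELECT * FROM inventory_item c ORDER BY c.*;
SELECT * FROM inventory_item c ORDER BY ROW(c.*);
SELECT * FROM inventory_item c ORDER BY ROW(c.name, c.supplier_id, c.price);
SELECT * FROM inventory_item c ORDER BY (c.name, c.supplier_id, c.price);
```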
(The last case uses a row constructor with the key word ROW omitted.)
Another special syntactical behavior associated with composite values is that we can use functional notation for extracting a field of a composite value. The simple way to explain this is that the notations field(table) and table.field are interchangeable. For example, these queries are equivalent:
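```sql
SELECT c.name FROM inventory_item c WHERE c.price > 1000;
SELECT name(c) FROM inventory_item c WHERE price(c) > 1000;
```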
Moreover, if we have a function that accepts a single argument of a composite type, we can call it with either notation. These queries are all equivalent:
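```sql
SELECT somefunc(c) FROM inventory_item c;
SELECT somefunc(c.*) FROM inventory_item c;
SELECT c.somefunc FROM inventory_item c;
```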
This equivalence between functional notation and field notation makes it possible to use functions on composite types to implement "computed fields". An application using the last query above wouldn't need to be directly aware that somefunc isn't a real column of the table.
Because of this behavior, it's unwise to give a function that takes a single composite-type argument the same name as any of the fields of that composite type. If there is ambiguity, the field-name interpretation will be chosen if field-name syntax is used, while the function will be chosen if function-call syntax is used. However, PostgreSQL versions before 11 always chose the field-name interpretation, unless the syntax of the call required it to be a function call. One way to force the function interpretation in older versions is to schema-qualify the function name, that is, write schema.func(compositevalue).
The external text representation of a composite value consists of items that are interpreted according to the I/O conversion rules for the individual field types, plus decoration that indicates the composite structure. The decoration consists of parentheses (( and )) around the whole value, plus commas (,) between adjacent items. Whitespace outside the parentheses is ignored, but within the parentheses it is considered part of the field value, and might or might not be significant depending on the input conversion rules for the field data type. For example, in:
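```
'(  42)'
```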
the whitespace will be ignored if the field type is integer, but not if it is text.
As shown previously, when writing a composite value you can write double quotes around any individual field value. You must do so if the field value would otherwise confuse the composite-value parser. In particular, fields containing parentheses, commas, double quotes, or backslashes must be double-quoted. To put a double quote or backslash in a quoted composite field value, precede it with a backslash. (Also, a pair of double quotes within a double-quoted field value is taken to represent a double quote character, analogously to the rules for single quotes in SQL literal strings.) Alternatively, you can avoid quoting and use backslash-escaping to protect all data characters that would otherwise be taken as composite syntax.
A completely empty field value (no characters at all between the commas or parentheses) represents a NULL. To write a value that is an empty string rather than NULL, write "".
The composite output routine will put double quotes around field values if they are empty strings or contain parentheses, commas, double quotes, backslashes, or white space. (Doing so for white space is not essential, but aids legibility.) Double quotes and backslashes embedded in field values will be doubled.
Remember that what you write in an SQL command will first be interpreted as a string literal, and then as a composite. This doubles the number of backslashes you need (assuming escape string syntax is used). For example, to insert a text field containing a double quote and a backslash in a composite value, you'd need to write:
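```sql
INSERT ... VALUES ('("\"\\\\")');
```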
The ROW constructor syntax is usually easier to work with than the composite-literal syntax when writing composite values in SQL commands. In ROW, individual field values are written the same way they would be written when not members of a composite.
Object identifiers (OIDs) are used internally by PostgreSQL as primary keys for various system tables. Type oid represents an object identifier. There are also several alias types for oid: regproc, regprocedure, regoper, regoperator, regclass, regtype, regrole, regnamespace, regconfig, and regdictionary.
The oid type is currently implemented as an unsigned four-byte integer. Therefore, it is not large enough to provide database-wide uniqueness in large databases, or even in large individual tables.
The oid type itself has few operations beyond comparison. It can be cast to integer, however, and then manipulated using the standard integer operators. (Beware of possible signed-versus-unsigned confusion if you do this.)
The OID alias types have no operations of their own except for specialized input and output routines. These routines are able to accept and display symbolic names for system objects, rather than the raw numeric value that type oid would use. The alias types allow simplified lookup of OID values for objects. For example, to examine the pg_attribute rows related to a table mytable, one could write:
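```sql
SELECT * FROM pg_attribute WHERE attrelid = 'mytable'::regclass;
```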
rather than:
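```sql
SELECT * FROM pg_attribute
  WHERE attrelid = (SELECT oid FROM pg_class WHERE relname = 'mytable');
```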
While that doesn't look all that bad by itself, it's still oversimplified. A far more complicated sub-select would be needed to select the right OID if there are multiple tables named mytable in different schemas. The regclass input converter handles the table lookup according to the schema path setting, and so it does the "right thing" automatically. Similarly, casting a table's OID to regclass is handy for symbolic display of a numeric OID.
All of the OID alias types for objects grouped by namespace accept schema-qualified names, and will display schema-qualified names on output if the object would not be found in the current search path without being qualified. The regproc and regoper alias types will only accept input names that are unique (not overloaded), so they are of limited use; for most uses regprocedure or regoperator are more appropriate. For regoperator, unary operators are identified by writing NONE for the unused operand.
An additional property of most of the OID alias types is the creation of dependencies. If a constant of one of these types appears in a stored expression (such as a column default expression or view), it creates a dependency on the referenced object. For example, if a column has a default expression nextval('my_seq'::regclass), PostgreSQL understands that the default expression depends on the sequence my_seq; the system will not let the sequence be dropped without first removing the default expression. regrole is the only exception to this property; constants of this type are not allowed in such expressions.
The OID alias types do not completely follow transaction isolation rules. The planner also treats them as simple constants, which may result in sub-optimal planning.
Another identifier type used by the system is xid, or transaction (abbreviated xact) identifier. This is the data type of the system columns xmin and xmax. Transaction identifiers are 32-bit quantities.
A third identifier type used by the system is cid, or command identifier. This is the data type of the system columns cmin and cmax. Command identifiers are also 32-bit quantities.
A final identifier type used by the system is tid, or tuple identifier (row identifier). This is the data type of the system column ctid. A tuple ID is a pair (block number, tuple index within block) that identifies the physical location of the row within its table.
PostgreSQL allows columns of a table to be defined as variable-length multidimensional arrays. Arrays of any built-in or user-defined base type, enum type, composite type, range type, or domain can be created.
To illustrate the use of array types, we create this table:
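```sql
CREATE TABLE sal_emp (
    name            text,
    pay_by_quarter  integer[],
    schedule        text[][]
);
```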
As shown, an array data type is named by appending square brackets ([]) to the data type name of the array elements. The above command will create a table named sal_emp with a column of type text (name), a one-dimensional array of type integer (pay_by_quarter), which represents the employee's salary by quarter, and a two-dimensional array of text (schedule), which represents the employee's weekly schedule.
The syntax for CREATE TABLE allows the exact size of arrays to be specified, for example:
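```sql
CREATE TABLE tictactoe (
    squares   integer[3][3]
);
```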
However, the current implementation ignores any supplied array size limits, i.e., the behavior is the same as for arrays of unspecified length.
The current implementation does not enforce the declared number of dimensions either. Arrays of a particular element type are all considered to be of the same type, regardless of size or number of dimensions. So, declaring the array size or number of dimensions in CREATE TABLE is simply documentation; it does not affect run-time behavior.
An alternative syntax, which conforms to the SQL standard by using the keyword ARRAY, can be used for one-dimensional arrays. pay_by_quarter could have been defined as:
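```sql
    pay_by_quarter  integer ARRAY[4],
```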
Or, if no array size is to be specified:
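```sql
    pay_by_quarter  integer ARRAY,
```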
As before, however, PostgreSQL does not enforce the size restriction in any case.
To write an array value as a literal constant, enclose the element values within curly braces and separate them by commas. (If you know C, this is not unlike the C syntax for initializing structures.) You can put double quotes around any element value, and must do so if it contains commas or curly braces. (More details appear below.) Thus, the general format of an array constant is the following:
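In outline, with val1, val2, and delim as placeholders:

'{ val1 delim val2 delim ... }'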
where delim
is the delimiter character for the type, as recorded in its pg_type
entry. Among the standard data types provided in the PostgreSQL distribution, all use a comma (,
), except for type box
which uses a semicolon (;
). Each val
is either a constant of the array element type, or a subarray. An example of an array constant is:
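For instance:

'{{1,2,3},{4,5,6},{7,8,9}}'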
This constant is a two-dimensional, 3-by-3 array consisting of three subarrays of integers.
To set an element of an array constant to NULL, write NULL
for the element value. (Any upper- or lower-case variant of NULL
will do.) If you want an actual string value “NULL”, you must put double quotes around it.
Now we can show some INSERT
statements:
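Using the sal_emp table defined above, the inserts might read:

INSERT INTO sal_emp
    VALUES ('Bill',
    '{10000, 10000, 10000, 10000}',
    '{{"meeting", "lunch"}, {"training", "presentation"}}');

INSERT INTO sal_emp
    VALUES ('Carol',
    '{20000, 25000, 25000, 25000}',
    '{{"breakfast", "consulting"}, {"meeting", "lunch"}}');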
The result of the previous two inserts looks like this:
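Selecting everything back would show:

SELECT * FROM sal_emp;
 name  |      pay_by_quarter       |                 schedule
-------+---------------------------+-------------------------------------------
 Bill  | {10000,10000,10000,10000} | {{meeting,lunch},{training,presentation}}
 Carol | {20000,25000,25000,25000} | {{breakfast,consulting},{meeting,lunch}}
(2 rows)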
Multidimensional arrays must have matching extents for each dimension. A mismatch causes an error, for example:
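A sub-array with the wrong extent is rejected:

INSERT INTO sal_emp
    VALUES ('Bill',
    '{10000, 10000, 10000, 10000}',
    '{{"meeting", "lunch"}, {"meeting"}}');
ERROR:  multidimensional arrays must have array expressions with matching dimensions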
The ARRAY
constructor syntax can also be used:
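The same insert written with the constructor might be:

INSERT INTO sal_emp
    VALUES ('Bill',
    ARRAY[10000, 10000, 10000, 10000],
    ARRAY[['meeting', 'lunch'], ['training', 'presentation']]);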
Now, we can run some queries on the table. First, we show how to access a single element of an array. This query retrieves the names of the employees whose pay changed in the second quarter:
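One way to express this query:

SELECT name FROM sal_emp WHERE pay_by_quarter[1] <> pay_by_quarter[2];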
The array subscript numbers are written within square brackets. By default PostgreSQL uses a one-based numbering convention for arrays, that is, an array of n
elements starts with array[1]
and ends with array[
n
].
This query retrieves the third quarter pay of all employees:
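For instance:

SELECT pay_by_quarter[3] FROM sal_emp;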
We can also access arbitrary rectangular slices of an array, or subarrays. An array slice is denoted by writing lower-bound
:upper-bound
for one or more array dimensions. For example, this query retrieves the first item on Bill's schedule for the first two days of the week:
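A sketch of such a slice query:

SELECT schedule[1:2][1:1] FROM sal_emp WHERE name = 'Bill';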
If any dimension is written as a slice, i.e., contains a colon, then all dimensions are treated as slices. Any dimension that has only a single number (no colon) is treated as being from 1 to the number specified. For example, [2]
is treated as [1:2]
, as in this example:
To avoid confusion with the non-slice case, it's best to use slice syntax for all dimensions, e.g., [1:2][1:1]
, not [2][1:1]
.
It is possible to omit the lower-bound
and/or upper-bound
of a slice specifier; the missing bound is replaced by the lower or upper limit of the array's subscripts. For example:
An array subscript expression will return null if either the array itself or any of the subscript expressions are null. Also, null is returned if a subscript is outside the array bounds (this case does not raise an error). For example, if schedule
currently has the dimensions [1:3][1:2]
then referencing schedule[3][3]
yields NULL. Similarly, an array reference with the wrong number of subscripts yields a null rather than an error.
An array slice expression likewise yields null if the array itself or any of the subscript expressions are null. However, in other cases such as selecting an array slice that is completely outside the current array bounds, a slice expression yields an empty (zero-dimensional) array instead of null. (This does not match non-slice behavior and is done for historical reasons.) If the requested slice partially overlaps the array bounds, then it is silently reduced to just the overlapping region instead of returning null.
The current dimensions of any array value can be retrieved with the array_dims
function:
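For example:

SELECT array_dims(schedule) FROM sal_emp WHERE name = 'Carol';
 array_dims
------------
 [1:2][1:2]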
array_dims
produces a text
result, which is convenient for people to read but perhaps inconvenient for programs. Dimensions can also be retrieved with array_upper
and array_lower
, which return the upper and lower bound of a specified array dimension, respectively:
array_length
will return the length of a specified array dimension:
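For instance:

SELECT array_length(schedule, 1) FROM sal_emp WHERE name = 'Carol';
 array_length
--------------
            2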
cardinality
returns the total number of elements in an array across all dimensions. It is effectively the number of rows a call to unnest
would yield:
An array value can be replaced completely:
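For example:

UPDATE sal_emp SET pay_by_quarter = '{25000,25000,27000,27000}'
    WHERE name = 'Carol';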
or using the ARRAY
expression syntax:
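Which might read:

UPDATE sal_emp SET pay_by_quarter = ARRAY[25000,25000,27000,27000]
    WHERE name = 'Carol';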
An array can also be updated at a single element:
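For instance:

UPDATE sal_emp SET pay_by_quarter[4] = 15000
    WHERE name = 'Bill';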
or updated in a slice:
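For instance:

UPDATE sal_emp SET pay_by_quarter[1:2] = '{27000,27000}'
    WHERE name = 'Carol';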
The slice syntaxes with omitted lower-bound
and/or upper-bound
can be used too, but only when updating an array value that is not NULL or zero-dimensional (otherwise, there is no existing subscript limit to substitute).
A stored array value can be enlarged by assigning to elements not already present. Any positions between those previously present and the newly assigned elements will be filled with nulls. For example, if array myarray
currently has four elements, it will have six elements after an update that assigns to myarray[6]
; myarray[5]
will contain null. Currently, enlargement in this fashion is only allowed for one-dimensional arrays, not multidimensional arrays.
Subscripted assignment allows creation of arrays that do not use one-based subscripts. For example one might assign to myarray[-2:7]
to create an array with subscript values from -2 to 7.
New array values can also be constructed using the concatenation operator, ||
:
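Two simple sketches:

SELECT ARRAY[1,2] || ARRAY[3,4];           -- yields {1,2,3,4}
SELECT ARRAY[5,6] || ARRAY[[1,2],[3,4]];   -- yields {{5,6},{1,2},{3,4}}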
The concatenation operator allows a single element to be pushed onto the beginning or end of a one-dimensional array. It also accepts two N
-dimensional arrays, or an N
-dimensional and an N+1
-dimensional array.
When a single element is pushed onto either the beginning or end of a one-dimensional array, the result is an array with the same lower bound subscript as the array operand. For example:
When two arrays with an equal number of dimensions are concatenated, the result retains the lower bound subscript of the left-hand operand's outer dimension. The result is an array comprising every element of the left-hand operand followed by every element of the right-hand operand. For example:
When an N
-dimensional array is pushed onto the beginning or end of an N+1
-dimensional array, the result is analogous to the element-array case above. Each N
-dimensional sub-array is essentially an element of the N+1
-dimensional array's outer dimension. For example:
An array can also be constructed by using the functions array_prepend
, array_append
, or array_cat
. The first two only support one-dimensional arrays, but array_cat
supports multidimensional arrays. Some examples:
In simple cases, the concatenation operator discussed above is preferred over direct use of these functions. However, because the concatenation operator is overloaded to serve all three cases, there are situations where use of one of the functions is helpful to avoid ambiguity. For example consider:
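A sketch of the ambiguous cases:

SELECT ARRAY[1, 2] || '{3, 4}';          -- the untyped literal is taken as an array
SELECT ARRAY[1, 2] || '7';               -- so is this one, causing an error
SELECT ARRAY[1, 2] || NULL;              -- so is an undecorated NULL
SELECT array_append(ARRAY[1, 2], NULL);  -- this might have been meant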
In the examples above, the parser sees an integer array on one side of the concatenation operator, and a constant of undetermined type on the other. The heuristic it uses to resolve the constant's type is to assume it's of the same type as the operator's other input — in this case, integer array. So the concatenation operator is presumed to represent array_cat
, not array_append
. When that's the wrong choice, it could be fixed by casting the constant to the array's element type; but explicit use of array_append
might be a preferable solution.
To search for a value in an array, each value must be checked. This can be done manually, if you know the size of the array. For example:
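Checking each position explicitly:

SELECT * FROM sal_emp WHERE pay_by_quarter[1] = 10000 OR
                            pay_by_quarter[2] = 10000 OR
                            pay_by_quarter[3] = 10000 OR
                            pay_by_quarter[4] = 10000;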
In addition, you can find rows where the array has all values equal to 10000 with:
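SELECT * FROM sal_emp WHERE 10000 = ALL (pay_by_quarter);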
Alternatively, the generate_subscripts
function can be used. For example:
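A sketch:

SELECT * FROM
   (SELECT pay_by_quarter,
           generate_subscripts(pay_by_quarter, 1) AS s
      FROM sal_emp) AS foo
 WHERE pay_by_quarter[s] = 10000;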
You can also search an array using the &&
operator, which checks whether the left operand overlaps with the right operand. For instance:
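SELECT * FROM sal_emp WHERE pay_by_quarter && ARRAY[10000];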
You can also search for specific values in an array using the array_position
and array_positions
functions. The former returns the subscript of the first occurrence of a value in an array; the latter returns an array with the subscripts of all occurrences of the value in the array. For example:
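For instance:

SELECT array_position(ARRAY['sun','mon','tue','wed','thu','fri','sat'], 'mon');
 array_position
----------------
              2

SELECT array_positions(ARRAY[1, 4, 3, 1, 7], 1);
 array_positions
-----------------
 {1,4}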
Arrays are not sets; searching for specific array elements can be a sign of database misdesign. Consider using a separate table with a row for each item that would be an array element. This will be easier to search, and is likely to scale better for a large number of elements.
The external text representation of an array value consists of items that are interpreted according to the I/O conversion rules for the array's element type, plus decoration that indicates the array structure. The decoration consists of curly braces ({
and }
) around the array value plus delimiter characters between adjacent items. The delimiter character is usually a comma (,
) but can be something else: it is determined by the typdelim
setting for the array's element type. Among the standard data types provided in the PostgreSQL distribution, all use a comma, except for type box
, which uses a semicolon (;
). In a multidimensional array, each dimension (row, plane, cube, etc.) gets its own level of curly braces, and delimiters must be written between adjacent curly-braced entities of the same level.
The array output routine will put double quotes around element values if they are empty strings, contain curly braces, delimiter characters, double quotes, backslashes, or white space, or match the word NULL
. Double quotes and backslashes embedded in element values will be backslash-escaped. For numeric data types it is safe to assume that double quotes will never appear, but for textual data types one should be prepared to cope with either the presence or absence of quotes.
By default, the lower bound index value of an array's dimensions is set to one. To represent arrays with other lower bounds, the array subscript ranges can be specified explicitly before writing the array contents. This decoration consists of square brackets ([]
) around each array dimension's lower and upper bounds, with a colon (:
) delimiter character in between. The array dimension decoration is followed by an equal sign (=
). For example:
The array output routine will include explicit dimensions in its result only when there are one or more lower bounds different from one.
As shown previously, when writing an array value you can use double quotes around any individual array element. You must do so if the element value would otherwise confuse the array-value parser. For example, elements containing curly braces, commas (or the data type's delimiter character), double quotes, backslashes, or leading or trailing whitespace must be double-quoted. Empty strings and strings matching the word NULL
must be quoted, too. To put a double quote or backslash in a quoted array element value, precede it with a backslash. Alternatively, you can avoid quotes and use backslash-escaping to protect all data characters that would otherwise be taken as array syntax.
You can add whitespace before a left brace or after a right brace. You can also add whitespace before or after any individual item string. In all of these cases the whitespace will be ignored. However, whitespace within double-quoted elements, or surrounded on both sides by non-whitespace characters of an element, is not ignored.
JSON data types are for storing JSON (JavaScript Object Notation) data, as specified in . Such data can also be stored as text, but the JSON data types have the advantage of enforcing that each stored value is valid according to the JSON rules. There are also assorted JSON-specific functions and operators available for data stored in these data types. See also .
PostgreSQL offers two types for storing JSON data: json and jsonb. To implement efficient query mechanisms for these data types, PostgreSQL also provides the jsonpath data type described in .
The json and jsonb data types accept almost identical sets of values as input. The major practical difference is one of efficiency. The json data type stores an exact copy of the input text, which processing functions must reparse on each execution, while jsonb data is stored in a decomposed binary format that makes it slightly slower to input due to added conversion overhead, but significantly faster to process, since no reparsing is needed. jsonb also supports indexing, which can be a significant advantage.
Because the json type stores an exact copy of the input text, it will preserve semantically-insignificant white space between tokens, as well as the order of keys within JSON objects. Also, if a JSON object within the value contains the same key more than once, all the key/value pairs are kept. (The processing functions consider the last value as the operative one.) By contrast, jsonb does not preserve white space, does not preserve the order of object keys, and does not keep duplicate object keys. If duplicate keys are specified in the input, only the last value is kept.
In general, most applications should prefer to store JSON data as jsonb, unless there are quite specialized needs, such as legacy assumptions about ordering of object keys.
PostgreSQL allows only one character set encoding per database, so it is not possible for the JSON types to conform rigidly to the JSON specification unless the database encoding is UTF8. Attempts to directly include characters that cannot be represented in the database encoding will fail; conversely, characters that can be represented in the database encoding but not in UTF8 will be allowed.
RFC 7159 permits JSON strings to contain Unicode escape sequences denoted by \uXXXX. In the input function for the json type, Unicode escapes are allowed regardless of the database encoding, and are checked only for syntactic correctness (that is, that four hex digits follow \u). However, the input function for jsonb is stricter: it disallows Unicode escapes for non-ASCII characters (those above U+007F) unless the database encoding is UTF8. The jsonb type also rejects \u0000 (because that cannot be represented in PostgreSQL's text type), and it insists that any use of Unicode surrogate pairs to designate characters outside the Unicode Basic Multilingual Plane be correct. Valid Unicode escapes are converted to the equivalent ASCII or UTF8 character for storage; this includes folding surrogate pairs into a single character.
Many of the JSON processing functions described in Section 9.15 will convert Unicode escapes to regular characters, and will therefore throw the same types of errors just described even if their input is of type json rather than jsonb. The fact that the json input function does not make these checks may be considered a historical artifact, although it does allow for simple storage (without processing) of JSON Unicode escapes in a non-UTF8 database encoding. In general, it is best to avoid mixing Unicode escapes in JSON with a non-UTF8 database encoding, if possible.
When converting textual JSON input into jsonb, the primitive types described by RFC 7159 are effectively mapped onto native PostgreSQL types, as shown in Table 8.23. Therefore, there are some minor additional constraints on what constitutes valid jsonb data that do not apply to the json type, nor to JSON in the abstract, corresponding to limits on what can be represented by the underlying data type. Notably, jsonb will reject numbers that are outside the range of the PostgreSQL numeric data type, while json will not. Such implementation-defined restrictions are permitted by RFC 7159. However, in practice such problems are far more likely to occur in other implementations, as it is common to represent JSON's number primitive type as IEEE 754 double precision floating point (which RFC 7159 explicitly anticipates and allows for). When using JSON as an interchange format with such systems, the danger of losing numeric precision compared to data originally stored by PostgreSQL should be considered.
Conversely, as noted in the table, there are some minor restrictions on the input format of JSON primitive types that do not apply to the corresponding PostgreSQL types.
The input/output syntax for the JSON data types is as specified in RFC 7159.
The following are all valid json (or jsonb) expressions:
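-- Simple scalar/primitive value
-- Primitive values can be numbers, quoted strings, true, false, or null
SELECT '5'::json;

-- Array of zero or more elements (elements need not be of same type)
SELECT '[1, 2, "foo", null]'::json;

-- Object containing pairs of keys and values
-- Note that object keys must always be quoted strings
SELECT '{"bar": "baz", "balance": 7.77, "active": false}'::json;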
As previously stated, when JSON input is converted and then printed without any additional processing, json outputs the same text that was input, while jsonb does not preserve semantically-insignificant details such as whitespace. For example, note the differences here:
One semantically-insignificant detail worth noting is that in jsonb, numbers will be printed according to the behavior of the underlying numeric type. In practice this means that numbers entered with E notation will be printed without it, for example:
However, jsonb will preserve trailing fractional zeroes, as seen in this example, even though those are semantically insignificant for purposes such as equality checks.
Representing data as JSON can be considerably more flexible than the traditional relational data model, which is compelling in environments where requirements are fluid. It is quite possible for both approaches to co-exist and complement each other within the same application. However, even for applications where maximal flexibility is desired, it is still recommended that JSON documents have a somewhat fixed structure. The structure is typically unenforced (though enforcing some business rules declaratively is possible), but a predictable structure makes it easier to write queries that usefully summarize a set of “documents” (datums) in a table.
JSON data is subject to the same concurrency-control considerations as any other data type when stored in a table. Although storing large documents is practicable, keep in mind that any update acquires a row-level lock on the whole row. Consider limiting JSON documents to a manageable size in order to decrease lock contention among updating transactions. Ideally, JSON documents should each represent an atomic datum that business rules dictate cannot reasonably be further subdivided into smaller datums that could be modified independently.
jsonb
Containment and Existence

Testing containment is an important capability of jsonb. There is no parallel set of facilities for the json type. Containment tests whether one jsonb document has contained within it another one. These examples return true except as noted:
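A few sketches:

-- Simple scalar/primitive values contain only the identical value:
SELECT '"foo"'::jsonb @> '"foo"'::jsonb;

-- The array on the right side is contained within the one on the left:
SELECT '[1, 2, 3]'::jsonb @> '[1, 3]'::jsonb;

-- The object with a single pair on the right side is contained
-- within the object on the left side:
SELECT '{"product": "PostgreSQL", "version": 9.4, "jsonb": true}'::jsonb @> '{"version": 9.4}'::jsonb;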
The general principle is that the contained object must match the containing object as to structure and data contents, possibly after discarding some non-matching array elements or object key/value pairs from the containing object. But remember that the order of array elements is not significant when doing a containment match, and duplicate array elements are effectively considered only once.
As a special exception to the general principle that the structures must match, an array may contain a primitive value:
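-- This array contains the primitive string value:
SELECT '["foo", "bar"]'::jsonb @> '"bar"'::jsonb;

-- This exception is not reciprocal -- non-containment is reported here:
SELECT '"bar"'::jsonb @> '["bar"]'::jsonb;  -- yields false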
jsonb also has an existence operator, which is a variation on the theme of containment: it tests whether a string (given as a text value) appears as an object key or array element at the top level of the jsonb value. These examples return true except as noted:
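-- String exists as array element:
SELECT '["foo", "bar", "baz"]'::jsonb ? 'bar';

-- String exists as object key:
SELECT '{"foo": "bar"}'::jsonb ? 'foo';

-- Object values are not considered:
SELECT '{"foo": "bar"}'::jsonb ? 'bar';  -- yields false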
JSON objects are better suited than arrays for testing containment or existence when there are many keys or elements involved, because unlike arrays they are internally optimized for searching, and do not need to be searched linearly.
Because JSON containment is nested, an appropriate query can skip explicit selection of sub-objects. As an example, suppose that we have a doc column containing objects at the top level, with most objects containing tags fields that contain arrays of sub-objects. This query finds entries in which sub-objects containing both "term":"paris" and "term":"food" appear, while ignoring any such keys outside the tags array:
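A sketch (the table name websites and column site_name are illustrative):

SELECT doc->'site_name' FROM websites
  WHERE doc @> '{"tags":[{"term":"paris"}, {"term":"food"}]}';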
One could accomplish the same thing with, say:
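SELECT doc->'site_name' FROM websites
  WHERE doc @> '{"tags":[{"term":"paris"}]}'
    AND doc @> '{"tags":[{"term":"food"}]}';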
but that approach is less flexible, and often less efficient as well.
On the other hand, the JSON existence operator is not nested: it will only look for the specified key or array element at the top level of the JSON value.
The various containment and existence operators, along with all other JSON operators and functions, are documented in Section 9.15.
jsonb
Indexing

GIN indexes can be used to efficiently search for keys or key/value pairs occurring within a large number of jsonb documents (datums). Two GIN “operator classes” are provided, offering different performance and flexibility trade-offs.
The non-default GIN operator class jsonb_path_ops supports indexing the @> operator only. An example of creating an index with this operator class is:
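Using the api table and jdoc column introduced just below (the index name idxginp is illustrative):

CREATE INDEX idxginp ON api USING GIN (jdoc jsonb_path_ops);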
Consider the example of a table that stores JSON documents retrieved from a third-party web service, with a documented schema definition. A typical document is:
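A sketch of such a document (abridged; the field values are illustrative):

{
    "guid": "9c36adc1-7fb5-4d5b-83b4-90356a46061a",
    "name": "Angela Barton",
    "is_active": true,
    "company": "Magnafone",
    "tags": [
        "enim",
        "aliquip",
        "qui"
    ]
}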
We store these documents in a table named api, in a jsonb column named jdoc. If a GIN index is created on this column, queries like the following can make use of the index:
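-- Find documents in which the key "company" has value "Magnafone"
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"company": "Magnafone"}';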
However, the index could not be used for queries like the following: although the operator ? is indexable, it is not applied directly to the indexed column jdoc:
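-- Find documents in which the key "tags" contains array element "qui"
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc->'tags' ? 'qui';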
Still, with appropriate use of expression indexes, the above query can use an index. If querying for particular items within the "tags" key is common, defining an index like this may be worthwhile:
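CREATE INDEX idxgintags ON api USING GIN ((jdoc->'tags'));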
GIN indexes also support the @@ and @? operators, which perform jsonpath matching.
A GIN index extracts statements of the following form out of jsonpath: accessors_chain = const. The accessors chain may consist of .key, [*], and [index] accessors. jsonb_ops additionally supports the .* and .** accessors.
Another approach to querying is to exploit containment, for example:
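SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"tags": ["qui"]}';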
A simple GIN index on the jdoc column can support this query. But note that such an index will store copies of every key and value in the jdoc column, whereas the expression index of the previous example stores only data found under the tags key. While the simple-index approach is far more flexible (since it supports queries about any key), targeted expression indexes are likely to be smaller and faster to search than a simple index.
Although the jsonb_path_ops operator class supports only queries with the @>, @@ and @? operators, it has notable performance advantages over the default operator class jsonb_ops. A jsonb_path_ops index is usually much smaller than a jsonb_ops index over the same data, and the specificity of searches is better, particularly when queries contain keys that appear frequently in the data. Therefore search operations typically perform better than with the default operator class.
A disadvantage of the jsonb_path_ops
approach is that it produces no index entries for JSON structures not containing any values, such as {"a": {}}
. If a search for documents containing such a structure is requested, it will require a full-index scan, which is quite slow. jsonb_path_ops
is therefore ill-suited for applications that often perform such searches.
jsonb
also supports btree
and hash
indexes. These are usually useful only if it's important to check equality of complete JSON documents. The btree
ordering for jsonb
datums is seldom of great interest, but for completeness it is:
Objects with equal numbers of pairs are compared in the order:
Note that object keys are compared in their storage order; in particular, since shorter keys are stored before longer keys, this can lead to results that might be unintuitive, such as:
Similarly, arrays with equal numbers of elements are compared in the order:
Primitive JSON values are compared using the same comparison rules as for the underlying PostgreSQL data type. Strings are compared using the default database collation.
jsonb
Subscripting

The jsonb
data type supports array-style subscripting expressions to extract and modify elements. Nested values can be indicated by chaining subscripting expressions, following the same rules as the path
argument in the jsonb_set
function. If a jsonb
value is an array, numeric subscripts start at zero, and negative integers count backwards from the last element of the array. Slice expressions are not supported. The result of a subscripting expression is always of the jsonb data type.
UPDATE
statements may use subscripting in the SET
clause to modify jsonb
values. Subscript paths must be traversable for all affected values insofar as they exist. For instance, the path val['a']['b']['c']
can be traversed all the way to c
if every val
, val['a']
, and val['a']['b']
is an object. If any val['a']
or val['a']['b']
is not defined, it will be created as an empty object and filled as necessary. However, if any val
itself or one of the intermediary values is defined as a non-object such as a string, number, or jsonb
null
, traversal cannot proceed so an error is raised and the transaction aborted.
An example of subscripting syntax:
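-- Extract object value by key
SELECT ('{"a": 1}'::jsonb)['a'];

-- Extract nested object value by key path
SELECT ('{"a": {"b": {"c": 1}}}'::jsonb)['a']['b']['c'];

-- Extract array element by index
SELECT ('[1, "2", null]'::jsonb)[1];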
jsonb
assignment via subscripting handles a few edge cases differently from jsonb_set
. When a source jsonb
value is NULL
, assignment via subscripting will proceed as if it was an empty JSON value of the type (object or array) implied by the subscript key:
If an index is specified for an array containing too few elements, NULL
elements will be appended until the index is reachable and the value can be set.
A jsonb
value will accept assignments to nonexistent subscript paths as long as the last existing element to be traversed is an object or array, as implied by the corresponding subscript (the element indicated by the last subscript in the path is not traversed and may be anything). Nested array and object structures will be created, and in the former case null
-padded, as specified by the subscript path until the assigned value can be placed.
Additional extensions are available that implement transforms for the jsonb type for different procedural languages.
The extensions for PL/Perl are called jsonb_plperl and jsonb_plperlu. If you use them, jsonb values are mapped to Perl arrays, hashes, and scalars, as appropriate.
The extension for PL/Python is called jsonb_plpython3u. If you use it, jsonb values are mapped to Python dictionaries, lists, and scalars, as appropriate.
Of these extensions, jsonb_plperl is considered “trusted”, that is, it can be installed by non-superusers who have CREATE privilege on the current database. The rest require superuser privilege to install.
The jsonpath type implements support for the SQL/JSON path language in PostgreSQL to efficiently query JSON data. It provides a binary representation of the parsed SQL/JSON path expression that specifies the items to be retrieved by the path engine from the JSON data for further processing with the SQL/JSON query functions.
The semantics of SQL/JSON path predicates and operators generally follow SQL. At the same time, to provide a more natural way of working with JSON data, SQL/JSON path syntax uses some JavaScript conventions:
Dot (.) is used for member access.
Square brackets ([ ]) are used for array access.
SQL/JSON arrays are 0-relative, unlike regular SQL arrays that start from 1.
A path expression consists of a sequence of path elements, which can be any of the following:
Path literals of JSON primitive types: Unicode text, numeric, true, false, or null.
Parentheses, which can be used to provide filter expressions or define the order of path evaluation.
jsonpath
Variables

jsonpath
Accessors

PostgreSQL provides the standard SQL type boolean, as shown in . The boolean type can have several states: “true”, “false”, and a third state, “unknown”, which is represented by the SQL null value.
The following words can be used to represent the “true” state:
The “false” state can be represented by these words:
Leading or trailing whitespace is ignored, and case does not matter. The key words TRUE and FALSE are the preferred (SQL-compliant) usage.
Boolean values are output using the letters t and f.
Enumerated (enum) types are data types that comprise a static, ordered set of values. They are equivalent to the enum
types supported in a number of programming languages. An example of an enum type might be the days of the week, or a set of status values for a piece of data.
Enum types are created using the command, for example:
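For instance:

CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');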
Once created, the enum type can be used in table and function definitions much like any other type:
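Continuing the mood example:

CREATE TABLE person (
    name text,
    current_mood mood
);
INSERT INTO person VALUES ('Moe', 'happy');
SELECT * FROM person WHERE current_mood = 'happy';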
The ordering of the values in an enum type is the order in which the values were listed when the type was created. All standard comparison operators and related aggregate functions are supported for enums. For example:
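One possible sketch, given the values inserted above:

SELECT * FROM person WHERE current_mood > 'sad' ORDER BY current_mood;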
Each enumerated data type is separate and cannot be compared with other enumerated types. See this example:
If you really need to do something like that, you can either write a custom operator or add explicit casts to your query:
Enum labels are case sensitive, so 'happy'
is not the same as 'HAPPY'
. White space in the labels is significant too.
An enum value occupies four bytes on disk. The length of an enum value's textual label is limited by the NAMEDATALEN
setting compiled into PostgreSQL; in standard builds this means at most 63 bytes.
Range types are data types representing a range of values of some element type (called the range's subtype). For instance, ranges of timestamp might be used to represent the ranges of time that a meeting room is reserved. In this case the data type is tsrange (short for “timestamp range”), and timestamp is the subtype. The subtype must have a total order so that it is well-defined whether element values are within, before, or after a range of values.
Range types are useful because they represent many element values in a single range value, and because concepts such as overlapping ranges can be expressed clearly. The use of time and date ranges for scheduling purposes is the clearest example, but price ranges, measurement ranges from an instrument, and so forth can also be useful.
Every range type has a corresponding multirange type. A multirange is an ordered list of non-contiguous, non-empty, non-null ranges. Most range operators also work on multiranges, and they have a few functions of their own.
PostgreSQL comes with the following built-in range types:
int4range
— Range of integer
int8range
— Range of bigint
numrange
— Range of numeric
tsrange
— Range of timestamp without time zone
tstzrange
— Range of timestamp with time zone
daterange
— Range of date
In addition, you can define your own range types; see for more information.
Every non-empty range has two bounds, the lower bound and the upper bound. All points between these values are included in the range. An inclusive bound means that the boundary point itself is included in the range as well, while an exclusive bound means that the boundary point is not included in the range.
The functions lower_inc
and upper_inc
test the inclusivity of the lower and upper bounds of a range value, respectively.
The lower bound of a range can be omitted, meaning that all values less than the upper bound are included in the range, e.g., (,3]
. Likewise, if the upper bound of the range is omitted, then all values greater than the lower bound are included in the range. If both lower and upper bounds are omitted, all values of the element type are considered to be in the range. Specifying a missing bound as inclusive is automatically converted to exclusive, e.g., [,]
is converted to (,)
. You can think of these missing values as +/-infinity, but they are special range type values and are considered to be beyond any range element type's +/-infinity values.
Element types that have the notion of “infinity” can use them as explicit bound values. For example, with timestamp ranges, [today,infinity)
excludes the special timestamp
value infinity
, while [today,infinity]
includes it, as do [today,)
and [today,]
.
The functions lower_inf
and upper_inf
test for infinite lower and upper bounds of a range, respectively.
The input for a range value must follow one of the following patterns:
The parentheses or brackets indicate whether the lower and upper bounds are exclusive or inclusive, as described previously. Notice that the final pattern is empty
, which represents an empty range (a range that contains no points).
The lower-bound
may be either a string that is valid input for the subtype, or empty to indicate no lower bound. Likewise, upper-bound
may be either a string that is valid input for the subtype, or empty to indicate no upper bound.
Each bound value can be quoted using "
(double quote) characters. This is necessary if the bound value contains parentheses, brackets, commas, double quotes, or backslashes, since these characters would otherwise be taken as part of the range syntax. To put a double quote or backslash in a quoted bound value, precede it with a backslash. (Also, a pair of double quotes within a double-quoted bound value is taken to represent a double quote character, analogously to the rules for single quotes in SQL literal strings.) Alternatively, you can avoid quoting and use backslash-escaping to protect all data characters that would otherwise be taken as range syntax. Also, to write a bound value that is an empty string, write ""
, since writing nothing means an infinite bound.
Whitespace is allowed before and after the range value, but any whitespace between the parentheses or brackets is taken as part of the lower or upper bound value. (Depending on the element type, it might or might not be significant.)
Examples:
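A few sketches using int4range:

-- includes 3, does not include 7, and does include all points in between
SELECT '[3,7)'::int4range;

-- does not include either 3 or 7, but includes all points in between
SELECT '(3,7)'::int4range;

-- includes only the single point 4
SELECT '[4,4]'::int4range;

-- includes no points (and will be normalized to 'empty')
SELECT '[4,4)'::int4range;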
The input for a multirange is curly brackets ({
and }
) containing zero or more valid ranges, separated by commas. Whitespace is permitted around the brackets and commas. This is intended to be reminiscent of array syntax, although multiranges are much simpler: they have just one dimension and there is no need to quote their contents. (The bounds of their ranges may be quoted as above however.)
Examples:
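For instance:

SELECT '{}'::int4multirange;
SELECT '{[3,7)}'::int4multirange;
SELECT '{[3,7), [8,9)}'::int4multirange;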
Each range type has a constructor function with the same name as the range type. Using the constructor function is frequently more convenient than writing a range literal constant, since it avoids the need for extra quoting of the bound values. The constructor function accepts two or three arguments. The two-argument form constructs a range in standard form (lower bound inclusive, upper bound exclusive), while the three-argument form constructs a range with bounds of the form specified by the third argument. The third argument must be one of the strings “()
”, “(]
”, “[)
”, or “[]
”. For example:
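-- The full form is: lower bound, upper bound, and text argument indicating
-- inclusivity/exclusivity of bounds.
SELECT numrange(1.0, 14.0, '(]');

-- If the third argument is omitted, '[)' is assumed.
SELECT numrange(1.0, 14.0);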
Each range type also has a multirange constructor with the same name as the multirange type. The constructor function takes zero or more arguments which are all ranges of the appropriate type. For example:
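For example:

SELECT nummultirange(numrange(1.0, 14.0), numrange(20.0, 25.0));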
A discrete range is one whose element type has a well-defined “step”, such as integer
or date
. In these types two elements can be said to be adjacent, when there are no valid values between them. This contrasts with continuous ranges, where it's always (or almost always) possible to identify other element values between two given values. For example, a range over the numeric
type is continuous, as is a range over timestamp
. (Even though timestamp
has limited precision, and so could theoretically be treated as discrete, it's better to consider it continuous since the step size is normally not of interest.)
Another way to think about a discrete range type is that there is a clear idea of a “next” or “previous” value for each element value. Knowing that, it is possible to convert between inclusive and exclusive representations of a range's bounds, by choosing the next or previous element value instead of the one originally given. For example, in an integer range type [4,8]
and (3,9)
denote the same set of values; but this would not be so for a range over numeric.
A discrete range type should have a canonicalization function that is aware of the desired step size for the element type. The canonicalization function is charged with converting equivalent values of the range type to have identical representations, in particular consistently inclusive or exclusive bounds. If a canonicalization function is not specified, then ranges with different formatting will always be treated as unequal, even though they might represent the same set of values in reality.
The built-in range types int4range
, int8range
, and daterange
all use a canonical form that includes the lower bound and excludes the upper bound; that is, [)
. User-defined range types can use other conventions, however.
Users can define their own range types. The most common reason to do this is to use ranges over subtypes not provided among the built-in range types. For example, to define a new range type of subtype float8
:
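A sketch of such a definition and its use:

CREATE TYPE floatrange AS RANGE (
    subtype = float8,
    subtype_diff = float8mi
);

SELECT '[1.234, 5.678]'::floatrange;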
Because float8
has no meaningful “step”, we do not define a canonicalization function in this example.
When you define your own range you automatically get a corresponding multirange type.
Defining your own range type also allows you to specify a different subtype B-tree operator class or collation to use, so as to change the sort ordering that determines which values fall into a given range.
If the subtype is considered to have discrete rather than continuous values, the CREATE TYPE
command should specify a canonical
function. The canonicalization function takes an input range value, and must return an equivalent range value that may have different bounds and formatting. The canonical output for two ranges that represent the same set of values, for example the integer ranges [1, 7]
and [1, 8)
, must be identical. It doesn't matter which representation you choose to be the canonical one, so long as two equivalent values with different formattings are always mapped to the same value with the same formatting. In addition to adjusting the inclusive/exclusive bounds format, a canonicalization function might round off boundary values, in case the desired step size is larger than what the subtype is capable of storing. For instance, a range type over timestamp
could be defined to have a step size of an hour, in which case the canonicalization function would need to round off bounds that weren't a multiple of an hour, or perhaps throw an error instead.
In addition, any range type that is meant to be used with GiST or SP-GiST indexes should define a subtype difference, or subtype_diff
, function. (The index will still work without subtype_diff
, but it is likely to be considerably less efficient than if a difference function is provided.) The subtype difference function takes two input values of the subtype, and returns their difference (i.e., X
minus Y
) represented as a float8
value. In our example above, the function float8mi
that underlies the regular float8
minus operator can be used; but for any other subtype, some type conversion would be necessary. Some creative thought about how to represent differences as numbers might be needed, too. To the greatest extent possible, the subtype_diff
function should agree with the sort ordering implied by the selected operator class and collation; that is, its result should be positive whenever its first argument is greater than its second according to the sort ordering.
A less-oversimplified example of a subtype_diff
function is:
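One way this might look for a range over time (the function and type names are illustrative):

CREATE FUNCTION time_subtype_diff(x time, y time) RETURNS float8 AS
'SELECT EXTRACT(EPOCH FROM (x - y))' LANGUAGE sql STRICT IMMUTABLE;

CREATE TYPE timerange AS RANGE (
    subtype = time,
    subtype_diff = time_subtype_diff
);

SELECT '[11:10, 23:00]'::timerange;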
GiST and SP-GiST indexes can be created for table columns of range types. GiST indexes can be also created for table columns of multirange types. For instance, to create a GiST index:
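Assuming a reservation table with a tsrange column named during, as in the exclusion-constraint example later in this section:

CREATE INDEX reservation_idx ON reservation USING GIST (during);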
In addition, B-tree and hash indexes can be created for table columns of range types. For these index types, basically the only useful range operation is equality. There is a B-tree sort ordering defined for range values, with corresponding <
and >
operators, but the ordering is rather arbitrary and not usually useful in the real world. Range types' B-tree and hash support is primarily meant to allow sorting and hashing internally in queries, rather than creation of actual indexes.
That constraint will prevent any overlapping values from existing in the table at the same time:
The PostgreSQL type system contains a number of special-purpose entries that are collectively called pseudo-types. A pseudo-type cannot be used as a column data type, but it can be used to declare a function's argument or result type. Each of the available pseudo-types is useful in situations where a function's behavior does not correspond to simply taking or returning a value of a specific SQL data type. Table 8.27 lists the existing pseudo-types.
Functions coded in C (whether built-in or dynamically loaded) can be declared to accept or return any of these pseudo data types. It is up to the function author to ensure that the function will behave safely when a pseudo-type is used as an argument type.
Functions coded in procedural languages can use pseudo-types only as allowed by their implementation languages. At present most procedural languages forbid use of a pseudo-type as an argument type, and allow only void
and record
as a result type (plus trigger
or event_trigger
when the function is used as a trigger or event trigger). Some also support polymorphic functions using the types anyelement
, anyarray
, anynonarray
, anyenum
, and anyrange
.
The internal
pseudo-type is used to declare functions that are meant only to be called internally by the database system, and not by direct invocation in an SQL query. If a function has at least one internal
-type argument then it cannot be called from SQL. To preserve the type safety of this restriction it is important to follow this coding rule: do not create any function that is declared to return internal
unless it has at least one internal
argument.
The char_length function is discussed in .
Date and time input is accepted in almost any reasonable format, including ISO 8601, SQL-compatible, traditional POSTGRES, and others. In some formats, the ordering of day, month, and year in date input is ambiguous, so there is support for specifying the expected ordering of these fields: set the parameter to MDY to select month-day-year interpretation, DMY to select day-month-year interpretation, or YMD to select year-month-day interpretation.
PostgreSQL is more flexible in handling date/time input than the SQL standard requires. See for the exact parsing rules of date/time input and for the recognized text fields, including months, days of the week, and time zones.
Remember that any date or time literal input needs to be enclosed in single quotes, like a text string. Refer to for more information. SQL requires the following syntax:
shows some possible inputs for the date type:
Valid input for these types consists of a time of day followed by an optional time zone. (See and .) If a time zone is specified in the input for time without time zone, it is silently ignored. You can also specify a date but it will be ignored, except when you use a time zone name that involves a daylight-savings rule, such as America/New_York, because in that case the date is needed to determine whether standard or daylight-savings time applies. The appropriate time zone offset is recorded in the time with time zone value.
Refer to for more information on how to specify time zones.
For timestamp with time zone, the internally stored value is always in UTC (Universal Coordinated Time, traditionally known as Greenwich Mean Time, GMT). An input value that has an explicit time zone specified is converted to UTC using the appropriate offset for that time zone. If no time zone is stated in the input string, then it is assumed to be in the time zone indicated by the system's parameter, and is converted to UTC using the offset for that zone.
When a timestamp with time zone value is output, it is always converted from UTC to the current time zone, and displayed as local time in that zone. To see the time in another time zone, either change the time zone setting or use the AT TIME ZONE construct (see ).
The SQL-compatible functions CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP, LOCALTIME, and LOCALTIMESTAMP can also be used to obtain the current time value for the corresponding data type. The latter four accept an optional subsecond precision specification. (See .) Note that these are SQL functions and are not recognized in data input strings.
The output format of the date/time types can be set to one of the four styles ISO 8601, SQL (Ingres), traditional POSTGRES (Unix date format), or German. The default is the ISO format. (The SQL standard requires the use of the ISO 8601 format. The name of the “SQL” output format is a historical accident.) shows examples of each output style. The output of the date
and time
types is generally only the date or time part in accordance with the given examples. However, the POSTGRES style outputs date-only values in ISO format.
ISO 8601 specifies the use of uppercase letter T
to separate the date and time. PostgreSQL accepts that format on input, but on output it uses a space rather than T
, as shown above. This is for readability and for consistency with as well as some other database systems.
In the SQL and POSTGRES styles, day appears before month if DMY field ordering has been specified, otherwise month appears before day. (See for how this setting also affects interpretation of input values.) shows examples.
The date/time style can be selected by the user using the SET datestyle
command, the parameter in the postgresql.conf
configuration file, or the PGDATESTYLE
environment variable on the server or client.
The formatting function to_char
(see ) is also available as a more flexible way to format date/time output.
All timezone-aware dates and times are stored internally in UTC. They are converted to local time in the zone specified by the configuration parameter before being displayed to the client.
A full time zone name, for example America/New_York
. The recognized time zone names are listed in the pg_timezone_names
view (see ). PostgreSQL uses the widely-used IANA time zone data for this purpose, so the same time zone names are also recognized by other software.
A time zone abbreviation, for example PST
. Such a specification merely defines a particular offset from UTC, in contrast to full time zone names which can imply a set of daylight savings transition rules as well. The recognized abbreviations are listed in the pg_timezone_abbrevs
view (see ). You cannot set the configuration parameters or to a time zone abbreviation, but you can use abbreviations in date/time input values and with the AT TIME ZONE
operator.
In addition to the timezone names and abbreviations, PostgreSQL will accept POSIX-style time zone specifications, as described in . This option is not normally preferable to using a named time zone, but it may be necessary if no suitable IANA time zone entry is available.
Neither timezone names nor abbreviations are hard-wired into the server; they are obtained from configuration files stored under .../share/timezone/
and .../share/timezonesets/
of the installation directory (see ).
The configuration parameter can be set in the file postgresql.conf
, or in any of the other standard ways described in . There are also some special ways to set it:
where quantity
is a number (possibly signed); unit
is microsecond
, millisecond
, second
, minute
, hour
, day
, week
, month
, year
, decade
, century
, millennium
, or abbreviations or plurals of these units; direction
can be ago
or empty. The at sign (@
) is optional noise. The amounts of the different units are implicitly added with appropriate sign accounting. ago
negates all the fields. This syntax is also used for interval output, if is set to postgres_verbose
.
The string must start with a P
, and may include a T
that introduces the time-of-day units. The available unit abbreviations are given in . Units may be omitted, and may be specified in any order, but units smaller than a day must appear after T
. In particular, the meaning of M
depends on whether it is before or after T
.
shows some examples of valid interval
input.
The output format of the interval type can be set to one of the four styles sql_standard
, postgres
, postgres_verbose
, or iso_8601
, using the command SET intervalstyle
. The default is the postgres
format. shows examples of each output style.
The output of the postgres
style matches the output of PostgreSQL releases prior to 8.4 when the parameter was set to ISO
.
Care must be taken when dealing with multiple character encodings on the client, server, and in the XML data passed through them. When using the text mode to pass queries to the server and query results to the client (which is the normal mode), PostgreSQL converts all character data passed between the client and the server and vice versa to the character encoding of the respective end; see . This includes string representations of XML values, such as in the above examples. This would ordinarily mean that encoding declarations contained in XML data can become invalid as the character data is converted to other encodings while traveling between client and server, because the embedded encoding declaration is not changed. To cope with this behavior, encoding declarations contained in character strings presented for input to the xml
type are ignored, and content is assumed to be in the current server encoding. Consequently, for correct processing, character strings of XML data must be sent from the client in the current client encoding. It is the responsibility of the client to either convert documents to the current client encoding before sending them to the server, or to adjust the client encoding appropriately. On output, values of type xml
will not have an encoding declaration, and clients should assume all data is in the current client encoding.
Again, see for more detail.
To write a composite value as a literal constant, enclose the field values within parentheses and separate them by commas. You can put double quotes around any field value, and must do so if it contains commas or parentheses. (More details appear .) Thus, the general format of a composite constant is the following:
(These constants are actually only a special case of the generic type constants discussed in . The constant is initially treated as a string and passed to the composite-type input conversion routine. An explicit type specification might be necessary to tell which type to convert the constant to.)
The ROW
expression syntax is discussed in more detail in .
The special field name *
means “all fields”, as further explained in .
In PostgreSQL, a reference to a table name (or alias) in a query is effectively a reference to the composite value of the table's current row. For example, if we had a table inventory_item
as shown , we could write:
The ordinary qualified-column-name syntax table_name
.
column_name
can be understood as applying to the composite value of the table's current row. (For efficiency reasons, it's not actually implemented that way.)
PostgreSQL will apply this expansion behavior to any composite-valued expression, although as shown , you need to write parentheses around the value that .*
is applied to whenever it's not a simple table name. For example, if myfunc()
is a function returning a composite type with columns a
, b
, and c
, then these two queries have the same result:
The composite_value
.*
syntax results in column expansion of this kind when it appears at the top level of a , a in INSERT
/UPDATE
/DELETE
, a , or a . In all other contexts (including when nested inside one of those constructs), attaching .*
to a composite value does not change the value, since it means “all columns” and so the same composite value is produced again. For example, if somefunc()
accepts a composite-valued argument, these queries are the same:
All of these ORDER BY
clauses specify the row's composite value, resulting in sorting the rows according to the rules described in . However, if inventory_item
contained a column named c
, the first case would be different from the others, as it would mean to sort by that column only. Given the column names previously shown, these queries are also equivalent to those above:
The string-literal processor removes one level of backslashes, so that what arrives at the composite-value parser looks like ("\"\\")
. In turn, the string fed to the text
data type's input routine becomes "\
. (If we were working with a data type whose input routine also treated backslashes specially, bytea
for example, we might need as many as eight backslashes in the command to get one backslash into the stored composite field.) Dollar quoting (see ) can be used to avoid the need to double backslashes.
(The system columns are further explained in .)
(These kinds of array constants are actually only a special case of the generic type constants discussed in . The constant is initially treated as a string and passed to the array input conversion routine. An explicit type specification might be necessary.)
Notice that the array elements are ordinary SQL constants or expressions; for instance, string literals are single quoted, instead of double quoted as they would be in an array literal. The ARRAY
constructor syntax is discussed in more detail in .
However, this quickly becomes tedious for large arrays, and is not helpful if the size of the array is unknown. An alternative method is described in . The above query could be replaced by:
This function is described in .
This and other array operators are further described in . It can be accelerated by an appropriate index, as described in .
If the value written for an element is NULL
(in any case variant), the element is taken to be NULL. The presence of any quotes or backslashes disables this and allows the literal string value “NULL” to be entered. Also, for backward compatibility with pre-8.2 versions of PostgreSQL, the configuration parameter can be turned off
to suppress recognition of NULL
as a NULL.
The ARRAY
constructor syntax (see ) is often easier to work with than the array-literal syntax when writing array values in SQL commands. In ARRAY
, individual element values are written the same way they would be written when not members of an array.
For a list of the built-in functions and operators available for building and processing JSON values, see .
The default GIN operator class for jsonb supports queries with the top-level key-exists operators ?, ?& and ?|, and the path/value-exists operator @>. (For details of the semantics that these operators implement, see .) An example of creating an index with this operator class is:
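Using the api table from the earlier example (the index name idxgin is illustrative):

CREATE INDEX idxgin ON api USING GIN (jdoc);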
Now the WHERE clause jdoc->'tags' ? 'qui' will be recognized as an application of the indexable operator ? to the indexed expression jdoc->'tags'. (More information on expression indexes can be found in .)
The technical difference between a jsonb_ops
and a jsonb_path_ops
GIN index is that the former creates independent index items for each key and value in the data, while the latter creates index items only for each value in the data. Basically, each jsonb_path_ops
index item is a hash of the value and the key(s) leading to it; for example to index {"foo": {"bar": "baz"}}
, a single index item would be created incorporating all three of foo
, bar
, and baz
into the hash value. Thus a containment query looking for this structure would result in an extremely specific index search; but there is no way at all to find out whether foo
appears as a key. On the other hand, a jsonb_ops
index would create three index items representing foo
, bar
, and baz
separately; then to do the containment query, it would look for rows containing all three of these items. While GIN indexes can perform such an AND search fairly efficiently, it will still be less specific and slower than the equivalent jsonb_path_ops
search, especially if there are a very large number of rows containing any single one of the three index items.
SQL/JSON path expressions are typically written in an SQL query as an SQL character string literal, so they must be enclosed in single quotes, and any single quotes desired within the value must be doubled (see ). Some forms of path expressions require string literals within them. These embedded string literals follow JavaScript/ECMAScript conventions: they must be surrounded by double quotes, and backslash escapes may be used within them to represent otherwise-hard-to-type characters. In particular, the way to write a double quote within an embedded string literal is \", and to write a backslash itself, you must write \\. Other special backslash sequences include those recognized in JSON strings: \b, \f, \n, \r, \t, \v for various ASCII control characters, and \uNNNN for a Unicode character identified by its 4-hex-digit code point. The backslash syntax also includes two cases not allowed by JSON: \xNN for a character code written with only two hex digits, and \u{N...} for a character code written with 1 to 6 hex digits.
Path variables listed in .
Accessor operators listed in .
jsonpath
operators and methods listed in .
For details on using jsonpath
expressions with SQL/JSON query functions, see .
For this purpose, the term “value” includes array elements, though JSON terminology sometimes considers array elements distinct from values within objects.
Although enum types are primarily intended for static sets of values, there is support for adding new values to an existing enum type, and for renaming values (see ). Existing values cannot be removed from an enum type, nor can the sort ordering of such values be changed, short of dropping and re-creating the enum type.
The translations from internal enum values to textual labels are kept in the system catalog . Querying this catalog directly can be useful.
See and for complete lists of the operators and functions on range types.
In the text form of a range, an inclusive lower bound is represented by “[
” while an exclusive lower bound is represented by “(
”. Likewise, an inclusive upper bound is represented by “]
”, while an exclusive upper bound is represented by “)
”. (See for more details.)
These rules are very similar to those for writing field values in composite-type literals. See for additional commentary.
See for more information about creating range types.
A GiST or SP-GiST index on ranges can accelerate queries involving these range operators: =
, &&
, <@
, @>
, <<
, >>
, -|-
, &<
, and &>
. A GiST index on multiranges can accelerate queries involving the same set of multirange operators. A GiST index on ranges and GiST index on multiranges can also accelerate queries involving these cross-type range to multirange and multirange to range operators correspondingly: &&
, <@
, @>
, <<
, >>
, -|-
, &<
, and &>
. See for more information.
While UNIQUE
is a natural constraint for scalar values, it is usually unsuitable for range types. Instead, an exclusion constraint is often more appropriate (see ). Exclusion constraints allow the specification of constraints such as “non-overlapping” on a range type. For example:
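A sketch of such a constraint:

CREATE TABLE reservation (
    during tsrange,
    EXCLUDE USING GIST (during WITH &&)
);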
You can use the extension to define exclusion constraints on plain scalar data types, which can then be combined with range exclusions for maximum flexibility. For example, after btree_gist
is installed, the following constraint will reject overlapping ranges only if the meeting room numbers are equal:
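One way this might read:

CREATE EXTENSION btree_gist;
CREATE TABLE room_reservation (
    room text,
    during tsrange,
    EXCLUDE USING GIST (room WITH =, during WITH &&)
);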
1999-01-08
ISO 8601; January 8 in any mode (recommended format)
January 8, 1999
unambiguous in any datestyle
input mode
1/8/1999
January 8 in MDY
mode; August 1 in DMY
mode
1/18/1999
January 18 in MDY
mode; rejected in other modes
01/02/03
January 2, 2003 in MDY
mode; February 1, 2003 in DMY
mode; February 3, 2001 in YMD
mode
1999-Jan-08
January 8 in any mode
Jan-08-1999
January 8 in any mode
08-Jan-1999
January 8 in any mode
99-Jan-08
January 8 in YMD
mode, else error
08-Jan-99
January 8, except error in YMD
mode
Jan-08-99
January 8, except error in YMD
mode
19990108
ISO 8601; January 8, 1999 in any mode
990108
ISO 8601; January 8, 1999 in any mode
1999.008
year and day of year
J2451187
Julian date
January 8, 99 BC
year 99 BC
04:05:06.789
ISO 8601
04:05:06
ISO 8601
04:05
ISO 8601
040506
ISO 8601
04:05 AM
same as 04:05; AM does not affect value
04:05 PM
same as 16:05; input hour must be <= 12
04:05:06.789-8
ISO 8601
04:05:06-08:00
ISO 8601
04:05-08:00
ISO 8601
040506-08
ISO 8601
04:05:06 PST
time zone specified by abbreviation
2003-04-12 04:05:06 America/New_York
time zone specified by full name
PST
Abbreviation (for Pacific Standard Time)
America/New_York
Full time zone name
PST8PDT
POSIX-style time zone specification
-8:00
ISO-8601 offset for PST
-800
ISO-8601 offset for PST
-8
ISO-8601 offset for PST
zulu
Military abbreviation for UTC
z
Short form of zulu
epoch
date
, timestamp
1970-01-01 00:00:00+00 (Unix system time zero)
infinity
date
, timestamp
later than all other time stamps
-infinity
date
, timestamp
earlier than all other time stamps
now
date
, time
, timestamp
current transaction's start time
today
date
, timestamp
midnight today
tomorrow
date
, timestamp
midnight tomorrow
yesterday
date
, timestamp
midnight yesterday
allballs
time
00:00:00.00 UTC
ISO
ISO 8601, SQL standard
1997-12-17 07:37:16-08
SQL
traditional style
12/17/1997 07:37:16.00 PST
Postgres
original style
Wed Dec 17 07:37:16 1997 PST
German
regional style
17.12.1997 07:37:16.00 PST
datestyle
Setting
Input Ordering
Example Output
SQL, DMY
day
/month
/year
17/12/1997 15:37:16.00 CET
SQL, MDY
month
/day
/year
12/17/1997 07:37:16.00 PST
Postgres, DMY
day
/month
/year
Wed 17 Dec 07:37:16 1997 PST
Y
Years
M
Months (in the date part)
W
Weeks
D
Days
H
Hours
M
Minutes (in the time part)
S
Seconds
1-2
SQL standard format: 1 year 2 months
3 4:05:06
SQL standard format: 3 days 4 hours 5 minutes 6 seconds
1 year 2 months 3 days 4 hours 5 minutes 6 seconds
Traditional Postgres format: 1 year 2 months 3 days 4 hours 5 minutes 6 seconds
P1Y2M3DT4H5M6S
ISO 8601 “format with designators”: same meaning as above
P0001-02-03T04:05:06
ISO 8601 “alternative format”: same meaning as above
sql_standard
1-2
3 4:05:06
-1-2 +3 -4:05:06
postgres
1 year 2 mons
3 days 04:05:06
-1 year -2 mons +3 days -04:05:06
postgres_verbose
@ 1 year 2 mons
@ 3 days 4 hours 5 mins 6 secs
@ 1 year 2 mons -3 days 4 hours 5 mins 6 secs ago
iso_8601
P1Y2M
P3DT4H5M6S
P-1Y-2M3DT-4H-5M-6S
$
A variable representing the JSON text to be queried (the context item).
$varname
A named variable. Its value can be set by the parameter vars
of several JSON processing functions. See Table 9.47 and its notes for details.
@
A variable representing the result of path evaluation in filter expressions.
.
key
."$
varname
"
Member accessor that returns an object member with the specified key. If the key name is a named variable starting with $
or does not meet the JavaScript rules of an identifier, it must be enclosed in double quotes as a character string literal.
.*
Wildcard member accessor that returns the values of all members located at the top level of the current object.
.**
Recursive wildcard member accessor that processes all levels of the JSON hierarchy of the current object and returns all the member values, regardless of their nesting level. This is a PostgreSQL extension of the SQL/JSON standard.
.
{
level
}
.
{
start_level
to end_level
}
Same as .**
, but with a filter over nesting levels of JSON hierarchy. Nesting levels are specified as integers. Zero level corresponds to the current object. To access the lowest nesting level, you can use the last
keyword. This is a PostgreSQL extension of the SQL/JSON standard.
[
subscript
, ...]
Array element accessor. subscript
can be given in two forms: index
or start_index
to end_index
. The first form returns a single array element by its index. The second form returns an array slice by the range of indexes, including the elements that correspond to the provided start_index
and end_index
.
The specified index
can be an integer, as well as an expression returning a single numeric value, which is automatically cast to integer. Zero index corresponds to the first array element. You can also use the last
keyword to denote the last array element, which is useful for handling arrays of unknown length.
[*]
Wildcard array element accessor that returns all array elements.
any
Indicates that a function accepts any input data type.
anyelement
Indicates that a function accepts any data type (see Section 37.2.5).
anyarray
Indicates that a function accepts any array data type (see Section 37.2.5).
anynonarray
Indicates that a function accepts any non-array data type (see Section 37.2.5).
anyenum
Indicates that a function accepts any enum data type (see Section 37.2.5 and Section 8.7).
anyrange
Indicates that a function accepts any range data type (see Section 37.2.5 and Section 8.17).
cstring
Indicates that a function accepts or returns a null-terminated C string.
internal
Indicates that a function accepts or returns a server-internal data type.
language_handler
A procedural language call handler is declared to return language_handler
.
fdw_handler
A foreign-data wrapper handler is declared to return fdw_handler
.
index_am_handler
An index access method handler is declared to return index_am_handler
.
tsm_handler
A tablesample method handler is declared to return tsm_handler
.
record
Identifies a function taking or returning an unspecified row type.
trigger
A trigger function is declared to return trigger.
event_trigger
An event trigger function is declared to return event_trigger.
pg_ddl_command
Identifies a representation of DDL commands that is available to event triggers.
void
Indicates that a function returns no value.
unknown
Identifies a not-yet-resolved type, e.g. of an undecorated string literal.
opaque
An obsolete type name that formerly served many of the above purposes.
timestamp [ (p
) ] [ without time zone ]
8 bytes
both date and time (no time zone)
4713 BC
294276 AD
1 microsecond
timestamp [ (p
) ] with time zone
8 bytes
both date and time, with time zone
4713 BC
294276 AD
1 microsecond
date
4 bytes
date (no time of day)
4713 BC
5874897 AD
1 day
time [ (p
) ] [ without time zone ]
8 bytes
time of day (no date)
00:00:00
24:00:00
1 microsecond
time [ (p
) ] with time zone
12 bytes
time of day (no date), with time zone
00:00:00+1459
24:00:00-1459
1 microsecond
interval [
fields
] [ (p
) ]
16 bytes
time interval
-178000000 years
178000000 years
1 microsecond
oid
any
numeric object identifier
564182
regproc
pg_proc
function name
sum
regprocedure
pg_proc
function with argument types
sum(int4)
regoper
pg_operator
operator name
+
regoperator
pg_operator
operator with argument types
*(integer,integer)
or -(NONE,integer)
regclass
pg_class
relation name
pg_type
regtype
pg_type
data type name
integer
regrole
pg_authid
role name
smithee
regnamespace
pg_namespace
namespace name
pg_catalog
regconfig
pg_ts_config
text search configuration
english
regdictionary
pg_ts_dict
text search dictionary
simple
string
text
\u0000 is disallowed, as are Unicode escapes for non-ASCII characters if the database encoding is not UTF8
number
numeric
NaN and infinity values are disallowed
boolean
boolean
Only lowercase true and false spellings are accepted
null
(none)
SQL NULL is a different concept
boolean
1 byte
state of true or false
't'
'true'
'y'
'yes'
'on'
'1'
'f'
'false'
'n'
'no'
'off'
'0'