The money type stores a currency amount with a fixed fractional precision; see Table 8.3. The fractional precision is determined by the database's lc_monetary setting. The range shown in the table assumes two fractional digits. Input is accepted in a variety of formats, including integer and floating-point literals, as well as typical currency formatting such as "$1,000.00". Output is generally in the latter form, but depends on the locale.
Table 8.3. Monetary Types

| Name | Storage Size | Description | Range |
|---|---|---|---|
| money | 8 bytes | currency amount | -92233720368547758.08 to +92233720368547758.07 |
Since the output of this data type is locale-sensitive, it might not work to load money data into a database that has a different setting of lc_monetary. To avoid problems, before restoring a dump into a new database make sure lc_monetary has the same or an equivalent value as in the database that was dumped.
Values of the numeric, int, and bigint data types can be cast to money. Conversion from the real and double precision data types can be done by casting to numeric first, for example:
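Such a two-step cast might look like this (the literal value is illustrative):

```sql
-- A direct cast from double precision to money is not provided,
-- so go by way of numeric:
SELECT '12.34'::float8::numeric::money;
```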
However, this is not recommended. Floating-point numbers should not be used to handle money, due to the potential for rounding errors.
A money value can be cast to numeric without loss of precision. Conversion to other types could potentially lose precision, and must also be done in two stages:
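A sketch of the two-stage conversion (the value is illustrative):

```sql
-- money -> numeric is lossless; the further cast to float8
-- must go through numeric and may lose precision.
SELECT '52093.89'::money::numeric::float8;
```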
When a money value is divided by another money value, the result is double precision (that is, a pure number, not money); the currency units cancel each other out in the division.
PostgreSQL provides the standard SQL type boolean, as shown in Table 8-19. The boolean type can have several states: "true", "false", and a third state, "unknown", which is represented by the SQL null value.
Table 8-19. Boolean Data Type

| Name | Storage Size | Description |
|---|---|---|
| boolean | 1 byte | state of true or false |
Any of the following literal values can be used for the "true" state:
TRUE
't'
'true'
'y'
'yes'
'on'
'1'
"false" 狀態則可以用以下的字詞表示:
FALSE
'f'
'false'
'n'
'no'
'off'
'0'
Leading and trailing whitespace is ignored, and case does not matter. For SQL-compliant usage, the key words TRUE and FALSE are preferred.
Example 8-2 shows that boolean values are output using the letters t and f.
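A minimal sketch of such an example (the table and values are illustrative):

```sql
CREATE TABLE test1 (a boolean, b text);
INSERT INTO test1 VALUES (TRUE, 'sic est');
INSERT INTO test1 VALUES (FALSE, 'non est');
-- boolean values are output with the single letters t and f:
SELECT * FROM test1;
```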
The bytea data type allows storage of binary strings; see Table 8.6.
Table 8.6. Binary Data Types

| Name | Storage Size | Description |
|---|---|---|
| bytea | 1 or 4 bytes plus the actual binary string | variable-length binary string |
A binary string is a sequence of octets (bytes). Binary strings are distinguished from character strings in two ways. First, binary strings specifically allow storing octets of value zero and other "non-printable" octets (usually, octets outside the range 32 to 126). Character strings disallow zero octets, and also disallow any octet sequences that are invalid according to the database's selected character set encoding. Second, operations on binary strings process the actual bytes, whereas the processing of character strings depends on locale settings. In short, binary strings are appropriate for storing data that the programmer thinks of as "raw bytes", whereas character strings are appropriate for storing text.
The bytea type supports two external formats for input and output: PostgreSQL's historical "escape" format, and the "hex" format. Both are always accepted on input. The output format depends on the configuration parameter bytea_output; the default is hex. (Note that the hex format was introduced in PostgreSQL 9.0; earlier versions and some tools cannot interpret it.)
The SQL standard defines a different binary string type, called BLOB or BINARY LARGE OBJECT. Its input format is different from bytea's, but the provided functions and operators are mostly the same.
bytea Hex Format

The "hex" format encodes binary data as 2 hexadecimal digits per byte. The entire string is preceded by the sequence \x (to distinguish it from the escape format). In some contexts, the initial backslash may need to be escaped by doubling it, in the same cases in which backslashes have to be doubled in escape format; details appear below. The hexadecimal digits can be either upper or lower case, and whitespace is permitted between digit pairs (but not within a digit pair, nor in the starting \x sequence). The hex format is compatible with a wide range of external applications and protocols, and it tends to be faster to convert than the escape format, so its use is preferred.
For example:
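A hex-format input might look like this (the value is illustrative):

```sql
-- \x prefix, then two hex digits per octet:
SELECT '\xDEADBEEF'::bytea;
```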
bytea Escape Format

The "escape" format is the traditional PostgreSQL format for the bytea type. It takes the approach of representing a binary string as a sequence of ASCII characters, while converting those bytes that cannot be represented as an ASCII character into special escape sequences. If, from the point of view of the application, representing bytes as characters makes sense, this representation can be convenient. But in practice it is usually confusing, because it blurs the distinction between binary strings and character strings, and the particular escape mechanism that was chosen is also somewhat unwieldy. Therefore, this format should probably be avoided for most new applications.
When entering bytea values in escape format, octets of certain values must be escaped, while all octet values can be escaped. In general, to escape an octet, convert it into its three-digit octal value and precede it by a backslash (or two backslashes, if writing the value as a literal using escape string syntax). A backslash itself (octet decimal value 92) can alternatively be represented by double backslashes. Table 8.7 shows the characters that must be escaped, and gives the alternative escape sequences where applicable.
Table 8.7. bytea Literal Escaped Octets

| Decimal Octet Value | Description | Escaped Input Representation | Example | Output Representation |
|---|---|---|---|---|
| 0 | zero octet | E'\\000' | SELECT E'\\000'::bytea; | \000 |
| 39 | single quote | '''' or E'\\047' | SELECT E'\''::bytea; | ' |
| 92 | backslash | E'\\\\' or E'\\134' | SELECT E'\\\\'::bytea; | \\ |
| 0 to 31 and 127 to 255 | "non-printable" octets | E'\\xxx' (octal value) | SELECT E'\\001'::bytea; | \001 |
The requirement to escape non-printable octets varies depending on locale settings. In some instances you can get away with leaving them unescaped. Note that the result in each of the examples in Table 8.7 is exactly one octet long, even though the output representation sometimes contains more than one character.
The reason multiple backslashes are required, as shown in Table 8.7, is that an input string written as a string literal must pass through two parse phases in the PostgreSQL server. The first backslash of each pair is interpreted as an escape character by the string-literal parser (assuming escape string syntax is used) and is therefore consumed, leaving the second backslash of the pair. (Dollar-quoted strings can be used to avoid this level of escaping.) The remaining backslash is then recognized by the bytea input function as starting either a three-digit octal value or escaping another backslash. For example, a string literal passed to the server as E'\\001' becomes \001 after passing through the escape string parser. The \001 is then sent to the bytea input function, where it is converted to a single octet with a decimal value of 1. Note that the single-quote character is not treated specially by bytea, so it follows the normal rules for string literals. (See also Section 4.1.2.1.)
bytea octets are sometimes escaped when output. In general, each "non-printable" octet is converted into its equivalent three-digit octal value and preceded by one backslash. Most "printable" octets are represented by their standard representation in the client character set. The octet with decimal value 92 (backslash) is doubled in the output. Details are in Table 8.8.
Table 8.8. bytea Output Escaped Octets

| Decimal Octet Value | Description | Escaped Output Representation | Example | Output Result |
|---|---|---|---|---|
| 92 | backslash | \\ | SELECT E'\\134'::bytea; | \\ |
| 0 to 31 and 127 to 255 | "non-printable" octets | \xxx (octal value) | SELECT E'\\001'::bytea; | \001 |
| 32 to 126 | "printable" octets | client character set representation | SELECT E'\\176'::bytea; | ~ |
Depending on the front end to PostgreSQL you use, you might have additional work to do in terms of escaping and unescaping bytea strings. For example, you might also have to escape line feeds and carriage returns if your interface automatically translates them.
Numeric types consist of two-, four-, and eight-byte integers, four- and eight-byte floating-point numbers, and selectable-precision decimals. Table 8.2 lists the available types.
Table 8.2. Numeric Types

| Name | Storage Size | Description | Range |
|---|---|---|---|
| smallint | 2 bytes | small-range integer | -32768 to +32767 |
| integer | 4 bytes | typical choice for integer | -2147483648 to +2147483647 |
| bigint | 8 bytes | large-range integer | -9223372036854775808 to +9223372036854775807 |
| decimal | variable | user-specified precision, exact | up to 131072 digits before the decimal point; up to 16383 digits after the decimal point |
| numeric | variable | user-specified precision, exact | up to 131072 digits before the decimal point; up to 16383 digits after the decimal point |
| real | 4 bytes | variable-precision, inexact | 6 decimal digits precision |
| double precision | 8 bytes | variable-precision, inexact | 15 decimal digits precision |
| smallserial | 2 bytes | small autoincrementing integer | 1 to 32767 |
| serial | 4 bytes | autoincrementing integer | 1 to 2147483647 |
| bigserial | 8 bytes | large autoincrementing integer | 1 to 9223372036854775807 |
The syntax of constants for the numeric types is described in Section 4.1.2. The numeric types have a full set of corresponding arithmetic operators and functions; refer to Chapter 9 for more information. The following sections describe the types in detail.
The types smallint, integer, and bigint store whole numbers, that is, numbers without fractional components, of various ranges. Attempts to store values outside the allowed range will result in an error.
The type integer is the common choice, as it offers the best balance between range, storage size, and performance. The smallint type is generally only used if disk space is at a premium. The bigint type is designed to be used when the range of the integer type is insufficient.
SQL only specifies the integer types integer (or int), smallint, and bigint. The type names int2, int4, and int8 are extensions, which are also used by some other SQL database systems.
The numeric type can store numbers with a very large number of digits. It is especially recommended where exactness is required, such as for storing monetary amounts and other quantities. Calculations with numeric values yield exact results where possible, e.g., addition, subtraction, and multiplication. However, calculations on numeric values are very slow compared to the integer types, or to the floating-point types described in the next section.
We use the following terms below: the scale of a numeric is the count of decimal digits in the fractional part, to the right of the decimal point. The precision of a numeric is the total count of significant digits in the whole number, that is, the number of digits to both sides of the decimal point. So the number 23.5141 has a precision of 6 and a scale of 4. Integers can be considered to have a scale of zero.
Both the maximum precision and the maximum scale of a numeric column can be configured. To declare a column of numeric type, use the following syntax:
The precision must be positive; the scale may be zero or positive. Alternatively:
selects a scale of 0. Specifying:
without any precision or scale creates a column in which numeric values of any precision and scale can be stored, up to the implementation limit on precision. A column of this kind will not coerce input values to any particular scale, whereas numeric columns with a declared scale will coerce input values to that scale. (The SQL standard requires a default scale of 0, i.e., coercion to integer precision; we find this a bit useless. If you're concerned about portability, always specify the precision and scale explicitly.)
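Putting the declaration forms together, a sketch (the table and column names are illustrative):

```sql
CREATE TABLE measurements (
    exact_val NUMERIC(9, 3), -- at most 9 digits total, 3 of them fractional
    whole_val NUMERIC(7),    -- scale defaults to 0: fractional digits are rounded away
    free_val  NUMERIC        -- no constraint on precision or scale
);
```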
Note: The maximum precision that can be explicitly specified in a type declaration is 1000; NUMERIC without a specified precision is subject to the limits described in Table 8.2.
If the scale of a value to be stored is greater than the declared scale of the column, the system will round the value to the declared number of fractional digits. Then, if the number of digits to the left of the decimal point exceeds the declared precision minus the declared scale, an error is raised.
Numeric values are physically stored without any extra leading or trailing zeroes. Thus, the declared precision and scale of a column are maximums, not fixed allocations. (In this sense the numeric type is more akin to varchar(n) than to char(n).) The actual storage requirement is two bytes for each group of four decimal digits, plus three to eight bytes overhead.
In addition to ordinary numeric values, the numeric type allows the special value NaN, meaning "not-a-number". Any operation on NaN yields another NaN. When writing this value as a constant in an SQL command, you must put quotes around it, for example UPDATE table SET x = 'NaN'. On input, the string NaN is recognized in a case-insensitive manner.
Note: In most implementations of the "not-a-number" concept, NaN is not considered equal to any other numeric value (including NaN). In order to allow numeric values to be sorted and used in tree-based indexes, PostgreSQL treats NaN values as equal to each other, and greater than all non-NaN values.
The types decimal and numeric are equivalent. Both types are part of the SQL standard.
When rounding values, the numeric type rounds ties away from zero, while (on most machines) the real and double precision types round ties to the nearest even number. For example:
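One way to observe the difference (a sketch; the numeric form of generate_series is assumed available):

```sql
-- numeric rounds ties away from zero; double precision rounds ties to even.
SELECT x,
       round(x::numeric) AS num_round,
       round(x::double precision) AS dbl_round
FROM generate_series(-3.5, 3.5, 1) AS x;
```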
The data types real and double precision are inexact, variable-precision numeric types. In practice, these types are usually implementations of the IEEE 754 standard for binary floating-point arithmetic (single and double precision, respectively), to the extent that the underlying processor, operating system, and compiler support it.
Inexact means that some values cannot be converted exactly to the internal format and are stored as approximations, so that storing and retrieving a value might show slight discrepancies. Managing these errors and how they propagate through calculations is the subject of an entire branch of mathematics and computer science, and will not be discussed here, except for the following points:
If you require exact storage and calculations (such as for monetary amounts), use the numeric type instead.
If you want to do complicated calculations with these types for anything important, especially if you rely on certain behavior in boundary cases (infinity, underflow), you should evaluate the implementation carefully.
Comparing two floating-point values for equality might not always work as expected.
On most platforms, the real type has a range of at least 1E-37 to 1E+37 with a precision of at least 6 decimal digits. The double precision type typically has a range of around 1E-307 to 1E+308 with a precision of at least 15 digits. Values that are too large or too small will cause an error. Rounding might take place if the precision of an input number is too high. Numbers too close to zero that are not representable as distinct from zero will cause an underflow error.
Note: The extra_float_digits setting controls the number of extra significant digits included when a floating-point value is converted to text for output. With the default value of 0, the output is the same on every platform supported by PostgreSQL. Increasing it will produce output that more accurately represents the stored value, but the result may differ across platforms.
In addition to ordinary numeric values, the floating-point types have several special values:
Infinity
-Infinity
NaN
These represent the IEEE 754 special values "infinity", "negative infinity", and "not-a-number", respectively. (On a machine whose floating-point arithmetic does not follow IEEE 754, these values will probably not work as expected.) When writing these values as constants in an SQL command, you must put quotes around them, for example UPDATE table SET x = '-Infinity'. On input, these strings are recognized in a case-insensitive manner.
Note: IEEE 754 specifies that NaN should not compare equal to any other floating-point value (including NaN). In order to allow floating-point values to be sorted and used in tree-based indexes, PostgreSQL treats NaN values as equal to each other, and greater than all non-NaN values.
PostgreSQL also supports the SQL-standard notations float and float(p) for specifying inexact numeric types. Here, p specifies the minimum acceptable precision in binary digits. PostgreSQL accepts float(1) to float(24) as selecting the real type, while float(25) to float(53) select double precision. Values of p outside the allowed range draw an error. float with no precision specified is taken to mean double precision.
Note: The assumption that real and double precision have 24 and 53 mantissa bits respectively is correct for IEEE-standard floating-point implementations. On non-IEEE platforms it might be off a little, but for simplicity the same ranges of p are used on all platforms.
Note: This section describes a PostgreSQL-specific way to create an auto-incrementing column. Another way is to use the SQL-standard identity column feature, described in CREATE TABLE.
The data types smallserial, serial, and bigserial are not true types, but merely a notational convenience for creating unique identifier columns (similar to the AUTO_INCREMENT property supported by some other databases). In the current implementation, specifying:
is equivalent to specifying:
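The equivalence can be sketched as follows (tablename and colname are placeholders):

```sql
CREATE TABLE tablename (
    colname SERIAL
);

-- is equivalent to:

CREATE SEQUENCE tablename_colname_seq;
CREATE TABLE tablename (
    colname integer NOT NULL DEFAULT nextval('tablename_colname_seq')
);
ALTER SEQUENCE tablename_colname_seq OWNED BY tablename.colname;
```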
Thus, we have created an integer column and arranged for its default values to be assigned from a sequence generator. A NOT NULL constraint is applied to ensure that a null value cannot be inserted. (In most cases you would also want to attach a UNIQUE or PRIMARY KEY constraint to prevent duplicate values from being inserted by accident, but this is not automatic.) Lastly, the sequence is marked as "owned by" the column, so that it will be dropped if the column or table is dropped.
Note: Because smallserial, serial, and bigserial are implemented using sequences, there may be "holes" or gaps in the sequence of values that appears in the column, even if no rows are ever deleted. A value allocated from the sequence is still "used up" even if a row containing that value is never successfully inserted into the table. This may happen, for example, if the inserting transaction rolls back. See nextval() in Section 9.16 for details.
To insert the next value of the sequence into the serial column, specify that the serial column should be assigned its default value. This can be done either by excluding the column from the list of columns in the INSERT statement, or through the use of the DEFAULT key word.
The type names serial and serial4 are equivalent: both create integer columns. The type names bigserial and serial8 work the same way, except that they create a bigint column. bigserial should be used if you anticipate the use of more than 2^31 identifiers over the lifetime of the table. The type names smallserial and serial2 also work the same way, except that they create a smallint column.
The sequence created for a serial column is automatically dropped when the owning column is dropped. You can drop the sequence without dropping the column, but this will force removal of the column's default expression.
Bit strings are strings of 1's and 0's. They can be used to store or visualize bit masks. There are two SQL bit types: bit(n) and bit varying(n), where n is a positive integer.
bit type data must match the length n exactly; it is an error to attempt to store shorter or longer bit strings. bit varying data is of variable length up to the maximum length n; longer strings will be rejected. Writing bit without a length is equivalent to bit(1), while bit varying without a length specification means unlimited length.
Note: If one explicitly casts a bit-string value to bit(n), it will be truncated or zero-padded on the right to be exactly n bits, without raising an error. Similarly, if one explicitly casts a bit-string value to bit varying(n), it will be truncated on the right if it is more than n bits.
Refer to Section 4.1.2.5 for information about the syntax of bit string constants. Bit-logical operators and string manipulation functions are available; see Section 9.6.
Example 8-3. Using the Bit String Types
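A sketch of such an example (the table and values are illustrative):

```sql
CREATE TABLE test (a BIT(3), b BIT VARYING(5));
INSERT INTO test VALUES (B'101', B'00');
-- fails: a bit(3) value must be exactly 3 bits long
-- INSERT INTO test VALUES (B'10', B'101');
INSERT INTO test VALUES (B'10'::bit(3), B'101');  -- explicit cast zero-pads on the right
SELECT * FROM test;
```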
A bit string value requires 1 byte for each group of 8 bits, plus 5 or 8 bytes overhead depending on the length of the string (but long values may be compressed or moved out-of-line, as explained in Section 8.3 for character strings).
Geometric data types represent two-dimensional spatial objects. Table 8.20 shows the geometric types available in PostgreSQL.
Table 8.20. Geometric Types
| Name | Storage Size | Description | Representation |
|---|---|---|---|
| point | 16 bytes | Point on a plane | (x,y) |
| line | 32 bytes | Infinite line | {A,B,C} |
| lseg | 32 bytes | Finite line segment | ((x1,y1),(x2,y2)) |
| box | 32 bytes | Rectangular box | ((x1,y1),(x2,y2)) |
| path | 16+16n bytes | Closed path (similar to polygon) | ((x1,y1),...) |
| path | 16+16n bytes | Open path | [(x1,y1),...] |
| polygon | 40+16n bytes | Polygon (similar to closed path) | ((x1,y1),...) |
| circle | 24 bytes | Circle | <(x,y),r> (center point and radius) |
A rich set of functions and operators is available to perform various geometric operations such as scaling, translation, rotation, and determining intersections. They are explained in Section 9.11.
Points are the fundamental two-dimensional building block for geometric types. Values of type point are specified using either of the following syntaxes:
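The two syntaxes, as reconstructed from the representation column of Table 8.20, are:

```
( x , y )
  x , y
```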
where x and y are the respective coordinates, as floating-point numbers.

Points are output using the first syntax.
Lines are represented by the linear equation Ax + By + C = 0, where A and B are not both zero. Values of type line are input and output in the following form:
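That form, per the representation shown in Table 8.20, is:

```
{ A, B, C }
```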
Alternatively, any of the following forms can be used for input:
where (x1,y1) and (x2,y2) are two different points on the line.
Line segments are represented by pairs of points that are the endpoints of the segment. Values of type lseg are specified using any of the following syntaxes:
where (x1,y1) and (x2,y2) are the end points of the line segment.
Line segments are output using the first syntax.
Boxes are represented by pairs of points that are opposite corners of the box. Values of type box are specified using any of the following syntaxes:
where (x1,y1) and (x2,y2) are any two opposite corners of the box.
Boxes are output using the second syntax.
Any two opposite corners can be supplied on input, but the values will be reordered as needed to store the upper right and lower left corners, in that order.
Paths are represented by lists of connected points. Paths can be open, where the first and last points in the list are considered not connected, or closed, where the first and last points are considered connected.
Values of type path are specified using any of the following syntaxes:
where the points are the end points of the line segments comprising the path. Square brackets ([]) indicate an open path, while parentheses (()) indicate a closed path. When the outermost parentheses are omitted, as in the third through fifth syntaxes, a closed path is assumed.
Paths are output using the first or second syntax, as appropriate.
Polygons are represented by lists of points (the vertexes of the polygon). Polygons are very similar to closed paths, but are stored differently and have their own set of support routines.
Values of type polygon are specified using any of the following syntaxes:
where the points are the end points of the line segments comprising the boundary of the polygon.
Polygons are output using the first syntax.
Circles are represented by a center point and radius. Values of type circle are specified using any of the following syntaxes:
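The accepted forms, reconstructed around the <(x,y),r> representation of Table 8.20, are:

```
< ( x , y ) , r >
( ( x , y ) , r )
  ( x , y ) , r
    x , y   , r
```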
where (x,y) is the center point and r is the radius of the circle.
Circles are output using the first syntax.
The data type uuid stores Universally Unique Identifiers (UUID) as defined by RFC 4122, ISO/IEC 9834-8:2005, and related standards. (Some systems refer to this data type as a globally unique identifier, or GUID, instead.) This identifier is a 128-bit quantity that is generated by an algorithm chosen to make it very unlikely that the same identifier will be generated by anyone else in the known universe using the same algorithm. Therefore, for distributed systems, these identifiers provide a better uniqueness guarantee than sequence generators, which are only unique within a single database.
A UUID is written as a sequence of lower-case hexadecimal digits, in several groups separated by hyphens, specifically a group of 8 digits followed by three groups of 4 digits followed by a group of 12 digits, for a total of 32 digits representing the 128 bits. An example of a UUID in this standard form is:
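An illustrative value in this standard form:

```
a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11
```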
PostgreSQL also accepts the following alternative forms for input: use of upper-case digits, the standard format surrounded by braces, omitting some or all hyphens, and adding a hyphen after any group of four digits. Examples are:
Output is always in the standard form.
PostgreSQL provides storage and comparison functions for UUIDs, but the core database does not include any function for generating UUIDs, because no single algorithm is well suited for every application. The uuid-ossp module provides functions that implement several standard algorithms. The pgcrypto module also provides a generation function for random UUIDs. Alternatively, UUIDs could be generated by client applications or other libraries invoked through a server-side function.
Table 8.4. Character Types
Table 8.4 lists the general-purpose character types available in PostgreSQL.
SQL defines two primary character types: character varying(n) and character(n), where n is a positive integer. Both of these types can store strings up to n characters (not bytes) in length. An attempt to store a longer string into a column of these types will result in an error, unless the excess characters are all spaces, in which case the string will be truncated to the maximum length. (This somewhat bizarre exception is required by the SQL standard.) If the string to be stored is shorter than the declared length, values of type character will be space-padded; values of type character varying will simply store the shorter string.
If one explicitly casts a value to character varying(n) or character(n), then an over-length value will be truncated to n characters without raising an error. (This too is required by the SQL standard.)
The notations varchar(n) and char(n) are aliases for character varying(n) and character(n), respectively. character without a length specifier is equivalent to character(1). If character varying is used without a length specifier, the type accepts strings of any size. The latter is a PostgreSQL extension.
In addition, PostgreSQL provides the text type, which stores strings of any length. Although the type text is not in the SQL standard, several other SQL database management systems have it as well.
Values of type character are physically padded with spaces to the specified width n, and are stored and displayed that way. However, trailing spaces are treated as semantically insignificant and disregarded when comparing two values of type character. In collations where whitespace is significant, this behavior can produce unexpected results; for example, SELECT 'a '::CHAR(2) collate "C" < E'a\n'::CHAR(2) returns true, even though the C locale would consider a space to be greater than a newline. Trailing spaces are removed when converting a character value to one of the other string types. Note that trailing spaces are semantically significant in character varying and text values, and when using pattern matching, that is, LIKE and regular expressions.
The storage requirement for a short string (up to 126 bytes) is 1 byte plus the actual string, which includes the space padding in the case of character. Longer strings have 4 bytes of overhead instead of 1. Long strings are compressed by the system automatically, so the physical requirement on disk might be less. Very long values are also stored in background tables so that they do not interfere with rapid access to shorter column values. In any case, the longest possible character string that can be stored is about 1 GB. (The maximum value allowed for n in the data type declaration is less than that. It wouldn't be useful to change this, because with multibyte character encodings the number of characters and bytes can be quite different. If you desire to store long strings with no specific upper limit, use text or character varying without a length specifier, rather than making up an arbitrary length limit.)
There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact, character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used instead.
Refer to the section on string constants for information about the syntax of string literals, and to the chapter on functions and operators for the operators and functions available. The database character set determines the character set used to store textual values; for more information on character set support, refer to the chapter on localization.
Example 8.1. Using the Character Types
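A minimal sketch of such an example (names and values are illustrative):

```sql
CREATE TABLE test1 (a character(4));
INSERT INTO test1 VALUES ('ok');
SELECT a, char_length(a) FROM test1;  -- char_length disregards trailing blanks

CREATE TABLE test2 (b varchar(5));
INSERT INTO test2 VALUES ('ok');
INSERT INTO test2 VALUES ('good      ');           -- excess trailing spaces are truncated
INSERT INTO test2 VALUES ('too long'::varchar(5)); -- explicit cast truncates silently
SELECT b, char_length(b) FROM test2;
```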
There are two other fixed-length character types in PostgreSQL, shown in Table 8.5. The name type exists only for the storage of identifiers in the internal system catalogs and is not intended for use by the general user. Its length is currently defined as 64 bytes (63 usable characters plus terminator) but should be referenced via the constant NAMEDATALEN in C source code. The length is set at compile time (and is therefore adjustable for special uses); the default maximum length might change in a future release. The type "char" (note the double quotes) is different from char(1) in that it only uses one byte of storage. It is internally used in the system catalogs as a simplistic enumeration type.
Table 8.5. Special Character Types
A composite type represents the structure of a row or record; it is essentially just a list of field names and their data types. PostgreSQL allows composite types to be used in many of the same ways that simple types can be used. For example, a column of a table can be declared to be of a composite type.
Here are two simple examples of defining composite types:
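The two definitions might look like the following; the inventory_item type is the one used by the examples later in this section:

```sql
CREATE TYPE complex AS (
    r double precision,
    i double precision
);

CREATE TYPE inventory_item AS (
    name        text,
    supplier_id integer,
    price       numeric
);
```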
The syntax is comparable to CREATE TABLE, except that only field names and types can be specified; no constraints (such as NOT NULL) can presently be included. Note that the AS keyword is essential; without it, the system will think a different kind of CREATE TYPE command is meant, and you will get odd syntax errors.
Having defined the types, we can use them to create tables:
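A sketch consistent with the on_hand examples used later in this section:

```sql
CREATE TABLE on_hand (
    item  inventory_item,
    count integer
);

INSERT INTO on_hand VALUES (ROW('fuzzy dice', 42, 1.99), 1000);
```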
or functions:
Whenever you create a table, a composite type is also automatically created, with the same name as the table, to represent the table's row type. For example, had we said:
then the same inventory_item composite type shown above would come into being as a byproduct, and could be used just as above. Note however an important restriction of the current implementation: since no constraints are associated with a composite type, the constraints shown in the table definition do not apply to values of the composite type outside the table. (A partial workaround is to use domain types as members of composite types.)
An example is:
which would be a valid value of the inventory_item type defined above. To make a field be NULL, write no characters at all in its position in the list. For example, this constant specifies a NULL third field:
If you want an empty string rather than NULL, write double quotes:
Here the first field is a non-NULL empty string, the third is NULL.
The ROW expression syntax can also be used to construct composite values. In most cases this is considerably simpler to use than the string-literal syntax since you don't have to worry about multiple layers of quoting. We already used this method above:
The ROW keyword is actually optional as long as you have more than one field in the expression, so these can be simplified to:
To access a field of a composite column, one writes a dot and the field name, much like selecting a field from a table name. In fact, it's so much like selecting from a table name that you often have to use parentheses to keep from confusing the parser. For example, you might try to select some subfields from our on_hand example table with something like:
This will not work since the name item is taken to be a table name, not a column name of on_hand, per SQL syntax rules. You must write it like this:
or if you need to use the table name as well (for instance in a multitable query), like this:
Now the parenthesized object is correctly interpreted as a reference to the item column, and then the subfield can be selected from it.
Similar syntactic issues apply whenever you select a field from a composite value. For instance, to select just one field from the result of a function that returns a composite value, you'd need to write something like:
Without the extra parentheses, this will generate a syntax error.
Here are some examples of the proper syntax for inserting and updating composite columns. First, inserting or updating a whole column:
The first example omits ROW, the second uses it; we could have done it either way.
We can update an individual subfield of a composite column:
Notice here that we don't need to (and indeed cannot) put parentheses around the column name appearing just after SET, but we do need parentheses when referencing the same column in the expression to the right of the equal sign.
And we can specify subfields as targets for INSERT, too:
Had we not supplied values for all the subfields of the column, the remaining subfields would have been filled with null values.
There are various special syntax rules and behaviors associated with composite types in queries. These rules provide useful shortcuts, but can be confusing if you don't know the logic behind them.
This query produces a single composite-valued column, so we might get output like:
Note however that simple names are matched to column names before table names, so this example works only because there is no column named c in the query's tables.
When we write
then, according to the SQL standard, we should get the contents of the table expanded into separate columns:
as if the query were
PostgreSQL handles column expansion by actually transforming the first form into the second. So, in this example, myfunc() would get invoked three times per row with either syntax. If it's an expensive function you may wish to avoid that, which you can do with a query like:
The OFFSET 0 clause keeps the optimizer from "flattening" the sub-select to arrive at the form with multiple calls of myfunc().
In both cases, the current row of inventory_item is passed to the function as a single composite-valued argument. Even though .* does nothing in such cases, using it is good style, since it makes clear that a composite value is intended. In particular, the parser will consider c in c.* to refer to a table name or alias, not to a column name, so that there is no ambiguity; whereas without .*, it is not clear whether c means a table name or a column name, and in fact the column-name interpretation will be preferred if there is a column named c.
Another example demonstrating these concepts is that all these queries mean the same thing:
(The last case uses a row constructor with the key word ROW omitted.)
Another special syntactical behavior associated with composite values is that we can use functional notation for extracting a field of a composite value. The simple way to explain this is that the notations field(table) and table.field are interchangeable. For example, these queries are equivalent:
Moreover, if we have a function that accepts a single argument of a composite type, we can call it with either notation. These queries are all equivalent:
This equivalence between functional notation and field notation makes it possible to use functions on composite types to implement "computed fields". An application using the last query above wouldn't need to be directly aware that somefunc isn't a real column of the table.
Because of this behavior, it's unwise to give a function that takes a single composite-type argument the same name as any of the fields of that composite type. If there is ambiguity, the field-name interpretation will be preferred, so that such a function could not be called without tricks. One way to force the function interpretation is to schema-qualify the function name, that is, write schema.func(compositevalue).
The external text representation of a composite value consists of items that are interpreted according to the I/O conversion rules for the individual field types, plus decoration that indicates the composite structure. The decoration consists of parentheses (( and )) around the whole value, plus commas (,) between adjacent items. Whitespace outside the parentheses is ignored, but within the parentheses it is considered part of the field value, and might or might not be significant depending on the input conversion rules for the field data type. For example, in:
the whitespace will be ignored if the field type is integer, but not if it is text.
As shown previously, when writing a composite value you can write double quotes around any individual field value. You must do so if the field value would otherwise confuse the composite-value parser. In particular, fields containing parentheses, commas, double quotes, or backslashes must be double-quoted. To put a double quote or backslash in a quoted composite field value, precede it with a backslash. (Also, a pair of double quotes within a double-quoted field value is taken to represent a double quote character, analogously to the rules for single quotes in SQL literal strings.) Alternatively, you can avoid quoting and use backslash-escaping to protect all data characters that would otherwise be taken as composite syntax.
A completely empty field value (no characters at all between the commas or parentheses) represents a NULL. To write a value that is an empty string rather than NULL, write "".
The composite output routine will put double quotes around field values if they are empty strings or contain parentheses, commas, double quotes, backslashes, or white space. (Doing so for white space is not essential, but aids legibility.) Double quotes and backslashes embedded in field values will be doubled.
Remember that what you write in an SQL command will first be interpreted as a string literal, and then as a composite. This doubles the number of backslashes you need (assuming escape string syntax is used). For example, to insert a text field containing a double quote and a backslash in a composite value, you'd need to write:
The ROW constructor syntax is usually easier to work with than the composite-literal syntax when writing composite values in SQL commands. In ROW, individual field values are written the same way they would be written when not members of a composite.
PostgreSQL has a rich built-in set of data types available to users. Users can also add new data types to PostgreSQL using the CREATE TYPE command.
Table 8.1 lists all the built-in general-purpose data types. Most of the alternative names listed in the "Aliases" column are the names used internally by PostgreSQL for historical reasons. In addition, there are some internally used or deprecated types, which are not listed here.
Table 8.1. Data Types
The following types (or spellings thereof) are specified by SQL: bigint, bit, bit varying, boolean, char, character varying, character, varchar, date, double precision, integer, interval, numeric, decimal, real, smallint, time (with or without time zone), timestamp (with or without time zone), and xml.
Each data type has an external representation determined by its input and output functions. Many of the built-in types have obvious external formats. However, several types are either unique to PostgreSQL, such as geometric paths, or have several possible formats, such as the date and time types. Some of the input and output functions are not invertible, that is, the result of an output function might lose accuracy when compared to the original input.
PostgreSQL allows columns of a table to be defined as variable-length multidimensional arrays. Arrays of any built-in or user-defined base type, enum type, or composite type can be created. Arrays of domains are not yet supported.
To illustrate the use of array types, we create this table:
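The table in question, consistent with the column names discussed below, would be:

```sql
CREATE TABLE sal_emp (
    name           text,
    pay_by_quarter integer[],
    schedule       text[][]
);
```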
As shown, an array data type is named by appending square brackets ([]) to the data type name of the array elements. The above command will create a table named sal_emp with a column of type text (name), a one-dimensional array of type integer (pay_by_quarter), which represents the employee's salary by quarter, and a two-dimensional array of text (schedule), which represents the employee's weekly schedule.
The syntax for CREATE TABLE allows the exact size of arrays to be specified, for example:
However, the current implementation ignores any supplied array size limits, i.e., the behavior is the same as for arrays of unspecified length.
The current implementation does not enforce the declared number of dimensions either. Arrays of a particular element type are all considered to be of the same type, regardless of size or number of dimensions. So, declaring the array size or number of dimensions in CREATE TABLE is simply documentation; it does not affect run-time behavior.
An alternative syntax, which conforms to the SQL standard by using the keyword ARRAY, can be used for one-dimensional arrays. pay_by_quarter could have been defined as:
Or, if no array size is to be specified:
As before, however, PostgreSQL does not enforce the size restriction in any case.
To write an array value as a literal constant, enclose the element values within curly braces and separate them by commas. (If you know C, this is not unlike the C syntax for initializing structures.) You can put double quotes around any element value, and must do so if it contains commas or curly braces. (More details appear below.) Thus, the general format of an array constant is the following:
where delim is the delimiter character for the type, as recorded in its pg_type entry. Among the standard data types provided in the PostgreSQL distribution, all use a comma (,), except for type box which uses a semicolon (;). Each val is either a constant of the array element type, or a subarray. An example of an array constant is:
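Such a constant might be:

```
'{{1,2,3},{4,5,6},{7,8,9}}'
```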
This constant is a two-dimensional, 3-by-3 array consisting of three subarrays of integers.
To set an element of an array constant to NULL, write NULL for the element value. (Any upper- or lower-case variant of NULL will do.) If you want an actual string value "NULL", you must put double quotes around it.
Now we can show some INSERT statements:
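Sketches consistent with the queries discussed below (the values are illustrative):

```sql
INSERT INTO sal_emp
    VALUES ('Bill',
            '{10000, 10000, 10000, 10000}',
            '{{"meeting", "lunch"}, {"training", "presentation"}}');

INSERT INTO sal_emp
    VALUES ('Carol',
            '{20000, 25000, 25000, 25000}',
            '{{"breakfast", "consulting"}, {"meeting", "lunch"}}');
```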
The result of the previous two inserts looks like this:
Multidimensional arrays must have matching extents for each dimension. A mismatch causes an error, for example:
The ARRAY constructor syntax can also be used:
Now, we can run some queries on the table. First, we show how to access a single element of an array. This query retrieves the names of the employees whose pay changed in the second quarter:
The array subscript numbers are written within square brackets. By default PostgreSQL uses a one-based numbering convention for arrays, that is, an array of n elements starts with array[1] and ends with array[n].
This query retrieves the third quarter pay of all employees:
We can also access arbitrary rectangular slices of an array, or subarrays. An array slice is denoted by writing lower-bound:upper-bound for one or more array dimensions. For example, this query retrieves the first item on Bill's schedule for the first two days of the week:
If any dimension is written as a slice, i.e., contains a colon, then all dimensions are treated as slices. Any dimension that has only a single number (no colon) is treated as being from 1 to the number specified. For example, [2] is treated as [1:2], as in this example:
To avoid confusion with the non-slice case, it's best to use slice syntax for all dimensions, e.g., [1:2][1:1], not [2][1:1].
It is possible to omit the lower-bound and/or upper-bound of a slice specifier; the missing bound is replaced by the lower or upper limit of the array's subscripts. For example:
An array subscript expression will return null if either the array itself or any of the subscript expressions are null. Also, null is returned if a subscript is outside the array bounds (this case does not raise an error). For example, if schedule currently has the dimensions [1:3][1:2] then referencing schedule[3][3] yields NULL. Similarly, an array reference with the wrong number of subscripts yields a null rather than an error.
An array slice expression likewise yields null if the array itself or any of the subscript expressions are null. However, in other cases such as selecting an array slice that is completely outside the current array bounds, a slice expression yields an empty (zero-dimensional) array instead of null. (This does not match non-slice behavior and is done for historical reasons.) If the requested slice partially overlaps the array bounds, then it is silently reduced to just the overlapping region instead of returning null.
The current dimensions of any array value can be retrieved with the array_dims function:
array_dims produces a text result, which is convenient for people to read but perhaps inconvenient for programs. Dimensions can also be retrieved with array_upper and array_lower, which return the upper and lower bound of a specified array dimension, respectively:
array_length will return the length of a specified array dimension:
cardinality returns the total number of elements in an array across all dimensions. It is effectively the number of rows a call to unnest would yield:
An array value can be replaced completely:
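For instance, with the sal_emp example table:

```sql
UPDATE sal_emp SET pay_by_quarter = '{25000,25000,27000,27000}'
    WHERE name = 'Carol';
```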
or using the ARRAY expression syntax:
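For instance:

```sql
UPDATE sal_emp SET pay_by_quarter = ARRAY[25000,25000,27000,27000]
    WHERE name = 'Carol';
```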
An array can also be updated at a single element:
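For instance:

```sql
UPDATE sal_emp SET pay_by_quarter[4] = 15000
    WHERE name = 'Bill';
```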
or updated in a slice:
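For instance:

```sql
UPDATE sal_emp SET pay_by_quarter[1:2] = '{27000,27000}'
    WHERE name = 'Carol';
```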
The slice syntaxes with omitted lower-bound and/or upper-bound can be used too, but only when updating an array value that is not NULL or zero-dimensional (otherwise, there is no existing subscript limit to substitute).
A stored array value can be enlarged by assigning to elements not already present. Any positions between those previously present and the newly assigned elements will be filled with nulls. For example, if array myarray currently has four elements, it will have six elements after an update that assigns to myarray[6]; myarray[5] will contain null. Currently, enlargement in this fashion is only allowed for one-dimensional arrays, not multidimensional arrays.
Subscripted assignment allows creation of arrays that do not use one-based subscripts. For example, one might assign to myarray[-2:7] to create an array with subscript values from -2 to 7.
New array values can also be constructed using the concatenation operator, ||:
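For instance (examples from the PostgreSQL manual):

```sql
SELECT ARRAY[1,2] || ARRAY[3,4];
-- {1,2,3,4}

SELECT ARRAY[5,6] || ARRAY[[1,2],[3,4]];
-- {{5,6},{1,2},{3,4}}
```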
The concatenation operator allows a single element to be pushed onto the beginning or end of a one-dimensional array. It also accepts two N-dimensional arrays, or an N-dimensional and an N+1-dimensional array.
When a single element is pushed onto either the beginning or end of a one-dimensional array, the result is an array with the same lower bound subscript as the array operand. For example:
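For instance (from the manual):

```sql
SELECT array_dims(1 || '[0:1]={2,3}'::int[]);
-- [0:2]

SELECT array_dims(ARRAY[1,2] || 3);
-- [1:3]
```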
When two arrays with an equal number of dimensions are concatenated, the result retains the lower bound subscript of the left-hand operand's outer dimension. The result is an array comprising every element of the left-hand operand followed by every element of the right-hand operand. For example:
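For instance:

```sql
SELECT array_dims(ARRAY[1,2] || ARRAY[3,4,5]);
-- [1:5]

SELECT array_dims(ARRAY[[1,2],[3,4]] || ARRAY[[5,6],[7,8],[9,0]]);
-- [1:5][1:2]
```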
When an N-dimensional array is pushed onto the beginning or end of an N+1-dimensional array, the result is analogous to the element-array case above. Each N-dimensional sub-array is essentially an element of the N+1-dimensional array's outer dimension. For example:
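For instance:

```sql
SELECT array_dims(ARRAY[1,2] || ARRAY[[3,4],[5,6]]);
-- [1:3][1:2]
```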
An array can also be constructed by using the functions array_prepend, array_append, or array_cat. The first two only support one-dimensional arrays, but array_cat supports multidimensional arrays. Some examples:
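For instance (from the manual):

```sql
SELECT array_prepend(1, ARRAY[2,3]);
-- {1,2,3}

SELECT array_append(ARRAY[1,2], 3);
-- {1,2,3}

SELECT array_cat(ARRAY[1,2], ARRAY[3,4]);
-- {1,2,3,4}

SELECT array_cat(ARRAY[[1,2],[3,4]], ARRAY[5,6]);
-- {{1,2},{3,4},{5,6}}

SELECT array_cat(ARRAY[5,6], ARRAY[[1,2],[3,4]]);
-- {{5,6},{1,2},{3,4}}
```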
In simple cases, the concatenation operator discussed above is preferred over direct use of these functions. However, because the concatenation operator is overloaded to serve all three cases, there are situations where use of one of the functions is helpful to avoid ambiguity. For example, consider:
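The manual's illustration of the ambiguity:

```sql
SELECT ARRAY[1, 2] || '{3, 4}';  -- the untyped literal is taken as an array
-- {1,2,3,4}

SELECT ARRAY[1, 2] || '7';       -- so is this one, causing an error
-- ERROR:  malformed array literal: "7"
```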
In the examples above, the parser sees an integer array on one side of the concatenation operator, and a constant of undetermined type on the other. The heuristic it uses to resolve the constant's type is to assume it's of the same type as the operator's other input — in this case, integer array. So the concatenation operator is presumed to represent array_cat, not array_append. When that's the wrong choice, it could be fixed by casting the constant to the array's element type; but explicit use of array_append might be a preferable solution.
To search for a value in an array, each value must be checked. This can be done manually, if you know the size of the array. For example:
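For instance, on the sal_emp example table (pay_by_quarter is an integer[] column):

```sql
SELECT * FROM sal_emp WHERE pay_by_quarter[1] = 10000 OR
                            pay_by_quarter[2] = 10000 OR
                            pay_by_quarter[3] = 10000 OR
                            pay_by_quarter[4] = 10000;
```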
In addition, you can find rows where the array has all values equal to 10000 with:
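For instance:

```sql
SELECT * FROM sal_emp WHERE 10000 = ALL (pay_by_quarter);
```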
Alternatively, the generate_subscripts function can be used. For example:
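For instance:

```sql
SELECT * FROM
   (SELECT pay_by_quarter,
           generate_subscripts(pay_by_quarter, 1) AS s
      FROM sal_emp) AS foo
 WHERE pay_by_quarter[s] = 10000;
```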
You can also search an array using the && operator, which checks whether the left operand overlaps with the right operand. For instance:
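For instance:

```sql
SELECT * FROM sal_emp WHERE pay_by_quarter && ARRAY[10000];
```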
You can also search for specific values in an array using the array_position and array_positions functions. The former returns the subscript of the first occurrence of a value in an array; the latter returns an array with the subscripts of all occurrences of the value in the array. For example:
Arrays are not sets; searching for specific array elements can be a sign of database misdesign. Consider using a separate table with a row for each item that would be an array element. This will be easier to search, and is likely to scale better for a large number of elements.
The external text representation of an array value consists of items that are interpreted according to the I/O conversion rules for the array's element type, plus decoration that indicates the array structure. The decoration consists of curly braces ({ and }) around the array value plus delimiter characters between adjacent items. The delimiter character is usually a comma (,) but can be something else: it is determined by the typdelim setting for the array's element type. Among the standard data types provided in the PostgreSQL distribution, all use a comma, except for type box, which uses a semicolon (;). In a multidimensional array, each dimension (row, plane, cube, etc.) gets its own level of curly braces, and delimiters must be written between adjacent curly-braced entities of the same level.
The array output routine will put double quotes around element values if they are empty strings, contain curly braces, delimiter characters, double quotes, backslashes, or white space, or match the word NULL. Double quotes and backslashes embedded in element values will be backslash-escaped. For numeric data types it is safe to assume that double quotes will never appear, but for textual data types one should be prepared to cope with either the presence or absence of quotes.
By default, the lower bound index value of an array's dimensions is set to one. To represent arrays with other lower bounds, the array subscript ranges can be specified explicitly before writing the array contents. This decoration consists of square brackets ([]) around each array dimension's lower and upper bounds, with a colon (:) delimiter character in between. The array dimension decoration is followed by an equal sign (=). For example:
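The manual's example:

```sql
SELECT f1[1][-2][3] AS e1, f1[1][-1][5] AS e2
 FROM (SELECT '[1:1][-2:-1][3:5]={{{1,2,3},{4,5,6}}}'::int[] AS f1) AS ss;

 e1 | e2
----+----
  1 |  6
```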
The array output routine will include explicit dimensions in its result only when there are one or more lower bounds different from one.
As shown previously, when writing an array value you can use double quotes around any individual array element. You must do so if the element value would otherwise confuse the array-value parser. For example, elements containing curly braces, commas (or the data type's delimiter character), double quotes, backslashes, or leading or trailing whitespace must be double-quoted. Empty strings and strings matching the word NULL must be quoted, too. To put a double quote or backslash in a quoted array element value, use escape string syntax and precede it with a backslash. Alternatively, you can avoid quotes and use backslash-escaping to protect all data characters that would otherwise be taken as array syntax.
You can add whitespace before a left brace or after a right brace. You can also add whitespace before or after any individual item string. In all of these cases the whitespace will be ignored. However, whitespace within double-quoted elements, or surrounded on both sides by non-whitespace characters of an element, is not ignored.
Remember that what you write in an SQL command will first be interpreted as a string literal, and then as an array. This doubles the number of backslashes you need. For example, to insert a text array value containing a backslash and a double quote, you'd need to write:
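The manual's example (the target table and column list are elided in the original):

```sql
INSERT ... VALUES (E'{"\\\\","\\""}');
```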
Version: 11
Range types are data types representing a range of values of some element type (called the range's subtype). For instance, ranges of timestamp
might be used to represent the ranges of time that a meeting room is reserved. In this case the data type is tsrange
(short for “timestamp range”), and timestamp
is the subtype. The subtype must have a total order so that it is well-defined whether element values are within, before, or after a range of values.
Range types are useful because they represent many element values in a single range value, and because concepts such as overlapping ranges can be expressed clearly. The use of time and date ranges for scheduling purposes is the clearest example; but price ranges, measurement ranges from an instrument, and so forth can also be useful.
PostgreSQL comes with the following built-in range types:
int4range
— Range of integer
int8range
— Range of bigint
numrange
— Range of numeric
tsrange
— Range of timestamp without time zone
tstzrange
— Range of timestamp with time zone
daterange
— Range of date
In addition, you can define your own range types; see CREATE TYPE for more information.
Every non-empty range has two bounds, the lower bound and the upper bound. All points between these values are included in the range. An inclusive bound means that the boundary point itself is included in the range as well, while an exclusive bound means that the boundary point is not included in the range.
The functions lower_inc
and upper_inc
test the inclusivity of the lower and upper bounds of a range value, respectively.
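For instance (int4range's canonical form is inclusive-lower, exclusive-upper):

```sql
SELECT lower_inc(int4range(4, 8));  -- true
SELECT upper_inc(int4range(4, 8));  -- false
```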
The lower bound of a range can be omitted, meaning that all points less than the upper bound are included in the range. Likewise, if the upper bound of the range is omitted, then all points greater than the lower bound are included in the range. If both lower and upper bounds are omitted, all values of the element type are considered to be in the range.
This is equivalent to considering that the lower bound is “minus infinity”, or the upper bound is “plus infinity”, respectively. But note that these infinite values are never values of the range's element type, and can never be part of the range. (So there is no such thing as an inclusive infinite bound — if you try to write one, it will automatically be converted to an exclusive bound.)
Also, some element types have a notion of “infinity”, but that is just another value so far as the range type mechanisms are concerned. For example, in timestamp ranges, [today,]
means the same thing as [today,)
. But [today,infinity]
means something different from [today,infinity)
— the latter excludes the special timestamp
value infinity
.
The functions lower_inf
and upper_inf
test for infinite lower and upper bounds of a range, respectively.
The input for a range value must follow one of the following patterns:
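The patterns themselves, as given in the PostgreSQL manual, are:

```
(lower-bound,upper-bound)
(lower-bound,upper-bound]
[lower-bound,upper-bound)
[lower-bound,upper-bound]
empty
```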
The parentheses or brackets indicate whether the lower and upper bounds are exclusive or inclusive, as described previously. Notice that the final pattern is empty, which represents an empty range (a range that contains no points).
The lower-bound
may be either a string that is valid input for the subtype, or empty to indicate no lower bound. Likewise, upper-bound
may be either a string that is valid input for the subtype, or empty to indicate no upper bound.
Each bound value can be quoted using "
(double quote) characters. This is necessary if the bound value contains parentheses, brackets, commas, double quotes, or backslashes, since these characters would otherwise be taken as part of the range syntax. To put a double quote or backslash in a quoted bound value, precede it with a backslash. (Also, a pair of double quotes within a double-quoted bound value is taken to represent a double quote character, analogously to the rules for single quotes in SQL literal strings.) Alternatively, you can avoid quoting and use backslash-escaping to protect all data characters that would otherwise be taken as range syntax. Also, to write a bound value that is an empty string, write ""
, since writing nothing means an infinite bound.
Whitespace is allowed before and after the range value, but any whitespace between the parentheses or brackets is taken as part of the lower or upper bound value. (Depending on the element type, it might or might not be significant.)
Examples:
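From the manual:

```sql
-- includes 3, does not include 7, and does include all points in between
SELECT '[3,7)'::int4range;

-- does not include either 3 or 7, but includes all points in between
SELECT '(3,7)'::int4range;

-- includes only the single point 4
SELECT '[4,4]'::int4range;

-- includes no points (and will be normalized to 'empty')
SELECT '[4,4)'::int4range;
```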
Each range type has a constructor function with the same name as the range type. Using the constructor function is frequently more convenient than writing a range literal constant, since it avoids the need for extra quoting of the bound values. The constructor function accepts two or three arguments. The two-argument form constructs a range in standard form (lower bound inclusive, upper bound exclusive), while the three-argument form constructs a range with bounds of the form specified by the third argument. The third argument must be one of the strings “()”, “(]”, “[)”, or “[]”. For example:
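From the manual:

```sql
-- The full form is: lower bound, upper bound, and text argument indicating
-- inclusivity/exclusivity of bounds.
SELECT numrange(1.0, 14.0, '(]');

-- If the third argument is omitted, '[)' is assumed.
SELECT numrange(1.0, 14.0);

-- Although '(]' is specified here, on display the value will be converted to
-- canonical form, since int8range is a discrete range type (see below).
SELECT int8range(1, 14, '(]');

-- Using NULL for either bound causes the range to be unbounded on that side.
SELECT numrange(NULL, 2.2);
```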
A discrete range is one whose element type has a well-defined “step”, such as integer
or date
. In these types two elements can be said to be adjacent, when there are no valid values between them. This contrasts with continuous ranges, where it's always (or almost always) possible to identify other element values between two given values. For example, a range over the numeric
type is continuous, as is a range over timestamp
. (Even though timestamp
has limited precision, and so could theoretically be treated as discrete, it's better to consider it continuous since the step size is normally not of interest.)
Another way to think about a discrete range type is that there is a clear idea of a “next” or “previous” value for each element value. Knowing that, it is possible to convert between inclusive and exclusive representations of a range's bounds, by choosing the next or previous element value instead of the one originally given. For example, in an integer range type [4,8]
and (3,9)
denote the same set of values; but this would not be so for a range over numeric.
A discrete range type should have a canonicalization function that is aware of the desired step size for the element type. The canonicalization function is charged with converting equivalent values of the range type to have identical representations, in particular consistently inclusive or exclusive bounds. If a canonicalization function is not specified, then ranges with different formatting will always be treated as unequal, even though they might represent the same set of values in reality.
The built-in range types int4range
, int8range
, and daterange
all use a canonical form that includes the lower bound and excludes the upper bound; that is, [)
. User-defined range types can use other conventions, however.
Users can define their own range types. The most common reason to do this is to use ranges over subtypes not provided among the built-in range types. For example, to define a new range type of subtype float8:
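The manual's definition:

```sql
CREATE TYPE floatrange AS RANGE (
    subtype = float8,
    subtype_diff = float8mi
);

SELECT '[1.234, 5.678]'::floatrange;
```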
Because float8
has no meaningful “step”, we do not define a canonicalization function in this example.
Defining your own range type also allows you to specify a different subtype B-tree operator class or collation to use, so as to change the sort ordering that determines which values fall into a given range.
If the subtype is considered to have discrete rather than continuous values, the CREATE TYPE
command should specify a canonical
function. The canonicalization function takes an input range value, and must return an equivalent range value that may have different bounds and formatting. The canonical output for two ranges that represent the same set of values, for example the integer ranges [1, 7]
and [1, 8)
, must be identical. It doesn't matter which representation you choose to be the canonical one, so long as two equivalent values with different formattings are always mapped to the same value with the same formatting. In addition to adjusting the inclusive/exclusive bounds format, a canonicalization function might round off boundary values, in case the desired step size is larger than what the subtype is capable of storing. For instance, a range type over timestamp
could be defined to have a step size of an hour, in which case the canonicalization function would need to round off bounds that weren't a multiple of an hour, or perhaps throw an error instead.
In addition, any range type that is meant to be used with GiST or SP-GiST indexes should define a subtype difference, or subtype_diff
, function. (The index will still work without subtype_diff
, but it is likely to be considerably less efficient than if a difference function is provided.) The subtype difference function takes two input values of the subtype, and returns their difference (i.e., X
minus Y
) represented as a float8
value. In our example above, the function float8mi
that underlies the regular float8
minus operator can be used; but for any other subtype, some type conversion would be necessary. Some creative thought about how to represent differences as numbers might be needed, too. To the greatest extent possible, the subtype_diff
function should agree with the sort ordering implied by the selected operator class and collation; that is, its result should be positive whenever its first argument is greater than its second according to the sort ordering.
A less-oversimplified example of a subtype_diff
function is:
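The manual's example:

```sql
CREATE FUNCTION time_subtype_diff(x time, y time) RETURNS float8 AS
'SELECT EXTRACT(EPOCH FROM (x - y))' LANGUAGE sql STRICT IMMUTABLE;

CREATE TYPE timerange AS RANGE (
    subtype = time,
    subtype_diff = time_subtype_diff
);

SELECT '[11:10, 23:00]'::timerange;
```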
GiST and SP-GiST indexes can be created for table columns of range types. For instance, to create a GiST index:
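For instance, assuming a reservation table with a tsrange column named during (as in the manual's range-type examples):

```sql
CREATE INDEX reservation_idx ON reservation USING GIST (during);
```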
In addition, B-tree and hash indexes can be created for table columns of range types. For these index types, basically the only useful range operation is equality. There is a B-tree sort ordering defined for range values, with corresponding <
and >
operators, but the ordering is rather arbitrary and not usually useful in the real world. Range types' B-tree and hash support is primarily meant to allow sorting and hashing internally in queries, rather than creation of actual indexes.
That constraint will prevent any overlapping values from existing in the table at the same time:
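The manual's illustration, assuming the reservation table with the exclusion constraint on its during column:

```sql
INSERT INTO reservation VALUES
    ('[2010-01-01 11:30, 2010-01-01 15:00)');

-- A second, overlapping reservation is rejected:
INSERT INTO reservation VALUES
    ('[2010-01-01 14:45, 2010-01-01 15:45)');
-- ERROR:  conflicting key value violates exclusion constraint "reservation_during_excl"
```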
To write a composite value as a literal constant, enclose the field values within parentheses and separate them by commas. You can put double quotes around any field value, and must do so if it contains commas or parentheses. (More details appear below.) Thus, the general format of a composite constant is the following:
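The general format, and an example value for a three-field composite type (both from the manual):

```
'( val1 , val2 , ... )'

'("fuzzy dice",42,1.99)'
```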
(These constants are actually only a special case of the generic type constants discussed in. The constant is initially treated as a string and passed to the composite-type input conversion routine. An explicit type specification might be necessary to tell which type to convert the constant to.)
The ROW expression syntax is discussed in more detail in.
The special field name * means “all fields”, as further explained in.
In PostgreSQL, a reference to a table name (or alias) in a query is effectively a reference to the composite value of the table's current row. For example, if we had a table inventory_item as shown, we could write:
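For instance, assuming inventory_item holds the row shown in the manual:

```sql
SELECT c FROM inventory_item c;

                c
------------------------
 ("fuzzy dice",42,1.99)
```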
The ordinary qualified-column-name syntax table_name.column_name can be understood as applying field selection to the composite value of the table's current row. (For efficiency reasons, it's not actually implemented that way.)
PostgreSQL will apply this expansion behavior to any composite-valued expression, although as shown, you need to write parentheses around the value that .* is applied to whenever it's not a simple table name. For example, if myfunc() is a function returning a composite type with columns a, b, and c, then these two queries have the same result:
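The two queries (myfunc and some_table are the manual's hypothetical names):

```sql
SELECT (myfunc(x)).* FROM some_table;
SELECT (myfunc(x)).a, (myfunc(x)).b, (myfunc(x)).c FROM some_table;
```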
The composite_value.* syntax results in column expansion of this kind when it appears at the top level of a SELECT output list, a RETURNING list in INSERT/UPDATE/DELETE, a VALUES clause, or a row constructor. In all other contexts (including when nested inside one of those constructs), attaching .* to a composite value does not change the value, since it means “all columns” and so the same composite value is produced again. For example, if somefunc() accepts a composite-valued argument, these queries are the same:
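The two equivalent queries (somefunc is the manual's hypothetical function):

```sql
SELECT somefunc(c.*) FROM inventory_item c;
SELECT somefunc(c) FROM inventory_item c;
```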
All of these ORDER BY clauses specify the row's composite value, resulting in sorting the rows according to the rules described in. However, if inventory_item contained a column named c, the first case would be different from the others, as it would mean to sort by that column only. Given the column names previously shown, these queries are also equivalent to those above:
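The equivalent spellings (from the manual):

```sql
SELECT * FROM inventory_item c ORDER BY ROW(c.name, c.supplier_id, c.price);
SELECT * FROM inventory_item c ORDER BY (c.name, c.supplier_id, c.price);
```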
The string-literal processor removes one level of backslashes, so that what arrives at the composite-value parser looks like ("\"\\"). In turn, the string fed to the text data type's input routine becomes "\. (If we were working with a data type whose input routine also treated backslashes specially, bytea for example, we might need as many as eight backslashes in the command to get one backslash into the stored composite field.) Dollar quoting (see) can be used to avoid the need to double backslashes.
(These kinds of array constants are actually only a special case of the generic type constants discussed in. The constant is initially treated as a string and passed to the array input conversion routine. An explicit type specification might be necessary.)
Notice that the array elements are ordinary SQL constants or expressions; for instance, string literals are single quoted, instead of double quoted as they would be in an array literal. The ARRAY constructor syntax is discussed in more detail in.
However, this quickly becomes tedious for large arrays, and is not helpful if the size of the array is unknown. An alternative method is described in. The above query could be replaced by:
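The replacement query (again assuming the sal_emp example table):

```sql
SELECT * FROM sal_emp WHERE 10000 = ANY (pay_by_quarter);
```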
This function is described in.
This and other array operators are further described in. It can be accelerated by an appropriate index, as described in.
If the value written for an element is NULL (in any case variant), the element is taken to be NULL. The presence of any quotes or backslashes disables this and allows the literal string value “NULL” to be entered. Also, for backward compatibility with pre-8.2 versions of PostgreSQL, the array_nulls configuration parameter can be turned off to suppress recognition of NULL as a NULL.
The escape string processor removes one level of backslashes, so that what arrives at the array-value parser looks like {"\\","\""}. In turn, the strings fed to the text data type's input routine become \ and " respectively. (If we were working with a data type whose input routine also treated backslashes specially, bytea for example, we might need as many as eight backslashes in the command to get one backslash into the stored array element.) Dollar quoting (see) can be used to avoid the need to double backslashes.
The ARRAY constructor syntax (see) is often easier to work with than the array-literal syntax when writing array values in SQL commands. In ARRAY, individual element values are written the same way they would be written when not members of an array.
See and for complete lists of operators and functions on range types.
In the text form of a range, an inclusive lower bound is represented by “[” while an exclusive lower bound is represented by “(”. Likewise, an inclusive upper bound is represented by “]”, while an exclusive upper bound is represented by “)”. (See for more details.)
These rules are very similar to those for writing field values in composite-type literals. See for additional commentary.
See for more information about creating range types.
A GiST or SP-GiST index can accelerate queries involving these range operators: =, &&, <@, @>, <<, >>, -|-, &<, and &> (see for more information).
While UNIQUE
is a natural constraint for scalar values, it is usually unsuitable for range types. Instead, an exclusion constraint is often more appropriate (see ). Exclusion constraints allow the specification of constraints such as “non-overlapping” on a range type. For example:
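The manual's example:

```sql
CREATE TABLE reservation (
    during tsrange,
    EXCLUDE USING GIST (during WITH &&)
);
```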
You can use the btree_gist extension to define exclusion constraints on plain scalar data types, which can then be combined with range exclusions for maximum flexibility. For example, after btree_gist is installed, the following constraint will reject overlapping ranges only if the meeting room numbers are equal:
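The manual's example:

```sql
CREATE EXTENSION btree_gist;
CREATE TABLE room_reservation (
    room text,
    during tsrange,
    EXCLUDE USING GIST (room WITH =, during WITH &&)
);
```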
(1)
The char_length function is discussed in Section 9.4.
Name
Storage Size
Description
"char"
1 byte
single-byte internal type
name
64 bytes
internal type for object names
Name
Description
character varying(n), varchar(n)
variable-length with limit
character(n), char(n)
fixed-length, blank-padded
text
variable unlimited length
Name
Aliases
Description
bigint
int8
signed eight-byte integer
bigserial
serial8
autoincrementing eight-byte integer
bit [ (n
) ]
fixed-length bit string
bit varying [ (n
) ]
varbit
variable-length bit string
boolean
bool
logical Boolean (true/false)
box
rectangular box on a plane
bytea
binary data (“byte array”)
character [ (n
) ]
char [ (n
) ]
fixed-length character string
character varying [ (n
) ]
varchar [ (n
) ]
variable-length character string
cidr
IPv4 or IPv6 network address
circle
circle on a plane
date
calendar date (year, month, day)
double precision
float8
double precision floating-point number (8 bytes)
inet
IPv4 or IPv6 host address
integer
int
,int4
signed four-byte integer
interval [fields
] [ (p
) ]
time span
json
textual JSON data
jsonb
binary JSON data, decomposed
line
infinite line on a plane
lseg
line segment on a plane
macaddr
MAC (Media Access Control) address
macaddr8
MAC (Media Access Control) address (EUI-64 format)
money
currency amount
numeric [ (p
,s
) ]
decimal [ (p
,s
) ]
exact numeric of selectable precision
path
geometric path on a plane
pg_lsn
PostgreSQL Log Sequence Number
point
geometric point on a plane
polygon
closed geometric path on a plane
real
float4
single precision floating-point number (4 bytes)
smallint
int2
signed two-byte integer
smallserial
serial2
autoincrementing two-byte integer
serial
serial4
autoincrementing four-byte integer
text
variable-length character string
time [ (p
) ] [ without time zone ]
time of day (no time zone)
time [ (p
) ] with time zone
timetz
time of day, including time zone
timestamp [ (p
) ] [ without time zone ]
date and time (no time zone)
timestamp [ (p
) ] with time zone
timestamptz
date and time, including time zone
tsquery
text search query
tsvector
text search document
txid_snapshot
user-level transaction ID snapshot
uuid
universally unique identifier
xml
XML data
JSON data types are for storing JSON (JavaScript Object Notation) data, as specified in RFC 7159. Such data can also be stored as text, but the JSON data types have the advantage of enforcing that each stored value is valid according to the JSON rules. There are also assorted JSON-specific functions and operators available for data stored in these data types; see Section 9.15.
There are two JSON data types: json and jsonb. They accept almost identical sets of values as input. The major practical difference is one of efficiency. The json data type stores an exact copy of the input text, which processing functions must reparse on each execution; while jsonb data is stored in a decomposed binary format that makes it slightly slower to input due to added conversion overhead, but significantly faster to process, since no reparsing is needed. jsonb also supports indexing, which can be a significant advantage.
Because the json type stores an exact copy of the input text, it will preserve semantically-insignificant white space between tokens, as well as the order of keys within JSON objects. Also, if a JSON object within the value contains the same key more than once, all the key/value pairs are kept. (The processing functions consider the last value as the operative one.) By contrast, jsonb does not preserve white space, does not preserve the order of object keys, and does not keep duplicate object keys. If duplicate keys are specified in the input, only the last value is kept.
In general, most applications should prefer to store JSON data as jsonb, unless there are quite specialized needs, such as legacy assumptions about ordering of object keys.
PostgreSQL allows only one character set encoding per database. It is therefore not possible for the JSON types to conform rigidly to the JSON specification unless the database encoding is UTF8. Attempts to directly include characters that cannot be represented in the database encoding will fail; conversely, characters that can be represented in the database encoding but not in UTF8 will be allowed.
RFC 7159 permits JSON strings to contain Unicode escape sequences denoted by \uXXXX. In the input function for the json type, Unicode escapes are allowed regardless of the database encoding, and are checked only for syntactic correctness (that is, that four hex digits follow \u). However, the input function for jsonb is stricter: it disallows Unicode escapes for non-ASCII characters (those above U+007F) unless the database encoding is UTF8. The jsonb type also rejects \u0000 (because that cannot be represented in PostgreSQL's text type), and it insists that any use of Unicode surrogate pairs to designate characters outside the Unicode Basic Multilingual Plane be correct. Valid Unicode escapes are converted to the equivalent ASCII or UTF8 character for storage; this includes folding surrogate pairs into a single character.
Many of the JSON processing functions described in Section 9.15 will convert Unicode escapes to regular characters, and will therefore throw the same types of errors just described even if their input is of type json not jsonb. The fact that the json input function does not make these checks may be considered a historical artifact, although it does allow for simple storage (without processing) of JSON Unicode escapes in a non-UTF8 database encoding. In general, it is best to avoid mixing Unicode escapes in JSON with a non-UTF8 database encoding, if possible.
When converting textual JSON input into jsonb, the primitive types described by RFC 7159 are effectively mapped onto native PostgreSQL types, as shown in Table 8.23. Therefore, there are some minor additional constraints on what constitutes valid jsonb data that do not apply to the json type, nor to JSON in the abstract, corresponding to limits on what can be represented by the underlying data type. Notably, jsonb will reject numbers that are outside the range of the PostgreSQL numeric data type, while json will not. Such implementation-defined restrictions are permitted by RFC 7159. However, in practice such problems are far more likely to occur in other implementations, as it is common to represent JSON's number primitive type as IEEE 754 double precision floating point (which RFC 7159 explicitly anticipates and allows for). When using JSON as an interchange format with such systems, the danger of losing numeric precision compared to data originally stored by PostgreSQL should be considered.
Conversely, as noted in the table there are some minor restrictions on the input format of JSON primitive types that do not apply to the corresponding PostgreSQL types.
Table 8.23. JSON primitive types and corresponding PostgreSQL types
JSON primitive type
PostgreSQL
type
Notes
string
text
\u0000
is disallowed, as are non-ASCII Unicode escapes if database encoding is not UTF8
number
numeric
NaN and infinity values are disallowed
boolean
boolean
Only lowercase true and false spellings are accepted
null
(none)
SQL NULL is a different concept
The input/output syntax for the JSON data types is as specified in RFC 7159.
The following are all valid json (or jsonb) expressions:
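From the manual:

```sql
-- Simple scalar/primitive value
-- Primitive values can be numbers, quoted strings, true, false, or null
SELECT '5'::json;

-- Array of zero or more elements (elements need not be of same type)
SELECT '[1, 2, "foo", null]'::json;

-- Object containing pairs of keys and values
-- Note that object keys must always be quoted strings
SELECT '{"bar": "baz", "balance": 7.77, "active": false}'::json;

-- Arrays and objects can be nested arbitrarily
SELECT '{"foo": [true, "bar"], "tags": {"a": 1, "b": null}}'::json;
```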
As previously stated, when a JSON value is input and then printed without any additional processing, json outputs the same text that was input, while jsonb does not preserve semantically-insignificant details such as whitespace. For example, note the differences here:
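From the manual:

```sql
SELECT '{"bar": "baz", "balance": 7.77, "active":false}'::json;
-- {"bar": "baz", "balance": 7.77, "active":false}

SELECT '{"bar": "baz", "balance": 7.77, "active":false}'::jsonb;
-- {"bar": "baz", "active": false, "balance": 7.77}
```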
One semantically-insignificant detail worth noting is that in jsonb, numbers will be printed according to the behavior of the underlying numeric type. In practice this means that numbers entered with E notation will be printed without it, for example:
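From the manual:

```sql
SELECT '{"reading": 1.230e-5}'::json, '{"reading": 1.230e-5}'::jsonb;
--          json           |          jsonb
-- {"reading": 1.230e-5}   | {"reading": 0.00001230}
```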
However, jsonb will preserve trailing fractional zeroes, as seen in this example, even though those are semantically insignificant for purposes such as equality checks.
Representing data as JSON can be considerably more flexible than the traditional relational data model, which is compelling in environments where requirements are fluid. It is quite possible for both approaches to co-exist and complement each other within the same application. However, even for applications where maximal flexibility is desired, it is still recommended that JSON documents have a somewhat fixed structure. The structure is typically unenforced (though enforcing some business rules declaratively is possible), but having a predictable structure makes it easier to write queries that usefully summarize a set of“documents”(datums) in a table.
JSON data is subject to the same concurrency-control considerations as any other data type when stored in a table. Although storing large documents is practicable, keep in mind that any update acquires a row-level lock on the whole row. Consider limiting JSON documents to a manageable size in order to decrease lock contention among updating transactions. Ideally, JSON documents should each represent an atomic datum that business rules dictate cannot reasonably be further subdivided into smaller datums that could be modified independently.
jsonb Containment and Existence

Testing containment is an important capability of jsonb. There is no parallel set of facilities for the json type. Containment tests whether one jsonb document has contained within it another one. These examples return true except as noted:
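The example block itself did not survive extraction; the following queries, modeled on the standard jsonb containment examples, illustrate the behavior described:

```sql
-- Simple scalar/primitive values contain only the identical value:
SELECT '"foo"'::jsonb @> '"foo"'::jsonb;            -- true
-- The array on the right is contained in the one on the left:
SELECT '[1, 2, 3]'::jsonb @> '[1, 3]'::jsonb;       -- true
-- Order of array elements is not significant:
SELECT '[1, 2, 3]'::jsonb @> '[3, 1]'::jsonb;       -- true
-- Duplicate array elements are considered only once:
SELECT '[1, 2, 3]'::jsonb @> '[1, 2, 2]'::jsonb;    -- true
-- The object with a single pair on the right is contained
-- within the object on the left:
SELECT '{"product": "PostgreSQL", "version": 9.4}'::jsonb
       @> '{"version": 9.4}'::jsonb;                -- true
-- The array on the right is NOT contained in the array on the left:
SELECT '[1, 2, [1, 3]]'::jsonb @> '[1, 3]'::jsonb;  -- false
```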
The general principle is that the contained object must match the containing object as to structure and data contents, possibly after discarding some non-matching array elements or object key/value pairs from the containing object. But remember that the order of array elements is not significant when doing a containment match, and duplicate array elements are effectively considered only once.
As a special exception to the general principle that the structures must match, an array may contain a primitive value:
jsonb also has an existence operator, which is a variation on the theme of containment: it tests whether a string (given as a text value) appears as an object key or array element at the top level of the jsonb value. These examples return true except as noted:
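The example block is missing here as well; a sketch of the existence tests described in the text:

```sql
-- String exists as array element:
SELECT '["foo", "bar", "baz"]'::jsonb ? 'bar';       -- true
-- String exists as object key:
SELECT '{"foo": "bar"}'::jsonb ? 'foo';              -- true
-- Object values are not considered:
SELECT '{"foo": "bar"}'::jsonb ? 'bar';              -- false
-- Existence is not nested; only top-level keys match:
SELECT '{"foo": {"bar": "baz"}}'::jsonb ? 'bar';     -- false
```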
JSON objects are better suited than arrays for testing containment or existence when there are many keys or elements involved, because unlike arrays they are internally optimized for searching, and do not need to be searched linearly.
Because JSON containment is nested, an appropriate query can skip explicit selection of sub-objects. As an example, suppose that we have a doc column containing objects at the top level, with most objects containing tags fields that contain arrays of sub-objects. This query finds entries in which sub-objects containing both "term":"paris" and "term":"food" appear, while ignoring any such keys outside the tags array:
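The query itself was lost from this extraction; a sketch of what the text describes (the table name websites is illustrative, the doc column is from the text):

```sql
SELECT doc->'site_name'
FROM websites
WHERE doc @> '{"tags": [{"term": "paris"}, {"term": "food"}]}';
```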
One could accomplish the same thing with, say,
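The alternative query referred to here also did not survive extraction; a sketch, using the same illustrative websites table:

```sql
SELECT doc->'site_name'
FROM websites
WHERE doc->'tags' @> '[{"term": "paris"}]'
  AND doc->'tags' @> '[{"term": "food"}]';
```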
but that approach is less flexible, and often less efficient as well.
On the other hand, the JSON existence operator is not nested: it will only look for the specified key or array element at the top level of the JSON value.

The various containment and existence operators, along with all other JSON operators and functions, are documented in Section 9.15.
jsonb Indexing

GIN indexes can be used to efficiently search for keys or key/value pairs occurring within a large number of jsonb documents (datums). Two GIN "operator classes" are provided, offering different performance and flexibility trade-offs.
The default GIN operator class for jsonb supports queries with the top-level key-exists operators ?, ?& and ?|, and the path/value-exists operator @>. (For details of the semantics that these operators implement, see Table 9.44.) An example of creating an index with this operator class is:
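The statement did not survive extraction; a sketch, using the api table and jdoc column introduced later in this section:

```sql
CREATE INDEX idxgin ON api USING GIN (jdoc);
```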
The non-default GIN operator class jsonb_path_ops supports indexing the @> operator only. An example of creating an index with this operator class is:
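Again the statement itself is missing; a sketch with the same illustrative table and column:

```sql
CREATE INDEX idxginp ON api USING GIN (jdoc jsonb_path_ops);
```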
Consider the example of a table that stores JSON documents retrieved from a third-party web service, with a documented schema definition. A typical document is:
We store these documents in a table named api, in a jsonb column named jdoc. If a GIN index is created on this column, queries like the following can make use of the index:
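The example query is missing from this extraction; a sketch consistent with the surrounding description (the key and value names are illustrative):

```sql
-- Find documents in which the key "company" has value "Magnafone":
SELECT jdoc->'guid', jdoc->'name'
FROM api
WHERE jdoc @> '{"company": "Magnafone"}';
```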
However, the index could not be used for queries like the following, because though the operator ? is indexable, it is not applied directly to the indexed column jdoc:
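A sketch of such a query (the tag value 'qui' is illustrative):

```sql
-- The ? operator here applies to the result of jdoc->'tags',
-- not to the indexed column jdoc itself:
SELECT jdoc->'guid', jdoc->'name'
FROM api
WHERE jdoc->'tags' ? 'qui';
```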
Still, with appropriate use of expression indexes, the above query can use an index. If querying for particular items within the "tags" key is common, defining an index like this may be worthwhile:
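The index definition is missing here; a sketch of an expression index over the tags key:

```sql
CREATE INDEX idxgintags ON api USING GIN ((jdoc->'tags'));
```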
Now, the WHERE clause jdoc->'tags' ? 'qui' will be recognized as an application of the indexable operator ? to the indexed expression jdoc->'tags'. (More information on expression indexes can be found in Section 11.7.)
Another approach to querying is to exploit containment, for example:
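The example is missing from this extraction; a containment query equivalent to the earlier ? query might look like:

```sql
SELECT jdoc->'guid', jdoc->'name'
FROM api
WHERE jdoc @> '{"tags": ["qui"]}';
```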
A simple GIN index on the jdoc column can support this query. But note that such an index will store copies of every key and value in the jdoc column, whereas the expression index of the previous example stores only data found under the tags key. While the simple-index approach is far more flexible (since it supports queries about any key), targeted expression indexes are likely to be smaller and faster to search than a simple index.
Although the jsonb_path_ops operator class supports only queries with the @> operator, it has notable performance advantages over the default operator class jsonb_ops. A jsonb_path_ops index is usually much smaller than a jsonb_ops index over the same data, and the specificity of searches is better, particularly when queries contain keys that appear frequently in the data. Therefore search operations typically perform better than with the default operator class.
The technical difference between a jsonb_ops and a jsonb_path_ops GIN index is that the former creates independent index items for each key and value in the data, while the latter creates index items only for each value in the data.[6] Basically, each jsonb_path_ops index item is a hash of the value and the key(s) leading to it; for example to index {"foo": {"bar": "baz"}}, a single index item would be created incorporating all three of foo, bar, and baz into the hash value. Thus a containment query looking for this structure would result in an extremely specific index search; but there is no way at all to find out whether foo appears as a key. On the other hand, a jsonb_ops index would create three index items representing foo, bar, and baz separately; then to do the containment query, it would look for rows containing all three of these items. While GIN indexes can perform such an AND search fairly efficiently, it will still be less specific and slower than the equivalent jsonb_path_ops search, especially if there are a very large number of rows containing any single one of the three index items.
A disadvantage of the jsonb_path_ops approach is that it produces no index entries for JSON structures not containing any values, such as {"a": {}}. If a search for documents containing such a structure is requested, it will require a full-index scan, which is quite slow. jsonb_path_ops is therefore ill-suited for applications that often perform such searches.
jsonb also supports btree and hash indexes. These are usually useful only if it's important to check equality of complete JSON documents. The btree ordering for jsonb datums is seldom of great interest, but for completeness it is:
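The ordering rules themselves were lost from this extraction; per the upstream PostgreSQL documentation they are:

```
Object > Array > Boolean > Number > String > Null

Object with n pairs > object with n - 1 pairs

Array with n elements > array with n - 1 elements
```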
Objects with equal numbers of pairs are compared in the order:
Note that object keys are compared in their storage order; in particular, since shorter keys are stored before longer keys, this can lead to results that might be unintuitive, such as:
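The example comparison is missing here; the upstream documentation illustrates the point with a comparison of this shape:

```sql
-- "c" (length 1) is stored before "aa" (length 2), so the first key
-- compared is "c" versus "b", making the left-hand object the greater:
SELECT '{ "aa": 1, "c": 1}'::jsonb > '{"b": 1, "d": 1}'::jsonb;  -- true
```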
Similarly, arrays with equal numbers of elements are compared in the order:
Primitive JSON values are compared using the same comparison rules as for the underlying PostgreSQL data type. Strings are compared using the default database collation.

[6] For this purpose, the term "value" includes array elements, though JSON terminology sometimes considers array elements distinct from values within objects.
The pg_lsn data type can be used to store LSN (Log Sequence Number) data, which is a pointer to a location in the WAL. This type is a representation of XLogRecPtr and an internal system type of PostgreSQL.

Internally, an LSN is a 64-bit integer, representing a byte position in the write-ahead log stream. It is printed as two hexadecimal numbers of up to 8 digits each, separated by a slash; for example, 16/B374D848. The pg_lsn type supports the standard comparison operators, like = and >. Two LSNs can be subtracted using the - operator; the result is the number of bytes separating those write-ahead log locations.
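For example, a sketch of measuring the distance between the current WAL position and a stored LSN (pg_current_wal_lsn() exists as of PostgreSQL 10; older releases spell it pg_current_xlog_location()):

```sql
SELECT pg_current_wal_lsn() - '16/B374D848'::pg_lsn AS bytes_diff;
```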
PostgreSQL supports the full set of SQL date and time types, shown in Table 8.9. The operations available on these data types are described in Section 9.9.

Table 8.9. Date/Time Types

Name | Storage Size | Description | Low Value | High Value | Resolution
timestamp [ (p) ] [ without time zone ] | 8 bytes | both date and time (no time zone) | 4713 BC | 294276 AD | 1 microsecond
timestamp [ (p) ] with time zone | 8 bytes | both date and time, with time zone | 4713 BC | 294276 AD | 1 microsecond
date | 4 bytes | date (no time of day) | 4713 BC | 5874897 AD | 1 day
time [ (p) ] [ without time zone ] | 8 bytes | time of day (no date) | 00:00:00 | 24:00:00 | 1 microsecond
time [ (p) ] with time zone | 12 bytes | time of day (no date), with time zone | 00:00:00+1459 | 24:00:00-1459 | 1 microsecond
interval [ fields ] [ (p) ] | 16 bytes | time interval | -178000000 years | 178000000 years | 1 microsecond
The SQL standard requires that writing just timestamp be equivalent to timestamp without time zone, and PostgreSQL honors that behavior. In addition, PostgreSQL offers timestamptz as an abbreviation for timestamp with time zone.

time, timestamp, and interval accept an optional precision value p which specifies the number of fractional digits retained in the seconds field. By default, there is no explicit bound on precision. The allowed range of p is from 0 to 6.
The interval type has an additional option, which is to restrict the set of stored fields by writing one of these phrases:

YEAR
MONTH
DAY
HOUR
MINUTE
SECOND
YEAR TO MONTH
DAY TO HOUR
DAY TO MINUTE
DAY TO SECOND
HOUR TO MINUTE
HOUR TO SECOND
MINUTE TO SECOND

Note that if both fields and p are specified, the fields must include SECOND, since the precision applies only to the seconds.
The type time with time zone is defined by the SQL standard, but the definition exhibits properties which lead to questionable usefulness. In most cases, a combination of date, time, timestamp without time zone, and timestamp with time zone should provide a complete range of date/time functionality required by any application.

The types abstime and reltime are lower-precision types which are used internally. You are discouraged from using these types in applications; these internal types might disappear in a future release.
Date and time input is accepted in almost any reasonable format, including ISO 8601, SQL-compatible, traditional POSTGRES, and others. For some formats, ordering of day, month, and year in date input is ambiguous, and there is support for specifying the expected ordering of these fields. Set the DateStyle parameter to MDY to select month-day-year interpretation, DMY to select day-month-year interpretation, or YMD to select year-month-day interpretation.

PostgreSQL is more flexible in handling date/time input than the SQL standard requires. See Appendix B for the exact parsing rules of date/time input and for the recognized text fields including months, days of the week, and time zones.

Remember that any date or time literal input needs to be enclosed in single quotes, like text strings. Refer to Section 4.1.2.7 for more information. SQL requires the following syntax:

type [ (p) ] 'value'

where p is an optional precision specification giving the number of fractional digits in the seconds field. Precision can be specified for time, timestamp, and interval types, and can range from 0 to 6. If no precision is specified, it defaults to the precision of the literal value (but not more than 6 digits).
Table 8.10 shows some possible inputs for the date type.

Table 8.10. Date Input

Example | Description
1999-01-08 | ISO 8601; January 8 in any mode (recommended format)
January 8, 1999 | unambiguous in any datestyle input mode
1/8/1999 | January 8 in MDY mode; August 1 in DMY mode
1/18/1999 | January 18 in MDY mode; rejected in other modes
01/02/03 | January 2, 2003 in MDY mode; February 1, 2003 in DMY mode; February 3, 2001 in YMD mode
1999-Jan-08 | January 8 in any mode
Jan-08-1999 | January 8 in any mode
08-Jan-1999 | January 8 in any mode
99-Jan-08 | January 8 in YMD mode, else error
08-Jan-99 | January 8, except error in YMD mode
Jan-08-99 | January 8, except error in YMD mode
19990108 | ISO 8601; January 8, 1999 in any mode
990108 | ISO 8601; January 8, 1999 in any mode
1999.008 | year and day of year
J2451187 | Julian date
January 8, 99 BC | year 99 BC
The time-of-day types are time [ (p) ] without time zone and time [ (p) ] with time zone; time alone is equivalent to time without time zone.

Valid input for these types consists of a time of day followed by an optional time zone. (See Table 8.11 and Table 8.12.) If a time zone is specified in the input for time without time zone, it is silently ignored. You can also specify a date, but it will be ignored, except when you use a time zone name that involves a daylight-savings rule, such as America/New_York. In this case specifying the date is required in order to determine whether standard or daylight-savings time applies. The appropriate time zone offset is recorded in the time with time zone value.
Table 8.11. Time Input

Example | Description
04:05:06.789 | ISO 8601
04:05:06 | ISO 8601
04:05 | ISO 8601
040506 | ISO 8601
04:05 AM | same as 04:05; AM does not affect value
04:05 PM | same as 16:05; input hour must be <= 12
04:05:06.789-8 | ISO 8601
04:05:06-08:00 | ISO 8601
04:05-08:00 | ISO 8601
040506-08 | ISO 8601
04:05:06 PST | time zone specified by abbreviation
2003-04-12 04:05:06 America/New_York | time zone specified by full name
Table 8.12. Time Zone Input

Example | Description
PST | Abbreviation (for Pacific Standard Time)
America/New_York | Full time zone name
PST8PDT | POSIX-style time zone specification
-8:00 | ISO-8601 offset for PST
-800 | ISO-8601 offset for PST
-8 | ISO-8601 offset for PST
zulu | Military abbreviation for UTC
z | Short form of zulu
Refer to Section 8.5.3 for more information on how to specify time zones.

Valid input for the time stamp types consists of the concatenation of a date and a time, followed by an optional time zone, followed by an optional AD or BC. (Alternatively, AD/BC can appear before the time zone, but this is not the preferred ordering.) Thus:

1999-01-08 04:05:06

and:

1999-01-08 04:05:06 -8:00

are valid values, which follow the ISO 8601 standard. In addition, the common format:

January 8 04:05:06 1999 PST

is supported.

The SQL standard differentiates timestamp without time zone and timestamp with time zone literals by the presence of a "+" or "-" symbol and time zone offset after the time. Hence, according to the standard,

TIMESTAMP '2004-10-19 10:23:54'

is a timestamp without time zone, while

TIMESTAMP '2004-10-19 10:23:54+02'

is a timestamp with time zone. PostgreSQL never examines the content of a literal string before determining its type, and therefore will treat both of the above as timestamp without time zone. To ensure that a literal is treated as timestamp with time zone, give it the correct explicit type:

TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02'

In a literal that has been determined to be timestamp without time zone, PostgreSQL will silently ignore any time zone indication. That is, the resulting value is derived from the date/time fields in the input value, and is not adjusted for time zone.

For timestamp with time zone, the internally stored value is always in UTC (Universal Coordinated Time, traditionally known as Greenwich Mean Time, GMT). An input value that has an explicit time zone specified is converted to UTC using the appropriate offset for that time zone. If no time zone is stated in the input string, then it is assumed to be in the time zone indicated by the system's TimeZone parameter, and is converted to UTC using the offset for that zone.

When a timestamp with time zone value is output, it is always converted from UTC to the current time zone, and displayed as local time in that zone. To see the time in another time zone, either change the time zone setting or use the AT TIME ZONE construct (see Section 9.9.3).

Conversions between timestamp without time zone and timestamp with time zone normally assume that the timestamp without time zone value should be taken or given as local time in the current time zone. A different time zone can be specified for the conversion using AT TIME ZONE.
As a convenience, PostgreSQL supports several special date/time input values, as shown in Table 8.13. The values infinity and -infinity are specially represented inside the system and will be displayed unchanged; the others are simply notational shorthands that will be converted to ordinary date/time values when read. (In particular, now and related strings are converted to a specific time value as soon as they are read.) All of these values need to be enclosed in single quotes when used as constants in SQL commands.
Table 8.13. Special Date/Time Inputs

Input String | Valid Types | Description
epoch | date, timestamp | 1970-01-01 00:00:00+00 (Unix system time zero)
infinity | date, timestamp | later than all other time stamps
-infinity | date, timestamp | earlier than all other time stamps
now | date, time, timestamp | current transaction's start time
today | date, timestamp | midnight today
tomorrow | date, timestamp | midnight tomorrow
yesterday | date, timestamp | midnight yesterday
allballs | time | 00:00:00.00 UTC
The following SQL-compatible functions can also be used to obtain the current time value for the corresponding data type: CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP, LOCALTIME, LOCALTIMESTAMP. The latter four accept an optional subsecond precision specification. (See Section 9.9.4.) Note that these are SQL functions and are not recognized in data input strings.
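For instance, a minimal sketch of the two styles (results depend on the current transaction time):

```sql
SELECT timestamp 'now';       -- special input string: converted when read
SELECT CURRENT_TIMESTAMP(2);  -- SQL function, with 2 fractional digits
```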
The output format of the date/time types can be set to one of the four styles ISO 8601, SQL (Ingres), traditional POSTGRES (Unix date format), or German. The default is the ISO format. (The SQL standard requires the use of the ISO 8601 format. The name of the "SQL" output format is a historical accident.) Table 8.14 shows examples of each output style. The output of the date and time types is generally only the date or time part in accordance with the given examples. However, the POSTGRES style outputs date-only values in ISO format.
Table 8.14. Date/Time Output Styles

Style Specification | Description | Example
ISO | ISO 8601, SQL standard | 1997-12-17 07:37:16-08
SQL | traditional style | 12/17/1997 07:37:16.00 PST
Postgres | original style | Wed Dec 17 07:37:16 1997 PST
German | regional style | 17.12.1997 07:37:16.00 PST
ISO 8601 specifies the use of uppercase letter T to separate the date and time. PostgreSQL accepts that format on input, but on output it uses a space rather than T, as shown above. This is for readability and for consistency with RFC 3339 as well as some other database systems.
In the SQL and POSTGRES styles, day appears before month if DMY field ordering has been specified, otherwise month appears before day. (See Section 8.5.1 for how this setting also affects interpretation of input values.) Table 8.15 shows examples.
Table 8.15. Date Order Conventions

datestyle Setting | Input Ordering | Example Output
SQL, DMY | day/month/year | 17/12/1997 15:37:16.00 CET
SQL, MDY | month/day/year | 12/17/1997 07:37:16.00 PST
Postgres, DMY | day/month/year | Wed 17 Dec 07:37:16 1997 PST
The date/time style can be selected by the user using the SET datestyle command, the DateStyle parameter in the postgresql.conf configuration file, or the PGDATESTYLE environment variable on the server or client.
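A minimal session-level sketch of the first of these methods:

```sql
SET datestyle TO 'SQL, DMY';  -- affects only the current session
SHOW datestyle;
```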
The formatting function to_char (see Section 9.8) is also available as a more flexible way to format date/time output.
Time zones, and time-zone conventions, are influenced by political decisions, not just earth geometry. Time zones around the world became somewhat standardized during the 1900s, but continue to be prone to arbitrary changes, particularly with respect to daylight-savings rules. PostgreSQL uses the widely-used IANA (Olson) time zone database for information about historical time zone rules. For times in the future, the assumption is that the latest known rules for a given time zone will continue to be observed indefinitely far into the future.
PostgreSQL endeavors to be compatible with the SQL standard definitions for typical usage. However, the SQL standard has an odd mix of date and time types and capabilities. Two obvious problems are:
Although the date type cannot have an associated time zone, the time type can. Time zones in the real world have little meaning unless associated with a date as well as a time, since the offset can vary through the year with daylight-saving time boundaries.
The default time zone is specified as a constant numeric offset from UTC. It is therefore impossible to adapt to daylight-saving time when doing date/time arithmetic across DST boundaries.
To address these difficulties, we recommend using date/time types that contain both date and time when using time zones. We do not recommend using the type time with time zone (though it is supported by PostgreSQL for legacy applications and for compliance with the SQL standard). PostgreSQL assumes your local time zone for any type containing only date or time.
All timezone-aware dates and times are stored internally in UTC. They are converted to local time in the zone specified by the TimeZone configuration parameter before being displayed to the client.
PostgreSQL allows you to specify time zones in three different forms:
A full time zone name, for example America/New_York. The recognized time zone names are listed in the pg_timezone_names view (see Section 51.90). PostgreSQL uses the widely-used IANA time zone data for this purpose, so the same time zone names are also recognized by much other software.
A time zone abbreviation, for example PST. Such a specification merely defines a particular offset from UTC, in contrast to full time zone names which can imply a set of daylight savings transition-date rules as well. The recognized abbreviations are listed in the pg_timezone_abbrevs view (see Section 51.89). You cannot set the configuration parameters TimeZone or log_timezone to a time zone abbreviation, but you can use abbreviations in date/time input values and with the AT TIME ZONE operator.
In addition to the timezone names and abbreviations, PostgreSQL will accept POSIX-style time zone specifications of the form STDoffset or STDoffsetDST, where STD is a zone abbreviation, offset is a numeric offset in hours west from UTC, and DST is an optional daylight-savings zone abbreviation, assumed to stand for one hour ahead of the given offset. For example, if EST5EDT were not already a recognized zone name, it would be accepted and would be functionally equivalent to United States East Coast time. In this syntax, a zone abbreviation can be a string of letters, or an arbitrary string surrounded by angle brackets (<>). When a daylight-savings zone abbreviation is present, it is assumed to be used according to the same daylight-savings transition rules used in the IANA time zone database's posixrules entry. In a standard PostgreSQL installation, posixrules is the same as US/Eastern, so that POSIX-style time zone specifications follow USA daylight-savings rules. If needed, you can adjust this behavior by replacing the posixrules file.
In short, this is the difference between abbreviations and full names: abbreviations represent a specific offset from UTC, whereas many of the full names imply a local daylight-savings time rule, and so have two possible UTC offsets. As an example, 2014-06-04 12:00 America/New_York represents noon local time in New York, which for this particular date was Eastern Daylight Time (UTC-4). So 2014-06-04 12:00 EDT specifies that same time instant. But 2014-06-04 12:00 EST specifies noon Eastern Standard Time (UTC-5), regardless of whether daylight savings was nominally in effect on that date.
To complicate matters, some jurisdictions have used the same timezone abbreviation to mean different UTC offsets at different times; for example, in Moscow MSK has meant UTC+3 in some years and UTC+4 in others. PostgreSQL interprets such abbreviations according to whatever they meant (or had most recently meant) on the specified date; but, as with the EST example above, this is not necessarily the same as local civil time on that date.
One should be wary that the POSIX-style time zone feature can lead to silently accepting bogus input, since there is no check on the reasonableness of the zone abbreviations. For example, SET TIMEZONE TO FOOBAR0 will work, leaving the system effectively using a rather peculiar abbreviation for UTC. Another issue to keep in mind is that in POSIX time zone names, positive offsets are used for locations west of Greenwich. Everywhere else, PostgreSQL follows the ISO-8601 convention that positive timezone offsets are east of Greenwich.
In all cases, timezone names and abbreviations are recognized case-insensitively. (This is a change from PostgreSQL versions prior to 8.2, which were case-sensitive in some contexts but not others.)
Neither timezone names nor abbreviations are hard-wired into the server; they are obtained from configuration files stored under .../share/timezone/ and .../share/timezonesets/ of the installation directory (see Section B.3).
The TimeZone configuration parameter can be set in the file postgresql.conf, or in any of the other standard ways described in Chapter 19. There are also some special ways to set it:
The SQL command SET TIME ZONE sets the time zone for the session. This is an alternative spelling of SET TIMEZONE TO with a more SQL-spec-compatible syntax.
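A minimal sketch of the session-level command:

```sql
SET TIME ZONE 'America/New_York';  -- full zone name, follows DST rules
SET TIME ZONE LOCAL;               -- revert to the server's default setting
```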
The PGTZ environment variable is used by libpq clients to send a SET TIME ZONE command to the server upon connection.
interval values can be written using the following verbose syntax:
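The syntax line itself was lost from this extraction; per the upstream documentation it is:

```
[@] quantity unit [quantity unit...] [direction]
```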
where quantity is a number (possibly signed); unit is microsecond, millisecond, second, minute, hour, day, week, month, year, decade, century, millennium, or abbreviations or plurals of these units; direction can be ago or empty. The at sign (@) is optional noise. The amounts of the different units are implicitly added with appropriate sign accounting. ago negates all the fields. This syntax is also used for interval output, if IntervalStyle is set to postgres_verbose.
Quantities of days, hours, minutes, and seconds can be specified without explicit unit markings. For example, '1 12:59:10' is read the same as '1 day 12 hours 59 min 10 sec'. Also, a combination of years and months can be specified with a dash; for example '200-10' is read the same as '200 years 10 months'. (These shorter forms are in fact the only ones allowed by the SQL standard, and are used for output when IntervalStyle is set to sql_standard.)
Interval values can also be written as ISO 8601 time intervals, using either the “format with designators” of the standard's section 4.4.3.2 or the “alternative format” of section 4.4.3.3. The format with designators looks like this:
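The format line is missing from this extraction; per the upstream documentation it is:

```
P quantity unit [ quantity unit ...] [ T [ quantity unit ...]]
```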
The string must start with a P, and may include a T that introduces the time-of-day units. The available unit abbreviations are given in Table 8.16. Units may be omitted, and may be specified in any order, but units smaller than a day must appear after T. In particular, the meaning of M depends on whether it is before or after T.
Table 8.16. ISO 8601 Interval Unit Abbreviations

Abbreviation | Meaning
Y | Years
M | Months (in the date part)
W | Weeks
D | Days
H | Hours
M | Minutes (in the time part)
S | Seconds
In the alternative format:
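The format line is missing here; per the upstream documentation the alternative format is:

```
P [ years-months-days ] [ T hours:minutes:seconds ]
```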
the string must begin with P, and a T separates the date and time parts of the interval. The values are given as numbers similar to ISO 8601 dates.
When writing an interval constant with a fields specification, or when assigning a string to an interval column that was defined with a fields specification, the interpretation of unmarked quantities depends on the fields. For example INTERVAL '1' YEAR is read as 1 year, whereas INTERVAL '1' means 1 second. Also, field values "to the right" of the least significant field allowed by the fields specification are silently discarded. For example, writing INTERVAL '1 day 2:03:04' HOUR TO MINUTE results in dropping the seconds field, but not the day field.
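The behaviors just described can be sketched as:

```sql
SELECT INTERVAL '1' YEAR;  -- read as 1 year
SELECT INTERVAL '1';       -- read as 1 second
-- HOUR TO MINUTE drops the seconds field but keeps the day field:
SELECT INTERVAL '1 day 2:03:04' HOUR TO MINUTE;
```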
According to the SQL standard all fields of an interval value must have the same sign, so a leading negative sign applies to all fields; for example the negative sign in the interval literal '-1 2:03:04' applies to both the days and hour/minute/second parts. PostgreSQL allows the fields to have different signs, and traditionally treats each field in the textual representation as independently signed, so that the hour/minute/second part is considered positive in this example. If IntervalStyle is set to sql_standard then a leading sign is considered to apply to all fields (but only if no additional signs appear). Otherwise the traditional PostgreSQL interpretation is used. To avoid ambiguity, it's recommended to attach an explicit sign to each field if any field is negative.
Internally interval
values are stored as months, days, and seconds. This is done because the number of days in a month varies, and a day can have 23 or 25 hours if a daylight savings time adjustment is involved. The months and days fields are integers while the seconds field can store fractions. Because intervals are usually created from constant strings or timestamp
subtraction, this storage method works well in most cases. Functions justify_days
and justify_hours
are available for adjusting days and hours that overflow their normal ranges.
In the verbose input format, and in some fields of the more compact input formats, field values can have fractional parts; for example '1.5 week' or '01:02:03.45'. Such input is converted to the appropriate number of months, days, and seconds for storage. When this would result in a fractional number of months or days, the fraction is added to the lower-order fields using the conversion factors 1 month = 30 days and 1 day = 24 hours. For example, '1.5 month' becomes 1 month and 15 days. Only seconds will ever be shown as fractional on output.
Table 8.17 shows some examples of valid interval input.
Table 8.17. Interval Input

Example | Description
1-2 | SQL standard format: 1 year 2 months
3 4:05:06 | SQL standard format: 3 days 4 hours 5 minutes 6 seconds
1 year 2 months 3 days 4 hours 5 minutes 6 seconds | Traditional Postgres format: 1 year 2 months 3 days 4 hours 5 minutes 6 seconds
P1Y2M3DT4H5M6S | ISO 8601 "format with designators": same meaning as above
P0001-02-03T04:05:06 | ISO 8601 "alternative format": same meaning as above
The output format of the interval type can be set to one of the four styles sql_standard, postgres, postgres_verbose, or iso_8601, using the command SET intervalstyle. The default is the postgres format. Table 8.18 shows examples of each output style.
The sql_standard style produces output that conforms to the SQL standard's specification for interval literal strings, if the interval value meets the standard's restrictions (either year-month only or day-time only, with no mixing of positive and negative components). Otherwise the output looks like a standard year-month literal string followed by a day-time literal string, with explicit signs added to disambiguate mixed-sign intervals.
The output of the postgres
style matches the output of PostgreSQL releases prior to 8.4 when the DateStyle parameter was set to ISO
.
The output of the postgres_verbose
style matches the output of PostgreSQL releases prior to 8.4 when the DateStyle
parameter was set to non-ISO
output.
The output of the iso_8601
style matches the “format with designators” described in section 4.4.3.2 of the ISO 8601 standard.
Table 8.18. Interval Output Style Examples

Style Specification | Year-Month Interval | Day-Time Interval | Mixed Interval
sql_standard | 1-2 | 3 4:05:06 | -1-2 +3 -4:05:06
postgres | 1 year 2 mons | 3 days 04:05:06 | -1 year -2 mons +3 days -04:05:06
postgres_verbose | @ 1 year 2 mons | @ 3 days 4 hours 5 mins 6 secs | @ 1 year 2 mons -3 days 4 hours 5 mins 6 secs ago
iso_8601 | P1Y2M | P3DT4H5M6S | P-1Y-2M3DT-4H-5M-6S
Object identifiers (OIDs) are used internally by PostgreSQL as primary keys for various system tables. OIDs are not added to user-created tables, unless WITH OIDS is specified when the table is created, or the default_with_oids configuration variable is enabled. Type oid represents an object identifier. There are also several alias types for oid: regproc, regprocedure, regoper, regoperator, regclass, regtype, regrole, regnamespace, regconfig, and regdictionary. Table 8.24 shows an overview.
The oid type is currently implemented as an unsigned four-byte integer. Therefore, it is not large enough to provide database-wide uniqueness in large databases, or even in large individual tables. So, using a user-created table's OID column as a primary key is discouraged. OIDs are best used only for references to system tables.
The oid type itself has few operations beyond comparison. It can be cast to integer, however, and then manipulated using the standard integer operators. (Beware of possible signed-versus-unsigned confusion if you do this.)
The OID alias types have no operations of their own except for specialized input and output routines. These routines are able to accept and display symbolic names for system objects, rather than the raw numeric value that type oid would use. The alias types allow simplified lookup of OID values for objects. For example, to examine the pg_attribute rows related to a table mytable, one could write:
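The query did not survive extraction; a sketch of the regclass-based lookup described:

```sql
SELECT * FROM pg_attribute
WHERE attrelid = 'mytable'::regclass;
```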
rather than:
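The longhand alternative is also missing; a sketch of the equivalent explicit sub-select:

```sql
SELECT * FROM pg_attribute
WHERE attrelid = (SELECT oid FROM pg_class WHERE relname = 'mytable');
```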
While that doesn't look all that bad by itself, it's still oversimplified. A far more complicated sub-select would be needed to select the right OID if there are multiple tables named mytable in different schemas. The regclass input converter handles the table lookup according to the schema path setting, and so it does the "right thing" automatically. Similarly, casting a table's OID to regclass is handy for symbolic display of a numeric OID.
Table 8.24. Object Identifier Types

Name | References | Description | Value Example
oid | any | numeric object identifier | 564182
regproc | pg_proc | function name | sum
regprocedure | pg_proc | function with argument types | sum(int4)
regoper | pg_operator | operator name | +
regoperator | pg_operator | operator with argument types | *(integer,integer) or -(NONE,integer)
regclass | pg_class | relation name | pg_type
regtype | pg_type | data type name | integer
regrole | pg_authid | role name | smithee
regnamespace | pg_namespace | namespace name | pg_catalog
regconfig | pg_ts_config | text search configuration | english
regdictionary | pg_ts_dict | text search dictionary | simple
All of the OID alias types for objects grouped by namespace accept schema-qualified names, and will display schema-qualified names on output if the object would not be found in the current search path without being qualified. The regproc and regoper alias types will only accept input names that are unique (not overloaded), so they are of limited use; for most uses regprocedure or regoperator are more appropriate. For regoperator, unary operators are identified by writing NONE for the unused operand.
An additional property of most of the OID alias types is the creation of dependencies. If a constant of one of these types appears in a stored expression (such as a column default expression or view), it creates a dependency on the referenced object. For example, if a column has a default expression nextval('my_seq'::regclass), PostgreSQL understands that the default expression depends on the sequence my_seq; the system will not let the sequence be dropped without first removing the default expression. regrole is the only exception to this property: constants of this type are not allowed in such expressions.
The OID alias types do not completely follow transaction isolation rules. The planner also treats them as simple constants, which may result in sub-optimal planning.
Another identifier type used by the system is xid, or transaction (abbreviated xact) identifier. This is the data type of the system columns xmin and xmax. Transaction identifiers are 32-bit quantities.
A third identifier type used by the system is cid, or command identifier. This is the data type of the system columns cmin and cmax. Command identifiers are also 32-bit quantities.
A final identifier type used by the system is tid, or tuple identifier (row identifier). This is the data type of the system column ctid. A tuple ID is a pair (block number, tuple index within block) that identifies the physical location of the row within its table.
(The system columns are further explained in Section 5.4.)
The PostgreSQL type system contains a number of special-purpose entries that are collectively called pseudo-types. A pseudo-type cannot be used as a column data type, but it can be used to declare a function's argument or result type. Each of the available pseudo-types is useful in situations where a function's behavior does not correspond to simply taking or returning a value of a specific SQL data type. Table 8.25 lists the existing pseudo-types.
Table 8.25. Pseudo-Types
Name
Description
any
Indicates that a function accepts any input data type.
anyelement
Indicates that a function accepts any data type.
anyarray
Indicates that a function accepts any array data type.
anynonarray
Indicates that a function accepts any non-array data type.
anyenum
Indicates that a function accepts any enum data type.
anyrange
Indicates that a function accepts any range data type.
cstring
Indicates that a function accepts or returns a null-terminated C string.
internal
Indicates that a function accepts or returns a server-internal data type.
language_handler
A procedural language call handler is declared to return language_handler.
fdw_handler
A foreign-data wrapper handler is declared to return fdw_handler.
index_am_handler
An index access method handler is declared to return index_am_handler.
tsm_handler
A tablesample method handler is declared to return tsm_handler.
record
Identifies a function taking or returning an unspecified row type.
trigger
A trigger function is declared to return trigger.
event_trigger
An event trigger function is declared to return event_trigger.
pg_ddl_command
Identifies a representation of DDL commands that is available to event triggers.
void
Indicates that a function returns no value.
unknown
Identifies a not-yet-resolved type, e.g. of an undecorated string literal.
opaque
An obsolete type name that formerly served many of the above purposes.
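As a concrete illustration of the trigger entry above, a PL/pgSQL trigger function is declared with the pseudo-type trigger as its return type; it cannot be called directly from a query, only fired by a trigger. A sketch (the function, table, and column names are hypothetical):

```sql
CREATE FUNCTION touch_updated_at() RETURNS trigger AS $$
BEGIN
    -- NEW is the incoming row; stamp it and return it so the
    -- update proceeds with the modified row.
    NEW.updated_at := now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER set_updated_at
    BEFORE UPDATE ON widgets
    FOR EACH ROW EXECUTE FUNCTION touch_updated_at();
```

(EXECUTE FUNCTION is the PostgreSQL 11+ spelling; older releases use EXECUTE PROCEDURE.)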
Functions coded in C (whether built-in or dynamically loaded) can be declared to accept or return any of these pseudo data types. It is up to the function author to ensure that the function will behave safely when a pseudo-type is used as an argument type.
Functions coded in procedural languages can use pseudo-types only as allowed by their implementation languages. At present most procedural languages forbid use of a pseudo-type as an argument type, and allow only void and record as a result type (plus trigger or event_trigger when the function is used as a trigger or event trigger). Some also support polymorphic functions using the types anyelement, anyarray, anynonarray, anyenum, and anyrange.
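Polymorphic behavior can be sketched with a simple SQL-language function (the function name is hypothetical): all anyelement arguments must resolve to the same actual type in a given call, and the result type matches them.

```sql
-- Returns val, or fallback when val is NULL; works for any type
-- because argument and result are all declared anyelement:
CREATE FUNCTION if_null(val anyelement, fallback anyelement)
RETURNS anyelement AS $$
    SELECT coalesce(val, fallback);
$$ LANGUAGE SQL;

SELECT if_null(NULL::integer, 42);      -- resolves as integer
SELECT if_null(NULL::text, 'default');  -- resolves as text
```

A call such as if_null(1, 'x') would be rejected, since integer and text cannot both bind to the same anyelement.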
The internal pseudo-type is used to declare functions that are meant only to be called internally by the database system, and not by direct invocation in an SQL query. If a function has at least one internal-type argument then it cannot be called from SQL. To preserve the type safety of this restriction it is important to follow this coding rule: do not create any function that is declared to return internal unless it has at least one internal argument.