
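8. Data Types

PostgreSQL has a rich set of native data types available to users. Users can also add new data types to PostgreSQL by issuing a command.

Table 8.1 shows all the built-in general-purpose data types. Most of the alternative names listed in the “Aliases” column are the names used internally by PostgreSQL for historical reasons. In addition, some internally used or deprecated types are available, but they are not listed here.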

Table 8.1. Data Types

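Compatibility

The following types (or spellings thereof) are specified by SQL: bigint, bit, bit varying, boolean, char, character varying, character, varchar, date, double precision, integer, interval, numeric, decimal, real, smallint, time (with or without time zone), timestamp (with or without time zone), xml.

Each data type has an external representation determined by its input and output functions. Many of the built-in types have obvious external formats. However, several types are either unique to PostgreSQL, such as geometric paths, or have several possible formats, such as the date and time types. Some of the input and output functions are not invertible, i.e., the result of an output function might lose accuracy when compared to the original input.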

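8.1. Numeric Types

Numeric types consist of two-, four-, and eight-byte integers, four- and eight-byte floating-point numbers, and selectable-precision decimals. Table 8.2 lists the available types.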

Table 8.2. Numeric Types

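Section 4.1.2 describes the syntax of constants for the numeric types. The numeric types have a full set of corresponding arithmetic operators and functions. Refer to Chapter 9 for more information. The following sections describe the types in detail.

8.1.1. Integer Types

The types smallint, integer, and bigint store whole numbers, that is, numbers without fractional components, of various ranges. Attempts to store values outside of the allowed range will result in an error.

The type integer is the common choice, as it offers the best balance between range, storage size, and performance. The smallint type is generally only used if disk space is at a premium. The bigint type is designed to be used when the range of the integer type is insufficient.

SQL only specifies the integer types integer (or int), smallint, and bigint. The type names int2, int4, and int8 are extensions, which are also used by some other SQL database systems.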
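As a small illustration of the range check described above, the following sketch casts values at and just beyond the upper bound of a two-byte smallint. The literals are chosen only for illustration.

```sql
-- 32767 is the largest value a two-byte smallint can hold.
SELECT 32767::smallint;

-- One past the limit is rejected rather than silently wrapped:
-- ERROR:  smallint out of range
SELECT 32768::smallint;

-- The same value fits in the wider integer types.
SELECT 32768::integer, 32768::bigint;
```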

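8.1.2. Arbitrary Precision Numbers (NUMERIC Type)

The type numeric can store numbers with a very large number of digits. It is especially recommended for storing monetary amounts and other quantities where exactness is required. Calculations with numeric values yield exact results where possible, e.g., addition, subtraction, multiplication. However, calculations on numeric values are very slow compared to the integer types, or to the floating-point types described in the next section.

We use the following terms below: the scale of a numeric is the count of decimal digits in the fractional part, to the right of the decimal point. The precision of a numeric is the total count of significant digits in the whole number, that is, the number of digits on both sides of the decimal point. So the number 23.5141 has a precision of 6 and a scale of 4. Integers can be considered to have a scale of zero.

Both the maximum precision and the maximum scale of a numeric column can be configured. To declare a column of type numeric, use the syntax: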

NUMERIC(precision, scale)

The precision must be positive, the scale zero or positive. Alternatively:

NUMERIC(precision)

selects a scale of 0. Specifying:

NUMERIC

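without any precision or scale creates a column in which numeric values of any precision and scale can be stored, up to the implementation limit on precision. A column of this kind will not coerce input values to any particular scale, whereas numeric columns with a declared scale will coerce input values to that scale. (The SQL standard requires a default scale of 0, i.e., coercion to integer precision. We find this a bit useless. If you are concerned about portability, always specify the precision and scale explicitly.)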

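Note
The maximum allowed precision when explicitly specified in the type declaration is 1000; NUMERIC without a specified precision is subject to the limits described in Table 8.2.

If the scale of a value to be stored is greater than the declared scale of the column, the system will round the value to the specified number of fractional digits. Then, if the number of digits to the left of the decimal point exceeds the declared precision minus the declared scale, an error is raised.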
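The rounding and overflow rules can be sketched with a few casts. The declared type numeric(5,2) and the literals are arbitrary choices for illustration.

```sql
-- Scale 2: the extra fractional digit is rounded away, giving 123.46.
SELECT 123.456::numeric(5,2);

-- Four digits left of the decimal point exceed precision - scale = 3:
-- ERROR:  numeric field overflow
SELECT 1234.56::numeric(5,2);

-- Without a declared precision or scale, the value is kept as written.
SELECT 1234.56789::numeric;
```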

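Numeric values are physically stored without any extra leading or trailing zeroes. Thus, the declared precision and scale of a column are maximums, not fixed allocations. (In this sense the numeric type is more akin to varchar(n) than to char(n).) The actual storage requirement is two bytes for each group of four decimal digits, plus three to eight bytes overhead.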

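In addition to ordinary numeric values, the numeric type allows the special value NaN, meaning “not-a-number”. Any operation on NaN yields another NaN. When writing this value as a constant in an SQL command, you must put quotes around it, for example UPDATE table SET x = 'NaN'. On input, the string NaN is recognized in a case-insensitive manner.

Note
In most implementations of the “not-a-number” concept, NaN is not considered equal to any other numeric value (including NaN). In order to allow numeric values to be sorted and used in tree-based indexes, PostgreSQL treats NaN values as equal to each other, and greater than all non-NaN values.

The types decimal and numeric are equivalent. Both types are part of the SQL standard.

When rounding values, the numeric type rounds ties away from zero, while (on most machines) the real and double precision types round ties to the nearest even number. For example: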

SELECT x,
  round(x::numeric) AS num_round,
  round(x::double precision) AS dbl_round
FROM generate_series(-3.5, 3.5, 1) as x;
  x   | num_round | dbl_round
------+-----------+-----------
 -3.5 |        -4 |        -4
 -2.5 |        -3 |        -2
 -1.5 |        -2 |        -2
 -0.5 |        -1 |        -0
  0.5 |         1 |         0
  1.5 |         2 |         2
  2.5 |         3 |         2
  3.5 |         4 |         4
(8 rows)

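8.1.3. Floating-Point Types

The data types real and double precision are inexact, variable-precision numeric types. In practice, these types are usually implementations of IEEE Standard 754 for Binary Floating-Point Arithmetic (single and double precision, respectively), to the extent that the underlying processor, operating system, and compiler support it.

Inexact means that some values cannot be converted exactly to the internal format and are stored as approximations, so that storing and retrieving a value might show slight discrepancies. Managing these errors and how they propagate through calculations is the subject of an entire branch of mathematics and computer science and will not be discussed here, except for the following points:

If you require exact storage and calculations (such as for monetary amounts), use the numeric type instead.
If you want to do complicated calculations with these types for anything important, especially if you rely on certain behavior in boundary cases (infinity, underflow), you should evaluate the implementation carefully.
Comparing two floating-point values for equality might not always work as expected.

On most platforms, the real type has a range of at least 1E-37 to 1E+37 with a precision of at least 6 decimal digits. The double precision type typically has a range of around 1E-307 to 1E+308 with a precision of at least 15 digits. Values that are too large or too small will cause an error. Rounding might take place if the precision of an input number is too high. Numbers too close to zero that are not representable as distinct from zero will cause an underflow error.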
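The warning about equality comparisons can be made concrete with the classic 0.1 + 0.2 case. This is a minimal sketch that assumes IEEE 754 double precision arithmetic on the server.

```sql
-- Neither 0.1 nor 0.2 is exactly representable in binary floating point,
-- so the sum differs from the stored approximation of 0.3.
SELECT 0.1::float8 + 0.2::float8 = 0.3::float8;      -- false

-- The same comparison with the exact numeric type behaves as expected.
SELECT 0.1::numeric + 0.2::numeric = 0.3::numeric;   -- true
```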

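Note
The extra_float_digits setting controls the number of extra significant digits included when a floating-point value is converted to text for output. With the default value of 0, the output is the same on every platform supported by PostgreSQL. Increasing it will produce output that more accurately represents the stored value, but the result may differ between platforms.

In addition to ordinary numeric values, the floating-point types have several special values:

Infinity -Infinity NaN

These represent the IEEE 754 special values “infinity”, “negative infinity”, and “not-a-number”, respectively. (On a machine whose floating-point arithmetic does not follow IEEE 754, these values will probably not work as expected.) When writing these values as constants in an SQL command, you must put quotes around them, for example UPDATE table SET x = '-Infinity'. On input, these strings are recognized in a case-insensitive manner.

Note
IEEE 754 specifies that NaN should not compare equal to any other floating-point value (including NaN). In order to allow floating-point values to be sorted and used in tree-based indexes, PostgreSQL treats NaN values as equal to each other, and greater than all non-NaN values.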
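A short sketch of the comparison rules for the special values described above:

```sql
-- Unlike strict IEEE 754, PostgreSQL treats NaN as equal to NaN ...
SELECT 'NaN'::float8 = 'NaN'::float8;        -- true

-- ... and as greater than every non-NaN value, including Infinity.
SELECT 'NaN'::float8 > 'Infinity'::float8;   -- true

-- The special-value strings are recognized case-insensitively.
SELECT 'infinity'::real, '-INFINITY'::double precision;
```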

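PostgreSQL also supports the SQL-standard notations float and float(p) for specifying inexact numeric types. Here, p specifies the minimum acceptable precision in binary digits. PostgreSQL accepts float(1) to float(24) as selecting the real type, while float(25) to float(53) select double precision. Values of p outside the allowed range draw an error. float with no precision specified is taken to mean double precision.

Note
The assumption that real and double precision have exactly 24 and 53 bits in the mantissa respectively is correct for IEEE-standard floating-point implementations. On non-IEEE platforms it might be off a little, but for simplicity the same ranges of p are used on all platforms.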
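The mapping from float(p) to the underlying type can be checked with pg_typeof. This is only an illustrative query; the column aliases are arbitrary.

```sql
SELECT pg_typeof(1::float(24)) AS p24,    -- real
       pg_typeof(1::float(25)) AS p25,    -- double precision
       pg_typeof(1::float)     AS no_p;   -- double precision
```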

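8.1.4. Serial Types

Note
This section describes a PostgreSQL-specific way to create an autoincrementing column. Another way is to use the SQL-standard identity column feature, described at CREATE TABLE.

The data types smallserial, serial, and bigserial are not true types, but merely a notational convenience for creating unique identifier columns (similar to the AUTO_INCREMENT property supported by some other databases). In the current implementation, specifying: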

CREATE TABLE tablename (
   colname SERIAL
);

is equivalent to specifying:

CREATE SEQUENCE tablename_colname_seq;
CREATE TABLE tablename (
   colname integer NOT NULL DEFAULT nextval('tablename_colname_seq')
);
ALTER SEQUENCE tablename_colname_seq OWNED BY tablename.colname;

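Thus, we have created an integer column and arranged for its default values to be assigned from a sequence generator. A NOT NULL constraint is applied to ensure that a null value cannot be inserted. (In most cases you would also want to attach a UNIQUE or PRIMARY KEY constraint to prevent duplicate values from being inserted by accident, but this is not automatic.) Lastly, the sequence is marked as “owned by” the column, so that it will be dropped if the column or table is dropped.

Note
Because smallserial, serial, and bigserial are implemented using sequences, there may be “holes” or gaps in the sequence of values which appears in the column, even if no rows are ever deleted. A value allocated from the sequence is still “used up” even if a row containing that value is never successfully inserted into the table. This may happen, for example, if the inserting transaction rolls back. See nextval() in Section 9.16 for details.

To insert the next value of the sequence into the serial column, specify that the serial column should be assigned its default value. This can be done either by excluding the column from the list of columns in the INSERT statement, or through the use of the DEFAULT key word.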
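The two ways of requesting the default, and the gap left by a rolled-back insert, can be sketched as follows. The table serial_demo and its columns are hypothetical names used only for this illustration, and the ids shown assume a freshly created table.

```sql
-- Hypothetical table: "id" is a serial column, "name" an ordinary one.
CREATE TABLE serial_demo (
    id   serial PRIMARY KEY,
    name text
);

-- Option 1: leave the serial column out of the column list.
INSERT INTO serial_demo (name) VALUES ('first');

-- Option 2: name the column and ask for its default explicitly.
INSERT INTO serial_demo (id, name) VALUES (DEFAULT, 'second');

-- A rolled-back insert still consumes a sequence value, leaving a gap.
BEGIN;
INSERT INTO serial_demo (name) VALUES ('discarded');
ROLLBACK;
INSERT INTO serial_demo (name) VALUES ('third');

SELECT id, name FROM serial_demo ORDER BY id;   -- ids 1, 2, 4
```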

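The type names serial and serial4 are equivalent: both create integer columns. The type names bigserial and serial8 work the same way, except that they create a bigint column. bigserial should be used if you anticipate the use of more than 2^31 identifiers over the lifetime of the table. The type names smallserial and serial2 also work the same way, except that they create a smallint column.

The sequence created for a serial column is automatically dropped when the owning column is dropped. You can drop the sequence without dropping the column, but this will force removal of the column default expression.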

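8.2. Monetary Types

The money type stores a currency amount with a fixed fractional precision; see Table 8.3. The fractional precision is determined by the database's lc_monetary setting. The range shown in the table assumes there are two fractional digits. Input is accepted in a variety of formats, including integer and floating-point literals, as well as typical currency formatting, such as '$1,000.00'. Output is generally in the latter form but depends on the locale.

Table 8.3. Monetary Types

Since the output of this data type is locale-sensitive, it might not work to load money data into a database that has a different setting of lc_monetary. To avoid problems, before restoring a dump into a new database make sure lc_monetary has the same or an equivalent value as in the database that was dumped.

Values of the numeric, int, and bigint data types can be cast to money. Conversion from the real and double precision data types can be done by casting to numeric first, for example: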
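A minimal sketch of such a cast chain, using an arbitrary literal. The displayed result depends on the lc_monetary setting, so no output is shown.

```sql
-- float8 -> numeric -> money: the intermediate numeric cast is required.
SELECT '12.34'::float8::numeric::money;
```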

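However, this is not recommended. Floating-point numbers should not be used to handle money due to the potential for rounding errors.

A money value can be cast to numeric without loss of precision. Conversion to other types could potentially lose precision, and must also be done in two stages: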
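A minimal sketch of the two-stage conversion, again with an arbitrary literal. It assumes an lc_monetary setting in which “.” is the decimal separator.

```sql
-- money -> numeric is exact; numeric -> float8 may lose precision.
SELECT '52093.89'::money::numeric::float8;
```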

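When a money value is divided by another money value, the result is double precision (i.e., a pure number, not money); the currency units cancel each other out in the division.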

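8.3. Character Types

Table 8.4. Character Types

Table 8.4 shows the general-purpose character types available in PostgreSQL.

SQL defines two primary character types: character varying(n) and character(n), where n is a positive integer. Both of these types can store strings up to n characters (not bytes) in length. An attempt to store a longer string into a column of these types will result in an error, unless the excess characters are all spaces, in which case the string will be truncated to the maximum length. (This somewhat bizarre exception is required by the SQL standard.) If the string to be stored is shorter than the declared length, values of type character will be space-padded; values of type character varying will simply store the shorter string.

If one explicitly casts a value to character varying(n) or character(n), then an over-length value will be truncated to n characters without raising an error. (This too is required by the SQL standard.)

The notations varchar(n) and char(n) are aliases for character varying(n) and character(n), respectively. character without a length specifier is equivalent to character(1). If character varying is used without a length specifier, the type accepts strings of any size. The latter is a PostgreSQL extension.

In addition, PostgreSQL provides the text type, which stores strings of any length. Although the type text is not in the SQL standard, several other SQL database management systems have it as well.

Values of type character are physically padded with spaces to the specified width n, and are stored and displayed that way. However, trailing spaces are treated as semantically insignificant and disregarded when comparing two values of type character. In collations where whitespace is significant, this behavior can produce unexpected results; for example SELECT 'a '::CHAR(2) collate "C" < E'a\n'::CHAR(2) returns true, even though the C locale would consider a space to be greater than a newline. Trailing spaces are removed when converting a character value to one of the other string types. Note that trailing spaces are semantically significant in character varying and text values, and when using pattern matching, that is, LIKE and regular expressions.

The storage requirement for a short string (up to 126 bytes) is 1 byte plus the actual string, which includes the space padding in the case of character. Longer strings have 4 bytes of overhead instead of 1. Long strings are compressed by the system automatically, so the physical requirement on disk might be less. Very long values are also stored in background tables so that they do not interfere with rapid access to shorter column values. In any case, the longest possible character string that can be stored is about 1 GB. (The maximum value that will be allowed for n in the data type declaration is less than that. It would not be useful to change this because with multibyte character encodings the number of characters and bytes can be quite different. If you desire to store long strings with no specific upper limit, use text or character varying without a length specifier, rather than making up an arbitrary length limit.)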

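Tip

There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used instead.

The syntax of string literals and the available operators and functions are described elsewhere in this documentation. The database character set determines the character set used to store textual values; character set support is likewise covered elsewhere.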

Example 8.1. Using the Character Types
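The padding, truncation, and error rules described above can be sketched as follows. The table names char_demo and varchar_demo are hypothetical and chosen only for this illustration.

```sql
CREATE TABLE char_demo (a character(4));
INSERT INTO char_demo VALUES ('ok');
-- Stored padded to 4 characters, but char_length ignores the padding: 2
SELECT a, char_length(a) FROM char_demo;

CREATE TABLE varchar_demo (b varchar(5));
INSERT INTO varchar_demo VALUES ('ok');
-- The excess characters are all spaces, so the value is truncated to 'good '.
INSERT INTO varchar_demo VALUES ('good      ');
-- ERROR:  value too long for type character varying(5)
INSERT INTO varchar_demo VALUES ('too long');
-- An explicit cast truncates to 'too l' without raising an error.
INSERT INTO varchar_demo VALUES ('too long'::varchar(5));
SELECT b, char_length(b) FROM varchar_demo;
```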

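There are two other fixed-length character types in PostgreSQL, shown in Table 8.5. The name type exists only for the storage of identifiers in the internal system catalogs and is not intended for use by the general user. Its length is currently defined as 64 bytes (63 usable characters plus terminator) but should be referenced using the constant NAMEDATALEN in C source code. The length is set at compile time (and is therefore adjustable for special uses); the default maximum length might change in a future release. The type "char" (note the quotes) is different from char(1) in that it only uses one byte of storage. It is used internally in the system catalogs as a simplistic enumeration type.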

Table 8.5. Special Character Types

8.4. Binary Data Types (bytea)

The bytea data type allows storage of binary strings; see Table 8.6.

Table 8.6. Binary Data Types

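A binary string is a sequence of octets (or bytes). Binary strings are distinguished from character strings in two ways. First, binary strings specifically allow storing octets of value zero and other “non-printable” octets (usually, octets outside the range 32 to 126). Character strings disallow zero octets, and also disallow any other octet values and sequences of octet values that are invalid according to the database's selected character set encoding. Second, operations on binary strings process the actual bytes, whereas the processing of character strings depends on locale settings. In short, binary strings are appropriate for storing data that the programmer thinks of as “raw bytes”, whereas character strings are appropriate for storing text.

The bytea type supports two external formats for input and output: PostgreSQL's historical “escape” format, and “hex” format. Both of these are always accepted on input. The output format depends on a configuration parameter; the default is hex. (Note that the hex format was introduced in PostgreSQL 9.0; earlier versions and some tools do not understand it.)

The SQL standard defines a different binary string type, called BLOB or BINARY LARGE OBJECT. The input format is different from bytea, but the provided functions and operators are mostly the same.

8.4.1. `bytea` Hex Format

The “hex” format encodes binary data as 2 hexadecimal digits per byte, most significant nibble first. The entire string is preceded by the sequence \x (to distinguish it from the escape format). In some contexts, the initial backslash may need to be escaped by doubling it, in the same cases in which backslashes have to be doubled in escape format; details appear below. The hexadecimal digits can be either upper or lower case, and whitespace is permitted between digit pairs (but not within a digit pair nor in the starting \x sequence). The hex format is compatible with a wide range of external applications and protocols, and it tends to be faster to convert than the escape format, so its use is preferred.

For example: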
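A minimal sketch of hex-format input. It assumes the default settings, namely hex output (the bytea_output parameter) and standard-conforming strings, so that the single backslash does not need doubling.

```sql
-- Upper-case digits and whitespace between digit pairs are accepted on
-- input; with hex output both values display as \xdeadbeef.
SELECT '\xDEADBEEF'::bytea;
SELECT '\xDE AD BE EF'::bytea;
```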

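8.4.2. `bytea` Escape Format

The “escape” format is the traditional PostgreSQL format for the bytea type. It takes the approach of representing a binary string as a sequence of ASCII characters, while converting those bytes that cannot be represented as an ASCII character into special escape sequences. If, from the point of view of the application, representing bytes as characters makes sense, then this representation can be convenient. But in practice it is usually confusing because it blurs the distinction between binary strings and character strings, and the particular escape mechanism that was chosen is somewhat unwieldy. Therefore, this format should probably be avoided for most new applications.

When entering bytea values in escape format, octets of certain values must be escaped, while all octet values can be escaped. In general, to escape an octet, convert it into its three-digit octal value and precede it by a backslash (or two backslashes, if writing the value as a literal using escape string syntax). Backslash itself (octet value 92) can alternatively be represented by double backslashes. Table 8.7 shows the characters that must be escaped, and gives the alternative escape sequences where applicable.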

Table 8.7. `bytea` Literal Escaped Octets

Table 8.8. `bytea` Output Escaped Octets
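The octal escaping rule can be illustrated with a short literal. This sketch assumes standard-conforming strings (the default) and switches the output format through the bytea_output parameter for the duration of the example.

```sql
SET bytea_output = 'escape';

-- \153 \154 \155 are the octal codes for k, l, m; \052 is * and \124 is T.
-- \251 (decimal 169) is not printable ASCII, so it stays escaped on output.
-- Result: abc klm *\251T
SELECT 'abc \153\154\155 \052\251\124'::bytea;

RESET bytea_output;
```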

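Depending on the front end to PostgreSQL you use, you might have additional work to do in terms of escaping and unescaping bytea strings. For example, you might also have to escape line feeds and carriage returns if your interface automatically translates these.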

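8.5. Date/Time Types

PostgreSQL supports the full set of SQL date and time types, shown in Table 8.9. The operations available on these data types are described in Section 9.9.

Table 8.9. Date/Time Types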

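Note

The SQL standard requires that writing just timestamp be equivalent to timestamp without time zone, and PostgreSQL honors that behavior. As a PostgreSQL extension, timestamptz is accepted as an abbreviation for timestamp with time zone.

time, timestamp, and interval accept an optional precision value p which specifies the number of fractional digits retained in the seconds field. By default, there is no explicit bound on precision. The allowed range of p is from 0 to 6.

The interval type has an additional option, which is to restrict the set of stored fields by writing one of these phrases: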

YEAR
MONTH
DAY
HOUR
MINUTE
SECOND
YEAR TO MONTH
DAY TO HOUR
DAY TO MINUTE
DAY TO SECOND
HOUR TO MINUTE
HOUR TO SECOND
MINUTE TO SECOND

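Note that if both fields and p are specified, the fields must include SECOND, since the precision applies only to the seconds.

The type time with time zone is defined by the SQL standard, but the definition exhibits properties which lead to questionable usefulness. In most cases, a combination of date, time, timestamp without time zone, and timestamp with time zone should provide the complete range of date/time functionality required by any application.

The types abstime and reltime are lower-precision types which are used internally. You are discouraged from using these types in applications; these internal types might disappear in a future release.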

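8.5.1. Date/Time Input

Date and time input is accepted in almost any reasonable format, including ISO 8601, SQL-compatible, traditional POSTGRES, and others. For some formats, the ordering of day, month, and year in date input is ambiguous, and there is support for specifying the expected ordering of these fields. Set the DateStyle parameter to MDY to select month-day-year interpretation, DMY to select day-month-year interpretation, or YMD to select year-month-day interpretation.

PostgreSQL is more flexible in handling date/time input than the SQL standard requires. See Appendix B for the exact parsing rules of date/time input and for the recognized text fields including months, days of the week, and time zones.

Remember that any date or time literal input needs to be enclosed in single quotes, like text strings. Refer to Section 4.1.2.7 for more information. SQL requires the following syntax: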

type [ (p) ] 'value'

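where p is an optional precision specification giving the number of fractional digits in the seconds field. Precision can be specified for the time, timestamp, and interval types, and can range from 0 to 6. If no precision is specified in a constant specification, it defaults to the precision of the literal value (but not more than 6 digits).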
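The effect of the precision p on a typed literal can be sketched as follows. The literal values are arbitrary, and the results shown assume the default ISO output style.

```sql
-- Two fractional digits are retained; the rest is rounded away.
SELECT TIMESTAMP(2) '2004-10-19 10:23:54.123456';   -- 2004-10-19 10:23:54.12

-- Precision 0 rounds to whole seconds.
SELECT TIME(0) '04:05:06.789';                      -- 04:05:07

-- Without p, the precision of the literal itself is used (up to 6 digits).
SELECT TIMESTAMP '2004-10-19 10:23:54.123456';      -- 2004-10-19 10:23:54.123456
```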

8.5.1.1. Dates

Table 8.10 shows some possible inputs for the date type.

Table 8.10. Date Input
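How the DateStyle field ordering changes the reading of an ambiguous date can be sketched as follows. The ISO output style is selected explicitly so that the results are unambiguous.

```sql
SET datestyle = 'ISO, MDY';
SELECT DATE '1/8/1999';     -- 1999-01-08 (January 8)

SET datestyle = 'ISO, DMY';
SELECT DATE '1/8/1999';     -- 1999-08-01 (August 1)

-- ISO 8601 input is read the same way in any mode.
SELECT DATE '1999-01-08';   -- 1999-01-08

RESET datestyle;
```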

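8.5.1.2. Times

The time-of-day types are time [ (p) ] without time zone and time [ (p) ] with time zone. time alone is equivalent to time without time zone.

Valid input for these types consists of a time of day followed by an optional time zone (see Table 8.11 and Table 8.12). If a time zone is specified in the input for time without time zone, it is silently ignored. You can also specify a date, but it will be ignored, except when you use a time zone name that involves a daylight-savings rule, such as America/New_York. In this case specifying the date is required in order to determine whether standard or daylight-savings time applies. The appropriate time zone offset is recorded in the time with time zone value.

Table 8.11. Time Input

Table 8.12. Time Zone Input

Refer to Section 8.5.3 for more information on how to specify time zones.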

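8.5.1.3. Time Stamps

Valid input for the time stamp types consists of the concatenation of a date and a time, followed by an optional time zone, followed by an optional AD or BC. (Alternatively, AD/BC can appear before the time zone, but this is not the preferred ordering.) Thus:

1999-01-08 04:05:06

and:

1999-01-08 04:05:06 -8:00

are valid values, which follow the ISO 8601 standard. In addition, the common format:

January 8 04:05:06 1999 PST

is supported.

The SQL standard differentiates timestamp without time zone and timestamp with time zone literals by the presence of a “+” or “-” symbol and time zone offset after the time. Hence, according to the standard,

TIMESTAMP '2004-10-19 10:23:54'

is a timestamp without time zone, while

TIMESTAMP '2004-10-19 10:23:54+02'

is a timestamp with time zone. PostgreSQL never examines the content of a literal string before determining its type, and therefore will treat both of the above as timestamp without time zone. To ensure that a literal is treated as timestamp with time zone, give it the correct explicit type:

TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02'

In a literal that has been determined to be timestamp without time zone, PostgreSQL will silently ignore any time zone indication. That is, the resulting value is derived from the date/time fields in the input value, and is not adjusted for time zone.

For timestamp with time zone, the internally stored value is always in UTC (Universal Coordinated Time, traditionally known as Greenwich Mean Time, GMT). An input value that has an explicit time zone specified is converted to UTC using the appropriate offset for that time zone. If no time zone is stated in the input string, then it is assumed to be in the time zone indicated by the system's TimeZone parameter, and is converted to UTC using the offset for that zone.

When a timestamp with time zone value is output, it is always converted from UTC to the current time zone, and displayed as local time in that zone. To see the time in another time zone, either change the TimeZone setting or use the AT TIME ZONE construct (see Section 9.9.3).

Conversions between timestamp without time zone and timestamp with time zone normally assume that the timestamp without time zone value should be taken or given as local time in the current time zone. A different time zone can be specified for the conversion using AT TIME ZONE.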
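The difference between the two literal types, and the conversion through UTC, can be sketched as follows. The session time zone is set to UTC only to make the displayed results predictable; the results shown assume the default ISO output style.

```sql
SET timezone = 'UTC';

-- Typed as timestamp without time zone: the +02 offset is silently ignored.
SELECT TIMESTAMP '2004-10-19 10:23:54+02';
-- 2004-10-19 10:23:54

-- Typed explicitly as timestamp with time zone: converted to UTC on input
-- and displayed in the session time zone.
SELECT TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02';
-- 2004-10-19 08:23:54+00

-- AT TIME ZONE shows the same instant as local time in another zone.
SELECT TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02'
       AT TIME ZONE 'America/New_York';
-- 2004-10-19 04:23:54

RESET timezone;
```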

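8.5.1.4. Special Values

PostgreSQL supports several special date/time input values for convenience, as shown in Table 8.13. The values infinity and -infinity are specially represented inside the system and will be displayed unchanged; the others are simply notational shorthands that will be converted to ordinary date/time values when read. (In particular, now and related strings are converted to a specific time value as soon as they are read.) All of these values need to be enclosed in single quotes when used as constants in SQL commands.

Table 8.13. Special Date/Time Inputs

The following SQL-compatible functions can also be used to obtain the current time value for the corresponding data type: CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP, LOCALTIME, LOCALTIMESTAMP. The latter four accept an optional subsecond precision specification. (See Section 9.9.4.) Note that these are SQL functions and are not recognized in data input strings.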

8.5.2. Date/Time Output

The output format of the date/time types can be set to one of the four styles ISO 8601, SQL (Ingres), traditional POSTGRES (Unix date format), or German. The default is the ISO format. (The SQL standard requires the use of the ISO 8601 format. The name of the “SQL” output format is a historical accident.) Table 8.14 shows examples of each output style. The output of the date and time types is generally only the date or time part in accordance with the given examples. However, the POSTGRES style outputs date-only values in ISO format.

Table 8.14. Date/Time Output Styles

Note

ISO 8601 specifies the use of uppercase letter T to separate the date and time. PostgreSQL accepts that format on input, but on output it uses a space rather than T, as shown above. This is for readability and for consistency with RFC 3339 as well as some other database systems.

In the SQL and POSTGRES styles, day appears before month if DMY field ordering has been specified, otherwise month appears before day. (See Section 8.5.1 for how this setting also affects interpretation of input values.) Table 8.15 shows examples.

Table 8.15. Date Order Conventions

The date/time style can be selected by the user using the SET datestyle command, the DateStyle parameter in the postgresql.conf configuration file, or the PGDATESTYLE environment variable on the server or client.

The formatting function to_char (see Section 9.8) is also available as a more flexible way to format date/time output.

8.5.3. Time Zones

Time zones, and time-zone conventions, are influenced by political decisions, not just earth geometry. Time zones around the world became somewhat standardized during the 1900s, but continue to be prone to arbitrary changes, particularly with respect to daylight-savings rules. PostgreSQL uses the widely-used IANA (Olson) time zone database for information about historical time zone rules. For times in the future, the assumption is that the latest known rules for a given time zone will continue to be observed indefinitely far into the future.

PostgreSQL endeavors to be compatible with the SQL standard definitions for typical usage. However, the SQL standard has an odd mix of date and time types and capabilities. Two obvious problems are:

Although the date type cannot have an associated time zone, the time type can. Time zones in the real world have little meaning unless associated with a date as well as a time, since the offset can vary through the year with daylight-saving time boundaries.
The default time zone is specified as a constant numeric offset from UTC. It is therefore impossible to adapt to daylight-saving time when doing date/time arithmetic across DST boundaries.

To address these difficulties, we recommend using date/time types that contain both date and time when using time zones. We do not recommend using the type time with time zone (though it is supported by PostgreSQL for legacy applications and for compliance with the SQL standard). PostgreSQL assumes your local time zone for any type containing only date or time.

All timezone-aware dates and times are stored internally in UTC. They are converted to local time in the zone specified by the TimeZone configuration parameter before being displayed to the client.

PostgreSQL allows you to specify time zones in three different forms:

A full time zone name, for example America/New_York. The recognized time zone names are listed in the pg_timezone_names view (see Section 51.90). PostgreSQL uses the widely-used IANA time zone data for this purpose, so the same time zone names are also recognized by much other software.
A time zone abbreviation, for example PST. Such a specification merely defines a particular offset from UTC, in contrast to full time zone names which can imply a set of daylight savings transition-date rules as well. The recognized abbreviations are listed in the pg_timezone_abbrevs view (see Section 51.89). You cannot set the configuration parameters TimeZone or log_timezone to a time zone abbreviation, but you can use abbreviations in date/time input values and with the AT TIME ZONE operator.
In addition to the timezone names and abbreviations, PostgreSQL will accept POSIX-style time zone specifications of the form STDoffset or STDoffsetDST, where STD is a zone abbreviation, offset is a numeric offset in hours west from UTC, and DST is an optional daylight-savings zone abbreviation, assumed to stand for one hour ahead of the given offset. For example, if EST5EDT were not already a recognized zone name, it would be accepted and would be functionally equivalent to United States East Coast time. In this syntax, a zone abbreviation can be a string of letters, or an arbitrary string surrounded by angle brackets (<>). When a daylight-savings zone abbreviation is present, it is assumed to be used according to the same daylight-savings transition rules used in the IANA time zone database's posixrules entry. In a standard PostgreSQL installation, posixrules is the same as US/Eastern, so that POSIX-style time zone specifications follow USA daylight-savings rules. If needed, you can adjust this behavior by replacing the posixrules file.

In short, this is the difference between abbreviations and full names: abbreviations represent a specific offset from UTC, whereas many of the full names imply a local daylight-savings time rule, and so have two possible UTC offsets. As an example, 2014-06-04 12:00 America/New_York represents noon local time in New York, which for this particular date was Eastern Daylight Time (UTC-4). So 2014-06-04 12:00 EDT specifies that same time instant. But 2014-06-04 12:00 EST specifies noon Eastern Standard Time (UTC-5), regardless of whether daylight savings was nominally in effect on that date.
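The example in the paragraph above can be checked directly. This sketch compares the instants denoted by the full zone name and by the two abbreviations.

```sql
-- The full zone name and the matching abbreviation denote the same instant.
SELECT TIMESTAMP WITH TIME ZONE '2014-06-04 12:00 America/New_York'
     = TIMESTAMP WITH TIME ZONE '2014-06-04 12:00 EDT';               -- t

-- EST always means UTC-5, so noon EST is one hour after noon in New York
-- on this date.
SELECT TIMESTAMP WITH TIME ZONE '2014-06-04 12:00 EST'
     - TIMESTAMP WITH TIME ZONE '2014-06-04 12:00 America/New_York';  -- 01:00:00
```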

To complicate matters, some jurisdictions have used the same timezone abbreviation to mean different UTC offsets at different times; for example, in Moscow MSK has meant UTC+3 in some years and UTC+4 in others. PostgreSQL interprets such abbreviations according to whatever they meant (or had most recently meant) on the specified date; but, as with the EST example above, this is not necessarily the same as local civil time on that date.

One should be wary that the POSIX-style time zone feature can lead to silently accepting bogus input, since there is no check on the reasonableness of the zone abbreviations. For example, SET TIMEZONE TO FOOBAR0 will work, leaving the system effectively using a rather peculiar abbreviation for UTC. Another issue to keep in mind is that in POSIX time zone names, positive offsets are used for locations west of Greenwich. Everywhere else, PostgreSQL follows the ISO-8601 convention that positive timezone offsets are east of Greenwich.

In all cases, timezone names and abbreviations are recognized case-insensitively. (This is a change from PostgreSQL versions prior to 8.2, which were case-sensitive in some contexts but not others.)

Neither timezone names nor abbreviations are hard-wired into the server; they are obtained from configuration files stored under .../share/timezone/ and .../share/timezonesets/ of the installation directory (see Section B.3).

The TimeZone configuration parameter can be set in the file postgresql.conf, or in any of the other standard ways described in Chapter 19. There are also some special ways to set it:

The SQL command SET TIME ZONE sets the time zone for the session. This is an alternative spelling of SET TIMEZONE TO with a more SQL-spec-compatible syntax.
The PGTZ environment variable is used by libpq clients to send a SET TIME ZONE command to the server upon connection.

8.5.4. Interval Input

interval values can be written using the following verbose syntax:

[@] quantity unit [quantity unit...] [direction]

where quantity is a number (possibly signed); unit is microsecond, millisecond, second, minute, hour, day, week, month, year, decade, century, millennium, or abbreviations or plurals of these units; direction can be ago or empty. The at sign (@) is optional noise. The amounts of the different units are implicitly added with appropriate sign accounting. ago negates all the fields. This syntax is also used for interval output, if IntervalStyle is set to postgres_verbose.

Quantities of days, hours, minutes, and seconds can be specified without explicit unit markings. For example, '1 12:59:10' is read the same as '1 day 12 hours 59 min 10 sec'. Also, a combination of years and months can be specified with a dash; for example '200-10' is read the same as '200 years 10 months'. (These shorter forms are in fact the only ones allowed by the SQL standard, and are used for output when IntervalStyle is set to sql_standard.)
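The equivalences between the verbose and the shorter forms can be checked by comparing interval values directly:

```sql
SELECT INTERVAL '1 12:59:10' = INTERVAL '1 day 12 hours 59 min 10 sec';   -- t
SELECT INTERVAL '200-10'     = INTERVAL '200 years 10 months';            -- t

-- The optional @ is noise, and "ago" negates every field.
SELECT INTERVAL '@ 1 day 2 hours ago' = INTERVAL '-1 day -2 hours';       -- t
```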

Interval values can also be written as ISO 8601 time intervals, using either the “format with designators” of the standard's section 4.4.3.2 or the “alternative format” of section 4.4.3.3. The format with designators looks like this:

P quantity unit [ quantity unit ...] [ T [ quantity unit ...]]

The string must start with a P, and may include a T that introduces the time-of-day units. The available unit abbreviations are given in Table 8.16. Units may be omitted, and may be specified in any order, but units smaller than a day must appear after T. In particular, the meaning of M depends on whether it is before or after T.

Table 8.16. ISO 8601 Interval Unit Abbreviations

In the alternative format:

P [ years-months-days ] [ T hours:minutes:seconds ]

the string must begin with P, and a T separates the date and time parts of the interval. The values are given as numbers similar to ISO 8601 dates.
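Both ISO 8601 forms can be sketched with the same value written two ways:

```sql
-- Format with designators and alternative format for the same interval.
SELECT INTERVAL 'P1Y2M3DT4H5M6S'
     = INTERVAL '1 year 2 months 3 days 4 hours 5 minutes 6 seconds';   -- t
SELECT INTERVAL 'P0001-02-03T04:05:06' = INTERVAL 'P1Y2M3DT4H5M6S';     -- t

-- M means months before the T and minutes after it.
SELECT INTERVAL 'P1M'  = INTERVAL '1 month'  AS months,    -- t
       INTERVAL 'PT1M' = INTERVAL '1 minute' AS minutes;   -- t
```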

When writing an interval constant with a fields specification, or when assigning a string to an interval column that was defined with a fields specification, the interpretation of unmarked quantities depends on the fields. For example INTERVAL '1' YEAR is read as 1 year, whereas INTERVAL '1' means 1 second. Also, field values “to the right” of the least significant field allowed by the fields specification are silently discarded. For example, writing INTERVAL '1 day 2:03:04' HOUR TO MINUTE results in dropping the seconds field, but not the day field.
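The examples from the paragraph above, written out as queries. The results shown assume the default postgres interval output style.

```sql
SELECT INTERVAL '1' YEAR;                          -- 1 year
SELECT INTERVAL '1';                               -- 00:00:01

-- The seconds are discarded, but the day field is kept.
SELECT INTERVAL '1 day 2:03:04' HOUR TO MINUTE;    -- 1 day 02:03:00
```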

According to the SQL standard all fields of an interval value must have the same sign, so a leading negative sign applies to all fields; for example the negative sign in the interval literal '-1 2:03:04' applies to both the days and hour/minute/second parts. PostgreSQL allows the fields to have different signs, and traditionally treats each field in the textual representation as independently signed, so that the hour/minute/second part is considered positive in this example. If IntervalStyle is set to sql_standard then a leading sign is considered to apply to all fields (but only if no additional signs appear). Otherwise the traditional PostgreSQL interpretation is used. To avoid ambiguity, it's recommended to attach an explicit sign to each field if any field is negative.

Internally interval values are stored as months, days, and seconds. This is done because the number of days in a month varies, and a day can have 23 or 25 hours if a daylight savings time adjustment is involved. The months and days fields are integers while the seconds field can store fractions. Because intervals are usually created from constant strings or timestamp subtraction, this storage method works well in most cases. Functions justify_days and justify_hours are available for adjusting days and hours that overflow their normal ranges.

In the verbose input format, and in some fields of the more compact input formats, field values can have fractional parts; for example '1.5 week' or '01:02:03.45'. Such input is converted to the appropriate number of months, days, and seconds for storage. When this would result in a fractional number of months or days, the fraction is added to the lower-order fields using the conversion factors 1 month = 30 days and 1 day = 24 hours. For example, '1.5 month' becomes 1 month and 15 days. Only seconds will ever be shown as fractional on output.
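A short sketch of how fractional fields spill into lower-order fields, together with the justify functions mentioned earlier. The results shown assume the default postgres interval output style.

```sql
SELECT INTERVAL '1.5 month';                  -- 1 mon 15 days
SELECT INTERVAL '1.5 week';                   -- 10 days 12:00:00

-- Overflowing hours and days can be normalized explicitly.
SELECT justify_hours(INTERVAL '27 hours');    -- 1 day 03:00:00
SELECT justify_days(INTERVAL '35 days');      -- 1 mon 5 days
```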

Table 8.17 shows some examples of valid interval input.

Table 8.17. Interval Input

8.5.5. Interval Output

The output format of the interval type can be set to one of the four styles sql_standard, postgres, postgres_verbose, or iso_8601, using the command SET intervalstyle. The default is the postgres format. Table 8.18 shows examples of each output style.

The sql_standard style produces output that conforms to the SQL standard's specification for interval literal strings, if the interval value meets the standard's restrictions (either year-month only or day-time only, with no mixing of positive and negative components). Otherwise the output looks like a standard year-month literal string followed by a day-time literal string, with explicit signs added to disambiguate mixed-sign intervals.

The output of the postgres style matches the output of PostgreSQL releases prior to 8.4 when the DateStyle parameter was set to ISO.

The output of the postgres_verbose style matches the output of PostgreSQL releases prior to 8.4 when the DateStyle parameter was set to non-ISO output.

The output of the iso_8601 style matches the “format with designators” described in section 4.4.3.2 of the ISO 8601 standard.
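The four output styles can be compared on a single value. The interval used here is an arbitrary year-month example.

```sql
SET intervalstyle = 'postgres';
SELECT INTERVAL '1 year 2 mons';   -- 1 year 2 mons

SET intervalstyle = 'sql_standard';
SELECT INTERVAL '1 year 2 mons';   -- 1-2

SET intervalstyle = 'postgres_verbose';
SELECT INTERVAL '1 year 2 mons';   -- @ 1 year 2 mons

SET intervalstyle = 'iso_8601';
SELECT INTERVAL '1 year 2 mons';   -- P1Y2M

RESET intervalstyle;
```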

Table 8.18. Interval Output Style Examples

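8.6. Boolean Type

PostgreSQL provides the standard SQL type boolean; see Table 8.19. The boolean type can have several states: “true”, “false”, and a third state, “unknown”, which is represented by the SQL null value.

Table 8.19. Boolean Data Type

The “true” state can be represented by any of the following literal values:

The “false” state can be represented by any of the following literal values:

Leading or trailing whitespace is ignored, and case does not matter. The key words TRUE and FALSE are the preferred (SQL-compliant) usage.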
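A few of the accepted spellings can be tried with explicit casts. This sketch shows only a sample and is not the complete list of accepted literals.

```sql
-- Input is case-insensitive and surrounding whitespace is ignored.
-- All three columns come back as t.
SELECT 'yes'::boolean, ' TRUE '::boolean, '1'::boolean;

-- Spellings of the "false" state; all three come back as f.
SELECT 'no'::boolean, 'off'::boolean, '0'::boolean;

-- The third state, "unknown", is the SQL null value.
SELECT NULL::boolean IS NULL;   -- t
```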

Example 8.2 shows that boolean values are output using the letters t and f.

Example 8.2. Using the boolean Type

CREATE TABLE test1 (a boolean, b text);
INSERT INTO test1 VALUES (TRUE, 'sic est');
INSERT INTO test1 VALUES (FALSE, 'non est');
SELECT * FROM test1;
 a |    b
---+---------
 t | sic est
 f | non est

SELECT * FROM test1 WHERE a;
 a |    b
---+---------
 t | sic est

8.7. Enumerated Types

Enumerated (enum) types are data types that comprise a static, ordered set of values. They are equivalent to the enum types supported in a number of programming languages. An example of an enum type might be the days of the week, or a set of status values for a piece of data.

8.7.1. Declaration of Enumerated Types

Enum types are created using the command, for example:

Once created, the enum type can be used in table and function definitions much like any other type:
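A minimal sketch of declaring an enum type and using it in a table. The type name mood, the table person, and the labels are hypothetical names chosen for illustration.

```sql
CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');

CREATE TABLE person (
    name         text,
    current_mood mood
);

INSERT INTO person VALUES ('Moe', 'happy');
SELECT * FROM person WHERE current_mood = 'happy';
```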

8.7.2. Ordering

The ordering of the values in an enum type is the order in which the values were listed when the type was created. All standard comparison operators and related aggregate functions are supported for enums. For example:
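The declaration-order comparison can be sketched with a hypothetical type whose labels sort differently as text than as enum values:

```sql
CREATE TYPE priority AS ENUM ('low', 'medium', 'high');

-- Enum order follows the declaration, not the alphabet.
SELECT 'low'::priority < 'high'::priority;   -- t
SELECT 'low'::text     < 'high'::text;       -- f

-- Aggregates such as max use the same ordering.
SELECT max(p)
FROM unnest(ARRAY['medium', 'high', 'low']::priority[]) AS p;   -- high
```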

8.7.3. Type Safety

Each enumerated data type is separate and cannot be compared with other enumerated types. See this example:

If you really need to do something like that, you can either write a custom operator or add explicit casts to your query:
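A sketch of the type-safety rule and the explicit-cast workaround, using two hypothetical enum types that happen to share labels:

```sql
CREATE TYPE shirt_size AS ENUM ('small', 'large');
CREATE TYPE drink_size AS ENUM ('small', 'large');

-- Values of different enum types cannot be compared directly:
-- ERROR:  operator does not exist: shirt_size = drink_size
SELECT 'small'::shirt_size = 'small'::drink_size;

-- Casting both sides to text makes the comparison explicit.
SELECT 'small'::shirt_size::text = 'small'::drink_size::text;   -- t
```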

8.7.4. Implementation Details

Enum labels are case sensitive, so 'happy' is not the same as 'HAPPY'. White space in the labels is significant too.

An enum value occupies four bytes on disk. The length of an enum value's textual label is limited by the NAMEDATALEN setting compiled into PostgreSQL; in standard builds this means at most 63 bytes.

8.8. Geometric Types

Geometric data types represent two-dimensional spatial objects. Table 8.20 shows the geometric types available in PostgreSQL.

Table 8.20. Geometric Types

A rich set of functions and operators is available to perform various geometric operations such as scaling, translation, rotation, and determining intersections. They are explained in the section on geometric functions and operators.

8.8.1. Points

Points are the fundamental two-dimensional building block for geometric types. Values of type point are specified using either of the following syntaxes:

where x and y are the respective coordinates, as floating-point numbers.

Points are output using the first syntax.
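As an illustration, a point can be written with or without the enclosing parentheses. The coordinates are arbitrary.

```sql
SELECT '(1.5,2)'::point;   -- parenthesized form
SELECT '1.5,2'::point;     -- bare form; displayed as (1.5,2)
```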

8.8.2. Lines

Lines are represented by the linear equation Ax + By + C = 0, where A and B are not both zero. Values of type line are input and output in the following form:

Alternatively, any of the following forms can be used for input:

where (x1,y1) and (x2,y2) are two different points on the line.
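A small sketch of both input styles for the same line, y = x. The brace form lists the coefficients A, B, and C; the bracket form is one of the two-point input forms.

```sql
-- x - y = 0, i.e. A = 1, B = -1, C = 0.
SELECT '{1,-1,0}'::line;

-- The same line given by two distinct points on it.
SELECT '[(0,0),(1,1)]'::line;
```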

8.8.3. Line Segments

Line segments are represented by pairs of points that are the endpoints of the segment. Values of type lseg are specified using any of the following syntaxes:

where (x1,y1) and (x2,y2) are the end points of the line segment.

Line segments are output using the first syntax.

8.8.4. Boxes

Boxes are represented by pairs of points that are opposite corners of the box. Values of type box are specified using any of the following syntaxes:

where (x1,y1) and (x2,y2) are any two opposite corners of the box.

Boxes are output using the second syntax.

Any two opposite corners can be supplied on input, but the values will be reordered as needed to store the upper right and lower left corners, in that order.
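The reordering of corners can be seen by supplying the lower left corner first:

```sql
-- Stored and displayed as upper right, then lower left: (1,1),(0,0)
SELECT '((0,0),(1,1))'::box;
```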

8.8.5. Paths

Paths are represented by lists of connected points. Paths can be open, where the first and last points in the list are considered not connected, or closed, where the first and last points are considered connected.

Values of type path are specified using any of the following syntaxes:

where the points are the end points of the line segments comprising the path. Square brackets ([]) indicate an open path, while parentheses (()) indicate a closed path. When the outermost parentheses are omitted, as in the third through fifth syntaxes, a closed path is assumed.

Paths are output using the first or second syntax, as appropriate.
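Open and closed paths differ only in the outer delimiters. A brief sketch with arbitrary points:

```sql
SELECT '[(0,0),(1,1),(2,0)]'::path;   -- square brackets: open path
SELECT '((0,0),(1,1),(2,0))'::path;   -- parentheses: closed path

-- With the outermost parentheses omitted, a closed path is assumed.
SELECT isclosed('(0,0),(1,1),(2,0)'::path);   -- t
```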

8.8.6. Polygons

Polygons are represented by lists of points (the vertexes of the polygon). Polygons are very similar to closed paths, but are stored differently and have their own set of support routines.

Values of type polygon are specified using any of the following syntaxes:

where the points are the end points of the line segments comprising the boundary of the polygon.

Polygons are output using the first syntax.

8.8.7. Circles

Circles are represented by a center point and radius. Values of type circle are specified using any of the following syntaxes:

where (x,y) is the center point and r is the radius of the circle.

Circles are output using the first syntax.
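Several input spellings describe the same circle. This sketch uses a circle of radius 2 centered at the origin.

```sql
SELECT '<(0,0),2>'::circle;
SELECT '((0,0),2)'::circle;
SELECT '0,0,2'::circle;      -- all three display as <(0,0),2>
```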

8.9. Network Address Types

PostgreSQL offers data types to store IPv4, IPv6, and MAC addresses, as shown in Table 8.21. It is better to use these types instead of plain text types to store network addresses, because these types offer input error checking and specialized operators and functions.

Table 8.21. Network Address Types

When sorting inet or cidr data types, IPv4 addresses will always sort before IPv6 addresses, including IPv4 addresses encapsulated or mapped to IPv6 addresses, such as ::10.2.3.4 or ::ffff:10.4.3.2.

8.9.1. `inet`

The inet type holds an IPv4 or IPv6 host address, and optionally its subnet, all in one field. The subnet is represented by the number of network address bits present in the host address (the “netmask”). If the netmask is 32 and the address is IPv4, then the value does not indicate a subnet, only a single host. In IPv6, the address length is 128 bits, so 128 bits specify a unique host address. Note that if you want to accept only networks, you should use the cidr type rather than inet.

The input format for this type is address/y where address is an IPv4 or IPv6 address and y is the number of bits in the netmask. If the /y portion is missing, the netmask is 32 for IPv4 and 128 for IPv6, so the value represents just a single host. On display, the /y portion is suppressed if the netmask specifies a single host.
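The handling of the /y part can be sketched with a few casts. The addresses are arbitrary private-range examples.

```sql
SELECT '192.168.0.1/24'::inet;   -- host address together with its subnet
SELECT '192.168.0.1'::inet;      -- netmask defaults to 32 (a single host)
SELECT '192.168.0.1/32'::inet;   -- displayed as 192.168.0.1; /32 is suppressed
SELECT '::1'::inet;              -- IPv6; netmask defaults to 128
```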

8.9.2. `cidr`

The cidr type holds an IPv4 or IPv6 network specification. Input and output formats follow Classless Inter-Domain Routing conventions. The format for specifying networks is address/y where address is the network represented as an IPv4 or IPv6 address, and y is the number of bits in the netmask. If y is omitted, it is calculated using assumptions from the older classful network numbering system, except it will be at least large enough to include all of the octets written in the input. It is an error to specify a network address that has bits set to the right of the specified netmask.

Table 8.22 shows some examples.

Table 8.22. `cidr` Type Input Examples
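
A few representative inputs, sketched as casts, show how omitted octets and an omitted netmask are filled in. The values are illustrative only.

```sql
SELECT '192.168.100.128/25'::cidr;  -- 192.168.100.128/25
SELECT '192.168/24'::cidr;          -- 192.168.0.0/24 (missing octets become zero)
SELECT '10.1.2'::cidr;              -- 10.1.2.0/24 (netmask covers the octets written)
SELECT '10'::cidr;                  -- 10.0.0.0/8 (classful assumption)
```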

8.9.3. `inet` vs. `cidr`

The essential difference between inet and cidr data types is that inet accepts values with nonzero bits to the right of the netmask, whereas cidr does not. For example, 192.168.0.1/24 is valid for inet but not for cidr.
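
The difference can be seen by casting the same literal to both types. This is a minimal sketch with an arbitrary address.

```sql
SELECT '192.168.0.1/24'::inet;  -- accepted: host bits may be nonzero
SELECT '192.168.0.0/24'::cidr;  -- accepted: no bits set to the right of the netmask
SELECT '192.168.0.1/24'::cidr;  -- fails with an "invalid cidr value" error
```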

Tip

If you do not like the output format for inet or cidr values, try the functions host, text, and abbrev.
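
As a rough sketch of what these three functions return (the input values are arbitrary):

```sql
SELECT host('192.168.0.1/24'::inet);  -- 192.168.0.1 (address only, as text)
SELECT text('192.168.0.1'::inet);     -- 192.168.0.1/32 (netmask always included)
SELECT abbrev('10.1.0.0/16'::cidr);   -- 10.1/16 (abbreviated display format)
```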

8.9.4. `macaddr`

The macaddr type stores MAC addresses, known for example from Ethernet card hardware addresses (although MAC addresses are used for other purposes as well). Input is accepted in the following formats:

These examples would all specify the same address. Upper and lower case is accepted for the digits a through f. Output is always in the first of the forms shown.
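
For instance, the following casts use several of the accepted spellings of one address and should all display the same value. This is a sketch; the address itself is arbitrary.

```sql
SELECT '08:00:2b:01:02:03'::macaddr;
SELECT '08-00-2b-01-02-03'::macaddr;
SELECT '0800.2b01.0203'::macaddr;
SELECT '08002B010203'::macaddr;   -- upper-case digits are accepted too
-- each of these displays 08:00:2b:01:02:03
```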

IEEE Std 802-2001 specifies the second shown form (with hyphens) as the canonical form for MAC addresses, and specifies the first form (with colons) as the bit-reversed notation, so that 08-00-2b-01-02-03 = 01:00:4D:08:04:0C. This convention is widely ignored nowadays, and it is relevant only for obsolete network protocols (such as Token Ring). PostgreSQL makes no provisions for bit reversal, and all accepted formats use the canonical LSB order.

The remaining five input formats are not part of any standard.

8.9.5. `macaddr8`

The macaddr8 type stores MAC addresses in EUI-64 format, known for example from Ethernet card hardware addresses (although MAC addresses are used for other purposes as well). This type can accept both 6 and 8 byte length MAC addresses and stores them in 8 byte length format. MAC addresses given in 6 byte format will be stored in 8 byte length format with the 4th and 5th bytes set to FF and FE, respectively. Note that IPv6 uses a modified EUI-64 format where the 7th bit should be set to one after the conversion from EUI-48. The function macaddr8_set7bit is provided to make this change. Generally speaking, any input which is comprised of pairs of hex digits (on byte boundaries), optionally separated consistently by one of ':', '-' or '.', is accepted. The number of hex digits must be either 16 (8 bytes) or 12 (6 bytes). Leading and trailing whitespace is ignored. The following are examples of input formats that are accepted:

These examples would all specify the same address. Upper and lower case is accepted for the digits a through f. Output is always in the first of the forms shown. The last six input formats that are mentioned above are not part of any standard. To convert a traditional 48 bit MAC address in EUI-48 format to modified EUI-64 format to be included as the host portion of an IPv6 address, use macaddr8_set7bit as shown:
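
A minimal sketch of the conversion, using an arbitrary 6-byte address:

```sql
-- A 6-byte address is widened by inserting ff:fe in the middle.
SELECT '08:00:2b:01:02:03'::macaddr8;          -- 08:00:2b:ff:fe:01:02:03

-- macaddr8_set7bit additionally sets the 7th bit, giving modified EUI-64.
SELECT macaddr8_set7bit('08:00:2b:01:02:03');  -- 0a:00:2b:ff:fe:01:02:03
```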

8.10. Bit String Types

Bit strings are strings of 1's and 0's. They can be used to store or visualize bit masks. There are two SQL bit types: bit(n) and bit varying(n), where n is a positive integer.

bit type data must match the length n exactly; it is an error to attempt to store shorter or longer bit strings. bit varying data is of variable length up to the maximum length n; longer strings will be rejected. Writing bit without a length is equivalent to bit(1), while bit varying without a length specification means unlimited length.

Note

If one explicitly casts a bit-string value to bit(n), it will be truncated or zero-padded on the right to be exactly n bits, without raising an error. Similarly, if one explicitly casts a bit-string value to bit varying(n), it will be truncated on the right if it is more than n bits.

Refer to the SQL syntax section on bit-string constants for information about the syntax of bit string constants. Bit-logical operators and string manipulation functions are available; see the section on bit string functions and operators.

Example 8.3. Using the Bit String Types
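
The length rules and the effect of an explicit cast can be sketched as follows. The table name test and its columns are illustrative only.

```sql
CREATE TABLE test (a BIT(3), b BIT VARYING(5));

INSERT INTO test VALUES (B'101', B'00');          -- both values fit
-- The next statement fails: B'10' has 2 bits but column a is bit(3).
INSERT INTO test VALUES (B'10', B'101');
-- An explicit cast zero-pads on the right instead of raising an error.
INSERT INTO test VALUES (B'10'::bit(3), B'101');

SELECT * FROM test;
--   a  |  b
-- -----+-----
--  101 | 00
--  100 | 101
```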

8.11. Text Search Types

PostgreSQL provides two data types that are designed to support full text search, which is the activity of searching through a collection of natural-language documents to locate those that best match a query. The tsvector type represents a document in a form optimized for text search; the tsquery type similarly represents a text query. The full text search chapter provides a detailed explanation of this facility, and the section on text search functions and operators summarizes the related functions and operators.

8.11.1. `tsvector`

A tsvector value is a sorted list of distinct lexemes, which are words that have been normalized to merge different variants of the same word (see the full text search chapter for details). Sorting and duplicate-elimination are done automatically during input, as shown in this example:
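
A small sketch of this behavior, using an arbitrary sentence:

```sql
SELECT 'a fat cat sat on a mat and ate a fat rat'::tsvector;
-- 'a' 'and' 'ate' 'cat' 'fat' 'mat' 'on' 'rat' 'sat'
```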

To represent lexemes containing whitespace or punctuation, surround them with quotes:

(We use dollar-quoted string literals in this example and the next one to avoid the confusion of having to double quote marks within the literals.) Embedded quotes and backslashes must be doubled:

Optionally, integer positions can be attached to lexemes:

A position normally indicates the source word's location in the document. Positional information can be used for proximity ranking. Position values can range from 1 to 16383; larger numbers are silently set to 16383. Duplicate positions for the same lexeme are discarded.

Lexemes that have positions can further be labeled with a weight, which can be A, B, C, or D. D is the default and hence is not shown on output:
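
Positions and weights can be sketched with literals like these (the words are arbitrary):

```sql
-- Positions are collected per lexeme and sorted.
SELECT 'a:1 fat:2 cat:3 sat:4 on:5 a:6 mat:7 and:8 ate:9 a:10 fat:11 rat:12'::tsvector;
-- 'a':1,6,10 'and':8 'ate':9 'cat':3 'fat':2,11 'mat':7 'on':5 'rat':12 'sat':4

-- Weight D is the default, so it is dropped from the output of cat:5D.
SELECT 'a:1A fat:2B,4C cat:5D'::tsvector;
-- 'a':1A 'cat':5 'fat':2B,4C
```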

Weights are typically used to reflect document structure, for example by marking title words differently from body words. Text search ranking functions can assign different priorities to the different weight markers.

It is important to understand that the tsvector type itself does not perform any word normalization; it assumes the words it is given are normalized appropriately for the application. For example,

For most English-text-searching applications the above words would be considered non-normalized, but tsvector doesn't care. Raw document text should usually be passed through to_tsvector to normalize the words appropriately for searching:
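
The contrast between a plain cast and to_tsvector might look like this. It is a sketch that names the english configuration explicitly rather than relying on the session default.

```sql
-- Plain cast: words are stored exactly as given.
SELECT 'The Fat Rats'::tsvector;                -- 'Fat' 'Rats' 'The'

-- to_tsvector: stop words removed, remaining words stemmed and positioned.
SELECT to_tsvector('english', 'The Fat Rats');  -- 'fat':2 'rat':3
```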

8.11.2. `tsquery`

A tsquery value stores lexemes that are to be searched for, and can combine them using the Boolean operators & (AND), | (OR), and ! (NOT), as well as the phrase search operator <-> (FOLLOWED BY). There is also a variant <N> of the FOLLOWED BY operator, where N is an integer constant that specifies the distance between the two lexemes being searched for. <-> is equivalent to <1>.

Parentheses can be used to enforce grouping of these operators. In the absence of parentheses, ! (NOT) binds most tightly, <-> (FOLLOWED BY) next most tightly, then & (AND), with | (OR) binding the least tightly.

Here are some examples:
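
For instance, a few casts along these lines show how operators and grouping are displayed (the lexemes are arbitrary):

```sql
SELECT 'fat & rat'::tsquery;          -- 'fat' & 'rat'
SELECT 'fat & (rat | cat)'::tsquery;  -- 'fat' & ( 'rat' | 'cat' )
SELECT 'fat & rat & ! cat'::tsquery;  -- 'fat' & 'rat' & !'cat'
```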

Optionally, lexemes in a tsquery can be labeled with one or more weight letters, which restricts them to match only tsvector lexemes with one of those weights:

Also, lexemes in a tsquery can be labeled with * to specify prefix matching:

This query will match any word in a tsvector that begins with “super”.

Quoting rules for lexemes are the same as described previously for lexemes in tsvector; and, as with tsvector, any required normalization of words must be done before converting to the tsquery type. The to_tsquery function is convenient for performing such normalization:

Note that to_tsquery will process prefixes in the same way as other words, which means this comparison returns true:

because postgres gets stemmed to postgr:

which will match the stemmed form of postgraduate.
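
This prefix behavior can be checked with a sketch like the following, which assumes the english configuration and names it explicitly:

```sql
SELECT to_tsquery('english', 'postgres:*');     -- 'postgr':*
SELECT to_tsvector('english', 'postgraduate');  -- 'postgradu':1

-- The stemmed prefix postgr matches the stemmed lexeme postgradu.
SELECT to_tsvector('english', 'postgraduate') @@ to_tsquery('english', 'postgres:*');  -- true
```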

8.12. UUID Type

The data type uuid stores Universally Unique Identifiers (UUID) as defined by RFC 4122, ISO/IEC 9834-8:2005, and related standards. (Some systems refer to this data type as a globally unique identifier, or GUID, instead.) This identifier is a 128-bit quantity that is generated by an algorithm chosen to make it very unlikely that the same identifier will be generated by anyone else in the known universe using the same algorithm. Therefore, for distributed systems, these identifiers provide a better uniqueness guarantee than sequence generators, which are only unique within a single database.

A UUID is written as a sequence of lower-case hexadecimal digits, in several groups separated by hyphens, specifically a group of 8 digits followed by three groups of 4 digits followed by a group of 12 digits, for a total of 32 digits representing the 128 bits. An example of a UUID in this standard form is:

PostgreSQL also accepts the following alternative forms for input: use of upper-case digits, the standard format surrounded by braces, omitting some or all hyphens, adding a hyphen after any group of four digits. Examples are:

Output is always in the standard form.
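
As a sketch, each of the following alternative spellings of one arbitrary UUID is accepted and displayed in the standard form:

```sql
SELECT 'A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11'::uuid;     -- upper-case digits
SELECT '{a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11}'::uuid;   -- surrounded by braces
SELECT 'a0eebc999c0b4ef8bb6d6bb9bd380a11'::uuid;         -- no hyphens
SELECT 'a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11'::uuid;  -- hyphen after every four digits
-- each of these displays a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11
```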

PostgreSQL provides storage and comparison functions for UUIDs, but the core database does not include any function for generating UUIDs, because no single algorithm is well suited for every application. The uuid-ossp module provides functions that implement several standard algorithms. The pgcrypto module also provides a generation function for random UUIDs. Alternatively, UUIDs could be generated by client applications or other libraries invoked through a server-side function.

8.13. XML Type

The xml data type can be used to store XML data. Its advantage over storing XML data in a text field is that it checks the input values for well-formedness, and there are support functions to perform type-safe operations on it; see the section on XML functions. Use of this data type requires the installation to have been built with configure --with-libxml.

The xml type can store well-formed “documents”, as defined by the XML standard, as well as “content” fragments, which are defined by reference to the more permissive “document node” of the XQuery and XPath data model. Roughly, this means that content fragments can have more than one top-level element or character node. The expression xmlvalue IS DOCUMENT can be used to evaluate whether a particular xml value is a full document or only a content fragment.

Limits and compatibility notes for the xml data type can be found in the SQL conformance appendix.

8.13.1. Creating XML Values

To produce a value of type xml from character data, use the function xmlparse:

Examples:
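
A minimal sketch, assuming a server built with libxml support; the XML snippets are arbitrary:

```sql
-- A complete document with a single root element.
SELECT XMLPARSE (DOCUMENT '<?xml version="1.0"?><book><title>Manual</title></book>');

-- A content fragment: leading text and more than one top-level element.
SELECT XMLPARSE (CONTENT 'abc<foo>bar</foo><bar>foo</bar>');

-- IS DOCUMENT tells the two kinds of value apart.
SELECT XMLPARSE (DOCUMENT '<foo>bar</foo>') IS DOCUMENT;    -- true
SELECT XMLPARSE (CONTENT 'abc<foo>bar</foo>') IS DOCUMENT;  -- false
```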

While this is the only way to convert character strings into XML values according to the SQL standard, the PostgreSQL-specific syntaxes:

can also be used.

The xml type does not validate input values against a document type declaration (DTD), even when the input value specifies a DTD. There is also currently no built-in support for validating against other XML schema languages such as XML Schema.

The inverse operation, producing a character string value from xml, uses the function xmlserialize:

type can be character, character varying, or text (or an alias for one of those). Again, according to the SQL standard, this is the only way to convert between type xml and character types, but PostgreSQL also allows you to simply cast the value.

When a character string value is cast to or from type xml without going through XMLPARSE or XMLSERIALIZE, respectively, the choice of DOCUMENT versus CONTENT is determined by the “XML option” session configuration parameter, which can be set using the standard command:

or the more PostgreSQL-like syntax

The default is CONTENT, so all forms of XML data are allowed.
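
The interaction between casts and the XML option can be sketched as follows, again assuming libxml support and using arbitrary snippets:

```sql
-- SQL-standard spelling: plain casts now accept only complete documents.
SET XML OPTION DOCUMENT;
SELECT '<foo>bar</foo>'::xml;        -- accepted: a single root element
-- SELECT 'abc<foo>bar</foo>'::xml;  -- would be rejected under DOCUMENT

-- PostgreSQL-style spelling: back to the default.
SET xmloption TO CONTENT;
SELECT 'abc<foo>bar</foo>'::xml;     -- accepted as a content fragment

-- Converting an xml value back to a character type.
SELECT XMLSERIALIZE (CONTENT 'abc<foo>bar</foo>'::xml AS text);
```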

8.13.2. Encoding Handling

When using binary mode to pass query parameters to the server and query results back to the client, no encoding conversion is performed, so the situation is different. In this case, an encoding declaration in the XML data will be observed, and if it is absent, the data will be assumed to be in UTF-8 (as required by the XML standard; note that PostgreSQL does not support UTF-16). On output, data will have an encoding declaration specifying the client encoding, unless the client encoding is UTF-8, in which case it will be omitted.

Needless to say, processing XML data with PostgreSQL will be less error-prone and more efficient if the XML data encoding, client encoding, and server encoding are the same. Since XML data is internally processed in UTF-8, computations will be most efficient if the server encoding is also UTF-8.

Caution

Some XML-related functions may not work at all on non-ASCII data when the server encoding is not UTF-8. This is known to be an issue for xmltable() and xpath() in particular.

8.13.3. Accessing XML Values

The xml data type is unusual in that it does not provide any comparison operators. This is because there is no well-defined and universally useful comparison algorithm for XML data. One consequence of this is that you cannot retrieve rows by comparing an xml column against a search value. XML values should therefore typically be accompanied by a separate key field such as an ID. An alternative solution for comparing XML values is to convert them to character strings first, but note that character string comparison has little to do with a useful XML comparison method.

Since there are no comparison operators for the xml data type, it is not possible to create an index directly on a column of this type. If speedy searches in XML data are desired, possible workarounds include casting the expression to a character string type and indexing that, or indexing an XPath expression. Of course, the actual query would have to be adjusted to search by the indexed expression.

The text-search functionality in PostgreSQL can also be used to speed up full-document searches of XML data. The necessary preprocessing support is, however, not yet available in the PostgreSQL distribution.

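8.14. JSON Types

JSON data types are for storing JSON (JavaScript Object Notation) data, as specified in RFC 7159. Such data can also be stored as text, but the JSON data types have the advantage of enforcing that each stored value is valid according to the JSON rules. There are also assorted JSON-specific functions and operators available for data stored in these data types; see Section 9.15.

PostgreSQL offers two types for storing JSON data: json and jsonb. To implement efficient query mechanisms for these data types, PostgreSQL also provides the jsonpath data type described in Section 8.14.6.

The json and jsonb data types accept almost identical sets of values as input. The major practical difference is one of efficiency. The json data type stores an exact copy of the input text, which processing functions must reparse on each execution; jsonb data is stored in a decomposed binary format that makes it slightly slower to input due to added conversion overhead, but significantly faster to process, since no reparsing is needed. jsonb also supports indexing, which can be a significant advantage.

Because the json type stores an exact copy of the input text, it will preserve semantically-insignificant white space between tokens, as well as the order of keys within JSON objects. Also, if a JSON object within the value contains the same key more than once, all the key/value pairs are kept. (The processing functions consider the last value as the operative one.) By contrast, jsonb does not preserve white space, does not preserve the order of object keys, and does not keep duplicate object keys. If duplicate keys are specified in the input, only the last value is kept.

In general, most applications should prefer to store JSON data as jsonb, unless there are quite specialized needs, such as legacy assumptions about ordering of object keys.

PostgreSQL allows only one character set encoding per database. It is therefore not possible for the JSON types to conform rigidly to the JSON specification unless the database encoding is UTF8. Attempts to directly include characters that cannot be represented in the database encoding will fail; conversely, characters that can be represented in the database encoding but not in UTF8 will be allowed.

RFC 7159 permits JSON strings to contain Unicode escape sequences denoted by \uXXXX. In the input function for the json type, Unicode escapes are allowed regardless of the database encoding, and are checked only for syntactic correctness (that is, that four hex digits follow \u). However, the input function for jsonb is stricter: it disallows Unicode escapes for non-ASCII characters (those above U+007F) unless the database encoding is UTF8. The jsonb type also rejects \u0000 (because that cannot be represented in PostgreSQL's text type), and it insists that any use of Unicode surrogate pairs to designate characters outside the Unicode Basic Multilingual Plane be correct. Valid Unicode escapes are converted to the equivalent ASCII or UTF8 character for storage; this includes folding surrogate pairs into a single character.

Many of the JSON processing functions described in Section 9.15 will convert Unicode escapes to regular characters, and will therefore throw the same types of errors just described even if their input is of type json rather than jsonb. The fact that the json input function does not make these checks may be considered a historical artifact, although it does allow for simple storage (without processing) of JSON Unicode escapes in a non-UTF8 database encoding. In general, it is best to avoid mixing Unicode escapes in JSON with a non-UTF8 database encoding, if possible.

When converting textual JSON input into jsonb, the primitive types described by RFC 7159 are effectively mapped onto native PostgreSQL types, as shown in Table 8.23. Therefore, there are some minor additional constraints on what constitutes valid jsonb data that do not apply to the json type, nor to JSON in the abstract, corresponding to limits on what can be represented by the underlying data type. Notably, jsonb will reject numbers that are outside the range of the PostgreSQL numeric data type, while json will not. Such implementation-defined restrictions are permitted by RFC 7159. However, in practice such problems are far more likely to occur in other implementations, as it is common to represent JSON's number primitive type as IEEE 754 double precision floating point (which RFC 7159 explicitly anticipates and allows for). When using JSON as an interchange format with such systems, the danger of losing numeric precision compared to data originally stored by PostgreSQL should be considered.

Conversely, as noted in the table, there are some minor restrictions on the input format of JSON primitive types that do not apply to the corresponding PostgreSQL types.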

Table 8.23. JSON Primitive Types and Corresponding PostgreSQL Types

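8.14.1. JSON Input and Output Syntax

The input/output syntax for the JSON data types is as specified in RFC 7159.

The following are all valid json (or jsonb) expressions:

As previously stated, when a JSON value is input and then printed without any additional processing, json outputs the same text that was input, while jsonb does not preserve semantically-insignificant details such as whitespace. For example, note the differences here:

One semantically-insignificant detail worth noting is that in jsonb, numbers will be printed according to the behavior of the underlying numeric type. In practice this means that numbers entered with E notation will be printed without it, for example:

However, jsonb will preserve trailing fractional zeroes, as seen in this example, even though those are semantically insignificant for purposes such as equality checks.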
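
These differences between json and jsonb output can be observed with a sketch like the following; the keys and values are arbitrary:

```sql
-- json echoes the input text; jsonb normalizes spacing and key order.
SELECT '{"bar": "baz", "balance": 7.77, "active":false}'::json;
-- {"bar": "baz", "balance": 7.77, "active":false}
SELECT '{"bar": "baz", "balance": 7.77, "active":false}'::jsonb;
-- {"bar": "baz", "active": false, "balance": 7.77}

-- E notation is not kept by jsonb, but the trailing zero is.
SELECT '{"reading": 1.230e-5}'::json, '{"reading": 1.230e-5}'::jsonb;
-- {"reading": 1.230e-5} | {"reading": 0.00001230}
```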

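8.14.2. Designing JSON Documents

Representing data as JSON can be considerably more flexible than the traditional relational data model, which is compelling in environments where requirements are fluid. It is quite possible for both approaches to co-exist and complement each other within the same application. However, even for applications where maximal flexibility is desired, it is still recommended that JSON documents have a somewhat fixed structure. The structure is typically unenforced (though enforcing some business rules declaratively is possible), but having a predictable structure makes it easier to write queries that usefully summarize a set of “documents” (datums) in a table.

JSON data is subject to the same concurrency-control considerations as any other data type when stored in a table. Although storing large documents is practicable, keep in mind that any update acquires a row-level lock on the whole row. Consider limiting JSON documents to a manageable size in order to decrease lock contention among updating transactions. Ideally, JSON documents should each represent an atomic datum that business rules dictate cannot reasonably be further subdivided into smaller datums that could be modified independently.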

8.14.3. `jsonb` Containment and Existence

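Testing containment is an important capability of jsonb. There is no parallel set of facilities for the json type. Containment tests whether one jsonb document has contained within it another one. These examples return true except as noted:

The general principle is that the contained object must match the containing object as to structure and data contents, possibly after discarding some non-matching array elements or object key/value pairs from the containing object. But remember that the order of array elements is not significant when doing a containment match, and duplicate array elements are effectively considered only once.

As a special exception to the general principle that the structures must match, an array may contain a primitive value: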
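
The containment rules, including the primitive-value exception, can be sketched with the @> operator. The literals are arbitrary examples.

```sql
SELECT '"foo"'::jsonb @> '"foo"'::jsonb;              -- true: identical scalars
SELECT '[1, 2, 3]'::jsonb @> '[1, 3]'::jsonb;         -- true
SELECT '[1, 2, 3]'::jsonb @> '[3, 1]'::jsonb;         -- true: element order is ignored
SELECT '[1, 2, 3]'::jsonb @> '[1, 2, 2]'::jsonb;      -- true: duplicates count once
SELECT '[1, 2, [1, 3]]'::jsonb @> '[1, 3]'::jsonb;    -- false: nesting level differs
SELECT '[1, 2, [1, 3]]'::jsonb @> '[[1, 3]]'::jsonb;  -- true
SELECT '{"foo": {"bar": "baz"}}'::jsonb @> '{"bar": "baz"}'::jsonb;  -- false

-- Special exception: an array may contain a primitive value...
SELECT '["foo", "bar"]'::jsonb @> '"bar"'::jsonb;     -- true
-- ...but the exception is not reciprocal.
SELECT '"bar"'::jsonb @> '["bar"]'::jsonb;            -- false
```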

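jsonb also has an existence operator, which is a variation on the theme of containment: it tests whether a string (given as a text value) appears as an object key or array element at the top level of the jsonb value. These examples return true except as noted: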
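
A sketch of the existence operator ? on a few arbitrary values:

```sql
SELECT '["foo", "bar", "baz"]'::jsonb ? 'bar';    -- true: string is an array element
SELECT '{"foo": "bar"}'::jsonb ? 'foo';           -- true: string is an object key
SELECT '{"foo": "bar"}'::jsonb ? 'bar';           -- false: object values are not considered
SELECT '{"foo": {"bar": "baz"}}'::jsonb ? 'bar';  -- false: only the top level is searched
SELECT '"foo"'::jsonb ? 'foo';                    -- true: matches a primitive JSON string
```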

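JSON objects are better suited than arrays for testing containment or existence when there are many keys or elements involved, because unlike arrays they are internally optimized for searching, and do not need to be searched linearly.

Because JSON containment is nested, an appropriate query can skip explicit selection of sub-objects. As an example, suppose that we have a doc column containing objects at the top level, with most objects containing tags fields that contain arrays of sub-objects. This query finds entries in which sub-objects containing both "term":"paris" and "term":"food" appear, while ignoring any such keys outside the tags array:

One could accomplish the same thing with, say,

but that approach is less flexible, and often less efficient as well.

On the other hand, the JSON existence operator is not nested: it will only look for the specified key or array element at the top level of the JSON value.

The various containment and existence operators, along with all other JSON operators and functions, are documented in Section 9.15.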

8.14.4. `jsonb` Indexing

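GIN indexes can be used to efficiently search for keys or key/value pairs occurring within a large number of jsonb documents (datums). Two GIN “operator classes” are provided, offering different performance and flexibility trade-offs.

The non-default GIN operator class jsonb_path_ops supports indexing the @>, @@ and @? operators only. An example of creating an index with this operator class is:

Consider the example of a table that stores JSON documents retrieved from a third-party web service, with a documented schema definition. A typical document is:

We store these documents in a table named api, in a jsonb column named jdoc. If a GIN index is created on this column, queries like the following can make use of the index:

However, the index could not be used for queries like the following, because though the operator ? is indexable, it is not applied directly to the indexed column jdoc:

Still, with appropriate use of expression indexes, the above query can use an index. If querying for particular items within the "tags" key is common, defining an index like this may be worthwhile: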
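
The indexing options discussed here can be sketched as follows. The table api and column jdoc come from the text; the index names, the keys company, guid and name, and the searched values are illustrative assumptions.

```sql
CREATE TABLE api (jdoc jsonb);

-- Default operator class (jsonb_ops) on the whole column.
CREATE INDEX idxgin ON api USING GIN (jdoc);

-- Non-default operator class: smaller, but no key-exists support.
CREATE INDEX idxginp ON api USING GIN (jdoc jsonb_path_ops);

-- Expression index covering only what is stored under the "tags" key.
CREATE INDEX idxgintags ON api USING GIN ((jdoc -> 'tags'));

-- Containment applied directly to jdoc: can use idxgin or idxginp.
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"company": "Magnafone"}';

-- ? applied to jdoc -> 'tags': needs the expression index idxgintags.
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc -> 'tags' ? 'qui';
```

Whether the planner actually chooses an index depends on table size and statistics; EXPLAIN shows the plan it picked.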

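Also, GIN indexes support the @@ and @? operators, which perform jsonpath matching.

A GIN index extracts statements of the following form out of a jsonpath: accessors_chain = const. An accessors chain may consist of .key, [*], and [index] accessors. jsonb_ops additionally supports the .* and .** accessors.

Another approach to querying is to exploit containment, for example:

A simple GIN index on the jdoc column can support this query. But note that such an index will store copies of every key and value in the jdoc column, whereas the expression index of the previous example stores only data found under the tags key. While the simple-index approach is far more flexible (since it supports queries about any key), targeted expression indexes are likely to be smaller and faster to search than a simple index.

Although the jsonb_path_ops operator class supports only queries with the @>, @@ and @? operators, it has notable performance advantages over the default operator class jsonb_ops. A jsonb_path_ops index is usually much smaller than a jsonb_ops index over the same data, and the specificity of searches is better, particularly when queries contain keys that appear frequently in the data. Therefore search operations typically perform better than with the default operator class.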

A disadvantage of the jsonb_path_ops approach is that it produces no index entries for JSON structures not containing any values, such as {"a": {}}. If a search for documents containing such a structure is requested, it will require a full-index scan, which is quite slow. jsonb_path_ops is therefore ill-suited for applications that often perform such searches.

jsonb also supports btree and hash indexes. These are usually useful only if it's important to check equality of complete JSON documents. The btree ordering for jsonb datums is seldom of great interest, but for completeness it is:

Objects with equal numbers of pairs are compared in the order:

Note that object keys are compared in their storage order; in particular, since shorter keys are stored before longer keys, this can lead to results that might be unintuitive, such as:

Similarly, arrays with equal numbers of elements are compared in the order:

Primitive JSON values are compared using the same comparison rules as for the underlying PostgreSQL data type. Strings are compared using the default database collation.

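8.14.5. Transforms

Additional extensions are available that implement transforms for the jsonb type for different procedural languages.

The extensions for PL/Perl are called jsonb_plperl and jsonb_plperlu. If you use them, jsonb values are mapped to Perl arrays, hashes, and scalars, as appropriate.

The extensions for PL/Python are called jsonb_plpythonu, jsonb_plpython2u, and jsonb_plpython3u (see Section 45.1 for the PL/Python naming convention). If you use them, jsonb values are mapped to Python dictionaries, lists, and scalars, as appropriate.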

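8.14.6. `jsonpath` Type

The jsonpath type implements support for the SQL/JSON path language in PostgreSQL to efficiently query JSON data. It provides a binary representation of the parsed SQL/JSON path expression that specifies the items to be retrieved by the path engine from the JSON data for further processing with the SQL/JSON query functions.

The semantics of SQL/JSON path predicates and operators generally follow SQL. At the same time, to provide a more natural way of working with JSON data, SQL/JSON path syntax uses some JavaScript conventions:

Dot (.) is used for member access.
Square brackets ([]) are used for array access.
SQL/JSON arrays are 0-relative, unlike regular SQL arrays that start from 1.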
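
These conventions can be sketched with the jsonb_path_query function, assuming a server version that includes jsonpath support. The document and paths are arbitrary examples.

```sql
-- Member access with a dot, array access with brackets, 0-based index.
SELECT jsonb_path_query('{"a": [10, 20, 30]}', '$.a[0]');
-- returns one row: 10

-- A filter expression in parentheses applied to every array element.
SELECT jsonb_path_query('{"a": [10, 20, 30]}', '$.a[*] ? (@ > 15)');
-- returns two rows: 20 and 30
```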

A path expression consists of a sequence of path elements, which can be the following:

Path literals of JSON primitive types: Unicode text, numeric, true, false, or null.
Parentheses, which can be used to provide filter expressions or define the order of path evaluation.

Table 8.24. `jsonpath` Variables

Table 8.25. `jsonpath` Accessors

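8.5. Date/Time Types

PostgreSQL supports the full set of SQL date and time types, shown in Table 8.9. The operations available on these data types are described in Section 9.9.

Table 8.9. Date/Time Types

Note

The interval type has an additional option, which is to restrict the set of stored fields by writing one of these phrases: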

YEAR
MONTH
DAY
HOUR
MINUTE
SECOND
YEAR TO MONTH
DAY TO HOUR
DAY TO MINUTE
DAY TO SECOND
HOUR TO MINUTE
HOUR TO SECOND
MINUTE TO SECOND

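Note that if both fields and p are specified, the fields must include SECOND, since the precision applies only to the seconds.

The types abstime and reltime are lower precision types which are used internally. You are discouraged from using these types in applications; these internal types might disappear in a future release.

8.5.1. Date/Time Input

Remember that any date or time literal input needs to be enclosed in single quotes, like text strings. Refer to Section 4.1.2.7 for more information. SQL requires the following syntax: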

type [ (p) ] 'value'

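8.5.1.1. Dates

Table 8.10 shows some possible inputs for the date type.

Table 8.10. Date Input

8.5.1.2. Times

The time-of-day types are time [ (p) ] without time zone and time [ (p) ] with time zone. time alone is equivalent to time without time zone.

Table 8.11. Time Input

Table 8.12. Time Zone Input

Refer to Section 8.5.3 for more information on how to specify time zones.

8.5.1.3. Time Stamps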

1999-01-08 04:05:06

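and:

1999-01-08 04:05:06 -8:00

are valid values, which follow the ISO 8601 standard. In addition, the common format:

January 8 04:05:06 1999 PST

is supported.

The SQL standard differentiates timestamp without time zone and timestamp with time zone literals by the presence of a “+” or “-” symbol and time zone offset after the time. Hence, according to the standard,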

TIMESTAMP '2004-10-19 10:23:54'

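is a timestamp without time zone, while

TIMESTAMP '2004-10-19 10:23:54+02'

is a timestamp with time zone. PostgreSQL never examines the content of a literal string before determining its type, and therefore treats both of the above as timestamp without time zone. To ensure that a literal is treated as timestamp with time zone, give it the correct explicit type: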
TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02'
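
How PostgreSQL types these literals can be checked with pg_typeof. This is a small sketch using the literals from the text.

```sql
-- The offset inside the string does not change the declared type.
SELECT pg_typeof(TIMESTAMP '2004-10-19 10:23:54+02');
-- timestamp without time zone

-- Only the explicit type name yields a time-zone-aware value.
SELECT pg_typeof(TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02');
-- timestamp with time zone
```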

8.5.1.4. Special Values

Table 8.13. Special Date/Time Inputs

8.5.2. Date/Time Output

Table 8.14. Date/Time Output Styles

Note

Table 8.15. Date Order Conventions

The formatting function to_char (see Section 9.8) is also available as a more flexible way to format date/time output.

8.5.3. Time Zones

PostgreSQL endeavors to be compatible with the SQL standard definitions for typical usage. However, the SQL standard has an odd mix of date and time types and capabilities. Two obvious problems are:

Although the date type cannot have an associated time zone, the time type can. Time zones in the real world have little meaning unless associated with a date as well as a time, since the offset can vary through the year with daylight-saving time boundaries.
The default time zone is specified as a constant numeric offset from UTC. It is therefore impossible to adapt to daylight-saving time when doing date/time arithmetic across DST boundaries.

All timezone-aware dates and times are stored internally in UTC. They are converted to local time in the zone specified by the TimeZone configuration parameter before being displayed to the client.

PostgreSQL allows you to specify time zones in three different forms:

A full time zone name, for example America/New_York. The recognized time zone names are listed in the pg_timezone_names view (see Section 51.90). PostgreSQL uses the widely-used IANA time zone data for this purpose, so the same time zone names are also recognized by much other software.
A time zone abbreviation, for example PST. Such a specification merely defines a particular offset from UTC, in contrast to full time zone names which can imply a set of daylight savings transition-date rules as well. The recognized abbreviations are listed in the pg_timezone_abbrevs view (see Section 51.89). You cannot set the configuration parameters TimeZone or log_timezone to a time zone abbreviation, but you can use abbreviations in date/time input values and with the AT TIME ZONE operator.
In addition to the timezone names and abbreviations, PostgreSQL will accept POSIX-style time zone specifications of the form STDoffset or STDoffsetDST, where STD is a zone abbreviation, offset is a numeric offset in hours west from UTC, and DST is an optional daylight-savings zone abbreviation, assumed to stand for one hour ahead of the given offset. For example, if EST5EDT were not already a recognized zone name, it would be accepted and would be functionally equivalent to United States East Coast time. In this syntax, a zone abbreviation can be a string of letters, or an arbitrary string surrounded by angle brackets (<>). When a daylight-savings zone abbreviation is present, it is assumed to be used according to the same daylight-savings transition rules used in the IANA time zone database's posixrules entry. In a standard PostgreSQL installation, posixrules is the same as US/Eastern, so that POSIX-style time zone specifications follow USA daylight-savings rules. If needed, you can adjust this behavior by replacing the posixrules file.

In all cases, timezone names and abbreviations are recognized case-insensitively. (This is a change from PostgreSQL versions prior to 8.2, which were case-sensitive in some contexts but not others.)

The TimeZone configuration parameter can be set in the file postgresql.conf, or in any of the other standard ways described in Chapter 19. There are also some special ways to set it:

The SQL command SET TIME ZONE sets the time zone for the session. This is an alternative spelling of SET TIMEZONE TO with a more SQL-spec-compatible syntax.
The PGTZ environment variable is used by libpq clients to send a SET TIME ZONE command to the server upon connection.
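
The effect of the session time zone on display can be sketched as follows, assuming the default ISO DateStyle; the zone names and the timestamp are arbitrary examples.

```sql
SET TIME ZONE 'UTC';
SELECT TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02';
-- 2004-10-19 08:23:54+00

-- Same stored instant, displayed in another zone (daylight-saving time applies).
SET TIME ZONE 'America/New_York';
SELECT TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02';
-- 2004-10-19 04:23:54-04
```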

8.5.4. Interval Input

interval values can be written using the following verbose syntax:

[@] quantity unit [quantity unit...] [direction]

Interval values can also be written in the ISO 8601 “format with designators”:

P quantity unit [ quantity unit ...] [ T [ quantity unit ...]]

Table 8.16. ISO 8601 Interval Unit Abbreviations

In the alternative format:

P [ years-months-days ] [ T hours:minutes:seconds ]

the string must begin with P, and a T separates the date and time parts of the interval. The values are given as numbers similar to ISO 8601 dates.

Table 8.17 shows some examples of valid interval input.

Table 8.17. Interval Input
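
As a sketch, the same interval written in the verbose syntax and in the two ISO 8601 formats, assuming the default postgres output style:

```sql
SELECT INTERVAL '1 year 2 months 3 days 4 hours 5 minutes 6 seconds';  -- verbose syntax
SELECT INTERVAL 'P1Y2M3DT4H5M6S';                                      -- format with designators
SELECT INTERVAL 'P0001-02-03T04:05:06';                                -- alternative format
-- each of these displays 1 year 2 mons 3 days 04:05:06
```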

8.5.5. Interval Output

The output of the postgres style matches the output of PostgreSQL releases prior to 8.4 when the DateStyle parameter was set to ISO.

The output of the postgres_verbose style matches the output of PostgreSQL releases prior to 8.4 when the DateStyle parameter was set to non-ISO output.

The output of the iso_8601 style matches the “format with designators” described in section 4.4.3.2 of the ISO 8601 standard.

Table 8.18. Interval Output Style Examples

8.16. Composite Types

A composite type represents the structure of a row or record; it is essentially just a list of field names and their data types. PostgreSQL allows composite types to be used in many of the same ways that simple types can be used. For example, a column of a table can be declared to be of a composite type.

8.16.1. Declaration of Composite Types

Here are two simple examples of defining composite types:

CREATE TYPE complex AS (
    r       double precision,
    i       double precision
);

CREATE TYPE inventory_item AS (
    name            text,
    supplier_id     integer,
    price           numeric
);

The syntax is comparable to CREATE TABLE, except that only field names and types can be specified; no constraints (such as NOT NULL) can presently be included. Note that the AS keyword is essential; without it, the system will think a different kind of CREATE TYPE command is meant, and you will get odd syntax errors.

Having defined the types, we can use them to create tables:

CREATE TABLE on_hand (
    item      inventory_item,
    count     integer
);

INSERT INTO on_hand VALUES (ROW('fuzzy dice', 42, 1.99), 1000);

or functions:

CREATE FUNCTION price_extension(inventory_item, integer) RETURNS numeric
AS 'SELECT $1.price * $2' LANGUAGE SQL;

SELECT price_extension(item, 10) FROM on_hand;

Whenever you create a table, a composite type is also automatically created, with the same name as the table, to represent the table's row type. For example, had we said:

CREATE TABLE inventory_item (
    name            text,
    supplier_id     integer REFERENCES suppliers,
    price           numeric CHECK (price > 0)
);

then the same inventory_item composite type shown above would come into being as a byproduct, and could be used just as above. Note however an important restriction of the current implementation: since no constraints are associated with a composite type, the constraints shown in the table definition do not apply to values of the composite type outside the table. (To work around this, create a domain over the composite type, and apply the desired constraints as CHECK constraints of the domain.)
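
A minimal sketch of the domain workaround, assuming the inventory_item composite type already exists and a server version that allows domains over composite types. The domain name priced_item is hypothetical.

```sql
-- Attach the price rule to a domain over the composite type.
CREATE DOMAIN priced_item AS inventory_item
    CHECK ((VALUE).price > 0);

-- Values of the domain are checked wherever they are used.
SELECT ROW('fuzzy dice', 42, 1.99)::priced_item;  -- accepted
SELECT ROW('fuzzy dice', 42, -1)::priced_item;    -- fails the CHECK constraint
```

A column declared as priced_item then enforces the rule that a plain inventory_item column would not.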

8.16.2. Constructing Composite Values

To write a composite value as a literal constant, enclose the field values within parentheses and separate them by commas. You can put double quotes around any field value, and must do so if it contains commas or parentheses. (More details appear below.) Thus, the general format of a composite constant is the following:

'( val1 , val2 , ... )'

An example is:

'("fuzzy dice",42,1.99)'

which would be a valid value of the inventory_item type defined above. To make a field be NULL, write no characters at all in its position in the list. For example, this constant specifies a NULL third field:

'("fuzzy dice",42,)'

If you want an empty string rather than NULL, write double quotes:

'("",42,)'

Here the first field is a non-NULL empty string, the third is NULL.

(These constants are actually only a special case of the generic type constants discussed in Section 4.1.2.7. The constant is initially treated as a string and passed to the composite-type input conversion routine. An explicit type specification might be necessary to tell which type to convert the constant to.)

The ROW expression syntax can also be used to construct composite values. In most cases this is considerably simpler to use than the string-literal syntax since you don't have to worry about multiple layers of quoting. We already used this method above:

ROW('fuzzy dice', 42, 1.99)
ROW('', 42, NULL)

The ROW keyword is actually optional as long as you have more than one field in the expression, so these can be simplified to:

('fuzzy dice', 42, 1.99)
('', 42, NULL)

The ROW expression syntax is discussed in more detail in Section 4.2.13.

8.16.3. Accessing Composite Types

To access a field of a composite column, one writes a dot and the field name, much like selecting a field from a table name. In fact, it's so much like selecting from a table name that you often have to use parentheses to keep from confusing the parser. For example, you might try to select some subfields from our on_hand example table with something like:

SELECT item.name FROM on_hand WHERE item.price > 9.99;

This will not work since the name item is taken to be a table name, not a column name of on_hand, per SQL syntax rules. You must write it like this:

SELECT (item).name FROM on_hand WHERE (item).price > 9.99;

or if you need to use the table name as well (for instance in a multitable query), like this:

SELECT (on_hand.item).name FROM on_hand WHERE (on_hand.item).price > 9.99;

Now the parenthesized object is correctly interpreted as a reference to the item column, and then the subfield can be selected from it.

Similar syntactic issues apply whenever you select a field from a composite value. For instance, to select just one field from the result of a function that returns a composite value, you'd need to write something like:

SELECT (my_func(...)).field FROM ...

Without the extra parentheses, this will generate a syntax error.

The special field name * means “all fields”, as further explained in Section 8.16.5.

8.16.4. Modifying Composite Types

Here are some examples of the proper syntax for inserting and updating composite columns. First, inserting or updating a whole column:

INSERT INTO mytab (complex_col) VALUES((1.1,2.2));

UPDATE mytab SET complex_col = ROW(1.1,2.2) WHERE ...;

The first example omits ROW, the second uses it; we could have done it either way.

We can update an individual subfield of a composite column:

UPDATE mytab SET complex_col.r = (complex_col).r + 1 WHERE ...;

Notice here that we don't need to (and indeed cannot) put parentheses around the column name appearing just after SET, but we do need parentheses when referencing the same column in the expression to the right of the equal sign.

And we can specify subfields as targets for INSERT, too:

INSERT INTO mytab (complex_col.r, complex_col.i) VALUES(1.1, 2.2);

Had we not supplied values for all the subfields of the column, the remaining subfields would have been filled with null values.

8.16.5. Using Composite Types in Queries

There are various special syntax rules and behaviors associated with composite types in queries. These rules provide useful shortcuts, but can be confusing if you don't know the logic behind them.

In PostgreSQL, a reference to a table name (or alias) in a query is effectively a reference to the composite value of the table's current row. For example, if we had a table inventory_item as shown above, we could write:

SELECT c FROM inventory_item c;

This query produces a single composite-valued column, so we might get output like:

           c
------------------------
 ("fuzzy dice",42,1.99)
(1 row)

Note however that simple names are matched to column names before table names, so this example works only because there is no column named c in the query's tables.

The ordinary qualified-column-name syntax table_name.column_name can be understood as applying field selection to the composite value of the table's current row. (For efficiency reasons, it's not actually implemented that way.)

When we write

SELECT c.* FROM inventory_item c;

then, according to the SQL standard, we should get the contents of the table expanded into separate columns:

    name    | supplier_id | price
------------+-------------+-------
 fuzzy dice |          42 |  1.99
(1 row)

as if the query were

SELECT c.name, c.supplier_id, c.price FROM inventory_item c;

PostgreSQL will apply this expansion behavior to any composite-valued expression, although as shown above, you need to write parentheses around the value that .* is applied to whenever it's not a simple table name. For example, if myfunc() is a function returning a composite type with columns a, b, and c, then these two queries have the same result:

SELECT (myfunc(x)).* FROM some_table;
SELECT (myfunc(x)).a, (myfunc(x)).b, (myfunc(x)).c FROM some_table;

Tip

PostgreSQL handles column expansion by actually transforming the first form into the second. So, in this example, myfunc() would get invoked three times per row with either syntax. If it's an expensive function you may wish to avoid that, which you can do with a query like:

SELECT m.* FROM some_table, LATERAL myfunc(x) AS m;

Placing the function in a LATERAL FROM item keeps it from being invoked more than once per row. m.* is still expanded into m.a, m.b, m.c, but now those variables are just references to the output of the FROM item. (The LATERAL keyword is optional here, but we show it to clarify that the function is getting x from some_table.)

The composite_value.* syntax results in column expansion of this kind when it appears at the top level of a SELECT output list, a RETURNING list in INSERT/UPDATE/DELETE, a VALUES clause, or a row constructor. In all other contexts (including when nested inside one of those constructs), attaching .* to a composite value does not change the value, since it means “all columns” and so the same composite value is produced again. For example, if somefunc() accepts a composite-valued argument, these queries are the same:

SELECT somefunc(c.*) FROM inventory_item c;
SELECT somefunc(c) FROM inventory_item c;

In both cases, the current row of inventory_item is passed to the function as a single composite-valued argument. Even though .* does nothing in such cases, using it is good style, since it makes clear that a composite value is intended. In particular, the parser will consider c in c.* to refer to a table name or alias, not to a column name, so that there is no ambiguity; whereas without .*, it is not clear whether c means a table name or a column name, and in fact the column-name interpretation will be preferred if there is a column named c.

Another example demonstrating these concepts is that all these queries mean the same thing:

SELECT * FROM inventory_item c ORDER BY c;
SELECT * FROM inventory_item c ORDER BY c.*;
SELECT * FROM inventory_item c ORDER BY ROW(c.*);

All of these ORDER BY clauses specify the row's composite value, resulting in sorting the rows according to the rules described in Section 9.23.6. However, if inventory_item contained a column named c, the first case would be different from the others, as it would mean to sort by that column only. Given the column names previously shown, these queries are also equivalent to those above:

SELECT * FROM inventory_item c ORDER BY ROW(c.name, c.supplier_id, c.price);
SELECT * FROM inventory_item c ORDER BY (c.name, c.supplier_id, c.price);

(The last case uses a row constructor with the key word ROW omitted.)

Another special syntactical behavior associated with composite values is that we can use functional notation for extracting a field of a composite value. The simple way to explain this is that the notations field(table) and table.field are interchangeable. For example, these queries are equivalent:

SELECT c.name FROM inventory_item c WHERE c.price > 1000;
SELECT name(c) FROM inventory_item c WHERE price(c) > 1000;

Moreover, if we have a function that accepts a single argument of a composite type, we can call it with either notation. These queries are all equivalent:

SELECT somefunc(c) FROM inventory_item c;
SELECT somefunc(c.*) FROM inventory_item c;
SELECT c.somefunc FROM inventory_item c;

This equivalence between functional notation and field notation makes it possible to use functions on composite types to implement “computed fields”. An application using the last query above wouldn't need to be directly aware that somefunc isn't a real column of the table.
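
A computed field of this kind might be sketched as follows. The function name price_with_markup and the 10% factor are hypothetical, and the sketch assumes inventory_item exists as a table with a numeric price column, as in the earlier examples:

```sql
-- Hypothetical function taking a whole inventory_item row as its argument.
CREATE FUNCTION price_with_markup(inventory_item) RETURNS numeric AS
'SELECT $1.price * 1.10' LANGUAGE sql IMMUTABLE;

-- Field notation: the function looks like an ordinary column of the table.
SELECT c.name, c.price_with_markup FROM inventory_item c;

-- Functional notation: the same call written explicitly.
SELECT c.name, price_with_markup(c) FROM inventory_item c;
```

Note that SELECT c.* does not include such a computed field; it has to be named explicitly.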

Tip

Because of this behavior, it's unwise to give a function that takes a single composite-type argument the same name as any of the fields of that composite type. If there is ambiguity, the field-name interpretation will be chosen if field-name syntax is used, while the function will be chosen if function-call syntax is used. However, PostgreSQL versions before 11 always chose the field-name interpretation, unless the syntax of the call required it to be a function call. One way to force the function interpretation in older versions is to schema-qualify the function name, that is, write schema.func(compositevalue).

8.16.6. Composite Type Input and Output Syntax

The external text representation of a composite value consists of items that are interpreted according to the I/O conversion rules for the individual field types, plus decoration that indicates the composite structure. The decoration consists of parentheses (( and )) around the whole value, plus commas (,) between adjacent items. Whitespace outside the parentheses is ignored, but within the parentheses it is considered part of the field value, and might or might not be significant depending on the input conversion rules for the field data type. For example, in:

'(  42)'

the whitespace will be ignored if the field type is integer, but not if it is text.

As shown previously, when writing a composite value you can write double quotes around any individual field value. You must do so if the field value would otherwise confuse the composite-value parser. In particular, fields containing parentheses, commas, double quotes, or backslashes must be double-quoted. To put a double quote or backslash in a quoted composite field value, precede it with a backslash. (Also, a pair of double quotes within a double-quoted field value is taken to represent a double quote character, analogously to the rules for single quotes in SQL literal strings.) Alternatively, you can avoid quoting and use backslash-escaping to protect all data characters that would otherwise be taken as composite syntax.

A completely empty field value (no characters at all between the commas or parentheses) represents a NULL. To write a value that is an empty string rather than NULL, write "".

The composite output routine will put double quotes around field values if they are empty strings or contain parentheses, commas, double quotes, backslashes, or white space. (Doing so for white space is not essential, but aids legibility.) Double quotes and backslashes embedded in field values will be doubled.
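
The quoting rules above can be observed by displaying an anonymous row value. This is only an illustrative query; the field values are arbitrary:

```sql
-- A null, an empty string, a value with white space, a value with a comma,
-- and a value with embedded double quotes.
SELECT ROW(NULL::text, ''::text, 'a b'::text, 'x,y'::text, 'say "hi"'::text);
-- Displayed as: (,"","a b","x,y","say ""hi""")
```

The null field appears as nothing at all between the delimiters, while the empty string appears as "".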

Note

Remember that what you write in an SQL command will first be interpreted as a string literal, and then as a composite. This doubles the number of backslashes you need (assuming escape string syntax is used). For example, to insert a text field containing a double quote and a backslash in a composite value, you'd need to write:

INSERT ... VALUES ('("\"\\")');

The string-literal processor removes one level of backslashes, so that what arrives at the composite-value parser looks like ("\"\\"). In turn, the string fed to the text data type's input routine becomes "\. (If we were working with a data type whose input routine also treated backslashes specially, bytea for example, we might need as many as eight backslashes in the command to get one backslash into the stored composite field.) Dollar quoting (see Section 4.1.2.4) can be used to avoid the need to double backslashes.
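
As a small sketch of the dollar-quoting alternative, the following uses a hypothetical single-field composite type named quote_demo. Only the escaping required by the composite-value parser is written, because a dollar-quoted string is passed through unchanged:

```sql
-- Hypothetical type, for illustration only.
CREATE TYPE quote_demo AS (t text);

-- The composite-value parser sees ("\"\\") and stores the two characters "\
SELECT ($$("\"\\")$$::quote_demo).t AS field_value,
       length(($$("\"\\")$$::quote_demo).t) AS field_length;
-- field_length is 2
```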

Tip

The ROW constructor syntax is usually easier to work with than the composite-literal syntax when writing composite values in SQL commands. In ROW, individual field values are written the same way they would be written when not members of a composite.

8.17. Range Types

Version: 11

Range types are data types representing a range of values of some element type (called the range's subtype). For instance, ranges of timestamp might be used to represent the ranges of time that a meeting room is reserved. In this case the data type is tsrange (short for “timestamp range”), and timestamp is the subtype. The subtype must have a total order so that it is well-defined whether element values are within, before, or after a range of values.

Range types are useful because they represent many element values in a single range value, and because concepts such as overlapping ranges can be expressed clearly. The use of time and date ranges for scheduling purposes is the clearest example; but price ranges, measurement ranges from an instrument, and so forth can also be useful.

8.17.1. Built-in Range Types

PostgreSQL comes with the following built-in range types:

int4range — Range of integer
int8range — Range of bigint
numrange — Range of numeric
tsrange — Range of timestamp without time zone
tstzrange — Range of timestamp with time zone
daterange — Range of date

In addition, you can define your own range types; see CREATE TYPE for more information.

8.17.2. Examples

CREATE TABLE reservation (room int, during tsrange);
INSERT INTO reservation VALUES
    (1108, '[2010-01-01 14:30, 2010-01-01 15:30)');

-- Containment
SELECT int4range(10, 20) @> 3;

-- Overlaps
SELECT numrange(11.1, 22.2) && numrange(20.0, 30.0);

-- Extract the upper bound
SELECT upper(int8range(15, 25));

-- Compute the intersection
SELECT int4range(10, 20) * int4range(15, 25);

-- Is the range empty?
SELECT isempty(numrange(1, 5));

See Table 9.53 and Table 9.54 for complete lists of operators and functions on range types.

8.17.3. Inclusive and Exclusive Bounds

Every non-empty range has two bounds, the lower bound and the upper bound. All points between these values are included in the range. An inclusive bound means that the boundary point itself is included in the range as well, while an exclusive bound means that the boundary point is not included in the range.

In the text form of a range, an inclusive lower bound is represented by “[” while an exclusive lower bound is represented by “(”. Likewise, an inclusive upper bound is represented by “]”, while an exclusive upper bound is represented by “)”. (See Section 8.17.5 for more details.)

The functions lower_inc and upper_inc test the inclusivity of the lower and upper bounds of a range value, respectively.
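
For example, using numrange literals (a continuous range type, so the bounds are kept exactly as written):

```sql
SELECT lower_inc('[1.1,2.2)'::numrange) AS lower_inclusive,
       upper_inc('[1.1,2.2)'::numrange) AS upper_inclusive;
-- lower_inclusive = t, upper_inclusive = f

SELECT lower_inc('(1.1,2.2]'::numrange) AS lower_inclusive,
       upper_inc('(1.1,2.2]'::numrange) AS upper_inclusive;
-- lower_inclusive = f, upper_inclusive = t
```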

8.17.4. Infinite (Unbounded) Ranges

The lower bound of a range can be omitted, meaning that all values less than the upper bound are included in the range, e.g., (,3]. Likewise, if the upper bound of the range is omitted, then all values greater than the lower bound are included in the range. If both lower and upper bounds are omitted, all values of the element type are considered to be in the range. A missing bound that is specified as inclusive is automatically converted to exclusive, e.g., [,] is converted to (,). You can think of these missing values as +/-infinity, but they are special range type values and are considered to be beyond any range element type's +/-infinity values.

Element types that have the notion of “infinity” can use it as an explicit bound value. For example, with timestamp ranges, [today,infinity) excludes the special timestamp value infinity, while [today,infinity] includes it, as do [today,) and [today,].

The functions lower_inf and upper_inf test for infinite lower and upper bounds of a range, respectively.
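
The following queries sketch the difference between an omitted bound and an explicit infinity value of the element type. The dates are arbitrary:

```sql
-- An omitted lower bound is infinite; the given upper bound is not.
SELECT lower_inf('(,3]'::int4range) AS lower_infinite,
       upper_inf('(,3]'::int4range) AS upper_infinite;
-- lower_infinite = t, upper_infinite = f

-- Missing bounds written as inclusive are converted to exclusive.
SELECT '[,]'::int4range;
-- Displayed as: (,)

-- An explicit infinity is an ordinary element value, not a missing bound.
SELECT upper_inf('[2010-01-01,infinity]'::tsrange) AS explicit_infinity,
       upper_inf('[2010-01-01,)'::tsrange) AS omitted_bound;
-- explicit_infinity = f, omitted_bound = t
```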

8.17.5. Range Input/Output

The input for a range value must follow one of the following patterns:

(lower-bound,upper-bound)
(lower-bound,upper-bound]
[lower-bound,upper-bound)
[lower-bound,upper-bound]
empty

The parentheses or brackets indicate whether the lower and upper bounds are exclusive or inclusive, as described previously. Notice that the final pattern is empty, which represents an empty range (a range that contains no points).

The lower-bound may be either a string that is valid input for the subtype, or empty to indicate no lower bound. Likewise, upper-bound may be either a string that is valid input for the subtype, or empty to indicate no upper bound.

Each bound value can be quoted using " (double quote) characters. This is necessary if the bound value contains parentheses, brackets, commas, double quotes, or backslashes, since these characters would otherwise be taken as part of the range syntax. To put a double quote or backslash in a quoted bound value, precede it with a backslash. (Also, a pair of double quotes within a double-quoted bound value is taken to represent a double quote character, analogously to the rules for single quotes in SQL literal strings.) Alternatively, you can avoid quoting and use backslash-escaping to protect all data characters that would otherwise be taken as range syntax. Also, to write a bound value that is an empty string, write "", since writing nothing means an infinite bound.

Whitespace is allowed before and after the range value, but any whitespace between the parentheses or brackets is taken as part of the lower or upper bound value. (Depending on the element type, it might or might not be significant.)

Note

These rules are very similar to those for writing field values in composite-type literals. See Section 8.16.6 for additional commentary.

Examples:

-- includes 3, does not include 7, and does include all points in between
SELECT '[3,7)'::int4range;

-- does not include either 3 or 7, but includes all points in between
SELECT '(3,7)'::int4range;

-- includes only the single point 4
SELECT '[4,4]'::int4range;

-- includes no points (and will be normalized to 'empty')
SELECT '[4,4)'::int4range;

8.17.6. Constructing Ranges

Each range type has a constructor function with the same name as the range type. Using the constructor function is frequently more convenient than writing a range literal constant, since it avoids the need for extra quoting of the bound values. The constructor function accepts two or three arguments. The two-argument form constructs a range in standard form (lower bound inclusive, upper bound exclusive), while the three-argument form constructs a range with bounds of the form specified by the third argument. The third argument must be one of the strings “()”, “(]”, “[)”, or “[]”. For example:

-- The full form is: lower bound, upper bound, and text argument indicating
-- inclusivity/exclusivity of bounds.
SELECT numrange(1.0, 14.0, '(]');

-- If the third argument is omitted, '[)' is assumed.
SELECT numrange(1.0, 14.0);

-- Although '(]' is specified here, on display the value will be converted to
-- canonical form, since int8range is a discrete range type (see below).
SELECT int8range(1, 14, '(]');

-- Using NULL for either bound causes the range to be unbounded on that side.
SELECT numrange(NULL, 2.2);

8.17.7. Discrete Range Types

A discrete range is one whose element type has a well-defined “step”, such as integer or date. In these types, two elements can be said to be adjacent when there are no valid values between them. This contrasts with continuous ranges, where it's always (or almost always) possible to identify other element values between two given values. For example, a range over the numeric type is continuous, as is a range over timestamp. (Even though timestamp has limited precision, and so could theoretically be treated as discrete, it's better to consider it continuous since the step size is normally not of interest.)

Another way to think about a discrete range type is that there is a clear idea of a “next” or “previous” value for each element value. Knowing that, it is possible to convert between inclusive and exclusive representations of a range's bounds, by choosing the next or previous element value instead of the one originally given. For example, in an integer range type [4,8] and (3,9) denote the same set of values; but this would not be so for a range over numeric.

A discrete range type should have a canonicalization function that is aware of the desired step size for the element type. The canonicalization function is charged with converting equivalent values of the range type to have identical representations, in particular consistently inclusive or exclusive bounds. If a canonicalization function is not specified, then ranges with different formatting will always be treated as unequal, even though they might represent the same set of values in reality.

The built-in range types int4range, int8range, and daterange all use a canonical form that includes the lower bound and excludes the upper bound; that is, [). User-defined range types can use other conventions, however.
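
The effect of canonicalization can be seen by comparing a discrete range type with a continuous one:

```sql
-- Discrete: both spellings are converted to the same canonical [) form.
SELECT '[4,8]'::int4range;
-- Displayed as: [4,9)
SELECT '[4,8]'::int4range = '(3,9)'::int4range;
-- t

-- Continuous: numrange has no canonicalization step, so these differ.
SELECT '[4,8]'::numrange = '(3,9)'::numrange;
-- f

-- daterange is discrete as well, with a step of one day.
SELECT '[2010-01-01,2010-01-03]'::daterange;
-- Displayed as: [2010-01-01,2010-01-04) with the default ISO date style
```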

8.17.8. Defining New Range Types

Users can define their own range types. The most common reason to do this is to use ranges over subtypes not provided among the built-in range types. For example, to define a new range type of subtype float8:

CREATE TYPE floatrange AS RANGE (
    subtype = float8,
    subtype_diff = float8mi
);

SELECT '[1.234, 5.678]'::floatrange;

Because float8 has no meaningful “step”, we do not define a canonicalization function in this example.

Defining your own range type also allows you to specify a different subtype B-tree operator class or collation to use, so as to change the sort ordering that determines which values fall into a given range.

If the subtype is considered to have discrete rather than continuous values, the CREATE TYPE command should specify a canonical function. The canonicalization function takes an input range value, and must return an equivalent range value that may have different bounds and formatting. The canonical output for two ranges that represent the same set of values, for example the integer ranges [1, 7] and [1, 8), must be identical. It doesn't matter which representation you choose to be the canonical one, so long as two equivalent values with different formattings are always mapped to the same value with the same formatting. In addition to adjusting the inclusive/exclusive bounds format, a canonicalization function might round off boundary values, in case the desired step size is larger than what the subtype is capable of storing. For instance, a range type over timestamp could be defined to have a step size of an hour, in which case the canonicalization function would need to round off bounds that weren't a multiple of an hour, or perhaps throw an error instead.

In addition, any range type that is meant to be used with GiST or SP-GiST indexes should define a subtype difference, or subtype_diff, function. (The index will still work without subtype_diff, but it is likely to be considerably less efficient than if a difference function is provided.) The subtype difference function takes two input values of the subtype, and returns their difference (i.e., X minus Y) represented as a float8 value. In our example above, the function float8mi that underlies the regular float8 minus operator can be used; but for any other subtype, some type conversion would be necessary. Some creative thought about how to represent differences as numbers might be needed, too. To the greatest extent possible, the subtype_diff function should agree with the sort ordering implied by the selected operator class and collation; that is, its result should be positive whenever its first argument is greater than its second according to the sort ordering.

A less-oversimplified example of a subtype_diff function is:

CREATE FUNCTION time_subtype_diff(x time, y time) RETURNS float8 AS
'SELECT EXTRACT(EPOCH FROM (x - y))' LANGUAGE sql STRICT IMMUTABLE;

CREATE TYPE timerange AS RANGE (
    subtype = time,
    subtype_diff = time_subtype_diff
);

SELECT '[11:10, 23:00]'::timerange;

See CREATE TYPE for more information about creating range types.

8.17.9. Indexing

GiST and SP-GiST indexes can be created for table columns of range types. For instance, to create a GiST index:

CREATE INDEX reservation_idx ON reservation USING GIST (during);

A GiST or SP-GiST index can accelerate queries involving these range operators: =, &&, <@, @>, <<, >>, -|-, &<, and &> (see Table 9.53 for more information).
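
As an illustration, queries of the following shape can use such an index. This sketch assumes the reservation table and reservation_idx index from the examples above; the second index name is hypothetical. Whether the planner actually chooses the index depends on table size and statistics:

```sql
-- Overlap: reservations that intersect a given hour.
SELECT * FROM reservation
WHERE during && tsrange('2010-01-01 14:00', '2010-01-01 15:00');

-- Containment: reservations covering a single point in time.
SELECT * FROM reservation
WHERE during @> '2010-01-01 14:45'::timestamp;

-- An SP-GiST index is created in the same way.
CREATE INDEX reservation_spgist_idx ON reservation USING SPGIST (during);
```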

In addition, B-tree and hash indexes can be created for table columns of range types. For these index types, basically the only useful range operation is equality. There is a B-tree sort ordering defined for range values, with corresponding < and > operators, but the ordering is rather arbitrary and not usually useful in the real world. Range types' B-tree and hash support is primarily meant to allow sorting and hashing internally in queries, rather than creation of actual indexes.

8.17.10. Constraints on Ranges

While UNIQUE is a natural constraint for scalar values, it is usually unsuitable for range types. Instead, an exclusion constraint is often more appropriate (see CREATE TABLE ... CONSTRAINT ... EXCLUDE). Exclusion constraints allow the specification of constraints such as “non-overlapping” on a range type. For example:

CREATE TABLE reservation (
    during tsrange,
    EXCLUDE USING GIST (during WITH &&)
);

That constraint will prevent any overlapping values from existing in the table at the same time:

INSERT INTO reservation VALUES
    ('[2010-01-01 11:30, 2010-01-01 15:00)');
INSERT 0 1

INSERT INTO reservation VALUES
    ('[2010-01-01 14:45, 2010-01-01 15:45)');
ERROR:  conflicting key value violates exclusion constraint "reservation_during_excl"
DETAIL:  Key (during)=(["2010-01-01 14:45:00","2010-01-01 15:45:00")) conflicts
with existing key (during)=(["2010-01-01 11:30:00","2010-01-01 15:00:00")).

You can use the btree_gist extension to define exclusion constraints on plain scalar data types, which can then be combined with range exclusions for maximum flexibility. For example, after btree_gist is installed, the following constraint will reject overlapping ranges only if the meeting room numbers are equal:

CREATE EXTENSION btree_gist;
CREATE TABLE room_reservation (
    room text,
    during tsrange,
    EXCLUDE USING GIST (room WITH =, during WITH &&)
);

INSERT INTO room_reservation VALUES
    ('123A', '[2010-01-01 14:00, 2010-01-01 15:00)');
INSERT 0 1

INSERT INTO room_reservation VALUES
    ('123A', '[2010-01-01 14:30, 2010-01-01 15:30)');
ERROR:  conflicting key value violates exclusion constraint "room_reservation_room_during_excl"
DETAIL:  Key (room, during)=(123A, ["2010-01-01 14:30:00","2010-01-01 15:30:00")) conflicts
with existing key (room, during)=(123A, ["2010-01-01 14:00:00","2010-01-01 15:00:00")).

INSERT INTO room_reservation VALUES
    ('123B', '[2010-01-01 14:30, 2010-01-01 15:30)');
INSERT 0 1

8.15. Arrays

PostgreSQL allows columns of a table to be defined as variable-length multidimensional arrays. Arrays of any built-in or user-defined base type, enum type, composite type, range type, or domain can be created.

8.15.1. Declaration of Array Types

To illustrate the use of array types, we create this table:

CREATE TABLE sal_emp (
    name            text,
    pay_by_quarter  integer[],
    schedule        text[][]
);

As shown, an array data type is named by appending square brackets ([]) to the data type name of the array elements. The above command will create a table named sal_emp with a column of type text (name), a one-dimensional array of type integer (pay_by_quarter), which represents the employee's salary by quarter, and a two-dimensional array of text (schedule), which represents the employee's weekly schedule.

The syntax for CREATE TABLE allows the exact size of arrays to be specified, for example:

CREATE TABLE tictactoe (
    squares   integer[3][3]
);

However, the current implementation ignores any supplied array size limits, i.e., the behavior is the same as for arrays of unspecified length.

The current implementation does not enforce the declared number of dimensions either. Arrays of a particular element type are all considered to be of the same type, regardless of size or number of dimensions. So, declaring the array size or number of dimensions in CREATE TABLE is simply documentation; it does not affect run-time behavior.
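
A short sketch, assuming the tictactoe table defined above, shows that neither the declared size nor the declared number of dimensions is enforced:

```sql
-- A one-dimensional, five-element array is accepted by a column
-- declared as integer[3][3].
INSERT INTO tictactoe VALUES ('{1,2,3,4,5}');

SELECT array_dims(squares) FROM tictactoe;
-- [1:5]
```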

An alternative syntax, which conforms to the SQL standard by using the keyword ARRAY, can be used for one-dimensional arrays. pay_by_quarter could have been defined as:

    pay_by_quarter  integer ARRAY[4],

Or, if no array size is to be specified:

    pay_by_quarter  integer ARRAY,

As before, however, PostgreSQL does not enforce the size restriction in any case.

8.15.2. Array Value Input

To write an array value as a literal constant, enclose the element values within curly braces and separate them by commas. (If you know C, this is not unlike the C syntax for initializing structures.) You can put double quotes around any element value, and must do so if it contains commas or curly braces. (More details appear below.) Thus, the general format of an array constant is the following:

'{ val1 delim val2 delim ... }'

where delim is the delimiter character for the type, as recorded in its pg_type entry. Among the standard data types provided in the PostgreSQL distribution, all use a comma (,), except for type box which uses a semicolon (;). Each val is either a constant of the array element type, or a subarray. An example of an array constant is:

'{{1,2,3},{4,5,6},{7,8,9}}'

This constant is a two-dimensional, 3-by-3 array consisting of three subarrays of integers.
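
For type box, whose delimiter is a semicolon, the commas inside each element are left to the box input routine. The following is a minimal sketch with arbitrary coordinates:

```sql
-- Two boxes separated by a semicolon; the commas belong to the box values.
SELECT '{(1,1),(0,0);(3,3),(2,2)}'::box[] AS boxes,
       array_length('{(1,1),(0,0);(3,3),(2,2)}'::box[], 1) AS box_count;
```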

To set an element of an array constant to NULL, write NULL for the element value. (Any upper- or lower-case variant of NULL will do.) If you want an actual string value “NULL”, you must put double quotes around it.
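
The difference between a null element and the string “NULL” can be checked with a query such as this one (the array contents are arbitrary):

```sql
SELECT a[2] IS NULL AS second_is_null,
       a[3] IS NULL AS third_is_null,
       a[3]         AS third_value
FROM (SELECT '{a,NULL,"NULL"}'::text[] AS a) AS t;
-- second_is_null = t, third_is_null = f, third_value is the four-character string NULL
```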

(These kinds of array constants are actually only a special case of the generic type constants discussed in Section 4.1.2.7. The constant is initially treated as a string and passed to the array input conversion routine. An explicit type specification might be necessary.)

Now we can show some INSERT statements:

INSERT INTO sal_emp
    VALUES ('Bill',
    '{10000, 10000, 10000, 10000}',
    '{{"meeting", "lunch"}, {"training", "presentation"}}');

INSERT INTO sal_emp
    VALUES ('Carol',
    '{20000, 25000, 25000, 25000}',
    '{{"breakfast", "consulting"}, {"meeting", "lunch"}}');

The result of the previous two inserts looks like this:

SELECT * FROM sal_emp;
 name  |      pay_by_quarter       |                 schedule
-------+---------------------------+-------------------------------------------
 Bill  | {10000,10000,10000,10000} | {{meeting,lunch},{training,presentation}}
 Carol | {20000,25000,25000,25000} | {{breakfast,consulting},{meeting,lunch}}
(2 rows)

Multidimensional arrays must have matching extents for each dimension. A mismatch causes an error, for example:

INSERT INTO sal_emp
    VALUES ('Bill',
    '{10000, 10000, 10000, 10000}',
    '{{"meeting", "lunch"}, {"meeting"}}');
ERROR:  multidimensional arrays must have array expressions with matching dimensions

The ARRAY constructor syntax can also be used:

INSERT INTO sal_emp
    VALUES ('Bill',
    ARRAY[10000, 10000, 10000, 10000],
    ARRAY[['meeting', 'lunch'], ['training', 'presentation']]);

INSERT INTO sal_emp
    VALUES ('Carol',
    ARRAY[20000, 25000, 25000, 25000],
    ARRAY[['breakfast', 'consulting'], ['meeting', 'lunch']]);

Notice that the array elements are ordinary SQL constants or expressions; for instance, string literals are single quoted, instead of double quoted as they would be in an array literal. The ARRAY constructor syntax is discussed in more detail in Section 4.2.12.

8.15.3. Accessing Arrays

Now, we can run some queries on the table. First, we show how to access a single element of an array. This query retrieves the names of the employees whose pay changed in the second quarter:

SELECT name FROM sal_emp WHERE pay_by_quarter[1] <> pay_by_quarter[2];

 name
-------
 Carol
(1 row)

The array subscript numbers are written within square brackets. By default PostgreSQL uses a one-based numbering convention for arrays, that is, an array of n elements starts with array[1] and ends with array[n].

This query retrieves the third quarter pay of all employees:

SELECT pay_by_quarter[3] FROM sal_emp;

 pay_by_quarter
----------------
          10000
          25000
(2 rows)

We can also access arbitrary rectangular slices of an array, or subarrays. An array slice is denoted by writing lower-bound:upper-bound for one or more array dimensions. For example, this query retrieves the first item on Bill's schedule for the first two days of the week:

SELECT schedule[1:2][1:1] FROM sal_emp WHERE name = 'Bill';

        schedule
------------------------
 {{meeting},{training}}
(1 row)

If any dimension is written as a slice, i.e., contains a colon, then all dimensions are treated as slices. Any dimension that has only a single number (no colon) is treated as being from 1 to the number specified. For example, [2] is treated as [1:2], as in this example:

SELECT schedule[1:2][2] FROM sal_emp WHERE name = 'Bill';

                 schedule
-------------------------------------------
 {{meeting,lunch},{training,presentation}}
(1 row)

To avoid confusion with the non-slice case, it's best to use slice syntax for all dimensions, e.g., [1:2][1:1], not [2][1:1].

It is possible to omit the lower-bound and/or upper-bound of a slice specifier; the missing bound is replaced by the lower or upper limit of the array's subscripts. For example:

SELECT schedule[:2][2:] FROM sal_emp WHERE name = 'Bill';

         schedule
--------------------------
 {{lunch},{presentation}}
(1 row)

SELECT schedule[:][1:1] FROM sal_emp WHERE name = 'Bill';

        schedule
------------------------
 {{meeting},{training}}
(1 row)

An array subscript expression will return null if either the array itself or any of the subscript expressions are null. Also, null is returned if a subscript is outside the array bounds (this case does not raise an error). For example, if schedule currently has the dimensions [1:3][1:2] then referencing schedule[3][3] yields NULL. Similarly, an array reference with the wrong number of subscripts yields a null rather than an error.

An array slice expression likewise yields null if the array itself or any of the subscript expressions are null. However, in other cases such as selecting an array slice that is completely outside the current array bounds, a slice expression yields an empty (zero-dimensional) array instead of null. (This does not match non-slice behavior and is done for historical reasons.) If the requested slice partially overlaps the array bounds, then it is silently reduced to just the overlapping region instead of returning null.
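
These rules can be illustrated with a small array constructor. The values are arbitrary:

```sql
-- Single-element subscript outside the bounds: null, not an error.
SELECT (ARRAY[1,2,3])[5] IS NULL;
-- t

-- Wrong number of subscripts: also null.
SELECT (ARRAY[1,2,3])[1][1] IS NULL;
-- t

-- Slice completely outside the bounds: an empty array.
SELECT (ARRAY[1,2,3])[5:6];
-- {}

-- Slice partially overlapping the bounds: reduced to the overlap.
SELECT (ARRAY[1,2,3])[2:10];
-- {2,3}
```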

The current dimensions of any array value can be retrieved with the array_dims function:

SELECT array_dims(schedule) FROM sal_emp WHERE name = 'Carol';

 array_dims
------------
 [1:2][1:2]
(1 row)

array_dims produces a text result, which is convenient for people to read but perhaps inconvenient for programs. Dimensions can also be retrieved with array_upper and array_lower, which return the upper and lower bound of a specified array dimension, respectively:

SELECT array_upper(schedule, 1) FROM sal_emp WHERE name = 'Carol';

 array_upper
-------------
           2
(1 row)
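
For an array whose lower bound is not one, array_lower and array_upper report the actual subscript range. The following sketch uses an array literal with explicit bounds (syntax described in Section 8.15.6) in place of a table column:

```sql
SELECT array_lower(a, 1) AS lo,    -- 0
       array_upper(a, 1) AS hi,    -- 2
       array_dims(a)     AS dims   -- [0:2]
FROM (SELECT '[0:2]={10,20,30}'::integer[] AS a) AS t;
```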

array_length will return the length of a specified array dimension:

SELECT array_length(schedule, 1) FROM sal_emp WHERE name = 'Carol';

 array_length
--------------
            2
(1 row)

cardinality returns the total number of elements in an array across all dimensions. It is effectively the number of rows a call to unnest would yield:

SELECT cardinality(schedule) FROM sal_emp WHERE name = 'Carol';

 cardinality
-------------
           4
(1 row)
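
As a small illustration of how cardinality relates to unnest and array_length, the following query uses a two-by-two array constant instead of Carol's schedule (an assumption for self-containment):

```sql
SELECT cardinality(a)                   AS total_elements,  -- 4
       (SELECT count(*) FROM unnest(a)) AS unnest_rows,     -- 4
       array_length(a, 1)               AS outer_length     -- 2
FROM (SELECT ARRAY[[1,2],[3,4]] AS a) AS t;
```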

8.15.4. Modifying Arrays

An array value can be replaced completely:

UPDATE sal_emp SET pay_by_quarter = '{25000,25000,27000,27000}'
    WHERE name = 'Carol';

or using the ARRAY expression syntax:

UPDATE sal_emp SET pay_by_quarter = ARRAY[25000,25000,27000,27000]
    WHERE name = 'Carol';

An array can also be updated at a single element:

UPDATE sal_emp SET pay_by_quarter[4] = 15000
    WHERE name = 'Bill';

or updated in a slice:

UPDATE sal_emp SET pay_by_quarter[1:2] = '{27000,27000}'
    WHERE name = 'Carol';

The slice syntaxes with omitted lower-bound and/or upper-bound can be used too, but only when updating an array value that is not NULL or zero-dimensional (otherwise, there is no existing subscript limit to substitute).

A stored array value can be enlarged by assigning to elements not already present. Any positions between those previously present and the newly assigned elements will be filled with nulls. For example, if array myarray currently has four elements, it will have six elements after an update that assigns to myarray[6]; myarray[5] will contain null. Currently, enlargement in this fashion is only allowed for one-dimensional arrays, not multidimensional arrays.
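
A minimal sketch of this enlargement, assuming a throwaway temporary table named demo_arr (a hypothetical name, not part of the original examples):

```sql
-- Hypothetical scratch table holding a four-element array.
CREATE TEMP TABLE demo_arr (myarray integer[]);
INSERT INTO demo_arr VALUES ('{1,2,3,4}');

-- Assigning past the current upper bound enlarges the array.
UPDATE demo_arr SET myarray[6] = 60;

SELECT myarray FROM demo_arr;   -- {1,2,3,4,NULL,60}
```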

Subscripted assignment allows creation of arrays that do not use one-based subscripts. For example, one might assign to myarray[-2:7] to create an array with subscript values from -2 to 7.
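
The following sketch shows one way this might look, assuming a hypothetical temporary table demo_bounds whose array column starts out NULL. Both slice bounds are written out, since omitted bounds cannot be used on a NULL array:

```sql
-- Hypothetical scratch table; the column is initially NULL.
CREATE TEMP TABLE demo_bounds (myarray integer[]);
INSERT INTO demo_bounds VALUES (NULL);

-- Ten source values fill subscripts -2 through 7.
UPDATE demo_bounds SET myarray[-2:7] = '{1,2,3,4,5,6,7,8,9,10}';

SELECT array_dims(myarray) AS dims,          -- [-2:7]
       myarray[-2]         AS first_element  -- 1
FROM demo_bounds;
```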

New array values can also be constructed using the concatenation operator, ||:

SELECT ARRAY[1,2] || ARRAY[3,4];
 ?column?
-----------
 {1,2,3,4}
(1 row)

SELECT ARRAY[5,6] || ARRAY[[1,2],[3,4]];
      ?column?
---------------------
 {{5,6},{1,2},{3,4}}
(1 row)

The concatenation operator allows a single element to be pushed onto the beginning or end of a one-dimensional array. It also accepts two N-dimensional arrays, or an N-dimensional and an N+1-dimensional array.

When a single element is pushed onto either the beginning or end of a one-dimensional array, the result is an array with the same lower bound subscript as the array operand. For example:

SELECT array_dims(1 || '[0:1]={2,3}'::int[]);
 array_dims
------------
 [0:2]
(1 row)

SELECT array_dims(ARRAY[1,2] || 3);
 array_dims
------------
 [1:3]
(1 row)

When two arrays with an equal number of dimensions are concatenated, the result retains the lower bound subscript of the left-hand operand's outer dimension. The result is an array comprising every element of the left-hand operand followed by every element of the right-hand operand. For example:

SELECT array_dims(ARRAY[1,2] || ARRAY[3,4,5]);
 array_dims
------------
 [1:5]
(1 row)

SELECT array_dims(ARRAY[[1,2],[3,4]] || ARRAY[[5,6],[7,8],[9,0]]);
 array_dims
------------
 [1:5][1:2]
(1 row)

When an N-dimensional array is pushed onto the beginning or end of an N+1-dimensional array, the result is analogous to the element-array case above. Each N-dimensional sub-array is essentially an element of the N+1-dimensional array's outer dimension. For example:

SELECT array_dims(ARRAY[1,2] || ARRAY[[3,4],[5,6]]);
 array_dims
------------
 [1:3][1:2]
(1 row)

An array can also be constructed by using the functions array_prepend, array_append, or array_cat. The first two only support one-dimensional arrays, but array_cat supports multidimensional arrays. Some examples:

SELECT array_prepend(1, ARRAY[2,3]);
 array_prepend
---------------
 {1,2,3}
(1 row)

SELECT array_append(ARRAY[1,2], 3);
 array_append
--------------
 {1,2,3}
(1 row)

SELECT array_cat(ARRAY[1,2], ARRAY[3,4]);
 array_cat
-----------
 {1,2,3,4}
(1 row)

SELECT array_cat(ARRAY[[1,2],[3,4]], ARRAY[5,6]);
      array_cat
---------------------
 {{1,2},{3,4},{5,6}}
(1 row)

SELECT array_cat(ARRAY[5,6], ARRAY[[1,2],[3,4]]);
      array_cat
---------------------
 {{5,6},{1,2},{3,4}}

In simple cases, the concatenation operator discussed above is preferred over direct use of these functions. However, because the concatenation operator is overloaded to serve all three cases, there are situations where use of one of the functions is helpful to avoid ambiguity. For example, consider:

SELECT ARRAY[1, 2] || '{3, 4}';  -- the untyped literal is taken as an array
 ?column?
-----------
 {1,2,3,4}

SELECT ARRAY[1, 2] || '7';                 -- so is this one
ERROR:  malformed array literal: "7"

SELECT ARRAY[1, 2] || NULL;                -- so is an undecorated NULL
 ?column?
----------
 {1,2}
(1 row)

SELECT array_append(ARRAY[1, 2], NULL);    -- this might have been meant
 array_append
--------------
 {1,2,NULL}

In the examples above, the parser sees an integer array on one side of the concatenation operator, and a constant of undetermined type on the other. The heuristic it uses to resolve the constant's type is to assume it's of the same type as the operator's other input — in this case, integer array. So the concatenation operator is presumed to represent array_cat, not array_append. When that's the wrong choice, it could be fixed by casting the constant to the array's element type; but explicit use of array_append might be a preferable solution.
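
Both remedies mentioned above can be sketched as follows. Casting the constant to the element type makes the operator resolve to element concatenation, and array_append states the intent directly:

```sql
SELECT ARRAY[1, 2] || '7'::integer;    -- {1,2,7}
SELECT ARRAY[1, 2] || NULL::integer;   -- {1,2,NULL}
SELECT array_append(ARRAY[1, 2], 7);   -- {1,2,7}
```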

8.15.5. Searching in Arrays

To search for a value in an array, each value must be checked. This can be done manually, if you know the size of the array. For example:

SELECT * FROM sal_emp WHERE pay_by_quarter[1] = 10000 OR
                            pay_by_quarter[2] = 10000 OR
                            pay_by_quarter[3] = 10000 OR
                            pay_by_quarter[4] = 10000;

However, this quickly becomes tedious for large arrays, and is not helpful if the size of the array is unknown. An alternative method is described in Section 9.23. The above query could be replaced by:

SELECT * FROM sal_emp WHERE 10000 = ANY (pay_by_quarter);

In addition, you can find rows where the array has all values equal to 10000 with:

SELECT * FROM sal_emp WHERE 10000 = ALL (pay_by_quarter);
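
The difference between ANY and ALL can be checked without the sal_emp table by comparing against array constants (an assumption made for self-containment). Note that ALL over an empty array is true, which may be surprising when filtering rows:

```sql
SELECT 10000 = ANY (ARRAY[10000, 20000]) AS any_match,     -- true
       10000 = ALL (ARRAY[10000, 20000]) AS all_match,     -- false
       10000 = ALL ('{}'::integer[])     AS all_on_empty,  -- true
       10000 = ANY ('{}'::integer[])     AS any_on_empty;  -- false
```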

Alternatively, the generate_subscripts function can be used. For example:

SELECT * FROM
   (SELECT pay_by_quarter,
           generate_subscripts(pay_by_quarter, 1) AS s
      FROM sal_emp) AS foo
 WHERE pay_by_quarter[s] = 10000;

This function is described in Table 9.62.
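
A self-contained sketch of the same technique, using an array constant in place of pay_by_quarter (the aliases t, arr, and s are illustrative). Calling generate_subscripts in the FROM clause also makes the matching subscript available as a column:

```sql
SELECT s AS subscript, arr[s] AS value
FROM (SELECT ARRAY[10000, 20000, 10000] AS arr) AS t,
     generate_subscripts(t.arr, 1) AS s
WHERE arr[s] = 10000;
-- Returns two rows: (1, 10000) and (3, 10000)
```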

You can also search an array using the && operator, which checks whether the left operand overlaps with the right operand. For instance:

SELECT * FROM sal_emp WHERE pay_by_quarter && ARRAY[10000];

This and other array operators are further described in Section 9.18. Searches that use the && operator can be accelerated by an appropriate index, as described in Section 11.2.
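
As a sketch of what an appropriate index might look like: GIN indexes support the && operator on arrays. The table and index names below are hypothetical, and the planner may still prefer a sequential scan on a small table:

```sql
-- Hypothetical scratch copy of the example table's relevant columns.
CREATE TEMP TABLE sal_emp_demo (name text, pay_by_quarter integer[]);
INSERT INTO sal_emp_demo VALUES
    ('Bill',  '{10000,10000,10000,10000}'),
    ('Carol', '{20000,25000,25000,25000}');

-- A GIN index over the array column can serve overlap searches.
CREATE INDEX sal_emp_demo_pay_idx ON sal_emp_demo USING gin (pay_by_quarter);

SELECT name FROM sal_emp_demo WHERE pay_by_quarter && ARRAY[10000];  -- Bill
```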

You can also search for specific values in an array using the array_position and array_positions functions. The former returns the subscript of the first occurrence of a value in an array; the latter returns an array with the subscripts of all occurrences of the value in the array. For example:

SELECT array_position(ARRAY['sun','mon','tue','wed','thu','fri','sat'], 'mon');
 array_position
----------------
              2

SELECT array_positions(ARRAY[1, 4, 3, 1, 3, 4, 2, 1], 1);
 array_positions
-----------------
 {1,4,8}
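
When the value does not occur in the array, the two functions differ in what they return. A minimal sketch with an array constant:

```sql
SELECT array_position(ARRAY[1,2,3], 9) IS NULL AS position_missing,   -- true
       array_positions(ARRAY[1,2,3], 9)        AS positions_missing;  -- {}
```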

Tip

Arrays are not sets; searching for specific array elements can be a sign of database misdesign. Consider using a separate table with a row for each item that would be an array element. This will be easier to search, and is likely to scale better for a large number of elements.
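
One possible shape for such a separate table is sketched below. The table, column, and index names are hypothetical and only illustrate the idea of one row per would-be array element:

```sql
-- Hypothetical normalized layout: one row per employee and quarter.
CREATE TEMP TABLE emp_quarter_pay (
    name    text    NOT NULL,
    quarter integer NOT NULL CHECK (quarter BETWEEN 1 AND 4),
    pay     integer NOT NULL,
    PRIMARY KEY (name, quarter)
);
INSERT INTO emp_quarter_pay VALUES
    ('Bill', 1, 10000), ('Bill', 2, 10000), ('Carol', 1, 20000);

-- An ordinary B-tree index can serve the search.
CREATE INDEX emp_quarter_pay_pay_idx ON emp_quarter_pay (pay);

SELECT DISTINCT name FROM emp_quarter_pay WHERE pay = 10000;  -- Bill
```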

8.15.6. Array Input and Output Syntax

The external text representation of an array value consists of items that are interpreted according to the I/O conversion rules for the array's element type, plus decoration that indicates the array structure. The decoration consists of curly braces ({ and }) around the array value plus delimiter characters between adjacent items. The delimiter character is usually a comma (,) but can be something else: it is determined by the typdelim setting for the array's element type. Among the standard data types provided in the PostgreSQL distribution, all use a comma, except for type box, which uses a semicolon (;). In a multidimensional array, each dimension (row, plane, cube, etc.) gets its own level of curly braces, and delimiters must be written between adjacent curly-braced entities of the same level.

The array output routine will put double quotes around element values if they are empty strings, contain curly braces, delimiter characters, double quotes, backslashes, or white space, or match the word NULL. Double quotes and backslashes embedded in element values will be backslash-escaped. For numeric data types it is safe to assume that double quotes will never appear, but for textual data types one should be prepared to cope with either the presence or absence of quotes.
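
The quoting rules can be observed by selecting a text array whose elements exercise each case. The element values below are illustrative:

```sql
SELECT ARRAY['plain', 'a b', '', 'NULL', 'c,d', 'say "hi"', NULL] AS quoted;
-- {plain,"a b","","NULL","c,d","say \"hi\"",NULL}
```

Only the last element is a true null; the quoted "NULL" is the four-character string.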

By default, the lower bound index value of an array's dimensions is set to one. To represent arrays with other lower bounds, the array subscript ranges can be specified explicitly before writing the array contents. This decoration consists of square brackets ([]) around each array dimension's lower and upper bounds, with a colon (:) delimiter character in between. The array dimension decoration is followed by an equal sign (=). For example:

SELECT f1[1][-2][3] AS e1, f1[1][-1][5] AS e2
 FROM (SELECT '[1:1][-2:-1][3:5]={{{1,2,3},{4,5,6}}}'::int[] AS f1) AS ss;

 e1 | e2
----+----
  1 |  6
(1 row)

The array output routine will include explicit dimensions in its result only when there are one or more lower bounds different from one.
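
For instance, the same three elements are printed with or without the dimension decoration depending only on the lower bound:

```sql
SELECT '[1:3]={1,2,3}'::integer[] AS default_bounds,  -- {1,2,3}
       '[0:2]={1,2,3}'::integer[] AS shifted_bounds;  -- [0:2]={1,2,3}
```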

If the value written for an element is NULL (in any case variant), the element is taken to be NULL. The presence of any quotes or backslashes disables this and allows the literal string value “NULL” to be entered. Also, for backward compatibility with pre-8.2 versions of PostgreSQL, the array_nulls configuration parameter can be turned off to suppress recognition of NULL as a NULL.
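
A short sketch of the distinction, assuming array_nulls is left at its default setting of on:

```sql
SELECT a[1] IS NULL  AS bare_null,       -- true
       a[2] IS NULL  AS lowercase_null,  -- true
       a[3] IS NULL  AS quoted_null,     -- false
       a[3] = 'NULL' AS is_the_string    -- true
FROM (SELECT '{NULL,null,"NULL"}'::text[] AS a) AS t;
```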

As shown previously, when writing an array value you can use double quotes around any individual array element. You must do so if the element value would otherwise confuse the array-value parser. For example, elements containing curly braces, commas (or the data type's delimiter character), double quotes, backslashes, or leading or trailing whitespace must be double-quoted. Empty strings and strings matching the word NULL must be quoted, too. To put a double quote or backslash in a quoted array element value, precede it with a backslash. Alternatively, you can avoid quotes and use backslash-escaping to protect all data characters that would otherwise be taken as array syntax.

You can add whitespace before a left brace or after a right brace. You can also add whitespace before or after any individual item string. In all of these cases the whitespace will be ignored. However, whitespace within double-quoted elements, or surrounded on both sides by non-whitespace characters of an element, is not ignored.
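
The whitespace rules can be checked by measuring element lengths. The literal below is an illustrative example, not one from the original text:

```sql
SELECT length(a[1]) AS unquoted_trimmed,   -- 1: 'a'
       length(a[2]) AS quoted_preserved,   -- 3: ' b '
       length(a[3]) AS interior_preserved  -- 3: 'c d'
FROM (SELECT '{ a , " b " , c d }'::text[] AS a) AS t;
```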

Tip

The ARRAY constructor syntax (see Section 4.2.12) is often easier to work with than the array-literal syntax when writing array values in SQL commands. In ARRAY, individual element values are written the same way they would be written when not members of an array.
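
To illustrate the difference, the two expressions below build the same three-element text array. The sketch assumes standard_conforming_strings is on (the default), so the backslashes in the literal reach the array parser unchanged:

```sql
SELECT ARRAY['a,b', 'say "hi"', '']      AS via_constructor,
       '{"a,b","say \"hi\"",""}'::text[] AS via_literal,
       ARRAY['a,b', 'say "hi"', ''] = '{"a,b","say \"hi\"",""}'::text[] AS same;  -- true
```

The constructor form needs only ordinary string quoting, while the literal form needs array-level quoting and escaping as well.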
