
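Introduction

This manual is provided by the Taiwan PostgreSQL community. It is a translation of the official PostgreSQL documentation, intended to promote the use of PostgreSQL in Taiwan.

The manual currently covers PostgreSQL 12.

Every page links to the corresponding page of the official manual, so the two can be read side by side wherever the translation is incomplete. Passages that have not been translated yet are shown in the original English for the time being.

To contribute, click "Edit on GitHub" in the upper-right corner of any page, make your changes, and send us a PR. (Translating a single sentence is welcome too!)

Questions and suggestions can be emailed to our documentation team: [email protected]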

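Preface

This book is the official documentation of PostgreSQL. It has been written by the PostgreSQL developers and other volunteers in parallel with the development of the PostgreSQL software. It describes all the functionality that the current version of PostgreSQL officially supports.

To make the large amount of information about PostgreSQL manageable, this book is organized into several parts. Each part is aimed at a different class of users, or at users in different stages of their PostgreSQL experience:

Part I is an informal introduction for new users.
Part II documents the SQL query language environment, including data types and functions, as well as user-level performance tuning. Every PostgreSQL user should read this part.
Part III describes the installation and administration of the server. Everyone who runs a PostgreSQL server, be it for private use or for others, should read this part.
Part IV describes the programming interfaces for PostgreSQL client programs.
Part V contains information for advanced users about the extensibility capabilities of the server. Topics include user-defined data types and functions.
Part VI contains reference information about SQL commands and about client and server programs. This part supports the other parts with information sorted by command or program.
Part VII contains assorted information that might be of use to PostgreSQL developers.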

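1. What Is PostgreSQL?

PostgreSQL is an object-relational database management system (ORDBMS) based on POSTGRES, Version 4.2, developed at the University of California at Berkeley Computer Science Department. POSTGRES pioneered many concepts that only became available in some commercial database systems much later.

PostgreSQL is an open-source descendant of this original Berkeley code. It supports a large part of the SQL standard and offers many modern features:

complex queries
foreign keys
triggers
updatable views
transactional integrity
multiversion concurrency control

Also, PostgreSQL can be extended by the user in many ways, for example by adding new:

data types
functions
operators
aggregate functions
index methods
procedural languages

And because of the liberal license, PostgreSQL can be used, modified, and distributed by anyone free of charge for any purpose, be it private, commercial, or academic.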

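2. A Brief History of PostgreSQL

The object-relational database management system now known as PostgreSQL is derived from the POSTGRES package written at the University of California at Berkeley. With over two decades of development behind it, PostgreSQL is now the most advanced open-source database available anywhere.

2.1. The Berkeley POSTGRES Project

The POSTGRES project, led by Professor Michael Stonebraker, was sponsored by the Defense Advanced Research Projects Agency (DARPA), the Army Research Office (ARO), the National Science Foundation (NSF), and ESL, Inc. The implementation of POSTGRES began in 1986. The initial concepts for the system were presented in "The design of POSTGRES", and the definition of the initial data model appeared in "The POSTGRES data model". The design of the rule system at that time was described in "The design of the POSTGRES rules system". The rationale and architecture of the storage manager were detailed in "The design of the POSTGRES storage system".

POSTGRES has undergone several major releases since then. The first "demoware" system became operational in 1987 and was shown at the 1988 ACM-SIGMOD Conference. Version 1 was released to a few external users in June 1989. In response to a critique of the first rule system, the rule system was redesigned, and Version 2 was released in June 1990 with the new rule system. Version 3 appeared in 1991 and added support for multiple storage managers, an improved query executor, and a rewritten rule system. For the most part, subsequent releases until Postgres95 focused on portability and reliability.

POSTGRES has been used to implement many different research and production applications. These include a financial data analysis system, a jet engine performance monitoring package, an asteroid tracking database, a medical information database, and several geographic information systems. POSTGRES has also been used as an educational tool at several universities. Finally, Illustra Information Technologies (later merged into Informix, which is now owned by IBM) picked up the code and commercialized it. In late 1992, POSTGRES became the primary data manager for the Sequoia 2000 scientific computing project.

The size of the external user community nearly doubled during 1993. It became increasingly obvious that maintenance of the prototype code and support was taking up large amounts of time that should have been devoted to database research. In an effort to reduce this support burden, the Berkeley POSTGRES project officially ended with Version 4.2.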

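2.2. Postgres95

In 1994, Andrew Yu and Jolly Chen added an SQL language interpreter to POSTGRES. Under a new name, Postgres95 was subsequently released to the web to find its own way in the world as an open-source descendant of the original POSTGRES Berkeley code.

Postgres95 code was completely ANSI C and trimmed in size by 25%. Many internal changes improved performance and maintainability. Postgres95 release 1.0.x ran about 30 to 50% faster on the Wisconsin Benchmark compared to POSTGRES, Version 4.2. Apart from bug fixes, the following were the major enhancements:

The query language PostQUEL was replaced with SQL (implemented in the server). (The interface library libpq was named after PostQUEL.) Subqueries were not supported until PostgreSQL, but they could be imitated in Postgres95 with user-defined SQL functions. Aggregate functions were re-implemented. Support for the GROUP BY query clause was also added.
A new program (psql) was provided for interactive SQL queries, which used GNU Readline. This largely superseded the old monitor program.
A new front-end library, libpgtcl, supported Tcl-based clients. A sample shell, pgtclsh, provided new Tcl commands to interface Tcl programs with the Postgres95 server.
The large-object interface was overhauled. The inversion large objects were the only mechanism for storing large objects. (The inversion file system was removed.)
The instance-level rule system was removed. Rules were still available as rewrite rules.
A short tutorial introducing regular SQL features as well as those of Postgres95 was distributed with the source code.
GNU make (instead of BSD make) was used for the build. Also, Postgres95 could be compiled with an unpatched GCC (data alignment of doubles was fixed).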

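2.3. PostgreSQL

By 1996, it became clear that the name "Postgres95" would not stand the test of time. We chose a new name, PostgreSQL, to reflect the relationship between the original POSTGRES and the more recent versions with SQL capability. At the same time, we set the version numbering to start at 6.0, putting the numbers back into the sequence originally begun by the Berkeley POSTGRES project.

Many people continue to refer to PostgreSQL as "Postgres" (now rarely in all capital letters) because of tradition or because it is easier to pronounce. This usage is widely accepted as a nickname or alias.

The emphasis during development of Postgres95 was on identifying and understanding existing problems in the server code. With PostgreSQL, the emphasis has shifted to augmenting features and capabilities, although work continues in all areas.

Details about what has happened in PostgreSQL since then can be found in the release notes.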

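3. Conventions

The following conventions are used in the synopsis of a command (all symbols are half-width characters):

Brackets ([ and ]) indicate optional parts. (In the synopsis of a Tcl command, question marks (?) are used instead, as is usual in Tcl.)
Braces ({ and }) and vertical lines (|) indicate that you must choose one alternative.
Dots (...) mean that the preceding element can be repeated.

Where it enhances the clarity:

SQL commands are preceded by the prompt =>
Shell commands are preceded by the prompt $. Normally, prompts are not shown, though.

About the people involved:

An administrator is generally a person who is in charge of installing and running the server.
A user could be anyone who is using, or wants to use, any part of the PostgreSQL system.

These terms should not be interpreted too narrowly; this book does not have fixed presumptions about system administration procedures.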

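4. Further Information

Besides the documentation, there are other resources about PostgreSQL:

Wiki

The PostgreSQL wiki contains the project's FAQ (Frequently Asked Questions) list, TODO list, and detailed information about many more topics.

The PostgreSQL wiki also has pages in Traditional Chinese (Taiwan).

Web Site

The PostgreSQL web site carries details on the latest release and other information to make your work or play with PostgreSQL more productive.

Mailing Lists

The mailing lists are a good place to have your questions answered, to share experiences with other users, and to contact the developers. Consult the PostgreSQL web site for details.

Yourself!

PostgreSQL is an open-source project. As such, it depends on the user community for ongoing support. As you begin to use PostgreSQL, you will rely on others for help, either through the documentation or through the mailing lists. Consider contributing your knowledge back. Read the mailing lists and answer questions. If you learn something which is not in the documentation, write it up and contribute it. If you add features to the code, contribute them.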

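I. Tutorial

Welcome to the PostgreSQL Tutorial. The following chapters give a simple introduction to PostgreSQL, relational database concepts, and the SQL language. We only assume some general knowledge about how to use computers. No particular Unix or programming experience is required. This part is mainly intended to give you some hands-on experience with important aspects of the PostgreSQL system. It makes no attempt to be a complete or thorough treatment of the topics it covers.

After you have worked through this tutorial you might want to move on to Part II to gain a more formal knowledge of the SQL language, or to Part IV for information about developing applications for PostgreSQL. Those who set up and manage their own server should also read Part III.

1. Getting Started

1.1. Installation: install a PostgreSQL database system from scratch.
1.2. Architectural Fundamentals: get to know the PostgreSQL architecture.
1.3. Creating a Database: create your first PostgreSQL database.
1.4. Accessing a Database: start working with your PostgreSQL database.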

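1.1. Installation

Before you can use PostgreSQL you need to install it, of course. It is possible that PostgreSQL is already installed at your site, either because it was included in your operating system distribution or because the system administrator already installed it. If that is the case, you should obtain information from the operating system documentation or your system administrator about how to access PostgreSQL.

If you are not sure whether PostgreSQL is already available or whether you can use it for your experimentation, then you can install it yourself. Doing so is not hard and it can be a good exercise. PostgreSQL can be installed by any unprivileged user; no superuser (root) access is required.

If you are installing PostgreSQL yourself, then refer to Chapter 16 for instructions on installation, and return to this guide when the installation is complete. Be sure to follow closely the section about setting up the appropriate environment variables.

If your site administrator has not set things up in the default way, you might have some more work to do. For example, if the database server machine is a remote machine, you will need to set the PGHOST environment variable to the name of the database server machine. The environment variable PGPORT might also have to be set. The bottom line is this: if you try to start an application program and it complains that it cannot connect to the database, you should consult your site administrator or, if that is you, the documentation to make sure that your environment is properly set up. If you did not understand the preceding paragraph, then read the next section.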

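1.2. Architectural Fundamentals

Before we proceed, you should understand the basic PostgreSQL system architecture. Understanding how the parts of PostgreSQL interact will make this chapter somewhat clearer.

In database jargon, PostgreSQL uses a client/server model. A PostgreSQL session consists of the following cooperating processes (programs):

A server process, which manages the database files, accepts connections to the database from client applications, and performs database actions on behalf of the clients. The database server program is called postgres.
The user's client (frontend) application that wants to perform database operations. Client applications can be very diverse in nature: a client could be a text-oriented tool, a graphical application, a web server that accesses the database to display web pages, or a specialized database maintenance tool. Some client applications are supplied with the PostgreSQL distribution; most are developed by users.

As is typical of client/server applications, the client and the server can be on different hosts. In that case they communicate over a TCP/IP network connection. You should keep this in mind, because the files that can be accessed on a client machine might not be accessible (or might only be accessible using a different file name) on the database server machine.

The PostgreSQL server can handle multiple concurrent connections from clients. To achieve this it starts ("forks") a new process for each connection. From that point on, the client and the new server process communicate without intervention by the original postgres process. Thus, the master server process is always running, waiting for client connections, whereas client and associated server processes come and go. (All of this is of course invisible to the user. We only mention it here for completeness.)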

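1.3. Creating a Database

The first test to see whether you can access the database server is to try to create a database. A running PostgreSQL server can manage many databases. Typically, a separate database is used for each project or for each user.

Possibly, your site administrator has already created a database for your use. In that case you can omit this step and skip ahead to the next section.

To create a new database, in this example named mydb, you use the following command:

$ createdb mydb

If this produces no response, then this step was successful and you can skip over the remainder of this section.

If you see a message similar to:

createdb: command not found

then PostgreSQL was not installed properly. Either it was not installed at all or your shell's search path was not set to include it. Try calling the command with an absolute path instead:

$ /usr/local/pgsql/bin/createdb mydb

The path at your site might be different. Contact your site administrator or check the installation instructions to correct the situation.

Another response could be similar to this:

createdb: could not connect to database postgres: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/tmp/.s.PGSQL.5432"?

This means that the server was not started, or it was not started where createdb expected it. Again, check the installation instructions or consult the administrator.

Another response could be similar to this:

createdb: could not connect to database postgres: FATAL:  role "joe" does not exist

where your own login name is mentioned. This will happen if the administrator has not created a PostgreSQL user account for you. (PostgreSQL user accounts are distinct from operating system user accounts.) If you are the administrator, see the chapter on database roles for help creating accounts. You will need to become the operating system user under which PostgreSQL was installed (usually postgres) to create the first user account. It could also be that you were assigned a PostgreSQL user name that is different from your operating system user name; in that case you need to use the -U switch or set the PGUSER environment variable to specify your PostgreSQL user name.

If you have a user account but it does not have the privileges required to create a database, you will see a message similar to the following:

createdb: database creation failed: ERROR:  permission denied to create database

Not every user has authorization to create new databases. If PostgreSQL refuses to create databases for you, then the site administrator needs to grant you permission to create databases. Consult your site administrator if this occurs. If you installed PostgreSQL yourself, then you should log in, for the purposes of this tutorial, under the user account that you started the server as.

You can also create databases with other names. PostgreSQL allows you to create any number of databases at a given site. Database names must have an alphabetic first character and are limited to 63 bytes in length. A convenient choice is to create a database with the same name as your current user name. Many tools assume that database name as the default, so it can save you some typing. To create that database, simply type:

$ createdb

If you do not want to use your database anymore you can remove it. For example, if you are the owner (creator) of the database mydb, you can destroy it using the following command:

$ dropdb mydb

(For this command, the database name does not default to the user account name. You always need to specify it.) This action physically removes all files associated with the database and cannot be undone, so it should only be done with a great deal of forethought.

More about createdb and dropdb can be found in their respective reference pages.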

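1.4. Accessing a Database

Once you have created a database, you can access it by:

Running the PostgreSQL interactive terminal program, called psql, which allows you to interactively enter, edit, and execute SQL commands.
Using an existing graphical frontend tool like pgAdmin or an office suite with ODBC or JDBC support to create and manipulate a database. These possibilities are not covered in this tutorial.
Writing a custom application, using one of the several available language bindings. These possibilities are discussed further in Part IV.

You probably want to start up psql to try the examples in this tutorial. It can be activated for the mydb database by typing the command:

$ psql mydb

If you do not supply the database name, then it will default to your user account name. You already discovered this scheme in the previous section using createdb.

In psql, you will be greeted with a version banner and a hint to type "help", followed by the prompt:

mydb=>

The last line could also be:

mydb=#

That would mean you are a database superuser, which is most likely the case if you installed the PostgreSQL instance yourself. Being a superuser means that you are not subject to access controls. For the purposes of this tutorial that is not important.

If you encounter problems starting psql, then go back to the previous section. The diagnostics of createdb and psql are similar, and if the former worked the latter should work as well.

The last line printed out by psql is the prompt, and it indicates that psql is listening to you and that you can type SQL queries into a work space maintained by psql. Try out these commands:

mydb=> SELECT version();
mydb=> SELECT current_date;
mydb=> SELECT 2 + 2;

The psql program has a number of internal commands that are not SQL commands. They begin with the backslash character (\). For example, you can get help on the syntax of various PostgreSQL SQL commands by typing:

mydb=> \h

To get out of psql, type:

mydb=> \q

and psql will quit and return you to your command shell. (For more internal commands, type \? at the psql prompt.) The full capabilities of psql are documented in the psql reference page. In this tutorial we will not use these features explicitly, but you can use them yourself when it is helpful.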

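2. The SQL Language

This chapter is aimed at readers who are new to databases. It uses simple examples that you can run yourself to see how a database works. More complex database operations build on the same basic patterns.

2.1. Introduction

This chapter provides an overview of how to use SQL to perform simple operations. This tutorial is only intended to give you an introduction and is in no way a complete tutorial on SQL. Numerous books have been written on SQL, including "Understanding the New SQL: A Complete Guide" and "A Guide to the SQL Standard: A user's guide to the standard database language SQL". You should be aware that some PostgreSQL language features are extensions to the standard.

In the examples that follow, we assume that you have created a database named mydb, as described in the previous chapter, and have been able to start psql.

Examples in this manual can also be found in the PostgreSQL source distribution in the directory src/tutorial/. (Binary distributions of PostgreSQL might not include those files.) To use those files, first change to that directory and run make: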

$ cd ..../src/tutorial
$ make

This creates the scripts and compiles the C files containing user-defined functions and types. Then, to start the tutorial, do the following:

$ cd ..../tutorial
$ psql -s mydb
...

mydb=> \i basics.sql

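The \i command reads in commands from the specified file. psql's -s option puts you in single-step mode, which pauses before sending each statement to the server. The commands used in this section are in the file basics.sql.

2.2. Concepts

PostgreSQL is a relational database management system (RDBMS). That means it is a system for managing data stored in relations. Relation is essentially a mathematical term for table. The notion of storing data in tables is so commonplace today that it might seem inherently obvious, but there are a number of other ways of organizing databases. Files and directories on Unix-like operating systems form an example of a hierarchical database. A more modern development is the object-oriented database.

Each table is a named collection of rows. Each row of a given table has the same set of named columns, and each column is of a specific data type. Whereas columns have a fixed order in each row, it is important to remember that SQL does not guarantee the order of the rows within the table in any way (although they can be explicitly sorted for display).

Tables are grouped into databases, and a collection of databases managed by a single PostgreSQL server instance constitutes a database cluster.

2.3. Creating a New Table

You can create a new table by specifying the table name, along with all column names and their types: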

CREATE TABLE weather (
    city            varchar(80),
    temp_lo         int,           -- low temperature
    temp_hi         int,           -- high temperature
    prcp            real,          -- precipitation
    date            date
);

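You can enter this into psql with the line breaks. psql will recognize that the command is not terminated until the semicolon.

White space (i.e., spaces, tabs, and newlines) can be used freely in SQL commands. That means you can type the command aligned differently than above, or even all on one line. Two dashes ("--") introduce comments. Whatever follows them is ignored up to the end of the line. SQL is case insensitive about key words and identifiers, except when identifiers are double-quoted to preserve the case. (More precisely, identifiers that are not double-quoted are folded to lower case.)

varchar(80) specifies a data type that can store arbitrary character strings up to 80 characters in length. int is the normal integer type. real is a type for storing single-precision floating-point numbers. date should be self-explanatory. (Yes, the column of type date is also named date. This might be convenient or confusing; you choose.)

PostgreSQL supports the standard SQL types int, smallint, real, double precision, char(N), varchar(N), date, time, timestamp, and interval, as well as other types of general utility and a rich set of geometric types. PostgreSQL can be customized with an arbitrary number of user-defined data types. Consequently, type names are not key words in the syntax, except where required to support special cases in the SQL standard.

The second example will store cities and their associated geographical location: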

CREATE TABLE cities (
    name            varchar(80),
    location        point
);

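The point type is an example of a PostgreSQL-specific data type.

Finally, it should be mentioned that if you don't need a table any longer or want to recreate it differently you can remove it using the following command: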

DROP TABLE tablename;

2.4. Populating a Table With Rows

The INSERT statement is used to populate a table with rows.
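
As a minimal sketch, assuming the weather table defined in section 2.3, a row could be inserted like this (the values are illustrative):

```sql
-- Values are listed in the order in which the columns were declared:
-- city, temp_lo, temp_hi, prcp, date
INSERT INTO weather VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27');
```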

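Note that all data types use rather obvious input formats. Constants that are not simple numeric values usually must be surrounded by single quotes ('), as the city name and the date are in this tutorial's examples. The date type is actually quite flexible in what it accepts, but for this tutorial we will stick to the unambiguous format used here.

The point type requires a coordinate pair as input.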
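
For illustration, assuming the cities table from section 2.3, a coordinate pair is written as a quoted string (the coordinates here are made up):

```sql
-- The point value is given as '(x, y)' inside single quotes
INSERT INTO cities VALUES ('San Francisco', '(-194.0, 53.0)');
```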

The syntax used so far requires you to remember the order of the columns. An alternative syntax allows you to list the columns explicitly.
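
A sketch of the explicit-column form, again assuming the weather table from section 2.3:

```sql
-- Each value is matched to the column named at the same position in the list
INSERT INTO weather (city, temp_lo, temp_hi, prcp, date)
    VALUES ('San Francisco', 43, 57, 0.0, '1994-11-29');
```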

You can list the columns in a different order if you wish, or even omit some columns, for example when the precipitation (prcp) is unknown.
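
One possible form, assuming the same weather table; the omitted prcp column is left null:

```sql
-- Columns appear in a different order, and prcp is not listed at all
INSERT INTO weather (date, city, temp_hi, temp_lo)
    VALUES ('1994-11-29', 'Hayward', 54, 37);
```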

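Many developers consider explicitly listing the columns better style than relying on the order implicitly.

Please enter all the INSERT commands discussed in this section so that you have some data to work with in the following sections.

You could also use COPY to load large amounts of data from flat-text files. This is usually faster because the COPY command is optimized for this application, while allowing less flexibility than INSERT.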
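
A minimal sketch of a bulk load. The file path is hypothetical and must point to a file on the server machine whose columns match the weather table:

```sql
-- Hypothetical path; the file is read by the server process, not by the client
COPY weather FROM '/home/user/weather.txt';
```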

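The source file must be available on the machine running the backend process, not the client, since the backend process reads the file directly, and it must be readable by the PostgreSQL server user (postgres). You can read more about this command in the COPY reference page.

2.5. Querying a Table

To retrieve data from a table, the table is queried. An SQL SELECT statement is used to do this. The statement is divided into a select list (the part that lists the columns to be returned), a table list (the part that lists the tables from which to retrieve the data), and an optional qualification (the part that specifies any restrictions). For example, to retrieve all the rows of table weather, type: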

SELECT * FROM weather;

Here * is a shorthand for "all columns". The following query returns the same result:

SELECT city, temp_lo, temp_hi, prcp, date FROM weather;

The output should be:

     city      | temp_lo | temp_hi | prcp |    date
---------------+---------+---------+------+------------
 San Francisco |      46 |      50 | 0.25 | 1994-11-27
 San Francisco |      43 |      57 |    0 | 1994-11-29
 Hayward       |      37 |      54 |      | 1994-11-29
(3 rows)

You can write expressions, not just simple column references, in the select list. For example, you can do:

SELECT city, (temp_hi+temp_lo)/2 AS temp_avg, date FROM weather;

This should give:

     city      | temp_avg |    date
---------------+----------+------------
 San Francisco |       48 | 1994-11-27
 San Francisco |       50 | 1994-11-29
 Hayward       |       45 | 1994-11-29
(3 rows)

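Notice how the AS clause is used to relabel the output column. (The AS clause is optional.)

A query can be "qualified" by adding a WHERE clause that specifies which rows are wanted. The WHERE clause contains a Boolean (truth value) expression, and only rows for which the Boolean expression is true are returned. The usual Boolean operators (AND, OR, and NOT) are allowed in the qualification. For example, the following retrieves the weather of San Francisco on rainy days: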

SELECT * FROM weather
    WHERE city = 'San Francisco' AND prcp > 0.0;

Result:

     city      | temp_lo | temp_hi | prcp |    date
---------------+---------+---------+------+------------
 San Francisco |      46 |      50 | 0.25 | 1994-11-27
(1 row)

You can request that the results of a query be returned in sorted order:

SELECT * FROM weather
    ORDER BY city;

     city      | temp_lo | temp_hi | prcp |    date
---------------+---------+---------+------+------------
 Hayward       |      37 |      54 |      | 1994-11-29
 San Francisco |      43 |      57 |    0 | 1994-11-29
 San Francisco |      46 |      50 | 0.25 | 1994-11-27

In this example, the sort order isn't fully specified, and so you might get the San Francisco rows in either order. But you would always get the results shown above if you do:

SELECT * FROM weather
    ORDER BY city, temp_lo;

You can request that duplicate rows be removed from the result of a query:

SELECT DISTINCT city
    FROM weather;

     city
---------------
 Hayward
 San Francisco
(2 rows)

Here again, the result row ordering might vary. You can ensure consistent results by using DISTINCT and ORDER BY together:

SELECT DISTINCT city
    FROM weather
    ORDER BY city;

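2.6. Joins Between Tables

Thus far, our queries have only accessed one table at a time. Queries can access multiple tables at once, or access the same table in such a way that multiple rows of the table are being processed at the same time. A query that accesses multiple rows of the same or different tables at one time is called a join query. As an example, say you wish to list all the weather records together with the location of the associated city. To do that, we need to compare the city column of each row of the weather table with the name column of all rows in the cities table, and select the pairs of rows where these values match.

Note

This is only a conceptual model. The join is usually performed in a more efficient manner than actually comparing each possible pair of rows, but this is invisible to the user.

A join of the two tables accomplishes this.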
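
A minimal sketch of such a join, assuming the weather and cities tables from section 2.3:

```sql
-- Pair every weather row with the cities row whose name matches its city
SELECT *
    FROM weather, cities
    WHERE city = name;
```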

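Observe two things about the result set:

There is no result row for the city of Hayward. This is because there is no matching entry in the cities table for Hayward, so the join ignores the unmatched rows in the weather table. We will see shortly how this can be fixed.
There are two columns containing the city name. This is correct because the lists of columns from the weather and cities tables are concatenated. In practice this is undesirable, though, so you will probably want to list the output columns explicitly rather than using *.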
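
One way to list the output columns explicitly, assuming the same two tables:

```sql
-- Only one city-name column is returned, plus the location from cities
SELECT city, temp_lo, temp_hi, prcp, date, location
    FROM weather, cities
    WHERE city = name;
```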

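Exercise: Attempt to determine the semantics of this query when the WHERE clause is omitted.

Since the columns all had different names, the parser automatically found which table they belong to. If there were duplicate column names in the two tables, you would need to qualify the column names to show which one you meant.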
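
A sketch of the same join with every column qualified by its table name:

```sql
-- table.column removes any ambiguity about where a column comes from
SELECT weather.city, weather.temp_lo, weather.temp_hi,
       weather.prcp, weather.date, cities.location
    FROM weather, cities
    WHERE cities.name = weather.city;
```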

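It is widely considered good style to qualify all column names in a join query, so that the query won't fail if a duplicate column name is later added to one of the tables.

Join queries of the kind seen thus far can also be written in an alternative form using the INNER JOIN ... ON syntax.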
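
For illustration, the earlier join written with explicit join syntax:

```sql
-- The join condition moves from the WHERE clause into the ON clause
SELECT *
    FROM weather INNER JOIN cities ON (weather.city = cities.name);
```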

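This syntax is not as commonly used as the one above, but we show it here to help you understand the following topics.

Now we will figure out how we can get the Hayward records back in. What we want the query to do is to scan the weather table and for each row to find the matching cities row(s). If no matching row is found, we want some "empty values" (nulls) to be substituted for the cities table's columns. This kind of query is called an outer join. (The joins we have seen so far are inner joins.)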
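
A minimal sketch of the outer join described above, assuming the same weather and cities tables:

```sql
-- Every weather row is kept; cities columns are null where no city matches
SELECT *
    FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name);
```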

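A query of this form is called a left outer join because the table mentioned on the left of the join operator will have each of its rows in the output at least once, whereas the table on the right will only have those rows output that match some row of the left table. When outputting a left-table row for which there is no right-table match, empty (null) values are substituted for the right-table columns.

Exercise: There are also right outer joins and full outer joins. Try to find out what those do.

We can also join a table against itself. This is called a self join. As an example, suppose we wish to find all the weather records that are in the temperature range of other weather records. So we need to compare the temp_lo and temp_hi columns of each weather row to the temp_lo and temp_hi columns of all other weather rows.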
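
A sketch of such a self join. The aliases W1 and W2 are simply two names for the same weather table:

```sql
-- Find pairs where W1's temperature range strictly contains W2's... reversed:
-- W1.temp_lo is below W2.temp_lo and W1.temp_hi is above W2.temp_hi
SELECT W1.city, W1.temp_lo AS low, W1.temp_hi AS high,
       W2.city, W2.temp_lo AS low, W2.temp_hi AS high
    FROM weather W1, weather W2
    WHERE W1.temp_lo < W2.temp_lo
      AND W1.temp_hi > W2.temp_hi;
```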

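In a self join the weather table is relabeled as W1 and W2 so that the left and right sides of the join can be distinguished. You can also use these kinds of aliases in other queries to save some typing.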
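
For illustration, aliases in an ordinary two-table join (w and c are arbitrary short names):

```sql
-- w stands for weather, c for cities
SELECT *
    FROM weather w, cities c
    WHERE w.city = c.name;
```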

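You will encounter this style of abbreviating quite frequently.

2.7. Aggregate Functions

Like most other relational database products, PostgreSQL supports aggregate functions. An aggregate function computes a single result from multiple input rows. For example, there are aggregates to compute the count, sum, avg (average), max (maximum), and min (minimum) over a set of rows.

As an example, we can find the highest low-temperature reading anywhere.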
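
A minimal sketch, assuming the weather table from section 2.3:

```sql
-- A single row is returned: the largest temp_lo over all rows
SELECT max(temp_lo) FROM weather;
```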

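If we wanted to know what city (or cities) that reading occurred in, we might try comparing temp_lo against max(temp_lo) directly in the WHERE clause.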
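
The naive attempt would look roughly like this; PostgreSQL rejects it with an error:

```sql
-- WRONG: aggregate functions are not allowed in WHERE
SELECT city FROM weather WHERE temp_lo = max(temp_lo);
```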

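But this will not work, since the aggregate max cannot be used in the WHERE clause. (This restriction exists because the WHERE clause determines which rows will be included in the aggregate calculation; so obviously it has to be evaluated before aggregate functions are computed.) However, as is often the case, the query can be restated to accomplish the desired result by using a subquery.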
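
A sketch of the subquery form, assuming the same weather table:

```sql
-- The inner query computes the aggregate on its own;
-- the outer query then compares each row against that single value
SELECT city FROM weather
    WHERE temp_lo = (SELECT max(temp_lo) FROM weather);
```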

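This is OK because the subquery is an independent computation that computes its own aggregate separately from what is happening in the outer query.

Aggregates are also very useful in combination with GROUP BY clauses. For example, we can get the maximum low temperature observed in each city.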
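
One way to write the per-city aggregate:

```sql
-- One output row per distinct city, each with that city's largest temp_lo
SELECT city, max(temp_lo)
    FROM weather
    GROUP BY city;
```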

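A grouped query of this kind gives one output row per city. Each aggregate result is computed over the table rows matching that city. We can filter these grouped rows using HAVING.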
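
A sketch of filtering groups with HAVING; the threshold of 40 is the one discussed in the text:

```sql
-- Groups are formed first; HAVING then keeps only groups whose maximum is below 40
SELECT city, max(temp_lo)
    FROM weather
    GROUP BY city
    HAVING max(temp_lo) < 40;
```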

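With the condition HAVING max(temp_lo) < 40, the result contains only those cities whose temp_lo values are all below 40. Finally, if we only care about cities whose names begin with "S", we can add a LIKE condition on the city name.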
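
For illustration, the name restriction goes in WHERE because it needs no aggregate:

```sql
-- WHERE filters input rows before grouping; HAVING filters the groups afterwards
SELECT city, max(temp_lo)
    FROM weather
    WHERE city LIKE 'S%'
    GROUP BY city
    HAVING max(temp_lo) < 40;
```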

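The LIKE operator does pattern matching; it is explained further in the section on pattern matching.

It is important to understand the interaction between aggregates and SQL's WHERE and HAVING clauses. The fundamental difference between WHERE and HAVING is this: WHERE selects input rows before groups and aggregates are computed (thus, it controls which rows go into the aggregate computation), whereas HAVING selects group rows after groups and aggregates are computed. Thus, the WHERE clause must not contain aggregate functions; it makes no sense to try to use an aggregate to determine which rows will be inputs to the aggregates. On the other hand, the HAVING clause always contains aggregate functions. (Strictly speaking, you are allowed to write a HAVING clause that doesn't use aggregates, but it's seldom useful. The same condition could be used more efficiently at the WHERE stage.)

In the previous example, we can apply the city name restriction in WHERE, since it needs no aggregate. This is more efficient than adding the restriction to HAVING, because we avoid doing the grouping and aggregate calculations for all rows that fail the WHERE check.

2.8. Updates

You can update existing rows using the UPDATE command. Suppose you discover the temperature readings are all off by 2 degrees after November 28. You can correct the data with a single UPDATE.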
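
A minimal sketch of that correction, assuming the weather table and the 1994 sample dates used earlier in this tutorial:

```sql
-- Lower both readings by 2 degrees for every row dated after November 28
UPDATE weather
    SET temp_hi = temp_hi - 2,  temp_lo = temp_lo - 2
    WHERE date > '1994-11-28';
```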

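Look at the new state of the data by running SELECT * FROM weather; again.

2.9. Deletions

Rows can be removed from a table using the DELETE command. Suppose you are no longer interested in the weather of Hayward. Then you can delete those rows from the table.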
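
A sketch of the deletion, assuming the same weather table:

```sql
-- Remove only the rows whose city is Hayward
DELETE FROM weather WHERE city = 'Hayward';
```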

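All weather records belonging to Hayward are removed.

One should be wary of statements of the form DELETE FROM tablename; with no WHERE clause.

Without a qualification, DELETE will remove all rows from the given table, leaving it empty. The system will not request confirmation before doing this!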

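3. Advanced Features

3.1. Introduction

In the previous chapter we covered the basics of using SQL to store and access your data in PostgreSQL. We will now discuss some more advanced features of SQL that simplify management and prevent loss or corruption of your data. Finally, we will look at some PostgreSQL extensions.

This chapter will on occasion refer to examples found in Chapter 2 to change or improve them, so it will be useful to have read that chapter. Some examples from this chapter can also be found in advanced.sql in the tutorial directory. This file also contains some sample data to load, which is not repeated here. (Refer to Section 2.1 for how to use the file.)

3.2. Views

Refer back to the queries in Section 2.6. Suppose the combined listing of weather records and city location is of particular interest to your application, but you do not want to type the query each time you need it. You can create a view over the query, which gives a name to the query that you can refer to like an ordinary table.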
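
A minimal sketch, assuming the weather and cities tables from Chapter 2 (where cities has a name column). The view name myview is illustrative:

```sql
-- Give the join query a name
CREATE VIEW myview AS
    SELECT city, temp_lo, temp_hi, prcp, date, location
        FROM weather, cities
        WHERE city = name;

-- Then query it like an ordinary table
SELECT * FROM myview;
```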

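Making liberal use of views is a key aspect of good SQL database design. Views allow you to encapsulate the details of the structure of your tables, which might change as your application evolves, behind consistent interfaces.

Views can be used in almost any place a real table can be used. Building views upon other views is not uncommon.

3.3. Foreign Keys

Recall the weather and cities tables from Chapter 2. Consider the following problem: you want to make sure that no one can insert rows in the weather table that do not have a matching entry in the cities table. This is called maintaining the referential integrity of your data. In simplistic database systems this would be implemented (if at all) by first looking at the cities table to check if a matching record exists, and then inserting or rejecting the new weather records. This approach has a number of problems and is very inconvenient, so PostgreSQL can do this for you.

The new declaration of the tables would look like this: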

CREATE TABLE cities (
        city     varchar(80) primary key,
        location point
);

CREATE TABLE weather (
        city      varchar(80) references cities(city),
        temp_lo   int,
        temp_hi   int,
        prcp      real,
        date      date
);

Now try inserting an invalid record:

INSERT INTO weather VALUES ('Berkeley', 45, 53, 0.0, '1994-11-28');

ERROR:  insert or update on table "weather" violates foreign key constraint "weather_city_fkey"
DETAIL:  Key (city)=(Berkeley) is not present in table "cities".

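The behavior of foreign keys can be finely tuned to your application. We will not go beyond this simple example in this tutorial, but refer you to Chapter 5 for more information. Making correct use of foreign keys will definitely improve the quality of your database applications, so you are strongly encouraged to learn about them.

3.4. Transactions

Transactions are a fundamental concept of all database systems. The essential point of a transaction is that it bundles multiple steps into a single, all-or-nothing operation. The intermediate states between the steps are not visible to other concurrent transactions, and if some failure occurs that prevents the transaction from completing, then none of the steps affect the database at all.

For example, consider a bank database that contains balances for various customer accounts, as well as total deposit balances for branches. Suppose that we want to record a payment of $100.00 from Alice's account to Bob's account. Simplifying outrageously, this takes one UPDATE for each account and one for each branch total.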
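
A sketch of those statements, assuming hypothetical tables accounts(name, balance, branch_name) and branches(name, balance) that are not defined elsewhere in this manual:

```sql
-- Debit Alice and her branch
UPDATE accounts SET balance = balance - 100.00
    WHERE name = 'Alice';
UPDATE branches SET balance = balance - 100.00
    WHERE name = (SELECT branch_name FROM accounts WHERE name = 'Alice');

-- Credit Bob and his branch
UPDATE accounts SET balance = balance + 100.00
    WHERE name = 'Bob';
UPDATE branches SET balance = balance + 100.00
    WHERE name = (SELECT branch_name FROM accounts WHERE name = 'Bob');
```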

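The details of these commands are not important here; the important point is that there are several separate updates involved to accomplish this rather simple operation. Our bank's officers will want to be assured that either all these updates happen, or none of them happen. It would certainly not do for a system failure to result in Bob receiving $100.00 that was not debited from Alice. Nor would Alice long remain a happy customer if she was debited without Bob being credited. We need a guarantee that if something goes wrong partway through the operation, none of the steps executed so far will take effect. Grouping the updates into a transaction gives us this guarantee. A transaction is said to be atomic: from the point of view of other transactions, it either happens completely or not at all.

We also want a guarantee that once a transaction is completed and acknowledged by the database system, it has indeed been permanently recorded and won't be lost even if a crash ensues shortly thereafter. For example, if we are recording a cash withdrawal by Bob, we do not want any chance that the debit to his account will disappear in a crash just after he walks out the bank door. A transactional database guarantees that all the updates made by a transaction are logged in permanent storage (i.e., on disk) before the transaction is reported complete.

Another important property of transactional databases is closely related to the notion of atomic updates: when multiple transactions are running concurrently, each one should not be able to see the incomplete changes made by others. For example, if one transaction is busy totalling all the branch balances, it would not do for it to include the debit from Alice's branch but not the credit to Bob's branch, nor vice versa. So transactions must be all-or-nothing not only in terms of their permanent effect on the database, but also in terms of their visibility as they happen. The updates made so far by an open transaction are invisible to other transactions until the transaction completes, whereupon all the updates become visible simultaneously.

In PostgreSQL, a transaction is set up by surrounding the SQL commands of the transaction with BEGIN and COMMIT commands, and the banking transaction above would be wrapped in exactly that way.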
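
A minimal sketch of a transaction block, using the same hypothetical accounts table; the remaining updates of the transfer would go where the comment indicates:

```sql
BEGIN;
UPDATE accounts SET balance = balance - 100.00
    WHERE name = 'Alice';
-- ... the other updates of the transfer go here ...
COMMIT;
```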

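If, partway through the transaction, we decide we do not want to commit (perhaps we just noticed that Alice's balance went negative), we can issue the command ROLLBACK instead of COMMIT, and all our updates so far will be canceled.

PostgreSQL actually treats every SQL statement as being executed within a transaction. If you do not issue a BEGIN command, then each individual statement has an implicit BEGIN and (if successful) COMMIT wrapped around it. A group of statements surrounded by BEGIN and COMMIT is sometimes called a transaction block.

Note

Some client libraries issue BEGIN and COMMIT commands automatically, so that you might get the effect of transaction blocks without asking. Check the documentation for the interface you are using.

It is possible to control the statements in a transaction in a more granular fashion through the use of savepoints. Savepoints allow you to selectively discard parts of the transaction, while committing the rest. After defining a savepoint with SAVEPOINT, you can if needed roll back to the savepoint with ROLLBACK TO. All the transaction's database changes between defining the savepoint and rolling back to it are discarded, but changes earlier than the savepoint are kept.

After rolling back to a savepoint, it continues to be defined, so you can roll back to it several times. Conversely, if you are sure you won't need to roll back to a particular savepoint again, it can be released, so the system can free some resources. Keep in mind that either releasing or rolling back to a savepoint will automatically release all savepoints that were defined after it.

All this is happening within the transaction block, so none of it is visible to other database sessions. When and if you commit the transaction block, the committed actions become visible as a unit to other sessions, while the rolled-back actions never become visible at all.

Remembering the bank database, suppose we debit $100.00 from Alice's account and credit Bob's account, only to find later that we should have credited Wally's account. We could do it using savepoints.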
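
A sketch of that correction, again assuming the hypothetical accounts table; my_savepoint is an arbitrary name:

```sql
BEGIN;
UPDATE accounts SET balance = balance - 100.00
    WHERE name = 'Alice';
SAVEPOINT my_savepoint;
UPDATE accounts SET balance = balance + 100.00
    WHERE name = 'Bob';
-- oops ... forget that and use Wally's account
ROLLBACK TO my_savepoint;
UPDATE accounts SET balance = balance + 100.00
    WHERE name = 'Wally';
COMMIT;
```

Only the credit to Bob is undone; the debit from Alice, made before the savepoint, is kept and committed together with the credit to Wally.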

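This example is, of course, oversimplified, but there is a lot of control possible in a transaction block through the use of savepoints. Moreover, ROLLBACK TO is the only way to regain control of a transaction block that was put in aborted state by the system due to an error, short of rolling it back completely and starting again.

3.5. Window Functions

A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. However, window functions do not cause rows to become grouped into a single output row like non-window aggregate calls would. Instead, the rows retain their separate identities. Behind the scenes, the window function is able to access more than just the current row of the query result.

Here is an example that shows how to compare each employee's salary with the average salary in his or her department: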

SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary;

  depname  | empno | salary |          avg          
-----------+-------+--------+-----------------------
 develop   |    11 |   5200 | 5020.0000000000000000
 develop   |     7 |   4200 | 5020.0000000000000000
 develop   |     9 |   4500 | 5020.0000000000000000
 develop   |     8 |   6000 | 5020.0000000000000000
 develop   |    10 |   5200 | 5020.0000000000000000
 personnel |     5 |   3500 | 3700.0000000000000000
 personnel |     2 |   3900 | 3700.0000000000000000
 sales     |     3 |   4800 | 4866.6666666666666667
 sales     |     1 |   5000 | 4866.6666666666666667
 sales     |     4 |   4800 | 4866.6666666666666667
(10 rows)

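The first three output columns come directly from the table empsalary, and there is one output row for each row in the table. The fourth column represents an average taken across all the table rows that have the same depname value as the current row. (This actually is the same function as the non-window avg aggregate, but the OVER clause causes it to be treated as a window function and computed across the window frame.)

A window function call always contains an OVER clause directly following the window function's name and argument(s). This is what syntactically distinguishes it from a normal function or non-window aggregate. The OVER clause determines exactly how the rows of the query are split up for processing by the window function. The PARTITION BY clause within OVER divides the rows into groups, or partitions, that share the same values of the PARTITION BY expression(s). For each row, the window function is computed across the rows that fall into the same partition as the current row.

You can also control the order in which rows are processed by window functions using ORDER BY within OVER. (The window ORDER BY does not even have to match the order in which the rows are output.) Here is an example: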

SELECT depname, empno, salary,
       rank() OVER (PARTITION BY depname ORDER BY salary DESC)
FROM empsalary;

  depname  | empno | salary | rank 
-----------+-------+--------+------
 develop   |     8 |   6000 |    1
 develop   |    10 |   5200 |    2
 develop   |    11 |   5200 |    2
 develop   |     9 |   4500 |    4
 develop   |     7 |   4200 |    5
 personnel |     2 |   3900 |    1
 personnel |     5 |   3500 |    2
 sales     |     1 |   5000 |    1
 sales     |     4 |   4800 |    2
 sales     |     3 |   4800 |    2
(10 rows)

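As shown here, the rank function produces a numerical rank for each distinct ORDER BY value in the current row's partition, using the order defined by the ORDER BY clause. rank needs no explicit parameter, because its behavior is entirely determined by the OVER clause.

The rows considered by a window function are those of the "virtual table" produced by the query's FROM clause as filtered by its WHERE, GROUP BY, and HAVING clauses, if any. For example, a row removed because it does not meet the WHERE condition is not seen by any window function. A query can contain multiple window functions that slice up the data in different ways using different OVER clauses, but they all act on the same collection of rows defined by this virtual table.

We already saw that ORDER BY can be omitted if the ordering of rows is not important. It is also possible to omit PARTITION BY, in which case there is a single partition containing all rows.

There is another important concept associated with window functions: for each row, there is a set of rows within its partition called its window frame. Some window functions act only on the rows of the window frame, rather than on the whole partition. By default, if ORDER BY is supplied, the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to the ORDER BY clause. When ORDER BY is omitted, the default frame consists of all rows in the partition. Here is an example using sum: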

SELECT salary, sum(salary) OVER () FROM empsalary;

 salary |  sum  
--------+-------
   5200 | 47100
   5000 | 47100
   3500 | 47100
   4800 | 47100
   3900 | 47100
   4200 | 47100
   4500 | 47100
   4800 | 47100
   6000 | 47100
   5200 | 47100
(10 rows)

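Above, since there is no ORDER BY in the OVER clause, the window frame is the same as the partition, which for lack of PARTITION BY is the whole table. In other words, each sum is taken over the whole table, and so we get the same result for each output row. But if we add an ORDER BY clause, we get very different results: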

SELECT salary, sum(salary) OVER (ORDER BY salary) FROM empsalary;

 salary |  sum  
--------+-------
   3500 |  3500
   3900 |  7400
   4200 | 11600
   4500 | 16100
   4800 | 25700
   4800 | 25700
   5000 | 30700
   5200 | 41100
   5200 | 41100
   6000 | 47100
(10 rows)

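Here the sum is taken from the first (lowest) salary up through the current one, including any duplicates of the current one (notice the results for the duplicated salaries).

Window functions are permitted only in the SELECT list and the ORDER BY clause of the query. They are forbidden elsewhere, such as in GROUP BY, HAVING, and WHERE clauses. This is because they logically execute after the processing of those clauses. Also, window functions execute after non-window aggregate functions. This means it is valid to include an aggregate function call in the arguments of a window function, but not vice versa.

If there is a need to filter or group rows after the window calculations are performed, you can use a sub-select. For example: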

SELECT depname, empno, salary, enroll_date
FROM
  (SELECT depname, empno, salary, enroll_date,
          rank() OVER (PARTITION BY depname ORDER BY salary DESC, empno) AS pos
     FROM empsalary
  ) AS ss
WHERE pos < 3;

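The above query only shows the rows from the inner query having rank less than 3.

When a query involves multiple window functions, it is possible to write out each one with a separate OVER clause, but this is duplicative and error-prone if the same windowing behavior is wanted for several functions. Instead, each windowing behavior can be named in a WINDOW clause and then referenced in OVER. For example: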

SELECT sum(salary) OVER w, avg(salary) OVER w
  FROM empsalary
  WINDOW w AS (PARTITION BY depname ORDER BY salary DESC);

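More details about window functions can be found in Section 4.2.8, Section 9.21, Section 7.2.5, and the SELECT reference page.

3.6. Inheritance

Inheritance is a concept from object-oriented databases. It opens up interesting new possibilities of database design.

Let's create two tables: a table cities and a table capitals. Naturally, capitals are also cities, so you want some way to show the capitals implicitly when you list all cities. If you're really clever you might invent a scheme built from two separate tables and a view that unites them.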
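
A sketch of such a scheme. The column names name, population, altitude, and state follow the ones mentioned later in this section; the non_capitals table name is illustrative:

```sql
CREATE TABLE capitals (
    name        text,
    population  real,
    altitude    int,      -- in feet
    state       char(2)
);

CREATE TABLE non_capitals (
    name        text,
    population  real,
    altitude    int       -- in feet
);

-- The view stitches both tables together under the name cities
CREATE VIEW cities AS
    SELECT name, population, altitude FROM capitals
        UNION
    SELECT name, population, altitude FROM non_capitals;
```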

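This works OK as far as querying goes, but it gets ugly when you need to update several rows, for one thing.

A better solution is to let capitals inherit from cities.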
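
A minimal sketch of the inheritance-based design, using the same illustrative columns:

```sql
CREATE TABLE cities (
    name        text,
    population  real,
    altitude    int       -- in feet
);

-- capitals gets name, population and altitude from cities, plus its own state column
CREATE TABLE capitals (
    state       char(2)
) INHERITS (cities);
```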

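In this design, a row of capitals inherits all columns (name, population, and altitude) from its parent, cities. The type of the column name is text, a native PostgreSQL type for variable-length character strings. State capitals have an extra column, state, that shows their state. In PostgreSQL, a table can inherit from zero or more other tables.

For example, a query on cities with the condition altitude > 500 finds the names of all cities, including state capitals, that are located at an altitude over 500 feet.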
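
Assuming the cities and capitals tables sketched for this section, the query could look like this:

```sql
-- Rows from cities and from every table that inherits from it are scanned
SELECT name, altitude
    FROM cities
    WHERE altitude > 500;
```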

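The result contains matching rows from both cities and capitals.

On the other hand, adding ONLY before the table name finds all the cities that are not state capitals and are situated at an altitude over 500 feet.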
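
A sketch of the restricted query, under the same assumptions:

```sql
-- ONLY excludes rows stored in tables that inherit from cities
SELECT name, altitude
    FROM ONLY cities
    WHERE altitude > 500;
```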

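Here the ONLY before cities indicates that the query should be run over only the cities table, and not tables below cities in the inheritance hierarchy. Many of the commands that we have already discussed (SELECT, UPDATE, and DELETE) support this ONLY notation.

Note

Although inheritance is frequently useful, it has not been integrated with unique constraints or foreign keys, which limits its usefulness. See the section on inheritance for more detail.

3.7. Conclusion

PostgreSQL has many features not touched upon in this tutorial introduction, which has been oriented toward newer users of SQL. These features are discussed in more detail in the remainder of this book.

If you feel you need more introductory material, please visit the PostgreSQL web site for links to more resources.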

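II. The SQL Language

This part describes the use of the SQL language in PostgreSQL. We start with describing the general syntax of SQL, then explain how to create the structures to hold data, how to populate the database, and how to query it. The middle part lists the available data types and functions for use in SQL commands. The rest treats several aspects that are important for tuning a database for optimal performance.

The information in this part is arranged so that a novice user can follow it start to end to gain a full understanding of the topics without having to refer forward too many times. The chapters are intended to be self-contained, so that advanced users can read the chapters individually as they choose. The information in this part is presented in a narrative fashion in topical units. Readers looking for a complete description of a particular command should see Part VI.

Readers of this part should know how to connect to a PostgreSQL database and issue SQL commands. Readers that are unfamiliar with these issues are encouraged to read Part I first. SQL commands are typically entered using the PostgreSQL interactive terminal psql, but other programs that have similar functionality can be used as well.

4. SQL Syntax

This chapter describes the syntax of SQL. It forms the foundation for understanding the following chapters, which go into detail about how SQL commands are applied to define and modify data.

We also advise users who are already familiar with SQL to read this chapter carefully, because it contains several rules and concepts that are implemented inconsistently among SQL databases or that are specific to PostgreSQL.

4.3. Calling Functions

PostgreSQL allows functions that have named parameters to be called using either positional or named notation. Named notation is especially useful for functions that have a large number of parameters, since it makes the associations between parameters and actual arguments more explicit and reliable. In positional notation, a function call is written with its argument values in the same order as they are defined in the function declaration. In named notation, the arguments are matched to the function parameters by name and can be written in any order.

In either notation, parameters that have default values given in the function declaration need not be written in the call at all. But this is particularly useful in named notation, since any combination of parameters can be omitted; while in positional notation parameters can only be omitted from right to left.

PostgreSQL also supports mixed notation, which combines positional and named notation. In this case, positional parameters are written first and named parameters appear after them.

The following examples illustrate the usage of all three notations, using the following function definition: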

CREATE FUNCTION concat_lower_or_upper(a text, b text, uppercase boolean DEFAULT false)
RETURNS text
AS
$$
 SELECT CASE
        WHEN $3 THEN UPPER($1 || ' ' || $2)
        ELSE LOWER($1 || ' ' || $2)
        END;
$$
LANGUAGE SQL IMMUTABLE STRICT;

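Function concat_lower_or_upper has two mandatory parameters, a and b. Additionally there is one optional parameter, uppercase, which defaults to false. The a and b inputs will be concatenated, and forced to either upper or lower case depending on the uppercase parameter. The remaining details of this function definition are not important here (see Chapter 37 for more information).

4.3.1. Using Positional Notation

Positional notation is the traditional mechanism for passing arguments to functions in PostgreSQL. An example is: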

SELECT concat_lower_or_upper('Hello', 'World', true);
 concat_lower_or_upper 
-----------------------
 HELLO WORLD
(1 row)

All arguments are specified in order. The result is upper case since uppercase is specified as true. Another example is:

SELECT concat_lower_or_upper('Hello', 'World');
 concat_lower_or_upper 
-----------------------
 hello world
(1 row)

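Here, the uppercase parameter is omitted, so it receives its default value of false, resulting in lower case output. In positional notation, arguments can be omitted from right to left so long as they have defaults.

4.3.2. Using Named Notation

In named notation, each argument's name is specified using => to separate it from the argument expression. For example: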


SELECT concat_lower_or_upper(a => 'Hello', b => 'World');
 concat_lower_or_upper 
-----------------------
 hello world
(1 row)

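Again, the argument uppercase was omitted, so it is set to false implicitly. One advantage of using named notation is that the arguments may be specified in any order, for example: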

SELECT concat_lower_or_upper(a => 'Hello', b => 'World', uppercase => true);
 concat_lower_or_upper 
-----------------------
 HELLO WORLD
(1 row)

SELECT concat_lower_or_upper(a => 'Hello', uppercase => true, b => 'World');
 concat_lower_or_upper 
-----------------------
 HELLO WORLD
(1 row)

An older syntax based on ":=" is supported for backward compatibility:

SELECT concat_lower_or_upper(a := 'Hello', uppercase := true, b := 'World');
 concat_lower_or_upper 
-----------------------
 HELLO WORLD
(1 row)

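4.3.3. Using Mixed Notation

The mixed notation combines positional and named notation. However, as already mentioned, named arguments cannot precede positional arguments. For example: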

SELECT concat_lower_or_upper('Hello', 'World', uppercase => true);
 concat_lower_or_upper 
-----------------------
 HELLO WORLD
(1 row)

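In the above query, the arguments a and b are specified positionally, while uppercase is specified by name. In this example, that adds little except documentation. With a more complex function having numerous parameters that have default values, named or mixed notation can save a great deal of writing and reduce chances for error.

Note

Named and mixed call notations currently cannot be used when calling an aggregate function (but they do work when an aggregate function is used as a window function).

5. Data Definition

This chapter covers how one creates the database structures that will hold one's data. In a relational database, the raw data is stored in tables, so the majority of this chapter is devoted to explaining how tables are created and modified and what features are available to control what data is stored in the tables. Subsequently, we discuss how tables can be organized into schemas, and how privileges can be assigned to tables. Finally, we will briefly look at other features that affect the data storage, such as inheritance, table partitioning, views, functions, and triggers.

5.1. Table Basics

A table in a relational database is much like a table on paper: it consists of rows and columns. The number and order of the columns is fixed, and each column has a name. The number of rows is variable; it reflects how much data is stored at a given moment. SQL does not make any guarantees about the order of the rows in a table. When a table is read, the rows will appear in an unspecified order, unless sorting is explicitly requested. This is covered in the chapter on queries. Furthermore, SQL does not assign unique identifiers to rows, so it is possible to have several completely identical rows in a table. This is a consequence of the mathematical model that underlies SQL but is usually not desirable. Later in this chapter we will see how to deal with this issue.

Each column has a data type. The data type constrains the set of possible values that can be assigned to a column and assigns semantics to the data stored in the column so that it can be used for computations. For instance, a column declared to be of a numerical type will not accept arbitrary text strings, and the data stored in such a column can be used for mathematical computations. By contrast, a column declared to be of a character string type will accept almost any kind of data but it does not lend itself to mathematical calculations, although other operations such as string concatenation are available.

PostgreSQL includes a sizable set of built-in data types that fit many applications. Users can also define their own data types. Most built-in data types have obvious names and semantics, so we defer a detailed explanation to the chapter on data types. Some of the frequently used data types are integer for whole numbers, numeric for possibly fractional numbers, text for character strings, date for dates, time for time-of-day values, and timestamp for values containing both date and time.

To create a table, you use the aptly named CREATE TABLE command. In this command you specify at least a name for the new table, the names of the columns, and the data type of each column.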
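
A minimal sketch, using the table and column names that the following paragraph describes:

```sql
-- Two columns: a text column and an integer column
CREATE TABLE my_first_table (
    first_column text,
    second_column integer
);
```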

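A table named my_first_table with two columns is the simplest useful case. The first column is named first_column and has a data type of text; the second column has the name second_column and the type integer. The table and column names follow the identifier syntax explained in the SQL syntax chapter, with some exceptions. Note that the column list is comma-separated and surrounded by parentheses.

Of course, that example was heavily contrived. Normally, you would give names to your tables and columns that convey what kind of data they store, so a more realistic example is a products table.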
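
One plausible shape for such a table, matching the products columns used in section 5.2:

```sql
CREATE TABLE products (
    product_no integer,
    name text,
    price numeric   -- exact decimal values, suitable for monetary amounts
);
```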

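(The numeric type can store fractional components, as would be typical of monetary amounts.)

Tip
When you create many interrelated tables it is wise to choose a consistent naming pattern for the tables and columns. For instance, there is a choice of using singular or plural nouns for table names, both of which are favored by some theorist or other.

There is a limit on how many columns a table can contain. Depending on the column types, it is between 250 and 1600. However, defining a table with anywhere near this many columns is highly unusual and often a questionable design.

If you no longer need a table, you can remove it using the DROP TABLE command.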
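
For illustration, assuming the two example tables named in this section exist:

```sql
DROP TABLE my_first_table;
DROP TABLE products;
```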

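Attempting to drop a table that does not exist is an error. Nevertheless, it is common in SQL script files to unconditionally try to drop each table before creating it, ignoring any error messages, so that the script works whether or not the table exists. (If you like, you can use the DROP TABLE IF EXISTS variant to avoid the error messages, but this is not standard SQL.)

If you need to modify a table that already exists, see Section 5.6 later in this chapter.

With the tools discussed so far you can create fully functional tables. The remainder of this chapter is concerned with adding features to the table definition to ensure data integrity, security, or convenience. If you are eager to fill your tables with data now, you can skip ahead to Chapter 6 and read the rest of this chapter later.

5.2. Default Values

A column can be assigned a default value. When a new row is created and no values are specified for some of the columns, those columns will be filled with their respective default values. A data manipulation command can also request explicitly that a column be set to its default value, without having to know what that value is. (Details about data manipulation commands are in Chapter 6.)

If no default value is declared explicitly, the default value is the null value. This usually makes sense because a null value can be considered to represent unknown data.

In a table definition, default values are listed after the column data type. For example: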

CREATE TABLE products (
    product_no integer,
    name text,
    price numeric DEFAULT 9.99
);

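The default value can be an expression, which will be evaluated whenever the default value is inserted (not when the table is created). A common example is for a timestamp column to have a default of CURRENT_TIMESTAMP, so that it gets set to the time of row insertion. Another common example is generating a "serial number" for each row. In PostgreSQL this is typically done by something like: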

CREATE TABLE products (
    product_no integer DEFAULT nextval('products_product_no_seq'),
    ...
);

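where the nextval() function supplies successive values from a sequence object (see Section 9.16). This arrangement is sufficiently common that there is a special shorthand for it: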

CREATE TABLE products (
    product_no SERIAL,
    ...
);

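The SERIAL shorthand is discussed further in Section 8.1.4.

5.3. Generated Columns

A generated column is a special column that is always computed from other columns. Thus, it is for columns what a view is for tables. There are two kinds of generated columns: stored and virtual. A stored generated column is computed when it is written (inserted or updated) and occupies storage as if it were a normal column. A virtual generated column occupies no storage and is computed when it is read. Thus, a virtual generated column is similar to a view and a stored generated column is similar to a materialized view (except that it is always updated automatically). PostgreSQL currently implements only stored generated columns.

To create a generated column, use the GENERATED ALWAYS AS clause in CREATE TABLE.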
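
A minimal sketch of a stored generated column. The people table and its columns are hypothetical names chosen for illustration:

```sql
CREATE TABLE people (
    name      text,
    height_cm numeric,
    -- recomputed from height_cm whenever the row is written
    height_in numeric GENERATED ALWAYS AS (height_cm / 2.54) STORED
);
```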

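The keyword STORED must be specified to choose the stored kind of generated column. See the CREATE TABLE reference page for more details.

A generated column cannot be written to directly. In INSERT or UPDATE commands, a value cannot be specified for a generated column, but the keyword DEFAULT may be specified.

Consider the differences between a column with a default and a generated column:

The column default is evaluated once when the row is first inserted if no other value was provided; a generated column is updated whenever the row changes and cannot be overridden.

A column default may not refer to other columns of the table; a generation expression would normally do so.
A column default can use volatile functions, for example random() or functions referring to the current time; this is not allowed for generated columns.

Several restrictions apply to the definition of generated columns and tables involving generated columns:

The generation expression can only use immutable functions and cannot use subqueries or reference anything other than the current row in any way.
A generation expression cannot reference another generated column.
A generation expression cannot reference a system column, except tableoid.
A generated column cannot have a column default or an identity definition.
A generated column cannot be part of a partition key.
Foreign tables can have generated columns. See the CREATE FOREIGN TABLE reference page for details.

Additional considerations apply to the use of generated columns.

Generated columns maintain access privileges separately from their underlying base columns. So, it is possible to arrange it so that a particular role can read from a generated column but not from the underlying base columns.
Generated columns are, conceptually, updated after BEFORE triggers have run. Therefore, changes made to base columns in a BEFORE trigger will be reflected in generated columns. But conversely, it is not allowed to access generated columns in BEFORE triggers.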

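5.5. System Columns

Every table has several system columns that are implicitly defined by the system. Therefore, these names cannot be used as names of user-defined columns. (Note that these restrictions are separate from whether the name is a key word or not; quoting a name will not allow you to escape these restrictions.) You do not really need to be concerned about these columns; just know they exist.

oid

The object identifier (object ID) of a row. This column is only present if the table was created using WITH OIDS, or if the default_with_oids configuration variable was set at the time. This column is of type oid (same name as the column); see Section 8.18 for more information about the type.

tableoid

The OID of the table containing this row. This column is particularly handy for queries that select from inheritance hierarchies (see Section 5.9), since without it, it is difficult to tell which individual table a row came from. The tableoid can be joined against the oid column of pg_class to obtain the table name.

xmin

The identity (transaction ID) of the inserting transaction for this row version. (A row version is an individual state of a row; each update of a row creates a new row version for the same logical row.)

cmin

The command identifier (starting at zero) within the inserting transaction.

xmax

The identity (transaction ID) of the deleting transaction, or zero for an undeleted row version. It is possible for this column to be nonzero in a visible row version. That usually indicates that the deleting transaction hasn't committed yet, or that an attempted deletion was rolled back.

cmax

The command identifier within the deleting transaction, or zero.

ctid

The physical location of the row version within its table. Note that although the ctid can be used to locate the row version very quickly, a row's ctid will change if it is updated or moved by VACUUM FULL. Therefore ctid is useless as a long-term row identifier. The OID, or even better a user-defined serial number, should be used to identify logical rows.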

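OIDs are 32-bit quantities and are assigned from a single cluster-wide counter. In a large or long-lived database, it is possible for the counter to wrap around. Hence, it is bad practice to assume that OIDs are unique, unless you take steps to ensure that this is the case. If you need to identify the rows in a table, using a sequence generator is strongly recommended. However, OIDs can be used as well, provided that a few additional precautions are taken:

A unique constraint should be created on the OID column of each table for which the OID will be used to identify rows. When such a unique constraint (or unique index) exists, the system takes care not to generate an OID matching an already-existing row. (Of course, this is only possible if the table contains fewer than 2^32 (4 billion) rows, and in practice the table size had better be much less than that, or performance might suffer.)
OIDs should never be assumed to be unique across tables; use the combination of tableoid and row OID if you need a database-wide identifier.
Of course, the tables in question must be created WITH OIDS. As of PostgreSQL 8.1, WITHOUT OIDS is the default.

Transaction identifiers are also 32-bit quantities. In a long-lived database it is possible for transaction IDs to wrap around. This is not a fatal problem given appropriate maintenance procedures; see Chapter 24 for details. It is unwise, however, to depend on the uniqueness of transaction IDs over the long term (more than one billion transactions).

Command identifiers are also 32-bit quantities. This creates a hard limit of 2^32 (4 billion) SQL commands within a single transaction. In practice this limit is not a problem. Note that the limit is on the number of SQL commands, not the number of rows processed. Also, only commands that actually modify the database contents will consume a command identifier.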

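5.6. Modifying Tables

When you create a table and then realize that you made a mistake, or the requirements of the application change, you can drop the table and create it again. But this is not a convenient option if the table is already filled with data, or if the table is referenced by other database objects (for instance a foreign key constraint). Therefore PostgreSQL provides a family of commands to make modifications to existing tables. Note that this is conceptually distinct from altering the data contained in the table: here we are interested in altering the definition, or structure, of the table.

You can:

Add columns
Remove columns
Add constraints
Remove constraints
Change default values
Change column data types
Rename columns
Rename tables

All these actions are performed using the ALTER TABLE command, whose reference page contains further details.

5.6.1. Adding a Column

To add a column, use a command like: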
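A minimal sketch of such a command, assuming the products table used elsewhere in this chapter; the description column is only illustrative:

```sql
-- Assumes a table named products already exists.
ALTER TABLE products ADD COLUMN description text;
```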

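The new column is initially filled with whatever default value is given (null if you don't specify a DEFAULT clause).

You can also define constraints on the column at the same time: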
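For instance, a column could be added together with a CHECK constraint. This is a sketch under the same assumptions (an existing products table, an illustrative description column):

```sql
-- Adds the column and rejects empty strings in it.
ALTER TABLE products ADD COLUMN description text CHECK (description <> '');
```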

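In fact, all the options that can be applied to a column description in CREATE TABLE can be used here. Keep in mind, however, that the default value must satisfy the given constraints, or the ADD will fail. Alternatively, you can add constraints later (see below), after you have filled in the new column correctly.

Tip

Adding a column with a default requires updating each row of the table (to store the new column value). However, if no default is specified, PostgreSQL is able to avoid the physical update. So if you intend to fill the column with mostly nondefault values, it is best to add the column with no default, insert the correct values using UPDATE, and then add any desired default as described below.

5.6.2. Removing a Column

To remove a column, use a command like: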
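A sketch of the command, again assuming the products table and an illustrative description column:

```sql
-- Removes the column and all data stored in it.
ALTER TABLE products DROP COLUMN description;
```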

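Whatever data was in the column disappears. Table constraints involving the column are dropped, too. However, if the column is referenced by a foreign key constraint of another table, PostgreSQL will not silently drop that constraint. You can authorize dropping everything that depends on the column by adding CASCADE: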
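With CASCADE, the command might look as follows (same assumptions as before):

```sql
-- Also drops objects that depend on the column, such as
-- foreign key constraints in other tables.
ALTER TABLE products DROP COLUMN description CASCADE;
```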

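See the section on dependency tracking for a description of the general mechanism behind this.

5.6.3. Adding a Constraint

To add a constraint, the table constraint syntax is used. For example: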
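The following sketch shows three kinds of table constraints. The constraint name some_name, the product_group_id column, and the product_groups table are hypothetical and assumed to exist.

```sql
-- An unnamed CHECK constraint; the system generates a name for it.
ALTER TABLE products ADD CHECK (name <> '');
-- A named UNIQUE constraint.
ALTER TABLE products ADD CONSTRAINT some_name UNIQUE (product_no);
-- A foreign key referencing the primary key of product_groups.
ALTER TABLE products ADD FOREIGN KEY (product_group_id) REFERENCES product_groups;
```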

To add a not-null constraint, which cannot be written as a table constraint, use this syntax:
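A sketch, assuming the products table has a product_no column:

```sql
-- Fails if the column currently contains any null values.
ALTER TABLE products ALTER COLUMN product_no SET NOT NULL;
```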

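The constraint is checked immediately, so the table data must satisfy the constraint before it can be added.

5.6.4. Removing a Constraint

To remove a constraint you need to know its name. If you gave it a name, that is easy. Otherwise the system assigned a generated name, which you need to find out. The psql command \d tablename can be helpful here; other interfaces might also provide a way to inspect table details. Then the command is: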
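For example, assuming a constraint called some_name exists on products (the name is hypothetical):

```sql
ALTER TABLE products DROP CONSTRAINT some_name;
```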

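(If you are dealing with a generated constraint name like $2, don't forget that you need to double-quote it to make it a valid identifier.)

As with dropping a column, you need to add CASCADE if you want to drop a constraint that something else depends on. An example is that a foreign key constraint depends on a unique or primary key constraint on the referenced column(s).

This works the same for all constraint types except not-null constraints. To drop a not-null constraint, use: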
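A sketch, assuming the product_no column of products currently has a not-null constraint:

```sql
ALTER TABLE products ALTER COLUMN product_no DROP NOT NULL;
```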

(Recall that not-null constraints do not have names.)

5.6.5. Changing a Column's Default Value

To set a new default for a column, use a command like:
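For instance (the value 7.77 is arbitrary, and a price column on products is assumed):

```sql
ALTER TABLE products ALTER COLUMN price SET DEFAULT 7.77;
```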

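Note that this does not affect any existing rows in the table; it only changes the default for future INSERT commands.

To remove any default value, use: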
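Under the same assumptions, removing the default might look like this:

```sql
ALTER TABLE products ALTER COLUMN price DROP DEFAULT;
```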

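This is effectively the same as setting the default to null. As a consequence, it is not an error to drop a default where one had not been defined, because the default is implicitly the null value.

5.6.6. Changing a Column's Data Type

To convert a column to a different data type, use a command like: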
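A sketch that changes the assumed price column to a numeric type with two decimal places:

```sql
ALTER TABLE products ALTER COLUMN price TYPE numeric(10,2);
```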

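This succeeds only if each existing entry in the column can be converted to the new type by an implicit cast. If a more complex conversion is needed, you can add a USING clause that specifies how to compute the new values from the old.

PostgreSQL will attempt to convert the column's default value (if any) to the new type, as well as any constraints that involve the column. But these conversions might fail, or might produce surprising results. It is often best to drop any constraints on the column before altering its type, and then add back suitably modified constraints afterwards.

5.6.7. Renaming a Column

To rename a column: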
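For example, assuming the product_no column of products (the new name is illustrative):

```sql
ALTER TABLE products RENAME COLUMN product_no TO product_number;
```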

5.6.8. Renaming a Table

To rename a table:
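For example (the new name items is only illustrative):

```sql
ALTER TABLE products RENAME TO items;
```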

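5.7. Privileges

When an object is created, it is assigned an owner. The owner is normally the user that executed the creation statement. For most kinds of objects, the initial state is that only the owner (or a superuser) can do anything with the object. To allow other users to use it, privileges must be granted.

There are different kinds of privileges: SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER, CREATE, CONNECT, TEMPORARY, EXECUTE, and USAGE. The privileges applicable to a particular object vary depending on the object's type (table, function, etc.). For complete information on the different types of privileges supported by PostgreSQL, refer to the GRANT reference page. This section shows how they are used.

The right to modify or destroy an object is always the privilege of the owner only.

An object can be assigned to a new owner with an ALTER command of the appropriate kind for the object, for example ALTER TABLE. Superusers can always do this; ordinary users can only do it if they are both the current owner of the object (or a member of the owning role) and a member of the new owning role.

To assign privileges, the GRANT command is used. For example, if joe is an existing user and accounts is an existing table, the privilege to update the table can be granted with: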

GRANT UPDATE ON accounts TO joe;

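Writing ALL in place of a specific privilege grants all privileges that are relevant for the object type.

The special name PUBLIC can be used to grant a privilege to every user on the system. Also, "group" roles can be set up to help manage privileges when there are many users of a database; for details see Chapter 21.

To revoke a privilege, use the REVOKE command: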

REVOKE ALL ON accounts FROM PUBLIC;

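The special privileges of the object owner (that is, the right to do DROP, GRANT, REVOKE, etc.) are always implicit in being the owner, and cannot be granted or revoked. But the object owner can choose to revoke their own ordinary privileges, for example to make a table read-only for themselves as well as others.

Ordinarily, only the object's owner (or a superuser) can grant or revoke privileges on an object. However, it is possible to grant a privilege "with grant option", which gives the recipient the right to grant it in turn to others. If the grant option is subsequently revoked, then all who received the privilege from that recipient (directly or through a chain of grants) lose the privilege. For details see the GRANT and REVOKE reference pages.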
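A grant option could be passed on as in the following sketch, which reuses the accounts table and the user joe from the examples above:

```sql
-- joe may read accounts and may grant SELECT on it to others.
GRANT SELECT ON accounts TO joe WITH GRANT OPTION;

-- Taking the grant option away again; CASCADE also revokes the
-- privilege from anyone joe granted it to.
REVOKE GRANT OPTION FOR SELECT ON accounts FROM joe CASCADE;
```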

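5.12. Foreign Data

PostgreSQL implements portions of the SQL/MED specification, allowing you to access data that resides outside PostgreSQL using regular SQL queries. Such data is referred to as foreign data. (Note that this usage is not to be confused with foreign keys, which are a type of constraint within the database.)

Foreign data is accessed with help from a foreign data wrapper. A foreign data wrapper is a library that can communicate with an external data source, hiding the details of connecting to the data source and obtaining data from it. Some foreign data wrappers are available as contrib modules; see Appendix F. Other kinds of foreign data wrappers might be found as third-party products. If none of the existing foreign data wrappers suit your needs, you can write your own; see Chapter 56.

To access foreign data, you need to create a foreign server object, which defines how to connect to a particular external data source according to the set of options used by its supporting foreign data wrapper. Then you need to create one or more foreign tables, which define the structure of the remote data. A foreign table can be used in queries just like a normal table, but a foreign table has no storage in the PostgreSQL server. Whenever it is used, PostgreSQL asks the foreign data wrapper to fetch data from the external source, or to transmit data to the external source in the case of update commands.

Accessing remote data may require authenticating to the external data source. This information can be provided by a user mapping, which can supply additional data such as user names and passwords based on the current PostgreSQL user.

For additional information, see the reference pages of the related commands.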

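5.13. Other Database Objects

Tables are the central objects in a relational database structure, because they hold your data. But they are not the only objects that exist in a database. Many other kinds of objects can be created to make the use and management of the data more efficient or convenient. They are not discussed in this chapter, but we give you a list here so that you are aware of what is possible:

Views
Functions and operators
Data types and domains
Triggers and rewrite rules

Detailed information on these topics appears in Part V.

5.14. Dependency Tracking

When you create complex database structures involving many tables with foreign key constraints, views, triggers, functions, etc., you implicitly create a net of dependencies between the objects. For instance, a table with a foreign key constraint depends on the table it references.

To ensure the integrity of the entire database structure, PostgreSQL makes sure that you cannot drop objects that other objects still depend on. For example, attempting to drop the products table we considered in Section 5.3.5, with the orders table depending on it, would result in an error message like this: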

DROP TABLE products;

ERROR:  cannot drop table products because other objects depend on it
DETAIL:  constraint orders_product_no_fkey on table orders depends on table products
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

The error message contains a useful hint: if you do not want to bother deleting all the dependent objects individually, you can run:

DROP TABLE products CASCADE;

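and all the dependent objects will be removed, as will any objects that depend on them, recursively. In this case, it does not remove the orders table; it only removes the foreign key constraint. It stops there because nothing depends on the foreign key constraint. (If you want to check what DROP ... CASCADE will do, run DROP without CASCADE and read the DETAIL output.)

Almost all DROP commands in PostgreSQL support specifying CASCADE. Of course, the nature of the possible dependencies varies with the type of the object. You can also write RESTRICT instead of CASCADE to get the default behavior, which is to prevent dropping objects that any other objects depend on.

Note

According to the SQL standard, specifying either RESTRICT or CASCADE is required in a DROP command. No database system actually enforces that rule, but whether the default behavior is RESTRICT or CASCADE varies across systems.

If a DROP command lists multiple objects, CASCADE is only required when there are dependencies outside the specified group. For example, when saying DROP TABLE tab1, tab2, the existence of a foreign key referencing tab1 from tab2 would not mean that CASCADE is needed to succeed.

For user-defined functions, PostgreSQL tracks dependencies associated with a function's externally visible properties, such as its argument and result types, but not dependencies that could only be known by examining the function body. As an example, consider this situation: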

CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow',
                             'green', 'blue', 'purple');

CREATE TABLE my_colors (color rainbow, note text);

CREATE FUNCTION get_color_note (rainbow) RETURNS text AS
  'SELECT note FROM my_colors WHERE color = $1'
  LANGUAGE SQL;

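(See Section 37.4 for an explanation of SQL-language functions.) PostgreSQL is aware that the get_color_note function depends on the rainbow type: dropping the type would force dropping the function, because its argument type would no longer be defined. But PostgreSQL does not consider get_color_note to depend on the my_colors table, and so will not drop the function if the table is dropped. While there are disadvantages to this approach, there are also benefits. The function is still valid in some sense if the table is missing, though executing it would cause an error; creating a new table of the same name would allow the function to work again.

6. Data Manipulation

The previous chapter discussed how to create tables and other structures to hold your data. Now it is time to fill the tables with data. This chapter covers how to insert, update, and delete table data. The chapter after this will finally explain how to extract your long-lost data from the database.

6.1. Inserting Data

When a table is created, it contains no data. The first thing to do before a database can be of much use is to insert data. Data is conceptually inserted one row at a time. Of course you can also insert more than one row, but there is no way to insert less than one row. Even if you know only some column values, a complete row must be created.

To create a new row, use the INSERT command. The command requires the table name and column values. For example, consider the products table from the previous chapter: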
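The examples that follow assume a products table roughly like this sketch; the exact column list is an assumption made for illustration:

```sql
CREATE TABLE products (
    product_no integer,
    name text,
    price numeric
);
```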

An example command to insert a row would be:
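A sketch, assuming a products table with the columns product_no, name, and price in that order (the values are made up):

```sql
INSERT INTO products VALUES (1, 'Cheese', 9.99);
```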

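The data values are listed in the order in which the columns appear in the table, separated by commas. Usually, the data values will be literals (constants), but scalar expressions are also allowed.

The above syntax has the drawback that you need to know the order of the columns in the table. To avoid this you can also list the columns explicitly. For example, both of the following commands have the same effect as the one above: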
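With explicit column lists, and under the same assumptions about the products table, the commands might read:

```sql
-- The column list determines how the values are assigned,
-- so the order may differ from the table definition.
INSERT INTO products (product_no, name, price) VALUES (1, 'Cheese', 9.99);
INSERT INTO products (name, price, product_no) VALUES ('Cheese', 9.99, 1);
```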

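Many users consider it good practice to always list the column names.

If you don't have values for all the columns, you can omit some of them. In that case, the columns will be filled with their default values. For example: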
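A sketch of both forms, assuming the same products table; in each case price receives its default value:

```sql
-- First form: columns named explicitly, price omitted.
INSERT INTO products (product_no, name) VALUES (1, 'Cheese');
-- Second form: no column list, values assigned from the left.
INSERT INTO products VALUES (1, 'Cheese');
```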

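The second form is a PostgreSQL extension. It fills the columns from the left with as many values as are given, and the rest will be defaulted.

For clarity, you can also request default values explicitly, for individual columns or for the entire row: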
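For instance, under the same assumptions about the products table:

```sql
-- Default requested for a single column.
INSERT INTO products (product_no, name, price) VALUES (1, 'Cheese', DEFAULT);
-- Defaults requested for the entire row.
INSERT INTO products DEFAULT VALUES;
```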

You can insert multiple rows in a single command:
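A multi-row insert might look like this sketch (sample values are made up):

```sql
INSERT INTO products (product_no, name, price) VALUES
    (1, 'Cheese', 9.99),
    (2, 'Bread', 1.99),
    (3, 'Milk', 2.99);
```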

It is also possible to insert the result of a query (which might be no rows, one row, or many rows):
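In the following sketch, new_products and its release_date column are hypothetical and assumed to exist alongside products:

```sql
-- Copies today's releases from new_products into products.
INSERT INTO products (product_no, name, price)
    SELECT product_no, name, price FROM new_products
    WHERE release_date = 'today';
```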

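This provides the full power of the SQL query mechanism (described in the next chapter) for computing the rows to be inserted.

Tip

When inserting a lot of data at the same time, consider using the COPY command. It is not as flexible as the INSERT command, but is more efficient. More information on improving bulk loading performance is available elsewhere in this manual.

6.2. Updating Data

The modification of data that is already in the database is referred to as updating. You can update individual rows, all the rows in a table, or a subset of all rows. Each column can be updated separately; the other columns are not affected.

To update existing rows, use the UPDATE command. This requires three pieces of information:

The name of the table and column to update
The new value of the column
Which row(s) to update

Recall that SQL does not, in general, provide a unique identifier for rows. Therefore it is not always possible to directly specify which row to update. Instead, you specify which conditions a row must meet in order to be updated. Only if you have a primary key in the table (independent of whether you declared it or not) can you reliably address individual rows by choosing a condition that matches the primary key. Graphical database access tools rely on this fact to allow you to update rows individually.

For example, this command updates all products that have a price of 5 to have a price of 10: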
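A sketch of the command described above, assuming the products table with a price column:

```sql
UPDATE products SET price = 10 WHERE price = 5;
```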

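This might cause zero, one, or many rows to be updated. It is not an error to attempt an update that does not match any rows.

Let's look at that command in detail. First is the key word UPDATE followed by the table name. As usual, the table name can be schema-qualified, otherwise it is looked up in the path. Next is the key word SET followed by the column name, an equal sign, and the new column value. The new column value can be any scalar expression, not just a constant. For example, if you want to raise the price of all products by 10% you could use: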
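Such a price increase could be written as follows (same assumed products table):

```sql
-- No WHERE clause, so every row is updated.
UPDATE products SET price = price * 1.10;
```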

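As you see, the expression for the new value can refer to the existing value(s) in the row. We also left out the WHERE clause. If it is omitted, it means that all rows in the table are updated. If it is present, only those rows that match the WHERE condition are updated. Note that the equals sign in the SET clause is an assignment while the one in the WHERE clause is a comparison, but this does not create any ambiguity. Of course, the WHERE condition does not have to be an equality test. Many other operators are available (see Chapter 9). But the expression needs to evaluate to a Boolean result.

You can update more than one column in an UPDATE command by listing more than one assignment in the SET clause. For example: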
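In this sketch, mytable and its columns a, b, and c are hypothetical:

```sql
UPDATE mytable SET a = 5, b = 3, c = 1 WHERE a > 0;
```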

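6.3. Deleting Data

So far we have explained how to add data to tables and how to change data. What remains is to discuss how to remove data that is no longer needed. Just as adding data is only possible in whole rows, you can only remove entire rows from a table. In the previous section we explained that SQL does not provide a way to directly address individual rows. Therefore, removing rows can only be done by specifying conditions that the rows to be removed have to match. If you have a primary key in the table then you can specify the exact row. But you can also remove groups of rows matching a condition, or you can remove all rows in the table at once.

You use the DELETE command to remove rows; the syntax is very similar to the UPDATE command. For instance, to remove all rows from the products table that have a price of 10, use: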

DELETE FROM products WHERE price = 10;

If you simply write:

DELETE FROM products;

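then all rows in the table will be deleted! Programmers should take care when using this form.

6.4. Returning Data from Modified Rows

Sometimes it is useful to obtain data from modified rows while they are being manipulated. The INSERT, UPDATE, and DELETE commands all have an optional RETURNING clause that supports this. Use of RETURNING avoids performing an extra database query to collect the data, and is especially valuable when it would otherwise be difficult to identify the modified rows reliably.

The allowed contents of a RETURNING clause are the same as a SELECT command's output list (see Section 7.3). It can contain column names of the command's target table, or value expressions using those columns. A common shorthand is RETURNING *, which selects all columns of the target table in order.

In an INSERT, the data available to RETURNING is the row as it was inserted. This is not so useful in trivial inserts, since it would just repeat the data provided by the client. But it can be very handy when relying on computed default values. For example, when using a serial column to provide unique identifiers, RETURNING can return the ID assigned to a new row: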
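A sketch with a hypothetical users table whose id column is filled from a sequence:

```sql
CREATE TABLE users (firstname text, lastname text, id serial primary key);

-- Returns the id that the sequence assigned to the new row.
INSERT INTO users (firstname, lastname) VALUES ('Joe', 'Cool') RETURNING id;
```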

The RETURNING clause is also very useful with INSERT ... SELECT.

In an UPDATE, the data available to RETURNING is the new content of the modified row. For example:
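For instance, assuming the products table with name and price columns (the threshold is arbitrary):

```sql
-- Returns the updated rows with their new prices.
UPDATE products SET price = price * 1.10
  WHERE price <= 99.99
  RETURNING name, price AS new_price;
```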

In a DELETE, the data available to RETURNING is the content of the deleted row. For example:
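In this sketch, the obsoletion_date column is hypothetical and assumed to exist on products:

```sql
-- Returns every column of each deleted row.
DELETE FROM products
  WHERE obsoletion_date = 'today'
  RETURNING *;
```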

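If there are triggers on the target table, the data available to RETURNING is the row as modified by the triggers. Thus, inspecting columns computed by triggers is another common use case for RETURNING.

7. Queries

The previous chapters explained how to create tables, how to fill them with data, and how to manipulate that data. Now we finally discuss how to retrieve the data from the database.

7.1. Overview

The process of retrieving data from a database, or the command that does so, is called a query. In SQL the SELECT command is used to specify queries. The general syntax of the SELECT command is: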

[WITH with_queries] SELECT select_list FROM table_expression [sort_specification]

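The following sections describe the details of the select list, the table expression, and the sort specification. WITH queries are treated last since they are an advanced feature.

A simple kind of query has the form: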

SELECT * FROM table1;

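Assuming that there is a table called table1, this command would retrieve all rows and all user-defined columns from table1. (The method of retrieval depends on the client application. For example, the psql program will display an ASCII-art table on the screen, while client libraries will offer functions to extract individual values from the query result.) The select list specification * means all columns that the table expression happens to provide. A select list can also select a subset of the available columns or make calculations using the columns. For example, if table1 has columns named a, b, and c (and perhaps others) you can make the following query: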

SELECT a, b + c FROM table1;

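(assuming that b and c are of a numerical data type). See Section 7.3 for more details.

FROM table1 is a simple kind of table expression: it reads just one table. In general, table expressions can be complex constructs of base tables, joins, and subqueries. But you can also omit the table expression entirely and use the SELECT command as a calculator: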

SELECT 3 * 4;

This is more useful if the expressions in the select list return varying results. For example, you could call a function this way:

SELECT random();

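7.3. Select Lists

As shown in the previous section, the table expression in the SELECT command constructs an intermediate virtual table by possibly combining tables, views, eliminating rows, grouping, etc. This table is finally passed on to processing by the select list. The select list determines which columns of the intermediate table are actually output.

7.3.1. Select-List Items

The simplest kind of select list is *, which emits all columns that the table expression produces. Otherwise, a select list is a comma-separated list of value expressions. For instance, it could be a list of column names: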
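A sketch, assuming a table table1 with columns a, b, and c as in the overview:

```sql
SELECT a, b, c FROM table1;
```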

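The column names a, b, and c are either the actual names of the columns of tables referenced in the FROM clause, or the aliases given to them as explained in Section 7.2.1.2. The name space available in the select list is the same as in the WHERE clause, unless grouping is used, in which case it is the same as in the HAVING clause.

If more than one table has a column of the same name, the table name must also be given, as in: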
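In the following sketch, tbl1 and tbl2 are hypothetical tables that both have a column named a:

```sql
-- Qualifying a with the table name removes the ambiguity.
SELECT tbl1.a, tbl2.a, tbl1.b FROM tbl1, tbl2;
```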

When working with multiple tables, it can also be useful to ask for all the columns of a particular table:
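For example, with the same hypothetical tables tbl1 and tbl2:

```sql
-- All columns of tbl1, plus column a of tbl2.
SELECT tbl1.*, tbl2.a FROM tbl1, tbl2;
```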

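See Section 8.16.5 for more about the table_name.* notation.

If an arbitrary value expression is used in the select list, it conceptually adds a new virtual column to the returned table. The value expression is evaluated once for each result row, with the row's values substituted for any column references. But the expressions in the select list do not have to reference any columns in the table expression of the FROM clause; they can be constant arithmetic expressions, for instance.

7.3.2. Column Labels

The entries in the select list can be assigned names for subsequent processing, such as for use in an ORDER BY clause or for display by the client application. For example: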
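A sketch, assuming table1 with numeric columns a, b, and c:

```sql
-- The output columns are labeled value and sum.
SELECT a AS value, b + c AS sum FROM table1;
```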

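If no output column name is specified using AS, the system assigns a default column name. For simple column references, this is the name of the referenced column. For function calls, this is the name of the function. For complex expressions, the system will generate a generic name.

The AS key word is optional, but only if the new column name does not match any PostgreSQL key word. To avoid an accidental match to a key word, you can double-quote the column name. For example, VALUE is a key word, so this does not work: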
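The form that the text describes as rejected would look like this sketch (table1 is assumed as before):

```sql
-- Per the text above, this is a syntax error because the bare
-- label value is a key word and AS has been omitted.
SELECT a value, b + c AS sum FROM table1;
```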

but this does:
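Double-quoting the label avoids the clash with the key word (same assumed table1):

```sql
SELECT a "value", b + c AS sum FROM table1;
```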

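For protection against possible future key word additions, it is recommended that you always either write AS or double-quote the output column name.

Note
The naming of output columns here is different from that done in the FROM clause (see Section 7.2.1.2). It is possible to rename the same column twice, but the name assigned in the select list is the one that will be passed on.

7.3.3. DISTINCT

After the select list has been processed, the result table can optionally be subject to the elimination of duplicate rows. The DISTINCT key word is written directly after SELECT to specify this: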
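For example, assuming table1 with columns a and b:

```sql
-- Each distinct (a, b) combination appears once in the result.
SELECT DISTINCT a, b FROM table1;
```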

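(Instead of DISTINCT the key word ALL can be used to specify the default behavior of retaining all rows.)

Obviously, two rows are considered distinct if they differ in at least one column value. Null values are considered equal in this comparison.

Alternatively, an arbitrary expression can determine what rows are to be considered distinct: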
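A sketch of DISTINCT ON, assuming table1 with columns a and b; the ORDER BY is included so that the row kept for each value of a is predictable:

```sql
-- Keeps one row per value of a: the one with the smallest b.
SELECT DISTINCT ON (a) a, b
FROM table1
ORDER BY a, b;
```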

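Here the expression is an arbitrary value expression that is evaluated for all rows. A set of rows for which all the expressions are equal are considered duplicates, and only the first row of the set is kept in the output. Note that the "first row" of a set is unpredictable unless the query is sorted on enough columns to guarantee a unique ordering of the rows arriving at the DISTINCT filter. (DISTINCT ON processing occurs after ORDER BY sorting.)

The DISTINCT ON clause is not part of the SQL standard and is sometimes considered bad style because of the potentially indeterminate nature of its results. With judicious use of GROUP BY and subqueries in FROM, this construct can be avoided, but it is often the most convenient alternative.

7.4. Combining Queries

The results of two queries can be combined using the set operations union, intersection, and difference. For example: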
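The three operations might be used as in this sketch, where table1 and table2 are hypothetical tables that each have a column a of the same type:

```sql
SELECT a FROM table1 UNION SELECT a FROM table2;      -- rows in either result
SELECT a FROM table1 INTERSECT SELECT a FROM table2;  -- rows in both results
SELECT a FROM table1 EXCEPT SELECT a FROM table2;     -- rows only in the first
```

Adding ALL after the operator (for example UNION ALL) keeps duplicate rows.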

query1 and query2 are queries that can use any of the features discussed up to this point. Set operations can also be nested and chained, for example:
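A chained form, with three hypothetical tables that each have a column a:

```sql
SELECT a FROM table1
UNION
SELECT a FROM table2
UNION
SELECT a FROM table3;
```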

which is executed as:
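With explicit parentheses, the grouping of the chained form reads (same hypothetical tables):

```sql
(SELECT a FROM table1 UNION SELECT a FROM table2)
UNION
SELECT a FROM table3;
```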

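UNION effectively appends the result of query2 to the result of query1 (although there is no guarantee that this is the order in which the rows are actually returned). Furthermore, it eliminates duplicate rows from its result, in the same way as DISTINCT, unless UNION ALL is used.

INTERSECT returns all rows that are both in the result of query1 and in the result of query2. Duplicate rows are eliminated unless INTERSECT ALL is used.

EXCEPT returns all rows that are in the result of query1 but not in the result of query2. (This is sometimes called the difference between two queries.) Again, duplicates are eliminated unless EXCEPT ALL is used.

In order to calculate the union, intersection, or difference of two queries, the two queries must be "union compatible", which means that they return the same number of columns and the corresponding columns have compatible data types, as described in Section 10.5.

7.5. Sorting Rows

After a query has produced an output table (after the select list has been processed) it can optionally be sorted. If sorting is not chosen, the rows will be returned in an unspecified order. The actual order in that case will depend on the scan and join plan types and the order on disk, but it must not be relied on. A particular output ordering can only be guaranteed if the sort step is explicitly chosen.

The ORDER BY clause specifies the sort order: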
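A sketch of the clause with its optional modifiers, assuming table1 with columns a and b:

```sql
-- Ascending by a; ties are ordered by b descending, with nulls last.
SELECT a, b
FROM table1
ORDER BY a ASC, b DESC NULLS LAST;
```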

The sort expression(s) can be any expression that would be valid in the query's select list. An example is:
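For instance, assuming table1 has numeric columns a and b and a further column c:

```sql
-- Sorts by the computed value a + b, then by c.
SELECT a, b FROM table1 ORDER BY a + b, c;
```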

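When more than one expression is specified, the later values are used to sort rows that are equal according to the earlier values. Each expression can be followed by an optional ASC or DESC key word to set the sort direction to ascending or descending. ASC order is the default. Ascending order puts smaller values first, where "smaller" is defined in terms of the < operator. Similarly, descending order is determined with the > operator.

The NULLS FIRST and NULLS LAST options can be used to determine whether nulls appear before or after non-null values in the sort ordering. By default, null values sort as if larger than any non-null value; that is, NULLS FIRST is the default for DESC order, and NULLS LAST otherwise.

Note that the ordering options are considered independently for each sort column. For example ORDER BY x, y DESC means ORDER BY x ASC, y DESC, which is not the same as ORDER BY x DESC, y DESC.

A sort expression can also be the column label or number of an output column, as in: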
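Two sketches, assuming table1 with columns a, b, and c:

```sql
-- Sorting by an output column label.
SELECT a + b AS sum, c FROM table1 ORDER BY sum;
-- Sorting by an output column number.
SELECT a, max(b) FROM table1 GROUP BY a ORDER BY 1;
```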

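both of which sort by the first output column. Note that an output column name has to stand alone, that is, it cannot be used in an expression. For example, this is not correct: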
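The kind of query that is rejected might look like this (same assumed table1):

```sql
-- Not allowed: the output label sum is used inside an expression.
SELECT a + b AS sum, c FROM table1 ORDER BY sum + c;
```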

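This restriction is made to reduce ambiguity. There is still ambiguity if an ORDER BY item is a simple name that could match either an output column name or a column from the table expression. The output column is used in such cases. This would only cause confusion if you use AS to rename an output column to match some other table column's name.

ORDER BY can be applied to the result of a UNION, INTERSECT, or EXCEPT combination, but in this case it is only permitted to sort by output column names or numbers, not by expressions.

7.6. LIMIT and OFFSET

LIMIT and OFFSET allow you to retrieve just a portion of the rows that are generated by the rest of the query: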
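A sketch, assuming table1 with columns a and b; the numbers are arbitrary:

```sql
-- Skips the first 20 rows in a-order, then returns at most 10 rows.
SELECT a, b
FROM table1
ORDER BY a
LIMIT 10 OFFSET 20;
```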

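If a limit count is given, no more than that many rows will be returned (but possibly fewer, if the query itself yields fewer rows). LIMIT ALL is the same as omitting the LIMIT clause, as is LIMIT with a NULL argument.

OFFSET says to skip that many rows before beginning to return rows. OFFSET 0 is the same as omitting the OFFSET clause, as is OFFSET with a NULL argument.

If both OFFSET and LIMIT appear, then OFFSET rows are skipped before starting to count the LIMIT rows that are returned.

When using LIMIT, it is important to use an ORDER BY clause that constrains the result rows into a unique order. Otherwise you will get an unpredictable subset of the query's rows. You might be asking for the tenth through twentieth rows, but tenth through twentieth in what ordering? The ordering is unknown, unless you specified ORDER BY.

The query optimizer takes LIMIT into account when generating query plans, so you are very likely to get different plans (yielding different row orders) depending on what you give for LIMIT and OFFSET. Thus, using different LIMIT/OFFSET values to select different subsets of a query result will give inconsistent results unless you enforce a predictable result ordering with ORDER BY. This is not a bug; it is an inherent consequence of the fact that SQL does not promise to deliver the results of a query in any particular order unless ORDER BY is used to constrain the order.

The rows skipped by an OFFSET clause still have to be computed inside the server; therefore a large OFFSET might be inefficient.

7.7. VALUES Lists

VALUES provides a way to generate a "constant table" that can be used in a query without having to actually create and populate a table on disk. The syntax is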

VALUES ( expression [, ...] ) [, ...]

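Each parenthesized list of expressions generates a row in the table. The lists must all have the same number of elements (that is, the number of columns in the table), and corresponding entries in each list must have compatible data types. The actual data type assigned to each column of the result is determined using the same rules as for UNION (see Section 10.5).

As an example: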

VALUES (1, 'one'), (2, 'two'), (3, 'three');

will return a table of two columns and three rows. It is effectively equivalent to:

SELECT 1 AS column1, 'one' AS column2
UNION ALL
SELECT 2, 'two'
UNION ALL
SELECT 3, 'three';

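By default, PostgreSQL assigns the names column1, column2, etc. to the columns of a VALUES table. The column names are not specified by the SQL standard and different database systems do it differently, so it is usually better to override the default names with a table alias list, like this: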

=> SELECT * FROM (VALUES (1, 'one'), (2, 'two'), (3, 'three')) AS t (num,letter);
 num | letter
-----+--------
   1 | one
   2 | two
   3 | three
(3 rows)

Syntactically, VALUES followed by expression lists is treated as equivalent to:

SELECT select_list FROM table_expression

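and can appear anywhere a SELECT can. For example, you can use it as part of a UNION, or attach a sort specification (ORDER BY, LIMIT, and/or OFFSET) to it. VALUES is most commonly used as the data source in an INSERT command, and next most commonly as a subquery.

For more information see VALUES.

8.2. Monetary Types

The money type stores a currency amount with a fixed fractional precision; see Table 8.3. The fractional precision is determined by the database's lc_monetary setting. The range shown in the table assumes there are two fractional digits. Input is accepted in a variety of formats, including integer and floating-point literals, as well as typical currency formatting, such as '$1,000.00'. Output is generally in the latter form but depends on the locale.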

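Table 8.3. Monetary Types

 Name  | Storage Size | Description     | Range
-------+--------------+-----------------+------------------------------------------------
 money | 8 bytes      | currency amount | -92233720368547758.08 to +92233720368547758.07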

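Since the output of this data type is locale-sensitive, it might not work to load money data into a database that has a different setting of lc_monetary. To avoid problems, before restoring a dump into a new database make sure lc_monetary has the same or an equivalent value as in the database that was dumped.

Values of the numeric, int, and bigint data types can be cast to money. Conversion from the real and double precision data types can be done by casting to numeric first, for example: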

SELECT '12.34'::float8::numeric::money;

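However, this is not recommended. Floating point numbers should not be used to handle money due to the potential for rounding errors.

A money value can be cast to numeric without loss of precision. Conversion to other types could potentially lose precision, and must also be done in two stages: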

SELECT '52093.89'::money::numeric::float8;

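Division of a money value by another money value results in double precision (that is, a pure number, not money); the currency units cancel each other out in the division.

8.3. Character Types

Table 8.4. Character Types

 Name                             | Description
----------------------------------+----------------------------
 character varying(n), varchar(n) | variable-length with limit
 character(n), char(n)            | fixed-length, blank padded
 text                             | variable unlimited length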

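Table 8.4 shows the general-purpose character types available in PostgreSQL.

SQL defines two primary character types: character varying(n) and character(n), where n is a positive integer. Both of these types can store strings up to n characters (not bytes) in length. An attempt to store a longer string into a column of these types will result in an error, unless the excess characters are all spaces, in which case the string will be truncated to the maximum length. (This somewhat bizarre exception is required by the SQL standard.) If the string to be stored is shorter than the declared length, values of type character will be space-padded; values of type character varying will simply store the shorter string.

If one explicitly casts a value to character varying(n) or character(n), then an over-length value will be truncated to n characters without raising an error. (This too is required by the SQL standard.)

The notations varchar(n) and char(n) are aliases for character varying(n) and character(n), respectively. character without length specifier is equivalent to character(1). If character varying is used without length specifier, the type accepts strings of any size. The latter is a PostgreSQL extension.

In addition, PostgreSQL provides the text type, which stores strings of any length. Although the type text is not in the SQL standard, several other SQL database management systems have it as well.

Values of type character are physically padded with spaces to the specified width n, and are stored and displayed that way. However, trailing spaces are treated as semantically insignificant and disregarded when comparing two values of type character. In collations where whitespace is significant, this behavior can produce unexpected results; for example SELECT 'a '::CHAR(2) collate "C" < E'a\n'::CHAR(2) returns true, even though the C locale would consider a space to be greater than a newline. Trailing spaces are removed when converting a character value to one of the other string types. Note that trailing spaces are semantically significant in character varying and text values, and when using pattern matching, that is LIKE and regular expressions.

The storage requirement for a short string (up to 126 bytes) is 1 byte plus the actual string, which includes the space padding in the case of character. Longer strings have 4 bytes of overhead instead of 1. Long strings are compressed by the system automatically, so the physical requirement on disk might be less. Very long values are also stored in background tables so that they do not interfere with rapid access to shorter column values. In any case, the longest possible character string that can be stored is about 1 GB. (The maximum value that will be allowed for n in the data type declaration is less than that. It would not be useful to change this because with multibyte character encodings the number of characters and bytes can be quite different. If you desire to store long strings with no specific upper limit, use text or character varying without a length specifier, rather than making up an arbitrary length limit.)

Tip

There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used instead.

Refer to Section 4.1.2.1 for information about the syntax of string literals, and to Chapter 9 for information about available operators and functions. The database character set determines the character set used to store textual values; for more information on character set support, refer to Section 23.3.

Example 8.1. Using the Character Types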

CREATE TABLE test1 (a character(4));
INSERT INTO test1 VALUES ('ok');
SELECT a, char_length(a) FROM test1; -- (1)
  a   | char_length
------+-------------
 ok   |           2

CREATE TABLE test2 (b varchar(5));
INSERT INTO test2 VALUES ('ok');
INSERT INTO test2 VALUES ('good      ');
INSERT INTO test2 VALUES ('too long');
ERROR:  value too long for type character varying(5)
INSERT INTO test2 VALUES ('too long'::varchar(5)); -- explicit truncation
SELECT b, char_length(b) FROM test2;
   b   | char_length
-------+-------------
 ok    |           2
 good  |           5
 too l |           5

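(1)

The char_length function is discussed in the section on string functions.

There are two other fixed-length character types in PostgreSQL, shown in Table 8.5. The name type exists only for the storage of identifiers in the internal system catalogs and is not intended for use by the general user. Its length is currently defined as 64 bytes (63 usable characters plus terminator) but should be referenced using the constant NAMEDATALEN in C source code. The length is set at compile time (and is therefore adjustable for special uses); the default maximum length might change in a future release. The type "char" (note the quotes) is different from char(1) in that it only uses one byte of storage. It is internally used in the system catalogs as a simplistic enumeration type.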

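Table 8.5. Special Character Types

 Name   | Storage Size | Description
--------+--------------+--------------------------------
 "char" | 1 byte       | single-byte internal type
 name   | 64 bytes     | internal type for object names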

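8.6. Boolean Type

PostgreSQL provides the standard SQL type boolean; see Table 8-19. The boolean type can have several states: "true", "false", and a third state, "unknown", which is represented by the SQL null value.

Table 8-19. Boolean Data Type

 Name    | Storage Size | Description
---------+--------------+------------------------
 boolean | 1 byte       | state of true or false

Valid literal values for the "true" state are: TRUE, 't', 'true', 'y', 'yes', 'on', '1'.

For the "false" state, the following values can be used: FALSE, 'f', 'false', 'n', 'no', 'off', '0'.

Leading or trailing whitespace is ignored, and case does not matter. The key words TRUE and FALSE are the preferred (SQL-compliant) usage.

Example 8-2 shows that boolean values are output using the letters t and f.

Example 8-2. Using the boolean Type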
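A sketch of such an example; the table name test1 and the sample rows are illustrative. In psql, the column a is displayed as t or f.

```sql
CREATE TABLE test1 (a boolean, b text);
INSERT INTO test1 VALUES (TRUE, 'sic est');
INSERT INTO test1 VALUES (FALSE, 'non est');

-- Shows both rows; a is printed as t and f respectively.
SELECT * FROM test1;

-- A boolean column can be used directly as a condition.
SELECT * FROM test1 WHERE a;
```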

8.7. Enumerated Types

Enumerated (enum) types are data types that comprise a static, ordered set of values. They are equivalent to the enum types supported in a number of programming languages. An example of an enum type might be the days of the week, or a set of status values for a piece of data.

8.7.1. Declaration of Enumerated Types

Enum types are created using the CREATE TYPE command, for example:
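A minimal sketch; the type name mood and its labels are illustrative:

```sql
CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
```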

Once created, the enum type can be used in table and function definitions much like any other type:
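For instance, assuming an enum type mood with the labels 'sad', 'ok', and 'happy' has been created (the person table is hypothetical):

```sql
CREATE TABLE person (
    name text,
    current_mood mood
);
INSERT INTO person VALUES ('Moe', 'happy');
SELECT * FROM person WHERE current_mood = 'happy';
```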

8.7.2. Ordering

The ordering of the values in an enum type is the order in which the values were listed when the type was created. All standard comparison operators and related aggregate functions are supported for enums. For example:
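The following sketch assumes the mood type ('sad', 'ok', 'happy', in that order) and the person table described above:

```sql
INSERT INTO person VALUES ('Larry', 'sad');
INSERT INTO person VALUES ('Curly', 'ok');

-- 'ok' and 'happy' sort after 'sad', so Larry is excluded.
SELECT * FROM person WHERE current_mood > 'sad' ORDER BY current_mood;

-- Aggregates follow the declaration order of the labels too.
SELECT name FROM person
WHERE current_mood = (SELECT MIN(current_mood) FROM person);
```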

8.7.3. Type Safety

Each enumerated data type is separate and cannot be compared with other enumerated types. See this example:
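A sketch of the situation, assuming the mood type and person table from the earlier examples; the happiness type and holidays table are hypothetical:

```sql
CREATE TYPE happiness AS ENUM ('happy', 'very happy', 'ecstatic');
CREATE TABLE holidays (
    num_weeks integer,
    happiness happiness
);
INSERT INTO holidays (num_weeks, happiness) VALUES (4, 'happy');

-- Expected to fail: there is no = operator between mood and happiness.
SELECT person.name, holidays.num_weeks FROM person, holidays
  WHERE person.current_mood = holidays.happiness;
```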

If you really need to do something like that, you can either write a custom operator or add explicit casts to your query:
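With explicit casts to text, the comparison from the previous example might be written as follows (same assumed tables and types):

```sql
SELECT person.name, holidays.num_weeks FROM person, holidays
  WHERE person.current_mood::text = holidays.happiness::text;
```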

8.7.4. Implementation Details

Enum labels are case sensitive, so 'happy' is not the same as 'HAPPY'. White space in the labels is significant too.

Although enum types are primarily intended for static sets of values, there is support for adding new values to an existing enum type, and for renaming values (see ALTER TYPE). Existing values cannot be removed from an enum type, nor can the sort ordering of such values be changed, short of dropping and re-creating the enum type.

An enum value occupies four bytes on disk. The length of an enum value's textual label is limited by the NAMEDATALEN setting compiled into PostgreSQL; in standard builds this means at most 63 bytes.

The translations from internal enum values to textual labels are kept in the system catalog pg_enum. Querying this catalog directly can be useful.

8.10. Bit String Types

Bit strings are strings of 1's and 0's. They can be used to store or visualize bit masks. There are two SQL bit types: bit(n) and bit varying(n), where n is a positive integer.

bit type data must match the length n exactly; it is an error to attempt to store shorter or longer bit strings. bit varying data is of variable length up to the maximum length n; longer strings will be rejected. Writing bit without a length is equivalent to bit(1), while bit varying without a length specification means unlimited length.

Note

If one explicitly casts a bit-string value to bit(n), it will be truncated or zero-padded on the right to be exactly n bits, without raising an error. Similarly, if one explicitly casts a bit-string value to bit varying(n), it will be truncated on the right if it is more than n bits.

Refer to Section 4.1.2.5 for information about the syntax of bit string constants. Bit-logical operators and string manipulation functions are available; see Section 9.6.

Example 8.3. Using the Bit String Types

CREATE TABLE test (a BIT(3), b BIT VARYING(5));
INSERT INTO test VALUES (B'101', B'00');
INSERT INTO test VALUES (B'10', B'101');

ERROR:  bit string length 2 does not match type bit(3)

INSERT INTO test VALUES (B'10'::bit(3), B'101');
SELECT * FROM test;

  a  |  b
-----+-----
 101 | 00
 100 | 101

A bit string value requires 1 byte for each group of 8 bits, plus 5 or 8 bytes overhead depending on the length of the string (but long values may be compressed or moved out-of-line, as explained in Section 8.3 for character strings).

8.12. UUID Type

The data type uuid stores Universally Unique Identifiers (UUID) as defined by RFC 4122, ISO/IEC 9834-8:2005, and related standards. (Some systems refer to this data type as a globally unique identifier, or GUID, instead.) This identifier is a 128-bit quantity that is generated by an algorithm chosen to make it very unlikely that the same identifier will be generated by anyone else in the known universe using the same algorithm. Therefore, for distributed systems, these identifiers provide a better uniqueness guarantee than sequence generators, which are only unique within a single database.

A UUID is written as a sequence of lower-case hexadecimal digits, in several groups separated by hyphens, specifically a group of 8 digits followed by three groups of 4 digits followed by a group of 12 digits, for a total of 32 digits representing the 128 bits. An example of a UUID in this standard form is:

a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11

PostgreSQL also accepts the following alternative forms for input: use of upper-case digits, the standard format surrounded by braces, omitting some or all hyphens, adding a hyphen after any group of four digits. Examples are:

A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11
{a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11}
a0eebc999c0b4ef8bb6d6bb9bd380a11
a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11
{a0eebc99-9c0b4ef8-bb6d6bb9-bd380a11}

Output is always in the standard form.

PostgreSQL provides storage and comparison functions for UUIDs, but the core database does not include any function for generating UUIDs, because no single algorithm is well suited for every application. The uuid-ossp module provides functions that implement several standard algorithms. The pgcrypto module also provides a generation function for random UUIDs. Alternatively, UUIDs could be generated by client applications or other libraries invoked through a server-side function.

8.13. XML Type

The xml data type can be used to store XML data. Its advantage over storing XML data in a text field is that it checks the input values for well-formedness, and there are support functions to perform type-safe operations on it; see the section on XML functions. Use of this data type requires the installation to have been built with configure --with-libxml.

The xml type can store well-formed “documents”, as defined by the XML standard, as well as “content” fragments, which are defined by reference to the more permissive of the XQuery and XPath data model. Roughly, this means that content fragments can have more than one top-level element or character node. The expression xmlvalue IS DOCUMENT can be used to evaluate whether a particular xml value is a full document or only a content fragment.

Limits and compatibility notes for the xml data type can be found in the appendix on SQL conformance.

8.13.1. Creating XML Values

To produce a value of type xml from character data, use the function xmlparse:

Examples:
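The following sketch shows both forms of XMLPARSE; the XML snippets are made up for illustration. DOCUMENT requires a complete document, while CONTENT also accepts fragments.

```sql
SELECT XMLPARSE (DOCUMENT '<?xml version="1.0"?><book><title>Manual</title><chapter>...</chapter></book>');
SELECT XMLPARSE (CONTENT 'abc<foo>bar</foo><bar>foo</bar>');
```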

While this is the only way to convert character strings into XML values according to the SQL standard, the PostgreSQL-specific syntaxes:

can also be used.
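The PostgreSQL-specific forms might look like this (the XML value is illustrative):

```sql
SELECT xml '<foo>bar</foo>';
SELECT '<foo>bar</foo>'::xml;
```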

The xml type does not validate input values against a document type declaration (DTD), even when the input value specifies a DTD. There is also currently no built-in support for validating against other XML schema languages such as XML Schema.

The inverse operation, producing a character string value from xml, uses the function xmlserialize:

type can be character, character varying, or text (or an alias for one of those). Again, according to the SQL standard, this is the only way to convert between type xml and character types, but PostgreSQL also allows you to simply cast the value.
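A sketch of both ways to obtain a character string from an xml value (the input is illustrative):

```sql
-- SQL-standard form.
SELECT XMLSERIALIZE (CONTENT '<foo>bar</foo>'::xml AS text);
-- PostgreSQL also allows a plain cast.
SELECT '<foo>bar</foo>'::xml::text;
```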

When a character string value is cast to or from type xml without going through XMLPARSE or XMLSERIALIZE, respectively, the choice of DOCUMENT versus CONTENT is determined by the “XML option” session configuration parameter, which can be set using the standard command:

or the more PostgreSQL-like syntax

The default is CONTENT, so all forms of XML data are allowed.
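The two ways of setting the XML option mentioned above might be written as:

```sql
-- Standard syntax.
SET XML OPTION DOCUMENT;
-- PostgreSQL-style syntax, restoring the default.
SET xmloption TO CONTENT;
```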

8.13.2. Encoding Handling

Care must be taken when dealing with multiple character encodings on the client, server, and in the XML data passed through them. When using the text mode to pass queries to the server and query results to the client (which is the normal mode), PostgreSQL converts all character data passed between the client and the server and vice versa to the character encoding of the respective end; see Section 23.3. This includes string representations of XML values, such as in the above examples. This would ordinarily mean that encoding declarations contained in XML data can become invalid as the character data is converted to other encodings while traveling between client and server, because the embedded encoding declaration is not changed. To cope with this behavior, encoding declarations contained in character strings presented for input to the xml type are ignored, and content is assumed to be in the current server encoding. Consequently, for correct processing, character strings of XML data must be sent from the client in the current client encoding. It is the responsibility of the client to either convert documents to the current client encoding before sending them to the server, or to adjust the client encoding appropriately. On output, values of type xml will not have an encoding declaration, and clients should assume all data is in the current client encoding.

When using binary mode to pass query parameters to the server and query results back to the client, no encoding conversion is performed, so the situation is different. In this case, an encoding declaration in the XML data will be observed, and if it is absent, the data will be assumed to be in UTF-8 (as required by the XML standard; note that PostgreSQL does not support UTF-16). On output, data will have an encoding declaration specifying the client encoding, unless the client encoding is UTF-8, in which case it will be omitted.

Needless to say, processing XML data with PostgreSQL will be less error-prone and more efficient if the XML data encoding, client encoding, and server encoding are the same. Since XML data is internally processed in UTF-8, computations will be most efficient if the server encoding is also UTF-8.

Caution

Some XML-related functions may not work at all on non-ASCII data when the server encoding is not UTF-8. This is known to be an issue for xmltable() and xpath() in particular.

8.13.3. Accessing XML Values

The xml data type is unusual in that it does not provide any comparison operators. This is because there is no well-defined and universally useful comparison algorithm for XML data. One consequence of this is that you cannot retrieve rows by comparing an xml column against a search value. XML values should therefore typically be accompanied by a separate key field such as an ID. An alternative solution for comparing XML values is to convert them to character strings first, but note that character string comparison has little to do with a useful XML comparison method.

Since there are no comparison operators for the xml data type, it is not possible to create an index directly on a column of this type. If speedy searches in XML data are desired, possible workarounds include casting the expression to a character string type and indexing that, or indexing an XPath expression. Of course, the actual query would have to be adjusted to search by the indexed expression.

The text-search functionality in PostgreSQL can also be used to speed up full-document searches of XML data. The necessary preprocessing support is, however, not yet available in the PostgreSQL distribution.

8.18. Domain Types

A domain is a user-defined data type that is based on another underlying type. Optionally, it can have constraints that restrict its valid values to a subset of what the underlying type would allow. Otherwise it behaves like the underlying type — for example, any operator or function that can be applied to the underlying type will work on the domain type. The underlying type can be any built-in or user-defined base type, enum type, array type, composite type, range type, or another domain.

For example, we could create a domain over integers that accepts only positive integers:

CREATE DOMAIN posint AS integer CHECK (VALUE > 0);
CREATE TABLE mytable (id posint);
INSERT INTO mytable VALUES(1);   -- works
INSERT INTO mytable VALUES(-1);  -- fails

When an operator or function of the underlying type is applied to a domain value, the domain is automatically down-cast to the underlying type. Thus, for example, the result of mytable.id - 1 is considered to be of type integer not posint. We could write (mytable.id - 1)::posint to cast the result back to posint, causing the domain's constraints to be rechecked. In this case, that would result in an error if the expression had been applied to an id value of 1. Assigning a value of the underlying type to a field or variable of the domain type is allowed without writing an explicit cast, but the domain's constraints will be checked.

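For additional information see CREATE DOMAIN.

8.20. pg_lsn Type

The pg_lsn data type can be used to store LSN (Log Sequence Number) data, which is a pointer to a location in the WAL. This type is a representation of XLogRecPtr and an internal system type of PostgreSQL.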

Internally, an LSN is a 64-bit integer, representing a byte position in the write-ahead log stream. It is printed as two hexadecimal numbers of up to 8 digits each, separated by a slash; for example, 16/B374D848. The pg_lsn type supports the standard comparison operators, like = and >. Two LSNs can be subtracted using the - operator; the result is the number of bytes separating those write-ahead log locations.

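9. Functions and Operators

PostgreSQL provides a large number of functions and operators for the built-in data types. Users can also define their own functions and operators. The psql commands \df and \do can be used to list all available functions and operators, respectively.

If you are concerned about portability, note that most of the functions and operators described in this chapter, with the exception of the most trivial arithmetic and comparison operators and some explicitly marked functions, are not specified by the SQL standard. Some of this extended functionality is present in other SQL database management systems, and in many cases this functionality is compatible and consistent between the various implementations. This chapter is also not exhaustive; additional functions appear in relevant sections of the manual.

9.1. Logical Operators

The usual logical operators are available: AND, OR, and NOT.

SQL uses a three-valued logic system with true, false, and null, which represents "unknown". Observe the following truth tables:

 a     | b     | a AND b | a OR b
-------+-------+---------+--------
 TRUE  | TRUE  | TRUE    | TRUE
 TRUE  | FALSE | FALSE   | TRUE
 TRUE  | NULL  | NULL    | TRUE
 FALSE | FALSE | FALSE   | FALSE
 FALSE | NULL  | FALSE   | NULL
 NULL  | NULL  | NULL    | NULL

 a     | NOT a
-------+-------
 TRUE  | FALSE
 FALSE | TRUE
 NULL  | NULL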
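The three-valued logic can be explored with a query like the following sketch, which enumerates every combination of true, false, and null; the aliases x, y, a, and b are arbitrary:

```sql
-- Cross join of the three truth values with themselves.
SELECT a, b,
       a AND b AS "a AND b",
       a OR b  AS "a OR b",
       NOT a   AS "NOT a"
FROM (VALUES (TRUE), (FALSE), (NULL::boolean)) AS x(a),
     (VALUES (TRUE), (FALSE), (NULL::boolean)) AS y(b);
```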

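The operators AND and OR are commutative, that is, you can switch the left and right operand without affecting the result. The order of evaluation of subexpressions is discussed elsewhere in this manual.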

9.10. Enum Support Functions

For enum types (described in Section 8.7), there are several functions that allow cleaner programming without hard-coding particular values of an enum type. These are listed in Table 9.32. The examples assume an enum type created as:
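The type definition, as it also appears in the dependency tracking example earlier in this manual:

```sql
CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow',
                             'green', 'blue', 'purple');
```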

Table 9.32. Enum Support Functions

Notice that except for the two-argument form of enum_range, these functions disregard the specific value passed to them; they care only about its declared data type. Either null or a specific value of the type can be passed, with the same result. It is more common to apply these functions to a table column or function argument than to a hardwired type name as suggested by the examples.
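A short sketch of the enum support functions, assuming the rainbow type ('red', 'orange', 'yellow', 'green', 'blue', 'purple') exists:

```sql
SELECT enum_first(null::rainbow);   -- red
SELECT enum_last(null::rainbow);    -- purple
SELECT enum_range(null::rainbow);   -- {red,orange,yellow,green,blue,purple}

-- Two-argument form: the range between two values, inclusive.
SELECT enum_range('orange'::rainbow, 'green'::rainbow);  -- {orange,yellow,green}
-- A null bound means the first or last value of the type.
SELECT enum_range(NULL, 'green'::rainbow);               -- {red,orange,yellow,green}
```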

8.15. Arrays

PostgreSQL allows columns of a table to be defined as variable-length multidimensional arrays. Arrays of any built-in or user-defined base type, enum type, composite type, range type, or domain can be created.

8.15.1. Declaration of Array Types

To illustrate the use of array types, we create this table:

CREATE TABLE sal_emp (
    name            text,
    pay_by_quarter  integer[],
    schedule        text[][]
);

As shown, an array data type is named by appending square brackets ([]) to the data type name of the array elements. The above command will create a table named sal_emp with a column of type text (name), a one-dimensional array of type integer (pay_by_quarter), which represents the employee's salary by quarter, and a two-dimensional array of text (schedule), which represents the employee's weekly schedule.

The syntax for CREATE TABLE allows the exact size of arrays to be specified, for example:

CREATE TABLE tictactoe (
    squares   integer[3][3]
);

However, the current implementation ignores any supplied array size limits, i.e., the behavior is the same as for arrays of unspecified length.

The current implementation does not enforce the declared number of dimensions either. Arrays of a particular element type are all considered to be of the same type, regardless of size or number of dimensions. So, declaring the array size or number of dimensions in CREATE TABLE is simply documentation; it does not affect run-time behavior.

An alternative syntax, which conforms to the SQL standard by using the keyword ARRAY, can be used for one-dimensional arrays. pay_by_quarter could have been defined as:

    pay_by_quarter  integer ARRAY[4],

Or, if no array size is to be specified:

    pay_by_quarter  integer ARRAY,

As before, however, PostgreSQL does not enforce the size restriction in any case.

8.15.2. Array Value Input

To write an array value as a literal constant, enclose the element values within curly braces and separate them by commas. (If you know C, this is not unlike the C syntax for initializing structures.) You can put double quotes around any element value, and must do so if it contains commas or curly braces. (More details appear below.) Thus, the general format of an array constant is the following:

'{ val1 delim val2 delim ... }'

where delim is the delimiter character for the type, as recorded in its pg_type entry. Among the standard data types provided in the PostgreSQL distribution, all use a comma (,), except for type box which uses a semicolon (;). Each val is either a constant of the array element type, or a subarray. An example of an array constant is:

'{{1,2,3},{4,5,6},{7,8,9}}'

This constant is a two-dimensional, 3-by-3 array consisting of three subarrays of integers.

To set an element of an array constant to NULL, write NULL for the element value. (Any upper- or lower-case variant of NULL will do.) If you want an actual string value “NULL”, you must put double quotes around it.

(These kinds of array constants are actually only a special case of the generic type constants discussed in Section 4.1.2.7. The constant is initially treated as a string and passed to the array input conversion routine. An explicit type specification might be necessary.)

Now we can show some INSERT statements:

INSERT INTO sal_emp
    VALUES ('Bill',
    '{10000, 10000, 10000, 10000}',
    '{{"meeting", "lunch"}, {"training", "presentation"}}');

INSERT INTO sal_emp
    VALUES ('Carol',
    '{20000, 25000, 25000, 25000}',
    '{{"breakfast", "consulting"}, {"meeting", "lunch"}}');

The result of the previous two inserts looks like this:

SELECT * FROM sal_emp;
 name  |      pay_by_quarter       |                 schedule
-------+---------------------------+-------------------------------------------
 Bill  | {10000,10000,10000,10000} | {{meeting,lunch},{training,presentation}}
 Carol | {20000,25000,25000,25000} | {{breakfast,consulting},{meeting,lunch}}
(2 rows)

Multidimensional arrays must have matching extents for each dimension. A mismatch causes an error, for example:

INSERT INTO sal_emp
    VALUES ('Bill',
    '{10000, 10000, 10000, 10000}',
    '{{"meeting", "lunch"}, {"meeting"}}');
ERROR:  multidimensional arrays must have array expressions with matching dimensions

The ARRAY constructor syntax can also be used:

INSERT INTO sal_emp
    VALUES ('Bill',
    ARRAY[10000, 10000, 10000, 10000],
    ARRAY[['meeting', 'lunch'], ['training', 'presentation']]);

INSERT INTO sal_emp
    VALUES ('Carol',
    ARRAY[20000, 25000, 25000, 25000],
    ARRAY[['breakfast', 'consulting'], ['meeting', 'lunch']]);

Notice that the array elements are ordinary SQL constants or expressions; for instance, string literals are single quoted, instead of double quoted as they would be in an array literal. The ARRAY constructor syntax is discussed in more detail in Section 4.2.12.

8.15.3. Accessing Arrays

Now, we can run some queries on the table. First, we show how to access a single element of an array. This query retrieves the names of the employees whose pay changed in the second quarter:

SELECT name FROM sal_emp WHERE pay_by_quarter[1] <> pay_by_quarter[2];

 name
-------
 Carol
(1 row)

The array subscript numbers are written within square brackets. By default PostgreSQL uses a one-based numbering convention for arrays, that is, an array of n elements starts with array[1] and ends with array[n].

This query retrieves the third quarter pay of all employees:

SELECT pay_by_quarter[3] FROM sal_emp;

 pay_by_quarter
----------------
          10000
          25000
(2 rows)

We can also access arbitrary rectangular slices of an array, or subarrays. An array slice is denoted by writing lower-bound:upper-bound for one or more array dimensions. For example, this query retrieves the first item on Bill's schedule for the first two days of the week:

SELECT schedule[1:2][1:1] FROM sal_emp WHERE name = 'Bill';

        schedule
------------------------
 {{meeting},{training}}
(1 row)

If any dimension is written as a slice, i.e., contains a colon, then all dimensions are treated as slices. Any dimension that has only a single number (no colon) is treated as being from 1 to the number specified. For example, [2] is treated as [1:2], as in this example:

SELECT schedule[1:2][2] FROM sal_emp WHERE name = 'Bill';

                 schedule
-------------------------------------------
 {{meeting,lunch},{training,presentation}}
(1 row)

To avoid confusion with the non-slice case, it's best to use slice syntax for all dimensions, e.g., [1:2][1:1], not [2][1:1].
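The rule that a lone number becomes the range 1:n once any dimension is a slice can be modelled outside the database. The following Python sketch is only an illustration, not PostgreSQL code: the helper name pg_slice_2d is hypothetical, it handles only two-dimensional arrays whose lower bounds are 1, and it assumes every requested bound lies inside the array.

```python
def pg_slice_2d(rows, dim1, dim2):
    """Model a two-dimensional slice such as schedule[1:2][2].

    Each dimension is either a (lower, upper) pair or a single int n.
    Because the expression is a slice, a single int n is read as the
    range 1:n. Subscripts are one-based and inclusive.
    """
    def bounds(spec):
        if isinstance(spec, int):
            return 1, spec          # [n] is treated as [1:n]
        return spec

    lo1, hi1 = bounds(dim1)
    lo2, hi2 = bounds(dim2)
    return [row[lo2 - 1:hi2] for row in rows[lo1 - 1:hi1]]


# Bill's schedule from the sal_emp examples.
bill = [["meeting", "lunch"], ["training", "presentation"]]
first_items = pg_slice_2d(bill, (1, 2), (1, 1))   # schedule[1:2][1:1]
whole_array = pg_slice_2d(bill, (1, 2), 2)        # schedule[1:2][2]
```

In this model, writing 2 for the second dimension selects columns 1 through 2, which is why schedule[1:2][2] returns the whole array rather than only the second column.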

It is possible to omit the lower-bound and/or upper-bound of a slice specifier; the missing bound is replaced by the lower or upper limit of the array's subscripts. For example:

SELECT schedule[:2][2:] FROM sal_emp WHERE name = 'Bill';

         schedule
--------------------------
 {{lunch},{presentation}}
(1 row)

SELECT schedule[:][1:1] FROM sal_emp WHERE name = 'Bill';

        schedule
------------------------
 {{meeting},{training}}
(1 row)

An array subscript expression will return null if either the array itself or any of the subscript expressions are null. Also, null is returned if a subscript is outside the array bounds (this case does not raise an error). For example, if schedule currently has the dimensions [1:3][1:2] then referencing schedule[3][3] yields NULL. Similarly, an array reference with the wrong number of subscripts yields a null rather than an error.

An array slice expression likewise yields null if the array itself or any of the subscript expressions are null. However, in other cases such as selecting an array slice that is completely outside the current array bounds, a slice expression yields an empty (zero-dimensional) array instead of null. (This does not match non-slice behavior and is done for historical reasons.) If the requested slice partially overlaps the array bounds, then it is silently reduced to just the overlapping region instead of returning null.
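The difference between element access and slice access at the array bounds can be summarized in a short Python model. This is a sketch under stated assumptions: the function names subscript and slice_ are hypothetical, only one-dimensional arrays with lower bound 1 are covered, and Python's None stands in for SQL NULL.

```python
def subscript(arr, i):
    """Model arr[i]: null input or an out-of-bounds subscript gives None."""
    if arr is None or i is None:
        return None
    if i < 1 or i > len(arr):
        return None
    return arr[i - 1]


def slice_(arr, lo, hi):
    """Model arr[lo:hi].

    Null input gives None; a slice completely outside the bounds gives
    an empty list; a partial overlap is clipped to the array bounds.
    """
    if arr is None or lo is None or hi is None:
        return None
    lo, hi = max(lo, 1), min(hi, len(arr))
    if lo > hi:
        return []
    return arr[lo - 1:hi]
```

The two functions deliberately differ for out-of-range requests, mirroring the historical inconsistency described above.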

The current dimensions of any array value can be retrieved with the array_dims function:

SELECT array_dims(schedule) FROM sal_emp WHERE name = 'Carol';

 array_dims
------------
 [1:2][1:2]
(1 row)

array_dims produces a text result, which is convenient for people to read but perhaps inconvenient for programs. Dimensions can also be retrieved with array_upper and array_lower, which return the upper and lower bound of a specified array dimension, respectively:

SELECT array_upper(schedule, 1) FROM sal_emp WHERE name = 'Carol';

 array_upper
-------------
           2
(1 row)

array_length will return the length of a specified array dimension:

SELECT array_length(schedule, 1) FROM sal_emp WHERE name = 'Carol';

 array_length
--------------
            2
(1 row)

cardinality returns the total number of elements in an array across all dimensions. It is effectively the number of rows a call to unnest would yield:

SELECT cardinality(schedule) FROM sal_emp WHERE name = 'Carol';

 cardinality
-------------
           4
(1 row)

8.15.4. Modifying Arrays

An array value can be replaced completely:

UPDATE sal_emp SET pay_by_quarter = '{25000,25000,27000,27000}'
    WHERE name = 'Carol';

or using the ARRAY expression syntax:

UPDATE sal_emp SET pay_by_quarter = ARRAY[25000,25000,27000,27000]
    WHERE name = 'Carol';

An array can also be updated at a single element:

UPDATE sal_emp SET pay_by_quarter[4] = 15000
    WHERE name = 'Bill';

or updated in a slice:

UPDATE sal_emp SET pay_by_quarter[1:2] = '{27000,27000}'
    WHERE name = 'Carol';

The slice syntaxes with omitted lower-bound and/or upper-bound can be used too, but only when updating an array value that is not NULL or zero-dimensional (otherwise, there is no existing subscript limit to substitute).

A stored array value can be enlarged by assigning to elements not already present. Any positions between those previously present and the newly assigned elements will be filled with nulls. For example, if array myarray currently has 4 elements, it will have six elements after an update that assigns to myarray[6]; myarray[5] will contain null. Currently, enlargement in this fashion is only allowed for one-dimensional arrays, not multidimensional arrays.
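The enlargement rule can be pictured with a small Python model. It is an illustrative sketch only: assign_element is a hypothetical name, the array is one-dimensional with lower bound 1, and None stands in for SQL NULL.

```python
def assign_element(arr, index, value):
    """Model SET myarray[index] = value for a one-dimensional array.

    Only index >= 1 is handled; assigning below the lower bound would
    change the subscript range, which this sketch does not cover.
    """
    if index < 1:
        raise ValueError("this sketch only handles index >= 1")
    result = list(arr)
    if index > len(result):
        # Fill the gap between the old end and the new element with nulls.
        result.extend([None] * (index - len(result)))
    result[index - 1] = value
    return result
```

Assigning to position 6 of a four-element array therefore yields six elements, with position 5 holding a null.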

Subscripted assignment allows creation of arrays that do not use one-based subscripts. For example one might assign to myarray[-2:7] to create an array with subscript values from -2 to 7.

New array values can also be constructed using the concatenation operator, ||:

SELECT ARRAY[1,2] || ARRAY[3,4];
 ?column?
-----------
 {1,2,3,4}
(1 row)

SELECT ARRAY[5,6] || ARRAY[[1,2],[3,4]];
      ?column?
---------------------
 {{5,6},{1,2},{3,4}}
(1 row)

The concatenation operator allows a single element to be pushed onto the beginning or end of a one-dimensional array. It also accepts two N-dimensional arrays, or an N-dimensional and an N+1-dimensional array.

When a single element is pushed onto either the beginning or end of a one-dimensional array, the result is an array with the same lower bound subscript as the array operand. For example:

SELECT array_dims(1 || '[0:1]={2,3}'::int[]);
 array_dims
------------
 [0:2]
(1 row)

SELECT array_dims(ARRAY[1,2] || 3);
 array_dims
------------
 [1:3]
(1 row)

When two arrays with an equal number of dimensions are concatenated, the result retains the lower bound subscript of the left-hand operand's outer dimension. The result is an array comprising every element of the left-hand operand followed by every element of the right-hand operand. For example:

SELECT array_dims(ARRAY[1,2] || ARRAY[3,4,5]);
 array_dims
------------
 [1:5]
(1 row)

SELECT array_dims(ARRAY[[1,2],[3,4]] || ARRAY[[5,6],[7,8],[9,0]]);
 array_dims
------------
 [1:5][1:2]
(1 row)

When an N-dimensional array is pushed onto the beginning or end of an N+1-dimensional array, the result is analogous to the element-array case above. Each N-dimensional sub-array is essentially an element of the N+1-dimensional array's outer dimension. For example:

SELECT array_dims(ARRAY[1,2] || ARRAY[[3,4],[5,6]]);
 array_dims
------------
 [1:3][1:2]
(1 row)

An array can also be constructed by using the functions array_prepend, array_append, or array_cat. The first two only support one-dimensional arrays, but array_cat supports multidimensional arrays. Some examples:

SELECT array_prepend(1, ARRAY[2,3]);
 array_prepend
---------------
 {1,2,3}
(1 row)

SELECT array_append(ARRAY[1,2], 3);
 array_append
--------------
 {1,2,3}
(1 row)

SELECT array_cat(ARRAY[1,2], ARRAY[3,4]);
 array_cat
-----------
 {1,2,3,4}
(1 row)

SELECT array_cat(ARRAY[[1,2],[3,4]], ARRAY[5,6]);
      array_cat
---------------------
 {{1,2},{3,4},{5,6}}
(1 row)

SELECT array_cat(ARRAY[5,6], ARRAY[[1,2],[3,4]]);
      array_cat
---------------------
 {{5,6},{1,2},{3,4}}

In simple cases, the concatenation operator discussed above is preferred over direct use of these functions. However, because the concatenation operator is overloaded to serve all three cases, there are situations where use of one of the functions is helpful to avoid ambiguity. For example consider:

SELECT ARRAY[1, 2] || '{3, 4}';  -- the untyped literal is taken as an array
 ?column?
-----------
 {1,2,3,4}

SELECT ARRAY[1, 2] || '7';                 -- so is this one
ERROR:  malformed array literal: "7"

SELECT ARRAY[1, 2] || NULL;                -- so is an undecorated NULL
 ?column?
----------
 {1,2}
(1 row)

SELECT array_append(ARRAY[1, 2], NULL);    -- this might have been meant
 array_append
--------------
 {1,2,NULL}

In the examples above, the parser sees an integer array on one side of the concatenation operator, and a constant of undetermined type on the other. The heuristic it uses to resolve the constant's type is to assume it's of the same type as the operator's other input — in this case, integer array. So the concatenation operator is presumed to represent array_cat, not array_append. When that's the wrong choice, it could be fixed by casting the constant to the array's element type; but explicit use of array_append might be a preferable solution.

8.15.5. Searching in Arrays

To search for a value in an array, each value must be checked. This can be done manually, if you know the size of the array. For example:

SELECT * FROM sal_emp WHERE pay_by_quarter[1] = 10000 OR
                            pay_by_quarter[2] = 10000 OR
                            pay_by_quarter[3] = 10000 OR
                            pay_by_quarter[4] = 10000;

However, this quickly becomes tedious for large arrays, and is not helpful if the size of the array is unknown. An alternative method is described in Section 9.23. The above query could be replaced by:

SELECT * FROM sal_emp WHERE 10000 = ANY (pay_by_quarter);

In addition, you can find rows where the array has all values equal to 10000 with:

SELECT * FROM sal_emp WHERE 10000 = ALL (pay_by_quarter);

Alternatively, the generate_subscripts function can be used. For example:

SELECT * FROM
   (SELECT pay_by_quarter,
           generate_subscripts(pay_by_quarter, 1) AS s
      FROM sal_emp) AS foo
 WHERE pay_by_quarter[s] = 10000;

This function is described in Table 9.62.

You can also search an array using the && operator, which checks whether the left operand overlaps with the right operand. For instance:

SELECT * FROM sal_emp WHERE pay_by_quarter && ARRAY[10000];

This and other array operators are further described in Section 9.18. It can be accelerated by an appropriate index, as described in Section 11.2.

You can also search for specific values in an array using the array_position and array_positions functions. The former returns the subscript of the first occurrence of a value in an array; the latter returns an array with the subscripts of all occurrences of the value in the array. For example:

SELECT array_position(ARRAY['sun','mon','tue','wed','thu','fri','sat'], 'mon');
 array_position
----------------
              2

SELECT array_positions(ARRAY[1, 4, 3, 1, 3, 4, 2, 1], 1);
 array_positions
-----------------
 {1,4,8}
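For readers more familiar with zero-based languages, the following Python sketch mirrors the two functions on one-dimensional lists. It only models the behavior described above; the Python functions reuse the SQL names for readability and are not part of any PostgreSQL client library.

```python
def array_position(arr, value):
    """Return the one-based subscript of the first match, or None."""
    for i, item in enumerate(arr, start=1):
        if item == value:
            return i
    return None


def array_positions(arr, value):
    """Return the one-based subscripts of all matches."""
    return [i for i, item in enumerate(arr, start=1) if item == value]
```

Note that the results are one-based, matching PostgreSQL's default subscript convention rather than Python's.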

Tip

Arrays are not sets; searching for specific array elements can be a sign of database misdesign. Consider using a separate table with a row for each item that would be an array element. This will be easier to search, and is likely to scale better for a large number of elements.

8.15.6. Array Input and Output Syntax

The external text representation of an array value consists of items that are interpreted according to the I/O conversion rules for the array's element type, plus decoration that indicates the array structure. The decoration consists of curly braces ({ and }) around the array value plus delimiter characters between adjacent items. The delimiter character is usually a comma (,) but can be something else: it is determined by the typdelim setting for the array's element type. Among the standard data types provided in the PostgreSQL distribution, all use a comma, except for type box, which uses a semicolon (;). In a multidimensional array, each dimension (row, plane, cube, etc.) gets its own level of curly braces, and delimiters must be written between adjacent curly-braced entities of the same level.

The array output routine will put double quotes around element values if they are empty strings, contain curly braces, delimiter characters, double quotes, backslashes, or white space, or match the word NULL. Double quotes and backslashes embedded in element values will be backslash-escaped. For numeric data types it is safe to assume that double quotes will never appear, but for textual data types one should be prepared to cope with either the presence or absence of quotes.
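The quoting rule of the output routine can be approximated in a few lines of Python. This is a sketch, not the server's implementation: quote_element and array_out are hypothetical helpers for one-dimensional text arrays, None stands in for a NULL element, and Python's notion of white space is slightly broader than the server's.

```python
def quote_element(text, delim=","):
    """Render one text element following the quoting rule above."""
    if text is None:
        return "NULL"
    needs_quotes = (
        text == ""
        or text.upper() == "NULL"
        or any(ch in text for ch in '{}"\\' + delim)
        or any(ch.isspace() for ch in text)
    )
    if not needs_quotes:
        return text
    # Backslash-escape embedded backslashes and double quotes.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def array_out(items, delim=","):
    """Render a one-dimensional list as an array literal."""
    return "{" + delim.join(quote_element(i, delim) for i in items) + "}"
```

A client that parses array output has to accept both the quoted and the unquoted form of a text element, since which one appears depends on the element's content.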

By default, the lower bound index value of an array's dimensions is set to one. To represent arrays with other lower bounds, the array subscript ranges can be specified explicitly before writing the array contents. This decoration consists of square brackets ([]) around each array dimension's lower and upper bounds, with a colon (:) delimiter character in between. The array dimension decoration is followed by an equal sign (=). For example:

SELECT f1[1][-2][3] AS e1, f1[1][-1][5] AS e2
 FROM (SELECT '[1:1][-2:-1][3:5]={{{1,2,3},{4,5,6}}}'::int[] AS f1) AS ss;

 e1 | e2
----+----
  1 |  6
(1 row)

The array output routine will include explicit dimensions in its result only when there are one or more lower bounds different from one.

If the value written for an element is NULL (in any case variant), the element is taken to be NULL. The presence of any quotes or backslashes disables this and allows the literal string value “NULL” to be entered. Also, for backward compatibility with pre-8.2 versions of PostgreSQL, the array_nulls configuration parameter can be turned off to suppress recognition of NULL as a NULL.

As shown previously, when writing an array value you can use double quotes around any individual array element. You must do so if the element value would otherwise confuse the array-value parser. For example, elements containing curly braces, commas (or the data type's delimiter character), double quotes, backslashes, or leading or trailing whitespace must be double-quoted. Empty strings and strings matching the word NULL must be quoted, too. To put a double quote or backslash in a quoted array element value, precede it with a backslash. Alternatively, you can avoid quotes and use backslash-escaping to protect all data characters that would otherwise be taken as array syntax.

You can add whitespace before a left brace or after a right brace. You can also add whitespace before or after any individual item string. In all of these cases the whitespace will be ignored. However, whitespace within double-quoted elements, or surrounded on both sides by non-whitespace characters of an element, is not ignored.

Tip

The ARRAY constructor syntax (see Section 4.2.12) is often easier to work with than the array-literal syntax when writing array values in SQL commands. In ARRAY, individual element values are written the same way they would be written when not members of an array.

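4.1. Lexical Structure

SQL input consists of a sequence of commands. A command is composed of a sequence of tokens, terminated by a semicolon (;). The end of the input stream also terminates a command. Which tokens are valid depends on the syntax of the particular command.

A token can be a key word, an identifier, a quoted identifier, a literal (or constant), or a special character symbol. Tokens are normally separated by whitespace (space, tab, newline), but need not be if there is no ambiguity (which is generally only the case if a special character is adjacent to some other token type).

For example, the following is (syntactically) valid SQL input: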

SELECT * FROM MY_TABLE;
UPDATE MY_TABLE SET A = 5;
INSERT INTO MY_TABLE VALUES (3, 'hi there');

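This is a sequence of three commands, one per line (although this is not required; more than one command can be on a line, and commands can usefully be split across lines).

Additionally, comments can occur in SQL input. They are not tokens; they are effectively equivalent to whitespace.

The SQL syntax is not very consistent regarding what tokens identify commands and which are operands or parameters. The first few tokens are generally the command name, so in the above example we would usually speak of a "SELECT", an "UPDATE", and an "INSERT" command. But for instance the UPDATE command always requires a SET token to appear in a certain position, and this particular variation of INSERT also requires a VALUES in order to be complete. The precise syntax rules for each command are described in Part VI.

4.1.1. Identifiers and Key Words

Tokens such as SELECT, UPDATE, or VALUES in the example above are examples of key words, that is, words that have a fixed meaning in the SQL language. The token MY_TABLE is an example of an identifier. Identifiers name tables, columns, or other database objects, depending on the command they are used in. Therefore they are sometimes simply called "names". Key words and identifiers have the same lexical structure, meaning that one cannot know whether a token is an identifier or a key word without knowing the language. A complete list of key words can be found in Appendix C.

SQL identifiers and key words must begin with a letter (a-z, but also letters with diacritical marks and non-Latin letters, so Chinese characters are fine) or an underscore (_). Subsequent characters in an identifier or key word can be letters, underscores, digits (0-9), or dollar signs ($). Note that dollar signs are not allowed in identifiers according to the letter of the SQL standard, so their use might render applications less portable. The SQL standard will not define a key word that contains digits or starts or ends with an underscore, so identifiers of this form are safe against possible conflict with future extensions of the standard.

The system uses no more than NAMEDATALEN-1 bytes of an identifier; longer names can be written in commands, but they will be truncated. By default, NAMEDATALEN is 64, so the maximum identifier length is 63 bytes. If this limit is problematic, it can be raised by changing the NAMEDATALEN constant in src/include/pg_config_manual.h.

Key words and unquoted identifiers are case insensitive. Therefore: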

UPDATE MY_TABLE SET A = 5;

can equivalently be written as:

uPDaTE my_TabLE SeT a = 5;

A convention often used is to write key words in upper case and names in lower case, e.g.:

UPDATE my_table SET a = 5;

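There is a second kind of identifier: the delimited identifier or quoted identifier. It is formed by enclosing an arbitrary sequence of characters in double quotes ("). A delimited identifier is always an identifier, never a key word. So "select" could be used to refer to a column or table named "select", whereas an unquoted select would be taken as a key word and would therefore provoke a parse error when used where a table or column name is expected. The example can be written with quoted identifiers like this: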

UPDATE "my_table" SET "a" = 5;

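Quoted identifiers can contain any character, except the character with code zero. (To include a double quote, write two double quotes.) This allows constructing table or column names that would otherwise not be possible, such as ones containing spaces or ampersands. The length limitation still applies.

A variant of quoted identifiers allows including escaped Unicode characters identified by their code points. This variant starts with U& (upper or lower case U followed by ampersand) immediately before the opening double quote, without any spaces in between, for example U&"foo". (Note that this creates an ambiguity with the operator &. Use spaces around the operator to avoid this problem.) Inside the quotes, Unicode characters can be specified in escaped form by writing a backslash followed by the four-digit hexadecimal code point number or alternatively a backslash followed by a plus sign followed by a six-digit hexadecimal code point number. For example, the identifier "data" could be written as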

U&"d\0061t\+000061"

The following less trivial example writes the Russian word "slon" (elephant) in Cyrillic letters:

U&"\0441\043B\043E\043D"

If a different escape character than backslash is desired, it can be specified using the UESCAPE clause after the closing double quote, for example:

U&"d!0061t!+000061" UESCAPE '!'

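The escape character can be any single character other than a hexadecimal digit, the plus sign, a single quote, a double quote, or a whitespace character. Note that the escape character is written in single quotes, not double quotes.

To include the escape character in the identifier literally, write it twice.

The Unicode escape syntax works only when the server encoding is UTF8. When other server encodings are used, only code points in the ASCII range (up to \007F) can be specified. Both the 4-digit and the 6-digit form can be used to specify UTF-16 surrogate pairs to compose characters with code points larger than U+FFFF, although the availability of the 6-digit form technically makes this unnecessary. (Surrogate pairs are not stored directly, but are combined into a single code point that is then encoded in UTF-8.)

Quoting an identifier also makes it case-sensitive, whereas unquoted names are always folded to lower case. For example, the identifiers FOO, foo, and "foo" are considered the same by PostgreSQL, but "Foo" and "FOO" are different from these three and from each other. (The folding of unquoted names to lower case in PostgreSQL is incompatible with the SQL standard, which says that unquoted names should be folded to upper case. Thus, foo should be equivalent to "FOO", not "foo", according to the standard. If you want to write portable applications you are advised to always quote a particular name or never quote it.)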
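The case-folding rule can be made concrete with a short Python model. This is a sketch under simplifying assumptions: fold_identifier is a hypothetical helper, only ASCII identifiers are considered, and the U& variant is not handled.

```python
def fold_identifier(token):
    """Return the name PostgreSQL would look up for an identifier token.

    Quoted identifiers keep their case, and "" inside the quotes
    stands for one double quote. Unquoted identifiers are folded to
    lower case.
    """
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return token[1:-1].replace('""', '"')
    return token.lower()


same = {fold_identifier(t) for t in ["FOO", "foo", '"foo"']}
different = {fold_identifier(t) for t in ['"Foo"', '"FOO"']}
```

Here `same` collapses to a single name, while the two quoted mixed-case spellings stay distinct from it and from each other.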

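4.1.2. Constants

There are three kinds of implicitly-typed constants in PostgreSQL: strings, bit strings, and numbers. Constants can also be specified with explicit types, which can enable more accurate representation and more efficient handling by the system. These alternatives are discussed in the following subsections.

4.1.2.1. String Constants

A string constant in SQL is an arbitrary sequence of characters bounded by single quotes ('), for example 'This is a string'. To include a single-quote character within a string constant, write two adjacent single quotes, e.g., 'Dianne''s horse'. Note that this is not the same as a double-quote character (").

Two string constants that are only separated by whitespace with at least one newline are concatenated and effectively treated as if the string had been written as one constant. For example: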

SELECT 'foo'
'bar';

is equivalent to:

SELECT 'foobar';

but:

SELECT 'foo'      'bar';

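is not valid syntax. (This slightly bizarre behavior is specified by SQL; PostgreSQL is following the standard.)

4.1.2.2. String Constants with C-Style Escapes

PostgreSQL also accepts "escape" string constants, which are an extension to the SQL standard. An escape string constant is specified by writing the letter E (upper or lower case) just before the opening single quote, e.g., E'foo'. (When continuing an escape string constant across lines, write E only before the first opening quote.) Within an escape string, a backslash character (\) begins a C-like backslash escape sequence, in which the combination of backslash and following character(s) represents a special byte value, as shown in Table 4.1.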

Table 4.1. Backslash Escape Sequences

 Backslash Escape Sequence             | Interpretation
---------------------------------------+--------------------------------------------------
 \b                                    | backspace
 \f                                    | form feed
 \n                                    | newline
 \r                                    | carriage return
 \t                                    | tab
 \o, \oo, \ooo (o = 0 - 7)             | octal byte value
 \xh, \xhh (h = 0 - 9, A - F)          | hexadecimal byte value
 \uxxxx, \Uxxxxxxxx (x = 0 - 9, A - F) | 16 or 32-bit hexadecimal Unicode character value

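Any other character following a backslash is taken literally. Thus, to include a backslash character, write two backslashes (\\). Also, a single quote can be included in an escape string by writing \', in addition to the normal way of ''.

It is your responsibility that the byte sequences you create, especially when using the octal or hexadecimal escapes, compose valid characters in the server character set encoding. When the server encoding is UTF-8, the Unicode escapes or the alternative Unicode escape syntax, explained in Section 4.1.2.3, should be used instead. (The alternative would be doing the UTF-8 encoding by hand and writing out the bytes, which would be very cumbersome.)

The Unicode escape syntax works fully only when the server encoding is UTF8. When other server encodings are used, only code points in the ASCII range (up to \u007F) can be specified. Both the 4-digit and the 8-digit form can be used to specify UTF-16 surrogate pairs to compose characters with code points larger than U+FFFF, although the availability of the 8-digit form technically makes this unnecessary. (When surrogate pairs are used and the server encoding is UTF8, they are first combined into a single code point that is then encoded in UTF-8.)

Caution

If the configuration parameter standard_conforming_strings is off, then PostgreSQL recognizes backslash escapes in both regular and escape string constants. However, as of PostgreSQL 9.1, the default is on, meaning that backslash escapes are recognized only in escape string constants. This behavior is more standards-compliant, but might break applications which rely on the historical behavior, where backslash escapes were always recognized. As a workaround, you can set this parameter to off, but it is better to migrate away from using backslash escapes. If you need to use a backslash escape to represent a special character, write the string constant with an E.

In addition to standard_conforming_strings, the configuration parameters escape_string_warning and backslash_quote govern treatment of backslashes in string constants.

The character with the code zero cannot be in a string constant.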

4.1.2.3. String Constants with Unicode Escapes

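PostgreSQL also supports another type of escape syntax for strings that allows specifying arbitrary Unicode characters by code point. A Unicode escape string constant starts with U& (upper or lower case letter U followed by ampersand) immediately before the opening quote, without any spaces in between, for example U&'foo'. (Note that this creates an ambiguity with the operator &. Use spaces around the operator to avoid this problem.) Inside the quotes, Unicode characters can be specified in escaped form by writing a backslash followed by the four-digit hexadecimal code point number or alternatively a backslash followed by a plus sign followed by a six-digit hexadecimal code point number. For example, the string 'data' could be written as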

U&'d\0061t\+000061'

The following less trivial example writes the Russian word "slon" (elephant) in Cyrillic letters:

U&'\0441\043B\043E\043D'

If a different escape character than backslash is desired, it can be specified using the UESCAPE clause after the string, for example:

U&'d!0061t!+000061' UESCAPE '!'

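The escape character can be any single character other than a hexadecimal digit, the plus sign, a single quote, a double quote, or a whitespace character.

The Unicode escape syntax for string constants only works when the configuration parameter standard_conforming_strings is turned on. This is because otherwise this syntax could confuse clients that parse the SQL statements to the point that it could lead to SQL injections and similar security issues. If the parameter is set to off, this syntax will be rejected with an error message.

To include the escape character in the string literally, write it twice.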
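To see how the escape forms map to characters, the decoding step can be sketched in Python. This is an illustration rather than the server's lexer: decode_unicode_escapes is a hypothetical helper that receives only the text between the quotes, takes the UESCAPE character as a parameter, and does not combine surrogate pairs or report malformed escapes.

```python
import re


def decode_unicode_escapes(body, escape="\\"):
    """Decode the text between the quotes of a U&'...' constant."""
    e = re.escape(escape)
    pattern = re.compile(
        e + e                              # doubled escape character
        + "|" + e + r"\+([0-9A-Fa-f]{6})"  # escape, plus sign, 6 hex digits
        + "|" + e + r"([0-9A-Fa-f]{4})"    # escape, 4 hex digits
    )

    def replace(match):
        six, four = match.group(1), match.group(2)
        if six is not None:
            return chr(int(six, 16))
        if four is not None:
            return chr(int(four, 16))
        return escape                      # doubled escape gives one literal

    return pattern.sub(replace, body)


word = decode_unicode_escapes(r"d\0061t\+000061")
alt = decode_unicode_escapes("d!0061t!+000061", escape="!")
```

Both calls decode to the string data, matching the two SQL examples above.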

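4.1.2.4. Dollar-Quoted String Constants

While the standard syntax for specifying string constants is usually convenient, it can be difficult to understand when the desired string contains many single quotes or backslashes, since each of those must be doubled. To allow more readable queries in such situations, PostgreSQL provides another way, called "dollar quoting", to write string constants. A dollar-quoted string constant consists of a dollar sign ($), an optional "tag" of zero or more characters, another dollar sign, an arbitrary sequence of characters that makes up the string content, a dollar sign, the same tag that began this dollar quote, and a dollar sign. For example, here are two different ways to specify the string "Dianne's horse" using dollar quoting: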

$$Dianne's horse$$
$SomeTag$Dianne's horse$SomeTag$

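Notice that inside the dollar-quoted string, single quotes can be used without needing to be escaped. Indeed, no characters inside a dollar-quoted string are ever escaped: the string content is always written literally. Backslashes are not special, and neither are dollar signs, unless they are part of a sequence matching the opening tag.

It is possible to nest dollar-quoted string constants by choosing different tags at each nesting level. This is most commonly used in writing function definitions. For example: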

$function$
BEGIN
    RETURN ($1 ~ $q$[\t\r\n\v\\]$q$);
END;
$function$

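Here, the sequence $q$[\t\r\n\v\\]$q$ represents a dollar-quoted literal string [\t\r\n\v\\], which will be recognized when the function body is executed by PostgreSQL. But since the sequence does not match the outer dollar quoting delimiter $function$, it is just some more characters within the constant so far as the outer string is concerned.

The tag, if any, of a dollar-quoted string follows the same rules as an unquoted identifier, except that it cannot contain a dollar sign. Tags are case sensitive, so $tag$String content$tag$ is correct, but $TAG$String content$tag$ is not.

A dollar-quoted string that follows a keyword or identifier must be separated from it by whitespace; otherwise the dollar quoting delimiter would be taken as part of the preceding identifier.

Dollar quoting is not part of the SQL standard, but it is often a more convenient way to write complicated string literals than the standard-compliant single quote syntax. It is particularly useful when representing string constants inside other constants, as is often needed in procedural function definitions. With single-quote syntax, each backslash in the above example would have to be written as four backslashes, which would be reduced to two backslashes in parsing the original string constant, and then to one when the inner string constant is re-parsed during function execution.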
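The delimiter-matching rule for dollar quoting can be sketched with a regular expression in Python. This is a simplified model, not the PostgreSQL lexer: dollar_unquote is a hypothetical helper, and the tag pattern is restricted to ASCII letters, digits, and underscores.

```python
import re

# Opening delimiter $tag$, lazily matched content, then the same $tag$.
DOLLAR_QUOTED = re.compile(
    r"\$((?:[A-Za-z_][A-Za-z_0-9]*)?)\$(.*?)\$\1\$",
    re.DOTALL,
)


def dollar_unquote(literal):
    """Return the content of one dollar-quoted constant, or None.

    The constant ends at the first closing delimiter whose tag matches
    the opening one exactly (tags are case sensitive), and nothing may
    follow that delimiter.
    """
    match = DOLLAR_QUOTED.match(literal)
    if match is None or match.end() != len(literal):
        return None
    return match.group(2)


body = dollar_unquote(
    "$function$\nBEGIN\n    RETURN ($1 ~ $q$[\\t\\r\\n\\v\\\\]$q$);\nEND;\n$function$"
)
```

Because the inner $q$ delimiters do not match the outer $function$ tag, they remain ordinary characters of `body`, which is what makes nesting possible.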

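4.1.2.5. Bit-String Constants

Bit-string constants look like regular string constants with a B (upper or lower case) immediately before the opening quote (no intervening whitespace), e.g., B'1001'. The only characters allowed within bit-string constants are 0 and 1.

Alternatively, bit-string constants can be specified in hexadecimal notation, using a leading X (upper or lower case), e.g., X'1FF'. This notation is equivalent to a bit-string constant with four binary digits for each hexadecimal digit.

Both forms of bit-string constant can be continued across lines in the same way as regular string constants. Dollar quoting cannot be used in a bit-string constant.

4.1.2.6. Numeric Constants

Numeric constants are accepted in these general forms: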

digits
digits.[digits][e[+-]digits]
[digits].digits[e[+-]digits]
digitse[+-]digits

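where digits is one or more decimal digits (0 through 9). At least one digit must be before or after the decimal point, if one is used. At least one digit must follow the exponent marker (e), if one is present. There cannot be any spaces or other characters embedded in the constant. Note that any leading plus or minus sign is not actually considered part of the constant; it is an operator applied to the constant.

These are some examples of valid numeric constants: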

42 3.5 4. .001 5e2 1.925e-3

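A numeric constant that contains neither a decimal point nor an exponent is initially presumed to be type integer if its value fits in type integer (32 bits); otherwise it is presumed to be type bigint if its value fits in type bigint (64 bits); otherwise it is taken to be type numeric. Constants that contain decimal points and/or exponents are always initially presumed to be type numeric.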
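The accepted forms and the initial type assignment can be modelled in Python. This is a sketch under stated assumptions: initial_type is a hypothetical helper, the pattern follows the grammar as written above (lower-case e only), and the function covers only the initial presumption, not the later type resolution.

```python
import re

NUMERIC_CONSTANT = re.compile(
    r"""
      \d+ \. \d* (?: e [+-]? \d+ )?   # digits.[digits][e[+-]digits]
    | \d* \. \d+ (?: e [+-]? \d+ )?   # [digits].digits[e[+-]digits]
    | \d+ e [+-]? \d+                 # digitse[+-]digits
    | \d+                             # digits
    """,
    re.VERBOSE | re.ASCII,
)


def initial_type(constant):
    """Return 'integer', 'bigint' or 'numeric'; None if not a valid constant."""
    if not NUMERIC_CONSTANT.fullmatch(constant):
        return None
    if "." in constant or "e" in constant:
        return "numeric"
    value = int(constant)
    if value <= 2**31 - 1:
        return "integer"
    if value <= 2**63 - 1:
        return "bigint"
    return "numeric"
```

A leading sign is rejected by this model on purpose, because the sign is an operator applied to the constant rather than part of it.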

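The initially assigned data type of a numeric constant is just a starting point for the type resolution algorithms. In most cases the constant will be automatically coerced to the most appropriate type depending on context. When necessary, you can force a numeric value to be interpreted as a specific data type by casting it. For example, you can force a numeric value to be treated as type real (float4) by writing: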

REAL '1.23'  -- string style
1.23::REAL   -- PostgreSQL (historical) style

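These are actually just special cases of the general casting notations discussed next.

4.1.2.7. Constants of Other Types

A constant of an arbitrary type can be entered using any one of the following notations: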

type 'string'
'string'::type
CAST ( 'string' AS type )

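The string constant's text is passed to the input conversion routine for the type called type. The result is a constant of the indicated type. The explicit type cast can be omitted if there is no ambiguity as to the type the constant must be (for example, when it is assigned directly to a table column), in which case it is automatically coerced.

The string constant can be written using either regular SQL notation or dollar-quoting.

It is also possible to specify a type coercion using a function-like syntax: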

typename ( 'string' )

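but not all type names can be used in this way; see Section 4.2.9 for details.

The ::, CAST(), and function-call syntaxes can also be used to specify run-time type conversions of arbitrary expressions, as discussed in Section 4.2.9. To avoid syntactic ambiguity, the type 'string' syntax can only be used to specify the type of a simple literal constant. Another restriction on the type 'string' syntax is that it does not work for array types; use :: or CAST() to specify the type of an array constant.

4.1.3. Operators

An operator name is a sequence of up to NAMEDATALEN-1 (63 by default) characters from the following list:

+ - * / < > = ~ ! @ # % ^ & | ` ?

There are a few restrictions on operator names, however:

-- and /* cannot appear anywhere in an operator name, since they will be taken as the start of a comment.
A multiple-character operator name cannot end in + or -, unless the name also contains at least one of these characters:
~ ! @ # % ^ & | ` ?

For example, @- is an allowed operator name, but *- is not. This restriction allows PostgreSQL to parse SQL-compliant queries without requiring spaces between tokens.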
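The operator-name rules above can be written down as a small Python check. It is a sketch rather than the server's lexer: is_valid_operator_name and its namedatalen parameter are hypothetical, and the check covers only the restrictions listed in this section.

```python
OPERATOR_CHARS = set("+-*/<>=~!@#%^&|`?")
# Characters that allow a multi-character name to end in + or -.
UNLOCKING_CHARS = set("~!@#%^&|`?")


def is_valid_operator_name(name, namedatalen=64):
    """Apply the operator-name rules listed in this section."""
    if not name or len(name) > namedatalen - 1:
        return False
    if any(ch not in OPERATOR_CHARS for ch in name):
        return False
    if "--" in name or "/*" in name:
        return False                      # would start a comment
    if len(name) > 1 and name[-1] in "+-":
        return any(ch in UNLOCKING_CHARS for ch in name)
    return True
```

With this model, @- is accepted because it contains @, while *- is rejected because it ends in - without any of the unlocking characters.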

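When working with non-SQL-standard operator names, you will usually need to separate adjacent operators with spaces to avoid ambiguity. For example, if you have defined a left unary operator named @, you cannot write X*@Y; you must write X* @Y to ensure that PostgreSQL reads it as two operator names, not one.

4.1.4. Special Characters

Some characters that are not alphanumeric have a special meaning that is different from being an operator. Details on the usage can be found at the location where the respective syntax element is described. This section only exists to advise the existence and summarize the purposes of these characters.

A dollar sign ($) followed by digits is used to represent a positional parameter in the body of a function definition or a prepared statement. In other contexts the dollar sign can be part of an identifier or a dollar-quoted string constant.
Parentheses (()) have their usual meaning to group expressions and enforce precedence. In some cases parentheses are required as part of the fixed syntax of a particular SQL command.
Brackets ([]) are used to select the elements of an array. See Section 8.15 for more information on arrays.
Commas (,) are used in some syntactical constructs to separate the elements of a list.
The semicolon (;) terminates an SQL command. It cannot appear anywhere within a command, except within a string constant or quoted identifier.
The colon (:) is used to select "slices" from arrays. (See Section 8.15.) In certain SQL dialects (such as Embedded SQL), the colon is used to prefix variable names.
The asterisk (*) is used in some contexts to denote all the fields of a table row or composite value. It also has a special meaning when used as the argument of an aggregate function, namely that the aggregate does not require any explicit parameter.
The period (.) is used in numeric constants, and to separate schema, table, and column names.

4.1.5. Comments

A comment is a sequence of characters beginning with double dashes and extending to the end of the line, e.g.: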

-- This is a standard SQL comment

Alternatively, C-style block comments can be used:

/* multiline comment
 * with nesting: /* nested block comment */
 */

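where the comment begins with /* and extends to the matching occurrence of */. These block comments nest, as specified in the SQL standard but unlike C, so that one can comment out larger blocks of code that might contain existing block comments.

A comment is removed from the input stream before further syntax analysis and is effectively replaced by whitespace.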
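Nesting means a comment remover has to count depth instead of stopping at the first */. The following Python sketch shows the idea under simplifying assumptions: strip_block_comments is a hypothetical helper, it ignores -- comments, and it does not recognize string constants or quoted identifiers, so a comment marker inside quotes would be misread.

```python
def strip_block_comments(sql):
    """Replace each (possibly nested) /* ... */ comment with one space."""
    out = []
    depth = 0
    i = 0
    while i < len(sql):
        pair = sql[i:i + 2]
        if pair == "/*":
            depth += 1                # entering a (nested) comment
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1
            i += 2
            if depth == 0:
                out.append(" ")       # the whole comment acts as whitespace
        elif depth > 0:
            i += 1                    # skip comment text
        else:
            out.append(sql[i])
            i += 1
    return "".join(out)
```

A non-nesting remover would end the comment at the first */ and leave the rest of the outer comment in the command text.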

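4.1.6. Operator Precedence

Table 4.2 shows the precedence and associativity of the operators in PostgreSQL. Most operators have the same precedence and are left-associative. The precedence and associativity of the operators is hard-wired into the parser.

You will sometimes need to add parentheses when using combinations of binary and unary operators. For instance: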

SELECT 5 ! - 6;

will be parsed as:

SELECT 5 ! (- 6);

because the parser has no idea, until it is too late, that ! is defined as a postfix operator, not an infix one. To get the desired behavior in this case, you must write:

SELECT (5 !) - 6;

This is the price one pays for extensibility.

Table 4.2. Operator Precedence (highest to lowest)

 Operator/Element                      | Associativity | Description
---------------------------------------+---------------+----------------------------------------------------
 .                                     | left          | table/column name separator
 ::                                    | left          | PostgreSQL-style typecast
 [ ]                                   | left          | array element selection
 + -                                   | right         | unary plus, unary minus
 ^                                     | left          | exponentiation
 * / %                                 | left          | multiplication, division, modulo
 + -                                   | left          | addition, subtraction
 (any other operator)                  | left          | all other native and user-defined operators
 BETWEEN / IN / LIKE / ILIKE / SIMILAR |               | range containment, set membership, string matching
 < > = <= >= <>                        |               | comparison operators
 IS / ISNULL / NOTNULL                 |               | IS TRUE, IS FALSE, IS NULL, IS DISTINCT FROM, etc
 NOT                                   | right         | logical negation
 AND                                   | left          | logical conjunction
 OR                                    | left          | logical disjunction

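Note that the operator precedence rules also apply to user-defined operators that have the same names as the built-in operators mentioned above. For example, if you define a "+" operator for some custom data type it will have the same precedence as the built-in "+" operator, no matter what yours does.

When a schema-qualified operator name is used in the OPERATOR syntax, as for example in: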

SELECT 3 OPERATOR(pg_catalog.+) 4;

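the OPERATOR construct is taken to have the default precedence shown in Table 4.2 for "any other operator". This is true no matter which specific operator appears inside OPERATOR().

Note

PostgreSQL versions before 9.5 used slightly different operator precedence rules. In particular, <= >= and <> used to be treated as generic operators; IS tests used to have higher priority; and NOT BETWEEN and related constructs acted inconsistently, being taken in some cases as having the precedence of NOT rather than BETWEEN. These rules were changed for better compliance with the SQL standard and to reduce confusion from inconsistent treatment of logically equivalent constructs. In most cases, these changes will result in no behavioral change, or perhaps in "no such operator" failures which can be resolved by adding parentheses. However there are corner cases in which a query might change behavior without any parsing error being reported. If you are concerned about whether these changes have silently broken something, you can test your application with the configuration parameter operator_precedence_warning turned on to see if any warnings are logged.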

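8.14. JSON Types

JSON data types are for storing JSON (JavaScript Object Notation) data, as specified in RFC 7159. Such data can also be stored as text, but the JSON data types have the advantage of enforcing that each stored value is valid according to the JSON rules. There are also assorted JSON-specific functions and operators available for data stored in these data types; see Section 9.15.

PostgreSQL offers two types for storing JSON data: json and jsonb. To implement efficient query mechanisms for these data types, PostgreSQL also provides the jsonpath data type described in Section 8.14.6.

The json and jsonb data types accept almost identical sets of values as input. The major practical difference is one of efficiency. The json data type stores an exact copy of the input text, which processing functions must reparse on each execution; while jsonb data is stored in a decomposed binary format that makes it slightly slower to input due to added conversion overhead, but significantly faster to process, since no reparsing is needed. jsonb also supports indexing, which can be a significant advantage.

Because the json type stores an exact copy of the input text, it will preserve semantically-insignificant white space between tokens, as well as the order of keys within JSON objects. Also, if a JSON object within the value contains the same key more than once, all the key/value pairs are kept. (The processing functions consider the last value as the operative one.) By contrast, jsonb does not preserve white space, does not preserve the order of object keys, and does not keep duplicate object keys. If duplicate keys are specified in the input, only the last value is kept.

In general, most applications should prefer to store JSON data as jsonb, unless there are quite specialized needs, such as legacy assumptions about ordering of object keys.

PostgreSQL allows only one character set encoding per database. It is therefore not possible for the JSON types to conform rigidly to the JSON specification unless the database encoding is UTF8. Attempts to directly include characters that cannot be represented in the database encoding will fail; conversely, characters that can be represented in the database encoding but not in UTF8 will be allowed.

RFC 7159 permits JSON strings to contain Unicode escape sequences denoted by \uXXXX. In the input function for the json type, Unicode escapes are allowed regardless of the database encoding, and are checked only for syntactic correctness (that is, that four hex digits follow \u). However, the input function for jsonb is stricter: it disallows Unicode escapes for non-ASCII characters (those above U+007F) unless the database encoding is UTF8. The jsonb type also rejects \u0000 (because that cannot be represented in PostgreSQL's text type), and it insists that any use of Unicode surrogate pairs to designate characters outside the Unicode Basic Multilingual Plane be correct. Valid Unicode escapes are converted to the equivalent ASCII or UTF8 character for storage; this includes folding surrogate pairs into a single character.

Many of the JSON processing functions described in Section 9.15 will convert Unicode escapes to regular characters, and will therefore throw the same types of errors just described even if their input is of type json not jsonb. The fact that the json input function does not make these checks may be considered a historical artifact, although it does allow for simple storage (without processing) of JSON Unicode escapes in a non-UTF8 database encoding. In general, it is best to avoid mixing Unicode escapes in JSON with a non-UTF8 database encoding, if possible.

When converting textual JSON input into jsonb, the primitive types described by RFC 7159 are effectively mapped onto native PostgreSQL types, as shown in Table 8.23. Therefore, there are some minor additional constraints on what constitutes valid jsonb data that do not apply to the json type, nor to JSON in the abstract, corresponding to limits on what can be represented by the underlying data type. Notably, jsonb will reject numbers that are outside the range of the PostgreSQL numeric data type, while json will not. Such implementation-defined restrictions are permitted by RFC 7159. However, in practice such problems are far more likely to occur in other implementations, as it is common to represent JSON's number primitive type as IEEE 754 double precision floating point (which RFC 7159 explicitly anticipates and allows for). When using JSON as an interchange format with such systems, the danger of losing numeric precision compared to data originally stored by PostgreSQL should be considered.

Conversely, as noted in the table, there are some minor restrictions on the input format of JSON primitive types that do not apply to the corresponding PostgreSQL types.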

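Table 8.23. JSON Primitive Types and Corresponding PostgreSQL Types

 JSON primitive type | PostgreSQL type | Notes
---------------------+-----------------+-----------------------------------------------------------------------------------------
 string              | text            | \u0000 is disallowed, as are non-ASCII Unicode escapes if database encoding is not UTF8
 number              | numeric         | NaN and infinity values are disallowed
 boolean             | boolean         | Only lowercase true and false spellings are accepted
 null                | (none)          | SQL NULL is a different concept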

8.14.1. JSON Input and Output Syntax

The input/output syntax for the JSON data types is as specified in RFC 7159.

The following are all valid json (or jsonb) expressions:

-- Simple scalar/primitive value
-- Primitive values can be numbers, quoted strings, true, false, or null
SELECT '5'::json;

-- Array of zero or more elements (elements need not be of same type)
SELECT '[1, 2, "foo", null]'::json;

-- Object containing pairs of keys and values
-- Note that object keys must always be quoted strings
SELECT '{"bar": "baz", "balance": 7.77, "active": false}'::json;

-- Arrays and objects can be nested arbitrarily
SELECT '{"foo": [true, "bar"], "tags": {"a": 1, "b": null}}'::json;

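As previously stated, when a JSON value is input and then printed without any additional processing, json outputs the same text that was input, while jsonb does not preserve semantically-insignificant details such as whitespace. For example, note the differences here: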

SELECT '{"bar": "baz", "balance": 7.77, "active":false}'::json;
                      json                       
-------------------------------------------------
 {"bar": "baz", "balance": 7.77, "active":false}
(1 row)

SELECT '{"bar": "baz", "balance": 7.77, "active":false}'::jsonb;
                      jsonb                       
--------------------------------------------------
 {"bar": "baz", "active": false, "balance": 7.77}
(1 row)

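One semantically insignificant detail worth noting is that in jsonb, numbers are printed according to the behavior of the underlying numeric type. In practice this means that numbers entered with E notation are printed without it, for example: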

SELECT '{"reading": 1.230e-5}'::json, '{"reading": 1.230e-5}'::jsonb;
         json          |          jsonb          
-----------------------+-------------------------
 {"reading": 1.230e-5} | {"reading": 0.00001230}
(1 row)

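However, jsonb preserves trailing fractional zeroes, as seen in this example, even though those are semantically insignificant for purposes such as equality checks.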
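A short sketch of that last point: the trailing zero survives in the output, yet it should not affect an equality comparison, because jsonb compares numbers as numeric values.

```sql
SELECT '1.230'::jsonb                 AS kept_as_entered,  -- printed with the trailing zero
       '1.230'::jsonb = '1.23'::jsonb AS equal_values;     -- compared as numeric values
```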

有關可用於建構和處理 JSON 內容的內建函數和運算子的列表，請參閱第 9.15 節。

8.14.2. 設計 JSON 文件結構

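Representing data as JSON can be considerably more flexible than the traditional relational data model, and that flexibility is compelling in environments where requirements are fluid. It is quite possible for both approaches to co-exist and complement each other within the same application. However, even for applications where maximal flexibility is desired, it is still recommended that JSON documents have a somewhat fixed structure. The structure is typically unenforced (though enforcing some business rules declaratively is possible), but having a predictable structure makes it easier to write queries that usefully summarize a set of "documents" (datums) in a table.

JSON data is subject to the same concurrency-control considerations as any other data type when stored in a table. Although storing large documents is practicable, keep in mind that any update acquires a row-level lock on the whole row. Consider limiting JSON documents to a manageable size in order to decrease lock contention among updating transactions. Ideally, JSON documents should each represent an atomic datum that business rules dictate cannot reasonably be further subdivided into smaller datums that could be modified independently.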
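As an illustration of enforcing some structure declaratively, a CHECK constraint can require every stored document to be an object carrying a given key. The table, column, and constraint names below are hypothetical and only serve the sketch.

```sql
-- Hypothetical table: every document must be a JSON object with a top-level "guid" key.
CREATE TABLE customer_docs (
    id   bigserial PRIMARY KEY,
    body jsonb NOT NULL,
    CONSTRAINT body_is_object CHECK (jsonb_typeof(body) = 'object'),
    CONSTRAINT body_has_guid  CHECK (body ? 'guid')
);

INSERT INTO customer_docs (body) VALUES ('{"guid": "abc", "name": "Angela"}');  -- accepted
INSERT INTO customer_docs (body) VALUES ('["not", "an", "object"]');            -- violates body_is_object
```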

8.14.3. `jsonb` Containment and Existence

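Testing containment is an important capability of jsonb. There is no equivalent set of facilities for the json type. Containment tests whether one jsonb document has another one contained within it. These examples return true except as noted: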

-- Simple scalar/primitive values contain only the identical value:
SELECT '"foo"'::jsonb @> '"foo"'::jsonb;

-- The array on the right side is contained within the one on the left:
SELECT '[1, 2, 3]'::jsonb @> '[1, 3]'::jsonb;

-- Order of array elements is not significant, so this is also true:
SELECT '[1, 2, 3]'::jsonb @> '[3, 1]'::jsonb;

-- Duplicate array elements don't matter either:
SELECT '[1, 2, 3]'::jsonb @> '[1, 2, 2]'::jsonb;

-- The object with a single pair on the right side is contained
-- within the object on the left side:
SELECT '{"product": "PostgreSQL", "version": 9.4, "jsonb": true}'::jsonb @> '{"version": 9.4}'::jsonb;

-- The array on the right side is not considered contained within the
-- array on the left, even though a similar array is nested within it:
SELECT '[1, 2, [1, 3]]'::jsonb @> '[1, 3]'::jsonb;  -- yields false

-- But with a layer of nesting, it is contained:
SELECT '[1, 2, [1, 3]]'::jsonb @> '[[1, 3]]'::jsonb;

-- Similarly, containment is not reported here:
SELECT '{"foo": {"bar": "baz"}}'::jsonb @> '{"bar": "baz"}'::jsonb;  -- yields false

-- A top-level key and an empty object is contained:
SELECT '{"foo": {"bar": "baz"}}'::jsonb @> '{"foo": {}}'::jsonb;

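The general principle is that the contained object must match the containing object as to structure and data contents, possibly after discarding some non-matching array elements or object key/value pairs from the containing object. But remember that the order of array elements is not significant when doing a containment match, and duplicate array elements are effectively considered only once.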

作為結構必須吻合的一般原則的特殊例外，陣列可以包含單一基本值：

-- This array contains the primitive string value:
SELECT '["foo", "bar"]'::jsonb @> '"bar"'::jsonb;

-- This exception is not reciprocal -- non-containment is reported here:
SELECT '"bar"'::jsonb @> '["bar"]'::jsonb;  -- yields false

jsonb 還具有一個 existence 運算子，它是包含性的變體：它測試字串（作為 text 值）是否作為物件鍵或陣列元素出現在 jsonb 值的頂層。這些範例回傳 true，除非另有說明：

-- String exists as array element:
SELECT '["foo", "bar", "baz"]'::jsonb ? 'bar';

-- String exists as object key:
SELECT '{"foo": "bar"}'::jsonb ? 'foo';

-- Object values are not considered:
SELECT '{"foo": "bar"}'::jsonb ? 'bar';  -- yields false

-- As with containment, existence must match at the top level:
SELECT '{"foo": {"bar": "baz"}}'::jsonb ? 'bar'; -- yields false

-- A string is considered to exist if it matches a primitive JSON string:
SELECT '"foo"'::jsonb ? 'foo';

當涉及許多鍵或元素時，JSON 物件比陣列更適合用於測試是否包含或存在，因為與陣列不同，JSON 物件在內部進行了最佳化以進行搜尋，因此不需要線性搜尋。

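Because JSON containment is nested, an appropriate query can skip explicit selection of sub-objects. As an example, suppose that we have a doc column containing objects at the top level, with most objects containing tags fields that contain arrays of sub-objects. This query finds entries in which sub-objects containing both "term":"paris" and "term":"food" appear, while ignoring any keys outside the tags array: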

SELECT doc->'site_name' FROM websites
  WHERE doc @> '{"tags":[{"term":"paris"}, {"term":"food"}]}';

One could accomplish the same thing with, say,

SELECT doc->'site_name' FROM websites
  WHERE doc->'tags' @> '[{"term":"paris"}, {"term":"food"}]';

但是這種方法靈活性較差，而且效率通常也較低。

另一方面，JSON 存在性運算子不是巢狀的：它只會在 JSON 內容的最上層查詢指定的鍵或陣列元素。

在第 9.15 節中記錄了各種包含性和存在性的運算子，以及所有其他 JSON 運算子和函數。
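Besides ?, the existence operators come in an "any of" form (?|) and an "all of" form (?&), both of which take a text array. A brief sketch of how they behave at the top level only:

```sql
-- ?| : true if ANY of the listed strings is a top-level key or array element
SELECT '{"a": 1, "b": 2}'::jsonb ?| array['b', 'z'];   -- true, "b" exists

-- ?& : true only if ALL of the listed strings exist at the top level
SELECT '{"a": 1, "b": 2}'::jsonb ?& array['a', 'b'];   -- true

-- Like ?, neither operator looks inside nested objects
SELECT '{"a": {"b": 2}}'::jsonb ?& array['a', 'b'];    -- false, "b" is not top-level
```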

8.14.4. `jsonb` Indexing

GIN 索引可用於有效搜尋大量的 jsonb 文件（datums）中出現的鍵或鍵/值配對。有兩種 GIN “operator classes”，提供了不同的效能和靈活性權衡。

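The default GIN operator class for jsonb supports queries with the top-level key-exists operators ?, ?& and ?|, and with the path/value-exists operator @>. (For details of the semantics that these operators implement, see Table 9.45.) An example of creating an index with this operator class is: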

CREATE INDEX idxgin ON api USING GIN (jdoc);

非預設 GIN 運算子類 jsonb_path_ops 僅支援對 @> 運算子進行索引。使用此運算子類建立索引的範例是：

CREATE INDEX idxginp ON api USING GIN (jdoc jsonb_path_ops);

想像一個資料表的範例，該資料表儲存了從第三方 Web 服務檢索到的 JSON 文件以及已文件化的結構定義。典型的文件是：

{
    "guid": "9c36adc1-7fb5-4d5b-83b4-90356a46061a",
    "name": "Angela Barton",
    "is_active": true,
    "company": "Magnafone",
    "address": "178 Howard Place, Gulf, Washington, 702",
    "registered": "2009-11-07T08:53:22 +08:00",
    "latitude": 19.793713,
    "longitude": 86.513373,
    "tags": [
        "enim",
        "aliquip",
        "qui"
    ]
}

我們將這些文件儲存在名為 api 的資料表中，名為 jdoc 的 jsonb 欄位中。如果在此欄位上建立了 GIN 索引，則如下查詢可以使用到該索引：

-- Find documents in which the key "company" has value "Magnafone"
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"company": "Magnafone"}';

但是，索引不能用於以下查詢，儘管運算子 ? 是可索引的，但它不會直接套用於索引欄位 jdoc：

-- Find documents in which the key "tags" contains key or array element "qui"
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc -> 'tags' ? 'qui';

儘管如此，透過適當使用表示式索引，上述查詢仍可以使用索引。如果在“tags”鍵中查詢特定項目很常見，則定義這樣的索引可能是值得的：

CREATE INDEX idxgintags ON api USING GIN ((jdoc -> 'tags'));

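Now, the WHERE clause jdoc -> 'tags' ? 'qui' will be recognized as an application of the indexable operator ? to the indexed expression jdoc -> 'tags'. (More information on expression indexes can be found in Section 11.7.)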

GIN indexes also support the @@ and @? operators, which perform jsonpath matching.

SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @@ '$.tags[*] == "qui"';

SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @? '$.tags[*] ? (@ == "qui")';

The GIN index extracts statements of the following form out of a jsonpath: accessors_chain = const. The accessors chain may consist of .key, [*], and [index] accessors. jsonb_ops additionally supports the .* and .** accessors.

查詢的另一種方法是利用 containment，例如：

-- Find documents in which the key "tags" contains array element "qui"
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"tags": ["qui"]}';

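A simple GIN index on the jdoc column can support this query. But note that such an index will store copies of every key and value in the jdoc column, whereas the expression index of the previous example stores only data found under the tags key. While the simple-index approach is far more flexible (since it supports queries about any key), targeted expression indexes are likely to be smaller and faster to search than a simple index.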

儘管 jsonb_path_ops 運算子類僅支援使用 @>，@@ 和 @? 運算子的查詢，它比預設的運算子類 jsonb_ops 具有明顯的效能優勢。對於相同資料集，jsonb_path_ops 索引通常也比 jsonb_ops 索引小得多，針對搜尋的專用性更好，尤其是當查詢包含頻繁出現在資料中的鍵時。因此，搜尋性質的操作通常比預設運算子類具有更好的效能。

The technical difference between a jsonb_ops and a jsonb_path_ops GIN index is that the former creates independent index items for each key and value in the data, while the latter creates index items only for each value in the data. [6] Basically, each jsonb_path_ops index item is a hash of the value and the key(s) leading to it; for example to index {"foo": {"bar": "baz"}}, a single index item would be created incorporating all three of foo, bar, and baz into the hash value. Thus a containment query looking for this structure would result in an extremely specific index search; but there is no way at all to find out whether foo appears as a key. On the other hand, a jsonb_ops index would create three index items representing foo, bar, and baz separately; then to do the containment query, it would look for rows containing all three of these items. While GIN indexes can perform such an AND search fairly efficiently, it will still be less specific and slower than the equivalent jsonb_path_ops search, especially if there are a very large number of rows containing any single one of the three index items.

A disadvantage of the jsonb_path_ops approach is that it produces no index entries for JSON structures not containing any values, such as {"a": {}}. If a search for documents containing such a structure is requested, it will require a full-index scan, which is quite slow. jsonb_path_ops is therefore ill-suited for applications that often perform such searches.

jsonb also supports btree and hash indexes. These are usually useful only if it's important to check equality of complete JSON documents. The btree ordering for jsonb datums is seldom of great interest, but for completeness it is:

Object > Array > Boolean > Number > String > Null

Object with n pairs > object with n - 1 pairs

Array with n elements > array with n - 1 elements

Objects with equal numbers of pairs are compared in the order:

key-1, value-1, key-2 ...

Note that object keys are compared in their storage order; in particular, since shorter keys are stored before longer keys, this can lead to results that might be unintuitive, such as:

{ "aa": 1, "c": 1} > {"b": 1, "d": 1}

Similarly, arrays with equal numbers of elements are compared in the order:

element-1, element-2 ...

Primitive JSON values are compared using the same comparison rules as for the underlying PostgreSQL data type. Strings are compared using the default database collation.

8.14.5. 對應轉換

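Additional extensions are available that implement transforms for the jsonb type for different procedural languages.

The extensions for PL/Perl are called jsonb_plperl and jsonb_plperlu. If you use them, jsonb values are mapped to Perl arrays, hashes, and scalars, as appropriate.

The extensions for PL/Python are called jsonb_plpythonu, jsonb_plpython2u, and jsonb_plpython3u (see Section 45.1 for the PL/Python naming convention). If you use them, jsonb values are mapped to Python dictionaries, lists, and scalars, as appropriate.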
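A minimal sketch of using such a transform, assuming PL/Python 3 is installed on the server and the session may create extensions. The function name jsonb_key_count is made up for the example.

```sql
-- CASCADE also installs plpython3u if it is not present yet.
CREATE EXTENSION jsonb_plpython3u CASCADE;

CREATE FUNCTION jsonb_key_count(val jsonb) RETURNS int
LANGUAGE plpython3u
TRANSFORM FOR TYPE jsonb
AS $$
# With the transform in place, a JSON object arrives as a Python dict
# instead of a string that would have to be parsed by hand.
return len(val)
$$;

SELECT jsonb_key_count('{"a": 1, "b": 2}');
```

Without the TRANSFORM FOR TYPE clause the argument would be passed as its text representation.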

8.14.6. jsonpath Type

jsonpath 型別實現了 PostgreSQL 中對 SQL/JSON 路徑語法的支援，以有效地查詢 JSON 資料。它提供以二元運算的形式來使用已解析的 SQL/JSON 路徑表示式，此表示式讓路徑引擎從 JSON 資料檢索的項目取出內容，以供 SQL/JSON 查詢函數進一步處理。

SQL / JSON 路徑 predicate 和運算子的語義基本遵循 SQL 標準。同時，為了提供使用 JSON 資料的更自然的方式，SQL/JSON 路徑語法使用了一些 JavaScript 約定：

點（.）用於資料成員存取。
中括號（[ ]）用於陣列存取。
與從 1 開始的一般 SQL 陣列不同，SQL/JSON 陣列是從 0 開始。

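An SQL/JSON path expression is typically written in an SQL query as an SQL character string literal, so it must be enclosed in single quotes, and any single quotes desired within the value must be doubled (see Section 4.1.2.1). Some forms of path expressions require string literals within them. These embedded string literals follow JavaScript/ECMAScript conventions: they must be surrounded by double quotes, and backslash escapes may be used within them to represent otherwise-hard-to-type characters. In particular, the way to write a double quote within an embedded string literal is \", and to write a backslash itself, you must write \\. Other special backslash sequences include those recognized in JSON strings: \b, \f, \n, \r, \t, \v for various ASCII control characters, and \uNNNN for a Unicode character identified by its 4-hex-digit code point. The backslash syntax also includes two cases not allowed by JSON: \xNN for a character code written with only two hex digits, and \u{N...} for a character code written with 1 to 6 hex digits.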
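The two quoting layers can be seen side by side in the sketch below. It assumes the default standard_conforming_strings = on, so that backslashes are not interpreted by the SQL string literal itself; both queries are expected to return true.

```sql
-- SQL layer: a single quote inside the path is doubled.
SELECT jsonb_path_exists('{"name": "O''Brien"}',
                         '$.name ? (@ == "O''Brien")');

-- jsonpath layer: a double quote inside an embedded string literal is written as \".
SELECT jsonb_path_exists('{"q": "say \"hi\""}',
                         '$.q ? (@ == "say \"hi\"")');
```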

A path expression consists of a sequence of path elements, which can be the following:

Path literals of JSON primitive types: Unicode text, numeric, true, false, or null.
Path variables listed in Table 8.24.
Accessor operators listed in Table 8.25.
jsonpath operators and methods listed in Section 9.15.2.3
Parentheses, which can be used to provide filter expressions or define the order of path evaluation.

For details on using jsonpath expressions with SQL/JSON query functions, see Section 9.15.2.

Table 8.24. `jsonpath` Variables

| Variable | Description |
| --- | --- |
| `$` | A variable representing the JSON text to be queried (the context item). |
| `$varname` | A named variable. Its value can be set by the parameter vars of several JSON processing functions. See and its notes for details. |
| `@` | A variable representing the result of path evaluation in filter expressions. |

Table 8.25. `jsonpath` Accessors

| Accessor Operator | Description |
| --- | --- |
| `.key`, `."$varname"` | Member accessor that returns an object member with the specified key. If the key name is a named variable starting with $ or does not meet the JavaScript rules of an identifier, it must be enclosed in double quotes as a character string literal. |
| `.*` | Wildcard member accessor that returns the values of all members located at the top level of the current object. |
| `.**` | Recursive wildcard member accessor that processes all levels of the JSON hierarchy of the current object and returns all the member values, regardless of their nesting level. This is a PostgreSQL extension of the SQL/JSON standard. |
| `.**{level}`, `.**{start_level to end_level}` | Same as .**, but with a filter over nesting levels of JSON hierarchy. Nesting levels are specified as integers. Zero level corresponds to the current object. To access the lowest nesting level, you can use the last keyword. This is a PostgreSQL extension of the SQL/JSON standard. |
| `[subscript, ...]` | Array element accessor. subscript can be given in two forms: index or start_index to end_index. The first form returns a single array element by its index. The second form returns an array slice by the range of indexes, including the elements that correspond to the provided start_index and end_index. The specified index can be an integer, as well as an expression returning a single numeric value, which is automatically cast to integer. Zero index corresponds to the first array element. You can also use the last keyword to denote the last array element, which is useful for handling arrays of unknown length. |
| `[*]` | Wildcard array element accessor that returns all array elements. |
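A few of these accessors applied to a small document, as a sketch. The document is adapted from the api example earlier in this section; jsonb_path_query and jsonb_path_query_array are described in Section 9.15.

```sql
WITH t(doc) AS (
    SELECT '{"name": "Angela", "tags": ["enim", "aliquip", "qui"]}'::jsonb
)
SELECT jsonb_path_query(doc, '$.name')               AS member,        -- .key
       jsonb_path_query(doc, '$.tags[0]')            AS first_tag,     -- zero-based index
       jsonb_path_query(doc, '$.tags[last]')         AS last_tag,      -- last keyword
       jsonb_path_query_array(doc, '$.tags[0 to 1]') AS slice,         -- inclusive range
       jsonb_path_query_array(doc, '$.*')            AS top_level      -- wildcard member accessor
FROM t;
```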

[6] For this purpose, the term “value” includes array elements, though JSON terminology sometimes considers array elements distinct from values within objects.

4.2. 參數表示式

參數表示式用在許多不同的方面，像是 SELECT 指令中的回傳列表；在 INSERT 或 UPDATE 指令中指定欄位的新值；又或是在一些命令中，指出搜尋的條件等。參數表示式的結果，有時候會被稱作 scalar，以有別於表格表示式（就是一個表格）的結果。參數表示式也可以稱作 scalar expressions（賦值表示式），甚或簡化為 expressions （表示式）。表示式的語法容許其值為各種運算的單一結果，如數學、邏輯、集合、或其他運算。

參數表示式可以是下列的其中一種形態：

常數或文字內容
欄位的引用
函數參數的引用，在函數裡或預備指令（prepared statement）中
子參數表示式
欄位選擇表示式
運算子宣告
函數呼叫
彙總表示式
窗函數呼叫
型別轉換
校對轉換（collation expression）
賦值子查詢（scalar subquery）
陣列建構式
列建構式
其他被括號括住的參數表示式（用於群組子表示式和強制調整運算優先權）

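In addition to this list, there are a number of constructs that can be classified as an expression but do not follow any general syntax rules. These generally have the semantics of a function or operator and are explained in the appropriate location in Chapter 9. An example is the IS NULL clause.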

我們已經在 4.1.2 節中討論過常數了，所以接下來就從常數以下的項目繼續說明。

4.2.1. 欄位引用

A column can be referenced in the form:

correlation.columnname

「correlation」（所屬名稱）是其所屬表格的名稱（也可能需要包含結構名），或是表格的別名（在 FROM 子句中所定義的）。所屬名稱和分隔用的句點是可以省略的，如果欄位名稱在目前查詢中的所有表格中是唯一的話。（參閱第 7 章）

4.2.2. 函數參數引用

函數參數的引用，用來指定一個不在該 SQL 指令中的值。參數是使用在 SQL 函數定義或預備查詢之中。有一些用戶端函式庫也支援將資料數值與 SQL 指令分離，在這種情境下，參數就會用來指向外部的資料數值。參數引用的形式如下：

$number

舉個例子，有一個函數 dept 的宣告如下：

CREATE FUNCTION dept(text) RETURNS dept
    AS $$ SELECT * FROM dept WHERE name = $1 $$
    LANGUAGE SQL;

Here the $1 references the value of the first function argument whenever the function is invoked.
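The same $number notation is used in prepared statements. The sketch below assumes the dept table from the function above exists and has a name column; the statement name and the value 'sales' are illustrative only.

```sql
-- $1 stands for the first parameter supplied at EXECUTE time.
PREPARE dept_by_name(text) AS
    SELECT * FROM dept WHERE name = $1;

EXECUTE dept_by_name('sales');

DEALLOCATE dept_by_name;
```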

4.2.3. 子參數表示式（Subscripts）

如果表示式要產生陣列的結果的話，指定陣列中某個元素，請使用：

expression[subscript]

或是要取得陣列中多個相隣的元素，請使用：

expression[lower_subscript:upper_subscript]

每一個「subscript」本身都是一個表示式，必須要產生一個整數值。

一般來說，陣列表示式必須被括號起來，但如果該表示式只是一個欄位或參數的引用的話，那麼括號可以省略。然後，多個子參數表示式可以連在一起使用，當你需要陣列表達多維度的概念時。舉例如下：

mytable.arraycolumn[4]
mytable.two_d_column[17][34]
$1[10:42]
(arrayfunction(a,b))[42]

在最後一個例子中，括號是必須的。關於陣列，在 8.15 節有更多說明。

4.2.4. 欄位選擇

如果一個表示式產生了複合性的型別（列型別），那麼要指定其中的某個欄位時，請使用：

expression.fieldname

一般來說，列的表示式必須被括號起來，但如果該表示式只是一個欄位或參數的引用的話，那麼括號可以省略。舉例如下：

mytable.mycolumn
$1.somecolumn
(rowfunction(a,b)).col3

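(Thus, a qualified column reference is actually just a special case of the field selection syntax.) An important special case is extracting a field from a table column that is of a composite type: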

(compositecol).somefield
(mytable.compositecol).somefield

在這裡，括號是必要的，以表示 compositecol 是一個欄位名稱，但不是表格名稱。而在第二個例子中，mytable 是表格名稱，而非結構名稱。

你可以取得複合資料的所有欄位值，使用「.*」：

(compositecol).*

這個記號在不同的地方有不同的用法，請參閱 8.16.5 節的說明。

4.2.5. Operator Invocations

There are three possible syntaxes for an operator invocation:

expression operator expression (binary infix operator)

operator expression (unary prefix operator)

expression operator (unary postfix operator)

運算子記號的語法規則依 4.1.3 節的說明，或是關鍵字 AND、OR、和 NOT，又或是如下形式的限定運算子名稱：

OPERATOR(schema.operatorname)

哪些特定的運算子的使用與運算方式，端看系統與使用者如何定義。在第 9 章中會說明內建的運算子詳情。

4.2.6. 函數呼叫

函數呼叫的語法是，函數的名稱（可能還會加上結構名）接著一連串用括號括起來的參數列表：

function_name ([expression [, expression ... ]] )

舉個例子，下面的函數呼叫可以計算 2 的平方根：

sqrt(2)

內建函數在第 9 章說明，其他的函數可由使用者自訂。

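The arguments can optionally have names attached. See Section 4.3 for details.

Note

A function that takes a single argument of composite type can optionally be called using field-selection syntax, and conversely field selection can be written in functional style. That is, the notations col(table) and table.col are interchangeable. This behavior is not SQL-standard but is provided in PostgreSQL because it allows use of functions to emulate "computed fields". For more information see Section 8.16.5.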
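To make the "computed field" idea concrete, here is a small sketch. The people table and the full_name function are hypothetical names chosen for the illustration.

```sql
CREATE TABLE people (first_name text, last_name text);

-- A function whose single argument is the table's row type ...
CREATE FUNCTION full_name(people) RETURNS text
    AS $$ SELECT $1.first_name || ' ' || $1.last_name $$
    LANGUAGE SQL;

-- ... can be called in functional style or as if it were a column.
SELECT full_name(p), p.full_name FROM people p;
```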

4.2.7. 彙總表示式

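An aggregate expression represents the application of an aggregate function across the rows selected by a query. An aggregate function reduces multiple inputs to a single output value, such as the sum or average of the inputs. The syntax of an aggregate expression is one of the following: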

aggregate_name (expression [ , ... ] [ order_by_clause ] ) [ FILTER ( WHERE filter_clause ) ]

aggregate_name (ALL expression [ , ... ] [ order_by_clause ] ) [ FILTER ( WHERE filter_clause ) ]

aggregate_name (DISTINCT expression [ , ... ] [ order_by_clause ] ) [ FILTER ( WHERE filter_clause ) ]

aggregate_name ( * ) [ FILTER ( WHERE filter_clause ) ]

aggregate_name ( [ expression [ , ... ] ] ) WITHIN GROUP ( order_by_clause ) [ FILTER ( WHERE filter_clause ) ]

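where aggregate_name is a previously defined aggregate (possibly qualified with a schema name) and expression is any value expression that does not itself contain an aggregate expression or a window function call. The optional order_by_clause and filter_clause are described below.

The first form of aggregate expression invokes the aggregate once for each input row. The second form is the same as the first, since ALL is the default. The third form invokes the aggregate once for each distinct value of the expression (or distinct set of values, for multiple expressions) found in the input rows. The fourth form invokes the aggregate once for each input row; since no particular input value is specified, it is generally only useful for the count(*) aggregate function. The last form is used with ordered-set aggregate functions, which are described below.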

大多數的彙總函數會忽略空值，所以如果表示式計算的結果是空值的話，就會忽略不計。這樣的假設除非有特別設定，對所有內建的函數都是如此。

舉例來說，count(*) 計算輸入列的個數，而 count(f1) 是計算輸入列中 f1 欄位非空值的個數，因為 count 會忽略空值；然而，count(distinct f1) 則是計算 f1 欄位不重覆又非空值的個數。

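Ordinarily, the input rows are fed to the aggregate function in an unspecified order. In many cases this does not matter; for example, min produces the same result no matter what order it receives the inputs in. However, some aggregate functions (such as array_agg and string_agg) produce results that depend on the ordering of the input rows. When using such an aggregate, the optional order_by_clause can be used to specify the desired ordering. The order_by_clause has the same syntax as for a query-level ORDER BY clause, as described in Section 7.5, except that its expressions are always just expressions and cannot be output-column names or numbers. For example: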

SELECT array_agg(a ORDER BY b DESC) FROM table;

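When dealing with multiple-argument aggregate functions, note that the ORDER BY clause goes after all the aggregate arguments. For example, write this: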

SELECT string_agg(a, ',' ORDER BY a) FROM table;

但不能這樣寫：

SELECT string_agg(a ORDER BY a, ',') FROM table;  -- incorrect

這在語法上沒有不合法，但這表示一個單參數的彙總函數，使用了兩個排序的關鍵值（第二個完全沒用，因為它是常數）。

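If DISTINCT is specified in addition to an order_by_clause, then all the ORDER BY expressions must match regular arguments of the aggregate; that is, you cannot sort on an expression that is not included in the DISTINCT list.

Note

The ability to specify both DISTINCT and ORDER BY in an aggregate function is a PostgreSQL extension.

Placing ORDER BY within the aggregate's regular argument list, as described so far, is used when ordering the input rows for general-purpose and statistical aggregates, for which ordering is optional. There is a subclass of aggregate functions called ordered-set aggregates for which an order_by_clause is required, usually because the aggregate's computation is only sensible in terms of a specific ordering of its input rows. Typical examples of ordered-set aggregates include rank and percentile calculations. For an ordered-set aggregate, the order_by_clause is written inside WITHIN GROUP (...), as shown in the final syntax alternative above. The expressions in the order_by_clause are evaluated once per input row just like regular aggregate arguments, sorted as per the order_by_clause's requirements, and fed to the aggregate function as input arguments. (This is unlike the case for a non-WITHIN GROUP order_by_clause, which is not treated as argument(s) to the aggregate function.) The argument expressions preceding WITHIN GROUP, if any, are called direct arguments to distinguish them from the aggregated arguments listed in the order_by_clause. Unlike regular aggregate arguments, direct arguments are evaluated only once per aggregate call, not once per input row. This means that they can contain variables only if those variables are grouped by GROUP BY; this restriction is the same as if the direct arguments were not inside an aggregate expression at all. Direct arguments are typically used for things like percentile fractions, which only make sense as a single value per aggregation calculation. The direct argument list can be empty; in this case, write just () not (*). (PostgreSQL will actually accept either spelling, but only the first way conforms to the SQL standard.)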

有次序彙總查詢如下：

SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY income) FROM households;

 percentile_cont
-----------------
           50489

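which obtains the 50th percentile, or median, value of the income column from table households. Here, 0.5 is a direct argument; it would make no sense for the percentile fraction to be a value varying across rows.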
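Two further sketches against the same households table: an ordered-set aggregate with an empty direct argument list, and one whose direct argument is an array of fractions. Both mode() and the array form of percentile_cont are built-in aggregates; the column alias names are arbitrary.

```sql
-- mode() takes no direct arguments, so the list is written as ().
SELECT mode() WITHIN GROUP (ORDER BY income) AS most_common_income
    FROM households;

-- The direct argument is evaluated once per aggregate call, here yielding three quartiles.
SELECT percentile_cont(ARRAY[0.25, 0.5, 0.75]) WITHIN GROUP (ORDER BY income) AS quartiles
    FROM households;
```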

如果使用了 FILTER，那就只有符合 FILTER 子句條件的資料會被彙總處理，其他的資料都會被忽略掉。舉例來說：

SELECT
    count(*) AS unfiltered,
    count(*) FILTER (WHERE i < 5) AS filtered
FROM generate_series(1,10) AS s(i);

 unfiltered | filtered
------------+----------
         10 |        4
(1 row)

預先內建的彙總函數將在 9.20 節中介紹，其他彙總函數可以由使用者自行設計。

彙總表示式只可以用於結果列表或 SELECT 中的 HAVING 子句。在其他子句中是被禁止的，像是 WHERE，因為這些子句邏輯上都是在彙總處理前就得處理資料。

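When an aggregate expression appears in a subquery (see Section 4.2.11 and Section 9.22), the aggregate is normally evaluated over the rows of the subquery. But an exception occurs if the aggregate's arguments (and filter_clause if any) contain only outer-level variables: the aggregate then belongs to the nearest such outer level, and is evaluated over the rows of that query. The aggregate expression as a whole is then an outer reference for the subquery it appears in, and acts as a constant over any one evaluation of that subquery. The restriction about appearing only in the result list or HAVING clause applies with respect to the query level that the aggregate belongs to.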

4.2.8. 窗函數呼叫

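A window function call represents the application of an aggregate-like function over some portion of the rows selected by a query. Unlike non-window aggregate calls, this is not tied to grouping of the selected rows into a single output row; each row remains separate in the query output. However, the window function has access to all the rows that would be part of the current row's group according to the grouping specification (PARTITION BY list) of the window function call. The syntax of a window function call is one of the following:

function_name ([expression [, expression ... ]]) [ FILTER ( WHERE filter_clause ) ] OVER window_name
function_name ([expression [, expression ... ]]) [ FILTER ( WHERE filter_clause ) ] OVER ( window_definition )
function_name ( * ) [ FILTER ( WHERE filter_clause ) ] OVER window_name
function_name ( * ) [ FILTER ( WHERE filter_clause ) ] OVER ( window_definition )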

定義「窗」，請使用下列語法：

[ existing_window_name ]
[ PARTITION BY expression [, ...] ]
[ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
[ frame_clause ]

選擇性的 frame_clause 語法如下：

{ RANGE | ROWS } frame_start
{ RANGE | ROWS } BETWEEN frame_start AND frame_end

frame_start 及 frame_end 的語法如下：

UNBOUNDED PRECEDING
value PRECEDING 
CURRENT ROW
value FOLLOWING 
UNBOUNDED FOLLOWING

在這裡的表示式（expression），除了不能再包含窗函數之外，無其他特別限制。

window_name 是一個定義在 WINDOW 子句中的命名。另一方面，一個完整的窗也可以是被括號括起來，使用和 WINDOW 子句相同語法的定義。詳見 SELECT 語法頁面。值得探討的是，OVER wname 並不完全等同於 OVER (wname ...)；後者隱含著複製及修改窗的定義，而如果包含 frame 子句的話，就會被拒絕執行。

PARTITION BY 子句將查詢分組成為不同的分區，它們將會分別地被窗函數所處理。PARTITION BY 的行為和查詢語句中的 GROUP BY 很類似，除了它的表示式就只是表示式，而且不能產出欄位名稱或編號。沒有 PARTITION BY 的話，所有的列都會被當作一個分組進行彙總。ORDER BY 子句決定窗函數的處理次序，它也和查詢語句中的 ORDER BY 很類似，但它不能使用輸出的欄位或編號。如果沒有 ORDER BY 的話，就無法保證彙總處理的次序了。

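The frame_clause specifies the set of rows constituting the window frame, which is a subset of the current partition, for those window functions that act on the frame instead of the whole partition. The frame can be specified in either RANGE or ROWS mode. In either case, it runs from frame_start to frame_end; if frame_end is omitted, it defaults to CURRENT ROW.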

UNBOUNDED PRECEDING 的窗框始於該分區的第一列，同樣地，UNBOUNDED FOLLOWING 意指窗框結束於分區的最後一列。

在 RANGE 模式裡，如果 frame_start 設定為 CURRENT ROW 的話，表示窗框始於目前列同序的那一列（使用 ORDER BY 時，排序相同的那一列），同理，frame_end 設定為 CURRENT ROW 時，表示窗框止於排序相同的列。而在 ROWS 模式時，CURRENT ROW 指的就是自己。

PRECEDING 和 FOLLOWING 兩個設定值，目前只能用在 ROWS 模式。它們指的是窗框的起迄於指定的一個值，表示目前列之前後多少列。而所謂的值，必須是整數表示式而不包含任何變數、彙總函數、或窗函數。其值也不能是空值或負值，但可以為零，表示只處理目前列。

預設的窗框設定是 RANGE UNBOUNDED PRECEDING，和 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW 是一樣的。加上 ORDER BY 的話，這可以讓窗框起於和目前列並列的列；沒有 ORDER BY 的話，所有的列都會在分區裡，因為如此就無法判定次序，表示大家都一樣。

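Restrictions are that frame_start cannot be UNBOUNDED FOLLOWING, frame_end cannot be UNBOUNDED PRECEDING, and the frame_end choice cannot appear earlier in the above list than the frame_start choice; for example, RANGE BETWEEN CURRENT ROW AND value PRECEDING is not allowed.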

如果有使用到 FILTER 的話，就只有符合 FILTER 條件式的列會被窗函數處理，其餘的列都會被忽略。只有彙總式的窗函數可以使用 FILTER 子句。

內建的窗函數會在 9.57 節中說明，使用者也可以自行設計窗函數。任何內建或自訂的一般函數或統計函數，都可以當作窗函數來使用。（有序集合和假定集合的彙總數，目前不能當作窗函數來使用。）

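The syntaxes using * are used for calling parameter-less aggregate functions as window functions, for example count(*) OVER (PARTITION BY x ORDER BY y). The asterisk (*) is customarily not used for window-specific functions. Window-specific functions do not allow DISTINCT or ORDER BY to be used within the function argument list.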

窗函數呼叫只限於 SELECT 回傳列表，及 ORDER BY 子句中。

更多窗函數的說明請參閱 3.5 節、9.21 節、及 7.2.5 節。
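As a self-contained sketch of a ROWS frame, the query below sums each value together with its immediate neighbours. It uses generate_series in place of a real table; the alias names are arbitrary.

```sql
-- For each row, the frame is the previous row, the current row, and the next row.
SELECT i,
       sum(i) OVER (ORDER BY i
                    ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS neighbour_sum
FROM generate_series(1, 5) AS s(i);
-- neighbour_sum for i = 1..5 is 3, 6, 9, 12, 9:
-- the first and last rows have only one neighbour inside the partition.
```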

4.2.9. 型別轉換

型別轉換指定從一種資料型別轉換為另一種資料型別。PostgreSQL 接受兩種用於型別轉換的等效語法：

CAST ( expression AS type )
expression::type

CAST 語法符合 SQL 標準；帶「::」的語法是 PostgreSQL 既有的用法。

當強制轉換應用於已知型別的值表示式時，它表示執行時型別轉換。只有定義了合適的型別轉換操作，操作才能成功。請注意，這與使用帶常數的強制轉換略有不同，如 4.1.2.7 節所示。應用於未經修飾的字串文字的強制轉換表示將型別初始分配給文字常數，因此對於任何型別（如果字串文字的內容都是資料型別的可接受輸入語法）都會成功。

如果對於值表示式必須產生的型別沒有歧義（例如，當它被分配給資料表欄位），通常可以省略顯式的型別轉換；系統將在這種情況下自動套用型別轉換。但是，只有在系統目錄中標記為「可以隱式套用」的強制轉換才會執行自動強制轉換。其他強制轉換必須使用顯式強制轉換語法來使用。此限制旨在防止系統默默地套用令人意外的轉換。

也可以使用函數式語法來指定型別轉換：

typename ( expression )

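However, this only works for types whose names are also valid as function names. For example, double precision cannot be used this way, but the equivalent float8 can. Also, the names interval, time, and timestamp can only be used in this fashion if they are double-quoted, because of syntactic conflicts. Therefore, the use of the function-like cast syntax leads to inconsistencies and should probably be avoided.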

注意

函數式語法實際上只是一個函數呼叫。當兩個標準轉換語法之一用於執行轉換時，它將在內部呼叫已註冊的函數來執行轉換。按照慣例，這些轉換函數與它們的輸出類型具有相同的名稱，因此「函數式語法」只不過是直接呼叫底層的轉換函數。顯然，這不是一個可移植式應用程序應該依賴的東西。有關更多詳情，請參閱 CREATE CAST。
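The three spellings side by side, as a brief sketch; all of them are expected to produce a value of the named type.

```sql
SELECT CAST('42' AS integer) AS standard_cast,    -- SQL-standard syntax
       '42'::integer         AS postgres_cast,    -- historical PostgreSQL syntax
       float8('3.5')         AS function_style;   -- works because float8 is a valid function name
```

Writing double precision('3.5') instead of float8('3.5') would be a syntax error, which is the inconsistency described above.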

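4.2.10. Collation Expressions

The COLLATE clause overrides the collation of an expression. It is appended to the expression it applies to:

expr COLLATE collation

where collation is a possibly schema-qualified identifier. The COLLATE clause binds tighter than operators; parentheses can be used when necessary.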

如果沒有明確指定排序規則，那麼資料庫系統會從表示式中涉及的欄位中衍生一個排序規則，或者如果表示式中未包含任何欄位，則預設為資料庫的預設排序規則。

COLLATE 子句的兩個常見用法是重寫 ORDER BY 子句中的排序順序，例如：

SELECT a, b, c FROM tbl WHERE ... ORDER BY a COLLATE "C";

並覆蓋具有語言環境特性結果的函數或運算子呼叫的排序規則，例如：

SELECT * FROM tbl WHERE a > 'foo' COLLATE "C";

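Note that in the latter case the COLLATE clause is attached to an input argument of the operator we wish to affect. It doesn't matter which argument of the operator or function call the COLLATE clause is attached to, because the collation that is applied by the operator or function is derived by considering all arguments, and an explicit COLLATE clause will override the collations of all other arguments. (Attaching non-matching COLLATE clauses to more than one argument, however, is an error. For more details see Section 23.2.) Thus, this gives the same result as the previous example: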

SELECT * FROM tbl WHERE a COLLATE "C" > 'foo';

但是這會有錯：

SELECT * FROM tbl WHERE (a > 'foo') COLLATE "C";

因為它試圖將排序規則應用於「>」運算子的結果，該運算符是不可排序的布林資料型別。

4.2.11. Scalar 子查詢

Scalar 子查詢指的是括號中的普通 SELECT 查詢，但它只回傳一個資料列的一個欄位。（有關撰寫查詢的訊息，請參閱第 7 章。）執行 SELECT 查詢並在周圍的值表示式中使用單個回傳的值。使用回傳多於一個資料列或多於一個欄位的查詢作為 scalar 子查詢是錯誤的。（但是，如果在特定執行過程中子查詢不回傳任何資料列，則不會出現錯誤；Scalar 結果將視為空）。子查詢可以引用周圍查詢中的變數，該變數在任何一次運算期間都將用作常數的子查詢。有關子查詢的其他表示式，另請參閱第 9.22 節。

例如，以下是每個州中最大的城市人口數量：

SELECT name, (SELECT max(pop) FROM cities WHERE cities.state = states.name)
    FROM states;

4.2.12. 陣列建構函數

陣列建構函數是一種使用其成員元素的值建構陣列的表示式。一個簡單的陣列建構函數由關鍵字 ARRAY，左方括號 [，陣列元素值的表示式列表（用逗號分隔），最後一個右方括號 ] 組成。例如：

SELECT ARRAY[1,2,3+4];
  array
---------
 {1,2,7}
(1 row)

預設情況下，陣列元素型別是成員表示式的通用型別，使用與 UNION 或 CASE 結構相同的規則來決定（參閱 10.5 節）。您也可以透過明確將陣列建構函數轉換為所需的型別來覆蓋它，例如：

SELECT ARRAY[1,2,22.7]::integer[];
  array
----------
 {1,2,23}
(1 row)

這與分別將每個表示式轉換為陣列元素型別的效果相同。有關型別轉換的更多訊息，請參閱第 4.2.9 節。

可以透過巢狀的陣列建構函數來建構多維陣列。在內部的建構函數中，關鍵字 ARRAY 可以省略。例如，這些語法會產生相同的結果：

SELECT ARRAY[ARRAY[1,2], ARRAY[3,4]];
     array
---------------
 {{1,2},{3,4}}
(1 row)

SELECT ARRAY[[1,2],[3,4]];
     array
---------------
 {{1,2},{3,4}}
(1 row)

由於多維陣列必須是矩形，因此同一級別的內部建構函數必須産生具有相同維數的子陣列。套用於外部 ARRAY 建構函數的任何強制型別都會自動轉送給所有內部建構函數。

多維陣列建構函數的元素可以是任何產生適當型別陣列的東西，不僅只是一個子 ARRAY 結構。例如：

CREATE TABLE arr(f1 int[], f2 int[]);

INSERT INTO arr VALUES (ARRAY[[1,2],[3,4]], ARRAY[[5,6],[7,8]]);

SELECT ARRAY[f1, f2, '{{9,10},{11,12}}'::int[]] FROM arr;
                     array
------------------------------------------------
 {{{1,2},{3,4}},{{5,6},{7,8}},{{9,10},{11,12}}}
(1 row)

你可以建構一個空陣列，但由於不可能有一個沒有型別的陣列，所以你必須明確地將你的空陣列轉換為所需的型別。例如：

SELECT ARRAY[]::integer[];
 array
-------
 {}
(1 row)

也可以從子查詢的結果中建構一個陣列。在這種形式下，陣列建構函數使用關鍵字 ARRAY 和小括號（不是中括號）的子查詢寫入。例如：

SELECT ARRAY(SELECT oid FROM pg_proc WHERE proname LIKE 'bytea%');
                                 array
-----------------------------------------------------------------------
 {2011,1954,1948,1952,1951,1244,1950,2005,1949,1953,2006,31,2412,2413}
(1 row)

SELECT ARRAY(SELECT ARRAY[i, i*2] FROM generate_series(1,5) AS a(i));
              array
----------------------------------
 {{1,2},{2,4},{3,6},{4,8},{5,10}}
(1 row)

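The subquery must return a single column. If the subquery's output column is of a non-array type, the resulting one-dimensional array will have an element for each row in the subquery result, with an element type matching that of the subquery's output column. If the subquery's output column is of an array type, the result will be an array of the same type but one higher dimension; in this case all the subquery rows must yield arrays of identical dimensionality, else the result would not be rectangular.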

用 ARRAY 建構的陣列索引值的下標始終以 1 開頭。有關陣列的更多訊息，請參閱第 8.15 節。
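A quick sketch confirming the lower bound of an array built with ARRAY:

```sql
SELECT (ARRAY['a', 'b', 'c'])[1]            AS first_element,  -- subscript 1 is the first element
       array_lower(ARRAY['a', 'b', 'c'], 1) AS lower_bound;    -- lower bound of dimension 1
```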

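4.2.13. Row Constructors

A row constructor is an expression that builds a row value (also called a composite value) using values for its member fields. A row constructor consists of the key word ROW, a left parenthesis, zero or more expressions (separated by commas) for the row field values, and finally a right parenthesis. For example:

SELECT ROW(1,2.5,'this is a test');

The key word ROW is optional when there is more than one expression in the list.

A row constructor can include the syntax rowvalue.*, which will be expanded to a list of the elements of the row value, just as occurs when the .* syntax is used at the top level of a SELECT list (see Section 8.16.5). For example, if table t has columns f1 and f2, these are the same: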

SELECT ROW(t.*, 42) FROM t;
SELECT ROW(t.f1, t.f2, 42) FROM t;

注意

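Before PostgreSQL 8.2, the .* syntax was not expanded in row constructors, so that writing ROW(t.*, 42) created a two-field row whose first field was another row value. The new behavior is usually more useful. If you need the old behavior of nested row values, write the inner row value without .*, for instance ROW(t, 42).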

預設情況下，由 ROW 表示式建立的值是匿名記錄型別。如有必要，可將其轉換為指定的複合型別 - 資料表的資料列型別或使用 CREATE TYPE AS 建立的複合型別。可能需要明確表示以避免歧義。例如：

CREATE TABLE mytable(f1 int, f2 float, f3 text);

CREATE FUNCTION getf1(mytable) RETURNS int AS 'SELECT $1.f1' LANGUAGE SQL;

-- No cast needed since only one getf1() exists
SELECT getf1(ROW(1,2.5,'this is a test'));
 getf1
-------
     1
(1 row)

CREATE TYPE myrowtype AS (f1 int, f2 text, f3 numeric);

CREATE FUNCTION getf1(myrowtype) RETURNS int AS 'SELECT $1.f1' LANGUAGE SQL;

-- Now we need a cast to indicate which function to call:
SELECT getf1(ROW(1,2.5,'this is a test'));
ERROR:  function getf1(record) is not unique

SELECT getf1(ROW(1,2.5,'this is a test')::mytable);
 getf1
-------
     1
(1 row)

SELECT getf1(CAST(ROW(11,'this is a test',2.5) AS myrowtype));
 getf1
-------
    11
(1 row)

資料列建構函數可用於建構要儲存在複合型別資料表欄位中的複合內容，或者要傳遞給接受複合參數的函數。此外，可以比較兩個資料列值或用 IS NULL 或 IS NOT NULL 來測試資料列，例如：

SELECT ROW(1,2.5,'this is a test') = ROW(1, 3, 'not the same');

SELECT ROW(table.*) IS NULL FROM table;  -- detect all-null rows

更多細節請參閱第 9.23 節。資料列建構函數也可以與子查詢結合使用，如第 9.22 節所述。

4.2.14. 表示式運算規則

並沒有定義子表示式的運算順序。特別是，運算子或函數的輸入不一定是從左到右或以任何其他固定順序進行運算。

進一步來說，如果一個表示式的結果可以透過只運算它的某些部分來得到，那麼其他子表示式可能根本就不會被運算。例如，如果有人寫了：

SELECT true OR somefunc();

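then somefunc() would (probably) not be called at all. The same would be the case if one wrote:

SELECT somefunc() OR true;

Note that this is not the same as the left-to-right "short-circuiting" of Boolean operators that is found in some programming languages.

As a consequence, it is unwise to use functions with side effects as part of complex expressions. It is particularly dangerous to rely on side effects or evaluation order in WHERE and HAVING clauses, since those clauses are extensively reprocessed as part of developing an execution plan. Boolean expressions (AND/OR/NOT combinations) in those clauses can be reorganized in any manner allowed by the laws of Boolean algebra.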

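When it is essential to force evaluation order, a CASE construct (see Section 9.17) can be used. For example, this is an untrustworthy way of trying to avoid division by zero in a WHERE clause:

SELECT ... WHERE x > 0 AND y/x > 1.5;

But this is safe:

SELECT ... WHERE CASE WHEN x > 0 THEN y/x > 1.5 ELSE false END;

A CASE construct used in this fashion will defeat optimization attempts, so it should only be done when necessary. (In this particular example, it would be better to sidestep the problem by writing y > 1.5*x instead.)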

然而，CASE 對於這些問題並不是萬能的。上述技術的一個局限是它不能阻止對常數子表示式的預先評估。如第 37.6 節所述，標記為 IMMUTABLE 的函數和運算子可以在查詢計劃時進行運算，而不是在執行時進行運算。因此，例如：

SELECT CASE WHEN x > 0 THEN x ELSE 1/0 END FROM tab;

由於查詢規劃試圖簡化常數子表示式，因此即使資料表中的每一個資料列都具有 x> 0，以至於在執行時永遠不會走到 ELSE，也可能導致除以零的例外情況。

雖然這個特殊的例子看起來很愚蠢，但是在函數中執行的查詢中可能會出現不明顯涉及常數的情況，因為函數參數和局部變數的值可以作為常數插入到查詢中以用於查詢規劃。例如，在 PL/pgSQL 函數中，使用 IF-THEN-ELSE 語句來保護有風險的運算要比將它嵌套在 CASE 表示式中要安全得多。

同一種類型的另一個限制是，CASE 無法阻止運算其中包含的彙總表示式，因為需要在 SELECT 資料列表或 HAVING 子句中的其他表示式之前計算彙總表示式。例如，下面的查詢可能會導致一個除以零例外情況，儘管似乎已經受到保護：

SELECT CASE WHEN min(employees) > 0
            THEN avg(expenses / employees)
       END
    FROM departments;

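The min() and avg() aggregates are computed concurrently over all the input rows, so if any row has employees equal to zero, the division-by-zero error will occur before there is any opportunity to test the result of min(). Instead, use a WHERE or FILTER clause to prevent problematic input rows from reaching an aggregate function in the first place.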
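A sketch of the two safer spellings, using the same departments table. Note that these are not exact equivalents of the CASE query: they average over the rows with a positive employee count, rather than returning NULL whenever any row fails the test.

```sql
-- FILTER removes the problematic rows before this aggregate sees them.
SELECT avg(expenses / employees) FILTER (WHERE employees > 0)
    FROM departments;

-- WHERE removes them for the whole query.
SELECT avg(expenses / employees)
    FROM departments
    WHERE employees > 0;
```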

5.11. 分割資料表

PostgreSQL 支援基本的分割資料表。本節描述了為什麼以及如何在資料庫設計中實現分割資料表。

5.11.1. 概念

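Partitioning refers to splitting what is logically one large table into smaller physical pieces. Partitioning can provide several benefits:

Query performance can be improved dramatically in certain situations, particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions. The partitioning substitutes for leading columns of indexes, reducing index size and making it more likely that the heavily used parts of the indexes fit in memory.
When queries or updates access a large percentage of a single partition, performance can be improved by taking advantage of a sequential scan of that partition instead of using an index and random-access reads scattered across the whole table.
Bulk loads and deletes can be accomplished by adding or removing partitions, if that requirement is planned into the partitioning design. Doing ALTER TABLE DETACH PARTITION or dropping an individual partition using DROP TABLE is far faster than a bulk operation. These commands also entirely avoid the VACUUM overhead caused by a bulk DELETE.
Seldom-used data can be migrated to cheaper and slower storage media.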

通常只有在資料表很大的情況下，這些好處才是值得的。資料表可以從分割區中受益的確切評估點取決於應用程式，儘管經驗法則是資料表的大小超過資料庫伺服器的記憶體大小的時候。

PostgreSQL 內建支援以下形式的分割方式：

Range Partitioning

此資料庫表的分割區以一個欄位為鍵或一組欄位定義的「range」來分配，分配給不同分割區的範圍之間沒有重疊。例如，可以按日期範圍或特定業務對象的標識範圍進行分割。

List Partitioning

透過明確列出哪些鍵值出現應該在哪個分割區中來對資料表進行分割。

Hash Partitioning

透過為每個分割區指定除數和餘數來對資料表進行分割。每個分割區將保留其分割鍵的雜湊值除以指定的除數所產生指定的餘數的資料列。

如果您的應用程式需要使用上面未列出的其他分割區形式，則可以使用替代方法，例如繼承和 UNION ALL 檢視表。此類方法提供了靈活性，但沒有內建宣告分割區的效能優勢。
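For comparison with the range example later in this section, here is a sketch of the list and hash forms. All table names, column names, and key values are made up for the illustration.

```sql
-- List partitioning: each partition names the key values it holds.
CREATE TABLE customers (id bigint NOT NULL, region text NOT NULL)
    PARTITION BY LIST (region);
CREATE TABLE customers_apac PARTITION OF customers FOR VALUES IN ('tw', 'jp');
CREATE TABLE customers_emea PARTITION OF customers FOR VALUES IN ('de', 'fr');

-- Hash partitioning: each partition holds the rows whose hashed key,
-- divided by the modulus, leaves the given remainder.
CREATE TABLE events (id bigint NOT NULL, payload text)
    PARTITION BY HASH (id);
CREATE TABLE events_p0 PARTITION OF events FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE events_p1 PARTITION OF events FOR VALUES WITH (MODULUS 2, REMAINDER 1);
```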

5.11.2. Declarative Partitioning

PostgreSQL offers a way to specify how to divide a table into pieces called partitions. The table that is divided is referred to as a partitioned table. The specification consists of the partitioning method and a list of columns or expressions to be used as the partition key.

All rows inserted into a partitioned table will be routed to one of the partitions based on the value of the partition key. Each partition has a subset of the data defined by its partition bounds. The currently supported partitioning methods are range, list, and hash.

Partitions may themselves be defined as partitioned tables, using what is called sub-partitioning. Partitions may have their own indexes, constraints and default values, distinct from those of other partitions. See for more details on creating partitioned tables and partitions.

It is not possible to turn a regular table into a partitioned table or vice versa. However, it is possible to add a regular or partitioned table containing data as a partition of a partitioned table, or remove a partition from a partitioned table turning it into a standalone table; see to learn more about the ATTACH PARTITION and DETACH PARTITION sub-commands.

Individual partitions are linked to the partitioned table with inheritance behind-the-scenes; however, it is not possible to use some of the generic features of inheritance (discussed below) with declaratively partitioned tables or their partitions. For example, a partition cannot have any parents other than the partitioned table it is a partition of, nor can a regular table inherit from a partitioned table making the latter its parent. That means partitioned tables and their partitions do not participate in inheritance with regular tables. Since a partition hierarchy consisting of the partitioned table and its partitions is still an inheritance hierarchy, all the normal rules of inheritance apply as described in with some exceptions, most notably:

Both CHECK and NOT NULL constraints of a partitioned table are always inherited by all its partitions. CHECK constraints that are marked NO INHERIT are not allowed to be created on partitioned tables.
Using ONLY to add or drop a constraint on only the partitioned table is supported as long as there are no partitions. Once partitions exist, using ONLY will result in an error as adding or dropping constraints on only the partitioned table, when partitions exist, is not supported. Instead, constraints on the partitions themselves can be added and (if they are not present in the parent table) dropped.
As a partitioned table does not have any data directly, attempts to use TRUNCATE ONLY on a partitioned table will always return an error.
Partitions cannot have columns that are not present in the parent. It is not possible to specify columns when creating partitions with CREATE TABLE, nor is it possible to add columns to partitions after-the-fact using ALTER TABLE. Tables may be added as a partition with ALTER TABLE ... ATTACH PARTITION only if their columns exactly match the parent.
You cannot drop the NOT NULL constraint on a partition's column if the constraint is present in the parent table.

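Partitions can also be foreign tables, although they have some limitations that normal tables do not; see CREATE FOREIGN TABLE for more information.

Updating the partition key of a row might cause it to be moved into a different partition, namely the one whose partition bounds the row now satisfies.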

5.11.2.1. Example

Suppose we are constructing a database for a large ice cream company. The company measures peak temperatures every day as well as ice cream sales in each region. Conceptually, we want a table like:
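A minimal sketch of such a table. Only the table name measurement and the peaktemp column appear in the surrounding text; the other column names (city_id, logdate, unitsales) and all types are assumptions made for illustration.

```sql
-- Scratch setup (assumption): run in a throwaway database.
DROP TABLE IF EXISTS measurement CASCADE;

-- Conceptual, unpartitioned form of the table.
CREATE TABLE measurement (
    city_id   int  NOT NULL,  -- region identifier (assumed name)
    logdate   date NOT NULL,  -- day of the measurement (assumed name)
    peaktemp  int,            -- peak temperature for that day
    unitsales int             -- ice cream units sold (assumed name)
);
```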

We know that most queries will access just the last week's, month's or quarter's data, since the main use of this table will be to prepare online reports for management. To reduce the amount of old data that needs to be stored, we decide to keep only the most recent 3 years' worth of data. At the beginning of each month we will remove the oldest month's data. In this situation we can use partitioning to help us meet all of our different requirements for the measurement table.

To use declarative partitioning in this case, use the following steps:

Create the measurement table as a partitioned table by specifying the PARTITION BY clause, which includes the partitioning method (RANGE in this case) and the list of column(s) to use as the partition key.
You may decide to use multiple columns in the partition key for range partitioning, if desired. Of course, this will often result in a larger number of partitions, each of which is individually smaller. On the other hand, using fewer columns may lead to a coarser-grained partitioning criterion with a smaller number of partitions. A query accessing the partitioned table will have to scan fewer partitions if the conditions involve some or all of these columns. For example, consider a table range partitioned using columns lastname and firstname (in that order) as the partition key.
Create partitions. Each partition's definition must specify the bounds that correspond to the partitioning method and partition key of the parent. Note that specifying bounds such that the new partition's values will overlap with those in one or more existing partitions will cause an error. Inserting data into the parent table that does not map to one of the existing partitions will cause an error; an appropriate partition must be added manually.
Partitions thus created are in every way normal PostgreSQL tables (or, possibly, foreign tables). It is possible to specify a tablespace and storage parameters for each partition separately.
It is not necessary to create table constraints describing partition boundary condition for partitions. Instead, partition constraints are generated implicitly from the partition bound specification whenever there is need to refer to them.
To implement sub-partitioning, specify the PARTITION BY clause in the commands used to create individual partitions, for example:
After creating partitions of measurement_y2006m02, any data inserted into measurement that is mapped to measurement_y2006m02 (or data that is directly inserted into measurement_y2006m02, provided it satisfies its partition constraint) will be further redirected to one of its partitions based on the peaktemp column. The partition key specified may overlap with the parent's partition key, although care should be taken when specifying the bounds of a sub-partition such that the set of data it accepts constitutes a subset of what the partition's own bounds allows; the system does not try to check whether that's really the case.
Create an index on the key column(s), as well as any other indexes you might want, on the partitioned table. (The key index is not strictly necessary, but in most scenarios it is helpful.) This automatically creates one index on each partition, and any partitions you create or attach later will also contain the index.
Ensure that the enable_partition_pruning configuration parameter is not disabled in postgresql.conf. If it is, queries will not be optimized as desired.
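The steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the column names city_id, logdate and unitsales, the monthly bounds, and the sub-partition names measurement_y2006m02_cool and measurement_y2006m02_warm are hypothetical; only measurement, measurement_y2006m02 and peaktemp come from the text.

```sql
-- Scratch setup (assumption): run in a throwaway database.
DROP TABLE IF EXISTS measurement CASCADE;

-- Step 1: the partitioned table, range-partitioned on logdate.
CREATE TABLE measurement (
    city_id int NOT NULL, logdate date NOT NULL, peaktemp int, unitsales int
) PARTITION BY RANGE (logdate);

-- Step 2: one partition per month; the upper bound is exclusive,
-- so adjacent months do not overlap.
CREATE TABLE measurement_y2006m02 PARTITION OF measurement
    FOR VALUES FROM ('2006-02-01') TO ('2006-03-01')
    PARTITION BY RANGE (peaktemp);          -- this month is sub-partitioned
CREATE TABLE measurement_y2006m03 PARTITION OF measurement
    FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');

-- Sub-partitions of February, split on peaktemp (hypothetical names).
CREATE TABLE measurement_y2006m02_cool PARTITION OF measurement_y2006m02
    FOR VALUES FROM (MINVALUE) TO (20);
CREATE TABLE measurement_y2006m02_warm PARTITION OF measurement_y2006m02
    FOR VALUES FROM (20) TO (MAXVALUE);

-- Step 3: an index on the key column; it cascades to every partition.
CREATE INDEX ON measurement (logdate);

-- Step 4: confirm that pruning has not been disabled.
SHOW enable_partition_pruning;

-- Rows are routed through the parent to the matching leaf partition.
INSERT INTO measurement VALUES (1, '2006-02-10', 15, 100),
                               (1, '2006-03-05', 25, 200);
SELECT tableoid::regclass AS stored_in, * FROM measurement ORDER BY logdate;
```

Selecting tableoid::regclass is a convenient way to check which leaf partition a row was routed to.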

In the above example we would be creating a new partition each month, so it might be wise to write a script that generates the required DDL automatically.

5.11.2.2. Partition Maintenance

Normally the set of partitions established when initially defining the table are not intended to remain static. It is common to want to remove old partitions of data and periodically add new partitions for new data. One of the most important advantages of partitioning is precisely that it allows this otherwise painful task to be executed nearly instantaneously by manipulating the partition structure, rather than physically moving large amounts of data around.

The simplest option for removing old data is to drop the partition that is no longer necessary:
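A minimal sketch, assuming the illustrative measurement schema (city_id, logdate, peaktemp, unitsales) and a monthly partition named measurement_y2006m02:

```sql
-- Scratch setup (assumed schema; run in a throwaway database).
DROP TABLE IF EXISTS measurement, measurement_y2006m02 CASCADE;
CREATE TABLE measurement (
    city_id int NOT NULL, logdate date NOT NULL, peaktemp int, unitsales int
) PARTITION BY RANGE (logdate);
CREATE TABLE measurement_y2006m02 PARTITION OF measurement
    FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');
INSERT INTO measurement VALUES (1, '2006-02-14', 12, 80);

-- Dropping the partition removes all of its rows in one step.
DROP TABLE measurement_y2006m02;

SELECT count(*) FROM measurement;  -- the February row is gone
```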

This can very quickly delete millions of records because it doesn't have to individually delete every record. Note however that the above command requires taking an ACCESS EXCLUSIVE lock on the parent table.

Another option that is often preferable is to remove the partition from the partitioned table but retain access to it as a table in its own right:
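A minimal sketch, again assuming the illustrative measurement schema and a partition named measurement_y2006m02:

```sql
-- Scratch setup (assumed schema; run in a throwaway database).
DROP TABLE IF EXISTS measurement, measurement_y2006m02 CASCADE;
CREATE TABLE measurement (
    city_id int NOT NULL, logdate date NOT NULL, peaktemp int, unitsales int
) PARTITION BY RANGE (logdate);
CREATE TABLE measurement_y2006m02 PARTITION OF measurement
    FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');
INSERT INTO measurement VALUES (1, '2006-02-14', 12, 80);

-- Detach: the table and its rows survive as a standalone table.
ALTER TABLE measurement DETACH PARTITION measurement_y2006m02;

SELECT count(*) FROM measurement;           -- no longer sees the row
SELECT count(*) FROM measurement_y2006m02;  -- row is still here
```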

This allows further operations to be performed on the data before it is dropped. For example, this is often a useful time to back up the data using COPY, pg_dump, or similar tools. It might also be a useful time to aggregate data into smaller formats, perform other data manipulations, or run reports.

Similarly we can add a new partition to handle new data. We can create an empty partition in the partitioned table just as the original partitions were created above:
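A minimal sketch, assuming the illustrative measurement schema; the partition name measurement_y2008m02 and its bounds are hypothetical:

```sql
-- Scratch setup (assumed schema; run in a throwaway database).
DROP TABLE IF EXISTS measurement, measurement_y2008m02 CASCADE;
CREATE TABLE measurement (
    city_id int NOT NULL, logdate date NOT NULL, peaktemp int, unitsales int
) PARTITION BY RANGE (logdate);

-- New, empty partition for the coming month.
CREATE TABLE measurement_y2008m02 PARTITION OF measurement
    FOR VALUES FROM ('2008-02-01') TO ('2008-03-01');

-- New rows for that month are now accepted by the parent.
INSERT INTO measurement VALUES (1, '2008-02-03', 4, 20);
```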

As an alternative, it is sometimes more convenient to create the new table outside the partition structure, and make it a proper partition later. This allows the data to be loaded, checked, and transformed prior to it appearing in the partitioned table:
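A minimal sketch of this workflow, assuming the illustrative measurement schema. The table name measurement_y2008m02 and the constraint name y2008m02 are hypothetical, and the single INSERT stands in for whatever bulk load is actually used.

```sql
-- Scratch setup (assumed schema; run in a throwaway database).
DROP TABLE IF EXISTS measurement, measurement_y2008m02 CASCADE;
CREATE TABLE measurement (
    city_id int NOT NULL, logdate date NOT NULL, peaktemp int, unitsales int
) PARTITION BY RANGE (logdate);

-- Build the table outside the partition structure.
CREATE TABLE measurement_y2008m02
    (LIKE measurement INCLUDING DEFAULTS INCLUDING CONSTRAINTS);

-- CHECK constraint matching the intended partition bounds.
ALTER TABLE measurement_y2008m02 ADD CONSTRAINT y2008m02
    CHECK (logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01');

-- Load, check and transform the data here (placeholder row).
INSERT INTO measurement_y2008m02 VALUES (1, '2008-02-10', 3, 15);

-- Make it a proper partition, then drop the now-redundant constraint.
ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
    FOR VALUES FROM ('2008-02-01') TO ('2008-03-01');
ALTER TABLE measurement_y2008m02 DROP CONSTRAINT y2008m02;
```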

Before running the ATTACH PARTITION command, it is recommended to create a CHECK constraint on the table to be attached matching the desired partition constraint. That way, the system will be able to skip the scan to validate the implicit partition constraint. Without the CHECK constraint, the table will be scanned to validate the partition constraint while holding an ACCESS EXCLUSIVE lock on that partition and a SHARE UPDATE EXCLUSIVE lock on the parent table. It may be desired to drop the redundant CHECK constraint after ATTACH PARTITION is finished.

As explained above, it is possible to create indexes on partitioned tables and they are applied automatically to the entire hierarchy. This is very convenient: not only will the existing partitions become indexed, but so will any partitions that are created in the future. One limitation is that it's not possible to use the CONCURRENTLY qualifier when creating such a partitioned index. To overcome long lock times, it is possible to use CREATE INDEX ON ONLY the partitioned table; such an index is marked invalid, and the partitions do not get the index applied automatically. The indexes on partitions can be created separately using CONCURRENTLY, and later attached to the index on the parent using ALTER INDEX ... ATTACH PARTITION. Once indexes for all partitions are attached to the parent index, the parent index is marked valid automatically. Example:
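A minimal sketch, assuming the illustrative measurement schema with a single partition; the index names measurement_usls_idx and measurement_usls_200602_idx are hypothetical.

```sql
-- Scratch setup (assumed schema; run in a throwaway database).
DROP TABLE IF EXISTS measurement, measurement_y2006m02 CASCADE;
CREATE TABLE measurement (
    city_id int NOT NULL, logdate date NOT NULL, peaktemp int, unitsales int
) PARTITION BY RANGE (logdate);
CREATE TABLE measurement_y2006m02 PARTITION OF measurement
    FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');

-- Parent-only index: created invalid, not propagated to partitions.
CREATE INDEX measurement_usls_idx ON ONLY measurement (unitsales);

-- Per-partition index built without blocking writes.
-- CONCURRENTLY cannot run inside a transaction block, so run these
-- statements one at a time (for example from psql in autocommit mode).
CREATE INDEX CONCURRENTLY measurement_usls_200602_idx
    ON measurement_y2006m02 (unitsales);

-- Attach; once every partition's index is attached, the parent
-- index is marked valid.
ALTER INDEX measurement_usls_idx
    ATTACH PARTITION measurement_usls_200602_idx;

SELECT indisvalid FROM pg_index
WHERE indexrelid = 'measurement_usls_idx'::regclass;
```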

This technique can be used with UNIQUE and PRIMARY KEY constraints too; the indexes are created implicitly when the constraint is created. Example:
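A minimal sketch of the same technique with a UNIQUE constraint, assuming the illustrative measurement schema. The constraint names are hypothetical and are given explicitly so that the implicitly created indexes have known names.

```sql
-- Scratch setup (assumed schema; run in a throwaway database).
DROP TABLE IF EXISTS measurement, measurement_y2006m02 CASCADE;
CREATE TABLE measurement (
    city_id int NOT NULL, logdate date NOT NULL, peaktemp int, unitsales int
) PARTITION BY RANGE (logdate);
CREATE TABLE measurement_y2006m02 PARTITION OF measurement
    FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');

-- Constraint on the parent only; its index starts out invalid.
-- The key includes logdate, the partition key column.
ALTER TABLE ONLY measurement
    ADD CONSTRAINT measurement_city_logdate_key UNIQUE (city_id, logdate);

-- Matching constraint on the partition.
ALTER TABLE measurement_y2006m02
    ADD CONSTRAINT measurement_y2006m02_city_logdate_key
    UNIQUE (city_id, logdate);

-- Attach the partition's index to the parent's index.
ALTER INDEX measurement_city_logdate_key
    ATTACH PARTITION measurement_y2006m02_city_logdate_key;
```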

5.11.2.3. Limitations

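The following limitations apply to partitioned tables:

There is no way to create an exclusion constraint spanning all partitions; it is only possible to constrain each partition individually.
Unique constraints on partitioned tables must include all the partition key columns. This limitation exists because PostgreSQL can only enforce uniqueness in each partition individually.
BEFORE ROW triggers, if necessary, must be defined on individual partitions, not on the partitioned table.
Mixing temporary and permanent relations in the same partition tree is not allowed. Hence, if the partitioned table is permanent, so must be its partitions, and likewise if the partitioned table is temporary. When using temporary relations, all members of the partition tree have to be from the same session.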

5.11.3. Implementation Using Inheritance

While the built-in declarative partitioning is suitable for most common use cases, there are some circumstances where a more flexible approach may be useful. Partitioning can be implemented using table inheritance, which allows for several features not supported by declarative partitioning, such as:

For declarative partitioning, partitions must have exactly the same set of columns as the partitioned table, whereas with table inheritance, child tables may have extra columns not present in the parent.
Table inheritance allows for multiple inheritance.
Declarative partitioning only supports range, list and hash partitioning, whereas table inheritance allows data to be divided in a manner of the user's choosing. (Note, however, that if constraint exclusion is unable to prune child tables effectively, query performance might be poor.)
Some operations require a stronger lock when using declarative partitioning than when using table inheritance. For example, adding or removing a partition to or from a partitioned table requires taking an ACCESS EXCLUSIVE lock on the parent table, whereas a SHARE UPDATE EXCLUSIVE lock is enough in the case of regular inheritance.

5.11.3.1. Example

We use the same measurement table we used above. To implement partitioning using inheritance, use the following steps:

Create the “master” table, from which all of the “child” tables will inherit. This table will contain no data. Do not define any check constraints on this table, unless you intend them to be applied equally to all child tables. There is no point in defining any indexes or unique constraints on it, either. For our example, the master table is the measurement table as originally defined.
Create several “child” tables that each inherit from the master table. Normally, these tables will not add any columns to the set inherited from the master. Just as with declarative partitioning, these tables are in every way normal PostgreSQL tables (or foreign tables).
Add non-overlapping table constraints to the child tables to define the allowed key values in each.
Typical examples would be:
Ensure that the constraints guarantee that there is no overlap between the key values permitted in different child tables. A common mistake is to set up range constraints like:
This is wrong since it is not clear which child table the key value 200 belongs in.
It would be better to instead create child tables as follows:
For each child table, create an index on the key column(s), as well as any other indexes you might want.
We want our application to be able to say INSERT INTO measurement ... and have the data be redirected into the appropriate child table. We can arrange that by attaching a suitable trigger function to the master table. If data will be added only to the latest child, we can use a very simple trigger function:
After creating the function, we create a trigger which calls the trigger function:
We must redefine the trigger function each month so that it always points to the current child table. The trigger definition does not need to be updated, however.
We might want to insert data and have the server automatically locate the child table into which the row should be added. We could do this with a more complex trigger function, for example:
The trigger definition is the same as before. Note that each IF test must exactly match the CHECK constraint for its child table.
While this function is more complex than the single-month case, it doesn't need to be updated as often, since branches can be added in advance of being needed.
Note
In practice, it might be best to check the newest child first, if most inserts go into that child. For simplicity, we have shown the trigger's tests in the same order as in other parts of this example.
A different approach to redirecting inserts into the appropriate child table is to set up rules, instead of a trigger, on the master table. For example:
A rule has significantly more overhead than a trigger, but the overhead is paid once per query rather than once per row, so this method might be advantageous for bulk-insert situations. In most cases, however, the trigger method will offer better performance.
Be aware that COPY ignores rules. If you want to use COPY to insert data, you'll need to copy into the correct child table rather than directly into the master. COPY does fire triggers, so you can use it normally if you use the trigger approach.
Another disadvantage of the rule approach is that there is no simple way to force an error if the set of rules doesn't cover the insertion date; the data will silently go into the master table instead.
Ensure that the constraint_exclusion configuration parameter is not disabled in postgresql.conf; otherwise child tables may be accessed unnecessarily.
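The steps above can be sketched as follows, using the trigger variant only (the rule variant is not shown). This is a minimal sketch under stated assumptions: the column names, the two monthly child tables, the index names, and the names measurement_insert_trigger and insert_measurement_trigger are all hypothetical.

```sql
-- Scratch setup (assumed schema; run in a throwaway database).
DROP TABLE IF EXISTS measurement, measurement_y2006m02,
                     measurement_y2006m03 CASCADE;

-- Master table: holds no data itself.
CREATE TABLE measurement (
    city_id int NOT NULL, logdate date NOT NULL, peaktemp int, unitsales int
);

-- Child tables with non-overlapping CHECK constraints
-- (inclusive lower bound, exclusive upper bound).
CREATE TABLE measurement_y2006m02 (
    CHECK (logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01')
) INHERITS (measurement);
CREATE TABLE measurement_y2006m03 (
    CHECK (logdate >= DATE '2006-03-01' AND logdate < DATE '2006-04-01')
) INHERITS (measurement);

-- Indexes must be created on each child individually.
CREATE INDEX measurement_y2006m02_logdate ON measurement_y2006m02 (logdate);
CREATE INDEX measurement_y2006m03_logdate ON measurement_y2006m03 (logdate);

-- Routing trigger: each IF test mirrors the child's CHECK constraint.
CREATE OR REPLACE FUNCTION measurement_insert_trigger()
RETURNS trigger AS $$
BEGIN
    IF NEW.logdate >= DATE '2006-02-01'
       AND NEW.logdate < DATE '2006-03-01' THEN
        INSERT INTO measurement_y2006m02 VALUES (NEW.*);
    ELSIF NEW.logdate >= DATE '2006-03-01'
       AND NEW.logdate < DATE '2006-04-01' THEN
        INSERT INTO measurement_y2006m03 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'logdate % is outside every child table range',
                        NEW.logdate;
    END IF;
    RETURN NULL;  -- suppress the insert into the master table itself
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER insert_measurement_trigger
    BEFORE INSERT ON measurement
    FOR EACH ROW EXECUTE FUNCTION measurement_insert_trigger();

-- The application inserts into the master; the row lands in a child.
INSERT INTO measurement VALUES (1, '2006-03-15', 20, 50);
SELECT tableoid::regclass AS stored_in, * FROM measurement;
```

Returning NULL from the BEFORE ROW trigger is what keeps the master table empty; the explicit RAISE makes an uncovered date fail loudly instead of being stored silently.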

As we can see, a complex table hierarchy could require a substantial amount of DDL. In the above example we would be creating a new child table each month, so it might be wise to write a script that generates the required DDL automatically.

5.11.3.2. Maintenance For Inheritance Partitioning

To remove old data quickly, simply drop the child table that is no longer necessary:
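A minimal sketch, assuming the illustrative measurement schema and a child table named measurement_y2006m02:

```sql
-- Scratch setup (assumed schema; run in a throwaway database).
DROP TABLE IF EXISTS measurement, measurement_y2006m02 CASCADE;
CREATE TABLE measurement (
    city_id int NOT NULL, logdate date NOT NULL, peaktemp int, unitsales int
);
CREATE TABLE measurement_y2006m02 (
    CHECK (logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01')
) INHERITS (measurement);
INSERT INTO measurement_y2006m02 VALUES (1, '2006-02-14', 12, 80);

-- Dropping the child removes its rows from the hierarchy at once.
DROP TABLE measurement_y2006m02;

SELECT count(*) FROM measurement;
```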

To remove the child table from the inheritance hierarchy but retain access to it as a table in its own right:
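A minimal sketch, assuming the illustrative measurement schema and a child table named measurement_y2006m02:

```sql
-- Scratch setup (assumed schema; run in a throwaway database).
DROP TABLE IF EXISTS measurement, measurement_y2006m02 CASCADE;
CREATE TABLE measurement (
    city_id int NOT NULL, logdate date NOT NULL, peaktemp int, unitsales int
);
CREATE TABLE measurement_y2006m02 (
    CHECK (logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01')
) INHERITS (measurement);
INSERT INTO measurement_y2006m02 VALUES (1, '2006-02-14', 12, 80);

-- Unlink the child; it remains a normal, standalone table.
ALTER TABLE measurement_y2006m02 NO INHERIT measurement;

SELECT count(*) FROM measurement;           -- no longer includes the child
SELECT count(*) FROM measurement_y2006m02;  -- data is still available
```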

To add a new child table to handle new data, create an empty child table just as the original children were created above:
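A minimal sketch, assuming the illustrative measurement schema; the child name measurement_y2008m02 and its date range are hypothetical:

```sql
-- Scratch setup (assumed schema; run in a throwaway database).
DROP TABLE IF EXISTS measurement, measurement_y2008m02 CASCADE;
CREATE TABLE measurement (
    city_id int NOT NULL, logdate date NOT NULL, peaktemp int, unitsales int
);

-- New, empty child table for the coming month.
CREATE TABLE measurement_y2008m02 (
    CHECK (logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01')
) INHERITS (measurement);
```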

Alternatively, one may want to create and populate the new child table before adding it to the table hierarchy. This could allow data to be loaded, checked, and transformed before being made visible to queries on the parent table.
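A minimal sketch of that alternative, assuming the illustrative measurement schema. The table name measurement_y2008m02 and the constraint name y2008m02 are hypothetical, and the single INSERT stands in for the real data load.

```sql
-- Scratch setup (assumed schema; run in a throwaway database).
DROP TABLE IF EXISTS measurement, measurement_y2008m02 CASCADE;
CREATE TABLE measurement (
    city_id int NOT NULL, logdate date NOT NULL, peaktemp int, unitsales int
);

-- Create and populate the table outside the hierarchy.
CREATE TABLE measurement_y2008m02
    (LIKE measurement INCLUDING DEFAULTS INCLUDING CONSTRAINTS);
ALTER TABLE measurement_y2008m02 ADD CONSTRAINT y2008m02
    CHECK (logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01');
INSERT INTO measurement_y2008m02 VALUES (1, '2008-02-10', 5, 40);

-- Only now does the data become visible through the parent.
ALTER TABLE measurement_y2008m02 INHERIT measurement;

SELECT count(*) FROM measurement;
```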

5.11.3.3. Caveats

The following caveats apply to partitioning implemented using inheritance:

There is no automatic way to verify that all of the CHECK constraints are mutually exclusive. It is safer to create code that generates child tables and creates and/or modifies associated objects than to write each by hand.
Indexes and foreign key constraints apply to single tables and not to their inheritance children, hence they have some caveats to be aware of.
The schemes shown here assume that the values of a row's key column(s) never change, or at least do not change enough to require it to move to another partition. An UPDATE that attempts to do that will fail because of the CHECK constraints. If you need to handle such cases, you can put suitable update triggers on the child tables, but it makes management of the structure much more complicated.
If you are using manual VACUUM or ANALYZE commands, don't forget that you need to run them on each child table individually. A command like ANALYZE measurement; will only process the master table.
INSERT statements with ON CONFLICT clauses are unlikely to work as expected, as the ON CONFLICT action is only taken in case of unique violations on the specified target relation, not its child relations.
Triggers or rules will be needed to route rows to the desired child table, unless the application is explicitly aware of the partitioning scheme. Triggers may be complicated to write, and will be much slower than the tuple routing performed internally by declarative partitioning.
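For the manual ANALYZE caveat above, one way to keep track of which child tables still need their own command is to list them from the pg_inherits catalog. A minimal sketch, assuming the illustrative measurement schema and one child table:

```sql
-- Scratch setup (assumed schema; run in a throwaway database).
DROP TABLE IF EXISTS measurement, measurement_y2006m02 CASCADE;
CREATE TABLE measurement (
    city_id int NOT NULL, logdate date NOT NULL, peaktemp int, unitsales int
);
CREATE TABLE measurement_y2006m02 (
    CHECK (logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01')
) INHERITS (measurement);

-- List the direct children of the master table.
SELECT inhrelid::regclass::text AS child_table
FROM pg_inherits
WHERE inhparent = 'measurement'::regclass
ORDER BY 1;

-- Then analyze the master and each child explicitly.
ANALYZE measurement;
ANALYZE measurement_y2006m02;
```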

5.11.4. Partition Pruning

Partition pruning is a query optimization technique that improves performance for declaratively partitioned tables. As an example:
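A minimal sketch of such a query, assuming the illustrative measurement schema with two hypothetical monthly partitions; the cutoff date is also an assumption.

```sql
-- Scratch setup (assumed schema; run in a throwaway database).
DROP TABLE IF EXISTS measurement CASCADE;
CREATE TABLE measurement (
    city_id int NOT NULL, logdate date NOT NULL, peaktemp int, unitsales int
) PARTITION BY RANGE (logdate);
CREATE TABLE measurement_y2007m12 PARTITION OF measurement
    FOR VALUES FROM ('2007-12-01') TO ('2008-01-01');
CREATE TABLE measurement_y2008m01 PARTITION OF measurement
    FOR VALUES FROM ('2008-01-01') TO ('2008-02-01');
INSERT INTO measurement VALUES (1, '2007-12-15', 2, 10),
                               (1, '2008-01-10', 1, 12);

SET enable_partition_pruning = on;

-- Only partitions that can hold rows on or after the cutoff are needed.
SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01';
```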

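Without partition pruning, the above query would scan each of the partitions of the measurement table. With partition pruning enabled, the planner will examine the definition of each partition and prove that the partition need not be scanned because it could not contain any rows meeting the query's WHERE clause. When the planner can prove this, it excludes (prunes) the partition from the query plan.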

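By using the EXPLAIN command and the enable_partition_pruning configuration parameter, it's possible to show the difference between a plan for which partitions have been pruned and one for which they have not. A typical unoptimized plan for this type of table setup is: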

Some or all of the partitions might use index scans instead of full-table sequential scans, but the point here is that there is no need to scan the older partitions at all to answer this query. When we enable partition pruning, we get a significantly cheaper plan that will deliver the same answer:
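The comparison can be reproduced with commands along the following lines. This is a minimal sketch, assuming the illustrative measurement schema with two hypothetical monthly partitions; plan output is deliberately not shown, since costs and node choices depend on the data and the server version.

```sql
-- Scratch setup (assumed schema; run in a throwaway database).
DROP TABLE IF EXISTS measurement CASCADE;
CREATE TABLE measurement (
    city_id int NOT NULL, logdate date NOT NULL, peaktemp int, unitsales int
) PARTITION BY RANGE (logdate);
CREATE TABLE measurement_y2007m12 PARTITION OF measurement
    FOR VALUES FROM ('2007-12-01') TO ('2008-01-01');
CREATE TABLE measurement_y2008m01 PARTITION OF measurement
    FOR VALUES FROM ('2008-01-01') TO ('2008-02-01');

-- Plan with pruning disabled.
SET enable_partition_pruning = off;
EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01';

-- Plan with pruning enabled.
SET enable_partition_pruning = on;
EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01';
```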

Note that partition pruning is driven only by the constraints defined implicitly by the partition keys, not by the presence of indexes. Therefore it isn't necessary to define indexes on the key columns. Whether an index needs to be created for a given partition depends on whether you expect that queries that scan the partition will generally scan a large part of the partition or just a small part. An index will be helpful in the latter case but not the former.

Partition pruning can be performed not only during the planning of a given query, but also during its execution. This is useful as it can allow more partitions to be pruned when clauses contain expressions whose values are not known at query planning time, for example, parameters defined in a PREPARE statement, using a value obtained from a subquery, or using a parameterized value on the inner side of a nested loop join. Partition pruning during execution can be performed at any of the following times:

During initialization of the query plan. Partition pruning can be performed here for parameter values which are known during the initialization phase of execution. Partitions which are pruned during this stage will not show up in the query's EXPLAIN or EXPLAIN ANALYZE. It is possible to determine the number of partitions which were removed during this phase by observing the “Subplans Removed” property in the EXPLAIN output.
During actual execution of the query plan. Partition pruning may also be performed here to remove partitions using values which are only known during actual query execution. This includes values from subqueries and values from execution-time parameters such as those from parameterized nested loop joins. Since the value of these parameters may change many times during the execution of the query, partition pruning is performed whenever one of the execution parameters being used by partition pruning changes. Determining if partitions were pruned during this phase requires careful inspection of the loops property in the EXPLAIN ANALYZE output. Subplans corresponding to different partitions may have different values for it depending on how many times each of them was pruned during execution. Some may be shown as (never executed) if they were pruned every time.

Partition pruning can be disabled using the enable_partition_pruning setting.

Note

Execution-time partition pruning currently only occurs for the Append and MergeAppend node types. It is not yet implemented for the ModifyTable node type, but that is likely to be changed in a future release of PostgreSQL.

5.11.5. Partitioning and Constraint Exclusion

Constraint exclusion is a query optimization technique similar to partition pruning. While it is primarily used for partitioning implemented using the legacy inheritance method, it can be used for other purposes, including with declarative partitioning.

Constraint exclusion works in a very similar way to partition pruning, except that it uses each table's CHECK constraints — which gives it its name — whereas partition pruning uses the table's partition bounds, which exist only in the case of declarative partitioning. Another difference is that constraint exclusion is only applied at plan time; there is no attempt to remove partitions at execution time.

The fact that constraint exclusion uses CHECK constraints, which makes it slow compared to partition pruning, can sometimes be used as an advantage: because constraints can be defined even on declaratively-partitioned tables, in addition to their internal partition bounds, constraint exclusion may be able to elide additional partitions from the query plan.

The default (and recommended) setting of constraint_exclusion is neither on nor off, but an intermediate setting called partition, which causes the technique to be applied only to queries that are likely to be working on inheritance partitioned tables. The on setting causes the planner to examine CHECK constraints in all queries, even simple ones that are unlikely to benefit.

The following caveats apply to constraint exclusion:

Constraint exclusion is only applied during query planning, unlike partition pruning, which can also be applied during query execution.
Constraint exclusion only works when the query's WHERE clause contains constants (or externally supplied parameters). For example, a comparison against a non-immutable function such as CURRENT_TIMESTAMP cannot be optimized, since the planner cannot know which child table the function's value might fall into at run time.
Keep the partitioning constraints simple, else the planner may not be able to prove that child tables might not need to be visited. Use simple equality conditions for list partitioning, or simple range tests for range partitioning, as illustrated in the preceding examples. A good rule of thumb is that partitioning constraints should contain only comparisons of the partitioning column(s) to constants using B-tree-indexable operators, because only B-tree-indexable column(s) are allowed in the partition key.
All constraints on all children of the parent table are examined during constraint exclusion, so large numbers of children are likely to increase query planning time considerably. So the legacy inheritance based partitioning will work well with up to perhaps a hundred child tables; don't try to use many thousands of children.

5.11.6. Declarative Partitioning Best Practices

The choice of how to partition a table should be made carefully as the performance of query planning and execution can be negatively affected by poor design.

One of the most critical design decisions will be the column or columns by which you partition your data. Often the best choice will be to partition by the column or set of columns which most commonly appear in WHERE clauses of queries being executed on the partitioned table. WHERE clause items that match and are compatible with the partition key can be used to prune unneeded partitions. However, you may be forced into making other decisions by requirements for the PRIMARY KEY or a UNIQUE constraint. Removal of unwanted data is also a factor to consider when planning your partitioning strategy. An entire partition can be detached fairly quickly, so it may be beneficial to design the partition strategy in such a way that all data to be removed at once is located in a single partition.

Choosing the target number of partitions that the table should be divided into is also a critical decision to make. Not having enough partitions may mean that indexes remain too large and that data locality remains poor which could result in low cache hit ratios. However, dividing the table into too many partitions can also cause issues. Too many partitions can mean longer query planning times and higher memory consumption during both query planning and execution. When choosing how to partition your table, it's also important to consider what changes may occur in the future. For example, if you choose to have one partition per customer and you currently have a small number of large customers, consider the implications if in several years you instead find yourself with a large number of small customers. In this case, it may be better to choose to partition by HASH and choose a reasonable number of partitions rather than trying to partition by LIST and hoping that the number of customers does not increase beyond what it is practical to partition the data by.
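As an illustration of the HASH alternative mentioned above, here is a minimal sketch with a fixed number of partitions that does not grow with the number of customers. The orders table, its columns, and the choice of four partitions are all hypothetical.

```sql
-- Scratch setup (hypothetical table; run in a throwaway database).
DROP TABLE IF EXISTS orders CASCADE;

CREATE TABLE orders (
    customer_id int NOT NULL,
    order_total numeric
) PARTITION BY HASH (customer_id);

-- A fixed set of four partitions, whatever the number of customers.
CREATE TABLE orders_p0 PARTITION OF orders
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE orders_p1 PARTITION OF orders
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE orders_p2 PARTITION OF orders
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE orders_p3 PARTITION OF orders
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);

-- 1000 customers are spread across the same four partitions.
INSERT INTO orders SELECT g, 0 FROM generate_series(1, 1000) AS g;

SELECT tableoid::regclass::text AS stored_in, count(*)
FROM orders
GROUP BY 1
ORDER BY 1;
```

With LIST partitioning by customer, the same data would need one partition per customer; the trade-off is that hash partitions cannot be dropped per customer.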

Sub-partitioning can be useful to further divide partitions that are expected to become larger than other partitions, although excessive sub-partitioning can easily lead to large numbers of partitions and can cause the same problems mentioned in the preceding paragraph.

It is also important to consider the overhead of partitioning during query planning and execution. The query planner is generally able to handle partition hierarchies with up to a few thousand partitions fairly well, provided that typical queries allow the query planner to prune all but a small number of partitions. Planning times become longer and memory consumption becomes higher when more partitions remain after the planner performs partition pruning. This is particularly true for the UPDATE and DELETE commands. Another reason to be concerned about having a large number of partitions is that the server's memory consumption may grow significantly over a period of time, especially if many sessions touch large numbers of partitions. That's because each partition requires its metadata to be loaded into the local memory of each session that touches it.

With data warehouse type workloads, it can make sense to use a larger number of partitions than with an OLTP type workload. Generally, in data warehouses, query planning time is less of a concern as the majority of processing time is spent during query execution. With either of these two types of workload, it is important to make the right decisions early, as re-partitioning large quantities of data can be painfully slow. Simulations of the intended workload are often beneficial for optimizing the partitioning strategy. Never assume that more partitions are better than fewer partitions and vice-versa.

7.2. Table Expressions

A table expression computes a table. The table expression contains a FROM clause that is optionally followed by WHERE, GROUP BY, and HAVING clauses. Trivial table expressions simply refer to a table on disk, a so-called base table, but more complex expressions can be used to modify or combine base tables in various ways.

The optional WHERE, GROUP BY, and HAVING clauses in the table expression specify a pipeline of successive transformations performed on the table derived in the FROM clause. All these transformations produce a virtual table that provides the rows that are passed to the select list to compute the output rows of the query.

7.2.1. The `FROM` Clause

The FROM Clause derives a table from one or more other tables given in a comma-separated table reference list.

FROM table_reference [, table_reference [, ...]]

A table reference can be a table name (possibly schema-qualified), or a derived table such as a subquery, a JOIN construct, or complex combinations of these. If more than one table reference is listed in the FROM clause, the tables are cross-joined (that is, the Cartesian product of their rows is formed; see below). The result of the FROM list is an intermediate virtual table that can then be subject to transformations by the WHERE, GROUP BY, and HAVING clauses and is finally the result of the overall table expression.

When a table reference names a table that is the parent of a table inheritance hierarchy, the table reference produces rows of not only that table but all of its descendant tables, unless the key word ONLY precedes the table name. However, the reference produces only the columns that appear in the named table — any columns added in subtables are ignored.

Instead of writing ONLY before the table name, you can write * after the table name to explicitly specify that descendant tables are included. There is no real reason to use this syntax any more, because searching descendant tables is now always the default behavior. However, it is supported for compatibility with older releases.

7.2.1.1. Joined Tables

A joined table is a table derived from two other (real or derived) tables according to the rules of the particular join type. Inner, outer, and cross-joins are available. The general syntax of a joined table is

T1 join_type T2 [ join_condition ]

Joins of all types can be chained together, or nested: either or both T1 and T2 can be joined tables. Parentheses can be used around JOIN clauses to control the join order. In the absence of parentheses, JOIN clauses nest left-to-right.

Join Types

Cross join

T1 CROSS JOIN T2

For every possible combination of rows from T1 and T2 (i.e., a Cartesian product), the joined table will contain a row consisting of all columns in T1 followed by all columns in T2. If the tables have N and M rows respectively, the joined table will have N * M rows.

FROM T1 CROSS JOIN T2 is equivalent to FROM T1 INNER JOIN T2 ON TRUE (see below). It is also equivalent to FROM T1, T2.
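A minimal sketch of the three equivalent spellings for the two-table case, assuming two small scratch tables t1 and t2 with three rows each:

```sql
-- Scratch setup (assumption): run in a throwaway database.
DROP TABLE IF EXISTS t1, t2;
CREATE TABLE t1 (num int, name text);
CREATE TABLE t2 (num int, value text);
INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO t2 VALUES (1, 'xxx'), (3, 'yyy'), (5, 'zzz');

-- All three forms produce the Cartesian product: 3 * 3 = 9 rows.
SELECT count(*) FROM t1 CROSS JOIN t2;
SELECT count(*) FROM t1 INNER JOIN t2 ON TRUE;
SELECT count(*) FROM t1, t2;
```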

Note

This latter equivalence does not hold exactly when more than two tables appear, because JOIN binds more tightly than comma. For example FROM T1 CROSS JOIN T2 INNER JOIN T3 ON condition is not the same as FROM T1, T2 INNER JOIN T3 ON condition because the condition can reference T1 in the first case but not the second.

Qualified joins

T1 { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN T2 ON boolean_expression
T1 { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN T2 USING ( join column list )
T1 NATURAL { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN T2

The words INNER and OUTER are optional in all forms. INNER is the default; LEFT, RIGHT, and FULL imply an outer join.

The join condition is specified in the ON or USING clause, or implicitly by the word NATURAL. The join condition determines which rows from the two source tables are considered to “match”, as explained in detail below.

The possible types of qualified join are:

INNER JOIN

For each row R1 of T1, the joined table has a row for each row in T2 that satisfies the join condition with R1.

LEFT OUTER JOIN

First, an inner join is performed. Then, for each row in T1 that does not satisfy the join condition with any row in T2, a joined row is added with null values in columns of T2. Thus, the joined table always has at least one row for each row in T1.

RIGHT OUTER JOIN

First, an inner join is performed. Then, for each row in T2 that does not satisfy the join condition with any row in T1, a joined row is added with null values in columns of T1. This is the converse of a left join: the result table will always have a row for each row in T2.

FULL OUTER JOIN

First, an inner join is performed. Then, for each row in T1 that does not satisfy the join condition with any row in T2, a joined row is added with null values in columns of T2. Also, for each row of T2 that does not satisfy the join condition with any row in T1, a joined row with null values in the columns of T1 is added.

The ON clause is the most general kind of join condition: it takes a Boolean value expression of the same kind as is used in a WHERE clause. A pair of rows from T1 and T2 match if the ON expression evaluates to true.

The USING clause is a shorthand that allows you to take advantage of the specific situation where both sides of the join use the same name for the joining column(s). It takes a comma-separated list of the shared column names and forms a join condition that includes an equality comparison for each one. For example, joining T1 and T2 with USING (a, b) produces the join condition ON T1.a = T2.a AND T1.b = T2.b.

Furthermore, the output of JOIN USING suppresses redundant columns: there is no need to print both of the matched columns, since they must have equal values. While JOIN ON produces all columns from T1 followed by all columns from T2, JOIN USING produces one output column for each of the listed column pairs (in the listed order), followed by any remaining columns from T1, followed by any remaining columns from T2.

Finally, NATURAL is a shorthand form of USING: it forms a USING list consisting of all column names that appear in both input tables. As with USING, these columns appear only once in the output table. If there are no common column names, NATURAL JOIN behaves like JOIN ... ON TRUE, producing a cross-product join.

Note

USING is reasonably safe from column changes in the joined relations since only the listed columns are combined. NATURAL is considerably more risky since any schema changes to either relation that cause a new matching column name to be present will cause the join to combine that new column as well.

To put this together, assume we have tables t1:

 num | name
-----+------
   1 | a
   2 | b
   3 | c

and t2:

 num | value
-----+-------
   1 | xxx
   3 | yyy
   5 | zzz

then we get the following results for the various joins:

=> SELECT * FROM t1 CROSS JOIN t2;
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
   1 | a    |   3 | yyy
   1 | a    |   5 | zzz
   2 | b    |   1 | xxx
   2 | b    |   3 | yyy
   2 | b    |   5 | zzz
   3 | c    |   1 | xxx
   3 | c    |   3 | yyy
   3 | c    |   5 | zzz
(9 rows)

=> SELECT * FROM t1 INNER JOIN t2 ON t1.num = t2.num;
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
   3 | c    |   3 | yyy
(2 rows)

=> SELECT * FROM t1 INNER JOIN t2 USING (num);
 num | name | value
-----+------+-------
   1 | a    | xxx
   3 | c    | yyy
(2 rows)

=> SELECT * FROM t1 NATURAL INNER JOIN t2;
 num | name | value
-----+------+-------
   1 | a    | xxx
   3 | c    | yyy
(2 rows)

=> SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num;
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
   2 | b    |     |
   3 | c    |   3 | yyy
(3 rows)

=> SELECT * FROM t1 LEFT JOIN t2 USING (num);
 num | name | value
-----+------+-------
   1 | a    | xxx
   2 | b    |
   3 | c    | yyy
(3 rows)

=> SELECT * FROM t1 RIGHT JOIN t2 ON t1.num = t2.num;
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
   3 | c    |   3 | yyy
     |      |   5 | zzz
(3 rows)

=> SELECT * FROM t1 FULL JOIN t2 ON t1.num = t2.num;
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
   2 | b    |     |
   3 | c    |   3 | yyy
     |      |   5 | zzz
(4 rows)

The join condition specified with ON can also contain conditions that do not relate directly to the join. This can prove useful for some queries but needs to be thought out carefully. For example:

=> SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num AND t2.value = 'xxx';
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
   2 | b    |     |
   3 | c    |     |
(3 rows)

Notice that placing the restriction in the WHERE clause produces a different result:

=> SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num WHERE t2.value = 'xxx';
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
(1 row)

This is because a restriction placed in the ON clause is processed before the join, while a restriction placed in the WHERE clause is processed after the join. That does not matter with inner joins, but it matters a lot with outer joins.
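One practical use of this difference is finding rows that have no match at all. The following is a minimal sketch using the t1 and t2 sample tables from above; it relies only on the behavior already described (ON is applied while joining, WHERE afterwards).

```sql
-- Rows of t1 that have no partner in t2 (often called an "anti-join").
-- The ON clause pairs rows first; unmatched t1 rows get nulls for t2's
-- columns, and the WHERE clause then keeps only those unmatched rows.
SELECT t1.num, t1.name
FROM t1 LEFT JOIN t2 ON t1.num = t2.num
WHERE t2.num IS NULL;
```

With the sample data, only the row (2, b) should remain, since 2 is the only value of t1.num that does not appear in t2.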

7.2.1.2. Table and Column Aliases

A temporary name can be given to tables and complex table references to be used for references to the derived table in the rest of the query. This is called a table alias.

To create a table alias, write

FROM table_reference AS alias

or

FROM table_reference alias

The AS key word is optional noise. alias can be any identifier.

A typical application of table aliases is to assign short identifiers to long table names to keep the join clauses readable. For example:

SELECT * FROM some_very_long_table_name s JOIN another_fairly_long_name a ON s.id = a.num;

The alias becomes the new name of the table reference so far as the current query is concerned — it is not allowed to refer to the table by the original name elsewhere in the query. Thus, this is not valid:

SELECT * FROM my_table AS m WHERE my_table.a > 5;    -- wrong

Table aliases are mainly for notational convenience, but it is necessary to use them when joining a table to itself, e.g.:

SELECT * FROM people AS mother JOIN people AS child ON mother.id = child.mother_id;

Additionally, an alias is required if the table reference is a subquery (see Section 7.2.1.3).

Parentheses are used to resolve ambiguities. In the following example, the first statement assigns the alias b to the second instance of my_table, but the second statement assigns the alias to the result of the join:

SELECT * FROM my_table AS a CROSS JOIN my_table AS b ...
SELECT * FROM (my_table AS a CROSS JOIN my_table) AS b ...

Another form of table aliasing gives temporary names to the columns of the table, as well as the table itself:

FROM table_reference [AS] alias ( column1 [, column2 [, ...]] )

If fewer column aliases are specified than the actual table has columns, the remaining columns are not renamed. This syntax is especially useful for self-joins or subqueries.
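As a short illustration, assuming the two-column sample table t1 (num, name) used earlier in this section, the column alias list can rename all or only some of the columns. The alias names id and label are made up for this sketch.

```sql
-- Rename both columns of t1 for the duration of this query.
SELECT t.id, t.label FROM t1 AS t (id, label);

-- Rename only the first column; the second keeps its original name.
SELECT t.id, t.name FROM t1 AS t (id);
```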

When an alias is applied to the output of a JOIN clause, the alias hides the original name(s) within the JOIN. For example:

SELECT a.* FROM my_table AS a JOIN your_table AS b ON ...

is valid SQL, but:

SELECT a.* FROM (my_table AS a JOIN your_table AS b ON ...) AS c

is not valid; the table alias a is not visible outside the alias c.

7.2.1.3. Subqueries

Subqueries specifying a derived table must be enclosed in parentheses and must be assigned a table alias name (as in Section 7.2.1.2). For example:

FROM (SELECT * FROM table1) AS alias_name

This example is equivalent to FROM table1 AS alias_name. More interesting cases, which cannot be reduced to a plain join, arise when the subquery involves grouping or aggregation.
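A sketch of such a case follows. The tables products (product_id, name) and sales (product_id, units) are hypothetical and only serve to show a derived table that aggregates before it is joined.

```sql
-- The subquery collapses sales to one row per product before the join,
-- so each product appears once with its total.
SELECT p.name, s.total_units
FROM products AS p
JOIN (SELECT product_id, sum(units) AS total_units
      FROM sales
      GROUP BY product_id) AS s USING (product_id);
```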

A subquery can also be a VALUES list:

FROM (VALUES ('anne', 'smith'), ('bob', 'jones'), ('joe', 'blow'))
     AS names(first, last)

Again, a table alias is required. Assigning alias names to the columns of the VALUES list is optional, but is good practice. For more information see Section 7.7.
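For instance, the VALUES list above can be queried like any other table. This is only a sketch; the filter and ordering are arbitrary.

```sql
-- The column aliases first and last make the derived table easy to reference.
SELECT names.first, names.last
FROM (VALUES ('anne', 'smith'), ('bob', 'jones'), ('joe', 'blow'))
     AS names(first, last)
WHERE names.last <> 'blow'
ORDER BY names.last;
```

If the column aliases are left out, PostgreSQL names the columns column1, column2, and so on, which is usually harder to read.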

7.2.1.4. Table Functions

Table functions are functions that produce a set of rows, made up of either base data types (scalar types) or composite data types (table rows). They are used like a table, view, or subquery in the FROM clause of a query. Columns returned by table functions can be included in SELECT, JOIN, or WHERE clauses in the same manner as columns of a table, view, or subquery.

Table functions may also be combined using the ROWS FROM syntax, with the results returned in parallel columns; the number of result rows in this case is that of the largest function result, with smaller results padded with null values to match.

function_call [WITH ORDINALITY] [[AS] table_alias [(column_alias [, ... ])]]
ROWS FROM( function_call [, ... ] ) [WITH ORDINALITY] [[AS] table_alias [(column_alias [, ... ])]]

If the WITH ORDINALITY clause is specified, an additional column of type bigint will be added to the function result columns. This column numbers the rows of the function result set, starting from 1. (This is a generalization of the SQL-standard syntax for UNNEST ... WITH ORDINALITY.) By default, the ordinal column is called ordinality, but a different column name can be assigned to it using an AS clause.

The special table function UNNEST may be called with any number of array parameters, and it returns a corresponding number of columns, as if UNNEST (Section 9.18) had been called on each parameter separately and combined using the ROWS FROM construct.

UNNEST( array_expression [, ... ] ) [WITH ORDINALITY] [[AS] table_alias [(column_alias [, ... ])]]

If no table_alias is specified, the function name is used as the table name; in the case of a ROWS FROM() construct, the first function's name is used.

If column aliases are not supplied, then for a function returning a base data type, the column name is also the same as the function name. For a function returning a composite type, the result columns get the names of the individual attributes of the type.

Some examples:

CREATE TABLE foo (fooid int, foosubid int, fooname text);

CREATE FUNCTION getfoo(int) RETURNS SETOF foo AS $$
    SELECT * FROM foo WHERE fooid = $1;
$$ LANGUAGE SQL;

SELECT * FROM getfoo(1) AS t1;

SELECT * FROM foo
    WHERE foosubid IN (
                        SELECT foosubid
                        FROM getfoo(foo.fooid) z
                        WHERE z.fooid = foo.fooid
                      );

CREATE VIEW vw_getfoo AS SELECT * FROM getfoo(1);

SELECT * FROM vw_getfoo;
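The WITH ORDINALITY and ROWS FROM forms described above can be tried with the built-in functions unnest and generate_series. The alias and column names below are arbitrary choices for this sketch.

```sql
-- WITH ORDINALITY appends a bigint row counter; here it is renamed to n.
SELECT * FROM unnest(ARRAY['a', 'b', 'c']) WITH ORDINALITY AS t(letter, n);

-- ROWS FROM runs the functions side by side. The second series is shorter,
-- so its column is padded with null in the third row.
SELECT * FROM ROWS FROM (generate_series(1, 3), generate_series(10, 11)) AS t(a, b);

-- unnest with several arrays behaves like ROWS FROM over separate unnest calls.
SELECT * FROM unnest(ARRAY[1, 2, 3], ARRAY['x', 'y']) AS t(num, label);
```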

In some cases it is useful to define table functions that can return different column sets depending on how they are invoked. To support this, the table function can be declared as returning the pseudo-type record. When such a function is used in a query, the expected row structure must be specified in the query itself, so that the system can know how to parse and plan the query. This syntax looks like:

function_call [AS] alias (column_definition [, ... ])
function_call AS [alias] (column_definition [, ... ])
ROWS FROM( ... function_call AS (column_definition [, ... ]) [, ... ] )

When not using the ROWS FROM() syntax, the column_definition list replaces the column alias list that could otherwise be attached to the FROM item; the names in the column definitions serve as column aliases. When using the ROWS FROM() syntax, a column_definition list can be attached to each member function separately; or if there is only one member function and no WITH ORDINALITY clause, a column_definition list can be written in place of a column alias list following ROWS FROM().

Consider this example:

SELECT *
    FROM dblink('dbname=mydb', 'SELECT proname, prosrc FROM pg_proc')
      AS t1(proname name, prosrc text)
    WHERE proname LIKE 'bytea%';

The dblink function (part of the dblink module) executes a remote query. It is declared to return record since it might be used for any kind of query. The actual column set must be specified in the calling query so that the parser knows, for example, what * should expand to.
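The same column definition syntax can be tried without an extension module. As a sketch, the built-in function json_to_record is also declared to return record, so the caller has to spell out the row structure; the alias x and the column names are chosen here to match the keys of the sample JSON object.

```sql
-- The column definition list tells the parser the names and types to expect.
SELECT *
    FROM json_to_record('{"a": 1, "b": "foo"}') AS x(a int, b text);
```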

7.2.1.5. LATERAL Subqueries

Subqueries appearing in FROM can be preceded by the key word LATERAL. This allows them to reference columns provided by preceding FROM items. (Without LATERAL, each subquery is evaluated independently and so cannot cross-reference any other FROM item.)

Table functions appearing in FROM can also be preceded by the key word LATERAL, but for functions the key word is optional; the function's arguments can contain references to columns provided by preceding FROM items in any case.

A LATERAL item can appear at top level in the FROM list, or within a JOIN tree. In the latter case it can also refer to any items that are on the left-hand side of a JOIN that it is on the right-hand side of.

When a FROM item contains LATERAL cross-references, evaluation proceeds as follows: for each row of the FROM item providing the cross-referenced column(s), or set of rows of multiple FROM items providing the columns, the LATERAL item is evaluated using that row or row set's values of the columns. The resulting row(s) are joined as usual with the rows they were computed from. This is repeated for each row or set of rows from the column source table(s).

A trivial example of LATERAL is

SELECT * FROM foo, LATERAL (SELECT * FROM bar WHERE bar.id = foo.bar_id) ss;

This is not especially useful since it has exactly the same result as the more conventional

SELECT * FROM foo, bar WHERE bar.id = foo.bar_id;

LATERAL is primarily useful when the cross-referenced column is necessary for computing the row(s) to be joined. A common application is providing an argument value for a set-returning function. For example, supposing that vertices(polygon) returns the set of vertices of a polygon, we could identify close-together vertices of polygons stored in a table with:

SELECT p1.id, p2.id, v1, v2
FROM polygons p1, polygons p2,
     LATERAL vertices(p1.poly) v1,
     LATERAL vertices(p2.poly) v2
WHERE (v1 <-> v2) < 10 AND p1.id != p2.id;

This query could also be written

SELECT p1.id, p2.id, v1, v2
FROM polygons p1 CROSS JOIN LATERAL vertices(p1.poly) v1,
     polygons p2 CROSS JOIN LATERAL vertices(p2.poly) v2
WHERE (v1 <-> v2) < 10 AND p1.id != p2.id;

or in several other equivalent formulations. (As already mentioned, the LATERAL key word is unnecessary in this example, but we use it for clarity.)

It is often particularly handy to LEFT JOIN to a LATERAL subquery, so that source rows will appear in the result even if the LATERAL subquery produces no rows for them. For example, if get_product_names() returns the names of products made by a manufacturer, but some manufacturers in our table currently produce no products, we could find out which ones those are like this:

SELECT m.name
FROM manufacturers m LEFT JOIN LATERAL get_product_names(m.id) pname ON true
WHERE pname IS NULL;
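Another frequent use of LEFT JOIN LATERAL is to fetch a limited number of related rows per source row. The sketch below assumes a hypothetical products table with columns manufacturer_id, name, and price; none of these names come from the examples above.

```sql
-- For each manufacturer, list up to three of its most expensive products.
-- The subquery is re-evaluated for every row of m, and manufacturers
-- without products still appear, with nulls in the product columns.
SELECT m.name AS manufacturer, p.name AS product, p.price
FROM manufacturers m
LEFT JOIN LATERAL (
    SELECT name, price
    FROM products
    WHERE products.manufacturer_id = m.id
    ORDER BY price DESC
    LIMIT 3
) p ON true;
```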

7.2.2. The `WHERE` Clause

The syntax of the WHERE Clause is

WHERE search_condition

where search_condition is any value expression (see Section 4.2) that returns a value of type boolean.

After the processing of the FROM clause is done, each row of the derived virtual table is checked against the search condition. If the result of the condition is true, the row is kept in the output table, otherwise (i.e., if the result is false or null) it is discarded. The search condition typically references at least one column of the table generated in the FROM clause; this is not required, but otherwise the WHERE clause will be fairly useless.
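The treatment of null results is easy to overlook. The following sketch uses an inline VALUES list (the alias v and column c1 are arbitrary) to show that a row whose condition evaluates to null is dropped by a condition and by its negation alike.

```sql
SELECT c1 FROM (VALUES (1), (NULL), (10)) AS v(c1) WHERE c1 > 5;        -- keeps 10
SELECT c1 FROM (VALUES (1), (NULL), (10)) AS v(c1) WHERE NOT (c1 > 5);  -- keeps 1
SELECT c1 FROM (VALUES (1), (NULL), (10)) AS v(c1) WHERE c1 IS NULL;    -- keeps the null row
```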

Note

The join condition of an inner join can be written either in the WHERE clause or in the JOIN clause. For example, these table expressions are equivalent:

FROM a, b WHERE a.id = b.id AND b.val > 5

and:

FROM a INNER JOIN b ON (a.id = b.id) WHERE b.val > 5

or perhaps even:

FROM a NATURAL JOIN b WHERE b.val > 5

Which one of these you use is mainly a matter of style. The JOIN syntax in the FROM clause is probably not as portable to other SQL database management systems, even though it is in the SQL standard. For outer joins there is no choice: they must be done in the FROM clause. The ON or USING clause of an outer join is not equivalent to a WHERE condition, because it results in the addition of rows (for unmatched input rows) as well as the removal of rows in the final result.

Here are some examples of WHERE clauses:

SELECT ... FROM fdt WHERE c1 > 5

SELECT ... FROM fdt WHERE c1 IN (1, 2, 3)

SELECT ... FROM fdt WHERE c1 IN (SELECT c1 FROM t2)

SELECT ... FROM fdt WHERE c1 IN (SELECT c3 FROM t2 WHERE c2 = fdt.c1 + 10)

SELECT ... FROM fdt WHERE c1 BETWEEN (SELECT c3 FROM t2 WHERE c2 = fdt.c1 + 10) AND 100

SELECT ... FROM fdt WHERE EXISTS (SELECT c1 FROM t2 WHERE c2 > fdt.c1)

fdt is the table derived in the FROM clause. Rows that do not meet the search condition of the WHERE clause are eliminated from fdt. Notice the use of scalar subqueries as value expressions. Just like any other query, the subqueries can employ complex table expressions. Notice also how fdt is referenced in the subqueries. Qualifying c1 as fdt.c1 is only necessary if c1 is also the name of a column in the derived input table of the subquery. But qualifying the column name adds clarity even when it is not needed. This example shows how the column naming scope of an outer query extends into its inner queries.

7.2.3. The `GROUP BY` and `HAVING` Clauses

After passing the WHERE filter, the derived input table might be subject to grouping, using the GROUP BY clause, and elimination of group rows using the HAVING clause.

SELECT select_list
    FROM ...
    [WHERE ...]
    GROUP BY grouping_column_reference [, grouping_column_reference]...

The GROUP BY Clause is used to group together those rows in a table that have the same values in all the columns listed. The order in which the columns are listed does not matter. The effect is to combine each set of rows having common values into one group row that represents all rows in the group. This is done to eliminate redundancy in the output and/or compute aggregates that apply to these groups. For instance:

=> SELECT * FROM test1;
 x | y
---+---
 a | 3
 c | 2
 b | 5
 a | 1
(4 rows)

=> SELECT x FROM test1 GROUP BY x;
 x
---
 a
 b
 c
(3 rows)

In the second query, we could not have written SELECT * FROM test1 GROUP BY x, because there is no single value for the column y that could be associated with each group. The grouped-by columns can be referenced in the select list since they have a single value in each group.

In general, if a table is grouped, columns that are not listed in GROUP BY cannot be referenced except in aggregate expressions. An example with aggregate expressions is:

=> SELECT x, sum(y) FROM test1 GROUP BY x;
 x | sum
---+-----
 a |   4
 b |   5
 c |   2
(3 rows)

Here sum is an aggregate function that computes a single value over the entire group. More information about the available aggregate functions can be found in Section 9.20.

Tip

Grouping without aggregate expressions effectively calculates the set of distinct values in a column. This can also be achieved using the DISTINCT clause (see Section 7.3.3).

Here is another example: it calculates the total sales for each product (rather than the total sales of all products):

SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
    FROM products p LEFT JOIN sales s USING (product_id)
    GROUP BY product_id, p.name, p.price;

In this example, the columns product_id, p.name, and p.price must be in the GROUP BY clause since they are referenced in the query select list (but see below). The column s.units does not have to be in the GROUP BY list since it is only used in an aggregate expression (sum(...)), which represents the sales of a product. For each product, the query returns a summary row about all sales of the product.

If the products table is set up so that, say, product_id is the primary key, then it would be enough to group by product_id in the above example, since name and price would be functionally dependent on the product ID, and so there would be no ambiguity about which name and price value to return for each product ID group.
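A sketch of that shorter form follows. The table definitions are hypothetical and exist only to make the primary key explicit; the query is the earlier sales example with a reduced GROUP BY list.

```sql
-- Hypothetical schema: product_id is declared as the primary key.
CREATE TABLE products (
    product_id integer PRIMARY KEY,
    name       text,
    price      numeric
);
CREATE TABLE sales (
    product_id integer REFERENCES products,
    units      integer
);

-- name and price are functionally dependent on the primary key,
-- so they may appear in the select list without being grouped.
SELECT p.product_id, p.name, (sum(s.units) * p.price) AS sales
    FROM products p LEFT JOIN sales s USING (product_id)
    GROUP BY p.product_id;
```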

In strict SQL, GROUP BY can only group by columns of the source table but PostgreSQL extends this to also allow GROUP BY to group by columns in the select list. Grouping by value expressions instead of simple column names is also allowed.
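Both extensions can be sketched with the test1 table shown earlier, assuming its y column is an integer. The output name parity is made up for this example.

```sql
-- Group by an output column name from the select list.
SELECT y % 2 AS parity, count(*) FROM test1 GROUP BY parity;

-- Group by a value expression rather than a plain column.
SELECT upper(x), sum(y) FROM test1 GROUP BY upper(x);
```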

If a table has been grouped using GROUP BY, but only certain groups are of interest, the HAVING clause can be used, much like a WHERE clause, to eliminate groups from the result. The syntax is:

SELECT select_list FROM ... [WHERE ...] GROUP BY ... HAVING boolean_expression

Expressions in the HAVING clause can refer both to grouped expressions and to ungrouped expressions (which necessarily involve an aggregate function).

Example:

=> SELECT x, sum(y) FROM test1 GROUP BY x HAVING sum(y) > 3;
 x | sum
---+-----
 a |   4
 b |   5
(2 rows)

=> SELECT x, sum(y) FROM test1 GROUP BY x HAVING x < 'c';
 x | sum
---+-----
 a |   4
 b |   5
(2 rows)

Again, a more realistic example:

SELECT product_id, p.name, (sum(s.units) * (p.price - p.cost)) AS profit
    FROM products p LEFT JOIN sales s USING (product_id)
    WHERE s.date > CURRENT_DATE - INTERVAL '4 weeks'
    GROUP BY product_id, p.name, p.price, p.cost
    HAVING sum(p.price * s.units) > 5000;

In the example above, the WHERE clause is selecting rows by a column that is not grouped (the expression is only true for sales during the last four weeks), while the HAVING clause restricts the output to groups with total gross sales over 5000. Note that the aggregate expressions do not necessarily need to be the same in all parts of the query.

If a query contains aggregate function calls, but no GROUP BY clause, grouping still occurs: the result is a single group row (or perhaps no rows at all, if the single row is then eliminated by HAVING). The same is true if it contains a HAVING clause, even without any aggregate function calls or GROUP BY clause.
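The difference shows up most clearly when no input rows survive the WHERE clause. A small sketch with the test1 table from above:

```sql
-- No GROUP BY: a single group row is produced even for empty input.
SELECT count(*), sum(y) FROM test1 WHERE false;        -- one row: 0, null

-- With GROUP BY: no input rows means no groups, hence no output rows.
SELECT x, count(*) FROM test1 WHERE false GROUP BY x;  -- zero rows
```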

7.2.4. `GROUPING SETS`, `CUBE`, and `ROLLUP`

More complex grouping operations than those described above are possible using the concept of grouping sets. The data selected by the FROM and WHERE clauses is grouped separately by each specified grouping set, aggregates computed for each group just as for simple GROUP BY clauses, and then the results returned. For example:

=> SELECT * FROM items_sold;
 brand | size | sales
-------+------+-------
 Foo   | L    |  10
 Foo   | M    |  20
 Bar   | M    |  15
 Bar   | L    |  5
(4 rows)

=> SELECT brand, size, sum(sales) FROM items_sold GROUP BY GROUPING SETS ((brand), (size), ());
 brand | size | sum
-------+------+-----
 Foo   |      |  30
 Bar   |      |  20
       | L    |  15
       | M    |  35
       |      |  50
(5 rows)

Each sublist of GROUPING SETS may specify zero or more columns or expressions and is interpreted the same way as though it were directly in the GROUP BY clause. An empty grouping set means that all rows are aggregated down to a single group (which is output even if no input rows were present), as described above for the case of aggregate functions with no GROUP BY clause.

References to the grouping columns or expressions are replaced by null values in result rows for grouping sets in which those columns do not appear. To distinguish which grouping a particular output row resulted from, see Table 9.56.
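As a sketch of how the rows can be told apart, the GROUPING operation returns a bit mask with a 1 bit for every listed expression that is not part of the current grouping set. The column alias grp is arbitrary.

```sql
-- grp is 1 for the per-brand rows (size not grouped), 2 for the per-size
-- rows (brand not grouped), and 3 for the grand total (neither grouped).
SELECT brand, size, sum(sales), GROUPING(brand, size) AS grp
FROM items_sold
GROUP BY GROUPING SETS ((brand), (size), ());
```

This is useful when the grouping columns can themselves contain null values, because a null in the output is then ambiguous.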

A shorthand notation is provided for specifying two common types of grouping set. A clause of the form

ROLLUP ( e1, e2, e3, ... )

represents the given list of expressions and all prefixes of the list including the empty list; thus it is equivalent to

GROUPING SETS (
    ( e1, e2, e3, ... ),
    ...
    ( e1, e2 ),
    ( e1 ),
    ( )
)

This is commonly used for analysis over hierarchical data; e.g. total salary by department, division, and company-wide total.
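Applied to the items_sold table from above, a ROLLUP can be sketched like this:

```sql
-- Equivalent to GROUPING SETS ((brand, size), (brand), ()).
SELECT brand, size, sum(sales)
FROM items_sold
GROUP BY ROLLUP (brand, size);
```

With the four sample rows this should yield seven rows: four brand/size combinations, two per-brand subtotals, and one grand total. The row order is not guaranteed without an ORDER BY clause.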

A clause of the form

CUBE ( e1, e2, ... )

represents the given list and all of its possible subsets (i.e. the power set). Thus

CUBE ( a, b, c )

is equivalent to

GROUPING SETS (
    ( a, b, c ),
    ( a, b    ),
    ( a,    c ),
    ( a       ),
    (    b, c ),
    (    b    ),
    (       c ),
    (         )
)
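The same sample table can be used to sketch CUBE:

```sql
-- Equivalent to GROUPING SETS ((brand, size), (brand), (size), ()).
SELECT brand, size, sum(sales)
FROM items_sold
GROUP BY CUBE (brand, size);
```

Compared with a rollup over the same columns, the cube adds the per-size subtotals, so the four sample rows should produce nine result rows.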

The individual elements of a CUBE or ROLLUP clause may be either individual expressions, or sublists of elements in parentheses. In the latter case, the sublists are treated as single units for the purposes of generating the individual grouping sets. For example:

CUBE ( (a, b), (c, d) )

is equivalent to

GROUPING SETS (
    ( a, b, c, d ),
    ( a, b       ),
    (       c, d ),
    (            )
)

and

ROLLUP ( a, (b, c), d )

is equivalent to

GROUPING SETS (
    ( a, b, c, d ),
    ( a, b, c    ),
    ( a          ),
    (            )
)

The CUBE and ROLLUP constructs can be used either directly in the GROUP BY clause, or nested inside a GROUPING SETS clause. If one GROUPING SETS clause is nested inside another, the effect is the same as if all the elements of the inner clause had been written directly in the outer clause.

If multiple grouping items are specified in a single GROUP BY clause, then the final list of grouping sets is the cross product of the individual items. For example:

GROUP BY a, CUBE (b, c), GROUPING SETS ((d), (e))

is equivalent to

GROUP BY GROUPING SETS (
    (a, b, c, d), (a, b, c, e),
    (a, b, d),    (a, b, e),
    (a, c, d),    (a, c, e),
    (a, d),       (a, e)
)

Note

The construct (a, b) is normally recognized in expressions as a row constructor. Within the GROUP BY clause, this does not apply at the top levels of expressions, and (a, b) is parsed as a list of expressions as described above. If for some reason you need a row constructor in a grouping expression, use ROW(a, b).

7.2.5. Window Function Processing

If the query contains any window functions (see Section 3.5, Section 9.21 and Section 4.2.8), these functions are evaluated after any grouping, aggregation, and HAVING filtering is performed. That is, if the query uses any aggregates, GROUP BY, or HAVING, then the rows seen by the window functions are the group rows instead of the original table rows from FROM/WHERE.

When multiple window functions are used, all the window functions having syntactically equivalent PARTITION BY and ORDER BY clauses in their window definitions are guaranteed to be evaluated in a single pass over the data. Therefore they will see the same sort ordering, even if the ORDER BY does not uniquely determine an ordering. However, no guarantees are made about the evaluation of functions having different PARTITION BY or ORDER BY specifications. (In such cases a sort step is typically required between the passes of window function evaluations, and the sort is not guaranteed to preserve ordering of rows that its ORDER BY sees as equivalent.)

Currently, window functions always require presorted data, and so the query output will be ordered according to one or another of the window functions' PARTITION BY/ORDER BY clauses. It is not recommended to rely on this, however. Use an explicit top-level ORDER BY clause if you want to be sure the results are sorted in a particular way.
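Both points can be sketched with the test1 table used in Section 7.2.3. The window function below ranks the group rows, not the original table rows, and the query states its output order explicitly; the aliases total and total_rank are arbitrary.

```sql
-- sum(y) is computed per group first; rank() then runs over the
-- three group rows. The final ORDER BY fixes the output order.
SELECT x,
       sum(y) AS total,
       rank() OVER (ORDER BY sum(y) DESC) AS total_rank
FROM test1
GROUP BY x
ORDER BY total_rank;
```

With the sample data, b (total 5) should rank first, followed by a (4) and c (2).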

9.11. Geometric Functions and Operators

The geometric types point, box, lseg, line, path, polygon, and circle have a large set of native support functions and operators, shown in Table 9.33, Table 9.34, and Table 9.35.

Caution

Note that the "same as" operator, ~=, represents the usual notion of equality for the point, box, polygon, and circle types. Some of these types also have an = operator, but = compares for equal areas only. The other scalar comparison operators (<= and so on) likewise compare areas for these types.
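A quick sketch of the difference, using two unit squares at different positions:

```sql
-- Both boxes have area 1, so = reports true, while ~= compares the
-- actual corner points and reports false.
SELECT box '((0,0),(1,1))' =  box '((2,2),(3,3))' AS equal_area,
       box '((0,0),(1,1))' ~= box '((2,2),(3,3))' AS same_box;
```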

Table 9.33. Geometric Operators

| Operator | Description | Example |
| --- | --- | --- |
| `+` | Translation | `box '((0,0),(1,1))' + point '(2.0,0)'` |
| `-` | Translation | `box '((0,0),(1,1))' - point '(2.0,0)'` |
| `*` | Scaling/rotation | `box '((0,0),(1,1))' * point '(2.0,0)'` |
| `/` | Scaling/rotation | `box '((0,0),(2,2))' / point '(2.0,0)'` |
| `#` | Point or box of intersection | `box '((1,-1),(-1,1))' # box '((1,1),(-2,-2))'` |
| `#` | Number of points in path or polygon | `# path '((1,0),(0,1),(-1,0))'` |
| `@-@` | Length or circumference | `@-@ path '((0,0),(1,0))'` |
| `@@` | Center | `@@ circle '((0,0),10)'` |
| `##` | Closest point to first operand on second operand | `point '(0,0)' ## lseg '((2,0),(0,2))'` |
| `<->` | Distance between | `circle '((0,0),1)' <-> circle '((5,0),1)'` |
| `&&` | Overlaps? (One point in common makes this true.) | `box '((0,0),(1,1))' && box '((0,0),(2,2))'` |
| `<<` | Is strictly left of? | `circle '((0,0),1)' << circle '((5,0),1)'` |
| `>>` | Is strictly right of? | `circle '((5,0),1)' >> circle '((0,0),1)'` |
| `&<` | Does not extend to the right of? | `box '((0,0),(1,1))' &< box '((0,0),(2,2))'` |
| `&>` | Does not extend to the left of? | `box '((0,0),(3,3))' &> box '((0,0),(2,2))'` |
| `<<\|` | Is strictly below? | `box '((0,0),(3,3))' <<\| box '((3,4),(5,5))'` |
| `\|>>` | Is strictly above? | `box '((3,4),(5,5))' \|>> box '((0,0),(3,3))'` |
| `&<\|` | Does not extend above? | `box '((0,0),(1,1))' &<\| box '((0,0),(2,2))'` |
| `\|&>` | Does not extend below? | `box '((0,0),(3,3))' \|&> box '((0,0),(2,2))'` |
| `<^` | Is below (allows touching)? | `circle '((0,0),1)' <^ circle '((0,5),1)'` |
| `>^` | Is above (allows touching)? | `circle '((0,5),1)' >^ circle '((0,0),1)'` |
| `?#` | Intersects? | `lseg '((-1,0),(1,0))' ?# box '((-2,-2),(2,2))'` |
| `?-` | Is horizontal? | `?- lseg '((-1,0),(1,0))'` |
| `?-` | Are horizontally aligned? | `point '(1,0)' ?- point '(0,0)'` |
| `?\|` | Is vertical? | `?\| lseg '((-1,0),(1,0))'` |
| `?\|` | Are vertically aligned? | `point '(0,1)' ?\| point '(0,0)'` |
| `?-\|` | Is perpendicular? | `lseg '((0,0),(0,1))' ?-\| lseg '((0,0),(1,0))'` |
| `?\|\|` | Are parallel? | `lseg '((-1,0),(1,0))' ?\|\| lseg '((-1,2),(1,2))'` |
| `@>` | Contains? | `circle '((0,0),2)' @> point '(1,1)'` |
| `<@` | Contained in or on? | `point '(1,1)' <@ circle '((0,0),2)'` |
| `~=` | Same as? | `polygon '((0,0),(1,1))' ~= polygon '((1,1),(0,0))'` |

Note

Before PostgreSQL 8.2, the containment operators @> and <@ were respectively called ~ and @. These names are still available, but are deprecated and will eventually be removed.

Table 9.34. Geometric Functions

| Function | Return Type | Description | Example |
| --- | --- | --- | --- |
| `area(object)` | `double precision` | area | `area(box '((0,0),(1,1))')` |
| `center(object)` | `point` | center | `center(box '((0,0),(1,2))')` |
| `diameter(circle)` | `double precision` | diameter of circle | `diameter(circle '((0,0),2.0)')` |
| `height(box)` | `double precision` | vertical size of box | `height(box '((0,0),(1,1))')` |
| `isclosed(path)` | `boolean` | a closed path? | `isclosed(path '((0,0),(1,1),(2,0))')` |
| `isopen(path)` | `boolean` | an open path? | `isopen(path '[(0,0),(1,1),(2,0)]')` |
| `length(object)` | `double precision` | length | `length(path '((-1,0),(1,0))')` |
| `npoints(path)` | `int` | number of points | `npoints(path '[(0,0),(1,1),(2,0)]')` |
| `npoints(polygon)` | `int` | number of points | `npoints(polygon '((1,1),(0,0))')` |
| `pclose(path)` | `path` | convert path to closed | `pclose(path '[(0,0),(1,1),(2,0)]')` |
| `popen(path)` | `path` | convert path to open | `popen(path '((0,0),(1,1),(2,0))')` |
| `radius(circle)` | `double precision` | radius of circle | `radius(circle '((0,0),2.0)')` |
| `width(box)` | `double precision` | horizontal size of box | `width(box '((0,0),(1,1))')` |

Table 9.35. Geometric Type Conversion Functions

| Function | Return Type | Description | Example |
| --- | --- | --- | --- |
| `box(circle)` | `box` | circle to box | `box(circle '((0,0),2.0)')` |
| `box(point)` | `box` | point to empty box | `box(point '(0,0)')` |
| `box(point, point)` | `box` | points to box | `box(point '(0,0)', point '(1,1)')` |
| `box(polygon)` | `box` | polygon to box | `box(polygon '((0,0),(1,1),(2,0))')` |
| `bound_box(box, box)` | `box` | boxes to bounding box | `bound_box(box '((0,0),(1,1))', box '((3,3),(4,4))')` |
| `circle(box)` | `circle` | box to circle | `circle(box '((0,0),(1,1))')` |
| `circle(point, double precision)` | `circle` | center and radius to circle | `circle(point '(0,0)', 2.0)` |
| `circle(polygon)` | `circle` | polygon to circle | `circle(polygon '((0,0),(1,1),(2,0))')` |
| `line(point, point)` | `line` | points to line | `line(point '(-1,0)', point '(1,0)')` |
| `lseg(box)` | `lseg` | box diagonal to line segment | `lseg(box '((-1,0),(1,0))')` |
| `lseg(point, point)` | `lseg` | points to line segment | `lseg(point '(-1,0)', point '(1,0)')` |
| `path(polygon)` | `path` | polygon to path | `path(polygon '((0,0),(1,1),(2,0))')` |
| `point(double precision, double precision)` | `point` | construct point | `point(23.4, -44.5)` |
| `point(box)` | `point` | center of box | `point(box '((-1,0),(1,0))')` |
| `point(circle)` | `point` | center of circle | `point(circle '((0,0),2.0)')` |
| `point(lseg)` | `point` | center of line segment | `point(lseg '((-1,0),(1,0))')` |
| `point(polygon)` | `point` | center of polygon | `point(polygon '((0,0),(1,1),(2,0))')` |
| `polygon(box)` | `polygon` | box to 4-point polygon | `polygon(box '((0,0),(1,1))')` |
| `polygon(circle)` | `polygon` | circle to 12-point polygon | `polygon(circle '((0,0),2.0)')` |
| `polygon(npts, circle)` | `polygon` | circle to npts-point polygon | `polygon(12, circle '((0,0),2.0)')` |
| `polygon(path)` | `polygon` | path to polygon | `polygon(path '((0,0),(1,1),(2,0))')` |

It is possible to access the two component numbers of a point as though the point were an array with indexes 0 and 1. For example, if t.p is a point column then SELECT p[0] FROM t retrieves the X coordinate and UPDATE t SET p[1] = ... changes the Y coordinate. In the same way, a value of type box or lseg can be treated as an array of two point values.
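A self-contained sketch of the subscript notation, using an inline VALUES list in place of a real table (the alias t and column p mirror the names used in the text):

```sql
-- p[0] is the X coordinate and p[1] the Y coordinate of the point.
SELECT p[0] AS x, p[1] AS y
FROM (VALUES (point '(3,4)')) AS t(p);
```

For the sample point this should return x = 3 and y = 4.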

The area function works for the types box, circle, and path. The area function only works on the path data type if the points in the path are non-intersecting. For example, the path '((0,0),(0,1),(2,1),(2,2),(1,2),(1,0),(0,0))'::PATH will not work; however, the following visually identical path '((0,0),(0,1),(1,1),(1,2),(2,2),(2,1),(1,1),(1,0),(0,0))'::PATH will work. If the concept of an intersecting versus non-intersecting path is confusing, draw both of the above paths side by side on a piece of graph paper.

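9.8. Data Type Formatting Functions

The PostgreSQL formatting functions provide a powerful set of tools for converting various data types (date/time, integer, floating point, numeric) to formatted strings and for converting from formatted strings to specific data types. Table 9.24 lists them. These functions all follow a common calling convention: the first argument is the value to be formatted and the second argument is a template that defines the output or input format.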

Table 9.24. Formatting Functions

| Function | Return Type | Description | Example |
| --- | --- | --- | --- |
| `to_char(timestamp, text)` | `text` | convert time stamp to string | `to_char(current_timestamp, 'HH12:MI:SS')` |
| `to_char(interval, text)` | `text` | convert interval to string | `to_char(interval '15h 2m 12s', 'HH24:MI:SS')` |
| `to_char(int, text)` | `text` | convert integer to string | `to_char(125, '999')` |
| `to_char(double precision, text)` | `text` | convert real/double precision to string | `to_char(125.8::real, '999D9')` |
| `to_char(numeric, text)` | `text` | convert numeric to string | `to_char(-125.8, '999D99S')` |
| `to_date(text, text)` | `date` | convert string to date | `to_date('05 Dec 2000', 'DD Mon YYYY')` |
| `to_number(text, text)` | `numeric` | convert string to numeric | `to_number('12,454.8-', '99G999D9S')` |
| `to_timestamp(text, text)` | `timestamp with time zone` | convert string to time stamp | `to_timestamp('05 Dec 2000', 'DD Mon YYYY')` |

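Note

There is also a single-argument to_timestamp function; see Table 9.31.

Tip

to_timestamp and to_date exist to handle input formats that cannot be converted by simple casting. For most standard date/time formats, simply casting the source string to the required data type works, and is much easier. Similarly, to_number is unnecessary for standard numeric representations.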
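The contrast can be sketched as follows. The first statement uses a plain cast for an ISO-formatted string. The second uses a day-first layout, which a cast would interpret according to the DateStyle setting, so an explicit template removes the ambiguity.

```sql
-- Standard ISO format: a cast is enough.
SELECT '2000-12-05'::date;

-- Day-first layout: state the field order explicitly with a template.
SELECT to_date('05-12-2000', 'DD-MM-YYYY');
```

Both statements should return the date 2000-12-05.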

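In a to_char output template string, there are certain patterns that are recognized and replaced with appropriately formatted data based on the given value. Any text that is not a template pattern is simply copied verbatim. Similarly, in an input template string (for the other functions), template patterns identify the values to be supplied by the input data string. If there are characters in the template string that are not template patterns, the corresponding characters in the input data string are simply skipped over (whether or not they are equal to the template string characters).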
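A short sketch of how patterns and literal text mix in a to_char template. In the first query, the hyphens, the space, and the colon are not patterns and are copied as they are. The second query relies on the rule that text inside double quotes is always treated as literal text, which protects letters that would otherwise be read as patterns.

```sql
-- Patterns are replaced; punctuation between them is copied verbatim.
SELECT to_char(timestamp '2000-12-05 14:30:00', 'YYYY-MM-DD HH24:MI');

-- Double-quoted text is output literally, even though it contains
-- letters (such as D) that could otherwise be taken for patterns.
SELECT to_char(timestamp '2000-12-05 14:30:00', '"Date:" YYYY-MM-DD');
```

The first query should produce 2000-12-05 14:30 and the second Date: 2000-12-05.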

Table 9.25 shows the template patterns available for formatting date and time values.

Table 9.25. Template Patterns for Date/Time Formatting

| Pattern | Description |
| --- | --- |
| HH | hour of day (01-12) |
| HH12 | hour of day (01-12) |
| HH24 | hour of day (00-23) |
| MI | minute (00-59) |
| SS | second (00-59) |
| MS | millisecond (000-999) |
| US | microsecond (000000-999999) |
| SSSS | seconds past midnight (0-86399) |
| AM, am, PM or pm | meridiem indicator (without periods) |
| A.M., a.m., P.M. or p.m. | meridiem indicator (with periods) |
| Y,YYY | year (4 or more digits) with comma |
| YYYY | year (4 or more digits) |
| YYY | last 3 digits of year |
| YY | last 2 digits of year |
| Y | last digit of year |
| IYYY | ISO 8601 week-numbering year (4 or more digits) |
| IYY | last 3 digits of ISO 8601 week-numbering year |
| IY | last 2 digits of ISO 8601 week-numbering year |
| I | last digit of ISO 8601 week-numbering year |
| BC, bc, AD or ad | era indicator (without periods) |
| B.C., b.c., A.D. or a.d. | era indicator (with periods) |
| MONTH | full upper case month name (blank-padded to 9 chars) |
| Month | full capitalized month name (blank-padded to 9 chars) |
| month | full lower case month name (blank-padded to 9 chars) |
| MON | abbreviated upper case month name (3 chars in English, localized lengths vary) |
| Mon | abbreviated capitalized month name (3 chars in English, localized lengths vary) |
| mon | abbreviated lower case month name (3 chars in English, localized lengths vary) |
| MM | month number (01-12) |
| DAY | full upper case day name (blank-padded to 9 chars) |
| Day | full capitalized day name (blank-padded to 9 chars) |
| day | full lower case day name (blank-padded to 9 chars) |
| DY | abbreviated upper case day name (3 chars in English, localized lengths vary) |
| Dy | abbreviated capitalized day name (3 chars in English, localized lengths vary) |
| dy | abbreviated lower case day name (3 chars in English, localized lengths vary) |
| DDD | day of year (001-366) |
| IDDD | day of ISO 8601 week-numbering year (001-371; day 1 of the year is Monday of the first ISO week) |
| DD | day of month (01-31) |
| D | day of the week, Sunday (1) to Saturday (7) |
| ID | ISO 8601 day of the week, Monday (1) to Sunday (7) |
| W | week of month (1-5) (the first week starts on the first day of the month) |
| WW | week number of year (1-53) (the first week starts on the first day of the year) |
| IW | week number of ISO 8601 week-numbering year (01-53; the first Thursday of the year is in week 1) |
| CC | century (2 digits) (the twenty-first century starts on 2001-01-01) |
| J | Julian Day (integer days since November 24, 4714 BC at midnight UTC) |
| Q | quarter |
| RM | month in upper case Roman numerals (I-XII; I=January) |
| rm | month in lower case Roman numerals (i-xii; i=January) |
| TZ | upper case time-zone abbreviation (only supported in to_char) |
| tz | lower case time-zone abbreviation (only supported in to_char) |
| TZH | time-zone hours |
| TZM | time-zone minutes |
| OF | time-zone offset from UTC (only supported in to_char) |

Modifiers can be applied to any template pattern to alter its behavior. For example, FMMonth is the Month pattern with the FM modifier. Table 9.26 shows the modifier patterns for date/time formatting.

Table 9.26. Template Pattern Modifiers for Date/Time Formatting

| Modifier | Description | Example |
| --- | --- | --- |
| FM prefix | fill mode (suppress leading zeroes and padding blanks) | FMMonth |
| TH suffix | upper case ordinal number suffix | DDTH, e.g., 12TH |
| th suffix | lower case ordinal number suffix | DDth, e.g., 12th |
| FX prefix | fixed format global option (see usage notes) | FX Month DD Day |
| TM prefix | translation mode (print localized day and month names based on lc_time) | TMMonth |
| SP suffix | spell mode (not implemented) | DDSP |

Usage notes for date/time formatting:

FM suppresses leading zeroes and trailing blanks that would otherwise be added to make the output of a pattern be fixed-width. In PostgreSQL, FM modifies only the next specification, while in Oracle FM affects all subsequent specifications, and repeated FM modifiers toggle fill mode on and off.
TM does not include trailing blanks. to_timestamp and to_date ignore the TM modifier.
to_timestamp and to_date skip multiple blank spaces at the beginning of the input string and around date and time values unless the FX option is used. For example, `to_timestamp(' 2000    JUN', 'YYYY MON')` and `to_timestamp('2000 - JUN', 'YYYY-MON')` work, but `to_timestamp('2000    JUN', 'FXYYYY MON')` returns an error because to_timestamp expects only a single space. FX must be specified as the first item in the template.
A separator (a space or non-letter/non-digit character) in the template string of to_timestamp and to_date matches any single separator in the input string or is skipped, unless the FX option is used. For example, to_timestamp('2000JUN', 'YYYY///MON') and to_timestamp('2000/JUN', 'YYYY MON') work, but to_timestamp('2000//JUN', 'YYYY/MON') returns an error because the number of separators in the input string exceeds the number of separators in the template.
If FX is specified, a separator in the template string matches exactly one character in the input string. But note that the input string character is not required to be the same as the separator from the template string. For example, `to_timestamp('2000/JUN', 'FXYYYY MON')` works, but `to_timestamp('2000/JUN', 'FXYYYY  MON')` (two spaces in the template) returns an error because the second space in the template string consumes the letter J from the input string.
A TZH template pattern can match a signed number. Without the FX option, minus signs may be ambiguous, and could be interpreted as a separator. This ambiguity is resolved as follows: If the number of separators before TZH in the template string is less than the number of separators before the minus sign in the input string, the minus sign is interpreted as part of TZH. Otherwise, the minus sign is considered to be a separator between values. For example, `to_timestamp('2000 -10', 'YYYY TZH')` matches -10 to TZH, but `to_timestamp('2000 -10', 'YYYY  TZH')` (two spaces in the template) matches 10 to TZH.
Ordinary text is allowed in to_char templates and will be output literally. You can put a substring in double quotes to force it to be interpreted as literal text even if it contains template patterns. For example, in '"Hello Year "YYYY', the YYYY will be replaced by the year data, but the single Y in Year will not be. In to_date, to_number, and to_timestamp, literal text and double-quoted strings result in skipping the number of characters contained in the string; for example "XX" skips two input characters (whether or not they are XX).
Tip
Prior to PostgreSQL 12, it was possible to skip arbitrary text in the input string using non-letter or non-digit characters. For example, to_timestamp('2000y6m1d', 'yyyy-MM-DD') used to work. Now you can only use letter characters for this purpose. For example, to_timestamp('2000y6m1d', 'yyyytMMtDDt') and to_timestamp('2000y6m1d', 'yyyy"y"MM"m"DD"d"') skip y, m, and d.
If you want to have a double quote in the output you must precede it with a backslash, for example '\"YYYY Month\"'. Backslashes are not otherwise special outside of double-quoted strings. Within a double-quoted string, a backslash causes the next character to be taken literally, whatever it is (but this has no special effect unless the next character is a double quote or another backslash).
In to_timestamp and to_date, if the year format specification is less than four digits, e.g. YYY, and the supplied year is less than four digits, the year will be adjusted to be nearest to the year 2020, e.g. 95 becomes 1995.
In to_timestamp and to_date, the YYYY conversion has a restriction when processing years with more than 4 digits. You must use some non-digit character or template after YYYY, otherwise the year is always interpreted as 4 digits. For example (with the year 20000): to_date('200001131', 'YYYYMMDD') will be interpreted as a 4-digit year; instead use a non-digit separator after the year, like to_date('20000-1131', 'YYYY-MMDD') or to_date('20000Nov31', 'YYYYMonDD').
In to_timestamp and to_date, the CC (century) field is accepted but ignored if there is a YYY, YYYY or Y,YYY field. If CC is used with YY or Y then the result is computed as that year in the specified century. If the century is specified but the year is not, the first year of the century is assumed.
In to_timestamp and to_date, weekday names or numbers (DAY, D, and related field types) are accepted but are ignored for purposes of computing the result. The same is true for quarter (Q) fields.
In to_timestamp and to_date, an ISO 8601 week-numbering date (as distinct from a Gregorian date) can be specified in one of two ways:
- Year, week number, and weekday: for example to_date('2006-42-4', 'IYYY-IW-ID') returns the date 2006-10-19. If you omit the weekday it is assumed to be 1 (Monday).
- Year and day of year: for example to_date('2006-291', 'IYYY-IDDD') also returns 2006-10-19.
Attempting to enter a date using a mixture of ISO 8601 week-numbering fields and Gregorian date fields is nonsensical, and will cause an error. In the context of an ISO 8601 week-numbering year, the concept of a “month” or “day of month” has no meaning. In the context of a Gregorian year, the ISO week has no meaning.
Caution
While to_date will reject a mixture of Gregorian and ISO week-numbering date fields, to_char will not, since output format specifications like YYYY-MM-DD (IYYY-IDDD) can be useful. But avoid writing something like IYYY-MM-DD; that would yield surprising results near the start of the year. (See Section 9.9.1 for more information.)
In to_timestamp, millisecond (MS) or microsecond (US) fields are used as the seconds digits after the decimal point. For example to_timestamp('12.3', 'SS.MS') is not 3 milliseconds, but 300, because the conversion treats it as 12 + 0.3 seconds. So, for the format SS.MS, the input values 12.3, 12.30, and 12.300 specify the same number of milliseconds. To get three milliseconds, one must write 12.003, which the conversion treats as 12 + 0.003 = 12.003 seconds.
Here is a more complex example: to_timestamp('15:12:02.020.001230', 'HH24:MI:SS.MS.US') is 15 hours, 12 minutes, and 2 seconds + 20 milliseconds + 1230 microseconds = 2.021230 seconds.
to_char(..., 'ID')'s day of the week numbering matches the extract(isodow from ...) function, but to_char(..., 'D')'s does not match extract(dow from ...)'s day numbering.
to_char(interval) formats HH and HH12 as shown on a 12-hour clock, for example zero hours and 36 hours both output as 12, while HH24 outputs the full hour value, which can exceed 23 in an interval value.
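A few of the usage notes above can be tried directly. The statements below are a minimal sketch that reuses values from the notes; 2006-10-19 was chosen for the weekday check only because the notes already mention it.

```sql
-- ISO 8601 week-numbering date, two equivalent spellings.
SELECT to_date('2006-42-4', 'IYYY-IW-ID') AS from_week_and_day,
       to_date('2006-291', 'IYYY-IDDD')   AS from_day_of_year;  -- both 2006-10-19

-- MS supplies fractional-second digits: '12.3' means 12.300 s, not 12.003 s.
SELECT to_timestamp('12.3', 'SS.MS') = to_timestamp('12.300', 'SS.MS') AS same_instant;

-- Double-quoted text is copied literally, so the Y in "Year" is not a pattern.
SELECT to_char(TIMESTAMP '2000-06-01 00:00:00', '"Hello Year "YYYY');

-- D counts Sunday as 1 while extract(dow ...) counts Sunday as 0;
-- ID and extract(isodow ...) agree with each other.
SELECT to_char(TIMESTAMP '2006-10-19 00:00:00', 'D')   AS d,
       extract(dow FROM TIMESTAMP '2006-10-19 00:00:00')    AS dow,
       to_char(TIMESTAMP '2006-10-19 00:00:00', 'ID')  AS id,
       extract(isodow FROM TIMESTAMP '2006-10-19 00:00:00') AS isodow;

-- For intervals, HH12 wraps onto a 12-hour clock while HH24 shows the full hour count.
SELECT to_char(INTERVAL '36 hours', 'HH12') AS hh12,
       to_char(INTERVAL '36 hours', 'HH24') AS hh24;
```

For the Thursday in the fourth statement, D should report 5 and dow 4, while ID and isodow both report 4.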

Table 9.27 shows the template patterns available for formatting numeric values.

Table 9.27. Template Patterns for Numeric Formatting

| Pattern | Description |
| --- | --- |
| 9 | digit position (can be dropped if insignificant) |
| 0 | digit position (will not be dropped, even if insignificant) |
| . (period) | decimal point |
| , (comma) | group (thousands) separator |
| PR | negative value in angle brackets |
| S | sign anchored to number (uses locale) |
| L | currency symbol (uses locale) |
| D | decimal point (uses locale) |
| G | group separator (uses locale) |
| MI | minus sign in specified position (if number < 0) |
| PL | plus sign in specified position (if number > 0) |
| SG | plus/minus sign in specified position |
| RN | Roman numeral (input between 1 and 3999) |
| TH or th | ordinal number suffix |
| V | shift specified number of digits (see notes) |
| EEEE | exponent for scientific notation |

Usage notes for numeric formatting:

0 specifies a digit position that will always be printed, even if it contains a leading/trailing zero. 9 also specifies a digit position, but if it is a leading zero then it will be replaced by a space, while if it is a trailing zero and fill mode is specified then it will be deleted. (For to_number(), these two pattern characters are equivalent.)
The pattern characters S, L, D, and G represent the sign, currency symbol, decimal point, and thousands separator characters defined by the current locale (see lc_monetary and lc_numeric). The pattern characters period and comma represent those exact characters, with the meanings of decimal point and thousands separator, regardless of locale.
If no explicit provision is made for a sign in to_char()'s pattern, one column will be reserved for the sign, and it will be anchored to (appear just left of) the number. If S appears just left of some 9's, it will likewise be anchored to the number.
A sign formatted using SG, PL, or MI is not anchored to the number; for example, `to_char(-12, 'MI9999')` produces `'-  12'` but `to_char(-12, 'S9999')` produces `'  -12'`. (The Oracle implementation does not allow the use of MI before 9, but rather requires that 9 precede MI.)
TH does not convert values less than zero and does not convert fractional numbers.
PL, SG, and TH are PostgreSQL extensions.
In to_number, if non-data template patterns such as L or TH are used, the corresponding number of input characters are skipped, whether or not they match the template pattern, unless they are data characters (that is, digits, sign, decimal point, or comma). For example, TH would skip two non-data characters.
V with to_char multiplies the input values by 10^n, where n is the number of digits following V. V with to_number divides in a similar manner. to_char and to_number do not support the use of V combined with a decimal point (e.g., 99.9V99 is not allowed).
EEEE (scientific notation) cannot be used in combination with any of the other formatting patterns or modifiers other than digit and decimal point patterns, and must be at the end of the format string (e.g., 9.99EEEE is a valid pattern).
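The numeric notes above can be checked with a handful of calls. This is only a sketch; it sets lc_numeric to C (an assumption made here) so that the S pattern produces a plain minus sign.

```sql
-- Assumption: the C locale, so S yields '-'.
SET lc_numeric TO 'C';

-- MI leaves the sign where it is written; S anchors it to the digits.
SELECT to_char(-12, 'MI9999') AS detached_sign,
       to_char(-12, 'S9999')  AS anchored_sign;

-- 9 drops insignificant trailing zeroes in fill mode, 0 keeps them.
SELECT to_char(148.5, 'FM999.999') AS nines,
       to_char(148.5, 'FM999.990') AS with_zero;

-- V multiplies by 10^n, where n is the number of digits after V.
SELECT to_char(12.4, '99V999') AS shifted;

-- EEEE must come last and combines only with digit and decimal point patterns.
SELECT to_char(0.0004859, '9.99EEEE') AS scientific;

RESET lc_numeric;
```

Comparing detached_sign with anchored_sign shows the padding blanks landing on different sides of the minus sign.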

Certain modifiers can be applied to any template pattern to alter its behavior. For example, FM99.99 is the 99.99 pattern with the FM modifier. Table 9.28 shows the modifier patterns for numeric formatting.

Table 9.28. Template Pattern Modifiers for Numeric Formatting

| Modifier | Description | Example |
| --- | --- | --- |
| FM prefix | fill mode (suppress trailing zeroes and padding blanks) | FM99.99 |
| TH suffix | upper case ordinal number suffix | 999TH |
| th suffix | lower case ordinal number suffix | 999th |

Table 9.29 shows some examples of the use of the to_char function.

Table 9.29. `to_char` Examples

| Expression | Result |
| --- | --- |
| `to_char(current_timestamp, 'Day, DD  HH12:MI:SS')` | `'Tuesday  , 06  05:39:18'` |
| `to_char(current_timestamp, 'FMDay, FMDD  HH12:MI:SS')` | `'Tuesday, 6  05:39:18'` |
| `to_char(-0.1, '99.99')` | `'  -.10'` |
| `to_char(-0.1, 'FM9.99')` | `'-.1'` |
| `to_char(-0.1, 'FM90.99')` | `'-0.1'` |
| `to_char(0.1, '0.9')` | `' 0.1'` |
| `to_char(12, '9990999.9')` | `'    0012.0'` |
| `to_char(12, 'FM9990999.9')` | `'0012.'` |
| `to_char(485, '999')` | `' 485'` |
| `to_char(-485, '999')` | `'-485'` |
| `to_char(485, '9 9 9')` | `' 4 8 5'` |
| `to_char(1485, '9,999')` | `' 1,485'` |
| `to_char(1485, '9G999')` | `' 1 485'` |
| `to_char(148.5, '999.999')` | `' 148.500'` |
| `to_char(148.5, 'FM999.999')` | `'148.5'` |
| `to_char(148.5, 'FM999.990')` | `'148.500'` |
| `to_char(148.5, '999D999')` | `' 148,500'` |
| `to_char(3148.5, '9G999D999')` | `' 3 148,500'` |
| `to_char(-485, '999S')` | `'485-'` |
| `to_char(-485, '999MI')` | `'485-'` |
| `to_char(485, '999MI')` | `'485 '` |
| `to_char(485, 'FM999MI')` | `'485'` |
| `to_char(485, 'PL999')` | `'+485'` |
| `to_char(485, 'SG999')` | `'+485'` |
| `to_char(-485, 'SG999')` | `'-485'` |
| `to_char(-485, '9SG99')` | `'4-85'` |
| `to_char(-485, '999PR')` | `'<485>'` |
| `to_char(485, 'L999')` | `'DM 485'` |
| `to_char(485, 'RN')` | `'        CDLXXXV'` |
| `to_char(485, 'FMRN')` | `'CDLXXXV'` |
| `to_char(5.2, 'FMRN')` | `'V'` |
| `to_char(482, '999th')` | `' 482nd'` |
| `to_char(485, '"Good number:"999')` | `'Good number: 485'` |
| `to_char(485.8, '"Pre:"999" Post:" .999')` | `'Pre: 485 Post: .800'` |
| `to_char(12, '99V999')` | `' 12000'` |
| `to_char(12.4, '99V999')` | `' 12400'` |
| `to_char(12.45, '99V9')` | `' 125'` |
| `to_char(0.0004859, '9.99EEEE')` | `' 4.86e-04'` |

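8.5. Date/Time Types

PostgreSQL supports the full set of SQL date and time types, shown in Table 8.9. The operations available on these data types are described in Section 9.9.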

Table 8.9. Date/Time Types

| Name | Storage Size | Description | Low Value | High Value | Resolution |
| --- | --- | --- | --- | --- | --- |
| timestamp [ (p) ] [ without time zone ] | 8 bytes | both date and time (no time zone) | 4713 BC | 294276 AD | 1 microsecond |
| timestamp [ (p) ] with time zone | 8 bytes | both date and time, with time zone | 4713 BC | 294276 AD | 1 microsecond |
| date | 4 bytes | date (no time of day) | 4713 BC | 5874897 AD | 1 day |
| time [ (p) ] [ without time zone ] | 8 bytes | time of day (no date) | 00:00:00 | 24:00:00 | 1 microsecond |
| time [ (p) ] with time zone | 12 bytes | time of day (no date), with time zone | 00:00:00+1459 | 24:00:00-1459 | 1 microsecond |
| interval [ fields ] [ (p) ] | 16 bytes | time interval | -178000000 years | 178000000 years | 1 microsecond |

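Note

The SQL standard requires that writing just timestamp be equivalent to timestamp without time zone, and PostgreSQL honors that behavior. timestamptz is accepted as an abbreviation for timestamp with time zone; this is a PostgreSQL extension.

time, timestamp, and interval accept an optional precision value p which specifies the number of fractional digits retained in the seconds field. By default, there is no explicit bound on precision. The allowed range of p is from 0 to 6.

The interval type has an additional option, which is to restrict the set of stored fields by writing one of these phrases: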

YEAR
MONTH
DAY
HOUR
MINUTE
SECOND
YEAR TO MONTH
DAY TO HOUR
DAY TO MINUTE
DAY TO SECOND
HOUR TO MINUTE
HOUR TO SECOND
MINUTE TO SECOND

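Note that if both fields and p are specified, the fields must include SECOND, since the precision applies only to the seconds.

The type time with time zone is defined by the SQL standard, but the definition exhibits properties which lead to questionable usefulness. In most cases, a combination of date, time, timestamp without time zone, and timestamp with time zone should provide the complete range of date/time functionality required by any application.

The types abstime and reltime are lower-precision types that are used internally. You are discouraged from using these types in applications; these internal types might disappear in a future release.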

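8.5.1. Date/Time Input

Date and time input is accepted in almost any reasonable format, including ISO 8601, SQL-compatible, traditional POSTGRES, and others. For some formats, the ordering of day, month, and year in date input is ambiguous, so there is support for specifying the expected ordering of these fields. Set the DateStyle parameter to MDY to select month-day-year interpretation, DMY to select day-month-year interpretation, or YMD to select year-month-day interpretation.

PostgreSQL is more flexible in handling date/time input than the SQL standard requires. See Appendix B for the exact parsing rules of date/time input and for the recognized text fields including months, days of the week, and time zones.

Remember that any date or time literal input needs to be enclosed in single quotes, like text strings. Refer to Section 4.1.2.7 for more information. SQL requires the following syntax:

type [ (p) ] 'value'

where p is an optional precision specification giving the number of fractional digits in the seconds field. Precision can be specified for the time, timestamp, and interval types, and can range from 0 to 6. If no precision is specified in a constant specification, it defaults to the precision of the literal value (but not more than 6 digits).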
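The typed-literal syntax and the effect of the precision p can be seen in a short sketch. The literal values below are arbitrary illustrations.

```sql
-- A typed literal: the type name followed by a quoted value.
SELECT DATE '1999-01-08';

-- p = 0 keeps no fractional digits, so the value is rounded to whole seconds.
SELECT TIME(0) '04:05:06.789';

-- p = 3 keeps milliseconds only.
SELECT TIMESTAMP(3) '2004-10-19 10:23:54.123456';

-- Without p, the literal keeps the precision it was written with.
SELECT TIMESTAMP '2004-10-19 10:23:54.12';
```

The second statement should return 04:05:07 and the third 2004-10-19 10:23:54.123.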

8.5.1.1. Dates

Table 8.10 shows some possible inputs for the date type.

Table 8.10. Date Input

| Example | Description |
| --- | --- |
| 1999-01-08 | ISO 8601; January 8 in any mode (recommended format) |
| January 8, 1999 | unambiguous in any datestyle input mode |
| 1/8/1999 | January 8 in MDY mode; August 1 in DMY mode |
| 1/18/1999 | January 18 in MDY mode; rejected in other modes |
| 01/02/03 | January 2, 2003 in MDY mode; February 1, 2003 in DMY mode; February 3, 2001 in YMD mode |
| 1999-Jan-08 | January 8 in any mode |
| Jan-08-1999 | January 8 in any mode |
| 08-Jan-1999 | January 8 in any mode |
| 99-Jan-08 | January 8 in YMD mode, else error |
| 08-Jan-99 | January 8, except error in YMD mode |
| Jan-08-99 | January 8, except error in YMD mode |
| 19990108 | ISO 8601; January 8, 1999 in any mode |
| 990108 | ISO 8601; January 8, 1999 in any mode |
| 1999.008 | year and day of year |
| J2451187 | Julian date |
| January 8, 99 BC | year 99 BC |
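How the DateStyle field ordering changes the reading of an ambiguous date can be sketched as follows, using inputs from Table 8.10. The ISO output style is chosen here only so that the results are easy to compare.

```sql
SET datestyle TO 'ISO, MDY';
SELECT DATE '1/8/1999';    -- read as month/day/year: 1999-01-08

SET datestyle TO 'ISO, DMY';
SELECT DATE '1/8/1999';    -- read as day/month/year: 1999-08-01

SET datestyle TO 'ISO, YMD';
SELECT DATE '01/02/03';    -- read as year/month/day: 2001-02-03

-- The ISO 8601 form is read the same way in every mode.
SELECT DATE '1999-01-08';

RESET datestyle;
```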

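8.5.1.2. Times

The time-of-day types are time [ (p) ] without time zone and time [ (p) ] with time zone. time alone is equivalent to time without time zone.

Valid input for these types consists of a time of day followed by an optional time zone. (See Table 8.11 and Table 8.12.) If a time zone is specified in the input for time without time zone, it is silently ignored. You can also specify a date, but it will be ignored, except when you use a time zone name that involves a daylight-savings rule, such as America/New_York. In this case specifying the date is required in order to determine whether standard or daylight-savings time applies. The appropriate time zone offset is recorded in the time with time zone value.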

Table 8.11. Time Input

| Example | Description |
| --- | --- |
| 04:05:06.789 | ISO 8601 |
| 04:05:06 | ISO 8601 |
| 04:05 | ISO 8601 |
| 040506 | ISO 8601 |
| 04:05 AM | same as 04:05; AM does not affect value |
| 04:05 PM | same as 16:05; input hour must be <= 12 |
| 04:05:06.789-8 | ISO 8601 |
| 04:05:06-08:00 | ISO 8601 |
| 04:05-08:00 | ISO 8601 |
| 040506-08 | ISO 8601 |
| 04:05:06 PST | time zone specified by abbreviation |
| 2003-04-12 04:05:06 America/New_York | time zone specified by full name |
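The handling of time zones in time input can be illustrated with inputs taken from Table 8.11. This is a sketch; the comments describe what the rules for time input imply.

```sql
-- time without time zone: the zone is silently ignored.
SELECT TIME '04:05:06 PST';

-- time with time zone: the offset for PST is recorded in the value.
SELECT TIME WITH TIME ZONE '04:05:06 PST';

-- A full zone name needs a date to decide between standard and daylight-savings time;
-- on 2003-04-12 New York was on daylight-savings time (UTC-4).
SELECT TIME WITH TIME ZONE '2003-04-12 04:05:06 America/New_York';

-- PM shifts the hour onto the 24-hour clock.
SELECT TIME '04:05 PM';
```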

Table 8.12. Time Zone Input

| Example | Description |
| --- | --- |
| PST | Abbreviation (for Pacific Standard Time) |
| America/New_York | Full time zone name |
| PST8PDT | POSIX-style time zone specification |
| -8:00 | ISO-8601 offset for PST |
| -800 | ISO-8601 offset for PST |
| -8 | ISO-8601 offset for PST |
| zulu | Military abbreviation for UTC |
| z | Short form of zulu |

Refer to Section 8.5.3 for more information on how to specify time zones.

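8.5.1.3. Time Stamps

Valid input for the time stamp types consists of the concatenation of a date and a time, followed by an optional time zone, followed by an optional AD or BC. (Alternatively, AD/BC can appear before the time zone, but this is not the preferred ordering.) Thus:

1999-01-08 04:05:06

and:

1999-01-08 04:05:06 -8:00

are valid values, which follow the ISO 8601 standard. In addition, the common format:

January 8 04:05:06 1999 PST

is supported.

The SQL standard differentiates timestamp without time zone and timestamp with time zone literals by the presence of a “+” or “-” symbol and time zone offset after the time. Hence, according to the standard,

TIMESTAMP '2004-10-19 10:23:54'

is a timestamp without time zone, while

TIMESTAMP '2004-10-19 10:23:54+02'

is a timestamp with time zone. PostgreSQL never examines the content of a literal string before determining its type, and therefore will treat both of the above as timestamp without time zone. To ensure that a literal is treated as timestamp with time zone, give it the correct explicit type:

TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02'

In a literal that has been determined to be timestamp without time zone, PostgreSQL will silently ignore any time zone indication. That is, the resulting value is derived from the date/time fields in the input value, and is not adjusted for time zone.

For timestamp with time zone, the internally stored value is always in UTC (Universal Coordinated Time, traditionally known as Greenwich Mean Time, GMT). An input value that has an explicit time zone specified is converted to UTC using the appropriate offset for that time zone. If no time zone is stated in the input string, then it is assumed to be in the time zone indicated by the system's TimeZone parameter, and is converted to UTC using the offset for that zone.

When a timestamp with time zone value is output, it is always converted from UTC to the current time zone, and displayed as local time in that zone. To see the time in another time zone, either change TimeZone or use the AT TIME ZONE construct (see Section 9.9.3).

Conversions between timestamp without time zone and timestamp with time zone normally assume that the timestamp without time zone value should be taken or given as local time in the current time zone. A different time zone can be specified for the conversion using AT TIME ZONE.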
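The difference between the two literal types, and the conversion through UTC, can be sketched with the literals used above. Setting the session time zone to UTC is an assumption made here so that the displayed values are predictable.

```sql
SET TIME ZONE 'UTC';

-- The declared type has no time zone, so the +02 is silently dropped.
SELECT TIMESTAMP '2004-10-19 10:23:54+02';

-- With the explicit type, +02 is applied and the value is stored as UTC,
-- then displayed in the session zone: 08:23:54 in UTC.
SELECT TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02';

-- The same instant viewed as local time in another zone.
SELECT TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02'
       AT TIME ZONE 'America/New_York';

RESET timezone;
```

The last statement returns a timestamp without time zone, 2004-10-19 04:23:54, since New York was four hours behind UTC on that date.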

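8.5.1.4. Special Values

PostgreSQL supports several special date/time input values for convenience, as shown in Table 8.13. The values infinity and -infinity are specially represented inside the system and are displayed unchanged; the others are simply notational shorthands that are converted to ordinary date/time values when read. (In particular, now and related strings are converted to a specific time value as soon as they are read.) All of these values need to be enclosed in single quotes when used as constants in SQL commands.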

Table 8.13. Special Date/Time Inputs

| Input String | Valid Types | Description |
| --- | --- | --- |
| epoch | date, timestamp | 1970-01-01 00:00:00+00 (Unix system time zero) |
| infinity | date, timestamp | later than all other time stamps |
| -infinity | date, timestamp | earlier than all other time stamps |
| now | date, time, timestamp | current transaction's start time |
| today | date, timestamp | midnight today |
| tomorrow | date, timestamp | midnight tomorrow |
| yesterday | date, timestamp | midnight yesterday |
| allballs | time | 00:00:00.00 UTC |

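The following SQL-compatible functions can also be used to obtain the current time value for the corresponding data type: CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP, LOCALTIME, LOCALTIMESTAMP. The latter four accept an optional subsecond precision specification. (See Section 9.9.4.) Note that these are SQL functions and are not recognized in data input strings.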
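The special input strings and the SQL functions can be compared side by side. The following is a minimal sketch; the particular expressions are illustrations only.

```sql
-- Special strings are ordinary quoted literals.
SELECT TIMESTAMP 'epoch';                       -- 1970-01-01 00:00:00
SELECT TIMESTAMP 'infinity' > LOCALTIMESTAMP;   -- true
SELECT DATE 'tomorrow' - DATE 'today';          -- 1 (day)
SELECT TIME 'allballs';                         -- 00:00:00

-- The SQL functions are written without quotes; the optional argument
-- is the subsecond precision.
SELECT CURRENT_DATE, CURRENT_TIMESTAMP(0), LOCALTIME(3);
```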

8.5.2. Date/Time Output

The output format of the date/time types can be set to one of the four styles ISO 8601, SQL (Ingres), traditional POSTGRES (Unix date format), or German. The default is the ISO format. (The SQL standard requires the use of the ISO 8601 format. The name of the “SQL” output format is a historical accident.) Table 8.14 shows examples of each output style. The output of the date and time types is generally only the date or time part in accordance with the given examples. However, the POSTGRES style outputs date-only values in ISO format.

Table 8.14. Date/Time Output Styles

| Style Specification | Description | Example |
| --- | --- | --- |
| ISO | ISO 8601, SQL standard | 1997-12-17 07:37:16-08 |
| SQL | traditional style | 12/17/1997 07:37:16.00 PST |
| Postgres | original style | Wed Dec 17 07:37:16 1997 PST |
| German | regional style | 17.12.1997 07:37:16.00 PST |

Note

ISO 8601 specifies the use of uppercase letter T to separate the date and time. PostgreSQL accepts that format on input, but on output it uses a space rather than T, as shown above. This is for readability and for consistency with RFC 3339 as well as some other database systems.

In the SQL and POSTGRES styles, day appears before month if DMY field ordering has been specified, otherwise month appears before day. (See Section 8.5.1 for how this setting also affects interpretation of input values.) Table 8.15 shows examples.

Table 8.15. Date Order Conventions

| datestyle Setting | Input Ordering | Example Output |
| --- | --- | --- |
| SQL, DMY | day/month/year | 17/12/1997 15:37:16.00 CET |
| SQL, MDY | month/day/year | 12/17/1997 07:37:16.00 PST |
| Postgres, DMY | day/month/year | Wed 17 Dec 07:37:16 1997 PST |

The date/time style can be selected by the user using the SET datestyle command, the DateStyle parameter in the postgresql.conf configuration file, or the PGDATESTYLE environment variable on the server or client.

The formatting function to_char (see Section 9.8) is also available as a more flexible way to format date/time output.
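Switching the output style within a session can be sketched as below. A timestamp without time zone is used so that the result does not depend on the session time zone; the value is the one used in Table 8.14.

```sql
SET datestyle TO 'ISO, MDY';
SELECT TIMESTAMP '1997-12-17 07:37:16';   -- 1997-12-17 07:37:16

SET datestyle TO 'SQL, DMY';
SELECT TIMESTAMP '1997-12-17 07:37:16';   -- 17/12/1997 07:37:16

SET datestyle TO 'Postgres, MDY';
SELECT TIMESTAMP '1997-12-17 07:37:16';   -- Wed Dec 17 07:37:16 1997

SET datestyle TO 'German';
SELECT TIMESTAMP '1997-12-17 07:37:16';   -- 17.12.1997 07:37:16

RESET datestyle;

-- to_char is independent of datestyle and gives full control over the layout.
SELECT to_char(TIMESTAMP '1997-12-17 07:37:16', 'YYYY-MM-DD"T"HH24:MI:SS');
```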

8.5.3. Time Zones

Time zones, and time-zone conventions, are influenced by political decisions, not just earth geometry. Time zones around the world became somewhat standardized during the 1900s, but continue to be prone to arbitrary changes, particularly with respect to daylight-savings rules. PostgreSQL uses the widely-used IANA (Olson) time zone database for information about historical time zone rules. For times in the future, the assumption is that the latest known rules for a given time zone will continue to be observed indefinitely far into the future.

PostgreSQL endeavors to be compatible with the SQL standard definitions for typical usage. However, the SQL standard has an odd mix of date and time types and capabilities. Two obvious problems are:

Although the date type cannot have an associated time zone, the time type can. Time zones in the real world have little meaning unless associated with a date as well as a time, since the offset can vary through the year with daylight-saving time boundaries.
The default time zone is specified as a constant numeric offset from UTC. It is therefore impossible to adapt to daylight-saving time when doing date/time arithmetic across DST boundaries.

To address these difficulties, we recommend using date/time types that contain both date and time when using time zones. We do not recommend using the type time with time zone (though it is supported by PostgreSQL for legacy applications and for compliance with the SQL standard). PostgreSQL assumes your local time zone for any type containing only date or time.

All timezone-aware dates and times are stored internally in UTC. They are converted to local time in the zone specified by the TimeZone configuration parameter before being displayed to the client.

PostgreSQL allows you to specify time zones in three different forms:

A full time zone name, for example America/New_York. The recognized time zone names are listed in the pg_timezone_names view (see Section 51.90). PostgreSQL uses the widely-used IANA time zone data for this purpose, so the same time zone names are also recognized by much other software.
A time zone abbreviation, for example PST. Such a specification merely defines a particular offset from UTC, in contrast to full time zone names which can imply a set of daylight savings transition-date rules as well. The recognized abbreviations are listed in the pg_timezone_abbrevs view (see Section 51.89). You cannot set the configuration parameters TimeZone or log_timezone to a time zone abbreviation, but you can use abbreviations in date/time input values and with the AT TIME ZONE operator.
In addition to the timezone names and abbreviations, PostgreSQL will accept POSIX-style time zone specifications of the form STDoffset or STDoffsetDST, where STD is a zone abbreviation, offset is a numeric offset in hours west from UTC, and DST is an optional daylight-savings zone abbreviation, assumed to stand for one hour ahead of the given offset. For example, if EST5EDT were not already a recognized zone name, it would be accepted and would be functionally equivalent to United States East Coast time. In this syntax, a zone abbreviation can be a string of letters, or an arbitrary string surrounded by angle brackets (<>). When a daylight-savings zone abbreviation is present, it is assumed to be used according to the same daylight-savings transition rules used in the IANA time zone database's posixrules entry. In a standard PostgreSQL installation, posixrules is the same as US/Eastern, so that POSIX-style time zone specifications follow USA daylight-savings rules. If needed, you can adjust this behavior by replacing the posixrules file.

In short, this is the difference between abbreviations and full names: abbreviations represent a specific offset from UTC, whereas many of the full names imply a local daylight-savings time rule, and so have two possible UTC offsets. As an example, 2014-06-04 12:00 America/New_York represents noon local time in New York, which for this particular date was Eastern Daylight Time (UTC-4). So 2014-06-04 12:00 EDT specifies that same time instant. But 2014-06-04 12:00 EST specifies noon Eastern Standard Time (UTC-5), regardless of whether daylight savings was nominally in effect on that date.
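The distinction between full names and abbreviations in the paragraph above can be reproduced directly. The session time zone is set to UTC here (an assumption of this sketch) so that the three instants are easy to compare.

```sql
SET TIME ZONE 'UTC';

-- Full name: the daylight-savings rule for that date applies (EDT, UTC-4).
SELECT TIMESTAMP WITH TIME ZONE '2014-06-04 12:00 America/New_York';  -- 16:00 UTC

-- EDT names the same fixed offset, so it is the same instant.
SELECT TIMESTAMP WITH TIME ZONE '2014-06-04 12:00 EDT';               -- 16:00 UTC

-- EST always means UTC-5, even on a date when daylight savings was in effect.
SELECT TIMESTAMP WITH TIME ZONE '2014-06-04 12:00 EST';               -- 17:00 UTC

RESET timezone;
```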

To complicate matters, some jurisdictions have used the same timezone abbreviation to mean different UTC offsets at different times; for example, in Moscow MSK has meant UTC+3 in some years and UTC+4 in others. PostgreSQL interprets such abbreviations according to whatever they meant (or had most recently meant) on the specified date; but, as with the EST example above, this is not necessarily the same as local civil time on that date.

One should be wary that the POSIX-style time zone feature can lead to silently accepting bogus input, since there is no check on the reasonableness of the zone abbreviations. For example, SET TIMEZONE TO FOOBAR0 will work, leaving the system effectively using a rather peculiar abbreviation for UTC. Another issue to keep in mind is that in POSIX time zone names, positive offsets are used for locations west of Greenwich. Everywhere else, PostgreSQL follows the ISO-8601 convention that positive timezone offsets are east of Greenwich.

In all cases, timezone names and abbreviations are recognized case-insensitively. (This is a change from PostgreSQL versions prior to 8.2, which were case-sensitive in some contexts but not others.)

Neither timezone names nor abbreviations are hard-wired into the server; they are obtained from configuration files stored under .../share/timezone/ and .../share/timezonesets/ of the installation directory (see Section B.3).

The TimeZone configuration parameter can be set in the file postgresql.conf, or in any of the other standard ways described in Chapter 19. There are also some special ways to set it:

The SQL command SET TIME ZONE sets the time zone for the session. This is an alternative spelling of SET TIMEZONE TO with a more SQL-spec-compatible syntax.
The PGTZ environment variable is used by libpq clients to send a SET TIME ZONE command to the server upon connection.

8.5.4. Interval Input

interval values can be written using the following verbose syntax:

[@] quantity unit [quantity unit...] [direction]

where quantity is a number (possibly signed); unit is microsecond, millisecond, second, minute, hour, day, week, month, year, decade, century, millennium, or abbreviations or plurals of these units; direction can be ago or empty. The at sign (@) is optional noise. The amounts of the different units are implicitly added with appropriate sign accounting. ago negates all the fields. This syntax is also used for interval output, if IntervalStyle is set to postgres_verbose.

Quantities of days, hours, minutes, and seconds can be specified without explicit unit markings. For example, '1 12:59:10' is read the same as '1 day 12 hours 59 min 10 sec'. Also, a combination of years and months can be specified with a dash; for example '200-10' is read the same as '200 years 10 months'. (These shorter forms are in fact the only ones allowed by the SQL standard, and are used for output when IntervalStyle is set to sql_standard.)

Interval values can also be written as ISO 8601 time intervals, using either the “format with designators” of the standard's section 4.4.3.2 or the “alternative format” of section 4.4.3.3. The format with designators looks like this:

P quantity unit [ quantity unit ...] [ T [ quantity unit ...]]

The string must start with a P, and may include a T that introduces the time-of-day units. The available unit abbreviations are given in Table 8.16. Units may be omitted, and may be specified in any order, but units smaller than a day must appear after T. In particular, the meaning of M depends on whether it is before or after T.

Table 8.16. ISO 8601 Interval Unit Abbreviations

| Abbreviation | Meaning |
| --- | --- |
| Y | Years |
| M | Months (in the date part) |
| W | Weeks |
| D | Days |
| H | Hours |
| M | Minutes (in the time part) |
| S | Seconds |

In the alternative format:

P [ years-months-days ] [ T hours:minutes:seconds ]

the string must begin with P, and a T separates the date and time parts of the interval. The values are given as numbers similar to ISO 8601 dates.

When writing an interval constant with a fields specification, or when assigning a string to an interval column that was defined with a fields specification, the interpretation of unmarked quantities depends on the fields. For example INTERVAL '1' YEAR is read as 1 year, whereas INTERVAL '1' means 1 second. Also, field values “to the right” of the least significant field allowed by the fields specification are silently discarded. For example, writing INTERVAL '1 day 2:03:04' HOUR TO MINUTE results in dropping the seconds field, but not the day field.
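The effect of a fields specification on unmarked quantities and on discarded fields can be sketched with the literals from the paragraph above.

```sql
-- The unmarked quantity takes its unit from the fields specification.
SELECT INTERVAL '1' YEAR;    -- 1 year

-- Without a fields specification, a bare number is read as seconds.
SELECT INTERVAL '1';         -- 00:00:01

-- Fields to the right of MINUTE are dropped; the day field is kept.
SELECT INTERVAL '1 day 2:03:04' HOUR TO MINUTE;   -- 1 day 02:03:00
```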

According to the SQL standard all fields of an interval value must have the same sign, so a leading negative sign applies to all fields; for example the negative sign in the interval literal '-1 2:03:04' applies to both the days and hour/minute/second parts. PostgreSQL allows the fields to have different signs, and traditionally treats each field in the textual representation as independently signed, so that the hour/minute/second part is considered positive in this example. If IntervalStyle is set to sql_standard then a leading sign is considered to apply to all fields (but only if no additional signs appear). Otherwise the traditional PostgreSQL interpretation is used. To avoid ambiguity, it's recommended to attach an explicit sign to each field if any field is negative.
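The two interpretations of a leading sign can be compared by switching IntervalStyle, as in this sketch built on the literal from the paragraph above.

```sql
-- Traditional interpretation: each field is signed independently,
-- so only the day is negative.
SET intervalstyle TO 'postgres';
SELECT INTERVAL '-1 2:03:04';          -- -1 days +02:03:04

-- SQL-standard interpretation: the leading sign applies to every field.
SET intervalstyle TO 'sql_standard';
SELECT INTERVAL '-1 2:03:04';          -- -1 2:03:04

-- Signing every field explicitly avoids the ambiguity in either style.
SELECT INTERVAL '-1 day -2:03:04';

RESET intervalstyle;
```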

Internally interval values are stored as months, days, and seconds. This is done because the number of days in a month varies, and a day can have 23 or 25 hours if a daylight savings time adjustment is involved. The months and days fields are integers while the seconds field can store fractions. Because intervals are usually created from constant strings or timestamp subtraction, this storage method works well in most cases. Functions justify_days and justify_hours are available for adjusting days and hours that overflow their normal ranges.
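As an illustration of the two adjustment functions named above (results assume the default IntervalStyle of postgres):

```sql
-- 35 days overflows the 30-day month used for adjustment
SELECT justify_days(INTERVAL '35 days');    -- 1 mon 5 days

-- 27 hours overflows a 24-hour day
SELECT justify_hours(INTERVAL '27 hours');  -- 1 day 03:00:00
```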

In the verbose input format, and in some fields of the more compact input formats, field values can have fractional parts; for example '1.5 week' or '01:02:03.45'. Such input is converted to the appropriate number of months, days, and seconds for storage. When this would result in a fractional number of months or days, the fraction is added to the lower-order fields using the conversion factors 1 month = 30 days and 1 day = 24 hours. For example, '1.5 month' becomes 1 month and 15 days. Only seconds will ever be shown as fractional on output.
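A brief sketch of how fractional fields spill into lower-order fields, assuming the default IntervalStyle of postgres for the displayed results:

```sql
-- 0.5 month is carried into days (1 month = 30 days)
SELECT INTERVAL '1.5 month';    -- 1 mon 15 days

-- 1.5 weeks = 10.5 days; the half day is carried into hours
SELECT INTERVAL '1.5 week';     -- 10 days 12:00:00

-- Fractional seconds are kept as they are
SELECT INTERVAL '01:02:03.45';  -- 01:02:03.45
```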

Table 8.17 shows some examples of valid interval input.

Table 8.17. Interval Input

| Example | Description |
| --- | --- |
| `1-2` | SQL standard format: 1 year 2 months |
| `3 4:05:06` | SQL standard format: 3 days 4 hours 5 minutes 6 seconds |
| `1 year 2 months 3 days 4 hours 5 minutes 6 seconds` | Traditional Postgres format: 1 year 2 months 3 days 4 hours 5 minutes 6 seconds |
| `P1Y2M3DT4H5M6S` | ISO 8601 “format with designators”: same meaning as above |
| `P0001-02-03T04:05:06` | ISO 8601 “alternative format”: same meaning as above |

8.5.5. Interval Output

The output format of the interval type can be set to one of the four styles sql_standard, postgres, postgres_verbose, or iso_8601, using the command SET intervalstyle. The default is the postgres format. Table 8.18 shows examples of each output style.

The sql_standard style produces output that conforms to the SQL standard's specification for interval literal strings, if the interval value meets the standard's restrictions (either year-month only or day-time only, with no mixing of positive and negative components). Otherwise the output looks like a standard year-month literal string followed by a day-time literal string, with explicit signs added to disambiguate mixed-sign intervals.

The output of the postgres style matches the output of PostgreSQL releases prior to 8.4 when the DateStyle parameter was set to ISO.

The output of the postgres_verbose style matches the output of PostgreSQL releases prior to 8.4 when the DateStyle parameter was set to non-ISO output.

The output of the iso_8601 style matches the “format with designators” described in section 4.4.3.2 of the ISO 8601 standard.

Table 8.18. Interval Output Style Examples

| Style Specification | Year-Month Interval | Day-Time Interval | Mixed Interval |
| --- | --- | --- | --- |
| `sql_standard` | `1-2` | `3 4:05:06` | `-1-2 +3 -4:05:06` |
| `postgres` | `1 year 2 mons` | `3 days 04:05:06` | `-1 year -2 mons +3 days -04:05:06` |
| `postgres_verbose` | `@ 1 year 2 mons` | `@ 3 days 4 hours 5 mins 6 secs` | `@ 1 year 2 mons -3 days 4 hours 5 mins 6 secs ago` |
| `iso_8601` | `P1Y2M` | `P3DT4H5M6S` | `P-1Y-2M3DT-4H-5M-6S` |
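Switching the output style within a session might look like the following sketch. The interval value is an arbitrary example.

```sql
SET intervalstyle = 'iso_8601';
SELECT INTERVAL '1 year 2 months 3 days 4 hours 5 minutes 6 seconds';
-- P1Y2M3DT4H5M6S

SET intervalstyle = 'postgres_verbose';
SELECT INTERVAL '1 year 2 months 3 days 4 hours 5 minutes 6 seconds';
-- @ 1 year 2 mons 3 days 4 hours 5 mins 6 secs

-- Return to the configured default
RESET intervalstyle;
```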

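9.7. Pattern Matching

PostgreSQL provides three separate approaches to pattern matching: the traditional SQL LIKE operator, the more recent SIMILAR TO operator (added in SQL:1999), and POSIX-style regular expressions. Aside from the basic “does this string match this pattern?” operators, functions are available to extract or replace matching substrings and to split a string at matching locations.

Tip

If you have pattern matching needs that go beyond this, consider writing a user-defined function in Perl or Tcl.

Caution

While most regular-expression searches can be executed very quickly, regular expressions can be contrived that take arbitrary amounts of time and memory to process. Be wary of accepting regular-expression search patterns from hostile sources. If you must do so, it is advisable to impose a statement timeout.

Searches using SIMILAR TO patterns have the same security hazards, since SIMILAR TO provides many of the same capabilities as POSIX-style regular expressions.

LIKE searches, being much simpler than the other two options, are safer to use with possibly-hostile pattern sources.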
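One possible way to bound the cost of an untrusted pattern is to set statement_timeout for just the transaction that runs it. This is a minimal sketch: the two-second limit and the string literals are placeholders, not recommendations.

```sql
BEGIN;
-- SET LOCAL limits the setting to the current transaction
SET LOCAL statement_timeout = '2s';

-- Placeholder for a match against a pattern from an untrusted source;
-- the statement is cancelled if it runs longer than the timeout
SELECT 'some input text' ~ 'untrusted pattern goes here';
COMMIT;
```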

9.7.1. `LIKE`

string LIKE pattern [ESCAPE escape-character]
string NOT LIKE pattern [ESCAPE escape-character]

The LIKE expression returns true if the string matches the supplied pattern. (As expected, the NOT LIKE expression returns false if LIKE returns true, and vice versa. An equivalent expression is NOT (string LIKE pattern).)

If pattern does not contain percent signs or underscores, then the pattern only represents the string itself; in that case LIKE acts like the equals operator. An underscore (_) in pattern stands for (matches) any single character; a percent sign (%) matches any sequence of zero or more characters.

Some examples:

'abc' LIKE 'abc'    true
'abc' LIKE 'a%'     true
'abc' LIKE '_b_'    true
'abc' LIKE 'c'      false

LIKE pattern matching always covers the entire string. Therefore, if it's desired to match a sequence anywhere within a string, the pattern must start and end with a percent sign.

To match a literal underscore or percent sign without matching other characters, the respective character in pattern must be preceded by the escape character. The default escape character is the backslash but a different one can be selected by using the ESCAPE clause. To match the escape character itself, write two escape characters.

Note

If you have standard_conforming_strings turned off, any backslashes you write in literal string constants will need to be doubled. See Section 4.1.2.1 for more information.

It's also possible to select no escape character by writing ESCAPE ''. This effectively disables the escape mechanism, which makes it impossible to turn off the special meaning of underscore and percent signs in the pattern.
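The escaping rules can be sketched as follows, assuming standard_conforming_strings is on (so a backslash in a string constant is taken literally):

```sql
-- Default escape character (backslash) makes % literal
SELECT '100%' LIKE '100\%';            -- true
SELECT '1000' LIKE '100\%';            -- false

-- A different escape character chosen with ESCAPE
SELECT 'a_c' LIKE 'a!_c' ESCAPE '!';   -- true
SELECT 'abc' LIKE 'a!_c' ESCAPE '!';   -- false

-- Two escape characters match the escape character itself
SELECT 'a\c' LIKE 'a\\c';              -- true
```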

The key word ILIKE can be used instead of LIKE to make the match case-insensitive according to the active locale. This is not in the SQL standard but is a PostgreSQL extension.

The operator ~~ is equivalent to LIKE, and ~~* corresponds to ILIKE. There are also !~~ and !~~* operators that represent NOT LIKE and NOT ILIKE, respectively. All of these operators are PostgreSQL-specific.

There is also the prefix operator ^@ and corresponding starts_with function which covers cases when only searching by beginning of the string is needed.
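A few illustrative queries for the operator forms described above:

```sql
-- Case-insensitive match, keyword and operator spellings
SELECT 'ABC' ILIKE 'a%';    -- true
SELECT 'ABC' ~~* 'a%';      -- true

-- !~~ is NOT LIKE, which is case-sensitive
SELECT 'ABC' !~~ 'a%';      -- true

-- Prefix test with the operator and with the function
SELECT 'alphabet' ^@ 'alph';             -- true
SELECT starts_with('alphabet', 'alph');  -- true
```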

9.7.2. `SIMILAR TO` Regular Expressions

string SIMILAR TO pattern [ESCAPE escape-character]
string NOT SIMILAR TO pattern [ESCAPE escape-character]

The SIMILAR TO operator returns true or false depending on whether its pattern matches the given string. It is similar to LIKE, except that it interprets the pattern using the SQL standard's definition of a regular expression. SQL regular expressions are a curious cross between LIKE notation and common regular expression notation.

Like LIKE, the SIMILAR TO operator succeeds only if its pattern matches the entire string; this is unlike common regular expression behavior where the pattern can match any part of the string. Also like LIKE, SIMILAR TO uses _ and % as wildcard characters denoting any single character and any string, respectively (these are comparable to . and .* in POSIX regular expressions).

In addition to these facilities borrowed from LIKE, SIMILAR TO supports these pattern-matching metacharacters borrowed from POSIX regular expressions:

| denotes alternation (either of two alternatives).
* denotes repetition of the previous item zero or more times.
+ denotes repetition of the previous item one or more times.
? denotes repetition of the previous item zero or one time.
{m} denotes repetition of the previous item exactly m times.
{m,} denotes repetition of the previous item m or more times.
{m,n} denotes repetition of the previous item at least m and not more than n times.
Parentheses () can be used to group items into a single logical item.
A bracket expression [...] specifies a character class, just as in POSIX regular expressions.

Notice that the period (.) is not a metacharacter for SIMILAR TO.

As with LIKE, a backslash disables the special meaning of any of these metacharacters; or a different escape character can be specified with ESCAPE.

Some examples:

'abc' SIMILAR TO 'abc'      true
'abc' SIMILAR TO 'a'        false
'abc' SIMILAR TO '%(b|d)%'  true
'abc' SIMILAR TO '(b|c)%'   false
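Some further queries, offered as a sketch, that exercise the metacharacters listed above and the fact that the period is not special. They assume standard_conforming_strings is on.

```sql
-- Bound: exactly three repetitions
SELECT 'aaa' SIMILAR TO 'a{3}';      -- true

-- The period is an ordinary character in SIMILAR TO
SELECT 'abc' SIMILAR TO 'a.c';       -- false
SELECT 'a.c' SIMILAR TO 'a.c';       -- true

-- Bracket expression with a repetition
SELECT 'abc' SIMILAR TO '[a-c]+';    -- true

-- Backslash disables the special meaning of %
SELECT 'a%' SIMILAR TO 'a\%';        -- true
```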

The substring function with three parameters, substring(string from pattern for escape-character), provides extraction of a substring that matches an SQL regular expression pattern. As with SIMILAR TO, the specified pattern must match the entire data string, or else the function fails and returns null. To indicate the part of the pattern that should be returned on success, the pattern must contain two occurrences of the escape character followed by a double quote ("). The text matching the portion of the pattern between these markers is returned.

Some examples, with #" delimiting the return string:

substring('foobar' from '%#"o_b#"%' for '#')   oob
substring('foobar' from '#"o_b#"%' for '#')    NULL

9.7.3. POSIX Regular Expressions

Table 9.14 lists the available operators for pattern matching using POSIX regular expressions.

Table 9.14. Regular Expression Match Operators

| Operator | Description | Example |
| --- | --- | --- |
| `~` | Matches regular expression, case sensitive | `'thomas' ~ '.*thomas.*'` |
| `~*` | Matches regular expression, case insensitive | `'thomas' ~* '.*Thomas.*'` |
| `!~` | Does not match regular expression, case sensitive | `'thomas' !~ '.*Thomas.*'` |
| `!~*` | Does not match regular expression, case insensitive | `'thomas' !~* '.*vadim.*'` |

POSIX regular expressions provide a more powerful means for pattern matching than the LIKE and SIMILAR TO operators. Many Unix tools such as egrep, sed, or awk use a pattern matching language that is similar to the one described here.

A regular expression is a character sequence that is an abbreviated definition of a set of strings (a regular set). A string is said to match a regular expression if it is a member of the regular set described by the regular expression. As with LIKE, pattern characters match string characters exactly unless they are special characters in the regular expression language — but regular expressions use different special characters than LIKE does. Unlike LIKE patterns, a regular expression is allowed to match anywhere within a string, unless the regular expression is explicitly anchored to the beginning or end of the string.

Some examples:

'abc' ~ 'abc'    true
'abc' ~ '^a'     true
'abc' ~ '(b|d)'  true
'abc' ~ '^(b|c)' false

The POSIX pattern language is described in much greater detail below.

The substring function with two parameters, substring(string from pattern), provides extraction of a substring that matches a POSIX regular expression pattern. It returns null if there is no match, otherwise the portion of the text that matched the pattern. But if the pattern contains any parentheses, the portion of the text that matched the first parenthesized subexpression (the one whose left parenthesis comes first) is returned. You can put parentheses around the whole expression if you want to use parentheses within it without triggering this exception. If you need parentheses in the pattern before the subexpression you want to extract, see the non-capturing parentheses described below.

Some examples:

substring('foobar' from 'o.b')     oob
substring('foobar' from 'o(.)b')   o
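The two workarounds mentioned above can be sketched like this. The input strings are invented for illustration, and standard_conforming_strings is assumed to be on.

```sql
-- Non-capturing parentheses before the subexpression to extract
SELECT substring('order-1234' from '(?:order|invoice)-(\d+)');  -- 1234

-- Parentheses around the whole expression return the full match
SELECT substring('foobar' from '(o(.)b)');                      -- oob
```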

The regexp_replace function provides substitution of new text for substrings that match POSIX regular expression patterns. It has the syntax regexp_replace(source, pattern, replacement [, flags ]). The source string is returned unchanged if there is no match to the pattern. If there is a match, the source string is returned with the replacement string substituted for the matching substring. The replacement string can contain \n, where n is 1 through 9, to indicate that the source substring matching the n'th parenthesized subexpression of the pattern should be inserted, and it can contain \& to indicate that the substring matching the entire pattern should be inserted. Write \\ if you need to put a literal backslash in the replacement text. The flags parameter is an optional text string containing zero or more single-letter flags that change the function's behavior. Flag i specifies case-insensitive matching, while flag g specifies replacement of each matching substring rather than only the first one. Supported flags (though not g) are described in Table 9.22.

Some examples:

regexp_replace('foobarbaz', 'b..', 'X')
                                   fooXbaz
regexp_replace('foobarbaz', 'b..', 'X', 'g')
                                   fooXX
regexp_replace('foobarbaz', 'b(..)', 'X\1Y', 'g')
                                   fooXarYXazY
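A few more queries, as a sketch, covering \&, the i flag, and a literal backslash in the replacement text (standard_conforming_strings is assumed to be on):

```sql
-- \& inserts the substring that matched the entire pattern
SELECT regexp_replace('foobarbaz', 'b..', '[\&]', 'g');      -- foo[bar][baz]

-- Flag i makes the match case-insensitive
SELECT regexp_replace('Hello World', 'world', 'there', 'i'); -- Hello there

-- \\ in the replacement produces one literal backslash
SELECT regexp_replace('a.b', '\.', '\\');                    -- a\b
```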

The regexp_match function returns a text array of captured substring(s) resulting from the first match of a POSIX regular expression pattern to a string. It has the syntax regexp_match(string, pattern [, flags ]). If there is no match, the result is NULL. If a match is found, and the pattern contains no parenthesized subexpressions, then the result is a single-element text array containing the substring matching the whole pattern. If a match is found, and the pattern contains parenthesized subexpressions, then the result is a text array whose n'th element is the substring matching the n'th parenthesized subexpression of the pattern (not counting “non-capturing” parentheses; see below for details). The flags parameter is an optional text string containing zero or more single-letter flags that change the function's behavior. Supported flags are described in Table 9.22.

Some examples:

SELECT regexp_match('foobarbequebaz', 'bar.*que');
 regexp_match
--------------
 {barbeque}
(1 row)

SELECT regexp_match('foobarbequebaz', '(bar)(beque)');
 regexp_match
--------------
 {bar,beque}
(1 row)

In the common case where you just want the whole matching substring or NULL for no match, write something like

SELECT (regexp_match('foobarbequebaz', 'bar.*que'))[1];
 regexp_match
--------------
 barbeque
(1 row)

The regexp_matches function returns a set of text arrays of captured substring(s) resulting from matching a POSIX regular expression pattern to a string. It has the same syntax as regexp_match. This function returns no rows if there is no match, one row if there is a match and the g flag is not given, or N rows if there are N matches and the g flag is given. Each returned row is a text array containing the whole matched substring or the substrings matching parenthesized subexpressions of the pattern, just as described above for regexp_match. regexp_matches accepts all the flags shown in Table 9.22, plus the g flag which commands it to return all matches, not just the first one.

Some examples:

SELECT regexp_matches('foo', 'not there');
 regexp_matches
----------------
(0 rows)

SELECT regexp_matches('foobarbequebazilbarfbonk', '(b[^b]+)(b[^b]+)', 'g');
 regexp_matches
----------------
 {bar,beque}
 {bazil,barf}
(2 rows)

Tip

In most cases regexp_matches() should be used with the g flag, since if you only want the first match, it's easier and more efficient to use regexp_match(). However, regexp_match() only exists in PostgreSQL version 10 and up. When working in older versions, a common trick is to place a regexp_matches() call in a sub-select, for example:

SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab;

This produces a text array if there's a match, or NULL if not, the same as regexp_match() would do. Without the sub-select, this query would produce no output at all for table rows without a match, which is typically not the desired behavior.

The regexp_split_to_table function splits a string using a POSIX regular expression pattern as a delimiter. It has the syntax regexp_split_to_table(string, pattern [, flags ]). If there is no match to the pattern, the function returns the string. If there is at least one match, for each match it returns the text from the end of the last match (or the beginning of the string) to the beginning of the match. When there are no more matches, it returns the text from the end of the last match to the end of the string. The flags parameter is an optional text string containing zero or more single-letter flags that change the function's behavior. regexp_split_to_table supports the flags described in Table 9.22.

The regexp_split_to_array function behaves the same as regexp_split_to_table, except that regexp_split_to_array returns its result as an array of text. It has the syntax regexp_split_to_array(string, pattern [, flags ]). The parameters are the same as for regexp_split_to_table.

Some examples:

SELECT foo FROM regexp_split_to_table('the quick brown fox jumps over the lazy dog', '\s+') AS foo;
  foo   
-------
 the    
 quick  
 brown  
 fox    
 jumps 
 over   
 the    
 lazy   
 dog    
(9 rows)

SELECT regexp_split_to_array('the quick brown fox jumps over the lazy dog', '\s+');
              regexp_split_to_array             
-----------------------------------------------
 {the,quick,brown,fox,jumps,over,the,lazy,dog}
(1 row)

SELECT foo FROM regexp_split_to_table('the quick brown fox', '\s*') AS foo;
 foo 
-----
 t         
 h         
 e         
 q         
 u         
 i         
 c         
 k         
 b         
 r         
 o         
 w         
 n         
 f         
 o         
 x         
(16 rows)

As the last example demonstrates, the regexp split functions ignore zero-length matches that occur at the start or end of the string or immediately after a previous match. This is contrary to the strict definition of regexp matching that is implemented by regexp_match and regexp_matches, but is usually the most convenient behavior in practice. Other software systems such as Perl use similar definitions.

9.7.3.1. Regular Expression Details

PostgreSQL's regular expressions are implemented using a software package written by Henry Spencer. Much of the description of regular expressions below is copied verbatim from his manual.

Regular expressions (REs), as defined in POSIX 1003.2, come in two forms: extended REs or EREs (roughly those of egrep), and basic REs or BREs (roughly those of ed). PostgreSQL supports both forms, and also implements some extensions that are not in the POSIX standard, but have become widely used due to their availability in programming languages such as Perl and Tcl. REs using these non-POSIX extensions are called advanced REs or AREs in this documentation. AREs are almost an exact superset of EREs, but BREs have several notational incompatibilities (as well as being much more limited). We first describe the ARE and ERE forms, noting features that apply only to AREs, and then describe how BREs differ.

Note

PostgreSQL always initially presumes that a regular expression follows the ARE rules. However, the more limited ERE or BRE rules can be chosen by prepending an embedded option to the RE pattern, as described in Section 9.7.3.4. This can be useful for compatibility with applications that expect exactly the POSIX 1003.2 rules.

A regular expression is defined as one or more branches, separated by |. It matches anything that matches one of the branches.

A branch is zero or more quantified atoms or constraints, concatenated. It matches a match for the first, followed by a match for the second, etc; an empty branch matches the empty string.

A quantified atom is an atom possibly followed by a single quantifier. Without a quantifier, it matches a match for the atom. With a quantifier, it can match some number of matches of the atom. An atom can be any of the possibilities shown in Table 9.15. The possible quantifiers and their meanings are shown in Table 9.16.

A constraint matches an empty string, but matches only when specific conditions are met. A constraint can be used where an atom could be used, except it cannot be followed by a quantifier. The simple constraints are shown in Table 9.17; some more constraints are described later.

Table 9.15. Regular Expression Atoms

| Atom | Description |
| --- | --- |
| `(re)` | (where re is any regular expression) matches a match for re, with the match noted for possible reporting |
| `(?:re)` | as above, but the match is not noted for reporting (a “non-capturing” set of parentheses) (AREs only) |
| `.` | matches any single character |
| `[chars]` | a bracket expression, matching any one of the chars (see Section 9.7.3.2 for more detail) |
| `\k` | (where k is a non-alphanumeric character) matches that character taken as an ordinary character, e.g., `\\` matches a backslash character |
| `\c` | where c is alphanumeric (possibly followed by other characters) is an escape, see Section 9.7.3.3 (AREs only; in EREs and BREs, this matches c) |
| `{` | when followed by a character other than a digit, matches the left-brace character `{`; when followed by a digit, it is the beginning of a bound (see below) |
| `x` | where x is a single character with no other significance, matches that character |

An RE cannot end with a backslash (\).

Note

If you have standard_conforming_strings turned off, any backslashes you write in literal string constants will need to be doubled. See Section 4.1.2.1 for more information.

Table 9.16. Regular Expression Quantifiers

| Quantifier | Matches |
| --- | --- |
| `*` | a sequence of 0 or more matches of the atom |
| `+` | a sequence of 1 or more matches of the atom |
| `?` | a sequence of 0 or 1 matches of the atom |
| `{m}` | a sequence of exactly m matches of the atom |
| `{m,}` | a sequence of m or more matches of the atom |
| `{m,n}` | a sequence of m through n (inclusive) matches of the atom; m cannot exceed n |
| `*?` | non-greedy version of `*` |
| `+?` | non-greedy version of `+` |
| `??` | non-greedy version of `?` |
| `{m}?` | non-greedy version of `{m}` |
| `{m,}?` | non-greedy version of `{m,}` |
| `{m,n}?` | non-greedy version of `{m,n}` |

The forms using {...} are known as bounds. The numbers m and n within a bound are unsigned decimal integers with permissible values from 0 to 255 inclusive.

Non-greedy quantifiers (available in AREs only) match the same possibilities as their corresponding normal (greedy) counterparts, but prefer the smallest number rather than the largest number of matches. See Section 9.7.3.5 for more detail.
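The difference between greedy and non-greedy quantifiers, and the use of a bound, can be sketched with these queries. The input strings are arbitrary.

```sql
-- Greedy: .* takes as much as possible
SELECT substring('<a><b>' from '<.*>');    -- <a><b>

-- Non-greedy: .*? takes as little as possible
SELECT substring('<a><b>' from '<.*?>');   -- <a>

-- Bound {2,3}: two or three repetitions, anchored to the whole string
SELECT 'aaa' ~ '^a{2,3}$';                 -- true
SELECT 'aaaa' ~ '^a{2,3}$';                -- false
```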

Note

A quantifier cannot immediately follow another quantifier, e.g., ** is invalid. A quantifier cannot begin an expression or subexpression or follow ^ or |.

Table 9.17. Regular Expression Constraints

| Constraint | Description |
| --- | --- |
| `^` | matches at the beginning of the string |
| `$` | matches at the end of the string |
| `(?=re)` | positive lookahead matches at any point where a substring matching re begins (AREs only) |
| `(?!re)` | negative lookahead matches at any point where no substring matching re begins (AREs only) |
| `(?<=re)` | positive lookbehind matches at any point where a substring matching re ends (AREs only) |
| `(?<!re)` | negative lookbehind matches at any point where no substring matching re ends (AREs only) |

Lookahead and lookbehind constraints cannot contain back references (see Section 9.7.3.3), and all parentheses within them are considered non-capturing.
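As a sketch of the lookaround constraints, the following queries use invented input strings and assume standard_conforming_strings is on. Because parentheses inside a lookaround constraint are non-capturing, substring returns the whole match here.

```sql
-- Positive lookahead: digits only if followed by " USD"
SELECT substring('price: 100 USD' from '\d+(?= USD)');  -- 100

-- Negative lookahead: "foo" not followed by "bar"
SELECT 'foobar' ~ 'foo(?!bar)';                          -- false

-- Positive lookbehind: word characters preceded by "="
SELECT substring('key=value' from '(?<==)\w+');          -- value
```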

9.7.3.2. Bracket Expressions

A bracket expression is a list of characters enclosed in []. It normally matches any single character from the list (but see below). If the list begins with ^, it matches any single character not from the rest of the list. If two characters in the list are separated by -, this is shorthand for the full range of characters between those two (inclusive) in the collating sequence, e.g., [0-9] in ASCII matches any decimal digit. It is illegal for two ranges to share an endpoint, e.g., a-c-e. Ranges are very collating-sequence-dependent, so portable programs should avoid relying on them.

To include a literal ] in the list, make it the first character (after ^, if that is used). To include a literal -, make it the first or last character, or the second endpoint of a range. To use a literal - as the first endpoint of a range, enclose it in [. and .] to make it a collating element (see below). With the exception of these characters, some combinations using [ (see next paragraphs), and escapes (AREs only), all other special characters lose their special significance within a bracket expression. In particular, \ is not special when following ERE or BRE rules, though it is special (as introducing an escape) in AREs.
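The placement rules for literal ] and - can be illustrated with a few small matches:

```sql
-- ] as the first character of the list is literal
SELECT ']' ~ '^[]a]$';     -- true

-- - as the last character of the list is literal
SELECT '-' ~ '^[a-]$';     -- true

-- A leading ^ negates the list
SELECT 'x' ~ '^[^0-9]$';   -- true
SELECT '5' ~ '^[^0-9]$';   -- false
```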

Within a bracket expression, a collating element (a character, a multiple-character sequence that collates as if it were a single character, or a collating-sequence name for either) enclosed in [. and .] stands for the sequence of characters of that collating element. The sequence is treated as a single element of the bracket expression's list. This allows a bracket expression containing a multiple-character collating element to match more than one character, e.g., if the collating sequence includes a ch collating element, then the RE [[.ch.]]*c matches the first five characters of chchcc.

Note

PostgreSQL currently does not support multi-character collating elements. This information describes possible future behavior.

Within a bracket expression, a collating element enclosed in [= and =] is an equivalence class, standing for the sequences of characters of all collating elements equivalent to that one, including itself. (If there are no other equivalent collating elements, the treatment is as if the enclosing delimiters were [. and .].) For example, if o and ^ are the members of an equivalence class, then [[=o=]], [[=^=]], and [o^] are all synonymous. An equivalence class cannot be an endpoint of a range.

Within a bracket expression, the name of a character class enclosed in [: and :] stands for the list of all characters belonging to that class. Standard character class names are: alnum, alpha, blank, cntrl, digit, graph, lower, print, punct, space, upper, xdigit. These stand for the character classes defined in ctype. A locale can provide others. A character class cannot be used as an endpoint of a range.
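A short sketch of character classes inside bracket expressions, using ASCII-only input so the result does not depend on the locale:

```sql
-- Only letters and digits
SELECT 'abc123' ~ '^[[:alnum:]]+$';    -- true

-- The space is not in alnum
SELECT 'abc 123' ~ '^[[:alnum:]]+$';   -- false

-- Remove every digit
SELECT regexp_replace('a1b2', '[[:digit:]]', '', 'g');  -- ab
```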

There are two special cases of bracket expressions: the bracket expressions [[:<:]] and [[:>:]] are constraints, matching empty strings at the beginning and end of a word respectively. A word is defined as a sequence of word characters that is neither preceded nor followed by word characters. A word character is an alnum character (as defined by ctype) or an underscore. This is an extension, compatible with but not specified by POSIX 1003.2, and should be used with caution in software intended to be portable to other systems. The constraint escapes described below are usually preferable; they are no more standard, but are easier to type.

9.7.3.3. Regular Expression Escapes

Escapes are special sequences beginning with \ followed by an alphanumeric character. Escapes come in several varieties: character entry, class shorthands, constraint escapes, and back references. A \ followed by an alphanumeric character but not constituting a valid escape is illegal in AREs. In EREs, there are no escapes: outside a bracket expression, a \ followed by an alphanumeric character merely stands for that character as an ordinary character, and inside a bracket expression, \ is an ordinary character. (The latter is the one actual incompatibility between EREs and AREs.)

Character-entry escapes exist to make it easier to specify non-printing and other inconvenient characters in REs. They are shown in Table 9.18.

Class-shorthand escapes provide shorthands for certain commonly-used character classes. They are shown in Table 9.19.

A constraint escape is a constraint, matching the empty string if specific conditions are met, written as an escape. They are shown in Table 9.20.

A back reference (\n) matches the same string matched by the previous parenthesized subexpression specified by the number n (see Table 9.21). For example, ([bc])\1 matches bb or cc but not bc or cb. The subexpression must entirely precede the back reference in the RE. Subexpressions are numbered in the order of their leading parentheses. Non-capturing parentheses do not define subexpressions.
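The back-reference example from the text, written as queries (anchors are added so the whole string must match; standard_conforming_strings is assumed to be on):

```sql
SELECT 'bb' ~ '^([bc])\1$';    -- true
SELECT 'bc' ~ '^([bc])\1$';    -- false

-- The back reference repeats whatever the group matched
SELECT 'abab' ~ '^(ab)\1$';    -- true
```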

Table 9.18. Regular Expression Character-entry Escapes

| Escape | Description |
| --- | --- |
| `\a` | alert (bell) character, as in C |
| `\b` | backspace, as in C |
| `\B` | synonym for backslash (`\`) to help reduce the need for backslash doubling |
| `\cX` | (where X is any character) the character whose low-order 5 bits are the same as those of X, and whose other bits are all zero |
| `\e` | the character whose collating-sequence name is ESC, or failing that, the character with octal value 033 |
| `\f` | form feed, as in C |
| `\n` | newline, as in C |
| `\r` | carriage return, as in C |
| `\t` | horizontal tab, as in C |
| `\uwxyz` | (where wxyz is exactly four hexadecimal digits) the character whose hexadecimal value is 0xwxyz |
| `\Ustuvwxyz` | (where stuvwxyz is exactly eight hexadecimal digits) the character whose hexadecimal value is 0xstuvwxyz |
| `\v` | vertical tab, as in C |
| `\xhhh` | (where hhh is any sequence of hexadecimal digits) the character whose hexadecimal value is 0xhhh (a single character no matter how many hexadecimal digits are used) |
| `\0` | the character whose value is 0 (the null byte) |
| `\xy` | (where xy is exactly two octal digits, and is not a back reference) the character whose octal value is 0xy |
| `\xyz` | (where xyz is exactly three octal digits, and is not a back reference) the character whose octal value is 0xyz |

Hexadecimal digits are 0-9, a-f, and A-F. Octal digits are 0-7.

Numeric character-entry escapes specifying values outside the ASCII range (0-127) have meanings dependent on the database encoding. When the encoding is UTF-8, escape values are equivalent to Unicode code points, for example \u1234 means the character U+1234. For other multibyte encodings, character-entry escapes usually just specify the concatenation of the byte values for the character. If the escape value does not correspond to any legal character in the database encoding, no error will be raised, but it will never match any data.

The character-entry escapes are always taken as ordinary characters. For example, \135 is ] in ASCII, but \135 does not terminate a bracket expression.
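A few character-entry escapes in use, as a sketch. The E'' string is used only to place a real tab character in the input; the patterns are ordinary string constants, with standard_conforming_strings assumed to be on.

```sql
-- \t in the pattern matches a tab character in the data
SELECT E'a\tb' ~ 'a\tb';     -- true

-- Hexadecimal and Unicode escapes for the letter A
SELECT 'A' ~ '^\x41$';       -- true
SELECT 'A' ~ '^\u0041$';     -- true
```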

Table 9.19. Regular Expression Class-shorthand Escapes

| Escape | Description |
| --- | --- |
| `\d` | `[[:digit:]]` |
| `\s` | `[[:space:]]` |
| `\w` | `[[:alnum:]_]` (note underscore is included) |
| `\D` | `[^[:digit:]]` |
| `\S` | `[^[:space:]]` |
| `\W` | `[^[:alnum:]_]` (note underscore is included) |

Within bracket expressions, \d, \s, and \w lose their outer brackets, and \D, \S, and \W are illegal. (So, for example, [a-c\d] is equivalent to [a-c[:digit:]]. Also, [a-c\D], which is equivalent to [a-c^[:digit:]], is illegal.)
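The class shorthands might be used as in this sketch (input strings are invented; standard_conforming_strings is assumed to be on):

```sql
-- \d with bounds: a simple date shape check
SELECT '2024-01-15' ~ '^\d{4}-\d{2}-\d{2}$';        -- true

-- \s+ collapses runs of white space
SELECT regexp_replace('a  b   c', '\s+', ' ', 'g'); -- a b c

-- \d inside a bracket expression, equivalent to [a-c[:digit:]]
SELECT 'b7' ~ '^[a-c\d]+$';                         -- true
```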

Table 9.20. Regular Expression Constraint Escapes

| Escape | Description |
| --- | --- |
| `\A` | matches only at the beginning of the string (see Section 9.7.3.5 for how this differs from `^`) |
| `\m` | matches only at the beginning of a word |
| `\M` | matches only at the end of a word |
| `\y` | matches only at the beginning or end of a word |
| `\Y` | matches only at a point that is not the beginning or end of a word |
| `\Z` | matches only at the end of the string (see Section 9.7.3.5 for how this differs from `$`) |

A word is defined as in the specification of [[:<:]] and [[:>:]] above. Constraint escapes are illegal within bracket expressions.
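The word-boundary constraints can be compared side by side in this sketch (standard_conforming_strings is assumed to be on):

```sql
-- "cat" as a whole word
SELECT 'a cat sat' ~ '\mcat\M';             -- true
SELECT 'concatenate' ~ '\mcat\M';           -- false

-- The bracket-expression spelling of the same constraints
SELECT 'a cat sat' ~ '[[:<:]]cat[[:>:]]';   -- true

-- \Y requires a position that is not a word boundary
SELECT 'concatenate' ~ '\Ycat\Y';           -- true
```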

Table 9.21. Regular Expression Back References

| Escape | Description |
| --- | --- |
| `\m` | (where m is a nonzero digit) a back reference to the m'th subexpression |
| `\mnn` | (where m is a nonzero digit, and nn is some more digits, and the decimal value mnn is not greater than the number of closing capturing parentheses seen so far) a back reference to the mnn'th subexpression |

Note

There is an inherent ambiguity between octal character-entry escapes and back references, which is resolved by the following heuristics, as hinted at above. A leading zero always indicates an octal escape. A single non-zero digit, not followed by another digit, is always taken as a back reference. A multi-digit sequence not starting with a zero is taken as a back reference if it comes after a suitable subexpression (i.e., the number is in the legal range for a back reference), and otherwise is taken as octal.

9.7.3.4. Regular Expression Metasyntax

In addition to the main syntax described above, there are some special forms and miscellaneous syntactic facilities available.

An RE can begin with one of two special director prefixes. If an RE begins with ***:, the rest of the RE is taken as an ARE. (This normally has no effect in PostgreSQL, since REs are assumed to be AREs; but it does have an effect if ERE or BRE mode had been specified by the flags parameter to a regex function.) If an RE begins with ***=, the rest of the RE is taken to be a literal string, with all characters considered ordinary characters.
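The two director prefixes can be sketched as follows:

```sql
-- ***= treats the rest as a literal string, so . is not a wildcard
SELECT 'a.b' ~ '***=a.b';    -- true
SELECT 'axb' ~ '***=a.b';    -- false

-- ***: treats the rest as an ARE, where . matches any character
SELECT 'axb' ~ '***:a.b';    -- true
```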

An ARE can begin with embedded options: a sequence (?xyz) (where xyz is one or more alphabetic characters) specifies options affecting the rest of the RE. These options override any previously determined options — in particular, they can override the case-sensitivity behavior implied by a regex operator, or the flags parameter to a regex function. The available option letters are shown in Table 9.22. Note that these same option letters are used in the flags parameters of regex functions.

Table 9.22. ARE Embedded-option Letters

| Option | Description |
| --- | --- |
| b | rest of RE is a BRE |
| c | case-sensitive matching (overrides operator type) |
| e | rest of RE is an ERE |
| i | case-insensitive matching (see Section 9.7.3.5) (overrides operator type) |
| m | historical synonym for n |
| n | newline-sensitive matching (see Section 9.7.3.5) |
| p | partial newline-sensitive matching (see Section 9.7.3.5) |
| q | rest of RE is a literal (“quoted”) string, all ordinary characters |
| s | non-newline-sensitive matching (default) |
| t | tight syntax (default; see below) |
| w | inverse partial newline-sensitive (“weird”) matching (see Section 9.7.3.5) |
| x | expanded syntax (see below) |

Embedded options take effect at the ) terminating the sequence. They can appear only at the start of an ARE (after the ***: director if any).
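
A short sketch of embedded options overriding the operator type, and of the same option letter passed as a flags argument instead. The sample strings are illustrative only.

```sql
SELECT 'ABC' ~  '(?i)abc';   -- true: i makes the match case-insensitive
SELECT 'ABC' ~* '(?c)abc';   -- false: c overrides the case-insensitive operator
SELECT regexp_match('ABC', 'abc', 'i');   -- {ABC}: the same letter as a flags parameter
```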

In addition to the usual (tight) RE syntax, in which all characters are significant, there is an expanded syntax, available by specifying the embedded x option. In the expanded syntax, white-space characters in the RE are ignored, as are all characters between a # and the following newline (or the end of the RE). This permits paragraphing and commenting a complex RE. There are three exceptions to that basic rule:

a white-space character or # preceded by \ is retained
white space or # within a bracket expression is retained
white space and comments cannot appear within multi-character symbols, such as (?:

For this purpose, white-space characters are blank, tab, newline, and any character that belongs to the space character class.
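
The expanded syntax might be used as follows to lay out and comment a pattern. This is a sketch with a made-up date string; it assumes standard_conforming_strings = on. The second query illustrates the exception that white space inside a bracket expression is retained.

```sql
-- White space and # comments in the pattern are ignored under (?x).
SELECT regexp_match(
  '2001-02-16',
  '(?x)
   (\d{4}) -   # year
   (\d{2}) -   # month
   (\d{2})     # day
  ');
-- {2001,02,16}

-- The blank inside [ ] is significant, so this matches the string a b.
SELECT 'a b' ~ '(?x) a [ ] b';   -- true
```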

Finally, in an ARE, outside bracket expressions, the sequence (?#ttt) (where ttt is any text not containing a )) is a comment, completely ignored. Again, this is not allowed between the characters of multi-character symbols, like (?:. Such comments are more a historical artifact than a useful facility, and their use is deprecated; use the expanded syntax instead.

None of these metasyntax extensions is available if an initial ***= director has specified that the user's input be treated as a literal string rather than as an RE.

9.7.3.5. Regular Expression Matching Rules

In the event that an RE could match more than one substring of a given string, the RE matches the one starting earliest in the string. If the RE could match more than one substring starting at that point, either the longest possible match or the shortest possible match will be taken, depending on whether the RE is greedy or non-greedy.

Whether an RE is greedy or not is determined by the following rules:

Most atoms, and all constraints, have no greediness attribute (because they cannot match variable amounts of text anyway).
Adding parentheses around an RE does not change its greediness.
A quantified atom with a fixed-repetition quantifier ({m} or {m}?) has the same greediness (possibly none) as the atom itself.
A quantified atom with other normal quantifiers (including {m,n} with m equal to n) is greedy (prefers longest match).
A quantified atom with a non-greedy quantifier (including {m,n}? with m equal to n) is non-greedy (prefers shortest match).
A branch — that is, an RE that has no top-level | operator — has the same greediness as the first quantified atom in it that has a greediness attribute.
An RE consisting of two or more branches connected by the | operator is always greedy.

The above rules associate greediness attributes not only with individual quantified atoms, but with branches and entire REs that contain quantified atoms. What that means is that the matching is done in such a way that the branch, or whole RE, matches the longest or shortest possible substring as a whole. Once the length of the entire match is determined, the part of it that matches any particular subexpression is determined on the basis of the greediness attribute of that subexpression, with subexpressions starting earlier in the RE taking priority over ones starting later.

An example of what this means:

SELECT SUBSTRING('XY1234Z', 'Y*([0-9]{1,3})');
Result: 123
SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})');
Result: 1

In the first case, the RE as a whole is greedy because Y* is greedy. It can match beginning at the Y, and it matches the longest possible string starting there, i.e., Y123. The output is the parenthesized part of that, or 123. In the second case, the RE as a whole is non-greedy because Y*? is non-greedy. It can match beginning at the Y, and it matches the shortest possible string starting there, i.e., Y1. The subexpression [0-9]{1,3} is greedy but it cannot change the decision as to the overall match length; so it is forced to match just 1.

In short, when an RE contains both greedy and non-greedy subexpressions, the total match length is either as long as possible or as short as possible, according to the attribute assigned to the whole RE. The attributes assigned to the subexpressions only affect how much of that match they are allowed to “eat” relative to each other.

The quantifiers {1,1} and {1,1}? can be used to force greediness or non-greediness, respectively, on a subexpression or a whole RE. This is useful when you need the whole RE to have a greediness attribute different from what's deduced from its elements. As an example, suppose that we are trying to separate a string containing some digits into the digits and the parts before and after them. We might try to do that like this:

SELECT regexp_match('abc01234xyz', '(.*)(\d+)(.*)');
Result: {abc0123,4,xyz}

That didn't work: the first .* is greedy so it “eats” as much as it can, leaving the \d+ to match at the last possible place, the last digit. We might try to fix that by making it non-greedy:

SELECT regexp_match('abc01234xyz', '(.*?)(\d+)(.*)');
Result: {abc,0,""}

That didn't work either, because now the RE as a whole is non-greedy and so it ends the overall match as soon as possible. We can get what we want by forcing the RE as a whole to be greedy:

SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
Result: {abc,01234,xyz}

Controlling the RE's overall greediness separately from its components' greediness allows great flexibility in handling variable-length patterns.

When deciding what is a longer or shorter match, match lengths are measured in characters, not collating elements. An empty string is considered longer than no match at all. For example: bb* matches the three middle characters of abbbc; (week|wee)(night|knights) matches all ten characters of weeknights; when (.*).* is matched against abc the parenthesized subexpression matches all three characters; and when (a*)* is matched against bc both the whole RE and the parenthesized subexpression match an empty string.
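
The first three cases in the preceding paragraph can be reproduced directly. The queries below are a sketch of how one might check them; the results in the comments follow from the rules stated above.

```sql
SELECT substring('abbbc' from 'bb*');   -- bbb

-- The longest overall match wins, so the first group takes wee rather than week.
SELECT regexp_match('weeknights', '(week|wee)(night|knights)');   -- {wee,knights}

-- The earlier subexpression has priority and takes all three characters.
SELECT regexp_match('abc', '(.*).*');   -- {abc}
```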

If case-independent matching is specified, the effect is much as if all case distinctions had vanished from the alphabet. When an alphabetic that exists in multiple cases appears as an ordinary character outside a bracket expression, it is effectively transformed into a bracket expression containing both cases, e.g., x becomes [xX]. When it appears inside a bracket expression, all case counterparts of it are added to the bracket expression, e.g., [x] becomes [xX] and [^x] becomes [^xX].

If newline-sensitive matching is specified, . and bracket expressions using ^ will never match the newline character (so that matches will never cross newlines unless the RE explicitly arranges it) and ^ and $ will match the empty string after and before a newline respectively, in addition to matching at beginning and end of string respectively. But the ARE escapes \A and \Z continue to match beginning or end of string only.

If partial newline-sensitive matching is specified, this affects . and bracket expressions as with newline-sensitive matching, but not ^ and $.

If inverse partial newline-sensitive matching is specified, this affects ^ and $ as with newline-sensitive matching, but not . and bracket expressions. This isn't very useful but is provided for symmetry.
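
The four newline modes can be compared on a two-line string. This is an illustrative sketch using the embedded options n, p, and w from Table 9.22; the E'...' literal is used only to put a newline into the test string.

```sql
SELECT E'foo\nbar' ~ 'foo.bar';       -- true: by default . matches a newline
SELECT E'foo\nbar' ~ '(?n)foo.bar';   -- false: newline-sensitive, . stops at the newline
SELECT E'foo\nbar' ~ '^bar';          -- false: by default ^ matches only at start of string
SELECT E'foo\nbar' ~ '(?n)^bar';      -- true: ^ also matches after a newline
SELECT E'foo\nbar' ~ '(?p)^bar';      -- false: partial mode does not affect ^ and $
SELECT E'foo\nbar' ~ '(?w)foo.bar';   -- true: inverse partial mode does not affect .
SELECT E'foo\nbar' ~ '(?n)\Abar';     -- false: \A still matches only at start of string
```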

9.7.3.6. Limits And Compatibility

No particular limit is imposed on the length of REs in this implementation. However, programs intended to be highly portable should not employ REs longer than 256 bytes, as a POSIX-compliant implementation can refuse to accept such REs.

The only feature of AREs that is actually incompatible with POSIX EREs is that \ does not lose its special significance inside bracket expressions. All other ARE features use syntax which is illegal or has undefined or unspecified effects in POSIX EREs; the *** syntax of directors likewise is outside the POSIX syntax for both BREs and EREs.

Many of the ARE extensions are borrowed from Perl, but some have been changed to clean them up, and a few Perl extensions are not present. Incompatibilities of note include \b, \B, the lack of special treatment for a trailing newline, the addition of complemented bracket expressions to the things affected by newline-sensitive matching, the restrictions on parentheses and back references in lookahead/lookbehind constraints, and the longest/shortest-match (rather than first-match) matching semantics.

Two significant incompatibilities exist between AREs and the ERE syntax recognized by pre-7.4 releases of PostgreSQL:

In AREs, \ followed by an alphanumeric character is either an escape or an error, while in previous releases, it was just another way of writing the alphanumeric. This should not be much of a problem because there was no reason to write such a sequence in earlier releases.
In AREs, \ remains a special character within [], so a literal \ within a bracket expression must be written \\.
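
A brief sketch of the second point, assuming standard_conforming_strings = on so that each backslash in the string literals below is a single literal backslash:

```sql
SELECT 'a\b' ~ '[\\]';   -- true: \\ inside brackets stands for one literal backslash
SELECT 'a\b' ~ '[\d]';   -- false: \d inside brackets is still the digit class escape
SELECT 'a1b' ~ '[\d]';   -- true
```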

9.7.3.7. Basic Regular Expressions

BREs differ from EREs in several respects. In BREs, |, +, and ? are ordinary characters and there is no equivalent for their functionality. The delimiters for bounds are \{ and \}, with { and } by themselves ordinary characters. The parentheses for nested subexpressions are \( and \), with ( and ) by themselves ordinary characters. ^ is an ordinary character except at the beginning of the RE or the beginning of a parenthesized subexpression, $ is an ordinary character except at the end of the RE or the end of a parenthesized subexpression, and * is an ordinary character if it appears at the beginning of the RE or the beginning of a parenthesized subexpression (after a possible leading ^). Finally, single-digit back references are available, and \< and \> are synonyms for [[:<:]] and [[:>:]] respectively; no other escapes are available in BREs.
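
BRE mode can be selected with the b flag of the regex functions (see Table 9.22). The following sketch contrasts it with the default ARE interpretation; the sample strings are illustrative and standard_conforming_strings = on is assumed.

```sql
SELECT regexp_match('aaa+', 'a+', 'b');        -- {a+}: in a BRE, + is an ordinary character
SELECT regexp_match('aaa+', 'a+');             -- {aaa}: in an ARE, + is a quantifier
SELECT regexp_match('aaaa', 'a\{2\}', 'b');    -- {aa}: bounds are written \{ and \}
SELECT regexp_match('abab', '\(ab\)\1', 'b');  -- {ab}: \( \) group, \1 is a back reference
```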

9.9. Date/Time Functions and Operators

Table 9.31 shows the available functions for date/time value processing, with details appearing in the following subsections. Table 9.30 illustrates the behaviors of the basic arithmetic operators (+, *, etc.). For formatting functions, refer to Section 9.8. You should be familiar with the background information on date/time data types from Section 8.5.

All the functions and operators described below that take time or timestamp inputs actually come in two variants: one that takes time with time zone or timestamp with time zone, and one that takes time without time zone or timestamp without time zone. For brevity, these variants are not shown separately. Also, the + and * operators come in commutative pairs (for example both date + integer and integer + date); we show only one of each such pair.
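
To illustrate the commutative pairs and the two type variants mentioned above, one might run the following sketch. The column aliases are arbitrary, and pg_typeof is used only to display the result types.

```sql
-- Both operand orders are accepted for + and *.
SELECT date '2001-09-28' + 7 AS a, 7 + date '2001-09-28' AS b;
-- both columns: 2001-10-05

SELECT 900 * interval '1 second' AS a, interval '1 second' * 900 AS b;
-- both columns: 00:15:00

-- The same operator exists for both timestamp variants.
SELECT pg_typeof(timestamp '2001-09-28 01:00' + interval '23 hours')      AS without_tz,
       pg_typeof(timestamptz '2001-09-28 01:00+00' + interval '23 hours') AS with_tz;
-- timestamp without time zone | timestamp with time zone
```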

Table 9.30. Date/Time Operators

| Operator | Example | Result |
| --- | --- | --- |
| + | date '2001-09-28' + integer '7' | date '2001-10-05' |
| + | date '2001-09-28' + interval '1 hour' | timestamp '2001-09-28 01:00:00' |
| + | date '2001-09-28' + time '03:00' | timestamp '2001-09-28 03:00:00' |
| + | interval '1 day' + interval '1 hour' | interval '1 day 01:00:00' |
| + | timestamp '2001-09-28 01:00' + interval '23 hours' | timestamp '2001-09-29 00:00:00' |
| + | time '01:00' + interval '3 hours' | time '04:00:00' |
| - | - interval '23 hours' | interval '-23:00:00' |
| - | date '2001-10-01' - date '2001-09-28' | integer '3' (days) |
| - | date '2001-10-01' - integer '7' | date '2001-09-24' |
| - | date '2001-09-28' - interval '1 hour' | timestamp '2001-09-27 23:00:00' |
| - | time '05:00' - time '03:00' | interval '02:00:00' |
| - | time '05:00' - interval '2 hours' | time '03:00:00' |
| - | timestamp '2001-09-28 23:00' - interval '23 hours' | timestamp '2001-09-28 00:00:00' |
| - | interval '1 day' - interval '1 hour' | interval '1 day -01:00:00' |
| - | timestamp '2001-09-29 03:00' - timestamp '2001-09-27 12:00' | interval '1 day 15:00:00' |
| * | 900 * interval '1 second' | interval '00:15:00' |
| * | 21 * interval '1 day' | interval '21 days' |
| * | double precision '3.5' * interval '1 hour' | interval '03:30:00' |
| / | interval '1 hour' / double precision '1.5' | interval '00:40:00' |

Table 9.31. Date/Time Functions

| Function | Return Type | Description | Example | Result |
| --- | --- | --- | --- | --- |
| age(timestamp, timestamp) | interval | Subtract arguments, producing a “symbolic” result that uses years and months, rather than just days | age(timestamp '2001-04-10', timestamp '1957-06-13') | 43 years 9 mons 27 days |
| age(timestamp) | interval | Subtract from current_date (at midnight) | age(timestamp '1957-06-13') | 43 years 8 mons 3 days |
| clock_timestamp() | timestamp with time zone | Current date and time (changes during statement execution); see Section 9.9.4 | | |
| current_date | date | Current date; see Section 9.9.4 | | |
| current_time | time with time zone | Current time of day; see Section 9.9.4 | | |
| current_timestamp | timestamp with time zone | Current date and time (start of current transaction); see Section 9.9.4 | | |
| date_part(text, timestamp) | double precision | Get subfield (equivalent to extract); see Section 9.9.1 | date_part('hour', timestamp '2001-02-16 20:38:40') | 20 |
| date_part(text, interval) | double precision | Get subfield (equivalent to extract); see Section 9.9.1 | date_part('month', interval '2 years 3 months') | 3 |
| date_trunc(text, timestamp) | timestamp | Truncate to specified precision; see Section 9.9.2 | date_trunc('hour', timestamp '2001-02-16 20:38:40') | 2001-02-16 20:00:00 |
| date_trunc(text, timestamp with time zone, text) | timestamp with time zone | Truncate to specified precision in the specified time zone; see Section 9.9.2 | date_trunc('day', timestamptz '2001-02-16 20:38:40+00', 'Australia/Sydney') | 2001-02-16 13:00:00+00 |
| date_trunc(text, interval) | interval | Truncate to specified precision; see Section 9.9.2 | date_trunc('hour', interval '2 days 3 hours 40 minutes') | 2 days 03:00:00 |
| extract(field from timestamp) | double precision | Get subfield; see Section 9.9.1 | extract(hour from timestamp '2001-02-16 20:38:40') | 20 |
| extract(field from interval) | double precision | Get subfield; see Section 9.9.1 | extract(month from interval '2 years 3 months') | 3 |
| isfinite(date) | boolean | Test for finite date (not +/-infinity) | isfinite(date '2001-02-16') | true |
| isfinite(timestamp) | boolean | Test for finite time stamp (not +/-infinity) | isfinite(timestamp '2001-02-16 21:28:30') | true |
| isfinite(interval) | boolean | Test for finite interval | isfinite(interval '4 hours') | true |
| justify_days(interval) | interval | Adjust interval so 30-day time periods are represented as months | justify_days(interval '35 days') | 1 mon 5 days |
| justify_hours(interval) | interval | Adjust interval so 24-hour time periods are represented as days | justify_hours(interval '27 hours') | 1 day 03:00:00 |
| justify_interval(interval) | interval | Adjust interval using justify_days and justify_hours, with additional sign adjustments | justify_interval(interval '1 mon -1 hour') | 29 days 23:00:00 |
| localtime | time | Current time of day; see Section 9.9.4 | | |
| localtimestamp | timestamp | Current date and time (start of current transaction); see Section 9.9.4 | | |
| make_date(year int, month int, day int) | date | Create date from year, month and day fields | make_date(2013, 7, 15) | 2013-07-15 |
| make_interval(years int DEFAULT 0, months int DEFAULT 0, weeks int DEFAULT 0, days int DEFAULT 0, hours int DEFAULT 0, mins int DEFAULT 0, secs double precision DEFAULT 0.0) | interval | Create interval from years, months, weeks, days, hours, minutes and seconds fields | make_interval(days => 10) | 10 days |
| make_time(hour int, min int, sec double precision) | time | Create time from hour, minute and seconds fields | make_time(8, 15, 23.5) | 08:15:23.5 |
| make_timestamp(year int, month int, day int, hour int, min int, sec double precision) | timestamp | Create timestamp from year, month, day, hour, minute and seconds fields | make_timestamp(2013, 7, 15, 8, 15, 23.5) | 2013-07-15 08:15:23.5 |
| make_timestamptz(year int, month int, day int, hour int, min int, sec double precision, [ timezone text ]) | timestamp with time zone | Create timestamp with time zone from year, month, day, hour, minute and seconds fields; if timezone is not specified, the current time zone is used | make_timestamptz(2013, 7, 15, 8, 15, 23.5) | 2013-07-15 08:15:23.5+01 |
| now() | timestamp with time zone | Current date and time (start of current transaction); see Section 9.9.4 | | |
| statement_timestamp() | timestamp with time zone | Current date and time (start of current statement); see Section 9.9.4 | | |
| timeofday() | text | Current date and time (like clock_timestamp, but as a text string); see Section 9.9.4 | | |
| transaction_timestamp() | timestamp with time zone | Current date and time (start of current transaction); see Section 9.9.4 | | |
| to_timestamp(double precision) | timestamp with time zone | Convert Unix epoch (seconds since 1970-01-01 00:00:00+00) to timestamp | to_timestamp(1284352323) | 2010-09-13 04:32:03+00 |

In addition to these functions, the SQL OVERLAPS operator is supported:

(start1, end1) OVERLAPS (start2, end2)
(start1, length1) OVERLAPS (start2, length2)

This expression yields true when two time periods (defined by their endpoints) overlap, false when they do not overlap. The endpoints can be specified as pairs of dates, times, or time stamps; or as a date, time, or time stamp followed by an interval. When a pair of values is provided, either the start or the end can be written first; OVERLAPS automatically takes the earlier value of the pair as the start. Each time period is considered to represent the half-open interval start <= time < end, unless start and end are equal in which case it represents that single time instant. This means for instance that two time periods with only an endpoint in common do not overlap.

SELECT (DATE '2001-02-16', DATE '2001-12-21') OVERLAPS
       (DATE '2001-10-30', DATE '2002-10-30');
Result: true
SELECT (DATE '2001-02-16', INTERVAL '100 days') OVERLAPS
       (DATE '2001-10-30', DATE '2002-10-30');
Result: false
SELECT (DATE '2001-10-29', DATE '2001-10-30') OVERLAPS
       (DATE '2001-10-30', DATE '2001-10-31');
Result: false
SELECT (DATE '2001-10-30', DATE '2001-10-30') OVERLAPS
       (DATE '2001-10-30', DATE '2001-10-31');
Result: true
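
A few further cases, given as a sketch of the rules described above: endpoints written in reverse order, time values that share only an endpoint, and periods given as a start plus a length. The specific values are made up for illustration.

```sql
-- The first pair is written end-first; OVERLAPS takes the earlier value as the start.
SELECT (DATE '2001-12-21', DATE '2001-02-16') OVERLAPS
       (DATE '2001-10-30', DATE '2002-10-30');
-- true

-- Half-open periods that share only an endpoint do not overlap.
SELECT (TIME '09:00', TIME '10:00') OVERLAPS
       (TIME '10:00', TIME '11:00');
-- false

-- Start plus length: 09:00-11:00 overlaps 10:30-11:30.
SELECT (TIMESTAMP '2001-02-16 09:00', INTERVAL '2 hours') OVERLAPS
       (TIMESTAMP '2001-02-16 10:30', INTERVAL '1 hour');
-- true
```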

When adding an interval value to (or subtracting an interval value from) a timestamp with time zone value, the days component advances or decrements the date of the timestamp with time zone by the indicated number of days, keeping the time of day the same. Across daylight saving time changes (when the session time zone is set to a time zone that recognizes DST), this means interval '1 day' does not necessarily equal interval '24 hours'. For example, with the session time zone set to America/Denver:

SELECT timestamp with time zone '2005-04-02 12:00:00-07' + interval '1 day';
Result: 2005-04-03 12:00:00-06
SELECT timestamp with time zone '2005-04-02 12:00:00-07' + interval '24 hours';
Result: 2005-04-03 13:00:00-06

This happens because an hour was skipped due to a change in daylight saving time at 2005-04-03 02:00:00 in time zone America/Denver.

Note there can be ambiguity in the months field returned by age because different months have different numbers of days. PostgreSQL's approach uses the month from the earlier of the two dates when calculating partial months. For example, age('2004-06-01', '2004-04-30') uses April to yield 1 mon 1 day, while using May would yield 1 mon 2 days because May has 31 days, while April has only 30.
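
For comparison, the example from the preceding paragraph can be run next to plain timestamp subtraction, which reports days only and so avoids the ambiguity. This is a sketch; explicit timestamp literals are used here so the argument types are unambiguous.

```sql
SELECT age(timestamp '2004-06-01', timestamp '2004-04-30');   -- 1 mon 1 day
SELECT timestamp '2004-06-01' - timestamp '2004-04-30';       -- 32 days
```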

Subtraction of dates and timestamps can also be complex. One conceptually simple way to perform subtraction is to convert each value to a number of seconds using EXTRACT(EPOCH FROM ...), then subtract the results; this produces the number of seconds between the two values. This will adjust for the number of days in each month, time zone changes, and daylight saving time adjustments. Subtraction of date or timestamp values with the “-” operator returns the number of days (24-hour periods) and hours/minutes/seconds between the values, making the same adjustments. The age function returns years, months, days, and hours/minutes/seconds, performing field-by-field subtraction and then adjusting for negative field values. The following queries illustrate the differences in these approaches. The sample results were produced with timezone = 'US/Eastern'; there is a daylight saving time change between the two dates used:

SELECT EXTRACT(EPOCH FROM timestamptz '2013-07-01 12:00:00') -
       EXTRACT(EPOCH FROM timestamptz '2013-03-01 12:00:00');
Result: 10537200
SELECT (EXTRACT(EPOCH FROM timestamptz '2013-07-01 12:00:00') -
        EXTRACT(EPOCH FROM timestamptz '2013-03-01 12:00:00'))
        / 60 / 60 / 24;
Result: 121.958333333333
SELECT timestamptz '2013-07-01 12:00:00' - timestamptz '2013-03-01 12:00:00';
Result: 121 days 23:00:00
SELECT age(timestamptz '2013-07-01 12:00:00', timestamptz '2013-03-01 12:00:00');
Result: 4 mons

9.9.1. `EXTRACT`, `date_part`

EXTRACT(field FROM source)

The extract function retrieves subfields such as year or hour from date/time values. source must be a value expression of type timestamp, time, or interval. (Expressions of type date are cast to timestamp and can therefore be used as well.) field is an identifier or string that selects what field to extract from the source value. The extract function returns values of type double precision. The following are valid field names:

century

The century

SELECT EXTRACT(CENTURY FROM TIMESTAMP '2000-12-16 12:21:13');
Result: 20
SELECT EXTRACT(CENTURY FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 21

The first century starts at 0001-01-01 00:00:00 AD, although they did not know it at the time. This definition applies to all Gregorian calendar countries. There is no century number 0; you go from -1 century to 1 century. If you disagree with this, please write your complaint to: Pope, Cathedral Saint-Peter of Roma, Vatican.

day

For timestamp values, the day (of the month) field (1 - 31); for interval values, the number of days

SELECT EXTRACT(DAY FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 16

SELECT EXTRACT(DAY FROM INTERVAL '40 days 1 minute');
Result: 40

decade

The year field divided by 10

SELECT EXTRACT(DECADE FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 200

dow

The day of the week as Sunday (0) to Saturday (6)

SELECT EXTRACT(DOW FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 5

Note that extract's day of the week numbering differs from that of the to_char(..., 'D') function.

doy

The day of the year (1 - 365/366)

SELECT EXTRACT(DOY FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 47

epoch

For timestamp with time zone values, the number of seconds since 1970-01-01 00:00:00 UTC (can be negative); for date and timestamp values, the number of seconds since 1970-01-01 00:00:00 local time; for interval values, the total number of seconds in the interval

SELECT EXTRACT(EPOCH FROM TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40.12-08');
Result: 982384720.12

SELECT EXTRACT(EPOCH FROM INTERVAL '5 days 3 hours');
Result: 442800

You can convert an epoch value back to a time stamp with to_timestamp:

SELECT to_timestamp(982384720.12);
Result: 2001-02-17 04:38:40.12+00

hour

The hour field (0 - 23)

SELECT EXTRACT(HOUR FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 20

isodow

The day of the week as Monday (1) to Sunday (7)

SELECT EXTRACT(ISODOW FROM TIMESTAMP '2001-02-18 20:38:40');
Result: 7

This is identical to dow except for Sunday. This matches the ISO 8601 day of the week numbering.

isoyear

The ISO 8601 week-numbering year that the date falls in (not applicable to intervals)

SELECT EXTRACT(ISOYEAR FROM DATE '2006-01-01');
Result: 2005
SELECT EXTRACT(ISOYEAR FROM DATE '2006-01-02');
Result: 2006

Each ISO 8601 week-numbering year begins with the Monday of the week containing the 4th of January, so in early January or late December the ISO year may be different from the Gregorian year. See the week field for more information.

This field is not available in PostgreSQL releases prior to 8.3.

microseconds

The seconds field, including fractional parts, multiplied by 1 000 000; note that this includes full seconds

SELECT EXTRACT(MICROSECONDS FROM TIME '17:12:28.5');
Result: 28500000

millennium

The millennium

SELECT EXTRACT(MILLENNIUM FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 3

Years in the 1900s are in the second millennium. The third millennium started January 1, 2001.

milliseconds

The seconds field, including fractional parts, multiplied by 1000. Note that this includes full seconds.

SELECT EXTRACT(MILLISECONDS FROM TIME '17:12:28.5');
Result: 28500

minute

The minutes field (0 - 59)

SELECT EXTRACT(MINUTE FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 38

month

For timestamp values, the number of the month within the year (1 - 12); for interval values, the number of months, modulo 12 (0 - 11)

SELECT EXTRACT(MONTH FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 2

SELECT EXTRACT(MONTH FROM INTERVAL '2 years 3 months');
Result: 3

SELECT EXTRACT(MONTH FROM INTERVAL '2 years 13 months');
Result: 1

quarter

The quarter of the year (1 - 4) that the date is in

SELECT EXTRACT(QUARTER FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 1

second

The seconds field, including fractional parts (0 - 59, or 60 if leap seconds are implemented by the operating system)

SELECT EXTRACT(SECOND FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 40

SELECT EXTRACT(SECOND FROM TIME '17:12:28.5');
Result: 28.5

timezone

The time zone offset from UTC, measured in seconds. Positive values correspond to time zones east of UTC, negative values to zones west of UTC. (Technically, PostgreSQL does not use UTC because leap seconds are not handled.)

timezone_hour

The hour component of the time zone offset

timezone_minute

The minute component of the time zone offset

week

The number of the ISO 8601 week-numbering week of the year. By definition, ISO weeks start on Mondays and the first week of a year contains January 4 of that year. In other words, the first Thursday of a year is in week 1 of that year.

In the ISO week-numbering system, it is possible for early-January dates to be part of the 52nd or 53rd week of the previous year, and for late-December dates to be part of the first week of the next year. For example, 2005-01-01 is part of the 53rd week of year 2004, and 2006-01-01 is part of the 52nd week of year 2005, while 2012-12-31 is part of the first week of 2013. It's recommended to use the isoyear field together with week to get consistent results.

SELECT EXTRACT(WEEK FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 7
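
The three boundary dates mentioned above can be checked side by side. The query below is a sketch that pairs week with isoyear, as recommended, and shows the calendar year for contrast; the VALUES list and column aliases are only for illustration.

```sql
SELECT d,
       EXTRACT(ISOYEAR FROM d) AS isoyear,
       EXTRACT(WEEK FROM d)    AS week,
       EXTRACT(YEAR FROM d)    AS year
FROM (VALUES (DATE '2005-01-01'),
             (DATE '2006-01-01'),
             (DATE '2012-12-31')) AS t(d);
-- 2005-01-01 | 2004 | 53 | 2005
-- 2006-01-01 | 2005 | 52 | 2006
-- 2012-12-31 | 2013 |  1 | 2012
```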

year

The year field. Keep in mind there is no 0 AD, so subtracting BC years from AD years should be done with care.

SELECT EXTRACT(YEAR FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 2001

Note

When the input value is +/-Infinity, extract returns +/-Infinity for monotonically-increasing fields (epoch, julian, year, isoyear, decade, century, and millennium). For other fields, NULL is returned. PostgreSQL versions before 9.6 returned zero for all cases of infinite input.
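
A sketch of this behavior for infinite timestamps, as it applies to PostgreSQL 9.6 and later:

```sql
SELECT EXTRACT(YEAR FROM TIMESTAMP 'infinity');     -- Infinity
SELECT EXTRACT(EPOCH FROM TIMESTAMP '-infinity');   -- -Infinity
SELECT EXTRACT(HOUR FROM TIMESTAMP 'infinity');     -- NULL (hour is not monotonically increasing)
```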

The extract function is primarily intended for computational processing. For formatting date/time values for display, see Section 9.8.

The date_part function is modeled on the traditional Ingres equivalent to the SQL-standard function extract:

date_part('field', source)

Note that here the field parameter needs to be a string value, not a name. The valid field names for date_part are the same as for extract.

SELECT date_part('day', TIMESTAMP '2001-02-16 20:38:40');
Result: 16

SELECT date_part('hour', INTERVAL '4 hours 3 minutes');
Result: 4

9.9.2. `date_trunc`

The function date_trunc is conceptually similar to the trunc function for numbers.

date_trunc(field, source [, time_zone ])

source is a value expression of type timestamp, timestamp with time zone, or interval. (Values of type date and time are cast automatically to timestamp or interval, respectively.) field selects to which precision to truncate the input value. The return value is likewise of type timestamp, timestamp with time zone, or interval, and it has all fields that are less significant than the selected one set to zero (or one, for day and month).

Valid values for field are:

microseconds

milliseconds

second

minute

hour

day

week

month

quarter

year

decade

century

millennium

When the input value is of type timestamp with time zone, the truncation is performed with respect to a particular time zone; for example, truncation to day produces a value that is midnight in that zone. By default, truncation is done with respect to the current TimeZone setting, but the optional time_zone argument can be provided to specify a different time zone. The time zone name can be specified in any of the ways described in Section 8.5.3.

A time zone cannot be specified when processing timestamp without time zone or interval inputs. These are always taken at face value.

Examples (assuming the local time zone is America/New_York):

SELECT date_trunc('hour', TIMESTAMP '2001-02-16 20:38:40');
Result: 2001-02-16 20:00:00

SELECT date_trunc('year', TIMESTAMP '2001-02-16 20:38:40');
Result: 2001-01-01 00:00:00

SELECT date_trunc('day', TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40+00');
Result: 2001-02-16 00:00:00-05

SELECT date_trunc('day', TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40+00', 'Australia/Sydney');
Result: 2001-02-16 08:00:00-05

SELECT date_trunc('hour', INTERVAL '3 days 02:47:33');
Result: 3 days 02:00:00
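
Some of the coarser field values, shown as a sketch on the same sample timestamp. Note that the less significant day and month fields are set to one rather than zero, and that week truncates to the Monday of the week (2001-02-16 was a Friday).

```sql
SELECT date_trunc('week', TIMESTAMP '2001-02-16 20:38:40');      -- 2001-02-12 00:00:00
SELECT date_trunc('month', TIMESTAMP '2001-02-16 20:38:40');     -- 2001-02-01 00:00:00
SELECT date_trunc('quarter', TIMESTAMP '2001-02-16 20:38:40');   -- 2001-01-01 00:00:00
SELECT date_trunc('decade', TIMESTAMP '2001-02-16 20:38:40');    -- 2000-01-01 00:00:00
```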

9.9.3. `AT TIME ZONE`

The AT TIME ZONE construct converts time stamp without time zone to/from time stamp with time zone, and time values to different time zones. Table 9.32 shows its variants.

Table 9.32. `AT TIME ZONE` Variants

| Expression | Return Type | Description |
| --- | --- | --- |
| timestamp without time zone AT TIME ZONE zone | timestamp with time zone | Treat given time stamp without time zone as located in the specified time zone |
| timestamp with time zone AT TIME ZONE zone | timestamp without time zone | Convert given time stamp with time zone to the new time zone, with no time zone designation |
| time with time zone AT TIME ZONE zone | time with time zone | Convert given time with time zone to the new time zone |

In these expressions, the desired time zone zone can be specified either as a text string (e.g., 'America/Los_Angeles') or as an interval (e.g., INTERVAL '-08:00'). In the text case, a time zone name can be specified in any of the ways described in Section 8.5.3.

Examples (assuming the local time zone is America/Los_Angeles):

SELECT TIMESTAMP '2001-02-16 20:38:40' AT TIME ZONE 'America/Denver';
Result: 2001-02-16 19:38:40-08

SELECT TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40-05' AT TIME ZONE 'America/Denver';
Result: 2001-02-16 18:38:40

SELECT TIMESTAMP '2001-02-16 20:38:40' AT TIME ZONE 'Asia/Tokyo' AT TIME ZONE 'America/Chicago';
Result: 2001-02-16 05:38:40

The first example adds a time zone to a value that lacks it, and displays the value using the current TimeZone setting. The second example shifts the time stamp with time zone value to the specified time zone, and returns the value without a time zone. This allows storage and display of values different from the current TimeZone setting. The third example converts Tokyo time to Chicago time. Converting time values to other time zones uses the currently active time zone rules since no date is supplied.

The function timezone(zone, timestamp) is equivalent to the SQL-conforming construct timestamp AT TIME ZONE zone.
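
As a sketch of this equivalence, the second example above can be rewritten with the function form, or with the zone given as an interval rather than a name. Both return a time stamp without time zone, so the result does not depend on the session TimeZone setting.

```sql
SELECT timezone('America/Denver', TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40-05');
-- 2001-02-16 18:38:40

SELECT TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40-05' AT TIME ZONE INTERVAL '-07:00';
-- 2001-02-16 18:38:40
```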

9.9.4. Current Date/Time

PostgreSQL provides a number of functions that return values related to the current date and time. These SQL-standard functions all return values based on the start time of the current transaction:

CURRENT_DATE
CURRENT_TIME
CURRENT_TIMESTAMP
CURRENT_TIME(precision)
CURRENT_TIMESTAMP(precision)
LOCALTIME
LOCALTIMESTAMP
LOCALTIME(precision)
LOCALTIMESTAMP(precision)

CURRENT_TIME and CURRENT_TIMESTAMP deliver values with time zone; LOCALTIME and LOCALTIMESTAMP deliver values without time zone.

CURRENT_TIME, CURRENT_TIMESTAMP, LOCALTIME, and LOCALTIMESTAMP can optionally take a precision parameter, which causes the result to be rounded to that many fractional digits in the seconds field. Without a precision parameter, the result is given to the full available precision.

Some examples:

SELECT CURRENT_TIME;
Result: 14:39:53.662522-05

SELECT CURRENT_DATE;
Result: 2001-12-23

SELECT CURRENT_TIMESTAMP;
Result: 2001-12-23 14:39:53.662522-05

SELECT CURRENT_TIMESTAMP(2);
Result: 2001-12-23 14:39:53.66-05

SELECT LOCALTIMESTAMP;
Result: 2001-12-23 14:39:53.662522

Since these functions return the start time of the current transaction, their values do not change during the transaction. This is considered a feature: the intent is to allow a single transaction to have a consistent notion of the “current” time, so that multiple modifications within the same transaction bear the same time stamp.

Note

Other database systems might advance these values more frequently.

PostgreSQL also provides functions that return the start time of the current statement, as well as the actual current time at the instant the function is called. The complete list of non-SQL-standard time functions is:

transaction_timestamp()
statement_timestamp()
clock_timestamp()
timeofday()
now()

transaction_timestamp() is equivalent to CURRENT_TIMESTAMP, but is named to clearly reflect what it returns. statement_timestamp() returns the start time of the current statement (more specifically, the time of receipt of the latest command message from the client). statement_timestamp() and transaction_timestamp() return the same value during the first command of a transaction, but might differ during subsequent commands. clock_timestamp() returns the actual current time, and therefore its value changes even within a single SQL command. timeofday() is a historical PostgreSQL function. Like clock_timestamp(), it returns the actual current time, but as a formatted text string rather than a timestamp with time zone value. now() is a traditional PostgreSQL equivalent to transaction_timestamp().
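
The differences between these functions can be observed with a short script such as the following sketch. It assumes each statement is sent to the server separately (as psql does for interactive input and script files); if all statements travel in a single command message, statement_timestamp() would not advance between them.

```sql
BEGIN;

-- Both report the transaction start time, so they compare equal.
SELECT now() = transaction_timestamp() AS same_transaction_time;

-- Let some wall-clock time pass inside the transaction.
SELECT pg_sleep(0.2);

SELECT
    now() = transaction_timestamp()                 AS still_transaction_start,
    statement_timestamp() > transaction_timestamp() AS statement_is_later,
    clock_timestamp() >= statement_timestamp()      AS clock_keeps_moving;

COMMIT;
```

Under the stated assumption, all three columns of the last query should be true.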

All the date/time data types also accept the special literal value now to specify the current date and time (again, interpreted as the transaction start time). Thus, the following three all return the same result:

SELECT CURRENT_TIMESTAMP;
SELECT now();
SELECT TIMESTAMP 'now';  -- incorrect for use with DEFAULT

Tip

You do not want to use the third form when specifying a DEFAULT clause while creating a table. The system will convert now to a timestamp as soon as the constant is parsed, so that when the default value is needed, the time of the table creation would be used! The first two forms will not be evaluated until the default value is used, because they are function calls. Thus they will give the desired behavior of defaulting to the time of row insertion.
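
The pitfall described in this tip can be reproduced with a sketch like the one below. The table and column names are hypothetical and serve only as an illustration.

```sql
-- Hypothetical table: one column defaults to a function call,
-- the other to a literal that is converted when the table is created.
CREATE TABLE event_log (
    message    text,
    created_at timestamp DEFAULT now(),           -- evaluated at each insertion
    frozen_at  timestamp DEFAULT TIMESTAMP 'now'  -- fixed at table creation time
);

INSERT INTO event_log (message) VALUES ('first');
SELECT pg_sleep(1);
INSERT INTO event_log (message) VALUES ('second');

-- When the two INSERTs run in separate transactions, created_at differs
-- between the rows, while frozen_at shows the table creation time in both.
SELECT message, created_at, frozen_at FROM event_log;
```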

9.9.5. Delaying Execution

The following functions are available to delay execution of the server process:

pg_sleep(seconds)
pg_sleep_for(interval)
pg_sleep_until(timestamp with time zone)

pg_sleep makes the current session's process sleep until the given number of seconds has elapsed. The seconds argument is a value of type double precision, so fractional-second delays can be specified. pg_sleep_for is a convenience function for larger sleep times specified as an interval. pg_sleep_until is a convenience function for when a specific wake-up time is desired. For example:

SELECT pg_sleep(1.5);
SELECT pg_sleep_for('5 minutes');
SELECT pg_sleep_until('tomorrow 03:00');

Note

The effective resolution of the sleep interval is platform-specific; 0.01 seconds is a common value. The sleep delay will be at least as long as specified. It might be longer depending on factors such as server load. In particular, pg_sleep_until is not guaranteed to wake up exactly at the specified time, but it will not wake up any earlier.

Warning

Make sure that your session does not hold more locks than necessary when calling pg_sleep or its variants. Otherwise other sessions might have to wait for your sleeping process, slowing down the entire system.

[7] 60 if leap seconds are implemented by the operating system

9.4. String Functions and Operators

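This section describes functions and operators for examining and manipulating string values. Strings in this context include values of the types character, character varying, and text. Unless otherwise noted, all of the functions listed below work on all of these types, but be wary of the potential effects of automatic space-padding when using the character type. Some functions also exist natively for the bit-string types.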
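
As a small illustration of the space-padding caveat, the following sketch compares the same function applied to a padded character value and to a text value that contains explicit trailing spaces. The literal values are arbitrary.

```sql
SELECT char_length('ab'::character(5)) AS padded_char,  -- 2: pad spaces are not counted
       char_length('ab   '::text)      AS plain_text;   -- 5: trailing spaces are significant in text
```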

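SQL defines some string functions that use key words, rather than commas, to separate arguments. Details are in Table 9.9. PostgreSQL also provides versions of these functions that use the regular function invocation syntax (see Table 9.10).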

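Before PostgreSQL 8.3, these functions would silently accept values of several non-string data types as well, due to the presence of implicit coercions from those data types to text. Those coercions have been removed because they frequently caused surprising behaviors. However, the string concatenation operator (||) still accepts non-string input, so long as at least one input is of a string type, as shown in Table 9.9. For other cases, insert an explicit coercion to text if you need to duplicate the previous behavior.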
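
A short sketch of the two cases: concatenation with one non-string input works directly, while an ordinary string function needs an explicit cast. The values are arbitrary.

```sql
-- The || operator accepts a non-string input when the other side is a string.
SELECT 'Value: ' || 42;      -- Value: 42

-- Other string functions need an explicit coercion to text.
SELECT length(42::text);     -- 2

-- Without the cast, the call is rejected because no matching function exists:
-- SELECT length(42);
```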

Table 9.9. SQL String Functions and Operators

| Function | Return Type | Description | Example | Result |
| --- | --- | --- | --- | --- |
| string \|\| string | text | String concatenation | 'Post' \|\| 'greSQL' | PostgreSQL |
| string \|\| non-string or non-string \|\| string | text | String concatenation with one non-string input | 'Value: ' \|\| 42 | Value: 42 |
| bit_length(string) | int | Number of bits in string | bit_length('jose') | 32 |
| char_length(string) or character_length(string) | int | Number of characters in string | char_length('jose') | 4 |
| lower(string) | text | Convert string to lower case | lower('TOM') | tom |
| octet_length(string) | int | Number of bytes in string | octet_length('jose') | 4 |
| overlay(string placing string from int [for int]) | text | Replace substring | overlay('Txxxxas' placing 'hom' from 2 for 4) | Thomas |
| position(substring in string) | int | Location of specified substring | position('om' in 'Thomas') | 3 |
| substring(string [from int] [for int]) | text | Extract substring | substring('Thomas' from 2 for 3) | hom |
| substring(string from pattern) | text | Extract substring matching POSIX regular expression. See the section on pattern matching for more information. | substring('Thomas' from '...$') | mas |
| substring(string from pattern for escape) | text | Extract substring matching SQL regular expression. See the section on pattern matching for more information. | substring('Thomas' from '%#"o_a#"_' for '#') | oma |
| trim([leading \| trailing \| both] [characters] from string) | text | Remove the longest string containing only characters from characters (a space by default) from the start, end, or both ends (both is the default) of string | trim(both 'xyz' from 'yxTomxx') | Tom |
| trim([leading \| trailing \| both] [from] string [, characters]) | text | Non-standard syntax for trim() | trim(both from 'yxTomxx', 'xyz') | Tom |
| upper(string) | text | Convert string to upper case | upper('tom') | TOM |

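Additional string manipulation functions are available and are listed in Table 9.10. Some of them are used internally to implement the SQL-standard string functions listed in Table 9.9.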

Table 9.10. Other String Functions

| Function | Return Type | Description | Example | Result |
| --- | --- | --- | --- | --- |
| ascii(string) | int | ASCII code of the first character of the argument. For UTF8 returns the Unicode code point of the character. For other multibyte encodings, the argument must be an ASCII character. | ascii('x') | 120 |
| btrim(string text [, characters text]) | text | Remove the longest string consisting only of characters in characters (a space by default) from the start and end of string | btrim('xyxtrimyyx', 'xyz') | trim |
| chr(int) | text | Character with the given code. For UTF8 the argument is treated as a Unicode code point. For other multibyte encodings the argument must designate an ASCII character. The NULL (0) character is not allowed because text data types cannot store such bytes. | chr(65) | A |
| concat(str "any" [, str "any" [, ...] ]) | text | Concatenate the text representations of all the arguments. NULL arguments are ignored. | concat('abcde', 2, NULL, 22) | abcde222 |
| concat_ws(sep text, str "any" [, str "any" [, ...] ]) | text | Concatenate all but the first argument with separators. The first argument is used as the separator string. NULL arguments are ignored. | concat_ws(',', 'abcde', 2, NULL, 22) | abcde,2,22 |
| convert(string bytea, src_encoding name, dest_encoding name) | bytea | Convert string to dest_encoding. The original encoding is specified by src_encoding. The string must be valid in this encoding. Conversions can be defined by CREATE CONVERSION. There are also some predefined conversions. See Table 9.11 for available conversions. | convert('text_in_utf8', 'UTF8', 'LATIN1') | text_in_utf8 represented in Latin-1 encoding (ISO 8859-1) |
| convert_from(string bytea, src_encoding name) | text | Convert string to the database encoding. The original encoding is specified by src_encoding. The string must be valid in this encoding. | convert_from('text_in_utf8', 'UTF8') | text_in_utf8 represented in the current database encoding |
| convert_to(string text, dest_encoding name) | bytea | Convert string to dest_encoding. | convert_to('some text', 'UTF8') | some text represented in the UTF8 encoding |
| decode(string text, format text) | bytea | Decode binary data from textual representation in string. Options for format are the same as in encode. | decode('MTIzAAE=', 'base64') | \x3132330001 |
| encode(data bytea, format text) | text | Encode binary data into a textual representation. Supported formats are: base64, hex, escape. escape converts zero bytes and high-bit-set bytes to octal sequences (\nnn) and doubles backslashes. | encode('123\000\001', 'base64') | MTIzAAE= |
| format(formatstr text [, formatarg "any" [, ...] ]) | text | Format arguments according to a format string. This function is similar to the C function sprintf. See Section 9.4.1. | format('Hello %s, %1$s', 'World') | Hello World, World |
| initcap(string) | text | Convert the first letter of each word to upper case and the rest to lower case. Words are sequences of alphanumeric characters separated by non-alphanumeric characters. | initcap('hi THOMAS') | Hi Thomas |
| left(str text, n int) | text | Return first n characters in the string. When n is negative, return all but last \|n\| characters. | left('abcde', 2) | ab |
| length(string) | int | Number of characters in string | length('jose') | 4 |
| length(string bytea, encoding name) | int | Number of characters in string in the given encoding. The string must be valid in this encoding. | length('jose', 'UTF8') | 4 |
| lpad(string text, length int [, fill text]) | text | Fill up the string to length length by prepending the characters fill (a space by default). If the string is already longer than length then it is truncated (on the right). | lpad('hi', 5, 'xy') | xyxhi |
| ltrim(string text [, characters text]) | text | Remove the longest string containing only characters from characters (a space by default) from the start of string | ltrim('zzzytest', 'xyz') | test |
| md5(string) | text | Calculates the MD5 hash of string, returning the result in hexadecimal | md5('abc') | 900150983cd24fb0d6963f7d28e17f72 |
| parse_ident(qualified_identifier text [, strictmode boolean DEFAULT true ]) | text[] | Split qualified_identifier into an array of identifiers, removing any quoting of individual identifiers. By default, extra characters after the last identifier are considered an error; but if the second parameter is false, then such extra characters are ignored. (This behavior is useful for parsing names for objects like functions.) Note that this function does not truncate over-length identifiers. If you want truncation you can cast the result to name[]. | parse_ident('"SomeSchema".someTable') | {SomeSchema,sometable} |
| pg_client_encoding() | name | Current client encoding name | pg_client_encoding() | SQL_ASCII |
| quote_ident(string text) | text | Return the given string suitably quoted to be used as an identifier in an SQL statement string. Quotes are added only if necessary (i.e., if the string contains non-identifier characters or would be case-folded). Embedded quotes are properly doubled. See also Example 42.1. | quote_ident('Foo bar') | "Foo bar" |
| quote_literal(string text) | text | Return the given string suitably quoted to be used as a string literal in an SQL statement string. Embedded single-quotes and backslashes are properly doubled. Note that quote_literal returns null on null input; if the argument might be null, quote_nullable is often more suitable. See also Example 42.1. | quote_literal(E'O\'Reilly') | 'O''Reilly' |
| quote_literal(value anyelement) | text | Coerce the given value to text and then quote it as a literal. Embedded single-quotes and backslashes are properly doubled. | quote_literal(42.5) | '42.5' |
| quote_nullable(string text) | text | Return the given string suitably quoted to be used as a string literal in an SQL statement string; or, if the argument is null, return NULL. Embedded single-quotes and backslashes are properly doubled. See also Example 42.1. | quote_nullable(NULL) | NULL |
| quote_nullable(value anyelement) | text | Coerce the given value to text and then quote it as a literal; or, if the argument is null, return NULL. Embedded single-quotes and backslashes are properly doubled. | quote_nullable(42.5) | '42.5' |
| regexp_match(string text, pattern text [, flags text]) | text[] | Return captured substring(s) resulting from the first match of a POSIX regular expression to the string. See the section on pattern matching for more information. | regexp_match('foobarbequebaz', '(bar)(beque)') | {bar,beque} |
| regexp_matches(string text, pattern text [, flags text]) | setof text[] | Return captured substring(s) resulting from matching a POSIX regular expression to the string. See the section on pattern matching for more information. | regexp_matches('foobarbequebaz', 'ba.', 'g') | {bar} and {baz} (2 rows) |
| regexp_replace(string text, pattern text, replacement text [, flags text]) | text | Replace substring(s) matching a POSIX regular expression. See the section on pattern matching for more information. | regexp_replace('Thomas', '.[mN]a.', 'M') | ThM |
| regexp_split_to_array(string text, pattern text [, flags text]) | text[] | Split string using a POSIX regular expression as the delimiter. See the section on pattern matching for more information. | regexp_split_to_array('hello world', '\s+') | {hello,world} |
| regexp_split_to_table(string text, pattern text [, flags text]) | setof text | Split string using a POSIX regular expression as the delimiter. See the section on pattern matching for more information. | regexp_split_to_table('hello world', '\s+') | hello and world (2 rows) |
| repeat(string text, number int) | text | Repeat string the specified number of times | repeat('Pg', 4) | PgPgPgPg |
| replace(string text, from text, to text) | text | Replace all occurrences in string of substring from with substring to | replace('abcdefabcdef', 'cd', 'XX') | abXXefabXXef |
| reverse(str) | text | Return reversed string. | reverse('abcde') | edcba |
| right(str text, n int) | text | Return last n characters in the string. When n is negative, return all but first \|n\| characters. | right('abcde', 2) | de |
| rpad(string text, length int [, fill text]) | text | Fill up the string to length length by appending the characters fill (a space by default). If the string is already longer than length then it is truncated. | rpad('hi', 5, 'xy') | hixyx |
| rtrim(string text [, characters text]) | text | Remove the longest string containing only characters from characters (a space by default) from the end of string | rtrim('testxxzx', 'xyz') | test |
| split_part(string text, delimiter text, field int) | text | Split string on delimiter and return the given field (counting from one) | split_part('abc~@~def~@~ghi', '~@~', 2) | def |
| strpos(string, substring) | int | Location of specified substring (same as position(substring in string), but note the reversed argument order) | strpos('high', 'ig') | 2 |
| substr(string, from [, count]) | text | Extract substring (same as substring(string from from for count)) | substr('alphabet', 3, 2) | ph |
| starts_with(string, prefix) | bool | Returns true if string starts with prefix. | starts_with('alphabet', 'alph') | t |
| to_ascii(string text [, encoding text]) | text | Convert string to ASCII from another encoding (only supports conversion from LATIN1, LATIN2, LATIN9, and WIN1250 encodings) | to_ascii('Karel') | Karel |
| to_hex(number int or bigint) | text | Convert number to its equivalent hexadecimal representation | to_hex(2147483647) | 7fffffff |
| translate(string text, from text, to text) | text | Any character in string that matches a character in the from set is replaced by the corresponding character in the to set. If from is longer than to, occurrences of the extra characters in from are removed. | translate('12345', '143', 'ax') | a2x5 |
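
The difference between quote_literal and quote_nullable on null input, noted in the table above, matters when assembling statement text by concatenation. The following is a minimal sketch; the table name t and column name note are hypothetical.

```sql
-- quote_literal returns null on null input ...
SELECT quote_literal(NULL::text) IS NULL AS literal_is_null;      -- true

-- ... so concatenating it makes the whole statement text null.
SELECT 'UPDATE t SET note = ' || quote_literal(NULL::text);       -- null

-- quote_nullable returns the unquoted string NULL instead,
-- which keeps the statement text intact.
SELECT 'UPDATE t SET note = ' || quote_nullable(NULL::text);      -- UPDATE t SET note = NULL
```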

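The concat, concat_ws and format functions are variadic, so it is possible to pass the values to be concatenated or formatted as an array marked with the VARIADIC keyword (see Section 37.5.5). The array's elements are treated as if they were separate ordinary arguments to the function. If the variadic array argument is NULL, concat and concat_ws return NULL, but format treats a NULL as a zero-element array.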
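
A brief sketch of passing arguments through a VARIADIC array, including the NULL-array case. The array contents are arbitrary.

```sql
-- Array elements are treated as separate ordinary arguments.
SELECT concat_ws('-', VARIADIC ARRAY['a', 'b', 'c']);       -- a-b-c
SELECT format('%s and %s', VARIADIC ARRAY['one', 'two']);   -- one and two

-- A NULL variadic array: concat returns NULL ...
SELECT concat(VARIADIC NULL::text[]) IS NULL;               -- true

-- ... while format treats it as a zero-element array.
SELECT format('no placeholders', VARIADIC NULL::text[]);    -- no placeholders
```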

See also the aggregate function string_agg in Section 9.20.

Table 9.11. Built-in Conversions

| Conversion Name | Source Encoding | Destination Encoding |
| --- | --- | --- |
| ascii_to_mic | SQL_ASCII | MULE_INTERNAL |
| ascii_to_utf8 | SQL_ASCII | UTF8 |
| big5_to_euc_tw | BIG5 | EUC_TW |
| big5_to_mic | BIG5 | MULE_INTERNAL |
| big5_to_utf8 | BIG5 | UTF8 |
| euc_cn_to_mic | EUC_CN | MULE_INTERNAL |
| euc_cn_to_utf8 | EUC_CN | UTF8 |
| euc_jp_to_mic | EUC_JP | MULE_INTERNAL |
| euc_jp_to_sjis | EUC_JP | SJIS |
| euc_jp_to_utf8 | EUC_JP | UTF8 |
| euc_kr_to_mic | EUC_KR | MULE_INTERNAL |
| euc_kr_to_utf8 | EUC_KR | UTF8 |
| euc_tw_to_big5 | EUC_TW | BIG5 |
| euc_tw_to_mic | EUC_TW | MULE_INTERNAL |
| euc_tw_to_utf8 | EUC_TW | UTF8 |
| gb18030_to_utf8 | GB18030 | UTF8 |
| gbk_to_utf8 | GBK | UTF8 |
| iso_8859_10_to_utf8 | LATIN6 | UTF8 |
| iso_8859_13_to_utf8 | LATIN7 | UTF8 |
| iso_8859_14_to_utf8 | LATIN8 | UTF8 |
| iso_8859_15_to_utf8 | LATIN9 | UTF8 |
| iso_8859_16_to_utf8 | LATIN10 | UTF8 |
| iso_8859_1_to_mic | LATIN1 | MULE_INTERNAL |
| iso_8859_1_to_utf8 | LATIN1 | UTF8 |
| iso_8859_2_to_mic | LATIN2 | MULE_INTERNAL |
| iso_8859_2_to_utf8 | LATIN2 | UTF8 |
| iso_8859_2_to_windows_1250 | LATIN2 | WIN1250 |
| iso_8859_3_to_mic | LATIN3 | MULE_INTERNAL |
| iso_8859_3_to_utf8 | LATIN3 | UTF8 |
| iso_8859_4_to_mic | LATIN4 | MULE_INTERNAL |
| iso_8859_4_to_utf8 | LATIN4 | UTF8 |
| iso_8859_5_to_koi8_r | ISO_8859_5 | KOI8R |
| iso_8859_5_to_mic | ISO_8859_5 | MULE_INTERNAL |
| iso_8859_5_to_utf8 | ISO_8859_5 | UTF8 |
| iso_8859_5_to_windows_1251 | ISO_8859_5 | WIN1251 |
| iso_8859_5_to_windows_866 | ISO_8859_5 | WIN866 |
| iso_8859_6_to_utf8 | ISO_8859_6 | UTF8 |
| iso_8859_7_to_utf8 | ISO_8859_7 | UTF8 |
| iso_8859_8_to_utf8 | ISO_8859_8 | UTF8 |
| iso_8859_9_to_utf8 | LATIN5 | UTF8 |
| johab_to_utf8 | JOHAB | UTF8 |
| koi8_r_to_iso_8859_5 | KOI8R | ISO_8859_5 |
| koi8_r_to_mic | KOI8R | MULE_INTERNAL |
| koi8_r_to_utf8 | KOI8R | UTF8 |
| koi8_r_to_windows_1251 | KOI8R | WIN1251 |
| koi8_r_to_windows_866 | KOI8R | WIN866 |
| koi8_u_to_utf8 | KOI8U | UTF8 |
| mic_to_ascii | MULE_INTERNAL | SQL_ASCII |
| mic_to_big5 | MULE_INTERNAL | BIG5 |
| mic_to_euc_cn | MULE_INTERNAL | EUC_CN |
| mic_to_euc_jp | MULE_INTERNAL | EUC_JP |
| mic_to_euc_kr | MULE_INTERNAL | EUC_KR |
| mic_to_euc_tw | MULE_INTERNAL | EUC_TW |
| mic_to_iso_8859_1 | MULE_INTERNAL | LATIN1 |
| mic_to_iso_8859_2 | MULE_INTERNAL | LATIN2 |
| mic_to_iso_8859_3 | MULE_INTERNAL | LATIN3 |
| mic_to_iso_8859_4 | MULE_INTERNAL | LATIN4 |
| mic_to_iso_8859_5 | MULE_INTERNAL | ISO_8859_5 |
| mic_to_koi8_r | MULE_INTERNAL | KOI8R |
| mic_to_sjis | MULE_INTERNAL | SJIS |
| mic_to_windows_1250 | MULE_INTERNAL | WIN1250 |
| mic_to_windows_1251 | MULE_INTERNAL | WIN1251 |
| mic_to_windows_866 | MULE_INTERNAL | WIN866 |
| sjis_to_euc_jp | SJIS | EUC_JP |
| sjis_to_mic | SJIS | MULE_INTERNAL |
| sjis_to_utf8 | SJIS | UTF8 |
| windows_1258_to_utf8 | WIN1258 | UTF8 |
| uhc_to_utf8 | UHC | UTF8 |
| utf8_to_ascii | UTF8 | SQL_ASCII |
| utf8_to_big5 | UTF8 | BIG5 |
| utf8_to_euc_cn | UTF8 | EUC_CN |
| utf8_to_euc_jp | UTF8 | EUC_JP |
| utf8_to_euc_kr | UTF8 | EUC_KR |
| utf8_to_euc_tw | UTF8 | EUC_TW |
| utf8_to_gb18030 | UTF8 | GB18030 |
| utf8_to_gbk | UTF8 | GBK |
| utf8_to_iso_8859_1 | UTF8 | LATIN1 |
| utf8_to_iso_8859_10 | UTF8 | LATIN6 |
| utf8_to_iso_8859_13 | UTF8 | LATIN7 |
| utf8_to_iso_8859_14 | UTF8 | LATIN8 |
| utf8_to_iso_8859_15 | UTF8 | LATIN9 |
| utf8_to_iso_8859_16 | UTF8 | LATIN10 |
| utf8_to_iso_8859_2 | UTF8 | LATIN2 |
| utf8_to_iso_8859_3 | UTF8 | LATIN3 |
| utf8_to_iso_8859_4 | UTF8 | LATIN4 |
| utf8_to_iso_8859_5 | UTF8 | ISO_8859_5 |
| utf8_to_iso_8859_6 | UTF8 | ISO_8859_6 |
| utf8_to_iso_8859_7 | UTF8 | ISO_8859_7 |
| utf8_to_iso_8859_8 | UTF8 | ISO_8859_8 |
| utf8_to_iso_8859_9 | UTF8 | LATIN5 |
| utf8_to_johab | UTF8 | JOHAB |
| utf8_to_koi8_r | UTF8 | KOI8R |
| utf8_to_koi8_u | UTF8 | KOI8U |
| utf8_to_sjis | UTF8 | SJIS |
| utf8_to_windows_1258 | UTF8 | WIN1258 |
| utf8_to_uhc | UTF8 | UHC |
| utf8_to_windows_1250 | UTF8 | WIN1250 |
| utf8_to_windows_1251 | UTF8 | WIN1251 |
| utf8_to_windows_1252 | UTF8 | WIN1252 |
| utf8_to_windows_1253 | UTF8 | WIN1253 |
| utf8_to_windows_1254 | UTF8 | WIN1254 |
| utf8_to_windows_1255 | UTF8 | WIN1255 |
| utf8_to_windows_1256 | UTF8 | WIN1256 |
| utf8_to_windows_1257 | UTF8 | WIN1257 |
| utf8_to_windows_866 | UTF8 | WIN866 |
| utf8_to_windows_874 | UTF8 | WIN874 |
| windows_1250_to_iso_8859_2 | WIN1250 | LATIN2 |
| windows_1250_to_mic | WIN1250 | MULE_INTERNAL |
| windows_1250_to_utf8 | WIN1250 | UTF8 |
| windows_1251_to_iso_8859_5 | WIN1251 | ISO_8859_5 |
| windows_1251_to_koi8_r | WIN1251 | KOI8R |
| windows_1251_to_mic | WIN1251 | MULE_INTERNAL |
| windows_1251_to_utf8 | WIN1251 | UTF8 |
| windows_1251_to_windows_866 | WIN1251 | WIN866 |
| windows_1252_to_utf8 | WIN1252 | UTF8 |
| windows_1256_to_utf8 | WIN1256 | UTF8 |
| windows_866_to_iso_8859_5 | WIN866 | ISO_8859_5 |
| windows_866_to_koi8_r | WIN866 | KOI8R |
| windows_866_to_mic | WIN866 | MULE_INTERNAL |
| windows_866_to_utf8 | WIN866 | UTF8 |
| windows_866_to_windows_1251 | WIN866 | WIN |
| windows_874_to_utf8 | WIN874 | UTF8 |
| euc_jis_2004_to_utf8 | EUC_JIS_2004 | UTF8 |
| utf8_to_euc_jis_2004 | UTF8 | EUC_JIS_2004 |
| shift_jis_2004_to_utf8 | SHIFT_JIS_2004 | UTF8 |
| utf8_to_shift_jis_2004 | UTF8 | SHIFT_JIS_2004 |
| euc_jis_2004_to_shift_jis_2004 | EUC_JIS_2004 | SHIFT_JIS_2004 |
| shift_jis_2004_to_euc_jis_2004 | SHIFT_JIS_2004 | EUC_JIS_2004 |

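Conversion names follow a standard naming scheme: the official name of the source encoding with all non-alphanumeric characters replaced by underscores, followed by _to_, followed by the similarly processed destination encoding name. Therefore, the names might deviate from the customary encoding names.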
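
One way to see how conversion names relate to encoding names is to look up a few conversions in the pg_conversion system catalog. This is only a sketch; the two conversion names in the filter are arbitrary picks from Table 9.11.

```sql
-- pg_encoding_to_char turns the stored encoding IDs into encoding names.
SELECT conname,
       pg_encoding_to_char(conforencoding) AS source_encoding,
       pg_encoding_to_char(contoencoding)  AS destination_encoding
FROM pg_conversion
WHERE conname IN ('utf8_to_iso_8859_1', 'koi8_r_to_mic')
ORDER BY conname;
```

The rows returned should agree with the corresponding entries in Table 9.11, for example LATIN1 as the destination encoding of utf8_to_iso_8859_1.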

9.4.1. `format`

The function format produces output formatted according to a format string, in a style similar to the C function sprintf.

format(formatstr text [, formatarg "any" [, ...] ])

formatstr is a format string that specifies how the result should be formatted. Text in the format string is copied directly to the result, except where format specifiers are used. Format specifiers act as placeholders in the string, defining how subsequent function arguments should be formatted and inserted into the result. Each formatarg argument is converted to text according to the usual output rules for its data type, and then formatted and inserted into the result string according to the format specifier(s).

Format specifiers are introduced by a % character and have the form

%[position][flags][width]type

where the component fields are:

position (optional)

A string of the form n$ where n is the index of the argument to print. Index 1 means the first argument after formatstr. If the position is omitted, the default is to use the next argument in sequence.

flags (optional)

Additional options controlling how the format specifier's output is formatted. Currently the only supported flag is a minus sign (-) which will cause the format specifier's output to be left-justified. This has no effect unless the width field is also specified.

width (optional)

Specifies the minimum number of characters to use to display the format specifier's output. The output is padded on the left or right (depending on the - flag) with spaces as needed to fill the width. A too-small width does not cause truncation of the output, but is simply ignored. The width may be specified using any of the following: a positive integer; an asterisk (*) to use the next function argument as the width; or a string of the form *n$ to use the nth function argument as the width.

If the width comes from a function argument, that argument is consumed before the argument that is used for the format specifier's value. If the width argument is negative, the result is left aligned (as if the - flag had been specified) within a field of length abs(width).

type (required)

The type of format conversion to use to produce the format specifier's output. The following types are supported:

s formats the argument value as a simple string. A null value is treated as an empty string.
I treats the argument value as an SQL identifier, double-quoting it if necessary. It is an error for the value to be null (equivalent to quote_ident).
L quotes the argument value as an SQL literal. A null value is displayed as the string NULL, without quotes (equivalent to quote_nullable).

In addition to the format specifiers described above, the special sequence %% may be used to output a literal % character.
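
The null handling of the conversion types and the %% sequence can be checked with a sketch like the following. The argument values are arbitrary.

```sql
-- %s renders a null as an empty string, %L renders it as NULL without quotes,
-- and %% produces a literal percent sign.
SELECT format('[%s] [%L] 100%%', NULL::text, NULL::text);   -- [] [NULL] 100%

-- %I adds double quotes only where the identifier needs them.
SELECT format('%I.%I', 'public', 'My Table');                -- public."My Table"
```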

Here are some examples of the basic format conversions:

SELECT format('Hello %s', 'World');
Result: Hello World

SELECT format('Testing %s, %s, %s, %%', 'one', 'two', 'three');
Result: Testing one, two, three, %

SELECT format('INSERT INTO %I VALUES(%L)', 'Foo bar', E'O\'Reilly');
Result: INSERT INTO "Foo bar" VALUES('O''Reilly')

SELECT format('INSERT INTO %I VALUES(%L)', 'locations', 'C:\Program Files');
Result: INSERT INTO locations VALUES('C:\Program Files')

Here are examples using width fields and the - flag:

SELECT format('|%10s|', 'foo');
Result: |       foo|

SELECT format('|%-10s|', 'foo');
Result: |foo       |

SELECT format('|%*s|', 10, 'foo');
Result: |       foo|

SELECT format('|%*s|', -10, 'foo');
Result: |foo       |

SELECT format('|%-*s|', 10, 'foo');
Result: |foo       |

SELECT format('|%-*s|', -10, 'foo');
Result: |foo       |

These examples show use of position fields:

SELECT format('Testing %3$s, %2$s, %1$s', 'one', 'two', 'three');
Result: Testing three, two, one

SELECT format('|%*2$s|', 'foo', 10, 'bar');
Result: |       bar|

SELECT format('|%1$*2$s|', 'foo', 10, 'bar');
Result: |       foo|

Unlike the standard C function sprintf, PostgreSQL's format function allows format specifiers with and without position fields to be mixed in the same format string. A format specifier without a position field always uses the next argument after the last argument consumed. In addition, the format function does not require all function arguments to be used in the format string. For example:

SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
Result: Testing three, two, three

The %I and %L format specifiers are particularly useful for safely constructing dynamic SQL statements. See Example 42.1.
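
As a minimal sketch of that use (not the referenced example itself), the anonymous block below builds a statement with %I and %L and prints it. The variable names and the table name are hypothetical, and the EXECUTE is left commented out because the table is not assumed to exist.

```sql
DO $$
DECLARE
    target_table text := 'Foo bar';    -- hypothetical identifier that needs quoting
    new_value    text := 'O''Reilly';  -- value containing a single quote
    stmt         text;
BEGIN
    -- %I quotes the identifier, %L quotes the literal.
    stmt := format('INSERT INTO %I VALUES (%L)', target_table, new_value);
    RAISE NOTICE '%', stmt;   -- INSERT INTO "Foo bar" VALUES ('O''Reilly')
    -- EXECUTE stmt;          -- uncomment once the table actually exists
END
$$;
```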