
11 簡介

本使用手冊由台灣 PostgreSQL 社群提供，翻譯自 PostgreSQL 官方使用手冊，以推廣 PostgreSQL 於台灣的應用。

本使用手冊目前編譯內容為 PostgreSQL 11，另外還有可以參考。

每一個頁面均附上對應連結，翻譯未詳盡之處，可對照閱讀。未翻譯完成之段落，將暫以原文（英文）替代。

閱讀前，也可以參閱摘要簡報：

目前最新版本為 PostgreSQL 11：(部份內容為版本10，持續更新中)

參與協作請在任何頁面，點選右上角的「Edit on GitHub」，修改後直接送 PR 給我們即可。（只翻一句也可以唷！）

任何問題或建議可以 Email 給我們的文件小組：

前言

本手冊為PostgreSQL官方手冊翻譯版，由PostgreSQL台灣社群愛好者所提供，描述本版本PostgreSQL的功能與支援情形。

為了更易於閱讀與管理，本手冊以下列數個部份所組成。每一個部份均為不同類型的使用者所撰寫：

第一部份：給新的使用者一份簡易的介紹。
第二部份：介紹SQL查詢語言，包含資料型態及函數功能，應用層級的效能調教也在此有所說明。每一個PostgreSQL使用者都推薦閱讀此部份。
第三部份：說明資料庫伺服器的安裝及管理資訊。如果你需要管理一個PostgreSQL伺服器，那你必須閱讀此部份的內容。
第四部份：說明PostgreSQL用戶端的程式操作介面。
第五部份：資料庫伺服務的進階說明及延伸的使用方式，亦包含了使用者自訂的資料型別及函式。
第六部份：SQL查詢指令、用戶端指令及伺服器端指令在此部份詳細說明。
第七部份：提供PostgreSQL開發者可能需要的其他相關資訊。

1. 什麼是PostgreSQL？

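PostgreSQL is an object-relational database management system (ORDBMS) based on POSTGRES, Version 4.2, developed at the University of California at Berkeley Computer Science Department. POSTGRES pioneered many concepts that only became available in some commercial database systems much later.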

PostgreSQL由伯克萊大學公開其原始碼所誕生，它支援了大多數的標準SQL語法，並提供許多先進的功能：

complex queries
foreign keys
triggers
updatable views
transactional integrity
multiversion concurrency control

同時，PostgreSQL也支援讓使用者能以自己的方式善用資料庫系統：

data types
functions
operators
aggregate functions
index methods
procedural languages

在使用權利方面，不論任何人以任何目的，使用、修改、散布PostgreSQL，都是被允許的，包含私人使用、商業用途、或學術研究。

2. PostgreSQL沿革

PostgreSQL目前為眾所皆知的物件導向的關連式資料庫管理系統，其由美國加州伯克萊大學所研發的POSTGRES衍生而成。經過超過二十年以上的演進，PostgreSQL現在是世界上最先進的開源資料庫系統。

2.1. 伯克萊大學POSTGRES專案

POSTGRES專案是由Michael Stonebraker教授領導的團隊進行研發，其受到Defense Advanced Research Projects Agency (DARPA)，the Army Research Office (ARO)，the National Science Foundation (NSF),及ESL, Inc的贊助。POSTGRES專案始於1986年，最原始的設計，＂The design of POSTGRES＂，作為開端，其最初的資料結構模型則揭露於＂The POSTGRES data model＂。規則系統設計發表於＂The design of the POSTGRES rules system＂，而當時的關連式資料儲存的架構則刊載於＂The design of the POSTGRES storage system＂。

POSTGRES接下進行了幾次重大的變革。第一代的＂demoware＂在1987年真的實作成為可用的系統，並在1988年的ACM-SIGMOD研討會中進行展示，並在1989年6月，釋出了第1版可供外部使用者使用的資料庫系統。為了回應當時使用者對於第一代規則系統的批評，其規則系統重新進行設計，並在隔年1990年的6月份，隨即推出第2版系統，搭載新的規則系統設計。第3版系統則於1991年發表，新增支援多重儲存管理機制，改善查詢處理器，並又改寫了規則系統。如此直到Postgres 95誕生之前，主要都專注於移植性及可信賴度的發展。

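POSTGRES has been used to implement many different research and production applications, including a financial data analysis system, a jet engine performance monitoring package, an asteroid tracking database, a medical information database, and several geographic information systems. POSTGRES has also been used as an educational tool at several universities. Finally, Illustra Information Technologies (later merged into Informix, which is now owned by IBM) picked up the code and commercialized it. In late 1992, POSTGRES became the primary data manager for the Sequoia 2000 scientific computing project.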

在1993年間，用戶數量呈現倍數成長，伴隨而來的是大量的程式碼維護與服務支援，占去絕大部份原來應該進行研究的時間。為了減少維運的負擔，伯克萊的POSTGRES專案正式終止於4.2版。

2.2. Postgres 95

1994年，Andrew Yu和Jolly Chen在POSTGRES增加了SQL語法的直譯器，並且以新的Postgres 95為名，在網路上開放讓全世界的人使用。他們成為伯克萊POSTGRES原始碼最初的繼承者。

Postgres 95的程式碼是完全以ANSI C開發，並且輕量化了25%。許多內部的改良增進了效率及可維護性。當時Wisconsin Benchmark進行測試，Postgres 95在1.0.x時的效能比原始的POSTGRES 4.2快了約30%至50%。除了一些錯誤修正之外，還有下面這些主要的改良：

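The query language PostQUEL was replaced with SQL (implemented in the server). (The interface library libpq was named after PostQUEL.) Subqueries were not supported until PostgreSQL (see below), but they could be imitated in Postgres 95 with user-defined SQL functions. Aggregate functions were re-implemented. Support for the GROUP BY query clause was also added.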
新的工具psql可進行互動式的SQL操作，其採用的是GNU Readline的技術。psql開始大量取代老舊的管理工具。
新的前端函式庫，libpgtcl，支援Tcl-based用戶端程式。還有一個簡易的命令列介面工具pgtclsh，使用新的Tcl命令和Postgres 95伺服器進行操作。
重新改寫了large-object處理的交換介面，僅使inversion作為儲存大型物件的唯一機制。（inversion檔案系統就此移除）
Instance-level的規則系統被淘汰了，但其規則仍用於重構規則所使用。
製作了一個簡短的說明，介紹標準的SQL功能，並隨Postgres 95原始碼發佈。
使用GNU make（取代BSD make）編譯程式碼，Postgres 95也支援使用未修正的GCC編譯器（修正高精度資料對齊問題）。

2.3. PostgreSQL

1996年，＂Postgres 95＂這個名稱很明顯不再適合。我們選擇了新的名稱，PostgreSQL，其呈現出與原始POSTGRES之間的源由，也彰顯了結合SQL力量的意義。同時，我們設定其版本由6.0開始，重回伯克萊POSTGRES專案的版號序列。

許多人持續使用＂Postgres＂（現在已經很少使用全大寫字母表現）來代表PostgreSQL，是因為傳統，也可能是因為比較好發音。這樣的用法也廣為用於暱稱或別名。

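The emphasis during development of Postgres 95 was on identifying and understanding existing problems in the server code. With PostgreSQL, the emphasis has shifted to augmenting features and capabilities, although work continues in all areas.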

更多有關於PostgreSQL的發展，請參閱附錄E。

3. 慣例

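The following conventions are used in the synopsis of a command: brackets ([ and ]) indicate optional parts. (In the synopsis of a Tcl command, question marks (?) are used instead, as is usual in Tcl.) Braces ({ and }) and vertical lines (|) indicate that you must choose one alternative. Dots (...) mean that the preceding element can be repeated.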

為了使說明更簡潔，SQL指令使用提示字元＂=>＂，作業系統命令列指令則使用＂$＂。雖然一般而言，提示字元可能不會顯示。

Administrator一般的定義是負責安裝及運行資料庫系統的人；User指的是任何正在使用資料庫的人，或者正要使用任何PostgreSQL相關系統的人。這些定義不應該被解釋得太過嚴格，在本文件中，對於系統管理的工作，並沒有固定的假設。

4. 其他參考資訊

除了本文件之外，PostgreSQL還有其他的參考資訊：

Wiki

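The PostgreSQL wiki contains the project's FAQ (Frequently Asked Questions) list, TODO list, and detailed information about many more topics.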

PostgreSQL wiki 也有的頁面喔。

Web Site

The PostgreSQL web site carries details on the latest release and other information to make your work or play with PostgreSQL more productive.

Mailing Lists

郵件列表的功能，是一個為您解答疑問的好地方，你也可以分享使用經驗給其他同好，或直接和開發者溝通。詳情請參閱PostgreSQL的官方網站。

Yourself!

PostgreSQL是一個開源的專案，也就是說，它仰賴社群的每一個人給予支持。當你開始使用PostgreSQL，你會需要其他人的幫助，可能是透過文件或是郵件列表的功能。請考慮也可以回饋您的知識。在閱讀郵件列表和回答疑問的同時，如果你學到了未被文件記載的知識時，請寫下來，並且供獻出來。如果你撰寫了一些程式碼增加了特別的功能，也希望能夠回饋到社群之中。

I. 新手教學

歡迎來到 PostgreSQL 的新手教學。在這個部份裡的內容，主要提供有關於 PostgreSQL 各項功能的簡介、關連式資料庫概念、以及 SQL 語法的入門說明。我們只假設您俱備一些電腦系統基本操作，並不需要很專業的 Unix 或程式設計經驗。這裡主要提供一些實用的經驗，還有 PostgreSQL 系統中重要部份的介紹。在這個部份並不會進行所有議題的詳細說明。

在你閱讀完新手教學之後，也許可以繼續閱讀：更多有關於 SQL 語法的標準知識；或者到：瞭解如何開發 PostgreSQL 的應用程式；而如果你需要建置及管理你的資料庫伺服器的話，請參閱的內容。

1. 入門指南

版本：11

1.1. 安裝：從無到有，安裝一個 PostgreSQL 資料庫系統。
1.2. 基礎架構：認識 PostgreSQL 的資料庫架構。
1.3. 建立一個資料庫：建立第一個 PostgreSQL 資料庫。
1.4. 存取一個資料庫：開始存取你的 PostgreSQL 資料庫。

1.1. 安裝

版本：11

你需要先進行安裝，才能開始使用PostgreSQL。當然，PostgreSQL也可能已經被安裝在你的系統之中，因為你的作業系統預設套件包含了PostgreSQL，或其他系統管理者已先行安裝。如果是這樣的話，那麼你應該先瞭解作業系統的資訊，或向你的系統管理員取能存取的資訊。

如果你並不確定PostgreSQL是否已經可以使用，或者你也可以自行安裝試試。這樣做並不是很困難，而且是很好的操作練習。PostgreSQL可以以一般使用者進行安裝，它並不需要系統管理者（root）的權限才能安裝。

如果你打算自行安裝PostgreSQL，你可以參考第16章的指令進行，完成之後再回到這裡，以瞭解接下來關於設定環境變數的內容。

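If your site administrator has not set things up in the default way, you might have some more work to do. For example, if the database server machine is a remote machine, you will need to set the PGHOST environment variable to the name of the database server machine. The environment variable PGPORT might also have to be set. The bottom line is this: if you try to start an application program and it complains that it cannot connect to the database, you should consult your site administrator or, if that is you, the documentation to make sure that your environment is properly set up. If you did not understand the preceding paragraph, read on to the next section.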

1.2. 基礎架構

版本：11

在開始使用之前，你需要瞭解基本的PostgreSQL系統架構。認識PostgreSQL如何回應操作，有助於讓你更清楚瞭解以下的說明。

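In database jargon, PostgreSQL uses a client/server model. A PostgreSQL session consists of the following cooperating processes (programs):

A server process, which manages the database files, accepts connections to the database from client applications, and performs database actions on behalf of the clients. The database server program is called postgres.
The user's client (frontend) application that wants to perform database operations. Client applications can be very diverse in nature: a client could be a text-oriented tool, a graphical application, a web server that accesses the database to display web pages, or a specialized database maintenance tool. Some client applications are supplied with the PostgreSQL distribution; most are developed by users.

As is typical of client/server applications, the client and the server can be on different hosts. In that case they communicate over a TCP/IP network connection. You should keep this in mind, because the files that can be accessed on a client machine might not be accessible (or might only be accessible using a different file name) on the database server machine.

The PostgreSQL server can handle multiple concurrent connections from clients. To achieve this it starts ("forks") a new process for each connection. From that point on, the client and the new server process communicate without intervention by the original postgres process. Thus, the master server process is always running, waiting for client connections, whereas client and associated server processes come and go. (All of this is of course invisible to the user. We only mention it here for completeness.)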

1.3. 建立一個資料庫

版本：11

第一個測試確認你是否能夠存取一個資料庫服務，就是嘗試去建立一個資料庫。一個執行中的 PostgreSQL 服務可以管理許多個資料庫。一般來說，每一個專案或使用者會分開使用不同的資料庫。

你的系統管理員也可能已經為你建立了一個資料庫，如果是這樣的話，那你可以略過本節說明，直接進入到下一節的內容。

要建立一個新的資料庫，在本例中取名叫「mydb」，你可以使用以下的命令：

$ createdb mydb

如果在這個步驟沒有產生任何回應，那就是成功了。你可以跳過本節剩餘的部份。

但你如果看到如下的訊息：

createdb: command not found

這個訊息代表 PostgreSQL 並沒有被正確的安裝。不是它沒有被安裝好，那就是你的命令路徑設定並未包含這個指令。嘗試使用下列這個包含絕對路徑的指令看看：

$ /usr/local/pgsql/bin/createdb mydb

命令路徑在你的系統可能會有些不同。洽詢你的系統管理員，或著檢查安裝步驟以修正這個情況。

另一種回應可能是如此：

createdb: could not connect to database postgres: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/tmp/.s.PGSQL.5432"?

這代表了資料庫服務尚未啓動，或者它並不存在於createdb預設連線的位置。同樣地，檢查安裝的步驟或洽詢系統管理者。

而另一種回應也可能是：

createdb: could not connect to database postgres: FATAL:  role "joe" does not exist

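Here your own login name is mentioned. This will happen if the administrator has not created a PostgreSQL user account for you. (PostgreSQL user accounts are distinct from operating system user accounts.) If you are the administrator, see Chapter 21 for help creating accounts. You will need to become the operating system user under which PostgreSQL was installed (usually postgres) to create the first user account. It could also be that you were assigned a PostgreSQL user name that is different from your operating system user name; in that case you need to use the -U switch or set the PGUSER environment variable to specify your PostgreSQL user name.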

如果你有一個資料庫帳戶，但你並沒有建立資料庫的權限，你將會看到下列訊息：

createdb: database creation failed: ERROR:  permission denied to create database

並非每一個使用者都被授權可以建立一個新的資料庫。如果 PostgreSQL 拒絕你建立資料庫，那麼系統管理者就需要賦予你建立資料庫的權限。洽詢你的系統管理者，如果是這種情況的話。如果你是自行安裝 PostgreSQL，那麼你應該以你啓動資料庫服務的使用者登入作業系統，再嘗試這個操作。

你也可以建立資料庫，但使用其他的名稱。PostgreSQL 允許在資料庫系統中建立無限制數量的資料庫。資料庫名稱必須是以英文字母為開頭，總長度限制為 63 位元組。一個簡便的方式是，建立一個與你使用者名稱同名的資料庫。許多工具會預設假定資料庫名稱和你同名，所以這可以省略一些文字的輸入。要建立這樣的資料庫，只要簡單地輸入：

$ createdb

如果你不再使用你的資料庫，你可以移除它。舉例來說，你是 mydb 這個資料庫的擁有者（建立者），你可以使用下列指令來消毁它：

$ dropdb mydb

（對這個指令來說，資料庫名稱並不會預設使用你的使用者同名資料庫。你必須明確地指定名稱）這個動作會完全地移除所有和這個資料庫相關的檔案，並且沒有回復的可能，所以要進行這個動作的話，請一定要考慮清楚。

更多有關於 createdb 和 dropdb 的說明，請參閱 createdb 和 dropdb 的相關章節。

1.4. 存取一個資料庫

版本：11

一旦你已經建立一個資料庫，你就可以開始以下列方式進行存取：

執行 PostgreSQL 互動式的終端程式，稱作 psql，它可以讓你輸入、編輯、執行 SQL 指令。
使用既有的圖型化介面工具，例如 pgAdmin 或是支援 ODBC 或 JDBC 的辦公室軟體，以建立並輸入資料到資料庫裡。不過這部份並未包含在這份手冊之中。
Writing a custom application, using one of the several available language bindings. These possibilities are discussed further in Part IV.

在這份指南中，你可能會先使用 psql 來進行一些嘗試。你可以藉由下列指令開始操作 mydb 這個資料庫：

$ psql mydb

如果你並未指明資料庫名稱，那麼它預設會以你的使用者名稱作為資料庫名稱。在先前的章節使用 createdb 時，你已經知道這個隱含的規則了。

在 psql 中，你會以下列訊息開始：

psql (10beta1)
Type "help" for help.

mydb=>

最後一行也可能是：

mydb=#

這表示你是資料庫的超級使用者（superuser），如果你是自行安裝 PostgreSQL 的話，大概就會是這個情況。作為一個超級使用者，表示你不會受限於任何存取控制。不過在這份指南中，這並不是重要的事。

如果你在啓動 psql 時遭遇了一些問題，那麼請回到前一節。createdb 和 psql 的行為很類似，如果前者正常，後者也應該如期運行。

最後一行會輸出的是 psql 的提示字串，它表示 psql 正在等待你輸入 SQL 查詢語句。試試下面的指令吧：

mydb=> SELECT version();
                                         version
------------------------------------------------------------------------------------------
 PostgreSQL 10beta1 on x86_64-pc-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit
(1 row)

mydb=> SELECT current_date;
    date
------------
 2016-01-07
(1 row)

mydb=> SELECT 2 + 2;

 ?column?
----------
        4
(1 row)

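The psql program has a number of internal commands that are not SQL commands. They begin with the backslash character, \. For example, you can get help on the syntax of various PostgreSQL SQL commands by typing: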

mydb=> \h

要離開 psql 的話，請輸入：

mydb=> \q

如此的話，psql 將會結束，並回到你的命令列介面之中。（想瞭解更多內建指令，在 psql 提示字串後輸入 \? 。）完整的 psql 說明，都記載在 psql 頁面之中。在這份指南中，我們並未使用這些功能，但你可以在需要的時候使用他們。

2. SQL查詢語言

版本：11

本章適合初學資料庫的朋友閱讀，以簡單的語法範例，實際操作以瞭解資料庫的運作方式。事實上，更複雜的資料庫行為，也不脫這個基本的操作模式。

2.1. 簡介

版本：11

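This chapter provides an overview of how to use SQL to perform simple operations. This tutorial is only intended to give you an introduction and is in no way a complete tutorial on SQL. Numerous books have been written on SQL, including "Understanding the New SQL: A Complete Guide" and "A Guide to the SQL Standard: A user's guide to the standard database language SQL". You should be aware that some PostgreSQL language features are extensions to the standard.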

在下面的例子當中，我們假設你已經建立了一個資料庫 mydb，如同前面章節所述，你也能夠使用 psql 了。

這些例子也放在 PostgreSQL 的原始碼之中，你可以在目錄 src/tutorial/ 下找到他們。（PostgreSQL的可執行套件可能未包含這些檔案）想要使用這些檔案的話，首先請切換到該目錄之下，然後執行 make：

This creates the scripts and compiles the C files containing user-defined functions and types. Then, to start the tutorial, do the following:

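The \i command reads in commands from the specified file. psql's -s option puts you in single step mode, which pauses before sending each statement to the server. The commands used in this section are in the file basics.sql.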

2.2. 概念

版本：11

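PostgreSQL is a relational database management system (RDBMS). That means it is a system for managing data stored in relations. Relation is essentially a mathematical term for table. The notion of storing data in tables is so commonplace today that it might seem inherently obvious, but there are a number of other ways of organizing databases. Files and directories on Unix-like operating systems form an example of a hierarchical database. A more modern development is the object-oriented database.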

每一個資料表是很多資料列（row）的集合。而每一個資料列則以許多相同集合的欄位（column）所組成。每一個欄位都被指定了特定的資料型別。每一個資料列中欄位的次序是固定的。很重要且必須記得的是，SQL 並不保證資料列在資料表中的次序（雖然他們可以在顯示的時候被明確表現）。

一個資料庫中集合了許多資料表，而很多的資料庫則被一個 PostgreSQL 服務所管理，形成一個資料庫叢集。

2.3. 創建一個新的資料表

版本：11

你可以創建一個新的資料表，為它取一個名字，並且宣告所有的欄位名稱與其資料型別：
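The table definition that this sentence introduces is missing from this copy. A sketch consistent with the column types described below and with the INSERT statements in section 2.4 could look like this (the inline comments are illustrative additions):

```sql
-- Sketch of the weather table used throughout the tutorial.
CREATE TABLE weather (
    city     varchar(80),
    temp_lo  int,     -- low temperature
    temp_hi  int,     -- high temperature
    prcp     real,    -- precipitation
    date     date
);
```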

你可以把上述內容在 psql 中輸入，包含換行字元不會影響判讀。psql 是以分號作為指令結束的判定。

空白（包含「空白」、「定位符號」和「換行符號」）都可以自由使用在 SQL 指令當中。這表示你可以將指令以不同的形式排版，甚至全部寫都在一行也沒問題。使用破折號，連續2個（＂--＂），表示緊接的內容只是註解，直到該行結束為止。PostgreSQL 是不分大小寫字母的，包括各類關鍵字和描述語，除非是使用雙引號括起來的文字。（更精確地說，沒有被雙引號括起來的識別字，都會轉為小寫字母進行識別）

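varchar(80) specifies a data type that can store arbitrary character strings up to 80 characters in length. int is the normal integer type. real is a type for storing single precision floating-point numbers. date should be self-explanatory: it stores a calendar date. (Yes, the column of type date is also named date. This might be convenient or confusing; you choose.)

PostgreSQL supports the standard SQL types int, smallint, real, double precision, char(N), varchar(N), date, time, timestamp, and interval, as well as other types of general utility and a rich set of geometric types. PostgreSQL can be customized with an arbitrary number of user-defined data types. Consequently, type names are not key words in the syntax, except where required to support special cases in the SQL standard.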

第二個例子用來儲存城市及其所在的地理位置：
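The second definition is also missing here. A sketch that matches the later INSERT INTO cities statement (a name plus a point value) might be:

```sql
-- Sketch of the cities table; location uses the PostgreSQL-specific point type.
CREATE TABLE cities (
    name      varchar(80),
    location  point
);
```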

point 型別是一個 PostgreSQL專屬資料型別的範例。

最後，應該被點出來的是，如果你不再需要一個表格，或者想要重新以別的方式創建它，那麼你可以以下列的指令來移除它：
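The command referred to here is DROP TABLE. In the sketch below, tablename is a placeholder for the table you want to remove:

```sql
-- Removes the table and all of its rows; tablename is a placeholder.
DROP TABLE tablename;
```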

2.4. 資料列是資料表的組成單位

版本：11

INSERT 指令被用來將資料以資料列（row）的形式，新增至資料表（table）之中：

INSERT INTO weather VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27');

注意，所有的資料型別都有明確的輸入格式。只要不是簡單的數值內容，都必須要以單引號（'）括住，如同在本例中的形式。日期時間型別（date type）的資料內容就比較有彈性，但在這個導覽之中，我們仍然使用較固定的格式來表現。

地理資訊型別（point type）需要有座標組作為輸入，如下所示：

INSERT INTO cities VALUES ('San Francisco', '(-194.0, 53.0)');

到目前為止，語法的使用需要你依照欄位宣告的次序擺放，而另一種語法可以允許你明確地指定資料相對應的欄位：

INSERT INTO weather (city, temp_lo, temp_hi, prcp, date)
    VALUES ('San Francisco', 43, 57, 0.0, '1994-11-29');

你可以將欄位以不同的次序擺放，甚或略去某些欄位，例如，precipitation 欄位（prcp）內容未知：

INSERT INTO weather (date, city, temp_hi, temp_lo)
    VALUES ('1994-11-29', 'Hayward', 54, 37);

許多開發者會認為，在撰寫習慣上，明確指定欄位是比較好的方式。

Please enter all the commands shown above so you have some data to work with in the following sections.

你可能需要使用 COPY 這個指令從文字檔案來載入大量的資料。這個指令會比 INSERT 要快上許多，因為 COPY 指令的設計就是為了大量資料輸入而產生的。它少了一些彈性，但提供了效率上的最佳表現。使用範例如下所示：

COPY weather FROM '/home/user/weather.txt';

資料來源的檔案必須存在於後端的伺服器之中，並且可被 PostgreSQL 使用者（postgres）所存取，注意不是用戶端的主機，因為後端伺服器的服務需要直接讀取該檔案。你可以取得更多詳細說明，在 COPY 指令的說明頁面。

2.5. 資料表的查詢

版本：11

要從資料表（table）中取出資料，稱作資料表的查詢。要進行這個行為，你需要 SQL 中的 SELECT 指令。這個指令由幾個部份所組成，回傳列表（select list，想要回傳的欄位）、資料表列表（資料來源的資料表）、選擇性的條件定義（指定一些限制條件）。舉個例子來說，要取得資料表 weather 中所有的資料的話，請輸入：
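A minimal sketch of that query, assuming the weather table from section 2.3 already exists:

```sql
-- Retrieve every column of every row in weather.
SELECT * FROM weather;
```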

這裡的星號 * 表示「所有欄位」。下列的指令會回傳相同的結果。
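Written with an explicit column list, and assuming the column names used in the INSERT examples of section 2.4, the equivalent query would be:

```sql
-- Same rows as SELECT *, but with the columns spelled out.
SELECT city, temp_lo, temp_hi, prcp, date FROM weather;
```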

其輸出結果將會如下所示：

你可以在回傳列表中撰寫一些運算表示式，而不只是簡單的欄位引用。舉例來說，你可以輸入：
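One possible expression query, sketched under the assumption that an average of the two temperature columns is wanted (temp_avg is simply an output label chosen for illustration):

```sql
-- Compute a per-row average temperature and label the result column.
SELECT city, (temp_hi + temp_lo) / 2 AS temp_avg, date FROM weather;
```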

這應該會產生這樣的結果：

注意，「AS」被用來重新命名輸出的欄位。（選用）

查詢語句可以加上「WHERE」來設定限制條件，以指定哪些列才需要被回傳。WHERE 的內容是一個布林（truth value）表示式，而只有在其運算值為真（true）時，該列才會被回傳。一般的布林運算子（AND, OR, NOT）都是被允許出現在表示式中的。舉例來說，下列的指令將會回傳 San Francisco 在雨天的天氣數值：
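A sketch of such a qualified query, treating "rainy" as a precipitation value greater than zero:

```sql
-- Only San Francisco rows with measurable precipitation are returned.
SELECT * FROM weather
    WHERE city = 'San Francisco' AND prcp > 0.0;
```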

結果：

你可以將結果進行排序：
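For instance, sorting by city name only could be sketched as:

```sql
-- Rows are ordered by city; ties between equal cities may appear in any order.
SELECT * FROM weather
    ORDER BY city;
```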

在這個例子之中，其次序並沒有完全地被指定，所以你可能會得到 San Francisco 的列以另一種次序呈現。而你如果以下列指令查詢的話，那你就會得到如上但固定的結果：
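Adding a second sort key is one way to make the order fully determined for the sample data:

```sql
-- temp_lo breaks ties between rows that share the same city.
SELECT * FROM weather
    ORDER BY city, temp_lo;
```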

你可以在查詢時去除重覆的列：
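A sketch of duplicate removal on the city column:

```sql
-- Each distinct city name is returned once.
SELECT DISTINCT city
    FROM weather;
```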

再一次，其結果的次序可能每次都不同，你可以同時使用 DISTINCT 及 ORDER BY 來確保能得到一致性的查詢結果：
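Combining the two clauses might look like this:

```sql
-- Distinct city names in a stable, alphabetical order.
SELECT DISTINCT city
    FROM weather
    ORDER BY city;
```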

2.6. 交叉查詢

版本：11

到目前為止，我們的一個查詢都只涉及到一個資料表。其實可以在同一個查詢中，同時查詢多個資料表，或者在同一個資料表之中同時處理多個資料列的資料。在一個查詢之中，涉及到同一個或多個不同的資料表中的資料，稱作為交叉查詢（join）。舉個例子來說，你希望同時列出天氣和城市位置的資料。要完成這項工作，我們需要關連資料表 weather 中的 city 欄位與表格 cities 中的 name 欄位，然後回傳符合條件的資料。

注意

這只是一個概念式的模形，交叉查詢（join）會以更有效率的方式運行，並非真正需要比較每一種組合是否符合條件，不過這些過程對於使用者而言並不會產生操作或結果上的差異。

下列查詢會產生交叉查詢的結果：
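A sketch of the join described above, pairing each weather row with the cities row whose name matches:

```sql
-- Implicit join: the WHERE clause supplies the join condition.
SELECT *
    FROM weather, cities
    WHERE city = name;
```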

在這個結果中可以觀察到兩件事情：

不會有關於 Hayward 的結果出現。這是因為在資料表 cities 中未有 Hayward 的資料，所以交叉查詢會忽略資料表 weather 中未能關連的資料。關於這點，我們很快就會有解決辦法。
有兩個欄位顯示了城市的名稱。這樣是正確的，因為來自於資料表 weather 和 cities 的欄位被串連起來了。實務上，這樣的結果並不令人滿意，所以也許你可以明確地指出輸出的欄位，取代「 * 」的使用：
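Listing the output columns explicitly, so that the city name appears only once, could be sketched as:

```sql
-- The name column from cities is left out of the select list.
SELECT city, temp_lo, temp_hi, prcp, date, location
    FROM weather, cities
    WHERE city = name;
```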

練習：試試看，當 WHERE 表示式被省略的話，查詢語句的意義會怎麼樣？

因為所有的欄位都使用不同的名稱，所以解譯器會自動發現他們所屬的資料表為何。如果在兩個資料表之中，存在有相同名稱的欄位時，你最好明確指出確定的欄位，如下所示：
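With every column qualified by its table name, the same join might read:

```sql
-- Qualified names remain unambiguous even if both tables later gain
-- columns with identical names.
SELECT weather.city, weather.temp_lo, weather.temp_hi,
       weather.prcp, weather.date, cities.location
    FROM weather, cities
    WHERE cities.name = weather.city;
```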

多數開發者認為，在交叉查詢中，明確指出確定的欄位名稱，是良好的撰寫習慣。這樣查詢就不會因為有相同的欄位名稱而產生錯誤。而相同名稱的欄位可能是開發後續才加入的，未指明的話，就可能造成意外的結果。

交叉查詢也可以寫成如下的另一種形式：
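The alternative form uses the explicit JOIN syntax; a sketch:

```sql
-- Same result as the comma-separated FROM list with a WHERE condition.
SELECT *
    FROM weather INNER JOIN cities ON (weather.city = cities.name);
```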

這種語法並不如上述的常見，但我們會在這裡說明，以幫助你在後續章節的學習。

現在我們要回到前面的問題，把 Hayward 的資料放在輸出的結果之中。我們要在查詢中做的是，掃描資料表 weather，找到有所關連的每一列資料；沒有關連到的資料列，我們要填上「空值」（null）在資料表 cities 相對的欄位之中。這樣的查詢我們稱作「外部交叉查詢」（outer join）。（先前的交叉查詢為「內部交叉查詢」（inner join））。這樣的查詢指令如下所示：
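A sketch of the outer join, which keeps the Hayward row even though cities has no matching entry:

```sql
-- Unmatched weather rows are kept; their cities columns come back as NULL.
SELECT *
    FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name);
```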

這種查詢稱作為「左側外部查詢」（left outer join），因為這個交叉查詢，放在左側的資料表中的資料列，一定會在結果中至少出現一次，而右側的資料表中，則只有輸出有關連到左側資料表的資料列。當左側資料表的資料列，並沒有在右側資料表中被關連到時，屬於右側資料表的欄位就會被填上空值輸出。

練習：也有「右側外部交叉查詢」（right outer join）和「完全外部交叉查詢」（full outer join），試著找出他們都做了些什麼。

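We can also join a table against itself. This is called a self join. As an example, suppose we wish to find all the weather records that are in the temperature range of other weather records. So we need to compare the temp_lo and temp_hi columns of each weather row to the temp_lo and temp_hi columns of all other weather rows. We can do this with the following query: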
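A sketch of such a self join, using W1 and W2 as aliases for the two roles of the weather table (the output labels low and high are illustrative):

```sql
-- Pairs each row (W1) with every row (W2) whose range lies strictly inside it.
SELECT W1.city, W1.temp_lo AS low, W1.temp_hi AS high,
       W2.city, W2.temp_lo AS low, W2.temp_hi AS high
    FROM weather W1, weather W2
    WHERE W1.temp_lo < W2.temp_lo
      AND W1.temp_hi > W2.temp_hi;
```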

這裡我們重新命名了資料表 weather 為 W1 及 W2，以在交叉查詢中區分左側及右側。你也可以在其他查詢中使用這個技巧，以節省輸入的複雜度，例如：
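For example, short aliases can reduce typing in an ordinary two-table join (w and c are arbitrary alias names):

```sql
-- w and c stand in for weather and cities within this query only.
SELECT *
    FROM weather w, cities c
    WHERE w.city = c.name;
```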

你將會在後續內容中，不斷練習到這樣的使用方式。

2.7. 彙總查詢

版本：11

如同其他的關連式資料庫產品，PostgreSQL 也支援彙總查詢的功能。彙總查詢指的是能夠把多個資料列的資料經過計算，產生單一結果的功能。舉例來說， count、sum、avg（平均值）、max（最大值）、min（最小值）都是彙總查詢的函式。

這裡的例子，我們可以得到所有低溫中的最大值：

SELECT max(temp_lo) FROM weather;

 max
-----
  46
(1 row)

如果我們想要知道，這個數值是發生在哪一個城市？也許可以試試：

SELECT city FROM weather WHERE temp_lo = max(temp_lo);

WRONG

不過，這行不通，因為 max 不能使用在 WHERE 條件式當中。（會有這樣的限制，是因為 WHERE 條件式目的是要判斷有哪些資料列的資料應該被彙總計算，所以很明顯地，這件事必須要在彙整計算前發生，這就產生了矛盾。）所以，像本例的查詢一般會使用子查詢（subquery）來取得適當的結果：

SELECT city FROM weather
    WHERE temp_lo = (SELECT max(temp_lo) FROM weather);

     city
---------------
 San Francisco
(1 row)

這樣就對了，因為子查詢是一個獨立的查詢，它可以獨立進行彙總查詢，有別於括號以外的查詢語句。

Aggregates are also very useful in combination with GROUP BY clauses. For example, we can get the maximum low temperature observed in each city with:

SELECT city, max(temp_lo)
    FROM weather
    GROUP BY city;

     city      | max
---------------+-----
 Hayward       |  37
 San Francisco |  46
(2 rows)

這個查詢對每個城市都輸出一列的結果。每一個彙總的結果，將整個資料表，以關連到的城市進行計算。而我們可以進一步過濾資料內容，使用 HAVING：

SELECT city, max(temp_lo)
    FROM weather
    GROUP BY city
    HAVING max(temp_lo) < 40;

  city   | max
---------+-----
 Hayward |  37
(1 row)

This gives the same results, but only for the cities whose temp_lo values are all below 40. Finally, if we only care about cities whose names begin with "S", we might do:

SELECT city, max(temp_lo)
    FROM weather
    WHERE city LIKE 'S%'            -- (1)
    GROUP BY city
    HAVING max(temp_lo) < 40;

LIKE 運算子進行特徵比對運算，這將會在 9.7. 特徵比對中進一步說明。

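It is important to understand the interaction between aggregates and SQL's WHERE and HAVING clauses. The fundamental difference between WHERE and HAVING is this: WHERE selects input rows before groups and aggregates are computed (thus, it controls which rows go into the aggregate computation), whereas HAVING selects group rows after groups and aggregates are computed. Thus, the WHERE clause must not contain aggregate functions; it makes no sense to try to use an aggregate to determine which rows will be inputs to the aggregates. On the other hand, the HAVING clause always contains aggregate functions. (Strictly speaking, you are allowed to write a HAVING clause that doesn't use aggregates, but it's seldom useful. The same condition could be used more efficiently at the WHERE stage.)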

在先前的例子當中，我們可以把城市名稱的限制放在 WHERE 條件式之中，因為它不需要彙總。這將會比放在 HAVING 條件式中更有效率，因為這樣可以避免合併及彙整運算整個表格，不用浪費時間在本來就會被過濾掉的資料上。

2.8. 更新資料

版本：11

你可以使用 UPDATE 指令以列為單位來更新資料。假設你發現氣溫的數值測量在 11 月 28 日之後都多了 2 度。你可以以下列語法來修正這些資料：

UPDATE weather
    SET temp_hi = temp_hi - 2,  temp_lo = temp_lo - 2
    WHERE date > '1994-11-28';

查看一下這些更新後的資料：

SELECT * FROM weather;

     city      | temp_lo | temp_hi | prcp |    date
---------------+---------+---------+------+------------
 San Francisco |      46 |      50 | 0.25 | 1994-11-27
 San Francisco |      41 |      55 |    0 | 1994-11-29
 Hayward       |      35 |      52 |      | 1994-11-29
(3 rows)

2.9. 刪除資料

版本：11

要把某些資料列從資料表中移除，就使用 DELETE 這個指令。假設你對於 Hayward 這個城市的天氣不再感興趣了，那麼你可以執行下列指令，來刪除資料表中的這些資料：
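A sketch of the deletion described here:

```sql
-- Removes every weather row whose city is Hayward.
DELETE FROM weather WHERE city = 'Hayward';
```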

所有關於 Hayward 的資料都被刪除了。

這個指令有一個應該要特別注意的情況：
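The form to be wary of omits the WHERE clause entirely; tablename below is a placeholder:

```sql
-- Without a WHERE clause, every row in the table is deleted.
DELETE FROM tablename;
```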

沒有任何限制的條件，DELETE 將會刪去所有該資料表中的資料，使成為空的資料表。資料庫系統並不會在這個動作執行前和你確認！

3. 先進功能

3.1. 簡介

在前面的章節，我們介紹了如何使用 SQL 來存取 PostgreSQL 的基本方式。接下來，我們將會討論更多先進的功能，SQL 的管理功能以及防止資料遺失或損毁。最後，我們也會介紹一些 PostgreSQL 的延伸功能。

這個章節偶爾會引用第 2 章的範例，試著去改寫或是優化他們，所以閱讀過上一章也是很有用的。在這一章中有一些範例是來自於 tutorial 目錄中的 advanced.sql，這個檔案有一些範例資料可以載入，但載入方式在此就不再贅述。（請參閱 2.1 節的內容）

3.2. 檢視表（View）

讓我們回到 2.6 節的查詢範例。假設關連天氣資訊和城市位置的結果，是你的應用中特別常用的，但你並不想要每次都要輸入一長串的查詢語句。那麼，你可以為這個查詢語句建立一個「檢視表（View）」，你可以取一個名字，當你需要使用的時候，你可以把它當作一個資料表來使用：

CREATE VIEW myview AS
    SELECT city, temp_lo, temp_hi, prcp, date, location
        FROM weather, cities
        WHERE city = name;

SELECT * FROM myview;

妥善地使用檢視表，對於良好的 SQL 資料庫設計而言，是很關鍵的部份。檢視表允許你可封裝你的資料表結構與細節，當你的應用系統在逐步發展成熟的過程中，扮演一致性的資料介面。

檢視表可以用在大多數資料表可以使用的地方。而用檢視表來封裝其他檢視表的情況，也不少見。

3.3. 外部索引鍵

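Recall the weather and cities tables from Chapter 2. Consider the following problem: You want to make sure that no one can insert rows in the weather table that do not have a matching entry in the cities table. This is called maintaining the referential integrity of your data. In simplistic database systems this would be implemented (if at all) by first looking at the cities table to check if a matching record exists, and then inserting or rejecting the new weather records. This approach has a number of problems and is very inconvenient, so PostgreSQL can do this for you.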

新的資料表宣告如下所示：
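The declarations are missing from this copy. One way to express the constraint, sketched on the assumption that the tables are recreated from scratch and that cities is keyed on a city column, is:

```sql
-- cities gets a primary key; weather.city must match an existing cities.city.
CREATE TABLE cities (
    city      varchar(80) PRIMARY KEY,
    location  point
);

CREATE TABLE weather (
    city      varchar(80) REFERENCES cities(city),
    temp_lo   int,
    temp_hi   int,
    prcp      real,
    date      date
);
```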

現在嘗試新增一筆不合理的資料：
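For illustration, an insert for a city that is assumed to be absent from cities ('Berkeley' here) should be rejected:

```sql
-- Expected to fail with a foreign-key violation as long as cities
-- contains no row for 'Berkeley'.
INSERT INTO weather VALUES ('Berkeley', 45, 53, 0.0, '1994-11-28');
```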

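The behavior of foreign keys can be finely tuned to your application. We will not go beyond this simple example in this tutorial, but refer you to Chapter 5 for more information. Making correct use of foreign keys will definitely improve the quality of your database applications, so you are strongly encouraged to learn about them.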

3.4. 交易安全

交易（Transaction），是所有資料庫的基礎概念。基本上來說，一個交易指的是，一系列的執行步驟包裹在一起，其結果只有全部成功或全部失敗兩種情況的操作行為。而其即時的執行狀態，對於其他同時在進行的交易而言，相互之間都是不可見的。如果在執行過程中產生了錯誤而造成整個交易無法完成，那麼所有的指令都不會對資料庫原來的內容產生影響。

舉例來說，某個銀行資料庫存放著各個客戶的存款資訊，也存放著分行的存款總額資訊。假設我們想要轉帳 $100.00，從 Alice 的帳號轉到 Bob 的帳戶。可以很直觀地依敘述，直接以下列指令執行：
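The statements are missing from this copy. A simplified sketch follows; the table and column names (accounts, branches, balance, name, branch_name) are assumptions made for illustration:

```sql
-- Debit Alice and her branch, then credit Bob and his branch.
UPDATE accounts SET balance = balance - 100.00
    WHERE name = 'Alice';
UPDATE branches SET balance = balance - 100.00
    WHERE name = (SELECT branch_name FROM accounts WHERE name = 'Alice');
UPDATE accounts SET balance = balance + 100.00
    WHERE name = 'Bob';
UPDATE branches SET balance = balance + 100.00
    WHERE name = (SELECT branch_name FROM accounts WHERE name = 'Bob');
```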

這些指令的細節在這裡並不重要，重要的是，有好幾個更新資料的動作要被執行。我們銀行的營業員需要保證所有的更新資料都要完成，或是保持原樣。如果因為系統錯誤，而造成 Bob 收到 $100.00，但 Alice 卻沒有轉出金額，就不是應該發生的事。又或是 Alice 轉出了現金，而 Bob 卻沒有轉入金額，她也不會是開心的客戶。我們需要具有保證交易安全的方法，也就是如果在執行過程中，有部份出了錯，那麼即使是已經執行的部份，也不會對資料庫產生影響。把這些更新資料的指令，包裝在一個交易之中，就是這個保證交易安全的方法。這樣的交易稱作為 atomic：從其他的交易的角度來看，整個行為只有完全執行，亦或是什麼都沒有做，兩種結果而已。

我們也希望有某個保證是，一旦某個交易被完成了，那麼會由資料庫系統發出通知，使它確實是永久性的資料，即使發生短暫的當機之後，資料也不會遺失。舉例來說，如果我們正在進行 Bob 的提款系統操作行為，在他走出銀行大門之後，我們不要有任何可能性使他的提款記錄消失。一個具備交易安全的資料庫，會將這裡交易裡的更新行為，在它們被回報完成之前，都記錄在長效型儲存裝置上（也就是磁碟機）。

交易安全資料庫的另一個重要性質是， atomic update 的概念：當多個交易同時在進行時，每一個交易都不能夠看到其他交易未完成交易的資料狀態。舉個例子，如果某個交易正在進行總計所有分行的餘額，它不會只包含 Alice 的分行的提款，或不計算 Bob 的分行的存款，反之亦然。所以交易必須是全有全無的結果，而不只是資料庫資料的永久性，還包含了交易執行過程的可視性。一個未完成的交易直到完全完成之前，其間資料的改變，對其他的交易而言都看不見；而當交易完成的同時，資料的改變也同時全部呈現出來。

在 PostgreSQL 中，所謂的交易，是以 SQL 的 BEGIN 及 COMMIT 兩個指令相夾的過程。所以我們前述的銀行交易實際上會像這樣：
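Wrapped in a transaction block, the transfer could be sketched as follows (accounts and its columns are the same illustrative names as before):

```sql
BEGIN;
UPDATE accounts SET balance = balance - 100.00
    WHERE name = 'Alice';
-- ... the remaining updates of the transfer go here ...
COMMIT;
```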

如果在交易的過程之中，我們決定不要完成交易（也許我們發現 Alice 的帳戶餘額不足），我們可以使用 ROLLBACK 指令來取代 COMMIT，那麼所有資料的變更都會取消。

PostgreSQL 一般將每一個 SQL 指令都視為一個交易來執行。如果你並沒有使用 BEGIN 指令，那麼每一個個別的指令就會隱含 BEGIN 先行，然後如果成功的話，COMMIT 也自動執行。一系列被 BEGIN 和 COMMIT 包夾的區域，有時候就稱為交易區塊。

注意

有一些用戶端程式會自動加入執行 BEGIN 及 COMMIT 指令，使得你不需要要求就獲得交易區塊的效果。請詳閱你所所用的工具文件。

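It's possible to control the statements in a transaction in a more granular fashion through the use of savepoints. Savepoints allow you to selectively discard parts of the transaction, while committing the rest. After defining a savepoint with SAVEPOINT, you can if needed roll back to the savepoint with ROLLBACK TO. All the transaction's database changes between defining the savepoint and rolling back to it are discarded, but changes earlier than the savepoint are kept.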

在回復到交易儲存點之後，它仍然可以繼續進行，而你可以多次回到該儲存點。相反地，如果你確定你不要再回復到某個特定的交易儲存點時，它也可以被釋放出來，系統資源也可以獲得舒解。記得，釋放或回復到一個交易儲存點時，將會自動釋放所有在那之後的交易儲存點。

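All this is happening within the transaction block, so none of it is visible to other database sessions. When and if you commit the transaction block, the committed actions become visible as a unit to other sessions, while the rolled-back actions never become visible at all.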

記得這個銀行的資料庫，假設我們從 Alice 的帳號提出了 $100.00，然後存入了 Bob 的帳戶之中，隨後又發現應該要存到 Wally 的帳戶。我們可以使用交易儲存點來完成這個過程：
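A sketch of that sequence; my_savepoint is an arbitrary savepoint name and the accounts table is the same illustrative one as above:

```sql
BEGIN;
UPDATE accounts SET balance = balance - 100.00
    WHERE name = 'Alice';
SAVEPOINT my_savepoint;
UPDATE accounts SET balance = balance + 100.00
    WHERE name = 'Bob';
-- Wrong recipient: undo only the credit to Bob, keep the debit from Alice.
ROLLBACK TO my_savepoint;
UPDATE accounts SET balance = balance + 100.00
    WHERE name = 'Wally';
COMMIT;
```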

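This example is, of course, oversimplified, but there's a lot of control possible in a transaction block through the use of savepoints. Moreover, ROLLBACK TO is the only way to regain control of a transaction block that was put in aborted state by the system due to an error, short of rolling it back completely and starting again.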

3.6. 繼承

繼承是一個物件導向資料庫的概念，它開啓了資料庫設計的更多可能性。

讓我們創建兩個資料表：cities 和 capitals。很自然地，首都（capitals）也是城市（cities），所以你希望有個方式，可以在列出所有城市時，同時也包含首都。如果你真的很清楚的話，你可以建立如下的結構：
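The scheme is missing from this copy. Without inheritance, one workaround is two separate tables joined by a view; the sketch below assumes the column names mentioned later in this section (non_capitals is an illustrative table name):

```sql
CREATE TABLE capitals (
    name        text,
    population  real,
    altitude    int,      -- in feet
    state       char(2)
);

CREATE TABLE non_capitals (
    name        text,
    population  real,
    altitude    int       -- in feet
);

-- The view presents both tables as a single list of cities.
CREATE VIEW cities AS
    SELECT name, population, altitude FROM capitals
        UNION
    SELECT name, population, altitude FROM non_capitals;
```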

這樣的查詢結果會是正確的，不過它有點不是很漂亮，當你需要更新一些資料的時候。

有一個更好的方法是這樣：
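Using table inheritance, the same model could be sketched as:

```sql
CREATE TABLE cities (
    name        text,
    population  real,
    altitude    int       -- in feet
);

-- capitals gets all columns of cities plus its own state column.
CREATE TABLE capitals (
    state       char(2)
) INHERITS (cities);
```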

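In this case, a row of capitals inherits all columns (name, population, and altitude) from its parent, cities. The type of the column name is text, a native PostgreSQL type for variable length character strings. State capitals have an extra column, state, that shows their state. In PostgreSQL, a table can inherit from zero or more other tables.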

舉個例子，下面的查詢可以找出所有的城市名稱，包含各州的首都，而其海拔高過於 500 英呎以上：
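A sketch of that query against the inheritance-based cities table:

```sql
-- Includes rows stored in cities and in any table that inherits from it.
SELECT name, altitude
    FROM cities
    WHERE altitude > 500;
```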

回傳結果：

另一方面，下面的查詢可以列出非首都的城市，且其海拔在 500 英呎以上：
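Restricting the scan to the parent table alone could be written as:

```sql
-- ONLY excludes rows that live in child tables such as capitals.
SELECT name, altitude
    FROM ONLY cities
    WHERE altitude > 500;
```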

這裡的「ONLY」（cities之前），指的是這個查詢只要在資料表 cities 上就好，不包含繼承 cities 其他資料表。這裡許多我們都已經討論的指令 — SELECT、UPDATE、DELETE — 都支援 ONLY 這個修飾字。

注意

雖然繼承經常被使用，但尚未整合唯一性限制或外部索引鍵的功能，這限制了它的可用性。詳情請參考的說明。

3.7. 結論

PostgreSQL 還有許多這份導覽中未能介紹到的功能，這裡主要是針對新鮮的 SQL 使用者所準備的內容。這些功能將會在後續的章節進行更詳細的討論。

如果你覺得你需要更多介紹的資訊，可以到取得更多訊息。

II. SQL查詢語言

在這個部份介紹如何在 PostgreSQL 中使用 SQL 語言。首先，我們從一般性的 SQL 語法開始說明，然後解釋如何建立結構來保存資料，如何充實資料庫，以及如何查詢資料的方法。中段的部份列出 SQL 指令中的資料型別與函數。最後剩餘的部份，將會針對一些調教資料庫的重要議題進行說明。

這個部份的內容設計讓初學者可以循序漸進地完整瞭解該主題，而不需要反覆前後查閱。各章的內容設計上都是獨立的，所以進階的使用者可以分別閱讀他們需要的部份。在這個部份的內容，針對於主題式的單元描述。需要瞭解詳情的讀者，請參閱第 6 部份中，個別指令的說明頁面。

在這個部份裡的讀者，應該要知道如何連線到一個 PostgreSQL 資料庫，並且執行 SQL 指令。如果不熟悉這些操作的讀者，建議先閱讀第 1 部份的內容。SQL 指令一般是使用終端工具 psql，但其他具有類似功能的程式也可以使用。

4. SQL語法

這章中說明 SQL 的使用語法。從這裡建立後續章節所需的理解基礎，然後進一步瞭解 SQL 如何使用去定義及修改資料。

我們也建議已經熟悉 SQL 語法的使用者，仔細地閱讀本章，因為這裡包含了一些有別於其他 SQL 資料庫或專屬於 PostgreSQL 的規則和觀念。

5. 定義資料結構

這一章涵蓋了如何建立資料庫結構。在關連式資料庫中，原始資料儲存在表格之中，所以在這一章裡，主要說明表格如何建立及調整，以及有什麼樣的功能可以操控所存放的資料。再來我們會討論表格如何以結構來管理，以及權限的設定。最後，我們會簡短地看一下其他的功能如何影響資料儲存，像是繼承、表格分割、view、函數、還有觸發函數。

5.1. 認識資料表

「資料表」（table）在關連式資料庫中的角色很接近在紙上畫一個「資料表」：包含了列與欄。欄的數量與次序是固定的，而每個欄位都有一個名稱。列的數量是變動的—它表示在當下有多少資料被存在資料庫中。SQL 並不保證列在資料表中的次序。當讀取資料表的時候，除非明確要求要排序，不然列與列之間是不存在固定的次序。這些將在第 7 章中進一步說明。進一步來說，SQL 並沒有給每一列一個唯一性的識別，所以在資料表中是有可能存在有完全相同內容的列。這是 SQL 架構下的數學模型結果，通常不是理想的結果。在這章之後，我們會說明如何處理這個問題。

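Each column has a data type. The data type constrains the set of possible values that can be assigned to a column and assigns semantics to the data stored in the column so that it can be used for computations. For instance, a column declared to be of a numerical type will not accept arbitrary text strings, and the data stored in such a column can be used for mathematical computations. By contrast, a column declared to be of a character string type will accept almost any kind of data but it does not lend itself to mathematical calculations, although other operations such as string concatenation are available.

PostgreSQL includes a sizable set of built-in data types that fit many applications. Users can also define their own data types. Most built-in data types have obvious names and semantics, so we defer a detailed explanation to Chapter 8. Some of the frequently used data types are integer for whole numbers, numeric for possibly fractional numbers, text for character strings, date for dates, time for time-of-day values, and timestamp for values containing both date and time.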

要建立一個資料表，你可以使用 CREATE TABLE 指令。這個指令你至少要指定一個名稱給新的資料表，還有每一個欄位的名稱與資料型別。例如：

CREATE TABLE my_first_table (
    first_column text,
    second_column integer
);

這個建立一個叫作 my_first_table 的資料表，它包含了兩個欄位。第一個欄位叫作 first_column，其資料型別為 text；第二個欄位名稱為 second_column，資料型別為 integer。表格與欄位名稱的規則依 4.1.1 節中所介紹的識別字語法，但也有一些例外。注意欄位列表是用逗號分隔，並且包含於括號之中。

當然，前面的例子明顯只是做做樣子而已。一般來說，你會將你的資料表欄位以實際用途來命名，所以我們來看一下更實際的例子：

CREATE TABLE products (
    product_no integer,
    name text,
    price numeric
);

(The numeric type can store fractional components, as would be typical of monetary amounts.)

小技巧
當你建立了許多相關的資料表時，建立最好選擇一個用於命名表格及欄位的規則。舉例來說，有一個規則是使用單數或複數名詞來取名表格，兩者都有些人喜歡使用。

一個資料表中有多少欄位是有限制的，依欄位型別而定，上限通常是 250 個到 1600 個之間。不過，宣告到這麼多的欄位是非常罕見，而且應該是有問題的設定。

如果你不再需要某個資料表，你可以移除它。請使用 DROP TABLE 指令，如下所示：

DROP TABLE my_first_table;
DROP TABLE products;

企圖要移除一個不存在的資料表，會產生錯誤。不過，在 SQL 腳本中，在建立資料表前嘗試移除是很常見的，通常會忽略錯誤訊息，所以不論資料表是否已經存在，腳本都能如預期執行。（如果你需要的話，你也可以使用 DROP TABLE IF EXISTS 來避免產生錯誤訊息，但這並不是標準 SQL 語法。）

如果你需要變更資料表的結構的話，請參閱本章的 5.5 節。

到目前為止，你已經可以利用工具建立完整功能的資料表。本章接下來的部份會針對附加的功能介紹，像是確保資料完整性、安全性、或方便性。如果你現在急著要將資料存入你的資料表的話，你可以暫時跳過本章，到第 6 章繼續操作。

5.2. 預設值

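A column can be assigned a default value. When a new row is created and no values are specified for some of the columns, those columns will be filled with their respective default values. A data manipulation command can also request explicitly that a column be set to its default value, without having to know what that value is. (Details about data manipulation commands are in Chapter 6.)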

如果預設值並沒有明確被指定時，預設值就會是 null。一般來說空值是可接受的情況，因為空值可以表示「未知的資料」的意義。

在表格定義時，預設值接在資料型別後宣告，如下所示：

CREATE TABLE products (
    product_no integer,
    name text,
    price numeric DEFAULT 9.99
);

預設值也可以是運算表示式，會在資料插入的同時進行運算（不是在表格建立時）。常見的例子是 timestamp 欄位，會設定一個 CURRENT_TIMESTAMP 的預設值，使其在資料插入時設定為當下的時間。另一個例子是產生「序列數」，這在 PostgreSQL 中，通常以下列語法來表現：

CREATE TABLE products (
    product_no integer DEFAULT nextval('products_product_no_seq'),
    ...
);

這裡的 nextval() 函數會從序列物件取得下一個數字（參閱 9.16 節）。這個例子也常簡化為：

CREATE TABLE products (
    product_no SERIAL,
    ...
);

有關 SERIAL 的簡寫方式，將在 8.1.4 節中說明。

5.4. 系統欄位

每一個表格都有幾個系統欄位，而它們是由資料庫系統預先定義好的，所以使用者在定義欄位名稱時，不能使用這些名字。（這些限制並不是因為它們是保留關鍵字，所以就算用引號括起來也不能使用。）但在一般使用時，你也不需要特別考慮這些欄位，只要瞭解會有這些欄位存在就好。

oid

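The object identifier (object ID) of a row. This column is only present if the table was created using WITH OIDS, or if the default_with_oids configuration variable was set at the time. This column is of type oid (same name as the column). See the description of object identifier types in Chapter 8 for more information.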

tableoid

每個表格也有一個 ID 也會記錄在每一個資料列中。這個欄位特別方便在取得表格的繼承結構（參閱），如果沒有這個欄位的話，要去找出資料列的來源就會很麻煩。tableoid 可以參考 pg_class 表格中的 oid 欄位，進一步取得表格的名稱。

xmin

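The identity (transaction ID) of the inserting transaction for this row version. (A row version is an individual state of a row; each update of a row creates a new row version for the same logical row.)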

cmin

指令識別碼，會存在於新增資料的交易中。（從 0 開始）

xmax

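The identity (transaction ID) of the deleting transaction, or zero for an undeleted row version. It is possible for this column to be nonzero in a visible row version. That usually indicates that the deleting transaction hasn't committed yet, or that an attempted deletion was rolled back.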

cmax

指令識別碼，有數字的話表示一個刪除的交易指令，或是 0。

ctid

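The physical location of the row version within its table. Note that although the ctid can be used to locate the row version very quickly, a row's ctid will change if it is updated or moved by VACUUM FULL. Therefore ctid is useless as a long-term row identifier. The OID, or even better a user-defined serial number, should be used to identify logical rows.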

OID 是一個 32 位元的數字，以 cluster 為單位配發。在一個大型或長期使用的資料庫中，是有可能出現重覆的情況。所以，假設 OID 是唯一的識別是不正確的觀念，除非你還有搭配其他方法來確保唯一性。如果你需要識別表格中的資料列的話，使用序列數產生器是比較建議的作法。OID 也可以這樣用來得到一些額外的預防性功能：

唯一性的限制應該設定在 OID 欄位上，來確保每一個 OID 可以識別每一個資料列。當有唯一性限制存在的時候，對於已經存在的資料列就不會有重覆的 OID。（當然，這方法只能用於資料筆數在 40 億筆以下的表格。不過實務上的表格多數都少於這個數目，而且太多資料的話，效果也會變得很差。）
OID 在多個表格間就不能假設為是唯一，你應該搭配 tableoid 來識別資料庫層級的唯一性。
Of course, the tables in question must be created WITH OIDS. As of PostgreSQL 8.1, WITHOUT OIDS is the default.

交易識別碼也是 32 位元的數字。在一個長期運行的資料庫中，交易識別碼也可能會重覆。只要有適當的管理機制的話，這並不會是什麼嚴重的問題，詳情請參閱第 24 章。然而，長期來說（超過 10 億個交易），假定交易識別碼的唯一性是不明智的作法。

指令識別碼也是 32 位元的數字，其絕對上限是約 40 億個指令在一個交易當中，實務上這個限制並不會是問題。注意到這個限制是 SQL 指令數量的限制，而不是處理資料的限制。只有真正有改變資料庫內容的指令才會有指令識別碼。
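System columns are not included in SELECT * and have to be named explicitly. As an illustration, assuming the products table from section 5.1 exists, they can be inspected like this (the source_table label is an arbitrary choice):

```sql
-- tableoid::regclass shows the owning table's name instead of its numeric OID.
SELECT tableoid::regclass AS source_table,
       ctid, xmin, xmax, cmin, cmax,
       product_no, name, price
    FROM products;
```

The oid column is left out of this sketch because it only exists for tables created WITH OIDS.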

5.5. 表格變更

當你建立了一個表格，而你發現出了點錯，或者應用需求有一些改變，那麼你可以移除它再重新建立。但這可能不會一個好的選擇，當表格中已經儲存了許多資料時，或者表格正在被其他的資料庫物件所參考中（例如外部鍵參考）。所以 PostgreSQL 提供了一系列的指令來修改現存的表格。注意到這和更新表格內資料的概念是不同的：在這裡，我們主要針對的是調整表格的定義或結構。

你可以：

加入欄位
移除欄位
加入限制條件
移除限制條件
改變預設值
改變欄位資料型別
變更欄位名稱
變更表格名稱

所有這些動作都透過 ALTER TABLE 指令來進行，你可以參考該頁面取得詳細資訊。

5.5.1. 加入欄位

要加入一個新欄位，請使用下面的指令：

ALTER TABLE products ADD COLUMN description text;

這個新的欄位預設會以預設值填入（如果你沒有使用 DEFAULT 子句來宣告的話，那會使用 NULL）。

你也可以在新增同時建立限制條件：

ALTER TABLE products ADD COLUMN description text CHECK (description <> '');

事實上，所有在 CREATE TABLE 的選項都可以在這裡使用。要記得的是，預設值必須要符合限制條件的設定，否則這個欄位會無法加入。順帶一提的是，你也可以隨後再加入限制條件（隨後說明），在你更新好新的欄位資料內容後。

小技巧

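In PostgreSQL 10 and earlier, adding a column with a default requires updating each row of the table (to store the new column value). From PostgreSQL 11 on, adding a column with a constant default no longer updates every row when ALTER TABLE is executed; a volatile default (for example clock_timestamp()) still does. In either case, if no default is specified, PostgreSQL avoids the physical update. So if you intend to fill the column with mostly nondefault values, it is best to add the column with no default, insert the correct values using UPDATE, and then add any desired default as described below.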

5.5.2. 移除欄位

要移除一個欄位，請使用下列指令：

ALTER TABLE products DROP COLUMN description;

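Whatever data was in the column disappears. Table constraints involving the column are dropped, too. However, if the column is referenced by a foreign key constraint of another table, PostgreSQL will not silently drop that constraint. You can authorize dropping everything that depends on the column by adding CASCADE: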

ALTER TABLE products DROP COLUMN description CASCADE;

請參閱 5.13 節，瞭解詳細的處理機制。

5.5.3. 加入限制條件

要加入限制條件，請使用表格限制條件的語法，例如：

ALTER TABLE products ADD CHECK (name <> '');
ALTER TABLE products ADD CONSTRAINT some_name UNIQUE (product_no);
ALTER TABLE products ADD FOREIGN KEY (product_group_id) REFERENCES product_groups;

要加入 NOT NULL 限制條件的話，就不能寫成表格的限制條件，請使用這樣的語法：

ALTER TABLE products ALTER COLUMN product_no SET NOT NULL;

加入的限制條件會立即開始檢查，所以當下的資料內容必須要能符合條件才能加入成功。

5.5.4. 移除限制條件

要移除限制條件，你需要先知道它的名稱。如果你在宣告時有命名的話，那就使用那個名稱，否則你得找出系統自動命名的名稱。其所使用的指令為「\d tablename」，會列出表格相關的資訊。或使用其他的資料庫工具應該也可以找到它。找到之後請使用下列指令來移除限制條件：

ALTER TABLE products DROP CONSTRAINT some_name;

（如果你的限制條件名稱像是「$2」這樣的，不要忘記使用雙引號括住，使其可以正確地被識別為是名稱。）

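As with dropping a column, you need to add CASCADE if you want to drop a constraint that something else depends on. An example is that a foreign key constraint depends on a unique or primary key constraint on the referenced column(s).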

下面這可以用在移除 NOT NULL 限制的欄位：

ALTER TABLE products ALTER COLUMN product_no DROP NOT NULL;

(記得 NOT NULL 是沒有名稱的。)

5.5.5. 變更欄位預設值

要設定新的欄位預設值，請使用下面指令：

ALTER TABLE products ALTER COLUMN price SET DEFAULT 7.77;

注意這並不會影響到已經存在的資料，只有隨後新增的資料才會使用。

要移除任何預設值，請使用：

ALTER TABLE products ALTER COLUMN price DROP DEFAULT;

這個指令會把預設值設為空值。因為預設值本來就設為空值，所以即使刪去一個未設定預設值欄位的預設值，也不會是一種錯誤。

5.5.6. 變更欄位資料型別

要變更欄位成為另一個資料型別，請使用下列指令：

ALTER TABLE products ALTER COLUMN price TYPE numeric(10,2);

這只有在欄位內容可以被自動轉換型別時才會成功。如果存在比較複雜的轉換時，你需要加上 USING 子句來指示如何轉換資料內容。
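As an illustration of the USING clause, the sketch below converts a text column that holds digits into an integer column. The table sensor_log and its columns are hypothetical and not part of this manual's examples:

```sql
-- Hypothetical table whose numeric readings were stored as text.
CREATE TABLE sensor_log (
    sensor_name    text,
    reading_value  text
);

-- text has no implicit conversion to integer, so USING states the expression.
ALTER TABLE sensor_log
    ALTER COLUMN reading_value TYPE integer
    USING reading_value::integer;
```

The conversion fails if any existing value cannot be cast, which leaves the column type unchanged.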

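PostgreSQL will attempt to convert the column's default value (if any) to the new type, as well as any constraints that involve the column. But these conversions might fail, or might produce surprising results. It's often best to drop any constraints on the column before altering its type, and then add back suitably modified constraints afterwards.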

5.5.7. 變更欄位名稱

要變更某個欄位的名稱：

ALTER TABLE products RENAME COLUMN product_no TO product_number;

5.5.8. 變更表格名稱

要變更表格的名稱：

ALTER TABLE products RENAME TO items;

5.6. 權限

當一個資料庫物件被建立時，它會先指定存取權限給擁有者，而擁有者一般來說就是執行建立指令的使用者。對大多數的資料庫物件來說，其預設的狀態就是只有擁有者（或超級使用者）可以對該物件進行所有操作。要讓給其他使用者來操作的話，就必須進行授權的動作。

有很多不同種類的權限：SELECT、INSERT、UPDATE、DELETE、TRUNCATE、REFERENCES、TRIGGER、CREATE、CONNECT、TEMPORARY、EXECUTE、USAGE。這些權限對於不同物件的效果，會因為是哪一種物件而有所差別（表格、函式...等等）。要瞭解完整在 PostgreSQL 中所支援的各種物件權限，請參考語法頁面。這裡的內容主要說明如何使用。

修改和移除一個資料庫物件，是只有擁有者才具備的權力。

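An object can be assigned to a new owner with an ALTER command of the appropriate kind for the object, e.g. ALTER TABLE. Superusers can always do this; ordinary roles can only do it if they are both the current owner of the object (or a member of the owning role) and a member of the new owning role.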

要進行授權行為的話，請使用 GRANT 指令。舉例來說，如果 joe 是一個使用者，而 accounts 是一個表格，要讓他可以獲得更新表格資料的權力：
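A sketch of that grant, assuming a role named joe and a table named accounts already exist:

```sql
-- Allows joe to run UPDATE on accounts, and nothing else.
GRANT UPDATE ON accounts TO joe;
```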

使用 ALL 的權限，就代表授權所有可設定的權限。

有一個特別的使用者是 PUBLIC，代表的是系統內的所有使用者。當資料庫內有很多使用者時，可以制定「群組（group）」來簡化管理。這部份詳細的說明請參閱。

要移除權限，請使用 REVOKE 指令：
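For example, withdrawing every privilege that was granted to all users on the accounts table could be sketched as:

```sql
-- PUBLIC stands for every role on the system.
REVOKE ALL ON accounts FROM PUBLIC;
```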

物件擁有者的特殊權限（例如DROP、GRANT、REVOKE...等）都是和擁有者一起設定，而無法單獨授權。不過，擁有者可以選擇移除自己的權限，例如建立一個唯讀的表格，讓自己和其他人一樣。

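To reiterate: only the object's owner (or a superuser) can normally grant or revoke privileges on it. However, it is possible to grant a privilege "with grant option", which gives the recipient the right to grant it in turn to others. If the grant option is subsequently revoked then all who received the privilege from that recipient (directly or through a chain of grants) will lose the privilege. For details see the GRANT and REVOKE reference pages.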

5.11. 外部資料

PostgreSQL 實作了 SQL/MED 的部份標準，讓你可以存取不在 PostgreSQL 管理下的資料，重點是，你仍然只需要使用 SQL 語法。這樣的資料我們稱作為外部資料。（注意這部份的使用不要和外部鍵搞混了，外部鍵是資料庫內部的一種條件限制。）

外部資料的存取是透過「Foreign data wrapper」（外部資料封裝技術）。外部資料封裝技術是一組函式庫，用於和外部的資料源溝通，它封裝了資料連線和存取資料的細節。有一些外部資料封裝的套件收錄在 contrib 模組之中，參閱附件 F。其他種類的外部封裝套件則由第三方產品提供。如果沒有適合你的資料源的套件的話，你也可以自己寫一個，參閱第 56 章。

要存取外部資料，你需要建立外部服務物件，用它來連結特定的外部資料源，也可以對套件進行一些設定。然後你還需要建立幾個外部資料表，用於定義外部資料的資料結構。外部資料表的使用就如一般的表格一樣，只不過它沒有實際儲存任何資料罷了。當外部資料表被查詢時，PostgreSQL 會透過外部資料封裝套件，從外部資料源取得資料，或者傳送資料到外部，進行更新資料。

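Accessing remote data may require authenticating to the external data source. This information can be provided by a user mapping, which can supply additional data such as user names and passwords based on the current PostgreSQL role.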

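For additional information, see the reference pages for CREATE FOREIGN DATA WRAPPER, CREATE SERVER, CREATE USER MAPPING, CREATE FOREIGN TABLE, and IMPORT FOREIGN SCHEMA.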

5.12. 其他資料庫物件

表格是關連式資料庫結構裡的主要物件，因為它負責存放資料，但並不是資料庫中唯一的物件。還有許多其他種的物件存在，讓使用上更方便或管理更有效率。這些其他的物件並不在本章中討論，但我們先在這裡列出讓你知道：

Views
函數與運算子
資料型別和領域
觸發事件和規則覆寫

關於這些物件的詳細說明安排在。

5.13. 相依性追蹤

當你建立了一個複雜的資料庫結構，包含了許多資料表，也設計了許多外部索引鍵、檢視表、觸發事件、函數.....等等。也就是說，其實你建立了一堆物件之間的關連性。舉例來說，資料表的外部索引鍵就與另一個資料表有著參考的關連性。

要維護整個資料庫結構的完整性，PostgreSQL 得確保你不能在有關連性的情況下，隨意刪去物件。舉例來說，企圖刪去在中，我們所使用過的產品資料表，而訂單資料表與其有相依的關連性，那就會產生如下的錯誤訊息：
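The example is missing from this copy. Assuming an orders table holds a foreign key that references products, the attempt would look like this:

```sql
-- Expected to fail while orders still references products; the error's
-- DETAIL lines list the dependent objects and its HINT suggests CASCADE.
DROP TABLE products;
```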

這個錯誤訊息包含了很有用的指引：如果你不想要一個個處理其相依關連性，那可以一次刪去他們：
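The one-step variant could be sketched as:

```sql
-- Drops products together with everything that depends on it.
DROP TABLE products CASCADE;
```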

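Then all the dependent objects will be removed, as will any objects that depend on them, recursively. In this case, it doesn't remove the orders table, it only removes the foreign key constraint. It stops there because nothing depends on the foreign key constraint. (If you want to check what DROP ... CASCADE will do, run DROP without CASCADE and read the DETAIL output.)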

幾乎所有 PostgreSQL 的 DROP 指令都支援 CASCADE 的用法。當然，有些自然的關連性是和物件型別有關。你也可以使用 RESTRICT 來取代 CASCADE 的位置，以強制以預設的行為來處理，也就是絕對不會刪去其他相關的物件。

注意

根據 SQL 標準，不論是 RESTRICT 或 CASCADE，都必須要在 DROP 指令中明確表示，但沒有任何一套資料庫系統真的這樣設計。不過，都會內定預設行為是 RESTRICT 或 CASCADE，每個資料庫不同。

如果 DROP 指令列出了多個物件，CASCADE 只有在這些物件之外還有相依性時才會需要。舉個例子，當執行「DROP TABLE tab1, tab2」時，即使 tab1 與 tab2 之間有外部索引鍵的相依關係，而沒有指定 CASCADE，這個操作也會完成。

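For user-defined functions, PostgreSQL tracks dependencies associated with a function's externally visible properties, such as its argument and result types, but not dependencies that could only be known by examining the function body. As an example, consider this situation: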
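The definitions are missing from this copy. A sketch using the names mentioned in the following paragraph (rainbow, my_colors, get_color_note) might be; the enum labels and the note column are illustrative:

```sql
CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow',
                             'green', 'blue', 'purple');

CREATE TABLE my_colors (color rainbow, note text);

-- The argument type rainbow is a tracked dependency of the function;
-- the reference to my_colors inside the body is not.
CREATE FUNCTION get_color_note (rainbow) RETURNS text AS
    'SELECT note FROM my_colors WHERE color = $1'
    LANGUAGE SQL;
```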

（參閱，瞭解 SQL 語言的函數。）PostgreSQL 會知道 get_color_note 函數相依於 rainbow 資料型別：也就是刪去該資料型別時，也會強制要刪去該函數，因為它的參數將不再合法。但 PostgreSQL 就無法發現 get_color_note 和 my_colors 之間的關連性，當該資料表被移除時，此函數並不會跟著被移除。這種情況有好有壞，函數基本上還是合法的，即使內含的資料表不存在的話，頂多就是執行會出錯就是了，只要再建立該名稱的資料表就可以讓這個函數重新正常運作。

6. 資料處理

前一章討論了如何建立資料表和其他結構來保存資料。現在是把資料表填滿的時候了。本章介紹如何新增、更新和刪除資料表的資料。下一章將會完整說明如何從資料庫中取回你遺落在裡面的資料。

6.1. 新增資料

資料表在建立的時候，並不包含任何資料。以各種方式使用資料庫之前，要做的第一件事就是新增資料。概念上，資料是一次新增一列。當然你也可以新增多列，但就沒有辦法新增少於一列。即使只知道某些欄位的值，也必須建立一個完整的資料列。

要建立新的資料列，請使用 INSERT 指令。該命令需要資料表的名稱和各欄位的資料內容。例如，來看看第 5 章中的產品資料表：

CREATE TABLE products (
    product_no integer,
    name text,
    price numeric
);

新增資料列的指令可能如下所示：

INSERT INTO products VALUES (1, 'Cheese', 9.99);

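The data values are listed in the order in which the columns appear in the table, separated by commas. Usually, the data values will be literals (constants), but scalar expressions are also allowed.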

上面的語法有缺點，就是你需要知道資料表中欄位的順序。為了避免這種情況，您可以明確地列出欄位。例如，以下兩個命令與上面的命令具有相同的效果：

INSERT INTO products (product_no, name, price) VALUES (1, 'Cheese', 9.99);
INSERT INTO products (name, price, product_no) VALUES ('Cheese', 9.99, 1);

許多用戶認為總是列出欄位名稱是一個很好的習慣。

如果你並沒有所有欄位的內容，則可以省略其中一些欄位。在這種情況下，那些欄位將會以預設值代入。如下所示：

INSERT INTO products (product_no, name) VALUES (1, 'Cheese');
INSERT INTO products VALUES (1, 'Cheese');

第二種形式是屬於 PostgreSQL 延伸寫法。從左邊開始的欄位填入所給定的內容，其餘的欄位則使用預設值。

為了清楚起見，你也可以明確地指定個別欄位或整個資料列都使用預設值：

INSERT INTO products (product_no, name, price) VALUES (1, 'Cheese', DEFAULT);
INSERT INTO products DEFAULT VALUES;

您可以在一個命令中新增多個資料列：

INSERT INTO products (product_no, name, price) VALUES
    (1, 'Cheese', 9.99),
    (2, 'Bread', 1.99),
    (3, 'Milk', 2.99);

也可以以查詢的結果新增（可能沒有資料，一個資料列或多個資料列）：

INSERT INTO products (product_no, name, price)
  SELECT product_no, name, price FROM new_products
    WHERE release_date = 'today';

這包含完整 SQL 查詢機制（第 7 章）用於計算需要新增的資料列。

小技巧

同時要新增大量資料時，請考慮使用 COPY 指令。它不像 INSERT 指令那麼靈活，但是效率更高。有關提高批次新增資料效率的更多資訊，請參閱第 14.4 節。

6.2. 更新資料

將已經在資料庫中的資料做修改被稱為更新。您可以單獨更新某個資料列，或資料表中的所有資料列，或是部份資料列。每個欄位可以單獨更新，而不影響其他欄位。

To update existing rows, use the UPDATE command. This requires three pieces of information:

要更新的資料表和欄位的名稱
資料欄位新的內容
哪些資料列要更新

回想一下，SQL 通常不提供資料列的唯一識別資訊。因此，直接指定要更新哪一行通常是不行的，而是指定該資料列必須符合哪些條件才能更新。只有你在資料表中有一個主鍵（決定於是否你有宣告過）之後，才能通過選擇與主鍵相匹配的條件來可靠地解決單個資料列的問題。圖形化的資料庫管理工具依賴這個方式才能允許你單獨更新指定的資料列。

例如，這個指令會將價格為 5 的所有產品更新為 10：
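The example command appears to be missing from this copy. A minimal sketch consistent with the description, assuming the products table from Section 6.1:

```sql
-- Set the price to 10 for every product currently priced at 5.
UPDATE products SET price = 10 WHERE price = 5;
```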

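This might cause zero, one, or many rows to be updated. It is not an error to attempt an update that does not match any rows.

Let's look at that command in detail. First is the key word UPDATE followed by the table name. As usual, the table name can be schema-qualified, otherwise it is looked up in the search path. Next is the key word SET followed by the column name, an equal sign, and the new column value. The new column value can be any scalar expression, not just a constant. For example, if you want to raise the price of all products by 10% you could use: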
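The command itself is not present in this copy. A sketch that matches the description, again assuming the products table from Section 6.1:

```sql
-- Raise every price by 10%; with no WHERE clause, all rows are updated.
UPDATE products SET price = price * 1.10;
```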

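As you see, the expression for the new value can refer to the existing value(s) in the row. We also left out the WHERE clause. If it is omitted, it means that all rows in the table are updated. If it is present, only those rows that match the WHERE condition are updated. Note that the equals sign in the SET clause is an assignment while the one in the WHERE clause is a comparison, but this does not create any ambiguity. Of course, the WHERE condition does not have to be an equality test. Many other operators are available (see Chapter 9). But the expression needs to evaluate to a Boolean result.

You can update more than one column in an UPDATE command by listing more than one assignment in the SET clause. For example: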
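The example is missing here. The following sketch shows the form; mytable and its columns a, b, and c are hypothetical names chosen for illustration:

```sql
-- Hypothetical table, created only so the sketch runs on its own.
CREATE TEMP TABLE mytable (a integer, b integer, c integer);

-- Assign three columns in one UPDATE; only rows with a > 0 are touched.
UPDATE mytable SET a = 5, b = 3, c = 1 WHERE a > 0;
```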

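6.3. Deleting Data

So far we have explained how to add data to tables and how to change data. What remains is to discuss how to remove data that is no longer needed. Just as adding data is only possible in whole rows, you can only remove entire rows from a table. In the previous section we explained that SQL does not provide a way to directly address individual rows. Therefore, removing rows can only be done by specifying conditions that the rows to be removed have to match. If you have a primary key in the table then you can specify the exact row. But you can also remove groups of rows matching a condition, or you can remove all rows in the table at once.

You use the DELETE command to remove rows; the syntax is very similar to the UPDATE command. For instance, to remove all rows from the products table that have a price of 10, use: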
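The command is not shown in this copy. A sketch consistent with the sentence above, assuming the products table from Section 6.1:

```sql
-- Remove every product whose price is exactly 10.
DELETE FROM products WHERE price = 10;
```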

If you simply write:
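The unqualified form referred to here is presumably the following (products is the table from Section 6.1):

```sql
-- No WHERE clause: every row in the table is removed.
DELETE FROM products;
```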

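then all rows in the table will be deleted! Caveat programmer.

6.4. Returning Data From Modified Rows

Sometimes it is useful to obtain data from modified rows while they are being manipulated. The INSERT, UPDATE, and DELETE commands all have an optional RETURNING clause that supports this. Use of RETURNING avoids performing an extra database query to collect the data, and is especially valuable when it would otherwise be difficult to identify the modified rows reliably.

The allowed contents of a RETURNING clause are the same as a SELECT command's output list (see Section 7.3). It can contain column names of the command's target table, or value expressions using those columns. A common shorthand is RETURNING *, which selects all columns of the target table in order.

In an INSERT, the data available to RETURNING is the row as it was inserted. This is not so useful in trivial inserts, since it would just repeat the data provided by the client. But it can be very handy when relying on computed default values. For example, when using a serial column to provide unique identifiers, RETURNING can return the ID assigned to a new row: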
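The example is missing from this copy. A minimal sketch of the idea; the users table and its columns are illustrative names, not taken from the text above:

```sql
-- id is a serial column, so its value is generated by the server.
CREATE TABLE users (firstname text, lastname text, id serial PRIMARY KEY);

-- RETURNING hands back the generated id without a second query.
INSERT INTO users (firstname, lastname) VALUES ('Joe', 'Cool') RETURNING id;
```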

The RETURNING clause is also very useful with INSERT ... SELECT.

In an UPDATE, the data available to RETURNING is the new content of the modified row. For example:
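A sketch of such a command, assuming the products table from Section 6.1; the 99.99 threshold and the new_price label are arbitrary choices for illustration:

```sql
-- Raise the cheaper prices by 10% and report the updated values.
UPDATE products SET price = price * 1.10
  WHERE price <= 99.99
  RETURNING name, price AS new_price;
```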

In a DELETE, the data available to RETURNING is the content of the deleted row. For example:
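A sketch, assuming the products table from Section 6.1; the condition is an arbitrary stand-in for whatever identifies the rows to remove:

```sql
-- Delete the matching rows and return their full contents.
DELETE FROM products
  WHERE price > 100
  RETURNING *;
```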

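If there are triggers on the target table, the data available to RETURNING is the row as modified by the triggers. Thus, inspecting columns computed by triggers is another common use case for RETURNING.

7. Queries

The previous chapters explained how to create tables, how to fill them with data, and how to manipulate that data. Now we finally discuss how to retrieve the data from the database.

7.1. Overview

The process of retrieving data from a database, or the command that does so, is called a query. In SQL the SELECT command is used to specify queries. The general syntax of the SELECT command is: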
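The synopsis is missing from this copy. Reconstructed from the terms the following paragraph uses (select list, table expression, sort specification, WITH queries), it presumably reads:

```text
[WITH with_queries] SELECT select_list FROM table_expression [sort_specification]
```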

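The following sections describe the details of the select list, the table expression, and the sort specification. WITH queries are treated last since they are an advanced feature.

A simple kind of query has the form: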
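The query is not shown here. A sketch, with a hypothetical definition of table1 added so it runs on its own:

```sql
-- Hypothetical table; the text only assumes that table1 exists.
CREATE TEMP TABLE table1 (a integer, b integer, c integer);

SELECT * FROM table1;
```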

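Assuming that there is a table called table1, this command would retrieve all rows and all user-defined columns from table1. (The method of retrieval depends on the client application. For example, the psql program will display an ASCII-art table on the screen, while client libraries will offer functions to extract individual values from the query result.) The select list specification * means all columns that the table expression happens to provide. A select list can also select a subset of the available columns or make calculations using the columns. For example, if table1 has columns named a, b, and c (and perhaps others) you can make the following query: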
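A sketch of the query being described; the table definition is an assumption added so the statement is runnable:

```sql
CREATE TEMP TABLE table1 (a integer, b integer, c integer);  -- hypothetical

-- Return column a as-is and the sum of b and c as a computed column.
SELECT a, b + c FROM table1;
```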

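(assuming that b and c are of a numerical data type). See Section 7.3 for more details.

FROM table1 is a simple kind of table expression: it reads just one table. In general, table expressions can be complex constructs of base tables, joins, and subqueries. But you can also omit the table expression entirely and use the SELECT command as a calculator: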
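For instance, a query with no FROM clause at all:

```sql
-- Evaluates the expression once and returns a single row containing 12.
SELECT 3 * 4;
```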

This is more useful if the expressions in the select list return varying results. For example, you could call a function this way:
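One built-in function that returns a different value on each call is random():

```sql
-- Returns one row with a random double precision value in [0, 1).
SELECT random();
```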

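7.3. Select Lists

As shown in the previous section, the table expression in the SELECT command constructs an intermediate virtual table by possibly combining tables, views, eliminating rows, grouping, etc. This table is finally passed on to processing by the select list. The select list determines which columns of the intermediate table are actually output.

7.3.1. Select-List Items

The simplest kind of select list is * which emits all columns that the table expression produces. Otherwise, a select list is a comma-separated list of value expressions. For instance, it could be a list of column names: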
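A sketch of such a select list; table1 is a hypothetical table standing in for any table expression:

```sql
CREATE TEMP TABLE table1 (a integer, b integer, c integer);  -- hypothetical

SELECT a, b, c FROM table1;
```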

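The column names a, b, and c are either the actual names of the columns of tables referenced in the FROM clause, or the aliases given to them in the FROM clause. The name space available in the select list is the same as in the WHERE clause, unless grouping is used, in which case it is the same as in the HAVING clause.

If more than one table has a column of the same name, the table name must also be given, as in: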
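A sketch with two hypothetical tables, tbl1 and tbl2, that both have a column named a:

```sql
CREATE TEMP TABLE tbl1 (a integer, b integer);  -- hypothetical
CREATE TEMP TABLE tbl2 (a integer);             -- hypothetical

-- Qualify a with its table name, since both tables define it.
SELECT tbl1.a, tbl2.a, tbl1.b FROM tbl1, tbl2;
```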

When working with multiple tables, it can also be useful to ask for all the columns of a particular table:
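A sketch using the same kind of hypothetical tables, tbl1 and tbl2:

```sql
CREATE TEMP TABLE tbl1 (a integer, b integer);  -- hypothetical
CREATE TEMP TABLE tbl2 (a integer);             -- hypothetical

-- All columns of tbl1, followed by column a of tbl2.
SELECT tbl1.*, tbl2.a FROM tbl1, tbl2;
```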

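See Section 8.16.5 for more about the table_name.* notation.

If an arbitrary value expression is used in the select list, it conceptually adds a new virtual column to the returned table. The value expression is evaluated once for each result row, with the row's values substituted for any column references. But the expressions in the select list do not have to reference any columns in the table expression of the FROM clause; they can be constant arithmetic expressions, for instance.

7.3.2. Column Labels

The entries in the select list can be assigned names for subsequent processing, such as for use in an ORDER BY clause or for display by the client application. For example: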
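A sketch of labelled output columns; table1 is a hypothetical table:

```sql
CREATE TEMP TABLE table1 (a integer, b integer, c integer);  -- hypothetical

-- Output columns are named "value" and "sum" instead of default names.
SELECT a AS value, b + c AS sum FROM table1;
```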

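If no output column name is specified using AS, the system assigns a default column name. For simple column references, this is the name of the referenced column. For function calls, this is the name of the function. For complex expressions, the system will generate a generic name.

The AS keyword is optional, but only if the new column name does not match any PostgreSQL keyword. To avoid an accidental match to a keyword, you can double-quote the column name. For example, VALUE is a keyword, so this does not work: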
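The failing statement is not shown in this copy. It presumably has this shape (table1 is a hypothetical table); according to the text, the PostgreSQL version described here rejects it:

```sql
CREATE TEMP TABLE table1 (a integer, b integer, c integer);  -- hypothetical

-- "value" is used as a label without AS and without quotes.
SELECT a value, b + c AS sum FROM table1;
```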

but this does:
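A sketch of the double-quoted form, with table1 again a hypothetical table:

```sql
CREATE TEMP TABLE table1 (a integer, b integer, c integer);  -- hypothetical

-- Double quotes make "value" an ordinary identifier, so AS may be omitted.
SELECT a "value", b + c AS sum FROM table1;
```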

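For protection against possible future keyword additions, it is recommended that you always either write AS or double-quote the output column name.

Note
The naming of output columns here is different from that done in the FROM clause (see Section 7.2.1.2). It is possible to rename the same column twice, but the name assigned in the select list is the one that will be passed on.

7.3.3. DISTINCT

After the select list has been processed, the result table can optionally be subject to the elimination of duplicate rows. The DISTINCT key word is written directly after SELECT to specify this: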
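The synopsis is missing here; based on the sentence above, its form is presumably:

```text
SELECT DISTINCT select_list ...
```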

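(Instead of DISTINCT the key word ALL can be used to specify the default behavior of retaining all rows.)

Obviously, two rows are considered distinct if they differ in at least one column value. Null values are considered equal in this comparison.

Alternatively, an arbitrary expression can determine what rows are to be considered distinct: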
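The synopsis is missing here; judging by the DISTINCT ON clause discussed next, it presumably reads:

```text
SELECT DISTINCT ON (expression [, expression ...]) select_list ...
```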

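Here expression is an arbitrary value expression that is evaluated for all rows. A set of rows for which all the expressions are equal are considered duplicates, and only the first row of the set is kept in the output. Note that the "first row" of a set is unpredictable unless the query is sorted on enough columns to guarantee a unique ordering of the rows arriving at the DISTINCT filter. (DISTINCT ON processing occurs after ORDER BY sorting.)

The DISTINCT ON clause is not part of the SQL standard and is sometimes considered bad style because of the potentially indeterminate nature of its results. With judicious use of GROUP BY and subqueries in FROM, this construct can be avoided, but it is often the most convenient alternative.

7.4. Combining Queries

The results of two queries can be combined using the set operations union, intersection, and difference. The syntax is: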

query1 UNION [ALL] query2
query1 INTERSECT [ALL] query2
query1 EXCEPT [ALL] query2

query1 and query2 are queries that can use any of the features discussed up to this point. Set operations can also be nested and chained, for example:

query1 UNION query2 UNION query3

which is executed as:

(query1 UNION query2) UNION query3

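UNION effectively appends the result of query2 to the result of query1 (although there is no guarantee that this is the order in which the rows are actually returned). Furthermore, it eliminates duplicate rows from its result, in the same way as DISTINCT, unless UNION ALL is used.

INTERSECT returns all rows that are both in the result of query1 and in the result of query2. Duplicate rows are eliminated unless INTERSECT ALL is used.

EXCEPT returns all rows that are in the result of query1 but not in the result of query2. (This is sometimes called the difference between two queries.) Again, duplicates are eliminated unless EXCEPT ALL is used.

In order to calculate the union, intersection, or difference of two queries, the two queries must be "union compatible", which means that they return the same number of columns and the corresponding columns have compatible data types, as described in Section 10.5.

7.5. Sorting Rows

After a query has produced an output table (after the select list has been processed) it can optionally be sorted. If sorting is not chosen, the rows will be returned in an unspecified order. The actual order in that case will depend on the scan and join plan types and the order on disk, but it must not be relied on. A particular output ordering can only be guaranteed if the sort step is explicitly chosen.

The ORDER BY clause specifies the sort order: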
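The synopsis is missing from this copy. Assembled from the options described in the paragraphs below (ASC, DESC, NULLS FIRST, NULLS LAST), it presumably reads:

```text
SELECT select_list
    FROM table_expression
    ORDER BY sort_expression1 [ASC | DESC] [NULLS { FIRST | LAST }]
             [, sort_expression2 [ASC | DESC] [NULLS { FIRST | LAST }] ...]
```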

The sort expression(s) can be any expression that would be valid in the query's select list. An example is:
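A sketch of such a query; table1 is a hypothetical table:

```sql
CREATE TEMP TABLE table1 (a integer, b integer, c integer);  -- hypothetical

-- Sort by the computed value a + b first, then by c to break ties.
SELECT a, b FROM table1 ORDER BY a + b, c;
```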

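When more than one expression is specified, the later values are used to sort rows that are equal according to the earlier values. Each expression can be followed by an optional ASC or DESC keyword to set the sort direction to ascending or descending. ASC order is the default. Ascending order puts smaller values first, where "smaller" is defined in terms of the < operator. Similarly, descending order is determined with the > operator.

The NULLS FIRST and NULLS LAST options can be used to determine whether nulls appear before or after non-null values in the sort ordering. By default, null values sort as if larger than any non-null value; that is, NULLS FIRST is the default for DESC order, and NULLS LAST otherwise.

Note that the ordering options are considered independently for each sort column. For example ORDER BY x, y DESC means ORDER BY x ASC, y DESC, which is not the same as ORDER BY x DESC, y DESC.

A sort expression can also be the column label or number of an output column, as in: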
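The two queries referred to are missing from this copy. A sketch of both forms, using a hypothetical table1:

```sql
CREATE TEMP TABLE table1 (a integer, b integer, c integer);  -- hypothetical

-- Sort by an output column label.
SELECT a + b AS sum, c FROM table1 ORDER BY sum;

-- Sort by an output column number (1 is the first output column).
SELECT a, max(b) FROM table1 GROUP BY a ORDER BY 1;
```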

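both of which sort by the first output column. Note that an output column name has to stand alone, that is, it cannot be used in an expression. For example, this is not correct: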
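A sketch of the rejected form, with a hypothetical table1. Here sum is only an output label, so it cannot appear inside the expression sum + c:

```sql
CREATE TEMP TABLE table1 (a integer, b integer, c integer);  -- hypothetical

-- Wrong: the output column name "sum" is used inside an expression.
SELECT a + b AS sum, c FROM table1 ORDER BY sum + c;
```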

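This restriction is made to reduce ambiguity. There is still ambiguity if an ORDER BY item is a simple name that could match either an output column name or a column from the table expression. The output column is used in such cases. This would only cause confusion if you use AS to rename an output column to match some other table column's name.

ORDER BY can be applied to the result of a UNION, INTERSECT, or EXCEPT combination, but in this case it is only permitted to sort by output column names or numbers, not by expressions.

7.6. LIMIT and OFFSET

LIMIT and OFFSET allow you to retrieve just a portion of the rows that are generated by the rest of the query: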

SELECT select_list
    FROM table_expression
    [ ORDER BY ... ]
    [ LIMIT { number | ALL } ] [ OFFSET number]

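If a limit count is given, no more than that many rows will be returned (but possibly fewer, if the query itself yields fewer rows). LIMIT ALL is the same as omitting the LIMIT clause, as is LIMIT with a NULL argument.

OFFSET says to skip that many rows before beginning to return rows. OFFSET 0 is the same as omitting the OFFSET clause, as is OFFSET with a NULL argument.

If both OFFSET and LIMIT appear, then OFFSET rows are skipped before starting to count the LIMIT rows that are returned.

When using LIMIT, it is important to use an ORDER BY clause that constrains the result rows into a unique order. Otherwise you will get an unpredictable subset of the query's rows. You might be asking for the tenth through twentieth rows, but tenth through twentieth in what ordering? The ordering is unknown unless you specify ORDER BY.

The query optimizer takes LIMIT into account when generating query plans, so you are very likely to get different plans (yielding different row orders) depending on what you give for LIMIT and OFFSET. Thus, using different LIMIT/OFFSET values to select different subsets of a query result will give inconsistent results unless you enforce a predictable result ordering with ORDER BY. This is not a bug; it is an inherent consequence of the fact that SQL does not promise to deliver the results of a query in any particular order unless ORDER BY is used to constrain the order.

The rows skipped by an OFFSET clause still have to be computed inside the server; therefore a large OFFSET might be inefficient.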
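As an illustration of the points above, the following sketch pages through the products table from Section 6.1. The page size of 10 and the choice of product_no as the sort key are assumptions made for the example:

```sql
-- Skip the first 20 rows in product_no order, then return the next 10.
-- The ORDER BY makes the subset predictable from one run to the next.
SELECT product_no, name, price
    FROM products
    ORDER BY product_no
    LIMIT 10 OFFSET 20;
```

The ordering is only fully deterministic if product_no is unique; otherwise additional sort columns would be needed.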

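7.7. VALUES Lists

VALUES provides a way to generate a "constant table" that can be used in a query without having to actually create and populate a table on disk. The syntax is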

VALUES ( expression [, ...] ) [, ...]

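Each parenthesized list of expressions generates a row in the table. The lists must all have the same number of elements (i.e., the number of columns in the table), and corresponding entries in each list must have compatible data types. The actual data type assigned to each column of the result is determined using the same rules as for UNION (see Section 10.5).

As an example: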

VALUES (1, 'one'), (2, 'two'), (3, 'three');

will return a table of two columns and three rows. It is effectively equivalent to:

SELECT 1 AS column1, 'one' AS column2
UNION ALL
SELECT 2, 'two'
UNION ALL
SELECT 3, 'three';

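By default, PostgreSQL assigns the names column1, column2, etc. to the columns of a VALUES table. The column names are not specified by the SQL standard and different database systems do it differently, so it is usually better to override the default names with a table alias list, like this: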

=> SELECT * FROM (VALUES (1, 'one'), (2, 'two'), (3, 'three')) AS t (num,letter);
 num | letter
-----+--------
   1 | one
   2 | two
   3 | three
(3 rows)

Syntactically, VALUES followed by expression lists is treated as equivalent to:

SELECT select_list FROM table_expression

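and can appear anywhere a SELECT can. For example, you can use it as part of a UNION, or attach a sort specification (ORDER BY, LIMIT, and/or OFFSET) to it. VALUES is most commonly used as the data source in an INSERT command, and next most commonly as a subquery.

For more information see VALUES.

8.2. Monetary Types

The money type stores a currency amount with a fixed fractional precision; see Table 8.3. The fractional precision is determined by the database's lc_monetary setting. The range shown in the table assumes there are two fractional digits. Input is accepted in a variety of formats, including integer and floating-point literals, as well as typical currency formatting, such as '$1,000.00'. Output is generally in the latter form but depends on the locale.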

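Table 8.3. Monetary Types

 Name  | Storage Size | Description     | Range
-------+--------------+-----------------+------------------------------------------------
 money | 8 bytes      | currency amount | -92233720368547758.08 to +92233720368547758.07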

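Since the output of this data type is locale-sensitive, it might not work to load money data into a database that has a different setting of lc_monetary. To avoid problems, before restoring a dump into a new database make sure lc_monetary has the same or an equivalent value as in the database that was dumped.

Values of the numeric, int, and bigint data types can be cast to money. Conversion from the real and double precision data types can be done by casting to numeric first, for example: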

SELECT '12.34'::float8::numeric::money;

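However, this is not recommended. Floating point numbers should not be used to handle money due to the potential for rounding errors.

A money value can be cast to numeric without loss of precision. Conversion to other types could potentially lose precision, and must also be done in two stages: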

SELECT '52093.89'::money::numeric::float8;

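When a money value is divided by another money value, the result is double precision (i.e., a pure number, not money); the currency units cancel each other out in the division.

8.3. Character Types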

Table 8.4. Character Types

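Table 8.4 shows the general-purpose character types available in PostgreSQL.

SQL defines two primary character types: character varying(n) and character(n), where n is a positive integer. Both of these types can store strings up to n characters (not bytes) in length. An attempt to store a longer string into a column of these types will result in an error, unless the excess characters are all spaces, in which case the string will be truncated to the maximum length. (This somewhat bizarre exception is required by the SQL standard.) If the string to be stored is shorter than the declared length, values of type character will be space-padded; values of type character varying will simply store the shorter string.

If one explicitly casts a value to character varying(n) or character(n), then an over-length value will be truncated to n characters without raising an error. (This too is required by the SQL standard.)

The notations varchar(n) and char(n) are aliases for character varying(n) and character(n), respectively. character without a length specifier is equivalent to character(1). If character varying is used without a length specifier, the type accepts strings of any size. The latter is a PostgreSQL extension.

In addition, PostgreSQL provides the text type, which stores strings of any length. Although the type text is not in the SQL standard, several other SQL database management systems have it as well.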

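Values of type character are physically padded with spaces to the specified width n, and are stored and displayed that way. However, trailing spaces are treated as semantically insignificant and disregarded when comparing two values of type character. In collations where whitespace is significant, this behavior can produce unexpected results; for example SELECT 'a '::CHAR(2) collate "C" < E'a\n'::CHAR(2) returns true, even though the C locale would consider a space to be greater than a newline. Trailing spaces are removed when converting a character value to one of the other string types. Note that trailing spaces are semantically significant in character varying and text values, and when using pattern matching, that is, LIKE and regular expressions.

The storage requirement for a short string (up to 126 bytes) is 1 byte plus the actual string, which includes the space padding in the case of character. Longer strings have 4 bytes of overhead instead of 1. Long strings are compressed by the system automatically, so the physical requirement on disk might be less. Very long values are also stored in background tables so that they do not interfere with rapid access to shorter column values. In any case, the longest possible character string that can be stored is about 1 GB. (The maximum value that will be allowed for n in the data type declaration is less than that. It would not be useful to change this because with multibyte character encodings the number of characters and bytes can be quite different. If you want to store long strings with no specific upper limit, use text or character varying without a length specifier, rather than making up an arbitrary length limit.)

Tip

There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used instead.

For the syntax of string literals, see the section on string constants; for the available operators and functions, see the chapter on functions and operators. The database character set determines the character set used to store textual values; for more information, see the section on character set support.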

Example 8.1. Using the Character Types
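The body of this example is missing from this copy. The following sketch illustrates the padding and truncation rules described above; the table names char_demo and varchar_demo are hypothetical:

```sql
CREATE TABLE char_demo (a character(4));
INSERT INTO char_demo VALUES ('ok');
-- The value is stored padded to 4 characters, but char_length
-- ignores the trailing padding and reports 2.
SELECT a, char_length(a) FROM char_demo;

CREATE TABLE varchar_demo (b varchar(5));
INSERT INTO varchar_demo VALUES ('ok');
-- The excess characters are all spaces, so the value is silently truncated.
INSERT INTO varchar_demo VALUES ('good      ');
-- An explicit cast truncates to 5 characters without raising an error.
INSERT INTO varchar_demo VALUES ('too long'::varchar(5));
SELECT b, char_length(b) FROM varchar_demo;
```

Inserting 'too long' without the cast would instead fail, because the value is longer than the declared length and the excess is not whitespace.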

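There are two other fixed-length character types in PostgreSQL, shown in Table 8.5. The name type exists only for the storage of identifiers in the internal system catalogs and is not intended for use by the general user. Its length is currently defined as 64 bytes (63 usable characters plus terminator) but should be referenced using the constant NAMEDATALEN in C source code. The length is set at compile time (and is therefore adjustable for special uses); the default maximum length might change in a future release. The type "char" (note the quotes) is different from char(1) in that it only uses one byte of storage. It is used internally in the system catalogs as a simplistic enumeration type.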

Table 8.5. Special Character Types

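8.6. Boolean Type

PostgreSQL provides the standard SQL type boolean; see Table 8-19. The boolean type can have several states: "true", "false", and a third state, "unknown", which is represented by the SQL null value.

Table 8-19. Boolean Data Type

Valid literal values for the "true" state are:

For the "false" state, the following values can be used:

Leading or trailing whitespace is ignored, and case does not matter. The key words TRUE and FALSE are the preferred (SQL-compliant) usage.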
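The lists of accepted spellings are missing from this copy. The following sketch casts a few string literals that PostgreSQL accepts for each state; it is a sample, not the complete list:

```sql
-- Each of these casts yields true.
SELECT 't'::boolean, 'true'::boolean, 'yes'::boolean, 'on'::boolean, '1'::boolean;

-- Each of these casts yields false.
SELECT 'f'::boolean, 'false'::boolean, 'no'::boolean, 'off'::boolean, '0'::boolean;

-- Surrounding whitespace and letter case are ignored.
SELECT '  TRUE '::boolean;
```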

Example 8-2 shows that boolean values are output using the letters t and f.

Example 8-2. Using the boolean Type
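The body of this example is missing. A minimal sketch of what it describes; the table name bool_demo and the sample strings are hypothetical:

```sql
CREATE TABLE bool_demo (a boolean, b text);
INSERT INTO bool_demo VALUES (TRUE, 'sic est');
INSERT INTO bool_demo VALUES (FALSE, 'non est');

-- Column a is displayed as t or f.
SELECT * FROM bool_demo;

-- A boolean column can be used directly as a condition.
SELECT * FROM bool_demo WHERE a;
```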

8.7. Enumerated Types

8.9. Network Address Types

8.10. Bit String Types

Bit strings are strings of 1's and 0's. They can be used to store or visualize bit masks. There are two SQL bit types: bit(n) and bit varying(n), where n is a positive integer.

bit type data must match the length n exactly; it is an error to attempt to store shorter or longer bit strings. bit varying data is of variable length up to the maximum length n; longer strings will be rejected. Writing bit without a length is equivalent to bit(1), while bit varying without a length specification means unlimited length.

Note: If one explicitly casts a bit-string value to bit(n), it will be truncated or zero-padded on the right to be exactly n bits, without raising an error. Similarly, if one explicitly casts a bit-string value to bit varying(n), it will be truncated on the right if it is more than n bits.

Refer to the section on bit-string constants for information about their syntax. Bit-logical operators and string manipulation functions are available; see the chapter on functions and operators.

Example 8-3. Using the Bit String Types
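The body of this example is missing. The following sketch exercises the length rules described above; the table name bit_demo is hypothetical:

```sql
CREATE TABLE bit_demo (a BIT(3), b BIT VARYING(5));
INSERT INTO bit_demo VALUES (B'101', B'00');

-- B'10' has only 2 bits, so storing it into BIT(3) directly would fail.
-- An explicit cast zero-pads it on the right to B'100' instead.
INSERT INTO bit_demo VALUES (B'10'::bit(3), B'101');

SELECT * FROM bit_demo;
```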

A bit string value requires 1 byte for each group of 8 bits, plus 5 or 8 bytes overhead depending on the length of the string (but long values may be compressed or moved out-of-line, as explained in Section 8.3 for character strings).

8.11. Text Search Types

8.12. UUID Type

The data type uuid stores Universally Unique Identifiers (UUID) as defined by RFC 4122, ISO/IEC 9834-8:2005, and related standards. (Some systems refer to this data type as a globally unique identifier, or GUID, instead.) This identifier is a 128-bit quantity that is generated by an algorithm chosen to make it very unlikely that the same identifier will be generated by anyone else in the known universe using the same algorithm. Therefore, for distributed systems, these identifiers provide a better uniqueness guarantee than sequence generators, which are only unique within a single database.

A UUID is written as a sequence of lower-case hexadecimal digits, in several groups separated by hyphens, specifically a group of 8 digits followed by three groups of 4 digits followed by a group of 12 digits, for a total of 32 digits representing the 128 bits. An example of a UUID in this standard form is:

a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11

PostgreSQL also accepts the following alternative forms for input: use of upper-case digits, the standard format surrounded by braces, omitting some or all hyphens, adding a hyphen after any group of four digits. Examples are:

A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11
{a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11}
a0eebc999c0b4ef8bb6d6bb9bd380a11
a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11
{a0eebc99-9c0b4ef8-bb6d6bb9-bd380a11}

Output is always in the standard form.

PostgreSQL provides storage and comparison functions for UUIDs, but the core database does not include any function for generating UUIDs, because no single algorithm is well suited for every application. The uuid-ossp module provides functions that implement several standard algorithms. The pgcrypto module also provides a generation function for random UUIDs. Alternatively, UUIDs could be generated by client applications or other libraries invoked through a server-side function.

8.13. XML Type

8.19. pg_lsn Type

The pg_lsn data type can be used to store LSN (Log Sequence Number) data, which is a pointer to a location in the WAL. This type is a representation of XLogRecPtr and an internal system type of PostgreSQL.

Internally, an LSN is a 64-bit integer, representing a byte position in the write-ahead log stream. It is printed as two hexadecimal numbers of up to 8 digits each, separated by a slash; for example, 16/B374D848. The pg_lsn type supports the standard comparison operators, like = and >. Two LSNs can be subtracted using the - operator; the result is the number of bytes separating those write-ahead log locations.
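As a small worked example of comparison and subtraction, the following sketch uses the LSN quoted above together with a second, made-up LSN that lies 0x48 (72) bytes earlier:

```sql
-- The second operand is a hypothetical earlier WAL position.
SELECT '16/B374D848'::pg_lsn > '16/B374D800'::pg_lsn AS is_later,
       '16/B374D848'::pg_lsn - '16/B374D800'::pg_lsn AS bytes_apart;
-- is_later is true and bytes_apart is 72.
```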

8.20. Pseudo-Types

The PostgreSQL type system contains a number of special-purpose entries that are collectively called pseudo-types. A pseudo-type cannot be used as a column data type, but it can be used to declare a function's argument or result type. Each of the available pseudo-types is useful in situations where a function's behavior does not correspond to simply taking or returning a value of a specific SQL data type. Table 8.25 lists the existing pseudo-types.

Table 8.25. Pseudo-Types

Functions coded in C (whether built-in or dynamically loaded) can be declared to accept or return any of these pseudo data types. It is up to the function author to ensure that the function will behave safely when a pseudo-type is used as an argument type.

Functions coded in procedural languages can use pseudo-types only as allowed by their implementation languages. At present most procedural languages forbid use of a pseudo-type as an argument type, and allow only void and record as a result type (plus trigger or event_trigger when the function is used as a trigger or event trigger). Some also support polymorphic functions using the types anyelement, anyarray, anynonarray, anyenum, and anyrange.

The internal pseudo-type is used to declare functions that are meant only to be called internally by the database system, and not by direct invocation in an SQL query. If a function has at least one internal-type argument then it cannot be called from SQL. To preserve the type safety of this restriction it is important to follow this coding rule: do not create any function that is declared to return internal unless it has at least one internal argument.

9. Functions and Operators

PostgreSQL provides a large number of functions and operators for the built-in data types. Users can also define their own functions and operators. The psql commands \df and \do can be used to list all available functions and operators, respectively.

If you are concerned about portability, note that most of the functions and operators described in this chapter, with the exception of the most trivial arithmetic and comparison operators and some explicitly marked functions, are not specified by the SQL standard. Some of this extended functionality is present in other SQL database management systems, and in many cases it is compatible and consistent between the various implementations. This chapter is probably not complete; additional functions appear in other relevant sections of the manual.

8.16. Composite Types

A composite type represents the structure of a row or record; it is essentially just a list of field names and their data types. PostgreSQL allows composite types to be used in many of the same ways that simple types can be used. For example, a column of a table can be declared to be of a composite type.

8.16.1. Declaration of Composite Types

Here are two simple examples of defining composite types:

CREATE TYPE complex AS (
    r       double precision,
    i       double precision
);

CREATE TYPE inventory_item AS (
    name            text,
    supplier_id     integer,
    price           numeric
);

The syntax is comparable to CREATE TABLE, except that only field names and types can be specified; no constraints (such as NOT NULL) can presently be included. Note that the AS keyword is essential; without it, the system will think a different kind of CREATE TYPE command is meant, and you will get odd syntax errors.

Having defined the types, we can use them to create tables:

CREATE TABLE on_hand (
    item      inventory_item,
    count     integer
);

INSERT INTO on_hand VALUES (ROW('fuzzy dice', 42, 1.99), 1000);

or functions:

CREATE FUNCTION price_extension(inventory_item, integer) RETURNS numeric
AS 'SELECT $1.price * $2' LANGUAGE SQL;

SELECT price_extension(item, 10) FROM on_hand;

Whenever you create a table, a composite type is also automatically created, with the same name as the table, to represent the table's row type. For example, had we said:

CREATE TABLE inventory_item (
    name            text,
    supplier_id     integer REFERENCES suppliers,
    price           numeric CHECK (price > 0)
);

then the same inventory_item composite type shown above would come into being as a byproduct, and could be used just as above. Note however an important restriction of the current implementation: since no constraints are associated with a composite type, the constraints shown in the table definition do not apply to values of the composite type outside the table. (A partial workaround is to use domain types as members of composite types.)

8.16.2. Constructing Composite Values

To write a composite value as a literal constant, enclose the field values within parentheses and separate them by commas. You can put double quotes around any field value, and must do so if it contains commas or parentheses. (More details appear below.) Thus, the general format of a composite constant is the following:

'( val1 , val2 , ... )'

An example is:

'("fuzzy dice",42,1.99)'

which would be a valid value of the inventory_item type defined above. To make a field be NULL, write no characters at all in its position in the list. For example, this constant specifies a NULL third field:

'("fuzzy dice",42,)'

If you want an empty string rather than NULL, write double quotes:

'("",42,)'

Here the first field is a non-NULL empty string, the third is NULL.

(These constants are actually only a special case of the generic type constants discussed in Section 4.1.2.7. The constant is initially treated as a string and passed to the composite-type input conversion routine. An explicit type specification might be necessary to tell which type to convert the constant to.)

The ROW expression syntax can also be used to construct composite values. In most cases this is considerably simpler to use than the string-literal syntax since you don't have to worry about multiple layers of quoting. We already used this method above:

ROW('fuzzy dice', 42, 1.99)
ROW('', 42, NULL)

The ROW keyword is actually optional as long as you have more than one field in the expression, so these can be simplified to:

('fuzzy dice', 42, 1.99)
('', 42, NULL)

The ROW expression syntax is discussed in more detail in Section 4.2.13.

8.16.3. Accessing Composite Types

To access a field of a composite column, one writes a dot and the field name, much like selecting a field from a table name. In fact, it's so much like selecting from a table name that you often have to use parentheses to keep from confusing the parser. For example, you might try to select some subfields from our on_hand example table with something like:

SELECT item.name FROM on_hand WHERE item.price > 9.99;

This will not work since the name item is taken to be a table name, not a column name of on_hand, per SQL syntax rules. You must write it like this:

SELECT (item).name FROM on_hand WHERE (item).price > 9.99;

or if you need to use the table name as well (for instance in a multitable query), like this:

SELECT (on_hand.item).name FROM on_hand WHERE (on_hand.item).price > 9.99;

Now the parenthesized object is correctly interpreted as a reference to the item column, and then the subfield can be selected from it.

Similar syntactic issues apply whenever you select a field from a composite value. For instance, to select just one field from the result of a function that returns a composite value, you'd need to write something like:

SELECT (my_func(...)).field FROM ...

Without the extra parentheses, this will generate a syntax error.

The special field name * means "all fields", as further explained in Section 8.16.5.

8.16.4. Modifying Composite Types

Here are some examples of the proper syntax for inserting and updating composite columns. First, inserting or updating a whole column:

INSERT INTO mytab (complex_col) VALUES((1.1,2.2));

UPDATE mytab SET complex_col = ROW(1.1,2.2) WHERE ...;

The first example omits ROW, the second uses it; we could have done it either way.

We can update an individual subfield of a composite column:

UPDATE mytab SET complex_col.r = (complex_col).r + 1 WHERE ...;

Notice here that we don't need to (and indeed cannot) put parentheses around the column name appearing just afterSET, but we do need parentheses when referencing the same column in the expression to the right of the equal sign.

And we can specify subfields as targets for INSERT, too:

INSERT INTO mytab (complex_col.r, complex_col.i) VALUES(1.1, 2.2);

Had we not supplied values for all the subfields of the column, the remaining subfields would have been filled with null values.

8.16.5. Using Composite Types in Queries

There are various special syntax rules and behaviors associated with composite types in queries. These rules provide useful shortcuts, but can be confusing if you don't know the logic behind them.

In PostgreSQL, a reference to a table name (or alias) in a query is effectively a reference to the composite value of the table's current row. For example, if we had a table inventory_item as shown above, we could write:

SELECT c FROM inventory_item c;

This query produces a single composite-valued column, so we might get output like:

           c
------------------------
 ("fuzzy dice",42,1.99)
(1 row)

Note however that simple names are matched to column names before table names, so this example works only because there is no column named c in the query's tables.

The ordinary qualified-column-name syntax table_name.column_name can be understood as applying field selection to the composite value of the table's current row. (For efficiency reasons, it's not actually implemented that way.)

When we write

SELECT c.* FROM inventory_item c;

then, according to the SQL standard, we should get the contents of the table expanded into separate columns:

    name    | supplier_id | price
------------+-------------+-------
 fuzzy dice |          42 |  1.99
(1 row)

as if the query were

SELECT c.name, c.supplier_id, c.price FROM inventory_item c;

PostgreSQL will apply this expansion behavior to any composite-valued expression, although as shown above, you need to write parentheses around the value that .* is applied to whenever it's not a simple table name. For example, if myfunc() is a function returning a composite type with columns a, b, and c, then these two queries have the same result:

SELECT (myfunc(x)).* FROM some_table;
SELECT (myfunc(x)).a, (myfunc(x)).b, (myfunc(x)).c FROM some_table;

Tip

PostgreSQL handles column expansion by actually transforming the first form into the second. So, in this example, myfunc() would get invoked three times per row with either syntax. If it's an expensive function you may wish to avoid that, which you can do with a query like:

SELECT (m).* FROM (SELECT myfunc(x) AS m FROM some_table OFFSET 0) ss;

The OFFSET 0 clause keeps the optimizer from "flattening" the sub-select to arrive at the form with multiple calls of myfunc().

The composite_value.* syntax results in column expansion of this kind when it appears at the top level of a SELECT output list, a RETURNING list in INSERT/UPDATE/DELETE, a VALUES clause, or a row constructor. In all other contexts (including when nested inside one of those constructs), attaching .* to a composite value does not change the value, since it means "all columns" and so the same composite value is produced again. For example, if somefunc() accepts a composite-valued argument, these queries are the same:

SELECT somefunc(c.*) FROM inventory_item c;
SELECT somefunc(c) FROM inventory_item c;

In both cases, the current row of inventory_item is passed to the function as a single composite-valued argument. Even though .* does nothing in such cases, using it is good style, since it makes clear that a composite value is intended. In particular, the parser will consider c in c.* to refer to a table name or alias, not to a column name, so that there is no ambiguity; whereas without .*, it is not clear whether c means a table name or a column name, and in fact the column-name interpretation will be preferred if there is a column named c.

Another example demonstrating these concepts is that all these queries mean the same thing:

SELECT * FROM inventory_item c ORDER BY c;
SELECT * FROM inventory_item c ORDER BY c.*;
SELECT * FROM inventory_item c ORDER BY ROW(c.*);

All of these ORDER BY clauses specify the row's composite value, resulting in sorting the rows according to the rules described in Section 9.23.6. However, if inventory_item contained a column named c, the first case would be different from the others, as it would mean to sort by that column only. Given the column names previously shown, these queries are also equivalent to those above:

SELECT * FROM inventory_item c ORDER BY ROW(c.name, c.supplier_id, c.price);
SELECT * FROM inventory_item c ORDER BY (c.name, c.supplier_id, c.price);

(The last case uses a row constructor with the key word ROW omitted.)

Another special syntactical behavior associated with composite values is that we can use functional notation for extracting a field of a composite value. The simple way to explain this is that the notations field(table) and table.field are interchangeable. For example, these queries are equivalent:

SELECT c.name FROM inventory_item c WHERE c.price > 1000;
SELECT name(c) FROM inventory_item c WHERE price(c) > 1000;

Moreover, if we have a function that accepts a single argument of a composite type, we can call it with either notation. These queries are all equivalent:

SELECT somefunc(c) FROM inventory_item c;
SELECT somefunc(c.*) FROM inventory_item c;
SELECT c.somefunc FROM inventory_item c;

This equivalence between functional notation and field notation makes it possible to use functions on composite types to implement "computed fields". An application using the last query above wouldn't need to be directly aware that somefunc isn't a real column of the table.

Tip

Because of this behavior, it's unwise to give a function that takes a single composite-type argument the same name as any of the fields of that composite type. If there is ambiguity, the field-name interpretation will be preferred, so that such a function could not be called without tricks. One way to force the function interpretation is to schema-qualify the function name, that is, write schema.func(compositevalue).

8.16.6. Composite Type Input and Output Syntax

The external text representation of a composite value consists of items that are interpreted according to the I/O conversion rules for the individual field types, plus decoration that indicates the composite structure. The decoration consists of parentheses (( and )) around the whole value, plus commas (,) between adjacent items. Whitespace outside the parentheses is ignored, but within the parentheses it is considered part of the field value, and might or might not be significant depending on the input conversion rules for the field data type. For example, in:

'(  42)'

the whitespace will be ignored if the field type is integer, but not if it is text.

As shown previously, when writing a composite value you can write double quotes around any individual field value. You must do so if the field value would otherwise confuse the composite-value parser. In particular, fields containing parentheses, commas, double quotes, or backslashes must be double-quoted. To put a double quote or backslash in a quoted composite field value, precede it with a backslash. (Also, a pair of double quotes within a double-quoted field value is taken to represent a double quote character, analogously to the rules for single quotes in SQL literal strings.) Alternatively, you can avoid quoting and use backslash-escaping to protect all data characters that would otherwise be taken as composite syntax.

A completely empty field value (no characters at all between the commas or parentheses) represents a NULL. To write a value that is an empty string rather than NULL, write "".

The composite output routine will put double quotes around field values if they are empty strings or contain parentheses, commas, double quotes, backslashes, or white space. (Doing so for white space is not essential, but aids legibility.) Double quotes and backslashes embedded in field values will be doubled.

Note

Remember that what you write in an SQL command will first be interpreted as a string literal, and then as a composite. This doubles the number of backslashes you need (assuming escape string syntax is used). For example, to insert a text field containing a double quote and a backslash in a composite value, you'd need to write:

INSERT ... VALUES (E'("\\"\\\\")');

The string-literal processor removes one level of backslashes, so that what arrives at the composite-value parser looks like ("\"\\"). In turn, the string fed to the text data type's input routine becomes "\. (If we were working with a data type whose input routine also treated backslashes specially, bytea for example, we might need as many as eight backslashes in the command to get one backslash into the stored composite field.) Dollar quoting (see Section 4.1.2.4) can be used to avoid the need to double backslashes.

Tip

The ROW constructor syntax is usually easier to work with than the composite-literal syntax when writing composite values in SQL commands. In ROW, individual field values are written the same way they would be written when not members of a composite.

8.14. JSON Types

JSON data types are for storing JSON (JavaScript Object Notation) data, as specified in RFC 7159. Such data can also be stored as text, but the JSON data types have the advantage of enforcing that each stored value is valid according to the JSON rules. There are also assorted JSON-specific functions and operators available for data stored in these data types; see Section 9.15.

There are two JSON data types: json and jsonb. They accept almost identical sets of values as input. The major practical difference is one of efficiency. The json data type stores an exact copy of the input text, which processing functions must reparse on each execution; while jsonb data is stored in a decomposed binary format that makes it slightly slower to input due to added conversion overhead, but significantly faster to process, since no reparsing is needed. jsonb also supports indexing, which can be a significant advantage.

Because the json type stores an exact copy of the input text, it will preserve semantically-insignificant white space between tokens, as well as the order of keys within JSON objects. Also, if a JSON object within the value contains the same key more than once, all the key/value pairs are kept. (The processing functions consider the last value as the operative one.) By contrast, jsonb does not preserve white space, does not preserve the order of object keys, and does not keep duplicate object keys. If duplicate keys are specified in the input, only the last value is kept.
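The difference can be seen by casting the same input text to both types. This is a small sketch; the sample document is made up for illustration:

```sql
-- The input has odd spacing, keys out of order, and a duplicate key "a".
SELECT '{"b":   1, "a": 2, "a": 3}'::json  AS as_json,
       '{"b":   1, "a": 2, "a": 3}'::jsonb AS as_jsonb;
-- as_json echoes the input text exactly as written.
-- as_jsonb is normalized to {"a": 3, "b": 1}: only the last "a" survives.
```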

In general, most applications should prefer to store JSON data as jsonb, unless there are quite specialized needs, such as legacy assumptions about ordering of object keys.

PostgreSQL allows only one character set encoding per database. It is therefore not possible for the JSON types to conform rigidly to the JSON specification unless the database encoding is UTF8. Attempts to directly include characters that cannot be represented in the database encoding will fail; conversely, characters that can be represented in the database encoding but not in UTF8 will be allowed.

RFC 7159 permits JSON strings to contain Unicode escape sequences denoted by \uXXXX. In the input function for the json type, Unicode escapes are allowed regardless of the database encoding, and are checked only for syntactic correctness (that is, that four hex digits follow \u). However, the input function for jsonb is stricter: it disallows Unicode escapes for non-ASCII characters (those above U+007F) unless the database encoding is UTF8. The jsonb type also rejects \u0000 (because that cannot be represented in PostgreSQL's text type), and it insists that any use of Unicode surrogate pairs to designate characters outside the Unicode Basic Multilingual Plane be correct. Valid Unicode escapes are converted to the equivalent ASCII or UTF8 character for storage; this includes folding surrogate pairs into a single character.

Note

Many of the JSON processing functions described in Section 9.15 will convert Unicode escapes to regular characters, and will therefore throw the same types of errors just described even if their input is of type json not jsonb. The fact that the json input function does not make these checks may be considered a historical artifact, although it does allow for simple storage (without processing) of JSON Unicode escapes in a non-UTF8 database encoding. In general, it is best to avoid mixing Unicode escapes in JSON with a non-UTF8 database encoding, if possible.
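As a rough illustration of the escape handling described above, the following sketch contrasts the two input functions. It assumes standard_conforming_strings is on, so the backslash reaches the JSON parser unchanged; the exact error text is not reproduced here.

```sql
-- json checks only that four hex digits follow \u, so this is accepted:
SELECT '"\u0000"'::json;

-- jsonb has to convert the escape into a character that text can hold,
-- so the same input is rejected with an error:
SELECT '"\u0000"'::jsonb;

-- Extracting the value as text performs the same conversion, so this is
-- expected to fail too, even though the input is of type json:
SELECT '{"k": "\u0000"}'::json ->> 'k';
```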

When converting textual JSON input into jsonb, the primitive types described by RFC 7159 are effectively mapped onto native PostgreSQL types, as shown in Table 8.23. Therefore, there are some minor additional constraints on what constitutes valid jsonb data that do not apply to the json type, nor to JSON in the abstract, corresponding to limits on what can be represented by the underlying data type. Notably, jsonb will reject numbers that are outside the range of the PostgreSQL numeric data type, while json will not. Such implementation-defined restrictions are permitted by RFC 7159. However, in practice such problems are far more likely to occur in other implementations, as it is common to represent JSON's number primitive type as IEEE 754 double precision floating point (which RFC 7159 explicitly anticipates and allows for). When using JSON as an interchange format with such systems, the danger of losing numeric precision compared to data originally stored by PostgreSQL should be considered.

Conversely, as noted in the table there are some minor restrictions on the input format of JSON primitive types that do not apply to the corresponding PostgreSQL types.

Table 8.23. JSON primitive types and corresponding PostgreSQL types

| JSON primitive type | PostgreSQL type | Notes |
| --- | --- | --- |
| string | text | \u0000 is disallowed, as are non-ASCII Unicode escapes if database encoding is not UTF8 |
| number | numeric | NaN and infinity values are disallowed |
| boolean | boolean | Only lowercase true and false spellings are accepted |
| null | (none) | SQL NULL is a different concept |
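The restrictions in the Notes column can be checked with a few one-line statements. This is a minimal sketch; each statement is meant to be run on its own, since the ones marked ERROR abort.

```sql
SELECT 'true'::jsonb;        -- accepted
SELECT 'TRUE'::jsonb;        -- ERROR: only the lowercase spelling is valid JSON
SELECT 'NaN'::numeric;       -- accepted by the numeric type itself
SELECT 'NaN'::jsonb;         -- ERROR: NaN is not a JSON number

-- JSON null is an ordinary jsonb value, not SQL NULL:
SELECT 'null'::jsonb IS NULL;                    -- false
-- Extracting it as text does yield SQL NULL:
SELECT ('{"a": null}'::jsonb ->> 'a') IS NULL;   -- true
```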

8.14.1. JSON Input and Output Syntax

The input/output syntax for the JSON data types is as specified in RFC 7159.

The following are all valid json (or jsonb) expressions:

-- Simple scalar/primitive value
-- Primitive values can be numbers, quoted strings, true, false, or null
SELECT '5'::json;

-- Array of zero or more elements (elements need not be of same type)
SELECT '[1, 2, "foo", null]'::json;

-- Object containing pairs of keys and values
-- Note that object keys must always be quoted strings
SELECT '{"bar": "baz", "balance": 7.77, "active": false}'::json;

-- Arrays and objects can be nested arbitrarily
SELECT '{"foo": [true, "bar"], "tags": {"a": 1, "b": null}}'::json;

As previously stated, when a JSON value is input and then printed without any additional processing, json outputs the same text that was input, while jsonb does not preserve semantically-insignificant details such as whitespace. For example, note the differences here:

SELECT '{"bar": "baz", "balance": 7.77, "active":false}'::json;
                      json                       
-------------------------------------------------
 {"bar": "baz", "balance": 7.77, "active":false}
(1 row)

SELECT '{"bar": "baz", "balance": 7.77, "active":false}'::jsonb;
                      jsonb                       
--------------------------------------------------
 {"bar": "baz", "active": false, "balance": 7.77}
(1 row)

One semantically-insignificant detail worth noting is that in jsonb, numbers will be printed according to the behavior of the underlying numeric type. In practice this means that numbers entered with E notation will be printed without it, for example:

SELECT '{"reading": 1.230e-5}'::json, '{"reading": 1.230e-5}'::jsonb;
         json          |          jsonb          
-----------------------+-------------------------
 {"reading": 1.230e-5} | {"reading": 0.00001230}
(1 row)

However, jsonb will preserve trailing fractional zeroes, as seen in this example, even though those are semantically insignificant for purposes such as equality checks.
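A short sketch of that last point: the trailing zero survives in the printed form, yet does not affect comparison.

```sql
SELECT '1.230'::jsonb;                    -- printed as 1.230
SELECT '1.230'::jsonb = '1.23'::jsonb;    -- true: compared as numeric values
```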

8.14.2. Designing JSON documents effectively

Representing data as JSON can be considerably more flexible than the traditional relational data model, which is compelling in environments where requirements are fluid. It is quite possible for both approaches to co-exist and complement each other within the same application. However, even for applications where maximal flexibility is desired, it is still recommended that JSON documents have a somewhat fixed structure. The structure is typically unenforced (though enforcing some business rules declaratively is possible), but having a predictable structure makes it easier to write queries that usefully summarize a set of “documents” (datums) in a table.

JSON data is subject to the same concurrency-control considerations as any other data type when stored in a table. Although storing large documents is practicable, keep in mind that any update acquires a row-level lock on the whole row. Consider limiting JSON documents to a manageable size in order to decrease lock contention among updating transactions. Ideally, JSON documents should each represent an atomic datum that business rules dictate cannot reasonably be further subdivided into smaller datums that could be modified independently.
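One way to enforce a few structural rules declaratively is with CHECK constraints. The following is only a sketch: the table orders, its columns, and the keys customer_id and items are hypothetical names chosen for illustration.

```sql
CREATE TABLE orders (
    id  bigserial PRIMARY KEY,
    doc jsonb NOT NULL,
    -- the document must be a JSON object, not an array or scalar
    CONSTRAINT doc_is_object    CHECK (jsonb_typeof(doc) = 'object'),
    -- a top-level customer_id key must be present
    CONSTRAINT doc_has_customer CHECK (doc ? 'customer_id'),
    -- if items is present it must be an array; when the key is missing the
    -- expression is NULL, which a CHECK constraint treats as satisfied
    CONSTRAINT items_is_array   CHECK (jsonb_typeof(doc -> 'items') = 'array')
);

INSERT INTO orders (doc) VALUES ('{"customer_id": 42, "items": []}');  -- accepted
INSERT INTO orders (doc) VALUES ('{"items": {}}');                     -- rejected
```

Keeping each order in its own small document, rather than one large document per customer, also follows the advice above about limiting lock contention.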

8.14.3. `jsonb` Containment and Existence

Testing _containment_ is an important capability of jsonb. There is no parallel set of facilities for the json type. Containment tests whether one jsonb document has contained within it another one. These examples return true except as noted:

-- Simple scalar/primitive values contain only the identical value:
SELECT '"foo"'::jsonb @> '"foo"'::jsonb;

-- The array on the right side is contained within the one on the left:
SELECT '[1, 2, 3]'::jsonb @> '[1, 3]'::jsonb;

-- Order of array elements is not significant, so this is also true:
SELECT '[1, 2, 3]'::jsonb @> '[3, 1]'::jsonb;

-- Duplicate array elements don't matter either:
SELECT '[1, 2, 3]'::jsonb @> '[1, 2, 2]'::jsonb;

-- The object with a single pair on the right side is contained
-- within the object on the left side:
SELECT '{"product": "PostgreSQL", "version": 9.4, "jsonb": true}'::jsonb @> '{"version": 9.4}'::jsonb;

-- The array on the right side is not considered contained within the
-- array on the left, even though a similar array is nested within it:
SELECT '[1, 2, [1, 3]]'::jsonb @> '[1, 3]'::jsonb;  -- yields false

-- But with a layer of nesting, it is contained:
SELECT '[1, 2, [1, 3]]'::jsonb @> '[[1, 3]]'::jsonb;

-- Similarly, containment is not reported here:
SELECT '{"foo": {"bar": "baz"}}'::jsonb @> '{"bar": "baz"}'::jsonb;  -- yields false

-- A top-level key and an empty object is contained:
SELECT '{"foo": {"bar": "baz"}}'::jsonb @> '{"foo": {}}'::jsonb;

The general principle is that the contained object must match the containing object as to structure and data contents, possibly after discarding some non-matching array elements or object key/value pairs from the containing object. But remember that the order of array elements is not significant when doing a containment match, and duplicate array elements are effectively considered only once.

As a special exception to the general principle that the structures must match, an array may contain a primitive value:

-- This array contains the primitive string value:
SELECT '["foo", "bar"]'::jsonb @> '"bar"'::jsonb;

-- This exception is not reciprocal -- non-containment is reported here:
SELECT '"bar"'::jsonb @> '["bar"]'::jsonb;  -- yields false

jsonb also has an _existence_ operator, which is a variation on the theme of containment: it tests whether a string (given as a text value) appears as an object key or array element at the top level of the jsonb value. These examples return true except as noted:

-- String exists as array element:
SELECT '["foo", "bar", "baz"]'::jsonb ? 'bar';

-- String exists as object key:
SELECT '{"foo": "bar"}'::jsonb ? 'foo';

-- Object values are not considered:
SELECT '{"foo": "bar"}'::jsonb ? 'bar';  -- yields false

-- As with containment, existence must match at the top level:
SELECT '{"foo": {"bar": "baz"}}'::jsonb ? 'bar'; -- yields false

-- A string is considered to exist if it matches a primitive JSON string:
SELECT '"foo"'::jsonb ? 'foo';

JSON objects are better suited than arrays for testing containment or existence when there are many keys or elements involved, because unlike arrays they are internally optimized for searching, and do not need to be searched linearly.

Tip

Because JSON containment is nested, an appropriate query can skip explicit selection of sub-objects. As an example, suppose that we have a doc column containing objects at the top level, with most objects containing tags fields that contain arrays of sub-objects. This query finds entries in which sub-objects containing both "term":"paris" and "term":"food" appear, while ignoring any such keys outside the tags array:

SELECT doc->'site_name' FROM websites
  WHERE doc @> '{"tags":[{"term":"paris"}, {"term":"food"}]}';

One could accomplish the same thing with, say,

SELECT doc->'site_name' FROM websites
  WHERE doc->'tags' @> '[{"term":"paris"}, {"term":"food"}]';

but that approach is less flexible, and often less efficient as well.

On the other hand, the JSON existence operator is not nested: it will only look for the specified key or array element at top level of the JSON value.
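The contrast with nested containment can be sketched as follows. The documents are made-up literals; ?| is the any-of-these-keys variant of the existence operator.

```sql
-- Existence only inspects the top level, so a nested key is not found:
SELECT '{"tags": {"paris": 1}}'::jsonb ? 'paris';             -- false

-- Selecting the sub-object first makes the key top-level again:
SELECT '{"tags": {"paris": 1}}'::jsonb -> 'tags' ? 'paris';   -- true

-- ?| tests whether any of the listed strings exists at the top level:
SELECT '{"a": 1, "b": 2}'::jsonb ?| array['b', 'z'];          -- true
```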

The various containment and existence operators, along with all other JSON operators and functions, are documented in Section 9.15.

8.14.4. `jsonb` Indexing

GIN indexes can be used to efficiently search for keys or key/value pairs occurring within a large number of jsonb documents (datums). Two GIN “operator classes” are provided, offering different performance and flexibility trade-offs.

The default GIN operator class for jsonb supports queries with the top-level key-exists operators ?, ?& and ?| and the path/value-exists operator @>. (For details of the semantics that these operators implement, see Table 9.44.) An example of creating an index with this operator class is:

CREATE INDEX idxgin ON api USING GIN (jdoc);

The non-default GIN operator class jsonb_path_ops supports indexing the @> operator only. An example of creating an index with this operator class is:

CREATE INDEX idxginp ON api USING GIN (jdoc jsonb_path_ops);

Consider the example of a table that stores JSON documents retrieved from a third-party web service, with a documented schema definition. A typical document is:

{
    "guid": "9c36adc1-7fb5-4d5b-83b4-90356a46061a",
    "name": "Angela Barton",
    "is_active": true,
    "company": "Magnafone",
    "address": "178 Howard Place, Gulf, Washington, 702",
    "registered": "2009-11-07T08:53:22 +08:00",
    "latitude": 19.793713,
    "longitude": 86.513373,
    "tags": [
        "enim",
        "aliquip",
        "qui"
    ]
}

We store these documents in a table named api, in a jsonb column named jdoc. If a GIN index is created on this column, queries like the following can make use of the index:

-- Find documents in which the key "company" has value "Magnafone"
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"company": "Magnafone"}';

However, the index could not be used for queries like the following, because though the operator ? is indexable, it is not applied directly to the indexed column jdoc:

-- Find documents in which the key "tags" contains key or array element "qui"
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc -> 'tags' ? 'qui';

Still, with appropriate use of expression indexes, the above query can use an index. If querying for particular items within the "tags" key is common, defining an index like this may be worthwhile:

CREATE INDEX idxgintags ON api USING GIN ((jdoc -> 'tags'));

Now, the WHERE clause jdoc -> 'tags' ? 'qui' will be recognized as an application of the indexable operator ? to the indexed expression jdoc -> 'tags'. (More information on expression indexes can be found in Section 11.7.)
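To check whether the planner actually picks the expression index, one can look at the query plan. The sketch below recreates a one-row version of the api table so that it is self-contained; on such a tiny table the planner would normally prefer a sequential scan, so enable_seqscan is switched off purely for demonstration. The plan output depends on the installation and is not shown.

```sql
CREATE TABLE IF NOT EXISTS api (jdoc jsonb);
INSERT INTO api VALUES
  ('{"guid": "9c36adc1-7fb5-4d5b-83b4-90356a46061a",
     "name": "Angela Barton",
     "tags": ["enim", "aliquip", "qui"]}');

CREATE INDEX IF NOT EXISTS idxgintags ON api USING GIN ((jdoc -> 'tags'));

-- Demonstration only: discourage sequential scans on the tiny table.
SET enable_seqscan = off;

EXPLAIN
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc -> 'tags' ? 'qui';

RESET enable_seqscan;
```

If the index is usable, the plan should mention a bitmap index scan on idxgintags.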

Another approach to querying is to exploit containment, for example:

-- Find documents in which the key "tags" contains array element "qui"
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"tags": ["qui"]}';

A simple GIN index on the jdoc column can support this query. But note that such an index will store copies of every key and value in the jdoc column, whereas the expression index of the previous example stores only data found under the tags key. While the simple-index approach is far more flexible (since it supports queries about any key), targeted expression indexes are likely to be smaller and faster to search than a simple index.

Although the jsonb_path_ops operator class supports only queries with the @> operator, it has notable performance advantages over the default operator class jsonb_ops. A jsonb_path_ops index is usually much smaller than a jsonb_ops index over the same data, and the specificity of searches is better, particularly when queries contain keys that appear frequently in the data. Therefore search operations typically perform better than with the default operator class.
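The size difference can be measured rather than assumed. This sketch builds both kinds of index on the api table used in this section (created here only if it does not already exist) and compares their on-disk sizes. The result depends entirely on the data, so none is shown.

```sql
CREATE TABLE IF NOT EXISTS api (jdoc jsonb);
CREATE INDEX IF NOT EXISTS idxgin  ON api USING GIN (jdoc);
CREATE INDEX IF NOT EXISTS idxginp ON api USING GIN (jdoc jsonb_path_ops);

SELECT pg_size_pretty(pg_relation_size('idxgin'))  AS jsonb_ops_size,
       pg_size_pretty(pg_relation_size('idxginp')) AS jsonb_path_ops_size;
```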

The technical difference between a jsonb_ops and a jsonb_path_ops GIN index is that the former creates independent index items for each key and value in the data, while the latter creates index items only for each value in the data. [6] Basically, each jsonb_path_ops index item is a hash of the value and the key(s) leading to it; for example to index {"foo": {"bar": "baz"}}, a single index item would be created incorporating all three of foo, bar, and baz into the hash value. Thus a containment query looking for this structure would result in an extremely specific index search; but there is no way at all to find out whether foo appears as a key. On the other hand, a jsonb_ops index would create three index items representing foo, bar, and baz separately; then to do the containment query, it would look for rows containing all three of these items. While GIN indexes can perform such an AND search fairly efficiently, it will still be less specific and slower than the equivalent jsonb_path_ops search, especially if there are a very large number of rows containing any single one of the three index items.

A disadvantage of the jsonb_path_ops approach is that it produces no index entries for JSON structures not containing any values, such as {"a": {}}. If a search for documents containing such a structure is requested, it will require a full-index scan, which is quite slow. jsonb_path_ops is therefore ill-suited for applications that often perform such searches.

jsonb also supports btree and hash indexes. These are usually useful only if it's important to check equality of complete JSON documents. The btree ordering for jsonb datums is seldom of great interest, but for completeness it is:

Object > Array > Boolean > Number > String > Null
Object with n pairs > object with n - 1 pairs
Array with n elements > array with n - 1 elements

Objects with equal numbers of pairs are compared in the order:

key-1, value-1, key-2 ...

Note that object keys are compared in their storage order; in particular, since shorter keys are stored before longer keys, this can lead to results that might be unintuitive, such as:

{ "aa": 1, "c": 1} > {"b": 1, "d": 1}

Similarly, arrays with equal numbers of elements are compared in the order:

element-1, element-2 ...

Primitive JSON values are compared using the same comparison rules as for the underlying PostgreSQL data type. Strings are compared using the default database collation.
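The ordering rules above can be spot-checked with direct comparisons. In this sketch the column aliases are illustrative; according to the rules just listed, every column is expected to come out true.

```sql
SELECT '{"a": 1}'::jsonb > '[1, 2, 3]'::jsonb  AS object_gt_array,
       '[1]'::jsonb      > 'true'::jsonb       AS array_gt_boolean,
       'true'::jsonb     > '100'::jsonb        AS boolean_gt_number,
       '100'::jsonb      > '"zzz"'::jsonb      AS number_gt_string,
       '"a"'::jsonb      > 'null'::jsonb       AS string_gt_null,
       -- keys are compared in storage order ("c" is stored before "aa"):
       '{"aa": 1, "c": 1}'::jsonb > '{"b": 1, "d": 1}'::jsonb AS storage_order_case;
```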

[6] For this purpose, the term “value” includes array elements, though JSON terminology sometimes considers array elements distinct from values within objects.

9.12. Network Address Functions and Operators

Table 9.36 shows the operators available for the cidr and inet types. The operators <<, <<=, >>, >>=, and && test for subnet inclusion. They consider only the network parts of the two addresses (ignoring any host part) and determine whether one network is identical to or a subnet of the other.

Table 9.36. cidr and inet Operators

| Operator | Description | Example |
| --- | --- | --- |
| `<` | is less than | `inet '192.168.1.5' < inet '192.168.1.6'` |
| `<=` | is less than or equal | `inet '192.168.1.5' <= inet '192.168.1.5'` |
| `=` | equals | `inet '192.168.1.5' = inet '192.168.1.5'` |
| `>=` | is greater or equal | `inet '192.168.1.5' >= inet '192.168.1.5'` |
| `>` | is greater than | `inet '192.168.1.5' > inet '192.168.1.4'` |
| `<>` | is not equal | `inet '192.168.1.5' <> inet '192.168.1.4'` |
| `<<` | is contained by | `inet '192.168.1.5' << inet '192.168.1/24'` |
| `<<=` | is contained by or equals | `inet '192.168.1/24' <<= inet '192.168.1/24'` |
| `>>` | contains | `inet '192.168.1/24' >> inet '192.168.1.5'` |
| `>>=` | contains or equals | `inet '192.168.1/24' >>= inet '192.168.1/24'` |
| `&&` | contains or is contained by | `inet '192.168.1/24' && inet '192.168.1.80/28'` |
| `~` | bitwise NOT | `~ inet '192.168.1.6'` |
| `&` | bitwise AND | `inet '192.168.1.6' & inet '0.0.0.255'` |
| `\|` | bitwise OR | `inet '192.168.1.6' \| inet '0.0.0.255'` |
| `+` | addition | `inet '192.168.1.6' + 25` |
| `-` | subtraction | `inet '192.168.1.43' - 36` |
| `-` | subtraction | `inet '192.168.1.43' - inet '192.168.1.19'` |
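Because the inclusion operators look only at the network parts, a host address written with a netmask behaves like its network. A short sketch with made-up addresses:

```sql
-- Both sides have the same /24 network part, so "contained by or equals"
-- holds while strict containment does not:
SELECT inet '192.168.1.5/24' <<= inet '192.168.1.0/24';   -- true
SELECT inet '192.168.1.5/24' <<  inet '192.168.1.0/24';   -- false

-- A bare host address is a /32 network, strictly inside the /24:
SELECT inet '192.168.1.5' << inet '192.168.1.0/24';       -- true

-- && is true when either network contains the other:
SELECT inet '10.0.0.0/8' && inet '10.20.30.0/24';         -- true
```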

Table 9.37 shows the functions available for use with the cidr and inet types. The abbrev, host, and text functions are primarily intended to offer alternative display formats.

Table 9.37. cidr and inet Functions

| Function | Return Type | Description | Example | Result |
| --- | --- | --- | --- | --- |
| `abbrev(inet)` | text | abbreviated display format as text | `abbrev(inet '10.1.0.0/16')` | `10.1.0.0/16` |
| `abbrev(cidr)` | text | abbreviated display format as text | `abbrev(cidr '10.1.0.0/16')` | `10.1/16` |
| `broadcast(inet)` | inet | broadcast address for network | `broadcast('192.168.1.5/24')` | `192.168.1.255/24` |
| `family(inet)` | int | extract family of address; 4 for IPv4, 6 for IPv6 | `family('::1')` | `6` |
| `host(inet)` | text | extract IP address as text | `host('192.168.1.5/24')` | `192.168.1.5` |
| `hostmask(inet)` | inet | construct host mask for network | `hostmask('192.168.23.20/30')` | `0.0.0.3` |
| `masklen(inet)` | int | extract netmask length | `masklen('192.168.1.5/24')` | `24` |
| `netmask(inet)` | inet | construct netmask for network | `netmask('192.168.1.5/24')` | `255.255.255.0` |
| `network(inet)` | cidr | extract network part of address | `network('192.168.1.5/24')` | `192.168.1.0/24` |
| `set_masklen(inet, int)` | inet | set netmask length for inet value | `set_masklen('192.168.1.5/24', 16)` | `192.168.1.5/16` |
| `set_masklen(cidr, int)` | cidr | set netmask length for cidr value | `set_masklen('192.168.1.0/24'::cidr, 16)` | `192.168.0.0/16` |
| `text(inet)` | text | extract IP address and netmask length as text | `text(inet '192.168.1.5')` | `192.168.1.5/32` |
| `inet_same_family(inet, inet)` | boolean | are the addresses from the same family? | `inet_same_family('192.168.1.5/24', '::1')` | `false` |
| `inet_merge(inet, inet)` | cidr | the smallest network which includes both of the given networks | `inet_merge('192.168.1.5/24', '192.168.2.5/24')` | `192.168.0.0/22` |

Any cidr value can be cast to inet implicitly or explicitly; therefore, the functions shown above as operating on inet also work on cidr values. (Where there are separate functions for inet and cidr, it is because the behavior should be different for the two cases.) Also, it is permitted to cast an inet value to cidr. When this is done, any bits to the right of the netmask are silently zeroed to create a valid cidr value. In addition, you can cast a text value to inet or cidr using normal casting syntax: for example, inet(expression) or colname::cidr.
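The casting rules can be illustrated with a few literals (a minimal sketch; the statement marked ERROR aborts when run):

```sql
-- inet to cidr: host bits to the right of the netmask are zeroed silently.
SELECT '192.168.1.5/24'::inet::cidr;    -- 192.168.1.0/24

-- Text to cidr directly is stricter: nonzero host bits are an error.
SELECT '192.168.1.5/24'::cidr;          -- ERROR

-- A cidr value is implicitly cast to inet, so inet functions accept it.
SELECT host('192.168.1.0/24'::cidr);    -- 192.168.1.0
```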

Table 9.38 shows the functions available for use with the macaddr type. The function trunc(macaddr) returns a MAC address with the last 3 bytes set to zero. This can be used to associate the remaining prefix with a manufacturer.

Table 9.38. macaddr Functions

| Function | Return Type | Description | Example | Result |
| --- | --- | --- | --- | --- |
| `trunc(macaddr)` | macaddr | set last 3 bytes to zero | `trunc(macaddr '12:34:56:78:90:ab')` | `12:34:56:00:00:00` |

The macaddr type also supports the standard relational operators (>, <=, etc.) for lexicographical ordering, and the bitwise arithmetic operators (~, & and |) for NOT, AND and OR.
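As a sketch of the manufacturer-prefix use case, trunc can serve as a grouping key. The three addresses below are made up, and t(mac) is just an inline alias for the VALUES list.

```sql
SELECT trunc(mac) AS prefix, count(*) AS devices
FROM (VALUES (macaddr '12:34:56:78:90:ab'),
             (macaddr '12:34:56:00:11:22'),
             (macaddr 'aa:bb:cc:00:00:01')) AS t(mac)
GROUP BY trunc(mac)
ORDER BY prefix;
--       prefix       | devices
-- -------------------+---------
--  12:34:56:00:00:00 |       2
--  aa:bb:cc:00:00:00 |       1
```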

Table 9.39 shows the functions available for use with the macaddr8 type. The function trunc(macaddr8) returns a MAC address with the last 5 bytes set to zero. This can be used to associate the remaining prefix with a manufacturer.

Table 9.39. macaddr8 Functions

| Function | Return Type | Description | Example | Result |
| --- | --- | --- | --- | --- |
| `trunc(macaddr8)` | macaddr8 | set last 5 bytes to zero | `trunc(macaddr8 '12:34:56:78:90:ab:cd:ef')` | `12:34:56:00:00:00:00:00` |
| `macaddr8_set7bit(macaddr8)` | macaddr8 | set 7th bit to one, also known as modified EUI-64, for inclusion in an IPv6 address | `macaddr8_set7bit(macaddr8 '00:34:56:ab:cd:ef')` | `02:34:56:ff:fe:ab:cd:ef` |

The macaddr8 type also supports the standard relational operators (>, <=, etc.) for ordering, and the bitwise arithmetic operators (~, & and |) for NOT, AND and OR.

8.15. Arrays

PostgreSQL allows columns of a table to be defined as variable-length multidimensional arrays. Arrays of any built-in or user-defined base type, enum type, or composite type can be created. Arrays of domains are not yet supported.

8.15.1. Declaration of Array Types

To illustrate the use of array types, we create this table:

CREATE TABLE sal_emp (
    name            text,
    pay_by_quarter  integer[],
    schedule        text[][]
);

As shown, an array data type is named by appending square brackets ([]) to the data type name of the array elements. The above command will create a table named sal_emp with a column of type text (name), a one-dimensional array of type integer (pay_by_quarter), which represents the employee's salary by quarter, and a two-dimensional array of text (schedule), which represents the employee's weekly schedule.

The syntax for CREATE TABLE allows the exact size of arrays to be specified, for example:

CREATE TABLE tictactoe (
    squares   integer[3][3]
);

However, the current implementation ignores any supplied array size limits, i.e., the behavior is the same as for arrays of unspecified length.

The current implementation does not enforce the declared number of dimensions either. Arrays of a particular element type are all considered to be of the same type, regardless of size or number of dimensions. So, declaring the array size or number of dimensions in CREATE TABLE is simply documentation; it does not affect run-time behavior.
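That the declared size and dimensionality are not enforced can be seen directly. The table name tictactoe_demo below is a hypothetical stand-in, chosen so the sketch does not clash with the table created above.

```sql
CREATE TABLE tictactoe_demo (
    squares integer[3][3]
);

-- Neither the 3x3 size nor the two dimensions are enforced:
INSERT INTO tictactoe_demo VALUES ('{1,2,3,4,5}');     -- one dimension, five elements
INSERT INTO tictactoe_demo VALUES ('{{1,2},{3,4}}');   -- 2-by-2

SELECT squares, array_dims(squares) FROM tictactoe_demo;
--    squares    | array_dims
-- --------------+------------
--  {1,2,3,4,5}  | [1:5]
--  {{1,2},{3,4}} | [1:2][1:2]
```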

An alternative syntax, which conforms to the SQL standard by using the keyword ARRAY, can be used for one-dimensional arrays. pay_by_quarter could have been defined as:

    pay_by_quarter  integer ARRAY[4],

Or, if no array size is to be specified:

    pay_by_quarter  integer ARRAY,

As before, however, PostgreSQL does not enforce the size restriction in any case.

8.15.2. Array Value Input

To write an array value as a literal constant, enclose the element values within curly braces and separate them by commas. (If you know C, this is not unlike the C syntax for initializing structures.) You can put double quotes around any element value, and must do so if it contains commas or curly braces. (More details appear below.) Thus, the general format of an array constant is the following:

'{ val1 delim val2 delim ... }'

where delim is the delimiter character for the type, as recorded in its pg_type entry. Among the standard data types provided in the PostgreSQL distribution, all use a comma (,), except for type box which uses a semicolon (;). Each val is either a constant of the array element type, or a subarray. An example of an array constant is:

'{{1,2,3},{4,5,6},{7,8,9}}'

This constant is a two-dimensional, 3-by-3 array consisting of three subarrays of integers.

To set an element of an array constant to NULL, write NULL for the element value. (Any upper- or lower-case variant of NULL will do.) If you want an actual string value “NULL”, you must put double quotes around it.

(These kinds of array constants are actually only a special case of the generic type constants discussed in Section 4.1.2.7. The constant is initially treated as a string and passed to the array input conversion routine. An explicit type specification might be necessary.)
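The quoting and NULL rules can be tried on a single literal. In this sketch the aliases a and s are arbitrary names, and the explicit ::text[] cast supplies the type specification mentioned above.

```sql
SELECT a[2] IS NULL AS second_is_null,   -- true: unquoted NULL is a null element
       a[3]         AS third,            -- the four-character string NULL
       a[4]         AS fourth            -- b,c (quotes protect the comma)
FROM (SELECT '{a, NULL, "NULL", "b,c"}'::text[] AS a) AS s;
```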

Now we can show some INSERT statements:

INSERT INTO sal_emp
    VALUES ('Bill',
    '{10000, 10000, 10000, 10000}',
    '{{"meeting", "lunch"}, {"training", "presentation"}}');

INSERT INTO sal_emp
    VALUES ('Carol',
    '{20000, 25000, 25000, 25000}',
    '{{"breakfast", "consulting"}, {"meeting", "lunch"}}');

The result of the previous two inserts looks like this:

SELECT * FROM sal_emp;
 name  |      pay_by_quarter       |                 schedule
-------+---------------------------+-------------------------------------------
 Bill  | {10000,10000,10000,10000} | {{meeting,lunch},{training,presentation}}
 Carol | {20000,25000,25000,25000} | {{breakfast,consulting},{meeting,lunch}}
(2 rows)

Multidimensional arrays must have matching extents for each dimension. A mismatch causes an error, for example:

INSERT INTO sal_emp
    VALUES ('Bill',
    '{10000, 10000, 10000, 10000}',
    '{{"meeting", "lunch"}, {"meeting"}}');
ERROR:  multidimensional arrays must have array expressions with matching dimensions

The ARRAY constructor syntax can also be used:

INSERT INTO sal_emp
    VALUES ('Bill',
    ARRAY[10000, 10000, 10000, 10000],
    ARRAY[['meeting', 'lunch'], ['training', 'presentation']]);

INSERT INTO sal_emp
    VALUES ('Carol',
    ARRAY[20000, 25000, 25000, 25000],
    ARRAY[['breakfast', 'consulting'], ['meeting', 'lunch']]);

Notice that the array elements are ordinary SQL constants or expressions; for instance, string literals are single quoted, instead of double quoted as they would be in an array literal. The ARRAY constructor syntax is discussed in more detail in Section 4.2.12.

8.15.3. Accessing Arrays

Now, we can run some queries on the table. First, we show how to access a single element of an array. This query retrieves the names of the employees whose pay changed in the second quarter:

SELECT name FROM sal_emp WHERE pay_by_quarter[1] <> pay_by_quarter[2];

 name
-------
 Carol
(1 row)

The array subscript numbers are written within square brackets. By default PostgreSQL uses a one-based numbering convention for arrays, that is, an array of n elements starts with array[1] and ends with array[n].

This query retrieves the third quarter pay of all employees:

SELECT pay_by_quarter[3] FROM sal_emp;

 pay_by_quarter
----------------
          10000
          25000
(2 rows)

We can also access arbitrary rectangular slices of an array, or subarrays. An array slice is denoted by writing lower-bound:upper-bound for one or more array dimensions. For example, this query retrieves the first item on Bill's schedule for the first two days of the week:

SELECT schedule[1:2][1:1] FROM sal_emp WHERE name = 'Bill';

        schedule
------------------------
 {{meeting},{training}}
(1 row)

If any dimension is written as a slice, i.e., contains a colon, then all dimensions are treated as slices. Any dimension that has only a single number (no colon) is treated as being from 1 to the number specified. For example, [2] is treated as [1:2], as in this example:

SELECT schedule[1:2][2] FROM sal_emp WHERE name = 'Bill';

                 schedule
-------------------------------------------
 {{meeting,lunch},{training,presentation}}
(1 row)

To avoid confusion with the non-slice case, it's best to use slice syntax for all dimensions, e.g., [1:2][1:1], not [2][1:1].

It is possible to omit the lower-bound and/or upper-bound of a slice specifier; the missing bound is replaced by the lower or upper limit of the array's subscripts. For example:

SELECT schedule[:2][2:] FROM sal_emp WHERE name = 'Bill';

        schedule
------------------------
 {{lunch},{presentation}}
(1 row)

SELECT schedule[:][1:1] FROM sal_emp WHERE name = 'Bill';

        schedule
------------------------
 {{meeting},{training}}
(1 row)

An array subscript expression will return null if either the array itself or any of the subscript expressions are null. Also, null is returned if a subscript is outside the array bounds (this case does not raise an error). For example, if schedule currently has the dimensions [1:3][1:2] then referencing schedule[3][3] yields NULL. Similarly, an array reference with the wrong number of subscripts yields a null rather than an error.

An array slice expression likewise yields null if the array itself or any of the subscript expressions are null. However, in other cases such as selecting an array slice that is completely outside the current array bounds, a slice expression yields an empty (zero-dimensional) array instead of null. (This does not match non-slice behavior and is done for historical reasons.) If the requested slice partially overlaps the array bounds, then it is silently reduced to just the overlapping region instead of returning null.
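The difference between element access and slice access at the array bounds can be sketched with constant arrays:

```sql
-- Element access outside the bounds yields NULL, not an error:
SELECT (ARRAY[1,2,3])[5] IS NULL;          -- true

-- A slice completely outside the bounds yields an empty array:
SELECT (ARRAY[1,2,3])[5:6];                -- {}

-- A partially overlapping slice is reduced to the overlapping region:
SELECT (ARRAY[1,2,3])[2:5];                -- {2,3}

-- Wrong number of subscripts on a two-dimensional array also yields NULL:
SELECT (ARRAY[[1,2],[3,4]])[1] IS NULL;    -- true
```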

The current dimensions of any array value can be retrieved with the array_dims function:

SELECT array_dims(schedule) FROM sal_emp WHERE name = 'Carol';

 array_dims
------------
 [1:2][1:2]
(1 row)

array_dims produces a text result, which is convenient for people to read but perhaps inconvenient for programs. Dimensions can also be retrieved with array_upper and array_lower, which return the upper and lower bound of a specified array dimension, respectively:

SELECT array_upper(schedule, 1) FROM sal_emp WHERE name = 'Carol';

 array_upper
-------------
           2
(1 row)

array_length will return the length of a specified array dimension:

SELECT array_length(schedule, 1) FROM sal_emp WHERE name = 'Carol';

 array_length
--------------
            2
(1 row)

cardinality returns the total number of elements in an array across all dimensions. It is effectively the number of rows a call to unnest would yield:

SELECT cardinality(schedule) FROM sal_emp WHERE name = 'Carol';

 cardinality
-------------
           4
(1 row)

8.15.4. Modifying Arrays

An array value can be replaced completely:

UPDATE sal_emp SET pay_by_quarter = '{25000,25000,27000,27000}'
    WHERE name = 'Carol';

or using the ARRAY expression syntax:

UPDATE sal_emp SET pay_by_quarter = ARRAY[25000,25000,27000,27000]
    WHERE name = 'Carol';

An array can also be updated at a single element:

UPDATE sal_emp SET pay_by_quarter[4] = 15000
    WHERE name = 'Bill';

or updated in a slice:

UPDATE sal_emp SET pay_by_quarter[1:2] = '{27000,27000}'
    WHERE name = 'Carol';

The slice syntaxes with omitted lower-bound and/or upper-bound can be used too, but only when updating an array value that is not NULL or zero-dimensional (otherwise, there is no existing subscript limit to substitute).

A stored array value can be enlarged by assigning to elements not already present. Any positions between those previously present and the newly assigned elements will be filled with nulls. For example, if array myarray currently has 4 elements, it will have six elements after an update that assigns to myarray[6]; myarray[5] will contain null. Currently, enlargement in this fashion is only allowed for one-dimensional arrays, not multidimensional arrays.

Subscripted assignment allows creation of arrays that do not use one-based subscripts. For example one might assign to myarray[-2:7] to create an array with subscript values from -2 to 7.
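Both behaviors can be tried on a scratch table. In this sketch arr_demo is a hypothetical temporary table, and the outputs in the comments are what the rules above lead one to expect.

```sql
CREATE TEMP TABLE arr_demo (id integer, myarray integer[]);
INSERT INTO arr_demo VALUES (1, '{1,2,3,4}'), (2, NULL);

-- Enlarging: position 5 is filled with null.
UPDATE arr_demo SET myarray[6] = 60 WHERE id = 1;

-- Assigning a slice to a NULL array creates an array with those subscripts.
UPDATE arr_demo SET myarray[-2:0] = '{7,8,9}' WHERE id = 2;

SELECT id, myarray, array_dims(myarray) FROM arr_demo ORDER BY id;
-- expected:
--  1 | {1,2,3,4,NULL,60}  | [1:6]
--  2 | [-2:0]={7,8,9}     | [-2:0]
```

Note that an array whose lower bound is not 1 is printed with an explicit bounds prefix such as [-2:0]=.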

New array values can also be constructed using the concatenation operator, ||:

SELECT ARRAY[1,2] || ARRAY[3,4];
 ?column?
-----------
 {1,2,3,4}
(1 row)

SELECT ARRAY[5,6] || ARRAY[[1,2],[3,4]];
      ?column?
---------------------
 {{5,6},{1,2},{3,4}}
(1 row)

The concatenation operator allows a single element to be pushed onto the beginning or end of a one-dimensional array. It also accepts two N-dimensional arrays, or an N-dimensional and an N+1-dimensional array.

When a single element is pushed onto either the beginning or end of a one-dimensional array, the result is an array with the same lower bound subscript as the array operand. For example:

SELECT array_dims(1 || '[0:1]={2,3}'::int[]);
 array_dims
------------
 [0:2]
(1 row)

SELECT array_dims(ARRAY[1,2] || 3);
 array_dims
------------
 [1:3]
(1 row)

When two arrays with an equal number of dimensions are concatenated, the result retains the lower bound subscript of the left-hand operand's outer dimension. The result is an array comprising every element of the left-hand operand followed by every element of the right-hand operand. For example:

SELECT array_dims(ARRAY[1,2] || ARRAY[3,4,5]);
 array_dims
------------
 [1:5]
(1 row)

SELECT array_dims(ARRAY[[1,2],[3,4]] || ARRAY[[5,6],[7,8],[9,0]]);
 array_dims
------------
 [1:5][1:2]
(1 row)

When an N-dimensional array is pushed onto the beginning or end of an N+1-dimensional array, the result is analogous to the element-array case above. Each N-dimensional sub-array is essentially an element of the N+1-dimensional array's outer dimension. For example:

SELECT array_dims(ARRAY[1,2] || ARRAY[[3,4],[5,6]]);
 array_dims
------------
 [1:3][1:2]
(1 row)

An array can also be constructed by using the functions array_prepend, array_append, or array_cat. The first two only support one-dimensional arrays, but array_cat supports multidimensional arrays. Some examples:

SELECT array_prepend(1, ARRAY[2,3]);
 array_prepend
---------------
 {1,2,3}
(1 row)

SELECT array_append(ARRAY[1,2], 3);
 array_append
--------------
 {1,2,3}
(1 row)

SELECT array_cat(ARRAY[1,2], ARRAY[3,4]);
 array_cat
-----------
 {1,2,3,4}
(1 row)

SELECT array_cat(ARRAY[[1,2],[3,4]], ARRAY[5,6]);
      array_cat
---------------------
 {{1,2},{3,4},{5,6}}
(1 row)

SELECT array_cat(ARRAY[5,6], ARRAY[[1,2],[3,4]]);
      array_cat
---------------------
 {{5,6},{1,2},{3,4}}

In simple cases, the concatenation operator discussed above is preferred over direct use of these functions. However, because the concatenation operator is overloaded to serve all three cases, there are situations where use of one of the functions is helpful to avoid ambiguity. For example consider:

SELECT ARRAY[1, 2] || '{3, 4}';  -- the untyped literal is taken as an array
 ?column?
-----------
 {1,2,3,4}

SELECT ARRAY[1, 2] || '7';                 -- so is this one
ERROR:  malformed array literal: "7"

SELECT ARRAY[1, 2] || NULL;                -- so is an undecorated NULL
 ?column?
----------
 {1,2}
(1 row)

SELECT array_append(ARRAY[1, 2], NULL);    -- this might have been meant
 array_append
--------------
 {1,2,NULL}

In the examples above, the parser sees an integer array on one side of the concatenation operator, and a constant of undetermined type on the other. The heuristic it uses to resolve the constant's type is to assume it's of the same type as the operator's other input, in this case integer array. So the concatenation operator is presumed to represent array_cat, not array_append. When that's the wrong choice, it could be fixed by casting the constant to the array's element type; but explicit use of array_append might be a preferable solution.
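The casting fix mentioned above can be sketched as follows. Once the constant carries the element type, the operator resolves to the element-append form:

```sql
-- Casting the constant to the element type selects array_append behavior.
SELECT ARRAY[1, 2] || '7'::int;    -- {1,2,7}
SELECT ARRAY[1, 2] || NULL::int;   -- {1,2,NULL}
```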

8.15.5. Searching in Arrays

To search for a value in an array, each value must be checked. This can be done manually, if you know the size of the array. For example:

SELECT * FROM sal_emp WHERE pay_by_quarter[1] = 10000 OR
                            pay_by_quarter[2] = 10000 OR
                            pay_by_quarter[3] = 10000 OR
                            pay_by_quarter[4] = 10000;

However, this quickly becomes tedious for large arrays, and is not helpful if the size of the array is unknown. An alternative method is described in Section 9.23. The above query could be replaced by:

SELECT * FROM sal_emp WHERE 10000 = ANY (pay_by_quarter);

In addition, you can find rows where the array has all values equal to 10000 with:

SELECT * FROM sal_emp WHERE 10000 = ALL (pay_by_quarter);
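To see how ANY and ALL differ on concrete data, here is a small sketch using inline rows. The names and pay values are made up for illustration and do not come from the sal_emp table:

```sql
-- Inline sample rows; no table required.
SELECT name,
       10000 = ANY (pay) AS any_match,  -- true if at least one element equals 10000
       10000 = ALL (pay) AS all_match   -- true only if every element equals 10000
  FROM (VALUES ('Alice', ARRAY[10000, 10000, 10000, 10000]),
               ('Bob',   ARRAY[10000, 25000, 25000, 25000]),
               ('Cathy', ARRAY[20000, 25000, 25000, 25000])) AS t(name, pay);
-- Alice: t, t   Bob: t, f   Cathy: f, f
```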

Alternatively, the generate_subscripts function can be used. For example:

SELECT * FROM
   (SELECT pay_by_quarter,
           generate_subscripts(pay_by_quarter, 1) AS s
      FROM sal_emp) AS foo
 WHERE pay_by_quarter[s] = 10000;

This function is described in Table 9.59.

You can also search an array using the && operator, which checks whether the left operand overlaps with the right operand. For instance:

SELECT * FROM sal_emp WHERE pay_by_quarter && ARRAY[10000];

This and other array operators are further described in Section 9.18. It can be accelerated by an appropriate index, as described in Section 11.2.
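As a rough sketch of such an index, a GIN index on the array column supports the && operator. The table and index names below are hypothetical, and on a table this small the planner may still prefer a sequential scan:

```sql
-- Hypothetical demo table mirroring the shape of sal_emp.
CREATE TEMP TABLE sal_emp_demo (name text, pay_by_quarter integer[]);
INSERT INTO sal_emp_demo VALUES
    ('Bill',  '{10000,10000,10000,10000}'),
    ('Carol', '{20000,25000,25000,25000}');

-- GIN indexes on arrays can serve the &&, @>, <@ and = operators.
CREATE INDEX sal_emp_demo_pay_gin ON sal_emp_demo USING GIN (pay_by_quarter);

SELECT name FROM sal_emp_demo WHERE pay_by_quarter && ARRAY[10000];
```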

You can also search for specific values in an array using the array_position and array_positions functions. The former returns the subscript of the first occurrence of a value in an array; the latter returns an array with the subscripts of all occurrences of the value in the array. For example:

SELECT array_position(ARRAY['sun','mon','tue','wed','thu','fri','sat'], 'mon');
 array_position
----------------
              2

SELECT array_positions(ARRAY[1, 4, 3, 1, 3, 4, 2, 1], 1);
 array_positions
-----------------
 {1,4,8}
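Two edge cases are worth a quick look: the optional third argument of array_position, which sets the subscript where the search starts, and the results when the value is absent. A minimal sketch:

```sql
SELECT array_position(ARRAY[1, 4, 3, 1], 1, 2);  -- 4: search starts at subscript 2
SELECT array_position(ARRAY[1, 4, 3, 1], 9);     -- NULL: value not present
SELECT array_positions(ARRAY[1, 4, 3, 1], 9);    -- {}: empty array, not NULL
```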

Tip

Arrays are not sets; searching for specific array elements can be a sign of database misdesign. Consider using a separate table with a row for each item that would be an array element. This will be easier to search, and is likely to scale better for a large number of elements.

8.15.6. Array Input and Output Syntax

The external text representation of an array value consists of items that are interpreted according to the I/O conversion rules for the array's element type, plus decoration that indicates the array structure. The decoration consists of curly braces ({ and }) around the array value plus delimiter characters between adjacent items. The delimiter character is usually a comma (,) but can be something else: it is determined by the typdelim setting for the array's element type. Among the standard data types provided in the PostgreSQL distribution, all use a comma, except for type box, which uses a semicolon (;). In a multidimensional array, each dimension (row, plane, cube, etc.) gets its own level of curly braces, and delimiters must be written between adjacent curly-braced entities of the same level.

The array output routine will put double quotes around element values if they are empty strings, contain curly braces, delimiter characters, double quotes, backslashes, or white space, or match the word NULL. Double quotes and backslashes embedded in element values will be backslash-escaped. For numeric data types it is safe to assume that double quotes will never appear, but for textual data types one should be prepared to cope with either the presence or absence of quotes.
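The output quoting rules can be observed with a small text array. In this sketch the elements are chosen so that some trigger quoting and others do not:

```sql
-- Elements with whitespace, empty strings, the word NULL, or a delimiter are quoted;
-- a plain word and a real SQL NULL are not.
SELECT ARRAY['plain', 'two words', '', 'NULL', 'a,b', NULL]::text[] AS quoted;
-- {plain,"two words","","NULL","a,b",NULL}
```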

By default, the lower bound index value of an array's dimensions is set to one. To represent arrays with other lower bounds, the array subscript ranges can be specified explicitly before writing the array contents. This decoration consists of square brackets ([]) around each array dimension's lower and upper bounds, with a colon (:) delimiter character in between. The array dimension decoration is followed by an equal sign (=). For example:

SELECT f1[1][-2][3] AS e1, f1[1][-1][5] AS e2
 FROM (SELECT '[1:1][-2:-1][3:5]={{{1,2,3},{4,5,6}}}'::int[] AS f1) AS ss;

 e1 | e2
----+----
  1 |  6
(1 row)

The array output routine will include explicit dimensions in its result only when there are one or more lower bounds different from one.
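A quick way to see when the output includes explicit dimensions is to compare an array with default bounds against one with a custom lower bound. A minimal sketch:

```sql
SELECT '{1,2,3}'::int[]       AS default_bounds,  -- printed as {1,2,3}
       '[0:2]={1,2,3}'::int[] AS custom_bounds;   -- printed as [0:2]={1,2,3}
```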

If the value written for an element is NULL (in any case variant), the element is taken to be NULL. The presence of any quotes or backslashes disables this and allows the literal string value "NULL" to be entered. Also, for backward compatibility with pre-8.2 versions of PostgreSQL, the array_nulls configuration parameter can be turned off to suppress recognition of NULL as a NULL.

As shown previously, when writing an array value you can use double quotes around any individual array element. You must do so if the element value would otherwise confuse the array-value parser. For example, elements containing curly braces, commas (or the data type's delimiter character), double quotes, backslashes, or leading or trailing whitespace must be double-quoted. Empty strings and strings matching the word NULL must be quoted, too. To put a double quote or backslash in a quoted array element value, use escape string syntax and precede it with a backslash. Alternatively, you can avoid quotes and use backslash-escaping to protect all data characters that would otherwise be taken as array syntax.

You can add whitespace before a left brace or after a right brace. You can also add whitespace before or after any individual item string. In all of these cases the whitespace will be ignored. However, whitespace within double-quoted elements, or surrounded on both sides by non-whitespace characters of an element, is not ignored.

Note

Remember that what you write in an SQL command will first be interpreted as a string literal, and then as an array. This doubles the number of backslashes you need. For example, to insert a text array value containing a backslash and a double quote, you'd need to write:

INSERT ... VALUES (E'{"\\\\","\\""}');

The escape string processor removes one level of backslashes, so that what arrives at the array-value parser looks like {"\\","\""}. In turn, the strings fed to the text data type's input routine become \ and " respectively. (If we were working with a data type whose input routine also treated backslashes specially, bytea for example, we might need as many as eight backslashes in the command to get one backslash into the stored array element.) Dollar quoting (see Section 4.1.2.4) can be used to avoid the need to double backslashes.
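As a sketch of the dollar-quoting alternative, the same two-element array can be written with only the backslashes that the array-value parser itself requires:

```sql
-- Inside $$...$$ the string is passed to the array parser unchanged,
-- so only the array-level escapes remain.
SELECT a[1] AS first_element, a[2] AS second_element
  FROM (SELECT $${"\\","\""}$$::text[] AS a) AS t;
-- first_element is a single backslash, second_element is a single double quote
```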

Tip

The ARRAY constructor syntax (see Section 4.2.12) is often easier to work with than the array-literal syntax when writing array values in SQL commands. In ARRAY, individual element values are written the same way they would be written when not members of an array.

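5.10. Table Partitioning

PostgreSQL supports basic table partitioning. This section describes how to make partitioning part of your database design.

5.10.1. Overview

Partitioning refers to splitting what is logically one large table into several smaller physical tables. Partitioning can provide several benefits:

Query performance can be improved dramatically in certain situations, particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions. Partitioning reduces the size of the indexes, and performance improves considerably once the heavily used parts of the indexes fit in memory.
When queries or updates access a large percentage of a single partition, performance can be improved by taking advantage of a sequential scan of that partition instead of using an index and random-access reads scattered across the whole table.
Bulk loads and deletes can be accomplished by adding or removing partitions, if that requirement is planned into the partitioning design. Removing an individual partition with ALTER TABLE DETACH PARTITION or DROP TABLE is far faster than a bulk DELETE. These commands also entirely avoid the VACUUM overhead caused by a bulk DELETE.
Seldom-used data can be migrated to cheaper and slower storage media.

These benefits will normally be worthwhile only when a table would otherwise be very large. The exact point at which a table will benefit from partitioning depends on the application, although a rule of thumb is that the size of the table should exceed the physical memory of the database server.

PostgreSQL offers built-in support for the following forms of partitioning: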

Range Partitioning

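The table is partitioned into ranges defined by a key column or set of columns, with no overlap between the ranges of values assigned to different partitions. For example, one might partition by date ranges, or by ranges of identifiers for particular business objects.

List Partitioning

The table is partitioned by explicitly listing which key values appear in each partition.

If your application needs to use other forms of partitioning not listed above, alternative methods such as inheritance and UNION ALL views can be used instead. Such methods offer flexibility but do not have some of the performance benefits of built-in declarative partitioning.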
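The range method is illustrated later in this section; for the list method, a minimal sketch could look like this. The table sales_by_region and its partitions are hypothetical names introduced only for illustration:

```sql
-- Hypothetical list-partitioned table keyed on a region name.
CREATE TABLE sales_by_region (
    sale_id  int  not null,
    region   text not null,
    amount   numeric
) PARTITION BY LIST (region);

-- Each partition explicitly lists the key values it accepts.
CREATE TABLE sales_north PARTITION OF sales_by_region
    FOR VALUES IN ('north');
CREATE TABLE sales_south_east PARTITION OF sales_by_region
    FOR VALUES IN ('south', 'east');

INSERT INTO sales_by_region VALUES (1, 'east', 10.00);

-- tableoid shows which partition actually stores the row.
SELECT tableoid::regclass AS stored_in, * FROM sales_by_region;
-- stored_in = sales_south_east
```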

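5.10.2. Declarative Partitioning

PostgreSQL offers a way to specify how to divide a table into pieces called partitions. The table that is divided is referred to as a partitioned table. The specification consists of the partitioning method and a list of columns or expressions to be used as the partition key.

All rows inserted into a partitioned table will be routed to one of the partitions based on the value of the partition key. Each partition has a subset of the data defined by its partition bounds. Currently supported partitioning methods are range and list, where each partition is assigned a range of keys or a list of keys, respectively.

Partitions may themselves be defined as partitioned tables, using what is called sub-partitioning. Partitions may have their own indexes, constraints and default values, distinct from those of other partitions. Indexes must be created separately for each partition. See CREATE TABLE for more details on creating partitioned tables and partitions.

It is not possible to turn a regular table into a partitioned table or vice versa. However, it is possible to add a regular or partitioned table containing data as a partition of a partitioned table, or remove a partition from a partitioned table turning it into a standalone table; see ALTER TABLE to learn more about the ATTACH PARTITION and DETACH PARTITION sub-commands.

Individual partitions are linked to the partitioned table with inheritance behind the scenes; however, it is not possible to use some of the inheritance features discussed in the previous section with partitioned tables and partitions. For example, a partition cannot have any parents other than the partitioned table it is a partition of, nor can a regular table inherit from a partitioned table. That means partitioned tables and partitions do not participate in inheritance with regular tables. Since a partition hierarchy consisting of the partitioned table and its partitions is still an inheritance hierarchy, all the normal rules of inheritance apply as described in Section 5.9, with some exceptions, most notably:

Both CHECK and NOT NULL constraints of a partitioned table are always inherited by all its partitions. CHECK constraints that are marked NO INHERIT are not allowed to be created on partitioned tables.
Using ONLY to add or drop a constraint on only the partitioned table is supported only when there are no partitions. Once partitions exist, using ONLY will result in an error, as adding or dropping constraints on only the partitioned table is not supported when partitions are present. Instead, constraints can be added or dropped on the partitions themselves, as long as they are not present in the parent table. As a partitioned table does not have any data directly, attempts to use TRUNCATE ONLY on a partitioned table will always return an error.
Partitions cannot have columns that are not present in the parent. It is neither possible to specify columns when creating partitions with CREATE TABLE nor to add columns to partitions after the fact using ALTER TABLE. Tables may be added as a partition only if their columns exactly match the parent, including any oid column.
You cannot drop the NOT NULL constraint on a partition's column if the constraint is present in the parent table.

Partitions can also be foreign tables (see CREATE FOREIGN TABLE), although these have some limitations that normal tables do not. For example, data inserted into the partitioned table is not routed to foreign table partitions.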

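5.10.2.1. Example

Suppose we are constructing a database for a large ice cream company. The company measures peak temperatures every day as well as ice cream sales in each region. Conceptually, we want a table like: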

CREATE TABLE measurement (
    city_id         int not null,
    logdate         date not null,
    peaktemp        int,
    unitsales       int
);

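We know that most queries will access just the last week's, month's or quarter's data, since the main use of this table will be to prepare online reports for management. To reduce the amount of old data that needs to be stored, we decide to only keep the most recent 3 years worth of data. At the beginning of each month we will remove the oldest month's data. In this situation we can use partitioning to help us meet all of our different requirements for the measurements table.

To use declarative partitioning in this case, use the following steps:

Create the measurement table as a partitioned table by specifying the PARTITION BY clause, which includes the partitioning method (RANGE in this case) and the list of column(s) to use as the partition key (logdate here).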
```
CREATE TABLE measurement (
    city_id         int not null,
    logdate         date not null,
    peaktemp        int,
    unitsales       int
) PARTITION BY RANGE (logdate);
```
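You may decide to use multiple columns in the partition key for range partitioning, if desired. Of course, this will often result in a larger number of partitions, each of which is individually smaller. On the other hand, using fewer columns may lead to a coarser-grained partitioning criterion with a smaller number of partitions. A query accessing the partitioned table will have to scan fewer partitions if the conditions involve some or all of these columns. For example, consider a table range partitioned using columns lastname and firstname (in that order) as the partition key.
Create partitions. Each partition's definition must specify the bounds that correspond to the partitioning method and partition key of the parent. Note that specifying bounds such that the new partition's values will overlap with those in one or more existing partitions will cause an error. Inserting data into the parent table that does not map to one of the existing partitions will cause an error; an appropriate partition must be added manually.

Partitions thus created are in every way normal PostgreSQL tables (or, possibly, foreign tables). It is possible to specify a tablespace and storage parameters for each partition separately.

It is not necessary to create table constraints describing the partition boundary condition for partitions. Instead, partition constraints are generated implicitly from the partition bound specification whenever there is need to refer to them.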

CREATE TABLE measurement_y2006m02 PARTITION OF measurement
    FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');

CREATE TABLE measurement_y2006m03 PARTITION OF measurement
    FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');

...
CREATE TABLE measurement_y2007m11 PARTITION OF measurement
    FOR VALUES FROM ('2007-11-01') TO ('2007-12-01');

CREATE TABLE measurement_y2007m12 PARTITION OF measurement
    FOR VALUES FROM ('2007-12-01') TO ('2008-01-01')
    TABLESPACE fasttablespace;

CREATE TABLE measurement_y2008m01 PARTITION OF measurement
    FOR VALUES FROM ('2008-01-01') TO ('2008-02-01')
    TABLESPACE fasttablespace
    WITH (parallel_workers = 4);

To implement sub-partitioning, specify the PARTITION BY clause in the commands used to create individual partitions, for example:

CREATE TABLE measurement_y2006m02 PARTITION OF measurement
    FOR VALUES FROM ('2006-02-01') TO ('2006-03-01')
    PARTITION BY RANGE (peaktemp);

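After creating partitions of measurement_y2006m02, any data inserted into measurement that is mapped to measurement_y2006m02 (or data that is directly inserted into measurement_y2006m02, provided it satisfies its partition constraint) will be further redirected to one of its partitions based on the peaktemp column. The partition key specified may overlap with the parent's partition key, although care should be taken when specifying the bounds of a sub-partition such that the set of data it accepts constitutes a subset of what the partition's own bounds allow; the system does not try to check whether that's really the case.

Create an index on the key column(s), as well as any other indexes you might want, on each partition. (The key index is not strictly necessary, but in most scenarios it is helpful. If you intend the key values to be unique then you should always create a unique or primary-key constraint for each partition.)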

CREATE INDEX ON measurement_y2006m02 (logdate);
CREATE INDEX ON measurement_y2006m03 (logdate);
...
CREATE INDEX ON measurement_y2007m11 (logdate);
CREATE INDEX ON measurement_y2007m12 (logdate);
CREATE INDEX ON measurement_y2008m01 (logdate);

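Ensure that the constraint_exclusion configuration parameter is not disabled in postgresql.conf. If it is, queries will not be optimized as desired.

In the above example we would be creating a new partition each month, so it might be wise to write a script that generates the required DDL automatically.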
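One possible way to generate the monthly DDL is a PL/pgSQL DO block that loops over month boundaries and builds each CREATE TABLE statement with format(). This is only a sketch under stated assumptions: the parent table measurement_auto and the naming pattern measurement_auto_yYYYYmMM are hypothetical, and the date range is arbitrary.

```sql
-- Hypothetical parent table with the same shape as measurement.
CREATE TABLE measurement_auto (
    city_id         int not null,
    logdate         date not null,
    peaktemp        int,
    unitsales       int
) PARTITION BY RANGE (logdate);

DO $$
DECLARE
    month_start date;
BEGIN
    -- One iteration per month: 2008-02, 2008-03, 2008-04.
    FOR month_start IN
        SELECT generate_series(DATE '2008-02-01', DATE '2008-04-01',
                               INTERVAL '1 month')::date
    LOOP
        -- %I quotes the partition name as an identifier, %L quotes the bounds as literals.
        EXECUTE format(
            'CREATE TABLE IF NOT EXISTS %I PARTITION OF measurement_auto
                 FOR VALUES FROM (%L) TO (%L)',
            'measurement_auto_y' || to_char(month_start, 'YYYY"m"MM'),
            month_start,
            (month_start + INTERVAL '1 month')::date);
    END LOOP;
END;
$$;
```

Running a block like this from a scheduled job ahead of each month is one way to make sure a partition exists before rows for it arrive.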

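5.10.2.2. Partition Maintenance

Normally the set of partitions established when initially defining the table is not intended to remain static. It is common to want to remove old partitions of data and periodically add new partitions for new data. One of the most important advantages of partitioning is precisely that it allows this otherwise painful task to be executed nearly instantaneously by manipulating the partition structure, rather than physically moving large amounts of data around.

The simplest option for removing old data is to drop the partition that is no longer necessary: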

DROP TABLE measurement_y2006m02;

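This can very quickly delete millions of records because it doesn't have to individually delete every record. Note however that the above command requires taking an ACCESS EXCLUSIVE lock on the parent table.

Another option that is often preferable is to remove the partition from the partitioned table but retain access to it as a table in its own right: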

ALTER TABLE measurement DETACH PARTITION measurement_y2006m02;

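This allows further operations to be performed on the data before it is dropped. For example, this is often a useful time to back up the data using COPY, pg_dump, or similar tools. It might also be a useful time to aggregate data into smaller formats, perform other data manipulations, or run reports.

Similarly we can add a new partition to handle new data. We can create an empty partition in the partitioned table just as the original partitions were created above: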

CREATE TABLE measurement_y2008m02 PARTITION OF measurement
    FOR VALUES FROM ('2008-02-01') TO ('2008-03-01')
    TABLESPACE fasttablespace;

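As an alternative, it is sometimes more convenient to create the new table outside the partition structure, and make it a proper partition later. This allows the data to be loaded, checked, and transformed prior to it appearing in the partitioned table: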

CREATE TABLE measurement_y2008m02
  (LIKE measurement INCLUDING DEFAULTS INCLUDING CONSTRAINTS)
  TABLESPACE fasttablespace;

ALTER TABLE measurement_y2008m02 ADD CONSTRAINT y2008m02
   CHECK ( logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01' );

\copy measurement_y2008m02 from 'measurement_y2008m02'
-- possibly some other data preparation work

ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
    FOR VALUES FROM ('2008-02-01') TO ('2008-03-01' );

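Before running the ATTACH PARTITION command, it is recommended to create a CHECK constraint on the table to be attached that describes the desired partition constraint. That way, the system can skip the scan otherwise needed to validate the implicit partition constraint. Without such a constraint, the table will be scanned to validate the partition constraint while an ACCESS EXCLUSIVE lock is held. The CHECK constraint can be dropped once ATTACH PARTITION has finished, because it is no longer needed.

5.10.2.3. Limitations

The following limitations apply to partitioned tables:

There is no facility available to create the matching indexes on all partitions automatically. Indexes must be added to each partition with separate commands. This also means that there is no way to create a primary key, unique constraint, or exclusion constraint spanning all partitions; it is only possible to constrain each partition individually.
Since primary keys are not supported on partitioned tables, foreign keys referencing partitioned tables are not supported, nor are foreign key references from a partitioned table to some other table.
Using the ON CONFLICT clause with partitioned tables will cause an error, because unique or exclusion constraints can only be created on individual partitions. There is no support for enforcing uniqueness (or an exclusion constraint) across an entire partitioning hierarchy.
An UPDATE that causes a row to move from one partition to another fails, because the new value of the row fails to satisfy the implicit partition constraint of the original partition.
Row triggers, if necessary, must be defined on individual partitions, not the partitioned table.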

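5.10.3. Implementation Using Inheritance

While the built-in declarative partitioning is suitable for most common use cases, there are some circumstances where a more flexible approach may be useful. Partitioning can be implemented using table inheritance, which allows for several features that are not supported by declarative partitioning, such as:

Partitioning enforces a rule that all partitions must have exactly the same set of columns as the parent, but table inheritance allows children to have extra columns not present in the parent.
Table inheritance allows for multiple inheritance.
Declarative partitioning only supports list and range partitioning, whereas table inheritance allows data to be divided in a manner of the user's choosing. (Note, however, that if constraint exclusion is unable to prune partitions effectively, query performance will be very poor.)
Some operations require a stronger lock when using declarative partitioning than when using table inheritance. For example, adding or removing a partition to or from a partitioned table requires taking an ACCESS EXCLUSIVE lock on the parent table, whereas a SHARE UPDATE EXCLUSIVE lock is enough in the case of regular inheritance.

5.10.3.1. Example

We use the same measurement table we used above. To implement it as a partitioned table using inheritance, use the following steps:

Create the "master" table, from which all of the partitions will inherit. This table will contain no data. Do not define any check constraints on this table, unless you intend them to be applied equally to all partitions. There is no point in defining any indexes or unique constraints on it, either. For our example, the master table is the measurement table as originally defined.

Create several "child" tables that each inherit from the master table. Normally, these tables will not add any columns to the set inherited from the master. Just as with declarative partitioning, these partitions are in every way normal PostgreSQL tables (or foreign tables).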

CREATE TABLE measurement_y2006m02 () INHERITS (measurement);
CREATE TABLE measurement_y2006m03 () INHERITS (measurement);
...
CREATE TABLE measurement_y2007m11 () INHERITS (measurement);
CREATE TABLE measurement_y2007m12 () INHERITS (measurement);
CREATE TABLE measurement_y2008m01 () INHERITS (measurement);

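Add non-overlapping table constraints to the partition tables to define the allowed key values in each partition.

Typical examples would be: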

CHECK ( x = 1 )
CHECK ( county IN ( 'Oxfordshire', 'Buckinghamshire', 'Warwickshire' ))
CHECK ( outletID >= 100 AND outletID < 200 )

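Ensure that the constraints guarantee that there is no overlap between the key values permitted in different partitions. A common mistake is to set up range constraints like: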

CHECK ( outletID BETWEEN 100 AND 200 )
CHECK ( outletID BETWEEN 200 AND 300 )

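This is wrong since the key value 200 satisfies the constraints of both partitions, so it is not clear which partition it belongs in.

It would be better to instead create partitions as follows: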

CREATE TABLE measurement_y2006m02 (
    CHECK ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' )
) INHERITS (measurement);

CREATE TABLE measurement_y2006m03 (
    CHECK ( logdate >= DATE '2006-03-01' AND logdate < DATE '2006-04-01' )
) INHERITS (measurement);

...
CREATE TABLE measurement_y2007m11 (
    CHECK ( logdate >= DATE '2007-11-01' AND logdate < DATE '2007-12-01' )
) INHERITS (measurement);

CREATE TABLE measurement_y2007m12 (
    CHECK ( logdate >= DATE '2007-12-01' AND logdate < DATE '2008-01-01' )
) INHERITS (measurement);

CREATE TABLE measurement_y2008m01 (
    CHECK ( logdate >= DATE '2008-01-01' AND logdate < DATE '2008-02-01' )
) INHERITS (measurement);

For each partition, create an index on the key column(s), as well as any other indexes you might want.

CREATE INDEX measurement_y2006m02_logdate ON measurement_y2006m02 (logdate);
CREATE INDEX measurement_y2006m03_logdate ON measurement_y2006m03 (logdate);
CREATE INDEX measurement_y2007m11_logdate ON measurement_y2007m11 (logdate);
CREATE INDEX measurement_y2007m12_logdate ON measurement_y2007m12 (logdate);
CREATE INDEX measurement_y2008m01_logdate ON measurement_y2008m01 (logdate);

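We want our application to be able to say INSERT INTO measurement ... and have the data be redirected into the appropriate partition table. We can arrange that by attaching a suitable trigger function to the master table. If data will be added only to the latest partition, we can use a very simple trigger function: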

CREATE OR REPLACE FUNCTION measurement_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
    INSERT INTO measurement_y2008m01 VALUES (NEW.*);
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;

After creating the function, we create a trigger which calls the trigger function:

CREATE TRIGGER insert_measurement_trigger
    BEFORE INSERT ON measurement
    FOR EACH ROW EXECUTE PROCEDURE measurement_insert_trigger();

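We must redefine the trigger function each month so that it always points to the current partition. The trigger definition does not need to be updated, however.

We might want to insert data and have the server automatically locate the partition into which the row should be added. We could do this with a more complex trigger function, for example: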

CREATE OR REPLACE FUNCTION measurement_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
    IF ( NEW.logdate >= DATE '2006-02-01' AND
         NEW.logdate < DATE '2006-03-01' ) THEN
        INSERT INTO measurement_y2006m02 VALUES (NEW.*);
    ELSIF ( NEW.logdate >= DATE '2006-03-01' AND
            NEW.logdate < DATE '2006-04-01' ) THEN
        INSERT INTO measurement_y2006m03 VALUES (NEW.*);
    ...
    ELSIF ( NEW.logdate >= DATE '2008-01-01' AND
            NEW.logdate < DATE '2008-02-01' ) THEN
        INSERT INTO measurement_y2008m01 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'Date out of range.  Fix the measurement_insert_trigger() function!';
    END IF;
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;

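The trigger definition is the same as before. Note that each IF test must exactly match the CHECK constraint for its partition.

While this function is more complex than the single-month case, it doesn't need to be updated as often, since branches can be added in advance of being needed.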
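A variation that avoids editing the IF chain at all is to compute the child table name from the row's date and insert with dynamic SQL. This is a sketch of a different technique from the one shown above, not the approach the text describes; the measurement_inh tables, the function name, and the naming pattern are all hypothetical, and dynamic SQL adds per-row overhead compared to the static version.

```sql
-- Hypothetical master and one child, mirroring the measurement example.
CREATE TABLE measurement_inh (
    city_id         int not null,
    logdate         date not null,
    peaktemp        int,
    unitsales       int
);
CREATE TABLE measurement_inh_y2008m01 (
    CHECK ( logdate >= DATE '2008-01-01' AND logdate < DATE '2008-02-01' )
) INHERITS (measurement_inh);

CREATE OR REPLACE FUNCTION measurement_inh_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
    -- Build the child name (for example measurement_inh_y2008m01) from the date.
    -- If that child does not exist, the INSERT raises an error.
    EXECUTE format('INSERT INTO %I SELECT ($1).*',
                   'measurement_inh_y' || to_char(NEW.logdate, 'YYYY"m"MM'))
      USING NEW;
    RETURN NULL;  -- suppress the insert into the master table
END;
$$
LANGUAGE plpgsql;

CREATE TRIGGER insert_measurement_inh_trigger
    BEFORE INSERT ON measurement_inh
    FOR EACH ROW EXECUTE PROCEDURE measurement_inh_insert_trigger();

INSERT INTO measurement_inh VALUES (1, DATE '2008-01-15', 30, 100);
SELECT tableoid::regclass AS stored_in, * FROM measurement_inh;
```

The trade-off is that the naming convention becomes part of the routing logic, so child tables must follow it exactly.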

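Note

In practice it might be best to check the newest partition first, if most inserts go into that partition. For simplicity we have shown the trigger's tests in the same order as in other parts of this example.

A different approach to redirecting inserts into the appropriate partition table is to set up rules, instead of a trigger, on the master table. For example: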

CREATE RULE measurement_insert_y2006m02 AS
ON INSERT TO measurement WHERE
    ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' )
DO INSTEAD
    INSERT INTO measurement_y2006m02 VALUES (NEW.*);
...
CREATE RULE measurement_insert_y2008m01 AS
ON INSERT TO measurement WHERE
    ( logdate >= DATE '2008-01-01' AND logdate < DATE '2008-02-01' )
DO INSTEAD
    INSERT INTO measurement_y2008m01 VALUES (NEW.*);

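A rule has significantly more overhead than a trigger, but the overhead is paid once per query rather than once per row, so this method might be advantageous for bulk-insert situations. In most cases, however, the trigger method will offer better performance.

Be aware that COPY ignores rules. If you want to use COPY to insert data, you'll need to copy into the correct partition table rather than into the master. COPY does fire triggers, so you can use it normally if you use the trigger approach.

Another disadvantage of the rule approach is that there is no simple way to force an error if the set of rules doesn't cover the insertion date; the data will silently go into the master table instead.

Ensure that the constraint_exclusion configuration parameter is not disabled in postgresql.conf. If it is, queries will not be optimized as desired.

As we can see, a complex partitioning scheme could require a substantial amount of DDL. In the above example we would be creating a new partition each month, so it might be wise to write a script that generates the required DDL automatically.

5.10.3.2. Partition Maintenance

To remove old data quickly, simply drop the partition that is no longer necessary: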

DROP TABLE measurement_y2006m02;

To remove the partition from the partitioned table but retain access to it as a table in its own right:

ALTER TABLE measurement_y2006m02 NO INHERIT measurement;

To add a new partition to handle new data, create an empty partition just as the original partitions were created above:

CREATE TABLE measurement_y2008m02 (
    CHECK ( logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01' )
) INHERITS (measurement);

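Alternatively, one may want to create the new table outside the partition structure, and make it a partition after the data is loaded, checked, and transformed: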

CREATE TABLE measurement_y2008m02
  (LIKE measurement INCLUDING DEFAULTS INCLUDING CONSTRAINTS);
ALTER TABLE measurement_y2008m02 ADD CONSTRAINT y2008m02
   CHECK ( logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01' );
\copy measurement_y2008m02 from 'measurement_y2008m02'
-- possibly some other data preparation work
ALTER TABLE measurement_y2008m02 INHERIT measurement;

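5.10.3.3. Caveats

The following caveats apply to partitioned tables implemented using inheritance:

There is no automatic way to verify that all of the CHECK constraints are mutually exclusive. It is safer to create code that generates partitions and creates and/or modifies associated objects than to write each by hand.
The schemes shown here assume that the partition key column(s) of a row never change, or at least do not change enough to require it to move to another partition. An UPDATE that attempts to do that will fail because of the CHECK constraints. If you need to handle such cases, you can put suitable update triggers on the partition tables, but it makes management of the structure much more complicated.
If you are using manual VACUUM or ANALYZE commands, don't forget that you need to run them on each partition individually. A command like ANALYZE measurement; will only process the master table.
INSERT statements with ON CONFLICT clauses are unlikely to work as expected, as the ON CONFLICT action is only taken in case of unique violations on the specified target relation, not its child relations.
Triggers or rules will be needed to route rows to the desired partition, unless the application is explicitly aware of the partitioning scheme. Triggers may be complicated to write, and will be much slower than the tuple routing performed internally by declarative partitioning.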
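For the ANALYZE caveat, one possible way to cover every child is to look up the direct children in the pg_inherits catalog and run the command for each. The tables analyze_demo and analyze_demo_y2008m01 below are hypothetical stand-ins; only direct children are visited, so deeper hierarchies would need a recursive query.

```sql
-- Hypothetical master and child used only to demonstrate the loop.
CREATE TABLE analyze_demo (logdate date not null);
CREATE TABLE analyze_demo_y2008m01 () INHERITS (analyze_demo);

DO $$
DECLARE
    child regclass;
BEGIN
    FOR child IN
        SELECT inhrelid::regclass
          FROM pg_inherits
         WHERE inhparent = 'analyze_demo'::regclass
    LOOP
        -- A regclass value prints as a properly quoted table name.
        EXECUTE format('ANALYZE %s', child);
    END LOOP;
END;
$$;
```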

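5.10.4. Partitioning and Constraint Exclusion

Constraint exclusion is a query optimization technique that improves performance for partitioned tables defined in the fashion described above (both declaratively partitioned tables and those implemented using inheritance). As an example: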

SET constraint_exclusion = on;
SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01';

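Without constraint exclusion, the above query would scan each of the partitions of the measurement table. With constraint exclusion enabled, the planner will examine the constraints of each partition and try to prove that the partition need not be scanned because it could not contain any rows meeting the query's WHERE clause. When the planner can prove this, it excludes the partition from the query plan.

You can use the EXPLAIN command to show the difference between a plan with constraint_exclusion on and a plan with it off. A typical unoptimized plan for this type of table setup is: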

SET constraint_exclusion = off;
EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01';

                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Aggregate  (cost=158.66..158.68 rows=1 width=0)
   ->  Append  (cost=0.00..151.88 rows=2715 width=0)
         ->  Seq Scan on measurement  (cost=0.00..30.38 rows=543 width=0)
               Filter: (logdate >= '2008-01-01'::date)
         ->  Seq Scan on measurement_y2006m02 measurement  (cost=0.00..30.38 rows=543 width=0)
               Filter: (logdate >= '2008-01-01'::date)
         ->  Seq Scan on measurement_y2006m03 measurement  (cost=0.00..30.38 rows=543 width=0)
               Filter: (logdate >= '2008-01-01'::date)
...
         ->  Seq Scan on measurement_y2007m12 measurement  (cost=0.00..30.38 rows=543 width=0)
               Filter: (logdate >= '2008-01-01'::date)
         ->  Seq Scan on measurement_y2008m01 measurement  (cost=0.00..30.38 rows=543 width=0)
               Filter: (logdate >= '2008-01-01'::date)

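Some or all of the partitions might use index scans instead of full-table sequential scans, but the point here is that there is no need to scan the older partitions at all to answer this query. When we enable constraint exclusion, we get a significantly cheaper plan that will deliver the same answer: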

SET constraint_exclusion = on;
EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01';
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Aggregate  (cost=63.47..63.48 rows=1 width=0)
   ->  Append  (cost=0.00..60.75 rows=1086 width=0)
         ->  Seq Scan on measurement  (cost=0.00..30.38 rows=543 width=0)
               Filter: (logdate >= '2008-01-01'::date)
         ->  Seq Scan on measurement_y2008m01 measurement  (cost=0.00..30.38 rows=543 width=0)
               Filter: (logdate >= '2008-01-01'::date)

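Note that constraint exclusion is driven only by CHECK constraints, not by the presence of indexes. Therefore it isn't necessary to define indexes on the key columns. Whether an index needs to be created for a given partition depends on whether you expect that queries that scan the partition will generally scan a large part of the partition or just a small part. An index will be helpful in the latter case but not the former. The default (and recommended) setting of constraint_exclusion is actually neither on nor off, but an intermediate setting called partition, which causes the technique to be applied only to queries that are likely to be working on partitioned tables. The on setting causes the planner to examine CHECK constraints in all queries, even simple ones that are unlikely to benefit.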
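The three settings can be inspected and changed per session. A brief sketch:

```sql
-- Show the current value; a default installation reports "partition".
SHOW constraint_exclusion;

-- Session-level change; accepted values are on, off, and partition.
SET constraint_exclusion = partition;
SELECT current_setting('constraint_exclusion');
```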

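The following caveats apply to constraint exclusion, which is used by both inheritance and partitioned tables:

Constraint exclusion only works when the query's WHERE clause contains constants (or externally supplied parameters). For example, a comparison against a non-immutable function such as CURRENT_TIMESTAMP cannot be optimized, since the planner cannot know which partition the function value might fall into at run time.
Keep the partitioning constraints simple, else the planner may not be able to prove that partitions don't need to be visited. Use simple equality conditions for list partitioning, or simple range tests for range partitioning, as illustrated in the preceding examples. A good rule of thumb is that partitioning constraints should contain only comparisons of the partitioning column(s) to constants using B-tree-indexable operators, which applies even to partitioned tables, because only B-tree-indexable column(s) are allowed in the partition key. (This is not a problem when using declarative partitioning, since the automatically generated constraints are simple enough to be understood by the planner.)
All constraints on all partitions of the master table are examined during constraint exclusion, so large numbers of partitions are likely to increase query planning time considerably. Partitioning using these techniques will work well with up to perhaps a hundred partitions; don't try to use many thousands of partitions.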

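9.14. XML Functions

Version: 11

The functions and function-like expressions described in this section operate on values of type xml. Check Section 8.13 for information about the xml type. The function-like expressions xmlparse and xmlserialize for converting to and from type xml are not repeated here. Use of most of these functions requires the installation to have been built with configure --with-libxml.

9.14.1. Producing XML Content

A set of functions and function-like expressions are available for producing XML content from SQL data. As such, they are particularly suitable for formatting query results into XML documents for processing in client applications.

9.14.1.1. xmlcomment

xmlcomment(text)

The function xmlcomment creates an XML value containing an XML comment with the specified text as content. The text cannot contain "--" or end with a "-" so that the resulting construct is a valid XML comment. If the argument is null, the result is null.

Example: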

SELECT xmlcomment('hello');

  xmlcomment
--------------
 <!--hello-->

9.14.1.2. xmlconcat

xmlconcat(xml[, ...])

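The function xmlconcat concatenates a list of individual XML values to create a single value containing an XML content fragment. Null values are omitted; the result is only null if there are no nonnull arguments.

Example: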

SELECT xmlconcat('<abc/>', '<bar>foo</bar>');

      xmlconcat
----------------------
 <abc/><bar>foo</bar>

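XML declarations, if present, are combined as follows. If all argument values have the same XML version declaration, that version is used in the result, else no version is used. If all argument values have the standalone declaration value "yes", then that value is used in the result. If all argument values have a standalone declaration value and at least one is "no", then that is used in the result. Else the result will have no standalone declaration. If the result is determined to require a standalone declaration but no version declaration, a version declaration with version 1.0 will be used because XML requires an XML declaration to contain a version declaration. Encoding declarations are ignored and removed in all cases.

Example: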

SELECT xmlconcat('<?xml version="1.1"?><foo/>', '<?xml version="1.1" standalone="no"?><bar/>');

             xmlconcat
-----------------------------------
 <?xml version="1.1"?><foo/><bar/>

9.14.1.3. xmlelement

xmlelement(name name [, xmlattributes(value [AS attname] [, ... ])] [, content, ...])

The xmlelement expression produces an XML element with the given name, attributes, and content.

Examples:

SELECT xmlelement(name foo);

 xmlelement
------------
 <foo/>

SELECT xmlelement(name foo, xmlattributes('xyz' as bar));

    xmlelement
------------------
 <foo bar="xyz"/>

SELECT xmlelement(name foo, xmlattributes(current_date as bar), 'cont', 'ent');

             xmlelement
-------------------------------------
 <foo bar="2007-01-26">content</foo>

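Element and attribute names that are not valid XML names are escaped by replacing the offending characters by the sequence _xHHHH_, where HHHH is the character's Unicode codepoint in hexadecimal notation. For example: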

SELECT xmlelement(name "foo$bar", xmlattributes('xyz' as "a&b"));

            xmlelement
----------------------------------
 <foo_x0024_bar a_x0026_b="xyz"/>

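An explicit attribute name need not be specified if the attribute value is a column reference, in which case the column's name will be used as the attribute name by default. In other cases, the attribute must be given an explicit name. So this example is valid: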

CREATE TABLE test (a xml, b xml);
SELECT xmlelement(name test, xmlattributes(a, b)) FROM test;

But these are not:

SELECT xmlelement(name test, xmlattributes('constant'), a, b) FROM test;
SELECT xmlelement(name test, xmlattributes(func(a, b))) FROM test;

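Element content, if specified, will be formatted according to its data type. If the content is itself of type xml, complex XML documents can be constructed. For example: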

SELECT xmlelement(name foo, xmlattributes('xyz' as bar),
                            xmlelement(name abc),
                            xmlcomment('test'),
                            xmlelement(name xyz));

                  xmlelement
----------------------------------------------
 <foo bar="xyz"><abc/><!--test--><xyz/></foo>

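Content of other types will be formatted into valid XML character data. This means in particular that the characters <, >, and & will be converted to entities. Binary data (data type bytea) will be represented in base64 or hex encoding, depending on the setting of the configuration parameter xmlbinary. The particular behavior for individual data types is expected to evolve in order to align the SQL and PostgreSQL data types with the XML Schema specification, at which point a more precise description will appear.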
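The entity conversion can be seen with a text value that contains the special characters. A minimal sketch, assuming a server built with libxml support:

```sql
-- Text content is escaped so the result is valid XML character data.
SELECT xmlelement(name foo, 'a < b & c');
-- <foo>a &lt; b &amp; c</foo>
```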

9.14.1.4. xmlforest

xmlforest(content [AS name] [, ...])

The xmlforest expression produces an XML forest (sequence) of elements using the given names and content.

Examples:

SELECT xmlforest('abc' AS foo, 123 AS bar);

          xmlforest
------------------------------
 <foo>abc</foo><bar>123</bar>


SELECT xmlforest(table_name, column_name)
FROM information_schema.columns
WHERE table_schema = 'pg_catalog';

                                         xmlforest
-------------------------------------------------------------------------------------------
 <table_name>pg_authid</table_name><column_name>rolname</column_name>
 <table_name>pg_authid</table_name><column_name>rolsuper</column_name>
 ...

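As seen in the second example, the element name can be omitted if the content value is a column reference, in which case the column name is used by default. Otherwise, a name must be specified.

Element names that are not valid XML names are escaped as shown for xmlelement above. Similarly, content data is escaped to make valid XML content, unless it is already of type xml.

Note that XML forests are not valid XML documents if they consist of more than one element, so it might be useful to wrap xmlforest expressions in xmlelement.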
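Wrapping the forest in a single element, as suggested above, could look like this. The element name item is arbitrary, and the sketch assumes a server built with libxml support:

```sql
-- The forest alone has two top-level elements; the wrapper gives it one root.
SELECT xmlelement(name item, xmlforest('abc' AS foo, 123 AS bar)) AS wrapped,
       xmlelement(name item, xmlforest('abc' AS foo, 123 AS bar)) IS DOCUMENT AS is_doc;
-- wrapped = <item><foo>abc</foo><bar>123</bar></item>, is_doc = t
```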

9.14.1.5. xmlpi

xmlpi(name target [, content])

The xmlpi expression creates an XML processing instruction. The content, if present, must not contain the character sequence ?>.

Example:

SELECT xmlpi(name php, 'echo "hello world";');

            xmlpi
-----------------------------
 <?php echo "hello world";?>

9.14.1.6. xmlroot

xmlroot(xml, version text | no value [, standalone yes|no|no value])

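The xmlroot expression alters the properties of the root node of an XML value. If a version is specified, it replaces the value in the root node's version declaration; if a standalone setting is specified, it replaces the value in the root node's standalone declaration.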

SELECT xmlroot(xmlparse(document '<?xml version="1.1"?><content>abc</content>'),
               version '1.0', standalone yes);

                xmlroot
----------------------------------------
 <?xml version="1.0" standalone="yes"?>
 <content>abc</content>

9.14.1.7. xmlagg

xmlagg(xml)

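The function xmlagg is, unlike the other functions described here, an aggregate function. It concatenates the input values to the aggregate function call, much like xmlconcat does, except that concatenation occurs across rows rather than across expressions in a single row. See Section 9.20 for additional information about aggregate functions.

Example: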

CREATE TABLE test (y int, x xml);
INSERT INTO test VALUES (1, '<foo>abc</foo>');
INSERT INTO test VALUES (2, '<bar/>');
SELECT xmlagg(x) FROM test;
        xmlagg
----------------------
 <foo>abc</foo><bar/>

To determine the order of the concatenation, an ORDER BY clause may be added to the aggregate call as described in Section 4.2.7. For example:

SELECT xmlagg(x ORDER BY y DESC) FROM test;
        xmlagg
----------------------
 <bar/><foo>abc</foo>

The following non-standard approach used to be recommended in previous versions, and may still be useful in specific cases:

SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab;
        xmlagg
----------------------
 <bar/><foo>abc</foo>

9.14.2. XML Predicates

The expressions described in this section check properties of xml values.

9.14.2.1. IS DOCUMENT

xml IS DOCUMENT

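The expression IS DOCUMENT returns true if the argument XML value is a proper XML document, false if it is not (that is, it is a content fragment), or null if the argument is null. See Section 8.13 about the difference between documents and content fragments.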
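A short sketch of the predicate on three inputs. It sets xmloption to CONTENT (the default) so that the fragment literals are accepted by the xml type, and assumes a server built with libxml support:

```sql
SET xmloption TO CONTENT;

SELECT xml '<a/>'       IS DOCUMENT AS single_root,  -- t: exactly one root element
       xml '<a/><b/>'   IS DOCUMENT AS two_roots,    -- f: content fragment
       xml 'plain text' IS DOCUMENT AS text_only;    -- f: no root element
```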

9.14.2.2. XMLEXISTS

XMLEXISTS(text PASSING [BY REF] xml [BY REF])

The function xmlexists returns true if the XPath expression in the first argument returns any nodes, and false otherwise. (If either argument is null, the result is null.)

Example:

SELECT xmlexists('//town[text() = ''Toronto'']' PASSING BY REF '<towns><town>Toronto</town><town>Ottawa</town></towns>');

 xmlexists
------------
 t
(1 row)

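The BY REF clauses have no effect in PostgreSQL, but are allowed for SQL conformance and compatibility with other implementations. Per the SQL standard, the first BY REF is required, the second is optional. Also note that the SQL standard specifies the xmlexists construct to take an XQuery expression as first argument, but PostgreSQL currently only supports XPath, which is a subset of XQuery.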

9.14.2.3. xml_is_well_formed

xml_is_well_formed(text)
xml_is_well_formed_document(text)
xml_is_well_formed_content(text)

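These functions check whether a text string is well-formed XML, returning a Boolean result. xml_is_well_formed_document checks for a well-formed document, while xml_is_well_formed_content checks for well-formed content. xml_is_well_formed does the former if the xmloption configuration parameter is set to DOCUMENT, or the latter if it is set to CONTENT. This means that xml_is_well_formed is useful for seeing whether a simple cast to type xml will succeed, whereas the other two functions are useful for seeing whether the corresponding variants of XMLPARSE will succeed.

Examples: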

SET xmloption TO DOCUMENT;
SELECT xml_is_well_formed('<>');
 xml_is_well_formed 
--------------------
 f
(1 row)

SELECT xml_is_well_formed('<abc/>');
 xml_is_well_formed 
--------------------
 t
(1 row)

SET xmloption TO CONTENT;
SELECT xml_is_well_formed('abc');
 xml_is_well_formed 
--------------------
 t
(1 row)

SELECT xml_is_well_formed_document('<pg:foo xmlns:pg="http://postgresql.org/stuff">bar</pg:foo>');
 xml_is_well_formed_document 
-----------------------------
 t
(1 row)

SELECT xml_is_well_formed_document('<pg:foo xmlns:pg="http://postgresql.org/stuff">bar</my:foo>');
 xml_is_well_formed_document 
-----------------------------
 f
(1 row)

The last example shows that the checks include whether namespaces are correctly matched.

9.14.3. Processing XML

To process values of data type xml, PostgreSQL offers the functions xpath and xpath_exists, which evaluate XPath 1.0 expressions, and the XMLTABLE table function.

9.14.3.1. xpath

xpath(xpath, xml [, nsarray])

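The function xpath evaluates the XPath expression xpath (a text value) against the XML value xml. It returns an array of XML values corresponding to the node set produced by the XPath expression. If the XPath expression returns a scalar value rather than a node set, a single-element array is returned.

The second argument must be a well-formed XML document. In particular, it must have a single root node element.

The optional third argument of the function is an array of namespace mappings. This array should be a two-dimensional text array with the length of the second axis being equal to 2 (i.e., it should be an array of arrays, each of which consists of exactly 2 elements). The first element of each array entry is the namespace name (alias), the second the namespace URI. It is not required that aliases provided in this array be the same as those being used in the XML document itself (in other words, both in the XML document and in the xpath function context, aliases are local).

Example: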

SELECT xpath('/my:a/text()', '<my:a xmlns:my="http://example.com">test</my:a>',
             ARRAY[ARRAY['my', 'http://example.com']]);

 xpath  
--------
 {test}
(1 row)

To deal with default (anonymous) namespaces, do something like this:

SELECT xpath('//mydefns:b/text()', '<a xmlns="http://example.com"><b>test</b></a>',
             ARRAY[ARRAY['mydefns', 'http://example.com']]);

 xpath
--------
 {test}
(1 row)
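Because xpath returns an array, it is often convenient to expand the result into one row per matched node. The following is a minimal sketch of that pattern, assuming a server built with libxml support; the sample document and the aliases t and town are illustrative only:

```sql
-- xpath returns xml[]; unnest in the FROM clause yields one row per array
-- element, and the cast converts each xml value to plain text.
SELECT town::text AS town
  FROM unnest(xpath('/towns/town/text()',
                    xml '<towns><town>Toronto</town><town>Ottawa</town></towns>'))
       AS t(town);
```

This yields two rows, Toronto and Ottawa, in document order.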

9.14.3.2. xpath_exists

xpath_exists(xpath, xml [, nsarray])

The function xpath_exists is a specialized form of the xpath function. Instead of returning the individual XML values that satisfy the XPath, this function returns a Boolean indicating whether the query was satisfied or not. This function is equivalent to the standard XMLEXISTS predicate, except that it also offers support for a namespace mapping argument.

Example:

SELECT xpath_exists('/my:a/text()', '<my:a xmlns:my="http://example.com">test</my:a>',
                     ARRAY[ARRAY['my', 'http://example.com']]);

 xpath_exists  
--------------
 t
(1 row)

9.14.3.3. xmltable

xmltable( [XMLNAMESPACES(namespace uri AS namespace name[, ...]), ]
          row_expression PASSING [BY REF] document_expression [BY REF]
          COLUMNS name { type [PATH column_expression] [DEFAULT default_expression] [NOT NULL | NULL]
                        | FOR ORDINALITY }
                   [, ...]
)

The xmltable function produces a table based on the given XML value, an XPath filter to extract rows, and an optional set of column definitions.

The optional XMLNAMESPACES clause is a comma-separated list of namespaces. It specifies the XML namespaces used in the document and their aliases. A default namespace specification is not currently supported.

The required row_expression argument is an XPath expression that is evaluated against the supplied XML document to obtain an ordered sequence of XML nodes. This sequence is what xmltable transforms into output rows.

document_expression provides the XML document to operate on. The BY REF clauses have no effect in PostgreSQL, but are allowed for SQL conformance and compatibility with other implementations. The argument must be a well-formed XML document; fragments/forests are not accepted.

The mandatory COLUMNS clause specifies the list of columns in the output table. Each entry describes a single column; see the syntax summary above for the format. The column name and type are required; the path, default and nullability clauses are optional.

A column marked FOR ORDINALITY will be populated with row numbers matching the order in which the output rows appeared in the original input XML document. At most one column may be marked FOR ORDINALITY.

The column_expression for a column is an XPath expression that is evaluated for each row, relative to the result of the row_expression, to find the value of the column. If no column_expression is given, then the column name is used as an implicit path.

If a column's XPath expression returns multiple elements, an error is raised. If the expression matches an empty tag, the result is an empty string (not NULL). Any xsi:nil attributes are ignored.

The text body of the XML matched by the column_expression is used as the column value. Multiple text() nodes within an element are concatenated in order. Any child elements, processing instructions, and comments are ignored, but the text contents of child elements are concatenated to the result. Note that the whitespace-only text() node between two non-text elements is preserved, and that leading whitespace on a text() node is not flattened.

If the path expression does not match for a given row but default_expression is specified, the value resulting from evaluating that expression is used. If no DEFAULT clause is given for the column, the field will be set to NULL. It is possible for a default_expression to reference the value of output columns that appear prior to it in the column list, so the default of one column may be based on the value of another column.

Columns may be marked NOT NULL. If the column_expression for a NOT NULL column does not match anything and there is no DEFAULT or the default_expression also evaluates to null, an error is reported.
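The interaction of PATH, DEFAULT, and NOT NULL can be sketched with a small inline document. This is an illustrative example only; the element names items, item, name, and qty and the column names are assumptions made for the sketch:

```sql
SELECT t.*
  FROM XMLTABLE('/items/item'
                PASSING xml '<items>
                               <item><name>pen</name><qty>3</qty></item>
                               <item><name>ink</name></item>
                             </items>'
                COLUMNS item_name text PATH 'name' NOT NULL,
                        -- the second <item> has no <qty>, so the default applies
                        qty       int  PATH 'qty' DEFAULT 0) AS t;
```

The query returns the rows (pen, 3) and (ink, 0). If an item lacked its name element, the NOT NULL constraint on item_name would cause an error to be reported instead, since that column has no DEFAULT.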

Unlike regular PostgreSQL functions, column_expression and default_expression are not evaluated to a simple value before calling the function. column_expression is normally evaluated exactly once per input row, and default_expression is evaluated each time a default is needed for a field. If the expression qualifies as stable or immutable the repeat evaluation may be skipped. Effectively xmltable behaves more like a subquery than a function call. This means that you can usefully use volatile functions like nextval in default_expression, and column_expression may depend on other parts of the XML document.

Examples:

CREATE TABLE xmldata AS SELECT
xml $$
<ROWS>
  <ROW id="1">
    <COUNTRY_ID>AU</COUNTRY_ID>
    <COUNTRY_NAME>Australia</COUNTRY_NAME>
  </ROW>
  <ROW id="5">
    <COUNTRY_ID>JP</COUNTRY_ID>
    <COUNTRY_NAME>Japan</COUNTRY_NAME>
    <PREMIER_NAME>Shinzo Abe</PREMIER_NAME>
    <SIZE unit="sq_mi">145935</SIZE>
  </ROW>
  <ROW id="6">
    <COUNTRY_ID>SG</COUNTRY_ID>
    <COUNTRY_NAME>Singapore</COUNTRY_NAME>
    <SIZE unit="sq_km">697</SIZE>
  </ROW>
</ROWS>
$$ AS data;

SELECT xmltable.*
  FROM xmldata,
       XMLTABLE('//ROWS/ROW'
                PASSING data
                COLUMNS id int PATH '@id',
                        ordinality FOR ORDINALITY,
                        "COUNTRY_NAME" text,
                        country_id text PATH 'COUNTRY_ID',
                        size_sq_km float PATH 'SIZE[@unit = "sq_km"]',
                        size_other text PATH
                             'concat(SIZE[@unit!="sq_km"], " ", SIZE[@unit!="sq_km"]/@unit)',
                        premier_name text PATH 'PREMIER_NAME' DEFAULT 'not specified') ;

 id | ordinality | COUNTRY_NAME | country_id | size_sq_km |  size_other  | premier_name  
----+------------+--------------+------------+------------+--------------+---------------
  1 |          1 | Australia    | AU         |            |              | not specified
  5 |          2 | Japan        | JP         |            | 145935 sq_mi | Shinzo Abe
  6 |          3 | Singapore    | SG         |        697 |              | not specified

The following example shows concatenation of multiple text() nodes, usage of the column name as XPath filter, and the treatment of whitespace, XML comments and processing instructions:

CREATE TABLE xmlelements AS SELECT
xml $$
  <root>
   <element>  Hello<!-- xyxxz -->2a2<?aaaaa?> <!--x-->  bbb<x>xxx</x>CC  </element>
  </root>
$$ AS data;

SELECT xmltable.*
  FROM xmlelements, XMLTABLE('/root' PASSING data COLUMNS element text);
       element        
----------------------
   Hello2a2   bbbCC

The following example illustrates how the XMLNAMESPACES clause can be used to assign an alias (here x) to the default namespace declared in the XML document, together with a list of additional namespaces used in the XML document as well as in the XPath expressions:

WITH xmldata(data) AS (VALUES ('
<example xmlns="http://example.com/myns" xmlns:B="http://example.com/b">
 <item foo="1" B:bar="2"/>
 <item foo="3" B:bar="4"/>
 <item foo="4" B:bar="5"/>
</example>'::xml)
)
SELECT xmltable.*
  FROM XMLTABLE(XMLNAMESPACES('http://example.com/myns' AS x,
                              'http://example.com/b' AS "B"),
             '/x:example/x:item'
                PASSING (SELECT data FROM xmldata)
                COLUMNS foo int PATH '@foo',
                  bar int PATH '@B:bar');
 foo | bar
-----+-----
   1 |   2
   3 |   4
   4 |   5
(3 rows)

9.14.4. Mapping Tables to XML

The following functions map the contents of relational tables to XML values. They can be thought of as XML export functionality:

table_to_xml(tbl regclass, nulls boolean, tableforest boolean, targetns text)
query_to_xml(query text, nulls boolean, tableforest boolean, targetns text)
cursor_to_xml(cursor refcursor, count int, nulls boolean,
              tableforest boolean, targetns text)

The return type of each function is xml.

table_to_xml maps the content of the named table, passed as parameter tbl. The regclass type accepts strings identifying tables using the usual notation, including optional schema qualifications and double quotes. query_to_xml executes the query whose text is passed as parameter query and maps the result set. cursor_to_xml fetches the indicated number of rows from the cursor specified by the parameter cursor. This variant is recommended if large tables have to be mapped, because the result value is built up in memory by each function.

If tableforest is false, then the resulting XML document looks like this:

<tablename>
  <row>
    <columnname1>data</columnname1>
    <columnname2>data</columnname2>
  </row>

  <row>
    ...
  </row>

  ...
</tablename>

If tableforest is true, the result is an XML content fragment that looks like this:

<tablename>
  <columnname1>data</columnname1>
  <columnname2>data</columnname2>
</tablename>

<tablename>
  ...
</tablename>

...

If no table name is available, that is, when mapping a query or a cursor, the string table is used in the first format, row in the second format.

The choice between these formats is up to the user. The first format is a proper XML document, which will be important in many applications. The second format tends to be more useful in the cursor_to_xml function if the result values are to be reassembled into one document later on. The functions for producing XML content discussed above, in particular xmlelement, can be used to alter the results to taste.

The data values are mapped in the same way as described for the function xmlelement above.

The parameter nulls determines whether null values should be included in the output. If true, null values in columns are represented as:

<columnname xsi:nil="true"/>

where xsi is the XML namespace prefix for XML Schema Instance. An appropriate namespace declaration will be added to the result value. If false, columns containing null values are simply omitted from the output.

The parameter targetns specifies the desired XML namespace of the result. If no particular namespace is wanted, an empty string should be passed.
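The effect of the nulls parameter can be seen with a one-row query. This is a minimal sketch, assuming a server built with libxml support; the query text passed in is made up for illustration:

```sql
-- nulls = true: the NULL column b appears as an element carrying xsi:nil="true".
SELECT query_to_xml('SELECT 1 AS a, NULL::text AS b', true, false, '');

-- nulls = false: the NULL column b is omitted from the row altogether.
SELECT query_to_xml('SELECT 1 AS a, NULL::text AS b', false, false, '');
```

In both calls tableforest is false and targetns is the empty string, so the result is a single document whose root element is named table and which is in no particular namespace.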

The following functions return XML Schema documents describing the mappings performed by the corresponding functions above:

table_to_xmlschema(tbl regclass, nulls boolean, tableforest boolean, targetns text)
query_to_xmlschema(query text, nulls boolean, tableforest boolean, targetns text)
cursor_to_xmlschema(cursor refcursor, nulls boolean, tableforest boolean, targetns text)

It is essential that the same parameters are passed in order to obtain matching XML data mappings and XML Schema documents.

The following functions produce XML data mappings and the corresponding XML Schema in one document (or forest), linked together. They can be useful where self-contained and self-describing results are wanted:

table_to_xml_and_xmlschema(tbl regclass, nulls boolean, tableforest boolean, targetns text)
query_to_xml_and_xmlschema(query text, nulls boolean, tableforest boolean, targetns text)

In addition, the following functions are available to produce analogous mappings of entire schemas or the entire current database:

schema_to_xml(schema name, nulls boolean, tableforest boolean, targetns text)
schema_to_xmlschema(schema name, nulls boolean, tableforest boolean, targetns text)
schema_to_xml_and_xmlschema(schema name, nulls boolean, tableforest boolean, targetns text)

database_to_xml(nulls boolean, tableforest boolean, targetns text)
database_to_xmlschema(nulls boolean, tableforest boolean, targetns text)
database_to_xml_and_xmlschema(nulls boolean, tableforest boolean, targetns text)

Note that these potentially produce a lot of data, which needs to be built up in memory. When requesting content mappings of large schemas or databases, it might be worthwhile to consider mapping the tables separately instead, possibly even through a cursor.

The result of a schema content mapping looks like this:

<schemaname>

table1-mapping

table2-mapping

...

</schemaname>

where the format of a table mapping depends on the tableforest parameter as explained above.

The result of a database content mapping looks like this:

<dbname>

<schema1name>
  ...
</schema1name>

<schema2name>
  ...
</schema2name>

...

</dbname>

where the schema mapping is as above.

As an example of using the output produced by these functions, Figure 9.1 shows an XSLT stylesheet that converts the output of table_to_xml_and_xmlschema to an HTML document containing a tabular rendition of the table data. In a similar manner, the results from these functions can be converted into other XML-based formats.

Figure 9.1. XSLT Stylesheet for Converting SQL/XML Output to HTML

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns="http://www.w3.org/1999/xhtml"
>

  <xsl:output method="xml"
      doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
      doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
      indent="yes"/>

  <xsl:template match="/*">
    <xsl:variable name="schema" select="//xsd:schema"/>
    <xsl:variable name="tabletypename"
                  select="$schema/xsd:element[@name=name(current())]/@type"/>
    <xsl:variable name="rowtypename"
                  select="$schema/xsd:complexType[@name=$tabletypename]/xsd:sequence/xsd:element[@name='row']/@type"/>

    <html>
      <head>
        <title><xsl:value-of select="name(current())"/></title>
      </head>
      <body>
        <table>
          <tr>
            <xsl:for-each select="$schema/xsd:complexType[@name=$rowtypename]/xsd:sequence/xsd:element/@name">
              <th><xsl:value-of select="."/></th>
            </xsl:for-each>
          </tr>

          <xsl:for-each select="row">
            <tr>
              <xsl:for-each select="*">
                <td><xsl:value-of select="."/></td>
              </xsl:for-each>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>

7.2. Table Expressions

A table expression computes a table. The table expression contains a FROM clause that is optionally followed by WHERE, GROUP BY, and HAVING clauses. Trivial table expressions simply refer to a table on disk, a so-called base table, but more complex expressions can be used to modify or combine base tables in various ways.

The optional WHERE, GROUP BY, and HAVING clauses in the table expression specify a pipeline of successive transformations performed on the table derived in the FROM clause. All these transformations produce a virtual table that provides the rows that are passed to the select list to compute the output rows of the query.

7.2.1. The FROM Clause

The FROM clause derives a table from one or more other tables given in a comma-separated table reference list.

FROM table_reference [, table_reference [, ...]]

A table reference can be a table name (possibly schema-qualified), or a derived table such as a subquery, a JOIN construct, or complex combinations of these. If more than one table reference is listed in the FROM clause, the tables are cross-joined (that is, the Cartesian product of their rows is formed; see below). The result of the FROM list is an intermediate virtual table that can then be subject to transformations by the WHERE, GROUP BY, and HAVING clauses and is finally the result of the overall table expression.

When a table reference names a table that is the parent of a table inheritance hierarchy, the table reference produces rows of not only that table but all of its descendant tables, unless the key word ONLY precedes the table name. However, the reference produces only the columns that appear in the named table — any columns added in subtables are ignored.

Instead of writing ONLY before the table name, you can write * after the table name to explicitly specify that descendant tables are included. There is no real reason to use this syntax any more, because searching descendant tables is now always the default behavior. However, it is supported for compatibility with older releases.
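The difference between a plain table reference, ONLY, and the trailing * can be sketched with a two-table inheritance hierarchy. The table names, columns, and data below are hypothetical and exist only for this illustration:

```sql
CREATE TEMP TABLE cities   (name text, population int);
CREATE TEMP TABLE capitals (state char(2)) INHERITS (cities);

INSERT INTO cities   VALUES ('Springfield', 100);
INSERT INTO capitals VALUES ('Capital City', 200, 'XX');

SELECT name FROM cities;        -- both rows: the parent and its descendant
SELECT name FROM ONLY cities;   -- Springfield only
SELECT name FROM cities *;      -- both rows; same as the first query

-- Only the parent's columns are produced; capitals.state is not visible here.
SELECT * FROM cities;
```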

7.2.1.1. Joined Tables

A joined table is a table derived from two other (real or derived) tables according to the rules of the particular join type. Inner, outer, and cross-joins are available. The general syntax of a joined table is

T1 join_type T2 [ join_condition ]

Joins of all types can be chained together, or nested: either or both T1 and T2 can be joined tables. Parentheses can be used around JOIN clauses to control the join order. In the absence of parentheses, JOIN clauses nest left-to-right.

Join Types

Cross join

T1 CROSS JOIN T2

For every possible combination of rows from T1 and T2 (i.e., a Cartesian product), the joined table will contain a row consisting of all columns in T1 followed by all columns in T2. If the tables have N and M rows respectively, the joined table will have N * M rows.

FROM T1 CROSS JOIN T2 is equivalent to FROM T1 INNER JOIN T2 ON TRUE (see below). It is also equivalent to FROM T1, T2.

Note

This latter equivalence does not hold exactly when more than two tables appear, because JOIN binds more tightly than comma. For example FROM T1 CROSS JOIN T2 INNER JOIN T3 ON condition is not the same as FROM T1, T2 INNER JOIN T3 ON condition because the condition can reference T1 in the first case but not the second.

Qualified joins

T1 { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN T2 ON boolean_expression
T1 { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN T2 USING ( join column list )
T1 NATURAL { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN T2

The words INNER and OUTER are optional in all forms. INNER is the default; LEFT, RIGHT, and FULL imply an outer join.

The join condition is specified in the ON or USING clause, or implicitly by the word NATURAL. The join condition determines which rows from the two source tables are considered to “match”, as explained in detail below.

The possible types of qualified join are:

INNER JOIN

For each row R1 of T1, the joined table has a row for each row in T2 that satisfies the join condition with R1.

LEFT OUTER JOIN

First, an inner join is performed. Then, for each row in T1 that does not satisfy the join condition with any row in T2, a joined row is added with null values in columns of T2. Thus, the joined table always has at least one row for each row in T1.

RIGHT OUTER JOIN

First, an inner join is performed. Then, for each row in T2 that does not satisfy the join condition with any row in T1, a joined row is added with null values in columns of T1. This is the converse of a left join: the result table will always have a row for each row in T2.

FULL OUTER JOIN

First, an inner join is performed. Then, for each row in T1 that does not satisfy the join condition with any row in T2, a joined row is added with null values in columns of T2. Also, for each row of T2 that does not satisfy the join condition with any row in T1, a joined row with null values in the columns of T1 is added.

The ON clause is the most general kind of join condition: it takes a Boolean value expression of the same kind as is used in a WHERE clause. A pair of rows from T1 and T2 match if the ON expression evaluates to true.

The USING clause is a shorthand that allows you to take advantage of the specific situation where both sides of the join use the same name for the joining column(s). It takes a comma-separated list of the shared column names and forms a join condition that includes an equality comparison for each one. For example, joining T1 and T2 with USING (a, b) produces the join condition ON T1.a = T2.a AND T1.b = T2.b.

Furthermore, the output of JOIN USING suppresses redundant columns: there is no need to print both of the matched columns, since they must have equal values. While JOIN ON produces all columns from T1 followed by all columns from T2, JOIN USING produces one output column for each of the listed column pairs (in the listed order), followed by any remaining columns from T1, followed by any remaining columns from T2.

Finally, NATURAL is a shorthand form of USING: it forms a USING list consisting of all column names that appear in both input tables. As with USING, these columns appear only once in the output table. If there are no common column names, NATURAL JOIN behaves like JOIN ... ON TRUE, producing a cross-product join.

Note

USING is reasonably safe from column changes in the joined relations since only the listed columns are combined. NATURAL is considerably more risky since any schema changes to either relation that cause a new matching column name to be present will cause the join to combine that new column as well.
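The risk described in this note can be reproduced in a few statements. The following is a sketch with hypothetical tables customers and orders; the names and data are assumptions chosen only to show the effect:

```sql
CREATE TEMP TABLE customers (customer_id int, name text);
CREATE TEMP TABLE orders    (order_id int, customer_id int);

INSERT INTO customers VALUES (1, 'anne'), (2, 'bob');
INSERT INTO orders    VALUES (10, 1), (11, 2);

-- Only customer_id is common to both tables, so both joins return 2.
SELECT count(*) FROM orders JOIN customers USING (customer_id);
SELECT count(*) FROM orders NATURAL JOIN customers;

-- A later schema change adds a column whose name also exists in customers.
ALTER TABLE orders ADD COLUMN name text;

-- USING still joins on customer_id alone and returns 2.
SELECT count(*) FROM orders JOIN customers USING (customer_id);

-- NATURAL now also requires orders.name = customers.name. The new column
-- is NULL in every existing row, so nothing matches and the count is 0.
SELECT count(*) FROM orders NATURAL JOIN customers;
```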

To put this together, assume we have tables t1:

 num | name
-----+------
   1 | a
   2 | b
   3 | c

and t2:

 num | value
-----+-------
   1 | xxx
   3 | yyy
   5 | zzz

then we get the following results for the various joins:

=> SELECT * FROM t1 CROSS JOIN t2;
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
   1 | a    |   3 | yyy
   1 | a    |   5 | zzz
   2 | b    |   1 | xxx
   2 | b    |   3 | yyy
   2 | b    |   5 | zzz
   3 | c    |   1 | xxx
   3 | c    |   3 | yyy
   3 | c    |   5 | zzz
(9 rows)

=> SELECT * FROM t1 INNER JOIN t2 ON t1.num = t2.num;
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
   3 | c    |   3 | yyy
(2 rows)

=> SELECT * FROM t1 INNER JOIN t2 USING (num);
 num | name | value
-----+------+-------
   1 | a    | xxx
   3 | c    | yyy
(2 rows)

=> SELECT * FROM t1 NATURAL INNER JOIN t2;
 num | name | value
-----+------+-------
   1 | a    | xxx
   3 | c    | yyy
(2 rows)

=> SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num;
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
   2 | b    |     |
   3 | c    |   3 | yyy
(3 rows)

=> SELECT * FROM t1 LEFT JOIN t2 USING (num);
 num | name | value
-----+------+-------
   1 | a    | xxx
   2 | b    |
   3 | c    | yyy
(3 rows)

=> SELECT * FROM t1 RIGHT JOIN t2 ON t1.num = t2.num;
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
   3 | c    |   3 | yyy
     |      |   5 | zzz
(3 rows)

=> SELECT * FROM t1 FULL JOIN t2 ON t1.num = t2.num;
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
   2 | b    |     |
   3 | c    |   3 | yyy
     |      |   5 | zzz
(4 rows)

The join condition specified with ON can also contain conditions that do not relate directly to the join. This can prove useful for some queries but needs to be thought out carefully. For example:

=> SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num AND t2.value = 'xxx';
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
   2 | b    |     |
   3 | c    |     |
(3 rows)

Notice that placing the restriction in the WHERE clause produces a different result:

=> SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num WHERE t2.value = 'xxx';
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
(1 row)

This is because a restriction placed in the ON clause is processed before the join, while a restriction placed in the WHERE clause is processed after the join. That does not matter with inner joins, but it matters a lot with outer joins.

7.2.1.2. Table and Column Aliases

A temporary name can be given to tables and complex table references to be used for references to the derived table in the rest of the query. This is called a table alias.

To create a table alias, write

FROM table_reference AS alias

or

FROM table_reference alias

The AS key word is optional noise. alias can be any identifier.

A typical application of table aliases is to assign short identifiers to long table names to keep the join clauses readable. For example:

SELECT * FROM some_very_long_table_name s JOIN another_fairly_long_name a ON s.id = a.num;

The alias becomes the new name of the table reference so far as the current query is concerned — it is not allowed to refer to the table by the original name elsewhere in the query. Thus, this is not valid:

SELECT * FROM my_table AS m WHERE my_table.a > 5;    -- wrong

Table aliases are mainly for notational convenience, but it is necessary to use them when joining a table to itself, e.g.:

SELECT * FROM people AS mother JOIN people AS child ON mother.id = child.mother_id;

Additionally, an alias is required if the table reference is a subquery (see Section 7.2.1.3).

Parentheses are used to resolve ambiguities. In the following example, the first statement assigns the alias b to the second instance of my_table, but the second statement assigns the alias to the result of the join:

SELECT * FROM my_table AS a CROSS JOIN my_table AS b ...
SELECT * FROM (my_table AS a CROSS JOIN my_table) AS b ...

Another form of table aliasing gives temporary names to the columns of the table, as well as the table itself:

FROM table_reference [AS] alias ( column1 [, column2 [, ...]] )

If fewer column aliases are specified than the actual table has columns, the remaining columns are not renamed. This syntax is especially useful for self-joins or subqueries.
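A short sketch of column aliasing, using an inline VALUES list so that it is self-contained (the alias names p, id, and word are arbitrary):

```sql
-- Both columns of the VALUES list are renamed.
SELECT p.id, p.word
  FROM (VALUES (1, 'one'), (2, 'two')) AS p (id, word);

-- Only the first column is renamed; the second keeps its original name,
-- which for a VALUES list is the default name column2.
SELECT p.id, p.column2
  FROM (VALUES (1, 'one'), (2, 'two')) AS p (id);
```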

When an alias is applied to the output of a JOIN clause, the alias hides the original name(s) within the JOIN. For example:

SELECT a.* FROM my_table AS a JOIN your_table AS b ON ...

is valid SQL, but:

SELECT a.* FROM (my_table AS a JOIN your_table AS b ON ...) AS c

is not valid; the table alias a is not visible outside the alias c.

7.2.1.3. Subqueries

Subqueries specifying a derived table must be enclosed in parentheses and must be assigned a table alias name (as in Section 7.2.1.2). For example:

FROM (SELECT * FROM table1) AS alias_name

This example is equivalent to FROM table1 AS alias_name. More interesting cases, which cannot be reduced to a plain join, arise when the subquery involves grouping or aggregation.
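As an illustration of a derived table that cannot be reduced to a plain join, the sketch below pairs each row with an aggregate computed over its group. The sales table and its data are hypothetical:

```sql
CREATE TEMP TABLE sales (region text, amount int);
INSERT INTO sales VALUES ('north', 10), ('north', 30), ('south', 5);

-- The subquery computes one total per region; the outer query joins every
-- individual sale to the total of its own region.
SELECT s.region, s.amount, totals.total
  FROM sales AS s
  JOIN (SELECT region, sum(amount) AS total
          FROM sales
         GROUP BY region) AS totals USING (region)
 ORDER BY s.region, s.amount;
```

With the sample data this returns (north, 10, 40), (north, 30, 40), and (south, 5, 5).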

A subquery can also be a VALUES list:

FROM (VALUES ('anne', 'smith'), ('bob', 'jones'), ('joe', 'blow'))
     AS names(first, last)

Again, a table alias is required. Assigning alias names to the columns of the VALUES list is optional, but is good practice. For more information see Section 7.7.

7.2.1.4. Table Functions

Table functions are functions that produce a set of rows, made up of either base data types (scalar types) or composite data types (table rows). They are used like a table, view, or subquery in the FROM clause of a query. Columns returned by table functions can be included in SELECT, JOIN, or WHERE clauses in the same manner as columns of a table, view, or subquery.

Table functions may also be combined using the ROWS FROM syntax, with the results returned in parallel columns; the number of result rows in this case is that of the largest function result, with smaller results padded with null values to match.

function_call [WITH ORDINALITY] [[AS] table_alias [(column_alias [, ... ])]]
ROWS FROM( function_call [, ... ] ) [WITH ORDINALITY] [[AS] table_alias [(column_alias [, ... ])]]

If the WITH ORDINALITY clause is specified, an additional column of type bigint will be added to the function result columns. This column numbers the rows of the function result set, starting from 1. (This is a generalization of the SQL-standard syntax for UNNEST ... WITH ORDINALITY.) By default, the ordinal column is called ordinality, but a different column name can be assigned to it using an AS clause.

The special table function UNNEST may be called with any number of array parameters, and it returns a corresponding number of columns, as if UNNEST (Section 9.18) had been called on each parameter separately and combined using the ROWS FROM construct.

UNNEST( array_expression [, ... ] ) [WITH ORDINALITY] [[AS] table_alias [(column_alias [, ... ])]]

If no table_alias is specified, the function name is used as the table name; in the case of a ROWS FROM() construct, the first function's name is used.

If column aliases are not supplied, then for a function returning a base data type, the column name is also the same as the function name. For a function returning a composite type, the result columns get the names of the individual attributes of the type.

Some examples:

CREATE TABLE foo (fooid int, foosubid int, fooname text);

CREATE FUNCTION getfoo(int) RETURNS SETOF foo AS $$
    SELECT * FROM foo WHERE fooid = $1;
$$ LANGUAGE SQL;

SELECT * FROM getfoo(1) AS t1;

SELECT * FROM foo
    WHERE foosubid IN (
                        SELECT foosubid
                        FROM getfoo(foo.fooid) z
                        WHERE z.fooid = foo.fooid
                      );

CREATE VIEW vw_getfoo AS SELECT * FROM getfoo(1);

SELECT * FROM vw_getfoo;
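The WITH ORDINALITY, ROWS FROM, and multi-argument UNNEST forms described earlier in this section can be sketched with built-in functions only. The alias and column names are arbitrary choices for this illustration:

```sql
-- WITH ORDINALITY appends a bigint row counter, renamed here to pos.
SELECT * FROM unnest(ARRAY['a', 'b', 'c']) WITH ORDINALITY AS t (letter, pos);

-- ROWS FROM places the function results side by side; the shorter result
-- is padded with nulls, so the third row is (3, NULL).
SELECT * FROM ROWS FROM (generate_series(1, 3), generate_series(10, 11)) AS t (a, b);

-- Multi-argument UNNEST behaves like ROWS FROM over one unnest call per
-- array, so the third row is (3, NULL) here as well.
SELECT * FROM unnest(ARRAY[1, 2, 3], ARRAY['x', 'y']) AS t (n, s);
```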

In some cases it is useful to define table functions that can return different column sets depending on how they are invoked. To support this, the table function can be declared as returning the pseudo-type record. When such a function is used in a query, the expected row structure must be specified in the query itself, so that the system can know how to parse and plan the query. This syntax looks like:

function_call [AS] alias (column_definition [, ... ])
function_call AS [alias] (column_definition [, ... ])
ROWS FROM( ... function_call AS (column_definition [, ... ]) [, ... ] )

When not using the ROWS FROM() syntax, the column_definition list replaces the column alias list that could otherwise be attached to the FROM item; the names in the column definitions serve as column aliases. When using the ROWS FROM() syntax, a column_definition list can be attached to each member function separately; or if there is only one member function and no WITH ORDINALITY clause, a column_definition list can be written in place of a column alias list following ROWS FROM().

Consider this example:

SELECT *
    FROM dblink('dbname=mydb', 'SELECT proname, prosrc FROM pg_proc')
      AS t1(proname name, prosrc text)
    WHERE proname LIKE 'bytea%';

The dblink function (part of the dblink module) executes a remote query. It is declared to return record since it might be used for any kind of query. The actual column set must be specified in the calling query so that the parser knows, for example, what * should expand to.
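The same column definition syntax can be tried without installing dblink or reaching a remote database. As a sketch, the built-in json_to_record function is likewise declared to return record, so the caller must supply the row structure; the JSON document and the names r, id, and title are made up for the example:

```sql
SELECT *
  FROM json_to_record('{"id": 7, "title": "hello"}')
       AS r (id int, title text);
```

The column definition list tells the parser that * expands to an int column id and a text column title, and the query returns the single row (7, hello).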

7.2.1.5. LATERAL Subqueries

Subqueries appearing in FROM can be preceded by the key word LATERAL. This allows them to reference columns provided by preceding FROM items. (Without LATERAL, each subquery is evaluated independently and so cannot cross-reference any other FROM item.)

Table functions appearing in FROM can also be preceded by the key word LATERAL, but for functions the key word is optional; the function's arguments can contain references to columns provided by preceding FROM items in any case.

A LATERAL item can appear at top level in the FROM list, or within a JOIN tree. In the latter case it can also refer to any items that are on the left-hand side of a JOIN that it is on the right-hand side of.

When a FROM item contains LATERAL cross-references, evaluation proceeds as follows: for each row of the FROM item providing the cross-referenced column(s), or set of rows of multiple FROM items providing the columns, the LATERAL item is evaluated using that row or row set's values of the columns. The resulting row(s) are joined as usual with the rows they were computed from. This is repeated for each row or set of rows from the column source table(s).

A trivial example of LATERAL is

SELECT * FROM foo, LATERAL (SELECT * FROM bar WHERE bar.id = foo.bar_id) ss;

This is not especially useful since it has exactly the same result as the more conventional

SELECT * FROM foo, bar WHERE bar.id = foo.bar_id;

LATERAL is primarily useful when the cross-referenced column is necessary for computing the row(s) to be joined. A common application is providing an argument value for a set-returning function. For example, supposing that vertices(polygon) returns the set of vertices of a polygon, we could identify close-together vertices of polygons stored in a table with:

SELECT p1.id, p2.id, v1, v2
FROM polygons p1, polygons p2,
     LATERAL vertices(p1.poly) v1,
     LATERAL vertices(p2.poly) v2
WHERE (v1 <-> v2) < 10 AND p1.id != p2.id;

This query could also be written

SELECT p1.id, p2.id, v1, v2
FROM polygons p1 CROSS JOIN LATERAL vertices(p1.poly) v1,
     polygons p2 CROSS JOIN LATERAL vertices(p2.poly) v2
WHERE (v1 <-> v2) < 10 AND p1.id != p2.id;

or in several other equivalent formulations. (As already mentioned, the LATERAL key word is unnecessary in this example, but we use it for clarity.)

It is often particularly handy to LEFT JOIN to a LATERAL subquery, so that source rows will appear in the result even if the LATERAL subquery produces no rows for them. For example, if get_product_names() returns the names of products made by a manufacturer, but some manufacturers in our table currently produce no products, we could find out which ones those are like this:

SELECT m.name
FROM manufacturers m LEFT JOIN LATERAL get_product_names(m.id) pname ON true
WHERE pname IS NULL;
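The examples above depend on tables and functions that are not defined in this section. A self-contained sketch of the same two ideas, using only VALUES lists and the built-in generate_series function (the aliases t, n, s, and i are illustrative), might look like this:

```sql
-- For each row of t, generate_series() is re-evaluated with that row's n.
SELECT t.n, s.i
    FROM (VALUES (1), (2), (3)) AS t(n),
         LATERAL generate_series(1, t.n) AS s(i)
    ORDER BY t.n, s.i;
-- 6 rows: (1,1), (2,1), (2,2), (3,1), (3,2), (3,3)

-- LEFT JOIN keeps n = 0 even though the LATERAL item yields no rows for it.
SELECT t.n, s.i
    FROM (VALUES (0), (2)) AS t(n)
         LEFT JOIN LATERAL generate_series(1, t.n) AS s(i) ON true
    ORDER BY t.n, s.i;
-- 3 rows: (0, NULL), (2, 1), (2, 2)
```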

7.2.2. The `WHERE` Clause

The syntax of the WHERE clause is

WHERE search_condition

where search_condition is any value expression (see Section 4.2) that returns a value of type boolean.

After the processing of the FROM clause is done, each row of the derived virtual table is checked against the search condition. If the result of the condition is true, the row is kept in the output table, otherwise (i.e., if the result is false or null) it is discarded. The search condition typically references at least one column of the table generated in the FROM clause; this is not required, but otherwise the WHERE clause will be fairly useless.

Note

The join condition of an inner join can be written either in the WHERE clause or in the JOIN clause. For example, these table expressions are equivalent:

FROM a, b WHERE a.id = b.id AND b.val > 5

and:

FROM a INNER JOIN b ON (a.id = b.id) WHERE b.val > 5

or perhaps even:

FROM a NATURAL JOIN b WHERE b.val > 5

Which one of these you use is mainly a matter of style. The JOIN syntax in the FROM clause is probably not as portable to other SQL database management systems, even though it is in the SQL standard. For outer joins there is no choice: they must be done in the FROM clause. The ON or USING clause of an outer join is not equivalent to a WHERE condition, because it results in the addition of rows (for unmatched input rows) as well as the removal of rows in the final result.
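The difference between an outer join's ON condition and a WHERE condition can be seen in a small sketch. The temporary tables a and b and their contents are hypothetical, created only for this illustration:

```sql
CREATE TEMP TABLE a (id int);
CREATE TEMP TABLE b (id int, val int);
INSERT INTO a VALUES (1), (2), (3);
INSERT INTO b VALUES (1, 10), (2, 3);

-- Condition in ON: unmatched rows of a are kept, padded with nulls.
SELECT a.id, b.val
    FROM a LEFT JOIN b ON a.id = b.id AND b.val > 5
    ORDER BY a.id;
-- 3 rows: (1, 10), (2, NULL), (3, NULL)

-- Condition in WHERE: the null-extended rows are filtered out again.
SELECT a.id, b.val
    FROM a LEFT JOIN b ON a.id = b.id
    WHERE b.val > 5
    ORDER BY a.id;
-- 1 row: (1, 10)
```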

Here are some examples of WHERE clauses:

SELECT ... FROM fdt WHERE c1 > 5

SELECT ... FROM fdt WHERE c1 IN (1, 2, 3)

SELECT ... FROM fdt WHERE c1 IN (SELECT c1 FROM t2)

SELECT ... FROM fdt WHERE c1 IN (SELECT c3 FROM t2 WHERE c2 = fdt.c1 + 10)

SELECT ... FROM fdt WHERE c1 BETWEEN (SELECT c3 FROM t2 WHERE c2 = fdt.c1 + 10) AND 100

SELECT ... FROM fdt WHERE EXISTS (SELECT c1 FROM t2 WHERE c2 > fdt.c1)

fdt is the table derived in the FROM clause. Rows that do not meet the search condition of the WHERE clause are eliminated from fdt. Notice the use of scalar subqueries as value expressions. Just like any other query, the subqueries can employ complex table expressions. Notice also how fdt is referenced in the subqueries. Qualifying c1 as fdt.c1 is only necessary if c1 is also the name of a column in the derived input table of the subquery. But qualifying the column name adds clarity even when it is not needed. This example shows how the column naming scope of an outer query extends into its inner queries.

7.2.3. The `GROUP BY` and `HAVING` Clauses

After passing the WHERE filter, the derived input table might be subject to grouping, using the GROUP BY clause, and elimination of group rows using the HAVING clause.

SELECT select_list
    FROM ...
    [WHERE ...]
    GROUP BY grouping_column_reference [, grouping_column_reference]...

The GROUP BY clause is used to group together those rows in a table that have the same values in all the columns listed. The order in which the columns are listed does not matter. The effect is to combine each set of rows having common values into one group row that represents all rows in the group. This is done to eliminate redundancy in the output and/or compute aggregates that apply to these groups. For instance:

=> SELECT * FROM test1;
 x | y
---+---
 a | 3
 c | 2
 b | 5
 a | 1
(4 rows)

=> SELECT x FROM test1 GROUP BY x;
 x
---
 a
 b
 c
(3 rows)

In the second query, we could not have written SELECT * FROM test1 GROUP BY x, because there is no single value for the column y that could be associated with each group. The grouped-by columns can be referenced in the select list since they have a single value in each group.

In general, if a table is grouped, columns that are not listed in GROUP BY cannot be referenced except in aggregate expressions. An example with aggregate expressions is:

=> SELECT x, sum(y) FROM test1 GROUP BY x;
 x | sum
---+-----
 a |   4
 b |   5
 c |   2
(3 rows)

Here sum is an aggregate function that computes a single value over the entire group. More information about the available aggregate functions can be found in Section 9.20.

Tip

Grouping without aggregate expressions effectively calculates the set of distinct values in a column. This can also be achieved using the DISTINCT clause (see Section 7.3.3).

Here is another example: it calculates the total sales for each product (rather than the total sales of all products):

SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
    FROM products p LEFT JOIN sales s USING (product_id)
    GROUP BY product_id, p.name, p.price;

In this example, the columns product_id, p.name, and p.price must be in the GROUP BY clause since they are referenced in the query select list (but see below). The column s.units does not have to be in the GROUP BY list since it is only used in an aggregate expression (sum(...)), which represents the sales of a product. For each product, the query returns a summary row about all sales of the product.

If the products table is set up so that, say, product_id is the primary key, then it would be enough to group by product_id in the above example, since name and price would be functionally dependent on the product ID, and so there would be no ambiguity about which name and price value to return for each product ID group.
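A sketch of this functional-dependency rule follows. The table definitions and sample rows are assumptions made for illustration; the only essential part is that product_id is declared as the primary key of products:

```sql
CREATE TEMP TABLE products (
    product_id int PRIMARY KEY,
    name       text,
    price      numeric
);
CREATE TEMP TABLE sales (product_id int, units int);
INSERT INTO products VALUES (1, 'widget', 2.50), (2, 'gadget', 10.00);
INSERT INTO sales VALUES (1, 4), (1, 6), (2, 1);

-- name and price are functionally dependent on the primary key,
-- so grouping by p.product_id alone is accepted.
SELECT p.product_id, p.name, (sum(s.units) * p.price) AS sales
    FROM products p LEFT JOIN sales s USING (product_id)
    GROUP BY p.product_id
    ORDER BY p.product_id;
-- widget: 10 units * 2.50 = 25.00; gadget: 1 unit * 10.00 = 10.00
```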

In strict SQL, GROUP BY can only group by columns of the source table but PostgreSQL extends this to also allow GROUP BY to group by columns in the select list. Grouping by value expressions instead of simple column names is also allowed.

If a table has been grouped using GROUP BY, but only certain groups are of interest, the HAVING clause can be used, much like a WHERE clause, to eliminate groups from the result. The syntax is:

SELECT select_list FROM ... [WHERE ...] GROUP BY ... HAVING boolean_expression

Expressions in the HAVING clause can refer both to grouped expressions and to ungrouped expressions (which necessarily involve an aggregate function).

Example:

=> SELECT x, sum(y) FROM test1 GROUP BY x HAVING sum(y) > 3;
 x | sum
---+-----
 a |   4
 b |   5
(2 rows)

=> SELECT x, sum(y) FROM test1 GROUP BY x HAVING x < 'c';
 x | sum
---+-----
 a |   4
 b |   5
(2 rows)

Again, a more realistic example:

SELECT product_id, p.name, (sum(s.units) * (p.price - p.cost)) AS profit
    FROM products p LEFT JOIN sales s USING (product_id)
    WHERE s.date > CURRENT_DATE - INTERVAL '4 weeks'
    GROUP BY product_id, p.name, p.price, p.cost
    HAVING sum(p.price * s.units) > 5000;

In the example above, the WHERE clause is selecting rows by a column that is not grouped (the expression is only true for sales during the last four weeks), while the HAVING clause restricts the output to groups with total gross sales over 5000. Note that the aggregate expressions do not necessarily need to be the same in all parts of the query.

If a query contains aggregate function calls, but no GROUP BY clause, grouping still occurs: the result is a single group row (or perhaps no rows at all, if the single row is then eliminated by HAVING). The same is true if it contains a HAVING clause, even without any aggregate function calls or GROUP BY clause.
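Both behaviors can be observed with the built-in generate_series function, which needs no tables. This is only a small sketch:

```sql
-- No input rows, yet the aggregate still produces one group row.
SELECT count(*) FROM generate_series(1, 0);
-- 1 row: 0

-- HAVING can eliminate that single group row.
SELECT count(*) FROM generate_series(1, 3) HAVING count(*) > 5;
-- 0 rows
```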

7.2.4. `GROUPING SETS`, `CUBE`, and `ROLLUP`

More complex grouping operations than those described above are possible using the concept of grouping sets. The data selected by the FROM and WHERE clauses is grouped separately by each specified grouping set, aggregates computed for each group just as for simple GROUP BY clauses, and then the results returned. For example:

=> SELECT * FROM items_sold;
 brand | size | sales
-------+------+-------
 Foo   | L    |    10
 Foo   | M    |    20
 Bar   | M    |    15
 Bar   | L    |     5
(4 rows)

=> SELECT brand, size, sum(sales) FROM items_sold GROUP BY GROUPING SETS ((brand), (size), ());
 brand | size | sum
-------+------+-----
 Foo   |      |  30
 Bar   |      |  20
       | L    |  15
       | M    |  35
       |      |  50
(5 rows)

Each sublist of GROUPING SETS may specify zero or more columns or expressions and is interpreted the same way as though it were directly in the GROUP BY clause. An empty grouping set means that all rows are aggregated down to a single group (which is output even if no input rows were present), as described above for the case of aggregate functions with no GROUP BY clause.

References to the grouping columns or expressions are replaced by null values in result rows for grouping sets in which those columns do not appear. To distinguish which grouping a particular output row resulted from, see Table 9.56.
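As a sketch of how the GROUPING operation tells these nulls apart from real data, the following recreates items_sold as a temporary table (the column types are an assumption) and adds one indicator per grouping column. GROUPING returns 0 when the column is part of the current grouping set and 1 when it is not:

```sql
CREATE TEMP TABLE items_sold (brand text, size text, sales int);
INSERT INTO items_sold VALUES
    ('Foo', 'L', 10), ('Foo', 'M', 20), ('Bar', 'M', 15), ('Bar', 'L', 5);

SELECT brand, size, sum(sales),
       GROUPING(brand) AS g_brand,   -- 1 when brand is not grouped
       GROUPING(size)  AS g_size     -- 1 when size is not grouped
    FROM items_sold
    GROUP BY GROUPING SETS ((brand), (size), ());
-- Rows grouped by brand have (g_brand, g_size) = (0, 1),
-- rows grouped by size have (1, 0), and the grand total has (1, 1).
```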

A shorthand notation is provided for specifying two common types of grouping set. A clause of the form

ROLLUP ( e1, e2, e3, ... )

represents the given list of expressions and all prefixes of the list including the empty list; thus it is equivalent to

GROUPING SETS (
    ( e1, e2, e3, ... ),
    ...
    ( e1, e2 ),
    ( e1 ),
    ( )
)

This is commonly used for analysis over hierarchical data; e.g. total salary by department, division, and company-wide total.

A clause of the form

CUBE ( e1, e2, ... )

represents the given list and all of its possible subsets (i.e. the power set). Thus

CUBE ( a, b, c )

is equivalent to

GROUPING SETS (
    ( a, b, c ),
    ( a, b    ),
    ( a,    c ),
    ( a       ),
    (    b, c ),
    (    b    ),
    (       c ),
    (         )
)

The individual elements of a CUBE or ROLLUP clause may be either individual expressions, or sublists of elements in parentheses. In the latter case, the sublists are treated as single units for the purposes of generating the individual grouping sets. For example:

CUBE ( (a, b), (c, d) )

is equivalent to

GROUPING SETS (
    ( a, b, c, d ),
    ( a, b       ),
    (       c, d ),
    (            )
)

and

ROLLUP ( a, (b, c), d )

is equivalent to

GROUPING SETS (
    ( a, b, c, d ),
    ( a, b, c    ),
    ( a          ),
    (            )
)

The CUBE and ROLLUP constructs can be used either directly in the GROUP BY clause, or nested inside a GROUPING SETS clause. If one GROUPING SETS clause is nested inside another, the effect is the same as if all the elements of the inner clause had been written directly in the outer clause.

If multiple grouping items are specified in a single GROUP BY clause, then the final list of grouping sets is the cross product of the individual items. For example:

GROUP BY a, CUBE (b, c), GROUPING SETS ((d), (e))

is equivalent to

GROUP BY GROUPING SETS (
    (a, b, c, d), (a, b, c, e),
    (a, b, d),    (a, b, e),
    (a, c, d),    (a, c, e),
    (a, d),       (a, e)
)

Note

The construct (a, b) is normally recognized in expressions as a row constructor. Within the GROUP BY clause, this does not apply at the top levels of expressions, and (a, b) is parsed as a list of expressions as described above. If for some reason you need a row constructor in a grouping expression, use ROW(a, b).

7.2.5. Window Function Processing

If the query contains any window functions (see Section 3.5, Section 9.21 and Section 4.2.8), these functions are evaluated after any grouping, aggregation, and HAVING filtering is performed. That is, if the query uses any aggregates, GROUP BY, or HAVING, then the rows seen by the window functions are the group rows instead of the original table rows from FROM/WHERE.

When multiple window functions are used, all the window functions having syntactically equivalent PARTITION BY and ORDER BY clauses in their window definitions are guaranteed to be evaluated in a single pass over the data. Therefore they will see the same sort ordering, even if the ORDER BY does not uniquely determine an ordering. However, no guarantees are made about the evaluation of functions having different PARTITION BY or ORDER BY specifications. (In such cases a sort step is typically required between the passes of window function evaluations, and the sort is not guaranteed to preserve ordering of rows that its ORDER BY sees as equivalent.)

Currently, window functions always require presorted data, and so the query output will be ordered according to one or another of the window functions' PARTITION BY/ORDER BY clauses. It is not recommended to rely on this, however. Use an explicit top-level ORDER BY clause if you want to be sure the results are sorted in a particular way.
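A short sketch of the evaluation order described in this section, reusing the test1 data from Section 7.2.3 (recreated here as a temporary table with assumed column types): the window function ranks the group rows produced by GROUP BY, not the original table rows.

```sql
CREATE TEMP TABLE test1 (x text, y int);
INSERT INTO test1 VALUES ('a', 3), ('c', 2), ('b', 5), ('a', 1);

-- rank() sees the three group rows, not the four table rows.
SELECT x, sum(y) AS total, rank() OVER (ORDER BY sum(y) DESC) AS rnk
    FROM test1
    GROUP BY x
    ORDER BY rnk;   -- explicit top-level ORDER BY, as recommended above
-- b | 5 | 1
-- a | 4 | 2
-- c | 2 | 3
```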

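4.2. Value Expressions

Value expressions are used in a variety of contexts, such as in the target list of the SELECT command, as new column values in INSERT or UPDATE, or in search conditions in a number of commands. The result of a value expression is sometimes called a scalar, to distinguish it from the result of a table expression (which is a table). Value expressions are therefore also called scalar expressions (or even simply expressions). The expression syntax allows the calculation of values from primitive parts using arithmetic, logical, set, and other operations.

A value expression is one of the following:

A constant or literal value
A column reference
A positional parameter reference, in the body of a function definition or prepared statement
A subscripted expression
A field selection expression
An operator invocation
A function call
An aggregate expression
A window function call
A type cast
A collation expression
A scalar subquery
An array constructor
A row constructor
Another value expression in parentheses (used to group subexpressions and override precedence)

In addition to this list, there are a number of constructs that can be classified as an expression but do not follow any general syntax rules. These generally have the semantics of a function or operator and are explained in the appropriate location in Chapter 9. An example is the IS NULL clause.

We have already discussed constants in Section 4.1.2. The following sections discuss the remaining options.

4.2.1. Column References

A column can be referenced in the form:

correlation.columnname

correlation is the name of a table (possibly qualified with a schema name), or an alias for a table defined by means of a FROM clause. The correlation name and separating dot can be omitted if the column name is unique across all the tables being used in the current query. (See also Chapter 7.)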

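4.2.2. Positional Parameters

A positional parameter reference is used to indicate a value that is supplied externally to an SQL statement. Parameters are used in SQL function definitions and in prepared queries. Some client libraries also support specifying data values separately from the SQL command string, in which case parameters are used to refer to the out-of-line data values. The form of a parameter reference is:

$number

For example, consider the definition of a function, dept, as: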

CREATE FUNCTION dept(text) RETURNS dept
    AS $$ SELECT * FROM dept WHERE name = $1 $$
    LANGUAGE SQL;

Here the $1 references the value of the first function argument whenever the function is invoked.
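Positional parameters work the same way in prepared statements. In the following sketch, the statement name double_it and the output column name doubled are hypothetical:

```sql
-- $1 refers to the first (and only) parameter, declared here as int.
PREPARE double_it(int) AS SELECT $1 * 2 AS doubled;

EXECUTE double_it(21);
-- doubled = 42
```

The prepared statement lasts until the end of the session unless it is removed earlier with DEALLOCATE.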

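4.2.3. Subscripts

If an expression yields a value of an array type, then a specific element of the array value can be extracted by writing

expression[subscript]

or multiple adjacent elements (an "array slice") can be extracted by writing

expression[lower_subscript:upper_subscript]

Each subscript is itself an expression, which must yield an integer value.

In general the array expression must be parenthesized, but the parentheses can be omitted when the expression to be subscripted is just a column reference or positional parameter. Also, multiple subscripts can be concatenated when the original array is multidimensional. For example: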

mytable.arraycolumn[4]
mytable.two_d_column[17][34]
$1[10:42]
(arrayfunction(a,b))[42]

The parentheses in the last example are required. See Section 8.15 for more about arrays.
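The forms above refer to tables and functions that are not defined here. A self-contained sketch using array constructors (the output column names are illustrative) shows element access, a slice, and concatenated subscripts:

```sql
SELECT (ARRAY[10, 20, 30, 40])[2]     AS second_element,  -- 20
       (ARRAY[10, 20, 30, 40])[2:3]   AS middle_slice,    -- {20,30}
       (ARRAY[[1, 2], [3, 4]])[2][1]  AS row2_col1;       -- 3
```

The parentheses are needed in each case because the subscripted expression is neither a column reference nor a positional parameter.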

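4.2.4. Field Selection

If an expression yields a value of a composite type (row type), then a specific field of the row can be extracted by writing

expression.fieldname

In general the row expression must be parenthesized, but the parentheses can be omitted when the expression to be selected from is just a table reference or positional parameter. For example: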

mytable.mycolumn
$1.somecolumn
(rowfunction(a,b)).col3

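(Thus, a qualified column reference is actually just a special case of the field selection syntax.) An important special case is extracting a field from a table column that is of a composite type: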

(compositecol).somefield
(mytable.compositecol).somefield

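The parentheses are required here to show that compositecol is a column name and not a table name, or, in the second case, that mytable is a table name and not a schema name.

You can ask for all fields of a composite value by writing .*:

(compositecol).*

This notation behaves differently depending on context; see Section 8.16.5 for details.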

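4.2.5. Operator Invocations

There are three possible syntaxes for an operator invocation:

expression operator expression (binary infix operator)

operator expression (unary prefix operator)

expression operator (unary postfix operator)

where the operator token follows the syntax rules of Section 4.1.3, or is one of the key words AND, OR, and NOT, or is a qualified operator name in the form:

OPERATOR(schema.operatorname)

Which particular operators exist and whether they are unary or binary depends on what operators have been defined by the system or the user. Chapter 9 describes the built-in operators.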
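As a brief sketch, the qualified form can name the built-in integer addition operator in the pg_catalog schema explicitly; it behaves like the plain infix operator:

```sql
SELECT 3 + 4                      AS plain_form,      -- 7
       3 OPERATOR(pg_catalog.+) 4 AS qualified_form;  -- 7
```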

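4.2.6. Function Calls

The syntax for a function call is the name of a function (possibly qualified with a schema name), followed by its argument list enclosed in parentheses:

function_name ([expression [, expression ... ]] )

For example, the following computes the square root of 2:

sqrt(2)

The list of built-in functions is in Chapter 9. Other functions can be added by the user.

The arguments can optionally have names attached. See Section 4.3 for details.

Note

A function that takes a single argument of composite type can optionally be called using field-selection syntax, and conversely field selection can be written in functional style. That is, the notations col(table) and table.col are interchangeable. This behavior is not SQL-standard but is provided in PostgreSQL because it allows use of functions to emulate "computed fields". For more information see Section 8.16.5.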

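4.2.7. Aggregate Expressions

An aggregate expression represents the application of an aggregate function across the rows selected by a query. An aggregate function reduces multiple inputs to a single output value, such as the sum or average of the inputs. The syntax of an aggregate expression is one of the following: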

aggregate_name (expression [ , ... ] [ order_by_clause ] ) [ FILTER ( WHERE filter_clause ) ]

aggregate_name (ALL expression [ , ... ] [ order_by_clause ] ) [ FILTER ( WHERE filter_clause ) ]

aggregate_name (DISTINCT expression [ , ... ] [ order_by_clause ] ) [ FILTER ( WHERE filter_clause ) ]

aggregate_name ( * ) [ FILTER ( WHERE filter_clause ) ]

aggregate_name ( [ expression [ , ... ] ] ) WITHIN GROUP ( order_by_clause ) [ FILTER ( WHERE filter_clause ) ]

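where aggregate_name is a previously defined aggregate (possibly qualified with a schema name) and expression is any value expression that does not itself contain an aggregate expression or a window function call. The optional order_by_clause and filter_clause are described below.

The first form of aggregate expression invokes the aggregate once for each input row. The second form is the same as the first, since ALL is the default. The third form invokes the aggregate once for each distinct value of the expression (or distinct set of values, for multiple expressions) found in the input rows. The fourth form invokes the aggregate once for each input row; since no particular input value is specified, it is generally only useful for the count(*) aggregate function. The last form is used with ordered-set aggregate functions, which are described below.

Most aggregate functions ignore null inputs, so that rows in which one or more of the expression(s) yield null are discarded. This can be assumed to be true, unless otherwise specified, for all built-in aggregates.

For example, count(*) yields the total number of input rows; count(f1) yields the number of input rows in which f1 is non-null, since count ignores nulls; and count(distinct f1) yields the number of distinct non-null values of f1.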
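The three counts can be compared side by side. This sketch uses an inline VALUES list with one duplicate and one null; the alias t and the output column names are illustrative:

```sql
SELECT count(*)           AS all_rows,           -- 4
       count(f1)          AS non_null,           -- 3
       count(DISTINCT f1) AS distinct_non_null   -- 2
    FROM (VALUES (1), (1), (2), (NULL)) AS t(f1);
```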

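Ordinarily, the input rows are fed to the aggregate function in an unspecified order. In many cases this does not matter; for example, min produces the same result no matter what order it receives the inputs in. However, some aggregate functions (such as array_agg and string_agg) produce results that depend on the ordering of the input rows. When using such an aggregate, the optional order_by_clause can be used to specify the desired ordering. The order_by_clause has the same syntax as for a query-level ORDER BY clause, as described in Section 7.5, except that its expressions are always just expressions and cannot be output-column names or numbers. For example: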

SELECT array_agg(a ORDER BY b DESC) FROM table;

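When dealing with multiple-argument aggregate functions, note that the ORDER BY clause goes after all the aggregate arguments. For example, write this:

SELECT string_agg(a, ',' ORDER BY a) FROM table;

not this:

SELECT string_agg(a ORDER BY a, ',') FROM table;  -- incorrect

The latter is syntactically valid, but it represents a call of a single-argument aggregate function with two ORDER BY keys (the second one being rather useless since it is a constant).

If DISTINCT is specified in addition to an order_by_clause, then all the ORDER BY expressions must match regular arguments of the aggregate; that is, you cannot sort on an expression that is not included in the DISTINCT list.

Note

The ability to specify both DISTINCT and ORDER BY in an aggregate function is a PostgreSQL extension.

Placing ORDER BY within the aggregate's regular argument list, as described so far, is used when ordering the input rows for general-purpose and statistical aggregates, for which ordering is optional. There is a subclass of aggregate functions called ordered-set aggregates for which an order_by_clause is required, usually because the aggregate's computation is only sensible in terms of a specific ordering of its input rows. Typical examples of ordered-set aggregates include rank and percentile calculations. For an ordered-set aggregate, the order_by_clause is written inside WITHIN GROUP (...), as shown in the final syntax alternative above. The expressions in the order_by_clause are evaluated once per input row just like regular aggregate arguments, sorted as per the order_by_clause's requirements, and fed to the aggregate function as input arguments. (This is unlike the case for a non-WITHIN GROUP order_by_clause, which is not treated as argument(s) to the aggregate function.) The argument expressions preceding WITHIN GROUP, if any, are called direct arguments to distinguish them from the aggregated arguments listed in the order_by_clause. Unlike regular aggregate arguments, direct arguments are evaluated only once per aggregate call, not once per input row. This means that they can contain variables only if those variables are grouped by GROUP BY; this restriction is the same as if the direct arguments were not inside an aggregate expression at all. Direct arguments are typically used for things like percentile fractions, which only make sense as a single value per aggregation calculation. The direct argument list can be empty; in this case, write just () not (*). (PostgreSQL will actually accept either spelling, but only the first way conforms to the SQL standard.)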

An example of an ordered-set aggregate call is:

SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY income) FROM households;

 percentile_cont
-----------------
           50489

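which obtains the 50th percentile, or median, value of the income column from table households. Here, 0.5 is a direct argument; it would make no sense for the percentile fraction to be a value varying across rows.

If FILTER is specified, then only the input rows for which the filter_clause evaluates to true are fed to the aggregate function; other rows are discarded. For example: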

SELECT
    count(*) AS unfiltered,
    count(*) FILTER (WHERE i < 5) AS filtered
FROM generate_series(1,10) AS s(i);

 unfiltered | filtered
------------+----------
         10 |        4
(1 row)

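The predefined aggregate functions are described in Section 9.20. Other aggregate functions can be added by the user.

An aggregate expression can only appear in the result list or HAVING clause of a SELECT command. It is forbidden in other clauses, such as WHERE, because those clauses are logically evaluated before the results of aggregates are formed.

When an aggregate expression appears in a subquery (see Section 4.2.11 and Section 9.22), the aggregate is normally evaluated over the rows of the subquery. But an exception occurs if the aggregate's arguments (and filter_clause if any) contain only outer-level variables: the aggregate then belongs to the nearest such outer level, and is evaluated over the rows of that query. The aggregate expression as a whole is then an outer reference for the subquery it appears in, and acts as a constant over any one evaluation of that subquery. The restriction about appearing only in the result list or HAVING clause applies with respect to the query level that the aggregate belongs to.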

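4.2.8. Window Function Calls

A window function call represents the application of an aggregate-like function over some portion of the rows selected by a query. Unlike non-window aggregate calls, this is not tied to grouping of the selected rows into a single output row; each row remains separate in the query output. However, the window function has access to all the rows that would be part of the current row's group according to the grouping specification (PARTITION BY list) of the window function call. The syntax of a window function call is one of the following:

function_name ([expression [, expression ... ]]) [ FILTER ( WHERE filter_clause ) ] OVER window_name
function_name ([expression [, expression ... ]]) [ FILTER ( WHERE filter_clause ) ] OVER ( window_definition )
function_name ( * ) [ FILTER ( WHERE filter_clause ) ] OVER window_name
function_name ( * ) [ FILTER ( WHERE filter_clause ) ] OVER ( window_definition )

where window_definition has the syntax

[ existing_window_name ][ PARTITION BY expression [, ...] ]
[ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
[ frame_clause ]

The optional frame_clause can be one of

{ RANGE | ROWS } frame_start
{ RANGE | ROWS } BETWEEN frame_start AND frame_end

where frame_start and frame_end can be one of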

UNBOUNDED PRECEDING
value PRECEDING 
CURRENT ROW
value FOLLOWING 
UNBOUNDED FOLLOWING

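Here, expression represents any value expression that does not itself contain window function calls.

window_name is a reference to a named window specification defined in the query's WINDOW clause. Alternatively, a full window_definition can be given within parentheses, using the same syntax as for defining a named window in the WINDOW clause; see the SELECT reference page for details. It is worth pointing out that OVER wname is not exactly equivalent to OVER (wname ...); the latter implies copying and modifying the window definition, and will be rejected if the referenced window specification includes a frame clause.

The PARTITION BY clause groups the rows of the query into partitions, which are processed separately by the window function. PARTITION BY works similarly to a query-level GROUP BY clause, except that its expressions are always just expressions and cannot be output-column names or numbers. Without PARTITION BY, all rows produced by the query are treated as a single partition. The ORDER BY clause determines the order in which the rows of a partition are processed by the window function. It works similarly to a query-level ORDER BY clause, but likewise cannot use output-column names or numbers. Without ORDER BY, rows are processed in an unspecified order.

The frame_clause specifies the set of rows constituting the window frame, which is a subset of the current partition, for those window functions that act on the frame instead of the whole partition. The frame can be specified in either RANGE or ROWS mode; in either case, it runs from the frame_start to the frame_end. If frame_end is omitted, it defaults to CURRENT ROW.

A frame_start of UNBOUNDED PRECEDING means that the frame starts with the first row of the partition, and similarly a frame_end of UNBOUNDED FOLLOWING means that the frame ends with the last row of the partition.

In RANGE mode, a frame_start of CURRENT ROW means the frame starts with the current row's first peer row (a row that ORDER BY considers equivalent to the current row), while a frame_end of CURRENT ROW means the frame ends with the last equivalent ORDER BY peer. In ROWS mode, CURRENT ROW simply means the current row.

The value PRECEDING and value FOLLOWING cases are currently only allowed in ROWS mode. They indicate that the frame starts or ends the specified number of rows before or after the current row. value must be an integer expression not containing any variables, aggregate functions, or window functions. The value must not be null or negative; but it can be zero, which just selects the current row.

The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last ORDER BY peer. Without ORDER BY, all rows of the partition are included in the window frame, since all rows become peers of the current row.

Restrictions are that frame_start cannot be UNBOUNDED FOLLOWING, frame_end cannot be UNBOUNDED PRECEDING, and the frame_end choice cannot appear earlier in the above list than the frame_start choice; for example, RANGE BETWEEN CURRENT ROW AND value PRECEDING is not allowed.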
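The effect of the frame can be seen by comparing an explicit ROWS frame with the default frame. The following sketch uses generate_series as its input; the aliases s and i and the output column names are illustrative:

```sql
SELECT i,
       -- frame: the previous row, the current row, and the next row
       sum(i) OVER (ORDER BY i ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS moving_sum,
       -- default frame: partition start through the current row's last peer
       sum(i) OVER (ORDER BY i) AS running_sum
    FROM generate_series(1, 4) AS s(i)
    ORDER BY i;
--  i | moving_sum | running_sum
--  1 |          3 |           1
--  2 |          6 |           3
--  3 |          9 |           6
--  4 |          7 |          10
```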

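If FILTER is specified, then only the input rows for which the filter_clause evaluates to true are fed to the window function; other rows are discarded. Only window functions that are aggregates accept a FILTER clause.

The built-in window functions are described in Table 9.57. Other window functions can be added by the user. Also, any built-in or user-defined general-purpose or statistical aggregate can be used as a window function. (Ordered-set and hypothetical-set aggregates cannot presently be used as window functions.)

The syntaxes using * are used for calling parameter-less aggregate functions as window functions, for example count(*) OVER (PARTITION BY x ORDER BY y). The asterisk (*) is customarily not used for window-specific functions. Window-specific functions do not allow DISTINCT or ORDER BY to be used within the function argument list.

Window function calls are permitted only in the SELECT list and the ORDER BY clause of the query.

More information about window functions can be found in Section 3.5, Section 9.21, and Section 7.2.5.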

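4.2.9. Type Casts

A type cast specifies a conversion from one data type to another. PostgreSQL accepts two equivalent syntaxes for type casts:

CAST ( expression AS type )
expression::type

The CAST syntax conforms to SQL; the syntax with :: is historical PostgreSQL usage.

When a cast is applied to a value expression of a known type, it represents a run-time type conversion. The cast will succeed only if a suitable type conversion operation has been defined. Notice that this is subtly different from the use of casts with constants, as shown in Section 4.1.2.7. A cast applied to an unadorned string literal represents the initial assignment of a type to a literal constant value, and so it will succeed for any type (if the contents of the string literal are acceptable input syntax for the data type).

An explicit type cast can usually be omitted if there is no ambiguity as to the type that a value expression must produce (for example, when it is assigned to a table column); the system will automatically apply a type cast in such cases. However, automatic casting is only done for casts that are marked "OK to apply implicitly" in the system catalogs. Other casts must be invoked with explicit casting syntax. This restriction is intended to prevent surprising conversions from being applied silently.

It is also possible to specify a type cast using a function-like syntax:

typename ( expression )

However, this only works for types whose names are also valid as function names. For example, double precision cannot be used this way, but the equivalent float8 can. Also, the names interval, time, and timestamp can only be used in this fashion if they are double-quoted, because of syntactic conflicts. Therefore, the use of the function-like cast syntax leads to inconsistencies and should probably be avoided.

Note

The function-like syntax is in fact just a function call. When one of the two standard cast syntaxes is used to do a run-time conversion, it will internally invoke a registered function to perform the conversion. By convention, these conversion functions have the same name as their output type, and thus the "function-like syntax" is nothing more than a direct invocation of the underlying conversion function. Obviously, this is not something that a portable application should rely on. For further details see CREATE CAST.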

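4.2.10. Collation Expressions

The COLLATE clause overrides the collation of an expression. It is appended to the expression it applies to:

expr COLLATE collation

where collation is a possibly schema-qualified identifier. The COLLATE clause binds tighter than operators; parentheses can be used when necessary.

If no collation is explicitly specified, the database system either derives a collation from the columns involved in the expression, or it defaults to the default collation of the database if no column is involved in the expression.

The two common uses of the COLLATE clause are overriding the sort order in an ORDER BY clause, for example: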

SELECT a, b, c FROM tbl WHERE ... ORDER BY a COLLATE "C";

and overriding the collation of a function or operator call that has locale-sensitive results, for example:

SELECT * FROM tbl WHERE a > 'foo' COLLATE "C";

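Notice that in the latter case the COLLATE clause is attached to an input argument of the operator we wish to affect. It does not matter which argument of the operator or function call the COLLATE clause is attached to, because the collation that is applied by the operator or function is derived by considering all arguments, and an explicit COLLATE clause will override the collations of all other arguments. (Attaching non-matching COLLATE clauses to more than one argument, however, is an error. For more details see Section 23.2.) Thus, this gives the same result as the previous example:

SELECT * FROM tbl WHERE a COLLATE "C" > 'foo';

But this is an error:

SELECT * FROM tbl WHERE (a > 'foo') COLLATE "C";

because it attempts to apply a collation to the result of the > operator, which is of the non-collatable data type boolean.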

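4.2.11. Scalar Subqueries

A scalar subquery is an ordinary SELECT query in parentheses that returns exactly one row with one column. (See Chapter 7 for information about writing queries.) The SELECT query is executed and the single returned value is used in the surrounding value expression. It is an error to use a query that returns more than one row or more than one column as a scalar subquery. (But if, during a particular execution, the subquery returns no rows, there is no error; the scalar result is taken to be null.) The subquery can refer to variables from the surrounding query, which will act as constants during any one evaluation of the subquery. See also Section 9.22 for other expressions involving subqueries.

For example, the following finds the largest city population in each state: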

SELECT name, (SELECT max(pop) FROM cities WHERE cities.state = states.name)
    FROM states;
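To try this query, the two tables can be sketched as temporary tables. The table layouts and every row below are made up for illustration (the state and city names are placeholders, not real data):

```sql
CREATE TEMP TABLE states (name text);
CREATE TEMP TABLE cities (name text, state text, pop int);
INSERT INTO states VALUES ('A'), ('B'), ('C');
INSERT INTO cities VALUES
    ('a1', 'A', 300), ('a2', 'A', 120), ('b1', 'B', 50);

SELECT name,
       (SELECT max(pop) FROM cities WHERE cities.state = states.name) AS largest_pop
    FROM states
    ORDER BY name;
-- A | 300
-- B | 50
-- C | NULL   (no cities: max() over zero rows yields null)
```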

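4.2.12. Array Constructors

An array constructor is an expression that builds an array value using values for its member elements. A simple array constructor consists of the key word ARRAY, a left square bracket [, a list of expressions (separated by commas) for the array element values, and finally a right square bracket ]. For example: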

SELECT ARRAY[1,2,3+4];
  array
---------
 {1,2,7}
(1 row)

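By default, the array element type is the common type of the member expressions, determined using the same rules as for UNION or CASE constructs (see Section 10.5). You can override this by explicitly casting the array constructor to the desired type, for example: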

SELECT ARRAY[1,2,22.7]::integer[];
  array
----------
 {1,2,23}
(1 row)

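This has the same effect as casting each expression to the array element type individually. For more on casting, see Section 4.2.9.

Multidimensional array values can be built by nesting array constructors. In the inner constructors, the key word ARRAY can be omitted. For example, these produce the same result: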

SELECT ARRAY[ARRAY[1,2], ARRAY[3,4]];
     array
---------------
 {{1,2},{3,4}}
(1 row)

SELECT ARRAY[[1,2],[3,4]];
     array
---------------
 {{1,2},{3,4}}
(1 row)

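Since multidimensional arrays must be rectangular, inner constructors at the same level must produce sub-arrays of identical dimensions. Any cast applied to the outer ARRAY constructor propagates automatically to all the inner constructors.

Multidimensional array constructor elements can be anything yielding an array of the proper kind, not only a sub-ARRAY construct. For example: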

CREATE TABLE arr(f1 int[], f2 int[]);

INSERT INTO arr VALUES (ARRAY[[1,2],[3,4]], ARRAY[[5,6],[7,8]]);

SELECT ARRAY[f1, f2, '{{9,10},{11,12}}'::int[]] FROM arr;
                     array
------------------------------------------------
 {{{1,2},{3,4}},{{5,6},{7,8}},{{9,10},{11,12}}}
(1 row)

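You can construct an empty array, but since it is impossible to have an array with no type, you must explicitly cast your empty array to the desired type. For example: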

SELECT ARRAY[]::integer[];
 array
-------
 {}
(1 row)

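It is also possible to construct an array from the results of a subquery. In this form, the array constructor is written with the key word ARRAY followed by a parenthesized (not bracketed) subquery. For example: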

SELECT ARRAY(SELECT oid FROM pg_proc WHERE proname LIKE 'bytea%');
                                 array
-----------------------------------------------------------------------
 {2011,1954,1948,1952,1951,1244,1950,2005,1949,1953,2006,31,2412,2413}
(1 row)

SELECT ARRAY(SELECT ARRAY[i, i*2] FROM generate_series(1,5) AS a(i));
              array
----------------------------------
 {{1,2},{2,4},{3,6},{4,8},{5,10}}
(1 row)

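The subquery must return a single column. If the subquery's output column is of a non-array type, the resulting one-dimensional array will have an element for each row in the subquery result, with an element type matching that of the subquery's output column. If the subquery's output column is of an array type, the result will be an array of the same type but one higher dimension; in this case all the subquery rows must yield arrays of identical dimensionality, else the result would not be rectangular.

The subscripts of an array value built with ARRAY always begin with one. For more information about arrays, see Section 8.15.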

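4.2.13. Row Constructors

A row constructor is an expression that builds a row value (also called a composite value) using values for its member fields. A row constructor consists of the key word ROW, a left parenthesis, zero or more expressions (separated by commas) for the row field values, and finally a right parenthesis. For example: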

SELECT ROW(1,2.5,'this is a test');

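The key word ROW is optional when there is more than one expression in the list.

A row constructor can include the syntax rowvalue.*, which will be expanded to a list of the elements of the row value, just as occurs when the .* syntax is used at the top level of a SELECT list (see Section 8.16.5). For example, if table t has columns f1 and f2, these are the same: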

SELECT ROW(t.*, 42) FROM t;
SELECT ROW(t.f1, t.f2, 42) FROM t;

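Note

Before PostgreSQL 8.2, the .* syntax was not expanded in row constructors, so that writing ROW(t.*, 42) created a two-field row whose first field was another row value. The new behavior is usually more useful. If you need the old behavior of nested row values, write the inner row value without .*, for instance ROW(t, 42).

By default, the value created by a ROW expression is of an anonymous record type. If necessary, it can be cast to a named composite type: either the row type of a table, or a composite type created with CREATE TYPE AS. An explicit cast might be needed to avoid ambiguity. For example: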

CREATE TABLE mytable(f1 int, f2 float, f3 text);

CREATE FUNCTION getf1(mytable) RETURNS int AS 'SELECT $1.f1' LANGUAGE SQL;

-- No cast needed since only one getf1() exists
SELECT getf1(ROW(1,2.5,'this is a test'));
 getf1
-------
     1
(1 row)

CREATE TYPE myrowtype AS (f1 int, f2 text, f3 numeric);

CREATE FUNCTION getf1(myrowtype) RETURNS int AS 'SELECT $1.f1' LANGUAGE SQL;

-- Now we need a cast to indicate which function to call:
SELECT getf1(ROW(1,2.5,'this is a test'));
ERROR:  function getf1(record) is not unique

SELECT getf1(ROW(1,2.5,'this is a test')::mytable);
 getf1
-------
     1
(1 row)

SELECT getf1(CAST(ROW(11,'this is a test',2.5) AS myrowtype));
 getf1
-------
    11
(1 row)

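Row constructors can be used to build composite values to be stored in a composite-type table column, or to be passed to a function that accepts a composite parameter. Also, it is possible to compare two row values or test a row with IS NULL or IS NOT NULL, for example: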

SELECT ROW(1,2.5,'this is a test') = ROW(1, 3, 'not the same');

SELECT ROW(table.*) IS NULL FROM table;  -- detect all-null rows

For more detail see Section 9.23. Row constructors can also be used in connection with subqueries, as discussed in Section 9.22.

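4.2.14. Expression Evaluation Rules

The order of evaluation of subexpressions is not defined. In particular, the inputs of an operator or function are not necessarily evaluated left-to-right or in any other fixed order.

Furthermore, if the result of an expression can be determined by evaluating only some parts of it, then other subexpressions might not be evaluated at all. For instance, if one wrote:

SELECT true OR somefunc();

then somefunc() would (probably) not be called at all. The same would be the case if one wrote: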

SELECT somefunc() OR true;

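Note that this is not the same as the left-to-right "short-circuiting" of Boolean operators that is found in some programming languages.

As a consequence, it is unwise to use functions with side effects as part of complex expressions. It is particularly dangerous to rely on side effects or evaluation order in WHERE and HAVING clauses, since those clauses are extensively reprocessed as part of developing an execution plan. Boolean expressions (AND/OR/NOT combinations) in those clauses can be reorganized in any manner allowed by the laws of Boolean algebra.

When it is essential to force evaluation order, a CASE construct (see Section 9.17) can be used. For example, this is an untrustworthy way of trying to avoid division by zero in a WHERE clause:

SELECT ... WHERE x > 0 AND y/x > 1.5;

But this is safe:

SELECT ... WHERE CASE WHEN x > 0 THEN y/x > 1.5 ELSE false END;

A CASE construct used in this fashion will defeat optimization attempts, so it should only be done when necessary. (In this particular example, it would be better to sidestep the problem by writing y > 1.5*x instead.)

CASE is not a cure-all for such issues, however. One limitation of the technique illustrated above is that it does not prevent early evaluation of constant subexpressions. As described in Section 37.6, functions and operators marked IMMUTABLE can be evaluated when the query is planned rather than when it is executed. Thus for example

SELECT CASE WHEN x > 0 THEN x ELSE 1/0 END FROM tab;

is likely to result in a division-by-zero failure due to the planner trying to simplify the constant subexpression, even if every row in the table has x > 0 so that the ELSE arm would never be entered at run time.

While that particular example might seem silly, related cases that do not obviously involve constants can occur in queries executed within functions, since the values of function arguments and local variables can be inserted into queries as constants for planning purposes. Within PL/pgSQL functions, for example, using an IF-THEN-ELSE statement to protect a risky computation is much safer than just nesting it in a CASE expression.

Another limitation of the same kind is that a CASE cannot prevent evaluation of an aggregate expression contained within it, because aggregate expressions are computed before other expressions in a SELECT list or HAVING clause are considered. For example, the following query can cause a division-by-zero error despite seemingly having protected against it: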

SELECT CASE WHEN min(employees) > 0
            THEN avg(expenses / employees)
       END
    FROM departments;

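The min() and avg() aggregates are computed concurrently over all the input rows, so if any row has employees equal to zero, the division-by-zero error will occur before there is any opportunity to test the result of min(). Instead, use a WHERE or FILTER clause to prevent problematic input rows from reaching an aggregate function in the first place.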
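A sketch of both remedies follows. The departments table is recreated as a temporary table with assumed column types and made-up rows, one of which has zero employees:

```sql
CREATE TEMP TABLE departments (name text, employees int, expenses numeric);
INSERT INTO departments VALUES ('a', 4, 100), ('b', 0, 50), ('c', 2, 30);

-- WHERE removes the zero-employee row before any aggregate input is computed.
SELECT avg(expenses / employees) AS avg_per_employee
    FROM departments
    WHERE employees > 0;

-- FILTER does the same for one aggregate only, leaving other
-- aggregates in the same query free to see every row.
SELECT avg(expenses / employees) FILTER (WHERE employees > 0) AS avg_per_employee,
       count(*) AS all_departments
    FROM departments;
-- Both queries average 100/4 = 25 and 30/2 = 15, giving 20.
```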

9.13. Text Search Functions and Operators

Table 9.40, Table 9.41 and Table 9.42 summarize the functions and operators that are provided for full text searching. See Chapter 12 for a detailed explanation of PostgreSQL's text search facility.

Table 9.40. Text Search Operators

| Operator | Return Type | Description | Example | Result |
| --- | --- | --- | --- | --- |
| `@@` | boolean | tsvector matches tsquery ? | `to_tsvector('fat cats ate rats') @@ to_tsquery('cat & rat')` | `t` |
| `@@@` | boolean | deprecated synonym for @@ | `to_tsvector('fat cats ate rats') @@@ to_tsquery('cat & rat')` | `t` |
| `\|\|` | tsvector | concatenate tsvectors | `'a:1 b:2'::tsvector \|\| 'c:1 d:2 b:3'::tsvector` | `'a':1 'b':2,5 'c':3 'd':4` |
| `&&` | tsquery | AND tsquerys together | `'fat \| rat'::tsquery && 'cat'::tsquery` | `( 'fat' \| 'rat' ) & 'cat'` |
| `\|\|` | tsquery | OR tsquerys together | `'fat \| rat'::tsquery \|\| 'cat'::tsquery` | `( 'fat' \| 'rat' ) \| 'cat'` |
| `!!` | tsquery | negate a tsquery | `!! 'cat'::tsquery` | `!'cat'` |
| `<->` | tsquery | tsquery followed by tsquery | `to_tsquery('fat') <-> to_tsquery('rat')` | `'fat' <-> 'rat'` |
| `@>` | boolean | tsquery contains another ? | `'cat'::tsquery @> 'cat & rat'::tsquery` | `f` |
| `<@` | boolean | tsquery is contained in ? | `'cat'::tsquery <@ 'cat & rat'::tsquery` | `t` |

Note

The tsquery containment operators consider only the lexemes listed in the two queries, ignoring the combining operators.

In addition to the operators shown in the table, the ordinary B-tree comparison operators (=, <, etc) are defined for types tsvector and tsquery. These are not very useful for text searching but allow, for example, unique indexes to be built on columns of these types.
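A few of the operators from Table 9.40 can be tried directly, as in the sketch below. The english configuration is named explicitly here (an addition to the table's examples) so that the result does not depend on the session's default_text_search_config setting:

```sql
SELECT to_tsvector('english', 'fat cats ate rats')
           @@ to_tsquery('english', 'cat & rat')  AS matches,        -- t
       'cat'::tsquery <@ 'cat & rat'::tsquery     AS is_contained,   -- t
       'cat'::tsquery @> 'cat & rat'::tsquery     AS contains;       -- f
```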

Table 9.41. Text Search Functions

Function

Return Type

Description

Example

Result

array_to_tsvector(text[])

tsvector

convert array of lexemes to tsvector

array_to_tsvector('{fat,cat,rat}'::text[])

'cat' 'fat' 'rat'

get_current_ts_config()

regconfig

get default text search configuration

get_current_ts_config()

english

length(tsvector)

integer

number of lexemes in tsvector

length('fat:2,4 cat:3 rat:5A'::tsvector)

3

numnode(tsquery)

integer

number of lexemes plus operators in tsquery

numnode('(fat & rat) | cat'::tsquery)

5

plainto_tsquery([ config regconfig, ] query text)

tsquery

produce tsquery ignoring punctuation

plainto_tsquery('english', 'The Fat Rats')

'fat' & 'rat'

phraseto_tsquery([ config regconfig, ] query text)

tsquery

produce tsquery that searches for a phrase, ignoring punctuation

phraseto_tsquery('english', 'The Fat Rats')

'fat' <-> 'rat'

querytree(query tsquery)

text

get indexable part of a tsquery

querytree('foo & ! bar'::tsquery)

'foo'

setweight(vector tsvector, weight "char")

tsvector

assign weight to each element of vector

setweight('fat:2,4 cat:3 rat:5B'::tsvector, 'A')

'cat':3A 'fat':2A,4A 'rat':5A

setweight(vector tsvector, weight "char", lexemes text[])

tsvector

assign weight to elements of vector that are listed in lexemes

setweight('fat:2,4 cat:3 rat:5B'::tsvector, 'A', '{cat,rat}')

'cat':3A 'fat':2,4 'rat':5A

strip(tsvector)

tsvector

remove positions and weights from tsvector

strip('fat:2,4 cat:3 rat:5A'::tsvector)

'cat' 'fat' 'rat'

to_tsquery([ config regconfig, ] query text)

tsquery

normalize words and convert to tsquery

to_tsquery('english', 'The & Fat & Rats')

'fat' & 'rat'

to_tsvector([ config regconfig, ] document text)

tsvector

reduce document text to tsvector

to_tsvector('english', 'The Fat Rats')

'fat':2 'rat':3

to_tsvector([ config regconfig, ] document json(b))

tsvector

reduce each string value in the document to a tsvector, and then concatenate those in document order to produce a single tsvector

to_tsvector('english', '{"a": "The Fat Rats"}'::json)

'fat':2 'rat':3

ts_delete(vector tsvector, lexeme text)

tsvector

remove given lexeme from vector

ts_delete('fat:2,4 cat:3 rat:5A'::tsvector, 'fat')

'cat':3 'rat':5A

ts_delete(vector tsvector, lexemes text[])

tsvector

remove any occurrence of lexemes in lexemes from vector

ts_delete('fat:2,4 cat:3 rat:5A'::tsvector, ARRAY['fat','rat'])

'cat':3

ts_filter(vector tsvector, weights "char"[])

tsvector

select only elements with given weights from vector

ts_filter('fat:2,4 cat:3b rat:5A'::tsvector, '{a,b}')

'cat':3B 'rat':5A

ts_headline([ config regconfig, ] document text, query tsquery [, options text ])

text

display a query match

ts_headline('x y z', 'z'::tsquery)

x y <b>z</b>

ts_headline([ config regconfig, ] document json(b), query tsquery [, options text ])

text

display a query match

ts_headline('{"a":"x y z"}'::json, 'z'::tsquery)

{"a":"x y <b>z</b>"}

ts_rank([ weights float4[], ] vector tsvector, query tsquery [, normalization integer ])

float4

rank document for query

ts_rank(textsearch, query)

0.818

ts_rank_cd([ weights float4[], ] vector tsvector, query tsquery [, normalization integer ])

float4

rank document for query using cover density

ts_rank_cd('{0.1, 0.2, 0.4, 1.0}', textsearch, query)

2.01317

ts_rewrite(query tsquery, target tsquery, substitute tsquery)

tsquery

replace target with substitute within query

`ts_rewrite('a & b'::tsquery, 'a'::tsquery, 'foo|bar'::tsquery)`

`'b' & ( 'foo' | 'bar' )`

ts_rewrite(query tsquery, select text)

tsquery

replace using targets and substitutes from a SELECT command

SELECT ts_rewrite('a & b'::tsquery, 'SELECT t,s FROM aliases')

`'b' & ( 'foo' | 'bar' )`

tsquery_phrase(query1 tsquery, query2 tsquery)

tsquery

make query that searches for query1 followed by query2 (same as <-> operator)

tsquery_phrase(to_tsquery('fat'), to_tsquery('cat'))

'fat' <-> 'cat'

tsquery_phrase(query1 tsquery, query2 tsquery, distance integer)

tsquery

make query that searches for query1 followed by query2 at distance distance

tsquery_phrase(to_tsquery('fat'), to_tsquery('cat'), 10)

'fat' <10> 'cat'

tsvector_to_array(tsvector)

text[]

convert tsvector to array of lexemes

tsvector_to_array('fat:2,4 cat:3 rat:5A'::tsvector)

{cat,fat,rat}

tsvector_update_trigger()

trigger

trigger function for automatic tsvector column update

CREATE TRIGGER ... tsvector_update_trigger(tsvcol, 'pg_catalog.swedish', title, body)

tsvector_update_trigger_column()

trigger

trigger function for automatic tsvector column update

CREATE TRIGGER ... tsvector_update_trigger_column(tsvcol, configcol, title, body)

unnest(tsvector, OUT lexeme text, OUT positions smallint[], OUT weights text)

setof record

expand a tsvector to a set of rows

unnest('fat:2,4 cat:3 rat:5A'::tsvector)

(cat,{3},{D}) ...

Note

All the text search functions that accept an optional regconfig argument will use the configuration specified by default_text_search_config when that argument is omitted.

The functions in Table 9.42 are listed separately because they are not usually used in everyday text searching operations. They are helpful for development and debugging of new text search configurations.

Table 9.42. Text Search Debugging Functions

Function

Return Type

Description

Example

Result

ts_debug([ config regconfig, ] document text, OUT alias text, OUT description text, OUT token text, OUT dictionaries regdictionary[], OUT dictionary regdictionary, OUT lexemes text[])

setof record

test a configuration

ts_debug('english', 'The Brightest supernovaes')

(asciiword,"Word, all ASCII",The,{english_stem},english_stem,{}) ...

ts_lexize(dict regdictionary, token text)

text[]

test a dictionary

ts_lexize('english_stem', 'stars')

{star}

ts_parse(parser_name text, document text, OUT tokid integer, OUT token text)

setof record

test a parser

ts_parse('default', 'foo - bar')

(1,foo) ...

ts_parse(parser_oid oid, document text, OUT tokid integer, OUT token text)

setof record

test a parser

ts_parse(3722, 'foo - bar')

(1,foo) ...

ts_token_type(parser_name text, OUT tokid integer, OUT alias text, OUT description text)

setof record

get token types defined by parser

ts_token_type('default')

(1,asciiword,"Word, all ASCII") ...

ts_token_type(parser_oid oid, OUT tokid integer, OUT alias text, OUT description text)

setof record

get token types defined by parser

ts_token_type(3722)

(1,asciiword,"Word, all ASCII") ...

ts_stat(sqlquery text, [ weights text, ] OUT word text, OUT ndoc integer, OUT nentry integer)

setof record

get statistics of a tsvector column

ts_stat('SELECT vector from apod')

(foo,10,15) ...

9.8. Data Type Formatting Functions

The PostgreSQL formatting functions provide a powerful set of tools for converting various data types (date/time, integer, floating point, numeric) to formatted strings and for converting from formatted strings to specific data types. Table 9.23 lists them. These functions all follow a common calling convention: the first argument is the value to be formatted and the second argument is a template that defines the output or input format.

Table 9.23. Formatting Functions

Function

Return Type

Description

Example

to_char(timestamp,text)

text

convert time stamp to string

to_char(current_timestamp, 'HH12:MI:SS')

to_char(interval,text)

text

convert interval to string

to_char(interval '15h 2m 12s', 'HH24:MI:SS')

to_char(int,text)

text

convert integer to string

to_char(125, '999')

to_char(double precision,text)

text

convert real/double precision to string

to_char(125.8::real, '999D9')

to_char(numeric,text)

text

convert numeric to string

to_char(-125.8, '999D99S')

to_date(text,text)

date

convert string to date

to_date('05 Dec 2000', 'DD Mon YYYY')

to_number(text,text)

numeric

convert string to numeric

to_number('12,454.8-', '99G999D9S')

to_timestamp(text,text)

timestamp with time zone

convert string to time stamp

to_timestamp('05 Dec 2000', 'DD Mon YYYY')

Note

There is also a single-argument to_timestamp function; see Table 9.30.

Tip

to_timestamp and to_date exist to handle input formats that cannot be converted by simple casting. For most standard date/time formats, simply casting the source string to the required data type works, and is much easier. Similarly, to_number is unnecessary for standard numeric representations.

In a to_char output template string, there are certain patterns that are recognized and replaced with appropriately-formatted data based on the given value. Any text that is not a template pattern is simply copied verbatim. Similarly, in an input template string (for the other functions), template patterns identify the values to be supplied by the input data string.

Table 9.24 shows the template patterns available for formatting date and time values.

Table 9.24. Template Patterns for Date/Time Formatting

Pattern

Description

HH

hour of day (01-12)

HH12

hour of day (01-12)

HH24

hour of day (00-23)

MI

minute (00-59)

SS

second (00-59)

MS

millisecond (000-999)

US

microsecond (000000-999999)

SSSS

seconds past midnight (0-86399)

AM, am, PM or pm

meridiem indicator (without periods)

A.M., a.m., P.M. or p.m.

meridiem indicator (with periods)

Y,YYY

year (4 or more digits) with comma

YYYY

year (4 or more digits)

YYY

last 3 digits of year

YY

last 2 digits of year

Y

last digit of year

IYYY

ISO 8601 week-numbering year (4 or more digits)

IYY

last 3 digits of ISO 8601 week-numbering year

IY

last 2 digits of ISO 8601 week-numbering year

I

last digit of ISO 8601 week-numbering year

BC, bc, AD or ad

era indicator (without periods)

B.C., b.c., A.D. or a.d.

era indicator (with periods)

MONTH

full upper case month name (blank-padded to 9 chars)

Month

full capitalized month name (blank-padded to 9 chars)

month

full lower case month name (blank-padded to 9 chars)

MON

abbreviated upper case month name (3 chars in English, localized lengths vary)

Mon

abbreviated capitalized month name (3 chars in English, localized lengths vary)

mon

abbreviated lower case month name (3 chars in English, localized lengths vary)

MM

month number (01-12)

DAY

full upper case day name (blank-padded to 9 chars)

Day

full capitalized day name (blank-padded to 9 chars)

day

full lower case day name (blank-padded to 9 chars)

DY

abbreviated upper case day name (3 chars in English, localized lengths vary)

Dy

abbreviated capitalized day name (3 chars in English, localized lengths vary)

dy

abbreviated lower case day name (3 chars in English, localized lengths vary)

DDD

day of year (001-366)

IDDD

day of ISO 8601 week-numbering year (001-371; day 1 of the year is Monday of the first ISO week)

DD

day of month (01-31)

D

day of the week, Sunday (1) to Saturday (7)

ID

ISO 8601 day of the week, Monday (1) to Sunday (7)

W

week of month (1-5) (the first week starts on the first day of the month)

WW

week number of year (1-53) (the first week starts on the first day of the year)

IW

week number of ISO 8601 week-numbering year (01-53; the first Thursday of the year is in week 1)

CC

century (2 digits) (the twenty-first century starts on 2001-01-01)

J

Julian Day (integer days since November 24, 4714 BC at midnight UTC)

Q

quarter

RM

month in upper case Roman numerals (I-XII; I=January)

rm

month in lower case Roman numerals (i-xii; i=January)

TZ

upper case time-zone abbreviation (only supported in to_char)

tz

lower case time-zone abbreviation (only supported in to_char)

OF

time-zone offset from UTC (only supported in to_char)

Modifiers can be applied to any template pattern to alter its behavior. For example, FMMonth is the Month pattern with the FM modifier. Table 9.25 shows the modifier patterns for date/time formatting.

Table 9.25. Template Pattern Modifiers for Date/Time Formatting

Modifier

Description

Example

FM prefix

fill mode (suppress leading zeroes and padding blanks)

FMMonth

TH suffix

upper case ordinal number suffix

DDTH, e.g., 12TH

th suffix

lower case ordinal number suffix

DDth, e.g., 12th

FX prefix

fixed format global option (see usage notes)

FX Month DD Day

TM prefix

translation mode (print localized day and month names based on)

TMMonth

SP suffix

spell mode (not implemented)

DDSP

Usage notes for date/time formatting:

FM suppresses leading zeroes and trailing blanks that would otherwise be added to make the output of a pattern be fixed-width. In PostgreSQL, FM modifies only the next specification, while in Oracle FM affects all subsequent specifications, and repeated FM modifiers toggle fill mode on and off.
TM does not include trailing blanks. to_timestamp and to_date ignore the TM modifier.
to_timestamp and to_date skip multiple blank spaces in the input string unless the FX option is used. For example, to_timestamp('2000    JUN', 'YYYY MON') works, but to_timestamp('2000    JUN', 'FXYYYY MON') returns an error because to_timestamp expects one space only. FX must be specified as the first item in the template.
Ordinary text is allowed in to_char templates and will be output literally. You can put a substring in double quotes to force it to be interpreted as literal text even if it contains pattern key words. For example, in '"Hello Year "YYYY', the YYYY will be replaced by the year data, but the single Y in Year will not be. In to_date, to_number, and to_timestamp, double-quoted strings skip the number of input characters contained in the string, e.g. "XX" skips two input characters.
If you want to have a double quote in the output you must precede it with a backslash, for example '\"YYYY Month\"'.
In to_timestamp and to_date, if the year format specification is less than four digits, e.g. YYY, and the supplied year is less than four digits, the year will be adjusted to be nearest to the year 2020, e.g. 95 becomes 1995.
In to_timestamp and to_date, the YYYY conversion has a restriction when processing years with more than 4 digits. You must use some non-digit character or template after YYYY, otherwise the year is always interpreted as 4 digits. For example (with the year 20000): to_date('200001131', 'YYYYMMDD') will be interpreted as a 4-digit year; instead use a non-digit separator after the year, like to_date('20000-1131', 'YYYY-MMDD') or to_date('20000Nov31', 'YYYYMonDD').
In to_timestamp and to_date, the CC (century) field is accepted but ignored if there is a YYY, YYYY or Y,YYY field. If CC is used with YY or Y then the result is computed as that year in the specified century. If the century is specified but the year is not, the first year of the century is assumed.
In to_timestamp and to_date, weekday names or numbers (DAY, D, and related field types) are accepted but are ignored for purposes of computing the result. The same is true for quarter (Q) fields.
In to_timestamp and to_date, an ISO 8601 week-numbering date (as distinct from a Gregorian date) can be specified in one of two ways:
- Year, week number, and weekday: for example to_date('2006-42-4', 'IYYY-IW-ID') returns the date 2006-10-19. If you omit the weekday it is assumed to be 1 (Monday).
- Year and day of year: for example to_date('2006-291', 'IYYY-IDDD') also returns 2006-10-19.
Attempting to enter a date using a mixture of ISO 8601 week-numbering fields and Gregorian date fields is nonsensical, and will cause an error. In the context of an ISO 8601 week-numbering year, the concept of a “month” or “day of month” has no meaning. In the context of a Gregorian year, the ISO week has no meaning.
Caution
While to_date will reject a mixture of Gregorian and ISO week-numbering date fields, to_char will not, since output format specifications like YYYY-MM-DD (IYYY-IDDD) can be useful. But avoid writing something like IYYY-MM-DD; that would yield surprising results near the start of the year. (See Section 9.9.1 for more information.)
In to_timestamp, millisecond (MS) or microsecond (US) fields are used as the seconds digits after the decimal point. For example to_timestamp('12.3', 'SS.MS') is not 3 milliseconds, but 300, because the conversion treats it as 12 + 0.3 seconds. So, for the format SS.MS, the input values 12.3, 12.30, and 12.300 specify the same number of milliseconds. To get three milliseconds, one must write 12.003, which the conversion treats as 12 + 0.003 = 12.003 seconds.
Here is a more complex example: to_timestamp('15:12:02.020.001230', 'HH24:MI:SS.MS.US') is 15 hours, 12 minutes, and 2 seconds + 20 milliseconds + 1230 microseconds = 2.021230 seconds.
to_char(..., 'ID')'s day of the week numbering matches the extract(isodow from ...) function, but to_char(..., 'D')'s does not match extract(dow from ...)'s day numbering.
to_char(interval) formats HH and HH12 as shown on a 12-hour clock, for example zero hours and 36 hours both output as 12, while HH24 outputs the full hour value, which can exceed 23 in an interval value.
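The two ways of writing an ISO 8601 week-numbering date in the notes above can be cross-checked outside the database. The following is a minimal Python sketch using only the standard library (`date.fromisocalendar` needs Python 3.8 or later). The function names are hypothetical, and the sketch models the calendar arithmetic only, not the template parsing done by to_date.

```python
from datetime import date, timedelta

def from_iso_week(iso_year, week, weekday=1):
    """Year, week number and weekday (Monday = 1); the weekday defaults to Monday."""
    return date.fromisocalendar(iso_year, week, weekday)

def from_iso_day_of_year(iso_year, day):
    """Day 1 of an ISO week-numbering year is the Monday of its first ISO week."""
    first_day = date.fromisocalendar(iso_year, 1, 1)
    return first_day + timedelta(days=day - 1)

print(from_iso_week(2006, 42, 4))       # 2006-10-19, like 'IYYY-IW-ID' with '2006-42-4'
print(from_iso_day_of_year(2006, 291))  # 2006-10-19, like 'IYYY-IDDD' with '2006-291'
```

Both calls land on the same Gregorian date, which is why mixing the two numbering systems in a single input template has no sensible meaning.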

Table 9.26 shows the template patterns available for formatting numeric values.

Table 9.26. Template Patterns for Numeric Formatting

Pattern

Description

9

value with the specified number of digits

0

value with leading zeros

. (period)

decimal point

, (comma)

group (thousand) separator

PR

negative value in angle brackets

S

sign anchored to number (uses locale)

L

currency symbol (uses locale)

D

decimal point (uses locale)

G

group separator (uses locale)

MI

minus sign in specified position (if number < 0)

PL

plus sign in specified position (if number > 0)

SG

plus/minus sign in specified position

RN

Roman numeral (input between 1 and 3999)

TH or th

ordinal number suffix

V

shift specified number of digits (see notes)

EEEE

exponent for scientific notation

Usage notes for numeric formatting:

A sign formatted using SG, PL, or MI is not anchored to the number; for example, to_char(-12, 'MI9999') produces '-  12' but to_char(-12, 'S9999') produces '  -12'. The Oracle implementation does not allow the use of MI before 9, but rather requires that 9 precede MI.
9 results in a value with the same number of digits as there are 9s. If a digit is not available it outputs a space.
TH does not convert values less than zero and does not convert fractional numbers.
PL, SG, and TH are PostgreSQL extensions.
V with to_char multiplies the input values by 10^n, where n is the number of digits following V. V with to_number divides in a similar manner. to_char and to_number do not support the use of V combined with a decimal point (e.g., 99.9V99 is not allowed).
EEEE (scientific notation) cannot be used in combination with any of the other formatting patterns or modifiers other than digit and decimal point patterns, and must be at the end of the format string (e.g., 9.99EEEE is a valid pattern).
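The effect of V in to_char can be illustrated with a small Python sketch that multiplies a value by 10^n and rounds to a whole number. The function name `shift_digits` is hypothetical, and the round-half-up mode is an assumption chosen to reproduce the 99V9 example in Table 9.28. Blank padding and sign handling are left out.

```python
from decimal import Decimal, ROUND_HALF_UP

def shift_digits(value, n):
    """Multiply value by 10**n and round to an integer, as V followed by n digits does."""
    shifted = Decimal(value) * (Decimal(10) ** n)
    # Assumption: halves round away from zero for these positive inputs.
    return int(shifted.quantize(Decimal(1), rounding=ROUND_HALF_UP))

print(shift_digits("12", 3))     # 12000, digits of to_char(12, '99V999')
print(shift_digits("12.4", 3))   # 12400, digits of to_char(12.4, '99V999')
print(shift_digits("12.45", 1))  # 125, digits of to_char(12.45, '99V9')
```

Values are passed as strings so that `Decimal` sees the exact decimal digits rather than a binary floating-point approximation.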

Certain modifiers can be applied to any template pattern to alter its behavior. For example, FM9999 is the 9999 pattern with the FM modifier. Table 9.27 shows the modifier patterns for numeric formatting.

Table 9.27. Template Pattern Modifiers for Numeric Formatting

Modifier

Description

Example

FM prefix

fill mode (suppress leading zeroes and padding blanks)

FM9999

TH suffix

upper case ordinal number suffix

999TH

th suffix

lower case ordinal number suffix

999th

Table 9.28 shows some examples of the use of the to_char function.

Table 9.28. to_char Examples

Expression

Result

to_char(current_timestamp, 'Day, DD HH12:MI:SS')

'Tuesday , 06 05:39:18'

to_char(current_timestamp, 'FMDay, FMDD HH12:MI:SS')

'Tuesday, 6 05:39:18'

to_char(-0.1, '99.99')

' -.10'

to_char(-0.1, 'FM9.99')

'-.1'

to_char(0.1, '0.9')

' 0.1'

to_char(12, '9990999.9')

' 0012.0'

to_char(12, 'FM9990999.9')

'0012.'

to_char(485, '999')

' 485'

to_char(-485, '999')

'-485'

to_char(485, '9 9 9')

' 4 8 5'

to_char(1485, '9,999')

' 1,485'

to_char(1485, '9G999')

' 1 485'

to_char(148.5, '999.999')

' 148.500'

to_char(148.5, 'FM999.999')

'148.5'

to_char(148.5, 'FM999.990')

'148.500'

to_char(148.5, '999D999')

' 148,500'

to_char(3148.5, '9G999D999')

' 3 148,500'

to_char(-485, '999S')

'485-'

to_char(-485, '999MI')

'485-'

to_char(485, '999MI')

'485 '

to_char(485, 'FM999MI')

'485'

to_char(485, 'PL999')

'+485'

to_char(485, 'SG999')

'+485'

to_char(-485, 'SG999')

'-485'

to_char(-485, '9SG99')

'4-85'

to_char(-485, '999PR')

'<485>'

to_char(485, 'L999')

'DM 485'

to_char(485, 'RN')

' CDLXXXV'

to_char(485, 'FMRN')

'CDLXXXV'

to_char(5.2, 'FMRN')

'V'

to_char(482, '999th')

' 482nd'

to_char(485, '"Good number:"999')

'Good number: 485'

to_char(485.8, '"Pre:"999" Post:" .999')

'Pre: 485 Post: .800'

to_char(12, '99V999')

' 12000'

to_char(12.4, '99V999')

' 12400'

to_char(12.45, '99V9')

' 125'

to_char(0.0004859, '9.99EEEE')

' 4.86e-04'

9.9. Date/Time Functions and Operators

Table 9-28 shows the available functions for date/time value processing, with details appearing in the following subsections. Table 9-27 illustrates the behaviors of the basic arithmetic operators (+, *, etc.). For formatting functions, refer to Section 9.8. You should be familiar with the background information on date/time data types from Section 8.5.

All the functions and operators described below that take time or timestamp inputs actually come in two variants: one that takes time with time zone or timestamp with time zone, and one that takes time without time zone or timestamp without time zone. For brevity, these variants are not shown separately. Also, the + and * operators come in commutative pairs (for example both date + integer and integer + date); we show only one of each such pair.

Table 9-27. Date/Time Operators

Operator

Example

Result

`+`

date '2001-09-28' + integer '7'

date '2001-10-05'

`+`

date '2001-09-28' + interval '1 hour'

timestamp '2001-09-28 01:00:00'

`+`

date '2001-09-28' + time '03:00'

timestamp '2001-09-28 03:00:00'

`+`

interval '1 day' + interval '1 hour'

interval '1 day 01:00:00'

`+`

timestamp '2001-09-28 01:00' + interval '23 hours'

timestamp '2001-09-29 00:00:00'

`+`

time '01:00' + interval '3 hours'

time '04:00:00'

`-`

`- interval '23 hours'`

interval '-23:00:00'

`-`

date '2001-10-01' - date '2001-09-28'

integer '3' (days)

`-`

date '2001-10-01' - integer '7'

date '2001-09-24'

`-`

date '2001-09-28' - interval '1 hour'

timestamp '2001-09-27 23:00:00'

`-`

time '05:00' - time '03:00'

interval '02:00:00'

`-`

time '05:00' - interval '2 hours'

time '03:00:00'

`-`

timestamp '2001-09-28 23:00' - interval '23 hours'

timestamp '2001-09-28 00:00:00'

`-`

interval '1 day' - interval '1 hour'

interval '1 day -01:00:00'

`-`

timestamp '2001-09-29 03:00' - timestamp '2001-09-27 12:00'

interval '1 day 15:00:00'

`*`

900 * interval '1 second'

interval '00:15:00'

`*`

21 * interval '1 day'

interval '21 days'

`*`

double precision '3.5' * interval '1 hour'

interval '03:30:00'

`/`

interval '1 hour' / double precision '1.5'

interval '00:40:00'

Table 9-28. Date/Time Functions

Function

Return Type

Description

Example

Result

age(timestamp, timestamp)

interval

Subtract arguments, producing a "symbolic" result that uses years and months, rather than just days

age(timestamp '2001-04-10', timestamp '1957-06-13')

43 years 9 mons 27 days

age(timestamp)

interval

Subtract from current_date (at midnight)

age(timestamp '1957-06-13')

43 years 8 mons 3 days

clock_timestamp()

timestamp with time zone

Current date and time (changes during statement execution); see

current_date

date

Current date; see

current_time

time with time zone

Current time of day; see

current_timestamp

timestamp with time zone

Current date and time (start of current transaction); see

date_part(text, timestamp)

double precision

Get subfield (equivalent to extract); see

date_part('hour', timestamp '2001-02-16 20:38:40')

20

date_part(text, interval)

double precision

Get subfield (equivalent to extract); see

date_part('month', interval '2 years 3 months')

3

date_trunc(text, timestamp)

timestamp

Truncate to specified precision; see also

date_trunc('hour', timestamp '2001-02-16 20:38:40')

2001-02-16 20:00:00

date_trunc(text, interval)

interval

Truncate to specified precision; see also

date_trunc('hour', interval '2 days 3 hours 40 minutes')

2 days 03:00:00

extract(field from timestamp)

double precision

Get subfield; see

extract(hour from timestamp '2001-02-16 20:38:40')

20

extract(field from interval)

double precision

Get subfield; see

extract(month from interval '2 years 3 months')

3

isfinite(date)

boolean

Test for finite date (not +/-infinity)

isfinite(date '2001-02-16')

true

isfinite(timestamp)

boolean

Test for finite time stamp (not +/-infinity)

isfinite(timestamp '2001-02-16 21:28:30')

true

isfinite(interval)

boolean

Test for finite interval

isfinite(interval '4 hours')

true

justify_days(interval)

interval

Adjust interval so 30-day time periods are represented as months

justify_days(interval '35 days')

1 mon 5 days

justify_hours(interval)

interval

Adjust interval so 24-hour time periods are represented as days

justify_hours(interval '27 hours')

1 day 03:00:00

justify_interval(interval)

interval

Adjust interval using justify_days and justify_hours, with additional sign adjustments

justify_interval(interval '1 mon -1 hour')

29 days 23:00:00

localtime

time

Current time of day; see

localtimestamp

timestamp

Current date and time (start of current transaction); see

make_date(year int, month int, day int)

date

Create date from year, month and day fields

make_date(2013, 7, 15)

2013-07-15

make_interval(years int DEFAULT 0, months int DEFAULT 0, weeks int DEFAULT 0, days int DEFAULT 0, hours int DEFAULT 0, mins int DEFAULT 0, secs double precision DEFAULT 0.0)

interval

Create interval from years, months, weeks, days, hours, minutes and seconds fields

make_interval(days := 10)

10 days

make_time(hour int, min int, sec double precision)

time

Create time from hour, minute and seconds fields

make_time(8, 15, 23.5)

08:15:23.5

make_timestamp(year int, month int, day int, hour int, min int, sec double precision)

timestamp

Create timestamp from year, month, day, hour, minute and seconds fields

make_timestamp(2013, 7, 15, 8, 15, 23.5)

2013-07-15 08:15:23.5

make_timestamptz(year int, month int, day int, hour int, min int, sec double precision, [ timezone text ])

timestamp with time zone

Create timestamp with time zone from year, month, day, hour, minute and seconds fields. When timezone is not specified, then current time zone is used.

make_timestamptz(2013, 7, 15, 8, 15, 23.5)

2013-07-15 08:15:23.5+01

now()

timestamp with time zone

Current date and time (start of current transaction); see

statement_timestamp()

timestamp with time zone

Current date and time (start of current statement); see

timeofday()

text

Current date and time (like clock_timestamp, but as a text string); see

transaction_timestamp()

timestamp with time zone

Current date and time (start of current transaction); see

In addition to these functions, the SQL OVERLAPS operator is supported:

(start1, end1) OVERLAPS (start2, end2)
(start1, length1) OVERLAPS (start2, length2)

This expression yields true when two time periods (defined by their endpoints) overlap, false when they do not overlap. The endpoints can be specified as pairs of dates, times, or time stamps; or as a date, time, or time stamp followed by an interval. When a pair of values is provided, either the start or the end can be written first; OVERLAPS automatically takes the earlier value of the pair as the start. Each time period is considered to represent the half-open interval start <= time < end, unless start and end are equal in which case it represents that single time instant. This means for instance that two time periods with only an endpoint in common do not overlap.

SELECT (DATE '2001-02-16', DATE '2001-12-21') OVERLAPS
       (DATE '2001-10-30', DATE '2002-10-30');
Result: true
SELECT (DATE '2001-02-16', INTERVAL '100 days') OVERLAPS
       (DATE '2001-10-30', DATE '2002-10-30');
Result: false
SELECT (DATE '2001-10-29', DATE '2001-10-30') OVERLAPS
       (DATE '2001-10-30', DATE '2001-10-31');
Result: false
SELECT (DATE '2001-10-30', DATE '2001-10-30') OVERLAPS
       (DATE '2001-10-30', DATE '2001-10-31');
Result: true
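The half-open rule described above can be modeled in a few lines of Python. This is a sketch of the documented semantics only, not PostgreSQL's implementation. The function name `overlaps` is hypothetical, and a `(start, length)` pair is assumed to be converted to `(start, start + length)` by the caller.

```python
from datetime import date, timedelta

def overlaps(start1, end1, start2, end2):
    """Model of (start1, end1) OVERLAPS (start2, end2) with half-open periods."""
    # Either value of a pair may be written first; the earlier one is the start.
    start1, end1 = sorted((start1, end1))
    start2, end2 = sorted((start2, end2))
    if start1 > start2:
        return start1 < end2   # period 1 must begin before period 2 ends
    if start1 < start2:
        return start2 < end1   # period 2 must begin before period 1 ends
    return True                # equal starts always share the starting instant

d = date
print(overlaps(d(2001, 2, 16), d(2001, 12, 21), d(2001, 10, 30), d(2002, 10, 30)))  # True
print(overlaps(d(2001, 2, 16), d(2001, 2, 16) + timedelta(days=100),
               d(2001, 10, 30), d(2002, 10, 30)))                                   # False
print(overlaps(d(2001, 10, 29), d(2001, 10, 30), d(2001, 10, 30), d(2001, 10, 31)))  # False
print(overlaps(d(2001, 10, 30), d(2001, 10, 30), d(2001, 10, 30), d(2001, 10, 31)))  # True
```

The four calls mirror the four SELECT statements above, including the case where start and end are equal and the period collapses to a single instant.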

When adding an interval value to (or subtracting an interval value from) a timestamp with time zone value, the days component advances or decrements the date of the timestamp with time zone by the indicated number of days. Across daylight saving time changes (when the session time zone is set to a time zone that recognizes DST), this means interval '1 day' does not necessarily equal interval '24 hours'. For example, with the session time zone set to CST7CDT, timestamp with time zone '2005-04-02 12:00-07' + interval '1 day' will produce timestamp with time zone '2005-04-03 12:00-06', while adding interval '24 hours' to the same initial timestamp with time zone produces timestamp with time zone '2005-04-03 13:00-06', as there is a change in daylight saving time at 2005-04-03 02:00 in time zone CST7CDT.

Note there can be ambiguity in the months field returned by age because different months have different numbers of days. PostgreSQL's approach uses the month from the earlier of the two dates when calculating partial months. For example, age('2004-06-01', '2004-04-30') uses April to yield 1 mon 1 day, while using May would yield 1 mon 2 days because May has 31 days, while April has only 30.

Subtraction of dates and timestamps can also be complex. One conceptually simple way to perform subtraction is to convert each value to a number of seconds using EXTRACT(EPOCH FROM ...), then subtract the results; this produces the number of seconds between the two values. This will adjust for the number of days in each month, timezone changes, and daylight saving time adjustments. Subtraction of date or timestamp values with the "-" operator returns the number of days (24-hours) and hours/minutes/seconds between the values, making the same adjustments. The age function returns years, months, days, and hours/minutes/seconds, performing field-by-field subtraction and then adjusting for negative field values. The following queries illustrate the differences in these approaches. The sample results were produced with timezone = 'US/Eastern'; there is a daylight saving time change between the two dates used:

SELECT EXTRACT(EPOCH FROM timestamptz '2013-07-01 12:00:00') -
       EXTRACT(EPOCH FROM timestamptz '2013-03-01 12:00:00');
Result: 10537200
SELECT (EXTRACT(EPOCH FROM timestamptz '2013-07-01 12:00:00') -
        EXTRACT(EPOCH FROM timestamptz '2013-03-01 12:00:00'))
        / 60 / 60 / 24;
Result: 121.958333333333
SELECT timestamptz '2013-07-01 12:00:00' - timestamptz '2013-03-01 12:00:00';
Result: 121 days 23:00:00
SELECT age(timestamptz '2013-07-01 12:00:00', timestamptz '2013-03-01 12:00:00');
Result: 4 mons
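The first three results can be reproduced with plain Python datetime arithmetic. The sketch below hard-codes the two UTC offsets instead of loading a time zone database. That is an assumption made to keep it self-contained: US/Eastern was at UTC-5 on 2013-03-01 and at UTC-4 on 2013-07-01. The field-by-field age calculation is not modeled.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for the two dates in the example.
EST = timezone(timedelta(hours=-5))  # standard time, in effect on 2013-03-01
EDT = timezone(timedelta(hours=-4))  # daylight saving time, in effect on 2013-07-01

later = datetime(2013, 7, 1, 12, 0, 0, tzinfo=EDT)
earlier = datetime(2013, 3, 1, 12, 0, 0, tzinfo=EST)

# Epoch subtraction: elapsed seconds between the two instants.
seconds = later.timestamp() - earlier.timestamp()
print(seconds)           # 10537200.0
print(seconds / 86400)   # about 121.958333 days

# Direct subtraction: whole 24-hour days plus a time-of-day remainder.
print(later - earlier)   # 121 days, 23:00:00
```

The missing hour relative to 122 calendar days is the daylight saving time change that falls between the two dates.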

9.9.1. `EXTRACT`, `date_part`

EXTRACT(field FROM source)

The extract function retrieves subfields such as year or hour from date/time values. source must be a value expression of type timestamp, time, or interval. (Expressions of type date are cast to timestamp and can therefore be used as well.) field is an identifier or string that selects what field to extract from the source value. The extract function returns values of type double precision. The following are valid field names:

century

The century

SELECT EXTRACT(CENTURY FROM TIMESTAMP '2000-12-16 12:21:13');
Result: 20
SELECT EXTRACT(CENTURY FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 21

The first century starts at 0001-01-01 00:00:00 AD, although they did not know it at the time. This definition applies to all Gregorian calendar countries. There is no century number 0; you go from century -1 to century 1. If you disagree with this, please write your complaint to: Pope, Cathedral Saint-Peter of Roma, Vatican.

day

For timestamp values, the day (of the month) field (1 - 31) ; for interval values, the number of days

SELECT EXTRACT(DAY FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 16

SELECT EXTRACT(DAY FROM INTERVAL '40 days 1 minute');
Result: 40

decade

The year field divided by 10

SELECT EXTRACT(DECADE FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 200

dow

The day of the week as Sunday (0) to Saturday (6)

SELECT EXTRACT(DOW FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 5

Note that extract's day of the week numbering differs from that of the to_char(..., 'D') function.

doy

The day of the year (1 - 365/366)

SELECT EXTRACT(DOY FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 47

epoch

For timestamp with time zone values, the number of seconds since 1970-01-01 00:00:00 UTC (can be negative); for date and timestamp values, the number of seconds since 1970-01-01 00:00:00 local time; for interval values, the total number of seconds in the interval

SELECT EXTRACT(EPOCH FROM TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40.12-08');
Result: 982384720.12

SELECT EXTRACT(EPOCH FROM INTERVAL '5 days 3 hours');
Result: 442800

Here is how you can convert an epoch value back to a time stamp:

SELECT TIMESTAMP WITH TIME ZONE 'epoch' + 982384720.12 * INTERVAL '1 second';

(The to_timestamp function encapsulates the above conversion.)

hour

The hour field (0 - 23)

SELECT EXTRACT(HOUR FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 20

isodow

The day of the week as Monday (1) to Sunday (7)

SELECT EXTRACT(ISODOW FROM TIMESTAMP '2001-02-18 20:38:40');
Result: 7

This is identical to dow except for Sunday. This matches the ISO 8601 day of the week numbering.

isoyear

The ISO 8601 week-numbering year that the date falls in (not applicable to intervals)

SELECT EXTRACT(ISOYEAR FROM DATE '2006-01-01');
Result: 2005
SELECT EXTRACT(ISOYEAR FROM DATE '2006-01-02');
Result: 2006

Each ISO 8601 week-numbering year begins with the Monday of the week containing the 4th of January, so in early January or late December the ISO year may be different from the Gregorian year. See the week field for more information.

This field is not available in PostgreSQL releases prior to 8.3.

microseconds

The seconds field, including fractional parts, multiplied by 1 000 000; note that this includes full seconds

SELECT EXTRACT(MICROSECONDS FROM TIME '17:12:28.5');
Result: 28500000

millennium

The millennium

SELECT EXTRACT(MILLENNIUM FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 3

Years in the 1900s are in the second millennium. The third millennium started January 1, 2001.

milliseconds

The seconds field, including fractional parts, multiplied by 1000. Note that this includes full seconds.

SELECT EXTRACT(MILLISECONDS FROM TIME '17:12:28.5');
Result: 28500

minute

The minutes field (0 - 59)

SELECT EXTRACT(MINUTE FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 38

month

For timestamp values, the number of the month within the year (1 - 12); for interval values, the number of months, modulo 12 (0 - 11)

SELECT EXTRACT(MONTH FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 2

SELECT EXTRACT(MONTH FROM INTERVAL '2 years 3 months');
Result: 3

SELECT EXTRACT(MONTH FROM INTERVAL '2 years 13 months');
Result: 1

quarter

The quarter of the year (1 - 4) that the date is in

SELECT EXTRACT(QUARTER FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 1

second

The seconds field, including fractional parts (0 - 59[1])

SELECT EXTRACT(SECOND FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 40

SELECT EXTRACT(SECOND FROM TIME '17:12:28.5');
Result: 28.5

timezone

The time zone offset from UTC, measured in seconds. Positive values correspond to time zones east of UTC, negative values to zones west of UTC. (Technically, PostgreSQL uses UT1 because leap seconds are not handled.)

timezone_hour

The hour component of the time zone offset

timezone_minute

The minute component of the time zone offset

week

The number of the ISO 8601 week-numbering week of the year. By definition, ISO weeks start on Mondays and the first week of a year contains January 4 of that year. In other words, the first Thursday of a year is in week 1 of that year.

In the ISO week-numbering system, it is possible for early-January dates to be part of the 52nd or 53rd week of the previous year, and for late-December dates to be part of the first week of the next year. For example, 2005-01-01 is part of the 53rd week of year 2004, and 2006-01-01 is part of the 52nd week of year 2005, while 2012-12-31 is part of the first week of 2013. It's recommended to use the isoyear field together with week to get consistent results.

SELECT EXTRACT(WEEK FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 7
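
As a small illustration of why week is best read together with isoyear, the following sketch extracts both fields for two of the boundary dates mentioned above. The column aliases are arbitrary names chosen for this example.

```sql
-- 2005-01-01 belongs to ISO week 53 of ISO year 2004,
-- even though its calendar year is 2005.
SELECT EXTRACT(YEAR    FROM DATE '2005-01-01') AS calendar_year,
       EXTRACT(ISOYEAR FROM DATE '2005-01-01') AS iso_year,
       EXTRACT(WEEK    FROM DATE '2005-01-01') AS iso_week;

-- 2012-12-31 already belongs to ISO week 1 of ISO year 2013.
SELECT EXTRACT(ISOYEAR FROM DATE '2012-12-31') AS iso_year,
       EXTRACT(WEEK    FROM DATE '2012-12-31') AS iso_week;
```

Pairing week with year instead of isoyear would label the first date as week 53 of 2005, which is the inconsistency the recommendation above is meant to avoid.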

year

The year field. Keep in mind there is no 0 AD, so subtracting BC years from AD years should be done with care.

SELECT EXTRACT(YEAR FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 2001

The extract function is primarily intended for computational processing. For formatting date/time values for display, see Section 9.8.

The date_part function is modeled on the traditional Ingres equivalent to the SQL-standard function extract:

date_part('field', source)

Note that here the field parameter needs to be a string value, not a name. The valid field names for date_part are the same as for extract.

SELECT date_part('day', TIMESTAMP '2001-02-16 20:38:40');
Result: 16

SELECT date_part('hour', INTERVAL '4 hours 3 minutes');
Result: 4

9.9.2. `date_trunc`

The function date_trunc is conceptually similar to the trunc function for numbers.

date_trunc('field', source)

source is a value expression of type timestamp or interval. (Values of type date and time are cast automatically to timestamp or interval, respectively.) field selects to which precision to truncate the input value. The return value is of type timestamp or interval with all fields that are less significant than the selected one set to zero (or one, for day and month).

Valid values for field are:

microseconds

milliseconds

second

minute

hour

day

week

month

quarter

year

decade

century

millennium

Examples:

SELECT date_trunc('hour', TIMESTAMP '2001-02-16 20:38:40');
Result: 2001-02-16 20:00:00

SELECT date_trunc('year', TIMESTAMP '2001-02-16 20:38:40');
Result: 2001-01-01 00:00:00
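
Two further cases may be worth seeing, offered here as a sketch rather than an exhaustive list: truncation to week, which goes back to the Monday that starts the ISO week, and truncation of an interval value.

```sql
-- 2001-02-16 is a Friday (dow = 5), so truncating to 'week'
-- goes back to Monday 2001-02-12.
SELECT date_trunc('week', TIMESTAMP '2001-02-16 20:38:40');
-- Result: 2001-02-12 00:00:00

-- Intervals can be truncated as well.
SELECT date_trunc('hour', INTERVAL '3 days 02:47:33');
-- Result: 3 days 02:00:00
```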

9.9.3. AT TIME ZONE

The AT TIME ZONE construct allows conversions of time stamps to different time zones. Table 9-29 shows its variants.

Table 9-29. AT TIME ZONE Variants

| Expression | Return Type | Description |
| --- | --- | --- |
| timestamp without time zone AT TIME ZONE zone | timestamp with time zone | Treat given time stamp without time zone as located in the specified time zone |
| timestamp with time zone AT TIME ZONE zone | timestamp without time zone | Convert given time stamp with time zone to the new time zone, with no time zone designation |
| time with time zone AT TIME ZONE zone | time with time zone | Convert given time with time zone to the new time zone |

In these expressions, the desired time zone zone can be specified either as a text string (e.g., 'PST') or as an interval (e.g., INTERVAL '-08:00'). In the text case, a time zone name can be specified in any of the ways described in Section 8.5.3.

Examples (assuming the local time zone is PST8PDT):

SELECT TIMESTAMP '2001-02-16 20:38:40' AT TIME ZONE 'MST';
Result: 2001-02-16 19:38:40-08

SELECT TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40-05' AT TIME ZONE 'MST';
Result: 2001-02-16 18:38:40

The first example takes a time stamp without time zone and interprets it as MST time (UTC-7), which is then converted to PST (UTC-8) for display. The second example takes a time stamp specified in EST (UTC-5) and converts it to local time in MST (UTC-7).

The function timezone(zone, timestamp) is equivalent to the SQL-conforming construct timestamp AT TIME ZONE zone.
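
The equivalence can be checked with a short sketch that repeats the second example above in function form, and then with a full zone name in place of the abbreviation. America/Denver is chosen here only because it observes MST (UTC-7) in February.

```sql
-- Function form of the second example above.
SELECT timezone('MST', TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40-05');
-- Result: 2001-02-16 18:38:40

-- The same conversion using a full time zone name.
SELECT TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40-05' AT TIME ZONE 'America/Denver';
-- Result: 2001-02-16 18:38:40
```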

9.9.4. Current Date/Time

PostgreSQL provides a number of functions that return values related to the current date and time. These SQL-standard functions all return values based on the start time of the current transaction:

CURRENT_DATE
CURRENT_TIME
CURRENT_TIMESTAMP
CURRENT_TIME(precision)
CURRENT_TIMESTAMP(precision)
LOCALTIME
LOCALTIMESTAMP
LOCALTIME(precision)
LOCALTIMESTAMP(precision)

CURRENT_TIME and CURRENT_TIMESTAMP deliver values with time zone; LOCALTIME and LOCALTIMESTAMP deliver values without time zone.

CURRENT_TIME, CURRENT_TIMESTAMP, LOCALTIME, and LOCALTIMESTAMP can optionally take a precision parameter, which causes the result to be rounded to that many fractional digits in the seconds field. Without a precision parameter, the result is given to the full available precision.

Some examples:

SELECT CURRENT_TIME;
Result: 14:39:53.662522-05

SELECT CURRENT_DATE;
Result: 2001-12-23

SELECT CURRENT_TIMESTAMP;
Result: 2001-12-23 14:39:53.662522-05

SELECT CURRENT_TIMESTAMP(2);
Result: 2001-12-23 14:39:53.66-05

SELECT LOCALTIMESTAMP;
Result: 2001-12-23 14:39:53.662522

Since these functions return the start time of the current transaction, their values do not change during the transaction. This is considered a feature: the intent is to allow a single transaction to have a consistent notion of the "current" time, so that multiple modifications within the same transaction bear the same time stamp.

Note: Other database systems might advance these values more frequently.

PostgreSQL also provides functions that return the start time of the current statement, as well as the actual current time at the instant the function is called. The complete list of non-SQL-standard time functions is:

transaction_timestamp()
statement_timestamp()
clock_timestamp()
timeofday()
now()

transaction_timestamp() is equivalent to CURRENT_TIMESTAMP, but is named to clearly reflect what it returns. statement_timestamp() returns the start time of the current statement (more specifically, the time of receipt of the latest command message from the client). statement_timestamp() and transaction_timestamp() return the same value during the first command of a transaction, but might differ during subsequent commands. clock_timestamp() returns the actual current time, and therefore its value changes even within a single SQL command. timeofday() is a historical PostgreSQL function. Like clock_timestamp(), it returns the actual current time, but as a formatted text string rather than a timestamp with time zone value. now() is a traditional PostgreSQL equivalent to transaction_timestamp().
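
The differences between these functions are easiest to see inside an explicit transaction. The following is a minimal sketch; the column aliases are arbitrary, and the one-second sleep is there only to make the gap between transaction start and statement start visible.

```sql
BEGIN;

-- Both report the transaction start time, so this is expected to be true.
SELECT now() = transaction_timestamp() AS same_as_transaction_start;

SELECT pg_sleep(1);

-- A new statement has started, so statement_timestamp() has moved on,
-- while now() still reports the transaction start time.
SELECT statement_timestamp() > now()                 AS statement_is_later,
       clock_timestamp()    >= statement_timestamp() AS clock_not_earlier;

COMMIT;
```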

All the date/time data types also accept the special literal value now to specify the current date and time (again, interpreted as the transaction start time). Thus, the following three all return the same result:

SELECT CURRENT_TIMESTAMP;
SELECT now();
SELECT TIMESTAMP 'now';  -- incorrect for use with DEFAULT

Tip: You do not want to use the third form when specifying a DEFAULT clause while creating a table. The system will convert now to a timestamp as soon as the constant is parsed, so that when the default value is needed, the time of the table creation would be used! The first two forms will not be evaluated until the default value is used, because they are function calls. Thus they will give the desired behavior of defaulting to the time of row insertion.
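
The difference can be sketched with a table definition. The table event_log and its columns are hypothetical names used only for illustration.

```sql
-- event_log is a hypothetical table used only for illustration.
CREATE TABLE event_log (
    id         serial PRIMARY KEY,
    message    text,
    -- Function call: evaluated each time the default is needed,
    -- so every row records the start time of its inserting transaction.
    created_at timestamp with time zone DEFAULT CURRENT_TIMESTAMP,
    -- Pitfall: the constant is converted once, when the table is created,
    -- so every row would receive the table-creation time.
    frozen_at  timestamp DEFAULT TIMESTAMP 'now'
);
```

Rows inserted later would show an advancing created_at but an unchanging frozen_at, which is rarely what is wanted.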

9.9.5. Delaying Execution

The following functions are available to delay execution of the server process:

pg_sleep(seconds)
pg_sleep_for(interval)
pg_sleep_until(timestamp with time zone)

pg_sleep makes the current session's process sleep until seconds seconds have elapsed. seconds is a value of type double precision, so fractional-second delays can be specified. pg_sleep_for is a convenience function for larger sleep times specified as an interval. pg_sleep_until is a convenience function for when a specific wake-up time is desired. For example:

SELECT pg_sleep(1.5);
SELECT pg_sleep_for('5 minutes');
SELECT pg_sleep_until('tomorrow 03:00');

Note: The effective resolution of the sleep interval is platform-specific; 0.01 seconds is a common value. The sleep delay will be at least as long as specified. It might be longer depending on factors such as server load. In particular, pg_sleep_until is not guaranteed to wake up exactly at the specified time, but it will not wake up any earlier.

Warning

Make sure that your session does not hold more locks than necessary when calling pg_sleep or its variants. Otherwise other sessions might have to wait for your sleeping process, slowing down the entire system.

Notes

[1] 60 if leap seconds are implemented by the operating system

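8.5. Date/Time Types

PostgreSQL supports the full set of SQL date and time types, shown in Table 8.9. The operations available on these data types are described in Section 9.9.

Table 8.9. Date/Time Types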

| Name | Storage Size | Description | Low Value | High Value | Resolution |
| --- | --- | --- | --- | --- | --- |
| timestamp [ (p) ] [ without time zone ] | 8 bytes | both date and time (no time zone) | 4713 BC | 294276 AD | 1 microsecond |
| timestamp [ (p) ] with time zone | 8 bytes | both date and time, with time zone | 4713 BC | 294276 AD | 1 microsecond |
| date | 4 bytes | date (no time of day) | 4713 BC | 5874897 AD | 1 day |
| time [ (p) ] [ without time zone ] | 8 bytes | time of day (no date) | 00:00:00 | 24:00:00 | 1 microsecond |
| time [ (p) ] with time zone | 12 bytes | time of day (no date), with time zone | 00:00:00+1459 | 24:00:00-1459 | 1 microsecond |
| interval [ fields ] [ (p) ] | 16 bytes | time interval | -178000000 years | 178000000 years | 1 microsecond |

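Note

The SQL standard requires that writing just timestamp be equivalent to timestamp without time zone, and PostgreSQL honors that behavior. As a PostgreSQL extension, timestamptz is accepted as an abbreviation for timestamp with time zone.

time, timestamp, and interval accept an optional precision value p, which specifies the number of fractional digits retained in the seconds field. By default, there is no explicit bound on precision. The allowed range of p is from 0 to 6.

The interval type has an additional option, which is to restrict the set of stored fields by writing one of these phrases: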

YEAR
MONTH
DAY
HOUR
MINUTE
SECOND
YEAR TO MONTH
DAY TO HOUR
DAY TO MINUTE
DAY TO SECOND
HOUR TO MINUTE
HOUR TO SECOND
MINUTE TO SECOND

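Note that if both fields and p are specified, fields must include SECOND, since the precision applies only to the seconds.

The type time with time zone is defined by the SQL standard, but the definition exhibits properties that make its usefulness questionable. In most cases, a combination of date, time, timestamp without time zone, and timestamp with time zone should provide the complete range of date/time functionality required by any application.

The types abstime and reltime are lower-precision types that are used internally. You are discouraged from using these types in applications; these internal types might disappear in a future release.

8.5.1. Date/Time Input

Date and time input is accepted in almost any reasonable format, including ISO 8601, SQL-compatible, traditional POSTGRES, and others. For some formats, the ordering of day, month, and year in date input is ambiguous, so there is support for specifying the expected ordering of these fields. Set the DateStyle parameter to MDY to select month-day-year interpretation, DMY to select day-month-year interpretation, or YMD to select year-month-day interpretation.

PostgreSQL is more flexible in handling date/time input than the SQL standard requires. See Appendix B for the exact parsing rules of date/time input and for the recognized text fields including months, days of the week, and time zones.

Remember that any date or time literal input needs to be enclosed in single quotes, like text strings. Refer to Section 4.1.2.7 for more information. SQL requires the following syntax:

type [ (p) ] 'value'

where p is an optional precision specification giving the number of fractional digits in the seconds field. Precision can be specified for the time, timestamp, and interval types, and can range from 0 to 6. If no precision is specified in a constant specification, it defaults to the precision of the literal value (but not more than 6 digits).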
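
As a brief sketch of the typed-literal syntax with an explicit precision, the following literals are chosen only for illustration:

```sql
-- Keep two fractional digits; the value is rounded, not rejected.
SELECT TIMESTAMP(2) '2004-10-19 10:23:54.123456';
-- Result: 2004-10-19 10:23:54.12

-- Precision 0 removes the fractional seconds entirely.
SELECT TIME(0) '04:05:06.4';
-- Result: 04:05:06
```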

8.5.1.1. Dates

Table 8.10 shows some possible inputs for the date type.

Table 8.10. Date Input

| Example | Description |
| --- | --- |
| 1999-01-08 | ISO 8601; January 8 in any mode (recommended format) |
| January 8, 1999 | unambiguous in any datestyle input mode |
| 1/8/1999 | January 8 in MDY mode; August 1 in DMY mode |
| 1/18/1999 | January 18 in MDY mode; rejected in other modes |
| 01/02/03 | January 2, 2003 in MDY mode; February 1, 2003 in DMY mode; February 3, 2001 in YMD mode |
| 1999-Jan-08 | January 8 in any mode |
| Jan-08-1999 | January 8 in any mode |
| 08-Jan-1999 | January 8 in any mode |
| 99-Jan-08 | January 8 in YMD mode, else error |
| 08-Jan-99 | January 8, except error in YMD mode |
| Jan-08-99 | January 8, except error in YMD mode |
| 19990108 | ISO 8601; January 8, 1999 in any mode |
| 990108 | ISO 8601; January 8, 1999 in any mode |
| 1999.008 | year and day of year |
| J2451187 | Julian date |
| January 8, 99 BC | year 99 BC |
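
The effect of the field ordering on an ambiguous input can be sketched as follows. The session settings below are for demonstration only; the ISO output style is selected so that the results are unambiguous.

```sql
SET datestyle TO 'ISO, MDY';
SELECT DATE '1/8/1999';
-- Result: 1999-01-08

SET datestyle TO 'ISO, DMY';
SELECT DATE '1/8/1999';
-- Result: 1999-08-01

-- The ISO 8601 form is read the same way in every mode.
SELECT DATE '1999-01-08';
-- Result: 1999-01-08

-- Return to the session default.
RESET datestyle;
```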

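8.5.1.2. Times

The time-of-day types are time [ (p) ] without time zone and time [ (p) ] with time zone. time alone is equivalent to time without time zone.

Valid input for these types consists of a time of day followed by an optional time zone. (See Table 8.11 and Table 8.12.) If a time zone is specified in the input for time without time zone, it is silently ignored. You can also specify a date, but it will be ignored, except when you use a time zone name that involves a daylight-savings rule, such as America/New_York. In this case specifying the date is required in order to determine whether standard or daylight-savings time applies. The appropriate time zone offset is recorded in the time with time zone value.

Table 8.11. Time Input

| Example | Description |
| --- | --- |
| 04:05:06.789 | ISO 8601 |
| 04:05:06 | ISO 8601 |
| 04:05 | ISO 8601 |
| 040506 | ISO 8601 |
| 04:05 AM | same as 04:05; AM does not affect value |
| 04:05 PM | same as 16:05; input hour must be <= 12 |
| 04:05:06.789-8 | ISO 8601 |
| 04:05:06-08:00 | ISO 8601 |
| 04:05-08:00 | ISO 8601 |
| 040506-08 | ISO 8601 |
| 04:05:06 PST | time zone specified by abbreviation |
| 2003-04-12 04:05:06 America/New_York | time zone specified by full name |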

Table 8.12. Time Zone Input

| Example | Description |
| --- | --- |
| PST | Abbreviation (for Pacific Standard Time) |
| America/New_York | Full time zone name |
| PST8PDT | POSIX-style time zone specification |
| -8:00 | ISO-8601 offset for PST |
| -800 | ISO-8601 offset for PST |
| -8 | ISO-8601 offset for PST |
| zulu | Military abbreviation for UTC |
| z | Short form of zulu |

Refer to Section 8.5.3 for more information on how to specify time zones.

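8.5.1.3. Time Stamps

Valid input for the time stamp types consists of the concatenation of a date and a time, followed by an optional time zone, followed by an optional AD or BC. (Alternatively, AD/BC can appear before the time zone, but this is not the preferred ordering.) Thus:

1999-01-08 04:05:06

and:

1999-01-08 04:05:06 -8:00

are valid values, which follow the ISO 8601 standard. In addition, the common format:

January 8 04:05:06 1999 PST

is supported.

The SQL standard differentiates timestamp without time zone and timestamp with time zone literals by the presence of a “+” or “-” symbol and time zone offset after the time. Hence, according to the standard,

TIMESTAMP '2004-10-19 10:23:54'

is a timestamp without time zone, while

TIMESTAMP '2004-10-19 10:23:54+02'

is a timestamp with time zone. PostgreSQL never examines the content of a literal string before determining its type, and therefore will treat both of the above as timestamp without time zone. To ensure that a literal is treated as timestamp with time zone, give it the correct explicit type:

TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02'

In a literal that has been determined to be timestamp without time zone, PostgreSQL will silently ignore any time zone indication. That is, the resulting value is derived from the date/time fields in the input value, and is not adjusted for time zone.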
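
A short sketch makes the difference visible. The session time zone is set to UTC here only so that the second result is reproducible.

```sql
-- The +02 offset is silently ignored, because the declared type
-- is timestamp without time zone.
SELECT TIMESTAMP '2004-10-19 10:23:54+02';
-- Result: 2004-10-19 10:23:54

-- With the explicit type the offset is honored and the value
-- is displayed in the session's time zone.
SET TIME ZONE 'UTC';
SELECT TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02';
-- Result: 2004-10-19 08:23:54+00

-- Return to the session default.
RESET TIME ZONE;
```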

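For timestamp with time zone, the internally stored value is always in UTC (Universal Coordinated Time, traditionally known as Greenwich Mean Time, GMT). An input value that has an explicit time zone specified is converted to UTC using the appropriate offset for that time zone. If no time zone is stated in the input string, then it is assumed to be in the time zone indicated by the system's TimeZone parameter, and is converted to UTC using the offset for that zone.

When a timestamp with time zone value is output, it is always converted from UTC to the current time zone, and displayed as local time in that zone. To see the time in another time zone, either change the time zone or use the AT TIME ZONE construct (see Section 9.9.3).

Conversions between timestamp without time zone and timestamp with time zone normally assume that the timestamp without time zone value should be taken or given as local time in the current time zone. A different time zone can be specified for the conversion using AT TIME ZONE.

8.5.1.4. Special Values

PostgreSQL supports several special date/time input values for convenience, as shown in Table 8.13. The values infinity and -infinity are specially represented inside the system and will be displayed unchanged; the others are simply notational shorthands that will be converted to ordinary date/time values when read. (In particular, now and related strings are converted to a specific time value as soon as they are read.) All of these values need to be enclosed in single quotes when used as constants in SQL commands.

Table 8.13. Special Date/Time Inputs

| Input String | Valid Types | Description |
| --- | --- | --- |
| epoch | date, timestamp | 1970-01-01 00:00:00+00 (Unix system time zero) |
| infinity | date, timestamp | later than all other time stamps |
| -infinity | date, timestamp | earlier than all other time stamps |
| now | date, time, timestamp | current transaction's start time |
| today | date, timestamp | midnight today |
| tomorrow | date, timestamp | midnight tomorrow |
| yesterday | date, timestamp | midnight yesterday |
| allballs | time | 00:00:00.00 UTC |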
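
A few of these special values are shown in use in the following sketch; the column aliases are arbitrary.

```sql
SELECT TIMESTAMP 'epoch';
-- Result: 1970-01-01 00:00:00

-- infinity compares later than any ordinary time stamp.
SELECT TIMESTAMP 'infinity' > TIMESTAMP '2001-02-16 20:38:40' AS is_later;
-- Result: true

-- today and tomorrow are both midnight values, exactly one day apart.
SELECT DATE 'tomorrow' - DATE 'today' AS days_apart;
-- Result: 1
```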

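The following SQL-compatible functions can also be used to obtain the current time value for the corresponding data type: CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP, LOCALTIME, LOCALTIMESTAMP. The latter four accept an optional subsecond precision specification. (See Section 9.9.4.) Note that these are SQL functions and are not recognized in data input strings.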

8.5.2. Date/Time Output

The output format of the date/time types can be set to one of the four styles ISO 8601, SQL (Ingres), traditional POSTGRES (Unix date format), or German. The default is the ISO format. (The SQL standard requires the use of the ISO 8601 format. The name of the “SQL” output format is a historical accident.) Table 8.14 shows examples of each output style. The output of the date and time types is generally only the date or time part in accordance with the given examples. However, the POSTGRES style outputs date-only values in ISO format.

Table 8.14. Date/Time Output Styles

| Style Specification | Description | Example |
| --- | --- | --- |
| ISO | ISO 8601, SQL standard | 1997-12-17 07:37:16-08 |
| SQL | traditional style | 12/17/1997 07:37:16.00 PST |
| Postgres | original style | Wed Dec 17 07:37:16 1997 PST |
| German | regional style | 17.12.1997 07:37:16.00 PST |

Note

ISO 8601 specifies the use of uppercase letter T to separate the date and time. PostgreSQL accepts that format on input, but on output it uses a space rather than T, as shown above. This is for readability and for consistency with RFC 3339 as well as some other database systems.

In the SQL and POSTGRES styles, day appears before month if DMY field ordering has been specified, otherwise month appears before day. (See Section 8.5.1 for how this setting also affects interpretation of input values.) Table 8.15 shows examples.

Table 8.15. Date Order Conventions

| datestyle Setting | Input Ordering | Example Output |
| --- | --- | --- |
| SQL, DMY | day/month/year | 17/12/1997 15:37:16.00 CET |
| SQL, MDY | month/day/year | 12/17/1997 07:37:16.00 PST |
| Postgres, DMY | day/month/year | Wed 17 Dec 07:37:16 1997 PST |

The date/time style can be selected by the user using the SET datestyle command, the DateStyle parameter in the postgresql.conf configuration file, or the PGDATESTYLE environment variable on the server or client.
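
Selecting a style for the current session can be sketched as follows. A date value is used so that the output does not depend on the session's time zone.

```sql
SET datestyle TO 'SQL, DMY';
SELECT DATE '1997-12-17';
-- Result: 17/12/1997

SET datestyle TO 'German';
SELECT DATE '1997-12-17';
-- Result: 17.12.1997

-- Return to the session default.
RESET datestyle;
```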

The formatting function to_char (see Section 9.8) is also available as a more flexible way to format date/time output.

8.5.3. Time Zones

Time zones, and time-zone conventions, are influenced by political decisions, not just earth geometry. Time zones around the world became somewhat standardized during the 1900s, but continue to be prone to arbitrary changes, particularly with respect to daylight-savings rules. PostgreSQL uses the widely-used IANA (Olson) time zone database for information about historical time zone rules. For times in the future, the assumption is that the latest known rules for a given time zone will continue to be observed indefinitely far into the future.

PostgreSQL endeavors to be compatible with the SQL standard definitions for typical usage. However, the SQL standard has an odd mix of date and time types and capabilities. Two obvious problems are:

Although the date type cannot have an associated time zone, the time type can. Time zones in the real world have little meaning unless associated with a date as well as a time, since the offset can vary through the year with daylight-saving time boundaries.
The default time zone is specified as a constant numeric offset from UTC. It is therefore impossible to adapt to daylight-saving time when doing date/time arithmetic across DST boundaries.

To address these difficulties, we recommend using date/time types that contain both date and time when using time zones. We do not recommend using the type time with time zone (though it is supported by PostgreSQL for legacy applications and for compliance with the SQL standard). PostgreSQL assumes your local time zone for any type containing only date or time.

All timezone-aware dates and times are stored internally in UTC. They are converted to local time in the zone specified by the TimeZone configuration parameter before being displayed to the client.

PostgreSQL allows you to specify time zones in three different forms:

A full time zone name, for example America/New_York. The recognized time zone names are listed in the pg_timezone_names view (see Section 51.90). PostgreSQL uses the widely-used IANA time zone data for this purpose, so the same time zone names are also recognized by much other software.
A time zone abbreviation, for example PST. Such a specification merely defines a particular offset from UTC, in contrast to full time zone names which can imply a set of daylight savings transition-date rules as well. The recognized abbreviations are listed in the pg_timezone_abbrevs view (see Section 51.89). You cannot set the configuration parameters TimeZone or log_timezone to a time zone abbreviation, but you can use abbreviations in date/time input values and with the AT TIME ZONE operator.
In addition to the timezone names and abbreviations, PostgreSQL will accept POSIX-style time zone specifications of the form STDoffset or STDoffsetDST, where STD is a zone abbreviation, offset is a numeric offset in hours west from UTC, and DST is an optional daylight-savings zone abbreviation, assumed to stand for one hour ahead of the given offset. For example, if EST5EDT were not already a recognized zone name, it would be accepted and would be functionally equivalent to United States East Coast time. In this syntax, a zone abbreviation can be a string of letters, or an arbitrary string surrounded by angle brackets (<>). When a daylight-savings zone abbreviation is present, it is assumed to be used according to the same daylight-savings transition rules used in the IANA time zone database's posixrules entry. In a standard PostgreSQL installation, posixrules is the same as US/Eastern, so that POSIX-style time zone specifications follow USA daylight-savings rules. If needed, you can adjust this behavior by replacing the posixrules file.
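
The two system views mentioned above can be queried directly to see which names and abbreviations a given installation recognizes. This is a minimal sketch; the rows returned depend on the installed time zone data.

```sql
-- Full time zone names, with the offset currently in effect.
SELECT name, abbrev, utc_offset, is_dst
FROM pg_timezone_names
WHERE name = 'America/New_York';

-- Abbreviations accepted in date/time input.
SELECT abbrev, utc_offset, is_dst
FROM pg_timezone_abbrevs
WHERE abbrev IN ('PST', 'EST', 'EDT');
```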

In short, this is the difference between abbreviations and full names: abbreviations represent a specific offset from UTC, whereas many of the full names imply a local daylight-savings time rule, and so have two possible UTC offsets. As an example, 2014-06-04 12:00 America/New_York represents noon local time in New York, which for this particular date was Eastern Daylight Time (UTC-4). So 2014-06-04 12:00 EDT specifies that same time instant. But 2014-06-04 12:00 EST specifies noon Eastern Standard Time (UTC-5), regardless of whether daylight savings was nominally in effect on that date.
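
The New York example above can be verified with a short sketch; the column aliases are arbitrary.

```sql
-- On this date the full name resolves to daylight time,
-- so it denotes the same instant as EDT.
SELECT TIMESTAMP WITH TIME ZONE '2014-06-04 12:00 America/New_York'
     = TIMESTAMP WITH TIME ZONE '2014-06-04 12:00 EDT' AS same_instant;
-- Result: true

-- EST always means UTC-5, so noon EST is one hour later than noon EDT.
SELECT TIMESTAMP WITH TIME ZONE '2014-06-04 12:00 EST'
     - TIMESTAMP WITH TIME ZONE '2014-06-04 12:00 EDT' AS difference;
-- Result: 01:00:00
```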

To complicate matters, some jurisdictions have used the same timezone abbreviation to mean different UTC offsets at different times; for example, in Moscow MSK has meant UTC+3 in some years and UTC+4 in others. PostgreSQL interprets such abbreviations according to whatever they meant (or had most recently meant) on the specified date; but, as with the EST example above, this is not necessarily the same as local civil time on that date.

One should be wary that the POSIX-style time zone feature can lead to silently accepting bogus input, since there is no check on the reasonableness of the zone abbreviations. For example, SET TIMEZONE TO FOOBAR0 will work, leaving the system effectively using a rather peculiar abbreviation for UTC. Another issue to keep in mind is that in POSIX time zone names, positive offsets are used for locations west of Greenwich. Everywhere else, PostgreSQL follows the ISO-8601 convention that positive timezone offsets are east of Greenwich.
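
The sign reversal is easy to overlook, so a sketch may help. Both queries start from the same instant, noon UTC; the first uses a POSIX-style zone string and the second an ISO-style interval offset.

```sql
-- POSIX-style name: 'UTC+3' means three hours WEST of Greenwich.
SELECT TIMESTAMP WITH TIME ZONE '2001-02-16 12:00:00+00' AT TIME ZONE 'UTC+3';
-- Result: 2001-02-16 09:00:00

-- ISO-style offset: +03:00 means three hours EAST of Greenwich.
SELECT TIMESTAMP WITH TIME ZONE '2001-02-16 12:00:00+00' AT TIME ZONE INTERVAL '+03:00';
-- Result: 2001-02-16 15:00:00
```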

In all cases, timezone names and abbreviations are recognized case-insensitively. (This is a change from PostgreSQL versions prior to 8.2, which were case-sensitive in some contexts but not others.)

Neither timezone names nor abbreviations are hard-wired into the server; they are obtained from configuration files stored under .../share/timezone/ and .../share/timezonesets/ of the installation directory (see Section B.3).

The TimeZone configuration parameter can be set in the file postgresql.conf, or in any of the other standard ways described in Chapter 19. There are also some special ways to set it:

The SQL command SET TIME ZONE sets the time zone for the session. This is an alternative spelling of SET TIMEZONE TO with a more SQL-spec-compatible syntax.
The PGTZ environment variable is used by libpq clients to send a SET TIME ZONE command to the server upon connection.

8.5.4. Interval Input

interval values can be written using the following verbose syntax:

[@] quantity unit [quantity unit...] [direction]

where quantity is a number (possibly signed); unit is microsecond, millisecond, second, minute, hour, day, week, month, year, decade, century, millennium, or abbreviations or plurals of these units; direction can be ago or empty. The at sign (@) is optional noise. The amounts of the different units are implicitly added with appropriate sign accounting. ago negates all the fields. This syntax is also used for interval output, if IntervalStyle is set to postgres_verbose.

Quantities of days, hours, minutes, and seconds can be specified without explicit unit markings. For example, '1 12:59:10' is read the same as '1 day 12 hours 59 min 10 sec'. Also, a combination of years and months can be specified with a dash; for example '200-10' is read the same as '200 years 10 months'. (These shorter forms are in fact the only ones allowed by the SQL standard, and are used for output when IntervalStyle is set to sql_standard.)

Interval values can also be written as ISO 8601 time intervals, using either the “format with designators” of the standard's section 4.4.3.2 or the “alternative format” of section 4.4.3.3. The format with designators looks like this:

P quantity unit [ quantity unit ...] [ T [ quantity unit ...]]

The string must start with a P, and may include a T that introduces the time-of-day units. The available unit abbreviations are given in Table 8.16. Units may be omitted, and may be specified in any order, but units smaller than a day must appear after T. In particular, the meaning of M depends on whether it is before or after T.

Table 8.16. ISO 8601 Interval Unit Abbreviations

| Abbreviation | Meaning |
| --- | --- |
| Y | Years |
| M | Months (in the date part) |
| W | Weeks |
| D | Days |
| H | Hours |
| M | Minutes (in the time part) |
| S | Seconds |

In the alternative format:

P [ years-months-days ] [ T hours:minutes:seconds ]

the string must begin with P, and a T separates the date and time parts of the interval. The values are given as numbers similar to ISO 8601 dates.

When writing an interval constant with a fields specification, or when assigning a string to an interval column that was defined with a fields specification, the interpretation of unmarked quantities depends on the fields. For example INTERVAL '1' YEAR is read as 1 year, whereas INTERVAL '1' means 1 second. Also, field values “to the right” of the least significant field allowed by the fields specification are silently discarded. For example, writing INTERVAL '1 day 2:03:04' HOUR TO MINUTE results in dropping the seconds field, but not the day field.
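
The cases described above can be tried directly. The results shown assume the default IntervalStyle setting, postgres.

```sql
-- The fields specification decides how an unmarked quantity is read.
SELECT INTERVAL '1' YEAR;
-- Result: 1 year

SELECT INTERVAL '1';
-- Result: 00:00:01

-- Seconds lie to the right of MINUTE and are discarded; the day field is kept.
SELECT INTERVAL '1 day 2:03:04' HOUR TO MINUTE;
-- Result: 1 day 02:03:00
```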

According to the SQL standard all fields of an interval value must have the same sign, so a leading negative sign applies to all fields; for example the negative sign in the interval literal '-1 2:03:04' applies to both the days and hour/minute/second parts. PostgreSQL allows the fields to have different signs, and traditionally treats each field in the textual representation as independently signed, so that the hour/minute/second part is considered positive in this example. If IntervalStyle is set to sql_standard then a leading sign is considered to apply to all fields (but only if no additional signs appear). Otherwise the traditional PostgreSQL interpretation is used. To avoid ambiguity, it's recommended to attach an explicit sign to each field if any field is negative.

Internally interval values are stored as months, days, and seconds. This is done because the number of days in a month varies, and a day can have 23 or 25 hours if a daylight savings time adjustment is involved. The months and days fields are integers while the seconds field can store fractions. Because intervals are usually created from constant strings or timestamp subtraction, this storage method works well in most cases. Functions justify_days and justify_hours are available for adjusting days and hours that overflow their normal ranges.

In the verbose input format, and in some fields of the more compact input formats, field values can have fractional parts; for example '1.5 week' or '01:02:03.45'. Such input is converted to the appropriate number of months, days, and seconds for storage. When this would result in a fractional number of months or days, the fraction is added to the lower-order fields using the conversion factors 1 month = 30 days and 1 day = 24 hours. For example, '1.5 month' becomes 1 month and 15 days. Only seconds will ever be shown as fractional on output.
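
The following sketch shows the fractional conversion together with the justify functions. The results assume the default IntervalStyle setting, postgres.

```sql
-- Half a month is carried down as 15 days (1 month = 30 days).
SELECT INTERVAL '1.5 month';
-- Result: 1 mon 15 days

-- 1.5 weeks is 10.5 days; the half day is carried down as 12 hours.
SELECT INTERVAL '1.5 week';
-- Result: 10 days 12:00:00

-- Fold overflowing hours into days, and overflowing days into months.
SELECT justify_hours(INTERVAL '27 hours');
-- Result: 1 day 03:00:00

SELECT justify_days(INTERVAL '35 days');
-- Result: 1 mon 5 days
```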

Table 8.17 shows some examples of valid interval input.

Table 8.17. Interval Input

| Example | Description |
| --- | --- |
| 1-2 | SQL standard format: 1 year 2 months |
| 3 4:05:06 | SQL standard format: 3 days 4 hours 5 minutes 6 seconds |
| 1 year 2 months 3 days 4 hours 5 minutes 6 seconds | Traditional Postgres format: 1 year 2 months 3 days 4 hours 5 minutes 6 seconds |
| P1Y2M3DT4H5M6S | ISO 8601 “format with designators”: same meaning as above |
| P0001-02-03T04:05:06 | ISO 8601 “alternative format”: same meaning as above |

8.5.5. Interval Output

The output format of the interval type can be set to one of the four styles sql_standard, postgres, postgres_verbose, or iso_8601, using the command SET intervalstyle. The default is the postgres format. Table 8.18 shows examples of each output style.

The sql_standard style produces output that conforms to the SQL standard's specification for interval literal strings, if the interval value meets the standard's restrictions (either year-month only or day-time only, with no mixing of positive and negative components). Otherwise the output looks like a standard year-month literal string followed by a day-time literal string, with explicit signs added to disambiguate mixed-sign intervals.

The output of the postgres style matches the output of PostgreSQL releases prior to 8.4 when the DateStyle parameter was set to ISO.

The output of the postgres_verbose style matches the output of PostgreSQL releases prior to 8.4 when the DateStyle parameter was set to non-ISO output.

The output of the iso_8601 style matches the “format with designators” described in section 4.4.3.2 of the ISO 8601 standard.

Table 8.18. Interval Output Style Examples

| Style Specification | Year-Month Interval | Day-Time Interval | Mixed Interval |
| --- | --- | --- | --- |
| sql_standard | 1-2 | 3 4:05:06 | -1-2 +3 -4:05:06 |
| postgres | 1 year 2 mons | 3 days 04:05:06 | -1 year -2 mons +3 days -04:05:06 |
| postgres_verbose | @ 1 year 2 mons | @ 3 days 4 hours 5 mins 6 secs | @ 1 year 2 mons -3 days 4 hours 5 mins 6 secs ago |
| iso_8601 | P1Y2M | P3DT4H5M6S | P-1Y-2M3DT-4H-5M-6S |
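
Switching the output style within a session can be sketched as follows, using two of the values from Table 8.18.

```sql
SET intervalstyle = 'sql_standard';
SELECT INTERVAL '1 year 2 months';
-- Result: 1-2

SET intervalstyle = 'iso_8601';
SELECT INTERVAL '3 days 4 hours 5 minutes 6 seconds';
-- Result: P3DT4H5M6S

-- Return to the session default.
RESET intervalstyle;
```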

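9.7. Pattern Matching

Version: 11

There are three separate approaches to pattern matching provided by PostgreSQL: the traditional SQL LIKE operator, the more recent SIMILAR TO operator (added in SQL:1999), and POSIX-style regular expressions. Aside from the basic “does this string match this pattern?” operators, functions are available to extract or replace matching substrings and to split a string at matching locations.

Tip

If you have pattern matching needs that go beyond this, consider writing a user-defined function in Perl or Tcl.

Caution

While most regular-expression searches can be executed very quickly, regular expressions can be contrived that take arbitrary amounts of time and memory to process. Be wary of accepting regular-expression search patterns from hostile sources. If you must do so, it is advisable to impose a statement timeout.

Searches using SIMILAR TO patterns have the same security hazards, since SIMILAR TO provides many of the same capabilities as POSIX-style regular expressions.

LIKE searches, being much simpler than the other two options, are safer to use with possibly-hostile pattern sources.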

9.7.1. `LIKE`

string LIKE pattern [ESCAPE escape-character]
string NOT LIKE pattern [ESCAPE escape-character]

The LIKE expression returns true if the string matches the supplied pattern. (As expected, the NOT LIKE expression returns false if LIKE returns true, and vice versa. An equivalent expression is NOT (string LIKE pattern).)

If pattern does not contain percent signs or underscores, then the pattern only represents the string itself; in that case LIKE acts like the equals operator. An underscore (_) in pattern stands for (matches) any single character; a percent sign (%) matches any sequence of zero or more characters.

Some examples:

'abc' LIKE 'abc'    true
'abc' LIKE 'a%'     true
'abc' LIKE '_b_'    true
'abc' LIKE 'c'      false

LIKE pattern matching always covers the entire string. Therefore, if it's desired to match a sequence anywhere within a string, the pattern must start and end with a percent sign.

To match a literal underscore or percent sign without matching other characters, the respective character in pattern must be preceded by the escape character. The default escape character is the backslash but a different one can be selected by using the ESCAPE clause. To match the escape character itself, write two escape characters.

Note

If you have standard_conforming_strings turned off, any backslashes you write in literal string constants will need to be doubled. See Section 4.1.2.1 for more information.

It's also possible to select no escape character by writing ESCAPE ''. This effectively disables the escape mechanism, which makes it impossible to turn off the special meaning of underscore and percent signs in the pattern.
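The escaping rules can be tried out with a few literal comparisons. This sketch assumes standard_conforming_strings is on (the default), so backslashes in the string constants are not doubled.

```sql
SELECT '100%' LIKE '100\%';              -- true: \% matches only a literal percent sign
SELECT '100x' LIKE '100\%';              -- false
SELECT 'a_b'  LIKE 'a#_b' ESCAPE '#';    -- true: # is the escape character here
SELECT 'axb'  LIKE 'a#_b' ESCAPE '#';    -- false
SELECT 'a\b'  LIKE 'a\\b';               -- true: two escape characters match one backslash
SELECT 'a_b'  LIKE 'a\_b' ESCAPE '';     -- false: with no escape character, \ is ordinary
```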

The key word ILIKE can be used instead of LIKE to make the match case-insensitive according to the active locale. This is not in the SQL standard but is a PostgreSQL extension.

The operator ~~ is equivalent to LIKE, and ~~* corresponds to ILIKE. There are also !~~ and !~~* operators that represent NOT LIKE and NOT ILIKE, respectively. All of these operators are PostgreSQL-specific.

There is also the prefix operator ^@ and the corresponding starts_with function, which cover cases where only a match at the beginning of the string is needed.
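A short sketch of the PostgreSQL-specific spellings described above; the string values are arbitrary.

```sql
SELECT 'ABC' ILIKE 'abc';                 -- true: case-insensitive match
SELECT 'abc' ~~  'a%';                    -- true: same as LIKE
SELECT 'ABC' ~~* 'a%';                    -- true: same as ILIKE
SELECT 'abc' !~~ 'a%';                    -- false: same as NOT LIKE
SELECT 'alphabet' ^@ 'alph';              -- true: prefix match
SELECT starts_with('alphabet', 'alph');   -- true: function form of ^@
```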

9.7.2. `SIMILAR TO` Regular Expressions

string SIMILAR TO pattern [ESCAPE escape-character]
string NOT SIMILAR TO pattern [ESCAPE escape-character]

The SIMILAR TO operator returns true or false depending on whether its pattern matches the given string. It is similar to LIKE, except that it interprets the pattern using the SQL standard's definition of a regular expression. SQL regular expressions are a curious cross between LIKE notation and common regular expression notation.

Like LIKE, the SIMILAR TO operator succeeds only if its pattern matches the entire string; this is unlike common regular expression behavior where the pattern can match any part of the string. Also like LIKE, SIMILAR TO uses _ and % as wildcard characters denoting any single character and any string, respectively (these are comparable to . and .* in POSIX regular expressions).

In addition to these facilities borrowed from LIKE, SIMILAR TO supports these pattern-matching metacharacters borrowed from POSIX regular expressions:

| denotes alternation (either of two alternatives).
* denotes repetition of the previous item zero or more times.
+ denotes repetition of the previous item one or more times.
? denotes repetition of the previous item zero or one time.
{m} denotes repetition of the previous item exactly m times.
{m,} denotes repetition of the previous item m or more times.
{m,n} denotes repetition of the previous item at least m and not more than n times.
Parentheses () can be used to group items into a single logical item.
A bracket expression [...] specifies a character class, just as in POSIX regular expressions.

Notice that the period (.) is not a metacharacter for SIMILAR TO.

As with LIKE, a backslash disables the special meaning of any of these metacharacters; or a different escape character can be specified with ESCAPE.

Some examples:

'abc' SIMILAR TO 'abc'      true
'abc' SIMILAR TO 'a'        false
'abc' SIMILAR TO '%(b|d)%'  true
'abc' SIMILAR TO '(b|c)%'   false
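A few further cases, written as complete queries, may help show how the POSIX-derived metacharacters combine with the LIKE wildcards. They assume standard_conforming_strings is on.

```sql
SELECT 'aaa' SIMILAR TO 'a{3}';      -- true: exactly three repetitions
SELECT 'abc' SIMILAR TO '[a-c]+';    -- true: bracket expression with repetition
SELECT 'ac'  SIMILAR TO 'ab?c';      -- true: b is optional
SELECT 'abc' SIMILAR TO 'a.c';       -- false: period is not a metacharacter
SELECT 'a.c' SIMILAR TO 'a.c';       -- true: period matches only itself
SELECT 'a_c' SIMILAR TO 'a\_c';      -- true: escaped underscore is literal
SELECT 'abc' SIMILAR TO 'a\_c';      -- false
```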

The substring function with three parameters, substring(string from pattern for escape-character), provides extraction of a substring that matches an SQL regular expression pattern. As with SIMILAR TO, the specified pattern must match the entire data string, or else the function fails and returns null. To indicate the part of the pattern that should be returned on success, the pattern must contain two occurrences of the escape character followed by a double quote ("). The text matching the portion of the pattern between these markers is returned.

Some examples, with #" delimiting the return string:

substring('foobar' from '%#"o_b#"%' for '#')   oob
substring('foobar' from '#"o_b#"%' for '#')    NULL

9.7.3. POSIX Regular Expressions

Table 9.14 lists the available operators for pattern matching using POSIX regular expressions.

Table 9.14. Regular Expression Match Operators

| Operator | Description | Example |
| --- | --- | --- |
| `~` | Matches regular expression, case sensitive | `'thomas' ~ '.*thomas.*'` |
| `~*` | Matches regular expression, case insensitive | `'thomas' ~* '.*Thomas.*'` |
| `!~` | Does not match regular expression, case sensitive | `'thomas' !~ '.*Thomas.*'` |
| `!~*` | Does not match regular expression, case insensitive | `'thomas' !~* '.*vadim.*'` |

POSIX regular expressions provide a more powerful means for pattern matching than the LIKE and SIMILAR TO operators. Many Unix tools such as egrep, sed, or awk use a pattern matching language that is similar to the one described here.

A regular expression is a character sequence that is an abbreviated definition of a set of strings (a regular set). A string is said to match a regular expression if it is a member of the regular set described by the regular expression. As with LIKE, pattern characters match string characters exactly unless they are special characters in the regular expression language — but regular expressions use different special characters than LIKE does. Unlike LIKE patterns, a regular expression is allowed to match anywhere within a string, unless the regular expression is explicitly anchored to the beginning or end of the string.

Some examples:

'abc' ~ 'abc'    true
'abc' ~ '^a'     true
'abc' ~ '(b|d)'  true
'abc' ~ '^(b|c)' false

The POSIX pattern language is described in much greater detail below.

The substring function with two parameters, substring(string from pattern), provides extraction of a substring that matches a POSIX regular expression pattern. It returns null if there is no match, otherwise the portion of the text that matched the pattern. But if the pattern contains any parentheses, the portion of the text that matched the first parenthesized subexpression (the one whose left parenthesis comes first) is returned. You can put parentheses around the whole expression if you want to use parentheses within it without triggering this exception. If you need parentheses in the pattern before the subexpression you want to extract, see the non-capturing parentheses described below.

Some examples:

substring('foobar' from 'o.b')     oob
substring('foobar' from 'o(.)b')   o
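The two workarounds mentioned above, wrapping the whole expression in parentheses and using non-capturing parentheses, might look like this:

```sql
-- Outer parentheses come first, so the whole match is returned.
SELECT substring('foobar' from '(o(.)b)');       -- oob

-- The non-capturing group is skipped; (b.) is the first capturing group.
SELECT substring('foobar' from '(?:fo+)(b.)');   -- ba

-- No match at all yields NULL.
SELECT substring('foobar' from 'x.b');           -- NULL
```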

The regexp_replace function provides substitution of new text for substrings that match POSIX regular expression patterns. It has the syntax regexp_replace(source, pattern, replacement [, flags ]). The source string is returned unchanged if there is no match to the pattern. If there is a match, the source string is returned with the replacement string substituted for the matching substring. The replacement string can contain \n, where n is 1 through 9, to indicate that the source substring matching the n'th parenthesized subexpression of the pattern should be inserted, and it can contain \& to indicate that the substring matching the entire pattern should be inserted. Write \\ if you need to put a literal backslash in the replacement text. The flags parameter is an optional text string containing zero or more single-letter flags that change the function's behavior. Flag i specifies case-insensitive matching, while flag g specifies replacement of each matching substring rather than only the first one. Supported flags (though not g) are described in Table 9.22.

Some examples:

regexp_replace('foobarbaz', 'b..', 'X')
                                   fooXbaz
regexp_replace('foobarbaz', 'b..', 'X', 'g')
                                   fooXX
regexp_replace('foobarbaz', 'b(..)', 'X\1Y', 'g')
                                   fooXarYXazY
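The remaining replacement features (\&, the i flag, and a literal backslash) can be sketched the same way, assuming standard_conforming_strings is on:

```sql
-- \& inserts the whole matched substring.
SELECT regexp_replace('foobarbaz', 'b..', '[\&]', 'g');        -- foo[bar][baz]

-- Flag i makes the match case-insensitive.
SELECT regexp_replace('Hello World', 'world', 'there', 'i');   -- Hello there

-- \\ in the replacement text produces one literal backslash.
SELECT regexp_replace('a/b', '/', '\\');                       -- a\b
```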

The regexp_match function returns a text array of captured substring(s) resulting from the first match of a POSIX regular expression pattern to a string. It has the syntax regexp_match(string, pattern [, flags ]). If there is no match, the result is NULL. If a match is found, and the pattern contains no parenthesized subexpressions, then the result is a single-element text array containing the substring matching the whole pattern. If a match is found, and the pattern contains parenthesized subexpressions, then the result is a text array whose n'th element is the substring matching the n'th parenthesized subexpression of the pattern (not counting “non-capturing” parentheses; see below for details). The flags parameter is an optional text string containing zero or more single-letter flags that change the function's behavior. Supported flags are described in Table 9.22.

Some examples:

SELECT regexp_match('foobarbequebaz', 'bar.*que');
 regexp_match
--------------
 {barbeque}
(1 row)

SELECT regexp_match('foobarbequebaz', '(bar)(beque)');
 regexp_match
--------------
 {bar,beque}
(1 row)

In the common case where you just want the whole matching substring or NULL for no match, write something like

SELECT (regexp_match('foobarbequebaz', 'bar.*que'))[1];
 regexp_match
--------------
 barbeque
(1 row)

The regexp_matches function returns a set of text arrays of captured substring(s) resulting from matching a POSIX regular expression pattern to a string. It has the same syntax as regexp_match. This function returns no rows if there is no match, one row if there is a match and the g flag is not given, or N rows if there are N matches and the g flag is given. Each returned row is a text array containing the whole matched substring or the substrings matching parenthesized subexpressions of the pattern, just as described above for regexp_match. regexp_matches accepts all the flags shown in Table 9.22, plus the g flag which commands it to return all matches, not just the first one.

Some examples:

SELECT regexp_matches('foo', 'not there');
 regexp_matches
----------------
(0 rows)

SELECT regexp_matches('foobarbequebazilbarfbonk', '(b[^b]+)(b[^b]+)', 'g');
 regexp_matches
----------------
 {bar,beque}
 {bazil,barf}
(2 rows)

Tip

In most cases regexp_matches() should be used with the g flag, since if you only want the first match, it's easier and more efficient to use regexp_match(). However, regexp_match() only exists in PostgreSQL version 10 and up. When working in older versions, a common trick is to place a regexp_matches() call in a sub-select, for example:

SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab;

This produces a text array if there's a match, or NULL if not, the same as regexp_match() would do. Without the sub-select, this query would produce no output at all for table rows without a match, which is typically not the desired behavior.

The regexp_split_to_table function splits a string using a POSIX regular expression pattern as a delimiter. It has the syntax regexp_split_to_table(string, pattern [, flags ]). If there is no match to the pattern, the function returns the string. If there is at least one match, for each match it returns the text from the end of the last match (or the beginning of the string) to the beginning of the match. When there are no more matches, it returns the text from the end of the last match to the end of the string. The flags parameter is an optional text string containing zero or more single-letter flags that change the function's behavior. regexp_split_to_table supports the flags described in Table 9.22.

The regexp_split_to_array function behaves the same as regexp_split_to_table, except that regexp_split_to_array returns its result as an array of text. It has the syntax regexp_split_to_array(string, pattern [, flags ]). The parameters are the same as for regexp_split_to_table.

Some examples:


SELECT foo FROM regexp_split_to_table('the quick brown fox jumps over the lazy dog', '\s+') AS foo;
  foo   
-------
 the    
 quick  
 brown  
 fox    
 jumps 
 over   
 the    
 lazy   
 dog    
(9 rows)

SELECT regexp_split_to_array('the quick brown fox jumps over the lazy dog', '\s+');
              regexp_split_to_array             
-----------------------------------------------
 {the,quick,brown,fox,jumps,over,the,lazy,dog}
(1 row)

SELECT foo FROM regexp_split_to_table('the quick brown fox', '\s*') AS foo;
 foo 
-----
 t         
 h         
 e         
 q         
 u         
 i         
 c         
 k         
 b         
 r         
 o         
 w         
 n         
 f         
 o         
 x         
(16 rows)

As the last example demonstrates, the regexp split functions ignore zero-length matches that occur at the start or end of the string or immediately after a previous match. This is contrary to the strict definition of regexp matching that is implemented by regexp_match and regexp_matches, but is usually the most convenient behavior in practice. Other software systems such as Perl use similar definitions.

9.7.3.1. Regular Expression Details

PostgreSQL's regular expressions are implemented using a software package written by Henry Spencer. Much of the description of regular expressions below is copied verbatim from his manual.

Regular expressions (REs), as defined in POSIX 1003.2, come in two forms: extended REs or EREs (roughly those of egrep), and basic REs or BREs (roughly those of ed). PostgreSQL supports both forms, and also implements some extensions that are not in the POSIX standard, but have become widely used due to their availability in programming languages such as Perl and Tcl. REs using these non-POSIX extensions are called advanced REs or AREs in this documentation. AREs are almost an exact superset of EREs, but BREs have several notational incompatibilities (as well as being much more limited). We first describe the ARE and ERE forms, noting features that apply only to AREs, and then describe how BREs differ.

Note

PostgreSQL always initially presumes that a regular expression follows the ARE rules. However, the more limited ERE or BRE rules can be chosen by prepending an embedded option to the RE pattern, as described in Section 9.7.3.4. This can be useful for compatibility with applications that expect exactly the POSIX 1003.2 rules.

A regular expression is defined as one or more branches, separated by |. It matches anything that matches one of the branches.

A branch is zero or more quantified atoms or constraints, concatenated. It matches a match for the first, followed by a match for the second, etc; an empty branch matches the empty string.

A quantified atom is an atom possibly followed by a single quantifier. Without a quantifier, it matches a match for the atom. With a quantifier, it can match some number of matches of the atom. An atom can be any of the possibilities shown in Table 9.15. The possible quantifiers and their meanings are shown in Table 9.16.

A constraint matches an empty string, but matches only when specific conditions are met. A constraint can be used where an atom could be used, except it cannot be followed by a quantifier. The simple constraints are shown in Table 9.17; some more constraints are described later.

Table 9.15. Regular Expression Atoms

| Atom | Description |
| --- | --- |
| `(re)` | (where re is any regular expression) matches a match for re, with the match noted for possible reporting |
| `(?:re)` | as above, but the match is not noted for reporting (a “non-capturing” set of parentheses) (AREs only) |
| `.` | matches any single character |
| `[chars]` | a bracket expression, matching any one of the chars (see Section 9.7.3.2 for more detail) |
| `\k` | (where k is a non-alphanumeric character) matches that character taken as an ordinary character, e.g., `\\` matches a backslash character |
| `\c` | where c is alphanumeric (possibly followed by other characters) is an escape, see Section 9.7.3.3 (AREs only; in EREs and BREs, this matches c) |
| `{` | when followed by a character other than a digit, matches the left-brace character `{`; when followed by a digit, it is the beginning of a bound (see below) |
| `x` | where x is a single character with no other significance, matches that character |

An RE cannot end with a backslash (\).

Note

If you have standard_conforming_strings turned off, any backslashes you write in literal string constants will need to be doubled. See Section 4.1.2.1 for more information.

Table 9.16. Regular Expression Quantifiers

| Quantifier | Matches |
| --- | --- |
| `*` | a sequence of 0 or more matches of the atom |
| `+` | a sequence of 1 or more matches of the atom |
| `?` | a sequence of 0 or 1 matches of the atom |
| `{m}` | a sequence of exactly m matches of the atom |
| `{m,}` | a sequence of m or more matches of the atom |
| `{m,n}` | a sequence of m through n (inclusive) matches of the atom; m cannot exceed n |
| `*?` | non-greedy version of `*` |
| `+?` | non-greedy version of `+` |
| `??` | non-greedy version of `?` |
| `{m}?` | non-greedy version of `{m}` |
| `{m,}?` | non-greedy version of `{m,}` |
| `{m,n}?` | non-greedy version of `{m,n}` |

The forms using {...} are known as bounds. The numbers m and n within a bound are unsigned decimal integers with permissible values from 0 to 255 inclusive.

Non-greedy quantifiers (available in AREs only) match the same possibilities as their corresponding normal (greedy) counterparts, but prefer the smallest number rather than the largest number of matches. See Section 9.7.3.5 for more detail.

Note

A quantifier cannot immediately follow another quantifier, e.g., ** is invalid. A quantifier cannot begin an expression or subexpression or follow ^ or |.
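To make the difference between greedy and non-greedy quantifiers concrete, here is a small sketch using substring, which returns the whole match when the pattern has no parentheses:

```sql
SELECT substring('<a><b>' from '<.*>');     -- <a><b>  (greedy: longest match)
SELECT substring('<a><b>' from '<.*?>');    -- <a>     (non-greedy: shortest match)
SELECT substring('aaaa' from 'a{2,3}');     -- aaa     (greedy bound)
SELECT substring('aaaa' from 'a{2,3}?');    -- aa      (non-greedy bound)
SELECT 'abab' ~ '^(ab){2}$';                -- true: the bound applies to the group
```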

Table 9.17. Regular Expression Constraints

| Constraint | Description |
| --- | --- |
| `^` | matches at the beginning of the string |
| `$` | matches at the end of the string |
| `(?=re)` | positive lookahead matches at any point where a substring matching re begins (AREs only) |
| `(?!re)` | negative lookahead matches at any point where no substring matching re begins (AREs only) |
| `(?<=re)` | positive lookbehind matches at any point where a substring matching re ends (AREs only) |
| `(?<!re)` | negative lookbehind matches at any point where no substring matching re ends (AREs only) |

Lookahead and lookbehind constraints cannot contain back references (see Section 9.7.3.3), and all parentheses within them are considered non-capturing.
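The lookaround constraints can be illustrated with a few queries. In this sketch the wanted text is wrapped in explicit capturing parentheses so that substring returns exactly that part; the sample strings are invented for the example.

```sql
-- Positive lookahead: digits that are followed by ' USD'.
SELECT substring('price: 100 USD' from '(\d+)(?= USD)');   -- 100

-- Negative lookahead: foo not followed by bar.
SELECT 'foobaz' ~ 'foo(?!bar)';                             -- true
SELECT 'foobar' ~ 'foo(?!bar)';                             -- false

-- Positive lookbehind: word characters preceded by 'key='.
SELECT substring('key=value' from '(?<=key=)(\w+)');        -- value

-- Negative lookbehind: a digit not preceded by 'a'.
SELECT substring('a1 b2' from '(?<!a)(\d)');                -- 2
```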

9.7.3.2. Bracket Expressions

A bracket expression is a list of characters enclosed in []. It normally matches any single character from the list (but see below). If the list begins with ^, it matches any single character not from the rest of the list. If two characters in the list are separated by -, this is shorthand for the full range of characters between those two (inclusive) in the collating sequence, e.g., [0-9] in ASCII matches any decimal digit. It is illegal for two ranges to share an endpoint, e.g., a-c-e. Ranges are very collating-sequence-dependent, so portable programs should avoid relying on them.

To include a literal ] in the list, make it the first character (after ^, if that is used). To include a literal -, make it the first or last character, or the second endpoint of a range. To use a literal - as the first endpoint of a range, enclose it in [. and .] to make it a collating element (see below). With the exception of these characters, some combinations using [ (see next paragraphs), and escapes (AREs only), all other special characters lose their special significance within a bracket expression. In particular, \ is not special when following ERE or BRE rules, though it is special (as introducing an escape) in AREs.

Within a bracket expression, a collating element (a character, a multiple-character sequence that collates as if it were a single character, or a collating-sequence name for either) enclosed in [. and .] stands for the sequence of characters of that collating element. The sequence is treated as a single element of the bracket expression's list. This allows a bracket expression containing a multiple-character collating element to match more than one character, e.g., if the collating sequence includes a ch collating element, then the RE [[.ch.]]*c matches the first five characters of chchcc.

Note

PostgreSQL currently does not support multi-character collating elements. This information describes possible future behavior.

Within a bracket expression, a collating element enclosed in [= and =] is an equivalence class, standing for the sequences of characters of all collating elements equivalent to that one, including itself. (If there are no other equivalent collating elements, the treatment is as if the enclosing delimiters were [. and .].) For example, if o and ^ are the members of an equivalence class, then [[=o=]], [[=^=]], and [o^] are all synonymous. An equivalence class cannot be an endpoint of a range.

Within a bracket expression, the name of a character class enclosed in [: and :] stands for the list of all characters belonging to that class. Standard character class names are: alnum, alpha, blank, cntrl, digit, graph, lower, print, punct, space, upper, xdigit. These stand for the character classes defined in ctype. A locale can provide others. A character class cannot be used as an endpoint of a range.

There are two special cases of bracket expressions: the bracket expressions [[:<:]] and [[:>:]] are constraints, matching empty strings at the beginning and end of a word respectively. A word is defined as a sequence of word characters that is neither preceded nor followed by word characters. A word character is an alnum character (as defined by ctype) or an underscore. This is an extension, compatible with but not specified by POSIX 1003.2, and should be used with caution in software intended to be portable to other systems. The constraint escapes described below are usually preferable; they are no more standard, but are easier to type.
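The bracket-expression rules from this section can be checked with a handful of queries. This is only a sketch with arbitrary sample strings, and the range examples assume an ASCII-compatible collating sequence.

```sql
SELECT 'b' ~ '^[a-c]$';                           -- true: range
SELECT 'd' ~ '^[^a-c]$';                          -- true: negated list
SELECT ']' ~ '^[]]$';                             -- true: ] listed first is literal
SELECT '-' ~ '^[a-]$';                            -- true: - listed last is literal
SELECT substring('abc123' from '[[:digit:]]+');   -- 123: character class
SELECT 'hello world' ~ '[[:<:]]world[[:>:]]';     -- true: world is a whole word
SELECT 'helloworld'  ~ '[[:<:]]world[[:>:]]';     -- false: no word start before world
```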

9.7.3.3. Regular Expression Escapes

Escapes are special sequences beginning with \ followed by an alphanumeric character. Escapes come in several varieties: character entry, class shorthands, constraint escapes, and back references. A \ followed by an alphanumeric character but not constituting a valid escape is illegal in AREs. In EREs, there are no escapes: outside a bracket expression, a \ followed by an alphanumeric character merely stands for that character as an ordinary character, and inside a bracket expression, \ is an ordinary character. (The latter is the one actual incompatibility between EREs and AREs.)

Character-entry escapes exist to make it easier to specify non-printing and other inconvenient characters in REs. They are shown in Table 9.18.

Class-shorthand escapes provide shorthands for certain commonly-used character classes. They are shown in Table 9.19.

A constraint escape is a constraint, matching the empty string if specific conditions are met, written as an escape. They are shown in Table 9.20.

A back reference (\n) matches the same string matched by the previous parenthesized subexpression specified by the number n (see Table 9.21). For example, ([bc])\1 matches bb or cc but not bc or cb. The subexpression must entirely precede the back reference in the RE. Subexpressions are numbered in the order of their leading parentheses. Non-capturing parentheses do not define subexpressions.

Table 9.18. Regular Expression Character-entry Escapes

| Escape | Description |
| --- | --- |
| `\a` | alert (bell) character, as in C |
| `\b` | backspace, as in C |
| `\B` | synonym for backslash (`\`) to help reduce the need for backslash doubling |
| `\cX` | (where X is any character) the character whose low-order 5 bits are the same as those of X, and whose other bits are all zero |
| `\e` | the character whose collating-sequence name is ESC, or failing that, the character with octal value 033 |
| `\f` | form feed, as in C |
| `\n` | newline, as in C |
| `\r` | carriage return, as in C |
| `\t` | horizontal tab, as in C |
| `\uwxyz` | (where wxyz is exactly four hexadecimal digits) the character whose hexadecimal value is 0xwxyz |
| `\Ustuvwxyz` | (where stuvwxyz is exactly eight hexadecimal digits) the character whose hexadecimal value is 0xstuvwxyz |
| `\v` | vertical tab, as in C |
| `\xhhh` | (where hhh is any sequence of hexadecimal digits) the character whose hexadecimal value is 0xhhh (a single character no matter how many hexadecimal digits are used) |
| `\0` | the character whose value is 0 (the null byte) |
| `\xy` | (where xy is exactly two octal digits, and is not a back reference) the character whose octal value is 0xy |
| `\xyz` | (where xyz is exactly three octal digits, and is not a back reference) the character whose octal value is 0xyz |

Hexadecimal digits are 0-9, a-f, and A-F. Octal digits are 0-7.

Numeric character-entry escapes specifying values outside the ASCII range (0-127) have meanings dependent on the database encoding. When the encoding is UTF-8, escape values are equivalent to Unicode code points, for example \u1234 means the character U+1234. For other multibyte encodings, character-entry escapes usually just specify the concatenation of the byte values for the character. If the escape value does not correspond to any legal character in the database encoding, no error will be raised, but it will never match any data.

The character-entry escapes are always taken as ordinary characters. For example, \135 is ] in ASCII, but \135 does not terminate a bracket expression.
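A few character-entry escapes in use, as a sketch. It assumes standard_conforming_strings is on, so the backslashes in the ordinary string constants reach the regular expression engine unchanged; the E'' constant is used only to put a real tab character into the data.

```sql
-- \t in the pattern matches the tab character embedded in the data string.
SELECT E'a\tb' ~ '^a\tb$';    -- true

-- Unicode and hexadecimal escapes naming the letter a (code point 0x61).
SELECT 'a' ~ '^\u0061$';      -- true
SELECT 'a' ~ '^\x61$';        -- true
```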

Table 9.19. Regular Expression Class-shorthand Escapes

| Escape | Description |
| --- | --- |
| `\d` | `[[:digit:]]` |
| `\s` | `[[:space:]]` |
| `\w` | `[[:alnum:]_]` (note underscore is included) |
| `\D` | `[^[:digit:]]` |
| `\S` | `[^[:space:]]` |
| `\W` | `[^[:alnum:]_]` (note underscore is included) |

Within bracket expressions, \d, \s, and \w lose their outer brackets, and \D, \S, and \W are illegal. (So, for example, [a-c\d] is equivalent to [a-c[:digit:]]. Also, [a-c\D], which is equivalent to [a-c^[:digit:]], is illegal.)
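The class shorthands are probably the most commonly used escapes. A short sketch, with invented sample strings:

```sql
SELECT regexp_replace('a1b22c', '\d', '#', 'g');    -- a#b##c
SELECT regexp_replace('a   b', '\s+', ' ', 'g');    -- a b
SELECT 'abc_1' ~ '^\w+$';                           -- true: underscore counts as a word character
SELECT 'abc-1' ~ '^\w+$';                           -- false: hyphen does not
SELECT 'x7' ~ '^[a-z\d]+$';                         -- true: \d used inside a bracket expression
```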

Table 9.20. Regular Expression Constraint Escapes

| Escape | Description |
| --- | --- |
| `\A` | matches only at the beginning of the string (see Section 9.7.3.5 for how this differs from `^`) |
| `\m` | matches only at the beginning of a word |
| `\M` | matches only at the end of a word |
| `\y` | matches only at the beginning or end of a word |
| `\Y` | matches only at a point that is not the beginning or end of a word |
| `\Z` | matches only at the end of the string (see Section 9.7.3.5 for how this differs from `$`) |

A word is defined as in the specification of [[:<:]] and [[:>:]] above. Constraint escapes are illegal within bracket expressions.
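Word-boundary constraint escapes are handy for whole-word matching. The following sketch uses the strings 'cat' and 'concat' purely as examples:

```sql
-- Replace cat only where it stands as a whole word.
SELECT regexp_replace('cat concat', '\mcat\M', 'dog', 'g');   -- dog concat

SELECT 'concat' ~ '\mcat';    -- false: cat does not begin a word here
SELECT 'concat' ~ 'cat\M';    -- true: cat ends the word
SELECT 'concat' ~ '\Ycat';    -- true: cat begins in the middle of a word
SELECT 'cat' ~ '\Acat\Z';     -- true: anchored to the start and end of the string
```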

Table 9.21. Regular Expression Back References

| Escape | Description |
| --- | --- |
| `\m` | (where m is a nonzero digit) a back reference to the m'th subexpression |
| `\mnn` | (where m is a nonzero digit, and nn is some more digits, and the decimal value mnn is not greater than the number of closing capturing parentheses seen so far) a back reference to the mnn'th subexpression |

Note

There is an inherent ambiguity between octal character-entry escapes and back references, which is resolved by the following heuristics, as hinted at above. A leading zero always indicates an octal escape. A single non-zero digit, not followed by another digit, is always taken as a back reference. A multi-digit sequence not starting with a zero is taken as a back reference if it comes after a suitable subexpression (i.e., the number is in the legal range for a back reference), and otherwise is taken as octal.
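A single-digit back reference, the unambiguous case, can be sketched as follows. Note that \1 inside the pattern is a back reference, while \1 in the replacement argument of regexp_replace refers to the captured text.

```sql
SELECT 'bb' ~ '^([bc])\1$';    -- true: the same character twice
SELECT 'bc' ~ '^([bc])\1$';    -- false: the second character differs

-- Collapse a word that is immediately repeated after a single space.
SELECT regexp_replace('the the cat', '(\w+) \1', '\1');   -- the cat
```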

9.7.3.4. Regular Expression Metasyntax

In addition to the main syntax described above, there are some special forms and miscellaneous syntactic facilities available.

An RE can begin with one of two special director prefixes. If an RE begins with ***:, the rest of the RE is taken as an ARE. (This normally has no effect in PostgreSQL, since REs are assumed to be AREs; but it does have an effect if ERE or BRE mode had been specified by the flags parameter to a regex function.) If an RE begins with ***=, the rest of the RE is taken to be a literal string, with all characters considered ordinary characters.

An ARE can begin with embedded options: a sequence (?xyz) (where xyz is one or more alphabetic characters) specifies options affecting the rest of the RE. These options override any previously determined options — in particular, they can override the case-sensitivity behavior implied by a regex operator, or the flags parameter to a regex function. The available option letters are shown in Table 9.22. Note that these same option letters are used in the flags parameters of regex functions.

Table 9.22. ARE Embedded-option Letters

| Option | Description |
| --- | --- |
| `b` | rest of RE is a BRE |
| `c` | case-sensitive matching (overrides operator type) |
| `e` | rest of RE is an ERE |
| `i` | case-insensitive matching (see Section 9.7.3.5) (overrides operator type) |
| `m` | historical synonym for `n` |
| `n` | newline-sensitive matching (see Section 9.7.3.5) |
| `p` | partial newline-sensitive matching (see Section 9.7.3.5) |
| `q` | rest of RE is a literal (“quoted”) string, all ordinary characters |
| `s` | non-newline-sensitive matching (default) |
| `t` | tight syntax (default; see below) |
| `w` | inverse partial newline-sensitive (“weird”) matching (see Section 9.7.3.5) |
| `x` | expanded syntax (see below) |

Embedded options take effect at the ) terminating the sequence. They can appear only at the start of an ARE (after the ***: director if any).
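A few of the embedded options and the ***= director in action, as a sketch with arbitrary sample strings:

```sql
SELECT 'ABC' ~  '(?i)abc';    -- true: (?i) makes ~ case-insensitive
SELECT 'ABC' ~* '(?c)abc';    -- false: (?c) overrides the case-insensitive operator
SELECT 'abc' ~  '(?q)a.c';    -- false: the rest is a literal string, so . is ordinary
SELECT 'a.c' ~  '(?q)a.c';    -- true
SELECT 'a.c' ~  '***=a.c';    -- true: ***= also treats the rest as a literal string
SELECT 'abc' ~  '***=a.c';    -- false
```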

In addition to the usual (tight) RE syntax, in which all characters are significant, there is an expanded syntax, available by specifying the embedded x option. In the expanded syntax, white-space characters in the RE are ignored, as are all characters between a # and the following newline (or the end of the RE). This permits paragraphing and commenting a complex RE. There are three exceptions to that basic rule:

a white-space character or # preceded by \ is retained
white space or # within a bracket expression is retained
white space and comments cannot appear within multi-character symbols, such as (?:

For this purpose, white-space characters are blank, tab, newline, and any character that belongs to the space character class.

Finally, in an ARE, outside bracket expressions, the sequence (?#ttt) (where ttt is any text not containing a )) is a comment, completely ignored. Again, this is not allowed between the characters of multi-character symbols, like (?:. Such comments are more a historical artifact than a useful facility, and their use is deprecated; use the expanded syntax instead.

None of these metasyntax extensions is available if an initial ***= director has specified that the user's input be treated as a literal string rather than as an RE.
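The expanded syntax is easiest to appreciate with a multi-line pattern. The sketch below checks a string against a simple YYYY-MM-DD shape; the date format is chosen only for illustration, and standard_conforming_strings is assumed to be on.

```sql
-- With (?x), white space and #-comments inside the pattern are ignored,
-- so the pattern is equivalent to ^\d{4}-\d{2}-\d{2}$
SELECT '2024-01-15' ~ '(?x)
    ^ \d{4}     # year
    - \d{2}     # month
    - \d{2} $   # day
';                                  -- true
```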

9.7.3.5. Regular Expression Matching Rules

In the event that an RE could match more than one substring of a given string, the RE matches the one starting earliest in the string. If the RE could match more than one substring starting at that point, either the longest possible match or the shortest possible match will be taken, depending on whether the RE is greedy or non-greedy.

Whether an RE is greedy or not is determined by the following rules:

Most atoms, and all constraints, have no greediness attribute (because they cannot match variable amounts of text anyway).
Adding parentheses around an RE does not change its greediness.
A quantified atom with a fixed-repetition quantifier ({m} or {m}?) has the same greediness (possibly none) as the atom itself.
A quantified atom with other normal quantifiers (including {m,n} with m equal to n) is greedy (prefers longest match).
A quantified atom with a non-greedy quantifier (including {m,n}? with m equal to n) is non-greedy (prefers shortest match).
A branch — that is, an RE that has no top-level | operator — has the same greediness as the first quantified atom in it that has a greediness attribute.
An RE consisting of two or more branches connected by the | operator is always greedy.

The above rules associate greediness attributes not only with individual quantified atoms, but with branches and entire REs that contain quantified atoms. What that means is that the matching is done in such a way that the branch, or whole RE, matches the longest or shortest possible substring as a whole. Once the length of the entire match is determined, the part of it that matches any particular subexpression is determined on the basis of the greediness attribute of that subexpression, with subexpressions starting earlier in the RE taking priority over ones starting later.

An example of what this means:

SELECT SUBSTRING('XY1234Z', 'Y*([0-9]{1,3})');
Result: 123
SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})');
Result: 1

In the first case, the RE as a whole is greedy because Y* is greedy. It can match beginning at the Y, and it matches the longest possible string starting there, i.e., Y123. The output is the parenthesized part of that, or 123. In the second case, the RE as a whole is non-greedy because Y*? is non-greedy. It can match beginning at the Y, and it matches the shortest possible string starting there, i.e., Y1. The subexpression [0-9]{1,3} is greedy but it cannot change the decision as to the overall match length; so it is forced to match just 1.

In short, when an RE contains both greedy and non-greedy subexpressions, the total match length is either as long as possible or as short as possible, according to the attribute assigned to the whole RE. The attributes assigned to the subexpressions only affect how much of that match they are allowed to “eat” relative to each other.

The quantifiers {1,1} and {1,1}? can be used to force greediness or non-greediness, respectively, on a subexpression or a whole RE. This is useful when you need the whole RE to have a greediness attribute different from what's deduced from its elements. As an example, suppose that we are trying to separate a string containing some digits into the digits and the parts before and after them. We might try to do that like this:

SELECT regexp_match('abc01234xyz', '(.*)(\d+)(.*)');
Result: {abc0123,4,xyz}

That didn't work: the first .* is greedy so it “eats” as much as it can, leaving the \d+ to match at the last possible place, the last digit. We might try to fix that by making it non-greedy:

SELECT regexp_match('abc01234xyz', '(.*?)(\d+)(.*)');
Result: {abc,0,""}

That didn't work either, because now the RE as a whole is non-greedy and so it ends the overall match as soon as possible. We can get what we want by forcing the RE as a whole to be greedy:

SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
Result: {abc,01234,xyz}

Controlling the RE's overall greediness separately from its components' greediness allows great flexibility in handling variable-length patterns.

When deciding what is a longer or shorter match, match lengths are measured in characters, not collating elements. An empty string is considered longer than no match at all. For example: bb* matches the three middle characters of abbbc; (week|wee)(night|knights) matches all ten characters of weeknights; when (.*).* is matched against abc the parenthesized subexpression matches all three characters; and when (a*)* is matched against bc both the whole RE and the parenthesized subexpression match an empty string.
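The first three cases above can be checked with a short sketch. It assumes a PostgreSQL session using the default advanced (ARE) flavor; the column aliases are illustrative only.

```sql
-- bb* takes the longest run of b's: the three middle characters of abbbc.
SELECT substring('abbbc' from 'bb*') AS longest_run;                      -- bbb

-- The longest overall match wins, so the first group settles for 'wee'.
SELECT regexp_match('weeknights', '(week|wee)(night|knights)') AS parts;  -- {wee,knights}

-- The earlier subexpression has priority and takes all three characters.
SELECT regexp_match('abc', '(.*).*') AS first_group;                      -- {abc}
```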

If case-independent matching is specified, the effect is much as if all case distinctions had vanished from the alphabet. When an alphabetic that exists in multiple cases appears as an ordinary character outside a bracket expression, it is effectively transformed into a bracket expression containing both cases, e.g., x becomes [xX]. When it appears inside a bracket expression, all case counterparts of it are added to the bracket expression, e.g., [x] becomes [xX] and [^x] becomes [^xX].
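As a minimal sketch of the rule above, the following statements compare the case-insensitive operator with the embedded (?i) option. They assume the default ARE flavor; the aliases are made up for illustration.

```sql
-- ~* is the case-insensitive regular expression match operator.
SELECT 'PostgreSQL' ~* 'postgres' AS op_match;             -- t

-- The embedded option (?i) at the start of an ARE has the same effect.
SELECT 'PostgreSQL' ~ '(?i)postgres' AS embedded_match;    -- t

-- Inside a bracket expression both cases are added: [^x] behaves like [^xX].
SELECT 'X' ~* '^[^x]$' AS negated_bracket;                 -- f
```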

If newline-sensitive matching is specified, . and bracket expressions using ^ will never match the newline character (so that matches will never cross newlines unless the RE explicitly arranges it) and ^ and $ will match the empty string after and before a newline respectively, in addition to matching at beginning and end of string respectively. But the ARE escapes \A and \Z continue to match beginning or end of string only.

If partial newline-sensitive matching is specified, this affects . and bracket expressions as with newline-sensitive matching, but not ^ and $.

If inverse partial newline-sensitive matching is specified, this affects ^ and $ as with newline-sensitive matching, but not . and bracket expressions. This isn't very useful but is provided for symmetry.
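The three newline modes can be compared side by side. This sketch assumes they are selected with the flag letters n, p, and w passed as the flags argument of regexp_match; the test string and aliases are illustrative.

```sql
-- Default: . matches a newline, and ^ matches only at the start of the string.
SELECT regexp_match(E'foo\nbar', 'foo.bar') IS NOT NULL AS dot_crosses_newline;       -- t
SELECT regexp_match(E'foo\nbar', '^bar')    IS NOT NULL AS caret_after_newline;       -- f

-- 'n' (newline-sensitive): . stops at newlines, ^ and $ also match around them.
SELECT regexp_match(E'foo\nbar', 'foo.bar', 'n') IS NOT NULL AS dot_crosses_newline;  -- f
SELECT regexp_match(E'foo\nbar', '^bar', 'n')    IS NOT NULL AS caret_after_newline;  -- t

-- 'p' (partial): only . and bracket expressions are affected.
SELECT regexp_match(E'foo\nbar', '^bar', 'p') IS NOT NULL AS caret_after_newline;     -- f

-- 'w' (inverse partial): only ^ and $ are affected.
SELECT regexp_match(E'foo\nbar', 'foo.bar', 'w') IS NOT NULL AS dot_crosses_newline;  -- t
```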

9.7.3.6. Limits And Compatibility

No particular limit is imposed on the length of REs in this implementation. However, programs intended to be highly portable should not employ REs longer than 256 bytes, as a POSIX-compliant implementation can refuse to accept such REs.

The only feature of AREs that is actually incompatible with POSIX EREs is that \ does not lose its special significance inside bracket expressions. All other ARE features use syntax which is illegal or has undefined or unspecified effects in POSIX EREs; the *** syntax of directors likewise is outside the POSIX syntax for both BREs and EREs.

Many of the ARE extensions are borrowed from Perl, but some have been changed to clean them up, and a few Perl extensions are not present. Incompatibilities of note include \b, \B, the lack of special treatment for a trailing newline, the addition of complemented bracket expressions to the things affected by newline-sensitive matching, the restrictions on parentheses and back references in lookahead/lookbehind constraints, and the longest/shortest-match (rather than first-match) matching semantics.

Two significant incompatibilities exist between AREs and the ERE syntax recognized by pre-7.4 releases of PostgreSQL:

In AREs, \ followed by an alphanumeric character is either an escape or an error, while in previous releases, it was just another way of writing the alphanumeric. This should not be much of a problem because there was no reason to write such a sequence in earlier releases.
In AREs, \ remains a special character within [], so a literal \ within a bracket expression must be written \\.
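A small sketch of the second point, assuming standard_conforming_strings is on (the default), so that backslashes in ordinary string constants reach the regular expression engine unchanged:

```sql
-- In an ARE, a literal backslash inside a bracket expression is written \\.
SELECT 'a\b' ~ '[\\]' AS has_backslash;        -- t

-- Because \ stays special inside brackets, \d still means "any digit" there.
SELECT '7' ~ '^[\d]$' AS digit_in_brackets;    -- t
```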

9.7.3.7. Basic Regular Expressions

BREs differ from EREs in several respects. In BREs, |, +, and ? are ordinary characters and there is no equivalent for their functionality. The delimiters for bounds are \{ and \}, with { and } by themselves ordinary characters. The parentheses for nested subexpressions are \( and \), with ( and ) by themselves ordinary characters. ^ is an ordinary character except at the beginning of the RE or the beginning of a parenthesized subexpression, $ is an ordinary character except at the end of the RE or the end of a parenthesized subexpression, and * is an ordinary character if it appears at the beginning of the RE or the beginning of a parenthesized subexpression (after a possible leading ^). Finally, single-digit back references are available, and \< and \> are synonyms for [[:<:]] and [[:>:]] respectively; no other escapes are available in BREs.
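These differences can be tried out without changing any server setting. The sketch below assumes the embedded option (?b), which treats the rest of the pattern as a BRE, and standard_conforming_strings set to on; the aliases are illustrative.

```sql
-- In a BRE, + is an ordinary character, so this looks for the literal text a+b.
SELECT 'a+b' ~ '(?b)a+b' AS literal_plus;      -- t
SELECT 'aab' ~ '(?b)a+b' AS no_repetition;     -- f

-- Bounds are written \{ and \}.
SELECT 'aab' ~ '(?b)a\{2\}b' AS bre_bound;     -- t

-- Grouping parentheses are written \( and \).
SELECT 'xaby' ~ '(?b)x\(ab\)y' AS bre_group;   -- t
```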

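9.4. String Functions and Operators

This section describes functions and operators for examining and manipulating string values. Strings in this context include values of the types character, character varying, and text. Unless otherwise noted, all of the functions listed below work on all of these types, but be wary of the potential effects of automatic space-padding when using the character type. Some functions also exist natively for the bit-string types.

SQL defines some string functions that use key words, rather than commas, to separate arguments. Details are in Table 9.8. PostgreSQL also provides versions of these functions that use the regular function invocation syntax (see Table 9.9).

Note: Before PostgreSQL 8.3, these functions would silently accept values of several non-string data types as well, due to the presence of implicit coercions from those data types to text. Those coercions have been removed because they frequently caused surprising behaviors. However, the string concatenation operator (||) still accepts non-string input, so long as at least one input is of a string type, as shown in Table 9.8. For other cases, insert an explicit coercion to text if you need to duplicate the previous behavior.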
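A brief sketch of what this means in practice, with illustrative values and aliases: concatenation needs at least one string operand, and other cases need an explicit cast to text.

```sql
-- One side is a string, so the other side is converted to text implicitly.
SELECT 'Value: ' || 42 AS mixed;           -- Value: 42

-- Neither side is a string here, so each operand is cast explicitly.
SELECT 4::text || 2::text AS both_cast;    -- 42

-- String functions no longer coerce silently; cast the argument first.
SELECT length(12345::text) AS digits;      -- 5
```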

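Table 9.8. SQL String Functions and Operators

| Function | Return Type | Description | Example | Result |
| --- | --- | --- | --- | --- |
| string \|\| string | text | String concatenation | 'Post' \|\| 'greSQL' | PostgreSQL |
| string \|\| non-string or non-string \|\| string | text | String concatenation with one non-string input | 'Value: ' \|\| 42 | Value: 42 |
| bit_length(string) | int | Number of bits in string | bit_length('jose') | 32 |
| char_length(string) or character_length(string) | int | Number of characters in string | char_length('jose') | 4 |
| lower(string) | text | Convert string to lower case | lower('TOM') | tom |
| octet_length(string) | int | Number of bytes in string | octet_length('jose') | 4 |
| overlay(string placing string from int [for int]) | text | Replace substring | overlay('Txxxxas' placing 'hom' from 2 for 4) | Thomas |
| position(substring in string) | int | Location of specified substring | position('om' in 'Thomas') | 3 |
| substring(string [from int] [for int]) | text | Extract substring | substring('Thomas' from 2 for 3) | hom |
| substring(string from pattern) | text | Extract substring matching POSIX regular expression. See Section 9.7 for more information on pattern matching. | substring('Thomas' from '...$') | mas |
| substring(string from pattern for escape) | text | Extract substring matching SQL regular expression. See Section 9.7 for more information on pattern matching. | substring('Thomas' from '%#"o_a#"_' for '#') | oma |
| trim([leading \| trailing \| both] [characters] from string) | text | Remove the longest string containing only characters from characters (a space by default) from the start, end, or both ends (both is the default) of string | trim(both 'xyz' from 'yxTomxx') | Tom |
| trim([leading \| trailing \| both] [from] string [, characters]) | text | Non-standard syntax for trim() | trim(both from 'yxTomxx', 'xyz') | Tom |
| upper(string) | text | Convert string to upper case | upper('tom') | TOM |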

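Additional string manipulation functions are available and are listed in Table 9.9. Some of them are used internally to implement the SQL-standard string functions listed in Table 9.8.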

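Table 9.9. Other String Functions

| Function | Return Type | Description | Example | Result |
| --- | --- | --- | --- | --- |
| ascii(string) | int | ASCII code of the first character of the argument. For UTF8, returns the Unicode code point of the character. For other multibyte encodings, the argument must be an ASCII character. | ascii('x') | 120 |
| btrim(string text [, characters text]) | text | Remove the longest string consisting only of characters in characters (a space by default) from the start and end of string | btrim('xyxtrimyyx', 'xyz') | trim |
| chr(int) | text | Character with the given code. For UTF8, the argument is treated as a Unicode code point. For other multibyte encodings, the argument must designate an ASCII character. The NULL (0) character is not allowed because text data types cannot store such bytes. | chr(65) | A |
| concat(str "any" [, str "any" [, ...] ]) | text | Concatenate the text representations of all the arguments. NULL arguments are ignored. | concat('abcde', 2, NULL, 22) | abcde222 |
| concat_ws(sep text, str "any" [, str "any" [, ...] ]) | text | Concatenate all but the first argument with separators. The first argument is used as the separator string. NULL arguments are ignored. | concat_ws(',', 'abcde', 2, NULL, 22) | abcde,2,22 |
| convert(string bytea, src_encoding name, dest_encoding name) | bytea | Convert string to dest_encoding. The original encoding is specified by src_encoding. The string must be valid in this encoding. Conversions can be defined by CREATE CONVERSION. There are also some predefined conversions; see Table 9.10 for the available conversions. | convert('text_in_utf8', 'UTF8', 'LATIN1') | text_in_utf8 represented in Latin-1 encoding (ISO 8859-1) |
| convert_from(string bytea, src_encoding name) | text | Convert string to the database encoding. The original encoding is specified by src_encoding. The string must be valid in this encoding. | convert_from('text_in_utf8', 'UTF8') | text_in_utf8 represented in the current database encoding |
| convert_to(string text, dest_encoding name) | bytea | Convert string to dest_encoding. | convert_to('some text', 'UTF8') | some text represented in the UTF8 encoding |
| decode(string text, format text) | bytea | Decode binary data from its textual representation in string. Options for format are the same as in encode. | decode('MTIzAAE=', 'base64') | \x3132330001 |
| encode(data bytea, format text) | text | Encode binary data into a textual representation. Supported formats are: base64, hex, escape. escape converts zero bytes and high-bit-set bytes to octal sequences (\nnn) and doubles backslashes. | encode(E'123\\000\\001', 'base64') | MTIzAAE= |
| format(formatstr text [, formatarg "any" [, ...] ]) | text | Format arguments according to a format string. This function is similar to the C function sprintf. See Section 9.4.1. | format('Hello %s, %1$s', 'World') | Hello World, World |
| initcap(string) | text | Convert the first letter of each word to upper case and the rest to lower case. Words are sequences of alphanumeric characters separated by non-alphanumeric characters. | initcap('hi THOMAS') | Hi Thomas |
| left(str text, n int) | text | Return the first n characters in the string. When n is negative, return all but the last \|n\| characters. | left('abcde', 2) | ab |
| length(string) | int | Number of characters in string | length('jose') | 4 |
| length(string bytea, encoding name) | int | Number of characters in string in the given encoding. The string must be valid in this encoding. | length('jose', 'UTF8') | 4 |
| lpad(string text, length int [, fill text]) | text | Fill up the string to length length by prepending the characters fill (a space by default). If the string is already longer than length then it is truncated (on the right). | lpad('hi', 5, 'xy') | xyxhi |
| ltrim(string text [, characters text]) | text | Remove the longest string containing only characters from characters (a space by default) from the start of string | ltrim('zzzytest', 'xyz') | test |
| md5(string) | text | Calculate the MD5 hash of string, returning the result in hexadecimal | md5('abc') | 900150983cd24fb0d6963f7d28e17f72 |
| parse_ident(qualified_identifier text [, strictmode boolean DEFAULT true ]) | text[] | Split qualified_identifier into an array of identifiers, removing any quoting of individual identifiers. By default, extra characters after the last identifier are considered an error; but if the second parameter is false, then such extra characters are ignored. (This behavior is useful for parsing names for objects like functions.) Note that this function does not truncate over-length identifiers. If you want truncation you can cast the result to name[]. | parse_ident('"SomeSchema".someTable') | {SomeSchema,sometable} |
| pg_client_encoding() | name | Current client encoding name | pg_client_encoding() | SQL_ASCII |
| quote_ident(string text) | text | Return the given string suitably quoted to be used as an identifier in an SQL statement string. Quotes are added only if necessary (i.e., if the string contains non-identifier characters or would be case-folded). Embedded quotes are properly doubled. See also Example 42.1. | quote_ident('Foo bar') | "Foo bar" |
| quote_literal(string text) | text | Return the given string suitably quoted to be used as a string literal in an SQL statement string. Embedded single quotes and backslashes are properly doubled. Note that quote_literal returns null on null input; if the argument might be null, quote_nullable is often more suitable. See also Example 42.1. | quote_literal(E'O\'Reilly') | 'O''Reilly' |
| quote_literal(value anyelement) | text | Coerce the given value to text and then quote it as a literal. Embedded single quotes and backslashes are properly doubled. | quote_literal(42.5) | '42.5' |
| quote_nullable(string text) | text | Return the given string suitably quoted to be used as a string literal in an SQL statement string; or, if the argument is null, return NULL. Embedded single quotes and backslashes are properly doubled. See also Example 42.1. | quote_nullable(NULL) | NULL |
| quote_nullable(value anyelement) | text | Coerce the given value to text and then quote it as a literal; or, if the argument is null, return NULL. Embedded single quotes and backslashes are properly doubled. | quote_nullable(42.5) | '42.5' |
| regexp_match(string text, pattern text [, flags text]) | text[] | Return the captured substring(s) resulting from the first match of a POSIX regular expression to the string. See Section 9.7.3 for more information. | regexp_match('foobarbequebaz', '(bar)(beque)') | {bar,beque} |
| regexp_matches(string text, pattern text [, flags text]) | setof text[] | Return the captured substring(s) resulting from matching a POSIX regular expression to the string. See Section 9.7.3 for more information. | regexp_matches('foobarbequebaz', 'ba.', 'g') | {bar}, {baz} (2 rows) |
| regexp_replace(string text, pattern text, replacement text [, flags text]) | text | Replace substring(s) matching a POSIX regular expression. See Section 9.7.3 for more information. | regexp_replace('Thomas', '.[mN]a.', 'M') | ThM |
| regexp_split_to_array(string text, pattern text [, flags text]) | text[] | Split string using a POSIX regular expression as the delimiter. See Section 9.7.3 for more information. | regexp_split_to_array('hello world', E'\\s+') | {hello,world} |
| regexp_split_to_table(string text, pattern text [, flags text]) | setof text | Split string using a POSIX regular expression as the delimiter. See Section 9.7.3 for more information. | regexp_split_to_table('hello world', E'\\s+') | hello, world (2 rows) |
| repeat(string text, number int) | text | Repeat string the specified number of times | repeat('Pg', 4) | PgPgPgPg |
| replace(string text, from text, to text) | text | Replace all occurrences in string of substring from with substring to | replace('abcdefabcdef', 'cd', 'XX') | abXXefabXXef |
| reverse(str) | text | Return the reversed string. | reverse('abcde') | edcba |
| right(str text, n int) | text | Return the last n characters in the string. When n is negative, return all but the first \|n\| characters. | right('abcde', 2) | de |
| rpad(string text, length int [, fill text]) | text | Fill up the string to length length by appending the characters fill (a space by default). If the string is already longer than length then it is truncated. | rpad('hi', 5, 'xy') | hixyx |
| rtrim(string text [, characters text]) | text | Remove the longest string containing only characters from characters (a space by default) from the end of string | rtrim('testxxzx', 'xyz') | test |
| split_part(string text, delimiter text, field int) | text | Split string on delimiter and return the given field (counting from one) | split_part('abc~@~def~@~ghi', '~@~', 2) | def |
| strpos(string, substring) | int | Location of specified substring (same as position(substring in string), but note the reversed argument order) | strpos('high', 'ig') | 2 |
| substr(string, from [, count]) | text | Extract substring (same as substring(string from from for count)) | substr('alphabet', 3, 2) | ph |
| to_ascii(string text [, encoding text]) | text | Convert string to ASCII from another encoding (only supports conversion from LATIN1, LATIN2, LATIN9, and WIN1250 encodings) | to_ascii('Karel') | Karel |
| to_hex(number int or bigint) | text | Convert number to its equivalent hexadecimal representation | to_hex(2147483647) | 7fffffff |
| translate(string text, from text, to text) | text | Any character in string that matches a character in the from set is replaced by the corresponding character in the to set. If from is longer than to, occurrences of the extra characters in from are removed. | translate('12345', '143', 'ax') | a2x5 |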

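The concat, concat_ws and format functions are variadic, so it is possible to pass the values to be concatenated or formatted as an array marked with the VARIADIC keyword (see Section 37.4.5). The array's elements are treated as if they were separate ordinary arguments to the function. If the variadic array argument is NULL, concat and concat_ws return NULL, but format treats a NULL as a zero-element array.

See also the aggregate function string_agg in Section 9.20.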
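The VARIADIC behavior described above can be sketched as follows; the array contents and aliases are illustrative.

```sql
-- Array elements are treated as separate ordinary arguments.
SELECT concat_ws(',', VARIADIC ARRAY['a', 'b', 'c']) AS joined;       -- a,b,c
SELECT format('%s and %s', VARIADIC ARRAY['x', 'y']) AS formatted;    -- x and y

-- A NULL variadic array makes concat return NULL ...
SELECT concat(VARIADIC NULL::text[]) IS NULL AS concat_is_null;       -- t

-- ... while format treats it as an array with zero elements.
SELECT format('no args', VARIADIC NULL::text[]) AS format_ok;         -- no args
```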

Table 9.10. Built-in Conversions

| Conversion Name | Source Encoding | Destination Encoding |
| --- | --- | --- |
| ascii_to_mic | SQL_ASCII | MULE_INTERNAL |
| ascii_to_utf8 | SQL_ASCII | UTF8 |
| big5_to_euc_tw | BIG5 | EUC_TW |
| big5_to_mic | BIG5 | MULE_INTERNAL |
| big5_to_utf8 | BIG5 | UTF8 |
| euc_cn_to_mic | EUC_CN | MULE_INTERNAL |
| euc_cn_to_utf8 | EUC_CN | UTF8 |
| euc_jp_to_mic | EUC_JP | MULE_INTERNAL |
| euc_jp_to_sjis | EUC_JP | SJIS |
| euc_jp_to_utf8 | EUC_JP | UTF8 |
| euc_kr_to_mic | EUC_KR | MULE_INTERNAL |
| euc_kr_to_utf8 | EUC_KR | UTF8 |
| euc_tw_to_big5 | EUC_TW | BIG5 |
| euc_tw_to_mic | EUC_TW | MULE_INTERNAL |
| euc_tw_to_utf8 | EUC_TW | UTF8 |
| gb18030_to_utf8 | GB18030 | UTF8 |
| gbk_to_utf8 | GBK | UTF8 |
| iso_8859_10_to_utf8 | LATIN6 | UTF8 |
| iso_8859_13_to_utf8 | LATIN7 | UTF8 |
| iso_8859_14_to_utf8 | LATIN8 | UTF8 |
| iso_8859_15_to_utf8 | LATIN9 | UTF8 |
| iso_8859_16_to_utf8 | LATIN10 | UTF8 |
| iso_8859_1_to_mic | LATIN1 | MULE_INTERNAL |
| iso_8859_1_to_utf8 | LATIN1 | UTF8 |
| iso_8859_2_to_mic | LATIN2 | MULE_INTERNAL |
| iso_8859_2_to_utf8 | LATIN2 | UTF8 |
| iso_8859_2_to_windows_1250 | LATIN2 | WIN1250 |
| iso_8859_3_to_mic | LATIN3 | MULE_INTERNAL |
| iso_8859_3_to_utf8 | LATIN3 | UTF8 |
| iso_8859_4_to_mic | LATIN4 | MULE_INTERNAL |
| iso_8859_4_to_utf8 | LATIN4 | UTF8 |
| iso_8859_5_to_koi8_r | ISO_8859_5 | KOI8R |
| iso_8859_5_to_mic | ISO_8859_5 | MULE_INTERNAL |
| iso_8859_5_to_utf8 | ISO_8859_5 | UTF8 |
| iso_8859_5_to_windows_1251 | ISO_8859_5 | WIN1251 |
| iso_8859_5_to_windows_866 | ISO_8859_5 | WIN866 |
| iso_8859_6_to_utf8 | ISO_8859_6 | UTF8 |
| iso_8859_7_to_utf8 | ISO_8859_7 | UTF8 |
| iso_8859_8_to_utf8 | ISO_8859_8 | UTF8 |
| iso_8859_9_to_utf8 | LATIN5 | UTF8 |
| johab_to_utf8 | JOHAB | UTF8 |
| koi8_r_to_iso_8859_5 | KOI8R | ISO_8859_5 |
| koi8_r_to_mic | KOI8R | MULE_INTERNAL |
| koi8_r_to_utf8 | KOI8R | UTF8 |
| koi8_r_to_windows_1251 | KOI8R | WIN1251 |
| koi8_r_to_windows_866 | KOI8R | WIN866 |
| koi8_u_to_utf8 | KOI8U | UTF8 |
| mic_to_ascii | MULE_INTERNAL | SQL_ASCII |
| mic_to_big5 | MULE_INTERNAL | BIG5 |
| mic_to_euc_cn | MULE_INTERNAL | EUC_CN |
| mic_to_euc_jp | MULE_INTERNAL | EUC_JP |
| mic_to_euc_kr | MULE_INTERNAL | EUC_KR |
| mic_to_euc_tw | MULE_INTERNAL | EUC_TW |
| mic_to_iso_8859_1 | MULE_INTERNAL | LATIN1 |
| mic_to_iso_8859_2 | MULE_INTERNAL | LATIN2 |
| mic_to_iso_8859_3 | MULE_INTERNAL | LATIN3 |
| mic_to_iso_8859_4 | MULE_INTERNAL | LATIN4 |
| mic_to_iso_8859_5 | MULE_INTERNAL | ISO_8859_5 |
| mic_to_koi8_r | MULE_INTERNAL | KOI8R |
| mic_to_sjis | MULE_INTERNAL | SJIS |
| mic_to_windows_1250 | MULE_INTERNAL | WIN1250 |
| mic_to_windows_1251 | MULE_INTERNAL | WIN1251 |
| mic_to_windows_866 | MULE_INTERNAL | WIN866 |
| sjis_to_euc_jp | SJIS | EUC_JP |
| sjis_to_mic | SJIS | MULE_INTERNAL |
| sjis_to_utf8 | SJIS | UTF8 |
| tcvn_to_utf8 | WIN1258 | UTF8 |
| uhc_to_utf8 | UHC | UTF8 |
| utf8_to_ascii | UTF8 | SQL_ASCII |
| utf8_to_big5 | UTF8 | BIG5 |
| utf8_to_euc_cn | UTF8 | EUC_CN |
| utf8_to_euc_jp | UTF8 | EUC_JP |
| utf8_to_euc_kr | UTF8 | EUC_KR |
| utf8_to_euc_tw | UTF8 | EUC_TW |
| utf8_to_gb18030 | UTF8 | GB18030 |
| utf8_to_gbk | UTF8 | GBK |
| utf8_to_iso_8859_1 | UTF8 | LATIN1 |
| utf8_to_iso_8859_10 | UTF8 | LATIN6 |
| utf8_to_iso_8859_13 | UTF8 | LATIN7 |
| utf8_to_iso_8859_14 | UTF8 | LATIN8 |
| utf8_to_iso_8859_15 | UTF8 | LATIN9 |
| utf8_to_iso_8859_16 | UTF8 | LATIN10 |
| utf8_to_iso_8859_2 | UTF8 | LATIN2 |
| utf8_to_iso_8859_3 | UTF8 | LATIN3 |
| utf8_to_iso_8859_4 | UTF8 | LATIN4 |
| utf8_to_iso_8859_5 | UTF8 | ISO_8859_5 |
| utf8_to_iso_8859_6 | UTF8 | ISO_8859_6 |
| utf8_to_iso_8859_7 | UTF8 | ISO_8859_7 |
| utf8_to_iso_8859_8 | UTF8 | ISO_8859_8 |
| utf8_to_iso_8859_9 | UTF8 | LATIN5 |
| utf8_to_johab | UTF8 | JOHAB |
| utf8_to_koi8_r | UTF8 | KOI8R |
| utf8_to_koi8_u | UTF8 | KOI8U |
| utf8_to_sjis | UTF8 | SJIS |
| utf8_to_tcvn | UTF8 | WIN1258 |
| utf8_to_uhc | UTF8 | UHC |
| utf8_to_windows_1250 | UTF8 | WIN1250 |
| utf8_to_windows_1251 | UTF8 | WIN1251 |
| utf8_to_windows_1252 | UTF8 | WIN1252 |
| utf8_to_windows_1253 | UTF8 | WIN1253 |
| utf8_to_windows_1254 | UTF8 | WIN1254 |
| utf8_to_windows_1255 | UTF8 | WIN1255 |
| utf8_to_windows_1256 | UTF8 | WIN1256 |
| utf8_to_windows_1257 | UTF8 | WIN1257 |
| utf8_to_windows_866 | UTF8 | WIN866 |
| utf8_to_windows_874 | UTF8 | WIN874 |
| windows_1250_to_iso_8859_2 | WIN1250 | LATIN2 |
| windows_1250_to_mic | WIN1250 | MULE_INTERNAL |
| windows_1250_to_utf8 | WIN1250 | UTF8 |
| windows_1251_to_iso_8859_5 | WIN1251 | ISO_8859_5 |
| windows_1251_to_koi8_r | WIN1251 | KOI8R |
| windows_1251_to_mic | WIN1251 | MULE_INTERNAL |
| windows_1251_to_utf8 | WIN1251 | UTF8 |
| windows_1251_to_windows_866 | WIN1251 | WIN866 |
| windows_1252_to_utf8 | WIN1252 | UTF8 |
| windows_1256_to_utf8 | WIN1256 | UTF8 |
| windows_866_to_iso_8859_5 | WIN866 | ISO_8859_5 |
| windows_866_to_koi8_r | WIN866 | KOI8R |
| windows_866_to_mic | WIN866 | MULE_INTERNAL |
| windows_866_to_utf8 | WIN866 | UTF8 |
| windows_866_to_windows_1251 | WIN866 | WIN1251 |
| windows_874_to_utf8 | WIN874 | UTF8 |
| euc_jis_2004_to_utf8 | EUC_JIS_2004 | UTF8 |
| utf8_to_euc_jis_2004 | UTF8 | EUC_JIS_2004 |
| shift_jis_2004_to_utf8 | SHIFT_JIS_2004 | UTF8 |
| utf8_to_shift_jis_2004 | UTF8 | SHIFT_JIS_2004 |
| euc_jis_2004_to_shift_jis_2004 | EUC_JIS_2004 | SHIFT_JIS_2004 |
| shift_jis_2004_to_euc_jis_2004 | SHIFT_JIS_2004 | EUC_JIS_2004 |

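The conversion names follow a standard naming scheme: the official name of the source encoding with all non-alphanumeric characters replaced by underscores, followed by _to_, followed by the similarly processed destination encoding name. Therefore, the names might deviate from the customary encoding names.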
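As a rough illustration of how these conversions are used, the sketch below round-trips a short string through LATIN1. It assumes a database encoded in UTF8 and the default hex output format for bytea; the sample word and aliases are illustrative.

```sql
-- Encode text as LATIN1 bytes, then decode those bytes back to text.
SELECT convert_to('café', 'LATIN1') AS latin1_bytes;                        -- \x636166e9
SELECT convert_from(convert_to('café', 'LATIN1'), 'LATIN1') AS roundtrip;   -- café

-- Re-encode bytes from one encoding to another without going through text.
SELECT convert(convert_to('café', 'UTF8'), 'UTF8', 'LATIN1') AS recoded;    -- \x636166e9

-- List the conversions in the system catalog whose names start with utf8_to_.
SELECT conname FROM pg_conversion WHERE conname LIKE 'utf8\_to\_%' ORDER BY conname;
```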

9.4.1. `format`

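The function format produces output formatted according to a format string, in a style similar to the C function sprintf.

format(formatstr text [, formatarg "any" [, ...] ])

formatstr is a format string that specifies how the result should be formatted. Text in the format string is copied directly to the result, except where format specifiers are used. Format specifiers act as placeholders in the string, defining how subsequent function arguments should be formatted and inserted into the result. Each formatarg argument is converted to text according to the usual output rules for its data type, and then formatted and inserted into the result string according to the format specifier(s).

Format specifiers are introduced by a % character and have the form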

%[position][flags][width]type

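where the component fields are:

position (optional)

A string of the form n$ where n is the index of the argument to print. Index 1 means the first argument after formatstr. If the position is omitted, the default is to use the next argument in sequence.

flags (optional)

Additional options controlling how the format specifier's output is formatted. Currently the only supported flag is a minus sign (-), which causes the format specifier's output to be left-justified. This has no effect unless the width field is also specified.

width (optional)

Specifies the minimum number of characters to use to display the format specifier's output. The output is padded on the left or right (depending on the - flag) with spaces as needed to fill the width. A too-small width does not cause truncation of the output, but is simply ignored. The width may be specified using any of the following: a positive integer; an asterisk (*) to use the next function argument as the width; or a string of the form *n$ to use the nth function argument as the width.

If the width comes from a function argument, that argument is consumed before the argument that is used for the format specifier's value. If the width argument is negative, the result is left-aligned (as if the - flag had been specified) within a field of length abs(width).

type (required)

The type of format conversion to use to produce the format specifier's output. The following types are supported:

s formats the argument value as a simple string. A null value is treated as an empty string.
I treats the argument value as an SQL identifier, double-quoting it if necessary. It is an error for the value to be null (equivalent to quote_ident).
L quotes the argument value as an SQL literal. A null value is displayed as the string NULL, without quotes (equivalent to quote_nullable).

In addition to the format specifiers described above, the special sequence %% may be used to output a literal % character.

Here are some examples of the basic format conversions: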

SELECT format('Hello %s', 'World');
Result: Hello World

SELECT format('Testing %s, %s, %s, %%', 'one', 'two', 'three');
Result: Testing one, two, three, %

SELECT format('INSERT INTO %I VALUES(%L)', 'Foo bar', E'O\'Reilly');
Result: INSERT INTO "Foo bar" VALUES('O''Reilly')

SELECT format('INSERT INTO %I VALUES(%L)', 'locations', E'C:\\Program Files');
Result: INSERT INTO locations VALUES(E'C:\\Program Files')

Here are examples using width fields and the - flag:

SELECT format('|%10s|', 'foo');
Result: |       foo|

SELECT format('|%-10s|', 'foo');
Result: |foo       |

SELECT format('|%*s|', 10, 'foo');
Result: |       foo|

SELECT format('|%*s|', -10, 'foo');
Result: |foo       |

SELECT format('|%-*s|', 10, 'foo');
Result: |foo       |

SELECT format('|%-*s|', -10, 'foo');
Result: |foo       |

These examples show use of position fields:

SELECT format('Testing %3$s, %2$s, %1$s', 'one', 'two', 'three');
Result: Testing three, two, one

SELECT format('|%*2$s|', 'foo', 10, 'bar');
Result: |       bar|

SELECT format('|%1$*2$s|', 'foo', 10, 'bar');
Result: |       foo|

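Unlike the standard C function sprintf, PostgreSQL's format function allows format specifiers with and without position fields to be mixed in the same format string. A format specifier without a position field always uses the next argument after the last argument consumed. In addition, the format function does not require all function arguments to be used in the format string. For example: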

SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
Result: Testing three, two, three

The %I and %L format specifiers are particularly useful for safely constructing dynamic SQL statements. See Example 42.1.
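As a minimal sketch of that use (not a reproduction of Example 42.1), the PL/pgSQL block below builds a statement from a table name and a value. The variable names and the table name are hypothetical, and the EXECUTE line is left commented out because no such table is assumed to exist.

```sql
DO $$
DECLARE
    tbl  text := 'Foo bar';      -- hypothetical table name that needs quoting
    val  text := 'O''Reilly';    -- value containing a single quote
    stmt text;
BEGIN
    -- %I quotes the identifier, %L quotes the literal.
    stmt := format('INSERT INTO %I VALUES (%L)', tbl, val);
    RAISE NOTICE '%', stmt;      -- INSERT INTO "Foo bar" VALUES ('O''Reilly')
    -- EXECUTE stmt;             -- would run the statement if the table existed
END
$$;
```

Building the text with %I and %L, rather than concatenating raw values, keeps identifier and literal quoting in one place.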

7.3.3. `DISTINCT`