This chapter describes the syntax of SQL. It forms the foundation for understanding the following chapters, which go into detail about how SQL is used to define and modify data.
We also advise users who are already familiar with SQL to read this chapter carefully, because it contains several rules and concepts that differ between SQL databases or that are specific to PostgreSQL.
Value expressions are used in a variety of contexts, such as in the target list of the SELECT command, as new column values in INSERT or UPDATE, or in search conditions in a number of commands. The result of a value expression is sometimes called a scalar, to distinguish it from the result of a table expression (which is a table). Value expressions are therefore also called scalar expressions (or even simply expressions). The expression syntax allows the calculation of values from primitive parts using arithmetic, logical, set, and other operations.
A value expression is one of the following:
a constant or literal value
a column reference
a positional parameter reference, in the body of a function definition or prepared statement
a subscripted expression
a field selection expression
an operator invocation
a function call
an aggregate expression
a window function call
a type cast
a collation expression
a scalar subquery
an array constructor
a row constructor
another value expression in parentheses (used to group subexpressions and override precedence)
In addition to this list, there are a number of constructs that can be classified as expressions but do not follow any general syntax rules. These generally have the semantics of a function or operator and are explained in the appropriate location in Chapter 9. An example is the IS NULL clause.
We have already discussed constants in Section 4.1.2, so the discussion continues with the items following constants.
To reference a column, write: correlation.columnname
Here, correlation is the name of a table (possibly qualified with a schema name), or an alias for a table defined by means of a FROM clause. The correlation name and separating dot can be omitted if the column name is unique across all the tables being used in the current query. (See also Chapter 7.)
A positional parameter reference is used to indicate a value that is supplied externally to an SQL statement. Parameters are used in SQL function definitions and in prepared queries. Some client libraries also support specifying data values separately from the SQL command string, in which case parameters are used to refer to the out-of-line data values. The form of a parameter reference is: $number
For example, consider the definition of a function, dept, as in the sketch below. There, the $1 references the value of the first function argument whenever the function is invoked.
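A minimal sketch of such a definition (this is the manual's classic example; it assumes a dept table whose row type is also named dept):

CREATE FUNCTION dept(text) RETURNS dept
    AS $$ SELECT * FROM dept WHERE name = $1 $$
    LANGUAGE SQL;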
If an expression yields a value of an array type, then a specific element of the array value can be extracted by writing expression[subscript], or multiple adjacent elements (an "array slice") can be extracted by writing expression[lower_subscript:upper_subscript]. Each subscript is itself an expression, which must yield an integer value.
In general the array expression must be parenthesized, but the parentheses can be omitted when the expression to be subscripted is just a column reference or positional parameter. Also, multiple subscripts can be concatenated when the original array is multidimensional. For example:
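mytable.arraycolumn[4]
mytable.two_d_column[17][34]
$1[10:42]
(arrayfunction(a,b))[42]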
The parentheses in the last example are required. See Section 8.15 for more about arrays.
If an expression yields a value of a composite type (row type), then a specific field of the row can be extracted by writing expression.fieldname.
In general the row expression must be parenthesized, but the parentheses can be omitted when the expression to be selected from is just a table reference or positional parameter. For example:
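mytable.mycolumn
$1.somecolumn
(rowfunction(a,b)).col3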
(Thus, a qualified column reference is actually just a special case of the field selection syntax.) An important special case is extracting a field from a table column that is of a composite type, written (compositecol).somefield or (mytable.compositecol).somefield.
The parentheses are required here to show that compositecol is a column name, not a table name; in the second case, they show that mytable is a table name, not a schema name.
You can ask for all fields of a composite value by writing .*, as in (compositecol).*.
This notation behaves differently depending on context; see Section 8.16.5 for details.
There are three possible syntaxes for an operator invocation:

expression operator expression (binary infix operator)
operator expression (unary prefix operator)
expression operator (unary postfix operator)

where the operator token follows the syntax rules of Section 4.1.3, or is one of the key words AND, OR, and NOT, or is a qualified operator name of the form OPERATOR(schema.operatorname).
Which particular operators exist and whether they are unary or binary depends on what operators have been defined by the system or the user. Chapter 9 describes the built-in operators.
The syntax for a function call is the name of a function (possibly qualified with a schema name), followed by its argument list enclosed in parentheses: function_name ([expression [, expression ... ]])
For example, the following computes the square root of 2: sqrt(2)
The list of built-in functions is in Chapter 9. Other functions can be added by the user.
The arguments can optionally have names attached; see Section 4.3 for details.
A function that takes a single argument of composite type can optionally be called using field-selection syntax, and conversely field selection can be written in functional style. That is, the notations col(table) and table.col are interchangeable. This behavior is not SQL-standard but is provided in PostgreSQL because it allows the use of functions to emulate "computed fields". For more information see Section 8.16.5.
An aggregate expression represents the application of an aggregate function across the rows selected by a query. An aggregate function reduces multiple inputs to a single output value, such as the sum or average of the inputs. The syntax of an aggregate expression is one of the following:
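In outline (this follows the standard synopsis in the PostgreSQL manual):

aggregate_name (expression [ , ... ] [ order_by_clause ] ) [ FILTER ( WHERE filter_clause ) ]
aggregate_name (ALL expression [ , ... ] [ order_by_clause ] ) [ FILTER ( WHERE filter_clause ) ]
aggregate_name (DISTINCT expression [ , ... ] [ order_by_clause ] ) [ FILTER ( WHERE filter_clause ) ]
aggregate_name ( * ) [ FILTER ( WHERE filter_clause ) ]
aggregate_name ( [ expression [ , ... ] ] ) WITHIN GROUP ( order_by_clause ) [ FILTER ( WHERE filter_clause ) ]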
Here, aggregate_name is a previously defined aggregate (possibly qualified with a schema name) and expression is any value expression that does not itself contain an aggregate expression or a window function call. The optional order_by_clause and filter_clause are described below.
The first form of aggregate expression invokes the aggregate once for each input row. The second form is the same as the first, since ALL is the default. The third form invokes the aggregate once for each distinct value of the expression (or distinct set of values, for multiple expressions) found in the input rows. The fourth form invokes the aggregate once for each input row; since no particular input value is specified, it is generally only useful for the count(*) aggregate. The last form is used with ordered-set aggregates, which are described below.
Most aggregate functions ignore null inputs, so that rows in which one or more of the expression(s) yield null are discarded. This can be assumed to be true, unless otherwise specified, for all built-in aggregates.
For example, count(*) yields the total number of input rows, while count(f1) yields the number of input rows in which f1 is non-null, since count ignores nulls; and count(distinct f1) yields the number of distinct non-null values of f1.
Ordinarily, the input rows are fed to the aggregate function in an unspecified order. In many cases this does not matter; for example, min produces the same result no matter what order it receives the inputs in. However, some aggregate functions (such as array_agg and string_agg) produce results that depend on the ordering of the input rows. When using such an aggregate, the optional order_by_clause can be used to specify the desired ordering. The order_by_clause has the same syntax as for a query-level ORDER BY clause, as described in Section 7.5, except that its expressions are always just expressions and cannot be output-column names or numbers; see the first example below.
When dealing with multiple-argument aggregate functions, note that the ORDER BY clause goes after all the aggregate arguments, as in the second example below,
not as in the third.
The third form is syntactically valid, but it represents a call of a single-argument aggregate function with two ORDER BY keys (the second one being rather useless since it's a constant).
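For instance (table1, a, and b are illustrative names):

SELECT array_agg(a ORDER BY b DESC) FROM table1;
SELECT string_agg(a, ',' ORDER BY a) FROM table1;        -- correct
SELECT string_agg(a ORDER BY a, ',') FROM table1;        -- incorrect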
If DISTINCT is specified in addition to an order_by_clause, then all the ORDER BY expressions must match regular arguments of the aggregate; that is, you cannot sort on an expression that is not included in the DISTINCT list.
The ability to specify both DISTINCT and ORDER BY in an aggregate function is a PostgreSQL extension.
Placing ORDER BY within the aggregate's regular argument list, as described so far, is used when ordering the input rows for general-purpose and statistical aggregates, for which ordering is optional. There is a subclass of aggregate functions called ordered-set aggregates for which an order_by_clause is required, usually because the aggregate's computation is only sensible in terms of a specific ordering of its input rows. Typical examples of ordered-set aggregates include rank and percentile calculations. For an ordered-set aggregate, the order_by_clause is written inside WITHIN GROUP (...), as shown in the final syntax alternative above. The expressions in the order_by_clause are evaluated once per input row just like regular aggregate arguments, sorted as per the order_by_clause's requirements, and fed to the aggregate function as input arguments. (This is unlike the case for a non-WITHIN GROUP order_by_clause, which is not treated as argument(s) to the aggregate function.) The argument expressions preceding WITHIN GROUP, if any, are called direct arguments to distinguish them from the aggregated arguments listed in the order_by_clause. Unlike regular aggregate arguments, direct arguments are evaluated only once per aggregate call, not once per input row. This means that they can contain variables only if those variables are grouped by GROUP BY; this restriction is the same as if the direct arguments were not inside an aggregate expression at all. Direct arguments are typically used for things like percentile fractions, which only make sense as a single value per aggregation calculation. The direct argument list can be empty; in this case, write just (), not (*). (PostgreSQL will actually accept either spelling, but only the first way conforms to the SQL standard.)
An example of an ordered-set aggregate call:
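SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY income) FROM households;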
This obtains the 50th percentile, or median, value of the income column from the table households. Here, 0.5 is a direct argument; it would make no sense for the percentile fraction to be a value varying across rows.
If FILTER is specified, then only the input rows for which the filter_clause evaluates to true are fed to the aggregate function; other rows are discarded. For example:
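SELECT
    count(*) AS unfiltered,
    count(*) FILTER (WHERE i < 5) AS filtered
FROM generate_series(1,10) AS s(i);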
The predefined aggregate functions are described in Section 9.20. Other aggregate functions can be added by the user.
An aggregate expression can only appear in the result list or HAVING clause of a SELECT command. It is forbidden in other clauses, such as WHERE, because those clauses are logically evaluated before the results of aggregates are formed.
When an aggregate expression appears in a subquery (see Section 4.2.11 and Section 9.22), the aggregate is normally evaluated over the rows of the subquery. But an exception occurs if the aggregate's arguments contain only outer-level variables: the aggregate then belongs to the nearest such outer level, and is evaluated over the rows of that query. The aggregate expression as a whole is then an outer reference for the subquery it appears in, and acts as a constant over any one evaluation of that subquery; its appearance is then restricted to the HAVING clause (or result list) of the query level that the aggregate belongs to.
A window function call represents the application of an aggregate-like function over some portion of the rows selected by a query. Unlike non-window aggregate calls, this is not tied to grouping of the selected rows into a single output row — each row remains separate in the query output. However, the window function has access to all the rows that would be part of the current row's group according to the grouping specification (PARTITION BY list) of the window function call. The syntax of a window function call, the syntax used to define a window, and the optional frame_clause with its frame_start and frame_end options, are sketched below.
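In outline (following the synopsis in the PostgreSQL manual of this era, in which RANGE and ROWS are the only frame modes):

function_name ([expression [, expression ... ]]) [ FILTER ( WHERE filter_clause ) ] OVER window_name
function_name ([expression [, expression ... ]]) [ FILTER ( WHERE filter_clause ) ] OVER ( window_definition )
function_name ( * ) [ FILTER ( WHERE filter_clause ) ] OVER window_name
function_name ( * ) [ FILTER ( WHERE filter_clause ) ] OVER ( window_definition )

where window_definition is:

[ existing_window_name ]
[ PARTITION BY expression [, ...] ]
[ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
[ frame_clause ]

and the optional frame_clause is one of:

{ RANGE | ROWS } frame_start
{ RANGE | ROWS } BETWEEN frame_start AND frame_end

where frame_start and frame_end can be one of:

UNBOUNDED PRECEDING
value PRECEDING
CURRENT ROW
value FOLLOWING
UNBOUNDED FOLLOWING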
Here, expression represents any value expression that does not itself contain window function calls.
window_name is a reference to a named window specification defined in the query's WINDOW clause. Alternatively, a full window_definition can be given within parentheses, using the same syntax as for defining a named window in the WINDOW clause; see the SELECT reference page for details. It's worth pointing out that OVER wname is not exactly equivalent to OVER (wname ...); the latter implies copying and modifying the window definition, and will be rejected if the referenced window specification includes a frame clause.
The PARTITION BY clause groups the rows of the query into partitions, which are processed separately by the window function. PARTITION BY works similarly to a query-level GROUP BY clause, except that its expressions are always just expressions and cannot be output-column names or numbers. Without PARTITION BY, all rows produced by the query are treated as a single partition. The ORDER BY clause determines the order in which the rows of a partition are processed by the window function. It works similarly to a query-level ORDER BY clause, but likewise cannot use output-column names or numbers. Without ORDER BY, rows are processed in an unspecified order.
The frame_clause specifies the set of rows constituting the window frame, which is a subset of the current partition, for those window functions that act on the frame instead of the whole partition. The frame can be specified in RANGE or ROWS mode; in either case, it runs from the frame_start to the frame_end, but if frame_end is omitted, it defaults to CURRENT ROW.
A frame_start of UNBOUNDED PRECEDING means that the frame starts with the first row of the partition, and similarly a frame_end of UNBOUNDED FOLLOWING means that the frame ends with the last row of the partition.
In RANGE mode, a frame_start of CURRENT ROW means the frame starts with the current row's first peer row (a row that ORDER BY considers equivalent to the current row), while a frame_end of CURRENT ROW means the frame ends with the current row's last peer row. In ROWS mode, CURRENT ROW simply means the current row itself.
The value PRECEDING and value FOLLOWING cases are currently only allowed in ROWS mode. They indicate that the frame starts or ends the specified number of rows before or after the current row. value must be an integer expression not containing any variables, aggregate functions, or window functions. The value must not be null or negative; but it can be zero, which selects the current row itself.
The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last peer. Without ORDER BY, all rows of the partition are included in the window frame, since all rows become peers of the current row.
Restrictions are that frame_start cannot be UNBOUNDED FOLLOWING, frame_end cannot be UNBOUNDED PRECEDING, and the frame_end choice cannot appear earlier in the list of options above than the frame_start choice — for example, RANGE BETWEEN CURRENT ROW AND value PRECEDING is not allowed.
If FILTER is specified, then only the input rows for which the filter_clause evaluates to true are fed to the window function; other rows are discarded. Only window functions that are aggregates accept a FILTER clause.
The built-in window functions are described in Table 9.57, but the user can add others. Also, any built-in or user-defined general-purpose or statistical aggregate can be used as a window function. (Ordered-set and hypothetical-set aggregates cannot presently be used as window functions.)
The syntaxes using * are used for calling parameter-less aggregate functions as window functions, for example count(*) OVER (PARTITION BY x ORDER BY y). The asterisk is customarily not used for window-specific functions. Window-specific functions do not allow DISTINCT or ORDER BY to be used within the function argument list.
Window function calls are permitted only in the SELECT list and the ORDER BY clause of the query.
More information about window functions can be found in Section 3.5, Section 9.21, and Section 7.2.5.
A type cast specifies a conversion from one data type to another. PostgreSQL accepts two equivalent syntaxes for type casts:
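CAST ( expression AS type )
expression::type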
The CAST syntax conforms to SQL; the syntax with :: is historical PostgreSQL usage.
When a cast is applied to a value expression of a known type, it represents a run-time type conversion. The cast will succeed only if a suitable type conversion operation has been defined. Notice that this is subtly different from the use of casts with constants, as shown in Section 4.1.2.7. A cast applied to an unadorned string literal represents the initial assignment of a type to a literal constant value, and so it will succeed for any type (if the contents of the string literal are acceptable input syntax for the data type).
An explicit type cast can usually be omitted if there is no ambiguity as to the type that a value expression must produce (for example, when it is assigned to a table column); the system will automatically apply a type cast in such cases. However, automatic casting is only done for casts that are marked "OK to apply implicitly" in the system catalogs. Other casts must be invoked with explicit casting syntax. This restriction is intended to prevent surprising conversions from being applied silently.
It is also possible to specify a type cast using a function-like syntax: typename ( expression )
However, this only works for types whose names are also valid as function names. For example, double precision cannot be used this way, but the equivalent float8 can. Also, the names interval, time, and timestamp can only be used in this fashion if they are double-quoted, because of syntactic conflicts. Therefore, the use of the function-like cast syntax leads to inconsistencies and should probably be avoided.
The function-like syntax is in fact just a function call. When one of the two standard cast syntaxes is used to do a run-time conversion, it will internally invoke a registered function to perform the conversion. By convention, these conversion functions have the same name as their output type, and thus the "function-like syntax" is nothing more than a direct invocation of the underlying conversion function. Obviously, this is not something that a portable application should rely on. For further details see CREATE CAST.
The COLLATE clause overrides the collation of an expression. It is appended to the expression it applies to: expr COLLATE collation
Here, collation is a possibly schema-qualified identifier. The COLLATE clause binds more tightly than operators; parentheses can be used when necessary.
If no collation is explicitly specified, the database system either derives a collation from the columns involved in the expression, or it defaults to the default collation of the database if no column is involved in the expression.
The two common uses of the COLLATE clause are overriding the sort order in an ORDER BY clause (the first statement below)
and overriding the collation of a function or operator call that has locale-sensitive results (the second statement below).
Note that in the latter case, the COLLATE clause is attached to an input argument of the operator we wish to affect. It doesn't matter which argument of the operator or function call the COLLATE clause is attached to, because the collation that is applied by the operator or function is derived by considering all arguments, and an explicit COLLATE clause will override the collations of all other arguments. (Attaching non-matching COLLATE clauses to more than one argument, however, is an error; for more details see Section 23.2.) Thus, the third statement below gives the same result as the second,
but the fourth is an error:
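Minimal examples, assuming a table tbl with a text column a:

SELECT a, b, c FROM tbl WHERE ... ORDER BY a COLLATE "C";
SELECT * FROM tbl WHERE a > 'foo' COLLATE "C";
SELECT * FROM tbl WHERE a COLLATE "C" > 'foo';    -- same result as the previous statement
SELECT * FROM tbl WHERE (a > 'foo') COLLATE "C";  -- error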
The error occurs because the last form attempts to apply a collation to the result of the > operator, which is of the non-collatable data type boolean.
A scalar subquery is an ordinary SELECT query in parentheses that returns exactly one row with one column. (See Chapter 7 for information about writing queries.) The SELECT query is executed and the single returned value is used in the surrounding value expression. It is an error to use a query that returns more than one row or more than one column as a scalar subquery. (But if, during a particular execution, the subquery returns no rows, there is no error; the scalar result is taken to be null.) The subquery can refer to variables from the surrounding query, which will act as constants during any one evaluation of the subquery. See also Section 9.22 for other expressions involving subqueries.
For example, the following finds the largest city population in each state:
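SELECT name, (SELECT max(pop) FROM cities WHERE cities.state = states.name)
    FROM states;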
An array constructor is an expression that builds an array value using values for its member elements. A simple array constructor consists of the key word ARRAY, a left square bracket [, a list of expressions (separated by commas) for the array element values, and finally a right square bracket ] — see the first example below.
By default, the array element type is the common type of the member expressions, determined using the same rules as for UNION or CASE constructs (see Section 10.5). You can override this by explicitly casting the array constructor to the desired type, as in the second example:
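SELECT ARRAY[1,2,3+4];
SELECT ARRAY[1,2,22.7]::integer[];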
This has the same effect as casting each expression to the array element type individually. For more on casting, see Section 4.2.9.
Multidimensional array values can be built by nesting array constructors. In the inner constructors, the key word ARRAY can be omitted; for example, ARRAY[[1,2],[3,4]] and ARRAY[ARRAY[1,2],ARRAY[3,4]] produce the same result.
Since multidimensional arrays must be rectangular, inner constructors at the same level must produce sub-arrays of identical dimensions. Any cast applied to the outer ARRAY constructor propagates automatically to all the inner constructors.
Multidimensional array constructor elements can be anything yielding an array of the proper kind, not only a sub-ARRAY construct — for example, a column or variable of an array type.
It is possible to construct an empty array, but since it is impossible to have an array with no type, you must explicitly cast your empty array to the desired type, for example ARRAY[]::integer[].
It is also possible to construct an array from the results of a subquery. In this form, the array constructor is written with the key word ARRAY followed by a parenthesized (not bracketed) subquery. For example:
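SELECT ARRAY(SELECT oid FROM pg_proc WHERE proname LIKE 'bytea%');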
The subquery must return a single column. If the subquery's output column is of a non-array type, the resulting one-dimensional array will have an element for each row in the subquery result, with an element type matching that of the subquery's output column. If the subquery's output column is of an array type, the result will be an array of the same type but one higher dimension; in this case all the subquery rows must yield arrays of identical dimensionality, else the result would not be rectangular.
The subscripts of an array value built with ARRAY always begin with one. For more information about arrays, see Section 8.15.
A row constructor is an expression that builds a row value (also called a composite value) using values for its member fields. A row constructor consists of the key word ROW, a left parenthesis, zero or more expressions (separated by commas) for the row field values, and finally a right parenthesis — for example, ROW(1, 2.5, 'this is a test').
The key word ROW is optional when there is more than one expression in the list.
A row constructor can include the syntax rowvalue.*, which will be expanded to a list of the elements of the row value, just as occurs when the .* syntax is used at the top level of a SELECT list (see Section 8.16.5). For example, if table t has columns f1 and f2, the two statements below are the same:
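SELECT ROW(t.*, 42) FROM t;
SELECT ROW(t.f1, t.f2, 42) FROM t;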
Before PostgreSQL 8.2, the .* syntax was not expanded in row constructors, so that writing ROW(t.*, 42) created a two-field row whose first field was another row value. The new behavior is usually more useful. If you need the old behavior of nested row values, write the inner row value without .*, for instance ROW(t, 42).
By default, the value created by a ROW expression is of an anonymous record type. If necessary, it can be cast to a named composite type — either the row type of a table, or a composite type created with CREATE TYPE AS. An explicit cast might be needed to avoid ambiguity, for example CAST(ROW(11, 'this is a test', 2.5) AS myrowtype).
Row constructors can be used to build composite values to be stored in a composite-type table column, or to be passed to a function that accepts a composite parameter. Also, it is possible to compare two row values or test a row with IS NULL or IS NOT NULL, for example:
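SELECT ROW(1, 2.5, 'this is a test') = ROW(1, 3, 'not the same');

SELECT ROW(table1.*) IS NULL FROM table1;  -- detect all-null rows (table1 is illustrative)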
For more detail see Section 9.23. Row constructors can also be used in connection with subqueries, as discussed in Section 9.22.
The order of evaluation of subexpressions is not defined. In particular, the inputs of an operator or function are not necessarily evaluated left-to-right or in any other fixed order.
Furthermore, if the result of an expression can be determined by evaluating only some parts of it, then other subexpressions might not be evaluated at all.
In both of the statements below, for instance, somefunc() would (probably) not be called at all:
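SELECT true OR somefunc();
SELECT somefunc() OR true;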
Note that this is not the same as the left-to-right "short-circuiting" of Boolean operators that is found in some programming languages.
As a consequence, it is unwise to use functions with side effects as part of complex expressions. It is particularly dangerous to rely on side effects or evaluation order in WHERE and HAVING clauses, since those clauses are extensively reprocessed as part of developing an execution plan. Boolean expressions (AND/OR/NOT combinations) in those clauses can be reorganized in any manner allowed by the laws of Boolean algebra.
When it is essential to force evaluation order, a CASE construct (see Section 9.17) can be used. For example, the first statement below is an untrustworthy way of trying to avoid division by zero in a WHERE clause,
while the second is safe:
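SELECT ... WHERE x > 0 AND y/x > 1.5;                                -- untrustworthy
SELECT ... WHERE CASE WHEN x > 0 THEN y/x > 1.5 ELSE false END;      -- safe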
A CASE construct used in this fashion will defeat optimization attempts, so it should only be done when necessary. (In this particular example, it would be better to sidestep the problem by writing y > 1.5*x instead.)
CASE is not a cure-all for such issues, however. One limitation of the technique illustrated above is that it does not prevent early evaluation of constant subexpressions. As described in Section 37.6, functions and operators marked IMMUTABLE can be evaluated when the query is planned rather than when it is executed. Thus, for example:
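SELECT CASE WHEN x > 0 THEN x ELSE 1/0 END FROM tab;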
is likely to result in a division-by-zero failure due to the planner trying to simplify the constant subexpression, even if every row in the table has x > 0 so that the ELSE arm would never be entered at run time.
While that particular example might seem silly, related cases that don't obviously involve constants can occur in queries executed within functions, since the values of function arguments and local variables can be inserted into queries as constants for planning purposes. Within PL/pgSQL functions, for example, using an IF-THEN-ELSE statement to protect a risky computation is much safer than just nesting it in a CASE expression.
Another limitation of the same kind is that a CASE cannot prevent evaluation of an aggregate expression contained within it, because aggregate expressions are computed before other expressions in a SELECT list or HAVING clause are considered. For example, the following query can cause a division-by-zero error despite seemingly having protected against it:
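SELECT CASE WHEN min(employees) > 0
            THEN avg(expenses / employees)
       END
    FROM departments;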
The min() and avg() aggregates are computed concurrently over all the input rows, so if any row has employees equal to zero, the division-by-zero error will occur before there is any opportunity to test the result of min(). Instead, use a WHERE or FILTER clause to prevent problematic input rows from reaching an aggregate function in the first place.
A table in a relational database is much like a table on paper: it consists of rows and columns. The number and order of the columns is fixed, and each column has a name. The number of rows is variable — it reflects how much data is stored at a given moment. SQL does not make any guarantees about the order of the rows in a table. When a table is read, the rows will appear in an unspecified order, unless sorting is explicitly requested. This is covered in Chapter 7. Furthermore, SQL does not assign unique identifiers to rows, so it is possible to have several completely identical rows in a table. This is a consequence of the mathematical model that underlies SQL, but is usually not desirable. Later in this chapter we will see how to deal with this issue.
Each column has a data type. The data type constrains the set of possible values that can be assigned to a column and assigns semantics to the data stored in the column so that it can be used for computations. For instance, a column declared to be of a numerical type will not accept arbitrary text strings, and the data stored in such a column can be used for mathematical computations. By contrast, a column declared to be of a character string type will accept almost any kind of data, but it does not lend itself to mathematical calculations, although other operations such as string concatenation are available.
PostgreSQL includes a sizable set of built-in data types that fit many applications. Users can also define their own data types. Most built-in data types have obvious names and semantics, so we defer a detailed explanation to Chapter 8. Some of the frequently used data types are integer for whole numbers, numeric for possibly fractional numbers, text for character strings, date for dates, time for time-of-day values, and timestamp for values containing both date and time.
To create a table, you use the aptly named CREATE TABLE command. In this command you specify at least a name for the new table, the names of the columns, and the data type of each column. For example:
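CREATE TABLE my_first_table (
    first_column text,
    second_column integer
);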
This creates a table named my_first_table with two columns. The first column is named first_column and has a data type of text; the second column has the name second_column and the type integer. The table and column names follow the identifier syntax explained in Section 4.1.1, with a few exceptions for the type names. Note that the column list is comma-separated and surrounded by parentheses.
Of course, the previous example was heavily contrived. Normally, you would give names to your tables and columns that convey what kind of data they store. So let's look at a more realistic example:
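CREATE TABLE products (
    product_no integer,
    name text,
    price numeric
);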
(The numeric type can store fractional components, as would be typical of monetary amounts.)
Tip: When you create many interrelated tables it is wise to choose a consistent naming pattern for the tables and columns. For instance, there is a choice of using singular or plural nouns for table names, each of which is favored by some.
There is a limit on how many columns a table can contain. Depending on the column types, it is between 250 and 1600. However, defining a table with anywhere near this many columns is highly unusual and often a questionable design.
If you no longer need a table, you can remove it using the DROP TABLE command, for example: DROP TABLE my_first_table;
Attempting to drop a table that does not exist is an error. Nevertheless, it is common in SQL script files to unconditionally try to drop each table before creating it, ignoring any error messages, so that the script works whether or not the table exists. (If you like, you can use the DROP TABLE IF EXISTS variant to avoid the error messages, but this is not standard SQL.)
If you need to modify a table that already exists, see Section 5.5 later in this chapter.
With the tools discussed so far you can create fully functional tables. The remainder of this chapter is concerned with adding features to the table definition to ensure data integrity, security, or convenience. If you are eager to fill your tables with data now, you can skip ahead to Chapter 6 and read the rest of this chapter later.
This part describes the use of the SQL language in PostgreSQL. We start by describing the general syntax of SQL, then explain how to create the structures to hold data, how to populate the database, and how to query it. The middle part lists the available data types and functions for use in SQL commands. The rest treats several aspects that are important for tuning a database for optimal performance.
The information in this part is arranged so that a novice user can follow it from start to end and gain a full understanding of the topics without having to refer forward too many times. The chapters are intended to be self-contained, so that advanced users can read the chapters individually as they choose. The information in this part is presented in a narrative fashion in topical units. Readers looking for a complete description of a particular command should see the reference pages in Part VI.
Readers of this part should know how to connect to a PostgreSQL database and issue SQL commands. Readers that are unfamiliar with these issues are encouraged to read Part I first. SQL commands are typically entered using the PostgreSQL interactive terminal psql, but other programs that have similar functionality can be used as well.
PostgreSQL allows functions to be called using either positional or named notation. Named notation is especially useful for functions that have a large number of parameters, since it makes the associations between parameters and actual arguments more explicit and reliable. In positional notation, a function call is written with its argument values in the same order as they are defined in the function declaration. In named notation, the arguments are matched to the function parameters by name and can be written in any order.
In either notation, parameters that have default values given in the function declaration need not be written in the call at all. But this is particularly useful in named notation, since any combination of parameters can be omitted, while in positional notation parameters can only be omitted from right to left.
PostgreSQL also supports mixed notation, which combines positional and named notation. In this case, positional parameters are written first and named parameters appear after them.
The following examples will illustrate the usage of all three notations, using the following function definition:
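CREATE FUNCTION concat_lower_or_upper(a text, b text, uppercase boolean DEFAULT false)
RETURNS text
AS
$$
 SELECT CASE
        WHEN $3 THEN UPPER($1 || ' ' || $2)
        ELSE LOWER($1 || ' ' || $2)
        END;
$$
LANGUAGE SQL IMMUTABLE STRICT;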
Function concat_lower_or_upper has two mandatory parameters, a and b. Additionally there is one optional parameter, uppercase, which defaults to false. The a and b inputs will be concatenated, and forced to either upper or lower case depending on the uppercase parameter. The remaining details of this function definition are not important here (see Chapter 37 for more information).
Positional notation is the traditional mechanism for passing arguments to functions in PostgreSQL, with all arguments specified in order.
In the first call below, the result is in upper case since uppercase is specified as true; the second call omits that argument:
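SELECT concat_lower_or_upper('Hello', 'World', true);
 concat_lower_or_upper
------------------------
 HELLO WORLD
(1 row)

SELECT concat_lower_or_upper('Hello', 'World');
 concat_lower_or_upper
------------------------
 hello world
(1 row)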
In the second call, the uppercase parameter is omitted, so it receives its default value of false, resulting in lower case output. In positional notation, arguments can be omitted from right to left so long as they have defaults.
In named notation, each argument's name is specified using => to separate it from the argument expression. For example:
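SELECT concat_lower_or_upper(a => 'Hello', b => 'World');
 concat_lower_or_upper
------------------------
 hello world
(1 row)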
Again, the uppercase parameter was omitted, so it is set to false implicitly. One advantage of using named notation is that the arguments may be specified in any order; for example, uppercase could be written before b: concat_lower_or_upper(a => 'Hello', uppercase => true, b => 'World').
An older syntax based on := is supported for backward compatibility: concat_lower_or_upper(a := 'Hello', uppercase := true, b := 'World').
In mixed notation, both positional and named arguments appear in one call. However, as already mentioned, named arguments cannot precede positional arguments. For example:
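SELECT concat_lower_or_upper('Hello', 'World', uppercase => true);
 concat_lower_or_upper
------------------------
 HELLO WORLD
(1 row)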
In the above query, the arguments a and b are specified positionally, while uppercase is specified by name. In this example, that adds little except documentation. With a more complex function having numerous parameters that have default values, named or mixed notation can save a great deal of writing and reduce chances for error.
Named and mixed call notations currently cannot be used when calling an aggregate function (but they do work when an aggregate function is used as a window function).
This chapter covers how one creates the database structures that will hold one's data. In a relational database, the raw data is stored in tables, so the majority of this chapter is devoted to explaining how tables are created and modified, and what features are available to control what data is stored in the tables. Subsequently, we discuss how tables can be organized into schemas, and how privileges can be assigned to tables. Finally, we will briefly look at other features that affect the data storage, such as inheritance, table partitioning, views, functions, and triggers.
Every table has several system columns that are implicitly defined by the system. Therefore, these names cannot be used as names of user-defined columns. (Note that these restrictions are separate from whether the name is a key word or not; quoting a name will not allow you to escape these restrictions.) You do not really need to be concerned about these columns; just know they exist.
oid
The object identifier (object ID) of a row. This column is only present if the table was created using WITH OIDS, or if the default_with_oids configuration variable was set at the time. This column is of type oid (same name as the column); see Section 8.18 for more information about the type.
tableoid
The OID of the table containing this row. This column is particularly handy for queries that select from inheritance hierarchies (see Section 5.9), since without it, it's difficult to tell which individual table a row came from. The tableoid can be joined against the oid column of pg_class to obtain the table name.
xmin
The identity (transaction ID) of the inserting transaction for this row version. (A row version is an individual state of a row; each update of a row creates a new row version for the same logical row.)
cmin
The command identifier (starting at zero) within the inserting transaction.
xmax
The identity (transaction ID) of the deleting transaction, or zero for an undeleted row version. It is possible for this column to be nonzero in a visible row version: that usually indicates that the deleting transaction hasn't committed yet, or that an attempted deletion was rolled back.
cmax
The command identifier within the deleting transaction, or zero.
ctid
The physical location of the row version within its table. Note that although the ctid can be used to locate the row version very quickly, a row's ctid will change if the row is updated or moved by VACUUM FULL. Therefore ctid is useless as a long-term row identifier. The OID, or even better a user-defined serial number, should be used to identify logical rows.
OIDs are 32-bit quantities and are assigned from a single cluster-wide counter. In a large or long-lived database, it is possible for the counter to wrap around. Hence, it is bad practice to assume that OIDs are unique, unless you take steps to ensure that this is the case. If you need to identify the rows in a table, using a sequence generator is strongly recommended. However, OIDs can be used as well, provided that a few additional precautions are taken:
A unique constraint should be created on the OID column of each table for which the OID will be used to identify rows. When such a unique constraint (or unique index) exists, the system takes care not to generate an OID matching an already-existing row. (Of course, this is only possible if the table contains fewer than 2^32 (4 billion) rows, and in practice the table size had better be much less than that, or performance might suffer.)
OIDs should never be assumed to be unique across tables; use the combination of tableoid and row OID if you need a database-wide identifier.
Of course, the tables in question must be created WITH OIDS. As of PostgreSQL 8.1, WITHOUT OIDS is the default.
Transaction identifiers are also 32-bit quantities. In a long-lived database it is possible for transaction IDs to wrap around. This is not a fatal problem given appropriate maintenance procedures; see Chapter 24 for details. It is unwise, however, to depend on the uniqueness of transaction IDs over the long term (more than one billion transactions).
Command identifiers are also 32-bit quantities. This creates a hard limit of about 4 billion SQL commands within a single transaction. In practice this limit is not a problem — note that the limit is on the number of SQL commands, not the number of rows processed. Also, only commands that actually modify the database contents will consume a command identifier.
A generated column is a special column that is always computed from other columns. Thus, it is for columns what a view is for tables. There are two kinds of generated columns: stored and virtual. A stored generated column is computed when it is written (inserted or updated) and occupies storage as if it were a normal column. A virtual generated column occupies no storage and is computed when it is read. Thus, a virtual generated column is similar to a view and a stored generated column is similar to a materialized view (except that it is always updated automatically). PostgreSQL currently implements only stored generated columns.
To create a generated column, use the GENERATED ALWAYS AS clause in CREATE TABLE, for example:
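CREATE TABLE people (
    ...,
    height_cm numeric,
    height_in numeric GENERATED ALWAYS AS (height_cm / 2.54) STORED
);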
The keyword STORED must be specified to choose the stored kind of generated column. See CREATE TABLE for more details.
A generated column cannot be written to directly. In INSERT or UPDATE commands, a value cannot be specified for a generated column, but the keyword DEFAULT may be specified.
Consider the differences between a column with a default and a generated column. The column default is evaluated once when the row is first inserted if no other value was provided; a generated column is updated whenever the row changes and cannot be overridden. A column default may not refer to other columns of the table; a generation expression would normally do so. A column default can use volatile functions, for example random() or functions referring to the current time; this is not allowed for generated columns.
Several restrictions apply to the definition of generated columns and tables involving generated columns:
The generation expression can only use immutable functions and cannot use subqueries or reference anything other than the current row in any way.
A generation expression cannot reference another generated column.
A generation expression cannot reference a system column, except tableoid.
A generated column cannot have a column default or an identity definition.
A generated column cannot be part of a partition key.
Foreign tables can have generated columns. See CREATE FOREIGN TABLE for details.
Additional considerations apply to the use of generated columns.
Generated columns maintain access privileges separately from their underlying base columns. So, it is possible to arrange it so that a particular role can read from a generated column but not from the underlying base columns.
Conceptually, generated columns are updated after BEFORE triggers have run. Therefore, changes made to the base columns in a BEFORE trigger will be reflected in generated columns. But conversely, it is not allowed to access generated columns in BEFORE triggers.
A column can be assigned a default value. When a new row is created and no values are specified for some of the columns, those columns will be filled with their respective default values. A data manipulation command can also request explicitly that a column be set to its default value, without having to know what that value is. (Details about data manipulation commands are in Chapter 6.)
If no default value is declared explicitly, the default value is the null value. This usually makes sense because a null value can be considered to represent unknown data.
In a table definition, default values are listed after the column data type — for example, a column defined as price numeric DEFAULT 9.99.
The default value can be an expression, which will be evaluated whenever the default value is inserted (not when the table is created). A common example is for a timestamp column to have a default of CURRENT_TIMESTAMP, so that it gets set to the time of row insertion. Another common example is generating a "serial number" for each row, which in PostgreSQL is typically written as product_no integer DEFAULT nextval('products_product_no_seq'),
where the nextval() function supplies successive values from a sequence object. This arrangement is sufficiently common that there's a special shorthand for it: product_no SERIAL.
When you create a table and you realize that you made a mistake, or the requirements of the application change, you can remove it and create it again. But this is not a convenient option if the table is already filled with data, or if the table is referenced by other database objects (for instance a foreign key constraint). Therefore PostgreSQL provides a family of commands to make modifications to existing tables. Note that this is conceptually distinct from altering the data contained in the table: here we are interested in altering the definition, or structure, of the table.
You can:
add columns
remove columns
add constraints
remove constraints
change default values
change column data types
rename columns
rename tables
All these actions are performed using the ALTER TABLE command, whose reference page contains details beyond those given here.
To add a column, use a command like:
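ALTER TABLE products ADD COLUMN description text;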
The new column is initially filled with whatever default value is given (null if you don't specify a DEFAULT clause).
You can also define constraints on the column at the same time, using the usual syntax — for example, appending CHECK (description <> '') to the command above.
In fact all the options that can be applied to a column description in CREATE TABLE can be used here. Keep in mind however that the default value must satisfy the given constraints, or the ADD will fail. Alternatively, you can add constraints later (see below) after you've filled in the new column correctly.
Adding a column with a default requires updating each row of the table (to store the new column value). However, if no default is specified, PostgreSQL is able to avoid the physical update. So if you intend to fill the column with mostly nondefault values, it's best to add the column with no default, insert the correct values using UPDATE, and then add any desired default as described below.
To remove a column, use a command like ALTER TABLE products DROP COLUMN description; Whatever data was in the column disappears, and table constraints involving the column are dropped too. However, if the column is referenced by a foreign key constraint of another table, PostgreSQL will not silently drop that constraint. You can authorize dropping everything that depends on the column by adding CASCADE: ALTER TABLE products DROP COLUMN description CASCADE;
To add a constraint, the table constraint syntax is used. For example:
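ALTER TABLE products ADD CHECK (name <> '');
ALTER TABLE products ADD CONSTRAINT some_name UNIQUE (product_no);
ALTER TABLE products ADD FOREIGN KEY (product_group_id) REFERENCES product_groups;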
To add a not-null constraint, which cannot be written as a table constraint, use the syntax ALTER TABLE products ALTER COLUMN product_no SET NOT NULL;
The constraint will be checked immediately, so the existing table data must satisfy the constraint before it can be added.
To remove a constraint you need to know its name. If you gave it a name, then that's easy. Otherwise the system assigned a generated name, which you need to find out. The psql command \d tablename can be helpful here; other database tools may also provide a way to inspect table details. Then remove the constraint with a command like ALTER TABLE products DROP CONSTRAINT some_name;
(If you are dealing with a generated constraint name like $2, don't forget that you'll need to double-quote it to make it a valid identifier.)
As with dropping a column, you need to add CASCADE if you want to drop a constraint that something else depends on. An example is that a foreign key constraint depends on a unique or primary key constraint on the referenced column(s).
This works the same for all constraint types except not-null constraints. To drop a not-null constraint, use ALTER TABLE products ALTER COLUMN product_no DROP NOT NULL;
(Recall that not-null constraints do not have names.)
To set a new default for a column, use ALTER TABLE products ALTER COLUMN price SET DEFAULT 7.77;
Note that this doesn't affect any existing rows in the table; it just changes the default for future INSERT commands.
To remove any default value, use ALTER TABLE products ALTER COLUMN price DROP DEFAULT;
This is effectively the same as setting the default to null. As a consequence, it is not an error to drop a default where one hadn't been defined, because the default is implicitly the null value.
To convert a column to a different data type, use a command like ALTER TABLE products ALTER COLUMN price TYPE numeric(10,2);
This will succeed only if each existing entry in the column can be converted to the new type implicitly. If a more complex conversion is needed, you can add a USING clause that specifies how to compute the new values from the old.
PostgreSQL will attempt to convert the column's default value (if any) to the new type, as well as any constraints that involve the column. But these conversions might fail, or might produce surprising results. It's often best to drop any constraints on the column before altering its type, and then add back suitably modified constraints afterwards.
To rename a column: ALTER TABLE products RENAME COLUMN product_no TO product_number;
To rename a table: ALTER TABLE products RENAME TO items;
The SERIAL shorthand mentioned above is discussed further in Section 8.1.4; see the reference documentation for the details of the underlying mechanism.
In addition to the SQL-standard privilege system available through GRANT, tables can have row security policies that restrict, on a per-user basis, which rows can be returned by normal queries or inserted, updated, or deleted by data modification commands. This feature is known as Row-Level Security. By default, tables do not have any policies, so that if a user has access privileges to a table according to the SQL privilege system, all rows within it are equally available for querying or updating.
When row security is enabled on a table (with ALTER TABLE ... ENABLE ROW LEVEL SECURITY), all normal access to the table must be allowed by a row security policy. (However, the table's owner is typically not subject to row security policies.) If no policy exists for the table, a default-deny policy is used, meaning that no rows are visible or can be modified. Operations that apply to the whole table, such as TRUNCATE and REFERENCES, are not subject to row security.
Row security policies can be specific to commands, or to roles, or to both. A policy can be specified to apply to ALL commands, or to SELECT, INSERT, UPDATE, or DELETE. Multiple roles can be assigned to a given policy, and normal role membership and inheritance rules apply.
To specify which rows are visible or modifiable according to a policy, an expression is required that returns a Boolean result. This expression will be evaluated for each row prior to any conditions or functions coming from the user's query. (The only exceptions to this rule are leakproof functions, which are guaranteed to not leak information; the optimizer may choose to apply such functions ahead of the row-security check.) Rows for which the expression does not return true will not be processed. Separate expressions may be specified to provide independent control over the rows that are visible and the rows that are allowed to be modified. Policy expressions are run as part of the query and with the privileges of the user running the query, although security-definer functions can be used to access data not available to the calling user.
Superusers and roles with the BYPASSRLS attribute always bypass the row security system when accessing a table. Table owners normally bypass row security as well, though a table owner can choose to be subject to row security with ALTER TABLE ... FORCE ROW LEVEL SECURITY.
Enabling and disabling row security, as well as adding policies to a table, is always the privilege of the table owner only.
Policies are created using the CREATE POLICY command, altered using the ALTER POLICY command, and dropped using the DROP POLICY command. To enable and disable row security for a given table, use the ALTER TABLE command.
Each policy has a name, and multiple policies can be defined for a table. As policies are table-specific, each policy for a table must have a unique name. Different tables may have policies with the same name.
When multiple policies apply to a given query, they are combined using either OR (for permissive policies, which are the default) or AND (for restrictive policies). This is similar to the rule that a given role has the privileges of all roles that they are a member of. Permissive versus restrictive policies are discussed further below.
As a simple example, here is how to create a policy on an accounts table to allow only members of the managers role to access rows, and only rows of their own accounts:
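CREATE TABLE accounts (manager text, company text, contact_email text);

ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;

CREATE POLICY account_managers ON accounts TO managers
    USING (manager = current_user);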
If no role is specified, or the special user name PUBLIC is used, then the policy applies to all users on the system. To allow all users to access only their own row in a users table, this can be simplified to: CREATE POLICY user_policy ON users USING (user_name = current_user);
To use a different policy for rows that are being added to the table compared to those rows that are visible, combine a USING clause with a WITH CHECK clause. The following policy would allow everyone to view all rows in the users table, but only modify their own:
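CREATE POLICY user_policy ON users
    USING (true)
    WITH CHECK (user_name = current_user);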
Row security can also be disabled with the ALTER TABLE command. Disabling row security does not remove any policies that are defined on the table; they are simply ignored. Then all rows in the table are visible and modifiable, subject only to the standard SQL privileges system.
Below, a larger example shows how this feature can be used in production environments, built around a passwd table that emulates a Unix password file.
As with any security settings, it is important to test and ensure that the system is behaving as expected. Using the example above, tests of this kind demonstrate that the permission system is working properly.
All of the policies constructed thus far have been permissive policies, meaning that when multiple policies are applied they are combined using the OR Boolean operator. While permissive policies can be constructed to only allow access to rows in the intended cases, it can be simpler to combine permissive policies with restrictive policies (which the records must pass and which are combined using the AND Boolean operator). Building on the example above, we could add a restrictive policy that requires the administrator to be connected over a local Unix socket to access the records of the passwd table.
With that restrictive policy in place, we can see that an administrator connecting over a regular network connection will not see any records, because the restrictive policy denies them.
Referential integrity checks, such as unique or primary key constraints and foreign key references, always bypass row security to ensure that data integrity is maintained. Care must be taken when developing schemas and row level policies to avoid "covert channel" leaks of information through such referential integrity checks.
In some contexts it is important to be sure that row security is not being applied. For example, when taking a backup, it could be disastrous if row security silently caused some rows to be omitted from the backup. In such a situation, you can set the row_security configuration parameter to off. This does not in itself bypass row security; what it does is throw an error if any query's results would get filtered by a policy, so that the reason for the error can be investigated and fixed.
In the examples above, the policy expressions consider only the current values in the row to be accessed or updated. This is the simplest and best-performing case; whenever possible, it's best to design row security applications to work this way. If it is necessary to consult other rows or other tables to make a policy decision, that can be accomplished using sub-SELECTs, or functions that contain SELECTs, in the policy expressions. Be aware however that such accesses can create race conditions that could allow information leakage if care is not taken. As an example, consider a design with users, groups, and per-group information tables.
Now suppose that alice wishes to change some "slightly secret" information, but decides not to allow mallory to see the new content; so she removes mallory from the relevant group and then updates the information, within a single transaction.
This looks safe: there is no window in which mallory should be able to see the "secret from mallory" string. However, there is a race condition here, if mallory is concurrently doing, say, a SELECT ... FOR UPDATE on the information.
Because her transaction is in READ COMMITTED mode, it is possible for her to see "secret from mallory". That happens if her transaction reaches the information row just after alice's does: it blocks waiting for alice's transaction to commit, and then fetches the updated row contents thanks to the FOR UPDATE clause. However, it does not fetch an updated row for the implicit SELECT from users, because that sub-SELECT did not have FOR UPDATE; instead the users row is read with the snapshot taken at the start of the query. Therefore, the policy expression tests the old value of mallory's privileges and allows her to see the updated row.
There are several ways to deal with this problem. One simple answer is to use SELECT ... FOR SHARE in sub-SELECTs in row security policies. However, that requires granting UPDATE privilege on the referenced table to the affected users, which might be undesirable. (But another row security policy could be applied to prevent them from actually exercising that privilege, or the sub-SELECT could be embedded into a security definer function.) Also, heavy concurrent use of row share locks on the referenced table could pose a performance problem, especially if updates of it are frequent. Another solution, practical if updates of the referenced table are infrequent, is to take an exclusive lock on the referenced table when updating it, so that no concurrent transactions could be examining old row values, or to wait for all concurrent transactions to commit before committing changes to the new security situation.
For additional details see CREATE POLICY and ALTER TABLE.
PostgreSQL implements portions of the SQL/MED specification, allowing you to access data that resides outside PostgreSQL using regular SQL queries. Such data is referred to as foreign data. (Note that this usage is not to be confused with foreign keys, which are a type of constraint within the database.)
Foreign data is accessed with help from a foreign data wrapper. A foreign data wrapper is a library that can communicate with an external data source, hiding the details of connecting to the data source and obtaining data from it. There are some foreign data wrappers available as contrib modules; see Appendix F. Other kinds of foreign data wrappers might be found as third party products. If none of the existing foreign data wrappers suit your needs, you can write your own; see Chapter 56.
To access foreign data, you need to create a foreign server object, which defines how to connect to a particular external data source according to the set of options used by its supporting foreign data wrapper. Then you need to create one or more foreign tables, which define the structure of the remote data. A foreign table can be used in queries just like a normal table, but a foreign table has no storage in the PostgreSQL server. Whenever it is used, PostgreSQL asks the foreign data wrapper to fetch data from the external source, or transmit data to the external source in the case of update commands.
Accessing remote data may require authenticating to the external data source. This can be accomplished with a user mapping, which allows each PostgreSQL user to supply its own credentials when using a foreign table.
For additional information, see CREATE FOREIGN DATA WRAPPER, CREATE SERVER, CREATE USER MAPPING, CREATE FOREIGN TABLE, and IMPORT FOREIGN SCHEMA.
Tables are the central objects in a relational database structure, because they hold your data. But they are not the only objects that exist in a database. Many other kinds of objects can be created to make the use and management of the data more efficient or convenient. They are not discussed in this chapter, but we give you a list here so that you are aware of what is possible:
views
functions and operators
data types and domains
triggers and rewrite rules
Detailed information on these topics appears in Part V.
When you create complex database structures involving many tables with foreign key constraints, views, triggers, functions, etc., you implicitly create a net of dependencies between the objects. For instance, a table with a foreign key constraint depends on the table it references.
To ensure the integrity of the entire database structure, PostgreSQL makes sure that you cannot drop objects that other objects still depend on. For example, attempting to drop the products table we considered in Section 5.3.5, with the orders table depending on it, would result in an error message like this:
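DROP TABLE products;

ERROR:  cannot drop table products because other objects depend on it
DETAIL:  constraint orders_product_no_fkey on table orders depends on table products
HINT:  Use DROP ... CASCADE to drop the dependent objects too.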
The error message contains a useful hint: if you do not want to bother deleting all the dependent objects individually, you can run DROP TABLE products CASCADE;
and all the dependent objects will be removed, as will any objects that depend on them, recursively. In this case, it doesn't remove the orders table, it only removes the foreign key constraint, since nothing else depends on that constraint. (If you want to check what DROP ... CASCADE will do, run DROP without CASCADE and read the DETAIL output.)
Almost all DROP commands in PostgreSQL support specifying CASCADE. Of course, the nature of the possible dependencies varies with the type of the object. You can also write RESTRICT instead of CASCADE to get the default behavior, which is to prevent dropping objects that any other objects depend on.
According to the SQL standard, specifying either RESTRICT or CASCADE is required in a DROP command. No database system actually enforces that rule, but whether the default behavior is RESTRICT or CASCADE varies across systems.
If a DROP command lists multiple objects, CASCADE is only required when there are dependencies outside the specified group. For example, when saying DROP TABLE tab1, tab2 the existence of a foreign key referencing tab1 from tab2 would not mean that CASCADE is needed for the command to succeed.
For user-defined functions, PostgreSQL tracks dependencies associated with a function's externally-visible properties, such as its argument and result types, but not dependencies that could only be known by examining the function body. As an example, consider this situation:
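CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow',
                             'green', 'blue', 'purple');

CREATE TABLE my_colors (color rainbow, note text);

CREATE FUNCTION get_color_note (rainbow) RETURNS text AS
  'SELECT note FROM my_colors WHERE color = $1'
  LANGUAGE SQL;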
(See Section 37.4 for an explanation of SQL-language functions.) PostgreSQL will be aware that the get_color_note function depends on the rainbow type: dropping the type would force dropping the function, because its argument type would no longer be defined. But PostgreSQL will not consider get_color_note to depend on the my_colors table, and so will not drop the function if the table is dropped. While there are disadvantages to this approach, there are also benefits. The function is still valid in some sense if the table is missing, though executing it would cause an error; creating a new table of the same name would allow the function to work again.
PostgreSQL implements table inheritance, which can be a useful tool for database designers. (SQL:1999 and later define a type inheritance feature, which differs in many respects from the features described here.)
Let's start with an example: suppose we are trying to build a data model for cities. Each state has many cities, but only one capital. We want to be able to quickly retrieve the capital city for any particular state. This can be done by creating two tables, one for state capitals and one for cities that are not capitals. However, what happens when we want to ask for data about a city, regardless of whether it is a capital or not? The inheritance feature can help to resolve this problem. We define the capitals table so that it inherits from cities:
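CREATE TABLE cities (
    name            text,
    population      float,
    altitude        int     -- in feet
);

CREATE TABLE capitals (
    state           char(2)
) INHERITS (cities);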
In this case, the capitals table inherits all the columns of its parent table, cities. State capitals also have an extra column, state, that shows their state.
In PostgreSQL, a table can inherit from zero or more other tables, and a query can reference either all rows of a table or all rows of a table plus all of its descendant tables. The latter behavior is the default. For example, the following query finds the names of all cities, including state capitals, that are located at an altitude over 500 feet:
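SELECT name, altitude
    FROM cities
    WHERE altitude > 500;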
Given the sample data from the PostgreSQL tutorial (see Section 2.1), this returns both regular cities and state capitals.
On the other hand, the following query finds all cities that are not state capitals and are situated at an altitude over 500 feet: SELECT name, altitude FROM ONLY cities WHERE altitude > 500;
Here the ONLY keyword indicates that the query should apply only to cities, and not any tables below cities in the inheritance hierarchy. Many of the commands that we have already discussed — SELECT, UPDATE and DELETE — support the ONLY keyword.
You can also write the table name with a trailing * to explicitly specify that descendant tables are included: SELECT name, altitude FROM cities* WHERE altitude > 500;
Writing * is not necessary, since this behavior is always the default. However, this syntax is still supported for compatibility with older releases where the default could be changed.
In some cases you might wish to know which table a particular row originated from. There is a system column called tableoid in each table which can tell you the originating table. By joining tableoid against the oid column of pg_class you can see the actual table names, and another way to get the same effect is to use the regclass alias type, which will print the table OID symbolically. (If you try reproducing such queries, you will probably get different numeric OID values.)
Inheritance does not automatically propagate data from INSERT or COPY commands to other tables in the hierarchy. In our example, the following INSERT statement would fail: INSERT INTO cities (name, population, altitude, state) VALUES ('Albany', NULL, NULL, 'NY');
We might hope that the data would somehow be routed to the capitals table, but this does not happen: INSERT always inserts into exactly the table specified. In some cases it is possible to redirect the insertion using a rule (see Chapter 40). However that does not help for the above case because the cities table does not contain the column state, and so the command will be rejected before the rule can be applied.
All check constraints and not-null constraints on a parent table are automatically inherited by its children, unless explicitly specified otherwise with NO INHERIT clauses. Other types of constraints (unique, primary key, and foreign key constraints) are not inherited.
A table can inherit from more than one parent table, in which case it has the union of the columns defined by the parent tables, plus any columns declared specifically for the child table. If the same column name appears in multiple parent tables, or in both a parent table and the child's own definition, then these columns are "merged" so that there is only one such column in the child table. To be merged, columns must have the same data types, else an error is raised. Inheritable check constraints and not-null constraints are merged in a similar fashion. Thus, for example, a merged column will be marked not-null if any one of the column definitions it came from is marked not-null. Check constraints are merged if they have the same name, and the merge will fail if their conditions are different.
Table inheritance is typically established when the child table is created, using the INHERITS clause of the CREATE TABLE statement. Alternatively, a table which is already defined in a compatible way can have a new parent relationship added, using the INHERIT variant of ALTER TABLE. To do this the new child table must already include columns with the same names and types as the columns of the parent. Similarly, an inheritance link can be removed from a child using the NO INHERIT variant of ALTER TABLE. Dynamically adding and removing inheritance links like this can be useful when the inheritance relationship is being used for table partitioning (see Section 5.10).
One convenient way to create a compatible table that will later be made a new child is to use the LIKE clause in CREATE TABLE. This creates a new table with the same columns as the source table. If there are any CHECK constraints defined on the source table, the INCLUDING CONSTRAINTS option to LIKE should be specified, as the new child must have constraints matching the parent to be considered compatible.
A parent table cannot be dropped while any of its children remain. Neither can columns or check constraints of child tables be dropped or altered if they are inherited from any parent tables. If you wish to remove a table and all of its descendants, one easy way is to drop the parent table with the CASCADE option (see Section 5.13).
ALTER TABLE will propagate any changes in column data definitions and check constraints down the inheritance hierarchy. Again, dropping columns that are depended on by other tables is only possible when using the CASCADE option. ALTER TABLE follows the same rules for duplicate column merging and rejection that apply during CREATE TABLE.
Inherited queries perform access permission checks on the parent table only. Thus, for example, granting UPDATE permission on the cities table implies permission to update rows in the capitals table as well, when they are accessed through cities. This preserves the appearance that the data is (also) in the parent table. But the capitals table could not be updated directly without an additional grant. In a similar way, the parent table's row security policies (see Section 5.7) are applied to rows coming from child tables during an inherited query. A child table's policies, if any, are applied only when it is the table explicitly named in the query; and in that case, any policies attached to its parent(s) are ignored.
Foreign tables (see Section 5.11) can also be part of inheritance hierarchies, either as parent or child tables, just as regular tables can be. If a foreign table is part of an inheritance hierarchy, then any operations not supported by the foreign table are not supported on the whole hierarchy either.
Note how not all SQL commands are able to work on inheritance hierarchies. Commands that are used for data querying, data modification, or schema modification (e.g., SELECT, UPDATE, DELETE, most variants of ALTER TABLE, but not INSERT or ALTER TABLE ... RENAME) typically default to including child tables and support the ONLY notation to exclude them. Commands that do database maintenance and tuning (e.g., REINDEX, VACUUM) typically only work on individual, physical tables and do not support recursing over inheritance hierarchies. The respective behavior of each individual command is documented in its reference page (SQL Commands).
A serious limitation of the inheritance feature is that indexes (including unique constraints) and foreign key constraints only apply to single tables, not to their inheritance children. This is true on both the referencing and referenced sides of a foreign key constraint. Thus, in the terms of the above example:
If we declared cities.name to be UNIQUE or a PRIMARY KEY, this would not stop the capitals table from having rows with names duplicating rows in cities. And those duplicate rows would by default show up in queries from cities. In fact, by default capitals would have no unique constraint at all, and so could contain multiple rows with the same name. You could add a unique constraint to capitals, but this would not prevent duplication compared to cities.
Similarly, if we were to specify that cities.name REFERENCES some other table, this constraint would not automatically propagate to capitals. In this case you could work around it by manually adding the same REFERENCES constraint to capitals.
Specifying that another table's column REFERENCES cities(name) would allow the other table to contain city names, but not capital names. There is no good workaround for this case.
These deficiencies might be fixed in some future release, but in the meantime considerable care is needed in deciding whether inheritance is useful for your application.
When an object is created, it is assigned an owner. The owner is normally the role that executed the creation statement. For most kinds of objects, the initial state is that only the owner (or a superuser) can do anything with the object. To allow other roles to use it, privileges must be granted.
There are different kinds of privileges: SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER, CREATE, CONNECT, TEMPORARY, EXECUTE, and USAGE. The privileges applicable to a particular object vary depending on the object's type (table, function, etc.). For complete information on the different types of privileges supported by PostgreSQL, refer to the GRANT reference page. The following discussion explains how the privileges are used.
The right to modify or destroy an object is always the privilege of the owner only.
An object can be assigned to a new owner with an ALTER command of the appropriate kind for the object, for example ALTER TABLE. Superusers can always do this; ordinary roles can only do it if they are both the current owner of the object (or a member of the owning role) and a member of the new owning role.
To assign privileges, the GRANT command is used. For example, if joe is an existing user, and accounts is an existing table, the privilege to update the table can be granted with GRANT UPDATE ON accounts TO joe;
Writing ALL in place of a specific privilege grants all privileges that are relevant for the object type.
The special "user" name PUBLIC can be used to grant a privilege to every user on the system. Also, "group" roles can be set up to help manage privileges when there are many users of a database — for details see Chapter 21.
To revoke a privilege, use the fittingly named REVOKE command, for example REVOKE ALL ON accounts FROM PUBLIC;
The special privileges of the object owner (i.e., the right to do DROP, GRANT, REVOKE, etc.) are always implicit in being the owner and cannot be granted or revoked separately. But the object owner can choose to revoke their own ordinary privileges, for example to make a table read-only for themselves as well as others.
Ordinarily, only the object's owner (or a superuser) can grant or revoke privileges on an object. However, it is possible to grant a privilege "with grant option", which gives the recipient the right to grant it in turn to others. If the grant option is subsequently revoked, then all who received the privilege from that recipient (directly or through a chain of grants) will lose the privilege. For details see the GRANT and REVOKE reference pages.
Data types are a way to limit the kind of data that can be stored in a table. For many applications, however, the constraint they provide is too coarse. For example, a column containing a product price should probably only accept positive values. But there is no standard data type that accepts only positive numbers. Another issue is that you might want to constrain column data with respect to other columns or rows. For example, in a table containing product information, there should be only one row for each product number.
To that end, SQL allows you to define constraints on columns and tables. Constraints give you as much control over the data in your tables as you wish. If a user attempts to store data in a column that would violate a constraint, an error is raised. This applies even if the value came from the default value definition.
A check constraint is the most generic constraint type. It allows you to specify that the value in a certain column must satisfy a Boolean (truth-value) expression. For instance, to require positive product prices, you could use:
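CREATE TABLE products (
    product_no integer,
    name text,
    price numeric CHECK (price > 0)
);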
As you see, the constraint definition comes after the data type, just like default value definitions. Default values and constraints can be listed in any order. A check constraint consists of the key word CHECK followed by an expression in parentheses. The check constraint expression should involve the column thus constrained, otherwise the constraint would not make too much sense.
You can also give the constraint a separate name. This clarifies error messages and allows you to refer to the constraint when you need to change it.
To do so, use the key word CONSTRAINT followed by an identifier followed by the constraint definition — for example, price numeric CONSTRAINT positive_price CHECK (price > 0). (If you don't specify a constraint name in this way, the system chooses a name for you.)
A check constraint can also refer to several columns. Say you store a regular price and a discounted price, and you want to ensure that the discounted price is lower than the regular price:
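CREATE TABLE products (
    product_no integer,
    name text,
    price numeric CHECK (price > 0),
    discounted_price numeric CHECK (discounted_price > 0),
    CHECK (price > discounted_price)
);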
The first two constraints should look familiar. The third one uses a new syntax. It is not attached to a particular column; instead it appears as a separate item in the comma-separated column list. Column definitions and these constraint definitions can be listed in mixed order.
We say that the first two constraints are column constraints, whereas the third one is a table constraint because it is written separately from any one column definition. Column constraints can also be written as table constraints, while the reverse is not necessarily possible, since a column constraint is supposed to refer to only the column it is attached to. (PostgreSQL doesn't enforce that rule, but you should follow it if you want your table definitions to work with other database systems.)
The example above could equally well be written with all three checks as separate table constraints, or with the column checks moved next to their columns in other equivalent variations.
It's a matter of taste.
Names can be assigned to table constraints in the same way as column constraints, e.g. CONSTRAINT valid_discount CHECK (price > discounted_price).
It should be noted that a check constraint is satisfied if the check expression evaluates to true or the null value. Since most expressions will evaluate to the null value if any operand is null, they will not prevent null values in the constrained columns. To ensure that a column does not contain null values, the not-null constraint described in the next section can be used.
A not-null constraint simply specifies that a column must not assume the null value; it is written by appending NOT NULL after the column's data type, e.g. name text NOT NULL.
A not-null constraint is always written as a column constraint. It is functionally equivalent to creating a check constraint CHECK (column_name IS NOT NULL), but in PostgreSQL creating an explicit not-null constraint is more efficient. The drawback is that you cannot give explicit names to not-null constraints created this way.
Of course, a column can have more than one constraint. Just write the constraints one after another, e.g. price numeric NOT NULL CHECK (price > 0).
The order doesn't matter; it does not necessarily determine in which order the constraints are checked.
The NOT NULL constraint has an inverse: the NULL constraint. This does not mean that the column must be null, which would surely be useless. Instead, it simply selects the default behavior that the column might be null. The NULL constraint is not present in the SQL standard and should not be used in portable applications. (It was only added to PostgreSQL to be compatible with some other database systems.) Some users, however, like it because it makes it easy to toggle the constraint in a script file. For example, you could start with columns declared NULL,
and then insert the NOT key word in front of NULL where desired.
Tip: In most database designs the majority of columns should be marked not null.
A unique constraint ensures that the data contained in a column, or a group of columns, is unique among all the rows in the table.
Written as a column constraint, it is, e.g., product_no integer UNIQUE.
Written as a table constraint, the same is UNIQUE (product_no), listed as a separate item in the column list.
To define a unique constraint for a group of columns, write it as a table constraint with the column names separated by commas, e.g. UNIQUE (a, c).
This specifies that the combination of values in the indicated columns is unique across the whole table, though any one of the columns need not be (and ordinarily isn't) unique by itself.
You can assign your own name to a unique constraint in the usual way, e.g. CONSTRAINT must_be_different UNIQUE (product_no).
Adding a unique constraint will automatically create a unique B-tree index on the column or group of columns listed in the constraint. A uniqueness restriction covering only some rows cannot be written as a unique constraint, but it is possible to enforce such a restriction by creating a unique partial index.
In general, a unique constraint is violated if there is more than one row in the table where the values of all of the columns included in the constraint are equal. However, two null values are never considered equal in this comparison. That means even in the presence of a unique constraint it is possible to store duplicate rows that contain a null value in at least one of the constrained columns. This behavior conforms to the SQL standard, but we have heard that other SQL databases might not follow this rule. So be careful when developing applications that are intended to be portable.
A primary key constraint indicates that a column, or group of columns, can be used as a unique identifier for rows in the table. This requires that the values be both unique and not null. So, product_no integer UNIQUE NOT NULL and product_no integer PRIMARY KEY define the same constrained data.
Primary keys can span more than one column; the syntax is similar to unique constraints, e.g. PRIMARY KEY (a, c).
Adding a primary key will automatically create a unique B-tree index on the column or group of columns listed in the primary key, and will force the column(s) to be marked NOT NULL.
A table can have at most one primary key. (There can be any number of unique and not-null constraints, which are functionally almost the same thing, but only one can be identified as the primary key.) Relational database theory dictates that every table must have a primary key. This rule is not enforced by PostgreSQL, but it is usually best to follow it.
Primary keys are useful both for documentation purposes and for client applications. For example, a GUI application that allows modifying row values probably needs to know the primary key of a table to be able to identify rows uniquely without confusion. There are also various ways in which the database system makes use of a primary key if one has been declared; for example, primary keys are used by foreign keys to match rows between related tables.
A foreign key constraint specifies that the values in a column (or a group of columns) must match the values appearing in some row of another table. We say this maintains the referential integrity between two related tables.
Say you have the products table we have used several times already, with product_no as its primary key.
Let's also assume you have a table storing orders of those products. We want to ensure that the orders table only contains orders of products that actually exist. So we define a foreign key constraint in the orders table that references the products table:
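CREATE TABLE orders (
    order_id integer PRIMARY KEY,
    product_no integer REFERENCES products (product_no),
    quantity integer
);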
Now it is impossible to create orders whose product_no does not appear in the products table.
We say that in this situation the orders table is the referencing table and the products table is the referenced table. Similarly, there are referencing and referenced columns.
You can also shorten the above command to product_no integer REFERENCES products,
because in the absence of a column list, the primary key of the referenced table is used as the referenced column(s).
A foreign key can also constrain and reference a group of columns. As usual, it then needs to be written in table constraint form, e.g. FOREIGN KEY (b, c) REFERENCES other_table (c1, c2).
Of course, the number and type of the constrained columns need to match the number and type of the referenced columns.
You can assign your own name to a foreign key constraint in the usual way.
A table can contain more than one foreign key constraint. This is used to implement many-to-many relationships between tables. Say you have tables about products and orders, but now you want to allow one order to contain possibly many products (which the structure above did not allow). You could use this table structure:
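CREATE TABLE order_items (
    product_no integer REFERENCES products,
    order_id integer REFERENCES orders,
    quantity integer,
    PRIMARY KEY (product_no, order_id)
);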
Notice that the primary key overlaps with the foreign keys in the last table.
We know that the foreign keys disallow creation of orders that do not relate to any products. But what if a product is removed after an order is created that references it? SQL allows you to handle that as well. Intuitively, we have a few options:
Disallow deleting a referenced product
Delete the orders as well
Something else?
To illustrate this, let's implement the following policy on the many-to-many relationship example above: when someone wants to remove a product that is still referenced by an order (via order_items), we disallow it. If someone removes an order, the order items are removed as well:
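CREATE TABLE order_items (
    product_no integer REFERENCES products ON DELETE RESTRICT,
    order_id integer REFERENCES orders ON DELETE CASCADE,
    quantity integer,
    PRIMARY KEY (product_no, order_id)
);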
Restricting and cascading deletes are the two most common options. RESTRICT prevents deletion of a referenced row. NO ACTION means that if any referencing rows still exist when the constraint is checked, an error is raised; this is the default behavior if you do not specify anything. (The essential difference between these two choices is that NO ACTION allows the check to be deferred until later in the transaction, whereas RESTRICT does not.) CASCADE specifies that when a referenced row is deleted, row(s) referencing it should be automatically deleted as well. There are two other options: SET NULL and SET DEFAULT. These cause the referencing column(s) in the referencing row(s) to be set to nulls or their default values, respectively, when the referenced row is deleted. Note that these do not excuse you from observing any constraints. For example, if an action specifies SET DEFAULT but the default value would not satisfy the foreign key constraint, the operation will fail.
Analogous to ON DELETE there is also ON UPDATE, which is invoked when a referenced column is changed (updated). The possible actions are the same. In this case, CASCADE means that the updated values of the referenced column(s) should be copied into the referencing row(s).
Normally, a referencing row need not satisfy the foreign key constraint if any of its referencing columns are null. If MATCH FULL is added to the foreign key declaration, a referencing row escapes satisfying the constraint only if all its referencing columns are null (so a mix of null and non-null values is guaranteed to fail a MATCH FULL constraint). If you don't want referencing rows to be able to avoid satisfying the foreign key constraint, declare the referencing column(s) as NOT NULL.
A foreign key must reference columns that either are a primary key or form a unique constraint. This means that the referenced columns always have an index, so checks on whether a referencing row has a match will be efficient. Since a DELETE of a row from the referenced table or an UPDATE of a referenced column will require a scan of the referencing table for rows matching the old value, it is often a good idea to index the referencing columns too. Because this is not always needed, and there are many choices available on how to index, declaration of a foreign key constraint does not automatically create an index on the referencing columns.
More information about updating and deleting data is in Chapter 6. Also see the description of foreign key constraint syntax in the reference documentation for CREATE TABLE.
Exclusion constraints ensure that if any two rows are compared on the specified columns or expressions using the specified operators, at least one of these operator comparisons will return false or null. The syntax is, for example:
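CREATE TABLE circles (
    c circle,
    EXCLUDE USING gist (c WITH &&)
);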
See also CREATE TABLE ... CONSTRAINT ... EXCLUDE for details.
Adding an exclusion constraint will automatically create an index of the type specified in the constraint declaration.
The previous chapter discussed how to create tables and other structures to hold your data. Now it is time to fill the tables with data. This chapter covers how to insert, update, and delete table data. The chapter after this will finally explain how to extract your long-lost data from the database.
The previous chapters explained how to create tables, how to fill them with data, and how to manipulate that data. Now we finally discuss how to retrieve the data from the database.
SQL input consists of a sequence of commands. A command is composed of a sequence of tokens, terminated by a semicolon. The end of the input stream also terminates a command. Which tokens are valid depends on the syntax of the particular command.
A token can be a key word, an identifier, a quoted identifier, a literal (or constant), or a special character symbol. Tokens are normally separated by whitespace (space, tab, newline), but need not be if there is no ambiguity (which is generally only the case if a special character is adjacent to some other token type).
For example, the following is (syntactically) valid SQL input:
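SELECT * FROM MY_TABLE;
UPDATE MY_TABLE SET A = 5;
INSERT INTO MY_TABLE VALUES (3, 'hi there');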
This is a sequence of three commands, one per line (although this is not required; more than one command can be on a line, and commands can usefully be split across lines).
Additionally, comments can occur in SQL input. They are not tokens; they are effectively equivalent to whitespace.
The SQL syntax is not very consistent regarding what tokens identify commands and which are operands or parameters. The first few tokens are generally the command name, so in the above example we would usually speak of a "SELECT", an "UPDATE", and an "INSERT" command. But for instance the UPDATE command always requires a SET token to appear in a certain position, and this particular variation of INSERT also requires a VALUES in order to be complete. The precise syntax rules for each command are described in Part VI.
Tokens such as SELECT, UPDATE, or VALUES in the example above are examples of key words, that is, words that have a fixed meaning in the SQL language. The token MY_TABLE is an example of an identifier. Identifiers name tables, columns, or other database objects, depending on the command they are used in. Therefore they are sometimes simply called "names". Key words and identifiers have the same lexical structure, meaning that one cannot know whether a token is an identifier or a key word without knowing the language. A complete list of key words can be found in Appendix C.
SQL identifiers and key words must begin with a letter (a-z, but also letters with diacritical marks and non-Latin letters) or an underscore (_). Subsequent characters in an identifier or key word can be letters, underscores, digits (0-9), or dollar signs ($). Note that dollar signs are not allowed in identifiers according to the letter of the SQL standard, so their use might render applications less portable. The SQL standard will not define a key word that contains digits or starts or ends with an underscore, so identifiers of this form are safe against possible conflict with future extensions of the standard.
The system uses no more than NAMEDATALEN-1 bytes of an identifier; longer names can be written in commands, but they will be truncated. By default, NAMEDATALEN is 64 so the maximum identifier length is 63 bytes. If this limit is problematic, it can be raised by changing the NAMEDATALEN constant in src/include/pg_config_manual.h and recompiling.
Key words and unquoted identifiers are case insensitive. Therefore the first statement below
is equivalent to the second.
A convention often used is to write key words in upper case and names in lower case, as in the third:
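UPDATE MY_TABLE SET A = 5;
uPDaTE my_TabLE SeT a = 5;
UPDATE my_table SET a = 5;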
There is a second kind of identifier: the delimited identifier or quoted identifier. It is formed by enclosing an arbitrary sequence of characters in double-quotes ("). A delimited identifier is always an identifier, never a key word. So "select" could be used to refer to a column or table named "select", whereas an unquoted select would be taken as a key word and would therefore provoke a parse error when used where a table or column name is expected. For example: UPDATE "my_table" SET "a" = 5;
Quoted identifiers can contain any character, except the character with code zero. (To include a double quote, write two double quotes.) This allows constructing table or column names that would otherwise not be possible, such as ones containing spaces or ampersands. The length limitation still applies.
A variant of quoted identifiers allows including escaped Unicode characters identified by their code points. This variant starts with U& (upper or lower case U followed by ampersand) immediately before the opening double quote, without any spaces in between, for example U&"foo". (Note that this creates an ambiguity with the operator &. Use spaces around the operator to avoid this problem.) Inside the quotes, Unicode characters can be specified in escaped form by writing a backslash followed by the four-digit hexadecimal code point number, or alternatively a backslash followed by a plus sign followed by a six-digit hexadecimal code point number. For example, the identifier "data" could be written as U&"d\0061t\+000061".
The following less trivial example writes the Russian word "slon" (elephant) in Cyrillic letters: U&"\0441\043B\043E\043D"
If a different escape character than backslash is desired, it can be specified using the UESCAPE clause after the string, for example: U&"d!0061t!+000061" UESCAPE '!'
The escape character can be any single character other than a hexadecimal digit, the plus sign, a single quote, a double quote, or a whitespace character. Note that the escape character is written in single quotes, not double quotes.
To include the escape character in the identifier literally, write it twice.
The Unicode escape syntax works only when the server encoding is UTF8. When other server encodings are used, only code points in the ASCII range (up to \007F) can be specified. Both the 4-digit and the 6-digit form can be used to specify UTF-16 surrogate pairs to compose characters with code points larger than U+FFFF, although the availability of the 6-digit form technically makes this unnecessary. (Surrogate pairs are not stored directly, but combined into a single code point that is then encoded in UTF-8.)
Quoting an identifier also makes it case-sensitive, whereas unquoted names are always folded to lower case. For example, the identifiers FOO, foo, and "foo" are considered the same by PostgreSQL, but "Foo" and "FOO" are different from these three and each other. (The folding of unquoted names to lower case in PostgreSQL is incompatible with the SQL standard, which says that unquoted names should be folded to upper case. Thus, foo should be equivalent to "FOO" not "foo" according to the standard. If you want to write portable applications you are advised to always quote a particular name or never quote it.)
There are three kinds of implicitly-typed constants in PostgreSQL: strings, bit strings, and numbers. Constants can also be specified with explicit types, which can enable more accurate representation and more efficient handling by the system. These alternatives are discussed in the following subsections.
A string constant in SQL is an arbitrary sequence of characters bounded by single quotes ('), for example 'This is a string'. To include a single-quote character within a string constant, write two adjacent single quotes, e.g., 'Dianne''s horse'. Note that this is not the same as a double-quote character (").
Two string constants that are only separated by whitespace with at least one newline are concatenated and effectively treated as if the string had been written as one constant. For example, the first two statements below are equivalent,
while the third one,
with the constants on one line,
is not valid syntax. (This slightly bizarre behavior is specified by SQL; PostgreSQL is simply following the standard.)
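SELECT 'foo'
'bar';

SELECT 'foobar';

SELECT 'foo'      'bar';   -- not valid syntax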
PostgreSQL also accepts "escape" string constants, which are an extension to the SQL standard. An escape string constant is specified by writing the letter E (upper or lower case) just before the opening single quote, e.g., E'foo'. (When continuing an escape string constant across lines, write E only before the first opening quote.) Within an escape string, a backslash character begins a C-like backslash escape sequence, in which the combination of backslash and following character(s) represent a special byte value, as shown in Table 4.1.
Table 4.1. Backslash Escape Sequences
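Backslash Escape Sequence — Interpretation
\b — backspace
\f — form feed
\n — newline
\r — carriage return
\t — tab
\o, \oo, \ooo (o = 0-7) — octal byte value
\xh, \xhh (h = 0-9, A-F) — hexadecimal byte value
\uxxxx, \Uxxxxxxxx (x = 0-9, A-F) — 16- or 32-bit hexadecimal Unicode character value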
Any other character following a backslash is taken literally. Thus, to include a backslash character, write two backslashes (\\). Also, a single quote can be included in an escape string by writing \', in addition to the normal way of ''.
It is your responsibility that the byte sequences you create, especially when using the octal or hexadecimal escapes, compose valid characters in the server character set encoding. When the server encoding is UTF-8, then the Unicode escapes or the alternative Unicode escape syntax explained in Section 4.1.2.3 should be used instead. (The alternative would be doing the UTF-8 encoding by hand and writing out the bytes, which would be very cumbersome.)
The Unicode escape syntax works fully only when the server encoding is UTF8. When other server encodings are used, only code points in the ASCII range (up to \u007F) can be specified. Both the 4-digit and the 8-digit form can be used to specify UTF-16 surrogate pairs to compose characters with code points larger than U+FFFF, although the availability of the 8-digit form technically makes this unnecessary. (When surrogate pairs are used when the server encoding is UTF8, they are first combined into a single code point that is then encoded in UTF-8.)
If the configuration parameter standard_conforming_strings is off, then PostgreSQL recognizes backslash escapes in both regular and escape string constants. However, as of PostgreSQL 9.1, the default is on, meaning that backslash escapes are recognized only in escape string constants. This behavior is more standards-compliant, but might break applications which rely on the historical behavior, where backslash escapes were always recognized. As a workaround, you can set this parameter to off, but it is better to migrate away from using backslash escapes. If you need to use a backslash escape to represent a special character, write the string constant with an E.
In addition to standard_conforming_strings, the configuration parameters escape_string_warning and backslash_quote govern treatment of backslashes in string constants.
The character with the code zero cannot be in a string constant.
PostgreSQL also supports another type of escape syntax for strings that allows specifying arbitrary Unicode characters by code point. A Unicode escape string constant starts with U& (upper or lower case letter U followed by ampersand) immediately before the opening quote, without any spaces in between, for example U&'foo'. (Note that this creates an ambiguity with the operator &. Use spaces around the operator to avoid this problem.) Inside the quotes, Unicode characters can be specified in escaped form by writing a backslash followed by the four-digit hexadecimal code point number, or alternatively a backslash followed by a plus sign followed by a six-digit hexadecimal code point number. For example, the string 'data' could be written as U&'d\0061t\+000061'.
The following less trivial example writes the Russian word "slon" (elephant) in Cyrillic letters: U&'\0441\043B\043E\043D'
If a different escape character than backslash is desired, it can be specified using the UESCAPE clause after the string, for example: U&'d!0061t!+000061' UESCAPE '!'
The escape character can be any single character other than a hexadecimal digit, the plus sign, a single quote, a double quote, or a whitespace character.
The Unicode escape syntax works only when the server encoding is UTF8. When other server encodings are used, only code points in the ASCII range (up to \u007F) can be specified. Both the 4-digit and the 6-digit form can be used to specify UTF-16 surrogate pairs to compose characters with code points larger than U+FFFF, although the availability of the 6-digit form technically makes this unnecessary. (When surrogate pairs are used when the server encoding is UTF8, they are first combined into a single code point that is then encoded in UTF-8.)
Also, the Unicode escape syntax for string constants only works when the configuration parameter standard_conforming_strings is turned on. This is because otherwise this syntax could confuse clients that parse the SQL statements, to the point that it could lead to SQL injections and similar security issues. If the parameter is set to off, this syntax will be rejected with an error message.
To include the escape character in the string literally, write it twice.
While the standard syntax for specifying string constants is usually convenient, it can be difficult to read when the desired string contains many single quotes or backslashes, since each of those must be doubled. To allow more readable strings in such situations, PostgreSQL provides another way, called "dollar quoting", to write string constants. A dollar-quoted string constant consists of a dollar sign ($), an optional "tag" of zero or more characters, another dollar sign, an arbitrary sequence of characters that makes up the string content, a dollar sign, the same tag that began this dollar quote, and a dollar sign. For example, here are two different ways to specify the string "Dianne's horse" using dollar quoting:
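$$Dianne's horse$$
$SomeTag$Dianne's horse$SomeTag$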
Notice that inside the dollar-quoted string, single quotes can be used without needing to be escaped. Indeed, no characters inside a dollar-quoted string are ever escaped: the string content is always written literally. Backslashes are not special, and neither are dollar signs, unless they are part of a sequence matching the opening tag.
It is possible to nest dollar-quoted string constants by choosing different tags at each nesting level. This is most commonly used in writing function definitions. For example:
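$function$
BEGIN
    RETURN ($1 ~ $q$[\t\r\n\v\\]$q$);
END;
$function$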
Here, the sequence $q$[\t\r\n\v\\]$q$ represents a dollar-quoted literal string [\t\r\n\v\\], which will be recognized when the function body is executed by PostgreSQL. But since the sequence does not match the outer dollar quoting delimiter $function$, it is just some more characters within the constant so far as the outer string is concerned.
The tag, if any, of a dollar-quoted string follows the same rules as an unquoted identifier, except that it cannot contain a dollar sign. Tags are case sensitive, so $tag$String content$tag$ is correct, but $TAG$String content$tag$ is not.
A dollar-quoted string that follows a keyword or identifier must be separated from it by whitespace; otherwise the dollar quoting delimiter would be taken as part of the preceding identifier.
Dollar quoting is not part of the SQL standard, but it is often a more convenient way to write complicated string literals than the standard-compliant single quote syntax. It is particularly useful when representing string constants inside other constants, as is often needed in procedural function definitions. With single-quote syntax, each backslash in the above example would have to be written as four backslashes, which would be reduced to two backslashes in parsing the original string constant, and then to one when the inner string constant is re-parsed during function execution.
Bit-string constants look like regular string constants with a B (upper or lower case) immediately before the opening quote (no intervening whitespace), e.g., B'1001'. The only characters allowed within bit-string constants are 0 and 1.
Alternatively, bit-string constants can be specified in hexadecimal notation, using a leading X (upper or lower case), e.g., X'1FF'. This notation is equivalent to a bit-string constant with four binary digits for each hexadecimal digit.
Both forms of bit-string constant can be continued across lines in the same way as regular string constants. Dollar quoting cannot be used in a bit-string constant.
Numeric constants are accepted in these general forms:
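digits
digits.[digits][e[+-]digits]
[digits].digits[e[+-]digits]
digitse[+-]digits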
where digits is one or more decimal digits (0 through 9). At least one digit must be before or after the decimal point, if one is used. At least one digit must follow the exponent marker (e), if one is present. There cannot be any spaces or other characters embedded in the constant. Note that any leading plus or minus sign is not actually considered part of the constant; it is an operator applied to the constant.
These are some examples of valid numeric constants:
42 3.5 4. .001 5e2 1.925e-3
A numeric constant that contains neither a decimal point nor an exponent is initially presumed to be type integer if its value fits in type integer (32 bits); otherwise it is presumed to be type bigint if its value fits in type bigint (64 bits); otherwise it is taken to be type numeric. Constants that contain decimal points and/or exponents are always initially presumed to be type numeric.
The initially assigned data type of a numeric constant is just a starting point for the type resolution algorithms. In most cases the constant will be automatically coerced to the most appropriate type depending on context. When necessary, you can force a numeric value to be interpreted as a specific data type by casting it — for example, to have it treated as type real (float4), write REAL '1.23' (string style) or 1.23::REAL (PostgreSQL, historical style).
These are actually just special cases of the general casting notations discussed next.
A constant of an arbitrary type can be entered using any one of the following notations:
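type 'string'
'string'::type
CAST ( 'string' AS type )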
The string constant's text is passed to the input conversion routine for the type called type. The result is a constant of the indicated type. The explicit type cast can be omitted if there is no ambiguity as to the type the constant must be (for example, when it is assigned directly to a table column), in which case it is automatically coerced.
The string constant can be written using either regular SQL notation or dollar-quoting.
It is also possible to specify a type coercion using a function-like syntax: typename ( 'string' )
but not all type names can be used in this way; see Section 4.2.9 for details.
The ::, CAST(), and function-call syntaxes can also be used to specify run-time type conversions of arbitrary expressions, as discussed in Section 4.2.9. To avoid syntactic ambiguity, the type 'string' syntax can only be used to specify the type of a simple literal constant. Another restriction on the type 'string' syntax is that it does not work for array types; use :: or CAST() to specify the type of an array constant.
An operator name is a sequence of up to NAMEDATALEN-1 (63 by default) characters from the following list:
+ - * / < > = ~ ! @ # % ^ & | ` ?
There are a few restrictions on operator names, however:
-- and /* cannot appear anywhere in an operator name, since they will be taken as the start of a comment.
A multiple-character operator name cannot end in + or -, unless the name also contains at least one of these characters:
~ ! @ # % ^ & | ` ?
For example, @- is an allowed operator name, but *- is not. This restriction allows PostgreSQL to parse SQL-compliant queries without requiring spaces between tokens.
When working with non-SQL-standard operator names, you will usually need to separate adjacent operators with spaces to avoid ambiguity. For example, if you have defined a left unary operator named @, you cannot write X*@Y; you must write X* @Y to ensure that PostgreSQL reads it as two operator names, not one.
Some characters that are not alphanumeric have a special meaning that is different from being an operator. Details on the usage can be found at the location where the respective syntax element is described. This section only exists to advise of the existence and summarize the purposes of these characters.
A dollar sign ($) followed by digits is used to represent a positional parameter in the body of a function definition or a prepared statement. In other contexts the dollar sign can be part of an identifier or a dollar-quoted string constant.
Parentheses (()) have their usual meaning to group expressions and enforce precedence. In some cases parentheses are required as part of the fixed syntax of a particular SQL command.
Brackets ([]) are used to select the elements of an array. See Section 8.15 for more information on arrays.
Commas (,) are used in some syntactical constructs to separate the elements of a list.
The semicolon (;) terminates an SQL command. It cannot appear anywhere within a command, except within a string constant or quoted identifier.
The colon (:) is used to select "slices" from arrays (see Section 8.15). In certain SQL dialects (such as embedded SQL), the colon is used to prefix variable names.
The asterisk (*) is used in some contexts to denote all the fields of a table row or composite value. It also has a special meaning when used as the argument of an aggregate function, namely that the aggregate does not require any explicit parameter.
The period (.) is used in numeric constants, and to separate schema, table, and column names.
A comment is a sequence of characters beginning with double dashes and extending to the end of the line, e.g.: -- This is a standard SQL comment
Alternatively, C-style block comments can be used:
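/* multiline comment
 * with nesting: /* nested block comment */
 */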
Such a comment begins with /* and extends to the matching occurrence of */. These block comments nest, as specified in the SQL standard but unlike C, so that one can comment out larger blocks of code that might contain existing block comments.
A comment is removed from the input stream before further syntax analysis and is effectively replaced by whitespace.
Table 4.2 shows the precedence and associativity of the operators in PostgreSQL. Most operators have the same precedence and are left-associative. The precedence and associativity of the operators is hard-wired into the parser.
You will sometimes need to add parentheses when using combinations of binary and unary operators. For instance, the first statement below
will be parsed as the second,
because the parser has no idea — until it is too late — that ! is defined as a postfix operator, not an infix one. To get the desired behavior in this case, you must write the third form:
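SELECT 5 ! - 6;
SELECT 5 ! (- 6);
SELECT (5 !) - 6;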
This is the price one pays for extensibility.
Table 4.2. Operator Precedence (highest to lowest)
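Operator/Element — Associativity — Description
. — left — table/column name separator
:: — left — PostgreSQL-style typecast
[ ] — left — array element selection
+ - — right — unary plus, unary minus
^ — left — exponentiation
* / % — left — multiplication, division, modulo
+ - — left — addition, subtraction
(any other operator) — left — all other native and user-defined operators
BETWEEN IN LIKE ILIKE SIMILAR — — range containment, set membership, string matching
< > = <= >= <> — — comparison operators
IS ISNULL NOTNULL — — IS TRUE, IS FALSE, IS NULL, IS DISTINCT FROM, etc.
NOT — right — logical negation
AND — left — logical conjunction
OR — left — logical disjunction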
Note that the operator precedence rules also apply to user-defined operators that have the same names as the built-in operators mentioned above. For example, if you define a "+" operator for some custom data type, it will have the same precedence as the built-in "+" operator, no matter what yours does.
When a schema-qualified operator name is used in the OPERATOR syntax, as for example in SELECT 3 OPERATOR(pg_catalog.+) 4;
the OPERATOR construct is taken to have the default precedence shown in Table 4.2 for "any other operator". This is true no matter which specific operator appears inside OPERATOR().
PostgreSQL versions before 9.5 used slightly different operator precedence rules. In particular, the comparison operators <= >= and <> used to be treated as generic operators; IS tests used to have higher priority; and NOT BETWEEN and related constructs acted inconsistently, being taken in some cases as having the precedence of NOT rather than BETWEEN. These rules were changed for better compliance with the SQL standard and to reduce confusion from inconsistent treatment of logically equivalent constructs. In most cases, these changes will result in no behavioral change, or perhaps in "no such operator" failures which can be resolved by adding parentheses. However there are corner cases in which a query might change behavior without any parsing error being reported. If you are concerned about whether these changes have silently broken something, you can test your application with the configuration parameter operator_precedence_warning turned on to see if any warnings are logged.
PostgreSQL offers built-in support for table partitioning. This section describes why and how to implement partitioning as part of your database design.
Partitioning refers to splitting what is logically one large table into smaller physical pieces. Partitioning can provide several benefits:
Query performance can be improved dramatically in certain situations, particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions. The partitioning substitutes for leading columns of indexes, reducing index size and making it more likely that the heavily-used parts of the indexes fit in memory.
When queries or updates access a large percentage of a single partition, performance can be improved by taking advantage of sequential scan of that partition instead of using an index and random access reads scattered across the whole table.
Bulk loads and deletes can be accomplished by adding or removing partitions, if that requirement is planned into the partitioning design. Doing ALTER TABLE DETACH PARTITION or dropping an individual partition using DROP TABLE is far faster than a bulk operation. These commands also entirely avoid the VACUUM overhead caused by a bulk DELETE.
Seldom-used data can be migrated to cheaper and slower storage media.
These benefits will normally be worthwhile only when a table would otherwise be very large. The exact point at which a table will benefit from partitioning depends on the application, although a rule of thumb is that the size of the table should exceed the physical memory of the database server.
PostgreSQL offers built-in support for the following forms of partitioning:
Range Partitioning: The table is partitioned into "ranges" defined by a key column or set of columns, with no overlap between the ranges of values assigned to different partitions. For example, one might partition by date ranges, or by ranges of identifiers for particular business objects.
List Partitioning: The table is partitioned by explicitly listing which key values appear in each partition.
Hash Partitioning: The table is partitioned by specifying a modulus and a remainder for each partition. Each partition will hold the rows for which the hash value of the partition key divided by the specified modulus will produce the specified remainder.
If your application needs to use other forms of partitioning not listed above, alternative methods such as inheritance and UNION ALL views can be used instead. Such methods offer flexibility but do not have some of the performance benefits of built-in declarative partitioning.
PostgreSQL allows you to declare that a table is divided into partitions. The table that is divided is referred to as a partitioned table. The declaration includes the partitioning method as described above, plus a list of columns or expressions to be used as the partition key.
The partitioned table itself is a "virtual" table having no storage of its own. Instead, the storage belongs to partitions, which are otherwise-ordinary tables associated with the partitioned table. Each partition stores a subset of the data as defined by its partition bounds. All rows inserted into the partitioned table will be routed to one of the partitions based on the value of the partition key. Updating the partition key of a row might cause it to be moved into a different partition, if it no longer satisfies the partition bounds of its original partition.
Suppose we are constructing a database for a large ice cream company. The company measures peak temperatures every day as well as ice cream sales in each region. Conceptually, we want a table like:
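CREATE TABLE measurement (
    city_id         int not null,
    logdate         date not null,
    peaktemp        int,
    unitsales       int
);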
We know that most queries will access just the last week's, month's or quarter's data, since the main use of this table will be to prepare online reports for management. To reduce the amount of old data that needs to be stored, we decide to only keep the most recent 3 years worth of data. At the beginning of each month we will remove the oldest month's data. In this situation we can use partitioning to help us meet all of our different requirements for the measurements table.
To use declarative partitioning in this case, use the following steps:
Create the measurement table as a partitioned table by specifying the PARTITION BY clause, which includes the partitioning method (RANGE in this case) and the list of column(s) to use as the partition key:
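CREATE TABLE measurement (
    city_id         int not null,
    logdate         date not null,
    peaktemp        int,
    unitsales       int
) PARTITION BY RANGE (logdate);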
You may decide to use multiple columns in the partition key for range partitioning, if desired. Of course, this will often result in a larger number of partitions, each of which is individually smaller. On the other hand, using fewer columns may lead to a coarser-grained partitioning criteria with smaller number of partitions. A query accessing the partitioned table will have to scan fewer partitions if the conditions involve some or all of these columns. For example, consider a table range partitioned using columns lastname and firstname (in that order) as the partition key.
Create partitions. Each partition's definition must specify the bounds that correspond to the partitioning method and partition key of the parent. Note that specifying bounds such that the new partition's values will overlap with those in one or more existing partitions will cause an error. Inserting data into the parent table that does not map to one of the existing partitions will cause an error; an appropriate partition must be added manually.
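For instance, monthly partitions for the measurement table might be created like this (names follow the manual's running example):

CREATE TABLE measurement_y2006m02 PARTITION OF measurement
    FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');

CREATE TABLE measurement_y2006m03 PARTITION OF measurement
    FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');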
Partitions thus created are in every way normal PostgreSQL tables (or, possibly, foreign tables). It is possible to specify a tablespace and storage parameters for each partition separately.
It is not necessary to create table constraints describing partition boundary condition for partitions. Instead, partition constraints are generated implicitly from the partition bound specification whenever there is need to refer to them.
To implement sub-partitioning, specify the PARTITION BY clause in the commands used to create individual partitions, for example:
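CREATE TABLE measurement_y2006m02 PARTITION OF measurement
    FOR VALUES FROM ('2006-02-01') TO ('2006-03-01')
    PARTITION BY RANGE (peaktemp);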
After creating partitions of measurement_y2006m02, any data inserted into measurement that is mapped to measurement_y2006m02 (or data that is directly inserted into measurement_y2006m02, provided it satisfies its partition constraint) will be further redirected to one of its partitions based on the peaktemp column. The partition key specified may overlap with the parent's partition key, although care should be taken when specifying the bounds of a sub-partition such that the set of data it accepts constitutes a subset of what the partition's own bounds allows; the system does not try to check whether that's really the case.
Create an index on the key column(s), as well as any other indexes you might want, on the partitioned table. (The key index is not strictly necessary, but in most scenarios it is helpful.) This automatically creates one index on each partition, and any partitions you create or attach later will also contain the index.
In the above example we would be creating a new partition each month, so it might be wise to write a script that generates the required DDL automatically.
Normally the set of partitions established when initially defining the table are not intended to remain static. It is common to want to remove old partitions of data and periodically add new partitions for new data. One of the most important advantages of partitioning is precisely that it allows this otherwise painful task to be executed nearly instantaneously by manipulating the partition structure, rather than physically moving large amounts of data around.
The simplest option for removing old data is to drop the partition that is no longer necessary:
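DROP TABLE measurement_y2006m02;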
This can very quickly delete millions of records because it doesn't have to individually delete every record. Note however that the above command requires taking an ACCESS EXCLUSIVE lock on the parent table.
Another option that is often preferable is to remove the partition from the partitioned table but retain access to it as a table in its own right:
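ALTER TABLE measurement DETACH PARTITION measurement_y2006m02;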
This allows further operations to be performed on the data before it is dropped. For example, this is often a useful time to back up the data using COPY, pg_dump, or similar tools. It might also be a useful time to aggregate data into smaller formats, perform other data manipulations, or run reports.
Similarly we can add a new partition to handle new data. We can create an empty partition in the partitioned table just as the original partitions were created above:
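CREATE TABLE measurement_y2008m02 PARTITION OF measurement
    FOR VALUES FROM ('2008-02-01') TO ('2008-03-01');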
As an alternative, it is sometimes more convenient to create the new table outside the partition structure, and make it a proper partition later. This allows the data to be loaded, checked, and transformed prior to it appearing in the partitioned table:
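A sketch of that workflow (slightly simplified from the manual's example, which also specifies a tablespace):

CREATE TABLE measurement_y2008m02
  (LIKE measurement INCLUDING DEFAULTS INCLUDING CONSTRAINTS);

ALTER TABLE measurement_y2008m02 ADD CONSTRAINT y2008m02
   CHECK ( logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01' );

\copy measurement_y2008m02 from 'measurement_y2008m02'
-- possibly some other data preparation work

ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
    FOR VALUES FROM ('2008-02-01') TO ('2008-03-01');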
Before running the ATTACH PARTITION command, it is recommended to create a CHECK constraint on the table to be attached matching the desired partition constraint, as illustrated above. That way, the system will be able to skip the scan to validate the implicit partition constraint. Without the CHECK constraint, the table will be scanned to validate the partition constraint while holding an ACCESS EXCLUSIVE lock on that partition and a SHARE UPDATE EXCLUSIVE lock on the parent table. It may be desired to drop the redundant CHECK constraint after ATTACH PARTITION is finished.
As explained above, it is possible to create indexes on partitioned tables and they are applied automatically to the entire hierarchy. This is very convenient, as not only the existing partitions will become indexed, but also any partitions that are created in the future will. One limitation is that it's not possible to use the CONCURRENTLY qualifier when creating such a partitioned index. To overcome long lock times, it is possible to use CREATE INDEX ON ONLY the partitioned table; such an index is marked invalid, and the partitions do not get the index applied automatically. The indexes on partitions can be created individually using CONCURRENTLY, and later attached to the index on the parent using ALTER INDEX .. ATTACH PARTITION. Once indexes for all partitions are attached to the parent index, the parent index is marked valid automatically. Example:
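CREATE INDEX measurement_usls_idx ON ONLY measurement (unitsales);

CREATE INDEX measurement_usls_200602_idx
    ON measurement_y2006m02 (unitsales);
ALTER INDEX measurement_usls_idx
    ATTACH PARTITION measurement_usls_200602_idx;
-- repeat for each existing partition ...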
This technique can be used with UNIQUE and PRIMARY KEY constraints too; the indexes are created implicitly when the constraint is created. Example:
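ALTER TABLE ONLY measurement ADD UNIQUE (city_id, logdate);

ALTER TABLE measurement_y2006m02 ADD UNIQUE (city_id, logdate);
ALTER INDEX measurement_city_id_logdate_key
    ATTACH PARTITION measurement_y2006m02_city_id_logdate_key;
-- repeat for each existing partition ...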
The following limitations apply to partitioned tables:
There is no way to create an exclusion constraint spanning all partitions; it is only possible to constrain each leaf partition individually.
Unique constraints on partitioned tables must include all the partition key columns. This limitation exists because PostgreSQL can only enforce uniqueness in each partition individually.
BEFORE ROW triggers, if necessary, must be defined on individual partitions, not the partitioned table.
Mixing temporary and permanent relations in the same partition tree is not allowed. Hence, if the partitioned table is permanent, so must be its partitions; likewise if the partitioned table is temporary. When using temporary relations, all members of the partition tree have to be from the same session.
While the built-in declarative partitioning is suitable for most common use cases, there are some circumstances where a more flexible approach may be useful. Partitioning can be implemented using table inheritance, which allows for several features not supported by declarative partitioning, such as:
For declarative partitioning, partitions must have exactly the same set of columns as the partitioned table, whereas with table inheritance, child tables may have extra columns not present in the parent.
Table inheritance allows for multiple inheritance.
Declarative partitioning only supports range, list and hash partitioning, whereas table inheritance allows data to be divided in a manner of the user's choosing. (Note, however, that if constraint exclusion is unable to prune child tables effectively, query performance might be poor.)
Some operations require a stronger lock when using declarative partitioning than when using table inheritance. For example, adding or removing a partition to or from a partitioned table requires taking an ACCESS EXCLUSIVE lock on the parent table, whereas a SHARE UPDATE EXCLUSIVE lock is enough in the case of regular inheritance.
We use the same measurement table we used above. To implement partitioning using inheritance, use the following steps:
Create the "master" table, from which all of the "child" tables will inherit. This table will contain no data. Do not define any check constraints on this table, unless you intend them to be applied equally to all child tables. There is no point in defining any indexes or unique constraints on it, either. For our example, the master table is the measurement table as originally defined.
Create several “child” tables that each inherit from the master table. Normally, these tables will not add any columns to the set inherited from the master. Just as with declarative partitioning, these tables are in every way normal PostgreSQL tables (or foreign tables).
Add non-overlapping table constraints to the child tables to define the allowed key values in each.
Typical examples would be:
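CHECK ( x = 1 )
CHECK ( county IN ( 'Oxfordshire', 'Buckinghamshire', 'Warwickshire' ))
CHECK ( outletID >= 100 AND outletID < 200 )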
Ensure that the constraints guarantee that there is no overlap between the key values permitted in different child tables. A common mistake is to set up range constraints like:
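CHECK ( outletID BETWEEN 100 AND 200 )
CHECK ( outletID BETWEEN 200 AND 300 )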
This is wrong since it is not clear which child table the key value 200 belongs in.
It would be better to instead create child tables as follows:
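CREATE TABLE measurement_y2006m02 (
    CHECK ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' )
) INHERITS (measurement);

CREATE TABLE measurement_y2006m03 (
    CHECK ( logdate >= DATE '2006-03-01' AND logdate < DATE '2006-04-01' )
) INHERITS (measurement);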
For each child table, create an index on the key column(s), as well as any other indexes you might want.
We want our application to be able to say INSERT INTO measurement ... and have the data be redirected into the appropriate child table. We can arrange that by attaching a suitable trigger function to the master table. If data will be added only to the latest child, we can use a very simple trigger function:
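CREATE OR REPLACE FUNCTION measurement_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
    INSERT INTO measurement_y2008m01 VALUES (NEW.*);
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;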
After creating the function, we create a trigger which calls the trigger function:
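CREATE TRIGGER insert_measurement_trigger
    BEFORE INSERT ON measurement
    FOR EACH ROW EXECUTE PROCEDURE measurement_insert_trigger();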
We must redefine the trigger function each month so that it always points to the current child table. The trigger definition does not need to be updated, however.
We might want to insert data and have the server automatically locate the child table into which the row should be added. We could do this with a more complex trigger function, for example:
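CREATE OR REPLACE FUNCTION measurement_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
    IF ( NEW.logdate >= DATE '2006-02-01' AND
         NEW.logdate < DATE '2006-03-01' ) THEN
        INSERT INTO measurement_y2006m02 VALUES (NEW.*);
    ELSIF ( NEW.logdate >= DATE '2006-03-01' AND
            NEW.logdate < DATE '2006-04-01' ) THEN
        INSERT INTO measurement_y2006m03 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'Date out of range.  Fix the measurement_insert_trigger() function!';
    END IF;
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;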
The trigger definition is the same as before. Note that each IF test must exactly match the CHECK constraint for its child table.
While this function is more complex than the single-month case, it doesn't need to be updated as often, since branches can be added in advance of being needed.
Note: In practice, it might be best to check the newest child first, if most inserts go into that child. For simplicity, we have shown the trigger's tests in the same order as in other parts of this example.
A different approach to redirecting inserts into the appropriate child table is to set up rules, instead of a trigger, on the master table. For example:
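CREATE RULE measurement_insert_y2006m02 AS
ON INSERT TO measurement WHERE
    ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' )
DO INSTEAD
    INSERT INTO measurement_y2006m02 VALUES (NEW.*);
-- one such rule per child table ...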
A rule has significantly more overhead than a trigger, but the overhead is paid once per query rather than once per row, so this method might be advantageous for bulk-insert situations. In most cases, however, the trigger method will offer better performance.
Be aware that COPY ignores rules. If you want to use COPY to insert data, you'll need to copy into the correct child table rather than directly into the master. COPY does fire triggers, so you can use it normally if you use the trigger approach.
Another disadvantage of the rule approach is that there is no simple way to force an error if the set of rules doesn't cover the insertion date; the data will silently go into the master table instead.
As we can see, a complex table hierarchy could require a substantial amount of DDL. In the above example we would be creating a new child table each month, so it might be wise to write a script that generates the required DDL automatically.
To remove old data quickly, simply drop the child table that is no longer necessary:
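DROP TABLE measurement_y2006m02;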
To remove the child table from the inheritance hierarchy table but retain access to it as a table in its own right:
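ALTER TABLE measurement_y2006m02 NO INHERIT measurement;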
To add a new child table to handle new data, create an empty child table just as the original children were created above:
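CREATE TABLE measurement_y2008m02 (
    CHECK ( logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01' )
) INHERITS (measurement);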
Alternatively, one may want to create and populate the new child table before adding it to the table hierarchy. This could allow data to be loaded, checked, and transformed before being made visible to queries on the parent table.
The following caveats apply to partitioning implemented using inheritance:
There is no automatic way to verify that all of the CHECK constraints are mutually exclusive. It is safer to create code that generates child tables and creates and/or modifies associated objects than to write each by hand.
The schemes shown here assume that the values of a row's key column(s) never change, or at least do not change enough to require it to move to another partition. An UPDATE that attempts to do that will fail because of the CHECK constraints. If you need to handle such cases, you can put suitable update triggers on the child tables, but it makes management of the structure much more complicated.
If you are using manual VACUUM or ANALYZE commands, don't forget that you need to run them on each child table individually. A command like:
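ANALYZE measurement;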
will only process the master table.
INSERT statements with ON CONFLICT clauses are unlikely to work as expected, as the ON CONFLICT action is only taken in case of unique violations on the specified target relation, not its child relations.
Triggers or rules will be needed to route rows to the desired child table, unless the application is explicitly aware of the partitioning scheme. Triggers may be complicated to write, and will be much slower than the tuple routing performed internally by declarative partitioning.
Partition pruning is a query optimization technique that improves performance for partitioned tables. As an example:
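SET enable_partition_pruning = on;                 -- the default
SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01';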
Without partition pruning, the above query would scan each of the partitions of the measurement table. With partition pruning enabled, the planner will examine the definition of each partition and prove that the partition need not be scanned because it could not contain any rows meeting the query's WHERE clause. When the planner can prove this, it excludes (prunes) the partition from the query plan.
Some of the remaining partitions might be scanned using an index scan instead of a full sequential scan, but the point here is that there is no need to scan the older partitions at all to answer this query. With partition pruning enabled, we get a significantly simpler plan that will deliver the same answer.
Note that partition pruning is driven only by the constraints defined implicitly by the partition keys, not by the presence of indexes. Therefore it isn't necessary to define indexes on the key columns. Whether an index needs to be created for a given partition depends on whether you expect that queries that scan the partition will generally scan a large part of the partition or just a small part. An index will be helpful in the latter case but not the former.
Partition pruning can be performed not only during the planning of a given query, but also during its execution. This is useful as it can allow more partitions to be pruned when clauses contain expressions whose values are not known at query planning time, for example, parameters defined in a PREPARE statement, using a value obtained from a subquery, or using a parameterized value on the inner side of a nested loop join. Partition pruning during execution can be performed at any of the following times:
During initialization of the query plan. Partition pruning can be performed here for parameter values which are known during the initialization phase of execution. Partitions which are pruned during this stage will not show up in the query's EXPLAIN
or EXPLAIN ANALYZE
. It is possible to determine the number of partitions which were removed during this phase by observing the “Subplans Removed” property in the EXPLAIN
output.
During actual execution of the query plan. Partition pruning may also be performed here to remove partitions using values which are only known during actual query execution. This includes values from subqueries and values from execution-time parameters such as those from parameterized nested loop joins. Since the value of these parameters may change many times during the execution of the query, partition pruning is performed whenever one of the execution parameters being used by partition pruning changes. Determining if partitions were pruned during this phase requires careful inspection of the loops property in the EXPLAIN ANALYZE output. Subplans corresponding to different partitions may have different values for it depending on how many times each of them was pruned during execution. Some may be shown as (never executed) if they were pruned every time.
目前僅會在 Append 和 MergeAppend 節點類型上執行 partition pruning。尚未為 ModifyTable 節點類型實作此功能,但是在將來的 PostgreSQL 版本中可能會有所改進。
Constraint exclusion is a query optimization technique similar to partition pruning. While it is primarily used for partitioning implemented using the legacy inheritance method, it can be used for other purposes, including with declarative partitioning.
Constraint exclusion works in a very similar way to partition pruning, except that it uses each table's CHECK constraints — which gives it its name — whereas partition pruning uses the table's partition bounds, which exist only in the case of declarative partitioning. Another difference is that constraint exclusion is only applied at plan time; there is no attempt to remove partitions at execution time.
The fact that constraint exclusion uses CHECK constraints, which makes it slow compared to partition pruning, can sometimes be used as an advantage: because constraints can be defined even on declaratively-partitioned tables, in addition to their internal partition bounds, constraint exclusion may be able to elide additional partitions from the query plan.
The following caveats apply to constraint exclusion:
Constraint exclusion is only applied during query planning, unlike partition pruning, which can also be applied during query execution.
Constraint exclusion only works when the query's WHERE clause contains constants (or externally supplied parameters). For example, a comparison against a non-immutable function such as CURRENT_TIMESTAMP cannot be optimized, since the planner cannot know which child table the function's value might fall into at run time.
Keep the partitioning constraints simple, else the planner may not be able to prove that child tables might not need to be visited. Use simple equality conditions for list partitioning, or simple range tests for range partitioning, as illustrated in the preceding examples. A good rule of thumb is that partitioning constraints should contain only comparisons of the partitioning column(s) to constants using B-tree-indexable operators, because only B-tree-indexable column(s) are allowed in the partition key.
All constraints on all children of the parent table are examined during constraint exclusion, so large numbers of children are likely to increase query planning time considerably. So the legacy inheritance based partitioning will work well with up to perhaps a hundred child tables; don't try to use many thousands of children.
The choice of how to partition a table should be made carefully as the performance of query planning and execution can be negatively affected by poor design.
One of the most critical design decisions will be the column or columns by which you partition your data. Often the best choice will be to partition by the column or set of columns which most commonly appear in WHERE clauses of queries being executed on the partitioned table. WHERE clause items that match and are compatible with the partition key can be used to prune unneeded partitions. However, you may be forced into making other decisions by requirements for the PRIMARY KEY or a UNIQUE constraint. Removal of unwanted data is also a factor to consider when planning your partitioning strategy. An entire partition can be detached fairly quickly, so it may be beneficial to design the partition strategy in such a way that all data to be removed at once is located in a single partition.
Choosing the target number of partitions that the table should be divided into is also a critical decision to make. Not having enough partitions may mean that indexes remain too large and that data locality remains poor which could result in low cache hit ratios. However, dividing the table into too many partitions can also cause issues. Too many partitions can mean longer query planning times and higher memory consumption during both query planning and execution. When choosing how to partition your table, it's also important to consider what changes may occur in the future. For example, if you choose to have one partition per customer and you currently have a small number of large customers, consider the implications if in several years you instead find yourself with a large number of small customers. In this case, it may be better to choose to partition by HASH and choose a reasonable number of partitions rather than trying to partition by LIST and hoping that the number of customers does not increase beyond what it is practical to partition the data by.
Sub-partitioning can be useful to further divide partitions that are expected to become larger than other partitions, although excessive sub-partitioning can easily lead to large numbers of partitions and can cause the same problems mentioned in the preceding paragraph.
It is also important to consider the overhead of partitioning during query planning and execution. The query planner is generally able to handle partition hierarchies with up to a few thousand partitions fairly well, provided that typical queries allow the query planner to prune all but a small number of partitions. Planning times become longer and memory consumption becomes higher when more partitions remain after the planner performs partition pruning. This is particularly true for the UPDATE and DELETE commands. Another reason to be concerned about having a large number of partitions is that the server's memory consumption may grow significantly over a period of time, especially if many sessions touch large numbers of partitions. That's because each partition requires its metadata to be loaded into the local memory of each session that touches it.
With data warehouse type workloads, it can make sense to use a larger number of partitions than with an OLTP type workload. Generally, in data warehouses, query planning time is less of a concern as the majority of processing time is spent during query execution. With either of these two types of workload, it is important to make the right decisions early, as re-partitioning large quantities of data can be painfully slow. Simulations of the intended workload are often beneficial for optimizing the partitioning strategy. Never assume that more partitions are better than fewer partitions and vice-versa.
將已經在資料庫中的資料做修改被稱為更新。您可以單獨更新某個資料列,或資料表中的所有資料列,或是部份資料列。每個欄位可以單獨更新,而不影響其他欄位。
要更新現有的資料列,請使用 UPDATE 指令。這需要三項資訊:
要更新的資料表和欄位的名稱
資料欄位新的內容
哪些資料列要更新
回想一下,SQL 通常不為資料列提供唯一的識別資訊,因此無法直接指定要更新「第幾列」,而是要指定資料列必須符合哪些條件才會被更新。只有當資料表具有主鍵時(無論你是否如此宣告),才能透過與主鍵比對的條件,可靠地指定單一資料列。圖形化的資料庫管理工具正是依賴這種方式,讓你能單獨更新指定的資料列。
例如,這個指令會將價格為 5 的所有產品更新為 10:
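(補回此處省略的範例指令:)
UPDATE products SET price = 10 WHERE price = 5;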
這結果可能是零個,一個或多個資料列被更新。嘗試更新卻沒有匹配到任何資料列,並不是一種錯誤。
我們來詳細看看這個命令。首先是關鍵字 UPDATE,然後是資料表的名稱。像往常一樣,資料表的名稱可以使用加上 schema 的完整路徑名稱,否則就會在搜尋路徑中尋找。接下來的關鍵字是 SET,後面接著欄位名稱,等號和新的欄位內容。新的欄位內容可以是任何的運算表示式,而不僅僅是一個常數。例如,如果要將所有產品的價格提高10%,則可以使用:
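(對應的範例指令如下:)
UPDATE products SET price = price * 1.10;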
如你所見,欄位的表示式可以引用資料列中現有的內容。我們還遺漏了 WHERE 子句。如果省略的話,則意味著資料表中的所有資料列都會被更新。如果存在的話,則只有更新符合 WHERE 條件的那些資料列。請注意,SET 子句中的等號是一個賦值運算,而 WHERE 子句中的等號是比較運算,但這不會造成任何誤解。當然,WHERE 條件不一定是等號運算。 還有許多其他的運算子可以使用(詳見第 9 章)。但是表示式需要能產生為布林運算的結果。
您可以在使用 UPDATE 指令時,以 SET 子句中列出多個欄位賦值來更新多個欄位內容。例如:
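(示意範例;mytable 及欄位 a、b、c 僅為假設的名稱:)
UPDATE mytable SET a = 5, b = 3, c = 1 WHERE a > 0;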
Schema 在台灣並沒有習慣的中文說法,所以仍使用原文,而不翻譯。
PostgreSQL 資料庫叢集(cluster)可以包含一個或多個資料庫。使用者和群組是在整個叢集的層次共用的,但資料不會在資料庫之間共用。任何連線到伺服器的用戶端,都只能存取連線時所指定的那一個資料庫。
在叢集內的使用者並不需要對每個資料庫都有使用權。使用者共用指的是它們不能有同名的情況,例如在同一個叢集內,不能有兩個使用者名稱都叫 joe。但系統可以只允許 joe 使用某些叢集內的資料庫。
一個資料庫可以包含一個或多個 schema,它會包含一些資料表。Schema 也可以包含一些資料庫物件,像是資料型別、函數、和運算子。同樣的物件名稱在不同的 schema 中是不會衝突的。舉例來說,schema1 和 myschema 都可以擁有一個叫作 mytable 的資料表。和資料庫不同, schema 並不是完全隔離的:使用者可以直接取用他們連接的資料庫中的任何 schema,只要他們擁有足夠的權限。
使用 schema 有幾個好處:
允許多個使用者存取相同資料庫,而不會互相干擾。
將資料庫物件建立邏輯上的管理層,它們會更有彈性。
第三方的應用結構可以放在不同的 schema 中,避免有撞名的情況產生。
Schema 和作業系統裡的資料夾是類似的,只是它不能使用巢狀結構。
要建立 schema,請使用 CREATE SCHEMA 指令,並給予一個自訂的名稱。例如:
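CREATE SCHEMA myschema;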
要在 schema 中建立或存取某個物件,請使用句點(.)將兩者名稱串連起來:
這個形式在任何可以使用資料表的地方都是可以的,包含資料表結構更新指令,以及在接下來章節會討論到的資料處理指令。(我們只提到資料表的部份,但相同的概念用於其他資料庫物件都是一樣的,像是資料型別和函數。)
實際上,更一般化的語法是:
也可以使用,但目前這只是形式上符合 SQL 標準;如果你寫上資料庫名稱,它必須與你目前所連線的資料庫相同。
所以,要在新的 schema 中建立一個資料表,請使用:
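CREATE TABLE myschema.mytable (
    ...
);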
要移除一個 schema,它必須要是空的,也就是所有所屬物件都已經被移除了,請使用:
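DROP SCHEMA myschema;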
但你也可以同步移除 schema 及其所屬物件,請使用:
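DROP SCHEMA myschema CASCADE;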
通常你會想要建立一個 schema 給某個使用者使用(這是一種藉由命名空間規畫來限制使用者權限的方法)。可以使用下列語法:
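(語法示意;schema_name 與 user_name 請代入實際名稱:)
CREATE SCHEMA schema_name AUTHORIZATION user_name;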
你甚至可以省略 schema 名稱,省略的話,schema 名稱會與使用者名稱相同。請參閱後續的 5.8.6 節來瞭解如何使用。
Schema 名稱以「pg_」開頭的,是系統的保留名稱,使用者不能使用這樣的名稱建立 schema。
在前面我們所建立的資料表都沒有指定 schema 名稱。預設使用的 schema 是「public」,每一個資料庫都會有這個 schema。所以,下面兩種寫法是一樣的:
以及:
完整的名稱寫法是冗長而不容易使用的,通常最好不要把一些特別的 schema 名稱寫到應用程式裡。而資料表時常是以簡要的寫法引用,也就是只寫資料表本身的名稱。資料庫系統依據搜尋路徑的規則找到該資料表。在搜尋路徑上所遇到的第一個資料表就會被使用。如果整個搜尋路徑走完都沒有符合的資料表,那麼才會回報錯誤,即使該資料表名稱有出現在資料庫裡的其他 schema 中。
第一個會被搜尋的 schema,就是目前的 schema。除此之外也用於新的資料表建立,當 CREATE TABLE 未指定 schema 名稱的話,也會依搜尋路徑的 schema 建立。
要顯示目前的搜尋路徑,請使用下面的指令:
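SHOW search_path;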
預設的情況是:
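 search_path
--------------
 "$user", public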
第一個項目指的就是和目前使用者同名的 schema 會被使用,而如果沒有同名的,它就會被忽略。第二個項目則是先前介紹過的公開 schema。第一個被找到的 schema,就會是新建物件時預設的位置,這就是為什麼預設都會被建立在公開的 schema。當某個物件在使用(資料表結構調整、資料更新、或查詢指令)時沒有註明 schema 的話,那也會使用搜尋路徑來找到符合的物件。不過,預設上只會搜尋公開的 schema。
要設定新的搜尋路徑,請使用:
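SET search_path TO myschema,public;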
(我們在這邊暫時忽略掉 $user,因為還沒有立即性的需要。)然後我們就可以試著存取資料表而不用加上 schema:
因為 myschema 在搜尋路徑裡是第一個項目,所以新的物件就會被建立在該處。
我們也可以這樣寫:
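SET search_path TO myschema;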
這樣一來,若未加上 schema 名稱,就無法再存取 public schema 了。「public」schema 並沒有什麼特別之處,除了它一開始就存在以外;它同樣可以被移除。
請參閱 9.25 節,將會介紹其他設定 schema 搜尋路徑的方式。
搜尋路徑也用於資料型別、函數、及運算子的搜尋,就如同在資料表上的行為一樣。資料型別和函數名稱完整的寫法也和資料表相同。如果你需要特別指出運算子的完整路徑的話,它比較特別,你必須這樣寫:
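OPERATOR(schema.operator)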
這是為了避免語法上的混淆。如下所示:
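(手冊中的範例如下:)
SELECT 3 OPERATOR(pg_catalog.+) 4;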
實務上我們都還是依賴路徑搜尋來使用運算子,這樣可以避免使用冗長且低可讀性的程式碼。
預設的情況,使用者無法存取任何不屬於他們的 schema 中的物件。要允許存取的話,該 schema 的擁有者必須要授予 USAGE 權限給其他使用者。要允許其他使用者使用某個 schema 中的物件,通常需要額外給予適當的權限。
使用者想要在其他使用者的 schema 中建立新物件的話,就必須要授予 CREATE 的權限。注意,預設上,所有的使用者在 public schema 中,都具備 CREATE 和 USAGE 權限。這使得所有的使用者在連線到某個資料庫之後,就可以在 public schema 上新增物件。如果你不希望這樣,你可以移除這些權限:
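REVOKE CREATE ON SCHEMA public FROM PUBLIC;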
除了 public 以及使用者自行建立的 schema 之外,每一個資料庫還有一個稱作 pg_catalog 的 schema,它包含了系統資訊的資料表和內建的資料型別、函數及運算子。pg_catalog 永遠都是搜尋路徑裡的有效項目。它沒有明確地顯示在搜尋路徑裡,但會隱含地優先搜尋,排在那些明列的搜尋項目之前。這是為了確保內建物件的名稱都能被找到。然而,如果你希望自訂的同名物件能優先被使用,可以把 pg_catalog 放在搜尋路徑的最後面。
系統資料表都以「pg_」開頭,為的就是確保不會和使用者自訂的資料表名稱發生衝突,以免將來新增的系統資料表和你現在所定義的資料表同名。(以預設的搜尋路徑來說,對你的資料表名稱的非限定引用,將會被解析成同名的系統資料表。)系統資料表會一直遵循這個命名規則,所以只要使用者避免使用「pg_」開頭的名稱,就不會產生衝突。
Schema 可以在許多方面協助你組織你的資料。有一些巧妙的樣版值得推薦,也很方便以預設的方式支援:
如果你沒有建立任何 schema 的話,那麼所有使用者就是隱含著都使用 public schema。這種情況指的是都沒有設定任何 schema,而主要推薦給在一個資料庫中,只有一個使用者的情況。這樣的樣版設定也適合之後轉換到無 schema 設計的資料庫環境。
你可以為每一個使用者建立一個同名的 schema。回想一下先前介紹的預設搜尋路徑,第一個項目就是 $user,表示該使用者的名稱。所以,每一個使用者有一個專屬的 schema,預設上,他們就只存取他們所擁有的 schema。 如果你使用這個情境樣版,你也許會需要移除 public schema 的權限,甚至直接移除它,讓使用者真正被隔離在他們自己的 schema 中。
要安裝共享的應用程式(每個人共享資料表,有一些第三方提供的延伸套件,或其他的東西。),把他們放到不同的 schema 裡,然後記得要設定好適當的存取權限。使用者可以使用完整的名稱來存取這些共享的應用程式,或把他們加入到搜尋路徑中,由使用者自己來決定。
在標準 SQL 中,在同一個 schema 中的物件,分別被不同使用者擁有,是不被允許的。然而,有一些實作系統甚至不允許使用者建立和自己不同名的 schema。事實上,schema 和使用者的概念,對於只支援基本 schema 的資料庫系統本身而言,幾乎是相同的。所以,許多使用者會認為完整名稱指的是 user_name.table_name。這也就是為什麼 PostgreSQL 建議你這樣為每一個使用者建立他們同名的 schema。
再者,在標準 SQL 裡,也沒有所謂 public schema 的概念。極致相容標準的話,你就不應該使用,或移除 public schema。
當然,也有些 SQL 資料庫並沒有實作 schema,或提供其他跨資料庫存取的命名方式。如果你需要和這些系統共同運作,要提高可攜性的方式就是不要使用任何 schema。
到目前為止,我們已經解釋瞭如何將資料新增到資料表以及如何更新資料了。剩下的就是討論如何刪除不再需要的資料。 正如新增資料時只能新增整個資料列一樣,你只能從資料表中以資料列為單位刪除資料。在前面的章節中,我們解釋了SQL沒有提供直接處理某個資料列的方法。因此,只能透過指定要刪除的行必須符合的條件來刪除指定的資料列。如果資料列中有主鍵,則可以指定確切的資料列。但是,你也可以刪除全部符合條件的資料列,更可以一次刪除資料表中的所有資料列。
您可以使用 DELETE 指令刪除資料列;其語法與 UPDATE 指令十分類似。例如,要從產品資料表中刪除價格為 10 的所有資料列,請使用:
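DELETE FROM products WHERE price = 10;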
如果你只是寫:
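DELETE FROM products;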
那麼資料表中的所有資料列都將被刪除! 請程式設計師一定要小心使用。
有時在修改資料列的操作過程中取得資料是很方便的。INSERT、UPDATE 和 DELETE 指令都有一個選擇性的RETURNING 子句來支持這個功能。使用 RETURNING 可以避免執行額外的資料庫查詢來收集資料,特別是在難以可靠地識別修改的資料列時尤其有用。
RETURNING 子句允許的語法與 SELECT 指令的輸出列表相同(詳見)。它可以包含命令目標資料表的欄位名稱,或者包含使用這些欄位的表示式。常用的簡寫形式是 RETURNING *,預設是資料表的所有欄位,且相同次序。
在 INSERT 中,可用於 RETURNING 的資料是新增的資料列。這在一般的資料新增中並不是很有用,因為它只會重複用戶端所提供的資料;但如果是計算過的預設值,就會非常方便。例如,當使用 serial 欄位提供唯一識別時,RETURNING 可以回傳分配給新資料列的 ID:
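(示意範例;假設 users 資料表的 id 欄位由序列提供預設值:)
INSERT INTO users (firstname, lastname) VALUES ('joe', 'cool') RETURNING id;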
對於 INSERT ... SELECT,RETURNING 子句也非常有用。
在 UPDATE 中,可用於 RETURNING 的資料是被修改的資料列新內容。例如:
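UPDATE products SET price = price * 1.10
  WHERE price <= 99.99
  RETURNING name, price AS new_price;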
在 DELETE 中,可用於 RETURNING 的資料是已刪除資料列的內容。例如:
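DELETE FROM products
  WHERE obsoletion_date = 'today'
  RETURNING *;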
如果目標資料表上有觸發函數的話,則可用於 RETURNING 的資料是由該觸發函數修改後的資料列。因此,回傳由觸發函數計算的欄位,是 RETURNING 另一個常見的用法。
資料表在建立的時候,並不包含任何資料。以各種方式使用資料庫之前,要做的第一件事就是新增資料。概念上,資料是一次新增一列。當然你也可以新增多列,但就沒有辦法新增少於一列。 即使只知道某些欄位的值,也必須建立一個完整的資料列。
要建立新的資料列,請使用 INSERT 指令。該指令需要資料表的名稱和各欄位的資料內容。例如,考慮先前章節中的產品資料表:
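(沿用手冊中的 products 範例:)
CREATE TABLE products (
    product_no integer,
    name text,
    price numeric
);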
新增資料列的指令可能如下所示:
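INSERT INTO products VALUES (1, 'Cheese', 9.99);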
資料內容按資料表中欄位的順序列出,以逗號分隔。通常,資料內容會是文字常數,但也允許使用運算表示式。
上面的語法有缺點,就是你需要知道資料表中欄位的順序。為了避免這種情況,您可以明確地列出欄位。例如,以下兩個命令與上面的命令具有相同的效果:
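INSERT INTO products (product_no, name, price) VALUES (1, 'Cheese', 9.99);
INSERT INTO products (name, price, product_no) VALUES ('Cheese', 9.99, 1);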
許多用戶認為總是列出欄位名稱是一個很好的習慣。
如果你並沒有所有欄位的內容,則可以省略其中一些欄位。在這種情況下,那些欄位將會以預設值代入。如下所示:
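INSERT INTO products (product_no, name) VALUES (1, 'Cheese');
INSERT INTO products VALUES (1, 'Cheese');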
第二種形式是屬於 PostgreSQL 延伸寫法。 從左邊開始的欄位填入所給定的內容,其餘的欄位則使用預設值。
為了清楚起見,你也可以明確地指定個別欄位或整個資料列都使用預設值:
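INSERT INTO products (product_no, name, price) VALUES (1, 'Cheese', DEFAULT);
INSERT INTO products DEFAULT VALUES;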
您可以在一個命令中新增多個資料列:
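INSERT INTO products (product_no, name, price) VALUES
    (1, 'Cheese', 9.99),
    (2, 'Bread', 1.99),
    (3, 'Milk', 2.99);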
也可以以查詢的結果新增(可能沒有資料,一個資料列或多個資料列):
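(示意範例;new_products 為假設的來源資料表:)
INSERT INTO products (product_no, name, price)
  SELECT product_no, name, price FROM new_products
    WHERE release_date = 'today';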
分割區本身也可以定義為分割資料表,從而形成子分割區。儘管所有分割區都必須與其分割區的父親具有相同的欄位,但是分割區可以擁有自己的索引、限制條件和預設值,與其他分割區的索引、限制條件和預設值不同。有關建立分割區表和分割區的更多詳細說明,請參閱 。
It is not possible to turn a regular table into a partitioned table or vice versa. However, it is possible to add an existing regular or partitioned table as a partition of a partitioned table, or remove a partition from a partitioned table turning it into a standalone table; this can simplify and speed up many maintenance processes. See ALTER TABLE to learn more about the ATTACH PARTITION and DETACH PARTITION sub-commands.
Partitions can also be foreign tables, although they have some limitations that normal tables do not; see for more information.
Ensure that the enable_partition_pruning configuration parameter is not disabled in postgresql.conf. If it is, queries will not be optimized as desired.
Ensure that the constraint_exclusion configuration parameter is not disabled in postgresql.conf; otherwise child tables may be accessed unnecessarily.
Indexes and foreign key constraints apply to single tables and not to their inheritance children, hence they have some caveats to be aware of.
透過使用 EXPLAIN 指令和 enable_partition_pruning 組態參數,可以顯示已修剪分割區的計劃與未修剪分割區的計劃之間的差異。對於這種類型的資料表設定,典型的未最佳化計劃是:
可以使用 enable_partition_pruning 設定來停用 partition pruning。
The default (and recommended) setting of constraint_exclusion is neither on nor off, but an intermediate setting called partition, which causes the technique to be applied only to queries that are likely to be working on inheritance partitioned tables. The on setting causes the planner to examine CHECK constraints in all queries, even simple ones that are unlikely to benefit.
這個部份的機制請參閱 ,會深入介紹移除時的問題。
前面的「public」指的是 schema,是一個物件識別器;而後面的「PUBLIC」指的是所有使用者,是一個關鍵字。所以使用不同的大小寫,可以再複習 的內容。
這包含完整 SQL 查詢機制()用於計算需要新增的資料列。
同時要新增大量資料時,請考慮使用 指令。它不像 INSERT 指令那麼靈活,但是效率更高。有關提高批次新增資料效率的更多資訊,請參閱。
倒斜線跳脫字串(Backslash Escape Sequences)

  跳脫序列                                  字元意義
  \b                                        backspace(倒退)
  \f                                        form feed(換頁)
  \n                                        newline(換行)
  \r                                        carriage return(回到行首)
  \t                                        tab(定位符號)
  \o, \oo, \ooo(o = 0 - 7)                 octal byte value(8 進位值)
  \xh, \xhh(h = 0 - 9, A - F)              hexadecimal byte value(16 進位值)
  \uxxxx, \Uxxxxxxxx(x = 0 - 9, A - F)     16 or 32-bit hexadecimal Unicode character value(16 位元或 32 位元的 16 進位萬國碼字元值)
  Operator/Element                       Associativity    Description
  .                                      left             table/column name separator
  ::                                     left             PostgreSQL-style typecast
  [ ]                                    left             array element selection
  + -                                    right            unary plus, unary minus
  ^                                      left             exponentiation
  * / %                                  left             multiplication, division, modulo
  + -                                    left             addition, subtraction
  (any other operator)                   left             all other native and user-defined operators
  BETWEEN / IN / LIKE / ILIKE / SIMILAR                   range containment, set membership, string matching
  < > = <= >= <>                                          comparison operators
  IS / ISNULL / NOTNULL                                   IS TRUE, IS FALSE, IS NULL, IS DISTINCT FROM, etc
  NOT                                    right            logical negation
  AND                                    left             logical conjunction
  OR                                     left             logical disjunction
如前一節所述,SELECT 指令中的資料表表示式透過各種可能的方式(組合資料表與 view、過濾資料列、分組等)建構出一個中介的虛擬資料表。這個資料表最終會被傳遞給資料列表(select list)處理。資料列表決定中介資料表的哪些欄位要實際輸出。
最簡單的選擇列表是*,它表示資料表表示式產生的所有欄位。否則,資料列表是逗號分隔的參數表示式列表(如第 4.2 節中所定義的)。例如,它可能是欄位名稱的列表:
欄位名稱 a、b 和 c 是 FROM 子句中資料表的欄位的實際名稱,或者是由第 7.2.1.2 節中所賦予它們的別名。資料列表中可用的命名空間與 WHERE 子句中的命名空間相同,除非是使用分組查詢,在這種情況下,它與 HAVING 子句中的相同。
如果多個資料表具有相同名稱的欄位,則還必須加上資料表的名稱,如下所示:
處理多個資料表時,查詢特定資料表的所有欄位也是可以的:
有關 table_name.* 表示法的更多信息,請參閱第 8.16.5 節。
如果在資料列表中使用任意值表示式,則概念上是它將新的虛擬欄位加到回傳的資料表中。參數表示式對每個結果資料列計算一次,將該資料列的值替換為任何欄位引用。但是資料列表中的表示式不必引用 FROM 子句的資料表表示式中的任何欄位;例如,它們可以是常數算術表示式。
資料列表中的項目可以被分配用於後續處理的名稱,例如在 ORDER BY 子句中使用或由用戶端應用程序顯示。 例如:
如果沒有使用 AS 指定輸出欄位的名稱,系統將指定一個預設的欄位名稱。對於簡單欄位的引用,就是該欄位的名稱;對於函數呼叫,就是函數的名稱;對於複雜的表示式,系統將會產生一個通用的名稱。
AS 關鍵字是選用的,但前提是新的欄位名稱不為任何PostgreSQL 關鍵字(請參閱附錄C)。為避免與關鍵字意外撞名,你可以對欄位名稱使用雙引號。例如,VALUE 是一個關鍵字,所以就不能這樣使用:
但這樣就可以了:
為了防止未來可能增加的關鍵字,建議你習慣使用 AS 或總是在欄位名稱使用雙引號。
注意這裡輸出欄位的命名與 FROM 子句中的命名不同(參閱第 7.2.1.2 節)。可以重新命名相同的欄位兩次,但在資料列表中分配的名稱是將要回傳的名稱。
DISTINCT
在處理了資料列表之後,結果資料表可以選擇性地消除重複的資料列。 DISTINCT 關鍵字在 SELECT 之後直接寫入以指定這個動作:
(如果不是 DISTINCT,而是關鍵字 ALL,可用於指定保留所有資料列的預設行為。)
顯然,如果至少有一個欄位值不同,則兩個資料列就會被認為是不同的。 在這個比較中,空值(null)被認為是相等的。
或者,使用表示式可以指定資料列如何被認為是不同的:
這裡表示式是一個任意的運算表示式,對所有資料列進行求值運算。所有表示式相等的一組資料列被認為是重複的,並且只有該組的第一個資料列會被保留在輸出中。請注意,集合中的「第一行」是不可預知的,除非查詢按足夠的欄位進行排序,以保證進到 DISTINCT 過濾器的資料列是唯一排序。(在 ORDER BY 排序後才進行 DISTINCT ON 處理。)
DISTINCT ON 子句不是SQL標準的一部分,有時被認為是不好的樣式,因為其結果有潛在的不確定性。透過在 FROM 中智慧地使用 GROUP BY 和子查詢,可以避免這種結構,但這卻往往是最方便的選擇。
LIMIT 和 OFFSET 允許你只回傳由查詢生成的一部分資料列:
如果給了一個限制的數量,那麼只有那個數目的資料列會回傳(如果查詢本身產生較少的資料列,則可能會少一些)。LIMIT ALL 與省略 LIMIT 子句相同,也如同 LIMIT 的參數為 NULL。
OFFSET 指的是在開始回傳資料列之前,先跳過指定數量的資料列。OFFSET 0 與省略 OFFSET 子句相同;OFFSET 搭配 NULL 參數也是一樣的效果。
如果同時出現 OFFSET 和 LIMIT,則在開始計算回傳的LIMIT 資料列之前,先跳過 OFFSET 數量的資料列。
使用 LIMIT 時,運用 ORDER BY 子句將結果資料列限制為唯一順序非常重要。否則,你會得到一個不可預知的查詢資料列的子集。你可能會查詢第十到第二十個資料列,但是第十到第二十個資料列是按什麼順序排列的?次序是未知的,除非你指定 ORDER BY。
查詢最佳化在產生查詢計劃時會將 LIMIT 考慮在內,所以根據你給的 LIMIT 和 OFFSET,你很可能會得到不同的計劃(產生不同的資料列順序)。因此,使用不同的 LIMIT / OFFSET 值來選擇查詢結果的不同子集將導致不一致的結果,除非使用 ORDER BY 強制執行可預測的結果排序。這不是一個錯誤;這是一種事實上的結果,即 SQL 不保證以任何特定順序傳遞查詢的結果,除非使用 ORDER BY 來約束順序。
由 OFFSET 子句跳過的資料列仍然需要在伺服器內計算。因此一個大的 OFFSET 可能是低效率的。
兩個查詢的結果可以使用集合運算(聯集、交集和差集)來組合。其語法為:
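query1 UNION [ALL] query2
query1 INTERSECT [ALL] query2
query1 EXCEPT [ALL] query2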
query1 和 query2 是到目前為止所討論過的任何查詢功能。集合運算可以巢狀使用,也可以串接,例如:
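query1 UNION query2 UNION query3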
會如下方式執行:
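(query1 UNION query2) UNION query3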
UNION 將 query2 的結果有效率地附加到 query1 的結果中(但不能保證這是實際回傳資料列的次序)。此外,除非使用了UNION ALL,否則它將以與 DISTINCT相同的方式從結果中消除重複的資料列。
INTERSECT 返回 query1 的結果和 query2 的結果中所有共同的資料列。除非使用 INTERSECT ALL,否則會刪除重複的資料列。
EXCEPT 回傳 query1 的結果中但不包含在 query2 的結果中的所有資料列。(這有時被稱為兩個查詢之間的差集。)同樣地,除非使用 EXCEPT ALL,否則重複資料列將被刪除。
為了計算兩個查詢的聯集、交集或差集,兩個查詢必須是「union compatible」,這意味著它們回傳相同數量的欄位,相應的欄位具有相容的資料型別,如 10.5 節所述。
一個資料表表示式會計算出一個資料表。資料表表示式包含一個 FROM 子句,後面可以選擇性地跟隨 WHERE、GROUP BY 和 HAVING 子句。簡單的資料表表示式只是引用磁碟上的一個資料表,即所謂的基底資料表(base table),但更複雜的表示式可以用多種方式修改或組合基底資料表。
資料表表示式中選擇性的 WHERE、GROUP BY 和 HAVING 子句,指定了一連串依序套用在 FROM 子句所衍生資料表上的轉換管道。所有這些轉換都會產生一個虛擬資料表,該資料表提供了要傳遞給資料列表的資料列,以計算查詢的輸出資料列。
FROM 子句
FROM 子句從逗號分隔的資料表參照列表中所列的一個或多個資料表,衍生出一個資料表。
資料表參照可以是資料表名稱(可能帶有 schema 名稱),或是衍生的資料表,例如子查詢、JOIN 結構,或這些的複雜組合。如果在 FROM 子句中列出了多個資料表參照,它們會被交叉聯接(cross-joined,即形成其資料列的笛卡爾積;見下文)。FROM 列表的結果是一個中介的虛擬資料表,它可以接著被 WHERE、GROUP BY 和 HAVING 子句轉換,最終成為整個資料表表示式的結果。
當資料表參照指向一個資料表繼承階層的父資料表時,該參照不只會產生該資料表的資料列,還會產生其所有後代資料表的資料列,除非在資料表名稱之前加上關鍵字 ONLY。然而,該參照只會產生出現在該資料表中的欄位,子資料表中增加的任何欄位都會被忽略。
除了在資料表名稱之前寫 ONLY,也可以在資料表名稱之後寫上 * 來明確指定包含後代資料表。由於搜尋後代資料表現在已是預設行為,實際上沒有理由再使用這個語法,但為了與舊版相容仍然支援它。
聯接資料表(joined table)是根據特定聯接型別的規則,從兩個(真實或衍生的)資料表衍生出來的資料表。可以使用 inner join、outer join 及 cross join。聯接資料表的一般語法是:
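T1 join_type T2 [ join_condition ]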
所有型別的聯接都可以鏈結或巢狀使用:T1 和 T2 本身都可以是聯接資料表。可以在 JOIN 子句周圍使用括號來控制聯接順序;若沒有括號,JOIN 子句會由左至右巢狀結合。
聯接型別
Cross join
對於來自 T1 和 T2 的資料列的每一種可能組合(即笛卡爾積),聯接資料表會包含一個由 T1 所有欄位接著 T2 所有欄位組成的資料列。如果兩個資料表分別有 N 列及 M 列,聯接資料表將有 N * M 列。
FROM T1 CROSS JOIN T2 相當於 FROM T1 INNER JOIN T2 ON TRUE(見下文),也等同於 FROM T1, T2。
注意
當出現兩個以上的資料表時,後者的等價關係並不完全成立,因為 JOIN 的結合比逗號更緊密。例如,FROM T1 CROSS JOIN T2 INNER JOIN T3 ON condition 不同於 FROM T1, T2 INNER JOIN T3 ON condition,因為 condition 在第一種情況中可以參照 T1,但在第二種情況中不行。
Qualified joins
關鍵字 INNER 及 OUTER 在所有形式中都是選擇性的。INNER 是預設值;LEFT、RIGHT 及 FULL 表示外部聯接。
聯接條件(join condition)在 ON 或 USING 子句中指定,或由關鍵字 NATURAL 隱含指定。聯接條件決定兩個來源資料表中的哪些資料列被視為「匹配」,詳見下面的說明。
限定聯接(qualified joins)的可能型別為:
INNER JOIN
對於 T1 的每一列 R1,聯接資料表中會有一列對應於 T2 中每一個與 R1 滿足聯接條件的資料列。
LEFT OUTER JOIN
首先,執行內部聯接。然後,對於 T1 中每一個與 T2 的任何資料列都不滿足聯接條件的資料列,加入一個聯接列,其 T2 的欄位填入空值。因此,對於 T1 中的每一列,聯接資料表永遠至少有一列。
RIGHT OUTER JOIN
首先,執行內部聯接。然後,對於 T2 中每一個與 T1 的任何資料列都不滿足聯接條件的資料列,加入一個聯接列,其 T1 的欄位填入空值。這是左聯接的相反:對於 T2 中的每一列,結果資料表永遠至少有一列。
FULL OUTER JOIN
首先,執行內部聯接。然後,對於 T1 中每一個與 T2 的任何資料列都不滿足聯接條件的資料列,加入一個聯接列,其 T2 的欄位填入空值。另外,對於 T2 中每一個與 T1 的任何資料列都不滿足聯接條件的資料列,也加入一個聯接列,其 T1 的欄位填入空值。
ON 子句是最通用的聯接條件:它使用與 WHERE 子句中相同的布林值表示式。如果 ON 表示式評估為真,則來自 T1 和 T2 的一對資料列即為匹配。
USING 子句是一種簡寫形式,適用於聯接兩側的欄位使用相同名稱的特定情況。它接受一個以逗號分隔的共享欄位名稱列表,並形成一個包含每對欄位相等性比較的聯接條件。例如,以 USING (a, b) 聯接 T1 和 T2 會產生聯接條件 ON T1.a = T2.a AND T1.b = T2.b。
此外,JOIN USING 的輸出會省略多餘的欄位:不需要輸出兩個匹配的欄位,因為它們的值必然相等。JOIN ON 會產生 T1 的所有欄位接著 T2 的所有欄位,而 JOIN USING 則為每一組列出的欄位(依列出的順序)產生一個輸出欄位,接著是 T1 的所有剩餘欄位,再接著 T2 的所有剩餘欄位。
最後,NATURAL 是 USING 的簡寫形式:它形成一個由出現在兩個輸入資料表中的所有共同欄位名稱組成的 USING 列表。與 USING 一樣,這些欄位在輸出資料表中僅出現一次。如果沒有共同的欄位名稱,NATURAL JOIN 的行為就像 JOIN ... ON TRUE,產生外積聯接(cross-product join)。
注意
USING 對於聯接關係中的欄位變更是相當安全的,因為只有列出的欄位會被合併。NATURAL 的風險則相當可觀,因為任一關係的 schema 變更若導致出現新的匹配欄位名稱,將會使聯接也合併該新欄位。
綜合以上所述,假設我們有資料表 t1:
和資料表 t2:
然後對於各種聯接我們得到以下結果:
以 ON 指定的聯接條件還可以包含與聯接不直接相關的條件。這對某些查詢很有用,但需要仔細考慮。例如:
請注意,將限制條件放在 WHERE 子句中會產生不同的結果:
這是因為放在 ON 子句中的限制條件會在聯接之前處理,而放在 WHERE 子句中的限制條件會在聯接之後處理。這對內部聯接無關緊要,但對外部聯接非常重要。
可以為資料表和複雜的資料表參照取一個臨時名稱,在查詢的其餘部分中用它來參照衍生的資料表。這稱為資料表別名(table alias)。
要建立資料表別名,請寫成
或者是
關鍵字 AS 是選擇性的。alias 可以是任何識別字。
資料表別名的典型應用是為長的資料表名稱取較短的識別字,以保持聯接子句的可讀性。例如:
就目前查詢而言,別名會成為資料表參照的新名稱,不允許在查詢的其他位置使用原始名稱來引用該資料表。因此,這是無效的:
資料表別名主要是為了表示上的方便,但將資料表與自身聯接時就必須使用它們,例如:
此外,如果資料表參照是子查詢,則必須使用別名(詳見 7.2.1.3 節)。
括號用於解決歧義。在以下範例中,第一個語句將別名 b 指定給 my_table 的第二個實例,但第二個語句將別名指定給聯接的結果:
資料表別名的另一種形式,是為資料表的欄位以及資料表本身都賦予臨時名稱:
如果指定的欄位別名少於實際資料表中的欄位數,則剩餘的欄位不會被重新命名。此語法對於自聯接或子查詢特別有用。
當別名被應用到 JOIN 子句的輸出時,別名會隱藏 JOIN 內的原始名稱。例如:
是有效的 SQL,但是:
是無效的;資料表別名 a 在別名 c 之外是不可見的。
指定衍生資料表的子查詢必須以括號括起來,且必須為它指定一個資料表別名(見 7.2.1.2 節)。例如:
這個例子相當於 FROM table1 AS alias_name。更有趣、無法簡化為普通聯接的情況,出現在子查詢涉及分組或彙總的時候。
子查詢也可以是 VALUES 列表:
同樣地,這裡需要資料表別名。為 VALUES 列表的欄位指定別名是選擇性的,但這是好的習慣。更多資訊請參見 7.7 節。
資料表函數是會產生一組資料列的函數,這些資料列由基本資料型別(純量型別)或複合資料型別(資料表的資料列)組成。它們在查詢的 FROM 子句中像資料表、檢視表或子查詢一樣使用。資料表函數回傳的欄位可以像資料表欄位、檢視表或子查詢一樣,包含在 SELECT、JOIN 或 WHERE 子句中。
資料表函數也可以使用 ROWS FROM 語法進行組合,將結果以並列的欄位回傳;在這種情況下,結果的資料列數以最大的函數結果為準,較小的結果會以空值填充來匹配。
如果指定了 WITH ORDINALITY 子句,會有一個額外的 bigint 型別欄位加到函數結果欄位中。這個欄位從 1 開始為函數結果集合的資料列編號。(這是 SQL 標準語法 UNNEST ... WITH ORDINALITY 的概括。)預設情況下,這個序數欄位名為 ordinality,但可以使用 AS 子句為它指定不同的欄位名稱。
特殊的資料表函數 UNNEST 可以帶任意數量的陣列參數呼叫,它會回傳對應數量的欄位,就如同分別對每個參數呼叫 UNNEST(9.19 節)並使用 ROWS FROM 結構將其組合在一起。
如果沒有指定 table_alias,函數名稱會被用作資料表名稱;在 ROWS FROM 結構的情況下,則使用第一個函數的名稱。
如果沒有提供欄位別名,則對於回傳基本資料型別的函數,欄位名稱也與函數名稱相同;對於回傳複合資料型別的函數,結果欄位會取得該型別個別屬性的名稱。
舉一些範例:
在某些情況下,定義能根據呼叫方式回傳不同欄位集合的資料表函數是很有用的。為了支援這種情況,資料表函數可以被宣告為回傳虛擬型別 record。在查詢中使用這種函數時,必須在查詢本身中指定預期的資料列結構,以便系統知道如何解析和規劃查詢。這種語法看起來像是:
沒有使用 ROWS FROM() 語法時,column_definition 列表會取代原本可以附加在 FROM 項目上的欄位別名列表;欄位定義中的名稱同時充當欄位別名。使用 ROWS FROM() 語法時,column_definition 列表可以分別附加到每個成員函數;或者如果只有一個成員函數且沒有 WITH ORDINALITY 子句,可以在 ROWS FROM() 之後寫 column_definition 列表來代替欄位別名列表。
考慮以下範例:
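(手冊中的典型用法如下;mydb 為假設的資料庫名稱:)
SELECT *
    FROM dblink('dbname=mydb', 'SELECT proname, prosrc FROM pg_proc')
      AS t1(proname name, prosrc text)
    WHERE proname LIKE 'bytea%';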
dblink 函數(dblink 模組的一部分)會執行遠端查詢。它被宣告為回傳 record,因為它可以用於任何種類的查詢。實際的欄位集合必須在呼叫它的查詢中指定,以便讓解析器知道(舉例來說)* 應該擴展成什麼。
出現在 FROM 中的子查詢前面可以加上關鍵字 LATERAL。這允許它們參照前面 FROM 項目提供的欄位。(沒有 LATERAL 的話,每個子查詢會被獨立評估,因此不能交叉參照任何其他 FROM 項目。)
出現在 FROM 中的資料表函數前面也可以加上關鍵字 LATERAL,但對函數來說這個關鍵字是選擇性的;無論如何,函數的參數都可以包含對前面 FROM 項目所提供欄位的參照。
LATERAL 項目可以出現在 FROM 列表的頂層,或在 JOIN 樹之中。在後者的情況下,位於 JOIN 右側的 LATERAL 也可以參照 JOIN 左側的任何項目。
當 FROM 項目包含 LATERAL 交叉參照時,評估過程如下:對於提供交叉參照欄位的 FROM 項目的每一列,或提供這些欄位的多個 FROM 項目的每一組資料列,LATERAL 項目都會使用該列或該組資料列的欄位值來評估。結果資料列照常與它們被運算出來的資料列聯接。對於來源資料表的每一列或每一組資料列重複此操作。
LATERAL 的一個簡單範例是:
這不是特別有用,因為它的結果與更傳統的寫法完全相同:
LATERAL 主要的用途,是在需要以交叉參照的欄位來運算要聯接的資料列的時候。典型的應用是為回傳集合的函數提供參數值。舉例來說,假設 vertices(polygon) 會回傳多邊形的頂點集合,我們可以透過以下方式找出儲存在資料表中的多邊形彼此接近的頂點:
這個查詢也可以寫成:
或者以其他幾種等效的形式表示。(如前所述,關鍵字 LATERAL 在此範例中並非必要,但為了清楚起見而使用它。)
即使 LATERAL 子查詢沒有產生任何資料列,將 LEFT JOIN 用於 LATERAL 子查詢通常也特別方便,這樣來源資料列仍會出現在結果中。舉例來說,如果 get_product_names() 會回傳某製造商生產的產品名稱,但我們資料表中的某些製造商目前沒有生產任何產品,我們可以像這樣找出它們:
WHERE 子句
WHERE 子句的語法是:
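WHERE search_condition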
其中 search_condition 是任何回傳 boolean 型別值的值表示式(參見 4.2 節)。
在完成 FROM 子句的處理之後,會針對搜尋條件檢查衍生虛擬資料表的每一列。如果條件的結果為 true,該資料列就保留在輸出資料表中;否則(即結果為 false 或 null)就會被丟棄。搜尋條件通常會參照至少一個在 FROM 子句中產生的資料表欄位;這不是必須的,但若非如此,WHERE 子句就相當沒有用處了。
注意
內部聯接的聯接條件可以寫在 WHERE 子句中,也可以寫在 JOIN 子句中。例如,這些資料表表示式是等價的:
以及:
或甚至:
要使用哪一種主要是風格問題。FROM 子句中的 JOIN 語法可能無法移植到其他 SQL 資料庫管理系統,即使它在 SQL 標準中。對外部聯接而言則別無選擇:它們必須在 FROM 子句中完成。外部聯接的 ON 或 USING 子句「不」等同於 WHERE 條件,因為它會造成資料列的增加(對於沒有匹配的輸入列)以及資料列從最終結果中移除。
以下是 WHERE 子句的一些範例:
fdt 是在 FROM 子句中衍生的資料表。不符合 WHERE 子句搜尋條件的資料列會從 fdt 中排除。請注意純量子查詢作為值表示式的使用。就像任何其他查詢一樣,子查詢可以使用複雜的資料表表示式。也請注意 fdt 在子查詢中是如何被參照的。只有當 c1 也是子查詢衍生輸入資料表中的欄位名稱時,才需要將 c1 限定為 fdt.c1;但即使不需要,限定欄位名稱也會增加清晰度。這個範例顯示了外部查詢的欄位命名範圍如何延伸到它的內部查詢中。
GROUP BY 及 HAVING 子句
在通過 WHERE 篩選之後,衍生的輸入資料表可以使用 GROUP BY 子句進行分組,並使用 HAVING 子句排除某些群組的資料列。
GROUP BY 子句用於將在所有列出的欄位上具有相同值的資料列分組在一起。欄位列出的順序無關緊要。其效果是將具有共同值的每一組資料列合併成一個群組資料列,用以代表群組中的所有資料列。這樣做是為了消除輸出中的冗餘,以及/或者計算套用於這些群組的彙總。例如:
在第二個查詢中,我們不能寫成 SELECT * FROM test1 GROUP BY x,因為欄位 y 沒有可以與每個群組相關聯的單一值。被分組的欄位可以在資料列表中參照,因為它們在每個群組中具有單一值。
一般來說,如果資料表被分組,則除了在彙總表示式中之外,不能參照沒有列在 GROUP BY 中的欄位。含彙總表示式的範例是:
在這裡,sum 是一個對整個群組計算出單一值的彙總函數。有關彙總函數的更多訊息,請參見 9.21 節。
Tip
沒有彙總表示式的分組,實際上是計算一個欄位中相異值的集合。這也可以使用 DISTINCT 子句來達成(詳見 7.3.3 節)。
這是另一個範例,它計算每個產品的總銷售額(而不是所有產品的總銷售額):
在這個範例中,欄位 product_id、p.name 及 p.price 必須出現在 GROUP BY 子句中,因為它們在查詢的資料列表中被參照(但詳見下文)。欄位 s.units 不需要出現在 GROUP BY 列表中,因為它只用在彙總表示式(sum(...))中,該表示式代表產品的銷售量。對於每個產品,查詢會回傳該產品所有銷售的摘要資料列。
如果產品資料表的設計使 product_id 是主鍵,那麼在上面的範例中,僅以 product_id 分組就足夠了,因為名稱與價格在功能上相依於產品 ID,所以對於每個產品 ID 群組要回傳哪些名稱和價格值並沒有模糊空間。
在嚴格的 SQL 中,GROUP BY 只能以來源資料表的欄位進行分組,但 PostgreSQL 將其延伸為允許 GROUP BY 以資料列表中的欄位進行分組,也允許以值表示式來代替簡單的欄位名稱進行分組。
如果資料表已經以 GROUP BY 分組,但我們只對某些群組感興趣,可以使用 HAVING 子句(很像 WHERE 子句)從結果中排除群組。語法如下:
HAVING 子句中的表示式可以同時參照已分組的表示式和未分組的表示式(後者必然涉及彙總函數)。
舉例:
再來一個更真實的範例:
在上面的範例中,WHERE 子句依據一個未分組的欄位選擇資料列(該表示式只挑出最近四週內的銷售),而 HAVING 子句則將輸出限制為總銷售額超過 5000 的群組。請注意,彙總表示式在查詢的各個部分中不一定需要相同。
如果查詢包含彙總函數呼叫但沒有 GROUP BY 子句,分組仍然會發生:結果是單一的群組資料列(或者,如果該單一資料列被 HAVING 排除,則可能沒有資料列)。同樣地,即使沒有任何彙總函數呼叫或 GROUP BY 子句,只要包含 HAVING 子句,也會發生這種情況。
GROUPING SETS、CUBE 及 ROLLUP
比前面所述更複雜的分組操作,可以使用分組集合(grouping sets)的概念來達成。由 FROM 及 WHERE 子句選出的資料,會分別依每一個指定的分組集合進行分組,每個群組的彙總計算就如同簡單的 GROUP BY 子句一樣,然後回傳所有結果。舉例來說:
GROUPING SETS 的每個子列表可以指定零個或多個欄位或表示式,其解釋方式與直接寫在 GROUP BY 子句中相同。一個空的分組集合意味著所有資料列彙總成單一群組(即使沒有任何輸入資料列,該群組也會被輸出),如同前面所述、沒有 GROUP BY 子句時彙總函數的情況。
對於未包含某個分組欄位的分組集合,對該分組欄位或表示式的參照會在結果列中以 null 值代替。要區分某個輸出列源自哪個分組集合,請參見 Table 9.59。
為了指定兩種常見型別的分組集合,提供了簡寫表示法。其中一種形式的子句為
代表給定的表示式列表和該列表的所有前綴(包括空列表);因此它相當於
這通常用於分析階層式資料,例如部門、分部和公司整體的總薪資。
另一種形式的子句為
表示給定的列表及其所有可能的子集合(即冪集合)。因此
相當於
CUBE 或 ROLLUP 子句的各個元素可以是單獨的表示式,或是以括號括起來的元素子列表。在後一種情況下,這些子列表在產生各自的分組集合時被視為單一單位。例如:
相當於
以及
相當於
CUBE 或 ROLLUP 結構可以直接用在 GROUP BY 子句中,也可以巢狀放在 GROUPING SETS 子句內。如果一個 GROUPING SETS 子句巢狀放在另一個之內,其效果與將內部子句的所有元素直接寫在外部子句中相同。
如果在單一 GROUP BY 子句中指定了多個分組項目,則最終的分組集合列表是各個項目的外積。例如:
相當於
注意
結構 (a, b) 在表示式中通常被視為資料列建構子(row constructor)。但在 GROUP BY 子句內,這不適用於表示式的頂層,(a, b) 會被解析為如上所述的表示式列表。如果基於某些理由你「需要」在分組表示式中使用資料列建構子,請使用 ROW(a, b)。
如果查詢包含任何窗函數(詳見 3.5 節、9.22 節、4.2.8 節),這些函數會在分組、彙總及 HAVING 篩選都執行完畢之後才被評估。也就是說,如果查詢使用了任何彙總、GROUP BY 或 HAVING,則窗函數看到的是分組後的資料列,而不是來自 FROM/WHERE 的原始資料列。
使用多個窗函數時,所有在其窗口定義中具有語法上等價的 PARTITION BY 及 ORDER BY 子句的窗函數,保證會在對資料的單次掃描中被評估。因此它們會看到相同的排序次序,即使 ORDER BY 並未唯一地決定次序。對於 PARTITION BY 或 ORDER BY 規格不同的函數,則沒有這樣的保證。(在這種情況下,窗函數評估的各次掃描之間通常需要排序步驟,且該排序不保證會保持其 ORDER BY 視為等價的資料列次序。)
目前,窗函數總是需要預先排序好的資料,因此查詢輸出會依照某一個窗函數的 PARTITION BY/ORDER BY 子句排列。然而,不建議依賴這一點。如果要確保結果以特定方式排序,請使用明確的頂層 ORDER BY 子句。
在查詢產生了輸出資料表(處理完資料列表)之後,可以對其資料列進行排序。如果未選擇排序,資料列將以未指定的順序回傳;此時實際的順序將取決於掃描和聯接計畫的型別以及磁碟上的資料順序,但絕不能依賴這些。只有明確選擇了排序方式,才能保證特定的輸出順序。
以 ORDER BY 子句指定排序順序:
排序表示式可以在查詢的資料列表中有效的任何表示式。 一個例子是:
當指定多個表示式時,後面的表示式用來對前面表示式值都相同的資料列進行排序。每個表示式後面可以加上選擇性的 ASC 或 DESC 關鍵字,將排序方向設定為升冪或降冪。ASC(升冪)是預設值。升冪會將較小的值排在前面,而「較小」是根據「<」運算子定義的;同樣地,降冪是由「>」運算子決定的。
NULLS FIRST 和 NULLS LAST 選項可用於確定在排序順序中是否出現空值出現在非空值之前或之後。預設情況下,空值排序大於任何非空值;也就是 NULLS FIRST 是 DESC 選項的預設值,否則就是 NULLS LAST。
請注意,排序選項是針對每個排序欄位獨立考慮的。例如 ORDER BY x, y DESC 是指 ORDER BY x ASC, y DESC,它與 ORDER BY x DESC, y DESC 不同。
排序表示式也可以是輸出欄位的欄位標籤或編號,如下所示:
兩者都按第一個輸出欄位排序。請注意,輸出欄位名稱必須獨立,也就是說,不能在表示式中使用 - 例如,這樣是不正確的:
這種限制是為了減少歧義。 即使 ORDER BY 項目是一個簡單的名字,可以匹配輸出欄位名稱或者資料表表示式中的一項,這仍然是會混淆的。在這種情況下請使用輸出欄位。如果您使用 AS 來重新命名輸出欄位以匹配其他資料表欄位的名稱,只會導致混淆。
可以將 ORDER BY 應用於 UNION、INTERSECT 或 EXCEPT 組合的結果,但在這種情況下,只允許按輸出欄位名稱或數字進行排序,而不能使用表示式進行排序。
檢索過程或從資料庫檢索資料的命令稱之為查詢。在 SQL 中,SELECT 命令用於進行條件查詢。 SELECT 指令的一般語法是:
以下各節介紹了資料列表(select list),資料表和排序規則的詳細資訊。由於 WITH 查詢是高級功能,因此最後再介紹。
一種簡單的查詢形式如下:
假設有一個名稱為 table1 的資料表,該指令會取出 table1 中的所有資料列和所有使用者定義的欄位。(取出資料的方法取決於用戶端應用程式,例如 psql 程式會在螢幕上顯示 ASCII-art 表格,而用戶端程式函式庫則會提供從查詢結果中提取個別值的功能。)資料列表中的「*」表示資料表表示式所產生的所有欄位。資料列表可以是可用欄位的子集,或使用欄位進行計算。例如,如果 table1 具有名為 a、b 和 c(也許還有其他)的欄位,則可以進行以下查詢:
(假設 b 和 c 是數字型別)。更多細節詳見 7.3 節。
FROM table1是一種簡單的資料表表示式:它只讀取一個資料表。一般來說,資料表表示式可以是一般的資料表,交叉查詢和子查詢的複雜結構。但是,你也可以完全省略資料表表示式,並使用 SELECT 指令作為計算機:
使用資料列表中的表達式產生變動的結果,是更為常用的方式。例如,你可以這樣呼叫一個函數:
WITH provides a way to write auxiliary statements for use in a larger query. These statements, which are often referred to as Common Table Expressions or CTEs, can be thought of as defining temporary tables that exist just for one query. Each auxiliary statement in a WITH clause can be a SELECT, INSERT, UPDATE, or DELETE; and the WITH clause itself is attached to a primary statement that can also be a SELECT, INSERT, UPDATE, or DELETE.
SELECT in WITH
The basic value of SELECT in WITH is to break down complicated queries into simpler parts. An example is:
which displays per-product sales totals in only the top sales regions. The WITH clause defines two auxiliary statements named regional_sales and top_regions, where the output of regional_sales is used in top_regions and the output of top_regions is used in the primary SELECT query. This example could have been written without WITH, but we'd have needed two levels of nested sub-SELECTs. It's a bit easier to follow this way.
The optional RECURSIVE modifier changes WITH from a mere syntactic convenience into a feature that accomplishes things not otherwise possible in standard SQL. Using RECURSIVE, a WITH query can refer to its own output. A very simple example is this query to sum the integers from 1 through 100:
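(The standard example reads:)
WITH RECURSIVE t(n) AS (
    VALUES (1)
  UNION ALL
    SELECT n+1 FROM t WHERE n < 100
)
SELECT sum(n) FROM t;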
The general form of a recursive WITH query is always a non-recursive term, then UNION (or UNION ALL), then a recursive term, where only the recursive term can contain a reference to the query's own output. Such a query is executed as follows:
Recursive Query Evaluation
Evaluate the non-recursive term. For UNION (but not UNION ALL), discard duplicate rows. Include all remaining rows in the result of the recursive query, and also place them in a temporary working table.
So long as the working table is not empty, repeat these steps:
Evaluate the recursive term, substituting the current contents of the working table for the recursive self-reference. For UNION (but not UNION ALL), discard duplicate rows and rows that duplicate any previous result row. Include all remaining rows in the result of the recursive query, and also place them in a temporary intermediate table.
Replace the contents of the working table with the contents of the intermediate table, then empty the intermediate table.
Strictly speaking, this process is iteration not recursion, but RECURSIVE is the terminology chosen by the SQL standards committee.
In the example above, the working table has just a single row in each step, and it takes on the values from 1 through 100 in successive steps. In the 100th step, there is no output because of the WHERE clause, and so the query terminates.
Recursive queries are typically used to deal with hierarchical or tree-structured data. A useful example is this query to find all the direct and indirect sub-parts of a product, given only a table that shows immediate inclusions:
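(A sketch, assuming a parts table with part, sub_part, and quantity columns as in the manual's example:)
WITH RECURSIVE included_parts(sub_part, part, quantity) AS (
    SELECT sub_part, part, quantity FROM parts WHERE part = 'our_product'
  UNION ALL
    SELECT p.sub_part, p.part, p.quantity
    FROM included_parts pr, parts p
    WHERE p.part = pr.sub_part
)
SELECT sub_part, SUM(quantity) AS total_quantity
FROM included_parts
GROUP BY sub_part;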
When working with recursive queries it is important to be sure that the recursive part of the query will eventually return no tuples, or else the query will loop indefinitely. Sometimes, using UNION instead of UNION ALL can accomplish this by discarding rows that duplicate previous output rows. However, often a cycle does not involve output rows that are completely duplicate: it may be necessary to check just one or a few fields to see if the same point has been reached before. The standard method for handling such situations is to compute an array of the already-visited values. For example, consider the following query that searches a table graph using a link field:
This query will loop if the link relationships contain cycles. Because we require a “depth” output, just changing UNION ALL to UNION would not eliminate the looping. Instead we need to recognize whether we have reached the same row again while following a particular path of links. We add two columns path and cycle to the loop-prone query:
Aside from preventing cycles, the array value is often useful in its own right as representing the “path” taken to reach any particular row.
In the general case where more than one field needs to be checked to recognize a cycle, use an array of rows. For example, if we needed to compare fields f1 and f2:
Omit the ROW() syntax in the common case where only one field needs to be checked to recognize a cycle. This allows a simple array rather than a composite-type array to be used, gaining efficiency.
The recursive query evaluation algorithm produces its output in breadth-first search order. You can display the results in depth-first search order by making the outer query ORDER BY a “path” column constructed in this way.
A helpful trick for testing queries when you are not certain if they might loop is to place a LIMIT in the parent query. For example, this query would loop forever without the LIMIT:
This works because PostgreSQL's implementation evaluates only as many rows of a WITH query as are actually fetched by the parent query. Using this trick in production is not recommended, because other systems might work differently. Also, it usually won't work if you make the outer query sort the recursive query's results or join them to some other table, because in such cases the outer query will usually try to fetch all of the WITH query's output anyway.
A useful property of WITH queries is that they are normally evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH queries. Thus, expensive calculations that are needed in multiple places can be placed within a WITH query to avoid redundant work. Another possible application is to prevent unwanted multiple evaluations of functions with side-effects. However, the other side of this coin is that the optimizer is not able to push restrictions from the parent query down into a multiply-referenced WITH query, since that might affect all uses of the WITH query's output when it should affect only one. The multiply-referenced WITH query will be evaluated as written, without suppression of rows that the parent query might discard afterwards. (But, as mentioned above, evaluation might stop early if the reference(s) to the query demand only a limited number of rows.)
但是,如果 WITH 查詢是非遞迴且不會在執行中變動的(即它是一個不包含 volatile 函數的 SELECT),則可以將其合併到父查詢之中,從而可以對兩個查詢等級進行聯合語法最佳化。預設情況下,如果父查詢僅引用一次 WITH 語句,而不是多次引用 WITH 一次查詢,則會觸發這個機制。您可以透過指定 MATERIALIZED 強制執行 WITH 查詢的單獨計算,或者透過指定 NOT MATERIALIZED 強制執行將其合併到父查詢中來覆蓋該查詢計畫。後面一種選擇可能會冒著重複計算 WITH 查詢的風險,但如果 WITH 查詢的每次使用只需要 WITH 查詢全部輸出的一小部分,那麼它仍然可以節省成本。
A simple example of these rules is:
This WITH query will be folded, producing the same execution plan as:
In particular, if there's an index on key, it will probably be used to fetch just the rows having key = 123. On the other hand, in:
the WITH query will be materialized, producing a temporary copy of big_table that is then joined with itself — without benefit of any index. This query will be executed much more efficiently if written as:
so that the parent query's restrictions can be applied directly to scans of big_table.
An example where NOT MATERIALIZED could be undesirable is:
Here, materialization of the WITH query ensures that very_expensive_function is evaluated only once per table row, not twice.
The examples above only show WITH being used with SELECT, but it can be attached in the same way to INSERT, UPDATE, or DELETE. In each case it effectively provides temporary table(s) that can be referred to in the main command.
Data-Modifying Statements in WITH
You can use data-modifying statements (INSERT, UPDATE, or DELETE) in WITH. This allows you to perform several different operations in the same query. An example is:
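(The manual's example, assuming products has a "date" column and a products_log table exists:)
WITH moved_rows AS (
    DELETE FROM products
    WHERE
        "date" >= '2010-10-01' AND
        "date" < '2010-11-01'
    RETURNING *
)
INSERT INTO products_log
SELECT * FROM moved_rows;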
This query effectively moves rows from products to products_log. The DELETE in WITH deletes the specified rows from products, returning their contents by means of its RETURNING clause; and then the primary query reads that output and inserts it into products_log.
A fine point of the above example is that the WITH clause is attached to the INSERT, not the sub-SELECT within the INSERT. This is necessary because data-modifying statements are only allowed in WITH clauses that are attached to the top-level statement. However, normal WITH visibility rules apply, so it is possible to refer to the WITH statement's output from the sub-SELECT.
Data-modifying statements in WITH usually have RETURNING clauses (see Section 6.4), as shown in the example above. It is the output of the RETURNING clause, not the target table of the data-modifying statement, that forms the temporary table that can be referred to by the rest of the query. If a data-modifying statement in WITH lacks a RETURNING clause, then it forms no temporary table and cannot be referred to in the rest of the query. Such a statement will be executed nonetheless. A not-particularly-useful example is:
This example would remove all rows from tables foo and bar. The number of affected rows reported to the client would only include rows removed from bar.
Recursive self-references in data-modifying statements are not allowed. In some cases it is possible to work around this limitation by referring to the output of a recursive WITH, for example:
This query would remove all direct and indirect subparts of a product.
Data-modifying statements in WITH are executed exactly once, and always to completion, independently of whether the primary query reads all (or indeed any) of their output. Notice that this is different from the rule for SELECT in WITH: as stated in the previous section, execution of a SELECT is carried only as far as the primary query demands its output.
The sub-statements in WITH are executed concurrently with each other and with the main query. Therefore, when using data-modifying statements in WITH, the order in which the specified updates actually happen is unpredictable. All the statements are executed with the same snapshot (see Chapter 13), so they cannot “see” one another's effects on the target tables. This alleviates the effects of the unpredictability of the actual order of row updates, and means that RETURNING data is the only way to communicate changes between different WITH sub-statements and the main query. An example of this is that in:
the outer SELECT would return the original prices before the action of the UPDATE, while in:
the outer SELECT would return the updated data.
Trying to update the same row twice in a single statement is not supported. Only one of the modifications takes place, but it is not easy (and sometimes not possible) to reliably predict which one. This also applies to deleting a row that was already updated in the same statement: only the update is performed. Therefore you should generally avoid trying to modify a single row twice in a single statement. In particular avoid writing WITH
sub-statements that could affect the same rows changed by the main statement or a sibling sub-statement. The effects of such a statement will not be predictable.
At present, any table used as the target of a data-modifying statement in WITH must not have a conditional rule, nor an ALSO rule, nor an INSTEAD rule that expands to multiple statements.
VALUES 提供了一種產生「靜態資料表」的方法,可以在查詢中使用,而不必實際創建和寫入磁碟上的資料表。其語法是
每個括號內的表示式列表在資料表中生成一個資料列。列表必須具有相同數量的元素(即資料表中的欄位數),並且每個列表中的對應條目必須具有兼容的資料型別。 分配給結果中每個欄位的實際資料型別,使用與 UNION 相同的規則來給定(請參閱第 10.5 節)。
如下範例所示:
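VALUES (1, 'one'), (2, 'two'), (3, 'three');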
將回傳一個兩個欄位三個資料列的資料表。這實際上相當於:
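SELECT 1 AS column1, 'one' AS column2
UNION ALL
SELECT 2, 'two'
UNION ALL
SELECT 3, 'three';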
預設情況下,PostgreSQL 會將名稱 column1、column2 等分配給 VALUES 資料表的欄位。欄位名稱並不是由 SQL 標準規定的,不同的資料庫系統會以不同的方式賦予,所以通常以資料表別名列表覆寫預設名稱會比較好,如下所示:
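SELECT * FROM (VALUES (1, 'one'), (2, 'two'), (3, 'three')) AS t (num, letter);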
在語法上,VALUES 接在表示式列表之後被視為等同於:
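SELECT select_list FROM table_expression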
並可以出現在任何一個 SELECT 可以使用的地方。例如,你可以將其用作為 UNION 的一部分,或者為其增加排序規則(ORDER BY、LIMIT 和 OFFSET)。在 INSERT 命令中,VALUES 最常來作為資料源,其次最常在子查詢。
關於更多訊息,請參閱 VALUES。
PostgreSQL 支援標準 SQL 的布林型別,如 Table 8.19 所示。布林型別有幾種狀態:「true」、「false」,以及第三種狀態「unknown」,以 SQL 的 null 值表示。
以下的字詞都可以代表 "true" 狀態:
"false" 狀態則可以用以下的字詞表示:
開頭和結尾的空白都會被忽略,也不分大小寫。 為了符合 SQL 用法,建議使用關鍵字 "TRUE" 和 "FALSE"。
Example 8.2 使用字母 t 和 f 來顯示布林型別的輸出。
bytea 資料型別允許儲存位元組字串;詳見 Table 8.6。
位元組字串是位元組的序列。位元組字串以兩種方式與字串區分開來。首先,位元組字串特別允許儲存零值的位元組和其他「不可列印」位元組(通常是在 32 到 126 範圍之外的位元組)。字串不允許全為零位元組,並且還禁止資料庫選擇無效的字元集編碼序列。其次,對位元組字串的操作處理實際的位元組,而字串的處理取決於區域設定。簡而言之,位元組字串適合於儲存程式設計師認為是「raw bytes」的資料,而字串適合於儲存文字。
bytea 型別支援兩種輸入和輸出的外部格式:PostgreSQL 既有的「escape」格式和「十六進位」格式,輸入時始終接受這兩個。輸出格式取決於組態參數 bytea_output;預設值為十六進位。(注意,在 PostgreSQL 9.0 中引入了十六進位格式;早期版本和一些工具並無法解譯它。)
SQL 標準定義了一種不同的位元組字串型別,稱為 BLOB 或 BINARY LARGE OBJECT。輸入格式與 bytea 不同,但提供的函數和運算子大致相同。
bytea 十六進位格式
「十六進位」格式將二進位資料編碼為每個位元組 2 個十六進位數字,最高有效位數在前。整個字串前面會加上序列 \x(以便與轉譯格式區別)。在某些情況下,開頭的倒斜線可能需要加倍來轉譯,與轉譯格式中倒斜線必須加倍的情況相同;細節如下。十六進位數字可以是大寫或小寫,並且在數字對之間允許空白(但不能在數字對之內,也不能在開頭的 \x 序列中)。十六進位格式與各種外部應用程式和協定相容,且轉換速度往往比轉譯格式更快,因此偏好使用它。
例如:
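SELECT E'\\xDEADBEEF';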
bytea 轉譯(escape)格式
「轉譯」格式是 bytea 型別的傳統 PostgreSQL 格式。它採用將位元組字串表示為 ASCII 字元序列的方法,同時將無法表示為 ASCII 字元的位元組轉換為特殊的轉譯序列。如果從應用程式的角度來看,將位元組表示為字元是有意義的,那麼這種表示法可以很方便。但實際上它通常令人困惑,因為它模糊了位元組字串和字串之間的區別,而且所選擇的特定轉譯機制也有點笨拙。因此,大多數新的應用程式應該避免使用此格式。
以轉譯格式輸入 bytea 值時,必須轉譯某些值的位元組,也同時可以轉譯所有位元組值。通常,要轉譯位元組,請將其轉換為三位數的八進位值,並在其前面加一個倒斜線(或兩個倒斜線,如果要使用轉譯字串語法將值寫為文字的話)。倒斜線本身(位元組 92)也可以用雙倒斜線表示。Table 8.7 列出了必須轉譯的字元,並在適合的情況下提供了備用轉譯序列。
bytea 文字轉譯位元組(Literal Escaped Octets)
轉譯不可列印位元組的要求因區域設定而異。在某些情況下,可以不轉譯而直接使用它們。請注意,即使看起來有時多於一個字元,Table 8.7 中每個範例的結果也只有一個位元組。
如 Table 8.7 所示,需要多個倒斜線的原因是,作為字串文字的輸入必須通過 PostgreSQL 伺服器中的兩個解析階段。每組的第一個倒斜線會被字串文字解析器解讀為轉譯字元(假設使用了轉譯字串語法)並因此被消耗,留下該組的第二個倒斜線。(錢字號引用的字串可用於避免此轉譯程序。)然後,bytea 輸入函數將剩餘的倒斜線識別為三位數八進位值的開頭,或是另一個倒斜線的轉譯。例如,在通過轉譯字串解析器後,以 E'\\001' 傳遞給伺服器的字串文字會變為 \001。然後 \001 被送到 bytea 輸入函數,在該函數中將其轉換為十進位值為 1 的單一位元組。請注意,單引號字元不受 bytea 特殊處理,它遵循字串文字的一般規則。(另詳見第 4.1.2.1 節。)
bytea 位元組有時在輸出時被轉義。通常,每個「不可列印」的位元組都會轉換為等效的三位數八進位值,並以一個倒斜線開頭。大多數「可列印」位元組由它們在用戶端字元集中的標準來表示。十進位值為 92(倒斜線)的位元組在輸出中會加倍。詳情見 Table 8.8。
bytea 輸出轉譯位元組(Output Escaped Octets)
根據您使用的 PostgreSQL 前端,在轉譯和還原 bytea 字串方面可能還有其他工作要做。例如,如果您的介面會自動轉換換行符號和回行首符號,您可能也必須轉譯它們。
貨幣型別儲存具有固定小數精確度的貨幣數量;詳見表 8.3。 小數精確度視資料庫的 lc_monetary 設定而定。表中顯示的範圍假設有兩個小數位。有許多可以接受的格式,包括整數和浮點數字,以及典型的貨幣格式,例如如「$1,000.00」。 輸出時通常採用後者的形式,但取決於語言環境(locale)。
Table 8.3. Monetary Types
由於此資料型別的輸出是與區域設定有關的,因此可能無法將貨幣資料載入到不同 lc_monetary 設定的資料庫中。為避免出現問題,在將轉換恢復到新的資料庫之前,請確保 lc_monetary 與轉換的資料庫中的設定值相容。
numberic、int 和 bigint 資料型別的值可以轉換為 money。從 real 和 double precision 資料型別轉換會先轉為 numeric 來完成,例如:
但是,並不推薦這樣做。由於四捨五入誤差的可能性,不應該使用浮點數來處理貨幣。
money 型別的數值可以轉換為 numeric 而不會損失精確度。轉換為其他型別可能會失去精確性,而且還必須分兩步驟完成:
當貨幣數值除以另一貨幣數值時,結果會是 double precision(即純數,而不是貨幣);貨幣單位會相互抵消。
數字型別包括 2 位元組、4 位元組和 8 位元組的整數,4 位元組和 8 位元組的浮點數,以及可調式精確度的小數。Table 8.2 列出了可用的型別。
4.1.2 節描述了數字型別常數的語法。 數字型別有一整套相應的算術運算元和函數。有關更多訊息,請參閱第 9 章。 以下各節將詳細介紹這些型別。
smallint、integer 和 bigint 型別儲存整數,即不包含小數部分的各種範圍的數字。嘗試儲存在允許的範圍之外的數值將會導致錯誤。
「integer」型別是最常見的選擇,因為它提供了數值範圍、儲存空間及效能之間的最佳平衡。「smallint」型別通常只在磁碟空間珍貴的情況下使用。「bigint」型別則是設計用於 integer 型別的範圍不足時。
SQL僅指定整數型別 integer(或 int)、smallint 和 bigint。 型別名稱 int2、int4 和 int8 則是延伸型別,也有一些其他 SQL 資料庫系統使用。
數字型別可以儲存很多位數的數字。特別建議使用在要求正確性的地方,像是儲存貨幣金額或其他數量。使用數值的計算在可能需要的情況下得到確切的結果,例如 加法、減法、乘法。但是,與整數型別或下一節中介紹的浮點型別相比,對數值的計算速度非常緩慢。
我們使用下面的術語:數字的「scale」是小數點右邊的小數部分,也就是小數的位數。數字的「precision」是整數中有效位數的總數,即小數點兩邊的位數總合。所以 23.5141 的 precision 是 6,scale 是 4。整數可以被認為是 scale 為 0。
可以配置數字欄位的最大 precision 和最大 scale。要宣告數字型別的欄位,請使用以下語法:
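NUMERIC(precision, scale)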
precision 必須是正值,scale 為零或正值。或是:
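NUMERIC(precision)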
選擇 0 為 scale。這樣使用:
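NUMERIC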
沒有任何 precision 或 scale 的話,就會建立一個欄位,其中可以儲存任何 precision 和 scale 的數字值,直到達到 precision 的實作極限。這種型別的欄位不會將輸入值強制轉為任何特定的 scale,而宣告了 scale 的數字欄位則會將輸入值強制為該 scale。(SQL 標準要求預設 scale 為 0,也就是強制為整數精確度;我們認為這樣做有點無用。如果你擔心可移植性,請務必明確指定 precision 和 scale。)
注意在型別宣告中明確指定時允許的最大 precision 為 1000;沒有指定 precision 的NUMERIC 為 Table 8.2 中所述的限制。
如果要儲存的值的小數位數大於欄位所宣告的 scale,系統會將值四捨五入到宣告所指定的小數位數。然後,如果小數點左邊的位數超過宣告的 precision 減去宣告的 scale,則會產生錯誤。
數值的實體儲存不會有任何額外的前導或補零位數。因此,欄位宣告的 precision 和 scale 是最大值,而不是固定的配置。(在這個意義上,numeric 型別更像是 varchar(n) 而不是 char(n)。)實際的儲存需求是每四個十進位數字組兩個位元組,再加上三到八個位元組的額外負擔。
除了普通的數值之外,數字型別還允許特殊值 NaN,意思是「不是一個數字」。 NaN 的任何操作都會產生另一個 NaN。在 SQL 指令中將此值作為常數寫入時,必須在其中使用單引號,例如 UPDATE table SET x = 'NaN'。 在輸入時,字串 NaN 識別是不區分大小寫的。
注意「非數字」的概念在大多數實作中,NaN 不被視為等於任何其他數值(包括 NaN)。為了允許數值在樹狀索引中排序和使用,PostgreSQL 將 NaN 值視為相等或大於所有的非 NaN 值。
decimal 和 numeric 的型別是相同的。 這兩種型別都是 SQL 標準的一部分。
當需要四捨五入時,數字型別會往離零較遠的值調整,而(在大多數機器上)實數和雙精度型別會調整到最接近的偶數。 例如:
資料型別中 real 和 double 是非精確的、可變精確度的數字型別。在實務上,這些型別通常是針對二進制浮點數運算(分別為單精度和雙精度)的IEEE 754標準的實作,需要底層的中央處理器、作業系統和編譯器支持。
非精確意味著某些值不能完全轉換為內部格式,並以近似值儲存,因此儲存和檢索值可能會表現出輕微的差異。管理這些誤差以及它們如何計算傳遞是數學和計算機科學分支的主題,除了以下幾點之外,這裡不再討論:
如果你需要精確的儲存和計算(例如貨幣金額),請改為使用 numeric 型別。
如果你想對這些型別做任何重要的複雜計算,特別是如果你依賴邊界情況下的某些行為(極大極小值或超過上下限),你應該仔細評估實作方式。
比較兩個相等的浮點數值可能並不總是按預期中直覺的方式運作。
在大多數平台上,real 型別的範圍至少為 1E-37 至 1E + 37,精確度至少為 6 位數十進制數字。double 型別的範圍通常在 1E-307 至 1E + 308 之間,精確度至少為 15 位數。數值太大或太小都會導致錯誤。如果輸入數字的精確度太高,四捨五入的情況則可能會發生。數字太接近於零,卻不能表示為零的話,將導致 underflow 超過下限的錯誤。
注意extra_float_digits 參數設定控制浮點數轉換為文字輸出時所包含的額外有效位數。使用預設值 0 時,PostgreSQL 支援的每個平台上的輸出都是相同的。增加它的話,能更精確地輸出儲存值,但可能在不同平台間是不同的結果。
除了普通的數值之外,浮點型別還有幾個特殊的值:
Infinity
-Infinity
NaN
這些分別代表 IEEE 754 特殊值「無限大」、「負無限大」和「非數字」。(在浮點數計算不符合 IEEE 754 標準的機器上,這些值可能無法如期運作。)在 SQL 指令中將這些值作為常數寫入時,必須在其放入單引號中,例如 UPDATE table SET x = '-Infinity'。 在輸入時,這些字串識別是不區分大小寫的。
注意IEEE 754 規定 NaN 不應與任何其他浮點數值(包括NaN)相等。為了允許浮點值在樹狀索引中排序和使用,PostgreSQL 將 NaN 視為相等或大於所有非 NaN 的數值。
PostgreSQL 也支援 SQL 標準的 float 和 float(p) 來表示非精確的數字型別。這裡,p 指的是二進位數字的最小可接受的精確度。PostgreSQL 接受 float(1) 到 float(24) 選擇視為 real 型別,而 float(25) 到 float(53) 則視為 double。p 超出允許範圍的話會產生錯誤。沒有指定精確度的浮點數意味著 double。
注意假設 real 和 double 的尾數分別為 24 位和 53 位,以 IEEE 標準浮點數實作而言是正確的。在非 IEEE 平台上,它可能會有一些小問題,但為了簡單起見,最好在所有平台上都使用相同的 p 範圍。
注意本節介紹的是 PostgreSQL 專屬建立自動增量(auto-incrementing)欄位的方式。另一種方式是使用 CREATE TABLE 中描述的 SQL 標準識別欄位功能。
資料型別 smallserial、serial 和 bigserial 都不是真正的型別,而僅僅是建立唯一識別欄位(類似於某些其他資料庫所支援的 AUTO_INCREMENT 屬性)的方便型別語法。以目前的實作方式,請使用:
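(示意:)
CREATE TABLE tablename (
    colname SERIAL
);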
相當於以下的指令:
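(大致等價的寫法;tablename 與 colname 請代入實際名稱:)
CREATE SEQUENCE tablename_colname_seq AS integer;
CREATE TABLE tablename (
    colname integer NOT NULL DEFAULT nextval('tablename_colname_seq')
);
ALTER SEQUENCE tablename_colname_seq OWNED BY tablename.colname;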
因此,我們建立了一個整數欄位,並將其預設值設定為序列數字產生器。使用 NOT NULL 限制條件來確保無法插入空值。(在大多數情況下,你還需要附加一個 UNIQUE 或 PRIMARY KEY 限制條件來防止偶然插入重複值,但這不是自動的。) 最後,這個序列被標記為「owned by」欄位,以便在欄位或資料表被刪除時一併被刪除。
注意smallserial、serial 和 bigserial,被實作來實現序列數字,即使沒有資料列被刪除,在欄位中出現的值在序列中仍可能會有「漏洞」或缺口。即使包含該值的資料列從未成功插入資料表中,從序列中分配的值仍然會用完。例如,如果資料插入的交易回溯了,則可能發生這種情況。有關詳細訊息,請參閱第 9.16 節中的 nextval()。
要將序列的下一個值插入到序列欄位中,請指定序列欄位應被分配其預設值。這可以透過從 INSERT 語句中欄位列表中排除欄位或使用DEFAULT關鍵字來完成。
型別名稱 serial 和 serial4 是等價的:兩者都建立 integer 欄位。型別名稱 bigserial 和 serial8 的作用方式相同,差別是它們建立 bigint 欄位。如果你預期資料表在整個生命週期中會使用超過 2^31 個識別碼,則應使用 bigserial。型別名稱 smallserial 和 serial2 的作用也相同,差別是它們建立 smallint 欄位。
當擁有的欄位被刪除時,為序列欄位創建的序列也將自動刪除。但你可以刪除序列而不刪除欄位,這會強制刪除欄位的預設表示式。
Table 8.4. Character Types
Table 8.4 列出了 PostgreSQL 中可用的通用字串型別。
SQL 定義了兩種主要字串型別:character varying(n) 和 character(n),其中 n 是正整數。這兩種型別都可以儲存長度最多為 n 個字元(不是位元組)的字串。嘗試將較長的字串儲存到這些型別的欄位中將産生錯誤,除非多餘的字元都是空格,在這種情況下,字串將被截斷為最大長度。(這個有點奇怪的異常是 SQL 標準所要求的。)如果要儲存的字串比宣告的長度短,則 character 型別的值將被空格填充;character varying 的值將只儲存較短的字串。
如果明確地將值轉換為 character varying(n) 或 character(n),則超長值將被截斷為 n 個字元而不會引發錯誤。(這也是 SQL 標準所要求的。)
型別 varchar(n) 和 char(n) 分別是 character varying(n) 和 character(n) 的別名。沒有長度的 character 等同於 character(1)。如果在沒有長度的情況下使用 character varying,則該型別接受任何長度的字串。後者是 PostgreSQL 延伸功能。
另外,PostgreSQL 提供了 text 型別,它儲存任意長度的字串。雖然型別 text 不在 SQL 標準中,但是其他幾個 SQL 資料庫管理系統也支援它。
character 的值會以空格填充到指定的長度 n,並以這種方式儲存和顯示。但是,在比較兩個 character 值時,尾隨空格在語義上無關緊要,會被忽略。在空格有意義的排序規則中,這種行為會產生意想不到的結果;例如 SELECT 'a '::CHAR(2) collate "C" < E'a\n'::CHAR(2) 會回傳 true,即使 C 語言環境會認為空格大於換行字元。將 character 值轉換為其他字串型別之一時,尾隨的空格會被移除。請注意,尾隨空格在 character varying 和 text 型別中具有語義上的重要性,尤其在使用樣式匹配(即 LIKE 和正規表示式)時。
短字串(126 個位元組以下)的儲存要求是 1 個位元組加上實際字串,其中包括字串空間填充。較長的字串有 4 個位元組的開銷而不是 1。長字串由系統自動壓縮,因此磁碟上的物理需求可能更少。非常長的值也儲存在後台的資料表中,這樣它們就不會干擾對較短欄位的快速存取。在任何情況下,可儲存的最長字串大約為 1 GB。(資料型別宣告中 n 允許的最大值小於此值。更改此值沒有用,因為使用多位元組字串編碼時,位元組數和字元數可能完全不同。如果您希望儲存沒有特定上限的長字串,使用不帶長度的 text 或 character varying,而不是隨便設定長度限制。)
這三種型別之間並沒有效能差異,除了使用空白填充類型時增加的儲存空間之外,以及一些額外的 CPU 週期來檢查儲存長度與欄位中的長度。雖然 character(n) 在其他一些資料庫系統中具有效能優勢,但 PostgreSQL 中並沒有這樣的優勢;事實上,由於額外的儲存成本,character(n) 通常是三者中最慢的。在大多數情況下,應使用 text 或 character varying。
有關字串文字語法的資訊,請參閱第 4.1.2.1 節;有關可用運算子和函數的資訊,請參閱第 9 章。資料庫字元集決定用於儲存文字的字元集;有關字元集支援的更多訊息,請參閱第 23.3 節。
Example 8.1. Using the Character Types
PostgreSQL 中還有另外兩種固定長度的字串型別,如 Table 8.5 所示。name 型別僅用於在內部系統目錄中儲存指標,並非供一般使用者使用。它的長度目前定義為 64 個位元組(63 個可用字元加結尾符號),但應視 C 原始碼中的常數 NAMEDATALEN 而定。長度在編譯時設定(因此可以根據特殊用途進行調整); 預設的最大長度可能會在將來的版本中變更。型別「“char”」(注意雙引號)與 char(1) 的不同之處在於它僅使用一個位元組的儲存空間。它在系統目錄中作為簡單內部使用的列舉型別。
Table 8.5. Special Character Types
Enumerated (enum) types are data types that comprise a static, ordered set of values. They are equivalent to the enum types supported in a number of programming languages. An example of an enum type might be the days of the week, or a set of status values for a piece of data.
Enum types are created using the CREATE TYPE command, for example:
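CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');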
Once created, the enum type can be used in table and function definitions much like any other type:
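(The manual's example:)
CREATE TABLE person (
    name text,
    current_mood mood
);
INSERT INTO person VALUES ('Moe', 'happy');
SELECT * FROM person WHERE current_mood = 'happy';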
The ordering of the values in an enum type is the order in which the values were listed when the type was created. All standard comparison operators and related aggregate functions are supported for enums. For example:
Each enumerated data type is separate and cannot be compared with other enumerated types. See this example:
If you really need to do something like that, you can either write a custom operator or add explicit casts to your query:
Enum labels are case sensitive, so 'happy' is not the same as 'HAPPY'. White space in the labels is significant too.
An enum value occupies four bytes on disk. The length of an enum value's textual label is limited by the NAMEDATALEN setting compiled into PostgreSQL; in standard builds this means at most 63 bytes.
PostgreSQL offers data types to store IPv4, IPv6, and MAC addresses, as shown in . It is better to use these types instead of plain text types to store network addresses, because these types offer input error checking and specialized operators and functions (see ).
When sorting inet or cidr data types, IPv4 addresses will always sort before IPv6 addresses, including IPv4 addresses encapsulated or mapped to IPv6 addresses, such as ::10.2.3.4 or ::ffff:10.4.3.2.
inet
The inet type holds an IPv4 or IPv6 host address, and optionally its subnet, all in one field. The subnet is represented by the number of network address bits present in the host address (the “netmask”). If the netmask is 32 and the address is IPv4, then the value does not indicate a subnet, only a single host. In IPv6, the address length is 128 bits, so 128 bits specify a unique host address. Note that if you want to accept only networks, you should use the cidr type rather than inet.
The input format for this type is address/y where address is an IPv4 or IPv6 address and y is the number of bits in the netmask. If the /y portion is missing, the netmask is 32 for IPv4 and 128 for IPv6, so the value represents just a single host. On display, the /y portion is suppressed if the netmask specifies a single host.
cidr
The cidr type holds an IPv4 or IPv6 network specification. Input and output formats follow Classless Internet Domain Routing conventions. The format for specifying networks is address/y where address is the network represented as an IPv4 or IPv6 address, and y is the number of bits in the netmask. If y is omitted, it is calculated using assumptions from the older classful network numbering system, except it will be at least large enough to include all of the octets written in the input. It is an error to specify a network address that has bits set to the right of the specified netmask.
shows some examples.
cidr Type Input Examples
inet vs. cidr
The essential difference between inet and cidr data types is that inet accepts values with nonzero bits to the right of the netmask, whereas cidr does not. For example, 192.168.0.1/24 is valid for inet but not for cidr.
If you do not like the output format for inet or cidr values, try the functions host, text, and abbrev.
macaddr
The macaddr type stores MAC addresses, known for example from Ethernet card hardware addresses (although MAC addresses are used for other purposes as well). Input is accepted in the following formats:
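'08:00:2b:01:02:03'
'08-00-2b-01-02-03'
'08002b:010203'
'08002b-010203'
'0800.2b01.0203'
'0800-2b01-0203'
'08002b010203'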
These examples would all specify the same address. Upper and lower case is accepted for the digits a through f. Output is always in the first of the forms shown.
IEEE Std 802-2001 specifies the second shown form (with hyphens) as the canonical form for MAC addresses, and specifies the first form (with colons) as the bit-reversed notation, so that 08-00-2b-01-02-03 = 01:00:4D:08:04:0C. This convention is widely ignored nowadays, and it is relevant only for obsolete network protocols (such as Token Ring). PostgreSQL makes no provisions for bit reversal, and all accepted formats use the canonical LSB order.
The remaining five input formats are not part of any standard.
macaddr8
The macaddr8 type stores MAC addresses in EUI-64 format, known for example from Ethernet card hardware addresses (although MAC addresses are used for other purposes as well). This type can accept both 6 and 8 byte length MAC addresses and stores them in 8 byte length format. MAC addresses given in 6 byte format will be stored in 8 byte length format with the 4th and 5th bytes set to FF and FE, respectively. Note that IPv6 uses a modified EUI-64 format where the 7th bit should be set to one after the conversion from EUI-48. The function macaddr8_set7bit is provided to make this change. Generally speaking, any input which is comprised of pairs of hex digits (on byte boundaries), optionally separated consistently by one of ':', '-' or '.', is accepted. The number of hex digits must be either 16 (8 bytes) or 12 (6 bytes). Leading and trailing whitespace is ignored. The following are examples of input formats that are accepted:
These examples would all specify the same address. Upper and lower case is accepted for the digits a through f. Output is always in the first of the forms shown. The last six input formats that are mentioned above are not part of any standard. To convert a traditional 48 bit MAC address in EUI-48 format to modified EUI-64 format to be included as the host portion of an IPv6 address, use macaddr8_set7bit as shown:
PostgreSQL 內建一套豐富的資料型別供使用者使用。使用者也可以使用 CREATE TYPE 指令為 PostgreSQL 增加新的資料型別。
Table 8.1 列出所有內建的通用資料型別。大多數列在「Aliases」中的替代名稱是由於在 PostgreSQL 內部使用的歷史因素。此外,還有一些內部使用或不建議使用的資料型別,但這裡並沒有列出。
Table 8.1. Data Types
以下資料型別(或其拼寫方式)是由 SQL 指定的:bigint、bit、bit varying、boolean、char、character varying、character、varchar、date、double precision、integer、interval、numeric、decimal、real、smallint、time(with or without time zone)、timestamp(with or without time zone)、xml。
每種資料型別都具有其明確的輸入和輸出功能外部表示法。許多內建的資料型別都有明顯的外部格式。但是,有幾種資料型別是 PostgreSQL 獨有的,比如幾何路徑,或者有幾種可能的格式,像是日期和時間型別。某些輸入和輸出功能是不可逆的,意即,與原始輸入相比,輸出功能的結果可能會失去一些精確度。
The xml data type can be used to store XML data. Its advantage over storing XML data in a text field is that it checks the input values for well-formedness, and there are support functions to perform type-safe operations on it. Use of this data type requires the installation to have been built with configure --with-libxml.
The xml type can store well-formed “documents”, as defined by the XML standard, as well as “content” fragments, which are defined by reference to the more permissive XQuery and XPath data model. Roughly, this means that content fragments can have more than one top-level element or character node. The expression xmlvalue IS DOCUMENT can be used to evaluate whether a particular xml value is a full document or only a content fragment.
Limits and compatibility notes for the xml data type can be found in .
To produce a value of type xml from character data, use the function xmlparse:
Examples:
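XMLPARSE (DOCUMENT '<?xml version="1.0"?><book><title>Manual</title><chapter>...</chapter></book>')
XMLPARSE (CONTENT 'abc<foo>bar</foo><bar>foo</bar>')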
While this is the only way to convert character strings into XML values according to the SQL standard, the PostgreSQL-specific syntaxes:
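xml '<foo>bar</foo>'
'<foo>bar</foo>'::xml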
can also be used.
The xml type does not validate input values against a document type declaration (DTD), even when the input value specifies a DTD. There is also currently no built-in support for validating against other XML schema languages such as XML Schema.
The inverse operation, producing a character string value from xml, uses the function xmlserialize:
type can be character, character varying, or text (or an alias for one of those). Again, according to the SQL standard, this is the only way to convert between type xml and character types, but PostgreSQL also allows you to simply cast the value.
When a character string value is cast to or from type xml without going through XMLPARSE or XMLSERIALIZE, respectively, the choice of DOCUMENT versus CONTENT is determined by the “XML option” session configuration parameter, which can be set using the standard command:
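SET XML OPTION { DOCUMENT | CONTENT };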
or the more PostgreSQL-like syntax
The default is CONTENT
, so all forms of XML data are allowed.
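    SET XML OPTION DOCUMENT;
    -- or, equivalently:
    SET xmloption TO DOCUMENT;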
When using binary mode to pass query parameters to the server and query results back to the client, no encoding conversion is performed, so the situation is different. In this case, an encoding declaration in the XML data will be observed, and if it is absent, the data will be assumed to be in UTF-8 (as required by the XML standard; note that PostgreSQL does not support UTF-16). On output, data will have an encoding declaration specifying the client encoding, unless the client encoding is UTF-8, in which case it will be omitted.
Needless to say, processing XML data with PostgreSQL will be less error-prone and more efficient if the XML data encoding, client encoding, and server encoding are the same. Since XML data is internally processed in UTF-8, computations will be most efficient if the server encoding is also UTF-8.
Some XML-related functions may not work at all on non-ASCII data when the server encoding is not UTF-8. This is known to be an issue for xmltable() and xpath() in particular.

The xml data type is unusual in that it does not provide any comparison operators. This is because there is no well-defined and universally useful comparison algorithm for XML data. One consequence of this is that you cannot retrieve rows by comparing an xml column against a search value. XML values should therefore typically be accompanied by a separate key field such as an ID. An alternative solution for comparing XML values is to convert them to character strings first, but note that character string comparison has little to do with a useful XML comparison method.

Since there are no comparison operators for the xml data type, it is not possible to create an index directly on a column of this type. If speedy searches in XML data are desired, possible workarounds include casting the expression to a character string type and indexing that, or indexing an XPath expression. Of course, the actual query would have to be adjusted to search by the indexed expression.
The text-search functionality in PostgreSQL can also be used to speed up full-document searches of XML data. The necessary preprocessing support is, however, not yet available in the PostgreSQL distribution.
PostgreSQL provides two data types that are designed to support full text search, which is the activity of searching through a collection of natural-language documents to locate those that best match a query. The tsvector type represents a document in a form optimized for text search; the tsquery type similarly represents a text query. Chapter 12 provides a detailed explanation of this facility, and Section 9.13 summarizes the related functions and operators.
tsvector
A tsvector value is a sorted list of distinct lexemes, which are words that have been normalized to merge different variants of the same word (see Chapter 12 for details). Sorting and duplicate-elimination are done automatically during input, as shown in this example:
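    SELECT 'a fat cat sat on a mat and ate a fat rat'::tsvector;
                          tsvector
    ----------------------------------------------------
     'a' 'and' 'ate' 'cat' 'fat' 'mat' 'on' 'rat' 'sat'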
To represent lexemes containing whitespace or punctuation, surround them with quotes:
(We use dollar-quoted string literals in this example and the next one to avoid the confusion of having to double quote marks within the literals.) Embedded quotes and backslashes must be doubled:
Optionally, integer positions can be attached to lexemes:
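    SELECT 'a:1 fat:2 cat:3 sat:4 on:5 a:6 mat:7 and:8 ate:9 a:10 fat:11 rat:12'::tsvector;
                                      tsvector
    -------------------------------------------------------------------------------
     'a':1,6,10 'and':8 'ate':9 'cat':3 'fat':2,11 'mat':7 'on':5 'rat':12 'sat':4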
A position normally indicates the source word's location in the document. Positional information can be used for proximity ranking. Position values can range from 1 to 16383; larger numbers are silently set to 16383. Duplicate positions for the same lexeme are discarded.
Lexemes that have positions can further be labeled with a weight, which can be A, B, C, or D. D is the default and hence is not shown on output:
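    SELECT 'a:1A fat:2B,4C cat:5D'::tsvector;
              tsvector
    ----------------------------
     'a':1A 'cat':5 'fat':2B,4C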
Weights are typically used to reflect document structure, for example by marking title words differently from body words. Text search ranking functions can assign different priorities to the different weight markers.
It is important to understand that the tsvector type itself does not perform any word normalization; it assumes the words it is given are normalized appropriately for the application. For most English-text-searching applications, words like “The Fat Rats” would be considered non-normalized, but tsvector doesn't care. Raw document text should usually be passed through to_tsvector to normalize the words appropriately for searching:
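    SELECT to_tsvector('english', 'The Fat Rats');
       to_tsvector
    -----------------
     'fat':2 'rat':3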
tsquery
A tsquery value stores lexemes that are to be searched for, and can combine them using the Boolean operators & (AND), | (OR), and ! (NOT), as well as the phrase search operator <-> (FOLLOWED BY). There is also a variant <N> of the FOLLOWED BY operator, where N is an integer constant that specifies the distance between the two lexemes being searched for. <-> is equivalent to <1>.

Parentheses can be used to enforce grouping of these operators. In the absence of parentheses, ! (NOT) binds most tightly, <-> (FOLLOWED BY) next most tightly, then & (AND), with | (OR) binding the least tightly.
Here are some examples:
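    SELECT 'fat & rat'::tsquery;
        tsquery
    ---------------
     'fat' & 'rat'

    SELECT 'fat & (rat | cat)'::tsquery;
              tsquery
    ---------------------------
     'fat' & ( 'rat' | 'cat' )

    SELECT 'fat & rat & ! cat'::tsquery;
            tsquery
    ------------------------
     'fat' & 'rat' & !'cat'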
Optionally, lexemes in a tsquery can be labeled with one or more weight letters, which restricts them to match only tsvector lexemes with one of those weights; for example, 'fat:ab & cat'::tsquery.
Also, lexemes in a tsquery can be labeled with * to specify prefix matching:
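    SELECT 'super:*'::tsquery;
      tsquery
    -----------
     'super':*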
This query will match any word in a tsvector that begins with “super”.
Quoting rules for lexemes are the same as described previously for lexemes in tsvector; and, as with tsvector, any required normalization of words must be done before converting to the tsquery type. The to_tsquery function is convenient for performing such normalization:
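    SELECT to_tsquery('Fat:ab & Cats');
        to_tsquery
    ------------------
     'fat':AB & 'cat'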
Note that to_tsquery will process prefixes in the same way as other words, which means this comparison returns true:
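    SELECT to_tsvector( 'postgraduate' ) @@ to_tsquery( 'postgres:*' );
     ?column?
    ----------
     t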
because postgres gets stemmed to postgr:
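    SELECT to_tsquery('postgres:*');
     to_tsquery
    ------------
     'postgr':*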
which will match the stemmed form of postgraduate.
Bit strings are strings of 1's and 0's. They can be used to store or visualize bit masks. There are two SQL bit types: bit(n) and bit varying(n), where n is a positive integer.

bit type data must match the length n exactly; it is an error to attempt to store shorter or longer bit strings. bit varying data is of variable length up to the maximum length n; longer strings will be rejected. Writing bit without a length is equivalent to bit(1), while bit varying without a length specification means unlimited length.
If one explicitly casts a bit-string value to bit(n), it will be truncated or zero-padded on the right to be exactly n bits, without raising an error. Similarly, if one explicitly casts a bit-string value to bit varying(n), it will be truncated on the right if it is more than n bits.

Refer to Section 4.1.2.5 for information about the syntax of bit string constants. Bit-logical operators and string manipulation functions are available; see Section 9.6.
Example 8.3. Using the Bit String Types
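A brief sketch of these types in action (the error message is shown as the server reports it):

    CREATE TABLE test (a BIT(3), b BIT VARYING(5));
    INSERT INTO test VALUES (B'101', B'00');
    INSERT INTO test VALUES (B'10', B'101');
    ERROR:  bit string length 2 does not match type bit(3)
    INSERT INTO test VALUES (B'10'::bit(3), B'101');
    SELECT * FROM test;

      a  |  b
    -----+-----
     101 | 00
     100 | 101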
Geometric data types represent two-dimensional spatial objects. Table 8.20 shows the geometric types available in PostgreSQL.
Table 8.20. Geometric Types
A rich set of functions and operators is available to perform various geometric operations such as scaling, translation, rotation, and determining intersections. They are explained in Section 9.11.
Points are the fundamental two-dimensional building block for geometric types. Values of type point are specified using either of the following syntaxes:
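    ( x , y )
      x , y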
where x and y are the respective coordinates, as floating-point numbers. Points are output using the first syntax.
Lines are represented by the linear equation Ax + By + C = 0, where A and B are not both zero. Values of type line are input and output in the form {A,B,C}.
Alternatively, any of the following forms can be used for input:
where (x1,y1) and (x2,y2) are two different points on the line.
Line segments are represented by pairs of points that are the endpoints of the segment. Values of type lseg are specified using any of the following syntaxes:
where (x1,y1) and (x2,y2) are the end points of the line segment. Line segments are output using the first syntax.
Boxes are represented by pairs of points that are opposite corners of the box. Values of type box are specified using any of the following syntaxes:
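    ( ( x1 , y1 ) , ( x2 , y2 ) )
      ( x1 , y1 ) , ( x2 , y2 )
        x1 , y1   ,   x2 , y2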
where (x1,y1) and (x2,y2) are any two opposite corners of the box. Boxes are output using the second syntax.
Any two opposite corners can be supplied on input, but the values will be reordered as needed to store the upper right and lower left corners, in that order.
Paths are represented by lists of connected points. Paths can be open, where the first and last points in the list are considered not connected, or closed, where the first and last points are considered connected.
Values of type path are specified using any of several syntaxes, where the points are the end points of the line segments comprising the path. Square brackets ([]) indicate an open path, while parentheses (()) indicate a closed path. When the outermost parentheses are omitted, a closed path is assumed.
Paths are output using the first or second syntax, as appropriate.
Polygons are represented by lists of points (the vertexes of the polygon). Polygons are very similar to closed paths, but are stored differently and have their own set of support routines.
Values of type polygon are specified using any of the following syntaxes:
where the points are the end points of the line segments comprising the boundary of the polygon.
Polygons are output using the first syntax.
Circles are represented by a center point and radius. Values of type circle are specified using any of the following syntaxes:
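    < ( x , y ) , r >
    ( ( x , y ) , r )
      ( x , y ) , r
        x , y   , r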
where (x,y) is the center point and r is the radius of the circle. Circles are output using the first syntax.
PostgreSQL supports the full set of SQL date and time types, shown in Table 8.9. The operations available on these data types are described in Section 9.9.

Table 8.9. Date/Time Types
The SQL standard requires that writing just timestamp be equivalent to timestamp without time zone, and PostgreSQL honors that behavior. timestamptz is accepted as an abbreviation for timestamp with time zone; this is a PostgreSQL extension.

time, timestamp, and interval accept an optional precision value p which specifies the number of fractional digits retained in the seconds field. By default, there is no explicit bound on precision. The allowed range of p is from 0 to 6.
The interval type has an additional option, which is to restrict the set of stored fields by writing one of these phrases:
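    YEAR
    MONTH
    DAY
    HOUR
    MINUTE
    SECOND
    YEAR TO MONTH
    DAY TO HOUR
    DAY TO MINUTE
    DAY TO SECOND
    HOUR TO MINUTE
    HOUR TO SECOND
    MINUTE TO SECOND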
Note that if both fields and p are specified, the fields must include SECOND, since the precision applies only to the seconds.
The type time with time zone is defined by the SQL standard, but the definition exhibits properties which lead to questionable usefulness. In most cases, a combination of date, time, timestamp without time zone, and timestamp with time zone should provide a complete range of date/time functionality required by any application.
The types abstime and reltime are lower-precision types which are used internally. You are discouraged from using these types in applications; these internal types might disappear in a future release.
Here, p is an optional precision specification giving the number of fractional digits in the seconds field. Precision can be specified for time, timestamp, and interval types, and can range from 0 to 6. If no precision is specified in a constant specification, it defaults to the precision of the literal value (but not more than 6 digits).
Table 8.10. Date Input
The time-of-day types are time [ (p) ] without time zone and time [ (p) ] with time zone. time alone is equivalent to time without time zone.
Table 8.11. Time Input

Table 8.12. Time Zone Input
Valid input for the time stamp types consists of the concatenation of a date and a time, followed by an optional time zone, followed by an optional AD or BC. (Alternatively, AD/BC can appear before the time zone, but this is not the preferred ordering.) Thus:

    1999-01-08 04:05:06

and:

    1999-01-08 04:05:06 -8:00

are valid values, which follow the ISO 8601 standard. In addition, the common format:

    January 8 04:05:06 1999 PST

is supported.
In the SQL standard, literals of the types timestamp without time zone and timestamp with time zone are distinguished by the presence of a “+” or “-” symbol and time zone offset after the time. Hence, according to the standard,

    TIMESTAMP '2004-10-19 10:23:54'

is a timestamp without time zone, while

    TIMESTAMP '2004-10-19 10:23:54+02'

is a timestamp with time zone. PostgreSQL never examines the content of a literal string before determining its type, and therefore will treat both of the above as timestamp without time zone. To ensure that a literal is treated as timestamp with time zone, give it the correct explicit type:
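    TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02'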
In a literal that has been determined to be timestamp without time zone, PostgreSQL will silently ignore any time zone indication. That is, the resulting value is derived from the date/time fields in the input value, and is not adjusted for time zone.

Conversions between timestamp without time zone and timestamp with time zone normally assume that the timestamp without time zone value should be taken or given as local time in the current time zone. A different time zone can be specified for the conversion using AT TIME ZONE.

PostgreSQL supports several special date/time input values for convenience, as shown in Table 8.13. The values infinity and -infinity are specially represented inside the system and will be displayed unchanged; the others are simply notational shorthands that will be converted to ordinary date/time values when read. (In particular, now and related strings are converted to a specific time value as soon as they are read.) All of these values need to be enclosed in single quotes when used as constants in SQL commands.
Table 8.13. Special Date/Time Inputs
Table 8.14. Date/Time Output Styles
ISO 8601 specifies the use of uppercase letter T to separate the date and time. PostgreSQL accepts that format on input, but on output it uses a space rather than T, as shown above. This is for readability and for consistency with RFC 3339 as well as some other database systems.
Table 8.15. Date Order Conventions
Time zones, and time-zone conventions, are influenced by political decisions, not just earth geometry. Time zones around the world became somewhat standardized during the 1900s, but continue to be prone to arbitrary changes, particularly with respect to daylight-savings rules. PostgreSQL uses the widely-used IANA (Olson) time zone database for information about historical time zone rules. For times in the future, the assumption is that the latest known rules for a given time zone will continue to be observed indefinitely far into the future.
PostgreSQL endeavors to be compatible with the SQL standard definitions for typical usage. However, the SQL standard has an odd mix of date and time types and capabilities. Two obvious problems are:
Although the date type cannot have an associated time zone, the time type can. Time zones in the real world have little meaning unless associated with a date as well as a time, since the offset can vary through the year with daylight-saving time boundaries.
The default time zone is specified as a constant numeric offset from UTC. It is therefore impossible to adapt to daylight-saving time when doing date/time arithmetic across DST boundaries.
To address these difficulties, we recommend using date/time types that contain both date and time when using time zones. We do not recommend using the type time with time zone (though it is supported by PostgreSQL for legacy applications and for compliance with the SQL standard). PostgreSQL assumes your local time zone for any type containing only date or time.
PostgreSQL allows you to specify time zones in three different forms:
In addition to the timezone names and abbreviations, PostgreSQL will accept POSIX-style time zone specifications of the form STDoffset or STDoffsetDST, where STD is a zone abbreviation, offset is a numeric offset in hours west from UTC, and DST is an optional daylight-savings zone abbreviation, assumed to stand for one hour ahead of the given offset. For example, if EST5EDT were not already a recognized zone name, it would be accepted and would be functionally equivalent to United States East Coast time. In this syntax, a zone abbreviation can be a string of letters, or an arbitrary string surrounded by angle brackets (<>). When a daylight-savings zone abbreviation is present, it is assumed to be used according to the same daylight-savings transition rules used in the IANA time zone database's posixrules entry. In a standard PostgreSQL installation, posixrules is the same as US/Eastern, so that POSIX-style time zone specifications follow USA daylight-savings rules. If needed, you can adjust this behavior by replacing the posixrules file.
In short, this is the difference between abbreviations and full names: abbreviations represent a specific offset from UTC, whereas many of the full names imply a local daylight-savings time rule, and so have two possible UTC offsets. As an example, 2014-06-04 12:00 America/New_York represents noon local time in New York, which for this particular date was Eastern Daylight Time (UTC-4). So 2014-06-04 12:00 EDT specifies that same time instant. But 2014-06-04 12:00 EST specifies noon Eastern Standard Time (UTC-5), regardless of whether daylight savings was nominally in effect on that date.
To complicate matters, some jurisdictions have used the same timezone abbreviation to mean different UTC offsets at different times; for example, in Moscow MSK has meant UTC+3 in some years and UTC+4 in others. PostgreSQL interprets such abbreviations according to whatever they meant (or had most recently meant) on the specified date; but, as with the EST example above, this is not necessarily the same as local civil time on that date.

One should be wary that the POSIX-style time zone feature can lead to silently accepting bogus input, since there is no check on the reasonableness of the zone abbreviations. For example, SET TIMEZONE TO FOOBAR0 will work, leaving the system effectively using a rather peculiar abbreviation for UTC. Another issue to keep in mind is that in POSIX time zone names, positive offsets are used for locations west of Greenwich. Everywhere else, PostgreSQL follows the ISO-8601 convention that positive timezone offsets are east of Greenwich.
In all cases, timezone names and abbreviations are recognized case-insensitively. (This is a change from PostgreSQL versions prior to 8.2, which were case-sensitive in some contexts but not others.)
The SQL command SET TIME ZONE sets the time zone for the session. This is an alternative spelling of SET TIMEZONE TO with a more SQL-spec-compatible syntax.

The PGTZ environment variable is used by libpq clients to send a SET TIME ZONE command to the server upon connection.
interval values can be written using the following verbose syntax:
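    [@] quantity unit [quantity unit...] [direction]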
Quantities of days, hours, minutes, and seconds can be specified without explicit unit markings. For example, '1 12:59:10' is read the same as '1 day 12 hours 59 min 10 sec'. Also, a combination of years and months can be specified with a dash; for example '200-10' is read the same as '200 years 10 months'. (These shorter forms are in fact the only ones allowed by the SQL standard, and are used for output when IntervalStyle is set to sql_standard.)
Interval values can also be written as ISO 8601 time intervals, using either the “format with designators” of the standard's section 4.4.3.2 or the “alternative format” of section 4.4.3.3. The format with designators looks like this:
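    P quantity unit [ quantity unit ...] [ T [ quantity unit ...]]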
Table 8.16. ISO 8601 Interval Unit Abbreviations
In the alternative format:
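    P [ years-months-days ] [ T hours:minutes:seconds ]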
the string must begin with P, and a T separates the date and time parts of the interval. The values are given as numbers similar to ISO 8601 dates.
When writing an interval constant with a fields specification, or when assigning a string to an interval column that was defined with a fields specification, the interpretation of unmarked quantities depends on the fields. For example INTERVAL '1' YEAR is read as 1 year, whereas INTERVAL '1' means 1 second. Also, field values “to the right” of the least significant field allowed by the fields specification are silently discarded. For example, writing INTERVAL '1 day 2:03:04' HOUR TO MINUTE results in dropping the seconds field, but not the day field.

According to the SQL standard all fields of an interval value must have the same sign, so a leading negative sign applies to all fields; for example the negative sign in the interval literal '-1 2:03:04' applies to both the days and hour/minute/second parts. PostgreSQL allows the fields to have different signs, and traditionally treats each field in the textual representation as independently signed, so that the hour/minute/second part is considered positive in this example. If IntervalStyle is set to sql_standard then a leading sign is considered to apply to all fields (but only if no additional signs appear). Otherwise the traditional PostgreSQL interpretation is used. To avoid ambiguity, it's recommended to attach an explicit sign to each field if any field is negative.
Internally interval values are stored as months, days, and seconds. This is done because the number of days in a month varies, and a day can have 23 or 25 hours if a daylight savings time adjustment is involved. The months and days fields are integers while the seconds field can store fractions. Because intervals are usually created from constant strings or timestamp subtraction, this storage method works well in most cases. Functions justify_days and justify_hours are available for adjusting days and hours that overflow their normal ranges.

In the verbose input format, and in some fields of the more compact input formats, field values can have fractional parts; for example '1.5 week' or '01:02:03.45'. Such input is converted to the appropriate number of months, days, and seconds for storage. When this would result in a fractional number of months or days, the fraction is added to the lower-order fields using the conversion factors 1 month = 30 days and 1 day = 24 hours. For example, '1.5 month' becomes 1 month and 15 days. Only seconds will ever be shown as fractional on output.
Table 8.17. Interval Input
The sql_standard style produces output that conforms to the SQL standard's specification for interval literal strings, if the interval value meets the standard's restrictions (either year-month only or day-time only, with no mixing of positive and negative components). Otherwise the output looks like a standard year-month literal string followed by a day-time literal string, with explicit signs added to disambiguate mixed-sign intervals.

The output of the postgres_verbose style matches the output of PostgreSQL releases prior to 8.4 when the DateStyle parameter was set to non-ISO output.

The output of the iso_8601 style matches the “format with designators” described in section 4.4.3.2 of the ISO 8601 standard.
Table 8.18. Interval Output Style Examples
The data type uuid stores Universally Unique Identifiers (UUID) as defined by RFC 4122, ISO/IEC 9834-8:2005, and related standards. (Some systems refer to this data type as a globally unique identifier, or GUID, instead.) This identifier is a 128-bit quantity that is generated by an algorithm chosen to make it very unlikely that the same identifier will be generated by anyone else in the known universe using the same algorithm. Therefore, for distributed systems, these identifiers provide a better uniqueness guarantee than sequence generators, which are only unique within a single database.

A UUID is written as a sequence of lower-case hexadecimal digits, in several groups separated by hyphens, specifically a group of 8 digits followed by three groups of 4 digits followed by a group of 12 digits, for a total of 32 digits representing the 128 bits. An example of a UUID in this standard form is:
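    a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11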
PostgreSQL also accepts the following alternative forms for input: use of upper-case digits, the standard format surrounded by braces, omitting some or all hyphens, and adding a hyphen after any group of four digits. Some examples are:
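    A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11
    {a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11}
    a0eebc999c0b4ef8bb6d6bb9bd380a11
    a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11
    {a0eebc99-9c0b4ef8-bb6d6bb9-bd380a11}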
Output is always in the standard form.
For how to generate a UUID in PostgreSQL, refer to the UUID generation functions.
Although enum types are primarily intended for static sets of values, there is support for adding new values to an existing enum type, and for renaming values (see ). Existing values cannot be removed from an enum type, nor can the sort ordering of such values be changed, short of dropping and re-creating the enum type.
The translations from internal enum values to textual labels are kept in the system catalog pg_enum. Querying this catalog directly can be useful.
Care must be taken when dealing with multiple character encodings on the client, server, and in the XML data passed through them. When using the text mode to pass queries to the server and query results to the client (which is the normal mode), PostgreSQL converts all character data passed between the client and the server and vice versa to the character encoding of the respective end; see Section 23.3. This includes string representations of XML values, such as in the above examples. This would ordinarily mean that encoding declarations contained in XML data can become invalid as the character data is converted to other encodings while traveling between client and server, because the embedded encoding declaration is not changed. To cope with this behavior, encoding declarations contained in character strings presented for input to the xml type are ignored, and content is assumed to be in the current server encoding. Consequently, for correct processing, character strings of XML data must be sent from the client in the current client encoding. It is the responsibility of the client to either convert documents to the current client encoding before sending them to the server, or to adjust the client encoding appropriately. On output, values of type xml will not have an encoding declaration, and clients should assume all data is in the current client encoding.

Again, see Section 23.3 for more detail.

A bit string value requires 1 byte for each group of 8 bits, plus 5 or 8 bytes overhead depending on the length of the string (but long values may be compressed or moved out-of-line, as explained in Section 8.3 for character strings).
Date and time input is accepted in almost any reasonable format, including ISO 8601, SQL-compatible, traditional POSTGRES, and others. For some formats, ordering of day, month, and year in date input is ambiguous and there is support for specifying the expected ordering of these fields. Set the DateStyle parameter to MDY to select month-day-year interpretation, DMY to select day-month-year interpretation, or YMD to select year-month-day interpretation.
PostgreSQL is more flexible in handling date/time input than the SQL standard requires. See Appendix B for the exact parsing rules of date/time input and for the recognized text fields including months, days of the week, and time zones.

Remember that any date or time literal input needs to be enclosed in single quotes, like text strings; refer to Section 4.1.2.7 for more information. SQL requires the syntax type [ (p) ] 'value'.
Table 8.10 shows some possible inputs for the date type.
Valid input for these types consists of a time of day followed by an optional time zone. (See Table 8.11 and Table 8.12.) If a time zone is specified in the input for time without time zone, it is silently ignored. You can also specify a date but it will be ignored, except when you use a time zone name that involves a daylight-savings rule, such as America/New_York. In this case specifying the date is required in order to determine whether standard or daylight-savings time applies. The appropriate time zone offset is recorded in the time with time zone value.
Refer to Section 8.5.3 for more information on how to specify time zones.

For timestamp with time zone, the internally stored value is always in UTC (Universal Coordinated Time, traditionally known as Greenwich Mean Time, GMT). An input value that has an explicit time zone specified is converted to UTC using the appropriate offset for that time zone. If no time zone is stated in the input string, then it is assumed to be in the time zone indicated by the system's TimeZone parameter, and is converted to UTC using the offset for that zone.

When a timestamp with time zone value is output, it is always converted from UTC to the current time zone, and displayed as local time in that zone. To see the time in another time zone, either change the time zone setting or use the AT TIME ZONE construct.

The following SQL-compatible functions can also be used to obtain the current time value for the corresponding data type: CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP, LOCALTIME, LOCALTIMESTAMP. The latter four accept an optional subsecond precision specification. Note that these are SQL functions and are not recognized in data input strings.
The output format of the date/time types can be set to one of the four styles ISO 8601, SQL (Ingres), traditional POSTGRES (Unix date format), or German. The default is the ISO format. (The SQL standard requires the use of the ISO 8601 format. The name of the “SQL” output format is a historical accident.) Table 8.14 shows examples of each output style. The output of the date and time types is generally only the date or time part in accordance with the given examples. However, the POSTGRES style outputs date-only values in ISO format.

In the SQL and POSTGRES styles, day appears before month if DMY field ordering has been specified, otherwise month appears before day. (See Section 8.5.1 for how this setting also affects interpretation of input values.) Table 8.15 shows examples.
The date/time style can be selected by the user using the SET datestyle command, the DateStyle parameter in the postgresql.conf configuration file, or the PGDATESTYLE environment variable on the server or client.

The formatting function to_char (see Section 9.8) is also available as a more flexible way to format date/time output.

All timezone-aware dates and times are stored internally in UTC. They are converted to local time in the zone specified by the TimeZone configuration parameter before being displayed to the client.
A full time zone name, for example America/New_York. The recognized time zone names are listed in the pg_timezone_names view. PostgreSQL uses the widely-used IANA time zone data for this purpose, so the same time zone names are also recognized by much other software.

A time zone abbreviation, for example PST. Such a specification merely defines a particular offset from UTC, in contrast to full time zone names which can imply a set of daylight savings transition-date rules as well. The recognized abbreviations are listed in the pg_timezone_abbrevs view. You cannot set the configuration parameters TimeZone or log_timezone to a time zone abbreviation, but you can use abbreviations in date/time input values and with the AT TIME ZONE operator.
Neither timezone names nor abbreviations are hard-wired into the server; they are obtained from configuration files stored under .../share/timezone/ and .../share/timezonesets/ of the installation directory.

The TimeZone configuration parameter can be set in the file postgresql.conf, or in any of the other standard ways described in Chapter 19. There are also some special ways to set it: the SQL command SET TIME ZONE and the libpq PGTZ environment variable, both described above.
where quantity is a number (possibly signed); unit is microsecond, millisecond, second, minute, hour, day, week, month, year, decade, century, millennium, or abbreviations or plurals of these units; direction can be ago or empty. The at sign (@) is optional noise. The amounts of the different units are implicitly added with appropriate sign accounting. ago negates all the fields. This syntax is also used for interval output, if IntervalStyle is set to postgres_verbose.
The string must start with a P, and may include a T that introduces the time-of-day units. The available unit abbreviations are given in Table 8.16. Units may be omitted, and may be specified in any order, but units smaller than a day must appear after T. In particular, the meaning of M depends on whether it is before or after T.

Table 8.17 shows some examples of valid interval input.
The output format of the interval type can be set to one of the four styles sql_standard, postgres, postgres_verbose, or iso_8601, using the command SET intervalstyle. The default is the postgres format. Table 8.18 shows examples of each output style.

The output of the postgres style matches the output of PostgreSQL releases prior to 8.4 when the DateStyle parameter was set to ISO.
Name | Storage Size | Description |
boolean | 1 byte | state of true or false |

Valid literal values for the “true” state are: TRUE, 't', 'true', 'y', 'yes', 'on', '1'. Valid literal values for the “false” state are: FALSE, 'f', 'false', 'n', 'no', 'off', '0'.
Name | Storage Size | Description |
bytea | 1 or 4 bytes plus the actual binary string | variable-length binary string |
Decimal Octet Value | Description | Escaped Input Representation | Example | Output Representation |
0 | zero octet | E'\\000' | SELECT E'\\000'::bytea; | \000 |
39 | single quote | '''' or E'\\047' | SELECT E'\''::bytea; | ' |
92 | backslash | E'\\\\' or E'\\134' | SELECT E'\\\\'::bytea; | \\ |
0 to 31 and 127 to 255 | “non-printable” octets | E'\\xxx' (octal value) | SELECT E'\\001'::bytea; | \001 |
Decimal Octet Value | Description | Escaped Output Representation | Example | Output Result |
92 | backslash | \\ | SELECT E'\\134'::bytea; | \\ |
0 to 31 and 127 to 255 | “non-printable” octets | \xxx (octal value) | SELECT E'\\001'::bytea; | \001 |
32 to 126 | “printable” octets | client character set representation | SELECT E'\\176'::bytea; | ~ |
Name | Storage Size | Description | Range |
money | 8 bytes | currency amount | -92233720368547758.08 to +92233720368547758.07 |
Name | Storage Size | Description | Range |
smallint | 2 bytes | small-range integer | -32768 to +32767 |
integer | 4 bytes | typical choice for integer | -2147483648 to +2147483647 |
bigint | 8 bytes | large-range integer | -9223372036854775808 to +9223372036854775807 |
decimal | variable | user-specified precision, exact | up to 131072 digits before the decimal point; up to 16383 digits after the decimal point |
numeric | variable | user-specified precision, exact | up to 131072 digits before the decimal point; up to 16383 digits after the decimal point |
real | 4 bytes | variable-precision, inexact | 6 decimal digits precision |
double precision | 8 bytes | variable-precision, inexact | 15 decimal digits precision |
smallserial | 2 bytes | small autoincrementing integer | 1 to 32767 |
serial | 4 bytes | autoincrementing integer | 1 to 2147483647 |
bigserial | 8 bytes | large autoincrementing integer | 1 to 9223372036854775807 |
Name | Description |
character varying(n), varchar(n) | variable-length with limit |
character(n), char(n) | fixed-length, blank padded |
text | variable unlimited length |

(1) The char_length function is discussed in Section 9.4.
Name | Storage Size | Description |
"char" | 1 byte | single-byte internal type |
name | 64 bytes | internal type for object names |
cidr Input | cidr Output | abbrev(cidr) |
192.168.100.128/25 | 192.168.100.128/25 | 192.168.100.128/25 |
192.168/24 | 192.168.0.0/24 | 192.168.0/24 |
192.168/25 | 192.168.0.0/25 | 192.168.0.0/25 |
192.168.1 | 192.168.1.0/24 | 192.168.1/24 |
192.168 | 192.168.0.0/24 | 192.168.0/24 |
128.1 | 128.1.0.0/16 | 128.1/16 |
128 | 128.0.0.0/16 | 128.0/16 |
128.1.2 | 128.1.2.0/24 | 128.1.2/24 |
10.1.2 | 10.1.2.0/24 | 10.1.2/24 |
10.1 | 10.1.0.0/16 | 10.1/16 |
10 | 10.0.0.0/8 | 10/8 |
10.1.2.3/32 | 10.1.2.3/32 | 10.1.2.3/32 |
2001:4f8:3:ba::/64 | 2001:4f8:3:ba::/64 | 2001:4f8:3:ba::/64 |
2001:4f8:3:ba:2e0:81ff:fe22:d1f1/128 | 2001:4f8:3:ba:2e0:81ff:fe22:d1f1/128 | 2001:4f8:3:ba:2e0:81ff:fe22:d1f1 |
::ffff:1.2.3.0/120 | ::ffff:1.2.3.0/120 | ::ffff:1.2.3/120 |
::ffff:1.2.3.0/128 | ::ffff:1.2.3.0/128 | ::ffff:1.2.3.0/128 |
Example | Description |
1999-01-08 | ISO 8601; January 8 in any mode (recommended format) |
January 8, 1999 | unambiguous in any datestyle input mode |
1/8/1999 | January 8 in MDY mode; August 1 in DMY mode |
1/18/1999 | January 18 in MDY mode; rejected in other modes |
01/02/03 | January 2, 2003 in MDY mode; February 1, 2003 in DMY mode; February 3, 2001 in YMD mode |
1999-Jan-08 | January 8 in any mode |
Jan-08-1999 | January 8 in any mode |
08-Jan-1999 | January 8 in any mode |
99-Jan-08 | January 8 in YMD mode, else error |
08-Jan-99 | January 8, except error in YMD mode |
Jan-08-99 | January 8, except error in YMD mode |
19990108 | ISO 8601; January 8, 1999 in any mode |
990108 | ISO 8601; January 8, 1999 in any mode |
1999.008 | year and day of year |
J2451187 | Julian date |
January 8, 99 BC | year 99 BC |
Example | Description |
04:05:06.789 | ISO 8601 |
04:05:06 | ISO 8601 |
04:05 | ISO 8601 |
040506 | ISO 8601 |
04:05 AM | same as 04:05; AM does not affect value |
04:05 PM | same as 16:05; input hour must be <= 12 |
04:05:06.789-8 | ISO 8601 |
04:05:06-08:00 | ISO 8601 |
04:05-08:00 | ISO 8601 |
040506-08 | ISO 8601 |
04:05:06 PST | time zone specified by abbreviation |
2003-04-12 04:05:06 America/New_York | time zone specified by full name |
Example | Description |
PST | Abbreviation (for Pacific Standard Time) |
America/New_York | Full time zone name |
PST8PDT | POSIX-style time zone specification |
-8:00 | ISO-8601 offset for PST |
-800 | ISO-8601 offset for PST |
-8 | ISO-8601 offset for PST |
zulu | Military abbreviation for UTC |
z | Short form of zulu |
Input String | Valid Types | Description |
epoch | date, timestamp | 1970-01-01 00:00:00+00 (Unix system time zero) |
infinity | date, timestamp | later than all other time stamps |
-infinity | date, timestamp | earlier than all other time stamps |
now | date, time, timestamp | current transaction's start time |
today | date, timestamp | midnight today |
tomorrow | date, timestamp | midnight tomorrow |
yesterday | date, timestamp | midnight yesterday |
allballs | time | 00:00:00.00 UTC |
Style Specification | Description | Example |
ISO | ISO 8601, SQL standard | 1997-12-17 07:37:16-08 |
SQL | traditional style | 12/17/1997 07:37:16.00 PST |
Postgres | original style | Wed Dec 17 07:37:16 1997 PST |
German | regional style | 17.12.1997 07:37:16.00 PST |
datestyle Setting | Input Ordering | Example Output |
SQL, DMY | day/month/year | 17/12/1997 15:37:16.00 CET |
SQL, MDY | month/day/year | 12/17/1997 07:37:16.00 PST |
Postgres, DMY | day/month/year | Wed 17 Dec 07:37:16 1997 PST |
Abbreviation | Meaning |
Y | Years |
M | Months (in the date part) |
W | Weeks |
D | Days |
H | Hours |
M | Minutes (in the time part) |
S | Seconds |
Example | Description |
1-2 | SQL standard format: 1 year 2 months |
3 4:05:06 | SQL standard format: 3 days 4 hours 5 minutes 6 seconds |
1 year 2 months 3 days 4 hours 5 minutes 6 seconds | Traditional Postgres format: 1 year 2 months 3 days 4 hours 5 minutes 6 seconds |
P1Y2M3DT4H5M6S | ISO 8601 “format with designators”: same meaning as above |
P0001-02-03T04:05:06 | ISO 8601 “alternative format”: same meaning as above |
Style Specification | Year-Month Interval | Day-Time Interval | Mixed Interval |
sql_standard | 1-2 | 3 4:05:06 | -1-2 +3 -4:05:06 |
postgres | 1 year 2 mons | 3 days 04:05:06 | -1 year -2 mons +3 days -04:05:06 |
postgres_verbose | @ 1 year 2 mons | @ 3 days 4 hours 5 mins 6 secs | @ 1 year 2 mons -3 days 4 hours 5 mins 6 secs ago |
iso_8601 | P1Y2M | P3DT4H5M6S | P-1Y-2M3DT-4H-5M-6S |
Name | Storage Size | Description |
cidr | 7 or 19 bytes | IPv4 and IPv6 networks |
inet | 7 or 19 bytes | IPv4 and IPv6 hosts and networks |
macaddr | 6 bytes | MAC addresses |
macaddr8 | 8 bytes | MAC addresses (EUI-64 format) |
Name | Aliases | Description |
bigint | int8 | signed eight-byte integer |
bigserial | serial8 | autoincrementing eight-byte integer |
bit [ (n) ] | | fixed-length bit string |
bit varying [ (n) ] | varbit [ (n) ] | variable-length bit string |
boolean | bool | logical Boolean (true/false) |
box | | rectangular box on a plane |
bytea | | binary data (“byte array”) |
character [ (n) ] | char [ (n) ] | fixed-length character string |
character varying [ (n) ] | varchar [ (n) ] | variable-length character string |
cidr | | IPv4 or IPv6 network address |
circle | | circle on a plane |
date | | calendar date (year, month, day) |
double precision | float8 | double precision floating-point number (8 bytes) |
inet | | IPv4 or IPv6 host address |
integer | int, int4 | signed four-byte integer |
interval [ fields ] [ (p) ] | | time span |
json | | textual JSON data |
jsonb | | binary JSON data, decomposed |
line | | infinite line on a plane |
lseg | | line segment on a plane |
macaddr | | MAC (Media Access Control) address |
macaddr8 | | MAC (Media Access Control) address (EUI-64 format) |
money | | currency amount |
numeric [ (p, s) ] | decimal [ (p, s) ] | exact numeric of selectable precision |
path | | geometric path on a plane |
pg_lsn | | PostgreSQL Log Sequence Number |
point | | geometric point on a plane |
polygon | | closed geometric path on a plane |
real | float4 | single precision floating-point number (4 bytes) |
smallint | int2 | signed two-byte integer |
smallserial | serial2 | autoincrementing two-byte integer |
serial | serial4 | autoincrementing four-byte integer |
text | | variable-length character string |
time [ (p) ] [ without time zone ] | | time of day (no time zone) |
time [ (p) ] with time zone | timetz | time of day, including time zone |
timestamp [ (p) ] [ without time zone ] | | date and time (no time zone) |
timestamp [ (p) ] with time zone | timestamptz | date and time, including time zone |
tsquery | | text search query |
tsvector | | text search document |
txid_snapshot | | user-level transaction ID snapshot |
uuid | | universally unique identifier |
xml | | XML data |
Name | Storage Size | Description | Representation |
point | 16 bytes | Point on a plane | (x,y) |
line | 32 bytes | Infinite line | {A,B,C} |
lseg | 32 bytes | Finite line segment | ((x1,y1),(x2,y2)) |
box | 32 bytes | Rectangular box | ((x1,y1),(x2,y2)) |
path | 16+16n bytes | Closed path (similar to polygon) | ((x1,y1),...) |
path | 16+16n bytes | Open path | [(x1,y1),...] |
polygon | 40+16n bytes | Polygon (similar to closed path) | ((x1,y1),...) |
circle | 24 bytes | Circle | <(x,y),r> (center point and radius) |
Name | Storage Size | Description | Low Value | High Value | Resolution |
timestamp [ (p) ] [ without time zone ] | 8 bytes | both date and time (no time zone) | 4713 BC | 294276 AD | 1 microsecond |
timestamp [ (p) ] with time zone | 8 bytes | both date and time, with time zone | 4713 BC | 294276 AD | 1 microsecond |
date | 4 bytes | date (no time of day) | 4713 BC | 5874897 AD | 1 day |
time [ (p) ] [ without time zone ] | 8 bytes | time of day (no date) | 00:00:00 | 24:00:00 | 1 microsecond |
time [ (p) ] with time zone | 12 bytes | time of day (no date), with time zone | 00:00:00+1459 | 24:00:00-1459 | 1 microsecond |
interval [ fields ] [ (p) ] | 16 bytes | time interval | -178000000 years | 178000000 years | 1 microsecond |
A composite type represents the structure of a row or record; it is essentially just a list of field names and their data types. PostgreSQL allows composite types to be used in many of the same ways that simple types can be used. For example, a column of a table can be declared to be of a composite type.
Here are two simple examples of defining composite types:
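    CREATE TYPE complex AS (
        r       double precision,
        i       double precision
    );

    CREATE TYPE inventory_item AS (
        name            text,
        supplier_id     integer,
        price           numeric
    );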
The syntax is comparable to CREATE TABLE, except that only field names and types can be specified; no constraints (such as NOT NULL) can presently be included. Note that the AS keyword is essential; without it, the system will think a different kind of CREATE TYPE command is meant, and you will get odd syntax errors.
Having defined the types, we can use them to create tables or functions:
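    CREATE TABLE on_hand (
        item      inventory_item,
        count     integer
    );

    INSERT INTO on_hand VALUES (ROW('fuzzy dice', 42, 1.99), 1000);

    -- or a function using the composite type:
    CREATE FUNCTION price_extension(inventory_item, integer) RETURNS numeric
    AS 'SELECT $1.price * $2' LANGUAGE SQL;

    SELECT price_extension(item, 10) FROM on_hand;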
Whenever you create a table, a composite type is also automatically created, with the same name as the table, to represent the table's row type. For example, had we said:
then the same inventory_item composite type shown above would come into being as a byproduct, and could be used just as above. Note however an important restriction of the current implementation: since no constraints are associated with a composite type, the constraints shown in the table definition do not apply to values of the composite type outside the table. (To work around this, create a domain over the composite type, and apply the desired constraints as CHECK constraints of the domain.)
To write a composite value as a literal constant, enclose the field values within parentheses and separate them by commas. You can put double quotes around any field value, and must do so if it contains commas or parentheses. (More details appear below.) Thus, the general format of a composite constant is as follows, shown with examples:
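    '( val1 , val2 , ... )'

    -- a valid value of the inventory_item type:
    '("fuzzy dice",42,1.99)'

    -- a NULL third field; then an empty-string first field with a NULL third field:
    '("fuzzy dice",42,)'
    '("",42,)'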
The first of these constants is a valid value of the inventory_item type defined above. To make a field be NULL, write no characters at all in its position in the list; the second constant above specifies a NULL third field. If you want an empty string rather than NULL, write double quotes, as in the third constant above, whose first field is a non-NULL empty string and whose third field is NULL.
(These constants are actually only a special case of the generic type constants discussed in Section 4.1.2.7. The constant is initially treated as a string and passed to the composite-type input conversion routine. An explicit type specification might be necessary to tell which type to convert the constant to.)
The ROW expression syntax can also be used to construct composite values. In most cases this is considerably simpler to use than the string-literal syntax since you don't have to worry about multiple layers of quoting; for example, ROW('fuzzy dice', 42, 1.99) and ROW('', 42, NULL). The ROW keyword is actually optional as long as you have more than one field in the expression, so these can be simplified to ('fuzzy dice', 42, 1.99) and ('', 42, NULL). The ROW expression syntax is discussed in more detail in Section 4.2.13.
To access a field of a composite column, one writes a dot and the field name, much like selecting a field from a table name. In fact, it's so much like selecting from a table name that you often have to use parentheses to keep from confusing the parser. For example, you might try to select some subfields from our on_hand example table with something like:
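    SELECT item.name FROM on_hand WHERE item.price > 9.99;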
This will not work since the name item is taken to be a table name, not a column name of on_hand, per SQL syntax rules. You must write it like this:
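    SELECT (item).name FROM on_hand WHERE (item).price > 9.99;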
or if you need to use the table name as well (for instance in a multitable query), like this:
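    SELECT (on_hand.item).name FROM on_hand WHERE (on_hand.item).price > 9.99;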
Now the parenthesized object is correctly interpreted as a reference to the item column, and then the subfield can be selected from it.
Similar syntactic issues apply whenever you select a field from a composite value. For instance, to select just one field from the result of a function that returns a composite value, you'd need to write something like (my_func(...)).field; without the extra parentheses, this will generate a syntax error.
The special field name * means “all fields”, as further explained in Section 8.16.5.
Here are some examples of the proper syntax for inserting and updating composite columns. First, inserting or updating a whole column: INSERT INTO mytab (complex_col) VALUES ((1.1,2.2)); omits ROW, while UPDATE mytab SET complex_col = ROW(1.1,2.2) WHERE ...; uses it; we could have done it either way.
We can update an individual subfield of a composite column, as in UPDATE mytab SET complex_col.r = (complex_col).r + 1 WHERE ...;. Notice here that we don't need to (and indeed cannot) put parentheses around the column name appearing just after SET, but we do need parentheses when referencing the same column in the expression to the right of the equal sign. And we can specify subfields as targets for INSERT, too, as in INSERT INTO mytab (complex_col.r, complex_col.i) VALUES (1.1, 2.2);.
Had we not supplied values for all the subfields of the column, the remaining subfields would have been filled with null values.
There are various special syntax rules and behaviors associated with composite types in queries. These rules provide useful shortcuts, but can be confusing if you don't know the logic behind them.
In PostgreSQL, a reference to a table name (or alias) in a query is effectively a reference to the composite value of the table's current row. For example, if we had a table inventory_item as shown above, we could write SELECT c FROM inventory_item c;.
This query produces a single composite-valued column, so we might get output like ("fuzzy dice",42,1.99).
Note however that simple names are matched to column names before table names, so this example works only because there is no column named c in the query's tables.
The ordinary qualified-column-name syntax table_name.column_name can be understood as applying field selection to the composite value of the table's current row. (For efficiency reasons, it's not actually implemented that way.)
When we write SELECT c.* FROM inventory_item c; then, according to the SQL standard, we should get the contents of the table expanded into separate columns, as if the query were SELECT c.name, c.supplier_id, c.price FROM inventory_item c;.
PostgreSQL will apply this expansion behavior to any composite-valued expression, although as shown above, you need to write parentheses around the value that .* is applied to whenever it's not a simple table name. For example, if myfunc() is a function returning a composite type with columns a, b, and c, then SELECT (myfunc(x)).* FROM some_table; and SELECT (myfunc(x)).a, (myfunc(x)).b, (myfunc(x)).c FROM some_table; have the same result.
PostgreSQL handles column expansion by actually transforming the first form into the second. So, in this example, myfunc() would get invoked three times per row with either syntax. If it's an expensive function you may wish to avoid that, which you can do with a query like:
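A sketch, reusing some_table, myfunc, and x from the discussion above:

    SELECT m.* FROM some_table, LATERAL myfunc(x) AS m;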
Placing the function in a LATERAL FROM item keeps it from being invoked more than once per row. m.* is still expanded into m.a, m.b, m.c, but now those variables are just references to the output of the FROM item. (The LATERAL keyword is optional here, but we show it to clarify that the function is getting x from some_table.)
The composite_value.* syntax results in column expansion of this kind when it appears at the top level of a SELECT output list, a RETURNING list in INSERT/UPDATE/DELETE, a VALUES clause, or a row constructor. In all other contexts (including when nested inside one of those constructs), attaching .* to a composite value does not change the value, since it means “all columns” and so the same composite value is produced again. For example, if somefunc() accepts a composite-valued argument, these queries are the same: SELECT somefunc(c.*) FROM inventory_item c; and SELECT somefunc(c) FROM inventory_item c;.
In both cases, the current row of inventory_item is passed to the function as a single composite-valued argument. Even though .* does nothing in such cases, using it is good style, since it makes clear that a composite value is intended. In particular, the parser will consider c in c.* to refer to a table name or alias, not to a column name, so that there is no ambiguity; whereas without .*, it is not clear whether c means a table name or a column name, and in fact the column-name interpretation will be preferred if there is a column named c.
Another example demonstrating these concepts is that all these queries mean the same thing:
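    SELECT * FROM inventory_item c ORDER BY c;
    SELECT * FROM inventory_item c ORDER BY c.*;
    SELECT * FROM inventory_item c ORDER BY ROW(c.*);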
All of these ORDER BY clauses specify the row's composite value, resulting in sorting the rows according to the rules described in Section 9.23.6. However, if inventory_item contained a column named c, the first case would be different from the others, as it would mean to sort by that column only. Given the column names previously shown, SELECT * FROM inventory_item c ORDER BY ROW(c.name, c.supplier_id, c.price); and SELECT * FROM inventory_item c ORDER BY (c.name, c.supplier_id, c.price); are also equivalent to those above.
(The last case uses a row constructor with the key word ROW omitted.)
Another special syntactical behavior associated with composite values is that we can use functional notation for extracting a field of a composite value. The simple way to explain this is that the notations field(table) and table.field are interchangeable. For example, SELECT c.name FROM inventory_item c WHERE c.price > 1000; and SELECT name(c) FROM inventory_item c WHERE price(c) > 1000; are equivalent queries.
Moreover, if we have a function that accepts a single argument of a composite type, we can call it with either notation; somefunc(c), somefunc(c.*), and c.somefunc are all equivalent ways of calling it.
This equivalence between functional notation and field notation makes it possible to use functions on composite types to implement “computed fields”. An application using the last notation above wouldn't need to be directly aware that somefunc isn't a real column of the table.
Because of this behavior, it's unwise to give a function that takes a single composite-type argument the same name as any of the fields of that composite type. If there is ambiguity, the field-name interpretation will be chosen if field-name syntax is used, while the function will be chosen if function-call syntax is used. However, PostgreSQL versions before 11 always chose the field-name interpretation, unless the syntax of the call required it to be a function call. One way to force the function interpretation in older versions is to schema-qualify the function name, that is, write schema.func(compositevalue).
The external text representation of a composite value consists of items that are interpreted according to the I/O conversion rules for the individual field types, plus decoration that indicates the composite structure. The decoration consists of parentheses (( and )) around the whole value, plus commas (,) between adjacent items. Whitespace outside the parentheses is ignored, but within the parentheses it is considered part of the field value, and might or might not be significant depending on the input conversion rules for the field data type. For example, in '(  42)' the whitespace will be ignored if the field type is integer, but not if it is text.
As shown previously, when writing a composite value you can write double quotes around any individual field value. You must do so if the field value would otherwise confuse the composite-value parser. In particular, fields containing parentheses, commas, double quotes, or backslashes must be double-quoted. To put a double quote or backslash in a quoted composite field value, precede it with a backslash. (Also, a pair of double quotes within a double-quoted field value is taken to represent a double quote character, analogously to the rules for single quotes in SQL literal strings.) Alternatively, you can avoid quoting and use backslash-escaping to protect all data characters that would otherwise be taken as composite syntax.
A completely empty field value (no characters at all between the commas or parentheses) represents a NULL. To write a value that is an empty string rather than NULL, write "".
The composite output routine will put double quotes around field values if they are empty strings or contain parentheses, commas, double quotes, backslashes, or white space. (Doing so for white space is not essential, but aids legibility.) Double quotes and backslashes embedded in field values will be doubled.
Remember that what you write in an SQL command will first be interpreted as a string literal, and then as a composite. This doubles the number of backslashes you need (assuming escape string syntax is used). For example, to insert a text field containing a double quote and a backslash in a composite value, you'd need to write INSERT ... VALUES ('("\"\\\\")');. The string-literal processor removes one level of backslashes, so that what arrives at the composite-value parser looks like ("\"\\"). In turn, the string fed to the text data type's input routine becomes "\. (If we were working with a data type whose input routine also treated backslashes specially, bytea for example, we might need as many as eight backslashes in the command to get one backslash into the stored composite field.) Dollar quoting (see Section 4.1.2.4) can be used to avoid the need to double backslashes.
The ROW constructor syntax is usually easier to work with than the composite-literal syntax when writing composite values in SQL commands. In ROW, individual field values are written the same way they would be written when not members of a composite.
The pg_lsn data type can be used to store LSN (Log Sequence Number) data, which is a pointer to a location in the WAL. This type is a representation of XLogRecPtr and an internal system type of PostgreSQL.
Internally, an LSN is a 64-bit integer, representing a byte position in the write-ahead log stream. It is printed as two hexadecimal numbers of up to 8 digits each, separated by a slash; for example, 16/B374D848. The pg_lsn type supports the standard comparison operators, like = and >. Two LSNs can be subtracted using the - operator; the result is the number of bytes separating those write-ahead log locations.
PostgreSQL allows columns of a table to be defined as variable-length multidimensional arrays. Arrays of any built-in or user-defined base type, enum type, composite type, range type, or domain can be created.
To illustrate the use of array types, we create this table:
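    CREATE TABLE sal_emp (
        name            text,
        pay_by_quarter  integer[],
        schedule        text[][]
    );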
As shown, an array data type is named by appending square brackets ([]) to the data type name of the array elements. The above command will create a table named sal_emp with a column of type text (name), a one-dimensional array of type integer (pay_by_quarter), which represents the employee's salary by quarter, and a two-dimensional array of text (schedule), which represents the employee's weekly schedule.
The syntax for CREATE TABLE allows the exact size of arrays to be specified, for example CREATE TABLE tictactoe (squares integer[3][3]);. However, the current implementation ignores any supplied array size limits, i.e., the behavior is the same as for arrays of unspecified length.

The current implementation does not enforce the declared number of dimensions either. Arrays of a particular element type are all considered to be of the same type, regardless of size or number of dimensions. So, declaring the array size or number of dimensions in CREATE TABLE is simply documentation; it does not affect run-time behavior.
An alternative syntax, which conforms to the SQL standard by using the keyword ARRAY, can be used for one-dimensional arrays. pay_by_quarter could have been defined as pay_by_quarter integer ARRAY[4], or, if no array size is to be specified, as pay_by_quarter integer ARRAY. As before, however, PostgreSQL does not enforce the size restriction in any case.
To write an array value as a literal constant, enclose the element values within curly braces and separate them by commas. (If you know C, this is not unlike the C syntax for initializing structures.) You can put double quotes around any element value, and must do so if it contains commas or curly braces. (More details appear below.) Thus, the general format of an array constant is '{ val1 delim val2 delim ... }', where delim is the delimiter character for the type, as recorded in its pg_type entry. Among the standard data types provided in the PostgreSQL distribution, all use a comma (,), except for type box which uses a semicolon (;). Each val is either a constant of the array element type, or a subarray. An example of an array constant is '{{1,2,3},{4,5,6},{7,8,9}}'.
This constant is a two-dimensional, 3-by-3 array consisting of three subarrays of integers.
To set an element of an array constant to NULL, write NULL for the element value. (Any upper- or lower-case variant of NULL will do.) If you want an actual string value “NULL”, you must put double quotes around it.
(These kinds of array constants are actually only a special case of the generic type constants discussed in Section 4.1.2.7. The constant is initially treated as a string and passed to the array input conversion routine. An explicit type specification might be necessary.)
Now we can show some INSERT statements:
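    INSERT INTO sal_emp
        VALUES ('Bill',
        '{10000, 10000, 10000, 10000}',
        '{{"meeting", "lunch"}, {"training", "presentation"}}');

    INSERT INTO sal_emp
        VALUES ('Carol',
        '{20000, 25000, 25000, 25000}',
        '{{"breakfast", "consulting"}, {"meeting", "lunch"}}');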
The result of the previous two inserts is two rows in sal_emp. Multidimensional arrays must have matching extents for each dimension; a mismatch, such as sub-arrays of unequal length in the schedule column, causes an error.
The ARRAY constructor syntax can also be used. Notice that with it, the array elements are ordinary SQL constants or expressions; for instance, string literals are single quoted, instead of double quoted as they would be in an array literal. The ARRAY constructor syntax is discussed in more detail in Section 4.2.12.
Now, we can run some queries on the table. First, we show how to access a single element of an array. This query retrieves the names of the employees whose pay changed in the second quarter:
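    SELECT name FROM sal_emp WHERE pay_by_quarter[1] <> pay_by_quarter[2];

     name
    -------
     Carol
    (1 row)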
The array subscript numbers are written within square brackets. By default PostgreSQL uses a one-based numbering convention for arrays, that is, an array of n elements starts with array[1] and ends with array[n].
The query SELECT pay_by_quarter[3] FROM sal_emp; retrieves the third quarter pay of all employees.
We can also access arbitrary rectangular slices of an array, or subarrays. An array slice is denoted by writing lower-bound:upper-bound for one or more array dimensions. For example, this query retrieves the first item on Bill's schedule for the first two days of the week:
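    SELECT schedule[1:2][1:1] FROM sal_emp WHERE name = 'Bill';

            schedule
    ------------------------
     {{meeting},{training}}
    (1 row)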
If any dimension is written as a slice, i.e., contains a colon, then all dimensions are treated as slices. Any dimension that has only a single number (no colon) is treated as being from 1 to the number specified. For example, [2] is treated as [1:2], so schedule[1:2][2] is interpreted as schedule[1:2][1:2].
To avoid confusion with the non-slice case, it's best to use slice syntax for all dimensions, e.g., [1:2][1:1], not [2][1:1].
It is possible to omit the lower-bound and/or upper-bound of a slice specifier; the missing bound is replaced by the lower or upper limit of the array's subscripts. For example, schedule[:2][2:] selects rows up to 2 and columns from 2 onward, and schedule[:][1:1] selects the first column of all rows.
An array subscript expression will return null if either the array itself or any of the subscript expressions are null. Also, null is returned if a subscript is outside the array bounds (this case does not raise an error). For example, if schedule currently has the dimensions [1:3][1:2] then referencing schedule[3][3] yields NULL. Similarly, an array reference with the wrong number of subscripts yields a null rather than an error.
An array slice expression likewise yields null if the array itself or any of the subscript expressions are null. However, in other cases such as selecting an array slice that is completely outside the current array bounds, a slice expression yields an empty (zero-dimensional) array instead of null. (This does not match non-slice behavior and is done for historical reasons.) If the requested slice partially overlaps the array bounds, then it is silently reduced to just the overlapping region instead of returning null.
The current dimensions of any array value can be retrieved with the array_dims function:
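    SELECT array_dims(schedule) FROM sal_emp WHERE name = 'Carol';

     array_dims
    ------------
     [1:2][1:2]
    (1 row)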
array_dims produces a text result, which is convenient for people to read but perhaps inconvenient for programs. Dimensions can also be retrieved with array_upper and array_lower, which return the upper and lower bound of a specified array dimension, respectively; for example, array_upper(schedule, 1) is 2 for the rows shown above.
array_length will return the length of a specified array dimension; for example, array_length(schedule, 1) is 2 for the rows inserted above. cardinality returns the total number of elements in an array across all dimensions; it is effectively the number of rows a call to unnest would yield, e.g., cardinality(schedule) is 4 here.
An array value can be replaced completely, updated at a single element, or updated in a slice; the ARRAY expression syntax can be used as well. For example:
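    -- replace the whole array value:
    UPDATE sal_emp SET pay_by_quarter = '{25000,25000,27000,27000}'
        WHERE name = 'Carol';

    -- the same, using the ARRAY expression syntax:
    UPDATE sal_emp SET pay_by_quarter = ARRAY[25000,25000,27000,27000]
        WHERE name = 'Carol';

    -- update a single element:
    UPDATE sal_emp SET pay_by_quarter[4] = 15000
        WHERE name = 'Bill';

    -- update a slice:
    UPDATE sal_emp SET pay_by_quarter[1:2] = '{27000,27000}'
        WHERE name = 'Carol';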
The slice syntaxes with omitted lower-bound and/or upper-bound can be used too, but only when updating an array value that is not NULL or zero-dimensional (otherwise, there is no existing subscript limit to substitute).
A stored array value can be enlarged by assigning to elements not already present. Any positions between those previously present and the newly assigned elements will be filled with nulls. For example, if array myarray currently has 4 elements, it will have six elements after an update that assigns to myarray[6]; myarray[5] will contain null. Currently, enlargement in this fashion is only allowed for one-dimensional arrays, not multidimensional arrays.

Subscripted assignment allows creation of arrays that do not use one-based subscripts. For example one might assign to myarray[-2:7] to create an array with subscript values from -2 to 7.
New array values can also be constructed using the concatenation operator, ||:
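    SELECT ARRAY[1,2] || ARRAY[3,4];
     ?column?
    -----------
     {1,2,3,4}

    SELECT ARRAY[5,6] || ARRAY[[1,2],[3,4]];
          ?column?
    ---------------------
     {{5,6},{1,2},{3,4}}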
The concatenation operator allows a single element to be pushed onto the beginning or end of a one-dimensional array. It also accepts two N-dimensional arrays, or an N-dimensional and an N+1-dimensional array.
When a single element is pushed onto either the beginning or end of a one-dimensional array, the result is an array with the same lower bound subscript as the array operand. For example:
When two arrays with an equal number of dimensions are concatenated, the result retains the lower bound subscript of the left-hand operand's outer dimension. The result is an array comprising every element of the left-hand operand followed by every element of the right-hand operand. For example:
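SELECT array_dims(ARRAY[1,2] || ARRAY[3,4,5]);
 array_dims
------------
 [1:5]

SELECT array_dims(ARRAY[[1,2],[3,4]] || ARRAY[[5,6],[7,8],[9,0]]);
 array_dims
------------
 [1:5][1:2]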
When an N-dimensional array is pushed onto the beginning or end of an N+1-dimensional array, the result is analogous to the element-array case above. Each N-dimensional sub-array is essentially an element of the N+1-dimensional array's outer dimension. For example:
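SELECT array_dims(ARRAY[1,2] || ARRAY[[3,4],[5,6]]);
 array_dims
------------
 [1:3][1:2]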
An array can also be constructed by using the functions array_prepend, array_append, or array_cat. The first two only support one-dimensional arrays, but array_cat supports multidimensional arrays. Some examples:
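SELECT array_prepend(1, ARRAY[2,3]);
 array_prepend
---------------
 {1,2,3}

SELECT array_append(ARRAY[1,2], 3);
 array_append
--------------
 {1,2,3}

SELECT array_cat(ARRAY[1,2], ARRAY[3,4]);
 array_cat
-----------
 {1,2,3,4}

SELECT array_cat(ARRAY[[1,2],[3,4]], ARRAY[5,6]);
      array_cat
---------------------
 {{1,2},{3,4},{5,6}}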
In simple cases, the concatenation operator discussed above is preferred over direct use of these functions. However, because the concatenation operator is overloaded to serve all three cases, there are situations where use of one of the functions is helpful to avoid ambiguity. For example consider:
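SELECT ARRAY[1, 2] || '{3, 4}';  -- the untyped literal is taken as an array
 ?column?
-----------
 {1,2,3,4}

SELECT ARRAY[1, 2] || '7';                 -- so is this one
ERROR:  malformed array literal: "7"

SELECT ARRAY[1, 2] || NULL;                -- so is an undecorated NULL
 ?column?
----------
 {1,2}

SELECT array_append(ARRAY[1, 2], NULL);    -- this might have been meant
 array_append
--------------
 {1,2,NULL}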
In the examples above, the parser sees an integer array on one side of the concatenation operator, and a constant of undetermined type on the other. The heuristic it uses to resolve the constant's type is to assume it's of the same type as the operator's other input; in this case, integer array. So the concatenation operator is presumed to represent array_cat, not array_append. When that's the wrong choice, it could be fixed by casting the constant to the array's element type; but explicit use of array_append might be a preferable solution.
To search for a value in an array, each value must be checked. This can be done manually, if you know the size of the array. For example:
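A sketch, assuming the sal_emp table from the running example:

SELECT * FROM sal_emp WHERE pay_by_quarter[1] = 10000 OR
                            pay_by_quarter[2] = 10000 OR
                            pay_by_quarter[3] = 10000 OR
                            pay_by_quarter[4] = 10000;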
However, this quickly becomes tedious for large arrays, and is not helpful if the size of the array is unknown. An alternative method is described in Section 9.23. The above query could be replaced by:
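SELECT * FROM sal_emp WHERE 10000 = ANY (pay_by_quarter);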
In addition, you can find rows where the array has all values equal to 10000 with:
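SELECT * FROM sal_emp WHERE 10000 = ALL (pay_by_quarter);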
Alternatively, the generate_subscripts function can be used. For example:
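SELECT * FROM
   (SELECT pay_by_quarter,
           generate_subscripts(pay_by_quarter, 1) AS s
      FROM sal_emp) AS foo
 WHERE pay_by_quarter[s] = 10000;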
This function is described in Table 9.62.
You can also search an array using the && operator, which checks whether the left operand overlaps with the right operand. For instance:
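SELECT * FROM sal_emp WHERE pay_by_quarter && ARRAY[10000];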
This and other array operators are further described in Section 9.18. A search using && can be accelerated by an appropriate index, as described in Section 11.2.
You can also search for specific values in an array using the array_position and array_positions functions. The former returns the subscript of the first occurrence of a value in an array; the latter returns an array with the subscripts of all occurrences of the value in the array. For example:
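SELECT array_position(ARRAY['sun','mon','tue','wed','thu','fri','sat'], 'mon');
 array_position
----------------
              2

SELECT array_positions(ARRAY[1, 4, 3, 1, 7], 1);
 array_positions
-----------------
 {1,4}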
Arrays are not sets; searching for specific array elements can be a sign of database misdesign. Consider using a separate table with a row for each item that would be an array element. This will be easier to search, and is likely to scale better for a large number of elements.
The external text representation of an array value consists of items that are interpreted according to the I/O conversion rules for the array's element type, plus decoration that indicates the array structure. The decoration consists of curly braces ({ and }) around the array value plus delimiter characters between adjacent items. The delimiter character is usually a comma (,) but can be something else: it is determined by the typdelim setting for the array's element type. Among the standard data types provided in the PostgreSQL distribution, all use a comma, except for type box, which uses a semicolon (;). In a multidimensional array, each dimension (row, plane, cube, etc.) gets its own level of curly braces, and delimiters must be written between adjacent curly-braced entities of the same level.

The array output routine will put double quotes around element values if they are empty strings, contain curly braces, delimiter characters, double quotes, backslashes, or white space, or match the word NULL. Double quotes and backslashes embedded in element values will be backslash-escaped. For numeric data types it is safe to assume that double quotes will never appear, but for textual data types one should be prepared to cope with either the presence or absence of quotes.
By default, the lower bound index value of an array's dimensions is set to one. To represent arrays with other lower bounds, the array subscript ranges can be specified explicitly before writing the array contents. This decoration consists of square brackets ([]) around each array dimension's lower and upper bounds, with a colon (:) delimiter character in between. The array dimension decoration is followed by an equal sign (=). For example:
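SELECT f1[1][-2][3] AS e1, f1[1][-1][5] AS e2
 FROM (SELECT '[1:1][-2:-1][3:5]={{{1,2,3},{4,5,6}}}'::int[] AS f1) AS ss;

 e1 | e2
----+----
  1 |  6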
The array output routine will include explicit dimensions in its result only when there are one or more lower bounds different from one.
If the value written for an element is NULL (in any case variant), the element is taken to be NULL. The presence of any quotes or backslashes disables this and allows the literal string value “NULL” to be entered. Also, for backward compatibility with pre-8.2 versions of PostgreSQL, the array_nulls configuration parameter can be turned off to suppress recognition of NULL as a NULL.

As shown previously, when writing an array value you can use double quotes around any individual array element. You must do so if the element value would otherwise confuse the array-value parser. For example, elements containing curly braces, commas (or the data type's delimiter character), double quotes, backslashes, or leading or trailing whitespace must be double-quoted. Empty strings and strings matching the word NULL must be quoted, too. To put a double quote or backslash in a quoted array element value, precede it with a backslash. Alternatively, you can avoid quotes and use backslash-escaping to protect all data characters that would otherwise be taken as array syntax.
You can add whitespace before a left brace or after a right brace. You can also add whitespace before or after any individual item string. In all of these cases the whitespace will be ignored. However, whitespace within double-quoted elements, or surrounded on both sides by non-whitespace characters of an element, is not ignored.
The ARRAY constructor syntax (see Section 4.2.12) is often easier to work with than the array-literal syntax when writing array values in SQL commands. In ARRAY, individual element values are written the same way they would be written when not members of an array.
Range types are data types representing a range of values of some element type (called the range's subtype). For instance, ranges of timestamp might be used to represent the ranges of time that a meeting room is reserved. In this case the data type is tsrange (short for “timestamp range”), and timestamp is the subtype. The subtype must have a total order so that it is well-defined whether element values are within, before, or after a range of values.

Range types are useful because they can represent many element values in a single range value, and because concepts such as overlapping ranges can be expressed clearly. The use of time and date ranges for scheduling purposes is the clearest example; but price ranges, measurement ranges from an instrument, and so forth can also be useful.

PostgreSQL comes with the following built-in range types:
int4range — Range of integer

int8range — Range of bigint

numrange — Range of numeric

tsrange — Range of timestamp without time zone

tstzrange — Range of timestamp with time zone

daterange — Range of date
In addition, you can define your own range types; see CREATE TYPE for more information.

See Table 9.53 and Table 9.54 for the complete list of operators and functions on range types.
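A few sketches of range usage (the reservation table here is illustrative):

CREATE TABLE reservation (room int, during tsrange);
INSERT INTO reservation VALUES
    (1108, '[2010-01-01 14:30, 2010-01-01 15:30)');

-- Containment
SELECT int4range(10, 20) @> 3;

-- Overlaps
SELECT numrange(11.1, 22.2) && numrange(20.0, 30.0);

-- Extract the upper bound
SELECT upper(int8range(15, 25));

-- Compute the intersection
SELECT int4range(10, 20) * int4range(15, 25);

-- Is the range empty?
SELECT isempty(numrange(1, 5));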
Every non-empty range has two bounds, the lower bound and the upper bound. All points between these values are included in the range. An inclusive bound means that the boundary point itself is included in the range as well, while an exclusive bound means that the boundary point is not included in the range.
In the text form of a range, an inclusive lower bound is represented by “[” while an exclusive lower bound is represented by “(”. Likewise, an inclusive upper bound is represented by “]”, while an exclusive upper bound is represented by “)”. (See Section 8.17.5 for more details.)

The functions lower_inc and upper_inc test the inclusivity of the lower and upper bounds of a range value, respectively.

The lower bound of a range can be omitted, meaning that all values less than the upper bound are included in the range, e.g., (,3]. Likewise, if the upper bound of the range is omitted, then all values greater than the lower bound are included in the range. If both lower and upper bounds are omitted, all values of the element type are considered to be in the range. Specifying a missing bound as inclusive is automatically converted to exclusive, e.g., [,] is converted to (,). You can think of these missing values as +/-infinity, but they are special range type values and are considered to be beyond any range element type's +/-infinity values.
Element types that have the notion of “infinity” can use them as explicit bound values. For example, with timestamp ranges, [today,infinity) excludes the special timestamp value infinity, while [today,infinity] includes it, as do [today,) and [today,].

The functions lower_inf and upper_inf test for infinite lower and upper bounds of a range, respectively.
The input for a range value must follow one of the following patterns:
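(lower-bound,upper-bound)
(lower-bound,upper-bound]
[lower-bound,upper-bound)
[lower-bound,upper-bound]
empty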
The parentheses or brackets indicate whether the lower and upper bounds are exclusive or inclusive, as described previously. Notice that the final pattern is empty, which represents an empty range (a range that contains no points).

The lower-bound may be either a string that is valid input for the subtype, or empty to indicate no lower bound. Likewise, upper-bound may be either a string that is valid input for the subtype, or empty to indicate no upper bound.

Each bound value can be quoted using " (double quote) characters. This is necessary if the bound value contains parentheses, brackets, commas, double quotes, or backslashes, since these characters would otherwise be taken as part of the range syntax. To put a double quote or backslash in a quoted bound value, precede it with a backslash. (Also, a pair of double quotes within a double-quoted bound value is taken to represent a double quote character, analogously to the rules for single quotes in SQL literal strings.) Alternatively, you can avoid quoting and use backslash-escaping to protect all data characters that would otherwise be taken as range syntax. Also, to write a bound value that is an empty string, write "", since writing nothing means an infinite bound.
Whitespace is allowed before and after the range value, but any whitespace between the parentheses or brackets is taken as part of the lower or upper bound value. (Depending on the element type, it might or might not be significant.)
These rules are very similar to those for writing field values in composite-type literals. See Section 8.16.6 for additional commentary.
Examples:
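-- includes 3, does not include 7, and does include all points in between
SELECT '[3,7)'::int4range;

-- does not include either 3 or 7, but includes all points in between
SELECT '(3,7)'::int4range;

-- includes only the single point 4
SELECT '[4,4]'::int4range;

-- includes no points (and will be normalized to 'empty')
SELECT '[4,4)'::int4range;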
Each range type has a constructor function with the same name as the range type. Using the constructor function is frequently more convenient than writing a range literal constant, since it avoids the need for extra quoting of the bound values. The constructor function accepts two or three arguments. The two-argument form constructs a range in standard form (lower bound inclusive, upper bound exclusive), while the three-argument form constructs a range with bounds of the form specified by the third argument. The third argument must be one of the strings “()”, “(]”, “[)”, or “[]”. For example:
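-- The full form is: lower bound, upper bound, and a text argument indicating
-- inclusivity/exclusivity of the bounds.
SELECT numrange(1.0, 14.0, '(]');

-- If the third argument is omitted, '[)' is assumed.
SELECT numrange(1.0, 14.0);

-- Although '(]' is specified here, on display the value will be converted to
-- canonical form, since int8range is a discrete range type (see below).
SELECT int8range(1, 14, '(]');

-- Using NULL for either bound causes the range to be unbounded on that side.
SELECT numrange(NULL, 2.2);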
A discrete range is one whose element type has a well-defined “step”, such as integer or date. In these types two elements can be said to be adjacent, when there are no valid values between them. This contrasts with continuous ranges, where it's always (or almost always) possible to identify other element values between two given values. For example, a range over the numeric type is continuous, as is a range over timestamp. (Even though timestamp has limited precision, and so could theoretically be treated as discrete, it's better to consider it continuous since the step size is normally not of interest.)

Another way to think about a discrete range type is that there is a clear idea of a “next” or “previous” value for each element value. Knowing that, it is possible to convert between inclusive and exclusive representations of a range's bounds, by choosing the next or previous element value instead of the one originally given. For example, in an integer range type [4,8] and (3,9) denote the same set of values; but this would not be so for a range over numeric.
A discrete range type should have a canonicalization function that is aware of the desired step size for the element type. The canonicalization function is charged with converting equivalent values of the range type to have identical representations, in particular consistently inclusive or exclusive bounds. If a canonicalization function is not specified, then ranges with different formatting will always be treated as unequal, even though they might represent the same set of values in reality.
The built-in range types int4range, int8range, and daterange all use a canonical form that includes the lower bound and excludes the upper bound; that is, [). User-defined range types can use other conventions, however.
Users can define their own range types. The most common reason to do this is to use ranges over subtypes not provided among the built-in range types. For example, to define a new range type of subtype float8:
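CREATE TYPE floatrange AS RANGE (
    subtype = float8,
    subtype_diff = float8mi
);

SELECT '[1.234, 5.678]'::floatrange;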
Because float8 has no meaningful “step”, we do not define a canonicalization function in this example.
Defining your own range type also allows you to specify a different subtype B-tree operator class or collation to use, so as to change the sort ordering that determines which values fall into a given range.
If the subtype is considered to have discrete rather than continuous values, the CREATE TYPE command should specify a canonical function. The canonicalization function takes an input range value, and must return an equivalent range value that may have different bounds and formatting. The canonical output for two ranges that represent the same set of values, for example the integer ranges [1, 7] and [1, 8), must be identical. It doesn't matter which representation you choose to be the canonical one, so long as two equivalent values with different formattings are always mapped to the same value with the same formatting. In addition to adjusting the inclusive/exclusive bounds format, a canonicalization function might round off boundary values, in case the desired step size is larger than what the subtype is capable of storing. For instance, a range type over timestamp could be defined to have a step size of an hour, in which case the canonicalization function would need to round off bounds that weren't a multiple of an hour, or perhaps throw an error instead.
In addition, any range type that is meant to be used with GiST or SP-GiST indexes should define a subtype difference, or subtype_diff, function. (The index will still work without subtype_diff, but it is likely to be considerably less efficient than if a difference function is provided.) The subtype difference function takes two input values of the subtype, and returns their difference (i.e., X minus Y) represented as a float8 value. In our example above, the function float8mi that underlies the regular float8 minus operator can be used; but for any other subtype, some type conversion would be necessary. Some creative thought about how to represent differences as numbers might be needed, too. To the greatest extent possible, the subtype_diff function should agree with the sort ordering implied by the selected operator class and collation; that is, its result should be positive whenever its first argument is greater than its second according to the sort ordering.

A less-oversimplified example of a subtype_diff function is:
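CREATE FUNCTION time_subtype_diff(x time, y time) RETURNS float8 AS
'SELECT EXTRACT(EPOCH FROM (x - y))' LANGUAGE sql STRICT IMMUTABLE;

CREATE TYPE timerange AS RANGE (
    subtype = time,
    subtype_diff = time_subtype_diff
);

SELECT '[11:10, 23:00]'::timerange;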
See CREATE TYPE for more information about creating range types.
GiST and SP-GiST indexes can be created for table columns of range types. For instance, to create a GiST index:
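A sketch, assuming the reservation table from the earlier range examples:

CREATE INDEX reservation_idx ON reservation USING GIST (during);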
A GiST or SP-GiST index can accelerate queries involving these range operators: =, &&, <@, @>, <<, >>, -|-, &<, and &> (see Table 9.53 for more information).

In addition, B-tree and hash indexes can be created for table columns of range types. For these index types, basically the only useful range operation is equality. There is a B-tree sort ordering defined for range values, with corresponding < and > operators, but the ordering is rather arbitrary and not usually useful in the real world. Range types' B-tree and hash support is primarily meant to allow sorting and hashing internally in queries, rather than creation of actual indexes.
While UNIQUE is a natural constraint for scalar values, it is usually unsuitable for range types. Instead, an exclusion constraint is often more appropriate (see CREATE TABLE ... CONSTRAINT ... EXCLUDE). Exclusion constraints allow the specification of constraints such as “non-overlapping” on a range type. For example:
That constraint will prevent any overlapping values from existing in the table at the same time:
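For instance (the exact constraint name and error detail may vary):

INSERT INTO reservation VALUES
    ('[2010-01-01 11:30, 2010-01-01 15:00)');
INSERT 0 1

INSERT INTO reservation VALUES
    ('[2010-01-01 14:45, 2010-01-01 15:45)');
ERROR:  conflicting key value violates exclusion constraint "reservation_during_excl"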
You can use the btree_gist extension to define exclusion constraints on plain scalar data types, which can then be combined with range exclusions for maximum flexibility. For example, after btree_gist is installed, the following constraint will reject overlapping ranges only if the meeting room numbers are equal:
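A sketch, assuming the btree_gist extension is available:

CREATE EXTENSION btree_gist;
CREATE TABLE room_reservation (
    room text,
    during tsrange,
    EXCLUDE USING GIST (room WITH =, during WITH &&)
);

INSERT INTO room_reservation VALUES
    ('123A', '[2010-01-01 14:00, 2010-01-01 15:00)');
INSERT 0 1

INSERT INTO room_reservation VALUES
    ('123A', '[2010-01-01 14:30, 2010-01-01 15:30)');
ERROR:  conflicting key value violates exclusion constraint "room_reservation_room_during_excl"

INSERT INTO room_reservation VALUES
    ('123B', '[2010-01-01 14:30, 2010-01-01 15:30)');
INSERT 0 1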
A domain is a user-defined data type that is based on another underlying type. Optionally, it can have constraints that restrict its valid values to a subset of what the underlying type would allow. Otherwise it behaves like the underlying type; for example, any operator or function that can be applied to the underlying type will work on the domain type. The underlying type can be any built-in or other user-defined base type, enum type, array type, composite type, range type, or another domain.

For example, we could create a domain over integers that accepts only positive integers:
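CREATE DOMAIN posint AS integer CHECK (VALUE > 0);
CREATE TABLE mytable (id posint);
INSERT INTO mytable VALUES (1);   -- works
INSERT INTO mytable VALUES (-1);  -- fails
ERROR:  value for domain posint violates check constraint "posint_check"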
When an operator or function of the underlying type is applied to a domain value, the domain is automatically down-cast to the underlying type. Thus, for example, the result of mytable.id - 1 is considered to be of type integer, not posint. We could write (mytable.id - 1)::posint to cast the result back to posint, causing the domain's constraints to be rechecked; in that case, an error would occur if the expression had been applied to an id value of 1. Assigning a value of the underlying type to a field or variable of the domain type is allowed without writing an explicit cast, but the domain's constraints will be checked.

For additional information see CREATE DOMAIN.

PostgreSQL provides a large number of functions and operators for the built-in data types. Users can also define their own functions and operators, as described in Part V. The psql commands \df and \do can be used to list all available functions and operators, respectively.
The notation used throughout this chapter to describe the argument and result data types of a function or operator is like this:
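repeat ( text, integer ) → text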
which says that the function repeat takes one text and one integer argument and returns a result of type text. The right arrow is also used to indicate the result of an example, thus:
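repeat('Pg', 4) → PgPgPgPg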
If you are concerned about portability, note that most of the functions and operators described in this chapter, with the exception of the most trivial arithmetic and comparison operators and some explicitly marked functions, are not specified by the SQL standard. Some of this extended functionality is present in other SQL database management systems, and in many cases this functionality is compatible and consistent between the various implementations. This chapter may also not be complete; additional functions appear in other relevant sections of the manual.

JSON data types are for storing JSON (JavaScript Object Notation) data, as specified in RFC 7159. Such data can also be stored as text, but the JSON data types have the advantage of enforcing that each stored value is valid according to the JSON rules. There are also assorted JSON-specific functions and operators available for data stored in these data types; see Section 9.15.

PostgreSQL offers two types for storing JSON data: json and jsonb. To implement efficient query mechanisms for these data types, PostgreSQL also provides the jsonpath data type described in Section 8.14.6.

The json and jsonb data types accept almost identical sets of values as input. The major practical difference is one of efficiency. The json data type stores an exact copy of the input text, which processing functions must reparse on each execution; while jsonb data is stored in a decomposed binary format that makes it slightly slower to input due to added conversion overhead, but significantly faster to process, since no reparsing is needed. jsonb also supports indexing, which can be a significant advantage.

Because the json type stores an exact copy of the input text, it will preserve semantically-insignificant white space between tokens, as well as the order of keys within JSON objects. Also, if a JSON object within the value contains the same key more than once, all the key/value pairs are kept. (The processing functions consider the last value as the operative one.) By contrast, jsonb does not preserve white space, does not preserve the order of object keys, and does not keep duplicate object keys. If duplicate keys are specified in the input, only the last value is kept.

In general, most applications should prefer to store JSON data as jsonb, unless there are quite specialized needs, such as legacy assumptions about the ordering of object keys.

PostgreSQL allows only one character set encoding per database. It is therefore not possible for the JSON types to conform rigidly to the JSON specification unless the database encoding is UTF8. Attempts to directly include characters that cannot be represented in the database encoding will fail; conversely, characters that can be represented in the database encoding but not in UTF8 will be allowed.

RFC 7159 permits JSON strings to contain Unicode escape sequences denoted by \uXXXX. In the input function for the json type, Unicode escapes are allowed regardless of the database encoding, and are checked only for syntactic correctness (that is, that four hex digits follow \u). However, the input function for jsonb is stricter: it disallows Unicode escapes for non-ASCII characters (those above U+007F) unless the database encoding is UTF8. The jsonb type also rejects \u0000 (because that cannot be represented in PostgreSQL's text type), and it insists that any use of Unicode surrogate pairs to designate characters outside the Unicode Basic Multilingual Plane be correct. Valid Unicode escapes are converted to the equivalent ASCII or UTF8 character for storage; this includes folding surrogate pairs into a single character.

Many of the JSON processing functions described in Section 9.15 will convert Unicode escapes to regular characters, and will therefore throw the same types of errors just described even if their input is of type json, not jsonb. The fact that the json input function does not make these checks may be considered a historical artifact, although it does allow for simple storage (without processing) of JSON Unicode escapes in a non-UTF8 database encoding. In general, it is best to avoid mixing Unicode escapes in JSON with a non-UTF8 database encoding, if possible.

When converting textual JSON input into jsonb, the primitive types described by RFC 7159 are effectively mapped onto native PostgreSQL types, as shown in Table 8.23. Therefore, there are some minor additional constraints on what constitutes valid jsonb data that do not apply to the json type, nor to JSON in the abstract, corresponding to limits on what can be represented by the underlying data type. Notably, jsonb will reject numbers that are outside the range of the PostgreSQL numeric data type, while json will not. Such implementation-defined restrictions are permitted by RFC 7159. However, in practice such problems are far more likely to occur in other implementations, as it is common to represent JSON's number primitive type as IEEE 754 double precision floating point (which RFC 7159 explicitly anticipates and allows for). When using JSON as an interchange format with such systems, the danger of losing numeric precision compared to data originally stored by PostgreSQL should be considered.

Conversely, as noted in the table below, there are some minor restrictions on the input format of JSON primitive types that do not apply to the corresponding PostgreSQL types.

The input/output syntax for the JSON data types is as specified in RFC 7159.
The following are all valid json (or jsonb) expressions:
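-- Simple scalar/primitive value
-- Primitive values can be numbers, quoted strings, true, false, or null
SELECT '5'::json;

-- Array of zero or more elements (elements need not be of same type)
SELECT '[1, 2, "foo", null]'::json;

-- Object containing pairs of keys and values
-- Note that object keys must always be quoted strings
SELECT '{"bar": "baz", "balance": 7.77, "active": false}'::json;

-- Arrays and objects can be nested arbitrarily
SELECT '{"foo": [true, "bar"], "tags": {"a": 1, "b": null}}'::json;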
As previously stated, when a JSON value is input and then printed without any additional processing, json outputs the same text that was input, while jsonb does not preserve semantically-insignificant details such as whitespace. For example, note the differences here:
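SELECT '{"bar": "baz", "balance": 7.77, "active":false}'::json;
                      json
-------------------------------------------------
 {"bar": "baz", "balance": 7.77, "active":false}

SELECT '{"bar": "baz", "balance": 7.77, "active":false}'::jsonb;
                      jsonb
--------------------------------------------------
 {"bar": "baz", "active": false, "balance": 7.77}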
One semantically-insignificant detail worth noting is that in jsonb, numbers will be printed according to the behavior of the underlying numeric type. In practice this means that numbers entered with E notation will be printed without it, for example:
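SELECT '{"reading": 1.230e-5}'::json, '{"reading": 1.230e-5}'::jsonb;
         json          |          jsonb
-----------------------+-------------------------
 {"reading": 1.230e-5} | {"reading": 0.00001230}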
However, jsonb will preserve trailing fractional zeroes, as seen in this example, even though those are semantically insignificant for purposes such as equality checks.

For the list of built-in functions and operators available for constructing and processing JSON values, see Section 9.15.

Representing data as JSON can be considerably more flexible than the traditional relational data model, which is compelling in environments where requirements are fluid. It is quite possible for both approaches to co-exist and complement each other within the same application. However, even for applications where maximal flexibility is desired, it is still recommended that JSON documents have a somewhat fixed structure. The structure is typically unenforced (though enforcing some business rules declaratively is possible), but having a predictable structure makes it easier to write queries that usefully summarize a set of “documents” (datums) in a table.

JSON data is subject to the same concurrency-control considerations as any other data type when stored in a table. Although storing large documents is practicable, keep in mind that any update acquires a row-level lock on the whole row. Consider limiting JSON documents to a manageable size in order to decrease lock contention among updating transactions. Ideally, JSON documents should each represent an atomic datum that business rules dictate cannot reasonably be further subdivided into smaller datums that could be modified independently.
jsonb Containment and Existence

Testing containment is an important capability of jsonb. There is no parallel set of facilities for the json type. Containment tests whether one jsonb document has contained within it another one. These examples return true except as noted:
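-- Simple scalar/primitive values contain only the identical value:
SELECT '"foo"'::jsonb @> '"foo"'::jsonb;

-- The array on the right side is contained within the one on the left:
SELECT '[1, 2, 3]'::jsonb @> '[1, 3]'::jsonb;

-- The object with a single pair on the right side is contained
-- within the object on the left side:
SELECT '{"product": "PostgreSQL", "version": 9.4, "jsonb": true}'::jsonb @> '{"version": 9.4}'::jsonb;

-- The array on the right side is not considered contained within the
-- array on the left, even though a similar array is nested within it:
SELECT '[1, 2, [1, 3]]'::jsonb @> '[1, 3]'::jsonb;  -- yields false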
The general principle is that the contained object must match the containing object as to structure and data contents, possibly after discarding some non-matching array elements or object key/value pairs from the containing object. But remember that the order of array elements is not significant when doing a containment match, and duplicate array elements are effectively considered only once.
As a special exception to the general principle that the structures must match, an array may contain a primitive value:
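-- This array contains the primitive string value:
SELECT '["foo", "bar"]'::jsonb @> '"bar"'::jsonb;

-- This exception is not reciprocal -- non-containment is reported here:
SELECT '"bar"'::jsonb @> '["bar"]'::jsonb;  -- yields false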
jsonb also has an existence operator, which is a variation on the theme of containment: it tests whether a string (given as a text value) appears as an object key or array element at the top level of the jsonb value. These examples return true except as noted:
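-- String exists as array element:
SELECT '["foo", "bar", "baz"]'::jsonb ? 'bar';

-- String exists as object key:
SELECT '{"foo": "bar"}'::jsonb ? 'foo';

-- Object values are not considered:
SELECT '{"foo": "bar"}'::jsonb ? 'bar';  -- yields false

-- As with containment, existence must match at the top level:
SELECT '{"foo": {"bar": "baz"}}'::jsonb ? 'bar';  -- yields false

-- A string is considered to exist if it matches a primitive JSON string:
SELECT '"foo"'::jsonb ? 'foo';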
JSON objects are better suited than arrays for testing containment or existence when there are many keys or elements involved, because unlike arrays they are internally optimized for searching, and do not need to be searched linearly.
Because JSON containment is nested, an appropriate query can skip explicit selection of sub-objects. As an example, suppose that we have a doc column containing objects at the top level, with most objects containing tags fields that contain arrays of sub-objects. This query finds entries in which sub-objects containing both "term":"paris" and "term":"food" appear, while ignoring any such keys outside the tags array:
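A sketch (the websites table and its doc column are illustrative):

SELECT doc->'site_name' FROM websites
  WHERE doc @> '{"tags":[{"term":"paris"}, {"term":"food"}]}';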
One could accomplish the same thing with, say,
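SELECT doc->'site_name' FROM websites
  WHERE doc->'tags' @> '[{"term":"paris"}, {"term":"food"}]';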
but that approach is less flexible, and often less efficient as well.

On the other hand, the JSON existence operator is not nested: it will only look for the specified key or array element at the top level of the JSON value.

The various containment and existence operators, along with all other JSON operators and functions, are documented in Section 9.15.
jsonb Indexing

GIN indexes can be used to efficiently search for keys or key/value pairs occurring within a large number of jsonb documents (datums). Two GIN “operator classes” are provided, offering different performance and flexibility trade-offs.
The default GIN operator class for jsonb supports queries with the top-level key-exists operators ?, ?& and ?|, and the path/value-exists operator @>. (For details of the semantics that these operators implement, see Table 9.45.) An example of creating an index with this operator class is:
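CREATE INDEX idxgin ON api USING GIN (jdoc);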
The non-default GIN operator class jsonb_path_ops supports indexing the @> operator only. An example of creating an index with this operator class is:
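CREATE INDEX idxginp ON api USING GIN (jdoc jsonb_path_ops);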
Consider the example of a table that stores JSON documents retrieved from a third-party web service, with a documented schema definition. A typical document is:
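One plausible shape for such a document (the field values are illustrative):

{
    "guid": "9c36adc1-7fb5-4d5b-83b4-90356a46061a",
    "name": "Angela Barton",
    "is_active": true,
    "company": "Magnafone",
    "address": "178 Howard Place, Gulf, Washington, 702",
    "registered": "2009-11-07T08:53:22 +08:00",
    "latitude": 19.793713,
    "longitude": 86.513373,
    "tags": [
        "enim",
        "aliquip",
        "qui"
    ]
}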
We store these documents in a table named api, in a jsonb column named jdoc. If a GIN index is created on this column, queries like the following can make use of the index:
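-- Find documents in which the key "company" has value "Magnafone"
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"company": "Magnafone"}';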
However, the index could not be used for queries like the following: although the operator ? is indexable, it is not applied directly to the indexed column jdoc:
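-- Find documents in which the key "tags" contains key or array element "qui"
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc->'tags' ? 'qui';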
Still, with appropriate use of expression indexes, the above query can use an index. If querying for particular items within the "tags" key is common, defining an index like this may be worthwhile:
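CREATE INDEX idxgintags ON api USING GIN ((jdoc->'tags'));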
Now, the WHERE clause jdoc->'tags' ? 'qui' will be recognized as an application of the indexable operator ? to the indexed expression jdoc->'tags'. (More information on expression indexes can be found in Section 11.7.)
Also, GIN indexes support the @@ and @? operators, which perform jsonpath matching.
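-- Find documents in which the key "tags" contains array element "qui"
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @@ '$.tags[*] == "qui"';
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @? '$.tags[*] ? (@ == "qui")';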
A GIN index extracts statements of the following form out of a jsonpath: accessors_chain = const. The accessors chain may consist of .key, [*], and [index] accessors. jsonb_ops additionally supports the .* and .** accessors.
Another approach to querying is to exploit containment, for example:
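-- Find documents in which the key "tags" contains array element "qui"
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"tags": ["qui"]}';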
A simple GIN index on the jdoc column can support this query. But note that such an index will store copies of every key and value in the jdoc column, whereas the expression index of the previous example stores only data found under the tags key. While the simple-index approach is far more flexible (since it supports queries about any key), targeted expression indexes are likely to be smaller and faster to search than a simple index.

Although the jsonb_path_ops operator class supports only queries with the @>, @@ and @? operators, it has notable performance advantages over the default operator class jsonb_ops. A jsonb_path_ops index is usually much smaller than a jsonb_ops index over the same data, and the specificity of searches is better, particularly when queries contain keys that appear frequently in the data. Therefore search operations typically perform better than with the default operator class.
The technical difference between a jsonb_ops and a jsonb_path_ops GIN index is that the former creates independent index items for each key and value in the data, while the latter creates index items only for each value in the data. [6] Basically, each jsonb_path_ops index item is a hash of the value and the key(s) leading to it; for example to index {"foo": {"bar": "baz"}}, a single index item would be created incorporating all three of foo, bar, and baz into the hash value. Thus a containment query looking for this structure would result in an extremely specific index search; but there is no way at all to find out whether foo appears as a key. On the other hand, a jsonb_ops index would create three index items representing foo, bar, and baz separately; then to do the containment query, it would look for rows containing all three of these items. While GIN indexes can perform such an AND search fairly efficiently, it will still be less specific and slower than the equivalent jsonb_path_ops search, especially if there are a very large number of rows containing any single one of the three index items.

A disadvantage of the jsonb_path_ops approach is that it produces no index entries for JSON structures not containing any values, such as {"a": {}}. If a search for documents containing such a structure is requested, it will require a full-index scan, which is quite slow. jsonb_path_ops is therefore ill-suited for applications that often perform such searches.
jsonb also supports btree and hash indexes. These are usually useful only if it's important to check equality of complete JSON documents. The btree ordering for jsonb datums is seldom of great interest, but for completeness it is:
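Object > Array > Boolean > Number > String > Null

Object with n pairs > object with n - 1 pairs

Array with n elements > array with n - 1 elements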
Objects with equal numbers of pairs are compared in the order:
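key-1, value-1, key-2 ...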
Note that object keys are compared in their storage order; in particular, since shorter keys are stored before longer keys, this can lead to results that might be unintuitive, such as:
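{ "aa": 1, "c": 1} > {"b": 1, "d": 1}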
Similarly, arrays with equal numbers of elements are compared in the order:
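element-1, element-2 ...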
Primitive JSON values are compared using the same comparison rules as for the underlying PostgreSQL data type. Strings are compared using the default database collation.
Additional extensions are available that implement transforms for the jsonb type for various procedural languages.

The extensions for PL/Perl are called jsonb_plperl and jsonb_plperlu. If you use them, jsonb values are mapped to Perl arrays, hashes, and scalars, as appropriate.

The extensions for PL/Python are called jsonb_plpythonu, jsonb_plpython2u, and jsonb_plpython3u (see Section 45.1 for the PL/Python naming convention). If you use them, jsonb values are mapped to Python dictionaries, lists, and scalars, as appropriate.

The jsonpath type implements support for the SQL/JSON path language in PostgreSQL to efficiently query JSON data. It provides a binary representation of the parsed SQL/JSON path expression that specifies the items to be retrieved by the path engine from the JSON data for further processing with the SQL/JSON query functions.

The semantics of SQL/JSON path predicates and operators generally follow SQL. At the same time, to provide a more natural way of working with JSON data, SQL/JSON path syntax uses some JavaScript conventions:
A dot (.) is used for member access.

Square brackets ([]) are used for array access.

SQL/JSON arrays are 0-relative, unlike regular SQL arrays that start from 1.
An SQL/JSON path expression is typically written in an SQL query as an SQL character string literal, so it must be enclosed in single quotes, and any single quotes desired within the value must be doubled (see Section 4.1.2.1). Some forms of path expressions require string literals within them. These embedded string literals follow JavaScript/ECMAScript conventions: they must be surrounded by double quotes, and backslash escapes may be used within them to represent otherwise-hard-to-type characters. In particular, the way to write a double quote within an embedded string literal is \", and to write a backslash itself, you must write \\. Other special backslash sequences include those recognized in JSON strings: \b, \f, \n, \r, \t, \v for various ASCII control characters, and \uNNNN for a Unicode character identified by its 4-hex-digit code point. The backslash syntax also includes two cases not allowed by JSON: \xNN for a character code written with only two hex digits, and \u{N...} for a character code written with 1 to 6 hex digits.
A path expression consists of a sequence of path elements, which can be the following:
Path literals of JSON primitive types: Unicode text, numeric, true, false, or null.
Path variables listed in Table 8.24.
Accessor operators listed in Table 8.25.
jsonpath operators and methods listed in Section 9.15.2.3.
Parentheses, which can be used to provide filter expressions or define the order of path evaluation.
For details on using jsonpath expressions with SQL/JSON query functions, see Section 9.15.2.
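A brief sketch of a jsonpath query (the data values are illustrative):

SELECT jsonb_path_query(
    '{"track": {"segments": [{"HR": 73}, {"HR": 135}]}}',
    '$.track.segments[*].HR ? (@ > 130)'
);

 jsonb_path_query
------------------
 135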
[6] For this purpose, the term “value” includes array elements, though JSON terminology sometimes considers array elements distinct from values within objects.
The PostgreSQL type system contains a number of special-purpose entries that are collectively called pseudo-types. A pseudo-type cannot be used as a column data type, but it can be used to declare a function's argument or result type. Each of the available pseudo-types is useful in situations where a function's behavior does not correspond to simply taking or returning a value of a specific SQL data type. Table 8.27 lists the existing pseudo-types.
Functions coded in C (whether built-in or dynamically loaded) can be declared to accept or return any of these pseudo data types. It is up to the function author to ensure that the function will behave safely when a pseudo-type is used as an argument type.
Functions coded in procedural languages can use pseudo-types only as allowed by their implementation languages. At present most procedural languages forbid use of a pseudo-type as an argument type, and allow only void and record as a result type (plus trigger or event_trigger when the function is used as a trigger or event trigger). Some also support polymorphic functions using the types anyelement, anyarray, anynonarray, anyenum, and anyrange.

The internal pseudo-type is used to declare functions that are meant only to be called internally by the database system, and not by direct invocation in an SQL query. If a function has at least one internal-type argument then it cannot be called from SQL. To preserve the type safety of this restriction it is important to follow this coding rule: do not create any function that is declared to return internal unless it has at least one internal argument.
Object identifiers (OIDs) are used internally by PostgreSQL as primary keys for various system tables. Type oid represents an object identifier. There are also several alias types for oid: regproc, regprocedure, regoper, regoperator, regclass, regtype, regrole, regnamespace, regconfig, and regdictionary. Table 8.26 shows an overview.

The oid type is currently implemented as an unsigned four-byte integer. Therefore, it is not large enough to provide database-wide uniqueness in large databases, or even in large individual tables.

The oid type itself has few operations beyond comparison. It can be cast to integer, however, and then manipulated using the standard integer operators. (Beware of possible signed-versus-unsigned confusion if you do this.)
The OID alias types have no operations of their own except for specialized input and output routines. These routines are able to accept and display symbolic names for system objects, rather than the raw numeric value that type oid would use. The alias types allow simplified lookup of OID values for objects. For example, to examine the pg_attribute rows related to a table mytable, one could write:
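SELECT * FROM pg_attribute WHERE attrelid = 'mytable'::regclass;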
rather than:
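SELECT * FROM pg_attribute
  WHERE attrelid = (SELECT oid FROM pg_class WHERE relname = 'mytable');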
While that doesn't look all that bad by itself, it's still oversimplified. A far more complicated sub-select would be needed to select the right OID if there are multiple tables named mytable in different schemas. The regclass input converter handles the table lookup according to the schema path setting, and so it does the “right thing” automatically. Similarly, casting a table's OID to regclass is handy for symbolic display of a numeric OID.

All of the OID alias types for objects grouped by namespace accept schema-qualified names, and will display schema-qualified names on output if the object would not be found in the current search path without being qualified. The regproc and regoper alias types will only accept input names that are unique (not overloaded), so they are of limited use; for most uses regprocedure or regoperator are more appropriate. For regoperator, unary operators are identified by writing NONE for the unused operand.

An additional property of most of the OID alias types is the creation of dependencies. If a constant of one of these types appears in a stored expression (such as a column default expression or view), it creates a dependency on the referenced object. For example, if a column has a default expression nextval('my_seq'::regclass), PostgreSQL understands that the default expression depends on the sequence my_seq; the system will not let the sequence be dropped without first removing the default expression. regrole is the only exception to this property; constants of this type are not allowed in such expressions.
The OID alias types do not completely follow transaction isolation rules. The planner also treats them as simple constants, which may result in sub-optimal planning.
Another identifier type used by the system is xid, or transaction (abbreviated xact) identifier. This is the data type of the system columns xmin and xmax. Transaction identifiers are 32-bit quantities.

A third identifier type used by the system is cid, or command identifier. This is the data type of the system columns cmin and cmax. Command identifiers are also 32-bit quantities.

A final identifier type used by the system is tid, or tuple identifier (row identifier). This is the data type of the system column ctid. A tuple ID is a pair (block number, tuple index within block) that identifies the physical location of the row within its table.
(The system columns are further explained in Section 5.5.)
This section describes functions and operators for examining and manipulating values of type bytea.
SQL defines some string functions that use key words, rather than commas, to separate arguments. Details are in Table 9.12. PostgreSQL also provides versions of these functions that use the regular function invocation syntax (see Table 9.13).
The sample results shown on this page assume that the server parameter bytea_output is set to escape (the traditional PostgreSQL format).
Additional binary string manipulation functions are available and are listed in Table 9.13. Some of them are used internally to implement the SQL-standard string functions listed in Table 9.12.
get_byte and set_byte number the first byte of a binary string as byte 0. get_bit and set_bit number bits from the right within each byte; for example bit 0 is the least significant bit of the first byte, and bit 15 is the most significant bit of the second byte.

Note that for historic reasons, the function md5 returns a hex-encoded value of type text whereas the SHA-2 functions return type bytea. Use the functions encode and decode to convert between the two, for example encode(sha256('abc'), 'hex') to get a hex-encoded text representation.

See also the aggregate function string_agg in Section 9.20 and the large object functions in Section 34.4.
This section describes functions and operators for examining and manipulating bit strings, that is values of the types bit and bit varying. Aside from the usual comparison operators, the operators shown in Table 9.14 can be used. Bit string operands of &, |, and # must be of equal length. When bit shifting, the original length of the string is preserved, as shown in the examples.

The following SQL-standard functions work on bit strings as well as character strings: length, bit_length, octet_length, position, substring, overlay.

The following functions work on bit strings as well as binary strings: get_bit, set_bit. When working with a bit string, these functions number the first (leftmost) bit of the string as bit 0.
In addition, it is possible to cast integral values to and from type bit. Some examples:
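SELECT 44::bit(10);                  -- 0000101100
SELECT 44::bit(3);                   -- 100
SELECT cast(-44 as bit(12));         -- 111111010100
SELECT '1110'::bit(4)::integer;      -- 14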
Note that casting to just “bit” means casting to bit(1), and so will deliver only the least significant bit of the integer.

Casting an integer to bit(n) copies the rightmost n bits. Casting an integer to a bit string width wider than the integer itself will sign-extend on the left.
The usual logical operators are available: AND, OR, NOT.

SQL uses a three-valued logic system with true, false, and null, where null represents “unknown”. Observe the following truth tables:
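 a     | b     | a AND b | a OR b
-------+-------+---------+--------
 TRUE  | TRUE  | TRUE    | TRUE
 TRUE  | FALSE | FALSE   | TRUE
 TRUE  | NULL  | NULL    | TRUE
 FALSE | FALSE | FALSE   | FALSE
 FALSE | NULL  | FALSE   | NULL
 NULL  | NULL  | NULL    | NULL

 a     | NOT a
-------+-------
 TRUE  | FALSE
 FALSE | TRUE
 NULL  | NULL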
The operators AND and OR are commutative, that is, you can switch the left and right operands without affecting the result. See Section 4.2.14 for more information about the order of evaluation of subexpressions.

This section describes the mathematical operators available in PostgreSQL. For types without standard mathematical conventions (e.g., date/time types), we describe the actual behavior in subsequent sections.

Table 9.4 shows the available mathematical operators.

The bitwise operators work only on integral data types, and are also available for the bit string types bit and bit varying, as shown in Table 9.14.

Table 9.5 shows the available mathematical functions. In the table, dp indicates double precision. Many of these functions are provided in multiple forms with different argument types. Except where noted, any given form of a function returns the same data type as its argument. The functions working with double precision data are mostly implemented on top of the host system's C library; accuracy and behavior in boundary cases can therefore vary depending on the host system.
Table 9.6 shows functions for generating random numbers.
The random() function uses a simple linear congruential algorithm. It is fast but not suitable for cryptographic applications; see the pgcrypto module for a more secure alternative. If setseed() is called, the results of subsequent random() calls in the current session are repeatable by re-issuing setseed() with the same argument.
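A minimal sketch of a repeatable random sequence:

SELECT setseed(0.5);
SELECT random();   -- some value x with 0 <= x < 1
SELECT random();   -- the next value from the same seeded sequence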
Table 9.7 shows the available trigonometric functions. All these functions take arguments and return values of type double precision. Each of the trigonometric functions comes in two variants, one that measures angles in radians and one that measures angles in degrees.

Another way to work with angles measured in degrees is to use the unit transformation functions radians() and degrees() shown earlier. However, using the degree-based trigonometric functions is preferred, as that way avoids round-off error for special cases such as sind(30).

Table 9.8 shows the available hyperbolic functions. All these functions take arguments and return values of type double precision.
This section describes functions and operators for examining and manipulating string values. Strings in this context include values of the types character, character varying, and text. Unless otherwise noted, all of the functions listed below work on all of these types, but be wary of the potential effects of automatic space-padding when using the character type. Some functions also exist natively for the bit-string types.

SQL defines some string functions that use key words, rather than commas, to separate arguments. Details are in Table 9.9. PostgreSQL also provides versions of these functions that use the regular function invocation syntax (see Table 9.10).

Before PostgreSQL 8.3, these functions would also silently accept values of several non-string data types, due to the presence of implicit coercions from those data types to text. Those coercions have been removed because they frequently caused surprising behaviors. However, the string concatenation operator (||) still accepts non-string input, so long as at least one input is of a string type, as shown in Table 9.9. For other cases, insert an explicit coercion to text if you need to duplicate the previous behavior.

Additional string manipulation functions are available and are listed in Table 9.10. Some of them are used internally to implement the SQL-standard string functions listed in Table 9.9.

The concat, concat_ws and format functions are variadic, so it is possible to pass the values to be concatenated or formatted as an array marked with the VARIADIC keyword (see Section 37.5.5). The array's elements are treated as if they were separate ordinary arguments to the function. If the variadic array argument is NULL, concat and concat_ws return NULL, but format treats a NULL as a zero-element array.

See also the aggregate function string_agg in Section 9.20.

The conversion names follow a standard naming scheme: the official name of the source encoding with all non-alphanumeric characters replaced by underscores, followed by _to_, followed by the similarly processed destination encoding name. Hence, the names might deviate from the customary encoding names.
format

The function format produces output formatted according to a format string, in a style similar to the C function sprintf.
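The synopsis is:

format(formatstr text [, formatarg "any" [, ...] ])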
formatstr is a format string that specifies how the result should be formatted. Text in the format string is copied directly to the result, except where format specifiers are used. Format specifiers act as placeholders in the string, defining how subsequent function arguments should be formatted and inserted into the result. Each formatarg argument is converted to text according to the usual output rules for its data type, and then formatted and inserted into the result string according to the format specifier(s).

Format specifiers are introduced by a % character and have the form
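%[position][flags][width]type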
where the component fields are:

position (optional)

A string of the form n$ where n is the index of the argument to print. Index 1 means the first argument after formatstr. If the position is omitted, the default is to use the next argument in sequence.

flags (optional)

Additional options controlling how the format specifier's output is formatted. Currently the only supported flag is a minus sign (-) which will cause the format specifier's output to be left-justified. This has no effect unless the width field is also specified.

width (optional)

Specifies the minimum number of characters to use to display the format specifier's output. The output is padded on the left or right (depending on the - flag) with spaces as needed to fill the width. A too-small width does not cause truncation of the output, but is simply ignored. The width may be specified using any of the following: a positive integer; an asterisk (*) to use the next function argument as the width; or a string of the form *n$ to use the nth function argument as the width.

If the width comes from a function argument, that argument is consumed before the argument that is used for the format specifier's value. If the width argument is negative, the result is left aligned (as if the - flag had been specified) within a field of length abs(width).

type (required)

The type of format conversion to use to produce the format specifier's output. The following types are supported:

s formats the argument value as a simple string. A null value is treated as an empty string.

I treats the argument value as an SQL identifier, double-quoting it if necessary. It is an error for the value to be null (equivalent to quote_ident).

L quotes the argument value as an SQL literal. A null value is displayed as the string NULL, without quotes (equivalent to quote_nullable).

In addition to the format specifiers described above, the special sequence %% may be used to output a literal % character.
Here are some examples of the basic format conversions:
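SELECT format('Hello %s', 'World');
Result: Hello World

SELECT format('Testing %s, %s, %s, %%', 'one', 'two', 'three');
Result: Testing one, two, three, %

SELECT format('INSERT INTO %I VALUES(%L)', 'Foo bar', E'O\'Reilly');
Result: INSERT INTO "Foo bar" VALUES('O''Reilly')

SELECT format('INSERT INTO %I VALUES(%L)', 'locations', 'C:\Program Files');
Result: INSERT INTO locations VALUES('C:\Program Files')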
Here are examples using width fields and the - flag:
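SELECT format('|%10s|', 'foo');
Result: |       foo|

SELECT format('|%-10s|', 'foo');
Result: |foo       |

SELECT format('|%*s|', 10, 'foo');
Result: |       foo|

SELECT format('|%*s|', -10, 'foo');
Result: |foo       |

SELECT format('|%-*s|', 10, 'foo');
Result: |foo       |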
These examples show use of position fields:
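SELECT format('Testing %3$s, %2$s, %1$s', 'one', 'two', 'three');
Result: Testing three, two, one

SELECT format('|%*2$s|', 'foo', 10, 'bar');
Result: |       bar|

SELECT format('|%1$*2$s|', 'foo', 10, 'bar');
Result: |       foo|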
Unlike the standard C function sprintf, PostgreSQL's format function allows format specifiers with and without position fields to be mixed in the same format string. A format specifier without a position field always uses the next argument after the last argument consumed. In addition, the format function does not require all function arguments to be used in the format string. For example:
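SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
Result: Testing three, two, three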
The %I and %L format specifiers are particularly useful for safely constructing dynamic SQL statements. See Example 42.1.
The usual comparison operators are available, as shown in Table 9.1.
The != operator is converted to <> in the parser stage. It is not possible to implement != and <> operators that do different things.

Comparison operators are available for all relevant data types. All comparison operators are binary operators that return values of type boolean; expressions like 1 < 2 < 3 are not valid (because there is no < operator to compare a Boolean value with 3).
There are also some comparison predicates, as shown in Table 9.2. These behave much like operators, but have special syntax mandated by the SQL standard.
The BETWEEN predicate simplifies range tests:
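a BETWEEN x AND y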
is equivalent to
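a >= x AND a <= y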
Notice that BETWEEN treats the endpoint values as included in the range. NOT BETWEEN does the opposite comparison:
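a NOT BETWEEN x AND y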
is equivalent to
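a < x OR a > y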
BETWEEN SYMMETRIC is like BETWEEN except there is no requirement that the argument to the left of AND be less than or equal to the argument on the right. If it is not, those two arguments are automatically swapped, so that a nonempty range is always implied.

Ordinary comparison operators yield null (signifying “unknown”), not true or false, when either input is null. For example, 7 = NULL yields null, as does 7 <> NULL. When this behavior is not suitable, use the IS [ NOT ] DISTINCT FROM predicates:
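a IS DISTINCT FROM b
a IS NOT DISTINCT FROM b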
For non-null inputs, IS DISTINCT FROM is the same as the <> operator. However, if both inputs are null it returns false, and if only one input is null it returns true. Similarly, IS NOT DISTINCT FROM is identical to = for non-null inputs, but it returns true when both inputs are null, and false when only one input is null. Thus, these predicates effectively act as though null were a normal data value, rather than “unknown”.
To check whether a value is or is not null, use the predicates:
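expression IS NULL
expression IS NOT NULL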
or the equivalent, but nonstandard, predicates:
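expression ISNULL
expression NOTNULL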
Do not write expression = NULL because NULL is not “equal to” NULL. (The null value represents an unknown value, and it is not known whether two unknown values are equal.)

Some applications might expect that expression = NULL returns true if expression evaluates to the null value. It is highly recommended that these applications be modified to comply with the SQL standard. However, if that cannot be done the transform_null_equals configuration variable is available. If it is enabled, PostgreSQL will convert x = NULL clauses to x IS NULL.

If the expression is row-valued, then IS NULL is true when the row expression itself is null or when all the row's fields are null, while IS NOT NULL is true when the row expression itself is non-null and all the row's fields are non-null. Because of this behavior, IS NULL and IS NOT NULL do not always return inverse results for row-valued expressions; in particular, a row-valued expression that contains both null and non-null fields will return false for both tests. In some cases, it may be preferable to write row IS DISTINCT FROM NULL or row IS NOT DISTINCT FROM NULL, which will simply check whether the overall row value is null without any additional tests on the row fields.
Boolean values can also be tested using the predicates
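boolean_expression IS TRUE
boolean_expression IS NOT TRUE
boolean_expression IS FALSE
boolean_expression IS NOT FALSE
boolean_expression IS UNKNOWN
boolean_expression IS NOT UNKNOWN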
These will always return true or false, never a null value, even when the operand is null. A null input is treated as the logical value “unknown”. Notice that IS UNKNOWN and IS NOT UNKNOWN are effectively the same as IS NULL and IS NOT NULL, respectively, except that the input expression must be of Boolean type.
Some comparison-related functions are also available, as shown in Table 9.3.
For enum types (described in Section 8.7), there are several functions that allow cleaner programming without hard-coding particular values of an enum type. These are listed in Table 9.32. The examples assume an enum type created as:
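CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple');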
Table 9.32. Enum Support Functions
Notice that except for the two-argument form of enum_range, these functions disregard the specific value passed to them; they care only about its declared data type. Either null or a specific value of the type can be passed, with the same result. It is more common to apply these functions to a table column or function argument than to a hardwired type name as suggested by the examples.

Table 9.36 shows the operators available for the cidr and inet types. The operators <<, <<=, >>, >>=, and && test for subnet inclusion. They consider only the network parts of the two addresses (ignoring any host part) and determine whether one network is identical to or a subnet of the other.
Table 9.36. cidr and inet Operators

Table 9.37. cidr and inet Functions
Any cidr value can be cast to inet implicitly or explicitly; therefore, the functions shown above as operating on inet also work on cidr values. (Where there are separate functions for inet and cidr, it is because the behavior should be different for the two cases.) Also, it is permitted to cast an inet value to cidr. When this is done, any bits to the right of the netmask are silently zeroed to create a valid cidr value. In addition, you can cast a text value to inet or cidr using normal casting syntax: for example, inet(expression) or colname::cidr.

Table 9.38. macaddr Functions
The macaddr type also supports the standard relational operators (>, <=, etc.) for lexicographical ordering, and the bitwise arithmetic operators (~, & and |) for NOT, AND and OR.

Table 9.39. macaddr8 Functions
The macaddr8 type also supports the standard relational operators (>, <=, etc.) for ordering, and the bitwise arithmetic operators (~, & and |) for NOT, AND and OR.
PostgreSQL provides three separate approaches to pattern matching: the traditional SQL LIKE operator, the more recent SIMILAR TO operator (added in SQL:1999), and POSIX-style regular expressions. Aside from the basic “does this string match this pattern?” operators, functions are available to extract or replace matching substrings, and to split a string at matching locations.

Tip: If you have pattern matching needs that go beyond this, consider writing a user-defined function in Perl or Tcl.

While most regular-expression searches can be executed very quickly, regular expressions can be contrived that take arbitrary amounts of time and memory to process. Be wary of accepting regular-expression search patterns from hostile sources. If you must do so, it is advisable to impose a statement timeout.

Searches using SIMILAR TO patterns have the same security hazards, since SIMILAR TO provides many of the same capabilities as POSIX-style regular expressions.

LIKE searches, being much simpler than the other two options, are safer to use with possibly-hostile pattern sources.
LIKE
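string LIKE pattern [ESCAPE escape-character]
string NOT LIKE pattern [ESCAPE escape-character]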
The LIKE expression returns true if the string matches the supplied pattern. (As expected, the NOT LIKE expression returns false if LIKE returns true, and vice versa. An equivalent expression is NOT (string LIKE pattern).)

If pattern does not contain percent signs or underscores, then the pattern only represents the string itself; in that case LIKE acts like the equals operator. An underscore (_) in pattern stands for (matches) any single character; a percent sign (%) matches any sequence of zero or more characters.
Some examples:
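'abc' LIKE 'abc'    true
'abc' LIKE 'a%'     true
'abc' LIKE '_b_'    true
'abc' LIKE 'c'      false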
LIKE pattern matching always covers the entire string. Therefore, if it's desired to match a sequence anywhere within a string, the pattern must start and end with a percent sign.

To match a literal underscore or percent sign without matching other characters, the respective character in pattern must be preceded by the escape character. The default escape character is the backslash but a different one can be selected by using the ESCAPE clause. To match the escape character itself, write two escape characters.

It's also possible to select no escape character by writing ESCAPE ''. This effectively disables the escape mechanism, which makes it impossible to turn off the special meaning of underscore and percent signs in the pattern.

The key word ILIKE can be used instead of LIKE to make the match case-insensitive according to the active locale. This is not in the SQL standard but is a PostgreSQL extension.
The operator ~~ is equivalent to LIKE, and ~~* corresponds to ILIKE. There are also !~~ and !~~* operators that represent NOT LIKE and NOT ILIKE, respectively. All of these operators are PostgreSQL-specific.

There is also the prefix operator ^@ and corresponding starts_with function, which cover cases when only searching by the beginning of the string is needed.
SIMILAR TO Regular Expressions

The SIMILAR TO operator returns true or false depending on whether its pattern matches the given string. It is similar to LIKE, except that it interprets the pattern using the SQL standard's definition of a regular expression. SQL regular expressions are a curious cross between LIKE notation and common regular expression notation.
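The full syntax is:

string SIMILAR TO pattern [ESCAPE escape-character]
string NOT SIMILAR TO pattern [ESCAPE escape-character]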
Like LIKE, the SIMILAR TO operator succeeds only if its pattern matches the entire string; this is unlike common regular expression behavior where the pattern can match any part of the string. Also like LIKE, SIMILAR TO uses _ and % as wildcard characters denoting any single character and any string, respectively (these are comparable to . and .* in POSIX regular expressions).

In addition to these facilities borrowed from LIKE, SIMILAR TO supports these pattern-matching metacharacters borrowed from POSIX regular expressions:
| denotes alternation (either of two alternatives).

* denotes repetition of the previous item zero or more times.

+ denotes repetition of the previous item one or more times.

? denotes repetition of the previous item zero or one time.

{m} denotes repetition of the previous item exactly m times.

{m,} denotes repetition of the previous item m or more times.

{m,n} denotes repetition of the previous item at least m and not more than n times.

Parentheses () can be used to group items into a single logical item.

A bracket expression [...] specifies a character class, just as in POSIX regular expressions.
Notice that the period (.) is not a metacharacter for SIMILAR TO.

As with LIKE, a backslash disables the special meaning of any of these metacharacters; or a different escape character can be specified with ESCAPE.
Some examples:
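'abc' SIMILAR TO 'abc'      true
'abc' SIMILAR TO 'a'        false
'abc' SIMILAR TO '%(b|d)%'  true
'abc' SIMILAR TO '(b|c)%'   false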
The substring function with three parameters, substring(string from pattern for escape-character), provides extraction of a substring that matches an SQL regular expression pattern. As with SIMILAR TO, the specified pattern must match the entire data string, or else the function fails and returns null. To indicate the part of the pattern that should be returned on success, the pattern must contain two occurrences of the escape character followed by a double quote ("). The text matching the portion of the pattern between these markers is returned.
Some examples, with #"
delimiting the return string:
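substring('foobar' from '%#"o_b#"%' for '#')   oob
substring('foobar' from '#"o_b#"%' for '#')    NULL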
POSIX regular expressions provide a more powerful means for pattern matching than the LIKE and SIMILAR TO operators. Many Unix tools such as egrep, sed, or awk use a pattern matching language that is similar to the one described here.

A regular expression is a character sequence that is an abbreviated definition of a set of strings (a regular set). A string is said to match a regular expression if it is a member of the regular set described by the regular expression. As with LIKE, pattern characters match string characters exactly unless they are special characters in the regular expression language, but regular expressions use different special characters than LIKE does. Unlike LIKE patterns, a regular expression is allowed to match anywhere within a string, unless the regular expression is explicitly anchored to the beginning or end of the string.
Some examples:
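'abc' ~ 'abc'    true
'abc' ~ '^a'     true
'abc' ~ '(b|d)'  true
'abc' ~ '^(b|c)' false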
The POSIX pattern language is described in much greater detail below.
The substring function with two parameters, substring(string from pattern), provides extraction of a substring that matches a POSIX regular expression pattern. It returns null if there is no match, otherwise the portion of the text that matched the pattern. But if the pattern contains any parentheses, the portion of the text that matched the first parenthesized subexpression (the one whose left parenthesis comes first) is returned. You can put parentheses around the whole expression if you want to use parentheses within it without triggering this exception. If you need parentheses in the pattern before the subexpression you want to extract, see the non-capturing parentheses described below.
Some examples:
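substring('foobar' from 'o.b')     oob
substring('foobar' from 'o(.)b')   o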
In the common case where you just want the whole matching substring or NULL for no match, write something like
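SELECT (regexp_match('foobarbequebaz', 'bar.*que'))[1];
 regexp_match
--------------
 barbeque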
Some examples:
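A sketch of regexp_matches usage:

SELECT regexp_matches('foo', 'not there');
 regexp_matches
----------------
(0 rows)

SELECT regexp_matches('foobarbequebazilbarfbonk', '(b[^b]+)(b[^b]+)', 'g');
 regexp_matches
----------------
 {bar,beque}
 {bazil,barf}
(2 rows)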
In most cases regexp_matches() should be used with the g flag, since if you only want the first match, it's easier and more efficient to use regexp_match(). However, regexp_match() only exists in PostgreSQL version 10 and up. When working in older versions, a common trick is to place a regexp_matches() call in a sub-select, for example:
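A sketch (tab, col1, and col2 are hypothetical names):

SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab;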
This produces a text array if there's a match, or NULL if not, the same as regexp_match() would do. Without the sub-select, this query would produce no output at all for table rows without a match, which is typically not the desired behavior.

The regexp_split_to_array function behaves the same as regexp_split_to_table, except that regexp_split_to_array returns its result as an array of text. It has the syntax regexp_split_to_array(string, pattern [, flags ]). The parameters are the same as for regexp_split_to_table.
Some examples:
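SELECT foo FROM regexp_split_to_table('the quick brown fox jumps over the lazy dog', '\s+') AS foo;
  foo
-------
 the
 quick
 brown
 fox
 jumps
 over
 the
 lazy
 dog
(9 rows)

SELECT regexp_split_to_array('the quick brown fox', '\s*');
       regexp_split_to_array
-----------------------------------
 {t,h,e,q,u,i,c,k,b,r,o,w,n,f,o,x}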
As the last example demonstrates, the regexp split functions ignore zero-length matches that occur at the start or end of the string or immediately after a previous match. This is contrary to the strict definition of regexp matching that is implemented by regexp_match and regexp_matches, but is usually the most convenient behavior in practice. Other software systems such as Perl use similar definitions.
PostgreSQL's regular expressions are implemented using a software package written by Henry Spencer. Much of the description of regular expressions below is copied verbatim from his manual.
Regular expressions (REs), as defined in POSIX 1003.2, come in two forms: extended REs or EREs (roughly those of egrep), and basic REs or BREs (roughly those of ed). PostgreSQL supports both forms, and also implements some extensions that are not in the POSIX standard, but have become widely used due to their availability in programming languages such as Perl and Tcl. REs using these non-POSIX extensions are called advanced REs or AREs in this documentation. AREs are almost an exact superset of EREs, but BREs have several notational incompatibilities (as well as being much more limited). We first describe the ARE and ERE forms, noting features that apply only to AREs, and then describe how BREs differ.

A regular expression is defined as one or more branches, separated by |. It matches anything that matches one of the branches.
A branch is zero or more quantified atoms or constraints, concatenated. It matches a match for the first, followed by a match for the second, etc; an empty branch matches the empty string.
An RE cannot end with a backslash (\).
The forms using {...} are known as bounds. The numbers m and n within a bound are unsigned decimal integers with permissible values from 0 to 255 inclusive.

A quantifier cannot immediately follow another quantifier, e.g., ** is invalid. A quantifier cannot begin an expression or subexpression or follow ^ or |.

A bracket expression is a list of characters enclosed in []. It normally matches any single character from the list (but see below). If the list begins with ^, it matches any single character not from the rest of the list. If two characters in the list are separated by -, this is shorthand for the full range of characters between those two (inclusive) in the collating sequence, e.g., [0-9] in ASCII matches any decimal digit. It is illegal for two ranges to share an endpoint, e.g., a-c-e. Ranges are very collating-sequence-dependent, so portable programs should avoid relying on them.
To include a literal ] in the list, make it the first character (after ^, if that is used). To include a literal -, make it the first or last character, or the second endpoint of a range. To use a literal - as the first endpoint of a range, enclose it in [. and .] to make it a collating element (see below). With the exception of these characters, some combinations using [ (see next paragraphs), and escapes (AREs only), all other special characters lose their special significance within a bracket expression. In particular, \ is not special when following ERE or BRE rules, though it is special (as introducing an escape) in AREs.

Within a bracket expression, a collating element (a character, a multiple-character sequence that collates as if it were a single character, or a collating-sequence name for either) enclosed in [. and .] stands for the sequence of characters of that collating element. The sequence is treated as a single element of the bracket expression's list. This allows a bracket expression containing a multiple-character collating element to match more than one character, e.g., if the collating sequence includes a ch collating element, then the RE [[.ch.]]*c matches the first five characters of chchcc.
PostgreSQL currently does not support multi-character collating elements. This information describes possible future behavior.
Within a bracket expression, a collating element enclosed in [=
and =]
is an equivalence class, standing for the sequences of characters of all collating elements equivalent to that one, including itself. (If there are no other equivalent collating elements, the treatment is as if the enclosing delimiters were [.
and .]
.) For example, if o
and ^
are the members of an equivalence class, then [[=o=]]
, [[=^=]]
, and [o^]
are all synonymous. An equivalence class cannot be an endpoint of a range.
Within a bracket expression, the name of a character class enclosed in [:
and :]
stands for the list of all characters belonging to that class. Standard character class names are: alnum
, alpha
, blank
, cntrl
, digit
, graph
, lower
, print
, punct
, space
, upper
, xdigit
. These stand for the character classes defined in ctype. A locale can provide others. A character class cannot be used as an endpoint of a range.
There are two special cases of bracket expressions: the bracket expressions [[:<:]]
and [[:>:]]
are constraints, matching empty strings at the beginning and end of a word respectively. A word is defined as a sequence of word characters that is neither preceded nor followed by word characters. A word character is an alnum
character (as defined by ctype) or an underscore. This is an extension, compatible with but not specified by POSIX 1003.2, and should be used with caution in software intended to be portable to other systems. The constraint escapes described below are usually preferable; they are no more standard, but are easier to type.
Escapes are special sequences beginning with \
followed by an alphanumeric character. Escapes come in several varieties: character entry, class shorthands, constraint escapes, and back references. A \
followed by an alphanumeric character but not constituting a valid escape is illegal in AREs. In EREs, there are no escapes: outside a bracket expression, a \
followed by an alphanumeric character merely stands for that character as an ordinary character, and inside a bracket expression, \
is an ordinary character. (The latter is the one actual incompatibility between EREs and AREs.)
Hexadecimal digits are 0
-9
, a
-f
, and A
-F
. Octal digits are 0
-7
.
Numeric character-entry escapes specifying values outside the ASCII range (0-127) have meanings dependent on the database encoding. When the encoding is UTF-8, escape values are equivalent to Unicode code points, for example \u1234
means the character U+1234
. For other multibyte encodings, character-entry escapes usually just specify the concatenation of the byte values for the character. If the escape value does not correspond to any legal character in the database encoding, no error will be raised, but it will never match any data.
The character-entry escapes are always taken as ordinary characters. For example, \135
is ]
in ASCII, but \135
does not terminate a bracket expression.
Within bracket expressions, \d
, \s
, and \w
lose their outer brackets, and \D
, \S
, and \W
are illegal. (So, for example, [a-c\d]
is equivalent to [a-c[:digit:]]
. Also, [a-c\D]
, which is equivalent to [a-c^[:digit:]]
, is illegal.)
A word is defined as in the specification of [[:<:]]
and [[:>:]]
above. Constraint escapes are illegal within bracket expressions.
There is an inherent ambiguity between octal character-entry escapes and back references, which is resolved by the following heuristics, as hinted at above. A leading zero always indicates an octal escape. A single non-zero digit, not followed by another digit, is always taken as a back reference. A multi-digit sequence not starting with a zero is taken as a back reference if it comes after a suitable subexpression (i.e., the number is in the legal range for a back reference), and otherwise is taken as octal.
In addition to the main syntax described above, there are some special forms and miscellaneous syntactic facilities available.
An RE can begin with one of two special director prefixes. If an RE begins with ***:
, the rest of the RE is taken as an ARE. (This normally has no effect in PostgreSQL, since REs are assumed to be AREs; but it does have an effect if ERE or BRE mode had been specified by the flags
parameter to a regex function.) If an RE begins with ***=
, the rest of the RE is taken to be a literal string, with all characters considered ordinary characters.
Embedded options take effect at the )
terminating the sequence. They can appear only at the start of an ARE (after the ***:
director if any).
In addition to the usual (tight) RE syntax, in which all characters are significant, there is an expanded syntax, available by specifying the embedded x
option. In the expanded syntax, white-space characters in the RE are ignored, as are all characters between a #
and the following newline (or the end of the RE). This permits paragraphing and commenting a complex RE. There are three exceptions to that basic rule:
a white-space character or #
preceded by \
is retained
white space or #
within a bracket expression is retained
white space and comments cannot appear within multi-character symbols, such as (?:
For this purpose, white-space characters are blank, tab, newline, and any character that belongs to the space
character class.
Finally, in an ARE, outside bracket expressions, the sequence (?#
ttt
)
(where ttt
is any text not containing a )
) is a comment, completely ignored. Again, this is not allowed between the characters of multi-character symbols, like (?:
. Such comments are more a historical artifact than a useful facility, and their use is deprecated; use the expanded syntax instead.
None of these metasyntax extensions is available if an initial ***=
director has specified that the user's input be treated as a literal string rather than as an RE.
In the event that an RE could match more than one substring of a given string, the RE matches the one starting earliest in the string. If the RE could match more than one substring starting at that point, either the longest possible match or the shortest possible match will be taken, depending on whether the RE is greedy or non-greedy.
Whether an RE is greedy or not is determined by the following rules:
Most atoms, and all constraints, have no greediness attribute (because they cannot match variable amounts of text anyway).
Adding parentheses around an RE does not change its greediness.
A quantified atom with a fixed-repetition quantifier ({
m
}
or {
m
}?
) has the same greediness (possibly none) as the atom itself.
A quantified atom with other normal quantifiers (including {
m
,
n
}
with m
equal to n
) is greedy (prefers longest match).
A quantified atom with a non-greedy quantifier (including {
m
,
n
}?
with m
equal to n
) is non-greedy (prefers shortest match).
A branch — that is, an RE that has no top-level |
operator — has the same greediness as the first quantified atom in it that has a greediness attribute.
An RE consisting of two or more branches connected by the |
operator is always greedy.
The above rules associate greediness attributes not only with individual quantified atoms, but with branches and entire REs that contain quantified atoms. What that means is that the matching is done in such a way that the branch, or whole RE, matches the longest or shortest possible substring as a whole. Once the length of the entire match is determined, the part of it that matches any particular subexpression is determined on the basis of the greediness attribute of that subexpression, with subexpressions starting earlier in the RE taking priority over ones starting later.
An example of what this means:
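A minimal sketch, using substring with an illustrative input string:

SELECT SUBSTRING('XY1234Z', 'Y*([0-9]{1,3})');
-- Result: 123
SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})');
-- Result: 1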
In the first case, the RE as a whole is greedy because Y*
is greedy. It can match beginning at the Y
, and it matches the longest possible string starting there, i.e., Y123
. The output is the parenthesized part of that, or 123
. In the second case, the RE as a whole is non-greedy because Y*?
is non-greedy. It can match beginning at the Y
, and it matches the shortest possible string starting there, i.e., Y1
. The subexpression [0-9]{1,3}
is greedy but it cannot change the decision as to the overall match length; so it is forced to match just 1
.
In short, when an RE contains both greedy and non-greedy subexpressions, the total match length is either as long as possible or as short as possible, according to the attribute assigned to the whole RE. The attributes assigned to the subexpressions only affect how much of that match they are allowed to “eat” relative to each other.
The quantifiers {1,1}
and {1,1}?
can be used to force greediness or non-greediness, respectively, on a subexpression or a whole RE. This is useful when you need the whole RE to have a greediness attribute different from what's deduced from its elements. As an example, suppose that we are trying to separate a string containing some digits into the digits and the parts before and after them. We might try to do that like this:
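One attempt, sketched with regexp_match and a hypothetical input string (regexp_match returns the array of captured substrings):

SELECT regexp_match('abc01234xyz', '(.*)(\d+)(.*)');
-- Result: {abc0123,4,xyz}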
That didn't work: the first .*
is greedy so it “eats” as much as it can, leaving the \d+
to match at the last possible place, the last digit. We might try to fix that by making it non-greedy:
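SELECT regexp_match('abc01234xyz', '(.*?)(\d+)(.*)');
-- Result: {abc,0,""}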
That didn't work either, because now the RE as a whole is non-greedy and so it ends the overall match as soon as possible. We can get what we want by forcing the RE as a whole to be greedy:
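SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
-- Result: {abc,01234,xyz}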
Controlling the RE's overall greediness separately from its components' greediness allows great flexibility in handling variable-length patterns.
When deciding what is a longer or shorter match, match lengths are measured in characters, not collating elements. An empty string is considered longer than no match at all. For example: bb*
matches the three middle characters of abbbc
; (week|wee)(night|knights)
matches all ten characters of weeknights
; when (.*).*
is matched against abc
the parenthesized subexpression matches all three characters; and when (a*)*
is matched against bc
both the whole RE and the parenthesized subexpression match an empty string.
If case-independent matching is specified, the effect is much as if all case distinctions had vanished from the alphabet. When an alphabetic that exists in multiple cases appears as an ordinary character outside a bracket expression, it is effectively transformed into a bracket expression containing both cases, e.g., x
becomes [xX]
. When it appears inside a bracket expression, all case counterparts of it are added to the bracket expression, e.g., [x]
becomes [xX]
and [^x]
becomes [^xX]
.
If newline-sensitive matching is specified, .
and bracket expressions using ^
will never match the newline character (so that matches will never cross newlines unless the RE explicitly arranges it) and ^
and $
will match the empty string after and before a newline respectively, in addition to matching at beginning and end of string respectively. But the ARE escapes \A
and \Z
continue to match beginning or end of string only.
If partial newline-sensitive matching is specified, this affects .
and bracket expressions as with newline-sensitive matching, but not ^
and $
.
If inverse partial newline-sensitive matching is specified, this affects ^
and $
as with newline-sensitive matching, but not .
and bracket expressions. This isn't very useful but is provided for symmetry.
No particular limit is imposed on the length of REs in this implementation. However, programs intended to be highly portable should not employ REs longer than 256 bytes, as a POSIX-compliant implementation can refuse to accept such REs.
The only feature of AREs that is actually incompatible with POSIX EREs is that \
does not lose its special significance inside bracket expressions. All other ARE features use syntax which is illegal or has undefined or unspecified effects in POSIX EREs; the ***
syntax of directors likewise is outside the POSIX syntax for both BREs and EREs.
Many of the ARE extensions are borrowed from Perl, but some have been changed to clean them up, and a few Perl extensions are not present. Incompatibilities of note include \b
, \B
, the lack of special treatment for a trailing newline, the addition of complemented bracket expressions to the things affected by newline-sensitive matching, the restrictions on parentheses and back references in lookahead/lookbehind constraints, and the longest/shortest-match (rather than first-match) matching semantics.
Two significant incompatibilities exist between AREs and the ERE syntax recognized by pre-7.4 releases of PostgreSQL:
In AREs, \
followed by an alphanumeric character is either an escape or an error, while in previous releases, it was just another way of writing the alphanumeric. This should not be much of a problem because there was no reason to write such a sequence in earlier releases.
In AREs, \
remains a special character within []
, so a literal \
within a bracket expression must be written \\
.
BREs differ from EREs in several respects. In BREs, |
, +
, and ?
are ordinary characters and there is no equivalent for their functionality. The delimiters for bounds are \{
and \}
, with {
and }
by themselves ordinary characters. The parentheses for nested subexpressions are \(
and \)
, with (
and )
by themselves ordinary characters. ^
is an ordinary character except at the beginning of the RE or the beginning of a parenthesized subexpression, $
is an ordinary character except at the end of the RE or the end of a parenthesized subexpression, and *
is an ordinary character if it appears at the beginning of the RE or the beginning of a parenthesized subexpression (after a possible leading ^
). Finally, single-digit back references are available, and \<
and \>
are synonyms for [[:<:]]
and [[:>:]]
respectively; no other escapes are available in BREs.
shows the available functions for date/time value processing, with details appearing in the following subsections. illustrates the behaviors of the basic arithmetic operators (+
, *
, etc.). For formatting functions, refer to . You should be familiar with the background information on date/time data types from .
All the functions and operators described below that take time
or timestamp
inputs actually come in two variants: one that takes time with time zone
or timestamp with time zone
, and one that takes time without time zone
or timestamp without time zone
. For brevity, these variants are not shown separately. Also, the +
and *
operators come in commutative pairs (for example both date + integer and integer + date); we show only one of each such pair.
In addition to these functions, the SQL OVERLAPS
operator is supported:
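The operator takes two time periods and has these forms, followed by a usage sketch:

(start1, end1) OVERLAPS (start2, end2)
(start1, length1) OVERLAPS (start2, length2)

SELECT (DATE '2001-02-16', DATE '2001-12-21') OVERLAPS
       (DATE '2001-10-30', DATE '2002-10-30');
-- Result: true
SELECT (DATE '2001-02-16', INTERVAL '100 days') OVERLAPS
       (DATE '2001-10-30', DATE '2002-10-30');
-- Result: false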
This expression yields true when two time periods (defined by their endpoints) overlap, false when they do not overlap. The endpoints can be specified as pairs of dates, times, or time stamps; or as a date, time, or time stamp followed by an interval. When a pair of values is provided, either the start or the end can be written first; OVERLAPS
automatically takes the earlier value of the pair as the start. Each time period is considered to represent the half-open interval start
<=
time
<
end
, unless start
and end
are equal in which case it represents that single time instant. This means for instance that two time periods with only an endpoint in common do not overlap.
When adding an interval
value to (or subtracting an interval
value from) a timestamp with time zone
value, the days component advances or decrements the date of the timestamp with time zone
by the indicated number of days, keeping the time of day the same. Across daylight saving time changes (when the session time zone is set to a time zone that recognizes DST), this means interval '1 day'
does not necessarily equal interval '24 hours'
. For example, with the session time zone set to America/Denver
:
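For instance (timestamps chosen to straddle the DST transition):

SELECT timestamp with time zone '2005-04-02 12:00:00-07' + interval '1 day';
-- Result: 2005-04-03 12:00:00-06
SELECT timestamp with time zone '2005-04-02 12:00:00-07' + interval '24 hours';
-- Result: 2005-04-03 13:00:00-06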
This happens because an hour was skipped due to a change in daylight saving time at 2005-04-03 02:00:00
in time zone America/Denver
.
Note there can be ambiguity in the months
field returned by age
because different months have different numbers of days. PostgreSQL's approach uses the month from the earlier of the two dates when calculating partial months. For example, age('2004-06-01', '2004-04-30')
uses April to yield 1 mon 1 day
, while using May would yield 1 mon 2 days
because May has 31 days, while April has only 30.
Subtraction of dates and timestamps can also be complex. One conceptually simple way to perform subtraction is to convert each value to a number of seconds using EXTRACT(EPOCH FROM ...)
, then subtract the results; this produces the number of seconds between the two values. This will adjust for the number of days in each month, timezone changes, and daylight saving time adjustments. Subtraction of date or timestamp values with the “-
” operator returns the number of days (24-hours) and hours/minutes/seconds between the values, making the same adjustments. The age
function returns years, months, days, and hours/minutes/seconds, performing field-by-field subtraction and then adjusting for negative field values. The following queries illustrate the differences in these approaches. The sample results were produced with timezone = 'US/Eastern'
; there is a daylight saving time change between the two dates used:
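A sketch of the three approaches (sample timestamps assumed; results depend on the timezone setting):

SELECT EXTRACT(EPOCH FROM timestamptz '2013-07-01 12:00:00') -
       EXTRACT(EPOCH FROM timestamptz '2013-03-01 12:00:00');
-- Result: 10537200
SELECT (EXTRACT(EPOCH FROM timestamptz '2013-07-01 12:00:00') -
        EXTRACT(EPOCH FROM timestamptz '2013-03-01 12:00:00'))
       / 60 / 60 / 24;
-- Result: 121.958333333333
SELECT timestamptz '2013-07-01 12:00:00' - timestamptz '2013-03-01 12:00:00';
-- Result: 121 days 23:00:00
SELECT age(timestamptz '2013-07-01 12:00:00', timestamptz '2013-03-01 12:00:00');
-- Result: 4 mons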
EXTRACT, date_part
The extract
function retrieves subfields such as year or hour from date/time values. source
must be a value expression of type timestamp
, time
, or interval
. (Expressions of type date
are cast to timestamp
and can therefore be used as well.) field
is an identifier or string that selects what field to extract from the source value. The extract
function returns values of type double precision
. The following are valid field names:
century
The century
The first century starts at 0001-01-01 00:00:00 AD, although they did not know it at the time. This definition applies to all Gregorian calendar countries. There is no century number 0, you go from -1 century to 1 century. If you disagree with this, please write your complaint to: Pope, Cathedral Saint-Peter of Roma, Vatican.
day
For timestamp
values, the day (of the month) field (1 - 31) ; for interval
values, the number of days
decade
The year field divided by 10
dow
The day of the week as Sunday (0
) to Saturday (6
)
Note that extract
's day of the week numbering differs from that of the to_char(..., 'D')
function.
doy
The day of the year (1 - 365/366)
epoch
For timestamp with time zone
values, the number of seconds since 1970-01-01 00:00:00 UTC (can be negative); for date
and timestamp
values, the number of seconds since 1970-01-01 00:00:00 local time; for interval
values, the total number of seconds in the interval
You can convert an epoch value back to a time stamp with to_timestamp
:
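SELECT to_timestamp(982384720.12);
-- Result: 2001-02-17 04:38:40.12+00  (displayed according to the TimeZone setting)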
hour
The hour field (0 - 23)
isodow
The day of the week as Monday (1
) to Sunday (7
)
This is identical to dow
except for Sunday. This matches the ISO 8601 day of the week numbering.
isoyear
The ISO 8601 week-numbering year that the date falls in (not applicable to intervals)
Each ISO 8601 week-numbering year begins with the Monday of the week containing the 4th of January, so in early January or late December the ISO year may be different from the Gregorian year. See the week
field for more information.
This field is not available in PostgreSQL releases prior to 8.3.
microseconds
The seconds field, including fractional parts, multiplied by 1 000 000; note that this includes full seconds
millennium
The millennium
Years in the 1900s are in the second millennium. The third millennium started January 1, 2001.
milliseconds
The seconds field, including fractional parts, multiplied by 1000. Note that this includes full seconds.
minute
The minutes field (0 - 59)
month
For timestamp
values, the number of the month within the year (1 - 12) ; for interval
values, the number of months, modulo 12 (0 - 11)
quarter
The quarter of the year (1 - 4) that the date is in
second
The seconds field, including fractional parts (0 - 59)
timezone
The time zone offset from UTC, measured in seconds. Positive values correspond to time zones east of UTC, negative values to zones west of UTC. (Technically, PostgreSQL does not use UTC because leap seconds are not handled.)
timezone_hour
The hour component of the time zone offset
timezone_minute
The minute component of the time zone offset
week
The number of the ISO 8601 week-numbering week of the year. By definition, ISO weeks start on Mondays and the first week of a year contains January 4 of that year. In other words, the first Thursday of a year is in week 1 of that year.
In the ISO week-numbering system, it is possible for early-January dates to be part of the 52nd or 53rd week of the previous year, and for late-December dates to be part of the first week of the next year. For example, 2005-01-01
is part of the 53rd week of year 2004, and 2006-01-01
is part of the 52nd week of year 2005, while 2012-12-31
is part of the first week of 2013. It's recommended to use the isoyear
field together with week
to get consistent results.
year
The year field. Keep in mind there is no 0 AD
, so subtracting BC
years from AD
years should be done with care.
When the input value is +/-Infinity, extract
returns +/-Infinity for monotonically-increasing fields (epoch
, julian
, year
, isoyear
, decade
, century
, and millennium
). For other fields, NULL is returned. PostgreSQL versions before 9.6 returned zero for all cases of infinite input.
The date_part
function is modeled on the traditional Ingres equivalent to the SQL-standard function extract
:
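Its form and a usage sketch:

date_part('field', source)

SELECT date_part('day', TIMESTAMP '2001-02-16 20:38:40');
-- Result: 16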
Note that here the field
parameter needs to be a string value, not a name. The valid field names for date_part
are the same as for extract
.
date_trunc
The function date_trunc
is conceptually similar to the trunc
function for numbers.
source
is a value expression of type timestamp
, timestamp with time zone
, or interval
. (Values of type date
and time
are cast automatically to timestamp
or interval
, respectively.) field
selects to which precision to truncate the input value. The return value is likewise of type timestamp
, timestamp with time zone
, or interval
, and it has all fields that are less significant than the selected one set to zero (or one, for day and month).
Valid values for field
are:
A time zone cannot be specified when processing timestamp without time zone
or interval
inputs. These are always taken at face value.
Examples (assuming the local time zone is America/New_York
):
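For instance (input values chosen for illustration):

SELECT date_trunc('hour', TIMESTAMP '2001-02-16 20:38:40');
-- Result: 2001-02-16 20:00:00
SELECT date_trunc('year', TIMESTAMP '2001-02-16 20:38:40');
-- Result: 2001-01-01 00:00:00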
AT TIME ZONE
AT TIME ZONE Variants
Examples (assuming the local time zone is America/Los_Angeles):
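A sketch of the three cases discussed below:

SELECT TIMESTAMP '2001-02-16 20:38:40' AT TIME ZONE 'America/Denver';
-- Result: 2001-02-16 19:38:40-08
SELECT TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40-05' AT TIME ZONE 'America/Denver';
-- Result: 2001-02-16 18:38:40
SELECT TIMESTAMP '2001-02-16 20:38:40-05' AT TIME ZONE 'Asia/Tokyo' AT TIME ZONE 'America/Chicago';
-- Result: 2001-02-16 05:38:40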
The first example adds a time zone to a value that lacks it, and displays the value using the current TimeZone
setting. The second example shifts the time stamp with time zone value to the specified time zone, and returns the value without a time zone. This allows storage and display of values different from the current TimeZone
setting. The third example converts Tokyo time to Chicago time. Converting time values to other time zones uses the currently active time zone rules since no date is supplied.
The function timezone
(zone
, timestamp
) is equivalent to the SQL-conforming construct timestamp
AT TIME ZONE zone
.
PostgreSQL provides a number of functions that return values related to the current date and time. These SQL-standard functions all return values based on the start time of the current transaction:
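CURRENT_DATE
CURRENT_TIME
CURRENT_TIMESTAMP
CURRENT_TIME(precision)
CURRENT_TIMESTAMP(precision)
LOCALTIME
LOCALTIMESTAMP
LOCALTIME(precision)
LOCALTIMESTAMP(precision)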
CURRENT_TIME
and CURRENT_TIMESTAMP
deliver values with time zone; LOCALTIME
and LOCALTIMESTAMP
deliver values without time zone.
CURRENT_TIME
, CURRENT_TIMESTAMP
, LOCALTIME
, and LOCALTIMESTAMP
can optionally take a precision parameter, which causes the result to be rounded to that many fractional digits in the seconds field. Without a precision parameter, the result is given to the full available precision.
Some examples:
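For instance (sample output; actual values depend on the moment of execution):

SELECT CURRENT_TIME;
-- Result: 14:39:53.662522-05
SELECT CURRENT_DATE;
-- Result: 2019-12-23
SELECT CURRENT_TIMESTAMP;
-- Result: 2019-12-23 14:39:53.662522-05
SELECT CURRENT_TIMESTAMP(2);
-- Result: 2019-12-23 14:39:53.66-05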
Since these functions return the start time of the current transaction, their values do not change during the transaction. This is considered a feature: the intent is to allow a single transaction to have a consistent notion of the “current” time, so that multiple modifications within the same transaction bear the same time stamp.
Other database systems might advance these values more frequently.
PostgreSQL also provides functions that return the start time of the current statement, as well as the actual current time at the instant the function is called. The complete list of non-SQL-standard time functions is:
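transaction_timestamp()
statement_timestamp()
clock_timestamp()
timeofday()
now()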
transaction_timestamp()
is equivalent to CURRENT_TIMESTAMP
, but is named to clearly reflect what it returns. statement_timestamp()
returns the start time of the current statement (more specifically, the time of receipt of the latest command message from the client). statement_timestamp()
and transaction_timestamp()
return the same value during the first command of a transaction, but might differ during subsequent commands. clock_timestamp()
returns the actual current time, and therefore its value changes even within a single SQL command. timeofday()
is a historical PostgreSQL function. Like clock_timestamp()
, it returns the actual current time, but as a formatted text
string rather than a timestamp with time zone
value. now()
is a traditional PostgreSQL equivalent to transaction_timestamp()
.
All the date/time data types also accept the special literal value now
to specify the current date and time (again, interpreted as the transaction start time). Thus, the following three all return the same result:
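SELECT CURRENT_TIMESTAMP;
SELECT now();
SELECT TIMESTAMP 'now';  -- incorrect for use with DEFAULT (see below)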
You do not want to use the third form when specifying a DEFAULT
clause while creating a table. The system will convert now
to a timestamp
as soon as the constant is parsed, so that when the default value is needed, the time of the table creation would be used! The first two forms will not be evaluated until the default value is used, because they are function calls. Thus they will give the desired behavior of defaulting to the time of row insertion.
The following functions are available to delay execution of the server process:
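pg_sleep(seconds)
pg_sleep_for(interval)
pg_sleep_until(timestamp with time zone)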
pg_sleep
makes the current session's process sleep until seconds
seconds have elapsed. seconds
is a value of type double precision
, so fractional-second delays can be specified. pg_sleep_for
is a convenience function for larger sleep times specified as an interval
. pg_sleep_until
is a convenience function for when a specific wake-up time is desired. For example:
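SELECT pg_sleep(1.5);
SELECT pg_sleep_for('5 minutes');
SELECT pg_sleep_until('tomorrow 03:00');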
The effective resolution of the sleep interval is platform-specific; 0.01 seconds is a common value. The sleep delay will be at least as long as specified. It might be longer depending on factors such as server load. In particular, pg_sleep_until
is not guaranteed to wake up exactly at the specified time, but it will not wake up any earlier.
Make sure that your session does not hold more locks than necessary when calling pg_sleep
or its variants. Otherwise other sessions might have to wait for your sleeping process, slowing down the entire system.
PostgreSQL includes one function to generate a UUID:
This function returns a version 4 (random) UUID. This is the most commonly used type of UUID and is appropriate for most applications.
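A minimal sketch (the function in question is presumably gen_random_uuid, as in current PostgreSQL releases):

SELECT gen_random_uuid();
-- e.g., 5b30857f-0bfa-48b5-ac0b-5c64e28078d1  (output is random by design)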
The uuid-ossp module provides additional functions that implement other standard algorithms for generating UUIDs.
PostgreSQL also provides the usual comparison operators, listed in , for UUIDs.
This section describes the SQL-compliant conditional expressions available in PostgreSQL.
If your needs go beyond the capabilities of these conditional expressions, you might want to consider writing a stored procedure in a more expressive programming language.
CASE
The SQL CASE expression is a generic conditional expression, similar to if/else statements in other programming languages:
The CASE clause can be used wherever an expression is valid. Each condition is an expression that returns a boolean result. If the condition's result is true, the value of the CASE expression is the result that follows the condition, and the remainder of the CASE expression is not processed. If the condition's result is not true, any subsequent WHEN clauses are examined in the same manner. If no WHEN condition yields true, the value of the CASE expression is the result of the ELSE clause. If the ELSE clause is omitted and no condition is true, the result is null.
An example:
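A sketch, with a hypothetical table test holding an integer column a:

SELECT a,
       CASE WHEN a=1 THEN 'one'
            WHEN a=2 THEN 'two'
            ELSE 'other'
       END
    FROM test;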
There is a “simple” form of CASE expression that is a variant of the general form above:
The first expression is computed, then compared to each of the value expressions in the WHEN clauses until one is found that is equal to it. If no match is found, the result of the ELSE clause (or a null value) is returned. This is similar to the switch statement in C.
The example above can be written using the simple CASE syntax:
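SELECT a,
       CASE a WHEN 1 THEN 'one'
              WHEN 2 THEN 'two'
              ELSE 'other'
       END
    FROM test;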
A CASE expression does not evaluate any subexpressions that are not needed to determine the result. For example, this is a possible way of avoiding a division-by-zero failure:
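A sketch, with hypothetical columns x and y:

SELECT ... WHERE CASE WHEN x <> 0 THEN y/x > 1.5 ELSE false END;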
COALESCE
The COALESCE function returns the first of its arguments that is not null. Null is returned only if all arguments are null. It is often used to substitute a default value for null values when data is retrieved for display, for example:
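SELECT COALESCE(description, short_description, '(none)') ...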
This returns description if it is not null, otherwise short_description if it is not null, otherwise (none).
Like a CASE expression, COALESCE only evaluates the arguments that are needed to determine the result; that is, arguments to the right of the first non-null argument are not evaluated. This SQL-standard function provides capabilities similar to NVL and IFNULL, which are used in some other database systems.
NULLIF
The NULLIF function returns a null value if value1 equals value2; otherwise it returns value1. This can be used to perform the inverse operation of the COALESCE example given above:
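SELECT NULLIF(value, '(none)') ...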
In this example, if value is (none), null is returned; otherwise the value of value is returned.
GREATEST and LEAST
Note that GREATEST and LEAST are not in the SQL standard, but are a common extension. Some other databases make them return NULL if any argument is NULL, rather than only when all arguments are NULL.
This section describes functions for operating on sequence objects, also called sequence generators or just sequences. Sequence objects are special single-row tables created with CREATE SEQUENCE. Sequence objects are commonly used to generate unique identifiers for rows of a table. The sequence functions, listed in Table 9.47, provide simple, multiuser-safe methods for obtaining successive sequence values from sequence objects.
Table 9.47. Sequence Functions
The sequence to be operated on by a sequence function is specified by a regclass argument, which is simply the OID of the sequence in the pg_class system catalog. You do not have to look up the OID by hand, however, since the regclass data type's input converter will do the work for you. Just write the sequence name enclosed in single quotes so that it looks like a literal constant. For compatibility with the handling of ordinary SQL names, the string will be converted to lower case unless it contains double quotes around the sequence name. Thus:
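A sketch of the quoting behavior (sequence names here are hypothetical):

nextval('foo')      -- operates on sequence foo
nextval('FOO')      -- same as above: the unquoted name is folded to lower case
nextval('"Foo"')    -- operates on sequence Foo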
The sequence name can be schema-qualified if necessary:
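For example:

nextval('myschema.foo')     -- operates on myschema.foo
nextval('"myschema".foo')   -- same as above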
Before PostgreSQL 8.1, the arguments of the sequence functions were of type text, not regclass, and the above-described conversion from a text string to an OID value would happen at run time during each call. For backward compatibility, this facility still exists, but internally it is now handled as an implicit coercion from text to regclass before the function is invoked.
When you write the argument of a sequence function as an unadorned literal string, it becomes a constant of type regclass. Since this is really just an OID, it will track the originally identified sequence despite later renaming, schema reassignment, etc. This “early binding” behavior is usually desirable for sequence references in column defaults and views. But sometimes you might want “late binding” where the sequence reference is resolved at run time. To get late-binding behavior, force the constant to be stored as a text constant instead of regclass:
Note that late binding was the only behavior supported in PostgreSQL releases before 8.1, so you might need to do this to preserve the semantics of old applications.
Of course, the argument of a sequence function can be an expression as well as a constant. If it is a text expression then the implicit coercion will result in a run-time lookup.
The available sequence functions are:
nextval
Advance the sequence object to its next value and return that value. This is done atomically: even if multiple sessions execute nextval concurrently, each will safely receive a distinct sequence value.
To avoid blocking concurrent transactions that obtain numbers from the same sequence, a nextval operation is never rolled back; that is, once a value has been fetched it is considered used and will not be returned again. This is true even if the surrounding transaction later aborts, or if the calling query ends up not using the value. For example an INSERT with an ON CONFLICT clause will compute the to-be-inserted tuple, including doing any required nextval calls, before detecting any conflict that would cause it to follow the ON CONFLICT rule instead. Such cases will leave unused “holes” in the sequence of assigned values. Thus, PostgreSQL sequence objects cannot be used to obtain “gapless” sequences.
This function requires USAGE or UPDATE privilege on the sequence.
currval
Return the value most recently obtained by nextval for this sequence in the current session. (An error is reported if nextval has never been called for this sequence in this session.) Because this is returning a session-local value, it gives a predictable answer whether or not other sessions have executed nextval since the current session did.
This function requires USAGE or SELECT privilege on the sequence.
lastval
Return the value most recently returned by nextval in the current session. This function is identical to currval, except that instead of taking the sequence name as an argument it refers to whichever sequence nextval was most recently applied to in the current session. It is an error to call lastval if nextval has not yet been called in the current session.
This function requires USAGE or SELECT privilege on the last used sequence.
setval
Reset the sequence object's counter value. The two-parameter form sets the sequence's last_value field to the specified value and sets its is_called field to true, meaning that the next nextval will advance the sequence before returning a value. The value reported by currval is also set to the specified value. In the three-parameter form, is_called can be set to either true or false. true has the same effect as the two-parameter form. If it is set to false, the next nextval will return exactly the specified value, and sequence advancement commences with the following nextval. Furthermore, the value reported by currval is not changed in this case. For example,
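A sketch (continuing with the hypothetical sequence foo):

SELECT setval('foo', 42);           -- Next nextval will return 43
SELECT setval('foo', 42, true);     -- Same as above
SELECT setval('foo', 42, false);    -- Next nextval will return 42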
The result returned by setval is just the value of its second argument.
Because sequences are non-transactional, changes made by setval are not undone if the transaction rolls back.
This function requires UPDATE privilege on the sequence.
Table 9.40, Table 9.41, and Table 9.42 summarize the functions and operators that are provided for full text searching. See for a detailed explanation of PostgreSQL's text search facility.
Table 9.40. Text Search Operators
The tsquery containment operators consider only the lexemes listed in the two queries, ignoring the combining operators.
In addition to the operators shown in the table, the ordinary B-tree comparison operators (=, <, etc.) are defined for types tsvector and tsquery. These are not very useful for text searching but allow, for example, unique indexes to be built on columns of these types.
Table 9.41. Text Search Functions
Table 9.42. Text Search Debugging Functions
The functions and function-like expressions described in this section operate on values of type xml. See for information about the xml type. The function-like expressions xmlparse and xmlserialize for converting to and from type xml are not repeated here. Use of most of these functions requires the installation to have been built with configure --with-libxml.
A set of functions and function-like expressions is available for producing XML content from SQL data. As such, they are particularly suitable for formatting query results into XML documents for processing in client applications.
The function xmlcomment creates an XML value containing an XML comment with the specified text as content. The text cannot contain “--” or end with a “-”, so that the resulting construct is a valid XML comment. If the argument is null, the result is null.
Example:
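SELECT xmlcomment('hello');

  xmlcomment
--------------
 <!--hello-->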
The function xmlconcat concatenates a list of individual XML values to create a single value containing an XML content fragment. Null values are omitted; the result is only null if there are no non-null arguments.
Example:
XML declarations, if present, are combined as follows. If all argument values have the same XML version declaration, that version is used in the result, else no version is used. If all argument values have the standalone declaration value “yes”, then that value is used in the result. If all argument values have a standalone declaration value and at least one is “no”, then that is used in the result. Else the result will have no standalone declaration. If the result is determined to require a standalone declaration but no version declaration, a version declaration with version 1.0 will be used, because XML requires an XML declaration to contain a version declaration. Encoding declarations are ignored and removed in all cases.
Example:
The xmlelement expression produces an XML element with the given name, attributes, and content.
Examples:
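SELECT xmlelement(name foo);

 xmlelement
------------
 <foo/>

SELECT xmlelement(name foo, xmlattributes('xyz' as bar));

    xmlelement
------------------
 <foo bar="xyz"/>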
Element and attribute names that are not valid XML names are escaped by replacing the offending characters with the sequence _xHHHH_, where HHHH is the character's Unicode code point in hexadecimal notation. For example:
An explicit attribute name need not be specified if the attribute value is a column reference, in which case the column's name will be used as the attribute name by default. In other cases, the attribute must be given an explicit name. So this example is valid:
But these are not:
Element content, if specified, will be formatted according to its data type. If the content is itself of type xml, complex XML documents can be constructed. For example:
Content of other types will be formatted into valid XML character data. This means in particular that the characters <, >, and & will be converted to entities. Binary data (data type bytea) will be represented in base64 or hex encoding, depending on the setting of the configuration parameter xmlbinary. The particular behavior for individual data types is expected to evolve in order to align the SQL and PostgreSQL data types with the XML Schema specification, at which time a more precise description will appear.
The xmlforest expression produces an XML forest (sequence) of elements using the given names and content.
Examples:
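SELECT xmlforest('abc' AS foo, 123 AS bar);

          xmlforest
------------------------------
 <foo>abc</foo><bar>123</bar>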
As seen in the second example, the element name can be omitted if the content value is a column reference, in which case the column name is used by default. Otherwise, a name must be specified.
Element names that are not valid XML names are escaped as shown for xmlelement above. Similarly, content data is escaped to make valid XML content, unless it is already of type xml.
Note that XML forests are not valid XML documents if they consist of more than one element, so it might be useful to wrap xmlforest expressions in xmlelement.
The xmlpi expression creates an XML processing instruction. The content, if present, must not contain the character sequence ?>.
Example:
The xmlroot expression alters the properties of the root node of an XML value. If a version is specified, it replaces the value in the root node's version declaration; if a standalone setting is specified, it replaces the value in the root node's standalone declaration.
Example:
To determine the order of the concatenation, an ORDER BY clause may be added to the aggregate call as described in Section 4.2.7. For example:
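A sketch, with a hypothetical table test and columns x and y:

SELECT xmlagg(x ORDER BY y DESC) FROM test;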
The following non-standard approach used to be recommended in previous versions, and may still be useful in specific cases:
The expressions described in this section check properties of xml values.
The xmlexists function returns true if the XPath expression in the first argument returns any nodes, and false otherwise. (If either argument is null, the result is null.)
Example:
The BY REF clauses have no effect in PostgreSQL, but are accepted for SQL conformance and compatibility with other implementations. Per the SQL standard, the first BY REF is required, the second is optional. Also note that the SQL standard specifies the xmlexists construct to take an XQuery expression as first argument, but PostgreSQL currently only supports XPath, which is a subset of XQuery.
These functions check whether a text string is well-formed XML, returning a Boolean result. xml_is_well_formed_document checks for a well-formed document, while xml_is_well_formed_content checks for well-formed content. xml_is_well_formed does the former if the xmloption configuration parameter is set to DOCUMENT, or the latter if it is set to CONTENT. This means that xml_is_well_formed is useful for seeing whether a simple cast to type xml will succeed, whereas the other two functions are useful for seeing whether the corresponding variants of XMLPARSE will succeed.
Examples:
The last example shows that the checks include whether namespaces are correctly matched.
To process values of data type xml, PostgreSQL offers the functions xpath and xpath_exists, which evaluate XPath 1.0 expressions, and the XMLTABLE table function.
The function xpath evaluates the XPath expression xpath (given as text) against the XML value xml. It returns an array of XML values corresponding to the node set produced by the XPath expression. If the XPath expression returns a scalar value rather than a node set, a single-element array is returned.
The second argument must be a well-formed XML document. In particular, it must have a single root node element.
The optional third argument of the function is an array of namespace mappings. This array should be a two-dimensional text array with the length of the second axis being equal to 2 (i.e., it should be an array of arrays, each of which consists of exactly 2 elements). The first element of each array entry is the namespace name (alias), the second the namespace URI. It is not required that aliases provided in this array be the same as those being used in the XML document itself (in other words, both in the XML document and in the xpath function context, aliases are local).
Example:
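SELECT xpath('/my:a/text()', '<my:a xmlns:my="http://example.com">test</my:a>',
             ARRAY[ARRAY['my', 'http://example.com']]);

 xpath
--------
 {test}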
To deal with default (anonymous) namespaces, do something like this:
The function xpath_exists
is a specialized form of the xpath
function. Instead of returning the individual XML values that satisfy the XPath, this function returns a Boolean indicating whether the query was satisfied or not. This function is equivalent to the standard XMLEXISTS
predicate, except that it also offers support for a namespace mapping argument.
Example:
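SELECT xpath_exists('/my:a/text()', '<my:a xmlns:my="http://example.com">test</my:a>',
                    ARRAY[ARRAY['my', 'http://example.com']]);

 xpath_exists
--------------
 t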
The xmltable
function produces a table based on the given XML value, an XPath filter to extract rows, and an optional set of column definitions.
The optional XMLNAMESPACES
clause is a comma-separated list of namespaces. It specifies the XML namespaces used in the document and their aliases. A default namespace specification is not currently supported.
The required row_expression
argument is an XPath expression that is evaluated against the supplied XML document to obtain an ordered sequence of XML nodes. This sequence is what xmltable
transforms into output rows.
document_expression
provides the XML document to operate on. The BY REF
clauses have no effect in PostgreSQL, but are allowed for SQL conformance and compatibility with other implementations. The argument must be a well-formed XML document; fragments/forests are not accepted.
The mandatory COLUMNS
clause specifies the list of columns in the output table. If the COLUMNS
clause is omitted, the rows in the result set contain a single column of type xml
containing the data matched by row_expression
. If COLUMNS
is specified, each entry describes a single column. See the syntax summary above for the format. The column name and type are required; the path, default and nullability clauses are optional.
A column marked FOR ORDINALITY
will be populated with row numbers matching the order in which the output rows appeared in the original input XML document. At most one column may be marked FOR ORDINALITY
.
The column_expression
for a column is an XPath expression that is evaluated for each row, relative to the result of the row_expression
, to find the value of the column. If no column_expression
is given, then the column name is used as an implicit path.
If a column's XPath expression returns multiple elements, an error is raised. If the expression matches an empty tag, the result is an empty string (not NULL
). Any xsi:nil
attributes are ignored.
The text body of the XML matched by the column_expression
is used as the column value. Multiple text()
nodes within an element are concatenated in order. Any child elements, processing instructions, and comments are ignored, but the text contents of child elements are concatenated to the result. Note that the whitespace-only text()
node between two non-text elements is preserved, and that leading whitespace on a text()
node is not flattened.
If the path expression does not match for a given row but default_expression
is specified, the value resulting from evaluating that expression is used. If no DEFAULT
clause is given for the column, the field will be set to NULL
. It is possible for a default_expression
to reference the value of output columns that appear prior to it in the column list, so the default of one column may be based on the value of another column.
Columns may be marked NOT NULL
. If the column_expression
for a NOT NULL
column does not match anything and there is no DEFAULT
or the default_expression
also evaluates to null, an error is reported.
Unlike regular PostgreSQL functions, column_expression
and default_expression
are not evaluated to a simple value before calling the function. column_expression
is normally evaluated exactly once per input row, and default_expression
is evaluated each time a default is needed for a field. If the expression qualifies as stable or immutable the repeat evaluation may be skipped. Effectively xmltable
behaves more like a subquery than a function call. This means that you can usefully use volatile functions like nextval
in default_expression
, and column_expression
may depend on other parts of the XML document.
Examples:
The following example shows concatenation of multiple text() nodes, usage of the column name as XPath filter, and the treatment of whitespace, XML comments and processing instructions:
The following example illustrates how the XMLNAMESPACES
clause can be used to specify the default namespace, and a list of additional namespaces used in the XML document as well as in the XPath expressions:
The following functions map the contents of relational tables to XML values. They can be thought of as XML export functionality:
The return type of each function is xml
.
table_to_xml
maps the content of the named table, passed as parameter tbl
. The regclass
type accepts strings identifying tables using the usual notation, including optional schema qualifications and double quotes. query_to_xml
executes the query whose text is passed as parameter query
and maps the result set. cursor_to_xml
fetches the indicated number of rows from the cursor specified by the parameter cursor
. This variant is recommended if large tables have to be mapped, because the result value is built up in memory by each function.
If tableforest
is false, then the resulting XML document looks like this:
If tableforest
is true, the result is an XML content fragment that looks like this:
If no table name is available, that is, when mapping a query or a cursor, the string table
is used in the first format, row
in the second format.
The choice between these formats is up to the user. The first format is a proper XML document, which will be important in many applications. The second format tends to be more useful in the cursor_to_xml
function if the result values are to be reassembled into one document later on. The functions for producing XML content discussed above, in particular xmlelement
, can be used to alter the results to taste.
The data values are mapped in the same way as described for the function xmlelement
above.
The parameter nulls
determines whether null values should be included in the output. If true, null values in columns are represented as:
where xsi
is the XML namespace prefix for XML Schema Instance. An appropriate namespace declaration will be added to the result value. If false, columns containing null values are simply omitted from the output.
The parameter targetns
specifies the desired XML namespace of the result. If no particular namespace is wanted, an empty string should be passed.
The following functions return XML Schema documents describing the mappings performed by the corresponding functions above:
It is essential that the same parameters are passed in order to obtain matching XML data mappings and XML Schema documents.
The following functions produce XML data mappings and the corresponding XML Schema in one document (or forest), linked together. They can be useful where self-contained and self-describing results are wanted:
In addition, the following functions are available to produce analogous mappings of entire schemas or the entire current database:
Note that these potentially produce a lot of data, which needs to be built up in memory. When requesting content mappings of large schemas or databases, it might be worthwhile to consider mapping the tables separately instead, possibly even through a cursor.
The result of a schema content mapping looks like this:
where the format of a table mapping depends on the tableforest
parameter as explained above.
The result of a database content mapping looks like this:
where the schema mapping is as above.
The PostgreSQL formatting functions provide a powerful set of tools for converting various data types (date/time, integer, floating point, numeric) to formatted strings and for converting from formatted strings to specific data types. lists them. These functions all follow a common calling convention: the first argument is the value to be formatted and the second argument is a template that defines the output or input format.
Note: There is also a single-argument to_timestamp function; see .
Tip: to_timestamp and to_date exist to handle input formats that cannot be converted by simple casting. For most standard date/time formats, simply casting the source string to the required data type works, and is much easier. Similarly, to_number is unnecessary for standard numeric representations.
In a to_char output template string, there are certain patterns that are recognized and replaced with appropriately-formatted data based on the given value. Any text that is not a template pattern is simply copied verbatim. Similarly, in an input template string (for the other functions), template patterns identify the values to be supplied by the input data string. If there are characters in the template string that are not template patterns, the corresponding characters in the input data string are simply skipped over (whether or not they are equal to the template string characters).
shows the template patterns available for formatting date and time values.
Usage notes for date/time formatting:
FM
suppresses leading zeroes and trailing blanks that would otherwise be added to make the output of a pattern be fixed-width. In PostgreSQL, FM
modifies only the next specification, while in Oracle FM
affects all subsequent specifications, and repeated FM
modifiers toggle fill mode on and off.
TM
does not include trailing blanks. to_timestamp
and to_date
ignore the TM
modifier.
to_timestamp
and to_date
skip multiple blank spaces at the beginning of the input string and around date and time values unless the FX
option is used. For example, to_timestamp(' 2000 JUN', 'YYYY MON')
and to_timestamp('2000 - JUN', 'YYYY-MON')
work, but to_timestamp('2000 JUN', 'FXYYYY MON')
returns an error because to_timestamp
expects only a single space. FX
must be specified as the first item in the template.
A separator (a space or non-letter/non-digit character) in the template string of to_timestamp
and to_date
matches any single separator in the input string or is skipped, unless the FX
option is used. For example, to_timestamp('2000JUN', 'YYYY///MON')
and to_timestamp('2000/JUN', 'YYYY MON')
work, but to_timestamp('2000//JUN', 'YYYY/MON')
returns an error because the number of separators in the input string exceeds the number of separators in the template.
If FX
is specified, a separator in the template string matches exactly one character in the input string. But note that the input string character is not required to be the same as the separator from the template string. For example, to_timestamp('2000/JUN', 'FXYYYY MON')
works, but to_timestamp('2000/JUN', 'FXYYYY  MON')
returns an error because the second space in the template string consumes the letter J
from the input string.
A TZH
template pattern can match a signed number. Without the FX
option, minus signs may be ambiguous, and could be interpreted as a separator. This ambiguity is resolved as follows: If the number of separators before TZH
in the template string is less than the number of separators before the minus sign in the input string, the minus sign is interpreted as part of TZH
. Otherwise, the minus sign is considered to be a separator between values. For example, to_timestamp('2000 -10', 'YYYY TZH')
matches -10
to TZH
, but to_timestamp('2000 -10', 'YYYY  TZH')
matches 10
to TZH
.
Ordinary text is allowed in to_char
templates and will be output literally. You can put a substring in double quotes to force it to be interpreted as literal text even if it contains template patterns. For example, in '"Hello Year "YYYY'
, the YYYY
will be replaced by the year data, but the single Y
in Year
will not be. In to_date
, to_number
, and to_timestamp
, literal text and double-quoted strings result in skipping the number of characters contained in the string; for example "XX"
skips two input characters (whether or not they are XX
).
Tip
Prior to PostgreSQL 12, it was possible to skip arbitrary text in the input string using non-letter or non-digit characters. For example, to_timestamp('2000y6m1d', 'yyyy-MM-DD')
used to work. Now you can only use letter characters for this purpose. For example, to_timestamp('2000y6m1d', 'yyyytMMtDDt')
and to_timestamp('2000y6m1d', 'yyyy"y"MM"m"DD"d"')
skip y
, m
, and d
.
If you want to have a double quote in the output you must precede it with a backslash, for example '\"YYYY Month\"'
. Backslashes are not otherwise special outside of double-quoted strings. Within a double-quoted string, a backslash causes the next character to be taken literally, whatever it is (but this has no special effect unless the next character is a double quote or another backslash).
In to_timestamp
and to_date
, if the year format specification is less than four digits, e.g. YYY
, and the supplied year is less than four digits, the year will be adjusted to be nearest to the year 2020, e.g. 95
becomes 1995.
In to_timestamp
and to_date
, the YYYY
conversion has a restriction when processing years with more than 4 digits. You must use some non-digit character or template after YYYY
, otherwise the year is always interpreted as 4 digits. For example (with the year 20000): to_date('200001131', 'YYYYMMDD')
will be interpreted as a 4-digit year; instead use a non-digit separator after the year, like to_date('20000-1131', 'YYYY-MMDD')
or to_date('20000Nov31', 'YYYYMonDD')
.
In to_timestamp
and to_date
, the CC
(century) field is accepted but ignored if there is a YYY
, YYYY
or Y,YYY
field. If CC
is used with YY
or Y
then the result is computed as that year in the specified century. If the century is specified but the year is not, the first year of the century is assumed.
In to_timestamp
and to_date
, weekday names or numbers (DAY
, D
, and related field types) are accepted but are ignored for purposes of computing the result. The same is true for quarter (Q
) fields.
In to_timestamp
and to_date
, an ISO 8601 week-numbering date (as distinct from a Gregorian date) can be specified in one of two ways:
Year, week number, and weekday: for example to_date('2006-42-4', 'IYYY-IW-ID')
returns the date 2006-10-19
. If you omit the weekday it is assumed to be 1 (Monday).
Year and day of year: for example to_date('2006-291', 'IYYY-IDDD')
also returns 2006-10-19
.
Attempting to enter a date using a mixture of ISO 8601 week-numbering fields and Gregorian date fields is nonsensical, and will cause an error. In the context of an ISO 8601 week-numbering year, the concept of a “month” or “day of month” has no meaning. In the context of a Gregorian year, the ISO week has no meaning.
Caution
In to_timestamp
, millisecond (MS
) or microsecond (US
) fields are used as the seconds digits after the decimal point. For example to_timestamp('12.3', 'SS.MS')
is not 3 milliseconds, but 300, because the conversion treats it as 12 + 0.3 seconds. So, for the format SS.MS
, the input values 12.3
, 12.30
, and 12.300
specify the same number of milliseconds. To get three milliseconds, one must write 12.003
, which the conversion treats as 12 + 0.003 = 12.003 seconds.
Here is a more complex example: to_timestamp('15:12:02.020.001230', 'HH24:MI:SS.MS.US')
is 15 hours, 12 minutes, and 2 seconds + 20 milliseconds + 1230 microseconds = 2.021230 seconds.
to_char(..., 'ID')
's day of the week numbering matches the extract(isodow from ...)
function, but to_char(..., 'D')
's does not match extract(dow from ...)
's day numbering.
to_char(interval)
formats HH
and HH12
as shown on a 12-hour clock, for example zero hours and 36 hours both output as 12
, while HH24
outputs the full hour value, which can exceed 23 in an interval
value.
Usage notes for numeric formatting:
0
specifies a digit position that will always be printed, even if it contains a leading/trailing zero. 9
also specifies a digit position, but if it is a leading zero then it will be replaced by a space, while if it is a trailing zero and fill mode is specified then it will be deleted. (For to_number()
, these two pattern characters are equivalent.)
If no explicit provision is made for a sign in to_char()
's pattern, one column will be reserved for the sign, and it will be anchored to (appear just left of) the number. If S
appears just left of some 9
's, it will likewise be anchored to the number.
A sign formatted using SG
, PL
, or MI
is not anchored to the number; for example, to_char(-12, 'MI9999')
produces '- 12'
but to_char(-12, 'S9999')
produces ' -12'
. (The Oracle implementation does not allow the use of MI
before 9
, but rather requires that 9
precede MI
.)
TH
does not convert values less than zero and does not convert fractional numbers.
PL
, SG
, and TH
are PostgreSQL extensions.
In to_number
, if non-data template patterns such as L
or TH
are used, the corresponding number of input characters are skipped, whether or not they match the template pattern, unless they are data characters (that is, digits, sign, decimal point, or comma). For example, TH
would skip two non-data characters.
V
with to_char
multiplies the input values by 10^
n
, where n
is the number of digits following V
. V
with to_number
divides in a similar manner. to_char
and to_number
do not support the use of V
combined with a decimal point (e.g., 99.9V99
is not allowed).
EEEE
(scientific notation) cannot be used in combination with any of the other formatting patterns or modifiers other than digit and decimal point patterns, and must be at the end of the format string (e.g., 9.99EEEE
is a valid pattern).
to_char Examples
The geometric types point, box, lseg, line, path, polygon, and circle have a large set of native support functions and operators, shown in Table 9.33, Table 9.34, and Table 9.35.
Note that the “same as” operator, ~=, represents the usual notion of equality for the point, box, polygon, and circle types. Some of these types also have an = operator, but = compares for equal areas only. The other scalar comparison operators (<= and so on) likewise compare areas for these types.
Table 9.33. Geometric Operators
Before PostgreSQL 8.2, the containment operators @> and <@ were respectively called ~ and @. These names are still available, but are deprecated and will eventually be removed.
Table 9.34. Geometric Functions
Table 9.35. Geometric Type Conversion Functions
It is possible to access the two component numbers of a point as though the point were an array with indexes 0 and 1. For example, if t.p is a point column then SELECT p[0] FROM t retrieves the X coordinate and UPDATE t SET p[1] = ... changes the Y coordinate. In the same way, a value of type box or lseg can be treated as an array of two point values.
The area function works for the types box, circle, and path. The area function only works on the path data type if the points in the path are non-intersecting. For example, the path '((0,0),(0,1),(2,1),(2,2),(1,2),(1,0),(0,0))'::PATH will not work; however, the following visually identical path '((0,0),(0,1),(1,1),(1,2),(2,2),(2,1),(1,1),(1,0),(0,0))'::PATH will work. If the concept of an intersecting versus non-intersecting path is confusing, draw both of the above paths side by side on a piece of graph paper.
This section describes:
functions and operators for processing and creating JSON data
the SQL/JSON path language
To learn more about the SQL/JSON standard, see []. For details on the JSON types supported in PostgreSQL, see .
lists the operators that are available for use with JSON data types (see ).
json and jsonb Operators
These operators have variants for both the json and jsonb types. The field/element/path extraction operators return the same type as their left-hand input (either json or jsonb), except for those specified as returning text, which coerce the result to text. The field/element/path extraction operators return NULL, rather than failing, if the JSON input does not have the right structure to match the request; for example, if no such element exists. The field/element/path extraction operators that accept integer JSON array subscripts all support negative subscripting from the end of arrays.
The standard comparison operators shown in are available for jsonb
, but not for json
. They follow the ordering rules for B-tree operations outlined at .
Some further operators also exist only for jsonb
, as shown in . Many of these operators can be indexed by jsonb
operator classes. For a full description of jsonb
containment and existence semantics, see . describes how these operators can be used to effectively index jsonb
data.
jsonb Operators
The ||
operator concatenates the elements at the top level of each of its operands. It does not operate recursively. For example, if both operands are objects with a common key field name, the value of the field in the result will just be the value from the right hand operand.
The @?
and @@
operators suppress the following errors: lacking object field or array element, unexpected JSON item type, and numeric errors. This behavior might be helpful while searching over JSON document collections of varying structure.
array_to_json
and row_to_json
have the same behavior as to_json
except for offering a pretty-printing option. The behavior described for to_json
likewise applies to each individual value converted by the other JSON creation functions.
The functions json[b]_populate_record
, json[b]_populate_recordset
, json[b]_to_record
and json[b]_to_recordset
operate on a JSON object, or array of objects, and extract the values associated with keys whose names match column names of the output row type. Object fields that do not correspond to any output column name are ignored, and output columns that do not match any object field will be filled with nulls. To convert a JSON value to the SQL type of an output column, the following rules are applied in sequence:
A JSON null value is converted to a SQL null in all cases.
If the output column is of type json
or jsonb
, the JSON value is just reproduced exactly.
If the output column is a composite (row) type, and the JSON value is a JSON object, the fields of the object are converted to columns of the output row type by recursive application of these rules.
Likewise, if the output column is an array type and the JSON value is a JSON array, the elements of the JSON array are converted to elements of the output array by recursive application of these rules.
Otherwise, if the JSON value is a string literal, the contents of the string are fed to the input conversion function for the column's data type.
Otherwise, the ordinary text representation of the JSON value is fed to the input conversion function for the column's data type.
While the examples for these functions use constants, the typical use would be to reference a table in the FROM
clause and use one of its json
or jsonb
columns as an argument to the function. Extracted key values can then be referenced in other parts of the query, like WHERE
clauses and target lists. Extracting multiple values in this way can improve performance over extracting them separately with per-key operators.
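A sketch of that usage pattern, assuming a hypothetical table orders with a jsonb column info:

```sql
-- "orders" and its columns are hypothetical; any table with a jsonb
-- column works the same way.
SELECT r.item, r.qty
FROM orders,
     jsonb_to_record(orders.info) AS r(item text, qty int)
WHERE r.qty > 1;
```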
All the items of the path
parameter of jsonb_set
as well as jsonb_insert
except the last item must be present in the target
. If create_missing
is false, all items of the path
parameter of jsonb_set
must be present. If these conditions are not met the target
is returned unchanged.
If the last path item is an object key, it will be created if it is absent and given the new value. If the last path item is an array index, a positive index counts from the left and a negative index counts from the right: -1 designates the rightmost element, and so on. If the item is out of the range -array_length .. array_length - 1, and create_missing is true, the new value is added at the beginning of the array if the item is negative, and at the end of the array if it is positive.
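For instance, a small sketch of the negative-index and out-of-range cases:

```sql
SELECT jsonb_set('[0, 1, 2]', '{-1}', '"last"');
-- [0, 1, "last"]      (-1 designates the rightmost element)

SELECT jsonb_set('[0, 1, 2]', '{5}', '"new"', true);
-- [0, 1, 2, "new"]    (positive out-of-range index appends at the end)
```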
The json_typeof
function's null
return value should not be confused with a SQL NULL. While calling json_typeof('null'::json)
will return null
, calling json_typeof(NULL::json)
will return a SQL NULL.
If the argument to json_strip_nulls
contains duplicate field names in any object, the result could be semantically somewhat different, depending on the order in which they occur. This is not an issue for jsonb_strip_nulls
since jsonb
values never have duplicate object field names.
The jsonb_path_exists
, jsonb_path_match
, jsonb_path_query
, jsonb_path_query_array
, and jsonb_path_query_first
functions have optional vars
and silent
arguments.
If the vars
argument is specified, it provides an object containing named variables to be substituted into a jsonpath
expression.
If the silent
argument is specified and has the true
value, these functions suppress the same errors as the @?
and @@
operators.
JSON query functions and operators pass the provided path expression to the path engine for evaluation. If the expression matches the queried JSON data, the corresponding SQL/JSON item is returned. Path expressions are written in the SQL/JSON path language and can also include arithmetic expressions and functions. Query functions treat the provided expression as a text string, so it must be enclosed in single quotes.
A path expression consists of a sequence of elements allowed by the jsonpath
data type. The path expression is evaluated from left to right, but you can use parentheses to change the order of operations. If the evaluation is successful, a sequence of SQL/JSON items (SQL/JSON sequence) is produced, and the evaluation result is returned to the JSON query function that completes the specified computation.
For example, suppose you have some JSON data from a GPS tracker that you would like to parse, such as:
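The sample document itself did not survive extraction; a structure consistent with the discussion that follows (keys track, segments, location, and "start time", with the heart-rate field assumed here to be named HR) would be:

```json
{
  "track": {
    "segments": [
      {
        "location": [47.763, 13.4034],
        "start time": "2018-10-14 10:05:14",
        "HR": 73
      },
      {
        "location": [47.706, 13.2635],
        "start time": "2018-10-14 10:39:21",
        "HR": 135
      }
    ]
  }
}
```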
To retrieve the available track segments, you need to use the .key accessor operator for all the preceding JSON objects:
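The path itself was lost in extraction; given the assumed document above, it would presumably be:

```
'$.track.segments'
```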
If the item to retrieve is an element of an array, you have to unnest this array using the [*]
operator. For example, the following path will return location coordinates for all the available track segments:
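Presumably:

```
'$.track.segments[*].location'
```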
To return the coordinates of the first segment only, you can specify the corresponding subscript in the []
accessor operator. Note that the SQL/JSON arrays are 0-relative:
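Presumably:

```
'$.track.segments[0].location'
```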
When defining the path, you can also use one or more filter expressions that work similar to the WHERE
clause in SQL. A filter expression begins with a question mark and provides a condition in parentheses:
Filter expressions must be specified right after the path evaluation step to which they are applied. The result of this step is filtered to include only those items that satisfy the provided condition. SQL/JSON defines three-valued logic, so the condition can be true
, false
, or unknown
. The unknown
value plays the same role as SQL NULL
and can be tested for with the is unknown
predicate. Further path evaluation steps use only those items for which filter expressions return true
.
Suppose you would like to retrieve all heart rate values higher than 130. You can achieve this using the following expression:
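With the assumed field name HR, the expression would presumably be:

```
'$.track.segments[*].HR ? (@ > 130)'
```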
To get the start time of segments with such values instead, you have to filter out irrelevant segments before returning the start time, so the filter expression is applied to the previous step, and the path used in the condition is different:
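Presumably:

```
'$.track.segments[*] ? (@.HR > 130)."start time"'
```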
You can use several filter expressions on the same nesting level, if required. For example, the following expression selects all segments that contain locations with relevant coordinates and high heart rate values:
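Presumably something like:

```
'$.track.segments[*] ? (@.location[1] < 13.4 && @.HR > 130)."start time"'
```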
Using filter expressions at different nesting levels is also allowed. The following example first filters all segments by location, and then returns high heart rate values for these segments, if available:
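Presumably something like:

```
'$.track.segments[*] ? (@.location[1] < 13.4).HR ? (@ > 130)'
```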
You can also nest filter expressions within each other:
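Presumably something like:

```
'$.track ? (exists(@.segments[*] ? (@.HR > 130))).segments.size()'
```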
This expression returns the size of the track if it contains any segments with high heart rate values, or an empty sequence otherwise.
PostgreSQL's implementation of SQL/JSON path language has the following deviations from the SQL/JSON standard:
The .datetime() item method is not implemented yet, mainly because immutable jsonpath functions and operators cannot reference the session time zone, which is used in some datetime operations. Datetime support will be added to jsonpath in future versions of PostgreSQL.
A path expression can be a Boolean predicate, although the SQL/JSON standard allows predicates only in filters. This is necessary for implementation of the @@
operator. For example, the following jsonpath
expression is valid in PostgreSQL:
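The elided example is presumably a bare comparison such as (again assuming the HR field):

```
'$.track.segments[*].HR < 70'
```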
When you query JSON data, the path expression may not match the actual JSON data structure. An attempt to access a non-existent member of an object or element of an array results in a structural error. SQL/JSON path expressions have two modes of handling structural errors:
lax (default) — the path engine implicitly adapts the queried data to the specified path. Any remaining structural errors are suppressed and converted to empty SQL/JSON sequences.
strict — if a structural error occurs, an error is raised.
The lax mode facilitates matching of a JSON document structure and path expression if the JSON data does not conform to the expected schema. If an operand does not match the requirements of a particular operation, it can be automatically wrapped as an SQL/JSON array or unwrapped by converting its elements into an SQL/JSON sequence before performing this operation. Besides, comparison operators automatically unwrap their operands in the lax mode, so you can compare SQL/JSON arrays out-of-the-box. An array of size 1 is considered equal to its sole element. Automatic unwrapping is not performed only when:
The path expression contains type()
or size()
methods that return the type and the number of elements in the array, respectively.
The queried JSON data contain nested arrays. In this case, only the outermost array is unwrapped, while all the inner arrays remain unchanged. Thus, implicit unwrapping can only go one level down within each path evaluation step.
For example, when querying the GPS data listed above, you can abstract from the fact that it stores an array of segments when using the lax mode:
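Presumably something like:

```
'lax $.track.segments.location'
```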
In the strict mode, the specified path must exactly match the structure of the queried JSON document to return an SQL/JSON item, so using this path expression will cause an error. To get the same result as in the lax mode, you have to explicitly unwrap the segments
array:
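Presumably:

```
'strict $.track.segments[*].location'
```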
SQL/JSON path expressions allow matching text to a regular expression with the like_regex
filter. For example, the following SQL/JSON path query would case-insensitively match all strings in an array that start with an English vowel:
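The elided query is presumably:

```
'$[*] ? (@ like_regex "^[aeiou]" flag "i")'
```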
The optional flag
string may include one or more of the characters i
for case-insensitive match, m
to allow ^
and $
to match at newlines, s
to allow .
to match a newline, and q
to quote the whole pattern (reducing the behavior to a simple substring match).
jsonpath Operators and Methods

jsonpath Filter Expression Elements

shows the functions available for use with the cidr and inet types. The abbrev, host, and text functions are primarily intended to offer alternative display formats.
shows the functions available for use with the macaddr type. The function trunc(macaddr) returns a MAC address with the last 3 bytes set to zero. This can be used to associate the remaining prefix with a manufacturer.

shows the functions available for use with the macaddr8 type. The function trunc(macaddr8) returns a MAC address with the last 5 bytes set to zero. This can be used to associate the remaining prefix with a manufacturer.
If you have standard_conforming_strings turned off, any backslashes you write in literal string constants will need to be doubled. See for more information.
lists the available operators for pattern matching using POSIX regular expressions.
The regexp_replace
function provides substitution of new text for substrings that match POSIX regular expression patterns. It has the syntax regexp_replace
(source
, pattern
, replacement
[, flags
]). The source
string is returned unchanged if there is no match to the pattern
. If there is a match, the source
string is returned with the replacement
string substituted for the matching substring. The replacement
string can contain \
n
, where n
is 1 through 9, to indicate that the source substring matching the n
'th parenthesized subexpression of the pattern should be inserted, and it can contain \&
to indicate that the substring matching the entire pattern should be inserted. Write \\
if you need to put a literal backslash in the replacement text. The flags
parameter is an optional text string containing zero or more single-letter flags that change the function's behavior. Flag i
specifies case-insensitive matching, while flag g
specifies replacement of each matching substring rather than only the first one. Supported flags (though not g
) are described in .
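For instance, first-match-only behavior versus the g flag:

```sql
SELECT regexp_replace('foobarbaz', 'b..', 'X');       -- 'fooXbaz'  (first match only)
SELECT regexp_replace('foobarbaz', 'b..', 'X', 'g');  -- 'fooXX'    (every match)
```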
The regexp_match
function returns a text array of captured substring(s) resulting from the first match of a POSIX regular expression pattern to a string. It has the syntax regexp_match
(string
, pattern
[, flags
]). If there is no match, the result is NULL
. If a match is found, and the pattern
contains no parenthesized subexpressions, then the result is a single-element text array containing the substring matching the whole pattern. If a match is found, and the pattern
contains parenthesized subexpressions, then the result is a text array whose n
'th element is the substring matching the n
'th parenthesized subexpression of the pattern
(not counting “non-capturing” parentheses; see below for details). The flags
parameter is an optional text string containing zero or more single-letter flags that change the function's behavior. Supported flags are described in .
The regexp_matches
function returns a set of text arrays of captured substring(s) resulting from matching a POSIX regular expression pattern to a string. It has the same syntax as regexp_match
. This function returns no rows if there is no match, one row if there is a match and the g
flag is not given, or N
rows if there are N
matches and the g
flag is given. Each returned row is a text array containing the whole matched substring or the substrings matching parenthesized subexpressions of the pattern
, just as described above for regexp_match
. regexp_matches
accepts all the flags shown in , plus the g
flag which commands it to return all matches, not just the first one.
The regexp_split_to_table
function splits a string using a POSIX regular expression pattern as a delimiter. It has the syntax regexp_split_to_table
(string
, pattern
[, flags
]). If there is no match to the pattern
, the function returns the string
. If there is at least one match, for each match it returns the text from the end of the last match (or the beginning of the string) to the beginning of the match. When there are no more matches, it returns the text from the end of the last match to the end of the string. The flags
parameter is an optional text string containing zero or more single-letter flags that change the function's behavior. regexp_split_to_table
supports the flags described in .
PostgreSQL always initially presumes that a regular expression follows the ARE rules. However, the more limited ERE or BRE rules can be chosen by prepending an embedded option to the RE pattern, as described in . This can be useful for compatibility with applications that expect exactly the POSIX 1003.2 rules.
A quantified atom is an atom possibly followed by a single quantifier. Without a quantifier, it matches a match for the atom. With a quantifier, it can match some number of matches of the atom. An atom can be any of the possibilities shown in . The possible quantifiers and their meanings are shown in .
A constraint matches an empty string, but matches only when specific conditions are met. A constraint can be used where an atom could be used, except it cannot be followed by a quantifier. The simple constraints are shown in ; some more constraints are described later.
If you have standard_conforming_strings turned off, any backslashes you write in literal string constants will need to be doubled. See for more information.
Non-greedy quantifiers (available in AREs only) match the same possibilities as their corresponding normal (greedy) counterparts, but prefer the smallest number rather than the largest number of matches. See for more detail.
Lookahead and lookbehind constraints cannot contain back references (see ), and all parentheses within them are considered non-capturing.
Character-entry escapes exist to make it easier to specify non-printing and other inconvenient characters in REs. They are shown in .
Class-shorthand escapes provide shorthands for certain commonly-used character classes. They are shown in .
A constraint escape is a constraint, matching the empty string if specific conditions are met, written as an escape. They are shown in .
A back reference (\
n
) matches the same string matched by the previous parenthesized subexpression specified by the number n
(see ). For example, ([bc])\1
matches bb
or cc
but not bc
or cb
. The subexpression must entirely precede the back reference in the RE. Subexpressions are numbered in the order of their leading parentheses. Non-capturing parentheses do not define subexpressions.
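Concretely:

```sql
SELECT 'abcc' ~ '([bc])\1';  -- true:  "cc" repeats the captured character
SELECT 'abcd' ~ '([bc])\1';  -- false: no [bc] character appears twice in a row
```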
An ARE can begin with embedded options: a sequence (?
xyz
)
(where xyz
is one or more alphabetic characters) specifies options affecting the rest of the RE. These options override any previously determined options — in particular, they can override the case-sensitivity behavior implied by a regex operator, or the flags
parameter to a regex function. The available option letters are shown in . Note that these same option letters are used in the flags
parameters of regex functions.
The extract
function is primarily intended for computational processing. For formatting date/time values for display, see .
When the input value is of type timestamp with time zone
, the truncation is performed with respect to a particular time zone; for example, truncation to day
produces a value that is midnight in that zone. By default, truncation is done with respect to the current setting, but the optional time_zone
argument can be provided to specify a different time zone. The time zone name can be specified in any of the ways described in .
The AT TIME ZONE construct converts a time stamp without time zone to/from a time stamp with time zone, and converts time values to different time zones. shows its variants.
In these expressions, the desired time zone zone
can be specified either as a text string (e.g., 'America/Los_Angeles'
) or as an interval (e.g., INTERVAL '-08:00'
). In the text case, a time zone name can be specified in any of the ways described in .
The data types of all the result expressions must be convertible to a single output type. See for more details.

As described in , subexpressions of an expression may be evaluated at different times, so the principle that "CASE evaluates only the necessary subexpressions" is not ironclad. For example, a constant 1/0 subexpression will usually result in a division-by-zero failure at planning time, even if it is within a CASE arm that would never be entered at run time.

The GREATEST and LEAST functions select the largest or smallest value from a list of any number of expressions. The expressions must all be convertible to a common data type, which becomes the type of the result (see for details). NULL values in the list are ignored; the result is NULL only if all the expressions evaluate to NULL.
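For example:

```sql
SELECT GREATEST(1, 2, 3);     -- 3
SELECT LEAST(1, NULL, 3);     -- 1     (the NULL is ignored)
SELECT LEAST(NULL, NULL);     -- NULL  (all inputs are NULL)
```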
See for more information about regclass.

If a sequence object has been created with default parameters, successive nextval calls will return successive values beginning with 1. Other behaviors can be obtained by using special parameters in the CREATE SEQUENCE command; see its command reference page for more information.

All the text search functions that accept an optional regconfig argument will use the configuration specified by default_text_search_config when that argument is omitted.

The functions in are listed separately because they are not usually used in everyday text searching operations. They are helpful for development and debugging of new text search configurations.
Unlike the other functions described here, the xmlagg function is an aggregate function. It concatenates the input values to the aggregate function call, much like xmlconcat does, except that the concatenation occurs across rows rather than across expressions in a single row. See for additional information about aggregate functions.

The expression IS DOCUMENT returns true if the argument XML value is a proper XML document, false if it is not (that is, it is a content fragment), or null if the argument is null. See regarding the difference between documents and content fragments.
As an example of using the output produced by these functions, shows an XSLT stylesheet that converts the output of table_to_xml_and_xmlschema
to an HTML document containing a tabular rendition of the table data. In a similar manner, the results from these functions can be converted into other XML-based formats.
Modifiers can be applied to any template pattern to alter its behavior. For example, FMMonth
is the Month
pattern with the FM
modifier. shows the modifier patterns for date/time formatting.
While to_date
will reject a mixture of Gregorian and ISO week-numbering date fields, to_char
will not, since output format specifications like YYYY-MM-DD (IYYY-IDDD)
can be useful. But avoid writing something like IYYY-MM-DD
; that would yield surprising results near the start of the year. (See for more information.)
shows the template patterns available for formatting numeric values.
The pattern characters S
, L
, D
, and G
represent the sign, currency symbol, decimal point, and thousands separator characters defined by the current locale (see and ). The pattern characters period and comma represent those exact characters, with the meanings of decimal point and thousands separator, regardless of locale.
Certain modifiers can be applied to any template pattern to alter its behavior. For example, FM99.99
is the 99.99
pattern with the FM
modifier. shows the modifier patterns for numeric formatting.
shows some examples of the use of the to_char
function.
shows the functions that are available for creating json and jsonb values. (There are no jsonb equivalents of the row_to_json and array_to_json functions. However, the to_jsonb function supplies much the same functionality as those functions would.)
The hstore extension has a cast from hstore to json, so that hstore values converted via the JSON creation functions will be represented as JSON objects, not as primitive string values.
shows the functions that are available for processing json
and jsonb
values.
Many of these functions and operators will convert Unicode escapes in JSON strings to the appropriate single character. This is a non-issue if the input is type jsonb
, because the conversion was already done; but for json
input, this may result in throwing an error, as noted in .
See also for the aggregate function json_agg
which aggregates record values as JSON, and the aggregate function json_object_agg
which aggregates pairs of values into a JSON object, and their jsonb
equivalents, jsonb_agg
and jsonb_object_agg
.
SQL/JSON path expressions specify the items to be retrieved from the JSON data, similar to XPath expressions used for SQL access to XML. In PostgreSQL, path expressions are implemented as the jsonpath
data type and can use any elements described in .
To refer to the JSON data to be queried (the context item), use the $ sign in the path expression. It can be followed by one or more accessor operators, which go down the JSON structure level by level to retrieve the content of the context item. Each operator that follows deals with the result of the previous evaluation step.
The result of each path evaluation step can be processed by one or more jsonpath
operators and methods listed in . Each method name must be preceded by a dot. For example, you can get an array size:
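The elided example, with the assumed GPS document, would presumably be:

```
'$.track.segments.size()'
```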
For more examples of using jsonpath
operators and methods within path expressions, see .
Functions and operators that can be used in filter expressions are listed in . The path evaluation result to be filtered is denoted by the @
variable. To refer to a JSON element stored at a lower nesting level, add one or more accessor operators after @
.
There are minor differences in the interpretation of regular expression patterns used in like_regex
filters, as described in .
The SQL/JSON standard borrows its definition for regular expressions from the LIKE_REGEX
operator, which in turn uses the XQuery standard. PostgreSQL does not currently support the LIKE_REGEX
operator. Therefore, the like_regex
filter is implemented using the POSIX regular expression engine described in . This leads to various minor discrepancies from standard SQL/JSON behavior, which are cataloged in . Note, however, that the flag-letter incompatibilities described there do not apply to SQL/JSON, as it translates the XQuery flag letters to match what the POSIX engine expects.
Keep in mind that the pattern argument of like_regex
is a JSON path string literal, written according to the rules given in . This means in particular that any backslashes you want to use in the regular expression must be doubled. For example, to match strings that contain only digits:
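The elided example is presumably:

```
'$ ? (@ like_regex "^\\d+$")'
```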
shows the operators and methods available in jsonpath
. shows the available filter expression elements.
| JSON primitive type | PostgreSQL type | Notes |
| string | text | \u0000 is disallowed, as are non-ASCII Unicode escapes if the database encoding is not UTF8 |
| number | numeric | NaN and infinity values are disallowed |
| boolean | boolean | Only lowercase true and false spellings are accepted |
| null | (none) | SQL NULL is a different concept |
| Variable | Description |
| $ | A variable representing the JSON text to be queried (the context item). |
| $varname | A named variable. Its value can be set by the parameter vars of several JSON processing functions. See Table 9.47 and its notes for details. |
| @ | A variable representing the result of path evaluation in filter expressions. |
| Accessor Operator | Description |
| .key or ."$varname" | Member accessor that returns an object member with the specified key. If the key name is a named variable starting with $ or does not meet the JavaScript rules of an identifier, it must be enclosed in double quotes as a character string literal. |
| .* | Wildcard member accessor that returns the values of all members located at the top level of the current object. |
| .** | Recursive wildcard member accessor that processes all levels of the JSON hierarchy of the current object and returns all the member values, regardless of their nesting level. This is a PostgreSQL extension of the SQL/JSON standard. |
| .**{level} or .**{start_level to end_level} | Same as .**, but with a filter over nesting levels of JSON hierarchy. Nesting levels are specified as integers. Zero level corresponds to the current object. To access the lowest nesting level, you can use the last keyword. This is a PostgreSQL extension of the SQL/JSON standard. |
| [subscript, ...] | Array element accessor. subscript can be given in two forms: index or start_index to end_index. The first form returns a single array element by its index. The second form returns an array slice by the range of indexes, including the elements that correspond to the provided start_index and end_index. The specified index can be an integer, as well as an expression returning a single numeric value, which is automatically cast to integer. Zero index corresponds to the first array element. You can also use the last keyword to denote the last array element, which is useful for handling arrays of unknown length. |
| [*] | Wildcard array element accessor that returns all array elements. |
| Name | Description |
| any | Indicates that a function accepts any input data type. |
| anyelement | Indicates that a function accepts any data type (see Section 37.2.5). |
| anyarray | Indicates that a function accepts any array data type (see Section 37.2.5). |
| anynonarray | Indicates that a function accepts any non-array data type (see Section 37.2.5). |
| anyenum | Indicates that a function accepts any enum data type (see Section 37.2.5 and Section 8.7). |
| anyrange | Indicates that a function accepts any range data type (see Section 37.2.5 and Section 8.17). |
| cstring | Indicates that a function accepts or returns a null-terminated C string. |
| internal | Indicates that a function accepts or returns a server-internal data type. |
| language_handler | A procedural language call handler is declared to return language_handler. |
| fdw_handler | A foreign-data wrapper handler is declared to return fdw_handler. |
| index_am_handler | An index access method handler is declared to return index_am_handler. |
| tsm_handler | A tablesample method handler is declared to return tsm_handler. |
| record | Identifies a function taking or returning an unspecified row type. |
| trigger | A trigger function is declared to return trigger. |
| event_trigger | An event trigger function is declared to return event_trigger. |
| pg_ddl_command | Identifies a representation of DDL commands that is available to event triggers. |
| void | Indicates that a function returns no value. |
| unknown | Identifies a not-yet-resolved type, e.g. of an undecorated string literal. |
| opaque | An obsolete type name that formerly served many of the above purposes. |
| Name | References | Description | Value Example |
| oid | any | numeric object identifier | 564182 |
| regproc | pg_proc | function name | sum |
| regprocedure | pg_proc | function with argument types | sum(int4) |
| regoper | pg_operator | operator name | + |
| regoperator | pg_operator | operator with argument types | *(integer,integer) or -(NONE,integer) |
| regclass | pg_class | relation name | pg_type |
| regtype | pg_type | data type name | integer |
| regrole | pg_authid | role name | smithee |
| regnamespace | pg_namespace | namespace name | pg_catalog |
| regconfig | pg_ts_config | text search configuration | english |
| regdictionary | pg_ts_dict | text search dictionary | simple |
| Function | Return Type | Description | Example | Result |
| string \|\| string | bytea | String concatenation | '\\Post'::bytea \|\| '\047gres\000'::bytea | \\Post'gres\000 |
| octet_length(string) | int | Number of bytes in binary string | octet_length('jo\000se'::bytea) | 5 |
| overlay(string placing string from int [for int]) | bytea | Replace substring | overlay('Th\000omas'::bytea placing '\002\003'::bytea from 2 for 3) | T\\002\\003mas |
| position(substring in string) | int | Location of specified substring | position('\000om'::bytea in 'Th\000omas'::bytea) | 3 |
| substring(string [from int] [for int]) | bytea | Extract substring | substring('Th\000omas'::bytea from 2 for 3) | h\000o |
| trim([both] bytes from string) | bytea | Remove the longest string containing only bytes appearing in bytes from the start and end of string | trim('\000\001'::bytea from '\000Tom\001'::bytea) | Tom |
| Function | Return Type | Description | Example | Result |
| btrim(string bytea, bytes bytea) | bytea | Remove the longest string containing only bytes appearing in bytes from the start and end of string | btrim('\000trim\001'::bytea, '\000\001'::bytea) | trim |
| decode(string text, format text) | bytea | Decode binary data from textual representation in string. Options for format are same as in encode. | decode('123\000456', 'escape') | 123\000456 |
| encode(data bytea, format text) | text | Encode binary data into a textual representation. Supported formats are: base64, hex, escape. escape converts zero bytes and high-bit-set bytes to octal sequences (\nnn) and doubles backslashes. | encode('123\000456'::bytea, 'escape') | 123\000456 |
| get_bit(string, offset) | int | Extract bit from string | get_bit('Th\000omas'::bytea, 45) | 1 |
| get_byte(string, offset) | int | Extract byte from string | get_byte('Th\000omas'::bytea, 4) | 109 |
| length(string) | int | Length of binary string | length('jo\000se'::bytea) | 5 |
| md5(string) | text | Calculates the MD5 hash of string, returning the result in hexadecimal | md5('Th\000omas'::bytea) | 8ab2d3c9689aaf18b4958c334c82d8b1 |
| set_bit(string, offset, newvalue) | bytea | Set bit in string | set_bit('Th\000omas'::bytea, 45, 0) | Th\000omAs |
| set_byte(string, offset, newvalue) | bytea | Set byte in string | set_byte('Th\000omas'::bytea, 4, 64) | Th\000o@as |
| sha224(bytea) | bytea | SHA-224 hash | sha224('abc') | \x23097d223405d8228642a477bda255b32aadbce4bda0b3f7e36c9da7 |
| sha256(bytea) | bytea | SHA-256 hash | sha256('abc') | \xba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad |
| sha384(bytea) | bytea | SHA-384 hash | sha384('abc') | \xcb00753f45a35e8bb5a03d699ac65007272c32ab0eded1631a8b605a43ff5bed8086072ba1e7cc2358baeca134c825a7 |
| sha512(bytea) | bytea | SHA-512 hash | sha512('abc') | \xddaf35a193617abacc417349ae20413112e6fa4e89a97ea20a9eeee64b55d39a2192992a274fc1a836ba3c23a3feebbd454d4423643ce80e2a9ac94fa54ca49f |
| Operator | Description | Example | Result |
| \|\| | concatenation | B'10001' \|\| B'011' | 10001011 |
| & | bitwise AND | B'10001' & B'01101' | 00001 |
| \| | bitwise OR | B'10001' \| B'01101' | 11101 |
| # | bitwise XOR | B'10001' # B'01101' | 11100 |
| ~ | bitwise NOT | ~ B'10001' | 01110 |
| << | bitwise shift left | B'10001' << 3 | 01000 |
| >> | bitwise shift right | B'10001' >> 2 | 00100 |
| a | b | a AND b | a OR b |
| TRUE | TRUE | TRUE | TRUE |
| TRUE | FALSE | FALSE | TRUE |
| TRUE | NULL | NULL | TRUE |
| FALSE | FALSE | FALSE | FALSE |
| FALSE | NULL | FALSE | NULL |
| NULL | NULL | NULL | NULL |

| a | NOT a |
| TRUE | FALSE |
| FALSE | TRUE |
| NULL | NULL |
| Operator | Description | Example | Result |
| + | addition | 2 + 3 | 5 |
| - | subtraction | 2 - 3 | -1 |
| * | multiplication | 2 * 3 | 6 |
| / | division (integer division truncates the result) | 4 / 2 | 2 |
| % | modulo (remainder) | 5 % 4 | 1 |
| ^ | exponentiation (associates left to right) | 2.0 ^ 3.0 | 8 |
| \|/ | square root | \|/ 25.0 | 5 |
| \|\|/ | cube root | \|\|/ 27.0 | 3 |
| ! | factorial | 5 ! | 120 |
| !! | factorial (prefix operator) | !! 5 | 120 |
| @ | absolute value | @ -5.0 | 5 |
| & | bitwise AND | 91 & 15 | 11 |
| \| | bitwise OR | 32 \| 3 | 35 |
| # | bitwise XOR | 17 # 5 | 20 |
| ~ | bitwise NOT | ~1 | -2 |
| << | bitwise shift left | 1 << 4 | 16 |
| >> | bitwise shift right | 8 >> 2 | 2 |
| Function | Return Type | Description | Example | Result |
| abs(x) | (same as input) | absolute value | abs(-17.4) | 17.4 |
| cbrt(dp) | dp | cube root | cbrt(27.0) | 3 |
| ceil(dp or numeric) | (same as input) | nearest integer greater than or equal to argument | ceil(-42.8) | -42 |
| ceiling(dp or numeric) | (same as input) | nearest integer greater than or equal to argument (same as ceil) | ceiling(-95.3) | -95 |
| degrees(dp) | dp | radians to degrees | degrees(0.5) | 28.6478897565412 |
| div(y numeric, x numeric) | numeric | integer quotient of y/x | div(9,4) | 2 |
| exp(dp or numeric) | (same as input) | exponential | exp(1.0) | 2.71828182845905 |
| floor(dp or numeric) | (same as input) | nearest integer less than or equal to argument | floor(-42.8) | -43 |
| ln(dp or numeric) | (same as input) | natural logarithm | ln(2.0) | 0.693147180559945 |
| log(dp or numeric) | (same as input) | base 10 logarithm | log(100.0) | 2 |
| log10(dp or numeric) | (same as input) | base 10 logarithm | log10(100.0) | 2 |
| log(b numeric, x numeric) | numeric | logarithm to base b | log(2.0, 64.0) | 6.0000000000 |
| mod(y, x) | (same as argument types) | remainder of y/x | mod(9,4) | 1 |
| pi() | dp | “π” constant | pi() | 3.14159265358979 |
| power(a dp, b dp) | dp | a raised to the power of b | power(9.0, 3.0) | 729 |
| power(a numeric, b numeric) | numeric | a raised to the power of b | power(9.0, 3.0) | 729 |
| radians(dp) | dp | degrees to radians | radians(45.0) | 0.785398163397448 |
| round(dp or numeric) | (same as input) | round to nearest integer | round(42.4) | 42 |
| round(v numeric, s int) | numeric | round to s decimal places | round(42.4382, 2) | 42.44 |
| scale(numeric) | integer | scale of the argument (the number of decimal digits in the fractional part) | scale(8.41) | 2 |
| sign(dp or numeric) | (same as input) | sign of the argument (-1, 0, +1) | sign(-8.4) | -1 |
| sqrt(dp or numeric) | (same as input) | square root | sqrt(2.0) | 1.4142135623731 |
| trunc(dp or numeric) | (same as input) | truncate toward zero | trunc(42.8) | 42 |
| trunc(v numeric, s int) | numeric | truncate to s decimal places | trunc(42.4382, 2) | 42.43 |
| width_bucket(operand dp, b1 dp, b2 dp, count int) | int | return the bucket number to which operand would be assigned in a histogram having count equal-width buckets spanning the range b1 to b2; returns 0 or count+1 for an input outside the range | width_bucket(5.35, 0.024, 10.06, 5) | 3 |
| width_bucket(operand numeric, b1 numeric, b2 numeric, count int) | int | return the bucket number to which operand would be assigned in a histogram having count equal-width buckets spanning the range b1 to b2; returns 0 or count+1 for an input outside the range | width_bucket(5.35, 0.024, 10.06, 5) | 3 |
| width_bucket(operand anyelement, thresholds anyarray) | int | return the bucket number to which operand would be assigned given an array listing the lower bounds of the buckets; returns 0 for an input less than the first lower bound; the thresholds array must be sorted, smallest first, or unexpected results will be obtained | width_bucket(now(), array['yesterday', 'today', 'tomorrow']::timestamptz[]) | 2 |
| Function | Return Type | Description |
| random() | dp | random value in the range 0.0 <= x < 1.0 |
| setseed(dp) | void | set seed for subsequent random() calls (value between -1.0 and 1.0, inclusive) |
| Function (radians) | Function (degrees) | Description |
| acos(x) | acosd(x) | inverse cosine |
| asin(x) | asind(x) | inverse sine |
| atan(x) | atand(x) | inverse tangent |
| atan2(y, x) | atan2d(y, x) | inverse tangent of y/x |
| cos(x) | cosd(x) | cosine |
| cot(x) | cotd(x) | cotangent |
| sin(x) | sind(x) | sine |
| tan(x) | tand(x) | tangent |
| Function | Description | Example | Result |
| sinh(x) | hyperbolic sine | sinh(0) | 0 |
| cosh(x) | hyperbolic cosine | cosh(0) | 1 |
| tanh(x) | hyperbolic tangent | tanh(0) | 0 |
| asinh(x) | inverse hyperbolic sine | asinh(0) | 0 |
| acosh(x) | inverse hyperbolic cosine | acosh(1) | 0 |
| atanh(x) | inverse hyperbolic tangent | atanh(0) | 0 |
| Function | Return Type | Description | Example | Result |
| string \|\| string | text | String concatenation | 'Post' \|\| 'greSQL' | PostgreSQL |
| string \|\| non-string or non-string \|\| string | text | String concatenation with one non-string input | 'Value: ' \|\| 42 | Value: 42 |
| bit_length(string) | int | Number of bits in string | bit_length('jose') | 32 |
| char_length(string) or character_length(string) | int | Number of characters in string | char_length('jose') | 4 |
| lower(string) | text | Convert string to lower case | lower('TOM') | tom |
| octet_length(string) | int | Number of bytes in string | octet_length('jose') | 4 |
| overlay(string placing string from int [for int]) | text | Replace substring | overlay('Txxxxas' placing 'hom' from 2 for 4) | Thomas |
| position(substring in string) | int | Location of specified substring | position('om' in 'Thomas') | 3 |
| substring(string [from int] [for int]) | text | Extract substring | substring('Thomas' from 2 for 3) | hom |
| substring(string from pattern) | text | Extract substring matching POSIX regular expression. See Section 9.7 for more information on pattern matching. | substring('Thomas' from '...$') | mas |
| substring(string from pattern for escape) | text | Extract substring matching SQL regular expression. See Section 9.7 for more information on pattern matching. | substring('Thomas' from '%#"o_a#"_' for '#') | oma |
| trim([leading \| trailing \| both] [characters] from string) | text | Remove the longest string containing only characters from characters (a space by default) from the start, end, or both ends (both is the default) of string | trim(both 'xyz' from 'yxTomxx') | Tom |
| trim([leading \| trailing \| both] [from] string [, characters]) | text | Non-standard syntax for trim() | trim(both from 'yxTomxx', 'xyz') | Tom |
| upper(string) | text | Convert string to upper case | upper('tom') | TOM |
| Function | Return Type | Description | Example | Result |
| ascii(string) | int | ASCII code of the first character of the argument. For UTF8 returns the Unicode code point of the character. For other multibyte encodings, the argument must be an ASCII character. | ascii('x') | 120 |
| btrim(string text [, characters text]) | text | Remove the longest string consisting only of characters in characters (a space by default) from the start and end of string | btrim('xyxtrimyyx', 'xyz') | trim |
| chr(int) | text | Character with the given code. For UTF8 the argument is treated as a Unicode code point. For other multibyte encodings the argument must designate an ASCII character. The NULL (0) character is not allowed because text data types cannot store such bytes. | chr(65) | A |
| concat(str "any" [, str "any" [, ...] ]) | text | Concatenate the text representations of all the arguments. NULL arguments are ignored. | concat('abcde', 2, NULL, 22) | abcde222 |
| concat_ws(sep text, str "any" [, str "any" [, ...] ]) | text | Concatenate all but the first argument with separators. The first argument is used as the separator string. NULL arguments are ignored. | concat_ws(',', 'abcde', 2, NULL, 22) | abcde,2,22 |
| convert(string bytea, src_encoding name, dest_encoding name) | bytea | Convert string to dest_encoding. The original encoding is specified by src_encoding. The string must be valid in this encoding. Conversions can be defined by CREATE CONVERSION. Also there are some predefined conversions. See Table 9.11 for available conversions. | convert('text_in_utf8', 'UTF8', 'LATIN1') | text_in_utf8 represented in Latin-1 encoding (ISO 8859-1) |
| convert_from(string bytea, src_encoding name) | text | Convert string to the database encoding. The original encoding is specified by src_encoding. The string must be valid in this encoding. | convert_from('text_in_utf8', 'UTF8') | text_in_utf8 represented in the current database encoding |
| convert_to(string text, dest_encoding name) | bytea | Convert string to dest_encoding. | convert_to('some text', 'UTF8') | some text represented in the UTF8 encoding |
| decode(string text, format text) | bytea | Decode binary data from textual representation in string. Options for format are same as in encode. | decode('MTIzAAE=', 'base64') | \x3132330001 |
| encode(data bytea, format text) | text | Encode binary data into a textual representation. Supported formats are: base64, hex, escape. escape converts zero bytes and high-bit-set bytes to octal sequences (\nnn) and doubles backslashes. | encode('123\000\001', 'base64') | MTIzAAE= |
| format(formatstr text [, formatarg "any" [, ...] ]) | text | Format arguments according to a format string. This function is similar to the C function sprintf. See Section 9.4.1. | format('Hello %s, %1$s', 'World') | Hello World, World |
| initcap(string) | text | Convert the first letter of each word to upper case and the rest to lower case. Words are sequences of alphanumeric characters separated by non-alphanumeric characters. | initcap('hi THOMAS') | Hi Thomas |
| left(str text, n int) | text | Return first n characters in the string. When n is negative, return all but last \|n\| characters. | left('abcde', 2) | ab |
| length(string) | int | Number of characters in string | length('jose') | 4 |
| length(string bytea, encoding name) | int | Number of characters in string in the given encoding. The string must be valid in this encoding. | length('jose', 'UTF8') | 4 |
| lpad(string text, length int [, fill text]) | text | Fill up the string to length length by prepending the characters fill (a space by default). If the string is already longer than length then it is truncated (on the right). | lpad('hi', 5, 'xy') | xyxhi |
| ltrim(string text [, characters text]) | text | Remove the longest string containing only characters from characters (a space by default) from the start of string | ltrim('zzzytest', 'xyz') | test |
| md5(string) | text | Calculates the MD5 hash of string, returning the result in hexadecimal | md5('abc') | 900150983cd24fb0d6963f7d28e17f72 |
| parse_ident(qualified_identifier text [, strictmode boolean DEFAULT true ]) | text[] | Split qualified_identifier into an array of identifiers, removing any quoting of individual identifiers. By default, extra characters after the last identifier are considered an error; but if the second parameter is false, then such extra characters are ignored. (This behavior is useful for parsing names for objects like functions.) Note that this function does not truncate over-length identifiers. If you want truncation you can cast the result to name[]. | parse_ident('"SomeSchema".someTable') | {SomeSchema,sometable} |
| pg_client_encoding() | name | Current client encoding name | pg_client_encoding() | SQL_ASCII |
| quote_ident(string text) | text | Return the given string suitably quoted to be used as an identifier in an SQL statement string. Quotes are added only if necessary (i.e., if the string contains non-identifier characters or would be case-folded). Embedded quotes are properly doubled. See also Example 42.1. | quote_ident('Foo bar') | "Foo bar" |
| quote_literal(string text) | text | Return the given string suitably quoted to be used as a string literal in an SQL statement string. Embedded single-quotes and backslashes are properly doubled. Note that quote_literal returns null on null input; if the argument might be null, quote_nullable is often more suitable. See also Example 42.1. | quote_literal(E'O\'Reilly') | 'O''Reilly' |
| quote_literal(value anyelement) | text | Coerce the given value to text and then quote it as a literal. Embedded single-quotes and backslashes are properly doubled. | quote_literal(42.5) | '42.5' |
| quote_nullable(string text) | text | Return the given string suitably quoted to be used as a string literal in an SQL statement string; or, if the argument is null, return NULL. Embedded single-quotes and backslashes are properly doubled. See also Example 42.1. | quote_nullable(NULL) | NULL |
| quote_nullable(value anyelement) | text | Coerce the given value to text and then quote it as a literal; or, if the argument is null, return NULL. Embedded single-quotes and backslashes are properly doubled. | quote_nullable(42.5) | '42.5' |
| regexp_match(string text, pattern text [, flags text]) | text[] | Return captured substring(s) resulting from the first match of a POSIX regular expression to the string. See Section 9.7.3 for more information. | regexp_match('foobarbequebaz', '(bar)(beque)') | {bar,beque} |
| regexp_matches(string text, pattern text [, flags text]) | setof text[] | Return captured substring(s) resulting from matching a POSIX regular expression to the string. See Section 9.7.3 for more information. | regexp_matches('foobarbequebaz', 'ba.', 'g') | {bar}, {baz} (2 rows) |
| regexp_replace(string text, pattern text, replacement text [, flags text]) | text | Replace substring(s) matching a POSIX regular expression. See Section 9.7.3 for more information. | regexp_replace('Thomas', '.[mN]a.', 'M') | ThM |
| regexp_split_to_array(string text, pattern text [, flags text]) | text[] | Split string using a POSIX regular expression as the delimiter. See Section 9.7.3 for more information. | regexp_split_to_array('hello world', '\s+') | {hello,world} |
| regexp_split_to_table(string text, pattern text [, flags text]) | setof text | Split string using a POSIX regular expression as the delimiter. See Section 9.7.3 for more information. | regexp_split_to_table('hello world', '\s+') | hello, world (2 rows) |
| repeat(string text, number int) | text | Repeat string the specified number of times | repeat('Pg', 4) | PgPgPgPg |
| replace(string text, from text, to text) | text | Replace all occurrences in string of substring from with substring to | replace('abcdefabcdef', 'cd', 'XX') | abXXefabXXef |
| reverse(str) | text | Return reversed string. | reverse('abcde') | edcba |
| right(str text, n int) | text | Return last n characters in the string. When n is negative, return all but first \|n\| characters. | right('abcde', 2) | de |
| rpad(string text, length int [, fill text]) | text | Fill up the string to length length by appending the characters fill (a space by default). If the string is already longer than length then it is truncated. | rpad('hi', 5, 'xy') | hixyx |
| rtrim(string text [, characters text]) | text | Remove the longest string containing only characters from characters (a space by default) from the end of string | rtrim('testxxzx', 'xyz') | test |
| split_part(string text, delimiter text, field int) | text | Split string on delimiter and return the given field (counting from one) | split_part('abc~@~def~@~ghi', '~@~', 2) | def |
| strpos(string, substring) | int | Location of specified substring (same as position(substring in string), but note the reversed argument order) | strpos('high', 'ig') | 2 |
| substr(string, from [, count]) | text | Extract substring (same as substring(string from from for count)) | substr('alphabet', 3, 2) | ph |
| starts_with(string, prefix) | bool | Returns true if string starts with prefix. | starts_with('alphabet', 'alph') | t |
| to_ascii(string text [, encoding text]) | text | Convert string to ASCII from another encoding (only supports conversion from LATIN1, LATIN2, LATIN9, and WIN1250 encodings) | to_ascii('Karel') | Karel |
| to_hex(number int or bigint) | text | Convert number to its equivalent hexadecimal representation | to_hex(2147483647) | 7fffffff |
| translate(string text, from text, to text) | text | Any character in string that matches a character in the from set is replaced by the corresponding character in the to set. If from is longer than to, occurrences of the extra characters in from are removed. | translate('12345', '143', 'ax') | a2x5 |
| Conversion Name | Source Encoding | Destination Encoding |
| ascii_to_mic | SQL_ASCII | MULE_INTERNAL |
| ascii_to_utf8 | SQL_ASCII | UTF8 |
| big5_to_euc_tw | BIG5 | EUC_TW |
| big5_to_mic | BIG5 | MULE_INTERNAL |
| big5_to_utf8 | BIG5 | UTF8 |
| euc_cn_to_mic | EUC_CN | MULE_INTERNAL |
| euc_cn_to_utf8 | EUC_CN | UTF8 |
| euc_jp_to_mic | EUC_JP | MULE_INTERNAL |
| euc_jp_to_sjis | EUC_JP | SJIS |
| euc_jp_to_utf8 | EUC_JP | UTF8 |
| euc_kr_to_mic | EUC_KR | MULE_INTERNAL |
| euc_kr_to_utf8 | EUC_KR | UTF8 |
| euc_tw_to_big5 | EUC_TW | BIG5 |
| euc_tw_to_mic | EUC_TW | MULE_INTERNAL |
| euc_tw_to_utf8 | EUC_TW | UTF8 |
| gb18030_to_utf8 | GB18030 | UTF8 |
| gbk_to_utf8 | GBK | UTF8 |
| iso_8859_10_to_utf8 | LATIN6 | UTF8 |
| iso_8859_13_to_utf8 | LATIN7 | UTF8 |
| iso_8859_14_to_utf8 | LATIN8 | UTF8 |
| iso_8859_15_to_utf8 | LATIN9 | UTF8 |
| iso_8859_16_to_utf8 | LATIN10 | UTF8 |
| iso_8859_1_to_mic | LATIN1 | MULE_INTERNAL |
| iso_8859_1_to_utf8 | LATIN1 | UTF8 |
| iso_8859_2_to_mic | LATIN2 | MULE_INTERNAL |
| iso_8859_2_to_utf8 | LATIN2 | UTF8 |
| iso_8859_2_to_windows_1250 | LATIN2 | WIN1250 |
| iso_8859_3_to_mic | LATIN3 | MULE_INTERNAL |
| iso_8859_3_to_utf8 | LATIN3 | UTF8 |
| iso_8859_4_to_mic | LATIN4 | MULE_INTERNAL |
| iso_8859_4_to_utf8 | LATIN4 | UTF8 |
| iso_8859_5_to_koi8_r | ISO_8859_5 | KOI8R |
| iso_8859_5_to_mic | ISO_8859_5 | MULE_INTERNAL |
| iso_8859_5_to_utf8 | ISO_8859_5 | UTF8 |
| iso_8859_5_to_windows_1251 | ISO_8859_5 | WIN1251 |
| iso_8859_5_to_windows_866 | ISO_8859_5 | WIN866 |
| iso_8859_6_to_utf8 | ISO_8859_6 | UTF8 |
| iso_8859_7_to_utf8 | ISO_8859_7 | UTF8 |
| iso_8859_8_to_utf8 | ISO_8859_8 | UTF8 |
| iso_8859_9_to_utf8 | LATIN5 | UTF8 |
| johab_to_utf8 | JOHAB | UTF8 |
| koi8_r_to_iso_8859_5 | KOI8R | ISO_8859_5 |
| koi8_r_to_mic | KOI8R | MULE_INTERNAL |
| koi8_r_to_utf8 | KOI8R | UTF8 |
| koi8_r_to_windows_1251 | KOI8R | WIN1251 |
| koi8_r_to_windows_866 | KOI8R | WIN866 |
| koi8_u_to_utf8 | KOI8U | UTF8 |
| mic_to_ascii | MULE_INTERNAL | SQL_ASCII |
| mic_to_big5 | MULE_INTERNAL | BIG5 |
| mic_to_euc_cn | MULE_INTERNAL | EUC_CN |
| mic_to_euc_jp | MULE_INTERNAL | EUC_JP |
| mic_to_euc_kr | MULE_INTERNAL | EUC_KR |
| mic_to_euc_tw | MULE_INTERNAL | EUC_TW |
| mic_to_iso_8859_1 | MULE_INTERNAL | LATIN1 |
| mic_to_iso_8859_2 | MULE_INTERNAL | LATIN2 |
| mic_to_iso_8859_3 | MULE_INTERNAL | LATIN3 |
| mic_to_iso_8859_4 | MULE_INTERNAL | LATIN4 |
| mic_to_iso_8859_5 | MULE_INTERNAL | ISO_8859_5 |
| mic_to_koi8_r | MULE_INTERNAL | KOI8R |
| mic_to_sjis | MULE_INTERNAL | SJIS |
| mic_to_windows_1250 | MULE_INTERNAL | WIN1250 |
| mic_to_windows_1251 | MULE_INTERNAL | WIN1251 |
| mic_to_windows_866 | MULE_INTERNAL | WIN866 |
| sjis_to_euc_jp | SJIS | EUC_JP |
| sjis_to_mic | SJIS | MULE_INTERNAL |
| sjis_to_utf8 | SJIS | UTF8 |
| windows_1258_to_utf8 | WIN1258 | UTF8 |
| uhc_to_utf8 | UHC | UTF8 |
| utf8_to_ascii | UTF8 | SQL_ASCII |
| utf8_to_big5 | UTF8 | BIG5 |
| utf8_to_euc_cn | UTF8 | EUC_CN |
| utf8_to_euc_jp | UTF8 | EUC_JP |
| utf8_to_euc_kr | UTF8 | EUC_KR |
| utf8_to_euc_tw | UTF8 | EUC_TW |
| utf8_to_gb18030 | UTF8 | GB18030 |
| utf8_to_gbk | UTF8 | GBK |
| utf8_to_iso_8859_1 | UTF8 | LATIN1 |
| utf8_to_iso_8859_10 | UTF8 | LATIN6 |
| utf8_to_iso_8859_13 | UTF8 | LATIN7 |
| utf8_to_iso_8859_14 | UTF8 | LATIN8 |
| utf8_to_iso_8859_15 | UTF8 | LATIN9 |
| utf8_to_iso_8859_16 | UTF8 | LATIN10 |
| utf8_to_iso_8859_2 | UTF8 | LATIN2 |
| utf8_to_iso_8859_3 | UTF8 | LATIN3 |
| utf8_to_iso_8859_4 | UTF8 | LATIN4 |
| utf8_to_iso_8859_5 | UTF8 | ISO_8859_5 |
| utf8_to_iso_8859_6 | UTF8 | ISO_8859_6 |
| utf8_to_iso_8859_7 | UTF8 | ISO_8859_7 |
| utf8_to_iso_8859_8 | UTF8 | ISO_8859_8 |
| utf8_to_iso_8859_9 | UTF8 | LATIN5 |
| utf8_to_johab | UTF8 | JOHAB |
| utf8_to_koi8_r | UTF8 | KOI8R |
| utf8_to_koi8_u | UTF8 | KOI8U |
| utf8_to_sjis | UTF8 | SJIS |
| utf8_to_windows_1258 | UTF8 | WIN1258 |
| utf8_to_uhc | UTF8 | UHC |
| utf8_to_windows_1250 | UTF8 | WIN1250 |
| utf8_to_windows_1251 | UTF8 | WIN1251 |
| utf8_to_windows_1252 | UTF8 | WIN1252 |
| utf8_to_windows_1253 | UTF8 | WIN1253 |
| utf8_to_windows_1254 | UTF8 | WIN1254 |
| utf8_to_windows_1255 | UTF8 | WIN1255 |
| utf8_to_windows_1256 | UTF8 | WIN1256 |
| utf8_to_windows_1257 | UTF8 | WIN1257 |
| utf8_to_windows_866 | UTF8 | WIN866 |
| utf8_to_windows_874 | UTF8 | WIN874 |
| windows_1250_to_iso_8859_2 | WIN1250 | LATIN2 |
| windows_1250_to_mic | WIN1250 | MULE_INTERNAL |
| windows_1250_to_utf8 | WIN1250 | UTF8 |
| windows_1251_to_iso_8859_5 | WIN1251 | ISO_8859_5 |
| windows_1251_to_koi8_r | WIN1251 | KOI8R |
| windows_1251_to_mic | WIN1251 | MULE_INTERNAL |
| windows_1251_to_utf8 | WIN1251 | UTF8 |
| windows_1251_to_windows_866 | WIN1251 | WIN866 |
| windows_1252_to_utf8 | WIN1252 | UTF8 |
| windows_1256_to_utf8 | WIN1256 | UTF8 |
| windows_866_to_iso_8859_5 | WIN866 | ISO_8859_5 |
| windows_866_to_koi8_r | WIN866 | KOI8R |
| windows_866_to_mic | WIN866 | MULE_INTERNAL |
| windows_866_to_utf8 | WIN866 | UTF8 |
| windows_866_to_windows_1251 | WIN866 | WIN1251 |
| windows_874_to_utf8 | WIN874 | UTF8 |
| euc_jis_2004_to_utf8 | EUC_JIS_2004 | UTF8 |
| utf8_to_euc_jis_2004 | UTF8 | EUC_JIS_2004 |
| shift_jis_2004_to_utf8 | SHIFT_JIS_2004 | UTF8 |
| utf8_to_shift_jis_2004 | UTF8 | SHIFT_JIS_2004 |
| euc_jis_2004_to_shift_jis_2004 | EUC_JIS_2004 | SHIFT_JIS_2004 |
| shift_jis_2004_to_euc_jis_2004 | SHIFT_JIS_2004 | EUC_JIS_2004 |
| Operator | Description |
| < | less than |
| > | greater than |
| <= | less than or equal to |
| >= | greater than or equal to |
| = | equal |
| <> or != | not equal |
| Predicate | Description |
| a BETWEEN x AND y | between |
| a NOT BETWEEN x AND y | not between |
| a BETWEEN SYMMETRIC x AND y | between, after sorting the comparison values |
| a NOT BETWEEN SYMMETRIC x AND y | not between, after sorting the comparison values |
| a IS DISTINCT FROM b | not equal, treating null like an ordinary value |
| a IS NOT DISTINCT FROM b | equal, treating null like an ordinary value |
| expression IS NULL | is null |
| expression IS NOT NULL | is not null |
| expression ISNULL | is null (nonstandard syntax) |
| expression NOTNULL | is not null (nonstandard syntax) |
| boolean_expression IS TRUE | is true |
| boolean_expression IS NOT TRUE | is false or unknown |
| boolean_expression IS FALSE | is false |
| boolean_expression IS NOT FALSE | is true or unknown |
| boolean_expression IS UNKNOWN | is unknown |
| boolean_expression IS NOT UNKNOWN | is true or false |
| Function | Description | Example | Example Result |
| num_nonnulls(VARIADIC "any") | returns the number of non-null arguments | num_nonnulls(1, NULL, 2) | 2 |
| num_nulls(VARIADIC "any") | returns the number of null arguments | num_nulls(1, NULL, 2) | 1 |
| Function | Return Type | Description | Example | Result |
| abbrev(inet) | | abbreviated display format as text | | |
| abbrev(cidr) | | abbreviated display format as text | | |
| broadcast(inet) | | broadcast address for network | | |
| family(inet) | | extract family of address; 4 for IPv4, 6 for IPv6 | | |
| host(inet) | | extract IP address as text | | |
| hostmask(inet) | | construct host mask for network | | |
| masklen(inet) | | extract netmask length | | |
| netmask(inet) | | construct netmask for network | | |
| network(inet) | | extract network part of address | | |
| set_masklen(inet, int) | | set netmask length for inet value | | |
| set_masklen(cidr, int) | | set netmask length for cidr value | | |
| text(inet) | | extract IP address and netmask length as text | | |
| inet_same_family(inet, inet) | | are the addresses from the same family? | | |
| inet_merge(inet, inet) | | the smallest network which includes both of the given networks | | |
| Function | Return Type | Description | Example | Result |
| trunc(macaddr) | macaddr | set last 3 bytes to zero | | |
| Function | Return Type | Description | Example | Result |
| trunc(macaddr8) | macaddr8 | set last 5 bytes to zero | | |
| macaddr8_set7bit(macaddr8) | macaddr8 | set 7th bit to one, also known as modified EUI-64, for inclusion in an IPv6 address | | |
| Operator | Description | Example |
| ~ | Matches regular expression, case sensitive | |
| ~* | Matches regular expression, case insensitive | |
| !~ | Does not match regular expression, case sensitive | |
| !~* | Does not match regular expression, case insensitive | |
| Quantifier | Matches |
| * | a sequence of 0 or more matches of the atom |
| + | a sequence of 1 or more matches of the atom |
| ? | a sequence of 0 or 1 matches of the atom |
| {m} | a sequence of exactly m matches of the atom |
| {m,} | a sequence of m or more matches of the atom |
| {m,n} | a sequence of m through n (inclusive) matches of the atom |
| *? | non-greedy version of * |
| +? | non-greedy version of + |
| ?? | non-greedy version of ? |
| {m}? | non-greedy version of {m} |
| {m,}? | non-greedy version of {m,} |
| {m,n}? | non-greedy version of {m,n} |
| Constraint | Description |
| ^ | matches at the beginning of the string |
| $ | matches at the end of the string |
| (?=re) | positive lookahead matches at any point where a substring matching re begins |
| (?!re) | negative lookahead matches at any point where no substring matching re begins |
| (?<=re) | positive lookbehind matches at any point where a substring matching re ends |
| (?<!re) | negative lookbehind matches at any point where no substring matching re ends |
| Escape | Description |
| \a | alert (bell) character, as in C |
| \b | backspace, as in C |
| \B | synonym for backslash (\) to help reduce the need for backslash doubling |
| \cX | (where X is any character) the character whose low-order 5 bits are the same as those of X, and whose other bits are all zero |
| \e | the character whose collating-sequence name is ESC, or failing that, the character with octal value 033 |
| \f | form feed, as in C |
| \n | newline, as in C |
| \r | carriage return, as in C |
| \t | horizontal tab, as in C |
| \uwxyz | (where wxyz is exactly four hexadecimal digits) the character whose hexadecimal value is 0xwxyz |
| \Ustuvwxyz | (where stuvwxyz is exactly eight hexadecimal digits) the character whose hexadecimal value is 0xstuvwxyz |
| \v | vertical tab, as in C |
| \xhhh | (where hhh is any sequence of hexadecimal digits) the character whose hexadecimal value is 0xhhh |
| \0 | the character whose value is 0 (the null byte) |
| \xy | (where xy is exactly two octal digits, and is not a back reference) the character whose octal value is 0xy |
| \xyz | (where xyz is exactly three octal digits, and is not a back reference) the character whose octal value is 0xyz |
| Expression | Return Type | Description |
| timestamp without time zone AT TIME ZONE zone | timestamp with time zone | Treat given time stamp without time zone as located in the specified time zone |
| timestamp with time zone AT TIME ZONE zone | timestamp without time zone | Convert given time stamp with time zone to the new time zone, with no time zone designation |
| time with time zone AT TIME ZONE zone | time with time zone | Convert given time with time zone to the new time zone |
| Function | Return Type | Description | Example | Result |
| ts_debug | | test a configuration | | |
| ts_lexize | | test a dictionary | | |
| ts_parse | | test a parser | | |
| ts_parse | | test a parser | | |
| ts_token_type | | get token types defined by parser | | |
| ts_token_type | | get token types defined by parser | | |
| ts_stat | | get statistics of a tsvector column | | |
| Pattern | Description |
| 9 | digit position (can be dropped if insignificant) |
| 0 | digit position (will not be dropped, even if insignificant) |
| . (period) | decimal point |
| , (comma) | group (thousands) separator |
| PR | negative value in angle brackets |
| S | sign anchored to number (uses locale) |
| L | currency symbol (uses locale) |
| D | decimal point (uses locale) |
| G | group separator (uses locale) |
| MI | minus sign in specified position (if number < 0) |
| PL | plus sign in specified position (if number > 0) |
| SG | plus/minus sign in specified position |
| RN | Roman numeral (input between 1 and 3999) |
| TH or th | ordinal number suffix |
| V | shift specified number of digits (see notes) |
| EEEE | exponent for scientific notation |
| Modifier | Description | Example |
| FM prefix | fill mode (suppress trailing zeroes and padding blanks) | FM99.99 |
| TH suffix | upper case ordinal number suffix | 999TH |
| th suffix | lower case ordinal number suffix | 999th |
| Function | Return Type | Description | Example |
| area(object) | | area | |
| center(object) | | center | |
| diameter(circle) | | diameter of circle | |
| height(box) | | vertical size of box | |
| isclosed(path) | | a closed path? | |
| isopen(path) | | an open path? | |
| length(object) | | length | |
| npoints(path) | | number of points | |
| npoints(polygon) | | number of points | |
| pclose(path) | | convert path to closed | |
| popen(path) | | convert path to open | |
| radius(circle) | | radius of circle | |
| width(box) | | horizontal size of box | |
| Function | Return Type | Description | Example |
| box(circle) | | circle to box | |
| box(point) | | point to empty box | |
| box(point, point) | | points to box | |
| box(polygon) | | polygon to box | |
| bound_box(box, box) | | boxes to bounding box | |
| circle(box) | | box to circle | |
| circle(point, double precision) | | center and radius to circle | |
| circle(polygon) | | polygon to circle | |
| line(point, point) | | points to line | |
| lseg(box) | | box diagonal to line segment | |
| lseg(point, point) | | points to line segment | |
| path(polygon) | | polygon to path | |
| point(double precision, double precision) | | construct point | |
| point(box) | | center of box | |
| point(circle) | | center of circle | |
| point(lseg) | | center of line segment | |
| point(polygon) | | center of polygon | |
| polygon(box) | | box to 4-point polygon | |
| polygon(circle) | | circle to 12-point polygon | |
| polygon(npts, circle) | | circle to npts-point polygon | |
| polygon(path) | | path to polygon | |
| Operator | Right Operand Type | Description | Example |
| @> | jsonb | Does the left JSON value contain the right JSON path/value entries at the top level? | |
| <@ | jsonb | Are the left JSON path/value entries contained at the top level within the right JSON value? | |
| ? | text | Does the string exist as a top-level key within the JSON value? | |
| ?\| | text[] | Do any of these array strings exist as top-level keys? | |
| ?& | text[] | Do all of these array strings exist as top-level keys? | |
| \|\| | jsonb | Concatenate two jsonb values into a new jsonb value | |
| - | text | Delete key/value pair or string element from left operand. Key/value pairs are matched based on their key value. | |
| - | text[] | Delete multiple key/value pairs or string elements from left operand. Key/value pairs are matched based on their key value. | |
| - | integer | Delete the array element with specified index (Negative integers count from the end). Throws an error if top level container is not an array. | |
| #- | text[] | Delete the field or element with specified path (for JSON arrays, negative integers count from the end) | |
| @? | jsonpath | Does JSON path return any item for the specified JSON value? | |
| @@ | jsonpath | Returns the result of JSON path predicate check for the specified JSON value. Only the first item of the result is taken into account. If the result is not Boolean, then NULL is returned. | |
| Function | Description | Example | Example Result |
| to_json(anyelement), to_jsonb(anyelement) | Returns the value as json or jsonb | | |
| array_to_json(anyarray [, pretty_bool]) | Returns the array as a JSON array. A PostgreSQL multidimensional array becomes a JSON array of arrays. Line feeds will be added between dimension-1 elements if pretty_bool is true. | | |
| row_to_json(record [, pretty_bool]) | Returns the row as a JSON object. Line feeds will be added between level-1 elements if pretty_bool is true. | | |
| json[b]_build_array(VARIADIC "any") | Builds a possibly-heterogeneously-typed JSON array out of a variadic argument list. | | |
| json[b]_build_object(VARIADIC "any") | Builds a JSON object out of a variadic argument list. By convention, the argument list consists of alternating keys and values. | | |
| json[b]_object(text[]) | Builds a JSON object out of a text array. The array must have either exactly one dimension with an even number of members, in which case they are taken as alternating key/value pairs, or two dimensions such that each inner array has exactly two elements, which are taken as a key/value pair. | | |
| json[b]_object(keys text[], values text[]) | This form of json_object takes keys and values pairwise from two separate arrays. In all other respects it is identical to the one-argument form. | | |
| Function | Return Type | Description |
| --- | --- | --- |
| `json_array_length(json)` / `jsonb_array_length(jsonb)` | `int` | Returns the number of elements in the outermost JSON array. |
| `json_each(json)` / `jsonb_each(jsonb)` | `setof record` | Expands the outermost JSON object into a set of key/value pairs. |
| `json_each_text(json)` / `jsonb_each_text(jsonb)` | `setof record` | Expands the outermost JSON object into a set of key/value pairs. The returned values will be of type `text`. |
| `json_extract_path(from_json json, VARIADIC path_elems text[])` | `json` / `jsonb` | Returns JSON value pointed to by `path_elems` (equivalent to `#>` operator). |
| `json_extract_path_text(from_json json, VARIADIC path_elems text[])` | `text` | Returns JSON value pointed to by `path_elems` as `text` (equivalent to `#>>` operator). |
| `json_object_keys(json)` / `jsonb_object_keys(jsonb)` | `setof text` | Returns set of keys in the outermost JSON object. |
| `json_populate_record(base anyelement, from_json json)` | `anyelement` | Expands the object in `from_json` to a row whose columns match the record type defined by `base` (see note below). |
| `json_populate_recordset(base anyelement, from_json json)` | `setof anyelement` | Expands the outermost array of objects in `from_json` to a set of rows whose columns match the record type defined by `base` (see note below). |
| `json_array_elements(json)` / `jsonb_array_elements(jsonb)` | `setof json` / `setof jsonb` | Expands a JSON array to a set of JSON values. |
| `json_array_elements_text(json)` | `setof text` | Expands a JSON array to a set of `text` values. |
| `json_typeof(json)` / `jsonb_typeof(jsonb)` | `text` | Returns the type of the outermost JSON value as a text string. Possible types are `object`, `array`, `string`, `number`, `boolean`, and `null`. |
| `json_to_record(json)` / `jsonb_to_record(jsonb)` | `record` | Builds an arbitrary record from a JSON object (see note below). As with all functions returning `record`, the caller must explicitly define the structure of the record with an `AS` clause. |
| `json_to_recordset(json)` / `jsonb_to_recordset(jsonb)` | `setof record` | Builds an arbitrary set of records from a JSON array of objects (see note below). As with all functions returning `record`, the caller must explicitly define the structure of the record with an `AS` clause. |
| `json_strip_nulls(from_json json)` / `jsonb_strip_nulls(from_json jsonb)` | `json` / `jsonb` | Returns `from_json` with all object fields that have null values omitted. Other null values are untouched. |
| `jsonb_set(target jsonb, path text[], new_value jsonb [, create_missing boolean])` | `jsonb` | Returns `target` with the section designated by `path` replaced by `new_value`, or with `new_value` added if `create_missing` is true (the default) and the item designated by `path` does not exist. |
| `jsonb_insert(target jsonb, path text[], new_value jsonb [, insert_after boolean])` | `jsonb` | Returns `target` with `new_value` inserted at the designated `path`. |
| `jsonb_pretty(from_json jsonb)` | `text` | Returns `from_json` as indented JSON text. |
| `jsonb_path_exists(target jsonb, path jsonpath [, vars jsonb [, silent bool]])` | `boolean` | Checks whether JSON path returns any item for the specified JSON value. |
| `jsonb_path_match(target jsonb, path jsonpath [, vars jsonb [, silent bool]])` | `boolean` | Returns the result of JSON path predicate check for the specified JSON value. Only the first item of the result is taken into account. If the result is not Boolean, then `null` is returned. |
| `jsonb_path_query(target jsonb, path jsonpath [, vars jsonb [, silent bool]])` | `setof jsonb` | Gets all JSON items returned by JSON path for the specified JSON value. |
| `jsonb_path_query_array(target jsonb, path jsonpath [, vars jsonb [, silent bool]])` | `jsonb` | Gets all JSON items returned by JSON path for the specified JSON value and wraps result into an array. |
| `jsonb_path_query_first(target jsonb, path jsonpath [, vars jsonb [, silent bool]])` | `jsonb` | Gets the first JSON item returned by JSON path for the specified JSON value. Returns `NULL` if there are no results. |
| Operator/Method | Description |
| --- | --- |
| `+` (unary) | Plus operator that iterates over the SQL/JSON sequence |
| `-` (unary) | Minus operator that iterates over the SQL/JSON sequence |
| `+` (binary) | Addition |
| `-` (binary) | Subtraction |
| `*` | Multiplication |
| `/` | Division |
| `%` | Modulus |
| `type()` | Type of the SQL/JSON item |
| `size()` | Size of the SQL/JSON item |
| `double()` | Approximate floating-point number converted from an SQL/JSON number or a string |
| `ceiling()` | Nearest integer greater than or equal to the SQL/JSON number |
| `floor()` | Nearest integer less than or equal to the SQL/JSON number |
| `abs()` | Absolute value of the SQL/JSON number |
| `keyvalue()` | Sequence of object's key-value pairs represented as array of items containing three fields (`key`, `value`, and `id`) |
The examples below assume an enum type created as `CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple');`.

| Function | Description | Example | Example Result |
| --- | --- | --- | --- |
| `enum_first(anyenum)` | Returns the first value of the input enum type | `enum_first(null::rainbow)` | `red` |
| `enum_last(anyenum)` | Returns the last value of the input enum type | `enum_last(null::rainbow)` | `purple` |
| `enum_range(anyenum)` | Returns all values of the input enum type in an ordered array | `enum_range(null::rainbow)` | `{red,orange,yellow,green,blue,purple}` |
| `enum_range(anyenum, anyenum)` | Returns the range between the two given enum values, as an ordered array. The values must be from the same enum type. If the first parameter is null, the result will start with the first value of the enum type. If the second parameter is null, the result will end with the last value of the enum type. | `enum_range('orange'::rainbow, 'green'::rainbow)` | `{orange,yellow,green}` |
| | | `enum_range(NULL, 'green'::rainbow)` | `{red,orange,yellow,green}` |
| | | `enum_range('orange'::rainbow, NULL)` | `{orange,yellow,green,blue,purple}` |
| Operator | Description | Example |
| --- | --- | --- |
| `<` | is less than | `inet '192.168.1.5' < inet '192.168.1.6'` |
| `<=` | is less than or equal | `inet '192.168.1.5' <= inet '192.168.1.5'` |
| `=` | equals | `inet '192.168.1.5' = inet '192.168.1.5'` |
| `>=` | is greater or equal | `inet '192.168.1.5' >= inet '192.168.1.5'` |
| `>` | is greater than | `inet '192.168.1.5' > inet '192.168.1.4'` |
| `<>` | is not equal | `inet '192.168.1.5' <> inet '192.168.1.4'` |
| `<<` | is contained by | `inet '192.168.1.5' << inet '192.168.1/24'` |
| `<<=` | is contained by or equals | `inet '192.168.1/24' <<= inet '192.168.1/24'` |
| `>>` | contains | `inet '192.168.1/24' >> inet '192.168.1.5'` |
| `>>=` | contains or equals | `inet '192.168.1/24' >>= inet '192.168.1/24'` |
| `&&` | contains or is contained by | `inet '192.168.1/24' && inet '192.168.1.80/28'` |
| `~` | bitwise NOT | `~ inet '192.168.1.6'` |
| `&` | bitwise AND | `inet '192.168.1.6' & inet '0.0.0.255'` |
| `\|` | bitwise OR | `inet '192.168.1.6' \| inet '0.0.0.255'` |
| `+` | addition | `inet '192.168.1.6' + 25` |
| `-` | subtraction | `inet '192.168.1.43' - 36` |
| `-` | subtraction | `inet '192.168.1.43' - inet '192.168.1.19'` |
| Function | Return Type | Description |
| --- | --- | --- |
| `currval(regclass)` | `bigint` | Return value most recently obtained with `nextval` for specified sequence |
| `lastval()` | `bigint` | Return value most recently obtained with `nextval` for any sequence |
| `nextval(regclass)` | `bigint` | Advance sequence and return new value |
| `setval(regclass, bigint)` | `bigint` | Set sequence's current value |
| `setval(regclass, bigint, boolean)` | `bigint` | Set sequence's current value and `is_called` flag |
| Operator | Return Type | Description | Example | Result |
| --- | --- | --- | --- | --- |
| `@@` | `boolean` | `tsvector` matches `tsquery`? | `to_tsvector('fat cats ate rats') @@ to_tsquery('cat & rat')` | `t` |
| `@@@` | `boolean` | deprecated synonym for `@@` | `to_tsvector('fat cats ate rats') @@@ to_tsquery('cat & rat')` | `t` |
| `\|\|` | `tsvector` | concatenate | `'a:1 b:2'::tsvector \|\| 'c:1 d:2 b:3'::tsvector` | `'a':1 'b':2,5 'c':3 'd':4` |
| `&&` | `tsquery` | AND | `'fat \| rat'::tsquery && 'cat'::tsquery` | `( 'fat' \| 'rat' ) & 'cat'` |
| `\|\|` | `tsquery` | OR | `'fat \| rat'::tsquery \|\| 'cat'::tsquery` | `( 'fat' \| 'rat' ) \| 'cat'` |
| `!!` | `tsquery` | negate a `tsquery` | `!! 'cat'::tsquery` | `!'cat'` |
| `<->` | `tsquery` | `tsquery` followed by `tsquery` | `to_tsquery('fat') <-> to_tsquery('rat')` | `'fat' <-> 'rat'` |
| `@>` | `boolean` | `tsquery` contains another? | `'cat'::tsquery @> 'cat & rat'::tsquery` | `f` |
| `<@` | `boolean` | `tsquery` is contained in? | `'cat'::tsquery <@ 'cat & rat'::tsquery` | `t` |
| Function | Return Type | Description |
| --- | --- | --- |
| `array_to_tsvector(text[])` | `tsvector` | convert array of lexemes to `tsvector` |
| `get_current_ts_config()` | `regconfig` | get default text search configuration |
| `length(tsvector)` | `integer` | number of lexemes in `tsvector` |
| `numnode(tsquery)` | `integer` | number of lexemes plus operators in `tsquery`, e.g., `numnode('(fat & rat) \| cat'::tsquery)` is `5` |
| `plainto_tsquery([config regconfig,] query text)` | `tsquery` | produce `tsquery` ignoring punctuation |
| `phraseto_tsquery([config regconfig,] query text)` | `tsquery` | produce `tsquery` that searches for a phrase, ignoring punctuation |
| `querytree(query tsquery)` | `text` | get indexable part of a `tsquery` |
| `setweight(vector tsvector, weight "char")` | `tsvector` | assign `weight` to each element of `vector` |
| `setweight(vector tsvector, weight "char", lexemes text[])` | `tsvector` | assign `weight` to elements of `vector` that are listed in `lexemes` |
| `strip(tsvector)` | `tsvector` | remove positions and weights from `tsvector` |
| `to_tsquery([config regconfig,] query text)` | `tsquery` | normalize words and convert to `tsquery` |
| `to_tsvector([config regconfig,] document text)` | `tsvector` | reduce document text to `tsvector` |
| `to_tsvector([config regconfig,] document json(b))` | `tsvector` | reduce each string value in the document to a `tsvector`, and then concatenate those in document order to produce a single `tsvector` |
| `ts_delete(vector tsvector, lexeme text)` | `tsvector` | remove given `lexeme` from `vector` |
| `ts_delete(vector tsvector, lexemes text[])` | `tsvector` | remove any occurrence of lexemes in `lexemes` from `vector` |
| `ts_filter(vector tsvector, weights "char"[])` | `tsvector` | select only elements with given `weights` from `vector` |
| `ts_headline([config regconfig,] document text, query tsquery [, options text])` | `text` | display a query match |
| `ts_headline([config regconfig,] document json(b), query tsquery [, options text])` | `text` | display a query match |
| `ts_rank([weights float4[],] vector tsvector, query tsquery [, normalization integer])` | `float4` | rank document for query |
| `ts_rank_cd([weights float4[],] vector tsvector, query tsquery [, normalization integer])` | `float4` | rank document for query using cover density |
| `ts_rewrite(query tsquery, target tsquery, substitute tsquery)` | `tsquery` | replace `target` with `substitute` within query, e.g., `ts_rewrite('a & b'::tsquery, 'a'::tsquery, 'foo \| bar'::tsquery)` is `'b' & ( 'foo' \| 'bar' )` |
| `ts_rewrite(query tsquery, select text)` | `tsquery` | replace using targets and substitutes from a `SELECT` command |
| `tsquery_phrase(query1 tsquery, query2 tsquery)` | `tsquery` | make query that searches for `query1` followed by `query2` (same as the `<->` operator) |
| `tsquery_phrase(query1 tsquery, query2 tsquery, distance integer)` | `tsquery` | make query that searches for `query1` followed by `query2` at distance `distance` |
| `tsvector_to_array(tsvector)` | `text[]` | convert `tsvector` to array of lexemes |
| `tsvector_update_trigger()` | `trigger` | trigger function for automatic `tsvector` column update |
| `tsvector_update_trigger_column()` | `trigger` | trigger function for automatic `tsvector` column update |
| `unnest(tsvector)` | `setof record` | expand a tsvector to a set of rows |
| Function | Return Type | Description | Example |
| --- | --- | --- | --- |
| `to_char(timestamp, text)` | `text` | convert time stamp to string | `to_char(current_timestamp, 'HH12:MI:SS')` |
| `to_char(interval, text)` | `text` | convert interval to string | `to_char(interval '15h 2m 12s', 'HH24:MI:SS')` |
| `to_char(int, text)` | `text` | convert integer to string | `to_char(125, '999')` |
| `to_char(double precision, text)` | `text` | convert real/double precision to string | `to_char(125.8::real, '999D9')` |
| `to_char(numeric, text)` | `text` | convert numeric to string | `to_char(-125.8, '999D99S')` |
| `to_date(text, text)` | `date` | convert string to date | `to_date('05 Dec 2000', 'DD Mon YYYY')` |
| `to_number(text, text)` | `numeric` | convert string to numeric | `to_number('12,454.8-', '99G999D9S')` |
| `to_timestamp(text, text)` | `timestamp with time zone` | convert string to time stamp | `to_timestamp('05 Dec 2000', 'DD Mon YYYY')` |
| Pattern | Description |
| --- | --- |
| `HH` | hour of day (01-12) |
| `HH12` | hour of day (01-12) |
| `HH24` | hour of day (00-23) |
| `MI` | minute (00-59) |
| `SS` | second (00-59) |
| `MS` | millisecond (000-999) |
| `US` | microsecond (000000-999999) |
| `SSSS` | seconds past midnight (0-86399) |
| `AM`, `am`, `PM` or `pm` | meridiem indicator (without periods) |
| `A.M.`, `a.m.`, `P.M.` or `p.m.` | meridiem indicator (with periods) |
| `Y,YYY` | year (4 or more digits) with comma |
| `YYYY` | year (4 or more digits) |
| `YYY` | last 3 digits of year |
| `YY` | last 2 digits of year |
| `Y` | last digit of year |
| `IYYY` | ISO 8601 week-numbering year (4 or more digits) |
| `IYY` | last 3 digits of ISO 8601 week-numbering year |
| `IY` | last 2 digits of ISO 8601 week-numbering year |
| `I` | last digit of ISO 8601 week-numbering year |
| `BC`, `bc`, `AD` or `ad` | era indicator (without periods) |
| `B.C.`, `b.c.`, `A.D.` or `a.d.` | era indicator (with periods) |
| `MONTH` | full upper case month name (blank-padded to 9 chars) |
| `Month` | full capitalized month name (blank-padded to 9 chars) |
| `month` | full lower case month name (blank-padded to 9 chars) |
| `MON` | abbreviated upper case month name (3 chars in English, localized lengths vary) |
| `Mon` | abbreviated capitalized month name (3 chars in English, localized lengths vary) |
| `mon` | abbreviated lower case month name (3 chars in English, localized lengths vary) |
| `MM` | month number (01-12) |
| `DAY` | full upper case day name (blank-padded to 9 chars) |
| `Day` | full capitalized day name (blank-padded to 9 chars) |
| `day` | full lower case day name (blank-padded to 9 chars) |
| `DY` | abbreviated upper case day name (3 chars in English, localized lengths vary) |
| `Dy` | abbreviated capitalized day name (3 chars in English, localized lengths vary) |
| `dy` | abbreviated lower case day name (3 chars in English, localized lengths vary) |
| `DDD` | day of year (001-366) |
| `IDDD` | day of ISO 8601 week-numbering year (001-371; day 1 of the year is Monday of the first ISO week) |
| `DD` | day of month (01-31) |
| `D` | day of the week, Sunday (`1`) to Saturday (`7`) |
| `ID` | ISO 8601 day of the week, Monday (`1`) to Sunday (`7`) |
| `W` | week of month (1-5) (the first week starts on the first day of the month) |
| `WW` | week number of year (1-53) (the first week starts on the first day of the year) |
| `IW` | week number of ISO 8601 week-numbering year (01-53; the first Thursday of the year is in week 1) |
| `CC` | century (2 digits) (the twenty-first century starts on 2001-01-01) |
| `J` | Julian Day (integer days since November 24, 4714 BC at midnight UTC) |
| `Q` | quarter |
| `RM` | month in upper case Roman numerals (I-XII; I=January) |
| `rm` | month in lower case Roman numerals (i-xii; i=January) |
| `TZ` | upper case time-zone abbreviation (only supported in `to_char`) |
| `tz` | lower case time-zone abbreviation (only supported in `to_char`) |
| `TZH` | time-zone hours |
| `TZM` | time-zone minutes |
| `OF` | time-zone offset from UTC (only supported in `to_char`) |
| Operator | Description | Example |
| --- | --- | --- |
| `+` | Translation | `box '((0,0),(1,1))' + point '(2.0,0)'` |
| `-` | Translation | `box '((0,0),(1,1))' - point '(2.0,0)'` |
| `*` | Scaling/rotation | `box '((0,0),(1,1))' * point '(2.0,0)'` |
| `/` | Scaling/rotation | `box '((0,0),(2,2))' / point '(2.0,0)'` |
| `#` | Point or box of intersection | `box '((1,-1),(-1,1))' # box '((1,1),(-2,-2))'` |
| `#` | Number of points in path or polygon | `# path '((1,0),(0,1),(-1,0))'` |
| `@-@` | Length or circumference | `@-@ path '((0,0),(1,0))'` |
| `@@` | Center | `@@ circle '((0,0),10)'` |
| `##` | Closest point to first operand on second operand | `point '(0,0)' ## lseg '((2,0),(0,2))'` |
| `<->` | Distance between | `circle '((0,0),1)' <-> circle '((5,0),1)'` |
| `&&` | Overlaps? (One point in common makes this true.) | `box '((0,0),(1,1))' && box '((0,0),(2,2))'` |
| `<<` | Is strictly left of? | `circle '((0,0),1)' << circle '((5,0),1)'` |
| `>>` | Is strictly right of? | `circle '((5,0),1)' >> circle '((0,0),1)'` |
| `&<` | Does not extend to the right of? | `box '((0,0),(1,1))' &< box '((0,0),(2,2))'` |
| `&>` | Does not extend to the left of? | `box '((0,0),(3,3))' &> box '((0,0),(2,2))'` |
| `<<\|` | Is strictly below? | `box '((0,0),(3,3))' <<\| box '((3,4),(5,5))'` |
| `\|>>` | Is strictly above? | `box '((3,4),(5,5))' \|>> box '((0,0),(3,3))'` |
| `&<\|` | Does not extend above? | `box '((0,0),(1,1))' &<\| box '((0,0),(2,2))'` |
| `\|&>` | Does not extend below? | `box '((0,0),(3,3))' \|&> box '((0,0),(2,2))'` |
| `<^` | Is below (allows touching)? | `circle '((0,0),1)' <^ circle '((0,5),1)'` |
| `>^` | Is above (allows touching)? | `circle '((0,5),1)' >^ circle '((0,0),1)'` |
| `?#` | Intersects? | `lseg '((-1,0),(1,0))' ?# box '((-2,-2),(2,2))'` |
| `?-` | Is horizontal? | `?- lseg '((-1,0),(1,0))'` |
| `?-` | Are horizontally aligned? | `point '(1,0)' ?- point '(0,0)'` |
| `?\|` | Is vertical? | `?\| lseg '((-1,0),(1,0))'` |
| `?\|` | Are vertically aligned? | `point '(0,1)' ?\| point '(0,0)'` |
| `?-\|` | Is perpendicular? | `lseg '((0,0),(0,1))' ?-\| lseg '((0,0),(1,0))'` |
| `?\|\|` | Are parallel? | `lseg '((-1,0),(1,0))' ?\|\| lseg '((-1,2),(1,2))'` |
| `@>` | Contains? | `circle '((0,0),2)' @> point '(1,1)'` |
| `<@` | Contained in or on? | `point '(1,1)' <@ circle '((0,0),2)'` |
| `~=` | Same as? | `polygon '((0,0),(1,1))' ~= polygon '((1,1),(0,0))'` |
| Operator | Right Operand Type | Return type | Description | Example |
| --- | --- | --- | --- | --- |
| `->` | `int` | `json` or `jsonb` | Get JSON array element (indexed from zero, negative integers count from the end) | `'[{"a":"foo"},{"b":"bar"},{"c":"baz"}]'::json->2` |
| `->` | `text` | `json` or `jsonb` | Get JSON object field by key | `'{"a": {"b":"foo"}}'::json->'a'` |
| `->>` | `int` | `text` | Get JSON array element as `text` | `'[1,2,3]'::json->>2` |
| `->>` | `text` | `text` | Get JSON object field as `text` | `'{"a":1,"b":2}'::json->>'b'` |
| `#>` | `text[]` | `json` or `jsonb` | Get JSON object at the specified path | `'{"a": {"b":{"c": "foo"}}}'::json#>'{a,b}'` |
| `#>>` | `text[]` | `text` | Get JSON object at the specified path as `text` | `'{"a":[1,2,3],"b":[4,5,6]}'::json#>>'{a,2}'` |
This section describes several specialized constructs for making multiple comparisons between groups of values. These forms are syntactically related to the subquery forms of the previous section, but do not involve subqueries. The forms involving array subexpressions are PostgreSQL extensions; the rest are SQL-compliant. All of the expression forms documented in this section return Boolean (true/false) results.
IN
The right-hand side is a parenthesized list of scalar expressions. The result is “true” if the left-hand expression's result is equal to any of the right-hand expressions. This is a shorthand notation for:
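The expanded form the text alludes to is, schematically:

```
expression = value1
OR
expression = value2
OR
...
```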
Note that if the left-hand expression yields null, or if there are no equal right-hand values and at least one right-hand expression yields null, the result of the IN construct will be null, not false. This is in accordance with SQL's normal rules for Boolean combinations of null values.
NOT IN
The right-hand side is a parenthesized list of scalar expressions. The result is “true” if the left-hand expression's result is unequal to all of the right-hand expressions. This is a shorthand notation for:
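Schematically, the expansion is:

```
expression <> value1
AND
expression <> value2
AND
...
```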
Note that if the left-hand expression yields null, or if there are no equal right-hand values and at least one right-hand expression yields null, the result of the NOT IN construct will be null, not true as one might naively expect. This is in accordance with SQL's normal rules for Boolean combinations of null values.
x NOT IN y is equivalent to NOT (x IN y) in all cases. However, null values are much more likely to trip up the novice when working with NOT IN than when working with IN. It is best to express your condition positively if possible.
ANY/SOME (array)
The right-hand side is a parenthesized expression, which must yield an array value. The left-hand expression is evaluated and compared to each element of the array using the given operator, which must yield a Boolean result. The result of ANY is “true” if any true result is obtained. The result is “false” if no true result is found (including the case where the array has zero elements).
If the array expression yields a null array, the result of ANY will be null. If the left-hand expression yields null, the result of ANY is ordinarily null (though a non-strict comparison operator could possibly yield a different result). Also, if the right-hand array contains any null elements and no true comparison result is obtained, the result of ANY will be null, not false (again, assuming a strict comparison operator). This is in accordance with SQL's normal rules for Boolean combinations of null values.
SOME is a synonym for ANY.
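A minimal sketch of the array form of ANY:

```sql
SELECT 10 = ANY (ARRAY[4, 10, 42]);   -- true: 10 equals one element
SELECT 10 = ANY (ARRAY[]::int[]);     -- false: the array has zero elements
SELECT 10 = ANY (ARRAY[1, NULL]);     -- null: no true result, one null comparison
```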
ALL (array)
The right-hand side is a parenthesized expression, which must yield an array value. The left-hand expression is evaluated and compared to each element of the array using the given operator, which must yield a Boolean result. The result of ALL is “true” if all comparisons yield true (including the case where the array has zero elements). The result is “false” if any false result is found.
If the array expression yields a null array, the result of ALL will be null. If the left-hand expression yields null, the result of ALL is ordinarily null (though a non-strict comparison operator could possibly yield a different result). Also, if the right-hand array contains any null elements and no false comparison result is obtained, the result of ALL will be null, not true (again, assuming a strict comparison operator). This is in accordance with SQL's normal rules for Boolean combinations of null values.
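A minimal sketch of the array form of ALL:

```sql
SELECT 10 > ALL (ARRAY[1, 2, 3]);      -- true: every comparison yields true
SELECT 10 > ALL (ARRAY[]::int[]);      -- true: the array has zero elements
SELECT 10 > ALL (ARRAY[1, 2, NULL]);   -- null: no false result, one null comparison
```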
Each side is a row constructor, as described in Section 4.2.13. The two row values must have the same number of fields. Each side is evaluated and they are compared row-wise. Row constructor comparisons are allowed when the operator is =, <>, <, <=, > or >=. Every row element must be of a type which has a default B-tree operator class or the attempted comparison might generate an error.
Errors related to the number or types of elements might not occur if the comparison is resolved using earlier columns.
The = and <> cases work slightly differently from the others. Two rows are considered equal if all their corresponding members are non-null and equal; the rows are unequal if any corresponding members are non-null and unequal; otherwise the result of the row comparison is unknown (null).
For the <, <=, > and >= cases, the row elements are compared left-to-right, stopping as soon as an unequal or null pair of elements is found. If either of this pair of elements is null, the result of the row comparison is unknown (null); otherwise comparison of this pair of elements determines the result. For example, ROW(1, 2, NULL) < ROW(1, 3, 0) yields true, not null, because the third pair of elements are not considered.
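A minimal sketch of the left-to-right rule:

```sql
SELECT ROW(1, 2, NULL) < ROW(1, 3, 0);  -- true: decided at the second pair
SELECT ROW(1, NULL, 2) < ROW(1, 3, 0);  -- null: the first unequal-or-null pair contains a null
```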
Prior to PostgreSQL 8.2, the <, <=, > and >= cases were not handled per SQL specification. A comparison like ROW(a, b) < ROW(c, d) was implemented as a < c AND b < d, whereas the correct behavior is equivalent to a < c OR (a = c AND b < d).
This construct is similar to a <> row comparison, but it does not yield null for null inputs. Instead, any null value is considered unequal to (distinct from) any non-null value, and any two nulls are considered equal (not distinct). Thus the result will either be true or false, never null.
This construct is similar to a = row comparison, but it does not yield null for null inputs. Instead, any null value is considered unequal to (distinct from) any non-null value, and any two nulls are considered equal (not distinct). Thus the result will always be either true or false, never null.
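A minimal sketch of the null handling:

```sql
SELECT NULL IS DISTINCT FROM NULL;                       -- false: two nulls are not distinct
SELECT 1 IS DISTINCT FROM NULL;                          -- true
SELECT ROW(1, NULL) IS NOT DISTINCT FROM ROW(1, NULL);   -- true, never null
```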
The SQL specification requires row-wise comparison to return NULL if the result depends on comparing two NULL values or a NULL and a non-NULL. PostgreSQL does this only when comparing the results of two row constructors (as in Section 9.23.5) or comparing a row constructor to the output of a subquery (as in Section 9.22). In other contexts where two composite-type values are compared, two NULL field values are considered equal, and a NULL is considered larger than a non-NULL. This is necessary in order to have consistent sorting and indexing behavior for composite types.
Each side is evaluated and they are compared row-wise. Composite type comparisons are allowed when the operator is =, <>, <, <=, > or >=, or has semantics similar to one of these. (To be specific, an operator can be a row comparison operator if it is a member of a B-tree operator class, or is the negator of the = member of a B-tree operator class.) The default behavior of the above operators is the same as for IS [NOT] DISTINCT FROM for row constructors (see Section 9.23.5).
To support matching of rows which include elements without a default B-tree operator class, the following operators are defined for composite type comparison: *=, *<>, *<, *<=, *> and *>=. These operators compare the internal binary representation of the two rows. Two rows might have a different binary representation even though comparisons of the two rows with the equality operator is true. The ordering of rows under these comparison operators is deterministic but not otherwise meaningful. These operators are used internally for materialized views and might be useful for other specialized purposes such as replication, but are not intended to be generally useful for writing queries.
This section describes the SQL-compliant subquery expressions available in PostgreSQL. All of the expression forms documented in this section return Boolean (true/false) results.
EXISTS
The argument of EXISTS is an arbitrary SELECT statement, or subquery. The subquery is evaluated to determine whether it returns any rows. If it returns at least one row, the result of EXISTS is “true”; if the subquery returns no rows, the result of EXISTS is “false”.
The subquery can refer to variables from the surrounding query, which will act as constants during any one evaluation of the subquery.
The subquery will generally only be executed long enough to determine whether at least one row is returned, not all the way to completion. It is unwise to write a subquery that has side effects (such as calling sequence functions); whether the side effects occur might be unpredictable.
Since the result depends only on whether any rows are returned, and not on the contents of those rows, the output list of the subquery is normally unimportant. A common coding convention is to write all EXISTS tests in the form EXISTS(SELECT 1 WHERE ...). There are exceptions to this rule however, such as subqueries that use INTERSECT.
This simple example is like an inner join on col2, but it produces at most one output row for each tab1 row, even if there are several matching tab2 rows:
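The example query is elided in this copy; a minimal stand-in consistent with the description, assuming tables tab1 and tab2 that share a column col2:

```sql
SELECT col1
FROM tab1
WHERE EXISTS (SELECT 1 FROM tab2 WHERE col2 = tab1.col2);
```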
IN
The right-hand side is a parenthesized subquery, which must return exactly one column. The left-hand expression is evaluated and compared to each row of the subquery result. The result of IN is “true” if any equal subquery row is found. The result is “false” if no equal row is found (including the case where the subquery returns no rows).
Note that if the left-hand expression yields null, or if there are no equal right-hand values and at least one right-hand row yields null, the result of the IN construct will be null, not false. This is in accordance with SQL's normal rules for Boolean combinations of null values.
As with EXISTS, it's unwise to assume that the subquery will be evaluated completely.
The left-hand side of this form of IN is a row constructor, as described in Section 4.2.13. The right-hand side is a parenthesized subquery, which must return exactly as many columns as there are expressions in the left-hand row. The left-hand expressions are evaluated and compared row-wise to each row of the subquery result. The result of IN is “true” if any equal subquery row is found. The result is “false” if no equal row is found (including the case where the subquery returns no rows).
As usual, null values in the rows are combined per the normal rules of SQL Boolean expressions. Two rows are considered equal if all their corresponding members are non-null and equal; the rows are unequal if any corresponding members are non-null and unequal; otherwise the result of that row comparison is unknown (null). If all the per-row results are either unequal or null, with at least one null, then the result of IN is null.
NOT IN
The right-hand side is a parenthesized subquery, which must return exactly one column. The left-hand expression is evaluated and compared to each row of the subquery result. The result of NOT IN is “true” if only unequal subquery rows are found (including the case where the subquery returns no rows). The result is “false” if any equal row is found.
Note that if the left-hand expression yields null, or if there are no equal right-hand values and at least one right-hand row yields null, the result of the NOT IN construct will be null, not true. This is in accordance with SQL's normal rules for Boolean combinations of null values.
As with EXISTS, it's unwise to assume that the subquery will be evaluated completely.
The left-hand side of this form of NOT IN is a row constructor, as described in Section 4.2.13. The right-hand side is a parenthesized subquery, which must return exactly as many columns as there are expressions in the left-hand row. The left-hand expressions are evaluated and compared row-wise to each row of the subquery result. The result of NOT IN is “true” if only unequal subquery rows are found (including the case where the subquery returns no rows). The result is “false” if any equal row is found.
As usual, null values in the rows are combined per the normal rules of SQL Boolean expressions. Two rows are considered equal if all their corresponding members are non-null and equal; the rows are unequal if any corresponding members are non-null and unequal; otherwise the result of that row comparison is unknown (null). If all the per-row results are either unequal or null, with at least one null, then the result of NOT IN is null.
ANY/SOME
The right-hand side is a parenthesized subquery, which must return exactly one column. The left-hand expression is evaluated and compared to each row of the subquery result using the given operator, which must yield a Boolean result. The result of ANY is “true” if any true result is obtained. The result is “false” if no true result is found (including the case where the subquery returns no rows).
SOME is a synonym for ANY. IN is equivalent to = ANY.
Note that if there are no successes and at least one right-hand row yields null for the operator's result, the result of the ANY construct will be null, not false. This is in accordance with SQL's normal rules for Boolean combinations of null values.
As with EXISTS, it's unwise to assume that the subquery will be evaluated completely.
The left-hand side of this form of ANY is a row constructor, as described in Section 4.2.13. The right-hand side is a parenthesized subquery, which must return exactly as many columns as there are expressions in the left-hand row. The left-hand expressions are evaluated and compared row-wise to each row of the subquery result, using the given operator. The result of ANY is “true” if the comparison returns true for any subquery row. The result is “false” if the comparison returns false for every subquery row (including the case where the subquery returns no rows). The result is NULL if no comparison with a subquery row returns true, and at least one comparison returns NULL.
See Section 9.24.5 for details about the meaning of a row constructor comparison.
ALL
The right-hand side is a parenthesized subquery, which must return exactly one column. The left-hand expression is evaluated and compared to each row of the subquery result using the given operator, which must yield a Boolean result. The result of ALL is “true” if all rows yield true (including the case where the subquery returns no rows). The result is “false” if any false result is found. The result is NULL if no comparison with a subquery row returns false, and at least one comparison returns NULL.
NOT IN is equivalent to <> ALL.
As with EXISTS, it's unwise to assume that the subquery will be evaluated completely.
The left-hand side of this form of ALL is a row constructor, as described in Section 4.2.13. The right-hand side is a parenthesized subquery, which must return exactly as many columns as there are expressions in the left-hand row. The left-hand expressions are evaluated and compared row-wise to each row of the subquery result, using the given operator. The result of ALL is “true” if the comparison returns true for all subquery rows (including the case where the subquery returns no rows). The result is “false” if the comparison returns false for any subquery row. The result is NULL if no comparison with a subquery row returns false, and at least one comparison returns NULL.
See Section 9.24.5 for details about the meaning of a row constructor comparison.
The left-hand side is a row constructor, as described in Section 4.2.13. The right-hand side is a parenthesized subquery, which must return exactly as many columns as there are expressions in the left-hand row. Furthermore, the subquery cannot return more than one row. (If it returns zero rows, the result is taken to be null.) The left-hand side is evaluated and compared row-wise to the single subquery result row.
See Section 9.24.5 for details about the meaning of a row constructor comparison.
SQL statements can, intentionally or not, require the mixing of different data types in the same expression. PostgreSQL has extensive facilities for evaluating mixed-type expressions.
In many cases a user does not need to understand the details of the type conversion mechanism. However, implicit conversions done by PostgreSQL can affect the results of a query. When necessary, these results can be tailored by using explicit type conversion.
This chapter introduces the PostgreSQL type conversion mechanisms and conventions. Refer to the relevant sections in Chapter 8 and Chapter 9 for more information on specific data types and allowed functions and operators.
The specific operator that is referenced by an operator expression is determined using the following procedure. Note that this procedure is indirectly affected by the precedence of the operators involved, since that will determine which sub-expressions are taken to be the inputs of which operators. See Section 4.1.6 for more information.
Operator Type Resolution
Select the operators to be considered from the pg_operator system catalog. If a non-schema-qualified operator name was used (the usual case), the operators considered are those with the matching name and argument count that are visible in the current search path (see Section 5.8.3). If a qualified operator name was given, only operators in the specified schema are considered.
If the search path finds multiple operators with identical argument types, only the one appearing earliest in the path is considered. Operators with different argument types are considered on an equal footing regardless of search path position.
Check for an operator accepting exactly the input argument types. If one exists (there can be only one exact match in the set of operators considered), use it.
If one argument of a binary operator invocation is of the unknown type, then assume it is the same type as the other argument for this check. Invocations involving two unknown inputs, or a unary operator with an unknown input, will never find a match at this step.
If one argument of a binary operator invocation is of the unknown type and the other is of a domain type, next check to see if there is an operator accepting exactly the domain's base type on both sides; if so, use it.
Look for the best match.
Discard candidate operators for which the input types do not match and cannot be converted (using an implicit conversion) to match. unknown literals are assumed to be convertible to anything for this purpose. If only one candidate remains, use it; else continue to the next step.
If any input argument is of a domain type, treat it as being of the domain's base type for all subsequent steps. This ensures that domains act like their base types for purposes of ambiguous-operator resolution.
Run through all candidates and keep those with the most exact matches on input types. Keep all candidates if none have exact matches. If only one candidate remains, use it; else continue to the next step.
Run through all candidates and keep those that accept preferred types (of the input data type's type category) at the most positions where type conversion will be required. Keep all candidates if none accept preferred types. If only one candidate remains, use it; else continue to the next step.
If any input arguments are unknown, check the type categories accepted at those argument positions by the remaining candidates. At each position, select the string category if any candidate accepts that category. (This bias towards string is appropriate since an unknown-type literal looks like a string.) Otherwise, if all the remaining candidates accept the same type category, select that category; otherwise fail because the correct choice cannot be deduced without more clues. Now discard candidates that do not accept the selected type category. Furthermore, if any candidate accepts a preferred type in that category, discard candidates that accept non-preferred types for that argument. Keep all candidates if none survive these tests. If only one candidate remains, use it; else continue to the next step.
If there are both unknown and known-type arguments, and all the known-type arguments have the same type, assume that the unknown arguments are also of that type, and check which candidates can accept that type at the unknown-argument positions. If exactly one candidate passes this test, use it. Otherwise, fail.
Some examples follow.
Example 10.1. Factorial Operator Type Resolution
There is only one factorial operator (postfix !) defined in the standard catalog, and it takes an argument of type bigint. The scanner assigns an initial type of integer to the argument in this query expression:
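The query is elided in this copy; a minimal stand-in consistent with the description:

```sql
SELECT 40 ! AS "40 factorial";
```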
So the parser does a type conversion on the operand and the query is equivalent to:
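A stand-in for the converted form the text describes:

```sql
SELECT CAST(40 AS bigint) ! AS "40 factorial";
```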
Example 10.2. String Concatenation Operator Type Resolution
A string-like syntax is used for working with string types and for working with complex extension types. Strings with unspecified type are matched with likely operator candidates.
An example with one unspecified argument:
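A minimal stand-in for the elided example, concatenating a typed text literal with an unspecified one:

```sql
SELECT text 'abc' || 'def' AS "text and unknown";  -- yields 'abcdef'
```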
In this case the parser looks to see if there is an operator taking text for both arguments. Since there is, it assumes that the second argument should be interpreted as type text.
Here is a concatenation of two values of unspecified types:
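A minimal stand-in for the elided example:

```sql
SELECT 'abc' || 'def' AS "unspecified";  -- both literals are of unknown type
```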
In this case there is no initial hint for which type to use, since no types are specified in the query. So, the parser looks for all candidate operators and finds that there are candidates accepting both string-category and bit-string-category inputs. Since string category is preferred when available, that category is selected, and then the preferred type for strings, text, is used as the specific type to resolve the unknown-type literals as.
Example 10.3. Absolute-Value and Negation Operator Type Resolution
The PostgreSQL operator catalog has several entries for the prefix operator @, all of which implement absolute-value operations for various numeric data types. One of these entries is for type float8, which is the preferred type in the numeric category. Therefore, PostgreSQL will use that entry when faced with an unknown input:
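A minimal stand-in for the elided query:

```sql
SELECT @ '-4.5' AS "abs";  -- yields 4.5
```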
Here the system has implicitly resolved the unknown-type literal as type float8 before applying the chosen operator. We can verify that float8 and not some other type was used:
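A stand-in consistent with the verification the text describes, using a value out of float8 range:

```sql
SELECT @ '-4.5e500' AS "abs";
-- ERROR:  "-4.5e500" is out of range for type double precision
```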
On the other hand, the prefix operator ~ (bitwise negation) is defined only for integer data types, not for float8. So, if we try a similar case with ~, we get:
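A stand-in for the elided failing query:

```sql
SELECT ~ '20' AS "negation";
-- ERROR:  operator is not unique: ~ "unknown"
-- HINT:  Could not choose a best candidate operator. You might need to add explicit type casts.
```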
This happens because the system cannot decide which of the several possible ~ operators should be preferred. We can help it out with an explicit cast:
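A stand-in for the explicit-cast form:

```sql
SELECT ~ CAST('20' AS int8) AS "negation";  -- yields -21
```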
Example 10.4. Array Inclusion Operator Type Resolution
Here is another example of resolving an operator with one known and one unknown input:
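A minimal stand-in for the elided query:

```sql
SELECT array[1, 2] <@ '{1,2,3}' AS "is subset";  -- yields t
```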
The PostgreSQL operator catalog has several entries for the infix operator <@, but the only two that could possibly accept an integer array on the left-hand side are array inclusion (anyarray <@ anyarray) and range inclusion (anyelement <@ anyrange). Since none of these polymorphic pseudo-types (see Section 8.20) are considered preferred, the parser cannot resolve the ambiguity on that basis. However, Step 3.f tells it to assume that the unknown-type literal is of the same type as the other input, that is, integer array. Now only one of the two operators can match, so array inclusion is selected. (Had range inclusion been selected, we would have gotten an error, because the string does not have the right format to be a range literal.)
Example 10.5. Custom Operator on a Domain Type
Users sometimes try to declare operators applying just to a domain type. This is possible but is not nearly as useful as it might seem, because the operator resolution rules are designed to select operators applying to the domain's base type. As an example consider
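The DDL and query the text refers to are elided here; a sketch consistent with the following explanation (the ellipses mark bodies that are deliberately left unspecified):

```sql
CREATE DOMAIN mytext AS text CHECK(...);
CREATE FUNCTION mytext_eq_text (mytext, text) RETURNS boolean AS ...;
CREATE OPERATOR = (procedure = mytext_eq_text, leftarg = mytext, rightarg = text);
CREATE TABLE mytable (val mytext);

SELECT * FROM mytable WHERE val = 'foo';
```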
This query will not use the custom operator. The parser will first see if there is a mytext = mytext operator (Step 2.a), which there is not; then it will consider the domain's base type text, and see if there is a text = text operator (Step 2.b), which there is; so it resolves the unknown-type literal as text and uses the text = text operator. The only way to get the custom operator to be used is to explicitly cast the literal:
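A stand-in for the explicitly cast form:

```sql
SELECT * FROM mytable WHERE val = text 'foo';
```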
so that the mytext = text operator is found immediately according to the exact-match rule. If the best-match rules are reached, they actively discriminate against operators on domain types. If they did not, such an operator would create too many ambiguous-operator failures, because the casting rules always consider a domain as castable to or from its base type, and so the domain operator would be considered usable in all the same cases as a similarly-named operator on the base type.
The rules given in the preceding sections will result in assignment of non-unknown data types to all expressions in a SQL query, except for unspecified-type literals that appear as simple output columns of a SELECT command. For example, in
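A minimal stand-in for the elided query:

```sql
SELECT 'Hello World';
```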
there is nothing to identify what type the string literal should be taken as. In this situation PostgreSQL will fall back to resolving the literal's type as text.
When the SELECT is one arm of a UNION (or INTERSECT or EXCEPT) construct, or when it appears within INSERT ... SELECT, this rule is not applied since rules given in preceding sections take precedence. The type of an unspecified-type literal can be taken from the other UNION arm in the first case, or from the destination column in the second case.
RETURNING lists are treated the same as SELECT output lists for this purpose.
Prior to PostgreSQL 10, this rule did not exist, and unspecified-type literals in a SELECT output list were left as type unknown. That had assorted bad consequences, so it has been changed.
SQL is a strongly typed language. That is, every data item has an associated data type which determines its behavior and allowed usage. PostgreSQL has an extensible type system that is more general and flexible than other SQL implementations. Hence, most type conversion behavior in PostgreSQL is governed by general rules rather than by ad hoc heuristics. This allows the use of mixed-type expressions even with user-defined types.
The PostgreSQL scanner/parser divides lexical elements into five fundamental categories: integers, non-integer numbers, strings, identifiers, and key words. Constants of most non-numeric types are first classified as strings. The SQL language definition allows specifying type names with strings, and this mechanism can be used in PostgreSQL to start the parser down the correct path. For example, the query:
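A minimal stand-in for the elided query:

```sql
SELECT text 'Origin' AS "label", point '(0,0)' AS "value";
```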
has two literal constants, of type text and point. If a type is not specified for a string literal, then the placeholder type unknown is assigned initially, to be resolved in later stages as described below.
There are four fundamental SQL constructs requiring distinct type conversion rules in the PostgreSQL parser:
Function calls
Much of the PostgreSQL type system is built around a rich set of functions. Functions can have one or more arguments. Since PostgreSQL permits function overloading, the function name alone does not uniquely identify the function to be called; the parser must select the right function based on the data types of the supplied arguments.
Operators
PostgreSQL allows expressions with prefix and postfix unary (one-argument) operators, as well as binary (two-argument) operators. Like functions, operators can be overloaded, so the same problem of selecting the right operator exists.
Value Storage
SQL INSERT and UPDATE statements place the results of expressions into a table. The expressions in the statement must be matched up with, and perhaps converted to, the types of the target columns.
UNION, CASE, and related constructs
Since all query results from a unionized SELECT statement must appear in a single set of columns, the types of the results of each SELECT clause must be matched up and converted to a uniform set. Similarly, the result expressions of a CASE construct must be converted to a common type so that the CASE expression as a whole has a known output type. The same holds for ARRAY constructs, and for the GREATEST and LEAST functions.
The system catalogs store information about which conversions, or casts, exist between which data types, and how to perform those conversions. Additional casts can be added by the user with the CREATE CAST command. (This is usually done in conjunction with defining new data types. The set of casts between built-in types has been carefully crafted and is best not altered.)
An additional heuristic provided by the parser allows improved determination of the proper casting behavior among groups of types that have implicit casts. Data types are divided into several basic type categories, including boolean, numeric, string, bitstring, datetime, timespan, geometric, network, and user-defined. (For a list see Table 51.63; but note it is also possible to create custom type categories.) Within each category there can be one or more preferred types, which are preferred when there is a choice of possible types. With careful selection of preferred types and available implicit casts, it is possible to ensure that ambiguous expressions (those with multiple candidate parsing solutions) can be resolved in a useful way.
All type conversion rules are designed with several principles in mind:
Implicit conversions should never have surprising or unpredictable outcomes.
There should be no extra overhead in the parser or executor if a query does not need implicit type conversion. That is, if a query is well-formed and the types already match, then the query should execute without spending extra time in the parser and without introducing unnecessary implicit conversion calls in the query.
Additionally, if a query usually requires an implicit conversion for a function, and the user then defines a new function with the correct argument types, the parser should use this new function and will no longer do implicit conversion to use the old function.
The specific function that is referenced by a function call is determined using the following procedure.
Function Type Resolution
Select the functions to be considered from the pg_proc system catalog. If a non-schema-qualified function name was used, the functions considered are those with the matching name and argument count that are visible in the current search path (see Section 5.8.3). If a qualified function name was given, only functions in the specified schema are considered.
If the search path finds multiple functions of identical argument types, only the one appearing earliest in the path is considered. Functions of different argument types are considered on an equal footing regardless of search path position.
If a function is declared with a VARIADIC array parameter, and the call does not use the VARIADIC keyword, then the function is treated as if the array parameter were replaced by one or more occurrences of its element type, as needed to match the call. After such expansion the function might have effective argument types identical to some non-variadic function. In that case the function appearing earlier in the search path is used, or if the two functions are in the same schema, the non-variadic one is preferred.
Functions that have default values for parameters are considered to match any call that omits zero or more of the defaultable parameter positions. If more than one such function matches a call, the one appearing earliest in the search path is used. If there are two or more such functions in the same schema with identical parameter types in the non-defaulted positions (which is possible if they have different sets of defaultable parameters), the system will not be able to determine which to prefer, and so an “ambiguous function call” error will result if no better match to the call can be found.
Check for a function accepting exactly the input argument types. If one exists (there can be only one exact match in the set of functions considered), use it. (Cases involving unknown will never find a match at this step.)
If no exact match is found, see if the function call appears to be a special type conversion request. This happens if the function call has just one argument and the function name is the same as the (internal) name of some data type. Furthermore, the function argument must be either an unknown-type literal, or a type that is binary-coercible to the named data type, or a type that could be converted to the named data type by applying that type's I/O functions (that is, the conversion is either to or from one of the standard string types). When these conditions are met, the function call is treated as a form of CAST specification.[8]
Look for the best match.
Discard candidate functions for which the input types do not match and cannot be converted (using an implicit conversion) to match. unknown literals are assumed to be convertible to anything for this purpose. If only one candidate remains, use it; else continue to the next step.
If any input argument is of a domain type, treat it as being of the domain's base type for all subsequent steps. This ensures that domains act like their base types for purposes of ambiguous-function resolution.
Run through all candidates and keep those with the most exact matches on input types. Keep all candidates if none have exact matches. If only one candidate remains, use it; else continue to the next step.
Run through all candidates and keep those that accept preferred types (of the input data type's type category) at the most positions where type conversion will be required. Keep all candidates if none accept preferred types. If only one candidate remains, use it; else continue to the next step.
If any input arguments are unknown, check the type categories accepted at those argument positions by the remaining candidates. At each position, select the string category if any candidate accepts that category. (This bias towards string is appropriate since an unknown-type literal looks like a string.) Otherwise, if all the remaining candidates accept the same type category, select that category; otherwise fail because the correct choice cannot be deduced without more clues. Now discard candidates that do not accept the selected type category. Furthermore, if any candidate accepts a preferred type in that category, discard candidates that accept non-preferred types for that argument. Keep all candidates if none survive these tests. If only one candidate remains, use it; else continue to the next step.
If there are both unknown and known-type arguments, and all the known-type arguments have the same type, assume that the unknown arguments are also of that type, and check which candidates can accept that type at the unknown-argument positions. If exactly one candidate passes this test, use it. Otherwise, fail.
Note that the “best match” rules are identical for operator and function type resolution. Some examples follow.
Example 10.6. Rounding Function Argument Type Resolution
There is only one round function that takes two arguments; it takes a first argument of type numeric and a second argument of type integer. So the following query automatically converts the first argument of type integer to numeric:
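A minimal stand-in for the elided query:

```sql
SELECT round(4, 4);  -- yields 4.0000
```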
That query is actually transformed by the parser to:
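A stand-in for the transformed form:

```sql
SELECT round(CAST(4 AS numeric), 4);
```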
Since numeric constants with decimal points are initially assigned the type numeric, the following query will require no type conversion and therefore might be slightly more efficient:
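A stand-in for the conversion-free form:

```sql
SELECT round(4.0, 4);
```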
Example 10.7. Substring Function Type Resolution
There are several substr functions, one of which takes types text and integer. If called with a string constant of unspecified type, the system chooses the candidate function that accepts an argument of the preferred category string (namely of type text).
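A minimal stand-in for the elided query:

```sql
SELECT substr('1234', 3);  -- yields '34'
```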
If the string is declared to be of type varchar, as might be the case if it comes from a table, then the parser will try to convert it to become text:
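A stand-in for the varchar form:

```sql
SELECT substr(varchar '1234', 3);  -- yields '34'
```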
This is transformed by the parser to effectively become:
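A stand-in for the effective form:

```sql
SELECT substr(CAST(varchar '1234' AS text), 3);
```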
The parser learns from the pg_cast catalog that text and varchar are binary-compatible, meaning that one can be passed to a function that accepts the other without doing any physical conversion. Therefore, no type conversion call is really inserted in this case.
And, if the function is called with an argument of type integer, the parser will try to convert that to text:
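A stand-in for the failing query:

```sql
SELECT substr(1234, 3);
-- ERROR:  function substr(integer, integer) does not exist
```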
This does not work because integer does not have an implicit cast to text. An explicit cast will work, however:
[8] The reason for this step is to support function-style cast specifications in cases where there is not an actual cast function. If there is a cast function, it is conventionally named after its output type, and so there is no need to have a special case. See CREATE CAST for additional commentary.
Values to be inserted into a table are converted to the destination column's data type according to the following steps.
Value Storage Type Conversion
Check for an exact match with the target.
Otherwise, try to convert the expression to the target type. This is possible if an assignment cast between the two types is registered in the pg_cast catalog (see CREATE CAST). Alternatively, if the expression is an unknown-type literal, the contents of the literal string will be fed to the input conversion routine for the target type.
Check to see if there is a sizing cast for the target type. A sizing cast is a cast from that type to itself. If one is found in the pg_cast catalog, apply it to the expression before storing into the destination column. The implementation function for such a cast always takes an extra parameter of type integer, which receives the destination column's atttypmod value (typically its declared length, although the interpretation of atttypmod varies for different data types), and it may take a third boolean parameter that says whether the cast is explicit or implicit. The cast function is responsible for applying any length-dependent semantics such as size checking or truncation.
character Storage Type Conversion
For a target column declared as character(20) the following statement shows that the stored value is sized correctly:
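The statement and its output are elided in this copy; a stand-in consistent with the explanation below:

```sql
CREATE TABLE vv (v character(20));
INSERT INTO vv SELECT 'abc' || 'def';
SELECT v, octet_length(v) FROM vv;

--           v          | octet_length
-- ---------------------+--------------
--  abcdef              |           20
```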
What has really happened here is that the two unknown literals are resolved to text by default, allowing the || operator to be resolved as text concatenation. Then the text result of the operator is converted to bpchar (“blank-padded char”, the internal name of the character data type) to match the target column type. (Since the conversion from text to bpchar is binary-coercible, this conversion does not insert any real function call.) Finally, the sizing function bpchar(bpchar, integer, boolean) is found in the system catalog and applied to the operator's result and the stored column length. This type-specific function performs the required length check and addition of padding spaces.
SQL UNION constructs must match up possibly dissimilar types to become a single result set. The resolution algorithm is applied separately to each output column of a union query. The INTERSECT and EXCEPT constructs resolve dissimilar types in the same way as UNION. The CASE, ARRAY, VALUES, GREATEST and LEAST constructs use the identical algorithm to match up their component expressions and select a result data type.
Type Resolution for UNION, CASE, and Related Constructs
If all inputs are of the same type, and it is not unknown, resolve as that type.
If any input is of a domain type, treat it as being of the domain's base type for all subsequent steps.
Note: Somewhat like the treatment of domain inputs for operators and functions, this behavior allows a domain type to be preserved through a UNION or similar construct, so long as the user is careful to ensure that all inputs are determinably of that type. Otherwise the domain's base type will be preferred.
If all inputs are of type unknown, resolve as type text (the preferred type of the string category). Otherwise, unknown inputs are ignored for the purposes of the remaining rules.
If the non-unknown inputs are not all of the same type category, fail.
Choose the first non-unknown input type which is a preferred type in that category, if there is one.
Otherwise, choose the last non-unknown input type that allows all the preceding non-unknown inputs to be implicitly converted to it. (There always is such a type, since at least the first type in the list must satisfy this condition.)
Convert all inputs to the selected type. Fail if there is not a conversion from a given input to the selected type.
Some examples follow.
Example 10.10. Type Resolution with Underspecified Types in a Union
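The example query is elided in this copy; a stand-in consistent with the explanation below:

```sql
SELECT text 'a' AS "text" UNION SELECT 'b';
```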
Here the unknown-type literal 'b' will be resolved to type text.
Example 10.11. Type Resolution in a Simple Union
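A stand-in for the elided query:

```sql
SELECT 1.2 AS "numeric" UNION SELECT 1;
```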
The literal 1.2 is of type numeric, and the integer value 1 can be cast implicitly to numeric, so that type is used.
Example 10.12. Type Resolution in a Transposed Union
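A stand-in for the elided query:

```sql
SELECT 1 AS "real" UNION SELECT CAST('2.2' AS REAL);
```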
Here, since type real cannot be implicitly cast to integer, but integer can be implicitly cast to real, the union result type is resolved as real.
Example 10.13. Type Resolution in a Nested Union
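A stand-in for the elided failing query:

```sql
SELECT NULL UNION SELECT NULL UNION SELECT 1;
-- ERROR:  UNION types text and integer cannot be matched
```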
This failure occurs because PostgreSQL treats multiple UNIONs as a nest of pairwise operations; that is, this input is the same as
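A stand-in for the parenthesized equivalent:

```sql
(SELECT NULL UNION SELECT NULL) UNION SELECT 1;
```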
Per the rules given above, the inner UNION is resolved as emitting type text. Then the outer UNION has inputs of types text and integer, leading to the observed error. The problem can be fixed by ensuring that the leftmost UNION has at least one input of the desired result type.
INTERSECT and EXCEPT operations are likewise resolved pairwise. However, the other constructs described in this section consider all of their inputs in one resolution step.
Indexes are a common way to enhance database performance. An index allows the database server to find and retrieve specific rows much faster than it could without an index. But indexes also add overhead to the database system as a whole, so they should be used sensibly.
| Atom | Description |
| --- | --- |
| `(re)` | (where `re` is any regular expression) matches a match for `re`, with the match noted for possible reporting |
| `(?:re)` | as above, but the match is not noted for reporting (a “non-capturing” set of parentheses) (AREs only) |
| `.` | matches any single character |
| `[chars]` | a bracket expression, matching any one of the `chars` |
| `\k` | (where `k` is a non-alphanumeric character) matches that character taken as an ordinary character, e.g., `\\` matches a backslash character |
| `\c` | where `c` is alphanumeric (possibly followed by other characters), is an escape (AREs only) |
| `{` | when followed by a character other than a digit, matches the left-brace character `{`; when followed by a digit, it is the beginning of a bound |
| `x` | where `x` is a single character with no other significance, matches that character |
| Escape | Description |
| --- | --- |
| `\A` | matches only at the beginning of the string |
| `\m` | matches only at the beginning of a word |
| `\M` | matches only at the end of a word |
| `\y` | matches only at the beginning or end of a word |
| `\Y` | matches only at a point that is not the beginning or end of a word |
| `\Z` | matches only at the end of the string |
| Option | Description |
| --- | --- |
| `b` | rest of RE is a BRE |
| `c` | case-sensitive matching (overrides operator type) |
| `e` | rest of RE is an ERE |
| `i` | case-insensitive matching (overrides operator type) |
| `m` | historical synonym for `n` |
| `n` | newline-sensitive matching |
| `p` | partial newline-sensitive matching |
| `q` | rest of RE is a literal (“quoted”) string, all ordinary characters |
| `s` | non-newline-sensitive matching (default) |
| `t` | tight syntax (default; see below) |
| `w` | inverse partial newline-sensitive (“weird”) matching |
| `x` | expanded syntax (see below) |
| Function | Return Type | Description |
| --- | --- | --- |
| `age(timestamp, timestamp)` | `interval` | Subtract arguments, producing a “symbolic” result that uses years and months, rather than just days |
| `age(timestamp)` | `interval` | Subtract from `current_date` (at midnight) |
| `clock_timestamp()` | `timestamp with time zone` | Current date and time (changes during statement execution) |
| `current_date` | `date` | Current date |
| `current_time` | `time with time zone` | Current time of day |
| `current_timestamp` | `timestamp with time zone` | Current date and time (start of current transaction) |
| `date_part(text, timestamp)` | `double precision` | Get subfield (equivalent to `extract`) |
| `date_part(text, interval)` | `double precision` | Get subfield (equivalent to `extract`) |
| `date_trunc(text, timestamp)` | `timestamp` | Truncate to specified precision |
| `date_trunc(text, interval)` | `interval` | Truncate to specified precision |
| `extract(field from timestamp)` | `double precision` | Get subfield |
| `extract(field from interval)` | `double precision` | Get subfield |
| `isfinite(date)` | `boolean` | Test for finite date (not +/-infinity) |
| `isfinite(timestamp)` | `boolean` | Test for finite time stamp (not +/-infinity) |
| `isfinite(interval)` | `boolean` | Test for finite interval |
| `justify_days(interval)` | `interval` | Adjust interval so 30-day time periods are represented as months |
| `justify_hours(interval)` | `interval` | Adjust interval so 24-hour time periods are represented as days |
| `justify_interval(interval)` | `interval` | Adjust interval using `justify_days` and `justify_hours`, with additional sign adjustments |
| `localtime` | `time` | Current time of day |
| `localtimestamp` | `timestamp` | Current date and time (start of current transaction) |
| `make_date(year int, month int, day int)` | `date` | Create date from year, month and day fields |
| `make_interval(years int, months int, weeks int, days int, hours int, mins int, secs double precision)` | `interval` | Create interval from years, months, weeks, days, hours, minutes and seconds fields |
| `make_time(hour int, min int, sec double precision)` | `time` | Create time from hour, minute and seconds fields |
| `make_timestamp(year int, month int, day int, hour int, min int, sec double precision)` | `timestamp` | Create timestamp from year, month, day, hour, minute and seconds fields |
| `make_timestamptz(year int, month int, day int, hour int, min int, sec double precision [, timezone text])` | `timestamp with time zone` | Create timestamp with time zone from year, month, day, hour, minute and seconds fields; if `timezone` is not specified, the current time zone is used |
| `now()` | `timestamp with time zone` | Current date and time (start of current transaction) |
| `statement_timestamp()` | `timestamp with time zone` | Current date and time (start of current statement) |
| `timeofday()` | `text` | Current date and time (like `clock_timestamp`, but as a `text` string) |
| `transaction_timestamp()` | `timestamp with time zone` | Current date and time (start of current transaction) |
| `to_timestamp(double precision)` | `timestamp with time zone` | Convert Unix epoch (seconds since 1970-01-01 00:00:00+00) to timestamp |
| Modifier | Description | Example |
| --- | --- | --- |
| `FM` prefix | fill mode (suppress leading zeroes and padding blanks) | `FMMonth` |
| `TH` suffix | upper case ordinal number suffix | `DDTH`, e.g., `12TH` |
| `th` suffix | lower case ordinal number suffix | `DDth`, e.g., `12th` |
| `FX` prefix | fixed format global option (see usage notes) | `FX Month DD Day` |
| `TM` prefix | translation mode (print localized day and month names based on `lc_time`) | `TMMonth` |
| `SP` suffix | spell mode (not implemented) | `DDSP` |
| Value/Predicate | Description |
| --- | --- |
| `==` | Equality operator |
| `!=` | Non-equality operator |
| `<>` | Non-equality operator (same as `!=`) |
| `<` | Less-than operator |
| `<=` | Less-than-or-equal-to operator |
| `>` | Greater-than operator |
| `>=` | Greater-than-or-equal-to operator |
| `true` | Value used to perform comparison with JSON `true` literal |
| `false` | Value used to perform comparison with JSON `false` literal |
| `null` | Value used to perform comparison with JSON `null` value |
| `&&` | Boolean AND |
| `\|\|` | Boolean OR |
| `!` | Boolean NOT |
| `like_regex` | Tests whether the first operand matches the regular expression given by the second operand |
| `starts with` | Tests whether the second operand is an initial substring of the first operand |
| `exists` | Tests whether a path expression matches at least one SQL/JSON item |
| `is unknown` | Tests whether a Boolean condition is `unknown` |
See Section 8.17 for an overview of range types.
Table 9.53 shows the specialized operators available for range types. In addition to those, the usual comparison operators shown in Table 9.1 are available for range types. The comparison operators order first by the range lower bounds, and only if those are equal do they compare the upper bounds. This does not usually result in a useful overall ordering, but the operators are provided to allow unique indexes to be constructed on ranges.
The left-of/right-of/adjacent operators always return false when an empty range is involved; that is, an empty range is not considered to be either before or after any other range.
Table 9.54 shows the functions available for use with range types.
The lower_inc, upper_inc, lower_inf, and upper_inf functions all return false for an empty range.
This section describes functions that possibly return more than one row. The most widely used functions in this class are series generating functions, as detailed in Table 9.61 and Table 9.62. Other, more specialized set-returning functions are described elsewhere in this manual. See Section 7.2.1.4 for ways to combine multiple set-returning functions.
When step is positive, zero rows are returned if start is greater than stop. Conversely, when step is negative, zero rows are returned if start is less than stop. Zero rows are also returned for NULL inputs. It is an error for step to be zero. Some examples follow:
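A minimal stand-in for the elided examples:

```sql
SELECT * FROM generate_series(2, 4);        -- 2, 3, 4
SELECT * FROM generate_series(5, 1, -2);    -- 5, 3, 1
SELECT * FROM generate_series(4, 3);        -- (no rows)
```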
generate_subscripts is a convenience function that generates the set of valid subscripts for the specified dimension of the given array. Zero rows are returned for arrays that do not have the requested dimension, or for NULL arrays (but valid subscripts are returned for NULL array elements). Some examples follow:
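A minimal stand-in for the elided examples:

```sql
SELECT generate_subscripts('{NULL,1,NULL,2}'::int[], 1) AS s;  -- 1, 2, 3, 4
SELECT generate_subscripts('{1,2,3}'::int[], 2) AS s;          -- (no rows: no second dimension)
```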
When a function in the FROM clause is suffixed by WITH ORDINALITY, a bigint column is appended to the output which starts from 1 and increments by 1 for each row of the function's output. This is most useful in the case of set returning functions such as unnest().
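A minimal sketch of WITH ORDINALITY:

```sql
SELECT * FROM unnest(ARRAY['a', 'b', 'c']) WITH ORDINALITY;
--  unnest | ordinality
-- --------+------------
--  a      |          1
--  b      |          2
--  c      |          3
```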
Window functions provide the ability to perform calculations across sets of rows that are related to the current query row. See Section 3.5 for an introduction to this feature, and Section 4.2.8 for syntax details.
The built-in window functions are listed in Table 9.60. Note that these functions must be invoked using window function syntax, i.e., an OVER clause is required.
In addition to these functions, any built-in or user-defined ordinary aggregate (i.e., not ordered-set or hypothetical-set aggregates) can be used as a window function; see Section 9.21 for a list of the built-in aggregates. Aggregate functions act as window functions only when an OVER clause follows the call; otherwise they act as plain aggregates and return a single row for the entire set.
All of the functions listed in Table 9.60 depend on the sort ordering specified by the ORDER BY clause of the associated window definition. Rows that are not distinct when considering only the ORDER BY columns are said to be peers. The four ranking functions (including cume_dist) are defined so that they give the same answer for all rows of a peer group.
Note that first_value, last_value, and nth_value consider only the rows within the “window frame”, which by default contains the rows from the start of the partition through the last peer of the current row. This is likely to give unhelpful results for last_value and sometimes also nth_value. You can redefine the frame by adding a suitable frame specification (RANGE, ROWS or GROUPS) to the OVER clause. See Section 4.2.8 for more information about frame specifications.
When an aggregate function is used as a window function, it aggregates over the rows within the current row's window frame. An aggregate used with ORDER BY and the default window frame definition produces a “running sum” type of behavior, which may or may not be what's wanted. To obtain aggregation over the whole partition, omit ORDER BY or use ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. Other frame specifications can be used to obtain other effects.
The SQL standard defines a RESPECT NULLS or IGNORE NULLS option for lead, lag, first_value, last_value, and nth_value. This is not implemented in PostgreSQL: the behavior is always the same as the standard's default, namely RESPECT NULLS. Likewise, the standard's FROM FIRST or FROM LAST option for nth_value is not implemented: only the default FROM FIRST behavior is supported. (You can achieve the result of FROM LAST by reversing the ORDER BY ordering.)
While many uses of triggers involve user-written trigger functions, PostgreSQL provides a few built-in trigger functions that can be used directly in user-defined triggers. These are summarized in Table 9.97. (Additional built-in trigger functions exist, which implement foreign key constraints and deferred index constraints. Those are not documented here since users need not use them directly.)
For more information about creating triggers, see CREATE TRIGGER.
The suppress_redundant_updates_trigger function, when applied as a row-level BEFORE UPDATE trigger, will prevent any update that does not actually change the data in the row from taking place. This overrides the normal behavior which always performs the physical row update regardless of whether or not the data has changed. (This normal behavior makes updates run faster, since no checking is required, and is also useful in certain cases.)
Ideally, you should avoid running updates that don't actually change the data in the record. Redundant updates can cost considerable unnecessary time, especially if there are lots of indexes to alter, and space in dead rows that will eventually have to be vacuumed. However, detecting such situations in client code is not always easy, or even possible, and writing expressions to detect them is error-prone. An alternative is to use suppress_redundant_updates_trigger, which will skip updates that don't change the data. You should use this with care, however: the trigger takes a small but non-trivial time for each record, so if most of the records affected by an update are actually changed, use of this trigger will make the update run slower on average.
The suppress_redundant_updates_trigger function can be added to a table like this:
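The DDL is elided in this copy; a stand-in consistent with the note about trigger naming below (tablename is a placeholder):

```sql
CREATE TRIGGER z_min_update
BEFORE UPDATE ON tablename
FOR EACH ROW EXECUTE PROCEDURE suppress_redundant_updates_trigger();
```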
In most cases, you would want to fire this trigger last for each row, so that it does not override other triggers that might wish to alter the row. Bearing in mind that triggers fire in name order, you would therefore choose a trigger name that comes after the name of any other trigger you might have on the table. (Hence the “z” prefix in the example.)
Table 9.48 shows the operators available for array types.
Table 9.48. Array Operators
Array comparisons compare the array contents element-by-element, using the default B-tree comparison function for the element data type. In multidimensional arrays the elements are visited in row-major order (last subscript varies most rapidly). If the contents of two arrays are equal but the dimensionality is different, the first difference in the dimensionality information determines the sort order. (This is a change from versions of PostgreSQL prior to 8.2: older versions would claim that two arrays with the same contents were equal, even if the number of dimensions or subscript ranges were different.)
See Section 8.15 for more details about array operator behavior. See Section 11.2 for more details about which operators support indexed operations.
Table 9.49 shows the functions available for use with array types. See Section 8.15 for more information and examples of the use of these functions.
Table 9.49. Array Functions
In array_position and array_positions, each array element is compared to the searched value using IS NOT DISTINCT FROM semantics.
In array_position, NULL is returned if the value is not found.
In array_positions, NULL is returned only if the array is NULL; if the value is not found in the array, an empty array is returned instead.
In string_to_array, if the delimiter parameter is NULL, each character in the input string will become a separate element in the resulting array. If the delimiter is an empty string, then the entire input string is returned as a one-element array. Otherwise the input string is split at each occurrence of the delimiter string.
In string_to_array, if the null-string parameter is omitted or NULL, none of the substrings of the input will be replaced by NULL. In array_to_string, if the null-string parameter is omitted or NULL, any null elements in the array are simply skipped and not represented in the output string.
There are two differences in the behavior of string_to_array from pre-9.1 versions of PostgreSQL. First, it will return an empty (zero-element) array rather than NULL when the input string is of zero length. Second, if the delimiter string is NULL, the function splits the input into individual characters, rather than returning NULL as before.
See also Section 9.20 about the aggregate function array_agg for use with arrays.
Aggregate functions compute a single result from a set of input values. The built-in general-purpose aggregate functions are listed in Table 9.55 while statistical aggregates are in Table 9.56. The built-in within-group ordered-set aggregate functions are listed in Table 9.57 while the built-in within-group hypothetical-set ones are in Table 9.58. Grouping operations, which are closely related to aggregate functions, are listed in Table 9.59. The special syntax considerations for aggregate functions are explained in Section 4.2.7. Consult Section 2.7 for additional introductory information.
Aggregate functions that support Partial Mode are eligible to participate in various optimizations, such as parallel aggregation.
It should be noted that except for count, these functions return a null value when no rows are selected. In particular, sum of no rows returns null, not zero as one might expect, and array_agg returns null rather than an empty array when there are no input rows. The coalesce function can be used to substitute zero or an empty array for null when necessary.
The aggregate functions array_agg, json_agg, jsonb_agg, json_object_agg, jsonb_object_agg, string_agg, and xmlagg, as well as similar user-defined aggregate functions, produce meaningfully different result values depending on the order of the input values. This ordering is unspecified by default, but can be controlled by writing an ORDER BY clause within the aggregate call, as shown in Section 4.2.7. Alternatively, supplying the input values from a sorted subquery will usually work. For example:
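    -- sketch of the sorted-subquery approach; test(x, y) is an illustrative table
    SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;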
Beware that this approach can fail if the outer query level contains additional processing, such as a join, because that might cause the subquery's output to be reordered before the aggregate is computed.
The boolean aggregates bool_and and bool_or correspond to the standard SQL aggregates every and any or some. PostgreSQL supports every, but not any or some, because there is an ambiguity built into the standard syntax:
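    -- sketch of the ambiguous construct; t1, t2, b1, b2 are placeholder names
    SELECT b1 = ANY((SELECT b2 FROM t2 ...)) FROM t1 ...;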
Here ANY can be considered either as introducing a subquery, or as being an aggregate function, if the subquery returns one row with a Boolean value. Thus the standard name cannot be given to these aggregates.
Users accustomed to working with other SQL database management systems might be disappointed by the performance of the count aggregate when it is applied to the entire table. A query like:
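    SELECT count(*) FROM sometable;  -- sometable: any table of interest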
will require effort proportional to the size of the table: PostgreSQL will need to scan either the entire table or the entirety of an index that includes all rows in the table.
Table 9.56 shows aggregate functions typically used in statistical analysis. (These are separated out merely to avoid cluttering the listing of more-commonly-used aggregates.) Functions shown as accepting numeric_type are available for all the types smallint, integer, bigint, numeric, real, and double precision. Where the description mentions N, it means the number of input rows for which all the input expressions are non-null. In all cases, null is returned if the computation is meaningless, for example when N is zero.
Table 9.57 shows some aggregate functions that use the ordered-set aggregate syntax. These functions are sometimes referred to as “inverse distribution” functions. Their aggregated input is introduced by ORDER BY, and they may also take a direct argument that is not aggregated, but is computed only once. All these functions ignore null values in their aggregated input. For those that take a fraction parameter, the fraction value must be between 0 and 1; an error is thrown if not. However, a null fraction value simply produces a null result.
Each of the “hypothetical-set” aggregates listed in Table 9.58 is associated with a window function of the same name defined in Section 9.22. In each case, the aggregate's result is the value that the associated window function would have returned for the “hypothetical” row constructed from args, if such a row had been added to the sorted group of rows represented by the sorted_args. For each of these functions, the list of direct arguments given in args must match the number and types of the aggregated arguments given in sorted_args. Unlike most built-in aggregates, these aggregates are not strict, that is they do not drop input rows containing nulls. Null values sort according to the rule specified in the ORDER BY clause.
The grouping operations shown in Table 9.59 are used in conjunction with grouping sets (see Section 7.2.4) to distinguish result rows. The arguments to the GROUPING function are not actually evaluated, but they must exactly match expressions given in the GROUP BY clause of the associated query level. For example:
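    -- sketch: assumes a small items_sold(make, model, sales) table
    SELECT make, model, GROUPING(make,model), sum(sales)
      FROM items_sold
      GROUP BY ROLLUP(make,model);

     make  | model | grouping | sum
    -------+-------+----------+-----
     Foo   | GT    |        0 |  10
     Foo   | Tour  |        0 |  20
     Bar   | City  |        0 |  15
     Bar   | Sport |        0 |   5
     Foo   |       |        1 |  30
     Bar   |       |        1 |  20
           |       |        3 |  50
    (7 rows)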
Here, the grouping value 0 in the first four rows shows that those have been grouped normally, over both the grouping columns. The value 1 indicates that model was not grouped by in the next-to-last two rows, and the value 3 indicates that neither make nor model was grouped by in the last row (which therefore is an aggregate over all the input rows).
PostgreSQL provides these helper functions to retrieve information from event triggers.
For more information about event triggers, see Chapter 39.
pg_event_trigger_ddl_commands returns a list of DDL commands executed by each user action, when invoked in a function attached to a ddl_command_end event trigger. If called in any other context, an error is raised. pg_event_trigger_ddl_commands returns one row for each base command executed; some commands that are a single SQL sentence may return more than one row. This function returns the following columns:
pg_event_trigger_dropped_objects returns a list of all objects dropped by the command in whose sql_drop event it is called. If called in any other context, an error is raised. This function returns the following columns:

The pg_event_trigger_dropped_objects function can be used in an event trigger like this:
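    -- sketch following the standard example; function and trigger names are arbitrary
    CREATE FUNCTION test_event_trigger_for_drops()
            RETURNS event_trigger LANGUAGE plpgsql AS $$
    DECLARE
        obj record;
    BEGIN
        FOR obj IN SELECT * FROM pg_event_trigger_dropped_objects()
        LOOP
            RAISE NOTICE '% dropped object: % %.% %',
                         tg_tag,
                         obj.object_type,
                         obj.schema_name,
                         obj.object_name,
                         obj.object_identity;
        END LOOP;
    END
    $$;
    CREATE EVENT TRIGGER test_event_trigger_for_drops
       ON sql_drop
       EXECUTE FUNCTION test_event_trigger_for_drops();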
The functions shown in Table 9.98 provide information about a table for which a table_rewrite event has just been called. If called in any other context, an error is raised.

These functions can be used in an event trigger like this:
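    -- sketch following the standard example; names are arbitrary
    CREATE FUNCTION test_event_trigger_table_rewrite_oid()
     RETURNS event_trigger
     LANGUAGE plpgsql AS
    $$
    BEGIN
      RAISE NOTICE 'rewriting table % for reason %',
                    pg_event_trigger_table_rewrite_oid()::regclass,
                    pg_event_trigger_table_rewrite_reason();
    END;
    $$;

    CREATE EVENT TRIGGER test_table_rewrite_oid
                      ON table_rewrite
       EXECUTE FUNCTION test_event_trigger_table_rewrite_oid();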
PostgreSQL provides a function to inspect complex statistics defined using the CREATE STATISTICS command.

pg_mcv_list_items returns a list of all items stored in a multi-column MCV list, and returns the following columns:

The pg_mcv_list_items function can be used like this:
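    -- from the standard example; 'stts' is a placeholder statistics object name
    SELECT m.* FROM pg_statistic_ext JOIN pg_statistic_ext_data ON (oid = stxoid),
                    pg_mcv_list_items(stxdmcv) m WHERE stxname = 'stts';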
Values of the pg_mcv_list type can be obtained only from the pg_statistic_ext_data.stxdmcv column.
Table 9.63 shows several functions that extract session and system information.
In addition to the functions listed in this section, there are a number of functions related to the statistics system that also provide system information. See Section 27.2.2 for more information.
current_catalog, current_role, current_schema, current_user, session_user, and user have special syntactic status in SQL: they must be called without trailing parentheses. (In PostgreSQL, parentheses can optionally be used with current_schema, but not with the others.)
The session_user is normally the user who initiated the current database connection; but superusers can change this setting with SET SESSION AUTHORIZATION. The current_user is the user identifier that is applicable for permission checking. Normally it is equal to the session user, but it can be changed with SET ROLE. It also changes during the execution of functions with the attribute SECURITY DEFINER. In Unix parlance, the session user is the “real user” and the current user is the “effective user”. current_role and user are synonyms for current_user. (The SQL standard draws a distinction between current_role and current_user, but PostgreSQL does not, since it unifies users and roles into a single kind of entity.)
current_schema returns the name of the schema that is first in the search path (or a null value if the search path is empty). This is the schema that will be used for any tables or other named objects that are created without specifying a target schema. current_schemas(boolean) returns an array of the names of all schemas presently in the search path. The Boolean option determines whether or not implicitly included system schemas such as pg_catalog are included in the returned search path.
The search path can be altered at run time. The command is:
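    SET search_path TO schema [, schema, ...]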
inet_client_addr returns the IP address of the current client, and inet_client_port returns the port number. inet_server_addr returns the IP address on which the server accepted the current connection, and inet_server_port returns the port number. All these functions return NULL if the current connection is via a Unix-domain socket.
pg_blocking_pids returns an array of the process IDs of the sessions that are blocking the server process with the specified process ID, or an empty array if there is no such server process or it is not blocked. One server process blocks another if it either holds a lock that conflicts with the blocked process's lock request (hard block), or is waiting for a lock that would conflict with the blocked process's lock request and is ahead of it in the wait queue (soft block). When using parallel queries the result always lists client-visible process IDs (that is, pg_backend_pid results) even if the actual lock is held or awaited by a child worker process. As a result of that, there may be duplicated PIDs in the result. Also note that when a prepared transaction holds a conflicting lock, it will be represented by a zero process ID in the result of this function. Frequent calls to this function could have some impact on database performance, because it needs exclusive access to the lock manager's shared state for a short time.
pg_conf_load_time returns the timestamp with time zone when the server configuration files were last loaded. (If the current session was alive at the time, this will be the time when the session itself re-read the configuration files, so the reading will vary a little in different sessions. Otherwise it is the time when the postmaster process re-read the configuration files.)
pg_current_logfile returns, as text, the path of the log file(s) currently in use by the logging collector. The path includes the log_directory directory and the log file name. Log collection must be enabled or the return value is NULL. When multiple log files exist, each in a different format, pg_current_logfile called without arguments returns the path of the file having the first format found in the ordered list: stderr, csvlog. NULL is returned when no log file has any of these formats. To request a specific file format supply, as text, either csvlog or stderr as the value of the optional parameter. The return value is NULL when the log format requested is not a configured log_destination. The result of pg_current_logfile reflects the contents of the current_logfiles file.
pg_my_temp_schema returns the OID of the current session's temporary schema, or zero if it has none (because it has not created any temporary tables). pg_is_other_temp_schema returns true if the given OID is the OID of another session's temporary schema. (This can be useful, for example, to exclude other sessions' temporary tables from a catalog display.)
pg_listening_channels returns a set of names of asynchronous notification channels that the current session is listening to. pg_notification_queue_usage returns the fraction of the total available space for notifications currently occupied by notifications that are waiting to be processed, as a double in the range 0-1. See LISTEN and NOTIFY for more information.
pg_postmaster_start_time returns the timestamp with time zone when the server started.
pg_safe_snapshot_blocking_pids returns an array of the process IDs of the sessions that are blocking the server process with the specified process ID from acquiring a safe snapshot, or an empty array if there is no such server process or it is not blocked. A session running a SERIALIZABLE transaction blocks a SERIALIZABLE READ ONLY DEFERRABLE transaction from acquiring a snapshot until the latter determines that it is safe to avoid taking any predicate locks. See Section 13.2.3 for more information about serializable and deferrable transactions. Frequent calls to this function could have some impact on database performance, because it needs access to the predicate lock manager's shared state for a short time.
version returns a string describing the PostgreSQL server's version. You can also get this information from server_version, or for a machine-readable version, server_version_num. Software developers should use server_version_num (available since 8.2) or PQserverVersion instead of parsing the text version.
Table 9.64 lists functions that allow the user to query object access privileges programmatically. See Section 5.7 for more information about privileges.
has_table_privilege checks whether a user can access a table in a particular way. The user can be specified by name, by OID (pg_authid.oid), public to indicate the PUBLIC pseudo-role, or if the argument is omitted current_user is assumed. The table can be specified by name or by OID. (Thus, there are actually six variants of has_table_privilege, which can be distinguished by the number and types of their arguments.) When specifying by name, the name can be schema-qualified if necessary. The desired access privilege type is specified by a text string, which must evaluate to one of the values SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, or TRIGGER. Optionally, WITH GRANT OPTION can be added to a privilege type to test whether the privilege is held with grant option. Also, multiple privilege types can be listed separated by commas, in which case the result will be true if any of the listed privileges is held. (Case of the privilege string is not significant, and extra whitespace is allowed between but not within privilege names.) Some examples:
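    -- role and table names are illustrative
    SELECT has_table_privilege('myschema.mytable', 'select');
    SELECT has_table_privilege('joe', 'mytable', 'INSERT, SELECT WITH GRANT OPTION');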
has_sequence_privilege checks whether a user can access a sequence in a particular way. The possibilities for its arguments are analogous to has_table_privilege. The desired access privilege type must evaluate to one of USAGE, SELECT, or UPDATE.
has_any_column_privilege checks whether a user can access any column of a table in a particular way. Its argument possibilities are analogous to has_table_privilege, except that the desired access privilege type must evaluate to some combination of SELECT, INSERT, UPDATE, or REFERENCES. Note that having any of these privileges at the table level implicitly grants it for each column of the table, so has_any_column_privilege will always return true if has_table_privilege does for the same arguments. But has_any_column_privilege also succeeds if there is a column-level grant of the privilege for at least one column.
has_column_privilege checks whether a user can access a column in a particular way. Its argument possibilities are analogous to has_table_privilege, with the addition that the column can be specified either by name or attribute number. The desired access privilege type must evaluate to some combination of SELECT, INSERT, UPDATE, or REFERENCES. Note that having any of these privileges at the table level implicitly grants it for each column of the table.
has_database_privilege checks whether a user can access a database in a particular way. Its argument possibilities are analogous to has_table_privilege. The desired access privilege type must evaluate to some combination of CREATE, CONNECT, TEMPORARY, or TEMP (which is equivalent to TEMPORARY).
has_function_privilege checks whether a user can access a function in a particular way. Its argument possibilities are analogous to has_table_privilege. When specifying a function by a text string rather than by OID, the allowed input is the same as for the regprocedure data type (see Section 8.19). The desired access privilege type must evaluate to EXECUTE. An example is:
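    SELECT has_function_privilege('joeuser', 'myfunc(int, text)', 'execute');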
has_foreign_data_wrapper_privilege checks whether a user can access a foreign-data wrapper in a particular way. Its argument possibilities are analogous to has_table_privilege. The desired access privilege type must evaluate to USAGE.
has_language_privilege checks whether a user can access a procedural language in a particular way. Its argument possibilities are analogous to has_table_privilege. The desired access privilege type must evaluate to USAGE.
has_schema_privilege checks whether a user can access a schema in a particular way. Its argument possibilities are analogous to has_table_privilege. The desired access privilege type must evaluate to some combination of CREATE or USAGE.
has_server_privilege checks whether a user can access a foreign server in a particular way. Its argument possibilities are analogous to has_table_privilege. The desired access privilege type must evaluate to USAGE.
has_tablespace_privilege checks whether a user can access a tablespace in a particular way. Its argument possibilities are analogous to has_table_privilege. The desired access privilege type must evaluate to CREATE.
has_type_privilege checks whether a user can access a type in a particular way. Its argument possibilities are analogous to has_table_privilege. When specifying a type by a text string rather than by OID, the allowed input is the same as for the regtype data type (see Section 8.19). The desired access privilege type must evaluate to USAGE.
pg_has_role checks whether a user can access a role in a particular way. Its argument possibilities are analogous to has_table_privilege, except that public is not allowed as a user name. The desired access privilege type must evaluate to some combination of MEMBER or USAGE. MEMBER denotes direct or indirect membership in the role (that is, the right to do SET ROLE), while USAGE denotes whether the privileges of the role are immediately available without doing SET ROLE.
row_security_active checks whether row level security is active for the specified table in the context of the current_user and environment. The table can be specified by name or by OID.
Table 9.65 shows the operators available for the aclitem type, which is the catalog representation of access privileges. See Section 5.7 for information about how to read access privilege values.

Table 9.65. aclitem Operators

Table 9.66 shows some additional functions to manage the aclitem type.

Table 9.66. aclitem Functions

acldefault returns the built-in default access privileges for an object of type type belonging to role ownerId. These represent the access privileges that will be assumed when an object's ACL entry is null. (The default access privileges are described in Section 5.7.) The type parameter is a CHAR: write 'c' for COLUMN, 'r' for TABLE and table-like objects, 's' for SEQUENCE, 'd' for DATABASE, 'f' for FUNCTION or PROCEDURE, 'l' for LANGUAGE, 'L' for LARGE OBJECT, 'n' for SCHEMA, 't' for TABLESPACE, 'F' for FOREIGN DATA WRAPPER, 'S' for FOREIGN SERVER, or 'T' for TYPE or DOMAIN.
aclexplode returns an aclitem array as a set of rows. Output columns are grantor oid, grantee oid (0 for PUBLIC), granted privilege as text (SELECT, ...) and whether the privilege is grantable as boolean. makeaclitem performs the inverse operation.
Table 9.67 shows functions that determine whether a certain object is visible in the current schema search path. For example, a table is said to be visible if its containing schema is in the search path and no table of the same name appears earlier in the search path. This is equivalent to the statement that the table can be referenced by name without explicit schema qualification. To list the names of all visible tables:
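    SELECT relname FROM pg_class WHERE pg_table_is_visible(oid) AND relkind = 'r';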
Each function performs the visibility check for one type of database object. Note that pg_table_is_visible can also be used with views, materialized views, indexes, sequences and foreign tables; pg_function_is_visible can also be used with procedures and aggregates; pg_type_is_visible can also be used with domains. For functions and operators, an object in the search path is visible if there is no object of the same name and argument data type(s) earlier in the path. For operator classes, both name and associated index access method are considered.
All these functions require object OIDs to identify the object to be checked. If you want to test an object by name, it is convenient to use the OID alias types (regclass, regtype, regprocedure, regoperator, regconfig, or regdictionary), for example:
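    SELECT pg_type_is_visible('myschema.widget'::regtype);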
Note that it would not make much sense to test a non-schema-qualified type name in this way — if the name can be recognized at all, it must be visible.
Table 9.68 lists functions that extract information from the system catalogs.
format_type returns the SQL name of a data type that is identified by its type OID and possibly a type modifier. Pass NULL for the type modifier if no specific modifier is known.
pg_get_keywords returns a set of records describing the SQL keywords recognized by the server. The word column contains the keyword. The catcode column contains a category code: U for unreserved, C for column name, T for type or function name, or R for reserved. The catdesc column contains a possibly-localized string describing the category.
pg_get_constraintdef, pg_get_indexdef, pg_get_ruledef, pg_get_statisticsobjdef, and pg_get_triggerdef respectively reconstruct the creating command for a constraint, index, rule, extended statistics object, or trigger. (Note that this is a decompiled reconstruction, not the original text of the command.) pg_get_expr decompiles the internal form of an individual expression, such as the default value for a column. It can be useful when examining the contents of system catalogs. If the expression might contain Vars, specify the OID of the relation they refer to as the second parameter; if no Vars are expected, zero is sufficient. pg_get_viewdef reconstructs the SELECT query that defines a view. Most of these functions come in two variants, one of which can optionally “pretty-print” the result. The pretty-printed format is more readable, but the default format is more likely to be interpreted the same way by future versions of PostgreSQL; avoid using pretty-printed output for dump purposes. Passing false for the pretty-print parameter yields the same result as the variant that does not have the parameter at all.
pg_get_functiondef returns a complete CREATE OR REPLACE FUNCTION statement for a function. pg_get_function_arguments returns the argument list of a function, in the form it would need to appear in within CREATE FUNCTION. pg_get_function_result similarly returns the appropriate RETURNS clause for the function. pg_get_function_identity_arguments returns the argument list necessary to identify a function, in the form it would need to appear in within ALTER FUNCTION, for instance. This form omits default values.
pg_get_serial_sequence returns the name of the sequence associated with a column, or NULL if no sequence is associated with the column. If the column is an identity column, the associated sequence is the sequence internally created for the identity column. For columns created using one of the serial types (serial, smallserial, bigserial), it is the sequence created for that serial column definition. In the latter case, this association can be modified or removed with ALTER SEQUENCE OWNED BY. (The function probably should have been called pg_get_owned_sequence; its current name reflects the fact that it has typically been used with serial or bigserial columns.) The first input parameter is a table name with optional schema, and the second parameter is a column name. Because the first parameter is potentially a schema and table, it is not treated as a double-quoted identifier, meaning it is lower cased by default, while the second parameter, being just a column name, is treated as double-quoted and has its case preserved. The function returns a value suitably formatted for passing to sequence functions (see Section 9.16). A typical use is in reading the current value of a sequence for an identity or serial column, for example:
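    SELECT currval(pg_get_serial_sequence('sometable', 'id'));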
pg_get_userbyid extracts a role's name given its OID.
pg_index_column_has_property, pg_index_has_property, and pg_indexam_has_property return whether the specified index column, index, or index access method possesses the named property. NULL is returned if the property name is not known or does not apply to the particular object, or if the OID or column number does not identify a valid object. Refer to Table 9.69 for column properties, Table 9.70 for index properties, and Table 9.71 for access method properties. (Note that extension access methods can define additional property names for their indexes.)
pg_options_to_table returns the set of storage option name/value pairs (option_name/option_value) when passed pg_class.reloptions or pg_attribute.attoptions.
pg_tablespace_databases allows a tablespace to be examined. It returns the set of OIDs of databases that have objects stored in the tablespace. If this function returns any rows, the tablespace is not empty and cannot be dropped. To display the specific objects populating the tablespace, you will need to connect to the databases identified by pg_tablespace_databases and query their pg_class catalogs.
pg_typeof returns the OID of the data type of the value that is passed to it. This can be helpful for troubleshooting or dynamically constructing SQL queries. The function is declared as returning regtype, which is an OID alias type (see Section 8.19); this means that it is the same as an OID for comparison purposes but displays as a type name. For example:
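    SELECT pg_typeof(33);
     pg_typeof
    -----------
     integer
    (1 row)

    SELECT typlen FROM pg_type WHERE oid = pg_typeof(33);
     typlen
    --------
          4
    (1 row)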
The expression collation for returns the collation of the value that is passed to it. Example:
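    SELECT collation for (description) FROM pg_description LIMIT 1;
     pg_collation_for
    ------------------
     "default"
    (1 row)

    SELECT collation for ('foo' COLLATE "de_DE");
     pg_collation_for
    ------------------
     "de_DE"
    (1 row)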
The value might be quoted and schema-qualified. If no collation is derived for the argument expression, then a null value is returned. If the argument is not of a collatable data type, then an error is raised.
The to_regclass, to_regproc, to_regprocedure, to_regoper, to_regoperator, to_regtype, to_regnamespace, and to_regrole functions translate relation, function, operator, type, schema, and role names (given as text) to objects of type regclass, regproc, regprocedure, regoper, regoperator, regtype, regnamespace, and regrole respectively. These functions differ from a cast from text in that they don't accept a numeric OID, and that they return null rather than throwing an error if the name is not found (or, for to_regproc and to_regoper, if the given name matches multiple objects).
Table 9.72 lists functions related to database object identification and addressing.
pg_describe_object returns a textual description of a database object specified by catalog OID, object OID, and sub-object ID (such as a column number within a table; the sub-object ID is zero when referring to a whole object). This description is intended to be human-readable, and might be translated, depending on server configuration. This is useful to determine the identity of an object as stored in the pg_depend catalog.
pg_identify_object returns a row containing enough information to uniquely identify the database object specified by catalog OID, object OID and sub-object ID. This information is intended to be machine-readable, and is never translated. type identifies the type of database object; schema is the schema name that the object belongs in, or NULL for object types that do not belong to schemas; name is the name of the object, quoted if necessary, if the name (along with schema name, if pertinent) is sufficient to uniquely identify the object, otherwise NULL; identity is the complete object identity, with the precise format depending on object type, and each name within the format being schema-qualified and quoted as necessary.
pg_identify_object_as_address returns a row containing enough information to uniquely identify the database object specified by catalog OID, object OID and sub-object ID. The returned information is independent of the current server, that is, it could be used to identify an identically named object in another server. type identifies the type of database object; object_names and object_args are text arrays that together form a reference to the object. These three values can be passed to pg_get_object_address to obtain the internal address of the object. This function is the inverse of pg_get_object_address.
pg_get_object_address returns a row containing enough information to uniquely identify the database object specified by its type and object name and argument arrays. The returned values are the ones that would be used in system catalogs such as pg_depend and can be passed to other system functions such as pg_identify_object or pg_describe_object. classid is the OID of the system catalog containing the object; objid is the OID of the object itself, and objsubid is the sub-object ID, or zero if none. This function is the inverse of pg_identify_object_as_address.
The functions shown in Table 9.73 extract comments previously stored with the COMMENT command. A null value is returned if no comment could be found for the specified parameters.
Table 9.73. Comment Information Functions
col_description returns the comment for a table column, which is specified by the OID of its table and its column number. (obj_description cannot be used for table columns since columns do not have OIDs of their own.)
The two-parameter form of obj_description returns the comment for a database object specified by its OID and the name of the containing system catalog. For example, obj_description(123456,'pg_class') would retrieve the comment for the table with OID 123456. The one-parameter form of obj_description requires only the object OID. It is deprecated since there is no guarantee that OIDs are unique across different system catalogs; therefore, the wrong comment might be returned.
shobj_description is used just like obj_description except that it is used for retrieving comments on shared objects. Some system catalogs are global to all databases within each cluster, and the descriptions for objects in them are stored globally as well.
The functions shown in Table 9.74 provide server transaction information in an exportable form. The main use of these functions is to determine which transactions were committed between two snapshots.
Table 9.74. Transaction IDs and Snapshots
The internal transaction ID type (xid) is 32 bits wide and wraps around every 4 billion transactions. However, these functions export a 64-bit format that is extended with an “epoch” counter so it will not wrap around during the life of an installation. The data type used by these functions, txid_snapshot, stores information about transaction ID visibility at a particular moment in time. Its components are described in Table 9.75.
Table 9.75. Snapshot Components
txid_snapshot's textual representation is xmin:xmax:xip_list. For example 10:20:10,14,15 means xmin=10, xmax=20, xip_list=10, 14, 15.
txid_status(bigint) reports the commit status of a recent transaction. Applications may use it to determine whether a transaction committed or aborted when the application and database server become disconnected while a COMMIT is in progress. The status of a transaction will be reported as either in progress, committed, or aborted, provided that the transaction is recent enough that the system retains the commit status of that transaction. If the transaction is old enough that no references to it survive in the system and the commit status information has been discarded, this function will return NULL. Note that prepared transactions are reported as in progress; applications must check pg_prepared_xacts if they need to determine whether the txid is a prepared transaction.
The functions shown in Table 9.76 provide information about transactions that have already been committed. These functions mainly provide information about when the transactions were committed. They only provide useful data when the track_commit_timestamp configuration option is enabled, and only for transactions that were committed after it was enabled.
Table 9.76. Committed Transaction Information
The functions shown in Table 9.77 print information initialized during initdb, such as the catalog version. They also show information about write-ahead logging and checkpoint processing. This information is cluster-wide, and not specific to any one database. They provide most of the same information, from the same source, as pg_controldata, although in a form better suited to SQL functions.
Table 9.77. Control Data Functions
pg_control_checkpoint returns a record, shown in Table 9.78.

Table 9.78. pg_control_checkpoint Columns

pg_control_system returns a record, shown in Table 9.79.

Table 9.79. pg_control_system Columns

pg_control_init returns a record, shown in Table 9.80.

Table 9.80. pg_control_init Columns

pg_control_recovery returns a record, shown in Table 9.81.

Table 9.81. pg_control_recovery Columns
The functions described in this section are used to control and monitor a PostgreSQL installation.
Table 9.83 shows the functions available to query and alter run-time configuration parameters.
The functions shown in Table 9.84 send control signals to other server processes. Use of these functions is restricted to superusers by default but access may be granted to others using GRANT, with noted exceptions.
Each of these functions returns true if successful and false otherwise.
pg_cancel_backend and pg_terminate_backend send signals (SIGINT or SIGTERM respectively) to backend processes identified by process ID. The process ID of an active backend can be found from the pid column of the pg_stat_activity view, or by listing the postgres processes on the server (using ps on Unix or the Task Manager on Windows). The role of an active backend can be found from the usename column of the pg_stat_activity view.
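For instance, a sketch of terminating all of one role's other sessions (the role name is illustrative):

    SELECT pg_terminate_backend(pid)
      FROM pg_stat_activity
     WHERE usename = 'someuser'
       AND pid <> pg_backend_pid();  -- keep the current session alive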
The functions shown in Table 9.85 assist in making on-line backups. These functions cannot be executed during recovery (except non-exclusive pg_start_backup, non-exclusive pg_stop_backup, pg_is_in_backup, pg_backup_start_time and pg_wal_lsn_diff).
For details about proper usage of these functions, see Section 25.3.
pg_current_wal_lsn displays the current write-ahead log write location in the same format used by the above functions. Similarly, pg_current_wal_insert_lsn displays the current write-ahead log insertion location and pg_current_wal_flush_lsn displays the current write-ahead log flush location. The insertion location is the “logical” end of the write-ahead log at any instant, while the write location is the end of what has actually been written out from the server's internal buffers, and the flush location is the last location known to be written to durable storage. The write location is the end of what can be examined from outside the server, and is usually what you want if you are interested in archiving partially-complete write-ahead log files. The insertion and flush locations are made available primarily for server debugging purposes. These are all read-only operations and do not require superuser permissions.
You can use pg_walfile_name_offset to extract the corresponding write-ahead log file name and byte offset from a pg_lsn value. For example:
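    postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
            file_name         | file_offset
    --------------------------+-------------
     00000001000000000000000D |     4039624
    (1 row)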
Similarly, pg_walfile_name extracts just the write-ahead log file name. When the given write-ahead log location is exactly at a write-ahead log file boundary, both these functions return the name of the preceding write-ahead log file. This is usually the desired behavior for managing write-ahead log archiving behavior, since the preceding file is the last one that currently needs to be archived.
The functions shown in Table 9.86 provide information about the current status of a standby server. These functions may be executed both during recovery and in normal running.
The functions shown in Table 9.87 control the progress of recovery. These functions may be executed only during recovery.
pg_wal_replay_pause and pg_wal_replay_resume cannot be executed while a promotion is ongoing. If a promotion is triggered while recovery is paused, the paused state ends and promotion continues.
If streaming replication is disabled, the paused state may continue indefinitely without a problem. If streaming replication is in progress then WAL records will continue to be received, which will eventually fill available disk space, depending upon the duration of the pause, the rate of WAL generation and available disk space.
PostgreSQL allows database sessions to synchronize their snapshots. A snapshot determines which data is visible to the transaction that is using the snapshot. Synchronized snapshots are necessary when two or more sessions need to see identical content in the database. If two sessions just start their transactions independently, there is always a possibility that some third transaction commits between the executions of the two START TRANSACTION commands, so that one session sees the effects of that transaction and the other does not.
To solve this problem, PostgreSQL allows a transaction to export the snapshot it is using. As long as the exporting transaction remains open, other transactions can import its snapshot, and thereby be guaranteed that they see exactly the same view of the database that the first transaction sees. But note that any database changes made by any one of these transactions remain invisible to the other transactions, as is usual for changes made by uncommitted transactions. So the transactions are synchronized with respect to pre-existing data, but act normally for changes they make themselves.
Snapshots are exported with the pg_export_snapshot function, shown in Table 9.88, and imported with the SET TRANSACTION command.
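A minimal sketch of the export/import handshake (the snapshot identifier shown is illustrative):

    -- session 1
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    SELECT pg_export_snapshot();
    -- suppose this returns '00000003-0000001B-1'

    -- session 2, while session 1's transaction is still open
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    SET TRANSACTION SNAPSHOT '00000003-0000001B-1';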
The functions shown in Table 9.89 are for controlling and interacting with replication features. See Section 26.2.5, Section 26.2.6, and Chapter 49 for information about the underlying features. Use of functions for replication origin is restricted to superusers. Use of functions for replication slots is restricted to superusers and users having REPLICATION privilege.
Many of these functions have equivalent commands in the replication protocol; see Section 52.4.
The functions described in Section 9.27.3, Section 9.27.4, and Section 9.27.5 are also relevant for replication.
The functions shown in Table 9.90 calculate the disk space usage of database objects, or assist in presentation of usage results. All these functions return sizes measured in bytes. If an OID that does not represent an existing object is passed to one of these functions, NULL is returned.
The functions above that operate on tables or indexes accept a regclass argument, which is simply the OID of the table or index in the pg_class system catalog. You do not have to look up the OID by hand, however, since the regclass data type's input converter will do the work for you. Just write the table name enclosed in single quotes so that it looks like a literal constant. For compatibility with the handling of ordinary SQL names, the string will be converted to lower case unless it contains double quotes around the table name.
The functions shown in Table 9.91 assist in identifying the specific disk files associated with database objects.
Table 9.92 lists functions used to manage collations.
Table 9.93 lists functions that provide information about the structure of partitioned tables.
For example, to check the total size of the data contained in a partitioned table measurement, one could use the following query:
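    SELECT pg_size_pretty(sum(pg_relation_size(relid))) AS total_size
      FROM pg_partition_tree('measurement');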
Table 9.94 shows the functions available for index maintenance tasks. (Note that these maintenance tasks are normally done automatically by autovacuum; use of these functions is only required in special cases.) These functions cannot be executed during recovery. Use of these functions is restricted to superusers and the owner of the given index.
The functions shown in Table 9.95 provide native access to files on the machine hosting the server. Only files within the database cluster directory and the log_directory can be accessed, unless the user is a superuser or is granted the role pg_read_server_files. Use a relative path for files in the cluster directory, and a path matching the log_directory configuration setting for log files.
Note that granting users the EXECUTE privilege on pg_read_file(), or related functions, allows them the ability to read any file on the server that the database server process can read; these functions bypass all in-database privilege checks. This means that, for example, a user with such access is able to read the contents of the pg_authid table where authentication information is stored, as well as read any table data in the database. Therefore, granting access to these functions should be carefully considered.
Some of these functions take an optional missing_ok parameter, which specifies the behavior when the file or directory does not exist. If true, the function returns NULL or an empty result set, as appropriate. If false, an error is raised. The default is false.
The functions shown in Table 9.96 manage advisory locks. For details about proper use of these functions, see Section 13.3.5.
All these functions are intended to be used to lock application-defined resources, which can be identified either by a single 64-bit key value or two 32-bit key values (note that these two key spaces do not overlap). If another session already holds a conflicting lock on the same resource identifier, the functions will either wait until the resource becomes available, or return a false result, as appropriate for the function. Locks can be either shared or exclusive: a shared lock does not conflict with other shared locks on the same resource, only with exclusive locks. Locks can be taken at session level (so that they are held until released or the session ends) or at transaction level (so that they are held until the current transaction ends; there is no provision for manual release). Multiple session-level lock requests stack, so that if the same resource identifier is locked three times there must then be three unlock requests to release the resource in advance of session end.
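A small sketch of session-level lock stacking (the key 42 is arbitrary):

    SELECT pg_advisory_lock(42);    -- acquire the lock
    SELECT pg_advisory_lock(42);    -- stacks: the lock is now held twice
    SELECT pg_advisory_unlock(42);  -- returns true; still held once
    SELECT pg_advisory_unlock(42);  -- returns true; fully released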
In addition to simply finding the rows to be returned by a query, an index may be able to deliver them in a specific sorted order, allowing a query's ORDER BY specification to be honored without a separate sorting step. Of the index types currently supported by PostgreSQL, only B-tree can produce sorted output; the other index types return matching rows in an unspecified, implementation-dependent order.

The planner will consider satisfying an ORDER BY specification either by scanning an available index that matches the specification, or by scanning the table in physical order and doing an explicit sort. An important special case is ORDER BY in combination with LIMIT n: an explicit sort must process all the data to identify the first n rows, but if there is an index matching the ORDER BY, the first n rows can be retrieved directly, without scanning the rest at all.

By default, B-tree indexes store their entries in ascending order with nulls last. A forward scan of an index on column x thus produces output satisfying ORDER BY x (or more verbosely, ORDER BY x ASC NULLS LAST). The index can also be scanned backward, producing output satisfying ORDER BY x DESC (or more verbosely, ORDER BY x DESC NULLS FIRST, since NULLS FIRST is the default for ORDER BY DESC).

You can adjust the ordering of a B-tree index by including the options ASC, DESC, NULLS FIRST, and/or NULLS LAST when creating the index; for example:
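    -- table and index names are illustrative
    CREATE INDEX test2_info_nulls_low ON test2 (info NULLS FIRST);
    CREATE INDEX test3_desc_index ON test3 (id DESC NULLS LAST);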
An index stored in ascending order with nulls first can satisfy either ORDER BY x ASC NULLS FIRST or ORDER BY x DESC NULLS LAST, depending on which direction it is scanned in.

You might wonder why bother providing all four options, when two options together with the possibility of backward scan would cover all the variants of ORDER BY. In single-column indexes the options are indeed redundant, but in multicolumn indexes they can be useful. Consider a two-column index on (x, y): this can satisfy ORDER BY x, y if we scan forward, or ORDER BY x DESC, y DESC if we scan backward. But it might be that the application frequently needs ORDER BY x ASC, y DESC. There is no way to get that ordering from a plain index, but it is possible if the index is defined as (x ASC, y DESC) or (x DESC, y ASC).

Obviously, indexes with non-default sort orderings are a fairly specialized feature, but sometimes they can produce tremendous speedups for certain queries. Whether it is worth maintaining such an index depends on how often you use queries that require a special sort ordering.

Suppose we have a table similar to this:
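    CREATE TABLE test1 (
        id integer,
        content varchar
    );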
and the application issues many queries of the form:
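    SELECT content FROM test1 WHERE id = constant;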
With no advance preparation, the system would have to scan the entire test1 table, row by row, to find all matching entries. If there are many rows in test1 and only a few rows (perhaps zero or one) would be returned by such a query, this is clearly an inefficient method. But if the system has been instructed to maintain an index on the id column, it can use a more efficient method for locating matching rows. For instance, it might only have to walk a few levels deep into a search tree.

A similar approach is used in most non-fiction books: terms and concepts that are frequently looked up by readers are collected in an alphabetical index at the end of the book. The interested reader can scan the index relatively quickly and flip to the appropriate page(s), rather than having to read the entire book to find the material of interest. Just as it is the task of the author to anticipate the items that readers are likely to look up, it is the task of the database administrator to foresee which indexes will be useful.

The following command can be used to create an index on the id column:
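    CREATE INDEX test1_id_index ON test1 (id);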
The name test1_id_index can be chosen freely, but you should pick something that enables you to remember later what the index was for.

To remove an index, use the DROP INDEX command. Indexes can be added to and removed from tables at any time.

Once an index is created, no further intervention is required: the system will update the index when the table is modified, and it will use the index in queries when it thinks doing so would be more efficient than a sequential table scan. But you might have to run the ANALYZE command regularly to update statistics so that the query planner can make educated decisions. See the chapter on performance tips for information about how to find out whether an index is used, and when and why the planner might choose not to use an index.

Indexes can also benefit UPDATE and DELETE commands with search conditions. Indexes can moreover be used in join searches; thus, an index defined on a column that is part of a join condition can also significantly speed up queries with joins.

Creating an index on a large table can take a long time. By default, PostgreSQL allows reads (SELECT statements) to occur on the table in parallel with index creation, but writes (INSERT, UPDATE, DELETE) are blocked until the index build is finished. In production environments this is often unacceptable. It is possible to allow writes to occur in parallel with index creation, but there are several caveats to be aware of; see the discussion of building indexes concurrently for more information.

After an index is created, the system has to keep it synchronized with the table. This adds overhead to data manipulation operations. Therefore indexes that are seldom or never used in queries should be removed.

A single index scan can only use query clauses that use the index's columns with operators of its operator class and are joined with AND. For example, given an index on (a, b), a query condition like WHERE a = 5 AND b = 6 could use the index, but a query like WHERE a = 5 OR b = 6 could not directly use the index.

Fortunately, PostgreSQL has the ability to combine multiple indexes (including multiple uses of the same index) to handle cases that cannot be implemented by single index scans. The system can form AND and OR conditions across several index scans. For example, a query like WHERE x = 42 OR x = 47 OR x = 53 OR x = 99 could be broken down into four separate scans of an index on x, each scan using one of the query clauses. The results of these scans are then ORed together to produce the result. Another example is that if we have separate indexes on x and y, one possible implementation of a query like WHERE x = 5 AND y = 6 is to use each index with the appropriate query clause and then AND the index results together to identify the result rows.

To combine multiple indexes, the system scans each needed index and prepares a bitmap in memory giving the locations of table rows that are reported as matching that index's conditions. The bitmaps are then ANDed and ORed together as needed by the query. Finally, the actual table rows are visited and returned. The table rows are visited in physical order, because that is how the bitmap is laid out; this means that any ordering of the original indexes is lost, and so a separate sort step will be needed if the query has an ORDER BY clause. For this reason, and because each additional index scan adds extra time, the planner will sometimes choose to use a simple index scan even though additional indexes are available that could have been used as well.

In all but the simplest applications, there are various combinations of indexes that might be useful, and the database developer must make trade-offs to decide which indexes to provide. Sometimes multicolumn indexes are best, but sometimes it is better to create single-column indexes and rely on index combination. For example, if your workload includes a mix of queries that sometimes involve only column x, sometimes only column y, and sometimes both columns, you might choose to create two separate indexes on x and y, relying on index combination to process the queries that use both columns. You could also create a multicolumn index on (x, y). This index would typically be more efficient than index combination for queries involving both columns, but as discussed above, it would be almost useless for queries involving only y, so it should not be the only index. A combination of the multicolumn index and a separate index on y would serve reasonably well. For queries involving only x, the multicolumn index could be used, though it would be larger and hence slower than an index on x alone. The last alternative is to create all three indexes, but this is probably only reasonable if the table is searched much more often than it is updated and all three types of queries are common. If one of the types of queries is much less common than the others, you would probably settle for creating just the two indexes that best match the common types.

Indexes can be defined on more than one column of a table. For example, if you have a table of this form:
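    CREATE TABLE test2 (
      major int,
      minor int,
      name varchar
    );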
(say, you keep your /dev directory in a database...) and you frequently issue queries like:
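    SELECT name FROM test2 WHERE major = constant AND minor = constant;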
then it might be appropriate to define an index on the columns major and minor together, e.g.:
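    CREATE INDEX test2_mm_idx ON test2 (major, minor);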
Currently, only the B-tree, GiST, GIN, and BRIN index types support multicolumn indexes. Up to 32 columns can be specified. (This limit can be altered when building PostgreSQL; see the file pg_config_manual.h.)

A multicolumn B-tree index can be used with query conditions that involve any subset of the index's columns, but the index is most efficient when there are constraints on the leading (leftmost) columns. The exact rule is that equality constraints on leading columns, plus any inequality constraints on the first column that does not have an equality constraint, will be used to limit the portion of the index that is scanned. Constraints on columns to the right of these columns are checked in the index, so they save visits to the table proper, but they do not reduce the portion of the index that has to be scanned. For example, given an index on (a, b, c) and a query condition WHERE a = 5 AND b >= 42 AND c < 77, the index would have to be scanned from the first entry with a = 5 and b = 42 up through the last entry with a = 5. Index entries with c >= 77 would be skipped, but they would still have to be scanned through. This index could in principle be used for queries that have constraints on b and/or c with no constraint on a, but the entire index would have to be scanned, so in most cases the planner would prefer a sequential table scan over using the index.

A multicolumn GiST index can be used with query conditions that involve any subset of the index's columns. Conditions on additional columns restrict the entries returned by the index, but the condition on the first column is the most important one for determining how much of the index needs to be scanned. A GiST index will be relatively ineffective if its first column has only a few distinct values, even if there are many distinct values in additional columns.

A multicolumn GIN index can be used with query conditions that involve any subset of the index's columns. Unlike B-tree or GiST, index search effectiveness is the same regardless of which index column(s) the query conditions use.

A multicolumn BRIN index can be used with query conditions that involve any subset of the index's columns. Like GIN and unlike B-tree or GiST, index search effectiveness is the same regardless of which index column(s) the query conditions use. The only reason to have multiple BRIN indexes instead of one multicolumn BRIN index on a single table is to have a different pages_per_range storage parameter.

Of course, each column must be used with operators appropriate to the index type; clauses that involve other operators will not be considered.

Multicolumn indexes should be used sparingly. In most situations, an index on a single column is sufficient and saves space and time. Indexes with more than three columns are unlikely to be helpful unless the usage of the table is extremely stylized. See also the sections on combining multiple indexes and on index-only scans for some discussion of the merits of different index configurations.

PostgreSQL provides several index types: B-tree, Hash, GiST, SP-GiST, GIN, and BRIN. Each index type uses a different algorithm that is best suited to different types of queries. By default, the CREATE INDEX command creates B-tree indexes, which fit the most common situations.

B-trees can handle equality and range queries on data that can be sorted into some ordering. In particular, the PostgreSQL query planner will consider using a B-tree index whenever an indexed column is involved in a comparison using one of these operators:
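    <   <=   =   >=   >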
Constructs equivalent to combinations of these operators, such as BETWEEN and IN, can also be implemented with a B-tree index search. Also, an IS NULL or IS NOT NULL condition on an index column can be used with a B-tree index.

The optimizer can also use a B-tree index for queries involving the pattern matching operators LIKE and ~ if the pattern is a constant and is anchored to the beginning of the string, for example col LIKE 'foo%' or col ~ '^foo', but not col LIKE '%bar'. However, if your database does not use the C locale you will need to create the index with a special operator class to support indexing of pattern-matching queries; see Section 11.9 below. It is also possible to use B-tree indexes for ILIKE and ~*, but only if the pattern starts with non-alphabetic characters, i.e., characters that are not affected by upper/lower case conversion.

B-tree indexes can also be used to retrieve data in sorted order. This is not always faster than a simple scan and sort, but it is often helpful.

Hash indexes can only handle simple equality comparisons. The query planner will consider using a hash index whenever an indexed column is involved in a comparison using the = operator. The following command is used to create a hash index:
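    CREATE INDEX name ON table USING HASH (column);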
GiST indexes are not a single kind of index, but rather an infrastructure within which many different indexing strategies can be implemented. Accordingly, the particular operators with which a GiST index can be used vary depending on the indexing strategy (the operator class). As an example, the standard distribution of PostgreSQL includes GiST operator classes for several two-dimensional geometric data types, which support indexed queries using these operators:
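    <<   &<   &>   >>   <<|   &<|   |&>   |>>   @>   <@   ~=   &&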
GiST indexes are also capable of optimizing “nearest-neighbor” searches, such as:
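    SELECT * FROM places ORDER BY location <-> point '(101,456)' LIMIT 10;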
which finds the ten places closest to a given target point. The ability to do this is again dependent on the particular operator class being used. In Table 62.1, operators that can be used in this way are listed in the column “Ordering Operators”.

SP-GiST indexes, like GiST indexes, offer an infrastructure that supports various kinds of searches. SP-GiST permits implementation of a wide range of different non-balanced disk-based data structures, such as quadtrees, k-d trees, and radix trees. As an example, the standard distribution of PostgreSQL includes SP-GiST operator classes for two-dimensional points, which support indexed queries using these operators:
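    <<   >>   ~=   <@   <^   >^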
GIN indexes are “inverted indexes” which are appropriate for data values that contain multiple component values, such as arrays. An inverted index contains a separate entry for each component value, and can efficiently handle queries that test for the presence of specific component values.

Like GiST and SP-GiST, GIN can support many different user-defined indexing strategies, and the particular operators with which a GIN index can be used vary depending on the indexing strategy. As an example, the standard distribution of PostgreSQL includes a GIN operator class for arrays, which supports indexed queries using these operators:
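    <@   @>   =   &&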
BRIN indexes (a shorthand for Block Range Indexes) store summaries about the values stored in consecutive physical block ranges of a table. Like GiST, SP-GiST and GIN, BRIN can support many different indexing strategies, and the particular operators with which a BRIN index can be used vary depending on the indexing strategy. For data types that have a linear sort order, the indexed data corresponds to the minimum and maximum values of the values in the column for each block range. This supports indexed queries using these operators:
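    <   <=   =   >=   >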
Related regular-expression reference entries:

[chars]: a bracket expression, matching any one of the chars
\c: where c is alphanumeric (possibly followed by other characters), an escape (AREs only; in EREs and BREs, this matches c)
\A: matches only at the beginning of the string (this differs from ^)
\Z: matches only at the end of the string (this differs from $)
i: case-insensitive matching (overrides operator type)
n: newline-sensitive matching
p: partial newline-sensitive matching
w: inverse partial newline-sensitive (“weird”) matching
Date/time function reference entries:

clock_timestamp(): current date and time (changes during statement execution)
current_date: current date
current_time: current time of day
current_timestamp: current date and time (start of current transaction)
date_part(text, timestamp): get subfield (equivalent to extract)
date_part(text, interval): get subfield (equivalent to extract)
date_trunc(text, timestamp): truncate to specified precision
date_trunc(text, timestamp with time zone, text): truncate to specified precision in the specified time zone
date_trunc(text, interval): truncate to specified precision
extract(field from timestamp): get subfield
extract(field from interval): get subfield
localtime: current time of day
localtimestamp: current date and time (start of current transaction)
now(): current date and time (start of current transaction)
statement_timestamp(): current date and time (start of current statement)
timeofday(): current date and time (like clock_timestamp, but as a text string)
transaction_timestamp(): current date and time (start of current transaction)

TM prefix: translation mode (print localized day and month names based on lc_time)

Tests whether the first operand matches the regular expression given by the second operand, optionally with modifications described by a string of flag characters
(Refer to the documentation on geometric functions and operators for the meaning of these operators.) The GiST operator classes included in the standard distribution are documented in the GiST chapter. Many other GiST operator classes are available in the contrib collection or as separate projects; see the GiST documentation for more information.

(Refer to the documentation on geometric functions and operators for the meaning of these operators.) The SP-GiST operator classes included in the standard distribution are documented in the SP-GiST chapter; see that documentation for more information.

(Refer to the documentation on array functions and operators for the meaning of these operators.) The GIN operator classes included in the standard distribution are documented in the GIN chapter. Many other GIN operator classes are available in the contrib collection or as separate projects; see the GIN documentation for more information.

The BRIN operator classes included in the standard distribution are documented in the BRIN chapter; see that documentation for more information.
Operator: Description. Example → Result

anyrange @> anyrange → boolean: Does the first range contain the second? int4range(2,4) @> int4range(2,3) → t
anyrange @> anyelement → boolean: Does the range contain the element? '[2011-01-01,2011-03-01)'::tsrange @> '2011-01-10'::timestamp → t
anyrange <@ anyrange → boolean: Is the first range contained by the second? int4range(2,4) <@ int4range(1,7) → t
anyelement <@ anyrange → boolean: Is the element contained in the range? 42 <@ int4range(1,7) → f
anyrange && anyrange → boolean: Do the ranges overlap, that is, have any elements in common? int8range(3,7) && int8range(4,12) → t
anyrange << anyrange → boolean: Is the first range strictly left of the second? int8range(1,10) << int8range(100,110) → t
anyrange >> anyrange → boolean: Is the first range strictly right of the second? int8range(50,60) >> int8range(20,30) → t
anyrange &< anyrange → boolean: Does the first range not extend to the right of the second? int8range(1,20) &< int8range(18,20) → t
anyrange &> anyrange → boolean: Does the first range not extend to the left of the second? int8range(7,20) &> int8range(5,10) → t
anyrange -|- anyrange → boolean: Are the ranges adjacent? numrange(1.1,2.2) -|- numrange(2.2,3.3) → t
anyrange + anyrange → anyrange: Computes the union of the ranges. The ranges must overlap or be adjacent, so that the union is a single range (but see range_merge()). numrange(5,15) + numrange(10,20) → [5,20)
anyrange * anyrange → anyrange: Computes the intersection of the ranges. int8range(5,15) * int8range(10,20) → [10,15)
anyrange - anyrange → anyrange: Computes the difference of the ranges. The second range must not be contained in the first in such a way that the difference would not be a single range. int8range(5,15) - int8range(10,20) → [5,10)
Function: Description. Example → Result

lower(anyrange) → anyelement: Extracts the lower bound of the range (NULL if the range is empty or the lower bound is infinite). lower(numrange(1.1,2.2)) → 1.1
upper(anyrange) → anyelement: Extracts the upper bound of the range (NULL if the range is empty or the upper bound is infinite). upper(numrange(1.1,2.2)) → 2.2
isempty(anyrange) → boolean: Is the range empty? isempty(numrange(1.1,2.2)) → f
lower_inc(anyrange) → boolean: Is the range's lower bound inclusive? lower_inc(numrange(1.1,2.2)) → t
upper_inc(anyrange) → boolean: Is the range's upper bound inclusive? upper_inc(numrange(1.1,2.2)) → f
lower_inf(anyrange) → boolean: Is the range's lower bound infinite? lower_inf('(,)'::daterange) → t
upper_inf(anyrange) → boolean: Is the range's upper bound infinite? upper_inf('(,)'::daterange) → t
range_merge(anyrange, anyrange) → anyrange: Computes the smallest range that includes both of the given ranges. range_merge('[1,2)'::int4range, '[3,4)'::int4range) → [1,4)
Function / Argument Type / Return Type / Description

generate_series(start, stop): int, bigint, or numeric → setof int, setof bigint, or setof numeric (same as argument type). Generates a series of values from start to stop, with a step size of one.
generate_series(start, stop, step): int, bigint, or numeric → setof int, setof bigint, or setof numeric (same as argument type). Generates a series of values from start to stop, with a step size of step.
generate_series(start, stop, step interval): timestamp or timestamp with time zone → setof timestamp or setof timestamp with time zone (same as argument type). Generates a series of values from start to stop, with a step size of step.
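For instance:

    SELECT * FROM generate_series(2,4);
     generate_series
    -----------------
                   2
                   3
                   4
    (3 rows)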
Function / Return Type / Description

generate_subscripts(array anyarray, dim int): setof int. Generates a series comprising the subscripts of the given array's dim'th dimension.
generate_subscripts(array anyarray, dim int, reverse boolean): setof int. Generates a series comprising the subscripts of the given array's dim'th dimension. When reverse is true, the series is returned in reverse order.
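Basic usage:

    SELECT generate_subscripts('{NULL,1,NULL,2}'::int[], 1) AS s;
     s
    ---
     1
     2
     3
     4
    (4 rows)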
Function: Description

row_number() → bigint: Returns the number of the current row within its partition, counting from 1.
rank() → bigint: Returns the rank of the current row, with gaps; that is, the row_number of the first row in its peer group.
dense_rank() → bigint: Returns the rank of the current row, without gaps; this function effectively counts peer groups.
percent_rank() → double precision: Returns the relative rank of the current row, that is (rank - 1) / (total partition rows - 1). The value thus ranges from 0 to 1 inclusive.
cume_dist() → double precision: Returns the cumulative distribution, that is (number of partition rows preceding or peers with current row) / (total partition rows). The value thus ranges from 1/N to 1.
ntile(num_buckets integer) → integer: Returns an integer ranging from 1 to the argument value, dividing the partition as equally as possible.
lag(value anyelement [, offset integer [, default anyelement ]]) → anyelement: Returns value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, instead returns default (which must be of the same type as value). Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to NULL.
lead(value anyelement [, offset integer [, default anyelement ]]) → anyelement: Returns value evaluated at the row that is offset rows after the current row within the partition; if there is no such row, instead returns default (which must be of the same type as value). Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to NULL.
first_value(value anyelement) → anyelement: Returns value evaluated at the row that is the first row of the window frame.
last_value(value anyelement) → anyelement: Returns value evaluated at the row that is the last row of the window frame.
nth_value(value anyelement, n integer) → anyelement: Returns value evaluated at the row that is the n'th row of the window frame (counting from 1); returns NULL if there is no such row.
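As an illustration, ranking rows within groups (empsalary is an illustrative table with depname, empno, and salary columns):

    SELECT depname, empno, salary,
           rank() OVER (PARTITION BY depname ORDER BY salary DESC)
      FROM empsalary;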
Function: Description. Example Usage

suppress_redundant_updates_trigger() → trigger: Suppresses updates that would make no actual change to the row's data. See below for details. Example: CREATE TRIGGER ... suppress_redundant_updates_trigger()
tsvector_update_trigger() → trigger: Automatically updates a tsvector column from associated plain-text document column(s). The text search configuration to use is specified by name as a trigger argument. See Section 12.4.3 for details. Example: CREATE TRIGGER ... tsvector_update_trigger(tsvcol, 'pg_catalog.swedish', title, body)
tsvector_update_trigger_column() → trigger: Automatically updates a tsvector column from associated plain-text document column(s). The text search configuration to use is taken from a regconfig column of the table. See Section 12.4.3 for details. Example: CREATE TRIGGER ... tsvector_update_trigger_column(tsvcol, tsconfigcol, title, body)
Operator: Description. Example → Result

=: equal. ARRAY[1.1,2.1,3.1]::int[] = ARRAY[1,2,3] → t
<>: not equal. ARRAY[1,2,3] <> ARRAY[1,2,4] → t
<: less than. ARRAY[1,2,3] < ARRAY[1,2,4] → t
>: greater than. ARRAY[1,4,3] > ARRAY[1,2,4] → t
<=: less than or equal. ARRAY[1,2,3] <= ARRAY[1,2,3] → t
>=: greater than or equal. ARRAY[1,4,3] >= ARRAY[1,4,3] → t
@>: contains. ARRAY[1,4,3] @> ARRAY[3,1] → t
<@: is contained by. ARRAY[2,7] <@ ARRAY[1,7,4,2,6] → t
&&: overlap (have elements in common). ARRAY[1,4,3] && ARRAY[2,1] → t
||: array-to-array concatenation. ARRAY[1,2,3] || ARRAY[4,5,6] → {1,2,3,4,5,6}
||: array-to-array concatenation. ARRAY[1,2,3] || ARRAY[[4,5,6],[7,8,9]] → { {1,2,3},{4,5,6},{7,8,9} }
||: element-to-array concatenation. 3 || ARRAY[4,5,6] → {3,4,5,6}
||: array-to-element concatenation. ARRAY[4,5,6] || 7 → {4,5,6,7}
| Function | Return Type | Description | Example | Result |
| --- | --- | --- | --- | --- |
| array_append(anyarray, anyelement) | anyarray | append an element to the end of an array | array_append(ARRAY[1,2], 3) | {1,2,3} |
| array_cat(anyarray, anyarray) | anyarray | concatenate two arrays | array_cat(ARRAY[1,2,3], ARRAY[4,5]) | {1,2,3,4,5} |
| array_ndims(anyarray) | int | returns the number of dimensions of the array | array_ndims(ARRAY[[1,2,3], [4,5,6]]) | 2 |
| array_dims(anyarray) | text | returns a text representation of array's dimensions | array_dims(ARRAY[[1,2,3], [4,5,6]]) | [1:2][1:3] |
| array_fill(anyelement, int[] [, int[]]) | anyarray | returns an array initialized with supplied value and dimensions, optionally with lower bounds other than 1 | array_fill(7, ARRAY[3], ARRAY[2]) | [2:4]={7,7,7} |
| array_length(anyarray, int) | int | returns the length of the requested array dimension | array_length(array[1,2,3], 1) | 3 |
| array_lower(anyarray, int) | int | returns lower bound of the requested array dimension | array_lower('[0:2]={1,2,3}'::int[], 1) | 0 |
| array_position(anyarray, anyelement [, int]) | int | returns the subscript of the first occurrence of the second argument in the array, starting at the element indicated by the third argument or at the first element (array must be one-dimensional) | array_position(ARRAY['sun','mon','tue','wed','thu','fri','sat'], 'mon') | 2 |
| array_positions(anyarray, anyelement) | int[] | returns an array of subscripts of all occurrences of the second argument in the array given as first argument (array must be one-dimensional) | array_positions(ARRAY['A','A','B','A'], 'A') | {1,2,4} |
| array_prepend(anyelement, anyarray) | anyarray | prepend an element to the beginning of an array | array_prepend(1, ARRAY[2,3]) | {1,2,3} |
| array_remove(anyarray, anyelement) | anyarray | remove all elements equal to the given value from the array (array must be one-dimensional) | array_remove(ARRAY[1,2,3,2], 2) | {1,3} |
| array_replace(anyarray, anyelement, anyelement) | anyarray | replace each array element equal to the given value with a new value | array_replace(ARRAY[1,2,5,4], 5, 3) | {1,2,3,4} |
| array_to_string(anyarray, text [, text]) | text | concatenates array elements using supplied delimiter and optional null string | array_to_string(ARRAY[1, 2, 3, NULL, 5], ',', '*') | 1,2,3,*,5 |
| array_upper(anyarray, int) | int | returns upper bound of the requested array dimension | array_upper(ARRAY[1,8,3,7], 1) | 4 |
| cardinality(anyarray) | int | returns the total number of elements in the array, or 0 if the array is empty | cardinality(ARRAY[[1,2],[3,4]]) | 4 |
| string_to_array(text, text [, text]) | text[] | splits string into array elements using supplied delimiter and optional null string | string_to_array('xx~^~yy~^~zz', '~^~', 'yy') | {xx,NULL,zz} |
| unnest(anyarray) | setof anyelement | expand an array to a set of rows | unnest(ARRAY[1,2]) | 1 and 2, as two rows |
| unnest(anyarray, anyarray [, ...]) | setof anyelement, anyelement [, ...] | expand multiple arrays (possibly of different types) to a set of rows. This is only allowed in the FROM clause; see Section 7.2.1.4 | unnest(ARRAY[1,2], ARRAY['foo','bar','baz']) | (1,foo), (2,bar), (NULL,baz), as three rows |
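A couple of these functions composed together, as a quick sketch (note that the multi-array form of unnest must appear in the FROM clause, per the table above):

```sql
-- remove the 2s, then join the remainder into a string: '1,3'
SELECT array_to_string(array_remove(ARRAY[1,2,3,2], 2), ',');

-- expand two arrays in parallel; the shorter one is padded with NULLs
SELECT * FROM unnest(ARRAY[1,2], ARRAY['foo','bar','baz']) AS t(n, s);
```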
array_agg ( anynonarray ) → anyarray
Collects all the input values, including nulls, into an array. (Partial Mode: No)

array_agg ( anyarray ) → anyarray
Concatenates all the input arrays into an array of one higher dimension. (The inputs must all have the same dimensionality, and cannot be empty or null.) (Partial Mode: No)

avg ( smallint ) → numeric
avg ( integer ) → numeric
avg ( bigint ) → numeric
avg ( numeric ) → numeric
avg ( real ) → double precision
avg ( double precision ) → double precision
avg ( interval ) → interval
Computes the average (arithmetic mean) of all the non-null input values. (Partial Mode: Yes)

bit_and ( smallint ) → smallint
bit_and ( integer ) → integer
bit_and ( bigint ) → bigint
bit_and ( bit ) → bit
Computes the bitwise AND of all non-null input values. (Partial Mode: Yes)

bit_or ( smallint ) → smallint
bit_or ( integer ) → integer
bit_or ( bigint ) → bigint
bit_or ( bit ) → bit
Computes the bitwise OR of all non-null input values. (Partial Mode: Yes)

bool_and ( boolean ) → boolean
Returns true if all non-null input values are true, otherwise false. (Partial Mode: Yes)

bool_or ( boolean ) → boolean
Returns true if any non-null input value is true, otherwise false. (Partial Mode: Yes)

count ( * ) → bigint
Computes the number of input rows. (Partial Mode: Yes)

count ( "any" ) → bigint
Computes the number of input rows in which the input value is not null. (Partial Mode: Yes)

every ( boolean ) → boolean
This is the SQL standard's equivalent to bool_and. (Partial Mode: Yes)

json_agg ( anyelement ) → json
jsonb_agg ( anyelement ) → jsonb
Collects all the input values, including nulls, into a JSON array. Values are converted to JSON as per to_json or to_jsonb. (Partial Mode: No)

json_object_agg ( key "any", value "any" ) → json
jsonb_object_agg ( key "any", value "any" ) → jsonb
Collects all the key/value pairs into a JSON object. Key arguments are coerced to text; value arguments are converted as per to_json or to_jsonb. Values can be null, but not keys. (Partial Mode: No)

max ( see text ) → same as input type
Computes the maximum of the non-null input values. Available for any numeric, string, date/time, or enum type, as well as inet, interval, money, oid, pg_lsn, tid, and arrays of any of these types. (Partial Mode: Yes)

min ( see text ) → same as input type
Computes the minimum of the non-null input values. Available for any numeric, string, date/time, or enum type, as well as inet, interval, money, oid, pg_lsn, tid, and arrays of any of these types. (Partial Mode: Yes)

string_agg ( value text, delimiter text ) → text
string_agg ( value bytea, delimiter bytea ) → bytea
Concatenates the non-null input values into a string. Each value after the first is preceded by the corresponding delimiter (if it's not null). (Partial Mode: No)

sum ( smallint ) → bigint
sum ( integer ) → bigint
sum ( bigint ) → numeric
sum ( numeric ) → numeric
sum ( real ) → real
sum ( double precision ) → double precision
sum ( interval ) → interval
sum ( money ) → money
Computes the sum of the non-null input values. (Partial Mode: Yes)

xmlagg ( xml ) → xml
Concatenates the non-null XML input values (see Section 9.15.1.7). (Partial Mode: No)
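To see several of these aggregates side by side, a sketch assuming a hypothetical employees table with dept, name, and salary columns:

```sql
SELECT dept,
       count(*)                              AS headcount,
       avg(salary)                           AS avg_salary,
       string_agg(name, ', ' ORDER BY name)  AS members
FROM employees
GROUP BY dept;
```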
corr ( Y double precision, X double precision ) → double precision
Computes the correlation coefficient. (Partial Mode: Yes)

covar_pop ( Y double precision, X double precision ) → double precision
Computes the population covariance. (Partial Mode: Yes)

covar_samp ( Y double precision, X double precision ) → double precision
Computes the sample covariance. (Partial Mode: Yes)

regr_avgx ( Y double precision, X double precision ) → double precision
Computes the average of the independent variable, sum(X)/N. (Partial Mode: Yes)

regr_avgy ( Y double precision, X double precision ) → double precision
Computes the average of the dependent variable, sum(Y)/N. (Partial Mode: Yes)

regr_count ( Y double precision, X double precision ) → bigint
Computes the number of rows in which both inputs are non-null. (Partial Mode: Yes)

regr_intercept ( Y double precision, X double precision ) → double precision
Computes the y-intercept of the least-squares-fit linear equation determined by the (X, Y) pairs. (Partial Mode: Yes)

regr_r2 ( Y double precision, X double precision ) → double precision
Computes the square of the correlation coefficient. (Partial Mode: Yes)

regr_slope ( Y double precision, X double precision ) → double precision
Computes the slope of the least-squares-fit linear equation determined by the (X, Y) pairs. (Partial Mode: Yes)

regr_sxx ( Y double precision, X double precision ) → double precision
Computes the “sum of squares” of the independent variable, sum(X^2) - sum(X)^2/N. (Partial Mode: Yes)

regr_sxy ( Y double precision, X double precision ) → double precision
Computes the “sum of products” of independent times dependent variables, sum(X*Y) - sum(X) * sum(Y)/N. (Partial Mode: Yes)

regr_syy ( Y double precision, X double precision ) → double precision
Computes the “sum of squares” of the dependent variable, sum(Y^2) - sum(Y)^2/N. (Partial Mode: Yes)

stddev ( numeric_type ) → double precision for real or double precision, otherwise numeric
This is a historical alias for stddev_samp. (Partial Mode: Yes)

stddev_pop ( numeric_type ) → double precision for real or double precision, otherwise numeric
Computes the population standard deviation of the input values. (Partial Mode: Yes)

stddev_samp ( numeric_type ) → double precision for real or double precision, otherwise numeric
Computes the sample standard deviation of the input values. (Partial Mode: Yes)

variance ( numeric_type ) → double precision for real or double precision, otherwise numeric
This is a historical alias for var_samp. (Partial Mode: Yes)

var_pop ( numeric_type ) → double precision for real or double precision, otherwise numeric
Computes the population variance of the input values (square of the population standard deviation). (Partial Mode: Yes)

var_samp ( numeric_type ) → double precision for real or double precision, otherwise numeric
Computes the sample variance of the input values (square of the sample standard deviation). (Partial Mode: Yes)
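For example, fitting a least-squares line through (x, y) pairs in a hypothetical measurements table:

```sql
SELECT regr_slope(y, x)     AS slope,
       regr_intercept(y, x) AS intercept,
       regr_r2(y, x)        AS r_squared,
       regr_count(y, x)     AS pairs_used   -- rows where both x and y are non-null
FROM measurements;
```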
mode () WITHIN GROUP ( ORDER BY anyelement ) → anyelement
Computes the mode, the most frequent value of the aggregated argument (arbitrarily choosing the first one if there are multiple equally-frequent values). The aggregated argument must be of a sortable type. (Partial Mode: No)

percentile_cont ( fraction double precision ) WITHIN GROUP ( ORDER BY double precision ) → double precision
percentile_cont ( fraction double precision ) WITHIN GROUP ( ORDER BY interval ) → interval
Computes the continuous percentile, a value corresponding to the specified fraction within the ordered set of aggregated argument values. This will interpolate between adjacent input items if needed. (Partial Mode: No)

percentile_cont ( fractions double precision[] ) WITHIN GROUP ( ORDER BY double precision ) → double precision[]
percentile_cont ( fractions double precision[] ) WITHIN GROUP ( ORDER BY interval ) → interval[]
Computes multiple continuous percentiles. The result is an array of the same dimensions as the fractions parameter, with each non-null element replaced by the (possibly interpolated) value corresponding to that percentile. (Partial Mode: No)

percentile_disc ( fraction double precision ) WITHIN GROUP ( ORDER BY anyelement ) → anyelement
Computes the discrete percentile, the first value within the ordered set of aggregated argument values whose position in the ordering equals or exceeds the specified fraction. The aggregated argument must be of a sortable type. (Partial Mode: No)

percentile_disc ( fractions double precision[] ) WITHIN GROUP ( ORDER BY anyelement ) → anyarray
Computes multiple discrete percentiles. The result is an array of the same dimensions as the fractions parameter, with each non-null element replaced by the input value corresponding to that percentile. The aggregated argument must be of a sortable type. (Partial Mode: No)
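Ordered-set aggregates use the WITHIN GROUP syntax rather than an ordinary argument list. A sketch of computing medians against the hypothetical employees table:

```sql
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY salary) AS median_cont,  -- may interpolate
       percentile_disc(0.5) WITHIN GROUP (ORDER BY salary) AS median_disc,  -- always an actual input value
       mode() WITHIN GROUP (ORDER BY dept)                 AS most_common_dept
FROM employees;
```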
rank ( args ) WITHIN GROUP ( ORDER BY sorted_args ) → bigint
Computes the rank of the hypothetical row, with gaps; that is, the row number of the first row in its peer group. (Partial Mode: No)

dense_rank ( args ) WITHIN GROUP ( ORDER BY sorted_args ) → bigint
Computes the rank of the hypothetical row, without gaps; this function effectively counts peer groups. (Partial Mode: No)

percent_rank ( args ) WITHIN GROUP ( ORDER BY sorted_args ) → double precision
Computes the relative rank of the hypothetical row, that is (rank - 1) / (total rows - 1). The value thus ranges from 0 to 1 inclusive. (Partial Mode: No)

cume_dist ( args ) WITHIN GROUP ( ORDER BY sorted_args ) → double precision
Computes the cumulative distribution, that is (number of rows preceding or peers with hypothetical row) / (total rows). The value thus ranges from 1/N to 1. (Partial Mode: No)
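A hypothetical-set aggregate asks where a made-up row would land in the ordering of the actual rows. For instance, the rank a new salary of 68000 would have among existing salaries (again using the hypothetical employees table):

```sql
SELECT rank(68000) WITHIN GROUP (ORDER BY salary) AS hypothetical_rank
FROM employees;
```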
GROUPING ( group_by_expression(s) ) → integer
Returns a bit mask indicating which GROUP BY expressions are not included in the current grouping set. Bits are assigned with the rightmost argument corresponding to the least-significant bit; each bit is 0 if the corresponding expression is included in the grouping criteria of the grouping set generating the current result row, and 1 if it is not included.
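The bit mask is easiest to read alongside a ROLLUP. A sketch with a hypothetical items_sold table: GROUPING(make, model) is 0 on fully grouped rows, 1 on make-level subtotals (model not grouped), and 3 on the grand total (neither grouped):

```sql
SELECT make, model, GROUPING(make, model) AS grp, sum(sales)
FROM items_sold
GROUP BY ROLLUP (make, model);
```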
| Name | Type | Description |
| --- | --- | --- |
| classid | oid | OID of catalog the object belongs in |
| objid | oid | OID of the object itself |
| objsubid | integer | Sub-object ID (e.g., attribute number for a column) |
| command_tag | text | Command tag |
| object_type | text | Type of the object |
| schema_name | text | Name of the schema the object belongs in, if any; otherwise NULL. No quoting is applied. |
| object_identity | text | Text rendering of the object identity, schema-qualified. Each identifier included in the identity is quoted if necessary. |
| in_extension | boolean | True if the command is part of an extension script |
| command | pg_ddl_command | A complete representation of the command, in internal format. This cannot be output directly, but it can be passed to other functions to obtain different pieces of information about the command. |
| Name | Type | Description |
| --- | --- | --- |
| classid | oid | OID of catalog the object belonged in |
| objid | oid | OID of the object itself |
| objsubid | integer | Sub-object ID (e.g., attribute number for a column) |
| original | boolean | True if this was one of the root object(s) of the deletion |
| normal | boolean | True if there was a normal dependency relationship in the dependency graph leading to this object |
| is_temporary | boolean | True if this was a temporary object |
| object_type | text | Type of the object |
| schema_name | text | Name of the schema the object belonged in, if any; otherwise NULL. No quoting is applied. |
| object_name | text | Name of the object, if the combination of schema and name can be used as a unique identifier for the object; otherwise NULL. No quoting is applied, and name is never schema-qualified. |
| object_identity | text | Text rendering of the object identity, schema-qualified. Each identifier included in the identity is quoted if necessary. |
| address_names | text[] | An array that, together with object_type and address_args, can be used by the pg_get_object_address function to recreate the object address in a remote server containing an identically named object of the same kind. |
| address_args | text[] | Complement for address_names |
pg_event_trigger_table_rewrite_oid () → oid
Returns the OID of the table about to be rewritten.

pg_event_trigger_table_rewrite_reason () → integer
Returns a code explaining the reason(s) for rewriting. The exact meaning of the codes is release dependent.
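These two functions are only meaningful inside an event trigger fired on the table_rewrite event. A sketch that simply logs every rewrite (the function and trigger names are illustrative):

```sql
CREATE FUNCTION log_rewrite() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
  RAISE NOTICE 'rewriting table % (reason code %)',
    pg_event_trigger_table_rewrite_oid()::regclass,
    pg_event_trigger_table_rewrite_reason();
END $$;

CREATE EVENT TRIGGER log_rewrites ON table_rewrite
  EXECUTE FUNCTION log_rewrite();
```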
| Name | Type | Description |
| --- | --- | --- |
| index | int | index of the item in the MCV list |
| values | text[] | values stored in the MCV item |
| nulls | boolean[] | flags identifying NULL values |
| frequency | double precision | frequency of this MCV item |
| base_frequency | double precision | base frequency of this MCV item |
| Name | Return Type | Description |
| --- | --- | --- |
| current_catalog | name | name of current database (called “catalog” in the SQL standard) |
| current_database() | name | name of current database |
| current_query() | text | text of the currently executing query, as submitted by the client (might contain more than one statement) |
| current_role | name | equivalent to current_user |
| current_schema [()] | name | name of current schema |
| current_schemas(boolean) | name[] | names of schemas in search path, optionally including implicit schemas |
| current_user | name | user name of current execution context |
| inet_client_addr() | inet | address of the remote connection |
| inet_client_port() | int | port of the remote connection |
| inet_server_addr() | inet | address of the local connection |
| inet_server_port() | int | port of the local connection |
| pg_backend_pid() | int | Process ID of the server process attached to the current session |
| pg_blocking_pids(int) | int[] | Process ID(s) that are blocking specified server process ID from acquiring a lock |
| pg_conf_load_time() | timestamp with time zone | configuration load time |
| pg_current_logfile([text]) | text | Primary log file name, or log in the requested format, currently in use by the logging collector |
| pg_my_temp_schema() | oid | OID of session's temporary schema, or 0 if none |
| pg_is_other_temp_schema(oid) | boolean | is schema another session's temporary schema? |
| pg_jit_available() | boolean | is a JIT compiler extension available (see Chapter 31) and the jit configuration parameter set to on? |
| pg_listening_channels() | setof text | channel names that the session is currently listening on |
| pg_notification_queue_usage() | double | fraction of the asynchronous notification queue currently occupied (0-1) |
| pg_postmaster_start_time() | timestamp with time zone | server start time |
| pg_safe_snapshot_blocking_pids(int) | int[] | Process ID(s) that are blocking specified server process ID from acquiring a safe snapshot |
| pg_trigger_depth() | int | current nesting level of PostgreSQL triggers (0 if not called, directly or indirectly, from inside a trigger) |
| session_user | name | session user name |
| user | name | equivalent to current_user |
| version() | text | PostgreSQL version information. See also server_version_num for a machine-readable version. |
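A few of these can be combined in a single query to describe the current session, for example:

```sql
SELECT current_user,
       session_user,
       current_database(),
       inet_client_addr() AS client_ip,
       pg_backend_pid()   AS backend_pid,
       version();
```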
| Name | Return Type | Description |
| --- | --- | --- |
| has_any_column_privilege(user, table, privilege) | boolean | does user have privilege for any column of table |
| has_any_column_privilege(table, privilege) | boolean | does current user have privilege for any column of table |
| has_column_privilege(user, table, column, privilege) | boolean | does user have privilege for column |
| has_column_privilege(table, column, privilege) | boolean | does current user have privilege for column |
| has_database_privilege(user, database, privilege) | boolean | does user have privilege for database |
| has_database_privilege(database, privilege) | boolean | does current user have privilege for database |
| has_foreign_data_wrapper_privilege(user, fdw, privilege) | boolean | does user have privilege for foreign-data wrapper |
| has_foreign_data_wrapper_privilege(fdw, privilege) | boolean | does current user have privilege for foreign-data wrapper |
| has_function_privilege(user, function, privilege) | boolean | does user have privilege for function |
| has_function_privilege(function, privilege) | boolean | does current user have privilege for function |
| has_language_privilege(user, language, privilege) | boolean | does user have privilege for language |
| has_language_privilege(language, privilege) | boolean | does current user have privilege for language |
| has_schema_privilege(user, schema, privilege) | boolean | does user have privilege for schema |
| has_schema_privilege(schema, privilege) | boolean | does current user have privilege for schema |
| has_sequence_privilege(user, sequence, privilege) | boolean | does user have privilege for sequence |
| has_sequence_privilege(sequence, privilege) | boolean | does current user have privilege for sequence |
| has_server_privilege(user, server, privilege) | boolean | does user have privilege for foreign server |
| has_server_privilege(server, privilege) | boolean | does current user have privilege for foreign server |
| has_table_privilege(user, table, privilege) | boolean | does user have privilege for table |
| has_table_privilege(table, privilege) | boolean | does current user have privilege for table |
| has_tablespace_privilege(user, tablespace, privilege) | boolean | does user have privilege for tablespace |
| has_tablespace_privilege(tablespace, privilege) | boolean | does current user have privilege for tablespace |
| has_type_privilege(user, type, privilege) | boolean | does user have privilege for type |
| has_type_privilege(type, privilege) | boolean | does current user have privilege for type |
| pg_has_role(user, role, privilege) | boolean | does user have privilege for role |
| pg_has_role(role, privilege) | boolean | does current user have privilege for role |
| row_security_active(table) | boolean | does current user have row level security active for table |
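Typical usage is a quick permission check before attempting an operation. A sketch (joe, admin, mytable, and col1 are placeholders):

```sql
SELECT has_table_privilege('joe', 'mytable', 'SELECT');
SELECT has_column_privilege('mytable', 'col1', 'UPDATE');   -- checks the current user
SELECT pg_has_role('joe', 'admin', 'MEMBER');               -- role membership check
```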
| Operator | Description | Example | Result |
| --- | --- | --- | --- |
| = | equal | 'calvin=r*w/hobbes'::aclitem = 'calvin=r*w*/hobbes'::aclitem | f |
| @> | contains element | '{calvin=r*w/hobbes,hobbes=r*w*/postgres}'::aclitem[] @> 'calvin=r*w/hobbes'::aclitem | t |
| ~ | contains element | '{calvin=r*w/hobbes,hobbes=r*w*/postgres}'::aclitem[] ~ 'calvin=r*w/hobbes'::aclitem | t |
| Name | Return Type | Description |
| --- | --- | --- |
| acldefault(type, ownerId) | aclitem[] | get the default access privileges for an object belonging to ownerId |
| aclexplode(aclitem[]) | setof record | get aclitem array as tuples |
| makeaclitem(grantee, grantor, privilege, grantable) | aclitem | build an aclitem from input |
| Name | Return Type | Description |
| --- | --- | --- |
| pg_collation_is_visible(collation_oid) | boolean | is collation visible in search path |
| pg_conversion_is_visible(conversion_oid) | boolean | is conversion visible in search path |
| pg_function_is_visible(function_oid) | boolean | is function visible in search path |
| pg_opclass_is_visible(opclass_oid) | boolean | is operator class visible in search path |
| pg_operator_is_visible(operator_oid) | boolean | is operator visible in search path |
| pg_opfamily_is_visible(opclass_oid) | boolean | is operator family visible in search path |
| pg_statistics_obj_is_visible(stat_oid) | boolean | is statistics object visible in search path |
| pg_table_is_visible(table_oid) | boolean | is table visible in search path |
| pg_ts_config_is_visible(config_oid) | boolean | is text search configuration visible in search path |
| pg_ts_dict_is_visible(dict_oid) | boolean | is text search dictionary visible in search path |
| pg_ts_parser_is_visible(parser_oid) | boolean | is text search parser visible in search path |
| pg_ts_template_is_visible(template_oid) | boolean | is text search template visible in search path |
| pg_type_is_visible(type_oid) | boolean | is type (or domain) visible in search path |
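For instance, to check whether an unqualified reference to a given table would currently resolve (myschema.mytable is a placeholder):

```sql
SELECT pg_table_is_visible('myschema.mytable'::regclass);
```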
| Name | Return Type | Description |
| --- | --- | --- |
| format_type(type_oid, typemod) | text | get SQL name of a data type |
| pg_get_constraintdef(constraint_oid) | text | get definition of a constraint |
| pg_get_constraintdef(constraint_oid, pretty_bool) | text | get definition of a constraint |
| pg_get_expr(pg_node_tree, relation_oid) | text | decompile internal form of an expression, assuming that any Vars in it refer to the relation indicated by the second parameter |
| pg_get_expr(pg_node_tree, relation_oid, pretty_bool) | text | decompile internal form of an expression, assuming that any Vars in it refer to the relation indicated by the second parameter |
| pg_get_functiondef(func_oid) | text | get definition of a function or procedure |
| pg_get_function_arguments(func_oid) | text | get argument list of function's or procedure's definition (with default values) |
| pg_get_function_identity_arguments(func_oid) | text | get argument list to identify a function or procedure (without default values) |
| pg_get_function_result(func_oid) | text | get RETURNS clause for function (returns null for a procedure) |
| pg_get_indexdef(index_oid) | text | get CREATE INDEX command for index |
| pg_get_indexdef(index_oid, column_no, pretty_bool) | text | get CREATE INDEX command for index, or definition of just one index column when column_no is not zero |
| pg_get_keywords() | setof record | get list of SQL keywords and their categories |
| pg_get_ruledef(rule_oid) | text | get CREATE RULE command for rule |
| pg_get_ruledef(rule_oid, pretty_bool) | text | get CREATE RULE command for rule |
| pg_get_serial_sequence(table_name, column_name) | text | get name of the sequence that a serial or identity column uses |
| pg_get_statisticsobjdef(statobj_oid) | text | get CREATE STATISTICS command for extended statistics object |
| pg_get_triggerdef(trigger_oid) | text | get CREATE [ CONSTRAINT ] TRIGGER command for trigger |
| pg_get_triggerdef(trigger_oid, pretty_bool) | text | get CREATE [ CONSTRAINT ] TRIGGER command for trigger |
| pg_get_userbyid(role_oid) | name | get role name with given OID |
| pg_get_viewdef(view_name) | text | get underlying SELECT command for view or materialized view (deprecated) |
| pg_get_viewdef(view_name, pretty_bool) | text | get underlying SELECT command for view or materialized view (deprecated) |
| pg_get_viewdef(view_oid) | text | get underlying SELECT command for view or materialized view |
| pg_get_viewdef(view_oid, pretty_bool) | text | get underlying SELECT command for view or materialized view |
| pg_get_viewdef(view_oid, wrap_column_int) | text | get underlying SELECT command for view or materialized view; lines with fields are wrapped to specified number of columns, pretty-printing is implied |
| pg_index_column_has_property(index_oid, column_no, prop_name) | boolean | test whether an index column has a specified property |
| pg_index_has_property(index_oid, prop_name) | boolean | test whether an index has a specified property |
| pg_indexam_has_property(am_oid, prop_name) | boolean | test whether an index access method has a specified property |
| pg_options_to_table(reloptions) | setof record | get the set of storage option name/value pairs |
| pg_tablespace_databases(tablespace_oid) | setof oid | get the set of database OIDs that have objects in the tablespace |
| pg_tablespace_location(tablespace_oid) | text | get the path in the file system that this tablespace is located in |
| pg_typeof(any) | regtype | get the data type of any value |
| collation for (any) | text | get the collation of the argument |
| to_regclass(rel_name) | regclass | get the OID of the named relation |
| to_regproc(func_name) | regproc | get the OID of the named function |
| to_regprocedure(func_name) | regprocedure | get the OID of the named function |
| to_regoper(operator_name) | regoper | get the OID of the named operator |
| to_regoperator(operator_name) | regoperator | get the OID of the named operator |
| to_regtype(type_name) | regtype | get the OID of the named type |
| to_regnamespace(schema_name) | regnamespace | get the OID of the named schema |
| to_regrole(role_name) | regrole | get the OID of the named role |
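A few representative calls, as a sketch (mytable and myview are placeholders):

```sql
SELECT pg_typeof(33.0);                          -- numeric
SELECT pg_get_viewdef('myview'::regclass, true); -- pretty-printed view definition
SELECT format_type(atttypid, atttypmod)
FROM pg_attribute
WHERE attrelid = 'mytable'::regclass AND attnum > 0;
```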
| Name | Description |
| --- | --- |
| asc | Does the column sort in ascending order on a forward scan? |
| desc | Does the column sort in descending order on a forward scan? |
| nulls_first | Does the column sort with nulls first on a forward scan? |
| nulls_last | Does the column sort with nulls last on a forward scan? |
| orderable | Does the column possess any defined sort ordering? |
| distance_orderable | Can the column be scanned in order by a “distance” operator, for example ORDER BY col <-> constant? |
| returnable | Can the column value be returned by an index-only scan? |
| search_array | Does the column natively support col = ANY(array) searches? |
| search_nulls | Does the column support IS NULL and IS NOT NULL searches? |
| Name | Description |
| --- | --- |
| clusterable | Can the index be used in a CLUSTER command? |
| index_scan | Does the index support plain (non-bitmap) scans? |
| bitmap_scan | Does the index support bitmap scans? |
| backward_scan | Can the scan direction be changed in mid-scan (to support FETCH BACKWARD on a cursor without needing materialization)? |
| Name | Description |
| --- | --- |
| can_order | Does the access method support ASC, DESC and related keywords in CREATE INDEX? |
| can_unique | Does the access method support unique indexes? |
| can_multi_col | Does the access method support indexes with multiple columns? |
| can_exclude | Does the access method support exclusion constraints? |
| can_include | Does the access method support the INCLUDE clause of CREATE INDEX? |
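These properties are queried with the pg_index_column_has_property, pg_index_has_property, and pg_indexam_has_property functions listed earlier. A sketch (my_index is a placeholder):

```sql
SELECT pg_index_column_has_property('my_index'::regclass, 1, 'asc');
SELECT pg_indexam_has_property(oid, 'can_order')
FROM pg_am WHERE amname = 'btree';    -- true for B-tree
```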
pg_describe_object ( classid oid, objid oid, objsubid integer ) → text
get description of a database object

pg_identify_object ( classid oid, objid oid, objsubid integer ) → type text, schema text, name text, identity text
get identity of a database object

pg_identify_object_as_address ( classid oid, objid oid, objsubid integer ) → type text, object_names text[], object_args text[]
get external representation of a database object's address

pg_get_object_address ( type text, object_names text[], object_args text[] ) → classid oid, objid oid, objsubid integer
get address of a database object from its external representation
| Name | Return Type | Description |
| --- | --- | --- |
| col_description(table_oid, column_number) | text | get comment for a table column |
| obj_description(object_oid, catalog_name) | text | get comment for a database object |
| obj_description(object_oid) | text | get comment for a database object (deprecated) |
| shobj_description(object_oid, catalog_name) | text | get comment for a shared database object |
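For example, fetching the comments attached to a table and to its first column (mytable is a placeholder):

```sql
SELECT obj_description('mytable'::regclass, 'pg_class');
SELECT col_description('mytable'::regclass, 1);
```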
| Name | Return Type | Description |
| --- | --- | --- |
| txid_current() | bigint | get current transaction ID, assigning a new one if the current transaction does not have one |
| txid_current_if_assigned() | bigint | same as txid_current() but returns null instead of assigning a new transaction ID if none is already assigned |
| txid_current_snapshot() | txid_snapshot | get current snapshot |
| txid_snapshot_xip(txid_snapshot) | setof bigint | get in-progress transaction IDs in snapshot |
| txid_snapshot_xmax(txid_snapshot) | bigint | get xmax of snapshot |
| txid_snapshot_xmin(txid_snapshot) | bigint | get xmin of snapshot |
| txid_visible_in_snapshot(bigint, txid_snapshot) | boolean | is transaction ID visible in snapshot? (do not use with subtransaction ids) |
| txid_status(bigint) | text | report the status of the given transaction: committed, aborted, in progress, or null if the transaction ID is too old |
xmin
Earliest transaction ID (txid) that is still active. All earlier transactions will either be committed and visible, or rolled back and dead.

xmax
First as-yet-unassigned txid. All txids greater than or equal to this are not yet started as of the time of the snapshot, and thus invisible.

xip_list
Active txids at the time of the snapshot. The list includes only those active txids between xmin and xmax; there might be active txids higher than xmax. A txid that is xmin <= txid < xmax and not in this list was already completed at the time of the snapshot, and thus either visible or dead according to its commit status. The list does not include txids of subtransactions.
pg_xact_commit_timestamp ( xid ) → timestamp with time zone
get commit timestamp of a transaction

pg_last_committed_xact () → record ( xid xid, timestamp timestamp with time zone )
get transaction ID and commit timestamp of latest committed transaction
| Name | Return Type | Description |
| --- | --- | --- |
| pg_control_checkpoint() | record | Returns information about current checkpoint state. |
| pg_control_system() | record | Returns information about current control file state. |
| pg_control_init() | record | Returns information about cluster initialization state. |
| pg_control_recovery() | record | Returns information about recovery state. |
| Column Name | Data Type |
| --- | --- |
| checkpoint_lsn | pg_lsn |
| redo_lsn | pg_lsn |
| redo_wal_file | text |
| timeline_id | integer |
| prev_timeline_id | integer |
| full_page_writes | boolean |
| next_xid | text |
| next_oid | oid |
| next_multixact_id | xid |
| next_multi_offset | xid |
| oldest_xid | xid |
| oldest_xid_dbid | oid |
| oldest_active_xid | xid |
| oldest_multi_xid | xid |
| oldest_multi_dbid | oid |
| oldest_commit_ts_xid | xid |
| newest_commit_ts_xid | xid |
| checkpoint_time | timestamp with time zone |
| Column Name | Data Type |
| --- | --- |
| pg_control_version | integer |
| catalog_version_no | integer |
| system_identifier | bigint |
| pg_control_last_modified | timestamp with time zone |
| Column Name | Data Type |
| --- | --- |
| max_data_alignment | integer |
| database_block_size | integer |
| blocks_per_segment | integer |
| wal_block_size | integer |
| bytes_per_wal_segment | integer |
| max_identifier_length | integer |
| max_index_columns | integer |
| max_toast_chunk_size | integer |
| large_object_chunk_size | integer |
| float4_pass_by_value | boolean |
| float8_pass_by_value | boolean |
| data_page_checksum_version | integer |
| Column Name | Data Type |
| --- | --- |
| min_recovery_end_lsn | pg_lsn |
| min_recovery_end_timeline | integer |
| backup_start_lsn | pg_lsn |
| backup_end_lsn | pg_lsn |
| end_of_backup_record_required | boolean |
current_setting ( setting_name text [, missing_ok boolean ] ) → text
Returns the current value of the setting setting_name. If there is no such setting, current_setting throws an error unless missing_ok is supplied and is true. This function corresponds to the SQL command SHOW.
Example: current_setting('datestyle') → ISO, MDY

set_config ( setting_name text, new_value text, is_local boolean ) → text
Sets the parameter setting_name to new_value, and returns that value. If is_local is true, the new value will only apply for the current transaction. If you want the new value to apply for the current session, use false instead. This function corresponds to the SQL command SET.
Example: set_config('log_statement_stats', 'off', false) → off
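A sketch of a transaction-local setting change; because is_local is true, the value reverts when the transaction ends:

```sql
BEGIN;
SELECT set_config('search_path', 'myschema, public', true);
SELECT current_setting('search_path');   -- myschema, public
COMMIT;                                  -- the previous search_path is restored
```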
pg_cancel_backend ( pid integer ) → boolean
Cancels the current query of the session whose backend process has the specified process ID. This is also allowed if the calling role is a member of the role whose backend is being canceled or the calling role has been granted pg_signal_backend; however, only superusers can cancel superuser backends.

pg_reload_conf () → boolean
Causes all processes of the PostgreSQL server to reload their configuration files. (This is initiated by sending a SIGHUP signal to the postmaster process, which in turn sends SIGHUP to each of its children.)

pg_rotate_logfile () → boolean
Signals the log-file manager to switch to a new output file immediately. This works only when the built-in log collector is running, since otherwise there is no log-file manager subprocess.

pg_terminate_backend ( pid integer ) → boolean
Terminates the session whose backend process has the specified process ID. This is also allowed if the calling role is a member of the role whose backend is being terminated or the calling role has been granted pg_signal_backend; however, only superusers can terminate superuser backends.
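A common pattern combines pg_terminate_backend with the pg_stat_activity view, for example to disconnect all idle sessions other than one's own (use with care):

```sql
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle'
  AND pid <> pg_backend_pid();
```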
pg_create_restore_point ( name text ) → pg_lsn
Creates a named marker record in the write-ahead log that can later be used as a recovery target, and returns the corresponding write-ahead log location. The given name can then be used with recovery_target_name to specify the point up to which recovery will proceed. Avoid creating multiple restore points with the same name, since recovery will stop at the first one whose name matches the recovery target.
This function is restricted to superusers by default, but other users can be granted EXECUTE to run the function.

pg_current_wal_flush_lsn () → pg_lsn
Returns the current write-ahead log flush location (see notes below).

pg_current_wal_insert_lsn () → pg_lsn
Returns the current write-ahead log insert location (see notes below).

pg_current_wal_lsn () → pg_lsn
Returns the current write-ahead log write location (see notes below).

pg_start_backup ( label text [, fast boolean [, exclusive boolean ]] ) → pg_lsn
Prepares the server to begin an on-line backup. The only required parameter is an arbitrary user-defined label for the backup. (Typically this would be the name under which the backup dump file will be stored.) If the optional second parameter is given as true, it specifies executing pg_start_backup as quickly as possible. This forces an immediate checkpoint which will cause a spike in I/O operations, slowing any concurrently executing queries. The optional third parameter specifies whether to perform an exclusive or non-exclusive backup (default is exclusive).
When used in exclusive mode, this function writes a backup label file (backup_label) and, if there are any links in the pg_tblspc/ directory, a tablespace map file (tablespace_map) into the database cluster's data directory, then performs a checkpoint, and then returns the backup's starting write-ahead log location. (The user can ignore this result value, but it is provided in case it is useful.) When used in non-exclusive mode, the contents of these files are instead returned by the pg_stop_backup function, and should be copied to the backup area by the user.
This function is restricted to superusers by default, but other users can be granted EXECUTE to run the function.

pg_stop_backup ( exclusive boolean [, wait_for_archive boolean ] ) → setof record ( lsn pg_lsn, labelfile text, spcmapfile text )
Finishes performing an exclusive or non-exclusive on-line backup. The exclusive parameter must match the previous pg_start_backup call. In an exclusive backup, pg_stop_backup removes the backup label file and, if it exists, the tablespace map file created by pg_start_backup. In a non-exclusive backup, the desired contents of these files are returned as part of the result of the function, and should be written to files in the backup area (not in the data directory).
There is an optional second parameter of type boolean. If false, the function will return immediately after the backup is completed, without waiting for WAL to be archived. This behavior is only useful with backup software that independently monitors WAL archiving. Otherwise, WAL required to make the backup consistent might be missing and make the backup useless. By default or when this parameter is true, pg_stop_backup will wait for WAL to be archived when archiving is enabled. (On a standby, this means that it will wait only when archive_mode = always. If write activity on the primary is low, it may be useful to run pg_switch_wal on the primary in order to trigger an immediate segment switch.)
When executed on a primary, this function also creates a backup history file in the write-ahead log archive area. The history file includes the label given to pg_start_backup, the starting and ending write-ahead log locations for the backup, and the starting and ending times of the backup. After recording the ending location, the current write-ahead log insertion point is automatically advanced to the next write-ahead log file, so that the ending write-ahead log file can be archived immediately to complete the backup.
The result of the function is a single record. The lsn column holds the backup's ending write-ahead log location (which again can be ignored). The second and third columns are NULL when ending an exclusive backup; after a non-exclusive backup they hold the desired contents of the label and tablespace map files.
This function is restricted to superusers by default, but other users can be granted EXECUTE to run the function.

pg_stop_backup () → pg_lsn
Finishes performing an exclusive on-line backup. This simplified version is equivalent to pg_stop_backup(true, true), except that it only returns the pg_lsn result.
This function is restricted to superusers by default, but other users can be granted EXECUTE to run the function.

pg_is_in_backup () → boolean
Returns true if an on-line exclusive backup is in progress.

pg_backup_start_time () → timestamp with time zone
Returns the start time of the current on-line exclusive backup if one is in progress, otherwise NULL.

pg_switch_wal () → pg_lsn
Forces the server to switch to a new write-ahead log file, which allows the current file to be archived (assuming you are using continuous archiving). The result is the ending write-ahead log location plus 1 within the just-completed write-ahead log file. If there has been no write-ahead log activity since the last write-ahead log switch, pg_switch_wal does nothing and returns the start location of the write-ahead log file currently in use.
This function is restricted to superusers by default, but other users can be granted EXECUTE to run the function.

pg_walfile_name ( lsn pg_lsn ) → text
Converts a write-ahead log location to the name of the WAL file holding that location.

pg_walfile_name_offset ( lsn pg_lsn ) → record ( file_name text, file_offset integer )
Converts a write-ahead log location to a WAL file name and byte offset within that file.

pg_wal_lsn_diff ( lsn pg_lsn, lsn pg_lsn ) → numeric
Calculates the difference in bytes between two write-ahead log locations. This can be used with pg_stat_replication or some of the functions shown in Table 9.85 to get the replication lag.
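As the last entry suggests, pg_wal_lsn_diff pairs naturally with pg_stat_replication. A sketch of estimating replication lag in bytes on the primary:

```sql
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
```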
pg_is_in_recovery () → boolean
Returns true if recovery is still in progress.

pg_last_wal_receive_lsn () → pg_lsn
Returns the last write-ahead log location that has been received and synced to disk by streaming replication. While streaming replication is in progress this will increase monotonically. If recovery has completed then this will remain static at the location of the last WAL record received and synced to disk during recovery. If streaming replication is disabled, or if it has not yet started, the function returns NULL.

pg_last_wal_replay_lsn () → pg_lsn
Returns the last write-ahead log location that has been replayed during recovery. If recovery is still in progress this will increase monotonically. If recovery has completed then this will remain static at the location of the last WAL record applied during recovery. When the server has been started normally without recovery, the function returns NULL.

pg_last_xact_replay_timestamp () → timestamp with time zone
Returns the time stamp of the last transaction replayed during recovery. This is the time at which the commit or abort WAL record for that transaction was generated on the primary. If no transactions have been replayed during recovery, the function returns NULL. Otherwise, if recovery is still in progress this will increase monotonically. If recovery has completed then this will remain static at the time of the last transaction applied during recovery. When the server has been started normally without recovery, the function returns NULL.
pg_is_wal_replay_paused () → boolean
Returns true if recovery is paused.

pg_promote ( wait boolean DEFAULT true, wait_seconds integer DEFAULT 60 ) → boolean
Promotes a standby server to primary status. With wait set to true (the default), the function waits until promotion is completed or wait_seconds seconds have passed, and returns true if promotion is successful and false otherwise. If wait is set to false, the function returns true immediately after sending a SIGUSR1 signal to the postmaster to trigger promotion.
This function is restricted to superusers by default, but other users can be granted EXECUTE to run the function.

pg_wal_replay_pause () → void
Pauses recovery. While recovery is paused, no further database changes are applied. If hot standby is active, all new queries will see the same consistent snapshot of the database, and no further query conflicts will be generated until recovery is resumed.
This function is restricted to superusers by default, but other users can be granted EXECUTE to run the function.

pg_wal_replay_resume () → void
Restarts recovery if it was paused.
This function is restricted to superusers by default, but other users can be granted EXECUTE to run the function.
pg_export_snapshot () → text
Saves the transaction's current snapshot and returns a text string identifying the snapshot. This string must be passed (outside the database) to clients that want to import the snapshot. The snapshot is available for import only until the end of the transaction that exported it.
A transaction can export more than one snapshot, if needed. Note that doing so is only useful in READ COMMITTED transactions, since in REPEATABLE READ and higher isolation levels, transactions use the same snapshot throughout their lifetime. Once a transaction has exported any snapshots, it cannot be prepared with PREPARE TRANSACTION.
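A sketch of the export/import handshake between two sessions (the snapshot identifier shown is illustrative; use whatever string the first session actually returns):

```sql
-- Session 1: export a snapshot and keep the transaction open
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();          -- e.g. returns '00000003-0000001B-1'

-- Session 2: see exactly the same data as session 1
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '00000003-0000001B-1';
```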
pg_create_physical_replication_slot ( slot_name name [, immediately_reserve boolean, temporary boolean ] ) → record ( slot_name name, lsn pg_lsn )
Creates a new physical replication slot named slot_name. The optional second parameter, when true, specifies that the LSN for this replication slot be reserved immediately; otherwise the LSN is reserved on first connection from a streaming replication client. Streaming changes from a physical slot is only possible with the streaming-replication protocol (see Section 52.4). The optional third parameter, temporary, when set to true, specifies that the slot should not be permanently stored to disk and is only meant for use by the current session. Temporary slots are also released upon any error. This function corresponds to the replication protocol command CREATE_REPLICATION_SLOT ... PHYSICAL.

pg_drop_replication_slot ( slot_name name ) → void
Drops the physical or logical replication slot named slot_name. Same as replication protocol command DROP_REPLICATION_SLOT. For logical slots, this must be called while connected to the same database the slot was created on.

pg_create_logical_replication_slot ( slot_name name, plugin name [, temporary boolean ] ) → record ( slot_name name, lsn pg_lsn )
Creates a new logical (decoding) replication slot named slot_name using the output plugin plugin. The optional third parameter, temporary, when set to true, specifies that the slot should not be permanently stored to disk and is only meant for use by the current session. Temporary slots are also released upon any error. A call to this function has the same effect as the replication protocol command CREATE_REPLICATION_SLOT ... LOGICAL.

pg_copy_physical_replication_slot ( src_slot_name name, dst_slot_name name [, temporary boolean ] ) → record ( slot_name name, lsn pg_lsn )
Copies an existing physical replication slot named src_slot_name to a physical replication slot named dst_slot_name. The copied physical slot starts to reserve WAL from the same LSN as the source slot. temporary is optional. If temporary is omitted, the same value as the source slot is used.

pg_copy_logical_replication_slot ( src_slot_name name, dst_slot_name name [, temporary boolean [, plugin name ]] ) → record ( slot_name name, lsn pg_lsn )
Copies an existing logical replication slot named src_slot_name to a logical replication slot named dst_slot_name, optionally changing the output plugin and persistence. The copied logical slot starts from the same LSN as the source logical slot. Both temporary and plugin are optional; if they are omitted, the values of the source slot are used.

pg_logical_slot_get_changes ( slot_name name, upto_lsn pg_lsn, upto_nchanges integer, VARIADIC options text[] ) → setof record ( lsn pg_lsn, xid xid, data text )
Returns changes in the slot slot_name, starting from the point from which changes have been consumed last. If upto_lsn and upto_nchanges are NULL, logical decoding will continue until end of WAL. If upto_lsn is non-NULL, decoding will include only those transactions which commit prior to the specified LSN. If upto_nchanges is non-NULL, decoding will stop when the number of rows produced by decoding exceeds the specified value. Note, however, that the actual number of rows returned may be larger, since this limit is only checked after adding the rows produced when decoding each new transaction commit.

pg_logical_slot_peek_changes ( slot_name name, upto_lsn pg_lsn, upto_nchanges integer, VARIADIC options text[] ) → setof record ( lsn pg_lsn, xid xid, data text )
Behaves just like the pg_logical_slot_get_changes() function, except that changes are not consumed; that is, they will be returned again on future calls.

pg_logical_slot_get_binary_changes ( slot_name name, upto_lsn pg_lsn, upto_nchanges integer, VARIADIC options text[] ) → setof record ( lsn pg_lsn, xid xid, data bytea )
Behaves just like the pg_logical_slot_get_changes() function, except that changes are returned as bytea.

pg_logical_slot_peek_binary_changes ( slot_name name, upto_lsn pg_lsn, upto_nchanges integer, VARIADIC options text[] ) → setof record ( lsn pg_lsn, xid xid, data bytea )
Behaves just like the pg_logical_slot_peek_changes() function, except that changes are returned as bytea.

pg_replication_slot_advance ( slot_name name, upto_lsn pg_lsn ) → record ( slot_name name, end_lsn pg_lsn )
Advances the current confirmed position of a replication slot named slot_name. The slot will not be moved backwards, and it will not be moved beyond the current insert location. Returns the name of the slot and the actual position that it was advanced to. The updated slot position information is written out at the next checkpoint if any advancing is done. So in the event of a crash, the slot may return to an earlier position.

pg_replication_origin_create ( node_name text ) → oid
Creates a replication origin with the given external name, and returns the internal ID assigned to it.

pg_replication_origin_drop ( node_name text ) → void
Deletes a previously-created replication origin, including any associated replay progress.

pg_replication_origin_oid ( node_name text ) → oid
Looks up a replication origin by name and returns the internal ID. If no such replication origin is found an error is thrown.

pg_replication_origin_session_setup ( node_name text ) → void
Marks the current session as replaying from the given origin, allowing replay progress to be tracked. Can only be used if no origin is currently selected. Use pg_replication_origin_session_reset to undo.

pg_replication_origin_session_reset () → void
Cancels the effects of pg_replication_origin_session_setup().

pg_replication_origin_session_is_setup () → boolean
Returns true if a replication origin has been selected in the current session.

pg_replication_origin_session_progress ( flush boolean ) → pg_lsn
Returns the replay location for the replication origin selected in the current session. The parameter flush determines whether the corresponding local transaction will be guaranteed to have been flushed to disk or not.

pg_replication_origin_xact_setup ( origin_lsn pg_lsn, origin_timestamp timestamp with time zone ) → void
Marks the current transaction as replaying a transaction that has committed at the given LSN and timestamp. Can only be called when a replication origin has been selected using pg_replication_origin_session_setup.

pg_replication_origin_xact_reset () → void
Cancels the effects of pg_replication_origin_xact_setup().

pg_replication_origin_advance ( node_name text, lsn pg_lsn ) → void
Sets replication progress for the given node to the given location. This is primarily useful for setting up the initial location, or setting a new location after configuration changes and similar. Be aware that careless use of this function can lead to inconsistently replicated data.

pg_replication_origin_progress ( node_name text, flush boolean ) → pg_lsn
Returns the replay location for the given replication origin. The parameter flush determines whether the corresponding local transaction will be guaranteed to have been flushed to disk or not.

pg_logical_emit_message ( transactional boolean, prefix text, content text ) → pg_lsn
pg_logical_emit_message ( transactional boolean, prefix text, content bytea ) → pg_lsn
Emits a logical decoding message. This can be used to pass generic messages to logical decoding plugins through WAL. The transactional parameter specifies if the message should be part of the current transaction, or if it should be written immediately and decoded as soon as the logical decoder reads the record. The prefix parameter is a textual prefix that can be used by logical decoding plugins to easily recognize messages that are interesting for them. The content parameter is the content of the message, given either in text or binary form.
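A minimal logical-decoding round trip, assuming the test_decoding output plugin (shipped as a contrib module) is available; the slot name is arbitrary:

```sql
SELECT pg_create_logical_replication_slot('my_slot', 'test_decoding');
-- ... run some data-modifying transactions ...
SELECT lsn, xid, data
FROM pg_logical_slot_get_changes('my_slot', NULL, NULL);  -- consumes the changes
SELECT pg_drop_replication_slot('my_slot');
```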
pg_column_size ( "any" ) → integer
Shows the number of bytes used to store any individual data value. If applied directly to a table column value, this reflects any compression that was done.

pg_database_size ( name ) → bigint
pg_database_size ( oid ) → bigint
Computes the total disk space used by the database with the specified name or OID. To use this function, you must have CONNECT privilege on the specified database (which is granted by default) or be a member of the pg_read_all_stats role.

pg_indexes_size ( regclass ) → bigint
Computes the total disk space used by indexes attached to the specified table.

pg_relation_size ( relation regclass [, fork text ] ) → bigint
Computes the disk space used by one “fork” of the specified relation. (Note that for most purposes it is more convenient to use the higher-level functions pg_total_relation_size or pg_table_size, which sum the sizes of all forks.) With one argument, this returns the size of the main data fork of the relation. The second argument can be provided to specify which fork to examine:
main returns the size of the main data fork of the relation.
fsm returns the size of the Free Space Map (see Section 68.3) associated with the relation.
vm returns the size of the Visibility Map (see Section 68.4) associated with the relation.
init returns the size of the initialization fork, if any, associated with the relation.

pg_size_bytes ( text ) → bigint
Converts a size in human-readable format (as returned by pg_size_pretty) into bytes.

pg_size_pretty ( bigint ) → text
pg_size_pretty ( numeric ) → text
Converts a size in bytes into a more easily human-readable format with size units (bytes, kB, MB, GB or TB as appropriate). Note that the units are powers of 2 rather than powers of 10, so 1kB is 1024 bytes, 1MB is 1024^2 = 1048576 bytes, and so on.

pg_table_size ( regclass ) → bigint
Computes the disk space used by the specified table, excluding indexes (but including its TOAST table if any, free space map, and visibility map).

pg_tablespace_size ( name ) → bigint
pg_tablespace_size ( oid ) → bigint
Computes the total disk space used in the tablespace with the specified name or OID. To use this function, you must have CREATE privilege on the specified tablespace or be a member of the pg_read_all_stats role, unless it is the default tablespace for the current database.

pg_total_relation_size ( regclass ) → bigint
Computes the total disk space used by the specified table, including all indexes and TOAST data. The result is equivalent to pg_table_size + pg_indexes_size.
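A sketch of a common administrative query, listing the largest tables in the current database with human-readable sizes:

```sql
SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 10;
```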
pg_relation_filenode ( relation regclass ) → oid
Returns the “filenode” number currently assigned to the specified relation. The filenode is the base component of the file name(s) used for the relation (see Section 68.1 for more information). For most relations the result is the same as pg_class.relfilenode, but for certain system catalogs relfilenode is zero and this function must be used to get the correct value. The function returns NULL if passed a relation that does not have storage, such as a view.

pg_relation_filepath ( relation regclass ) → text
Returns the entire file path name (relative to the database cluster's data directory, PGDATA) of the relation.

pg_filenode_relation ( tablespace oid, filenode oid ) → regclass
Returns a relation's OID given the tablespace OID and filenode it is stored under. This is essentially the inverse mapping of pg_relation_filepath. For a relation in the database's default tablespace, the tablespace can be specified as zero. Returns NULL if no relation in the current database is associated with the given values.
pg_collation_actual_version ( oid ) → text
Returns the actual version of the collation object as it is currently installed in the operating system. If this is different from the value in pg_collation.collversion, then objects depending on the collation might need to be rebuilt. See also ALTER COLLATION.

pg_import_system_collations ( schema regnamespace ) → integer
Adds collations to the system catalog pg_collation based on all the locales it finds in the operating system. This is what initdb uses; see Section 23.2.2 for more details. If additional locales are installed into the operating system later on, this function can be run again to add collations for the new locales. Locales that match existing entries in pg_collation will be skipped. (But collation objects based on locales that are no longer present in the operating system are not removed by this function.) The schema parameter would typically be pg_catalog, but that is not a requirement; the collations could be installed into some other schema as well. The function returns the number of new collation objects it created.
pg_partition_tree ( regclass ) → setof record ( relid regclass, parentrelid regclass, isleaf boolean, level integer )
Lists the tables or indexes in the partition tree of the given partitioned table or partitioned index, with one row for each partition. Information provided includes the OID of the partition, the OID of its immediate parent, a boolean value telling if the partition is a leaf, and an integer telling its level in the hierarchy. The level value is 0 for the input table or index, 1 for its immediate child partitions, 2 for their partitions, and so on. Returns no rows if the relation does not exist or is not a partition or partitioned table.

pg_partition_ancestors ( regclass ) → setof regclass
Lists the ancestor relations of the given partition, including the relation itself. Returns no rows if the relation does not exist or is not a partition or partitioned table.

pg_partition_root ( regclass ) → regclass
Returns the top-most parent of the partition tree to which the given relation belongs. Returns NULL if the relation does not exist or is not a partition or partitioned table.
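For example, pg_partition_tree can be combined with the size functions shown earlier to total the size of a partitioned table (measurement is a placeholder for a partitioned table):

```sql
SELECT pg_size_pretty(sum(pg_relation_size(relid))) AS total_size
FROM pg_partition_tree('measurement');
```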
brin_summarize_new_values ( index regclass ) → integer
Scans the specified BRIN index to find page ranges in the base table that are not currently summarized by the index; for any such range it creates a new summary index tuple by scanning those table pages. Returns the number of new page range summaries that were inserted into the index.

brin_summarize_range ( index regclass, blockNumber bigint ) → integer
Summarizes the page range covering the given block, if not already summarized. This is like brin_summarize_new_values except that it only processes the page range that covers the given table block number.

brin_desummarize_range ( index regclass, blockNumber bigint ) → void
Removes the BRIN index tuple that summarizes the page range covering the given table block, if there is one.

gin_clean_pending_list ( index regclass ) → bigint
Cleans up the “pending” list of the specified GIN index by moving entries in it, in bulk, to the main GIN data structure. Returns the number of pages removed from the pending list. If the argument is a GIN index built with the fastupdate option disabled, no cleanup happens and the result is zero, because the index doesn't have a pending list. See Section 66.4.1 and Section 66.5 for details about the pending list and fastupdate option.
Function
Description
pg_ls_dir
( dirname
text
[, missing_ok
boolean
, include_dot_dirs
boolean
] ) → setof text
Returns the names of all files (and directories and other special files) in the specified directory. The include_dot_dirs
parameter indicates whether “.” and “..” are to be included in the result set; the default is to exclude them. Including them can be useful when missing_ok
is true
, to distinguish an empty directory from a non-existent directory.
This function is restricted to superusers by default, but other users can be granted EXECUTE to run the function.
pg_ls_logdir
() → setof record
( name
text
, size
bigint
, modification
timestamp with time zone
)
Returns the name, size, and last modification time (mtime) of each ordinary file in the server's log directory. Filenames beginning with a dot, directories, and other special files are excluded.
This function is restricted to superusers and members of the pg_monitor
role by default, but other users can be granted EXECUTE to run the function.
pg_ls_waldir () → setof record ( name text, size bigint, modification timestamp with time zone )
Returns the name, size, and last modification time (mtime) of each ordinary file in the server's write-ahead log (WAL) directory. Filenames beginning with a dot, directories, and other special files are excluded.
This function is restricted to superusers and members of the pg_monitor role by default, but other users can be granted EXECUTE to run the function.
pg_ls_archive_statusdir () → setof record ( name text, size bigint, modification timestamp with time zone )
Returns the name, size, and last modification time (mtime) of each ordinary file in the server's WAL archive status directory (pg_wal/archive_status). Filenames beginning with a dot, directories, and other special files are excluded.
This function is restricted to superusers and members of the pg_monitor role by default, but other users can be granted EXECUTE to run the function.
pg_ls_tmpdir ( [ tablespace oid ] ) → setof record ( name text, size bigint, modification timestamp with time zone )
Returns the name, size, and last modification time (mtime) of each ordinary file in the temporary file directory for the specified tablespace. If tablespace is not provided, the pg_default tablespace is examined. Filenames beginning with a dot, directories, and other special files are excluded.
This function is restricted to superusers and members of the pg_monitor role by default, but other users can be granted EXECUTE to run the function.
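For example, to look at recent WAL segment activity (requires one of the privileges noted above):

```sql
-- The five most recently modified files in the WAL directory.
SELECT name, size, modification
FROM pg_ls_waldir()
ORDER BY modification DESC
LIMIT 5;
```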
pg_read_file ( filename text [, offset bigint, length bigint [, missing_ok boolean ]] ) → text
Returns all or part of a text file, starting at the given byte offset, returning at most length bytes (less if the end of file is reached first). If offset is negative, it is relative to the end of the file. If offset and length are omitted, the entire file is returned. The bytes read from the file are interpreted as a string in the database's encoding; an error is thrown if they are not valid in that encoding. If missing_ok is true, NULL (rather than an error) is returned when the file does not exist.
This function is restricted to superusers by default, but other users can be granted EXECUTE to run the function.
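A sketch, assuming a log file at the illustrative relative path log/postgresql.log:

```sql
-- Read at most the first kilobyte; NULL instead of an error if the file is absent.
SELECT pg_read_file('log/postgresql.log', 0, 1024, true);
```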
pg_read_binary_file ( filename text [, offset bigint, length bigint [, missing_ok boolean ]] ) → bytea
Returns all or part of a file. This function is identical to pg_read_file except that it can read arbitrary binary data, returning the result as bytea not text; accordingly, no encoding checks are performed.
This function is restricted to superusers by default, but other users can be granted EXECUTE to run the function.
In combination with the convert_from function, this function can be used to read a text file in a specified encoding and convert to the database's encoding:
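```sql
-- file_in_utf8.txt is an illustrative filename.
SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
```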
pg_stat_file ( filename text [, missing_ok boolean ] ) → record ( size bigint, access timestamp with time zone, modification timestamp with time zone, change timestamp with time zone, creation timestamp with time zone, isdir boolean )
Returns a record containing the file's size, last access time stamp, last modification time stamp, last file status change time stamp (Unix platforms only), file creation time stamp (Windows only), and a flag indicating if it is a directory.
This function is restricted to superusers by default, but other users can be granted EXECUTE to run the function.
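For example, checking a file that is always present in the data directory:

```sql
-- PG_VERSION sits at the top of every cluster's data directory.
SELECT size, modification, isdir FROM pg_stat_file('PG_VERSION');
```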
Advisory Lock Functions

pg_advisory_lock ( key bigint ) → void
pg_advisory_lock ( key1 integer, key2 integer ) → void
Obtains an exclusive session-level advisory lock, waiting if necessary.
pg_advisory_lock_shared ( key bigint ) → void
pg_advisory_lock_shared ( key1 integer, key2 integer ) → void
Obtains a shared session-level advisory lock, waiting if necessary.
pg_advisory_unlock ( key bigint ) → boolean
pg_advisory_unlock ( key1 integer, key2 integer ) → boolean
Releases a previously-acquired exclusive session-level advisory lock. Returns true if the lock is successfully released. If the lock was not held, false is returned, and in addition, an SQL warning will be reported by the server.
pg_advisory_unlock_all () → void
Releases all session-level advisory locks held by the current session. (This function is implicitly invoked at session end, even if the client disconnects ungracefully.)
pg_advisory_unlock_shared ( key bigint ) → boolean
pg_advisory_unlock_shared ( key1 integer, key2 integer ) → boolean
Releases a previously-acquired shared session-level advisory lock. Returns true if the lock is successfully released. If the lock was not held, false is returned, and in addition, an SQL warning will be reported by the server.
pg_advisory_xact_lock ( key bigint ) → void
pg_advisory_xact_lock ( key1 integer, key2 integer ) → void
Obtains an exclusive transaction-level advisory lock, waiting if necessary.
pg_advisory_xact_lock_shared ( key bigint ) → void
pg_advisory_xact_lock_shared ( key1 integer, key2 integer ) → void
Obtains a shared transaction-level advisory lock, waiting if necessary.
pg_try_advisory_lock ( key bigint ) → boolean
pg_try_advisory_lock ( key1 integer, key2 integer ) → boolean
Obtains an exclusive session-level advisory lock if available. This will either obtain the lock immediately and return true, or return false without waiting if the lock cannot be acquired immediately.
pg_try_advisory_lock_shared ( key bigint ) → boolean
pg_try_advisory_lock_shared ( key1 integer, key2 integer ) → boolean
Obtains a shared session-level advisory lock if available. This will either obtain the lock immediately and return true, or return false without waiting if the lock cannot be acquired immediately.
pg_try_advisory_xact_lock ( key bigint ) → boolean
pg_try_advisory_xact_lock ( key1 integer, key2 integer ) → boolean
Obtains an exclusive transaction-level advisory lock if available. This will either obtain the lock immediately and return true, or return false without waiting if the lock cannot be acquired immediately.
pg_try_advisory_xact_lock_shared ( key bigint ) → boolean
pg_try_advisory_xact_lock_shared ( key1 integer, key2 integer ) → boolean
Obtains a shared transaction-level advisory lock if available. This will either obtain the lock immediately and return true, or return false without waiting if the lock cannot be acquired immediately.