在這一章之中,提供了一個如何使用 SQL 進行簡易操作的大致概念。這裡主要讓你有基本的認識,但無法提供 SQL 完整且巨細靡遺的說明。許多書籍詳細介紹了 SQL,例如「Understanding the New SQL. A complete quide.」及「A Guide to the SQL Standard. A user's guid to the standard database language SQL.」。你應該瞭解的是,一些 PostgreSQL 語法來自於標準 SQL 的延伸。
Internally, an LSN is a 64-bit integer, representing a byte position in the write-ahead log stream. It is printed as two hexadecimal numbers of up to 8 digits each, separated by a slash; for example, 16/B374D848. The pg_lsn type supports the standard comparison operators, like = and >. Two LSNs can be subtracted using the - operator; the result is the number of bytes separating those write-ahead log locations.
CREATE TABLE cities (
city varchar(80) primary key,
location point
);
CREATE TABLE weather (
city varchar(80) references cities(city),
temp_lo int,
temp_hi int,
prcp real,
date date
);
現在嘗試新增一筆不合理的資料:
INSERT INTO weather VALUES ('Berkeley', 45, 53, 0.0, '1994-11-28');
ERROR: insert or update on table "weather" violates foreign key constraint "weather_city_fkey"
DETAIL: Key (city)=(Berkeley) is not present in table "cities".
查詢語句可以加上「WHERE」來設定限制條件,以指定哪些列才需要被回傳。WHERE 的內容是一個布林(truth value)表示式,而只有在其運算值為真(true)時,該列才會被回傳。一般的布林運算子(AND, OR, NOT)都是被允許出現在表示式中的。舉例來說,下列的指令將會回傳 San Francisco 在雨天的天氣數值:
使用 LIMIT 時,運用 ORDER BY 子句將結果資料列限制為唯一順序非常重要。否則,你會得到一個不可預知的查詢資料列的子集。你可能會查詢第十到第二十個資料列,但是第十到第二十個資料列是按什麼順序排列的?次序是未知的,除非你指定 ORDER BY。
查詢最佳化在產生查詢計劃時會將 LIMIT 考慮在內,所以根據你給的 LIMIT 和 OFFSET,你很可能會得到不同的計劃(產生不同的資料列順序)。因此,使用不同的 LIMIT / OFFSET 值來選擇查詢結果的不同子集將導致不一致的結果,除非使用 ORDER BY 強制執行可預測的結果排序。這不是一個錯誤;這是一種事實上的結果,即 SQL 不保證以任何特定順序傳遞查詢的結果,除非使用 ORDER BY 來約束順序。
FROM table1是一種簡單的資料表表示式:它只讀取一個資料表。一般來說,資料表表示式可以是一般的資料表,交叉查詢和子查詢的複雜結構。但是,你也可以完全省略資料表表示式,並使用 SELECT 指令作為計算機:
8.10. 位元字串型別
Bit strings are strings of 1's and 0's. They can be used to store or visualize bit masks. There are two SQL bit types: bit(n) and bit varying(n), where n is a positive integer.
bit type data must match the length n exactly; it is an error to attempt to store shorter or longer bit strings. bit varying
The data type uuid stores Universally Unique Identifiers (UUID) as defined by RFC 4122, ISO/IEC 9834-8:2005, and related standards. (Some systems refer to this data type as a globally unique identifier, or GUID, instead.) This identifier is a 128-bit quantity that is generated by an algorithm chosen to make it very unlikely that the same identifier will be generated by anyone else in the known universe using the same algorithm. Therefore, for distributed systems, these identifiers provide a better uniqueness guarantee than sequence generators, which are only unique within a single database.
A UUID is written as a sequence of lower-case hexadecimal digits, in several groups separated by hyphens, specifically a group of 8 digits followed by three groups of 4 digits followed by a group of 12 digits, for a total of 32 digits representing the 128 bits. An example of a UUID in this standard form is:
PostgreSQL also accepts the following alternative forms for input: use of upper-case digits, the standard format surrounded by braces, omitting some or all hyphens, adding a hyphen after any group of four digits. Examples are:
Output is always in the standard form.
8.18. Domain Types
A domain is a user-defined data type that is based on another underlying type. Optionally, it can have constraints that restrict its valid values to a subset of what the underlying type would allow. Otherwise it behaves like the underlying type — for example, any operator or function that can be applied to the underlying type will work on the domain type. The underlying type can be any built-in or user-defined base type, enum type, array type, composite type, range type, or another domain.
For example, we could create a domain over integers that accepts only positive integers:
When an operator or function of the underlying type is applied to a domain value, the domain is automatically down-cast to the underlying type. Thus, for example, the result of mytable.id - 1 is considered to be of type integer not posint. We could write (mytable.id - 1)::posint
PostgreSQL provides storage and comparison functions for UUIDs, but the core database does not include any function for generating UUIDs, because no single algorithm is well suited for every application. The uuid-ossp module provides functions that implement several standard algorithms. The pgcrypto module also provides a generation function for random UUIDs. Alternatively, UUIDs could be generated by client applications or other libraries invoked through a server-side function.
, causing the domain's constraints to be rechecked. In this case, that would result in an error if the expression had been applied to an
id
value of 1. Assigning a value of the underlying type to a field or variable of the domain type is allowed without writing an explicit cast, but the domain's constraints will be checked.
CREATE TABLE capitals (
name text,
population real,
altitude int, -- (in ft)
state char(2)
);
CREATE TABLE non_capitals (
name text,
population real,
altitude int -- (in ft)
);
CREATE VIEW cities AS
SELECT name, population, altitude FROM capitals
UNION
SELECT name, population, altitude FROM non_capitals;
結果:
你可以將結果進行排序:
在這個例子之中,其次序並沒有完全地被指定,所以你可能會得到 San Francisco 的列以另一種次序呈現。而你如果以下列指令查詢的話,那你就會得到如上但固定的結果:
你可以在查詢時去除重覆的列:
再一次,其結果的次序可能每次都不同,你可以同時使用 DISTINCT 及 ORDER BY 來確保能得到一致性的查詢結果:
SELECT * FROM weather;
SELECT city, temp_lo, temp_hi, prcp, date FROM weather;
SELECT city, (temp_hi+temp_lo)/2 AS temp_avg, date FROM weather;
data is of variable length up to the maximum length
n
; longer strings will be rejected. Writing
bit
without a length is equivalent to
bit(1)
, while
bit varying
without a length specification means unlimited length.
Note
If one explicitly casts a bit-string value to bit(n), it will be truncated or zero-padded on the right to be exactly n bits, without raising an error. Similarly, if one explicitly casts a bit-string value to bit varying(n), it will be truncated on the right if it is more than n bits.
Refer to Section 4.1.2.5 for information about the syntax of bit string constants. Bit-logical operators and string manipulation functions are available; see Section 9.6.
Example 8.3. Using the Bit String Types
A bit string value requires 1 byte for each group of 8 bits, plus 5 or 8 bytes overhead depending on the length of the string (but long values may be compressed or moved out-of-line, as explained in Section 8.3 for character strings).
DELETE FROM products
WHERE obsoletion_date = 'today'
RETURNING *;
CREATE TABLE cities (
name text,
population real,
altitude int -- (in ft)
);
CREATE TABLE capitals (
state char(2)
) INHERITS (cities);
SELECT name, altitude
FROM cities
WHERE altitude > 500;
name | altitude
-----------+----------
Las Vegas | 2174
Mariposa | 1953
Madison | 845
(3 rows)
SELECT name, altitude
FROM ONLY cities
WHERE altitude
>
500;
name | altitude
-----------+----------
Las Vegas | 2174
Mariposa | 1953
(2 rows)
city | temp_lo | temp_hi | prcp | date
---------------+---------+---------+------+------------
San Francisco | 46 | 50 | 0.25 | 1994-11-27
San Francisco | 43 | 57 | 0 | 1994-11-29
Hayward | 37 | 54 | | 1994-11-29
(3 rows)
city | temp_avg | date
---------------+----------+------------
San Francisco | 48 | 1994-11-27
San Francisco | 50 | 1994-11-29
Hayward | 45 | 1994-11-29
(3 rows)
SELECT * FROM weather
WHERE city = 'San Francisco' AND prcp > 0.0;
city | temp_lo | temp_hi | prcp | date
---------------+---------+---------+------+------------
San Francisco | 46 | 50 | 0.25 | 1994-11-27
(1 row)
SELECT * FROM weather
ORDER BY city;
city | temp_lo | temp_hi | prcp | date
---------------+---------+---------+------+------------
Hayward | 37 | 54 | | 1994-11-29
San Francisco | 43 | 57 | 0 | 1994-11-29
San Francisco | 46 | 50 | 0.25 | 1994-11-27
SELECT * FROM weather
ORDER BY city, temp_lo;
SELECT DISTINCT city
FROM weather;
city
---------------
Hayward
San Francisco
(2 rows)
CREATE TABLE test (a BIT(3), b BIT VARYING(5));
INSERT INTO test VALUES (B'101', B'00');
INSERT INTO test VALUES (B'10', B'101');
ERROR: bit string length 2 does not match type bit(3)
INSERT INTO test VALUES (B'10'::bit(3), B'101');
SELECT * FROM test;
a | b
-----+-----
101 | 00
100 | 101
INSERT INTO weather (date, city, temp_hi, temp_lo)
VALUES ('1994-11-29', 'Hayward', 54, 37);
POSTGRES 專案是由 Michael Stonebraker 教授領導的團隊進行研發,由美國國防高等研究計劃署(DARPA, Defense Advanced Research Projects Agency)、美國陸軍研究辦公室(ARO, the Army Research Office)、美國國家科學基金會(NSF, the National Science Foundation)及美國電磁系統實驗室(ESL, Inc)所贊助。POSTGRES 專案始於 1986 年,最原始的設計,"The design of POSTGRES",作為開端,其最初的資料結構模型則揭露於"The POSTGRES data model"。規則系統設計發表於"The design of the POSTGRES rules system",而當時的關連式資料儲存的架構則刊載於"The design of the POSTGRES storage system"。
舉例來說,某個銀行資料庫存放著各個客戶的存款資訊,也存放著分行的存款總額資訊。假設我們想要轉帳 $100.00,從 Alice 的帳號轉到 Bob 的帳戶。可以很直觀地依敘述,直接以下列指令執行:
這些指令的細節在這裡並不重要,重要的是,有好幾個更新資料的動作要被執行。我們銀行的營業員需要保證所有的更新資料都要完成,或是保持原樣。如果因為系統錯誤,而造成 Bob 收到 $100.00,但 Alice 卻沒有轉出金額,就不是應該發生的事。又或是 Alice 轉出了現金,而 Bob 卻沒有轉入金額,她也不會是開心的客戶。我們需要具有保證交易安全的方法,也就是如果在執行過程中,有部份出了錯,那麼即使是已經執行的部份,也不會對資料庫產生影響。把這些更新資料的指令,包裝在一個交易之中,就是這個保證交易安全的方法。這樣的交易稱作為 atomic:從其他的交易的角度來看,整個行為只有完全執行,亦或是什麼都沒有做,兩種結果而已。
我們也希望有某個保證是,一旦某個交易被完成了,那麼會由資料庫系統發出通知,使它確實是永久性的資料,即使發生短暫的當機之後,資料也不會遺失。舉例來說,如果我們正在進行 Bob 的提款系統操作行為,在他走出銀行大門之後,我們不要有任何可能性使他的提款記錄消失。一個具備交易安全的資料庫,會將這裡交易裡的更新行為,在它們被回報完成之前,都記錄在長效型儲存裝置上(也就是磁碟機)。
交易安全資料庫的另一個重要性質是, atomic update 的概念:當多個交易同時在進行時,每一個交易都不能夠看到其他交易未完成交易的資料狀態。舉個例子,如果某個交易正在進行總計所有分行的餘額,它不會只包含 Alice 的分行的提款,或不計算 Bob 的分行的存款,反之亦然。所以交易必須是全有全無的結果,而不只是資料庫資料的永久性,還包含了交易執行過程的可視性。一個未完成的交易直到完全完成之前,其間資料的改變,對其他的交易而言都看不見;而當交易完成的同時,資料的改變也同時全部呈現出來。
在 PostgreSQL 中,所謂的交易,是以 SQL 的 BEGIN 及 COMMIT 兩個指令相夾的過程。
所以我們前述的銀行交易實際上會像這樣:
如果在交易的過程之中,我們決定不要完成交易(也許我們發現 Alice 的帳戶餘額不足),我們可以使用 ROLLBACK 指令來取代 COMMIT,那麼所有資料的變更都會取消。
PostgreSQL 一般將每一個 SQL 指令都視為一個交易來執行。如果你並沒有使用 BEGIN 指令,那麼每一個個別的指令就會隱含 BEGIN 先行,然後如果成功的話,COMMIT 也自動執行。一系列被 BEGIN 和 COMMIT 包夾的區域,有時候就稱為交易區塊。
注意
有一些用戶端程式會自動加入執行 BEGIN 及 COMMIT 指令,使得你不需要要求就獲得交易區塊的效果。請詳閱你所所用的工具文件。
這裡很重要的是,瞭解 SQL 中 WHERE 和 HAVING 之間的行為。其根本上的差異是:WHERE 會在合併和彙總計算之前進行選擇資料的動作(也就是它控制著,哪些資料需要被彙總計算);而 HAVING 是在合併及彙整計算之後,才進行過濾資料的動作。所以,在 WHERE 條件式當中,絕不可以使用彙整運算式;另一方面,HAVING 條件式總是使用彙整運算式。(嚴格來說,你也可以不在 HAVING 條件式中使用彙整運算式,但很少人這樣使用,通常就會改寫到 WHERE 條式件當中,那會更有效率。)
在先前的例子當中,我們可以把城市名稱的限制放在 WHERE 條件式之中,因為它不需要彙總。這將會比放在 HAVING 條件式中更有效率,因為這樣可以避免合併及彙整運算整個表格,不用浪費時間在本來就會被過濾掉的資料上。
如你所見,欄位的表示式可以引用資料列中現有的內容。我們還遺漏了 WHERE 子句。如果省略的話,則意味著資料表中的所有資料列都會被更新。如果存在的話,則只有更新符合 WHERE 條件的那些資料列。請注意,SET 子句中的等號是一個賦值運算,而 WHERE 子句中的等號是比較運算,但這不會造成任何誤解。當然,WHERE 條件不一定是等號運算。 還有許多其他的運算子可以使用(詳見第 9 章)。但是表示式需要能產生為布林運算的結果。
有很多不同種類的權限:SELECT、INSERT、UPDATE、DELETE、TRUNCATE、REFERENCES、TRIGGER、CREATE、CONNECT、TEMPORARY、EXECUTE、USAGE。這些權限對於不同物件的效果,會因為是哪一種物件而有所差別(表格、函式...等等)。要瞭解完整在 PostgreSQL 中所支援的各種物件權限,請參考 GRANT 語法頁面。這裡的內容主要說明如何使用。
修改和移除一個資料庫物件,是只有擁有者才具備的權力。
要把一個物件被指派給一個新的擁有者的話,使用該物件的 ALTER 指令,例如:ALTER TABLE。超級使用者也可以做指派的動作;原來的擁有者如果它仍是該物件的管理群組一員的話,當然也可以;再來就管理群組新的成員。
要進行授權行為的話,請使用 GRANT 指令。舉例來說,如果 joe 是一個使用者,而 accounts 是一個表格,要讓他可以獲得更新表格資料的權力:
SELECT select_list
FROM table_expression
ORDER BY sort_expression1 [ASC | DESC] [NULLS { FIRST | LAST }]
[, sort_expression2 [ASC | DESC] [NULLS { FIRST | LAST }] ...]
DROP TABLE products;
ERROR: cannot drop table products because other objects depend on it
DETAIL: constraint orders_product_no_fkey on table orders depends on table products
HINT: Use DROP ... CASCADE to drop the dependent objects too.
這個錯誤訊息包含了很有用的指引:如果你不想要一個個處理其相依關連性,那可以一次刪去他們:
如此所有相依的物件就會被刪除了,所有相互依存的物件都會,是遞迴式的處理流程。在這個例子中,它不會移除訂單資料表,只會移除外部索引鍵的限制條件,因為沒有其他物件與該外部索引鍵相依。(如果你要確認 DROP ... CASCADE 會處理哪些物件,你可以用 DETAIL 取代 CASCADE,就會輸出其相依的物件。)
UPDATE accounts SET balance = balance - 100.00
WHERE name = 'Alice';
UPDATE branches SET balance = balance - 100.00
WHERE name = (SELECT branch_name FROM accounts WHERE name = 'Alice');
UPDATE accounts SET balance = balance + 100.00
WHERE name = 'Bob';
UPDATE branches SET balance = balance + 100.00
WHERE name = (SELECT branch_name FROM accounts WHERE name = 'Bob');
SELECT city FROM weather WHERE temp_lo = max(temp_lo);
WRONG
SELECT city FROM weather
WHERE temp_lo = (SELECT max(temp_lo) FROM weather);
city
---------------
San Francisco
(1 row)
SELECT city, max(temp_lo)
FROM weather
GROUP BY city;
city | max
---------------+-----
Hayward | 37
San Francisco | 46
(2 rows)
=> SELECT * FROM (VALUES (1, 'one'), (2, 'two'), (3, 'three')) AS t (num,letter);
num | letter
-----+--------
1 | one
2 | two
3 | three
(3 rows)
SELECT select_list FROM table_expression
DROP TABLE products CASCADE;
CREATE TABLE people (
...,
height_cm numeric,
height_in numeric GENERATED ALWAYS AS (height_cm / 2.54) STORED
);
OID 是一個 32 位元的數字,以 cluster 為單位配發。在一個大型或長期使用的資料庫中,是有可能出現重覆的情況。所以,假設 OID 是唯一的識別是不正確的觀念,除非你還有搭配其他方法來確保唯一性。如果你需要識別表格中的資料列的話,使用序列數產生器是比較建議的作法。OID 也可以這樣用來得到一些額外的預防性功能:
唯一性的限制應該設定在 OID 欄位上,來確保每一個 OID 可以識別每一個資料列。當有唯一性限制存在的時候,對於已經存在的資料列就不會有重覆的 OID。(當然,這方法只能用於資料筆數在 40 億筆以下的表格。不過實務上的表格多數都少於這個數目,而且太多資料的話,效果也會變得很差。)
OID 在多個表格間就不能假設為是唯一,你應該搭配 tableoid 來識別資料庫層級的唯一性。
當然,在建立表格時必須要加入 WITH OIDS 語法。在 PostgreSQL 8.1 之前,WITHOUT OIDS 是預設值。
SELECT a + b AS sum, c FROM table1 ORDER BY sum;
SELECT a, max(b) FROM table1 GROUP BY a ORDER BY 1;
SELECT a + b AS sum, c FROM table1 ORDER BY sum + c; -- 錯誤
CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow',
'green', 'blue', 'purple');
CREATE TABLE my_colors (color rainbow, note text);
CREATE FUNCTION get_color_note (rainbow) RETURNS text AS
'SELECT note FROM my_colors WHERE color = $1'
LANGUAGE SQL;
這裡表示式是一個任意的運算表示式,對所有資料列進行求值運算。所有表示式相等的一組資料列被認為是重複的,並且只有該組的第一個資料列會被保留在輸出中。請注意,集合中的「第一行」是不可預知的,除非查詢按足夠的欄位進行排序,以保證進到 DISTINCT 過濾器的資料列是唯一排序。(在 ORDER BY 排序後才進行 DISTINCT ON 處理。)
DISTINCT ON 子句不是SQL標準的一部分,有時被認為是不好的樣式,因為其結果有潛在的不確定性。透過在 FROM 中智慧地使用 GROUP BY 和子查詢,可以避免這種結構,但這卻往往是最方便的選擇。
CREATE TABLE weather (
city varchar(80),
temp_lo int, -- low temperature
temp_hi int, -- high temperature
prcp real, -- precipitation
date date
);
CREATE TABLE cities (
name varchar(80),
location point
);
DROP TABLE tablename;
BEGIN;
UPDATE accounts SET balance = balance - 100.00
WHERE name = 'Alice';
-- etc etc
COMMIT;
BEGIN;
UPDATE accounts SET balance = balance - 100.00
WHERE name = 'Alice';
SAVEPOINT my_savepoint;
UPDATE accounts SET balance = balance + 100.00
WHERE name = 'Bob';
-- oops ... forget that and use Wally's account
ROLLBACK TO my_savepoint;
UPDATE accounts SET balance = balance + 100.00
WHERE name = 'Wally';
COMMIT;
psql (12.4)
Type "help" for help.
mydb=>
mydb=> SELECT version();
version
------------------------------------------------------------------------------------------
PostgreSQL 12.4 on x86_64-pc-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit
(1 row)
mydb=> SELECT current_date;
date
------------
2016-01-07
(1 row)
mydb=> SELECT 2 + 2;
?column?
----------
4
(1 row)
mydb=> \h
mydb=> \q
SELECT city, max(temp_lo)
FROM weather
WHERE city LIKE 'S%' -- (1)
GROUP BY city
HAVING max(temp_lo) < 40;
UPDATE products SET price = price * 1.10;
UPDATE mytable SET a = 5, b = 3, c = 1 WHERE a > 0;
REVOKE ALL ON accounts FROM PUBLIC;
SELECT '52093.89'::money::numeric::float8;
CREATE FUNCTION concat_lower_or_upper(a text, b text, uppercase boolean DEFAULT false)
RETURNS text
AS
$$
SELECT CASE
WHEN $3 THEN UPPER($1 || ' ' || $2)
ELSE LOWER($1 || ' ' || $2)
END;
$$
LANGUAGE SQL IMMUTABLE STRICT;
SELECT concat_lower_or_upper('Hello', 'World', true);
concat_lower_or_upper
-----------------------
HELLO WORLD
(1 row)
SELECT concat_lower_or_upper('Hello', 'World');
concat_lower_or_upper
-----------------------
hello world
(1 row)
SELECT concat_lower_or_upper(a => 'Hello', b => 'World');
concat_lower_or_upper
-----------------------
hello world
(1 row)
SELECT concat_lower_or_upper(a => 'Hello', b => 'World', uppercase => true);
concat_lower_or_upper
-----------------------
HELLO WORLD
(1 row)
SELECT concat_lower_or_upper(a => 'Hello', uppercase => true, b => 'World');
concat_lower_or_upper
-----------------------
HELLO WORLD
(1 row)
SELECT concat_lower_or_upper(a := 'Hello', uppercase := true, b := 'World');
concat_lower_or_upper
-----------------------
HELLO WORLD
(1 row)
INSERT INTO products (product_no, name, price)
SELECT product_no, name, price FROM new_products
WHERE release_date = 'today';
SELECT a, b, c FROM ...
SELECT tbl1.a, tbl2.a, tbl1.b FROM ...
SELECT tbl1.*, tbl2.a FROM ...
SELECT a AS value, b + c AS sum FROM ...
SELECT a value, b + c AS sum FROM ...
SELECT a "value", b + c AS sum FROM ...
SELECT DISTINCT select_list ...
SELECT DISTINCT ON (expression [, expression ...]) select_list ...
createdb: command not found
$ /usr/local/pgsql/bin/createdb mydb
createdb: could not connect to database postgres: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
createdb: could not connect to database postgres: FATAL: role "joe" does not exist
createdb: database creation failed: ERROR: permission denied to create database
SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary;
前面三個欄位是由資料表 empsalary 直接取得,每一個資料列就是該資料表的每一個資料列列。而第四個欄位則呈現整個資料表中,與其 depname 相同的平均薪資。(這實際上就是由非窗函數的 avg 彙總而得,只是 OVER 修飾字讓它成為窗函數,透過「窗」的可見範圍做計算。)
窗函數都會使用 OVER 修飾字,然後緊接著窗函數及其參數。這是在語法上使其有別於一般函數或非窗函數的彙總。OVER 區段需要確切指出如何分組要被窗函數計算的資料列。PARTITION BY 在 OVER 中,意思是要以 PARTITION BY 之後的表示式來分組或拆分資料列的資料。對於每一個資料列而言,窗函數的結果是,透過所有和該資料列相同分組的資料,共同運算而得。
你也可以控制列被窗函數處理的次序,透過在 OVER 中加入 ORDER BY。(窗內的 ORDER BY 不見得需要對應到資料列輸出的次序)例子如下:
如上所示,rank 函數為每個有使用 ORDER BY 的分組,標記一系列數字的次序。rank 不需要特定的參數,因為它標記的範圍一定是整個 OVER 所涵蓋定的範圍。
窗函數所計算的範圍,是一個虛擬資料表的概念,是由 WHERE、GROUP BY、HAVING、或其他方式虛擬出來的。舉例來說,當某個資料列被 WHERE 過濾掉時,它也不會被任何窗函數看見。一個查詢中可以包含多個窗函數,透過不同 OVER 修飾字的指定,將資料做不同觀點的處理。但他們都會在一個相同的虛擬資料表中進行處理。
我們已經瞭解如果次序不重要的話, ORDER BY 可以被省略;且如果所有的資料列都只區分成一組的話,其實 PARITION BY 也可以省略。
還有另一個窗函數相關的重要概念:對於每一個資料列來說,它會在分組中還有個分組,另稱作窗框(window frame),有一些窗函數只對窗框裡的資料列進行處理,而不是整個分組。預設的情況是,如果 ORDER BY 被指定了,以 ORDER BY 排序後,那麼窗框的範圍就是從分組的第一列到該列為止,而在那之後資料列的值都會相同。當 ORDER BY 被省略的時候,預設窗框的範圍就是整個分組。下面是使用 sum 的例子:
上面可以看到,因為在 OVER 裡面沒有 ORDER BY,窗框就等於整個分組,甚至因為沒有 PARTITION BY,所以等於整個資料表。換句話說,每一個資料列總和都是整個資料表的總計,所以我們在每一個資料列中都得到相同的結果。但如果我們加入了 ORDER BY 之後,結果將會不同:
這裡的總和就是從第一筆(最小),加計到每一列,包含薪資相同的每一列(注意薪資相同的)。
窗函數只允許出現在 SELECT 的輸出列表及 ORDER BY 子句裡,在其他地方都是被禁止的,像是 GROUP BY,HAVING,WHERE等區段。這是因為窗函數在邏輯上,都是在他們處理完之後才進一步處理資料的。也就是說,窗函數是在非窗函數之後才執行的。這意指在窗函數中使用非窗函數是可以的,但反過來就不行了。
如果有一個需要在窗函數處理完再進行過濾或分組的查詢的話,你可以使用子查詢。舉列來說:
上面的查詢只會顯示內層查詢的次序(rank)小於 3 的資料。
當一個查詢使用了多個窗函數的話,它就會分別使用 OVER 子句來描述,但如果相同的分組方式要被多個函數所引用的話,就重覆了,也容易出錯。這種情況可以使用 WINDOW 子句來取一個別名,來取代 OVER。舉個例子:
更多窗函數的細節可以參閱 、、、及 的說明頁。
2.6. 交叉查詢
到目前為止,我們的一個查詢都只涉及到一個資料表。其實可以在同一個查詢中,同時查詢多個資料表,或者在同一個資料表之中同時處理多個資料列的資料。在一個查詢之中,涉及到同一個或多個不同的資料表中的資料,稱作為交叉查詢(join)。舉個例子來說,你希望同時列出天氣和城市位置的資料。要完成這項工作,我們需要關連資料表 weather 中的 city 欄位與表格 cities 中的 name 欄位,然後回傳符合條件的資料。
[Example 8-2]("DATATYPE-BOOLEAN-EXAMPLE") 使用字母 t 和 f,來顯示布林型別的輸出。
Example 8-2. 使用布林型別
8.7. 列舉型別
Enumerated (enum) types are data types that comprise a static, ordered set of values. They are equivalent to the enum types supported in a number of programming languages. An example of an enum type might be the days of the week, or a set of status values for a piece of data.
8.7.1. Declaration of Enumerated Types
Enum types are created using the CREATE TYPE command, for example:
Once created, the enum type can be used in table and function definitions much like any other type:
8.7.2. Ordering
The ordering of the values in an enum type is the order in which the values were listed when the type was created. All standard comparison operators and related aggregate functions are supported for enums. For example:
8.7.3. Type Safety
Each enumerated data type is separate and cannot be compared with other enumerated types. See this example:
If you really need to do something like that, you can either write a custom operator or add explicit casts to your query:
8.7.4. Implementation Details
Enum labels are case sensitive, so 'happy' is not the same as 'HAPPY'. White space in the labels is significant too.
Although enum types are primarily intended for static sets of values, there is support for adding new values to an existing enum type, and for renaming values (see ). Existing values cannot be removed from an enum type, nor can the sort ordering of such values be changed, short of dropping and re-creating the enum type.
An enum value occupies four bytes on disk. The length of an enum value's textual label is limited by the NAMEDATALEN setting compiled into PostgreSQL; in standard builds this means at most 63 bytes.
The translations from internal enum values to textual labels are kept in the system catalog . Querying this catalog directly can be useful.
8.13. XML 型別
The xml data type can be used to store XML data. Its advantage over storing XML data in a text field is that it checks the input values for well-formedness, and there are support functions to perform type-safe operations on it; see Section 9.14. Use of this data type requires the installation to have been built with configure --with-libxml.
The xml type can store well-formed “documents”, as defined by the XML standard, as well as “content” fragments, which are defined by reference to the more permissive “document node” of the XQuery and XPath data model. Roughly, this means that content fragments can have more than one top-level element or character node. The expression xmlvalue IS DOCUMENT can be used to evaluate whether a particular xml value is a full document or only a content fragment.
Limits and compatibility notes for the xml data type can be found in .
8.13.1. Creating XML Values
To produce a value of type xml from character data, use the function xmlparse:
Examples:
While this is the only way to convert character strings into XML values according to the SQL standard, the PostgreSQL-specific syntaxes:
can also be used.
The xml type does not validate input values against a document type declaration (DTD), even when the input value specifies a DTD. There is also currently no built-in support for validating against other XML schema languages such as XML Schema.
The inverse operation, producing a character string value from xml, uses the function xmlserialize:
type can be character, character varying, or text (or an alias for one of those). Again, according to the SQL standard, this is the only way to convert between type xml and character types, but PostgreSQL also allows you to simply cast the value.
When a character string value is cast to or from type xml without going through XMLPARSE or XMLSERIALIZE, respectively, the choice of DOCUMENT versus CONTENT is determined by the “XML option” session configuration parameter, which can be set using the standard command:
or the more PostgreSQL-like syntax
The default is CONTENT, so all forms of XML data are allowed.
8.13.2. Encoding Handling
Care must be taken when dealing with multiple character encodings on the client, server, and in the XML data passed through them. When using the text mode to pass queries to the server and query results to the client (which is the normal mode), PostgreSQL converts all character data passed between the client and the server and vice versa to the character encoding of the respective end; see . This includes string representations of XML values, such as in the above examples. This would ordinarily mean that encoding declarations contained in XML data can become invalid as the character data is converted to other encodings while traveling between client and server, because the embedded encoding declaration is not changed. To cope with this behavior, encoding declarations contained in character strings presented for input to the xml type are ignored, and content is assumed to be in the current server encoding. Consequently, for correct processing, character strings of XML data must be sent from the client in the current client encoding. It is the responsibility of the client to either convert documents to the current client encoding before sending them to the server, or to adjust the client encoding appropriately. On output, values of type xml will not have an encoding declaration, and clients should assume all data is in the current client encoding.
When using binary mode to pass query parameters to the server and query results back to the client, no encoding conversion is performed, so the situation is different. In this case, an encoding declaration in the XML data will be observed, and if it is absent, the data will be assumed to be in UTF-8 (as required by the XML standard; note that PostgreSQL does not support UTF-16). On output, data will have an encoding declaration specifying the client encoding, unless the client encoding is UTF-8, in which case it will be omitted.
Needless to say, processing XML data with PostgreSQL will be less error-prone and more efficient if the XML data encoding, client encoding, and server encoding are the same. Since XML data is internally processed in UTF-8, computations will be most efficient if the server encoding is also UTF-8.
Caution
Some XML-related functions may not work at all on non-ASCII data when the server encoding is not UTF-8. This is known to be an issue for xmltable() and xpath() in particular.
8.13.3. Accessing XML Values
The xml data type is unusual in that it does not provide any comparison operators. This is because there is no well-defined and universally useful comparison algorithm for XML data. One consequence of this is that you cannot retrieve rows by comparing an xml column against a search value. XML values should therefore typically be accompanied by a separate key field such as an ID. An alternative solution for comparing XML values is to convert them to character strings first, but note that character string comparison has little to do with a useful XML comparison method.
Since there are no comparison operators for the xml data type, it is not possible to create an index directly on a column of this type. If speedy searches in XML data are desired, possible workarounds include casting the expression to a character string type and indexing that, or indexing an XPath expression. Of course, the actual query would have to be adjusted to search by the indexed expression.
The text-search functionality in PostgreSQL can also be used to speed up full-document searches of XML data. The necessary preprocessing support is, however, not yet available in the PostgreSQL distribution.
city | temp_lo | temp_hi | prcp | date | name | location
---------------+---------+---------+------+------------+---------------+-----------
San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
San Francisco | 43 | 57 | 0 | 1994-11-29 | San Francisco | (-194,53)
(2 rows)
SELECT depname, empno, salary, enroll_date
FROM
(SELECT depname, empno, salary, enroll_date,
rank() OVER (PARTITION BY depname ORDER BY salary DESC, empno) AS pos
FROM empsalary
) AS ss
WHERE pos < 3;
SELECT sum(salary) OVER w, avg(salary) OVER w
FROM empsalary
WINDOW w AS (PARTITION BY depname ORDER BY salary DESC);
SELECT city, temp_lo, temp_hi, prcp, date, location
FROM weather, cities
WHERE city = name;
SELECT weather.city, weather.temp_lo, weather.temp_hi,
weather.prcp, weather.date, cities.location
FROM weather, cities
WHERE cities.name = weather.city;
SELECT *
FROM weather INNER JOIN cities ON (weather.city = cities.name);
SELECT *
FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name);
city | temp_lo | temp_hi | prcp | date | name | location
---------------+---------+---------+------+------------+---------------+-----------
Hayward | 37 | 54 | | 1994-11-29 | |
San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
San Francisco | 43 | 57 | 0 | 1994-11-29 | San Francisco | (-194,53)
(3 rows)
SELECT W1.city, W1.temp_lo AS low, W1.temp_hi AS high,
W2.city, W2.temp_lo AS low, W2.temp_hi AS high
FROM weather W1, weather W2
WHERE W1.temp_lo < W2.temp_lo
AND W1.temp_hi > W2.temp_hi;
city | low | high | city | low | high
---------------+-----+------+---------------+-----+------
San Francisco | 43 | 57 | San Francisco | 46 | 50
Hayward | 37 | 54 | San Francisco | 46 | 50
(2 rows)
SELECT *
FROM weather w, cities c
WHERE w.city = c.name;
ALTER TABLE products ADD COLUMN description text;
ALTER TABLE products ADD COLUMN description text CHECK (description <> '');
ALTER TABLE products DROP COLUMN description;
ALTER TABLE products DROP COLUMN description CASCADE;
ALTER TABLE products ADD CHECK (name <> '');
ALTER TABLE products ADD CONSTRAINT some_name UNIQUE (product_no);
ALTER TABLE products ADD FOREIGN KEY (product_group_id) REFERENCES product_groups;
ALTER TABLE products ALTER COLUMN product_no SET NOT NULL;
ALTER TABLE products DROP CONSTRAINT some_name;
ALTER TABLE products ALTER COLUMN product_no DROP NOT NULL;
ALTER TABLE products ALTER COLUMN price SET DEFAULT 7.77;
ALTER TABLE products ALTER COLUMN price DROP DEFAULT;
ALTER TABLE products ALTER COLUMN price TYPE numeric(10,2);
ALTER TABLE products RENAME COLUMN product_no TO product_number;
ALTER TABLE products RENAME TO items;
CREATE TABLE test1 (a boolean, b text);
INSERT INTO test1 VALUES (TRUE, 'sic est');
INSERT INTO test1 VALUES (FALSE, 'non est');
SELECT * FROM test1;
a | b
---+---------
t | sic est
f | non est
SELECT * FROM test1 WHERE a;
a | b
---+---------
t | sic est
CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
CREATE TABLE person (
name text,
current_mood mood
);
INSERT INTO person VALUES ('Moe', 'happy');
SELECT * FROM person WHERE current_mood = 'happy';
name | current_mood
------+--------------
Moe | happy
(1 row)
INSERT INTO person VALUES ('Larry', 'sad');
INSERT INTO person VALUES ('Curly', 'ok');
SELECT * FROM person WHERE current_mood > 'sad';
name | current_mood
-------+--------------
Moe | happy
Curly | ok
(2 rows)
SELECT * FROM person WHERE current_mood > 'sad' ORDER BY current_mood;
name | current_mood
-------+--------------
Curly | ok
Moe | happy
(2 rows)
SELECT name
FROM person
WHERE current_mood = (SELECT MIN(current_mood) FROM person);
name
-------
Larry
(1 row)
CREATE TYPE happiness AS ENUM ('happy', 'very happy', 'ecstatic');
CREATE TABLE holidays (
num_weeks integer,
happiness happiness
);
INSERT INTO holidays(num_weeks,happiness) VALUES (4, 'happy');
INSERT INTO holidays(num_weeks,happiness) VALUES (6, 'very happy');
INSERT INTO holidays(num_weeks,happiness) VALUES (8, 'ecstatic');
INSERT INTO holidays(num_weeks,happiness) VALUES (2, 'sad');
ERROR: invalid input value for enum happiness: "sad"
SELECT person.name, holidays.num_weeks FROM person, holidays
WHERE person.current_mood = holidays.happiness;
ERROR: operator does not exist: mood = happiness
SELECT person.name, holidays.num_weeks FROM person, holidays
WHERE person.current_mood::text = holidays.happiness::text;
name | num_weeks
------+-----------
Moe | 4
(1 row)
XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type )
SET XML OPTION { DOCUMENT | CONTENT };
SET xmloption TO { DOCUMENT | CONTENT };
any
Indicates that a function accepts any input data type.
anyelement
Indicates that a function accepts any data type (see ).
anyarray
Indicates that a function accepts any array data type (see ).
anynonarray
Indicates that a function accepts any non-array data type (see ).
anyenum
Indicates that a function accepts any enum data type (see and ).
anyrange
Indicates that a function accepts any range data type (see and ).
cstring
Indicates that a function accepts or returns a null-terminated C string.
internal
Indicates that a function accepts or returns a server-internal data type.
language_handler
A procedural language call handler is declared to return language_handler.
fdw_handler
A foreign-data wrapper handler is declared to return fdw_handler.
index_am_handler
An index access method handler is declared to return index_am_handler.
tsm_handler
A tablesample method handler is declared to return tsm_handler.
record
Identifies a function taking or returning an unspecified row type.
trigger
A trigger function is declared to return trigger.
event_trigger
An event trigger function is declared to return event_trigger.
pg_ddl_command
Identifies a representation of DDL commands that is available to event triggers.
void
Indicates that a function returns no value.
unknown
Identifies a not-yet-resolved type, e.g. of an undecorated string literal.
opaque
An obsolete type name that formerly served many of the above purposes.
Functions coded in C (whether built-in or dynamically loaded) can be declared to accept or return any of these pseudo data types. It is up to the function author to ensure that the function will behave safely when a pseudo-type is used as an argument type.
Functions coded in procedural languages can use pseudo-types only as allowed by their implementation languages. At present most procedural languages forbid use of a pseudo-type as an argument type, and allow only void and record as a result type (plus trigger or event_trigger when the function is used as a trigger or event trigger). Some also support polymorphic functions using the types anyelement, anyarray, anynonarray, anyenum, and anyrange.
The internal pseudo-type is used to declare functions that are meant only to be called internally by the database system, and not by direct invocation in an SQL query. If a function has at least one internal-type argument then it cannot be called from SQL. To preserve the type safety of this restriction it is important to follow this coding rule: do not create any function that is declared to return internal unless it has at least one internal argument.
在繼續之前,請檢查 TODO 列表和常見問題解答,看看您的錯誤是否已知。 如果您無法瞭解 TODO 列表中的資訊,請報告你的問題。我們至少可以做的是使 TODO 列表更清楚。
5.2. 回報內容
關於錯誤報告最重要的是陳述所有事實並且只有事實。不要對錯誤的可能性進行推測,比如用「似乎(it seemed to do)」字句進行描述或程式的哪個部分有故障。如果你不熟悉實作的方法,你可能會猜測錯誤,而無法幫助我們。即使你有知識性的解釋,也應該是對事實的補充而不是替代它們。如果我們要修復這個錯誤,我們還是首先要看到它發生。報告裸露的事實是相對簡單的(你可以從屏幕上複製貼上它們),但是經常重要的細節被忽略,因為有人認為這並不重要,或者認為報告被理解是應當的。
不幸由於垃圾郵件數量,所有上述電子郵件地址都是封閉的郵件列表。也就是說,您需要訂閱列表才能發布。(然而,您不需要訂閱使用錯誤報告 Web 表單。)如果你想發送郵件但不想接收郵件列表,您可以訂閱並設置您的訂閱選項為「nomail」。想得到更多訊息的話,請寄一封郵件到 <>,內文請輸入「help」一個單字,就會自動回覆你進一步的訊息。`
9.6. 二元字串函式及運算子
This section describes functions and operators for examining and manipulating bit strings, that is values of the types bit and bit varying. Aside from the usual comparison operators, the operators shown in can be used. Bit string operands of &, |, and # must be of equal length. When bit shifting, the original length of the string is preserved, as shown in the examples.
8.3. 字串型別
Table 8.4. Character Types
9.10. 列舉型別函式
For enum types (described in), there are several functions that allow cleaner programming without hard-coding particular values of an enum type. These are listed in. The examples assume an enum type created as:
Returns all values of the input enum type in an ordered array
enum_range(null::rainbow)
{red,orange,yellow,green,blue,purple}
enum_range(anyenum, anyenum)
Returns the range between the two given enum values, as an ordered array. The values must be from the same enum type. If the first parameter is null, the result will start with the first value of the enum type. If the second parameter is null, the result will end with the last value of the enum type.
enum_range('orange'::rainbow, 'green'::rainbow)
{orange,yellow,green}
enum_range(NULL, 'green'::rainbow)
{red,orange,yellow,green}
enum_range('orange'::rainbow, NULL)
{orange,yellow,green,blue,purple}
Notice that except for the two-argument form ofenum_range, these functions disregard the specific value passed to them; they care only about its declared data type. Either null or a specific value of the type can be passed, with the same result. It is more common to apply these functions to a table column or function argument than to a hardwired type name as suggested by the examples.
CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple');
Table 9.14. Bit String Operators
Operator
Description
Example
Result
`
The following SQL-standard functions work on bit strings as well as character strings: length, bit_length, octet_length, position, substring, overlay.
The following functions work on bit strings as well as binary strings: get_bit, set_bit. When working with a bit string, these functions number the first (leftmost) bit of the string as bit 0.
In addition, it is possible to cast integral values to and from type bit. Some examples:
Note that casting to just “bit” means casting to bit(1), and so will deliver only the least significant bit of the integer.
Note
Casting an integer to bit(n) copies the rightmost n bits. Casting an integer to a bit string width wider than the integer itself will sign-extend on the left.
SQL 定義了兩種主要字串型別:character varying(n) 和 character(n),其中 n 是正整數。這兩種型別都可以儲存長度最多為 n 個字元(不是位元組)的字串。嘗試將較長的字串儲存到這些型別的欄位中將産生錯誤,除非多餘的字元都是空格,在這種情況下,字串將被截斷為最大長度。(這個有點奇怪的異常是 SQL 標準所要求的。)如果要儲存的字串比宣告的長度短,則 character 型別的值將被空格填充;character varying 的值將只儲存較短的字串。
如果明確地將值轉換為 character varying(n) 或 character(n),則超長值將被截斷為 n 個字元而不會引發錯誤。(這也是 SQL 標準所要求的。)
型別 varchar(n) 和 char(n) 分別是 character varying(n) 和 character(n) 的別名。沒有長度的 character 等同於 character(1)。如果在沒有長度的情況下使用 character varying,則該型別接受任何長度的字串。後者是 PostgreSQL 延伸功能。
另外,PostgreSQL 提供了 text 型別,它儲存任意長度的字串。雖然型別 text 不在 SQL 標準中,但是其他幾個 SQL 資料庫管理系統也支援它。
character 的值用空格填充到指定的長度 n,並以這種方式儲存和顯示。但是,在比較兩個型別字串時,尾隨空格在語義上無關緊要會被忽略。在空格很重要的排序規則中,這種行為會產生意想不到的結果; 例如 SELECT 'a '::CHAR(2) collate "C"<E'a\n'::CHAR(2) 會回傳 true,即使 C 語言環境會認為空格大於換行符。將字串轉換為其他字串型別之一時,將刪除尾隨的空格。請注意,尾隨空格在 character varying 和 text 方面具有語義重要性,尤其在使用樣式匹配時,即 LIKE 和正規表示式。
短字串(126 個位元組以下)的儲存要求是 1 個位元組加上實際字串,其中包括字串空間填充。較長的字串有 4 個位元組的開銷而不是 1。長字串由系統自動壓縮,因此磁碟上的物理需求可能更少。非常長的值也儲存在後台的資料表中,這樣它們就不會干擾對較短欄位的快速存取。在任何情況下,可儲存的最長字串大約為 1 GB。(資料型別宣告中 n 允許的最大值小於此值。更改此值沒有用,因為使用多位元組字串編碼時,位元組數和字元數可能完全不同。如果您希望儲存沒有特定上限的長字串,使用不帶長度的 text 或 character varying,而不是隨便設定長度限制。)
小提醒
這三種型別之間並沒有效能差異,除了使用空白填充類型時增加的儲存空間之外,以及一些額外的 CPU 週期來檢查儲存長度與欄位中的長度。雖然 character(n) 在其他一些資料庫系統中具有效能優勢,但 PostgreSQL 中並沒有這樣的優勢;事實上,由於額外的儲存成本,character(n) 通常是三者中最慢的。在大多數情況下,應使用 text 或 character varying。
Object identifiers (OIDs) are used internally by PostgreSQL as primary keys for various system tables. Type oid represents an object identifier. There are also several alias types for oid: regproc, regprocedure, regoper, regoperator, regclass, regtype, regrole, regnamespace, regconfig, and regdictionary. shows an overview.
The oid type is currently implemented as an unsigned four-byte integer. Therefore, it is not large enough to provide database-wide uniqueness in large databases, or even in large individual tables.
The oid type itself has few operations beyond comparison. It can be cast to integer, however, and then manipulated using the standard integer operators. (Beware of possible signed-versus-unsigned confusion if you do this.)
The OID alias types have no operations of their own except for specialized input and output routines. These routines are able to accept and display symbolic names for system objects, rather than the raw numeric value that type oid would use. The alias types allow simplified lookup of OID values for objects. For example, to examine the pg_attribute rows related to a table mytable, one could write:
rather than:
While that doesn't look all that bad by itself, it's still oversimplified. A far more complicated sub-select would be needed to select the right OID if there are multiple tables named mytable in different schemas. The regclass input converter handles the table lookup according to the schema path setting, and so it does the “right thing” automatically. Similarly, casting a table's OID to regclass is handy for symbolic display of a numeric OID.
Table 8.26. Object Identifier Types
All of the OID alias types for objects grouped by namespace accept schema-qualified names, and will display schema-qualified names on output if the object would not be found in the current search path without being qualified. The regproc and regoper alias types will only accept input names that are unique (not overloaded), so they are of limited use; for most uses regprocedure or regoperator are more appropriate. For regoperator, unary operators are identified by writing NONE for the unused operand.
An additional property of most of the OID alias types is the creation of dependencies. If a constant of one of these types appears in a stored expression (such as a column default expression or view), it creates a dependency on the referenced object. For example, if a column has a default expression nextval('my_seq'::regclass), PostgreSQL understands that the default expression depends on the sequence my_seq; the system will not let the sequence be dropped without first removing the default expression. regrole is the only exception for the property. Constants of this type are not allowed in such expressions.
Note
The OID alias types do not completely follow transaction isolation rules. The planner also treats them as simple constants, which may result in sub-optimal planning.
Another identifier type used by the system is xid, or transaction (abbreviated xact) identifier. This is the data type of the system columns xmin and xmax. Transaction identifiers are 32-bit quantities.
A third identifier type used by the system is cid, or command identifier. This is the data type of the system columns cmin and cmax. Command identifiers are also 32-bit quantities.
A final identifier type used by the system is tid, or tuple identifier (row identifier). This is the data type of the system column ctid. A tuple ID is a pair (block number, tuple index within block) that identifies the physical location of the row within its table.
Geometric data types represent two-dimensional spatial objects. shows the geometric types available in PostgreSQL.
Table 8.20. Geometric Types
44::bit(10) 0000101100
44::bit(3) 100
cast(-44 as bit(12)) 111111010100
'1110'::bit(4)::integer 14
CREATE TABLE test1 (a character(4));
INSERT INTO test1 VALUES ('ok');
SELECT a, char_length(a) FROM test1; -- (1)
a | char_length
------+-------------
ok | 2
CREATE TABLE test2 (b varchar(5));
INSERT INTO test2 VALUES ('ok');
INSERT INTO test2 VALUES ('good ');
INSERT INTO test2 VALUES ('too long');
ERROR: value too long for type character varying(5)
INSERT INTO test2 VALUES ('too long'::varchar(5)); -- explicit truncation
SELECT b, char_length(b) FROM test2;
b | char_length
-------+-------------
ok | 2
good | 5
tool | 5
資料表的繼承一般來說是在子資料表建立時進行的,也就是在 CREATE TABLE 中使用 INHERITES 子句。然而,資料表也可以在 ALTER TABLE 中使用 INHERIT 子句來新增新的父資料表。要進行這個動作,新的子資料表必須已經包含所有父資料表的欄位—相同的欄位名稱及資料型別。還有在 ALTER TABLE 時加入 INHERIT 子句來移除某個欄位的繼承。動態地新增或移除繼承欄位通常是在應用分割表格(table partitioning)時特別好用(請參閱 5.10 節)。
還有一個方便的方式去建立一個相容於之後繼承的資料表,就是在 CREATE TABLE 中使用 LIKE 子句。這個方式會建立一個新的資料表,其欄位和另一個資料表完全相同。如果有任何 CHECK 子句的限制條件的話,就應該在 LIKE 子句中加入 INCLUDING CONSTRAINTS 選項,這樣就會和父資料表完全相容了。
A rich set of functions and operators is available to perform various geometric operations such as scaling, translation, rotation, and determining intersections. They are explained in Section 9.11.
8.8.1. Points
Points are the fundamental two-dimensional building block for geometric types. Values of type point are specified using either of the following syntaxes:
where x and y are the respective coordinates, as floating-point numbers.
Points are output using the first syntax.
8.8.2. Lines
Lines are represented by the linear equation A_x +By +C= 0, whereAandB_ are not both zero. Values of type line are input and output in the following form:
Alternatively, any of the following forms can be used for input:
where (x1,y1) and (x2,y2) are two different points on the line.
8.8.3. Line Segments
Line segments are represented by pairs of points that are the endpoints of the segment. Values of type lseg are specified using any of the following syntaxes:
where (x1,y1) and (x2,y2) are the end points of the line segment.
Line segments are output using the first syntax.
8.8.4. Boxes
Boxes are represented by pairs of points that are opposite corners of the box. Values of type box are specified using any of the following syntaxes:
where (x1,y1) and (x2,y2) are any two opposite corners of the box.
Boxes are output using the second syntax.
Any two opposite corners can be supplied on input, but the values will be reordered as needed to store the upper right and lower left corners, in that order.
8.8.5. Paths
Paths are represented by lists of connected points. Paths can be open, where the first and last points in the list are considered not connected, or closed, where the first and last points are considered connected.
Values of type path are specified using any of the following syntaxes:
where the points are the end points of the line segments comprising the path. Square brackets ([]) indicate an open path, while parentheses (()) indicate a closed path. When the outermost parentheses are omitted, as in the third through fifth syntaxes, a closed path is assumed.
Paths are output using the first or second syntax, as appropriate.
8.8.6. Polygons
Polygons are represented by lists of points (the vertexes of the polygon). Polygons are very similar to closed paths, but are stored differently and have their own set of support routines.
Values of type polygon are specified using any of the following syntaxes:
where the points are the end points of the line segments comprising the boundary of the polygon.
Polygons are output using the first syntax.
8.8.7. Circles
Circles are represented by a center point and radius. Values of type circle are specified using any of the following syntaxes:
where (x,y) is the center point and r is the radius of the circle.
< ( x , y ) , r >
( ( x , y ) , r )
( x , y ) , r
x , y , r
9.2. 比較函式及運算子
The usual comparison operators are available, as shown in Table 9.1.
Table 9.1. Comparison Operators
Operator
Note
The != operator is converted to <> in the parser stage. It is not possible to implement != and <> operators that do different things.
Comparison operators are available for all relevant data types. All comparison operators are binary operators that return values of type boolean; expressions like 1 < 2 < 3 are not valid (because there is no < operator to compare a Boolean value with 3).
There are also some comparison predicates, as shown in . These behave much like operators, but have special syntax mandated by the SQL standard.
Table 9.2. Comparison Predicates
The BETWEEN predicate simplifies range tests:
is equivalent to
Notice that BETWEEN treats the endpoint values as included in the range. NOT BETWEEN does the opposite comparison:
is equivalent to
BETWEEN SYMMETRIC is like BETWEEN except there is no requirement that the argument to the left of AND be less than or equal to the argument on the right. If it is not, those two arguments are automatically swapped, so that a nonempty range is always implied.
Ordinary comparison operators yield null (signifying “unknown”), not true or false, when either input is null. For example, 7 = NULL yields null, as does 7 <> NULL. When this behavior is not suitable, use the IS [ NOT ] DISTINCT FROM predicates:
For non-null inputs, IS DISTINCT FROM is the same as the <> operator. However, if both inputs are null it returns false, and if only one input is null it returns true. Similarly, IS NOT DISTINCT FROM is identical to = for non-null inputs, but it returns true when both inputs are null, and false when only one input is null. Thus, these predicates effectively act as though null were a normal data value, rather than “unknown”.
To check whether a value is or is not null, use the predicates:
or the equivalent, but nonstandard, predicates:
Do not write expression = NULL because NULL is not “equal to” NULL. (The null value represents an unknown value, and it is not known whether two unknown values are equal.)
Tip
Some applications might expect that expression = NULL returns true if expression evaluates to the null value. It is highly recommended that these applications be modified to comply with the SQL standard. However, if that cannot be done the configuration variable is available. If it is enabled, PostgreSQL will convert x = NULL clauses to x IS NULL.
If the expression is row-valued, then IS NULL is true when the row expression itself is null or when all the row's fields are null, while IS NOT NULL is true when the row expression itself is non-null and all the row's fields are non-null. Because of this behavior, IS NULL and IS NOT NULL do not always return inverse results for row-valued expressions; in particular, a row-valued expression that contains both null and non-null fields will return false for both tests. In some cases, it may be preferable to write rowIS DISTINCT FROM NULL or rowIS NOT DISTINCT FROM NULL, which will simply check whether the overall row value is null without any additional tests on the row fields.
Boolean values can also be tested using the predicates
These will always return true or false, never a null value, even when the operand is null. A null input is treated as the logical value “unknown”. Notice that IS UNKNOWN and IS NOT UNKNOWN are effectively the same as IS NULL and IS NOT NULL, respectively, except that the input expression must be of Boolean type.
Some comparison-related functions are also available, as shown in .
PostgreSQL offers data types to store IPv4, IPv6, and MAC addresses, as shown in . It is better to use these types instead of plain text types to store network addresses, because these types offer input error checking and specialized operators and functions (see ).
boolean_expression IS TRUE
boolean_expression IS NOT TRUE
boolean_expression IS FALSE
boolean_expression IS NOT FALSE
boolean_expression IS UNKNOWN
boolean_expression IS NOT UNKNOWN
SELECT E'\\xDEADBEEF';
Storage Size
Description
cidr
7 or 19 bytes
IPv4 and IPv6 networks
inet
7 or 19 bytes
IPv4 and IPv6 hosts and networks
macaddr
6 bytes
MAC addresses
macaddr8
8 bytes
MAC addresses (EUI-64 format)
When sorting inet or cidr data types, IPv4 addresses will always sort before IPv6 addresses, including IPv4 addresses encapsulated or mapped to IPv6 addresses, such as ::10.2.3.4 or ::ffff:10.4.3.2.
8.9.1. inet
The inet type holds an IPv4 or IPv6 host address, and optionally its subnet, all in one field. The subnet is represented by the number of network address bits present in the host address (the “netmask”). If the netmask is 32 and the address is IPv4, then the value does not indicate a subnet, only a single host. In IPv6, the address length is 128 bits, so 128 bits specify a unique host address. Note that if you want to accept only networks, you should use the cidr type rather than inet.
The input format for this type is address/y where address is an IPv4 or IPv6 address and y is the number of bits in the netmask. If the /y portion is missing, the netmask is 32 for IPv4 and 128 for IPv6, so the value represents just a single host. On display, the /y portion is suppressed if the netmask specifies a single host.
8.9.2. cidr
The cidr type holds an IPv4 or IPv6 network specification. Input and output formats follow Classless Internet Domain Routing conventions. The format for specifying networks is address/y where address is the network represented as an IPv4 or IPv6 address, and y is the number of bits in the netmask. If y is omitted, it is calculated using assumptions from the older classful network numbering system, except it will be at least large enough to include all of the octets written in the input. It is an error to specify a network address that has bits set to the right of the specified netmask.
The essential difference between inet and cidr data types is that inet accepts values with nonzero bits to the right of the netmask, whereas cidr does not. For example, 192.168.0.1/24 is valid for inet but not for cidr.
Tip
If you do not like the output format for inet or cidr values, try the functions host, text, and abbrev.
8.9.4. macaddr
The macaddr type stores MAC addresses, known for example from Ethernet card hardware addresses (although MAC addresses are used for other purposes as well). Input is accepted in the following formats:
'08:00:2b:01:02:03'
'08-00-2b-01-02-03'
'08002b:010203'
'08002b-010203'
'0800.2b01.0203'
'0800-2b01-0203'
These examples would all specify the same address. Upper and lower case is accepted for the digits a through f. Output is always in the first of the forms shown.
IEEE Std 802-2001 specifies the second shown form (with hyphens) as the canonical form for MAC addresses, and specifies the first form (with colons) as the bit-reversed notation, so that 08-00-2b-01-02-03 = 01:00:4D:08:04:0C. This convention is widely ignored nowadays, and it is relevant only for obsolete network protocols (such as Token Ring). PostgreSQL makes no provisions for bit reversal, and all accepted formats use the canonical LSB order.
The remaining five input formats are not part of any standard.
8.9.5. macaddr8
The macaddr8 type stores MAC addresses in EUI-64 format, known for example from Ethernet card hardware addresses (although MAC addresses are used for other purposes as well). This type can accept both 6 and 8 byte length MAC addresses and stores them in 8 byte length format. MAC addresses given in 6 byte format will be stored in 8 byte length format with the 4th and 5th bytes set to FF and FE, respectively. Note that IPv6 uses a modified EUI-64 format where the 7th bit should be set to one after the conversion from EUI-48. The function macaddr8_set7bit is provided to make this change. Generally speaking, any input which is comprised of pairs of hex digits (on byte boundaries), optionally separated consistently by one of ':', '-' or '.', is accepted. The number of hex digits must be either 16 (8 bytes) or 12 (6 bytes). Leading and trailing whitespace is ignored. The following are examples of input formats that are accepted:
'08:00:2b:01:02:03:04:05'
'08-00-2b-01-02-03-04-05'
'08002b:0102030405'
'08002b-0102030405'
'0800.2b01.0203.0405'
'0800-2b01-0203-0405'
These examples would all specify the same address. Upper and lower case is accepted for the digits a through f. Output is always in the first of the forms shown. The last six input formats that are mentioned above are not part of any standard. To convert a traditional 48 bit MAC address in EUI-48 format to modified EUI-64 format to be included as the host portion of an IPv6 address, use macaddr8_set7bit as shown:
MAC (Media Access Control) address (EUI-64 format)
money
currency amount
numeric [ (p,s) ]
decimal [ (p,s) ]
exact numeric of selectable precision
path
geometric path on a plane
pg_lsn
PostgreSQLLog Sequence Number
point
geometric point on a plane
polygon
closed geometric path on a plane
real
float4
single precision floating-point number (4 bytes)
smallint
int2
signed two-byte integer
smallserial
serial2
autoincrementing two-byte integer
serial
serial4
autoincrementing four-byte integer
text
variable-length character string
time [ (p) ] [ without time zone ]
time of day (no time zone)
time [ (p) ] with time zone
timetz
time of day, including time zone
timestamp [ (p) ] [ without time zone ]
date and time (no time zone)
timestamp [ (p) ] with time zone
timestamptz
date and time, including time zone
tsquery
text search query
tsvector
text search document
txid_snapshot
user-level transaction ID snapshot
uuid
universally unique identifier
xml
XML data
相容性
以下資料型別(或其拼寫方式)是由 SQL 指定的:bigint,bit,bit varying,boolean,char,character varying,character,varchar,date,double precision,integer,interval,numeric,decimal,real,smallint,time(with or without time zone),timestamp(with or without time zone),xml.
CREATE SEQUENCE tablename_colname_seq;
CREATE TABLE tablename (
colname integer NOT NULL DEFAULT nextval('tablename_colname_seq')
);
ALTER SEQUENCE tablename_colname_seq OWNED BY tablename.colname;
9.5. 位元字串函式及運算子
This section describes functions and operators for examining and manipulating values of type bytea.
SQL defines some string functions that use key words, rather than commas, to separate arguments. Details are in Table 9.12. PostgreSQL also provides versions of these functions that use the regular function invocation syntax (see Table 9.13).
Note
The sample results shown on this page assume that the server parameter is set to escape (the traditional PostgreSQL format).
Table 9.12. SQL Binary String Functions and Operators
Additional binary string manipulation functions are available and are listed in . Some of them are used internally to implement the SQL-standard string functions listed in .
Table 9.13. Other Binary String Functions
get_byte and set_byte number the first byte of a binary string as byte 0. get_bit and set_bit number bits from the right within each byte; for example bit 0 is the least significant bit of the first byte, and bit 15 is the most significant bit of the second byte.
Note that for historic reasons, the function md5 returns a hex-encoded value of type text whereas the SHA-2 functions return type bytea. Use the functions encode and decode to convert between the two, for example encode(sha256('abc'), 'hex') to get a hex-encoded text representation.
See also the aggregate function string_agg in and the large object functions in .
8.16. 複合型別
A composite type represents the structure of a row or record; it is essentially just a list of field names and their data types. PostgreSQL allows composite types to be used in many of the same ways that simple types can be used. For example, a column of a table can be declared to be of a composite type.
8.16.1. Declaration of Composite Types
Here are two simple examples of defining composite types:
The syntax is comparable to CREATE TABLE, except that only field names and types can be specified; no constraints (such as NOT NULL) can presently be included. Note that the AS keyword is essential; without it, the system will think a different kind of CREATE TYPE command is meant, and you will get odd syntax errors.
Having defined the types, we can use them to create tables:
or functions:
Whenever you create a table, a composite type is also automatically created, with the same name as the table, to represent the table's row type. For example, had we said:
then the same inventory_item composite type shown above would come into being as a byproduct, and could be used just as above. Note however an important restriction of the current implementation: since no constraints are associated with a composite type, the constraints shown in the table definition do not apply to values of the composite type outside the table. (To work around this, create a domain over the composite type, and apply the desired constraints as CHECK constraints of the domain.)
8.16.2. Constructing Composite Values
To write a composite value as a literal constant, enclose the field values within parentheses and separate them by commas. You can put double quotes around any field value, and must do so if it contains commas or parentheses. (More details appear .) Thus, the general format of a composite constant is the following:
An example is:
which would be a valid value of the inventory_item type defined above. To make a field be NULL, write no characters at all in its position in the list. For example, this constant specifies a NULL third field:
If you want an empty string rather than NULL, write double quotes:
Here the first field is a non-NULL empty string, the third is NULL.
(These constants are actually only a special case of the generic type constants discussed in . The constant is initially treated as a string and passed to the composite-type input conversion routine. An explicit type specification might be necessary to tell which type to convert the constant to.)
The ROW expression syntax can also be used to construct composite values. In most cases this is considerably simpler to use than the string-literal syntax since you don't have to worry about multiple layers of quoting. We already used this method above:
The ROW keyword is actually optional as long as you have more than one field in the expression, so these can be simplified to:
The ROW expression syntax is discussed in more detail in .
8.16.3. Accessing Composite Types
To access a field of a composite column, one writes a dot and the field name, much like selecting a field from a table name. In fact, it's so much like selecting from a table name that you often have to use parentheses to keep from confusing the parser. For example, you might try to select some subfields from our on_hand example table with something like:
This will not work since the name item is taken to be a table name, not a column name of on_hand, per SQL syntax rules. You must write it like this:
or if you need to use the table name as well (for instance in a multitable query), like this:
Now the parenthesized object is correctly interpreted as a reference to the item column, and then the subfield can be selected from it.
Similar syntactic issues apply whenever you select a field from a composite value. For instance, to select just one field from the result of a function that returns a composite value, you'd need to write something like:
Without the extra parentheses, this will generate a syntax error.
The special field name * means “all fields”, as further explained in .
8.16.4. Modifying Composite Types
Here are some examples of the proper syntax for inserting and updating composite columns. First, inserting or updating a whole column:
The first example omits ROW, the second uses it; we could have done it either way.
We can update an individual subfield of a composite column:
Notice here that we don't need to (and indeed cannot) put parentheses around the column name appearing just after SET, but we do need parentheses when referencing the same column in the expression to the right of the equal sign.
And we can specify subfields as targets for INSERT, too:
Had we not supplied values for all the subfields of the column, the remaining subfields would have been filled with null values.
8.16.5. Using Composite Types in Queries
There are various special syntax rules and behaviors associated with composite types in queries. These rules provide useful shortcuts, but can be confusing if you don't know the logic behind them.
In PostgreSQL, a reference to a table name (or alias) in a query is effectively a reference to the composite value of the table's current row. For example, if we had a table inventory_item as shown , we could write:
This query produces a single composite-valued column, so we might get output like:
Note however that simple names are matched to column names before table names, so this example works only because there is no column named c in the query's tables.
The ordinary qualified-column-name syntax table_name.column_name can be understood as applying to the composite value of the table's current row. (For efficiency reasons, it's not actually implemented that way.)
When we write
then, according to the SQL standard, we should get the contents of the table expanded into separate columns:
as if the query were
PostgreSQL will apply this expansion behavior to any composite-valued expression, although as shown , you need to write parentheses around the value that .* is applied to whenever it's not a simple table name. For example, if myfunc() is a function returning a composite type with columns a, b, and c, then these two queries have the same result:
Tip
PostgreSQL handles column expansion by actually transforming the first form into the second. So, in this example, myfunc() would get invoked three times per row with either syntax. If it's an expensive function you may wish to avoid that, which you can do with a query like:
Placing the function in a LATERALFROM item keeps it from being invoked more than once per row. m.* is still expanded into m.a, m.b, m.c, but now those variables are just references to the output of the FROM item. (The LATERAL keyword is optional here, but we show it to clarify that the function is getting x from some_table.)
The composite_value.* syntax results in column expansion of this kind when it appears at the top level of a , a in INSERT/UPDATE/DELETE, a , or a . In all other contexts (including when nested inside one of those constructs), attaching .* to a composite value does not change the value, since it means “all columns” and so the same composite value is produced again. For example, if somefunc() accepts a composite-valued argument, these queries are the same:
In both cases, the current row of inventory_item is passed to the function as a single composite-valued argument. Even though .* does nothing in such cases, using it is good style, since it makes clear that a composite value is intended. In particular, the parser will consider c in c.* to refer to a table name or alias, not to a column name, so that there is no ambiguity; whereas without .*, it is not clear whether c means a table name or a column name, and in fact the column-name interpretation will be preferred if there is a column named c.
Another example demonstrating these concepts is that all these queries mean the same thing:
All of these ORDER BY clauses specify the row's composite value, resulting in sorting the rows according to the rules described in . However, if inventory_item contained a column named c, the first case would be different from the others, as it would mean to sort by that column only. Given the column names previously shown, these queries are also equivalent to those above:
(The last case uses a row constructor with the key word ROW omitted.)
Another special syntactical behavior associated with composite values is that we can use functional notation for extracting a field of a composite value. The simple way to explain this is that the notations field(table) and table.field are interchangeable. For example, these queries are equivalent:
Moreover, if we have a function that accepts a single argument of a composite type, we can call it with either notation. These queries are all equivalent:
This equivalence between functional notation and field notation makes it possible to use functions on composite types to implement “computed fields”. An application using the last query above wouldn't need to be directly aware that somefunc isn't a real column of the table.
Tip
Because of this behavior, it's unwise to give a function that takes a single composite-type argument the same name as any of the fields of that composite type. If there is ambiguity, the field-name interpretation will be chosen if field-name syntax is used, while the function will be chosen if function-call syntax is used. However, PostgreSQL versions before 11 always chose the field-name interpretation, unless the syntax of the call required it to be a function call. One way to force the function interpretation in older versions is to schema-qualify the function name, that is, write schema.func(compositevalue).
8.16.6. Composite Type Input and Output Syntax
The external text representation of a composite value consists of items that are interpreted according to the I/O conversion rules for the individual field types, plus decoration that indicates the composite structure. The decoration consists of parentheses (( and )) around the whole value, plus commas (,) between adjacent items. Whitespace outside the parentheses is ignored, but within the parentheses it is considered part of the field value, and might or might not be significant depending on the input conversion rules for the field data type. For example, in:
the whitespace will be ignored if the field type is integer, but not if it is text.
As shown previously, when writing a composite value you can write double quotes around any individual field value. You must do so if the field value would otherwise confuse the composite-value parser. In particular, fields containing parentheses, commas, double quotes, or backslashes must be double-quoted. To put a double quote or backslash in a quoted composite field value, precede it with a backslash. (Also, a pair of double quotes within a double-quoted field value is taken to represent a double quote character, analogously to the rules for single quotes in SQL literal strings.) Alternatively, you can avoid quoting and use backslash-escaping to protect all data characters that would otherwise be taken as composite syntax.
A completely empty field value (no characters at all between the commas or parentheses) represents a NULL. To write a value that is an empty string rather than NULL, write "".
The composite output routine will put double quotes around field values if they are empty strings or contain parentheses, commas, double quotes, backslashes, or white space. (Doing so for white space is not essential, but aids legibility.) Double quotes and backslashes embedded in field values will be doubled.
Note
Remember that what you write in an SQL command will first be interpreted as a string literal, and then as a composite. This doubles the number of backslashes you need (assuming escape string syntax is used). For example, to insert a text field containing a double quote and a backslash in a composite value, you'd need to write:
The string-literal processor removes one level of backslashes, so that what arrives at the composite-value parser looks like ("\"\\"). In turn, the string fed to the text data type's input routine becomes "\. (If we were working with a data type whose input routine also treated backslashes specially, bytea for example, we might need as many as eight backslashes in the command to get one backslash into the stored composite field.) Dollar quoting (see ) can be used to avoid the need to double backslashes.
Tip
The ROW constructor syntax is usually easier to work with than the composite-literal syntax when writing composite values in SQL commands. In ROW, individual field values are written the same way they would be written when not members of a composite.
8.17. 範圍型別
版本:11
Range types are data types representing a range of values of some element type (called the range's subtype). For instance, ranges of timestamp might be used to represent the ranges of time that a meeting room is reserved. In this case the data type is tsrange (short for “timestamp range”), and timestamp is the subtype. The subtype must have a total order so that it is well-defined whether element values are within, before, or after a range of values.
Range types are useful because they represent many element values in a single range value, and because concepts such as overlapping ranges can be expressed clearly. The use of time and date ranges for scheduling purposes is the clearest example; but price ranges, measurement ranges from an instrument, and so forth can also be useful.
8.17.1. Built-in Range Types
PostgreSQL comes with the following built-in range types:
int4range — Range of integer
int8range — Range of bigint
In addition, you can define your own range types; see for more information.
8.17.2. Examples
See and for complete lists of operators and functions on range types.
8.17.3. Inclusive and Exclusive Bounds
Every non-empty range has two bounds, the lower bound and the upper bound. All points between these values are included in the range. An inclusive bound means that the boundary point itself is included in the range as well, while an exclusive bound means that the boundary point is not included in the range.
In the text form of a range, an inclusive lower bound is represented by “[” while an exclusive lower bound is represented by “(”. Likewise, an inclusive upper bound is represented by “]”, while an exclusive upper bound is represented by “)”. (See for more details.)
The functions lower_inc and upper_inc test the inclusivity of the lower and upper bounds of a range value, respectively.
8.17.4. Infinite (Unbounded) Ranges
The lower bound of a range can be omitted, meaning that all values less than the upper bound are included in the range, e.g., (,3]. Likewise, if the upper bound of the range is omitted, then all values greater than the lower bound are included in the range. If both lower and upper bounds are omitted, all values of the element type are considered to be in the range. Specifying a missing bound as inclusive is automatically converted to exclusive, e.g., [,] is converted to (,). You can think of these missing values as +/-infinity, but they are special range type values and are considered to be beyond any range element type's +/-infinity values.
Element types that have the notion of “infinity” can use them as explicit bound values. For example, with timestamp ranges, [today,infinity) excludes the special timestamp value infinity, while [today,infinity] include it, as does [today,) and [today,].
The functions lower_inf and upper_inf test for infinite lower and upper bounds of a range, respectively.
8.17.5. Range Input/Output
The input for a range value must follow one of the following patterns:
The parentheses or brackets indicate whether the lower and upper bounds are exclusive or inclusive, as described previously. Notice that the final pattern is empty, which represents an empty range (a range that contains no points).
The lower-bound may be either a string that is valid input for the subtype, or empty to indicate no lower bound. Likewise, upper-bound may be either a string that is valid input for the subtype, or empty to indicate no upper bound.
Each bound value can be quoted using " (double quote) characters. This is necessary if the bound value contains parentheses, brackets, commas, double quotes, or backslashes, since these characters would otherwise be taken as part of the range syntax. To put a double quote or backslash in a quoted bound value, precede it with a backslash. (Also, a pair of double quotes within a double-quoted bound value is taken to represent a double quote character, analogously to the rules for single quotes in SQL literal strings.) Alternatively, you can avoid quoting and use backslash-escaping to protect all data characters that would otherwise be taken as range syntax. Also, to write a bound value that is an empty string, write "", since writing nothing means an infinite bound.
Whitespace is allowed before and after the range value, but any whitespace between the parentheses or brackets is taken as part of the lower or upper bound value. (Depending on the element type, it might or might not be significant.)
Note
These rules are very similar to those for writing field values in composite-type literals. See for additional commentary.
Examples:
8.17.6. Constructing Ranges
Each range type has a constructor function with the same name as the range type. Using the constructor function is frequently more convenient than writing a range literal constant, since it avoids the need for extra quoting of the bound values. The constructor function accepts two or three arguments. The two-argument form constructs a range in standard form (lower bound inclusive, upper bound exclusive), while the three-argument form constructs a range with bounds of the form specified by the third argument. The third argument must be one of the strings “()”, “(]”, “[)”, or “[]”. For example:
8.17.7. Discrete Range Types
A discrete range is one whose element type has a well-defined “step”, such as integer or date. In these types two elements can be said to be adjacent, when there are no valid values between them. This contrasts with continuous ranges, where it's always (or almost always) possible to identify other element values between two given values. For example, a range over the numeric type is continuous, as is a range over timestamp. (Even though timestamp has limited precision, and so could theoretically be treated as discrete, it's better to consider it continuous since the step size is normally not of interest.)
Another way to think about a discrete range type is that there is a clear idea of a “next” or “previous” value for each element value. Knowing that, it is possible to convert between inclusive and exclusive representations of a range's bounds, by choosing the next or previous element value instead of the one originally given. For example, in an integer range type [4,8] and (3,9) denote the same set of values; but this would not be so for a range over numeric.
A discrete range type should have a canonicalization function that is aware of the desired step size for the element type. The canonicalization function is charged with converting equivalent values of the range type to have identical representations, in particular consistently inclusive or exclusive bounds. If a canonicalization function is not specified, then ranges with different formatting will always be treated as unequal, even though they might represent the same set of values in reality.
The built-in range types int4range, int8range, and daterange all use a canonical form that includes the lower bound and excludes the upper bound; that is, [). User-defined range types can use other conventions, however.
8.17.8. Defining New Range Types
Users can define their own range types. The most common reason to do this is to use ranges over subtypes not provided among the built-in range types. For example, to define a new range type of subtype float8:
Because float8 has no meaningful “step”, we do not define a canonicalization function in this example.
Defining your own range type also allows you to specify a different subtype B-tree operator class or collation to use, so as to change the sort ordering that determines which values fall into a given range.
If the subtype is considered to have discrete rather than continuous values, the CREATE TYPE command should specify a canonical function. The canonicalization function takes an input range value, and must return an equivalent range value that may have different bounds and formatting. The canonical output for two ranges that represent the same set of values, for example the integer ranges [1, 7] and [1, 8), must be identical. It doesn't matter which representation you choose to be the canonical one, so long as two equivalent values with different formattings are always mapped to the same value with the same formatting. In addition to adjusting the inclusive/exclusive bounds format, a canonicalization function might round off boundary values, in case the desired step size is larger than what the subtype is capable of storing. For instance, a range type over timestamp could be defined to have a step size of an hour, in which case the canonicalization function would need to round off bounds that weren't a multiple of an hour, or perhaps throw an error instead.
In addition, any range type that is meant to be used with GiST or SP-GiST indexes should define a subtype difference, or subtype_diff, function. (The index will still work without subtype_diff, but it is likely to be considerably less efficient than if a difference function is provided.) The subtype difference function takes two input values of the subtype, and returns their difference (i.e., X minus Y) represented as a float8 value. In our example above, the function float8mi that underlies the regular float8 minus operator can be used; but for any other subtype, some type conversion would be necessary. Some creative thought about how to represent differences as numbers might be needed, too. To the greatest extent possible, the subtype_diff function should agree with the sort ordering implied by the selected operator class and collation; that is, its result should be positive whenever its first argument is greater than its second according to the sort ordering.
A less-oversimplified example of a subtype_diff function is:
See for more information about creating range types.
8.17.9. Indexing
GiST and SP-GiST indexes can be created for table columns of range types. For instance, to create a GiST index:
A GiST or SP-GiST index can accelerate queries involving these range operators: =, &&, <@, @>, <<, >>, -|-, &<, and &> (see for more information).
In addition, B-tree and hash indexes can be created for table columns of range types. For these index types, basically the only useful range operation is equality. There is a B-tree sort ordering defined for range values, with corresponding < and > operators, but the ordering is rather arbitrary and not usually useful in the real world. Range types' B-tree and hash support is primarily meant to allow sorting and hashing internally in queries, rather than creation of actual indexes.
8.17.10. Constraints on Ranges
While UNIQUE is a natural constraint for scalar values, it is usually unsuitable for range types. Instead, an exclusion constraint is often more appropriate (see ). Exclusion constraints allow the specification of constraints such as “non-overlapping” on a range type. For example:
That constraint will prevent any overlapping values from existing in the table at the same time:
You can use the extension to define exclusion constraints on plain scalar data types, which can then be combined with range exclusions for maximum flexibility. For example, after btree_gist is installed, the following constraint will reject overlapping ranges only if the meeting room numbers are equal:
8.11. 全文檢索型別
PostgreSQL provides two data types that are designed to support full text search, which is the activity of searching through a collection of natural-language documents to locate those that best match a query. The tsvector type represents a document in a form optimized for text search; the tsquery type similarly represents a text query. Chapter 12 provides a detailed explanation of this facility, and Section 9.13 summarizes the related functions and operators.
8.11.1. tsvector
A tsvector value is a sorted list of distinct lexemes, which are words that have been normalized to merge different variants of the same word (see for details). Sorting and duplicate-elimination are done automatically during input, as shown in this example:
To represent lexemes containing whitespace or punctuation, surround them with quotes:
(We use dollar-quoted string literals in this example and the next one to avoid the confusion of having to double quote marks within the literals.) Embedded quotes and backslashes must be doubled:
Optionally, integer positions can be attached to lexemes:
A position normally indicates the source word's location in the document. Positional information can be used for proximity ranking. Position values can range from 1 to 16383; larger numbers are silently set to 16383. Duplicate positions for the same lexeme are discarded.
Lexemes that have positions can further be labeled with a weight, which can be A, B, C, or D. D is the default and hence is not shown on output:
Weights are typically used to reflect document structure, for example by marking title words differently from body words. Text search ranking functions can assign different priorities to the different weight markers.
It is important to understand that the tsvector type itself does not perform any word normalization; it assumes the words it is given are normalized appropriately for the application. For example,
For most English-text-searching applications the above words would be considered non-normalized, but tsvector doesn't care. Raw document text should usually be passed through to_tsvector to normalize the words appropriately for searching:
Again, see for more detail.
8.11.2. tsquery
A tsquery value stores lexemes that are to be searched for, and can combine them using the Boolean operators & (AND), | (OR), and ! (NOT), as well as the phrase search operator <-> (FOLLOWED BY). There is also a variant <N> of the FOLLOWED BY operator, where N is an integer constant that specifies the distance between the two lexemes being searched for. <-> is equivalent to <1>.
Parentheses can be used to enforce grouping of these operators. In the absence of parentheses, ! (NOT) binds most tightly, <-> (FOLLOWED BY) next most tightly, then & (AND), with | (OR) binding the least tightly.
Here are some examples:
Optionally, lexemes in a tsquery can be labeled with one or more weight letters, which restricts them to match only tsvector lexemes with one of those weights:
Also, lexemes in a tsquery can be labeled with * to specify prefix matching:
This query will match any word in a tsvector that begins with “super”.
Quoting rules for lexemes are the same as described previously for lexemes in tsvector; and, as with tsvector, any required normalization of words must be done before converting to the tsquery type. The to_tsquery function is convenient for performing such normalization:
Note that to_tsquery will process prefixes in the same way as other words, which means this comparison returns true:
because postgres gets stemmed to postgr:
which will match the stemmed form of postgraduate.
WITH provides a way to write auxiliary statements for use in a larger query. These statements, which are often referred to as Common Table Expressions or CTEs, can be thought of as defining temporary tables that exist just for one query. Each auxiliary statement in a WITH clause can be a SELECT, INSERT, UPDATE, or DELETE; and the WITH clause itself is attached to a primary statement that can also be a SELECT, INSERT,
CREATE TYPE complex AS (
r double precision,
i double precision
);
CREATE TYPE inventory_item AS (
name text,
supplier_id integer,
price numeric
);
_string`_
bytea
String concatenation
`'\Post'::bytea
'\047gres\000'::bytea`
\\Post'gres\000
octet_length(string)
int
Number of bytes in binary string
octet_length('jo\000se'::bytea)
5
overlay(string placing string from int [for int])
bytea
Replace substring
overlay('Th\000omas'::bytea placing '\002\003'::bytea from 2 for 3)
T\\002\\003mas
position(substring in string)
int
Location of specified substring
position('\000om'::bytea in 'Th\000omas'::bytea)
3
substring(string [from int] [for int])
bytea
Extract substring
substring('Th\000omas'::bytea from 2 for 3)
h\000o
trim([both]bytes from string)
bytea
Remove the longest string containing only bytes appearing in bytes from the start and end of string
trim('\000\001'::bytea from '\000Tom\001'::bytea)
Tom
decode(stringtext, formattext)
bytea
Decode binary data from textual representation in string. Options for format are same as in encode.
decode('123\000456', 'escape')
123\000456
encode(databytea, formattext)
text
Encode binary data into a textual representation. Supported formats are: base64, hex, escape. escape converts zero bytes and high-bit-set bytes to octal sequences (\nnn) and doubles backslashes.
encode('123\000456'::bytea, 'escape')
123\000456
get_bit(string, offset)
int
Extract bit from string
get_bit('Th\000omas'::bytea, 45)
1
get_byte(string, offset)
int
Extract byte from string
get_byte('Th\000omas'::bytea, 4)
109
length(string)
int
Length of binary string
length('jo\000se'::bytea)
5
md5(string)
text
Calculates the MD5 hash of string, returning the result in hexadecimal
所有的安全原則,目前來說都是開放安全原則,意思是當有多個安全原則被引用時,它們會以 OR 運算串連其結果。開放安全原則用於只允許在計畫內的環境使用的話,它會比和嚴格安全原則(把安全原則用 AND 串連起來判斷)一起使用來得簡單。基於上面的列子,我們建立一個嚴格安全原則,它限制管理者只能透過 unix socket 連線才能存取 passwd 資料表:
現在假設 alice 想要變更"slightly secret"的資訊,但決定不讓 mallory 看到新的內容,所以她這麼做:
看起來很安全,因為沒有窗口讓 mallory 可以看到"secret from mallory",然而,這裡就存在了交易競爭的情況。如果 mallory 也在同時做了:
因為她的交易是屬於 READ COMMITTED 模式,所以她有可能會看到"secret from mallory"。這會剛好發生在,她在 alice 的交易完成前一刻。mallory 的指令會暫時擋下 alice 的提交完成,而因為 FOR UPDATE,她會取得更新後的資訊。所以她並沒有從隱含的使用者執行 SELECT 取得資訊,因為子查詢沒有 FOR UPDATE,使得其他使用者可以從快照裡取得資訊。因為安全原則是以舊的 mallory 權限允許她看見該筆資料。
這個問題有好幾個面向的解決方式。一個簡單的方式就是使用 SELECT ... FOR SHARE 在安全原則的子查詢裡。但這樣就必須要讓使用者擁有 UPDATE 的權限,可能不太合適。(但也可以用另一個安全原則來做更多的限制,又或是把子查詢封裝進另一個安全的函數裡)同時,大量的引用查詢也可能造成效能的問題,特別是更新資料的時候。另一個解決辦法,如果參考的資料表並不是很常更新的話,那麼可以在資料表更新時強制鎖定該資料表,確保沒有其他交易能在同時進行查詢,也就不會洩漏任何資訊。或是等待其他所有交易都完成後,才提交更新變更新的安全方案。
The basic value of SELECT in WITH is to break down complicated queries into simpler parts. An example is:
which displays per-product sales totals in only the top sales regions. The WITH clause defines two auxiliary statements named regional_sales and top_regions, where the output of regional_sales is used in top_regions and the output of top_regions is used in the primary SELECT query. This example could have been written without WITH, but we'd have needed two levels of nested sub-SELECTs. It's a bit easier to follow this way.
The optional RECURSIVE modifier changes WITH from a mere syntactic convenience into a feature that accomplishes things not otherwise possible in standard SQL. Using RECURSIVE, a WITH query can refer to its own output. A very simple example is this query to sum the integers from 1 through 100:
The general form of a recursive WITH query is always a non-recursive term, then UNION (or UNION ALL), then a recursive term, where only the recursive term can contain a reference to the query's own output. Such a query is executed as follows:
Recursive Query Evaluation
Evaluate the non-recursive term. For UNION (but not UNION ALL), discard duplicate rows. Include all remaining rows in the result of the recursive query, and also place them in a temporary working table.
So long as the working table is not empty, repeat these steps:
Evaluate the recursive term, substituting the current contents of the working table for the recursive self-reference. For UNION (but not UNION ALL), discard duplicate rows and rows that duplicate any previous result row. Include all remaining rows in the result of the recursive query, and also place them in a temporary intermediate table.
Replace the contents of the working table with the contents of the intermediate table, then empty the intermediate table.
Note
Strictly speaking, this process is iteration not recursion, but RECURSIVE is the terminology chosen by the SQL standards committee.
In the example above, the working table has just a single row in each step, and it takes on the values from 1 through 100 in successive steps. In the 100th step, there is no output because of the WHERE clause, and so the query terminates.
Recursive queries are typically used to deal with hierarchical or tree-structured data. A useful example is this query to find all the direct and indirect sub-parts of a product, given only a table that shows immediate inclusions:
When working with recursive queries it is important to be sure that the recursive part of the query will eventually return no tuples, or else the query will loop indefinitely. Sometimes, using UNION instead of UNION ALL can accomplish this by discarding rows that duplicate previous output rows. However, often a cycle does not involve output rows that are completely duplicate: it may be necessary to check just one or a few fields to see if the same point has been reached before. The standard method for handling such situations is to compute an array of the already-visited values. For example, consider the following query that searches a table graph using a link field:
This query will loop if the link relationships contain cycles. Because we require a “depth” output, just changing UNION ALL to UNION would not eliminate the looping. Instead we need to recognize whether we have reached the same row again while following a particular path of links. We add two columns path and cycle to the loop-prone query:
Aside from preventing cycles, the array value is often useful in its own right as representing the “path” taken to reach any particular row.
In the general case where more than one field needs to be checked to recognize a cycle, use an array of rows. For example, if we needed to compare fields f1 and f2:
Tip
Omit the ROW() syntax in the common case where only one field needs to be checked to recognize a cycle. This allows a simple array rather than a composite-type array to be used, gaining efficiency.
Tip
The recursive query evaluation algorithm produces its output in breadth-first search order. You can display the results in depth-first search order by making the outer query ORDER BY a “path” column constructed in this way.
A helpful trick for testing queries when you are not certain if they might loop is to place a LIMIT in the parent query. For example, this query would loop forever without the LIMIT:
This works because PostgreSQL's implementation evaluates only as many rows of a WITH query as are actually fetched by the parent query. Using this trick in production is not recommended, because other systems might work differently. Also, it usually won't work if you make the outer query sort the recursive query's results or join them to some other table, because in such cases the outer query will usually try to fetch all of the WITH query's output anyway.
A useful property of WITH queries is that they are normally evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH queries. Thus, expensive calculations that are needed in multiple places can be placed within a WITH query to avoid redundant work. Another possible application is to prevent unwanted multiple evaluations of functions with side-effects. However, the other side of this coin is that the optimizer is not able to push restrictions from the parent query down into a multiply-referenced WITH query, since that might affect all uses of the WITH query's output when it should affect only one. The multiply-referenced WITH query will be evaluated as written, without suppression of rows that the parent query might discard afterwards. (But, as mentioned above, evaluation might stop early if the reference(s) to the query demand only a limited number of rows.)
但是,如果 WITH 查詢是非遞迴且不會在執行中變動的(即它是一個不包含 volatile 函數的 SELECT),則可以將其合併到父查詢之中,從而可以對兩個查詢等級進行聯合語法最佳化。預設情況下,如果父查詢僅引用一次 WITH 語句,而不是多次引用 WITH 一次查詢,則會觸發這個機制。您可以透過指定 MATERIALIZED 強制執行 WITH 查詢的單獨計算,或者透過指定 NOT MATERIALIZED 強制執行將其合併到父查詢中來覆蓋該查詢計畫。後面一種選擇可能會冒著重複計算 WITH 查詢的風險,但如果 WITH 查詢的每次使用只需要 WITH 查詢全部輸出的一小部分,那麼它仍然可以節省成本。
A simple example of these rules is
This WITH query will be folded, producing the same execution plan as
In particular, if there's an index on key, it will probably be used to fetch just the rows having key = 123. On the other hand, in
the WITH query will be materialized, producing a temporary copy of big_table that is then joined with itself — without benefit of any index. This query will be executed much more efficiently if written as
so that the parent query's restrictions can be applied directly to scans of big_table.
An example where NOT MATERIALIZED could be undesirable is
Here, materialization of the WITH query ensures that very_expensive_function is evaluated only once per table row, not twice.
The examples above only show WITH being used with SELECT, but it can be attached in the same way to INSERT, UPDATE, or DELETE. In each case it effectively provides temporary table(s) that can be referred to in the main command.
7.8.2. Data-Modifying Statements in WITH
You can use data-modifying statements (INSERT, UPDATE, or DELETE) in WITH. This allows you to perform several different operations in the same query. An example is:
This query effectively moves rows from products to products_log. The DELETE in WITH deletes the specified rows from products, returning their contents by means of its RETURNING clause; and then the primary query reads that output and inserts it into products_log.
A fine point of the above example is that the WITH clause is attached to the INSERT, not the sub-SELECT within the INSERT. This is necessary because data-modifying statements are only allowed in WITH clauses that are attached to the top-level statement. However, normal WITH visibility rules apply, so it is possible to refer to the WITH statement's output from the sub-SELECT.
Data-modifying statements in WITH usually have RETURNING clauses (see Section 6.4), as shown in the example above. It is the output of the RETURNING clause, not the target table of the data-modifying statement, that forms the temporary table that can be referred to by the rest of the query. If a data-modifying statement in WITH lacks a RETURNING clause, then it forms no temporary table and cannot be referred to in the rest of the query. Such a statement will be executed nonetheless. A not-particularly-useful example is:
This example would remove all rows from tables foo and bar. The number of affected rows reported to the client would only include rows removed from bar.
Recursive self-references in data-modifying statements are not allowed. In some cases it is possible to work around this limitation by referring to the output of a recursive WITH, for example:
This query would remove all direct and indirect subparts of a product.
Data-modifying statements in WITH are executed exactly once, and always to completion, independently of whether the primary query reads all (or indeed any) of their output. Notice that this is different from the rule for SELECT in WITH: as stated in the previous section, execution of a SELECT is carried only as far as the primary query demands its output.
The sub-statements in WITH are executed concurrently with each other and with the main query. Therefore, when using data-modifying statements in WITH, the order in which the specified updates actually happen is unpredictable. All the statements are executed with the same snapshot (see Chapter 13), so they cannot “see” one another's effects on the target tables. This alleviates the effects of the unpredictability of the actual order of row updates, and means that RETURNING data is the only way to communicate changes between different WITH sub-statements and the main query. An example of this is that in
the outer SELECT would return the original prices before the action of the UPDATE, while in
the outer SELECT would return the updated data.
Trying to update the same row twice in a single statement is not supported. Only one of the modifications takes place, but it is not easy (and sometimes not possible) to reliably predict which one. This also applies to deleting a row that was already updated in the same statement: only the update is performed. Therefore you should generally avoid trying to modify a single row twice in a single statement. In particular avoid writing WITH sub-statements that could affect the same rows changed by the main statement or a sibling sub-statement. The effects of such a statement will not be predictable.
At present, any table used as the target of a data-modifying statement in WITH must not have a conditional rule, nor an ALSO rule, nor an INSTEAD rule that expands to multiple statements.
CREATE FUNCTION price_extension(inventory_item, integer) RETURNS numeric
AS 'SELECT $1.price * $2' LANGUAGE SQL;
SELECT price_extension(item, 10) FROM on_hand;
-- includes 3, does not include 7, and does include all points in between
SELECT '[3,7)'::int4range;
-- does not include either 3 or 7, but includes all points in between
SELECT '(3,7)'::int4range;
-- includes only the single point 4
SELECT '[4,4]'::int4range;
-- includes no points (and will be normalized to 'empty')
SELECT '[4,4)'::int4range;
-- The full form is: lower bound, upper bound, and text argument indicating
-- inclusivity/exclusivity of bounds.
SELECT numrange(1.0, 14.0, '(]');
-- If the third argument is omitted, '[)' is assumed.
SELECT numrange(1.0, 14.0);
-- Although '(]' is specified here, on display the value will be converted to
-- canonical form, since int8range is a discrete range type (see below).
SELECT int8range(1, 14, '(]');
-- Using NULL for either bound causes the range to be unbounded on that side.
SELECT numrange(NULL, 2.2);
CREATE TYPE floatrange AS RANGE (
subtype = float8,
subtype_diff = float8mi
);
SELECT '[1.234, 5.678]'::floatrange;
CREATE FUNCTION time_subtype_diff(x time, y time) RETURNS float8 AS
'SELECT EXTRACT(EPOCH FROM (x - y))' LANGUAGE sql STRICT IMMUTABLE;
CREATE TYPE timerange AS RANGE (
subtype = time,
subtype_diff = time_subtype_diff
);
SELECT '[11:10, 23:00]'::timerange;
CREATE INDEX reservation_idx ON reservation USING GIST (during);
CREATE TABLE reservation (
during tsrange,
EXCLUDE USING GIST (during WITH &&)
);
INSERT INTO reservation VALUES
('[2010-01-01 11:30, 2010-01-01 15:00)');
INSERT 0 1
INSERT INTO reservation VALUES
('[2010-01-01 14:45, 2010-01-01 15:45)');
ERROR: conflicting key value violates exclusion constraint "reservation_during_excl"
DETAIL: Key (during)=(["2010-01-01 14:45:00","2010-01-01 15:45:00")) conflicts
with existing key (during)=(["2010-01-01 11:30:00","2010-01-01 15:00:00")).
CREATE EXTENSION btree_gist;
CREATE TABLE room_reservation (
room text,
during tsrange,
EXCLUDE USING GIST (room WITH =, during WITH &&)
);
INSERT INTO room_reservation VALUES
('123A', '[2010-01-01 14:00, 2010-01-01 15:00)');
INSERT 0 1
INSERT INTO room_reservation VALUES
('123A', '[2010-01-01 14:30, 2010-01-01 15:30)');
ERROR: conflicting key value violates exclusion constraint "room_reservation_room_during_excl"
DETAIL: Key (room, during)=(123A, ["2010-01-01 14:30:00","2010-01-01 15:30:00")) conflicts
with existing key (room, during)=(123A, ["2010-01-01 14:00:00","2010-01-01 15:00:00")).
INSERT INTO room_reservation VALUES
('123B', '[2010-01-01 14:30, 2010-01-01 15:30)');
INSERT 0 1
SELECT 'a fat cat sat on a mat and ate a fat rat'::tsvector;
tsvector
----------------------------------------------------
'a' 'and' 'ate' 'cat' 'fat' 'mat' 'on' 'rat' 'sat'
CREATE TABLE accounts (manager text, company text, contact_email text);
ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;
CREATE POLICY account_managers ON accounts TO managers
USING (manager = current_user);
CREATE POLICY user_policy ON users
USING (user_name = current_user);
CREATE POLICY user_policy ON users
USING (true)
WITH CHECK (user_name = current_user);
-- Simple passwd-file based example
CREATE TABLE passwd (
user_name text UNIQUE NOT NULL,
pwhash text,
uid int PRIMARY KEY,
gid int NOT NULL,
real_name text NOT NULL,
home_phone text,
extra_info text,
home_dir text NOT NULL,
shell text NOT NULL
);
CREATE ROLE admin; -- Administrator
CREATE ROLE bob; -- Normal user
CREATE ROLE alice; -- Normal user
-- Populate the table
INSERT INTO passwd VALUES
('admin','xxx',0,0,'Admin','111-222-3333',null,'/root','/bin/dash');
INSERT INTO passwd VALUES
('bob','xxx',1,1,'Bob','123-456-7890',null,'/home/bob','/bin/zsh');
INSERT INTO passwd VALUES
('alice','xxx',2,1,'Alice','098-765-4321',null,'/home/alice','/bin/zsh');
-- Be sure to enable row level security on the table
ALTER TABLE passwd ENABLE ROW LEVEL SECURITY;
-- Create policies
-- Administrator can see all rows and add any rows
CREATE POLICY admin_all ON passwd TO admin USING (true) WITH CHECK (true);
-- Normal users can view all rows
CREATE POLICY all_view ON passwd FOR SELECT USING (true);
-- Normal users can update their own records, but
-- limit which shells a normal user is allowed to set
CREATE POLICY user_mod ON passwd FOR UPDATE
USING (current_user = user_name)
WITH CHECK (
current_user = user_name AND
shell IN ('/bin/bash','/bin/sh','/bin/dash','/bin/zsh','/bin/tcsh')
);
-- Allow admin all normal rights
GRANT SELECT, INSERT, UPDATE, DELETE ON passwd TO admin;
-- Users only get select access on public columns
GRANT SELECT
(user_name, uid, gid, real_name, home_phone, extra_info, home_dir, shell)
ON passwd TO public;
-- Allow users to update certain columns
GRANT UPDATE
(pwhash, real_name, home_phone, extra_info, shell)
ON passwd TO public;
-- admin can view all rows and fields
postgres=> set role admin;
SET
postgres=> table passwd;
user_name | pwhash | uid | gid | real_name | home_phone | extra_info | home_dir | shell
-----------+--------+-----+-----+-----------+--------------+------------+-------------+-----------
admin | xxx | 0 | 0 | Admin | 111-222-3333 | | /root | /bin/dash
bob | xxx | 1 | 1 | Bob | 123-456-7890 | | /home/bob | /bin/zsh
alice | xxx | 2 | 1 | Alice | 098-765-4321 | | /home/alice | /bin/zsh
(3 rows)
-- Test what Alice is able to do
postgres=> set role alice;
SET
postgres=> table passwd;
ERROR: permission denied for relation passwd
postgres=> select user_name,real_name,home_phone,extra_info,home_dir,shell from passwd;
user_name | real_name | home_phone | extra_info | home_dir | shell
-----------+-----------+--------------+------------+-------------+-----------
admin | Admin | 111-222-3333 | | /root | /bin/dash
bob | Bob | 123-456-7890 | | /home/bob | /bin/zsh
alice | Alice | 098-765-4321 | | /home/alice | /bin/zsh
(3 rows)
postgres=> update passwd set user_name = 'joe';
ERROR: permission denied for relation passwd
-- Alice is allowed to change her own real_name, but no others
postgres=> update passwd set real_name = 'Alice Doe';
UPDATE 1
postgres=> update passwd set real_name = 'John Doe' where user_name = 'admin';
UPDATE 0
postgres=> update passwd set shell = '/bin/xx';
ERROR: new row violates WITH CHECK OPTION for "passwd"
postgres=> delete from passwd;
ERROR: permission denied for relation passwd
postgres=> insert into passwd (user_name) values ('xxx');
ERROR: permission denied for relation passwd
-- Alice can change her own password; RLS silently prevents updating other rows
postgres=> update passwd set pwhash = 'abc';
UPDATE 1
CREATE POLICY admin_local_only ON passwd AS RESTRICTIVE TO admin
USING (pg_catalog.inet_client_addr() IS NULL);
-- definition of privilege groups
CREATE TABLE groups (group_id int PRIMARY KEY,
group_name text NOT NULL);
INSERT INTO groups VALUES
(1, 'low'),
(2, 'medium'),
(5, 'high');
GRANT ALL ON groups TO alice; -- alice is the administrator
GRANT SELECT ON groups TO public;
-- definition of users' privilege levels
CREATE TABLE users (user_name text PRIMARY KEY,
group_id int NOT NULL REFERENCES groups);
INSERT INTO users VALUES
('alice', 5),
('bob', 2),
('mallory', 2);
GRANT ALL ON users TO alice;
GRANT SELECT ON users TO public;
-- table holding the information to be protected
CREATE TABLE information (info text,
group_id int NOT NULL REFERENCES groups);
INSERT INTO information VALUES
('barely secret', 1),
('slightly secret', 2),
('very secret', 5);
ALTER TABLE information ENABLE ROW LEVEL SECURITY;
-- a row should be visible to/updatable by users whose security group_id is
-- greater than or equal to the row's group_id
CREATE POLICY fp_s ON information FOR SELECT
USING (group_id <= (SELECT group_id FROM users WHERE user_name = current_user));
CREATE POLICY fp_u ON information FOR UPDATE
USING (group_id <= (SELECT group_id FROM users WHERE user_name = current_user));
-- we rely only on RLS to protect the information table
GRANT ALL ON information TO public;
BEGIN;
UPDATE users SET group_id = 1 WHERE user_name = 'mallory';
UPDATE information SET info = 'secret from mallory' WHERE group_id = 2;
COMMIT;
SELECT * FROM information WHERE group_id = 2 FOR UPDATE;
WITH regional_sales AS (
SELECT region, SUM(amount) AS total_sales
FROM orders
GROUP BY region
), top_regions AS (
SELECT region
FROM regional_sales
WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales)
)
SELECT region,
product,
SUM(quantity) AS product_units,
SUM(amount) AS product_sales
FROM orders
WHERE region IN (SELECT region FROM top_regions)
GROUP BY region, product;
WITH RECURSIVE t(n) AS (
VALUES (1)
UNION ALL
SELECT n+1 FROM t WHERE n < 100
)
SELECT sum(n) FROM t;
WITH RECURSIVE included_parts(sub_part, part, quantity) AS (
SELECT sub_part, part, quantity FROM parts WHERE part = 'our_product'
UNION ALL
SELECT p.sub_part, p.part, p.quantity
FROM included_parts pr, parts p
WHERE p.part = pr.sub_part
)
SELECT sub_part, SUM(quantity) as total_quantity
FROM included_parts
GROUP BY sub_part
WITH RECURSIVE search_graph(id, link, data, depth) AS (
SELECT g.id, g.link, g.data, 1
FROM graph g
UNION ALL
SELECT g.id, g.link, g.data, sg.depth + 1
FROM graph g, search_graph sg
WHERE g.id = sg.link
)
SELECT * FROM search_graph;
WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
SELECT g.id, g.link, g.data, 1,
ARRAY[g.id],
false
FROM graph g
UNION ALL
SELECT g.id, g.link, g.data, sg.depth + 1,
path || g.id,
g.id = ANY(path)
FROM graph g, search_graph sg
WHERE g.id = sg.link AND NOT cycle
)
SELECT * FROM search_graph;
WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
SELECT g.id, g.link, g.data, 1,
ARRAY[ROW(g.f1, g.f2)],
false
FROM graph g
UNION ALL
SELECT g.id, g.link, g.data, sg.depth + 1,
path || ROW(g.f1, g.f2),
ROW(g.f1, g.f2) = ANY(path)
FROM graph g, search_graph sg
WHERE g.id = sg.link AND NOT cycle
)
SELECT * FROM search_graph;
WITH RECURSIVE t(n) AS (
SELECT 1
UNION ALL
SELECT n+1 FROM t
)
SELECT n FROM t LIMIT 100;
WITH w AS (
SELECT * FROM big_table
)
SELECT * FROM w WHERE key = 123;
SELECT * FROM big_table WHERE key = 123;
WITH w AS (
SELECT * FROM big_table
)
SELECT * FROM w AS w1 JOIN w AS w2 ON w1.key = w2.ref
WHERE w2.key = 123;
WITH w AS NOT MATERIALIZED (
SELECT * FROM big_table
)
SELECT * FROM w AS w1 JOIN w AS w2 ON w1.key = w2.ref
WHERE w2.key = 123;
WITH w AS (
SELECT key, very_expensive_function(val) as f FROM some_table
)
SELECT * FROM w AS w1 JOIN w AS w2 ON w1.f = w2.f;
WITH moved_rows AS (
DELETE FROM products
WHERE
"date" >= '2010-10-01' AND
"date" < '2010-11-01'
RETURNING *
)
INSERT INTO products_log
SELECT * FROM moved_rows;
WITH t AS (
DELETE FROM foo
)
DELETE FROM bar;
WITH RECURSIVE included_parts(sub_part, part) AS (
SELECT sub_part, part FROM parts WHERE part = 'our_product'
UNION ALL
SELECT p.sub_part, p.part
FROM included_parts pr, parts p
WHERE p.part = pr.sub_part
)
DELETE FROM parts
WHERE part IN (SELECT part FROM included_parts);
WITH t AS (
UPDATE products SET price = price * 1.05
RETURNING *
)
SELECT * FROM products;
WITH t AS (
UPDATE products SET price = price * 1.05
RETURNING *
)
SELECT * FROM t;
引用和同步刪除有兩個常見的作法。用「RESTRICT」防止參考的資料被刪除;「NO ACTION」表示當限制條件被違反時,引用欄位的資料仍會留存,然後回傳錯誤訊息,如果未指定處理方式的話,這會是預設的行為(這兩個語法根本上的不同是「NO ACTION」允許延遲檢查到交易事務的最後,而「RESTRICT」則不會。);「CASCADE」指的是當參考的資料列被刪除時,引用的資料列也會同步被刪除。刪除時還有兩個其他的選項:SET NULL 和 SET DEFAULT,表示引用的資料會被更新為空值或其預設值。注意到,這並不是說你就可以違反限制條件。舉個例來說,如果使用了 SET DEFAULT,但預設值卻違反了外部鍵的限制,這個操作將會失敗。
類似的於 ON DELETE 的情況是 ON UPDATE,也就是在參考欄位的資料內容被更新時的情況。可以設定的動作關鍵字是相同的。在這個情況的 CASCADE 指的就是更新參考欄位的資料內容時,引用欄位的內容也會同步被更新為相同的內容。
一般來說,引用的資料列不需要滿足外部鍵的定義,如果其任一欄位內容為空值的話。而如果「MATCH FULL」加到宣告的語法之中的話,引用的資料列就必須要全部都是空值才不受外部鍵的限制(也就是部份空值的資料列就不受限制)。如果要避免空值使得外部鍵失效的話,就應該宣告相關欄位為 NOT NULL。
CREATE TABLE circles (
c circle,
EXCLUDE USING gist (c WITH &&)
);
9.12. 網路位址函式及運算子
Table 9.36shows the operators available for thecidrandinettypes. The operators<<,<<=,>>,>>=, and&&test for subnet inclusion. They consider only the network parts of the two addresses (ignoring any host part) and determine whether one network is identical to or a subnet of the other.
Table 9.36. cidrandinetOperators
shows the functions available for use with thecidrandinettypes. Theabbrev,host, andtextfunctions are primarily intended to offer alternative display formats.
Table 9.37. cidrandinetFunctions
Anycidrvalue can be cast toinetimplicitly or explicitly; therefore, the functions shown above as operating oninetalso work oncidrvalues. (Where there are separate functions forinetandcidr, it is because the behavior should be different for the two cases.) Also, it is permitted to cast aninetvalue tocidr. When this is done, any bits to the right of the netmask are silently zeroed to create a validcidrvalue. In addition, you can cast a text value toinetorcidr
shows the functions available for use with themacaddrtype. The functiontrunc(macaddr)returns a MAC address with the last 3 bytes set to zero. This can be used to associate the remaining prefix with a manufacturer.
Table 9.38. macaddrFunctions
Themacaddrtype also supports the standard relational operators (>,<=, etc.) for lexicographical ordering, and the bitwise arithmetic operators (~,&and|) for NOT, AND and OR.
shows the functions available for use with themacaddr8type. The functiontrunc(macaddr8)returns a MAC address with the last 5 bytes set to zero. This can be used to associate the remaining prefix with a manufacturer.
Table 9.39. macaddr8Functions
Themacaddr8type also supports the standard relational operators (>,<=, etc.) for ordering, and the bitwise arithmetic operators (~,&and|) for NOT, AND and OR.
8.15. 陣列
PostgreSQL allows columns of a table to be defined as variable-length multidimensional arrays. Arrays of any built-in or user-defined base type, enum type, composite type, range type, or domain can be created.
8.15.1. Declaration of Array Types
To illustrate the use of array types, we create this table:
As shown, an array data type is named by appending square brackets ([]
<=
is less than or equal
inet '192.168.1.5' <= inet '192.168.1.5'
=
equals
inet '192.168.1.5' = inet '192.168.1.5'
>=
is greater or equal
inet '192.168.1.5' >= inet '192.168.1.5'
>
is greater than
inet '192.168.1.5' > inet '192.168.1.4'
<>
is not equal
inet '192.168.1.5' <> inet '192.168.1.4'
<<
is contained by
inet '192.168.1.5' << inet '192.168.1/24'
<<=
is contained by or equals
inet '192.168.1/24' <<= inet '192.168.1/24'
>>
contains
inet '192.168.1/24' >> inet '192.168.1.5'
>>=
contains or equals
inet '192.168.1/24' >>= inet '192.168.1/24'
&&
contains or is contained by
inet '192.168.1/24' && inet '192.168.1.80/28'
~
bitwise NOT
~ inet '192.168.1.6'
&
bitwise AND
inet '192.168.1.6' & inet '0.0.0.255'
`
`
bitwise OR
`inet '192.168.1.6'
inet '0.0.0.255'`
+
addition
inet '192.168.1.6' + 25
-
subtraction
inet '192.168.1.43' - 36
-
subtraction
inet '192.168.1.43' - inet '192.168.1.19'
abbrev(cidr)
text
abbreviated display format as text
abbrev(cidr '10.1.0.0/16')
10.1/16
broadcast(inet)
inet
broadcast address for network
broadcast('192.168.1.5/24')
192.168.1.255/24
family(inet)
int
extract family of address;4for IPv4,6for IPv6
family('::1')
6
host(inet)
text
extract IP address as text
host('192.168.1.5/24')
192.168.1.5
hostmask(inet)
inet
construct host mask for network
hostmask('192.168.23.20/30')
0.0.0.3
masklen(inet)
int
extract netmask length
masklen('192.168.1.5/24')
24
netmask(inet)
inet
construct netmask for network
netmask('192.168.1.5/24')
255.255.255.0
network(inet)
cidr
extract network part of address
network('192.168.1.5/24')
192.168.1.0/24
set_masklen(inet,int)
inet
set netmask length forinetvalue
set_masklen('192.168.1.5/24', 16)
192.168.1.5/16
set_masklen(cidr,int)
cidr
set netmask length forcidrvalue
set_masklen('192.168.1.0/24'::cidr, 16)
192.168.0.0/16
text(inet)
text
extract IP address and netmask length as text
text(inet '192.168.1.5')
192.168.1.5/32
inet_same_family(inet,inet)
boolean
are the addresses from the same family?
inet_same_family('192.168.1.5/24', '::1')
false
inet_merge(inet,inet)
cidr
the smallest network which includes both of the given networks
inet_merge('192.168.1.5/24', '192.168.2.5/24')
192.168.0.0/22
using normal casting syntax: for example,
inet(expression
)or
colname
::cidr.
macaddr8_set7bit(macaddr8)
macaddr8
set 7th bit to one, also known as modified EUI-64, for inclusion in an IPv6 address
) to the data type name of the array elements. The above command will create a table named
sal_emp
with a column of type
text
(
name
), a one-dimensional array of type
integer
(
pay_by_quarter
), which represents the employee's salary by quarter, and a two-dimensional array of
text
(
schedule
), which represents the employee's weekly schedule.
The syntax for CREATE TABLE allows the exact size of arrays to be specified, for example:
However, the current implementation ignores any supplied array size limits, i.e., the behavior is the same as for arrays of unspecified length.
The current implementation does not enforce the declared number of dimensions either. Arrays of a particular element type are all considered to be of the same type, regardless of size or number of dimensions. So, declaring the array size or number of dimensions in CREATE TABLE is simply documentation; it does not affect run-time behavior.
An alternative syntax, which conforms to the SQL standard by using the keyword ARRAY, can be used for one-dimensional arrays. pay_by_quarter could have been defined as:
Or, if no array size is to be specified:
As before, however, PostgreSQL does not enforce the size restriction in any case.
8.15.2. Array Value Input
To write an array value as a literal constant, enclose the element values within curly braces and separate them by commas. (If you know C, this is not unlike the C syntax for initializing structures.) You can put double quotes around any element value, and must do so if it contains commas or curly braces. (More details appear below.) Thus, the general format of an array constant is the following:
where delim is the delimiter character for the type, as recorded in its pg_type entry. Among the standard data types provided in the PostgreSQL distribution, all use a comma (,), except for type box which uses a semicolon (;). Each val is either a constant of the array element type, or a subarray. An example of an array constant is:
This constant is a two-dimensional, 3-by-3 array consisting of three subarrays of integers.
To set an element of an array constant to NULL, write NULL for the element value. (Any upper- or lower-case variant of NULL will do.) If you want an actual string value “NULL”, you must put double quotes around it.
(These kinds of array constants are actually only a special case of the generic type constants discussed in Section 4.1.2.7. The constant is initially treated as a string and passed to the array input conversion routine. An explicit type specification might be necessary.)
Now we can show some INSERT statements:
The result of the previous two inserts looks like this:
Multidimensional arrays must have matching extents for each dimension. A mismatch causes an error, for example:
The ARRAY constructor syntax can also be used:
Notice that the array elements are ordinary SQL constants or expressions; for instance, string literals are single quoted, instead of double quoted as they would be in an array literal. The ARRAY constructor syntax is discussed in more detail in Section 4.2.12.
8.15.3. Accessing Arrays
Now, we can run some queries on the table. First, we show how to access a single element of an array. This query retrieves the names of the employees whose pay changed in the second quarter:
The array subscript numbers are written within square brackets. By default PostgreSQL uses a one-based numbering convention for arrays, that is, an array of n elements starts with array[1] and ends with array[n].
This query retrieves the third quarter pay of all employees:
We can also access arbitrary rectangular slices of an array, or subarrays. An array slice is denoted by writing lower-bound:upper-bound for one or more array dimensions. For example, this query retrieves the first item on Bill's schedule for the first two days of the week:
If any dimension is written as a slice, i.e., contains a colon, then all dimensions are treated as slices. Any dimension that has only a single number (no colon) is treated as being from 1 to the number specified. For example, [2] is treated as [1:2], as in this example:
To avoid confusion with the non-slice case, it's best to use slice syntax for all dimensions, e.g., [1:2][1:1], not [2][1:1].
It is possible to omit the lower-bound and/or upper-bound of a slice specifier; the missing bound is replaced by the lower or upper limit of the array's subscripts. For example:
An array subscript expression will return null if either the array itself or any of the subscript expressions are null. Also, null is returned if a subscript is outside the array bounds (this case does not raise an error). For example, if schedule currently has the dimensions [1:3][1:2] then referencing schedule[3][3] yields NULL. Similarly, an array reference with the wrong number of subscripts yields a null rather than an error.
An array slice expression likewise yields null if the array itself or any of the subscript expressions are null. However, in other cases such as selecting an array slice that is completely outside the current array bounds, a slice expression yields an empty (zero-dimensional) array instead of null. (This does not match non-slice behavior and is done for historical reasons.) If the requested slice partially overlaps the array bounds, then it is silently reduced to just the overlapping region instead of returning null.
The current dimensions of any array value can be retrieved with the array_dims function:
array_dims produces a text result, which is convenient for people to read but perhaps inconvenient for programs. Dimensions can also be retrieved with array_upper and array_lower, which return the upper and lower bound of a specified array dimension, respectively:
array_length will return the length of a specified array dimension:
cardinality returns the total number of elements in an array across all dimensions. It is effectively the number of rows a call to unnest would yield:
8.15.4. Modifying Arrays
An array value can be replaced completely:
or using the ARRAY expression syntax:
An array can also be updated at a single element:
or updated in a slice:
The slice syntaxes with omitted lower-bound and/or upper-bound can be used too, but only when updating an array value that is not NULL or zero-dimensional (otherwise, there is no existing subscript limit to substitute).
A stored array value can be enlarged by assigning to elements not already present. Any positions between those previously present and the newly assigned elements will be filled with nulls. For example, if array myarray currently has 4 elements, it will have six elements after an update that assigns to myarray[6]; myarray[5] will contain null. Currently, enlargement in this fashion is only allowed for one-dimensional arrays, not multidimensional arrays.
Subscripted assignment allows creation of arrays that do not use one-based subscripts. For example one might assign to myarray[-2:7] to create an array with subscript values from -2 to 7.
New array values can also be constructed using the concatenation operator, ||:
The concatenation operator allows a single element to be pushed onto the beginning or end of a one-dimensional array. It also accepts two N-dimensional arrays, or an N-dimensional and an N+1-dimensional array.
When a single element is pushed onto either the beginning or end of a one-dimensional array, the result is an array with the same lower bound subscript as the array operand. For example:
When two arrays with an equal number of dimensions are concatenated, the result retains the lower bound subscript of the left-hand operand's outer dimension. The result is an array comprising every element of the left-hand operand followed by every element of the right-hand operand. For example:
When an N-dimensional array is pushed onto the beginning or end of an N+1-dimensional array, the result is analogous to the element-array case above. Each N-dimensional sub-array is essentially an element of the N+1-dimensional array's outer dimension. For example:
An array can also be constructed by using the functions array_prepend, array_append, or array_cat. The first two only support one-dimensional arrays, but array_cat supports multidimensional arrays. Some examples:
In simple cases, the concatenation operator discussed above is preferred over direct use of these functions. However, because the concatenation operator is overloaded to serve all three cases, there are situations where use of one of the functions is helpful to avoid ambiguity. For example consider:
In the examples above, the parser sees an integer array on one side of the concatenation operator, and a constant of undetermined type on the other. The heuristic it uses to resolve the constant's type is to assume it's of the same type as the operator's other input — in this case, integer array. So the concatenation operator is presumed to represent array_cat, not array_append. When that's the wrong choice, it could be fixed by casting the constant to the array's element type; but explicit use of array_append might be a preferable solution.
8.15.5. Searching in Arrays
To search for a value in an array, each value must be checked. This can be done manually, if you know the size of the array. For example:
However, this quickly becomes tedious for large arrays, and is not helpful if the size of the array is unknown. An alternative method is described in Section 9.23. The above query could be replaced by:
In addition, you can find rows where the array has all values equal to 10000 with:
Alternatively, the generate_subscripts function can be used. For example:
You can also search an array using the && operator, which checks whether the left operand overlaps with the right operand. For instance:
This and other array operators are further described in Section 9.18. It can be accelerated by an appropriate index, as described in Section 11.2.
You can also search for specific values in an array using the array_position and array_positions functions. The former returns the subscript of the first occurrence of a value in an array; the latter returns an array with the subscripts of all occurrences of the value in the array. For example:
Tip
Arrays are not sets; searching for specific array elements can be a sign of database misdesign. Consider using a separate table with a row for each item that would be an array element. This will be easier to search, and is likely to scale better for a large number of elements.
8.15.6. Array Input and Output Syntax
The external text representation of an array value consists of items that are interpreted according to the I/O conversion rules for the array's element type, plus decoration that indicates the array structure. The decoration consists of curly braces ({ and }) around the array value plus delimiter characters between adjacent items. The delimiter character is usually a comma (,) but can be something else: it is determined by the typdelim setting for the array's element type. Among the standard data types provided in the PostgreSQL distribution, all use a comma, except for type box, which uses a semicolon (;). In a multidimensional array, each dimension (row, plane, cube, etc.) gets its own level of curly braces, and delimiters must be written between adjacent curly-braced entities of the same level.
The array output routine will put double quotes around element values if they are empty strings, contain curly braces, delimiter characters, double quotes, backslashes, or white space, or match the word NULL. Double quotes and backslashes embedded in element values will be backslash-escaped. For numeric data types it is safe to assume that double quotes will never appear, but for textual data types one should be prepared to cope with either the presence or absence of quotes.
By default, the lower bound index value of an array's dimensions is set to one. To represent arrays with other lower bounds, the array subscript ranges can be specified explicitly before writing the array contents. This decoration consists of square brackets ([]) around each array dimension's lower and upper bounds, with a colon (:) delimiter character in between. The array dimension decoration is followed by an equal sign (=). For example:
The array output routine will include explicit dimensions in its result only when there are one or more lower bounds different from one.
If the value written for an element is NULL (in any case variant), the element is taken to be NULL. The presence of any quotes or backslashes disables this and allows the literal string value “NULL” to be entered. Also, for backward compatibility with pre-8.2 versions of PostgreSQL, the array_nulls configuration parameter can be turned off to suppress recognition of NULL as a NULL.
As shown previously, when writing an array value you can use double quotes around any individual array element. You must do so if the element value would otherwise confuse the array-value parser. For example, elements containing curly braces, commas (or the data type's delimiter character), double quotes, backslashes, or leading or trailing whitespace must be double-quoted. Empty strings and strings matching the word NULL must be quoted, too. To put a double quote or backslash in a quoted array element value, precede it with a backslash. Alternatively, you can avoid quotes and use backslash-escaping to protect all data characters that would otherwise be taken as array syntax.
You can add whitespace before a left brace or after a right brace. You can also add whitespace before or after any individual item string. In all of these cases the whitespace will be ignored. However, whitespace within double-quoted elements, or surrounded on both sides by non-whitespace characters of an element, is not ignored.
Tip
The ARRAY constructor syntax (see Section 4.2.12) is often easier to work with than the array-literal syntax when writing array values in SQL commands. In ARRAY, individual element values are written the same way they would be written when not members of an array.
SELECT name FROM sal_emp WHERE pay_by_quarter[1] <> pay_by_quarter[2];
name
-------
Carol
(1 row)
SELECT pay_by_quarter[3] FROM sal_emp;
pay_by_quarter
----------------
10000
25000
(2 rows)
SELECT schedule[1:2][1:1] FROM sal_emp WHERE name = 'Bill';
schedule
------------------------
{{meeting},{training}}
(1 row)
SELECT schedule[1:2][2] FROM sal_emp WHERE name = 'Bill';
schedule
-------------------------------------------
{{meeting,lunch},{training,presentation}}
(1 row)
SELECT schedule[:2][2:] FROM sal_emp WHERE name = 'Bill';
schedule
------------------------
{{lunch},{presentation}}
(1 row)
SELECT schedule[:][1:1] FROM sal_emp WHERE name = 'Bill';
schedule
------------------------
{{meeting},{training}}
(1 row)
SELECT array_dims(schedule) FROM sal_emp WHERE name = 'Carol';
array_dims
------------
[1:2][1:2]
(1 row)
SELECT array_upper(schedule, 1) FROM sal_emp WHERE name = 'Carol';
array_upper
-------------
2
(1 row)
SELECT array_length(schedule, 1) FROM sal_emp WHERE name = 'Carol';
array_length
--------------
2
(1 row)
SELECT cardinality(schedule) FROM sal_emp WHERE name = 'Carol';
cardinality
-------------
4
(1 row)
UPDATE sal_emp SET pay_by_quarter = '{25000,25000,27000,27000}'
WHERE name = 'Carol';
UPDATE sal_emp SET pay_by_quarter = ARRAY[25000,25000,27000,27000]
WHERE name = 'Carol';
UPDATE sal_emp SET pay_by_quarter[4] = 15000
WHERE name = 'Bill';
UPDATE sal_emp SET pay_by_quarter[1:2] = '{27000,27000}'
WHERE name = 'Carol';
SELECT ARRAY[1, 2] || '{3, 4}'; -- the untyped literal is taken as an array
?column?
-----------
{1,2,3,4}
SELECT ARRAY[1, 2] || '7'; -- so is this one
ERROR: malformed array literal: "7"
SELECT ARRAY[1, 2] || NULL; -- so is an undecorated NULL
?column?
----------
{1,2}
(1 row)
SELECT array_append(ARRAY[1, 2], NULL); -- this might have been meant
array_append
--------------
{1,2,NULL}
SELECT * FROM sal_emp WHERE pay_by_quarter[1] = 10000 OR
pay_by_quarter[2] = 10000 OR
pay_by_quarter[3] = 10000 OR
pay_by_quarter[4] = 10000;
SELECT * FROM sal_emp WHERE 10000 = ANY (pay_by_quarter);
SELECT * FROM sal_emp WHERE 10000 = ALL (pay_by_quarter);
SELECT * FROM
(SELECT pay_by_quarter,
generate_subscripts(pay_by_quarter, 1) AS s
FROM sal_emp) AS foo
WHERE pay_by_quarter[s] = 10000;
SELECT * FROM sal_emp WHERE pay_by_quarter && ARRAY[10000];
SELECT f1[1][-2][3] AS e1, f1[1][-1][5] AS e2
FROM (SELECT '[1:1][-2:-1][3:5]={{{1,2,3},{4,5,6}}}'::int[] AS f1) AS ss;
e1 | e2
----+----
1 | 6
(1 row)
The technical difference between a jsonb_ops and a jsonb_path_ops GIN index is that the former creates independent index items for each key and value in the data, while the latter creates index items only for each value in the data. Basically, each jsonb_path_ops index item is a hash of the value and the key(s) leading to it; for example to index {"foo": {"bar": "baz"}}, a single index item would be created incorporating all three of foo, bar, and baz into the hash value. Thus a containment query looking for this structure would result in an extremely specific index search; but there is no way at all to find out whether foo appears as a key. On the other hand, a jsonb_ops index would create three index items representing
A disadvantage of the jsonb_path_ops approach is that it produces no index entries for JSON structures not containing any values, such as {"a": {}}. If a search for documents containing such a structure is requested, it will require a full-index scan, which is quite slow. jsonb_path_ops is therefore ill-suited for applications that often perform such searches.
jsonb also supports btree and hash indexes. These are usually useful only if it's important to check equality of complete JSON documents. The btree ordering for jsonb datums is seldom of great interest, but for completeness it is:
Objects with equal numbers of pairs are compared in the order:
Note that object keys are compared in their storage order; in particular, since shorter keys are stored before longer keys, this can lead to results that might be unintuitive, such as:
Similarly, arrays with equal numbers of elements are compared in the order:
Primitive JSON values are compared using the same comparison rules as for the underlying PostgreSQL data type. Strings are compared using the default database collation.
A path expression consists of a sequence of path elements, which can be the following:
Path literals of JSON primitive types: Unicode text, numeric, true, false, or null.
Path variables listed in .
Accessor operators listed in .
For details on using jsonpath expressions with SQL/JSON query functions, see .
Table 8.24. jsonpath Variables
Table 8.25. jsonpath Accessors
For this purpose, the term “value” includes array elements, though JSON terminology sometimes considers array elements distinct from values within objects.
SQL 語法並不是很嚴格要求什麼樣的指示記號來識別命令,或是哪些是運算子或參數。通常最前面的指示記號是命令的名稱,以上面的例子來說,我們通常會說是一個「SELECT」、一個「UPDATE」、以及一個「INSERT」命令。但對於 UPDATE 命令而言,有一個 SET 指示記號出現在某個地方是必要的;同樣地,INSERT 也需要有 VALUES 來搭配。精確的語法規則都在中的章節進行說明。
boolean
boolean
僅接受小寫的 true 和 false
null
(none)
與 SQL NULL 是不同的概念
foo
,
bar
, and
baz
separately; then to do the containment query, it would look for rows containing all three of these items. While GIN indexes can perform such an AND search fairly efficiently, it will still be less specific and slower than the equivalent
jsonb_path_ops
search, especially if there are a very large number of rows containing any single one of the three index items.
Parentheses, which can be used to provide filter expressions or define the order of path evaluation.
.**{level}
.**{start_level to end_level}
Same as .**, but with a filter over nesting levels of JSON hierarchy. Nesting levels are specified as integers. Zero level corresponds to the current object. To access the lowest nesting level, you can use the last keyword. This is a PostgreSQL extension of the SQL/JSON standard.
[subscript, ...]
Array element accessor. subscript can be given in two forms: index or start_index to end_index. The first form returns a single array element by its index. The second form returns an array slice by the range of indexes, including the elements that correspond to the provided start_index and end_index.
The specified index can be an integer, as well as an expression returning a single numeric value, which is automatically cast to integer. Zero index corresponds to the first array element. You can also use the
[*]
Wildcard array element accessor that returns all array elements.
A variable representing the JSON text to be queried (the context item).
$varname
A named variable. Its value can be set by the parameter vars of several JSON processing functions. See Table 9.47 and its notes for details.
@
A variable representing the result of path evaluation in filter expressions.
Accessor Operator
Description
.key
."$varname"
Member accessor that returns an object member with the specified key. If the key name is a named variable starting with $ or does not meet the JavaScript rules of an identifier, it must be enclosed in double quotes as a character string literal.
.*
Wildcard member accessor that returns the values of all members located at the top level of the current object.
.**
Recursive wildcard member accessor that processes all levels of the JSON hierarchy of the current object and returns all the member values, regardless of their nesting level. This is a PostgreSQL extension of the SQL/JSON standard.
PostgreSQL 也支援跳脫字串常數,這些是 SQL 標準的延伸。跳脫字串常數使用的是字母 E (大小寫皆可),緊接著單引號所組成,例如:E'foo'。(如果字串有超過一行的話,也只要在第一個單引號前有 E 就可以了。)在跳脫字串當中,使用倒斜線開頭,就可以使用 C 語言式的倒斜線跳脫字串,通常是一個倒斜線再接一個字元,對應到一個特殊位元組的值,如 Table 4.1 所示。
-- Simple scalar/primitive value
-- Primitive values can be numbers, quoted strings, true, false, or null
SELECT '5'::json;
-- Array of zero or more elements (elements need not be of same type)
SELECT '[1, 2, "foo", null]'::json;
-- Object containing pairs of keys and values
-- Note that object keys must always be quoted strings
SELECT '{"bar": "baz", "balance": 7.77, "active": false}'::json;
-- Arrays and objects can be nested arbitrarily
SELECT '{"foo": [true, "bar"], "tags": {"a": 1, "b": null}}'::json;
-- Simple scalar/primitive values contain only the identical value:
SELECT '"foo"'::jsonb @> '"foo"'::jsonb;
-- The array on the right side is contained within the one on the left:
SELECT '[1, 2, 3]'::jsonb @> '[1, 3]'::jsonb;
-- Order of array elements is not significant, so this is also true:
SELECT '[1, 2, 3]'::jsonb @> '[3, 1]'::jsonb;
-- Duplicate array elements don't matter either:
SELECT '[1, 2, 3]'::jsonb @> '[1, 2, 2]'::jsonb;
-- The object with a single pair on the right side is contained
-- within the object on the left side:
SELECT '{"product": "PostgreSQL", "version": 9.4, "jsonb": true}'::jsonb @> '{"version": 9.4}'::jsonb;
-- The array on the right side is not considered contained within the
-- array on the left, even though a similar array is nested within it:
SELECT '[1, 2, [1, 3]]'::jsonb @> '[1, 3]'::jsonb; -- yields false
-- But with a layer of nesting, it is contained:
SELECT '[1, 2, [1, 3]]'::jsonb @> '[[1, 3]]'::jsonb;
-- Similarly, containment is not reported here:
SELECT '{"foo": {"bar": "baz"}}'::jsonb @> '{"bar": "baz"}'::jsonb; -- yields false
-- A top-level key and an empty object is contained:
SELECT '{"foo": {"bar": "baz"}}'::jsonb @> '{"foo": {}}'::jsonb;
-- This array contains the primitive string value:
SELECT '["foo", "bar"]'::jsonb @> '"bar"'::jsonb;
-- This exception is not reciprocal -- non-containment is reported here:
SELECT '"bar"'::jsonb @> '["bar"]'::jsonb; -- yields false
-- String exists as array element:
SELECT '["foo", "bar", "baz"]'::jsonb ? 'bar';
-- String exists as object key:
SELECT '{"foo": "bar"}'::jsonb ? 'foo';
-- Object values are not considered:
SELECT '{"foo": "bar"}'::jsonb ? 'bar'; -- yields false
-- As with containment, existence must match at the top level:
SELECT '{"foo": {"bar": "baz"}}'::jsonb ? 'bar'; -- yields false
-- A string is considered to exist if it matches a primitive JSON string:
SELECT '"foo"'::jsonb ? 'foo';
SELECT doc->'site_name' FROM websites
WHERE doc @> '{"tags":[{"term":"paris"}, {"term":"food"}]}';
SELECT doc->'site_name' FROM websites
WHERE doc->'tags' @> '[{"term":"paris"}, {"term":"food"}]';
CREATE INDEX idxgin ON api USING GIN (jdoc);
CREATE INDEX idxginp ON api USING GIN (jdoc jsonb_path_ops);
-- Find documents in which the key "company" has value "Magnafone"
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"company": "Magnafone"}';
-- Find documents in which the key "tags" contains key or array element "qui"
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc -> 'tags' ? 'qui';
CREATE INDEX idxgintags ON api USING GIN ((jdoc -> 'tags'));
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @@ '$.tags[*] == "qui"';
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @@ '$.tags[*] ? (@ == "qui")';
-- Find documents in which the key "tags" contains array element "qui"
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"tags": ["qui"]}';
Object > Array > Boolean > Number > String > Null
Object with n pairs > object with n - 1 pairs
Array with n elements > array with n - 1 elements
key-1, value-1, key-2 ...
{ "aa": 1, "c": 1} > {"b": 1, "d": 1}
element-1, element-2 ...
SELECT * FROM MY_TABLE;
UPDATE MY_TABLE SET A = 5;
INSERT INTO MY_TABLE VALUES (3, 'hi there');
如果您的應用程式需要使用上面未列出的其他分割區形式,則可以使用替代方法,例如繼承和 UNION ALL 檢視表。此類方法提供了靈活性,但沒有內建宣告分割區的效能優勢。
5.11.2. Declarative Partitioning
PostgreSQL offers a way to specify how to divide a table into pieces called partitions. The table that is divided is referred to as a partitioned table. The specification consists of the partitioning method and a list of columns or expressions to be used as the partition key.
All rows inserted into a partitioned table will be routed to one of the partitions based on the value of the partition key. Each partition has a subset of the data defined by its partition bounds. The currently supported partitioning methods are range, list, and hash.
Partitions may themselves be defined as partitioned tables, using what is called sub-partitioning. Partitions may have their own indexes, constraints and default values, distinct from those of other partitions. See for more details on creating partitioned tables and partitions.
It is not possible to turn a regular table into a partitioned table or vice versa. However, it is possible to add a regular or partitioned table containing data as a partition of a partitioned table, or remove a partition from a partitioned table turning it into a standalone table; see to learn more about the ATTACH PARTITION and DETACH PARTITION sub-commands.
Individual partitions are linked to the partitioned table with inheritance behind-the-scenes; however, it is not possible to use some of the generic features of inheritance (discussed below) with declaratively partitioned tables or their partitions. For example, a partition cannot have any parents other than the partitioned table it is a partition of, nor can a regular table inherit from a partitioned table making the latter its parent. That means partitioned tables and their partitions do not participate in inheritance with regular tables. Since a partition hierarchy consisting of the partitioned table and its partitions is still an inheritance hierarchy, all the normal rules of inheritance apply as described in with some exceptions, most notably:
Both CHECK and NOT NULL constraints of a partitioned table are always inherited by all its partitions. CHECK constraints that are marked NO INHERIT are not allowed to be created on partitioned tables.
Using ONLY to add or drop a constraint on only the partitioned table is supported as long as there are no partitions. Once partitions exist, using
分割區也可以是外部資料表,儘管它們會有一些普通資料表所沒有的限制。有關更多資訊,請參閱 。
更新資料的分割區主鍵會將其遷移到該筆資料所滿足分割區範圍的其他分割區中。
5.11.2.1. Example
Suppose we are constructing a database for a large ice cream company. The company measures peak temperatures every day as well as ice cream sales in each region. Conceptually, we want a table like:
We know that most queries will access just the last week's, month's or quarter's data, since the main use of this table will be to prepare online reports for management. To reduce the amount of old data that needs to be stored, we decide to only keep the most recent 3 years worth of data. At the beginning of each month we will remove the oldest month's data. In this situation we can use partitioning to help us meet all of our different requirements for the measurements table.
To use declarative partitioning in this case, use the following steps:
Create measurement table as a partitioned table by specifying the PARTITION BY clause, which includes the partitioning method (RANGE in this case) and the list of column(s) to use as the partition key.
You may decide to use multiple columns in the partition key for range partitioning, if desired. Of course, this will often result in a larger number of partitions, each of which is individually smaller. On the other hand, using fewer columns may lead to a coarser-grained partitioning criteria with smaller number of partitions. A query accessing the partitioned table will have to scan fewer partitions if the conditions involve some or all of these columns. For example, consider a table range partitioned using columns lastname and firstname
In the above example we would be creating a new partition each month, so it might be wise to write a script that generates the required DDL automatically.
5.11.2.2. Partition Maintenance
Normally the set of partitions established when initially defining the table are not intended to remain static. It is common to want to remove old partitions of data and periodically add new partitions for new data. One of the most important advantages of partitioning is precisely that it allows this otherwise painful task to be executed nearly instantaneously by manipulating the partition structure, rather than physically moving large amounts of data around.
The simplest option for removing old data is to drop the partition that is no longer necessary:
This can very quickly delete millions of records because it doesn't have to individually delete every record. Note however that the above command requires taking an ACCESS EXCLUSIVE lock on the parent table.
Another option that is often preferable is to remove the partition from the partitioned table but retain access to it as a table in its own right:
This allows further operations to be performed on the data before it is dropped. For example, this is often a useful time to back up the data using COPY, pg_dump, or similar tools. It might also be a useful time to aggregate data into smaller formats, perform other data manipulations, or run reports.
Similarly we can add a new partition to handle new data. We can create an empty partition in the partitioned table just as the original partitions were created above:
As an alternative, it is sometimes more convenient to create the new table outside the partition structure, and make it a proper partition later. This allows the data to be loaded, checked, and transformed prior to it appearing in the partitioned table:
Before running the ATTACH PARTITION command, it is recommended to create a CHECK constraint on the table to be attached matching the desired partition constraint. That way, the system will be able to skip the scan to validate the implicit partition constraint. Without the CHECK constraint, the table will be scanned to validate the partition constraint while holding an ACCESS EXCLUSIVE lock on that partition and a SHARE UPDATE EXCLUSIVE lock on the parent table. It may be desired to drop the redundant CHECK constraint after ATTACH PARTITION is finished.
As explained above, it is possible to create indexes on partitioned tables and they are applied automatically to the entire hierarchy. This is very convenient, as not only the existing partitions will become indexed, but also any partitions that are created in the future will. One limitation is that it's not possible to use the CONCURRENTLY qualifier when creating such a partitioned index. To overcome long lock times, it is possible to use CREATE INDEX ON ONLY the partitioned table; such an index is marked invalid, and the partitions do not get the index applied automatically. The indexes on partitions can be created separately using CONCURRENTLY, and later attached to the index on the parent using ALTER INDEX .. ATTACH PARTITION. Once indexes for all partitions are attached to the parent index, the parent index is marked valid automatically. Example:
This technique can be used with UNIQUE and PRIMARY KEY constraints too; the indexes are created implicitly when the constraint is created. Example:
While the built-in declarative partitioning is suitable for most common use cases, there are some circumstances where a more flexible approach may be useful. Partitioning can be implemented using table inheritance, which allows for several features not supported by declarative partitioning, such as:
For declarative partitioning, partitions must have exactly the same set of columns as the partitioned table, whereas with table inheritance, child tables may have extra columns not present in the parent.
Table inheritance allows for multiple inheritance.
Declarative partitioning only supports range, list and hash partitioning, whereas table inheritance allows data to be divided in a manner of the user's choosing. (Note, however, that if constraint exclusion is unable to prune child tables effectively, query performance might be poor.)
5.11.3.1. Example
We use the same measurement table we used above. To implement partitioning using inheritance, use the following steps:
Create the “master” table, from which all of the “child” tables will inherit. This table will contain no data. Do not define any check constraints on this table, unless you intend them to be applied equally to all child tables. There is no point in defining any indexes or unique constraints on it, either. For our example, the master table is the measurement table as originally defined.
Create several “child” tables that each inherit from the master table. Normally, these tables will not add any columns to the set inherited from the master. Just as with declarative partitioning, these tables are in every way normal PostgreSQL tables (or foreign tables).
As we can see, a complex table hierarchy could require a substantial amount of DDL. In the above example we would be creating a new child table each month, so it might be wise to write a script that generates the required DDL automatically.
5.11.3.2. Maintenance For Inheritance Partitioning
To remove old data quickly, simply drop the child table that is no longer necessary:
To remove the child table from the inheritance hierarchy table but retain access to it as a table in its own right:
To add a new child table to handle new data, create an empty child table just as the original children were created above:
Alternatively, one may want to create and populate the new child table before adding it to the table hierarchy. This could allow data to be loaded, checked, and transformed before being made visible to queries on the parent table.
5.11.3.3. Caveats
The following caveats apply to partitioning implemented using inheritance:
There is no automatic way to verify that all of the CHECK constraints are mutually exclusive. It is safer to create code that generates child tables and creates and/or modifies associated objects than to write each by hand.
Indexes and foreign key constraints apply to single tables and not to their inheritance children, hence they have some to be aware of.
Some or all of the partitions might use index scans instead of full-table sequential scans, but the point here is that there is no need to scan the older partitions at all to answer this query. When we enable partition pruning, we get a significantly cheaper plan that will deliver the same answer:
Note that partition pruning is driven only by the constraints defined implicitly by the partition keys, not by the presence of indexes. Therefore it isn't necessary to define indexes on the key columns. Whether an index needs to be created for a given partition depends on whether you expect that queries that scan the partition will generally scan a large part of the partition or just a small part. An index will be helpful in the latter case but not the former.
Partition pruning can be performed not only during the planning of a given query, but also during its execution. This is useful as it can allow more partitions to be pruned when clauses contain expressions whose values are not known at query planning time, for example, parameters defined in a PREPARE statement, using a value obtained from a subquery, or using a parameterized value on the inner side of a nested loop join. Partition pruning during execution can be performed at any of the following times:
During initialization of the query plan. Partition pruning can be performed here for parameter values which are known during the initialization phase of execution. Partitions which are pruned during this stage will not show up in the query's EXPLAIN or EXPLAIN ANALYZE. It is possible to determine the number of partitions which were removed during this phase by observing the “Subplans Removed” property in the EXPLAIN output.
During actual execution of the query plan. Partition pruning may also be performed here to remove partitions using values which are only known during actual query execution. This includes values from subqueries and values from execution-time parameters such as those from parameterized nested loop joins. Since the value of these parameters may change many times during the execution of the query, partition pruning is performed whenever one of the execution parameters being used by partition pruning changes. Determining if partitions were pruned during this phase requires careful inspection of the
Partition pruning can be disabled using the setting.
Note
Execution-time partition pruning currently only occurs for the Append and MergeAppend node types. It is not yet implemented for the ModifyTable node type, but that is likely to be changed in a future release of PostgreSQL.
5.11.5. Partitioning and Constraint Exclusion
Constraint exclusion is a query optimization technique similar to partition pruning. While it is primarily used for partitioning implemented using the legacy inheritance method, it can be used for other purposes, including with declarative partitioning.
Constraint exclusion works in a very similar way to partition pruning, except that it uses each table's CHECK constraints — which gives it its name — whereas partition pruning uses the table's partition bounds, which exist only in the case of declarative partitioning. Another difference is that constraint exclusion is only applied at plan time; there is no attempt to remove partitions at execution time.
The fact that constraint exclusion uses CHECK constraints, which makes it slow compared to partition pruning, can sometimes be used as an advantage: because constraints can be defined even on declaratively-partitioned tables, in addition to their internal partition bounds, constraint exclusion may be able to elide additional partitions from the query plan.
The default (and recommended) setting of is neither on nor off, but an intermediate setting called partition, which causes the technique to be applied only to queries that are likely to be working on inheritance partitioned tables. The on setting causes the planner to examine CHECK constraints in all queries, even simple ones that are unlikely to benefit.
The following caveats apply to constraint exclusion:
Constraint exclusion is only applied during query planning, unlike partition pruning, which can also be applied during query execution.
Constraint exclusion only works when the query's WHERE clause contains constants (or externally supplied parameters). For example, a comparison against a non-immutable function such as CURRENT_TIMESTAMP cannot be optimized, since the planner cannot know which child table the function's value might fall into at run time.
5.11.6. Declarative Partitioning Best Practices
The choice of how to partition a table should be made carefully as the performance of query planning and execution can be negatively affected by poor design.
One of the most critical design decisions will be the column or columns by which you partition your data. Often the best choice will be to partition by the column or set of columns which most commonly appear in WHERE clauses of queries being executed on the partitioned table. WHERE clause items that match and are compatible with the partition key can be used to prune unneeded partitions. However, you may be forced into making other decisions by requirements for the PRIMARY KEY or a UNIQUE constraint. Removal of unwanted data is also a factor to consider when planning your partitioning strategy. An entire partition can be detached fairly quickly, so it may be beneficial to design the partition strategy in such a way that all data to be removed at once is located in a single partition.
Choosing the target number of partitions that the table should be divided into is also a critical decision to make. Not having enough partitions may mean that indexes remain too large and that data locality remains poor which could result in low cache hit ratios. However, dividing the table into too many partitions can also cause issues. Too many partitions can mean longer query planning times and higher memory consumption during both query planning and execution. When choosing how to partition your table, it's also important to consider what changes may occur in the future. For example, if you choose to have one partition per customer and you currently have a small number of large customers, consider the implications if in several years you instead find yourself with a large number of small customers. In this case, it may be better to choose to partition by HASH and choose a reasonable number of partitions rather than trying to partition by LIST and hoping that the number of customers does not increase beyond what it is practical to partition the data by.
Sub-partitioning can be useful to further divide partitions that are expected to become larger than other partitions, although excessive sub-partitioning can easily lead to large numbers of partitions and can cause the same problems mentioned in the preceding paragraph.
It is also important to consider the overhead of partitioning during query planning and execution. The query planner is generally able to handle partition hierarchies with up to a few thousand partitions fairly well, provided that typical queries allow the query planner to prune all but a small number of partitions. Planning times become longer and memory consumption becomes higher when more partitions remain after the planner performs partition pruning. This is particularly true for the UPDATE and DELETE commands. Another reason to be concerned about having a large number of partitions is that the server's memory consumption may grow significantly over a period of time, especially if many sessions touch large numbers of partitions. That's because each partition requires its metadata to be loaded into the local memory of each session that touches it.
With data warehouse type workloads, it can make sense to use a larger number of partitions than with an OLTP type workload. Generally, in data warehouses, query planning time is less of a concern as the majority of processing time is spent during query execution. With either of these two types of workload, it is important to make the right decisions early, as re-partitioning large quantities of data can be painfully slow. Simulations of the intended workload are often beneficial for optimizing the partitioning strategy. Never assume that more partitions are better than fewer partitions and vice-versa.
舉例來說,count(*) 計算輸入列的個數,而 count(f1) 是計算輸入列中 f1 欄位非空值的個數,因為 count 會忽略空值;然而,count(distinct f1) 則是計算 f1 欄位不重覆又非空值的個數。
通常彙總函數在處理輸入資料時,都是未排序過的。在大多數的情況沒有關係,例如:min 最小值的計算,與其輸入的次序沒有關係。然而,還是有些彙總函數的結果,與其處理次序是有關連的,例如:array_agg 和 string_agg。ORDER BY 字句就可以達到此效果,其與一般查詢語法 ORDER BY 的用法相同,詳細說明在 7.5 節,除非該表示式無法輸出成欄位名稱或數字。舉例如下:
如果 DISTINCT 被加到 ORDER BY 子句裡的話,那麼所有的 ORDER BY 表示式都必須符合彙總函數的參數,也就是說,你不能使用不在 DISTINCT 列表中的表示式來排序。
注意
在彙總函數中使用 DISTINCT 和 ORDER BY,都是 PostgreSQL 的延伸。
把 ORDER BY 放進彙總函數的參數列表中,就如同到目前為止的描述,用於排序輸入值,進行一般性的處理或統計彙總,而排序是選擇性的。有另一種類型的彙總函數稱作有次序彙總,它們就必須要有 ORDER BY 子句,通常就是因為這些函數的計算結果,只會對某些特定次序的資料產生效果。典型的有次序彙總例子,包含排名和累計百分比計算。對於有次序彙總計算,將 ORDER BY 字句寫進 WITHIN GROUP (...) 中,如同上述最後一個語法例子。在 ORDER BY 子句中的表示式會處理每一筆輸入資料,如同一般的彚總函數,然後將其依子句中的表示式計算並排序,最後再依序轉送給彙總函數處理。(這和非處理 WITHIN GROUP 中的 ORDER BY 不同,它們不會再轉送給彙總函數。)如果有在 WITHIN GROUP 之前的表示式的話,稱作直接參數,會和有 ORDER BY 的參數有區分。不像一般的彙總參數,直接參數只會被處理一次,而不是每一筆都一次。這意思是只有在 GROUP BY 中,這些變數才會被彙總處理。這樣的限制就如同直接參數不在彙總表示式之中一樣。直接參數一般用於累計分配,只有在每一次彙整完的值才有意義。直接參數可以是空值,在這個例子中,使用的是 (),而非 (*)。(PostgreSQL 兩種寫法都可以接受,但標準 SQL 只接受前者。)
有次序彙總查詢如下:
這裡包含了 50% 的累計,或是中間數累計,來源是表格 households 的 income 欄位。其中,0.5 是直接參數,它不影響百分累計彙整計算過程。
PARTITION BY 子句將查詢分組成為不同的分區,它們將會分別地被窗函數所處理。PARTITION BY 的行為和查詢語句中的 GROUP BY 很類似,除了它的表示式就只是表示式,而且不能產出欄位名稱或編號。沒有 PARTITION BY 的話,所有的列都會被當作一個分組進行彙總。ORDER BY 子句決定窗函數的處理次序,它也和查詢語句中的 ORDER BY 很類似,但它不能使用輸出的欄位或編號。如果沒有 ORDER BY 的話,就無法保證彙總處理的次序了。
UNBOUNDED PRECEDING 的窗框始於該分區的第一列,同樣地,UNBOUNDED FOLLOWING 意指窗框結束於分區的最後一列。
在 RANGE 模式裡,如果 frame_start 設定為 CURRENT ROW 的話,表示窗框始於目前列同序的那一列(使用 ORDER BY 時,排序相同的那一列),同理,frame_end 設定為 CURRENT ROW 時,表示窗框止於排序相同的列。而在 ROWS 模式時,CURRENT ROW 指的就是自己。
PRECEDING 和 FOLLOWING 兩個設定值,目前只能用在 ROWS 模式。它們指的是窗框的起迄於指定的一個值,表示目前列之前後多少列。而所謂的值,必須是整數表示式而不包含任何變數、彙總函數、或窗函數。其值也不能是空值或負值,但可以為零,表示只處理目前列。
預設的窗框設定是 RANGE UNBOUNDED PRECEDING,和 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW 是一樣的。加上 ORDER BY 的話,這可以讓窗框起於和目前列並列的列;沒有 ORDER BY 的話,所有的列都會在分區裡,因為如此就無法判定次序,表示大家都一樣。
frame_start 的限制是不能使用 UNBOUNDED FOLLOWING,而 frame_end 不能使用 UNBOUNDED PRECEDING。frame_end 的設定也不能先於 frame_start—舉例來說,RANGE BETWEEN CURRENT ROW,使用 PRECEDING 就不可以。
SELECT string_agg(a ORDER BY a, ',') FROM table; -- incorrect
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY income) FROM households;
percentile_cont
-----------------
50489
SELECT
count(*) AS unfiltered,
count(*) FILTER (WHERE i < 5) AS filtered
FROM generate_series(1,10) AS s(i);
unfiltered | filtered
------------+----------
10 | 4
(1 row)
function_name ([expression [, expression ... ]]) [ FILTER ( WHERE filter_clause ) ] OVER window_name
function_name ([expression [, expression ... ]]) [ FILTER ( WHERE filter_clause ) ] OVER ( indow_definition )
function_name ( * ) [ FILTER ( WHERE filter_clause ) ] OVER window_name
function_name ( * ) [ FILTER ( WHERE filter_clause ) ] OVER ( indow_definition )
[ existing_window_name ][ PARTITION BY expression [, ...] ]
[ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
[ frame_clause ]
{ RANGE | ROWS } frame_start
{ RANGE | ROWS } BETWEEN frame_start AND frame_end
UNBOUNDED PRECEDING
value PRECEDING
CURRENT ROW
value FOLLOWING
UNBOUNDED FOLLOWING
CAST ( expression AS type )
expression::type
typename ( expression )
expr COLLATE collation
SELECT a, b, c FROM tbl WHERE ... ORDER BY a COLLATE "C";
SELECT * FROM tbl WHERE a > 'foo' COLLATE "C";
SELECT * FROM tbl WHERE a COLLATE "C" > 'foo';
SELECT * FROM tbl WHERE (a > 'foo') COLLATE "C";
SELECT name, (SELECT max(pop) FROM cities WHERE cities.state = states.name)
FROM states;
SELECT ARRAY(SELECT oid FROM pg_proc WHERE proname LIKE 'bytea%');
array
-----------------------------------------------------------------------
{2011,1954,1948,1952,1951,1244,1950,2005,1949,1953,2006,31,2412,2413}
(1 row)
SELECT ARRAY(SELECT ARRAY[i, i*2] FROM generate_series(1,5) AS a(i));
array
----------------------------------
{{1,2},{2,4},{3,6},{4,8},{5,10}}
(1 row)
SELECT ROW(1,2.5,'this is a test');
SELECT ROW(t.*, 42) FROM t;
SELECT ROW(t.f1, t.f2, 42) FROM t;
CREATE TABLE mytable(f1 int, f2 float, f3 text);
CREATE FUNCTION getf1(mytable) RETURNS int AS 'SELECT $1.f1' LANGUAGE SQL;
-- No cast needed since only one getf1() exists
SELECT getf1(ROW(1,2.5,'this is a test'));
getf1
-------
1
(1 row)
CREATE TYPE myrowtype AS (f1 int, f2 text, f3 numeric);
CREATE FUNCTION getf1(myrowtype) RETURNS int AS 'SELECT $1.f1' LANGUAGE SQL;
-- Now we need a cast to indicate which function to call:
SELECT getf1(ROW(1,2.5,'this is a test'));
ERROR: function getf1(record) is not unique
SELECT getf1(ROW(1,2.5,'this is a test')::mytable);
getf1
-------
1
(1 row)
SELECT getf1(CAST(ROW(11,'this is a test',2.5) AS myrowtype));
getf1
-------
11
(1 row)
SELECT ROW(1,2.5,'this is a test') = ROW(1, 3, 'not the same');
SELECT ROW(table.*) IS NULL FROM table; -- detect all-null rows
SELECT true OR somefunc();
SELECT somefunc() OR true;
SELECT ... WHERE x
>
0 AND y/x
>
1.5;
SELECT ... WHERE CASE WHEN x
>
0 THEN y/x
>
1.5 ELSE false END;
SELECT CASE WHEN x
>
0 THEN x ELSE 1/0 END FROM tab;
SELECT CASE WHEN min(employees) > 0
THEN avg(expenses / employees)
END
FROM departments;
ONLY
will result in an error as adding or dropping constraints on only the partitioned table, when partitions exist, is not supported. Instead, constraints on the partitions themselves can be added and (if they are not present in the parent table) dropped.
As a partitioned table does not have any data directly, attempts to use TRUNCATEONLY on a partitioned table will always return an error.
Partitions cannot have columns that are not present in the parent. It is not possible to specify columns when creating partitions with CREATE TABLE, nor is it possible to add columns to partitions after-the-fact using ALTER TABLE. Tables may be added as a partition with ALTER TABLE ... ATTACH PARTITION only if their columns exactly match the parent.
You cannot drop the NOT NULL constraint on a partition's column if the constraint is present in the parent table.
(in that order) as the partition key.
Create partitions. Each partition's definition must specify the bounds that correspond to the partitioning method and partition key of the parent. Note that specifying bounds such that the new partition's values will overlap with those in one or more existing partitions will cause an error. Inserting data into the parent table that does not map to one of the existing partitions will cause an error; an appropriate partition must be added manually.
Partitions thus created are in every way normal PostgreSQL tables (or, possibly, foreign tables). It is possible to specify a tablespace and storage parameters for each partition separately.
It is not necessary to create table constraints describing partition boundary condition for partitions. Instead, partition constraints are generated implicitly from the partition bound specification whenever there is need to refer to them.
To implement sub-partitioning, specify the PARTITION BY clause in the commands used to create individual partitions, for example:
After creating partitions of measurement_y2006m02, any data inserted into measurement that is mapped to measurement_y2006m02 (or data that is directly inserted into measurement_y2006m02, provided it satisfies its partition constraint) will be further redirected to one of its partitions based on the peaktemp column. The partition key specified may overlap with the parent's partition key, although care should be taken when specifying the bounds of a sub-partition such that the set of data it accepts constitutes a subset of what the partition's own bounds allows; the system does not try to check whether that's really the case.
Create an index on the key column(s), as well as any other indexes you might want, on the partitioned table. (The key index is not strictly necessary, but in most scenarios it is helpful.) This automatically creates one index on each partition, and any partitions you create or attach later will also contain the index.
Ensure that the enable_partition_pruning configuration parameter is not disabled in postgresql.conf. If it is, queries will not be optimized as desired.
Some operations require a stronger lock when using declarative partitioning than when using table inheritance. For example, adding or removing a partition to or from a partitioned table requires taking an ACCESS EXCLUSIVE lock on the parent table, whereas a SHARE UPDATE EXCLUSIVE lock is enough in the case of regular inheritance.
Add non-overlapping table constraints to the child tables to define the allowed key values in each.
Typical examples would be:
Ensure that the constraints guarantee that there is no overlap between the key values permitted in different child tables. A common mistake is to set up range constraints like:
This is wrong since it is not clear which child table the key value 200 belongs in.
It would be better to instead create child tables as follows:
For each child table, create an index on the key column(s), as well as any other indexes you might want.
We want our application to be able to say INSERT INTO measurement ... and have the data be redirected into the appropriate child table. We can arrange that by attaching a suitable trigger function to the master table. If data will be added only to the latest child, we can use a very simple trigger function:
After creating the function, we create a trigger which calls the trigger function:
We must redefine the trigger function each month so that it always points to the current child table. The trigger definition does not need to be updated, however.
We might want to insert data and have the server automatically locate the child table into which the row should be added. We could do this with a more complex trigger function, for example:
The trigger definition is the same as before. Note that each IF test must exactly match the CHECK constraint for its child table.
While this function is more complex than the single-month case, it doesn't need to be updated as often, since branches can be added in advance of being needed.
Note
In practice, it might be best to check the newest child first, if most inserts go into that child. For simplicity, we have shown the trigger's tests in the same order as in other parts of this example.
A different approach to redirecting inserts into the appropriate child table is to set up rules, instead of a trigger, on the master table. For example:
A rule has significantly more overhead than a trigger, but the overhead is paid once per query rather than once per row, so this method might be advantageous for bulk-insert situations. In most cases, however, the trigger method will offer better performance.
Be aware that COPY ignores rules. If you want to use COPY to insert data, you'll need to copy into the correct child table rather than directly into the master. COPY does fire triggers, so you can use it normally if you use the trigger approach.
Another disadvantage of the rule approach is that there is no simple way to force an error if the set of rules doesn't cover the insertion date; the data will silently go into the master table instead.
Ensure that the constraint_exclusion configuration parameter is not disabled in postgresql.conf; otherwise child tables may be accessed unnecessarily.
The schemes shown here assume that the values of a row's key column(s) never change, or at least do not change enough to require it to move to another partition. An
UPDATE
that attempts to do that will fail because of the
CHECK
constraints. If you need to handle such cases, you can put suitable update triggers on the child tables, but it makes management of the structure much more complicated.
If you are using manual VACUUM or ANALYZE commands, don't forget that you need to run them on each child table individually. A command like:
will only process the master table.
INSERT statements with ON CONFLICT clauses are unlikely to work as expected, as the ON CONFLICT action is only taken in case of unique violations on the specified target relation, not its child relations.
Triggers or rules will be needed to route rows to the desired child table, unless the application is explicitly aware of the partitioning scheme. Triggers may be complicated to write, and will be much slower than the tuple routing performed internally by declarative partitioning.
loops
property in the
EXPLAIN ANALYZE
output. Subplans corresponding to different partitions may have different values for it depending on how many times each of them was pruned during execution. Some may be shown as
(never executed)
if they were pruned every time.
Keep the partitioning constraints simple, else the planner may not be able to prove that child tables might not need to be visited. Use simple equality conditions for list partitioning, or simple range tests for range partitioning, as illustrated in the preceding examples. A good rule of thumb is that partitioning constraints should contain only comparisons of the partitioning column(s) to constants using B-tree-indexable operators, because only B-tree-indexable column(s) are allowed in the partition key.
All constraints on all children of the parent table are examined during constraint exclusion, so large numbers of children are likely to increase query planning time considerably. So the legacy inheritance based partitioning will work well with up to perhaps a hundred child tables; don't try to use many thousands of children.
A table expression computes a table. The table expression contains a FROM clause that is optionally followed by WHERE, GROUP BY, and HAVING clauses. Trivial table expressions simply refer to a table on disk, a so-called base table, but more complex expressions can be used to modify or combine base tables in various ways.
The optional WHERE, GROUP BY, and HAVING clauses in the table expression specify a pipeline of successive transformations performed on the table derived in the FROM clause. All these transformations produce a virtual table that provides the rows that are passed to the select list to compute the output rows of the query.
7.2.1. The FROM Clause
The derives a table from one or more other tables given in a comma-separated table reference list.
A table reference can be a table name (possibly schema-qualified), or a derived table such as a subquery, a JOIN construct, or complex combinations of these. If more than one table reference is listed in the FROM clause, the tables are cross-joined (that is, the Cartesian product of their rows is formed; see below). The result of the FROM list is an intermediate virtual table that can then be subject to transformations by the WHERE, GROUP BY, and HAVING clauses and is finally the result of the overall table expression.
When a table reference names a table that is the parent of a table inheritance hierarchy, the table reference produces rows of not only that table but all of its descendant tables, unless the key word ONLY precedes the table name. However, the reference produces only the columns that appear in the named table — any columns added in subtables are ignored.
Instead of writing ONLY before the table name, you can write * after the table name to explicitly specify that descendant tables are included. There is no real reason to use this syntax any more, because searching descendant tables is now always the default behavior. However, it is supported for compatibility with older releases.
7.2.1.1. Joined Tables
A joined table is a table derived from two other (real or derived) tables according to the rules of the particular join type. Inner, outer, and cross-joins are available. The general syntax of a joined table is
Joins of all types can be chained together, or nested: either or both T1 and T2 can be joined tables. Parentheses can be used around JOIN clauses to control the join order. In the absence of parentheses, JOIN clauses nest left-to-right.
Join TypesCross join
For every possible combination of rows from T1 and T2 (i.e., a Cartesian product), the joined table will contain a row consisting of all columns in T1 followed by all columns in T2. If the tables have N and M rows respectively, the joined table will have N * M rows.
FROMT1 CROSS JOIN T2 is equivalent to FROMT1 INNER JOIN T2 ON TRUE (see below). It is also equivalent to FROMT1, T2.
Note
This latter equivalence does not hold exactly when more than two tables appear, because JOIN binds more tightly than comma. For example FROMT1 CROSS JOIN T2 INNER JOIN T3 ON condition is not the same as FROMT1, T2 INNER JOIN T3 ON
The words INNER and OUTER are optional in all forms. INNER is the default; LEFT, RIGHT, and FULL imply an outer join.
The join condition is specified in the ON or USING clause, or implicitly by the word NATURAL. The join condition determines which rows from the two source tables are considered to “match”, as explained in detail below.
The possible types of qualified join are:INNER JOIN
For each row R1 of T1, the joined table has a row for each row in T2 that satisfies the join condition with R1.LEFT OUTER JOIN
First, an inner join is performed. Then, for each row in T1 that does not satisfy the join condition with any row in T2, a joined row is added with null values in columns of T2. Thus, the joined table always has at least one row for each row in T1.RIGHT OUTER JOIN
First, an inner join is performed. Then, for each row in T2 that does not satisfy the join condition with any row in T1, a joined row is added with null values in columns of T1. This is the converse of a left join: the result table will always have a row for each row in T2.FULL OUTER JOIN
First, an inner join is performed. Then, for each row in T1 that does not satisfy the join condition with any row in T2, a joined row is added with null values in columns of T2. Also, for each row of T2 that does not satisfy the join condition with any row in T1, a joined row with null values in the columns of T1 is added.
The ON clause is the most general kind of join condition: it takes a Boolean value expression of the same kind as is used in a WHERE clause. A pair of rows from T1 and _T2_match if the ON expression evaluates to true.
The USING clause is a shorthand that allows you to take advantage of the specific situation where both sides of the join use the same name for the joining column(s). It takes a comma-separated list of the shared column names and forms a join condition that includes an equality comparison for each one. For example, joining T1 and T2_with USING (a, b) produces the join condition ONT1.a =T2.a ANDT1.b =T2
Furthermore, the output of JOIN USING suppresses redundant columns: there is no need to print both of the matched columns, since they must have equal values. While JOIN ON produces all columns from T1 followed by all columns from T2, JOIN USING produces one output column for each of the listed column pairs (in the listed order), followed by any remaining columns from T1, followed by any remaining columns from T2.
Finally, NATURAL is a shorthand form of USING: it forms a USING list consisting of all column names that appear in both input tables. As with USING, these columns appear only once in the output table. If there are no common column names, NATURAL JOIN behaves like JOIN ... ON TRUE, producing a cross-product join.
Note
USING is reasonably safe from column changes in the joined relations since only the listed columns are combined. NATURAL is considerably more risky since any schema changes to either relation that cause a new matching column name to be present will cause the join to combine that new column as well.
To put this together, assume we have tables t1:
and t2:
then we get the following results for the various joins:
The join condition specified with ON can also contain conditions that do not relate directly to the join. This can prove useful for some queries but needs to be thought out carefully. For example:
Notice that placing the restriction in the WHERE clause produces a different result:
This is because a restriction placed in the ON clause is processed before the join, while a restriction placed in the WHERE clause is processed after the join. That does not matter with inner joins, but it matters a lot with outer joins.
7.2.1.2. Table and Column Aliases
A temporary name can be given to tables and complex table references to be used for references to the derived table in the rest of the query. This is called a table alias.
To create a table alias, write
or
The AS key word is optional noise. alias can be any identifier.
A typical application of table aliases is to assign short identifiers to long table names to keep the join clauses readable. For example:
The alias becomes the new name of the table reference so far as the current query is concerned — it is not allowed to refer to the table by the original name elsewhere in the query. Thus, this is not valid:
Table aliases are mainly for notational convenience, but it is necessary to use them when joining a table to itself, e.g.:
Additionally, an alias is required if the table reference is a subquery (see ).
Parentheses are used to resolve ambiguities. In the following example, the first statement assigns the alias b to the second instance of my_table, but the second statement assigns the alias to the result of the join:
Another form of table aliasing gives temporary names to the columns of the table, as well as the table itself:
If fewer column aliases are specified than the actual table has columns, the remaining columns are not renamed. This syntax is especially useful for self-joins or subqueries.
When an alias is applied to the output of a JOIN clause, the alias hides the original name(s) within the JOIN. For example:
is valid SQL, but:
is not valid; the table alias a is not visible outside the alias c.
7.2.1.3. Subqueries
Subqueries specifying a derived table must be enclosed in parentheses and must be assigned a table alias name (as in ). For example:
This example is equivalent to FROM table1 AS alias_name. More interesting cases, which cannot be reduced to a plain join, arise when the subquery involves grouping or aggregation.
A subquery can also be a VALUES list:
Again, a table alias is required. Assigning alias names to the columns of the VALUES list is optional, but is good practice. For more information see .
7.2.1.4. Table Functions
Table functions are functions that produce a set of rows, made up of either base data types (scalar types) or composite data types (table rows). They are used like a table, view, or subquery in the FROM clause of a query. Columns returned by table functions can be included in SELECT, JOIN, or WHERE clauses in the same manner as columns of a table, view, or subquery.
Table functions may also be combined using the ROWS FROM syntax, with the results returned in parallel columns; the number of result rows in this case is that of the largest function result, with smaller results padded with null values to match.
If the WITH ORDINALITY clause is specified, an additional column of type bigint will be added to the function result columns. This column numbers the rows of the function result set, starting from 1. (This is a generalization of the SQL-standard syntax for UNNEST ... WITH ORDINALITY.) By default, the ordinal column is called ordinality, but a different column name can be assigned to it using an AS clause.
The special table function UNNEST may be called with any number of array parameters, and it returns a corresponding number of columns, as if UNNEST () had been called on each parameter separately and combined using the ROWS FROM construct.
If no table_alias is specified, the function name is used as the table name; in the case of a ROWS FROM() construct, the first function's name is used.
If column aliases are not supplied, then for a function returning a base data type, the column name is also the same as the function name. For a function returning a composite type, the result columns get the names of the individual attributes of the type.
Some examples:
In some cases it is useful to define table functions that can return different column sets depending on how they are invoked. To support this, the table function can be declared as returning the pseudo-type record. When such a function is used in a query, the expected row structure must be specified in the query itself, so that the system can know how to parse and plan the query. This syntax looks like:
When not using the ROWS FROM() syntax, the column_definition list replaces the column alias list that could otherwise be attached to the FROM item; the names in the column definitions serve as column aliases. When using the ROWS FROM() syntax, a column_definition list can be attached to each member function separately; or if there is only one member function and no WITH ORDINALITY clause, a column_definition list can be written in place of a column alias list following ROWS FROM().
Consider this example:
The function (part of the module) executes a remote query. It is declared to return record since it might be used for any kind of query. The actual column set must be specified in the calling query so that the parser knows, for example, what * should expand to.
7.2.1.5. LATERAL Subqueries
Subqueries appearing in FROM can be preceded by the key word LATERAL. This allows them to reference columns provided by preceding FROM items. (Without LATERAL, each subquery is evaluated independently and so cannot cross-reference any other FROM item.)
Table functions appearing in FROM can also be preceded by the key word LATERAL, but for functions the key word is optional; the function's arguments can contain references to columns provided by preceding FROM items in any case.
A LATERAL item can appear at top level in the FROM list, or within a JOIN tree. In the latter case it can also refer to any items that are on the left-hand side of a JOIN that it is on the right-hand side of.
When a FROM item contains LATERAL cross-references, evaluation proceeds as follows: for each row of the FROM item providing the cross-referenced column(s), or set of rows of multiple FROM items providing the columns, the LATERAL item is evaluated using that row or row set's values of the columns. The resulting row(s) are joined as usual with the rows they were computed from. This is repeated for each row or set of rows from the column source table(s).
A trivial example of LATERAL is
This is not especially useful since it has exactly the same result as the more conventional
LATERAL is primarily useful when the cross-referenced column is necessary for computing the row(s) to be joined. A common application is providing an argument value for a set-returning function. For example, supposing that vertices(polygon) returns the set of vertices of a polygon, we could identify close-together vertices of polygons stored in a table with:
This query could also be written
or in several other equivalent formulations. (As already mentioned, the LATERAL key word is unnecessary in this example, but we use it for clarity.)
It is often particularly handy to LEFT JOIN to a LATERAL subquery, so that source rows will appear in the result even if the LATERAL subquery produces no rows for them. For example, if get_product_names() returns the names of products made by a manufacturer, but some manufacturers in our table currently produce no products, we could find out which ones those are like this:
7.2.2. The WHERE Clause
The syntax of the is
where search_condition is any value expression (see ) that returns a value of type boolean.
After the processing of the FROM clause is done, each row of the derived virtual table is checked against the search condition. If the result of the condition is true, the row is kept in the output table, otherwise (i.e., if the result is false or null) it is discarded. The search condition typically references at least one column of the table generated in the FROMclause; this is not required, but otherwise the WHERE clause will be fairly useless.
Note
The join condition of an inner join can be written either in the WHEREclause or in the JOIN clause. For example, these table expressions are equivalent:
and:
or perhaps even:
Which one of these you use is mainly a matter of style. The JOIN syntax in the FROM clause is probably not as portable to other SQL database management systems, even though it is in the SQL standard. For outer joins there is no choice: they must be done in the FROM clause. The ON or USING clause of an outer join is not equivalent to a WHERE condition, because it results in the addition of rows (for unmatched input rows) as well as the removal of rows in the final result.
Here are some examples of WHERE clauses:
fdt is the table derived in the FROM clause. Rows that do not meet the search condition of the WHERE clause are eliminated from fdt. Notice the use of scalar subqueries as value expressions. Just like any other query, the subqueries can employ complex table expressions. Notice also how fdt is referenced in the subqueries. Qualifying c1 as fdt.c1 is only necessary if c1 is also the name of a column in the derived input table of the subquery. But qualifying the column name adds clarity even when it is not needed. This example shows how the column naming scope of an outer query extends into its inner queries.
7.2.3. The GROUP BY and HAVING Clauses
After passing the WHERE filter, the derived input table might be subject to grouping, using the GROUP BY clause, and elimination of group rows using the HAVING clause.
The is used to group together those rows in a table that have the same values in all the columns listed. The order in which the columns are listed does not matter. The effect is to combine each set of rows having common values into one group row that represents all rows in the group. This is done to eliminate redundancy in the output and/or compute aggregates that apply to these groups. For instance:
In the second query, we could not have written SELECT * FROM test1 GROUP BY x, because there is no single value for the column y that could be associated with each group. The grouped-by columns can be referenced in the select list since they have a single value in each group.
In general, if a table is grouped, columns that are not listed in GROUP BY cannot be referenced except in aggregate expressions. An example with aggregate expressions is:
Here sum is an aggregate function that computes a single value over the entire group. More information about the available aggregate functions can be found in .
Tip
Grouping without aggregate expressions effectively calculates the set of distinct values in a column. This can also be achieved using the DISTINCTclause (see ).
Here is another example: it calculates the total sales for each product (rather than the total sales of all products):
In this example, the columns product_id, p.name, and p.price must be in the GROUP BY clause since they are referenced in the query select list (but see below). The columns.units does not have to be in the GROUP BY list since it is only used in an aggregate expression (sum(...)), which represents the sales of a product. For each product, the query returns a summary row about all sales of the product.
If the products table is set up so that, say, product_id is the primary key, then it would be enough to group by product_id in the above example, since name and price would be functionally dependent on the product ID, and so there would be no ambiguity about which name and price value to return for each product ID group.
In strict SQL, GROUP BY can only group by columns of the source table but PostgreSQL extends this to also allow GROUP BY to group by columns in the select list. Grouping by value expressions instead of simple column names is also allowed.
If a table has been grouped using GROUP BY, but only certain groups are of interest, the HAVING clause can be used, much like a WHERE clause, to eliminate groups from the result. The syntax is:
Expressions in the HAVING clause can refer both to grouped expressions and to ungrouped expressions (which necessarily involve an aggregate function).
Example:
Again, a more realistic example:
In the example above, the WHERE clause is selecting rows by a column that is not grouped (the expression is only true for sales during the last four weeks), while the HAVINGclause restricts the output to groups with total gross sales over 5000. Note that the aggregate expressions do not necessarily need to be the same in all parts of the query.
If a query contains aggregate function calls, but no GROUP BY clause, grouping still occurs: the result is a single group row (or perhaps no rows at all, if the single row is then eliminated by HAVING). The same is true if it contains a HAVING clause, even without any aggregate function calls or GROUP BY clause.
7.2.4. GROUPING SETS, CUBE, and ROLLUP
More complex grouping operations than those described above are possible using the concept of grouping sets. The data selected by the FROM and WHERE clauses is grouped separately by each specified grouping set, aggregates computed for each group just as for simple GROUP BY clauses, and then the results returned. For example:
Each sublist of GROUPING SETS may specify zero or more columns or expressions and is interpreted the same way as though it were directly in the GROUP BY clause. An empty grouping set means that all rows are aggregated down to a single group (which is output even if no input rows were present), as described above for the case of aggregate functions with no GROUP BY clause.
References to the grouping columns or expressions are replaced by null values in result rows for grouping sets in which those columns do not appear. To distinguish which grouping a particular output row resulted from, see .
A shorthand notation is provided for specifying two common types of grouping set. A clause of the form
represents the given list of expressions and all prefixes of the list including the empty list; thus it is equivalent to
This is commonly used for analysis over hierarchical data; e.g. total salary by department, division, and company-wide total.
A clause of the form
represents the given list and all of its possible subsets (i.e. the power set). Thus
is equivalent to
The individual elements of a CUBE or ROLLUP clause may be either individual expressions, or sublists of elements in parentheses. In the latter case, the sublists are treated as single units for the purposes of generating the individual grouping sets. For example:
is equivalent to
and
is equivalent to
The CUBE and ROLLUP constructs can be used either directly in the GROUP BY clause, or nested inside a GROUPING SETS clause. If one GROUPING SETS clause is nested inside another, the effect is the same as if all the elements of the inner clause had been written directly in the outer clause.
If multiple grouping items are specified in a single GROUP BY clause, then the final list of grouping sets is the cross product of the individual items. For example:
is equivalent to
Note
The construct (a, b) is normally recognized in expressions as a . Within the GROUP BY clause, this does not apply at the top levels of expressions, and (a, b) is parsed as a list of expressions as described above. If for some reason you need a row constructor in a grouping expression, use ROW(a, b).
7.2.5. Window Function Processing
If the query contains any window functions (see , and ), these functions are evaluated after any grouping, aggregation, and HAVING filtering is performed. That is, if the query uses any aggregates, GROUP BY, or HAVING, then the rows seen by the window functions are the group rows instead of the original table rows from FROM/WHERE.
When multiple window functions are used, all the window functions having syntactically equivalent PARTITION BY and ORDER BY clauses in their window definitions are guaranteed to be evaluated in a single pass over the data. Therefore they will see the same sort ordering, even if the ORDER BY does not uniquely determine an ordering. However, no guarantees are made about the evaluation of functions having different PARTITION BY or ORDER BY specifications. (In such cases a sort step is typically required between the passes of window function evaluations, and the sort is not guaranteed to preserve ordering of rows that its ORDER BY sees as equivalent.)
Currently, window functions always require presorted data, and so the query output will be ordered according to one or another of the window functions' PARTITION BY/ORDER BYclauses. It is not recommended to rely on this, however. Use an explicit top-level ORDER BY clause if you want to be sure the results are sorted in a particular way.
CREATE TABLE measurement_y2006m02 PARTITION OF measurement
FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');
CREATE TABLE measurement_y2006m03 PARTITION OF measurement
FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');
...
CREATE TABLE measurement_y2007m11 PARTITION OF measurement
FOR VALUES FROM ('2007-11-01') TO ('2007-12-01');
CREATE TABLE measurement_y2007m12 PARTITION OF measurement
FOR VALUES FROM ('2007-12-01') TO ('2008-01-01')
TABLESPACE fasttablespace;
CREATE TABLE measurement_y2008m01 PARTITION OF measurement
FOR VALUES FROM ('2008-01-01') TO ('2008-02-01')
WITH (parallel_workers = 4)
TABLESPACE fasttablespace;
CREATE INDEX ON measurement (logdate);
CREATE INDEX measurement_y2006m02_logdate ON measurement_y2006m02 (logdate);
CREATE INDEX measurement_y2006m03_logdate ON measurement_y2006m03 (logdate);
CREATE INDEX measurement_y2007m11_logdate ON measurement_y2007m11 (logdate);
CREATE INDEX measurement_y2007m12_logdate ON measurement_y2007m12 (logdate);
CREATE INDEX measurement_y2008m01_logdate ON measurement_y2008m01 (logdate);
CREATE OR REPLACE FUNCTION measurement_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
INSERT INTO measurement_y2008m01 VALUES (NEW.*);
RETURN NULL;
END;
$$
LANGUAGE plpgsql;
CREATE TRIGGER insert_measurement_trigger
BEFORE INSERT ON measurement
FOR EACH ROW EXECUTE FUNCTION measurement_insert_trigger();
CREATE OR REPLACE FUNCTION measurement_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
IF ( NEW.logdate >= DATE '2006-02-01' AND
NEW.logdate < DATE '2006-03-01' ) THEN
INSERT INTO measurement_y2006m02 VALUES (NEW.*);
ELSIF ( NEW.logdate >= DATE '2006-03-01' AND
NEW.logdate < DATE '2006-04-01' ) THEN
INSERT INTO measurement_y2006m03 VALUES (NEW.*);
...
ELSIF ( NEW.logdate >= DATE '2008-01-01' AND
NEW.logdate < DATE '2008-02-01' ) THEN
INSERT INTO measurement_y2008m01 VALUES (NEW.*);
ELSE
RAISE EXCEPTION 'Date out of range. Fix the measurement_insert_trigger() function!';
END IF;
RETURN NULL;
END;
$$
LANGUAGE plpgsql;
ANALYZE measurement;
CREATE TABLE measurement (
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int
);
CREATE TABLE measurement (
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int
) PARTITION BY RANGE (logdate);
DROP TABLE measurement_y2006m02;
ALTER TABLE measurement DETACH PARTITION measurement_y2006m02;
CREATE TABLE measurement_y2008m02 PARTITION OF measurement
FOR VALUES FROM ('2008-02-01') TO ('2008-03-01')
TABLESPACE fasttablespace;
CREATE TABLE measurement_y2008m02
(LIKE measurement INCLUDING DEFAULTS INCLUDING CONSTRAINTS)
TABLESPACE fasttablespace;
ALTER TABLE measurement_y2008m02 ADD CONSTRAINT y2008m02
CHECK ( logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01' );
\copy measurement_y2008m02 from 'measurement_y2008m02'
-- possibly some other data preparation work
ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
FOR VALUES FROM ('2008-02-01') TO ('2008-03-01' );
CREATE INDEX measurement_usls_idx ON ONLY measurement (unitsales);
CREATE INDEX measurement_usls_200602_idx
ON measurement_y2006m02 (unitsales);
ALTER INDEX measurement_usls_idx
ATTACH PARTITION measurement_usls_200602_idx;
...
ALTER TABLE ONLY measurement ADD UNIQUE (city_id, logdate);
ALTER TABLE measurement_y2006m02 ADD UNIQUE (city_id, logdate);
ALTER INDEX measurement_city_id_logdate_key
ATTACH PARTITION measurement_y2006m02_city_id_logdate_key;
...
ALTER TABLE measurement_y2006m02 NO INHERIT measurement;
CREATE TABLE measurement_y2008m02 (
CHECK ( logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01' )
) INHERITS (measurement);
CREATE TABLE measurement_y2008m02
(LIKE measurement INCLUDING DEFAULTS INCLUDING CONSTRAINTS);
ALTER TABLE measurement_y2008m02 ADD CONSTRAINT y2008m02
CHECK ( logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01' );
\copy measurement_y2008m02 from 'measurement_y2008m02'
-- possibly some other data preparation work
ALTER TABLE measurement_y2008m02 INHERIT measurement;
SET enable_partition_pruning = on; -- the default
SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01';
SET enable_partition_pruning = off;
EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01';
QUERY PLAN
-----------------------------------------------------------------------------------
Aggregate (cost=188.76..188.77 rows=1 width=8)
-> Append (cost=0.00..181.05 rows=3085 width=0)
-> Seq Scan on measurement_y2006m02 (cost=0.00..33.12 rows=617 width=0)
Filter: (logdate >= '2008-01-01'::date)
-> Seq Scan on measurement_y2006m03 (cost=0.00..33.12 rows=617 width=0)
Filter: (logdate >= '2008-01-01'::date)
...
-> Seq Scan on measurement_y2007m11 (cost=0.00..33.12 rows=617 width=0)
Filter: (logdate >= '2008-01-01'::date)
-> Seq Scan on measurement_y2007m12 (cost=0.00..33.12 rows=617 width=0)
Filter: (logdate >= '2008-01-01'::date)
-> Seq Scan on measurement_y2008m01 (cost=0.00..33.12 rows=617 width=0)
Filter: (logdate >= '2008-01-01'::date)
SET enable_partition_pruning = on;
EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01';
QUERY PLAN
-----------------------------------------------------------------------------------
Aggregate (cost=37.75..37.76 rows=1 width=8)
-> Seq Scan on measurement_y2008m01 (cost=0.00..33.12 rows=617 width=0)
Filter: (logdate >= '2008-01-01'::date)
CHECK ( x = 1 )
CHECK ( county IN ( 'Oxfordshire', 'Buckinghamshire', 'Warwickshire' ))
CHECK ( outletID >= 100 AND outletID < 200 )
CHECK ( outletID BETWEEN 100 AND 200 )
CHECK ( outletID BETWEEN 200 AND 300 )
CREATE TABLE measurement_y2006m02 (
CHECK ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' )
) INHERITS (measurement);
CREATE TABLE measurement_y2006m03 (
CHECK ( logdate >= DATE '2006-03-01' AND logdate < DATE '2006-04-01' )
) INHERITS (measurement);
...
CREATE TABLE measurement_y2007m11 (
CHECK ( logdate >= DATE '2007-11-01' AND logdate < DATE '2007-12-01' )
) INHERITS (measurement);
CREATE TABLE measurement_y2007m12 (
CHECK ( logdate >= DATE '2007-12-01' AND logdate < DATE '2008-01-01' )
) INHERITS (measurement);
CREATE TABLE measurement_y2008m01 (
CHECK ( logdate >= DATE '2008-01-01' AND logdate < DATE '2008-02-01' )
) INHERITS (measurement);
condition
because the
condition
can reference
T1
in the first case but not the second.Qualified joins
CREATE TABLE measurement_y2006m02 PARTITION OF measurement
FOR VALUES FROM ('2006-02-01') TO ('2006-03-01')
PARTITION BY RANGE (peaktemp);
CREATE RULE measurement_insert_y2006m02 AS
ON INSERT TO measurement WHERE
( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' )
DO INSTEAD
INSERT INTO measurement_y2006m02 VALUES (NEW.*);
...
CREATE RULE measurement_insert_y2008m01 AS
ON INSERT TO measurement WHERE
( logdate >= DATE '2008-01-01' AND logdate < DATE '2008-02-01' )
DO INSTEAD
INSERT INTO measurement_y2008m01 VALUES (NEW.*);
FROM table_reference [, table_reference [, ...]]
T1 join_type T2 [ join_condition ]
T1 CROSS JOIN T2
T1 { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN T2 ON boolean_expression
T1 { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN T2 USING ( join column list )
T1 NATURAL { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN T2
num | name
-----+------
1 | a
2 | b
3 | c
num | value
-----+-------
1 | xxx
3 | yyy
5 | zzz
=> SELECT * FROM t1 CROSS JOIN t2;
num | name | num | value
-----+------+-----+-------
1 | a | 1 | xxx
1 | a | 3 | yyy
1 | a | 5 | zzz
2 | b | 1 | xxx
2 | b | 3 | yyy
2 | b | 5 | zzz
3 | c | 1 | xxx
3 | c | 3 | yyy
3 | c | 5 | zzz
(9 rows)
=> SELECT * FROM t1 INNER JOIN t2 ON t1.num = t2.num;
num | name | num | value
-----+------+-----+-------
1 | a | 1 | xxx
3 | c | 3 | yyy
(2 rows)
=> SELECT * FROM t1 INNER JOIN t2 USING (num);
num | name | value
-----+------+-------
1 | a | xxx
3 | c | yyy
(2 rows)
=> SELECT * FROM t1 NATURAL INNER JOIN t2;
num | name | value
-----+------+-------
1 | a | xxx
3 | c | yyy
(2 rows)
=> SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num;
num | name | num | value
-----+------+-----+-------
1 | a | 1 | xxx
2 | b | |
3 | c | 3 | yyy
(3 rows)
=> SELECT * FROM t1 LEFT JOIN t2 USING (num);
num | name | value
-----+------+-------
1 | a | xxx
2 | b |
3 | c | yyy
(3 rows)
=> SELECT * FROM t1 RIGHT JOIN t2 ON t1.num = t2.num;
num | name | num | value
-----+------+-----+-------
1 | a | 1 | xxx
3 | c | 3 | yyy
| | 5 | zzz
(3 rows)
=> SELECT * FROM t1 FULL JOIN t2 ON t1.num = t2.num;
num | name | num | value
-----+------+-----+-------
1 | a | 1 | xxx
2 | b | |
3 | c | 3 | yyy
| | 5 | zzz
(4 rows)
=> SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num AND t2.value = 'xxx';
num | name | num | value
-----+------+-----+-------
1 | a | 1 | xxx
2 | b | |
3 | c | |
(3 rows)
=> SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num WHERE t2.value = 'xxx';
num | name | num | value
-----+------+-----+-------
1 | a | 1 | xxx
(1 row)
FROM table_reference AS alias
FROM table_reference alias
SELECT * FROM some_very_long_table_name s JOIN another_fairly_long_name a ON s.id = a.num;
SELECT * FROM my_table AS m WHERE my_table.a > 5; -- wrong
SELECT * FROM people AS mother JOIN people AS child ON mother.id = child.mother_id;
SELECT * FROM my_table AS a CROSS JOIN my_table AS b ...
SELECT * FROM (my_table AS a CROSS JOIN my_table) AS b ...
FROM table_reference [AS] alias ( column1 [, column2 [, ...]] )
SELECT a.* FROM my_table AS a JOIN your_table AS b ON ...
SELECT a.* FROM (my_table AS a JOIN your_table AS b ON ...) AS c
FROM (SELECT * FROM table1) AS alias_name
FROM (VALUES ('anne', 'smith'), ('bob', 'jones'), ('joe', 'blow'))
AS names(first, last)
CREATE TABLE foo (fooid int, foosubid int, fooname text);
CREATE FUNCTION getfoo(int) RETURNS SETOF foo AS $$
SELECT * FROM foo WHERE fooid = $1;
$$ LANGUAGE SQL;
SELECT * FROM getfoo(1) AS t1;
SELECT * FROM foo
WHERE foosubid IN (
SELECT foosubid
FROM getfoo(foo.fooid) z
WHERE z.fooid = foo.fooid
);
CREATE VIEW vw_getfoo AS SELECT * FROM getfoo(1);
SELECT * FROM vw_getfoo;
function_call [AS] alias (column_definition [, ... ])
function_call AS [alias] (column_definition [, ... ])
ROWS FROM( ... function_call AS (column_definition [, ... ]) [, ... ] )
SELECT *
FROM dblink('dbname=mydb', 'SELECT proname, prosrc FROM pg_proc')
AS t1(proname name, prosrc text)
WHERE proname LIKE 'bytea%';
SELECT * FROM foo, LATERAL (SELECT * FROM bar WHERE bar.id = foo.bar_id) ss;
SELECT * FROM foo, bar WHERE bar.id = foo.bar_id;
SELECT p1.id, p2.id, v1, v2
FROM polygons p1, polygons p2,
LATERAL vertices(p1.poly) v1,
LATERAL vertices(p2.poly) v2
WHERE (v1 <-> v2) < 10 AND p1.id != p2.id;
SELECT p1.id, p2.id, v1, v2
FROM polygons p1 CROSS JOIN LATERAL vertices(p1.poly) v1,
polygons p2 CROSS JOIN LATERAL vertices(p2.poly) v2
WHERE (v1 <-> v2) < 10 AND p1.id != p2.id;
SELECT m.name
FROM manufacturers m LEFT JOIN LATERAL get_product_names(m.id) pname ON true
WHERE pname IS NULL;
WHERE search_condition
FROM a, b WHERE a.id = b.id AND b.val > 5
FROM a INNER JOIN b ON (a.id = b.id) WHERE b.val > 5
FROM a NATURAL JOIN b WHERE b.val > 5
SELECT ... FROM fdt WHERE c1 > 5
SELECT ... FROM fdt WHERE c1 IN (1, 2, 3)
SELECT ... FROM fdt WHERE c1 IN (SELECT c1 FROM t2)
SELECT ... FROM fdt WHERE c1 IN (SELECT c3 FROM t2 WHERE c2 = fdt.c1 + 10)
SELECT ... FROM fdt WHERE c1 BETWEEN (SELECT c3 FROM t2 WHERE c2 = fdt.c1 + 10) AND 100
SELECT ... FROM fdt WHERE EXISTS (SELECT c1 FROM t2 WHERE c2 > fdt.c1)
SELECT select_list
FROM ...
[WHERE ...]
GROUP BY grouping_column_reference [, grouping_column_reference]...
=> SELECT * FROM test1;
x | y
---+---
a | 3
c | 2
b | 5
a | 1
(4 rows)
=> SELECT x FROM test1 GROUP BY x;
x
---
a
b
c
(3 rows)
=> SELECT x, sum(y) FROM test1 GROUP BY x;
x | sum
---+-----
a | 4
b | 5
c | 2
(3 rows)
SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
FROM products p LEFT JOIN sales s USING (product_id)
GROUP BY product_id, p.name, p.price;
SELECT select_list FROM ... [WHERE ...] GROUP BY ... HAVING boolean_expression
=> SELECT x, sum(y) FROM test1 GROUP BY x HAVING sum(y) > 3;
x | sum
---+-----
a | 4
b | 5
(2 rows)
=> SELECT x, sum(y) FROM test1 GROUP BY x HAVING x < 'c';
x | sum
---+-----
a | 4
b | 5
(2 rows)
SELECT product_id, p.name, (sum(s.units) * (p.price - p.cost)) AS profit
FROM products p LEFT JOIN sales s USING (product_id)
WHERE s.date > CURRENT_DATE - INTERVAL '4 weeks'
GROUP BY product_id, p.name, p.price, p.cost
HAVING sum(p.price * s.units) > 5000;
=> SELECT * FROM items_sold;
brand | size | sales
-------+------+-------
Foo | L | 10
Foo | M | 20
Bar | M | 15
Bar | L | 5
(4 rows)
=> SELECT brand, size, sum(sales) FROM items_sold GROUP BY GROUPING SETS ((brand), (size), ());
brand | size | sum
-------+------+-----
Foo | | 30
Bar | | 20
| L | 15
| M | 35
| | 50
(5 rows)
Table 9.5 列出了可用的數學函數。在該表中,dp 表示雙精確度。這些函數中的許多函數都提供了多種形式,且具有不同的參數型別。除非另有說明,否則函數的任何形式都將回傳與其參數相同的資料型別。使用雙精確度資料的功能主要以主機系統的 C 函式庫實作; 因此,邊界情況下的準確性和行為可能會因主機系統而有所差異。
Table 9.5. Mathematical Functions
Function
Return Type
Description
Example
Result
abs(x)
(same as input)
absolute value
abs(-17.4)
Table 9.6 shows functions for generating random numbers.
Table 9.6. Random Functions
Function
Return Type
Description
random()
dp
random value in the range 0.0 <= x < 1.0
setseed(dp)
void
The random() function uses a simple linear congruential algorithm. It is fast but not suitable for cryptographic applications; see the pgcrypto module for a more secure alternative. If setseed() is called, the results of subsequent random() calls in the current session are repeatable by re-issuing setseed() with the same argument.
Table 9.7 shows the available trigonometric functions. All these functions take arguments and return values of type double precision. Each of the trigonometric functions comes in two variants, one that measures angles in radians and one that measures angles in degrees.
Table 9.7. Trigonometric Functions
Function (radians)
Function (degrees)
Description
acos(x)
acosd(x)
inverse cosine
asin(x)
asind(x)
Note
Another way to work with angles measured in degrees is to use the unit transformation functions radians() and degrees() shown earlier. However, using the degree-based trigonometric functions is preferred, as that way avoids round-off error for special cases such as sind(30).
Table 9.8 shows the available hyperbolic functions. All these functions take arguments and return values of type double precision.
nearest integer greater than or equal to argument (same as ceil)
ceiling(-95.3)
-95
degrees(dp)
dp
radians to degrees
degrees(0.5)
28.6478897565412
div(ynumeric, xnumeric)
numeric
integer quotient of y/x
div(9,4)
2
exp(dp or numeric)
(same as input)
exponential
exp(1.0)
2.71828182845905
floor(dp or numeric)
(same as input)
nearest integer less than or equal to argument
floor(-42.8)
-43
ln(dp or numeric)
(same as input)
natural logarithm
ln(2.0)
0.693147180559945
log(dp or numeric)
(same as input)
base 10 logarithm
log(100.0)
2
log10(dp or numeric)
(same as input)
base 10 logarithm
log10(100.0)
2
log(bnumeric, xnumeric)
numeric
logarithm to base b
log(2.0, 64.0)
6.0000000000
mod(y, x)
(same as argument types)
remainder of y/x
mod(9,4)
1
pi()
dp
“π” constant
pi()
3.14159265358979
power(adp, bdp)
dp
a raised to the power of b
power(9.0, 3.0)
729
power(anumeric, bnumeric)
numeric
a raised to the power of b
power(9.0, 3.0)
729
radians(dp)
dp
degrees to radians
radians(45.0)
0.785398163397448
round(dp or numeric)
(same as input)
round to nearest integer
round(42.4)
42
round(vnumeric, sint)
numeric
round to s decimal places
round(42.4382, 2)
42.44
scale(numeric)
integer
scale of the argument (the number of decimal digits in the fractional part)
scale(8.41)
2
sign(dp or numeric)
(same as input)
sign of the argument (-1, 0, +1)
sign(-8.4)
-1
sqrt(dp or numeric)
(same as input)
square root
sqrt(2.0)
1.4142135623731
trunc(dp or numeric)
(same as input)
truncate toward zero
trunc(42.8)
42
trunc(vnumeric, sint)
numeric
truncate to s decimal places
trunc(42.4382, 2)
42.43
width_bucket(operanddp, b1dp, b2dp, countint)
int
return the bucket number to which operand would be assigned in a histogram having count equal-width buckets spanning the range b1 to b2; returns 0 or count+1 for an input outside the range
return the bucket number to which operand would be assigned in a histogram having count equal-width buckets spanning the range b1 to b2; returns 0 or count+1 for an input outside the range
return the bucket number to which operand would be assigned given an array listing the lower bounds of the buckets; returns 0 for an input less than the first lower bound; the thresholds array must be sorted, smallest first, or unexpected results will be obtained
set seed for subsequent random() calls (value between -1.0 and 1.0, inclusive)
inverse sine
atan(x)
atand(x)
inverse tangent
atan2(y, x)
atan2d(y, x)
inverse tangent of y/x
cos(x)
cosd(x)
cosine
cot(x)
cotd(x)
cotangent
sin(x)
sind(x)
sine
tan(x)
tand(x)
tangent
cosh(x)
hyperbolic cosine
cosh(0)
1
tanh(x)
hyperbolic tangent
tanh(0)
0
asinh(x)
inverse hyperbolic sine
asinh(0)
0
acosh(x)
inverse hyperbolic cosine
acosh(1)
0
atanh(x)
inverse hyperbolic tangent
atanh(0)
0
9.11. 地理資訊函式及運算子
The geometric typespoint,box,lseg,line,path,polygon, andcirclehave a large set of native support functions and operators, shown inTable 9.33,Table 9.34, andTable 9.35.
Caution
Note that the“same as”operator,~=, represents the usual notion of equality for thepoint,box,polygon, andcircletypes. Some of these types also have an=operator, but=compares for equal_areas_only. The other scalar comparison operators (<=and so on) likewise compare areas for these types.
Table 9.33. Geometric Operators
Note
BeforePostgreSQL8.2, the containment operators@>and<@were respectively called~and@. These names are still available, but are deprecated and will eventually be removed.
Table 9.34. Geometric Functions
Table 9.35. Geometric Type Conversion Functions
It is possible to access the two component numbers of apointas though the point were an array with indexes 0 and 1. For example, ift.pis apointcolumn thenSELECT p[0] FROM tretrieves the X coordinate andUPDATE t SET p[1] = ...changes the Y coordinate. In the same way, a value of typeboxorlsegcan be treated as an array of twopointvalues.
Theareafunction works for the typesbox,circle, andpath. Theareafunction only works on thepathdata type if the points in thepathare non-intersecting. For example, thepath'((0,0),(0,1),(2,1),(2,2),(1,2),(1,0),(0,0))'::PATHwill not work; however, the following visually identicalpath'((0,0),(0,1),(1,1),(1,2),(2,2),(2,1),(1,1),(1,0),(0,0))'::PATHwill work. If the concept of an intersecting versus non-intersectingpathis confusing, draw both of the abovepath
Table 9.25 shows the template patterns available for formatting date and time values.
Table 9.25. Template Patterns for Date/Time Formatting
Pattern
Description
HH
hour of day (01-12)
HH12
hour of day (01-12)
HH24
hour of day (00-23)
Modifiers can be applied to any template pattern to alter its behavior. For example, FMMonth is the Month pattern with the FM modifier. Table 9.26 shows the modifier patterns for date/time formatting.
Table 9.26. Template Pattern Modifiers for Date/Time Formatting
Modifier
Description
Example
FM prefix
fill mode (suppress leading zeroes and padding blanks)
FMMonth
TH suffix
upper case ordinal number suffix
Usage notes for date/time formatting:
FM suppresses leading zeroes and trailing blanks that would otherwise be added to make the output of a pattern be fixed-width. In PostgreSQL, FM modifies only the next specification, while in Oracle FM affects all subsequent specifications, and repeated FM modifiers toggle fill mode on and off.
TM does not include trailing blanks. to_timestamp and to_date ignore the TM modifier.
to_timestamp and to_date skip multiple blank spaces at the beginning of the input string and around date and time values unless the FX option is used. For example, to_timestamp(' 2000 JUN', 'YYYY MON') and to_timestamp('2000 - JUN', 'YYYY-MON') work, but to_timestamp('2000 JUN', 'FXYYYY MON') returns an error because to_timestamp expects only a single space. FX must be specified as the first item in the template.
A separator (a space or non-letter/non-digit character) in the template string of to_timestamp and to_date matches any single separator in the input string or is skipped, unless the FX option is used. For example, to_timestamp('2000JUN', 'YYYY///MON') and to_timestamp('2000/JUN', 'YYYY MON') work, but to_timestamp('2000//JUN', 'YYYY/MON') returns an error because the number of separators in the input string exceeds the number of separators in the template.
If FX
A TZH template pattern can match a signed number. Without the FX option, minus signs may be ambiguous, and could be interpreted as a separator. This ambiguity is resolved as follows: If the number of separators before TZH in the template string is less than the number of separators before the minus sign in the input string, the minus sign is interpreted as part of TZH. Otherwise, the minus sign is considered to be a separator between values. For example, to_timestamp('2000 -10', 'YYYY TZH') matches -10 to TZH, but to_timestamp('2000 -10', 'YYYY TZH') matches
Ordinary text is allowed in to_char templates and will be output literally. You can put a substring in double quotes to force it to be interpreted as literal text even if it contains template patterns. For example, in '"Hello Year "YYYY', the YYYY will be replaced by the year data, but the single Y in Year will not be. In to_date, to_number, and to_timestamp, literal text and double-quoted strings result in skipping the number of characters contained in the string; for example "XX"
If you want to have a double quote in the output you must precede it with a backslash, for example '\"YYYY Month\"'. Backslashes are not otherwise special outside of double-quoted strings. Within a double-quoted string, a backslash causes the next character to be taken literally, whatever it is (but this has no special effect unless the next character is a double quote or another backslash).
In to_timestamp and to_date, if the year format specification is less than four digits, e.g. YYY, and the supplied year is less than four digits, the year will be adjusted to be nearest to the year 2020, e.g. 95 becomes 1995.
In to_timestamp and to_date, the YYYY conversion has a restriction when processing years with more than 4 digits. You must use some non-digit character or template after YYYY, otherwise the year is always interpreted as 4 digits. For example (with the year 20000): to_date('200001131', 'YYYYMMDD') will be interpreted as a 4-digit year; instead use a non-digit separator after the year, like to_date('20000-1131', 'YYYY-MMDD') or to_date('20000Nov31', 'YYYYMonDD').
In to_timestamp and to_date, the CC (century) field is accepted but ignored if there is a YYY, YYYY or Y,YYY field. If CC is used with YY or Y then the result is computed as that year in the specified century. If the century is specified but the year is not, the first year of the century is assumed.
In to_timestamp and to_date, weekday names or numbers (DAY, D, and related field types) are accepted but are ignored for purposes of computing the result. The same is true for quarter (Q) fields.
In to_timestamp and to_date, an ISO 8601 week-numbering date (as distinct from a Gregorian date) can be specified in one of two ways:
Year, week number, and weekday: for example to_date('2006-42-4', 'IYYY-IW-ID') returns the date 2006-10-19. If you omit the weekday it is assumed to be 1 (Monday).
In to_timestamp, millisecond (MS) or microsecond (US) fields are used as the seconds digits after the decimal point. For example to_timestamp('12.3', 'SS.MS') is not 3 milliseconds, but 300, because the conversion treats it as 12 + 0.3 seconds. So, for the format SS.MS, the input values 12.3, 12.30, and 12.300 specify the same number of milliseconds. To get three milliseconds, one must write 12.003
to_char(..., 'ID')'s day of the week numbering matches the extract(isodow from ...) function, but to_char(..., 'D')'s does not match extract(dow from ...)'s day numbering.
to_char(interval) formats HH and HH12 as shown on a 12-hour clock, for example zero hours and 36 hours both output as 12, while HH24 outputs the full hour value, which can exceed 23 in an interval value.
Table 9.27 shows the template patterns available for formatting numeric values.
Table 9.27. Template Patterns for Numeric Formatting
Pattern
Description
9
digit position (can be dropped if insignificant)
0
digit position (will not be dropped, even if insignificant)
. (period)
decimal point
Usage notes for numeric formatting:
0 specifies a digit position that will always be printed, even if it contains a leading/trailing zero. 9 also specifies a digit position, but if it is a leading zero then it will be replaced by a space, while if it is a trailing zero and fill mode is specified then it will be deleted. (For to_number(), these two pattern characters are equivalent.)
The pattern characters S, L, D, and G represent the sign, currency symbol, decimal point, and thousands separator characters defined by the current locale (see and ). The pattern characters period and comma represent those exact characters, with the meanings of decimal point and thousands separator, regardless of locale.
If no explicit provision is made for a sign in to_char()'s pattern, one column will be reserved for the sign, and it will be anchored to (appear just left of) the number. If S appears just left of some 9's, it will likewise be anchored to the number.
A sign formatted using SG, PL, or MI is not anchored to the number; for example, to_char(-12, 'MI9999') produces '- 12' but to_char(-12, 'S9999') produces ' -12'. (The Oracle implementation does not allow the use of MI before 9, but rather requires that 9
TH does not convert values less than zero and does not convert fractional numbers.
PL, SG, and TH are PostgreSQL extensions.
In to_number, if non-data template patterns such as L or TH are used, the corresponding number of input characters are skipped, whether or not they match the template pattern, unless they are data characters (that is, digits, sign, decimal point, or comma). For example, TH would skip two non-data characters.
V with to_char multiplies the input values by 10^n, where n is the number of digits following V. V with to_number divides in a similar manner. to_char and to_number
EEEE (scientific notation) cannot be used in combination with any of the other formatting patterns or modifiers other than digit and decimal point patterns, and must be at the end of the format string (e.g., 9.99EEEE is a valid pattern).
Certain modifiers can be applied to any template pattern to alter its behavior. For example, FM99.99 is the 99.99 pattern with the FM modifier. Table 9.28 shows the modifier patterns for numeric formatting.
Table 9.28. Template Pattern Modifiers for Numeric Formatting
Modifier
Description
Example
FM prefix
fill mode (suppress trailing zeroes and padding blanks)
FM99.99
TH suffix
upper case ordinal number suffix
Table 9.29 shows some examples of the use of the to_char function.
SQL 標準中要求 timestamp 的效果等同於 timestamp without time zone,對此 PostgreSQL 尊重這個行為。同時 PostgreSQL 額外擴充了 timestamptz 作為 timestamp with time zone 的縮寫。
time、timestamp 和 interval 接受 p 作為非必須的精度參數,可指定秒的欄位保留的小數位數。預設情況下,精度沒有明確的界限。其中 p 允許的範圍是 0 到 6。
interval 型態有個額外的選項,可以寫下下列其中一個詞組來限制存放的欄位:
需注意若是 fields 和 p 同時指定時,fields 必須要包含 SECOND。這是因為精度只會套用在秒上。
time with time zone 型態是由 SQL 標準所定義的,但是在定義中展示的屬性會導致對有用性產生疑問。在多數狀況下,date、time、timestamp without time zone 和 timestamp with time zone 的組合應該就能提供任何應用程式需要的完整日期/時間功能。
time-of-day 格式包含 time [ (p) ] without time zone和time [ (_p_\) \] with time zone,其中 time 單獨出現時等同於 time without time zone。
這些型態的合法輸入包含了一天當中的時間,以及非必須的時區。(請參照 和)。如果在 time without time zone 的輸入中指定了時區,則時區會被無聲地忽略。你也可以指定日期,但日期也會被忽略,除非你指定的時區名稱是像 America/New_York 這種具有日光節約規則的時區,因為在這種狀況下,為了能夠決定要套用一般規則或是日光節約規則,必須要有日期。適合的時差資訊會被紀錄在 time with time zone 的值當中。
表 8.11. 時間輸入
表 8.12. 時區輸入
關於指定時區的其他資訊,請參照。
8.5.1.3. 時間戳記
時間戳記型態的合法輸入,依序包含了日期、時間、非必須的時區、以及非必須的 AD 或者 BC。 (其中,AD 或者 BC 也可以寫在時區前面,但這並非推薦的格式。)因此:
以及:
都是遵循 ISO 8601 標準的合法值。除此之外,常見的格式:
也有支援。
SQL 標準中,timestamp without time zone 和 timestamp with time zone 字面可以在時間後面加上 “+” 或 “-” 符號和時差來做區別,因此根據這個標準,
是 timestamp without time zone 型態,而
則是 timestamp with time zone 型態。PostgreSQL 從不會在識別型態前就解析字面的內容,因此會將上述兩種值都視為 timestamp without time zone 型態。如要確保字面會被視為 timestamp with time zone,請給它正確而明確的型態:
The output format of the date/time types can be set to one of the four styles ISO 8601, SQL (Ingres), traditional POSTGRES (Unix date format), or German. The default is the ISO format. (The SQL standard requires the use of the ISO 8601 format. The name of the “SQL” output format is a historical accident.) shows examples of each output style. The output of the date and time types is generally only the date or time part in accordance with the given examples. However, the POSTGRES style outputs date-only values in ISO format.
Table 8.14. Date/Time Output Styles
Note
ISO 8601 specifies the use of uppercase letter T to separate the date and time. PostgreSQLaccepts that format on input, but on output it uses a space rather than T, as shown above. This is for readability and for consistency with RFC 3339 as well as some other database systems.
In the SQL and POSTGRES styles, day appears before month if DMY field ordering has been specified, otherwise month appears before day. (See for how this setting also affects interpretation of input values.) shows examples.
Table 8.15. Date Order Conventions
The date/time style can be selected by the user using the SET datestyle command, the parameter in the postgresql.conf configuration file, or the PGDATESTYLE environment variable on the server or client.
The formatting function to_char (see ) is also available as a more flexible way to format date/time output.
8.5.3. Time Zones
Time zones, and time-zone conventions, are influenced by political decisions, not just earth geometry. Time zones around the world became somewhat standardized during the 1900s, but continue to be prone to arbitrary changes, particularly with respect to daylight-savings rules. PostgreSQL uses the widely-used IANA (Olson) time zone database for information about historical time zone rules. For times in the future, the assumption is that the latest known rules for a given time zone will continue to be observed indefinitely far into the future.
PostgreSQL endeavors to be compatible with the SQL standard definitions for typical usage. However, the SQL standard has an odd mix of date and time types and capabilities. Two obvious problems are:
Although the date type cannot have an associated time zone, the time type can. Time zones in the real world have little meaning unless associated with a date as well as a time, since the offset can vary through the year with daylight-saving time boundaries.
The default time zone is specified as a constant numeric offset from UTC. It is therefore impossible to adapt to daylight-saving time when doing date/time arithmetic across DST boundaries.
To address these difficulties, we recommend using date/time types that contain both date and time when using time zones. We do not recommend using the type time with time zone (though it is supported by PostgreSQL for legacy applications and for compliance with the SQL standard). PostgreSQL assumes your local time zone for any type containing only date or time.
All timezone-aware dates and times are stored internally in UTC. They are converted to local time in the zone specified by the configuration parameter before being displayed to the client.
PostgreSQL allows you to specify time zones in three different forms:
A full time zone name, for example America/New_York. The recognized time zone names are listed in the pg_timezone_names view (see ). PostgreSQL uses the widely-used IANA time zone data for this purpose, so the same time zone names are also recognized by much other software.
A time zone abbreviation, for example PST. Such a specification merely defines a particular offset from UTC, in contrast to full time zone names which can imply a set of daylight savings transition-date rules as well. The recognized abbreviations are listed in the pg_timezone_abbrevs
In short, this is the difference between abbreviations and full names: abbreviations represent a specific offset from UTC, whereas many of the full names imply a local daylight-savings time rule, and so have two possible UTC offsets. As an example, 2014-06-04 12:00 America/New_York represents noon local time in New York, which for this particular date was Eastern Daylight Time (UTC-4). So 2014-06-04 12:00 EDT specifies that same time instant. But 2014-06-04 12:00 EST specifies noon Eastern Standard Time (UTC-5), regardless of whether daylight savings was nominally in effect on that date.
To complicate matters, some jurisdictions have used the same timezone abbreviation to mean different UTC offsets at different times; for example, in Moscow MSK has meant UTC+3 in some years and UTC+4 in others. PostgreSQLinterprets such abbreviations according to whatever they meant (or had most recently meant) on the specified date; but, as with the EST example above, this is not necessarily the same as local civil time on that date.
One should be wary that the POSIX-style time zone feature can lead to silently accepting bogus input, since there is no check on the reasonableness of the zone abbreviations. For example, SET TIMEZONE TO FOOBAR0 will work, leaving the system effectively using a rather peculiar abbreviation for UTC. Another issue to keep in mind is that in POSIX time zone names, positive offsets are used for locations west of Greenwich. Everywhere else, PostgreSQLfollows the ISO-8601 convention that positive timezone offsets are east of Greenwich.
In all cases, timezone names and abbreviations are recognized case-insensitively. (This is a change from PostgreSQL versions prior to 8.2, which were case-sensitive in some contexts but not others.)
Neither timezone names nor abbreviations are hard-wired into the server; they are obtained from configuration files stored under .../share/timezone/ and .../share/timezonesets/ of the installation directory (see ).
The configuration parameter can be set in the file postgresql.conf, or in any of the other standard ways described in . There are also some special ways to set it:
The SQL command SET TIME ZONE sets the time zone for the session. This is an alternative spelling of SET TIMEZONE TO with a more SQL-spec-compatible syntax.
The PGTZ environment variable is used by libpq clients to send a SET TIME ZONE command to the server upon connection.
8.5.4. Interval Input
interval values can be written using the following verbose syntax:
where quantity is a number (possibly signed); unit is microsecond, millisecond, second, minute, hour, day, week, month, year, decade
Quantities of days, hours, minutes, and seconds can be specified without explicit unit markings. For example, '1 12:59:10' is read the same as '1 day 12 hours 59 min 10 sec'. Also, a combination of years and months can be specified with a dash; for example '200-10' is read the same as '200 years 10 months'. (These shorter forms are in fact the only ones allowed by the SQL standard, and are used for output when IntervalStyle is set to sql_standard.)
Interval values can also be written as ISO 8601 time intervals, using either the “format with designators” of the standard's section 4.4.3.2 or the “alternative format” of section 4.4.3.3. The format with designators looks like this:
The string must start with a P, and may include a T that introduces the time-of-day units. The available unit abbreviations are given in . Units may be omitted, and may be specified in any order, but units smaller than a day must appear after T. In particular, the meaning of M depends on whether it is before or after T.
Table 8.16. ISO 8601 Interval Unit Abbreviations
In the alternative format:
the string must begin with P, and a T separates the date and time parts of the interval. The values are given as numbers similar to ISO 8601 dates.
When writing an interval constant with a fields specification, or when assigning a string to an interval column that was defined with a fields specification, the interpretation of unmarked quantities depends on the fields. For example INTERVAL '1' YEAR is read as 1 year, whereas INTERVAL '1' means 1 second. Also, field values “to the right” of the least significant field allowed by the fields specification are silently discarded. For example, writing INTERVAL '1 day 2:03:04' HOUR TO MINUTE results in dropping the seconds field, but not the day field.
According to the SQL standard all fields of an interval value must have the same sign, so a leading negative sign applies to all fields; for example the negative sign in the interval literal '-1 2:03:04' applies to both the days and hour/minute/second parts. PostgreSQL allows the fields to have different signs, and traditionally treats each field in the textual representation as independently signed, so that the hour/minute/second part is considered positive in this example. If IntervalStyle is set to sql_standard then a leading sign is considered to apply to all fields (but only if no additional signs appear). Otherwise the traditional PostgreSQL interpretation is used. To avoid ambiguity, it's recommended to attach an explicit sign to each field if any field is negative.
Internally interval values are stored as months, days, and seconds. This is done because the number of days in a month varies, and a day can have 23 or 25 hours if a daylight savings time adjustment is involved. The months and days fields are integers while the seconds field can store fractions. Because intervals are usually created from constant strings or timestamp subtraction, this storage method works well in most cases. Functions justify_days and justify_hours are available for adjusting days and hours that overflow their normal ranges.
In the verbose input format, and in some fields of the more compact input formats, field values can have fractional parts; for example '1.5 week' or '01:02:03.45'. Such input is converted to the appropriate number of months, days, and seconds for storage. When this would result in a fractional number of months or days, the fraction is added to the lower-order fields using the conversion factors 1 month = 30 days and 1 day = 24 hours. For example,'1.5 month' becomes 1 month and 15 days. Only seconds will ever be shown as fractional on output.
shows some examples of valid interval input.
Table 8.17. Interval Input
8.5.5. Interval Output
The output format of the interval type can be set to one of the four styles sql_standard, postgres, postgres_verbose, or iso_8601, using the command SET intervalstyle. The default is the postgres format. shows examples of each output style.
The sql_standard style produces output that conforms to the SQL standard's specification for interval literal strings, if the interval value meets the standard's restrictions (either year-month only or day-time only, with no mixing of positive and negative components). Otherwise the output looks like a standard year-month literal string followed by a day-time literal string, with explicit signs added to disambiguate mixed-sign intervals.
The output of the postgres style matches the output of PostgreSQL releases prior to 8.4 when the parameter was set to ISO.
The output of the postgres_verbose style matches the output of PostgreSQL releases prior to 8.4 when the DateStyle parameter was set to non-ISO output.
The output of the iso_8601 style matches the “format with designators” described in section 4.4.3.2 of the ISO 8601 standard.
Table 8.18. Interval Output Style Examples
is specified, a separator in the template string matches exactly one character in the input string. But note that the input string character is not required to be the same as the separator from the template string. For example,
to_timestamp('2000/JUN', 'FXYYYY MON')
works, but
to_timestamp('2000/JUN', 'FXYYYY MON')
returns an error because the second space in the template string consumes the letter
J
from the input string.
10
to
TZH
.
skips two input characters (whether or not they are
XX
).
Tip
Prior to PostgreSQL 12, it was possible to skip arbitrary text in the input string using non-letter or non-digit characters. For example, to_timestamp('2000y6m1d', 'yyyy-MM-DD') used to work. Now you can only use letter characters for this purpose. For example, to_timestamp('2000y6m1d', 'yyyytMMtDDt') and to_timestamp('2000y6m1d', 'yyyy"y"MM"m"DD"d"') skip y, m, and d.
Year and day of year: for example to_date('2006-291', 'IYYY-IDDD') also returns 2006-10-19.
Attempting to enter a date using a mixture of ISO 8601 week-numbering fields and Gregorian date fields is nonsensical, and will cause an error. In the context of an ISO 8601 week-numbering year, the concept of a “month” or “day of month” has no meaning. In the context of a Gregorian year, the ISO week has no meaning.
Caution
While to_date will reject a mixture of Gregorian and ISO week-numbering date fields, to_char will not, since output format specifications like YYYY-MM-DD (IYYY-IDDD) can be useful. But avoid writing something like IYYY-MM-DD; that would yield surprising results near the start of the year. (See Section 9.9.1 for more information.)
, which the conversion treats as 12 + 0.003 = 12.003 seconds.
Here is a more complex example: to_timestamp('15:12:02.020.001230', 'HH24:MI:SS.MS.US') is 15 hours, 12 minutes, and 2 seconds + 20 milliseconds + 1230 microseconds = 2.021230 seconds.
precede
MI
.)
do not support the use of
V
combined with a decimal point (e.g.,
99.9V99
is not allowed).
MI
minute (00-59)
SS
second (00-59)
MS
millisecond (000-999)
US
microsecond (000000-999999)
SSSS
seconds past midnight (0-86399)
AM, am, PM or pm
meridiem indicator (without periods)
A.M., a.m., P.M. or p.m.
meridiem indicator (with periods)
Y,YYY
year (4 or more digits) with comma
YYYY
year (4 or more digits)
YYY
last 3 digits of year
YY
last 2 digits of year
Y
last digit of year
IYYY
ISO 8601 week-numbering year (4 or more digits)
IYY
last 3 digits of ISO 8601 week-numbering year
IY
last 2 digits of ISO 8601 week-numbering year
I
last digit of ISO 8601 week-numbering year
BC, bc, AD or ad
era indicator (without periods)
B.C., b.c., A.D. or a.d.
era indicator (with periods)
MONTH
full upper case month name (blank-padded to 9 chars)
Month
full capitalized month name (blank-padded to 9 chars)
month
full lower case month name (blank-padded to 9 chars)
MON
abbreviated upper case month name (3 chars in English, localized lengths vary)
Mon
abbreviated capitalized month name (3 chars in English, localized lengths vary)
mon
abbreviated lower case month name (3 chars in English, localized lengths vary)
MM
month number (01-12)
DAY
full upper case day name (blank-padded to 9 chars)
Day
full capitalized day name (blank-padded to 9 chars)
day
full lower case day name (blank-padded to 9 chars)
DY
abbreviated upper case day name (3 chars in English, localized lengths vary)
Dy
abbreviated capitalized day name (3 chars in English, localized lengths vary)
dy
abbreviated lower case day name (3 chars in English, localized lengths vary)
DDD
day of year (001-366)
IDDD
day of ISO 8601 week-numbering year (001-371; day 1 of the year is Monday of the first ISO week)
DD
day of month (01-31)
D
day of the week, Sunday (1) to Saturday (7)
ID
ISO 8601 day of the week, Monday (1) to Sunday (7)
W
week of month (1-5) (the first week starts on the first day of the month)
WW
week number of year (1-53) (the first week starts on the first day of the year)
IW
week number of ISO 8601 week-numbering year (01-53; the first Thursday of the year is in week 1)
CC
century (2 digits) (the twenty-first century starts on 2001-01-01)
J
Julian Day (integer days since November 24, 4714 BC at midnight UTC)
Q
quarter
RM
month in upper case Roman numerals (I-XII; I=January)
rm
month in lower case Roman numerals (i-xii; i=January)
TZ
upper case time-zone abbreviation (only supported in to_char)
tz
lower case time-zone abbreviation (only supported in to_char)
TZH
time-zone hours
TZM
time-zone minutes
OF
time-zone offset from UTC (only supported in to_char)
DDTH, e.g., 12TH
th suffix
lower case ordinal number suffix
DDth, e.g., 12th
FX prefix
fixed format global option (see usage notes)
FX Month DD Day
TM prefix
translation mode (print localized day and month names based on lc_time)
January 2, 2003 in MDY mode; February 1, 2003 in DMY mode; February 3, 2001 in YMD mode
1999-Jan-08
January 8 in any mode
Jan-08-1999
January 8 in any mode
08-Jan-1999
January 8 in any mode
99-Jan-08
January 8 in YMD mode, else error
08-Jan-99
January 8, except error in YMD mode
Jan-08-99
January 8, except error in YMD mode
19990108
ISO 8601; January 8, 1999 in any mode
990108
ISO 8601; January 8, 1999 in any mode
1999.008
year and day of year
J2451187
Julian date
January 8, 99 BC
year 99 BC
040506
ISO 8601
04:05 AM
same as 04:05; AM does not affect value
04:05 PM
same as 16:05; input hour must be <= 12
04:05:06.789-8
ISO 8601
04:05:06-08:00
ISO 8601
04:05-08:00
ISO 8601
040506-08
ISO 8601
04:05:06 PST
time zone specified by abbreviation
2003-04-12 04:05:06 America/New_York
time zone specified by full name
-8:00
ISO-8601 offset for PST
-800
ISO-8601 offset for PST
-8
ISO-8601 offset for PST
zulu
Military abbreviation for UTC
z
Short form of zulu
-infinity
date, timestamp
earlier than all other time stamps
now
date, time, timestamp
current transaction's start time
today
date, timestamp
midnight today
tomorrow
date, timestamp
midnight tomorrow
yesterday
date, timestamp
midnight yesterday
allballs
time
00:00:00.00 UTC
Postgres
original style
Wed Dec 17 07:37:16 1997 PST
German
regional style
17.12.1997 07:37:16.00 PST
Postgres, DMY
day/month/year
Wed 17 Dec 07:37:16 1997 PST
view (see
). You cannot set the configuration parameters
or
to a time zone abbreviation, but you can use abbreviations in date/time input values and with the AT TIME ZONE operator.
In addition to the timezone names and abbreviations, PostgreSQL will accept POSIX-style time zone specifications of the form STDoffset or STDoffsetDST, where STD is a zone abbreviation, offset is a numeric offset in hours west from UTC, and DST is an optional daylight-savings zone abbreviation, assumed to stand for one hour ahead of the given offset. For example, if EST5EDT were not already a recognized zone name, it would be accepted and would be functionally equivalent to United States East Coast time. In this syntax, a zone abbreviation can be a string of letters, or an arbitrary string surrounded by angle brackets (<>). When a daylight-savings zone abbreviation is present, it is assumed to be used according to the same daylight-savings transition rules used in the IANA time zone database's posixrules entry. In a standard PostgreSQL installation, posixrules is the same as US/Eastern, so that POSIX-style time zone specifications follow USA daylight-savings rules. If needed, you can adjust this behavior by replacing the posixrules file.
,
century
,
millennium
, or abbreviations or plurals of these units;
direction
can be
ago
or empty. The at sign (
@
) is optional noise. The amounts of the different units are implicitly added with appropriate sign accounting.
ago
negates all the fields. This syntax is also used for interval output, if
is set to postgres_verbose.
D
Days
H
Hours
M
Minutes (in the time part)
S
Seconds
P1Y2M3DT4H5M6S
ISO 8601 “format with designators”: same meaning as above
P0001-02-03T04:05:06
ISO 8601 “alternative format”: same meaning as above
1 year 2 mons
3 days 04:05:06
-1 year -2 mons +3 days -04:05:06
postgres_verbose
@ 1 year 2 mons
@ 3 days 4 hours 5 mins 6 secs
@ 1 year 2 mons -3 days 4 hours 5 mins 6 secs ago
iso_8601
P1Y2M
P3DT4H5M6S
P-1Y-2M3DT-4H-5M-6S
Resolution
timestamp [ (p) ] [ without time zone ]
8 bytes
both date and time (no time zone)
4713 BC
294276 AD
1 microsecond
timestamp [ (p) ] with time zone
8 bytes
both date and time, with time zone
4713 BC
294276 AD
1 microsecond
date
4 bytes
date (no time of day)
4713 BC
5874897 AD
1 day
time [ (p) ] [ without time zone ]
8 bytes
time of day (no date)
00:00:00
24:00:00
1 microsecond
time [ (p) ] with time zone
12 bytes
time of day (no date), with time zone
00:00:00+1459
24:00:00-1459
1 microsecond
interval [fields ] [ (p) ]
16 bytes
time interval
-178000000 years
178000000 years
1 microsecond
Example
Description
1999-01-08
ISO 8601; January 8 in any mode (recommended format)
January 8, 1999
unambiguous in any datestyle input mode
1/8/1999
January 8 in MDY mode; August 1 in DMY mode
Example
Description
04:05:06.789
ISO 8601
04:05:06
ISO 8601
04:05
ISO 8601
Example
Description
PST
Abbreviation (for Pacific Standard Time)
America/New_York
Full time zone name
PST8PDT
POSIX-style time zone specification
Input String
Valid Types
Description
epoch
date, timestamp
1970-01-01 00:00:00+00 (Unix system time zero)
infinity
date, timestamp
Style Specification
Description
Example
ISO
ISO 8601, SQL standard
1997-12-17 07:37:16-08
SQL
traditional style
datestyle Setting
Input Ordering
Example Output
SQL, DMY
day/month/year
17/12/1997 15:37:16.00 CET
SQL, MDY
month/day/year
Abbreviation
Meaning
Y
Years
M
Months (in the date part)
W
Weeks
Example
Description
1-2
SQL standard format: 1 year 2 months
3 4:05:06
SQL standard format: 3 days 4 hours 5 minutes 6 seconds
1 year 2 months 3 days 4 hours 5 minutes 6 seconds
Traditional Postgres format: 1 year 2 months 3 days 4 hours 5 minutes 6 seconds
YEAR
MONTH
DAY
HOUR
MINUTE
SECOND
YEAR TO MONTH
DAY TO HOUR
DAY TO MINUTE
DAY TO SECOND
HOUR TO MINUTE
HOUR TO SECOND
MINUTE TO SECOND
type [ (p) ] 'value'
1999-01-08 04:05:06
1999-01-08 04:05:06 -8:00
January 8 04:05:06 1999 PST
TIMESTAMP '2004-10-19 10:23:54'
TIMESTAMP '2004-10-19 10:23:54+02'
TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02'
[@] quantity unit [quantity unit...] [direction]
P quantity unit [ quantity unit ...] [ T [ quantity unit ...]]
P [ years-months-days ] [ T hours:minutes:seconds ]
9.9 日期時間函式及運算子
Table 9.31 shows the available functions for date/time value processing, with details appearing in the following subsections. Table 9.30 illustrates the behaviors of the basic arithmetic operators (+, *, etc.). For formatting functions, refer to Section 9.8. You should be familiar with the background information on date/time data types from Section 8.5.
All the functions and operators described below that take time or timestamp inputs actually come in two variants: one that takes time with time zone or timestamp with time zone, and one that takes time without time zone or timestamp without time zone. For brevity, these variants are not shown separately. Also, the + and * operators come in commutative pairs (for example both date + integer and integer + date); we show only one of each such pair.
Table 9.30. Date/Time Operators
Table 9.31. Date/Time Functions
In addition to these functions, the SQL OVERLAPS operator is supported:
This expression yields true when two time periods (defined by their endpoints) overlap, false when they do not overlap. The endpoints can be specified as pairs of dates, times, or time stamps; or as a date, time, or time stamp followed by an interval. When a pair of values is provided, either the start or the end can be written first; OVERLAPS automatically takes the earlier value of the pair as the start. Each time period is considered to represent the half-open interval start<=time<end, unless start and end are equal in which case it represents that single time instant. This means for instance that two time periods with only an endpoint in common do not overlap.
When adding an interval value to (or subtracting an interval value from) a timestamp with time zone value, the days component advances or decrements the date of the timestamp with time zone by the indicated number of days, keeping the time of day the same. Across daylight saving time changes (when the session time zone is set to a time zone that recognizes DST), this means interval '1 day' does not necessarily equal interval '24 hours'. For example, with the session time zone set to America/Denver:
This happens because an hour was skipped due to a change in daylight saving time at 2005-04-03 02:00:00 in time zone America/Denver.
Note there can be ambiguity in the months field returned by age because different months have different numbers of days. PostgreSQL's approach uses the month from the earlier of the two dates when calculating partial months. For example, age('2004-06-01', '2004-04-30') uses April to yield 1 mon 1 day, while using May would yield 1 mon 2 days because May has 31 days, while April has only 30.
Subtraction of dates and timestamps can also be complex. One conceptually simple way to perform subtraction is to convert each value to a number of seconds using EXTRACT(EPOCH FROM ...), then subtract the results; this produces the number of seconds between the two values. This will adjust for the number of days in each month, timezone changes, and daylight saving time adjustments. Subtraction of date or timestamp values with the “-” operator returns the number of days (24-hours) and hours/minutes/seconds between the values, making the same adjustments. The age function returns years, months, days, and hours/minutes/seconds, performing field-by-field subtraction and then adjusting for negative field values. The following queries illustrate the differences in these approaches. The sample results were produced with timezone = 'US/Eastern'; there is a daylight saving time change between the two dates used:
9.9.1. EXTRACT, date_part
The extract function retrieves subfields such as year or hour from date/time values. source must be a value expression of type timestamp, time, or interval. (Expressions of type date are cast to timestamp and can therefore be used as well.) field is an identifier or string that selects what field to extract from the source value. The extract function returns values of type double precision. The following are valid field names:century
The century
The first century starts at 0001-01-01 00:00:00 AD, although they did not know it at the time. This definition applies to all Gregorian calendar countries. There is no century number 0, you go from -1 century to 1 century. If you disagree with this, please write your complaint to: Pope, Cathedral Saint-Peter of Roma, Vatican.day
For timestamp values, the day (of the month) field (1 - 31) ; for interval values, the number of days
decade
The year field divided by 10
dow
The day of the week as Sunday (0) to Saturday (6)
Note that extract's day of the week numbering differs from that of the to_char(..., 'D') function.doy
The day of the year (1 - 365/366)
epoch
For timestamp with time zone values, the number of seconds since 1970-01-01 00:00:00 UTC (can be negative); for date and timestamp values, the number of seconds since 1970-01-01 00:00:00 local time; for interval values, the total number of seconds in the interval
You can convert an epoch value back to a time stamp with to_timestamp:
hour
The hour field (0 - 23)
isodow
The day of the week as Monday (1) to Sunday (7)
This is identical to dow except for Sunday. This matches the ISO 8601 day of the week numbering.isoyear
The ISO 8601 week-numbering year that the date falls in (not applicable to intervals)
Each ISO 8601 week-numbering year begins with the Monday of the week containing the 4th of January, so in early January or late December the ISO year may be different from the Gregorian year. See the week field for more information.
This field is not available in PostgreSQL releases prior to 8.3.microseconds
The seconds field, including fractional parts, multiplied by 1 000 000; note that this includes full seconds
millennium
The millennium
Years in the 1900s are in the second millennium. The third millennium started January 1, 2001.milliseconds
The seconds field, including fractional parts, multiplied by 1000. Note that this includes full seconds.
minute
The minutes field (0 - 59)
month
For timestamp values, the number of the month within the year (1 - 12) ; for interval values, the number of months, modulo 12 (0 - 11)
quarter
The quarter of the year (1 - 4) that the date is in
second
The seconds field, including fractional parts (0 - 59)
timezone
The time zone offset from UTC, measured in seconds. Positive values correspond to time zones east of UTC, negative values to zones west of UTC. (Technically, PostgreSQL does not use UTC because leap seconds are not handled.)timezone_hour
The hour component of the time zone offsettimezone_minute
The minute component of the time zone offsetweek
The number of the ISO 8601 week-numbering week of the year. By definition, ISO weeks start on Mondays and the first week of a year contains January 4 of that year. In other words, the first Thursday of a year is in week 1 of that year.
In the ISO week-numbering system, it is possible for early-January dates to be part of the 52nd or 53rd week of the previous year, and for late-December dates to be part of the first week of the next year. For example, 2005-01-01 is part of the 53rd week of year 2004, and 2006-01-01 is part of the 52nd week of year 2005, while 2012-12-31 is part of the first week of 2013. It's recommended to use the isoyear field together with week to get consistent results.
year
The year field. Keep in mind there is no 0 AD, so subtracting BC years from AD years should be done with care.
Note
When the input value is +/-Infinity, extract returns +/-Infinity for monotonically-increasing fields (epoch, julian, year, isoyear, decade, century, and millennium). For other fields, NULL is returned. PostgreSQL versions before 9.6 returned zero for all cases of infinite input.
The extract function is primarily intended for computational processing. For formatting date/time values for display, see .
The date_part function is modeled on the traditional Ingres equivalent to the SQL-standard function extract:
Note that here the field parameter needs to be a string value, not a name. The valid field names for date_part are the same as for extract.
9.9.2. date_trunc
The function date_trunc is conceptually similar to the trunc function for numbers.
source is a value expression of type timestamp, timestamp with time zone, or interval. (Values of type date and time are cast automatically to timestamp or interval, respectively.) field selects to which precision to truncate the input value. The return value is likewise of type timestamp, timestamp with time zone
Valid values for field are:
When the input value is of type timestamp with time zone, the truncation is performed with respect to a particular time zone; for example, truncation to day produces a value that is midnight in that zone. By default, truncation is done with respect to the current setting, but the optional time_zone argument can be provided to specify a different time zone. The time zone name can be specified in any of the ways described in .
A time zone cannot be specified when processing timestamp without time zone or interval inputs. These are always taken at face value.
Examples (assuming the local time zone is America/New_York):
9.9.3. AT TIME ZONE
The AT TIME ZONE converts time stamp without time zone to/from time stamp with time zone, and time values to different time zones. shows its variants.
Table 9.32. AT TIME ZONE Variants
In these expressions, the desired time zone zone can be specified either as a text string (e.g., 'America/Los_Angeles') or as an interval (e.g., INTERVAL '-08:00'). In the text case, a time zone name can be specified in any of the ways described in .
Examples (assuming the local time zone is America/Los_Angeles):
The first example adds a time zone to a value that lacks it, and displays the value using the current TimeZone setting. The second example shifts the time stamp with time zone value to the specified time zone, and returns the value without a time zone. This allows storage and display of values different from the current TimeZone setting. The third example converts Tokyo time to Chicago time. Converting time values to other time zones uses the currently active time zone rules since no date is supplied.
The function timezone(zone, timestamp) is equivalent to the SQL-conforming construct timestamp AT TIME ZONE zone.
9.9.4. Current Date/Time
PostgreSQL provides a number of functions that return values related to the current date and time. These SQL-standard functions all return values based on the start time of the current transaction:
CURRENT_TIME and CURRENT_TIMESTAMP deliver values with time zone; LOCALTIME and LOCALTIMESTAMP deliver values without time zone.
CURRENT_TIME, CURRENT_TIMESTAMP, LOCALTIME, and LOCALTIMESTAMP can optionally take a precision parameter, which causes the result to be rounded to that many fractional digits in the seconds field. Without a precision parameter, the result is given to the full available precision.
Some examples:
Since these functions return the start time of the current transaction, their values do not change during the transaction. This is considered a feature: the intent is to allow a single transaction to have a consistent notion of the “current” time, so that multiple modifications within the same transaction bear the same time stamp.
Note
Other database systems might advance these values more frequently.
PostgreSQL also provides functions that return the start time of the current statement, as well as the actual current time at the instant the function is called. The complete list of non-SQL-standard time functions is:
transaction_timestamp() is equivalent to CURRENT_TIMESTAMP, but is named to clearly reflect what it returns. statement_timestamp() returns the start time of the current statement (more specifically, the time of receipt of the latest command message from the client). statement_timestamp() and transaction_timestamp() return the same value during the first command of a transaction, but might differ during subsequent commands. clock_timestamp() returns the actual current time, and therefore its value changes even within a single SQL command. timeofday() is a historical PostgreSQL function. Like clock_timestamp(), it returns the actual current time, but as a formatted text string rather than a timestamp with time zone
All the date/time data types also accept the special literal value now to specify the current date and time (again, interpreted as the transaction start time). Thus, the following three all return the same result:
Tip
You do not want to use the third form when specifying a DEFAULT clause while creating a table. The system will convert now to a timestamp as soon as the constant is parsed, so that when the default value is needed, the time of the table creation would be used! The first two forms will not be evaluated until the default value is used, because they are function calls. Thus they will give the desired behavior of defaulting to the time of row insertion.
9.9.5. Delaying Execution
The following functions are available to delay execution of the server process:
pg_sleep makes the current session's process sleep until seconds seconds have elapsed. seconds is a value of type double precision, so fractional-second delays can be specified. pg_sleep_for is a convenience function for larger sleep times specified as an interval. pg_sleep_until is a convenience function for when a specific wake-up time is desired. For example:
Note
The effective resolution of the sleep interval is platform-specific; 0.01 seconds is a common value. The sleep delay will be at least as long as specified. It might be longer depending on factors such as server load. In particular, pg_sleep_until is not guaranteed to wake up exactly at the specified time, but it will not wake up any earlier.
Warning
Make sure that your session does not hold more locks than necessary when calling pg_sleep or its variants. Otherwise other sessions might have to wait for your sleeping process, slowing down the entire system.
60 if leap seconds are implemented by the operating system
Convert given time stamp with time zone to the new time zone, with no time zone designation
9.7. 特徵比對
PostgreSQL 提供了三種不同的特徵比對方法:傳統的 SQL LIKE 運算子,最新的 SIMILAR TO 運算子(於 SQL:1999 中加入)和 POSIX 樣式的正規表示式。除了基本的「這個字串符合這個樣式嗎?」運算子之外,還可以使用函數來提取或替換符合的子字串,以及在配對的位置拆分字串。
使用 SIMILAR TO 方式的搜尋具有相同的安全隱憂,因為 SIMILAR TO 提供了許多與 POSIX 樣式的正規表示式相同功能。
LIKE 搜尋比其他兩個選項要簡單得多,在使用可能惡意的來源時更安全。
9.7.1. LIKE
The LIKE expression returns true if the string matches the supplied pattern. (As expected, the NOT LIKE expression returns false if LIKE returns true, and vice versa. An equivalent expression is NOT (string LIKE pattern).)
If pattern does not contain percent signs or underscores, then the pattern only represents the string itself; in that case LIKE acts like the equals operator. An underscore (_) in pattern stands for (matches) any single character; a percent sign (%) matches any sequence of zero or more characters.
Some examples:
LIKE pattern matching always covers the entire string. Therefore, if it's desired to match a sequence anywhere within a string, the pattern must start and end with a percent sign.
To match a literal underscore or percent sign without matching other characters, the respective character in pattern must be preceded by the escape character. The default escape character is the backslash but a different one can be selected by using the ESCAPE clause. To match the escape character itself, write two escape characters.
Note
If you have turned off, any backslashes you write in literal string constants will need to be doubled. See for more information.
It's also possible to select no escape character by writing ESCAPE ''. This effectively disables the escape mechanism, which makes it impossible to turn off the special meaning of underscore and percent signs in the pattern.
The key word ILIKE can be used instead of LIKE to make the match case-insensitive according to the active locale. This is not in the SQL standard but is a PostgreSQL extension.
The operator ~~ is equivalent to LIKE, and ~~* corresponds to ILIKE. There are also !~~ and !~~* operators that represent NOT LIKE and NOT ILIKE, respectively. All of these operators are PostgreSQL-specific.
There is also the prefix operator ^@ and corresponding starts_with function which covers cases when only searching by beginning of the string is needed.
9.7.2. SIMILAR TO Regular Expressions
The SIMILAR TO operator returns true or false depending on whether its pattern matches the given string. It is similar to LIKE, except that it interprets the pattern using the SQL standard's definition of a regular expression. SQL regular expressions are a curious cross between LIKE notation and common regular expression notation.
Like LIKE, the SIMILAR TO operator succeeds only if its pattern matches the entire string; this is unlike common regular expression behavior where the pattern can match any part of the string. Also like LIKE, SIMILAR TO uses _ and % as wildcard characters denoting any single character and any string, respectively (these are comparable to . and .* in POSIX regular expressions).
In addition to these facilities borrowed from LIKE, SIMILAR TO supports these pattern-matching metacharacters borrowed from POSIX regular expressions:
| denotes alternation (either of two alternatives).
* denotes repetition of the previous item zero or more times.
+
Notice that the period (.) is not a metacharacter for SIMILAR TO.
As with LIKE, a backslash disables the special meaning of any of these metacharacters; or a different escape character can be specified with ESCAPE.
Some examples:
The substring function with three parameters, substring(string from pattern for escape-character), provides extraction of a substring that matches an SQL regular expression pattern. As with SIMILAR TO, the specified pattern must match the entire data string, or else the function fails and returns null. To indicate the part of the pattern that should be returned on success, the pattern must contain two occurrences of the escape character followed by a double quote ("). The text matching the portion of the pattern between these markers is returned.
Some examples, with #" delimiting the return string:
9.7.3. POSIX Regular Expressions
lists the available operators for pattern matching using POSIX regular expressions.
Table 9.14. Regular Expression Match Operators
POSIX regular expressions provide a more powerful means for pattern matching than the LIKE and SIMILAR TO operators. Many Unix tools such as egrep, sed, or awk use a pattern matching language that is similar to the one described here.
A regular expression is a character sequence that is an abbreviated definition of a set of strings (a regular set). A string is said to match a regular expression if it is a member of the regular set described by the regular expression. As with LIKE, pattern characters match string characters exactly unless they are special characters in the regular expression language — but regular expressions use different special characters than LIKE does. Unlike LIKE patterns, a regular expression is allowed to match anywhere within a string, unless the regular expression is explicitly anchored to the beginning or end of the string.
Some examples:
The POSIX pattern language is described in much greater detail below.
The substring function with two parameters, substring(string from pattern), provides extraction of a substring that matches a POSIX regular expression pattern. It returns null if there is no match, otherwise the portion of the text that matched the pattern. But if the pattern contains any parentheses, the portion of the text that matched the first parenthesized subexpression (the one whose left parenthesis comes first) is returned. You can put parentheses around the whole expression if you want to use parentheses within it without triggering this exception. If you need parentheses in the pattern before the subexpression you want to extract, see the non-capturing parentheses described below.
Some examples:
The regexp_replace function provides substitution of new text for substrings that match POSIX regular expression patterns. It has the syntax regexp_replace(source, pattern, replacement [, flags ]). The source string is returned unchanged if there is no match to the pattern. If there is a match, the source string is returned with the
Some examples:
The regexp_match function returns a text array of captured substring(s) resulting from the first match of a POSIX regular expression pattern to a string. It has the syntax regexp_match(string, pattern [, flags ]). If there is no match, the result is NULL. If a match is found, and the pattern contains no parenthesized subexpressions, then the result is a single-element text array containing the substring matching the whole pattern. If a match is found, and the pattern contains parenthesized subexpressions, then the result is a text array whose n'th element is the substring matching the
Some examples:
In the common case where you just want the whole matching substring or NULL for no match, write something like
The regexp_matches function returns a set of text arrays of captured substring(s) resulting from matching a POSIX regular expression pattern to a string. It has the same syntax as regexp_match. This function returns no rows if there is no match, one row if there is a match and the g flag is not given, or N rows if there are N matches and the g flag is given. Each returned row is a text array containing the whole matched substring or the substrings matching parenthesized subexpressions of the pattern, just as described above for regexp_match. regexp_matches accepts all the flags shown in , plus the
Some examples:
Tip
In most cases regexp_matches() should be used with the g flag, since if you only want the first match, it's easier and more efficient to use regexp_match(). However,regexp_match() only exists in PostgreSQL version 10 and up. When working in older versions, a common trick is to place a regexp_matches() call in a sub-select, for example:
This produces a text array if there's a match, or NULL if not, the same as regexp_match()would do. Without the sub-select, this query would produce no output at all for table rows without a match, which is typically not the desired behavior.
The regexp_split_to_table function splits a string using a POSIX regular expression pattern as a delimiter. It has the syntax regexp_split_to_table(string, pattern [, flags ]). If there is no match to the pattern, the function returns the string. If there is at least one match, for each match it returns the text from the end of the last match (or the beginning of the string) to the beginning of the match. When there are no more matches, it returns the text from the end of the last match to the end of the string. The flags parameter is an optional text string containing zero or more single-letter flags that change the function's behavior. regexp_split_to_table
The regexp_split_to_array function behaves the same as regexp_split_to_table, except that regexp_split_to_array returns its result as an array of text. It has the syntax regexp_split_to_array(string, pattern [, flags ]). The parameters are the same as for regexp_split_to_table.
Some examples:
As the last example demonstrates, the regexp split functions ignore zero-length matches that occur at the start or end of the string or immediately after a previous match. This is contrary to the strict definition of regexp matching that is implemented by regexp_match and regexp_matches, but is usually the most convenient behavior in practice. Other software systems such as Perl use similar definitions.
9.7.3.1. Regular Expression Details
PostgreSQL's regular expressions are implemented using a software package written by Henry Spencer. Much of the description of regular expressions below is copied verbatim from his manual.
Regular expressions (REs), as defined in POSIX 1003.2, come in two forms: extended REs or EREs (roughly those of egrep), and basic REs or BREs (roughly those of ed). PostgreSQL supports both forms, and also implements some extensions that are not in the POSIX standard, but have become widely used due to their availability in programming languages such as Perl and Tcl. REs using these non-POSIX extensions are called advanced REs or AREs in this documentation. AREs are almost an exact superset of EREs, but BREs have several notational incompatibilities (as well as being much more limited). We first describe the ARE and ERE forms, noting features that apply only to AREs, and then describe how BREs differ.
Note
PostgreSQL always initially presumes that a regular expression follows the ARE rules. However, the more limited ERE or BRE rules can be chosen by prepending an embedded option to the RE pattern, as described in . This can be useful for compatibility with applications that expect exactly the POSIX 1003.2 rules.
A regular expression is defined as one or more branches, separated by |. It matches anything that matches one of the branches.
A branch is zero or more quantified atoms or constraints, concatenated. It matches a match for the first, followed by a match for the second, etc; an empty branch matches the empty string.
A quantified atom is an atom possibly followed by a single quantifier. Without a quantifier, it matches a match for the atom. With a quantifier, it can match some number of matches of the atom. An atom can be any of the possibilities shown in . The possible quantifiers and their meanings are shown in .
A constraint matches an empty string, but matches only when specific conditions are met. A constraint can be used where an atom could be used, except it cannot be followed by a quantifier. The simple constraints are shown in ; some more constraints are described later.
Table 9.15. Regular Expression Atoms
An RE cannot end with a backslash (\).
Note
If you have turned off, any backslashes you write in literal string constants will need to be doubled. See for more information.
Table 9.16. Regular Expression Quantifiers
The forms using {...} are known as bounds. The numbers m and n within a bound are unsigned decimal integers with permissible values from 0 to 255 inclusive.
Non-greedy quantifiers (available in AREs only) match the same possibilities as their corresponding normal (greedy) counterparts, but prefer the smallest number rather than the largest number of matches. See for more detail.
Note
A quantifier cannot immediately follow another quantifier, e.g., ** is invalid. A quantifier cannot begin an expression or subexpression or follow ^ or |.
Table 9.17. Regular Expression Constraints
Lookahead and lookbehind constraints cannot contain back references (see ), and all parentheses within them are considered non-capturing.
9.7.3.2. Bracket Expressions
A bracket expression is a list of characters enclosed in []. It normally matches any single character from the list (but see below). If the list begins with ^, it matches any single character not from the rest of the list. If two characters in the list are separated by -, this is shorthand for the full range of characters between those two (inclusive) in the collating sequence, e.g., [0-9] in ASCII matches any decimal digit. It is illegal for two ranges to share an endpoint, e.g., a-c-e. Ranges are very collating-sequence-dependent, so portable programs should avoid relying on them.
To include a literal ] in the list, make it the first character (after ^, if that is used). To include a literal -, make it the first or last character, or the second endpoint of a range. To use a literal - as the first endpoint of a range, enclose it in [. and .] to make it a collating element (see below). With the exception of these characters, some combinations using [ (see next paragraphs), and escapes (AREs only), all other special characters lose their special significance within a bracket expression. In particular, \ is not special when following ERE or BRE rules, though it is special (as introducing an escape) in AREs.
Within a bracket expression, a collating element (a character, a multiple-character sequence that collates as if it were a single character, or a collating-sequence name for either) enclosed in [. and .]stands for the sequence of characters of that collating element. The sequence is treated as a single element of the bracket expression's list. This allows a bracket expression containing a multiple-character collating element to match more than one character, e.g., if the collating sequence includes a ch collating element, then the RE [[.ch.]]*c matches the first five characters of chchcc.
Note
PostgreSQL currently does not support multi-character collating elements. This information describes possible future behavior.
Within a bracket expression, a collating element enclosed in [= and =] is an equivalence class, standing for the sequences of characters of all collating elements equivalent to that one, including itself. (If there are no other equivalent collating elements, the treatment is as if the enclosing delimiters were [. and .].) For example, if o and ^ are the members of an equivalence class, then [[=o=]], [[=^=]], and [o^] are all synonymous. An equivalence class cannot be an endpoint of a range.
Within a bracket expression, the name of a character class enclosed in [: and :] stands for the list of all characters belonging to that class. Standard character class names are: alnum, alpha, blank,cntrl, digit, graph, lower, print, punct, space
There are two special cases of bracket expressions: the bracket expressions [[:<:]] and [[:>:]] are constraints, matching empty strings at the beginning and end of a word respectively. A word is defined as a sequence of word characters that is neither preceded nor followed by word characters. A word character is an alnum character (as defined by ctype) or an underscore. This is an extension, compatible with but not specified by POSIX 1003.2, and should be used with caution in software intended to be portable to other systems. The constraint escapes described below are usually preferable; they are no more standard, but are easier to type.
9.7.3.3. Regular Expression Escapes
Escapes are special sequences beginning with \ followed by an alphanumeric character. Escapes come in several varieties: character entry, class shorthands, constraint escapes, and back references. A \ followed by an alphanumeric character but not constituting a valid escape is illegal in AREs. In EREs, there are no escapes: outside a bracket expression, a \ followed by an alphanumeric character merely stands for that character as an ordinary character, and inside a bracket expression, \ is an ordinary character. (The latter is the one actual incompatibility between EREs and AREs.)
Character-entry escapes exist to make it easier to specify non-printing and other inconvenient characters in REs. They are shown in .
Class-shorthand escapes provide shorthands for certain commonly-used character classes. They are shown in .
A constraint escape is a constraint, matching the empty string if specific conditions are met, written as an escape. They are shown in .
A back reference (\n) matches the same string matched by the previous parenthesized subexpression specified by the number n (see ). For example, ([bc])\1 matches bb or cc but not bcor cb. The subexpression must entirely precede the back reference in the RE. Subexpressions are numbered in the order of their leading parentheses. Non-capturing parentheses do not define subexpressions.
Hexadecimal digits are 0-9, a-f, and A-F. Octal digits are 0-7.
Numeric character-entry escapes specifying values outside the ASCII range (0-127) have meanings dependent on the database encoding. When the encoding is UTF-8, escape values are equivalent to Unicode code points, for example \u1234 means the character U+1234. For other multibyte encodings, character-entry escapes usually just specify the concatenation of the byte values for the character. If the escape value does not correspond to any legal character in the database encoding, no error will be raised, but it will never match any data.
The character-entry escapes are always taken as ordinary characters. For example, \135 is ] in ASCII, but \135 does not terminate a bracket expression.
Within bracket expressions, \d, \s, and \w lose their outer brackets, and \D, \S, and \W are illegal. (So, for example, [a-c\d] is equivalent to [a-c[:digit:]]. Also, [a-c\D], which is equivalent to [a-c^[:digit:]], is illegal.)
Table 9.20. Regular Expression Constraint Escapes
A word is defined as in the specification of [[:<:]] and [[:>:]] above. Constraint escapes are illegal within bracket expressions.
Table 9.21. Regular Expression Back References
Note
There is an inherent ambiguity between octal character-entry escapes and back references, which is resolved by the following heuristics, as hinted at above. A leading zero always indicates an octal escape. A single non-zero digit, not followed by another digit, is always taken as a back reference. A multi-digit sequence not starting with a zero is taken as a back reference if it comes after a suitable subexpression (i.e., the number is in the legal range for a back reference), and otherwise is taken as octal.
9.7.3.4. Regular Expression Metasyntax
In addition to the main syntax described above, there are some special forms and miscellaneous syntactic facilities available.
An RE can begin with one of two special director prefixes. If an RE begins with ***:, the rest of the RE is taken as an ARE. (This normally has no effect in PostgreSQL, since REs are assumed to be AREs; but it does have an effect if ERE or BRE mode had been specified by the flags parameter to a regex function.) If an RE begins with ***=, the rest of the RE is taken to be a literal string, with all characters considered ordinary characters.
An ARE can begin with embedded options: a sequence (?xyz) (where xyz is one or more alphabetic characters) specifies options affecting the rest of the RE. These options override any previously determined options — in particular, they can override the case-sensitivity behavior implied by a regex operator, or the flags parameter to a regex function. The available option letters are shown in . Note that these same option letters are used in the flags parameters of regex functions.
Table 9.22. ARE Embedded-option Letters
Embedded options take effect at the ) terminating the sequence. They can appear only at the start of an ARE (after the ***: director if any).
In addition to the usual (tight) RE syntax, in which all characters are significant, there is an expanded syntax, available by specifying the embedded x option. In the expanded syntax, white-space characters in the RE are ignored, as are all characters between a # and the following newline (or the end of the RE). This permits paragraphing and commenting a complex RE. There are three exceptions to that basic rule:
a white-space character or # preceded by \ is retained
white space or # within a bracket expression is retained
For this purpose, white-space characters are blank, tab, newline, and any character that belongs to the space character class.
Finally, in an ARE, outside bracket expressions, the sequence (?#ttt) (where ttt is any text not containing a )) is a comment, completely ignored. Again, this is not allowed between the characters of multi-character symbols, like (?:. Such comments are more a historical artifact than a useful facility, and their use is deprecated; use the expanded syntax instead.
None of these metasyntax extensions is available if an initial ***= director has specified that the user's input be treated as a literal string rather than as an RE.
9.7.3.5. Regular Expression Matching Rules
In the event that an RE could match more than one substring of a given string, the RE matches the one starting earliest in the string. If the RE could match more than one substring starting at that point, either the longest possible match or the shortest possible match will be taken, depending on whether the RE is greedy or non-greedy.
Whether an RE is greedy or not is determined by the following rules:
Most atoms, and all constraints, have no greediness attribute (because they cannot match variable amounts of text anyway).
Adding parentheses around an RE does not change its greediness.
A quantified atom with a fixed-repetition quantifier ({
The above rules associate greediness attributes not only with individual quantified atoms, but with branches and entire REs that contain quantified atoms. What that means is that the matching is done in such a way that the branch, or whole RE, matches the longest or shortest possible substring as a whole. Once the length of the entire match is determined, the part of it that matches any particular subexpression is determined on the basis of the greediness attribute of that subexpression, with subexpressions starting earlier in the RE taking priority over ones starting later.
An example of what this means:
In the first case, the RE as a whole is greedy because Y* is greedy. It can match beginning at the Y, and it matches the longest possible string starting there, i.e., Y123. The output is the parenthesized part of that, or 123. In the second case, the RE as a whole is non-greedy because Y*? is non-greedy. It can match beginning at the Y, and it matches the shortest possible string starting there, i.e., Y1. The subexpression [0-9]{1,3} is greedy but it cannot change the decision as to the overall match length; so it is forced to match just 1.
In short, when an RE contains both greedy and non-greedy subexpressions, the total match length is either as long as possible or as short as possible, according to the attribute assigned to the whole RE. The attributes assigned to the subexpressions only affect how much of that match they are allowed to “eat” relative to each other.
The quantifiers {1,1} and {1,1}? can be used to force greediness or non-greediness, respectively, on a subexpression or a whole RE. This is useful when you need the whole RE to have a greediness attribute different from what's deduced from its elements. As an example, suppose that we are trying to separate a string containing some digits into the digits and the parts before and after them. We might try to do that like this:
That didn't work: the first .* is greedy so it “eats” as much as it can, leaving the \d+ to match at the last possible place, the last digit. We might try to fix that by making it non-greedy:
That didn't work either, because now the RE as a whole is non-greedy and so it ends the overall match as soon as possible. We can get what we want by forcing the RE as a whole to be greedy:
Controlling the RE's overall greediness separately from its components' greediness allows great flexibility in handling variable-length patterns.
When deciding what is a longer or shorter match, match lengths are measured in characters, not collating elements. An empty string is considered longer than no match at all. For example: bb*matches the three middle characters of abbbc; (week|wee)(night|knights) matches all ten characters of weeknights; when (.*).* is matched against abc the parenthesized subexpression matches all three characters; and when (a*)* is matched against bc both the whole RE and the parenthesized subexpression match an empty string.
If case-independent matching is specified, the effect is much as if all case distinctions had vanished from the alphabet. When an alphabetic that exists in multiple cases appears as an ordinary character outside a bracket expression, it is effectively transformed into a bracket expression containing both cases, e.g., x becomes [xX]. When it appears inside a bracket expression, all case counterparts of it are added to the bracket expression, e.g., [x] becomes [xX] and [^x] becomes [^xX].
If newline-sensitive matching is specified, . and bracket expressions using ^ will never match the newline character (so that matches will never cross newlines unless the RE explicitly arranges it) and ^and $ will match the empty string after and before a newline respectively, in addition to matching at beginning and end of string respectively. But the ARE escapes \A and \Z continue to match beginning or end of string only.
If partial newline-sensitive matching is specified, this affects . and bracket expressions as with newline-sensitive matching, but not ^ and $.
If inverse partial newline-sensitive matching is specified, this affects ^ and $ as with newline-sensitive matching, but not . and bracket expressions. This isn't very useful but is provided for symmetry.
9.7.3.6. Limits And Compatibility
No particular limit is imposed on the length of REs in this implementation. However, programs intended to be highly portable should not employ REs longer than 256 bytes, as a POSIX-compliant implementation can refuse to accept such REs.
The only feature of AREs that is actually incompatible with POSIX EREs is that \ does not lose its special significance inside bracket expressions. All other ARE features use syntax which is illegal or has undefined or unspecified effects in POSIX EREs; the *** syntax of directors likewise is outside the POSIX syntax for both BREs and EREs.
Many of the ARE extensions are borrowed from Perl, but some have been changed to clean them up, and a few Perl extensions are not present. Incompatibilities of note include \b, \B, the lack of special treatment for a trailing newline, the addition of complemented bracket expressions to the things affected by newline-sensitive matching, the restrictions on parentheses and back references in lookahead/lookbehind constraints, and the longest/shortest-match (rather than first-match) matching semantics.
Two significant incompatibilities exist between AREs and the ERE syntax recognized by pre-7.4 releases of PostgreSQL:
In AREs, \ followed by an alphanumeric character is either an escape or an error, while in previous releases, it was just another way of writing the alphanumeric. This should not be much of a problem because there was no reason to write such a sequence in earlier releases.
In AREs, \ remains a special character within [], so a literal \ within a bracket expression must be written \\
9.7.3.7. Basic Regular Expressions
BREs differ from EREs in several respects. In BREs, |, +, and ? are ordinary characters and there is no equivalent for their functionality. The delimiters for bounds are \{ and \}, with { and } by themselves ordinary characters. The parentheses for nested subexpressions are \( and \), with ( and ) by themselves ordinary characters.
SELECT (DATE '2001-02-16', DATE '2001-12-21') OVERLAPS
(DATE '2001-10-30', DATE '2002-10-30');
Result: true
SELECT (DATE '2001-02-16', INTERVAL '100 days') OVERLAPS
(DATE '2001-10-30', DATE '2002-10-30');
Result: false
SELECT (DATE '2001-10-29', DATE '2001-10-30') OVERLAPS
(DATE '2001-10-30', DATE '2001-10-31');
Result: false
SELECT (DATE '2001-10-30', DATE '2001-10-30') OVERLAPS
(DATE '2001-10-30', DATE '2001-10-31');
Result: true
SELECT timestamp with time zone '2005-04-02 12:00:00-07' + interval '1 day';
Result: 2005-04-03 12:00:00-06
SELECT timestamp with time zone '2005-04-02 12:00:00-07' + interval '24 hours';
Result: 2005-04-03 13:00:00-06
SELECT EXTRACT(EPOCH FROM timestamptz '2013-07-01 12:00:00') -
EXTRACT(EPOCH FROM timestamptz '2013-03-01 12:00:00');
Result: 10537200
SELECT (EXTRACT(EPOCH FROM timestamptz '2013-07-01 12:00:00') -
EXTRACT(EPOCH FROM timestamptz '2013-03-01 12:00:00'))
/ 60 / 60 / 24;
Result: 121.958333333333
SELECT timestamptz '2013-07-01 12:00:00' - timestamptz '2013-03-01 12:00:00';
Result: 121 days 23:00:00
SELECT age(timestamptz '2013-07-01 12:00:00', timestamptz '2013-03-01 12:00:00');
Result: 4 mons
EXTRACT(field FROM source)
SELECT EXTRACT(CENTURY FROM TIMESTAMP '2000-12-16 12:21:13');
Result: 20
SELECT EXTRACT(CENTURY FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 21
SELECT EXTRACT(DAY FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 16
SELECT EXTRACT(DAY FROM INTERVAL '40 days 1 minute');
Result: 40
SELECT EXTRACT(DECADE FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 200
SELECT EXTRACT(DOW FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 5
SELECT EXTRACT(DOY FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 47
SELECT EXTRACT(EPOCH FROM TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40.12-08');
Result: 982384720.12
SELECT EXTRACT(EPOCH FROM INTERVAL '5 days 3 hours');
Result: 442800
SELECT date_trunc('hour', TIMESTAMP '2001-02-16 20:38:40');
Result: 2001-02-16 20:00:00
SELECT date_trunc('year', TIMESTAMP '2001-02-16 20:38:40');
Result: 2001-01-01 00:00:00
SELECT date_trunc('day', TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40+00');
Result: 2001-02-16 00:00:00-05
SELECT date_trunc('day', TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40+00', 'Australia/Sydney');
Result: 2001-02-16 08:00:00-05
SELECT date_trunc('hour', INTERVAL '3 days 02:47:33');
Result: 3 days 02:00:00
SELECT TIMESTAMP '2001-02-16 20:38:40' AT TIME ZONE 'America/Denver';
Result: 2001-02-16 19:38:40-08
SELECT TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40-05' AT TIME ZONE 'America/Denver';
Result: 2001-02-16 18:38:40
SELECT TIMESTAMP '2001-02-16 20:38:40-05' AT TIME ZONE 'Asia/Tokyo' AT TIME ZONE 'America/Chicago';
Result: 2001-02-16 05:38:40
denotes repetition of the previous item one or more times.
? denotes repetition of the previous item zero or one time.
{m} denotes repetition of the previous item exactly m times.
{m,} denotes repetition of the previous item m or more times.
{m,n} denotes repetition of the previous item at least m and not more than n times.
Parentheses () can be used to group items into a single logical item.
A bracket expression [...] specifies a character class, just as in POSIX regular expressions.
!~
Does not match regular expression, case sensitive
'thomas' !~ '.*Thomas.*'
!~*
Does not match regular expression, case insensitive
'thomas' !~* '.*vadim.*'
replacement
string substituted for the matching substring. The
replacement
string can contain
\
n
, where
n
is 1 through 9, to indicate that the source substring matching the
n
'th parenthesized subexpression of the pattern should be inserted, and it can contain
\&
to indicate that the substring matching the entire pattern should be inserted. Write
\\
if you need to put a literal backslash in the replacement text. The
flags
parameter is an optional text string containing zero or more single-letter flags that change the function's behavior. Flag
i
specifies case-insensitive matching, while flag
g
specifies replacement of each matching substring rather than only the first one. Supported flags (though not
g
) are described in
.
n
'th parenthesized subexpression of the
pattern
(not counting “non-capturing” parentheses; see below for details). The
flags
parameter is an optional text string containing zero or more single-letter flags that change the function's behavior. Supported flags are described in
.
g
flag which commands it to return all matches, not just the first one.
supports the flags described in
.
[chars]
a bracket expression, matching any one of the chars (see for more detail)
\k
(where k is a non-alphanumeric character) matches that character taken as an ordinary character, e.g., \\ matches a backslash character
\c
where c is alphanumeric (possibly followed by other characters) is an escape, see (AREs only; in EREs and BREs, this matches c)
{
when followed by a character other than a digit, matches the left-brace character {; when followed by a digit, it is the beginning of a bound (see below)
x
where x is a single character with no other significance, matches that character
{m}
a sequence of exactly m matches of the atom
{m,}
a sequence of m or more matches of the atom
{m,n}
a sequence of m through n (inclusive) matches of the atom; m cannot exceed n
*?
non-greedy version of *
+?
non-greedy version of +
??
non-greedy version of ?
{m}?
non-greedy version of {m}
{m,}?
non-greedy version of {m,}
{m,n}?
non-greedy version of {m,n}
(?!re)
negative lookahead matches at any point where no substring matching re begins (AREs only)
(?<=re)
positive lookbehind matches at any point where a substring matching re ends (AREs only)
(?<!re)
negative lookbehind matches at any point where no substring matching re ends (AREs only)
,
upper
,
xdigit
. These stand for the character classes defined in ctype. A locale can provide others. A character class cannot be used as an endpoint of a range.
\cX
(where X is any character) the character whose low-order 5 bits are the same as those of X, and whose other bits are all zero
\e
the character whose collating-sequence name is ESC, or failing that, the character with octal value 033
\f
form feed, as in C
\n
newline, as in C
\r
carriage return, as in C
\t
horizontal tab, as in C
\uwxyz
(where wxyz is exactly four hexadecimal digits) the character whose hexadecimal value is 0xwxyz
\Ustuvwxyz
(where stuvwxyz is exactly eight hexadecimal digits) the character whose hexadecimal value is 0xstuvwxyz
\v
vertical tab, as in C
\xhhh
(where hhh is any sequence of hexadecimal digits) the character whose hexadecimal value is 0xhhh (a single character no matter how many hexadecimal digits are used)
\0
the character whose value is 0 (the null byte)
\xy
(where xy is exactly two octal digits, and is not a back reference) the character whose octal value is 0xy
\xyz
(where xyz is exactly three octal digits, and is not a back reference) the character whose octal value is 0xyz
\D
[^[:digit:]]
\S
[^[:space:]]
\W
[^[:alnum:]_] (note underscore is included)
\y
matches only at the beginning or end of a word
\Y
matches only at a point that is not the beginning or end of a word
\Z
matches only at the end of the string (see for how this differs from $)
i
case-insensitive matching (see ) (overrides operator type)
m
historical synonym for n
n
newline-sensitive matching (see )
p
partial newline-sensitive matching (see )
q
rest of RE is a literal (“quoted”) string, all ordinary characters
s
non-newline-sensitive matching (default)
t
tight syntax (default; see below)
w
inverse partial newline-sensitive (“weird”) matching (see )
x
expanded syntax (see below)
white space and comments cannot appear within multi-character symbols, such as (?:
m
}
or
{
m
}?
) has the same greediness (possibly none) as the atom itself.
A quantified atom with other normal quantifiers (including {m,n} with m equal to n) is greedy (prefers longest match).
A quantified atom with a non-greedy quantifier (including {m,n}? with m equal to n) is non-greedy (prefers shortest match).
A branch — that is, an RE that has no top-level | operator — has the same greediness as the first quantified atom in it that has a greediness attribute.
An RE consisting of two or more branches connected by the | operator is always greedy.
.
^
is an ordinary character except at the beginning of the RE or the beginning of a parenthesized subexpression,
$
is an ordinary character except at the end of the RE or the end of a parenthesized subexpression, and
*
is an ordinary character if it appears at the beginning of the RE or the beginning of a parenthesized subexpression (after a possible leading
^
). Finally, single-digit back references are available, and
\<
and
\>
are synonyms for
[[:<:]]
and
[[:>:]]
respectively; no other escapes are available in BREs.
Operator
Description
Example
~
Matches regular expression, case sensitive
'thomas' ~ '.*thomas.*'
~*
Matches regular expression, case insensitive
Atom
Description
(re)
(where re is any regular expression) matches a match for re, with the match noted for possible reporting
(?:re)
as above, but the match is not noted for reporting (a “non-capturing” set of parentheses) (AREs only)
.
matches any single character
Quantifier
Matches
*
a sequence of 0 or more matches of the atom
+
a sequence of 1 or more matches of the atom
?
a sequence of 0 or 1 matches of the atom
Constraint
Description
^
matches at the beginning of the string
$
matches at the end of the string
(?=re)
positive lookahead matches at any point where a substring matching re begins (AREs only)
Escape
Description
\a
alert (bell) character, as in C
\b
backspace, as in C
\B
synonym for backslash (\) to help reduce the need for backslash doubling
Escape
Description
\d
[[:digit:]]
\s
[[:space:]]
\w
[[:alnum:]_] (note underscore is included)
Escape
Description
\A
matches only at the beginning of the string (see Section 9.7.3.5 for how this differs from ^)
\m
matches only at the beginning of a word
\M
matches only at the end of a word
Escape
Description
\m
(where m is a nonzero digit) a back reference to the m'th subexpression
\mnn
(where m is a nonzero digit, and nn is some more digits, and the decimal value mnn is not greater than the number of closing capturing parentheses seen so far) a back reference to the mnn'th subexpression
SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab;
SELECT foo FROM regexp_split_to_table('the quick brown fox jumps over the lazy dog', '\s+') AS foo;
foo
-------
the
quick
brown
fox
jumps
over
the
lazy
dog
(9 rows)
SELECT regexp_split_to_array('the quick brown fox jumps over the lazy dog', '\s+');
regexp_split_to_array
-----------------------------------------------
{the,quick,brown,fox,jumps,over,the,lazy,dog}
(1 row)
SELECT foo FROM regexp_split_to_table('the quick brown fox', '\s*') AS foo;
foo
-----
t
h
e
q
u
i
c
k
b
r
o
w
n
f
o
x
(16 rows)
本節介紹了用於檢查和操作字串的函數和運算子。在這種情況下,字串包括 character、character varying 和 text 型別的值。除非另有說明,否則下面列出的所有函數都可以在這些型別上使用,但是請注意在使用 character 型別時自動空格填充的潛在影響。其中有一些函數還支援對於位元型別的處理。
The function format produces output formatted according to a format string, in a style similar to the C function sprintf.
formatstr is a format string that specifies how the result should be formatted. Text in the format string is copied directly to the result, except where format specifiers are used. Format specifiers act as placeholders in the string, defining how subsequent function arguments should be formatted and inserted into the result. Each formatarg argument is converted to text according to the usual output rules for its data type, and then formatted and inserted into the result string according to the format specifier(s).
Format specifiers are introduced by a % character and have the form
where the component fields are:position (optional)
A string of the form n$ where n is the index of the argument to print. Index 1 means the first argument after formatstr. If the position is omitted, the default is to use the next argument in sequence.flags (optional)
Additional options controlling how the format specifier's output is formatted. Currently the only supported flag is a minus sign (-) which will cause the format specifier's output to be left-justified. This has no effect unless the width field is also specified.width (optional)
Specifies the minimum number of characters to use to display the format specifier's output. The output is padded on the left or right (depending on the - flag) with spaces as needed to fill the width. A too-small width does not cause truncation of the output, but is simply ignored. The width may be specified using any of the following: a positive integer; an asterisk (*) to use the next function argument as the width; or a string of the form *n$ to use the _n_th function argument as the width.
If the width comes from a function argument, that argument is consumed before the argument that is used for the format specifier's value. If the width argument is negative, the result is left aligned (as if the - flag had been specified) within a field of length abs(width).type (required)
The type of format conversion to use to produce the format specifier's output. The following types are supported:
s formats the argument value as a simple string. A null value is treated as an empty string.
I treats the argument value as an SQL identifier, double-quoting it if necessary. It is an error for the value to be null (equivalent to quote_ident).
In addition to the format specifiers described above, the special sequence %% may be used to output a literal % character.
Here are some examples of the basic format conversions:
Here are examples using width fields and the - flag:
These examples show use of position fields:
Unlike the standard C function sprintf, PostgreSQL's format function allows format specifiers with and without position fields to be mixed in the same format string. A format specifier without a position field always uses the next argument after the last argument consumed. In addition, the format function does not require all function arguments to be used in the format string. For example:
The %I and %L format specifiers are particularly useful for safely constructing dynamic SQL statements. See .
string `
_string`_
text
String concatenation
`'Post'
'greSQL'`
PostgreSQL
string `
_non-string_ or _non-string_
_string`_
text
String concatenation with one non-string input
`'Value: '
42`
Value: 42
bit_length(string)
int
Number of bits in string
bit_length('jose')
32
char_length(string) or character_length(string)
int
Number of characters in string
char_length('jose')
4
lower(string)
text
Convert string to lower case
lower('TOM')
tom
octet_length(string)
int
Number of bytes in string
octet_length('jose')
4
overlay(string placing string from int [for int])
text
Replace substring
overlay('Txxxxas' placing 'hom' from 2 for 4)
Thomas
position(substring in string)
int
Location of specified substring
position('om' in 'Thomas')
3
substring(string [from int] [for int])
text
Extract substring
substring('Thomas' from 2 for 3)
hom
substring(string from pattern)
text
Extract substring matching POSIX regular expression. See for more information on pattern matching.
substring('Thomas' from '...$')
mas
substring(string from pattern for escape)
text
Extract substring matching SQL regular expression. See for more information on pattern matching.
substring('Thomas' from '%#"o_a#"_' for '#')
oma
`trim([leading
trailing
both] [_characters_\] from _string`_)
text
Remove the longest string containing only characters from characters (a space by default) from the start, end, or both ends (both is the default) of string
trim(both 'xyz' from 'yxTomxx')
Tom
`trim([leading
trailing
both] [from]_string_ \[, _characters`_] )
text
Non-standard syntax for trim()
trim(both from 'yxTomxx', 'xyz')
Tom
upper(string)
text
Convert string to upper case
upper('tom')
TOM
btrim(stringtext [, characterstext])
text
Remove the longest string consisting only of characters in characters (a space by default) from the start and end of string
btrim('xyxtrimyyx', 'xyz')
trim
chr(int)
text
Character with the given code. For UTF8 the argument is treated as a Unicode code point. For other multibyte encodings the argument must designate an ASCII character. The NULL (0) character is not allowed because text data types cannot store such bytes.
chr(65)
A
concat(str"any" [, str"any" [, ...] ])
text
Concatenate the text representations of all the arguments. NULL arguments are ignored.
Convert string to dest_encoding. The original encoding is specified by src_encoding. The string must be valid in this encoding. Conversions can be defined by CREATE CONVERSION. Also there are some predefined conversions. See for available conversions.
convert('text_in_utf8', 'UTF8', 'LATIN1')
text_in_utf8 represented in Latin-1 encoding (ISO 8859-1)
convert_from(stringbytea, src_encodingname)
text
Convert string to the database encoding. The original encoding is specified by src_encoding. The string must be valid in this encoding.
convert_from('text_in_utf8', 'UTF8')
text_in_utf8 represented in the current database encoding
convert_to(stringtext, dest_encodingname)
bytea
Convert string to dest_encoding.
convert_to('some text', 'UTF8')
some text represented in the UTF8 encoding
decode(stringtext, formattext)
bytea
Decode binary data from textual representation in string. Options for format are same as in encode.
decode('MTIzAAE=', 'base64')
\x3132330001
encode(databytea, formattext)
text
Encode binary data into a textual representation. Supported formats are: base64, hex, escape. escape converts zero bytes and high-bit-set bytes to octal sequences (\nnn) and doubles backslashes.
encode('123\000\001', 'base64')
MTIzAAE=
format(formatstrtext [, formatarg"any" [, ...] ])
text
Format arguments according to a format string. This function is similar to the C function sprintf. See .
format('Hello %s, %1$s', 'World')
Hello World, World
initcap(string)
text
Convert the first letter of each word to upper case and the rest to lower case. Words are sequences of alphanumeric characters separated by non-alphanumeric characters.
initcap('hi THOMAS')
Hi Thomas
left(strtext, nint)
text
Return first n characters in the string. When n is negative, return all but last |n| characters.
left('abcde', 2)
ab
length(string)
int
Number of characters in string
length('jose')
4
length(stringbytea, encodingname )
int
Number of characters in string in the given encoding. The string must be valid in this encoding.
length('jose', 'UTF8')
4
lpad(stringtext, lengthint [, filltext])
text
Fill up the string to length length by prepending the characters fill (a space by default). If the string is already longer than length then it is truncated (on the right).
lpad('hi', 5, 'xy')
xyxhi
ltrim(stringtext [, characterstext])
text
Remove the longest string containing only characters from characters (a space by default) from the start of string
ltrim('zzzytest', 'xyz')
test
md5(string)
text
Calculates the MD5 hash of string, returning the result in hexadecimal
Split qualified_identifier into an array of identifiers, removing any quoting of individual identifiers. By default, extra characters after the last identifier are considered an error; but if the second parameter is false, then such extra characters are ignored. (This behavior is useful for parsing names for objects like functions.) Note that this function does not truncate over-length identifiers. If you want truncation you can cast the result to name[].
parse_ident('"SomeSchema".someTable')
{SomeSchema,sometable}
pg_client_encoding()
name
Current client encoding name
pg_client_encoding()
SQL_ASCII
quote_ident(stringtext)
text
Return the given string suitably quoted to be used as an identifier in an SQL statement string. Quotes are added only if necessary (i.e., if the string contains non-identifier characters or would be case-folded). Embedded quotes are properly doubled. See also .
quote_ident('Foo bar')
"Foo bar"
quote_literal(stringtext)
text
Return the given string suitably quoted to be used as a string literal in an SQL statement string. Embedded single-quotes and backslashes are properly doubled. Note that quote_literal returns null on null input; if the argument might be null, quote_nullable is often more suitable. See also .
quote_literal(E'O\'Reilly')
'O''Reilly'
quote_literal(valueanyelement)
text
Coerce the given value to text and then quote it as a literal. Embedded single-quotes and backslashes are properly doubled.
quote_literal(42.5)
'42.5'
quote_nullable(stringtext)
text
Return the given string suitably quoted to be used as a string literal in an SQL statement string; or, if the argument is null, return NULL. Embedded single-quotes and backslashes are properly doubled. See also .
quote_nullable(NULL)
NULL
quote_nullable(valueanyelement)
text
Coerce the given value to text and then quote it as a literal; or, if the argument is null, return NULL. Embedded single-quotes and backslashes are properly doubled.
Split string using a POSIX regular expression as the delimiter. See for more information.
regexp_split_to_table('hello world', '\s+')
hello
world(2 rows)
repeat(stringtext, numberint)
text
Repeat string the specified number of times
repeat('Pg', 4)
PgPgPgPg
replace(stringtext, fromtext, totext)
text
Replace all occurrences in string of substring from with substring to
replace('abcdefabcdef', 'cd', 'XX')
abXXefabXXef
reverse(str)
text
Return reversed string.
reverse('abcde')
edcba
right(strtext, nint)
text
Return last n characters in the string. When n is negative, return all but first |n| characters.
right('abcde', 2)
de
rpad(stringtext, lengthint [, filltext])
text
Fill up the string to length length by appending the characters fill (a space by default). If the string is already longer than length then it is truncated.
rpad('hi', 5, 'xy')
hixyx
rtrim(stringtext [, characterstext])
text
Remove the longest string containing only characters from characters (a space by default) from the end of string
rtrim('testxxzx', 'xyz')
test
split_part(stringtext, delimitertext, fieldint)
text
Split string on delimiter and return the given field (counting from one)
split_part('abc~@~def~@~ghi', '~@~', 2)
def
strpos(string, substring)
int
Location of specified substring (same as position(substring in string), but note the reversed argument order)
strpos('high', 'ig')
2
substr(string, from [, count])
text
回傳子字串(與 substring(string from from for count) 相同)
substr('alphabet', 3, 2)
ph
starts_with(string, prefix)
bool
Returns true if string starts with prefix.
starts_with('alphabet', 'alph')
t
to_ascii(stringtext [, encodingtext])
text
Convert string to ASCII from another encoding (only supports conversion from LATIN1, LATIN2, LATIN9, and WIN1250 encodings)
to_ascii('Karel')
Karel
to_hex(numberint or bigint)
text
Convert number to its equivalent hexadecimal representation
to_hex(2147483647)
7fffffff
translate(stringtext, fromtext, totext)
text
Any character in string that matches a character in the from set is replaced by the corresponding character in the to set. If from is longer than to, occurrences of the extra characters in from are removed.
translate('12345', '143', 'ax')
a2x5
big5_to_euc_tw
BIG5
EUC_TW
big5_to_mic
BIG5
MULE_INTERNAL
big5_to_utf8
BIG5
UTF8
euc_cn_to_mic
EUC_CN
MULE_INTERNAL
euc_cn_to_utf8
EUC_CN
UTF8
euc_jp_to_mic
EUC_JP
MULE_INTERNAL
euc_jp_to_sjis
EUC_JP
SJIS
euc_jp_to_utf8
EUC_JP
UTF8
euc_kr_to_mic
EUC_KR
MULE_INTERNAL
euc_kr_to_utf8
EUC_KR
UTF8
euc_tw_to_big5
EUC_TW
BIG5
euc_tw_to_mic
EUC_TW
MULE_INTERNAL
euc_tw_to_utf8
EUC_TW
UTF8
gb18030_to_utf8
GB18030
UTF8
gbk_to_utf8
GBK
UTF8
iso_8859_10_to_utf8
LATIN6
UTF8
iso_8859_13_to_utf8
LATIN7
UTF8
iso_8859_14_to_utf8
LATIN8
UTF8
iso_8859_15_to_utf8
LATIN9
UTF8
iso_8859_16_to_utf8
LATIN10
UTF8
iso_8859_1_to_mic
LATIN1
MULE_INTERNAL
iso_8859_1_to_utf8
LATIN1
UTF8
iso_8859_2_to_mic
LATIN2
MULE_INTERNAL
iso_8859_2_to_utf8
LATIN2
UTF8
iso_8859_2_to_windows_1250
LATIN2
WIN1250
iso_8859_3_to_mic
LATIN3
MULE_INTERNAL
iso_8859_3_to_utf8
LATIN3
UTF8
iso_8859_4_to_mic
LATIN4
MULE_INTERNAL
iso_8859_4_to_utf8
LATIN4
UTF8
iso_8859_5_to_koi8_r
ISO_8859_5
KOI8R
iso_8859_5_to_mic
ISO_8859_5
MULE_INTERNAL
iso_8859_5_to_utf8
ISO_8859_5
UTF8
iso_8859_5_to_windows_1251
ISO_8859_5
WIN1251
iso_8859_5_to_windows_866
ISO_8859_5
WIN866
iso_8859_6_to_utf8
ISO_8859_6
UTF8
iso_8859_7_to_utf8
ISO_8859_7
UTF8
iso_8859_8_to_utf8
ISO_8859_8
UTF8
iso_8859_9_to_utf8
LATIN5
UTF8
johab_to_utf8
JOHAB
UTF8
koi8_r_to_iso_8859_5
KOI8R
ISO_8859_5
koi8_r_to_mic
KOI8R
MULE_INTERNAL
koi8_r_to_utf8
KOI8R
UTF8
koi8_r_to_windows_1251
KOI8R
WIN1251
koi8_r_to_windows_866
KOI8R
WIN866
koi8_u_to_utf8
KOI8U
UTF8
mic_to_ascii
MULE_INTERNAL
SQL_ASCII
mic_to_big5
MULE_INTERNAL
BIG5
mic_to_euc_cn
MULE_INTERNAL
EUC_CN
mic_to_euc_jp
MULE_INTERNAL
EUC_JP
mic_to_euc_kr
MULE_INTERNAL
EUC_KR
mic_to_euc_tw
MULE_INTERNAL
EUC_TW
mic_to_iso_8859_1
MULE_INTERNAL
LATIN1
mic_to_iso_8859_2
MULE_INTERNAL
LATIN2
mic_to_iso_8859_3
MULE_INTERNAL
LATIN3
mic_to_iso_8859_4
MULE_INTERNAL
LATIN4
mic_to_iso_8859_5
MULE_INTERNAL
ISO_8859_5
mic_to_koi8_r
MULE_INTERNAL
KOI8R
mic_to_sjis
MULE_INTERNAL
SJIS
mic_to_windows_1250
MULE_INTERNAL
WIN1250
mic_to_windows_1251
MULE_INTERNAL
WIN1251
mic_to_windows_866
MULE_INTERNAL
WIN866
sjis_to_euc_jp
SJIS
EUC_JP
sjis_to_mic
SJIS
MULE_INTERNAL
sjis_to_utf8
SJIS
UTF8
windows_1258_to_utf8
WIN1258
UTF8
uhc_to_utf8
UHC
UTF8
utf8_to_ascii
UTF8
SQL_ASCII
utf8_to_big5
UTF8
BIG5
utf8_to_euc_cn
UTF8
EUC_CN
utf8_to_euc_jp
UTF8
EUC_JP
utf8_to_euc_kr
UTF8
EUC_KR
utf8_to_euc_tw
UTF8
EUC_TW
utf8_to_gb18030
UTF8
GB18030
utf8_to_gbk
UTF8
GBK
utf8_to_iso_8859_1
UTF8
LATIN1
utf8_to_iso_8859_10
UTF8
LATIN6
utf8_to_iso_8859_13
UTF8
LATIN7
utf8_to_iso_8859_14
UTF8
LATIN8
utf8_to_iso_8859_15
UTF8
LATIN9
utf8_to_iso_8859_16
UTF8
LATIN10
utf8_to_iso_8859_2
UTF8
LATIN2
utf8_to_iso_8859_3
UTF8
LATIN3
utf8_to_iso_8859_4
UTF8
LATIN4
utf8_to_iso_8859_5
UTF8
ISO_8859_5
utf8_to_iso_8859_6
UTF8
ISO_8859_6
utf8_to_iso_8859_7
UTF8
ISO_8859_7
utf8_to_iso_8859_8
UTF8
ISO_8859_8
utf8_to_iso_8859_9
UTF8
LATIN5
utf8_to_johab
UTF8
JOHAB
utf8_to_koi8_r
UTF8
KOI8R
utf8_to_koi8_u
UTF8
KOI8U
utf8_to_sjis
UTF8
SJIS
utf8_to_windows_1258
UTF8
WIN1258
utf8_to_uhc
UTF8
UHC
utf8_to_windows_1250
UTF8
WIN1250
utf8_to_windows_1251
UTF8
WIN1251
utf8_to_windows_1252
UTF8
WIN1252
utf8_to_windows_1253
UTF8
WIN1253
utf8_to_windows_1254
UTF8
WIN1254
utf8_to_windows_1255
UTF8
WIN1255
utf8_to_windows_1256
UTF8
WIN1256
utf8_to_windows_1257
UTF8
WIN1257
utf8_to_windows_866
UTF8
WIN866
utf8_to_windows_874
UTF8
WIN874
windows_1250_to_iso_8859_2
WIN1250
LATIN2
windows_1250_to_mic
WIN1250
MULE_INTERNAL
windows_1250_to_utf8
WIN1250
UTF8
windows_1251_to_iso_8859_5
WIN1251
ISO_8859_5
windows_1251_to_koi8_r
WIN1251
KOI8R
windows_1251_to_mic
WIN1251
MULE_INTERNAL
windows_1251_to_utf8
WIN1251
UTF8
windows_1251_to_windows_866
WIN1251
WIN866
windows_1252_to_utf8
WIN1252
UTF8
windows_1256_to_utf8
WIN1256
UTF8
windows_866_to_iso_8859_5
WIN866
ISO_8859_5
windows_866_to_koi8_r
WIN866
KOI8R
windows_866_to_mic
WIN866
MULE_INTERNAL
windows_866_to_utf8
WIN866
UTF8
windows_866_to_windows_1251
WIN866
WIN
windows_874_to_utf8
WIN874
UTF8
euc_jis_2004_to_utf8
EUC_JIS_2004
UTF8
utf8_to_euc_jis_2004
UTF8
EUC_JIS_2004
shift_jis_2004_to_utf8
SHIFT_JIS_2004
UTF8
utf8_to_shift_jis_2004
UTF8
SHIFT_JIS_2004
euc_jis_2004_to_shift_jis_2004
EUC_JIS_2004
SHIFT_JIS_2004
shift_jis_2004_to_euc_jis_2004
SHIFT_JIS_2004
EUC_JIS_2004
L quotes the argument value as an SQL literal. A null value is displayed as the string NULL, without quotes (equivalent to quote_nullable).
Function
Return Type
Description
Example
Result
Function
Return Type
Description
Example
Result
ascii(string)
int
ASCII code of the first character of the argument. For UTF8 returns the Unicode code point of the character. For other multibyte encodings, the argument must be an ASCII character.