1 of 75

VIII. 附錄

B. 日期時間格式支援

PostgreSQL內建支援智慧式日期時間輸入格式。日期及時間以字串格式輸入，以預先定義的方式，分隔為多個欄位。每一個欄位可能會被解譯為數字、忽視或拒絕。處理程式也內建常見的描述字，如月份、星期及時區名稱。

本文件說明日期及時間解譯的方法和步驟，也包含了常見描述字的列表。

https://www.postgresql.org/docs/10/static/datetime-appendix.html

B.1. 日期時間解譯流程

日期時間資料將會以下列流程解譯：（分隔符號均為半型文字）

以分隔符號將其分解為多個段落，如字串、時區或數字。
1. 如果是以冒號（:）分隔的數字格式，那麼這是一個時間的字串，其所包括的內容都是時間資訊的一部份。
2. 如果是以連字號（-）、斜線（/）、或兩個以上的間隔號（.）所分隔的數字格式，那麼這是一個日期的字串，它可能包含文字型式的月份名稱。但如果日期分隔符號已經先出現了，那麼它將被解釋為時區資訊（例如：Asia/Taipei）。
3. 如果整個字串都是數字所組成，那麼它可能是符合ISO 8601格式的日期（例如：19990113，表示西元1999年1月3日），或時間（例如：141516，表示14:15:16）。
4. 如果它是以加號（+）或減號（-）開頭的話，那麼它是一個數字型態的時區資訊或是特別的區間。
如果是一個字串，進行下列比對規則：
1. 以binary-search表格，尋找時區的表示字。
2. 如果沒有找到的話，則搜尋慣用字（如：today），星期（如：Thursday），月份（如：January），或介系詞（如：at, on）。
3. 如果都沒有找到，就回傳錯誤訊息。
如果是一個由數字組成的字串，則進行下列判斷：
1. 如果是8個或6個數字，而先前也沒有讀到其他日期的資訊，那麼它會被解譯為一個數字型態的日期（如：19990118或990118），對應年月日格式為YYYYMMDD或YYMMDD。
2. 如果是3位數字，而且先前已處理到年份資訊的話，那麼它會被解譯為該年的第幾天。
3. 如果是4位或6位數字，而且先前已處理到年份資訊的話，那麼它會被解譯為時間資訊，對應格式為HHMM或HHMMSS。
4. 如果是3個或更多的數字，並且先前未處理到日期資訊的話，那麼它會是年份資訊。（這里將直接判斷為yy-mm-dd的日期格式。）
5. 最後，日期格式將依DateStyle所定義的，設定為：mm-dd-yy，dd-mm-yy，或yy-mm-dd。其中如果月份或日子名稱無法找到合法字詞的話，那將會回傳錯誤的訊息。
如果指定了BC（西元前）的話，那麼實際儲存值將是負的年份數再加1。（陽曆Gregorian year中並無西元0年，所以西元1年，以數值0作為年份的記錄值。）
沒有指定BC的2位數年份，將自動被調整為4位數。其規則為：若小於70，則加2000作為其記錄值，否則就加1900作為記錄值。

小技巧

若需要描述西元（Gregorian years AD） 1-99年，請將年份數值以0補滿4位數（例如：0099表西元99年）。
https://www.postgresql.org/docs/10/static/datetime-input-rules.html

B.3. 日期時間設定檔

由於時區縮寫並沒有很好地標準化，PostgreSQL 提供了一種自訂的伺服器接受的縮寫集方法。timezone_abbreviations 運行時參數決定有效的縮寫集。雖然此參數可由任何資料庫使用者變更，但其可能的值受資料庫管理員控制 - 實際上它們是儲存在安裝目錄的 .../share/timezonesets/ 中的組態檔案的名稱。透過增加或變更該目錄中的檔案，管理員可以為時區縮寫設定本地策略。

timezone_abbreviations 可以設定為在 .../share/timezonesets/ 中找到的任何檔案名稱，檔案的名稱需要完全是英文字母的。（禁止 timezone_abbreviations 中的非英文字母字元可以防止讀取目標目錄之外的檔案，以及讀取編輯器備份檔案和其他無關檔案。）

時區縮寫檔案可以包含空白行和以 # 開頭的註釋。非註釋行必須具有以下格式之一：

zone_abbreviation offset
zone_abbreviation offset D
zone_abbreviation time_zone_name
@INCLUDE file_name
@OVERRIDE

zone_abbreviation 只是定義的縮寫。offset 是一個整數，定義與 UTC 相等的偏移量，以秒為單位，正向格林威治為東，負向為西。例如，-18000 將在格林威治以西 5 小時或北美東海岸標準時間。D 的區域名稱表示本地夏令時間而不是標準時間。

或者，可以定義 time_zone_name，引用 IANA 時區資料庫中定義的區域名稱。查詢區域的定義以查看該區域中是否正在使用縮寫，如果是，則使用適當的含義 - 即，目前正在確定其值的時間戳記中使用的含義，或者如果當時不是目前使用的話，則使用之前的含義，如果僅在該時間之後使用，則使用最舊的含義。這種行為對於處理其含義歷史變化的縮寫至關重要。還允許根據區域名稱定義縮寫，其中不出現該縮寫；使用縮寫就等於寫出區域名稱。

小技巧 在定義與 UTC 的偏移量從未改變的縮寫時，偏好使用簡單的整數偏移量，因為這些縮寫比需要查閱時區定義的縮寫要簡單得多。

@INCLUDE 語法允許在 .../share/timezonesets/ 目錄中包含另一個檔案。包含可以嵌套到某個有限的深度。

@OVERRIDE 語法指示檔案中的後續項目可以覆蓋先前的項目（通常是從包含的檔案中獲取的項目）。如果沒有這個，相同時區縮寫的衝突定義將被視為錯誤。

在未修改的安裝環境中，檔案 Default 包含了世界上大多數地區的所有非衝突時區縮寫。為這些區域提供了澳大利亞和印度的附加檔案：這些檔案先有預設設定，然後根據需要增加或修改縮寫。

出於參考目的，標準安裝還包含 Africa.txt，America.txt 等文件，其中包含有關根據 IANA 時區資料庫已知正在使用的每個時區縮寫的訊息。可以根據需要將這些檔案中找到的區域名稱定義複製並貼到自行定義配置檔案中。請注意，由於名稱中包含了點，因此無法將這些檔案直接引用為 timezone_abbreviations 設定。

注意

如果在讀取時區縮寫集時發生錯誤，則不會應用新值並保留舊值。如果在啟動資料庫時發生錯誤，則啟動失敗。

配置檔案中定義的時區縮寫會覆寫 PostgreSQL 中內建的非時區定義。例如，澳大利亞配置檔案定義了 SAT（南澳大利亞標準時間）。當此檔案有效時，SAT 將不會被識別為星期六的縮寫。

如果您修改 .../share/timezonesets/ 中的檔案，則由您來進行備份 - 正常的資料庫轉存將不包含此目錄。

C. SQL 關鍵字

Table C.1 lists all tokens that are key words in the SQL standard and in PostgreSQL 11devel. Background information can be found in Section 4.1.1. (For space reasons, only the latest two versions of the SQL standard, and SQL-92 for historical comparison, are included. The differences between those and the other intermediate standard versions are small.)

SQL distinguishes between reserved and non-reserved key words. According to the standard, reserved key words are the only real key words; they are never allowed as identifiers. Non-reserved key words only have a special meaning in particular contexts and can be used as identifiers in other contexts. Most non-reserved key words are actually the names of built-in tables and functions specified by SQL. The concept of non-reserved key words essentially only exists to declare that some predefined meaning is attached to a word in some contexts.

In the PostgreSQL parser life is a bit more complicated. There are several different classes of tokens ranging from those that can never be used as an identifier to those that have absolutely no special status in the parser as compared to an ordinary identifier. (The latter is usually the case for functions specified by SQL.) Even reserved key words are not completely reserved in PostgreSQL, but can be used as column labels (for example, SELECT 55 AS CHECK, even though CHECK is a reserved key word).

In Table C.1 in the column for PostgreSQL we classify as “non-reserved” those key words that are explicitly known to the parser but are allowed as column or table names. Some key words that are otherwise non-reserved cannot be used as function or data type names and are marked accordingly. (Most of these words represent built-in functions or data types with special syntax. The function or type is still available but it cannot be redefined by the user.) Labeled “reserved” are those tokens that are not allowed as column or table names. Some reserved key words are allowable as names for functions or data types; this is also shown in the table. If not so marked, a reserved key word is only allowed as an “AS” column label name.

As a general rule, if you get spurious parser errors for commands that contain any of the listed key words as an identifier you should try to quote the identifier to see if the problem goes away.

It is important to understand before studying Table C.1 that the fact that a key word is not reserved in PostgreSQL does not mean that the feature related to the word is not implemented. Conversely, the presence of a key word does not indicate the existence of a feature.

Table C.1. SQL Key Words

E. 版本資訊

The release notes contain the significant changes in each PostgreSQL release, with major features and migration issues listed at the top. The release notes do not contain changes that affect only a few users or changes that are internal and therefore not user-visible. For example, the optimizer is improved in almost every release, but the improvements are usually observed by users as simply faster queries.

A complete list of changes for each release can be obtained by viewing the Git logs for each release. The pgsql-committers email list records all source code changes as well. There is also a web interface that shows changes to specific files.

The name appearing next to each item represents the major developer for that item. Of course all changes involve community discussion and patch review, so each item is truly a community effort.

E.1. Release 13.3

Release date: 2021-05-13

This release contains a variety of fixes from 13.2. For information about new features in major release 13, see Section E.4.

E.1.1. Migration to Version 13.3

A dump/restore is not required for those running 13.X.

However, if you are upgrading from a version earlier than 13.2, see Section E.2.

E.1.2. Changes

Prevent integer overflows in array subscripting calculations (Tom Lane)
The array code previously did not complain about cases where an array's lower bound plus length overflows an integer. This resulted in later entries in the array becoming inaccessible (since their subscripts could not be written as integers), but more importantly it confused subsequent assignment operations. This could lead to memory overwrites, with ensuing crashes or unwanted data modifications. (CVE-2021-32027)
Fix mishandling of “junk” columns in INSERT ... ON CONFLICT ... UPDATE target lists (Tom Lane)
If the UPDATE list contains any multi-column sub-selects (which give rise to junk columns in addition to the results proper), the UPDATE path would end up storing tuples that include the values of the extra junk columns. That's fairly harmless in the short run, but if new columns are added to the table then the values would become accessible, possibly leading to malfunctions if they don't match the datatypes of the added columns.
In addition, in versions supporting cross-partition updates, a cross-partition update triggered by such a case had the reverse problem: the junk columns were removed from the target list, typically causing an immediate crash due to malfunction of the multi-column sub-select mechanism. (CVE-2021-32028)
Fix possibly-incorrect computation of UPDATE ... RETURNING outputs for joined cross-partition updates (Amit Langote, Etsuro Fujita)
If an UPDATE for a partitioned table caused a row to be moved to another partition with a physically different row type (for example, one with a different set of dropped columns), computation of RETURNING results for that row could produce errors or wrong answers. No error is observed unless the UPDATE involves other tables being joined to the target table. (CVE-2021-32029)
Fix adjustment of constraint deferrability properties in partitioned tables (Álvaro Herrera)
When applied to a foreign-key constraint of a partitioned table, ALTER TABLE ... ALTER CONSTRAINT failed to adjust the DEFERRABLE and/or INITIALLY DEFERRED markings of the constraints and triggers of leaf partitions. This led to unexpected behavior of such constraints. After updating to this version, any misbehaving partitioned tables can be fixed by executing a new ALTER command to set the desired properties.
This change also disallows applying such an ALTER directly to the constraints of leaf partitions. The only supported case is for the whole partitioning hierarchy to have identical constraint properties, so such ALTERs must be applied at the partition root.
When attaching a child table with ALTER TABLE ... INHERIT, insist that any generated columns in the parent be generated the same way in the child (Peter Eisentraut)
Forbid marking an identity column as nullable (Vik Fearing)
GENERATED ALWAYS AS IDENTITY implies NOT NULL, so don't allow it to be combined with an explicit NULL specification.
Allow ALTER ROLE/DATABASE ... SET to set the role, session_authorization, and temp_buffers parameters (Tom Lane)
Previously, over-eager validity checks might reject these commands, even if the values would have worked when used later. This created a command ordering hazard for dump/reload and upgrade scenarios.
Ensure that REINDEX CONCURRENTLY preserves any statistics target that's been set for the index (Michael Paquier)
Fix COMMIT AND CHAIN to work correctly when the current transaction has live savepoints (Fujii Masao)
Fix list-manipulation bug in WITH RECURSIVE processing (Michael Paquier, Tom Lane)
Sufficiently deep nesting of WITH constructs (at least seven levels) triggered core dumps or incorrect complaints of faulty WITH nesting.
Fix bug with coercing the result of a COLLATE expression to a non-collatable type (Tom Lane)
This led to a parse tree in which the COLLATE appears to be applied to a non-collatable value. While that normally has no real impact (since COLLATE has no effect at runtime), it was possible to construct views that would be rejected during dump/reload.
Fix use-after-free bug in saving tuples for AFTER triggers (Amit Langote)
This could cause crashes in some situations.
Disallow calling window functions and procedures via the “fast path” wire protocol message (Tom Lane)
Only plain functions are supported here. While trying to call an aggregate function failed already, calling a window function would crash, and calling a procedure would work only if the procedure did no transaction control.
Extend pg_identify_object_as_address() to support event triggers (Joel Jacobson)
Fix to_char()'s handling of Roman-numeral month format codes with negative intervals (Julien Rouhaud)
Previously, such cases would usually cause a crash.
Check that the argument of pg_import_system_collations() is a valid schema OID (Tom Lane)
Fix use of uninitialized value while parsing an \{m,n\} quantifier in a BRE-mode regular expression (Tom Lane)
This error could cause the quantifier to act non-greedy, that is behave like an {m,n}? quantifier would do in full regular expressions.
Fix “could not find pathkey item to sort” planner errors in some situations where the sort key involves an aggregate or window function (James Coleman, Tom Lane)
Don't ignore system columns when estimating the number of groups using extended statistics (Tomas Vondra)
This led to strange estimates for queries such as SELECT ... GROUP BY a, b, ctid.
Avoid divide-by-zero when estimating selectivity of a regular expression with a very long fixed prefix (Tom Lane)
This typically led to a NaN selectivity value, causing assertion failures or strange planner behavior.
Fix access-off-the-end-of-the-table error in BRIN index bitmap scans (Tomas Vondra)
If the page range size used by a BRIN index isn't a power of two, there were corner cases in which a bitmap scan could try to fetch pages past the actual end of the table, leading to “could not open file” errors.
Fix potentially wrong answers from GIN tsvector index searches, when there are many matching tuples (Tom Lane)
If the number of index matches became large enough to make the bitmap holding them become lossy (a threshold that depends on work_mem), the code could get confused about whether rechecks are required, allowing rows to be returned that don't actually match the query.
Fix concurrency issues with WAL segment recycling on Windows (Michael Paquier)
This reverts a change that caused intermittent “could not rename file ...: Permission denied” log messages. While there were not serious consequences, the log spam was annoying.
Avoid incorrect timeline change while recovering uncommitted two-phase transactions from WAL (Soumyadeep Chakraborty, Jimmy Yih, Kevin Yeap)
This error could lead to subsequent WAL records being written under the wrong timeline ID, leading to consistency problems, or even complete failure to be able to restart the server, later on.
Ensure that locks are released while shutting down a standby server's startup process (Fujii Masao)
When a standby server is shut down while still in recovery, some locks might be left held. This causes assertion failures in debug builds; it's unclear whether any serious consequence could occur in production builds.
Fix crash when a logical replication worker does ALTER SUBSCRIPTION REFRESH (Peter Smith)
The core code won't do this, but a replica trigger could.
Ensure we default to wal_sync_method = fdatasync on recent FreeBSD (Thomas Munro)
FreeBSD 13 supports open_datasync, which would normally become the default choice. However, it's unclear whether that is actually an improvement for Postgres, so preserve the existing default for now.
Disable the vacuum_cleanup_index_scale_factor parameter and storage option (Peter Geoghegan)
The notion of tracking “stale” index statistics proved to interact badly with the autovacuum_vacuum_insert_threshold parameter, resulting in unnecessary full-index scans and consequent degradation of autovacuum performance. The latter mechanism seems superior, so remove the stale-statistics logic. The control parameter for that, vacuum_cleanup_index_scale_factor, will be removed entirely in v14. In v13, it remains present to avoid breaking existing configuration files, but it no longer does anything.
Pass the correct trigger OID to object post-alter hooks during ALTER CONSTRAINT (Álvaro Herrera)
When updating trigger properties during ALTER CONSTRAINT, the post-alter hook was told that we are updating a trigger, but the constraint's OID was passed instead of the trigger's.
Ensure we finish cleaning up when interrupted while detaching a DSM segment (Thomas Munro)
This error could result in temporary files not being cleaned up promptly after a parallel query.
Fix assorted minor memory leaks in the server (Tom Lane, Andres Freund)
Fix uninitialized variable in walreceiver's statistics in shared memory (Fujii Masao)
This error was harmless on most platforms, but could cause issues on platforms lacking atomic variables and/or spinlock support.
Reduce the overhead of dtrace probes for LWLock operations, when dtrace support is compiled in but not active (Peter Eisentraut)
Fix failure when a PL/pgSQL DO block makes use of both composite-type variables and transaction control (Tom Lane)
Previously, such cases led to errors about leaked tuple descriptors.
Prevent infinite loop in libpq if a ParameterDescription message with a corrupt length is received (Tom Lane)
When initdb prints instructions about how to start the server, make the path shown for pg_ctl use backslash separators on Windows (Nitin Jadhav)
Fix psql to restore the previous behavior of \connect service=something (Tom Lane)
A previous bug fix caused environment variables (such as PGPORT) to override entries in the service file in this context. Restore the previous behavior, in which the priority is the other way around.
Fix psql's ON_ERROR_ROLLBACK feature to handle COMMIT AND CHAIN commands correctly (Arthur Nascimento)
Previously, this case failed with “savepoint "pg_psql_temporary_savepoint" does not exist”.
In psql, avoid repeated “could not print result table” failures after the first such error (Álvaro Herrera)
Fix race condition in detection of file modification by psql's \e and related commands (Laurenz Albe)
A very fast typist could fool the code's file-timestamp-based detection of whether the temporary edit file was changed.
Fix pg_dump's dumping of generated columns in partitioned tables (Peter Eisentraut)
A fix introduced in the previous minor release should not be applied to partitioned tables, only traditionally-inherited tables.
Fix missed file version check in pg_restore (Tom Lane)
When reading a custom-format archive from a non-seekable source, pg_restore neglected to check the archive version. If it was fed a newer archive version than it can support, it would fail messily later on.
Add some more checks to pg_upgrade for user tables containing non-upgradable data types (Tom Lane)
Fix detection of some cases where a non-upgradable data type is embedded within a container type (such as an array or range). Also disallow upgrading when user tables contain columns of system-defined composite types, since those types' OIDs are not stable across versions.
Fix incorrect progress-reporting calculation in pg_checksums (Shinya Kato)
Fix pg_waldump to count XACT records correctly when generating per-record statistics (Kyotaro Horiguchi)
Fix contrib/amcheck to not complain about the tuple flags HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED both being set (Julien Rouhaud)
This is a valid state after SELECT FOR UPDATE.
Adjust VPATH build rules to support recent Oracle Developer Studio compiler versions (Noah Misch)
Fix testing of PL/Python for Python 3 on Solaris (Noah Misch)

B.3. 日期時間設定檔

時區縮寫檔案可以包含空白行和以 # 開頭的註釋。非註釋行必須具有以下格式之一：

zone_abbreviation offset
zone_abbreviation offset D
zone_abbreviation time_zone_name
@INCLUDE file_name
@OVERRIDE

小技巧 在定義與 UTC 的偏移量從未改變的縮寫時，偏好使用簡單的整數偏移量，因為這些縮寫比需要查閱時區定義的縮寫要簡單得多。

@INCLUDE 語法允許在 .../share/timezonesets/ 目錄中包含另一個檔案。包含可以嵌套到某個有限的深度。

注意

如果在讀取時區縮寫集時發生錯誤，則不會應用新值並保留舊值。如果在啟動資料庫時發生錯誤，則啟動失敗。

如果您修改 .../share/timezonesets/ 中的檔案，則由您來進行備份 - 正常的資料庫轉存將不包含此目錄。

B.1. 日期時間解譯流程

日期時間資料將會以下列流程解譯：（分隔符號均為半型文字）

以分隔符號將其分解為多個段落，如字串、時區或數字。
1. 如果是以冒號（:）分隔的數字格式，那麼這是一個時間的字串，其所包括的內容都是時間資訊的一部份。
2. 如果是以連字號（-）、斜線（/）、或兩個以上的間隔號（.）所分隔的數字格式，那麼這是一個日期的字串，它可能包含文字型式的月份名稱。但如果日期分隔符號已經先出現了，那麼它將被解釋為時區資訊（例如：Asia/Taipei）。
3. 如果整個字串都是數字所組成，那麼它可能是符合ISO 8601格式的日期（例如：19990113，表示西元1999年1月3日），或時間（例如：141516，表示14:15:16）。
4. 如果它是以加號（+）或減號（-）開頭的話，那麼它是一個數字型態的時區資訊或是特別的區間。
如果是一個字串，進行下列比對規則：
1. 以binary-search表格，尋找時區的表示字。
2. 如果沒有找到的話，則搜尋慣用字（如：today），星期（如：Thursday），月份（如：January），或介系詞（如：at, on）。
3. 如果都沒有找到，就回傳錯誤訊息。
如果是一個由數字組成的字串，則進行下列判斷：
1. 如果是8個或6個數字，而先前也沒有讀到其他日期的資訊，那麼它會被解譯為一個數字型態的日期（如：19990118或990118），對應年月日格式為YYYYMMDD或YYMMDD。
2. 如果是3位數字，而且先前已處理到年份資訊的話，那麼它會被解譯為該年的第幾天。
3. 如果是4位或6位數字，而且先前已處理到年份資訊的話，那麼它會被解譯為時間資訊，對應格式為HHMM或HHMMSS。
4. 如果是3個或更多的數字，並且先前未處理到日期資訊的話，那麼它會是年份資訊。（這里將直接判斷為yy-mm-dd的日期格式。）
5. 最後，日期格式將依DateStyle所定義的，設定為：mm-dd-yy，dd-mm-yy，或yy-mm-dd。其中如果月份或日子名稱無法找到合法字詞的話，那將會回傳錯誤的訊息。
如果指定了BC（西元前）的話，那麼實際儲存值將是負的年份數再加1。（陽曆Gregorian year中並無西元0年，所以西元1年，以數值0作為年份的記錄值。）
沒有指定BC的2位數年份，將自動被調整為4位數。其規則為：若小於70，則加2000作為其記錄值，否則就加1900作為記錄值。

小技巧

若需要描述西元（Gregorian years AD） 1-99年，請將年份數值以0補滿4位數（例如：0099表西元99年）。
https://www.postgresql.org/docs/10/static/datetime-input-rules.html

E.1. Release 13.3

Release date: 2021-05-13

This release contains a variety of fixes from 13.2. For information about new features in major release 13, see Section E.4.

E.1.1. Migration to Version 13.3

A dump/restore is not required for those running 13.X.

However, if you are upgrading from a version earlier than 13.2, see Section E.2.

E.1.2. Changes

Prevent integer overflows in array subscripting calculations (Tom Lane)
The array code previously did not complain about cases where an array's lower bound plus length overflows an integer. This resulted in later entries in the array becoming inaccessible (since their subscripts could not be written as integers), but more importantly it confused subsequent assignment operations. This could lead to memory overwrites, with ensuing crashes or unwanted data modifications. (CVE-2021-32027)
Fix mishandling of “junk” columns in INSERT ... ON CONFLICT ... UPDATE target lists (Tom Lane)
If the UPDATE list contains any multi-column sub-selects (which give rise to junk columns in addition to the results proper), the UPDATE path would end up storing tuples that include the values of the extra junk columns. That's fairly harmless in the short run, but if new columns are added to the table then the values would become accessible, possibly leading to malfunctions if they don't match the datatypes of the added columns.
In addition, in versions supporting cross-partition updates, a cross-partition update triggered by such a case had the reverse problem: the junk columns were removed from the target list, typically causing an immediate crash due to malfunction of the multi-column sub-select mechanism. (CVE-2021-32028)
Fix possibly-incorrect computation of UPDATE ... RETURNING outputs for joined cross-partition updates (Amit Langote, Etsuro Fujita)
If an UPDATE for a partitioned table caused a row to be moved to another partition with a physically different row type (for example, one with a different set of dropped columns), computation of RETURNING results for that row could produce errors or wrong answers. No error is observed unless the UPDATE involves other tables being joined to the target table. (CVE-2021-32029)
Fix adjustment of constraint deferrability properties in partitioned tables (Álvaro Herrera)
When applied to a foreign-key constraint of a partitioned table, ALTER TABLE ... ALTER CONSTRAINT failed to adjust the DEFERRABLE and/or INITIALLY DEFERRED markings of the constraints and triggers of leaf partitions. This led to unexpected behavior of such constraints. After updating to this version, any misbehaving partitioned tables can be fixed by executing a new ALTER command to set the desired properties.
This change also disallows applying such an ALTER directly to the constraints of leaf partitions. The only supported case is for the whole partitioning hierarchy to have identical constraint properties, so such ALTERs must be applied at the partition root.
When attaching a child table with ALTER TABLE ... INHERIT, insist that any generated columns in the parent be generated the same way in the child (Peter Eisentraut)
Forbid marking an identity column as nullable (Vik Fearing)
GENERATED ALWAYS AS IDENTITY implies NOT NULL, so don't allow it to be combined with an explicit NULL specification.
Allow ALTER ROLE/DATABASE ... SET to set the role, session_authorization, and temp_buffers parameters (Tom Lane)
Previously, over-eager validity checks might reject these commands, even if the values would have worked when used later. This created a command ordering hazard for dump/reload and upgrade scenarios.
Ensure that REINDEX CONCURRENTLY preserves any statistics target that's been set for the index (Michael Paquier)
Fix COMMIT AND CHAIN to work correctly when the current transaction has live savepoints (Fujii Masao)
Fix list-manipulation bug in WITH RECURSIVE processing (Michael Paquier, Tom Lane)
Sufficiently deep nesting of WITH constructs (at least seven levels) triggered core dumps or incorrect complaints of faulty WITH nesting.
Fix bug with coercing the result of a COLLATE expression to a non-collatable type (Tom Lane)
This led to a parse tree in which the COLLATE appears to be applied to a non-collatable value. While that normally has no real impact (since COLLATE has no effect at runtime), it was possible to construct views that would be rejected during dump/reload.
Fix use-after-free bug in saving tuples for AFTER triggers (Amit Langote)
This could cause crashes in some situations.
Disallow calling window functions and procedures via the “fast path” wire protocol message (Tom Lane)
Only plain functions are supported here. While trying to call an aggregate function failed already, calling a window function would crash, and calling a procedure would work only if the procedure did no transaction control.
Extend pg_identify_object_as_address() to support event triggers (Joel Jacobson)
Fix to_char()'s handling of Roman-numeral month format codes with negative intervals (Julien Rouhaud)
Previously, such cases would usually cause a crash.
Check that the argument of pg_import_system_collations() is a valid schema OID (Tom Lane)
Fix use of uninitialized value while parsing an \{m,n\} quantifier in a BRE-mode regular expression (Tom Lane)
This error could cause the quantifier to act non-greedy, that is behave like an {m,n}? quantifier would do in full regular expressions.
Fix “could not find pathkey item to sort” planner errors in some situations where the sort key involves an aggregate or window function (James Coleman, Tom Lane)
Don't ignore system columns when estimating the number of groups using extended statistics (Tomas Vondra)
This led to strange estimates for queries such as SELECT ... GROUP BY a, b, ctid.
Avoid divide-by-zero when estimating selectivity of a regular expression with a very long fixed prefix (Tom Lane)
This typically led to a NaN selectivity value, causing assertion failures or strange planner behavior.
Fix access-off-the-end-of-the-table error in BRIN index bitmap scans (Tomas Vondra)
If the page range size used by a BRIN index isn't a power of two, there were corner cases in which a bitmap scan could try to fetch pages past the actual end of the table, leading to “could not open file” errors.
Fix potentially wrong answers from GIN tsvector index searches, when there are many matching tuples (Tom Lane)
If the number of index matches became large enough to make the bitmap holding them become lossy (a threshold that depends on work_mem), the code could get confused about whether rechecks are required, allowing rows to be returned that don't actually match the query.
Fix concurrency issues with WAL segment recycling on Windows (Michael Paquier)
This reverts a change that caused intermittent “could not rename file ...: Permission denied” log messages. While there were not serious consequences, the log spam was annoying.
Avoid incorrect timeline change while recovering uncommitted two-phase transactions from WAL (Soumyadeep Chakraborty, Jimmy Yih, Kevin Yeap)
This error could lead to subsequent WAL records being written under the wrong timeline ID, leading to consistency problems, or even complete failure to be able to restart the server, later on.
Ensure that locks are released while shutting down a standby server's startup process (Fujii Masao)
When a standby server is shut down while still in recovery, some locks might be left held. This causes assertion failures in debug builds; it's unclear whether any serious consequence could occur in production builds.
Fix crash when a logical replication worker does ALTER SUBSCRIPTION REFRESH (Peter Smith)
The core code won't do this, but a replica trigger could.
Ensure we default to wal_sync_method = fdatasync on recent FreeBSD (Thomas Munro)
FreeBSD 13 supports open_datasync, which would normally become the default choice. However, it's unclear whether that is actually an improvement for Postgres, so preserve the existing default for now.
Disable the vacuum_cleanup_index_scale_factor parameter and storage option (Peter Geoghegan)
The notion of tracking “stale” index statistics proved to interact badly with the autovacuum_vacuum_insert_threshold parameter, resulting in unnecessary full-index scans and consequent degradation of autovacuum performance. The latter mechanism seems superior, so remove the stale-statistics logic. The control parameter for that, vacuum_cleanup_index_scale_factor, will be removed entirely in v14. In v13, it remains present to avoid breaking existing configuration files, but it no longer does anything.
Pass the correct trigger OID to object post-alter hooks during ALTER CONSTRAINT (Álvaro Herrera)
When updating trigger properties during ALTER CONSTRAINT, the post-alter hook was told that we are updating a trigger, but the constraint's OID was passed instead of the trigger's.
Ensure we finish cleaning up when interrupted while detaching a DSM segment (Thomas Munro)
This error could result in temporary files not being cleaned up promptly after a parallel query.
Fix assorted minor memory leaks in the server (Tom Lane, Andres Freund)
Fix uninitialized variable in walreceiver's statistics in shared memory (Fujii Masao)
This error was harmless on most platforms, but could cause issues on platforms lacking atomic variables and/or spinlock support.
Reduce the overhead of dtrace probes for LWLock operations, when dtrace support is compiled in but not active (Peter Eisentraut)
Fix failure when a PL/pgSQL DO block makes use of both composite-type variables and transaction control (Tom Lane)
Previously, such cases led to errors about leaked tuple descriptors.
Prevent infinite loop in libpq if a ParameterDescription message with a corrupt length is received (Tom Lane)
When initdb prints instructions about how to start the server, make the path shown for pg_ctl use backslash separators on Windows (Nitin Jadhav)
Fix psql to restore the previous behavior of \connect service=something (Tom Lane)
A previous bug fix caused environment variables (such as PGPORT) to override entries in the service file in this context. Restore the previous behavior, in which the priority is the other way around.
Fix psql's ON_ERROR_ROLLBACK feature to handle COMMIT AND CHAIN commands correctly (Arthur Nascimento)
Previously, this case failed with “savepoint "pg_psql_temporary_savepoint" does not exist”.
In psql, avoid repeated “could not print result table” failures after the first such error (Álvaro Herrera)
Fix race condition in detection of file modification by psql's \e and related commands (Laurenz Albe)
A very fast typist could fool the code's file-timestamp-based detection of whether the temporary edit file was changed.
Fix pg_dump's dumping of generated columns in partitioned tables (Peter Eisentraut)
A fix introduced in the previous minor release should not be applied to partitioned tables, only traditionally-inherited tables.
Fix missed file version check in pg_restore (Tom Lane)
When reading a custom-format archive from a non-seekable source, pg_restore neglected to check the archive version. If it was fed a newer archive version than it can support, it would fail messily later on.
Add some more checks to pg_upgrade for user tables containing non-upgradable data types (Tom Lane)
Fix detection of some cases where a non-upgradable data type is embedded within a container type (such as an array or range). Also disallow upgrading when user tables contain columns of system-defined composite types, since those types' OIDs are not stable across versions.
Fix incorrect progress-reporting calculation in pg_checksums (Shinya Kato)
Fix pg_waldump to count XACT records correctly when generating per-record statistics (Kyotaro Horiguchi)
Fix contrib/amcheck to not complain about the tuple flags HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED both being set (Julien Rouhaud)
This is a valid state after SELECT FOR UPDATE.
Adjust VPATH build rules to support recent Oracle Developer Studio compiler versions (Noah Misch)
Fix testing of PL/Python for Python 3 on Solaris (Noah Misch)

C. SQL 關鍵字

As a general rule, if you get spurious parser errors for commands that contain any of the listed key words as an identifier you should try to quote the identifier to see if the problem goes away.

Table C.1. SQL Key Words

D. SQL 相容性

本節試圖描述 PostgreSQL 在相當程度上符合目前的 SQL 標準。以下內容可能並非是完整且一致性聲明，但它以合理且對使用者有意義的方式詳細介紹了相關的主題。

The formal name of the SQL standard is ISO/IEC 9075 “Database Language SQL”. A revised version of the standard is released from time to time; the most recent update appearing in 2016. The 2016 version is referred to as ISO/IEC 9075:2016, or simply as SQL:2016. The versions prior to that were SQL:2011, SQL:2008, SQL:2006, SQL:2003, SQL:1999, and SQL-92. Each version replaces the previous one, so claims of conformance to earlier versions have no official merit. PostgreSQL development aims for conformance with the latest official version of the standard where such conformance does not contradict traditional features or common sense. Many of the features required by the SQL standard are supported, though sometimes with slightly differing syntax or function. Further moves towards conformance can be expected over time.

SQL-92 defined three feature sets for conformance: Entry, Intermediate, and Full. Most database management systems claiming SQL standard conformance were conforming at only the Entry level, since the entire set of features in the Intermediate and Full levels was either too voluminous or in conflict with legacy behaviors.

Starting with SQL:1999, the SQL standard defines a large set of individual features rather than the ineffectively broad three levels found in SQL-92. A large subset of these features represents the “Core” features, which every conforming SQL implementation must supply. The rest of the features are purely optional.

The standard versions beginning with SQL:2003 are also split into a number of parts. Each is known by a shorthand name. Note that these parts are not consecutively numbered.

ISO/IEC 9075-1 Framework (SQL/Framework)
ISO/IEC 9075-2 Foundation (SQL/Foundation)
ISO/IEC 9075-3 Call Level Interface (SQL/CLI)
ISO/IEC 9075-4 Persistent Stored Modules (SQL/PSM)
ISO/IEC 9075-9 Management of External Data (SQL/MED)
ISO/IEC 9075-10 Object Language Bindings (SQL/OLB)
ISO/IEC 9075-11 Information and Definition Schemas (SQL/Schemata)
ISO/IEC 9075-13 Routines and Types using the Java Language (SQL/JRT)
ISO/IEC 9075-14 XML-related specifications (SQL/XML)
ISO/IEC 9075-15 Multi-dimensional arrays (SQL/MDA)

The PostgreSQL core covers parts 1, 2, 9, 11, and 14. Part 3 is covered by the ODBC driver, and part 13 is covered by the PL/Java plug-in, but exact conformance is currently not being verified for these components. There are currently no implementations of parts 4, 10, and 15 for PostgreSQL.

PostgreSQL supports most of the major features of SQL:2016. Out of 177 mandatory features required for full Core conformance, PostgreSQL conforms to at least 170. In addition, there is a long list of supported optional features. It might be worth noting that at the time of writing, no current version of any database management system claims full conformance to Core SQL:2016.

In the following two sections, we provide a list of those features that PostgreSQL supports, followed by a list of the features defined in SQL:2016 which are not yet supported in PostgreSQL. Both of these lists are approximate: There might be minor details that are nonconforming for a feature that is listed as supported, and large parts of an unsupported feature might in fact be implemented. The main body of the documentation always contains the most accurate information about what does and does not work.

Note

Feature codes containing a hyphen are subfeatures. Therefore, if a particular subfeature is not supported, the main feature is listed as unsupported even if some other subfeatures are supported.

D.3. XML Limits and Conformance to SQL/XML

Significant revisions to the XML-related specifications in ISO/IEC 9075-14 (SQL/XML) were introduced with SQL:2006. PostgreSQL's implementation of the XML data type and related functions largely follows the earlier 2003 edition, with some borrowing from later editions. In particular:

Where the current standard provides a family of XML data types to hold “document” or “content” in untyped or XML Schema-typed variants, and a type XML(SEQUENCE) to hold arbitrary pieces of XML content, PostgreSQL provides the single xml type, which can hold “document” or “content”. There is no equivalent of the standard's “sequence” type.
PostgreSQL provides two functions introduced in SQL:2006, but in variants that use the XPath 1.0 language, rather than XML Query as specified for them in the standard.

This section presents some of the resulting differences you may encounter.

D.3.1. Queries Are Restricted to XPath 1.0

The PostgreSQL-specific functions xpath() and xpath_exists() query XML documents using the XPath language. PostgreSQL also provides XPath-only variants of the standard functions XMLEXISTS and XMLTABLE, which officially use the XQuery language. For all of these functions, PostgreSQL relies on the libxml2 library, which provides only XPath 1.0.

There is a strong connection between the XQuery language and XPath versions 2.0 and later: any expression that is syntactically valid and executes successfully in both produces the same result (with a minor exception for expressions containing numeric character references or predefined entity references, which XQuery replaces with the corresponding character while XPath leaves them alone). But there is no such connection between these languages and XPath 1.0; it was an earlier language and differs in many respects.

There are two categories of limitation to keep in mind: the restriction from XQuery to XPath for the functions specified in the SQL standard, and the restriction of XPath to version 1.0 for both the standard and the PostgreSQL-specific functions.

D.3.1.1. Restriction Of XQuery To XPath

Features of XQuery beyond those of XPath include:

XQuery expressions can construct and return new XML nodes, in addition to all possible XPath values. XPath can create and return values of the atomic types (numbers, strings, and so on) but can only return XML nodes that were already present in documents supplied as input to the expression.
XQuery has control constructs for iteration, sorting, and grouping.
XQuery allows declaration and use of local functions.

Recent XPath versions begin to offer capabilities overlapping with these (such as functional-style for-each and sort, anonymous functions, and parse-xml to create a node from a string), but such features were not available before XPath 3.0.

D.3.1.2. Restriction Of XPath To 1.0

For developers familiar with XQuery and XPath 2.0 or later, XPath 1.0 presents a number of differences to contend with:

The fundamental type of an XQuery/XPath expression, the sequence, which can contain XML nodes, atomic values, or both, does not exist in XPath 1.0. A 1.0 expression can only produce a node-set (containing zero or more XML nodes), or a single atomic value.
Unlike an XQuery/XPath sequence, which can contain any desired items in any desired order, an XPath 1.0 node-set has no guaranteed order and, like any set, does not allow multiple appearances of the same item.
Note
The libxml2 library does seem to always return node-sets to PostgreSQL with their members in the same relative order they had in the input document. Its documentation does not commit to this behavior, and an XPath 1.0 expression cannot control it.
While XQuery/XPath provides all of the types defined in XML Schema and many operators and functions over those types, XPath 1.0 has only node-sets and the three atomic types boolean, double, and string.
XPath 1.0 has no conditional operator. An XQuery/XPath expression such as if ( hat ) then hat/@size else "no hat" has no XPath 1.0 equivalent.
XPath 1.0 has no ordering comparison operator for strings. Both "cat" < "dog" and "cat" > "dog" are false, because each is a numeric comparison of two NaNs. In contrast, = and != do compare the strings as strings.
XPath 1.0 blurs the distinction between value comparisons and general comparisons as XQuery/XPath define them. Both sale/@hatsize = 7 and sale/@customer = "alice" are existentially quantified comparisons, true if there is any sale with the given value for the attribute, but sale/@taxable = false() is a value comparison to the effective boolean value of a whole node-set. It is true only if no sale has a taxable attribute at all.
In the XQuery/XPath data model, a document node can have either document form (i.e., exactly one top-level element, with only comments and processing instructions outside of it) or content form (with those constraints relaxed). Its equivalent in XPath 1.0, the root node, can only be in document form. This is part of the reason an xml value passed as the context item to any PostgreSQL XPath-based function must be in document form.

The differences highlighted here are not all of them. In XQuery and the 2.0 and later versions of XPath, there is an XPath 1.0 compatibility mode, and the W3C lists of function library changes and language changes applied in that mode offer a more complete (but still not exhaustive) account of the differences. The compatibility mode cannot make the later languages exactly equivalent to XPath 1.0.

D.3.1.3. Mappings Between SQL And XML Data Types And Values

In SQL:2006 and later, both directions of conversion between standard SQL data types and the XML Schema types are specified precisely. However, the rules are expressed using the types and semantics of XQuery/XPath, and have no direct application to the different data model of XPath 1.0.

When PostgreSQL maps SQL data values to XML (as in xmlelement), or XML to SQL (as in the output columns of xmltable), except for a few cases treated specially, PostgreSQL simply assumes that the XML data type's XPath 1.0 string form will be valid as the text-input form of the SQL datatype, and conversely. This rule has the virtue of simplicity while producing, for many data types, results similar to the mappings specified in the standard.

Where interoperability with other systems is a concern, for some data types, it may be necessary to use data type formatting functions (such as those in Section 9.8) explicitly to produce the standard mappings.

D.3.2. Incidental Limits of the Implementation

This section concerns limits that are not inherent in the libxml2 library, but apply to the current implementation in PostgreSQL.

D.3.2.1. Only BY VALUE Passing Mechanism Is Supported

The SQL standard defines two passing mechanisms that apply when passing an XML argument from SQL to an XML function or receiving a result: BY REF, in which a particular XML value retains its node identity, and BY VALUE, in which the content of the XML is passed but node identity is not preserved. A mechanism can be specified before a list of parameters, as the default mechanism for all of them, or after any parameter, to override the default.

To illustrate the difference, if x is an XML value, these two queries in an SQL:2006 environment would produce true and false, respectively:

SELECT XMLQUERY('$a is $b' PASSING BY REF x AS a, x AS b NULL ON EMPTY);
SELECT XMLQUERY('$a is $b' PASSING BY VALUE x AS a, x AS b NULL ON EMPTY);

PostgreSQL will accept BY VALUE or BY REF in an XMLEXISTS or XMLTABLE construct, but it ignores them. The xml data type holds a character-string serialized representation, so there is no node identity to preserve, and passing is always effectively BY VALUE.

D.3.2.2. Cannot Pass Named Parameters To Queries

The XPath-based functions support passing one parameter to serve as the XPath expression's context item, but do not support passing additional values to be available to the expression as named parameters.

D.3.2.3. No XML(SEQUENCE) Type

The PostgreSQL xml data type can only hold a value in DOCUMENT or CONTENT form. An XQuery/XPath expression context item must be a single XML node or atomic value, but XPath 1.0 further restricts it to be only an XML node, and has no node type allowing CONTENT. The upshot is that a well-formed DOCUMENT is the only form of XML value that PostgreSQL can supply as an XPath context item.

F.3. auth_delay

auth_delay 讓伺服器在回報身份驗證失敗之前會短暫暫停，從而使對資料庫密碼的暴力攻擊更加困難。請注意，它不會阻止拒絕服務攻擊，甚至可能加劇攻擊，因為在回報身份驗證失敗之前等待的程序仍然會佔用連線。

要使用這個功能，必須透過 postgresql.conf 中的 shared_preload_libraries 載入此模組。

F.3.1. Configuration Parameters

auth_delay.milliseconds (int)

The number of milliseconds to wait before reporting an authentication failure. The default is 0.

These parameters must be set in postgresql.conf. Typical usage might be:

# postgresql.conf
shared_preload_libraries = 'auth_delay'

auth_delay.milliseconds = '500'

F.3.2. Author

KaiGai Kohei <kaigai@ak.jp.nec.com>

F.4. auto_explain

auto_explain 模組提供了一種自動記錄慢速語句執行計劃的方法，毌須手動執行 EXPLAIN。這對於在大型應用程序中追踪未最佳化的查詢特別有用。

該模組不提供 SQL 可存取的功能。要使用它，只需將其載入到伺服器中即可。您也可以將其載入到單個連線之中：

LOAD 'auto_explain';

（您必須是超級使用者才能這樣做。）更典型的用法是透過在 postgresql.conf 中的 session_preload_libraries 或 shared_preload_libraries 中包含 auto_explain 將其預先載入到部分或全部連線中。然後，無論何時發生，您都可以追踪意外緩慢的查詢。當然，會有一些系統代價。

F.4.1. 組態參數

有幾個組態參數可以控制 auto_explain 的行為。請注意，預設行為是什麼都不做，因此如果需要任何結果，必須至少設定 auto_explain.log_min_duration。

auto_explain.log_min_duration (integer)

auto_explain.log_min_duration 是記錄語句計劃的最小語句執行時間（以毫秒為單位）。將此設定為零會記錄所有計劃。減號（預設值）停用計劃的記錄。例如，如果將其設定為 250ms，則將記錄執行 250ms 或更長時間的所有語句。只有超級使用者才能變更此設定。

auto_explain.log_analyze (boolean)

auto_explain.log_analyze 會在記錄執行計劃時列印 EXPLAIN ANALYZE 輸出，而不僅僅是 EXPLAIN 輸出。預設情況下，此參數處於停用狀態。只有超級使用者才能變更此設定。

啟用此參數後，將對所有執行的語句執行每計劃節點計時，無論它們是否執行足夠長時間以實際記錄。這可能會對效能產生極為不利的影響。關閉 auto_explain.log_timing 可以獲得較少的訊息，從而改善效能成本。

auto_explain.log_buffers (boolean)

auto_explain.log_buffers 控制是否在記錄執行計劃時輸出緩衝區使用情況統計訊息；它相當於 EXPLAIN 的 BUFFERS 選項。除非啟用了 auto_explain.log_analyze，否則此參數無效。預鉆水情況下，此參數處於停用狀態。只有超級使用者才能變更改此設定。

auto_explain.log_wal (boolean)

auto_explain.log_wal 控制在記錄執行計劃時是否輸出 WAL 使用情況統計資訊。它等同於 EXPLAIN 的 WAL 選項。除非啟用了 auto_explain.log_analyze，否則此參數無效。預設情況下，此參數是停用的。只有超級使用者可以變更此設定。

auto_explain.log_timing (boolean)

auto_explain.log_timing 控制在記錄執行計劃時是否輸出每個節點的計時訊息；它相當於 EXPLAIN 的 TIMING 選項。重複讀取系統時鐘的成本會在某些系統上明顯減慢查詢速度，因此當只需要實際資料列計數而非精確時間計時，將此參數設定為關閉可能很有用。除非啟用了 auto_explain.log_analyze，否則此參數無效。預設情況下，此參數處於啟用狀態。只有超級使用者才能變更此設定。

auto_explain.log_triggers (boolean)

auto_explain.log_triggers 會在記錄執行計劃時包含觸發器執行統計訊息。除非啟用了 auto_explain.log_analyze，否則此參數無效。預設情況下，此參數處於停用狀態。只有超級使用者才能變更此設定。

auto_explain.log_verbose (boolean)

auto_explain.log_verbose 控制是否在記錄執行計劃時輸出詳細訊息；它相當於 EXPLAIN 的 VERBOSE 選項。預設情況下，此參數處於停用狀態。只有超級使用者才能變更此設定。

auto_explain.log_settings (boolean)

auto_explain.log_settings 控制記錄執行計劃時是否輸出有關被修改的組態選項資訊。輸出中僅包含影響其值與內建預設值不同的查詢計劃的選項。預設情況下，此參數是停用的。只有超級使用者可以變更此設定。

auto_explain.log_format (enum)

auto_explain.log_format 選擇要使用的 EXPLAIN 輸出格式。允許的值為 text、xml、json 和 yaml。預設為 text。只有超級使用者才能變更此設定。

auto_explain.log_level (enum)

auto_explain.log_level 選擇 auto_explain 將記錄查詢計劃的日誌等級。有效值為 DEBUG5，DEBUG4，DEBUG3，DEBUG2，DEBUG1，INFO，NOTICE，WARNING 和 LOG。預設值為 LOG。只有超級使用者可以變更此設定。

auto_explain.log_nested_statements (boolean)

auto_explain.log_nested_statements 會讓巢狀語句（在函數內執行的語句）記錄下來。關閉時，僅記錄最上層查詢計劃。預設情況下，此參數處於停用狀態。只有超級使用者才能變更此設定。

auto_explain.sample_rate (real)

auto_explain.sample_rate 使 auto_explain 僅解釋每個連線中的一小部分語句。預設值為 1，表示 EXPLAIN 所有查詢。在巢狀語句的情況下，要就全部都要解釋，要就都不解釋。只有超級使用者才能變更此設定。

在一般的用法中，這些參數在 postgresql.conf 中設定，儘管超級使用者可以在他們自己的連線中即時更改它們。典型用法可能是：

# postgresql.conf
session_preload_libraries = 'auto_explain'

auto_explain.log_min_duration = '3s'

F.4.2. 範例

postgres=# LOAD 'auto_explain';
postgres=# SET auto_explain.log_min_duration = 0;
postgres=# SET auto_explain.log_analyze = true;
postgres=# SELECT count(*)
           FROM pg_class, pg_index
           WHERE oid = indrelid AND indisunique;

這可能會產生如下的日誌輸出：

LOG:  duration: 3.651 ms  plan:
  Query Text: SELECT count(*)
              FROM pg_class, pg_index
              WHERE oid = indrelid AND indisunique;
  Aggregate  (cost=16.79..16.80 rows=1 width=0) (actual time=3.626..3.627 rows=1 loops=1)
    ->  Hash Join  (cost=4.17..16.55 rows=92 width=0) (actual time=3.349..3.594 rows=92 loops=1)
          Hash Cond: (pg_class.oid = pg_index.indrelid)
          ->  Seq Scan on pg_class  (cost=0.00..9.55 rows=255 width=4) (actual time=0.016..0.140 rows=255 loops=1)
          ->  Hash  (cost=3.02..3.02 rows=92 width=4) (actual time=3.238..3.238 rows=92 loops=1)
                Buckets: 1024  Batches: 1  Memory Usage: 4kB
                ->  Seq Scan on pg_index  (cost=0.00..3.02 rows=92 width=4) (actual time=0.008..3.187 rows=92 loops=1)
                      Filter: indisunique

F.4.3. 作者

Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp>

F.2. amcheck

amcheck 模塊提供的功能是讓你可以驗證關連結構邏輯的一致性。如果該結構看起來有效，就不會引發任何錯誤。

The functions verify various invariants in the structure of the representation of particular relations. The correctness of the access method functions behind index scans and other important operations relies on these invariants always holding. For example, certain functions verify, among other things, that all B-Tree pages have items in “logical” order (e.g., for B-Tree indexes on text, index tuples should be in collated lexical order). If that particular invariant somehow fails to hold, we can expect binary searches on the affected page to incorrectly guide index scans, resulting in wrong answers to SQL queries.

Verification is performed using the same procedures as those used by index scans themselves, which may be user-defined operator class code. For example, B-Tree index verification relies on comparisons made with one or more B-Tree support function 1 routines. See Section 37.16.3 for details of operator class support functions.

amcheck functions may only be used by superusers.

F.2.1. Functions

bt_index_check(index regclass, heapallindexed boolean) returns void

bt_index_check tests that its target, a B-Tree index, respects a variety of invariants. Example usage:

test=# SELECT bt_index_check(index => c.oid, heapallindexed => i.indisunique),
               c.relname,
               c.relpages
FROM pg_index i
JOIN pg_opclass op ON i.indclass[0] = op.oid
JOIN pg_am am ON op.opcmethod = am.oid
JOIN pg_class c ON i.indexrelid = c.oid
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE am.amname = 'btree' AND n.nspname = 'pg_catalog'
-- Don't check temp tables, which may be from another session:
AND c.relpersistence != 't'
-- Function may throw an error when this is omitted:
AND c.relkind = 'i' AND i.indisready AND i.indisvalid
ORDER BY c.relpages DESC LIMIT 10;
 bt_index_check |             relname             | relpages 
----------------+---------------------------------+----------
                | pg_depend_reference_index       |       43
                | pg_depend_depender_index        |       40
                | pg_proc_proname_args_nsp_index  |       31
                | pg_description_o_c_o_index      |       21
                | pg_attribute_relid_attnam_index |       14
                | pg_proc_oid_index               |       10
                | pg_attribute_relid_attnum_index |        9
                | pg_amproc_fam_proc_index        |        5
                | pg_amop_opr_fam_index           |        5
                | pg_amop_fam_strat_index         |        5
(10 rows)

This example shows a session that performs verification of the 10 largest catalog indexes in the database “test”. Verification of the presence of heap tuples as index tuples is requested for the subset that are unique indexes. Since no error is raised, all indexes tested appear to be logically consistent. Naturally, this query could easily be changed to call bt_index_check for every index in the database where verification is supported.

bt_index_check acquires an AccessShareLock on the target index and the heap relation it belongs to. This lock mode is the same lock mode acquired on relations by simple SELECT statements. bt_index_check does not verify invariants that span child/parent relationships, but will verify the presence of all heap tuples as index tuples within the index when heapallindexed is true. When a routine, lightweight test for corruption is required in a live production environment, using bt_index_check often provides the best trade-off between thoroughness of verification and limiting the impact on application performance and availability.

bt_index_parent_check(index regclass, heapallindexed boolean, rootdescend boolean) returns void

bt_index_parent_check tests that its target, a B-Tree index, respects a variety of invariants. Optionally, when the heapallindexed argument is true, the function verifies the presence of all heap tuples that should be found within the index. When the optional rootdescend argument is true, verification re-finds tuples on the leaf level by performing a new search from the root page for each tuple. The checks that can be performed by bt_index_parent_check are a superset of the checks that can be performed by bt_index_check. bt_index_parent_check can be thought of as a more thorough variant of bt_index_check: unlike bt_index_check, bt_index_parent_check also checks invariants that span parent/child relationships, including checking that there are no missing downlinks in the index structure. bt_index_parent_check follows the general convention of raising an error if it finds a logical inconsistency or other problem.

A ShareLock is required on the target index by bt_index_parent_check (a ShareLock is also acquired on the heap relation). These locks prevent concurrent data modification from INSERT, UPDATE, and DELETE commands. The locks also prevent the underlying relation from being concurrently processed by VACUUM, as well as all other utility commands. Note that the function holds locks only while running, not for the entire transaction.

bt_index_parent_check's additional verification is more likely to detect various pathological cases. These cases may involve an incorrectly implemented B-Tree operator class used by the index that is checked, or, hypothetically, undiscovered bugs in the underlying B-Tree index access method code. Note that bt_index_parent_check cannot be used when Hot Standby mode is enabled (i.e., on read-only physical replicas), unlike bt_index_check.

Tip

bt_index_check and bt_index_parent_check both output log messages about the verification process at DEBUG1 and DEBUG2 severity levels. These messages provide detailed information about the verification process that may be of interest to PostgreSQL developers. Advanced users may also find this information helpful, since it provides additional context should verification actually detect an inconsistency. Running:

SET client_min_messages = DEBUG1;

in an interactive psql session before running a verification query will display messages about the progress of verification with a manageable level of detail.

F.2.2. Optional `heapallindexed` Verification

When the heapallindexed argument to verification functions is true, an additional phase of verification is performed against the table associated with the target index relation. This consists of a “dummy” CREATE INDEX operation, which checks for the presence of all hypothetical new index tuples against a temporary, in-memory summarizing structure (this is built when needed during the basic first phase of verification). The summarizing structure “fingerprints” every tuple found within the target index. The high level principle behind heapallindexed verification is that a new index that is equivalent to the existing, target index must only have entries that can be found in the existing structure.

The additional heapallindexed phase adds significant overhead: verification will typically take several times longer. However, there is no change to the relation-level locks acquired when heapallindexed verification is performed.

The summarizing structure is bound in size by maintenance_work_mem. In order to ensure that there is no more than a 2% probability of failure to detect an inconsistency for each heap tuple that should be represented in the index, approximately 2 bytes of memory are needed per tuple. As less memory is made available per tuple, the probability of missing an inconsistency slowly increases. This approach limits the overhead of verification significantly, while only slightly reducing the probability of detecting a problem, especially for installations where verification is treated as a routine maintenance task. Any single absent or malformed tuple has a new opportunity to be detected with each new verification attempt.

F.2.3. Using `amcheck` Effectively

amcheck can be effective at detecting various types of failure modes that data page checksums will always fail to catch. These include:

Structural inconsistencies caused by incorrect operator class implementations.
This includes issues caused by the comparison rules of operating system collations changing. Comparisons of datums of a collatable type like text must be immutable (just as all comparisons used for B-Tree index scans must be immutable), which implies that operating system collation rules must never change. Though rare, updates to operating system collation rules can cause these issues. More commonly, an inconsistency in the collation order between a master server and a standby server is implicated, possibly because the major operating system version in use is inconsistent. Such inconsistencies will generally only arise on standby servers, and so can generally only be detected on standby servers.
If a problem like this arises, it may not affect each individual index that is ordered using an affected collation, simply because indexed values might happen to have the same absolute ordering regardless of the behavioral inconsistency. See Section 23.1 and Section 23.2 for further details about how PostgreSQL uses operating system locales and collations.
Structural inconsistencies between indexes and the heap relations that are indexed (when heapallindexed verification is performed).
There is no cross-checking of indexes against their heap relation during normal operation. Symptoms of heap corruption can be subtle.
Corruption caused by hypothetical undiscovered bugs in the underlying PostgreSQL access method code, sort code, or transaction management code.
Automatic verification of the structural integrity of indexes plays a role in the general testing of new or proposed PostgreSQL features that could plausibly allow a logical inconsistency to be introduced. Verification of table structure and associated visibility and transaction status information plays a similar role. One obvious testing strategy is to call amcheck functions continuously when running the standard regression tests. See Section 32.1 for details on running the tests.
File system or storage subsystem faults where checksums happen to simply not be enabled.
Note that amcheck examines a page as represented in some shared memory buffer at the time of verification if there is only a shared buffer hit when accessing the block. Consequently, amcheck does not necessarily examine data read from the file system at the time of verification. Note that when checksums are enabled, amcheck may raise an error due to a checksum failure when a corrupt block is read into a buffer.
Corruption caused by faulty RAM, or the broader memory subsystem.
PostgreSQL does not protect against correctable memory errors and it is assumed you will operate using RAM that uses industry standard Error Correcting Codes (ECC) or better protection. However, ECC memory is typically only immune to single-bit errors, and should not be assumed to provide absolute protection against failures that result in memory corruption.
When heapallindexed verification is performed, there is generally a greatly increased chance of detecting single-bit errors, since strict binary equality is tested, and the indexed attributes within the heap are tested.

In general, amcheck can only prove the presence of corruption; it cannot prove its absence.

F.2.4. Repairing Corruption

No error concerning corruption raised by amcheck should ever be a false positive. amcheck raises errors in the event of conditions that, by definition, should never happen, and so careful analysis of amcheck errors is often required.

There is no general method of repairing problems that amcheck detects. An explanation for the root cause of an invariant violation should be sought. pageinspect may play a useful role in diagnosing corruption that amcheck detects. A REINDEX may not be effective in repairing corruption.

F.5. bloom

Bloom 提供了一種基於 Bloom fliters 索引方法。

Bloom filter 是一種節省空間的資料結構，用於測試元素是否為集合的成員。在使用索引方法時，它透過 signatures（其大小是在建立索引時確定的）快速排除不相符的內容。

Signature 是索引屬性的失真表示，因此很容易誤報。也就是說，可能會回報某個元素在集合中（實際上沒有）。因此，必須使用資料中的實際屬性值來重新檢查索引並搜尋結果。較大的 signature 會減少了誤報的機率，也減少了無效的存取次數，但當然也會使索引變大，造成掃描速度變慢。

當資料表具有許多屬性並且查詢測試它們的任意組合時，這種類型的索引最有用。傳統的 btree 索引會比 Bloom 索引快，但是它可能需要許多 btree 索引來支援所有可能的查詢，而其中一個查詢只需要一個 Bloom 索引。但是請注意，bloom 索引僅支援相等查詢，而 btree 索引也可以用於不相等和範圍查詢。

F.5.1. 參數

Bloom 索引的 WITH 子句接受以下參數：

length

每個 signature（索引項目）的長度（以位元為單位）。四捨五入到最接近的 16 的倍數。預設值為 80 位元，最大值為 4096。

col1 — col32

每個索引欄位產成的位元數。每個參數的名稱指的是它控制的索引欄位的編號。預設值為 2 位元，最大值為 4095。實際未使用的索引欄位的參數將被忽略。

F.5.2. Examples

This is an example of creating a bloom index:

CREATE INDEX bloomidx ON tbloom USING bloom (i1,i2,i3)
       WITH (length=80, col1=2, col2=2, col3=4);

The index is created with a signature length of 80 bits, with attributes i1 and i2 mapped to 2 bits, and attribute i3 mapped to 4 bits. We could have omitted the length, col1, and col2 specifications since those have the default values.

Here is a more complete example of bloom index definition and usage, as well as a comparison with equivalent btree indexes. The bloom index is considerably smaller than the btree index, and can perform better.

=# CREATE TABLE tbloom AS
   SELECT
     (random() * 1000000)::int as i1,
     (random() * 1000000)::int as i2,
     (random() * 1000000)::int as i3,
     (random() * 1000000)::int as i4,
     (random() * 1000000)::int as i5,
     (random() * 1000000)::int as i6
   FROM
  generate_series(1,10000000);
SELECT 10000000
=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
=# SELECT pg_size_pretty(pg_relation_size('bloomidx'));
 pg_size_pretty
----------------
 153 MB
(1 row)
=# CREATE index btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
=# SELECT pg_size_pretty(pg_relation_size('btreeidx'));
 pg_size_pretty
----------------
 387 MB
(1 row)

A sequential scan over this large table takes a long time:

=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Seq Scan on tbloom  (cost=0.00..213694.08 rows=1 width=24) (actual time=1445.438..1445.438 rows=0 loops=1)
   Filter: ((i2 = 898732) AND (i5 = 123451))
   Rows Removed by Filter: 10000000
 Planning time: 0.177 ms
 Execution time: 1445.473 ms
(5 rows)

So the planner will usually select an index scan if possible. With a btree index, we get results like this:

=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using btreeidx on tbloom  (cost=0.56..298311.96 rows=1 width=24) (actual time=445.709..445.709 rows=0 loops=1)
   Index Cond: ((i2 = 898732) AND (i5 = 123451))
   Heap Fetches: 0
 Planning time: 0.193 ms
 Execution time: 445.770 ms
(5 rows)

Bloom is better than btree in handling this type of search:

=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbloom  (cost=178435.39..178439.41 rows=1 width=24) (actual time=76.698..76.698 rows=0 loops=1)
   Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
   Rows Removed by Index Recheck: 2439
   Heap Blocks: exact=2408
   ->  Bitmap Index Scan on bloomidx  (cost=0.00..178435.39 rows=1 width=0) (actual time=72.455..72.455 rows=2439 loops=1)
         Index Cond: ((i2 = 898732) AND (i5 = 123451))
 Planning time: 0.475 ms
 Execution time: 76.778 ms
(8 rows)

請注意，這裡的誤報的數量相對較多：被選擇 2439 筆資料要進行確認，但實際上沒有與查詢相符的資料。我們可以透過指定更大的 signature 長度來減少這種情況。在此範例中，建立長度為 200 的索引可將誤報數量減少到 55；但它也使索引大小增加了一倍（至 306 MB），並最終使該查詢的速度變慢（總計 125 毫秒）。

現在，btree 搜搜的主要問題在於，當搜搜條件不限制前導索引欄時，btree 的效率低下。btree 的更好策略是在每欄位上建立一個單獨的索引。然後計劃程序將會選擇規劃以下內容：

=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbloom  (cost=9.29..13.30 rows=1 width=24) (actual time=0.148..0.148 rows=0 loops=1)
   Recheck Cond: ((i5 = 123451) AND (i2 = 898732))
   ->  BitmapAnd  (cost=9.29..9.29 rows=1 width=0) (actual time=0.145..0.145 rows=0 loops=1)
         ->  Bitmap Index Scan on tbloom_i5_idx  (cost=0.00..4.52 rows=11 width=0) (actual time=0.089..0.089 rows=10 loops=1)
               Index Cond: (i5 = 123451)
         ->  Bitmap Index Scan on tbloom_i2_idx  (cost=0.00..4.52 rows=11 width=0) (actual time=0.048..0.048 rows=8 loops=1)
               Index Cond: (i2 = 898732)
 Planning time: 2.049 ms
 Execution time: 0.280 ms
(9 rows)

儘管此查詢的執行速度比使用單個索引的查詢快得多，但我們在索引大小上付出了很大的代價。每個單欄位 btree 索引佔用 214 MB，因此所需的總空間超過 1.2GB，是 Bloom 索引使用的空間 8 倍以上。

F.5.3. Operator Class Interface

An operator class for bloom indexes requires only a hash function for the indexed data type and an equality operator for searching. This example shows the operator class definition for the text data type:

CREATE OPERATOR CLASS text_ops
DEFAULT FOR TYPE text USING bloom AS
    OPERATOR    1   =(text, text),
    FUNCTION    1   hashtext(text);

F.5.4. Limitations

Only operator classes for int4 and text are included with the module.
Only the = operator is supported for search. But it is possible to add support for arrays with union and intersection operations in the future.
bloom access method doesn't support UNIQUE indexes.
bloom access method doesn't support searching for NULL values.

F.5.5. Authors

Teodor Sigaev <teodor@postgrespro.ru>, Postgres Professional, Moscow, Russia

Alexander Korotkov <a.korotkov@postgrespro.ru>, Postgres Professional, Moscow, Russia

Oleg Bartunov <obartunov@postgrespro.ru>, Postgres Professional, Moscow, Russia

dblink_disconnect

dblink_disconnect — closes a persistent connection to a remote database

Synopsis

dblink_disconnect() returns text
dblink_disconnect(text connname) returns text

Description

dblink_disconnect() closes a connection previously opened by dblink_connect(). The form with no arguments closes an unnamed connection.

Arguments

connname

The name of a named connection to be closed.

Return Value

Returns status, which is always OK (since any error causes the function to throw an error instead of returning).

Examples

SELECT dblink_disconnect();
 dblink_disconnect
-------------------
 OK
(1 row)

SELECT dblink_disconnect('myconn');
 dblink_disconnect
-------------------
 OK
(1 row)

dblink

dblink — executes a query in a remote database

Synopsis

dblink(text connname, text sql [, bool fail_on_error]) returns setof record
dblink(text connstr, text sql [, bool fail_on_error]) returns setof record
dblink(text sql [, bool fail_on_error]) returns setof record

Description

dblink executes a query (usually a SELECT, but it can be any SQL statement that returns rows) in a remote database.

When two text arguments are given, the first one is first looked up as a persistent connection's name; if found, the command is executed on that connection. If not found, the first argument is treated as a connection info string as for dblink_connect, and the indicated connection is made just for the duration of this command.

Arguments

connname

Name of the connection to use; omit this parameter to use the unnamed connection.connstr

A connection info string, as previously described for dblink_connect.

sql

The SQL query that you wish to execute in the remote database, for example select * from foo.

fail_on_error

If true (the default when omitted) then an error thrown on the remote side of the connection causes an error to also be thrown locally. If false, the remote error is locally reported as a NOTICE, and the function returns no rows.

Return Value

The function returns the row(s) produced by the query. Since dblink can be used with any query, it is declared to return record, rather than specifying any particular set of columns. This means that you must specify the expected set of columns in the calling query — otherwise PostgreSQL would not know what to expect. Here is an example:

SELECT *
    FROM dblink('dbname=mydb options=-csearch_path=',
                'select proname, prosrc from pg_proc')
      AS t1(proname name, prosrc text)
    WHERE proname LIKE 'bytea%';

The “alias” part of the FROM clause must specify the column names and types that the function will return. (Specifying column names in an alias is actually standard SQL syntax, but specifying column types is a PostgreSQL extension.) This allows the system to understand what * should expand to, and what proname in the WHERE clause refers to, in advance of trying to execute the function. At run time, an error will be thrown if the actual query result from the remote database does not have the same number of columns shown in the FROM clause. The column names need not match, however, and dblink does not insist on exact type matches either. It will succeed so long as the returned data strings are valid input for the column type declared in the FROM clause.

Notes

A convenient way to use dblink with predetermined queries is to create a view. This allows the column type information to be buried in the view, instead of having to spell it out in every query. For example,

CREATE VIEW myremote_pg_proc AS
  SELECT *
    FROM dblink('dbname=postgres options=-csearch_path=',
                'select proname, prosrc from pg_proc')
    AS t1(proname name, prosrc text);

SELECT * FROM myremote_pg_proc WHERE proname LIKE 'bytea%';

Examples

SELECT * FROM dblink('dbname=postgres options=-csearch_path=',
                     'select proname, prosrc from pg_proc')
  AS t1(proname name, prosrc text) WHERE proname LIKE 'bytea%';
  proname   |   prosrc
------------+------------
 byteacat   | byteacat
 byteaeq    | byteaeq
 bytealt    | bytealt
 byteale    | byteale
 byteagt    | byteagt
 byteage    | byteage
 byteane    | byteane
 byteacmp   | byteacmp
 bytealike  | bytealike
 byteanlike | byteanlike
 byteain    | byteain
 byteaout   | byteaout
(12 rows)

SELECT dblink_connect('dbname=postgres options=-csearch_path=');
 dblink_connect
----------------
 OK
(1 row)

SELECT * FROM dblink('select proname, prosrc from pg_proc')
  AS t1(proname name, prosrc text) WHERE proname LIKE 'bytea%';
  proname   |   prosrc
------------+------------
 byteacat   | byteacat
 byteaeq    | byteaeq
 bytealt    | bytealt
 byteale    | byteale
 byteagt    | byteagt
 byteage    | byteage
 byteane    | byteane
 byteacmp   | byteacmp
 bytealike  | bytealike
 byteanlike | byteanlike
 byteain    | byteain
 byteaout   | byteaout
(12 rows)

SELECT dblink_connect('myconn', 'dbname=regression options=-csearch_path=');
 dblink_connect
----------------
 OK
(1 row)

SELECT * FROM dblink('myconn', 'select proname, prosrc from pg_proc')
  AS t1(proname name, prosrc text) WHERE proname LIKE 'bytea%';
  proname   |   prosrc
------------+------------
 bytearecv  | bytearecv
 byteasend  | byteasend
 byteale    | byteale
 byteagt    | byteagt
 byteage    | byteage
 byteane    | byteane
 byteacmp   | byteacmp
 bytealike  | bytealike
 byteanlike | byteanlike
 byteacat   | byteacat
 byteaeq    | byteaeq
 bytealt    | bytealt
 byteain    | byteain
 byteaout   | byteaout
(14 rows)

dblink_connect

dblink_connect — opens a persistent connection to a remote database

Synopsis

dblink_connect(text connstr) returns text
dblink_connect(text connname, text connstr) returns text

Description

dblink_connect() establishes a connection to a remote PostgreSQL database. The server and database to be contacted are identified through a standard libpq connection string. Optionally, a name can be assigned to the connection. Multiple named connections can be open at once, but only one unnamed connection is permitted at a time. The connection will persist until closed or until the database session is ended.

The connection string may also be the name of an existing foreign server. It is recommended to use the foreign-data wrapper dblink_fdw when defining the foreign server. See the example below, as well as CREATE SERVER and CREATE USER MAPPING.

Arguments

connname

The name to use for this connection; if omitted, an unnamed connection is opened, replacing any existing unnamed connection.connstr

libpq-style connection info string, for example hostaddr=127.0.0.1 port=5432 dbname=mydb user=postgres password=mypasswd options=-csearch_path=. For details see Section 33.1.1. Alternatively, the name of a foreign server.

Return Value

Returns status, which is always OK (since any error causes the function to throw an error instead of returning).

Notes

If untrusted users have access to a database that has not adopted a secure schema usage pattern, begin each session by removing publicly-writable schemas from search_path. One could, for example, add options=-csearch_path= to connstr. This consideration is not specific to dblink; it applies to every interface for executing arbitrary SQL commands.

Only superusers may use dblink_connect to create non-password-authenticated connections. If non-superusers need this capability, use dblink_connect_u instead.

It is unwise to choose connection names that contain equal signs, as this opens a risk of confusion with connection info strings in other dblink functions.

Examples

SELECT dblink_connect('dbname=postgres options=-csearch_path=');
 dblink_connect
----------------
 OK
(1 row)

SELECT dblink_connect('myconn', 'dbname=postgres options=-csearch_path=');
 dblink_connect
----------------
 OK
(1 row)

-- FOREIGN DATA WRAPPER functionality
-- Note: local connection must require password authentication for this to work properly
--       Otherwise, you will receive the following error from dblink_connect():
--       ----------------------------------------------------------------------
--       ERROR:  password is required
--       DETAIL:  Non-superuser cannot connect if the server does not request a password.
--       HINT:  Target server's authentication method must be changed.

CREATE SERVER fdtest FOREIGN DATA WRAPPER dblink_fdw OPTIONS (hostaddr '127.0.0.1', dbname 'contrib_regression');

CREATE USER regress_dblink_user WITH PASSWORD 'secret';
CREATE USER MAPPING FOR regress_dblink_user SERVER fdtest OPTIONS (user 'regress_dblink_user', password 'secret');
GRANT USAGE ON FOREIGN SERVER fdtest TO regress_dblink_user;
GRANT SELECT ON TABLE foo TO regress_dblink_user;

\set ORIGINAL_USER :USER
\c - regress_dblink_user
SELECT dblink_connect('myconn', 'fdtest');
 dblink_connect 
----------------
 OK
(1 row)

SELECT * FROM dblink('myconn','SELECT * FROM foo') AS t(a int, b text, c text[]);
 a  | b |       c       
----+---+---------------
  0 | a | {a0,b0,c0}
  1 | b | {a1,b1,c1}
  2 | c | {a2,b2,c2}
  3 | d | {a3,b3,c3}
  4 | e | {a4,b4,c4}
  5 | f | {a5,b5,c5}
  6 | g | {a6,b6,c6}
  7 | h | {a7,b7,c7}
  8 | i | {a8,b8,c8}
  9 | j | {a9,b9,c9}
 10 | k | {a10,b10,c10}
(11 rows)

\c - :ORIGINAL_USER
REVOKE USAGE ON FOREIGN SERVER fdtest FROM regress_dblink_user;
REVOKE SELECT ON TABLE foo FROM regress_dblink_user;
DROP USER MAPPING FOR regress_dblink_user SERVER fdtest;
DROP USER regress_dblink_user;
DROP SERVER fdtest;

F.6. btree_gin

btree_gin provides sample GIN operator classes that implement B-tree equivalent behavior for the data types int2, int4, int8, float4, float8, timestamp with time zone, timestamp without time zone, time with time zone, time without time zone, date, interval, oid, money, "char", varchar, text, bytea, bit, varbit, macaddr, macaddr8, inet, cidr, uuid, name, bool, bpchar, and all enum types.

通常，這些運算子類不會優於等效的標準 B-tree 索引方法，並且它們缺少標準 B-tree 的一個主要功能：強制執行唯一性。但是，它們對於 GIN 測試很有用，並且可以作為開發其他 GIN 運算子類的基礎。同樣地，對於同時測試可索引 GIN 欄位和 B-tree 可索引列的查詢，建立使用這些運算子之一的多欄位 GIN 索引可能比建立兩個必須獨立的索引更有效，以 bitmap ANDing 的方式。

F.6.1. 使用範例

CREATE TABLE test (a int4);
-- create index
CREATE INDEX testidx ON test USING GIN (a);
-- query
SELECT * FROM test WHERE a < 10;

F.6.2. 作者們

Teodor Sigaev (<teodor@stack.net>) and Oleg Bartunov (<oleg@sai.msu.su>). See http://www.sai.msu.su/~megera/oddmuse/index.cgi/Gin for additional information.

F.1. adminpack

adminpack 提供了許多支援性質的函數，pgAdmin 和其他管理工具可以用來提供其他功能，例如伺服器日誌檔案的遠程管理。預設情況下，僅超級使用者可以使用所有的這些函數，但使用 GRANT 指令可以允許其他使用者使用它們。

Table F.1 中列出的函數提供了對伺服器的主機上檔案的寫入功能。（另請參見 Table 9.95 中提供唯讀的功能。）只能存取資料庫叢集目錄中的檔案，除非使用者是超級使用者或具有 pg_read_server_files 或 pg_write_server_files 個角色（視情況而定），而相對路徑及絕對路徑都是允許的。

Table F.1. `adminpack` Functions

pg_file_write writes the specified data into the file named by filename. If append is false, the file must not already exist. If append is true, the file can already exist, and will be appended to if so. Returns the number of bytes written.

pg_file_sync fsyncs the specified file or directory named by filename. An error is thrown on failure (e.g., the specified file is not present). Note that data_sync_retry has no effect on this function, and therefore a PANIC-level error will not be raised even on failure to flush database files.

pg_file_rename renames a file. If archivename is omitted or NULL, it simply renames oldname to newname (which must not already exist). If archivename is provided, it first renames newname to archivename (which must not already exist), and then renames oldname to newname. In event of failure of the second rename step, it will try to rename archivename back to newname before reporting the error. Returns true on success, false if the source file(s) are not present or not writable; other cases throw errors.

pg_file_unlink removes the specified file. Returns true on success, false if the specified file is not present or the unlink() call fails; other cases throw errors.

pg_logdir_ls returns the start timestamps and path names of all the log files in the log_directory directory. The log_filename parameter must have its default setting (postgresql-%Y-%m-%d_%H%M%S.log) to use this function.

B.4. 日期時間的沿革

The SQL standard states that“Within the definition of a‘datetime literal’, the‘datetime values’are constrained by the natural rules for dates and times according to the Gregorian calendar”.PostgreSQLfollows the SQL standard's lead by counting dates exclusively in the Gregorian calendar, even for years before that calendar was in use. This rule is known as theproleptic Gregorian calendar.

The Julian calendar was introduced by Julius Caesar in 45 BC. It was in common use in the Western world until the year 1582, when countries started changing to the Gregorian calendar. In the Julian calendar, the tropical year is approximated as 365 1/4 days = 365.25 days. This gives an error of about 1 day in 128 years.

The accumulating calendar error prompted Pope Gregory XIII to reform the calendar in accordance with instructions from the Council of Trent. In the Gregorian calendar, the tropical year is approximated as 365 + 97 / 400 days = 365.2425 days. Thus it takes approximately 3300 years for the tropical year to shift one day with respect to the Gregorian calendar.

The approximation 365+97/400 is achieved by having 97 leap years every 400 years, using the following rules:

So, 1700, 1800, 1900, 2100, and 2200 are not leap years. But 1600, 2000, and 2400 are leap years. By contrast, in the older Julian calendar all years divisible by 4 are leap years.

The papal bull of February 1582 decreed that 10 days should be dropped from October 1582 so that 15 October should follow immediately after 4 October. This was observed in Italy, Poland, Portugal, and Spain. Other Catholic countries followed shortly after, but Protestant countries were reluctant to change, and the Greek Orthodox countries didn't change until the start of the 20th century. The reform was observed by Great Britain and its dominions (including what is now the USA) in 1752. Thus 2 September 1752 was followed by 14 September 1752. This is why Unix systems have thecalprogram produce the following:

$ cal 9 1752

   September 1752
 S  M Tu  W Th  F  S
       1  2 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29 30

But, of course, this calendar is only valid for Great Britain and dominions, not other places. Since it would be difficult and confusing to try to track the actual calendars that were in use in various places at various times,PostgreSQLdoes not try, but rather follows the Gregorian calendar rules for all dates, even though this method is not historically accurate.

Different calendars have been developed in various parts of the world, many predating the Gregorian system. For example, the beginnings of the Chinese calendar can be traced back to the 14th century BC. Legend has it that the Emperor Huangdi invented that calendar in 2637 BC. The People's Republic of China uses the Gregorian calendar for civil purposes. The Chinese calendar is used for determining festivals.

The_Julian Date_system is another type of calendar, unrelated to the Julian calendar though it is confusingly named similarly to that calendar. The Julian Date system was invented by the French scholar Joseph Justus Scaliger (1540-1609) and probably takes its name from Scaliger's father, the Italian scholar Julius Caesar Scaliger (1484-1558). In the Julian Date system, each day has a sequential number, starting from JD 0 (which is sometimes called_the_Julian Date). JD 0 corresponds to 1 January 4713 BC in the Julian calendar, or 24 November 4714 BC in the Gregorian calendar. Julian Date counting is most often used by astronomers for labeling their nightly observations, and therefore a date runs from noon UTC to the next noon UTC, rather than from midnight to midnight: JD 0 designates the 24 hours from noon UTC on 24 November 4714 BC to noon UTC on 25 November 4714 BC.

AlthoughPostgreSQLsupports Julian Date notation for input and output of dates (and also uses Julian dates for some internal datetime calculations), it does not observe the nicety of having dates run from noon to noon.PostgreSQLtreats a Julian Date as running from midnight to midnight.

https://www.postgresql.org/docs/10/static/datetime-units-history.html

A. PostgreSQL 錯誤代碼

所有由PostgreSQL伺服器回傳的訊息，都會搭配一個 5 個字元的錯誤代碼，這些代碼均以 SQL 標準的 SQLSTATE 代碼所定義。應用程式應該以錯誤代碼作為程式行為，而非處理文字的錯誤訊息。這些錯誤代碼會隨 PostgreSQL 版本發佈而微幅更新，但不會因為不同語言的文字訊息影響其定義。特別注意，部份錯誤代碼是以 SQL 標準的定義所制定，而有些錯誤代碼是額外制定的，以對應其他 SQL 標準未定義的使用情況。

標準來看，錯誤代碼的前 2 個字元代表著這個錯誤的類別，而後面 3 個字元則指出在此類別中更詳細的發生情況。所以，應用程式即使未完整支援特定的錯誤代碼，仍然可以依類別代碼瞭解大概發生了什麼事。

下方的 Table A.1 列出了所有在 PostgreSQL 10 中所定義的錯誤代碼。（部份錯誤代碼可能實際上沒有使用，但仍然依 SQL 標準所制定）每一個錯誤的類別都有一個＂標準＂的錯誤情況，其最後 3 個字元為 000，只用於在該類別未特別詳細定義的錯誤情況。

在＂Condition Name＂欄位中的內容將能在 PL/pgSQL 中被使用。Condition Name 不論大小寫字母均可合法使用。（特別注意，PL/pgSQL 並不支援 00、01 及 02 的錯誤類別。）

有一些錯誤訊息，其回傳內容是發生錯誤的資料庫物件名稱（表格 Table、表格欄位 Table column、資料型別 data type、限制條件 constraint）。舉例來說，如果產生了 unique_violation 錯誤，則會回傳某個限制條件名稱。這些訊息將會分別額外的欄位回傳，所以應用程式不需要特別解譯描述性的訊息內容。在 PostgreSQL 9.3之前，這種回傳方式僅限於 SQLSTATE 類別 23（違反限制條件的一致性 integrity constraint violation），日後將儘可能延伸支援到所有類別。

Table A.1. PostgreSQL Error Codes

https://www.postgresql.org/docs/10/static/errcodes-appendix.html

F. 延伸支援模組

This appendix and the next one contain information regarding the modules that can be found in the contrib directory of the PostgreSQL distribution. These include porting tools, analysis utilities, and plug-in features that are not part of the core PostgreSQL system, mainly because they address a limited audience or are too experimental to be part of the main source tree. This does not preclude their usefulness.

This appendix covers extensions and other server plug-in modules found in contrib. covers utility programs.

When building from the source distribution, these components are not built automatically, unless you build the "world" target (see ). You can build and install all of them by running:

in the contrib directory of a configured source tree; or to build and install just one selected module, do the same in that module's subdirectory. Many of the modules have regression tests, which can be executed by running:

before installation or

once you have a PostgreSQL server running.

If you are using a pre-packaged version of PostgreSQL, these modules are typically made available as a separate subpackage, such as postgresql-contrib.

Many modules supply new user-defined functions, operators, or types. To make use of one of these modules, after you have installed the code you need to register the new SQL objects in the database system. This is done by executing a command. In a fresh database, you can simply do

This command registers the new SQL objects in the current database only, so you need to run it in each database that you want the module's facilities to be available in. Alternatively, run it in database template1 so that the extension will be copied into subsequently-created databases by default.

For all these modules, CREATE EXTENSION must be run by a database superuser, unless the module is considered “trusted”, in which case it can be run by any user who has CREATE privilege on the current database. Modules that are trusted are identified as such in the sections that follow. Generally, trusted modules are ones that cannot provide access to outside-the-database functionality.

Many modules allow you to install their objects in a schema of your choice. To do that, add SCHEMA schema_name to the CREATE EXTENSION command. By default, the objects will be placed in your current creation target schema, which in turn defaults to public.

Note, however, that some of these modules are not “extensions” in this sense, but are loaded into the server in some other way, for instance by way of . See the documentation of each module for details.

SELECT dblink_connect('dbname=postgres options=-csearch_path=');
 dblink_connect
----------------
 OK
(1 row)

SELECT dblink_open('foo', 'select proname, prosrc from pg_proc');
 dblink_open
-------------
 OK
(1 row)

dblink_exec(text connname, text sql [, bool fail_on_error]) returns text
dblink_exec(text connstr, text sql [, bool fail_on_error]) returns text
dblink_exec(text sql [, bool fail_on_error]) returns text

SELECT dblink_connect('dbname=dblink_test_standby');
 dblink_connect
----------------
 OK
(1 row)

SELECT dblink_exec('insert into foo values(21,''z'',''{"a0","b0","c0"}'');');
   dblink_exec
-----------------
 INSERT 943366 1
(1 row)

SELECT dblink_connect('myconn', 'dbname=regression');
 dblink_connect
----------------
 OK
(1 row)

SELECT dblink_exec('myconn', 'insert into foo values(21,''z'',''{"a0","b0","c0"}'');');
   dblink_exec
------------------
 INSERT 6432584 1
(1 row)

SELECT dblink_exec('myconn', 'insert into pg_class values (''foo'')',false);
NOTICE:  sql error
DETAIL:  ERROR:  null value in column "relnamespace" violates not-null constraint

 dblink_exec
-------------
 ERROR
(1 row)

SELECT dblink_connect('dbname=postgres options=-csearch_path=');
 dblink_connect
----------------
 OK
(1 row)

SELECT dblink_open('foo', 'select proname, prosrc from pg_proc');
 dblink_open
-------------
 OK
(1 row)

SELECT dblink_close('foo');
 dblink_close
--------------
 OK
(1 row)

SELECT dblink_connect('dbname=postgres options=-csearch_path=');
 dblink_connect
----------------
 OK
(1 row)

SELECT dblink_open('foo', 'select proname, prosrc from pg_proc where proname like ''bytea%''');
 dblink_open
-------------
 OK
(1 row)

SELECT * FROM dblink_fetch('foo', 5) AS (funcname name, source text);
 funcname |  source
----------+----------
 byteacat | byteacat
 byteacmp | byteacmp
 byteaeq  | byteaeq
 byteage  | byteage
 byteagt  | byteagt
(5 rows)

SELECT * FROM dblink_fetch('foo', 5) AS (funcname name, source text);
 funcname  |  source
-----------+-----------
 byteain   | byteain
 byteale   | byteale
 bytealike | bytealike
 bytealt   | bytealt
 byteane   | byteane
(5 rows)

SELECT * FROM dblink_fetch('foo', 5) AS (funcname name, source text);
  funcname  |   source
------------+------------
 byteanlike | byteanlike
 byteaout   | byteaout
(2 rows)

SELECT * FROM dblink_fetch('foo', 5) AS (funcname name, source text);
 funcname | source
----------+--------
(0 rows)

contrib_regression=# SELECT dblink_connect('dtest1', 'dbname=contrib_regression');
 dblink_connect
----------------
 OK
(1 row)

contrib_regression=# SELECT * FROM
contrib_regression-# dblink_send_query('dtest1', 'select * from foo where f1 < 3') AS t1;
 t1
----
  1
(1 row)

contrib_regression=# SELECT * FROM dblink_get_result('dtest1') AS t1(f1 int, f2 text, f3 text[]);
 f1 | f2 |     f3
----+----+------------
  0 | a  | {a0,b0,c0}
  1 | b  | {a1,b1,c1}
  2 | c  | {a2,b2,c2}
(3 rows)

contrib_regression=# SELECT * FROM dblink_get_result('dtest1') AS t1(f1 int, f2 text, f3 text[]);
 f1 | f2 | f3
----+----+----
(0 rows)

contrib_regression=# SELECT * FROM
contrib_regression-# dblink_send_query('dtest1', 'select * from foo where f1 < 3; select * from foo where f1 > 6') AS t1;
 t1
----
  1
(1 row)

contrib_regression=# SELECT * FROM dblink_get_result('dtest1') AS t1(f1 int, f2 text, f3 text[]);
 f1 | f2 |     f3
----+----+------------
  0 | a  | {a0,b0,c0}
  1 | b  | {a1,b1,c1}
  2 | c  | {a2,b2,c2}
(3 rows)

contrib_regression=# SELECT * FROM dblink_get_result('dtest1') AS t1(f1 int, f2 text, f3 text[]);
 f1 | f2 |      f3
----+----+---------------
  7 | h  | {a7,b7,c7}
  8 | i  | {a8,b8,c8}
  9 | j  | {a9,b9,c9}
 10 | k  | {a10,b10,c10}
(4 rows)

contrib_regression=# SELECT * FROM dblink_get_result('dtest1') AS t1(f1 int, f2 text, f3 text[]);
 f1 | f2 | f3
----+----+----
(0 rows)

SELECT dblink_exec('LISTEN virtual');
 dblink_exec 
-------------
 LISTEN
(1 row)

SELECT * FROM dblink_get_notify();
 notify_name | be_pid | extra
-------------+--------+-------
(0 rows)

NOTIFY virtual;
NOTIFY

SELECT * FROM dblink_get_notify();
 notify_name | be_pid | extra
-------------+--------+-------
 virtual     |   1229 |
(1 row)

dblink_build_sql_delete

dblink_build_sql_delete — builds a DELETE statement using supplied values for primary key field values

Synopsis

dblink_build_sql_delete(text relname,
                        int2vector primary_key_attnums,
                        integer num_primary_key_atts,
                        text[] tgt_pk_att_vals_array) returns text

Description

dblink_build_sql_delete can be useful in doing selective replication of a local table to a remote database. It builds a SQL DELETE command that will delete the row with the given primary key values.

Arguments

relname

Name of a local relation, for example foo or myschema.mytab. Include double quotes if the name is mixed-case or contains special characters, for example "FooBar"; without quotes, the string will be folded to lower case.

primary_key_attnums

Attribute numbers (1-based) of the primary key fields, for example 1 2.

num_primary_key_atts

The number of primary key fields.

tgt_pk_att_vals_array

Values of the primary key fields to be used in the resulting DELETE command. Each field is represented in text form.

Return Value

Returns the requested SQL statement as text.

Notes

As of PostgreSQL 9.0, the attribute numbers in primary_key_attnums are interpreted as logical column numbers, corresponding to the column's position in SELECT * FROM relname. Previous versions interpreted the numbers as physical column positions. There is a difference if any column(s) to the left of the indicated column have been dropped during the lifetime of the table.

Examples

SELECT dblink_build_sql_delete('"MyFoo"', '1 2', 2, '{"1", "b"}');
           dblink_build_sql_delete
---------------------------------------------
 DELETE FROM "MyFoo" WHERE f1='1' AND f2='b'
(1 row)

dblink_get_pkey

dblink_get_pkey — returns the positions and field names of a relation's primary key fields

Synopsis

dblink_get_pkey(text relname) returns setof dblink_pkey_results

Description

dblink_get_pkey provides information about the primary key of a relation in the local database. This is sometimes useful in generating queries to be sent to remote databases.

Arguments

relname

Return Value

Returns one row for each primary key field, or no rows if the relation has no primary key. The result row type is defined as

CREATE TYPE dblink_pkey_results AS (position int, colname text);

The position column simply runs from 1 to N; it is the number of the field within the primary key, not the number within the table's columns.

Examples

CREATE TABLE foobar (
    f1 int,
    f2 int,
    f3 int,
    PRIMARY KEY (f1, f2, f3)
);
CREATE TABLE

SELECT * FROM dblink_get_pkey('foobar');
 position | colname
----------+---------
        1 | f1
        2 | f2
        3 | f3
(3 rows)

dblink_build_sql_insert

dblink_build_sql_insert — builds an INSERT statement using a local tuple, replacing the primary key field values with alternative supplied values

Synopsis

dblink_build_sql_insert(text relname,
                        int2vector primary_key_attnums,
                        integer num_primary_key_atts,
                        text[] src_pk_att_vals_array,
                        text[] tgt_pk_att_vals_array) returns text

Description

dblink_build_sql_insert can be useful in doing selective replication of a local table to a remote database. It selects a row from the local table based on primary key, and then builds a SQL INSERT command that will duplicate that row, but with the primary key values replaced by the values in the last argument. (To make an exact copy of the row, just specify the same values for the last two arguments.)

Arguments

relname

primary_key_attnums

Attribute numbers (1-based) of the primary key fields, for example 1 2.

num_primary_key_atts

The number of primary key fields.

src_pk_att_vals_array

Values of the primary key fields to be used to look up the local tuple. Each field is represented in text form. An error is thrown if there is no local row with these primary key values.

tgt_pk_att_vals_array

Values of the primary key fields to be placed in the resulting INSERT command. Each field is represented in text form.

Return Value

Returns the requested SQL statement as text.

Notes

Examples

SELECT dblink_build_sql_insert('foo', '1 2', 2, '{"1", "a"}', '{"1", "b''a"}');
             dblink_build_sql_insert
--------------------------------------------------
 INSERT INTO foo(f1,f2,f3) VALUES('1','b''a','1')
(1 row)

dblink_build_sql_update

dblink_build_sql_update — builds an UPDATE statement using a local tuple, replacing the primary key field values with alternative supplied values

Synopsis

dblink_build_sql_update(text relname,
                        int2vector primary_key_attnums,
                        integer num_primary_key_atts,
                        text[] src_pk_att_vals_array,
                        text[] tgt_pk_att_vals_array) returns text

Description

dblink_build_sql_update can be useful in doing selective replication of a local table to a remote database. It selects a row from the local table based on primary key, and then builds a SQL UPDATE command that will duplicate that row, but with the primary key values replaced by the values in the last argument. (To make an exact copy of the row, just specify the same values for the last two arguments.) The UPDATE command always assigns all fields of the row — the main difference between this and dblink_build_sql_insert is that it's assumed that the target row already exists in the remote table.

Arguments

relname

primary_key_attnums

Attribute numbers (1-based) of the primary key fields, for example 1 2.num_primary_key_atts

The number of primary key fields.

src_pk_att_vals_array

Values of the primary key fields to be used to look up the local tuple. Each field is represented in text form. An error is thrown if there is no local row with these primary key values.

tgt_pk_att_vals_array

Values of the primary key fields to be placed in the resulting UPDATE command. Each field is represented in text form.

Return Value

Returns the requested SQL statement as text.

Notes

Examples

SELECT dblink_build_sql_update('foo', '1 2', 2, '{"1", "a"}', '{"1", "b"}');
                   dblink_build_sql_update
-------------------------------------------------------------
 UPDATE foo SET f1='1',f2='b',f3='1' WHERE f1='1' AND f2='b'
(1 row)

dblink_cancel_query

dblink_cancel_query — cancels any active query on the named connection

Synopsis

dblink_cancel_query(text connname) returns text

Description

dblink_cancel_query attempts to cancel any query that is in progress on the named connection. Note that this is not certain to succeed (since, for example, the remote query might already have finished). A cancel request simply improves the odds that the query will fail soon. You must still complete the normal query protocol, for example by calling dblink_get_result.

Arguments

connname

Name of the connection to use.

Return Value

Returns OK if the cancel request has been sent, or the text of an error message on failure.

Examples

SELECT dblink_cancel_query('dtest1');

F.40. test_decoding

test_decoding is an example of a logical decoding output plugin. It doesn't do anything especially useful, but can serve as a starting point for developing your own output plugin.

test_decoding receives WAL through the logical decoding mechanism and decodes it into text representations of the operations performed.

Typical output from this plugin, used over the SQL logical decoding interface, might be:

postgres=# SELECT * FROM pg_logical_slot_get_changes('test_slot', NULL, NULL, 'include-xids', '0');
   lsn     | xid |                       data
-----------+-----+--------------------------------------------------
 0/16D30F8 | 691 | BEGIN
 0/16D32A0 | 691 | table public.data: INSERT: id[int4]:2 data[text]:'arg'
 0/16D32A0 | 691 | table public.data: INSERT: id[int4]:3 data[text]:'demo'
 0/16D32A0 | 691 | COMMIT
 0/16D32D8 | 692 | BEGIN
 0/16D3398 | 692 | table public.data: DELETE: id[int4]:2
 0/16D3398 | 692 | table public.data: DELETE: id[int4]:3
 0/16D3398 | 692 | COMMIT
(8 rows)

F.14. file_fdw

file_fdw 模組提供了外部資料封裝器 file_fdw，可用於存取伺服器檔案系統中的資料檔案，或在伺服器上執行某個程序並取得其輸出。資料檔案或程序輸出必須採用可由 COPY FROM 讀取的格式；有關詳細資訊，請參閱 COPY。目前對資料檔案的存取只有讀取的功能。

A foreign table created using this wrapper can have the following options:

filename

Specifies the file to be read. Must be an absolute path name. Either filename or program must be specified, but not both.

program

Specifies the command to be executed. The standard output of this command will be read as though COPY FROM PROGRAM were used. Either program or filename must be specified, but not both.

format

Specifies the data format, the same as COPY's FORMAT option.header

Specifies whether the data has a header line, the same as COPY's HEADER option.

delimiter

Specifies the data delimiter character, the same as COPY's DELIMITER option.

quote

Specifies the data quote character, the same as COPY's QUOTE option.

escape

Specifies the data escape character, the same as COPY's ESCAPE option.

null

Specifies the data null string, the same as COPY's NULL option.

encoding

Specifies the data encoding, the same as COPY's ENCODING option.

Note that while COPY allows options such as HEADER to be specified without a corresponding value, the foreign table option syntax requires a value to be present in all cases. To activate COPY options typically written without a value, you can pass the value TRUE, since all such options are Booleans.

A column of a foreign table created using this wrapper can have the following options:

force_not_null

This is a Boolean option. If true, it specifies that values of the column should not be matched against the null string (that is, the table-level null option). This has the same effect as listing the column in COPY's FORCE_NOT_NULL option.

force_null

This is a Boolean option. If true, it specifies that values of the column which match the null string are returned as NULL even if the value is quoted. Without this option, only unquoted values matching the null string are returned as NULL. This has the same effect as listing the column in COPY's FORCE_NULL option.

COPY's FORCE_QUOTE option is currently not supported by file_fdw.

These options can only be specified for a foreign table or its columns, not in the options of the file_fdw foreign-data wrapper, nor in the options of a server or user mapping using the wrapper.

Changing table-level options requires being a superuser or having the privileges of the default role pg_read_server_files (to use a filename) or the default role pg_execute_server_program (to use a program), for security reasons: only certain users should be able to control which file is read or which program is run. In principle regular users could be allowed to change the other options, but that's not supported at present.

When specifying the program option, keep in mind that the option string is executed by the shell. If you need to pass any arguments to the command that come from an untrusted source, you must be careful to strip or escape any characters that might have special meaning to the shell. For security reasons, it is best to use a fixed command string, or at least avoid passing any user input in it.

For a foreign table using file_fdw, EXPLAIN shows the name of the file to be read or program to be run. For a file, unless COSTS OFF is specified, the file size (in bytes) is shown as well.

Example F.1. Create a Foreign Table for PostgreSQL CSV Logs

file_fdw 其中一個明顯的用途是使 PostgreSQL 活動日誌形成查詢方便的資料表。為此，首先必須先產生記錄為 CSV 檔案，在這裡我們將其稱為 pglog.csv。首先，安裝 file_fdw 延伸套件：

CREATE EXTENSION file_fdw;

然後建立一個外部伺服器：

CREATE SERVER pglog FOREIGN DATA WRAPPER file_fdw;

現在您可以建外部資料表了。使用 CREATE FOREIGN TABLE 命令，您將需要定義資料表的欄位、CSV 檔案名稱及其格式：

CREATE FOREIGN TABLE pglog (
  log_time timestamp(3) with time zone,
  user_name text,
  database_name text,
  process_id integer,
  connection_from text,
  session_id text,
  session_line_num bigint,
  command_tag text,
  session_start_time timestamp with time zone,
  virtual_transaction_id text,
  transaction_id bigint,
  error_severity text,
  sql_state_code text,
  message text,
  detail text,
  hint text,
  internal_query text,
  internal_query_pos integer,
  context text,
  query text,
  query_pos integer,
  location text,
  application_name text
) SERVER pglog
OPTIONS ( filename '/home/josh/data/log/pglog.csv', format 'csv' );

就是這樣-現在您可以直接查詢日誌了。當然，在正式的運作環境中，您需要定義某種方式來處理日誌檔案的輪轉。

F.41. tsm_system_rows

tsm_system_rows 模組提供資料表抽樣方法 SYSTEM_ROWS，此方法可在 SELECT 指令的 TABLESAMPLE 子句中使用。

此資料表抽樣方法接受整數的參數，該參數是要讀取的最大資料筆數。除非資料表沒有足夠的資料，結果樣本將恰好包含那麼多筆資料；否則在這種情況下，將回傳整個資料表。

像內建的 SYSTEM 抽樣方法一樣，SYSTEM_ROWS 執行區塊策略抽樣，因此抽樣並不是完全隨機的，但可能會有些群聚的效應，尤其是在僅要求少量資料的情況下。

SYSTEM_ROWS 不支援 REPEATABLE 子句。

F.41.1. 範例

使用 SYSTEM_ROWS 選擇資料表樣本的範例。首先要安裝延伸功能：

CREATE EXTENSION tsm_system_rows;

然後，您可以在 SELECT 指令中使用它，例如：

SELECT * FROM my_table TABLESAMPLE SYSTEM_ROWS(100);

此命令將從資料表 my_table 回傳 100 筆資料的樣本（除非該資料表沒有 100 筆資料，在這種情況下將回傳其所有資料）。

F.32. pg_visibility

The pg_visibility module provides a means for examining the visibility map (VM) and page-level visibility information of a table. It also provides functions to check the integrity of a visibility map and to force it to be rebuilt.

Three different bits are used to store information about page-level visibility. The all-visible bit in the visibility map indicates that every tuple in the corresponding page of the relation is visible to every current and future transaction. The all-frozen bit in the visibility map indicates that every tuple in the page is frozen; that is, no future vacuum will need to modify the page until such time as a tuple is inserted, updated, deleted, or locked on that page. The page header's PD_ALL_VISIBLE bit has the same meaning as the all-visible bit in the visibility map, but is stored within the data page itself rather than in a separate data structure. These two bits will normally agree, but the page's all-visible bit can sometimes be set while the visibility map bit is clear after a crash recovery. The reported values can also disagree because of a change that occurs after pg_visibility examines the visibility map and before it examines the data page. Any event that causes data corruption can also cause these bits to disagree.

Functions that display information about PD_ALL_VISIBLE bits are much more costly than those that only consult the visibility map, because they must read the relation's data blocks rather than only the (much smaller) visibility map. Functions that check the relation's data blocks are similarly expensive.

F.32.1. Functions

pg_visibility_map(relation regclass, blkno bigint, all_visible OUT boolean, all_frozen OUT boolean) returns record

Returns the all-visible and all-frozen bits in the visibility map for the given block of the given relation.pg_visibility(relation regclass, blkno bigint, all_visible OUT boolean, all_frozen OUT boolean, pd_all_visible OUT boolean) returns record

Returns the all-visible and all-frozen bits in the visibility map for the given block of the given relation, plus the PD_ALL_VISIBLE bit of that block.pg_visibility_map(relation regclass, blkno OUT bigint, all_visible OUT boolean, all_frozen OUT boolean) returns setof record

Returns the all-visible and all-frozen bits in the visibility map for each block of the given relation.pg_visibility(relation regclass, blkno OUT bigint, all_visible OUT boolean, all_frozen OUT boolean, pd_all_visible OUT boolean) returns setof record

Returns the all-visible and all-frozen bits in the visibility map for each block of the given relation, plus the PD_ALL_VISIBLE bit of each block.pg_visibility_map_summary(relation regclass, all_visible OUT bigint, all_frozen OUT bigint) returns record

Returns the number of all-visible pages and the number of all-frozen pages in the relation according to the visibility map.pg_check_frozen(relation regclass, t_ctid OUT tid) returns setof tid

Returns the TIDs of non-frozen tuples stored in pages marked all-frozen in the visibility map. If this function returns a non-empty set of TIDs, the visibility map is corrupt.pg_check_visible(relation regclass, t_ctid OUT tid) returns setof tid

Returns the TIDs of non-all-visible tuples stored in pages marked all-visible in the visibility map. If this function returns a non-empty set of TIDs, the visibility map is corrupt.pg_truncate_visibility_map(relation regclass) returns void

Truncates the visibility map for the given relation. This function is useful if you believe that the visibility map for the relation is corrupt and wish to force rebuilding it. The first VACUUM executed on the given relation after this function is executed will scan every page in the relation and rebuild the visibility map. (Until that is done, queries will treat the visibility map as containing all zeroes.)

By default, these functions are executable only by superusers and members of the pg_stat_scan_tables role, with the exception of pg_truncate_visibility_map(relation regclass) which can only be executed by superusers.

F.32.2. Author

Robert Haas <rhaas@postgresql.org>

F.42. tsm_system_time

tsm_system_time 模組提供資料表抽樣方法 SYSTEM_TIME，此方法可在 SELECT 指令的 TABLESAMPLE 子句中使用。

此資料表抽樣方法接受一個浮點數參數，該參數是讀取資料表所花費的最大毫秒數(milliseconds)。這使您可以直接控制查詢所花費的時間，而代價是樣本大小變得難以預測。結果樣本將包含在指定時間內可以讀取的盡可能多的資料，除非已經讀取了整個資料表。

SYSTEM_ROWS 不支援 REPEATABLE 子句。

F.42.1. Examples

這是一個使用 SYSTEM_TIME 選擇資料表樣本的範例。首先要安裝延伸功能：

CREATE EXTENSION tsm_system_time;

然後，您可以在 SELECT 指令中使用它，例如：

SELECT * FROM my_table TABLESAMPLE SYSTEM_TIME(1000);

此指令將回傳 1 秒鐘（1,000 毫秒）內讀取 my_table 的樣本。當然，如果可以在 1 秒內讀取整個資料表，則將回傳其所有資料。

F.33. postgres_fdw

postgres_fdw 模組提供了外部資料封裝器 postgres_fdw，可用於存取儲存在外部 PostgreSQL 伺服器中的資料。

此模組提供的功能與舊版 dblink 模組的功能基本上是重疊的。但是 postgres_fdw 提供了更直覺且符合標準的語法來存取遠端資料表，並且在許多情況下可以提供更好的效能。

要準備使用 postgres_fdw 進行遠端存取：

使用 CREATE EXTENSION 安裝 postgres_fdw 延伸功能。
使用 CREATE SERVER 建立一個外部伺服器物件，設定您要連線的每個遠端資料庫。將連線資訊（使用者名稱和密碼除外）指定為 SERVER 物件的選項。
使用 CREATE USER MAPPING 為要存取每個外部伺服器的每個資料庫使用者建立一個使用者對應。指定用作使用者對應的使用者和密碼選項的遠端使用者名稱和密碼。
使用 CREATE FOREIGN TABLE 或 IMPORT FOREIGN SCHEMA 為要存取的每個遠端資料表建立一個外部資料表。外部資料表的欄位必須與引用的遠端資料表相符。但是，如果您指定正確的遠端名稱作為外部資料表物件的選項，則可以使用與遠端資料不同的資料表名稱和欄位名稱。

現在，您只需要從外部資料表中執行 SELECT 即可存取儲存在遠端的基本資料表中的資料。您還可以使用 INSERT、UPDATE 或 DELETE 來修改遠端資料表。（當然，您在 USER MAPPING 中所指定的遠端使用者必須具有執行這些操作的權限。）

請注意，postgres_fdw 目前不支援帶有 ON CONFLICT DO UPDATE 子句的 INSERT 語句。但是，支援 ON CONFLICT DO NOTHING 子句，唯一性衝突處理宣告將會被忽略。還要注意，postgres_fdw 支援在分割資料表上執行的 UPDATE 語句所呼叫的資料遷移，但是目前不能處理以下情況：選擇將插入資料的遠端分割區也剛好是後續要 UPDATE 的目標分割區。

通常建議使用與遠端資料表欄位完全相同的資料型別和排序規則來宣告外部資料表的欄位。儘管 postgres_fdw 目前相當寬容地根據需要執行資料型別轉換，但是當型別或排序規則不符合時，由於遠端伺服器對 WHERE 子句的解釋與本機伺服器的解釋略有不同，因此可能會出現令人驚訝的語義結果。

請注意，與基本遠端資料表相比，可以用更少的欄位或不同的欄位順序宣告一個外部資料表。欄位與遠端資料表的對應是按名稱而不是位置進行的。

F.33.1. FDW Options of postgres_fdw

F.33.1.1. Connection Options

A foreign server using the postgres_fdw foreign data wrapper can have the same options that libpq accepts in connection strings, as described in Section 33.1.2, except that these options are not allowed or have special handling:

user, password and sslpassword (specify these in a user mapping, instead, or use a service file)
client_encoding (this is automatically set from the local server encoding)
fallback_application_name (always set to postgres_fdw)
sslkey and sslcert - these may appear in either or both a connection and a user mapping. If both are present, the user mapping setting overrides the connection setting.

Only superusers may create or modify user mappings with the sslcert or sslkey settings.

Only superusers may connect to foreign servers without password authentication, so always specify the password option for user mappings belonging to non-superusers.

A superuser may override this check on a per-user-mapping basis by setting the user mapping option password_required 'false', e.g.,

ALTER USER MAPPING FOR some_non_superuser SERVER loopback_nopw
OPTIONS (ADD password_required 'false');

To prevent unprivileged users from exploiting the authentication rights of the unix user the postgres server is running as to escalate to superuser rights, only the superuser may set this option on a user mapping.

Care is required to ensure that this does not allow the mapped user the ability to connect as superuser to the mapped database per CVE-2007-3278 and CVE-2007-6601. Don't set password_required=false on the public role. Keep in mind that the mapped user can potentially use any client certificates, .pgpass, .pg_service.conf etc in the unix home directory of the system user the postgres server runs as. They can also use any trust relationship granted by authentication modes like peer or ident authentication.

F.33.1.2. Object Name Options

These options can be used to control the names used in SQL statements sent to the remote PostgreSQL server. These options are needed when a foreign table is created with names different from the underlying remote table's names.

schema_name

This option, which can be specified for a foreign table, gives the schema name to use for the foreign table on the remote server. If this option is omitted, the name of the foreign table's schema is used.

table_name

This option, which can be specified for a foreign table, gives the table name to use for the foreign table on the remote server. If this option is omitted, the foreign table's name is used.

column_name

This option, which can be specified for a column of a foreign table, gives the column name to use for the column on the remote server. If this option is omitted, the column's name is used.

F.33.1.3. Cost Estimation Options

postgres_fdw retrieves remote data by executing queries against remote servers, so ideally the estimated cost of scanning a foreign table should be whatever it costs to be done on the remote server, plus some overhead for communication. The most reliable way to get such an estimate is to ask the remote server and then add something for overhead — but for simple queries, it may not be worth the cost of an additional remote query to get a cost estimate. So postgres_fdw provides the following options to control how cost estimation is done:

use_remote_estimate

This option, which can be specified for a foreign table or a foreign server, controls whether postgres_fdw issues remote EXPLAIN commands to obtain cost estimates. A setting for a foreign table overrides any setting for its server, but only for that table. The default is false.

fdw_startup_cost

This option, which can be specified for a foreign server, is a numeric value that is added to the estimated startup cost of any foreign-table scan on that server. This represents the additional overhead of establishing a connection, parsing and planning the query on the remote side, etc. The default value is 100.

fdw_tuple_cost

This option, which can be specified for a foreign server, is a numeric value that is used as extra cost per-tuple for foreign-table scans on that server. This represents the additional overhead of data transfer between servers. You might increase or decrease this number to reflect higher or lower network delay to the remote server. The default value is 0.01.

When use_remote_estimate is true, postgres_fdw obtains row count and cost estimates from the remote server and then adds fdw_startup_cost and fdw_tuple_cost to the cost estimates. When use_remote_estimate is false, postgres_fdw performs local row count and cost estimation and then adds fdw_startup_cost and fdw_tuple_cost to the cost estimates. This local estimation is unlikely to be very accurate unless local copies of the remote table's statistics are available. Running ANALYZE on the foreign table is the way to update the local statistics; this will perform a scan of the remote table and then calculate and store statistics just as though the table were local. Keeping local statistics can be a useful way to reduce per-query planning overhead for a remote table — but if the remote table is frequently updated, the local statistics will soon be obsolete.

F.33.1.4. Remote Execution Options

By default, only WHERE clauses using built-in operators and functions will be considered for execution on the remote server. Clauses involving non-built-in functions are checked locally after rows are fetched. If such functions are available on the remote server and can be relied on to produce the same results as they do locally, performance can be improved by sending such WHERE clauses for remote execution. This behavior can be controlled using the following option:

extensions

This option is a comma-separated list of names of PostgreSQL extensions that are installed, in compatible versions, on both the local and remote servers. Functions and operators that are immutable and belong to a listed extension will be considered shippable to the remote server. This option can only be specified for foreign servers, not per-table.

When using the extensions option, it is the user's responsibility that the listed extensions exist and behave identically on both the local and remote servers. Otherwise, remote queries may fail or behave unexpectedly.

fetch_size

This option specifies the number of rows postgres_fdw should get in each fetch operation. It can be specified for a foreign table or a foreign server. The option specified on a table overrides an option specified for the server. The default is 100.

F.33.1.5. Updatability Options

By default all foreign tables using postgres_fdw are assumed to be updatable. This may be overridden using the following option:updatable

This option controls whether postgres_fdw allows foreign tables to be modified using INSERT, UPDATE and DELETE commands. It can be specified for a foreign table or a foreign server. A table-level option overrides a server-level option. The default is true.

Of course, if the remote table is not in fact updatable, an error would occur anyway. Use of this option primarily allows the error to be thrown locally without querying the remote server. Note however that the information_schema views will report a postgres_fdw foreign table to be updatable (or not) according to the setting of this option, without any check of the remote server.

F.33.1.6. Importing Options

postgres_fdw is able to import foreign table definitions using IMPORT FOREIGN SCHEMA. This command creates foreign table definitions on the local server that match tables or views present on the remote server. If the remote tables to be imported have columns of user-defined data types, the local server must have compatible types of the same names.

Importing behavior can be customized with the following options (given in the IMPORT FOREIGN SCHEMA command):

import_collate

This option controls whether column COLLATE options are included in the definitions of foreign tables imported from a foreign server. The default is true. You might need to turn this off if the remote server has a different set of collation names than the local server does, which is likely to be the case if it's running on a different operating system.

import_default

This option controls whether column DEFAULT expressions are included in the definitions of foreign tables imported from a foreign server. The default is false. If you enable this option, be wary of defaults that might get computed differently on the local server than they would be on the remote server; nextval() is a common source of problems. The IMPORT will fail altogether if an imported default expression uses a function or operator that does not exist locally.

import_not_null

This option controls whether column NOT NULL constraints are included in the definitions of foreign tables imported from a foreign server. The default is true.

Note that constraints other than NOT NULL will never be imported from the remote tables. Although PostgreSQL does support CHECK constraints on foreign tables, there is no provision for importing them automatically, because of the risk that a constraint expression could evaluate differently on the local and remote servers. Any such inconsistency in the behavior of a CHECK constraint could lead to hard-to-detect errors in query optimization. So if you wish to import CHECK constraints, you must do so manually, and you should verify the semantics of each one carefully. For more detail about the treatment of CHECK constraints on foreign tables, see CREATE FOREIGN TABLE.

Tables or foreign tables which are partitions of some other table are automatically excluded. Partitioned tables are imported, unless they are a partition of some other table. Since all data can be accessed through the partitioned table which is the root of the partitioning hierarchy, this approach should allow access to all the data without creating extra objects.

F.33.2. Connection Management

postgres_fdw establishes a connection to a foreign server during the first query that uses a foreign table associated with the foreign server. This connection is kept and re-used for subsequent queries in the same session. However, if multiple user identities (user mappings) are used to access the foreign server, a connection is established for each user mapping.

F.33.3. Transaction Management

During a query that references any remote tables on a foreign server, postgres_fdw opens a transaction on the remote server if one is not already open corresponding to the current local transaction. The remote transaction is committed or aborted when the local transaction commits or aborts. Savepoints are similarly managed by creating corresponding remote savepoints.

The remote transaction uses SERIALIZABLE isolation level when the local transaction has SERIALIZABLE isolation level; otherwise it uses REPEATABLE READ isolation level. This choice ensures that if a query performs multiple table scans on the remote server, it will get snapshot-consistent results for all the scans. A consequence is that successive queries within a single transaction will see the same data from the remote server, even if concurrent updates are occurring on the remote server due to other activities. That behavior would be expected anyway if the local transaction uses SERIALIZABLE or REPEATABLE READ isolation level, but it might be surprising for a READ COMMITTED local transaction. A future PostgreSQL release might modify these rules.

Note that it is currently not supported by postgres_fdw to prepare the remote transaction for two-phase commit.

F.33.4. Remote Query Optimization

postgres_fdw attempts to optimize remote queries to reduce the amount of data transferred from foreign servers. This is done by sending query WHERE clauses to the remote server for execution, and by not retrieving table columns that are not needed for the current query. To reduce the risk of misexecution of queries, WHERE clauses are not sent to the remote server unless they use only data types, operators, and functions that are built-in or belong to an extension that's listed in the foreign server's extensions option. Operators and functions in such clauses must be IMMUTABLE as well. For an UPDATE or DELETE query, postgres_fdw attempts to optimize the query execution by sending the whole query to the remote server if there are no query WHERE clauses that cannot be sent to the remote server, no local joins for the query, no row-level local BEFORE or AFTER triggers or stored generated columns on the target table, and no CHECK OPTION constraints from parent views. In UPDATE, expressions to assign to target columns must use only built-in data types, IMMUTABLE operators, or IMMUTABLE functions, to reduce the risk of misexecution of the query.

When postgres_fdw encounters a join between foreign tables on the same foreign server, it sends the entire join to the foreign server, unless for some reason it believes that it will be more efficient to fetch rows from each table individually, or unless the table references involved are subject to different user mappings. While sending the JOIN clauses, it takes the same precautions as mentioned above for the WHERE clauses.

The query that is actually sent to the remote server for execution can be examined using EXPLAIN VERBOSE.

F.33.5. Remote Query Execution Environment

In the remote sessions opened by postgres_fdw, the search_path parameter is set to just pg_catalog, so that only built-in objects are visible without schema qualification. This is not an issue for queries generated by postgres_fdw itself, because it always supplies such qualification. However, this can pose a hazard for functions that are executed on the remote server via triggers or rules on remote tables. For example, if a remote table is actually a view, any functions used in that view will be executed with the restricted search path. It is recommended to schema-qualify all names in such functions, or else attach SET search_path options (see CREATE FUNCTION) to such functions to establish their expected search path environment.

postgres_fdw likewise establishes remote session settings for various parameters:

TimeZone is set to UTC
DateStyle is set to ISO
IntervalStyle is set to postgres
extra_float_digits is set to 3 for remote servers 9.0 and newer and is set to 2 for older versions

These are less likely to be problematic than search_path, but can be handled with function SET options if the need arises.

It is not recommended that you override this behavior by changing the session-level settings of these parameters; that is likely to cause postgres_fdw to malfunction.

F.33.6. Cross-Version Compatibility

postgres_fdw can be used with remote servers dating back to PostgreSQL 8.3. Read-only capability is available back to 8.1. A limitation however is that postgres_fdw generally assumes that immutable built-in functions and operators are safe to send to the remote server for execution, if they appear in a WHERE clause for a foreign table. Thus, a built-in function that was added since the remote server's release might be sent to it for execution, resulting in “function does not exist” or a similar error. This type of failure can be worked around by rewriting the query, for example by embedding the foreign table reference in a sub-SELECT with OFFSET 0 as an optimization fence, and placing the problematic function or operator outside the sub-SELECT.

F.33.7. 範例

這是使用 postgres_fdw 建立一個外部資料表的範例。首先安裝延伸功能：

CREATE EXTENSION postgres_fdw;

然後使用 CREATE SERVER 建立一個外部伺服器。在此範例中，我們希望連線到主機 192.83.123.89 上連接埠為 5432 的 PostgreSQL 伺服器。該伺服器建立連線的資料庫，在遠端伺服器上的名稱為 foreign_db：

CREATE SERVER foreign_server
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host '192.83.123.89', port '5432', dbname 'foreign_db');

還需要使用 CREATE USER MAPPING 定義的使用者對應，以標示將在遠端伺服器上所使用的角色：

CREATE USER MAPPING FOR local_user
        SERVER foreign_server
        OPTIONS (user 'foreign_user', password 'password');

現在可以用 CREATE FOREIGN TABLE 建立一個外部資料表。在此範例中，我們希望存取遠端伺服器上名為 some_schema.some_table 的資料表。它在本機的名稱將為 foreign_table：

CREATE FOREIGN TABLE foreign_table (
        id integer NOT NULL,
        data text
)
        SERVER foreign_server
        OPTIONS (schema_name 'some_schema', table_name 'some_table');

在 CREATE FOREIGN TABLE 中宣告的欄位資料型別和其他屬性必須與實際遠端的資料表相符。欄位名稱也必須相符，除非您將 column_name 選項附加到各個欄位以表示它們在遠端資料表中的命名方式。在許多情況下，使用 IMPORT FOREIGN SCHEMA 優於手動建構外部資料表定義。

F.33.8. 作者

Shigeru Hanada <shigeru.hanada@gmail.com>

F.35. sepgsql

sepgsql is a loadable module that supports label-based mandatory access control (MAC) based on SELinux security policy.

Warning

The current implementation has significant limitations, and does not enforce mandatory access control for all actions. See Section F.35.7.

F.35.1. Overview

This module integrates with SELinux to provide an additional layer of security checking above and beyond what is normally provided by PostgreSQL. From the perspective of SELinux, this module allows PostgreSQL to function as a user-space object manager. Each table or function access initiated by a DML query will be checked against the system security policy. This check is in addition to the usual SQL permissions checking performed by PostgreSQL.

SELinux access control decisions are made using security labels, which are represented by strings such as system_u:object_r:sepgsql_table_t:s0. Each access control decision involves two labels: the label of the subject attempting to perform the action, and the label of the object on which the operation is to be performed. Since these labels can be applied to any sort of object, access control decisions for objects stored within the database can be (and, with this module, are) subjected to the same general criteria used for objects of any other type, such as files. This design is intended to allow a centralized security policy to protect information assets independent of the particulars of how those assets are stored.

The SECURITY LABEL statement allows assignment of a security label to a database object.

F.35.2. Installation

sepgsql can only be used on Linux 2.6.28 or higher with SELinux enabled. It is not available on any other platform. You will also need libselinux 2.1.10 or higher and selinux-policy 3.9.13 or higher (although some distributions may backport the necessary rules into older policy versions).

The sestatus command allows you to check the status of SELinux. A typical display is:

$ sestatus
SELinux status:                 enabled
SELinuxfs mount:                /selinux
Current mode:                   enforcing
Mode from config file:          enforcing
Policy version:                 24
Policy from config file:        targeted

If SELinux is disabled or not installed, you must set that product up first before installing this module.

To build this module, include the option --with-selinux in your PostgreSQL configure command. Be sure that the libselinux-devel RPM is installed at build time.

To use this module, you must include sepgsql in the shared_preload_libraries parameter in postgresql.conf. The module will not function correctly if loaded in any other manner. Once the module is loaded, you should execute sepgsql.sql in each database. This will install functions needed for security label management, and assign initial security labels.

Here is an example showing how to initialize a fresh database cluster with sepgsql functions and security labels installed. Adjust the paths shown as appropriate for your installation:

$ export PGDATA=/path/to/data/directory
$ initdb
$ vi $PGDATA/postgresql.conf
  change
    #shared_preload_libraries = ''                # (change requires restart)
  to
    shared_preload_libraries = 'sepgsql'          # (change requires restart)
$ for DBNAME in template0 template1 postgres; do
    postgres --single -F -c exit_on_error=true $DBNAME \
      </usr/local/pgsql/share/contrib/sepgsql.sql >/dev/null
  done

Please note that you may see some or all of the following notifications depending on the particular versions you have of libselinux and selinux-policy:

/etc/selinux/targeted/contexts/sepgsql_contexts:  line 33 has invalid object type db_blobs
/etc/selinux/targeted/contexts/sepgsql_contexts:  line 36 has invalid object type db_language
/etc/selinux/targeted/contexts/sepgsql_contexts:  line 37 has invalid object type db_language
/etc/selinux/targeted/contexts/sepgsql_contexts:  line 38 has invalid object type db_language
/etc/selinux/targeted/contexts/sepgsql_contexts:  line 39 has invalid object type db_language
/etc/selinux/targeted/contexts/sepgsql_contexts:  line 40 has invalid object type db_language

These messages are harmless and should be ignored.

If the installation process completes without error, you can now start the server normally.

F.35.3. Regression Tests

Due to the nature of SELinux, running the regression tests for sepgsql requires several extra configuration steps, some of which must be done as root. The regression tests will not be run by an ordinary make check or make installcheck command; you must set up the configuration and then invoke the test script manually. The tests must be run in the contrib/sepgsql directory of a configured PostgreSQL build tree. Although they require a build tree, the tests are designed to be executed against an installed server, that is they are comparable to make installcheck not make check.

First, set up sepgsql in a working database according to the instructions in Section F.35.2. Note that the current operating system user must be able to connect to the database as superuser without password authentication.

Second, build and install the policy package for the regression test. The sepgsql-regtest policy is a special purpose policy package which provides a set of rules to be allowed during the regression tests. It should be built from the policy source file sepgsql-regtest.te, which is done using make with a Makefile supplied by SELinux. You will need to locate the appropriate Makefile on your system; the path shown below is only an example. Once built, install this policy package using the semodule command, which loads supplied policy packages into the kernel. If the package is correctly installed, semodule -l should list sepgsql-regtest as an available policy package:

$ cd .../contrib/sepgsql
$ make -f /usr/share/selinux/devel/Makefile
$ sudo semodule -u sepgsql-regtest.pp
$ sudo semodule -l | grep sepgsql
sepgsql-regtest 1.07

Third, turn on sepgsql_regression_test_mode. For security reasons, the rules in sepgsql-regtest are not enabled by default; the sepgsql_regression_test_mode parameter enables the rules needed to launch the regression tests. It can be turned on using the setsebool command:

$ sudo setsebool sepgsql_regression_test_mode on
$ getsebool sepgsql_regression_test_mode
sepgsql_regression_test_mode --> on

Fourth, verify your shell is operating in the unconfined_t domain:

$ id -Z
unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

See Section F.35.8 for details on adjusting your working domain, if necessary.

Finally, run the regression test script:

$ ./test_sepgsql

This script will attempt to verify that you have done all the configuration steps correctly, and then it will run the regression tests for the sepgsql module.

After completing the tests, it's recommended you disable the sepgsql_regression_test_mode parameter:

$ sudo setsebool sepgsql_regression_test_mode off

You might prefer to remove the sepgsql-regtest policy entirely:

$ sudo semodule -r sepgsql-regtest

F.35.4. GUC Parameters

sepgsql.permissive (boolean)

This parameter enables sepgsql to function in permissive mode, regardless of the system setting. The default is off. This parameter can only be set in the postgresql.conf file or on the server command line.

When this parameter is on, sepgsql functions in permissive mode, even if SELinux in general is working in enforcing mode. This parameter is primarily useful for testing purposes.sepgsql.debug_audit (boolean)

This parameter enables the printing of audit messages regardless of the system policy settings. The default is off, which means that messages will be printed according to the system settings.

The security policy of SELinux also has rules to control whether or not particular accesses are logged. By default, access violations are logged, but allowed accesses are not.

This parameter forces all possible logging to be turned on, regardless of the system policy.

F.35.5. Features

F.35.5.1. Controlled Object Classes

The security model of SELinux describes all the access control rules as relationships between a subject entity (typically, a client of the database) and an object entity (such as a database object), each of which is identified by a security label. If access to an unlabeled object is attempted, the object is treated as if it were assigned the label unlabeled_t.

Currently, sepgsql allows security labels to be assigned to schemas, tables, columns, sequences, views, and functions. When sepgsql is in use, security labels are automatically assigned to supported database objects at creation time. This label is called a default security label, and is decided according to the system security policy, which takes as input the creator's label, the label assigned to the new object's parent object and optionally name of the constructed object.

A new database object basically inherits the security label of the parent object, except when the security policy has special rules known as type-transition rules, in which case a different label may be applied. For schemas, the parent object is the current database; for tables, sequences, views, and functions, it is the containing schema; for columns, it is the containing table.

F.35.5.2. DML Permissions

For tables, db_table:select, db_table:insert, db_table:update or db_table:delete are checked for all the referenced target tables depending on the kind of statement; in addition, db_table:select is also checked for all the tables that contain columns referenced in the WHERE or RETURNING clause, as a data source for UPDATE, and so on.

Column-level permissions will also be checked for each referenced column. db_column:select is checked on not only the columns being read using SELECT, but those being referenced in other DML statements; db_column:update or db_column:insert will also be checked for columns being modified by UPDATE or INSERT.

For example, consider:

UPDATE t1 SET x = 2, y = func1(y) WHERE z = 100;

Here, db_column:update will be checked for t1.x, since it is being updated, db_column:{select update} will be checked for t1.y, since it is both updated and referenced, and db_column:select will be checked for t1.z, since it is only referenced. db_table:{select update} will also be checked at the table level.

For sequences, db_sequence:get_value is checked when we reference a sequence object using SELECT; however, note that we do not currently check permissions on execution of corresponding functions such as lastval().

For views, db_view:expand will be checked, then any other required permissions will be checked on the objects being expanded from the view, individually.

For functions, db_procedure:{execute} will be checked when user tries to execute a function as a part of query, or using fast-path invocation. If this function is a trusted procedure, it also checks db_procedure:{entrypoint} permission to check whether it can perform as entry point of trusted procedure.

In order to access any schema object, db_schema:search permission is required on the containing schema. When an object is referenced without schema qualification, schemas on which this permission is not present will not be searched (just as if the user did not have USAGE privilege on the schema). If an explicit schema qualification is present, an error will occur if the user does not have the requisite permission on the named schema.

The client must be allowed to access all referenced tables and columns, even if they originated from views which were then expanded, so that we apply consistent access control rules independent of the manner in which the table contents are referenced.

The default database privilege system allows database superusers to modify system catalogs using DML commands, and reference or modify toast tables. These operations are prohibited when sepgsql is enabled.

F.35.5.3. DDL Permissions

SELinux defines several permissions to control common operations for each object type; such as creation, alter, drop and relabel of security label. In addition, several object types have special permissions to control their characteristic operations; such as addition or deletion of name entries within a particular schema.

Creating a new database object requires create permission. SELinux will grant or deny this permission based on the client's security label and the proposed security label for the new object. In some cases, additional privileges are required:

CREATE DATABASE additionally requires getattr permission for the source or template database.
Creating a schema object additionally requires add_name permission on the parent schema.
Creating a table additionally requires permission to create each individual table column, just as if each table column were a separate top-level object.
Creating a function marked as LEAKPROOF additionally requires install permission. (This permission is also checked when LEAKPROOF is set for an existing function.)

When DROP command is executed, drop will be checked on the object being removed. Permissions will be also checked for objects dropped indirectly via CASCADE. Deletion of objects contained within a particular schema (tables, views, sequences and procedures) additionally requires remove_name on the schema.

When ALTER command is executed, setattr will be checked on the object being modified for each object types, except for subsidiary objects such as the indexes or triggers of a table, where permissions are instead checked on the parent object. In some cases, additional permissions are required:

Moving an object to a new schema additionally requires remove_name permission on the old schema and add_name permission on the new one.
Setting the LEAKPROOF attribute on a function requires install permission.
Using SECURITY LABEL on an object additionally requires relabelfrom permission for the object in conjunction with its old security label and relabelto permission for the object in conjunction with its new security label. (In cases where multiple label providers are installed and the user tries to set a security label, but it is not managed by SELinux, only setattr should be checked here. This is currently not done due to implementation restrictions.)

F.35.5.4. Trusted Procedures

Trusted procedures are similar to security definer functions or setuid commands. SELinux provides a feature to allow trusted code to run using a security label different from that of the client, generally for the purpose of providing highly controlled access to sensitive data (e.g. rows might be omitted, or the precision of stored values might be reduced). Whether or not a function acts as a trusted procedure is controlled by its security label and the operating system security policy. For example:

postgres=# CREATE TABLE customer (
               cid     int primary key,
               cname   text,
               credit  text
           );
CREATE TABLE
postgres=# SECURITY LABEL ON COLUMN customer.credit
               IS 'system_u:object_r:sepgsql_secret_table_t:s0';
SECURITY LABEL
postgres=# CREATE FUNCTION show_credit(int) RETURNS text
             AS 'SELECT regexp_replace(credit, ''-[0-9]+$'', ''-xxxx'', ''g'')
                        FROM customer WHERE cid = $1'
           LANGUAGE sql;
CREATE FUNCTION
postgres=# SECURITY LABEL ON FUNCTION show_credit(int)
               IS 'system_u:object_r:sepgsql_trusted_proc_exec_t:s0';
SECURITY LABEL

The above operations should be performed by an administrative user.

postgres=# SELECT * FROM customer;
ERROR:  SELinux: security policy violation
postgres=# SELECT cid, cname, show_credit(cid) FROM customer;
 cid | cname  |     show_credit
-----+--------+---------------------
   1 | taro   | 1111-2222-3333-xxxx
   2 | hanako | 5555-6666-7777-xxxx
(2 rows)

In this case, a regular user cannot reference customer.credit directly, but a trusted procedure show_credit allows the user to print the credit card numbers of customers with some of the digits masked out.

F.35.5.5. Dynamic Domain Transitions

It is possible to use SELinux's dynamic domain transition feature to switch the security label of the client process, the client domain, to a new context, if that is allowed by the security policy. The client domain needs the setcurrent permission and also dyntransition from the old to the new domain.

Dynamic domain transitions should be considered carefully, because they allow users to switch their label, and therefore their privileges, at their option, rather than (as in the case of a trusted procedure) as mandated by the system. Thus, the dyntransition permission is only considered safe when used to switch to a domain with a smaller set of privileges than the original one. For example:

regression=# select sepgsql_getcon();
                    sepgsql_getcon
-------------------------------------------------------
 unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
(1 row)

regression=# SELECT sepgsql_setcon('unconfined_u:unconfined_r:unconfined_t:s0-s0:c1.c4');
 sepgsql_setcon 
----------------
 t
(1 row)

regression=# SELECT sepgsql_setcon('unconfined_u:unconfined_r:unconfined_t:s0-s0:c1.c1023');
ERROR:  SELinux: security policy violation

In this example above we were allowed to switch from the larger MCS range c1.c1023 to the smaller range c1.c4, but switching back was denied.

A combination of dynamic domain transition and trusted procedure enables an interesting use case that fits the typical process life-cycle of connection pooling software. Even if your connection pooling software is not allowed to run most of SQL commands, you can allow it to switch the security label of the client using the sepgsql_setcon() function from within a trusted procedure; that should take some credential to authorize the request to switch the client label. After that, this session will have the privileges of the target user, rather than the connection pooler. The connection pooler can later revert the security label change by again using sepgsql_setcon() with NULL argument, again invoked from within a trusted procedure with appropriate permissions checks. The point here is that only the trusted procedure actually has permission to change the effective security label, and only does so when given proper credentials. Of course, for secure operation, the credential store (table, procedure definition, or whatever) must be protected from unauthorized access.

F.35.5.6. Miscellaneous

We reject the LOAD command across the board, because any module loaded could easily circumvent security policy enforcement.

F.35.6. Sepgsql Functions

Table F.29 shows the available functions.

Table F.29. Sepgsql Functions

F.35.7. Limitations

Data Definition Language (DDL) Permissions

Due to implementation restrictions, some DDL operations do not check permissions.Data Control Language (DCL) Permissions

Due to implementation restrictions, DCL operations do not check permissions.Row-level access control

PostgreSQL supports row-level access, but sepgsql does not.Covert channels

sepgsql does not try to hide the existence of a certain object, even if the user is not allowed to reference it. For example, we can infer the existence of an invisible object as a result of primary key conflicts, foreign key violations, and so on, even if we cannot obtain the contents of the object. The existence of a top secret table cannot be hidden; we only hope to conceal its contents.

F.35.8. External Resources

SE-PostgreSQL Introduction

This wiki page provides a brief overview, security design, architecture, administration and upcoming features.SELinux User's and Administrator's Guide

This document provides a wide spectrum of knowledge to administer SELinux on your systems. It focuses primarily on Red Hat operating systems, but is not limited to them.Fedora SELinux FAQ

This document answers frequently asked questions about SELinux. It focuses primarily on Fedora, but is not limited to Fedora.

F.35.9. Author

KaiGai Kohei <kaigai@ak.jp.nec.com>

F.30. pgstattuple

pgstattuple 模組提供各種函數來取得 tuple 層級的統計資訊。

由於這些函數會回傳詳細的 page-level 資訊，因此預設是限制存取的。預設情況下，只有角色 pg_stat_scan_tables 具有 EXECUTE 權限。超級使用者當然可以繞過此限制。安裝此延伸功能後，使用者可以發出 GRANT 指令來授予函數的權限，以允許其他人執行它們。但是，最好還是將這些使用者加到 pg_stat_scan_tables 角色群組之中。

F.30.1. Functions

pgstattuple(regclass) returns record

pgstattuple returns a relation's physical length, percentage of “dead” tuples, and other info. This may help users to determine whether vacuum is necessary or not. The argument is the target relation's name (optionally schema-qualified) or OID. For example:

test=> SELECT * FROM pgstattuple('pg_catalog.pg_proc');
-[ RECORD 1 ]------+-------
table_len          | 458752
tuple_count        | 1470
tuple_len          | 438896
tuple_percent      | 95.67
dead_tuple_count   | 11
dead_tuple_len     | 3157
dead_tuple_percent | 0.69
free_space         | 8932
free_percent       | 1.95

The output columns are described in Table F.22.

Table F.22. `pgstattuple` Output Columns

Note

The table_len will always be greater than the sum of the tuple_len, dead_tuple_len and free_space. The difference is accounted for by fixed page overhead, the per-page table of pointers to tuples, and padding to ensure that tuples are correctly aligned.

pgstattuple acquires only a read lock on the relation. So the results do not reflect an instantaneous snapshot; concurrent updates will affect them.

pgstattuple judges a tuple is “dead” if HeapTupleSatisfiesDirty returns false.pgstattuple(text) returns record

This is the same as pgstattuple(regclass), except that the target relation is specified as TEXT. This function is kept because of backward-compatibility so far, and will be deprecated in some future release.pgstatindex(regclass) returns record

pgstatindex returns a record showing information about a B-tree index. For example:

test=> SELECT * FROM pgstatindex('pg_cast_oid_index');
-[ RECORD 1 ]------+------
version            | 2
tree_level         | 0
index_size         | 16384
root_block_no      | 1
internal_pages     | 0
leaf_pages         | 1
empty_pages        | 0
deleted_pages      | 0
avg_leaf_density   | 54.27
leaf_fragmentation | 0

The output columns are:

The reported index_size will normally correspond to one more page than is accounted for by internal_pages + leaf_pages + empty_pages + deleted_pages, because it also includes the index's metapage.

As with pgstattuple, the results are accumulated page-by-page, and should not be expected to represent an instantaneous snapshot of the whole index.pgstatindex(text) returns record

This is the same as pgstatindex(regclass), except that the target index is specified as TEXT. This function is kept because of backward-compatibility so far, and will be deprecated in some future release.pgstatginindex(regclass) returns record

pgstatginindex returns a record showing information about a GIN index. For example:

test=> SELECT * FROM pgstatginindex('test_gin_index');
-[ RECORD 1 ]--+--
version        | 1
pending_pages  | 0
pending_tuples | 0

The output columns are:

pgstathashindex(regclass) returns record

pgstathashindex returns a record showing information about a HASH index. For example:

test=> select * from pgstathashindex('con_hash_index');
-[ RECORD 1 ]--+-----------------
version        | 4
bucket_pages   | 33081
overflow_pages | 0
bitmap_pages   | 1
unused_pages   | 32455
live_items     | 10204006
dead_items     | 0
free_percent   | 61.8005949100872

The output columns are:

pg_relpages(regclass) returns bigint

pg_relpages returns the number of pages in the relation.pg_relpages(text) returns bigint

This is the same as pg_relpages(regclass), except that the target relation is specified as TEXT. This function is kept because of backward-compatibility so far, and will be deprecated in some future release.pgstattuple_approx(regclass) returns record

pgstattuple_approx is a faster alternative to pgstattuple that returns approximate results. The argument is the target relation's name or OID. For example:

test=> SELECT * FROM pgstattuple_approx('pg_catalog.pg_proc'::regclass);
-[ RECORD 1 ]--------+-------
table_len            | 573440
scanned_percent      | 2
approx_tuple_count   | 2740
approx_tuple_len     | 561210
approx_tuple_percent | 97.87
dead_tuple_count     | 0
dead_tuple_len       | 0
dead_tuple_percent   | 0
approx_free_space    | 11996
approx_free_percent  | 2.09

The output columns are described in Table F.23.

Whereas pgstattuple always performs a full-table scan and returns an exact count of live and dead tuples (and their sizes) and free space, pgstattuple_approx tries to avoid the full-table scan and returns exact dead tuple statistics along with an approximation of the number and size of live tuples and free space.

It does this by skipping pages that have only visible tuples according to the visibility map (if a page has the corresponding VM bit set, then it is assumed to contain no dead tuples). For such pages, it derives the free space value from the free space map, and assumes that the rest of the space on the page is taken up by live tuples.

For pages that cannot be skipped, it scans each tuple, recording its presence and size in the appropriate counters, and adding up the free space on the page. At the end, it estimates the total number of live tuples based on the number of pages and tuples scanned (in the same way that VACUUM estimates pg_class.reltuples).

Table F.23. `pgstattuple_approx` Output Columns

In the above output, the free space figures may not match the pgstattuple output exactly, because the free space map gives us an exact figure, but is not guaranteed to be accurate to the byte.

F.30.2. Authors

Tatsuo Ishii, Satoshi Nagayasu and Abhijit Menon-Sen

F.13. earthdistance

The earthdistance module provides two different approaches to calculating great circle distances on the surface of the Earth. The one described first depends on the cube module (which must be installed before earthdistance can be installed). The second one is based on the built-in point data type, using longitude and latitude for the coordinates.

In this module, the Earth is assumed to be perfectly spherical. (If that's too inaccurate for you, you might want to look at the PostGIS project.)

F.13.1. Cube-Based Earth Distances

Data is stored in cubes that are points (both corners are the same) using 3 coordinates representing the x, y, and z distance from the center of the Earth. A domain earth over cube is provided, which includes constraint checks that the value meets these restrictions and is reasonably close to the actual surface of the Earth.

The radius of the Earth is obtained from the earth() function. It is given in meters. But by changing this one function you can change the module to use some other units, or to use a different value of the radius that you feel is more appropriate.

This package has applications to astronomical databases as well. Astronomers will probably want to change earth() to return a radius of 180/pi() so that distances are in degrees.

Functions are provided to support input in latitude and longitude (in degrees), to support output of latitude and longitude, to calculate the great circle distance between two points and to easily specify a bounding box usable for index searches.

The provided functions are shown in Table F.5.

Table F.5. Cube-Based Earthdistance Functions

F.13.2. Point-Based Earth Distances

The second part of the module relies on representing Earth locations as values of type point, in which the first component is taken to represent longitude in degrees, and the second component is taken to represent latitude in degrees. Points are taken as (longitude, latitude) and not vice versa because longitude is closer to the intuitive idea of x-axis and latitude to y-axis.

A single operator is provided, shown in Table F.6.

Table F.6. Point-Based Earthdistance Operators

Note that unlike the cube-based part of the module, units are hardwired here: changing the earth() function will not affect the results of this operator.

One disadvantage of the longitude/latitude representation is that you need to be careful about the edge conditions near the poles and near +/- 180 degrees of longitude. The cube-based representation avoids these discontinuities.

F.24. pg_buffercache

pg_buffercache 延伸功能提供了一個即時檢查共享緩衝區配置情況的方法。

該模塊提供了一個 C 函數的 pg_buffercache_pages，該函數回傳一組記錄，以及一個檢視表 pg_buffercache，該檢視表封裝了該函數以便於使用。

預設情況下，僅限於超級使用者和 pg_monitor 角色的成員使用。也可以使用 GRANT 將存取權限授予其他角色。

F.24.1. The `pg_buffercache` View

Table F.15 中列出了此檢視表的欄位定義。

Table F.15. `pg_buffercache` Columns

There is one row for each buffer in the shared cache. Unused buffers are shown with all fields null except bufferid. Shared system catalogs are shown as belonging to database zero.

Because the cache is shared by all the databases, there will normally be pages from relations not belonging to the current database. This means that there may not be matching join rows in pg_class for some rows, or that there could even be incorrect joins. If you are trying to join against pg_class, it's a good idea to restrict the join to rows having reldatabase equal to the current database's OID or zero.

Since buffer manager locks are not taken to copy the buffer state data that the view will display, accessing pg_buffercache view has less impact on normal buffer activity but it doesn't provide a consistent set of results across all buffers. However, we ensure that the information of each buffer is self-consistent.

F.24.2. Sample Output

regression=# SELECT n.nspname, c.relname, count(*) AS buffers
             FROM pg_buffercache b JOIN pg_class c
             ON b.relfilenode = pg_relation_filenode(c.oid) AND
                b.reldatabase IN (0, (SELECT oid FROM pg_database
                                      WHERE datname = current_database()))
             JOIN pg_namespace n ON n.oid = c.relnamespace
             GROUP BY n.nspname, c.relname
             ORDER BY 3 DESC
             LIMIT 10;

  nspname   |        relname         | buffers
------------+------------------------+---------
 public     | delete_test_table      |     593
 public     | delete_test_table_pkey |     494
 pg_catalog | pg_attribute           |     472
 public     | quad_poly_tbl          |     353
 public     | tenk2                  |     349
 public     | tenk1                  |     349
 public     | gin_test_idx           |     306
 pg_catalog | pg_largeobject         |     206
 public     | gin_test_tbl           |     188
 public     | spgist_text_tbl        |     182
(10 rows)

F.24.3. Authors

Mark Kirkwood <markir@paradise.net.nz>

Design suggestions: Neil Conway <neilc@samurai.com>

Debugging advice: Tom Lane <tgl@sss.pgh.pa.us>

F.29. pg_stat_statements

pg_stat_statements 模組提供了一個追踪在伺服器上執行的 SQL 語句統計資訊方法。

必須透過將 pg_stat_statements 加到 postgresql.conf 中的 shared_preload_libraries 中來載入模組，因為它需要額外的共享記憶體。這意味著需要重新啟動伺服器才能載加或刪除模組。

載入 pg_stat_statements 後，它將追踪伺服器所有資料庫的統計資訊。為了存取和處理這些統計資訊，此模組提供了一個檢視表 pg_stat_statements 以及工具程序函數 pg_stat_statements_reset 和 pg_stat_statements。這些不是全域可用的，但可以使用 CREATE EXTENSION pg_stat_statements 為特定資料庫啟用。

F.29.1. The `pg_stat_statements` View

此延伸功能收集的統計數據可透過名為 pg_stat_statements 的檢視表查詢。對於每個不同的資料庫 ID、使用者 ID和查詢語句 ID（此延伸功能可以追踪的最大不同查詢語句數量），在此檢視表會在一筆資料中呈現。檢視表的欄位在 Table F.21 中說明。

Table F.21. `pg_stat_statements` Columns

因為安全因素，僅超級使用者和 pg_read_all_stats 角色成員被允許查看其他使用者所執行的 SQL 語句和 queryid。但是，如果檢視圖已安裝在他們的資料庫中，則其他使用者也可以查看統計內容。

只要是有查詢計劃查詢的查詢（即 SELECT、INSERT、UPDATE 和 DELETE）根據內部雜湊計算具有相同的查詢結構，它們就會組合到單筆 pg_stat_statements 資料中。通常，如果兩個查詢在語義上等效，即兩個查詢在此意義上是相同的，只是出現在查詢中的常數內容的值除外。但是，會嚴格地根據資料庫結構維護指令（即所有其他指令）的查詢字串進行比較。

為了將查詢與其他查詢搭配而忽略了常數內容時，該常數內容會在 pg_stat_statements 顯示中替換為參數符號，例如 $1。查詢語句的其餘部分是第一個查詢的內容，該查詢具有與 pg_stat_statements 項目關聯的特定 queryid 雜湊值。

In some cases, queries with visibly different texts might get merged into a single pg_stat_statements entry. Normally this will happen only for semantically equivalent queries, but there is a small chance of hash collisions causing unrelated queries to be merged into one entry. (This cannot happen for queries belonging to different users or databases, however.)

Since the queryid hash value is computed on the post-parse-analysis representation of the queries, the opposite is also possible: queries with identical texts might appear as separate entries, if they have different meanings as a result of factors such as different search_path settings.

Consumers of pg_stat_statements may wish to use queryid (perhaps in combination with dbid and userid) as a more stable and reliable identifier for each entry than its query text. However, it is important to understand that there are only limited guarantees around the stability of the queryid hash value. Since the identifier is derived from the post-parse-analysis tree, its value is a function of, among other things, the internal object identifiers appearing in this representation. This has some counterintuitive implications. For example, pg_stat_statements will consider two apparently-identical queries to be distinct, if they reference a table that was dropped and recreated between the executions of the two queries. The hashing process is also sensitive to differences in machine architecture and other facets of the platform. Furthermore, it is not safe to assume that queryid will be stable across major versions of PostgreSQL.

As a rule of thumb, queryid values can be assumed to be stable and comparable only so long as the underlying server version and catalog metadata details stay exactly the same. Two servers participating in replication based on physical WAL replay can be expected to have identical queryid values for the same query. However, logical replication schemes do not promise to keep replicas identical in all relevant details, so queryid will not be a useful identifier for accumulating costs across a set of logical replicas. If in doubt, direct testing is recommended.

The parameter symbols used to replace constants in representative query texts start from the next number after the highest $n parameter in the original query text, or $1 if there was none. It's worth noting that in some cases there may be hidden parameter symbols that affect this numbering. For example, PL/pgSQL uses hidden parameter symbols to insert values of function local variables into queries, so that a PL/pgSQL statement like SELECT i + 1 INTO j would have representative text like SELECT i + $2.

The representative query texts are kept in an external disk file, and do not consume shared memory. Therefore, even very lengthy query texts can be stored successfully. However, if many long query texts are accumulated, the external file might grow unmanageably large. As a recovery method if that happens, pg_stat_statements may choose to discard the query texts, whereupon all existing entries in the pg_stat_statements view will show null query fields, though the statistics associated with each queryid are preserved. If this happens, consider reducing pg_stat_statements.max to prevent recurrences.

plans 和 calls 不一定會完全相等，因為查詢計劃和執行統計資訊會在其各自的執行階段進行更新，並且僅針對成功的操作進行更新。例如，某一條語句已經進行了查詢計劃，但在執行階段卻失敗了，則僅更新其查詢計劃統計資訊。如果由於使用了快取的查詢計劃而跳過了計劃階段，也只會更新其執行階段的統計資訊。

F.29.2. Functions

pg_stat_statements_reset(userid Oid, dbid Oid, queryid bigint) returns void

pg_stat_statements_reset 會移除到目前為止由 pg_stat_statements 收集的與指定的 userid、dbid 和 queryid 相對應的統計資訊。如果未指定任何參數，則對每個參數使用預設值 0（無效），並且將重置與其他參數相對應的統計資訊。如果未指定任何參數，或者所有指定的參數均為0（無效），則將移除所有統計資訊。預設情況下，此功能只能由超級使用者執行。可以使用 GRANT 將存取權限授予其他人。

pg_stat_statements(showtext boolean) returns setof record

pg_stat_statements 檢視表是根據也稱為 pg_stat_statements 的函數定義的。用戶端可以直接呼叫 pg_stat_statements 函數，並透過指定showtext := false 可以省略查詢字串（即，對應於檢視圖查詢欄位的 OUT 參數將回傳 null）。此功能旨在支持可能希望避免重複獲取長度不確定的查詢字串成本的外部工具。這樣的工具可以代替暫存每個項目本身觀察到的第一個查詢字串，因為 pg_stat_statements 本身就是這樣做的，然後僅根據需要檢索查詢字串。由於伺服器將查詢字串儲存在檔案中，因此此方法可以減少用於重複檢查 pg_stat_statements 資料的實際 I/O 成本。

F.29.3. Configuration Parameters

pg_stat_statements.max (integer)

pg_stat_statements.max 設定此模組所追踪的語句數量上限（即 pg_stat_statements 檢視表中的最大資料列數）。如果觀察到的語句不同，則將丟棄有關執行最少的語句的資訊。預設值為 5,000。只能在伺服器啟動時設定此參數。

pg_stat_statements.track (enum)

pg_stat_statements.track 控制此模組關注哪些語句。指定 top 表示追踪最上層語句（由用戶端直接發出的語句），也可以全部追踪巢狀語句（例如在函數內呼叫的語句），或者不指定以停用語句統計資訊收集。預設值為 top。只有超級使用者可以變更此設定。

pg_stat_statements.track_utility (boolean)

pg_stat_statements.track_utility 控制模組是否追踪管理程序命令。管理程序命令是除 SELECT、INSERT、UPDATE 和 DELETE 之外的所有命令。預設值為 on。只有超級使用者可以變更改此設定。

pg_stat_statements.save (boolean)

pg_stat_statements.save 指定是否在伺服器關閉時保存語句統計資訊。如果關閉，則統計資訊不會在關閉時保存，也不會在伺服器啟動時重新載入。預設值為開。只能在 postgresql.conf 檔案或伺服器命令列中設定此參數。

此模塊需要與 pg_stat_statements.max 成比例的額外共享記憶體。請注意，即使將 pg_stat_statements.track 設定為 none，只要載入模組，就會佔用記憶體空間。

這些參數必須在 postgresql.conf 中設定。典型的用法可能是：

# postgresql.conf
shared_preload_libraries = 'pg_stat_statements'

pg_stat_statements.max = 10000
pg_stat_statements.track = all

F.29.4. Sample Output

bench=# SELECT pg_stat_statements_reset();

$ pgbench -i bench
$ pgbench -c10 -t300 bench

bench=# \x
bench=# SELECT query, calls, total_time, rows, 100.0 * shared_blks_hit /
               nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
          FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5;
-[ RECORD 1 ]--------------------------------------------------------------------
query       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2
calls       | 3000
total_time  | 25565.855387
rows        | 3000
hit_percent | 100.0000000000000000
-[ RECORD 2 ]--------------------------------------------------------------------
query       | UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2
calls       | 3000
total_time  | 20756.669379
rows        | 3000
hit_percent | 100.0000000000000000
-[ RECORD 3 ]--------------------------------------------------------------------
query       | copy pgbench_accounts from stdin
calls       | 1
total_time  | 291.865911
rows        | 100000
hit_percent | 100.0000000000000000
-[ RECORD 4 ]--------------------------------------------------------------------
query       | UPDATE pgbench_accounts SET abalance = abalance + $1 WHERE aid = $2
calls       | 3000
total_time  | 271.232977
rows        | 3000
hit_percent | 98.5723926698852723
-[ RECORD 5 ]--------------------------------------------------------------------
query       | alter table pgbench_accounts add primary key (aid)
calls       | 1
total_time  | 160.588563
rows        | 0
hit_percent | 100.0000000000000000


bench=# SELECT pg_stat_statements_reset(0,0,s.queryid) FROM pg_stat_statements AS s
            WHERE s.query = 'UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2';

bench=# SELECT query, calls, total_time, rows, 100.0 * shared_blks_hit /
               nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
          FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5;
-[ RECORD 1 ]--------------------------------------------------------------------
query       | UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2
calls       | 3000
total_time  | 20756.669379
rows        | 3000
hit_percent | 100.0000000000000000
-[ RECORD 2 ]--------------------------------------------------------------------
query       | copy pgbench_accounts from stdin
calls       | 1
total_time  | 291.865911
rows        | 100000
hit_percent | 100.0000000000000000
-[ RECORD 3 ]--------------------------------------------------------------------
query       | UPDATE pgbench_accounts SET abalance = abalance + $1 WHERE aid = $2
calls       | 3000
total_time  | 271.232977
rows        | 3000
hit_percent | 98.5723926698852723
-[ RECORD 4 ]--------------------------------------------------------------------
query       | alter table pgbench_accounts add primary key (aid)
calls       | 1
total_time  | 160.588563
rows        | 0
hit_percent | 100.0000000000000000
-[ RECORD 5 ]--------------------------------------------------------------------
query       | vacuum analyze pgbench_accounts
calls       | 1
total_time  | 136.448116
rows        | 0
hit_percent | 99.9201915403032721

bench=# SELECT pg_stat_statements_reset(0,0,0);

bench=# SELECT query, calls, total_time, rows, 100.0 * shared_blks_hit /
               nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
          FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5;
-[ RECORD 1 ]---------------------------------------
query       | SELECT pg_stat_statements_reset(0,0,0)
calls       | 1
total_time  | 0.189497
rows        | 1
hit_percent |

F.29.5. Authors

Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp>. Query normalization added by Peter Geoghegan <peter@2ndquadrant.com>.

F.31. pg_trgm

pg_trgm 模組提供了用於根據 trigram 配對決定包含字母及數字文字內容相似性的函數和運算子，以及支援快速搜索相似字串的索引運算子類。

F.31.1. Trigram (or Trigraph) Concepts

trigram 是從字串中提取的一組三個連續字元。我們可以透過計算兩個字串共享的三連詞的數量來衡量它們的相似性。這個簡單的想法對測量許多自然語言中單詞的相似性非常有用。

Note

pg_trgm ignores non-word characters (non-alphanumerics) when extracting trigrams from a string. Each word is considered to have two spaces prefixed and one space suffixed when determining the set of trigrams contained in the string. For example, the set of trigrams in the string “cat” is “ c”, “ ca”, “cat”, and “at ”. The set of trigrams in the string “foo|bar” is “ f”, “ fo”, “foo”, “oo ”, “ b”, “ ba”, “bar”, and “ar ”.

F.31.2. Functions and Operators

The functions provided by the pg_trgm module are shown in Table F.24, the operators in Table F.25.

Table F.24. `pg_trgm` Functions

Consider the following example:

# SELECT word_similarity('word', 'two words');
 word_similarity
-----------------
             0.8
(1 row)

In the first string, the set of trigrams is {" w"," wo","wor","ord","rd "}. In the second string, the ordered set of trigrams is {" t"," tw","two","wo "," w"," wo","wor","ord","rds","ds "}. The most similar extent of an ordered set of trigrams in the second string is {" w"," wo","wor","ord"}, and the similarity is 0.8.

This function returns a value that can be approximately understood as the greatest similarity between the first string and any substring of the second string. However, this function does not add padding to the boundaries of the extent. Thus, the number of additional characters present in the second string is not considered, except for the mismatched word boundaries.

At the same time, strict_word_similarity(text, text) selects an extent of words in the second string. In the example above, strict_word_similarity(text, text) would select the extent of a single word 'words', whose set of trigrams is {" w"," wo","wor","ord","rds","ds "}.

# SELECT strict_word_similarity('word', 'two words'), similarity('word', 'words');
 strict_word_similarity | similarity
------------------------+------------
               0.571429 |   0.571429
(1 row)

Thus, the strict_word_similarity(text, text) function is useful for finding the similarity to whole words, while word_similarity(text, text) is more suitable for finding the similarity for parts of words.

Table F.25. `pg_trgm` Operators

F.31.3. GUC Parameters

pg_trgm.similarity_threshold (real)

Sets the current similarity threshold that is used by the % operator. The threshold must be between 0 and 1 (default is 0.3).pg_trgm.word_similarity_threshold (real)

Sets the current word similarity threshold that is used by the <% and %> operators. The threshold must be between 0 and 1 (default is 0.6).pg_trgm.strict_word_similarity_threshold (real)

Sets the current strict word similarity threshold that is used by the <<% and %>> operators. The threshold must be between 0 and 1 (default is 0.5).

F.31.4. Index Support

The pg_trgm module provides GiST and GIN index operator classes that allow you to create an index over a text column for the purpose of very fast similarity searches. These index types support the above-described similarity operators, and additionally support trigram-based index searches for LIKE, ILIKE, ~ and ~* queries. (These indexes do not support equality nor simple comparison operators, so you may need a regular B-tree index too.)

Example:

CREATE TABLE test_trgm (t text);
CREATE INDEX trgm_idx ON test_trgm USING GIST (t gist_trgm_ops);

CREATE INDEX trgm_idx ON test_trgm USING GIN (t gin_trgm_ops);

At this point, you will have an index on the t column that you can use for similarity searching. A typical query is

SELECT t, similarity(t, 'word') AS sml
  FROM test_trgm
  WHERE t % 'word'
  ORDER BY sml DESC, t;

This will return all values in the text column that are sufficiently similar to word, sorted from best match to worst. The index will be used to make this a fast operation even over very large data sets.

A variant of the above query is

SELECT t, t <-> 'word' AS dist
  FROM test_trgm
  ORDER BY dist LIMIT 10;

This can be implemented quite efficiently by GiST indexes, but not by GIN indexes. It will usually beat the first formulation when only a small number of the closest matches is wanted.

Also you can use an index on the t column for word similarity or strict word similarity. Typical queries are:

SELECT t, word_similarity('word', t) AS sml
  FROM test_trgm
  WHERE 'word' <% t
  ORDER BY sml DESC, t;

and

SELECT t, strict_word_similarity('word', t) AS sml
  FROM test_trgm
  WHERE 'word' <<% t
  ORDER BY sml DESC, t;

This will return all values in the text column for which there is a continuous extent in the corresponding ordered trigram set that is sufficiently similar to the trigram set of word, sorted from best match to worst. The index will be used to make this a fast operation even over very large data sets.

Possible variants of the above queries are:

SELECT t, 'word' <<-> t AS dist
  FROM test_trgm
  ORDER BY dist LIMIT 10;

and

SELECT t, 'word' <<<-> t AS dist
  FROM test_trgm
  ORDER BY dist LIMIT 10;

This can be implemented quite efficiently by GiST indexes, but not by GIN indexes.

Beginning in PostgreSQL 9.1, these index types also support index searches for LIKE and ILIKE, for example

SELECT * FROM test_trgm WHERE t LIKE '%foo%bar';

The index search works by extracting trigrams from the search string and then looking these up in the index. The more trigrams in the search string, the more effective the index search is. Unlike B-tree based searches, the search string need not be left-anchored.

Beginning in PostgreSQL 9.3, these index types also support index searches for regular-expression matches (~ and ~* operators), for example

SELECT * FROM test_trgm WHERE t ~ '(foo|bar)';

The index search works by extracting trigrams from the regular expression and then looking these up in the index. The more trigrams that can be extracted from the regular expression, the more effective the index search is. Unlike B-tree based searches, the search string need not be left-anchored.

For both LIKE and regular-expression searches, keep in mind that a pattern with no extractable trigrams will degenerate to a full-index scan.

The choice between GiST and GIN indexing depends on the relative performance characteristics of GiST and GIN, which are discussed elsewhere.

F.31.5. Text Search Integration

Trigram matching is a very useful tool when used in conjunction with a full text index. In particular it can help to recognize misspelled input words that will not be matched directly by the full text search mechanism.

The first step is to generate an auxiliary table containing all the unique words in the documents:

CREATE TABLE words AS SELECT word FROM
        ts_stat('SELECT to_tsvector(''simple'', bodytext) FROM documents');

where documents is a table that has a text field bodytext that we wish to search. The reason for using the simple configuration with the to_tsvector function, instead of using a language-specific configuration, is that we want a list of the original (unstemmed) words.

Next, create a trigram index on the word column:

CREATE INDEX words_idx ON words USING GIN (word gin_trgm_ops);

Now, a SELECT query similar to the previous example can be used to suggest spellings for misspelled words in user search terms. A useful extra test is to require that the selected words are also of similar length to the misspelled word.

Note

Since the words table has been generated as a separate, static table, it will need to be periodically regenerated so that it remains reasonably up-to-date with the document collection. Keeping it exactly current is usually unnecessary.

F.31.6. References

GiST Development Site http://www.sai.msu.su/~megera/postgres/gist/

Tsearch2 Development Site http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/

F.31.7. Authors

Oleg Bartunov <oleg@sai.msu.su>, Moscow, Moscow University, Russia

Teodor Sigaev <teodor@sigaev.ru>, Moscow, Delta-Soft Ltd.,Russia

Alexander Korotkov <a.korotkov@postgrespro.ru>, Moscow, Postgres Professional, Russia

Documentation: Christopher Kings-Lynne

This module is sponsored by Delta-Soft Ltd., Moscow, Russia.

F.38. tablefunc

tablefunc 模組內含了回傳資料表（即多筆資料列）的各種函數。這些函數本身很有用，也可以用作設計回傳多筆資料列的 C 函數的範例。

F.38.1. Functions Provided

Table F.30 列出了 tablefunc 模組所提供的函數。

Table F.30. `tablefunc` Functions

F.38.1.1. Normal_rand

normal_rand(int numvals, float8 mean, float8 stddev) returns setof float8

normal_rand produces a set of normally distributed random values (Gaussian distribution).

numvals is the number of values to be returned from the function. mean is the mean of the normal distribution of values and stddev is the standard deviation of the normal distribution of values.

For example, this call requests 1000 values with a mean of 5 and a standard deviation of 3:

test=# SELECT * FROM normal_rand(1000, 5, 3);
     normal_rand
----------------------
     1.56556322244898
     9.10040991424657
     5.36957140345079
   -0.369151492880995
    0.283600703686639
       .
       .
       .
     4.82992125404908
     9.71308014517282
     2.49639286969028
(1000 rows)

F.38.1.2. Crosstab(Text)

crosstab(text sql)
crosstab(text sql, int N)

The crosstab function is used to produce “pivot” displays, wherein data is listed across the page rather than down. For example, we might have data like

row1    val11
row1    val12
row1    val13
...
row2    val21
row2    val22
row2    val23
...

which we wish to display like

row1    val11   val12   val13   ...
row2    val21   val22   val23   ...
...

The crosstab function takes a text parameter that is a SQL query producing raw data formatted in the first way, and produces a table formatted in the second way.

The sql parameter is a SQL statement that produces the source set of data. This statement must return one row_name column, one category column, and one value column. N is an obsolete parameter, ignored if supplied (formerly this had to match the number of output value columns, but now that is determined by the calling query).

For example, the provided query might produce a set something like:

 row_name    cat    value
----------+-------+-------
  row1      cat1    val1
  row1      cat2    val2
  row1      cat3    val3
  row1      cat4    val4
  row2      cat1    val5
  row2      cat2    val6
  row2      cat3    val7
  row2      cat4    val8

The crosstab function is declared to return setof record, so the actual names and types of the output columns must be defined in the FROM clause of the calling SELECT statement, for example:

SELECT * FROM crosstab('...') AS ct(row_name text, category_1 text, category_2 text);

This example produces a set something like:

           <== value  columns  ==>
 row_name   category_1   category_2
----------+------------+------------
  row1        val1         val2
  row2        val5         val6

The FROM clause must define the output as one row_name column (of the same data type as the first result column of the SQL query) followed by N value columns (all of the same data type as the third result column of the SQL query). You can set up as many output value columns as you wish. The names of the output columns are up to you.

The crosstab function produces one output row for each consecutive group of input rows with the same row_name value. It fills the output value columns, left to right, with the value fields from these rows. If there are fewer rows in a group than there are output value columns, the extra output columns are filled with nulls; if there are more rows, the extra input rows are skipped.

In practice the SQL query should always specify ORDER BY 1,2 to ensure that the input rows are properly ordered, that is, values with the same row_name are brought together and correctly ordered within the row. Notice that crosstab itself does not pay any attention to the second column of the query result; it's just there to be ordered by, to control the order in which the third-column values appear across the page.

Here is a complete example:

CREATE TABLE ct(id SERIAL, rowid TEXT, attribute TEXT, value TEXT);
INSERT INTO ct(rowid, attribute, value) VALUES('test1','att1','val1');
INSERT INTO ct(rowid, attribute, value) VALUES('test1','att2','val2');
INSERT INTO ct(rowid, attribute, value) VALUES('test1','att3','val3');
INSERT INTO ct(rowid, attribute, value) VALUES('test1','att4','val4');
INSERT INTO ct(rowid, attribute, value) VALUES('test2','att1','val5');
INSERT INTO ct(rowid, attribute, value) VALUES('test2','att2','val6');
INSERT INTO ct(rowid, attribute, value) VALUES('test2','att3','val7');
INSERT INTO ct(rowid, attribute, value) VALUES('test2','att4','val8');

SELECT *
FROM crosstab(
  'select rowid, attribute, value
   from ct
   where attribute = ''att2'' or attribute = ''att3''
   order by 1,2')
AS ct(row_name text, category_1 text, category_2 text, category_3 text);

 row_name | category_1 | category_2 | category_3
----------+------------+------------+------------
 test1    | val2       | val3       |
 test2    | val6       | val7       |
(2 rows)

You can avoid always having to write out a FROM clause to define the output columns, by setting up a custom crosstab function that has the desired output row type wired into its definition. This is described in the next section. Another possibility is to embed the required FROM clause in a view definition.

另請參閱 psql 中的 \crosstabview 指令，該指令提供的功能類似於 crosstab()。

F.38.1.3. CrosstabN(Text)

crosstabN(text sql)

The crosstabN functions are examples of how to set up custom wrappers for the general crosstab function, so that you need not write out column names and types in the calling SELECT query. The tablefunc module includes crosstab2, crosstab3, and crosstab4, whose output row types are defined as

CREATE TYPE tablefunc_crosstab_N AS (
    row_name TEXT,
    category_1 TEXT,
    category_2 TEXT,
        .
        .
        .
    category_N TEXT
);

Thus, these functions can be used directly when the input query produces row_name and value columns of type text, and you want 2, 3, or 4 output values columns. In all other ways they behave exactly as described above for the general crosstab function.

For instance, the example given in the previous section would also work as

SELECT *
FROM crosstab3(
  'select rowid, attribute, value
   from ct
   where attribute = ''att2'' or attribute = ''att3''
   order by 1,2');

These functions are provided mostly for illustration purposes. You can create your own return types and functions based on the underlying crosstab() function. There are two ways to do it:

Create a composite type describing the desired output columns, similar to the examples in contrib/tablefunc/tablefunc--1.0.sql. Then define a unique function name accepting one text parameter and returning setof your_type_name, but linking to the same underlying crosstab C function. For example, if your source data produces row names that are text, and values that are float8, and you want 5 value columns:

CREATE TYPE my_crosstab_float8_5_cols AS (
    my_row_name text,
    my_category_1 float8,
    my_category_2 float8,
    my_category_3 float8,
    my_category_4 float8,
    my_category_5 float8
);

CREATE OR REPLACE FUNCTION crosstab_float8_5_cols(text)
    RETURNS setof my_crosstab_float8_5_cols
    AS '$libdir/tablefunc','crosstab' LANGUAGE C STABLE STRICT;

Use OUT parameters to define the return type implicitly. The same example could also be done this way:

CREATE OR REPLACE FUNCTION crosstab_float8_5_cols(
    IN text,
    OUT my_row_name text,
    OUT my_category_1 float8,
    OUT my_category_2 float8,
    OUT my_category_3 float8,
    OUT my_category_4 float8,
    OUT my_category_5 float8)
  RETURNS setof record
  AS '$libdir/tablefunc','crosstab' LANGUAGE C STABLE STRICT;

F.38.1.4. Crosstab(Text, Text)

crosstab(text source_sql, text category_sql)

The main limitation of the single-parameter form of crosstab is that it treats all values in a group alike, inserting each value into the first available column. If you want the value columns to correspond to specific categories of data, and some groups might not have data for some of the categories, that doesn't work well. The two-parameter form of crosstab handles this case by providing an explicit list of the categories corresponding to the output columns.

source_sql is a SQL statement that produces the source set of data. This statement must return one row_name column, one category column, and one value column. It may also have one or more “extra” columns. The row_name column must be first. The category and value columns must be the last two columns, in that order. Any columns between row_name and category are treated as “extra”. The “extra” columns are expected to be the same for all rows with the same row_name value.

For example, source_sql might produce a set something like:

SELECT row_name, extra_col, cat, value FROM foo ORDER BY 1;

 row_name    extra_col   cat    value
----------+------------+-----+---------
  row1         extra1    cat1    val1
  row1         extra1    cat2    val2
  row1         extra1    cat4    val4
  row2         extra2    cat1    val5
  row2         extra2    cat2    val6
  row2         extra2    cat3    val7
  row2         extra2    cat4    val8

category_sql is a SQL statement that produces the set of categories. This statement must return only one column. It must produce at least one row, or an error will be generated. Also, it must not produce duplicate values, or an error will be generated. category_sql might be something like:

SELECT DISTINCT cat FROM foo ORDER BY 1;
    cat
  -------
    cat1
    cat2
    cat3
    cat4

The crosstab function is declared to return setof record, so the actual names and types of the output columns must be defined in the FROM clause of the calling SELECT statement, for example:

SELECT * FROM crosstab('...', '...')
    AS ct(row_name text, extra text, cat1 text, cat2 text, cat3 text, cat4 text);

This will produce a result something like:

                  <==  value  columns   ==>
row_name   extra   cat1   cat2   cat3   cat4
---------+-------+------+------+------+------
  row1     extra1  val1   val2          val4
  row2     extra2  val5   val6   val7   val8

The FROM clause must define the proper number of output columns of the proper data types. If there are N columns in the source_sql query's result, the first N-2 of them must match up with the first N-2 output columns. The remaining output columns must have the type of the last column of the source_sql query's result, and there must be exactly as many of them as there are rows in the category_sql query's result.

The crosstab function produces one output row for each consecutive group of input rows with the same row_name value. The output row_name column, plus any “extra” columns, are copied from the first row of the group. The output value columns are filled with the value fields from rows having matching category values. If a row's category does not match any output of the category_sql query, its value is ignored. Output columns whose matching category is not present in any input row of the group are filled with nulls.

In practice the source_sql query should always specify ORDER BY 1 to ensure that values with the same row_name are brought together. However, ordering of the categories within a group is not important. Also, it is essential to be sure that the order of the category_sql query's output matches the specified output column order.

Here are two complete examples:

create table sales(year int, month int, qty int);
insert into sales values(2007, 1, 1000);
insert into sales values(2007, 2, 1500);
insert into sales values(2007, 7, 500);
insert into sales values(2007, 11, 1500);
insert into sales values(2007, 12, 2000);
insert into sales values(2008, 1, 1000);

select * from crosstab(
  'select year, month, qty from sales order by 1',
  'select m from generate_series(1,12) m'
) as (
  year int,
  "Jan" int,
  "Feb" int,
  "Mar" int,
  "Apr" int,
  "May" int,
  "Jun" int,
  "Jul" int,
  "Aug" int,
  "Sep" int,
  "Oct" int,
  "Nov" int,
  "Dec" int
);
 year | Jan  | Feb  | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov  | Dec
------+------+------+-----+-----+-----+-----+-----+-----+-----+-----+------+------
 2007 | 1000 | 1500 |     |     |     |     | 500 |     |     |     | 1500 | 2000
 2008 | 1000 |      |     |     |     |     |     |     |     |     |      |
(2 rows)

CREATE TABLE cth(rowid text, rowdt timestamp, attribute text, val text);
INSERT INTO cth VALUES('test1','01 March 2003','temperature','42');
INSERT INTO cth VALUES('test1','01 March 2003','test_result','PASS');
INSERT INTO cth VALUES('test1','01 March 2003','volts','2.6987');
INSERT INTO cth VALUES('test2','02 March 2003','temperature','53');
INSERT INTO cth VALUES('test2','02 March 2003','test_result','FAIL');
INSERT INTO cth VALUES('test2','02 March 2003','test_startdate','01 March 2003');
INSERT INTO cth VALUES('test2','02 March 2003','volts','3.1234');

SELECT * FROM crosstab
(
  'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
  'SELECT DISTINCT attribute FROM cth ORDER BY 1'
)
AS
(
       rowid text,
       rowdt timestamp,
       temperature int4,
       test_result text,
       test_startdate timestamp,
       volts float8
);
 rowid |          rowdt           | temperature | test_result |      test_startdate      | volts
-------+--------------------------+-------------+-------------+--------------------------+--------
 test1 | Sat Mar 01 00:00:00 2003 |          42 | PASS        |                          | 2.6987
 test2 | Sun Mar 02 00:00:00 2003 |          53 | FAIL        | Sat Mar 01 00:00:00 2003 | 3.1234
(2 rows)

You can create predefined functions to avoid having to write out the result column names and types in each query. See the examples in the previous section. The underlying C function for this form of crosstab is named crosstab_hash.

F.38.1.5. Connectby

connectby(text relname, text keyid_fld, text parent_keyid_fld
          [, text orderby_fld ], text start_with, int max_depth
          [, text branch_delim ])

The connectby function produces a display of hierarchical data that is stored in a table. The table must have a key field that uniquely identifies rows, and a parent-key field that references the parent (if any) of each row. connectby can display the sub-tree descending from any row.

Table F.31 explains the parameters.

Table F.31. `connectby` Parameters

The key and parent-key fields can be any data type, but they must be the same type. Note that the start_with value must be entered as a text string, regardless of the type of the key field.

The connectby function is declared to return setof record, so the actual names and types of the output columns must be defined in the FROM clause of the calling SELECT statement, for example:

SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0, '~')
    AS t(keyid text, parent_keyid text, level int, branch text, pos int);

The first two output columns are used for the current row's key and its parent row's key; they must match the type of the table's key field. The third output column is the depth in the tree and must be of type integer. If a branch_delim parameter was given, the next output column is the branch display and must be of type text. Finally, if an orderby_fld parameter was given, the last output column is a serial number, and must be of type integer.

The “branch” output column shows the path of keys taken to reach the current row. The keys are separated by the specified branch_delim string. If no branch display is wanted, omit both the branch_delim parameter and the branch column in the output column list.

If the ordering of siblings of the same parent is important, include the orderby_fld parameter to specify which field to order siblings by. This field can be of any sortable data type. The output column list must include a final integer serial-number column, if and only if orderby_fld is specified.

The parameters representing table and field names are copied as-is into the SQL queries that connectby generates internally. Therefore, include double quotes if the names are mixed-case or contain special characters. You may also need to schema-qualify the table name.

In large tables, performance will be poor unless there is an index on the parent-key field.

It is important that the branch_delim string not appear in any key values, else connectby may incorrectly report an infinite-recursion error. Note that if branch_delim is not provided, a default value of ~ is used for recursion detection purposes.

Here is an example:

CREATE TABLE connectby_tree(keyid text, parent_keyid text, pos int);

INSERT INTO connectby_tree VALUES('row1',NULL, 0);
INSERT INTO connectby_tree VALUES('row2','row1', 0);
INSERT INTO connectby_tree VALUES('row3','row1', 0);
INSERT INTO connectby_tree VALUES('row4','row2', 1);
INSERT INTO connectby_tree VALUES('row5','row2', 0);
INSERT INTO connectby_tree VALUES('row6','row4', 0);
INSERT INTO connectby_tree VALUES('row7','row3', 0);
INSERT INTO connectby_tree VALUES('row8','row6', 0);
INSERT INTO connectby_tree VALUES('row9','row5', 0);

-- with branch, without orderby_fld (order of results is not guaranteed)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0, '~')
 AS t(keyid text, parent_keyid text, level int, branch text);
 keyid | parent_keyid | level |       branch
-------+--------------+-------+---------------------
 row2  |              |     0 | row2
 row4  | row2         |     1 | row2~row4
 row6  | row4         |     2 | row2~row4~row6
 row8  | row6         |     3 | row2~row4~row6~row8
 row5  | row2         |     1 | row2~row5
 row9  | row5         |     2 | row2~row5~row9
(6 rows)

-- without branch, without orderby_fld (order of results is not guaranteed)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0)
 AS t(keyid text, parent_keyid text, level int);
 keyid | parent_keyid | level
-------+--------------+-------
 row2  |              |     0
 row4  | row2         |     1
 row6  | row4         |     2
 row8  | row6         |     3
 row5  | row2         |     1
 row9  | row5         |     2
(6 rows)

-- with branch, with orderby_fld (notice that row5 comes before row4)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0, '~')
 AS t(keyid text, parent_keyid text, level int, branch text, pos int);
 keyid | parent_keyid | level |       branch        | pos
-------+--------------+-------+---------------------+-----
 row2  |              |     0 | row2                |   1
 row5  | row2         |     1 | row2~row5           |   2
 row9  | row5         |     2 | row2~row5~row9      |   3
 row4  | row2         |     1 | row2~row4           |   4
 row6  | row4         |     2 | row2~row4~row6      |   5
 row8  | row6         |     3 | row2~row4~row6~row8 |   6
(6 rows)

-- without branch, with orderby_fld (notice that row5 comes before row4)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0)
 AS t(keyid text, parent_keyid text, level int, pos int);
 keyid | parent_keyid | level | pos
-------+--------------+-------+-----
 row2  |              |     0 |   1
 row5  | row2         |     1 |   2
 row9  | row5         |     2 |   3
 row4  | row2         |     1 |   4
 row6  | row4         |     2 |   5
 row8  | row6         |     3 |   6
(6 rows)

F.38.2. Author

Joe Conway

oid2name

oid2name — resolve OIDs and file nodes in a PostgreSQL data directory

Synopsis

oid2name [option...]

Description

oid2name is a utility program that helps administrators to examine the file structure used by PostgreSQL. To make use of it, you need to be familiar with the database file structure, which is described in .

Note

The name “oid2name” is historical, and is actually rather misleading, since most of the time when you use it, you will really be concerned with tables' filenode numbers (which are the file names visible in the database directories). Be sure you understand the difference between table OIDs and table filenodes!

oid2name connects to a target database and extracts OID, filenode, and/or table name information. You can also have it show database OIDs or tablespace OIDs.

Options

oid2name accepts the following command-line arguments:

-f filenode

show info for table with filenode

filenode-i

include indexes and sequences in the listing

-o oid

show info for table with OID

oid-q

omit headers (useful for scripting)

-s

show tablespace OIDs

-S

include system objects (those in information_schema, pg_toast and pg_catalog schemas)

-t tablename_pattern

show info for table(s) matching

tablename_pattern-V --version

Print the oid2name version and exit.

-x

display more information about each object shown: tablespace name, schema name, and OID

-? --help

Show help about oid2name command line arguments, and exit.

oid2name also accepts the following command-line arguments for connection parameters:

-d database

database to connect to

-H host

database server's host

-p port

database server's port

-U username

user name to connect as

-P password

password (deprecated — putting this on the command line is a security hazard)

To display specific tables, select which tables to show by using -o, -f and/or -t. -o takes an OID, -f takes a filenode, and -t takes a table name (actually, it's a LIKE pattern, so you can use things like foo%). You can use as many of these options as you like, and the listing will include all objects matched by any of the options. But note that these options can only show objects in the database given by -d.

If you don't give any of -o, -f or -t, but do give -d, it will list all tables in the database named by -d. In this mode, the -S and -i options control what gets listed.

If you don't give -d either, it will show a listing of database OIDs. Alternatively you can give -s to get a tablespace listing.

Notes

oid2name requires a running database server with non-corrupt system catalogs. It is therefore of only limited use for recovering from catastrophic database corruption situations.

Examples

Author

vacuumlo

vacuumlo — remove orphaned large objects from a PostgreSQL database

Synopsis

vacuumlo [option...] dbname...

Description

vacuumlo is a simple utility program that will remove any “orphaned” large objects from a PostgreSQL database. An orphaned large object (LO) is considered to be any LO whose OID does not appear in any oid or lo data column of the database.

If you use this, you may also be interested in the lo_manage trigger in the module. lo_manage is useful to try to avoid creating orphaned LOs in the first place.

All databases named on the command line are processed.

Options

vacuumlo accepts the following command-line arguments:

-l limit

Remove no more than limit large objects per transaction (default 1000). Since the server acquires a lock per LO removed, removing too many LOs in one transaction risks exceeding . Set the limit to zero if you want all removals done in a single transaction.

-n

Don't remove anything, just show what would be done.

-v

Write a lot of progress messages.

-V --version

Print the vacuumlo version and exit.

-? --help

Show help about vacuumlo command line arguments, and exit.

vacuumlo also accepts the following command-line arguments for connection parameters:

-h hostname

Database server's host.

-p port

Database server's port.

-U username

User name to connect as.

-w --no-password

Never issue a password prompt. If the server requires password authentication and a password is not available by other means such as a .pgpass file, the connection attempt will fail. This option can be useful in batch jobs and scripts where no user is present to enter a password.

-W

Force vacuumlo to prompt for a password before connecting to a database.

This option is never essential, since vacuumlo will automatically prompt for a password if the server demands password authentication. However, vacuumlo will waste a connection attempt finding out that the server wants a password. In some cases it is worth typing -W to avoid the extra connection attempt.

Notes

vacuumlo works by the following method: First, vacuumlo builds a temporary table which contains all of the OIDs of the large objects in the selected database. It then scans through all columns in the database that are of type oid or lo, and removes matching entries from the temporary table. (Note: Only types with these names are considered; in particular, domains over them are not considered.) The remaining entries in the temporary table identify orphaned LOs. These are removed.

Author

pg_standby

pg_standby — supports the creation of a PostgreSQL warm standby server

Synopsis

pg_standby [option...] archivelocation nextwalfile walfilepath [restartwalfile]

Description

pg_standby supports creation of a “warm standby” database server. It is designed to be a production-ready program, as well as a customizable template should you require specific modifications.

pg_standby is designed to be a waiting restore_command, which is needed to turn a standard archive recovery into a warm standby operation. Other configuration is required as well, all of which is described in the main server manual (see ).

To configure a standby server to use pg_standby, put this into its recovery.conf configuration file:

where archiveDir is the directory from which WAL segment files should be restored.

If restartwalfile is specified, normally by using the %r macro, then all WAL files logically preceding this file will be removed from archivelocation. This minimizes the number of files that need to be retained, while preserving crash-restart capability. Use of this parameter is appropriate if the archivelocation is a transient staging area for this particular standby server, but not when the archivelocation is intended as a long-term WAL archive area.

pg_standby assumes that archivelocation is a directory readable by the server-owning user. If restartwalfile (or -k) is specified, the archivelocation directory must be writable too.

There are two ways to fail over to a “warm standby” database server when the master server fails:Smart Failover

In smart failover, the server is brought up after applying all WAL files available in the archive. This results in zero data loss, even if the standby server has fallen behind, but if there is a lot of unapplied WAL it can be a long time before the standby server becomes ready. To trigger a smart failover, create a trigger file containing the word smart, or just create it and leave it empty.Fast Failover

In fast failover, the server is brought up immediately. Any WAL files in the archive that have not yet been applied will be ignored, and all transactions in those files are lost. To trigger a fast failover, create a trigger file and write the word fast into it. pg_standby can also be configured to execute a fast failover automatically if no new WAL file appears within a defined interval.

Options

pg_standby accepts the following command-line arguments:

-c

Use cp or copy command to restore WAL files from archive. This is the only supported behavior so this option is useless.-d

Print lots of debug logging output on stderr.

-k

Remove files from archivelocation so that no more than this many WAL files before the current one are kept in the archive. Zero (the default) means not to remove any files from archivelocation. This parameter will be silently ignored if restartwalfile is specified, since that specification method is more accurate in determining the correct archive cut-off point. Use of this parameter is deprecated as of PostgreSQL 8.3; it is safer and more efficient to specify a restartwalfile parameter. A too small setting could result in removal of files that are still needed for a restart of the standby server, while a too large setting wastes archive space.

-r maxretries

Set the maximum number of times to retry the copy command if it fails (default 3). After each failure, we wait for sleeptime * num_retries so that the wait time increases progressively. So by default, we will wait 5 secs, 10 secs, then 15 secs before reporting the failure back to the standby server. This will be interpreted as end of recovery and the standby will come up fully as a result.

-s sleeptime

-t triggerfile

Specify a trigger file whose presence should cause failover. It is recommended that you use a structured file name to avoid confusion as to which server is being triggered when multiple servers exist on the same system; for example /tmp/pgsql.trigger.5432.

-V --version

Print the pg_standby version and exit.

-w maxwaittime

-? --help

Show help about pg_standby command line arguments, and exit.

Notes

pg_standby is designed to work with PostgreSQL 8.2 and later.

PostgreSQL 8.3 provides the %r macro, which is designed to let pg_standby know the last file it needs to keep. With PostgreSQL 8.2, the -k option must be used if archive cleanup is required. This option remains available in 8.3, but its use is deprecated.

PostgreSQL 8.4 provides the recovery_end_command option. Without this option a leftover trigger file can be hazardous.

pg_standby is written in C and has an easy-to-modify source code, with specifically designated sections to modify for your own needs

Examples

On Linux or Unix systems, you might use:

where the archive directory is physically located on the standby server, so that the archive_command is accessing it across NFS, but the files are local to the standby (enabling use of ln). This will:

produce debugging output in standby.log
sleep for 2 seconds between checks for next WAL file availability
stop waiting only when a trigger file called /tmp/pgsql.trigger.5442 appears, and perform failover according to its content
remove the trigger file when recovery ends
remove no-longer-needed files from the archive directory

On Windows, you might use:

Note that backslashes need to be doubled in the archive_command, but not in the restore_command or recovery_end_command. This will:

use the copy command to restore WAL files from archive
produce debugging output in standby.log
sleep for 5 seconds between checks for next WAL file availability
stop waiting only when a trigger file called C:\pgsql.trigger.5442 appears, and perform failover according to its content
remove the trigger file when recovery ends
remove no-longer-needed files from the archive directory

The copy command on Windows sets the final file size before the file is completely copied, which would ordinarily confuse pg_standby. Therefore pg_standby waits sleeptime seconds once it sees the proper file size. GNUWin32's cp sets the file size only after the file copy is complete.

Since the Windows example uses copy at both ends, either or both servers might be accessing the archive directory across the network.

Author

M. Glossary

This is a list of terms and their meaning in the context of PostgreSQL and relational database systems in general.

ACID

Atomicity, Consistency, Isolation, and Durability. This set of properties of database transactions is intended to guarantee validity in concurrent operation and even in event of errors, power failures, etc.

Aggregate function (routine)

A function that combines (aggregates) multiple input values, for example by counting, averaging or adding, yielding a single output value.

For more information, see Section 9.21.

Analytic function

See Window function (routine).

Analyze (operation)

The process of collecting statistics from data in tables and other relations to help the query planner to make decisions about how to execute queries.

(Don't confuse this term with the ANALYZE option to the EXPLAIN command.)

For more information, see ANALYZE.

Atomic

In reference to a datum: the fact that its value cannot be broken down into smaller components.

In reference to a database transaction: see atomicity.

Atomicity

The property of a transaction that either all its operations complete as a single unit or none do. In addition, if a system failure occurs during the execution of a transaction, no partial results are visible after recovery. This is one of the ACID properties.

Attribute

An element with a certain name and data type found within a tuple.

Autovacuum (process)

A set of background processes that routinely perform vacuum and analyze operations.

For more information, see Section 24.1.6.

Backend (process)

Process of an instance which acts on behalf of a client session and handles its requests.

(Don't confuse this term with the similar terms Background Worker or Background Writer).

Background worker (process)

Process within an instance, which runs system- or user-supplied code. Serves as infrastructure for several features in PostgreSQL, such as logical replication and parallel queries. In addition, Extensions can add custom background worker processes.

For more information, see Chapter 47.

Background writer (process)

A process that writes dirty data pages from shared memory to the file system. It wakes up periodically, but works only for a short period in order to distribute its expensive I/O activity over time to avoid generating larger I/O peaks which could block other processes.

For more information, see Section 19.4.5.

Bloat

Space in data pages which does not contain current row versions, such as unused (free) space or outdated row versions.Cast

A conversion of a datum from its current data type to another data type.

For more information, see CREATE CAST.

Catalog

The SQL standard uses this term to indicate what is called a database in PostgreSQL's terminology.

(Don't confuse this term with system catalog).

For more information, see Section 22.1.

Check constraint

A type of constraint defined on a relation which restricts the values allowed in one or more attributes. The check constraint can make reference to any attribute of the same row in the relation, but cannot reference other rows of the same relation or other relations.

For more information, see Section 5.4.

Checkpoint

A point in the WAL sequence at which it is guaranteed that the heap and index data files have been updated with all information from shared memory modified before that checkpoint; a checkpoint record is written and flushed to WAL to mark that point.

A checkpoint is also the act of carrying out all the actions that are necessary to reach a checkpoint as defined above. This process is initiated when predefined conditions are met, such as a specified amount of time has passed, or a certain volume of records has been written; or it can be invoked by the user with the command CHECKPOINT.

For more information, see Section 29.4.

Checkpointer (process)

A specialized process responsible for executing checkpoints.

Class (archaic)

See Relation.Client (process)

Any process, possibly remote, that establishes a session by connecting to an instance to interact with a database.

Column

An attribute found in a table or view.Commit

The act of finalizing a transaction within the database, which makes it visible to other transactions and assures its durability.

For more information, see COMMIT.

Concurrency

The concept that multiple independent operations happen within the database at the same time. In PostgreSQL, concurrency is controlled by the multiversion concurrency control mechanism.

Connection

An established line of communication between a client process and a backend process, usually over a network, supporting a session. This term is sometimes used as a synonym for session.

For more information, see Section 19.3.

Consistency

The property that the data in the database is always in compliance with integrity constraints. Transactions may be allowed to violate some of the constraints transiently before it commits, but if such violations are not resolved by the time it commits, such a transaction is automatically rolled back. This is one of the ACID properties.

Constraint

A restriction on the values of data allowed within a table, or in attributes of a domain.

For more information, see Section 5.4.

Data area

See Data directory.Database

A named collection of local SQL objects.

For more information, see Section 22.1.Database cluster

A collection of databases and global SQL objects, and their common static and dynamic metadata. Sometimes referred to as a cluster.

In PostgreSQL, the term cluster is also sometimes used to refer to an instance. (Don't confuse this term with the SQL command CLUSTER.)Database server

See Instance.Data directory

The base directory on the file system of a server that contains all data files and subdirectories associated with a database cluster (with the exception of tablespaces, and optionally WAL). The environment variable PGDATA is commonly used to refer to the data directory.

A cluster's storage space comprises the data directory plus any additional tablespaces.

For more information, see Section 68.1.Data page

The basic structure used to store relation data. All pages are of the same size. Data pages are typically stored on disk, each in a specific file, and can be read to shared buffers where they can be modified, becoming dirty. They become clean when written to disk. New pages, which initially exist in memory only, are also dirty until written.Datum

The internal representation of one value of an SQL data type.Delete

An SQL command which removes rows from a given table or relation.

For more information, see DELETE.Durability

The assurance that once a transaction has been committed, the changes remain even after a system failure or crash. This is one of the ACID properties.Epoch

See Transaction ID.Extension

A software add-on package that can be installed on an instance to get extra features.

For more information, see Section 37.17.File segment

A physical file which stores data for a given relation. File segments are limited in size by a configuration value (typically 1 gigabyte), so if a relation exceeds that size, it is split into multiple segments.

For more information, see Section 68.1.

(Don't confuse this term with the similar term WAL segment).Foreign data wrapper

A means of representing data that is not contained in the local database so that it appears as if were in local table(s). With a foreign data wrapper it is possible to define a foreign server and foreign tables.

For more information, see CREATE FOREIGN DATA WRAPPER.Foreign key

A type of constraint defined on one or more columns in a table which requires the value(s) in those columns to identify zero or one row in another (or, infrequently, the same) table.Foreign server

A named collection of foreign tables which all use the same foreign data wrapper and have other configuration values in common.

For more information, see CREATE SERVER.Foreign table (relation)

A relation which appears to have rows and columns similar to a regular table, but will forward requests for data through its foreign data wrapper, which will return result sets structured according to the definition of the foreign table.

For more information, see CREATE FOREIGN TABLE.Fork

Each of the separate segmented file sets in which a relation is stored. The main fork is where the actual data resides. There also exist two secondary forks for metadata: the free space map and the visibility map. Unlogged relations also have an init fork.Free space map (fork)

A storage structure that keeps metadata about each data page of a table's main fork. The free space map entry for each page stores the amount of free space that's available for future tuples, and is structured to be efficiently searched for available space for a new tuple of a given size.

For more information, see Section 68.3.Function (routine)

A type of routine that receives zero or more arguments, returns zero or more output values, and is constrained to run within one transaction. Functions are invoked as part of a query, for example via SELECT. Certain functions can return sets; those are called set-returning functions.

Functions can also be used for triggers to invoke.

For more information, see CREATE FUNCTION.Grant

An SQL command that is used to allow a user or role to access specific objects within the database.

For more information, see GRANT.Heap

Contains the values of row attributes (i.e., the data) for a relation. The heap is realized within one or more file segments in the relation's main fork.Host

A computer that communicates with other computers over a network. This is sometimes used as a synonym for server. It is also used to refer to a computer where client processes run.Index (relation)

A relation that contains data derived from a table or materialized view. Its internal structure supports fast retrieval of and access to the original data.

For more information, see CREATE INDEX.Insert

An SQL command used to add new data into a table.

For more information, see INSERT.Instance

A group of backend and auxiliary processes that communicate using a common shared memory area. One postmaster process manages the instance; one instance manages exactly one database cluster with all its databases. Many instances can run on the same server as long as their TCP ports do not conflict.

The instance handles all key features of a DBMS: read and write access to files and shared memory, assurance of the ACID properties, connections to client processes, privilege verification, crash recovery, replication, etc.Isolation

The property that the effects of a transaction are not visible to concurrent transactions before it commits. This is one of the ACID properties.

For more information, see Section 13.2.Join

An operation and SQL keyword used in queries for combining data from multiple relations.Key

A means of identifying a row within a table or other relation by values contained within one or more attributes in that relation.Lock

A mechanism that allows a process to limit or prevent simultaneous access to a resource.Log file

Log files contain human-readable text lines about events. Examples include login failures, long-running queries, etc.

For more information, see Section 24.3.Logged

A table is considered logged if changes to it are sent to the WAL. By default, all regular tables are logged. A table can be specified as unlogged either at creation time or via the ALTER TABLE command.Logger (process)

If activated, the process writes information about database events into the current log file. When reaching certain time- or volume-dependent criteria, a new log file is created. Also called syslogger.

For more information, see Section 19.8.Log record

Archaic term for a WAL record.Master (server)

See Primary (server).Materialized

The property that some information has been pre-computed and stored for later use, rather than computing it on-the-fly.

This term is used in materialized view, to mean that the data derived from the view's query is stored on disk separately from the sources of that data.

This term is also used to refer to some multi-step queries to mean that the data resulting from executing a given step is stored in memory (with the possibility of spilling to disk), so that it can be read multiple times by another step.Materialized view (relation)

A relation that is defined by a SELECT statement (just like a view), but stores data in the same way that a table does. It cannot be modified via INSERT, UPDATE, or DELETE operations.

For more information, see CREATE MATERIALIZED VIEW.Multi-version concurrency control (MVCC)

A mechanism designed to allow several transactions to be reading and writing the same rows without one process causing other processes to stall. In PostgreSQL, MVCC is implemented by creating copies (versions) of tuples as they are modified; after transactions that can see the old versions terminate, those old versions need to be removed.Null

A concept of non-existence that is a central tenet of relational database theory. It represents the absence of a definite value.Optimizer

See Query planner.Parallel query

The ability to handle parts of executing a query to take advantage of parallel processes on servers with multiple CPUs.Partition

One of several disjoint (not overlapping) subsets of a larger set.

In reference to a partitioned table: One of the tables that each contain part of the data of the partitioned table, which is said to be the parent. The partition is itself a table, so it can also be queried directly; at the same time, a partition can sometimes be a partitioned table, allowing hierarchies to be created.

In reference to a window function in a query, a partition is a user-defined criterion that identifies which neighboring rows of the query's result set can be considered by the function.Partitioned table (relation)

A relation that is in semantic terms the same as a table, but whose storage is distributed across several partitions.Postmaster (process)

The very first process of an instance. It starts and manages the other auxiliary processes and creates backend processes on demand.

For more information, see Section 18.3.Primary key

A special case of a unique constraint defined on a table or other relation that also guarantees that all of the attributes within the primary key do not have null values. As the name implies, there can be only one primary key per table, though it is possible to have multiple unique constraints that also have no null-capable attributes.Primary (server)

When two or more databases are linked via replication, the server that is considered the authoritative source of information is called the primary, also known as a master.Procedure (routine)

A type of routine. Their distinctive qualities are that they do not return values, and that they are allowed to make transactional statements such as COMMIT and ROLLBACK. They are invoked via the CALL command.

For more information, see CREATE PROCEDURE.Query

A request sent by a client to a backend, usually to return results or to modify data on the database.Query planner

The part of PostgreSQL that is devoted to determining (planning) the most efficient way to execute queries. Also known as query optimizer, optimizer, or simply planner.Record

See Tuple.Recycling

See WAL file.Referential integrity

A means of restricting data in one relation by a foreign key so that it must have matching data in another relation.Relation

The generic term for all objects in a database that have a name and a list of attributes defined in a specific order. Tables, sequences, views, foreign tables, materialized views, composite types, and indexes are all relations.

More generically, a relation is a set of tuples; for example, the result of a query is also a relation.

In PostgreSQL, Class is an archaic synonym for relation.Replica (server)

A database that is paired with a primary database and is maintaining a copy of some or all of the primary database's data. The foremost reasons for doing this are to allow for greater access to that data, and to maintain availability of the data in the event that the primary becomes unavailable.Replication

The act of reproducing data on one server onto another server called a replica. This can take the form of physical replication, where all file changes from one server are copied verbatim, or logical replication where a defined subset of data changes are conveyed using a higher-level representation.Result set

A relation transmitted from a backend process to a client upon the completion of an SQL command, usually a SELECT but it can be an INSERT, UPDATE, or DELETE command if the RETURNING clause is specified.

The fact that a result set is a relation means that a query can be used in the definition of another query, becoming a subquery.Revoke

A command to prevent access to a named set of database objects for a named list of roles.

For more information, see REVOKE.Role

A collection of access privileges to the instance. Roles are themselves a privilege that can be granted to other roles. This is often done for convenience or to ensure completeness when multiple users need the same privileges.

For more information, see CREATE ROLE.Rollback

A command to undo all of the operations performed since the beginning of a transaction.

For more information, see ROLLBACK.Routine

A defined set of instructions stored in the database system that can be invoked for execution. A routine can be written in a variety of programming languages. Routines can be functions (including set-returning functions and trigger functions), aggregate functions, and procedures.

Many routines are already defined within PostgreSQL itself, but user-defined ones can also be added.Row

See Tuple.Savepoint

A special mark in the sequence of steps in a transaction. Data modifications after this point in time may be reverted to the time of the savepoint.

For more information, see SAVEPOINT.Schema

A schema is a namespace for SQL objects, which all reside in the same database. Each SQL object must reside in exactly one schema.

All system-defined SQL objects reside in schema pg_catalog.

More generically, the term schema is used to mean all data descriptions (table definitions, constraints, comments, etc) for a given database or subset thereof.

For more information, see Section 5.9.Segment

See File segment.Select

The SQL command used to request data from a database. Normally, SELECT commands are not expected to modify the database in any way, but it is possible that functions invoked within the query could have side effects that do modify data.

For more information, see SELECT.Sequence (relation)

A type of relation that is used to generate values. Typically the generated values are sequential non-repeating numbers. They are commonly used to generate surrogate primary key values.Server

A computer on which PostgreSQL instances run. The term server denotes real hardware, a container, or a virtual machine.

This term is sometimes used to refer to an instance or to a host.Session

A state that allows a client and a backend to interact, communicating over a connection.Shared memory

RAM which is used by the processes common to an instance. It mirrors parts of database files, provides a transient area for WAL records, and stores additional common information. Note that shared memory belongs to the complete instance, not to a single database.

The largest part of shared memory is known as shared buffers and is used to mirror part of data files, organized into pages. When a page is modified, it is called a dirty page until it is written back to the file system.

For more information, see Section 19.4.1.SQL object

Any object that can be created with a CREATE command. Most objects are specific to one database, and are commonly known as local objects.

Most local objects belong to a specific schema in their containing database, such as relations (all types), routines (all types), data types, etc. The names of such objects of the same type in the same schema are enforced to be unique.

There also exist local objects that do not belong to schemas; some examples are extensions, data type casts, and foreign data wrappers. The names of such objects of the same type are enforced to be unique within the database.

Other object types, such as roles, tablespaces, replication origins, subscriptions for logical replication, and databases themselves are not local SQL objects since they exist entirely outside of any specific database; they are called global objects. The names of such objects are enforced to be unique within the whole database cluster.

For more information, see Section 22.1.SQL standard

A series of documents that define the SQL language.Standby (server)

See Replica (server).Stats collector (process)

This process collects statistical information about the instance's activities.

For more information, see Section 27.2.System catalog

A collection of tables which describe the structure of all SQL objects of the instance. The system catalog resides in the schema pg_catalog. These tables contain data in internal representation and are not typically considered useful for user examination; a number of user-friendlier views, also in schema pg_catalog, offer more convenient access to some of that information, while additional tables and views exist in schema information_schema (see Chapter 36) that expose some of the same and additional information as mandated by the SQL standard.

For more information, see Section 5.9.Table

A collection of tuples having a common data structure (the same number of attributes, in the same order, having the same name and type per position). A table is the most common form of relation in PostgreSQL.

For more information, see CREATE TABLE.Tablespace

A named location on the server file system. All SQL objects which require storage beyond their definition in the system catalog must belong to a single tablespace. Initially, a database cluster contains a single usable tablespace which is used as the default for all SQL objects, called pg_default.

For more information, see Section 22.6.Temporary table

Tables that exist either for the lifetime of a session or a transaction, as specified at the time of creation. The data in them is not visible to other sessions, and is not logged. Temporary tables are often used to store intermediate data for a multi-step operation.

For more information, see CREATE TABLE.TOAST

A mechanism by which large attributes of table rows are split and stored in a secondary table, called the TOAST table. Each relation with large attributes has its own TOAST table.

For more information, see Section 68.2.Transaction

A combination of commands that must act as a single atomic command: they all succeed or all fail as a single unit, and their effects are not visible to other sessions until the transaction is complete, and possibly even later, depending on the isolation level.

For more information, see Section 13.2.Transaction ID

The numerical, unique, sequentially-assigned identifier that each transaction receives when it first causes a database modification. Frequently abbreviated as xid. When stored on disk, xids are only 32-bits wide, so only approximately four billion write transaction IDs can be generated; to permit the system to run for longer than that, epochs are used, also 32 bits wide. When the counter reaches the maximum xid value, it starts over at 3 (values under that are reserved) and the epoch value is incremented by one. In some contexts, the epoch and xid values are considered together as a single 64-bit value.

For more information, see Section 8.19.Transactions per second (TPS)

Average number of transactions that are executed per second, totaled across all sessions active for a measured run. This is used as a measure of the performance characteristics of an instance.Trigger

A function which can be defined to execute whenever a certain operation (INSERT, UPDATE, DELETE, TRUNCATE) is applied to a relation. A trigger executes within the same transaction as the statement which invoked it, and if the function fails, then the invoking statement also fails.

For more information, see CREATE TRIGGER.Tuple

A collection of attributes in a fixed order. That order may be defined by the table (or other relation) where the tuple is contained, in which case the tuple is often called a row. It may also be defined by the structure of a result set, in which case it is sometimes called a record.Unique constraint

A type of constraint defined on a relation which restricts the values allowed in one or a combination of columns so that each value or combination of values can only appear once in the relation — that is, no other row in the relation contains values that are equal to those.

Because null values are not considered equal to each other, multiple rows with null values are allowed to exist without violating the unique constraint.Unlogged

The property of certain relations that the changes to them are not reflected in the WAL. This disables replication and crash recovery for these relations.

The primary use of unlogged tables is for storing transient work data that must be shared across processes.

Temporary tables are always unlogged.Update

An SQL command used to modify rows that may already exist in a specified table. It cannot create or remove rows.

For more information, see UPDATE.User

A role that has the LOGIN privilege.User mapping

The translation of login credentials in the local database to credentials in a remote data system defined by a foreign data wrapper.

For more information, see CREATE USER MAPPING.Vacuum

The process of removing outdated tuple versions from tables or materialized views, and other closely related processing required by PostgreSQL's implementation of MVCC. This can be initiated through the use of the VACUUM command, but can also be handled automatically via autovacuum processes.

For more information, see Section 24.1 .View

A relation that is defined by a SELECT statement, but has no storage of its own. Any time a query references a view, the definition of the view is substituted into the query as if the user had typed it as a subquery instead of the name of the view.

For more information, see CREATE VIEW.Visibility map (fork)

A storage structure that keeps metadata about each data page of a table's main fork. The visibility map entry for each page stores two bits: the first one (all-visible) indicates that all tuples in the page are visible to all transactions. The second one (all-frozen) indicates that all tuples in the page are marked frozen.WAL

See Write-ahead log.WAL archiver (process)

A process that saves copies of WAL files for the purpose of creating backups or keeping replicas current.

For more information, see Section 25.3.WAL file

Also known as WAL segment or WAL segment file. Each of the sequentially-numbered files that provide storage space for WAL. The files are all of the same predefined size and are written in sequential order, interspersing changes as they occur in multiple simultaneous sessions. If the system crashes, the files are read in order, and each of the changes is replayed to restore the system to the state it was in before the crash.

Each WAL file can be released after a checkpoint writes all the changes in it to the corresponding data files. Releasing the file can be done either by deleting it, or by changing its name so that it will be used in the future, which is called recycling.

For more information, see Section 29.5.WAL record

A low-level description of an individual data change. It contains sufficient information for the data change to be re-executed (replayed) in case a system failure causes the change to be lost. WAL records use a non-printable binary format.

For more information, see Section 29.5.WAL segment

See WAL file.WAL writer (process)

A process that writes WAL records from shared memory to WAL files.

For more information, see Section 19.5.Window function (routine)

A type of function used in a query that applies to a partition of the query's result set; the function's result is based on values found in rows of the same partition or frame.

All aggregate functions can be used as window functions, but window functions can also be used to, for example, give ranks to each of the rows in the partition. Also known as analytic functions.

For more information, see Section 3.5.Write-ahead log

The journal that keeps track of the changes in the database cluster as user- and system-invoked operations take place. It comprises many individual WAL records written sequentially to WAL files.

VIII. 附錄

B. 日期時間格式支援

B.1. 日期時間解譯流程

小技巧

B.3. 日期時間設定檔

注意

C. SQL 關鍵字

Table C.1. SQL Key Words

E. 版本資訊

E.1. Release 13.3

E.1.1. Migration to Version 13.3

E.1.2. Changes

B. 日期時間格式支援

B.3. 日期時間設定檔

注意

B.1. 日期時間解譯流程

小技巧

E. 版本資訊

E.1. Release 13.3

E.1.1. Migration to Version 13.3

E.1.2. Changes

C. SQL 關鍵字

Table C.1. SQL Key Words

D. SQL 相容性

Note

D.3. XML Limits and Conformance to SQL/XML

D.3.1. Queries Are Restricted to XPath 1.0

D.3.1.1. Restriction Of XQuery To XPath

D.3.1.2. Restriction Of XPath To 1.0

D.3.1.3. Mappings Between SQL And XML Data Types And Values

D.3.2. Incidental Limits of the Implementation

D.3.2.1. Only BY VALUE Passing Mechanism Is Supported

D.3.2.2. Cannot Pass Named Parameters To Queries

D.3.2.3. No XML(SEQUENCE) Type

F.3. auth_delay

F.3.1. Configuration Parameters

F.3.2. Author

F.4. auto_explain

F.4.1. 組態參數

F.4.2. 範例

F.4.3. 作者

F.2. amcheck

F.2.1. Functions

Tip

F.2.2. Optional heapallindexed Verification

F.2.3. Using amcheck Effectively

F.2.4. Repairing Corruption

F.5. bloom

F.5.1. 參數

F.5.2. Examples

F.5.3. Operator Class Interface

F.5.4. Limitations

F.5.5. Authors

dblink_disconnect

Synopsis

Description

Arguments

Return Value

Examples

dblink

Synopsis

Description

Arguments

Return Value

Notes

Examples

F.10. dblink

dblink_connect

Synopsis

Description

Arguments

Return Value

Notes

Examples

F.6. btree_gin

F.6.1. 使用範例

F.6.2. 作者們

B.2. 日期時間慣用字

F.1. adminpack

Table F.1. adminpack Functions

F.2.2. Optional `heapallindexed` Verification

F.2.3. Using `amcheck` Effectively

Table F.1. `adminpack` Functions