1 of 33

F. 延伸支援模組

This appendix and the next one contain information regarding the modules that can be found in the contrib directory of the PostgreSQL distribution. These include porting tools, analysis utilities, and plug-in features that are not part of the core PostgreSQL system, mainly because they address a limited audience or are too experimental to be part of the main source tree. This does not preclude their usefulness.

This appendix covers extensions and other server plug-in modules found in contrib. Appendix G covers utility programs.

When building from the source distribution, these components are not built automatically, unless you build the "world" target (see Step 2). You can build and install all of them by running:

make
make install

in the contrib directory of a configured source tree; or to build and install just one selected module, do the same in that module's subdirectory. Many of the modules have regression tests, which can be executed by running:

make check

before installation or

make installcheck

once you have a PostgreSQL server running.

If you are using a pre-packaged version of PostgreSQL, these modules are typically made available as a separate subpackage, such as postgresql-contrib.

Many modules supply new user-defined functions, operators, or types. To make use of one of these modules, after you have installed the code you need to register the new SQL objects in the database system. In PostgreSQL 9.1 and later, this is done by executing a CREATE EXTENSION command. In a fresh database, you can simply do

CREATE EXTENSION module_name;

This command must be run by a database superuser. This registers the new SQL objects in the current database only, so you need to run this command in each database that you want the module's facilities to be available in. Alternatively, run it in database template1 so that the extension will be copied into subsequently-created databases by default.

Many modules allow you to install their objects in a schema of your choice. To do that, add SCHEMA schema_name to the CREATE EXTENSION command. By default, the objects will be placed in your current creation target schema, which in turn defaults to public.

If your database was brought forward by dump and reload from a pre-9.1 version of PostgreSQL, and you had been using the pre-9.1 version of the module in it, you should instead do

CREATE EXTENSION module_name FROM unpackaged;

This will update the pre-9.1 objects of the module into a proper extension object. Future updates to the module will be managed by ALTER EXTENSION. For more information about extension updates, see Section 37.17.

Note, however, that some of these modules are not “extensions” in this sense, but are loaded into the server in some other way, for instance by way of shared_preload_libraries. See the documentation of each module for details.

F.4. auto_explain

auto_explain 模組提供了一種自動記錄慢速語句執行計劃的方法，毌須手動執行 EXPLAIN。這對於在大型應用程序中追踪未最佳化的查詢特別有用。

該模組不提供 SQL 可存取的功能。要使用它，只需將其載入到伺服器中即可。您也可以將其載入到單個連線之中：

LOAD 'auto_explain';

（您必須是超級使用者才能這樣做。）更典型的用法是透過在 postgresql.conf 中的 session_preload_libraries 或 shared_preload_libraries 中包含 auto_explain 將其預先載入到部分或全部連線中。然後，無論何時發生，您都可以追踪意外緩慢的查詢。當然，會有一些系統代價。

F.4.1. 組態參數

有幾個組態參數可以控制 auto_explain 的行為。請注意，預設行為是什麼都不做，因此如果需要任何結果，必須至少設定 auto_explain.log_min_duration。

auto_explain.log_min_duration (integer)

auto_explain.log_min_duration 是記錄語句計劃的最小語句執行時間（以毫秒為單位）。將此設定為零會記錄所有計劃。減號（預設值）停用計劃的記錄。例如，如果將其設定為 250ms，則將記錄執行 250ms 或更長時間的所有語句。只有超級使用者才能變更此設定。

auto_explain.log_analyze (boolean)

auto_explain.log_analyze 會在記錄執行計劃時列印 EXPLAIN ANALYZE 輸出，而不僅僅是 EXPLAIN 輸出。預設情況下，此參數處於停用狀態。只有超級使用者才能變更此設定。

注意

啟用此參數後，將對所有執行的語句執行每計劃節點計時，無論它們是否執行足夠長時間以實際記錄。這可能會對效能產生極為不利的影響。關閉 auto_explain.log_timing 可以獲得較少的訊息，從而改善效能成本。

auto_explain.log_buffers (boolean)

auto_explain.log_buffers 控制是否在記錄執行計劃時輸出緩衝區使用情況統計訊息；它相當於 EXPLAIN 的 BUFFERS 選項。除非啟用了 auto_explain.log_analyze，否則此參數無效。預鉆水情況下，此參數處於停用狀態。只有超級使用者才能變更改此設定。

auto_explain.log_timing (boolean)

auto_explain.log_timing 控制在記錄執行計劃時是否輸出每個節點的計時訊息；它相當於 EXPLAIN 的 TIMING 選項。重複讀取系統時鐘的成本會在某些系統上明顯減慢查詢速度，因此當只需要實際資料列計數而非精確時間計時，將此參數設定為關閉可能很有用。除非啟用了 auto_explain.log_analyze，否則此參數無效。預設情況下，此參數處於啟用狀態。只有超級使用者才能變更此設定。

auto_explain.log_triggers (boolean)

auto_explain.log_triggers 會在記錄執行計劃時包含觸發器執行統計訊息。除非啟用了 auto_explain.log_analyze，否則此參數無效。預設情況下，此參數處於停用狀態。只有超級使用者才能變更此設定。

auto_explain.log_verbose (boolean)

auto_explain.log_verbose 控制是否在記錄執行計劃時輸出詳細訊息；它相當於 EXPLAIN 的 VERBOSE 選項。預設情況下，此參數處於停用狀態。只有超級使用者才能變更此設定。

auto_explain.log_format (enum)

auto_explain.log_format 選擇要使用的 EXPLAIN 輸出格式。允許的值為 text、xml、json 和 yaml。預設為 text。只有超級使用者才能變更此設定。

auto_explain.log_nested_statements (boolean)

auto_explain.log_nested_statements 會讓巢狀語句（在函數內執行的語句）記錄下來。關閉時，僅記錄最上層查詢計劃。預設情況下，此參數處於停用狀態。只有超級使用者才能變更此設定。

auto_explain.sample_rate (real)

auto_explain.sample_rate 使 auto_explain 僅解釋每個連線中的一小部分語句。預設值為 1，表示 EXPLAIN 所有查詢。在巢狀語句的情況下，要就全部都要解釋，要就都不解釋。只有超級使用者才能變更此設定。

在一般的用法中，這些參數在 postgresql.conf 中設定，儘管超級使用者可以在他們自己的連線中即時更改它們。典型用法可能是：

# postgresql.conf
session_preload_libraries = 'auto_explain'

auto_explain.log_min_duration = '3s'

F.4.2. 範例

postgres=# LOAD 'auto_explain';
postgres=# SET auto_explain.log_min_duration = 0;
postgres=# SET auto_explain.log_analyze = true;
postgres=# SELECT count(*)
           FROM pg_class, pg_index
           WHERE oid = indrelid AND indisunique;

這可能會產生如下的日誌輸出：

LOG:  duration: 3.651 ms  plan:
  Query Text: SELECT count(*)
              FROM pg_class, pg_index
              WHERE oid = indrelid AND indisunique;
  Aggregate  (cost=16.79..16.80 rows=1 width=0) (actual time=3.626..3.627 rows=1 loops=1)
    ->  Hash Join  (cost=4.17..16.55 rows=92 width=0) (actual time=3.349..3.594 rows=92 loops=1)
          Hash Cond: (pg_class.oid = pg_index.indrelid)
          ->  Seq Scan on pg_class  (cost=0.00..9.55 rows=255 width=4) (actual time=0.016..0.140 rows=255 loops=1)
          ->  Hash  (cost=3.02..3.02 rows=92 width=4) (actual time=3.238..3.238 rows=92 loops=1)
                Buckets: 1024  Batches: 1  Memory Usage: 4kB
                ->  Seq Scan on pg_index  (cost=0.00..3.02 rows=92 width=4) (actual time=0.008..3.187 rows=92 loops=1)
                      Filter: indisunique

F.4.3. 作者

Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp>

F.6. btree_gin

btree_gin provides sample GIN operator classes that implement B-tree equivalent behavior for the data types int2, int4, int8, float4, float8, timestamp with time zone, timestamp without time zone, time with time zone, time without time zone, date, interval, oid, money, "char", varchar, text, bytea, bit, varbit, macaddr, macaddr8, inet, cidr, uuid, name, bool, bpchar, and all enum types.

通常，這些運算子類不會優於等效的標準 B-tree 索引方法，並且它們缺少標準 B-tree 的一個主要功能：強制執行唯一性。但是，它們對於 GIN 測試很有用，並且可以作為開發其他 GIN 運算子類的基礎。同樣地，對於同時測試可索引 GIN 欄位和 B-tree 可索引列的查詢，建立使用這些運算子之一的多欄位 GIN 索引可能比建立兩個必須獨立的索引更有效，以 bitmap ANDing 的方式。

F.6.1. 使用範例

CREATE TABLE test (a int4);
-- create index
CREATE INDEX testidx ON test USING GIN (a);
-- query
SELECT * FROM test WHERE a < 10;

F.6.2. 作者們

Teodor Sigaev (<teodor@stack.net>) and Oleg Bartunov (<oleg@sai.msu.su>). See http://www.sai.msu.su/~megera/oddmuse/index.cgi/Gin for additional information.

F.10. dblink

dblink 模組可以讓你從資料庫連線中再連線到其他 PostgreSQL 資料庫。

另請參閱，它使用更現代且符合標準的基礎架構提供大致相同的功能。

dblink_connect

dblink_connect — opens a persistent connection to a remote database

Synopsis

dblink_connect(text connstr) returns text
dblink_connect(text connname, text connstr) returns text

Description

dblink_connect() establishes a connection to a remote PostgreSQL database. The server and database to be contacted are identified through a standard libpq connection string. Optionally, a name can be assigned to the connection. Multiple named connections can be open at once, but only one unnamed connection is permitted at a time. The connection will persist until closed or until the database session is ended.

The connection string may also be the name of an existing foreign server. It is recommended to use the foreign-data wrapper dblink_fdw when defining the foreign server. See the example below, as well as CREATE SERVER and CREATE USER MAPPING.

Arguments

connname

The name to use for this connection; if omitted, an unnamed connection is opened, replacing any existing unnamed connection.connstr

libpq-style connection info string, for example hostaddr=127.0.0.1 port=5432 dbname=mydb user=postgres password=mypasswd options=-csearch_path=. For details see Section 33.1.1. Alternatively, the name of a foreign server.

Return Value

Returns status, which is always OK (since any error causes the function to throw an error instead of returning).

Notes

If untrusted users have access to a database that has not adopted a secure schema usage pattern, begin each session by removing publicly-writable schemas from search_path. One could, for example, add options=-csearch_path= to connstr. This consideration is not specific to dblink; it applies to every interface for executing arbitrary SQL commands.

Only superusers may use dblink_connect to create non-password-authenticated connections. If non-superusers need this capability, use dblink_connect_u instead.

It is unwise to choose connection names that contain equal signs, as this opens a risk of confusion with connection info strings in other dblink functions.

Examples

SELECT dblink_connect('dbname=postgres options=-csearch_path=');
 dblink_connect
----------------
 OK
(1 row)

SELECT dblink_connect('myconn', 'dbname=postgres options=-csearch_path=');
 dblink_connect
----------------
 OK
(1 row)

-- FOREIGN DATA WRAPPER functionality
-- Note: local connection must require password authentication for this to work properly
--       Otherwise, you will receive the following error from dblink_connect():
--       ----------------------------------------------------------------------
--       ERROR:  password is required
--       DETAIL:  Non-superuser cannot connect if the server does not request a password.
--       HINT:  Target server's authentication method must be changed.

CREATE SERVER fdtest FOREIGN DATA WRAPPER dblink_fdw OPTIONS (hostaddr '127.0.0.1', dbname 'contrib_regression');

CREATE USER regress_dblink_user WITH PASSWORD 'secret';
CREATE USER MAPPING FOR regress_dblink_user SERVER fdtest OPTIONS (user 'regress_dblink_user', password 'secret');
GRANT USAGE ON FOREIGN SERVER fdtest TO regress_dblink_user;
GRANT SELECT ON TABLE foo TO regress_dblink_user;

\set ORIGINAL_USER :USER
\c - regress_dblink_user
SELECT dblink_connect('myconn', 'fdtest');
 dblink_connect 
----------------
 OK
(1 row)

SELECT * FROM dblink('myconn','SELECT * FROM foo') AS t(a int, b text, c text[]);
 a  | b |       c       
----+---+---------------
  0 | a | {a0,b0,c0}
  1 | b | {a1,b1,c1}
  2 | c | {a2,b2,c2}
  3 | d | {a3,b3,c3}
  4 | e | {a4,b4,c4}
  5 | f | {a5,b5,c5}
  6 | g | {a6,b6,c6}
  7 | h | {a7,b7,c7}
  8 | i | {a8,b8,c8}
  9 | j | {a9,b9,c9}
 10 | k | {a10,b10,c10}
(11 rows)

\c - :ORIGINAL_USER
REVOKE USAGE ON FOREIGN SERVER fdtest FROM regress_dblink_user;
REVOKE SELECT ON TABLE foo FROM regress_dblink_user;
DROP USER MAPPING FOR regress_dblink_user SERVER fdtest;
DROP USER regress_dblink_user;
DROP SERVER fdtest;

dblink_connect_u

dblink_connect_u — opens a persistent connection to a remote database, insecurely

Synopsis

dblink_connect_u(text connstr) returns text
dblink_connect_u(text connname, text connstr) returns text

Description

dblink_connect_u() is identical to dblink_connect(), except that it will allow non-superusers to connect using any authentication method.

If the remote server selects an authentication method that does not involve a password, then impersonation and subsequent escalation of privileges can occur, because the session will appear to have originated from the user as which the local PostgreSQL server runs. Also, even if the remote server does demand a password, it is possible for the password to be supplied from the server environment, such as a ~/.pgpass file belonging to the server's user. This opens not only a risk of impersonation, but the possibility of exposing a password to an untrustworthy remote server. Therefore, dblink_connect_u() is initially installed with all privileges revoked from PUBLIC, making it un-callable except by superusers. In some situations it may be appropriate to grant EXECUTE permission for dblink_connect_u() to specific users who are considered trustworthy, but this should be done with care. It is also recommended that any ~/.pgpass file belonging to the server's user not contain any records specifying a wildcard host name.

For further details see dblink_connect().

dblink_disconnect

dblink_disconnect — closes a persistent connection to a remote database

Synopsis

dblink_disconnect() returns text
dblink_disconnect(text connname) returns text

Description

dblink_disconnect() closes a connection previously opened by dblink_connect(). The form with no arguments closes an unnamed connection.

Arguments

connname

The name of a named connection to be closed.

Return Value

Returns status, which is always OK (since any error causes the function to throw an error instead of returning).

Examples

SELECT dblink_disconnect();
 dblink_disconnect
-------------------
 OK
(1 row)

SELECT dblink_disconnect('myconn');
 dblink_disconnect
-------------------
 OK
(1 row)

dblink

dblink — executes a query in a remote database

Synopsis

dblink(text connname, text sql [, bool fail_on_error]) returns setof record
dblink(text connstr, text sql [, bool fail_on_error]) returns setof record
dblink(text sql [, bool fail_on_error]) returns setof record

Description

dblink executes a query (usually a SELECT, but it can be any SQL statement that returns rows) in a remote database.

When two text arguments are given, the first one is first looked up as a persistent connection's name; if found, the command is executed on that connection. If not found, the first argument is treated as a connection info string as for dblink_connect, and the indicated connection is made just for the duration of this command.

Arguments

connname

Name of the connection to use; omit this parameter to use the unnamed connection.connstr

A connection info string, as previously described for dblink_connect.

sql

The SQL query that you wish to execute in the remote database, for example select * from foo.

fail_on_error

If true (the default when omitted) then an error thrown on the remote side of the connection causes an error to also be thrown locally. If false, the remote error is locally reported as a NOTICE, and the function returns no rows.

Return Value

The function returns the row(s) produced by the query. Since dblink can be used with any query, it is declared to return record, rather than specifying any particular set of columns. This means that you must specify the expected set of columns in the calling query — otherwise PostgreSQL would not know what to expect. Here is an example:

SELECT *
    FROM dblink('dbname=mydb options=-csearch_path=',
                'select proname, prosrc from pg_proc')
      AS t1(proname name, prosrc text)
    WHERE proname LIKE 'bytea%';

The “alias” part of the FROM clause must specify the column names and types that the function will return. (Specifying column names in an alias is actually standard SQL syntax, but specifying column types is a PostgreSQL extension.) This allows the system to understand what * should expand to, and what proname in the WHERE clause refers to, in advance of trying to execute the function. At run time, an error will be thrown if the actual query result from the remote database does not have the same number of columns shown in the FROM clause. The column names need not match, however, and dblink does not insist on exact type matches either. It will succeed so long as the returned data strings are valid input for the column type declared in the FROM clause.

Notes

A convenient way to use dblink with predetermined queries is to create a view. This allows the column type information to be buried in the view, instead of having to spell it out in every query. For example,

CREATE VIEW myremote_pg_proc AS
  SELECT *
    FROM dblink('dbname=postgres options=-csearch_path=',
                'select proname, prosrc from pg_proc')
    AS t1(proname name, prosrc text);

SELECT * FROM myremote_pg_proc WHERE proname LIKE 'bytea%';

Examples

SELECT * FROM dblink('dbname=postgres options=-csearch_path=',
                     'select proname, prosrc from pg_proc')
  AS t1(proname name, prosrc text) WHERE proname LIKE 'bytea%';
  proname   |   prosrc
------------+------------
 byteacat   | byteacat
 byteaeq    | byteaeq
 bytealt    | bytealt
 byteale    | byteale
 byteagt    | byteagt
 byteage    | byteage
 byteane    | byteane
 byteacmp   | byteacmp
 bytealike  | bytealike
 byteanlike | byteanlike
 byteain    | byteain
 byteaout   | byteaout
(12 rows)

SELECT dblink_connect('dbname=postgres options=-csearch_path=');
 dblink_connect
----------------
 OK
(1 row)

SELECT * FROM dblink('select proname, prosrc from pg_proc')
  AS t1(proname name, prosrc text) WHERE proname LIKE 'bytea%';
  proname   |   prosrc
------------+------------
 byteacat   | byteacat
 byteaeq    | byteaeq
 bytealt    | bytealt
 byteale    | byteale
 byteagt    | byteagt
 byteage    | byteage
 byteane    | byteane
 byteacmp   | byteacmp
 bytealike  | bytealike
 byteanlike | byteanlike
 byteain    | byteain
 byteaout   | byteaout
(12 rows)

SELECT dblink_connect('myconn', 'dbname=regression options=-csearch_path=');
 dblink_connect
----------------
 OK
(1 row)

SELECT * FROM dblink('myconn', 'select proname, prosrc from pg_proc')
  AS t1(proname name, prosrc text) WHERE proname LIKE 'bytea%';
  proname   |   prosrc
------------+------------
 bytearecv  | bytearecv
 byteasend  | byteasend
 byteale    | byteale
 byteagt    | byteagt
 byteage    | byteage
 byteane    | byteane
 byteacmp   | byteacmp
 bytealike  | bytealike
 byteanlike | byteanlike
 byteacat   | byteacat
 byteaeq    | byteaeq
 bytealt    | bytealt
 byteain    | byteain
 byteaout   | byteaout
(14 rows)

dblink_exec

dblink_exec — executes a command in a remote database

Synopsis

dblink_exec(text connname, text sql [, bool fail_on_error]) returns text
dblink_exec(text connstr, text sql [, bool fail_on_error]) returns text
dblink_exec(text sql [, bool fail_on_error]) returns text

Description

dblink_exec executes a command (that is, any SQL statement that doesn't return rows) in a remote database.

Arguments

connname

Name of the connection to use; omit this parameter to use the unnamed connection.

connstr

A connection info string, as previously described for dblink_connect.

sql

The SQL command that you wish to execute in the remote database, for example insert into foo values(0,'a','{"a0","b0","c0"}').

fail_on_error

Return Value

Returns status, either the command's status string or ERROR.

Examples

SELECT dblink_connect('dbname=dblink_test_standby');
 dblink_connect
----------------
 OK
(1 row)

SELECT dblink_exec('insert into foo values(21,''z'',''{"a0","b0","c0"}'');');
   dblink_exec
-----------------
 INSERT 943366 1
(1 row)

SELECT dblink_connect('myconn', 'dbname=regression');
 dblink_connect
----------------
 OK
(1 row)

SELECT dblink_exec('myconn', 'insert into foo values(21,''z'',''{"a0","b0","c0"}'');');
   dblink_exec
------------------
 INSERT 6432584 1
(1 row)

SELECT dblink_exec('myconn', 'insert into pg_class values (''foo'')',false);
NOTICE:  sql error
DETAIL:  ERROR:  null value in column "relnamespace" violates not-null constraint

 dblink_exec
-------------
 ERROR
(1 row)

dblink_open

dblink_open — opens a cursor in a remote database

Synopsis

dblink_open(text cursorname, text sql [, bool fail_on_error]) returns text
dblink_open(text connname, text cursorname, text sql [, bool fail_on_error]) returns text

Description

dblink_open() opens a cursor in a remote database. The cursor can subsequently be manipulated with dblink_fetch() and dblink_close().

Arguments

connname

Name of the connection to use; omit this parameter to use the unnamed connection.

cursorname

The name to assign to this cursor.

sql

The SELECT statement that you wish to execute in the remote database, for example select * from pg_class.

fail_on_error

Return Value

Returns status, either OK or ERROR.

Notes

Since a cursor can only persist within a transaction, dblink_open starts an explicit transaction block (BEGIN) on the remote side, if the remote side was not already within a transaction. This transaction will be closed again when the matching dblink_close is executed. Note that if you use dblink_exec to change data between dblink_open and dblink_close, and then an error occurs or you use dblink_disconnect before dblink_close, your change will be lost because the transaction will be aborted.

Examples

SELECT dblink_connect('dbname=postgres options=-csearch_path=');
 dblink_connect
----------------
 OK
(1 row)

SELECT dblink_open('foo', 'select proname, prosrc from pg_proc');
 dblink_open
-------------
 OK
(1 row)

dblink_fetch

dblink_fetch — returns rows from an open cursor in a remote database

Synopsis

Description

dblink_fetch fetches rows from a cursor previously established by dblink_open.

Arguments

connname

Name of the connection to use; omit this parameter to use the unnamed connection.

cursorname

The name of the cursor to fetch from.

howmany

The maximum number of rows to retrieve. The next howmany rows are fetched, starting at the current cursor position, moving forward. Once the cursor has reached its end, no more rows are produced.

fail_on_error

Return Value

The function returns the row(s) fetched from the cursor. To use this function, you will need to specify the expected set of columns, as previously discussed for dblink.

Notes

On a mismatch between the number of return columns specified in the FROM clause, and the actual number of columns returned by the remote cursor, an error will be thrown. In this event, the remote cursor is still advanced by as many rows as it would have been if the error had not occurred. The same is true for any other error occurring in the local query after the remote FETCH has been done.

Examples

dblink_close

dblink_close — closes a cursor in a remote database

Synopsis

dblink_close(text cursorname [, bool fail_on_error]) returns text
dblink_close(text connname, text cursorname [, bool fail_on_error]) returns text

Description

dblink_close closes a cursor previously opened with dblink_open.

Arguments

connname

Name of the connection to use; omit this parameter to use the unnamed connection.

cursorname

The name of the cursor to close.

fail_on_error

Return Value

Returns status, either OK or ERROR.

Notes

If dblink_open started an explicit transaction block, and this is the last remaining open cursor in this connection, dblink_close will issue the matching COMMIT.

Examples

SELECT dblink_connect('dbname=postgres options=-csearch_path=');
 dblink_connect
----------------
 OK
(1 row)

SELECT dblink_open('foo', 'select proname, prosrc from pg_proc');
 dblink_open
-------------
 OK
(1 row)

SELECT dblink_close('foo');
 dblink_close
--------------
 OK
(1 row)

dblink_get_connections

dblink_get_connections — returns the names of all open named dblink connections

Synopsis

dblink_get_connections() returns text[]

Description

dblink_get_connections returns an array of the names of all open named dblink connections.

Return Value

Returns a text array of connection names, or NULL if none.

Examples

SELECT dblink_get_connections();

dblink_error_message

dblink_error_message — gets last error message on the named connection

Synopsis

dblink_error_message(text connname) returns text

Description

dblink_error_message fetches the most recent remote error message for a given connection.

Arguments

connname

Name of the connection to use.

Return Value

Returns last error message, or OK if there has been no error in this connection.

Notes

When asynchronous queries are initiated by dblink_send_query, the error message associated with the connection might not get updated until the server's response message is consumed. This typically means that dblink_is_busy or dblink_get_result should be called prior to dblink_error_message, so that any error generated by the asynchronous query will be visible.

Examples

SELECT dblink_error_message('dtest1');

dblink_send_query

dblink_send_query — sends an async query to a remote database

Synopsis

dblink_send_query(text connname, text sql) returns int

Description

dblink_send_query sends a query to be executed asynchronously, that is, without immediately waiting for the result. There must not be an async query already in progress on the connection.

After successfully dispatching an async query, completion status can be checked with dblink_is_busy, and the results are ultimately collected with dblink_get_result. It is also possible to attempt to cancel an active async query using dblink_cancel_query.

Arguments

connname

Name of the connection to use.

sql

The SQL statement that you wish to execute in the remote database, for example select * from pg_class.

Return Value

Returns 1 if the query was successfully dispatched, 0 otherwise.

Examples

SELECT dblink_send_query('dtest1', 'SELECT * FROM foo WHERE f1 < 3');

dblink_is_busy

dblink_get_notify

dblink_get_notify — retrieve async notifications on a connection

Synopsis

Description

dblink_get_notify retrieves notifications on either the unnamed connection, or on a named connection if specified. To receive notifications via dblink, LISTEN must first be issued, using dblink_exec. For details see and .

Arguments

connname

The name of a named connection to get notifications on.

Return Value

Returns setof (notify_name text, be_pid int, extra text), or an empty set if none.

Examples

dblink_get_result

dblink_get_result — gets an async query result

Synopsis

Description

dblink_get_result collects the results of an asynchronous query previously sent with dblink_send_query. If the query is not already completed, dblink_get_result will wait until it is.

Arguments

connname

Name of the connection to use.

fail_on_error

Return Value

For an async query (that is, a SQL statement returning rows), the function returns the row(s) produced by the query. To use this function, you will need to specify the expected set of columns, as previously discussed for dblink.

For an async command (that is, a SQL statement not returning rows), the function returns a single row with a single text column containing the command's status string. It is still necessary to specify that the result will have a single text column in the calling FROM clause.

Notes

This function must be called if dblink_send_query returned 1. It must be called once for each query sent, and one additional time to obtain an empty set result, before the connection can be used again.

When using dblink_send_query and dblink_get_result, dblink fetches the entire remote query result before returning any of it to the local query processor. If the query returns a large number of rows, this can result in transient memory bloat in the local session. It may be better to open such a query as a cursor with dblink_open and then fetch a manageable number of rows at a time. Alternatively, use plain dblink(), which avoids memory bloat by spooling large result sets to disk.

Examples

dblink_cancel_query

dblink_cancel_query — cancels any active query on the named connection

Synopsis

dblink_cancel_query(text connname) returns text

Description

dblink_cancel_query attempts to cancel any query that is in progress on the named connection. Note that this is not certain to succeed (since, for example, the remote query might already have finished). A cancel request simply improves the odds that the query will fail soon. You must still complete the normal query protocol, for example by calling dblink_get_result.

Arguments

connname

Name of the connection to use.

Return Value

Returns OK if the cancel request has been sent, or the text of an error message on failure.

Examples

SELECT dblink_cancel_query('dtest1');

dblink_get_pkey

dblink_get_pkey — returns the positions and field names of a relation's primary key fields

Synopsis

Description

dblink_get_pkey provides information about the primary key of a relation in the local database. This is sometimes useful in generating queries to be sent to remote databases.

Arguments

relname

Name of a local relation, for example foo or myschema.mytab. Include double quotes if the name is mixed-case or contains special characters, for example "FooBar"; without quotes, the string will be folded to lower case.

Return Value

Returns one row for each primary key field, or no rows if the relation has no primary key. The result row type is defined as

The position column simply runs from 1 to N; it is the number of the field within the primary key, not the number within the table's columns.

Examples

dblink_build_sql_insert

dblink_build_sql_insert — builds an INSERT statement using a local tuple, replacing the primary key field values with alternative supplied values

Synopsis

Description

dblink_build_sql_insert can be useful in doing selective replication of a local table to a remote database. It selects a row from the local table based on primary key, and then builds a SQL INSERT command that will duplicate that row, but with the primary key values replaced by the values in the last argument. (To make an exact copy of the row, just specify the same values for the last two arguments.)

Arguments

relname

primary_key_attnums

Attribute numbers (1-based) of the primary key fields, for example 1 2.

num_primary_key_atts

The number of primary key fields.

src_pk_att_vals_array

Values of the primary key fields to be used to look up the local tuple. Each field is represented in text form. An error is thrown if there is no local row with these primary key values.

tgt_pk_att_vals_array

Values of the primary key fields to be placed in the resulting INSERT command. Each field is represented in text form.

Return Value

Returns the requested SQL statement as text.

Notes

As of PostgreSQL 9.0, the attribute numbers in primary_key_attnums are interpreted as logical column numbers, corresponding to the column's position in SELECT * FROM relname. Previous versions interpreted the numbers as physical column positions. There is a difference if any column(s) to the left of the indicated column have been dropped during the lifetime of the table.

Examples

dblink_build_sql_delete

dblink_build_sql_delete — builds a DELETE statement using supplied values for primary key field values

Synopsis

Description

dblink_build_sql_delete can be useful in doing selective replication of a local table to a remote database. It builds a SQL DELETE command that will delete the row with the given primary key values.

Arguments

relname

primary_key_attnums

Attribute numbers (1-based) of the primary key fields, for example 1 2.

num_primary_key_atts

The number of primary key fields.

tgt_pk_att_vals_array

Values of the primary key fields to be used in the resulting DELETE command. Each field is represented in text form.

Return Value

Returns the requested SQL statement as text.

Notes

Examples

dblink_build_sql_update

dblink_build_sql_update — builds an UPDATE statement using a local tuple, replacing the primary key field values with alternative supplied values

Synopsis

dblink_build_sql_update(text relname,
                        int2vector primary_key_attnums,
                        integer num_primary_key_atts,
                        text[] src_pk_att_vals_array,
                        text[] tgt_pk_att_vals_array) returns text

Description

dblink_build_sql_update can be useful in doing selective replication of a local table to a remote database. It selects a row from the local table based on primary key, and then builds a SQL UPDATE command that will duplicate that row, but with the primary key values replaced by the values in the last argument. (To make an exact copy of the row, just specify the same values for the last two arguments.) The UPDATE command always assigns all fields of the row — the main difference between this and dblink_build_sql_insert is that it's assumed that the target row already exists in the remote table.

Arguments

relname

primary_key_attnums

Attribute numbers (1-based) of the primary key fields, for example 1 2.num_primary_key_atts

The number of primary key fields.

src_pk_att_vals_array

Values of the primary key fields to be used to look up the local tuple. Each field is represented in text form. An error is thrown if there is no local row with these primary key values.

tgt_pk_att_vals_array

Values of the primary key fields to be placed in the resulting UPDATE command. Each field is represented in text form.

Return Value

Returns the requested SQL statement as text.

Notes

Examples

SELECT dblink_build_sql_update('foo', '1 2', 2, '{"1", "a"}', '{"1", "b"}');
                   dblink_build_sql_update
-------------------------------------------------------------
 UPDATE foo SET f1='1',f2='b',f3='1' WHERE f1='1' AND f2='b'
(1 row)

F.13. earthdistance

The earthdistance module provides two different approaches to calculating great circle distances on the surface of the Earth. The one described first depends on the cube module (which must be installed before earthdistance can be installed). The second one is based on the built-in point data type, using longitude and latitude for the coordinates.

In this module, the Earth is assumed to be perfectly spherical. (If that's too inaccurate for you, you might want to look at the project.)

F.13.1. Cube-Based Earth Distances

Data is stored in cubes that are points (both corners are the same) using 3 coordinates representing the x, y, and z distance from the center of the Earth. A domain earth over cube is provided, which includes constraint checks that the value meets these restrictions and is reasonably close to the actual surface of the Earth.

The radius of the Earth is obtained from the earth() function. It is given in meters. But by changing this one function you can change the module to use some other units, or to use a different value of the radius that you feel is more appropriate.

This package has applications to astronomical databases as well. Astronomers will probably want to change earth() to return a radius of 180/pi() so that distances are in degrees.

Functions are provided to support input in latitude and longitude (in degrees), to support output of latitude and longitude, to calculate the great circle distance between two points and to easily specify a bounding box usable for index searches.

The provided functions are shown in .

Table F.5. Cube-Based Earthdistance Functions

F.13.2. Point-Based Earth Distances

The second part of the module relies on representing Earth locations as values of type point, in which the first component is taken to represent longitude in degrees, and the second component is taken to represent latitude in degrees. Points are taken as (longitude, latitude) and not vice versa because longitude is closer to the intuitive idea of x-axis and latitude to y-axis.

Table F.6. Point-Based Earthdistance Operators

Note that unlike the cube-based part of the module, units are hardwired here: changing the earth() function will not affect the results of this operator.

One disadvantage of the longitude/latitude representation is that you need to be careful about the edge conditions near the poles and near +/- 180 degrees of longitude. The cube-based representation avoids these discontinuities.

F.14. file_fdw

file_fdw 模組提供了外部資料封裝器 file_fdw，可用於存取伺服器檔案系統中的資料檔案，或在伺服器上執行某個程序並取得其輸出。資料檔案或程序輸出必須採用可由 COPY FROM 讀取的格式；有關詳細資訊，請參閱。目前對資料檔案的存取只有讀取的功能。

A foreign table created using this wrapper can have the following options:

filename

Specifies the file to be read. Must be an absolute path name. Either filename or program must be specified, but not both.

program

Specifies the command to be executed. The standard output of this command will be read as though COPY FROM PROGRAM were used. Either program or filename must be specified, but not both.

format

Specifies the data format, the same as COPY's FORMAT option.header

Specifies whether the data has a header line, the same as COPY's HEADER option.

delimiter

Specifies the data delimiter character, the same as COPY's DELIMITER option.

quote

Specifies the data quote character, the same as COPY's QUOTE option.

escape

Specifies the data escape character, the same as COPY's ESCAPE option.

null

Specifies the data null string, the same as COPY's NULL option.

encoding

Specifies the data encoding, the same as COPY's ENCODING option.

Note that while COPY allows options such as HEADER to be specified without a corresponding value, the foreign table option syntax requires a value to be present in all cases. To activate COPY options typically written without a value, you can pass the value TRUE, since all such options are Booleans.

A column of a foreign table created using this wrapper can have the following options:

force_not_null

This is a Boolean option. If true, it specifies that values of the column should not be matched against the null string (that is, the table-level null option). This has the same effect as listing the column in COPY's FORCE_NOT_NULL option.

force_null

This is a Boolean option. If true, it specifies that values of the column which match the null string are returned as NULL even if the value is quoted. Without this option, only unquoted values matching the null string are returned as NULL. This has the same effect as listing the column in COPY's FORCE_NULL option.

COPY's FORCE_QUOTE option is currently not supported by file_fdw.

These options can only be specified for a foreign table or its columns, not in the options of the file_fdw foreign-data wrapper, nor in the options of a server or user mapping using the wrapper.

Changing table-level options requires being a superuser or having the privileges of the default role pg_read_server_files (to use a filename) or the default role pg_execute_server_program (to use a program), for security reasons: only certain users should be able to control which file is read or which program is run. In principle regular users could be allowed to change the other options, but that's not supported at present.

When specifying the program option, keep in mind that the option string is executed by the shell. If you need to pass any arguments to the command that come from an untrusted source, you must be careful to strip or escape any characters that might have special meaning to the shell. For security reasons, it is best to use a fixed command string, or at least avoid passing any user input in it.

For a foreign table using file_fdw, EXPLAIN shows the name of the file to be read or program to be run. For a file, unless COSTS OFF is specified, the file size (in bytes) is shown as well.

Example F.1. Create a Foreign Table for PostgreSQL CSV Logs

file_fdw 其中一個明顯的用途是使 PostgreSQL 活動日誌形成查詢方便的資料表。為此，首先必須先產生記錄為 CSV 檔案，在這裡我們將其稱為 pglog.csv。首先，安裝 file_fdw 延伸套件：

然後建立一個外部伺服器：

現在您可以建外部資料表了。使用 CREATE FOREIGN TABLE 命令，您將需要定義資料表的欄位、CSV 檔案名稱及其格式：

就是這樣-現在您可以直接查詢日誌了。當然，在正式的運作環境中，您需要定義某種方式來處理日誌檔案的輪轉。

F.29. pg_stat_statements

pg_stat_statements 模組提供了一個追踪在伺服器上執行的 SQL 語句統計資訊方法。

必須透過將 pg_stat_statements 加到 postgresql.conf 中的 shared_preload_libraries 中來載入模組，因為它需要額外的共享記憶體。這意味著需要重新啟動伺服器才能載加或刪除模組。

載入 pg_stat_statements 後，它將追踪伺服器所有資料庫的統計資訊。為了存取和處理這些統計資訊，此模組提供了一個檢視表 pg_stat_statements 以及工具程序函數 pg_stat_statements_reset 和 pg_stat_statements。這些不是全域可用的，但可以使用 CREATE EXTENSION pg_stat_statements 為特定資料庫啟用。

F.29.1. The `pg_stat_statements` View

此延伸功能收集的統計數據可透過名為 pg_stat_statements 的檢視表查詢。對於每個不同的資料庫 ID、使用者 ID和查詢語句 ID（此延伸功能可以追踪的最大不同查詢語句數量），在此檢視表會在一筆資料中呈現。檢視表的欄位在 Table F.21 中說明。

Table F.21. `pg_stat_statements` Columns

因為安全因素，僅超級使用者和 pg_read_all_stats 角色成員被允許查看其他使用者所執行的 SQL 語句和 queryid。但是，如果檢視圖已安裝在他們的資料庫中，則其他使用者也可以查看統計內容。

只要有計劃查詢的查詢（即 SELECT、INSERT、UPDATE 和 DELETE）根據內部雜湊計算具有相同的查詢結構，它們就會組合到單筆 pg_stat_statements 資料中。通常，如果兩個查詢在語義上等效，即兩個查詢在此意義上是相同的，只是出現在查詢中的常數內容的值除外。但是，會嚴格地根據資料庫結構維護指令（即所有其他指令）的查詢字串進行比較。

為了將查詢與其他查詢搭配而忽略了常數內容時，該常數內容會在 pg_stat_statements 顯示中替換為參數符號，例如 $1。查詢語句的其餘部分是第一個查詢的內容，該查詢具有與 pg_stat_statements 項目關聯的特定 queryid 雜湊值。

In some cases, queries with visibly different texts might get merged into a single pg_stat_statements entry. Normally this will happen only for semantically equivalent queries, but there is a small chance of hash collisions causing unrelated queries to be merged into one entry. (This cannot happen for queries belonging to different users or databases, however.)

Since the queryid hash value is computed on the post-parse-analysis representation of the queries, the opposite is also possible: queries with identical texts might appear as separate entries, if they have different meanings as a result of factors such as different search_path settings.

Consumers of pg_stat_statements may wish to use queryid (perhaps in combination with dbid and userid) as a more stable and reliable identifier for each entry than its query text. However, it is important to understand that there are only limited guarantees around the stability of the queryid hash value. Since the identifier is derived from the post-parse-analysis tree, its value is a function of, among other things, the internal object identifiers appearing in this representation. This has some counterintuitive implications. For example, pg_stat_statements will consider two apparently-identical queries to be distinct, if they reference a table that was dropped and recreated between the executions of the two queries. The hashing process is also sensitive to differences in machine architecture and other facets of the platform. Furthermore, it is not safe to assume that queryid will be stable across major versions of PostgreSQL.

As a rule of thumb, queryid values can be assumed to be stable and comparable only so long as the underlying server version and catalog metadata details stay exactly the same. Two servers participating in replication based on physical WAL replay can be expected to have identical queryid values for the same query. However, logical replication schemes do not promise to keep replicas identical in all relevant details, so queryid will not be a useful identifier for accumulating costs across a set of logical replicas. If in doubt, direct testing is recommended.

The parameter symbols used to replace constants in representative query texts start from the next number after the highest $n parameter in the original query text, or $1 if there was none. It's worth noting that in some cases there may be hidden parameter symbols that affect this numbering. For example, PL/pgSQL uses hidden parameter symbols to insert values of function local variables into queries, so that a PL/pgSQL statement like SELECT i + 1 INTO j would have representative text like SELECT i + $2.

The representative query texts are kept in an external disk file, and do not consume shared memory. Therefore, even very lengthy query texts can be stored successfully. However, if many long query texts are accumulated, the external file might grow unmanageably large. As a recovery method if that happens, pg_stat_statements may choose to discard the query texts, whereupon all existing entries in the pg_stat_statements view will show null query fields, though the statistics associated with each queryid are preserved. If this happens, consider reducing pg_stat_statements.max to prevent recurrences.

F.29.2. Functions

pg_stat_statements_reset(userid Oid, dbid Oid, queryid bigint) returns void

pg_stat_statements_reset 會移除到目前為止由 pg_stat_statements 收集的與指定的 userid、dbid 和 queryid 相對應的統計資訊。如果未指定任何參數，則對每個參數使用預設值 0（無效），並且將重置與其他參數相對應的統計資訊。如果未指定任何參數，或者所有指定的參數均為0（無效），則將移除所有統計資訊。預設情況下，此功能只能由超級使用者執行。可以使用 GRANT 將存取權限授予其他人。

pg_stat_statements(showtext boolean) returns setof record

pg_stat_statements 檢視表是根據也稱為 pg_stat_statements 的函數定義的。用戶端可以直接呼叫 pg_stat_statements 函數，並透過指定showtext := false 可以省略查詢字串（即，對應於檢視圖查詢欄位的 OUT 參數將回傳 null）。此功能旨在支持可能希望避免重複獲取長度不確定的查詢字串成本的外部工具。這樣的工具可以代替暫存每個項目本身觀察到的第一個查詢字串，因為 pg_stat_statements 本身就是這樣做的，然後僅根據需要檢索查詢字串。由於伺服器將查詢字串儲存在檔案中，因此此方法可以減少用於重複檢查 pg_stat_statements 資料的實際 I/O 成本。

F.29.3. Configuration Parameters

pg_stat_statements.max (integer)

pg_stat_statements.max 設定此模組所追踪的語句數量上限（即 pg_stat_statements 檢視表中的最大資料列數）。如果觀察到的語句不同，則將丟棄有關執行最少的語句的資訊。預設值為 5,000。只能在伺服器啟動時設定此參數。

pg_stat_statements.track (enum)

pg_stat_statements.track 控制此模組關注哪些語句。指定 top 表示追踪最上層語句（由用戶端直接發出的語句），也可以全部追踪巢狀語句（例如在函數內呼叫的語句），或者不指定以停用語句統計資訊收集。預設值為 top。只有超級使用者可以變更此設定。

pg_stat_statements.track_utility (boolean)

pg_stat_statements.track_utility 控制模組是否追踪管理程序命令。管理程序命令是除 SELECT、INSERT、UPDATE 和 DELETE 之外的所有命令。預設值為 on。只有超級使用者可以變更改此設定。

pg_stat_statements.save (boolean)

pg_stat_statements.save 指定是否在伺服器關閉時保存語句統計資訊。如果關閉，則統計資訊不會在關閉時保存，也不會在伺服器啟動時重新載入。預設值為開。只能在 postgresql.conf 檔案或伺服器命令列中設定此參數。

此模塊需要與 pg_stat_statements.max 成比例的額外共享記憶體。請注意，即使將 pg_stat_statements.track 設定為 none，只要載入模組，就會佔用記憶體空間。

這些參數必須在 postgresql.conf 中設定。典型的用法可能是：

# postgresql.conf
shared_preload_libraries = 'pg_stat_statements'

pg_stat_statements.max = 10000
pg_stat_statements.track = all

F.29.4. Sample Output

bench=# SELECT pg_stat_statements_reset();

$ pgbench -i bench
$ pgbench -c10 -t300 bench

bench=# \x
bench=# SELECT query, calls, total_time, rows, 100.0 * shared_blks_hit /
               nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
          FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5;
-[ RECORD 1 ]--------------------------------------------------------------------
query       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2
calls       | 3000
total_time  | 25565.855387
rows        | 3000
hit_percent | 100.0000000000000000
-[ RECORD 2 ]--------------------------------------------------------------------
query       | UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2
calls       | 3000
total_time  | 20756.669379
rows        | 3000
hit_percent | 100.0000000000000000
-[ RECORD 3 ]--------------------------------------------------------------------
query       | copy pgbench_accounts from stdin
calls       | 1
total_time  | 291.865911
rows        | 100000
hit_percent | 100.0000000000000000
-[ RECORD 4 ]--------------------------------------------------------------------
query       | UPDATE pgbench_accounts SET abalance = abalance + $1 WHERE aid = $2
calls       | 3000
total_time  | 271.232977
rows        | 3000
hit_percent | 98.5723926698852723
-[ RECORD 5 ]--------------------------------------------------------------------
query       | alter table pgbench_accounts add primary key (aid)
calls       | 1
total_time  | 160.588563
rows        | 0
hit_percent | 100.0000000000000000


bench=# SELECT pg_stat_statements_reset(0,0,s.queryid) FROM pg_stat_statements AS s
            WHERE s.query = 'UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2';

bench=# SELECT query, calls, total_time, rows, 100.0 * shared_blks_hit /
               nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
          FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5;
-[ RECORD 1 ]--------------------------------------------------------------------
query       | UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2
calls       | 3000
total_time  | 20756.669379
rows        | 3000
hit_percent | 100.0000000000000000
-[ RECORD 2 ]--------------------------------------------------------------------
query       | copy pgbench_accounts from stdin
calls       | 1
total_time  | 291.865911
rows        | 100000
hit_percent | 100.0000000000000000
-[ RECORD 3 ]--------------------------------------------------------------------
query       | UPDATE pgbench_accounts SET abalance = abalance + $1 WHERE aid = $2
calls       | 3000
total_time  | 271.232977
rows        | 3000
hit_percent | 98.5723926698852723
-[ RECORD 4 ]--------------------------------------------------------------------
query       | alter table pgbench_accounts add primary key (aid)
calls       | 1
total_time  | 160.588563
rows        | 0
hit_percent | 100.0000000000000000
-[ RECORD 5 ]--------------------------------------------------------------------
query       | vacuum analyze pgbench_accounts
calls       | 1
total_time  | 136.448116
rows        | 0
hit_percent | 99.9201915403032721

bench=# SELECT pg_stat_statements_reset(0,0,0);

bench=# SELECT query, calls, total_time, rows, 100.0 * shared_blks_hit /
               nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
          FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5;
-[ RECORD 1 ]---------------------------------------
query       | SELECT pg_stat_statements_reset(0,0,0)
calls       | 1
total_time  | 0.189497
rows        | 1
hit_percent |

F.29.5. Authors

Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp>. Query normalization added by Peter Geoghegan <peter@2ndquadrant.com>.

F.31. pg_trgm

pg_trgm 模組提供了用於根據 trigram 配對決定包含字母及數字文字內容相似性的函數和運算子，以及支援快速搜索相似字串的索引運算子類。

F.31.1. Trigram (or Trigraph) Concepts

trigram 是從字串中提取的一組三個連續字元。我們可以透過計算兩個字串共享的三連詞的數量來衡量它們的相似性。這個簡單的想法對測量許多自然語言中單詞的相似性非常有用。

Note

pg_trgm ignores non-word characters (non-alphanumerics) when extracting trigrams from a string. Each word is considered to have two spaces prefixed and one space suffixed when determining the set of trigrams contained in the string. For example, the set of trigrams in the string “cat” is “ c”, “ ca”, “cat”, and “at ”. The set of trigrams in the string “foo|bar” is “ f”, “ fo”, “foo”, “oo ”, “ b”, “ ba”, “bar”, and “ar ”.

F.31.2. Functions and Operators

The functions provided by the pg_trgm module are shown in Table F.24, the operators in Table F.25.

Table F.24. `pg_trgm` Functions

Consider the following example:

# SELECT word_similarity('word', 'two words');
 word_similarity
-----------------
             0.8
(1 row)

In the first string, the set of trigrams is {" w"," wo","wor","ord","rd "}. In the second string, the ordered set of trigrams is {" t"," tw","two","wo "," w"," wo","wor","ord","rds","ds "}. The most similar extent of an ordered set of trigrams in the second string is {" w"," wo","wor","ord"}, and the similarity is 0.8.

This function returns a value that can be approximately understood as the greatest similarity between the first string and any substring of the second string. However, this function does not add padding to the boundaries of the extent. Thus, the number of additional characters present in the second string is not considered, except for the mismatched word boundaries.

At the same time, strict_word_similarity(text, text) selects an extent of words in the second string. In the example above, strict_word_similarity(text, text) would select the extent of a single word 'words', whose set of trigrams is {" w"," wo","wor","ord","rds","ds "}.

# SELECT strict_word_similarity('word', 'two words'), similarity('word', 'words');
 strict_word_similarity | similarity
------------------------+------------
               0.571429 |   0.571429
(1 row)

Thus, the strict_word_similarity(text, text) function is useful for finding the similarity to whole words, while word_similarity(text, text) is more suitable for finding the similarity for parts of words.

Table F.25. `pg_trgm` Operators

F.31.3. GUC Parameters

pg_trgm.similarity_threshold (real)

Sets the current similarity threshold that is used by the % operator. The threshold must be between 0 and 1 (default is 0.3).pg_trgm.word_similarity_threshold (real)

Sets the current word similarity threshold that is used by the <% and %> operators. The threshold must be between 0 and 1 (default is 0.6).pg_trgm.strict_word_similarity_threshold (real)

Sets the current strict word similarity threshold that is used by the <<% and %>> operators. The threshold must be between 0 and 1 (default is 0.5).

F.31.4. Index Support

The pg_trgm module provides GiST and GIN index operator classes that allow you to create an index over a text column for the purpose of very fast similarity searches. These index types support the above-described similarity operators, and additionally support trigram-based index searches for LIKE, ILIKE, ~ and ~* queries. (These indexes do not support equality nor simple comparison operators, so you may need a regular B-tree index too.)

Example:

CREATE TABLE test_trgm (t text);
CREATE INDEX trgm_idx ON test_trgm USING GIST (t gist_trgm_ops);

CREATE INDEX trgm_idx ON test_trgm USING GIN (t gin_trgm_ops);

At this point, you will have an index on the t column that you can use for similarity searching. A typical query is

SELECT t, similarity(t, 'word') AS sml
  FROM test_trgm
  WHERE t % 'word'
  ORDER BY sml DESC, t;

This will return all values in the text column that are sufficiently similar to word, sorted from best match to worst. The index will be used to make this a fast operation even over very large data sets.

A variant of the above query is

SELECT t, t <-> 'word' AS dist
  FROM test_trgm
  ORDER BY dist LIMIT 10;

This can be implemented quite efficiently by GiST indexes, but not by GIN indexes. It will usually beat the first formulation when only a small number of the closest matches is wanted.

Also you can use an index on the t column for word similarity or strict word similarity. Typical queries are:

SELECT t, word_similarity('word', t) AS sml
  FROM test_trgm
  WHERE 'word' <% t
  ORDER BY sml DESC, t;

and

SELECT t, strict_word_similarity('word', t) AS sml
  FROM test_trgm
  WHERE 'word' <<% t
  ORDER BY sml DESC, t;

This will return all values in the text column for which there is a continuous extent in the corresponding ordered trigram set that is sufficiently similar to the trigram set of word, sorted from best match to worst. The index will be used to make this a fast operation even over very large data sets.

Possible variants of the above queries are:

SELECT t, 'word' <<-> t AS dist
  FROM test_trgm
  ORDER BY dist LIMIT 10;

and

SELECT t, 'word' <<<-> t AS dist
  FROM test_trgm
  ORDER BY dist LIMIT 10;

This can be implemented quite efficiently by GiST indexes, but not by GIN indexes.

Beginning in PostgreSQL 9.1, these index types also support index searches for LIKE and ILIKE, for example

SELECT * FROM test_trgm WHERE t LIKE '%foo%bar';

The index search works by extracting trigrams from the search string and then looking these up in the index. The more trigrams in the search string, the more effective the index search is. Unlike B-tree based searches, the search string need not be left-anchored.

Beginning in PostgreSQL 9.3, these index types also support index searches for regular-expression matches (~ and ~* operators), for example

SELECT * FROM test_trgm WHERE t ~ '(foo|bar)';

The index search works by extracting trigrams from the regular expression and then looking these up in the index. The more trigrams that can be extracted from the regular expression, the more effective the index search is. Unlike B-tree based searches, the search string need not be left-anchored.

For both LIKE and regular-expression searches, keep in mind that a pattern with no extractable trigrams will degenerate to a full-index scan.

The choice between GiST and GIN indexing depends on the relative performance characteristics of GiST and GIN, which are discussed elsewhere.

F.31.5. Text Search Integration

Trigram matching is a very useful tool when used in conjunction with a full text index. In particular it can help to recognize misspelled input words that will not be matched directly by the full text search mechanism.

The first step is to generate an auxiliary table containing all the unique words in the documents:

CREATE TABLE words AS SELECT word FROM
        ts_stat('SELECT to_tsvector(''simple'', bodytext) FROM documents');

where documents is a table that has a text field bodytext that we wish to search. The reason for using the simple configuration with the to_tsvector function, instead of using a language-specific configuration, is that we want a list of the original (unstemmed) words.

Next, create a trigram index on the word column:

CREATE INDEX words_idx ON words USING GIN (word gin_trgm_ops);

Now, a SELECT query similar to the previous example can be used to suggest spellings for misspelled words in user search terms. A useful extra test is to require that the selected words are also of similar length to the misspelled word.

Note

Since the words table has been generated as a separate, static table, it will need to be periodically regenerated so that it remains reasonably up-to-date with the document collection. Keeping it exactly current is usually unnecessary.

F.31.6. References

GiST Development Site http://www.sai.msu.su/~megera/postgres/gist/

Tsearch2 Development Site http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/

F.31.7. Authors

Oleg Bartunov <oleg@sai.msu.su>, Moscow, Moscow University, Russia

Teodor Sigaev <teodor@sigaev.ru>, Moscow, Delta-Soft Ltd.,Russia

Alexander Korotkov <a.korotkov@postgrespro.ru>, Moscow, Postgres Professional, Russia

Documentation: Christopher Kings-Lynne

This module is sponsored by Delta-Soft Ltd., Moscow, Russia.

F.32. pg_visibility

The pg_visibility module provides a means for examining the visibility map (VM) and page-level visibility information of a table. It also provides functions to check the integrity of a visibility map and to force it to be rebuilt.

Three different bits are used to store information about page-level visibility. The all-visible bit in the visibility map indicates that every tuple in the corresponding page of the relation is visible to every current and future transaction. The all-frozen bit in the visibility map indicates that every tuple in the page is frozen; that is, no future vacuum will need to modify the page until such time as a tuple is inserted, updated, deleted, or locked on that page. The page header's PD_ALL_VISIBLE bit has the same meaning as the all-visible bit in the visibility map, but is stored within the data page itself rather than in a separate data structure. These two bits will normally agree, but the page's all-visible bit can sometimes be set while the visibility map bit is clear after a crash recovery. The reported values can also disagree because of a change that occurs after pg_visibility examines the visibility map and before it examines the data page. Any event that causes data corruption can also cause these bits to disagree.

Functions that display information about PD_ALL_VISIBLE bits are much more costly than those that only consult the visibility map, because they must read the relation's data blocks rather than only the (much smaller) visibility map. Functions that check the relation's data blocks are similarly expensive.

F.32.1. Functions

pg_visibility_map(relation regclass, blkno bigint, all_visible OUT boolean, all_frozen OUT boolean) returns record

Returns the all-visible and all-frozen bits in the visibility map for the given block of the given relation.pg_visibility(relation regclass, blkno bigint, all_visible OUT boolean, all_frozen OUT boolean, pd_all_visible OUT boolean) returns record

Returns the all-visible and all-frozen bits in the visibility map for the given block of the given relation, plus the PD_ALL_VISIBLE bit of that block.pg_visibility_map(relation regclass, blkno OUT bigint, all_visible OUT boolean, all_frozen OUT boolean) returns setof record

Returns the all-visible and all-frozen bits in the visibility map for each block of the given relation.pg_visibility(relation regclass, blkno OUT bigint, all_visible OUT boolean, all_frozen OUT boolean, pd_all_visible OUT boolean) returns setof record

Returns the all-visible and all-frozen bits in the visibility map for each block of the given relation, plus the PD_ALL_VISIBLE bit of each block.pg_visibility_map_summary(relation regclass, all_visible OUT bigint, all_frozen OUT bigint) returns record

Returns the number of all-visible pages and the number of all-frozen pages in the relation according to the visibility map.pg_check_frozen(relation regclass, t_ctid OUT tid) returns setof tid

Returns the TIDs of non-frozen tuples stored in pages marked all-frozen in the visibility map. If this function returns a non-empty set of TIDs, the visibility map is corrupt.pg_check_visible(relation regclass, t_ctid OUT tid) returns setof tid

Returns the TIDs of non-all-visible tuples stored in pages marked all-visible in the visibility map. If this function returns a non-empty set of TIDs, the visibility map is corrupt.pg_truncate_visibility_map(relation regclass) returns void

Truncates the visibility map for the given relation. This function is useful if you believe that the visibility map for the relation is corrupt and wish to force rebuilding it. The first VACUUM executed on the given relation after this function is executed will scan every page in the relation and rebuild the visibility map. (Until that is done, queries will treat the visibility map as containing all zeroes.)

By default, these functions are executable only by superusers and members of the pg_stat_scan_tables role, with the exception of pg_truncate_visibility_map(relation regclass) which can only be executed by superusers.

F.32.2. Author

Robert Haas <rhaas@postgresql.org>

F.33. postgres_fdw

The postgres_fdw module provides the foreign-data wrapper postgres_fdw, which can be used to access data stored in external PostgreSQL servers.

The functionality provided by this module overlaps substantially with the functionality of the older module. But postgres_fdw provides more transparent and standards-compliant syntax for accessing remote tables, and can give better performance in many cases.

To prepare for remote access using postgres_fdw:

Install the postgres_fdw extension using .
Create a foreign server object, using , to represent each remote database you want to connect to. Specify connection information, except user and password, as options of the server object.
Create a user mapping, using , for each database user you want to allow to access each foreign server. Specify the remote user name and password to use as user and password options of the user mapping.
Create a foreign table, using or , for each remote table you want to access. The columns of the foreign table must match the referenced remote table. You can, however, use table and/or column names different from the remote table's, if you specify the correct remote names as options of the foreign table object.

Now you need only SELECT from a foreign table to access the data stored in its underlying remote table. You can also modify the remote table using INSERT, UPDATE, or DELETE. (Of course, the remote user you have specified in your user mapping must have privileges to do these things.)

Note that postgres_fdw currently lacks support for INSERT statements with an ON CONFLICT DO UPDATE clause. However, the ON CONFLICT DO NOTHING clause is supported, provided a unique index inference specification is omitted. Note also that postgres_fdw supports row movement invoked by UPDATE statements executed on partitioned tables, but it currently does not handle the case where a remote partition chosen to insert a moved row into is also an UPDATE target partition that will be updated later.

It is generally recommended that the columns of a foreign table be declared with exactly the same data types, and collations if applicable, as the referenced columns of the remote table. Although postgres_fdw is currently rather forgiving about performing data type conversions at need, surprising semantic anomalies may arise when types or collations do not match, due to the remote server interpreting WHERE clauses slightly differently from the local server.

Note that a foreign table can be declared with fewer columns, or with a different column order, than its underlying remote table has. Matching of columns to the remote table is by name, not position.

F.33.1. FDW Options of postgres_fdw

F.33.1.1. Connection Options

A foreign server using the postgres_fdw foreign data wrapper can have the same options that libpq accepts in connection strings, as described in , except that these options are not allowed:

user and password (specify these in a user mapping, instead)
client_encoding (this is automatically set from the local server encoding)
fallback_application_name (always set to postgres_fdw)

Only superusers may connect to foreign servers without password authentication, so always specify the password option for user mappings belonging to non-superusers.

F.33.1.2. Object Name Options

These options can be used to control the names used in SQL statements sent to the remote PostgreSQL server. These options are needed when a foreign table is created with names different from the underlying remote table's names.schema_name

This option, which can be specified for a foreign table, gives the schema name to use for the foreign table on the remote server. If this option is omitted, the name of the foreign table's schema is used.table_name

This option, which can be specified for a foreign table, gives the table name to use for the foreign table on the remote server. If this option is omitted, the foreign table's name is used.column_name

This option, which can be specified for a column of a foreign table, gives the column name to use for the column on the remote server. If this option is omitted, the column's name is used.

F.33.1.3. Cost Estimation Options

postgres_fdw retrieves remote data by executing queries against remote servers, so ideally the estimated cost of scanning a foreign table should be whatever it costs to be done on the remote server, plus some overhead for communication. The most reliable way to get such an estimate is to ask the remote server and then add something for overhead — but for simple queries, it may not be worth the cost of an additional remote query to get a cost estimate. So postgres_fdw provides the following options to control how cost estimation is done:use_remote_estimate

This option, which can be specified for a foreign table or a foreign server, controls whether postgres_fdw issues remote EXPLAIN commands to obtain cost estimates. A setting for a foreign table overrides any setting for its server, but only for that table. The default is false.fdw_startup_cost

This option, which can be specified for a foreign server, is a numeric value that is added to the estimated startup cost of any foreign-table scan on that server. This represents the additional overhead of establishing a connection, parsing and planning the query on the remote side, etc. The default value is 100.fdw_tuple_cost

This option, which can be specified for a foreign server, is a numeric value that is used as extra cost per-tuple for foreign-table scans on that server. This represents the additional overhead of data transfer between servers. You might increase or decrease this number to reflect higher or lower network delay to the remote server. The default value is 0.01.

F.33.1.4. Remote Execution Options

By default, only WHERE clauses using built-in operators and functions will be considered for execution on the remote server. Clauses involving non-built-in functions are checked locally after rows are fetched. If such functions are available on the remote server and can be relied on to produce the same results as they do locally, performance can be improved by sending such WHERE clauses for remote execution. This behavior can be controlled using the following option:extensions

This option is a comma-separated list of names of PostgreSQL extensions that are installed, in compatible versions, on both the local and remote servers. Functions and operators that are immutable and belong to a listed extension will be considered shippable to the remote server. This option can only be specified for foreign servers, not per-table.

When using the extensions option, it is the user's responsibility that the listed extensions exist and behave identically on both the local and remote servers. Otherwise, remote queries may fail or behave unexpectedly.fetch_size

This option specifies the number of rows postgres_fdw should get in each fetch operation. It can be specified for a foreign table or a foreign server. The option specified on a table overrides an option specified for the server. The default is 100.

F.33.1.5. Updatability Options

By default all foreign tables using postgres_fdw are assumed to be updatable. This may be overridden using the following option:updatable

This option controls whether postgres_fdw allows foreign tables to be modified using INSERT, UPDATE and DELETE commands. It can be specified for a foreign table or a foreign server. A table-level option overrides a server-level option. The default is true.

Of course, if the remote table is not in fact updatable, an error would occur anyway. Use of this option primarily allows the error to be thrown locally without querying the remote server. Note however that the information_schema views will report a postgres_fdw foreign table to be updatable (or not) according to the setting of this option, without any check of the remote server.

F.33.1.6. Importing Options

Importing behavior can be customized with the following options (given in the IMPORT FOREIGN SCHEMA command):import_collate

This option controls whether column COLLATE options are included in the definitions of foreign tables imported from a foreign server. The default is true. You might need to turn this off if the remote server has a different set of collation names than the local server does, which is likely to be the case if it's running on a different operating system.import_default

This option controls whether column DEFAULT expressions are included in the definitions of foreign tables imported from a foreign server. The default is false. If you enable this option, be wary of defaults that might get computed differently on the local server than they would be on the remote server; nextval() is a common source of problems. The IMPORT will fail altogether if an imported default expression uses a function or operator that does not exist locally.import_not_null

This option controls whether column NOT NULL constraints are included in the definitions of foreign tables imported from a foreign server. The default is true.

Tables or foreign tables which are partitions of some other table are automatically excluded. Partitioned tables are imported, unless they are a partition of some other table. Since all data can be accessed through the partitioned table which is the root of the partitioning hierarchy, this approach should allow access to all the data without creating extra objects.

F.33.2. Connection Management

postgres_fdw establishes a connection to a foreign server during the first query that uses a foreign table associated with the foreign server. This connection is kept and re-used for subsequent queries in the same session. However, if multiple user identities (user mappings) are used to access the foreign server, a connection is established for each user mapping.

F.33.3. Transaction Management

During a query that references any remote tables on a foreign server, postgres_fdw opens a transaction on the remote server if one is not already open corresponding to the current local transaction. The remote transaction is committed or aborted when the local transaction commits or aborts. Savepoints are similarly managed by creating corresponding remote savepoints.

The remote transaction uses SERIALIZABLE isolation level when the local transaction has SERIALIZABLE isolation level; otherwise it uses REPEATABLE READ isolation level. This choice ensures that if a query performs multiple table scans on the remote server, it will get snapshot-consistent results for all the scans. A consequence is that successive queries within a single transaction will see the same data from the remote server, even if concurrent updates are occurring on the remote server due to other activities. That behavior would be expected anyway if the local transaction uses SERIALIZABLE or REPEATABLE READ isolation level, but it might be surprising for a READ COMMITTED local transaction. A future PostgreSQL release might modify these rules.

Note that it is currently not supported by postgres_fdw to prepare the remote transaction for two-phase commit.

F.33.4. Remote Query Optimization

postgres_fdw attempts to optimize remote queries to reduce the amount of data transferred from foreign servers. This is done by sending query WHERE clauses to the remote server for execution, and by not retrieving table columns that are not needed for the current query. To reduce the risk of misexecution of queries, WHERE clauses are not sent to the remote server unless they use only data types, operators, and functions that are built-in or belong to an extension that's listed in the foreign server's extensions option. Operators and functions in such clauses must be IMMUTABLE as well. For an UPDATE or DELETE query, postgres_fdw attempts to optimize the query execution by sending the whole query to the remote server if there are no query WHERE clauses that cannot be sent to the remote server, no local joins for the query, no row-level local BEFORE or AFTER triggers or stored generated columns on the target table, and no CHECK OPTION constraints from parent views. In UPDATE, expressions to assign to target columns must use only built-in data types, IMMUTABLE operators, or IMMUTABLE functions, to reduce the risk of misexecution of the query.

When postgres_fdw encounters a join between foreign tables on the same foreign server, it sends the entire join to the foreign server, unless for some reason it believes that it will be more efficient to fetch rows from each table individually, or unless the table references involved are subject to different user mappings. While sending the JOIN clauses, it takes the same precautions as mentioned above for the WHERE clauses.

The query that is actually sent to the remote server for execution can be examined using EXPLAIN VERBOSE.

F.33.5. Remote Query Execution Environment

postgres_fdw likewise establishes remote session settings for various parameters:

These are less likely to be problematic than search_path, but can be handled with function SET options if the need arises.

It is not recommended that you override this behavior by changing the session-level settings of these parameters; that is likely to cause postgres_fdw to malfunction.

F.33.6. Cross-Version Compatibility

postgres_fdw can be used with remote servers dating back to PostgreSQL 8.3. Read-only capability is available back to 8.1. A limitation however is that postgres_fdw generally assumes that immutable built-in functions and operators are safe to send to the remote server for execution, if they appear in a WHERE clause for a foreign table. Thus, a built-in function that was added since the remote server's release might be sent to it for execution, resulting in “function does not exist” or a similar error. This type of failure can be worked around by rewriting the query, for example by embedding the foreign table reference in a sub-SELECT with OFFSET 0 as an optimization fence, and placing the problematic function or operator outside the sub-SELECT.

F.33.7. Examples

Here is an example of creating a foreign table with postgres_fdw. First install the extension:

F.33.8. Author

F.35. sepgsql

sepgsql is a loadable module that supports label-based mandatory access control (MAC) based on SELinux security policy.

Warning

The current implementation has significant limitations, and does not enforce mandatory access control for all actions. See .

F.35.1. Overview

This module integrates with SELinux to provide an additional layer of security checking above and beyond what is normally provided by PostgreSQL. From the perspective of SELinux, this module allows PostgreSQL to function as a user-space object manager. Each table or function access initiated by a DML query will be checked against the system security policy. This check is in addition to the usual SQL permissions checking performed by PostgreSQL.

SELinux access control decisions are made using security labels, which are represented by strings such as system_u:object_r:sepgsql_table_t:s0. Each access control decision involves two labels: the label of the subject attempting to perform the action, and the label of the object on which the operation is to be performed. Since these labels can be applied to any sort of object, access control decisions for objects stored within the database can be (and, with this module, are) subjected to the same general criteria used for objects of any other type, such as files. This design is intended to allow a centralized security policy to protect information assets independent of the particulars of how those assets are stored.

The statement allows assignment of a security label to a database object.

F.35.2. Installation

sepgsql can only be used on Linux 2.6.28 or higher with SELinux enabled. It is not available on any other platform. You will also need libselinux 2.1.10 or higher and selinux-policy 3.9.13 or higher (although some distributions may backport the necessary rules into older policy versions).

The sestatus command allows you to check the status of SELinux. A typical display is:

If SELinux is disabled or not installed, you must set that product up first before installing this module.

To build this module, include the option --with-selinux in your PostgreSQL configure command. Be sure that the libselinux-devel RPM is installed at build time.

Here is an example showing how to initialize a fresh database cluster with sepgsql functions and security labels installed. Adjust the paths shown as appropriate for your installation:

Please note that you may see some or all of the following notifications depending on the particular versions you have of libselinux and selinux-policy:

These messages are harmless and should be ignored.

If the installation process completes without error, you can now start the server normally.

F.35.3. Regression Tests

Due to the nature of SELinux, running the regression tests for sepgsql requires several extra configuration steps, some of which must be done as root. The regression tests will not be run by an ordinary make check or make installcheck command; you must set up the configuration and then invoke the test script manually. The tests must be run in the contrib/sepgsql directory of a configured PostgreSQL build tree. Although they require a build tree, the tests are designed to be executed against an installed server, that is they are comparable to make installcheck not make check.

Second, build and install the policy package for the regression test. The sepgsql-regtest policy is a special purpose policy package which provides a set of rules to be allowed during the regression tests. It should be built from the policy source file sepgsql-regtest.te, which is done using make with a Makefile supplied by SELinux. You will need to locate the appropriate Makefile on your system; the path shown below is only an example. Once built, install this policy package using the semodule command, which loads supplied policy packages into the kernel. If the package is correctly installed, semodule -l should list sepgsql-regtest as an available policy package:

Third, turn on sepgsql_regression_test_mode. For security reasons, the rules in sepgsql-regtest are not enabled by default; the sepgsql_regression_test_mode parameter enables the rules needed to launch the regression tests. It can be turned on using the setsebool command:

Fourth, verify your shell is operating in the unconfined_t domain:

Finally, run the regression test script:

This script will attempt to verify that you have done all the configuration steps correctly, and then it will run the regression tests for the sepgsql module.

After completing the tests, it's recommended you disable the sepgsql_regression_test_mode parameter:

You might prefer to remove the sepgsql-regtest policy entirely:

F.35.4. GUC Parameters

sepgsql.permissive (boolean)

This parameter enables sepgsql to function in permissive mode, regardless of the system setting. The default is off. This parameter can only be set in the postgresql.conf file or on the server command line.

When this parameter is on, sepgsql functions in permissive mode, even if SELinux in general is working in enforcing mode. This parameter is primarily useful for testing purposes.sepgsql.debug_audit (boolean)

This parameter enables the printing of audit messages regardless of the system policy settings. The default is off, which means that messages will be printed according to the system settings.

The security policy of SELinux also has rules to control whether or not particular accesses are logged. By default, access violations are logged, but allowed accesses are not.

This parameter forces all possible logging to be turned on, regardless of the system policy.

F.35.5. Features

F.35.5.1. Controlled Object Classes

The security model of SELinux describes all the access control rules as relationships between a subject entity (typically, a client of the database) and an object entity (such as a database object), each of which is identified by a security label. If access to an unlabeled object is attempted, the object is treated as if it were assigned the label unlabeled_t.

Currently, sepgsql allows security labels to be assigned to schemas, tables, columns, sequences, views, and functions. When sepgsql is in use, security labels are automatically assigned to supported database objects at creation time. This label is called a default security label, and is decided according to the system security policy, which takes as input the creator's label, the label assigned to the new object's parent object and optionally name of the constructed object.

A new database object basically inherits the security label of the parent object, except when the security policy has special rules known as type-transition rules, in which case a different label may be applied. For schemas, the parent object is the current database; for tables, sequences, views, and functions, it is the containing schema; for columns, it is the containing table.

F.35.5.2. DML Permissions

For tables, db_table:select, db_table:insert, db_table:update or db_table:delete are checked for all the referenced target tables depending on the kind of statement; in addition, db_table:select is also checked for all the tables that contain columns referenced in the WHERE or RETURNING clause, as a data source for UPDATE, and so on.

Column-level permissions will also be checked for each referenced column. db_column:select is checked on not only the columns being read using SELECT, but those being referenced in other DML statements; db_column:update or db_column:insert will also be checked for columns being modified by UPDATE or INSERT.

For example, consider:

Here, db_column:update will be checked for t1.x, since it is being updated, db_column:{select update} will be checked for t1.y, since it is both updated and referenced, and db_column:select will be checked for t1.z, since it is only referenced. db_table:{select update} will also be checked at the table level.

For sequences, db_sequence:get_value is checked when we reference a sequence object using SELECT; however, note that we do not currently check permissions on execution of corresponding functions such as lastval().

For views, db_view:expand will be checked, then any other required permissions will be checked on the objects being expanded from the view, individually.

For functions, db_procedure:{execute} will be checked when user tries to execute a function as a part of query, or using fast-path invocation. If this function is a trusted procedure, it also checks db_procedure:{entrypoint} permission to check whether it can perform as entry point of trusted procedure.

In order to access any schema object, db_schema:search permission is required on the containing schema. When an object is referenced without schema qualification, schemas on which this permission is not present will not be searched (just as if the user did not have USAGE privilege on the schema). If an explicit schema qualification is present, an error will occur if the user does not have the requisite permission on the named schema.

The client must be allowed to access all referenced tables and columns, even if they originated from views which were then expanded, so that we apply consistent access control rules independent of the manner in which the table contents are referenced.

The default database privilege system allows database superusers to modify system catalogs using DML commands, and reference or modify toast tables. These operations are prohibited when sepgsql is enabled.

F.35.5.3. DDL Permissions

SELinux defines several permissions to control common operations for each object type; such as creation, alter, drop and relabel of security label. In addition, several object types have special permissions to control their characteristic operations; such as addition or deletion of name entries within a particular schema.

Creating a new database object requires create permission. SELinux will grant or deny this permission based on the client's security label and the proposed security label for the new object. In some cases, additional privileges are required:

Creating a schema object additionally requires add_name permission on the parent schema.
Creating a table additionally requires permission to create each individual table column, just as if each table column were a separate top-level object.
Creating a function marked as LEAKPROOF additionally requires install permission. (This permission is also checked when LEAKPROOF is set for an existing function.)

When DROP command is executed, drop will be checked on the object being removed. Permissions will be also checked for objects dropped indirectly via CASCADE. Deletion of objects contained within a particular schema (tables, views, sequences and procedures) additionally requires remove_name on the schema.

When ALTER command is executed, setattr will be checked on the object being modified for each object types, except for subsidiary objects such as the indexes or triggers of a table, where permissions are instead checked on the parent object. In some cases, additional permissions are required:

Moving an object to a new schema additionally requires remove_name permission on the old schema and add_name permission on the new one.
Setting the LEAKPROOF attribute on a function requires install permission.

F.35.5.4. Trusted Procedures

Trusted procedures are similar to security definer functions or setuid commands. SELinux provides a feature to allow trusted code to run using a security label different from that of the client, generally for the purpose of providing highly controlled access to sensitive data (e.g. rows might be omitted, or the precision of stored values might be reduced). Whether or not a function acts as a trusted procedure is controlled by its security label and the operating system security policy. For example:

The above operations should be performed by an administrative user.

In this case, a regular user cannot reference customer.credit directly, but a trusted procedure show_credit allows the user to print the credit card numbers of customers with some of the digits masked out.

F.35.5.5. Dynamic Domain Transitions

It is possible to use SELinux's dynamic domain transition feature to switch the security label of the client process, the client domain, to a new context, if that is allowed by the security policy. The client domain needs the setcurrent permission and also dyntransition from the old to the new domain.

Dynamic domain transitions should be considered carefully, because they allow users to switch their label, and therefore their privileges, at their option, rather than (as in the case of a trusted procedure) as mandated by the system. Thus, the dyntransition permission is only considered safe when used to switch to a domain with a smaller set of privileges than the original one. For example:

In this example above we were allowed to switch from the larger MCS range c1.c1023 to the smaller range c1.c4, but switching back was denied.

A combination of dynamic domain transition and trusted procedure enables an interesting use case that fits the typical process life-cycle of connection pooling software. Even if your connection pooling software is not allowed to run most of SQL commands, you can allow it to switch the security label of the client using the sepgsql_setcon() function from within a trusted procedure; that should take some credential to authorize the request to switch the client label. After that, this session will have the privileges of the target user, rather than the connection pooler. The connection pooler can later revert the security label change by again using sepgsql_setcon() with NULL argument, again invoked from within a trusted procedure with appropriate permissions checks. The point here is that only the trusted procedure actually has permission to change the effective security label, and only does so when given proper credentials. Of course, for secure operation, the credential store (table, procedure definition, or whatever) must be protected from unauthorized access.

F.35.5.6. Miscellaneous

F.35.6. Sepgsql Functions

Table F.29. Sepgsql Functions

F.35.7. Limitations

Data Definition Language (DDL) Permissions

Due to implementation restrictions, some DDL operations do not check permissions.Data Control Language (DCL) Permissions

Due to implementation restrictions, DCL operations do not check permissions.Row-level access control

PostgreSQL supports row-level access, but sepgsql does not.Covert channels

sepgsql does not try to hide the existence of a certain object, even if the user is not allowed to reference it. For example, we can infer the existence of an invisible object as a result of primary key conflicts, foreign key violations, and so on, even if we cannot obtain the contents of the object. The existence of a top secret table cannot be hidden; we only hope to conceal its contents.

F.35.8. External Resources

This document answers frequently asked questions about SELinux. It focuses primarily on Fedora, but is not limited to Fedora.

F.35.9. Author

F.41. tsm_system_rows

tsm_system_rows 模組提供資料表抽樣方法 SYSTEM_ROWS，此方法可在 SELECT 指令的 TABLESAMPLE 子句中使用。

此資料表抽樣方法接受整數的參數，該參數是要讀取的最大資料筆數。除非資料表沒有足夠的資料，結果樣本將恰好包含那麼多筆資料；否則在這種情況下，將回傳整個資料表。

像內建的 SYSTEM 抽樣方法一樣，SYSTEM_ROWS 執行區塊策略抽樣，因此抽樣並不是完全隨機的，但可能會有些群聚的效應，尤其是在僅要求少量資料的情況下。

SYSTEM_ROWS 不支援 REPEATABLE 子句。

F.41.1. 範例

使用 SYSTEM_ROWS 選擇資料表樣本的範例。首先要安裝延伸功能：

CREATE EXTENSION tsm_system_rows;

然後，您可以在 SELECT 指令中使用它，例如：

SELECT * FROM my_table TABLESAMPLE SYSTEM_ROWS(100);

此命令將從資料表 my_table 回傳 100 筆資料的樣本（除非該資料表沒有 100 筆資料，在這種情況下將回傳其所有資料）。

F.38. tablefunc

tablefunc 模組內含了回傳資料表（即多筆資料列）的各種函數。這些函數本身很有用，也可以用作設計回傳多筆資料列的 C 函數的範例。

F.38.1. Functions Provided

Table F.30 列出了 tablefunc 模組所提供的函數。

Table F.30. `tablefunc` Functions

F.38.1.1. Normal_rand

normal_rand(int numvals, float8 mean, float8 stddev) returns setof float8

normal_rand produces a set of normally distributed random values (Gaussian distribution).

numvals is the number of values to be returned from the function. mean is the mean of the normal distribution of values and stddev is the standard deviation of the normal distribution of values.

For example, this call requests 1000 values with a mean of 5 and a standard deviation of 3:

test=# SELECT * FROM normal_rand(1000, 5, 3);
     normal_rand
----------------------
     1.56556322244898
     9.10040991424657
     5.36957140345079
   -0.369151492880995
    0.283600703686639
       .
       .
       .
     4.82992125404908
     9.71308014517282
     2.49639286969028
(1000 rows)

F.38.1.2. Crosstab(Text)

crosstab(text sql)
crosstab(text sql, int N)

The crosstab function is used to produce “pivot” displays, wherein data is listed across the page rather than down. For example, we might have data like

row1    val11
row1    val12
row1    val13
...
row2    val21
row2    val22
row2    val23
...

which we wish to display like

row1    val11   val12   val13   ...
row2    val21   val22   val23   ...
...

The crosstab function takes a text parameter that is a SQL query producing raw data formatted in the first way, and produces a table formatted in the second way.

The sql parameter is a SQL statement that produces the source set of data. This statement must return one row_name column, one category column, and one value column. N is an obsolete parameter, ignored if supplied (formerly this had to match the number of output value columns, but now that is determined by the calling query).

For example, the provided query might produce a set something like:

 row_name    cat    value
----------+-------+-------
  row1      cat1    val1
  row1      cat2    val2
  row1      cat3    val3
  row1      cat4    val4
  row2      cat1    val5
  row2      cat2    val6
  row2      cat3    val7
  row2      cat4    val8

The crosstab function is declared to return setof record, so the actual names and types of the output columns must be defined in the FROM clause of the calling SELECT statement, for example:

SELECT * FROM crosstab('...') AS ct(row_name text, category_1 text, category_2 text);

This example produces a set something like:

           <== value  columns  ==>
 row_name   category_1   category_2
----------+------------+------------
  row1        val1         val2
  row2        val5         val6

The FROM clause must define the output as one row_name column (of the same data type as the first result column of the SQL query) followed by N value columns (all of the same data type as the third result column of the SQL query). You can set up as many output value columns as you wish. The names of the output columns are up to you.

The crosstab function produces one output row for each consecutive group of input rows with the same row_name value. It fills the output value columns, left to right, with the value fields from these rows. If there are fewer rows in a group than there are output value columns, the extra output columns are filled with nulls; if there are more rows, the extra input rows are skipped.

In practice the SQL query should always specify ORDER BY 1,2 to ensure that the input rows are properly ordered, that is, values with the same row_name are brought together and correctly ordered within the row. Notice that crosstab itself does not pay any attention to the second column of the query result; it's just there to be ordered by, to control the order in which the third-column values appear across the page.

Here is a complete example:

CREATE TABLE ct(id SERIAL, rowid TEXT, attribute TEXT, value TEXT);
INSERT INTO ct(rowid, attribute, value) VALUES('test1','att1','val1');
INSERT INTO ct(rowid, attribute, value) VALUES('test1','att2','val2');
INSERT INTO ct(rowid, attribute, value) VALUES('test1','att3','val3');
INSERT INTO ct(rowid, attribute, value) VALUES('test1','att4','val4');
INSERT INTO ct(rowid, attribute, value) VALUES('test2','att1','val5');
INSERT INTO ct(rowid, attribute, value) VALUES('test2','att2','val6');
INSERT INTO ct(rowid, attribute, value) VALUES('test2','att3','val7');
INSERT INTO ct(rowid, attribute, value) VALUES('test2','att4','val8');

SELECT *
FROM crosstab(
  'select rowid, attribute, value
   from ct
   where attribute = ''att2'' or attribute = ''att3''
   order by 1,2')
AS ct(row_name text, category_1 text, category_2 text, category_3 text);

 row_name | category_1 | category_2 | category_3
----------+------------+------------+------------
 test1    | val2       | val3       |
 test2    | val6       | val7       |
(2 rows)

You can avoid always having to write out a FROM clause to define the output columns, by setting up a custom crosstab function that has the desired output row type wired into its definition. This is described in the next section. Another possibility is to embed the required FROM clause in a view definition.

另請參閱 psql 中的 \crosstabview 指令，該指令提供的功能類似於 crosstab()。

F.38.1.3. CrosstabN(Text)

crosstabN(text sql)

The crosstabN functions are examples of how to set up custom wrappers for the general crosstab function, so that you need not write out column names and types in the calling SELECT query. The tablefunc module includes crosstab2, crosstab3, and crosstab4, whose output row types are defined as

CREATE TYPE tablefunc_crosstab_N AS (
    row_name TEXT,
    category_1 TEXT,
    category_2 TEXT,
        .
        .
        .
    category_N TEXT
);

Thus, these functions can be used directly when the input query produces row_name and value columns of type text, and you want 2, 3, or 4 output values columns. In all other ways they behave exactly as described above for the general crosstab function.

For instance, the example given in the previous section would also work as

SELECT *
FROM crosstab3(
  'select rowid, attribute, value
   from ct
   where attribute = ''att2'' or attribute = ''att3''
   order by 1,2');

These functions are provided mostly for illustration purposes. You can create your own return types and functions based on the underlying crosstab() function. There are two ways to do it:

Create a composite type describing the desired output columns, similar to the examples in contrib/tablefunc/tablefunc--1.0.sql. Then define a unique function name accepting one text parameter and returning setof your_type_name, but linking to the same underlying crosstab C function. For example, if your source data produces row names that are text, and values that are float8, and you want 5 value columns:

CREATE TYPE my_crosstab_float8_5_cols AS (
    my_row_name text,
    my_category_1 float8,
    my_category_2 float8,
    my_category_3 float8,
    my_category_4 float8,
    my_category_5 float8
);

CREATE OR REPLACE FUNCTION crosstab_float8_5_cols(text)
    RETURNS setof my_crosstab_float8_5_cols
    AS '$libdir/tablefunc','crosstab' LANGUAGE C STABLE STRICT;

Use OUT parameters to define the return type implicitly. The same example could also be done this way:

CREATE OR REPLACE FUNCTION crosstab_float8_5_cols(
    IN text,
    OUT my_row_name text,
    OUT my_category_1 float8,
    OUT my_category_2 float8,
    OUT my_category_3 float8,
    OUT my_category_4 float8,
    OUT my_category_5 float8)
  RETURNS setof record
  AS '$libdir/tablefunc','crosstab' LANGUAGE C STABLE STRICT;

F.38.1.4. Crosstab(Text, Text)

crosstab(text source_sql, text category_sql)

The main limitation of the single-parameter form of crosstab is that it treats all values in a group alike, inserting each value into the first available column. If you want the value columns to correspond to specific categories of data, and some groups might not have data for some of the categories, that doesn't work well. The two-parameter form of crosstab handles this case by providing an explicit list of the categories corresponding to the output columns.

source_sql is a SQL statement that produces the source set of data. This statement must return one row_name column, one category column, and one value column. It may also have one or more “extra” columns. The row_name column must be first. The category and value columns must be the last two columns, in that order. Any columns between row_name and category are treated as “extra”. The “extra” columns are expected to be the same for all rows with the same row_name value.

For example, source_sql might produce a set something like:

SELECT row_name, extra_col, cat, value FROM foo ORDER BY 1;

 row_name    extra_col   cat    value
----------+------------+-----+---------
  row1         extra1    cat1    val1
  row1         extra1    cat2    val2
  row1         extra1    cat4    val4
  row2         extra2    cat1    val5
  row2         extra2    cat2    val6
  row2         extra2    cat3    val7
  row2         extra2    cat4    val8

category_sql is a SQL statement that produces the set of categories. This statement must return only one column. It must produce at least one row, or an error will be generated. Also, it must not produce duplicate values, or an error will be generated. category_sql might be something like:

SELECT DISTINCT cat FROM foo ORDER BY 1;
    cat
  -------
    cat1
    cat2
    cat3
    cat4

The crosstab function is declared to return setof record, so the actual names and types of the output columns must be defined in the FROM clause of the calling SELECT statement, for example:

SELECT * FROM crosstab('...', '...')
    AS ct(row_name text, extra text, cat1 text, cat2 text, cat3 text, cat4 text);

This will produce a result something like:

                  <==  value  columns   ==>
row_name   extra   cat1   cat2   cat3   cat4
---------+-------+------+------+------+------
  row1     extra1  val1   val2          val4
  row2     extra2  val5   val6   val7   val8

The FROM clause must define the proper number of output columns of the proper data types. If there are N columns in the source_sql query's result, the first N-2 of them must match up with the first N-2 output columns. The remaining output columns must have the type of the last column of the source_sql query's result, and there must be exactly as many of them as there are rows in the category_sql query's result.

The crosstab function produces one output row for each consecutive group of input rows with the same row_name value. The output row_name column, plus any “extra” columns, are copied from the first row of the group. The output value columns are filled with the value fields from rows having matching category values. If a row's category does not match any output of the category_sql query, its value is ignored. Output columns whose matching category is not present in any input row of the group are filled with nulls.

In practice the source_sql query should always specify ORDER BY 1 to ensure that values with the same row_name are brought together. However, ordering of the categories within a group is not important. Also, it is essential to be sure that the order of the category_sql query's output matches the specified output column order.

Here are two complete examples:

create table sales(year int, month int, qty int);
insert into sales values(2007, 1, 1000);
insert into sales values(2007, 2, 1500);
insert into sales values(2007, 7, 500);
insert into sales values(2007, 11, 1500);
insert into sales values(2007, 12, 2000);
insert into sales values(2008, 1, 1000);

select * from crosstab(
  'select year, month, qty from sales order by 1',
  'select m from generate_series(1,12) m'
) as (
  year int,
  "Jan" int,
  "Feb" int,
  "Mar" int,
  "Apr" int,
  "May" int,
  "Jun" int,
  "Jul" int,
  "Aug" int,
  "Sep" int,
  "Oct" int,
  "Nov" int,
  "Dec" int
);
 year | Jan  | Feb  | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov  | Dec
------+------+------+-----+-----+-----+-----+-----+-----+-----+-----+------+------
 2007 | 1000 | 1500 |     |     |     |     | 500 |     |     |     | 1500 | 2000
 2008 | 1000 |      |     |     |     |     |     |     |     |     |      |
(2 rows)

CREATE TABLE cth(rowid text, rowdt timestamp, attribute text, val text);
INSERT INTO cth VALUES('test1','01 March 2003','temperature','42');
INSERT INTO cth VALUES('test1','01 March 2003','test_result','PASS');
INSERT INTO cth VALUES('test1','01 March 2003','volts','2.6987');
INSERT INTO cth VALUES('test2','02 March 2003','temperature','53');
INSERT INTO cth VALUES('test2','02 March 2003','test_result','FAIL');
INSERT INTO cth VALUES('test2','02 March 2003','test_startdate','01 March 2003');
INSERT INTO cth VALUES('test2','02 March 2003','volts','3.1234');

SELECT * FROM crosstab
(
  'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
  'SELECT DISTINCT attribute FROM cth ORDER BY 1'
)
AS
(
       rowid text,
       rowdt timestamp,
       temperature int4,
       test_result text,
       test_startdate timestamp,
       volts float8
);
 rowid |          rowdt           | temperature | test_result |      test_startdate      | volts
-------+--------------------------+-------------+-------------+--------------------------+--------
 test1 | Sat Mar 01 00:00:00 2003 |          42 | PASS        |                          | 2.6987
 test2 | Sun Mar 02 00:00:00 2003 |          53 | FAIL        | Sat Mar 01 00:00:00 2003 | 3.1234
(2 rows)

You can create predefined functions to avoid having to write out the result column names and types in each query. See the examples in the previous section. The underlying C function for this form of crosstab is named crosstab_hash.

F.38.1.5. Connectby

connectby(text relname, text keyid_fld, text parent_keyid_fld
          [, text orderby_fld ], text start_with, int max_depth
          [, text branch_delim ])

The connectby function produces a display of hierarchical data that is stored in a table. The table must have a key field that uniquely identifies rows, and a parent-key field that references the parent (if any) of each row. connectby can display the sub-tree descending from any row.

Table F.31 explains the parameters.

Table F.31. `connectby` Parameters

The key and parent-key fields can be any data type, but they must be the same type. Note that the start_with value must be entered as a text string, regardless of the type of the key field.

The connectby function is declared to return setof record, so the actual names and types of the output columns must be defined in the FROM clause of the calling SELECT statement, for example:

SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0, '~')
    AS t(keyid text, parent_keyid text, level int, branch text, pos int);

The first two output columns are used for the current row's key and its parent row's key; they must match the type of the table's key field. The third output column is the depth in the tree and must be of type integer. If a branch_delim parameter was given, the next output column is the branch display and must be of type text. Finally, if an orderby_fld parameter was given, the last output column is a serial number, and must be of type integer.

The “branch” output column shows the path of keys taken to reach the current row. The keys are separated by the specified branch_delim string. If no branch display is wanted, omit both the branch_delim parameter and the branch column in the output column list.

If the ordering of siblings of the same parent is important, include the orderby_fld parameter to specify which field to order siblings by. This field can be of any sortable data type. The output column list must include a final integer serial-number column, if and only if orderby_fld is specified.

The parameters representing table and field names are copied as-is into the SQL queries that connectby generates internally. Therefore, include double quotes if the names are mixed-case or contain special characters. You may also need to schema-qualify the table name.

In large tables, performance will be poor unless there is an index on the parent-key field.

It is important that the branch_delim string not appear in any key values, else connectby may incorrectly report an infinite-recursion error. Note that if branch_delim is not provided, a default value of ~ is used for recursion detection purposes.

Here is an example:

CREATE TABLE connectby_tree(keyid text, parent_keyid text, pos int);

INSERT INTO connectby_tree VALUES('row1',NULL, 0);
INSERT INTO connectby_tree VALUES('row2','row1', 0);
INSERT INTO connectby_tree VALUES('row3','row1', 0);
INSERT INTO connectby_tree VALUES('row4','row2', 1);
INSERT INTO connectby_tree VALUES('row5','row2', 0);
INSERT INTO connectby_tree VALUES('row6','row4', 0);
INSERT INTO connectby_tree VALUES('row7','row3', 0);
INSERT INTO connectby_tree VALUES('row8','row6', 0);
INSERT INTO connectby_tree VALUES('row9','row5', 0);

-- with branch, without orderby_fld (order of results is not guaranteed)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0, '~')
 AS t(keyid text, parent_keyid text, level int, branch text);
 keyid | parent_keyid | level |       branch
-------+--------------+-------+---------------------
 row2  |              |     0 | row2
 row4  | row2         |     1 | row2~row4
 row6  | row4         |     2 | row2~row4~row6
 row8  | row6         |     3 | row2~row4~row6~row8
 row5  | row2         |     1 | row2~row5
 row9  | row5         |     2 | row2~row5~row9
(6 rows)

-- without branch, without orderby_fld (order of results is not guaranteed)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0)
 AS t(keyid text, parent_keyid text, level int);
 keyid | parent_keyid | level
-------+--------------+-------
 row2  |              |     0
 row4  | row2         |     1
 row6  | row4         |     2
 row8  | row6         |     3
 row5  | row2         |     1
 row9  | row5         |     2
(6 rows)

-- with branch, with orderby_fld (notice that row5 comes before row4)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0, '~')
 AS t(keyid text, parent_keyid text, level int, branch text, pos int);
 keyid | parent_keyid | level |       branch        | pos
-------+--------------+-------+---------------------+-----
 row2  |              |     0 | row2                |   1
 row5  | row2         |     1 | row2~row5           |   2
 row9  | row5         |     2 | row2~row5~row9      |   3
 row4  | row2         |     1 | row2~row4           |   4
 row6  | row4         |     2 | row2~row4~row6      |   5
 row8  | row6         |     3 | row2~row4~row6~row8 |   6
(6 rows)

-- without branch, with orderby_fld (notice that row5 comes before row4)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0)
 AS t(keyid text, parent_keyid text, level int, pos int);
 keyid | parent_keyid | level | pos
-------+--------------+-------+-----
 row2  |              |     0 |   1
 row5  | row2         |     1 |   2
 row9  | row5         |     2 |   3
 row4  | row2         |     1 |   4
 row6  | row4         |     2 |   5
 row8  | row6         |     3 |   6
(6 rows)

F.38.2. Author

Joe Conway