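This part covers topics of interest to a PostgreSQL database administrator. This includes installing the software, setting up and configuring the server, managing users and databases, and maintenance tasks. Anyone who runs a PostgreSQL server, even for personal use, but especially in production, should be familiar with the topics covered in this part.
The information in this part is arranged approximately in the order in which a new user should read it. The chapters are self-contained, however, and can be read individually as needed. The information is presented in a narrative fashion in topical units. Readers looking for a complete description of a particular command should see Part VI.
The first few chapters are written so that they can be understood without prerequisite knowledge, so new users who need to set up their own server can begin their exploration with this part. The rest of this part is about tuning and management; that material assumes that the reader is familiar with the general use of the PostgreSQL database system. Readers are encouraged to look at Part I and Part II for additional information.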
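PostgreSQL is available in the form of binary packages for most common operating systems today. When available, this is the recommended way to install PostgreSQL for users of the system. Building from source (see Chapter 17) is only recommended for people developing PostgreSQL or extensions.
For an updated list of platforms providing binary packages, see the download page on the PostgreSQL website at https://www.postgresql.org/download/ and follow the instructions for the specific platform.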
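This chapter describes the installation of PostgreSQL using the source code distribution. (If you are installing a pre-packaged distribution, such as an RPM or Debian package, ignore this chapter and read the packager's instructions instead.)
This chapter discusses how to set up and run the database server, and its interactions with the operating system.
As with any server daemon that is accessible to the outside world, it is advisable to run PostgreSQL under a separate user account. This user account should only own the data that is managed by the server, and should not be shared with other daemons. (For example, using the user nobody is a bad idea.) It is not advisable to install executables owned by this user, because a compromised system could then modify its own binaries.
To add a Unix user account to your system, look for a command useradd or adduser. The user name postgres is often used, and is assumed throughout this manual, but you can use another name if you like.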
On some systems with shared libraries you need to tell the system how to find the newly installed shared libraries. The systems on which this is not necessary include FreeBSD, HP-UX, Linux, NetBSD, OpenBSD, and Solaris.
The method to set the shared library search path varies between platforms, but the most widely-used method is to set the environment variable LD_LIBRARY_PATH like so: In Bourne shells (sh, ksh, bash, zsh):
or in csh or tcsh:
Replace /usr/local/pgsql/lib with whatever you set --libdir to in . You should put these commands into a shell start-up file such as /etc/profile or ~/.bash_profile. Some good information about the caveats associated with this method can be found at .
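As a minimal sketch of the Bourne-shell form described above, assuming the libraries were installed under the /usr/local/pgsql/lib directory named in the text (substitute your own --libdir value), the variable could be set as follows. The helper variable PGLIB is hypothetical, and prepending to an existing value rather than overwriting it is a choice made here, not something the text requires.

```sh
# Assumed install location; replace with your actual --libdir value.
PGLIB=/usr/local/pgsql/lib

# Prepend to any existing search path instead of overwriting it.
LD_LIBRARY_PATH="$PGLIB${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH
```

Placing these lines in a start-up file such as ~/.bash_profile makes the setting apply to every new login shell.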
On some systems it might be preferable to set the environment variable LD_RUN_PATH before building.
On Cygwin, put the library directory in the PATH or move the .dll files into the bin directory.
If in doubt, refer to the manual pages of your system (perhaps ld.so or rld). If you later get a message like:
then this step was necessary. Simply take care of it then.
If you are on Linux and you have root access, you can run:
(or equivalent directory) after installation to enable the run-time linker to find the shared libraries faster. Refer to the manual page of ldconfig for more information. On FreeBSD, NetBSD, and OpenBSD the command is:
instead. Other systems are not known to have an equivalent command.
To do this, add the following to your shell start-up file, such as ~/.bash_profile (or /etc/profile, if you want it to affect all users):
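For Bourne-style shells, a start-up file entry that puts the PostgreSQL programs on the search path might look like the following sketch. It assumes the programs were installed in /usr/local/pgsql/bin; use your own --bindir value if it differs.

```sh
# Assumed install location; replace with your actual --bindir value.
PATH=/usr/local/pgsql/bin:$PATH
export PATH
```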
If you are using csh or tcsh, then use this command:
To enable your system to find the man documentation, you need to add lines like the following to a shell start-up file unless you installed into a location that is searched by default:
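A sketch of such lines for Bourne-style shells follows. The directory /usr/local/pgsql/share/man is an assumption based on the default /usr/local/pgsql prefix; adjust it to wherever your installation placed the man pages.

```sh
# Assumed man page location under the default /usr/local/pgsql prefix.
MANPATH=/usr/local/pgsql/share/man:${MANPATH:-}
export MANPATH
```

If MANPATH was previously unset, the resulting value ends in a colon. On systems that use man-db, a trailing colon is understood to mean that the system's default search path should still be consulted; other man implementations may behave differently.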
The environment variables PGHOST and PGPORT specify to client applications the host and port of the database server, overriding the compiled-in defaults. If you are going to run client applications remotely then it is convenient if every user that plans to use the database sets PGHOST. This is not required, however; the settings can be communicated via command line options to most client programs.
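As an illustration, a user's start-up file could set both variables as sketched below. The host name db.example.com and port 5432 are placeholders, not values taken from this manual.

```sh
# Hypothetical server address; replace with your own host and port.
PGHOST=db.example.com
PGPORT=5432
export PGHOST PGPORT

# The same settings can instead be given per invocation, for example:
#   psql -h db.example.com -p 5432
```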
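This section documents additional platform-specific issues regarding the installation and setup of PostgreSQL. Be sure to read the installation instructions, and in particular . Also, check regarding the interpretation of regression test results.
Platforms that are not covered here have no known platform-specific installation issues.
You can use GCC or the native IBM compiler xlc to build PostgreSQL on AIX.
AIX versions before 7.1 are no longer tested nor supported by the PostgreSQL community.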
AIX can be somewhat peculiar with regards to the way it does memory management. You can have a server with many multiples of gigabytes of RAM free, but still get out of memory or address space errors when running applications. One example is loading of extensions failing with unusual errors. For example, running as the owner of the PostgreSQL installation:
Running as a non-owner in the group possessing the PostgreSQL installation:
Another example is out of memory errors in the PostgreSQL server logs, with every memory allocation near or greater than 256 MB failing.
The overall cause of all these problems is the default bittedness and memory model used by the server process. By default, all binaries built on AIX are 32-bit. This does not depend upon hardware type or kernel in use. These 32-bit processes are limited to 4 GB of memory laid out in 256 MB segments using one of a few models. The default allows for less than 256 MB in the heap as it shares a single segment with the stack.
In the case of the plperl example, above, check your umask and the permissions of the binaries in your PostgreSQL installation. The binaries involved in that example were 32-bit and installed as mode 750 instead of 755. Due to the permissions being set in this fashion, only the owner or a member of the possessing group can load the library. Since it isn't world-readable, the loader places the object into the process' heap instead of the shared library segments where it would otherwise be placed.
The “ideal” solution for this is to use a 64-bit build of PostgreSQL, but that is not always practical, because systems with 32-bit processors can build, but not run, 64-bit binaries.
If a 32-bit binary is desired, set LDR_CNTRL to MAXDATA=0xn0000000, where 1 <= n <= 8, before starting the PostgreSQL server, and try different values and postgresql.conf settings to find a configuration that works satisfactorily. This use of LDR_CNTRL tells AIX that you want the server to have MAXDATA bytes set aside for the heap, allocated in 256 MB segments. When you find a workable configuration, ldedit can be used to modify the binaries so that they default to using the desired heap size. PostgreSQL can also be rebuilt, passing configure LDFLAGS="-Wl,-bmaxdata:0xn0000000" to achieve the same effect.
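To make the MAXDATA=0xn0000000 pattern concrete, the following sketch builds the value for a chosen number of 256 MB segments and exports it. The helper name ldr_cntrl_for_segments is hypothetical, and the choice of n = 4 is only an example; a suitable value has to be found by experiment as described above.

```sh
# Hypothetical helper: build the LDR_CNTRL value for n heap segments of 256 MB.
ldr_cntrl_for_segments() {
    n=$1
    # The text allows 1 <= n <= 8.
    case "$n" in
        [1-8]) printf 'MAXDATA=0x%s0000000\n' "$n" ;;
        *) echo "n must be between 1 and 8" >&2; return 1 ;;
    esac
}

# Example: reserve four segments (1 GB) for the heap before starting the server.
LDR_CNTRL=$(ldr_cntrl_for_segments 4)
export LDR_CNTRL
echo "$LDR_CNTRL"   # prints MAXDATA=0x40000000
```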
For a 64-bit build, set OBJECT_MODE to 64 and pass CC="gcc -maix64" and LDFLAGS="-Wl,-bbigtoc" to configure. (Options for xlc might differ.) If you omit the export of OBJECT_MODE, your build may fail with linker errors. When OBJECT_MODE is set, it tells AIX's build utilities such as ar, as, and ld what type of objects to default to handling.
By default, overcommit of paging space can happen. While we have not seen this occur, AIX will kill processes when it runs out of memory and the overcommit is accessed. The closest to this that we have seen is fork failing because the system decided that there was not enough memory for another process. Like many other parts of AIX, the paging space allocation method and out-of-memory kill is configurable on a system- or process-wide basis if this becomes a problem.
When building from source, proceed according to the Unix-style installation procedure (i.e., ./configure; make; etc.), noting the following Cygwin-specific differences:
Set your path to use the Cygwin bin directory before the Windows utilities. This will help prevent problems with compilation.
The adduser command is not supported; use the appropriate user management application on Windows NT, 2000, or XP. Otherwise, skip this step.
The su command is not supported; use ssh to simulate su on Windows NT, 2000, or XP. Otherwise, skip this step.
OpenSSL is not supported.
Start cygserver for shared memory support. To do this, enter the command /usr/sbin/cygserver &. This program needs to be running anytime you start the PostgreSQL server or initialize a database cluster (initdb). The default cygserver configuration may need to be changed (e.g., increase SEMMNS) to prevent PostgreSQL from failing due to a lack of system resources.
Building might fail on some systems where a locale other than C is in use. To fix this, set the locale to C by doing export LANG=C.utf8 before building, and then setting it back to the previous setting after you have installed PostgreSQL.
The parallel regression tests (make check) can generate spurious regression test failures due to overflowing the listen() backlog queue which causes connection refused errors or hangs. You can limit the number of connections using the make variable MAX_CONNECTIONS thus:
(On some systems you can have up to about 10 simultaneous connections.)
It is possible to install cygserver and the PostgreSQL server as Windows NT services. For information on how to do this, please refer to the README document included with the PostgreSQL binary package on Cygwin. It is installed in the directory /usr/share/doc/Cygwin.
To build PostgreSQL from source on macOS, you will need to install Apple's command line developer tools, which can be done by issuing
(note that this will pop up a GUI dialog window for confirmation). You may or may not wish to also install Xcode.
On recent macOS releases, it's necessary to embed the “sysroot” path in the include switches used to find some system header files. This results in the outputs of the configure script varying depending on which SDK version was used during configure. That shouldn't pose any problem in simple scenarios, but if you are trying to do something like building an extension on a different machine than the server code was built on, you may need to force use of a different sysroot path. To do that, set PG_SYSROOT, for example
To find out the appropriate path on your machine, run
Note that building an extension using a different sysroot version than was used to build the core server is not really recommended; in the worst case it could result in hard-to-debug ABI inconsistencies.
You can also select a non-default sysroot path when configuring, by specifying PG_SYSROOT to configure:
This would primarily be useful to cross-compile for some other macOS version. There is no guarantee that the resulting executables will run on the current host.
To suppress the -isysroot options altogether, use
(any nonexistent pathname will work). This might be useful if you wish to build with a non-Apple compiler, but beware that that case is not tested or supported by the PostgreSQL developers.
macOS's “System Integrity Protection” (SIP) feature breaks make check, because it prevents passing the needed setting of DYLD_LIBRARY_PATH down to the executables being tested. You can work around that by doing make install before make check. Most PostgreSQL developers just turn off SIP, though.
After you have everything installed, it is suggested that you run psql under CMD.EXE, as the MSYS console has buffering issues.
17.7.4.1. Collecting Crash Dumps On Windows
If PostgreSQL on Windows crashes, it has the ability to generate minidumps that can be used to track down the cause for the crash, similar to core dumps on Unix. These dumps can be read using the Windows Debugger Tools or using Visual Studio. To enable the generation of dumps on Windows, create a subdirectory named crashdumps inside the cluster data directory. The dumps will then be written into this directory with a unique name based on the identifier of the crashing process and the current time of the crash.
PostgreSQL is well-supported on Solaris. The more up to date your operating system, the fewer issues you will experience.
You can build with either GCC or Sun's compiler suite. For better code optimization, Sun's compiler is strongly recommended on the SPARC architecture. If you are using Sun's compiler, be careful not to select /usr/ucb/cc; use /opt/SUNWspro/bin/cc.
If configure complains about a failed test program, this is probably a case of the run-time linker being unable to find some library, probably libz, libreadline or some other non-standard library such as libssl. To point it to the right location, set the LDFLAGS environment variable on the configure command line, e.g.,
See the ld man page for more information.
On the SPARC architecture, Sun Studio is strongly recommended for compilation. Try using the -xO5 optimization flag to generate significantly faster binaries. Do not use any flags that modify behavior of floating-point operations and errno processing (e.g., -fast).
If you do not have a reason to use 64-bit binaries on SPARC, prefer the 32-bit version. The 64-bit operations are slower and 64-bit binaries are slower than the 32-bit variants. On the other hand, 32-bit code on the AMD64 CPU family is not native, so 32-bit code is significantly slower on that CPU family.
If you see the linking of the postgres executable abort with an error message like:
your DTrace installation is too old to handle probes in static functions. You need Solaris 10u4 or newer to use DTrace.
The detailed steps are explained in the rest of this chapter.
It is recommended that most users download the binary distribution for Windows, available as a graphical installer package from the PostgreSQL website at . Building from source is only intended for people developing PostgreSQL or extensions.
There are several different ways of building PostgreSQL on Windows. The simplest way to build with Microsoft tools is to install Visual Studio 2022 and use the included compiler. It is also possible to build with the full Microsoft Visual C++ 2013 to 2022. In some cases that requires the installation of the Windows SDK in addition to the compiler.
It is also possible to build PostgreSQL using the GNU compiler tools provided by MinGW, or using Cygwin for older versions of Windows.
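To build with MinGW or Cygwin, use the normal build system; see the specific notes in and . To produce native 64-bit binaries in these environments, use the tools from MinGW-w64. These tools can also be used to cross-compile for 32-bit and 64-bit Windows targets on other hosts, such as Linux and macOS. Cygwin is not recommended for running a production server; it should only be used on older versions of Windows where the native build does not work. The official binaries are built using Visual Studio.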
Native builds of psql don't support command line editing. The Cygwin build does support command line editing, so it should be used where psql is needed for interactive use on Windows.
PostgreSQL can be built using the Visual C++ compiler suite from Microsoft. These compilers can be either from Visual Studio, Visual Studio Express or some versions of the Microsoft Windows SDK. If you do not already have a Visual Studio environment set up, the easiest ways are to use the compilers from Visual Studio 2022 or those in the Windows SDK 10, which are both free downloads from Microsoft.
Both 32-bit and 64-bit builds are possible with the Microsoft Compiler suite. 32-bit PostgreSQL builds are possible with Visual Studio 2013 to Visual Studio 2022, as well as standalone Windows SDK releases 8.1a to 10. 64-bit PostgreSQL builds are supported with Microsoft Windows SDK version 8.1a to 10 or Visual Studio 2013 and above. Compilation is supported down to Windows 7 and Windows Server 2008 R2 SP1 when building with Visual Studio 2013 to Visual Studio 2022.
The tools for building using Visual C++ or Platform SDK are in the src\tools\msvc directory. When building, make sure there are no tools from MinGW or Cygwin present in your system PATH. Also, make sure you have all the required Visual C++ tools available in the PATH. In Visual Studio, start the Visual Studio Command Prompt. If you wish to build a 64-bit version, you must use the 64-bit version of the command, and vice versa. Starting with Visual Studio 2017 this can be done from the command line using VsDevCmd.bat, see -help for the available options and their default values. vsvars32.bat is available in Visual Studio 2015 and earlier versions for the same purpose. From the Visual Studio Command Prompt, you can change the targeted CPU architecture, build type, and target OS by using the vcvarsall.bat command, e.g., vcvarsall.bat x64 10.0.10240.0 to target Windows 10 with a 64-bit release build. See -help for the other options of vcvarsall.bat. All commands should be run from the src\tools\msvc directory.
Before you build, you can create the file config.pl to reflect any configuration options you want to change, or the paths to any third party libraries to use. The complete configuration is determined by first reading and parsing the file config_default.pl, and then applying any changes from config.pl. For example, to specify the location of your Python installation, put the following in config.pl:
You only need to specify those parameters that are different from what's in config_default.pl.
If you need to set any other environment variables, create a file called buildenv.pl and put the required commands there. For example, to add the path for bison when it's not in the PATH, create a file containing:
To pass additional command line arguments to the Visual Studio build command (msbuild or vcbuild):
The following additional products are required to build PostgreSQL. Use the config.pl file to specify which directories the libraries are available in.
Microsoft Windows SDK
If your build environment doesn't ship with a supported version of the Microsoft Windows SDK it is recommended that you upgrade to the latest version (currently version 10), available for download from .
You must always include the Windows Headers and Libraries part of the SDK. If you install a Windows SDK including the Visual C++ Compilers, you don't need Visual Studio to build. Note that as of Version 8.0a the Windows SDK no longer ships with a complete command-line build environment.
ActiveState Perl
ActiveState Perl is required to run the build generation scripts. MinGW or Cygwin Perl will not work. It must also be present in the PATH. Binaries can be downloaded from (Note: version 5.8.3 or later is required, the free Standard Distribution is sufficient).
The following additional products are not required to get started, but are required to build the complete package. Use the config.pl file to specify which directories the libraries are available in.
ActiveState TCL
Required for building PL/Tcl (Note: version 8.4 is required, the free Standard Distribution is sufficient).
Bison and Flex
Bison and Flex are required to build from Git, but not required when building from a release file. Only Bison 1.875 or versions 2.2 and later will work. Flex must be version 2.5.31 or later.
You will need to add the directory containing flex.exe and bison.exe to the PATH environment variable in buildenv.pl unless they are already in PATH. In the case of MinGW, the directory is the \msys\1.0\bin subdirectory of your MinGW installation directory.
The Bison distribution from GnuWin32 appears to have a bug that causes Bison to malfunction when installed in a directory with spaces in the name, such as the default location on English installations C:\Program Files\GnuWin32. Consider installing into C:\GnuWin32 or use the NTFS short name path to GnuWin32 in your PATH environment setting (e.g., C:\PROGRA~1\GnuWin32).
Diff
Gettext
MIT Kerberos
libxml2 and libxslt
LZ4
Zstandard
OpenSSL
ossp-uuid
Python
zlib
PostgreSQL will only build for the x64 architecture on 64-bit Windows; there is no support for Itanium processors.
Mixing 32- and 64-bit versions in the same build tree is not supported. The build system will automatically detect if it's running in a 32- or 64-bit environment, and build PostgreSQL accordingly. For this reason, it is important to start the correct command prompt before building.
To use a server-side third party library such as Python or OpenSSL, this library must also be 64-bit. There is no support for loading a 32-bit library in a 64-bit server. Several of the third party libraries that PostgreSQL supports may only be available in 32-bit versions, in which case they cannot be used with 64-bit PostgreSQL.
To build all of PostgreSQL in release configuration (the default), run the command:
To build all of PostgreSQL in debug configuration, run the command:
To build just a single project, for example psql, run the commands:
To change the default build configuration to debug, put the following in the buildenv.pl file:
It is also possible to build from inside the Visual Studio GUI. In this case, you need to run:
from the command prompt, and then open the generated pgsql.sln (in the root directory of the source tree) in Visual Studio.
Most of the time, the automatic dependency tracking in Visual Studio will handle changed files. But if there have been large changes, you may need to clean the installation. To do this, simply run the clean.bat command, which will automatically clean out all generated files. You can also run it with the dist parameter, in which case it will behave like make distclean and remove the flex/bison output files as well.
By default, all files are written into a subdirectory of the debug or release directories. To install these files using the standard layout, and also generate the files required to initialize and use the database, run the command:
If you want to install only the client applications and interface libraries, then you can use these commands:
To run the regression tests, make sure you have completed the build of all required parts first. Also, make sure that the DLLs required to load all parts of the system (such as the Perl and Python DLLs for the procedural languages) are present in the system path. If they are not, set it through the buildenv.pl file. To run the tests, run one of the following commands from the src\tools\msvc directory:
To change the schedule used (default is parallel), append it to the command line like:
Running the regression tests on client programs, with vcregress bincheck, or on recovery tests, with vcregress recoverycheck, requires an additional Perl module to be installed:
IPC::Run
The TAP tests run with vcregress support the environment variables PROVE_TESTS, that is expanded automatically using the name patterns given, and PROVE_FLAGS. These can be set on a Windows terminal, before running vcregress:
It is also possible to set up those parameters in buildenv.pl:
Some of the TAP tests depend on a set of external commands that would optionally trigger tests related to them. Each one of those variables can be set or unset in buildenv.pl:
GZIP_PROGRAM
Path to a gzip command. The default is gzip, which will search for a command by that name in the configured PATH.
LZ4
Path to a lz4 command. The default is lz4, which will search for a command by that name in the configured PATH.
TAR
Path to a tar command. The default is tar, which will search for a command by that name in the configured PATH.
ZSTD
Path to a zstd command. The default is zstd, which will search for a command by that name in the configured PATH.
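Before you can do anything, you must initialize a database storage area on disk. We call this a database cluster. (The SQL standard uses the term catalog cluster.) A database cluster is a collection of databases that is managed by a single instance of a running database server. After initialization, a database cluster will contain a database named postgres, which is meant as a default database for use by utilities, users and third party applications. The database server itself does not require the postgres database to exist, but many external utility programs assume it exists. Another database created within each cluster during initialization is called template1. As the name suggests, this will be used as a template for subsequently created databases; it should not be used for actual work. (See for information about creating new databases within a cluster.)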
In file system terms, a database cluster is a single directory under which all data will be stored. We call this the data directory or data area. It is completely up to you where you choose to store your data. There is no default, although locations such as /usr/local/pgsql/data or /var/lib/pgsql/data are popular. To initialize a database cluster, use the command initdb, which is installed with PostgreSQL. The desired file system location of your database cluster is indicated by the -D option, for example:
Note that you must execute this command while logged into the PostgreSQL user account, which is described in the previous section.
As an alternative to the -D option, you can set the environment variable PGDATA.
Alternatively, you can run initdb via the pg_ctl program like so:
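This may be more intuitive if you use pg_ctl to start and stop the server (see ), so that pg_ctl would be the only command you use to manage the database server instance.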
initdb will attempt to create the directory you specify if it does not already exist. Of course, this will fail if initdb does not have permissions to write in the parent directory. It's generally recommendable that the PostgreSQL user own not just the data directory but its parent directory as well, so that this should not be a problem. If the desired parent directory doesn't exist either, you will need to create it first, using root privileges if the grandparent directory isn't writable. So the process might look like this:
initdb will refuse to run if the data directory exists and already contains files; this is to prevent accidentally overwriting an existing installation.
Because the data directory contains all the data stored in the database, it is essential that it be secured from unauthorized access. initdb therefore revokes access permissions from everyone but the PostgreSQL user, and optionally, group. Group access, when enabled, is read-only. This allows an unprivileged user in the same group as the cluster owner to take a backup of the cluster data or perform other operations that only require read access.
Note that enabling or disabling group access on an existing cluster requires the cluster to be shut down and the appropriate mode to be set on all directories and files before restarting PostgreSQL. Otherwise, a mix of modes might exist in the data directory. For clusters that allow access only by the owner, the appropriate modes are 0700 for directories and 0600 for files. For clusters that also allow reads by the group, the appropriate modes are 0750 for directories and 0640 for files.
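The mode changes described above could be applied with find and chmod, as in the following sketch. The function name set_cluster_modes and its "group" argument are hypothetical, and the demonstration runs on a throwaway directory; on a real cluster the server must be shut down first, as noted above.

```sh
# Hypothetical helper: apply the modes listed above to a stopped cluster's
# data directory. Passing "group" as the second argument enables group reads.
set_cluster_modes() {
    datadir=$1
    if [ "${2:-owner}" = "group" ]; then
        dir_mode=0750; file_mode=0640
    else
        dir_mode=0700; file_mode=0600
    fi
    find "$datadir" -type d -exec chmod "$dir_mode" {} +
    find "$datadir" -type f -exec chmod "$file_mode" {} +
}

# Demonstration on a throwaway directory rather than a real cluster.
demo=$(mktemp -d)
mkdir "$demo/base"
touch "$demo/PG_VERSION" "$demo/base/1"
set_cluster_modes "$demo" group
ls -ld "$demo" "$demo/base" "$demo/PG_VERSION"
```

Setting every directory and file in one pass avoids the mix of modes that the text warns about.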
However, while the directory contents are secure, the default client authentication setup allows any local user to connect to the database and even become the database superuser. If you do not trust other local users, we recommend you use one of initdb's -W, --pwprompt or --pwfile options to assign a password to the database superuser. Also, specify -A scram-sha-256 so that the default trust authentication mode is not used; or modify the generated pg_hba.conf file after running initdb, but before you start the server for the first time. (Other reasonable approaches include using peer authentication or file system permissions to restrict connections. See for more information.)
Non-C and non-POSIX locales rely on the operating system's collation library for character set ordering. This controls the ordering of keys stored in indexes. For this reason, a cluster cannot switch to an incompatible collation library version, either through snapshot restore, binary streaming replication, a different operating system, or an operating system upgrade.
Many installations create their database clusters on file systems (volumes) other than the machine's “root” volume. If you choose to do this, it is not advisable to try to use the secondary volume's topmost directory (mount point) as the data directory. Best practice is to create a directory within the mount-point directory that is owned by the PostgreSQL user, and then create the data directory within that. This avoids permissions problems, particularly for operations such as pg_upgrade, and it also ensures clean failures if the secondary volume is taken offline.
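Generally, any file system with POSIX semantics can be used for PostgreSQL. Users prefer different file systems for a variety of reasons, including vendor support, performance, and familiarity. Experience suggests that, all other things being equal, one should not expect major performance or behavior changes merely from switching file systems or making minor file system configuration changes.
It is possible to use an NFS file system for storing the PostgreSQL data directory. PostgreSQL does nothing special for NFS file systems, meaning it assumes NFS behaves exactly like locally-connected drives. PostgreSQL does not use any functionality that is known to have nonstandard behavior on NFS, such as file locking.
The only firm requirement for using NFS with PostgreSQL is that the file system is mounted using the hard option. With the hard option, processes can “hang” indefinitely if there are network problems, so this configuration will require a careful monitoring setup. The soft option will interrupt system calls in case of network problems, but PostgreSQL will not repeat system calls interrupted in this way, so any such interruption will result in an I/O error being reported.
It is not necessary to use the sync mount option. The behavior of the async option is sufficient, since PostgreSQL issues fsync calls at appropriate times to flush the write caches. (This is analogous to how it works on a local file system.) However, it is strongly recommended to use the sync export option on the NFS server on systems where it exists (mainly Linux). Otherwise, an fsync or equivalent on the NFS client is not actually guaranteed to reach permanent storage on the server, which could cause corruption similar to running with the parameter fsync off. The defaults of these mount and export options differ between vendors and versions, so it is recommended to check and perhaps specify them explicitly in any case to avoid any ambiguity.
In some cases, an external storage product can be accessed either via NFS or a lower-level protocol such as iSCSI. In the latter case, the storage appears as a block device and any available file system can be created on it. That approach might relieve the DBA from having to deal with some of the idiosyncrasies of NFS, but the complexity of managing remote storage then arises at other levels.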
If you installed into /usr/local/pgsql or some other location that is not searched for programs by default, you should add /usr/local/pgsql/bin (or whatever you set --bindir to in ) into your PATH. Strictly speaking, this is not necessary, but it will make the use of PostgreSQL much more convenient.
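PostgreSQL can be built using Cygwin, a Linux-like environment for Windows, but that method is inferior to the native Windows build (see ), so running a server under Cygwin is not recommended.
PostgreSQL for Windows can be built using MinGW, a Unix-like build environment for Microsoft operating systems, or using Microsoft's Visual C++ compiler suite. The MinGW build procedure uses the normal build system described in this chapter; the Visual C++ build works completely differently and is described in .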
The native Windows port requires a 32 or 64-bit version of Windows 2000 or later. Earlier operating systems do not have sufficient infrastructure (but Cygwin may be used on those). MinGW, the Unix-like build tools, and MSYS, a collection of Unix tools required to run shell scripts like configure, can be downloaded from . Neither is required to run the resulting binaries; they are needed only for creating the binaries.
To build 64 bit binaries using MinGW, install the 64 bit tool set from , put its bin directory in the PATH, and run configure with the --host=x86_64-w64-mingw32 option.
You can download Sun Studio from . Many GNU tools are integrated into Solaris 10, or they are present on the Solaris companion CD. If you need packages for older versions of Solaris, you can find these tools at . If you prefer sources, look at .
Yes, using DTrace is possible. See for further information.
Both Bison and Flex are included in the msys tool suite, available from as part of the MinGW compiler suite.
Diff is required to run the regression tests, and can be downloaded from .
Gettext is required to build with NLS support, and can be downloaded from . Note that binaries, dependencies and developer files are all needed.
Required for GSSAPI authentication support. MIT Kerberos can be downloaded from .
Required for XML support. Binaries can be downloaded from or source from . Note that libxml2 requires iconv, which is available from the same download location.
Required for supporting LZ4 compression. Binaries and source can be downloaded from .
Required for supporting Zstandard compression. Binaries and source can be downloaded from .
Required for SSL support. Binaries can be downloaded from or source from .
Required for UUID-OSSP support (contrib only). Source can be downloaded from .
Required for building PL/Python. Binaries can be downloaded from .
Required for compression support in pg_dump and pg_restore. Binaries can be downloaded from .
For more information about the regression tests, see .
As of this writing, IPC::Run is not included in the ActiveState Perl installation, nor in the ActiveState Perl Package Manager (PPM) library. To install, download the IPC-Run-<version>.tar.gz source archive from CPAN, at , and uncompress. Edit the buildenv.pl file, and add a PERL5LIB variable to point to the lib subdirectory from the extracted archive. For example:
initdb also initializes the default locale for the database cluster. Normally, it will just take the locale settings in the environment and apply them to the initialized database. It is possible to specify a different locale for the database; more information about that can be found in . The default sort order used within the particular database cluster is set by initdb, and while you can create new databases using different sort order, the order used in the template databases that initdb creates cannot be changed without dropping and recreating them. There is also a performance impact for using locales other than C or POSIX. Therefore, it is important to make this choice correctly the first time.
initdb also sets the default character set encoding for the database cluster. Normally this should be chosen to match the locale setting. For details see .
This section discusses how to upgrade your database data from one PostgreSQL release to a newer one.
Current PostgreSQL version numbers consist of a major and a minor version number. For example, in the version number 10.1, the 10 is the major version number and the 1 is the minor version number, meaning this would be the first minor release of the major release 10. For releases before PostgreSQL version 10.0, version numbers consist of three numbers, for example, 9.5.3. In those cases, the major version consists of the first two digit groups of the version number, e.g., 9.5, and the minor version is the third number, e.g., 3, meaning this would be the third minor release of the major release 9.5.
Minor releases never change the internal storage format and are always compatible with earlier and later minor releases of the same major version number. For example, version 10.1 is compatible with version 10.0 and version 10.6. Similarly, for example, 9.5.3 is compatible with 9.5.0, 9.5.1, and 9.5.6. To update between compatible versions, you simply replace the executables while the server is down and restart the server. The data directory remains unchanged — minor upgrades are that simple.
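The numbering rules above can be sketched in a few lines of Python. This is only an illustration: the function names split_version and same_major are hypothetical, and the code assumes well-formed version strings such as "10.1" or "9.5.3".

```python
from typing import Tuple


def split_version(version: str) -> Tuple[str, int]:
    """Split a PostgreSQL version string into (major, minor).

    Versions 10 and later use two numbers (major.minor); earlier
    versions use three, where the first two form the major version.
    """
    parts = version.split(".")
    if int(parts[0]) >= 10:
        if len(parts) != 2:
            raise ValueError(f"expected MAJOR.MINOR, got {version!r}")
        return parts[0], int(parts[1])
    if len(parts) != 3:
        raise ValueError(f"expected X.Y.MINOR, got {version!r}")
    return f"{parts[0]}.{parts[1]}", int(parts[2])


def same_major(a: str, b: str) -> bool:
    """True if moving between a and b is only a minor-version update."""
    return split_version(a)[0] == split_version(b)[0]


if __name__ == "__main__":
    print(split_version("10.1"))          # major "10", minor 1
    print(split_version("9.5.3"))         # major "9.5", minor 3
    print(same_major("9.5.3", "9.5.6"))   # executables can simply be replaced
```

When same_major returns False, one of the major-version upgrade methods described in this section is needed instead.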
For major releases of PostgreSQL, the internal data storage format is subject to change, thus complicating upgrades. The traditional method for moving data to a new major version is to dump and reload the database, though this can be slow. A faster method is pg_upgrade. Replication methods are also available, as discussed below.
New major versions also typically introduce some user-visible incompatibilities, so application programming changes might be required. All user-visible changes are listed in the release notes (Appendix E); pay particular attention to the section labeled "Migration". If you are upgrading across several major versions, be sure to read the release notes for each intervening version.
Cautious users will want to test their client applications on the new version before switching over fully; therefore, it's often a good idea to set up concurrent installations of old and new versions. When testing a PostgreSQL major upgrade, consider the following categories of possible changes:
Administration
The capabilities available for administrators to monitor and control the server often change and improve in each major release.
SQL
Typically this includes new SQL command capabilities and not changes in behavior, unless specifically mentioned in the release notes.
Library API
Typically libraries like libpq only add new functionality, again unless mentioned in the release notes.
System Catalogs
System catalog changes usually only affect database management tools.
Server C-language API
This involves changes in the backend function API, which is written in the C programming language. Such changes affect code that references backend functions deep inside the server.
One upgrade method is to dump data from one major version of PostgreSQL and reload it in another — to do this, you must use a logical backup tool like pg_dumpall; file system level backup methods will not work. (There are checks in place that prevent you from using a data directory with an incompatible version of PostgreSQL, so no great harm can be done by trying to start the wrong server version on a data directory.)
It is recommended that you use the pg_dump and pg_dumpall programs from the newer version of PostgreSQL, to take advantage of enhancements that might have been made in these programs. Current releases of the dump programs can read data from any server version back to 7.0.
These instructions assume that your existing installation is under the /usr/local/pgsql directory, and that the data area is in /usr/local/pgsql/data. Substitute your paths appropriately.
If making a backup, make sure that your database is not being updated. This does not affect the integrity of the backup, but the changed data would of course not be included. If necessary, edit the permissions in the file /usr/local/pgsql/data/pg_hba.conf (or equivalent) to disallow access from everyone except you. See Chapter 20 for additional information on access control.
To back up your database installation, type:
To make the backup, you can use the pg_dumpall command from the version you are currently running; see Section 25.1.2 for more details. For best results, however, try to use the pg_dumpall command from PostgreSQL 12.2, since this version contains bug fixes and improvements over older versions. While this advice might seem idiosyncratic since you haven't installed the new version yet, it is advisable to follow it if you plan to install the new version in parallel with the old version. In that case you can complete the installation normally and transfer the data later. This will also decrease the downtime.
Shut down the old server:
On systems that have PostgreSQL started at boot time, there is probably a start-up file that will accomplish the same thing. For example, on a Red Hat Linux system one might find that this works:
See Chapter 18 for details about starting and stopping the server.
If restoring from backup, rename or delete the old installation directory if it is not version-specific. It is a good idea to rename the directory, rather than delete it, in case you have trouble and need to revert to it. Keep in mind the directory might consume significant disk space. To rename the directory, use a command like this:
(Be sure to move the directory as a single unit so relative paths remain unchanged.)
Install the new version of PostgreSQL as outlined in Section 16.4.
Create a new database cluster if needed. Remember that you must execute these commands while logged in to the special database user account (which you already have if you are upgrading).
Restore your previous pg_hba.conf and any postgresql.conf modifications.
Start the database server, again using the special database user account:
Finally, restore your data from backup with:
using the new psql.
The least downtime can be achieved by installing the new server in a different directory and running both the old and the new servers in parallel, on different ports. Then you can use something like:
to transfer your data.
The pg_upgrade module allows an installation to be migrated in-place from one major PostgreSQL version to another. Upgrades can be performed in minutes, particularly with --link mode. It requires steps similar to pg_dumpall above, e.g. starting/stopping the server, running initdb. The pg_upgrade documentation outlines the necessary steps.
It is also possible to use logical replication methods to create a standby server with the updated version of PostgreSQL. This is possible because logical replication supports replication between different major versions of PostgreSQL. The standby can be on the same computer or a different computer. Once it has synced up with the master server (running the older version of PostgreSQL), you can switch masters and make the standby the master and shut down the older database instance. Such a switch-over results in only several seconds of downtime for an upgrade.
This method of upgrading can be performed using the built-in logical replication facilities as well as using external logical replication systems such as pglogical, Slony, Londiste, and Bucardo.
PostgreSQL can sometimes exhaust various operating system resource limits, especially when multiple copies of the server are running on the same system, or in very large installations. This section explains the kernel resources used by PostgreSQL and the steps you can take to resolve problems related to kernel resource consumption.
PostgreSQL requires the operating system to provide inter-process communication (IPC) features, specifically shared memory and semaphores. Unix-derived systems typically provide “System V” IPC, “POSIX” IPC, or both. Windows has its own implementation of these features and is not discussed here.
The complete lack of these facilities is usually manifested by an “Illegal system call” error upon server start. In that case there is no alternative but to reconfigure your kernel. PostgreSQL won't work without them. This situation is rare, however, among modern operating systems.
Upon starting the server, PostgreSQL normally allocates a very small amount of System V shared memory, as well as a much larger amount of POSIX (mmap) shared memory. In addition a significant number of semaphores, which can be either System V or POSIX style, are created at server startup. Currently, POSIX semaphores are used on Linux and FreeBSD systems while other platforms use System V semaphores.
Prior to PostgreSQL 9.3, only System V shared memory was used, so the amount of System V shared memory required to start the server was much larger. If you are running an older version of the server, please consult the documentation for your server version.
System V IPC features are typically constrained by system-wide allocation limits. When PostgreSQL exceeds one of these limits, the server will refuse to start and should leave an instructive error message describing the problem and what to do about it. (See also Section 18.3.1.) The relevant kernel parameters are named consistently across different systems; Table 18.1 gives an overview. The methods to set them, however, vary. Suggestions for some platforms are given below.
Table 18.1. System V IPC Parameters

| Name | Description | Values needed to run one PostgreSQL instance |
| --- | --- | --- |
| SHMMAX | Maximum size of shared memory segment (bytes) | at least 1kB, but the default is usually much higher |
| SHMMIN | Minimum size of shared memory segment (bytes) | 1 |
| SHMALL | Total amount of shared memory available (bytes or pages) | same as SHMMAX if bytes, or ceil(SHMMAX/PAGE_SIZE) if pages, plus room for other applications |
| SHMSEG | Maximum number of shared memory segments per process | only 1 segment is needed, but the default is much higher |
| SHMMNI | Maximum number of shared memory segments system-wide | like SHMSEG plus room for other applications |
| SEMMNI | Maximum number of semaphore identifiers (i.e., sets) | at least ceil((max_connections + autovacuum_max_workers + max_worker_processes + 5) / 16) plus room for other applications |
| SEMMNS | Maximum number of semaphores system-wide | ceil((max_connections + autovacuum_max_workers + max_worker_processes + 5) / 16) * 17 plus room for other applications |
| SEMMSL | Maximum number of semaphores per set | at least 17 |
| SEMMAP | Number of entries in semaphore map | see text |
| SEMVMX | Maximum value of semaphore | at least 1000 (The default is often 32767; do not change unless necessary) |
PostgreSQL requires a few bytes of System V shared memory (typically 48 bytes, on 64-bit platforms) for each copy of the server. On most modern operating systems, this amount can easily be allocated. However, if you are running many copies of the server, or if other applications are also using System V shared memory, it may be necessary to increase SHMALL, which is the total amount of System V shared memory system-wide. Note that SHMALL is measured in pages rather than bytes on many systems.
Less likely to cause problems is the minimum size for shared memory segments (SHMMIN), which should be at most approximately 32 bytes for PostgreSQL (it is usually just 1). The maximum number of segments system-wide (SHMMNI) or per-process (SHMSEG) are unlikely to cause a problem unless your system has them set to zero.
When using System V semaphores, PostgreSQL uses one semaphore per allowed connection (max_connections), allowed autovacuum worker process (autovacuum_max_workers) and allowed background process (max_worker_processes), in sets of 16. Each such set will also contain a 17th semaphore which contains a “magic number”, to detect collision with semaphore sets used by other applications. The maximum number of semaphores in the system is set by SEMMNS, which consequently must be at least as high as max_connections plus autovacuum_max_workers plus max_worker_processes, plus one extra for each 16 allowed connections plus workers (see the formula in Table 18.1). The parameter SEMMNI determines the limit on the number of semaphore sets that can exist on the system at one time. Hence this parameter must be at least ceil((max_connections + autovacuum_max_workers + max_worker_processes + 5) / 16). Lowering the number of allowed connections is a temporary workaround for failures, which are usually confusingly worded “No space left on device”, from the function semget.
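As a worked example of the formulas above, the following Python sketch computes the minimum SEMMNI and SEMMNS for a given configuration. The function name is hypothetical, and the input values (100 connections, 3 autovacuum workers, 8 worker processes) are example settings chosen for illustration; the results do not include the "room for other applications" that the table also calls for.

```python
import math


def sysv_semaphore_minimums(max_connections: int,
                            autovacuum_max_workers: int,
                            max_worker_processes: int) -> dict:
    """Minimum System V semaphore limits for one server instance.

    Semaphores are allocated in sets of 16, and each set carries one
    extra "magic number" semaphore, so a full set holds 17.
    """
    sets = math.ceil(
        (max_connections + autovacuum_max_workers + max_worker_processes + 5) / 16
    )
    return {
        "SEMMNI": sets,        # number of semaphore sets
        "SEMMNS": sets * 17,   # total semaphores across all sets
        "SEMMSL": 17,          # semaphores per set
    }


if __name__ == "__main__":
    # Example inputs only; substitute the values from your own configuration.
    print(sysv_semaphore_minimums(100, 3, 8))
```

For these example inputs the sum is 116, which needs 8 sets of 16, so SEMMNI must be at least 8 and SEMMNS at least 136.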
In some cases it might also be necessary to increase SEMMAP to be at least on the order of SEMMNS. This parameter defines the size of the semaphore resource map, in which each contiguous block of available semaphores needs an entry. When a semaphore set is freed it is either added to an existing entry that is adjacent to the freed block or it is registered under a new map entry. If the map is full, the freed semaphores get lost (until reboot). Fragmentation of the semaphore space could over time lead to fewer available semaphores than there should be.
Various other settings related to “semaphore undo”, such as SEMMNU and SEMUME, do not affect PostgreSQL.
When using POSIX semaphores, the number of semaphores needed is the same as for System V, that is one semaphore per allowed connection (max_connections), allowed autovacuum worker process (autovacuum_max_workers) and allowed background process (max_worker_processes). On the platforms where this option is preferred, there is no specific kernel limit on the number of POSIX semaphores.
AIX
At least as of version 5.1, it should not be necessary to do any special configuration for such parameters as SHMMAX, as it appears this is configured to allow all memory to be used as shared memory. That is the sort of configuration commonly used for other databases such as DB/2.
It might, however, be necessary to modify the global ulimit information in /etc/security/limits, as the default hard limits for file sizes (fsize) and numbers of files (nofiles) might be too low.
FreeBSD
The default settings can be changed using the sysctl or loader interfaces. The following parameters can be set using sysctl:
To make these settings persist over reboots, modify /etc/sysctl.conf.
These semaphore-related settings are read-only as far as sysctl is concerned, but can be set in /boot/loader.conf:
After modifying these values a reboot is required for the new settings to take effect. (Note: FreeBSD does not use SEMMAP. Older versions would accept but ignore a setting for kern.ipc.semmap; newer versions reject it altogether.)
You might also want to configure your kernel to lock shared memory into RAM and prevent it from being paged out to swap. This can be accomplished using the sysctl setting kern.ipc.shm_use_phys.
If running in FreeBSD jails by enabling sysctl's security.jail.sysvipc_allowed, postmasters running in different jails should be run by different operating system users. This improves security because it prevents non-root users from interfering with shared memory or semaphores in different jails, and it allows the PostgreSQL IPC cleanup code to function properly. (In FreeBSD 6.0 and later the IPC cleanup code does not properly detect processes in other jails, preventing the running of postmasters on the same port in different jails.)
FreeBSD versions before 4.0 work like OpenBSD (see below).
NetBSD
In NetBSD 5.0 and later, IPC parameters can be adjusted using sysctl, for example:
To have these settings persist over reboots, modify /etc/sysctl.conf.
You might also want to configure your kernel to lock shared memory into RAM and prevent it from being paged out to swap. This can be accomplished using the sysctl setting kern.ipc.shm_use_phys.
NetBSD versions before 5.0 work like OpenBSD (see below), except that parameters should be set with the keyword options not option.
OpenBSD
The options SYSVSHM and SYSVSEM need to be enabled when the kernel is compiled. (They are by default.) The maximum size of shared memory is determined by the option SHMMAXPGS (in pages). The following shows an example of how to set the various parameters:
You might also want to configure your kernel to lock shared memory into RAM and prevent it from being paged out to swap. This can be accomplished using the sysctl setting kern.ipc.shm_use_phys.
HP-UX
The default settings tend to suffice for normal installations. On HP-UX 10, the factory default for SEMMNS is 128, which might be too low for larger database sites.
IPC parameters can be set in the System Administration Manager (SAM) under Kernel Configuration → Configurable Parameters. Choose Create A New Kernel when you're done.
Linux
The default maximum segment size is 32 MB, and the default maximum total size is 2097152 pages. A page is almost always 4096 bytes except in unusual kernel configurations with “huge pages” (use getconf PAGE_SIZE to verify).
The shared memory size settings can be changed via the sysctl interface. For example, to allow 16 GB:
In addition these settings can be preserved between reboots in the file /etc/sysctl.conf. Doing that is highly recommended.
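Because SHMMAX is expressed in bytes and SHMALL in pages on Linux, the two values for a given target size can be derived as in the sketch below. The function names are hypothetical, kernel.shmmax and kernel.shmall are the Linux sysctl names for these two limits, and the 4096-byte page size is an assumption that should be checked with getconf PAGE_SIZE.

```python
def linux_shm_settings(total_bytes: int, page_size: int = 4096) -> dict:
    """Derive Linux shared memory sysctl values for a target size.

    kernel.shmmax is in bytes; kernel.shmall is in pages, rounded up.
    """
    pages = -(-total_bytes // page_size)  # ceiling division on integers
    return {"kernel.shmmax": total_bytes, "kernel.shmall": pages}


def as_sysctl_conf(settings: dict) -> str:
    """Render the settings as lines suitable for /etc/sysctl.conf."""
    return "\n".join(f"{name} = {value}" for name, value in settings.items())


if __name__ == "__main__":
    sixteen_gb = 16 * 1024 ** 3
    print(as_sysctl_conf(linux_shm_settings(sixteen_gb)))
```

For 16 GB with 4096-byte pages this yields 17179869184 bytes and 4194304 pages. By the same arithmetic, the default total of 2097152 pages corresponds to 8 GB.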
Ancient distributions might not have the sysctl program, but equivalent changes can be made by manipulating the /proc file system:
The remaining defaults are quite generously sized, and usually do not require changes.
macOS
The recommended method for configuring shared memory in macOS is to create a file named /etc/sysctl.conf, containing variable assignments such as:
Note that in some macOS versions, all five shared-memory parameters must be set in /etc/sysctl.conf, else the values will be ignored.
Beware that recent releases of macOS ignore attempts to set SHMMAX to a value that isn't an exact multiple of 4096.
SHMALL is measured in 4 kB pages on this platform.
In older macOS versions, you will need to reboot to have changes in the shared memory parameters take effect. As of 10.5 it is possible to change all but SHMMNI on the fly, using sysctl. But it's still best to set up your preferred values via /etc/sysctl.conf, so that the values will be kept across reboots.
The file /etc/sysctl.conf is only honored in macOS 10.3.9 and later. If you are running a previous 10.3.x release, you must edit the file /etc/rc and change the values in the following commands:
Note that /etc/rc is usually overwritten by macOS system updates, so you should expect to have to redo these edits after each update.
In macOS 10.2 and earlier, instead edit these commands in the file /System/Library/StartupItems/SystemTuning/SystemTuning.
Solaris 2.6 to 2.9 (Solaris 6 to Solaris 9)
The relevant settings can be changed in /etc/system, for example:
You need to reboot for the changes to take effect. See also http://sunsite.uakom.sk/sunworldonline/swol-09-1997/swol-09-insidesolaris.html for information on shared memory under older versions of Solaris.
Solaris 2.10 (Solaris 10) and later
OpenSolaris
In Solaris 10 and later, and OpenSolaris, the default shared memory and semaphore settings are good enough for most PostgreSQL applications. Solaris now defaults to a SHMMAX of one-quarter of system RAM. To further adjust this setting, use a project setting associated with the postgres user. For example, run the following as root:
This command adds the user.postgres project and sets the shared memory maximum for the postgres user to 8GB, and takes effect the next time that user logs in, or when you restart PostgreSQL (not reload). The above assumes that PostgreSQL is run by the postgres user in the postgres group. No server reboot is required.
Other recommended kernel setting changes for database servers which will have a large number of connections are:
Additionally, if you are running PostgreSQL inside a zone, you may need to raise the zone resource usage limits as well. See "Chapter 2: Projects and Tasks" in the System Administrator's Guide for more information on projects and prctl.
If systemd is in use, some care must be taken that IPC resources (shared memory and semaphores) are not prematurely removed by the operating system. This is especially of concern when installing PostgreSQL from source. Users of distribution packages of PostgreSQL are less likely to be affected, as the postgres user is then normally created as a system user.
The setting RemoveIPC in logind.conf controls whether IPC objects are removed when a user fully logs out. System users are exempt. This setting defaults to on in stock systemd, but some operating system distributions default it to off.
A typical observed effect when this setting is on is that the semaphore objects used by a PostgreSQL server are removed at apparently random times, leading to the server crashing with log messages like
Different types of IPC objects (shared memory vs. semaphores, System V vs. POSIX) are treated slightly differently by systemd, so one might observe that some IPC resources are not removed in the same way as others. But it is not advisable to rely on these subtle differences.
A “user logging out” might happen as part of a maintenance job or manually when an administrator logs in as the postgres user or something similar, so it is hard to prevent in general.
What is a “system user” is determined at systemd compile time from the SYS_UID_MAX setting in /etc/login.defs.
Packaging and deployment scripts should be careful to create the postgres user as a system user by using useradd -r, adduser --system, or equivalent.
Alternatively, if the user account was created incorrectly or cannot be changed, it is recommended to set RemoveIPC=no in /etc/systemd/logind.conf or another appropriate configuration file.
At least one of these two things has to be ensured, or the PostgreSQL server will be very unreliable.
Unix-like operating systems enforce various kinds of resource limits that might interfere with the operation of your PostgreSQL server. Of particular importance are limits on the number of processes per user, the number of open files per process, and the amount of memory available to each process. Each of these has a “hard” and a “soft” limit. The soft limit is what actually counts but it can be changed by the user up to the hard limit. The hard limit can only be changed by the root user. The system call setrlimit is responsible for setting these parameters. The shell's built-in command ulimit (Bourne shells) or limit (csh) is used to control the resource limits from the command line. On BSD-derived systems the file /etc/login.conf controls the various resource limits set during login. See the operating system documentation for details. The relevant parameters are maxproc, openfiles, and datasize. For example:
(-cur is the soft limit. Append -max to set the hard limit.)
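To see which soft and hard limits currently apply to a process, the limits can be queried through the same interface that setrlimit uses. The sketch below uses Python's standard resource module (available on Unix-like systems only); the mapping from the login.conf parameter names to RLIMIT constants is shown for illustration, and RLIMIT_NPROC is skipped on platforms that do not define it.

```python
import resource

# login.conf parameter name -> name of the corresponding rlimit constant
LIMIT_NAMES = {
    "maxproc": "RLIMIT_NPROC",     # processes per user
    "openfiles": "RLIMIT_NOFILE",  # open files per process
    "datasize": "RLIMIT_DATA",     # data segment size per process
}


def current_limits() -> dict:
    """Return {parameter: (soft, hard)} for the limits this platform defines."""
    limits = {}
    for name, constant in LIMIT_NAMES.items():
        rlimit = getattr(resource, constant, None)
        if rlimit is not None:
            limits[name] = resource.getrlimit(rlimit)
    return limits


def describe(value: int) -> str:
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name}: soft={describe(soft)} hard={describe(hard)}")
```

Running this as the operating system user that owns the server shows the limits a server started from that session would inherit.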
Kernels can also have system-wide limits on some resources.
On Linux /proc/sys/fs/file-max determines the maximum number of open files that the kernel will support. It can be changed by writing a different number into the file or by adding an assignment in /etc/sysctl.conf. The maximum limit of files per process is fixed at the time the kernel is compiled; see /usr/src/linux/Documentation/proc.txt for more information.
The PostgreSQL server uses one process per connection so you should provide for at least as many processes as allowed connections, in addition to what you need for the rest of your system. This is usually not a problem but if you run several servers on one machine things might get tight.
The factory default limit on open files is often set to “socially friendly” values that allow many users to coexist on a machine without using an inappropriate fraction of the system resources. If you run many servers on a machine this is perhaps what you want, but on dedicated servers you might want to raise this limit.
On the other side of the coin, some systems allow individual processes to open large numbers of files; if more than a few processes do so then the system-wide limit can easily be exceeded. If you find this happening, and you do not want to alter the system-wide limit, you can set PostgreSQL's max_files_per_process configuration parameter to limit the consumption of open files.
In Linux 2.4 and later, the default virtual memory behavior is not optimal for PostgreSQL. Because of the way that the kernel implements memory overcommit, the kernel might terminate the PostgreSQL postmaster (the master server process) if the memory demands of either PostgreSQL or another process cause the system to run out of virtual memory.
If this happens, you will see a kernel message that looks like this (consult your system documentation and configuration on where to look for such a message):
This indicates that the postgres process has been terminated due to memory pressure. Although existing database connections will continue to function normally, no new connections will be accepted. To recover, PostgreSQL will need to be restarted.
One way to avoid this problem is to run PostgreSQL on a machine where you can be sure that other processes will not run the machine out of memory. If memory is tight, increasing the swap space of the operating system can help avoid the problem, because the out-of-memory (OOM) killer is invoked only when physical memory and swap space are exhausted.
If PostgreSQL itself is the cause of the system running out of memory, you can avoid the problem by changing your configuration. In some cases, it may help to lower memory-related configuration parameters, particularly shared_buffers and work_mem. In other cases, the problem may be caused by allowing too many connections to the database server itself. In many cases, it may be better to reduce max_connections and instead make use of external connection-pooling software.
On Linux 2.6 and later, it is possible to modify the kernel's behavior so that it will not “overcommit” memory. Although this setting will not prevent the OOM killer from being invoked altogether, it will lower the chances significantly and will therefore lead to more robust system behavior. This is done by selecting strict overcommit mode via sysctl:
or placing an equivalent entry in /etc/sysctl.conf. You might also wish to modify the related setting vm.overcommit_ratio. For details see the kernel documentation file https://www.kernel.org/doc/Documentation/vm/overcommit-accounting.
Another approach, which can be used with or without altering vm.overcommit_memory, is to set the process-specific OOM score adjustment value for the postmaster process to -1000, thereby guaranteeing it will not be targeted by the OOM killer. The simplest way to do this is to execute
in the postmaster's startup script just before invoking the postmaster. Note that this action must be done as root, or it will have no effect; so a root-owned startup script is the easiest place to do it. If you do this, you should also set these environment variables in the startup script before invoking the postmaster:
These settings will cause postmaster child processes to run with the normal OOM score adjustment of zero, so that the OOM killer can still target them at need. You could use some other value for PG_OOM_ADJUST_VALUE if you want the child processes to run with some other OOM score adjustment. (PG_OOM_ADJUST_VALUE can also be omitted, in which case it defaults to zero.) If you do not set PG_OOM_ADJUST_FILE, the child processes will run with the same OOM score adjustment as the postmaster, which is unwise since the whole point is to ensure that the postmaster has a preferential setting.
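The two environment variables discussed above can be prepared by any launcher, not only a shell script. The following Python sketch builds such an environment and reads back a process's current adjustment; the function names are hypothetical, and lowering the postmaster's own score to -1000 still has to be done separately as root, which this sketch does not attempt.

```python
import os
from typing import Optional


def postmaster_oom_env(base_env: Optional[dict] = None,
                       adjust_file: str = "/proc/self/oom_score_adj",
                       child_value: int = 0) -> dict:
    """Return an environment that resets the OOM score for child processes.

    PG_OOM_ADJUST_FILE names the file each child writes to;
    PG_OOM_ADJUST_VALUE is the value written (zero is the normal score).
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PG_OOM_ADJUST_FILE"] = adjust_file
    env["PG_OOM_ADJUST_VALUE"] = str(child_value)
    return env


def read_oom_score_adj(path: str = "/proc/self/oom_score_adj") -> Optional[int]:
    """Read an OOM score adjustment, or None if the file is unavailable."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    env = postmaster_oom_env()
    print(env["PG_OOM_ADJUST_FILE"], env["PG_OOM_ADJUST_VALUE"])
    print("current adjustment:", read_oom_score_adj())
```

The returned dictionary could be passed as the env argument when starting the server from a Python-based service wrapper.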
Older Linux kernels do not offer /proc/self/oom_score_adj, but may have a previous version of the same functionality called /proc/self/oom_adj. This works the same except the disable value is -17 not -1000.
Some vendors' Linux 2.4 kernels are reported to have early versions of the 2.6 overcommit sysctl parameter. However, setting vm.overcommit_memory to 2 on a 2.4 kernel that does not have the relevant code will make things worse, not better. It is recommended that you inspect the actual kernel source code (see the function vm_enough_memory in the file mm/mmap.c) to verify what is supported in your kernel before you try this in a 2.4 installation. The presence of the overcommit-accounting documentation file should not be taken as evidence that the feature is there. If in any doubt, consult a kernel expert or your kernel vendor.
Using huge pages reduces overhead when using large contiguous chunks of memory, as PostgreSQL does, particularly when using large values of shared_buffers. To use this feature in PostgreSQL you need a kernel with CONFIG_HUGETLBFS=y and CONFIG_HUGETLB_PAGE=y. You will also have to adjust the kernel setting vm.nr_hugepages. To estimate the number of huge pages needed, start PostgreSQL without huge pages enabled and check the postmaster's VmPeak value, as well as the system's huge page size, using the /proc file system. This might look like:
6490428 / 2048 gives approximately 3169.154, so in this example we need at least 3170 huge pages, which we can set with:
A larger setting would be appropriate if other programs on the machine also need huge pages. Don't forget to add this setting to /etc/sysctl.conf so that it will be reapplied after reboots.
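The estimate above is a ceiling division of VmPeak by the huge page size, both in kB. A minimal Python sketch of that arithmetic follows; the helper names are hypothetical, and the sample strings stand in for the relevant lines of the postmaster's /proc status file and of /proc/meminfo, using the numbers from the example.

```python
import re


def parse_kb(text: str, field: str) -> int:
    """Extract a 'Field:   12345 kB' value from /proc-style text."""
    match = re.search(rf"^{re.escape(field)}:\s+(\d+)\s+kB", text, re.MULTILINE)
    if match is None:
        raise ValueError(f"{field} not found")
    return int(match.group(1))


def huge_pages_needed(vm_peak_kb: int, huge_page_kb: int) -> int:
    """Smallest number of huge pages that covers vm_peak_kb."""
    return -(-vm_peak_kb // huge_page_kb)  # ceiling division on integers


if __name__ == "__main__":
    # Sample text only; on a real system read these from /proc.
    status_text = "VmPeak:\t 6490428 kB\n"
    meminfo_text = "Hugepagesize:       2048 kB\n"
    pages = huge_pages_needed(parse_kb(status_text, "VmPeak"),
                              parse_kb(meminfo_text, "Hugepagesize"))
    print(f"vm.nr_hugepages = {pages}")
```

Integer ceiling division avoids any floating-point rounding, so a VmPeak that is an exact multiple of the page size is not rounded up by one.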
Sometimes the kernel is not able to allocate the desired number of huge pages immediately, so it might be necessary to repeat the command or to reboot. (Immediately after a reboot, most of the machine's memory should be available to convert into huge pages.) To verify the huge page allocation situation, use:
It may also be necessary to give the database server's operating system user permission to use huge pages by setting vm.hugetlb_shm_group via sysctl, and/or give permission to lock memory with ulimit -l.
The default behavior for huge pages in PostgreSQL is to use them when possible and to fall back to normal pages when failing. To enforce the use of huge pages, you can set huge_pages to on in postgresql.conf. Note that with this setting PostgreSQL will fail to start if not enough huge pages are available.
For a detailed description of the Linux huge pages feature have a look at https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt.
While the server is running, it is not possible for a malicious user to take the place of the normal database server. However, when the server is down, it is possible for a local user to spoof the normal server by starting their own server. The spoof server could read passwords and queries sent by clients, but could not return any data because the PGDATA directory would still be secure because of directory permissions. Spoofing is possible because any user can start a database server; a client cannot identify an invalid server unless it is specially configured.
One way to prevent spoofing of local connections is to use a Unix domain socket directory (unix_socket_directories) that has write permission only for a trusted local user. This prevents a malicious user from creating their own socket file in that directory. If you are concerned that some applications might still reference /tmp for the socket file and hence be vulnerable to spoofing, during operating system startup create a symbolic link /tmp/.s.PGSQL.5432 that points to the relocated socket file. You also might need to modify your /tmp cleanup script to prevent removal of the symbolic link.
Another option for local connections is for clients to use requirepeer to specify the required owner of the server process connected to the socket.
To prevent spoofing on TCP connections, either use SSL certificates and make sure that clients check the server's certificate, or use GSSAPI encryption (or both, if they're on separate connections).
To prevent spoofing with SSL, the server must be configured to accept only hostssl
connections (Section 20.1) and have SSL key and certificate files (Section 18.9). The TCP client must connect using sslmode=verify-ca
or verify-full
and have the appropriate root certificate file installed (Section 33.18.1).
To prevent spoofing with GSSAPI, the server must be configured to accept only hostgssenc connections (Section 20.1) and use gss authentication with them. The TCP client must connect using gssencmode=require.
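The client-side settings named above can be sketched as libpq connection strings. This is a minimal illustration, not a recommended configuration: the host name, database name, socket directory, and certificate path are assumptions made for the example.

```shell
# Hypothetical host, database, and socket directory; adjust for your site.
host=db.example.com
db=mydb
socket_dir=/var/run/postgresql

# SSL: verify the certificate chain and the host name against the root certificate.
ssl_conninfo="host=$host dbname=$db sslmode=verify-full sslrootcert=$HOME/.postgresql/root.crt"

# GSSAPI: refuse to proceed unless the connection is GSSAPI-encrypted.
gss_conninfo="host=$host dbname=$db gssencmode=require"

# Unix-domain socket: require that the server process is owned by user postgres.
local_conninfo="host=$socket_dir dbname=$db requirepeer=postgres"

printf '%s\n' "$ssl_conninfo" "$gss_conninfo" "$local_conninfo"
```

Each string can be passed to a libpq client, for example psql "$ssl_conninfo".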
The PostgreSQL source code for released versions can be obtained from the download section of our website: https://www.postgresql.org/ftp/source/. Download the postgresql-version.tar.gz or postgresql-version.tar.bz2 file you're interested in, then unpack it:
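A minimal sketch of the unpacking step, assuming the .tar.gz file was downloaded into the current directory. The helper name unpack_source is hypothetical, and the version number is whatever release you downloaded.

```shell
# Unpack postgresql-VERSION.tar.gz from the current directory.
# $1: the version string that appears in the file name.
unpack_source() {
    # GNU tar and BSD tar detect the compression format on their own.
    tar xf "postgresql-$1.tar.gz"
}

# Usage (version is a placeholder): unpack_source 14.5
```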
This will create a directory postgresql-version under the current directory with the PostgreSQL sources. Change into that directory for the rest of the installation procedure.
Alternatively, you can use the Git version control system; see Section I.1 for more information.
There are several ways to shut down the database server. You control the type of shutdown by sending different signals to the master postgres process.
SIGTERM
This is the Smart Shutdown mode. After receiving SIGTERM, the server disallows new connections, but lets existing sessions end their work normally. It shuts down only after all of the sessions terminate. If the server is in online backup mode, it additionally waits until online backup mode is no longer active. While backup mode is active, new connections will still be allowed, but only to superusers (this exception allows a superuser to connect to terminate online backup mode). If the server is in recovery when a smart shutdown is requested, recovery and streaming replication will be stopped only after all regular sessions have terminated.
SIGINT
This is the Fast Shutdown mode. The server disallows new connections and sends all existing server processes SIGTERM, which will cause them to abort their current transactions and exit promptly. It then waits for all server processes to exit and finally shuts down. If the server is in online backup mode, backup mode will be terminated, rendering the backup useless.
SIGQUIT
This is the Immediate Shutdown mode. The server will send SIGQUIT to all child processes and wait for them to terminate. If any do not terminate within 5 seconds, they will be sent SIGKILL. The master server process exits as soon as all child processes have exited, without doing normal database shutdown processing. This will lead to recovery (by replaying the WAL) upon next start-up. This is recommended only in emergencies.
The pg_ctl program provides a convenient interface for sending these signals to shut down the server. Alternatively, you can send the signal directly using kill on non-Windows systems. The PID of the postgres process can be found using the ps program, or from the file postmaster.pid in the data directory. For example, to do a fast shutdown:
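The signal-based shutdown can be sketched as follows. The helper names are hypothetical and the data directory path in the usage line is a generic value; the sketch relies on the first line of postmaster.pid holding the PID of the main postgres process.

```shell
# Map a shutdown mode to the signal described above.
shutdown_signal() {
    case "$1" in
        smart)     echo TERM ;;
        fast)      echo INT ;;
        immediate) echo QUIT ;;
        *)         echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Print the server PID recorded in the given data directory.
server_pid() {
    head -n 1 "$1/postmaster.pid"
}

# Send the signal for the requested mode to the server.
# $1: data directory, $2: smart, fast, or immediate.
shutdown_server() {
    sig=$(shutdown_signal "$2") || return 1
    kill -s "$sig" "$(server_pid "$1")"
}

# Usage: shutdown_server /usr/local/pgsql/data fast
```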
It is best not to use SIGKILL to shut down the server. Doing so will prevent the server from releasing shared memory and semaphores. Furthermore, SIGKILL kills the postgres process without letting it relay the signal to its subprocesses, so it might be necessary to kill the individual subprocesses by hand as well.
To terminate an individual session while allowing other sessions to continue, use pg_terminate_backend() (see Table 9.83) or send a SIGTERM signal to the child process associated with the session.
Before anyone can access the database, you must start the database server. The database server program is called postgres. The postgres program must know where to find the data it is supposed to use. This is done with the -D option. Thus, the simplest way to start the server is to run postgres -D followed by the path of the data directory, which will leave the server running in the foreground. This must be done while logged into the PostgreSQL user account. Without -D, the server will try to use the data directory named by the environment variable PGDATA. If that variable is not provided either, it will fail.
Normally it is better to start postgres in the background. For this, use the usual Unix shell syntax:
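A minimal sketch of a background start with both output streams captured. The helper name start_in_background is hypothetical, and the paths in the usage line are generic values.

```shell
# Start the server in the background, sending stdout and stderr to a log file.
# $1: data directory, $2: log file.
start_in_background() {
    postgres -D "$1" >"$2" 2>&1 &
}

# Usage: start_in_background /usr/local/pgsql/data logfile
```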
It is important to store the server's stdout and stderr output somewhere, for example by redirecting both to a log file. It will help for auditing purposes and to diagnose problems. (See Section 24.3 for a more thorough discussion of log file handling.)
The postgres program also takes a number of other command-line options. For more information, see the postgres reference page and Chapter 19 below.
This shell syntax can get tedious quickly. Therefore the wrapper program pg_ctl is provided to simplify some tasks. For example, pg_ctl start -l logfile will start the server in the background and put the output into the named log file. The -D option has the same meaning here as for postgres. pg_ctl is also capable of stopping the server.
Normally, you will want to start the database server when the computer boots. Autostart scripts are operating-system-specific. There are a few distributed with PostgreSQL in the contrib/start-scripts directory. Installing one will require root privileges.
Different systems have different conventions for starting up daemons at boot time. Many systems have a file /etc/rc.local or /etc/rc.d/rc.local. Others use init.d or rc.d directories. Whatever you do, the server must be run by the PostgreSQL user account and not by root or any other user. Therefore you probably should form your commands using su postgres -c '...'. For example: su postgres -c 'pg_ctl start -D /usr/local/pgsql/data -l serverlog'
Here are a few more operating-system-specific suggestions. (In each case be sure to use the proper installation directory and user name where we show generic values.)
For FreeBSD, look at the file contrib/start-scripts/freebsd in the PostgreSQL source distribution.
On OpenBSD, add the following lines to the file /etc/rc.local:
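A sketch of such rc.local lines, using generic installation paths. The variable names and the log file location are assumptions for illustration; the guard makes the fragment a no-op on machines where PostgreSQL is not installed in that location.

```shell
pgbin=/usr/local/pgsql/bin       # generic installation directory
pgdata=/usr/local/pgsql/data     # generic data directory
pglog=/var/postgresql/log        # assumed log file location

# Start the server as the postgres user, but only if the binaries are present.
if [ -x "$pgbin/pg_ctl" ] && [ -x "$pgbin/postgres" ]; then
    su -l postgres -c "$pgbin/pg_ctl start -s -l $pglog -D $pgdata"
    printf ' postgresql'
fi
```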
On Linux systems, either add a pg_ctl start command line (with the full path to pg_ctl, a log file, and the data directory) to /etc/rc.d/rc.local or /etc/rc.local, or look at the file contrib/start-scripts/linux in the PostgreSQL source distribution.
When using systemd, you can use the following service unit file (e.g., at /etc/systemd/system/postgresql.service):
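A sketch of such a unit, written here to a file in the current directory so it can be reviewed before being copied into place as root. The installation paths and user name are generic values, and TimeoutSec=0 follows the timeout suggestion discussed in this section.

```shell
# Write the unit file to the current directory for review.
cat > postgresql.service <<'EOF'
[Unit]
Description=PostgreSQL database server
Documentation=man:postgres(1)

[Service]
Type=notify
User=postgres
ExecStart=/usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data
ExecReload=/bin/kill -HUP $MAINPID
KillMode=mixed
KillSignal=SIGINT
TimeoutSec=0

[Install]
WantedBy=multi-user.target
EOF
```

KillSignal=SIGINT makes a systemd stop request correspond to the Fast Shutdown mode.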
Using Type=notify requires that the server binary was built with configure --with-systemd.
Consider carefully the timeout setting. systemd has a default timeout of 90 seconds as of this writing and will kill a process that does not notify readiness within that time. But a PostgreSQL server that might have to perform crash recovery at startup could take much longer to become ready. The suggested value of 0 disables the timeout logic.
On NetBSD, use either the FreeBSD or Linux start scripts, depending on preference.
On Solaris, create a file called /etc/init.d/postgresql that contains the following line: su - postgres -c "/usr/local/pgsql/bin/pg_ctl start -l logfile -D /usr/local/pgsql/data"
Then, create a symbolic link to it in /etc/rc3.d as S99postgresql.
While the server is running, its PID is stored in the file postmaster.pid in the data directory. This is used to prevent multiple server instances from running in the same data directory and can also be used for shutting down the server.
There are several common reasons the server might fail to start. Check the server's log file, or start it by hand (without redirecting standard output or standard error) and see what error messages appear. Below we explain some of the most common error messages in more detail.
A failure to bind to the server's address with the kernel error Address already in use usually means just what it suggests: you tried to start another server on the same port where one is already running. However, if the kernel error message is not Address already in use or some variant of that, there might be a different problem. For example, trying to start a server on a reserved port number might instead draw a Permission denied error.
A message reporting that the server could not create a shared memory segment probably means your kernel's limit on the size of shared memory is smaller than the work area PostgreSQL is trying to create (the failed request reports the size, for instance 4011376640 bytes). Or it could mean that you do not have System-V-style shared memory support configured into your kernel at all. As a temporary workaround, you can try starting the server with a smaller-than-normal number of buffers (shared_buffers). You will eventually want to reconfigure your kernel to increase the allowed shared memory size. You might also see this message when trying to start multiple servers on the same machine, if their total space requested exceeds the kernel limit.
An error reporting that semaphores could not be created, with the kernel message No space left on device, does not mean you've run out of disk space. It means your kernel's limit on the number of System V semaphores is smaller than the number PostgreSQL wants to create. As above, you might be able to work around the problem by starting the server with a reduced number of allowed connections (max_connections), but you'll eventually want to increase the kernel limit.
If you get an “illegal system call” error, it is likely that shared memory or semaphores are not supported in your kernel at all. In that case your only option is to reconfigure the kernel to enable these features.
Details about configuring System V IPC facilities are given in Section 18.4.1.
Although the error conditions possible on the client side are quite varied and application-dependent, a few of them might be directly related to how the server was started. Conditions other than those shown below should be documented with the respective client application.
A client error saying that it could not connect to the server, with the kernel message Connection refused, is the generic “I couldn't find a server to talk to” failure. It takes this form when TCP/IP communication is attempted. A common mistake is to forget to configure the server to allow TCP/IP connections.
Alternatively, when attempting Unix-domain socket communication to a local server, the error names the socket file the client tried to use, such as /tmp/.s.PGSQL.5432.
That socket path is useful in verifying that the client is trying to connect to the right place. If there is in fact no server running there, the kernel error message will typically be either Connection refused or No such file or directory. (It is important to realize that Connection refused in this context does not mean that the server got your connection request and rejected it. That case will produce a different message, as shown in Section 20.15.) Other error messages such as Connection timed out might indicate more fundamental problems, like lack of network connectivity.
Configuration
The first step of the installation procedure is to configure the source tree for your system and choose the options you would like. This is done by running the configure script. For a default installation simply enter ./configure with no options.
This script will run a number of tests to determine values for various system dependent variables and detect any quirks of your operating system, and finally will create several files in the build tree to record what it found.
You can also run configure in a directory outside the source tree, and then build there, if you want to keep the build directory separate from the original source files. This procedure is called a VPATH build. Here's how:
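A minimal sketch of a VPATH build. The helper name vpath_build is hypothetical; it takes the absolute path of the source tree and the build directory to create, and runs in a subshell so the caller's working directory is left unchanged.

```shell
# Configure and build outside the source tree.
# $1: absolute path to the PostgreSQL source tree
# $2: build directory (created if missing)
vpath_build() (
    src=$1
    build=$2
    mkdir -p "$build" && cd "$build" && "$src/configure" && make
)

# Usage (paths are placeholders): vpath_build /path/to/source/tree /path/to/build_dir
```

Any configure options would go on the configure command line inside the function.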
The default configuration will build the server and utilities, as well as all client applications and interfaces that require only a C compiler. All files will be installed under /usr/local/pgsql by default.
You can customize the build and installation process by supplying one or more command line options to configure. Typically you would customize the install location, or the set of optional features that are built. configure has a large number of options, which are described in Section 17.4.1.
Also, configure responds to certain environment variables, as described in Section 17.4.2. These provide additional ways to customize the configuration.
Build
To start the build, type either make or make all.
(Remember to use GNU make.) The build will take a few minutes depending on your hardware.
If you want to build everything that can be built, including the documentation (HTML and man pages), and the additional modules (contrib), type make world instead.
If you want to build everything that can be built, including the additional modules (contrib), but without the documentation, type make world-bin instead.
If you want to invoke the build from another makefile rather than manually, you must unset MAKELEVEL or set it to zero, for instance by writing the recipe as $(MAKE) -C postgresql MAKELEVEL=0 all.
Failure to do that can lead to strange error messages, typically about missing header files.
Regression Tests
If you want to test the newly built server before you install it, you can run the regression tests at this point. The regression tests are a test suite to verify that PostgreSQL runs on your machine in the way the developers expected it to. Type make check.
(This won't work as root; do it as an unprivileged user.) See Chapter 33 for detailed information about interpreting the test results. You can repeat this test at any later time by issuing the same command.
Installing the Files
If you are upgrading an existing system be sure to read Section 19.6, which has instructions about upgrading a cluster.
To install PostgreSQL, enter make install.
This will install files into the directories that were specified in the Configuration step. Make sure that you have appropriate permissions to write into that area. Normally you need to do this step as root. Alternatively, you can create the target directories in advance and arrange for appropriate permissions to be granted.
To install the documentation (HTML and man pages), enter make install-docs.
If you built the world above, type make install-world instead. This also installs the documentation.
If you built the world without the documentation above, type make install-world-bin instead.
You can use make install-strip instead of make install to strip the executable files and libraries as they are installed. This will save some space. If you built with debugging support, stripping will effectively remove the debugging support, so it should only be done if debugging is no longer needed. install-strip tries to do a reasonable job saving space, but it does not have perfect knowledge of how to strip every unneeded byte from an executable file, so if you want to save all the disk space you possibly can, you will have to do manual work.
The standard installation provides all the header files needed for client application development as well as for server-side program development, such as custom functions or data types written in C.
Client-only installation: If you want to install only the client applications and interface libraries, then you can use these commands:
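A sketch of the client-only installation, to be run from the top of a configured and built source tree. The helper name install_client_only is hypothetical; it simply runs make install in each of the relevant subdirectories and stops at the first failure.

```shell
# Install only client applications, headers, interface libraries, and docs.
install_client_only() {
    for dir in src/bin src/include src/interfaces doc; do
        make -C "$dir" install || return 1
    done
}

# Usage, from the top of the build tree: install_client_only
```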
src/bin has a few binaries for server-only use, but they are small.
Uninstallation: To undo the installation use the command make uninstall. However, this will not remove any created directories.
Cleaning: After the installation you can free disk space by removing the built files from the source tree with the command make clean. This will preserve the files made by the configure program, so that you can rebuild everything with make later on. To reset the source tree to the state in which it was distributed, use make distclean. If you are going to build for several platforms within the same source tree you must do this and re-configure for each platform. (Alternatively, use a separate build tree for each platform, so that the source tree remains unmodified.)
If you perform a build and then discover that your configure options were wrong, or if you change anything that configure investigates (for example, software upgrades), then it's a good idea to do make distclean before reconfiguring and rebuilding. Without this, your changes in configuration choices might not propagate everywhere they need to.
configure Options
configure's command line options are explained below. This list is not exhaustive (use ./configure --help to get one that is). The options not covered here are meant for advanced use-cases such as cross-compilation, and are documented in the standard Autoconf documentation.
These options control where make install will put the files. The --prefix option is sufficient for most cases. If you have special needs, you can customize the installation subdirectories with the other options described in this section. Beware however that changing the relative locations of the different subdirectories may render the installation non-relocatable, meaning you won't be able to move it after installation. (The man and doc locations are not affected by this restriction.) For relocatable installs, you might want to use the --disable-rpath option described later.
--prefix=PREFIX
Install all files under the directory PREFIX instead of /usr/local/pgsql. The actual files will be installed into various subdirectories; no files will ever be installed directly into the PREFIX directory.
--exec-prefix=EXEC-PREFIX
You can install architecture-dependent files under a different prefix, EXEC-PREFIX, than what PREFIX was set to. This can be useful to share architecture-independent files between hosts. If you omit this, then EXEC-PREFIX is set equal to PREFIX and both architecture-dependent and independent files will be installed under the same tree, which is probably what you want.
--bindir=DIRECTORY
Specifies the directory for executable programs. The default is EXEC-PREFIX/bin, which normally means /usr/local/pgsql/bin.
--sysconfdir=DIRECTORY
Sets the directory for various configuration files, PREFIX/etc by default.
--libdir=DIRECTORY
Sets the location to install libraries and dynamically loadable modules. The default is EXEC-PREFIX/lib.
--includedir=DIRECTORY
Sets the directory for installing C and C++ header files. The default is PREFIX/include.
--datarootdir=DIRECTORY
Sets the root directory for various types of read-only data files. This only sets the default for some of the following options. The default is PREFIX/share.
--datadir=DIRECTORY
Sets the directory for read-only data files used by the installed programs. The default is DATAROOTDIR. Note that this has nothing to do with where your database files will be placed.
--localedir=DIRECTORY
Sets the directory for installing locale data, in particular message translation catalog files. The default is DATAROOTDIR/locale.
--mandir=DIRECTORY
The man pages that come with PostgreSQL will be installed under this directory, in their respective manx subdirectories. The default is DATAROOTDIR/man.
--docdir=DIRECTORY
Sets the root directory for installing documentation files, except “man” pages. This only sets the default for the following options. The default value for this option is DATAROOTDIR/doc/postgresql.
--htmldir=DIRECTORY
The HTML-formatted documentation for PostgreSQL will be installed under this directory. The default is DATAROOTDIR.
Care has been taken to make it possible to install PostgreSQL into shared installation locations (such as /usr/local/include) without interfering with the namespace of the rest of the system. First, the string “/postgresql” is automatically appended to datadir, sysconfdir, and docdir, unless the fully expanded directory name already contains the string “postgres” or “pgsql”. For example, if you choose /usr/local as prefix, the documentation will be installed in /usr/local/doc/postgresql, but if the prefix is /opt/postgres, then it will be in /opt/postgres/doc. The public C header files of the client interfaces are installed into includedir and are namespace-clean. The internal header files and the server header files are installed into private directories under includedir. See the documentation of each interface for information about how to access its header files. Finally, a private subdirectory will also be created, if appropriate, under libdir for dynamically loadable modules.
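The naming rule for datadir, sysconfdir, and docdir can be illustrated with a small sketch. The function effective_dir is hypothetical and only models the rule as stated above; it is not how the build system itself is implemented.

```shell
# Append /postgresql unless the expanded path already contains
# "postgres" or "pgsql".
effective_dir() {
    case "$1" in
        *postgres*|*pgsql*) echo "$1" ;;
        *)                  echo "$1/postgresql" ;;
    esac
}

effective_dir /usr/local/doc       # prints /usr/local/doc/postgresql
effective_dir /opt/postgres/doc    # prints /opt/postgres/doc
```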
The options described in this section enable building of various PostgreSQL features that are not built by default. Most of these are non-default only because they require additional software, as described in Section 17.2.
--enable-nls[=LANGUAGES]
Enables Native Language Support (NLS), that is, the ability to display a program's messages in a language other than English. LANGUAGES is an optional space-separated list of codes of the languages that you want supported, for example --enable-nls='de fr'. (The intersection between your list and the set of actually provided translations will be computed automatically.) If you do not specify a list, then all available translations are installed.
To use this option, you will need an implementation of the Gettext API.
--with-perl
Build the PL/Perl server-side language.
--with-python
Build the PL/Python server-side language.
--with-tcl
Build the PL/Tcl server-side language.
--with-tclconfig=DIRECTORY
Tcl installs the file tclConfig.sh, which contains configuration information needed to build modules interfacing to Tcl. This file is normally found automatically at a well-known location, but if you want to use a different version of Tcl you can specify the directory in which to look for tclConfig.sh.
--with-icu
Build with support for the ICU library, enabling use of ICU collation features (see Section 24.2). This requires the ICU4C package to be installed. The minimum required version of ICU4C is currently 4.2.
By default, pkg-config will be used to find the required compilation options. This is supported for ICU4C version 4.6 and later. For older versions, or if pkg-config is not available, the variables ICU_CFLAGS and ICU_LIBS can be specified to configure, like in this example:
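A sketch of passing the ICU variables to configure, assuming ICU4C is installed under a single prefix with include and lib subdirectories. The helper name configure_with_icu and the prefix in the usage line are hypothetical; run it from the top of the source tree.

```shell
# Configure with ICU support without relying on pkg-config.
# $1: prefix under which ICU4C headers and libraries are installed.
configure_with_icu() {
    ./configure --with-icu \
        ICU_CFLAGS="-I$1/include" \
        ICU_LIBS="-L$1/lib -licui18n -licuuc -licudata"
}

# Usage (prefix is a placeholder): configure_with_icu /some/where
```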
(If ICU4C is in the default search path for the compiler, then you still need to specify nonempty strings in order to avoid use of pkg-config, for example, ICU_CFLAGS=' '.)
--with-llvm
Build with support for LLVM based JIT compilation (see Chapter 32). This requires the LLVM library to be installed. The minimum required version of LLVM is currently 3.9.
llvm-config will be used to find the required compilation options. llvm-config, and then llvm-config-$major-$minor for all supported versions, will be searched for in your PATH. If that would not yield the desired program, use LLVM_CONFIG to specify a path to the correct llvm-config. For example: ./configure --with-llvm LLVM_CONFIG='/path/to/llvm/bin/llvm-config'
LLVM support requires a compatible clang compiler (specified, if necessary, using the CLANG environment variable), and a working C++ compiler (specified, if necessary, using the CXX environment variable).
--with-lz4
Build with LZ4 compression support.
--with-zstd
Build with Zstandard compression support.
--with-ssl=LIBRARY
Build with support for SSL (encrypted) connections. The only LIBRARY supported is openssl. This requires the OpenSSL package to be installed. configure will check for the required header files and libraries to make sure that your OpenSSL installation is sufficient before proceeding.
--with-openssl
Obsolete equivalent of --with-ssl=openssl.
--with-gssapi
Build with support for GSSAPI authentication. On many systems, the GSSAPI system (usually a part of the Kerberos installation) is not installed in a location that is searched by default (e.g., /usr/include, /usr/lib), so you must use the options --with-includes and --with-libraries in addition to this option. configure will check for the required header files and libraries to make sure that your GSSAPI installation is sufficient before proceeding.
--with-ldap
Build with LDAP support for authentication and connection parameter lookup (see Section 34.18 and Section 21.10 for more information). On Unix, this requires the OpenLDAP package to be installed. On Windows, the default WinLDAP library is used. configure will check for the required header files and libraries to make sure that your OpenLDAP installation is sufficient before proceeding.
--with-pam
Build with PAM (Pluggable Authentication Modules) support.
--with-bsd-auth
Build with BSD Authentication support. (The BSD Authentication framework is currently only available on OpenBSD.)
--with-systemd
Build with support for systemd service notifications. This improves integration if the server is started under systemd but has no impact otherwise; see Section 19.3 for more information. libsystemd and the associated header files need to be installed to use this option.
--with-bonjour
Build with support for Bonjour automatic service discovery. This requires Bonjour support in your operating system. Recommended on macOS.
--with-uuid=LIBRARY
Build the uuid-ossp module (which provides functions to generate UUIDs), using the specified UUID library. LIBRARY must be one of:
bsd to use the UUID functions found in FreeBSD and some other BSD-derived systems
e2fs to use the UUID library created by the e2fsprogs project; this library is present in most Linux systems and in macOS, and can be obtained for other platforms as well
ossp to use the OSSP UUID library
--with-ossp-uuid
Obsolete equivalent of --with-uuid=ossp.
--with-libxml
Build with libxml2, enabling SQL/XML support. Libxml2 version 2.6.23 or later is required for this feature.
To detect the required compiler and linker options, PostgreSQL will query pkg-config, if that is installed and knows about libxml2. Otherwise the program xml2-config, which is installed by libxml2, will be used if it is found. Use of pkg-config is preferred, because it can deal with multi-architecture installations better.
To use a libxml2 installation that is in an unusual location, you can set pkg-config-related environment variables (see its documentation), or set the environment variable XML2_CONFIG to point to the xml2-config program belonging to the libxml2 installation, or set the variables XML2_CFLAGS and XML2_LIBS. (If pkg-config is installed, then to override its idea of where libxml2 is you must either set XML2_CONFIG or set both XML2_CFLAGS and XML2_LIBS to nonempty strings.)
--with-libxslt
Build with libxslt, enabling the xml2 module to perform XSL transformations of XML. --with-libxml must be specified as well.
The options described in this section allow disabling certain PostgreSQL features that are built by default, but which might need to be turned off if the required software or system features are not available. Using these options is not recommended unless really necessary.
--without-readline
Prevents use of the Readline library (and libedit as well). This option disables command-line editing and history in psql.
--with-libedit-preferred
Favors the use of the BSD-licensed libedit library rather than GPL-licensed Readline. This option is significant only if you have both libraries installed; the default in that case is to use Readline.
--without-zlib
Prevents use of the Zlib library. This disables support for compressed archives in pg_dump and pg_restore.
--disable-spinlocks
Allow the build to succeed even if PostgreSQL has no CPU spinlock support for the platform. The lack of spinlock support will result in very poor performance; therefore, this option should only be used if the build aborts and informs you that the platform lacks spinlock support. If this option is required to build PostgreSQL on your platform, please report the problem to the PostgreSQL developers.
--disable-atomics
Disable use of CPU atomic operations. This option does nothing on platforms that lack such operations. On platforms that do have them, this will result in poor performance. This option is only useful for debugging or making performance comparisons.
--disable-thread-safety
Disable the thread-safety of client libraries. This prevents concurrent threads in libpq and ECPG programs from safely controlling their private connection handles. Use this only on platforms with deficient threading support.
--with-includes=DIRECTORIES
DIRECTORIES is a colon-separated list of directories that will be added to the list the compiler searches for header files. If you have optional packages (such as GNU Readline) installed in a non-standard location, you have to use this option and probably also the corresponding --with-libraries option.
Example: --with-includes=/opt/gnu/include:/usr/sup/include.
--with-libraries=DIRECTORIES
DIRECTORIES is a colon-separated list of directories to search for libraries. You will probably have to use this option (and the corresponding --with-includes option) if you have packages installed in non-standard locations.
Example: --with-libraries=/opt/gnu/lib:/usr/sup/lib.
--with-system-tzdata=DIRECTORY
PostgreSQL includes its own time zone database, which it requires for date and time operations. This time zone database is in fact compatible with the IANA time zone database provided by many operating systems such as FreeBSD, Linux, and Solaris, so it would be redundant to install it again. When this option is used, the system-supplied time zone database in DIRECTORY is used instead of the one included in the PostgreSQL source distribution. DIRECTORY must be specified as an absolute path. /usr/share/zoneinfo is a likely directory on some operating systems. Note that the installation routine will not detect mismatching or erroneous time zone data. If you use this option, you are advised to run the regression tests to verify that the time zone data you have pointed to works correctly with PostgreSQL.
This option is mainly aimed at binary package distributors who know their target operating system well. The main advantage of using this option is that the PostgreSQL package won't need to be upgraded whenever any of the many local daylight-saving time rules change. Another advantage is that PostgreSQL can be cross-compiled more straightforwardly if the time zone database files do not need to be built during the installation.
--with-extra-version=STRING
Append STRING to the PostgreSQL version number. You can use this, for example, to mark binaries built from unreleased Git snapshots or containing custom patches with an extra version string, such as a git describe identifier or a distribution package release number.
--disable-rpath
Do not mark PostgreSQL's executables to indicate that they should search for shared libraries in the installation's library directory (see --libdir). On most platforms, this marking uses an absolute path to the library directory, so that it will be unhelpful if you relocate the installation later. However, you will then need to provide some other way for the executables to find the shared libraries. Typically this requires configuring the operating system's dynamic linker to search the library directory; see Section 17.5.1 for more detail.
It's fairly common, particularly for test builds, to adjust the default port number with --with-pgport. The other options in this section are recommended only for advanced users.
--with-pgport=NUMBER
Set NUMBER as the default port number for server and clients. The default is 5432. The port can always be changed later on, but if you specify it here then both server and clients will have the same default compiled in, which can be very convenient. Usually the only good reason to select a non-default value is if you intend to run multiple PostgreSQL servers on the same machine.
--with-krb-srvnam=NAME
The default name of the Kerberos service principal used by GSSAPI. postgres is the default. There's usually no reason to change this unless you are building for a Windows environment, in which case it must be set to upper case POSTGRES.
--with-segsize=SEGSIZE
Set the segment size, in gigabytes. Large tables are divided into multiple operating-system files, each of size equal to the segment size. This avoids problems with file size limits that exist on many platforms. The default segment size, 1 gigabyte, is safe on all supported platforms. If your operating system has “largefile” support (which most do, nowadays), you can use a larger segment size. This can be helpful to reduce the number of file descriptors consumed when working with very large tables. But be careful not to select a value larger than is supported by your platform and the file systems you intend to use. Other tools you might wish to use, such as tar, could also set limits on the usable file size. It is recommended, though not absolutely required, that this value be a power of 2. Note that changing this value breaks on-disk database compatibility, meaning you cannot use pg_upgrade to upgrade to a build with a different segment size.
--with-blocksize=BLOCKSIZE
Set the block size, in kilobytes. This is the unit of storage and I/O within tables. The default, 8 kilobytes, is suitable for most situations; but other values may be useful in special cases. The value must be a power of 2 between 1 and 32 (kilobytes). Note that changing this value breaks on-disk database compatibility, meaning you cannot use pg_upgrade to upgrade to a build with a different block size.
--with-wal-blocksize=
BLOCKSIZE
Set the WAL block size, in kilobytes. This is the unit of storage and I/O within the WAL log. The default, 8 kilobytes, is suitable for most situations; but other values may be useful in special cases. The value must be a power of 2 between 1 and 64 (kilobytes). Note that changing this value breaks on-disk database compatibility, meaning you cannot use pg_upgrade
to upgrade to a build with a different WAL block size.
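The size rules for these three options can be illustrated with a little arithmetic. The following Python sketch is not part of PostgreSQL; the helper names are hypothetical, and it assumes a gigabyte of 1024^3 bytes and a simple ceiling division for the number of segment files.

```python
GIB = 1024 ** 3  # assumed size of one gigabyte, in bytes


def is_valid_block_size(kilobytes, max_kilobytes=32):
    """Return True if kilobytes is a power of 2 between 1 and max_kilobytes."""
    is_power_of_two = kilobytes >= 1 and (kilobytes & (kilobytes - 1)) == 0
    return is_power_of_two and kilobytes <= max_kilobytes


def segment_file_count(table_bytes, segment_gb=1):
    """Estimate how many segment files are needed for a table of table_bytes."""
    segment_bytes = segment_gb * GIB
    # Ceiling division; even an empty table is counted as one file here.
    return max(1, -(-table_bytes // segment_bytes))


print(is_valid_block_size(8))                      # the default block size
print(is_valid_block_size(64, max_kilobytes=64))   # upper bound for the WAL block size
print(segment_file_count(10 * GIB))                # 10 GB table, default segments
print(segment_file_count(10 * GIB, segment_gb=4))  # same table, 4 GB segments
```

A larger segment size lowers the file count for the same table, which is the file-descriptor saving mentioned above.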
Most of the options in this section are only of interest for developing or debugging PostgreSQL. They are not recommended for production builds, except for --enable-debug, which can be useful to enable detailed bug reports in the unlucky event that you encounter a bug. On platforms supporting DTrace, --enable-dtrace may also be reasonable to use in production.
When building an installation that will be used to develop code inside the server, it is recommended to use at least the options --enable-debug and --enable-cassert.
--enable-debug
Compiles all programs and libraries with debugging symbols. This means that you can run the programs in a debugger to analyze problems. This enlarges the size of the installed executables considerably, and on non-GCC compilers it usually also disables compiler optimization, causing slowdowns. However, having the symbols available is extremely helpful for dealing with any problems that might arise. Currently, this option is recommended for production installations only if you use GCC. But you should always have it on if you are doing development work or running a beta version.
--enable-cassert
Enables assertion checks in the server, which test for many “cannot happen” conditions. This is invaluable for code development purposes, but the tests can slow down the server significantly. Also, having the tests turned on won't necessarily enhance the stability of your server! The assertion checks are not categorized for severity, and so what might be a relatively harmless bug will still lead to server restarts if it triggers an assertion failure. This option is not recommended for production use, but you should have it on for development work or when running a beta version.
--enable-tap-tests
Enable tests using the Perl TAP tools. This requires a Perl installation and the Perl module IPC::Run. See Section 33.4 for more information.
--enable-depend
Enables automatic dependency tracking. With this option, the makefiles are set up so that all affected object files will be rebuilt when any header file is changed. This is useful if you are doing development work, but is just wasted overhead if you intend only to compile once and install. At present, this option only works with GCC.
--enable-coverage
If using GCC, all programs and libraries are compiled with code coverage testing instrumentation. When run, they generate files in the build directory with code coverage metrics. See Section 33.5 for more information. This option is for use only with GCC and when doing development work.
--enable-profiling
If using GCC, all programs and libraries are compiled so they can be profiled. On backend exit, a subdirectory will be created that contains the gmon.out file containing profile data. This option is for use only with GCC and when doing development work.
--enable-dtrace
Compiles PostgreSQL with support for the dynamic tracing tool DTrace. See Section 28.5 for more information.
To point to the dtrace program, the environment variable DTRACE can be set. This will often be necessary because dtrace is typically installed under /usr/sbin, which might not be in your PATH.
Extra command-line options for the dtrace program can be specified in the environment variable DTRACEFLAGS. On Solaris, to include DTrace support in a 64-bit binary, you must specify DTRACEFLAGS="-64". For example, using the GCC compiler:
Using Sun's compiler:
configure Environment Variables
In addition to the ordinary command-line options described above, configure responds to a number of environment variables. You can specify environment variables on the configure command line, for example:
In this usage an environment variable is little different from a command-line option. You can also set such variables beforehand:
This usage can be convenient because many programs' configuration scripts respond to these variables in similar ways.
The most commonly used of these environment variables are CC and CFLAGS. If you prefer a C compiler different from the one configure picks, you can set the variable CC to the program of your choice. By default, configure will pick gcc if available, else the platform's default (usually cc). Similarly, you can override the default compiler flags if needed with the CFLAGS variable.
Here is a list of the significant variables that can be set in this manner:
BISON
Bison program
CC
C compiler
CFLAGS
options to pass to the C compiler
CLANG
path to clang program used to process source code for inlining when compiling with --with-llvm
CPP
C preprocessor
CPPFLAGS
options to pass to the C preprocessor
CXX
C++ compiler
CXXFLAGS
options to pass to the C++ compiler
DTRACE
location of the dtrace program
DTRACEFLAGS
options to pass to the dtrace program
FLEX
Flex program
LDFLAGS
options to use when linking either executables or shared libraries
LDFLAGS_EX
additional options for linking executables only
LDFLAGS_SL
additional options for linking shared libraries only
LLVM_CONFIG
llvm-config program used to locate the LLVM installation
MSGFMT
msgfmt program for native language support
PERL
Perl interpreter program. This will be used to determine the dependencies for building PL/Perl. The default is perl.
PYTHON
Python interpreter program. This will be used to determine the dependencies for building PL/Python. If this is not set, the following are probed in this order: python3 python.
TCLSH
Tcl interpreter program. This will be used to determine the dependencies for building PL/Tcl. If this is not set, the following are probed in this order: tclsh tcl tclsh8.6 tclsh86 tclsh8.5 tclsh85 tclsh8.4 tclsh84.
XML2_CONFIG
xml2-config program used to locate the libxml2 installation
Sometimes it is useful to add compiler flags after the fact to the set chosen by configure. An important example is that gcc's -Werror option cannot be included in the CFLAGS passed to configure, because it will break many of configure's built-in tests. To add such flags, include them in the COPT environment variable while running make. The contents of COPT are added to both the CFLAGS and LDFLAGS options set up by configure. For example, you could run make COPT='-Werror', or set the variable beforehand with export COPT='-Werror' and then run make.
If using GCC, it is best to build with an optimization level of at least -O1, because using no optimization (-O0) disables some important compiler warnings (such as the use of uninitialized variables). However, non-zero optimization levels can complicate debugging because stepping through compiled code will usually not match up one-to-one with source code lines. If you get confused while trying to debug optimized code, recompile the specific files of interest with -O0. An easy way to do this is by passing an option to make: make PROFILE=-O0 file.o.
The COPT and PROFILE environment variables are actually handled identically by the PostgreSQL makefiles. Which to use is a matter of preference, but a common habit among developers is to use PROFILE for one-time flag adjustments, while COPT might be kept set all the time.
PostgreSQL offers encryption at several levels, and provides flexibility in protecting data from disclosure due to database server theft, unscrupulous administrators, and insecure networks. Encryption might also be required to secure sensitive data such as medical records or financial transactions.
Database user passwords are stored as hashes (determined by the setting password_encryption), so the administrator cannot determine the actual password assigned to the user. If SCRAM or MD5 encryption is used for client authentication, the unencrypted password is never even temporarily present on the server because the client encrypts it before sending it across the network. SCRAM is preferred, because it is an Internet standard and is more secure than the PostgreSQL-specific MD5 authentication protocol.
The pgcrypto module allows certain fields to be stored encrypted. This is useful if only some of the data is sensitive. The client supplies the decryption key and the data is decrypted on the server and then sent to the client.
The decrypted data and the decryption key are present on the server for a brief time while it is being decrypted and communicated between the client and server. This presents a brief moment where the data and keys can be intercepted by someone with complete access to the database server, such as the system administrator.
Storage encryption can be performed at the file system level or the block level. Linux file system encryption options include eCryptfs and EncFS, while FreeBSD uses PEFS. Block level or full disk encryption options include dm-crypt + LUKS on Linux and GEOM modules geli and gbde on FreeBSD. Many other operating systems support this functionality, including Windows.
This mechanism prevents unencrypted data from being read from the drives if the drives or the entire computer is stolen. This does not protect against attacks while the file system is mounted, because when mounted, the operating system provides an unencrypted view of the data. However, to mount the file system, you need some way for the encryption key to be passed to the operating system, and sometimes the key is stored somewhere on the host that mounts the disk.
SSL connections encrypt all data sent across the network: the password, the queries, and the data returned. The pg_hba.conf file allows administrators to specify which hosts can use non-encrypted connections (host) and which require SSL-encrypted connections (hostssl). Also, clients can specify that they connect to servers only via SSL.
GSSAPI-encrypted connections encrypt all data sent across the network, including queries and data returned. (No password is sent across the network.) The pg_hba.conf file allows administrators to specify which hosts can use non-encrypted connections (host) and which require GSSAPI-encrypted connections (hostgssenc). Also, clients can specify that they connect to servers only on GSSAPI-encrypted connections (gssencmode=require).
Stunnel or SSH can also be used to encrypt transmissions.
It is possible for both the client and server to provide SSL certificates to each other. It takes some extra configuration on each side, but this provides stronger verification of identity than the mere use of passwords. It prevents a computer from pretending to be the server just long enough to read the password sent by the client. It also helps prevent “man in the middle” attacks where a computer between the client and server pretends to be the server and reads and passes all data between the client and server.
If the system administrator for the server's machine cannot be trusted, it is necessary for the client to encrypt the data; this way, unencrypted data never appears on the database server. Data is encrypted on the client before being sent to the server, and database results have to be decrypted on the client before being used.
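To register the Windows event log with the operating system, use the following command:
This creates the registry entries used by the event viewer, under the default event source named PostgreSQL.
To specify a different event source name (see event_source), use the /n and /i options:
To unregister the event log from the operating system, use the following command:
To enable event logging in the database server, modify log_destination in postgresql.conf so that it includes eventlog.
There are many configuration parameters that affect the behavior of the database system. In the first section of this chapter we describe how to set these parameters. The subsequent sections discuss each parameter in detail.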
A platform (that is, a CPU architecture and operating system combination) is considered supported by the PostgreSQL development community if the code contains provisions to work on that platform and it has recently been verified to build and pass its regression tests on that platform. Currently, most testing of platform compatibility is done automatically by test machines in the PostgreSQL Build Farm. If you are interested in using PostgreSQL on a platform that is not represented in the build farm, but on which the code works or can be made to work, you are strongly encouraged to set up a build farm member machine so that continued compatibility can be assured.
In general, PostgreSQL can be expected to work on these CPU architectures: x86, x86_64, IA64, PowerPC, PowerPC 64, S/390, S/390x, Sparc, Sparc 64, ARM, MIPS, MIPSEL, and PA-RISC. Code support exists for M68K, M32R, and VAX, but these architectures are not known to have been tested recently. It is often possible to build on an unsupported CPU type by configuring with --disable-spinlocks, but performance will be poor.
PostgreSQL can be expected to work on these operating systems: Linux (all recent distributions), Windows (Win2000 SP4 and later), FreeBSD, OpenBSD, NetBSD, macOS, AIX, HP/UX, and Solaris. Other Unix-like systems may also work but are not currently being tested. In most cases, all CPU architectures supported by a given operating system will work. Look in Section 16.7 below to see if there is information specific to your operating system, particularly if using an older system.
If you have installation problems on a platform that is known to be supported according to recent build farm results, please report it to pgsql-bugs@postgresql.org. If you are interested in porting PostgreSQL to a new platform, pgsql-hackers@postgresql.org is the appropriate place to discuss that.
PostgreSQL also has native support for using GSSAPI to encrypt client/server communications for increased security. Support requires that a GSSAPI implementation (such as MIT krb5) is installed on both client and server systems, and that support in PostgreSQL is enabled at build time (see Chapter 16).
The PostgreSQL server will listen for both normal and GSSAPI-encrypted connections on the same TCP port, and will negotiate with any connecting client on whether to use GSSAPI for encryption (and for authentication). By default, this decision is up to the client (which means it can be downgraded by an attacker); see Section 20.1 about setting up the server to require the use of GSSAPI for some or all connections.
Other than configuration of the negotiation behavior, GSSAPI encryption requires no setup beyond that which is necessary for GSSAPI authentication. (For more information on configuring that, see Section 20.6.)
It is possible to use SSH to encrypt the network connection between clients and a PostgreSQL server. Done properly, this provides an adequately secure network connection, even for non-SSL-capable clients.
First make sure that an SSH server is running properly on the same machine as the PostgreSQL server and that you can log in using ssh as some user. Then you can establish a secure tunnel with a command like this from the client machine:
ssh -L 63333:localhost:5432 joe@foo.com
The first number in the -L argument, 63333, is the port number of your end of the tunnel; it can be any unused port. (IANA reserves ports 49152 through 65535 for private use.) The second number, 5432, is the remote end of the tunnel: the port number your server is using. The name or IP address between the port numbers is the host with the database server you are going to connect to, as seen from the host you are logging in to, which is foo.com in this example. In order to connect to the database server using this tunnel, you connect to port 63333 on the local machine:
psql -h localhost -p 63333 postgres
To the database server it will then look as though you are really user joe on host foo.com connecting to localhost in that context, and it will use whatever authentication procedure was configured for connections from this user and host. Note that the server will not think the connection is SSL-encrypted, since in fact it is not encrypted between the SSH server and the PostgreSQL server. This should not pose any extra security risk as long as they are on the same machine.
In order for the tunnel setup to succeed you must be allowed to connect via ssh as joe@foo.com, just as if you had attempted to use ssh to create a terminal session.
You could also have set up the port forwarding as
ssh -L 63333:foo.com:5432 joe@foo.com
but then the database server will see the connection as coming in on its foo.com interface, which is not opened by the default setting listen_addresses = 'localhost'. This is usually not what you want.
If you have to “hop” to the database server via some login host, one possible setup could look like this:
ssh -L 63333:db.foo.com:5432 joe@shell.foo.com
Note that this way the connection from shell.foo.com to db.foo.com will not be encrypted by the SSH tunnel. SSH offers quite a few configuration possibilities when the network is restricted in various ways. Please refer to the SSH documentation for details.
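The structure of the -L argument used in this section can be summarized in a few lines of Python. This is only an illustrative sketch: the helper local_forward_args is hypothetical, and it merely assembles the argument list without starting ssh.

```python
def local_forward_args(local_port, target_host, target_port, login):
    """Return the ssh argument list for a local port forward.

    target_host is resolved from the point of view of the host named in
    login, not from the client machine.
    """
    forward = f"{local_port}:{target_host}:{target_port}"
    return ["ssh", "-L", forward, login]


# Database server and SSH server on the same machine.
direct = local_forward_args(63333, "localhost", 5432, "joe@foo.com")

# Hopping through a login host to a separate database host.
via_login_host = local_forward_args(63333, "db.foo.com", 5432, "joe@shell.foo.com")

print(" ".join(direct))
print(" ".join(via_login_host))
```

In both cases the client then connects to port 63333 on its own machine; only the middle element of the -L argument changes.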
Several other applications exist that can provide secure tunnels using a procedure similar in concept to the one just described.
PostgreSQL has native support for using SSL connections to encrypt client/server communications for increased security. This requires that OpenSSL is installed on both client and server systems and that support in PostgreSQL is enabled at build time (see Chapter 16).
With SSL support compiled in, the PostgreSQL server can be started with SSL enabled by setting the parameter ssl to on in postgresql.conf. The server will listen for both normal and SSL connections on the same TCP port, and will negotiate with any connecting client on whether to use SSL. By default, this is at the client's option; see Section 20.1 about how to set up the server to require use of SSL for some or all connections.
To start in SSL mode, files containing the server certificate and private key must exist. By default, these files are expected to be named server.crt and server.key, respectively, in the server's data directory, but other names and locations can be specified using the configuration parameters ssl_cert_file and ssl_key_file.
On Unix systems, the permissions on server.key must disallow any access to world or group; achieve this by the command chmod 0600 server.key. Alternatively, the file can be owned by root and have group read access (that is, 0640 permissions). That setup is intended for installations where certificate and key files are managed by the operating system. The user under which the PostgreSQL server runs should then be made a member of the group that has access to those certificate and key files.
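The permission rule just described can be written down as a small predicate. The following Python sketch models only the rule as stated in the text, not the server's actual check; the function name key_permissions_ok and its owned_by_root flag are hypothetical.

```python
import stat


def key_permissions_ok(mode, owned_by_root=False):
    """Return True if the permission bits in mode satisfy the rule above."""
    group_and_world = stat.S_IRWXG | stat.S_IRWXO  # 0o077
    if not owned_by_root:
        # Owned by the server user: no group or world access at all.
        return (mode & group_and_world) == 0
    # Owned by root: group read is the only extra bit that may be set.
    return (mode & group_and_world & ~stat.S_IRGRP) == 0


print(key_permissions_ok(0o600))
print(key_permissions_ok(0o640))
print(key_permissions_ok(0o640, owned_by_root=True))
```

For a real file, the mode argument could be obtained with stat.S_IMODE(os.stat(path).st_mode).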
If the data directory allows group read access, the certificate files may need to be located outside of the data directory in order to conform to the security requirements outlined above. Generally, group access is enabled to allow an unprivileged user to back up the database, and in that case the backup software will not be able to read the certificate files and will likely report an error.
If the private key is protected with a passphrase, the server will prompt for the passphrase and will not start until it has been entered. Using a passphrase by default disables the ability to change the server's SSL configuration without a server restart, but see ssl_passphrase_command_supports_reload. Furthermore, passphrase-protected private keys cannot be used at all on Windows.
The first certificate in server.crt must be the server's certificate because it must match the server's private key. The certificates of “intermediate” certificate authorities can also be appended to the file. Doing this avoids the necessity of storing intermediate certificates on clients, assuming the root and intermediate certificates were created with v3_ca extensions. This allows easier expiration of intermediate certificates.
It is not necessary to add the root certificate to server.crt. Instead, clients must have the root certificate of the server's certificate chain.
PostgreSQL reads the system-wide OpenSSL configuration file. By default, this file is named openssl.cnf and is located in the directory reported by openssl version -d. This default can be overridden by setting environment variable OPENSSL_CONF to the name of the desired configuration file.
OpenSSL supports a wide range of ciphers and authentication algorithms, of varying strength. While a list of ciphers can be specified in the OpenSSL configuration file, you can specify ciphers specifically for use by the database server by modifying ssl_ciphers in postgresql.conf.
It is possible to have authentication without encryption overhead by using NULL-SHA or NULL-MD5 ciphers. However, a man-in-the-middle could read and pass communications between client and server. Also, encryption overhead is minimal compared to the overhead of authentication. For these reasons NULL ciphers are not recommended.
To require the client to supply a trusted certificate, place certificates of the root certificate authorities (CAs) you trust in a file in the data directory, set the parameter ssl_ca_file in postgresql.conf to the new file name, and add the authentication option clientcert=verify-ca or clientcert=verify-full to the appropriate hostssl line(s) in pg_hba.conf. A certificate will then be requested from the client during SSL connection startup. (See Section 33.18 for a description of how to set up certificates on the client.)
For a hostssl entry with clientcert=verify-ca, the server will verify that the client's certificate is signed by one of the trusted certificate authorities. If clientcert=verify-full is specified, the server will not only verify the certificate chain, but it will also check whether the username or its mapping matches the cn (Common Name) of the provided certificate. Note that certificate chain validation is always ensured when the cert authentication method is used (see Section 20.12).
Intermediate certificates that chain up to existing root certificates can also appear in the ssl_ca_file file if you wish to avoid storing them on clients (assuming the root and intermediate certificates were created with v3_ca extensions). Certificate Revocation List (CRL) entries are also checked if the parameter ssl_crl_file is set. (See http://h41379.www4.hpe.com/doc/83final/ba554_90007/ch04s02.html for diagrams showing SSL certificate usage.)
The clientcert authentication option is available for all authentication methods, but only in pg_hba.conf lines specified as hostssl. When clientcert is not specified or is set to no-verify, the server will still verify any presented client certificates against its CA file, if one is configured, but it will not insist that a client certificate be presented.
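The three clientcert settings described above differ only in how much is demanded of the client certificate. The following Python sketch restates those rules as a decision function. It is a simplified model, not the server's implementation: the function name and its boolean inputs are hypothetical, a CA file is assumed to be configured, and the authentication method itself is not modeled.

```python
def client_cert_check(clientcert, cert_presented, chain_valid, cn_matches_user):
    """Return True if the certificate checks for a hostssl entry pass.

    clientcert      -- "no-verify", "verify-ca", or "verify-full"
    cert_presented  -- the client sent a certificate
    chain_valid     -- the certificate chains up to a trusted CA
    cn_matches_user -- the certificate's cn matches the user name or a mapping
    """
    if clientcert not in ("no-verify", "verify-ca", "verify-full"):
        raise ValueError(f"unknown clientcert value: {clientcert!r}")
    if not cert_presented:
        # Only no-verify tolerates a missing client certificate.
        return clientcert == "no-verify"
    if not chain_valid:
        # A presented certificate is always verified against the CA file.
        return False
    if clientcert == "verify-full":
        return cn_matches_user
    return True


print(client_cert_check("verify-ca", True, True, False))
print(client_cert_check("verify-full", True, True, False))
```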
There are two approaches to enforce that users provide a certificate during login.
The first approach makes use of the cert authentication method for hostssl entries in pg_hba.conf, such that the certificate itself is used for authentication while also providing SSL connection security. See Section 20.12 for details. (It is not necessary to specify any clientcert options explicitly when using the cert authentication method.) In this case, the cn (Common Name) provided in the certificate is checked against the user name or an applicable mapping.
The second approach combines any authentication method for hostssl entries with the verification of client certificates by setting the clientcert authentication option to verify-ca or verify-full. The former option only enforces that the certificate is valid, while the latter also ensures that the cn (Common Name) in the certificate matches the user name or an applicable mapping.
Table 18.2 summarizes the files that are relevant to the SSL setup on the server. (The shown file names are default names. The locally configured names could be different.)
ssl_cert_file (default server.crt): server certificate; sent to client to indicate server's identity
ssl_key_file (default server.key): server private key; proves server certificate was sent by the owner; does not indicate certificate owner is trustworthy
ssl_ca_file: trusted certificate authorities; checks that client certificate is signed by a trusted certificate authority
ssl_crl_file: certificates revoked by certificate authorities; client certificate must not be on this list
The server reads these files at server start and whenever the server configuration is reloaded. On Windows systems, they are also re-read whenever a new backend process is spawned for a new client connection.
If an error in these files is detected at server start, the server will refuse to start. But if an error is detected during a configuration reload, the files are ignored and the old SSL configuration continues to be used. On Windows systems, if an error in these files is detected at backend start, that backend will be unable to establish an SSL connection. In all these cases, the error condition is reported in the server log.
To create a simple self-signed certificate for the server, valid for 365 days, use the following OpenSSL command, replacing dbhost.yourdomain.com with the server's host name:
Then do:
chmod og-rwx server.key
because the server will reject the file if its permissions are more liberal than this. For more details on how to create your server private key and certificate, refer to the OpenSSL documentation.
While a self-signed certificate can be used for testing, a certificate signed by a certificate authority (CA) (usually an enterprise-wide root CA) should be used in production.
To create a server certificate whose identity can be validated by clients, first create a certificate signing request (CSR) and a public/private key file:
Then, sign the request with the key to create a root certificate authority (using the default OpenSSL configuration file location on Linux):
Finally, create a server certificate signed by the new root certificate authority:
server.crt and server.key should be stored on the server, and root.crt should be stored on the client so the client can verify that the server's leaf certificate was signed by its trusted root certificate. root.key should be stored offline for use in creating future certificates.
It is also possible to create a chain of trust that includes intermediate certificates:
server.crt and intermediate.crt should be concatenated into a certificate file bundle and stored on the server. server.key should also be stored on the server. root.crt should be stored on the client so the client can verify that the server's leaf certificate was signed by a chain of certificates linked to its trusted root certificate. root.key and intermediate.key should be stored offline for use in creating future certificates.
All parameter names are case-insensitive. Every parameter takes a value of one of five types: boolean, string, integer, floating point, or enumerated (enum). The type determines the syntax for setting the parameter:
Boolean: Values can be written as on, off, true, false, yes, no, 1, 0 (all case-insensitive) or any unambiguous prefix of one of these.
String: In general, enclose the value in single quotes, doubling any single quotes within the value. Quotes can usually be omitted if the value is a simple number or identifier, however.
Numeric (integer and floating point): A decimal point is permitted only for floating-point parameters. Do not use thousands separators. Quotes are not required.
Numeric with Unit: Some numeric parameters have an implicit unit, because they describe quantities of memory or time. The unit might be kilobytes, blocks (typically eight kilobytes), milliseconds, seconds, or minutes. An unadorned numeric value for one of these settings will use the setting's default unit, which can be learned from pg_settings.unit. For convenience, settings can be given with a unit specified explicitly, for example '120 ms' for a time value, and they will be converted to whatever the parameter's actual unit is. Note that the value must be written as a string (with quotes) to use this feature. The unit name is case-sensitive, and there can be whitespace between the numeric value and the unit.
Valid memory units are kB (kilobytes), MB (megabytes), GB (gigabytes), and TB (terabytes). The multiplier for memory units is 1024, not 1000.
Valid time units are ms (milliseconds), s (seconds), min (minutes), h (hours), and d (days).
Enumerated: Enumerated-type parameters are written in the same way as string parameters, but are restricted to have one of a limited set of values. The values allowable for such a parameter can be found from pg_settings.enumvals. Enum parameter values are case-insensitive.
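The boolean and unit rules above can be made concrete with a short Python sketch. It is an illustration of the rules as stated here, not the server's parser: the function names are hypothetical, only integer amounts are handled, and results are normalized to kilobytes or milliseconds, whereas the server converts to each parameter's own unit.

```python
import re

BOOLEAN_WORDS = {"on": True, "off": False, "true": True, "false": False,
                 "yes": True, "no": False, "1": True, "0": False}

MEMORY_KB = {"kB": 1, "MB": 1024, "GB": 1024 ** 2, "TB": 1024 ** 3}
TIME_MS = {"ms": 1, "s": 1000, "min": 60_000, "h": 3_600_000, "d": 86_400_000}


def parse_boolean(text):
    """Accept a boolean word or any unambiguous prefix, ignoring case."""
    word = text.strip().lower()
    matches = [w for w in BOOLEAN_WORDS if word and w.startswith(word)]
    if len(matches) != 1:
        raise ValueError(f"not an unambiguous boolean value: {text!r}")
    return BOOLEAN_WORDS[matches[0]]


def parse_with_unit(text):
    """Convert a value such as '120 ms' or '1GB' to (amount, base unit)."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", text)
    if match is None:
        raise ValueError(f"expected a number followed by a unit: {text!r}")
    amount, unit = int(match.group(1)), match.group(2)
    if unit in MEMORY_KB:            # memory multipliers are powers of 1024
        return amount * MEMORY_KB[unit], "kB"
    if unit in TIME_MS:
        return amount * TIME_MS[unit], "ms"
    raise ValueError(f"unknown unit (unit names are case-sensitive): {unit!r}")


print(parse_boolean("t"), parse_boolean("OFF"))
print(parse_with_unit("120 ms"), parse_with_unit("1GB"))
```

Note that a lone "o" is rejected because it is a prefix of both on and off, and "1gb" is rejected because unit names are case-sensitive.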
The most fundamental way to set these parameters is to edit the file postgresql.conf, which is normally kept in the data directory. A default copy is installed when the database cluster directory is initialized. An example of what this file might look like is:
One parameter is specified per line. The equal sign between name and value is optional. Whitespace is insignificant (except within a quoted parameter value) and blank lines are ignored. Hash marks (#) designate the remainder of the line as a comment. Parameter values that are not simple identifiers or numbers must be single-quoted. To embed a single quote in a parameter value, write either two quotes (preferred) or backslash-quote.
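The line syntax described above can be approximated with one regular expression. The Python sketch below is a simplified model and not the server's lexer: parse_conf_line is a hypothetical helper, it requires whitespace or an equal sign between name and value, and the only escapes it resolves are the two ways of embedding a single quote.

```python
import re

_CONF_LINE = re.compile(r"""
    ^\s*
    (?P<name>[A-Za-z_][A-Za-z0-9_.]*)      # parameter name
    (?:\s*=\s*|\s+)                        # the equal sign is optional
    (?:
        '(?P<quoted>(?:''|\\.|[^'\\])*)'   # single-quoted value
      | (?P<bare>[^\s\#']+)                # simple identifier or number
    )
    \s*(?:\#.*)?$                          # optional trailing comment
""", re.VERBOSE)


def parse_conf_line(line):
    """Return (name, value), or None for blank and comment-only lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    match = _CONF_LINE.match(line)
    if match is None:
        raise ValueError(f"cannot parse configuration line: {line!r}")
    name = match.group("name").lower()     # parameter names are case-insensitive
    if match.group("bare") is not None:
        return name, match.group("bare")
    # Two quotes or backslash-quote both stand for one embedded quote.
    return name, re.sub(r"''|\\'", "'", match.group("quoted"))


print(parse_conf_line("log_connections = yes"))
print(parse_conf_line("shared_buffers 128MB   # equal sign omitted"))
print(parse_conf_line("application_name = 'it''s mine'"))
```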
Parameters set in this way provide default values for the cluster. The settings seen by active sessions will be these values unless they are overridden. The following sections describe ways in which the administrator or user can override these defaults.
The configuration file is reread whenever the main server process receives a SIGHUP signal; this signal is most easily sent by running pg_ctl reload from the command line or by calling the SQL function pg_reload_conf(). The main server process also propagates this signal to all currently running server processes, so that existing sessions also adopt the new values (this will happen after they complete any currently-executing client command). Alternatively, you can send the signal to a single server process directly. Some parameters can only be set at server start; any changes to their entries in the configuration file will be ignored until the server is restarted. Invalid parameter settings in the configuration file are likewise ignored (but logged) during SIGHUP processing.
Values set with ALTER DATABASE and ALTER ROLE are applied only when starting a fresh database session. They override values obtained from the configuration files or server command line, and constitute defaults for the rest of the session. Note that some settings cannot be changed after server start, and so cannot be set with these commands (or the ones listed below).
Once a client is connected to the database, PostgreSQL provides two additional SQL commands, SHOW and SET (and the equivalent functions current_setting() and set_config()), to interact with session-local configuration settings.
In addition, the system view pg_settings can be used to view session-local values. Querying this view is similar to using SHOW ALL but provides more detail. It is also more flexible, since it's possible to specify filter conditions or join against other relations.
In addition to setting global defaults or attaching overrides at the database or role level, you can pass settings to PostgreSQL via shell facilities. Both the server and libpq client library accept parameter values via the shell.
During server startup, parameter settings can be passed to the postgres command via the -c command-line parameter. For example,
Settings provided in this way override those set via postgresql.conf or ALTER SYSTEM, so they cannot be changed globally without restarting the server.
When starting a client session via libpq, parameter settings can be specified using the PGOPTIONS environment variable. Settings established in this way constitute defaults for the life of the session, but do not affect other sessions. For historical reasons, the format of PGOPTIONS is similar to that used when launching the postgres command; specifically, the -c flag must be specified. For example,
Other clients and libraries might provide their own mechanisms, via the shell or otherwise, that allow the user to alter session settings without direct use of SQL commands.
PostgreSQL provides several features for breaking down complex postgresql.conf files into sub-files. These features are especially useful when managing multiple servers with related, but not identical, configurations.
In addition to individual parameter settings, the postgresql.conf file can contain include directives, which specify another file to read and process as if it were inserted into the configuration file at this point. This feature allows a configuration file to be divided into physically separate parts. Include directives simply look like:
include 'filename'
If the file name is not an absolute path, it is taken as relative to the directory containing the referencing configuration file. Inclusions can be nested.
There is also an include_if_exists
directive, which acts the same as the include
directive, except when the referenced file does not exist or cannot be read. A regular include
will consider this an error condition, but include_if_exists
merely logs a message and continues processing the referencing configuration file.
The postgresql.conf
file can also contain include_dir
directives, which specify an entire directory of configuration files to include. These look like
Non-absolute directory names are taken as relative to the directory containing the referencing configuration file. Within the specified directory, only non-directory files whose names end with the suffix .conf
will be included. File names that start with the .
character are also ignored, to prevent mistakes since such files are hidden on some platforms. Multiple files within an include directory are processed in file name order (according to C locale rules, i.e. numbers before letters, and uppercase letters before lowercase ones).
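The selection and ordering rules for an include directory can be sketched in Python as follows. This is an illustrative model only: the function name and the sample file names are assumptions, and the check that an entry is not itself a directory is left out.

```python
def include_dir_order(names):
    """Return the file names that would be read, in processing order."""
    eligible = [
        n for n in names
        if n.endswith(".conf") and not n.startswith(".")
    ]
    # C-locale ordering compares raw byte values, so digits sort before
    # uppercase letters, which sort before lowercase letters.
    return sorted(eligible, key=lambda n: n.encode("utf-8"))


order = include_dir_order(
    ["b.conf", "B.conf", "01memory.conf", ".hidden.conf", "notes.txt", "00shared.conf"]
)
print(order)
```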
Include files or directories can be used to logically separate portions of the database configuration, rather than having a single large postgresql.conf
file. Consider a company that has two database servers, each with a different amount of memory. There are likely elements of the configuration both will share, for things such as logging. But memory-related parameters on the server will vary between the two. And there might be server specific customizations, too. One way to manage this situation is to break the custom configuration changes for your site into three files. You could add this to the end of your postgresql.conf
file to include them:
All systems would have the same shared.conf
. Each server with a particular amount of memory could share the same memory.conf
; you might have one for all servers with 8GB of RAM, another for those having 16GB. And finally, server.conf
could have truly server-specific configuration information in it.
Another possibility is to create a configuration file directory and put this information into files there. For example, a conf.d
directory could be referenced at the end of postgresql.conf
:
Then you could name the files in the conf.d
directory like this:
This naming convention establishes a clear order in which these files will be loaded. This is important because only the last setting encountered for a particular parameter while the server is reading configuration files will be used. In this example, something set in conf.d/02server.conf
would override a value set in conf.d/01memory.conf
.
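The "last setting wins" rule can be modelled with a short Python sketch. The file name 00shared.conf, the parameter values, and the function name are assumptions made for illustration; only the two file names mentioned in the text come from the source.

```python
def effective_settings(files):
    """files: list of (file_name, {parameter: value}) in the order read."""
    result = {}
    for _file_name, settings in files:
        # A later assignment to the same parameter replaces the earlier one.
        result.update(settings)
    return result


settings = effective_settings([
    ("conf.d/00shared.conf", {"log_connections": "on"}),
    ("conf.d/01memory.conf", {"shared_buffers": "2GB"}),
    ("conf.d/02server.conf", {"shared_buffers": "4GB"}),
])
print(settings)
```

Here the value from 02server.conf is the one that survives, because that file is read after 01memory.conf.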
You might instead use this approach to naming the files descriptively:
This sort of arrangement gives a unique name for each configuration file variation. This can help eliminate ambiguity when several servers have their configurations all stored in one place, such as in a version control repository. (Storing database configuration files under version control is another good practice to consider.)
listen_addresses
(string
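)Specifies the TCP/IP address(es) on which the server is to listen for connections from client applications. The value takes the form of a comma-separated list of host names and/or numeric IP addresses. The special entry * corresponds to all available IP interfaces. The entry 0.0.0.0 allows listening on all IPv4 addresses, and :: allows listening on all IPv6 addresses. If the list is empty, the server does not listen on any IP interface at all, in which case only Unix-domain sockets can be used to connect to it. The default value is localhost, which allows only local TCP/IP loopback connections to be made. While client authentication allows fine-grained control over who can access the server, listen_addresses controls which interfaces accept connection attempts, which can help prevent repeated malicious connection requests on insecure network interfaces. This parameter can only be set at server start.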
port
(integer
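)The TCP port the server listens on; 5432 by default. Note that the same port number is used for all IP addresses the server listens on. This parameter can only be set at server start.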
max_connections
(integer
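)Determines the maximum number of concurrent connections to the database server. The default is typically 100 connections, but might be less if your kernel settings will not support it (as determined during initdb). This parameter can only be set at server start.
When running a standby server, you must set this parameter to the same or a higher value than on the master server. Otherwise, queries will not be allowed on the standby server.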
superuser_reserved_connections
(integer
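)Determines the number of connection "slots" that are reserved for connections by PostgreSQL superusers. At most max_connections connections can ever be active simultaneously. Whenever the number of active concurrent connections is at least max_connections minus superuser_reserved_connections, new connections will be accepted only for superusers, and no new replication connections will be accepted.
The default value is three connections. The value must be less than max_connections. This parameter can only be set at server start.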
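The reserved-slot rule for max_connections and superuser_reserved_connections can be expressed as a small Python model. This is a sketch of the rule as stated, not the server's implementation; the function name is an assumption, and the defaults mirror the values quoted in the text (100 and 3).

```python
def accepts_connection(active, is_superuser=False,
                       max_connections=100,
                       superuser_reserved_connections=3):
    """Return True if a new connection would be admitted (illustrative model)."""
    if active >= max_connections:
        return False  # every slot is already in use
    ordinary_limit = max_connections - superuser_reserved_connections
    if active >= ordinary_limit:
        return is_superuser  # only the reserved slots remain
    return True


print(accepts_connection(96))                     # ordinary user, slot free
print(accepts_connection(97))                     # ordinary user, reserved range
print(accepts_connection(97, is_superuser=True))  # superuser, reserved range
```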
unix_socket_directories
(string
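)Specifies the directory of the Unix-domain socket(s) on which the server is to listen for connections from client applications. Multiple sockets can be created by listing multiple directories separated by commas. Whitespace between entries is ignored; surround a directory name with double quotes if you need to include whitespace or commas in the name. An empty value specifies not listening on any Unix-domain sockets, in which case only TCP/IP sockets can be used to connect to the server. The default value is normally /tmp, but that can be changed at build time. This parameter can only be set at server start.
In addition to the socket file itself, which is named .s.PGSQL.nnnn where nnnn is the server's port number, an ordinary file named .s.PGSQL.nnnn.lock will be created in each of the unix_socket_directories directories. Neither file should ever be removed manually.
This parameter is irrelevant on Windows, which does not have Unix-domain sockets.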
unix_socket_group
(string
)
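Sets the owning group of the Unix-domain socket(s). (The owning user of the sockets is always the user that starts the server.) In combination with the parameter unix_socket_permissions, this can be used as an additional access control mechanism for Unix-domain connections. By default this is the empty string, which uses the default group of the server user. This parameter can only be set at server start.
This parameter is irrelevant on Windows, which does not have Unix-domain sockets.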
unix_socket_permissions
(integer
)
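Sets the access permissions of the Unix-domain socket(s). Unix-domain sockets use the usual Unix file system permission set. The parameter value is expected to be a numeric mode specified in the format accepted by the chmod and umask system calls. (To use the customary octal format, the number must start with a 0 (zero).)
The default permissions are 0777, meaning anyone can connect. Reasonable alternatives are 0770 (only user and group, see also unix_socket_group) and 0700 (only user). (Note that for a Unix-domain socket, only write permission matters, so there is no point in setting or revoking read or execute permissions.)
This parameter can only be set at server start.
This parameter is irrelevant on some systems, notably Solaris as of Solaris 10, which ignore socket permissions entirely. There, a similar effect can be achieved by pointing unix_socket_directories to a directory having search permission limited to the desired audience. This parameter is also irrelevant on Windows, which does not have Unix-domain sockets.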
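The remark about the leading zero is easy to trip over, so the following Python sketch shows how a numeric mode reads under C-style prefix rules (leading 0 for octal, 0x for hexadecimal, otherwise decimal). That the server applies exactly these rules is an assumption drawn from the note above; the helper names are hypothetical.

```python
import stat


def parse_mode(text):
    """Interpret a numeric mode string using C-style prefix rules."""
    text = text.strip()
    if text.lower().startswith("0x"):
        return int(text, 16)
    if text.startswith("0") and len(text) > 1:
        return int(text, 8)
    return int(text, 10)


def describe(mode):
    """Render permission bits the way ls would show them for a socket."""
    return stat.filemode(stat.S_IFSOCK | mode)


print(describe(parse_mode("0770")))  # user and group only
print(describe(parse_mode("0700")))  # user only
# Without the leading zero the digits are read as decimal, which yields
# a different and almost certainly unintended set of permission bits.
print(parse_mode("777") == 0o777)
```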
bonjour
(boolean
)
Enables advertising the server's existence via Bonjour. The default is off. This parameter can only be set at server start.
bonjour_name
(string
)
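Specifies the Bonjour service name. The computer name is used if this parameter is set to the empty string '' (which is the default). This parameter is ignored if the server was not compiled with Bonjour support. This parameter can only be set at server start.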
tcp_keepalives_idle
(integer
)
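Specifies the number of seconds of inactivity after which TCP should send a keepalive message to the client. A value of 0 uses the system default. This parameter is supported only on systems that support TCP_KEEPIDLE or an equivalent socket option, and on Windows; on other systems, it must be zero. In sessions connected via a Unix-domain socket, this parameter is ignored and always reads as zero.
On Windows, a value of 0 will set this parameter to 2 hours, since Windows does not provide a way to read the system default value.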
tcp_keepalives_interval
(integer
)
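Specifies the number of seconds after which a TCP keepalive message that is not acknowledged by the client should be retransmitted. A value of 0 uses the system default. This parameter is supported only on systems that support TCP_KEEPINTVL or an equivalent socket option, and on Windows; on other systems, it must be zero. In sessions connected via a Unix-domain socket, this parameter is ignored and always reads as zero.
On Windows, a value of 0 will set this parameter to 1 second, since Windows does not provide a way to read the system default value.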
tcp_keepalives_count
(integer
)
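Specifies the number of TCP keepalives that can be lost before the server's connection to the client is considered dead. A value of 0 uses the system default. This parameter is supported only on systems that support TCP_KEEPCNT or an equivalent socket option; on other systems, it must be zero. In sessions connected via a Unix-domain socket, this parameter is ignored and always reads as zero.
This parameter is not supported on Windows, and must be zero.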
authentication_timeout
(integer
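)Maximum time to complete client authentication, in seconds. If a would-be client has not completed the authentication protocol in this much time, the server closes the connection. This prevents hung clients from occupying a connection indefinitely. The default is one minute. This parameter can only be set in the postgresql.conf file or on the server command line.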
password_encryption
(enum
)ssl
(boolean
)
ssl_ca_file
(string
)
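Specifies the name of the file containing the SSL server certificate authority (CA). Relative paths are relative to the data directory. This parameter can only be set in the postgresql.conf file or on the server command line. The default is empty, meaning no CA file is loaded, and client certificate verification is not performed.
In previous releases of PostgreSQL, the name of this file was hard-coded as root.crt.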
ssl_cert_file
(string
)
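Specifies the name of the file containing the SSL server certificate. Relative paths are relative to the data directory. This parameter can only be set in the postgresql.conf file or on the server command line. The default is server.crt.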
ssl_crl_file
(string
)
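Specifies the name of the file containing the SSL server certificate revocation list (CRL). Relative paths are relative to the data directory. This parameter can only be set in the postgresql.conf file or on the server command line. The default is empty, meaning no CRL file is loaded.
In previous releases of PostgreSQL, the name of this file was hard-coded as root.crl.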
ssl_key_file
(string
)
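Specifies the name of the file containing the SSL server private key. Relative paths are relative to the data directory. This parameter can only be set in the postgresql.conf file or on the server command line. The default is server.key.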
ssl_ciphers
(string
)
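Specifies a list of SSL cipher suites that are allowed to be used on secure connections. See the ciphers manual page in the OpenSSL package for the syntax of this setting and a list of supported values. This parameter can only be set in the postgresql.conf file or on the server command line. The default value is HIGH:MEDIUM:+3DES:!aNULL. The default is usually a reasonable choice unless you have specific security requirements.
Explanation of the default value:
HIGH
Cipher suites that use ciphers from the HIGH group (e.g., AES, Camellia, 3DES)
MEDIUM
Cipher suites that use ciphers from the MEDIUM group (e.g., RC4, SEED)
+3DES
The OpenSSL default order for HIGH is problematic because it orders 3DES higher than AES128. This is wrong because 3DES offers less security than AES128, and it is also much slower. +3DES reorders it after all other HIGH and MEDIUM ciphers.
!aNULL
Disables anonymous cipher suites that do no authentication. Such cipher suites are vulnerable to man-in-the-middle attacks and therefore should not be used.
Available cipher suite details will vary across OpenSSL versions. Use the command openssl ciphers -v 'HIGH:MEDIUM:+3DES:!aNULL'
to see actual details for the currently installed OpenSSL version. Note that this list is filtered at run time based on the server key type.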
ssl_prefer_server_ciphers
(boolean
)
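Specifies whether to use the server's SSL cipher preferences, rather than the client's. This parameter can only be set in the postgresql.conf file or on the server command line. The default is true.
Older PostgreSQL versions do not have this setting and always use the client's preferences. This setting is mainly for backward compatibility with those versions. Using the server's preferences is usually better because it is more likely that the server is appropriately configured.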
ssl_ecdh_curve
(string
)
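Specifies the name of the curve to use in ECDH key exchange. It needs to be supported by all clients that connect. It does not need to be the same curve used by the server's Elliptic Curve key. This parameter can only be set in the postgresql.conf file or on the server command line. The default is prime256v1.
OpenSSL names for the most common curves are: prime256v1 (NIST P-256), secp384r1 (NIST P-384), secp521r1 (NIST P-521). The full list of available curves can be shown with the command openssl ecparam -list_curves. Not all of them are usable in TLS though.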
password_encryption
(enum
)
ssl_dh_params_file
(string
)
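Specifies the name of the file containing Diffie-Hellman parameters used for the so-called ephemeral DH family of SSL ciphers. The default is empty, in which case the compiled-in default DH parameters are used. Using custom DH parameters reduces the exposure if an attacker manages to crack the well-known compiled-in DH parameters. You can create your own DH parameters file with the command openssl dhparam -out dhparams.pem 2048
.
This parameter can only be set in the postgresql.conf file or on the server command line.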
krb_server_keyfile
(string
)
krb_caseins_users
(boolean
)
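Sets whether GSSAPI user names should be treated case-insensitively. The default is off (case sensitive). This parameter can only be set in the postgresql.conf file or on the server command line.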
db_user_namespace
(boolean
)
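This parameter enables per-database user names. It is off by default. This parameter can only be set in the postgresql.conf file or on the server command line.
If this is on, you should create users as username@dbname. When a user name is passed by a connecting client, @ and the database name are appended to the user name, and that database-specific user name is looked up by the server. Note that when you create users with names containing @ within the SQL environment, you will need to quote the user name.
With this parameter enabled, you can still create ordinary global users. Simply append @ when specifying the user name in the client, e.g. joe@. The @ will be stripped off before the user name is looked up by the server.
db_user_namespace causes the client's and the server's representation of the user name to differ. Authentication checks are always done with the server's user name, so authentication methods must be configured for the server's user name, not the client's. Because md5 uses the user name as salt on both the client and the server, md5 cannot be used with db_user_namespace.
This feature is intended as a temporary measure until a complete solution is found. At that time, this option will be removed.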
ssl
(boolean
)
Enables SSL connections. This parameter can only be set in the postgresql.conf file or on the server command line. The default is off.
ssl_ca_file
(string
)
Specifies the name of the file containing the SSL server certificate authority (CA). Relative paths are relative to the data directory. This parameter can only be set in the postgresql.conf
file or on the server command line. The default is empty, meaning no CA file is loaded, and client certificate verification is not performed.
ssl_cert_file
(string
)
Specifies the name of the file containing the SSL server certificate. Relative paths are relative to the data directory. This parameter can only be set in the postgresql.conf
file or on the server command line. The default is server.crt
.
ssl_crl_file
(string
)
Specifies the name of the file containing the SSL server certificate revocation list (CRL). Relative paths are relative to the data directory. This parameter can only be set in the postgresql.conf
file or on the server command line. The default is empty, meaning no CRL file is loaded.
ssl_key_file
(string
)
Specifies the name of the file containing the SSL server private key. Relative paths are relative to the data directory. This parameter can only be set in the postgresql.conf
file or on the server command line. The default is server.key
.
ssl_ciphers
(string
)
Specifies a list of SSL cipher suites that are allowed to be used on secure connections. See the ciphers manual page in the OpenSSL package for the syntax of this setting and a list of supported values. This parameter can only be set in the postgresql.conf
file or on the server command line. The default value is HIGH:MEDIUM:+3DES:!aNULL
. The default is usually a reasonable choice unless you have specific security requirements.
Explanation of the default value:
HIGH
Cipher suites that use ciphers from HIGH
group (e.g., AES, Camellia, 3DES)
MEDIUM
Cipher suites that use ciphers from MEDIUM
group (e.g., RC4, SEED)
+3DES
The OpenSSL default order for HIGH
is problematic because it orders 3DES higher than AES128. This is wrong because 3DES offers less security than AES128, and it is also much slower. +3DES
reorders it after all other HIGH
and MEDIUM
ciphers.
!aNULL
Disables anonymous cipher suites that do no authentication. Such cipher suites are vulnerable to man-in-the-middle attacks and therefore should not be used.
Available cipher suite details will vary across OpenSSL versions. Use the command openssl ciphers -v 'HIGH:MEDIUM:+3DES:!aNULL'
to see actual details for the currently installed OpenSSL version. Note that this list is filtered at run time based on the server key type.
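As a programmatic counterpart to the openssl ciphers command, the Python sketch below asks the OpenSSL library that Python itself is linked against to expand the default cipher string. That library may differ from the one used by the PostgreSQL server, so the output is only indicative.

```python
import ssl

# A server-side TLS context; no certificate is needed just to list ciphers.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.set_ciphers("HIGH:MEDIUM:+3DES:!aNULL")

# Each entry is a dict describing one enabled cipher suite.
names = [cipher["name"] for cipher in ctx.get_ciphers()]
for cipher in ctx.get_ciphers():
    print(cipher["name"], cipher["protocol"])
```

The list usually also contains TLS 1.3 suites, which OpenSSL configures separately from this cipher string.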
ssl_prefer_server_ciphers
(boolean
)
Specifies whether to use the server's SSL cipher preferences, rather than the client's. This parameter can only be set in the postgresql.conf
file or on the server command line. The default is true
.
Older PostgreSQL versions do not have this setting and always use the client's preferences. This setting is mainly for backward compatibility with those versions. Using the server's preferences is usually better because it is more likely that the server is appropriately configured.
ssl_ecdh_curve
(string
)
Specifies the name of the curve to use in ECDH key exchange. It needs to be supported by all clients that connect. It does not need to be the same curve used by the server's Elliptic Curve key. This parameter can only be set in the postgresql.conf
file or on the server command line. The default is prime256v1
.
OpenSSL names for the most common curves are: prime256v1
(NIST P-256), secp384r1
(NIST P-384), secp521r1
(NIST P-521). The full list of available curves can be shown with the command openssl ecparam -list_curves
. Not all of them are usable in TLS though.
ssl_dh_params_file
(string
)
Specifies the name of the file containing Diffie-Hellman parameters used for the so-called ephemeral DH family of SSL ciphers. The default is empty, in which case the compiled-in default DH parameters are used. Using custom DH parameters reduces the exposure if an attacker manages to crack the well-known compiled-in DH parameters. You can create your own DH parameters file with the command openssl dhparam -out dhparams.pem 2048
.
This parameter can only be set in the postgresql.conf
file or on the server command line.
ssl_passphrase_command
(string
)
Sets an external command to be invoked when a passphrase for decrypting an SSL file such as a private key needs to be obtained. By default, this parameter is empty, which means the built-in prompting mechanism is used.
The command must print the passphrase to the standard output and exit with code 0. In the parameter value, %p
is replaced by a prompt string. (Write %%
for a literal %
.) Note that the prompt string will probably contain whitespace, so be sure to quote adequately. A single newline is stripped from the end of the output if present.
The command does not actually have to prompt the user for a passphrase. It can read it from a file, obtain it from a keychain facility, or similar. It is up to the user to make sure the chosen mechanism is adequately secure.
This parameter can only be set in the postgresql.conf
file or on the server command line.
ssl_passphrase_command_supports_reload
(boolean
)
This parameter determines whether the passphrase command set by ssl_passphrase_command
will also be called during a configuration reload if a key file needs a passphrase. If this parameter is false (the default), then ssl_passphrase_command
will be ignored during a reload and the SSL configuration will not be reloaded if a passphrase is needed. That setting is appropriate for a command that requires a TTY for prompting, which might not be available when the server is running. Setting this parameter to true might be appropriate if the passphrase is obtained from a file, for example.
This parameter can only be set in the postgresql.conf
file or on the server command line.
For additional information on tuning these settings, see .
wal_level
(enum
)wal_level
determines how much information is written to the WAL. The default value is replica
, which writes enough data to support WAL archiving and replication, including running read-only queries on a standby server. minimal
removes all logging except the information required to recover from a crash or immediate shutdown. Finally, logical
adds information necessary to support logical decoding. Each level includes the information logged at all lower levels. This parameter can only be set at server start.
In minimal
level, WAL-logging of some bulk operations can be safely skipped, which can make those operations much faster (see ). Operations in which this optimization can be applied include:
But minimal WAL does not contain enough information to reconstruct the data from a base backup and the WAL logs, so replica
or higher must be used to enable WAL archiving () and streaming replication.
In logical
level, the same information is logged as with replica
, plus information needed to allow extracting logical change sets from the WAL. Using a level of logical
will increase the WAL volume, particularly if many tables are configured for REPLICA IDENTITY FULL
and many UPDATE
and DELETE
statements are executed.
In releases prior to 9.6, this parameter also allowed the values archive
and hot_standby
. These are still accepted but mapped to replica
.
fsync
(boolean
)If this parameter is on, the PostgreSQL server will try to make sure that updates are physically written to disk, by issuing fsync()
system calls or various equivalent methods (see ). This ensures that the database cluster can recover to a consistent state after an operating system or hardware crash.
While turning off fsync
is often a performance benefit, this can result in unrecoverable data corruption in the event of a power failure or system crash. Thus it is only advisable to turn off fsync
if you can easily recreate your entire database from external data.
Examples of safe circumstances for turning off fsync
include the initial loading of a new database cluster from a backup file, using a database cluster for processing a batch of data after which the database will be thrown away and recreated, or for a read-only database clone which gets recreated frequently and is not used for failover. High quality hardware alone is not a sufficient justification for turning off fsync
.
For reliable recovery when changing fsync
off to on, it is necessary to force all modified buffers in the kernel to durable storage. This can be done while the cluster is shut down or while fsync
is on by running initdb --sync-only
, running sync
, unmounting the file system, or rebooting the server.
synchronous_commit
(enum
)If synchronous_standby_names
is empty, the settings on
, remote_apply
, remote_write
and local
all provide the same synchronization level: transaction commits only wait for local flush to disk.
This parameter can be changed at any time; the behavior for any one transaction is determined by the setting in effect when it commits. It is therefore possible, and useful, to have some transactions commit synchronously and others asynchronously. For example, to make a single multistatement transaction commit asynchronously when the default is the opposite, issue SET LOCAL synchronous_commit TO OFF
within the transaction.
wal_sync_method
(enum
)Method used for forcing WAL updates out to disk. If fsync
is off then this setting is irrelevant, since WAL file updates will not be forced out at all. Possible values are:
open_datasync
(write WAL files with open()
option O_DSYNC
)
fdatasync
(call fdatasync()
at each commit)
fsync
(call fsync()
at each commit)
fsync_writethrough
(call fsync()
at each commit, forcing write-through of any disk write cache)
open_sync
(write WAL files with open()
option O_SYNC
)
full_page_writes
(boolean
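)When this parameter is on, the PostgreSQL server writes the entire content of each disk page to WAL during the first modification of that page after a checkpoint. This is needed because a page write that is in process during an operating system crash might be only partially completed, leading to an on-disk page that contains a mix of old and new data. The row-level change data normally stored in WAL will not be enough to completely restore such a page during post-crash recovery. Storing the full page image guarantees that the page can be correctly restored, but at the price of increasing the amount of data that must be written to WAL. (Because WAL replay always starts from a checkpoint, it is sufficient to do this during the first change of each page after a checkpoint. Therefore, one way to reduce the cost of full-page writes is to increase the checkpoint interval parameters.)
Turning this parameter off speeds up normal operation, but might lead to either unrecoverable data corruption or silent data corruption after a system failure. The risks are similar to turning off fsync, though smaller, and it should be turned off only under the same circumstances recommended for that parameter.
This parameter can only be set in the postgresql.conf file or on the server command line. The default is on.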
wal_log_hints
(boolean
)When this parameter is on
, the PostgreSQL server writes the entire content of each disk page to WAL during the first modification of that page after a checkpoint, even for non-critical modifications of so-called hint bits.
If data checksums are enabled, hint bit updates are always WAL-logged and this setting is ignored. You can use this setting to test how much extra WAL-logging would occur if your database had data checksums enabled.
This parameter can only be set at server start. The default value is off
.
wal_compression
(boolean
)Turning this parameter on can reduce the WAL volume without increasing the risk of unrecoverable data corruption, but at the cost of some extra CPU spent on the compression during WAL logging and on the decompression during WAL replay.
wal_buffers
(integer
)The contents of the WAL buffers are written out to disk at every transaction commit, so extremely large values are unlikely to provide a significant benefit. However, setting this value to at least a few megabytes can improve write performance on a busy server where many clients are committing at once. The auto-tuning selected by the default setting of -1 should give reasonable results in most cases.
wal_writer_delay
(integer
)Specifies how often the WAL writer flushes WAL, in time terms. After flushing WAL the writer sleeps for the length of time given by wal_writer_delay
, unless woken up sooner by an asynchronously committing transaction. If the last flush happened less than wal_writer_delay
ago and less than wal_writer_flush_after
worth of WAL has been produced since, then WAL is only written to the operating system, not flushed to disk. If this value is specified without units, it is taken as milliseconds. The default value is 200 milliseconds (200ms
). Note that on many systems, the effective resolution of sleep delays is 10 milliseconds; setting wal_writer_delay
to a value that is not a multiple of 10 might have the same results as setting it to the next higher multiple of 10. This parameter can only be set in the postgresql.conf
file or on the server command line.
wal_writer_flush_after
(integer
)Specifies how often the WAL writer flushes WAL, in volume terms. If the last flush happened less than wal_writer_delay
ago and less than wal_writer_flush_after
worth of WAL has been produced since, then WAL is only written to the operating system, not flushed to disk. If wal_writer_flush_after
is set to 0
then WAL data is always flushed immediately. If this value is specified without units, it is taken as WAL blocks, that is XLOG_BLCKSZ
bytes, typically 8kB. The default is 1MB
. This parameter can only be set in the postgresql.conf
file or on the server command line.
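The interplay between wal_writer_delay and wal_writer_flush_after can be summarized as a small decision function. This Python sketch models only the rule stated above; the function name, the units chosen for the arguments, and the return labels are assumptions.

```python
def wal_writer_action(ms_since_last_flush, bytes_since_last_flush,
                      wal_writer_delay_ms=200,
                      wal_writer_flush_after_bytes=1024 * 1024):
    """Return "write" (hand to the OS only) or "flush" (force to disk)."""
    if wal_writer_flush_after_bytes == 0:
        return "flush"  # a setting of 0 means always flush immediately
    recent = ms_since_last_flush < wal_writer_delay_ms
    small = bytes_since_last_flush < wal_writer_flush_after_bytes
    return "write" if recent and small else "flush"


print(wal_writer_action(50, 64 * 1024))        # recent flush, little WAL
print(wal_writer_action(50, 2 * 1024 * 1024))  # volume threshold exceeded
print(wal_writer_action(250, 64 * 1024))       # time threshold exceeded
```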
commit_delay
(integer
)Setting commit_delay
adds a time delay before a WAL flush is initiated. This can improve group commit throughput by allowing a larger number of transactions to commit via a single WAL flush, if system load is high enough that additional transactions become ready to commit within the given interval. However, it also increases latency by up to the commit_delay
for each WAL flush. Because the delay is just wasted if no other transactions become ready to commit, a delay is only performed if at least commit_siblings
other transactions are active when a flush is about to be initiated. Also, no delays are performed if fsync
is disabled. If this value is specified without units, it is taken as microseconds. The default commit_delay
is zero (no delay). Only superusers can change this setting.
In PostgreSQL releases prior to 9.3, commit_delay
behaved differently and was much less effective: it affected only commits, rather than all WAL flushes, and waited for the entire configured delay even if the WAL flush was completed sooner. Beginning in PostgreSQL 9.3, the first process that becomes ready to flush waits for the configured interval, while subsequent processes wait only until the leader completes the flush operation.
commit_siblings
(integer
)Minimum number of concurrent open transactions to require before performing the commit_delay
delay. A larger value makes it more probable that at least one other transaction will become ready to commit during the delay interval. The default is five transactions.
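The conditions under which the commit_delay pause is actually taken can be written down as a predicate. The Python sketch below restates the three conditions from the text (a nonzero delay, fsync enabled, and at least commit_siblings other active transactions); the function and argument names are assumptions, and the default of five siblings comes from the text.

```python
def should_delay(commit_delay_us, other_active_transactions,
                 commit_siblings=5, fsync=True):
    """Return True if a WAL flush would be preceded by the commit_delay pause."""
    if commit_delay_us <= 0:
        return False  # the default of zero means no delay
    if not fsync:
        return False  # no delays are performed when fsync is disabled
    return other_active_transactions >= commit_siblings


print(should_delay(1000, other_active_transactions=8))
print(should_delay(1000, other_active_transactions=2))
```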
checkpoint_timeout
(integer
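)Maximum time between automatic WAL checkpoints. If this value is specified without units, it is taken as seconds. The valid range is between 30 seconds and one day. The default is five minutes (5min). Increasing this parameter can increase the amount of time needed for crash recovery. This parameter can only be set in the postgresql.conf file or on the server command line.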
checkpoint_completion_target
(floating point
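)Specifies the target of checkpoint completion, as a fraction of total time between checkpoints. The default is 0.5. This parameter can only be set in the postgresql.conf file or on the server command line.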
checkpoint_flush_after
(integer
)
checkpoint_warning
(integer
)Write a message to the server log if checkpoints caused by the filling of WAL segment files happen closer together than this amount of time (which suggests that max_wal_size
ought to be raised). If this value is specified without units, it is taken as seconds. The default is 30 seconds (30s
). Zero disables the warning. No warnings will be generated if checkpoint_timeout
is less than checkpoint_warning
. This parameter can only be set in the postgresql.conf
file or on the server command line.
max_wal_size
(integer
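)Maximum size to let the WAL grow to between automatic WAL checkpoints. This is a soft limit; WAL size can exceed max_wal_size under special circumstances, such as heavy load, a failing archive_command, or a high wal_keep_segments setting. If this value is specified without units, it is taken as megabytes. The default is 1 GB. Increasing this parameter can increase the amount of time needed for crash recovery. This parameter can only be set in the postgresql.conf file or on the server command line.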
min_wal_size
(integer
)As long as WAL disk usage stays below this setting, old WAL files are always recycled for future use at a checkpoint, rather than removed. This can be used to ensure that enough WAL space is reserved to handle spikes in WAL usage, for example when running large batch jobs. If this value is specified without units, it is taken as megabytes. The default is 80 MB. This parameter can only be set in the postgresql.conf
file or on the server command line.
archive_mode
(enum
)
archive_mode
and archive_command
are separate variables so that archive_command
can be changed without leaving archiving mode. This parameter can only be set at server start. archive_mode
cannot be enabled when wal_level
is set to minimal
.
archive_command
(string
)
This parameter can only be set in the postgresql.conf
file or on the server command line. It is ignored unless archive_mode
was enabled at server start. If archive_command
is an empty string (the default) while archive_mode
is enabled, WAL archiving is temporarily disabled, but the server continues to accumulate WAL segment files in the expectation that a command will soon be provided. Setting archive_command
to a command that does nothing but return true, e.g. /bin/true
(REM
on Windows), effectively disables archiving, but also breaks the chain of WAL files needed for archive recovery, so it should only be used in unusual circumstances.
archive_timeout
(integer
)
This section describes the settings that apply only for the duration of the recovery. They must be reset for any subsequent recovery you wish to perform.
“Recovery” covers using the server as a standby or for executing a targeted recovery. Typically, standby mode would be used to provide high availability and/or read scalability, whereas a targeted recovery is used to recover from data loss.
restore_command
(string
)It is important for the command to return a zero exit status only if it succeeds. The command will be asked for file names that are not present in the archive; it must return nonzero when so asked. Examples:
An exception is that if the command was terminated by a signal (other than SIGTERM, which is used as part of a database server shutdown) or an error by the shell (such as command not found), then recovery will abort and the server will not start up.
This parameter can only be set at server start.
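The exit-status rules for restore_command can be sketched as a classifier. In this Python model the return code follows the `subprocess` convention (a negative value -N means the process was killed by signal N), and the shell codes 126 and 127 are taken as the conventional "cannot execute" and "command not found" statuses. The function name and the result labels are assumptions, not server messages.

```python
import signal


def classify_restore_result(returncode):
    """Map a restore_command exit status to an outcome (illustrative model)."""
    if returncode == 0:
        return "file restored"
    if returncode < 0:
        # Killed by a signal; SIGTERM is treated as part of server shutdown.
        if -returncode == signal.SIGTERM:
            return "server shutdown"
        return "abort recovery"
    if returncode in (126, 127):
        return "abort recovery"  # error reported by the shell itself
    return "file not available"  # ordinary nonzero status


print(classify_restore_result(0))
print(classify_restore_result(1))
print(classify_restore_result(127))
```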
archive_cleanup_command
(string
)If the command returns a nonzero exit status then a warning log message will be written. An exception is that if the command was terminated by a signal or an error by the shell (such as command not found), a fatal error will be raised.
This parameter can only be set in the postgresql.conf
file or on the server command line.
recovery_end_command
(string
)If the command returns a nonzero exit status then a warning log message will be written and the database will proceed to start up anyway. An exception is that if the command was terminated by a signal or an error by the shell (such as command not found), the database will not proceed with startup.
This parameter can only be set in the postgresql.conf
file or on the server command line.
By default, recovery will recover to the end of the WAL log. The following parameters can be used to specify an earlier stopping point. At most one of recovery_target
, recovery_target_lsn
, recovery_target_name
, recovery_target_time
, or recovery_target_xid
can be used; if more than one of these is specified in the configuration file, an error will be raised. These parameters can only be set at server start.
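The "at most one target" rule lends itself to a quick configuration check. The Python sketch below takes a dictionary of settings and reports which recovery target is in use, raising an error when several are given; the function name and the sample values are assumptions for illustration.

```python
RECOVERY_TARGETS = (
    "recovery_target",
    "recovery_target_lsn",
    "recovery_target_name",
    "recovery_target_time",
    "recovery_target_xid",
)


def chosen_recovery_target(config):
    """Return the single recovery target parameter that is set, or None."""
    # Empty or missing values count as "not set".
    specified = [name for name in RECOVERY_TARGETS if config.get(name)]
    if len(specified) > 1:
        raise ValueError("multiple recovery targets: " + ", ".join(specified))
    return specified[0] if specified else None


print(chosen_recovery_target({"recovery_target_name": "before_upgrade"}))
print(chosen_recovery_target({}))  # no target: recover to the end of the WAL
```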
recovery_target
= 'immediate'
This parameter specifies that recovery should end as soon as a consistent state is reached, i.e. as early as possible. When restoring from an online backup, this means the point where taking the backup ended.
Technically, this is a string parameter, but 'immediate'
is currently the only allowed value.
recovery_target_name
(string
)This parameter specifies the named restore point (created with pg_create_restore_point()
) to which recovery will proceed.
recovery_target_time
(timestamp
)recovery_target_xid
(string
)recovery_target_lsn
(pg_lsn
)The following options further specify the recovery target, and affect what happens when the target is reached:
recovery_target_inclusive
(boolean
)recovery_target_timeline
(string
)Specifies recovering into a particular timeline. The value can be a numeric timeline ID or a special value. The value current
recovers along the same timeline that was current when the base backup was taken. The value latest
recovers to the latest timeline found in the archive, which is useful in a standby server. latest
is the default.
recovery_target_action
(enum
)Specifies what action the server should take once the recovery target is reached. The default is pause
, which means recovery will be paused. promote
means the recovery process will finish and the server will start to accept connections. Finally shutdown
will stop the server after reaching the recovery target.
The shutdown
setting is useful to have the instance ready at the exact replay point desired. The instance will still be able to replay more WAL records (and in fact will have to replay WAL records since the last checkpoint next time it is started).
Note that because recovery.signal
will not be removed when recovery_target_action
is set to shutdown
, any subsequent start will end with immediate shutdown unless the configuration is changed or the recovery.signal
file is removed manually.
In addition to the postgresql.conf
file already mentioned, PostgreSQL uses two other manually-edited configuration files, which control client authentication (their use is discussed in ). By default, all three configuration files are stored in the database cluster's data directory. The parameters described in this section allow the configuration files to be placed elsewhere. (Doing so can ease administration. In particular it is often easier to ensure that the configuration files are properly backed-up when they are kept separate.)
data_directory
(string
)Specifies the directory to use for data storage. This parameter can only be set at server start.
config_file
(string
)Specifies the main server configuration file (customarily called postgresql.conf
). This parameter can only be set on the postgres
command line.
hba_file
(string
)Specifies the configuration file for host-based authentication (customarily called pg_hba.conf
). This parameter can only be set at server start.
ident_file
(string
)Specifies the configuration file for user name mapping (customarily called pg_ident.conf
). This parameter can only be set at server start. See also .
external_pid_file
(string
)Specifies the name of an additional process-ID (PID) file that the server should create for use by server administration programs. This parameter can only be set at server start.
In a default installation, none of the above parameters are set explicitly. Instead, the data directory is specified by the -D command-line option or the PGDATA environment variable, and the configuration files are all found within the data directory.
If you wish to keep the configuration files elsewhere than the data directory, the postgres -D command-line option or PGDATA environment variable must point to the directory containing the configuration files, and the data_directory parameter must be set in postgresql.conf (or on the command line) to show where the data directory is actually located. Notice that data_directory overrides -D and PGDATA for the location of the data directory, but not for the location of the configuration files.
If you wish, you can specify the configuration file names and locations individually using the parameters config_file, hba_file and/or ident_file. config_file can only be specified on the postgres command line, but the others can be set within the main configuration file. If all three parameters plus data_directory are explicitly set, then it is not necessary to specify -D or PGDATA.
When setting any of these parameters, a relative path will be interpreted with respect to the directory in which postgres is started.
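The lookup rules above can be summarized in a small sketch. This is an illustration only, not server code: the function name resolve_locations and the example paths are hypothetical, relative-path handling is left out, and the sketch assumes that -D takes precedence over PGDATA when both are given.

```python
import posixpath


def resolve_locations(dash_d=None, pgdata=None, data_directory=None,
                      config_file=None, hba_file=None, ident_file=None):
    """Hypothetical helper: where the data and configuration files would be found."""
    # Assumption of this sketch: -D wins over PGDATA when both are supplied.
    config_dir = dash_d or pgdata
    explicit = (data_directory, config_file, hba_file, ident_file)
    if config_dir is None and not all(explicit):
        raise ValueError("need -D or PGDATA unless all four locations are set explicitly")

    def in_config_dir(name):
        return posixpath.join(config_dir, name)

    return {
        # data_directory overrides -D/PGDATA for the data location only.
        "data_directory": data_directory or config_dir,
        "config_file": config_file or in_config_dir("postgresql.conf"),
        "hba_file": hba_file or in_config_dir("pg_hba.conf"),
        "ident_file": ident_file or in_config_dir("pg_ident.conf"),
    }


# Configuration kept in /etc/postgresql, data kept elsewhere (example paths).
layout = resolve_locations(dash_d="/etc/postgresql", data_directory="/srv/pgdata")
print(layout)
```

With only -D or PGDATA given, every entry resolves to the same directory, which matches the default installation described above.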
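These configuration parameters provide a crude method of influencing the query plans chosen by the query optimizer. If the default plan chosen by the optimizer for a particular query is not optimal, a temporary solution is to use one of these configuration parameters to force the optimizer to choose a different plan. Better ways to improve the quality of the plans chosen by the optimizer include adjusting the planner cost constants, running ANALYZE manually, increasing the value of the default_statistics_target configuration parameter, and increasing the amount of statistics collected for specific columns using ALTER TABLE SET STATISTICS.
enable_bitmapscan (boolean)
Enables or disables the query planner's use of bitmap-scan plan types. The default is on.
enable_gathermerge (boolean)
Enables or disables the query planner's use of gather merge plan types. The default is on.
enable_hashagg (boolean)
Enables or disables the query planner's use of hashed aggregation plan types. The default is on.
enable_hashjoin (boolean)
Enables or disables the query planner's use of hash-join plan types. The default is on.
enable_indexscan (boolean)
Enables or disables the query planner's use of index-scan plan types. The default is on.
enable_indexonlyscan (boolean)
Enables or disables the query planner's use of index-only-scan plan types. The default is on.
enable_material (boolean)
Enables or disables the query planner's use of materialization. It is impossible to suppress materialization entirely, but turning this variable off prevents the planner from inserting materialize nodes except in cases where they are truly required. The default is on.
enable_memoize (boolean)
Enables or disables the query planner's use of memoize plans for caching results from parameterized scans inside nested-loop joins. This plan type allows scans of the underlying plans to be skipped when the results for the current parameters are already in the cache. Less commonly looked up results may be evicted from the cache when more space is required for new entries. The default is on.
enable_mergejoin (boolean)
Enables or disables the query planner's use of merge-join plan types. The default is on.
enable_nestloop (boolean)
Enables or disables the query planner's use of nested-loop join plans. It is impossible to suppress nested-loop joins entirely, but turning this variable off discourages the planner from using one if there are other methods available. The default is on.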
enable_parallel_append (boolean)
Enables or disables the query planner's use of parallel-aware append plan types. The default is on.
enable_parallel_hash (boolean)
Enables or disables the query planner's use of hash-join plan types with parallel hash. Has no effect if hash-join plans are not also enabled. The default is on.
enable_partition_pruning (boolean)
enable_partitionwise_join (boolean)
Enables or disables the query planner's use of partitionwise join, which allows a join between partitioned tables to be performed by joining the matching partitions. Partitionwise join currently applies only when the join conditions include all the partition keys, which must be of the same data type and have exactly matching sets of child partitions. Because partitionwise join planning can use significantly more CPU time and memory during planning, the default is off.
enable_partitionwise_aggregate (boolean)
Enables or disables the query planner's use of partitionwise grouping or aggregation, which allows grouping or aggregation on partitioned tables to be performed separately for each partition. If the GROUP BY clause does not include the partition keys, only partial aggregation can be performed on a per-partition basis, and finalization must be performed later. Because partitionwise grouping or aggregation can use significantly more CPU time and memory during planning, the default is off.
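enable_seqscan (boolean)
Enables or disables the query planner's use of sequential scan plan types. It is impossible to suppress sequential scans entirely, but turning this variable off discourages the planner from using one if there are other methods available. The default is on.
enable_sort (boolean)
Enables or disables the query planner's use of explicit sort steps. It is impossible to suppress explicit sorts entirely, but turning this variable off discourages the planner from using one if there are other methods available. The default is on.
enable_tidscan (boolean)
Enables or disables the query planner's use of TID scan plan types. The default is on.
The cost variables described in this section are measured on an arbitrary scale. Only their relative values matter, hence scaling them all up or down by the same factor will result in no change in the planner's choices. By default, these cost variables are based on the cost of sequential page fetches; that is, seq_page_cost is conventionally set to 1.0 and the other cost variables are set with reference to that. But you can use a different scale if you prefer, such as actual execution times in milliseconds on a particular machine.
Note: Unfortunately, there is no well-defined method for determining ideal values for the cost variables. They are best treated as averages over the entire mix of queries that a particular installation will receive. This means that changing them on the basis of just a few experiments is very risky.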
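The point that only relative cost values matter can be checked with a toy model. The sketch below is not the planner's real cost function: the three-term formula, the two candidate plans, and their page and tuple counts are all hypothetical and chosen only to show that multiplying every cost constant by the same factor leaves the cheapest plan unchanged.

```python
def plan_cost(seq_pages, random_pages, tuples, costs):
    """Toy cost model (an assumption of this sketch, not PostgreSQL's formula)."""
    return (seq_pages * costs["seq_page_cost"]
            + random_pages * costs["random_page_cost"]
            + tuples * costs["cpu_tuple_cost"])


# Documented defaults for the three constants used here.
defaults = {"seq_page_cost": 1.0, "random_page_cost": 4.0, "cpu_tuple_cost": 0.01}
# Same constants, all multiplied by one common factor.
scaled = {name: value * 8 for name, value in defaults.items()}

# Hypothetical plans: (sequential pages, random pages, tuples processed).
plans = {
    "seq_scan": (1000, 0, 100000),
    "index_scan": (0, 300, 5000),
}


def cheapest(costs):
    return min(plans, key=lambda name: plan_cost(*plans[name], costs))


print(cheapest(defaults), cheapest(scaled))
```

Both calls pick the same plan, because every term of every plan's cost grows by the same factor.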
seq_page_cost (floating point)
random_page_cost (floating point)
Reducing this value relative to seq_page_cost will cause the system to prefer index scans; raising it will make index scans look relatively more expensive. You can raise or lower both values together to change the importance of disk I/O costs relative to CPU costs, which are described by the following parameters.
Random access to mechanical disk storage is normally much more expensive than four times sequential access. However, a lower default is used (4.0) because the majority of random accesses to disk, such as indexed reads, are assumed to be in cache. The default value can be thought of as modeling random access as 40 times slower than sequential, while expecting 90% of random reads to be cached.
If you believe a 90% cache rate is an incorrect assumption for your workload, you can increase random_page_cost to better reflect the true cost of random storage reads. Correspondingly, if your data is likely to be completely in cache, such as when the database is smaller than the total server memory, decreasing random_page_cost can be appropriate. Storage that has a low random read cost relative to sequential, e.g. solid-state drives, might also be better modeled with a lower value for random_page_cost.
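Although the system will let you set random_page_cost to less than seq_page_cost, it is not physically sensible to do so. However, setting them equal makes sense if the database is entirely cached in RAM, since in that case there is no penalty for touching pages out of sequence. Also, in a heavily-cached database you should lower both values relative to the CPU parameters, since the cost of fetching a page already in RAM is much smaller than it would normally be.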
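The reasoning behind the 4.0 default can be worked through numerically. The sketch below is one simplified reading of that reasoning, not a formula used by the server: it treats cached random reads as free and charges only cache misses at the raw random-to-sequential ratio. The function name and the alternative miss rates are hypothetical.

```python
import math


def modeled_random_page_cost(raw_ratio, cache_miss_rate):
    """Simplified model: only cache misses pay the raw random-read penalty."""
    return raw_ratio * cache_miss_rate


# Random access 40 times slower than sequential, 90% of random reads cached.
default_like = modeled_random_page_cost(40, 0.10)
# A workload that misses the cache more often justifies a higher setting.
cold_cache = modeled_random_page_cost(40, 0.25)

print(round(default_like, 2), round(cold_cache, 2))
assert math.isclose(default_like, 4.0)
```

Under the same model, storage with a much smaller raw ratio, such as a solid-state drive, yields a lower value, which is consistent with the advice above.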
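cpu_tuple_cost (floating point)
Sets the planner's estimate of the cost of processing each row during a query. The default is 0.01.
cpu_index_tuple_cost (floating point)
Sets the planner's estimate of the cost of processing each index entry during an index scan. The default is 0.005.
cpu_operator_cost (floating point)
Sets the planner's estimate of the cost of processing each operator or function executed during a query. The default is 0.0025.
parallel_setup_cost (floating point)
Sets the planner's estimate of the cost of launching parallel worker processes. The default is 1000.
parallel_tuple_cost (floating point)
Sets the planner's estimate of the cost of transferring one tuple from a parallel worker process to another process. The default is 0.1.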
min_parallel_table_scan_size (integer)
Sets the minimum amount of table data that must be scanned in order for a parallel scan to be considered. For a parallel sequential scan, the amount of table data scanned is always equal to the size of the table, but when indexes are used the amount of table data scanned will normally be less. The default is 8 megabytes (8MB).
min_parallel_index_scan_size (integer)
Sets the minimum amount of index data that must be scanned in order for a parallel scan to be considered. Note that a parallel index scan typically won't touch the entire index; it is the number of pages which the planner believes will actually be touched by the scan which is relevant. The default is 512 kilobytes (512kB).
effective_cache_size (integer)
Sets the planner's assumption about the effective size of the disk cache that is available to a single query. This is factored into estimates of the cost of using an index; a higher value makes it more likely index scans will be used, a lower value makes it more likely sequential scans will be used. When setting this parameter you should consider both PostgreSQL's shared buffers and the portion of the kernel's disk cache that will be used for PostgreSQL data files. Also, take into account the expected number of concurrent queries on different tables, since they will have to share the available space. This parameter has no effect on the size of shared memory allocated by PostgreSQL, nor does it reserve kernel disk cache; it is used only for estimation purposes. The system also does not assume data remains in the disk cache between queries. The default is 4 gigabytes (4GB).
geqo (boolean)
Enables or disables genetic query optimization. This is on by default. It is usually best not to turn it off in production; the geqo_threshold variable provides more granular control of GEQO.
geqo_threshold (integer)
Use genetic query optimization to plan queries with at least this many FROM items involved. (Note that a FULL OUTER JOIN construct counts as only one FROM item.) The default is 12. For simpler queries it is usually best to use the regular, exhaustive-search planner, but for queries with many tables the exhaustive search takes too long, often longer than the penalty of executing a suboptimal plan. Thus, a threshold on the size of the query is a convenient way to manage use of GEQO.
geqo_effort (integer)
Controls the trade-off between planning time and query plan quality in GEQO. This variable must be an integer in the range from 1 to 10. The default value is five. Larger values increase the time spent doing query planning, but also increase the likelihood that an efficient query plan will be chosen.
geqo_effort doesn't actually do anything directly; it is only used to compute the default values for the other variables that influence GEQO behavior (described below). If you prefer, you can set the other parameters by hand instead.
geqo_pool_size (integer)
Controls the pool size used by GEQO, that is the number of individuals in the genetic population. It must be at least two, and useful values are typically 100 to 1000. If it is set to zero (the default setting) then a suitable value is chosen based on geqo_effort and the number of tables in the query.
geqo_generations (integer)
Controls the number of generations used by GEQO, that is the number of iterations of the algorithm. It must be at least one, and useful values are in the same range as the pool size. If it is set to zero (the default setting) then a suitable value is chosen based on geqo_pool_size.
geqo_selection_bias (floating point)
Controls the selection bias used by GEQO. The selection bias is the selective pressure within the population. Values can be from 1.50 to 2.00; the latter is the default.
geqo_seed (floating point)
Controls the initial value of the random number generator used by GEQO to select random paths through the join order search space. The value can range from zero (the default) to one. Varying the value changes the set of join paths explored, and may result in a better or worse best path being found.
default_statistics_target (integer)
constraint_exclusion (enum)
Controls the query planner's use of table constraints to optimize queries. The allowed values of constraint_exclusion are on (examine constraints for all tables), off (never examine constraints), and partition (examine constraints only for inheritance child tables and UNION ALL subqueries). partition is the default setting. It is often used with inheritance and partitioned tables to improve performance.
When this parameter allows it for a particular table, the planner compares query conditions with the table's CHECK constraints, and omits scanning tables for which the conditions contradict the constraints. For example, if a child table child1000 carries a CHECK constraint on a key column and a SELECT on the parent asks for a key value that the constraint rules out, then with constraint exclusion enabled the SELECT will not scan child1000 at all, improving performance.
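The exclusion test described above can be sketched outside the database. This is a minimal illustration in Python, not planner code: the child table child1000 comes from the text, while child2000, the key ranges, and the looked-up value 2400 are assumptions made for the example. Each child is modeled as having a CHECK constraint of the form key BETWEEN low AND high.

```python
from dataclasses import dataclass


@dataclass
class ChildTable:
    name: str
    low: int    # lower bound of the assumed CHECK (key BETWEEN low AND high)
    high: int   # upper bound of the same constraint


def children_to_scan(children, key):
    """Keep only children whose CHECK constraint does not contradict key = <value>."""
    return [child.name for child in children if child.low <= key <= child.high]


children = [
    ChildTable("child1000", 1000, 1999),
    ChildTable("child2000", 2000, 2999),
]

# A query condition of key = 2400 contradicts child1000's constraint.
print(children_to_scan(children, 2400))
```

Only child2000 remains in the result, which mirrors how a child whose constraint contradicts the query condition is skipped.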
Currently, constraint exclusion is enabled by default only for cases that are often used to implement table partitioning. Turning it on for all tables imposes extra planning overhead that is quite noticeable on simple queries, and most often will yield no benefit for simple queries. If you have no partitioned tables you might prefer to turn it off entirely.
cursor_tuple_fraction (floating point)
Sets the planner's estimate of the fraction of a cursor's rows that will be retrieved. The default is 0.1. Smaller values of this setting bias the planner towards using “fast start” plans for cursors, which will retrieve the first few rows quickly while perhaps taking a long time to fetch all rows. Larger values put more emphasis on the total estimated time. At the maximum setting of 1.0, cursors are planned exactly like regular queries, considering only the total estimated time and not how soon the first rows might be delivered.
from_collapse_limit (integer)
join_collapse_limit (integer)
The planner will rewrite explicit JOIN constructs (except FULL JOINs) into lists of FROM items whenever a list of no more than this many items would result. Smaller values reduce planning time but might yield inferior query plans.
force_parallel_mode (enum)
Allows the use of parallel queries for testing purposes even in cases where no performance benefit is expected. The allowed values of force_parallel_mode are off (use parallel mode only when it is expected to improve performance), on (force parallel query for all queries for which it is thought to be safe), and regress (like on, but with additional behavior changes as explained below).
More specifically, setting this value to on will add a Gather node to the top of any query plan for which this appears to be safe, so that the query runs inside of a parallel worker. Even when a parallel worker is not available or cannot be used, operations such as starting a subtransaction that would be prohibited in a parallel query context will be prohibited unless the planner believes that this will cause the query to fail. If failures or unexpected results occur when this option is set, some functions used by the query may need to be marked PARALLEL UNSAFE (or, possibly, PARALLEL RESTRICTED).
Setting this value to regress has all of the same effects as setting it to on plus some additional effects that are intended to facilitate automated regression testing. Normally, messages from a parallel worker include a context line indicating that, but a setting of regress suppresses this line so that the output is the same as in non-parallel execution. Also, the Gather nodes added to plans by this setting are hidden in EXPLAIN output so that the output matches what would be obtained if this setting were turned off.
In addition to postgresql.conf, a PostgreSQL data directory contains a file postgresql.auto.conf, which has the same format as postgresql.conf but should never be edited manually. This file holds settings provided through the ALTER SYSTEM command. This file is automatically read whenever postgresql.conf is, and its settings take effect in the same way. Settings in postgresql.auto.conf override those in postgresql.conf.
The system view pg_file_settings can be helpful for pre-testing changes to the configuration file, or for diagnosing problems if a SIGHUP signal did not have the desired effects.
PostgreSQL provides three SQL commands to establish configuration defaults. The already-mentioned ALTER SYSTEM command provides a SQL-accessible means of changing global defaults; it is functionally equivalent to editing postgresql.conf. In addition, there are two commands that allow setting of defaults on a per-database or per-role basis:
The ALTER DATABASE command allows global settings to be overridden on a per-database basis.
The ALTER ROLE command allows both global and per-database settings to be overridden with user-specific values.
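The layering of defaults described above (global, then per-database, then per-role) behaves like a chain of lookups in which the most specific layer wins. The sketch below models that with Python's ChainMap; it is an illustration only, and the parameter values shown are hypothetical examples rather than recommendations.

```python
from collections import ChainMap

# Hypothetical values for two real parameters, one layer per scope.
global_defaults = {"work_mem": "4MB", "statement_timeout": "0"}
per_database = {"work_mem": "16MB"}            # overrides the global default
per_role = {"statement_timeout": "30s"}        # overrides global and per-database

# ChainMap searches its maps left to right, so the role layer is consulted first.
effective = ChainMap(per_role, per_database, global_defaults)

print(effective["work_mem"], effective["statement_timeout"])
```

A parameter that no layer overrides simply falls through to the global default.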
The SHOW command allows inspection of the current value of all parameters. The corresponding function is current_setting(setting_name text).
The SET command allows modification of the current value of those parameters that can be set locally to a session; it has no effect on other sessions. The corresponding function is set_config(setting_name, new_value, is_local).
In addition, the system view pg_settings can be used to view and change session-local values:
Using UPDATE on this view, specifically updating the setting column, is the equivalent of issuing SET commands.
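This access control mechanism is independent of the one described in the client authentication chapter.
When a password is specified in CREATE ROLE or ALTER ROLE, this parameter determines the algorithm to use to encrypt the password. The default value is md5, which stores the password as an MD5 hash (on is also accepted, as an alias for md5). Setting this parameter to scram-sha-256 will encrypt the password with SCRAM-SHA-256.
Note that older clients might lack support for the SCRAM authentication mechanism, and hence not work with passwords encrypted with SCRAM-SHA-256.
Enables SSL connections. Read the documentation on SSL setup before using this. This parameter can only be set in the postgresql.conf file or on the server command line. The default is off.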
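Sets the location of the Kerberos server key file. This parameter can only be set in the postgresql.conf file or on the server command line.
For more information on setting up SSL, see the documentation on secure TCP/IP connections with SSL.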
In many situations, turning off synchronous_commit for noncritical transactions can provide much of the potential performance benefit of turning off fsync, without the attendant risks of data corruption.
fsync can only be set in the postgresql.conf file or on the server command line. If you turn this parameter off, also consider turning off full_page_writes.
Specifies whether transaction commit will wait for WAL records to be written to disk before the command returns a “success” indication to the client. Valid values are on, remote_apply, remote_write, local, and off. The default, and safe, setting is on. When off, there can be a delay between when success is reported to the client and when the transaction is really guaranteed to be safe against a server crash. (The maximum delay is three times wal_writer_delay.) Unlike fsync, setting this parameter to off does not create any risk of database inconsistency: an operating system or database crash might result in some recent allegedly-committed transactions being lost, but the database state will be just the same as if those transactions had been aborted cleanly. So, turning synchronous_commit off can be a useful alternative when performance is more important than exact certainty about the durability of a transaction.
If synchronous_standby_names is non-empty, this parameter also controls whether or not transaction commits will wait for their WAL records to be replicated to the standby server(s). When set to on, commits will wait until replies from the current synchronous standby(s) indicate they have received the commit record of the transaction and flushed it to disk. This ensures the transaction will not be lost unless both the primary and all synchronous standbys suffer corruption of their database storage. When set to remote_apply, commits will wait until replies from the current synchronous standby(s) indicate they have received the commit record of the transaction and applied it, so that it has become visible to queries on the standby(s). When set to remote_write, commits will wait until replies from the current synchronous standby(s) indicate they have received the commit record of the transaction and written it out to their operating system. This setting is sufficient to ensure data preservation even if a standby instance of PostgreSQL were to crash, but not if the standby suffers an operating-system-level crash, since the data has not necessarily reached stable storage on the standby. Finally, the setting local causes commits to wait for local flush to disk, but not for replication. This is not usually desirable when synchronous replication is in use, but is provided for completeness.
The open_* options also use O_DIRECT if available. Not all of these choices are available on all platforms. The default is the first method in the above list that is supported by the platform, except that fdatasync is the default on Linux. The default is not necessarily ideal; it might be necessary to change this setting or other aspects of your system configuration in order to create a crash-safe configuration or achieve optimal performance. This parameter can only be set in the postgresql.conf file or on the server command line.
Turning this parameter off does not affect use of WAL archiving for point-in-time recovery (PITR).
When this parameter is on, the PostgreSQL server compresses a full page image written to WAL when full_page_writes is on or during a base backup. A compressed page image will be decompressed during WAL replay. The default value is off. Only superusers can change this setting.
The amount of shared memory used for WAL data that has not yet been written to disk. The default setting of -1 selects a size equal to 1/32nd (about 3%) of shared_buffers, but not less than 64kB nor more than the size of one WAL segment, typically 16MB. This value can be set manually if the automatic choice is too large or too small, but any positive value less than 32kB will be treated as 32kB. If this value is specified without units, it is taken as WAL blocks, that is XLOG_BLCKSZ bytes, typically 8kB. This parameter can only be set at server start.
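The automatic sizing rule for the -1 setting can be written out as arithmetic. The following sketch only restates the rule given above (1/32nd of shared_buffers, clamped between 64kB and one WAL segment); the function name is hypothetical and the 16MB segment size is the typical value mentioned in the text, taken here as an assumption.

```python
KB = 1024
MB = 1024 * KB


def auto_wal_buffers(shared_buffers_bytes, wal_segment_bytes=16 * MB):
    """Size chosen when the setting is -1: shared_buffers/32, clamped to [64kB, one segment]."""
    candidate = shared_buffers_bytes // 32
    return max(64 * KB, min(candidate, wal_segment_bytes))


for shared_buffers in (1 * MB, 128 * MB, 1024 * MB):
    print(shared_buffers // MB, "MB ->", auto_wal_buffers(shared_buffers) // KB, "kB")
```

Small shared_buffers values hit the 64kB floor, and large ones are capped at the segment size, so manual tuning mainly matters in between or when the cap is too low for the workload.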
Whenever more than this amount of data has been written while performing a checkpoint, attempt to force the OS to issue these writes to the underlying storage. Doing so will limit the amount of dirty data in the kernel's page cache, reducing the likelihood of stalls when an fsync is issued at the end of the checkpoint, or when the OS writes data back in larger batches in the background. Often that will result in greatly reduced transaction latency, but there also are some cases, especially with workloads that are bigger than shared_buffers, but smaller than the OS's page cache, where performance might degrade. This setting may have no effect on some platforms. If this value is specified without units, it is taken as blocks, that is BLCKSZ bytes, typically 8kB. The valid range is between 0, which disables forced writeback, and 2MB. The default is 256kB on Linux, 0 elsewhere. (If BLCKSZ is not 8kB, the default and maximum values scale proportionally to it.) This parameter can only be set in the postgresql.conf file or on the server command line.
When archive_mode is enabled, completed WAL segments are sent to archive storage by setting archive_command. In addition to off, to disable, there are two modes: on, and always. During normal operation, there is no difference between the two modes, but when set to always the WAL archiver is enabled also during archive recovery or standby mode. In always mode, all files restored from the archive or streamed with streaming replication will be archived (again).
The local shell command to execute to archive a completed WAL file segment. Any %p in the string is replaced by the path name of the file to archive, and any %f is replaced by only the file name. (The path name is relative to the working directory of the server, i.e., the cluster's data directory.) Use %% to embed an actual % character in the command. It is important for the command to return a zero exit status only if it succeeds.
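The placeholder rules for the archive command (%p, %f, and %% for a literal percent sign) can be illustrated with a small expansion routine. This is a sketch of the substitution only, not the server's implementation: the function name, the example command, and the archive directory are hypothetical, and leaving any other %-sequence unchanged is an assumption of the sketch.

```python
def expand_archive_command(template, path, filename):
    """Expand %p, %f and %% in a single left-to-right pass."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template):
            nxt = template[i + 1]
            if nxt == "p":          # path name of the file to archive
                out.append(path)
                i += 2
                continue
            if nxt == "f":          # file name only
                out.append(filename)
                i += 2
                continue
            if nxt == "%":          # literal percent sign
                out.append("%")
                i += 2
                continue
        # Anything else is copied as-is (assumption of this sketch).
        out.append(ch)
        i += 1
    return "".join(out)


segment = "000000010000000000000001"
print(expand_archive_command("cp %p /mnt/archive/%f", "pg_wal/" + segment, segment))
```

Scanning once from left to right matters: replacing %% first and then %p in separate passes would wrongly turn %%p into a path.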
The archive_command is only invoked for completed WAL segments. Hence, if your server generates little WAL traffic (or has slack periods where it does so), there could be a long delay between the completion of a transaction and its safe recording in archive storage. To limit how old unarchived data can be, you can set archive_timeout to force the server to switch to a new WAL segment file periodically. When this parameter is greater than zero, the server will switch to a new segment file whenever this amount of time has elapsed since the last segment file switch, and there has been any database activity, including a single checkpoint (checkpoints are skipped if there is no database activity). Note that archived files that are closed early due to a forced switch are still the same length as completely full files. Therefore, it is unwise to use a very short archive_timeout, because it will bloat your archive storage. archive_timeout settings of a minute or so are usually reasonable. You should consider using streaming replication, instead of archiving, if you want data to be copied off the master server more quickly than that. If this value is specified without units, it is taken as seconds. This parameter can only be set in the postgresql.conf file or on the server command line.
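Because a segment closed early by a forced switch still occupies a full segment in the archive, the storage cost of a short timeout is easy to estimate. The sketch below computes an upper bound for a lightly loaded server on which every timeout interval sees some activity; the function name is hypothetical, and the 16MB segment size is the typical value, taken here as an assumption.

```python
SEGMENT_BYTES = 16 * 1024 * 1024   # typical WAL segment size (assumed)
SECONDS_PER_DAY = 86400


def forced_switch_bytes_per_day(archive_timeout_s, segment_bytes=SEGMENT_BYTES):
    """Upper bound on archive growth caused by forced segment switches alone."""
    if archive_timeout_s <= 0:
        raise ValueError("a timeout of zero disables forced switches")
    switches = SECONDS_PER_DAY // archive_timeout_s
    return switches * segment_bytes


for timeout in (10, 60, 300):
    gib = forced_switch_bytes_per_day(timeout) / 1024 ** 3
    print(f"archive_timeout={timeout}s -> up to {gib:.1f} GiB/day")
```

The bound shrinks in proportion to the timeout, which is why very short settings are discouraged even on servers that write little WAL.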
To start the server in standby mode, create a file called standby.signal in the data directory. The server will enter recovery and will not stop recovery when the end of archived WAL is reached, but will keep trying to continue recovery by connecting to the sending server as specified by the primary_conninfo setting and/or by fetching new WAL segments using restore_command. For this mode, the parameters from this section and the standby server settings are of interest. Parameters from the recovery target settings will also be applied but are typically not useful in this mode.
To start the server in targeted recovery mode, create a file called recovery.signal in the data directory. If both standby.signal and recovery.signal files are created, standby mode takes precedence. Targeted recovery mode ends when the archived WAL is fully replayed, or when recovery_target is reached. In this mode, the parameters from both this section and the recovery target settings will be used.
The local shell command to execute to retrieve an archived segment of the WAL file series. This parameter is required for archive recovery, but optional for streaming replication. Any %f in the string is replaced by the name of the file to retrieve from the archive, and any %p is replaced by the copy destination path name on the server. (The path name is relative to the current working directory, i.e., the cluster's data directory.) Any %r is replaced by the name of the file containing the last valid restart point. That is the earliest file that must be kept to allow a restore to be restartable, so this information can be used to truncate the archive to just the minimum required to support restarting from the current restore. %r is typically only used by warm-standby configurations. Write %% to embed an actual % character.
This optional parameter specifies a shell command that will be executed at every restartpoint. The purpose of archive_cleanup_command is to provide a mechanism for cleaning up old archived WAL files that are no longer needed by the standby server. Any %r is replaced by the name of the file containing the last valid restart point. That is the earliest file that must be kept to allow a restore to be restartable, and so all files earlier than %r may be safely removed. This information can be used to truncate the archive to just the minimum required to support restart from the current restore. The pg_archivecleanup module is often used in archive_cleanup_command for single-standby configurations.
Note however that if multiple standby servers are restoring from the same archive directory, you will need to ensure that you do not delete WAL files until they are no longer needed by any of the servers. archive_cleanup_command would typically be used in a warm-standby configuration. Write %% to embed an actual % character in the command.
This parameter specifies a shell command that will be executed once only at the end of recovery. This parameter is optional. The purpose of the recovery_end_command is to provide a mechanism for cleanup following replication or recovery. Any %r is replaced by the name of the file containing the last valid restart point, like in archive_cleanup_command.
This parameter specifies the time stamp up to which recovery will proceed. The precise stopping point is also influenced by recovery_target_inclusive.
This parameter specifies the transaction ID up to which recovery will proceed. Keep in mind that while transaction IDs are assigned sequentially at transaction start, transactions can complete in a different numeric order. The transactions that will be recovered are those that committed before (and optionally including) the specified one. The precise stopping point is also influenced by recovery_target_inclusive.
This parameter specifies the LSN of the write-ahead log location up to which recovery will proceed. The precise stopping point is also influenced by recovery_target_inclusive. This parameter is parsed using the system data type pg_lsn.
Specifies whether to stop just after the specified recovery target (on), or just before the recovery target (off). Applies when recovery_target_lsn, recovery_target_time, or recovery_target_xid is specified. This setting controls whether transactions having exactly the target WAL location (LSN), commit time, or transaction ID, respectively, will be included in the recovery. Default is on.
You usually only need to set this parameter in complex re-recovery situations, where you need to return to a state that itself was reached after a point-in-time recovery.
The intended use of the pause setting is to allow queries to be executed against the database to check if this recovery target is the most desirable point for recovery. The paused state can be resumed by using pg_wal_replay_resume(), which then causes recovery to end. If this recovery target is not the desired stopping point, then shut down the server, change the recovery target settings to a later target and restart to continue recovery.
This setting has no effect if no recovery target is set. If hot_standby is not enabled, a setting of pause will act the same as shutdown.
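Enables or disables the query planner's ability to eliminate a partitioned table's partitions from query plans. This also controls the planner's ability to generate query plans which allow the query executor to remove (ignore) partitions during query execution. The default is on.
Sets the planner's estimate of the cost of a disk page fetch that is part of a series of sequential fetches. The default is 1.0. This value can be overridden for tables and indexes in a particular tablespace by setting the tablespace parameter of the same name (see ALTER TABLESPACE).
Sets the planner's estimate of the cost of a non-sequentially-fetched disk page. The default is 4.0. This value can be overridden for tables and indexes in a particular tablespace by setting the tablespace parameter of the same name (see ALTER TABLESPACE).
The genetic query optimizer (GEQO) is an algorithm that does query planning using heuristic searching. This reduces planning time for complex queries (those joining many relations), at the cost of producing plans that are sometimes inferior to those found by the normal exhaustive-search algorithm.
Sets the default statistics target for table columns without a column-specific target set via ALTER TABLE SET STATISTICS. Larger values increase the time needed to do ANALYZE, but might improve the quality of the planner's estimates. The default is 100.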
Refer to for more information on using constraint exclusion and partitioning.
The planner will merge sub-queries into upper queries if the resulting FROM list would have no more than this many items. Smaller values reduce planning time but might yield inferior query plans. The default is eight.
Setting this value to geqo_threshold or more may trigger use of the GEQO planner, resulting in non-optimal plans.
By default, this variable is set the same as from_collapse_limit, which is appropriate for most uses. Setting it to 1 prevents any reordering of explicit JOINs. Thus, the explicit join order specified in the query will be the actual order in which the relations are joined. Because the query planner does not always choose the optimal join order, advanced users can elect to temporarily set this variable to 1, and then specify the join order they desire explicitly.
Setting this value to geqo_threshold or more may trigger use of the GEQO planner, resulting in non-optimal plans.
CREATE TABLE AS
CREATE INDEX
CLUSTER
COPY into tables that were created or truncated in the same transaction
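These settings control the behavior of the autovacuum feature. Refer to Section 24.1.6 for more information. Note that many of these settings can be overridden on a per-table basis; see the description of storage parameters.
autovacuum (boolean)
Controls whether the server should run the autovacuum launcher daemon. This is on by default; however, track_counts must also be enabled for autovacuum to work. This parameter can only be set in the postgresql.conf file or on the server command line; however, autovacuuming can be disabled for individual tables by changing table storage parameters.
Note that even when this parameter is disabled, the system will launch autovacuum processes if necessary to prevent transaction ID wraparound. See Section 24.1.5 for more information.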
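log_autovacuum_min_duration (integer)
Causes each action executed by autovacuum to be logged if it ran for at least the specified number of milliseconds. Setting this to zero logs all autovacuum actions. -1 (the default) disables logging of autovacuum actions. For example, if you set this to 250ms then all automatic vacuums and analyzes that run 250ms or longer will be logged. In addition, when this parameter is set to any value other than -1, a message will be logged if an autovacuum action is skipped due to a conflicting lock. Enabling this parameter can be helpful in tracking autovacuum activity. This parameter can only be set in the postgresql.conf file or on the server command line; but the setting can be overridden for individual tables by changing table storage parameters.
autovacuum_max_workers (integer)
Specifies the maximum number of autovacuum processes (other than the autovacuum launcher) that may be running at any one time. The default is three. This parameter can only be set at server start.
autovacuum_naptime (integer)
Specifies the minimum delay between autovacuum runs on any given database. In each round the daemon examines the database and issues VACUUM and ANALYZE commands as needed for tables in that database. The delay is measured in seconds, and the default is one minute. This parameter can only be set in the postgresql.conf file or on the server command line.
autovacuum_vacuum_threshold (integer)
Specifies the minimum number of updated or deleted tuples needed to trigger a VACUUM in any one table. The default is 50 tuples. This parameter can only be set in the postgresql.conf file or on the server command line; but the setting can be overridden for individual tables by changing table storage parameters.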
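autovacuum_vacuum_insert_threshold (integer)
Specifies the number of inserted tuples needed to trigger a VACUUM in any one table. The default is 1000 tuples. If -1 is specified, VACUUM will not be triggered based on the number of inserts. This parameter can only be set in the postgresql.conf file or on the server command line; but the setting can be overridden for individual tables by changing table storage parameters.
autovacuum_analyze_threshold (integer)
Specifies the minimum number of inserted, updated or deleted tuples needed to trigger an ANALYZE in any one table. The default is 50 tuples. This parameter can only be set in the postgresql.conf file or on the server command line; but the setting can be overridden for individual tables by changing table storage parameters.
autovacuum_vacuum_scale_factor (floating point)
Specifies a fraction of the table size to add to autovacuum_vacuum_threshold when deciding whether to trigger a VACUUM. The default is 0.2 (20% of table size). This parameter can only be set in the postgresql.conf file or on the server command line; but the setting can be overridden for individual tables by changing table storage parameters.
autovacuum_vacuum_insert_scale_factor (floating point)
Specifies a fraction of the table size to add to autovacuum_vacuum_insert_threshold when deciding whether to trigger a VACUUM. The default is 0.2 (20% of table size). This parameter can only be set in the postgresql.conf file or on the server command line; but the setting can be overridden for individual tables by changing table storage parameters.
autovacuum_analyze_scale_factor (floating point)
Specifies a fraction of the table size to add to autovacuum_analyze_threshold when deciding whether to trigger an ANALYZE. The default is 0.1 (10% of table size). This parameter can only be set in the postgresql.conf file or on the server command line; but the setting can be overridden for individual tables by changing table storage parameters.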
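The threshold and scale-factor pairs combine additively: the base threshold plus the scale factor times the table size. The sketch below works through that arithmetic with the defaults listed above. It is an illustration under stated assumptions, not the autovacuum code: the function names are hypothetical, table size is taken as a tuple count, and a trigger is assumed to fire when the change count exceeds the computed threshold.

```python
def trigger_threshold(table_tuples, base_threshold, scale_factor):
    """base threshold + scale factor * table size (in tuples)."""
    return base_threshold + scale_factor * table_tuples


def exceeds(changed_tuples, table_tuples, base_threshold, scale_factor):
    # Assumption of this sketch: a strict "greater than" comparison.
    return changed_tuples > trigger_threshold(table_tuples, base_threshold, scale_factor)


table_tuples = 10_000

vacuum_at = trigger_threshold(table_tuples, 50, 0.2)      # updates and deletes
insert_at = trigger_threshold(table_tuples, 1000, 0.2)    # inserts
analyze_at = trigger_threshold(table_tuples, 50, 0.1)     # inserts, updates, deletes

print(vacuum_at, insert_at, analyze_at)
print(exceeds(2_500, table_tuples, 50, 0.2))
```

On large tables the scale-factor term dominates, which is why per-table storage parameters are often used to lower it for very big, frequently updated tables.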
autovacuum_freeze_max_age
(integer
)
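Specifies the maximum age (in transactions) that a table's pg_class.relfrozenxid field can attain before a VACUUM operation is forced to prevent transaction ID wraparound within the table. Note that the system will launch autovacuum processes to prevent wraparound even when autovacuum is otherwise disabled.
Vacuum also allows removal of old files from the pg_xact subdirectory, which is why the default is a relatively low 200 million transactions. This parameter can only be set at server start, but the setting can be reduced for individual tables by changing table storage parameters. For more information see Section 24.1.5.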
autovacuum_multixact_freeze_max_age
(integer
)
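Specifies the maximum age (in multixacts) that a table's pg_class.relminmxid field can attain before a VACUUM operation is forced to prevent multixact ID wraparound within the table. Note that the system will launch autovacuum processes to prevent wraparound even when autovacuum is otherwise disabled.
Vacuuming multixacts also allows removal of old files from the pg_multixact/members and pg_multixact/offsets subdirectories, which is why the default is a relatively low 400 million multixacts. This parameter can only be set at server start, but the setting can be reduced for individual tables by changing table storage parameters. For more information see Section 24.1.5.1.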
autovacuum_vacuum_cost_delay
(integer
)
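Specifies the cost delay value that will be used in automatic VACUUM operations. If -1 is specified, the regular vacuum_cost_delay value will be used. The default value is 20 milliseconds. This parameter can only be set in the postgresql.conf file or on the server command line; but the setting can be overridden for individual tables by changing table storage parameters.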
autovacuum_vacuum_cost_limit
(integer
)
指定將在自動 VACUUM 操作中使用的成本上限值。如果指定了 -1(這是預設值),則將使用標準的 vacuum_cost_limit 值。請注意,如果有多個工作程序,則在運行的自動清理工作程序之間會按比例分配值,以便每個工作程序的限制總和不超過此參數的值。此參數只能在 postgresql.conf 檔案或伺服器命令行中設定;但也可以透過變更資料表儲存參數來覆寫單個資料表的設定。
這些參數控制伺服器端的統計數據收集功能。啟用統計數據收集後,可以透過 pg_stat 和 pg_statio 系列系統檢視表取得相關的資料。有關更多資訊,請參閱第 27 章。
track_activities
(boolean
)
啟用收集有關每個連線的當下執行命令的資訊以及該命令開始執行的時間的資訊。預設情況下,此參數是開啓的。請注意,即使啟用此功能,也不是所有使用者都可以取用,而只有超級使用者和擁有該連線的使用者可以檢視這些數據,因此它不會有安全風險。僅超級使用者可以變更此設定。
track_activity_query_size
(integer
)
為 pg_stat_activity.query 欄位指定保留的字元數,以追踪每個連線查詢當下執行的指令字串。預設值為1024。只能在伺服器啟動時設定此參數。
track_counts
(boolean
)
啟用有關資料庫活動的統計資訊收集。預設情況下,此參數是啟用的,因為 autovacuum 背景程序需要收集資訊。僅超級使用者可以變更此設定。
track_io_timing
(boolean
)
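Enables timing of database I/O calls. This parameter is off by default, because it will repeatedly query the operating system for the current time, which may cause significant overhead on some platforms. You can use the pg_test_timing tool to measure the overhead of timing on your system. I/O timing information is displayed in pg_stat_database, in the output of EXPLAIN when the BUFFERS option is used, and by pg_stat_statements. Only superusers can change this setting.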
track_functions
(enum
)
啟用對函數呼叫計數和使用時間的追踪。指定 pl 僅追踪程序語言函數,all 則表示也追踪 SQL 和 C 語言函數。預設值為 none,這將停用函數統計資訊追踪。僅超級使用者可以變更此設定。
注意 不管此設定如何,都不會追踪足夠簡單以「inline」到呼叫查詢中的 SQL 語言函數。
stats_temp_directory
(string
)
設定用於儲存臨時統計數據的目錄。這可以是相對於資料目錄的路徑,也可以是絕對路徑。預設值為 pg_stat_tmp。將其指向基於 RAM 的檔案系統可以降低物理性 I/O 的要求,使得效能提升。只能在 postgresql.conf 檔案或伺服器命令列中設定此參數。
log_statement_stats
(boolean
)
log_parser_stats
(boolean
)
log_planner_stats
(boolean
)
log_executor_stats
(boolean
)
對於每個查詢,將相對應模組的效能統計數據輸出到伺服器日誌。這是一個粗略的分析工具,類似於 Unix getrusage() 作業系統的工具。log_statement_stats 總計整個查詢語句過程的統計數據,而其他的設定是每個查詢模組的統計數據。log_statement_stats 不能與任何其他模組選項同時啟用。預設情況下,所有這些選項都是停用的。只有超級使用者可以變更這些設定。
exit_on_error
(boolean
)If on, any error will terminate the current session. By default, this is set to off, so that only FATAL errors will terminate the session.
restart_after_crash
(boolean
)When set to on, which is the default, PostgreSQL will automatically reinitialize after a backend crash. Leaving this value set to on is normally the best way to maximize the availability of the database. However, in some circumstances, such as when PostgreSQL is being invoked by clusterware, it may be useful to disable the restart so that the clusterware can gain control and take any actions it deems appropriate.
This parameter can only be set in the postgresql.conf file or on the server command line.
data_sync_retry
(boolean
)When set to off, which is the default, PostgreSQL will raise a PANIC-level error on failure to flush modified data files to the file system. This causes the database server to crash. This parameter can only be set at server start.
On some operating systems, the status of data in the kernel's page cache is unknown after a write-back failure. In some cases it might have been entirely forgotten, making it unsafe to retry; the second attempt may be reported as successful, when in fact the data has been lost. In these circumstances, the only way to avoid data loss is to recover from the WAL after any failure is reported, preferably after investigating the root cause of the failure and replacing any faulty hardware.
If set to on, PostgreSQL will instead report an error but continue to run so that the data flushing operation can be retried in a later checkpoint. Only set it to on after investigating the operating system's treatment of buffered data in case of write-back failure.
recovery_init_sync_method
(enum
)When set to fsync, which is the default, PostgreSQL will recursively open and synchronize all files in the data directory before crash recovery begins. The search for files will follow symbolic links for the WAL directory and each configured tablespace (but not any other symbolic links). This is intended to make sure that all WAL and data files are durably stored on disk before replaying changes. This applies whenever starting a database cluster that did not shut down cleanly, including copies created with pg_basebackup.
On Linux, syncfs may be used instead, to ask the operating system to synchronize the whole file systems that contain the data directory, the WAL files and each tablespace (but not any other file systems that may be reachable through symbolic links). This may be a lot faster than the fsync setting, because it doesn't need to open each file one by one. On the other hand, it may be slower if a file system is shared by other applications that modify a lot of files, since those files will also be written to disk. Furthermore, on versions of Linux before 5.8, I/O errors encountered while writing data to disk may not be reported to PostgreSQL, and relevant error messages may appear only in kernel logs.
This parameter can only be set in the postgresql.conf file or on the server command line.
shared_buffers
(integer
)
設定資料庫伺服器用於共享記憶體緩衝區的大小。預設值通常為 128 MB,但如果您的核心設定不支援(在 initdb 期間確定),則可能會更少。此設定必須至少為128 KB。(非預設值的 BLCKSZ 會改變最小值。)但是,通常需要高於最小值的設定才能獲得良好的性能。此參數只能在伺服器啟動時設定。
如果您擁有 1GB 或更多記憶體的專用資料庫伺服器,shared_buffers 的合理起始值是系統記憶體的 25%。有些工作負載甚至可以為 shared_buffers 設定更大的值,但由於PostgreSQL 依賴於作業系統緩衝區,因此,把 shared_buffers 分配 40% 以上的記憶體大小不太可能比少量分配更好。shared_buffers 較大設定通常需要 max_wal_size 相對應的增加,以便分散在較長時間內寫入大量新資料或變更資料的過程。
在 RAM 小於 1GB 的系統上,更小比例是合適的,以便為作業系統留下足夠的空間。
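The 25% starting point and the 40% soft ceiling described above can be written down as a small helper. This is only a sketch of the rule of thumb in the text, under the assumption of a dedicated database server; the function names are hypothetical and the result is a starting value to benchmark, not a recommendation for any particular workload.

```python
def suggested_shared_buffers_mb(total_ram_mb: int) -> int:
    """Starting value for shared_buffers on a dedicated server: 25% of RAM.

    Below 1 GB of RAM the text recommends a smaller fraction chosen by hand,
    so this sketch refuses to guess in that case.
    """
    if total_ram_mb < 1024:
        raise ValueError("less than 1 GB of RAM: pick a smaller fraction manually")
    return total_ram_mb // 4


def exceeds_soft_ceiling(shared_buffers_mb: int, total_ram_mb: int) -> bool:
    """True when the setting is above 40% of RAM, which rarely helps further."""
    return shared_buffers_mb > total_ram_mb * 0.4


ram_mb = 16 * 1024  # hypothetical 16 GB server
start = suggested_shared_buffers_mb(ram_mb)
print(f"shared_buffers = {start}MB")
print("above 40% ceiling:", exceeds_soft_ceiling(start, ram_mb))
```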
huge_pages
(enum
)
啟用/停用大型記憶體頁面。有效值為 try(預設值),on 和 off。
目前,僅在 Linux 上支援此功能。設定為 try 時,在其他系統上會忽略該設定。
大型頁面的使用會使得頁面管理表更小,記憶體管理花費的 CPU 時間更少,從而提高了效能。有關更多詳細訊息,請參閱第 18.4.5 節。
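With huge_pages set to try, the server will try to use huge pages, but fall back to using normal allocation if that fails. With on, failure to use huge pages will prevent the server from starting up. With off, huge pages will not be used.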
temp_buffers
(integer
)
設定每個資料庫連線使用的最大臨時緩衝區大小。這些是僅用於存取臨時資料表的連線本地緩衝區。預設值為 8MB。可以在單個連線中變更設定,但只能在連線中首次使用臨時資料表之前更改;後續嘗試更改該值將不會對該連線產生任何影響。
連線將根據需要分配臨時緩衝區,直到 temp_buffers 的上限。實際上不需要很多臨時緩衝區的連線中設定較大值的成本只是 temp_buffers 中每個增量的緩衝區描述指標,或大約 64 個位元組。但是,如果實際使用緩衝區,則會消耗額外的 8192 位元組(或者通常為 BLCKSZ 個位元組)。
max_prepared_transactions
(integer
)
設定可同時處於「prepared」狀態的最大交易事務數量(請參閱 PREPARE TRANSACTION)。將此參數設定為零(這是預設值)的話,會停用預備交易的功能。此參數只能在伺服器啟動時設定。
如果您不打算使用預備交易事務,則應將此參數設定為零以防止意外建立預備的交易事務。如果您正在使用預備的交易事務,那麼您可能希望 max_prepared_transactions 至少與 max_connections 一樣大,以便每個連線都可以至少有一個準備好的預備交易事務。
運行備用伺服器時,必須將此參數設定為與主服務器上相同或更高的值。 否則,查詢將不被允許在備用伺服器中。
work_mem
(integer
)
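Specifies the amount of memory to be used by internal sort operations and hash tables before writing to temporary disk files. The value defaults to four megabytes (4MB). Note that for a complex query, several sort or hash operations might be running in parallel; each operation will be allowed to use as much memory as this value specifies before it starts to write data into temporary files. Also, several running sessions could be doing such operations concurrently. Therefore, the total memory used could be many times the value of work_mem; it is necessary to keep this fact in mind when choosing the value. Sort operations are used for ORDER BY, DISTINCT, and merge joins. Hash tables are used in hash joins, hash-based aggregation, and hash-based processing of IN subqueries.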
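Because work_mem applies per sort or hash operation rather than per session, a quick upper-bound estimate helps when choosing a value. The sketch below multiplies the three factors named in the text; the function name and the example figures (100 sessions, 3 operations per query) are hypothetical.

```python
def worst_case_work_mem_mb(work_mem_mb: int, sessions: int, ops_per_query: int) -> int:
    """Upper bound on memory used by sorts and hashes across all sessions.

    Every session may run ops_per_query sort or hash operations at once,
    and each of them may use up to work_mem before spilling to disk.
    """
    return work_mem_mb * sessions * ops_per_query


# Default work_mem of 4 MB, hypothetical load of 100 sessions with 3 operations each.
print(worst_case_work_mem_mb(4, sessions=100, ops_per_query=3), "MB")
```

Real usage is normally far below this bound, since few queries fill every operation's allowance at the same moment; the figure is useful mainly as a sanity check against available RAM.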
hash_mem_multiplier
(floating point
)
Used to compute the maximum amount of memory that hash-based operations can use. The final limit is determined by multiplying work_mem by hash_mem_multiplier. The default value is 1.0, which makes hash-based operations subject to the same simple work_mem maximum as sort-based operations.
Consider increasing hash_mem_multiplier in environments where spilling by query operations is a regular occurrence, especially when simply increasing work_mem results in memory pressure (memory pressure typically takes the form of intermittent out of memory errors). A setting of 1.5 or 2.0 may be effective with mixed workloads. Higher settings in the range of 2.0 - 8.0 or more may be effective in environments where work_mem has already been increased to 40MB or more.
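The limit for hash-based operations is a plain product of the two settings. A minimal sketch of that calculation follows; the function name is hypothetical and the values are examples only.

```python
def hash_mem_limit_kb(work_mem_kb: int, hash_mem_multiplier: float) -> int:
    """Memory limit for a hash-based operation: work_mem * hash_mem_multiplier."""
    return int(work_mem_kb * hash_mem_multiplier)


# With the defaults (4 MB and 1.0) hashes and sorts share the same limit.
print(hash_mem_limit_kb(4096, 1.0))
# Raising only the multiplier gives hashes more room without touching sorts.
print(hash_mem_limit_kb(4096, 2.0))
```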
maintenance_work_mem
(integer
)
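Specifies the maximum amount of memory to be used by maintenance operations, such as VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY. It defaults to 64 megabytes (64MB). Since only one of these operations can be executed at a time by a database session, and an installation normally doesn't have many of them running concurrently, it's safe to set this value significantly larger than work_mem. Larger settings might improve performance for vacuuming and for restoring database dumps.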
請注意,當 autovacuum 運行時,最多可以分配 autovacuum_max_workers 倍的記憶體,因此請注意不要將預設值設定得太高。透過單獨設定 autovacuum_work_mem 來控制它會有幫助。
autovacuum_work_mem
(integer
)
指定每個 autovacuum 工作程序使用的最大記憶體。它預設為 -1,表示應該使用 maintenance_work_mem 的值。以其他方式執行時,此設定對 VACUUM 的行為沒有影響。
logical_decoding_work_mem
(integer
)
Specifies the maximum amount of memory to be used by logical decoding, before some of the decoded changes are written to local disk. This limits the amount of memory used by logical streaming replication connections. It defaults to 64 megabytes (64MB). Since each replication connection only uses a single buffer of this size, and an installation normally doesn't have many such connections concurrently (as limited by max_wal_senders), it's safe to set this value significantly higher than work_mem, reducing the amount of decoded changes written to disk.
max_stack_depth
(integer
)
指定伺服器工作堆疊的最大安全深度。此參數的理想設定是核心強制執行的實際堆疊大小限制(由 ulimit -s 或其他等效設定),減去 1 MB 左右的安全範圍。需要安全額度,因為在伺服器的每個程序中都不會檢查堆疊深度,而是僅在關鍵的潛在遞迴程序(例如表示式求值)中檢查。預設設定是 2 MB,這是保守地小,不太可能冒崩潰的風險。但是,它可能太小而無法執行複雜的功能。只有超級使用者才能變更此設定。
將 max_stack_depth 設定為高於實際核心限制將意味著失控的遞迴函數可能導致單個後端程序崩潰。在 PostgreSQL 可以確定核心限制的平台上,伺服器不允許將此變數設定為不安全的值。但是,並非所有平台都有提供資訊,因此建議在選擇值時要小心。
shared_memory_type
(enum
)
Specifies the shared memory implementation that the server should use for the main shared memory region that holds PostgreSQL's shared buffers and other shared data. Possible values are mmap (for anonymous shared memory allocated using mmap), sysv (for System V shared memory allocated via shmget) and windows (for Windows shared memory). Not all values are supported on all platforms; the first supported option is the default for that platform. The use of the sysv option, which is not the default on any platform, is generally discouraged because it typically requires non-default kernel settings to allow for large allocations (see Section 18.4.1).
dynamic_shared_memory_type
(enum
)
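Specifies the dynamic shared memory implementation that the server should use. Possible values are posix (for POSIX shared memory allocated using shm_open), sysv (for System V shared memory allocated via shmget), windows (for Windows shared memory), mmap (to simulate shared memory using memory-mapped files stored in the data directory), and none (to disable this feature). Not all values are supported on all platforms; the first supported option is the default for that platform. The use of the mmap option, which is not the default on any platform, is generally discouraged because the operating system may write modified pages back to disk repeatedly, increasing system I/O load; however, it may be useful for debugging, when the pg_dynshmem directory is stored on a RAM disk, or when other shared memory facilities are not available.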
temp_file_limit
(integer
)
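Specifies the maximum amount of disk space that a process can use for temporary files, such as sort and hash temporary files, or the storage file for a held cursor. A transaction attempting to exceed this limit will be canceled. The value is specified in kilobytes, and -1 (the default) means no limit. Only superusers can change this setting.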
此設定限制了給予 PostgreSQL 程序使用的所有暫存檔在任何時刻能使用的總空間。應該注意的是,用於臨時資料表的磁碟空間與在查詢執行過程中使用的暫存檔不同,並不會計入此限制。
max_files_per_process
(integer
)
設定每個伺服器子程序允許的同時最大開啓的檔案數。預設值是 1000 個檔案。如果核心可以確保每個程序的安全限制,則不必擔心此設定。但是在某些平台上(特別是大多數 BSD 系統),如果許多程序都嘗試開啓那麼多檔案,核心將允許單個程序打開比系統實際支援的更多的檔案。如果您發現自己看到“Too many open files”失敗,請嘗試減少此設定。此參數只能在伺服器啟動時設定。
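During the execution of VACUUM and ANALYZE commands, the system maintains an internal counter that keeps track of the estimated cost of the various I/O operations that are performed. When the accumulated cost reaches a limit (specified by vacuum_cost_limit), the process performing the operation will sleep for a short period of time, as specified by vacuum_cost_delay. Then it will reset the counter and continue execution.
The intent of this feature is to allow administrators to reduce the I/O impact of these commands on concurrent database activity. There are many situations where it is not important that maintenance commands like VACUUM and ANALYZE finish quickly; however, it is usually very important that these commands do not significantly interfere with the ability of the system to perform other database operations. Cost-based vacuum delay provides a way for administrators to achieve this.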
對於手動發出的 VACUUM 指令,預設情況下會停用此功能。要啟用它,請將 vacuum_cost_delay 變數設定為非零值。
vacuum_cost_delay
(integer
)
超出成本限制時程序將休眠的時間長度(以毫秒為單位)。預設值為零,這會停用成本考量的清理延遲功能。正值可實現成本考量的清理。請注意,在許多系統上,睡眠延遲的有效分辨率為 10 毫秒;將 vacuum_cost_delay 設定為不是 10 的倍數的值可能與將其設定為 10 的下一個更高倍數具有相同的結果。
當使用成本考量的資料庫清理時,vacuum_cost_delay 的適當值通常非常小,可能是 10 或 20 毫秒。調整清理的資源消耗最好透過變更其他清理成本參數來完成。
vacuum_cost_page_hit
(integer
)
清除共享緩衝區中找到的緩衝區估計成本。它表示鎖定緩衝池,查詢共享雜湊表和掃描頁面內容的成本。預設值為 1。
vacuum_cost_page_miss
(integer
)
清除必須從磁碟讀取的緩衝區的估計成本。這表示鎖定緩衝池,查詢共享雜湊表,從磁碟讀取所需塊並掃描其內容的成本。預設值為 10。
vacuum_cost_page_dirty
(integer
)
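The estimated cost charged when vacuum modifies a block that was previously clean. It represents the extra I/O required to flush the dirty block out to disk again. The default value is 20.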
vacuum_cost_limit
(integer
)
累積成本將導致清理程序進入睡眠狀態。預設值為 200。
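Note: Certain operations hold critical locks and should therefore complete as quickly as possible. Cost-based vacuum delays do not occur during such operations. Therefore it is possible that the cost accumulates far higher than the specified limit. To avoid uselessly long delays in such cases, the actual delay is calculated as vacuum_cost_delay * accumulated_balance / vacuum_cost_limit, with a maximum of vacuum_cost_delay * 4.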
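The cost accounting and the capped delay described above can be traced with a few lines of arithmetic. This is a sketch of the formulas in the text only, not of the server code; the function names are hypothetical, and the page costs default to the values listed in this section (1, 10, 20).

```python
def accumulated_cost(hits: int, misses: int, dirtied: int,
                     page_hit: int = 1, page_miss: int = 10, page_dirty: int = 20) -> int:
    """Cost balance after touching the given numbers of pages."""
    return hits * page_hit + misses * page_miss + dirtied * page_dirty


def vacuum_sleep_ms(balance: int, cost_limit: int, cost_delay_ms: float) -> float:
    """Sleep time once the balance has reached vacuum_cost_limit.

    Returns 0 while the limit has not been reached or when the feature is
    disabled (cost_delay_ms == 0). Otherwise the delay scales with the
    balance and is capped at four times vacuum_cost_delay.
    """
    if cost_delay_ms <= 0 or balance < cost_limit:
        return 0.0
    return min(cost_delay_ms * balance / cost_limit, cost_delay_ms * 4)


balance = accumulated_cost(hits=10, misses=15, dirtied=2)
print("balance:", balance)
print("sleep:", vacuum_sleep_ms(balance, cost_limit=200, cost_delay_ms=20), "ms")
# A balance that ran far past the limit is still capped at 4 * delay.
print("capped:", vacuum_sleep_ms(2000, cost_limit=200, cost_delay_ms=20), "ms")
```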
有一個單獨的伺服器程序稱為背景寫入程序,其功能是發起「dirty」(新的或修改的)共享緩衝區的寫入。 它會寫入共享緩衝區,因此處理使用者查詢的伺服器程序很少或永遠不需要等待寫入的發生。但是,背景寫入程序確實導致 I/O 負載的整體的淨增加,因為雖然每個檢查點間隔可能只會寫一次 repeatedly-dirtied 頁面,但背景寫入程序可能會發起多次寫入,因為它在同一時間間隔內被變更了。本小節中討論的參數可用於調整適於本地需求的行為。
bgwriter_delay
(integer
)
指定背景寫入程序的輪詢之間的延遲。在每一次輪詢中,寫入程序發出一些 dirty 緩衝區的寫入(可透過以下參數控制)。然後它睡眠 bgwriter_delay 毫秒,再重複。但是,當緩衝池中沒有 dirty 緩衝區時,無論 bgwriter_delay 如何,它都會進入更長的睡眠狀態。預設值為 200 毫秒。請注意,在許多系統上,睡眠延遲的有效分辨率為 10 毫秒;將 bgwriter_delay 設定為不是 10 的倍數可能與將其設定為 10 的下一個更高倍數具有相同的結果。此參數只能在 postgresql.conf 檔案或伺服器命令列中設定。
bgwriter_lru_maxpages
(integer
)
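In each round, no more than this many buffers will be written by the background writer. Setting this to zero disables background writing. (Note that checkpoints, which are managed by a separate, dedicated auxiliary process, are unaffected.) The default value is 100 buffers. This parameter can only be set in the postgresql.conf file or on the server command line.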
bgwriter_lru_multiplier
(floating point
)
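The number of dirty buffers written in each round is based on the number of new buffers that have been needed by server processes during recent rounds. The average recent need is multiplied by bgwriter_lru_multiplier to arrive at an estimate of the number of buffers that will be needed during the next round. Dirty buffers are written until there are that many clean, reusable buffers available. (However, no more than bgwriter_lru_maxpages buffers will be written per round.) Thus, a setting of 1.0 represents a "just in time" policy of writing exactly the number of buffers predicted to be needed. Larger values provide some cushion against spikes in demand, while smaller values intentionally leave writes to be done by server processes. The default is 2.0. This parameter can only be set in the postgresql.conf file or on the server command line.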
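The interaction between bgwriter_lru_multiplier and bgwriter_lru_maxpages can be sketched as a per-round estimate. This is a deliberately simplified model of the description above, not the background writer's actual algorithm; the function name and the clean_available argument are hypothetical.

```python
import math


def buffers_to_write(recent_avg_need: float, multiplier: float = 2.0,
                     lru_maxpages: int = 100, clean_available: int = 0) -> int:
    """Estimated number of dirty buffers written in the next round.

    The target is recent demand times the multiplier; buffers that are
    already clean count toward it, and the result never exceeds
    bgwriter_lru_maxpages (zero disables background writing).
    """
    target = math.ceil(recent_avg_need * multiplier)
    needed = max(0, target - clean_available)
    return min(needed, lru_maxpages)


print(buffers_to_write(30))                       # demand well under the cap
print(buffers_to_write(80))                       # capped by lru_maxpages
print(buffers_to_write(30, clean_available=70))   # enough clean buffers already
```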
bgwriter_flush_after
(integer
)
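Whenever more than bgwriter_flush_after bytes have been written by the background writer, attempt to force the OS to issue these writes to the underlying storage. Doing so will limit the amount of dirty data in the kernel's page cache, reducing the likelihood of stalls when an fsync is issued at the end of a checkpoint, or when the OS writes data back in larger batches in the background. Often that will result in greatly reduced transaction latency, but there also are some cases, especially with workloads that are bigger than shared_buffers, but smaller than the OS's page cache, where performance might degrade. This setting may have no effect on some platforms. The valid range is between 0, which disables forced writeback, and 2MB. The default is 512kB on Linux, 0 elsewhere. (If BLCKSZ is not 8kB, the default and maximum values scale proportionally to it.) This parameter can only be set in the postgresql.conf file or on the server command line.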
bgwriter_lru_maxpages 和 bgwriter_lru_multiplier 設定較小值可以減少背景寫入程序造成的額外 I/O 負載,但使伺服器程序更有可能必須為自己發出寫入要求,可能造成交互查詢的延遟。
effective_io_concurrency
(integer
)設定 PostgreSQL 期望可以同時執行的磁碟 I/O 操作數。提高此值將增加任何單個 PostgreSQL 連線嘗試同時啟動的 I/O 操作數。允許的範圍是 1 到 1000,或者為零以停用非同步 I/O 要求的使用。目前,此設定僅影響 bitmap heap 掃描。
對於磁碟機而言,此設定一個很好的起點是包含用於資料庫的 RAID 0 分散或 RAID 1 鏡像的單獨磁碟數量。(對於 RAID 5,不應計算奇偶校驗磁碟。)但是,如果資料庫通常忙於在同時連線中發出多個查詢,則較低的值可能足以使磁碟陣列保持忙碌狀態。高於保持磁碟繁忙所需的值只會導致額外的 CPU 開銷。SSD和其他基於內存的儲存通常可以處理許多同時要求,因此最佳值可能是數百個。
非同步 I/O 取決於某些作業系統缺乏的有效 posix_fadvise 函數。如果該功能不存在,則將此參數設定為零以外的任何值將導致錯誤。而在某些作業系統(例如,Solaris)上,此功能存在但實際上並沒有做任何事情。
在受支援的系統上預設值為 1,否則為 0。透過設定同名的 tablespace 參數,可以為特定資料表空間中的資料表覆寫此值(請參閱 ALTER TABLESPACE)。
maintenance_io_concurrency
(integer
)Similar to effective_io_concurrency, but used for maintenance work that is done on behalf of many client sessions.
The default is 10 on supported systems, otherwise 0. This value can be overridden for tables in a particular tablespace by setting the tablespace parameter of the same name (see ALTER TABLESPACE).
max_worker_processes
(integer
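)Sets the maximum number of background processes that the system can support. This parameter can only be set at server start. The default is 8.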
執行備用伺服器時,必須將此參數設定為與主伺服器上相同或更高的值。否則,將不允許在備用伺服器中進行查詢。
變更此值時,請考慮同步調整 max_parallel_workers 和 max_parallel_workers_per_gather。
max_parallel_workers_per_gather
(integer
)設定單個 Gather 或 Gather Merge 節點可以啟動的最大工作程序數量。同時工作程序取自 max_worker_processes 建立的程序池,由 max_parallel_workers 限制。請注意,請求的工作程序數量在執行時可能實際上不可用。如果發生這種情況,計劃將以比預期更少的工作程序運行,這可能是低效能的。預設值為 2。將此值設定為 0 將停用平行查詢執行。
請注意,平行查詢可能比非平行查詢消耗的資源要多得多,因為每個工作程序都是一個完全獨立的程序,與其他使用者連線對系統的影響大致相同。在為此設定選擇值時,以及在配置控制資源利用率的其他設定(例如work_mem)時,應考慮這一點。 諸如 work_mem 之類的資源限制被單獨應用於每個工作程序,這意味著所有程序的總利用率可能比通常用於任何單個程序的總利用率高得多。例如,使用 4 個工作程序的平行查詢可能會使用高達 5 倍的 CPU 時間、記憶體、I/O 頻寬等作為根本不使用工作程序的查詢。
有關平行查詢的更多訊息,請參閱第 15 章。
max_parallel_maintenance_workers
(integer
)Sets the maximum number of parallel workers that can be started by a single utility command. Currently, the parallel utility commands that support the use of parallel workers are CREATE INDEX only when building a B-tree index, and VACUUM without FULL option. Parallel workers are taken from the pool of processes established by max_worker_processes, limited by max_parallel_workers. Note that the requested number of workers may not actually be available at run time. If this occurs, the utility operation will run with fewer workers than expected. The default value is 2. Setting this value to 0 disables the use of parallel workers by utility commands.
Note that parallel utility commands should not consume substantially more memory than equivalent non-parallel operations. This strategy differs from that of parallel query, where resource limits generally apply per worker process. Parallel utility commands treat the resource limit maintenance_work_mem as a limit to be applied to the entire utility command, regardless of the number of parallel worker processes. However, parallel utility commands may still consume substantially more CPU resources and I/O bandwidth.
max_parallel_workers
(integer
)
設定系統可以支援平行查詢的最大工作程序數量。預設值為 8。增大或減小此值時,請考慮調整 max_parallel_workers_per_gather。另請注意,此值的設定高於 max_worker_processes 將不起作用,因為平行工作程序取自該設定所建立的工作程序池。
backend_flush_after
(integer
)
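Whenever more than backend_flush_after bytes have been written by a single backend, attempt to force the OS to issue these writes to the underlying storage. Doing so will limit the amount of dirty data in the kernel's page cache, reducing the likelihood of stalls when an fsync is issued at the end of a checkpoint, or when the OS writes data back in larger batches in the background. Often that will result in greatly reduced transaction latency, but there also are some cases, especially with workloads that are bigger than shared_buffers, but smaller than the OS's page cache, where performance might degrade. This setting may have no effect on some platforms. The valid range is between 0, which disables forced writeback, and 2MB. The default is 0, i.e., no forced writeback. (If BLCKSZ is not 8kB, the maximum value scales proportionally to it.)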
old_snapshot_threshold
(integer
)
設定可以使用快照的最短時間,而不會在使用快照時發生快照過舊的錯誤。此參數只能在伺服器啟動時設定。
超過閾值,舊資料可能被清理。這可以幫助防止長時間使用的快照所面臨的資料膨脹。為了防止由於清理快照可能會顯示資料的錯誤結果,當快照早於此閾值時會産生錯誤,並且快照用於讀取自建構快照以來已修改的頁面。
值 -1 將停用此功能,並且是預設值。産品等級的有用值可能從少量幾小時到幾天不等。此設定將被強制為分鐘的顆粒度,並且僅允許小數字(例如 0 或 1 分鐘),因為它們有時可用於測試。雖然允許設定高達 60d,但請注意,在許多工作負載中,可能會在更短的時間範圍內發生極端資料膨脹或事務 ID 重覆。
啟用此功能後,關連末尾釋放的空間無法釋放到作業系統,因為這可能會刪除檢測快照過舊狀態所需的訊息。除非明確要求釋放(例如,使用 VACUUM FULL),否則分配給關連的所有空間仍與該關連相關聯,僅在該關連內重覆使用。
此設定不會嘗試保證在任何特定情況下都會産生錯誤。實際上,如果可以從已完成結果集合的游標産生正確的結果,即使引用資料表中的基礎資料列已被清理,也不會産生錯誤。有些資料表不能安全地儘早清理,因此不會受到此設定的影響,例如系統目錄。對於此類資料表,此設定既不會減少膨脹,也不會在掃描時產生快照過舊的錯誤。
The following “parameters” are read-only, and are determined when PostgreSQL is compiled or when it is installed. As such, they have been excluded from the sample postgresql.conf file. These options report various aspects of PostgreSQL behavior that might be of interest to certain applications, particularly administrative front-ends.
block_size
(integer
)
Reports the size of a disk block. It is determined by the value of BLCKSZ when building the server. The default value is 8192 bytes. The meaning of some configuration variables (such as shared_buffers) is influenced by block_size. See Section 19.4 for information.
data_checksums
(boolean
)
Reports whether data checksums are enabled for this cluster. See data checksums for more information.
debug_assertions
(boolean
)
Reports whether PostgreSQL has been built with assertions enabled. That is the case if the macro USE_ASSERT_CHECKING is defined when PostgreSQL is built (accomplished e.g. by the configure option --enable-cassert). By default PostgreSQL is built without assertions.
integer_datetimes
(boolean
)
Reports whether PostgreSQL was built with support for 64-bit-integer dates and times. As of PostgreSQL 10, this is always on.
lc_collate
(string
)
Reports the locale in which sorting of textual data is done. See Section 23.1 for more information. This value is determined when a database is created.
lc_ctype
(string
)
Reports the locale that determines character classifications. See Section 23.1 for more information. This value is determined when a database is created. Ordinarily this will be the same as lc_collate, but for special applications it might be set differently.
max_function_args
(integer
)
Reports the maximum number of function arguments. It is determined by the value of FUNC_MAX_ARGS when building the server. The default value is 100 arguments.
max_identifier_length
(integer
)
Reports the maximum identifier length. It is determined as one less than the value of NAMEDATALEN when building the server. The default value of NAMEDATALEN is 64; therefore the default max_identifier_length is 63 bytes, which can be less than 63 characters when using multibyte encodings.
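The difference between 63 bytes and 63 characters is easy to see with UTF-8. The sketch below only illustrates the byte arithmetic and assumes truncation happens on a whole-character boundary; the function name is hypothetical and this is not the server's truncation routine.

```python
def clip_identifier(name: str, max_bytes: int = 63) -> str:
    """Clip an identifier to at most max_bytes of UTF-8 without splitting a character."""
    encoded = name.encode("utf-8")
    if len(encoded) <= max_bytes:
        return name
    # errors="ignore" drops a trailing, incomplete multibyte sequence.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


ascii_name = "a" * 100
accented_name = "é" * 40  # 2 bytes per character in UTF-8
print(len(clip_identifier(ascii_name)), "characters fit for ASCII")
print(len(clip_identifier(accented_name)), "characters fit for 2-byte characters")
```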
max_index_keys
(integer
)
Reports the maximum number of index keys. It is determined by the value of INDEX_MAX_KEYS when building the server. The default value is 32 keys.
segment_size
(integer
)
Reports the number of blocks (pages) that can be stored within a file segment. It is determined by the value of RELSEG_SIZE when building the server. The maximum size of a segment file in bytes is equal to segment_size multiplied by block_size; by default this is 1GB.
server_encoding
(string
)
Reports the database encoding (character set). It is determined when the database is created. Ordinarily, clients need only be concerned with the value of client_encoding.
server_version
(string
)
Reports the version number of the server. It is determined by the value of PG_VERSION when building the server.
server_version_num
(integer
)
Reports the version number of the server as an integer. It is determined by the value of PG_VERSION_NUM when building the server.
wal_block_size
(integer
)
Reports the size of a WAL disk block. It is determined by the value of XLOG_BLCKSZ when building the server. The default value is 8192 bytes.
wal_segment_size
(integer
)
Reports the number of blocks (pages) in a WAL segment file. The total size of a WAL segment file in bytes is equal to wal_segment_size multiplied by wal_block_size; by default this is 16MB. See Section 30.4 for more information.
log_destination
(string
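)PostgreSQL supports several methods for logging server messages, including stderr, csvlog and syslog. On Windows, eventlog is also supported. Set this parameter to a list of desired log destinations separated by commas. The default is to log to stderr only. This parameter can only be set in the postgresql.conf file or on the server command line.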
如果 csvlog 包含在 log_destination 中的話,則日誌將以「逗號分隔」(CSV)格式輸出,便於將日誌載入到其他程序中。詳情請參閱第 19.8.4 節。 必須啟用 logging_collector 才能産生 CSV 格式的日誌輸出。
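If stderr or csvlog is included in log_destination, the file current_logfiles is created to record the location of the log file(s) currently in use by the logging collector and the associated logging destination. This provides a convenient way to find the logs currently in use by the instance. Here is an example of this file's content:
stderr log/postgresql.log
csvlog log/postgresql.csv
current_logfiles is recreated when a new log file is created as an effect of rotation, and when log_destination is reloaded. It is removed when neither stderr nor csvlog is included in log_destination, and when the logging collector is disabled.
On most Unix systems, you will need to alter the configuration of your system's syslog daemon in order to make use of the syslog option for log_destination. PostgreSQL can log to syslog facilities LOCAL0 through LOCAL7 (see syslog_facility), but the default syslog configuration on most platforms will discard all such messages. You will need to add something like:
local0.*    /var/log/postgresql
to the syslog daemon's configuration file to make it work.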
在 Windows 上,當您為 log_destination 使用 eventlog 選項時,應該向作業系統註冊事件來源及其函式庫,以便 Windows 事件查詢器可以清楚地顯示事件日誌消息。詳情請參閱第 18.11 節。
logging_collector
(boolean
)此參數啟用日誌收集器,這是一個後端的程序,用於攔截發送到 stderr 的日誌訊息並將其重新輸出到日誌檔案。這種方法通常比記錄到 syslog 更有用,因為某些類型的訊息可能不會出現在 syslog 輸出之中。(一個常見的案例是動態連結程序失敗訊息;另一個案例是由如 archive_command 等腳本産生的錯誤訊息。)此參數只能在伺服器啟動時設定。
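It is possible to log to stderr without using the logging collector; the log messages will just go to wherever the server's stderr is directed. However, that method is only suitable for low log volumes, since it provides no convenient way to rotate log files. Also, on some platforms not using the logging collector can result in lost or garbled log output, because multiple processes writing concurrently to the same log file can overwrite each other's output.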
日誌記錄收集器旨在永不遺失訊息。這意味著,如果負載極高,則在收集器落後時嘗試發送其他日誌消息時,伺服器程序可能會被阻止繼續執行。相比之下,如果系統日誌不能寫入訊息,系統日誌會傾向於丟棄訊息,這意味著在這種情況下它可能無法記錄某些訊息,但不會阻塞系統的其餘部分。
log_directory
(string
)當啟用 logging_collector 時,此參數確定將在其中建立日誌檔案的目錄。它可以被指定為絕對路徑,或相對於叢集的 data 目錄。該參數只能在 postgresql.conf 檔案或伺服器指令行中設定。預設值是 log。
log_filename
(string
)當啟用 logging_collector 時,此參數設定建立的日誌檔案的檔案名稱。該值被視為 strftime 模式,因此 %-escapes 可用於指定隨時間變化的檔案名稱。(請注意,如果有任何時區相關的 %-escapes,計算將在由 log_timezone 指定的區域中完成。)支援的 %-escapes 與 Open Group 的 strftime 規範中列出的類似。請注意,系統的 strftime 並未直接使用,因此特定於平台的(非標準)延伸功能不起作用。預設值是 postgresql-%Y-%m-%d_%H%M%S.log。
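If you specify a file name without escapes, you should plan to use a log rotation utility to avoid eventually filling the entire disk. In releases prior to 8.4, if no % escapes were present, PostgreSQL would append the epoch of the new log file's creation time, but this is no longer the case.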
如果在 log_destination 中啟用 CSV 格式的輸出,則會將時間戳記檔案名稱附加.csv 以建立 CSV 格式輸出的檔案名稱。 (如果 log_filename 以 .log 結尾,則替換後綴。)
該參數只能在 postgresql.conf 檔案或伺服器指令中設定。
log_file_mode
(integer
)在 Unix 系統上,此參數在啟用 logging_collector 時設定日誌檔案的權限。(在 Microsoft Windows 上,此參數將被忽略。)參數值預期為以 chmod 和 umask 系統呼叫接受的格式來指定的數字模式。(要使用習慣的八進制格式,數字必須以 0(零)開頭。)
預設權限為 0600,這意味著只有伺服器擁有者才能讀取或寫入日誌檔案。另一個常用的設定是 0640,允許擁有者組群的成員讀取文件。 但是請注意,要使用這種設定,您需要變更 log_directory 以將檔案儲存在叢集 data 目錄之外的某個位置。無論如何,使日誌檔案讓任何人都可讀是不明智的,因為它們可能包含敏感資料。
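The leading zero matters because the value is read as a number: 0640 is octal, while 640 would be taken as decimal and produce a different mode. The sketch below uses Python's standard stat module to show what the octal modes mean; the function name is hypothetical.

```python
import stat


def describe_log_file_mode(mode_text: str) -> str:
    """Render an octal mode string such as '0640' the way ls -l would show it."""
    mode = int(mode_text, 8)
    # S_IFREG marks a regular file so filemode() prints a leading '-'.
    return stat.filemode(stat.S_IFREG | mode)


print(describe_log_file_mode("0600"))  # owner read/write only
print(describe_log_file_mode("0640"))  # group members may also read
# Pitfall: decimal 640 is a different bit pattern than octal 0640.
print(oct(640), "versus", oct(0o640))
```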
該參數只能在 postgresql.conf 檔案或伺服器指令中設定。
log_rotation_age
(integer
)當啟用 logging_collector 時,此參數決定單個日誌檔案的最長生命週期。經過指定的分鐘後,會建立一個新的日誌檔案。設定為零以停用基於時間的新日誌檔案建立。該參數只能在 postgresql.conf 檔案或伺服器指令中設定。
log_rotation_size
(integer
)當啟用 logging_collector 時,此參數決定單個日誌檔的大小上限。在超過上限的記錄被發送到日誌檔案後,將建立一個新的日誌檔案。設定為零以禁用基於大小的新日誌檔案創立。該參數只能在 postgresql.conf 檔案或伺服器指令中設定。
log_truncate_on_rotation
(boolean
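)When logging_collector is enabled, this parameter will cause PostgreSQL to truncate (overwrite), rather than append to, any existing log file of the same name. However, truncation will occur only when a new file is being opened due to time-based rotation, not during server startup or size-based rotation. When off, pre-existing files will be appended to in all cases. For example, using this setting in combination with a log_filename like postgresql-%H.log would result in generating twenty-four hourly log files and then cyclically overwriting them. This parameter can only be set in the postgresql.conf file or on the server command line.
Example: To keep 7 days of logs, one log file per day named server_log.Mon, server_log.Tue, etc., and automatically overwrite last week's log with this week's log, set log_filename to server_log.%a, log_truncate_on_rotation to on, and log_rotation_age to 1440.
Example: To keep 24 hours of logs, one log file per hour, but also rotate sooner if the log file size exceeds 1GB, set log_filename to server_log.%H%M, log_truncate_on_rotation to on, log_rotation_age to 60, and log_rotation_size to 1000000. Including %M in log_filename allows any size-driven rotations that might occur to select a file name different from the hour's initial file name.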
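Since log_filename is a strftime pattern, the rotation schemes above can be previewed with any strftime implementation. The sketch below uses Python's datetime for that purpose; it assumes Python's escapes behave like the ones PostgreSQL supports for the patterns shown, and the function name and dates are hypothetical.

```python
from datetime import datetime, timedelta


def log_file_name(pattern: str, when: datetime) -> str:
    """Expand a log_filename pattern for the moment a log file is opened."""
    return when.strftime(pattern)


opened = datetime(2024, 1, 1, 13, 5, 0)
print(log_file_name("postgresql-%Y-%m-%d_%H%M%S.log", opened))
print(log_file_name("server_log.%H%M", opened))

# A weekday pattern yields only seven distinct names, so with
# log_truncate_on_rotation = on each file is overwritten a week later.
two_weeks = [opened + timedelta(days=d) for d in range(14)]
names = {log_file_name("server_log.%a", day) for day in two_weeks}
print(len(names), "distinct daily file names")
```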
syslog_facility
(enum
)當啟用日誌記錄到 syslog 時,此參數確定要使用的系統日誌的「設施」。 您可以選擇 LOCAL0,LOCAL1,LOCAL2,LOCAL3,LOCAL4,LOCAL5,LOCAL6,LOCAL7;預設值是 LOCAL0。另請參閱系統的 syslog 背景程序的文件。此參數只能在 postgresql.conf 檔案或伺服器命令列中設定。
syslog_ident
(string
)當啟用日誌記錄到系統日誌時,此參數決定用於在系統日誌中識別 PostgreSQL 記錄的程序名稱。預設是 postgres。此參數只能在 postgresql.conf 檔案或伺服器命令列中設定。
syslog_sequence_numbers
(boolean
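)When logging to syslog and this is on (the default), then each message will be prefixed by an increasing sequence number (such as [2]). This circumvents the "--- last message repeated N times ---" suppression that many syslog implementations perform by default. In more modern syslog implementations, repeated message suppression can be configured (for example, $RepeatedMsgReduction in rsyslog), so this might not be necessary. Also, you could turn this off if you actually want to suppress repeated messages.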
此參數只能在 postgresql.conf 檔案或伺服器命令列中設定。
syslog_split_messages
(boolean
)當啟用日誌記錄到 syslog 時,此參數決定記錄如何傳遞到系統日誌。啟用時(預設),記錄按行分割,使得行長在 1024 字元以下,這是傳統 syslog 實作的典型大小限制。關閉時,PostgreSQL 伺服器日誌記錄會按原樣傳遞到系統日誌服務,並由系統日誌服務來處理潛在的龐大記錄。
如果 syslog 最終記錄到文字檔案,那麼效果將是相同的,並且最好將設定保留為開啟狀態,因為大多數 syslog 實作無法處理大量記錄,或者需要專門設定以處理它們。但是,如果系統日誌最終寫入其他媒體,將記錄邏輯上地組合在一起可能是必要的或更有用的。
此參數只能在 postgresql.conf 檔案或伺服器命令列中設定。
event_source
(string
)當啟用記錄到事件日誌時,此參數確定用於在記錄中識別 PostgreSQL 記錄的程序名稱。預設是 PostgreSQL。 此參數只能在 postgresql.conf 檔案或伺服器命令列中設定。
client_min_messages
(enum
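)Controls which message levels are sent to the client. Valid values are DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1, LOG, NOTICE, WARNING, ERROR, FATAL, and PANIC. Each level includes all the levels that follow it. The later the level, the fewer messages are sent. The default is NOTICE. Note that LOG has a different rank here than in log_min_messages.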
log_min_messages
(enum
)控制將哪些訊息等級寫入伺服器日誌。有效的值為 DEBUG5、DEBUG4、DEBUG3、DEBUG2、DEBUG1、INFO、NOTICE、WARNING、ERROR、LOG、FATAL 和 PANIC。每個等級包括其後的所有等級。等級越低,發送到日誌的訊息越少。預設值為 WARNING。請注意,LOG 在此處的排序與 client_min_messages 中的排名不同。只有超級使用者才能變更此設定。
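The phrase "each level includes all the levels that follow it" amounts to a position check in the ordered list given for log_min_messages. The sketch below encodes that list exactly as it appears in the text; the function name is hypothetical.

```python
# Order used by log_min_messages, least to most severe, as listed in the text.
# Note that LOG ranks between ERROR and FATAL here.
SERVER_LOG_ORDER = [
    "DEBUG5", "DEBUG4", "DEBUG3", "DEBUG2", "DEBUG1",
    "INFO", "NOTICE", "WARNING", "ERROR", "LOG", "FATAL", "PANIC",
]


def is_written_to_server_log(level: str, log_min_messages: str = "WARNING") -> bool:
    """True when a message of this level passes the log_min_messages threshold."""
    return SERVER_LOG_ORDER.index(level) >= SERVER_LOG_ORDER.index(log_min_messages)


for level in ("NOTICE", "WARNING", "ERROR", "LOG"):
    print(level, is_written_to_server_log(level))
```

Because LOG sits above ERROR in this ordering, administrative messages such as checkpoint activity still reach the server log at the default WARNING threshold.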
log_min_error_statement
(enum
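)Controls which SQL statements that cause an error condition are recorded in the server log. The current SQL statement is included in the log entry for any message of the specified severity or higher. Valid values are DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1, INFO, NOTICE, WARNING, ERROR, LOG, FATAL, and PANIC. The default is ERROR, which means statements causing errors, log messages, fatal errors, or panics will be logged. To effectively turn off logging of failing statements, set this parameter to PANIC. Only superusers can change this setting.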
log_min_duration_statement
(integer
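)Causes the duration of each completed statement to be logged if the statement ran for at least the specified number of milliseconds. Setting this to zero prints all statement durations. -1 (the default) disables logging statement durations. For example, if you set it to 250ms then all SQL statements that run 250ms or longer will be logged. Enabling this parameter can be helpful in tracking down unoptimized queries in your applications. Only superusers can change this setting.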
對於使用延伸查詢協議的用戶端,Parse、Bind 和 Execute 步驟的執行時間是獨立記錄的。
將此選項與 log_statement 一起使用時,由於 log_statement 而記錄的語句文字將不會在執行時間日誌訊息中重複。如果您不使用 syslog,建議您使用 log_line_prefix 記錄 PID 或連線 ID,以便可以使用 PID 或連線 ID將語句訊息連接到之後的執行時間訊息。
表格 19.1 說明了 PostgreSQL 使用的訊息嚴重性等級。如果將日誌記錄輸出發送到 syslog 或 Windows 的事件日誌,則嚴重性等級將按表格中所示進行轉換。
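| Severity | Usage | syslog | eventlog |
| --- | --- | --- | --- |
| DEBUG1..DEBUG5 | Provides successively more detailed information for use by developers. | DEBUG | INFORMATION |
| INFO | Provides information implicitly requested by the user, e.g., output from VACUUM VERBOSE. | INFO | INFORMATION |
| NOTICE | Provides information that might be helpful to users, e.g., notice of truncation of long identifiers. | NOTICE | INFORMATION |
| WARNING | Provides warnings of likely problems, e.g., COMMIT outside a transaction block. | NOTICE | WARNING |
| ERROR | Reports an error that caused the current command to abort. | WARNING | ERROR |
| LOG | Reports information of interest to administrators, e.g., checkpoint activity. | INFO | INFORMATION |
| FATAL | Reports an error that caused the current session to abort. | ERR | ERROR |
| PANIC | Reports an error that caused all database sessions to abort. | CRIT | ERROR |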
application_name
(string
)application_name 可以是少於 NAMEDATALEN 個字元的任何字串(標準版本中為 64 個字元)。它通常由應用程序在連線到伺服器時設定。此名稱將顯示在 pg_stat_activity 檢視表中,並包含在 CSV 日誌項目中。它還可以透過 log_line_prefix 參數包含在日常日誌項目中。application_name 中只能使用可列印的 ASCII 字元。其他字元將替換為問號(?)。
debug_print_parse
(boolean
) debug_print_rewritten
(boolean
) debug_print_plan
(boolean
)這些參數可以發出各種除錯輸出。設定後,它們將輸出産生的語法解析樹,查詢重寫程序輸出或每個已執行查詢的執行計劃。這些訊息以 LOG 訊息等級發出,因此預設情況下它們將顯示在伺服器日誌中,但不會發送到客戶端。您可以透過調整 client_min_messages 或 log_min_messages 來變更它。這些參數預設是關閉的。
debug_pretty_print
(boolean
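)When set, debug_pretty_print indents the messages produced by debug_print_parse, debug_print_rewritten, or debug_print_plan. This results in more readable but much longer output than the compact format used when it is off. It is on by default.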
log_checkpoints
(boolean
)使檢查點和重新啟動點記錄在伺服器日誌中。日誌訊息中包含一些統計訊息,包括寫入的緩衝區數和寫入時間。此參數只能在 postgresql.conf 檔案或伺服器命令列中設定。預設為關閉。
log_connections
(boolean
)導致記錄每個嘗試連線到伺服器,以及成功完成用戶端身份驗證。只有超級使用者才能在連線開始時變更此參數,之後在連線中無法更改。預設為關閉。
注意 某些用戶端程序(如 psql)在確定是否需要密碼時會嘗試連線兩次,因此重複的“connection received”訊息不一定表示存在問題。
log_disconnections
(boolean
)導致連線終止會被記錄。日誌輸出提供類似於 log_connections 的訊息,以及連線的持續時間。只有超級使用者才能在連線開始時變更此參數,並且在連線中無法更改。預設為關閉。
log_duration
(boolean
)記錄每個已完成語句的持續時間。預設為關閉。只有超級使用者才能變更此設定。
對於使用延伸查詢協議的用戶端,Parse、Bind 和 Execute 步驟的持續時間是獨立記錄的。
注意 設定選項和將 log_min_duration_statement 設定為零之間的區別在於,超出 log_min_duration_statement 會強制記錄查詢的語句,但此選項不會。因此,如果啟用了 log_duration 且 log_min_duration_statement 具有正值,則會記錄所有持續時間,但僅包含超過閾值的語句的查詢語句。此行為對於在高負載環境中收集統計訊息非常有用。
log_error_verbosity
(enum
)控制記錄的每條訊息在伺服器日誌中寫入的詳細訊息量。有效值為 TERSE,DEFAULT 和 VERBOSE,每個都向顯示的訊息加上更多內容。TERSE 排除記錄DETAIL,HINT,QUERY 和 CONTEXT 錯誤訊息。VERBOSE 輸出包括 SQLSTATE 錯誤代碼(另請參閱附錄 A)以及産生錯誤的原始檔案名稱,函數名稱和行號。只有超級使用者才能更改此設定。
log_hostname
(boolean
)預設情況下,連線日誌訊息僅顯示連線主機的 IP 位址。打開此參數就會記錄主機名。請注意,根據您的主機名稱解析設定,這可能會造成不可忽視的效能損失。此參數只能在 postgresql.conf 檔案或伺服器命令列中設定。
log_line_prefix
(string
)這是一個 printf 樣式的字串,在每個日誌的開頭輸出。%字元開始「跳脫序列(escape sequence)」,它們會被狀態訊息替換,如下所述。 無法識別的跳脫字元會被忽略。其他字元將直接複製到日誌內容。某些跳脫字元只能由連線程序識別,並且將被背景程序(例如主伺服器程序)視為空。透過在 % 之後和選項之前指定數字文字,可以向左或向右對齊狀態訊息。負值會將狀態信息在右側填充空格以給予一個最小寬度,而正值將填充在左側。填充可用於增加日誌檔案中的可讀性。此參數只能在 postgresql.conf 檔案或伺服器命令列中設定。預設值為'%m [%p]',用於記錄時間戳記和程序 ID。
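| Escape | Effect | Session only |
| --- | --- | --- |
| %a | Application name | yes |
| %u | User name | yes |
| %d | Database name | yes |
| %r | Remote host name or IP address, and remote port | yes |
| %h | Remote host name or IP address | yes |
| %p | Process ID | no |
| %t | Time stamp without milliseconds | no |
| %m | Time stamp with milliseconds | no |
| %n | Time stamp with milliseconds (as a Unix epoch) | no |
| %i | Command tag: type of session's current command | yes |
| %e | SQLSTATE error code | no |
| %c | Session ID: see below | no |
| %l | Number of the log line for each session or process, starting at 1 | no |
| %s | Process start time stamp | no |
| %v | Virtual transaction ID (backendID/localXID) | no |
| %x | Transaction ID (0 if none is assigned) | no |
| %q | Produces no output, but tells non-session processes to stop at this point in the string; ignored by session processes | no |
| %% | Literal % | no |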
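The %c escape prints a quasi-unique session identifier, consisting of two 4-byte hexadecimal numbers (without leading zeros) separated by a dot. The numbers are the process start time and the process ID, so %c can also be used as a space-saving way of printing those items. For example, to generate the session identifier from pg_stat_activity, use this query:
SELECT to_hex(trunc(EXTRACT(EPOCH FROM backend_start))::integer) || '.' || to_hex(pid) FROM pg_stat_activity;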
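The same identifier can be reproduced outside the database from a start time and a process ID. The sketch below shows only the formatting rule described above (two hexadecimal numbers without leading zeros, joined by a dot); the function name and sample values are hypothetical.

```python
def session_identifier(start_epoch_seconds: int, pid: int) -> str:
    """Format a %c-style session ID: hex start time, a dot, hex process ID."""
    return f"{start_epoch_seconds:x}.{pid:x}"


# Hypothetical backend started at Unix time 1700000000 with process ID 12345.
print(session_identifier(1_700_000_000, 12345))
```

Going the other way, int(part, 16) on each half recovers the start time and process ID from a logged identifier.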
小技巧 如果為 log_line_prefix 設定了非空值,則通常應將其最後一個字元設為空格,以便與日誌行的其餘部分進行視覺隔離。也可以使用標點符號。
小技巧 Syslog 會産生成自己的時間戳記和程序 ID 訊息,因此如果要輸出到 syslog,可能不希望包含這些跳脫字元。
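Tip: The %q escape is useful when including information that is only available in session (backend) context like user or database name. For example:
log_line_prefix = '%m [%p] %q%u@%d/%a '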
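To make the effect of %q and of the padding width concrete, here is a small prefix expander. It is a simplified sketch of the rules described in this section (unknown escapes expand to nothing, negative widths pad on the right, %q stops expansion for non-session processes), not the server's formatter; the function name and the sample values are hypothetical.

```python
import re

# An escape is '%', an optional signed width, then a single character.
_ESCAPE = re.compile(r"%(-?\d+)?(.)")


def expand_prefix(prefix: str, values: dict, is_session: bool) -> str:
    """Expand a log_line_prefix-style string using the supplied escape values."""
    out = []
    pos = 0
    for match in _ESCAPE.finditer(prefix):
        out.append(prefix[pos:match.start()])  # literal text before the escape
        pos = match.end()
        width, code = match.group(1), match.group(2)
        if code == "%":
            out.append("%")
            continue
        if code == "q":
            if not is_session:
                return "".join(out)  # non-session processes stop here
            continue
        text = values.get(code, "")  # unrecognized escapes are ignored
        if width:
            w = int(width)
            text = text.ljust(-w) if w < 0 else text.rjust(w)
        out.append(text)
    out.append(prefix[pos:])
    return "".join(out)


prefix = "%m [%p] %q%u@%d/%a "
session = {"m": "2024-01-01 13:05:00.000 UTC", "p": "12345",
           "u": "alice", "d": "shop", "a": "psql"}
background = {"m": "2024-01-01 13:05:00.000 UTC", "p": "987"}
print(repr(expand_prefix(prefix, session, is_session=True)))
print(repr(expand_prefix(prefix, background, is_session=False)))
```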
log_lock_waits
(boolean
)控制連線等待時間超過 deadlock_timeout 時是否產生日誌訊息。這對於確定鎖定等待是否導致性能較差很有用。預設是關閉的。只有超級使用者可以變更此設定。
log_statement
(enum
)控制記錄哪些 SQL 語句。有效值為 none(off),ddl,mod 和 all(所有語句)。 ddl 記錄所有資料定義語句,例如 CREATE,ALTER 和 DROP 語句。mod 記錄所有 ddl 語句,以及 INSERT,UPDATE,DELETE,TRUNCATE 和 COPY FROM 等資料修改語句。如果包含的指令屬於適合的類型,也會記錄 PREPARE,EXECUTE 和 EXPLAIN ANALYZE 語句。對於使用延伸查詢協議的用戶端,在收到 Execute 訊息時會發生日誌記錄,並且包含 Bind 參數的值(任何嵌入的單引號標記加倍)。
預設值為 none。只有超級使用者才能變更此設定。
注意 即使是 log_statement = all 設定也不會記錄包含簡單語法錯誤的語句,因為只有在完成基本分析以確定語句類型後才會發出日誌訊息。在延伸查詢協議的情況下,此設定同樣不記錄在執行階段之前失敗的語句(即,在解析分析或計劃期間)。將 log_min_error_statement 設定為 ERROR(或更低)以記錄此類語句。
log_replication_commands
(boolean
)讓每個複寫指令都記錄在伺服器日誌中。有關複寫指令的更多訊息,請參閱第 52.4 節。預設值為 off。只有超級使用者才能變更此設定。
log_temp_files
(integer
)控制臨時檔案名稱和大小的記錄。可以為排序,雜湊和臨時查詢結果建立臨時檔案。移除時,會為每個臨時檔案建立一個日誌項目。值為 0 時會記錄所有臨時檔案訊息,而正值僅記錄大小大於或等於指定 KB 的檔案。預設設定為 -1,停用此類日誌記錄。只有超級使用者才能變更此設定。
log_timezone
(string
)設定用於在伺服器日誌中寫入的時間戳記的時區。與 TimeZone 不同,此值是叢集範圍的,因此所有連線都將一致地報告時間戳記。內建的預設值是 GMT,但這通常在 postgresql.conf 中會再設定過;initdb 將在那裡安裝與其系統環境相對應的設定。有關更多訊息,請參閱第 8.5.3 節。此參數只能在 postgresql.conf 檔案或伺服器命令列中設定。
在 log_destination 列表中包含 csvlog 提供了將日誌檔案匯入資料庫資料表的便捷方法。此選項以逗號分隔(CSV)格式送出日誌資料,其中包含以下欄位:時間戳記,毫秒,使用者名稱,資料庫名稱,程序 ID,用戶端主機:連接埠號號,連線 ID,每個連線的行號,指令標記,連線開始時間,虛擬交易事務 ID,一般交易事務 ID,錯誤嚴重性,SQLSTATE 代碼,錯誤訊息,錯誤訊息的詳細訊息,提示,導致錯誤的內部查詢(如果有的話),其中錯誤位置的字串位置,錯誤內容,導致錯誤的使用者查詢(如果有的話,由 log_min_error_statement 啟用),其中錯誤位置的字元數,PostgreSQL 原始碼中的錯誤位置(如果 log_error_verbosity 設定為 verbose)和應用程序名稱。以下是用於儲存 CSV 格式日誌輸出的範例資料表定義:
要將日誌檔案匯入此資料表,請使用 COPY FROM 指令:
您需要做一些事情來簡化匯入 CSV 日誌檔案:
設定 log_filename 和 log_rotation_age 使日誌檔案提供一致性,可預測的命名方案。這使您可以預測檔案名稱會是什麼,並知道單個日誌檔案何時完成而可以匯入。
將 log_rotation_size 設定為 0 可停用基於大小的日誌輪轉,因為它會使日誌檔案名稱難以預測。
將 log_truncate_on_rotation 設定為 on,以便舊的日誌資料不會與同一檔案中的新資料混合。
上面的資料表定義包含主鍵規範。這有助於防止意外匯入兩次相同的訊息。COPY 指令一次提交它匯入的所有資料,因此任何錯誤都會導致整個匯入失敗。如果匯入部分日誌檔案,並在稍後再次匯入該檔案時,主鍵重覆將導致匯入失敗。請等到日誌完成關閉後再匯入。此過程還可以防止意外匯入尚未完全寫入的部分資料列,這也會導致 COPY 失敗。
這些設定控制如何修改伺服器程序的程序標題。程序標題通常使用如 ps 或 Windows 上的 Process Explorer 查看。詳情請參閱第 28.1 節。
cluster_name
(string
)設定此叢集中所有伺服器程序的程序標題中顯示的叢集名稱。該名稱可以是任何少於 NAMEDATALEN 字元數的字串(標準版本中為 64 個字元)。cluster_name 值中只能使用可列印的 ASCII 字元。其他字元將被替換為問號(?)。如果此參數設定為空字串“”(這是預設值),則不顯示名稱。此參數只能在伺服器啟動時設定。
update_process_title
(boolean
)每當伺服器收到新的 SQL 指令時,都可以更新程序標題。此設定在大多數平台上預設為開啟,不過在 Windows 上預設為關閉,因為在 Windows 上更新程序標題的開銷較大。只有超級使用者可以變更此設定。
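These settings control the behavior of the built-in streaming replication feature (see Section 26.2.5). Servers will be either a primary or a standby server. Primaries can send data, while standbys are always receivers of replicated data. When cascading replication (see Section 26.2.7) is used, standby servers can also be senders, as well as receivers. Parameters are mainly for sending and standby servers, though some parameters have meaning only on the primary server. Settings may vary across the cluster without problems if that is required.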
可以在將資料複寫發送到一個或多個備用伺服器的任何伺服器上設定這些參數。主伺服器始終是發送伺服器,因此必須在主伺服器上設定這些參數。備用資料庫成為主資料庫後,這些參數的作用也不會改變。
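max_wal_senders (integer)
Specifies the maximum number of concurrent connections from standby servers or streaming base backup clients (i.e., the maximum number of simultaneously running WAL sender processes). The default is 10. The value 0 means replication is disabled. WAL sender processes count towards the total number of connections, so this parameter cannot be set higher than max_connections. Abrupt streaming client disconnection might leave an orphaned connection slot behind until a timeout is reached, so this parameter should be set slightly higher than the maximum number of expected clients so that disconnected clients can immediately reconnect. This parameter can only be set at server start. Also, wal_level must be set to replica or higher to allow connections from standby servers.
max_replication_slots (integer)
Specifies the maximum number of replication slots (see Section 26.2.6) that the server can support. The default is 10. This parameter can only be set at server start. wal_level must be set to replica or higher to allow replication slots to be used. Setting it to a lower value than the number of currently existing replication slots will prevent the server from starting.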
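wal_keep_segments (integer)
Specifies the minimum number of past log file segments kept in the pg_wal directory, in case a standby server needs to fetch them for streaming replication. Each segment is normally 16 megabytes. If a standby server connected to the sending server falls behind by more than wal_keep_segments segments, the sending server might remove a WAL segment still needed by the standby, in which case the replication connection will be terminated. Downstream connections will also eventually fail as a result. (However, the standby server can recover by fetching the segment from archive, if WAL archiving is in use.)
This sets only the minimum number of segments retained in pg_wal; the system might need to retain more segments for WAL archival or to recover from a checkpoint. If wal_keep_segments is zero (the default), the system does not keep any extra segments for standby purposes, so the number of old WAL segments available to standby servers is a function of the location of the previous checkpoint and the status of WAL archiving. This parameter can only be set in the postgresql.conf file or on the server command line.
max_slot_wal_keep_size (integer)
Specifies the maximum size of WAL files that replication slots are allowed to retain in the pg_wal directory at checkpoint time. If max_slot_wal_keep_size is -1 (the default), replication slots may retain an unlimited amount of WAL files. Otherwise, if the restart_lsn of a replication slot falls behind the current LSN by more than the given size, the standby using the slot may no longer be able to continue replication because the required WAL files have been removed. You can see the WAL availability of replication slots in pg_replication_slots.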
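wal_sender_timeout (integer)
Terminates replication connections that are inactive for longer than the specified number of milliseconds. This is useful for the sending server to detect a standby crash or network outage. A value of zero disables the timeout mechanism. This parameter can only be set in the postgresql.conf file or on the server command line. The default value is 60 seconds.
track_commit_timestamp (boolean)
Records the commit time of transactions. This parameter can only be set in the postgresql.conf file or on the server command line. The default value is off.
These parameters can be set on the primary server that is to send replication data to one or more standby servers. Note that in addition to these parameters, wal_level must be set appropriately on the primary server, and optionally WAL archiving can be enabled as well (see Section 19.5.3). The values of these parameters on standby servers are irrelevant, although you may wish to set them there in preparation for the possibility of a standby becoming the primary.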
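synchronous_standby_names (string)
Specifies a list of standby servers that can support synchronous replication, as described in Section 26.2.8. There will be one or more active synchronous standbys; transactions waiting for commit will be allowed to proceed after these standby servers confirm receipt of their data. The synchronous standbys will be those whose names appear in this list, and that are both currently connected and streaming data in real time (as shown by a state of streaming in the pg_stat_replication view). Specifying more than one synchronous standby can allow for very high availability and protection against data loss.
The name of a standby server for this purpose is the application_name setting of the standby, as set in the standby's connection information. In case of a physical replication standby, this should be set in the primary_conninfo setting; the default is the setting of cluster_name if set, else walreceiver. For logical replication, this can be set in the connection information of the subscription, and it defaults to the subscription name. For other replication stream consumers, consult their documentation.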
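This parameter specifies a list of standby servers using either of the following syntaxes:
[FIRST] num_sync ( standby_name [, ...] )
ANY num_sync ( standby_name [, ...] )
standby_name [, ...]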
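where num_sync is the number of synchronous standbys that transactions need to wait for replies from, and standby_name is the name of a standby server. FIRST and ANY specify the method to choose synchronous standbys from the listed servers.
The keyword FIRST, coupled with num_sync, specifies priority-based synchronous replication and makes transaction commits wait until their WAL records are replicated to num_sync synchronous standbys chosen based on their priorities. For example, a setting of FIRST 3 (s1, s2, s3, s4) will cause each commit to wait for replies from three higher-priority standbys chosen from standby servers s1, s2, s3 and s4. The standbys whose names appear earlier in the list are given higher priority and will be considered as synchronous. Other standby servers appearing later in this list represent potential synchronous standbys. If any of the current synchronous standbys disconnects for whatever reason, it will be replaced immediately with the next-highest-priority standby. The keyword FIRST is optional.
The keyword ANY, coupled with num_sync, specifies quorum-based synchronous replication and makes transaction commits wait until their WAL records are replicated to at least num_sync listed standbys. For example, a setting of ANY 3 (s1, s2, s3, s4) will cause each commit to proceed as soon as at least any three standbys of s1, s2, s3 and s4 reply.
FIRST and ANY are case-insensitive. If these keywords are used as the name of a standby server, its standby_name must be double-quoted.
The third syntax was used before PostgreSQL version 9.6 and is still supported. It is the same as the first syntax with FIRST and num_sync equal to 1. For example, FIRST 1 (s1, s2) and s1, s2 have the same meaning: either s1 or s2 is chosen as a synchronous standby.
The special entry * matches any standby name.
There is no mechanism to enforce uniqueness of standby names. In case of duplicates, one of the matching standbys will be considered as higher priority, though exactly which one is indeterminate.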
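The difference between the FIRST and ANY rules described above can be modelled in a few lines of Python. This is only a sketch of the documented semantics, not the server's implementation; the function name and its arguments are hypothetical, and the `*` wildcard and duplicate names are deliberately left out.

```python
def sync_commit_may_proceed(method, num_sync, standby_names, connected, acked):
    """Model of the FIRST / ANY rules.

    method        -- "FIRST" or "ANY" (case-insensitive)
    num_sync      -- number of standbys a commit must wait for
    standby_names -- names in the order they appear in the list
    connected     -- set of standbys that are currently streaming
    acked         -- set of standbys that confirmed the commit's WAL
    """
    candidates = [name for name in standby_names if name in connected]
    method = method.upper()
    if method == "FIRST":
        # Priority-based: the earliest-listed connected standbys are the
        # synchronous ones, and every one of them must have replied.
        sync_set = candidates[:num_sync]
        return len(sync_set) == num_sync and all(s in acked for s in sync_set)
    if method == "ANY":
        # Quorum-based: replies from any num_sync listed standbys suffice.
        return sum(1 for s in candidates if s in acked) >= num_sync
    raise ValueError("method must be FIRST or ANY")
```

With all four standbys connected and replies from s1, s2 and s4 only, the FIRST 3 rule still waits for s3, whereas the ANY 3 rule lets the commit proceed.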
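Note: Each standby_name should have the form of a valid SQL identifier, unless it is *. You can use double-quoting if necessary. But note that standby_names are compared to standby application names case-insensitively, whether double-quoted or not.
If no synchronous standby names are specified here, then synchronous replication is not enabled and transaction commits will not wait for replication. This is the default configuration. Even when synchronous replication is enabled, individual transactions can be configured not to wait for replication by setting the synchronous_commit parameter to local or off.
This parameter can only be set in the postgresql.conf file or on the server command line.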
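vacuum_defer_cleanup_age (integer)
Specifies the number of transactions by which VACUUM and HOT updates will defer cleanup of dead row versions. The default is zero transactions, meaning that dead row versions can be removed as soon as possible, that is, as soon as they are no longer visible to any open transaction. You may wish to set this to a non-zero value on a primary server that is supporting hot standby servers, as described in Section 26.5. This allows more time for queries on the standby to complete without incurring conflicts due to early cleanup of rows. However, since the value is measured in terms of the number of write transactions occurring on the primary server, it is difficult to predict just how much additional grace time will be made available to standby queries. This parameter can only be set in the postgresql.conf file or on the server command line.
You should also consider setting hot_standby_feedback on standby servers as an alternative to using this parameter.
This does not prevent cleanup of dead rows which have reached the age specified by old_snapshot_threshold.
These settings control the behavior of a standby server that is to receive replication data. Their values on the primary server are irrelevant.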
primary_conninfo (string)
Specifies a connection string to be used for the standby server to connect with a sending server. This string is in the format described in Section 33.1.1. If any option is unspecified in this string, then the corresponding environment variable (see Section 33.14) is checked. If the environment variable is not set either, then defaults are used.
The connection string should specify the host name (or address) of the sending server, as well as the port number if it is not the same as the standby server's default. Also specify a user name corresponding to a suitably-privileged role on the sending server (see Section 26.2.5.1). A password needs to be provided too, if the sender demands password authentication. It can be provided in the primary_conninfo string, or in a separate ~/.pgpass file on the standby server (use replication as the database name). Do not specify a database name in the primary_conninfo string.
This parameter can only be set in the postgresql.conf file or on the server command line. If this parameter is changed while the WAL receiver process is running, that process is signaled to shut down and expected to restart with the new setting (except if primary_conninfo is an empty string). This setting has no effect if the server is not in standby mode.
primary_slot_name (string)
Optionally specifies an existing replication slot to be used when connecting to the sending server via streaming replication to control resource removal on the upstream node (see Section 26.2.6). This parameter can only be set in the postgresql.conf file or on the server command line. If this parameter is changed while the WAL receiver process is running, that process is signaled to shut down and expected to restart with the new setting. This setting has no effect if primary_conninfo is not set or the server is not in standby mode.
promote_trigger_file (string)
Specifies a trigger file whose presence ends recovery in the standby. Even if this value is not set, you can still promote the standby using pg_ctl promote or calling pg_promote(). This parameter can only be set in the postgresql.conf file or on the server command line.
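hot_standby (boolean)
Specifies whether or not you can connect and run queries during recovery, as described in Section 26.5. The default value is on. This parameter can only be set at server start. It only has effect during archive recovery or in standby mode.
max_standby_archive_delay (integer)
When Hot Standby is active, this parameter determines how long the standby server should wait before canceling standby queries that conflict with about-to-be-applied WAL entries, as described in Section 26.5.2. max_standby_archive_delay applies when WAL data is being read from WAL archive (and is therefore not current). The default is 30 seconds. Units are milliseconds if not specified. A value of -1 allows the standby to wait forever for conflicting queries to complete. This parameter can only be set in the postgresql.conf file or on the server command line.
Note that max_standby_archive_delay is not the same as the maximum length of time a query can run before cancellation; rather it is the maximum total time allowed to apply any one WAL segment's data. Thus, if one query has resulted in significant delay earlier in the WAL segment, subsequent conflicting queries will have much less grace time.
max_standby_streaming_delay (integer)
When Hot Standby is active, this parameter determines how long the standby server should wait before canceling standby queries that conflict with about-to-be-applied WAL entries, as described in Section 26.5.2. max_standby_streaming_delay applies when WAL data is being received via streaming replication. The default is 30 seconds. Units are milliseconds if not specified. A value of -1 allows the standby to wait forever for conflicting queries to complete. This parameter can only be set in the postgresql.conf file or on the server command line.
Note that max_standby_streaming_delay is not the same as the maximum length of time a query can run before cancellation; rather it is the maximum total time allowed to apply WAL data once it has been received from the primary server. Thus, if one query has resulted in significant delay, subsequent conflicting queries will have much less grace time until the standby server has caught up again.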
wal_receiver_create_temp_slot (boolean)
Specifies whether the WAL receiver process should create a temporary replication slot on the remote instance when no permanent replication slot to use has been configured (using primary_slot_name). The default is off. This parameter can only be set in the postgresql.conf file or on the server command line. If this parameter is changed while the WAL receiver process is running, that process is signaled to shut down and expected to restart with the new setting.
wal_receiver_status_interval (integer)
Specifies the minimum frequency for the WAL receiver process on the standby to send information about replication progress to the primary or upstream standby, where it can be seen using the pg_stat_replication view. The standby will report the last write-ahead log location it has written, the last position it has flushed to disk, and the last position it has applied. This parameter's value is the maximum interval, in seconds, between reports. Updates are sent each time the write or flush positions change, or at least as often as specified by this parameter. Thus, the apply position may lag slightly behind the true position. Setting this parameter to zero disables status updates completely. This parameter can only be set in the postgresql.conf file or on the server command line. The default value is 10 seconds.
hot_standby_feedback (boolean)
Specifies whether or not a hot standby will send feedback to the primary or upstream standby about queries currently executing on the standby. This parameter can be used to eliminate query cancels caused by cleanup records, but can cause database bloat on the primary for some workloads. Feedback messages will not be sent more frequently than once per wal_receiver_status_interval. The default value is off. This parameter can only be set in the postgresql.conf file or on the server command line.
If cascaded replication is in use the feedback is passed upstream until it eventually reaches the primary. Standbys make no other use of feedback they receive other than to pass upstream.
This setting does not override the behavior of old_snapshot_threshold on the primary; a snapshot on the standby which exceeds the primary's age threshold can become invalid, resulting in cancellation of transactions on the standby. This is because old_snapshot_threshold is intended to provide an absolute limit on the time which dead rows can contribute to bloat, which would otherwise be violated because of the configuration of a standby.
wal_receiver_timeout (integer)
Terminate replication connections that are inactive longer than the specified number of milliseconds. This is useful for the receiving standby server to detect a primary node crash or network outage. A value of zero disables the timeout mechanism. This parameter can only be set in the postgresql.conf file or on the server command line. The default value is 60 seconds.
wal_retrieve_retry_interval (integer)
Specify how long the standby server should wait when WAL data is not available from any sources (streaming replication, local pg_wal or WAL archive) before retrying to retrieve WAL data. This parameter can only be set in the postgresql.conf file or on the server command line. The default value is 5 seconds. Units are milliseconds if not specified.
This parameter is useful in configurations where a node in recovery needs to control the amount of time to wait for new WAL data to be available. For example, in archive recovery, it is possible to make the recovery more responsive in the detection of a new WAL log file by reducing the value of this parameter. On a system with low WAL activity, increasing it reduces the number of requests necessary to access WAL archives, which is useful for example in cloud environments where the number of times an infrastructure is accessed is taken into account.
recovery_min_apply_delay (integer)
By default, a standby server restores WAL records from the sending server as soon as possible. It may be useful to have a time-delayed copy of the data, offering opportunities to correct data loss errors. This parameter allows you to delay recovery by a specified amount of time. For example, if you set this parameter to 5min, the standby will replay each transaction commit only when the system time on the standby is at least five minutes past the commit time reported by the master. If this value is specified without units, it is taken as milliseconds. The default is zero, adding no delay.
It is possible that the replication delay between servers exceeds the value of this parameter, in which case no delay is added. Note that the delay is calculated between the WAL time stamp as written on master and the current time on the standby. Delays in transfer because of network lag or cascading replication configurations may reduce the actual wait time significantly. If the system clocks on master and standby are not synchronized, this may lead to recovery applying records earlier than expected; but that is not a major issue because useful settings of this parameter are much larger than typical time deviations between servers.
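The timing rule above can be sketched as a small calculation: the standby waits until its own clock is at least the configured delay past the commit time stamp, and adds nothing if that moment has already passed. The function name is hypothetical and the sketch ignores everything except the arithmetic.

```python
from datetime import datetime, timedelta, timezone


def remaining_apply_delay(commit_time, standby_now, min_apply_delay):
    """Return how much longer the standby waits before replaying a commit.

    commit_time     -- commit time stamp as written on the sending server
    standby_now     -- current time according to the standby's clock
    min_apply_delay -- the configured delay as a timedelta
    """
    wait = commit_time + min_apply_delay - standby_now
    # If transfer lag already exceeds the setting, no extra delay is added.
    return max(wait, timedelta(0))
```

For a commit stamped 12:00:00 and a delay of five minutes, a standby whose clock reads 12:02:00 waits three more minutes, while one whose clock reads 12:06:00 replays immediately.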
The delay occurs only on WAL records for transaction commits. Other records are replayed as quickly as possible, which is not a problem because MVCC visibility rules ensure their effects are not visible until the corresponding commit record is applied.
The delay occurs once the database in recovery has reached a consistent state, until the standby is promoted or triggered. After that the standby will end recovery without further waiting.
This parameter is intended for use with streaming replication deployments; however, if the parameter is specified it will be honored in all cases except crash recovery. hot_standby_feedback will be delayed by use of this feature which could lead to bloat on the master; use both together with care.
Synchronous replication is affected by this setting when synchronous_commit is set to remote_apply; every COMMIT will need to wait to be applied.
This parameter can only be set in the postgresql.conf file or on the server command line.
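These settings control the behavior of a logical replication subscriber. Their values on the publisher are irrelevant.
Note that the wal_receiver_timeout, wal_receiver_status_interval and wal_retrieve_retry_interval configuration parameters affect the logical replication workers as well.
max_logical_replication_workers (int)
Specifies the maximum number of logical replication workers. This includes both apply workers and table synchronization workers.
Logical replication workers are taken from the pool defined by max_worker_processes.
The default value is 4.
max_sync_workers_per_subscription (integer)
Maximum number of synchronization workers per subscription. This parameter controls the amount of parallelism of the initial data copy during subscription initialization or when new tables are added.
Currently, there can be only one synchronization worker per table.
The synchronization workers are taken from the pool defined by max_logical_replication_workers.
The default value is 2.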
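search_path (string)
This variable specifies the order in which schemas are searched when an object (table, data type, function, etc.) is referenced by a simple name with no schema specified. When there are objects of identical names in different schemas, the one found first in the search path is used. An object that is not in any of the schemas in the search path can only be referenced by specifying its containing schema with a qualified (dotted) name.
The value for search_path must be a comma-separated list of schema names. Any name that is not an existing schema, or is a schema for which the user does not have USAGE permission, is silently ignored.
If one of the list items is the special name $user, then the schema having the name returned by SESSION_USER is substituted, if there is such a schema and the user has USAGE permission for it. (If not, $user is ignored.)
The system catalog schema, pg_catalog, is always searched, whether it is mentioned in the path or not. If it is mentioned in the path then it will be searched in the specified order. If pg_catalog is not in the path then it will be searched before any of the path items.
Likewise, the current session's temporary-table schema, pg_temp_nnn, is always searched if it exists. It can be explicitly listed in the path by using the alias pg_temp. If it is not listed in the path then it is searched first (even before pg_catalog). However, the temporary schema is only searched for relation (table, view, sequence, etc.) and data type names. It is never searched for function or operator names.
When objects are created without specifying a particular target schema, they will be placed in the first valid schema named in search_path. An error is reported if the search path is empty.
The default value for this parameter is "$user", public. This setting supports shared use of a database (where no users have private schemas, and all share use of public), private per-user schemas, and combinations of these. Other effects can be obtained by altering the default search path setting, either globally or per-user.
The current effective value of the search path can be examined via the SQL function current_schemas (see Section 9.25). This is not quite the same as examining the value of search_path, since current_schemas shows how the items appearing in search_path were resolved.
For more information on schema handling, see Section 5.9.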
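The resolution rules above (substituting $user, dropping nonexistent schemas, and implicitly searching pg_catalog and the temporary schema first) can be sketched as follows. This is a simplified model for table and type names only: the function name is hypothetical, USAGE privilege checks are omitted, and `user` stands for whatever name $user resolves to.

```python
def effective_search_order(search_path, user, existing_schemas, temp_schema=None):
    """Return the schemas searched for a table or type name, in order.

    search_path      -- list of items as written in the setting
    user             -- name substituted for the special item "$user"
    existing_schemas -- names of schemas that exist in the database
    temp_schema      -- the session's pg_temp_nnn schema, if it exists
    """
    existing = set(existing_schemas) | {"pg_catalog"}
    if temp_schema is not None:
        existing.add(temp_schema)

    order = []
    for item in search_path:
        if item == "$user":
            name = user
        elif item == "pg_temp":
            name = temp_schema  # alias for the session's temporary schema
        else:
            name = item
        # Items that do not name an existing schema are silently ignored.
        if name in existing and name not in order:
            order.append(name)

    # Unlisted pg_catalog is searched before every listed item ...
    if "pg_catalog" not in order:
        order.insert(0, "pg_catalog")
    # ... and an unlisted temporary schema is searched even before that.
    if temp_schema is not None and temp_schema not in order:
        order.insert(0, temp_schema)
    return order
```

Under the default path, a user named alice without a private schema ends up searching pg_catalog and then public; once a schema named alice exists, it is searched ahead of public.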
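row_security (boolean)
This variable controls whether to raise an error in lieu of applying a row security policy. When set to on, policies apply normally. When set to off, queries fail which would otherwise apply at least one policy. The default is on. Change to off where limited row visibility could cause incorrect results; for example, pg_dump makes that change by default. This variable has no effect on roles which bypass every row security policy, to wit, superusers and roles with the BYPASSRLS attribute.
For more information on row security policies, see CREATE POLICY.
default_tablespace (string)
This variable specifies the default tablespace in which to create objects (tables and indexes) when a CREATE command does not explicitly specify a tablespace.
The value is either the name of a tablespace, or an empty string to specify using the default tablespace of the current database. If the value does not match the name of any existing tablespace, PostgreSQL will automatically use the default tablespace of the current database. If a nondefault tablespace is specified, the user must have CREATE privilege for it, or creation attempts will fail.
This variable is not used for temporary tables; for them, temp_tablespaces is consulted instead.
This variable is also not used when creating databases. By default, a new database inherits its tablespace setting from the template database it is copied from.
For more information on tablespaces, see Section 23.6.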
default_toast_compression (enum)
This variable sets the default TOAST compression method for values of compressible columns. (This can be overridden for individual columns by setting the COMPRESSION column option in CREATE TABLE or ALTER TABLE.) The supported compression methods are pglz and (if PostgreSQL was compiled with --with-lz4) lz4. The default is pglz.
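temp_tablespaces (string)
This variable specifies tablespaces in which to create temporary objects (temp tables and indexes on temp tables) when a CREATE command does not explicitly specify a tablespace. Temporary files for purposes such as sorting large data sets are also created in these tablespaces.
The value is a list of names of tablespaces. When there is more than one name in the list, PostgreSQL chooses a random member of the list each time a temporary object is to be created; except that within a transaction, successively created temporary objects are placed in successive tablespaces from the list. If the selected element of the list is an empty string, PostgreSQL will automatically use the default tablespace of the current database instead.
When temp_tablespaces is set interactively, specifying a nonexistent tablespace is an error, as is specifying a tablespace for which the user does not have CREATE privilege. However, when using a previously set value, nonexistent tablespaces are ignored, as are tablespaces for which the user lacks CREATE privilege. In particular, this rule applies when using a value set in postgresql.conf.
The default value is an empty string, which results in all temporary objects being created in the default tablespace of the current database.
See also default_tablespace.
check_function_bodies (boolean)
This parameter is normally on. When set to off, it disables validation of the function body string during CREATE FUNCTION. Disabling validation avoids side effects of the validation process and avoids false positives due to problems such as forward references. Set this parameter to off before loading functions on behalf of other users; pg_dump does so automatically.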
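default_transaction_isolation (enum)
Each SQL transaction has an isolation level, which can be either "read uncommitted", "read committed", "repeatable read", or "serializable". This parameter controls the default isolation level of each new transaction. The default is "read committed".
Consult Chapter 13 and SET TRANSACTION for more information.
default_transaction_read_only (boolean)
A read-only SQL transaction cannot alter non-temporary tables. This parameter controls the default read-only status of each new transaction. The default is off (read/write).
Consult SET TRANSACTION for more information.
default_transaction_deferrable (boolean)
When running at the serializable isolation level, a deferrable read-only SQL transaction may be delayed before it is allowed to proceed. However, once it begins executing it does not incur any of the overhead required to ensure serializability; so serialization code will have no reason to force it to abort because of concurrent updates, making this option suitable for long-running read-only transactions.
This parameter controls the default deferrable status of each new transaction. It currently has no effect on read-write transactions or those operating at isolation levels lower than serializable. The default is off.
Consult SET TRANSACTION for more information.
session_replication_role (enum)
Controls firing of replication-related triggers and rules for the current session. Setting this variable requires superuser privilege and results in discarding any previously cached query plans. Possible values are origin (the default), replica and local. See ALTER TABLE for more information.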
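statement_timeout (integer)
Abort any statement that takes more than the specified number of milliseconds, starting from the time the command arrives at the server from the client. If log_min_error_statement is set to ERROR or lower, the statement that timed out will also be logged. A value of zero (the default) turns this off.
Setting statement_timeout in postgresql.conf is not recommended because it would affect all sessions.
lock_timeout (integer)
Abort any statement that waits longer than the specified number of milliseconds while attempting to acquire a lock on a table, index, row, or other database object. The time limit applies separately to each lock acquisition attempt. The limit applies both to explicit locking requests (such as LOCK TABLE, or SELECT FOR UPDATE without NOWAIT) and to implicitly-acquired locks. If log_min_error_statement is set to ERROR or lower, the statement that timed out will be logged. A value of zero (the default) turns this off.
Unlike statement_timeout, this timeout can only occur while waiting for locks. Note that if statement_timeout is nonzero, it is rather pointless to set lock_timeout to the same or larger value, since the statement timeout would always trigger first.
Setting lock_timeout in postgresql.conf is not recommended because it would affect all sessions.
idle_in_transaction_session_timeout (integer)
Terminate any session with an open transaction that has been idle for longer than the specified duration in milliseconds. This allows any locks held by that session to be released and the connection slot to be reused; it also allows tuples visible only to this transaction to be vacuumed. See Section 25.1 for more details about this.
The default value of 0 disables this feature.
idle_session_timeout (integer)
Terminate any session that has been idle (that is, waiting for a client query), but not within an open transaction, for longer than the specified amount of time. If this value is specified without units, it is taken as milliseconds. A value of zero (the default) disables the timeout.
Unlike the case with an open transaction, an idle session without a transaction imposes no large costs on the server, so there is less need to enable this timeout than idle_in_transaction_session_timeout.
Be wary of enforcing this timeout on connections made through connection-pooling software or other middleware, as such a layer may not react well to unexpected connection closure. It may be helpful to enable this timeout only for interactive sessions, perhaps by applying it only to particular users.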
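vacuum_freeze_table_age (integer)
VACUUM performs an aggressive scan if the table's pg_class.relfrozenxid field has reached the age specified by this setting. An aggressive scan differs from a regular VACUUM in that it visits every page that might contain unfrozen XIDs or MXIDs, not just those that might contain dead tuples. The default is 150 million transactions. Although users can set this value anywhere from zero to two billion, VACUUM will silently limit the effective value to 95% of autovacuum_freeze_max_age, so that a periodic manual VACUUM has a chance to run before an anti-wraparound autovacuum is launched for the table. For more information see Section 24.1.5.
vacuum_freeze_min_age (integer)
Specifies the cutoff age (in transactions) that VACUUM should use to decide whether to freeze row versions while scanning a table. The default is 50 million transactions. Although users can set this value anywhere from zero to one billion, VACUUM will silently limit the effective value to half the value of autovacuum_freeze_max_age, so that there is not an unreasonably short time between forced autovacuums. For more information see Section 24.1.5.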
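The two silent limits described above are simple arithmetic, sketched here in Python. The function name is hypothetical, integer rounding is an assumption, and the value of autovacuum_freeze_max_age is passed in by the caller rather than assumed.

```python
def effective_freeze_ages(freeze_table_age, freeze_min_age, autovacuum_freeze_max_age):
    """Apply the documented caps to the two vacuum_freeze_* settings.

    Returns (effective vacuum_freeze_table_age, effective vacuum_freeze_min_age).
    """
    # vacuum_freeze_table_age is limited to 95% of autovacuum_freeze_max_age.
    table_age = min(freeze_table_age, autovacuum_freeze_max_age * 95 // 100)
    # vacuum_freeze_min_age is limited to half of autovacuum_freeze_max_age.
    min_age = min(freeze_min_age, autovacuum_freeze_max_age // 2)
    return table_age, min_age
```

For example, with autovacuum_freeze_max_age at 200 million (an example value), the defaults of 150 million and 50 million are used unchanged, while settings of two billion and one billion are reduced to 190 million and 100 million.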
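vacuum_multixact_freeze_table_age (integer)
VACUUM performs an aggressive scan if the table's pg_class.relminmxid field has reached the age specified by this setting. An aggressive scan differs from a regular VACUUM in that it visits every page that might contain unfrozen XIDs or MXIDs, not just those that might contain dead tuples. The default is 150 million multixacts. Although users can set this value anywhere from zero to two billion, VACUUM will silently limit the effective value to 95% of autovacuum_multixact_freeze_max_age, so that a periodic manual VACUUM has a chance to run before an anti-wraparound autovacuum is launched for the table. For more information see Section 24.1.5.
vacuum_multixact_freeze_min_age (integer)
Specifies the cutoff age (in multixacts) that VACUUM should use to decide whether to replace multixact IDs with a newer transaction ID or multixact ID while scanning a table. The default is 5 million multixacts. Although users can set this value anywhere from zero to one billion, VACUUM will silently limit the effective value to half the value of autovacuum_multixact_freeze_max_age, so that there is not an unreasonably short time between forced autovacuums. For more information see Section 24.1.5.1.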
vacuum_cleanup_index_scale_factor (floating point)
Specifies the fraction of the total number of heap tuples counted in the previous statistics collection that can be inserted without incurring an index scan at the VACUUM cleanup stage. This setting currently applies to B-tree indexes only.
If no tuples were deleted from the heap, B-tree indexes are still scanned at the VACUUM cleanup stage when at least one of the following conditions is met: the index statistics are stale, or the index contains deleted pages that can be recycled during cleanup. Index statistics are considered to be stale if the number of newly inserted tuples exceeds the vacuum_cleanup_index_scale_factor fraction of the total number of heap tuples detected by the previous statistics collection. The total number of heap tuples is stored in the index meta-page. Note that the meta-page does not include this data until VACUUM finds no dead tuples, so B-tree index scan at the cleanup stage can only be skipped if the second and subsequent VACUUM cycles detect no dead tuples.
The value can range from 0 to 10000000000. When vacuum_cleanup_index_scale_factor is set to 0, index scans are never skipped during VACUUM cleanup. The default value is 0.1.
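The decision described above can be written out as a short predicate. This is a sketch of the documented conditions only, for the case where no heap tuples were deleted; the function and argument names are hypothetical and it does not model the meta-page bookkeeping.

```python
def cleanup_index_scan_needed(newly_inserted, prev_total_heap_tuples,
                              has_recyclable_pages, scale_factor=0.1):
    """Decide whether a B-tree index is scanned at the VACUUM cleanup stage.

    newly_inserted         -- tuples inserted since the previous statistics
    prev_total_heap_tuples -- heap tuple count from the previous statistics
    has_recyclable_pages   -- index contains deleted pages that can be recycled
    scale_factor           -- vacuum_cleanup_index_scale_factor
    """
    if scale_factor == 0:
        # With a setting of 0, index scans are never skipped.
        return True
    # Statistics are stale once inserts exceed the configured fraction.
    stats_are_stale = newly_inserted > scale_factor * prev_total_heap_tuples
    return stats_are_stale or has_recyclable_pages
```

With the default of 0.1 and one million previously counted tuples, 50,000 new inserts do not trigger the scan, while 150,000 do.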
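bytea_output (enum)
Sets the output format for values of type bytea. Valid values are hex (the default) and escape (the traditional PostgreSQL format). See Section 8.4 for more information. The bytea type always accepts both formats on input, regardless of this setting.
xmlbinary (enum)
Sets how binary values are to be encoded in XML. This applies for example when bytea values are converted to XML by the functions xmlelement or xmlforest. Possible values are base64 and hex, which are both defined in the XML Schema standard. The default is base64. For further information about XML-related functions, see Section 9.14.
The actual choice here is mostly a matter of taste, constrained only by possible restrictions in client applications. Both methods support all possible values, although the hex encoding will be somewhat larger than the base64 encoding.
xmloption (enum)
Sets whether DOCUMENT or CONTENT is implicit when converting between XML and character string values. See Section 8.13 for a description of this. Valid values are DOCUMENT and CONTENT. The default is CONTENT.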
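According to the SQL standard, the command to set this option is
SET XML OPTION { DOCUMENT | CONTENT };
This syntax is also available in PostgreSQL.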
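gin_pending_list_limit (integer)
Sets the maximum size of the GIN pending list which is used when fastupdate is enabled. If the list grows larger than this maximum size, it is cleaned up by moving the entries in it to the main GIN data structure in bulk. The default is four megabytes (4MB). This setting can be overridden for individual GIN indexes by changing index storage parameters. See Section 64.4.1 and Section 64.5 for more information.
DateStyle (string)
Sets the display format for date and time values, as well as the rules for interpreting ambiguous date input values. For historical reasons, this variable contains two independent components: the output format specification (ISO, Postgres, SQL, or German) and the input/output specification for year/month/day ordering (DMY, MDY, or YMD). These can be set separately or together. The keywords Euro and European are synonyms for DMY; the keywords US, NonEuro, and NonEuropean are synonyms for MDY. See Section 8.5 for more information. The built-in default is ISO, MDY, but initdb will initialize the configuration file with a setting that corresponds to the behavior of the chosen lc_time locale.
IntervalStyle (enum)
Sets the display format for interval values. The value sql_standard will produce output matching SQL standard interval literals. The value postgres (which is the default) will produce output matching PostgreSQL releases prior to 8.4 when the DateStyle parameter was set to ISO. The value postgres_verbose will produce output matching PostgreSQL releases prior to 8.4 when the DateStyle parameter was set to non-ISO output. The value iso_8601 will produce output matching the time interval "format with designators" defined in section 4.4.3.2 of ISO 8601.
The IntervalStyle parameter also affects the interpretation of ambiguous interval input. See Section 8.5.4 for more information.
TimeZone (string)
Sets the time zone for displaying and interpreting time stamps. The built-in default is GMT, but that is typically overridden in postgresql.conf; initdb will install a setting there corresponding to its system environment. See Section 8.5.3 for more information.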
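timezone_abbreviations (string)
Sets the collection of time zone abbreviations that will be accepted by the server for datetime input. The default is 'Default', which is a collection that works in most of the world; there are also 'Australia' and 'India', and other collections can be defined for a particular installation. See Section B.3 for more information.
extra_float_digits (integer)
This parameter adjusts the number of digits displayed for floating-point values, including float4, float8, and geometric data types. The parameter value is added to the standard number of digits (FLT_DIG or DBL_DIG as appropriate). The value can be set as high as 3, to include partially-significant digits; this is especially useful for dumping float data that needs to be restored exactly. Or it can be set negative to suppress unwanted digits. See also Section 8.1.3.
client_encoding (string)
Sets the client-side encoding (character set). The default is to use the database encoding. The character sets supported by the PostgreSQL server are described in Section 23.3.1.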
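lc_messages (string)
Sets the language in which messages are displayed. Acceptable values are system-dependent; see Section 23.1 for more information. If this variable is set to the empty string (which is the default) then the value is inherited from the execution environment of the server in a system-dependent way.
On some systems, this locale category does not exist. Setting this variable will still work, but there will be no effect. Also, there is a chance that no translated messages for the desired language exist. In that case you will continue to see the English messages.
Only superusers can change this setting, because it affects the messages sent to both the server log and the client, and an improper value might obscure the readability of the server logs.
lc_monetary (string)
Sets the locale to use for formatting monetary amounts, for example with the to_char family of functions. Acceptable values are system-dependent; see Section 23.1 for more information. If this variable is set to the empty string (which is the default) then the value is inherited from the execution environment of the server in a system-dependent way.
lc_numeric (string)
Sets the locale to use for formatting numbers, for example with the to_char family of functions. Acceptable values are system-dependent; see Section 23.1 for more information. If this variable is set to the empty string (which is the default) then the value is inherited from the execution environment of the server in a system-dependent way.
lc_time (string)
Sets the locale to use for formatting dates and times, for example with the to_char family of functions. Acceptable values are system-dependent; see Section 23.1 for more information. If this variable is set to the empty string (which is the default) then the value is inherited from the execution environment of the server in a system-dependent way.
default_text_search_config (string)
Selects the text search configuration that is used by those variants of the text search functions that do not have an explicit argument specifying the configuration. See Chapter 12 for further information. The built-in default is pg_catalog.simple, but initdb will initialize the configuration file with a setting that corresponds to the chosen lc_ctype locale, if a configuration matching that locale can be identified.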
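Several settings are available for preloading shared libraries into the server, in order to load additional functionality or achieve performance benefits. For example, a setting of '$libdir/mylib' would cause mylib.so (or on some platforms, mylib.sl) to be preloaded from the installation's standard library directory. The differences between the settings are when they take effect and what privileges are required to change them.
PostgreSQL procedural language libraries can be preloaded in this way, typically by using the syntax '$libdir/plXXX' where XXX is pgsql, perl, tcl, or python.
Only shared libraries specifically intended to be used with PostgreSQL can be loaded this way. Every PostgreSQL-supported library has a "magic block" that is checked to guarantee compatibility. For this reason, non-PostgreSQL libraries cannot be loaded in this way. You might be able to use operating-system facilities such as LD_PRELOAD for that.
In general, refer to the documentation of a specific module for the recommended way to load that module.
local_preload_libraries (string)
This variable specifies one or more shared libraries that are to be preloaded at connection start. It contains a comma-separated list of library names, where each name is interpreted as for the LOAD command. Whitespace between entries is ignored; surround a library name with double quotes if you need to include whitespace or commas in the name. The parameter value only takes effect at the start of the connection. Subsequent changes have no effect. If a specified library is not found, the connection attempt will fail.
This option can be set by any user. Because of that, the libraries that can be loaded are restricted to those appearing in the plugins subdirectory of the installation's standard library directory. (It is the database administrator's responsibility to ensure that only "safe" libraries are installed there.) Entries in local_preload_libraries can specify this directory explicitly, for example $libdir/plugins/mylib, or just specify the library name: mylib would have the same effect as $libdir/plugins/mylib.
The intent of this feature is to allow unprivileged users to load debugging or performance-measurement libraries into specific sessions without requiring an explicit LOAD command. To that end, it would be typical to set this parameter using the PGOPTIONS environment variable on the client or by using ALTER ROLE SET.
However, unless a module is specifically designed to be used in this way by non-superusers, this is usually not the right setting to use. Look at session_preload_libraries instead.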
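session_preload_libraries (string)
This variable specifies one or more shared libraries that are to be preloaded at connection start. It contains a comma-separated list of library names, where each name is interpreted as for the LOAD command. Whitespace between entries is ignored; surround a library name with double quotes if you need to include whitespace or commas in the name. The parameter value only takes effect at the start of the connection. Subsequent changes have no effect. If a specified library is not found, the connection attempt will fail. Only superusers can change this setting.
The intent of this feature is to allow debugging or performance-measurement libraries to be loaded into specific sessions without an explicit LOAD command being given. For example, auto_explain could be enabled for all sessions under a given user name by setting this parameter with ALTER ROLE SET. Also, this parameter can be changed without restarting the server (but changes only take effect when a new session is started), so it is easier to add new modules this way, even if they should apply to all sessions.
Unlike shared_preload_libraries, there is no large performance advantage to loading a library at session start rather than when it is first used. There is some advantage, however, when connection pooling is used.
shared_preload_libraries (string)
This variable specifies one or more shared libraries to be preloaded at server start. It contains a comma-separated list of library names, where each name is interpreted as for the LOAD command. Whitespace between entries is ignored; surround a library name with double quotes if you need to include whitespace or commas in the name. This parameter can only be set at server start. If a specified library is not found, the server will fail to start.
Some libraries need to perform certain operations that can only take place at postmaster start, such as allocating shared memory, reserving light-weight locks, or starting background workers. Those libraries must be loaded at server start through this parameter. See the documentation of each library for details.
Other libraries can also be preloaded. By preloading a shared library, the library startup time is avoided when the library is first used. However, the time to start each new server process might increase slightly, even if that process never uses the library. So this parameter is recommended only for libraries that will be used in most sessions. Also, changing this parameter requires a server restart, so this is not the right setting to use for short-term debugging tasks; use session_preload_libraries for that instead.
Note: On Windows hosts, preloading a library at server start will not reduce the time required to start each new server process; each server process will re-load all preload libraries. However, shared_preload_libraries is still useful on Windows hosts for libraries that need to perform operations at postmaster start time.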
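dynamic_library_path (string)
If a dynamically loadable module needs to be opened and the file name specified in the CREATE FUNCTION or LOAD command does not have a directory component (i.e., the name does not contain a slash), the system will search this path for the required file.
The value for dynamic_library_path must be a list of absolute directory paths separated by colons (or semi-colons on Windows). If a list element starts with the special string $libdir, the compiled-in PostgreSQL package library directory is substituted for $libdir; this is where the modules provided by the standard PostgreSQL distribution are installed. (Use pg_config --pkglibdir to find out the name of this directory.) For example:
Or, in a Windows environment:
The default value for this parameter is '$libdir'. If the value is set to an empty string, the automatic path search is turned off.
This parameter can be changed at run time by superusers, but a setting done that way will only persist until the end of the client connection, so this method should be reserved for development purposes. The recommended way to set this parameter is in the postgresql.conf configuration file.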
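The lookup rule described for dynamic_library_path can be sketched in Python as a function that lists the candidate file locations in search order. The function name and the directory names used with it are hypothetical, and the sketch only models path construction, not whether the files exist.

```python
def candidate_paths(dynamic_library_path, module_file, pkglibdir, separator=":"):
    """List the locations searched for `module_file`, in order.

    dynamic_library_path -- the setting's value
    module_file          -- file name given to CREATE FUNCTION or LOAD
    pkglibdir            -- directory substituted for $libdir
    separator            -- ":" normally, ";" on Windows
    """
    if "/" in module_file:
        # A name with a directory component is not subject to the search.
        return [module_file]
    if dynamic_library_path == "":
        # An empty setting turns the automatic path search off.
        return []
    paths = []
    for entry in dynamic_library_path.split(separator):
        if entry.startswith("$libdir"):
            entry = pkglibdir + entry[len("$libdir"):]
        paths.append(entry + "/" + module_file)
    return paths
```

For instance, a value that lists a project directory followed by $libdir makes the project directory win whenever both contain a file of the same name.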
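gin_fuzzy_search_limit (integer)
Soft upper limit of the size of the set returned by GIN index scans. For more information see Section 66.5.
deadlock_timeout (integer)
This is the amount of time, in milliseconds, to wait on a lock before checking to see if there is a deadlock condition. The check for deadlock is relatively expensive, so the server does not run it every time it waits for a lock. We optimistically assume that deadlocks are not common in production applications and just wait on the lock for a while before checking for a deadlock. Increasing this value reduces the amount of time wasted in needless deadlock checks, but slows down reporting of real deadlock errors. The default is one second (1s), which is probably about the smallest value you would want in practice. On a heavily loaded server you might want to raise it. Ideally the setting should exceed your typical transaction time, so as to improve the odds that a lock will be released before the waiter decides to check for deadlock. Only superusers can change this setting.
When log_lock_waits is set, this parameter also determines the length of time to wait before a log message is issued about the lock wait. If you are trying to investigate locking delays you might want to set a shorter than normal deadlock_timeout.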
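max_locks_per_transaction (integer)
The shared lock table tracks locks on max_locks_per_transaction * (max_connections + max_prepared_transactions) objects (e.g., tables); hence, no more than this many distinct objects can be locked at any one time. This parameter controls the average number of object locks allocated for each transaction; individual transactions can lock more objects as long as the locks of all transactions fit in the lock table. This is not the number of rows that can be locked; that value is unlimited. The default, 64, has historically proven sufficient, but you might need to raise this value if you have queries that touch many different tables in a single transaction, e.g., a query of a parent table with many children. This parameter can only be set at server start.
When running a standby server, you must set this parameter to the same or higher value than on the primary server. Otherwise, queries will not be allowed in the standby server.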
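The sizing formula above is easy to evaluate for a concrete configuration. In this sketch the function name is hypothetical, and the values used for max_connections and max_prepared_transactions are example inputs, not values taken from this section.

```python
def lock_table_capacity(max_locks_per_transaction=64,
                        max_connections=100,
                        max_prepared_transactions=0):
    """Number of distinct objects the shared lock table can track.

    Implements max_locks_per_transaction *
    (max_connections + max_prepared_transactions).
    """
    return max_locks_per_transaction * (max_connections + max_prepared_transactions)
```

With 64 locks per transaction, 100 connections and no prepared transactions, the table tracks 6400 objects; a single transaction may use more than 64 of them as long as the total across all transactions fits.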
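max_pred_locks_per_transaction (integer)
The shared predicate lock table tracks locks on max_pred_locks_per_transaction * (max_connections + max_prepared_transactions) objects (e.g., tables); hence, no more than this many distinct objects can be locked at any one time. This parameter controls the average number of object locks allocated for each transaction; individual transactions can lock more objects as long as the locks of all transactions fit in the lock table. This is not the number of rows that can be locked; that value is unlimited. The default, 64, has generally been sufficient in testing, but you might need to raise this value if you have clients that touch many different tables in a single serializable transaction. This parameter can only be set at server start.
max_pred_locks_per_relation (integer)
This controls how many pages or tuples of a single relation can be predicate-locked before the lock is promoted to covering the whole relation. Values greater than or equal to zero mean an absolute limit, while negative values mean max_pred_locks_per_transaction divided by the absolute value of this setting. The default is -2, which keeps the behavior from previous versions of PostgreSQL. This parameter can only be set in the postgresql.conf file or on the server command line.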
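How a negative max_pred_locks_per_relation value is interpreted can be shown with a short calculation. The function name is hypothetical, and integer division is an assumption made for this sketch; it follows the wording above rather than the server's source code.

```python
def relation_promotion_limit(max_pred_locks_per_relation, max_pred_locks_per_transaction):
    """Limit on page/tuple predicate locks before promotion to a relation lock."""
    if max_pred_locks_per_relation >= 0:
        # Non-negative values are an absolute limit.
        return max_pred_locks_per_relation
    # Negative values divide max_pred_locks_per_transaction by the absolute value.
    return max_pred_locks_per_transaction // abs(max_pred_locks_per_relation)
```

With the defaults of -2 and 64, the limit works out to 32.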
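max_pred_locks_per_page (integer)
This controls how many rows on a single page can be predicate-locked before the lock is promoted to covering the whole page. The default is 2. This parameter can only be set in the postgresql.conf file or on the server command line.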
This feature was designed to allow parameters not normally known to PostgreSQL to be added by add-on modules (such as procedural languages). This allows extension modules to be configured in the standard ways.
Custom options have two-part names: an extension name, then a dot, then the parameter name proper, much like qualified names in SQL. An example is plpgsql.variable_conflict.
Because custom options may need to be set in processes that have not loaded the relevant extension module, PostgreSQL will accept a setting for any two-part parameter name. Such variables are treated as placeholders and have no function until the module that defines them is loaded. When an extension module is loaded, it will add its variable definitions, convert any placeholder values according to those definitions, and issue warnings for any unrecognized placeholders that begin with its extension name.
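Client authentication is controlled by a configuration file, which traditionally is named pg_hba.conf and is stored in the database cluster's data directory. (HBA stands for host-based authentication.) A default pg_hba.conf file is installed when the data directory is initialized by initdb. It is possible to place the authentication configuration file elsewhere, however; see the relevant configuration parameter.
The general format of the pg_hba.conf file is a set of records, one per line. Blank lines are ignored, as is any text after the # comment character. Records cannot be continued across lines. A record is made up of a number of fields which are separated by spaces and/or tabs. Fields can contain white space if the field value is double-quoted. Quoting one of the keywords in a database, user, or address field (e.g., all or replication) makes the word lose its special meaning, and just match a database, user, or host with that name.
Each record specifies a connection type, a client IP address range (if relevant for the connection type), a database name, a user name, and the authentication method to be used for connections matching these parameters. The first record with a matching connection type, client address, requested database, and user name is used to perform authentication. There is no "fall-through" or "backup": if one record is chosen and the authentication fails, subsequent records are not considered. If no record matches, access is denied.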
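A record can have one of the seven formats
local      database  user  auth-method  [auth-options]
host       database  user  address  auth-method  [auth-options]
hostssl    database  user  address  auth-method  [auth-options]
hostnossl  database  user  address  auth-method  [auth-options]
host       database  user  IP-address  IP-mask  auth-method  [auth-options]
hostssl    database  user  IP-address  IP-mask  auth-method  [auth-options]
hostnossl  database  user  IP-address  IP-mask  auth-method  [auth-options]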
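The meaning of the fields is as follows:
local
This record matches connection attempts using Unix-domain sockets. Without a record of this type, Unix-domain socket connections are disallowed.
host
This record matches connection attempts made using TCP/IP. host records match either SSL or non-SSL connection attempts.
Important: Remote TCP/IP connections will not be possible unless the server is started with an appropriate value for the relevant configuration parameter, since the default behavior is to listen for TCP/IP connections only on localhost.
hostssl
This record matches connection attempts made using TCP/IP, but only when the connection is made with SSL encryption.
hostnossl
This record type has the opposite behavior of hostssl; it only matches connection attempts made over TCP/IP that do not use SSL.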
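database
Specifies which database name(s) this record matches. The value all specifies that it matches all databases. The value sameuser specifies that the record matches if the requested database has the same name as the requested user. The value samerole specifies that the requested user must be a member of the role with the same name as the requested database. (samegroup is an obsolete but still accepted spelling of samerole.) Superusers are not considered to be members of a role for the purposes of samerole unless they are explicitly members of the role, directly or indirectly, and not just by virtue of being a superuser. The value replication specifies that the record matches if a physical replication connection is requested (note that replication connections do not specify any particular database). Otherwise, this is the name of a specific PostgreSQL database. Multiple database names can be supplied by separating them with commas. A separate file containing database names can be specified by preceding the file name with @.
user
Specifies which database user name(s) this record matches. The value all specifies that it matches all users. Otherwise, this is either the name of a specific database user, or a group name preceded by +. (Recall that there is no real distinction between users and groups in PostgreSQL; a + mark really means "match any of the roles that are directly or indirectly members of this role", while a name without a + mark matches only that specific role.) For this purpose, a superuser is only considered to be a member of a role if they are explicitly a member of the role, directly or indirectly, and not just by virtue of being a superuser. Multiple user names can be supplied by separating them with commas. A separate file containing user names can be specified by preceding the file name with @.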
address
Specifies the client machine address(es) that this record matches. This field can contain either a host name, an IP address range, or one of the special key words mentioned below.
An IP address range is specified using standard numeric notation for the range's starting address, then a slash (/) and a CIDR mask length. The mask length indicates the number of high-order bits of the client IP address that must match. Bits to the right of this should be zero in the given IP address. There must not be any white space between the IP address, the /, and the CIDR mask length.
Typical examples of an IPv4 address range specified this way are 172.20.143.89/32 for a single host, or 172.20.143.0/24 for a small network, or 10.6.0.0/16 for a larger one. An IPv6 address range might look like ::1/128 for a single host (in this case the IPv6 loopback address) or fe80::7a31:c1ff:0000:0000/96 for a small network. 0.0.0.0/0 represents all IPv4 addresses, and ::0/0 represents all IPv6 addresses. To specify a single host, use a mask length of 32 for IPv4 or 128 for IPv6. In a network address, do not omit trailing zeroes.
An entry given in IPv4 format will match only IPv4 connections, and an entry given in IPv6 format will match only IPv6 connections, even if the represented address is in the IPv4-in-IPv6 range. Note that entries in IPv6 format will be rejected if the system's C library does not have support for IPv6 addresses.
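The address-range matching described above can be tried out with Python's standard ipaddress module. This is an illustrative model of the rule and not the server's own parser; the function name is hypothetical.

```python
import ipaddress


def address_matches(entry, client_ip):
    """Return True if `client_ip` falls inside the CIDR range `entry`.

    strict=True rejects an entry whose bits to the right of the mask
    are not zero, mirroring the requirement stated in the text.
    """
    network = ipaddress.ip_network(entry, strict=True)
    # An IPv4 entry never contains an IPv6 address, and vice versa.
    return ipaddress.ip_address(client_ip) in network
```

For example, 172.20.143.0/24 matches the client 172.20.143.89, while 0.0.0.0/0 does not match the IPv6 loopback address ::1.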
You can also write all to match any IP address, samehost to match any of the server's own IP addresses, or samenet to match any address in any subnet that the server is directly connected to.
If a host name is specified (anything that is not an IP address range or a special key word is treated as a host name), that name is compared with the result of a reverse name resolution of the client's IP address (e.g., reverse DNS lookup, if DNS is used). Host name comparisons are case insensitive. If there is a match, then a forward name resolution (e.g., forward DNS lookup) is performed on the host name to check whether any of the addresses it resolves to are equal to the client's IP address. If both directions match, then the entry is considered to match. (The host name that is used in pg_hba.conf should be the one that address-to-name resolution of the client's IP address returns, otherwise the line won't be matched. Some host name databases allow associating an IP address with multiple host names, but the operating system will only return one host name when asked to resolve an IP address.)
A host name specification that starts with a dot (.) matches a suffix of the actual host name. So .example.com would match foo.example.com (but not just example.com).
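The suffix rule can be illustrated with a small Python function that compares a pg_hba.conf host name entry with the name obtained for the client. The function name is hypothetical, and the sketch covers only the string comparison, not the reverse and forward name lookups.

```python
def host_name_matches(entry, client_host_name):
    """Compare a host name entry with the client's resolved host name.

    Comparisons are case-insensitive. An entry starting with a dot
    matches any host name that ends with that suffix.
    """
    entry = entry.lower()
    name = client_host_name.lower()
    if entry.startswith("."):
        return name.endswith(entry)
    return name == entry
```

Because the leading dot is part of the suffix, the bare domain itself does not match a dotted entry.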
當在 pg_hba.conf 中指定了主機名稱時,您應該確保名稱解析足夠快。設定本地名稱解析暫存(例如 nscd)可能是有幫助的。另外,您可能希望啟用配置參數 log_hostname 來查看用戶端的主機名稱,而不是日誌中的 IP 位址。
此欄位僅適用於 host、hostssl 和 hostnossl 規則項目。
使用者有時會想知道為什麼以這種看似複雜的方式來處理主機名稱,並具有兩種名稱解析,其中包括對用戶端 IP 地址的反向查詢。如果未設定用戶端的反向 DNS 項目或設定了某些不良的主機名稱,則會使該功能的使用複雜化。這樣做主要是為了提高效率:透過這種方式,連線嘗試最多需要兩次 DNS 查詢,一次反向查詢和一次正向查詢。如果某個位址存在 DNS 問題,則僅成為該使用者的問題。假設僅執行正向查詢的替代實作方式,必須在每次連線嘗試期間解析 pg_hba.conf 中提到的每個主機名稱。 如果列出了許多名稱,那可能會很慢。而且,如果其中一個主機名稱存在有 DNS 問題,那麼它將成為每個人的問題。
另外,必須執行反向查詢以實作後段樣式比對的功能,因為需要知道實際的用戶端主機名稱,以便將其與樣式進行比對。
請注意,此行為與基於主機名稱的存取控制的其他常見的實作方式一致,例如 Apache HTTP Server 和 TCP Wrappers。
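The reverse-then-forward check and the suffix rule can be sketched as follows. This is a minimal model under stated assumptions: the resolver callables and the function names are hypothetical stand-ins for the system resolver, and the sketch is not how the server is actually coded.

```python
from typing import Callable, Iterable, Optional


def name_matches(pattern: str, actual: str) -> bool:
    """Case-insensitive comparison; a leading dot means suffix match."""
    pattern, actual = pattern.lower(), actual.lower()
    if pattern.startswith("."):
        # ".example.com" matches "foo.example.com" but not "example.com"
        return actual.endswith(pattern)
    return actual == pattern


def host_entry_matches(
    pattern: str,
    client_ip: str,
    reverse_lookup: Callable[[str], Optional[str]],
    forward_lookup: Callable[[str], Iterable[str]],
) -> bool:
    """At most two lookups: one reverse, then one forward."""
    actual = reverse_lookup(client_ip)
    if actual is None or not name_matches(pattern, actual):
        return False
    # The name must resolve back to the client's own address.
    return client_ip in forward_lookup(actual)


# Hypothetical resolver data used only for this demonstration.
REVERSE = {"192.0.2.10": "foo.example.com"}
FORWARD = {"foo.example.com": ["192.0.2.10"]}

print(host_entry_matches(".example.com", "192.0.2.10",
                         REVERSE.get, lambda n: FORWARD.get(n, [])))
```

Because only the connecting client's address is resolved, a broken DNS entry for one client does not slow down or block anyone else.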
IP-address
IP-mask
These two fields can be used as an alternative to the IP-address/mask-length notation. Instead of specifying the mask length, the actual mask is specified in a separate column. For example, 255.0.0.0 represents an IPv4 CIDR mask length of 8, and 255.255.255.255 represents a CIDR mask length of 32.
These fields only apply to host, hostssl, and hostnossl records.
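The correspondence between a dotted mask and a CIDR mask length can be checked with Python's standard ipaddress module. The helper name mask_to_prefix_length is hypothetical and exists only for this illustration.

```python
import ipaddress


def mask_to_prefix_length(mask: str) -> int:
    """Convert a dotted IPv4 netmask to its CIDR mask length."""
    # ip_network accepts "address/netmask"; 0.0.0.0 has no host bits set,
    # so it is valid for every contiguous mask.
    return ipaddress.ip_network(f"0.0.0.0/{mask}").prefixlen


print(mask_to_prefix_length("255.0.0.0"))
print(mask_to_prefix_length("255.255.255.255"))
```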
auth-method
trust
reject
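Reject the connection unconditionally. This is useful for “filtering out” certain hosts from a group. For example, a reject line could block a specific host from connecting, while a later line allows the remaining hosts in a specific network to connect.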
scram-sha-256
md5
password
gss
sspi
ident
peer
ldap
radius
cert
pam
bsd
auth-options
After the auth-method field, there can be field(s) of the form name=value that specify options for the authentication method. Details about which options are available for which authentication methods appear below.
In addition to the method-specific options listed below, there is one method-independent authentication option clientcert, which can be specified in any hostssl record. When set to 1, this option requires the client to present a valid (trusted) SSL certificate, in addition to the other requirements of the authentication method.
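Files included by @ constructs are read as lists of names, which can be separated by either whitespace or commas. Comments are introduced by #, just as in pg_hba.conf, and nested @ constructs are allowed. Unless the file name following @ is an absolute path, it is taken to be relative to the directory containing the referencing file.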
Since the pg_hba.conf records are examined sequentially for each connection attempt, the order of the records is significant. Typically, earlier records will have tight connection match parameters and weaker authentication methods, while later records will have looser match parameters and stronger authentication methods. For example, one might wish to use trust authentication for local TCP/IP connections but require a password for remote TCP/IP connections. In this case a record specifying trust authentication for connections from 127.0.0.1 would appear before a record specifying password authentication for a wider range of allowed client IP addresses.
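The effect of record order can be sketched as a first-match scan. The record lists and the function name select_method below are hypothetical; only the address field is modeled, whereas a real record also matches on connection type, database, and user.

```python
import ipaddress
from typing import List, Optional, Tuple

# Tight match with a weak method first, looser match with a stronger one later.
RECORDS: List[Tuple[str, str]] = [
    ("127.0.0.1/32", "trust"),
    ("192.168.0.0/16", "scram-sha-256"),
]


def select_method(client: str,
                  records: List[Tuple[str, str]] = RECORDS) -> Optional[str]:
    """Return the method of the first record whose network covers the client."""
    address = ipaddress.ip_address(client)
    for cidr, method in records:
        if address in ipaddress.ip_network(cidr):
            return method  # later records are never consulted
    return None  # no matching record: the connection is refused


print(select_method("127.0.0.1"))
print(select_method("192.168.1.5"))

# If the broad record comes first, the trust line is never reached.
SHADOWED = [("0.0.0.0/0", "scram-sha-256"), ("127.0.0.1/32", "trust")]
print(select_method("127.0.0.1", SHADOWED))
```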
The pg_hba.conf file is read on start-up and when the main server process receives a SIGHUP signal. If you edit the file on an active system, you will need to signal the postmaster (using pg_ctl reload or kill -HUP) to make it re-read the file.
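The preceding statement is not true on Microsoft Windows: there, any changes in the pg_hba.conf file are immediately applied by subsequent new connections.
To connect to a particular database, a user must not only pass the pg_hba.conf checks, but must also have the CONNECT privilege for the database. If you wish to restrict which users can connect to which databases, it is usually easier to control this by granting or revoking the CONNECT privilege than by putting the rules in pg_hba.conf entries.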
Example 20.1. Example pg_hba.conf Entries
The following parameters are intended for work on the PostgreSQL source code, and in some cases to assist with recovery of severely damaged databases. There should be no reason to use them on a production database. As such, they have been excluded from the sample postgresql.conf file. Note that many of these parameters require special source compilation flags to work at all.
allow_system_table_mods (boolean)
Allows modification of the structure of system tables. This is used by initdb. This parameter can only be set at server start.
ignore_system_indexes (boolean)
Ignore system indexes when reading system tables (but still update the indexes when modifying the tables). This is useful when recovering from damaged system indexes. This parameter cannot be changed after session start.
post_auth_delay (integer)
The amount of time to delay when a new server process is started, after it conducts the authentication procedure. This is intended to give developers an opportunity to attach to the server process with a debugger. If this value is specified without units, it is taken as seconds. A value of zero (the default) disables the delay. This parameter cannot be changed after session start.
pre_auth_delay (integer)
The amount of time to delay just after a new server process is forked, before it conducts the authentication procedure. This is intended to give developers an opportunity to attach to the server process with a debugger to trace down misbehavior in authentication. If this value is specified without units, it is taken as seconds. A value of zero (the default) disables the delay. This parameter can only be set in the postgresql.conf file or on the server command line.
trace_notify (boolean)
Generates a great amount of debugging output for the LISTEN and NOTIFY commands. client_min_messages or log_min_messages must be DEBUG1 or lower to send this output to the client or server logs, respectively.
trace_recovery_messages (enum)
Enables logging of recovery-related debugging output that otherwise would not be logged. This parameter allows the user to override the normal setting of log_min_messages, but only for specific messages. This is intended for use in debugging Hot Standby. Valid values are DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1, and LOG. The default, LOG, does not affect logging decisions at all. The other values cause recovery-related debug messages of that priority or higher to be logged as though they had LOG priority; for common settings of log_min_messages this results in unconditionally sending them to the server log. This parameter can only be set in the postgresql.conf file or on the server command line.
trace_sort (boolean)
If on, emit information about resource usage during sort operations. This parameter is only available if the TRACE_SORT macro was defined when PostgreSQL was compiled. (However, TRACE_SORT is currently defined by default.)
trace_locks (boolean)
If on, emit information about lock usage. Information dumped includes the type of lock operation, the type of lock and the unique identifier of the object being locked or unlocked. Also included are bit masks for the lock types already granted on this object as well as for the lock types awaited on this object. For each lock type a count of the number of granted locks and waiting locks is also dumped as well as the totals. An example of the log file output is shown here:
Details of the structure being dumped may be found in src/include/storage/lock.h.
This parameter is only available if the LOCK_DEBUG macro was defined when PostgreSQL was compiled.
trace_lwlocks (boolean)
If on, emit information about lightweight lock usage. Lightweight locks are intended primarily to provide mutual exclusion of access to shared-memory data structures.
This parameter is only available if the LOCK_DEBUG macro was defined when PostgreSQL was compiled.
trace_userlocks (boolean)
If on, emit information about user lock usage. Output is the same as for trace_locks, only for advisory locks.
This parameter is only available if the LOCK_DEBUG macro was defined when PostgreSQL was compiled.
trace_lock_oidmin (integer)
If set, do not trace locks for tables below this OID. (Use this to avoid output on system tables.)
This parameter is only available if the LOCK_DEBUG macro was defined when PostgreSQL was compiled.
trace_lock_table (integer)
Unconditionally trace locks on this table (OID).
This parameter is only available if the LOCK_DEBUG macro was defined when PostgreSQL was compiled.
debug_deadlocks (boolean)
If set, dumps information about all current locks when a deadlock timeout occurs.
This parameter is only available if the LOCK_DEBUG macro was defined when PostgreSQL was compiled.
log_btree_build_stats (boolean)
If set, logs system resource usage statistics (memory and CPU) on various B-tree operations.
This parameter is only available if the BTREE_BUILD_STATS macro was defined when PostgreSQL was compiled.
wal_consistency_checking (string)
This parameter is intended to be used to check for bugs in the WAL redo routines. When enabled, full-page images of any buffers modified in conjunction with the WAL record are added to the record. If the record is subsequently replayed, the system will first apply each record and then test whether the buffers modified by the record match the stored images. In certain cases (such as hint bits), minor variations are acceptable, and will be ignored. Any unexpected differences will result in a fatal error, terminating recovery.
The default value of this setting is the empty string, which disables the feature. It can be set to all to check all records, or to a comma-separated list of resource managers to check only records originating from those resource managers. Currently, the supported resource managers are heap, heap2, btree, hash, gin, gist, sequence, spgist, brin, and generic. Only superusers can change this setting.
wal_debug (boolean)
If on, emit WAL-related debugging output. This parameter is only available if the WAL_DEBUG macro was defined when PostgreSQL was compiled.
ignore_checksum_failure (boolean)
Detection of a checksum failure during a read normally causes PostgreSQL to report an error, aborting the current transaction. Setting ignore_checksum_failure to on causes the system to ignore the failure (but still report a warning), and continue processing. This behavior may cause crashes, propagate or hide corruption, or other serious problems. However, it may allow you to get past the error and retrieve undamaged tuples that might still be present in the table if the block header is still sane. If the header is corrupt an error will be reported even if this option is enabled. The default setting is off, and it can only be changed by a superuser.
zero_damaged_pages (boolean)
Detection of a damaged page header normally causes PostgreSQL to report an error, aborting the current transaction. Setting zero_damaged_pages to on causes the system to instead report a warning, zero out the damaged page in memory, and continue processing. This behavior will destroy data, namely all the rows on the damaged page. However, it does allow you to get past the error and retrieve rows from any undamaged pages that might be present in the table. It is useful for recovering data if corruption has occurred due to a hardware or software error. You should generally not set this on until you have given up hope of recovering data from the damaged pages of a table. Zeroed-out pages are not forced to disk so it is recommended to recreate the table or the index before turning this parameter off again. The default setting is off, and it can only be changed by a superuser.
jit_debugging_support (boolean)
If LLVM has the required functionality, register generated functions with GDB. This makes debugging easier. The default setting is off. This parameter can only be set at server start.
jit_dump_bitcode (boolean)
jit_expressions (boolean)
jit_profiling_support (boolean)
If LLVM has the required functionality, emit the data needed to allow perf to profile functions generated by JIT. This writes out files to $HOME/.debug/jit/; the user is responsible for performing cleanup when desired. The default setting is off. This parameter can only be set at server start.
jit_tuple_deforming (boolean)
array_nulls (boolean)
This controls whether the array input parser recognizes unquoted NULL as specifying a null array element. By default, this is on, allowing array values containing null values to be entered. However, PostgreSQL versions before 8.2 did not support null values in arrays, and therefore would treat NULL as specifying a normal array element with the string value “NULL”. For backward compatibility with applications that require the old behavior, this variable can be turned off.
Note that it is possible to create array values containing null values even when this variable is off.
backslash_quote (enum)
This controls whether a quote mark can be represented by \' in a string literal. The preferred, SQL-standard way to represent a quote mark is by doubling it ('') but PostgreSQL has historically also accepted \'. However, use of \' creates security risks because in some client character set encodings, there are multibyte characters in which the last byte is numerically equivalent to ASCII \. If client-side code does escaping incorrectly then an SQL-injection attack is possible. This risk can be prevented by making the server reject queries in which a quote mark appears to be escaped by a backslash. The allowed values of backslash_quote are on (allow \' always), off (reject always), and safe_encoding (allow only if client encoding does not allow ASCII \ within a multibyte character). safe_encoding is the default setting.
Note that in a standard-conforming string literal, \ just means \ anyway. This parameter only affects the handling of non-standard-conforming literals, including escape string syntax (E'...').
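The SQL-standard way of representing a quote mark, doubling it, can be sketched in a few lines. The function quote_literal below is a hypothetical illustration of the rule only; it assumes standard-conforming strings (backslash is an ordinary character), and real applications are usually better served by their driver's parameterized queries than by hand-built literals.

```python
def quote_literal(value: str) -> str:
    """Wrap a string in single quotes, doubling any embedded quote marks."""
    # No backslash escaping is applied: in a standard-conforming literal
    # a backslash is just a backslash.
    return "'" + value.replace("'", "''") + "'"


print(quote_literal("O'Reilly"))
print(quote_literal(r"C:\temp"))
```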
escape_string_warning (boolean)
When on, a warning is issued if a backslash (\) appears in an ordinary string literal ('...' syntax) and standard_conforming_strings is off. The default is on.
Applications that wish to use backslash as escape should be modified to use escape string syntax (E'...'), because the default behavior of ordinary strings is now to treat backslash as an ordinary character, per SQL standard. This variable can be enabled to help locate code that needs to be changed.
lo_compat_privileges (boolean)
In PostgreSQL releases prior to 9.0, large objects did not have access privileges and were, therefore, always readable and writable by all users. Setting this variable to on disables the new privilege checks, for compatibility with prior releases. The default is off. Only superusers and users with the appropriate SET privilege can change this setting.
Setting this variable does not disable all security checks related to large objects, only those for which the default behavior has changed in PostgreSQL 9.0.
quote_all_identifiers (boolean)
When the database generates SQL, force all identifiers to be quoted, even if they are not (currently) keywords. This will affect the output of EXPLAIN as well as the results of functions like pg_get_viewdef. See also the --quote-all-identifiers option of pg_dump and pg_dumpall.
standard_conforming_strings (boolean)
synchronize_seqscans (boolean)
This allows sequential scans of large tables to synchronize with each other, so that concurrent scans read the same block at about the same time and hence share the I/O workload. When this is enabled, a scan might start in the middle of the table and then “wrap around” the end to cover all rows, so as to synchronize with the activity of scans already in progress. This can result in unpredictable changes in the row ordering returned by queries that have no ORDER BY clause. Setting this parameter to off ensures the pre-8.3 behavior in which a sequential scan always starts from the beginning of the table. The default is on.
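The “wrap around” behavior can be pictured with a toy model of block visiting order. The function scan_order is hypothetical and ignores everything about how the server actually picks its starting block; it only shows why row order can change when no ORDER BY is given.

```python
from typing import List


def scan_order(n_blocks: int, start: int) -> List[int]:
    """Blocks visited by a scan that starts at `start` and wraps around."""
    return [(start + i) % n_blocks for i in range(n_blocks)]


# A scan that always starts at the beginning of the table.
print(scan_order(5, 0))
# A scan that joins another scan already at block 3, then wraps around.
print(scan_order(5, 3))
```

Both scans cover every block exactly once; only the order differs.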
transform_null_equals (boolean)
When on, expressions of the form expr = NULL (or NULL = expr) are treated as expr IS NULL, that is, they return true if expr evaluates to the null value, and false otherwise. The correct SQL-spec-compliant behavior of expr = NULL is to always return null (unknown). Therefore this parameter defaults to off.
However, filtered forms in Microsoft Access generate queries that appear to use expr = NULL to test for null values, so if you use that interface to access the database you might want to turn this option on. Since expressions of the form expr = NULL always return the null value (using the SQL standard interpretation), they are not very useful and do not appear often in normal applications so this option does little harm in practice. But new users are frequently confused about the semantics of expressions involving null values, so this option is off by default.
Note that this option only affects the exact form = NULL, not other comparison operators or other expressions that are computationally equivalent to some expression involving the equals operator (such as IN). Thus, this option is not a general fix for bad programming.
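The difference between the two interpretations can be modeled with Python's None standing in for the SQL null value. The function names are hypothetical, and the model covers only the exact form expr = NULL described above.

```python
from typing import Any, Optional


def sql_equals(a: Any, b: Any) -> Optional[bool]:
    """SQL-standard equality: unknown (None) if either operand is null."""
    if a is None or b is None:
        return None
    return a == b


def equals_null(expr: Any, transform_null_equals: bool) -> Optional[bool]:
    """Model the expression `expr = NULL` under either setting."""
    if transform_null_equals:
        return expr is None        # rewritten to expr IS NULL
    return sql_equals(expr, None)  # always unknown


print(equals_null(None, transform_null_equals=False))
print(equals_null(None, transform_null_equals=True))
print(equals_null(42, transform_null_equals=True))
```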
For convenience there are also single letter command-line option switches available for some parameters. They are described in . Some of these options exist for historical reasons, and their presence as a single-letter option does not necessarily indicate an endorsement to use the option heavily.
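When a client application connects to the database server, it specifies which PostgreSQL database user name it wants to connect as, much the same way one logs into a Unix computer as a particular user. Within the SQL environment the active database user name determines access privileges to database objects; see Chapter 21 for details. Therefore, it is essential to restrict which database users can connect.
As explained in Chapter 21, PostgreSQL actually does privilege management in terms of “roles”. In this chapter, we consistently use database user to mean “role with the LOGIN privilege”.
Authentication is the process by which the database server establishes the identity of the client, and by extension determines whether the client application (or the user who runs the client application) is permitted to connect with the database user name that was requested.
PostgreSQL offers a number of different client authentication methods. The method used to authenticate a particular client connection can be selected on the basis of (client) host address, database, and user.
PostgreSQL database user names are logically separate from user names of the operating system in which the server runs. If all the users of a particular server also have accounts on the server's machine, it makes sense to assign database user names that match their operating system user names. However, a server that accepts remote connections might have many database users who have no local operating system account, and in such cases there need be no connection between database user names and operating system user names.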
PostgreSQL provides various methods for authenticating users:
Trust authentication, which simply trusts that users are who they say they are.
Password authentication, which requires that users send a password.
GSSAPI authentication, which relies on a GSSAPI-compatible security library. Typically this is used to access an authentication server such as a Kerberos or Microsoft Active Directory server.
SSPI authentication, which uses a Windows-specific protocol similar to GSSAPI.
Ident authentication, which relies on an “Identification Protocol” (RFC 1413) service on the client's machine. (On local Unix-socket connections, this is treated as peer authentication.)
Peer authentication, which relies on operating system facilities to identify the process at the other end of a local connection. This is not supported for remote connections.
LDAP authentication, which relies on an LDAP authentication server.
RADIUS authentication, which relies on a RADIUS authentication server.
Certificate authentication, which requires an SSL connection and authenticates users by checking the SSL certificate they send.
PAM authentication, which relies on a PAM (Pluggable Authentication Modules) library.
BSD authentication, which relies on the BSD Authentication framework (currently available only on OpenBSD).
Peer authentication is usually recommendable for local connections, though trust authentication might be sufficient in some circumstances. Password authentication is the easiest choice for remote connections. All the other options require some kind of external security infrastructure (usually an authentication server or a certificate authority for issuing SSL certificates), or are platform-specific.
The following sections describe each of these authentication methods in more detail.
When using an external authentication system such as Ident or GSSAPI, the name of the operating system user that initiated the connection might not be the same as the database user (role) that is to be used. In this case, a user name map can be applied to map the operating system user name to a database user. To use user name mapping, specify map=map-name in the options field in pg_hba.conf. This option is supported for all authentication methods that receive external user names. Since different mappings might be needed for different connections, the name of the map to be used is specified in the map-name parameter in pg_hba.conf to indicate which map to use for each individual connection.
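User name maps are defined in the ident map file, which by default is named pg_ident.conf and is stored in the cluster's data directory. (It is possible to place the map file elsewhere, however; see the ident_file configuration parameter.) The ident map file contains lines of the general form: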
Comments, whitespace and line continuations are handled in the same way as in pg_hba.conf. The map-name is an arbitrary name that will be used to refer to this mapping in pg_hba.conf. The other two fields specify an operating system user name and a matching database user name. The same map-name can be used repeatedly to specify multiple user-mappings within a single map.
There is no restriction regarding how many database users a given operating system user can correspond to, nor vice versa. Thus, entries in a map should be thought of as meaning “this operating system user is allowed to connect as this database user”, rather than implying that they are equivalent. The connection will be allowed if there is any map entry that pairs the user name obtained from the external authentication system with the database user name that the user has requested to connect as.
If the system-username field starts with a slash (/), the remainder of the field is treated as a regular expression. (PostgreSQL's own regular expression syntax is used.) The regular expression can include a single capture, or parenthesized subexpression, which can then be referenced in the database-username field as \1 (backslash-one). This allows the mapping of multiple user names in a single line, which is particularly useful for simple syntax substitutions. For example, these entries
will remove the domain part for users with system user names that end with @mydomain.com, and allow any user whose system name ends with @otherdomain.com to log in as guest.
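The regular-expression form of a map entry can be sketched as below. Several things here are assumptions: the map name mymap, the two entry patterns (written to match the behavior described above), and the function name map_allows are illustrative, and Python's re module stands in for PostgreSQL's regular expression engine, whose syntax is not identical.

```python
import re
from typing import List, Tuple

# (map-name, system-username, database-username); illustrative entries only.
MAP_ENTRIES: List[Tuple[str, str, str]] = [
    ("mymap", r"/^(.*)@mydomain\.com$", r"\1"),
    ("mymap", r"/^(.*)@otherdomain\.com$", "guest"),
]


def map_allows(map_name: str, system_user: str, db_user: str,
               entries: List[Tuple[str, str, str]] = MAP_ENTRIES) -> bool:
    """True if any entry pairs the external name with the requested user."""
    for name, sys_field, db_field in entries:
        if name != map_name:
            continue
        if sys_field.startswith("/"):
            # Everything after the slash is the regular expression.
            match = re.search(sys_field[1:], system_user)
            if match is None:
                continue
            expected = db_field
            if r"\1" in db_field and match.groups():
                # Substitute the single capture for backslash-one.
                expected = db_field.replace(r"\1", match.group(1))
            if expected == db_user:
                return True
        elif sys_field == system_user and db_field == db_user:
            return True
    return False


print(map_allows("mymap", "alice@mydomain.com", "alice"))
print(map_allows("mymap", "bob@otherdomain.com", "guest"))
```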
The pg_ident.conf file is read on start-up and when the main server process receives a SIGHUP signal. If you edit the file on an active system, you will need to signal the postmaster (using pg_ctl reload, calling the SQL function pg_reload_conf(), or using kill -HUP) to make it re-read the file.
The system view pg_ident_file_mappings can be helpful for pre-testing changes to the pg_ident.conf file, or for diagnosing problems if loading of the file did not have the desired effects. Rows in the view with non-null error fields indicate problems in the corresponding lines of the file.
A pg_ident.conf file that could be used in conjunction with the pg_hba.conf file in Example 20.1 is described here. In this example, anyone logged in to a machine on the 192.168 network that does not have the operating system user name bryanh, ann, or robert would not be granted access. Unix user robert would only be allowed access when he tries to connect as PostgreSQL user bob, not as robert or anyone else. ann would only be allowed to connect as ann. User bryanh would be allowed to connect as either bryanh or as guest1.
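The access decisions in this example can be written down as a small table of allowed pairs. The map name omicron and the helper may_connect are assumptions made for the illustration; the user pairs come from the description above.

```python
from typing import List, Tuple

# (map-name, system-username, database-username); map name is hypothetical.
ENTRIES: List[Tuple[str, str, str]] = [
    ("omicron", "bryanh", "bryanh"),
    ("omicron", "ann", "ann"),
    ("omicron", "robert", "bob"),     # robert may connect only as bob
    ("omicron", "bryanh", "guest1"),  # bryanh has a second allowed role
]


def may_connect(system_user: str, db_user: str, map_name: str = "omicron") -> bool:
    """An entry means: this OS user is allowed to connect as this DB user."""
    return (map_name, system_user, db_user) in ENTRIES


print(may_connect("robert", "bob"))
print(may_connect("robert", "robert"))
print(may_connect("bryanh", "guest1"))
```

Listing bryanh twice is what lets one operating system user correspond to more than one database user.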
pg_ident.conf File
When trust authentication is specified, PostgreSQL assumes that anyone who can connect to the server is authorized to access the database with whatever database user name they specify (even superuser names). Of course, restrictions made in the database and user columns still apply. This method should only be used when there is adequate operating-system-level protection on connections to the server.
trust authentication is appropriate and very convenient for local connections on a single-user workstation. It is usually not appropriate by itself on a multiuser machine. However, you might be able to use trust even on a multiuser machine, if you restrict access to the server's Unix-domain socket file using file-system permissions. To do this, set the unix_socket_permissions (and possibly unix_socket_group) configuration parameters. Or you could set the unix_socket_directories configuration parameter to place the socket file in a suitably restricted directory.
Setting file-system permissions only helps for Unix-socket connections. Local TCP/IP connections are not restricted by file-system permissions. Therefore, if you want to use file-system permissions for local security, remove the host ... 127.0.0.1 ... line from pg_hba.conf, or change it to a non-trust authentication method.
trust authentication is only suitable for TCP/IP connections if you trust every user on every machine that is allowed to connect to the server by the pg_hba.conf lines that specify trust. It is seldom reasonable to use trust for any TCP/IP connections other than those from localhost (127.0.0.1).
The ident authentication method works by obtaining the client's operating system user name from an ident server and using it as the allowed database user name (with an optional user name mapping). This is only supported on TCP/IP connections.
When ident is specified for a local (non-TCP/IP) connection, peer authentication (see ) will be used instead.
The following configuration options are supported for ident:
map
Allows for mapping between system and database user names. See for details.
The “Identification Protocol” is described in RFC 1413. Virtually every Unix-like operating system ships with an ident server that listens on TCP port 113 by default. The basic functionality of an ident server is to answer questions like “What user initiated the connection that goes out of your port X and connects to my port Y?”. Since PostgreSQL knows both X and Y when a physical connection is established, it can interrogate the ident server on the host of the connecting client and can theoretically determine the operating system user for any given connection.
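The question-and-answer exchange can be sketched as two small helpers that build the query line and parse a reply in the RFC 1413 text format. No network traffic is involved; the function names are hypothetical, and the parser strips surrounding whitespace for simplicity rather than following every detail of the RFC.

```python
from typing import Optional


def build_ident_query(port_x: int, port_y: int) -> bytes:
    """Query line sent to the ident server on the client's host.

    port_x is the port on the client's machine (X above) and port_y is
    the port on the database server (Y above).
    """
    return f"{port_x}, {port_y}\r\n".encode("ascii")


def parse_ident_reply(reply: bytes) -> Optional[str]:
    """Return the reported user name, or None for an error reply."""
    text = reply.decode("ascii", errors="replace").strip()
    # Expected shape: "<ports> : USERID : <opsys> : <user-id>"
    fields = [field.strip() for field in text.split(":", 3)]
    if len(fields) == 4 and fields[1] == "USERID":
        return fields[3]
    return None  # e.g. "<ports> : ERROR : NO-USER"


print(build_ident_query(6193, 5432))
print(parse_ident_reply(b"6193, 5432 : USERID : UNIX : stjohns\r\n"))
print(parse_ident_reply(b"6193, 5432 : ERROR : NO-USER\r\n"))
```

Nothing in the reply is verifiable by the server, which is why the answer is only as trustworthy as the machine that sent it.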
The drawback of this procedure is that it depends on the integrity of the client: if the client machine is untrusted or compromised, an attacker could run just about any program on port 113 and return any user name they choose. This authentication method is therefore only appropriate for closed networks where each client machine is under tight control and where the database and system administrators operate in close contact. In other words, you must trust the machine running the ident server. Heed the warning:
Some ident servers have a nonstandard option that causes the returned user name to be encrypted, using a key that only the originating machine's administrator knows. This option must not be used when using the ident server with PostgreSQL, since PostgreSQL does not have any way to decrypt the returned string to determine the actual user.
GSSAPI is an industry-standard protocol for secure authentication defined in RFC 2743. PostgreSQL supports GSSAPI for authentication, communications encryption, or both. GSSAPI provides automatic authentication (single sign-on) for systems that support it. The authentication itself is secure. If GSSAPI encryption or SSL encryption is used, the data sent along the database connection will be encrypted; otherwise, it will not.
GSSAPI support has to be enabled when PostgreSQL is built; see for more information.
When GSSAPI uses Kerberos, it uses a standard service principal (authentication identity) name in the format servicename/hostname@realm. The principal name used by a particular installation is not encoded in the PostgreSQL server in any way; rather it is specified in the keytab file that the server reads to determine its identity. If multiple principals are listed in the keytab file, the server will accept any one of them. The server's realm name is the preferred realm specified in the Kerberos configuration file(s) accessible to the server.
When connecting, the client must know the principal name of the server it intends to connect to. The servicename part of the principal is ordinarily postgres, but another value can be selected via libpq's krbsrvname connection parameter. The hostname part is the fully qualified host name that libpq is told to connect to. The realm name is the preferred realm specified in the Kerberos configuration file(s) accessible to the client.
The client will also have a principal name for its own identity (and it must have a valid ticket for this principal). To use GSSAPI for authentication, the client principal must be associated with a PostgreSQL database user name. The pg_ident.conf configuration file can be used to map principals to user names; for example, pgusername@realm could be mapped to just pgusername. Alternatively, you can use the full username@realm principal as the role name in PostgreSQL without any mapping.
PostgreSQL also supports mapping client principals to user names by just stripping the realm from the principal. This method is supported for backwards compatibility and is strongly discouraged as it is then impossible to distinguish different users with the same user name but coming from different realms. To enable this, set include_realm to 0. For simple single-realm installations, doing that combined with setting the krb_realm parameter (which checks that the principal's realm matches exactly what is in the krb_realm parameter) is still secure; but this is a less capable approach compared to specifying an explicit mapping in pg_ident.conf.
The location of the server's keytab file is specified by the configuration parameter. For security reasons, it is recommended to use a separate keytab just for the PostgreSQL server rather than allowing the server to read the system keytab file. Make sure that your server keytab file is readable (and preferably only readable, not writable) by the PostgreSQL server account. (See also .)
The keytab file is generated using the Kerberos software; see the Kerberos documentation for details. The following example shows doing this using the kadmin tool of MIT-compatible Kerberos 5 implementations:
The following authentication options are supported for the GSSAPI authentication method:
include_realm
If set to 0, the realm name from the authenticated user principal is stripped off before being passed through the user name mapping. This is discouraged and is primarily available for backwards compatibility, as it is not secure in multi-realm environments unless krb_realm is also used. It is recommended to leave include_realm set to the default (1) and to provide an explicit mapping in pg_ident.conf to convert principal names to PostgreSQL user names.
map
Allows mapping from client principals to database user names. See the discussion of user name maps above for details. For a GSSAPI/Kerberos principal, such as username@EXAMPLE.COM (or, less commonly, username/hostbased@EXAMPLE.COM), the user name used for mapping is username@EXAMPLE.COM (or username/hostbased@EXAMPLE.COM, respectively), unless include_realm has been set to 0, in which case username (or username/hostbased) is what is seen as the system user name when mapping.
krb_realm
Sets the realm to match user principal names against. If this parameter is set, only users of that realm will be accepted. If it is not set, users of any realm can connect, subject to whatever user name mapping is done.
In addition to these settings, which can be different for different pg_hba.conf entries, there is the server-wide krb_caseins_users configuration parameter. If that is set to true, client principals are matched to user map entries case-insensitively. krb_realm, if set, is also matched case-insensitively.
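How include_realm and krb_realm shape the name that reaches the user name map can be sketched as follows. The function system_user_for_mapping is hypothetical, it assumes the principal always contains an @realm part, and it does not model the server-wide case-insensitive matching option.

```python
from typing import Optional


def system_user_for_mapping(principal: str,
                            include_realm: bool = True,
                            krb_realm: Optional[str] = None) -> Optional[str]:
    """Name passed to the user name map, or None if the realm is rejected."""
    name, _, realm = principal.rpartition("@")
    if krb_realm is not None and realm != krb_realm:
        return None  # only users of the configured realm are accepted
    # include_realm=False corresponds to setting include_realm to 0.
    return principal if include_realm else name


print(system_user_for_mapping("username@EXAMPLE.COM"))
print(system_user_for_mapping("username@EXAMPLE.COM", include_realm=False))
print(system_user_for_mapping("username@OTHER.ORG", krb_realm="EXAMPLE.COM"))
```

With the realm stripped and no krb_realm check, username@EXAMPLE.COM and username@OTHER.ORG become indistinguishable, which is the multi-realm risk described above.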
This authentication method operates similarly to password except that it uses RADIUS as the password verification method. RADIUS is used only to validate the user name/password pairs. Therefore the user must already exist in the database before RADIUS can be used for authentication.
When using RADIUS authentication, an Access Request message will be sent to the configured RADIUS server. This request will be of type Authenticate Only, and include parameters for user name, password (encrypted) and NAS Identifier. The request will be encrypted using a secret shared with the server. The RADIUS server will respond to this request with either Access Accept or Access Reject. There is no support for RADIUS accounting.
Multiple RADIUS servers can be specified, in which case they will be tried sequentially. If a negative response is received from a server, the authentication will fail. If no response is received, the next server in the list will be tried. To specify multiple servers, separate the server names with commas and surround the list with double quotes. If multiple servers are specified, the other RADIUS options can also be given as comma-separated lists, to provide individual values for each server. They can also be specified as a single value, in which case that value will apply to all servers.
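The failover rule for multiple servers can be sketched with an injected callable in place of real RADIUS traffic. The names authenticate and ask are hypothetical, and treating "no server responded" as a failure is an assumption of this sketch.

```python
from typing import Callable, Iterable, Optional


def authenticate(servers: Iterable[str],
                 ask: Callable[[str], Optional[bool]]) -> bool:
    """Try servers in order.

    ask(server) returns True for Access Accept, False for Access Reject,
    and None when the server does not respond.
    """
    for server in servers:
        answer = ask(server)
        if answer is None:
            continue      # no response: move on to the next server
        return answer     # any definite answer ends the search
    return False          # assumption: nobody answered, so fail


# Hypothetical replies: the first server is down, the second accepts.
replies = {"radius1.example.com": None, "radius2.example.com": True}
print(authenticate(["radius1.example.com", "radius2.example.com"], replies.get))
```

A reject from an earlier server is final, so a later server that would have accepted is never asked.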
The following configuration options are supported for RADIUS:
radiusservers
The DNS names or IP addresses of the RADIUS servers to connect to. This parameter is required.
radiussecrets
The shared secrets used when talking securely to the RADIUS servers. This must have exactly the same value on the PostgreSQL and RADIUS servers. It is recommended that this be a string of at least 16 characters. This parameter is required.
The encryption vector used will only be cryptographically strong if PostgreSQL is built with support for OpenSSL. In other cases, the transmission to the RADIUS server should only be considered obfuscated, not secured, and external security measures should be applied if necessary.
radiusports
The port numbers to connect to on the RADIUS servers. If no port is specified, the default RADIUS port (1812) will be used.
radiusidentifiers
The strings to be used as NAS Identifier in the RADIUS requests. This parameter can be used, for example, to identify which database cluster the user is attempting to connect to, which can be useful for policy matching on the RADIUS server. If no identifier is specified, the default postgresql will be used.
If it is necessary to have a comma or whitespace in a RADIUS parameter value, that can be done by putting double quotes around the value, but it is tedious because two layers of double-quoting are now required. An example of putting whitespace into RADIUS secret strings is:
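A pg_hba.conf line along the following lines illustrates the two layers of quoting. It is a sketch only: the `...` stands for the connection type, database, user and address fields, which are elided here, and the server names and secrets are placeholders.

```conf
# Outer quotes delimit the whole option value; each inner quote is doubled.
# Result: two secrets, "secret one" and "secret two", one per server.
host ... radius radiusservers="server1,server2" radiussecrets="""secret one"",""secret two"""
```

Reading from the inside out: each secret is wrapped in double quotes because it contains a space, the two quoted secrets are joined with a comma, and the whole list is then quoted again, which requires doubling every inner quote.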
SSPI is a Windows technology for secure authentication with single sign-on. PostgreSQL will use SSPI in negotiate mode, which will use Kerberos when possible and automatically fall back to NTLM in other cases. SSPI authentication only works when both server and client are running Windows, or, on non-Windows platforms, when GSSAPI is available.
When using Kerberos authentication, SSPI works the same way GSSAPI does; see the section on GSSAPI authentication for details.
The following configuration options are supported for SSPI:
include_realm
If set to 0, the realm name from the authenticated user principal is stripped off before being passed through the user name mapping. This is discouraged and is primarily available for backwards compatibility, as it is not secure in multi-realm environments unless krb_realm is also used. It is recommended to leave include_realm set to the default (1) and to provide an explicit mapping in pg_ident.conf to convert principal names to PostgreSQL user names.
compat_realm
If set to 1, the domain's SAM-compatible name (also known as the NetBIOS name) is used for the include_realm option. This is the default. If set to 0, the true realm name from the Kerberos user principal name is used.
Do not disable this option unless your server runs under a domain account (this includes virtual service accounts on a domain member system) and all clients authenticating through SSPI are also using domain accounts, or authentication will fail.
upn_username
If this option is enabled along with compat_realm, the user name from the Kerberos UPN is used for authentication. If it is disabled (the default), the SAM-compatible user name is used. By default, these two names are identical for new user accounts.
Note that libpq uses the SAM-compatible name if no explicit user name is specified. If you use libpq or a driver based on it, you should leave this option disabled or explicitly specify the user name in the connection string.
map
Allows for mapping between system and database user names. See the section on user name maps for details. For an SSPI/Kerberos principal, such as username@EXAMPLE.COM (or, less commonly, username/hostbased@EXAMPLE.COM), the user name used for mapping is username@EXAMPLE.COM (or username/hostbased@EXAMPLE.COM, respectively), unless include_realm has been set to 0, in which case username (or username/hostbased) is what is seen as the system user name when mapping.
krb_realm
Sets the realm to match user principal names against. If this parameter is set, only users of that realm will be accepted. If it is not set, users of any realm can connect, subject to whatever user name mapping is done.
The peer authentication method works by obtaining the client's operating system user name from the kernel and using it as the allowed database user name (with optional user name mapping). This method is only supported on local connections.
The following configuration options are supported for peer:
map
Allows for mapping between system and database user names. See the section on user name maps for details.
Peer authentication is only available on operating systems providing the getpeereid() function, the SO_PEERCRED socket parameter, or similar mechanisms. Currently that includes Linux, most flavors of BSD including macOS, and Solaris.
There are several password-based authentication methods. These methods operate similarly but differ in how the users' passwords are stored on the server and how the password provided by a client is sent across the connection.
scram-sha-256
The method scram-sha-256 performs SCRAM-SHA-256 authentication. It is a challenge-response scheme that prevents password sniffing on untrusted connections and supports storing passwords on the server in a cryptographically hashed form that is thought to be secure.
This is the most secure of the currently provided methods, but it is not supported by older client libraries.
md5
The method md5 uses a custom, less secure challenge-response mechanism. It prevents password sniffing and avoids storing passwords on the server in plain text, but provides no protection if an attacker manages to steal the password hash from the server. Also, the MD5 hash algorithm is no longer considered secure against determined attacks.
The md5 method cannot be used with the feature.
To ease transition from the md5 method to the newer SCRAM method, if md5 is specified as a method in pg_hba.conf but the user's password on the server is encrypted for SCRAM (see below), then SCRAM-based authentication will automatically be chosen instead.
password
The method password sends the password in clear text and is therefore vulnerable to password “sniffing” attacks. It should always be avoided if possible. If the connection is protected by SSL encryption, however, password can be used safely (though SSL certificate authentication might be a better choice if one is depending on SSL).
PostgreSQL database passwords are separate from operating system user passwords. The password for each database user is stored in the pg_authid system catalog. Passwords can be managed with the SQL commands CREATE ROLE and ALTER ROLE, e.g., CREATE ROLE foo WITH LOGIN PASSWORD 'secret', or the psql command \password. If no password has been set up for a user, the stored password is null and password authentication will always fail for that user.
The availability of the different password-based authentication methods depends on how a user's password on the server is encrypted (or hashed, more accurately). This is controlled by the configuration parameter password_encryption at the time the password is set. If a password was encrypted using the scram-sha-256 setting, then it can be used for the authentication methods scram-sha-256 and password (but password transmission will be in plain text in the latter case). The authentication method specification md5 will automatically switch to using the scram-sha-256 method in this case, as explained above, so it will also work. If a password was encrypted using the md5 setting, then it can be used only for the md5 and password authentication method specifications (again, with the password transmitted in plain text in the latter case). (Previous PostgreSQL releases supported storing the password on the server in plain text. This is no longer possible.) To check the currently stored password hashes, see the system catalog pg_authid.
To upgrade an existing installation from md5 to scram-sha-256, after having ensured that all client libraries in use are new enough to support SCRAM, set password_encryption = 'scram-sha-256' in postgresql.conf, make all users set new passwords, and change the authentication method specifications in pg_hba.conf to scram-sha-256.
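The upgrade steps can be sketched in SQL as follows. This is a minimal sketch under stated assumptions: it uses ALTER SYSTEM as an alternative to editing postgresql.conf by hand, the role name foo and the password are placeholders, and the final query must be run as a superuser because it reads pg_authid.

```sql
-- Hash all newly set passwords with SCRAM from now on.
ALTER SYSTEM SET password_encryption = 'scram-sha-256';
SELECT pg_reload_conf();

-- Each user then sets a new password ("foo" and the password are placeholders).
ALTER ROLE foo PASSWORD 'new-secret';

-- List login roles whose stored hash has not yet been converted to SCRAM.
SELECT rolname
FROM pg_authid
WHERE rolcanlogin
  AND rolpassword IS NOT NULL
  AND rolpassword NOT LIKE 'SCRAM-SHA-256$%';
```

Once the query returns no rows, the md5 entries in pg_hba.conf can be switched to scram-sha-256. Where possible, the psql \password command is preferable to ALTER ROLE ... PASSWORD, since it avoids sending the new password to the server in clear text.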
Authentication failures and related problems generally manifest themselves through error messages like the following:
This is what you are most likely to get if you succeed in contacting the server, but it does not want to talk to you. As the message suggests, the server refused the connection request because it found no matching entry in its pg_hba.conf configuration file.
Messages like this indicate that you contacted the server, and it is willing to talk to you, but not until you pass the authorization method specified in the pg_hba.conf file. Check the password you are providing, or check your Kerberos or ident software if the complaint mentions one of those authentication types.
The indicated database user name was not found.
The database you are trying to connect to does not exist. Note that if you do not specify a database name, it defaults to the database user name, which might or might not be the right thing.
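To make use of this option, the server must be built with SSL support, and SSL must also be enabled through the corresponding configuration setting. Otherwise, the hostssl record is ignored, except for logging a warning that it cannot match any connections.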
Specifies the authentication method to use when a connection matches this record. The possible choices are summarized here; details are in the sections on the individual authentication methods.
Allow the connection unconditionally. This method allows anyone who can connect to the PostgreSQL database server to log in as any PostgreSQL user they wish, without the need for a password or any other authentication.
Perform SCRAM-SHA-256 authentication to verify the user's password.
Perform SCRAM-SHA-256 or MD5 authentication to verify the user's password.
Require the client to supply an unencrypted password for authentication. Since the password is sent in clear text over the network, this should not be used on untrusted networks.
Use GSSAPI to authenticate the user. This is only available for TCP/IP connections.
Use SSPI to authenticate the user. This is only available on Windows.
Obtain the operating system user name of the client by contacting the ident server on the client and check if it matches the requested database user name. Ident authentication can only be used on TCP/IP connections. When specified for local connections, peer authentication will be used instead.
Obtain the client's operating system user name from the operating system and check if it matches the requested database user name. This is only available for local connections.
Authenticate using an LDAP server.
Authenticate using a RADIUS server.
Authenticate using SSL client certificates.
Authenticate using the Pluggable Authentication Modules (PAM) service provided by the operating system.
Authenticate using the BSD Authentication service provided by the operating system.
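A system view is available that can help with pre-testing changes to the pg_hba.conf file, or with diagnosing problems if loading of the file did not have the desired effects. Rows in the view with non-null error fields indicate problems in the corresponding lines of the file.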
Some examples of pg_hba.conf entries are shown elsewhere in this chapter. See the next section for details on the different authentication methods.
Only has effect if are enabled.
Writes the generated LLVM IR out to the file system. This is only useful for working on the internals of the JIT implementation. The default setting is off. This parameter can only be changed by a superuser.
Determines whether expressions are JIT compiled, when JIT compilation is activated. The default is on.
Determines whether tuple deforming is JIT compiled, when JIT compilation is activated. The default is on.
This controls whether ordinary string literals ('...') treat backslashes literally, as specified in the SQL standard. Beginning in PostgreSQL 9.1, the default is on (prior releases defaulted to off). Applications can check this parameter to determine how string literals will be processed. The presence of this parameter can also be taken as an indication that the escape string syntax (E'...') is supported. Escape string syntax should be used if an application desires backslashes to be treated as escape characters.
Refer to for related information.
Short option                              Equivalent
-B x                                      shared_buffers = x
-d x                                      log_min_messages = DEBUGx
-e                                        datestyle = euro
-fb, -fh, -fi, -fm, -fn, -fo, -fs, -ft    enable_bitmapscan = off, enable_hashjoin = off, enable_indexscan = off, enable_mergejoin = off, enable_nestloop = off, enable_indexonlyscan = off, enable_seqscan = off, enable_tidscan = off
-F                                        fsync = off
-h x                                      listen_addresses = x
-i                                        listen_addresses = '*'
-k x                                      unix_socket_directories = x
-l                                        ssl = on
-N x                                      max_connections = x
-O                                        allow_system_table_mods = on
-p x                                      port = x
-P                                        ignore_system_indexes = on
-s                                        log_statement_stats = on
-S x                                      work_mem = x
-tpa, -tpl, -te                           log_parser_stats = on, log_planner_stats = on, log_executor_stats = on
-W x                                      post_auth_delay = x
--RFC 1413
This authentication method operates similarly to password except that it uses PAM (Pluggable Authentication Modules) as the authentication mechanism. The default PAM service name is postgresql. PAM is used only to validate user name/password pairs and optionally the connected remote host name or IP address. Therefore the user must already exist in the database before PAM can be used for authentication. For more information about PAM, please read the Linux-PAM Page.
The following configuration options are supported for PAM:
pamservice
PAM service name.
pam_use_hostname
Determines whether the remote IP address or the host name is provided to PAM modules through the PAM_RHOST item. By default, the IP address is used. Set this option to 1 to use the resolved host name instead. Host name resolution can lead to login delays. (Most PAM configurations don't use this information, so it is only necessary to consider this setting if a PAM configuration was specifically created to make use of it.)
If PAM is set up to read /etc/shadow, authentication will fail because the PostgreSQL server is started by a non-root user. However, this is not an issue when PAM is configured to use LDAP or other authentication methods.
This authentication method operates similarly to password except that it uses BSD Authentication to verify the password. BSD Authentication is used only to validate user name/password pairs. Therefore the user's role must already exist in the database before BSD Authentication can be used for authentication. The BSD Authentication framework is currently only available on OpenBSD.
BSD Authentication in PostgreSQL uses the auth-postgresql login type and authenticates with the postgresql login class if that's defined in login.conf. By default that login class does not exist, and PostgreSQL will use the default login class.
This authentication method operates similarly to password except that it uses LDAP as the password verification method. LDAP is used only to validate the user name/password pairs. Therefore the user must already exist in the database before LDAP can be used for authentication.
LDAP authentication can operate in two modes. In the first mode, which we will call the simple bind mode, the server will bind to the distinguished name constructed as prefix username suffix. Typically, the prefix parameter is used to specify cn=, or DOMAIN\ in an Active Directory environment. suffix is used to specify the remaining part of the DN in a non-Active Directory environment.
In the second mode, which we will call the search+bind mode, the server first binds to the LDAP directory with a fixed user name and password, specified with ldapbinddn and ldapbindpasswd, and performs a search for the user trying to log in to the database. If no user and password is configured, an anonymous bind will be attempted to the directory. The search will be performed over the subtree at ldapbasedn, and will try to do an exact match of the attribute specified in ldapsearchattribute. Once the user has been found in this search, the server disconnects and re-binds to the directory as this user, using the password specified by the client, to verify that the login is correct. This mode is the same as that used by LDAP authentication schemes in other software, such as Apache mod_authnz_ldap and pam_ldap. This method allows for significantly more flexibility in where the user objects are located in the directory, but will cause two separate connections to the LDAP server to be made.
The following configuration options are used in both modes:
ldapserver
Names or IP addresses of LDAP servers to connect to. Multiple servers may be specified, separated by spaces.
ldapport
Port number on LDAP server to connect to. If no port is specified, the LDAP library's default port setting will be used.
ldapscheme
Set to ldaps to use LDAPS. This is a non-standard way of using LDAP over SSL, supported by some LDAP server implementations. See also the ldaptls option for an alternative.
ldaptls
Set to 1 to make the connection between PostgreSQL and the LDAP server use TLS encryption. This uses the StartTLS operation per RFC 4513. See also the ldapscheme option for an alternative.
Note that using ldapscheme or ldaptls only encrypts the traffic between the PostgreSQL server and the LDAP server. The connection between the PostgreSQL server and the PostgreSQL client will still be unencrypted unless SSL is used there as well.
The following options are used in simple bind mode only:
ldapprefix
String to prepend to the user name when forming the DN to bind as, when doing simple bind authentication.
ldapsuffix
String to append to the user name when forming the DN to bind as, when doing simple bind authentication.
The following options are used in search+bind mode only:
ldapbasedn
Root DN to begin the search for the user in, when doing search+bind authentication.
ldapbinddn
DN of user to bind to the directory with to perform the search when doing search+bind authentication.
ldapbindpasswd
Password for user to bind to the directory with to perform the search when doing search+bind authentication.
ldapsearchattribute
Attribute to match against the user name in the search when doing search+bind authentication. If no attribute is specified, the uid attribute will be used.
ldapsearchfilter
The search filter to use when doing search+bind authentication. Occurrences of $username will be replaced with the user name. This allows for more flexible search filters than ldapsearchattribute.
ldapurl
An RFC 4516 LDAP URL. This is an alternative way to write some of the other LDAP options in a more compact and standard form. The format is ldap[s]://host[:port]/basedn[?[attribute][?[scope][?[filter]]]]
scope must be one of base, one, sub, typically the last. (The default is base, which is normally not useful in this application.) attribute can nominate a single attribute, in which case it is used as a value for ldapsearchattribute. If attribute is empty then filter can be used as a value for ldapsearchfilter.
The URL scheme ldaps chooses the LDAPS method for making LDAP connections over SSL, equivalent to using ldapscheme=ldaps. To use encrypted LDAP connections using the StartTLS operation, use the normal URL scheme ldap and specify the ldaptls option in addition to ldapurl.
For non-anonymous binds, ldapbinddn and ldapbindpasswd must be specified as separate options.
LDAP URLs are currently only supported with OpenLDAP, not on Windows.
It is an error to mix configuration options for simple bind with options for search+bind.
When using search+bind mode, the search can be performed using a single attribute specified with ldapsearchattribute, or using a custom search filter specified with ldapsearchfilter. Specifying ldapsearchattribute=foo is equivalent to specifying ldapsearchfilter="(foo=$username)". If neither option is specified the default is ldapsearchattribute=uid.
If PostgreSQL was compiled with OpenLDAP as the LDAP client library, the ldapserver setting may be omitted. In that case, a list of host names and ports is looked up via RFC 2782 DNS SRV records. The name _ldap._tcp.DOMAIN is looked up, where DOMAIN is extracted from ldapbasedn.
Here is an example for a simple-bind LDAP configuration:
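A pg_hba.conf line consistent with the description that follows might look like this. It is a sketch: the `...` stands for the elided database, user and address fields, and ldap.example.net is an assumed server name.

```conf
# Simple bind: the DN is built as ldapprefix + user name + ldapsuffix.
host ... ldap ldapserver=ldap.example.net ldapprefix="cn=" ldapsuffix=", dc=example, dc=net"
```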
When a connection to the database server as database user someuser is requested, PostgreSQL will attempt to bind to the LDAP server using the DN cn=someuser, dc=example, dc=net and the password provided by the client. If that connection succeeds, the database access is granted.
Here is an example for a search+bind configuration:
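A line of roughly this shape matches the behavior described next. Again this is a sketch: `...` marks the elided connection fields and ldap.example.net is an assumed server name.

```conf
# Search+bind: no ldapbinddn, so the initial search uses an anonymous bind.
host ... ldap ldapserver=ldap.example.net ldapbasedn="dc=example, dc=net" ldapsearchattribute=uid
```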
When a connection to the database server as database user someuser is requested, PostgreSQL will attempt to bind anonymously (since ldapbinddn was not specified) to the LDAP server and perform a search for (uid=someuser) under the specified base DN. If an entry is found, it will then attempt to bind using that found information and the password supplied by the client. If that second connection succeeds, the database access is granted.
Here is the same search+bind configuration written as a URL:
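Expressed as a URL, a search for the uid attribute over the whole subtree below dc=example,dc=net could be written as follows (a sketch; `...` marks the elided connection fields and the host name is an assumption):

```conf
# ldapurl packs server, base DN, search attribute and scope into one option.
host ... ldap ldapurl="ldap://ldap.example.net/dc=example,dc=net?uid?sub"
```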
Some other software that supports authentication against LDAP uses the same URL format, so it will be easier to share the configuration.
Here is an example for a search+bind configuration that uses ldapsearchfilter instead of ldapsearchattribute to allow authentication by user ID or email address:
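One way to write such a filter is shown below. This is a sketch: `...` marks the elided connection fields, the server name is an assumption, and mail is assumed to be the directory attribute that holds the email address.

```conf
# LDAP "or" filter: match the login name against either uid or mail.
host ... ldap ldapserver=ldap.example.net ldapbasedn="dc=example, dc=net" ldapsearchfilter="(|(uid=$username)(mail=$username))"
```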
Here is an example for a search+bind configuration that uses DNS SRV discovery to find the host name(s) and port(s) for the LDAP service for the domain name example.net:
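With ldapserver omitted, a line like the following would rely on SRV discovery (a sketch; `...` marks the elided connection fields, and it assumes a server built against OpenLDAP):

```conf
# No ldapserver: the domain example.net is derived from ldapbasedn and
# _ldap._tcp.example.net is looked up in DNS.
host ... ldap ldapbasedn="dc=example,dc=net"
```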
Since LDAP often uses commas and spaces to separate the different parts of a DN, it is often necessary to use double-quoted parameter values when configuring LDAP options, as shown in the examples.
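PostgreSQL manages database access permissions using the concept of roles. A role can be thought of as either a database user, or a group of database users, depending on how the role is set up. Roles can own database objects (for example, tables and functions) and can assign privileges on those objects to other roles to control who has access to which objects. Furthermore, it is possible to grant membership in a role to another role, thus allowing the member role to use privileges assigned to another role.
The concept of roles subsumes the concepts of “users” and “groups”. In PostgreSQL versions before 8.1, users and groups were distinct kinds of entities, but now there are only roles. Any role can act as a user, a group, or both.
This chapter describes how to create and manage roles. More information about the effects of role privileges on various database objects can be found in Section 5.7.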
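This authentication method uses SSL client certificates to perform authentication. It is therefore only available for SSL connections. When using this authentication method, the server will require that the client provide a valid, trusted certificate. No password prompt will be sent to the client. The cn (Common Name) attribute of the certificate will be compared to the requested database user name, and if they match the login will be allowed. User name mapping can be used to allow cn to be different from the database user name.
The following configuration options are supported for SSL certificate authentication:
map
Allows for mapping between system and database user names. See Section 21.2 for details.
In a pg_hba.conf record specifying certificate authentication, the authentication option clientcert is assumed to be verify-ca or verify-full, and it cannot be turned off since a client certificate is necessary for this method. What the cert method adds to the basic clientcert certificate validity test is a check that the cn attribute matches the database user name.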
Functions, triggers and row-level security policies allow users to insert code into the backend server that other users might execute unintentionally. Hence, these mechanisms permit users to “Trojan horse” others with relative ease. The strongest protection is tight control over who can define objects. Where that is infeasible, write queries referring only to objects having trusted owners. Remove from search_path the public schema and any other schemas that permit untrusted users to create objects.
Functions run inside the backend server process with the operating system permissions of the database server daemon. If the programming language used for the function allows unchecked memory accesses, it is possible to change the server's internal data structures. Hence, among many other things, such functions can circumvent any system access controls. Function languages that allow such access are considered “untrusted”, and PostgreSQL allows only superusers to create functions written in those languages.
Database roles are conceptually completely separate from operating system users. In practice it might be convenient to maintain a correspondence, but this is not required. Database roles are global across a database cluster installation (and not per individual database). To create a role use the CREATE ROLE SQL command:
name follows the rules for SQL identifiers: either unadorned without special characters, or double-quoted. (In practice, you will usually want to add additional options, such as LOGIN, to the command. More details appear below.) To remove an existing role, use the analogous DROP ROLE command:
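The two commands discussed above can be sketched as follows. The role names are hypothetical, chosen only to show an unadorned and a double-quoted identifier.

```sql
-- Create roles: one plain identifier, one that needs double quotes.
CREATE ROLE report_reader;
CREATE ROLE "Report Writer";

-- Remove them again.
DROP ROLE report_reader;
DROP ROLE "Report Writer";
```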
For convenience, the programs createuser and dropuser are provided as wrappers around these SQL commands that can be called from the shell command line:
To determine the set of existing roles, examine the pg_roles system catalog, for example
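A query of this kind lists the role names; the second statement is an optional variation that restricts the list to roles able to log in.

```sql
-- All roles in the cluster.
SELECT rolname FROM pg_roles;

-- Only roles with the LOGIN attribute.
SELECT rolname FROM pg_roles WHERE rolcanlogin;
```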
The psql program's \du meta-command is also useful for listing the existing roles.
In order to bootstrap the database system, a freshly initialized system always contains one predefined role. This role is always a “superuser”, and by default (unless altered when running initdb) it will have the same name as the operating system user that initialized the database cluster. Customarily, this role will be named postgres. In order to create more roles you first have to connect as this initial role.
Every connection to the database server is made using the name of some particular role, and this role determines the initial access privileges for commands issued in that connection. The role name to use for a particular database connection is indicated by the client that is initiating the connection request in an application-specific fashion. For example, the psql program uses the -U command line option to indicate the role to connect as. Many applications assume the name of the current operating system user by default (including createuser and psql). Therefore it is often convenient to maintain a naming correspondence between roles and operating system users.
The set of database roles a given client connection can connect as is determined by the client authentication setup, as explained in Chapter 20. (Thus, a client is not limited to connect as the role matching its operating system user, just as a person's login name need not match his or her real name.) Since the role identity determines the set of privileges available to a connected client, it is important to carefully configure privileges when setting up a multiuser environment.
PostgreSQL provides a set of predefined roles that provide access to certain, commonly needed, privileged capabilities and information. Administrators (including roles that have the CREATEROLE privilege) can GRANT these roles to users and/or other roles in their environment, providing those users with access to the specified capabilities and information.
The predefined roles are described in Table 22.1. Note that the specific permissions for each of the roles may change in the future as additional capabilities are added. Administrators should monitor the release notes for changes.
Table 22.1. Predefined Roles
pg_read_all_data
Read all data (tables, views, sequences), as if having SELECT rights on those objects, and USAGE rights on all schemas, even without having it explicitly. This role does not have the role attribute BYPASSRLS set. If RLS is being used, an administrator may wish to set BYPASSRLS on roles which this role is GRANTed to.
pg_write_all_data
Write all data (tables, views, sequences), as if having INSERT, UPDATE, and DELETE rights on those objects, and USAGE rights on all schemas, even without having it explicitly. This role does not have the role attribute BYPASSRLS set. If RLS is being used, an administrator may wish to set BYPASSRLS on roles which this role is GRANTed to.
pg_read_all_settings
Read all configuration variables, even those normally visible only to superusers.
pg_read_all_stats
Read all pg_stat_* views and use various statistics related extensions, even those normally visible only to superusers.
pg_stat_scan_tables
Execute monitoring functions that may take ACCESS SHARE locks on tables, potentially for a long time.
pg_monitor
Read/execute various monitoring views and functions. This role is a member of pg_read_all_settings, pg_read_all_stats and pg_stat_scan_tables.
pg_database_owner
None. Membership consists, implicitly, of the current database owner.
pg_signal_backend
Signal another backend to cancel a query or terminate its session.
pg_read_server_files
Allow reading files from any location the database can access on the server with COPY and other file-access functions.
pg_write_server_files
Allow writing to files in any location the database can access on the server with COPY and other file-access functions.
pg_execute_server_program
Allow executing programs on the database server as the user the database runs as with COPY and other functions which allow executing a server-side program.
The pg_monitor, pg_read_all_settings, pg_read_all_stats and pg_stat_scan_tables roles are intended to allow administrators to easily configure a role for the purpose of monitoring the database server. They grant a set of common privileges allowing the role to read various useful configuration settings, statistics and other system information normally restricted to superusers.
The pg_database_owner role has one implicit, situation-dependent member, namely the owner of the current database. The role conveys no rights at first. Like any role, it can own objects or receive grants of access privileges. Consequently, once pg_database_owner has rights within a template database, each owner of a database instantiated from that template will exercise those rights. pg_database_owner cannot be a member of any role, and it cannot have non-implicit members.
The pg_signal_backend role is intended to allow administrators to enable trusted, but non-superuser, roles to send signals to other backends. Currently this role enables sending of signals for canceling a query on another backend or terminating its session. A user granted this role cannot however send signals to a backend owned by a superuser. See Section 9.27.2.
The pg_read_server_files, pg_write_server_files and pg_execute_server_program roles are intended to allow administrators to have trusted, but non-superuser, roles which are able to access files and run programs on the database server as the user the database runs as. As these roles are able to access any file on the server file system, they bypass all database-level permission checks when accessing files directly and they could be used to gain superuser-level access. Great care should therefore be taken when granting these roles to users.
Care should be taken when granting these roles to ensure they are only used where needed and with the understanding that these roles grant access to privileged information.
Administrators can grant access to these roles to users using the GRANT command, for example:
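For instance, a grant of one of the predefined roles could look like this. The grantee admin_user is a hypothetical role that is assumed to exist already.

```sql
-- Let admin_user cancel queries and terminate sessions of non-superuser backends.
GRANT pg_signal_backend TO admin_user;

-- Give a monitoring account read access to settings and statistics.
GRANT pg_monitor TO admin_user;
```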
A database role can have a number of attributes that define its privileges and interact with the client authentication system.
login privilege
Only roles that have the LOGIN attribute can be used as the initial role name for a database connection. A role with the LOGIN attribute can be considered the same as a “database user”. To create a role with login privilege, use either:
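The two forms can be sketched as follows. The role names are hypothetical, and two different names are used only so that both statements can be run in the same cluster.

```sql
-- Explicit LOGIN attribute on CREATE ROLE.
CREATE ROLE app_user LOGIN;

-- CREATE USER implies LOGIN.
CREATE USER app_user2;
```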
(CREATE USER is equivalent to CREATE ROLE except that CREATE USER includes LOGIN by default, while CREATE ROLE does not.)
superuser status
A database superuser bypasses all permission checks, except the right to log in. This is a dangerous privilege and should not be used carelessly; it is best to do most of your work as a role that is not a superuser. To create a new database superuser, use CREATE ROLE name SUPERUSER. You must do this as a role that is already a superuser.
database creation
A role must be explicitly given permission to create databases (except for superusers, since those bypass all permission checks). To create such a role, use CREATE ROLE name CREATEDB.
role creation
A role must be explicitly given permission to create more roles (except for superusers, since those bypass all permission checks). To create such a role, use CREATE ROLE name CREATEROLE. A role with CREATEROLE privilege can alter and drop other roles, too, as well as grant or revoke membership in them. However, to create, alter, drop, or change membership of a superuser role, superuser status is required; CREATEROLE is insufficient for that.
initiating replication
A role must explicitly be given permission to initiate streaming replication (except for superusers, since those bypass all permission checks). A role used for streaming replication must have LOGIN permission as well. To create such a role, use CREATE ROLE name REPLICATION LOGIN.
password
A password is only significant if the client authentication method requires the user to supply a password when connecting to the database. The password and md5 authentication methods make use of passwords. Database passwords are separate from operating system passwords. Specify a password upon role creation with CREATE ROLE name PASSWORD 'string'.
A role's attributes can be modified after creation with ALTER ROLE. See the reference pages for the CREATE ROLE and ALTER ROLE commands for details.
It is good practice to create a role that has the CREATEDB and CREATEROLE privileges, but is not a superuser, and then use this role for all routine management of databases and roles. This approach avoids the dangers of operating as a superuser for tasks that do not really require it.
A role can also have role-specific defaults for many of the run-time configuration settings described in Chapter 19. For example, if for some reason you want to disable index scans (hint: not a good idea) anytime you connect, you can use:
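A role-specific default of this kind can be set and later removed as sketched here. The role name myname is a placeholder for an existing role.

```sql
-- Store a per-role default; it takes effect in the role's future sessions.
ALTER ROLE myname SET enable_indexscan TO off;

-- Remove the per-role default again.
ALTER ROLE myname RESET enable_indexscan;
```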
This will save the setting (but not set it immediately). In subsequent connections by this role it will appear as though SET enable_indexscan TO off had been executed just before the session started. You can still alter this setting during the session; it will only be the default. To remove a role-specific default setting, use ALTER ROLE rolename RESET varname. Note that role-specific defaults attached to roles without LOGIN privilege are fairly useless, since they will never be invoked.
It is frequently convenient to group users together to ease management of privileges: that way, privileges can be granted to, or revoked from, a group as a whole. In PostgreSQL this is done by creating a role that represents the group, and then granting membership in the group role to individual user roles.
To set up a group role, first create the role:
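A minimal sketch, assuming a hypothetical group role called sales_team (the name is illustrative only):

```sql
-- A group role is an ordinary role; CREATE ROLE gives it NOLOGIN by default.
CREATE ROLE sales_team;
```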
Typically a role being used as a group would not have the LOGIN attribute, though you can set it if you wish.
Once the group role exists, you can add and remove members using the GRANT and REVOKE commands:
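As an illustration, with hypothetical roles sales_team (the group), alice, and bob, which are assumed to exist already:

```sql
-- Add two members to the group role.
GRANT sales_team TO alice, bob;

-- Remove one of them again; bob's own role is otherwise unaffected.
REVOKE sales_team FROM bob;
```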
You can grant membership to other group roles, too (since there isn't really any distinction between group roles and non-group roles). The database will not let you set up circular membership loops. Also, it is not permitted to grant membership in a role to PUBLIC.
The members of a group role can use the privileges of the role in two ways. First, every member of a group can explicitly do SET ROLE to temporarily “become” the group role. In this state, the database session has access to the privileges of the group role rather than the original login role, and any database objects created are considered owned by the group role not the login role. Second, member roles that have the INHERIT attribute automatically have use of the privileges of roles of which they are members, including any privileges inherited by those roles. As an example, suppose we have done:
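The setup can be sketched as follows. The role names and attributes are reconstructed from the discussion that follows (joe inherits, admin and wheel are NOINHERIT, joe is a member of admin, admin is a member of wheel), so treat the exact statements as an assumption:

```sql
CREATE ROLE joe LOGIN INHERIT;
CREATE ROLE admin NOINHERIT;
CREATE ROLE wheel NOINHERIT;

-- joe is a direct member of admin and, through admin, an indirect member of wheel.
GRANT admin TO joe;
GRANT wheel TO admin;
```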
Immediately after connecting as role joe, a database session will have use of privileges granted directly to joe plus any privileges granted to admin, because joe “inherits” admin's privileges. However, privileges granted to wheel are not available, because even though joe is indirectly a member of wheel, the membership is via admin which has the NOINHERIT attribute. After SET ROLE admin; the session would have use of only those privileges granted to admin, and not those granted to joe. After SET ROLE wheel; the session would have use of only those privileges granted to wheel, and not those granted to either joe or admin. The original privilege state can be restored with any of SET ROLE joe;, SET ROLE NONE;, or RESET ROLE;.
The SET ROLE command always allows selecting any role that the original login role is directly or indirectly a member of. Thus, in the above example, it is not necessary to become admin before becoming wheel.
In the SQL standard, there is a clear distinction between users and roles, and users do not automatically inherit privileges while roles do. This behavior can be obtained in PostgreSQL by giving roles being used as SQL roles the INHERIT attribute, while giving roles being used as SQL users the NOINHERIT attribute. However, PostgreSQL defaults to giving all roles the INHERIT attribute, for backward compatibility with pre-8.1 releases in which users always had use of permissions granted to groups they were members of.
The role attributes LOGIN, SUPERUSER, CREATEDB, and CREATEROLE can be thought of as special privileges, but they are never inherited as ordinary privileges on database objects are. You must actually SET ROLE to a specific role having one of these attributes in order to make use of the attribute. Continuing the above example, we might choose to grant CREATEDB and CREATEROLE to the admin role. Then a session connecting as role joe would not have these privileges immediately, only after doing SET ROLE admin.
To destroy a group role, use DROP ROLE:
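For instance, with a hypothetical group role named sales_team:

```sql
-- Memberships in sales_team are revoked automatically; member roles remain.
DROP ROLE sales_team;
```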
Any memberships in the group role are automatically revoked (but the member roles are not otherwise affected).
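Because roles can own database objects and can hold privileges to access other objects, dropping a role is often not just a matter of a quick DROP ROLE. Any objects owned by the role must first be dropped or reassigned to other owners, and any privileges granted to the role must be revoked.
Ownership of objects can be transferred one at a time using ALTER commands, for example: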
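A single-object transfer might look like this; the table bobs_table and the new owner alice are hypothetical names:

```sql
-- Hand one table over to another role.
ALTER TABLE bobs_table OWNER TO alice;
```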
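Alternatively, the REASSIGN OWNED command can be used to reassign ownership of all objects owned by the role-to-be-dropped to a single other role. Because REASSIGN OWNED cannot access objects in other databases, it is necessary to run it in each database that contains objects owned by the role. (Note that the first such REASSIGN OWNED will change the ownership of any shared-across-databases objects, that is databases or tablespaces, that are owned by the role-to-be-dropped.)
Once any valuable objects have been transferred to new owners, any remaining objects owned by the role-to-be-dropped can be dropped with the DROP OWNED command. Again, this command cannot access objects in other databases, so it is necessary to run it in each database that contains objects owned by the role. Also, DROP OWNED will not drop entire databases or tablespaces, so it is necessary to do that manually if the role owns any databases or tablespaces that have not been transferred to new owners.
DROP OWNED also takes care of removing any privileges granted to the target role for objects that do not belong to it. Because REASSIGN OWNED does not touch such objects, it is typically necessary to run both REASSIGN OWNED and DROP OWNED (in that order!) to fully remove the dependencies of a role to be dropped.
In short then, the most general recipe for removing a role that has been used to own objects is: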
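The recipe can be sketched as follows, with doomed_role and successor_role as placeholder names:

```sql
-- Run these two commands in every database of the cluster
-- that contains objects owned by doomed_role.
REASSIGN OWNED BY doomed_role TO successor_role;
DROP OWNED BY doomed_role;

-- Once no dependencies remain in any database:
DROP ROLE doomed_role;
```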
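When not all owned objects are to be transferred to the same successor owner, it is best to handle the exceptions manually and then perform the above steps to mop up.
If DROP ROLE is attempted while dependent objects still remain, it will issue messages identifying which objects need to be reassigned or dropped.
Every instance of a running PostgreSQL server manages one or more databases. Databases are therefore the topmost hierarchical level for organizing SQL objects (“database objects”). This chapter describes the properties of databases, and how to create, manage, and destroy them.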
In order to create a database, the PostgreSQL server must be up and running (see ).
Databases are created with the SQL command CREATE DATABASE:
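In its simplest form, with salesdb as a hypothetical database name:

```sql
-- The role issuing the command becomes the owner of the new database.
CREATE DATABASE salesdb;
```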
where name follows the usual rules for SQL identifiers. The current role automatically becomes the owner of the new database. It is the privilege of the owner of a database to remove it later (which also removes all the objects in it, even if they have a different owner).
The creation of databases is a restricted operation. See for how to grant permission.
Since you need to be connected to the database server in order to execute the CREATE DATABASE command, the question remains how the first database at any given site can be created. The first database is always created by the initdb command when the data storage area is initialized. (See .) This database is called postgres. So to create the first “ordinary” database you can connect to postgres.
Two additional databases, template1 and template0, are also created during database cluster initialization. Whenever a new database is created within the cluster, template1 is essentially cloned. This means that any changes you make in template1 are propagated to all subsequently created databases. Because of this, avoid creating objects in template1 unless you want them propagated to every newly created database. template0 is meant as a pristine copy of the original contents of template1. It can be cloned instead of template1 when it is important to make a database without any such site-local additions. More details appear in .
As a convenience, there is a program you can execute from the shell to create new databases, createdb.
createdb does no magic. It connects to the postgres database and issues the CREATE DATABASE command, exactly as described above. The createdb reference page contains the invocation details. Note that createdb without any arguments will create a database with the current user name.
Sometimes you want to create a database for someone else, and have them become the owner of the new database, so they can configure and manage it themselves. To achieve that, use CREATE DATABASE dbname OWNER rolename; from the SQL environment, or createdb -O rolename dbname from the shell. Only the superuser is allowed to create a database for someone else (that is, for a role you are not a member of).
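CREATE DATABASE actually works by copying an existing database. By default, it copies the standard system database named template1. Thus that database is the “template” from which new databases are made. If you add objects to template1, these objects will be copied into subsequently created user databases. This behavior allows site-local modifications to the standard set of objects in databases. For example, if you install the procedural language PL/Perl in template1, it will automatically be available in user databases without any extra action being taken when those databases are created.
There is a second standard system database named template0. This database contains the same data as the initial contents of template1, that is, only the standard objects predefined by your version of PostgreSQL. template0 should never be changed after the database cluster has been initialized. By instructing CREATE DATABASE to copy template0 instead of template1, you can create a “pristine” user database that contains none of the site-local additions in template1. This is particularly handy when restoring a pg_dump dump: the dump script should be restored in a pristine database to ensure that one recreates the correct contents of the dumped database, without conflicting with objects that might have been added to template1 later on.
Another common reason for copying template0 instead of template1 is that new encoding and locale settings can be specified when copying template0, whereas a copy of template1 must use the same settings it does. This is because template1 might contain encoding-specific or locale-specific data, while template0 is known not to.
To create a database by copying template0, use CREATE DATABASE dbname TEMPLATE template0; from the SQL environment, or createdb -T template0 dbname from the shell.
It is possible to create additional template databases, and indeed one can copy any database in a cluster by specifying its name as the template for CREATE DATABASE. It is important to understand, however, that this is not intended as a general-purpose “COPY DATABASE” facility. The principal limitation is that no other sessions can be connected to the source database while it is being copied. CREATE DATABASE will fail if any other connection exists when it starts; during the copy operation, new connections to the source database are prevented.
Two useful flags exist in pg_database for each database: the columns datistemplate and datallowconn. datistemplate can be set to indicate that a database is intended as a template for CREATE DATABASE. If this flag is set, the database can be cloned by any user with CREATEDB privileges; if it is not set, only superusers and the owner of the database can clone it. If datallowconn is false, then no new connections to that database will be allowed (but existing sessions are not terminated simply by setting the flag false). The template0 database is normally marked datallowconn = false to prevent its modification. Both template0 and template1 should always be marked with datistemplate = true.
template1 and template0 do not have any special status beyond the fact that the name template1 is the default source database name for CREATE DATABASE. For example, one could drop template1 and recreate it from template0 without any ill effects. This course of action might be advisable if one has carelessly added a bunch of junk in template1. (To delete template1, it must have pg_database.datistemplate = false.)
The postgres database is also created when a database cluster is initialized. This database is meant as a default database for users and applications to connect to. It is simply a copy of template1 and can be dropped and recreated if necessary.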
A small number of objects, like role, database, and tablespace names, are defined at the cluster level and stored in the pg_global tablespace. Inside the cluster are multiple databases, which are isolated from each other but can access cluster-level objects. Inside each database are multiple schemas, which contain objects like tables and functions. So the full hierarchy is: cluster, database, schema, table (or some other kind of object, such as a function).
When connecting to the database server, a client must specify the database name in its connection request. It is not possible to access more than one database per connection. However, clients can open multiple connections to the same database, or different databases. Database-level security has two components: access control (see ), managed at the connection level, and authorization control (see ), managed via the grant system. Foreign data wrappers (see ) allow for objects within one database to act as proxies for objects in other databases or clusters. The older dblink module (see ) provides a similar capability. By default, all users can connect to all databases using all connection methods.
If one PostgreSQL server cluster is planned to contain unrelated projects or users that should be, for the most part, unaware of each other, it is recommended to put them into separate databases and adjust authorizations and access controls accordingly. If the projects or users are interrelated, and thus should be able to use each other's resources, they should be put in the same database but probably into separate schemas; this provides a modular structure with namespace isolation and authorization control. More information about managing schemas is in .
While multiple databases can be created within a single cluster, it is advised to consider carefully whether the benefits outweigh the risks and limitations. In particular, consider the impact that having a shared WAL (see ) has on backup and recovery options. While individual databases in the cluster are isolated when considered from the user's perspective, they are closely bound from the database administrator's point of view.
Databases are created with the CREATE DATABASE command (see ) and destroyed with the DROP DATABASE command (see ). To determine the set of existing databases, examine the pg_database system catalog, for example
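A query of this kind lists the database names; datname is the name column of pg_database:

```sql
-- One row per database in the cluster, including template0 and template1.
SELECT datname FROM pg_database;
```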
The psql program's \l meta-command and -l command-line option are also useful for listing the existing databases.
Recall from that the PostgreSQL server provides a large number of run-time configuration variables. You can set database-specific default values for many of these settings.
For example, if for some reason you want to disable the GEQO optimizer for a given database, you'd ordinarily have to either disable it for all databases or make sure that every connecting client is careful to issue SET geqo TO off. To make this setting the default within a particular database, you can execute the command:
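A sketch of such a command, assuming a hypothetical database named mydb:

```sql
-- Store geqo = off as the default for sessions connecting to mydb.
ALTER DATABASE mydb SET geqo TO off;
```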
This will save the setting (but not set it immediately). In subsequent connections to this database it will appear as though SET geqo TO off; had been executed just before the session started. Note that users can still alter this setting during their sessions; it will only be the default. To undo any such setting, use ALTER DATABASE dbname RESET varname.
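This chapter describes the available localization features from the point of view of the administrator. PostgreSQL supports two localization facilities:
Using the locale features of the operating system to provide locale-specific collation order, number formatting, translated messages, and other aspects. This is covered in and .
Providing a number of different character sets to support storing text in all kinds of languages, and providing character set translation between client and server. This is covered in .
PostgreSQL, like any database software, requires that certain tasks be performed regularly to achieve optimum performance. The tasks discussed here are required, but they are repetitive in nature and can easily be automated using standard tools such as cron scripts or the Windows Task Scheduler. It is the database administrator's responsibility to set up appropriate scripts, and to check that they execute successfully.
One obvious maintenance task is the creation of backup copies of the data on a regular schedule. Without a recent backup, you have no chance of recovery after a catastrophe (disk failure, fire, mistakenly dropping a critical table, etc.). The backup and recovery mechanisms available in PostgreSQL are discussed at length in .
The other main category of maintenance task is periodic “vacuuming” of the database. This activity is discussed in . Closely related to this is updating the statistics that will be used by the query planner, as discussed in .
Another task that might need periodic attention is log file management. This is discussed in .
check_postgres is available for monitoring database health and reporting unusual conditions. check_postgres integrates with Nagios and MRTG, but can be run standalone too.
PostgreSQL is low-maintenance compared to some other database management systems. Nonetheless, appropriate attention to these tasks will go far towards ensuring a pleasant and productive experience with the system.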
Databases are destroyed with the command DROP DATABASE:
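For example, with salesdb as a hypothetical database name:

```sql
-- Must be run while connected to a different database, such as postgres.
DROP DATABASE salesdb;
```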
Only the owner of the database, or a superuser, can drop a database. Dropping a database removes all objects that were contained within the database. The destruction of a database cannot be undone.
You cannot execute the DROP DATABASE command while connected to the victim database. You can, however, be connected to any other database, including the template1 database. template1 would be the only option for dropping the last user database of a given cluster.
For convenience, there is also a shell program to drop databases, dropdb, invoked as dropdb dbname. (Unlike createdb, it is not the default action to drop the database with the current user name.)
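Tablespaces in PostgreSQL allow database administrators to define locations in the file system where the files representing database objects can be stored. Once created, a tablespace can be referred to by name when creating database objects.
By using tablespaces, an administrator can control the disk layout of a PostgreSQL installation. This is useful in at least two ways. First, if the partition or volume on which the cluster was initialized runs out of space and cannot be extended, a tablespace can be created on a different partition and used until the system can be reconfigured.
Second, tablespaces allow an administrator to use knowledge of the usage pattern of database objects to optimize performance. For example, an index which is very heavily used can be placed on a very fast, highly available disk, such as an expensive solid state device. At the same time, a table storing archived data which is rarely used or not performance critical could be stored on a less expensive, slower disk system.
Even though located outside the main PostgreSQL data directory, tablespaces are an integral part of the database cluster and cannot be treated as an autonomous collection of data files. They are dependent on metadata contained in the main data directory, and therefore cannot be attached to a different database cluster or backed up individually. Similarly, if you lose a tablespace (file deletion, disk failure, etc.), the database cluster might become unreadable or unable to start. Placing a tablespace on a temporary file system like a RAM disk risks the reliability of the entire cluster.
To define a tablespace, use the CREATE TABLESPACE command, for example: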
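A sketch of such a definition; the tablespace name fastspace and the directory path are placeholders chosen for illustration:

```sql
-- The directory must already exist, be empty, and be owned by the
-- operating system user that runs the PostgreSQL server.
CREATE TABLESPACE fastspace LOCATION '/ssd1/postgresql/data';
```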
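The location must be an existing, empty directory that is owned by the PostgreSQL operating system user. All objects subsequently created within the tablespace will be stored in files underneath this directory. The location must not be on removable or transient storage, as the cluster might fail to function if the tablespace is missing or lost.
There is usually not much point in making more than one tablespace per logical file system, since you cannot control the location of individual files within a logical file system. However, PostgreSQL does not enforce any such limitation, and indeed it is not directly aware of the file system boundaries on your system. It just stores files in the directories you tell it to use.
Creation of the tablespace itself must be done as a database superuser, but after that you can allow ordinary database users to use it. To do that, grant them the CREATE privilege on it.
Tables, indexes, and entire databases can be assigned to particular tablespaces. To do so, a user with the CREATE privilege on a given tablespace must pass the tablespace name as a parameter to the relevant command. For example, the following creates a table in the tablespace space1: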
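For instance, assuming the tablespace space1 exists and using a hypothetical table foo:

```sql
-- Place the table's data files in space1 instead of the database default.
CREATE TABLE foo(i int) TABLESPACE space1;
```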
Alternatively, use the default_tablespace parameter:
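The same placement can be sketched with the session parameter, again assuming space1 exists and foo is a hypothetical table:

```sql
-- Objects created later in this session default to space1.
SET default_tablespace = space1;
CREATE TABLE foo(i int);
```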
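When default_tablespace is set to anything but an empty string, it supplies an implicit TABLESPACE clause for CREATE TABLE and CREATE INDEX commands that do not have an explicit one.
There is also a temp_tablespaces parameter, which determines the placement of temporary tables and indexes, as well as temporary files that are used for purposes such as sorting large data sets. This can be a list of tablespace names, rather than only one, so that the load associated with temporary objects can be spread over multiple tablespaces. A random member of the list is picked each time a temporary object is to be created.
The tablespace associated with a database is used to store the system catalogs of that database. Furthermore, it is the default tablespace used for tables, indexes, and temporary files created within the database, if no TABLESPACE clause is given and no other selection is specified by default_tablespace or temp_tablespaces (as appropriate). If a database is created without specifying a tablespace for it, it uses the same tablespace as the template database it is copied from.
Two tablespaces are automatically created when the database cluster is initialized. The pg_global tablespace is used for shared system catalogs. The pg_default tablespace is the default tablespace of the template1 and template0 databases (and, therefore, will be the default tablespace for other databases as well, unless overridden by a TABLESPACE clause in CREATE DATABASE).
Once created, a tablespace can be used from any database, provided the requesting user has sufficient privilege. This means that a tablespace cannot be dropped until all objects in all databases using the tablespace have been removed.
To remove an empty tablespace, use the DROP TABLESPACE command.
To determine the set of existing tablespaces, examine the pg_tablespace system catalog, for example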
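A query of this kind lists the tablespace names; spcname is the name column of pg_tablespace:

```sql
-- Includes the built-in pg_default and pg_global tablespaces.
SELECT spcname FROM pg_tablespace;
```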
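The psql program's \db meta-command is also useful for listing the existing tablespaces.
PostgreSQL makes use of symbolic links to simplify the implementation of tablespaces. This means that tablespaces can be used only on systems that support symbolic links.
The directory $PGDATA/pg_tblspc contains symbolic links that point to each of the non-built-in tablespaces defined in the cluster. Although not recommended, it is possible to adjust the tablespace layout by hand by redefining these links. Under no circumstances perform this operation while the server is running. Note that in PostgreSQL 9.1 and earlier you will also need to update the pg_tablespace catalog with the new locations. (If you do not, pg_dump will continue to output the old tablespace locations.)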
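PostgreSQL databases require periodic maintenance known as vacuuming. For many installations, it is sufficient to let vacuuming be performed by the autovacuum daemon, which is described in Section 24.1.6. You might need to adjust the autovacuuming parameters described there to obtain best results for your situation. Some database administrators will want to supplement or replace the daemon's activities with manually managed VACUUM commands, which typically are executed according to a schedule by cron or Task Scheduler scripts. To set up manually managed vacuuming properly, it is essential to understand the issues discussed in the next few subsections. Administrators who rely on autovacuuming may still wish to skim this material to help them understand and adjust autovacuuming.
PostgreSQL's VACUUM command has to process each table on a regular basis for several reasons:
To recover or reuse disk space occupied by updated or deleted rows.
To update data statistics used by the PostgreSQL query planner.
To update the visibility map, which speeds up index-only scans.
To protect against loss of very old data due to transaction ID wraparound or multixact ID wraparound.
Each of these reasons dictates performing VACUUM operations of varying frequency and scope, as explained in the following subsections.
There are two variants of VACUUM: standard VACUUM and VACUUM FULL. VACUUM FULL can reclaim more disk space but runs much more slowly. Also, the standard form of VACUUM can run in parallel with production database operations. (Commands such as SELECT, INSERT, UPDATE, and DELETE will continue to function normally, though you will not be able to modify the definition of a table with commands such as ALTER TABLE while it is being vacuumed.) VACUUM FULL requires an exclusive lock on the table it is working on, and therefore cannot be done in parallel with other use of the table. Generally, therefore, administrators should strive to use standard VACUUM and avoid VACUUM FULL.
VACUUM creates a substantial amount of I/O traffic, which can cause poor performance for other active sessions. There are configuration parameters that can be adjusted to reduce the performance impact of background vacuuming; see Section 19.4.4.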
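In PostgreSQL, an UPDATE or DELETE of a row does not immediately remove the old version of the row. This approach is necessary to gain the benefits of multiversion concurrency control (MVCC, see Chapter 13): the row version must not be deleted while it is still potentially visible to other transactions. But eventually, an outdated or deleted row version is no longer of interest to any transaction. The space it occupies must then be reclaimed for reuse by new rows, to avoid unbounded growth of disk space requirements. This is done by running VACUUM.
The standard form of VACUUM removes dead row versions in tables and indexes and marks the space available for future reuse. However, it will not return the space to the operating system, except in the special case where one or more pages at the end of a table become entirely free and an exclusive table lock can be easily obtained. In contrast, VACUUM FULL actively compacts tables by writing a complete new version of the table file with no dead space. This minimizes the size of the table, but can take a long time. It also requires extra disk space for the new copy of the table, until the operation completes.
The usual goal of routine vacuuming is to do standard VACUUMs often enough to avoid needing VACUUM FULL. The autovacuum daemon attempts to work this way, and in fact will never issue VACUUM FULL. In this approach, the idea is not to keep tables at their minimum size, but to maintain steady-state usage of disk space: each table occupies space equivalent to its minimum size plus however much space gets used up between vacuum runs. Although VACUUM FULL can be used to shrink a table back to its minimum size and return the disk space to the operating system, there is not much point in this if the table will just grow again in the future. Thus, moderately frequent standard VACUUM runs are a better approach than infrequent VACUUM FULL runs for maintaining heavily updated tables.
Some administrators prefer to schedule vacuuming themselves, for example doing all the work at night when load is low. The difficulty with doing vacuuming according to a fixed schedule is that if a table has an unexpected spike in update activity, it may get bloated to the point that VACUUM FULL is really necessary to reclaim space. Using the autovacuum daemon alleviates this problem, since the daemon schedules vacuuming dynamically in response to update activity. It is unwise to disable the daemon completely unless you have an extremely predictable workload. One possible compromise is to set the daemon's parameters so that it will only react to unusually heavy update activity, thus keeping things from getting out of hand, while scheduled VACUUMs are expected to do the bulk of the work when the load is typical.
For those not using autovacuum, a typical approach is to schedule a database-wide VACUUM once a day during a low-usage period, supplemented by more frequent vacuuming of heavily updated tables as necessary. (Some installations with extremely high update rates vacuum their busiest tables as often as once every few minutes.) If you have multiple databases in a cluster, don't forget to VACUUM each one; the program vacuumdb might be helpful.
Plain VACUUM may not be satisfactory when a table contains large numbers of dead row versions as a result of massive update or delete activity. If you have such a table and you need to reclaim the excess disk space it occupies, you will need to use VACUUM FULL, or alternatively CLUSTER or one of the table-rewriting variants of ALTER TABLE. These commands rewrite an entire new copy of the table and build new indexes for it. All these options require an exclusive lock. Note that they also temporarily use extra disk space approximately equal to the size of the table, since the old copies of the table and indexes cannot be released until the new ones are complete.
If you have a table whose entire contents are deleted on a periodic basis, consider doing it with TRUNCATE rather than using DELETE followed by VACUUM. TRUNCATE removes the entire content of the table immediately, without requiring a subsequent VACUUM or VACUUM FULL to reclaim the now-unused disk space. The disadvantage is that strict MVCC semantics are violated.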
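The PostgreSQL query planner relies on statistical information about the contents of tables in order to generate good plans for queries. These statistics are gathered by the ANALYZE command, which can be invoked by itself or as an optional step in VACUUM. It is important to have reasonably accurate statistics, otherwise poor choices of plans might degrade database performance.
The autovacuum daemon, if enabled, will automatically issue ANALYZE commands whenever the content of a table has changed sufficiently. However, administrators might prefer to rely on manually scheduled ANALYZE operations, particularly if it is known that update activity on a table will not affect the statistics of “interesting” columns. The daemon schedules ANALYZE strictly as a function of the number of rows inserted or updated; it has no knowledge of whether that will lead to meaningful statistical changes.
As with vacuuming for space recovery, frequent updates of statistics are more useful for heavily updated tables than for seldom-updated ones. But even for a heavily updated table, there might be no need for statistics updates if the statistical distribution of the data is not changing much. A simple rule of thumb is to think about how much the minimum and maximum values of the columns in the table change. For example, a timestamp column that contains the time of row update will have a constantly increasing maximum value as rows are added and updated; such a column will probably need more frequent statistics updates than, say, a column containing URLs for pages accessed on a website. The URL column might receive changes just as often, but the statistical distribution of its values probably changes relatively slowly.
It is possible to run ANALYZE on specific tables and even just specific columns of a table, so the flexibility exists to update some statistics more frequently than others if your application requires it. In practice, however, it is usually best to just analyze the entire database, because it is a fast operation. ANALYZE uses a statistically random sampling of the rows of a table rather than reading every single row.
Although per-column tweaking of ANALYZE frequency might not be very productive, you might find it worthwhile to do per-column adjustment of the level of detail of the statistics collected by ANALYZE. Columns that are heavily used in WHERE clauses and have highly irregular data distributions might require a finer-grain data histogram than other columns. See ALTER TABLE SET STATISTICS, or change the database-wide default using the default_statistics_target configuration parameter.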
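A per-column adjustment might be sketched like this. The table orders, the column status, and the target value 500 are all hypothetical choices made for illustration:

```sql
-- Collect a more detailed histogram for one heavily filtered column.
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 500;

-- Re-gather statistics for just that column so the new target takes effect.
ANALYZE orders (status);
```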
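Also, by default there is limited information available about the selectivity of functions. However, if you create an expression index that uses a function call, useful statistics will be gathered about the function, which can greatly improve query plans that use the expression index.
The autovacuum daemon does not issue ANALYZE commands for foreign tables, since it has no means of determining how often that might be useful. If your queries require statistics on foreign tables for proper planning, it is a good idea to run manually managed ANALYZE commands on those tables on a suitable schedule.
Vacuum maintains a visibility map for each table to keep track of which pages contain only tuples that are known to be visible to all active transactions (and all future transactions, until the page is again modified). This has two purposes. First, vacuum itself can skip such pages on the next run, since there is nothing to clean up.
Second, it allows PostgreSQL to answer some queries using only the index, without reference to the underlying table. Since PostgreSQL indexes do not contain tuple visibility information, a normal index scan fetches the heap tuple for each matching index entry, to check whether it should be seen by the current transaction. An index-only scan, on the other hand, checks the visibility map first. If it is known that all tuples on the page are visible, the heap fetch can be skipped. This is most useful on large data sets where the visibility map can prevent disk accesses. The visibility map is vastly smaller than the heap, so it can easily be cached even when the heap is very large.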
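PostgreSQL's MVCC transaction semantics depend on being able to compare transaction ID (XID) numbers: a row version with an insertion XID greater than the current transaction's XID is “in the future” and should not be visible to the current transaction. But since transaction IDs have limited size (32 bits), a cluster that runs for a long time (more than 4 billion transactions) would suffer transaction ID wraparound: the XID counter wraps around to zero, and all of a sudden transactions that were in the past appear to be in the future, which means their output becomes invisible. In short, catastrophic data loss. (Actually the data is still there, but that is cold comfort if you cannot get at it.) To avoid this, it is necessary to vacuum every table in every database at least once every two billion transactions.
The reason that periodic vacuuming solves the problem is that VACUUM will mark rows as frozen, indicating that they were inserted by a transaction that committed sufficiently far in the past that the effects of the inserting transaction are certain to be visible to all current and future transactions. Normal XIDs are compared using modulo-2^32 arithmetic. This means that for every normal XID, there are two billion XIDs that are “older” and two billion that are “newer”; another way to say it is that the normal XID space is circular with no endpoint. Therefore, once a row version has been created with a particular normal XID, the row version will appear to be “in the past” for the next two billion transactions, no matter which normal XID we are talking about. If the row version still exists after more than two billion transactions, it will suddenly appear to be in the future. To prevent this, PostgreSQL reserves a special XID, FrozenTransactionId, which does not follow the normal XID comparison rules and is always considered older than every normal XID. Frozen row versions are treated as if the inserting XID were FrozenTransactionId, so that they will appear to be “in the past” to all normal transactions regardless of wraparound issues, and so such row versions will be valid until deleted, no matter how long that is.
In PostgreSQL versions before 9.4, freezing was implemented by actually replacing a row's insertion XID with FrozenTransactionId, which was visible in the row's xmin system column. Newer versions just set a flag bit, preserving the row's original xmin for possible forensic use. However, rows with xmin equal to FrozenTransactionId (2) may still be found in databases upgraded with pg_upgrade from pre-9.4 versions.
Also, system catalogs may contain rows with xmin equal to BootstrapTransactionId (1), indicating that they were inserted during the first phase of initdb. Like FrozenTransactionId, this special XID is treated as older than every normal XID.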
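vacuum_freeze_min_age controls how old an XID value has to be before rows bearing that XID will be frozen. Increasing this setting may avoid unnecessary work if the rows that would otherwise be frozen will soon be modified again, but decreasing this setting increases the number of transactions that can elapse before the table must be vacuumed again.
VACUUM uses the visibility map to determine which pages of a table must be scanned. Normally, it will skip pages that do not have any dead row versions even if those pages might still have row versions with old XID values. Therefore, normal VACUUMs will not always freeze every old row version in the table. Periodically, VACUUM will perform an aggressive vacuum, skipping only those pages which contain neither dead rows nor any unfrozen XID or MXID values. vacuum_freeze_table_age controls when VACUUM does that: all-visible but not all-frozen pages are scanned if the number of transactions that have passed since the last such scan is greater than vacuum_freeze_table_age minus vacuum_freeze_min_age. Setting vacuum_freeze_table_age to 0 forces VACUUM to use this more aggressive strategy for all scans.
The maximum time that a table can go unvacuumed is two billion transactions minus the vacuum_freeze_min_age value at the time of the last aggressive vacuum. If it were to go unvacuumed for longer than that, data loss could result. To ensure that this does not happen, autovacuum is invoked on any table that might contain unfrozen rows with XIDs older than the age specified by the configuration parameter autovacuum_freeze_max_age. (This will happen even if autovacuum is disabled.)
This implies that if a table is not otherwise vacuumed, autovacuum will be invoked on it approximately once every autovacuum_freeze_max_age minus vacuum_freeze_min_age transactions. For tables that are regularly vacuumed for space reclamation purposes, this is of little importance. However, for static tables (including tables that receive inserts, but no updates or deletes), there is no need to vacuum for space reclamation, so it can be useful to try to maximize the interval between forced autovacuums on very large static tables. Obviously one can do this either by increasing autovacuum_freeze_max_age or decreasing vacuum_freeze_min_age.
The effective maximum for vacuum_freeze_table_age is 0.95 * autovacuum_freeze_max_age; a setting higher than that will be capped to the maximum. A value higher than autovacuum_freeze_max_age would not make sense because an anti-wraparound autovacuum would be triggered at that point anyway, and the 0.95 multiplier leaves some breathing room to run a manual VACUUM before that happens. As a rule of thumb, vacuum_freeze_table_age should be set to a value somewhat below autovacuum_freeze_max_age, leaving enough gap so that a regularly scheduled VACUUM or an autovacuum triggered by normal delete and update activity is run in that window. Setting it too close could lead to anti-wraparound autovacuums, even though the table was recently vacuumed to reclaim space, whereas lower values lead to more frequent aggressive vacuuming.
The sole disadvantage of increasing autovacuum_freeze_max_age (and vacuum_freeze_table_age along with it) is that the pg_xact and pg_commit_ts subdirectories of the database cluster will take more space, because they must store the commit status and (if track_commit_timestamp is enabled) timestamp of all transactions back to the autovacuum_freeze_max_age horizon. The commit status uses two bits per transaction, so if autovacuum_freeze_max_age is set to its maximum allowed value of two billion, pg_xact can be expected to grow to about half a gigabyte and pg_commit_ts to about 20 GB. If this is trivial compared to your total database size, setting autovacuum_freeze_max_age to its maximum allowed value is recommended. Otherwise, set it depending on what you are willing to allow for pg_xact and pg_commit_ts storage. (The default, 200 million transactions, translates to about 50 MB of pg_xact storage and about 2 GB of pg_commit_ts storage.)
One disadvantage of decreasing vacuum_freeze_min_age is that it might cause VACUUM to do useless work: freezing a row version is a waste of time if the row is modified soon thereafter (causing it to acquire a new XID). So the setting should be large enough that rows are not frozen until they are unlikely to change any more.
To track the age of the oldest unfrozen XIDs in a database, VACUUM stores XID statistics in the system tables pg_class and pg_database. In particular, the relfrozenxid column of a table's pg_class row contains the freeze cutoff XID that was used by the last aggressive VACUUM for that table. All rows inserted by transactions with XIDs older than this cutoff XID are guaranteed to have been frozen. Similarly, the datfrozenxid column of a database's pg_database row is a lower bound on the unfrozen XIDs appearing in that database; it is just the minimum of the per-table relfrozenxid values within the database. A convenient way to examine this information is to execute queries such as: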
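One possible pair of queries is sketched below. The first reports a per-table age, taking the larger of the table's own age and that of its TOAST table (an assumption about what is most useful to see); the second reports the per-database age:

```sql
-- Per-table age of relfrozenxid, including the associated TOAST table if any.
SELECT c.oid::regclass AS table_name,
       greatest(age(c.relfrozenxid), age(t.relfrozenxid)) AS age
FROM pg_class c
LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
WHERE c.relkind IN ('r', 'm');

-- Per-database age of datfrozenxid.
SELECT datname, age(datfrozenxid) FROM pg_database;
```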
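The age column measures the number of transactions from the cutoff XID to the current transaction's XID.
VACUUM normally only scans pages that have been modified since the last vacuum, but relfrozenxid can only be advanced when every page of the table that might contain unfrozen XIDs is scanned. This happens when relfrozenxid is more than vacuum_freeze_table_age transactions old, when VACUUM's FREEZE option is used, or when all pages that are not already all-frozen happen to require vacuuming to remove dead row versions. When VACUUM scans every page in the table that is not already all-frozen, it should set age(relfrozenxid) to a value just a little more than the vacuum_freeze_min_age setting that was used (more by the number of transactions started since the VACUUM started). If no relfrozenxid-advancing VACUUM is issued on the table until autovacuum_freeze_max_age is reached, an autovacuum will soon be forced for the table.
If for some reason autovacuum fails to clear old XIDs from a table, the system will begin to emit warning messages when the database's oldest XIDs reach ten million transactions from the wraparound point. The warning names the affected database and the number of transactions remaining, and carries a hint to run a database-wide VACUUM.
(A manual VACUUM should fix the problem, as suggested by the hint; but note that the VACUUM must be performed by a superuser, else it will fail to process system catalogs and thus not be able to advance the database's datfrozenxid.) If these warnings are ignored, the system will shut down and refuse to start any new transactions once there are fewer than 1 million transactions left until wraparound. The resulting error states that the database is not accepting commands in order to avoid wraparound data loss.
The 1-million-transaction safety margin exists to let the administrator recover without data loss, by manually executing the required VACUUM commands. However, since the system will not execute commands once it has gone into the safety shutdown mode, the only way to do this is to stop the server and start the server in single-user mode to execute VACUUM. The shutdown mode is not enforced in single-user mode. See the postgres reference page for details about using single-user mode.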
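Multixact IDs are used to support row locking by multiple transactions. Since there is only limited space in a tuple header to store lock information, that information is encoded as a “multiple transaction ID”, or multixact ID for short, whenever there is more than one transaction concurrently locking a row. Information about which transaction IDs are included in any particular multixact ID is stored separately in the pg_multixact subdirectory, and only the multixact ID appears in the xmax field in the tuple header. Like transaction IDs, multixact IDs are implemented as a 32-bit counter and corresponding storage, all of which requires careful aging management, storage cleanup, and wraparound handling. There is a separate storage area which holds the list of members in each multixact, which also uses a 32-bit counter and which must also be managed.
Whenever VACUUM scans any part of a table, it will replace any multixact ID it encounters which is older than vacuum_multixact_freeze_min_age by a different value, which can be the zero value, a single transaction ID, or a newer multixact ID. For each table, pg_class.relminmxid stores the oldest possible multixact ID still appearing in any tuple of that table. If this value is older than vacuum_multixact_freeze_table_age, an aggressive vacuum is forced. As discussed in the previous section, an aggressive vacuum means that only those pages which are known to be all-frozen will be skipped. mxid_age() can be used on pg_class.relminmxid to find its age.
Aggressive VACUUM scans, regardless of what causes them, enable advancing the value for that table. Eventually, as all tables in all databases are scanned and their oldest multixact values are advanced, on-disk storage for older multixacts can be removed.
As a safety device, an aggressive vacuum scan will occur for any table whose multixact-age is greater than autovacuum_multixact_freeze_max_age. Aggressive vacuum scans will also occur progressively for all tables, starting with those that have the oldest multixact-age, if the amount of used member storage space exceeds 50% of the addressable storage space. Both of these kinds of aggressive scans will occur even if autovacuum is nominally disabled.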
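PostgreSQL has an optional but highly recommended feature called autovacuum, whose purpose is to automate the execution of VACUUM and ANALYZE commands. When enabled, autovacuum checks for tables that have had a large number of inserted, updated, or deleted tuples. These checks use the statistics collection facility; therefore, autovacuum cannot be used unless track_counts is set to true. In the default configuration, autovacuuming is enabled and the related configuration parameters are appropriately set.
The “autovacuum daemon” actually consists of multiple processes. There is a persistent daemon process, called the autovacuum launcher, which is in charge of starting autovacuum worker processes for all databases. The launcher will distribute the work across time, attempting to start one worker within each database every autovacuum_naptime seconds. (Therefore, if the installation has N databases, a new worker will be launched every autovacuum_naptime/N seconds.) A maximum of autovacuum_max_workers worker processes are allowed to run at the same time. If there are more than autovacuum_max_workers databases to be processed, the next database will be processed as soon as the first worker finishes. Each worker process will check each table within its database and execute VACUUM and/or ANALYZE as needed. log_autovacuum_min_duration can be set to monitor autovacuum workers' activity.
If several large tables all become eligible for vacuuming in a short amount of time, all autovacuum workers might become occupied with vacuuming those tables for a long period. This would result in other tables and databases not being vacuumed until a worker becomes available. There is no limit on how many workers might be in a single database, but workers do try to avoid repeating work that has already been done by other workers. Note that the number of running workers does not count towards the max_connections or superuser_reserved_connections limits.
Tables whose relfrozenxid value is more than autovacuum_freeze_max_age transactions old are always vacuumed (this also applies to those tables whose freeze max age has been modified via storage parameters; see below). Otherwise, if the number of tuples obsoleted since the last VACUUM exceeds the “vacuum threshold”, the table is vacuumed. The vacuum threshold is defined as:
vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuples
where the vacuum base threshold is autovacuum_vacuum_threshold, the vacuum scale factor is autovacuum_vacuum_scale_factor, and the number of tuples is pg_class.reltuples. The number of obsolete tuples is obtained from the statistics collector; it is a semi-accurate count updated by each UPDATE and DELETE operation. (It is only semi-accurate because some information might be lost under heavy load.) If the relfrozenxid value of the table is more than vacuum_freeze_table_age transactions old, an aggressive vacuum is performed to freeze old tuples and advance relfrozenxid; otherwise, only pages that have been modified since the last vacuum are scanned.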
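The comparison can be approximated from SQL. The sketch below assumes the cluster-wide settings apply to every table (per-table storage parameter overrides are ignored) and reads the dead-tuple estimate from the pg_stat_user_tables view; it approximates the condition described above rather than reproducing autovacuum's internal logic:

```sql
-- Estimated dead tuples versus the computed vacuum threshold, per user table.
SELECT s.relname,
       s.n_dead_tup,
       current_setting('autovacuum_vacuum_threshold')::bigint
         + current_setting('autovacuum_vacuum_scale_factor')::float8 * c.reltuples
         AS vacuum_threshold
FROM pg_stat_user_tables AS s
JOIN pg_class AS c ON c.oid = s.relid
ORDER BY s.n_dead_tup DESC;
```

Tables whose n_dead_tup exceeds vacuum_threshold are the likely candidates for the next autovacuum pass. Note that reltuples is itself an estimate and may be stale for tables that have never been vacuumed or analyzed.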
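For analyze, a similar condition is used: the threshold, defined as:
analyze threshold = analyze base threshold + analyze scale factor * number of tuples
is compared to the total number of tuples inserted, updated, or deleted since the last ANALYZE.
Temporary tables cannot be accessed by autovacuum. Therefore, appropriate vacuum and analyze operations should be performed via session SQL commands.
The default thresholds and scale factors are taken from postgresql.conf, but it is possible to override them (and many other autovacuum control parameters) on a per-table basis; see Storage Parameters for more information. If a setting has been changed via a table's storage parameters, that value is used when processing that table; otherwise the global settings are used. See Section 19.10 for more details on the global settings.
When multiple workers are running, the autovacuum cost delay parameters (see Section 19.4.4) are “balanced” among all the running workers, so that the total I/O impact on the system is the same regardless of the number of workers actually running. However, any workers processing tables whose per-table autovacuum_vacuum_cost_delay or autovacuum_vacuum_cost_limit storage parameters have been set are not considered in the balancing algorithm.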
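In some situations it is worthwhile to rebuild indexes periodically with the REINDEX command or a series of individual rebuilding steps.
B-tree index pages that have become completely empty are reclaimed for reuse. However, there is still a possibility of inefficient use of space: if all but a few index keys on a page have been deleted, the page remains allocated. Therefore, a usage pattern in which most, but not all, keys in each range are eventually deleted will see poor use of space. For such usage patterns, periodic reindexing is recommended.
The potential for bloat in non-B-tree indexes has not been well researched. It is a good idea to periodically monitor the index's physical size when using any non-B-tree index type.
Also, for B-tree indexes, a freshly constructed index is slightly faster to access than one that has been updated many times because logically adjacent pages are usually also physically adjacent in a newly built index. (This consideration does not apply to non-B-tree indexes.) It might be worthwhile to reindex periodically just to improve access speed.
REINDEX can be used safely and easily in all cases. This command requires an ACCESS EXCLUSIVE lock by default, hence it is often preferable to execute it with its CONCURRENTLY option, which requires only a SHARE UPDATE EXCLUSIVE lock.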
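A concurrent rebuild of a single index might be sketched as follows; orders_status_idx is a hypothetical index name:

```sql
-- Rebuild one index without blocking writes to its table.
-- Cannot be run inside a transaction block.
REINDEX INDEX CONCURRENTLY orders_status_idx;
```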
The collation feature allows specifying the sort order and character classification behavior of data per-column, or even per-operation. This alleviates the restriction that the LC_COLLATE and LC_CTYPE settings of a database cannot be changed after its creation.
Conceptually, every expression of a collatable data type has a collation. (The built-in collatable data types are text, varchar, and char. User-defined base types can also be marked collatable, and of course a domain over a collatable data type is collatable.) If the expression is a column reference, the collation of the expression is the defined collation of the column. If the expression is a constant, the collation is the default collation of the data type of the constant. The collation of a more complex expression is derived from the collations of its inputs, as described below.
The collation of an expression can be the “default” collation, which means the locale settings defined for the database. It is also possible for an expression's collation to be indeterminate. In such cases, ordering operations and other operations that need to know the collation will fail.
When the database system has to perform an ordering or a character classification, it uses the collation of the input expression. This happens, for example, with ORDER BY clauses and function or operator calls such as <. The collation to apply for an ORDER BY clause is simply the collation of the sort key. The collation to apply for a function or operator call is derived from the arguments, as described below. In addition to comparison operators, collations are taken into account by functions that convert between lower and upper case letters, such as lower, upper, and initcap; by pattern matching operators; and by to_char and related functions.
For a function or operator call, the collation that is derived by examining the argument collations is used at run time for performing the specified operation. If the result of the function or operator call is of a collatable data type, the collation is also used at parse time as the defined collation of the function or operator expression, in case there is a surrounding expression that requires knowledge of its collation.
The collation derivation of an expression can be implicit or explicit. This distinction affects how collations are combined when multiple different collations appear in an expression. An explicit collation derivation occurs when a COLLATE clause is used; all other collation derivations are implicit. When multiple collations need to be combined, for example in a function call, the following rules are used:
If any input expression has an explicit collation derivation, then all explicitly derived collations among the input expressions must be the same, otherwise an error is raised. If any explicitly derived collation is present, that is the result of the collation combination.
Otherwise, all input expressions must have the same implicit collation derivation or the default collation. If any non-default collation is present, that is the result of the collation combination. Otherwise, the result is the default collation.
If there are conflicting non-default implicit collations among the input expressions, then the combination is deemed to have indeterminate collation. This is not an error condition unless the particular function being invoked requires knowledge of the collation it should apply. If it does, an error will be raised at run-time.
For example, consider this table definition:
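A table of the kind the discussion assumes can be sketched as below. Column a uses de_DE as the text states; the collation chosen for b (es_ES here) is an assumption, since all that matters is that it differs from a's. Both collations must exist in pg_collation on the system in question:

```sql
-- Two text columns with different, and therefore conflicting, implicit collations.
CREATE TABLE test1 (
    a text COLLATE "de_DE",
    b text COLLATE "es_ES"
);
```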
Then in SELECT a < 'foo' FROM test1; the < comparison is performed according to de_DE rules, because the expression combines an implicitly derived collation with the default collation. But in SELECT a < ('foo' COLLATE "fr_FR") FROM test1; the comparison is performed using fr_FR rules, because the explicit collation derivation overrides the implicit one. Furthermore, given SELECT a < b FROM test1; the parser cannot determine which collation to apply, since the a and b columns have conflicting implicit collations. Since the < operator does need to know which collation to use, this will result in an error. The error can be resolved by attaching an explicit collation specifier to either input expression, thus: SELECT a < b COLLATE "de_DE" FROM test1; or equivalently SELECT a COLLATE "de_DE" < b FROM test1; On the other hand, the structurally similar case SELECT a || b FROM test1; does not result in an error, because the || operator does not care about collations: its result is the same regardless of the collation.
The collation assigned to a function or operator's combined input expressions is also considered to apply to the function or operator's result, if the function or operator delivers a result of a collatable data type. So, in SELECT * FROM test1 ORDER BY a || 'foo'; the ordering will be done according to de_DE rules. But this query: SELECT * FROM test1 ORDER BY a || b; results in an error, because even though the || operator doesn't need to know a collation, the ORDER BY clause does. As before, the conflict can be resolved with an explicit collation specifier: SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
A collation is an SQL schema object that maps an SQL name to locales provided by libraries installed in the operating system. A collation definition has a provider that specifies which library supplies the locale data. One standard provider name is libc
, which uses the locales provided by the operating system C library. These are the locales used by most tools provided by the operating system. Another provider is icu
, which uses the external ICU library. ICU locales can only be used if support for ICU was configured when PostgreSQL was built.
A collation object provided by libc maps to a combination of LC_COLLATE and LC_CTYPE settings, as accepted by the setlocale() system library call. (As the name would suggest, the main purpose of a collation is to set LC_COLLATE, which controls the sort order. But it is rarely necessary in practice to have an LC_CTYPE setting that is different from LC_COLLATE, so it is more convenient to collect these under one concept than to create another infrastructure for setting LC_CTYPE per expression.) Also, a libc collation is tied to a character set encoding (see Section 24.3). The same collation name may exist for different encodings.
A collation object provided by icu maps to a named collator provided by the ICU library. ICU does not support separate “collate” and “ctype” settings, so they are always the same. Also, ICU collations are independent of the encoding, so there is always only one ICU collation of a given name in a database.
On all platforms, the collations named default, C, and POSIX are available. Additional collations may be available depending on operating system support. The default collation selects the LC_COLLATE and LC_CTYPE values specified at database creation time. The C and POSIX collations both specify “traditional C” behavior, in which only the ASCII letters “A” through “Z” are treated as letters, and sorting is done strictly by character code byte values.
Additionally, the SQL standard collation name ucs_basic is available for encoding UTF8. It is equivalent to C and sorts by Unicode code point.
If the operating system provides support for using multiple locales within a single program (newlocale and related functions), or if support for ICU is configured, then when a database cluster is initialized, initdb populates the system catalog pg_collation with collations based on all the locales it finds in the operating system at the time.
To inspect the currently available locales, use the query SELECT * FROM pg_collation, or the command \dOS+ in psql.
For example, the operating system might provide a locale named de_DE.utf8. initdb would then create a collation named de_DE.utf8 for encoding UTF8 that has both LC_COLLATE and LC_CTYPE set to de_DE.utf8. It will also create a collation with the .utf8 tag stripped off the name. So you could also use the collation under the name de_DE, which is less cumbersome to write and makes the name less encoding-dependent. Note that, nevertheless, the initial set of collation names is platform-dependent.
The default set of collations provided by libc maps directly to the locales installed in the operating system, which can be listed using the command locale -a. In case a libc collation is needed that has different values for LC_COLLATE and LC_CTYPE, or if new locales are installed in the operating system after the database system was initialized, then a new collation may be created using the CREATE COLLATION command. New operating system locales can also be imported en masse using the pg_import_system_collations() function.
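A minimal sketch of both options follows. The collation name mixed_de_en and the locale names are assumptions that depend on what the operating system provides, and importing collations normally requires superuser privileges.

```sql
-- Import operating system locales that are not yet present in pg_collation
-- into the pg_catalog schema; the function returns the number of collations created.
SELECT pg_import_system_collations('pg_catalog');

-- Create a libc collation whose LC_COLLATE and LC_CTYPE differ
-- (hypothetical name; both locales must exist in the operating system).
CREATE COLLATION mixed_de_en (
    provider = libc,
    lc_collate = 'de_DE.utf8',
    lc_ctype = 'en_US.utf8'
);
```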
Within any particular database, only collations that use that database's encoding are of interest. Other entries in pg_collation are ignored. Thus, a stripped collation name such as de_DE can be considered unique within a given database even though it would not be unique globally. Use of the stripped collation names is recommended, since it will make one fewer thing you need to change if you decide to change to another database encoding. Note however that the default, C, and POSIX collations can be used regardless of the database encoding.
PostgreSQL considers distinct collation objects to be incompatible even when they have identical properties. Thus, for example, comparing an expression marked COLLATE "C" with one marked COLLATE "POSIX" will draw an error even though the C and POSIX collations have identical behaviors. Mixing stripped and non-stripped collation names is therefore not recommended.
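As an illustration, assuming a hypothetical table test1 with text columns a and b, a comparison of this kind is rejected with a collation conflict:

```sql
-- Raises an error: "C" and "POSIX" are distinct collation objects,
-- even though they behave identically.
SELECT a COLLATE "C" < b COLLATE "POSIX" FROM test1;
```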
With ICU, it is not sensible to enumerate all possible locale names. ICU uses a particular naming system for locales, but there are many more ways to name a locale than there are actually distinct locales. initdb uses the ICU APIs to extract a set of distinct locales to populate the initial set of collations. Collations provided by ICU are created in the SQL environment with names in BCP 47 language tag format, with a “private use” extension -x-icu appended, to distinguish them from libc locales.
Here are some example collations that might be created:
de-x-icu
German collation, default variant
de-AT-x-icu
German collation for Austria, default variant (There are also, say, de-DE-x-icu or de-CH-x-icu, but as of this writing, they are equivalent to de-x-icu.)
und-x-icu (for “undefined”)
ICU “root” collation. Use this to get a reasonable language-agnostic sort order.
Some (less frequently used) encodings are not supported by ICU. When the database encoding is one of these, ICU collation entries in pg_collation are ignored. Attempting to use one will draw an error along the lines of “collation "de-x-icu" for encoding "WIN874" does not exist”.
If the standard and predefined collations are not sufficient, users can create their own collation objects using the SQL command CREATE COLLATION.
The standard and predefined collations are in the schema pg_catalog, like all predefined objects. User-defined collations should be created in user schemas. This also ensures that they are saved by pg_dump.
New libc collations can be created like this:
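A minimal sketch, assuming the operating system provides a de_DE locale; the collation name german is only illustrative:

```sql
-- Create a libc-backed collation from an operating system locale.
CREATE COLLATION german (provider = libc, locale = 'de_DE');
```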
The exact values that are acceptable for the locale clause in this command depend on the operating system. On Unix-like systems, the command locale -a will show a list.
Since the predefined libc collations already include all collations defined in the operating system when the database instance is initialized, it is not often necessary to manually create new ones. Reasons might be if a different naming system is desired (in which case see also Section 24.2.2.3.3) or if the operating system has been upgraded to provide new locale definitions (in which case see also pg_import_system_collations()).
ICU allows collations to be customized beyond the basic language+country set that is preloaded by initdb. Users are encouraged to define their own collation objects that make use of these facilities to suit the sorting behavior to their requirements. See https://unicode-org.github.io/icu/userguide/locale/ and https://unicode-org.github.io/icu/userguide/collation/api.html for information on ICU locale naming. The set of acceptable names and attributes depends on the particular ICU version.
Here are some examples:
CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de-u-co-phonebk');
CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de@collation=phonebook');
German collation with phone book collation type
The first example selects the ICU locale using a “language tag” per BCP 47. The second example uses the traditional ICU-specific locale syntax. The first style is preferred going forward, but it is not supported by older ICU versions.
Note that you can name the collation objects in the SQL environment anything you want. In this example, we follow the naming style that the predefined collations use, which in turn also follow BCP 47, but that is not required for user-defined collations.
CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = 'und-u-co-emoji');
CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = '@collation=emoji');
Root collation with Emoji collation type, per Unicode Technical Standard #51
Observe how in the traditional ICU locale naming system, the root locale is selected by an empty string.
CREATE COLLATION latinlast (provider = icu, locale = 'en-u-kr-grek-latn');
CREATE COLLATION latinlast (provider = icu, locale = 'en@colReorder=grek-latn');
Sort Greek letters before Latin ones. (The default is Latin before Greek.)
CREATE COLLATION upperfirst (provider = icu, locale = 'en-u-kf-upper');
CREATE COLLATION upperfirst (provider = icu, locale = 'en@colCaseFirst=upper');
Sort upper-case letters before lower-case letters. (The default is lower-case letters first.)
CREATE COLLATION special (provider = icu, locale = 'en-u-kf-upper-kr-grek-latn');
CREATE COLLATION special (provider = icu, locale = 'en@colCaseFirst=upper;colReorder=grek-latn');
Combines both of the above options.
CREATE COLLATION numeric (provider = icu, locale = 'en-u-kn-true');
CREATE COLLATION numeric (provider = icu, locale = 'en@colNumeric=yes');
Numeric ordering, sorts sequences of digits by their numeric value, for example: A-21 < A-123 (also known as natural sort).
See Unicode Technical Standard #35 and BCP 47 for details. The list of possible collation types (co subtag) can be found in the CLDR repository.
Note that while this system allows creating collations that “ignore case” or “ignore accents” or similar (using the ks key), in order for such collations to act in a truly case- or accent-insensitive manner, they also need to be declared as not deterministic in CREATE COLLATION; see Section 24.2.2.4. Otherwise, any strings that compare equal according to the collation but are not byte-wise equal will be sorted according to their byte values.
By design, ICU will accept almost any string as a locale name and match it to the closest locale it can provide, using the fallback procedure described in its documentation. Thus, there will be no direct feedback if a collation specification is composed using features that the given ICU installation does not actually support. It is therefore recommended to create application-level test cases to check that the collation definitions satisfy one's requirements.
The command CREATE COLLATION can also be used to create a new collation from an existing collation, which can be useful to be able to use operating-system-independent collation names in applications, create compatibility names, or use an ICU-provided collation under a more readable name. For example:
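A minimal sketch, assuming the collations de_DE and fr-x-icu exist in the database; the new names german and french are illustrative:

```sql
-- Copy existing collations under operating-system-independent, more readable names.
CREATE COLLATION german FROM "de_DE";
CREATE COLLATION french FROM "fr-x-icu";
```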
A collation is either deterministic or nondeterministic. A deterministic collation uses deterministic comparisons, which means that it considers strings to be equal only if they consist of the same byte sequence. Nondeterministic comparison may determine strings to be equal even if they consist of different bytes. Typical situations include case-insensitive comparison, accent-insensitive comparison, as well as comparison of strings in different Unicode normal forms. It is up to the collation provider to actually implement such insensitive comparisons; the deterministic flag only determines whether ties are to be broken using bytewise comparison. See also Unicode Technical Standard 10 for more information on the terminology.
To create a nondeterministic collation, specify the property deterministic = false to CREATE COLLATION. Applied to the ICU root locale, this uses the standard Unicode collation in a nondeterministic way. In particular, it allows strings in different normal forms to be compared correctly. More interesting cases make use of the ICU customization facilities explained above, for instance to build case-insensitive or accent-insensitive collations.
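The following sketch shows both kinds of definition. It assumes a server built with ICU support; the collation names ndcoll, case_insensitive, and ignore_accents are illustrative.

```sql
-- Standard Unicode (root locale) collation, compared nondeterministically.
CREATE COLLATION ndcoll (provider = icu, locale = 'und', deterministic = false);

-- ICU customizations combined with nondeterministic comparison.
CREATE COLLATION case_insensitive (provider = icu, locale = 'und-u-ks-level2', deterministic = false);
CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-true', deterministic = false);

-- With case_insensitive, strings that differ only in case should compare as equal.
SELECT 'abc' = 'ABC' COLLATE case_insensitive;
```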
All standard and predefined collations are deterministic, and all user-defined collations are deterministic by default. While nondeterministic collations give a more “correct” behavior, especially when considering the full power of Unicode and its many special cases, they also have some drawbacks. Foremost, their use leads to a performance penalty. Note, in particular, that B-tree cannot use deduplication with indexes that use a nondeterministic collation. Also, certain operations are not possible with nondeterministic collations, such as pattern matching operations. Therefore, they should be used only in cases where they are specifically wanted.
To deal with text in different Unicode normalization forms, it is also an option to use the functions/expressions normalize and is normalized to preprocess or check the strings, instead of using nondeterministic collations. There are different trade-offs for each approach.
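A brief sketch of the two expressions, assuming a UTF8 database on a server version that provides them; the sample string is "a" followed by a combining diaeresis and "bc":

```sql
-- Convert a string to normalization form NFC.
SELECT normalize(U&'\0061\0308bc', NFC);

-- Check whether a string is already in normalization form NFD.
SELECT U&'\0061\0308bc' IS NFD NORMALIZED;
```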
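Locale support refers to an application respecting cultural preferences regarding alphabets, sorting, number formatting, and so on. PostgreSQL uses the standard ISO C and POSIX locale facilities provided by the server operating system. For additional information, refer to the documentation of your operating system.
Locale support is automatically initialized when a database cluster is created using initdb. By default, initdb initializes the database cluster with the locale setting of its execution environment, so if your system is already set to use the locale that you want in your database cluster, there is nothing else you need to do. If you want to use a different locale (or you are not sure which locale your system is set to), you can instruct initdb exactly which locale to use by specifying the --locale option. For example: initdb --locale=sv_SE
This example for Unix systems sets the locale to Swedish (sv) as spoken in Sweden (SE). Other possibilities might include en_US (U.S. English) and fr_CA (French Canadian). If more than one character set can be used for a locale, the specification can take the form language_territory.codeset. For example, fr_BE.UTF-8 represents the French language (fr) as spoken in Belgium (BE), with a UTF-8 character set encoding.
Which locales are available on your system, and under what names, depends on what was provided by the operating system vendor and what was installed. On most Unix systems, the command locale -a will provide a list of available locales. Windows uses more verbose locale names, such as German_Germany or Swedish_Sweden.1252, but the principles are the same.
Occasionally it is useful to mix rules from several locales, for example to use English collation rules but Spanish messages. To support that, a set of locale subcategories exists that control only certain aspects of the localization rules: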
LC_COLLATE
String sort order
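LC_CTYPE
Character classification (What is a letter? Its upper-case equivalent?)
LC_MESSAGES
Language of messages
LC_MONETARY
Formatting of currency amounts
LC_NUMERIC
Formatting of numbers
LC_TIME
Formatting of dates and times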
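The category names translate into names of initdb options to override the locale choice for a specific category. For instance, to set the locale to French Canadian, but use U.S. rules for formatting currency, use initdb --locale=fr_CA --lc-monetary=en_US.
If you want the system to behave as if it had no locale support, use the special locale name C, or equivalently POSIX.
Some locale categories must have their values fixed when the database is created. You can use different settings for different databases, but once a database is created, you cannot change them for that database anymore. LC_COLLATE and LC_CTYPE are these categories. They affect the sort order of indexes, so they must be kept fixed, or indexes on text columns would become corrupt. (But you can alleviate this restriction using collations, as discussed in Section 24.2.) The default values for these categories are determined when initdb is run, and those values are used when new databases are created, unless specified otherwise in the CREATE DATABASE command.
The other locale categories can be changed whenever desired by setting the server configuration parameters that have the same name as the locale categories (see Section 20.11.2 for details). The values that are chosen by initdb are actually only written into the configuration file postgresql.conf to serve as defaults when the server is started. If you remove these assignments from postgresql.conf, the server will inherit the settings from its execution environment.
Note that the locale behavior of the server is determined by the environment variables seen by the server, not by the environment of any client. Therefore, be careful to configure the correct locale settings before starting the server. A consequence of this is that if client and server are set up in different locales, messages might appear in different languages depending on where they originated.
Note: When we speak of inheriting the locale from the execution environment, this means the following on most operating systems: for a given locale category, say the collation, the following environment variables are consulted in this order until one is found to be set: LC_ALL, LC_COLLATE (or the variable corresponding to the respective category), LANG. If none of these environment variables are set, the locale defaults to C.
Some message localization libraries also look at the environment variable LANGUAGE, which overrides all other locale settings for the purpose of setting the language of messages. If in doubt, refer to the documentation of your operating system, in particular the documentation about gettext.
To enable messages to be translated to the user's preferred language, NLS must have been selected at build time (configure --enable-nls). All other locale support is built in automatically.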
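The locale settings influence the following SQL features:
Sort order in queries using ORDER BY or the standard comparison operators on textual data
The upper, lower, and initcap functions
Pattern matching operators (LIKE, SIMILAR TO, and POSIX-style regular expressions); locales affect both case-insensitive matching and the classification of characters by character-class regular expressions
The to_char family of functions
The ability to use indexes with LIKE clauses
The drawback of using locales other than C or POSIX in PostgreSQL is the performance impact. It slows character handling and prevents ordinary indexes from being used by LIKE. For this reason, use locales only if you actually need them.
As a workaround to allow PostgreSQL to use indexes with LIKE clauses under a non-C locale, several custom operator classes exist. These allow the creation of an index that performs a strict character-by-character comparison, ignoring locale comparison rules. Refer to Section 11.9 for more information. Another approach is to create indexes using the C collation, as discussed in Section 24.2.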
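Both workarounds can be sketched as follows. The table customers, its column name, and the index names are hypothetical; text_pattern_ops is one of the pattern operator classes for text columns.

```sql
-- Hypothetical table in a database that uses a non-C locale.
CREATE TABLE customers (name text);

-- Option 1: an index using a pattern operator class (character-by-character comparison).
CREATE INDEX customers_name_pattern_idx ON customers (name text_pattern_ops);

-- Option 2: an index built with the C collation.
CREATE INDEX customers_name_c_idx ON customers (name COLLATE "C");

-- A left-anchored pattern like this can then be served by either index.
SELECT * FROM customers WHERE name LIKE 'Sm%';
```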
Locales can be selected in different scopes depending on requirements. The above overview showed how locales are specified using initdb to set the defaults for the entire cluster. The following list shows where locales can be selected. Each item provides the defaults for the subsequent items, and each lower item allows overriding the defaults on a finer granularity.
As explained above, the environment of the operating system provides the defaults for the locales of a newly initialized database cluster. In many cases, this is enough: If the operating system is configured for the desired language/territory, then PostgreSQL will by default also behave according to that locale.
As shown above, command-line options for initdb specify the locale settings for a newly initialized database cluster. Use this if the operating system does not have the locale configuration you want for your database system.
A locale can be selected separately for each database. The SQL command CREATE DATABASE and its command-line equivalent createdb have options for that. Use this for example if a database cluster houses databases for multiple tenants with different requirements.
Locale settings can be made for individual table columns. This uses an SQL object called collation and is explained in Section 24.2. Use this for example to sort data in different languages or customize the sort order of a particular table.
Finally, locales can be selected for an individual query. Again, this uses SQL collation objects. This could be used to change the sort order based on run-time choices or for ad-hoc experimentation.
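The three finer-grained scopes in the list above (database, column, and query) might look like this in SQL. This is a sketch under stated assumptions: the database name tenant_sv, the table products, and the locale names are illustrative and must match locales that are actually available on the server.

```sql
-- Per database: fix LC_COLLATE and LC_CTYPE at creation time.
-- template0 is used because the locale differs from that of the default template.
CREATE DATABASE tenant_sv
    TEMPLATE template0
    LC_COLLATE 'sv_SE.utf8'
    LC_CTYPE 'sv_SE.utf8';

-- Per column: attach a collation to an individual table column.
CREATE TABLE products (name text COLLATE "de_DE");

-- Per query: override the collation for a single sort.
SELECT name FROM products ORDER BY name COLLATE "C";
```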
PostgreSQL supports multiple locale providers. This specifies which library supplies the locale data. One standard provider name is libc, which uses the locales provided by the operating system C library. These are the locales used by most tools provided by the operating system. Another provider is icu, which uses the external ICU library. ICU locales can only be used if support for ICU was configured when PostgreSQL was built.
The commands and tools that select the locale settings, as described above, each have an option to select the locale provider. The examples shown earlier all use the libc provider, which is the default. To initialize a database cluster using the ICU provider instead, pass initdb the option --locale-provider=icu together with --icu-locale and the desired ICU locale, for example --icu-locale=en.
See the description of the respective commands and programs for details. Note that you can mix locale providers at different granularities, for example use libc by default for the cluster but have one database that uses the icu provider, and then have collation objects using either provider within those databases.
Which locale provider to use depends on individual requirements. For most basic uses, either provider will give adequate results. For the libc provider, it depends on what the operating system offers; some operating systems are better than others. For advanced uses, ICU offers more locale variants and customization options.
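If locale support doesn't work according to the explanation above, check that the locale support in your operating system is correctly configured. To check what locales are installed on your system, you can use the command locale -a if your operating system provides it.
Check that PostgreSQL is actually using the locale that you think it is. The LC_COLLATE and LC_CTYPE settings are determined when a database is created, and cannot be changed except by creating a new database. Other locale settings, including LC_MESSAGES and LC_MONETARY, are initially determined by the environment the server is started in, but can be changed on the fly. You can check the active locale settings using the SHOW command.
The directory src/test/locale in the source distribution contains a test suite for PostgreSQL's locale support.
Client applications that handle server-side errors by parsing the text of the error message will obviously have problems when the server's messages are in a different language. Authors of such applications are advised to make use of the error code scheme instead.
Maintaining catalogs of message translations requires the ongoing efforts of many volunteers who want to see PostgreSQL speak their preferred language well. If messages in your language are currently not available or not fully translated, your assistance would be appreciated. If you want to help, refer to Chapter 54 or write to the developers' mailing list.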
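As a quick check, the settings that can be changed at run time can be displayed with SHOW, and the per-database collation settings can be read from the pg_database catalog. A minimal sketch:

```sql
-- Locale categories that can be changed on the fly.
SHOW lc_messages;
SHOW lc_monetary;
SHOW lc_numeric;
SHOW lc_time;

-- LC_COLLATE and LC_CTYPE as fixed for each database at creation time.
SELECT datname, datcollate, datctype FROM pg_database;
```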
It is a good idea to save the database server's log output somewhere, rather than just discarding it via /dev/null. The log output is invaluable when diagnosing problems. However, the log output tends to be voluminous (especially at higher debug levels) so you won't want to save it indefinitely. You need to rotate the log files so that new log files are started and old ones removed after a reasonable period of time.
If you simply direct the stderr of postgres into a file, you will have log output, but the only way to truncate the log file is to stop and restart the server. This might be acceptable if you are using PostgreSQL in a development environment, but few production servers would find this behavior acceptable.
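A better approach is to send the server's stderr output to some type of log rotation program. There is a built-in log rotation facility, which you can use by setting the configuration parameter logging_collector to true in postgresql.conf. The control parameters for this program are described in Section 19.8.1. You can also use this approach to capture the log data in machine-readable CSV (comma-separated values) format.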
Alternatively, you might prefer to use an external log rotation program if you have one that you are already using with other server software. For example, the rotatelogs tool included in the Apache distribution can be used with PostgreSQL. One way to do this is to pipe the server's stderr output to the desired program. If you start the server with pg_ctl, then stderr is already redirected to stdout, so you just need a pipe command that feeds the output of pg_ctl start into rotatelogs.
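You can combine these approaches by setting up logrotate to collect log files produced by the PostgreSQL built-in logging collector. In this case, the logging collector defines the names and location of the log files, while logrotate periodically archives these files. When initiating log rotation, logrotate must ensure that the application sends further output to the new file. This is commonly done with a postrotate script that sends a SIGHUP signal to the application, which then reopens the log file. In PostgreSQL, you can run pg_ctl with the logrotate option instead. When the server receives this command, it either switches to a new log file or reopens the existing file, depending on the logging configuration (see Section 19.8.1).
When using static log file names, the server might fail to reopen the log file if the maximum open file limit is reached or a file table overflow occurs. In this case, log messages are sent to the old log file until a successful log rotation. If logrotate is configured to compress the log file and delete it, the server may lose the messages logged in this time frame. To avoid this issue, you can configure the logging collector to dynamically assign log file names and use a prerotate script to ignore open log files.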
Another production-grade approach to managing log output is to send it to syslog and let syslog deal with file rotation. To do this, set the configuration parameter log_destination to syslog (to log to syslog only) in postgresql.conf. Then you can send a SIGHUP signal to the syslog daemon whenever you want to force it to start writing a new log file. If you want to automate log rotation, the logrotate program can be configured to work with log files from syslog.
On many systems, however, syslog is not very reliable, particularly with large log messages; it might truncate or drop messages just when you need them the most. Also, on Linux, syslog will flush each message to disk, yielding poor performance. (You can use a “-” at the start of the file name in the syslog configuration file to disable syncing.)
Note that all the solutions described above take care of starting new log files at configurable intervals, but they do not handle deletion of old, no-longer-useful log files. You will probably want to set up a batch job to periodically delete old log files. Another possibility is to configure the rotation program so that old log files are overwritten cyclically.
pgBadger is an external project that does sophisticated log file analysis. check_postgres provides Nagios alerts when important messages appear in the log files, as well as detection of many other extraordinary conditions.
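Database servers can work together to allow a second server to take over quickly if the primary server fails (high availability), or to allow several computers to serve the same data (load balancing). Ideally, database servers could work together seamlessly. Web servers serving static web pages can be combined quite easily by merely load-balancing web requests to multiple machines. In fact, read-only database servers can be combined relatively easily too. Unfortunately, most database servers have a read/write mix of requests, and read/write servers are much harder to combine. This is because, although read-only data needs to be placed on each server only once, a write to any server has to be propagated to all servers so that future read requests to those servers return consistent results.
This synchronization problem is the fundamental difficulty for servers working together. Because there is no single solution that eliminates the impact of the synchronization problem for all use cases, there are multiple solutions. Each solution addresses this problem in a different way, and minimizes its impact for a specific workload.
Some solutions deal with synchronization by allowing only one server to modify the data. Servers that can modify data are called read/write, master, or primary servers. Servers that track changes in the master are called standby or secondary servers. A standby server that cannot be connected to until it is promoted to a master server is called a warm standby server, and one that can accept connections and serves read-only queries is called a hot standby server.
Some solutions are synchronous, meaning that a data-modifying transaction is not considered committed until all servers have committed the transaction. This guarantees that a failover will not lose any data and that all load-balanced servers will return consistent results no matter which server is queried. In contrast, asynchronous solutions allow some delay between the time of a commit and its propagation to the other servers, opening the possibility that some transactions might be lost in the switch to a backup server, and that load-balanced servers might return slightly stale results. Asynchronous solutions are used when synchronous ones would be too slow.
Solutions can also be categorized by their granularity. Some solutions can deal only with an entire database server, while others allow control at the per-table or per-database level.
Performance must be considered in any choice. There is usually a trade-off between functionality and performance. For example, a fully synchronous solution might cut performance by more than half, while an asynchronous one might have a minimal performance impact.
The remainder of this section outlines various failover, replication, and load balancing solutions.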
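Continuous archiving can be used to create a high availability (HA) cluster configuration with one or more standby servers ready to take over operations if the primary server fails. This capability is widely referred to as warm standby or log shipping.
The primary and standby servers work together to provide this capability, though the servers are only loosely coupled. The primary server operates in continuous archiving mode, while each standby server operates in continuous recovery mode, reading the WAL files from the primary. No changes to the database tables are required to enable this capability, so it offers low administration overhead compared to some other replication solutions. This configuration also has relatively low performance impact on the primary server.
Directly moving WAL records from one database server to another is typically described as log shipping. PostgreSQL implements file-based log shipping by transferring WAL records one file (WAL segment) at a time. WAL files (16MB) can be shipped easily and cheaply over any distance, whether it be to an adjacent system, another system at the same site, or another system on the far side of the globe. The bandwidth required for this technique varies according to the transaction rate of the primary server. Record-based log shipping is more granular and streams WAL changes incrementally over a network connection (see Section 26.2.5).
It should be noted that log shipping is asynchronous, that is, the WAL records are shipped after transaction commit. As a result, there is a window for data loss should the primary server suffer a catastrophic failure; transactions not yet shipped will be lost. The size of the data loss window in file-based log shipping can be limited by use of the archive_timeout parameter, which can be set as low as a few seconds. However, such a low setting will substantially increase the bandwidth required for file shipping. Streaming replication (see Section 26.2.5) allows a much smaller window of data loss.
Recovery performance is sufficiently good that the standby will typically be only moments away from full availability once it has been activated. As a result, this is called a warm standby configuration, which offers high availability. Restoring a server from an archived base backup and rollforward will take considerably longer, so that technique only offers a solution for disaster recovery, not high availability. A standby server can also be used for read-only queries, in which case it is called a hot standby server. See Section 26.5 for more information.
It is usually wise to create the primary and standby servers so that they are as similar as possible, at least from the perspective of the database server. In particular, the path names associated with tablespaces will be passed across unmodified, so both primary and standby servers must have the same mount paths for tablespaces if that feature is used. Keep in mind that if CREATE TABLESPACE is executed on the primary, any new mount point needed for it must be created on the primary and all standby servers before the command is executed. Hardware need not be exactly the same, but experience shows that maintaining two identical systems is easier than maintaining two dissimilar ones over the lifetime of the application and system. In any case the hardware architecture must be the same: shipping from, say, a 32-bit to a 64-bit system will not work.
In general, log shipping between servers running different major PostgreSQL release levels is not possible. It is the policy of the PostgreSQL Global Development Group not to make changes to disk formats during minor release upgrades, so it is likely that running different minor release levels on primary and standby servers will work successfully. However, no formal support for that is offered and you are advised to keep primary and standby servers at the same release level as much as possible. When updating to a new minor release, the safest policy is to update the standby servers first, because a new minor release is more likely to be able to read WAL files from a previous minor release than vice versa.
In standby mode, the server continuously applies WAL received from the primary server. The standby server can read WAL from a WAL archive (see restore_command) or directly from the primary over a TCP connection (streaming replication). The standby server will also attempt to restore any WAL found in the standby cluster's pg_wal directory. That typically happens after a server restart, when the standby replays again WAL that was streamed from the primary before the restart, but you can also manually copy files to pg_wal at any time to have them replayed.
At startup, the standby begins by restoring all WAL available in the archive location, calling restore_command. Once it reaches the end of WAL available there and restore_command fails, it tries to restore any WAL available in the pg_wal directory. If that fails, and streaming replication has been configured, the standby tries to connect to the primary server and start streaming WAL from the last valid record found in the archive or pg_wal. If that fails or streaming replication is not configured, or if the connection is later disconnected, the standby goes back to step 1 and tries to restore the file from the archive again. This loop of retries from the archive, pg_wal, and via streaming replication goes on until the server is stopped or failover is triggered.
Standby mode is exited and the server switches to normal operation when pg_ctl promote is run or a trigger file is found (trigger_file). Before failover, any WAL immediately available in the archive or in pg_wal will be restored, but no attempt is made to connect to the primary.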
Set up continuous archiving on the primary to an archive directory accessible from the standby, as described in Section 25.3. The archive location should be accessible from the standby even when the master is down, i.e. it should reside on the standby server itself or another trusted server, not on the master server.
If you want to use streaming replication, set up authentication on the primary server to allow replication connections from the standby server(s); that is, create a role and provide a suitable entry or entries in pg_hba.conf with the database field set to replication. Also ensure max_wal_senders is set to a sufficiently large value in the configuration file of the primary server. If replication slots will be used, ensure that max_replication_slots is set sufficiently high as well.
Take a base backup as described in Section 25.3.2 to bootstrap the standby server.
To set up the standby server, restore the base backup taken from the primary server (see Section 25.3.4). Create a recovery command file recovery.conf in the standby's cluster data directory, and turn on standby_mode. Set restore_command to a simple command to copy files from the WAL archive. If you plan to have multiple standby servers for high availability purposes, set recovery_target_timeline to latest, to make the standby server follow the timeline change that occurs at failover to another standby.
Do not use pg_standby or similar tools with the built-in standby mode described here. restore_command should return immediately if the file does not exist; the server will retry the command again if necessary. See Section 26.4 for using tools like pg_standby.
If you want to use streaming replication, fill in primary_conninfo with a libpq connection string, including the host name (or IP address) and any additional details needed to connect to the primary server. If the primary needs a password for authentication, the password needs to be specified in primary_conninfo as well.
If you're setting up the standby server for high availability purposes, set up WAL archiving, connections and authentication like the primary server, because the standby server will work as a primary server after failover.
If you're using a WAL archive, its size can be minimized using the archive_cleanup_command parameter to remove files that are no longer required by the standby server. The pg_archivecleanup utility is designed specifically to be used with archive_cleanup_command in typical single-standby configurations; see pg_archivecleanup. Note however, that if you're using the archive for backup purposes, you need to retain files needed to recover from at least the latest base backup, even if they're no longer needed by the standby.
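A simple recovery.conf turns on standby_mode, sets primary_conninfo to the connection string of the primary, sets restore_command to a command that copies files from the WAL archive, and optionally sets archive_cleanup_command to remove archived files that the standby no longer needs.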
You can have any number of standby servers, but if you use streaming replication, make sure you set max_wal_senders high enough in the primary to allow them to be connected simultaneously.
Streaming replication allows a standby server to stay more up-to-date than is possible with file-based log shipping. The standby connects to the primary, which streams WAL records to the standby as they're generated, without waiting for the WAL file to be filled.
Streaming replication is asynchronous by default (see Section 26.2.8), in which case there is a small delay between committing a transaction in the primary and the changes becoming visible in the standby. This delay is however much smaller than with file-based log shipping, typically under one second assuming the standby is powerful enough to keep up with the load. With streaming replication, archive_timeout is not required to reduce the data loss window.
If you use streaming replication without file-based continuous archiving, the server might recycle old WAL segments before the standby has received them. If this occurs, the standby will need to be reinitialized from a new base backup. You can avoid this by setting wal_keep_segments to a value large enough to ensure that WAL segments are not recycled too early, or by configuring a replication slot for the standby. If you set up a WAL archive that's accessible from the standby, these solutions are not required, since the standby can always use the archive to catch up provided it retains enough segments.
To use streaming replication, set up a file-based log-shipping standby server as described in Section 26.2. The step that turns a file-based log-shipping standby into a streaming replication standby is setting primary_conninfo in the recovery.conf file to point to the primary server. Set listen_addresses and authentication options (see pg_hba.conf) on the primary so that the standby server can connect to the replication pseudo-database on the primary server (see Section 26.2.5.1).
On systems that support the keepalive socket option, setting tcp_keepalives_idle, tcp_keepalives_interval and tcp_keepalives_count helps the primary promptly notice a broken connection.
Set the maximum number of concurrent connections from the standby servers (see max_wal_senders for details).
When the standby is started and primary_conninfo is set correctly, the standby will connect to the primary after replaying all WAL files available in the archive. If the connection is established successfully, you will see a walreceiver process in the standby, and a corresponding walsender process in the primary.
It is very important that the access privileges for replication be set up so that only trusted users can read the WAL stream, because it is easy to extract privileged information from it. Standby servers must authenticate to the primary as a superuser or an account that has the REPLICATION privilege. It is recommended to create a dedicated user account with REPLICATION and LOGIN privileges for replication. While REPLICATION privilege gives very high permissions, it does not allow the user to modify any data on the primary system, which the SUPERUSER privilege does.
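A dedicated replication account of this kind could be created as sketched below. The role name replicator and the password are placeholders chosen for illustration, not values used elsewhere in this chapter.

```sql
-- Run on the primary: a login role that may open replication connections
-- but is not a superuser. Replace the placeholder password before use.
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'change-me';
```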
Client authentication for replication is controlled by a pg_hba.conf record specifying replication in the database field. For example, if the standby is running on host IP 192.168.1.100 and the account name for replication is foo, the administrator can add a host record to the pg_hba.conf file on the primary with database replication, user foo, address 192.168.1.100/32, and a password-based authentication method such as md5.
The host name and port number of the primary, connection user name, and password are specified in the recovery.conf file. The password can also be set in the ~/.pgpass file on the standby (specify replication in the database field). For example, if the primary is running on host IP 192.168.1.50, port 5432, the account name for replication is foo, and the password is foopass, the administrator can set primary_conninfo in the recovery.conf file on the standby to a connection string containing host=192.168.1.50, port=5432, user=foo, and password=foopass.
An important health indicator of streaming replication is the amount of WAL records generated in the primary, but not yet applied in the standby. You can calculate this lag by comparing the current WAL write location on the primary with the last WAL location received by the standby. These locations can be retrieved using pg_current_wal_lsn on the primary and pg_last_wal_receive_lsn on the standby, respectively (see Table 9.79 and Table 9.80 for details). The last WAL receive location in the standby is also displayed in the process status of the WAL receiver process, displayed using the ps command (see Section 28.1 for details).
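The comparison can be sketched as follows. The two LSN literals in the last query are placeholders standing in for the values returned by the first two queries; pg_wal_lsn_diff returns the difference between two WAL locations in bytes.

```sql
-- On the primary: current WAL write location.
SELECT pg_current_wal_lsn();

-- On the standby: last WAL location received from the primary.
SELECT pg_last_wal_receive_lsn();

-- On either server: lag in bytes between the two values (placeholder LSNs).
SELECT pg_wal_lsn_diff('0/3000148', '0/3000060') AS lag_bytes;
```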
You can retrieve a list of WAL sender processes via the pg_stat_replication view. Large differences between pg_current_wal_lsn and the view's sent_lsn field might indicate that the master server is under heavy load, while differences between sent_lsn and pg_last_wal_receive_lsn on the standby might indicate network delay, or that the standby is under heavy load.
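A minimal monitoring query along these lines, run on the primary, might look like this. The column alias unsent_bytes is illustrative.

```sql
-- One row per WAL sender; unsent_bytes is WAL written on the primary
-- but not yet sent to that standby.
SELECT application_name,
       client_addr,
       state,
       sent_lsn,
       pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn) AS unsent_bytes
FROM pg_stat_replication;
```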
Replication slots provide an automated way to ensure that the master does not remove WAL segments until they have been received by all standbys, and that the master does not remove rows which could cause a recovery conflict even when the standby is disconnected.
In lieu of using replication slots, it is possible to prevent the removal of old WAL segments using wal_keep_size, or by storing the segments in an archive using archive_command. However, these methods often result in retaining more WAL segments than required, whereas replication slots retain only the number of segments known to be needed. On the other hand, replication slots can retain so many WAL segments that they fill up the space allocated for pg_wal; max_slot_wal_keep_size limits the size of WAL files retained by replication slots.
Similarly, hot_standby_feedback and vacuum_defer_cleanup_age provide protection against relevant rows being removed by vacuum, but the former provides no protection during any time period when the standby is not connected, and the latter often needs to be set to a high value to provide adequate protection. Replication slots overcome these disadvantages.
Each replication slot has a name, which can contain lower-case letters, numbers, and the underscore character.
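A deployment script might check a proposed slot name against this rule before asking the server to create the slot. The following is a small sketch of that check only; the helper name is hypothetical, and any length limit the server enforces is not modeled here.

```python
import re

# Lower-case letters, digits and underscore, as stated above.
_SLOT_NAME = re.compile(r"[a-z0-9_]+")


def is_valid_slot_name(name: str) -> bool:
    """Return True if name uses only the characters allowed in slot names."""
    return _SLOT_NAME.fullmatch(name) is not None
```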
Existing replication slots and their state can be seen in the pg_replication_slots view.
Slots can be created and dropped either via the streaming replication protocol (see Section 52.4) or via SQL functions (see Section 9.27.6).
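You can create a replication slot like this (the slot name node_a_slot is only an example):
SELECT * FROM pg_create_physical_replication_slot('node_a_slot');
To configure the standby to use this slot, primary_slot_name should be configured on the standby. Here is a simple example:
primary_slot_name = 'node_a_slot'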
The cascading replication feature allows a standby server to accept replication connections and stream WAL records to other standbys, acting as a relay. This can be used to reduce the number of direct connections to the master and also to minimize inter-site bandwidth overheads.
A standby acting as both a receiver and a sender is known as a cascading standby. Standbys that are more directly connected to the master are known as upstream servers, while those standby servers further away are downstream servers. Cascading replication does not place limits on the number or arrangement of downstream servers, though each standby connects to only one upstream server which eventually links to a single master/primary server.
A cascading standby sends not only WAL records received from the master but also those restored from the archive. So even if the replication connection in some upstream connection is terminated, streaming replication continues downstream for as long as new WAL records are available.
Cascading replication is currently asynchronous. Synchronous replication (see Section 26.2.8) settings have no effect on cascading replication at present.
Hot Standby feedback propagates upstream, whatever the cascaded arrangement.
If an upstream standby server is promoted to become the new master, downstream servers will continue to stream from the new master if recovery_target_timeline is set to 'latest' (the default).
To use cascading replication, set up the cascading standby so that it can accept replication connections (that is, set max_wal_senders and hot_standby, and configure host-based authentication). You will also need to set primary_conninfo in the downstream standby to point to the cascading standby.
PostgreSQL streaming replication is asynchronous by default. If the primary server crashes then some transactions that were committed may not have been replicated to the standby server, causing data loss. The amount of data loss is proportional to the replication delay at the time of failover.
Synchronous replication offers the ability to confirm that all changes made by a transaction have been transferred to one or more synchronous standby servers. This extends the standard level of durability offered by a transaction commit. This level of protection is referred to as 2-safe replication in computer science theory, and group-1-safe (group-safe and 1-safe) when synchronous_commit is set to remote_write.
When requesting synchronous replication, each commit of a write transaction will wait until confirmation is received that the commit has been written to the write-ahead log on disk of both the primary and standby server. The only possibility that data can be lost is if both the primary and the standby suffer crashes at the same time. This can provide a much higher level of durability, though only if the sysadmin is cautious about the placement and management of the two servers. Waiting for confirmation increases the user's confidence that the changes will not be lost in the event of server crashes, but it also necessarily increases the response time for the requesting transaction. The minimum wait time is the round-trip time between the primary and the standby.
Read only transactions and transaction rollbacks need not wait for replies from standby servers. Subtransaction commits do not wait for responses from standby servers, only top-level commits. Long running actions such as data loading or index building do not wait until the very final commit message. All two-phase commit actions require commit waits, including both prepare and commit.
A synchronous standby can be a physical replication standby or a logical replication subscriber. It can also be any other physical or logical WAL replication stream consumer that knows how to send the appropriate feedback messages. Besides the built-in physical and logical replication systems, this includes special programs such as pg_receivewal and pg_recvlogical as well as some third-party replication systems and custom programs. Check the respective documentation for details on synchronous replication support.
Once streaming replication has been configured, configuring synchronous replication requires only one additional configuration step: synchronous_standby_names must be set to a non-empty value. synchronous_commit must also be set to on, but since this is the default value, typically no change is required. (See Section 19.5.1 and Section 19.6.2.) This configuration will cause each commit to wait for confirmation that the standby has written the commit record to durable storage. synchronous_commit can be set by individual users, so it can be configured in the configuration file, for particular users or databases, or dynamically by applications, in order to control the durability guarantee on a per-transaction basis.
After a commit record has been written to disk on the primary, the WAL record is then sent to the standby. The standby sends reply messages each time a new batch of WAL data is written to disk, unless wal_receiver_status_interval is set to zero on the standby. In the case that synchronous_commit is set to remote_apply, the standby sends reply messages when the commit record is replayed, making the transaction visible. If the standby is chosen as a synchronous standby, according to the setting of synchronous_standby_names on the primary, the reply messages from that standby will be considered along with those from other synchronous standbys to decide when to release transactions waiting for confirmation that the commit record has been received. These parameters allow the administrator to specify which standby servers should be synchronous standbys. Note that the configuration of synchronous replication is mainly on the master. Named standbys must be directly connected to the master; the master knows nothing about downstream standby servers using cascaded replication.
Setting synchronous_commit to remote_write will cause each commit to wait for confirmation that the standby has received the commit record and written it out to its own operating system, but not for the data to be flushed to disk on the standby. This setting provides a weaker guarantee of durability than on does: the standby could lose the data in the event of an operating system crash, though not a PostgreSQL crash. However, it's a useful setting in practice because it can decrease the response time for the transaction. Data loss could only occur if both the primary and the standby crash and the database of the primary gets corrupted at the same time.
Setting synchronous_commit to remote_apply will cause each commit to wait until the current synchronous standbys report that they have replayed the transaction, making it visible to user queries. In simple cases, this allows for load balancing with causal consistency.
Users will stop waiting if a fast shutdown is requested. However, as when using asynchronous replication, the server will not fully shut down until all outstanding WAL records are transferred to the currently connected standby servers.
Synchronous replication supports one or more synchronous standby servers; transactions will wait until all the standby servers which are considered as synchronous confirm receipt of their data. The number of synchronous standbys that transactions must wait for replies from is specified in synchronous_standby_names. This parameter also specifies a list of standby names and the method (FIRST and ANY) to choose synchronous standbys from the listed ones.
The method FIRST specifies a priority-based synchronous replication and makes transaction commits wait until their WAL records are replicated to the requested number of synchronous standbys chosen based on their priorities. The standbys whose names appear earlier in the list are given higher priority and will be considered as synchronous. Other standby servers appearing later in this list represent potential synchronous standbys. If any of the current synchronous standbys disconnects for whatever reason, it will be replaced immediately with the next-highest-priority standby.
An example of synchronous_standby_names for priority-based multiple synchronous standbys is:
synchronous_standby_names = 'FIRST 2 (s1, s2, s3)'
In this example, if four standby servers s1, s2, s3 and s4 are running, the two standbys s1 and s2 will be chosen as synchronous standbys because their names appear early in the list of standby names. s3 is a potential synchronous standby and will take over the role of synchronous standby when either of s1 or s2 fails. s4 is an asynchronous standby since its name is not in the list.
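The priority rule can be modeled in a few lines. This is only an illustrative sketch of the selection logic described above, not the server's implementation; the function name and the representation of connected standbys as a set of names are assumptions.

```python
def choose_sync_standbys(listed, num_sync, connected):
    """Model FIRST num_sync (listed...): return (synchronous, potential).

    listed    -- standby names in priority order, highest priority first
    num_sync  -- number of synchronous standbys requested
    connected -- set of standby names currently connected
    Connected standbys that are not in the list stay asynchronous and
    appear in neither result.
    """
    candidates = [name for name in listed if name in connected]
    return candidates[:num_sync], candidates[num_sync:]
```

With the list s1, s2, s3 and two requested standbys, all four servers connected gives s1 and s2 as synchronous and s3 as potential; if s1 disconnects, s2 and s3 become the synchronous pair.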
The method ANY specifies a quorum-based synchronous replication and makes transaction commits wait until their WAL records are replicated to at least the requested number of synchronous standbys in the list.
An example of synchronous_standby_names for quorum-based multiple synchronous standbys is:
synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
In this example, if four standby servers s1, s2, s3 and s4 are running, transaction commits will wait for replies from at least any two standbys of s1, s2 and s3. s4 is an asynchronous standby since its name is not in the list.
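The quorum rule reduces to counting acknowledgements from listed standbys. The sketch below models only that counting step under the description above; the function name is hypothetical and the real server tracks WAL positions rather than a set of names.

```python
def quorum_reached(listed, num_sync, acknowledged):
    """Model ANY num_sync (listed...).

    Returns True once at least num_sync of the listed standbys have
    acknowledged the commit. Acknowledgements from standbys that are
    not in the list do not count toward the quorum.
    """
    acks = sum(1 for name in listed if name in acknowledged)
    return acks >= num_sync
```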
The synchronous states of standby servers can be viewed using the pg_stat_replication view.
Synchronous replication usually requires carefully planned and placed standby servers to ensure applications perform acceptably. Waiting doesn't utilize system resources, but transaction locks continue to be held until the transfer is confirmed. As a result, incautious use of synchronous replication will reduce performance for database applications because of increased response times and higher contention.
PostgreSQL allows the application developer to specify the durability level required via replication. This can be specified for the system overall, though it can also be specified for specific users or connections, or even individual transactions.
For example, an application workload might consist of: 10% of changes are important customer details, while 90% of changes are less important data that the business can more easily survive if it is lost, such as chat messages between users.
With synchronous replication options specified at the application level (on the primary) we can offer synchronous replication for the most important changes, without slowing down the bulk of the total workload. Application level options are an important and practical tool for allowing the benefits of synchronous replication for high performance applications.
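One way an application might apply this idea is to choose the synchronous_commit level per transaction. The following sketch only builds the SQL text that would be sent; the function name and the table in the usage note are hypothetical, and using the level local for less important changes is an illustrative assumption rather than a recommendation from the text above.

```python
def transaction_script(statements, critical):
    """Return the SQL statements for one transaction as a list of strings.

    critical=True  -> commit waits for the synchronous standby ('on')
    critical=False -> commit waits only for the local WAL flush ('local')
    SET LOCAL limits the setting to the current transaction.
    """
    level = "on" if critical else "local"
    return [
        "BEGIN;",
        f"SET LOCAL synchronous_commit TO '{level}';",
        *statements,
        "COMMIT;",
    ]
```

For example, transaction_script(["INSERT INTO chat_messages VALUES (1, 'hi');"], critical=False) produces a transaction that does not wait for the standby, while customer-detail changes would be wrapped with critical=True.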
You should consider that the network bandwidth must be higher than the rate of generation of WAL data.
synchronous_standby_names specifies the number and names of synchronous standbys that transaction commits made when synchronous_commit is set to on, remote_apply or remote_write will wait for responses from. Such transaction commits may never be completed if any one of the synchronous standbys should crash.
The best solution for high availability is to ensure you keep as many synchronous standbys as requested. This can be achieved by naming multiple potential synchronous standbys using synchronous_standby_names.
In a priority-based synchronous replication, the standbys whose names appear earlier in the list will be used as synchronous standbys. Standbys listed after these will take over the role of synchronous standby if one of the current ones should fail.
In a quorum-based synchronous replication, all the standbys appearing in the list will be used as candidates for synchronous standbys. Even if one of them should fail, the other standbys will keep performing the role of candidates of synchronous standby.
When a standby first attaches to the primary, it will not yet be properly synchronized. This is described as catchup mode. Once the lag between standby and primary reaches zero for the first time we move to real-time streaming state. The catch-up duration may be long immediately after the standby has been created. If the standby is shut down, then the catch-up period will increase according to the length of time the standby has been down. The standby is only able to become a synchronous standby once it has reached streaming state. This state can be viewed using the pg_stat_replication view.
If the primary restarts while commits are waiting for acknowledgement, those waiting transactions will be marked fully committed once the primary database recovers. There is no way to be certain that all standbys have received all outstanding WAL data at the time of the crash of the primary. Some transactions may not show as committed on the standby, even though they show as committed on the primary. The guarantee we offer is that the application will not receive explicit acknowledgement of the successful commit of a transaction until the WAL data is known to be safely received by all the synchronous standbys.
If you really cannot keep as many synchronous standbys as requested then you should decrease the number of synchronous standbys that transaction commits must wait for responses from in synchronous_standby_names (or disable it) and reload the configuration file on the primary server.
If the primary is isolated from remaining standby servers you should fail over to the best candidate of those other remaining standby servers.
If you need to re-create a standby server while transactions are waiting, make sure that the commands pg_start_backup() and pg_stop_backup() are run in a session with synchronous_commit = off, otherwise those requests will wait forever for the standby to appear.
When continuous WAL archiving is used in a standby, there are two different scenarios: the WAL archive can be shared between the primary and the standby, or the standby can have its own WAL archive. When the standby has its own WAL archive, set archive_mode to always, and the standby will call the archive command for every WAL segment it receives, whether it's by restoring from the archive or by streaming replication. The shared archive can be handled similarly, but the archive_command must test if the file being archived exists already, and if the existing file has identical contents. This requires more care in the archive_command, as it must be careful to not overwrite an existing file with different contents, but return success if exactly the same file is archived twice. And all that must be done free of race conditions, if two servers attempt to archive the same file at the same time.
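The requirements for a shared archive can be sketched as a small helper that an archive script might call. This is a minimal sketch, not a hardened tool: the function name is hypothetical, it assumes both servers use the same helper, and it assumes the archive file system supports hard links, which is what makes the final step atomic.

```python
import filecmp
import os
import shutil
import tempfile


def archive_segment(src: str, archive_dir: str) -> int:
    """Copy src into archive_dir; return 0 on success, 1 on conflict.

    The segment is first copied to a temporary name so that a partial
    file never appears under the final name. os.link() then creates the
    final name atomically and fails if it already exists, so two servers
    archiving the same segment cannot overwrite each other.
    """
    dest = os.path.join(archive_dir, os.path.basename(src))
    fd, tmp = tempfile.mkstemp(dir=archive_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
            shutil.copyfileobj(inp, out)
            out.flush()
            os.fsync(out.fileno())
        try:
            os.link(tmp, dest)
            return 0
        except FileExistsError:
            # Already archived: succeed only if the contents are identical.
            return 0 if filecmp.cmp(tmp, dest, shallow=False) else 1
    finally:
        os.unlink(tmp)
```

A wrapper script would pass the values substituted for %p and the archive directory to this function and exit with its return value, so that PostgreSQL sees a zero status only when the segment is safely in the archive.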
If archive_mode is set to on, the archiver is not enabled during recovery or standby mode. If the standby server is promoted, it will start archiving after the promotion, but will not archive any WAL it did not generate itself. To get a complete series of WAL files in the archive, you must ensure that all WAL is archived, before it reaches the standby. This is inherently true with file-based log shipping, as the standby can only restore files that are found in the archive, but not if streaming replication is enabled. When a server is not in recovery mode, there is no difference between on and always modes.
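In general, a modern Unix-compatible platform should be able to run PostgreSQL. The platforms that had received specific testing at the time of release are listed in Section 17.6 below. In the doc subdirectory of the distribution there are several platform-specific FAQ documents that you might wish to consult if you run into problems.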
The following software packages are required for building PostgreSQL:
GNU make version 3.81 or newer is required; other make programs or older GNU make versions will not work. (GNU make is sometimes installed under the name gmake.) To test for GNU make enter:
make --version
You need an ISO/ANSI C compiler (at least C99-compliant). Recent versions of GCC are recommended, but PostgreSQL is known to build using a wide variety of compilers from different vendors.
tar is required to unpack the source distribution, in addition to either gzip or bzip2.
The GNU Readline library is used by default. It allows psql (the PostgreSQL command line SQL interpreter) to remember each command you type, and allows you to use arrow keys to recall and edit previous commands. This is very helpful and is strongly recommended. If you don't want to use it then you must specify the --without-readline option to configure. As an alternative, you can often use the BSD-licensed libedit library, originally developed on NetBSD. The libedit library is GNU Readline-compatible and is used if libreadline is not found, or if --with-libedit-preferred is used as an option to configure. If you are using a package-based Linux distribution, be aware that you need both the readline and readline-devel packages, if those are separate in your distribution.
The zlib compression library is used by default. If you don't want to use it then you must specify the --without-zlib option to configure. Using this option disables support for compressed archives in pg_dump and pg_restore.
The following packages are optional. They are not required in the default configuration, but they are needed when certain build options are enabled, as explained below:
To build the server programming language PL/Perl you need a full Perl installation, including the libperl library and the header files. The minimum required version is Perl 5.8.3. Since PL/Perl will be a shared library, the libperl library must be a shared library also on most platforms. This appears to be the default in recent Perl versions, but it was not in earlier versions, and in any case it is the choice of whoever installed Perl at your site. configure will fail if building PL/Perl is selected but it cannot find a shared libperl. In that case, you will have to rebuild and install Perl manually to be able to build PL/Perl. During the configuration process for Perl, request a shared library.
If you intend to make more than incidental use of PL/Perl, you should ensure that the Perl installation was built with the usemultiplicity option enabled (perl -V will show whether this is the case).
To build the PL/Python server programming language, you need a Python installation with the header files and the sysconfig module. The minimum required version is Python 3.2.
Since PL/Python will be a shared library, the libpython library must be a shared library also on most platforms. This is not the case in a default Python installation built from source, but a shared library is available in many operating system distributions. configure will fail if building PL/Python is selected but it cannot find a shared libpython. That might mean that you either have to install additional packages or rebuild (part of) your Python installation to provide this shared library. When building from source, run Python's configure with the --enable-shared flag.
To build the PL/Tcl procedural language, you of course need a Tcl installation. The minimum required version is Tcl 8.4.
To enable Native Language Support (NLS), that is, the ability to display a program's messages in a language other than English, you need an implementation of the Gettext API. Some operating systems have this built-in (e.g., Linux, NetBSD, Solaris), for other systems you can download an add-on package from https://www.gnu.org/software/gettext/. If you are using the Gettext implementation in the GNU C library then you will additionally need the GNU Gettext package for some utility programs. For any of the other implementations you will not need it.
You need OpenSSL, if you want to support encrypted client connections. OpenSSL is also required for random number generation on platforms that do not have /dev/urandom (except Windows). The minimum required version is 1.0.1.
You need Kerberos, OpenLDAP, and/or PAM, if you want to support authentication using those services.
You need LZ4, if you want to support compression of data with that method; see default_toast_compression and wal_compression.
You need Zstandard, if you want to support compression of data with that method; see wal_compression. The minimum required version is 1.4.0.
To build the PostgreSQL documentation, there is a separate set of requirements; see Section J.2.
If you are building from a Git tree instead of using a released source package, or if you want to do server development, you also need the following packages:
Flex and Bison are needed to build from a Git checkout, or if you changed the actual scanner and parser definition files. If you need them, be sure to get Flex 2.5.31 or later and Bison 1.875 or later. Other lex and yacc programs cannot be used.
Perl 5.8.3 or later is needed to build from a Git checkout, or if you changed the input files for any of the build steps that use Perl scripts. If building on Windows you will need Perl in any case. Perl is also required to run some test suites.
If you need to get a GNU package, you can find it at your local GNU mirror site (see https://www.gnu.org/prep/ftp for a list) or at ftp://ftp.gnu.org/gnu/.
Also check that you have sufficient disk space. You will need about 350 MB for the source tree during compilation and about 60 MB for the installation directory. An empty database cluster takes about 40 MB; databases take about five times the amount of space that a flat text file with the same data would take. If you are going to run the regression tests you will temporarily need up to an extra 300 MB. Use the df command to check free disk space.
The idea behind this dump method is to generate a file with SQL commands that, when fed back to the server, will recreate the database in the same state as it was at the time of the dump. PostgreSQL provides the utility program pg_dump for this purpose. The basic usage of this command is:
pg_dump dbname > dumpfile
As you see, pg_dump writes its result to the standard output. We will see below how this can be useful. While the above command creates a text file, pg_dump can create files in other formats that allow for parallelism and more fine-grained control of object restoration.
pg_dump is a regular PostgreSQL client application (albeit a particularly clever one). This means that you can perform this backup procedure from any remote host that has access to the database. But remember that pg_dump does not operate with special permissions. In particular, it must have read access to all tables that you want to back up, so in order to back up the entire database you almost always have to run it as a database superuser. (If you do not have sufficient privileges to back up the entire database, you can still back up portions of the database to which you do have access using options such as -n schema or -t table.)
To specify which database server pg_dump should contact, use the command line options -h host and -p port. The default host is the local host or whatever your PGHOST environment variable specifies. Similarly, the default port is indicated by the PGPORT environment variable or, failing that, by the compiled-in default. (Conveniently, the server will normally have the same compiled-in default.)
Like any other PostgreSQL client application, pg_dump will by default connect with the database user name that is equal to the current operating system user name. To override this, either specify the -U option or set the environment variable PGUSER. Remember that pg_dump connections are subject to the normal client authentication mechanisms (which are described in Chapter 21).
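The order in which these defaults are resolved can be mirrored in a short sketch. The client library performs this resolution itself, so the code below is purely illustrative; the function name is hypothetical, and 5432 is assumed as the compiled-in default port, which is the usual value but can differ in a custom build.

```python
import getpass
import os

DEFAULT_PORT = 5432  # assumption: the usual compiled-in default


def resolve_connection(host=None, port=None, user=None, environ=None):
    """Mirror the lookup order: command-line option, then environment, then default.

    host=None in the result means the local default connection.
    """
    env = os.environ if environ is None else environ
    return {
        "host": host or env.get("PGHOST"),
        "port": int(port or env.get("PGPORT") or DEFAULT_PORT),
        # Falls back to the operating system user name last.
        "user": user or env.get("PGUSER") or getpass.getuser(),
    }
```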
An important advantage of pg_dump over the other backup methods described later is that pg_dump's output can generally be re-loaded into newer versions of PostgreSQL, whereas file-level backups and continuous archiving are both extremely server-version-specific. pg_dump is also the only method that will work when transferring a database to a different machine architecture, such as going from a 32-bit to a 64-bit server.
Dumps created by pg_dump are internally consistent, meaning, the dump represents a snapshot of the database at the time pg_dump began running. pg_dump does not block other operations on the database while it is working. (Exceptions are those operations that need to operate with an exclusive lock, such as most forms of ALTER TABLE.)
Text files created by pg_dump are intended to be read in by the psql program. The general command form to restore a dump is
psql dbname < dumpfile
where dumpfile is the file output by the pg_dump command. The database dbname will not be created by this command, so you must create it yourself from template0 before executing psql (e.g., with createdb -T template0 dbname). psql supports options similar to pg_dump for specifying the database server to connect to and the user name to use. See the psql reference page for more information. Non-text file dumps are restored using the pg_restore utility.
Before restoring an SQL dump, all the users who own objects or were granted permissions on objects in the dumped database must already exist. If they do not, the restore will fail to recreate the objects with the original ownership and/or permissions. (Sometimes this is what you want, but usually it is not.)
By default, the psql script will continue to execute after an SQL error is encountered. You might wish to run psql with the ON_ERROR_STOP variable set to alter that behavior and have psql exit with an exit status of 3 if an SQL error occurs:
psql --set ON_ERROR_STOP=on dbname < dumpfile
Either way, you will only have a partially restored database. Alternatively, you can specify that the whole dump should be restored as a single transaction, so the restore is either fully completed or fully rolled back. This mode can be specified by passing the -1 or --single-transaction command-line options to psql. When using this mode, be aware that even a minor error can roll back a restore that has already run for many hours. However, that might still be preferable to manually cleaning up a complex database after a partially restored dump.
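A restore script might assemble the psql invocation from these two choices. The sketch below only builds the argument list and does not run anything; the function name is hypothetical, and it passes the dump with -f rather than through standard input.

```python
def psql_restore_command(dbname, dumpfile, stop_on_error=False, single_transaction=False):
    """Build a psql argument list for restoring a plain-text dump."""
    cmd = ["psql"]
    if stop_on_error:
        # psql then exits with status 3 at the first SQL error.
        cmd += ["--set", "ON_ERROR_STOP=on"]
    if single_transaction:
        # The whole dump is committed or rolled back as one unit.
        cmd.append("--single-transaction")
    cmd += ["-f", dumpfile, dbname]
    return cmd
```

The resulting list can be handed to subprocess.run; a return code of 3 then indicates that the restore stopped on an SQL error.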
The ability of pg_dump and psql to write to or read from pipes makes it possible to dump a database directly from one server to another, for example:
pg_dump -h host1 dbname | psql -h host2 dbname
The dumps produced by pg_dump are relative to template0. This means that any languages, procedures, etc. added via template1 will also be dumped by pg_dump. As a result, when restoring, if you are using a customized template1, you must create the empty database from template0, as in the example above.
After restoring a backup, it is wise to run ANALYZE on each database so the query optimizer has useful statistics; see Section 25.1.3 and Section 25.1.6 for more information. For more advice on how to load large amounts of data into PostgreSQL efficiently, refer to Section 14.4.
pg_dump dumps only a single database at a time, and it does not dump information about roles or tablespaces (because those are cluster-wide rather than per-database). To support convenient dumping of the entire contents of a database cluster, the pg_dumpall program is provided. pg_dumpall backs up each database in a given cluster, and also preserves cluster-wide data such as role and tablespace definitions. The basic usage of this command is:
pg_dumpall > dumpfile
The resulting dump can be restored with psql:
psql -f dumpfile postgres
(Actually, you can specify any existing database name to start from, but if you are loading into an empty cluster then postgres should usually be used.) It is always necessary to have database superuser access when restoring a pg_dumpall dump, as that is required to restore the role and tablespace information. If you use tablespaces, make sure that the tablespace paths in the dump are appropriate for the new installation.
pg_dumpall works by emitting commands to re-create roles, tablespaces, and empty databases, then invoking pg_dump for each database. This means that while each database will be internally consistent, the snapshots of different databases are not synchronized.
Cluster-wide data can be dumped alone using the pg_dumpall --globals-only option. This is necessary to fully back up the cluster if running the pg_dump command on individual databases.
Some operating systems have maximum file size limits that cause problems when creating large pg_dump output files. Fortunately, pg_dump can write to the standard output, so you can use standard Unix tools to work around this potential problem. There are several possible methods:
Use compressed dumps. You can use your favorite compression program, for example gzip:
pg_dump dbname | gzip > filename.gz
Reload with:
gunzip -c filename.gz | psql dbname
or:
cat filename.gz | gunzip | psql dbname
Use split. The split command allows you to split the output into smaller files that are acceptable in size to the underlying file system. For example, to make 2 gigabyte chunks:
pg_dump dbname | split -b 2G - filename
Reload with:
cat filename* | psql dbname
If using GNU split, it is possible to use it and gzip together:
pg_dump dbname | split -b 2G --filter='gzip > $FILE.gz'
It can be restored using zcat.
Use pg_dump's custom dump format. If PostgreSQL was built on a system with the zlib compression library installed, the custom dump format will compress data as it writes it to the output file. This will produce dump file sizes similar to using gzip, but it has the added advantage that tables can be restored selectively. The following command dumps a database using the custom dump format:
pg_dump -Fc dbname > filename
A custom-format dump is not a script for psql, but instead must be restored with pg_restore, for example:
pg_restore -d dbname filename
See the pg_dump and pg_restore reference pages for details.
For very large databases, you might need to combine split with one of the other two approaches.
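Where the split utility is unavailable, the chunking approach described above can be approximated in a script. This is a minimal sketch under stated assumptions: the function names and the numeric file-name suffix are hypothetical, and the input is any binary stream, such as the standard output of a pg_dump process.

```python
import shutil


def split_stream(stream, chunk_size, prefix, buf_size=1024 * 1024):
    """Write stream into files prefix000, prefix001, ... of at most chunk_size bytes.

    Data is copied in pieces of at most buf_size bytes so a whole chunk
    is never held in memory. Returns the list of paths written, in order.
    """
    paths = []
    data = stream.read(min(buf_size, chunk_size))
    while data:
        path = f"{prefix}{len(paths):03d}"
        paths.append(path)
        with open(path, "wb") as out:
            written = 0
            while data:
                out.write(data)
                written += len(data)
                if written >= chunk_size:
                    # Chunk is full: fetch the first piece of the next one.
                    data = stream.read(min(buf_size, chunk_size))
                    break
                data = stream.read(min(buf_size, chunk_size - written))
    return paths


def join_chunks(paths, out_stream):
    """Concatenate the chunk files, in order, into out_stream."""
    for path in paths:
        with open(path, "rb") as inp:
            shutil.copyfileobj(inp, out_stream)
```

To reload, join_chunks can write into the standard input of a psql process, which plays the same role as piping the concatenated files into psql.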
Use pg_dump's parallel dump feature. To speed up the dump of a large database, you can use pg_dump's parallel mode. This will dump multiple tables at the same time. You can control the degree of parallelism with the -j parameter. Parallel dumps are only supported for the "directory" archive format.
You can use pg_restore -j to restore a dump in parallel. This will work for any archive of either the "custom" or the "directory" archive mode, whether or not it has been created with pg_dump -j.
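A backup script could assemble the parallel dump and restore invocations as follows. The sketch only builds argument lists; the function names, database name and output directory are hypothetical placeholders.

```python
def parallel_dump_command(dbname, out_dir, jobs):
    """pg_dump in parallel mode; requires the directory format (-F d)."""
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return ["pg_dump", "-j", str(jobs), "-F", "d", "-f", out_dir, dbname]


def parallel_restore_command(dbname, archive, jobs):
    """pg_restore in parallel mode for a custom or directory archive."""
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return ["pg_restore", "-j", str(jobs), "-d", dbname, archive]
```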
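At all times, PostgreSQL maintains a write-ahead log (WAL) in the pg_wal/ subdirectory of the cluster's data directory. The log records every change made to the database's data files. This log exists primarily for crash-safety purposes: if the system crashes, the database can be restored to consistency by "replaying" the log entries made since the last checkpoint. However, the existence of the log makes it possible to use a third strategy for backing up databases: we can combine a file-system-level backup with backup of the WAL files. If recovery is needed, we restore the file system backup and then replay from the backed-up WAL files to bring the system to a current state. This approach is more complex to administer than either of the previous approaches, but it has some significant benefits:
We do not need a perfectly consistent file system backup as the starting point. Any internal inconsistency in the backup will be corrected by log replay (this is not significantly different from what happens during crash recovery). So we do not need a file system snapshot capability, just tar or a similar archiving tool.
Since we can combine an indefinitely long sequence of WAL files for replay, continuous backup can be achieved simply by continuing to archive the WAL files. This is particularly valuable for large databases, where it might not be convenient to take a full backup frequently.
It is not necessary to replay the WAL entries all the way to the end. We could stop the replay at any point and have a consistent snapshot of the database as it was at that time. Thus, this technique supports point-in-time recovery: it is possible to restore the database to its state at any time since the base backup was taken.
If we continuously feed the series of WAL files to another machine that has been loaded with the same base backup file, we have a warm standby system: at any point we can bring up the second machine and it will have a nearly current copy of the database.
pg_dump and pg_dumpall do not produce file-system-level backups and cannot be used as part of a continuous-archiving solution. Such dumps are logical and do not contain enough information to be used by WAL replay.
As with the plain file-system-backup technique, this method can only support restoration of an entire database cluster, not a subset. Also, it requires a lot of archival storage: the base backup might be bulky, and a busy system will generate many megabytes of WAL traffic that have to be archived. Still, it is the preferred backup technique in many situations where high reliability is needed.
To recover successfully using continuous archiving (also called "online backup" by many database vendors), you need a continuous sequence of archived WAL files that extends back at least as far as the start time of your backup. So to get started, you should set up and test your procedure for archiving WAL files before you take your first base backup. Accordingly, we first discuss the mechanics of archiving WAL files.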
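In an abstract sense, a running PostgreSQL system produces an indefinitely long sequence of WAL records. The system physically divides this sequence into WAL segment files, which are normally 16 MB apiece (although the segment size can be altered during initdb). The segment files are given numeric names that reflect their position in the abstract WAL sequence. When not using WAL archiving, the system normally creates just a few segment files and then "recycles" them by renaming no-longer-needed segment files to higher segment numbers. It is assumed that segment files whose contents precede the last checkpoint are no longer of interest and can be recycled.
When archiving WAL data, we need to capture the contents of each segment file once it is filled, and save that data somewhere before the segment file is recycled for reuse. Depending on the application and the available hardware, there could be many different ways of "saving the data somewhere": we could copy the segment files to an NFS-mounted directory on another machine, write them onto a tape drive (ensuring that you have a way of identifying the original name of each file), or batch them together and burn them onto CDs, or something else entirely. To provide the database administrator with flexibility, PostgreSQL tries not to make any assumptions about how the archiving will be done. Instead, PostgreSQL lets the administrator specify a shell command to be executed to copy a completed segment file to wherever it needs to go. The command could be as simple as a cp, or it could invoke a complex shell script; it is all up to you.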
要啟用 WAL 歸檔機制,請將 wal_level 組態參數設定為 replica 或更高的等級,將 archive_mode 設定為 on,然後在 archive_command 組態參數中指定要使用的 shell 命令。實際上,這些設定始終會放置在 postgresql.conf 檔案中。在 archive_command 中,%p 替換為要存檔的檔案路徑名稱,而 %f 僅替換為檔案名稱。(路徑名稱是相對於目前的工作目錄(即叢集的資料目錄)的。)如果需要在命令中嵌入實際的 % 字符,請使用 %%。最簡單的指令是:
它將可歸檔的 WAL 分段檔案複製到目錄 /mnt/server/archivedir 中。 (這是範例,而不是建議,並且可能不是所有平台都適用。)替換 %p 和 %f 參數後,實際執行的命令可能如下所示:
將為每個要歸檔的新檔案產生一個類似的命令。
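The placeholder substitution can be modeled with a short shell sketch. The function name expand_archive_command and the sample segment name are assumptions made for illustration only; the server performs this expansion internally. The template in the usage comment is one plausible form of a copy-with-test command targeting /mnt/server/archivedir.

```shell
# Illustrative model of how %p, %f and %% are expanded in archive_command.
# Usage: expand_archive_command TEMPLATE PATH FILE
# (hypothetical helper, not part of PostgreSQL)
expand_archive_command() (
    rest=$1
    out=
    while [ -n "$rest" ]; do
        case $rest in
            '%p'*) out=$out$2; rest=${rest#'%p'} ;;   # path of the segment
            '%f'*) out=$out$3; rest=${rest#'%f'} ;;   # file name only
            '%%'*) out=$out%;  rest=${rest#'%%'} ;;   # literal percent sign
            *)     first=${rest%"${rest#?}"}          # first character of $rest
                   out=$out$first
                   rest=${rest#?} ;;
        esac
    done
    printf '%s\n' "$out"
)

# Example call (prints the command that would be run for one segment):
# expand_archive_command \
#     'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f' \
#     pg_wal/00000001000000A900000065 00000001000000A900000065
```

Running the example call shows the same segment name appearing in both the test and the copy destination, while the source keeps its pg_wal/ prefix.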
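The archive command will be executed under the ownership of the same user that the PostgreSQL server is running as. Since the series of WAL files being archived contains effectively everything in your database, you will want to be sure that the archived data is protected from prying eyes; for example, archive into a directory that does not have group or world read access.
It is important that the archive command return zero exit status if and only if it succeeds. Upon getting a zero result, PostgreSQL will assume that the file has been successfully archived, and will remove or recycle it. However, a nonzero status tells PostgreSQL that the file was not archived; it will try again periodically until it succeeds.
The archive command should generally be designed to refuse to overwrite any pre-existing archive file. This is an important safety feature to preserve the integrity of your archive in case of administrator error (such as sending the output of two different servers to the same archive directory).
It is advisable to test your proposed archive command to ensure that it indeed does not overwrite an existing file, and that it returns nonzero status in this case. The example command above for Unix ensures this by including a separate test step. On some Unix platforms, cp has switches such as -i that can be used to do the same thing less verbosely, but you should not rely on these without verifying that the right exit status is returned. (In particular, GNU cp will return status zero when -i is used and the target file already exists, which is not the desired behavior.)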
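A minimal sketch of an archive helper that follows these rules is shown here. The function name archive_wal, its argument order, and the copy-to-temporary-name-then-rename step are assumptions for illustration, not something the manual prescribes; the essential points are the explicit existence test and the exit status.

```shell
# Hypothetical archive helper.
# Usage: archive_wal SRC_PATH FILE_NAME ARCHIVE_DIR
#   SRC_PATH  corresponds to %p, FILE_NAME to %f.
archive_wal() (
    src=$1
    name=$2
    dir=$3

    # Refuse to overwrite an existing archive file and report failure,
    # so the server keeps the segment and retries later.
    if [ -e "$dir/$name" ]; then
        echo "archive_wal: $dir/$name already exists" >&2
        exit 1
    fi

    # Copy under a temporary name first so that an interrupted copy never
    # leaves a partial file under the final name (an added precaution).
    cp "$src" "$dir/$name.tmp" && mv "$dir/$name.tmp" "$dir/$name"
)
```

The function returns zero only when the copy and the rename both succeed, which matches the "zero if and only if it succeeds" requirement. It can be exercised by archiving the same file twice and checking that the second call fails.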
While designing your archiving setup, consider what will happen if the archive command fails repeatedly because some aspect requires operator intervention or the archive runs out of space. For example, this could occur if you write to tape without an autochanger; when the tape fills, nothing further can be archived until the tape is swapped. You should ensure that any error condition or request to a human operator is reported appropriately so that the situation can be resolved reasonably quickly. The pg_wal/ directory will continue to fill with WAL segment files until the situation is resolved. (If the file system containing pg_wal/ fills up, PostgreSQL will do a PANIC shutdown. No committed transactions will be lost, but the database will remain offline until you free some space.)
The speed of the archiving command is unimportant as long as it can keep up with the average rate at which your server generates WAL data. Normal operation continues even if the archiving process falls a little behind. If archiving falls significantly behind, this will increase the amount of data that would be lost in the event of a disaster. It will also mean that the pg_wal/ directory will contain large numbers of not-yet-archived segment files, which could eventually exceed available disk space. You are advised to monitor the archiving process to ensure that it is working as you intend.
In writing your archive command, you should assume that the file names to be archived can be up to 64 characters long and can contain any combination of ASCII letters, digits, and dots. It is not necessary to preserve the original relative path (%p) but it is necessary to preserve the file name (%f).
Note that although WAL archiving will allow you to restore any modifications made to the data in your PostgreSQL database, it will not restore changes made to configuration files (that is, postgresql.conf, pg_hba.conf and pg_ident.conf), since those are edited manually rather than through SQL operations. You might wish to keep the configuration files in a location that will be backed up by your regular file system backup procedures. See Section 19.2 for how to relocate the configuration files.
The archive command is only invoked on completed WAL segments. Hence, if your server generates only little WAL traffic (or has slack periods where it does so), there could be a long delay between the completion of a transaction and its safe recording in archive storage. To put a limit on how old unarchived data can be, you can set archive_timeout to force the server to switch to a new WAL segment file at least that often. Note that archived files that are archived early due to a forced switch are still the same length as completely full files. It is therefore unwise to set a very short archive_timeout, because it will bloat your archive storage. archive_timeout settings of a minute or so are usually reasonable.
Also, you can force a segment switch manually with pg_switch_wal if you want to ensure that a just-finished transaction is archived as soon as possible. Other utility functions related to WAL management are listed in Table 9.84.
When wal_level is minimal some SQL commands are optimized to avoid WAL logging, as described in Section 14.4.7. If archiving or streaming replication were turned on during execution of one of these statements, WAL would not contain enough information for archive recovery. (Crash recovery is unaffected.) For this reason, wal_level can only be changed at server start. However, archive_command can be changed with a configuration file reload. If you wish to temporarily stop archiving, one way to do it is to set archive_command to the empty string (''). This will cause WAL files to accumulate in pg_wal/ until a working archive_command is re-established.
The easiest way to perform a base backup is to use the pg_basebackup tool. It can create a base backup either as regular files or as a tar archive. If more flexibility than pg_basebackup can provide is required, you can also make a base backup using the low level API (see Section 25.3.3).
It is not necessary to be concerned about the amount of time it takes to make a base backup. However, if you normally run the server with full_page_writes disabled, you might notice a drop in performance while the backup runs since full_page_writes is effectively forced on during backup mode.
To make use of the backup, you will need to keep all the WAL segment files generated during and after the file system backup. To aid you in doing this, the base backup process creates a backup history file that is immediately stored into the WAL archive area. This file is named after the first WAL segment file that you need for the file system backup. For example, if the starting WAL file is 0000000100001234000055CD the backup history file will be named something like 0000000100001234000055CD.007C9330.backup. (The second part of the file name stands for an exact position within the WAL file, and can ordinarily be ignored.) Once you have safely archived the file system backup and the WAL segment files used during the backup (as specified in the backup history file), all archived WAL segments with names numerically less are no longer needed to recover the file system backup and can be deleted. However, you should consider keeping several backup sets to be absolutely certain that you can recover your data.
The backup history file is just a small text file. It contains the label string you gave to pg_basebackup, as well as the starting and ending times and WAL segments of the backup. If you used the label to identify the associated dump file, then the archived history file is enough to tell you which dump file to restore.
Since you have to keep around all the archived WAL files back to your last base backup, the interval between base backups should usually be chosen based on how much storage you want to expend on archived WAL files. You should also consider how long you are prepared to spend recovering, if recovery should be necessary: the system will have to replay all those WAL segments, and that could take a while if it has been a long time since the last base backup.
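The procedure for making a base backup using the low level APIs contains a few more steps than the pg_basebackup method, but is relatively simple. It is very important that these steps are executed in sequence, and that the success of a step is verified before proceeding to the next step.
Low level base backups can be made in a non-exclusive or an exclusive way. The non-exclusive method is recommended; the exclusive one is deprecated and will eventually be removed.
A non-exclusive low level backup is one that allows other concurrent backups to be running (both those started using the same backup API and those started using pg_basebackup).
Ensure that WAL archiving is enabled and working.
Connect to the server (it does not matter which database) as a user with rights to run pg_start_backup (superuser, or a user who has been granted EXECUTE on the function) and issue the command SELECT pg_start_backup('label', false, false);
where label is any string you want to use to uniquely identify this backup operation. The connection calling pg_start_backup must be maintained until the end of the backup, or the backup will be automatically aborted.
By default, pg_start_backup can take a long time to finish. This is because it performs a checkpoint, and the I/O required for the checkpoint will be spread out over a significant period of time, by default half your inter-checkpoint interval (see the configuration parameter checkpoint_completion_target). This is usually what you want, because it minimizes the impact on query processing. If you want to start the backup as soon as possible, change the second parameter to true, which will issue an immediate checkpoint using as much I/O as available. The third parameter being false tells pg_start_backup to initiate a non-exclusive base backup.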
Perform the backup, using any convenient file-system-backup tool such as tar or cpio (not pg_dump or pg_dumpall). It is neither necessary nor desirable to stop normal operation of the database while you do this. See Section 25.3.3.3 for things to consider during this backup.
In the same connection as before, issue the command SELECT * FROM pg_stop_backup(false, true);
This terminates backup mode. On a primary, it also performs an automatic switch to the next WAL segment. On a standby, it is not possible to automatically switch WAL segments, so you may wish to run pg_switch_wal on the primary to perform a manual switch. The reason for the switch is to arrange for the last WAL segment file written during the backup interval to be ready to archive.
pg_stop_backup will return one row with three values. The second of these fields should be written to a file named backup_label in the root directory of the backup. The third field should be written to a file named tablespace_map unless the field is empty. These files are vital to the backup working, and must be written without modification.
Once the WAL segment files active during the backup are archived, you are done. The file identified by pg_stop_backup's first return value is the last segment that is required to form a complete set of backup files. On a primary, if archive_mode is enabled and the wait_for_archive parameter is true, pg_stop_backup does not return until the last segment has been archived. On a standby, archive_mode must be always in order for pg_stop_backup to wait. Archiving of these files happens automatically since you have already configured archive_command. In most cases this happens quickly, but you are advised to monitor your archive system to ensure there are no delays. If the archive process has fallen behind because of failures of the archive command, it will keep retrying until the archive succeeds and the backup is complete. If you wish to place a time limit on the execution of pg_stop_backup, set an appropriate statement_timeout value, but make note that if pg_stop_backup terminates because of this your backup may not be valid.
If the backup process monitors and ensures that all WAL segment files required for the backup are successfully archived then the wait_for_archive parameter (which defaults to true) can be set to false to have pg_stop_backup return as soon as the stop backup record is written to the WAL. By default, pg_stop_backup will wait until all WAL has been archived, which can take some time. This option must be used with caution: if WAL archiving is not monitored correctly then the backup might not include all of the WAL files and will therefore be incomplete and not able to be restored.
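The exclusive backup method is deprecated and should be avoided. Prior to PostgreSQL 9.6, this was the only low-level method available, but it is now recommended that all users upgrade their scripts to use non-exclusive backups.
The process for an exclusive backup is mostly the same as for a non-exclusive one, but it differs in a few key steps. This type of backup can only be taken on a primary and does not allow concurrent backups. Moreover, because it creates a backup label file, as described below, it can block automatic restart of the master server after a crash. On the other hand, the erroneous removal of this file from a backup or standby is a common mistake, which can result in serious data corruption. If it is necessary to use this method, the following steps may be used.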
Ensure that WAL archiving is enabled and working.
Connect to the server (it does not matter which database) as a user with rights to run pg_start_backup (superuser, or a user who has been granted EXECUTE on the function) and issue the command SELECT pg_start_backup('label');
where label is any string you want to use to uniquely identify this backup operation. pg_start_backup creates a backup label file, called backup_label, in the cluster directory with information about your backup, including the start time and label string. The function also creates a tablespace map file, called tablespace_map, in the cluster directory with information about tablespace symbolic links in pg_tblspc/ if one or more such links are present. Both files are critical to the integrity of the backup, should you need to restore from it.
By default, pg_start_backup can take a long time to finish. This is because it performs a checkpoint, and the I/O required for the checkpoint will be spread out over a significant period of time, by default half your inter-checkpoint interval (see the configuration parameter checkpoint_completion_target). This is usually what you want, because it minimizes the impact on query processing. If you want to start the backup as soon as possible, use SELECT pg_start_backup('label', true);
This forces the checkpoint to be done as quickly as possible.
Perform the backup, using any convenient file-system-backup tool such as tar or cpio (not pg_dump or pg_dumpall). It is neither necessary nor desirable to stop normal operation of the database while you do this. See Section 25.3.3.3 for things to consider during this backup.
As noted above, if the server crashes during the backup it may not be possible to restart until the backup_label file has been manually deleted from the PGDATA directory. Note that it is very important to never remove the backup_label file when restoring a backup, because this will result in corruption. Confusion about when it is appropriate to remove this file is a common cause of data corruption when using this method; be very certain that you remove the file only on an existing master and never when building a standby or restoring a backup, even if you are building a standby that will subsequently be promoted to a new master.
Again connect to the database as a user with rights to run pg_stop_backup (superuser, or a user who has been granted EXECUTE on the function), and issue the command SELECT pg_stop_backup();
This function terminates backup mode and performs an automatic switch to the next WAL segment. The reason for the switch is to arrange for the last WAL segment written during the backup interval to be ready to archive.
Once the WAL segment files active during the backup are archived, you are done. The file identified by pg_stop_backup's result is the last segment that is required to form a complete set of backup files. If archive_mode is enabled, pg_stop_backup does not return until the last segment has been archived. Archiving of these files happens automatically since you have already configured archive_command. In most cases this happens quickly, but you are advised to monitor your archive system to ensure there are no delays. If the archive process has fallen behind because of failures of the archive command, it will keep retrying until the archive succeeds and the backup is complete.
When using exclusive backup mode, it is absolutely imperative to ensure that pg_stop_backup completes successfully at the end of the backup. Even if the backup itself fails, for example due to lack of disk space, failure to call pg_stop_backup will leave the server in backup mode indefinitely, causing future backups to fail and increasing the risk of a restart failure during the time that backup_label exists.
Some file system backup tools emit warnings or errors if the files they are trying to copy change while the copy proceeds. When taking a base backup of an active database, this situation is normal and not an error. However, you need to ensure that you can distinguish complaints of this sort from real errors. For example, some versions of rsync return a separate exit code for “vanished source files”, and you can write a driver script to accept this exit code as a non-error case. Also, some versions of GNU tar return an error code indistinguishable from a fatal error if a file was truncated while tar was copying it. Fortunately, GNU tar versions 1.16 and later exit with 1 if a file was changed during the backup, and 2 for other errors. With GNU tar version 1.23 and later, you can use the warning options --warning=no-file-changed --warning=no-file-removed to hide the related warning messages.
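A driver function along the following lines could separate the two cases for GNU tar. The name run_tar_backup is hypothetical, and the sketch assumes GNU tar 1.16 or later, where exit status 1 means files changed during the backup and 2 means a fatal error.

```shell
# Hypothetical wrapper: treat GNU tar exit status 1 ("file changed as we
# read it") as success, and propagate any other nonzero status.
# Usage: run_tar_backup TAR_ARGS...
run_tar_backup() {
    tar "$@"
    status=$?
    case $status in
        0) return 0 ;;
        1) echo "tar: files changed during backup (expected on a live cluster)" >&2
           return 0 ;;
        *) echo "tar: fatal error, exit status $status" >&2
           return "$status" ;;
    esac
}
```

For example, run_tar_backup -cf /path/to/backup.tar /usr/local/pgsql/data would succeed even if data files were modified during the copy; the destination path here is only a placeholder.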
Be certain that your backup includes all of the files under the database cluster directory (e.g., /usr/local/pgsql/data). If you are using tablespaces that do not reside underneath this directory, be careful to include them as well (and be sure that your backup archives symbolic links as links, otherwise the restore will corrupt your tablespaces).
You should, however, omit from the backup the files within the cluster's pg_wal/ subdirectory. This slight adjustment is worthwhile because it reduces the risk of mistakes when restoring. This is easy to arrange if pg_wal/ is a symbolic link pointing to someplace outside the cluster directory, which is a common setup anyway for performance reasons. You might also want to exclude postmaster.pid and postmaster.opts, which record information about the running postmaster, not about the postmaster which will eventually use this backup. (These files can confuse pg_ctl.)
It is often a good idea to also omit from the backup the files within the cluster's pg_replslot/ directory, so that replication slots that exist on the master do not become part of the backup. Otherwise, the subsequent use of the backup to create a standby may result in indefinite retention of WAL files on the standby, and possibly bloat on the master if hot standby feedback is enabled, because the clients that are using those replication slots will still be connecting to and updating the slots on the master, not the standby. Even if the backup is only intended for use in creating a new master, copying the replication slots isn't expected to be particularly useful, since the contents of those slots will likely be badly out of date by the time the new master comes on line.
The contents of the directories pg_dynshmem/, pg_notify/, pg_serial/, pg_snapshots/, pg_stat_tmp/, and pg_subtrans/ (but not the directories themselves) can be omitted from the backup as they will be initialized on postmaster startup. If stats_temp_directory is set and is under the data directory then the contents of that directory can also be omitted.
Any file or directory beginning with pgsql_tmp can be omitted from the backup. These files are removed on postmaster start and the directories will be recreated as needed.
pg_internal.init files can be omitted from the backup whenever a file of that name is found. These files contain relation cache data that is always rebuilt when recovering.
The backup label file includes the label string you gave to pg_start_backup, as well as the time at which pg_start_backup was run, and the name of the starting WAL file. In case of confusion it is therefore possible to look inside a backup file and determine exactly which backup session the dump file came from. The tablespace map file includes the symbolic link names as they exist in the directory pg_tblspc/ and the full path of each symbolic link. These files are not merely for your information; their presence and contents are critical to the proper operation of the system's recovery process.
It is also possible to make a backup while the server is stopped. In this case, you obviously cannot use pg_start_backup or pg_stop_backup, and you will therefore be left to your own devices to keep track of which backup is which and how far back the associated WAL files go. It is generally better to follow the continuous archiving procedure above.
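Okay, the worst has happened and you need to recover from your backup. Here is the procedure:
1. Stop the server, if it's running.
2. If you have the space to do so, copy the whole cluster data directory and any tablespaces to a temporary location in case you need them later. Note that this precaution will require that you have enough free space on your system to hold two copies of your existing database. If you do not have enough space, you should at least save the contents of the cluster's pg_wal subdirectory, as it might contain logs which were not archived before the system went down.
3. Remove all existing files and subdirectories under the cluster data directory and under the root directories of any tablespaces you are using.
4. Restore the database files from your file system backup. Be sure that they are restored with the right ownership (the database system user, not root!) and with the right permissions. If you are using tablespaces, you should verify that the symbolic links in pg_tblspc/ were correctly restored.
5. Remove any files present in pg_wal/; these came from the file system backup and are therefore probably obsolete rather than current. If you didn't archive pg_wal/ at all, then recreate it with proper permissions, being careful to ensure that you re-establish it as a symbolic link if you had it set up that way before.
6. If you have unarchived WAL segment files that you saved in step 2, copy them into pg_wal/. (It is best to copy them, not move them, so you still have the unmodified files if a problem occurs and you have to start over.)
7. Set recovery configuration settings in postgresql.conf (see Section 19.5.4) and create a file recovery.signal in the cluster data directory. You might also want to temporarily modify pg_hba.conf to prevent ordinary users from connecting until you are sure the recovery was successful.
8. Start the server. The server will go into recovery mode and proceed to read through the archived WAL files it needs. Should the recovery be terminated because of an external error, the server can simply be restarted and it will continue recovery. Upon completion of the recovery process, the server will remove recovery.signal (to prevent accidentally re-entering recovery mode later) and then commence normal database operations.
9. Inspect the contents of the database to ensure you have recovered to the desired state. If not, return to step 1. If all is well, allow your users to connect by restoring pg_hba.conf to normal.
The key part of all this is to set up a recovery configuration that describes how you want to recover and how far the recovery should run. The one thing that you absolutely must specify is the restore_command, which tells PostgreSQL how to retrieve archived WAL file segments. Like the archive_command, this is a shell command string. It can contain %f, which is replaced by the name of the desired log file, and %p, which is replaced by the path name to copy the log file to. (The path name is relative to the current working directory, i.e., the cluster's data directory.) Write %% if you need to embed an actual % character in the command. The simplest useful command is something like: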
which will copy previously archived WAL segments from the directory /mnt/server/archivedir. Of course, you can use something much more complicated, perhaps even a shell script that requests the operator to mount an appropriate tape.
It is important that the command return nonzero exit status on failure. The command will be called requesting files that are not present in the archive; it must return nonzero when so asked. This is not an error condition. An exception is that if the command was terminated by a signal (other than SIGTERM, which is used as part of a database server shutdown) or an error by the shell (such as command not found), then recovery will abort and the server will not start up.
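The exit-status contract for a restore command can be sketched as a small shell function. The name restore_wal and its argument order are assumptions for illustration; in the simplest setups the same effect is achieved by a plain cp from the archive directory (for example /mnt/server/archivedir/%f) to %p, which already fails with nonzero status when the source file is missing.

```shell
# Hypothetical restore helper.
# Usage: restore_wal ARCHIVE_DIR FILE_NAME DEST_PATH
#   FILE_NAME corresponds to %f, DEST_PATH to %p.
restore_wal() (
    archive_dir=$1
    name=$2
    dest=$3

    # Requests for files that are not in the archive are expected during
    # recovery; answer them with a nonzero status rather than an error message.
    if [ ! -f "$archive_dir/$name" ]; then
        exit 1
    fi

    cp "$archive_dir/$name" "$dest"
)
```

The helper does not assume that the requested name is a WAL segment, so requests for other files such as .history files are handled the same way.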
Not all of the requested files will be WAL segment files; you should also expect requests for files with a suffix of .history. Also be aware that the base name of the %p path will be different from %f; do not expect them to be interchangeable.
WAL segments that cannot be found in the archive will be sought in pg_wal/; this allows use of recent un-archived segments. However, segments that are available from the archive will be used in preference to files in pg_wal/.
Normally, recovery will proceed through all available WAL segments, thereby restoring the database to the current point in time (or as close as possible given the available WAL segments). Therefore, a normal recovery will end with a “file not found” message, the exact text of the error message depending upon your choice of restore_command. You may also see an error message at the start of recovery for a file named something like 00000001.history. This is also normal and does not indicate a problem in simple recovery situations; see Section 25.3.5 for discussion.
If you want to recover to some previous point in time (say, right before the junior DBA dropped your main transaction table), just specify the required stopping point. You can specify the stop point, known as the “recovery target”, either by date/time, named restore point or by completion of a specific transaction ID. As of this writing only the date/time and named restore point options are very usable, since there are no tools to help you identify with any accuracy which transaction ID to use.
The stop point must be after the ending time of the base backup, i.e., the end time of pg_stop_backup. You cannot use a base backup to recover to a time when that backup was in progress. (To recover to such a time, you must go back to your previous base backup and roll forward from there.)
If recovery finds corrupted WAL data, recovery will halt at that point and the server will not start. In such a case the recovery process could be re-run from the beginning, specifying a “recovery target” before the point of corruption so that recovery can complete normally. If recovery fails for an external reason, such as a system crash or if the WAL archive has become inaccessible, then the recovery can simply be restarted and it will restart almost from where it failed. Recovery restart works much like checkpointing in normal operation: the server periodically forces all its state to disk, and then updates the pg_control file to indicate that the already-processed WAL data need not be scanned again.
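The ability to restore the database to a previous point in time creates some complexities that are akin to science-fiction stories about time travel and parallel universes. For example, in the original history of the database, suppose you dropped a critical table at 5:15PM on Tuesday evening, but didn't realize your mistake until Wednesday noon. Unfazed, you get out your backup, restore to the point-in-time 5:14PM Tuesday evening, and are up and running. In this history of the database universe, you never dropped the table. But suppose you later realize this wasn't such a great idea, and would like to return to sometime Wednesday morning in the original history. You won't be able to if, while your database was up and running, it overwrote some of the WAL segment files that led up to the time you now wish you could get back to. Thus, to avoid this, you need to distinguish the series of WAL records generated after you've done a point-in-time recovery from those that were generated in the original database history.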
To deal with this problem, PostgreSQL has a notion of timelines. Whenever an archive recovery completes, a new timeline is created to identify the series of WAL records generated after that recovery. The timeline ID number is part of WAL segment file names so a new timeline does not overwrite the WAL data generated by previous timelines. It is in fact possible to archive many different timelines. While that might seem like a useless feature, it's often a lifesaver. Consider the situation where you aren't quite sure what point-in-time to recover to, and so have to do several point-in-time recoveries by trial and error until you find the best place to branch off from the old history. Without timelines this process would soon generate an unmanageable mess. With timelines, you can recover to any prior state, including states in timeline branches that you abandoned earlier.
Every time a new timeline is created, PostgreSQL creates a “timeline history” file that shows which timeline it branched off from and when. These history files are necessary to allow the system to pick the right WAL segment files when recovering from an archive that contains multiple timelines. Therefore, they are archived into the WAL archive area just like WAL segment files. The history files are just small text files, so it's cheap and appropriate to keep them around indefinitely (unlike the segment files which are large). You can, if you like, add comments to a history file to record your own notes about how and why this particular timeline was created. Such comments will be especially valuable when you have a thicket of different timelines as a result of experimentation.
The default behavior of recovery is to recover along the same timeline that was current when the base backup was taken. If you wish to recover into some child timeline (that is, you want to return to some state that was itself generated after a recovery attempt), you need to specify the target timeline ID in recovery_target_timeline. You cannot recover into timelines that branched off earlier than the base backup.
Some tips for configuring continuous archiving are given here.
It is possible to use PostgreSQL's backup facilities to produce standalone hot backups. These are backups that cannot be used for point-in-time recovery, yet are typically much faster to backup and restore than pg_dump dumps. (They are also much larger than pg_dump dumps, so in some cases the speed advantage might be negated.)
As with base backups, the easiest way to produce a standalone hot backup is to use the pg_basebackup tool. If you include the -X parameter when calling it, all the write-ahead log required to use the backup will be included in the backup automatically, and no special action is required to restore the backup.
If more flexibility in copying the backup files is needed, a lower level process can be used for standalone hot backups as well. To prepare for low level standalone hot backups, make sure wal_level is set to replica or higher, archive_mode to on, and set up an archive_command that performs archiving only when a switch file exists. For example:
This command will perform archiving when /var/lib/pgsql/backup_in_progress exists, and otherwise silently return zero exit status (allowing PostgreSQL to recycle the unwanted WAL file).
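The switch-file logic can be sketched as a shell function. The name archive_if_backup_in_progress and its parameters are assumptions for illustration; in a real configuration the same test-or-copy logic would be written directly in archive_command with %p and %f, using the switch file path mentioned above.

```shell
# Hypothetical helper mirroring a switch-file archive_command.
# Usage: archive_if_backup_in_progress SWITCH_FILE SRC_PATH FILE_NAME ARCHIVE_DIR
archive_if_backup_in_progress() (
    switch_file=$1
    src=$2
    name=$3
    dir=$4

    # No switch file: report success so the server can recycle the segment.
    [ -f "$switch_file" ] || exit 0

    # Switch file present: archive, but never overwrite an existing file.
    [ ! -f "$dir/$name" ] && cp "$src" "$dir/$name"
)
```

When the switch file is absent the function returns zero without copying anything; when it is present the function behaves like an ordinary no-overwrite archive command.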
With this preparation, a backup can be taken using a script like the following:
The switch file /var/lib/pgsql/backup_in_progress is created first, enabling archiving of completed WAL files to occur. After the backup the switch file is removed. Archived WAL files are then added to the backup so that both base backup and all required WAL files are part of the same tar file. Please remember to add error handling to your backup scripts.
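One way such a script might look, with basic error handling added, is sketched below. The function name standalone_hot_backup, its parameters, and the 'hot_backup' label are assumptions for illustration. The sketch uses the exclusive pg_start_backup/pg_stop_backup calls through separate psql invocations, so it only applies to server versions that still provide the exclusive backup mode, and it accepts tar exit status 1 on the assumption that GNU tar is in use.

```shell
# Hypothetical standalone hot backup driver.
# Usage: standalone_hot_backup SWITCH_FILE DATA_DIR ARCHIVE_DIR BACKUP_TAR
standalone_hot_backup() (
    switch_file=$1
    data_dir=$2
    archive_dir=$3
    backup_tar=$4

    # Create the switch file so completed WAL files start being archived.
    touch "$switch_file" || exit 1

    if ! psql -c "select pg_start_backup('hot_backup');"; then
        rm -f "$switch_file"
        exit 1
    fi

    tar -cf "$backup_tar" "$data_dir"
    tar_status=$?

    # Always leave backup mode, even if tar failed.
    psql -c "select pg_stop_backup();"
    stop_status=$?
    rm -f "$switch_file"

    # GNU tar exits with 1 when files changed during the copy; that is normal.
    [ "$tar_status" -le 1 ] || exit "$tar_status"
    [ "$stop_status" -eq 0 ] || exit "$stop_status"

    # Append the archived WAL files so the tar file is self-contained.
    tar -rf "$backup_tar" "$archive_dir"
)
```

A call might pass /var/lib/pgsql/backup_in_progress as the switch file, with the data directory, archive directory, and tar file locations chosen to suit the installation.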
If archive storage size is a concern, you can use gzip to compress the archive files:
You will then need to use gunzip during recovery:
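A pair of helper functions shows how the compressing and decompressing sides fit together. The names gzip_archive and gunzip_restore, the argument order, and the cleanup of a partial file on failure are assumptions for illustration; in a configuration file the same pipelines would be written with %p and %f in archive_command and restore_command.

```shell
# Hypothetical compressing archive helper.
# Usage: gzip_archive SRC_PATH FILE_NAME ARCHIVE_DIR
gzip_archive() (
    src=$1
    name=$2
    dir=$3
    # Keep the no-overwrite rule for the compressed archive as well.
    [ ! -f "$dir/$name" ] || exit 1
    # Remove a partial file if compression fails, so a retry can succeed.
    gzip < "$src" > "$dir/$name" || { rm -f "$dir/$name"; exit 1; }
)

# Hypothetical decompressing restore helper.
# Usage: gunzip_restore ARCHIVE_DIR FILE_NAME DEST_PATH
gunzip_restore() (
    dir=$1
    name=$2
    dest=$3
    # Missing files must produce a nonzero status during recovery.
    [ -f "$dir/$name" ] || exit 1
    gunzip < "$dir/$name" > "$dest"
)
```

Note that the archived file keeps the original segment name even though its contents are compressed, so the restore side must know that every file in this archive needs to be decompressed.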
Many people choose to use scripts to define their archive_command, so that their postgresql.conf entry looks very simple:
Using a separate script file is advisable any time you want to use more than a single command in the archiving process. This allows all complexity to be managed within the script, which can be written in a popular scripting language such as bash or perl.
Examples of requirements that might be solved within a script include:
Copying data to secure off-site data storage
Batching WAL files so that they are transferred every three hours, rather than one at a time
Interfacing with other backup and recovery software
Interfacing with monitoring software to report errors
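When using an archive_command script, it's desirable to enable logging_collector. Any messages written to stderr from the script will then appear in the database server log, allowing complex configurations to be diagnosed easily if they fail.
At this writing, there are several limitations of the continuous archiving (PITR) technique. These will probably be fixed in future releases:
If a CREATE DATABASE command is executed while a base backup is being taken, and then the template database that the CREATE DATABASE copied is modified while the base backup is still in progress, it is possible that recovery will cause those modifications to be propagated into the created database as well. This is of course undesirable. To avoid this risk, it is best not to modify any template databases while taking a base backup.
CREATE TABLESPACE commands are WAL-logged with the literal absolute path, and will therefore be replayed as tablespace creations with the same absolute path. This might be undesirable if the log is being replayed on a different machine. It can be dangerous even if the log is being replayed on the same machine, but into a new data directory: the replay will still overwrite the contents of the original tablespace. To avoid potential gotchas of this sort, the best practice is to take a new base backup after creating or dropping tablespaces.
It should also be noted that the default WAL format is fairly bulky since it includes many disk page snapshots. These page snapshots are designed to support crash recovery, since we might need to fix partially-written disk pages. Depending on your system hardware and software, the risk of partial writes might be small enough to ignore, in which case you can significantly reduce the total volume of archived logs by turning off page snapshots using the full_page_writes parameter. (Read the notes and warnings in Chapter 30 before you do so.) Turning off page snapshots does not prevent use of the logs for PITR operations. An area for future development is to compress archived WAL data by removing unnecessary page copies even when full_page_writes is on. In the meantime, administrators might wish to reduce the number of page snapshots included in WAL by increasing the checkpoint interval parameters as much as feasible.
If the primary server fails then the standby server should begin failover procedures.
If the standby server fails then no failover need take place. If the standby server can be restarted, even some time later, then the recovery process can also be restarted immediately, taking advantage of restartable recovery. If the standby server cannot be restarted, then a full new standby server instance should be created.
If the primary server fails and the standby server becomes the new primary, and then the old primary restarts, you must have a mechanism for informing the old primary that it is no longer the primary. This is sometimes known as STONITH (Shoot The Other Node In The Head), which is necessary to avoid situations where both systems think they are the primary, which will lead to confusion and ultimately data corruption.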
Many failover systems use just two systems, the primary and the standby, connected by some kind of heartbeat mechanism to continually verify the connectivity between the two and the viability of the primary. It is also possible to use a third system (called a witness server) to prevent some cases of inappropriate failover, but the additional complexity might not be worthwhile unless it is set up with sufficient care and rigorous testing.
PostgreSQL does not provide the system software required to identify a failure on the primary and notify the standby database server. Many such tools exist and are well integrated with the operating system facilities required for successful failover, such as IP address migration.
Once failover to the standby occurs, there is only a single server in operation. This is known as a degenerate state. The former standby is now the primary, but the former primary is down and might stay down. To return to normal operation, a standby server must be recreated, either on the former primary system when it comes up, or on a third, possibly new, system. The pg_rewind utility can be used to speed up this process on large clusters. Once complete, the primary and standby can be considered to have switched roles. Some people choose to use a third server to provide backup for the new primary until the new standby server is recreated, though clearly this complicates the system configuration and operational processes.
So, switching from primary to standby server can be fast but requires some time to re-prepare the failover cluster. Regular switching from primary to standby is useful, since it allows regular downtime on each system for maintenance. This also serves as a test of the failover mechanism to ensure that it will really work when you need it. Written administration procedures are advised.
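To trigger failover of a log-shipping standby server, run pg_ctl promote, call pg_promote, or create a trigger file with the file name and path specified by promote_trigger_file. If you're planning to use pg_ctl promote or to call pg_promote to fail over, promote_trigger_file is not required. If you're setting up reporting servers that are only used to offload read-only queries from the primary, not for high availability purposes, you don't need to promote them.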
另一種備份策略是直接複製 PostgreSQL 用於資料儲存的資料庫中檔案。第 19.2 節介紹了這些檔案的位置。您可以使用自己喜歡的任何方法進行檔案系統備份。例如:
但是,有兩個限制會讓這個方法不可行,或者至少不如 pg_dump 方法:
必須關閉資料庫伺服器才能完成可用的備份。備份其間都無法操作,像是必須禁止所有連線(部分原因是 tar 和類似工具無法對檔案系統狀態進行原子快照,而且還因為伺服器內部還存在一些未儲存的資料緩衝)。 有關停止伺服器的資訊可以在第 19.5 節中找到。不用說,您也需要在還原資料之前關閉伺服器。
如果您已深入研究資料庫的檔案系統結構的詳細資訊,則可能會嘗試僅從特定檔案或目錄中備份或還原某些特定資料表或資料庫。這些都不會成功,因為沒有提交日誌檔案 pg_xact/*,其中包含所有事務的提交狀態,這些檔案中包含的資料將無法使用。資料表檔案僅可用於此資訊。當然,僅還原資料表和關聯的 pg_xact 資料也是不可能的,因為這會使資料庫叢集中的所有其他資料表失效。因此,檔案系統備份僅適用於完整資料庫叢集的完整備份和還原。
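An alternative file system backup approach is to make a "consistent snapshot" of the data directory, if the file system supports that functionality (and you are willing to trust that it is implemented correctly). The typical procedure is to make a "frozen snapshot" of the volume containing the database, then copy the whole data directory (not just parts, see above) from the snapshot to a backup device, then release the frozen snapshot. This works even while the database server is running. However, a backup created in this way saves the database files in a state as if the database server had not been properly shut down. When you start the database server on the backed-up data, it will therefore think the previous server instance crashed and will replay the WAL log. This is not a problem; just be aware of it (and be sure to include the WAL files in your backup). You can perform a CHECKPOINT before taking the snapshot to reduce recovery time.
If your database is spread across multiple file systems, there might not be any way to obtain exactly simultaneous frozen snapshots of all the volumes. For example, if your data files and WAL log are on different disks, or if tablespaces are on different file systems, it might not be possible to use snapshot backup because the snapshots must be simultaneous. Read your file system documentation very carefully before trusting the consistent-snapshot technique in such situations.
If simultaneous snapshots are not possible, one option is to shut down the database server long enough to establish all the frozen snapshots. Another option is to perform a continuous archiving base backup (Section 26.3.2), because such backups are immune to file system changes during the backup. This requires enabling continuous archiving only during the backup process; the restore is done using continuous archive recovery (Section 26.3.4).
Another option is to use rsync to perform a file system backup. This is done by first running rsync while the database server is running, then shutting down the database server long enough to do an rsync --checksum. (--checksum is necessary because rsync only has file modification-time granularity of one second.) The second rsync will be quicker than the first, because it has relatively little data to transfer, and the end result will be consistent because the server was down. This method allows a file system backup to be performed with minimal downtime.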
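The two-pass rsync procedure can be sketched as an ordered command plan. This is only an illustration: the function name and the directory arguments are assumptions, the commands are returned rather than executed, and the use of pg_ctl to stop and start the server is one possible choice.

```python
from typing import List


def rsync_backup_plan(data_dir: str, backup_dir: str) -> List[List[str]]:
    """Return the commands for a two-pass rsync backup, in execution order."""
    src = data_dir.rstrip("/") + "/"  # trailing slash: copy directory contents
    return [
        # Pass 1: bulk copy while the server is still running.
        ["rsync", "-a", src, backup_dir],
        # Stop the server so the files no longer change.
        ["pg_ctl", "stop", "-D", data_dir],
        # Pass 2: compare file contents, not just sizes and timestamps.
        ["rsync", "-a", "--checksum", src, backup_dir],
        # Bring the server back up.
        ["pg_ctl", "start", "-D", data_dir],
    ]
```

Each entry could be passed to subprocess.run(..., check=True) so that a failed step aborts the plan before the server is restarted on an incomplete backup.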
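Note that a file system backup will typically be larger than an SQL dump. (pg_dump does not need to dump the contents of indexes, for example, just the commands to recreate them.) However, taking a file system backup might be faster.
Shared disk failover avoids synchronization overhead by having only one copy of the database. It uses a single disk array that is shared by multiple servers. If the main database server fails, the standby server is able to mount and start the database as though it were recovering from a database crash. This allows rapid failover with no data loss.
Shared hardware functionality is common in network storage devices. Using a network file system is also possible, though care must be taken that the file system has full POSIX behavior (see Section 18.2.2.1). One significant limitation of this method is that if the shared disk array fails or becomes corrupt, the primary and standby servers are both nonfunctional. Another issue is that the standby server should never access the shared storage while the primary server is running.
A modified version of shared hardware functionality is file system replication, where all changes to a file system are mirrored to a file system residing on another computer. The only restriction is that the mirroring must be done in a way that ensures the standby server has a consistent copy of the file system; specifically, writes to the standby must be done in the same order as those on the primary. DRBD is a popular file system replication solution for Linux.
A hot standby server can be kept current by reading a stream of write-ahead log (WAL) records. If the main server fails, the standby contains almost all of the data of the main server and can quickly be made the new primary database server. This can be synchronous or asynchronous, and can only be done for the entire database server.
A standby server can be implemented using file-based log shipping (Section 26.2), streaming replication (see Section 26.2.5), or a combination of both. For information on hot standby, see Section 26.5.
Logical replication allows a database server to send a stream of data modifications to another server. PostgreSQL logical replication constructs a stream of logical data modifications from the WAL. Logical replication allows the data changes of individual tables to be replicated. It does not require a particular server to be designated as a primary or a standby, but allows data to flow in multiple directions. For more information on logical replication, see Chapter 30. Through the logical decoding interface (Chapter 48), third-party extensions can also provide similar functionality.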
A master-standby replication setup sends all data modification queries to the master server. The master server asynchronously sends data changes to the standby server. The standby can answer read-only queries while the master server is running. The standby server is ideal for data warehouse queries.
Slony-I is an example of this type of replication, with per-table granularity, and support for multiple standby servers. Because it updates the standby server asynchronously (in batches), there is possible data loss during fail over.
With SQL-based replication middleware, a program intercepts every SQL query and sends it to one or all servers. Each server operates independently. Read-write queries must be sent to all servers, so that every server receives any changes. But read-only queries can be sent to just one server, allowing the read workload to be distributed among them.
If queries are simply broadcast unmodified, functions like random(), CURRENT_TIMESTAMP, and sequences can have different values on different servers. This is because each server operates independently, and because SQL queries are broadcast (and not actual modified rows). If this is unacceptable, either the middleware or the application must query such values from a single server and then use those values in write queries. Another option is to use this replication option with a traditional master-standby setup, i.e., data modification queries are sent only to the master and are propagated to the standby servers via master-standby replication, not by the replication middleware. Care must also be taken that all transactions either commit or abort on all servers, perhaps using two-phase commit (PREPARE TRANSACTION and COMMIT PREPARED). Pgpool-II and Continuent Tungsten are examples of this type of replication.
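The divergence described above can be illustrated with a toy model. The Server class and both broadcast functions below are hypothetical stand-ins, not part of any middleware; each Server keeps its own random state, much as each database server evaluates random() independently.

```python
import random


class Server:
    """Toy stand-in for an independent database server."""

    def __init__(self, seed: int) -> None:
        self._rng = random.Random(seed)  # each server has its own random state
        self.rows = []

    def insert_random(self) -> None:
        # Models broadcasting "INSERT ... VALUES (random())" unmodified.
        self.rows.append(self._rng.random())

    def insert_value(self, value: float) -> None:
        # Models broadcasting an INSERT that carries a literal value.
        self.rows.append(value)


def broadcast_unmodified(servers) -> None:
    """Every server evaluates the non-deterministic function itself."""
    for server in servers:
        server.insert_random()


def broadcast_resolved(servers, source_rng: random.Random) -> None:
    """The value is obtained once, then the same literal is sent to all servers."""
    value = source_rng.random()
    for server in servers:
        server.insert_value(value)
```

With broadcast_unmodified the servers end up holding different rows, while broadcast_resolved keeps them identical, which is the reason for querying such values from a single server first.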
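For servers that are not regularly connected or have slow communication links, like laptops or remote servers, keeping data consistent among servers is a challenge. Using asynchronous multimaster replication, each server works independently and periodically communicates with the other servers to identify conflicting transactions. The conflicts can be resolved by users or by conflict resolution rules. Bucardo is an example of this type of replication.
In synchronous multimaster replication, each server can accept write requests, and modified data is transmitted from the original server to every other server before each transaction commits. Heavy write activity can cause excessive locking and commit delays, leading to poor performance. Read requests can be sent to any server. Some implementations use shared disk to reduce the communication overhead. Synchronous multimaster replication is best for mostly read workloads, though its big advantage is that any server can accept write requests: there is no need to partition workloads between primary and standby servers, and because the data changes are sent from one server to another, there is no problem with non-deterministic functions like random().
PostgreSQL does not offer this type of replication, though PostgreSQL two-phase commit (PREPARE TRANSACTION and COMMIT PREPARED) can be used to implement it in application code or middleware.
Table 26.1 summarizes the capabilities of the various solutions listed above.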
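| Feature | Shared Disk | File System Repl. | Write-Ahead Log Shipping | Logical Repl. | Trigger-Based Repl. | SQL Repl. Middleware | Async. MM Repl. | Sync. MM Repl. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Popular examples | NAS | DRBD | built-in streaming repl. | built-in logical repl., pglogical | Londiste, Slony | pgpool-II | Bucardo | |
| Comm. method | shared disk | disk blocks | WAL | logical decoding | table rows | SQL | table rows | table rows and row locks |
| No special hardware required | | • | • | • | • | • | • | • |
| Allows multiple master servers | | | | • | | • | • | • |
| No master server overhead | • | | • | • | | • | | |
| No waiting for multiple servers | • | | with sync off | with sync off | • | | • | |
| Master failure will never lose data | • | • | with sync on | with sync on | | • | | • |
| Replicas accept read-only queries | | | with hot standby | • | • | • | • | • |
| Per-table granularity | | | | • | • | | • | • |
| No conflict resolution necessary | • | • | • | | • | • | | • |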
There are a few solutions that do not fit into the above categories:
Data partitioning splits tables into data sets. Each set can be modified by only one server. For example, data can be partitioned by offices, e.g., London and Paris, with a server in each office. If queries combining London and Paris data are necessary, an application can query both servers, or master/standby replication can be used to keep a read-only copy of the other office's data on each server.
Many of the above solutions allow multiple servers to handle multiple queries, but none allow a single query to use multiple servers to complete faster. Multiple-server parallel query execution allows multiple servers to work concurrently on a single query. It is usually accomplished by splitting the data among servers and having each server execute its part of the query and return results to a central server, where they are combined and returned to the user. This can be implemented using the PL/Proxy tool set.
It should also be noted that because PostgreSQL is open source and easily extended, a number of companies have taken PostgreSQL and created commercial closed-source solutions with unique failover, replication, and load balancing capabilities. These are not discussed here.
Hot standby is the term used to describe the ability to connect to the server and run read-only queries while the server is in archive recovery or standby mode. This is useful both for replication purposes and for restoring a backup to a desired state with great precision. The term hot standby also refers to the ability of the server to move from recovery through to normal operation while users continue running queries and/or keep their connections open.
Running queries in hot standby mode is similar to normal query operation, though there are several usage and administrative differences explained below.
When the hot_standby parameter is set to true on a standby server, it will begin accepting connections once the recovery has brought the system to a consistent state. All such connections are strictly read-only; not even temporary tables may be written.
The data on the standby takes some time to arrive from the primary server so there will be a measurable delay between primary and standby. Running the same query nearly simultaneously on both primary and standby might therefore return differing results. We say that data on the standby is eventually consistent with the primary. Once the commit record for a transaction is replayed on the standby, the changes made by that transaction will be visible to any new snapshots taken on the standby. Snapshots may be taken at the start of each query or at the start of each transaction, depending on the current transaction isolation level. For more details, see Section 13.2.
Transactions started during hot standby may issue the following commands:
Query access: SELECT, COPY TO
Cursor commands: DECLARE, FETCH, CLOSE
Settings: SHOW, SET, RESET
Transaction management commands: BEGIN, END, ABORT, START TRANSACTION; SAVEPOINT, RELEASE, ROLLBACK TO SAVEPOINT; EXCEPTION blocks and other internal subtransactions
LOCK TABLE, though only when explicitly in one of these modes: ACCESS SHARE, ROW SHARE or ROW EXCLUSIVE.
Plans and resources: PREPARE, EXECUTE, DEALLOCATE, DISCARD
Plugins and extensions: LOAD
UNLISTEN
Transactions started during hot standby will never be assigned a transaction ID and cannot write to the system write-ahead log. Therefore, the following actions will produce error messages:
Data Manipulation Language (DML): INSERT, UPDATE, DELETE, COPY FROM, TRUNCATE. Note that there are no allowed actions that result in a trigger being executed during recovery. This restriction applies even to temporary tables, because table rows cannot be read or written without assigning a transaction ID, which is currently not possible in a hot standby environment.
Data Definition Language (DDL): CREATE, DROP, ALTER, COMMENT. This restriction applies even to temporary tables, because carrying out these operations would require updating the system catalog tables.
SELECT ... FOR SHARE | UPDATE, because row locks cannot be taken without updating the underlying data files.
Rules on SELECT statements that generate DML commands.
LOCK that explicitly requests a mode higher than ROW EXCLUSIVE MODE.
LOCK in short default form, since it requests ACCESS EXCLUSIVE MODE.
Transaction management commands that explicitly set non-read-only state: BEGIN READ WRITE, START TRANSACTION READ WRITE; SET TRANSACTION READ WRITE, SET SESSION CHARACTERISTICS AS TRANSACTION READ WRITE; SET transaction_read_only = off
Two-phase commit commands: PREPARE TRANSACTION, COMMIT PREPARED, ROLLBACK PREPARED, because even read-only transactions need to write WAL in the prepare phase (the first phase of two-phase commit).
Sequence updates: nextval(), setval()
LISTEN, NOTIFY
In normal operation, "read-only" transactions are allowed to use LISTEN and NOTIFY, so hot standby sessions operate under slightly tighter restrictions than ordinary read-only sessions. It is possible that some of these restrictions might be loosened in a future release.
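The LOCK rules in the two lists above can be condensed into a small check. This is a sketch of the rule as stated in the text, not server code; the function and constant names are hypothetical.

```python
from typing import Optional

# Lock modes the text lists as permitted for LOCK TABLE during hot standby.
ALLOWED_LOCK_MODES = {"ACCESS SHARE", "ROW SHARE", "ROW EXCLUSIVE"}


def lock_allowed_in_hot_standby(mode: Optional[str]) -> bool:
    """Return True if LOCK TABLE ... IN <mode> MODE is accepted on a hot standby.

    mode=None models the short form of LOCK, which requests ACCESS EXCLUSIVE.
    """
    if mode is None:
        mode = "ACCESS EXCLUSIVE"
    # Normalize case and whitespace before comparing.
    return " ".join(mode.upper().split()) in ALLOWED_LOCK_MODES
```

A client-side check like this only avoids sending a statement that would fail; the standby itself remains the authority on what is rejected.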
During hot standby, the parameter transaction_read_only is always true and may not be changed. But as long as no attempt is made to modify the database, connections during hot standby will act much like any other database connection. If failover or switchover occurs, the database will switch to normal processing mode. Sessions will remain connected while the server changes mode. Once hot standby finishes, it will be possible to initiate read-write transactions (even from a session begun during hot standby).
Users can determine whether hot standby is currently active for their session by issuing SHOW in_hot_standby. (In server versions before 14, the in_hot_standby parameter did not exist; a workable substitute method for older servers is SHOW transaction_read_only.) In addition, a set of functions (Table 9.90) allow users to access information about the standby server. These allow you to write programs that are aware of the current state of the database. These can be used to monitor the progress of recovery, or to allow you to write complex programs that restore the database to particular states.
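A client that must work with servers both older and newer than version 14 could choose the statement as follows. This is a sketch: the function name is hypothetical, and it assumes the caller has already read the integer server_version_num setting (for example, 140005 for 14.5).

```python
def hot_standby_probe(server_version_num: int) -> str:
    """Return the SQL statement to use for detecting hot standby."""
    if server_version_num >= 140000:
        return "SHOW in_hot_standby"
    # Substitute for servers older than version 14.
    return "SHOW transaction_read_only"
```

Note that the substitute is weaker: transaction_read_only can also be on in an ordinary read-only session on a primary.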
The primary and standby servers are in many ways loosely connected. Actions on the primary will have an effect on the standby. As a result, there is potential for negative interactions or conflicts between them. The easiest conflict to understand is performance: if a huge data load is taking place on the primary then this will generate a similar stream of WAL records on the standby, so standby queries may contend for system resources, such as I/O.
There are also additional types of conflict that can occur with hot standby. These conflicts are hard conflicts in the sense that queries might need to be canceled and, in some cases, sessions disconnected to resolve them. The user is provided with several ways to handle these conflicts. Conflict cases include:
Access Exclusive locks taken on the primary server, including both explicit LOCK commands and various DDL actions, conflict with table accesses in standby queries.
Dropping a tablespace on the primary conflicts with standby queries using that tablespace for temporary work files.
Dropping a database on the primary conflicts with sessions connected to that database on the standby.
Application of a vacuum cleanup record from WAL conflicts with standby transactions whose snapshots can still “see” any of the rows to be removed.
Application of a vacuum cleanup record from WAL conflicts with queries accessing the target page on the standby, whether or not the data to be removed is visible.
On the primary server, these cases simply result in waiting; and the user might choose to cancel either of the conflicting actions. However, on the standby there is no choice: the WAL-logged action already occurred on the primary so the standby must not fail to apply it. Furthermore, allowing WAL application to wait indefinitely may be very undesirable, because the standby's state will become increasingly far behind the primary's. Therefore, a mechanism is provided to forcibly cancel standby queries that conflict with to-be-applied WAL records.
An example of the problem situation is an administrator on the primary server running DROP TABLE on a table that is currently being queried on the standby server. Clearly the standby query cannot continue if the DROP TABLE is applied on the standby. If this situation occurred on the primary, the DROP TABLE would wait until the other query had finished. But when DROP TABLE is run on the primary, the primary doesn't have information about what queries are running on the standby, so it will not wait for any such standby queries. The WAL change records come through to the standby while the standby query is still running, causing a conflict. The standby server must either delay application of the WAL records (and everything after them, too) or else cancel the conflicting query so that the DROP TABLE can be applied.
When a conflicting query is short, it's typically desirable to allow it to complete by delaying WAL application for a little bit; but a long delay in WAL application is usually not desirable. So the cancel mechanism has parameters, max_standby_archive_delay and max_standby_streaming_delay, that define the maximum allowed delay in WAL application. Conflicting queries will be canceled once it has taken longer than the relevant delay setting to apply any newly-received WAL data. There are two parameters so that different delay values can be specified for the case of reading WAL data from an archive (i.e., initial recovery from a base backup or “catching up” a standby server that has fallen far behind) versus reading WAL data via streaming replication.
In a standby server that exists primarily for high availability, it's best to set the delay parameters relatively short, so that the server cannot fall far behind the primary due to delays caused by standby queries. However, if the standby server is meant for executing long-running queries, then a high or even infinite delay value may be preferable. Keep in mind however that a long-running query could cause other sessions on the standby server to not see recent changes on the primary, if it delays application of WAL records.
Once the delay specified by max_standby_archive_delay or max_standby_streaming_delay has been exceeded, conflicting queries will be canceled. This usually results just in a cancellation error, although in the case of replaying a DROP DATABASE the entire conflicting session will be terminated. Also, if the conflict is over a lock held by an idle transaction, the conflicting session is terminated (this behavior might change in the future).
Canceled queries may be retried immediately (after beginning a new transaction, of course). Since query cancellation depends on the nature of the WAL records being replayed, a query that was canceled may well succeed if it is executed again.
Keep in mind that the delay parameters are compared to the elapsed time since the WAL data was received by the standby server. Thus, the grace period allowed to any one query on the standby is never more than the delay parameter, and could be considerably less if the standby has already fallen behind as a result of waiting for previous queries to complete, or as a result of being unable to keep up with a heavy update load.
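The arithmetic behind this grace period can be sketched as below. The function name and the millisecond units are assumptions made for illustration; the -1 convention for an unlimited delay matches the parameter settings described elsewhere in this section.

```python
from typing import Optional


def remaining_grace(max_delay_ms: int, ms_since_wal_received: int) -> Optional[int]:
    """Milliseconds a conflicting standby query may still run, or None if unlimited.

    max_delay_ms: the relevant max_standby_*_delay value; -1 means wait forever.
    ms_since_wal_received: time elapsed since the standby received the WAL data.
    """
    if max_delay_ms == -1:
        return None
    # The clock starts when the WAL arrived, not when the query started.
    return max(0, max_delay_ms - ms_since_wal_received)
```

For instance, with a 30-second limit and WAL that arrived 25 seconds ago, a newly conflicting query has only 5 seconds left, and none at all once replay is already more than 30 seconds behind.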
The most common reason for conflict between standby queries and WAL replay is “early cleanup”. Normally, PostgreSQL allows cleanup of old row versions when there are no transactions that need to see them to ensure correct visibility of data according to MVCC rules. However, this rule can only be applied for transactions executing on the primary. So it is possible that cleanup on the primary will remove row versions that are still visible to a transaction on the standby.
Experienced users should note that both row version cleanup and row version freezing will potentially conflict with standby queries. Running a manual VACUUM FREEZE is likely to cause conflicts even on tables with no updated or deleted rows.
Users should be clear that tables that are regularly and heavily updated on the primary server will quickly cause cancellation of longer running queries on the standby. In such cases the setting of a finite value for max_standby_archive_delay or max_standby_streaming_delay can be considered similar to setting statement_timeout.
Remedial possibilities exist if the number of standby-query cancellations is found to be unacceptable. The first option is to set the parameter hot_standby_feedback, which prevents VACUUM from removing recently-dead rows and so cleanup conflicts do not occur. If you do this, you should note that this will delay cleanup of dead rows on the primary, which may result in undesirable table bloat. However, the cleanup situation will be no worse than if the standby queries were running directly on the primary server, and you are still getting the benefit of off-loading execution onto the standby. If standby servers connect and disconnect frequently, you might want to make adjustments to handle the period when feedback from hot_standby_feedback is not being provided. For example, consider increasing max_standby_archive_delay so that queries are not rapidly canceled by conflicts in WAL archive files during disconnected periods. You should also consider increasing max_standby_streaming_delay to avoid rapid cancellations by newly-arrived streaming WAL entries after reconnection.
Another option is to increase vacuum_defer_cleanup_age on the primary server, so that dead rows will not be cleaned up as quickly as they normally would be. This will allow more time for queries to execute before they are canceled on the standby, without having to set a high max_standby_streaming_delay. However it is difficult to guarantee any specific execution-time window with this approach, since vacuum_defer_cleanup_age is measured in transactions executed on the primary server.
The number of query cancels and the reason for them can be viewed using the pg_stat_database_conflicts system view on the standby server. The pg_stat_database system view also contains summary information.
Users can control whether a log message is produced when WAL replay is waiting longer than deadlock_timeout for conflicts. This is controlled by the log_recovery_conflict_waits parameter.
If hot_standby is on in postgresql.conf (the default value) and there is a standby.signal file present, the server will run in hot standby mode. However, it may take some time for hot standby connections to be allowed, because the server will not accept connections until it has completed sufficient recovery to provide a consistent state against which queries can run. During this period, clients that attempt to connect will be refused with an error message. To confirm the server has come up, either loop trying to connect from the application, or look in the server logs for the messages reporting that a consistent recovery state has been reached and that the database system is ready to accept read-only connections.
Consistency information is recorded once per checkpoint on the primary. It is not possible to enable hot standby when reading WAL written during a period when wal_level was not set to replica or logical on the primary. Reaching a consistent state can also be delayed in the presence of both of these conditions:
A write transaction has more than 64 subtransactions
Very long-lived write transactions
If you are running file-based log shipping ("warm standby"), you might need to wait until the next WAL file arrives, which could be as long as the archive_timeout setting on the primary.
The settings of some parameters determine the size of shared memory for tracking transaction IDs, locks, and prepared transactions. These shared memory structures must be no smaller on a standby than on the primary in order to ensure that the standby does not run out of shared memory during recovery. For example, if the primary had used a prepared transaction but the standby had not allocated any shared memory for tracking prepared transactions, then recovery could not continue until the standby's configuration is changed. The parameters affected are:
max_connections
max_prepared_transactions
max_locks_per_transaction
max_wal_senders
max_worker_processes
The easiest way to ensure this does not become a problem is to have these parameters set on the standbys to values equal to or greater than on the primary. Therefore, if you want to increase these values, you should do so on all standby servers first, before applying the changes to the primary server. Conversely, if you want to decrease these values, you should do so on the primary server first, before applying the changes to all standby servers. Keep in mind that when a standby is promoted, it becomes the new reference for the required parameter settings for the standbys that follow it. Therefore, to avoid this becoming a problem during a switchover or failover, it is recommended to keep these settings the same on all standby servers.
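A pre-flight check along these lines could be run before changing settings or promoting a standby. It is a sketch only: the function name is hypothetical, and the two dictionaries are assumed to have been filled from each server's configuration beforehand.

```python
from typing import Dict, List

# Parameters named in the text as sizing shared memory used during recovery.
TRACKED_PARAMETERS = (
    "max_connections",
    "max_prepared_transactions",
    "max_locks_per_transaction",
    "max_wal_senders",
    "max_worker_processes",
)


def undersized_parameters(primary: Dict[str, int], standby: Dict[str, int]) -> List[str]:
    """Return the tracked parameters whose standby value is below the primary's."""
    return [
        name
        for name in TRACKED_PARAMETERS
        if standby[name] < primary[name]
    ]
```

An empty result means the standby is sized at least as large as the primary for every tracked parameter.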
The WAL tracks changes to these parameters on the primary. If a hot standby processes WAL that indicates that the current value on the primary is higher than its own value, it will log a warning and pause recovery.
At that point, the settings on the standby need to be updated and the instance restarted before recovery can continue. If the standby is not a hot standby, then when it encounters the incompatible parameter change, it will shut down immediately without pausing, since there is then no value in keeping it up.
It is important that the administrator select appropriate settings for max_standby_archive_delay and max_standby_streaming_delay. The best choices vary depending on business priorities. For example if the server is primarily tasked as a High Availability server, then you will want low delay settings, perhaps even zero, though that is a very aggressive setting. If the standby server is tasked as an additional server for decision support queries then it might be acceptable to set the maximum delay values to many hours, or even -1 which means wait forever for queries to complete.
Transaction status "hint bits" written on the primary are not WAL-logged, so data on the standby will likely re-write the hints again on the standby. Thus, the standby server will still perform disk writes even though all users are read-only; no changes occur to the data values themselves. Users will still write large sort temporary files and re-generate relcache info files, so no part of the database is truly read-only during hot standby mode. Note also that writes to remote databases using dblink module, and other operations outside the database using PL functions will still be possible, even though the transaction is read-only locally.
The following types of administration commands are not accepted during recovery mode:
Data Definition Language (DDL): e.g., CREATE INDEX
Privilege and Ownership: GRANT
, REVOKE
, REASSIGN
Maintenance commands: ANALYZE
, VACUUM
, CLUSTER
, REINDEX
Again, note that some of these commands are actually allowed during "read only" mode transactions on the primary.
As a result, you cannot create additional indexes that exist solely on the standby, nor statistics that exist solely on the standby. If these administration commands are needed, they should be executed on the primary, and eventually those changes will propagate to the standby.
pg_cancel_backend() and pg_terminate_backend() will work on user backends, but not the startup process, which performs recovery. pg_stat_activity does not show recovering transactions as active. As a result, pg_prepared_xacts is always empty during recovery. If you wish to resolve in-doubt prepared transactions, view pg_prepared_xacts on the primary and issue commands to resolve transactions there or resolve them after the end of recovery.
pg_locks will show locks held by backends, as normal. pg_locks also shows a virtual transaction managed by the startup process that owns all AccessExclusiveLocks held by transactions being replayed by recovery. Note that the startup process does not acquire locks to make database changes, and thus locks other than AccessExclusiveLocks do not show in pg_locks for the Startup process; they are just presumed to exist.
The Nagios plugin check_pgsql will work, because the simple information it checks for exists. The check_postgres monitoring script will also work, though some reported values could give different or confusing results. For example, last vacuum time will not be maintained, since no vacuum occurs on the standby. Vacuums running on the primary do still send their changes to the standby.
WAL file control commands will not work during recovery, e.g., pg_backup_start, pg_switch_wal, etc.
Dynamically loadable modules work, including pg_stat_statements.
Advisory locks work normally in recovery, including deadlock detection. Note that advisory locks are never WAL logged, so it is impossible for an advisory lock on either the primary or the standby to conflict with WAL replay. Nor is it possible to acquire an advisory lock on the primary and have it initiate a similar advisory lock on the standby. Advisory locks relate only to the server on which they are acquired.
Trigger-based replication systems such as Slony, Londiste and Bucardo won't run on the standby at all, though they will run happily on the primary server as long as the changes are not sent to standby servers to be applied. WAL replay is not trigger-based so you cannot relay from the standby to any system that requires additional database writes or relies on the use of triggers.
New OIDs cannot be assigned, though some UUID generators may still work as long as they do not rely on writing new status to the database.
Currently, temporary table creation is not allowed during read-only transactions, so in some cases existing scripts will not run correctly. This restriction might be relaxed in a later release. This is both an SQL standard compliance issue and a technical issue.
DROP TABLESPACE can only succeed if the tablespace is empty. Some standby users may be actively using the tablespace via their temp_tablespaces parameter. If there are temporary files in the tablespace, all active queries are canceled to ensure that temporary files are removed, so the tablespace can be removed and WAL replay can continue.
Running DROP DATABASE or ALTER DATABASE ... SET TABLESPACE on the primary will generate a WAL entry that will cause all users connected to that database on the standby to be forcibly disconnected. This action occurs immediately, whatever the setting of max_standby_streaming_delay. Note that ALTER DATABASE ... RENAME does not disconnect users, which in most cases will go unnoticed, though it might in some cases confuse a program that depends in some way upon the database name.
In normal (non-recovery) mode, if you issue DROP USER or DROP ROLE for a role with login capability while that user is still connected then nothing happens to the connected user; they remain connected. The user cannot reconnect however. This behavior applies in recovery also, so a DROP USER on the primary does not disconnect that user on the standby.
The cumulative statistics system is active during recovery. All scans, reads, blocks, index usage, etc., will be recorded normally on the standby. However, WAL replay will not increment relation and database specific counters. I.e. replay will not increment pg_stat_all_tables columns (like n_tup_ins), nor will reads or writes performed by the startup process be tracked in the pg_statio views, nor will associated pg_stat_database columns be incremented.
Autovacuum is not active during recovery. It will start normally at the end of recovery.
The checkpointer process and the background writer process are active during recovery. The checkpointer process will perform restartpoints (similar to checkpoints on the primary) and the background writer process will perform normal block cleaning activities. This can include updates of the hint bit information stored on the standby server. The CHECKPOINT command is accepted during recovery, though it performs a restartpoint rather than a new checkpoint.
Various parameters have been mentioned above in Section 27.4.2 and Section 27.4.3.
On the primary, parameters wal_level and vacuum_defer_cleanup_age can be used. max_standby_archive_delay and max_standby_streaming_delay have no effect if set on the primary.
On the standby, parameters hot_standby, max_standby_archive_delay and max_standby_streaming_delay can be used. vacuum_defer_cleanup_age has no effect as long as the server remains in standby mode, though it will become relevant if the standby becomes primary.
There are several limitations of hot standby. These can and probably will be fixed in future releases:
Full knowledge of running transactions is required before snapshots can be taken. Transactions that use large numbers of subtransactions (currently greater than 64) will delay the start of read-only connections until the completion of the longest running write transaction. If this situation occurs, explanatory messages will be sent to the server log.
Valid starting points for standby queries are generated at each checkpoint on the primary. If the standby is shut down while the primary is in a shutdown state, it might not be possible to re-enter hot standby until the primary is started up, so that it generates further starting points in the WAL logs. This situation isn't a problem in the most common situations where it might happen. Generally, if the primary is shut down and not available anymore, that's likely due to a serious failure that requires the standby being converted to operate as the new primary anyway. And in situations where the primary is being intentionally taken down, coordinating to make sure the standby becomes the new primary smoothly is also standard procedure.
At the end of recovery, AccessExclusiveLocks held by prepared transactions will require twice the normal number of lock table entries. If you plan on running either a large number of concurrent prepared transactions that normally take AccessExclusiveLocks, or you plan on having one large transaction that takes many AccessExclusiveLocks, you are advised to select a larger value of max_locks_per_transaction, perhaps as much as twice the value of the parameter on the primary server. You need not consider this at all if your setting of max_prepared_transactions is 0.
The Serializable transaction isolation level is not yet available in hot standby. (See Section 13.2.3 and Section 13.4.1 for details.) An attempt to set a transaction to the serializable isolation level in hot standby mode will generate an error.
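As with everything that contains valuable data, PostgreSQL databases should be backed up regularly. While the procedure is essentially simple, it is important to have a clear understanding of the underlying techniques and assumptions.
There are three fundamentally different approaches to backing up PostgreSQL data:
SQL dump
File system level backup
Continuous archiving
Each has its own strengths and weaknesses; each is discussed in turn in the following sections.
The character set support in PostgreSQL allows you to store text in a variety of character sets (also called encodings), including single-byte character sets such as the ISO 8859 series and multiple-byte character sets such as EUC (Extended Unix Code), UTF-8, and Mule internal code. All supported character sets can be used transparently by clients, but a few are not supported for use within the server (that is, as a server-side encoding). The default character set is selected while initializing your PostgreSQL database cluster using initdb. It can be overridden when you create a database, so you can have multiple databases each with a different character set.
An important restriction, however, is that each database's character set must be compatible with the database's LC_CTYPE (character classification) and LC_COLLATE (string sort order) locale settings. For the C or POSIX locale, any character set is allowed, but for other libc-provided locales there is only one character set that will work correctly. (On Windows, however, UTF-8 encoding can be used with any locale.) If you have ICU support configured, ICU-provided locales can be used with most but not all server-side encodings.
Table 24.1 shows the character sets available for use in PostgreSQL.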
| Name | Description | Language | Server? | ICU? | Bytes/Char | Aliases |
| --- | --- | --- | --- | --- | --- | --- |
| BIG5 | Big Five | Traditional Chinese | No | No | 1-2 | WIN950, Windows950 |
| EUC_CN | Extended UNIX Code-CN | Simplified Chinese | Yes | Yes | 1-3 | |
| EUC_JP | Extended UNIX Code-JP | Japanese | Yes | Yes | 1-3 | |
| EUC_JIS_2004 | Extended UNIX Code-JP, JIS X 0213 | Japanese | Yes | No | 1-3 | |
| EUC_KR | Extended UNIX Code-KR | Korean | Yes | Yes | 1-3 | |
| EUC_TW | Extended UNIX Code-TW | Traditional Chinese, Taiwanese | Yes | Yes | 1-3 | |
| GB18030 | National Standard | Chinese | No | No | 1-4 | |
| GBK | Extended National Standard | Simplified Chinese | No | No | 1-2 | WIN936, Windows936 |
| ISO_8859_5 | ISO 8859-5, ECMA 113 | Latin/Cyrillic | Yes | Yes | 1 | |
| ISO_8859_6 | ISO 8859-6, ECMA 114 | Latin/Arabic | Yes | Yes | 1 | |
| ISO_8859_7 | ISO 8859-7, ECMA 118 | Latin/Greek | Yes | Yes | 1 | |
| ISO_8859_8 | ISO 8859-8, ECMA 121 | Latin/Hebrew | Yes | Yes | 1 | |
| JOHAB | JOHAB | Korean (Hangul) | No | No | 1-3 | |
| KOI8R | KOI8-R | Cyrillic (Russian) | Yes | Yes | 1 | KOI8 |
| KOI8U | KOI8-U | Cyrillic (Ukrainian) | Yes | Yes | 1 | |
| LATIN1 | ISO 8859-1, ECMA 94 | Western European | Yes | Yes | 1 | ISO88591 |
| LATIN2 | ISO 8859-2, ECMA 94 | Central European | Yes | Yes | 1 | ISO88592 |
| LATIN3 | ISO 8859-3, ECMA 94 | South European | Yes | Yes | 1 | ISO88593 |
| LATIN4 | ISO 8859-4, ECMA 94 | North European | Yes | Yes | 1 | ISO88594 |
| LATIN5 | ISO 8859-9, ECMA 128 | Turkish | Yes | Yes | 1 | ISO88599 |
| LATIN6 | ISO 8859-10, ECMA 144 | Nordic | Yes | Yes | 1 | ISO885910 |
| LATIN7 | ISO 8859-13 | Baltic | Yes | Yes | 1 | ISO885913 |
| LATIN8 | ISO 8859-14 | Celtic | Yes | Yes | 1 | ISO885914 |
| LATIN9 | ISO 8859-15 | LATIN1 with Euro and accents | Yes | Yes | 1 | ISO885915 |
| LATIN10 | ISO 8859-16, ASRO SR 14111 | Romanian | Yes | No | 1 | ISO885916 |
| MULE_INTERNAL | Mule internal code | Multilingual Emacs | Yes | No | 1-4 | |
| SJIS | Shift JIS | Japanese | No | No | 1-2 | Mskanji, ShiftJIS, WIN932, Windows932 |
| SHIFT_JIS_2004 | Shift JIS, JIS X 0213 | Japanese | No | No | 1-2 | |
| SQL_ASCII | unspecified (see text) | any | Yes | No | 1 | |
| UHC | Unified Hangul Code | Korean | No | No | 1-2 | WIN949, Windows949 |
| UTF8 | Unicode, 8-bit | all | Yes | Yes | 1-4 | Unicode |
| WIN866 | Windows CP866 | Cyrillic | Yes | Yes | 1 | ALT |
| WIN874 | Windows CP874 | Thai | Yes | No | 1 | |
| WIN1250 | Windows CP1250 | Central European | Yes | Yes | 1 | |
| WIN1251 | Windows CP1251 | Cyrillic | Yes | Yes | 1 | WIN |
| WIN1252 | Windows CP1252 | Western European | Yes | Yes | 1 | |
| WIN1253 | Windows CP1253 | Greek | Yes | Yes | 1 | |
| WIN1254 | Windows CP1254 | Turkish | Yes | Yes | 1 | |
| WIN1255 | Windows CP1255 | Hebrew | Yes | Yes | 1 | |
| WIN1256 | Windows CP1256 | Arabic | Yes | Yes | 1 | |
| WIN1257 | Windows CP1257 | Baltic | Yes | Yes | 1 | |
| WIN1258 | Windows CP1258 | Vietnamese | Yes | Yes | 1 | ABC, TCVN, TCVN5712, VSCII |
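Not all client APIs support all the listed character sets. For example, the PostgreSQL JDBC driver does not support MULE_INTERNAL, LATIN6, LATIN8, and LATIN10.
The SQL_ASCII setting behaves considerably differently from the other settings. When the server character set is SQL_ASCII, the server interprets byte values 0-127 according to the ASCII standard, while byte values 128-255 are taken as uninterpreted characters. No encoding conversion will be done when the setting is SQL_ASCII. Thus, this setting is not so much a declaration that a specific encoding is in use, as a declaration of ignorance about the encoding. In most cases, if you are working with any non-ASCII data, it is unwise to use the SQL_ASCII setting because PostgreSQL will be unable to help you by converting or validating non-ASCII characters.
initdb defines the default character set (encoding) for a PostgreSQL cluster. For example, the command initdb -E EUC_JP sets the default character set to EUC_JP (Extended Unix Code for Japanese). You can use --encoding instead of -E if you prefer longer option strings. If no -E or --encoding option is given, initdb attempts to determine the appropriate encoding to use based on the specified or default locale.
You can specify a non-default encoding at database creation time, provided that the encoding is compatible with the selected locale. For example, the following createdb command creates a database named korean that uses the character set EUC_KR and the locale ko_KR:
createdb -E EUC_KR -T template0 --lc-collate=ko_KR.euckr --lc-ctype=ko_KR.euckr korean
Another way to accomplish this is to use the SQL command CREATE DATABASE with its ENCODING, LC_COLLATE, LC_CTYPE, and TEMPLATE options.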
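As a sketch of the SQL form, the statement below creates the same korean database. It assumes that the ko_KR.euckr locale is installed on the server's operating system; the exact locale name differs between platforms.

```sql
-- Assumption: the operating system provides a locale named ko_KR.euckr.
CREATE DATABASE korean
    WITH ENCODING 'EUC_KR'
         LC_COLLATE = 'ko_KR.euckr'
         LC_CTYPE = 'ko_KR.euckr'
         TEMPLATE = template0;
```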
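Notice that the above commands specify copying the template0 database. When copying any other database, the encoding and locale settings cannot be changed from those of the source database, because that might result in corrupt data. For more information see Section 22.3.
The encoding for a database is stored in the system catalog pg_database. You can see it by using the psql -l option or the \l command.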
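The same information can also be read directly from the catalog. The query below is a minimal sketch that uses pg_encoding_to_char to turn the stored encoding number into its name.

```sql
-- List every database with its encoding and locale settings.
SELECT datname,
       pg_encoding_to_char(encoding) AS encoding,
       datcollate,
       datctype
FROM pg_database
ORDER BY datname;
```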
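Note: On most modern operating systems, PostgreSQL can determine which character set is implied by the LC_CTYPE setting, and it will enforce that only the matching database encoding is used. On older systems it is your responsibility to ensure that you use the encoding expected by the locale you have selected. A mistake in this area is likely to lead to strange behavior of locale-dependent operations such as sorting.
PostgreSQL will allow superusers to create databases with SQL_ASCII encoding even when LC_CTYPE is not C or POSIX. As noted above, SQL_ASCII does not enforce that the data stored in the database has any particular encoding, and so this choice poses risks of locale-dependent misbehavior. Using this combination of settings is deprecated and may someday be forbidden altogether.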
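PostgreSQL supports automatic character set conversion between server and client for certain character set combinations. The conversion information is stored in the pg_conversion system catalog. PostgreSQL comes with some predefined conversions, as shown in Table 23.2. You can create a new conversion using the SQL command CREATE CONVERSION.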
| Server Character Set | Available Client Character Sets |
|---|---|
| BIG5 | not supported as a server encoding |
| EUC_CN | EUC_CN, MULE_INTERNAL, UTF8 |
| EUC_JP | EUC_JP, MULE_INTERNAL, SJIS, UTF8 |
| EUC_KR | EUC_KR, MULE_INTERNAL, UTF8 |
| EUC_TW | EUC_TW, BIG5, MULE_INTERNAL, UTF8 |
| GB18030 | not supported as a server encoding |
| GBK | not supported as a server encoding |
| ISO_8859_5 | ISO_8859_5, KOI8R, MULE_INTERNAL, UTF8, WIN866, WIN1251 |
| ISO_8859_6 | ISO_8859_6, UTF8 |
| ISO_8859_7 | ISO_8859_7, UTF8 |
| ISO_8859_8 | ISO_8859_8, UTF8 |
| JOHAB | JOHAB, UTF8 |
| KOI8R | KOI8R, ISO_8859_5, MULE_INTERNAL, UTF8, WIN866, WIN1251 |
| KOI8U | KOI8U, UTF8 |
| LATIN1 | LATIN1, MULE_INTERNAL, UTF8 |
| LATIN2 | LATIN2, MULE_INTERNAL, UTF8, WIN1250 |
| LATIN3 | LATIN3, MULE_INTERNAL, UTF8 |
| LATIN4 | LATIN4, MULE_INTERNAL, UTF8 |
| LATIN5 | LATIN5, UTF8 |
| LATIN6 | LATIN6, UTF8 |
| LATIN7 | LATIN7, UTF8 |
| LATIN8 | LATIN8, UTF8 |
| LATIN9 | LATIN9, UTF8 |
| LATIN10 | LATIN10, UTF8 |
| MULE_INTERNAL | MULE_INTERNAL, BIG5, EUC_CN, EUC_JP, EUC_KR, EUC_TW, ISO_8859_5, KOI8R, LATIN1 to LATIN4, SJIS, WIN866, WIN1250, WIN1251 |
| SJIS | not supported as a server encoding |
| SQL_ASCII | any (no conversion will be performed) |
| UHC | not supported as a server encoding |
| UTF8 | all supported encodings |
| WIN866 | WIN866, ISO_8859_5, KOI8R, MULE_INTERNAL, UTF8, WIN1251 |
| WIN874 | WIN874, UTF8 |
| WIN1250 | WIN1250, LATIN2, MULE_INTERNAL, UTF8 |
| WIN1251 | WIN1251, ISO_8859_5, KOI8R, MULE_INTERNAL, UTF8, WIN866 |
| WIN1252 | WIN1252, UTF8 |
| WIN1253 | WIN1253, UTF8 |
| WIN1254 | WIN1254, UTF8 |
| WIN1255 | WIN1255, UTF8 |
| WIN1256 | WIN1256, UTF8 |
| WIN1257 | WIN1257, UTF8 |
| WIN1258 | WIN1258, UTF8 |
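To enable automatic character set conversion, you have to tell PostgreSQL the character set (encoding) you would like to use in the client. There are several ways to accomplish this:
Using the \encoding command in psql. \encoding allows you to change the client encoding on the fly. For example, to change the encoding to SJIS, type \encoding SJIS.
libpq (Section 33.10) has functions to control the client encoding.
Using SET client_encoding TO. The client encoding can be set with the SQL command SET CLIENT_ENCODING TO 'value';
You can also use the standard SQL syntax SET NAMES 'value'; for this purpose.
To query the current client encoding, use SHOW client_encoding;
To return to the default encoding, use RESET client_encoding;
Using PGCLIENTENCODING. If the environment variable PGCLIENTENCODING is defined in the client's environment, that client encoding is automatically selected when a connection to the server is made. (This can subsequently be overridden using any of the other methods mentioned above.)
Using the configuration variable client_encoding. If the client_encoding variable is set, that client encoding is automatically selected when a connection to the server is made. (This can subsequently be overridden using any of the other methods mentioned above.)
If the conversion of a particular character is not possible (suppose you chose EUC_JP for the server and LATIN1 for the client, and some Japanese characters are returned that do not have a representation in LATIN1), an error is reported.
If the client character set is defined as SQL_ASCII, encoding conversion is disabled, regardless of the server's character set. Just as for the server, use of SQL_ASCII is unwise unless you are working with all-ASCII data.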
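The SQL-level methods above can be tried out in a single session. The sketch below assumes a server whose database encoding is UTF8, so that conversions to and from LATIN1 are available; with another server encoding the first SET may be rejected.

```sql
-- Assumption: the current database uses UTF8 as its server encoding.
SHOW server_encoding;

SET client_encoding TO 'LATIN1';  -- PostgreSQL-specific syntax
SHOW client_encoding;             -- reports LATIN1

SET NAMES 'UTF8';                 -- standard SQL syntax for the same setting
SHOW client_encoding;             -- reports UTF8

RESET client_encoding;            -- back to the session default
SHOW client_encoding;
```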
PostgreSQL allows conversion between any two character sets for which a conversion function is listed in the pg_conversion system catalog. PostgreSQL comes with some predefined conversions, as summarized in Table 24.2 and shown in more detail in Table 24.3. You can create a new conversion using the SQL command CREATE CONVERSION. (To be used for automatic client/server conversions, a conversion must be marked as “default” for its character set pair.)
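To see which conversions a particular installation actually provides, the catalog can be queried directly. This is a minimal sketch; the condefault column shows whether a conversion is the default for its character set pair.

```sql
-- List all conversions with readable encoding names.
SELECT conname,
       pg_encoding_to_char(conforencoding) AS source_encoding,
       pg_encoding_to_char(contoencoding)  AS destination_encoding,
       condefault
FROM pg_conversion
ORDER BY conname;
```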
Table 24.2. Built-in Client/Server Character Set Conversions

| Server Character Set | Available Client Character Sets |
|---|---|
| BIG5 | not supported as a server encoding |
| EUC_CN | EUC_CN, MULE_INTERNAL, UTF8 |
| EUC_JP | EUC_JP, MULE_INTERNAL, SJIS, UTF8 |
| EUC_JIS_2004 | EUC_JIS_2004, SHIFT_JIS_2004, UTF8 |
| EUC_KR | EUC_KR, MULE_INTERNAL, UTF8 |
| EUC_TW | EUC_TW, BIG5, MULE_INTERNAL, UTF8 |
| GB18030 | not supported as a server encoding |
| GBK | not supported as a server encoding |
| ISO_8859_5 | ISO_8859_5, KOI8R, MULE_INTERNAL, UTF8, WIN866, WIN1251 |
| ISO_8859_6 | ISO_8859_6, UTF8 |
| ISO_8859_7 | ISO_8859_7, UTF8 |
| ISO_8859_8 | ISO_8859_8, UTF8 |
| JOHAB | not supported as a server encoding |
| KOI8R | KOI8R, ISO_8859_5, MULE_INTERNAL, UTF8, WIN866, WIN1251 |
| KOI8U | KOI8U, UTF8 |
| LATIN1 | LATIN1, MULE_INTERNAL, UTF8 |
| LATIN2 | LATIN2, MULE_INTERNAL, UTF8, WIN1250 |
| LATIN3 | LATIN3, MULE_INTERNAL, UTF8 |
| LATIN4 | LATIN4, MULE_INTERNAL, UTF8 |
| LATIN5 | LATIN5, UTF8 |
| LATIN6 | LATIN6, UTF8 |
| LATIN7 | LATIN7, UTF8 |
| LATIN8 | LATIN8, UTF8 |
| LATIN9 | LATIN9, UTF8 |
| LATIN10 | LATIN10, UTF8 |
| MULE_INTERNAL | MULE_INTERNAL, BIG5, EUC_CN, EUC_JP, EUC_KR, EUC_TW, ISO_8859_5, KOI8R, LATIN1 to LATIN4, SJIS, WIN866, WIN1250, WIN1251 |
| SJIS | not supported as a server encoding |
| SHIFT_JIS_2004 | not supported as a server encoding |
| SQL_ASCII | any (no conversion will be performed) |
| UHC | not supported as a server encoding |
| UTF8 | all supported encodings |
| WIN866 | WIN866, ISO_8859_5, KOI8R, MULE_INTERNAL, UTF8, WIN1251 |
| WIN874 | WIN874, UTF8 |
| WIN1250 | WIN1250, LATIN2, MULE_INTERNAL, UTF8 |
| WIN1251 | WIN1251, ISO_8859_5, KOI8R, MULE_INTERNAL, UTF8, WIN866 |
| WIN1252 | WIN1252, UTF8 |
| WIN1253 | WIN1253, UTF8 |
| WIN1254 | WIN1254, UTF8 |
| WIN1255 | WIN1255, UTF8 |
| WIN1256 | WIN1256, UTF8 |
| WIN1257 | WIN1257, UTF8 |
| WIN1258 | WIN1258, UTF8 |
Table 24.3. All Built-in Character Set Conversions

| Conversion Name | Source Encoding | Destination Encoding |
|---|---|---|
| big5_to_euc_tw | BIG5 | EUC_TW |
| big5_to_mic | BIG5 | MULE_INTERNAL |
| big5_to_utf8 | BIG5 | UTF8 |
| euc_cn_to_mic | EUC_CN | MULE_INTERNAL |
| euc_cn_to_utf8 | EUC_CN | UTF8 |
| euc_jp_to_mic | EUC_JP | MULE_INTERNAL |
| euc_jp_to_sjis | EUC_JP | SJIS |
| euc_jp_to_utf8 | EUC_JP | UTF8 |
| euc_kr_to_mic | EUC_KR | MULE_INTERNAL |
| euc_kr_to_utf8 | EUC_KR | UTF8 |
| euc_tw_to_big5 | EUC_TW | BIG5 |
| euc_tw_to_mic | EUC_TW | MULE_INTERNAL |
| euc_tw_to_utf8 | EUC_TW | UTF8 |
| gb18030_to_utf8 | GB18030 | UTF8 |
| gbk_to_utf8 | GBK | UTF8 |
| iso_8859_10_to_utf8 | LATIN6 | UTF8 |
| iso_8859_13_to_utf8 | LATIN7 | UTF8 |
| iso_8859_14_to_utf8 | LATIN8 | UTF8 |
| iso_8859_15_to_utf8 | LATIN9 | UTF8 |
| iso_8859_16_to_utf8 | LATIN10 | UTF8 |
| iso_8859_1_to_mic | LATIN1 | MULE_INTERNAL |
| iso_8859_1_to_utf8 | LATIN1 | UTF8 |
| iso_8859_2_to_mic | LATIN2 | MULE_INTERNAL |
| iso_8859_2_to_utf8 | LATIN2 | UTF8 |
| iso_8859_2_to_windows_1250 | LATIN2 | WIN1250 |
| iso_8859_3_to_mic | LATIN3 | MULE_INTERNAL |
| iso_8859_3_to_utf8 | LATIN3 | UTF8 |
| iso_8859_4_to_mic | LATIN4 | MULE_INTERNAL |
| iso_8859_4_to_utf8 | LATIN4 | UTF8 |
| iso_8859_5_to_koi8_r | ISO_8859_5 | KOI8R |
| iso_8859_5_to_mic | ISO_8859_5 | MULE_INTERNAL |
| iso_8859_5_to_utf8 | ISO_8859_5 | UTF8 |
| iso_8859_5_to_windows_1251 | ISO_8859_5 | WIN1251 |
| iso_8859_5_to_windows_866 | ISO_8859_5 | WIN866 |
| iso_8859_6_to_utf8 | ISO_8859_6 | UTF8 |
| iso_8859_7_to_utf8 | ISO_8859_7 | UTF8 |
| iso_8859_8_to_utf8 | ISO_8859_8 | UTF8 |
| iso_8859_9_to_utf8 | LATIN5 | UTF8 |
| johab_to_utf8 | JOHAB | UTF8 |
| koi8_r_to_iso_8859_5 | KOI8R | ISO_8859_5 |
| koi8_r_to_mic | KOI8R | MULE_INTERNAL |
| koi8_r_to_utf8 | KOI8R | UTF8 |
| koi8_r_to_windows_1251 | KOI8R | WIN1251 |
| koi8_r_to_windows_866 | KOI8R | WIN866 |
| koi8_u_to_utf8 | KOI8U | UTF8 |
| mic_to_big5 | MULE_INTERNAL | BIG5 |
| mic_to_euc_cn | MULE_INTERNAL | EUC_CN |
| mic_to_euc_jp | MULE_INTERNAL | EUC_JP |
| mic_to_euc_kr | MULE_INTERNAL | EUC_KR |
| mic_to_euc_tw | MULE_INTERNAL | EUC_TW |
| mic_to_iso_8859_1 | MULE_INTERNAL | LATIN1 |
| mic_to_iso_8859_2 | MULE_INTERNAL | LATIN2 |
| mic_to_iso_8859_3 | MULE_INTERNAL | LATIN3 |
| mic_to_iso_8859_4 | MULE_INTERNAL | LATIN4 |
| mic_to_iso_8859_5 | MULE_INTERNAL | ISO_8859_5 |
| mic_to_koi8_r | MULE_INTERNAL | KOI8R |
| mic_to_sjis | MULE_INTERNAL | SJIS |
| mic_to_windows_1250 | MULE_INTERNAL | WIN1250 |
| mic_to_windows_1251 | MULE_INTERNAL | WIN1251 |
| mic_to_windows_866 | MULE_INTERNAL | WIN866 |
| sjis_to_euc_jp | SJIS | EUC_JP |
| sjis_to_mic | SJIS | MULE_INTERNAL |
| sjis_to_utf8 | SJIS | UTF8 |
| windows_1258_to_utf8 | WIN1258 | UTF8 |
| uhc_to_utf8 | UHC | UTF8 |
| utf8_to_big5 | UTF8 | BIG5 |
| utf8_to_euc_cn | UTF8 | EUC_CN |
| utf8_to_euc_jp | UTF8 | EUC_JP |
| utf8_to_euc_kr | UTF8 | EUC_KR |
| utf8_to_euc_tw | UTF8 | EUC_TW |
| utf8_to_gb18030 | UTF8 | GB18030 |
| utf8_to_gbk | UTF8 | GBK |
| utf8_to_iso_8859_1 | UTF8 | LATIN1 |
| utf8_to_iso_8859_10 | UTF8 | LATIN6 |
| utf8_to_iso_8859_13 | UTF8 | LATIN7 |
| utf8_to_iso_8859_14 | UTF8 | LATIN8 |
| utf8_to_iso_8859_15 | UTF8 | LATIN9 |
| utf8_to_iso_8859_16 | UTF8 | LATIN10 |
| utf8_to_iso_8859_2 | UTF8 | LATIN2 |
| utf8_to_iso_8859_3 | UTF8 | LATIN3 |
| utf8_to_iso_8859_4 | UTF8 | LATIN4 |
| utf8_to_iso_8859_5 | UTF8 | ISO_8859_5 |
| utf8_to_iso_8859_6 | UTF8 | ISO_8859_6 |
| utf8_to_iso_8859_7 | UTF8 | ISO_8859_7 |
| utf8_to_iso_8859_8 | UTF8 | ISO_8859_8 |
| utf8_to_iso_8859_9 | UTF8 | LATIN5 |
| utf8_to_johab | UTF8 | JOHAB |
| utf8_to_koi8_r | UTF8 | KOI8R |
| utf8_to_koi8_u | UTF8 | KOI8U |
| utf8_to_sjis | UTF8 | SJIS |
| utf8_to_windows_1258 | UTF8 | WIN1258 |
| utf8_to_uhc | UTF8 | UHC |
| utf8_to_windows_1250 | UTF8 | WIN1250 |
| utf8_to_windows_1251 | UTF8 | WIN1251 |
| utf8_to_windows_1252 | UTF8 | WIN1252 |
| utf8_to_windows_1253 | UTF8 | WIN1253 |
| utf8_to_windows_1254 | UTF8 | WIN1254 |
| utf8_to_windows_1255 | UTF8 | WIN1255 |
| utf8_to_windows_1256 | UTF8 | WIN1256 |
| utf8_to_windows_1257 | UTF8 | WIN1257 |
| utf8_to_windows_866 | UTF8 | WIN866 |
| utf8_to_windows_874 | UTF8 | WIN874 |
| windows_1250_to_iso_8859_2 | WIN1250 | LATIN2 |
| windows_1250_to_mic | WIN1250 | MULE_INTERNAL |
| windows_1250_to_utf8 | WIN1250 | UTF8 |
| windows_1251_to_iso_8859_5 | WIN1251 | ISO_8859_5 |
| windows_1251_to_koi8_r | WIN1251 | KOI8R |
| windows_1251_to_mic | WIN1251 | MULE_INTERNAL |
| windows_1251_to_utf8 | WIN1251 | UTF8 |
| windows_1251_to_windows_866 | WIN1251 | WIN866 |
| windows_1252_to_utf8 | WIN1252 | UTF8 |
| windows_1256_to_utf8 | WIN1256 | UTF8 |
| windows_866_to_iso_8859_5 | WIN866 | ISO_8859_5 |
| windows_866_to_koi8_r | WIN866 | KOI8R |
| windows_866_to_mic | WIN866 | MULE_INTERNAL |
| windows_866_to_utf8 | WIN866 | UTF8 |
| windows_866_to_windows_1251 | WIN866 | WIN |
| windows_874_to_utf8 | WIN874 | UTF8 |
| euc_jis_2004_to_utf8 | EUC_JIS_2004 | UTF8 |
| utf8_to_euc_jis_2004 | UTF8 | EUC_JIS_2004 |
| shift_jis_2004_to_utf8 | SHIFT_JIS_2004 | UTF8 |
| utf8_to_shift_jis_2004 | UTF8 | SHIFT_JIS_2004 |
| euc_jis_2004_to_shift_jis_2004 | EUC_JIS_2004 | SHIFT_JIS_2004 |
| shift_jis_2004_to_euc_jis_2004 | SHIFT_JIS_2004 | EUC_JIS_2004 |
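These are good sources to start learning about various kinds of encoding systems.
CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing
Contains detailed explanations of EUC_JP, EUC_CN, EUC_KR, and EUC_TW.
The web site of the Unicode Consortium.
RFC 3629
UTF-8 (8-bit UCS/Unicode Transformation Format) is defined here.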
Another useful tool for monitoring database activity is the pg_locks system table. It allows the database administrator to view information about the outstanding locks in the lock manager. For example, this capability can be used to:
View all the locks currently outstanding, all the locks on relations in a particular database, all the locks on a particular relation, or all the locks held by a particular PostgreSQL session.
Determine the relation in the current database with the most ungranted locks (which might be a source of contention among database clients).
Determine the effect of lock contention on overall database performance, as well as the extent to which contention varies with overall database traffic.
Details of the pg_locks view appear in its reference entry among the system views. For more information on locking and managing concurrency with PostgreSQL, refer to the chapter on concurrency control.
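The first two uses listed above can be sketched as queries against pg_locks. These are illustrative only: they restrict themselves to relation-level locks in the current database and rely on the regclass cast to display relation names.

```sql
-- All relation-level locks in the current database, with the holding session's PID.
SELECT l.pid, l.locktype, l.relation::regclass AS relation, l.mode, l.granted
FROM pg_locks AS l
JOIN pg_database AS d ON d.oid = l.database
WHERE d.datname = current_database()
  AND l.relation IS NOT NULL
ORDER BY l.relation, l.pid;

-- Relations ranked by the number of lock requests that have not been granted yet.
SELECT l.relation::regclass AS relation, count(*) AS ungranted_locks
FROM pg_locks AS l
JOIN pg_database AS d ON d.oid = l.database
WHERE d.datname = current_database()
  AND l.relation IS NOT NULL
  AND NOT l.granted
GROUP BY l.relation
ORDER BY ungranted_locks DESC;
```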
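A database administrator frequently wonders, “What is the system doing right now?” This chapter discusses how to find that out.
Several tools are available for monitoring database activity and analyzing performance. Most of this chapter is devoted to describing PostgreSQL's statistics collector, but one should not neglect regular Unix monitoring programs such as ps, top, iostat, and vmstat. Also, once one has identified a poorly-performing query, further investigation might be needed using PostgreSQL's EXPLAIN command. EXPLAIN and other methods for understanding the behavior of an individual query are discussed in the chapter on performance tips.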
PostgreSQL has the ability to report the progress of certain commands during command execution. Currently, the only commands which support progress reporting are ANALYZE, CLUSTER, CREATE INDEX, VACUUM, COPY, and BASE_BACKUP (i.e., the replication command that pg_basebackup issues to take a base backup). This may be expanded in the future.
Whenever ANALYZE is running, the pg_stat_progress_analyze view will contain a row for each backend that is currently running that command. The tables below describe the information that will be reported and provide information about how to interpret it.
Note that when ANALYZE is run on a partitioned table, all of its partitions are also recursively analyzed. In that case, ANALYZE progress is reported first for the parent table, whereby its inheritance statistics are collected, followed by that for each partition.
Whenever CREATE INDEX or REINDEX is running, the pg_stat_progress_create_index view will contain one row for each backend that is currently creating indexes. The tables below describe the information that will be reported and provide information about how to interpret it.
Whenever CLUSTER or VACUUM FULL is running, the pg_stat_progress_cluster view will contain a row for each backend that is currently running either command. The tables below describe the information that will be reported and provide information about how to interpret it.
Whenever an application like pg_basebackup is taking a base backup, the pg_stat_progress_basebackup view will contain a row for each WAL sender process that is currently running the BASE_BACKUP replication command and streaming the backup. The tables below describe the information that will be reported and provide information about how to interpret it.
Whenever COPY is running, the pg_stat_progress_copy view will contain one row for each backend that is currently running a COPY command. The table below describes the information that will be reported and provides information about how to interpret it.
On most Unix platforms, PostgreSQL modifies its command title as reported by ps, so that individual server processes can readily be identified. A sample display is
(The appropriate invocation of ps varies across different platforms, as do the details of what is shown. This example is from a recent Linux system.) The first process listed here is the primary server process. The command arguments shown for it are the same ones used when it was launched. The next four processes are background worker processes automatically launched by the primary process. (The “autovacuum launcher” process will not be present if you have set the system not to run autovacuum.) Each of the remaining processes is a server process handling one client connection. Each such process sets its command line display in the form
postgres: user database host activity
The user, database, and (client) host items remain the same for the life of the client connection, but the activity indicator changes. The activity can be idle (i.e., waiting for a client command), idle in transaction (waiting for client inside a BEGIN block), or a command type name such as SELECT. Also, waiting is appended if the server process is presently waiting on a lock held by another session. In the above example we can infer that process 15606 is waiting for process 15610 to complete its transaction and thereby release some lock. (Process 15610 must be the blocker, because there is no other active session. In more complicated cases it would be necessary to look into the pg_locks system view to determine who is blocking whom.)
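For the more complicated cases, one possible shortcut is to let the server work out the blockers. The sketch below assumes a server version that provides the pg_blocking_pids function (PostgreSQL 9.6 or later) and combines it with pg_stat_activity.

```sql
-- Sessions that are currently blocked, together with the PIDs blocking them.
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       state,
       query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```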
If cluster_name has been configured the cluster name will also be shown in ps output:
Solaris requires special handling. You must use /usr/ucb/ps, rather than /bin/ps. You also must use two w flags, not just one. In addition, your original invocation of the postgres command must have a shorter ps status display than that provided by each server process. If you fail to do all three things, the ps output for each server process will be the original postgres command line.
Note on the Conversion Name column of Table 24.3: The conversion names follow a standard naming scheme: the official name of the source encoding with all non-alphanumeric characters replaced by underscores, followed by _to_, followed by the similarly processed destination encoding name. Therefore, these names sometimes deviate from the customary encoding names shown in Table 24.1.
Whenever VACUUM is running, the pg_stat_progress_vacuum view will contain one row for each backend (including autovacuum worker processes) that is currently vacuuming. The tables below describe the information that will be reported and provide information about how to interpret it. Progress for VACUUM FULL commands is reported via pg_stat_progress_cluster because both VACUUM FULL and CLUSTER rewrite the table, while regular VACUUM only modifies it in place. See the description of the pg_stat_progress_cluster view.
If you have turned off update_process_title then the activity indicator is not updated; the process title is set only once when a new process is launched. On some platforms this saves a measurable amount of per-command overhead; on others it's insignificant.
Table 28.37. ANALYZE Phases

| Phase | Description |
|---|---|
| initializing | The command is preparing to begin scanning the heap. This phase is expected to be very brief. |
| acquiring sample rows | The command is currently scanning the table given by relid to obtain sample rows. |
| acquiring inherited sample rows | The command is currently scanning child tables to obtain sample rows. Columns child_tables_total, child_tables_done, and current_child_table_relid contain the progress information for this phase. |
| computing statistics | The command is computing statistics from the sample rows obtained during the table scan. |
| computing extended statistics | The command is computing extended statistics from the sample rows obtained during the table scan. |
| finalizing analyze | The command is updating pg_class. When this phase is completed, ANALYZE will end. |
pg_stat_progress_create_index View

| Column | Type | Description |
|---|---|---|
| pid | integer | Process ID of backend. |
| datid | oid | OID of the database to which this backend is connected. |
| datname | name | Name of the database to which this backend is connected. |
| relid | oid | OID of the table on which the index is being created. |
| index_relid | oid | OID of the index being created or reindexed. During a non-concurrent CREATE INDEX, this is 0. |
| command | text | The command that is running: CREATE INDEX, CREATE INDEX CONCURRENTLY, REINDEX, or REINDEX CONCURRENTLY. |
| phase | text | Current processing phase of index creation. See Table 28.39. |
| lockers_total | bigint | Total number of lockers to wait for, when applicable. |
| lockers_done | bigint | Number of lockers already waited for. |
| current_locker_pid | bigint | Process ID of the locker currently being waited for. |
| blocks_total | bigint | Total number of blocks to be processed in the current phase. |
| blocks_done | bigint | Number of blocks already processed in the current phase. |
| tuples_total | bigint | Total number of tuples to be processed in the current phase. |
| tuples_done | bigint | Number of tuples already processed in the current phase. |
| partitions_total | bigint | When creating an index on a partitioned table, this column is set to the total number of partitions on which the index is to be created. This field is 0 during a REINDEX. |
| partitions_done | bigint | When creating an index on a partitioned table, this column is set to the number of partitions on which the index has been created. This field is 0 during a REINDEX. |
Table 28.39. CREATE INDEX Phases

| Phase | Description |
|---|---|
| initializing | CREATE INDEX or REINDEX is preparing to create the index. This phase is expected to be very brief. |
| waiting for writers before build | CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY is waiting for transactions with write locks that can potentially see the table to finish. This phase is skipped when not in concurrent mode. Columns lockers_total, lockers_done and current_locker_pid contain the progress information for this phase. |
| building index | The index is being built by the access method-specific code. In this phase, access methods that support progress reporting fill in their own progress data, and the subphase is indicated in this column. Typically, blocks_total and blocks_done will contain progress data, as well as potentially tuples_total and tuples_done. |
| waiting for writers before validation | CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY is waiting for transactions with write locks that can potentially write into the table to finish. This phase is skipped when not in concurrent mode. Columns lockers_total, lockers_done and current_locker_pid contain the progress information for this phase. |
| index validation: scanning index | CREATE INDEX CONCURRENTLY is scanning the index searching for tuples that need to be validated. This phase is skipped when not in concurrent mode. Columns blocks_total (set to the total size of the index) and blocks_done contain the progress information for this phase. |
| index validation: sorting tuples | CREATE INDEX CONCURRENTLY is sorting the output of the index scanning phase. |
| index validation: scanning table | CREATE INDEX CONCURRENTLY is scanning the table to validate the index tuples collected in the previous two phases. This phase is skipped when not in concurrent mode. Columns blocks_total (set to the total size of the table) and blocks_done contain the progress information for this phase. |
| waiting for old snapshots | CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY is waiting for transactions that can potentially see the table to release their snapshots. This phase is skipped when not in concurrent mode. Columns lockers_total, lockers_done and current_locker_pid contain the progress information for this phase. |
| waiting for readers before marking dead | REINDEX CONCURRENTLY is waiting for transactions with read locks on the table to finish, before marking the old index dead. This phase is skipped when not in concurrent mode. Columns lockers_total, lockers_done and current_locker_pid contain the progress information for this phase. |
| waiting for readers before dropping | REINDEX CONCURRENTLY is waiting for transactions with read locks on the table to finish, before dropping the old index. This phase is skipped when not in concurrent mode. Columns lockers_total, lockers_done and current_locker_pid contain the progress information for this phase. |
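A running index build can be watched from another session using the columns described above. This is a minimal sketch; the percentage is only meaningful within the current phase, and nullif guards against a zero blocks_total.

```sql
-- Progress of every CREATE INDEX / REINDEX currently running.
SELECT pid,
       datname,
       relid::regclass       AS table_name,
       index_relid::regclass AS index_name,
       command,
       phase,
       blocks_done,
       blocks_total,
       round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS blocks_pct,
       tuples_done,
       tuples_total
FROM pg_stat_progress_create_index;
```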
pg_stat_progress_vacuum View

| Column | Type | Description |
|---|---|---|
| pid | integer | Process ID of backend. |
| datid | oid | OID of the database to which this backend is connected. |
| datname | name | Name of the database to which this backend is connected. |
| relid | oid | OID of the table being vacuumed. |
| phase | text | Current processing phase of vacuum. See Table 28.41. |
| heap_blks_total | bigint | Total number of heap blocks in the table. This number is reported as of the beginning of the scan; blocks added later will not be (and need not be) visited by this VACUUM. |
| heap_blks_scanned | bigint | Number of heap blocks scanned. Because the visibility map is used to optimize scans, some blocks will be skipped without inspection; skipped blocks are included in this total, so that this number will eventually become equal to heap_blks_total when the vacuum is complete. This counter only advances when the phase is scanning heap. |
| heap_blks_vacuumed | bigint | Number of heap blocks vacuumed. Unless the table has no indexes, this counter only advances when the phase is vacuuming heap. Blocks that contain no dead tuples are skipped, so the counter may sometimes skip forward in large increments. |
| index_vacuum_count | bigint | Number of completed index vacuum cycles. |
| max_dead_tuples | bigint | Number of dead tuples that we can store before needing to perform an index vacuum cycle, based on maintenance_work_mem. |
| num_dead_tuples | bigint | Number of dead tuples collected since the last index vacuum cycle. |
Table 28.41. VACUUM Phases

| Phase | Description |
|---|---|
| initializing | VACUUM is preparing to begin scanning the heap. This phase is expected to be very brief. |
| scanning heap | VACUUM is currently scanning the heap. It will prune and defragment each page if required, and possibly perform freezing activity. The heap_blks_scanned column can be used to monitor the progress of the scan. |
| vacuuming indexes | VACUUM is currently vacuuming the indexes. If a table has any indexes, this will happen at least once per vacuum, after the heap has been completely scanned. It may happen multiple times per vacuum if maintenance_work_mem (or, in the case of autovacuum, autovacuum_work_mem if set) is insufficient to store the number of dead tuples found. |
| vacuuming heap | VACUUM is currently vacuuming the heap. Vacuuming the heap is distinct from scanning the heap, and occurs after each instance of vacuuming indexes. If heap_blks_scanned is less than heap_blks_total, the system will return to scanning the heap after this phase is completed; otherwise, it will begin cleaning up indexes after this phase is completed. |
| cleaning up indexes | VACUUM is currently cleaning up indexes. This occurs after the heap has been completely scanned and all vacuuming of the indexes and the heap has been completed. |
| truncating heap | VACUUM is currently truncating the heap so as to return empty pages at the end of the relation to the operating system. This occurs after cleaning up indexes. |
| performing final cleanup | VACUUM is performing final cleanup. During this phase, VACUUM will vacuum the free space map, update statistics in pg_class, and report statistics to the cumulative statistics system. When this phase is completed, VACUUM will end. |
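The columns and phases above can be combined into a simple monitoring query. The following is a sketch that reports how far each running VACUUM has progressed through the heap scan; it uses only columns listed in the view description.

```sql
-- Heap-scan progress of every VACUUM (including autovacuum workers) in progress.
SELECT pid,
       datname,
       relid::regclass AS table_name,
       phase,
       heap_blks_scanned,
       heap_blks_total,
       round(100.0 * heap_blks_scanned / nullif(heap_blks_total, 0), 1) AS scanned_pct,
       index_vacuum_count
FROM pg_stat_progress_vacuum;
```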
pg_stat_progress_cluster View

| Column | Type | Description |
|---|---|---|
| pid | integer | Process ID of backend. |
| datid | oid | OID of the database to which this backend is connected. |
| datname | name | Name of the database to which this backend is connected. |
| relid | oid | OID of the table being clustered. |
| command | text | The command that is running. Either CLUSTER or VACUUM FULL. |
| phase | text | Current processing phase. See Table 28.43. |
| cluster_index_relid | oid | If the table is being scanned using an index, this is the OID of the index being used; otherwise, it is zero. |
| heap_tuples_scanned | bigint | Number of heap tuples scanned. This counter only advances when the phase is seq scanning heap, index scanning heap or writing new heap. |
| heap_tuples_written | bigint | Number of heap tuples written. This counter only advances when the phase is seq scanning heap, index scanning heap or writing new heap. |
| heap_blks_total | bigint | Total number of heap blocks in the table. This number is reported as of the beginning of seq scanning heap. |
| heap_blks_scanned | bigint | Number of heap blocks scanned. This counter only advances when the phase is seq scanning heap. |
| index_rebuild_count | bigint | Number of indexes rebuilt. This counter only advances when the phase is rebuilding index. |
Table 28.43. CLUSTER and VACUUM FULL Phases

| Phase | Description |
|---|---|
| initializing | The command is preparing to begin scanning the heap. This phase is expected to be very brief. |
| seq scanning heap | The command is currently scanning the table using a sequential scan. |
| index scanning heap | CLUSTER is currently scanning the table using an index scan. |
| sorting tuples | CLUSTER is currently sorting tuples. |
| writing new heap | CLUSTER is currently writing the new heap. |
| swapping relation files | The command is currently swapping newly-built files into place. |
| rebuilding index | The command is currently rebuilding an index. |
| performing final cleanup | The command is performing final cleanup. When this phase is completed, CLUSTER or VACUUM FULL will end. |
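A comparable sketch for CLUSTER and VACUUM FULL follows. Block counts are only populated during a sequential heap scan, so the tuple counters are shown as well.

```sql
-- Progress of every CLUSTER / VACUUM FULL currently running.
SELECT pid,
       relid::regclass AS table_name,
       command,
       phase,
       heap_tuples_scanned,
       heap_tuples_written,
       heap_blks_scanned,
       heap_blks_total,
       index_rebuild_count
FROM pg_stat_progress_cluster;
```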
pg_stat_progress_basebackup View

| Column | Type | Description |
|---|---|---|
| pid | integer | Process ID of a WAL sender process. |
| phase | text | Current processing phase. See Table 28.45. |
| backup_total | bigint | Total amount of data that will be streamed. This is estimated and reported as of the beginning of the streaming database files phase. Note that this is only an approximation since the database may change during the streaming database files phase and WAL log may be included in the backup later. This is always the same value as backup_streamed once the amount of data streamed exceeds the estimated total size. If the estimation is disabled in pg_basebackup (i.e., the --no-estimate-size option is specified), this is NULL. |
| backup_streamed | bigint | Amount of data streamed. This counter only advances when the phase is streaming database files or transferring wal files. |
| tablespaces_total | bigint | Total number of tablespaces that will be streamed. |
| tablespaces_streamed | bigint | Number of tablespaces streamed. This counter only advances when the phase is streaming database files. |
Table 28.45. Base Backup Phases

| Phase | Description |
|---|---|
| initializing | The WAL sender process is preparing to begin the backup. This phase is expected to be very brief. |
| waiting for checkpoint to finish | The WAL sender process is currently performing pg_backup_start to prepare to take a base backup, and waiting for the start-of-backup checkpoint to finish. |
| estimating backup size | The WAL sender process is currently estimating the total amount of database files that will be streamed as a base backup. |
| streaming database files | The WAL sender process is currently streaming database files as a base backup. |
| waiting for wal archiving to finish | The WAL sender process is currently performing pg_backup_stop to finish the backup, and waiting for all the WAL files required for the base backup to be successfully archived. If either --wal-method=none or --wal-method=stream is specified in pg_basebackup, the backup will end when this phase is completed. |
| transferring wal files | The WAL sender process is currently transferring all WAL logs generated during the backup. This phase occurs after the waiting for wal archiving to finish phase if --wal-method=fetch is specified in pg_basebackup. The backup will end when this phase is completed. |
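While a base backup is being taken, a query along these lines shows how much has been streamed. It is a sketch only: the percentage is NULL when size estimation was disabled, and it is approximate for the reasons given in the backup_total description.

```sql
-- Streaming progress of every base backup currently running.
SELECT pid,
       phase,
       pg_size_pretty(backup_streamed) AS streamed,
       pg_size_pretty(backup_total)    AS estimated_total,
       round(100.0 * backup_streamed / nullif(backup_total, 0), 1) AS streamed_pct,
       tablespaces_streamed,
       tablespaces_total
FROM pg_stat_progress_basebackup;
```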
pg_stat_progress_copy View

| Column | Type | Description |
|---|---|---|
| pid | integer | Process ID of backend. |
| datid | oid | OID of the database to which this backend is connected. |
| datname | name | Name of the database to which this backend is connected. |
| relid | oid | OID of the table on which the COPY command is executed. It is set to 0 if copying from a SELECT query. |
| command | text | The command that is running: COPY FROM, or COPY TO. |
| type | text | The I/O type that the data is read from or written to: FILE, PROGRAM, PIPE (for COPY FROM STDIN and COPY TO STDOUT), or CALLBACK (used for example during the initial table synchronization in logical replication). |
| bytes_processed | bigint | Number of bytes already processed by the COPY command. |
| bytes_total | bigint | Size of the source file for the COPY FROM command in bytes. It is set to 0 if not available. |
| tuples_processed | bigint | Number of tuples already processed by the COPY command. |
| tuples_excluded | bigint | Number of tuples not processed because they were excluded by the WHERE clause of the COPY command. |
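For long-running loads, the view can be polled as sketched below. Because bytes_total is 0 when the size is not available, nullif turns the percentage into NULL in that case instead of raising a division error.

```sql
-- Progress of every COPY currently running.
SELECT pid,
       relid::regclass AS table_name,
       command,
       type,
       bytes_processed,
       bytes_total,
       round(100.0 * bytes_processed / nullif(bytes_total, 0), 1) AS bytes_pct,
       tuples_processed,
       tuples_excluded
FROM pg_stat_progress_copy;
```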
pg_stat_progress_analyze View
Column Type
Description
pid integer
Process ID of backend.
datid oid
OID of the database to which this backend is connected.
datname name
Name of the database to which this backend is connected.
relid oid
OID of the table being analyzed.
phase text
Current processing phase. See Table 28.37.
sample_blks_total bigint
Total number of heap blocks that will be sampled.
sample_blks_scanned bigint
Number of heap blocks scanned.
ext_stats_total bigint
Number of extended statistics.
ext_stats_computed bigint
Number of extended statistics computed. This counter only advances when the phase is computing extended statistics.
child_tables_total bigint
Number of child tables.
child_tables_done bigint
Number of child tables scanned. This counter only advances when the phase is acquiring inherited sample rows.
current_child_table_relid oid
OID of the child table currently being scanned. This field is only valid when the phase is acquiring inherited sample rows.
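As a minimal sketch of how these columns might be read together, the query below reports sampling progress for running ANALYZE operations. The percentage expression and the aliases are illustrative assumptions, not part of the view.

```sql
-- Sampling progress of running ANALYZE operations.
-- nullif() guards against a zero block total before sampling has been sized.
SELECT pid,
       datname,
       relid::regclass AS table_name,   -- resolves only for tables in the current database
       phase,
       sample_blks_scanned,
       sample_blks_total,
       round(100.0 * sample_blks_scanned / nullif(sample_blks_total, 0), 1) AS sample_pct,
       child_tables_done,
       child_tables_total
FROM pg_stat_progress_analyze;
```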
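The most important disk monitoring task of a database administrator is to make sure the disk does not become full. A full data disk will not result in data corruption, but it might prevent useful activity from occurring. If the disk holding the WAL files grows full, the database server may panic and consequently shut down.
If you cannot free up additional space on the disk by deleting other things, you can move some of the database files to other file systems by making use of tablespaces. See Section 23.6 for more information.
Note: Some file systems perform badly when they are almost full, so do not wait until the disk is completely full to take action.
If your system supports per-user disk quotas, then the database will naturally be subject to whatever quota is placed on the user the server runs as. Exceeding the quota will have the same bad effects as running out of disk space entirely.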
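Each table has a primary heap disk file where most of the data is stored. If the table has any columns with potentially wide values, there might also be a TOAST table associated with it, which is used to store values too wide to fit comfortably in the main table (see Section 73.3). There will be one valid index on the TOAST table, if present. There might also be indexes associated with the base table. Each table and index is stored in a separate disk file, or possibly in more than one file if it exceeds 1 GB. Naming conventions for these files are described in Section 73.1.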
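You can monitor disk space in three ways: using the SQL functions listed in Table 9.94, using the oid2name module, or by manual inspection of the system catalogs. The SQL functions are the easiest to use and are generally recommended. The remainder of this section shows how to do it by inspecting the system catalogs.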
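For comparison with the catalog-based approach, here is a minimal sketch of the SQL-function approach. It assumes a table named customer, which is a placeholder and not a name taken from this section.

```sql
-- Size of a table via the built-in size functions.
-- 'customer' is a hypothetical table name used only for illustration.
SELECT pg_size_pretty(pg_relation_size('customer'))       AS heap_size,
       pg_size_pretty(pg_indexes_size('customer'))        AS indexes_size,
       pg_size_pretty(pg_total_relation_size('customer')) AS total_size;  -- heap + TOAST + indexes
```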
Using psql on a recently vacuumed or analyzed database, you can issue queries to see the disk usage of any table:
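As a minimal sketch, assuming a table named customer (a placeholder name), such a query might read the page count and file path from pg_class:

```sql
-- Disk usage of one table, in pages, together with its file path.
-- 'customer' is a hypothetical table name used only for illustration.
SELECT pg_relation_filepath(oid), relpages
FROM pg_class
WHERE relname = 'customer';
```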
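Each page is typically 8 kB. (Remember, relpages is only updated by VACUUM, ANALYZE, and a few DDL commands such as CREATE INDEX.) The file path name is of interest if you want to examine the table's disk file directly.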
To show the space used by TOAST tables, use a query like the following:
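One possible form of such a query, sketched under the assumption of a table named customer (a placeholder), looks up the TOAST table through pg_class.reltoastrelid and its index through pg_index:

```sql
-- Pages used by the TOAST table of 'customer' and by the TOAST table's index.
-- 'customer' is a hypothetical table name used only for illustration.
SELECT relname, relpages
FROM pg_class,
     (SELECT reltoastrelid
      FROM pg_class
      WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid
   OR oid = (SELECT indexrelid
             FROM pg_index
             WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;
```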
You can easily display index sizes, too:
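A minimal sketch, again assuming a placeholder table named customer, joins pg_class to pg_index to list each index with its page count:

```sql
-- Pages used by every index of one table.
-- 'customer' is a hypothetical table name used only for illustration.
SELECT c2.relname, c2.relpages
FROM pg_class c, pg_class c2, pg_index i
WHERE c.relname = 'customer'
  AND c.oid = i.indrelid
  AND c2.oid = i.indexrelid
ORDER BY c2.relname;
```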
It is easy to find your largest tables and indexes using this information:
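For instance, a sketch that orders all relations by page count (the LIMIT of 10 is an arbitrary choice for illustration):

```sql
-- Largest relations (tables and indexes) by number of pages.
SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC
LIMIT 10;
```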
This chapter discusses how to monitor the disk usage of a PostgreSQL database system.
PostgreSQL's cumulative statistics system supports collection and reporting of information about server activity. Presently, accesses to tables and indexes in both disk-block and individual-row terms are counted. The total number of rows in each table, and information about vacuum and analyze actions for each table are also counted. If enabled, calls to user-defined functions and the total time spent in each one are counted as well.
PostgreSQL also supports reporting dynamic information about exactly what is going on in the system right now, such as the exact command currently being executed by other server processes, and which other connections exist in the system. This facility is independent of the cumulative statistics system.
Since collection of statistics adds some overhead to query execution, the system can be configured to collect or not collect information. This is controlled by configuration parameters that are normally set in postgresql.conf. (See Chapter 20 for details about setting configuration parameters.)
The parameter track_activities enables monitoring of the current command being executed by any server process.
The parameter track_counts controls whether cumulative statistics are collected about table and index accesses.
The parameter track_functions enables tracking of usage of user-defined functions.
The parameter track_io_timing enables monitoring of block read and write times.
The parameter track_wal_io_timing enables monitoring of WAL write times.
Normally these parameters are set in postgresql.conf so that they apply to all server processes, but it is possible to turn them on or off in individual sessions using the SET command. (To prevent ordinary users from hiding their activity from the administrator, only superusers are allowed to change these parameters with SET.)
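The following is a minimal sketch of inspecting and changing these parameters from SQL, subject to the privilege restriction described above. The chosen values (on, 'pl') are illustrative assumptions, not recommendations.

```sql
-- Show the current values of the statistics-related parameters.
SELECT name, setting, context
FROM pg_settings
WHERE name IN ('track_activities', 'track_counts', 'track_functions',
               'track_io_timing', 'track_wal_io_timing');

-- Change a parameter for the current session only.
SET track_io_timing = on;

-- Persist a setting for all server processes and reload the configuration.
ALTER SYSTEM SET track_functions = 'pl';
SELECT pg_reload_conf();
```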
Cumulative statistics are collected in shared memory. Every PostgreSQL process collects statistics locally, then updates the shared data at appropriate intervals. When a server, including a physical replica, shuts down cleanly, a permanent copy of the statistics data is stored in the pg_stat subdirectory, so that statistics can be retained across server restarts. In contrast, when starting from an unclean shutdown (e.g., after an immediate shutdown, a server crash, starting from a base backup, and point-in-time recovery), all statistics counters are reset.
Several predefined views, listed in Table 28.1, are available to show the current state of the system. There are also several other views, listed in Table 28.2, available to show the accumulated statistics. Alternatively, one can build custom views using the underlying cumulative statistics functions, as discussed in Section 28.2.24.
When using the cumulative statistics views and functions to monitor collected data, it is important to realize that the information does not update instantaneously. Each individual server process flushes out accumulated statistics to shared memory just before going idle, but not more frequently than once per PGSTAT_MIN_INTERVAL milliseconds (1 second unless altered while building the server); so a query or transaction still in progress does not affect the displayed totals and the displayed information lags behind actual activity. However, current-query information collected by track_activities is always up-to-date.
Another important point is that when a server process is asked to display any of the accumulated statistics, accessed values are cached until the end of its current transaction in the default configuration. So the statistics will show static information as long as you continue the current transaction. Similarly, information about the current queries of all sessions is collected when any such information is first requested within a transaction, and the same information will be displayed throughout the transaction. This is a feature, not a bug, because it allows you to perform several queries on the statistics and correlate the results without worrying that the numbers are changing underneath you. When analyzing statistics interactively, or with expensive queries, the time delta between accesses to individual statistics can lead to significant skew in the cached statistics. To minimize skew, stats_fetch_consistency can be set to snapshot, at the price of increased memory usage for caching not-needed statistics data. Conversely, if it's known that statistics are only accessed once, caching accessed statistics is unnecessary and can be avoided by setting stats_fetch_consistency to none. You can invoke pg_stat_clear_snapshot() to discard the current transaction's statistics snapshot or cached values (if any). The next use of statistical information will (when in snapshot mode) cause a new snapshot to be built or (when in cache mode) accessed statistics to be cached.
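A minimal sketch of the snapshot behavior described above. The choice of views and columns is an illustrative assumption; any of the cumulative statistics views could be used instead.

```sql
BEGIN;
-- Take one consistent snapshot of all statistics on first access.
SET LOCAL stats_fetch_consistency = snapshot;

-- Both queries read from the same snapshot, so their numbers can be correlated.
SELECT relname, seq_scan, idx_scan
FROM pg_stat_user_tables
ORDER BY relname;

SELECT relname, heap_blks_read, heap_blks_hit
FROM pg_statio_user_tables
ORDER BY relname;

-- Discard the snapshot; the next access builds a new one.
SELECT pg_stat_clear_snapshot();

SELECT relname, seq_scan, idx_scan
FROM pg_stat_user_tables
ORDER BY relname;
COMMIT;
```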
A transaction can also see its own statistics (not yet flushed out to the shared memory statistics) in the views pg_stat_xact_all_tables, pg_stat_xact_sys_tables, pg_stat_xact_user_tables, and pg_stat_xact_user_functions. These numbers do not act as stated above; instead they update continuously throughout the transaction.
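As a rough sketch of how a transaction can observe its own counters, the example below creates a scratch table, inserts rows, and reads pg_stat_xact_user_tables before rolling back. The table name xact_demo is hypothetical and chosen only for illustration.

```sql
BEGIN;
-- Scratch table used only for this illustration; removed again by ROLLBACK.
CREATE TABLE xact_demo (id integer);
INSERT INTO xact_demo SELECT generate_series(1, 100);

-- Counters for actions taken so far within this transaction.
SELECT relname, seq_scan, n_tup_ins, n_tup_upd, n_tup_del
FROM pg_stat_xact_user_tables
WHERE relname = 'xact_demo';
ROLLBACK;
```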
Some of the information in the dynamic statistics views shown in Table 28.1 is security restricted. Ordinary users can only see all the information about their own sessions (sessions belonging to a role that they are a member of). In rows about other sessions, many columns will be null. Note, however, that the existence of a session and its general properties such as its session user and database are visible to all users. Superusers and roles with privileges of the built-in role pg_read_all_stats (see also Section 22.5) can see all the information about all sessions.
pg_stat_activity
pg_stat_replication
pg_stat_wal_receiver
pg_stat_recovery_prefetch
pg_stat_subscription
pg_stat_ssl
pg_stat_gssapi
pg_stat_progress_analyze
pg_stat_progress_create_index
pg_stat_progress_vacuum
pg_stat_progress_cluster
pg_stat_progress_basebackup
pg_stat_progress_copy
pg_stat_archiver
pg_stat_bgwriter
pg_stat_wal
pg_stat_database
pg_stat_database_conflicts
pg_stat_all_tables
pg_stat_sys_tables
Same as pg_stat_all_tables, except that only system tables are shown.
pg_stat_user_tables
Same as pg_stat_all_tables, except that only user tables are shown.
pg_stat_xact_all_tables
Similar to pg_stat_all_tables, but counts actions taken so far within the current transaction (which are not yet included in pg_stat_all_tables and related views). The columns for numbers of live and dead rows and vacuum and analyze actions are not present in this view.
pg_stat_xact_sys_tables
Same as pg_stat_xact_all_tables, except that only system tables are shown.
pg_stat_xact_user_tables
Same as pg_stat_xact_all_tables, except that only user tables are shown.
pg_stat_all_indexes
pg_stat_sys_indexes
Same as pg_stat_all_indexes, except that only indexes on system tables are shown.
pg_stat_user_indexes
Same as pg_stat_all_indexes, except that only indexes on user tables are shown.
pg_statio_all_tables
pg_statio_sys_tables
Same as pg_statio_all_tables, except that only system tables are shown.
pg_statio_user_tables
Same as pg_statio_all_tables, except that only user tables are shown.
pg_statio_all_indexes
pg_statio_sys_indexes
Same as pg_statio_all_indexes, except that only indexes on system tables are shown.
pg_statio_user_indexes
Same as pg_statio_all_indexes, except that only indexes on user tables are shown.
pg_statio_all_sequences
pg_statio_sys_sequences
Same as pg_statio_all_sequences, except that only system sequences are shown. (Presently, no system sequences are defined, so this view is always empty.)
pg_statio_user_sequences
Same as pg_statio_all_sequences, except that only user sequences are shown.
pg_stat_user_functions
pg_stat_xact_user_functions
Similar to pg_stat_user_functions, but counts only calls during the current transaction (which are not yet included in pg_stat_user_functions).
pg_stat_slru
pg_stat_replication_slots
pg_stat_subscription_stats
The per-index statistics are particularly useful to determine which indexes are being used and how effective they are.
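As a minimal sketch of that kind of analysis, the query below lists user indexes starting with the least-scanned ones. The ordering and the size column are illustrative choices; a low idx_scan count is only a hint, since counters may have been reset recently.

```sql
-- User indexes ordered from least to most used, with their on-disk size.
SELECT schemaname,
       relname,
       indexrelname,
       idx_scan,
       idx_tup_read,
       idx_tup_fetch,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC, pg_relation_size(indexrelid) DESC;
```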
The pg_statio_ views are primarily useful to determine the effectiveness of the buffer cache. When the number of actual disk reads is much smaller than the number of buffer hits, then the cache is satisfying most read requests without invoking a kernel call. However, these statistics do not give the entire story: due to the way in which PostgreSQL handles disk I/O, data that is not in the PostgreSQL buffer cache might still reside in the kernel's I/O cache, and might therefore still be fetched without requiring a physical read. Users interested in obtaining more detailed information on PostgreSQL I/O behavior are advised to use the PostgreSQL statistics views in combination with operating system utilities that allow insight into the kernel's handling of I/O.
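One way to put reads and hits side by side is sketched below. The hit-percentage expression is an illustrative assumption rather than a defined metric, and, as noted above, a "read" here may still have been served from the kernel's cache.

```sql
-- Buffer cache effectiveness per user table.
-- nullif() yields NULL instead of dividing by zero for untouched tables.
SELECT relname,
       heap_blks_read,
       heap_blks_hit,
       round(100.0 * heap_blks_hit
             / nullif(heap_blks_hit + heap_blks_read, 0), 2) AS heap_hit_pct
FROM pg_statio_user_tables
ORDER BY heap_blks_read DESC;
```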
pg_stat_activity
The pg_stat_activity view will have one row per server process, showing information related to the current activity of that process.
pg_stat_activity View
Column Type
Description
datid oid
OID of the database this backend is connected to
datname name
Name of the database this backend is connected to
pid integer
Process ID of this backend
leader_pid integer
Process ID of the parallel group leader, if this process is a parallel query worker. NULL if this process is a parallel group leader or does not participate in parallel query.
usesysid oid
OID of the user logged into this backend
usename name
Name of the user logged into this backend
application_name text
Name of the application that is connected to this backend
client_addr inet
IP address of the client connected to this backend. If this field is null, it indicates either that the client is connected via a Unix socket on the server machine or that this is an internal process such as autovacuum.
client_hostname text
client_port integer
TCP port number that the client is using for communication with this backend, or -1 if a Unix socket is used. If this field is null, it indicates that this is an internal server process.
backend_start timestamp with time zone
Time when this process was started. For client backends, this is the time the client connected to the server.
xact_start timestamp with time zone
Time when this process' current transaction was started, or null if no transaction is active. If the current query is the first of its transaction, this column is equal to the query_start column.
query_start timestamp with time zone
Time when the currently active query was started, or if state is not active, when the last query was started
state_change timestamp with time zone
Time when the state was last changed
wait_event_type text
wait_event text
state text
Current overall state of this backend. Possible values are:
active: The backend is executing a query.
idle: The backend is waiting for a new client command.
idle in transaction: The backend is in a transaction, but is not currently executing a query.
idle in transaction (aborted): This state is similar to idle in transaction, except one of the statements in the transaction caused an error.
fastpath function call: The backend is executing a fast-path function.
backend_xid xid
Top-level transaction identifier of this backend, if any.
backend_xmin xid
The current backend's xmin horizon.
query_id bigint
query text
backend_type text
Type of current backend. Possible types are autovacuum launcher, autovacuum worker, logical replication launcher, logical replication worker, parallel worker, background writer, client backend, checkpointer, archiver, startup, walreceiver, walsender and walwriter. In addition, background workers registered by extensions may have additional types.
The wait_event and state columns are independent. If a backend is in the active state, it may or may not be waiting on some event. If the state is active and wait_event is non-null, it means that a query is being executed, but is being blocked somewhere in the system.
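The combination described above can be sketched as a query that lists backends which are executing a query but currently waiting. The running_for alias and the exclusion of the current session are illustrative choices.

```sql
-- Backends that are active but currently waiting on some event.
SELECT pid,
       state,
       wait_event_type,
       wait_event,
       now() - query_start AS running_for,
       query
FROM pg_stat_activity
WHERE state = 'active'
  AND wait_event IS NOT NULL
  AND pid <> pg_backend_pid();   -- leave out the session running this query
```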
Activity
BufferPin
Client
Extension
IO
IPC
Lock
LWLock
Timeout
Wait Events of Type Activity
Wait Event
Description
ArchiverMain
Waiting in main loop of archiver process.
AutoVacuumMain
Waiting in main loop of autovacuum launcher process.
BgWriterHibernate
Waiting in background writer process, hibernating.
BgWriterMain
Waiting in main loop of background writer process.
CheckpointerMain
Waiting in main loop of checkpointer process.
LogicalApplyMain
Waiting in main loop of logical replication apply process.
LogicalLauncherMain
Waiting in main loop of logical replication launcher process.
RecoveryWalStream
Waiting in main loop of startup process for WAL to arrive, during streaming recovery.
SysLoggerMain
Waiting in main loop of syslogger process.
WalReceiverMain
Waiting in main loop of WAL receiver process.
WalSenderMain
Waiting in main loop of WAL sender process.
WalWriterMain
Waiting in main loop of WAL writer process.
Wait Events of Type BufferPin
Wait Event
Description
BufferPin
Waiting to acquire an exclusive pin on a buffer.
Wait Events of Type Client
Wait Event
Description
ClientRead
Waiting to read data from the client.
ClientWrite
Waiting to write data to the client.
GSSOpenServer
Waiting to read data from the client while establishing a GSSAPI session.
LibPQWalReceiverConnect
Waiting in WAL receiver to establish connection to remote server.
LibPQWalReceiverReceive
Waiting in WAL receiver to receive data from remote server.
SSLOpenServer
Waiting for SSL while attempting connection.
WalSenderWaitForWAL
Waiting for WAL to be flushed in WAL sender process.
WalSenderWriteData
Waiting for any activity when processing replies from WAL receiver in WAL sender process.
Wait Events of Type Extension
Wait Event
Description
Extension
Waiting in an extension.
Wait Events of Type IO
Wait Event
Description
BaseBackupRead
Waiting for base backup to read from a file.
BufFileRead
Waiting for a read from a buffered file.
BufFileWrite
Waiting for a write to a buffered file.
BufFileTruncate
Waiting for a buffered file to be truncated.
ControlFileRead
Waiting for a read from the pg_control file.
ControlFileSync
Waiting for the pg_control file to reach durable storage.
ControlFileSyncUpdate
Waiting for an update to the pg_control file to reach durable storage.
ControlFileWrite
Waiting for a write to the pg_control file.
ControlFileWriteUpdate
Waiting for a write to update the pg_control file.
CopyFileRead
Waiting for a read during a file copy operation.
CopyFileWrite
Waiting for a write during a file copy operation.
DSMFillZeroWrite
Waiting to fill a dynamic shared memory backing file with zeroes.
DataFileExtend
Waiting for a relation data file to be extended.
DataFileFlush
Waiting for a relation data file to reach durable storage.
DataFileImmediateSync
Waiting for an immediate synchronization of a relation data file to durable storage.
DataFilePrefetch
Waiting for an asynchronous prefetch from a relation data file.
DataFileRead
Waiting for a read from a relation data file.
DataFileSync
Waiting for changes to a relation data file to reach durable storage.
DataFileTruncate
Waiting for a relation data file to be truncated.
DataFileWrite
Waiting for a write to a relation data file.
LockFileAddToDataDirRead
Waiting for a read while adding a line to the data directory lock file.
LockFileAddToDataDirSync
Waiting for data to reach durable storage while adding a line to the data directory lock file.
LockFileAddToDataDirWrite
Waiting for a write while adding a line to the data directory lock file.
LockFileCreateRead
Waiting to read while creating the data directory lock file.
LockFileCreateSync
Waiting for data to reach durable storage while creating the data directory lock file.
LockFileCreateWrite
Waiting for a write while creating the data directory lock file.
LockFileReCheckDataDirRead
Waiting for a read during recheck of the data directory lock file.
LogicalRewriteCheckpointSync
Waiting for logical rewrite mappings to reach durable storage during a checkpoint.
LogicalRewriteMappingSync
Waiting for mapping data to reach durable storage during a logical rewrite.
LogicalRewriteMappingWrite
Waiting for a write of mapping data during a logical rewrite.
LogicalRewriteSync
Waiting for logical rewrite mappings to reach durable storage.
LogicalRewriteTruncate
Waiting for truncate of mapping data during a logical rewrite.
LogicalRewriteWrite
Waiting for a write of logical rewrite mappings.
RelationMapRead
Waiting for a read of the relation map file.
RelationMapSync
Waiting for the relation map file to reach durable storage.
RelationMapWrite
Waiting for a write to the relation map file.
ReorderBufferRead
Waiting for a read during reorder buffer management.
ReorderBufferWrite
Waiting for a write during reorder buffer management.
ReorderLogicalMappingRead
Waiting for a read of a logical mapping during reorder buffer management.
ReplicationSlotRead
Waiting for a read from a replication slot control file.
ReplicationSlotRestoreSync
Waiting for a replication slot control file to reach durable storage while restoring it to memory.
ReplicationSlotSync
Waiting for a replication slot control file to reach durable storage.
ReplicationSlotWrite
Waiting for a write to a replication slot control file.
SLRUFlushSync
Waiting for SLRU data to reach durable storage during a checkpoint or database shutdown.
SLRURead
Waiting for a read of an SLRU page.
SLRUSync
Waiting for SLRU data to reach durable storage following a page write.
SLRUWrite
Waiting for a write of an SLRU page.
SnapbuildRead
Waiting for a read of a serialized historical catalog snapshot.
SnapbuildSync
Waiting for a serialized historical catalog snapshot to reach durable storage.
SnapbuildWrite
Waiting for a write of a serialized historical catalog snapshot.
TimelineHistoryFileSync
Waiting for a timeline history file received via streaming replication to reach durable storage.
TimelineHistoryFileWrite
Waiting for a write of a timeline history file received via streaming replication.
TimelineHistoryRead
Waiting for a read of a timeline history file.
TimelineHistorySync
Waiting for a newly created timeline history file to reach durable storage.
TimelineHistoryWrite
Waiting for a write of a newly created timeline history file.
TwophaseFileRead
Waiting for a read of a two phase state file.
TwophaseFileSync
Waiting for a two phase state file to reach durable storage.
TwophaseFileWrite
Waiting for a write of a two phase state file.
VersionFileWrite
Waiting for the version file to be written while creating a database.
WALBootstrapSync
Waiting for WAL to reach durable storage during bootstrapping.
WALBootstrapWrite
Waiting for a write of a WAL page during bootstrapping.
WALCopyRead
Waiting for a read when creating a new WAL segment by copying an existing one.
WALCopySync
Waiting for a new WAL segment created by copying an existing one to reach durable storage.
WALCopyWrite
Waiting for a write when creating a new WAL segment by copying an existing one.
WALInitSync
Waiting for a newly initialized WAL file to reach durable storage.
WALInitWrite
Waiting for a write while initializing a new WAL file.
WALRead
Waiting for a read from a WAL file.
WALSenderTimelineHistoryRead
Waiting for a read from a timeline history file during a walsender timeline command.
WALSync
Waiting for a WAL file to reach durable storage.
WALSyncMethodAssign
Waiting for data to reach durable storage while assigning a new WAL sync method.
WALWrite
Waiting for a write to a WAL file.
Wait Events of Type IPC
Wait Event
Description
AppendReady
Waiting for subplan nodes of an Append plan node to be ready.
ArchiveCleanupCommand
ArchiveCommand
BackendTermination
Waiting for the termination of another backend.
BackupWaitWalArchive
Waiting for WAL files required for a backup to be successfully archived.
BgWorkerShutdown
Waiting for background worker to shut down.
BgWorkerStartup
Waiting for background worker to start up.
BtreePage
Waiting for the page number needed to continue a parallel B-tree scan to become available.
BufferIO
Waiting for buffer I/O to complete.
CheckpointDone
Waiting for a checkpoint to complete.
CheckpointStart
Waiting for a checkpoint to start.
ExecuteGather
Waiting for activity from a child process while executing a Gather plan node.
HashBatchAllocate
Waiting for an elected Parallel Hash participant to allocate a hash table.
HashBatchElect
Waiting to elect a Parallel Hash participant to allocate a hash table.
HashBatchLoad
Waiting for other Parallel Hash participants to finish loading a hash table.
HashBuildAllocate
Waiting for an elected Parallel Hash participant to allocate the initial hash table.
HashBuildElect
Waiting to elect a Parallel Hash participant to allocate the initial hash table.
HashBuildHashInner
Waiting for other Parallel Hash participants to finish hashing the inner relation.
HashBuildHashOuter
Waiting for other Parallel Hash participants to finish partitioning the outer relation.
HashGrowBatchesAllocate
Waiting for an elected Parallel Hash participant to allocate more batches.
HashGrowBatchesDecide
Waiting to elect a Parallel Hash participant to decide on future batch growth.
HashGrowBatchesElect
Waiting to elect a Parallel Hash participant to allocate more batches.
HashGrowBatchesFinish
Waiting for an elected Parallel Hash participant to decide on future batch growth.
HashGrowBatchesRepartition
Waiting for other Parallel Hash participants to finish repartitioning.
HashGrowBucketsAllocate
Waiting for an elected Parallel Hash participant to finish allocating more buckets.
HashGrowBucketsElect
Waiting to elect a Parallel Hash participant to allocate more buckets.
HashGrowBucketsReinsert
Waiting for other Parallel Hash participants to finish inserting tuples into new buckets.
LogicalSyncData
Waiting for a logical replication remote server to send data for initial table synchronization.
LogicalSyncStateChange
Waiting for a logical replication remote server to change state.
MessageQueueInternal
Waiting for another process to be attached to a shared message queue.
MessageQueuePutMessage
Waiting to write a protocol message to a shared message queue.
MessageQueueReceive
Waiting to receive bytes from a shared message queue.
MessageQueueSend
Waiting to send bytes to a shared message queue.
ParallelBitmapScan
Waiting for parallel bitmap scan to become initialized.
ParallelCreateIndexScan
Waiting for parallel CREATE INDEX workers to finish heap scan.
ParallelFinish
Waiting for parallel workers to finish computing.
ProcArrayGroupUpdate
Waiting for the group leader to clear the transaction ID at end of a parallel operation.
ProcSignalBarrier
Waiting for a barrier event to be processed by all backends.
Promote
Waiting for standby promotion.
RecoveryConflictSnapshot
Waiting for recovery conflict resolution for a vacuum cleanup.
RecoveryConflictTablespace
Waiting for recovery conflict resolution for dropping a tablespace.
RecoveryEndCommand
RecoveryPause
Waiting for recovery to be resumed.
ReplicationOriginDrop
Waiting for a replication origin to become inactive so it can be dropped.
ReplicationSlotDrop
Waiting for a replication slot to become inactive so it can be dropped.
RestoreCommand
SafeSnapshot
Waiting to obtain a valid snapshot for a READ ONLY DEFERRABLE transaction.
SyncRep
Waiting for confirmation from a remote server during synchronous replication.
WalReceiverExit
Waiting for the WAL receiver to exit.
WalReceiverWaitStart
Waiting for startup process to send initial data for streaming replication.
XactGroupUpdate
Waiting for the group leader to update transaction status at end of a parallel operation.
Wait Events of Type Lock
Wait Event
Description
advisory
Waiting to acquire an advisory user lock.
extend
Waiting to extend a relation.
frozenid
Waiting to update pg_database.datfrozenxid and pg_database.datminmxid.
object
Waiting to acquire a lock on a non-relation database object.
page
Waiting to acquire a lock on a page of a relation.
relation
Waiting to acquire a lock on a relation.
spectoken
Waiting to acquire a speculative insertion lock.
transactionid
Waiting for a transaction to finish.
tuple
Waiting to acquire a lock on a tuple.
userlock
Waiting to acquire a user lock.
virtualxid
Waiting to acquire a virtual transaction ID lock.
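When a backend reports one of the Lock wait events above, it is often useful to see which sessions hold the conflicting locks. A minimal sketch using the pg_blocking_pids() function follows. The blocked_by alias is an illustrative choice.

```sql
-- Sessions waiting on a heavyweight lock, with the PIDs blocking them.
SELECT pid,
       wait_event,                          -- e.g. relation, tuple, transactionid
       pg_blocking_pids(pid) AS blocked_by,
       query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';
```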
Wait Events of Type LWLock
Wait Event
Description
AddinShmemInit
Waiting to manage an extension's space allocation in shared memory.
AutoFile
Waiting to update the postgresql.auto.conf file.
Autovacuum
Waiting to read or update the current state of autovacuum workers.
AutovacuumSchedule
Waiting to ensure that a table selected for autovacuum still needs vacuuming.
BackgroundWorker
Waiting to read or update background worker state.
BtreeVacuum
Waiting to read or update vacuum-related information for a B-tree index.
BufferContent
Waiting to access a data page in memory.
BufferMapping
Waiting to associate a data block with a buffer in the buffer pool.
CheckpointerComm
Waiting to manage fsync requests.
CommitTs
Waiting to read or update the last value set for a transaction commit timestamp.
CommitTsBuffer
Waiting for I/O on a commit timestamp SLRU buffer.
CommitTsSLRU
Waiting to access the commit timestamp SLRU cache.
ControlFile
Waiting to read or update the pg_control file or create a new WAL file.
DynamicSharedMemoryControl
Waiting to read or update dynamic shared memory allocation information.
LockFastPath
Waiting to read or update a process' fast-path lock information.
LockManager
Waiting to read or update information about “heavyweight” locks.
LogicalRepWorker
Waiting to read or update the state of logical replication workers.
MultiXactGen
Waiting to read or update shared multixact state.
MultiXactMemberBuffer
Waiting for I/O on a multixact member SLRU buffer.
MultiXactMemberSLRU
Waiting to access the multixact member SLRU cache.
MultiXactOffsetBuffer
Waiting for I/O on a multixact offset SLRU buffer.
MultiXactOffsetSLRU
Waiting to access the multixact offset SLRU cache.
MultiXactTruncation
Waiting to read or truncate multixact information.
NotifyBuffer
Waiting for I/O on a NOTIFY message SLRU buffer.
NotifyQueue
Waiting to read or update NOTIFY messages.
NotifyQueueTail
Waiting to update limit on NOTIFY message storage.
NotifySLRU
Waiting to access the NOTIFY message SLRU cache.
OidGen
Waiting to allocate a new OID.
OldSnapshotTimeMap
Waiting to read or update old snapshot control information.
ParallelAppend
Waiting to choose the next subplan during Parallel Append plan execution.
ParallelHashJoin
Waiting to synchronize workers during Parallel Hash Join plan execution.
ParallelQueryDSA
Waiting for parallel query dynamic shared memory allocation.
PerSessionDSA
Waiting for parallel query dynamic shared memory allocation.
PerSessionRecordType
Waiting to access a parallel query's information about composite types.
PerSessionRecordTypmod
Waiting to access a parallel query's information about type modifiers that identify anonymous record types.
PerXactPredicateList
Waiting to access the list of predicate locks held by the current serializable transaction during a parallel query.
PredicateLockManager
Waiting to access predicate lock information used by serializable transactions.
ProcArray
Waiting to access the shared per-process data structures (typically, to get a snapshot or report a session's transaction ID).
RelationMapping
Waiting to read or update a pg_filenode.map file (used to track the filenode assignments of certain system catalogs).
RelCacheInit
Waiting to read or update a pg_internal.init relation cache initialization file.
ReplicationOrigin
Waiting to create, drop or use a replication origin.
ReplicationOriginState
Waiting to read or update the progress of one replication origin.
ReplicationSlotAllocation
Waiting to allocate or free a replication slot.
ReplicationSlotControl
Waiting to read or update replication slot state.
ReplicationSlotIO
Waiting for I/O on a replication slot.
SerialBuffer
Waiting for I/O on a serializable transaction conflict SLRU buffer.
SerializableFinishedList
Waiting to access the list of finished serializable transactions.
SerializablePredicateList
Waiting to access the list of predicate locks held by serializable transactions.
PgStatsDSA
Waiting for stats dynamic shared memory allocator access.
PgStatsHash
Waiting for stats shared memory hash table access.
PgStatsData
Waiting for shared memory stats data access.
SerializableXactHash
Waiting to read or update information about serializable transactions.
SerialSLRU
Waiting to access the serializable transaction conflict SLRU cache.
SharedTidBitmap
Waiting to access a shared TID bitmap during a parallel bitmap index scan.
SharedTupleStore
Waiting to access a shared tuple store during parallel query.
ShmemIndex
Waiting to find or allocate space in shared memory.
SInvalRead
Waiting to retrieve messages from the shared catalog invalidation queue.
SInvalWrite
Waiting to add a message to the shared catalog invalidation queue.
SubtransBuffer
Waiting for I/O on a sub-transaction SLRU buffer.
SubtransSLRU
Waiting to access the sub-transaction SLRU cache.
SyncRep
Waiting to read or update information about the state of synchronous replication.
SyncScan
Waiting to select the starting location of a synchronized table scan.
TablespaceCreate
Waiting to create or drop a tablespace.
TwoPhaseState
Waiting to read or update the state of prepared transactions.
WALBufMapping
Waiting to replace a page in WAL buffers.
WALInsert
Waiting to insert WAL data into a memory buffer.
WALWrite
Waiting for WAL buffers to be written to disk.
WrapLimitsVacuum
Waiting to update limits on transaction id and multixact consumption.
XactBuffer
Waiting for I/O on a transaction status SLRU buffer.
XactSLRU
Waiting to access the transaction status SLRU cache.
XactTruncation
Waiting to execute pg_xact_status or update the oldest transaction ID available to it.
XidGen
Waiting to allocate a new transaction ID.
Extensions can add LWLock types to the list shown in Table 28.12. In some cases, the name assigned by an extension will not be available in all server processes; so an LWLock wait event might be reported as just “extension” rather than the extension-assigned name.
Wait Events of Type Timeout
Wait Event
Description
BaseBackupThrottle
Waiting during base backup when throttling activity.
CheckpointWriteDelay
Waiting between writes while performing a checkpoint.
PgSleep
Waiting due to a call to pg_sleep or a sibling function.
RecoveryApplyDelay
Waiting to apply WAL during recovery because of a delay setting.
RecoveryRetrieveRetryInterval
Waiting during recovery when WAL data is not available from any source (pg_wal, archive or stream).
RegisterSyncRequest
Waiting while sending synchronization requests to the checkpointer, because the request queue is full.
VacuumDelay
Waiting in a cost-based vacuum delay point.
VacuumTruncate
Waiting to acquire an exclusive lock to truncate off any empty pages at the end of a table vacuumed.
Here is an example of how wait events can be viewed:
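A minimal sketch of such a query, using only the pg_stat_activity columns described earlier. The second statement, which counts waiting backends per event, is an additional illustration and an assumption about what is useful to see.

```sql
-- Every server process that is currently waiting, with its wait event.
SELECT pid, wait_event_type, wait_event
FROM pg_stat_activity
WHERE wait_event IS NOT NULL;

-- The same information summarized: number of waiting processes per event.
SELECT wait_event_type, wait_event, count(*) AS waiting
FROM pg_stat_activity
WHERE wait_event IS NOT NULL
GROUP BY wait_event_type, wait_event
ORDER BY waiting DESC;
```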
pg_stat_replication
The pg_stat_replication view will contain one row per WAL sender process, showing statistics about replication to that sender's connected standby server. Only directly connected standbys are listed; no information is available about downstream standby servers.
pg_stat_replication View
Column Type
Description
pid
integer
Process ID of a WAL sender process
usesysid
oid
OID of the user logged into this WAL sender process
usename
name
Name of the user logged into this WAL sender process
application_name
text
Name of the application that is connected to this WAL sender
client_addr
inet
IP address of the client connected to this WAL sender. If this field is null, it indicates that the client is connected via a Unix socket on the server machine.
client_hostname
text
Host name of the connected client, as reported by a reverse DNS lookup of client_addr. This field will only be non-null for IP connections, and only when log_hostname is enabled.
client_port
integer
TCP port number that the client is using for communication with this WAL sender, or -1 if a Unix socket is used
backend_start
timestamp with time zone
Time when this process was started, i.e., when the client connected to this WAL sender
backend_xmin
xid
This standby's xmin horizon reported by hot_standby_feedback.
state
text
Current WAL sender state. Possible values are:
startup: This WAL sender is starting up.
catchup: This WAL sender's connected standby is catching up with the primary.
streaming: This WAL sender is streaming changes after its connected standby server has caught up with the primary.
backup: This WAL sender is sending a backup.
stopping: This WAL sender is stopping.
sent_lsn
pg_lsn
Last write-ahead log location sent on this connection
write_lsn
pg_lsn
Last write-ahead log location written to disk by this standby server
flush_lsn
pg_lsn
Last write-ahead log location flushed to disk by this standby server
replay_lsn
pg_lsn
Last write-ahead log location replayed into the database on this standby server
write_lag
interval
Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written it (but not yet flushed it or applied it). This can be used to gauge the delay that synchronous_commit level remote_write incurred while committing if this server was configured as a synchronous standby.
flush_lag
interval
Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written and flushed it (but not yet applied it). This can be used to gauge the delay that synchronous_commit level on incurred while committing if this server was configured as a synchronous standby.
replay_lag
interval
Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written, flushed and applied it. This can be used to gauge the delay that synchronous_commit level remote_apply incurred while committing if this server was configured as a synchronous standby.
sync_priority
integer
Priority of this standby server for being chosen as the synchronous standby in a priority-based synchronous replication. This has no effect in a quorum-based synchronous replication.
sync_state
text
Synchronous state of this standby server. Possible values are:
async: This standby server is asynchronous.
potential: This standby server is now asynchronous, but can potentially become synchronous if one of the current synchronous standbys fails.
sync: This standby server is synchronous.
quorum: This standby server is considered as a candidate for quorum standbys.
reply_time
timestamp with time zone
Send time of last reply message received from standby server
The lag times reported in the pg_stat_replication view are measurements of the time taken for recent WAL to be written, flushed and replayed and for the sender to know about it. These times represent the commit delay that was (or would have been) introduced by each synchronous commit level, if the remote server was configured as a synchronous standby. For an asynchronous standby, the replay_lag column approximates the delay before recent transactions became visible to queries. If the standby server has entirely caught up with the sending server and there is no more WAL activity, the most recently measured lag times will continue to be displayed for a short time and then show NULL.
Lag times work automatically for physical replication. Logical decoding plugins may optionally emit tracking messages; if they do not, the tracking mechanism will simply display NULL lag.
The reported lag times are not predictions of how long it will take for the standby to catch up with the sending server assuming the current rate of replay. Such a system would show similar times while new WAL is being generated, but would differ when the sender becomes idle. In particular, when the standby has caught up completely, pg_stat_replication shows the time taken to write, flush and replay the most recent reported WAL location rather than zero as some users might expect. This is consistent with the goal of measuring synchronous commit and transaction visibility delays for recent write transactions. To reduce confusion for users expecting a different model of lag, the lag columns revert to NULL after a short time on a fully replayed idle system. Monitoring systems should choose whether to represent this as missing data, as zero, or by continuing to display the last known value.
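As an illustration, the following sketch reports the time-based lag columns next to a byte-based gap for each directly connected standby. The functions pg_current_wal_lsn and pg_wal_lsn_diff are not described in this section; they are assumed to be available, and the query is assumed to run on a primary server, since pg_current_wal_lsn cannot be called during recovery.

```sql
-- One row per directly connected standby, run on the primary.
-- replay_gap_bytes is the amount of WAL the standby has not yet replayed;
-- the *_lag columns may be NULL on a fully caught-up, idle system.
SELECT application_name,
       client_addr,
       state,
       sync_state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_gap_bytes,
       write_lag,
       flush_lag,
       replay_lag
FROM pg_stat_replication
ORDER BY replay_gap_bytes DESC NULLS LAST;
```

The byte gap and the time lag answer different questions: the former shows how much WAL is outstanding, the latter how long recent WAL took to be confirmed.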
pg_stat_replication_slots
The pg_stat_replication_slots view will contain one row per logical replication slot, showing statistics about its usage.
pg_stat_replication_slots View
Column Type
Description
slot_name
text
A unique, cluster-wide identifier for the replication slot
spill_txns
bigint
Number of transactions spilled to disk once the memory used by logical decoding to decode changes from WAL has exceeded logical_decoding_work_mem. The counter gets incremented for both top-level transactions and subtransactions.
spill_count
bigint
Number of times transactions were spilled to disk while decoding changes from WAL for this slot. This counter is incremented each time a transaction is spilled, and the same transaction may be spilled multiple times.
spill_bytes
bigint
Amount of decoded transaction data spilled to disk while performing decoding of changes from WAL for this slot. This and other spill counters can be used to gauge the I/O which occurred during logical decoding and allow tuning logical_decoding_work_mem.
stream_txns
bigint
Number of in-progress transactions streamed to the decoding output plugin after the memory used by logical decoding to decode changes from WAL for this slot has exceeded logical_decoding_work_mem. Streaming only works with top-level transactions (subtransactions can't be streamed independently), so the counter is not incremented for subtransactions.
stream_count
bigint
Number of times in-progress transactions were streamed to the decoding output plugin while decoding changes from WAL for this slot. This counter is incremented each time a transaction is streamed, and the same transaction may be streamed multiple times.
stream_bytes
bigint
Amount of transaction data decoded for streaming in-progress transactions to the decoding output plugin while decoding changes from WAL for this slot. This and other streaming counters for this slot can be used to tune logical_decoding_work_mem.
total_txns
bigint
Number of decoded transactions sent to the decoding output plugin for this slot. This counts top-level transactions only, and is not incremented for subtransactions. Note that this includes the transactions that are streamed and/or spilled.
total_bytes
bigint
Amount of transaction data decoded for sending transactions to the decoding output plugin while decoding changes from WAL for this slot. Note that this includes data that is streamed and/or spilled.
stats_reset
timestamp with time zone
Time at which these statistics were last reset
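The spill and stream counters are easiest to compare side by side. The sketch below uses only columns from the table above, plus pg_size_pretty (assumed available) to make the byte counts readable:

```sql
-- Per-slot decoding activity; slots that spill the most are listed first.
-- Large spill_bytes relative to total_bytes may suggest that
-- logical_decoding_work_mem is too small for the workload.
SELECT slot_name,
       total_txns,
       spill_txns,
       spill_count,
       pg_size_pretty(spill_bytes)  AS spilled,
       stream_txns,
       stream_count,
       pg_size_pretty(stream_bytes) AS streamed,
       pg_size_pretty(total_bytes)  AS total
FROM pg_stat_replication_slots
ORDER BY spill_bytes DESC;
```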
pg_stat_wal_receiver
The pg_stat_wal_receiver view will contain only one row, showing statistics about the WAL receiver from that receiver's connected server.
pg_stat_wal_receiver View
Column Type
Description
pid
integer
Process ID of the WAL receiver process
status
text
Activity status of the WAL receiver process
receive_start_lsn
pg_lsn
First write-ahead log location used when WAL receiver is started
receive_start_tli
integer
First timeline number used when WAL receiver is started
written_lsn
pg_lsn
Last write-ahead log location already received and written to disk, but not flushed. This should not be used for data integrity checks.
flushed_lsn
pg_lsn
Last write-ahead log location already received and flushed to disk, the initial value of this field being the first log location used when WAL receiver is started
received_tli
integer
Timeline number of last write-ahead log location received and flushed to disk, the initial value of this field being the timeline number of the first log location used when WAL receiver is started
last_msg_send_time
timestamp with time zone
Send time of last message received from origin WAL sender
last_msg_receipt_time
timestamp with time zone
Receipt time of last message received from origin WAL sender
latest_end_lsn
pg_lsn
Last write-ahead log location reported to origin WAL sender
latest_end_time
timestamp with time zone
Time of last write-ahead log location reported to origin WAL sender
slot_name
text
Replication slot name used by this WAL receiver
sender_host
text
Host of the PostgreSQL instance this WAL receiver is connected to. This can be a host name, an IP address, or a directory path if the connection is via Unix socket. (The path case can be distinguished because it will always be an absolute path, beginning with /.)
sender_port
integer
Port number of the PostgreSQL instance this WAL receiver is connected to.
conninfo
text
Connection string used by this WAL receiver, with security-sensitive fields obfuscated.
pg_stat_recovery_prefetch
The pg_stat_recovery_prefetch view will contain only one row. The columns wal_distance, block_distance and io_depth show current values, and the other columns show cumulative counters that can be reset with the pg_stat_reset_shared function.
pg_stat_recovery_prefetch View
Column Type
Description
stats_reset
timestamp with time zone
Time at which these statistics were last reset
prefetch
bigint
Number of blocks prefetched because they were not in the buffer pool
hit
bigint
Number of blocks not prefetched because they were already in the buffer pool
skip_init
bigint
Number of blocks not prefetched because they would be zero-initialized
skip_new
bigint
Number of blocks not prefetched because they didn't exist yet
skip_fpw
bigint
Number of blocks not prefetched because a full page image was included in the WAL
skip_rep
bigint
Number of blocks not prefetched because they were already recently prefetched
wal_distance
int
How many bytes ahead the prefetcher is looking
block_distance
int
How many blocks ahead the prefetcher is looking
io_depth
int
How many prefetches have been initiated but are not yet known to have completed
pg_stat_subscription
pg_stat_subscription View
Column Type
Description
subid
oid
OID of the subscription
subname
name
Name of the subscription
pid
integer
Process ID of the subscription worker process
relid
oid
OID of the relation that the worker is synchronizing; null for the main apply worker
received_lsn
pg_lsn
Last write-ahead log location received, the initial value of this field being 0
last_msg_send_time
timestamp with time zone
Send time of last message received from origin WAL sender
last_msg_receipt_time
timestamp with time zone
Receipt time of last message received from origin WAL sender
latest_end_lsn
pg_lsn
Last write-ahead log location reported to origin WAL sender
latest_end_time
timestamp with time zone
Time of last write-ahead log location reported to origin WAL sender
pg_stat_subscription_stats
The pg_stat_subscription_stats view will contain one row per subscription.
pg_stat_subscription_stats View
Column Type
Description
subid
oid
OID of the subscription
subname
name
Name of the subscription
apply_error_count
bigint
Number of times an error occurred while applying changes
sync_error_count
bigint
Number of times an error occurred during the initial table synchronization
stats_reset
timestamp with time zone
Time at which these statistics were last reset
pg_stat_ssl
The pg_stat_ssl view will contain one row per backend or WAL sender process, showing statistics about SSL usage on this connection. It can be joined to pg_stat_activity or pg_stat_replication on the pid column to get more details about the connection.
pg_stat_ssl View
Column Type
Description
pid
integer
Process ID of a backend or WAL sender process
ssl
boolean
True if SSL is used on this connection
version
text
Version of SSL in use, or NULL if SSL is not in use on this connection
cipher
text
Name of SSL cipher in use, or NULL if SSL is not in use on this connection
bits
integer
Number of bits in the encryption algorithm used, or NULL if SSL is not used on this connection
client_dn
text
Distinguished Name (DN) field from the client certificate used, or NULL if no client certificate was supplied or if SSL is not in use on this connection. This field is truncated if the DN field is longer than NAMEDATALEN (64 characters in a standard build).
client_serial
numeric
Serial number of the client certificate, or NULL if no client certificate was supplied or if SSL is not in use on this connection. The combination of certificate serial number and certificate issuer uniquely identifies a certificate (unless the issuer erroneously reuses serial numbers).
issuer_dn
text
DN of the issuer of the client certificate, or NULL if no client certificate was supplied or if SSL is not in use on this connection. This field is truncated like client_dn.
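The join on pid mentioned for pg_stat_ssl might look like the following sketch. The usename, datname and client_addr columns are taken from pg_stat_activity and are assumptions here, since that view is described elsewhere.

```sql
-- Show who is connected over TCP and whether each connection is encrypted.
-- Rows with ssl = false sort first, so unencrypted sessions stand out.
SELECT a.pid,
       a.usename,
       a.datname,
       a.client_addr,
       s.ssl,
       s.version,
       s.cipher,
       s.bits
FROM pg_stat_activity AS a
JOIN pg_stat_ssl AS s USING (pid)
WHERE a.client_addr IS NOT NULL   -- skip Unix-socket and background processes
ORDER BY s.ssl, a.pid;
```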
pg_stat_gssapi
The pg_stat_gssapi view will contain one row per backend, showing information about GSSAPI usage on this connection. It can be joined to pg_stat_activity or pg_stat_replication on the pid column to get more details about the connection.
pg_stat_gssapi View
Column Type
Description
pid
integer
Process ID of a backend
gss_authenticated
boolean
True if GSSAPI authentication was used for this connection
principal
text
Principal used to authenticate this connection, or NULL if GSSAPI was not used to authenticate this connection. This field is truncated if the principal is longer than NAMEDATALEN (64 characters in a standard build).
encrypted
boolean
True if GSSAPI encryption is in use on this connection
pg_stat_archiver
The pg_stat_archiver view will always have a single row, containing data about the archiver process of the cluster.
pg_stat_archiver View
Column Type
Description
archived_count
bigint
Number of WAL files that have been successfully archived
last_archived_wal
text
Name of the WAL file most recently successfully archived
last_archived_time
timestamp with time zone
Time of the most recent successful archive operation
failed_count
bigint
Number of failed attempts for archiving WAL files
last_failed_wal
text
Name of the WAL file of the most recent failed archival operation
last_failed_time
timestamp with time zone
Time of the most recent failed archival operation
stats_reset
timestamp with time zone
Time at which these statistics were last reset
Normally, WAL files are archived in order, oldest to newest, but that is not guaranteed, and does not hold under special circumstances like when promoting a standby or after crash recovery. Therefore it is not safe to assume that all files older than last_archived_wal have also been successfully archived.
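A simple health check can be built from these columns. In the sketch below, the failing_now column is a heuristic introduced for illustration, not something the view provides: it is true when the most recent failure is newer than the most recent success.

```sql
-- Archiver summary with a rough "is archiving currently failing?" flag.
SELECT archived_count,
       failed_count,
       last_archived_wal,
       last_archived_time,
       last_failed_wal,
       last_failed_time,
       (last_failed_time IS NOT NULL
        AND (last_archived_time IS NULL
             OR last_failed_time > last_archived_time)) AS failing_now
FROM pg_stat_archiver;
```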
pg_stat_bgwriter
The pg_stat_bgwriter view will always have a single row, containing global data for the cluster.
pg_stat_bgwriter View
Column Type
Description
checkpoints_timed
bigint
Number of scheduled checkpoints that have been performed
checkpoints_req
bigint
Number of requested checkpoints that have been performed
checkpoint_write_time
double precision
Total amount of time that has been spent in the portion of checkpoint processing where files are written to disk, in milliseconds
checkpoint_sync_time
double precision
Total amount of time that has been spent in the portion of checkpoint processing where files are synchronized to disk, in milliseconds
buffers_checkpoint
bigint
Number of buffers written during checkpoints
buffers_clean
bigint
Number of buffers written by the background writer
maxwritten_clean
bigint
Number of times the background writer stopped a cleaning scan because it had written too many buffers
buffers_backend
bigint
Number of buffers written directly by a backend
buffers_backend_fsync
bigint
Number of times a backend had to execute its own fsync call (normally the background writer handles those even when the backend does its own write)
buffers_alloc
bigint
Number of buffers allocated
stats_reset
timestamp with time zone
Time at which these statistics were last reset
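One way to read these counters is to compare requested checkpoints against scheduled ones. The sketch below uses only the columns listed above; the pct_requested name and the interpretation in the comments are illustrative, not part of the view.

```sql
-- Share of checkpoints that were requested rather than scheduled, and where
-- buffer writes came from. A high pct_requested may indicate that checkpoints
-- are being triggered by WAL volume rather than by the timeout.
SELECT checkpoints_timed,
       checkpoints_req,
       round(100.0 * checkpoints_req
             / nullif(checkpoints_timed + checkpoints_req, 0), 1) AS pct_requested,
       buffers_checkpoint,
       buffers_clean,
       buffers_backend,
       maxwritten_clean
FROM pg_stat_bgwriter;
```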
pg_stat_wal
The pg_stat_wal view will always have a single row, containing data about WAL activity of the cluster.
pg_stat_wal View
Column Type
Description
wal_records
bigint
Total number of WAL records generated
wal_fpi
bigint
Total number of WAL full page images generated
wal_bytes
numeric
Total amount of WAL generated in bytes
wal_buffers_full
bigint
Number of times WAL data was written to disk because WAL buffers became full
wal_write
bigint
wal_sync
bigint
wal_write_time
double precision
wal_sync_time
double precision
Total amount of time spent syncing WAL files to disk via issue_xlog_fsync request, in milliseconds (if track_wal_io_timing is enabled, fsync is on, and wal_sync_method is either fdatasync, fsync or fsync_writethrough, otherwise zero).
stats_reset
timestamp with time zone
Time at which these statistics were last reset
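The counters can be combined into derived figures. This sketch computes the average WAL record size since the last reset; avg_bytes_per_record is an illustrative name, and pg_size_pretty is assumed to be available.

```sql
-- WAL volume since stats_reset, with the average size of a WAL record.
SELECT wal_records,
       wal_fpi,
       pg_size_pretty(wal_bytes) AS wal_generated,
       round(wal_bytes / nullif(wal_records, 0), 1) AS avg_bytes_per_record,
       wal_buffers_full,
       stats_reset
FROM pg_stat_wal;
```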
pg_stat_database
The pg_stat_database view will contain one row for each database in the cluster, plus one for shared objects, showing database-wide statistics.
pg_stat_database View
Column Type
Description
datid
oid
OID of this database, or 0 for objects belonging to a shared relation
datname
name
Name of this database, or NULL for shared objects.
numbackends
integer
Number of backends currently connected to this database, or NULL for shared objects. This is the only column in this view that returns a value reflecting current state; all other columns return the accumulated values since the last reset.
xact_commit
bigint
Number of transactions in this database that have been committed
xact_rollback
bigint
Number of transactions in this database that have been rolled back
blks_read
bigint
Number of disk blocks read in this database
blks_hit
bigint
Number of times disk blocks were found already in the buffer cache, so that a read was not necessary (this only includes hits in the PostgreSQL buffer cache, not the operating system's file system cache)
tup_returned
bigint
Number of live rows fetched by sequential scans and index entries returned by index scans in this database
tup_fetched
bigint
Number of live rows fetched by index scans in this database
tup_inserted
bigint
Number of rows inserted by queries in this database
tup_updated
bigint
Number of rows updated by queries in this database
tup_deleted
bigint
Number of rows deleted by queries in this database
conflicts
bigint
temp_files
bigint
temp_bytes
bigint
deadlocks
bigint
Number of deadlocks detected in this database
checksum_failures
bigint
Number of data page checksum failures detected in this database (or on a shared object), or NULL if data checksums are not enabled.
checksum_last_failure
timestamp with time zone
Time at which the last data page checksum failure was detected in this database (or on a shared object), or NULL if data checksums are not enabled.
blk_read_time
double precision
blk_write_time
double precision
session_time
double precision
Time spent by database sessions in this database, in milliseconds (note that statistics are only updated when the state of a session changes, so if sessions have been idle for a long time, this idle time won't be included)
active_time
double precision
idle_in_transaction_time
double precision
sessions
bigint
Total number of sessions established to this database
sessions_abandoned
bigint
Number of database sessions to this database that were terminated because connection to the client was lost
sessions_fatal
bigint
Number of database sessions to this database that were terminated by fatal errors
sessions_killed
bigint
Number of database sessions to this database that were terminated by operator intervention
stats_reset
timestamp with time zone
Time at which these statistics were last reset
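A common use of this view is a per-database buffer cache hit ratio. The sketch below derives it from blks_hit and blks_read; the cache_hit_pct name is illustrative, and the figure reflects only the PostgreSQL buffer cache, as noted in the blks_hit description.

```sql
-- Buffer cache hit percentage per database since the last statistics reset.
SELECT datname,
       numbackends,
       xact_commit,
       xact_rollback,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS cache_hit_pct,
       deadlocks
FROM pg_stat_database
WHERE datname IS NOT NULL          -- skip the row for shared objects
ORDER BY blks_hit + blks_read DESC;
```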
pg_stat_database_conflicts
The pg_stat_database_conflicts view will contain one row per database, showing database-wide statistics about query cancels occurring due to conflicts with recovery on standby servers. This view will only contain information on standby servers, since conflicts do not occur on primary servers.
pg_stat_database_conflicts View
Column Type
Description
datid
oid
OID of a database
datname
name
Name of this database
confl_tablespace
bigint
Number of queries in this database that have been canceled due to dropped tablespaces
confl_lock
bigint
Number of queries in this database that have been canceled due to lock timeouts
confl_snapshot
bigint
Number of queries in this database that have been canceled due to old snapshots
confl_bufferpin
bigint
Number of queries in this database that have been canceled due to pinned buffers
confl_deadlock
bigint
Number of queries in this database that have been canceled due to deadlocks
pg_stat_all_tables
The pg_stat_all_tables view will contain one row for each table in the current database (including TOAST tables), showing statistics about accesses to that specific table. The pg_stat_user_tables and pg_stat_sys_tables views contain the same information, but filtered to only show user and system tables respectively.
pg_stat_all_tables View
Column Type
Description
relid
oid
OID of a table
schemaname
name
Name of the schema that this table is in
relname
name
Name of this table
seq_scan
bigint
Number of sequential scans initiated on this table
seq_tup_read
bigint
Number of live rows fetched by sequential scans
idx_scan
bigint
Number of index scans initiated on this table
idx_tup_fetch
bigint
Number of live rows fetched by index scans
n_tup_ins
bigint
Number of rows inserted
n_tup_upd
bigint
Number of rows updated (includes HOT updated rows)
n_tup_del
bigint
Number of rows deleted
n_tup_hot_upd
bigint
Number of rows HOT updated (i.e., with no separate index update required)
n_live_tup
bigint
Estimated number of live rows
n_dead_tup
bigint
Estimated number of dead rows
n_mod_since_analyze
bigint
Estimated number of rows modified since this table was last analyzed
n_ins_since_vacuum
bigint
Estimated number of rows inserted since this table was last vacuumed
last_vacuum
timestamp with time zone
Last time at which this table was manually vacuumed (not counting VACUUM FULL)
last_autovacuum
timestamp with time zone
Last time at which this table was vacuumed by the autovacuum daemon
last_analyze
timestamp with time zone
Last time at which this table was manually analyzed
last_autoanalyze
timestamp with time zone
Last time at which this table was analyzed by the autovacuum daemon
vacuum_count
bigint
Number of times this table has been manually vacuumed (not counting VACUUM FULL)
autovacuum_count
bigint
Number of times this table has been vacuumed by the autovacuum daemon
analyze_count
bigint
Number of times this table has been manually analyzed
autoanalyze_count
bigint
Number of times this table has been analyzed by the autovacuum daemon
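As an example of using these columns, the following sketch lists the user tables with the most estimated dead rows, together with their most recent vacuum and analyze times. The dead_pct name and the limit of 20 rows are arbitrary choices for illustration.

```sql
-- User tables with the most estimated dead rows in the current database.
SELECT schemaname,
       relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
       last_vacuum,
       last_autovacuum,
       last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;
```

Because n_live_tup and n_dead_tup are estimates, the result is best treated as a pointer to tables worth a closer look rather than an exact measurement.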
pg_stat_all_indexes
The pg_stat_all_indexes view will contain one row for each index in the current database, showing statistics about accesses to that specific index. The pg_stat_user_indexes and pg_stat_sys_indexes views contain the same information, but filtered to only show user and system indexes respectively.
pg_stat_all_indexes View
Column Type
Description
relid
oid
OID of the table for this index
indexrelid
oid
OID of this index
schemaname
name
Name of the schema this index is in
relname
name
Name of the table for this index
indexrelname
name
Name of this index
idx_scan
bigint
Number of index scans initiated on this index
idx_tup_read
bigint
Number of index entries returned by scans on this index
idx_tup_fetch
bigint
Number of live table rows fetched by simple index scans using this index
Indexes can be used by simple index scans, “bitmap” index scans, and the optimizer. In a bitmap scan the output of several indexes can be combined via AND or OR rules, so it is difficult to associate individual heap row fetches with specific indexes when a bitmap scan is used. Therefore, a bitmap scan increments the pg_stat_all_indexes.idx_tup_read count(s) for the index(es) it uses, and it increments the pg_stat_all_tables.idx_tup_fetch count for the table, but it does not affect pg_stat_all_indexes.idx_tup_fetch. The optimizer also accesses indexes to check for supplied constants whose values are outside the recorded range of the optimizer statistics because the optimizer statistics might be stale.
The idx_tup_read and idx_tup_fetch counts can be different even without any use of bitmap scans, because idx_tup_read counts index entries retrieved from the index while idx_tup_fetch counts live rows fetched from the table. The latter will be less if any dead or not-yet-committed rows are fetched using the index, or if any heap fetches are avoided by means of an index-only scan.
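The idx_scan counter is often used to look for indexes that have not been scanned since the last statistics reset. The sketch below assumes the pg_relation_size and pg_size_pretty functions, which are not described in this section.

```sql
-- User indexes with no recorded scans, largest first.
-- An index listed here may still be needed, for example to enforce a
-- unique constraint, so treat the result as a list of candidates only.
SELECT schemaname,
       relname,
       indexrelname,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```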
pg_statio_all_tables
The pg_statio_all_tables view will contain one row for each table in the current database (including TOAST tables), showing statistics about I/O on that specific table. The pg_statio_user_tables and pg_statio_sys_tables views contain the same information, but filtered to only show user and system tables respectively.
pg_statio_all_tables View
Column Type
Description
relid
oid
OID of a table
schemaname
name
Name of the schema that this table is in
relname
name
Name of this table
heap_blks_read
bigint
Number of disk blocks read from this table
heap_blks_hit
bigint
Number of buffer hits in this table
idx_blks_read
bigint
Number of disk blocks read from all indexes on this table
idx_blks_hit
bigint
Number of buffer hits in all indexes on this table
toast_blks_read
bigint
Number of disk blocks read from this table's TOAST table (if any)
toast_blks_hit
bigint
Number of buffer hits in this table's TOAST table (if any)
tidx_blks_read
bigint
Number of disk blocks read from this table's TOAST table indexes (if any)
tidx_blks_hit
bigint
Number of buffer hits in this table's TOAST table indexes (if any)
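These counters allow a hit ratio to be computed per table rather than per database. A minimal sketch, with heap_hit_pct and idx_hit_pct as illustrative names:

```sql
-- Tables that cause the most heap block reads, with buffer hit percentages
-- for the heap and for all indexes on the table. The percentages are NULL
-- when there has been no access of that kind.
SELECT schemaname,
       relname,
       heap_blks_read,
       heap_blks_hit,
       round(100.0 * heap_blks_hit
             / nullif(heap_blks_hit + heap_blks_read, 0), 2) AS heap_hit_pct,
       round(100.0 * idx_blks_hit
             / nullif(idx_blks_hit + idx_blks_read, 0), 2)   AS idx_hit_pct
FROM pg_statio_user_tables
ORDER BY heap_blks_read DESC
LIMIT 20;
```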
pg_statio_all_indexes
The pg_statio_all_indexes view will contain one row for each index in the current database, showing statistics about I/O on that specific index. The pg_statio_user_indexes and pg_statio_sys_indexes views contain the same information, but filtered to only show user and system indexes respectively.
pg_statio_all_indexes View
Column Type
Description
relid
oid
OID of the table for this index
indexrelid
oid
OID of this index
schemaname
name
Name of the schema this index is in
relname
name
Name of the table for this index
indexrelname
name
Name of this index
idx_blks_read
bigint
Number of disk blocks read from this index
idx_blks_hit
bigint
Number of buffer hits in this index
pg_statio_all_sequences
The pg_statio_all_sequences view will contain one row for each sequence in the current database, showing statistics about I/O on that specific sequence.
pg_statio_all_sequences View
Column Type
Description
relid
oid
OID of a sequence
schemaname
name
Name of the schema this sequence is in
relname
name
Name of this sequence
blks_read
bigint
Number of disk blocks read from this sequence
blks_hit
bigint
Number of buffer hits in this sequence
pg_stat_user_functions
The pg_stat_user_functions view will contain one row for each tracked function, showing statistics about executions of that function. The track_functions parameter controls exactly which functions are tracked.
pg_stat_user_functions View
Column Type
Description
funcid
oid
OID of a function
schemaname
name
Name of the schema this function is in
funcname
name
Name of this function
calls
bigint
Number of times this function has been called
total_time
double precision
Total time spent in this function and all other functions called by it, in milliseconds
self_time
double precision
Total time spent in this function itself, not including other functions called by it, in milliseconds
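For example, the functions that consume the most time in their own code can be listed as follows. The view is empty unless track_functions enables tracking; self_ms_per_call is an illustrative name.

```sql
-- Top ten tracked functions by time spent in the function itself.
SELECT schemaname,
       funcname,
       calls,
       total_time,
       self_time,
       round((self_time / nullif(calls, 0))::numeric, 3) AS self_ms_per_call
FROM pg_stat_user_functions
ORDER BY self_time DESC
LIMIT 10;
```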
pg_stat_slru
PostgreSQL accesses certain on-disk information via SLRU (simple least-recently-used) caches. The pg_stat_slru view will contain one row for each tracked SLRU cache, showing statistics about access to cached pages.
pg_stat_slru View
Column Type
Description
name
text
Name of the SLRU
blks_zeroed
bigint
Number of blocks zeroed during initializations
blks_hit
bigint
Number of times disk blocks were found already in the SLRU, so that a read was not necessary (this only includes hits in the SLRU, not the operating system's file system cache)
blks_read
bigint
Number of disk blocks read for this SLRU
blks_written
bigint
Number of disk blocks written for this SLRU
blks_exists
bigint
Number of blocks checked for existence for this SLRU
flushes
bigint
Number of flushes of dirty data for this SLRU
truncates
bigint
Number of truncates for this SLRU
stats_reset
timestamp with time zone
Time at which these statistics were last reset
Other ways of looking at the statistics can be set up by writing queries that use the same underlying statistics access functions used by the standard views shown above. For details such as the functions' names, consult the definitions of the standard views. (For example, in psql you could issue \d+ pg_stat_activity.) The access functions for per-database statistics take a database OID as an argument to identify which database to report on. The per-table and per-index functions take a table or index OID. The functions for per-function statistics take a function OID. Note that only tables, indexes, and functions in the current database can be seen with these functions.
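Outside psql, a view definition can be retrieved with pg_get_viewdef, and an access function can then be called directly. The sketch below assumes pg_get_viewdef is available and that pg_stat_get_db_xact_commit is among the functions named in the pg_stat_database definition; check the first query's output on your own server before relying on the second.

```sql
-- Print the definition of a standard statistics view to find the
-- underlying access functions it calls.
SELECT pg_get_viewdef('pg_stat_database'::regclass, true);

-- Call one per-database access function directly, passing each database OID.
SELECT datname,
       pg_stat_get_db_xact_commit(oid) AS xact_commit
FROM pg_database;
```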
Additional functions related to the cumulative statistics system are listed in Table 28.34.
Function
Description
pg_backend_pid () → integer
Returns the process ID of the server process attached to the current session.
pg_stat_get_activity ( integer ) → setof record
Returns a record of information about the backend with the specified process ID, or one record for each active backend in the system if NULL is specified. The fields returned are a subset of those in the pg_stat_activity view.
pg_stat_get_snapshot_timestamp () → timestamp with time zone
Returns the timestamp of the current statistics snapshot, or NULL if no statistics snapshot has been taken. A snapshot is taken the first time cumulative statistics are accessed in a transaction if stats_fetch_consistency is set to snapshot.
pg_stat_clear_snapshot () → void
Discards the current statistics snapshot or cached information.
pg_stat_reset () → void
Resets all statistics counters for the current database to zero.
This function is restricted to superusers by default, but other users can be granted EXECUTE to run the function.
pg_stat_reset_shared ( text ) → void
Resets some cluster-wide statistics counters to zero, depending on the argument. The argument can be bgwriter to reset all the counters shown in the pg_stat_bgwriter view, archiver to reset all the counters shown in the pg_stat_archiver view, wal to reset all the counters shown in the pg_stat_wal view or recovery_prefetch to reset all the counters shown in the pg_stat_recovery_prefetch view.
This function is restricted to superusers by default, but other users can be granted EXECUTE to run the function.
pg_stat_reset_single_table_counters ( oid ) → void
Resets statistics for a single table or index in the current database or shared across all databases in the cluster to zero.
This function is restricted to superusers by default, but other users can be granted EXECUTE to run the function.
pg_stat_reset_single_function_counters ( oid ) → void
Resets statistics for a single function in the current database to zero.
This function is restricted to superusers by default, but other users can be granted EXECUTE to run the function.
pg_stat_reset_slru ( text ) → void
Resets statistics to zero for a single SLRU cache, or for all SLRUs in the cluster. If the argument is NULL, all counters shown in the pg_stat_slru view for all SLRU caches are reset. The argument can be one of CommitTs, MultiXactMember, MultiXactOffset, Notify, Serial, Subtrans, or Xact to reset the counters for only that entry. If the argument is other (or indeed, any unrecognized name), then the counters for all other SLRU caches, such as extension-defined caches, are reset.
This function is restricted to superusers by default, but other users can be granted EXECUTE to run the function.
pg_stat_reset_replication_slot ( text ) → void
Resets statistics of the replication slot defined by the argument. If the argument is NULL, resets statistics for all the replication slots.
This function is restricted to superusers by default, but other users can be granted EXECUTE to run the function.
pg_stat_reset_subscription_stats ( oid ) → void
Resets statistics for a single subscription shown in the pg_stat_subscription_stats view to zero. If the argument is NULL, resets statistics for all subscriptions.
This function is restricted to superusers by default, but other users can be granted EXECUTE to run the function.
Using pg_stat_reset() also resets counters that autovacuum uses to determine when to trigger a vacuum or an analyze. Resetting these counters can cause autovacuum to skip necessary work, which can lead to problems such as table bloat or outdated table statistics. A database-wide ANALYZE is recommended after the statistics have been reset.
pg_stat_get_activity, the underlying function of the pg_stat_activity view, returns a set of records containing all the available information about each backend process. Sometimes it may be more convenient to obtain just a subset of this information. In such cases, an older set of per-backend statistics access functions can be used; these are shown in Table 28.35. These access functions use a backend ID number, which ranges from one to the number of currently active backends. The function pg_stat_get_backend_idset provides a convenient way to generate one row for each active backend for invoking these functions. For example, the PIDs and current queries of all backends can be shown by calling pg_stat_get_backend_pid and pg_stat_get_backend_activity on each backend ID returned by pg_stat_get_backend_idset.
Function
Description
pg_stat_get_backend_idset () → setof integer
Returns the set of currently active backend ID numbers (from 1 to the number of active backends).
pg_stat_get_backend_activity ( integer ) → text
Returns the text of this backend's most recent query.
pg_stat_get_backend_activity_start ( integer ) → timestamp with time zone
Returns the time when the backend's most recent query was started.
pg_stat_get_backend_client_addr ( integer ) → inet
Returns the IP address of the client connected to this backend.
pg_stat_get_backend_client_port ( integer ) → integer
Returns the TCP port number that the client is using for communication.
pg_stat_get_backend_dbid ( integer ) → oid
Returns the OID of the database this backend is connected to.
pg_stat_get_backend_pid ( integer ) → integer
Returns the process ID of this backend.
pg_stat_get_backend_start ( integer ) → timestamp with time zone
Returns the time when this process was started.
pg_stat_get_backend_userid ( integer ) → oid
Returns the OID of the user logged into this backend.
pg_stat_get_backend_wait_event_type ( integer ) → text
pg_stat_get_backend_wait_event ( integer ) → text
pg_stat_get_backend_xact_start ( integer ) → timestamp with time zone
Returns the time when the backend's current transaction was started.
PostgreSQL provides facilities to support dynamic tracing of the database server. This allows an external utility to be called at specific points in the code and thereby trace execution.
A number of probes or trace points are already inserted into the source code. These probes are intended to be used by database developers and administrators. By default the probes are not compiled into PostgreSQL; the user needs to explicitly tell the configure script to make the probes available.
Currently, the DTrace utility is supported, which, at the time of this writing, is available on Solaris, macOS, FreeBSD, NetBSD, and Oracle Linux. The SystemTap project for Linux provides a DTrace equivalent and can also be used. Supporting other dynamic tracing utilities is theoretically possible by changing the definitions for the macros in src/include/utils/probes.h.
By default, probes are not available, so you will need to explicitly tell the configure script to make the probes available in PostgreSQL. To include DTrace support, specify --enable-dtrace to configure. See Section 17.4 for further information.
A number of standard probes are provided in the source code, as shown in Table 28.47; Table 28.48 shows the types used in the probes. More probes can certainly be added to enhance PostgreSQL's observability.
transaction-start
(LocalTransactionId)
Probe that fires at the start of a new transaction. arg0 is the transaction ID.
transaction-commit
(LocalTransactionId)
Probe that fires when a transaction completes successfully. arg0 is the transaction ID.
transaction-abort
(LocalTransactionId)
Probe that fires when a transaction completes unsuccessfully. arg0 is the transaction ID.
query-start
(const char *)
Probe that fires when the processing of a query is started. arg0 is the query string.
query-done
(const char *)
Probe that fires when the processing of a query is complete. arg0 is the query string.
query-parse-start
(const char *)
Probe that fires when the parsing of a query is started. arg0 is the query string.
query-parse-done
(const char *)
Probe that fires when the parsing of a query is complete. arg0 is the query string.
query-rewrite-start
(const char *)
Probe that fires when the rewriting of a query is started. arg0 is the query string.
query-rewrite-done
(const char *)
Probe that fires when the rewriting of a query is complete. arg0 is the query string.
query-plan-start
()
Probe that fires when the planning of a query is started.
query-plan-done
()
Probe that fires when the planning of a query is complete.
query-execute-start
()
Probe that fires when the execution of a query is started.
query-execute-done
()
Probe that fires when the execution of a query is complete.
statement-status
(const char *)
Probe that fires anytime the server process updates its pg_stat_activity.status. arg0 is the new status string.
checkpoint-start
(int)
Probe that fires when a checkpoint is started. arg0 holds the bitwise flags used to distinguish different checkpoint types, such as shutdown, immediate or force.
checkpoint-done
(int, int, int, int, int)
Probe that fires when a checkpoint is complete. (The probes listed next fire in sequence during checkpoint processing.) arg0 is the number of buffers written. arg1 is the total number of buffers. arg2, arg3 and arg4 contain the number of WAL files added, removed and recycled respectively.
clog-checkpoint-start
(bool)
Probe that fires when the CLOG portion of a checkpoint is started. arg0 is true for normal checkpoint, false for shutdown checkpoint.
clog-checkpoint-done
(bool)
Probe that fires when the CLOG portion of a checkpoint is complete. arg0 has the same meaning as for clog-checkpoint-start.
subtrans-checkpoint-start
(bool)
Probe that fires when the SUBTRANS portion of a checkpoint is started. arg0 is true for normal checkpoint, false for shutdown checkpoint.
subtrans-checkpoint-done
(bool)
Probe that fires when the SUBTRANS portion of a checkpoint is complete. arg0 has the same meaning as for subtrans-checkpoint-start.
multixact-checkpoint-start
(bool)
Probe that fires when the MultiXact portion of a checkpoint is started. arg0 is true for normal checkpoint, false for shutdown checkpoint.
multixact-checkpoint-done
(bool)
Probe that fires when the MultiXact portion of a checkpoint is complete. arg0 has the same meaning as for multixact-checkpoint-start.
buffer-checkpoint-start
(int)
Probe that fires when the buffer-writing portion of a checkpoint is started. arg0 holds the bitwise flags used to distinguish different checkpoint types, such as shutdown, immediate or force.
buffer-sync-start
(int, int)
Probe that fires when we begin to write dirty buffers during checkpoint (after identifying which buffers must be written). arg0 is the total number of buffers. arg1 is the number that are currently dirty and need to be written.
buffer-sync-written
(int)
Probe that fires after each buffer is written during checkpoint. arg0 is the ID number of the buffer.
buffer-sync-done
(int, int, int)
Probe that fires when all dirty buffers have been written. arg0 is the total number of buffers. arg1 is the number of buffers actually written by the checkpoint process. arg2 is the number that were expected to be written (arg1 of buffer-sync-start); any difference reflects other processes flushing buffers during the checkpoint.
buffer-checkpoint-sync-start
()
Probe that fires after dirty buffers have been written to the kernel, and before starting to issue fsync requests.
buffer-checkpoint-done
()
Probe that fires when syncing of buffers to disk is complete.
twophase-checkpoint-start
()
Probe that fires when the two-phase portion of a checkpoint is started.
twophase-checkpoint-done
()
Probe that fires when the two-phase portion of a checkpoint is complete.
buffer-read-start
(ForkNumber, BlockNumber, Oid, Oid, Oid, int, bool)
Probe that fires when a buffer read is started. arg0 and arg1 contain the fork and block numbers of the page (but arg1 will be -1 if this is a relation extension request). arg2, arg3, and arg4 contain the tablespace, database, and relation OIDs identifying the relation. arg5 is the ID of the backend which created the temporary relation for a local buffer, or InvalidBackendId (-1) for a shared buffer. arg6 is true for a relation extension request, false for normal read.
buffer-read-done
(ForkNumber, BlockNumber, Oid, Oid, Oid, int, bool, bool)
Probe that fires when a buffer read is complete. arg0 and arg1 contain the fork and block numbers of the page (if this is a relation extension request, arg1 now contains the block number of the newly added block). arg2, arg3, and arg4 contain the tablespace, database, and relation OIDs identifying the relation. arg5 is the ID of the backend which created the temporary relation for a local buffer, or InvalidBackendId (-1) for a shared buffer. arg6 is true for a relation extension request, false for normal read. arg7 is true if the buffer was found in the pool, false if not.
buffer-flush-start
(ForkNumber, BlockNumber, Oid, Oid, Oid)
Probe that fires before issuing any write request for a shared buffer. arg0 and arg1 contain the fork and block numbers of the page. arg2, arg3, and arg4 contain the tablespace, database, and relation OIDs identifying the relation.
buffer-flush-done
(ForkNumber, BlockNumber, Oid, Oid, Oid)
Probe that fires when a write request is complete. (Note that this just reflects the time to pass the data to the kernel; it's typically not actually been written to disk yet.) The arguments are the same as for buffer-flush-start.
buffer-write-dirty-start
(ForkNumber, BlockNumber, Oid, Oid, Oid)
buffer-write-dirty-done
(ForkNumber, BlockNumber, Oid, Oid, Oid)
Probe that fires when a dirty-buffer write is complete. The arguments are the same as for buffer-write-dirty-start.
wal-buffer-write-dirty-start
()
wal-buffer-write-dirty-done
()
Probe that fires when a dirty WAL buffer write is complete.
wal-insert
(unsigned char, unsigned char)
Probe that fires when a WAL record is inserted. arg0 is the resource manager (rmid) for the record. arg1 contains the info flags.
wal-switch
()
Probe that fires when a WAL segment switch is requested.
smgr-md-read-start
(ForkNumber, BlockNumber, Oid, Oid, Oid, int)
Probe that fires when beginning to read a block from a relation. arg0 and arg1 contain the fork and block numbers of the page. arg2, arg3, and arg4 contain the tablespace, database, and relation OIDs identifying the relation. arg5 is the ID of the backend which created the temporary relation for a local buffer, or InvalidBackendId (-1) for a shared buffer.
smgr-md-read-done
(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int, int)
Probe that fires when a block read is complete. arg0 and arg1 contain the fork and block numbers of the page. arg2, arg3, and arg4 contain the tablespace, database, and relation OIDs identifying the relation. arg5 is the ID of the backend which created the temporary relation for a local buffer, or InvalidBackendId (-1) for a shared buffer. arg6 is the number of bytes actually read, while arg7 is the number requested (if these are different it indicates trouble).
smgr-md-write-start
(ForkNumber, BlockNumber, Oid, Oid, Oid, int)
Probe that fires when beginning to write a block to a relation. arg0 and arg1 contain the fork and block numbers of the page. arg2, arg3, and arg4 contain the tablespace, database, and relation OIDs identifying the relation. arg5 is the ID of the backend which created the temporary relation for a local buffer, or InvalidBackendId (-1) for a shared buffer.
smgr-md-write-done
(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int, int)
Probe that fires when a block write is complete. arg0 and arg1 contain the fork and block numbers of the page. arg2, arg3, and arg4 contain the tablespace, database, and relation OIDs identifying the relation. arg5 is the ID of the backend which created the temporary relation for a local buffer, or InvalidBackendId (-1) for a shared buffer. arg6 is the number of bytes actually written, while arg7 is the number requested (if these are different it indicates trouble).
sort-start
(int, bool, int, int, bool, int)
Probe that fires when a sort operation is started. arg0 indicates heap, index or datum sort. arg1 is true for unique-value enforcement. arg2 is the number of key columns. arg3 is the number of kilobytes of work memory allowed. arg4 is true if random access to the sort result is required. arg5 indicates serial when 0, parallel worker when 1, or parallel leader when 2.
sort-done
(bool, long)
Probe that fires when a sort is complete. arg0 is true for external sort, false for internal sort. arg1 is the number of disk blocks used for an external sort, or kilobytes of memory used for an internal sort.
lwlock-acquire
(char *, LWLockMode)
Probe that fires when an LWLock has been acquired. arg0 is the LWLock's tranche. arg1 is the requested lock mode, either exclusive or shared.
lwlock-release
(char *)
Probe that fires when an LWLock has been released (but note that any released waiters have not yet been awakened). arg0 is the LWLock's tranche.
lwlock-wait-start
(char *, LWLockMode)
Probe that fires when an LWLock was not immediately available and a server process has begun to wait for the lock to become available. arg0 is the LWLock's tranche. arg1 is the requested lock mode, either exclusive or shared.
lwlock-wait-done
(char *, LWLockMode)
Probe that fires when a server process has been released from its wait for an LWLock (it does not actually have the lock yet). arg0 is the LWLock's tranche. arg1 is the requested lock mode, either exclusive or shared.
lwlock-condacquire
(char *, LWLockMode)
Probe that fires when an LWLock was successfully acquired when the caller specified no waiting. arg0 is the LWLock's tranche. arg1 is the requested lock mode, either exclusive or shared.
lwlock-condacquire-fail
(char *, LWLockMode)
Probe that fires when an LWLock was not successfully acquired when the caller specified no waiting. arg0 is the LWLock's tranche. arg1 is the requested lock mode, either exclusive or shared.
lock-wait-start
(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, LOCKMODE)
Probe that fires when a request for a heavyweight lock (lmgr lock) has begun to wait because the lock is not available. arg0 through arg3 are the tag fields identifying the object being locked. arg4 indicates the type of object being locked. arg5 indicates the lock type being requested.
lock-wait-done
(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, LOCKMODE)
Probe that fires when a request for a heavyweight lock (lmgr lock) has finished waiting (i.e., has acquired the lock). The arguments are the same as for lock-wait-start.
deadlock-found
()
Probe that fires when a deadlock is found by the deadlock detector.
LocalTransactionId: unsigned int
LWLockMode: int
LOCKMODE: int
BlockNumber: unsigned int
Oid: unsigned int
ForkNumber: int
bool: unsigned char
The example below shows a DTrace script for analyzing transaction counts in the system, as an alternative to snapshotting pg_stat_database before and after a performance test:
When executed, the example D script gives output such as:
SystemTap uses a different notation for trace scripts than DTrace does, even though the underlying trace points are compatible. One point worth noting is that at this writing, SystemTap scripts must reference probe names using double underscores in place of hyphens. This is expected to be fixed in future SystemTap releases.
You should remember that DTrace scripts need to be carefully written and debugged, otherwise the trace information collected might be meaningless. In most cases where problems are found it is the instrumentation that is at fault, not the underlying system. When discussing information found using dynamic tracing, be sure to enclose the script used to allow that too to be checked and discussed.
New probes can be defined within the code wherever the developer desires, though this will require a recompilation. Below are the steps for inserting new probes:
Decide on probe names and data to be made available through the probes
Add the probe definitions to src/backend/utils/probes.d
Include pg_trace.h if it is not already present in the module(s) containing the probe points, and insert TRACE_POSTGRESQL probe macros at the desired locations in the source code
Recompile and verify that the new probes are available
Example: Here is an example of how you would add a probe to trace all new transactions by transaction ID.
Decide that the probe will be named transaction-start and requires a parameter of type LocalTransactionId
Add the probe definition to src/backend/utils/probes.d: probe transaction__start(LocalTransactionId);
Note the use of the double underline in the probe name. In a DTrace script using the probe, the double underline needs to be replaced with a hyphen, so transaction-start is the name to document for users.
At compile time, transaction__start is converted to a macro called TRACE_POSTGRESQL_TRANSACTION_START (notice the underscores are single here), which is available by including pg_trace.h. Add the macro call, passing the LocalTransactionId value as its argument, to the appropriate location in the source code.
After recompiling and running the new binary, check that your newly added probe is available by listing it with DTrace, for example with dtrace -ln transaction-start; the new probe should appear in the listing.
There are a few things to be careful about when adding trace macros to the C code:
You should take care that the data types specified for a probe's parameters match the data types of the variables used in the macro. Otherwise, you will get compilation errors.
On most platforms, if PostgreSQL is built with --enable-dtrace, the arguments to a trace macro will be evaluated whenever control passes through the macro, even if no tracing is being done. This is usually not worth worrying about if you are just reporting the values of a few local variables. But beware of putting expensive function calls into the arguments. If you need to do that, consider protecting the macro with a check to see if the trace is actually enabled. Each trace macro has a corresponding ENABLED macro for this purpose, for example TRACE_POSTGRESQL_TRANSACTION_START_ENABLED().
One row per server process, showing information related to the current activity of that process, such as state and current query. See pg_stat_activity for details.
One row per WAL sender process, showing statistics about replication to that sender's connected standby server. See pg_stat_replication for details.
Only one row, showing statistics about the WAL receiver from that receiver's connected server. See pg_stat_wal_receiver for details.
Only one row, showing statistics about blocks prefetched during recovery. See pg_stat_recovery_prefetch for details.
At least one row per subscription, showing information about the subscription workers. See pg_stat_subscription for details.
One row per connection (regular and replication), showing information about SSL used on this connection. See pg_stat_ssl for details.
One row per connection (regular and replication), showing information about GSSAPI authentication and encryption used on this connection. See pg_stat_gssapi for details.
One row for each backend (including autovacuum worker processes) running ANALYZE, showing current progress.
One row for each backend running CREATE INDEX or REINDEX, showing current progress.
One row for each backend (including autovacuum worker processes) running VACUUM, showing current progress.
One row for each backend running CLUSTER or VACUUM FULL, showing current progress.
One row for each WAL sender process streaming a base backup, showing current progress.
One row for each backend running COPY, showing current progress.
One row only, showing statistics about the WAL archiver process's activity. See pg_stat_archiver for details.
One row only, showing statistics about the background writer process's activity. See pg_stat_bgwriter for details.
One row only, showing statistics about WAL activity. See pg_stat_wal for details.
One row per database, showing database-wide statistics. See pg_stat_database for details.
One row per database, showing database-wide statistics about query cancels due to conflict with recovery on standby servers. See pg_stat_database_conflicts for details.
One row for each table in the current database, showing statistics about accesses to that specific table. See pg_stat_all_tables for details.
One row for each index in the current database, showing statistics about accesses to that specific index. See pg_stat_all_indexes for details.
One row for each table in the current database, showing statistics about I/O on that specific table. See pg_statio_all_tables for details.
One row for each index in the current database, showing statistics about I/O on that specific index. See pg_statio_all_indexes for details.
One row for each sequence in the current database, showing statistics about I/O on that specific sequence. See pg_statio_all_sequences for details.
One row for each tracked function, showing statistics about executions of that function. See pg_stat_user_functions for details.
One row per SLRU, showing statistics of operations. See pg_stat_slru for details.
One row per replication slot, showing statistics about the replication slot's usage. See pg_stat_replication_slots for details.
One row per subscription, showing statistics about errors. See pg_stat_subscription_stats for details.
Host name of the connected client, as reported by a reverse DNS lookup of client_addr. This field will only be non-null for IP connections, and only when log_hostname is enabled.
The type of event for which the backend is waiting, if any; otherwise NULL.
Wait event name if backend is currently waiting, otherwise NULL.
disabled: This state is reported if track_activities is disabled in this backend.
Identifier of this backend's most recent query. If state is active, this field shows the identifier of the currently executing query. In all other states, it shows the identifier of the last query that was executed. Query identifiers are not computed by default, so this field will be null unless the compute_query_id parameter is enabled or a third-party module that computes query identifiers is configured.
Text of this backend's most recent query. If state is active, this field shows the currently executing query. In all other states, it shows the last query that was executed. By default the query text is truncated at 1024 bytes; this value can be changed via the parameter track_activity_query_size.
The server process is idle. This event type indicates a process waiting for activity in its main processing loop. wait_event will identify the specific wait point.
The server process is waiting for exclusive access to a data buffer. Buffer pin waits can be protracted if another process holds an open cursor that last read data from the buffer in question.
The server process is waiting for activity on a socket connected to a user application. Thus, the server expects something to happen that is independent of its internal processes. wait_event will identify the specific wait point.
The server process is waiting for some condition defined by an extension module.
The server process is waiting for an I/O operation to complete. wait_event will identify the specific wait point.
The server process is waiting for some interaction with another server process. wait_event will identify the specific wait point.
The server process is waiting for a heavyweight lock. Heavyweight locks, also known as lock manager locks or simply locks, primarily protect SQL-visible objects such as tables. However, they are also used to ensure mutual exclusion for certain internal operations such as relation extension. wait_event will identify the type of lock awaited.
The server process is waiting for a lightweight lock. Most such locks protect a particular data structure in shared memory. wait_event will contain a name identifying the purpose of the lightweight lock. (Some locks have specific names; others are part of a group of locks each with a similar purpose.)
The server process is waiting for a timeout to expire. wait_event will identify the specific wait point.
Waiting for to complete.
Waiting for to complete.
Waiting for to complete.
Waiting for to complete.
Host name of the connected client, as reported by a reverse DNS lookup of client_addr. This field will only be non-null for IP connections, and only when log_hostname is enabled.
This standby's xmin horizon reported by hot_standby_feedback.
Number of times WAL buffers were written out to disk via a request to the internal WAL function XLogWrite.
Number of times WAL files were synced to disk via a request to the internal WAL function issue_xlog_fsync (if fsync is on and wal_sync_method is either fdatasync, fsync or fsync_writethrough, otherwise zero).
Total amount of time spent writing WAL buffers to disk via XLogWrite request, in milliseconds (if track_wal_io_timing is enabled, otherwise zero). This includes the sync time when wal_sync_method is either open_datasync or open_sync.
Number of queries canceled due to conflicts with recovery in this database. (Conflicts occur only on standby servers; see pg_stat_database_conflicts for details.)
Number of temporary files created by queries in this database. All temporary files are counted, regardless of why the temporary file was created (e.g., sorting or hashing), and regardless of the log_temp_files setting.
Total amount of data written to temporary files by queries in this database. All temporary files are counted, regardless of why the temporary file was created, and regardless of the log_temp_files setting.
Time spent reading data file blocks by backends in this database, in milliseconds (if track_io_timing is enabled, otherwise zero)
Time spent writing data file blocks by backends in this database, in milliseconds (if track_io_timing is enabled, otherwise zero)
Time spent executing SQL statements in this database, in milliseconds (this corresponds to the states active and fastpath function call in pg_stat_activity)
Time spent idling while in a transaction in this database, in milliseconds (this corresponds to the states idle in transaction and idle in transaction (aborted) in pg_stat_activity)
Number of rows updated (includes HOT updated rows)
pg_stat_get_backend_wait_event_type: Returns the wait event type name if this backend is currently waiting, otherwise NULL.
pg_stat_get_backend_wait_event: Returns the wait event name if this backend is currently waiting, otherwise NULL.
buffer-write-dirty-start: Probe that fires when a server process begins to write a dirty buffer. (If this happens often, it implies that shared_buffers is too small or the background writer control parameters need adjustment.) arg0 and arg1 contain the fork and block numbers of the page. arg2, arg3, and arg4 contain the tablespace, database, and relation OIDs identifying the relation.
wal-buffer-write-dirty-start: Probe that fires when a server process begins to write a dirty WAL buffer because no more WAL buffer space is available. (If this happens often, it implies that wal_buffers is too small.)