This chapter describes the installation of PostgreSQL using the source code distribution. (If you are installing a pre-packaged distribution, such as an RPM or Debian package, ignore this chapter and read the packager's instructions instead.)
The first step of the installation procedure is to configure the options you want for the build. This is done by running the configure script. For a default installation simply enter:
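    ./configure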
This script will run a number of tests to determine values for various system dependent variables and detect any quirks of your operating system, and finally will create several files in the build tree to record what it found. You can also run configure in a directory outside the source tree, if you want to keep the build directory separate. This procedure is also called a VPATH build. Here's how:
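    mkdir build_dir
    cd build_dir
    /path/to/source/tree/configure [options go here]
    make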
The default configuration will build the server and utilities, as well as all client applications and interfaces that require only a C compiler. All files will be installed under /usr/local/pgsql by default.
You can customize the build and installation process by supplying one or more of the following command line options to configure:
--prefix=PREFIX
Install all files under the directory PREFIX instead of /usr/local/pgsql. The actual files will be installed into various subdirectories; no files will ever be installed directly into the PREFIX directory.
If you have special needs, you can also customize the individual subdirectories with the following options. However, if you leave these with their defaults, the installation will be relocatable, meaning you can move the directory after installation. (The man and doc locations are not affected by this.)
For relocatable installs, you might want to use configure's --disable-rpath option. Also, you will need to tell the operating system how to find the shared libraries.
--exec-prefix=EXEC-PREFIX
You can install architecture-dependent files under a different prefix, EXEC-PREFIX, than what PREFIX was set to. This can be useful to share architecture-independent files between hosts. If you omit it, then EXEC-PREFIX is set equal to PREFIX and both architecture-dependent and independent files will be installed under the same tree, which is probably what you want.
--bindir=DIRECTORY
Specifies the directory for executable programs. The default is EXEC-PREFIX/bin, which normally means /usr/local/pgsql/bin.
--sysconfdir=DIRECTORY
Sets the directory for various configuration files, PREFIX/etc by default.
--libdir=DIRECTORY
Sets the location to install libraries and dynamically loadable modules. The default is EXEC-PREFIX/lib.
--includedir=DIRECTORY
Sets the directory for installing C and C++ header files. The default is PREFIX/include.
--datarootdir=DIRECTORY
Sets the root directory for various types of read-only data files. This only sets the default for some of the following options. The default is PREFIX/share.
--datadir=DIRECTORY
Sets the directory for read-only data files used by the installed programs. The default is DATAROOTDIR. Note that this has nothing to do with where your database files will be placed.
--localedir=DIRECTORY
Sets the directory for installing locale data, in particular message translation catalog files. The default is DATAROOTDIR/locale.
--mandir=DIRECTORY
The man pages that come with PostgreSQL will be installed under this directory, in their respective manx subdirectories. The default is DATAROOTDIR/man.
--docdir=DIRECTORY
Sets the root directory for installing documentation files, except “man” pages. This only sets the default for the following options. The default value for this option is DATAROOTDIR/doc/postgresql.
--htmldir=DIRECTORY
The HTML-formatted documentation for PostgreSQL will be installed under this directory. The default is DATAROOTDIR.
Note: Care has been taken to make it possible to install PostgreSQL into shared installation locations (such as /usr/local/include) without interfering with the namespace of the rest of the system. First, the string “/postgresql” is automatically appended to datadir, sysconfdir, and docdir, unless the fully expanded directory name already contains the string “postgres” or “pgsql”. For example, if you choose /usr/local as prefix, the documentation will be installed in /usr/local/doc/postgresql, but if the prefix is /opt/postgres, then it will be in /opt/postgres/doc. The public C header files of the client interfaces are installed into includedir and are namespace-clean. The internal header files and the server header files are installed into private directories under includedir. See the documentation of each interface for information about how to access its header files. Finally, a private subdirectory will also be created, if appropriate, under libdir for dynamically loadable modules.
--with-extra-version=STRING
Append STRING to the PostgreSQL version number. You can use this, for example, to mark binaries built from unreleased Git snapshots or containing custom patches with an extra version string, such as a git describe identifier or a distribution package release number.
--with-includes=DIRECTORIES
DIRECTORIES is a colon-separated list of directories that will be added to the list the compiler searches for header files. If you have optional packages (such as GNU Readline) installed in a non-standard location, you have to use this option and probably also the corresponding --with-libraries option.
Example: --with-includes=/opt/gnu/include:/usr/sup/include
--with-libraries=DIRECTORIES
DIRECTORIES is a colon-separated list of directories to search for libraries. You will probably have to use this option (and the corresponding --with-includes option) if you have packages installed in non-standard locations.
Example: --with-libraries=/opt/gnu/lib:/usr/sup/lib
--enable-nls[=LANGUAGES]
Enables Native Language Support (NLS), that is, the ability to display a program's messages in a language other than English. LANGUAGES is an optional space-separated list of codes of the languages that you want supported, for example --enable-nls='de fr'. (The intersection between your list and the set of actually provided translations will be computed automatically.) If you do not specify a list, then all available translations are installed.
To use this option, you will need an implementation of the Gettext API; see the Gettext requirement described with the build requirements below.
--with-pgport=NUMBER
Set NUMBER as the default port number for server and clients. The default is 5432. The port can always be changed later on, but if you specify it here then both server and clients will have the same default compiled in, which can be very convenient. Usually the only good reason to select a non-default value is if you intend to run multiple PostgreSQL servers on the same machine.
--with-perl
Build the PL/Perl server-side language.
--with-python
Build the PL/Python server-side language.
--with-tcl
Build the PL/Tcl server-side language.
--with-tclconfig=DIRECTORY
Tcl installs the file tclConfig.sh, which contains configuration information needed to build modules interfacing to Tcl. This file is normally found automatically at a well-known location, but if you want to use a different version of Tcl you can specify the directory in which to look for it.
--with-gssapi
Build with support for GSSAPI authentication. On many systems, the GSSAPI system (usually a part of the Kerberos installation) is not installed in a location that is searched by default (e.g., /usr/include, /usr/lib), so you must use the options --with-includes and --with-libraries in addition to this option. configure will check for the required header files and libraries to make sure that your GSSAPI installation is sufficient before proceeding.
--with-krb-srvnam=NAME
The default name of the Kerberos service principal used by GSSAPI. postgres is the default. There's usually no reason to change this unless you have a Windows environment, in which case it must be set to upper case POSTGRES.
--with-llvm
Build with support for LLVM based JIT compilation (see Chapter 32). This requires the LLVM library to be installed. The minimum required version of LLVM is currently 3.9.
llvm-config will be used to find the required compilation options. llvm-config, and then llvm-config-$major-$minor for all supported versions, will be searched for on PATH. If that would not yield the correct binary, use LLVM_CONFIG to specify a path to the correct llvm-config. For example:
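    ./configure ... --with-llvm LLVM_CONFIG='/path/to/llvm/bin/llvm-config'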
LLVM support requires a compatible clang compiler (specified, if necessary, using the CLANG environment variable), and a working C++ compiler (specified, if necessary, using the CXX environment variable).
--with-icu
Build with support for the ICU library. This requires the ICU4C package to be installed. The minimum required version of ICU4C is currently 4.2.
By default, pkg-config will be used to find the required compilation options. This is supported for ICU4C version 4.6 and later. For older versions, or if pkg-config is not available, the variables ICU_CFLAGS and ICU_LIBS can be specified to configure, like in this example:
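    ./configure ... ICU_CFLAGS='-I/some/where/include' ICU_LIBS='-L/some/where/lib -licui18n -licuuc -licudata'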
(If ICU4C is in the default search path for the compiler, then you still need to specify a nonempty string in order to avoid use of pkg-config, for example, ICU_CFLAGS=' '.)
--with-openssl
Build with support for SSL (encrypted) connections. This requires the OpenSSL package to be installed. configure will check for the required header files and libraries to make sure that your OpenSSL installation is sufficient before proceeding.
--with-pam
Build with PAM (Pluggable Authentication Modules) support.
--with-bsd-auth
Build with BSD Authentication support. (The BSD Authentication framework is currently only available on OpenBSD.)
--with-ldap
Build with LDAP support for authentication and connection parameter lookup (see Section 34.17 and Section 20.10 for more information). On Unix, this requires the OpenLDAP package to be installed. On Windows, the default WinLDAP library is used. configure will check for the required header files and libraries to make sure that your OpenLDAP installation is sufficient before proceeding.
--with-systemd
Build with support for systemd service notifications. This improves integration if the server binary is started under systemd but has no impact otherwise; see Section 18.3 for more information. libsystemd and the associated header files need to be installed to be able to use this option.
--without-readline
Prevents use of the Readline library (and libedit as well). This option disables command-line editing and history in psql, so it is not recommended.
--with-libedit-preferred
Favors the use of the BSD-licensed libedit library rather than GPL-licensed Readline. This option is significant only if you have both libraries installed; the default in that case is to use Readline.
--with-bonjour
Build with Bonjour support. This requires Bonjour support in your operating system. Recommended on macOS.
--with-uuid=LIBRARY
Build the uuid-ossp module (which provides functions to generate UUIDs), using the specified UUID library. LIBRARY must be one of:
bsd
to use the UUID functions found in FreeBSD, NetBSD, and some other BSD-derived systems
e2fs
to use the UUID library created by the e2fsprogs project; this library is present in most Linux systems and in macOS, and can be obtained for other platforms as well
ossp
to use the OSSP UUID library
--with-ossp-uuid
Obsolete equivalent of --with-uuid=ossp.
--with-libxml
Build with libxml (enables SQL/XML support). Libxml version 2.6.23 or later is required for this feature.
Libxml installs a program xml2-config that can be used to detect the required compiler and linker options. PostgreSQL will use it automatically if found. To specify a libxml installation at an unusual location, you can either set the environment variable XML2_CONFIG to point to the xml2-config program belonging to the installation, or use the options --with-includes and --with-libraries.
--with-libxslt
Use libxslt when building the xml2 module. xml2 relies on this library to perform XSL transformations of XML.
--disable-float4-byval
Disable passing float4 values “by value”, causing them to be passed “by reference” instead. This option costs performance, but may be needed for compatibility with old user-defined functions that are written in C and use the “version 0” calling convention. A better long-term solution is to update any such functions to use the “version 1” calling convention.
--disable-float8-byval
Disable passing float8 values “by value”, causing them to be passed “by reference” instead. This option costs performance, but may be needed for compatibility with old user-defined functions that are written in C and use the “version 0” calling convention. A better long-term solution is to update any such functions to use the “version 1” calling convention. Note that this option affects not only float8, but also int8 and some related types such as timestamp. On 32-bit platforms, --disable-float8-byval is the default and it is not allowed to select --enable-float8-byval.
--with-segsize=SEGSIZE
Set the segment size, in gigabytes. Large tables are divided into multiple operating-system files, each of size equal to the segment size. This avoids problems with file size limits that exist on many platforms. The default segment size, 1 gigabyte, is safe on all supported platforms. If your operating system has “largefile” support (which most do, nowadays), you can use a larger segment size. This can be helpful to reduce the number of file descriptors consumed when working with very large tables. But be careful not to select a value larger than is supported by your platform and the file systems you intend to use. Other tools you might wish to use, such as tar, could also set limits on the usable file size. It is recommended, though not absolutely required, that this value be a power of 2. Note that changing this value requires an initdb.
--with-blocksize=BLOCKSIZE
Set the block size, in kilobytes. This is the unit of storage and I/O within tables. The default, 8 kilobytes, is suitable for most situations; but other values may be useful in special cases. The value must be a power of 2 between 1 and 32 (kilobytes). Note that changing this value requires an initdb.
--with-wal-blocksize=BLOCKSIZE
Set the WAL block size, in kilobytes. This is the unit of storage and I/O within the WAL log. The default, 8 kilobytes, is suitable for most situations; but other values may be useful in special cases. The value must be a power of 2 between 1 and 64 (kilobytes). Note that changing this value requires an initdb.
--disable-spinlocks
Allow the build to succeed even if PostgreSQL has no CPU spinlock support for the platform. The lack of spinlock support will result in poor performance; therefore, this option should only be used if the build aborts and informs you that the platform lacks spinlock support. If this option is required to build PostgreSQL on your platform, please report the problem to the PostgreSQL developers.
--disable-strong-random
Allow the build to succeed even if PostgreSQL has no support for strong random numbers on the platform. A source of random numbers is needed for some authentication protocols, as well as some routines in the pgcrypto module. --disable-strong-random disables functionality that requires cryptographically strong random numbers, and substitutes a weak pseudo-random-number generator for the generation of authentication salt values and query cancel keys. It may make authentication less secure.
--disable-thread-safety
Disable the thread-safety of client libraries. This prevents concurrent threads in libpq and ECPG programs from safely controlling their private connection handles.
--with-system-tzdata=DIRECTORY
PostgreSQL includes its own time zone database, which it requires for date and time operations. This time zone database is in fact compatible with the IANA time zone database provided by many operating systems such as FreeBSD, Linux, and Solaris, so it would be redundant to install it again. When this option is used, the system-supplied time zone database in DIRECTORY is used instead of the one included in the PostgreSQL source distribution. DIRECTORY must be specified as an absolute path. /usr/share/zoneinfo is a likely directory on some operating systems. Note that the installation routine will not detect mismatching or erroneous time zone data. If you use this option, you are advised to run the regression tests to verify that the time zone data you have pointed to works correctly with PostgreSQL.
This option is mainly aimed at binary package distributors who know their target operating system well. The main advantage of using this option is that the PostgreSQL package won't need to be upgraded whenever any of the many local daylight-saving time rules change. Another advantage is that PostgreSQL can be cross-compiled more straightforwardly if the time zone database files do not need to be built during the installation.
--without-zlib
Prevents use of the Zlib library. This disables support for compressed archives in pg_dump and pg_restore. This option is only intended for those rare systems where this library is not available.
--enable-debug
Compiles all programs and libraries with debugging symbols. This means that you can run the programs in a debugger to analyze problems. This enlarges the size of the installed executables considerably, and on non-GCC compilers it usually also disables compiler optimization, causing slowdowns. However, having the symbols available is extremely helpful for dealing with any problems that might arise. Currently, this option is recommended for production installations only if you use GCC. But you should always have it on if you are doing development work or running a beta version.
--enable-coverage
If using GCC, all programs and libraries are compiled with code coverage testing instrumentation. When run, they generate files in the build directory with code coverage metrics. See Section 33.5 for more information. This option is for use only with GCC and when doing development work.
--enable-profiling
If using GCC, all programs and libraries are compiled so they can be profiled. On backend exit, a subdirectory will be created that contains the gmon.out file for use in profiling. This option is for use only with GCC and when doing development work.
--enable-cassert
Enables assertion checks in the server, which test for many “cannot happen” conditions. This is invaluable for code development purposes, but the tests can slow down the server significantly. Also, having the tests turned on won't necessarily enhance the stability of your server! The assertion checks are not categorized for severity, and so what might be a relatively harmless bug will still lead to server restarts if it triggers an assertion failure. This option is not recommended for production use, but you should have it on for development work or when running a beta version.
--enable-depend
Enables automatic dependency tracking. With this option, the makefiles are set up so that all affected object files will be rebuilt when any header file is changed. This is useful if you are doing development work, but is just wasted overhead if you intend only to compile once and install. At present, this option only works with GCC.
--enable-dtrace
Compiles PostgreSQL with support for the dynamic tracing tool DTrace. See Section 28.5 for more information.
To point to the dtrace program, the environment variable DTRACE can be set. This will often be necessary because dtrace is typically installed under /usr/sbin, which might not be in the path.
Extra command-line options for the dtrace program can be specified in the environment variable DTRACEFLAGS. On Solaris, to include DTrace support in a 64-bit binary, you must specify DTRACEFLAGS="-64" to configure. For example, using the GCC compiler:
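    ./configure CC='gcc -m64' --enable-dtrace DTRACEFLAGS='-64' ...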
Using Sun's compiler:
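    ./configure CC='/opt/SUNWspro/bin/cc -xtarget=native64' --enable-dtrace DTRACEFLAGS='-64' ...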
--enable-tap-tests
Enable tests using the Perl TAP tools. This requires a Perl installation and the Perl module IPC::Run. See Section 33.4 for more information.
If you prefer a C compiler different from the one configure picks, you can set the environment variable CC to the program of your choice. By default, configure will pick gcc if available, else the platform's default (usually cc). Similarly, you can override the default compiler flags if needed with the CFLAGS variable.
You can specify environment variables on the configure command line, for example:
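    ./configure CC=/opt/bin/gcc CFLAGS='-O2 -pipe'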
Here is a list of the significant variables that can be set in this manner:
BISON
Bison program
CC
C compiler
CFLAGS
options to pass to the C compiler
CLANG
path to clang program used to process source code for inlining when compiling with --with-llvm
CPP
C preprocessor
CPPFLAGS
options to pass to the C preprocessor
CXX
C++ compiler
CXXFLAGS
options to pass to the C++ compiler
DTRACE
location of the dtrace program
DTRACEFLAGS
options to pass to the dtrace program
FLEX
Flex program
LDFLAGS
options to use when linking either executables or shared libraries
LDFLAGS_EX
additional options for linking executables only
LDFLAGS_SL
additional options for linking shared libraries only
LLVM_CONFIG
llvm-config program used to locate the LLVM installation
MSGFMT
msgfmt program for native language support
PERL
Full path name of the Perl interpreter. This will be used to determine the dependencies for building PL/Perl.
PYTHON
Full path name of the Python interpreter. This will be used to determine the dependencies for building PL/Python. Also, whether Python 2 or 3 is specified here (or otherwise implicitly chosen) determines which variant of the PL/Python language becomes available. See Section 46.1 for more information.
TCLSH
Full path name of the Tcl interpreter. This will be used to determine the dependencies for building PL/Tcl, and it will be substituted into Tcl scripts.
XML2_CONFIG
xml2-config program used to locate the libxml installation
Sometimes it is useful to add compiler flags after-the-fact to the set that were chosen by configure. An important example is that gcc's -Werror option cannot be included in the CFLAGS passed to configure, because it will break many of configure's built-in tests. To add such flags, include them in the COPT environment variable while running make. The contents of COPT are added to both the CFLAGS and LDFLAGS options set up by configure. For example, you could do:
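    make COPT='-Werror'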
or
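    export COPT='-Werror'
    make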
Note
When developing code inside the server, it is recommended to use the configure options --enable-cassert (which turns on many run-time error checks) and --enable-debug (which improves the usefulness of debugging tools).
If using GCC, it is best to build with an optimization level of at least -O1, because using no optimization (-O0) disables some important compiler warnings (such as the use of uninitialized variables). However, non-zero optimization levels can complicate debugging because stepping through compiled code will usually not match up one-to-one with source code lines. If you get confused while trying to debug optimized code, recompile the specific files of interest with -O0. An easy way to do this is by passing an option to make: make PROFILE=-O0 file.o.
The COPT and PROFILE environment variables are actually handled identically by the PostgreSQL makefiles. Which to use is a matter of preference, but a common habit among developers is to use PROFILE for one-time flag adjustments, while COPT might be kept set all the time.
To start the build, type either of:
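    make
    make all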
(Remember to use GNU make.) The build will take a while depending on your hardware. The last line displayed should be:
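    All of PostgreSQL successfully made. Ready to install.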
If you want to build everything that can be built, including the documentation (HTML and man pages) and the additional modules (contrib), type instead:
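    make world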
The last line displayed should be:
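    PostgreSQL, contrib, and documentation successfully made. Ready to install.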
If you want to invoke the build from another makefile rather than manually, you must unset MAKELEVEL or set it to zero, for instance like this:
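One way to write such a rule (the target name and subdirectory path are illustrative):

    build-postgresql:
            $(MAKE) MAKELEVEL=0 -C postgresql all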
Failure to do that can lead to strange error messages, typically about missing header files.
If you want to test the newly built server before you install it, you can run the regression tests at this point. The regression tests are a test suite to verify that PostgreSQL runs on your machine in the way the developers expected it to. Type:
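    make check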
(This won't work as root; do it as an unprivileged user.) See Chapter 33 for detailed information about interpreting the test results. You can repeat this test at any later time by issuing the same command.
Note
If you are upgrading an existing system be sure to read Section 18.6, which has instructions about upgrading a cluster.
To install PostgreSQL enter:
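    make install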
This will install files into the directories that were specified in Step 1. Make sure that you have appropriate permissions to write into that area. Normally you need to do this step as root. Alternatively, you can create the target directories in advance and arrange for appropriate permissions to be granted.
To install the documentation (HTML and man pages), enter:
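    make install-docs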
If you built the world above, type instead:
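    make install-world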
This also installs the documentation.
You can use make install-strip instead of make install to strip the executable files and libraries as they are installed. This will save some space. If you built with debugging support, stripping will effectively remove the debugging support, so it should only be done if debugging is no longer needed. install-strip tries to do a reasonable job saving space, but it does not have perfect knowledge of how to strip every unneeded byte from an executable file, so if you want to save all the disk space you possibly can, you will have to do manual work.
The standard installation provides all the header files needed for client application development as well as for server-side program development, such as custom functions or data types written in C. (Prior to PostgreSQL 8.0, a separate make install-all-headers command was needed for the latter, but this step has been folded into the standard install.)
Client-only installation: If you want to install only the client applications and interface libraries, then you can use these commands:
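    make -C src/bin install
    make -C src/include install
    make -C src/interfaces install
    make -C doc install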
src/bin has a few binaries for server-only use, but they are small.
Uninstallation: To undo the installation use the command make uninstall. However, this will not remove any created directories.
Cleaning: After the installation you can free disk space by removing the built files from the source tree with the command make clean. This will preserve the files made by the configure program, so that you can rebuild everything with make later on. To reset the source tree to the state in which it was distributed, use make distclean. If you are going to build for several platforms within the same source tree you must do this and re-configure for each platform. (Alternatively, use a separate build tree for each platform, so that the source tree remains unmodified.)
If you perform a build and then discover that your configure options were wrong, or if you change anything that configure investigates (for example, software upgrades), then it's a good idea to do make distclean before reconfiguring and rebuilding. Without this, your changes in configuration choices might not propagate everywhere they need to.
The long version is the rest of this chapter.
In general, a modern Unix-compatible platform should be able to run PostgreSQL. The platforms that had received specific testing at the time of release are listed in Section 16.6 below. In the doc subdirectory of the distribution there are several platform-specific FAQ documents you might wish to consult if you are having trouble.
The following software packages are required for building PostgreSQL:
GNU make version 3.80 or newer is required; other make programs or older GNU make versions will not work. (GNU make is sometimes installed under the name gmake.) To test for GNU make enter:
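    make --version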
You need an ISO/ANSI C compiler (at least C89-compliant). Recent versions of GCC are recommended, but PostgreSQL is known to build using a wide variety of compilers from different vendors.
tar is required to unpack the source distribution, in addition to either gzip or bzip2.
The GNU Readline library is used by default. It allows psql (the PostgreSQL command line SQL interpreter) to remember each command you type, and allows you to use arrow keys to recall and edit previous commands. This is very helpful and is strongly recommended. If you don't want to use it then you must specify the --without-readline option to configure. As an alternative, you can often use the BSD-licensed libedit library, originally developed on NetBSD. The libedit library is GNU Readline-compatible and is used if libreadline is not found, or if --with-libedit-preferred is used as an option to configure. If you are using a package-based Linux distribution, be aware that you need both the readline and readline-devel packages, if those are separate in your distribution.
The zlib compression library is used by default. If you don't want to use it then you must specify the --without-zlib option to configure. Using this option disables support for compressed archives in pg_dump and pg_restore.
The following packages are optional. They are not required in the default configuration, but they are needed when certain build options are enabled, as explained below:
To build the server programming language PL/Perl you need a full Perl installation, including the libperl library and the header files. The minimum required version is Perl 5.8.3. Since PL/Perl will be a shared library, the libperl library must be a shared library also on most platforms. This appears to be the default in recent Perl versions, but it was not in earlier versions, and in any case it is the choice of whomever installed Perl at your site. configure will fail if building PL/Perl is selected but it cannot find a shared libperl. In that case, you will have to rebuild and install Perl manually to be able to build PL/Perl. During the configuration process for Perl, request a shared library.
If you intend to make more than incidental use of PL/Perl, you should ensure that the Perl installation was built with the usemultiplicity option enabled (perl -V will show whether this is the case).
To build the PL/Python server programming language, you need a Python installation with the header files and the distutils module. The minimum required version is Python 2.4. Python 3 is supported if it's version 3.1 or later; but see Section 45.1 when using Python 3.
Since PL/Python will be a shared library, the libpython library must be a shared library also on most platforms. This is not the case in a default Python installation built from source, but a shared library is available in many operating system distributions. configure will fail if building PL/Python is selected but it cannot find a shared libpython. That might mean that you either have to install additional packages or rebuild (part of) your Python installation to provide this shared library. When building from source, run Python's configure with the --enable-shared flag.
To build the PL/Tcl procedural language, you of course need a Tcl installation. The minimum required version is Tcl 8.4.
To enable Native Language Support (NLS), that is, the ability to display a program's messages in a language other than English, you need an implementation of the Gettext API. Some operating systems have this built-in (e.g., Linux, NetBSD, Solaris), for other systems you can download an add-on package from http://www.gnu.org/software/gettext/. If you are using the Gettext implementation in the GNU C library then you will additionally need the GNU Gettext package for some utility programs. For any of the other implementations you will not need it.
You need OpenSSL, if you want to support encrypted client connections. The minimum required version is 0.9.8.
You need Kerberos, OpenLDAP, and/or PAM, if you want to support authentication using those services.
To build the PostgreSQL documentation, there is a separate set of requirements; see Section J.2.
If you are building from a Git tree instead of using a released source package, or if you want to do server development, you also need the following packages:
GNU Flex and Bison are needed to build from a Git checkout, or if you changed the actual scanner and parser definition files. If you need them, be sure to get Flex 2.5.31 or later and Bison 1.875 or later. Other lex and yacc programs cannot be used.
Perl 5.8.3 or later is needed to build from a Git checkout, or if you changed the input files for any of the build steps that use Perl scripts. If building on Windows you will need Perl in any case. Perl is also required to run some test suites.
If you need to get a GNU package, you can find it at your local GNU mirror site (see http://www.gnu.org/order/ftp.html for a list) or at ftp://ftp.gnu.org/gnu/.
Also check that you have sufficient disk space. You will need about 100 MB for the source tree during compilation and about 20 MB for the installation directory. An empty database cluster takes about 35 MB; databases take about five times the amount of space that a flat text file with the same data would take. If you are going to run the regression tests you will temporarily need up to an extra 150 MB. Use the df command to check free disk space.
On some systems with shared libraries you need to tell the system how to find the newly installed shared libraries. The systems on which this is not necessary include FreeBSD, HP-UX, Linux, NetBSD, OpenBSD, and Solaris.
The method to set the shared library search path varies between platforms, but the most widely-used method is to set the environment variable LD_LIBRARY_PATH like so: In Bourne shells (sh, ksh, bash, zsh):
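    LD_LIBRARY_PATH=/usr/local/pgsql/lib
    export LD_LIBRARY_PATH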
or in csh or tcsh:
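    setenv LD_LIBRARY_PATH /usr/local/pgsql/lib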
Replace /usr/local/pgsql/lib with whatever you set --libdir to in Step 1. You should put these commands into a shell start-up file such as /etc/profile or ~/.bash_profile. Some good information about the caveats associated with this method can be found at http://xahlee.org/UnixResource_dir/_/ldpath.html.
On some systems it might be preferable to set the environment variable LD_RUN_PATH before building.
On Cygwin, put the library directory in the PATH or move the .dll files into the bin directory.
If in doubt, refer to the manual pages of your system (perhaps ld.so or rld). If you later get a message like:
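    psql: error in loading shared libraries
    libpq.so.2.1: cannot open shared object file: No such file or directory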
then this step was necessary. Simply take care of it then.
If you are on Linux and you have root access, you can run:
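    /sbin/ldconfig /usr/local/pgsql/lib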
(or equivalent directory) after installation to enable the run-time linker to find the shared libraries faster. Refer to the manual page of ldconfig for more information. On FreeBSD, NetBSD, and OpenBSD the command is:
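    /sbin/ldconfig -m /usr/local/pgsql/lib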
instead. Other systems are not known to have an equivalent command.
If you installed into /usr/local/pgsql or some other location that is not searched for programs by default, you should add /usr/local/pgsql/bin (or whatever you set --bindir to in Step 1) into your PATH. Strictly speaking, this is not necessary, but it will make the use of PostgreSQL much more convenient.
To do this, add the following to your shell start-up file, such as ~/.bash_profile (or /etc/profile, if you want it to affect all users):
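    PATH=/usr/local/pgsql/bin:$PATH
    export PATH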
If you are using csh or tcsh, then use this command:
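    set path = ( /usr/local/pgsql/bin $path )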
To enable your system to find the man documentation, you need to add lines like the following to a shell start-up file unless you installed into a location that is searched by default:
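    MANPATH=/usr/local/pgsql/share/man:$MANPATH
    export MANPATH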
The environment variables PGHOST and PGPORT specify to client applications the host and port of the database server, overriding the compiled-in defaults. If you are going to run client applications remotely then it is convenient if every user that plans to use the database sets PGHOST. This is not required, however; the settings can be communicated via command line options to most client programs.
This chapter discusses how to set up and run the database server and its interactions with the operating system.
As with any server daemon that is accessible to the outside world, it is advisable to run PostgreSQL under a separate user account. This user account should only own the data that is managed by the server, and should not be shared with other daemons. (For example, using the user nobody is a bad idea.) It is not advisable to install executables owned by this user, because compromised systems could then modify their own binaries.
To add a Unix user account to your system, look for a command useradd or adduser. The user name postgres is often used, and is assumed throughout this book, but you can use another name if you like.
This section documents additional platform-specific issues regarding the installation and setup of PostgreSQL. Be sure to read the installation instructions earlier in this chapter as well. Also, check the regression test documentation regarding the interpretation of test results.
Platforms that are not covered here have no known platform-specific installation issues.
PostgreSQL works on AIX, but getting it installed properly can be challenging. AIX versions from 4.3.3 to 6.1 are considered supported. You can use GCC or the native IBM compiler xlc. In general, using recent versions of AIX and PostgreSQL helps. Check the build farm for up to date information about which versions of AIX are known to work.
The minimum recommended fix levels for supported AIX versions are:
AIX 4.3.3
Maintenance Level 11 + post ML11 bundle
AIX 5.1
Maintenance Level 9 + post ML9 bundle
AIX 5.2
Technology Level 10 Service Pack 3
AIX 5.3
Technology Level 7
AIX 6.1
Base Level
To check your current fix level, use oslevel -r in AIX 4.3.3 to AIX 5.2 ML 7, or oslevel -s in later versions.
Use the following configure flags in addition to your own if you have installed Readline or libz in /usr/local: --with-includes=/usr/local/include --with-libraries=/usr/local/lib.
16.7.1.1. GCC Issues
On AIX 5.3, there have been some problems getting PostgreSQL to compile and run using GCC.
You will want to use a version of GCC subsequent to 3.3.2, particularly if you use a prepackaged version. We had good success with 4.0.1. Problems with earlier versions seem to have more to do with the way IBM packaged GCC than with actual issues with GCC, so that if you compile GCC yourself, you might well have success with an earlier version of GCC.
16.7.1.2. Unix-Domain Sockets Broken
AIX 5.3 has a problem where sockaddr_storage is not defined to be large enough. In version 5.3, IBM increased the size of sockaddr_un, the address structure for Unix-domain sockets, but did not correspondingly increase the size of sockaddr_storage. The result of this is that attempts to use Unix-domain sockets with PostgreSQL lead to libpq overflowing the data structure. TCP/IP connections work OK, but not Unix-domain sockets, which prevents the regression tests from working.
The problem was reported to IBM, and is recorded as bug report PMR29657. If you upgrade to maintenance level 5300-03 or later, that will include this fix. A quick workaround is to alter _SS_MAXSIZE to 1025 in /usr/include/sys/socket.h. In either case, recompile PostgreSQL once you have the corrected header file.
16.7.1.3. Internet Address Issues
PostgreSQL relies on the system's getaddrinfo function to parse IP addresses in listen_addresses, pg_hba.conf, etc. Older versions of AIX have assorted bugs in this function. If you have problems related to these settings, updating to the appropriate AIX fix level shown above should take care of it.
One user reports:
When implementing PostgreSQL version 8.1 on AIX 5.3, we periodically ran into problems where the statistics collector would “mysteriously” not come up successfully. This appears to be the result of unexpected behavior in the IPv6 implementation. It looks like PostgreSQL and IPv6 do not play very well together on AIX 5.3.
Any of the following actions “fix” the problem.
Delete the IPv6 address for localhost:
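For instance (as root; the interface name is illustrative):

    # ifconfig lo0 inet6 ::1/0 delete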
Remove IPv6 from net services. The file /etc/netsvc.conf on AIX is roughly equivalent to /etc/nsswitch.conf on Solaris/Linux. The default, on AIX, is thus:
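    hosts=local,bind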
Replace this with:
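    hosts=local4,bind4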
to deactivate searching for IPv6 addresses.
This is really a workaround for problems relating to immaturity of IPv6 support, which improved visibly during the course of AIX 5.3 releases. It has worked with AIX version 5.3, but does not represent an elegant solution to the problem. It has been reported that this workaround is not only unnecessary, but causes problems on AIX 6.1, where IPv6 support has become more mature.
16.7.1.4. Memory Management
AIX can be somewhat peculiar with regards to the way it does memory management. You can have a server with many multiples of gigabytes of RAM free, but still get out of memory or address space errors when running applications. One example is loading of extensions failing with unusual errors. For example, running as the owner of the PostgreSQL installation:
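A session transcript roughly like this (prompt and paths are illustrative):

    -bash-3.00$ psql -c 'create extension plperl;'
    CREATE EXTENSION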
Running as a non-owner in the group possessing the PostgreSQL installation:
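    -bash-3.00$ psql -c 'create extension plperl;'
    ERROR:  could not load library "/opt/dbs/pgsql/lib/plperl.so": A memory address is not in the address space for the process.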
Another example is out of memory errors in the PostgreSQL server logs, with every memory allocation near or greater than 256 MB failing.
The overall cause of all these problems is the default bittedness and memory model used by the server process. By default, all binaries built on AIX are 32-bit. This does not depend upon hardware type or kernel in use. These 32-bit processes are limited to 4 GB of memory laid out in 256 MB segments using one of a few models. The default allows for less than 256 MB in the heap as it shares a single segment with the stack.
In the case of the plperl example, above, check your umask and the permissions of the binaries in your PostgreSQL installation. The binaries involved in that example were 32-bit and installed as mode 750 instead of 755. Due to the permissions being set in this fashion, only the owner or a member of the possessing group can load the library. Since it isn't world-readable, the loader places the object into the process' heap instead of the shared library segments where it would otherwise be placed.
The “ideal” solution for this is to use a 64-bit build of PostgreSQL, but that is not always practical, because systems with 32-bit processors can build, but not run, 64-bit binaries.
If a 32-bit binary is desired, set LDR_CNTRL to MAXDATA=0xn0000000, where 1 <= n <= 8, before starting the PostgreSQL server, and try different values and postgresql.conf settings to find a configuration that works satisfactorily. This use of LDR_CNTRL tells AIX that you want the server to have MAXDATA bytes set aside for the heap, allocated in 256 MB segments. When you find a workable configuration, ldedit can be used to modify the binaries so that they default to using the desired heap size. PostgreSQL can also be rebuilt, passing configure LDFLAGS="-Wl,-bmaxdata:0xn0000000" to achieve the same effect.
For a 64-bit build, set OBJECT_MODE to 64 and pass CC="gcc -maix64" and LDFLAGS="-Wl,-bbigtoc" to configure. (Options for xlc might differ.) If you omit the export of OBJECT_MODE, your build may fail with linker errors. When OBJECT_MODE is set, it tells AIX's build utilities such as ar, as, and ld what type of objects to default to handling.
By default, overcommit of paging space can happen. While we have not seen this occur, AIX will kill processes when it runs out of memory and the overcommit is accessed. The closest to this that we have seen is fork failing because the system decided that there was not enough memory for another process. Like many other parts of AIX, the paging space allocation method and out-of-memory kill is configurable on a system- or process-wide basis if this becomes a problem.
References and Resources
When building from source, proceed according to the normal installation procedure (i.e., ./configure; make; etc.), noting the following Cygwin-specific differences:
Set your path to use the Cygwin bin directory before the Windows utilities. This will help prevent problems with compilation.
The adduser command is not supported; use the appropriate user management application on Windows NT, 2000, or XP. Otherwise, skip this step.
The su command is not supported; use ssh to simulate su on Windows NT, 2000, or XP. Otherwise, skip this step.
OpenSSL is not supported.
Start cygserver for shared memory support. To do this, enter the command /usr/sbin/cygserver &. This program needs to be running anytime you start the PostgreSQL server or initialize a database cluster (initdb). The default cygserver configuration may need to be changed (e.g., increase SEMMNS) to prevent PostgreSQL from failing due to a lack of system resources.
Building might fail on some systems where a locale other than C is in use. To fix this, set the locale to C by doing export LANG=C.utf8 before building, and then set it back to the previous setting after you have installed PostgreSQL.
The parallel regression tests (make check) can generate spurious regression test failures due to overflowing the listen() backlog queue, which causes connection refused errors or hangs. You can limit the number of connections using the make variable MAX_CONNECTIONS thus:
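    make MAX_CONNECTIONS=5 check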
(On some systems you can have up to about 10 simultaneous connections).
It is possible to install cygserver and the PostgreSQL server as Windows NT services. For information on how to do this, please refer to the README document included with the PostgreSQL binary package on Cygwin. It is installed in the directory /usr/share/doc/Cygwin.
PostgreSQL 7.3+ should work on Series 700/800 PA-RISC machines running HP-UX 10.X or 11.X, given appropriate system patch levels and build tools. At least one developer routinely tests on HP-UX 10.20, and we have reports of successful installations on HP-UX 11.00 and 11.11.
Aside from the PostgreSQL source distribution, you will need GNU make (HP's make will not do), and either GCC or HP's full ANSI C compiler. If you intend to build from Git sources rather than a distribution tarball, you will also need Flex (GNU lex) and Bison (GNU yacc). We also recommend making sure you are fairly up-to-date on HP patches. At a minimum, if you are building 64-bit binaries on HP-UX 11.11 you may need PHSS_30966 (11.11) or a successor patch, otherwise initdb may hang:
PHSS_30966 s700_800 ld(1) and linker tools cumulative patch
If you are building on a PA-RISC 2.0 machine and want to have 64-bit binaries using GCC, you must use a GCC 64-bit version.
If you are building on a PA-RISC 2.0 machine and want the compiled binaries to run on PA-RISC 1.1 machines you will need to specify +DAportable in CFLAGS.
If you are building on a HP-UX Itanium machine, you will need the latest HP ANSI C compiler with its dependent patch or successor patches:
PHSS_30848 s700_800 HP C Compiler (A.05.57) PHSS_30849 s700_800 u2comp/be/plugin library Patch
If you have both HP's C compiler and GCC's, then you might want to explicitly select the compiler to use when you run configure:
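    ./configure CC=cc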
for HP's C compiler, or
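    ./configure CC=gcc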
for GCC. If you omit this setting, then configure will pick gcc if it has a choice.
The default install target location is /usr/local/pgsql, which you might want to change to something under /opt. If so, use the --prefix switch to configure.
In the regression tests, there might be some low-order-digit differences in the geometry tests, which vary depending on which compiler and math library versions you use. Any other error is cause for suspicion.
After you have everything installed, it is suggested that you run psql under CMD.EXE, as the MSYS console has buffering issues.
16.7.4.1. Collecting Crash Dumps on Windows
If PostgreSQL on Windows crashes, it has the ability to generate minidumps that can be used to track down the cause for the crash, similar to core dumps on Unix. These dumps can be read using the Windows Debugger Tools or using Visual Studio. To enable the generation of dumps on Windows, create a subdirectory named crashdumps inside the cluster data directory. The dumps will then be written into this directory with a unique name based on the identifier of the crashing process and the current time of the crash.
PostgreSQL is well-supported on Solaris. The more up to date your operating system, the fewer issues you will experience; details below.
16.7.5.1. Required Tools
You can build with either GCC or Sun's compiler suite. For better code optimization, Sun's compiler is strongly recommended on the SPARC architecture. We have heard reports of problems when using GCC 2.95.1; GCC 2.95.3 or later is recommended. If you are using Sun's compiler, be careful not to select /usr/ucb/cc; use /opt/SUNWspro/bin/cc.
16.7.5.2. configure Complains About a Failed Test Program
If configure complains about a failed test program, this is probably a case of the run-time linker being unable to find some library, probably libz, libreadline, or some other non-standard library such as libssl. To point it to the right location, set the LDFLAGS environment variable on the configure command line, e.g.,
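    configure ... LDFLAGS="-R /usr/sfw/lib:/opt/sfw/lib:/usr/local/lib"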
See the ld man page for more information.
16.7.5.3. 64-bit Build Sometimes Crashes
On Solaris 7 and older, the 64-bit version of libc has a buggy vsnprintf routine, which leads to erratic core dumps in PostgreSQL. The simplest known workaround is to force PostgreSQL to use its own version of vsnprintf rather than the library copy. To do this, after you run configure, edit a file produced by configure: in src/Makefile.global, change the line
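    LIBOBJS =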
to read
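    LIBOBJS = snprintf.o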
(There might be other files already listed in this variable. Order does not matter.) Then build as usual.
16.7.5.4. Compiling for Optimal Performance
On the SPARC architecture, Sun Studio is strongly recommended for compilation. Try using the -xO5 optimization flag to generate significantly faster binaries. Do not use any flags that modify the behavior of floating-point operations and errno processing (e.g., -fast). Such flags can cause nonstandard PostgreSQL behavior, for example in date/time arithmetic.
If you do not have a reason to use 64-bit binaries on SPARC, prefer the 32-bit version. The 64-bit operations are slower, and 64-bit binaries are slower than the 32-bit variants. On the other hand, 32-bit code on the AMD64 CPU family is not native, which is why 32-bit code is significantly slower on that CPU family.
16.7.5.5. Using DTrace for Tracing PostgreSQL
If the linking of the postgres executable aborts with an error message about undefined DTrace probe symbols, your DTrace installation is too old to handle probes in static functions. You need Solaris 10u4 or newer.
This part covers topics that are of interest to a PostgreSQL database administrator. This includes installation of the software, setting up and configuring the server, managing users and databases, and maintenance tasks. Anyone who runs a PostgreSQL server, even for personal use, but especially in production, should be familiar with the topics covered in this part.
The information in this part is arranged approximately in the order in which a new user should read it. But the chapters are self-contained and can be read individually as desired. The information is presented in a narrative fashion, in topical units. Readers looking for a complete description of a particular command should see the reference documentation.
The first few chapters are written so they can be understood without prerequisite knowledge, so new users who need to set up their own server can begin their exploration with this part. The rest of this part is about tuning and management; that material assumes that the reader is familiar with the general use of the PostgreSQL database system. Readers are encouraged to consult the tutorial and user documentation for additional information.
It is recommended that most users download the binary distribution for Windows, available as a graphical installer package from the PostgreSQL website. Building from source is only intended for people developing PostgreSQL or extensions.
There are several different ways of building PostgreSQL on Windows. The simplest way to build with Microsoft tools is to install Visual Studio Express 2017 for Windows Desktop and use the included compiler. It is also possible to build with the full Microsoft Visual C++ 2005 to 2017. In some cases that requires the installation of the Windows SDK in addition to the compiler.
It is also possible to build PostgreSQL using the GNU compiler tools provided by MinGW, or using Cygwin for older versions of Windows.
Building using MinGW or Cygwin uses the normal build system; see the chapter on installation from source code and the platform-specific notes for MinGW and Cygwin. To produce native 64 bit binaries in these environments, use the tools from MinGW-w64. These tools can also be used to cross-compile for 32 bit and 64 bit Windows targets on other hosts, such as Linux and macOS. Cygwin is not recommended for running a production server, and it should only be used for running on older versions of Windows where the native build does not work, such as Windows 98. The official binaries are built using Visual Studio.
Native builds of psql don't support command line editing. The Cygwin build does support command line editing, so it should be used where psql is needed for interactive use on Windows.
PostgreSQL can sometimes exhaust various operating system resource limits, especially when multiple copies of the server are running on the same system, or in very large installations. This section explains the kernel resources used by PostgreSQL and the steps you can take to resolve problems related to kernel resource consumption.
PostgreSQL requires the operating system to provide inter-process communication (IPC) features, specifically shared memory and semaphores. Unix-derived systems typically provide “System V” IPC, “POSIX” IPC, or both. Windows has its own implementation of these features and is not discussed here.
The complete lack of these facilities is usually manifested by an “Illegal system call” error upon server start. In that case there is no alternative but to reconfigure your kernel. PostgreSQL won't work without them. This situation is rare, however, among modern operating systems.
Upon starting the server, PostgreSQL normally allocates a very small amount of System V shared memory, as well as a much larger amount of POSIX (mmap) shared memory. In addition a significant number of semaphores, which can be either System V or POSIX style, are created at server startup. Currently, POSIX semaphores are used on Linux and FreeBSD systems while other platforms use System V semaphores.
Prior to PostgreSQL 9.3, only System V shared memory was used, so the amount of System V shared memory required to start the server was much larger. If you are running an older version of the server, please consult the documentation for your server version.
System V IPC features are typically constrained by system-wide allocation limits. When PostgreSQL exceeds one of these limits, the server will refuse to start and should leave an instructive error message describing the problem and what to do about it. The relevant kernel parameters are named consistently across different systems; Table 18.1 gives an overview. The methods to set them, however, vary. Suggestions for some platforms are given below.
Table 18.1. System V IPC Parameters
PostgreSQL requires a few bytes of System V shared memory (typically 48 bytes, on 64-bit platforms) for each copy of the server. On most modern operating systems, this amount can easily be allocated. However, if you are running many copies of the server, or if other applications are also using System V shared memory, it may be necessary to increase SHMALL, which is the total amount of System V shared memory system-wide. Note that SHMALL is measured in pages rather than bytes on many systems.
Less likely to cause problems is the minimum size for shared memory segments (SHMMIN), which should be at most approximately 32 bytes for PostgreSQL (it is usually just 1). The maximum number of segments system-wide (SHMMNI) or per-process (SHMSEG) are unlikely to cause a problem unless your system has them set to zero.
In some cases it might also be necessary to increase SEMMAP to be at least on the order of SEMMNS. This parameter defines the size of the semaphore resource map, in which each contiguous block of available semaphores needs an entry. When a semaphore set is freed it is either added to an existing entry that is adjacent to the freed block or it is registered under a new map entry. If the map is full, the freed semaphores get lost (until reboot). Fragmentation of the semaphore space could over time lead to fewer available semaphores than there should be.
Various other settings related to “semaphore undo”, such as SEMMNU and SEMUME, do not affect PostgreSQL.
AIX
At least as of version 5.1, it should not be necessary to do any special configuration for such parameters as SHMMAX, as it appears this is configured to allow all memory to be used as shared memory. That is the sort of configuration commonly used for other databases such as DB/2.
It might, however, be necessary to modify the global ulimit information in /etc/security/limits, as the default hard limits for file sizes (fsize) and numbers of files (nofiles) might be too low.
FreeBSD
The default settings can be changed using the sysctl or loader interfaces. The following parameters can be set using sysctl:
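Illustrative values:

    kern.ipc.shmall=32768
    kern.ipc.shmmax=134217728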
To make these settings persist over reboots, modify /etc/sysctl.conf.
These semaphore-related settings are read-only as far as sysctl is concerned, but can be set in /boot/loader.conf:
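    kern.ipc.semmni=256
    kern.ipc.semmns=512
    kern.ipc.semmnu=256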
After modifying these values a reboot is required for the new settings to take effect. (Note: FreeBSD does not use SEMMAP. Older versions would accept but ignore a setting for kern.ipc.semmap; newer versions reject it altogether.)
You might also want to configure your kernel to lock shared memory into RAM and prevent it from being paged out to swap. This can be accomplished using the sysctl setting kern.ipc.shm_use_phys.
If running in FreeBSD jails by enabling sysctl's security.jail.sysvipc_allowed, postmasters running in different jails should be run by different operating system users. This improves security because it prevents non-root users from interfering with shared memory or semaphores in different jails, and it allows the PostgreSQL IPC cleanup code to function properly. (In FreeBSD 6.0 and later the IPC cleanup code does not properly detect processes in other jails, preventing the running of postmasters on the same port in different jails.)
FreeBSD versions before 4.0 work like OpenBSD (see below).
NetBSD
In NetBSD 5.0 and later, IPC parameters can be adjusted using sysctl, for example:
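    $ sysctl -w kern.ipc.semmni=100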
To have these settings persist over reboots, modify /etc/sysctl.conf.
You might also want to configure your kernel to lock shared memory into RAM and prevent it from being paged out to swap. This can be accomplished using the sysctl setting kern.ipc.shm_use_phys.
NetBSD versions before 5.0 work like OpenBSD (see below), except that parameters should be set with the keyword options not option.
OpenBSD
The options SYSVSHM and SYSVSEM need to be enabled when the kernel is compiled. (They are by default.) The maximum size of shared memory is determined by the option SHMMAXPGS (in pages). The following shows an example of how to set the various parameters:
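Illustrative kernel configuration entries:

    option          SYSVSHM
    option          SHMMAXPGS=4096
    option          SHMSEG=256

    option          SYSVSEM
    option          SEMMNI=256
    option          SEMMNS=512
    option          SEMMNU=256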
You might also want to configure your kernel to lock shared memory into RAM and prevent it from being paged out to swap. This can be accomplished using the sysctl setting kern.ipc.shm_use_phys.
HP-UX
The default settings tend to suffice for normal installations. On HP-UX 10, the factory default for SEMMNS is 128, which might be too low for larger database sites.
IPC parameters can be set in the System Administration Manager (SAM) under Kernel Configuration → Configurable Parameters. Choose Create A New Kernel when you're done.
Linux
The default maximum segment size is 32 MB, and the default maximum total size is 2097152 pages. A page is almost always 4096 bytes except in unusual kernel configurations with “huge pages” (use getconf PAGE_SIZE to verify).
The shared memory size settings can be changed via the sysctl interface. For example, to allow 16 GB:
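    $ sysctl -w kernel.shmmax=17179869184
    $ sysctl -w kernel.shmall=4194304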
In addition these settings can be preserved between reboots in the file /etc/sysctl.conf. Doing that is highly recommended.
Ancient distributions might not have the sysctl program, but equivalent changes can be made by manipulating the /proc file system:
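    $ echo 17179869184 >/proc/sys/kernel/shmmax
    $ echo 4194304 >/proc/sys/kernel/shmall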
The remaining defaults are quite generously sized, and usually do not require changes.
macOS
The recommended method for configuring shared memory in macOS is to create a file named /etc/sysctl.conf, containing variable assignments such as:
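    kern.sysv.shmmax=4194304
    kern.sysv.shmmin=1
    kern.sysv.shmmni=32
    kern.sysv.shmseg=8
    kern.sysv.shmall=1024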
Note that in some macOS versions, all five shared-memory parameters must be set in /etc/sysctl.conf, else the values will be ignored.
Beware that recent releases of macOS ignore attempts to set SHMMAX to a value that isn't an exact multiple of 4096.
SHMALL is measured in 4 kB pages on this platform.
In older macOS versions, you will need to reboot to have changes in the shared memory parameters take effect. As of 10.5 it is possible to change all but SHMMNI on the fly, using sysctl. But it's still best to set up your preferred values via /etc/sysctl.conf, so that the values will be kept across reboots.
The file /etc/sysctl.conf is only honored in macOS 10.3.9 and later. If you are running a previous 10.3.x release, you must edit the file /etc/rc and change the values in the following commands:
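    sysctl -w kern.sysv.shmmax
    sysctl -w kern.sysv.shmmin
    sysctl -w kern.sysv.shmmni
    sysctl -w kern.sysv.shmseg
    sysctl -w kern.sysv.shmall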
Note that /etc/rc is usually overwritten by macOS system updates, so you should expect to have to redo these edits after each update.
In macOS 10.2 and earlier, instead edit these commands in the file /System/Library/StartupItems/SystemTuning/SystemTuning.
Solaris 2.6 to 2.9 (Solaris 6 to Solaris 9)
The relevant settings can be changed in /etc/system, for example:
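Illustrative values:

    set shmsys:shminfo_shmmax=0x2000000
    set shmsys:shminfo_shmmin=1
    set shmsys:shminfo_shmmni=256
    set shmsys:shminfo_shmseg=256

    set semsys:seminfo_semmap=256
    set semsys:seminfo_semmni=512
    set semsys:seminfo_semmns=512
    set semsys:seminfo_semmsl=32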
In Solaris 10 and later, and OpenSolaris, the default shared memory and semaphore settings are good enough for most PostgreSQL applications. Solaris now defaults to a SHMMAX of one-quarter of system RAM. To further adjust this setting, use a project setting associated with the postgres user. For example, run the following as root:
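    projadd -c "PostgreSQL DB User" -K "project.max-shm-memory=(privileged,8GB,deny)" -U postgres -G postgres user.postgres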
This command adds the user.postgres
project and sets the shared memory maximum for the postgres
user to 8GB, and takes effect the next time that user logs in, or when you restart PostgreSQL (not reload). The above assumes that PostgreSQL is run by the postgres
user in the postgres
group. No server reboot is required.
Other recommended kernel setting changes for database servers which will have a large number of connections are:
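    project.max-shm-ids=(priv,32768,deny)
    project.max-sem-ids=(priv,4096,deny)
    project.max-msg-ids=(priv,4096,deny)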
Additionally, if you are running PostgreSQL inside a zone, you may need to raise the zone resource usage limits as well. See “Chapter 2: Projects and Tasks” in the System Administrator's Guide for more information on projects and prctl.
If systemd is in use, some care must be taken that IPC resources (shared memory and semaphores) are not prematurely removed by the operating system. This is especially of concern when installing PostgreSQL from source. Users of distribution packages of PostgreSQL are less likely to be affected, as the postgres
user is then normally created as a system user.
The setting RemoveIPC
in logind.conf
controls whether IPC objects are removed when a user fully logs out. System users are exempt. This setting defaults to on in stock systemd, but some operating system distributions default it to off.
A typical observed effect when this setting is on is that the semaphore objects used by a PostgreSQL server are removed at apparently random times, leading to the server crashing with log messages like
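    LOG: semctl(1234567890, 0, IPC_RMID, ...) failed: Invalid argument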
Different types of IPC objects (shared memory vs. semaphores, System V vs. POSIX) are treated slightly differently by systemd, so one might observe that some IPC resources are not removed in the same way as others. But it is not advisable to rely on these subtle differences.
A “user logging out” might happen as part of a maintenance job or manually when an administrator logs in as the postgres
user or something similar, so it is hard to prevent in general.
What is a “system user” is determined at systemd compile time from the SYS_UID_MAX setting in /etc/login.defs.
Packaging and deployment scripts should be careful to create the postgres user as a system user by using useradd -r, adduser --system, or equivalent.
Alternatively, if the user account was created incorrectly or cannot be changed, it is recommended to set
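    RemoveIPC=no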
in /etc/systemd/logind.conf
or another appropriate configuration file.
At least one of these two things has to be ensured, or the PostgreSQL server will be very unreliable.
Unix-like operating systems enforce various kinds of resource limits that might interfere with the operation of your PostgreSQL server. Of particular importance are limits on the number of processes per user, the number of open files per process, and the amount of memory available to each process. Each of these has a “hard” and a “soft” limit. The soft limit is what actually counts, but it can be changed by the user up to the hard limit. The hard limit can only be changed by the root user. The system call setrlimit is responsible for setting these parameters. The shell's built-in command ulimit (Bourne shells) or limit (csh) is used to control the resource limits from the command line. On BSD-derived systems the file /etc/login.conf controls the various resource limits set during login. See the operating system documentation for details. The relevant parameters are maxproc, openfiles, and datasize. For example:
(-cur is the soft limit. Append -max to set the hard limit.)
Kernels can also have system-wide limits on some resources.
On Linux /proc/sys/fs/file-max determines the maximum number of open files that the kernel will support. It can be changed by writing a different number into the file or by adding an assignment in /etc/sysctl.conf. The maximum limit of files per process is fixed at the time the kernel is compiled; see /usr/src/linux/Documentation/proc.txt for more information.
The PostgreSQL server uses one process per connection so you should provide for at least as many processes as allowed connections, in addition to what you need for the rest of your system. This is usually not a problem but if you run several servers on one machine things might get tight.
The factory default limit on open files is often set to “socially friendly” values that allow many users to coexist on a machine without using an inappropriate fraction of the system resources. If you run many servers on a machine this is perhaps what you want, but on dedicated servers you might want to raise this limit.
In Linux 2.4 and later, the default virtual memory behavior is not optimal for PostgreSQL. Because of the way that the kernel implements memory overcommit, the kernel might terminate the PostgreSQL postmaster (the master server process) if the memory demands of either PostgreSQL or another process cause the system to run out of virtual memory.
If this happens, you will see a kernel message that looks like this (consult your system documentation and configuration on where to look for such a message):
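    Out of Memory: Killed process 12345 (postgres).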
This indicates that the postgres
process has been terminated due to memory pressure. Although existing database connections will continue to function normally, no new connections will be accepted. To recover, PostgreSQL will need to be restarted.
One way to avoid this problem is to run PostgreSQL on a machine where you can be sure that other processes will not run the machine out of memory. If memory is tight, increasing the swap space of the operating system can help avoid the problem, because the out-of-memory (OOM) killer is invoked only when physical memory and swap space are exhausted.
Another approach, which can be used with or without altering vm.overcommit_memory, is to set the process-specific OOM score adjustment value for the postmaster process to -1000, thereby guaranteeing it will not be targeted by the OOM killer. The simplest way to do this is to execute
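    echo -1000 > /proc/self/oom_score_adj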
in the postmaster's startup script just before invoking the postmaster. Note that this action must be done as root, or it will have no effect; so a root-owned startup script is the easiest place to do it. If you do this, you should also set these environment variables in the startup script before invoking the postmaster:
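    export PG_OOM_ADJUST_FILE=/proc/self/oom_score_adj
    export PG_OOM_ADJUST_VALUE=0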
These settings will cause postmaster child processes to run with the normal OOM score adjustment of zero, so that the OOM killer can still target them at need. You could use some other value for PG_OOM_ADJUST_VALUE
if you want the child processes to run with some other OOM score adjustment. (PG_OOM_ADJUST_VALUE
can also be omitted, in which case it defaults to zero.) If you do not set PG_OOM_ADJUST_FILE
, the child processes will run with the same OOM score adjustment as the postmaster, which is unwise since the whole point is to ensure that the postmaster has a preferential setting.
Older Linux kernels do not offer /proc/self/oom_score_adj, but may have a previous version of the same functionality called /proc/self/oom_adj. This works the same except the disable value is -17, not -1000.
Some vendors' Linux 2.4 kernels are reported to have early versions of the 2.6 overcommit sysctl parameter. However, setting vm.overcommit_memory to 2 on a 2.4 kernel that does not have the relevant code will make things worse, not better. It is recommended that you inspect the actual kernel source code (see the function vm_enough_memory in the file mm/mmap.c) to verify what is supported in your kernel before you try this in a 2.4 installation. The presence of the overcommit-accounting documentation file should not be taken as evidence that the feature is there. If in any doubt, consult a kernel expert or your kernel vendor.
6490428 / 2048 gives approximately 3169.154, so in this example we need at least 3170 huge pages, which we can set with:
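    $ sysctl -w vm.nr_hugepages=3170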
A larger setting would be appropriate if other programs on the machine also need huge pages. Don't forget to add this setting to /etc/sysctl.conf
so that it will be reapplied after reboots.
Sometimes the kernel is not able to allocate the desired number of huge pages immediately, so it might be necessary to repeat the command or to reboot. (Immediately after a reboot, most of the machine's memory should be available to convert into huge pages.) To verify the huge page allocation situation, use:
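    $ cat /proc/meminfo | grep Huge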
It may also be necessary to give the database server's operating system user permission to use huge pages by setting vm.hugetlb_shm_group via sysctl, and/or give permission to lock memory with ulimit -l.
A platform (that is, a CPU architecture and operating system combination) is considered supported by the PostgreSQL development community if the code contains provisions to work on that platform and it has recently been verified to build and pass its regression tests on that platform. Currently, most testing of platform compatibility is done automatically by test machines in the PostgreSQL Build Farm. If you are interested in using PostgreSQL on a platform that is not represented in the build farm, but on which the code works or can be made to work, you are strongly encouraged to set up a build farm member machine so that continued compatibility can be assured.
In general, PostgreSQL can be expected to work on these CPU architectures: x86, x86_64, IA64, PowerPC, PowerPC 64, S/390, S/390x, Sparc, Sparc 64, ARM, MIPS, MIPSEL, and PA-RISC. Code support exists for M68K, M32R, and VAX, but these architectures are not known to have been tested recently. It is often possible to build on an unsupported CPU type by configuring with --disable-spinlocks, but performance will be poor.
PostgreSQL can be expected to work on these operating systems: Linux (all recent distributions), Windows (Win2000 SP4 and later), FreeBSD, OpenBSD, NetBSD, macOS, AIX, HP/UX, and Solaris. Other Unix-like systems may also work but are not currently being tested. In most cases, all CPU architectures supported by a given operating system will work. Look in the platform-specific notes below to see if there is information specific to your operating system, particularly if using an older system.
If you have installation problems on a platform that is known to be supported according to recent build farm results, please report it to <pgsql-bugs@postgresql.org>. If you are interested in porting PostgreSQL to a new platform, <pgsql-hackers@postgresql.org> is the appropriate place to discuss that.
Before you can do anything, you must initialize a database storage area on disk. We call this a database cluster. (The SQL standard uses the term catalog cluster.) A database cluster is a collection of databases that is managed by a single instance of a running database server. After initialization, a database cluster will contain a database named postgres, which is meant as a default database for use by utilities, users and third party applications. The database server itself does not require the postgres database to exist, but many external utility programs assume it exists. Another database created within each cluster during initialization is called template1. As the name suggests, this will be used as a template for subsequently created databases; it should not be used for actual work. (See Chapter 22 for information about creating new databases within a cluster.)
In file system terms, a database cluster is a single directory under which all data will be stored. We call this the data directory or data area. It is completely up to you where you choose to store your data. There is no default, although locations such as /usr/local/pgsql/data or /var/lib/pgsql/data are popular. To initialize a database cluster, use the command initdb, which is installed with PostgreSQL. The desired file system location of your database cluster is indicated by the -D option, for example:
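    $ initdb -D /usr/local/pgsql/data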
Note that you must execute this command while logged into the PostgreSQL user account, which is described in the previous section.
As an alternative to the -D option, you can set the environment variable PGDATA.
Alternatively, you can run initdb via the pg_ctl program like so:
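    $ pg_ctl -D /usr/local/pgsql/data initdb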
This may be more intuitive if you are using pg_ctl for starting and stopping the server (see Section 18.3), so that pg_ctl would be the sole command you use for managing the database server instance.
initdb will attempt to create the directory you specify if it does not already exist. Of course, this will fail if initdb does not have permissions to write in the parent directory. It's generally recommended that the PostgreSQL user own not just the data directory but its parent directory as well, so that this should not be a problem. If the desired parent directory doesn't exist either, you will need to create it first, using root privileges if the grandparent directory isn't writable. So the process might look like this:
initdb will refuse to run if the data directory exists and already contains files; this is to prevent accidentally overwriting an existing installation.
Because the data directory contains all the data stored in the database, it is essential that it be secured from unauthorized access. initdb
therefore revokes access permissions from everyone but the PostgreSQL user, and optionally, group. Group access, when enabled, is read-only. This allows an unprivileged user in the same group as the cluster owner to take a backup of the cluster data or perform other operations that only require read access.
Note that enabling or disabling group access on an existing cluster requires the cluster to be shut down and the appropriate mode to be set on all directories and files before restarting PostgreSQL. Otherwise, a mix of modes might exist in the data directory. For clusters that allow access only by the owner, the appropriate modes are 0700
for directories and 0600
for files. For clusters that also allow reads by the group, the appropriate modes are 0750
for directories and 0640
for files.
However, while the directory contents are secure, the default client authentication setup allows any local user to connect to the database and even become the database superuser. If you do not trust other local users, we recommend you use one of initdb's -W, --pwprompt or --pwfile options to assign a password to the database superuser. Also, specify -A md5 or -A password so that the default trust authentication mode is not used; or modify the generated pg_hba.conf file after running initdb, but before you start the server for the first time. (Other reasonable approaches include using peer authentication or file system permissions to restrict connections. See Chapter 20 for more information.)
Non-C and non-POSIX locales rely on the operating system's collation library for character set ordering. This controls the ordering of keys stored in indexes. For this reason, a cluster cannot switch to an incompatible collation library version, either through snapshot restore, binary streaming replication, a different operating system, or an operating system upgrade.
Many installations create their database clusters on file systems (volumes) other than the machine's “root” volume. If you choose to do this, it is not advisable to try to use the secondary volume's topmost directory (mount point) as the data directory. Best practice is to create a directory within the mount-point directory that is owned by the PostgreSQL user, and then create the data directory within that. This avoids permissions problems, particularly for operations such as pg_upgrade, and it also ensures clean failures if the secondary volume is taken offline.
Generally, any file system with POSIX semantics can be used for PostgreSQL. Users prefer different file systems for a variety of reasons, including vendor support, performance, and familiarity. Experience suggests that, all other things being equal, one should not expect major performance or behavior changes merely from switching file systems or making minor file system configuration changes.

It is possible to use an NFS file system for storing the PostgreSQL data directory. PostgreSQL does nothing special for NFS file systems, meaning it assumes NFS behaves exactly like locally-connected drives. PostgreSQL does not use any functionality that is known to have nonstandard behavior on NFS, such as file locking.

The only firm requirement for using NFS with PostgreSQL is that the file system is mounted using the hard option. With the hard option, processes can “hang” indefinitely if there are network problems, so this configuration will require careful monitoring. The soft option will interrupt system calls in case of network problems, but PostgreSQL will not repeat system calls interrupted in this way, so any such interruption will result in an I/O error being reported.

It is not necessary to use the sync mount option. The behavior of the async option is sufficient, since PostgreSQL issues fsync calls at appropriate times to flush the write caches. (This is analogous to how it works on a local file system.) However, it is strongly recommended to use the sync export option on the NFS server on systems where it exists (mainly Linux). Otherwise, an fsync or equivalent on the NFS client is not actually guaranteed to reach permanent storage on the server, which could cause corruption similar to running with the parameter fsync off. The defaults of these mount and export options differ between vendors and versions, so it is recommended to check and perhaps specify them explicitly in any case, to avoid any ambiguity.

In some cases, an external storage product can be accessed either via NFS or a lower-level protocol such as iSCSI. In the latter case, the storage appears as a block device and any available file system can be created on it. That approach might relieve the DBA from having to deal with some of the idiosyncrasies of NFS, but of course the complexity of managing remote storage then happens at other levels.
PostgreSQL can be built using Cygwin, a Linux-like environment for Windows, but that method is inferior to the native Windows build (see below) and running a server under Cygwin is no longer recommended.
On general principles you should be current on libc and ld/dld patches, as well as compiler patches if you are using HP's C compiler. See HP's support sites for free copies of their latest patches.
PostgreSQL for Windows can be built using MinGW, a Unix-like build environment for Microsoft operating systems, or using Microsoft's Visual C++ compiler suite. The MinGW build variant uses the normal build system described in this chapter; the Visual C++ build works completely differently and is described in its own chapter. It is a fully native build and uses no additional software like MinGW. A ready-made installer is available on the main PostgreSQL web site.
The native Windows port requires a 32 or 64-bit version of Windows 2000 or later. Earlier operating systems do not have sufficient infrastructure (but Cygwin may be used on those). MinGW, the Unix-like build tools, and MSYS, a collection of Unix tools required to run shell scripts like configure, can be downloaded from the MinGW project site. Neither is required to run the resulting binaries; they are needed only for creating the binaries.
To build 64 bit binaries using MinGW, install the 64 bit tool set from the MinGW-w64 project, put its bin directory in the PATH, and run configure with the --host=x86_64-w64-mingw32 option.
You can download Sun Studio from Oracle's website. Many GNU tools are integrated into Solaris 10, or they are present on the Solaris companion CD. If you need packages for older versions of Solaris, you can find these tools at Sunfreeware. If you prefer sources, look at the GNU software mirrors.
Yes, using DTrace is possible. See the section on developer options for dynamic tracing for further information.
When using System V semaphores, PostgreSQL uses one semaphore per allowed connection (max_connections), allowed autovacuum worker process (autovacuum_max_workers) and allowed background process (max_worker_processes), in sets of 16. Each such set will also contain a 17th semaphore which contains a “magic number”, to detect collision with semaphore sets used by other applications. The maximum number of semaphores in the system is set by SEMMNS, which consequently must be at least as high as max_connections plus autovacuum_max_workers plus max_worker_processes, plus one extra for each 16 allowed connections plus workers (see the formula in Table 18.1). The parameter SEMMNI determines the limit on the number of semaphore sets that can exist on the system at one time. Hence this parameter must be at least ceil((max_connections + autovacuum_max_workers + max_worker_processes + 5) / 16). Lowering the number of allowed connections is a temporary workaround for failures, which are usually confusingly worded “No space left on device”, from the function semget.
When using POSIX semaphores, the number of semaphores needed is the same as for System V, that is one semaphore per allowed connection (max_connections), allowed autovacuum worker process (autovacuum_max_workers) and allowed background process (max_worker_processes). On the platforms where this option is preferred, there is no specific kernel limit on the number of POSIX semaphores.

AIX
You need to reboot for the changes to take effect. See the operating system documentation for information on shared memory under older versions of Solaris.

Solaris 2.10 (Solaris 10) and later, and OpenSolaris
On the other side of the coin, some systems allow individual processes to open large numbers of files; if more than a few processes do so then the system-wide limit can easily be exceeded. If you find this happening, and you do not want to alter the system-wide limit, you can set PostgreSQL's max_files_per_process configuration parameter to limit the consumption of open files.
If PostgreSQL itself is the cause of the system running out of memory, you can avoid the problem by changing your configuration. In some cases, it may help to lower memory-related configuration parameters, particularly shared_buffers and work_mem. In other cases, the problem may be caused by allowing too many connections to the database server itself. In many cases, it may be better to reduce max_connections and instead make use of external connection-pooling software.
On Linux 2.6 and later, it is possible to modify the kernel's behavior so that it will not “overcommit” memory. Although this setting will not prevent the OOM killer from being invoked altogether, it will lower the chances significantly and will therefore lead to more robust system behavior. This is done by selecting strict overcommit mode via sysctl:
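    sysctl -w vm.overcommit_memory=2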
or placing an equivalent entry in /etc/sysctl.conf. You might also wish to modify the related setting vm.overcommit_ratio. For details see the kernel documentation file Documentation/vm/overcommit-accounting.
Using huge pages reduces overhead when using large contiguous chunks of memory, as PostgreSQL does, particularly when using large values of shared_buffers. To use this feature in PostgreSQL you need a kernel with CONFIG_HUGETLBFS=y and CONFIG_HUGETLB_PAGE=y. You will also have to adjust the kernel setting vm.nr_hugepages. To estimate the number of huge pages needed, start PostgreSQL without huge pages enabled and check the postmaster's VmPeak value, as well as the system's huge page size, using the /proc file system. This might look like:
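    $ head -1 $PGDATA/postmaster.pid
    4170
    $ grep ^VmPeak /proc/4170/status
    VmPeak: 6490428 kB
    $ grep ^Hugepagesize /proc/meminfo
    Hugepagesize: 2048 kB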
The default behavior for huge pages in PostgreSQL is to use them when possible and to fall back to normal pages when failing. To enforce the use of huge pages, you can set huge_pages to on in postgresql.conf. Note that with this setting PostgreSQL will fail to start if not enough huge pages are available.
For a detailed description of the Linux huge pages feature have a look at the kernel documentation file Documentation/vm/hugetlbpage.txt.
initdb also initializes the default locale for the database cluster. Normally, it will just take the locale settings in the environment and apply them to the initialized database. It is possible to specify a different locale for the database; more information about that can be found in the section on locale support. The default sort order used within the particular database cluster is set by initdb, and while you can create new databases using different sort order, the order used in the template databases that initdb creates cannot be changed without dropping and recreating them. There is also a performance impact for using locales other than C or POSIX. Therefore, it is important to make this choice correctly the first time.
initdb also sets the default character set encoding for the database cluster. Normally this should be chosen to match the locale setting. For details see the section on character set support.
Name | Description | Values needed to run one PostgreSQL instance
SHMMAX | Maximum size of shared memory segment (bytes) | at least 1kB, but the default is usually much higher
SHMMIN | Minimum size of shared memory segment (bytes) | 1
SHMALL | Total amount of shared memory available (bytes or pages) | same as SHMMAX if bytes, or ceil(SHMMAX/PAGE_SIZE) if pages
SHMSEG | Maximum number of shared memory segments per process | only 1 segment is needed, but the default is much higher
SHMMNI | Maximum number of shared memory segments system-wide | like SHMSEG plus room for other applications
SEMMNI | Maximum number of semaphore identifiers (i.e., sets) | at least ceil((max_connections + autovacuum_max_workers + max_worker_processes + 5) / 16)
SEMMNS | Maximum number of semaphores system-wide | ceil((max_connections + autovacuum_max_workers + max_worker_processes + 5) / 16) * 17 plus room for other applications
SEMMSL | Maximum number of semaphores per set | at least 17
SEMMAP | Number of entries in semaphore map | see text
SEMVMX | Maximum value of semaphore | at least 1000 (The default is often 32767; do not change unless necessary)
Before anyone can access the database, you must start the database server. The database server program is called postgres. The postgres program must know where to find the data it is supposed to use. This is done with the -D option. Thus, the simplest way to start the server is:
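    $ postgres -D /usr/local/pgsql/data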
which will leave the server running in the foreground. This must be done while logged into the PostgreSQL user account. Without -D, the server will try to use the data directory named by the environment variable PGDATA. If that variable is not provided either, it will fail.
Normally it is better to start postgres in the background. For this, use the usual Unix shell syntax:
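    $ postgres -D /usr/local/pgsql/data >logfile 2>&1 &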
It is important to store the server's stdout and stderr output somewhere, as shown above. It will help for auditing purposes and to diagnose problems. (See Section 24.3 for a more thorough discussion of log file handling.)
The postgres program also takes a number of other command-line options. For more information, see the postgres reference page and Chapter 19 below.
This shell syntax can get tedious quickly. Therefore the wrapper program pg_ctl is provided to simplify some tasks. For example:
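    pg_ctl start -l logfile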
will start the server in the background and put the output into the named log file. The -D option has the same meaning here as for postgres. pg_ctl is also capable of stopping the server.
Normally, you will want to start the database server when the computer boots. Autostart scripts are operating-system-specific. There are a few distributed with PostgreSQL in the contrib/start-scripts
directory. Installing one will require root privileges.
Different systems have different conventions for starting up daemons at boot time. Many systems have a file /etc/rc.local or /etc/rc.d/rc.local. Others use init.d or rc.d directories. Whatever you do, the server must be run by the PostgreSQL user account and not by root or any other user. Therefore you probably should form your commands using su postgres -c '...'. For example:
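    su postgres -c '/usr/local/pgsql/bin/pg_ctl start -l logfile -D /usr/local/pgsql/data'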
Here are a few more operating-system-specific suggestions. (In each case be sure to use the proper installation directory and user name where we show generic values.)
For FreeBSD, look at the file contrib/start-scripts/freebsd
in the PostgreSQL source distribution.
On OpenBSD, add the following lines to the file /etc/rc.local:
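    if [ -x /usr/local/pgsql/bin/pg_ctl -a -x /usr/local/pgsql/bin/postgres ]; then
        su -l postgres -c '/usr/local/pgsql/bin/pg_ctl start -s -l /var/postgresql/log -D /usr/local/pgsql/data'
        echo -n ' postgresql'
    fi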
On Linux systems either add
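    /usr/local/pgsql/bin/pg_ctl start -l logfile -D /usr/local/pgsql/data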
to /etc/rc.d/rc.local or /etc/rc.local, or look at the file contrib/start-scripts/linux in the PostgreSQL source distribution.
When using systemd, you can use the following service unit file (e.g., at /etc/systemd/system/postgresql.service):
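    [Unit]
    Description=PostgreSQL database server
    Documentation=man:postgres(1)

    [Service]
    Type=notify
    User=postgres
    ExecStart=/usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data
    ExecReload=/bin/kill -HUP $MAINPID
    KillMode=mixed
    KillSignal=SIGINT
    TimeoutSec=0

    [Install]
    WantedBy=multi-user.target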
Using Type=notify requires that the server binary was built with configure --with-systemd.
Consider carefully the timeout setting. systemd has a default timeout of 90 seconds as of this writing and will kill a process that does not notify readiness within that time. But a PostgreSQL server that might have to perform crash recovery at startup could take much longer to become ready. The suggested value of 0 disables the timeout logic.
On NetBSD, use either the FreeBSD or Linux start scripts, depending on preference.
On Solaris, create a file called /etc/init.d/postgresql that contains the following line:
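    su - postgres -c "/usr/local/pgsql/bin/pg_ctl start -l logfile -D /usr/local/pgsql/data"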
Then, create a symbolic link to it in /etc/rc3.d as S99postgresql.
While the server is running, its PID is stored in the file postmaster.pid
in the data directory. This is used to prevent multiple server instances from running in the same data directory and can also be used for shutting down the server.
There are several common reasons the server might fail to start. Check the server's log file, or start it by hand (without redirecting standard output or standard error) and see what error messages appear. Below we explain some of the most common error messages in more detail.
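    LOG:  could not bind IPv4 socket: Address already in use
    HINT:  Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
    FATAL:  could not create any TCP/IP sockets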
This usually means just what it suggests: you tried to start another server on the same port where one is already running. However, if the kernel error message is not Address already in use
or some variant of that, there might be a different problem. For example, trying to start a server on a reserved port number might draw something like:
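    $ postgres -p 666
    LOG:  could not bind IPv4 socket: Permission denied
    HINT:  Is another postmaster already running on port 666? If not, wait a few seconds and retry.
    FATAL:  could not create any TCP/IP sockets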
A message like:
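    FATAL:  could not create shared memory segment: Invalid argument
    DETAIL:  Failed system call was shmget(key=5440001, size=4011376640, 03600).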
probably means your kernel's limit on the size of shared memory is smaller than the work area PostgreSQL is trying to create (4011376640 bytes in this example). Or it could mean that you do not have System-V-style shared memory support configured into your kernel at all. As a temporary workaround, you can try starting the server with a smaller-than-normal number of buffers (shared_buffers). You will eventually want to reconfigure your kernel to increase the allowed shared memory size. You might also see this message when trying to start multiple servers on the same machine, if their total space requested exceeds the kernel limit.
An error like:
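    FATAL:  could not create semaphores: No space left on device
    DETAIL:  Failed system call was semget(5440126, 17, 03600).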
does not mean you've run out of disk space. It means your kernel's limit on the number of System V semaphores is smaller than the number PostgreSQL wants to create. As above, you might be able to work around the problem by starting the server with a reduced number of allowed connections (max_connections), but you'll eventually want to increase the kernel limit.
If you get an “illegal system call” error, it is likely that shared memory or semaphores are not supported in your kernel at all. In that case your only option is to reconfigure the kernel to enable these features.
Details about configuring System V IPC facilities are given in Section 18.4.1.
Although the error conditions possible on the client side are quite varied and application-dependent, a few of them might be directly related to how the server was started. Conditions other than those shown below should be documented with the respective client application.
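    psql: could not connect to server: Connection refused
            Is the server running on host "server.joe.com" and accepting
            TCP/IP connections on port 5432?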
This is the generic “I couldn't find a server to talk to” failure. It looks like the above when TCP/IP communication is attempted. A common mistake is to forget to configure the server to allow TCP/IP connections.
Alternatively, you'll get this when attempting Unix-domain socket communication to a local server:
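    psql: could not connect to server: No such file or directory
            Is the server running locally and accepting
            connections on Unix domain socket "/tmp/.s.PGSQL.5432"?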
The last line is useful in verifying that the client is trying to connect to the right place. If there is in fact no server running there, the kernel error message will typically be either Connection refused or No such file or directory, as illustrated. (It is important to realize that Connection refused in this context does not mean that the server got your connection request and rejected it. That case will produce a different message, as shown in Section 20.15.) Other error messages such as Connection timed out might indicate more fundamental problems, like lack of network connectivity.
PostgreSQL offers encryption at several levels, and provides flexibility in protecting data from disclosure due to database server theft, unscrupulous administrators, and insecure networks. Encryption might also be required to secure sensitive data such as medical records or financial transactions.
Database user passwords are stored as hashes (determined by the setting password_encryption), so the administrator cannot determine the actual password assigned to the user. If SCRAM or MD5 encryption is used for client authentication, the unencrypted password is never even temporarily present on the server, because the client encrypts it before it is sent across the network. SCRAM is preferred, because it is an Internet standard and is more secure than the PostgreSQL-specific MD5 authentication protocol.
The pgcrypto module allows certain fields to be stored encrypted. This is useful if only some of the data is sensitive. The client supplies the decryption key and the data is decrypted on the server and then sent to the client.
The decrypted data and the decryption key are present on the server for a brief time while it is being decrypted and communicated between the client and server. This presents a brief moment where the data and keys can be intercepted by someone with complete access to the database server, such as the system administrator.
Storage encryption can be performed at the file system level or the block level. Linux file system encryption options include eCryptfs and EncFS, while FreeBSD uses PEFS. Block level or full disk encryption options include dm-crypt + LUKS on Linux and GEOM modules geli and gbde on FreeBSD. Many other operating systems support this functionality, including Windows.
This mechanism prevents unencrypted data from being read from the drives if the drives or the entire computer is stolen. This does not protect against attacks while the file system is mounted, because when mounted, the operating system provides an unencrypted view of the data. However, to mount the file system, you need some way for the encryption key to be passed to the operating system, and sometimes the key is stored somewhere on the host that mounts the disk.
SSL connections encrypt all data sent across the network: the password, the queries, and the data returned. The pg_hba.conf file allows administrators to specify which hosts can use non-encrypted connections (host) and which require SSL-encrypted connections (hostssl). Also, clients can specify that they connect to servers only via SSL.
GSSAPI-encrypted connections encrypt all data sent across the network, including queries and data returned. (No password is sent across the network.) The pg_hba.conf file allows administrators to specify which hosts can use non-encrypted connections (host) and which require GSSAPI-encrypted connections (hostgssenc). Also, clients can specify that they connect to servers only on GSSAPI-encrypted connections (gssencmode=require).
Stunnel or SSH can also be used to encrypt transmissions.
It is possible for both the client and server to provide SSL certificates to each other. It takes some extra configuration on each side, but this provides stronger verification of identity than the mere use of passwords. It prevents a computer from pretending to be the server just long enough to read the password sent by the client. It also helps prevent “man in the middle” attacks where a computer between the client and server pretends to be the server and reads and passes all data between the client and server.
If the system administrator for the server's machine cannot be trusted, it is necessary for the client to encrypt the data; this way, unencrypted data never appears on the database server. Data is encrypted on the client before being sent to the server, and database results have to be decrypted on the client before being used.
There are several ways to shut down the database server. You control the type of shutdown by sending different signals to the master postgres process.

SIGTERM
This is the Smart Shutdown mode. After receiving SIGTERM, the server disallows new connections, but lets existing sessions end their work normally. It shuts down only after all of the sessions terminate. If the server is in online backup mode, it additionally waits until online backup mode is no longer active. While backup mode is active, new connections will still be allowed, but only to superusers (this exception allows a superuser to connect to terminate online backup mode). If the server is in recovery when a smart shutdown is requested, recovery and streaming replication will be stopped only after all regular sessions have terminated.

SIGINT
This is the Fast Shutdown mode. The server disallows new connections and sends all existing server processes SIGTERM, which will cause them to abort their current transactions and exit promptly. It then waits for all server processes to exit and finally shuts down. If the server is in online backup mode, backup mode will be terminated, rendering the backup useless.

SIGQUIT
This is the Immediate Shutdown mode. The server will send SIGQUIT to all child processes and wait for them to terminate. If any do not terminate within 5 seconds, they will be sent SIGKILL. The master server process exits as soon as all child processes have exited, without doing normal database shutdown processing. This will lead to recovery (by replaying the WAL log) upon next start-up. This is recommended only in emergencies.
The pg_ctl program provides a convenient interface for sending these signals to shut down the server. Alternatively, you can send the signal directly using kill on non-Windows systems. The PID of the postgres process can be found using the ps program, or from the file postmaster.pid in the data directory. For example, to do a fast shutdown:
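    $ kill -INT `head -1 /usr/local/pgsql/data/postmaster.pid`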
It is best not to use SIGKILL to shut down the server. Doing so will prevent the server from releasing shared memory and semaphores. Furthermore, SIGKILL kills the postgres
process without letting it relay the signal to its subprocesses, so it might be necessary to kill the individual subprocesses by hand as well.
To terminate an individual session while allowing other sessions to continue, use pg_terminate_backend()
(see Table 9.83) or send a SIGTERM signal to the child process associated with the session.
The PostgreSQL 10.5 sources can be obtained from the download section of our website: https://www.postgresql.org/download/. You should get a file named postgresql-10.5.tar.gz or postgresql-10.5.tar.bz2. After you have obtained the file, unpack it:
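    gunzip postgresql-10.5.tar.gz
    tar xf postgresql-10.5.tar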
(Use bunzip2 instead of gunzip if you have the .bz2 file.) This will create a directory postgresql-10.5 under the current directory with the PostgreSQL sources. Change into that directory for the rest of the installation procedure.
You can also get the source directly from the version control repository, see Appendix I.
It is possible to use SSH to encrypt the network connection between clients and a PostgreSQL server. Done properly, this provides an adequately secure network connection, even for non-SSL-capable clients.
First make sure that an SSH server is running properly on the same machine as the PostgreSQL server and that you can log in using ssh
as some user. Then you can establish a secure tunnel with a command like this from the client machine:
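    ssh -L 63333:localhost:5432 joe@foo.com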
The first number in the -L
argument, 63333, is the port number of your end of the tunnel; it can be any unused port. (IANA reserves ports 49152 through 65535 for private use.) The second number, 5432, is the remote end of the tunnel: the port number your server is using. The name or IP address between the port numbers is the host with the database server you are going to connect to, as seen from the host you are logging in to, which is foo.com
in this example. In order to connect to the database server using this tunnel, you connect to port 63333 on the local machine:
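    psql -h localhost -p 63333 postgres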
To the database server it will then look as though you are really user joe
on host foo.com
connecting to localhost
in that context, and it will use whatever authentication procedure was configured for connections from this user and host. Note that the server will not think the connection is SSL-encrypted, since in fact it is not encrypted between the SSH server and the PostgreSQL server. This should not pose any extra security risk as long as they are on the same machine.
In order for the tunnel setup to succeed you must be allowed to connect via ssh as joe@foo.com, just as if you had attempted to use ssh to create a terminal session.
You could also have set up the port forwarding as
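    ssh -L 63333:foo.com:5432 joe@foo.com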
but then the database server will see the connection as coming in on its foo.com
interface, which is not opened by the default setting listen_addresses = 'localhost'
. This is usually not what you want.
If you have to “hop” to the database server via some login host, one possible setup could look like this:
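    ssh -L 63333:db.foo.com:5432 joe@shell.foo.com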
Note that this way the connection from shell.foo.com
to db.foo.com
will not be encrypted by the SSH tunnel. SSH offers quite a few configuration possibilities when the network is restricted in various ways. Please refer to the SSH documentation for details.
Several other applications exist that can provide secure tunnels using a procedure similar in concept to the one just described.
This section discusses how to upgrade your database data from one PostgreSQL release to a newer one.
Current PostgreSQL version numbers consist of a major and a minor version number. For example, in the version number 10.1, the 10 is the major version number and the 1 is the minor version number, meaning this would be the first minor release of the major release 10. For releases before PostgreSQL version 10.0, version numbers consist of three numbers, for example, 9.5.3. In those cases, the major version consists of the first two digit groups of the version number, e.g., 9.5, and the minor version is the third number, e.g., 3, meaning this would be the third minor release of the major release 9.5.
Minor releases never change the internal storage format and are always compatible with earlier and later minor releases of the same major version number. For example, version 10.1 is compatible with version 10.0 and version 10.6. Similarly, for example, 9.5.3 is compatible with 9.5.0, 9.5.1, and 9.5.6. To update between compatible versions, you simply replace the executables while the server is down and restart the server. The data directory remains unchanged — minor upgrades are that simple.
For major releases of PostgreSQL, the internal data storage format is subject to change, thus complicating upgrades. The traditional method for moving data to a new major version is to dump and reload the database, though this can be slow. A faster method is pg_upgrade. Replication methods are also available, as discussed below.
New major versions also typically introduce some user-visible incompatibilities, so application programming changes might be required. All user-visible changes are listed in the release notes (Appendix E); pay particular attention to the section labeled "Migration". If you are upgrading across several major versions, be sure to read the release notes for each intervening version.
Cautious users will want to test their client applications on the new version before switching over fully; therefore, it's often a good idea to set up concurrent installations of old and new versions. When testing a PostgreSQL major upgrade, consider the following categories of possible changes:

Administration
The capabilities available for administrators to monitor and control the server often change and improve in each major release.

SQL
Typically this includes new SQL command capabilities and not changes in behavior, unless specifically mentioned in the release notes.

Library API
Typically libraries like libpq only add new functionality, again unless mentioned in the release notes.

System Catalogs
System catalog changes usually only affect database management tools.

Server C-language API
This involves changes in the backend function API, which is written in the C programming language. Such changes affect code that references backend functions deep inside the server.
One upgrade method is to dump data from one major version of PostgreSQL and reload it in another — to do this, you must use a logical backup tool like pg_dumpall; file system level backup methods will not work. (There are checks in place that prevent you from using a data directory with an incompatible version of PostgreSQL, so no great harm can be done by trying to start the wrong server version on a data directory.)
It is recommended that you use the pg_dump and pg_dumpall programs from the newer version of PostgreSQL, to take advantage of enhancements that might have been made in these programs. Current releases of the dump programs can read data from any server version back to 7.0.
These instructions assume that your existing installation is under the /usr/local/pgsql directory, and that the data area is in /usr/local/pgsql/data. Substitute your paths appropriately.
If making a backup, make sure that your database is not being updated. This does not affect the integrity of the backup, but the changed data would of course not be included. If necessary, edit the permissions in the file /usr/local/pgsql/data/pg_hba.conf
(or equivalent) to disallow access from everyone except you. See Chapter 20 for additional information on access control.
To back up your database installation, type:
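    pg_dumpall > outputfile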
To make the backup, you can use the pg_dumpall command from the version you are currently running; see Section 25.1.2 for more details. For best results, however, try to use the pg_dumpall command from PostgreSQL 12.2, since this version contains bug fixes and improvements over older versions. While this advice might seem idiosyncratic since you haven't installed the new version yet, it is advisable to follow it if you plan to install the new version in parallel with the old version. In that case you can complete the installation normally and transfer the data later. This will also decrease the downtime.
Shut down the old server:
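    pg_ctl stop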
On systems that have PostgreSQL started at boot time, there is probably a start-up file that will accomplish the same thing. For example, on a Red Hat Linux system one might find that this works:
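    /etc/rc.d/init.d/postgresql stop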
See Chapter 18 for details about starting and stopping the server.
If restoring from backup, rename or delete the old installation directory if it is not version-specific. It is a good idea to rename the directory, rather than delete it, in case you have trouble and need to revert to it. Keep in mind the directory might consume significant disk space. To rename the directory, use a command like this:
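    mv /usr/local/pgsql /usr/local/pgsql.old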
(Be sure to move the directory as a single unit so relative paths remain unchanged.)
Install the new version of PostgreSQL as outlined in Section 16.4.
Create a new database cluster if needed. Remember that you must execute these commands while logged in to the special database user account (which you already have if you are upgrading).
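    /usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data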
Restore your previous pg_hba.conf and any postgresql.conf modifications.
Start the database server, again using the special database user account:
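    /usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data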
Finally, restore your data from backup with:
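    /usr/local/pgsql/bin/psql -d postgres -f outputfile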
using the new psql.
The least downtime can be achieved by installing the new server in a different directory and running both the old and the new servers in parallel, on different ports. Then you can use something like:
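    pg_dumpall -p 5432 | psql -d postgres -p 5433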
to transfer your data.
The pg_upgrade module allows an installation to be migrated in-place from one major PostgreSQL version to another. Upgrades can be performed in minutes, particularly with --link
mode. It requires steps similar to pg_dumpall above, e.g. starting/stopping the server, running initdb. The pg_upgrade documentation outlines the necessary steps.
It is also possible to use logical replication methods to create a standby server with the updated version of PostgreSQL. This is possible because logical replication supports replication between different major versions of PostgreSQL. The standby can be on the same computer or a different computer. Once it has synced up with the master server (running the older version of PostgreSQL), you can switch masters and make the standby the master and shut down the older database instance. Such a switch-over results in only several seconds of downtime for an upgrade.
This method of upgrading can be performed using the built-in logical replication facilities as well as using external logical replication systems such as pglogical, Slony, Londiste, and Bucardo.
PostgreSQL also has native support for using GSSAPI to encrypt client/server communications for increased security. Support requires that a GSSAPI implementation (such as MIT krb5) is installed on both client and server systems, and that support in PostgreSQL is enabled at build time (see Chapter 16).
The PostgreSQL server will listen for both normal and GSSAPI-encrypted connections on the same TCP port, and will negotiate with any connecting client on whether to use GSSAPI for encryption (and for authentication). By default, this decision is up to the client (which means it can be downgraded by an attacker); see Section 20.1 about setting up the server to require the use of GSSAPI for some or all connections.
Other than configuration of the negotiation behavior, GSSAPI encryption requires no setup beyond that which is necessary for GSSAPI authentication. (For more information on configuring that, see Section 20.6.)
shared_buffers (integer)
Sets the amount of memory the database server uses for shared memory buffers. The default is typically 128MB, but might be less if your kernel settings will not support it (as determined during initdb). This setting must be at least 128kB. (Non-default values of BLCKSZ change the minimum.) However, settings significantly higher than the minimum are usually needed for good performance. This parameter can only be set at server start.

If you have a dedicated database server with 1GB or more of RAM, a reasonable starting value for shared_buffers is 25% of the memory in your system. There are some workloads where even larger settings for shared_buffers are effective, but because PostgreSQL also relies on the operating system cache, it is unlikely that an allocation of more than 40% of RAM to shared_buffers will work better than a smaller amount. Larger settings for shared_buffers usually require a corresponding increase in max_wal_size, in order to spread out the process of writing large quantities of new or changed data over a longer period of time.

On systems with less than 1GB of RAM, a smaller percentage of RAM is appropriate, so as to leave adequate space for the operating system.
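As a rough illustration of the 25% guideline (the numbers below are hypothetical, not values taken from this manual), a dedicated server with 16GB of RAM might start from settings like these in postgresql.conf:

    shared_buffers = 4GB        # about 25% of RAM; illustrative value
    max_wal_size = 4GB          # raised along with shared_buffers; illustrative value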
huge_pages (enum)
Enables/disables the use of huge memory pages. Valid values are try (the default), on, and off.

At present, this feature is supported only on Linux. The setting is ignored on other systems when set to try.

The use of huge pages results in smaller page tables and less CPU time spent on memory management, increasing performance. For more details, see Section 18.4.5.

With huge_pages set to try, the server will try to use huge pages, but fall back to using normal allocation if that fails. With on, failure to use huge pages will prevent the server from starting up. With off, huge pages will not be used.
temp_buffers (integer)
Sets the maximum amount of memory used for temporary buffers within each database session. These are session-local buffers used only for access to temporary tables. The default is 8MB. The setting can be changed within individual sessions, but only before the first use of temporary tables within the session; subsequent attempts to change the value will have no effect on that session.

A session will allocate temporary buffers as needed up to the limit given by temp_buffers. The cost of setting a large value in sessions that do not actually need many temporary buffers is only a buffer descriptor, or about 64 bytes, per increment in temp_buffers. However, if a buffer is actually used, an additional 8192 bytes will be consumed for it (or in general, BLCKSZ bytes).
max_prepared_transactions (integer)
Sets the maximum number of transactions that can be in the “prepared” state simultaneously (see PREPARE TRANSACTION). Setting this parameter to zero (which is the default) disables the prepared-transaction feature. This parameter can only be set at server start.

If you are not planning to use prepared transactions, this parameter should be set to zero to prevent accidental creation of prepared transactions. If you are using prepared transactions, you will probably want max_prepared_transactions to be at least as large as max_connections, so that every session can have a prepared transaction pending.

When running a standby server, you must set this parameter to the same or higher value than on the master server. Otherwise, queries will not be allowed in the standby server.
work_mem (integer)
Specifies the amount of memory to be used by internal sort operations and hash tables before writing to temporary disk files. The value defaults to 4MB. Note that for a complex query, several sort or hash operations might be running in parallel; each operation will be allowed to use as much memory as this value specifies before it starts to write data into temporary files. Also, several running sessions could be doing such operations concurrently. Therefore, the total memory used could be many times the value of work_mem; it is necessary to keep this fact in mind when choosing the value. Sort operations are used for ORDER BY, DISTINCT, and merge joins. Hash tables are used in hash joins, hash-based aggregation, and hash-based processing of IN subqueries.
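Because the value can be changed per session, a session that is about to run one large sort can raise it locally instead of globally; a minimal sketch (the size shown is illustrative):

    SET work_mem = '64MB';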
maintenance_work_mem (integer)
Specifies the maximum amount of memory to be used by maintenance operations, such as VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY. It defaults to 64MB. Since only one of these operations can be executed at a time by a database session, and there are normally not many of them running concurrently, it's safe to set this value significantly larger than work_mem. Larger settings might improve performance for vacuuming and for restoring database dumps.

Note that when autovacuum runs, up to autovacuum_max_workers times this memory may be allocated, so be careful not to set the default value too high. It may be useful to control for this by separately setting autovacuum_work_mem.
replacement_sort_tuples (integer)
When the number of tuples to be sorted is smaller than this number, a sort will produce its first output run using replacement selection rather than quicksort. This may be useful in memory-constrained environments where tuples that are input into larger sort operations have a strong physical-to-logical correlation. Note that this does not include input tuples with an inverse correlation. It is possible for the replacement selection algorithm to generate one long run that requires no merging, where use of the default strategy would result in many runs that must be merged to produce a final sorted output. This may allow sort operations to complete sooner.

The default is 150,000 tuples. Note that higher values are typically not much more effective, and may be counter-productive, since the priority queue is sensitive to the size of the available CPU cache, whereas the default strategy sorts runs using a cache-oblivious algorithm. This property allows the default sort strategy to automatically and transparently make effective use of the available CPU cache.

Setting maintenance_work_mem to its default value usually prevents utility command external sorts (e.g., sorts used by CREATE INDEX to build B-Tree indexes) from ever using replacement selection sort, unless the input tuples are quite wide.
autovacuum_work_mem (integer)
Specifies the maximum amount of memory to be used by each autovacuum worker process. It defaults to -1, indicating that the value of maintenance_work_mem should be used instead. The setting has no effect on the behavior of VACUUM when run in other contexts.
max_stack_depth (integer)
Specifies the maximum safe depth of the server's execution stack. The ideal setting for this parameter is the actual stack size limit enforced by the kernel (as set by ulimit -s or local equivalent), less a safety margin of a megabyte or so. The safety margin is needed because the stack depth is not checked in every routine in the server, but only in key potentially-recursive routines such as expression evaluation. The default setting is 2MB, which is conservatively small and unlikely to risk crashes. However, it might be too small to allow execution of complex functions. Only superusers can change this setting.

Setting max_stack_depth higher than the actual kernel limit will mean that a runaway recursive function can crash an individual backend process. On platforms where PostgreSQL can determine the kernel limit, the server will not allow this variable to be set to an unsafe value. However, not all platforms provide the information, so caution is recommended in selecting a value.
dynamic_shared_memory_type (enum)
Specifies the dynamic shared memory implementation that the server should use. Possible values are posix (for POSIX shared memory allocated using shm_open), sysv (for System V shared memory allocated via shmget), windows (for Windows shared memory), mmap (to simulate shared memory using memory-mapped files stored in the data directory), and none (to disable this feature). Not all values are supported on all platforms; the first supported option is the default for that platform. The use of the mmap option, which is not the default on any platform, is generally discouraged because the operating system may write modified pages back to disk repeatedly, increasing system I/O load; however, it may be useful for debugging, when the pg_dynshmem directory is stored on a RAM disk, or when other shared memory facilities are not available.
temp_file_limit (integer)
Specifies the maximum amount of disk space that a process can use for temporary files, such as sort and hash temporary files, or the storage file for a held cursor. A transaction attempting to exceed this limit will be canceled. The value is specified in kilobytes, and -1 (the default) means no limit. Only superusers can change this setting.

This setting constrains the total space used at any instant by all temporary files used by a given PostgreSQL process. It should be noted that disk space used for explicit temporary tables, as opposed to temporary files used behind-the-scenes in query execution, does not count against this limit.
max_files_per_process (integer)
Sets the maximum number of simultaneously open files allowed to each server subprocess. The default is one thousand files. If the kernel is enforcing a safe per-process limit, you don't need to worry about this setting. But on some platforms (notably, most BSD systems), the kernel will allow individual processes to open many more files than the system can actually support if many processes all try to open that many files. If you find yourself seeing “Too many open files” failures, try reducing this setting. This parameter can only be set at server start.
During the execution of VACUUM and ANALYZE commands, the system maintains an internal counter that keeps track of the estimated cost of the various I/O operations that are performed. When the accumulated cost reaches a limit (specified by vacuum_cost_limit), the process performing the operation will sleep for a short period of time, as specified by vacuum_cost_delay. Then it will reset the counter and continue execution.

The intent of this feature is to allow administrators to reduce the I/O impact of these commands on concurrent database activity. There are many situations where it is not important that maintenance commands like VACUUM and ANALYZE finish quickly; however, it is usually very important that these commands do not significantly interfere with the ability of the system to perform other database operations. Cost-based vacuum delay provides a way for administrators to achieve this.

This feature is disabled by default for manually issued VACUUM commands. To enable it, set the vacuum_cost_delay variable to a nonzero value.
vacuum_cost_delay (integer)
The length of time, in milliseconds, that the process will sleep when the cost limit has been exceeded. The default value is zero, which disables the cost-based vacuum delay feature. Positive values enable cost-based vacuuming. Note that on many systems, the effective resolution of sleep delays is 10 milliseconds; setting vacuum_cost_delay to a value that is not a multiple of 10 might have the same results as setting it to the next higher multiple of 10.

When using cost-based vacuuming, appropriate values for vacuum_cost_delay are usually quite small, perhaps 10 or 20 milliseconds. Adjusting vacuum's resource consumption is best done by changing the other vacuum cost parameters.
vacuum_cost_page_hit (integer)
The estimated cost for vacuuming a buffer found in the shared buffer cache. It represents the cost to lock the buffer pool, look up the shared hash table, and scan the content of the page. The default value is 1.
vacuum_cost_page_miss (integer)
The estimated cost for vacuuming a buffer that has to be read from disk. This represents the effort to lock the buffer pool, look up the shared hash table, read the desired block in from the disk, and scan its content. The default value is 10.
vacuum_cost_page_dirty (integer)
The estimated cost charged when vacuum modifies a block that was previously clean. It represents the extra I/O required to flush the dirty block out to disk again. The default value is 20.
vacuum_cost_limit (integer)
The accumulated cost that will cause the vacuuming process to sleep. The default value is 200.

Note that there are certain operations that hold critical locks and should therefore complete as quickly as possible. Cost-based vacuum delays do not occur during such operations, so it is possible for the cost to accumulate far higher than the specified limit. To avoid uselessly long delays in such cases, the actual delay is calculated as vacuum_cost_delay * accumulated_balance / vacuum_cost_limit, with a maximum of vacuum_cost_delay * 4.
There is a separate server process called the background writer, whose function is to issue writes of “dirty” (new or modified) shared buffers. It writes shared buffers so that server processes handling user queries seldom or never need to wait for a write to occur. However, the background writer does cause a net overall increase in I/O load, because while a repeatedly-dirtied page might otherwise be written only once per checkpoint interval, the background writer might write it several times as it is dirtied in the same interval. The parameters discussed in this subsection can be used to tune the behavior for local needs.
bgwriter_delay (integer)
Specifies the delay between activity rounds for the background writer. In each round the writer issues writes for some number of dirty buffers (controllable by the following parameters). It then sleeps for bgwriter_delay milliseconds, and repeats. When there are no dirty buffers in the buffer pool, though, it goes into a longer sleep regardless of bgwriter_delay. The default value is 200 milliseconds. Note that on many systems, the effective resolution of sleep delays is 10 milliseconds; setting bgwriter_delay to a value that is not a multiple of 10 might have the same results as setting it to the next higher multiple of 10. This parameter can only be set in the postgresql.conf file or on the server command line.
bgwriter_lru_maxpages (integer)
In each round, no more than this many buffers will be written by the background writer. Setting this to zero disables background writing. (Note that checkpoints, which are managed by a separate, dedicated auxiliary process, are unaffected.) The default value is 100 buffers. This parameter can only be set in the postgresql.conf file or on the server command line.
bgwriter_lru_multiplier (floating point)
The number of dirty buffers written in each round is based on the number of new buffers that have been needed by server processes during recent rounds. The average recent need is multiplied by bgwriter_lru_multiplier to arrive at an estimate of the number of buffers that will be needed during the next round. Dirty buffers are written until there are that many clean, reusable buffers available. (However, no more than bgwriter_lru_maxpages buffers will be written per round.) Thus, a setting of 1.0 represents a “just in time” policy of writing exactly the number of buffers predicted to be needed. Larger values provide some cushion against spikes in demand, while smaller values intentionally leave writes to be done by server processes. The default is 2.0. This parameter can only be set in the postgresql.conf file or on the server command line.
bgwriter_flush_after (integer)
Whenever more than bgwriter_flush_after bytes have been written by the background writer, attempt to force the OS to issue these writes to the underlying storage. Doing so will limit the amount of dirty data in the kernel's page cache, reducing the likelihood of stalls when an fsync is issued at the end of a checkpoint, or when the OS writes data back in larger batches in the background. Often that will result in greatly reduced transaction latency, but there also are some cases, especially with workloads that are bigger than shared_buffers but smaller than the OS's page cache, where performance might degrade. This setting may have no effect on some platforms. The valid range is between 0, which disables forced writeback, and 2MB. The default is 512kB on Linux, 0 elsewhere. (If BLCKSZ is not 8kB, the default and maximum values scale proportionally to it.) This parameter can only be set in the postgresql.conf file or on the server command line.
Smaller values of bgwriter_lru_maxpages and bgwriter_lru_multiplier reduce the extra I/O load caused by the background writer, but make it more likely that server processes will have to issue writes for themselves, delaying interactive queries.
effective_io_concurrency (integer)
Sets the number of concurrent disk I/O operations that PostgreSQL expects can be executed simultaneously. Raising this value will increase the number of I/O operations that any individual PostgreSQL session attempts to initiate in parallel. The allowed range is 1 to 1000, or zero to disable issuance of asynchronous I/O requests. Currently, this setting only affects bitmap heap scans.

For magnetic drives, a good starting point for this setting is the number of separate drives comprising a RAID 0 stripe or RAID 1 mirror being used for the database. (For RAID 5 the parity drive should not be counted.) However, if the database is often busy with multiple queries issued in concurrent sessions, lower values may be sufficient to keep the disk array busy. A value higher than needed to keep the disks busy will only result in extra CPU overhead. SSDs and other memory-based storage can often process many concurrent requests, so the best value might be in the hundreds.

Asynchronous I/O depends on an effective posix_fadvise function, which some operating systems lack. If the function is not present then setting this parameter to anything but zero will result in an error. On some operating systems (e.g., Solaris), the function is present but does not actually do anything.

The default is 1 on supported systems, otherwise 0. This value can be overridden for tables in a particular tablespace by setting the tablespace parameter of the same name (see ALTER TABLESPACE).
max_worker_processes (integer)
Sets the maximum number of background processes that the system can support. This parameter can only be set at server start. The default is 8.

When running a standby server, you must set this parameter to the same or higher value than on the master server. Otherwise, queries will not be allowed in the standby server.

When changing this value, consider also adjusting max_parallel_workers and max_parallel_workers_per_gather.
max_parallel_workers_per_gather (integer)
Sets the maximum number of workers that can be started by a single Gather or Gather Merge node. Parallel workers are taken from the pool of processes established by max_worker_processes, limited by max_parallel_workers. Note that the requested number of workers may not actually be available at run time. If this occurs, the plan will run with fewer workers than expected, which may be inefficient. The default value is 2. Setting this value to 0 disables parallel query execution.

Note that parallel queries may consume very substantially more resources than non-parallel queries, because each worker process is a completely separate process which has roughly the same impact on the system as an additional user session. This should be taken into account when choosing a value for this setting, as well as when configuring other settings that control resource utilization, such as work_mem. Resource limits such as work_mem are applied individually to each worker, which means the total utilization may be much higher across all processes than it would normally be for any single process. For example, a parallel query using 4 workers may use up to 5 times as much CPU time, memory, I/O bandwidth, and so forth as a query which uses no workers at all.

For more information on parallel query, see Chapter 15.
max_parallel_workers (integer)
Sets the maximum number of workers that the system can support for parallel queries. The default value is 8. When increasing or decreasing this value, consider also adjusting max_parallel_workers_per_gather. Also, note that a setting for this value which is higher than max_worker_processes will have no effect, since parallel workers are taken from the pool of worker processes established by that setting.
backend_flush_after (integer)
Whenever more than backend_flush_after bytes have been written by a single backend, attempt to force the OS to issue these writes to the underlying storage. Doing so will limit the amount of dirty data in the kernel's page cache, reducing the likelihood of stalls when an fsync is issued at the end of a checkpoint, or when the OS writes data back in larger batches in the background. Often that will result in greatly reduced transaction latency, but there also are some cases, especially with workloads that are bigger than shared_buffers but smaller than the OS's page cache, where performance might degrade. This setting may have no effect on some platforms. The valid range is between 0, which disables forced writeback, and 2MB. The default is 0, i.e., no forced writeback. (If BLCKSZ is not 8kB, the maximum value scales proportionally to it.)
old_snapshot_threshold (integer)
Sets the minimum time that a snapshot can be used without risk of a “snapshot too old” error occurring when using the snapshot. This parameter can only be set at server start.

Beyond the threshold, old data may be vacuumed away. This can help prevent bloat in the face of snapshots which remain in use for a long time. To prevent incorrect results due to cleanup of data which would otherwise be visible in the snapshot, an error is generated when the snapshot is older than this threshold and the snapshot is used to read a page which has been modified since the snapshot was built.

A value of -1 disables this feature, and is the default. Useful values for production work probably range from a small number of hours to a few days. The setting will be coerced to a granularity of minutes, and small numbers (such as 0 or 1 minute) are only allowed because they may sometimes be useful for testing. While a setting as high as 60d is allowed, please note that in many workloads extreme bloat or transaction ID wraparound may occur in much shorter time frames.

When this feature is enabled, freed space at the end of a relation cannot be released to the operating system, since that could remove information needed to detect the “snapshot too old” condition. All space allocated to a relation remains associated with that relation for reuse only within that relation unless explicitly freed (for example, with VACUUM FULL).

This setting does not attempt to guarantee that an error will be generated under any particular circumstances. In fact, if the correct results can be generated from (for example) a cursor which has materialized a result set, no error will be generated even if the underlying rows in the referenced table have been vacuumed away. Some tables cannot safely be vacuumed early, and so will not be affected by this setting, such as system catalogs. For such tables this setting will neither reduce bloat nor create a possibility of a “snapshot too old” error on scanning.
To register a Windows event log library with the operating system, issue this command:
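    regsvr32 pgsql_library_directory/pgevent.dll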
This creates registry entries used by the event viewer, under the default event source named PostgreSQL.
To specify a different event source name (see event_source), use the /n and /i options:
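    regsvr32 /n /i:event_source_name pgsql_library_directory/pgevent.dll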
要從作業系統註銷事件日誌,請使用以下指令:
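regsvr32 /u [/i:event_source_name] pgsql_library_directory/pgevent.dll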
要在資料庫伺服器中啟用事件日誌記錄,請修改 postgresql.conf 中的 log_destination ,使其包含 eventlog。
search_path
(string
)這個參數表示,當一個物件(資料表、資料型別、函數等)以未指定 schema 的簡單名稱引用時,其搜尋的路徑順序。當不同 schema 中有相同名稱的物件時,將採用搜尋路徑中第一個找到的物件。不在搜尋路徑中的任何 schema 中物件,就只能透過使用限定名稱來指定其 schema 來引用。
search_path 的內容必須是逗號分隔的 schema 名稱列表。任何非現有 schema 的名稱,或是使用者不具有 USAGE 權限的 schema,都將被忽略。
如果其中一個項目是特殊名稱 $user,則會使用 SESSION_USER 回傳的名稱作為 schema 名稱,確認該 schema 存在且使用者具有 USAGE 權限。 (如果沒有權限,$user 將被忽略。)
系統目錄 pg_catalog 一定會被搜尋,無論是否列在搜尋路徑中。如果列在搜尋路徑中了,那麼它將按照指定的順序被搜尋。 如果 pg_catalog 不在搜尋路徑中,那麼它將會優先被搜尋。
同樣地,目前連線的臨時資料表 schema(pg_temp_nnn)如果存在,就一定會被搜尋。它可以透過別名 pg_temp 明確列在搜尋路徑中。如果沒有列出,則會優先搜尋(甚至在 pg_catalog 之前)。但是,臨時 schema 只會用於搜尋關連(資料表、view、序列等)和資料型別名稱,不會用於搜尋函數或運算子名稱。
建立物件時沒有指定特定的 schema,那麼它們將被放置在 search_path 中的第一個有效 schema 中。如果搜尋路徑為空,則會產生錯誤。
這個參數的預設值是 "$user", public。此設定可支援共享資料庫的使用方式(所有使用者都沒有私有 schema,全部共用 public)、每個使用者各有私有 schema 的方式,以及兩者的組合。其他需求也可以透過變更預設的搜尋路徑設定來達成,無論是全域或每個使用者各別設定。
搜尋路徑的目前內容可以使用 SQL 函數 current_schemas 來檢查(詳見 9.25 節)。這與檢查 search_path 的內容並不完全相同,因為 current_schemas 表示 search_path 中出現的項目是如何解析的。
有關 schema 處理的更多訊息,請參見第 5.9 節。
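以下是一個簡單的操作示例(schema 名稱 myschema 僅為假設,需事先存在):
SET search_path TO myschema, public;
SHOW search_path;             -- myschema, public
SELECT current_schemas(true); -- 顯示解析後的搜尋路徑(含隱含的 pg_catalog)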
row_security
(boolean
)此參數控制是否以產生錯誤來替代套用資料列安全原則。設定為 on 時,安全原則以正常方式運作。設定為 off 時,凡是會套用至少一個安全原則的查詢都會失敗。預設值為 on。當資料列可視性受限可能導致不正確的結果時,應改為 off;例如,pg_dump 預設就會做此變更。此參數對於可以繞過所有安全原則的角色,也就是超級使用者和具有 BYPASSRLS 屬性的角色,不會產生影響。
有關於資料列安全原則的更多訊息,請參閱 CREATE POLICY。
default_tablespace
(string
)此參數指定當 CREATE 指令未明確指定資料表空間(tablespace)時,建立物件(資料表和索引)所使用的預設資料表空間。
該值可以是資料表空間的名稱,也可以是空字串,表示使用目前資料庫的預設資料表空間。如果該值不符合任何現有資料表空間的名稱,PostgreSQL 將自動使用目前資料庫的預設資料表空間。如果指定了非預設的資料表空間,則使用者必須對其具有 CREATE 權限,否則建立操作將會失敗。
這個參數不用於臨時資料表;對於臨時資料表來說,會參考 temp_tablespaces 參數。
建立資料庫時也不會使用這個參數。預設情況下,新的資料庫會從作為其複製來源的樣板資料庫繼承資料表空間設定。
有關於資料表空間的更多資訊,請參閱第 22.6 節。
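以下是一個簡單的示例(資料表空間名稱 fastspace 僅為假設,須先以 CREATE TABLESPACE 建立):
SET default_tablespace = 'fastspace';
CREATE TABLE foo (i int);    -- 會建立在 fastspace 中
SET default_tablespace = ''; -- 改回目前資料庫的預設資料表空間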
temp_tablespaces
(string
)此參數指定當 CREATE 指令未明確指定資料表空間時,建立臨時物件(臨時資料表和臨時資料表上的索引)所使用的資料表空間。用於排序大型資料集的臨時檔案也會建立在這些資料表空間中。
該值是資料表空間名稱的列表。當列表中有多個名稱時,PostgreSQL 在每次建立臨時物件時會隨機選擇一個列表成員;但在同一個交易中,連續建立的臨時物件會依序放置在列表中接續的資料表空間。如果列表中被選到的元素是空字串,PostgreSQL 將自動使用目前資料庫的預設資料表空間。
設定 temp_tablespaces 時,指定一個不存在的資料表空間會造成錯誤,指定一個使用者沒有 CREATE 權限的資料表空間也是如此。但是,使用先前已設定好的值時,不存在的資料表空間會被忽略,使用者缺少 CREATE 權限的資料表空間也會被忽略。特別是,使用 postgresql.conf 中所設定的值時,就適用此規則。
預設值是空字串,這將使所有臨時物件建立在目前資料庫的預設資料表空間中。
另請參閱本頁的 default_tablespace。
check_function_bodies
(boolean
)這個參數通常是啓用(on)的。如果關閉(off),將在 CREATE FUNCTION 時停用函數內容的驗證。停用驗證可避免驗證過程的副作用,也可避免由於前向引用(forward reference)等問題所導致的誤報。例如,在代替其他使用者載入函數之前,請將此參數設為 off;pg_dump 會自動這麼做。
default_transaction_isolation
(enum
)每個 SQL 交易都有一個隔離等級,可以是「read uncommitted」、「read committed」、「repeatable read」或「serializable」。此參數控制每個新交易的預設隔離等級。預設是「read committed」。
請參閱第 13 章和 SET TRANSACTION 以取得更多訊息。
default_transaction_read_only
(boolean
)唯讀的 SQL 交易無法更動非臨時的資料表。此參數控制每個新交易是否預設為唯讀狀態。預設是關閉(off)的(可讀/可寫)。
請參閱 SET TRANSACTION 以取得更多訊息。
default_transaction_deferrable
(boolean
)以 serializable 隔離等級執行時,可延遲的唯讀 SQL 交易可能會被延遲,稍後才允許開始執行。但是,一旦開始執行,就不會產生確保可序列化所需的任何額外成本;因此序列化邏輯沒有理由因為並行更新而強制中止該交易,這使得這個選項適合用於長時間執行的唯讀交易。
此參數控制每個新交易查詢的預設可延期狀態。它目前對讀寫交易或者低於 serializable 隔離等級的操作沒有影響。預設是關閉(off)的。
請參閱 SET TRANSACTION 以取得更多訊息。
session_replication_role
(enum
)控制目前連線中與複寫相關的觸發器和規則的觸發。設定此參數需要超級使用者權限,並且會導致捨棄任何先前快取的查詢計劃。可能的值是 origin(預設)、replica 和 local。有關更多訊息,請參閱 ALTER TABLE。
statement_timeout
(integer
)任何指令執行超過指定的時間時,就會中止其執行。時間單位為毫秒,從伺服器收到指令時起算。如果 log_min_error_statement 設定為 ERROR 或更低的等級,則超時的查詢語句將被記錄下來。設定為零(預設值)則停用此功能。
不建議在 postgresql.conf 中設定 statement_timeout,因為它會影響所有的連線。
lock_timeout
(integer
)當嘗試鎖定資料表、索引、資料列或其他資料庫物件時,任何等待超過指定毫秒數的語句都會被中止。時間限制會分別適用於每次鎖定取得的嘗試。此限制適用於明確的鎖定請求(例如 LOCK TABLE,或不帶 NOWAIT 的 SELECT FOR UPDATE)以及隱含的鎖定請求。如果將 log_min_error_statement 設定為 ERROR 或更低的等級,則會記錄超時的查詢語句。設定為零(預設值)則停用此功能。
與 statement_timeout 不同,這個超時設定只會在等待鎖定的時候有作用。請注意,如果 statement_timeout 不為零,則將 lock_timeout 設定為相同或更大的值是毫無意義的,因為查詢語句超時總是會首先觸發。
不建議在 postgresql.conf 中設定 lock_timeout,因為這會影響所有的連線。
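以下是在單一連線中搭配設定這兩個逾時的簡單示例(數值僅為假設):
SET statement_timeout = '5s'; -- 整個語句的執行上限
SET lock_timeout = '2s';      -- 單次鎖定等待的上限,應小於 statement_timeout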
idle_in_transaction_session_timeout
(integer
)任何帶有未完成交易且閒置超過指定持續時間(以毫秒為單位)的連線將會被終止。這會釋放該連線所持有的任何鎖定,並使連線插槽可以重新使用;也讓只有此交易才能看見的 tuple 可以被清理。有關這方面的更多細節,請參閱第 24.1 節。
預設值 0 表停用此功能。
vacuum_freeze_table_age
(integer
)如果資料表的 pg_class.relfrozenxid 欄位值已達到此設定指定的年齡,VACUUM 將執行主動掃描。主動掃描不同於一般的 VACUUM,因為它會走訪每個可能包含未凍結 XID 或 MXID 的頁面,而不僅僅是那些可能包含廢棄 tuple 的頁面。預設是 1.5 億筆交易。儘管使用者可以設定 0 到 20 億之間的任何值,但 VACUUM 會默默地將有效值限制為 autovacuum_freeze_max_age 的 95%,以便在針對該資料表啟動 anti-wraparound 自動清理之前,定期的手動 VACUUM 有機會執行。欲了解更多訊息,請參閱第 24.1.5 節。
vacuum_freeze_min_age
(integer
)指定 VACUUM 在掃描資料表時,用來決定是否凍結資料列版本的截止年齡(以交易數計)。預設是 5000 萬筆交易。儘管使用者可以設定 0 到 10 億之間的任何值,但 VACUUM 會默默地將有效值限制為 autovacuum_freeze_max_age 值的一半,以免強制的自動清理之間的時間間隔過短而不合理。欲了解更多訊息,請參閱第 24.1.5 節。
vacuum_multixact_freeze_table_age
(integer
)如果資料表的 pg_class.relminmxid 欄位值已達到此設定指定的年齡,VACUUM 將執行主動掃描。主動掃描不同於一般的 VACUUM,因為它會走訪每個可能包含未凍結 XID 或 MXID 的頁面,而不僅僅是那些可能包含廢棄 tuple 的頁面。預設值是 1.5 億個 multixact。儘管使用者可以設定 0 到 20 億之間的任何值,但 VACUUM 會默默地將有效值限制為 autovacuum_multixact_freeze_max_age 的 95%,以便在針對該資料表啟動 anti-wraparound 自動清理之前,定期的手動 VACUUM 有機會執行。欲了解更多訊息,請參閱第 24.1.5 節。
vacuum_multixact_freeze_min_age
(integer
)指定 VACUUM 在掃描資料表時,用來決定是否以較新的 transaction ID 或 multixact ID 替換 multixact ID 的截止年齡(以 multixact 數計)。預設是 500 萬個 multixact。儘管使用者可以設定 0 到 10 億之間的任何值,但 VACUUM 會默默地將有效值限制為 autovacuum_multixact_freeze_max_age 值的一半,以免強制的自動清理之間的時間間隔過短而不合理。欲了解更多訊息,請參閱第 24.1.5.1 節。
bytea_output
(enum
)設定 bytea 型別值的預設輸出格式。合法的值為 hex(預設)和 escape(傳統的 PostgreSQL 格式)。請參閱第 8.4 節取得更多資訊。無論這個設定如何,bytea 型別在輸入時,兩種格式都能接受。
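以下是兩種輸出格式的簡單對照示例:
SET bytea_output = 'hex';
SELECT '\xDEADBEEF'::bytea;  -- \xdeadbeef
SET bytea_output = 'escape';
SELECT '\xDEADBEEF'::bytea;  -- \336\255\276\357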
xmlbinary
(enum
)設定如何在 XML 中編碼二進位數值。例如,當 bytea 值被函數 xmlelement 或 xmlforest 轉換為XML時,就適用這個設定。可以使用的值是 base64 和 hex,都是在 XML Schema 標準中定義的。 預設值是 base64。有關 XML 相關函數的更多訊息,請參閱第 9.14 節。
實際上的選擇主要是習慣問題,僅受限於客戶端應用程式中的可能限制。這兩種方法都支援所有可能的值,儘管 hex 編碼會比 base64 編碼稍大。
xmloption
(enum
)在 XML 和字串之間轉換時,設定是否隱含 DOCUMENT 或 CONTENT。請參閱 8.13 節的描述。有效值是 DOCUMENT 和 CONTENT。預設值是 CONTENT。
根據 SQL 標準,設定此選項的命令是
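SET XML OPTION { DOCUMENT | CONTENT };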
這個語法在 PostgreSQL 中也是可以使用的。
gin_pending_list_limit
(integer
)設定啟用 fastupdate 時可使用的 GIN 待處理列表(pending list)的最大容量。如果列表增長超過這個最大容量,就會將其中的項目整批移動到主要的 GIN 資料結構中來清除它。預設值是 4MB。透過變更索引的儲存參數,可以為個別 GIN 索引覆寫此設定。有關更多訊息,請參閱第 64.4.1 節和第 64.5 節。
DateStyle
(string
)設定日期和時間內容的顯示格式,以及解釋模糊日期輸入的規則。由於歷史的因素,此參數包含兩個獨立的參數:輸出格式規範(ISO、Postgres、SQL 或 German)以及年/月/日次序(DMY、MDY 或 YMD)的輸入/輸出規範。它們可以單獨或一起設定。 關鍵字 Euro 和 European 是 DMY 的同義詞;關鍵字 US、NonEuro 和 NonEuropean 是 MDY 的同義詞。有關更多訊息,請參閱第 8.5 節。 內建的預設值是 ISO、MDY,但是 initdb 會以使用所選的 lc_time 語言環境相對應的設定來初始化設定內容。
IntervalStyle
(enum
)設定間隔時間值的顯示格式。設定為 sql_standard 時,將產生符合 SQL 標準間隔時間字面值的輸出。設定為 postgres(預設值)時,將產生與 DateStyle 參數設定為 ISO 時 8.4 之前的 PostgreSQL 版本相容的輸出。設定為 postgres_verbose 時,將產生與 DateStyle 參數設定為非 ISO 時 8.4 之前的 PostgreSQL 版本相容的輸出。設定為 iso_8601 時,將產生符合 ISO 8601 第 4.4.3.2 節所定義的時間間隔「帶標誌符格式」的輸出。
IntervalStyle 參數也會影響模糊的間隔時間輸入的解釋。有關更多訊息,請參閱第 8.5.4 節。
TimeZone
(string
)設定顯示和解釋時間戳記的時區。內建的預設值是 GMT,但通常會在 postgresql.conf 中被覆寫;initdb 將在安裝時取得其系統環境相對應的設定。 有關更多訊息,請參閱第 8.5.3 節。
timezone_abbreviations
(string
)設定伺服器接受日期時間輸入時所用的時區縮寫集合。預設是「Default」,這是一個在世界大部分地區都適用的集合;另外還有「Australia」和「India」,也可以為特定的安裝定義其他集合。更多訊息詳見 B.3 節。
extra_float_digits
(integer
)此參數調整顯示浮點數值的位數,包括 float4、float8 和地理資料型別。參數值會被加到標準位數之中(視情況為 FLT_DIG 或 DBL_DIG)。此值最高可以設定為 3,以包含部分有效的數字;這對於需要精確回存浮點數資料的情況特別有用。也可以將其設定為負數來減少不需要的位數。請另參閱第 8.1.3 節。
client_encoding
(string
)設定用戶端編碼(字元集)。預設是使用資料庫的編碼方式。在 23.3.1 節描述了 PostgreSQL 資料庫支援的字元集。
lc_messages
(string
)設定訊息顯示的語言。可接受的值取決於系統;關於更多訊息,請參閱第 23.1 節。如果此參數設定為空字串(預設值),則該值將以系統相關的方式從伺服器的執行環境中繼承。
在某些系統上,此語言環境類別並不存在。設定這個參數仍然可以運作,但不會有任何影響。此外,也可能還沒有用於所需語言翻譯的訊息。在這種情況下,你會繼續看到英文訊息。
只有系統管理者可以更改此設定,因為它會影響發送到伺服器日誌以及用戶端的訊息,而不正確的值可能會影響伺服器日誌的可讀性。
lc_monetary
(string
)設定用於格式化貨幣金額的區域配置,例如 to_char 系列函數。可接受的值取決於系統;關於更多訊息,請參閱第 23.1 節。如果此參數設定為空字串(預設值),則該值將以系統相關的方式從伺服器的執行環境中繼承。
lc_numeric
(string
)設定用於格式化數字的區域配置,例如 to_char 系列函數。可接受的值取決於系統;關於更多訊息,請參閱第 23.1 節。如果此參數設定為空字串(預設值),則該值將以系統相關的方式從伺服器的執行環境中繼承。
lc_time
(string
)設定用於格式化時間的區域配置,例如 to_char 系列函數。可接受的值取決於系統;關於更多訊息,請參閱第 23.1 節。如果此參數設定為空字串(預設值),則該值將以系統相關的方式從伺服器的執行環境中繼承。
default_text_search_config
(string
)選擇全文檢索的組態,供那些沒有明確指定組態參數的全文檢索函數使用。更多說明詳見第 12 章。內建的預設值為 pg_catalog.simple,但如果可以識別與 lc_ctype 語言環境匹配的組態,則 initdb 會以相對應的設定來初始化這個參數。
有幾個設定可用於將共享函式庫預先載入到伺服器中,以便載入額外的功能並取得效能上的優勢。例如,設定 '$libdir/mylib' 會將 mylib.so(在某些平台上是 mylib.sl)從安裝的標準函式庫目錄中預先載入。這些設定之間的差異主要在於它們何時生效,以及需要哪些權限才能變更它們。
PostgreSQL 的程序語言庫可以用這種方式預載,通常語法是 '$libdir/plXXX',其中 XXX 是 pgsql、perl、tcl 或 python。
只有專門用於 PostgreSQL 的共享函式庫才能以這種方式載入。每個支援 PostgreSQL 的函式庫都有一個「magic block」,它會被檢查以確保相容性。由於這個原因的關係,非 PostgreSQL 函式庫不能以這種方式載入。你可能可以使用作業系統的功能,例如 LD_PRELOAD。
一般來說,都需要詳閱函式庫本身的文件,以瞭解載入該函式庫的建議方法。
local_preload_libraries
(string
)此參數指定一個或多個要在連線啟動時預載的共享函式庫。它是逗號分隔的函式庫名稱列表,其中每個名稱都被以 LOAD 命令處理。 項目之間的空白都會被忽略;如果需要在名稱中包含空格或逗號,請用雙引號括住函式庫名稱。參數值僅在連線開始時生效。 後續更改都不起作用。如果未找到指定的函式庫,則連線嘗試將會失敗。
這個選項可以由任何使用者設定。因此,可以載入的函式庫僅限於出現在標準函式庫目錄的外掛目錄中的函式庫。 (資料庫管理員有責任確保在那裡只安裝了「安全的」函式庫。)local_preload_libraries 中的項目可以明確指定此目錄,例如 $libdir/plugins/mylib,或者只指定函式庫名稱 mylib 與 $libdir/plugins/mylib 具有相同的效果。
此功能的目的是允許非特權使用者將除錯或效能測量函式庫載入到特定的連線中,而不需要明確的 LOAD 命令。為此,通常會在用戶端使用 PGOPTIONS 環境變數,或透過 ALTER ROLE SET 來設定此參數。
但是,除非一個模組是專門設計用於非超級用戶的方式,否則這通常不適合使用。請參考使用 session_preload_libraries 參數。
session_preload_libraries
(string
)此參數指定一個或多個要在連線啟動時預先載入的共享函式庫。它是逗號分隔的函式庫名稱列表,其中每個名稱都會以 LOAD 命令的方式處理。項目之間的空白都會被忽略;如果需要在名稱中包含空格或逗號,請用雙引號括住函式庫名稱。參數值僅在連線開始時生效,後續變更不會有作用。如果未找到指定的函式庫,則連線嘗試將會失敗。只有超級使用者可以調整此參數。
此功能的目的是允許除錯或性能測試的函式庫載入到特定的連線中,而不需要指示明確的 LOAD 指令。例如,透過使用 ALTER ROLE SET 設定此參數,可以為指定用戶的所有連線啟用 auto_explain。此外,可以在不重新啟動服務的情況下更改此參數(但更改僅在啟動新的連線時生效),因此即使應用於所有連線,以這種方式增加新的模組也很容易。
與 shared_preload_libraries 不同,在連線啟動時載入函式庫,相較於第一次使用時才載入,並沒有很大的效能優勢。但是,使用連線池時會有一些優勢。
shared_preload_libraries
(string
)此參數指定一個或多個要在伺服器啟動時預先載入的共享函式庫。它是逗號分隔的函式庫名稱列表,其中每個名稱都會以 LOAD 命令的方式處理。項目之間的空白都會被忽略;如果需要在名稱中包含空格或逗號,請用雙引號括住函式庫名稱。參數值僅在伺服器啓動時生效,後續變更不會有作用。如果未找到指定的函式庫,則伺服器將無法啟動。
有些函式庫需要執行某些只能在 postmaster 啟動時才能執行的操作,例如分配共享記憶體,保留輕量級鎖定或啟動背景執行程序。 這些函式庫必須在伺服器啟動時通過此參數載入。有關詳細信息,請參閱各別函式庫的文件。
其他的函式庫也可以預先載入。透過預先載入共享函式庫,可以省去首次使用該函式庫時的啟動時間成本。但是,啟動每個新伺服器程序的時間可能會略有增加,即使該程序從不使用該函式庫。因此,此參數僅建議用於大多數連線都會使用的函式庫。另外,變更此參數需要重新啟動伺服器,因此不適用於短期除錯的需求,這種情況請改用 session_preload_libraries。
注意:在 Windows 主機上,在伺服器啟動時預先載入函式庫不會減少啟動每個新伺服器程序所需的時間;每個伺服器程序都會重新載入所有預先載入的函式庫。但是,如果需要某個函式庫在 postmaster 啓動時執行操作,shared_preload_libraries 在 Windows 主機上仍然是有用的。
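以下是一個簡單的設定示例(以常見的 pg_stat_statements 模組為例;變更後需重新啟動伺服器):
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';
-- 重新啟動伺服器後,即可在資料庫中執行 CREATE EXTENSION pg_stat_statements;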
dynamic_library_path
(string
)如果需要開啓一個可動態載入的模組,並且在 CREATE FUNCTION 或 LOAD 指令中使用沒有目錄名稱的模組檔案(即該名稱不包含斜線),系統將在此路徑中搜尋所需的檔案。
dynamic_library_path 的內容必須是由冒號(或在 Windows 上是分號)分隔的絕對路徑的列表。如果該列表項目以特殊字符串 $libdir 開頭,那麼編譯後的 PostgreSQL 函式庫目錄會被替換為 $libdir;這是安裝標準 PostgreSQL 發行版所提供的模組的路徑。(可以使用 pg_config --pkglibdir 查詢此目錄的路徑。)例如:
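dynamic_library_path = '/usr/local/lib/postgresql:/home/my_project/lib:$libdir'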
或者,在 Windows 環境中:
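dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'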
此參數的預設值是「$libdir」。如果此值設定為空字串,則將關閉自動路徑搜尋。
超級使用者可以在服務執行時更改此參數,但以這種方式完成的設定只會持續到用戶端連線結束,因此應將此方法保留用於開發階段使用。建議使用此參數的方式是在 postgresql.conf 設定檔中。
gin_fuzzy_search_limit
(integer
)由 GIN 索引掃描回傳集合大小的軟上限。詳情請參閱第 66.5 節。
有許多設定參數會影響資料庫系統的行為。在本章的第一部分中,我們將介紹設定參數的各種方式。接下來的部分將詳細討論每個參數。
While the server is running, it is not possible for a malicious user to take the place of the normal database server. However, when the server is down, it is possible for a local user to spoof the normal server by starting their own server. The spoof server could read passwords and queries sent by clients, but could not return any data because the PGDATA
directory would still be secure because of directory permissions. Spoofing is possible because any user can start a database server; a client cannot identify an invalid server unless it is specially configured.
One way to prevent spoofing of local
connections is to use a Unix domain socket directory (unix_socket_directories) that has write permission only for a trusted local user. This prevents a malicious user from creating their own socket file in that directory. If you are concerned that some applications might still reference /tmp
for the socket file and hence be vulnerable to spoofing, during operating system startup create a symbolic link /tmp/.s.PGSQL.5432
that points to the relocated socket file. You also might need to modify your /tmp
cleanup script to prevent removal of the symbolic link.
Another option for local
connections is for clients to use requirepeer
to specify the required owner of the server process connected to the socket.
To prevent spoofing on TCP connections, either use SSL certificates and make sure that clients check the server's certificate, or use GSSAPI encryption (or both, if they're on separate connections).
To prevent spoofing with SSL, the server must be configured to accept only hostssl
connections (Section 20.1) and have SSL key and certificate files (Section 18.9). The TCP client must connect using sslmode=verify-ca
or verify-full
and have the appropriate root certificate file installed (Section 33.18.1).
To prevent spoofing with GSSAPI, the server must be configured to accept only hostgssenc
connections (Section 20.1) and use gss
authentication with them. The TCP client must connect using gssencmode=require
.
這些設定控制內建的串流複寫功能的行為(請參閱第 26.2.5 節)。伺服器可以是主伺服器或備用伺服器。主伺服器可以發送資料,而備用伺服器始終是複寫資料的接收者。當使用串聯複寫(請參閱第 26.2.7 節)時,備用伺服器也可以同時是發送者和接收者。這些參數主要用於發送伺服器和備用伺服器,但某些參數僅在主伺服器上有意義。如有需要,叢集內各伺服器的設定可以不同,不會產生問題。
這些參數可以在任何會將複寫資料發送到一個或多個備用伺服器的伺服器上設定。主伺服器始終是發送伺服器,因此必須在主伺服器上設定這些參數。在備用伺服器成為主伺服器之後,這些參數的角色和意義也不會改變。
max_wal_senders
(integer
)
指定來自備用伺服器或串流備份用戶端的最大同時連線數(即同時執行的 WAL 發送程序的最大數量)。預設值為 10,設定為 0 表示停用複寫。WAL 發送程序也計入連線總數,因此此參數不能設定得高於 max_connections。串流用戶端突然中斷連線可能會導致連線插槽被遺留佔用,直到超時為止,因此此參數應設定為略高於預期用戶端的最大數量,以便中斷連線的用戶端可以立即重新連線。此參數只能在伺服器啟動時設定。另外,wal_level 必須設定為 replica 或更高等級,才能允許來自備用伺服器的連線。
max_replication_slots
(integer
)
指定伺服器可以支援的最大複寫槽數量(請參閱第 26.2.6 節)。預設值為 10。此參數只能在伺服器啟動時設定。必須將 wal_level 設定為 replica 或更高等級才能使用複寫槽。將其設定為低於目前現有複寫槽數量的值,將使伺服器無法啟動。
wal_keep_segments
(integer
)
指定保留在 pg_wal 目錄中的過往日誌檔案段落的最小數量,以防備用伺服器需要取得它們以進行串流複寫。每個段落通常為 16 MB。如果連線到發送伺服器的備用伺服器落後超過 wal_keep_segments 個段落,則發送伺服器可能會刪除備用伺服器仍需要的 WAL 段落,在這種情況下,複寫連線將會終止,下游連線最終也會失敗。(但是,如果正在使用 WAL 歸檔,備用伺服器可以透過從歸檔中取得段落來進行回復。)
這僅設定 pg_wal 中保留的最小段落數量;系統可能需要為 WAL 存檔保留更多段落或從檢查點回復。如果 wal_keep_segments 為零(預設值),則系統不會為備用目的保留任何額外的段落,因此備用伺服器可用的舊 WAL 段落數是上一個檢查點的位置和WAL 歸檔狀態的函數。此參數只能在 postgresql.conf 檔案或伺服器命令列中設定。
wal_sender_timeout
(integer
)
終止靜止狀態超過指定毫秒數的複寫連線。這對於發送伺服器檢測備用伺服器當機或網路斷線很有用。值為零會停用超時機制。此參數只能在 postgresql.conf 檔案或伺服器命令列中設定。預設值為 60 秒。
track_commit_timestamp
(boolean
)
記錄事務的提交時間。此參數只能在 postgresql.conf 檔案或伺服器命令列中設定。預設值為 off。
這些參數要在會將複寫資料發送到一個或多個備用伺服器的主要伺服器上設定。請注意,除了這些參數之外,還必須在主要伺服器上正確設定 wal_level,並可以選擇啟用 WAL 歸檔(參閱第 19.5.3 節)。這些參數在備用伺服器上的值是沒有意義的,不過您可能希望在備用伺服器上也設定好,以預備備用伺服器成為主要伺服器的可能性。
synchronous_standby_names
(string
)
指定可支援同步複寫的備用伺服器列表,如第 26.2.8 節中所述。 將有一個或多個線上同步的備用資料庫;在這些備用伺服器確認收到其資料後,將允許等待提交的事務繼續進行。同步備用資料庫將是其名稱出現在此列表中的那些,並且即時以串流傳輸資料(如 pg_stat_replication 檢視表中的串流傳輸狀態所示)。指定多個同步備用資料庫可以達到非常高的可用性並防止資料遺失。
用於此目的的備用伺服器的名稱是以備用資料庫的 application_name 設定,在備用資料庫的連線資訊中設定。如果是物理性複寫的備用,則應在 recovery.conf 中的 primary_conninfo 設定中進行設定;預設是 walreceiver。對於邏輯性複寫,可以在訂閱的連線訊息中設定,並且預設為訂閱名稱。對於其他複寫的串流使用者,請查閱其文件。
此參數使用以下任一語法指定備用伺服器列表:
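[FIRST] num_sync ( standby_name [, ...] )
ANY num_sync ( standby_name [, ...] )
standby_name [, ...]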
其中 num_sync 是交易事務需要等待回覆的同步備用數量,而 standby_name 是備用伺服器的名稱。FIRST 和 ANY 指定從列出的伺服器中選擇同步備用資料庫的方法。
關鍵字 FIRST 與 num_sync 合併使用,指定基於優先的同步複寫,讓事務提交等待,直到將其 WAL 記錄複寫到優先選擇的 num_sync 同步備用資料庫。例如,FIRST 3(s1,s2,s3,s4)的設定將使得每個提交等待從備用伺服器 s1,s2,s3 和 s4 中選擇的三個較優先的備用資料庫回覆。名稱在列表中較早出現的備用資料庫具有較高的優先等級,並被視為是同步的。此列表中稍後出現的其他備用伺服器代表潛在的同步備用資料庫。如果任何當下的同步備用資料庫因任何原因斷開連線,它將立即被替換為次高優先等級的備用資料庫。關鍵字 FIRST 是選用的。
關鍵字 ANY 與 num_sync 一起使用,指定需要仲裁的同步複寫,使事務提交等待,直到將其 WAL 記錄複寫到至少 num_sync 列出的備用資料庫。例如,ANY 3(s1,s2,s3,s4)的設定將使得每個提交在 s1,s2,s3 和 s4 的至少任何三個備用資料回覆時繼續進行。
FIRST 和 ANY 都不區分大小寫。 如果將這些關鍵字用作備用伺服器的名稱,則其 standby_name 必須使用雙引號。
第三種語法在 PostgreSQL 版本 9.6 之前使用,仍然受支援。它與 FIRST 和 num_sync 等於 1 的第一個語法相同。例如,FIRST 1(s1,s2)和 s1,s2 具有相同的含義:s1 或 s2 被選為同步的備用伺服器。
特殊符號 * 表示匹配任何備用名稱。
沒有其他機制來強制備用名稱的唯一性。如果重複的話,其中一個備用資料庫將被視為更優先的,但無法確切說是哪一個。
注意 每個 standby_name 都應具有有效 SQL 識別字的形式,除非是 *。如有必要,您可以使用雙引號。但請注意,無論是否使用雙引號,standby_name 與備用伺服器的 application name 的比較都不區分大小寫。
如果此處未指定同步的備用伺服器名稱,則不啟用同步複寫,事務提交就不會等待複寫。這是預設配置。即使啟用了同步複寫,也可以將單個事務設定為不等待複寫,方法是將 synchronous_commit 參數設定為 local 或 off。
此參數只能在 postgresql.conf 檔案或伺服器命令列中設定。
vacuum_defer_cleanup_age
(integer
)
指定 VACUUM 和 HOT 更新在清除過期資料列版本之前應延遲的交易數。預設值為 0 筆交易,這意味著可以盡快刪除過期的資料列版本,也就是說,只要它們不再對任何開啟中的交易可見。在支援 hot standby 備用伺服器的主要伺服器上,您可能希望將其設定為非零值,如第 26.5 節中所述。這樣可以讓備用伺服器上的查詢有更多時間完成,而不會因過早清理資料列而導致衝突。但是,由於該值是以主要伺服器上發生的寫入交易數量來衡量的,因此很難預測備用查詢可獲得多少額外的寬限時間。此參數只能在 postgresql.conf 檔案或伺服器命令列中設定。
您還應該考慮在備用伺服器上設定 hot_standby_feedback 作為使用此參數的替代方法。
這不會阻止已達到 old_snapshot_threshold 指定期間的過時資料列清除。
這些設定控制要接收複寫資料的備用伺服器行為,與主伺服器上的設定是無關的。
hot_standby
(boolean
)
指定是否可以在回復期間連線和執行查詢,如第 26.5 節中所述。預設值為 on。此參數只能在伺服器啟動時設定。它僅在歸檔回復或備用模式下有效。
max_standby_archive_delay
(integer
)
當 Hot Standby 處於啟用狀態時,此參數決定備用伺服器在取消與即將套用的 WAL 項目相衝突的備用查詢之前應等待的時間,如第 26.5.2 節中所述。當 WAL 資料是從 WAL 歸檔中讀取(因此不是最新的資料)時,適用 max_standby_archive_delay。預設值為 30 秒。如果未指定單位,則單位為毫秒。設定為 -1 時允許備用伺服器永遠等待衝突查詢完成。此參數只能在 postgresql.conf 檔案或伺服器命令列中設定。
請注意,max_standby_archive_delay 與查詢在被取消之前可以執行的最長時間不同;它是允許套用任何一個 WAL 資料段所耗費的最大總時間。因此,如果某個查詢已造成顯著延遲,後續的衝突查詢所擁有的寬限時間將會少得多。
max_standby_streaming_delay
(integer
)
當 Hot Standby 處於啓用狀態時,此參數決定備用伺服器在取消與即將套用的 WAL 項目相衝突的備用查詢之前應等待的時間,如第 26.5.2 節中所述。當 WAL 資料是透過串流複寫接收時,適用 max_standby_streaming_delay。預設值為 30 秒。如果未指定單位,則單位為毫秒。設定為 -1 時允許備用伺服器永遠等待衝突查詢完成。此參數只能在 postgresql.conf 檔案或伺服器命令列中設定。
請注意,max_standby_streaming_delay 與查詢在被取消之前可以執行的最長時間不同;它是從主伺服器收到 WAL 資料後允許用於套用的最大總時間。因此,如果某個查詢已造成顯著延遲,後續的衝突查詢所擁有的寬限時間將會少得多,直到備用伺服器再次趕上為止。
wal_receiver_status_interval
(integer
)
Specifies the minimum frequency for the WAL receiver process on the standby to send information about replication progress to the primary or upstream standby, where it can be seen using the pg_stat_replication
view. The standby will report the last write-ahead log location it has written, the last position it has flushed to disk, and the last position it has applied. This parameter's value is the maximum interval, in seconds, between reports. Updates are sent each time the write or flush positions change, or at least as often as specified by this parameter. Thus, the apply position may lag slightly behind the true position. Setting this parameter to zero disables status updates completely. This parameter can only be set in the postgresql.conf
file or on the server command line. The default value is 10 seconds.
hot_standby_feedback
(boolean
)
Specifies whether or not a hot standby will send feedback to the primary or upstream standby about queries currently executing on the standby. This parameter can be used to eliminate query cancels caused by cleanup records, but can cause database bloat on the primary for some workloads. Feedback messages will not be sent more frequently than once per wal_receiver_status_interval
. The default value is off
. This parameter can only be set in the postgresql.conf
file or on the server command line.
If cascaded replication is in use the feedback is passed upstream until it eventually reaches the primary. Standbys make no other use of feedback they receive other than to pass upstream.
This setting does not override the behavior of old_snapshot_threshold
on the primary; a snapshot on the standby which exceeds the primary's age threshold can become invalid, resulting in cancellation of transactions on the standby. This is because old_snapshot_threshold
is intended to provide an absolute limit on the time which dead rows can contribute to bloat, which would otherwise be violated because of the configuration of a standby.
wal_receiver_timeout
(integer
)
Terminate replication connections that are inactive longer than the specified number of milliseconds. This is useful for the receiving standby server to detect a primary node crash or network outage. A value of zero disables the timeout mechanism. This parameter can only be set in the postgresql.conf
file or on the server command line. The default value is 60 seconds.
wal_retrieve_retry_interval
(integer
)
Specify how long the standby server should wait when WAL data is not available from any sources (streaming replication, local pg_wal
or WAL archive) before retrying to retrieve WAL data. This parameter can only be set in the postgresql.conf
file or on the server command line. The default value is 5 seconds. Units are milliseconds if not specified.
This parameter is useful in configurations where a node in recovery needs to control the amount of time to wait for new WAL data to be available. For example, in archive recovery, it is possible to make the recovery more responsive in the detection of a new WAL log file by reducing the value of this parameter. On a system with low WAL activity, increasing it reduces the amount of requests necessary to access WAL archives, something useful for example in cloud environments where the amount of times an infrastructure is accessed is taken into account.
These settings control the behavior of a logical replication subscriber. Their values on the publisher are irrelevant.
Note that wal_receiver_timeout
, wal_receiver_status_interval
and wal_retrieve_retry_interval
configuration parameters affect the logical replication workers as well.
max_logical_replication_workers
(int
)
Specifies maximum number of logical replication workers. This includes both apply workers and table synchronization workers.
Logical replication workers are taken from the pool defined by max_worker_processes
.
The default value is 4.
max_sync_workers_per_subscription
(integer
)
Maximum number of synchronization workers per subscription. This parameter controls the amount of parallelism of the initial data copy during the subscription initialization or when new tables are added.
Currently, there can be only one synchronization worker per table.
The synchronization workers are taken from the pool defined by max_logical_replication_workers
.
The default value is 2.
All parameter names are case-insensitive. Every parameter takes a value of one of five types: boolean, string, integer, floating point, or enumerated (enum). The type determines the syntax for setting the parameter:
Boolean: Values can be written as on
, off
, true
, false
, yes
, no
, 1
, 0
(all case-insensitive) or any unambiguous prefix of one of these.
String: In general, enclose the value in single quotes, doubling any single quotes within the value. Quotes can usually be omitted if the value is a simple number or identifier, however.
Numeric (integer and floating point): A decimal point is permitted only for floating-point parameters. Do not use thousands separators. Quotes are not required.
Numeric with Unit: Some numeric parameters have an implicit unit, because they describe quantities of memory or time. The unit might be kilobytes, blocks (typically eight kilobytes), milliseconds, seconds, or minutes. An unadorned numeric value for one of these settings will use the setting's default unit, which can be learned from pg_settings
.unit
. For convenience, settings can be given with a unit specified explicitly, for example '120 ms'
for a time value, and they will be converted to whatever the parameter's actual unit is. Note that the value must be written as a string (with quotes) to use this feature. The unit name is case-sensitive, and there can be whitespace between the numeric value and the unit.
Valid memory units are kB
(kilobytes), MB
(megabytes), GB
(gigabytes), and TB
(terabytes). The multiplier for memory units is 1024, not 1000.
Valid time units are ms
(milliseconds), s
(seconds), min
(minutes), h
(hours), and d
(days).
Enumerated: Enumerated-type parameters are written in the same way as string parameters, but are restricted to have one of a limited set of values. The values allowable for such a parameter can be found from pg_settings.enumvals. Enum parameter values are case-insensitive.
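As a minimal illustration, the following query shows the unit and allowed values recorded for two parameters:
SELECT name, setting, unit, enumvals
FROM pg_settings
WHERE name IN ('work_mem', 'wal_level');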
The most fundamental way to set these parameters is to edit the file postgresql.conf
, which is normally kept in the data directory. A default copy is installed when the database cluster directory is initialized. An example of what this file might look like is:
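# This is a comment
log_connections = yes
log_destination = 'syslog'
search_path = '"$user", public'
shared_buffers = 128MB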
One parameter is specified per line. The equal sign between name and value is optional. Whitespace is insignificant (except within a quoted parameter value) and blank lines are ignored. Hash marks (#
) designate the remainder of the line as a comment. Parameter values that are not simple identifiers or numbers must be single-quoted. To embed a single quote in a parameter value, write either two quotes (preferred) or backslash-quote.
Parameters set in this way provide default values for the cluster. The settings seen by active sessions will be these values unless they are overridden. The following sections describe ways in which the administrator or user can override these defaults.
The configuration file is reread whenever the main server process receives a SIGHUP signal; this signal is most easily sent by running pg_ctl reload
from the command line or by calling the SQL function pg_reload_conf()
. The main server process also propagates this signal to all currently running server processes, so that existing sessions also adopt the new values (this will happen after they complete any currently-executing client command). Alternatively, you can send the signal to a single server process directly. Some parameters can only be set at server start; any changes to their entries in the configuration file will be ignored until the server is restarted. Invalid parameter settings in the configuration file are likewise ignored (but logged) during SIGHUP processing.
In addition to postgresql.conf
, a PostgreSQL data directory contains a file postgresql.auto.conf
, which has the same format as postgresql.conf
but should never be edited manually. This file holds settings provided through the ALTER SYSTEM command. This file is automatically read whenever postgresql.conf
is, and its settings take effect in the same way. Settings in postgresql.auto.conf
override those in postgresql.conf
.
The system view pg_file_settings
can be helpful for pre-testing changes to the configuration file, or for diagnosing problems if a SIGHUP signal did not have the desired effects.
PostgreSQL provides three SQL commands to establish configuration defaults. The already-mentioned ALTER SYSTEM command provides a SQL-accessible means of changing global defaults; it is functionally equivalent to editing postgresql.conf
. In addition, there are two commands that allow setting of defaults on a per-database or per-role basis:
The ALTER DATABASE command allows global settings to be overridden on a per-database basis.
The ALTER ROLE command allows both global and per-database settings to be overridden with user-specific values.
Values set with ALTER DATABASE
and ALTER ROLE
are applied only when starting a fresh database session. They override values obtained from the configuration files or server command line, and constitute defaults for the rest of the session. Note that some settings cannot be changed after server start, and so cannot be set with these commands (or the ones listed below).
Once a client is connected to the database, PostgreSQL provides two additional SQL commands (and equivalent functions) to interact with session-local configuration settings:
The SHOW command allows inspection of the current value of all parameters. The corresponding function is current_setting(setting_name text)
.
The SET command allows modification of the current value of those parameters that can be set locally to a session; it has no effect on other sessions. The corresponding function is set_config(setting_name, new_value, is_local)
.
In addition, the system view pg_settings
can be used to view and change session-local values:
Querying this view is similar to using SHOW ALL
but provides more detail. It is also more flexible, since it's possible to specify filter conditions or join against other relations.
Using UPDATE on this view, specifically updating the setting
column, is the equivalent of issuing SET
commands. For example, the equivalent of
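SET configuration_parameter TO DEFAULT;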
is:
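UPDATE pg_settings SET setting = reset_val WHERE name = 'configuration_parameter';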
In addition to setting global defaults or attaching overrides at the database or role level, you can pass settings to PostgreSQL via shell facilities. Both the server and libpq client library accept parameter values via the shell.
During server startup, parameter settings can be passed to the postgres
command via the -c
command-line parameter. For example,
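postgres -c log_connections=yes -c log_destination='syslog'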
Settings provided in this way override those set via postgresql.conf
or ALTER SYSTEM
, so they cannot be changed globally without restarting the server.
When starting a client session via libpq, parameter settings can be specified using the PGOPTIONS
environment variable. Settings established in this way constitute defaults for the life of the session, but do not affect other sessions. For historical reasons, the format of PGOPTIONS
is similar to that used when launching the postgres
command; specifically, the -c
flag must be specified. For example,
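env PGOPTIONS="-c geqo=off -c statement_timeout=5min" psql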
Other clients and libraries might provide their own mechanisms, via the shell or otherwise, that allow the user to alter session settings without direct use of SQL commands.
PostgreSQL provides several features for breaking down complex postgresql.conf
files into sub-files. These features are especially useful when managing multiple servers with related, but not identical, configurations.
In addition to individual parameter settings, the postgresql.conf
file can contain include directives, which specify another file to read and process as if it were inserted into the configuration file at this point. This feature allows a configuration file to be divided into physically separate parts. Include directives simply look like:
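include 'filename'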
If the file name is not an absolute path, it is taken as relative to the directory containing the referencing configuration file. Inclusions can be nested.
There is also an include_if_exists
directive, which acts the same as the include
directive, except when the referenced file does not exist or cannot be read. A regular include
will consider this an error condition, but include_if_exists
merely logs a message and continues processing the referencing configuration file.
The postgresql.conf
file can also contain include_dir
directives, which specify an entire directory of configuration files to include. These look like
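include_dir 'directory'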
Non-absolute directory names are taken as relative to the directory containing the referencing configuration file. Within the specified directory, only non-directory files whose names end with the suffix .conf
will be included. File names that start with the .
character are also ignored, to prevent mistakes since such files are hidden on some platforms. Multiple files within an include directory are processed in file name order (according to C locale rules, i.e. numbers before letters, and uppercase letters before lowercase ones).
Include files or directories can be used to logically separate portions of the database configuration, rather than having a single large postgresql.conf
file. Consider a company that has two database servers, each with a different amount of memory. There are likely elements of the configuration both will share, for things such as logging. But memory-related parameters on the server will vary between the two. And there might be server specific customizations, too. One way to manage this situation is to break the custom configuration changes for your site into three files. You could add this to the end of your postgresql.conf
file to include them:
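include 'shared.conf'
include 'memory.conf'
include 'server.conf'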
All systems would have the same shared.conf
. Each server with a particular amount of memory could share the same memory.conf
; you might have one for all servers with 8GB of RAM, another for those having 16GB. And finally server.conf
could have truly server-specific configuration information in it.
Another possibility is to create a configuration file directory and put this information into files there. For example, a conf.d
directory could be referenced at the end of postgresql.conf
:
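include_dir 'conf.d'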
Then you could name the files in the conf.d
directory like this:
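00shared.conf
01memory.conf
02server.conf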
This naming convention establishes a clear order in which these files will be loaded. This is important because only the last setting encountered for a particular parameter while the server is reading configuration files will be used. In this example, something set in conf.d/02server.conf
would override a value set in conf.d/01memory.conf
.
You might instead use this approach to naming the files descriptively:
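00shared.conf
01memory-8GB.conf
02server-foo.conf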
This sort of arrangement gives a unique name for each configuration file variation. This can help eliminate ambiguity when several servers have their configurations all stored in one place, such as in a version control repository. (Storing database configuration files under version control is another good practice to consider.)
In addition to the postgresql.conf
file already mentioned, PostgreSQL uses two other manually-edited configuration files, which control client authentication (their use is discussed in Chapter 20). By default, all three configuration files are stored in the database cluster's data directory. The parameters described in this section allow the configuration files to be placed elsewhere. (Doing so can ease administration. In particular it is often easier to ensure that the configuration files are properly backed-up when they are kept separate.)
data_directory
(string
)Specifies the directory to use for data storage. This parameter can only be set at server start.
config_file
(string
)Specifies the main server configuration file (customarily called postgresql.conf
). This parameter can only be set on the postgres
command line.
hba_file
(string
)Specifies the configuration file for host-based authentication (customarily called pg_hba.conf
). This parameter can only be set at server start.
ident_file
(string
)Specifies the configuration file for user name mapping (customarily called pg_ident.conf
). This parameter can only be set at server start. See also Section 20.2.
external_pid_file
(string
)Specifies the name of an additional process-ID (PID) file that the server should create for use by server administration programs. This parameter can only be set at server start.
In a default installation, none of the above parameters are set explicitly. Instead, the data directory is specified by the -D
command-line option or the PGDATA
environment variable, and the configuration files are all found within the data directory.
If you wish to keep the configuration files elsewhere than the data directory, the postgres
-D
command-line option or PGDATA
environment variable must point to the directory containing the configuration files, and the data_directory
parameter must be set in postgresql.conf
(or on the command line) to show where the data directory is actually located. Notice that data_directory
overrides -D
and PGDATA
for the location of the data directory, but not for the location of the configuration files.
If you wish, you can specify the configuration file names and locations individually using the parameters config_file
, hba_file
and/or ident_file
. config_file
can only be specified on the postgres
command line, but the others can be set within the main configuration file. If all three parameters plus data_directory
are explicitly set, then it is not necessary to specify -D
or PGDATA
.
When setting any of these parameters, a relative path will be interpreted with respect to the directory in which postgres
is started.
deadlock_timeout
(integer
)這是在檢查是否存在鎖死情況之前,等待鎖定的時間量(以毫秒為單位)。檢查鎖死的成本相對較高,所以伺服器不會在每次等待鎖定時都執行這項檢查。我們樂觀地假設鎖死在正式環境的應用程式中並不常見,所以會先等待鎖定一段時間,再檢查是否鎖死。增加此值可減少無謂的鎖死檢查所浪費的時間,但會減慢真正鎖死錯誤的回報速度。預設值是 1 秒,這可能是您實際需要的最小值。在負載很重的伺服器上,您可能需要提高一些。理想情況下,此設定應該超過您典型的交易時間,以便提高在伺服器決定檢查鎖死之前鎖定就已被解除的可能性。只有超級使用者可以變更此設定。
當設定 log_lock_waits 時,此參數還會確定在發出有關鎖定等待的日誌消息之前需要等待的時間長度。如果您試圖查看鎖定延遲,則可能需要設定比正常情況更短的 deadlock_timeout。
max_locks_per_transaction
(integer
)共享的鎖定資料表可追踪 max_locks_per_transaction *(max_connections + max_prepared_transactions)個物件(例如資料表)上的鎖定;因此,同一時間最多只能鎖定這麼多個不同的物件。此參數控制為每個交易分配的平均物件鎖定數量;只要所有交易的鎖定總數能放進鎖定資料表,個別交易就可以鎖定更多的物件。這不是可以鎖定的資料列數;資料列數沒有限制。預設值 64 在歷史上證明是足夠的,但如果查詢在單一交易中會觸及許多不同的資料表(例如查詢有很多子資料表的父資料表),則可能需要提高此值。此參數只能在伺服器啟動時設定。
執行備用伺服器時,必須將此參數設定為與主伺服器上相同或更高的值。否則,備用伺服器將不允許執行查詢。
max_pred_locks_per_transaction
(integer
)共享的 predicate lock 資料表可追踪 max_pred_locks_per_transaction *(max_connections + max_prepared_transactions)個物件(例如資料表)上的鎖定;因此,同一時間最多只能有這麼多個物件被 predicate lock 鎖定。此參數控制為每個交易分配的平均物件鎖定數量;只要所有交易的鎖定總數能放進鎖定資料表,個別交易就可以鎖定更多的物件。這不是可以鎖定的資料列數;資料列數沒有限制。預設值 64 通常在測試中已經足夠,但如果您的用戶端在單一可序列化交易中會觸及許多不同的資料表,則可能需要提高此值。此參數只能在伺服器啟動時設定。
max_pred_locks_per_relation
(integer
)這可以控制在鎖定被提升為鎖定整個關連之前,單個關連的多少個 page 或 tuple 可以被 predicate-lock。大於或等於零的值表示絕對限制,而負值表示 max_pred_locks_per_transaction 除以此設定的絕對值。預設值是 -2,它保留了先前版本 PostgreSQL 的行為。此參數只能在 postgresql.conf 檔案或伺服器命令列中設定。
max_pred_locks_per_page
(integer
)這可以控制在將鎖定升級為覆蓋整個 page 之前,單個 page 上有多少資料列可以 predicate-locked。 預設值是 2。此參數只能在 postgresql.conf 檔案或伺服器命令列中設定。
19.13.1. Previous PostgreSQL Versions
19.13.2. Platform and Client Compatibility
array_nulls
(
boolean
)
This controls whether the array input parser recognizes unquoted NULL as specifying a null array element. By default, this is on, allowing array values containing null values to be entered. However, PostgreSQL versions before 8.2 did not support null values in arrays, and therefore would treat NULL as specifying a normal array element with the string value "NULL". For backward compatibility with applications that require the old behavior, this variable can be turned off.
Note that it is possible to create array values containing null values even when this variable is off.
backslash_quote
(
enum
)
This controls whether a quote mark can be represented by \' in a string literal. The preferred, SQL-standard way to represent a quote mark is by doubling it ('') but PostgreSQL has historically also accepted \'. However, use of \' creates security risks because in some client character set encodings, there are multibyte characters in which the last byte is numerically equivalent to ASCII \. If client-side code does escaping incorrectly then a SQL-injection attack is possible. This risk can be prevented by making the server reject queries in which a quote mark appears to be escaped by a backslash. The allowed values of backslash_quote are on (allow \' always), off (reject always), and safe_encoding (allow only if client encoding does not allow ASCII \ within a multibyte character). safe_encoding is the default setting.
Note that in a standard-conforming string literal, \ just means \ anyway. This parameter only affects the handling of non-standard-conforming literals, including escape string syntax (E'...').
default_with_oids
(
boolean
)
This controls whether CREATE TABLE and CREATE TABLE AS include an OID column in newly-created tables, if neither WITH OIDS nor WITHOUT OIDS is specified. It also determines whether OIDs will be included in tables created by SELECT INTO. The parameter is off by default; in PostgreSQL 8.0 and earlier, it was on by default.
The use of OIDs in user tables is considered deprecated, so most installations should leave this variable disabled. Applications that require OIDs for a particular table should specify WITH OIDS when creating the table. This variable can be enabled for compatibility with old applications that do not follow this behavior.
escape_string_warning
(
boolean
)
When on, a warning is issued if a backslash (\) appears in an ordinary string literal ('...' syntax) and standard_conforming_strings is off. The default is on.
Applications that wish to use backslash as escape should be modified to use escape string syntax (E'...'), because the default behavior of ordinary strings is now to treat backslash as an ordinary character, per SQL standard. This variable can be enabled to help locate code that needs to be changed.
lo_compat_privileges
(
boolean
)
In PostgreSQL releases prior to 9.0, large objects did not have access privileges and were, therefore, always readable and writable by all users. Setting this variable to on disables the new privilege checks, for compatibility with prior releases. The default is off. Only superusers can change this setting.
Setting this variable does not disable all security checks related to large objects — only those for which the default behavior has changed in PostgreSQL 9.0. For example, lo_import() and lo_export() need superuser privileges regardless of this setting.
operator_precedence_warning
(
boolean
)
When on, the parser will emit a warning for any construct that might have changed meanings since PostgreSQL 9.4 as a result of changes in operator precedence. This is useful for auditing applications to see if precedence changes have broken anything; but it is not meant to be kept turned on in production, since it will warn about some perfectly valid, standard-compliant SQL code. The default is off.
See Section 4.1.6 for more information.
quote_all_identifiers
(
boolean
)
When the database generates SQL, force all identifiers to be quoted, even if they are not (currently) keywords. This will affect the output of EXPLAIN as well as the results of functions like pg_get_viewdef. See also the --quote-all-identifiers option of pg_dump and pg_dumpall.
standard_conforming_strings
(
boolean
)
This controls whether ordinary string literals ('...') treat backslashes literally, as specified in the SQL standard. Beginning in PostgreSQL 9.1, the default is on (prior releases defaulted to off). Applications can check this parameter to determine how string literals will be processed. The presence of this parameter can also be taken as an indication that the escape string syntax (E'...') is supported. Escape string syntax (Section 4.1.2.2) should be used if an application desires backslashes to be treated as escape characters.
synchronize_seqscans
(
boolean
)
This allows sequential scans of large tables to synchronize with each other, so that concurrent scans read the same block at about the same time and hence share the I/O workload. When this is enabled, a scan might start in the middle of the table and then "wrap around" the end to cover all rows, so as to synchronize with the activity of scans already in progress. This can result in unpredictable changes in the row ordering returned by queries that have no ORDER BY clause. Setting this parameter to off ensures the pre-8.3 behavior in which a sequential scan always starts from the beginning of the table. The default is on.
transform_null_equals
(
boolean
)
When on, expressions of the form expr = NULL (or NULL = expr) are treated as expr IS NULL, that is, they return true if expr evaluates to the null value, and false otherwise. The correct SQL-spec-compliant behavior of expr = NULL is to always return null (unknown). Therefore this parameter defaults to off.
However, filtered forms in Microsoft Access generate queries that appear to use expr = NULL to test for null values, so if you use that interface to access the database you might want to turn this option on. Since expressions of the form expr = NULL always return the null value (using the SQL standard interpretation), they are not very useful and do not appear often in normal applications so this option does little harm in practice. But new users are frequently confused about the semantics of expressions involving null values, so this option is off by default.
Note that this option only affects the exact form = NULL, not other comparison operators or other expressions that are computationally equivalent to some expression involving the equals operator (such as IN). Thus, this option is not a general fix for bad programming.
Refer to Section 9.2 for related information.
This feature was designed to allow parameters not normally known to PostgreSQL to be added by add-on modules (such as procedural languages). This allows extension modules to be configured in the standard ways.
Custom options have two-part names: an extension name, then a dot, then the parameter name proper, much like qualified names in SQL. An example is plpgsql.variable_conflict
.
Because custom options may need to be set in processes that have not loaded the relevant extension module, PostgreSQL will accept a setting for any two-part parameter name. Such variables are treated as placeholders and have no function until the module that defines them is loaded. When an extension module is loaded, it will add its variable definitions, convert any placeholder values according to those definitions, and issue warnings for any unrecognized placeholders that begin with its extension name.
exit_on_error
(boolean
)
If on, any error will terminate the current session. By default, this is set to off, so that only FATAL errors will terminate the session.
restart_after_crash
(boolean
)
When set to on, which is the default, PostgreSQL will automatically reinitialize after a backend crash. Leaving this value set to on is normally the best way to maximize the availability of the database. However, in some circumstances, such as when PostgreSQL is being invoked by clusterware, it may be useful to disable the restart so that the clusterware can gain control and take any actions it deems appropriate.
data_sync_retry
(boolean
)
When set to off, which is the default, PostgreSQL will raise a PANIC-level error on failure to flush modified data files to the file system. This causes the database server to crash. This parameter can only be set at server start.
On some operating systems, the status of data in the kernel's page cache is unknown after a write-back failure. In some cases it might have been entirely forgotten, making it unsafe to retry; the second attempt may be reported as successful, when in fact the data has been lost. In these circumstances, the only way to avoid data loss is to recover from the WAL after any failure is reported, preferably after investigating the root cause of the failure and replacing any faulty hardware.
If set to on, PostgreSQL will instead report an error but continue to run so that the data flushing operation can be retried in a later checkpoint. Only set it to on after investigating the operating system's treatment of buffered data in case of write-back failure.
PostgreSQL has native support for using SSL connections to encrypt client/server communications for increased security. This requires that OpenSSL is installed on both client and server systems and that support in PostgreSQL is enabled at build time (see Chapter 16).
With SSL support compiled in, the PostgreSQL server can be started with SSL enabled by setting the parameter ssl to on
in postgresql.conf
. The server will listen for both normal and SSL connections on the same TCP port, and will negotiate with any connecting client on whether to use SSL. By default, this is at the client's option; see Section 20.1 about how to set up the server to require use of SSL for some or all connections.
To start in SSL mode, files containing the server certificate and private key must exist. By default, these files are expected to be named server.crt
and server.key
, respectively, in the server's data directory, but other names and locations can be specified using the configuration parameters ssl_cert_file and ssl_key_file.
On Unix systems, the permissions on server.key
must disallow any access to world or group; achieve this by the command chmod 0600 server.key
. Alternatively, the file can be owned by root and have group read access (that is, 0640
permissions). That setup is intended for installations where certificate and key files are managed by the operating system. The user under which the PostgreSQL server runs should then be made a member of the group that has access to those certificate and key files.
If the data directory allows group read access then certificate files may need to be located outside of the data directory in order to conform to the security requirements outlined above. Generally, group access is enabled to allow an unprivileged user to backup the database, and in that case the backup software will not be able to read the certificate files and will likely error.
If the private key is protected with a passphrase, the server will prompt for the passphrase and will not start until it has been entered. Using a passphrase by default disables the ability to change the server's SSL configuration without a server restart, but see ssl_passphrase_command_supports_reload. Furthermore, passphrase-protected private keys cannot be used at all on Windows.
The first certificate in server.crt
must be the server's certificate because it must match the server's private key. The certificates of “intermediate” certificate authorities can also be appended to the file. Doing this avoids the necessity of storing intermediate certificates on clients, assuming the root and intermediate certificates were created with v3_ca
extensions. This allows easier expiration of intermediate certificates.
It is not necessary to add the root certificate to server.crt
. Instead, clients must have the root certificate of the server's certificate chain.
PostgreSQL reads the system-wide OpenSSL configuration file. By default, this file is named openssl.cnf
and is located in the directory reported by openssl version -d
. This default can be overridden by setting environment variable OPENSSL_CONF
to the name of the desired configuration file.
OpenSSL supports a wide range of ciphers and authentication algorithms, of varying strength. While a list of ciphers can be specified in the OpenSSL configuration file, you can specify ciphers specifically for use by the database server by modifying ssl_ciphers in postgresql.conf
.
It is possible to have authentication without encryption overhead by using NULL-SHA
or NULL-MD5
ciphers. However, a man-in-the-middle could read and pass communications between client and server. Also, encryption overhead is minimal compared to the overhead of authentication. For these reasons NULL ciphers are not recommended.
To require the client to supply a trusted certificate, place certificates of the root certificate authorities (CAs) you trust in a file in the data directory, set the parameter ssl_ca_file in postgresql.conf
to the new file name, and add the authentication option clientcert=verify-ca
or clientcert=verify-full
to the appropriate hostssl
line(s) in pg_hba.conf
. A certificate will then be requested from the client during SSL connection startup. (See Section 33.18 for a description of how to set up certificates on the client.)
For a hostssl
entry with clientcert=verify-ca
, the server will verify that the client's certificate is signed by one of the trusted certificate authorities. If clientcert=verify-full
is specified, the server will not only verify the certificate chain, but it will also check whether the username or its mapping matches the cn
(Common Name) of the provided certificate. Note that certificate chain validation is always ensured when the cert
authentication method is used (see Section 20.12).
Intermediate certificates that chain up to existing root certificates can also appear in the ssl_ca_file file if you wish to avoid storing them on clients (assuming the root and intermediate certificates were created with v3_ca
extensions). Certificate Revocation List (CRL) entries are also checked if the parameter ssl_crl_file is set. (See http://h41379.www4.hpe.com/doc/83final/ba554_90007/ch04s02.html for diagrams showing SSL certificate usage.)
The clientcert
authentication option is available for all authentication methods, but only in pg_hba.conf
lines specified as hostssl
. When clientcert
is not specified or is set to no-verify
, the server will still verify any presented client certificates against its CA file, if one is configured — but it will not insist that a client certificate be presented.
There are two approaches to enforce that users provide a certificate during login.
The first approach makes use of the cert
authentication method for hostssl
entries in pg_hba.conf
, such that the certificate itself is used for authentication while also providing ssl connection security. See Section 20.12 for details. (It is not necessary to specify any clientcert
options explicitly when using the cert
authentication method.) In this case, the cn
(Common Name) provided in the certificate is checked against the user name or an applicable mapping.
The second approach combines any authentication method for hostssl
entries with the verification of client certificates by setting the clientcert
authentication option to verify-ca
or verify-full
. The former option only enforces that the certificate is valid, while the latter also ensures that the cn
(Common Name) in the certificate matches the user name or an applicable mapping.
Table 18.2 summarizes the files that are relevant to the SSL setup on the server. (The shown file names are default names. The locally configured names could be different.)
The server reads these files at server start and whenever the server configuration is reloaded. On Windows systems, they are also re-read whenever a new backend process is spawned for a new client connection.
If an error in these files is detected at server start, the server will refuse to start. But if an error is detected during a configuration reload, the files are ignored and the old SSL configuration continues to be used. On Windows systems, if an error in these files is detected at backend start, that backend will be unable to establish an SSL connection. In all these cases, the error condition is reported in the server log.
To create a simple self-signed certificate for the server, valid for 365 days, use the following OpenSSL command, replacing dbhost.yourdomain.com
with the server's host name:
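openssl req -new -x509 -days 365 -nodes -text -out server.crt \
  -keyout server.key -subj "/CN=dbhost.yourdomain.com"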
Then do:
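chmod og-rwx server.key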
because the server will reject the file if its permissions are more liberal than this. For more details on how to create your server private key and certificate, refer to the OpenSSL documentation.
While a self-signed certificate can be used for testing, a certificate signed by a certificate authority (CA) (usually an enterprise-wide root CA) should be used in production.
To create a server certificate whose identity can be validated by clients, first create a certificate signing request (CSR) and a public/private key file:
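openssl req -new -nodes -text -out root.csr \
  -keyout root.key -subj "/CN=root.yourdomain.com"
chmod og-rwx root.key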
Then, sign the request with the key to create a root certificate authority (using the default OpenSSL configuration file location on Linux):
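openssl x509 -req -in root.csr -text -days 3650 \
  -extfile /etc/ssl/openssl.cnf -extensions v3_ca \
  -signkey root.key -out root.crt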
Finally, create a server certificate signed by the new root certificate authority:
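openssl req -new -nodes -text -out server.csr \
  -keyout server.key -subj "/CN=dbhost.yourdomain.com"
chmod og-rwx server.key
openssl x509 -req -in server.csr -text -days 365 \
  -CA root.crt -CAkey root.key -CAcreateserial \
  -out server.crt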
server.crt
and server.key
should be stored on the server, and root.crt
should be stored on the client so the client can verify that the server's leaf certificate was signed by its trusted root certificate. root.key
should be stored offline for use in creating future certificates.
It is also possible to create a chain of trust that includes intermediate certificates:
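# root
openssl req -new -nodes -text -out root.csr \
  -keyout root.key -subj "/CN=root.yourdomain.com"
chmod og-rwx root.key
openssl x509 -req -in root.csr -text -days 3650 \
  -extfile /etc/ssl/openssl.cnf -extensions v3_ca \
  -signkey root.key -out root.crt

# intermediate
openssl req -new -nodes -text -out intermediate.csr \
  -keyout intermediate.key -subj "/CN=intermediate.yourdomain.com"
chmod og-rwx intermediate.key
openssl x509 -req -in intermediate.csr -text -days 1825 \
  -extfile /etc/ssl/openssl.cnf -extensions v3_ca \
  -CA root.crt -CAkey root.key -CAcreateserial \
  -out intermediate.crt

# leaf
openssl req -new -nodes -text -out server.csr \
  -keyout server.key -subj "/CN=dbhost.yourdomain.com"
chmod og-rwx server.key
openssl x509 -req -in server.csr -text -days 365 \
  -CA intermediate.crt -CAkey intermediate.key -CAcreateserial \
  -out server.crt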
server.crt
and intermediate.crt
should be concatenated into a certificate file bundle and stored on the server. server.key
should also be stored on the server. root.crt
should be stored on the client so the client can verify that the server's leaf certificate was signed by a chain of certificates linked to its trusted root certificate. root.key
and intermediate.key
should be stored offline for use in creating future certificates.
log_destination
(string
)PostgreSQL 支援多種記錄伺服器訊息的方法,包括 stderr、csvlog 和 syslog。在 Windows 上,還支援 eventlog。將此參數設定為以逗號分隔的日誌目標列表。預設情況下僅記錄到 stderr。此參數只能在 postgresql.conf 檔案或伺服器命令列中設定。
如果 csvlog 包含在 log_destination 中的話,則日誌將以「逗號分隔」(CSV)格式輸出,便於將日誌載入到其他程序中。詳情請參閱第 19.8.4 節。 必須啟用 logging_collector 才能産生 CSV 格式的日誌輸出。
如果包含 stderr 或 csvlog,則會建立 current_logfiles 檔案以記錄日誌記錄收集器和相關日誌記錄目標目前正在使用的日誌檔案的位置。這提供了一種便捷的方式來查詢目前資料庫實例正在使用的日誌。這裡有這個檔案內容的一個例子:
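stderr log/postgresql.log
csvlog log/postgresql.csv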
當因日誌輪替而建立新的日誌檔案時,以及重新載入 log_destination 時,current_logfiles 會被重新建立。當 log_destination 中不包含 stderr 和 csvlog,以及日誌收集器被停用時,它將被移除。
在大多數 Unix 系統上,您需要變更系統 syslog 背景程序的設定,才能使用 log_destination 的 syslog 選項。PostgreSQL 可以記錄到 syslog 設施 LOCAL0 到 LOCAL7(請參閱 syslog_facility),但大多數平台上的預設 syslog 設定會丟棄所有此類訊息。您需要加入如下的內容:
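local0.*    /var/log/postgresql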
到 syslog 背景程序的設定檔中,以使其產生作用。
在 Windows 上,當您為 log_destination 使用 eventlog 選項時,應該向作業系統註冊事件來源及其函式庫,以便 Windows 事件查詢器可以清楚地顯示事件日誌消息。詳情請參閱第 18.11 節。
logging_collector
(boolean
)此參數啟用日誌收集器(logging collector),這是一個背景程序,用於攔截發送到 stderr 的日誌訊息並將其重新導向到日誌檔案。這種方法通常比記錄到 syslog 更有用,因為某些類型的訊息可能不會出現在 syslog 輸出之中。(一個常見的案例是動態連結程式失敗的訊息;另一個案例是由如 archive_command 等腳本產生的錯誤訊息。)此參數只能在伺服器啟動時設定。
在不使用日誌收集器的情況下也可以記錄到 stderr;日誌訊息只會送到伺服器的 stderr 所指向的任何地方。但是,該方法僅適用於較低的日誌量,因為它沒有提供輪替日誌檔案的簡便方法。另外,在某些不使用日誌收集器的平台上可能會導致日誌輸出遺失或出現亂碼,因為同時寫入同一日誌檔案的多個程序可能會覆蓋彼此的輸出。
日誌收集器的設計目標是永不遺失訊息。這意味著,如果負載極高,在收集器落後時,伺服器程序在嘗試發送更多日誌訊息時可能會被阻擋。相較之下,syslog 在無法寫入訊息時寧可丟棄訊息,這意味著在這種情況下它可能無法記錄某些訊息,但不會阻擋系統的其餘部分。
log_directory
(string
)當啟用 logging_collector 時,此參數確定將在其中建立日誌檔案的目錄。它可以被指定為絕對路徑,或相對於叢集的 data 目錄。該參數只能在 postgresql.conf 檔案或伺服器指令行中設定。預設值是 log。
log_filename
(string
)當啟用 logging_collector 時,此參數設定建立的日誌檔案的檔案名稱。該值被視為 strftime 模式,因此 %-escapes 可用於指定隨時間變化的檔案名稱。(請注意,如果有任何時區相關的 %-escapes,計算將在由 log_timezone 指定的區域中完成。)支援的 %-escapes 與 Open Group 的 strftime 規範中列出的類似。請注意,系統的 strftime 並未直接使用,因此特定於平台的(非標準)延伸功能不起作用。預設值是 postgresql-%Y-%m-%d_%H%M%S.log。
如果您指定的檔案名稱不含跳脫符號,則應該計劃使用日誌覆寫程序來避免最後存滿整個磁碟。在 8.4 之前的版本中,如果不存在 % 跳脫符號,PostgreSQL 會追加新日誌檔案建立時間的紀元,但已經不再是這種情況了。
如果在 log_destination 中啟用 CSV 格式的輸出,則會在帶時間戳記的檔案名稱後附加 .csv,以建立 CSV 格式輸出的檔案名稱。(如果 log_filename 以 .log 結尾,則替換該後綴。)
該參數只能在 postgresql.conf 檔案或伺服器指令中設定。
log_file_mode
(integer
)在 Unix 系統上,此參數在啟用 logging_collector 時設定日誌檔案的權限。(在 Microsoft Windows 上,此參數將被忽略。)參數值預期為以 chmod 和 umask 系統呼叫接受的格式來指定的數字模式。(要使用習慣的八進制格式,數字必須以 0(零)開頭。)
預設權限為 0600,這意味著只有伺服器擁有者才能讀取或寫入日誌檔案。另一個常用的設定是 0640,允許擁有者所屬群組的成員讀取檔案。但是請注意,要使用這種設定,您需要變更 log_directory,將檔案儲存在叢集 data 目錄之外的某個位置。無論如何,讓任何人都可以讀取日誌檔案是不明智的,因為它們可能包含敏感資料。
該參數只能在 postgresql.conf 檔案或伺服器指令中設定。
log_rotation_age
(integer
)當啟用 logging_collector 時,此參數決定單個日誌檔案的最長生命週期。經過指定的分鐘後,會建立一個新的日誌檔案。設定為零以停用基於時間的新日誌檔案建立。該參數只能在 postgresql.conf 檔案或伺服器指令中設定。
log_rotation_size (integer)
When logging_collector is enabled, this parameter determines the maximum size of an individual log file. After this limit is exceeded, a new log file will be created. Set to zero to disable size-based creation of new log files. This parameter can only be set in the postgresql.conf file or on the server command line.
log_truncate_on_rotation (boolean)
When logging_collector is enabled, this parameter will cause PostgreSQL to truncate (overwrite), rather than append to, any existing log file of the same name. However, truncation will occur only when a new file is being opened due to time-based rotation, not during server startup or size-based rotation. When off, pre-existing files will be appended to in all cases. For example, using this setting in combination with a log_filename like postgresql-%H.log would result in generating twenty-four hourly log files and then cyclically overwriting them. This parameter can only be set in the postgresql.conf file or on the server command line.
Example: to keep 7 days of logs, one log file per day named server_log.Mon, server_log.Tue, etc., and automatically overwrite last week's log with this week's log, set log_filename to server_log.%a, log_truncate_on_rotation to on, and log_rotation_age to 1440.
Another example: to keep 24 hours of logs, one log file per hour, but also rotate sooner if the log file size exceeds 1GB, set log_filename to server_log.%H%M, log_truncate_on_rotation to on, log_rotation_age to 60, and log_rotation_size to 1000000. Including %M in log_filename allows any size-driven rotations that might occur to select a file name different from the hour's initial file name.
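Expressed as postgresql.conf settings, the two schemes just described would look something like this (a sketch; the file name patterns are taken from the examples above):

    # One file per day, overwritten weekly:
    log_filename = 'server_log.%a'
    log_truncate_on_rotation = on
    log_rotation_age = 1440        # minutes (one day)

    # One file per hour, rotating early past ~1GB:
    log_filename = 'server_log.%H%M'
    log_truncate_on_rotation = on
    log_rotation_age = 60          # minutes
    log_rotation_size = 1000000    # kilobytes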
syslog_facility (enum)
When logging to syslog is enabled, this parameter determines the syslog "facility" to be used. You can choose from LOCAL0, LOCAL1, LOCAL2, LOCAL3, LOCAL4, LOCAL5, LOCAL6, LOCAL7; the default is LOCAL0. See also the documentation of your system's syslog daemon. This parameter can only be set in the postgresql.conf file or on the server command line.
syslog_ident (string)
When logging to syslog is enabled, this parameter determines the program name used to identify PostgreSQL messages in syslog logs. The default is postgres. This parameter can only be set in the postgresql.conf file or on the server command line.
syslog_sequence_numbers (boolean)
When logging to syslog and this is on (the default), each message will be prefixed by an increasing sequence number (such as [2]). This circumvents the "--- last message repeated N times ---" suppression that many syslog implementations perform by default. In more modern syslog implementations, repeated message suppression can be configured (for example, $RepeatedMsgReduction in rsyslog), so this might not be necessary. Also, you could turn this off if you actually want to suppress repeated messages.
This parameter can only be set in the postgresql.conf file or on the server command line.
syslog_split_messages (boolean)
When logging to syslog is enabled, this parameter determines how messages are delivered to syslog. When on (the default), messages are split by lines, and long lines are split so that they will fit into 1024 characters, which is a typical size limit for traditional syslog implementations. When off, PostgreSQL server log messages are delivered to the syslog service as is, and it is up to the syslog service to cope with the potentially bulky messages.
If syslog is ultimately logging to a text file, then the effect will be the same either way, and it is best to leave the setting on, since most syslog implementations either cannot handle large messages or would need to be specially configured to handle them. But if syslog is ultimately writing into some other medium, it might be necessary or more useful to keep messages logically together.
This parameter can only be set in the postgresql.conf file or on the server command line.
event_source (string)
When logging to event log is enabled, this parameter determines the program name used to identify PostgreSQL messages in the log. The default is PostgreSQL. This parameter can only be set in the postgresql.conf file or on the server command line.
client_min_messages (enum)
Controls which message levels are sent to the client. Valid values are DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1, LOG, NOTICE, WARNING, ERROR, FATAL, and PANIC. Each level includes all the levels that follow it. The later the level, the fewer messages are sent. The default is NOTICE. Note that LOG has a different rank here than in log_min_messages.
log_min_messages (enum)
Controls which message levels are written to the server log. Valid values are DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1, INFO, NOTICE, WARNING, ERROR, LOG, FATAL, and PANIC. Each level includes all the levels that follow it. The later the level, the fewer messages are sent to the log. The default is WARNING. Note that LOG has a different rank here than in client_min_messages. Only superusers can change this setting.
log_min_error_statement (enum)
Controls which SQL statements that cause an error condition are recorded in the server log. The current SQL statement is included in the log entry for any message of the specified severity or higher. Valid values are DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1, INFO, NOTICE, WARNING, ERROR, LOG, FATAL, and PANIC. The default is ERROR, which means statements causing errors, log messages, fatal errors, or panics will be logged. To effectively turn off logging of failing statements, set this parameter to PANIC. Only superusers can change this setting.
log_min_duration_statement (integer)
Causes the duration of each completed statement to be logged if the statement ran for at least the specified number of milliseconds. Setting this to zero prints all statement durations. Minus-one (the default) disables logging statement durations. For example, if you set it to 250ms, then all SQL statements that run 250ms or longer will be logged. Enabling this parameter can be helpful in tracking down unoptimized queries in your applications. Only superusers can change this setting.
For clients using extended query protocol, durations of the Parse, Bind, and Execute steps are logged independently.
When using this option together with log_statement, the text of statements that are logged because of log_statement will not be repeated in the duration log message. If you are not using syslog, it is recommended that you log the PID or session ID using log_line_prefix so that you can link the statement message to the later duration message using those values.
Table 19.1 explains the message severity levels used by PostgreSQL. If logging output is sent to syslog or Windows' eventlog, the severity levels are translated as shown in the table.
application_name (string)
The application_name can be any string of less than NAMEDATALEN characters (64 characters in a standard build). It is typically set by an application upon connection to the server. The name will be displayed in the pg_stat_activity view and included in CSV log entries. It can also be included in regular log entries via the log_line_prefix parameter. Only printable ASCII characters may be used in the application_name value. Other characters will be replaced with question marks (?).
debug_print_parse (boolean)
debug_print_rewritten (boolean)
debug_print_plan (boolean)
These parameters enable various debugging output to be emitted. When set, they print the resulting parse tree, the query rewriter output, or the execution plan for each executed query. These messages are emitted at LOG message level, so by default they will appear in the server log but will not be sent to the client. You can change that by adjusting client_min_messages and/or log_min_messages. These parameters are off by default.
debug_pretty_print (boolean)
When set, debug_pretty_print indents the messages produced by debug_print_parse, debug_print_rewritten, or debug_print_plan. This results in more readable but much longer output than the compact format used when it is off. It is on by default.
log_checkpoints (boolean)
Causes checkpoints and restartpoints to be logged in the server log. Some statistics are included in the log messages, including the number of buffers written and the time spent writing them. This parameter can only be set in the postgresql.conf file or on the server command line. The default is off.
log_connections (boolean)
Causes each attempted connection to the server to be logged, as well as successful completion of client authentication. Only superusers can change this parameter at session start, and it cannot be changed at all within a session. The default is off.
Note: Some client programs, like psql, attempt to connect twice while determining if a password is required, so duplicate "connection received" messages do not necessarily indicate a problem.
log_disconnections (boolean)
Causes session terminations to be logged. The log output provides information similar to log_connections, plus the duration of the session. Only superusers can change this parameter at session start, and it cannot be changed at all within a session. The default is off.
log_duration (boolean)
Causes the duration of every completed statement to be logged. The default is off. Only superusers can change this setting.
For clients using extended query protocol, durations of the Parse, Bind, and Execute steps are logged independently.
Note: The difference between enabling log_duration and setting log_min_duration_statement to zero is that exceeding log_min_duration_statement forces the text of the query to be logged, but this option doesn't. Thus, if log_duration is on and log_min_duration_statement has a positive value, all durations are logged but the query text is included only for statements exceeding the threshold. This behavior can be useful for gathering statistics in high-load installations.
log_error_verbosity (enum)
Controls the amount of detail written in the server log for each message that is logged. Valid values are TERSE, DEFAULT, and VERBOSE, each adding more fields to displayed messages. TERSE excludes the logging of DETAIL, HINT, QUERY, and CONTEXT error information. VERBOSE output includes the SQLSTATE error code (see also Appendix A) and the source code file name, function name, and line number that generated the error. Only superusers can change this setting.
log_hostname (boolean)
By default, connection log messages only show the IP address of the connecting host. Turning this parameter on causes logging of the host name as well. Note that depending on your host name resolution setup this might impose a non-negligible performance penalty. This parameter can only be set in the postgresql.conf file or on the server command line.
log_line_prefix (string)
This is a printf-style string that is output at the beginning of each log line. % characters begin "escape sequences" that are replaced with status information as described below. Unrecognized escapes are ignored. Other characters are copied straight to the log line. Some escapes are only recognized by session processes, and will be treated as empty by background processes such as the main server process. Status information may be aligned either left or right by specifying a numeric literal after the % and before the option. A negative value will cause the status information to be padded on the right with spaces to give it a minimum width, whereas a positive value will pad on the left. Padding can be useful to aid human readability in log files. This parameter can only be set in the postgresql.conf file or on the server command line. The default is '%m [%p]', which logs a time stamp and the process ID.
The %c escape prints a quasi-unique session identifier, consisting of two 4-byte hexadecimal numbers (without leading zeros) separated by a dot. The numbers are the process start time and the process ID, so %c can also be used as a space-saving way of printing those items. For example, to generate the session identifier from pg_stat_activity, use this query:
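A sketch of that query, composing the hex-encoded backend start time and PID as just described:

    SELECT to_hex(trunc(EXTRACT(EPOCH FROM backend_start))::integer) || '.' ||
           to_hex(pid)
    FROM pg_stat_activity;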
Tip: If you set a nonempty value for log_line_prefix, you should usually make its last character be a space, to provide visual separation from the rest of the log line. A punctuation character can be used too.
Tip: Syslog produces its own time stamp and process ID information, so you probably do not want to include those escapes if you are logging to syslog.
Tip: The %q escape is useful when including information that is only available in session (backend) context, like user or database name. For example:
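One possible prefix (the escapes shown are standard; the exact layout is a matter of taste):

    log_line_prefix = '%m [%p] %q%u@%d/%a '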
log_lock_waits (boolean)
Controls whether a log message is produced when a session waits longer than deadlock_timeout to acquire a lock. This is useful in determining if lock waits are causing poor performance. The default is off. Only superusers can change this setting.
log_statement (enum)
Controls which SQL statements are logged. Valid values are none (off), ddl, mod, and all (all statements). ddl logs all data definition statements, such as CREATE, ALTER, and DROP statements. mod logs all ddl statements, plus data-modifying statements such as INSERT, UPDATE, DELETE, TRUNCATE, and COPY FROM. PREPARE, EXECUTE, and EXPLAIN ANALYZE statements are also logged if their contained command is of an appropriate type. For clients using extended query protocol, logging occurs when an Execute message is received, and values of the Bind parameters are included (with any embedded single-quote marks doubled).
The default is none. Only superusers can change this setting.
Note: Statements that contain simple syntax errors are not logged even by the log_statement = all setting, because the log message is emitted only after basic parsing has been done to determine the statement type. In the case of extended query protocol, this setting likewise does not log statements that fail before the Execute phase (i.e., during parse analysis or planning). Set log_min_error_statement to ERROR (or lower) to log such statements.
log_replication_commands (boolean)
Causes each replication command to be logged in the server log. See Section 52.4 for more information about replication commands. The default value is off. Only superusers can change this setting.
log_temp_files (integer)
Controls logging of temporary file names and sizes. Temporary files can be created for sorts, hashes, and temporary query results. A log entry is made for each temporary file when it is deleted. A value of zero logs all temporary file information, while positive values log only files whose size is greater than or equal to the specified number of kilobytes. The default setting is -1, which disables such logging. Only superusers can change this setting.
log_timezone (string)
Sets the time zone used for timestamps written in the server log. Unlike TimeZone, this value is cluster-wide, so that all sessions will report timestamps consistently. The built-in default is GMT, but that is typically overridden in postgresql.conf; initdb will install a setting there corresponding to its system environment. See Section 8.5.3 for more information. This parameter can only be set in the postgresql.conf file or on the server command line.
Including csvlog in the log_destination list provides a convenient way to import log files into a database table. This option emits log lines in comma-separated-values (CSV) format, with these columns: time stamp with milliseconds, user name, database name, process ID, client host:port number, session ID, per-session line number, command tag, session start time, virtual transaction ID, regular transaction ID, error severity, SQLSTATE code, error message, error message detail, hint, internal query that led to the error (if any), character count of the error position therein, error context, user query that led to the error (if any and enabled by log_min_error_statement), character count of the error position therein, location of the error in the PostgreSQL source code (if log_error_verbosity is set to verbose), and application name. Here is a sample table definition for storing CSV-format log output:
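A sketch of such a table (the definition mirrors the column list above; the table name postgres_log is conventional, not required):

    CREATE TABLE postgres_log
    (
      log_time timestamp(3) with time zone,
      user_name text,
      database_name text,
      process_id integer,
      connection_from text,
      session_id text,
      session_line_num bigint,
      command_tag text,
      session_start_time timestamp with time zone,
      virtual_transaction_id text,
      transaction_id bigint,
      error_severity text,
      sql_state_code text,
      message text,
      detail text,
      hint text,
      internal_query text,
      internal_query_pos integer,
      context text,
      query text,
      query_pos integer,
      location text,
      application_name text,
      PRIMARY KEY (session_id, session_line_num)
    );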
To import a log file into this table, use the COPY FROM command:
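For example (the path is a placeholder for wherever the finished CSV log file lives):

    COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;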
There are a few things you need to do to simplify importing CSV log files:
Set log_filename and log_rotation_age to provide a consistent, predictable naming scheme for your log files. This lets you predict what the file name will be and know when an individual log file is complete and therefore ready to be imported.
Set log_rotation_size to 0 to disable size-based log rotation, as it makes the log file name difficult to predict.
Set log_truncate_on_rotation to on so that old log data isn't mixed with the new in the same file.
The table definition above includes a primary key specification. This is useful to protect against accidentally importing the same information twice. The COPY command commits all of the data it imports at one time, so any error will cause the entire import to fail. If you import a partial log file and later import the file again when it is complete, the primary key violation will cause the import to fail. Wait until the log is complete and closed before importing. This procedure will also protect against accidentally importing a partial line that hasn't been completely written, which would also cause COPY to fail.
These settings control how process titles of server processes are modified. Process titles are typically viewed using programs like ps or, on Windows, Process Explorer. See Section 28.1 for details.
cluster_name (string)
Sets the cluster name that appears in the process title for all server processes in this cluster. The name can be any string of less than NAMEDATALEN characters (64 characters in a standard build). Only printable ASCII characters may be used in the cluster_name value. Other characters will be replaced with question marks (?). No name is shown if this parameter is set to the empty string '' (which is the default). This parameter can only be set at server start.
update_process_title (boolean)
Enables updating of the process title every time a new SQL command is received by the server. This setting defaults to on on most platforms, but it defaults to off on Windows due to that platform's larger overhead for updating the process title. Only superusers can change this setting.
For additional information on tuning these settings, see Section 29.4.
wal_level (enum)
wal_level determines how much information is written to the WAL. The default value is replica, which writes enough data to support WAL archiving and replication, including running read-only queries on a standby server. minimal removes all logging except the information required to recover from a crash or immediate shutdown. Finally, logical adds information necessary to support logical decoding. Each level includes the information logged at all lower levels. This parameter can only be set at server start.
In minimal level, WAL-logging of some bulk operations can be safely skipped, which can make those operations much faster (see Section 14.4.7). Operations in which this optimization can be applied include CREATE TABLE AS, CREATE INDEX, CLUSTER, and COPY into tables that were created or truncated in the same transaction. But minimal WAL does not contain enough information to reconstruct the data from a base backup and the WAL logs, so replica or higher must be used to enable WAL archiving (archive_mode) and streaming replication.
In logical level, the same information is logged as with replica, plus information needed to allow extracting logical change sets from the WAL. Using a level of logical will increase the WAL volume, particularly if many tables are configured for REPLICA IDENTITY FULL and many UPDATE and DELETE statements are executed.
In releases prior to 9.6, this parameter also allowed the values archive and hot_standby. These are still accepted but mapped to replica.
fsync (boolean)
If this parameter is on, the PostgreSQL server will try to make sure that updates are physically written to disk, by issuing fsync() system calls or various equivalent methods (see wal_sync_method). This ensures that the database cluster can recover to a consistent state after an operating system or hardware crash.
While turning off fsync is often a performance benefit, this can result in unrecoverable data corruption in the event of a power failure or system crash. Thus it is only advisable to turn off fsync if you can easily recreate your entire database from external data.
Examples of safe circumstances for turning off fsync include the initial loading of a new database cluster from a backup file, using a database cluster for processing a batch of data after which the database will be thrown away and recreated, or for a read-only database clone which gets recreated frequently and is not used for failover. High quality hardware alone is not a sufficient justification for turning off fsync.
For reliable recovery when changing fsync off to on, it is necessary to force all modified buffers in the kernel to durable storage. This can be done while the cluster is shutdown or while fsync is on by running initdb --sync-only, running sync, unmounting the file system, or rebooting the server.
In many situations, turning off synchronous_commit for noncritical transactions can provide much of the potential performance benefit of turning off fsync, without the attendant risks of data corruption.
fsync can only be set in the postgresql.conf file or on the server command line. If you turn this parameter off, also consider turning off full_page_writes.
synchronous_commit (enum)
Specifies whether transaction commit will wait for WAL records to be written to disk before the command returns a "success" indication to the client. Valid values are on, remote_apply, remote_write, local, and off. The default, and safe, setting is on. When off, there can be a delay between when success is reported to the client and when the transaction is really guaranteed to be safe against a server crash. (The maximum delay is three times wal_writer_delay.) Unlike fsync, setting this parameter to off does not create any risk of database inconsistency: an operating system or database crash might result in some recent allegedly-committed transactions being lost, but the database state will be just the same as if those transactions had been aborted cleanly. So, turning synchronous_commit off can be a useful alternative when performance is more important than exact certainty about the durability of a transaction. For more discussion see Section 29.3.
If synchronous_standby_names is non-empty, this parameter also controls whether or not transaction commits will wait for their WAL records to be replicated to the standby server(s). When set to on, commits will wait until replies from the current synchronous standby(s) indicate they have received the commit record of the transaction and flushed it to disk. This ensures the transaction will not be lost unless both the primary and all synchronous standbys suffer corruption of their database storage. When set to remote_apply, commits will wait until replies from the current synchronous standby(s) indicate they have received the commit record of the transaction and applied it, so that it has become visible to queries on the standby(s). When set to remote_write, commits will wait until replies from the current synchronous standby(s) indicate they have received the commit record of the transaction and written it out to their operating system. This setting is sufficient to ensure data preservation even if a standby instance of PostgreSQL were to crash, but not if the standby suffers an operating-system-level crash, since the data has not necessarily reached stable storage on the standby. Finally, the setting local causes commits to wait for local flush to disk, but not for replication. This is not usually desirable when synchronous replication is in use, but is provided for completeness.
If synchronous_standby_names is empty, the settings on, remote_apply, remote_write and local all provide the same synchronization level: transaction commits only wait for local flush to disk.
This parameter can be changed at any time; the behavior for any one transaction is determined by the setting in effect when it commits. It is therefore possible, and useful, to have some transactions commit synchronously and others asynchronously. For example, to make a single multistatement transaction commit asynchronously when the default is the opposite, issue SET LOCAL synchronous_commit TO OFF within the transaction.
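A minimal sketch of that pattern (the table and row are placeholders):

    BEGIN;
    SET LOCAL synchronous_commit TO OFF;
    -- this transaction's commit will not wait for the WAL flush
    INSERT INTO audit_log (event) VALUES ('noncritical event');
    COMMIT;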
wal_sync_method (enum)
Method used for forcing WAL updates out to disk. If fsync is off then this setting is irrelevant, since WAL file updates will not be forced out at all. Possible values are:
open_datasync (write WAL files with open() option O_DSYNC)
fdatasync (call fdatasync() at each commit)
fsync (call fsync() at each commit)
fsync_writethrough (call fsync() at each commit, forcing write-through of any disk write cache)
open_sync (write WAL files with open() option O_SYNC)
The open_* options also use O_DIRECT if available. Not all of these choices are available on all platforms. The default is the first method in the above list that is supported by the platform, except that fdatasync is the default on Linux. The default is not necessarily ideal; it might be necessary to change this setting or other aspects of your system configuration in order to create a crash-safe configuration or achieve optimal performance. These aspects are discussed in Section 29.1. This parameter can only be set in the postgresql.conf file or on the server command line.
full_page_writes (boolean)
When this parameter is on, the PostgreSQL server writes the entire content of each disk page to WAL during the first modification of that page after a checkpoint. This is needed because a page write that is in process during an operating system crash might be only partially completed, leading to an on-disk page that contains a mix of old and new data. The row-level change data normally stored in WAL will not be enough to completely restore such a page during post-crash recovery. Storing the full page image guarantees that the page can be correctly restored, but at the price of increasing the amount of data that must be written to WAL. (Because WAL replay always starts from a checkpoint, it is sufficient to do this during the first change of each page after a checkpoint. Therefore, one way to reduce the cost of full-page writes is to increase the checkpoint interval parameters.)
Turning this parameter off speeds normal operation, but might lead to either unrecoverable data corruption, or silent data corruption, after a system failure. The risks are similar to turning off fsync, though smaller, and it should be turned off only based on the same circumstances recommended for that parameter.
Turning off this parameter does not affect use of WAL archiving for point-in-time recovery (PITR) (see Section 25.3).
This parameter can only be set in the postgresql.conf file or on the server command line. The default is on.
wal_log_hints (boolean)
When this parameter is on, the PostgreSQL server writes the entire content of each disk page to WAL during the first modification of that page after a checkpoint, even for non-critical modifications of so-called hint bits.
If data checksums are enabled, hint bit updates are always WAL-logged and this setting is ignored. You can use this setting to test how much extra WAL-logging would occur if your database had data checksums enabled.
This parameter can only be set at server start. The default value is off.
wal_compression (boolean)
When this parameter is on, the PostgreSQL server compresses a full page image written to WAL when full_page_writes is on or during a base backup. A compressed page image will be decompressed during WAL replay. The default value is off. Only superusers can change this setting.
Turning this parameter on can reduce the WAL volume without increasing the risk of unrecoverable data corruption, but at the cost of some extra CPU spent on the compression during WAL logging and on the decompression during WAL replay.
wal_buffers (integer)
The amount of shared memory used for WAL data that has not yet been written to disk. The default setting of -1 selects a size equal to 1/32nd (about 3%) of shared_buffers, but not less than 64kB nor more than the size of one WAL segment, typically 16MB. This value can be set manually if the automatic choice is too large or too small, but any positive value less than 32kB will be treated as 32kB. If this value is specified without units, it is taken as WAL blocks, that is XLOG_BLCKSZ bytes, typically 8kB. This parameter can only be set at server start.
The contents of the WAL buffers are written out to disk at every transaction commit, so extremely large values are unlikely to provide a significant benefit. However, setting this value to at least a few megabytes can improve write performance on a busy server where many clients are committing at once. The auto-tuning selected by the default setting of -1 should give reasonable results in most cases.
wal_writer_delay (integer)
Specifies how often the WAL writer flushes WAL, in time terms. After flushing WAL the writer sleeps for the length of time given by wal_writer_delay, unless woken up sooner by an asynchronously committing transaction. If the last flush happened less than wal_writer_delay ago and less than wal_writer_flush_after worth of WAL has been produced since, then WAL is only written to the operating system, not flushed to disk. If this value is specified without units, it is taken as milliseconds. The default value is 200 milliseconds (200ms). Note that on many systems, the effective resolution of sleep delays is 10 milliseconds; setting wal_writer_delay to a value that is not a multiple of 10 might have the same results as setting it to the next higher multiple of 10. This parameter can only be set in the postgresql.conf file or on the server command line.
wal_writer_flush_after (integer)
Specifies how often the WAL writer flushes WAL, in volume terms. If the last flush happened less than wal_writer_delay ago and less than wal_writer_flush_after worth of WAL has been produced since, then WAL is only written to the operating system, not flushed to disk. If wal_writer_flush_after is set to 0 then WAL data is always flushed immediately. If this value is specified without units, it is taken as WAL blocks, that is XLOG_BLCKSZ bytes, typically 8kB. The default is 1MB. This parameter can only be set in the postgresql.conf file or on the server command line.
commit_delay (integer)
Setting commit_delay adds a time delay before a WAL flush is initiated. This can improve group commit throughput by allowing a larger number of transactions to commit via a single WAL flush, if system load is high enough that additional transactions become ready to commit within the given interval. However, it also increases latency by up to the commit_delay for each WAL flush. Because the delay is just wasted if no other transactions become ready to commit, a delay is only performed if at least commit_siblings other transactions are active when a flush is about to be initiated. Also, no delays are performed if fsync is disabled. If this value is specified without units, it is taken as microseconds. The default commit_delay is zero (no delay). Only superusers can change this setting.
In PostgreSQL releases prior to 9.3, commit_delay behaved differently and was much less effective: it affected only commits, rather than all WAL flushes, and waited for the entire configured delay even if the WAL flush was completed sooner. Beginning in PostgreSQL 9.3, the first process that becomes ready to flush waits for the configured interval, while subsequent processes wait only until the leader completes the flush operation.
commit_siblings (integer)
Minimum number of concurrent open transactions to require before performing the commit_delay delay. A larger value makes it more probable that at least one other transaction will become ready to commit during the delay interval. The default is five transactions.
checkpoint_timeout (integer)
Maximum time between automatic WAL checkpoints. If this value is specified without units, it is taken as seconds. The valid range is between 30 seconds and one day. The default is five minutes (5min). Increasing this parameter can increase the amount of time needed for crash recovery. This parameter can only be set in the postgresql.conf file or on the server command line.
checkpoint_completion_target (floating point)
Specifies the target of checkpoint completion, as a fraction of total time between checkpoints. The default is 0.5. This parameter can only be set in the postgresql.conf file or on the server command line.
checkpoint_flush_after (integer)
Whenever more than this amount of data has been written while performing a checkpoint, attempt to force the OS to issue these writes to the underlying storage. Doing so will limit the amount of dirty data in the kernel's page cache, reducing the likelihood of stalls when an fsync is issued at the end of the checkpoint, or when the OS writes data back in larger batches in the background. Often that will result in greatly reduced transaction latency, but there also are some cases, especially with workloads that are bigger than shared_buffers, but smaller than the OS's page cache, where performance might degrade. This setting may have no effect on some platforms. If this value is specified without units, it is taken as blocks, that is BLCKSZ bytes, typically 8kB. The valid range is between 0, which disables forced writeback, and 2MB. The default is 256kB on Linux, 0 elsewhere. (If BLCKSZ is not 8kB, the default and maximum values scale proportionally to it.) This parameter can only be set in the postgresql.conf file or on the server command line.
checkpoint_warning (integer)
Write a message to the server log if checkpoints caused by the filling of WAL segment files happen closer together than this amount of time (which suggests that max_wal_size ought to be raised). If this value is specified without units, it is taken as seconds. The default is 30 seconds (30s). Zero disables the warning. No warnings will be generated if checkpoint_timeout is less than checkpoint_warning. This parameter can only be set in the postgresql.conf file or on the server command line.
max_wal_size (integer)
Maximum size to let the WAL grow during automatic checkpoints. This is a soft limit; WAL size can exceed max_wal_size under special circumstances, such as heavy load, a failing archive_command, or a high wal_keep_segments setting. If this value is specified without units, it is taken as megabytes. The default is 1 GB. Increasing this parameter can increase the amount of time needed for crash recovery. This parameter can only be set in the postgresql.conf file or on the server command line.
min_wal_size (integer)
As long as WAL disk usage stays below this setting, old WAL files are always recycled for future use at a checkpoint, rather than removed. This can be used to ensure that enough WAL space is reserved to handle spikes in WAL usage, for example when running large batch jobs. If this value is specified without units, it is taken as megabytes. The default is 80 MB. This parameter can only be set in the postgresql.conf file or on the server command line.
archive_mode (enum)
When archive_mode is enabled, completed WAL segments are sent to archive storage by setting archive_command. In addition to off, to disable, there are two modes: on, and always. During normal operation, there is no difference between the two modes, but when set to always the WAL archiver is enabled also during archive recovery or standby mode. In always mode, all files restored from the archive or streamed with streaming replication will be archived (again). See Section 26.2.9 for details.
archive_mode and archive_command are separate variables so that archive_command can be changed without leaving archiving mode. This parameter can only be set at server start. archive_mode cannot be enabled when wal_level is set to minimal.
archive_command (string)
The local shell command to execute to archive a completed WAL file segment. Any %p in the string is replaced by the path name of the file to archive, and any %f is replaced by only the file name. (The path name is relative to the working directory of the server, i.e., the cluster's data directory.) Use %% to embed an actual % character in the command. It is important for the command to return a zero exit status only if it succeeds. For more information see Section 25.3.1.
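For example, a minimal sketch in the style of the backup documentation (the archive directory paths are placeholders):

    archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'  # Unix
    archive_command = 'copy "%p" "C:\\server\\archivedir\\%f"'  # Windows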
This parameter can only be set in the postgresql.conf file or on the server command line. It is ignored unless archive_mode was enabled at server start. If archive_command is an empty string (the default) while archive_mode is enabled, WAL archiving is temporarily disabled, but the server continues to accumulate WAL segment files in the expectation that a command will soon be provided. Setting archive_command to a command that does nothing but return true, e.g., /bin/true (REM on Windows), effectively disables archiving, but also breaks the chain of WAL files needed for archive recovery, so it should only be used in unusual circumstances.
archive_timeout (integer)
The archive_command is only invoked for completed WAL segments. Hence, if your server generates little WAL traffic (or has slack periods where it does so), there could be a long delay between the completion of a transaction and its safe recording in archive storage. To limit how old unarchived data can be, you can set archive_timeout to force the server to switch to a new WAL segment file periodically. When this parameter is greater than zero, the server will switch to a new segment file whenever this amount of time has elapsed since the last segment file switch, and there has been any database activity, including a single checkpoint (checkpoints are skipped if there is no database activity). Note that archived files that are closed early due to a forced switch are still the same length as completely full files. Therefore, it is unwise to use a very short archive_timeout — it will bloat your archive storage. archive_timeout settings of a minute or so are usually reasonable. You should consider using streaming replication, instead of archiving, if you want data to be copied off the master server more quickly than that. If this value is specified without units, it is taken as seconds. This parameter can only be set in the postgresql.conf file or on the server command line.
This section describes the settings that apply only for the duration of the recovery. They must be reset for any subsequent recovery you wish to perform.
“Recovery” covers using the server as a standby or for executing a targeted recovery. Typically, standby mode would be used to provide high availability and/or read scalability, whereas a targeted recovery is used to recover from data loss.
To start the server in standby mode, create a file called standby.signal in the data directory. The server will enter recovery and will not stop recovery when the end of archived WAL is reached, but will keep trying to continue recovery by connecting to the sending server as specified by the primary_conninfo setting and/or by fetching new WAL segments using restore_command. For this mode, the parameters from this section and Section 19.6.3 are of interest. Parameters from Section 19.5.5 will also be applied but are typically not useful in this mode.
To start the server in targeted recovery mode, create a file called recovery.signal in the data directory. If both standby.signal and recovery.signal files are created, standby mode takes precedence. Targeted recovery mode ends when the archived WAL is fully replayed, or when recovery_target is reached. In this mode, the parameters from both this section and Section 19.5.5 will be used.
restore_command (string)
The local shell command to execute to retrieve an archived segment of the WAL file series. This parameter is required for archive recovery, but optional for streaming replication. Any %f in the string is replaced by the name of the file to retrieve from the archive, and any %p is replaced by the copy destination path name on the server. (The path name is relative to the current working directory, i.e., the cluster's data directory.) Any %r is replaced by the name of the file containing the last valid restart point. That is the earliest file that must be kept to allow a restore to be restartable, so this information can be used to truncate the archive to just the minimum required to support restarting from the current restore. %r is typically only used by warm-standby configurations (see Section 26.2). Write %% to embed an actual % character.
It is important for the command to return a zero exit status only if it succeeds. The command will be asked for file names that are not present in the archive; it must return nonzero when so asked. Examples:
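Two typical sketches (the archive paths are placeholders):

    restore_command = 'cp /mnt/server/archivedir/%f "%p"'
    restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows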
An exception is that if the command was terminated by a signal (other than SIGTERM, which is used as part of a database server shutdown) or an error by the shell (such as command not found), then recovery will abort and the server will not start up.
This parameter can only be set at server start.
archive_cleanup_command (string)
This optional parameter specifies a shell command that will be executed at every restartpoint. The purpose of archive_cleanup_command is to provide a mechanism for cleaning up old archived WAL files that are no longer needed by the standby server. Any %r is replaced by the name of the file containing the last valid restart point. That is the earliest file that must be kept to allow a restore to be restartable, and so all files earlier than %r may be safely removed. This information can be used to truncate the archive to just the minimum required to support restart from the current restore. The pg_archivecleanup module is often used in archive_cleanup_command for single-standby configurations, for example:
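A common sketch using pg_archivecleanup (the archive directory is a placeholder):

    archive_cleanup_command = 'pg_archivecleanup /mnt/server/archivedir %r'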
Note however that if multiple standby servers are restoring from the same archive directory, you will need to ensure that you do not delete WAL files until they are no longer needed by any of the servers. archive_cleanup_command would typically be used in a warm-standby configuration (see Section 26.2). Write %% to embed an actual % character in the command.
If the command returns a nonzero exit status then a warning log message will be written. An exception is that if the command was terminated by a signal or an error by the shell (such as command not found), a fatal error will be raised.
This parameter can only be set in the postgresql.conf file or on the server command line.
recovery_end_command (string)
This parameter specifies a shell command that will be executed once only at the end of recovery. This parameter is optional. The purpose of the recovery_end_command is to provide a mechanism for cleanup following replication or recovery. Any %r is replaced by the name of the file containing the last valid restart point, like in archive_cleanup_command.
If the command returns a nonzero exit status then a warning log message will be written and the database will proceed to start up anyway. An exception is that if the command was terminated by a signal or an error by the shell (such as command not found), the database will not proceed with startup.
This parameter can only be set in the postgresql.conf file or on the server command line.
By default, recovery will recover to the end of the WAL log. The following parameters can be used to specify an earlier stopping point. At most one of recovery_target, recovery_target_lsn, recovery_target_name, recovery_target_time, or recovery_target_xid can be used; if more than one of these is specified in the configuration file, an error will be raised. These parameters can only be set at server start.
recovery_target = 'immediate'
This parameter specifies that recovery should end as soon as a consistent state is reached, i.e., as early as possible. When restoring from an online backup, this means the point where taking the backup ended.
Technically, this is a string parameter, but 'immediate' is currently the only allowed value.
recovery_target_name (string)
This parameter specifies the named restore point (created with pg_create_restore_point()) to which recovery will proceed.
recovery_target_time (timestamp)
This parameter specifies the time stamp up to which recovery will proceed. The precise stopping point is also influenced by recovery_target_inclusive.
recovery_target_xid (string)
This parameter specifies the transaction ID up to which recovery will proceed. Keep in mind that while transaction IDs are assigned sequentially at transaction start, transactions can complete in a different numeric order. The transactions that will be recovered are those that committed before (and optionally including) the specified one. The precise stopping point is also influenced by recovery_target_inclusive.
recovery_target_lsn (pg_lsn)
This parameter specifies the LSN of the write-ahead log location up to which recovery will proceed. The precise stopping point is also influenced by recovery_target_inclusive. This parameter is parsed using the system data type pg_lsn.
The following options further specify the recovery target, and affect what happens when the target is reached:
recovery_target_inclusive (boolean)
Specifies whether to stop just after the specified recovery target (on), or just before the recovery target (off). Applies when recovery_target_lsn, recovery_target_time, or recovery_target_xid is specified. This setting controls whether transactions having exactly the target WAL location (LSN), commit time, or transaction ID, respectively, will be included in the recovery. Default is on.
recovery_target_timeline (string)
Specifies recovering into a particular timeline. The value can be a numeric timeline ID or a special value. The value current recovers along the same timeline that was current when the base backup was taken. The value latest recovers to the latest timeline found in the archive, which is useful in a standby server. latest is the default.
You usually only need to set this parameter in complex re-recovery situations, where you need to return to a state that itself was reached after a point-in-time recovery. See Section 25.3.5 for discussion.
recovery_target_action (enum)
Specifies what action the server should take once the recovery target is reached. The default is pause, which means recovery will be paused. promote means the recovery process will finish and the server will start to accept connections. Finally, shutdown will stop the server after reaching the recovery target.
The intended use of the pause setting is to allow queries to be executed against the database to check if this recovery target is the most desirable point for recovery. The paused state can be resumed by using pg_wal_replay_resume() (see Table 9.86), which then causes recovery to end. If this recovery target is not the desired stopping point, then shut down the server, change the recovery target settings to a later target, and restart to continue recovery.
The shutdown setting is useful to have the instance ready at the exact replay point desired. The instance will still be able to replay more WAL records (and in fact will have to replay WAL records since the last checkpoint next time it is started).
Note that because recovery.signal will not be removed when recovery_target_action is set to shutdown, any subsequent start will end with immediate shutdown unless the configuration is changed or the recovery.signal file is removed manually.
This setting has no effect if no recovery target is set. If hot_standby is not enabled, a setting of pause will act the same as shutdown.
These settings control the behavior of the autovacuum feature. Refer to the autovacuum documentation for more information. Note that many of these settings can be overridden on a per-table basis; see the description of table storage parameters.
autovacuum (boolean)
Controls whether the server should run the autovacuum launcher daemon. This is on by default; however, track_counts must also be enabled for autovacuum to work. This parameter can only be set in the postgresql.conf file or on the server command line; however, autovacuuming can be disabled for individual tables by changing table storage parameters.
Note that even when this parameter is disabled, the system will launch autovacuum processes if necessary to prevent transaction ID wraparound. See the wraparound documentation for more information.
log_autovacuum_min_duration (integer)
Causes each action executed by autovacuum to be logged if it ran for at least the specified number of milliseconds. Setting this to zero logs all autovacuum actions. -1 (the default) disables logging autovacuum actions. For example, if you set this to 250ms then all automatic vacuums and analyzes that run 250ms or longer will be logged. In addition, when this parameter is set to any value other than -1, a message will be logged if an autovacuum action is skipped due to a conflicting lock. Enabling this parameter can be helpful in tracking autovacuum activity. This parameter can only be set in the postgresql.conf file or on the server command line; but the setting can be overridden for individual tables by changing table storage parameters.
autovacuum_max_workers (integer)
Specifies the maximum number of autovacuum processes (other than the autovacuum launcher) that may be running at any one time. The default is three. This parameter can only be set at server start.
autovacuum_naptime (integer)
Specifies the minimum delay between autovacuum runs on any given database. In each round the daemon examines the database and issues VACUUM and ANALYZE commands as needed for tables in that database. The delay is measured in seconds, and the default is one minute (1min). This parameter can only be set in the postgresql.conf file or on the server command line.
autovacuum_vacuum_threshold (integer)
Specifies the minimum number of updated or deleted tuples needed to trigger a VACUUM in any one table. The default is 50 tuples. This parameter can only be set in the postgresql.conf file or on the server command line; but the setting can be overridden for individual tables by changing table storage parameters.
autovacuum_analyze_threshold (integer)
Specifies the minimum number of inserted, updated, or deleted tuples needed to trigger an ANALYZE in any one table. The default is 50 tuples. This parameter can only be set in the postgresql.conf file or on the server command line; but the setting can be overridden for individual tables by changing table storage parameters.
autovacuum_vacuum_scale_factor (floating point)
Specifies a fraction of the table size to add to autovacuum_vacuum_threshold when deciding whether to trigger a VACUUM. The default is 0.2 (20% of table size). This parameter can only be set in the postgresql.conf file or on the server command line; but the setting can be overridden for individual tables by changing table storage parameters.
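Putting the threshold and scale factor together, autovacuum triggers a VACUUM when the number of dead tuples crosses a simple linear formula:

    vacuum threshold = autovacuum_vacuum_threshold
                     + autovacuum_vacuum_scale_factor * number of tuples

With the defaults (50 and 0.2), for instance, a table of 100,000 rows is vacuumed once more than 50 + 0.2 * 100000 = 20,050 of its tuples are dead.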
autovacuum_analyze_scale_factor (floating point)
Specifies a fraction of the table size to add to autovacuum_analyze_threshold when deciding whether to trigger an ANALYZE. The default is 0.1 (10% of table size). This parameter can only be set in the postgresql.conf file or on the server command line; but the setting can be overridden for individual tables by changing table storage parameters.
autovacuum_freeze_max_age (integer)
Specifies the maximum age (in transactions) that a table's pg_class.relfrozenxid field can attain before a VACUUM operation is forced to prevent transaction ID wraparound within the table. Note that the system will launch autovacuum processes to prevent wraparound even when autovacuum is otherwise disabled.
autovacuum_multixact_freeze_max_age (integer)
Specifies the maximum age (in multixacts) that a table's pg_class.relminmxid field can attain before a VACUUM operation is forced to prevent multixact ID wraparound within the table. Note that the system will launch autovacuum processes to prevent wraparound even when autovacuum is otherwise disabled.
autovacuum_vacuum_cost_delay (integer)
Specifies the cost delay value that will be used in automatic VACUUM operations. If -1 is specified, the regular vacuum_cost_delay value will be used. This parameter can only be set in the postgresql.conf file or on the server command line; but the setting can be overridden for individual tables by changing table storage parameters.
autovacuum_vacuum_cost_limit (integer)
Specifies the cost limit value that will be used in automatic VACUUM operations. If -1 is specified (which is the default), the regular vacuum_cost_limit value will be used. Note that the value is distributed proportionally among the running autovacuum workers, if there is more than one, so that the sum of the limits for each worker does not exceed the value of this variable. This parameter can only be set in the postgresql.conf file or on the server command line; but the setting can be overridden for individual tables by changing table storage parameters.
listen_addresses (string)
Specifies the TCP/IP address(es) on which the server is to listen for connections from client applications. The value takes the form of a comma-separated list of host names and/or numeric IP addresses. The special entry * corresponds to all available IP interfaces. The entry 0.0.0.0 allows listening for all IPv4 addresses, and :: allows listening for all IPv6 addresses. If the list is empty, the server does not listen on any IP interface at all, in which case only Unix-domain sockets can be used to connect to it. The default value is localhost, which allows only local TCP/IP "loopback" connections to be made. While client authentication allows fine-grained control over who can access the server, listen_addresses controls which interfaces accept connection attempts, which can help prevent repeated malicious connection requests on insecure network interfaces. This parameter can only be set at server start.
port (integer)
The TCP port the server listens on; 5432 by default. Note that the same port number is used for all IP addresses the server listens on. This parameter can only be set at server start.
max_connections (integer)
Determines the maximum number of concurrent connections to the database server. The default is typically 100 connections, but might be less if your kernel settings will not support it (as determined during initdb). This parameter can only be set at server start.
When running a standby server, you must set this parameter to the same or higher value than on the master server. Otherwise, queries will not be allowed in the standby server.
superuser_reserved_connections (integer)
Determines the number of connection "slots" that are reserved for connections by PostgreSQL superusers. At most max_connections connections can ever be active simultaneously. Whenever the number of active concurrent connections is at least max_connections minus superuser_reserved_connections, new connections will be accepted only for superusers, and no new replication connections will be accepted.
The default value is three connections. The value must be less than max_connections. This parameter can only be set at server start.
unix_socket_directories (string)
Specifies the directory of the Unix-domain socket(s) on which the server is to listen for connections from client applications. Multiple sockets can be created by listing multiple directories separated by commas. Whitespace between entries is ignored; surround a directory name with double quotes if you need to include whitespace or commas in the name. An empty value specifies not listening on any Unix-domain sockets, in which case only TCP/IP sockets can be used to connect to the server. The default value is normally /tmp, but that can be changed at build time. This parameter can only be set at server start.
In addition to the socket file itself, which is named .s.PGSQL.nnnn where nnnn is the server's port number, an ordinary file named .s.PGSQL.nnnn.lock will be created in each of the unix_socket_directories directories. Neither file should ever be removed manually.
This parameter is irrelevant on Windows, which does not have Unix-domain sockets.
unix_socket_group (string)
Sets the owning group of the Unix-domain socket(s). (The owning user of the sockets is always the user that starts the server.) In combination with the parameter unix_socket_permissions this can be used as an additional access control mechanism for Unix-domain connections. By default this is the empty string, which uses the default group of the server user. This parameter can only be set at server start.
This parameter is irrelevant on Windows, which does not have Unix-domain sockets.
unix_socket_permissions (integer)
Sets the access permissions of the Unix-domain socket(s). Unix-domain sockets use the usual Unix file system permission set. The parameter value is expected to be a numeric mode specified in the format accepted by the chmod and umask system calls. (To use the customary octal format the number must start with a 0 (zero).)
The default permissions are 0777, meaning anyone can connect. Reasonable alternatives are 0770 (only user and group, see also unix_socket_group) and 0700 (only user). (Note that for a Unix-domain socket, only write permission matters, so there is no point in setting or revoking read or execute permissions.)
This parameter can only be set at server start.
This parameter is irrelevant on systems that ignore socket permissions entirely, notably Solaris as of Solaris 10. There, one can achieve a similar effect by pointing unix_socket_directories to a directory having search permission limited to the desired audience. This parameter is also irrelevant on Windows, which does not have Unix-domain sockets.
bonjour (boolean)
Enables advertising the server's existence via Bonjour. The default is off. This parameter can only be set at server start.
bonjour_name (string)
Specifies the Bonjour service name. The computer name is used if this parameter is set to the empty string '' (which is the default). This parameter is ignored if the server was not compiled with Bonjour support. This parameter can only be set at server start.
tcp_keepalives_idle (integer)
Specifies the number of seconds of inactivity after which TCP should send a keepalive message to the client. A value of 0 uses the system default. This parameter is supported only on systems that support TCP_KEEPIDLE or an equivalent socket option, and on Windows; on other systems, it must be zero. In sessions connected via a Unix-domain socket, this parameter is ignored and always reads as zero.
Note: On Windows, a value of 0 will set this parameter to 2 hours, since Windows does not provide a way to read the system default value.
tcp_keepalives_interval (integer)
Specifies the number of seconds after which a TCP keepalive message that is not acknowledged by the client should be retransmitted. A value of 0 uses the system default. This parameter is supported only on systems that support TCP_KEEPINTVL or an equivalent socket option, and on Windows; on other systems, it must be zero. In sessions connected via a Unix-domain socket, this parameter is ignored and always reads as zero.
Note: On Windows, a value of 0 will set this parameter to 1 second, since Windows does not provide a way to read the system default value.
tcp_keepalives_count (integer)
Specifies the number of TCP keepalives that can be lost before the server's connection to the client is considered dead. A value of 0 uses the system default. This parameter is supported only on systems that support TCP_KEEPCNT or an equivalent socket option; on other systems, it must be zero. In sessions connected via a Unix-domain socket, this parameter is ignored and always reads as zero.
This parameter is not supported on Windows, and must be zero.
authentication_timeout (integer)
Maximum time to complete client authentication, in seconds. If a would-be client has not completed the authentication protocol in this much time, the server closes the connection. This prevents hung clients from occupying a connection indefinitely. The default is one minute. This parameter can only be set in the postgresql.conf file or on the server command line.
ssl (boolean)
Enables SSL connections. This parameter can only be set in the postgresql.conf file or on the server command line. The default is off.
password_encryption (enum)
When a password is specified in CREATE ROLE or ALTER ROLE, this parameter determines the algorithm to use to encrypt the password. The default value is md5, which stores the password as an MD5 hash (on is also accepted, as an alias for md5). Setting this parameter to scram-sha-256 will encrypt the password with SCRAM-SHA-256.
krb_server_keyfile (string)
Sets the location of the Kerberos server key file. See the GSSAPI authentication documentation for details. This parameter can only be set in the postgresql.conf file or on the server command line.
krb_caseins_users (boolean)
Sets whether GSSAPI user names should be treated case-insensitively. The default is off (case sensitive). This parameter can only be set in the postgresql.conf file or on the server command line.
db_user_namespace (boolean)
This parameter enables per-database user names. It is off by default. This parameter can only be set in the postgresql.conf file or on the server command line.
If this is on, you should create users as username@dbname. When username is passed by a connecting client, @ and the database name are appended to the user name and that database-specific user name is looked up by the server. Note that when you create users with names containing @ within the SQL environment, you will need to quote the user name.
With this parameter enabled, you can still create ordinary global users. Simply append @ when specifying the user name in the client, e.g., joe@. The @ will be stripped off before the user name is looked up by the server.
db_user_namespace causes the client's and server's user name representation to differ. Authentication checks are always done with the server's user name, so authentication methods must be configured for the server's user name, not the client's. Because md5 uses the user name as salt on both the client and server, md5 cannot be used with db_user_namespace.
Note: This feature is intended as a temporary measure until a complete solution is found, at which point this option will be removed.
ssl_ca_file (string)
Specifies the name of the file containing the SSL server certificate authority (CA). Relative paths are relative to the data directory. This parameter can only be set in the postgresql.conf file or on the server command line. The default is empty, meaning no CA file is loaded, and client certificate verification is not performed.
In previous releases of PostgreSQL, the name of this file was hard-coded as root.crt.
ssl_cert_file (string)
Specifies the name of the file containing the SSL server certificate. Relative paths are relative to the data directory. This parameter can only be set in the postgresql.conf file or on the server command line. The default is server.crt.
ssl_crl_file (string)
Specifies the name of the file containing the SSL server certificate revocation list (CRL). Relative paths are relative to the data directory. This parameter can only be set in the postgresql.conf file or on the server command line. The default is empty, meaning no CRL file is loaded.
In previous releases of PostgreSQL, the name of this file was hard-coded as root.crl.
ssl_key_file (string)
Specifies the name of the file containing the SSL server private key. Relative paths are relative to the data directory. This parameter can only be set in the postgresql.conf file or on the server command line. The default is server.key.
ssl_ciphers (string)
Specifies a list of SSL cipher suites that are allowed to be used on secure connections. See the ciphers manual page in the OpenSSL package for the syntax of this setting and a list of supported values. This parameter can only be set in the postgresql.conf file or on the server command line. The default value is HIGH:MEDIUM:+3DES:!aNULL. The default is usually a reasonable choice unless you have specific security requirements.
Explanation of the default value:
HIGH: cipher suites that use ciphers from the HIGH group (e.g., AES, Camellia, 3DES)
MEDIUM: cipher suites that use ciphers from the MEDIUM group (e.g., RC4, SEED)
+3DES: the OpenSSL default order for HIGH is problematic because it orders 3DES higher than AES128. This is wrong because 3DES offers less security than AES128, and it is also much slower. +3DES reorders it after all other HIGH and MEDIUM ciphers.
!aNULL: disables anonymous cipher suites that do no authentication. Such cipher suites are vulnerable to man-in-the-middle attacks and therefore should not be used.
Available cipher suite details will vary across OpenSSL versions. Use the command openssl ciphers -v 'HIGH:MEDIUM:+3DES:!aNULL' to see actual details for the currently installed OpenSSL version. Note that this list is filtered at run time based on the server key type.
ssl_prefer_server_ciphers (boolean)
Specifies whether to use the server's SSL cipher preferences, rather than the client's. This parameter can only be set in the postgresql.conf file or on the server command line. The default is true.
Older PostgreSQL versions do not have this setting and always use the client's preferences. This setting is mainly for backward compatibility with those versions. Using the server's preferences is usually better because it is more likely that the server is appropriately configured.
ssl_ecdh_curve (string)
Specifies the name of the curve to use in ECDH key exchange. It needs to be supported by all clients that connect. It does not need to be the same curve used by the server's Elliptic Curve key. This parameter can only be set in the postgresql.conf file or on the server command line. The default is prime256v1.
OpenSSL names for the most common curves are: prime256v1 (NIST P-256), secp384r1 (NIST P-384), secp521r1 (NIST P-521). The full list of available curves can be shown with the command openssl ecparam -list_curves. Not all of them are usable in TLS though.
ssl_dh_params_file (string)
Specifies the name of the file containing Diffie-Hellman parameters used for the so-called ephemeral DH family of SSL ciphers. The default is empty, in which case the compiled-in default DH parameters are used. Using custom DH parameters reduces the exposure if an attacker manages to crack the well-known compiled-in DH parameters. You can create your own DH parameters file with the command openssl dhparam -out dhparams.pem 2048.
This parameter can only be set in the postgresql.conf file or on the server command line.
ssl_passphrase_command (string)
Sets an external command to be invoked when a passphrase for decrypting an SSL file such as a private key needs to be obtained. By default, this parameter is empty, which means the built-in prompting mechanism is used.
The command must print the passphrase to the standard output and exit with code 0. In the parameter value, %p is replaced by a prompt string. (Write %% for a literal %.) Note that the prompt string will probably contain whitespace, so be sure to quote adequately. A single newline is stripped from the end of the output if present.
The command does not actually have to prompt the user for a passphrase. It can read it from a file, obtain it from a keychain facility, or similar. It is up to the user to make sure the chosen mechanism is adequately secure.
This parameter can only be set in the postgresql.conf file or on the server command line.
ssl_passphrase_command_supports_reload (boolean)
This parameter determines whether the passphrase command set by ssl_passphrase_command will also be called during a configuration reload if a key file needs a passphrase. If this parameter is false (the default), then ssl_passphrase_command will be ignored during a reload and the SSL configuration will not be reloaded if a passphrase is needed. That setting is appropriate for a command that requires a TTY for prompting, which might not be available when the server is running. Setting this parameter to true might be appropriate if the passphrase is obtained from a file, for example.
This parameter can only be set in the postgresql.conf file or on the server command line.
These parameters control server-wide statistics collection features. When statistics collection is enabled, the data that is produced can be accessed via the pg_stat and pg_statio family of system views. Refer to the statistics collector documentation for more information.
track_activities (boolean)
Enables the collection of information on the currently executing command of each session, along with the time when that command began execution. This parameter is on by default. Note that even when enabled, this information is not visible to all users, only to superusers and the user owning the session being reported on, so it should not represent a security risk. Only superusers can change this setting.
track_activity_query_size (integer)
Specifies the number of characters reserved to track the currently executing command for each active session, for the pg_stat_activity.query field. The default value is 1024. This parameter can only be set at server start.
track_counts (boolean)
Enables collection of statistics on database activity. This parameter is on by default, because the autovacuum daemon needs the collected information. Only superusers can change this setting.
track_io_timing (boolean)
Enables timing of database I/O calls. This parameter is off by default, because it will repeatedly query the operating system for the current time, which may cause significant overhead on some platforms. You can use the pg_test_timing tool to measure the overhead of timing on your system. I/O timing information is displayed in pg_stat_database, in the output of EXPLAIN when the BUFFERS option is used, and by pg_stat_statements. Only superusers can change this setting.
track_functions (enum)
Enables tracking of function call counts and time used. Specify pl to track only procedural-language functions, all to also track SQL and C language functions. The default is none, which disables function statistics tracking. Only superusers can change this setting.
Note: SQL-language functions that are simple enough to be "inlined" into the calling query will not be tracked, regardless of this setting.
stats_temp_directory (string)
Sets the directory to store temporary statistics data in. This can be a path relative to the data directory or an absolute path. The default is pg_stat_tmp. Pointing this at a RAM-based file system will decrease physical I/O requirements and can lead to improved performance. This parameter can only be set in the postgresql.conf file or on the server command line.
log_statement_stats (boolean)
log_parser_stats (boolean)
log_planner_stats (boolean)
log_executor_stats (boolean)
For each query, output performance statistics of the respective module to the server log. This is a crude profiling instrument, similar to the Unix getrusage() operating system facility. log_statement_stats reports total statement statistics, while the others report per-module statistics. log_statement_stats cannot be enabled together with any of the per-module options. All of these options are disabled by default. Only superusers can change these settings.
These configuration parameters provide a crude method of influencing the query plans chosen by the query optimizer. If the default plan chosen by the optimizer for a particular query is not optimal, a temporary solution is to use one of these configuration parameters to force the optimizer to choose a different plan. Better ways to improve the quality of the plans chosen by the optimizer include adjusting the planner cost constants (see below), running ANALYZE manually, increasing the value of the default_statistics_target configuration parameter, and increasing the amount of statistics collected for specific columns using ALTER TABLE SET STATISTICS.
enable_bitmapscan (boolean)
Enables or disables the query planner's use of bitmap-scan plan types. The default is on.
enable_gathermerge (boolean)
Enables or disables the query planner's use of gather merge plan types. The default is on.
enable_hashagg (boolean)
Enables or disables the query planner's use of hashed aggregation plan types. The default is on.
enable_hashjoin (boolean)
Enables or disables the query planner's use of hash-join plan types. The default is on.
enable_indexscan (boolean)
Enables or disables the query planner's use of index-scan plan types. The default is on.
enable_indexonlyscan (boolean)
Enables or disables the query planner's use of index-only-scan plan types. The default is on.
enable_material (boolean)
Enables or disables the query planner's use of materialization. It is impossible to suppress materialization entirely, but turning this variable off prevents the planner from inserting materialize nodes except in cases where it is required for correctness. The default is on.
enable_mergejoin (boolean)
Enables or disables the query planner's use of merge-join plan types. The default is on.
enable_nestloop (boolean)
Enables or disables the query planner's use of nested-loop join plans. It is impossible to suppress nested-loop joins entirely, but turning this variable off discourages the planner from using one if there are other methods available. The default is on.
enable_parallel_append (boolean)
Enables or disables the query planner's use of parallel-aware append plan types. The default is on.
enable_parallel_hash (boolean)
Enables or disables the query planner's use of hash-join plan types with parallel hash. Has no effect if hash-join plans are not also enabled. The default is on.
enable_partition_pruning (boolean)
Enables or disables the query planner's ability to eliminate a partitioned table's partitions from query plans. This also controls the planner's ability to generate query plans which allow the query executor to remove (ignore) partitions during query execution. The default is on.
enable_partitionwise_join (boolean)
Enables or disables the query planner's use of partitionwise join, which allows a join between partitioned tables to be performed by joining the matching partitions. Partitionwise join currently applies only when the join conditions include all the partition keys, which must be of the same data type and have exactly matching sets of child partitions. Because partitionwise join planning can use significantly more CPU time and memory during planning, the default is off.
enable_partitionwise_aggregate (boolean)
Enables or disables the query planner's use of partitionwise grouping or aggregation, which allows grouping or aggregation on partitioned tables to be performed separately for each partition. If the GROUP BY clause does not include the partition keys, only partial aggregation can be performed on a per-partition basis, and finalization must be performed later. Because partitionwise grouping or aggregation can use significantly more CPU time and memory during planning, the default is off.
enable_seqscan (boolean)
Enables or disables the query planner's use of sequential scan plan types. It is impossible to suppress sequential scans entirely, but turning this variable off discourages the planner from using one if there are other methods available. The default is on.
enable_sort (boolean)
Enables or disables the query planner's use of explicit sort steps. It is impossible to suppress explicit sorts entirely, but turning this variable off discourages the planner from using one if there are other methods available. The default is on.
enable_tidscan (boolean)
Enables or disables the query planner's use of TID scan plan types. The default is on.
The cost variables described in this section are measured on an arbitrary scale. Only their relative values matter, hence scaling them all up or down by the same factor will result in no change in the planner's choices. By default, these cost variables are based on the cost of sequential page fetches; that is, seq_page_cost is conventionally set to 1.0 and the other cost variables are set with reference to that. But you can use a different scale if you prefer, such as actual execution times in milliseconds on a particular machine.
Note: Unfortunately, there is no well-defined method for determining ideal values for the cost variables. They are best treated as averages over the entire mix of queries that a particular installation will receive. This means that changing them on the basis of just a few experiments is very risky.
seq_page_cost (floating point)
Sets the planner's estimate of the cost of a disk page fetch that is part of a series of sequential fetches. The default is 1.0.
random_page_cost (floating point)
Sets the planner's estimate of the cost of a non-sequentially-fetched disk page. The default is 4.0.
Reducing this value relative to seq_page_cost will cause the system to prefer index scans; raising it will make index scans look relatively more expensive. You can raise or lower both values together to change the importance of disk I/O costs relative to CPU costs, which are described by the following parameters.
Random access to mechanical disk storage is normally much more expensive than four times sequential access. However, a lower default is used (4.0) because the majority of random accesses to disk, such as indexed reads, are assumed to be in cache. The default value can be thought of as modeling random access as 40 times slower than sequential, while expecting 90% of random reads to be cached.
If you believe a 90% cache rate is an incorrect assumption for your workload, you can increase random_page_cost to better reflect the true cost of random storage reads. Correspondingly, if your data is likely to be completely in cache, such as when the database is smaller than the total server memory, decreasing random_page_cost can be appropriate. Storage that has a low random read cost relative to sequential, e.g., solid-state drives, might also be better modeled with a lower value for random_page_cost.
Although the system will let you set random_page_cost to less than seq_page_cost, it is not physically sensible to do so. However, setting them equal makes sense if the database is entirely cached in RAM, since in that case there is no penalty for touching pages out of sequence. Also, in a heavily-cached database you should lower both values relative to the CPU parameters, since the cost of fetching a page already in RAM is much smaller than it would normally be.
cpu_tuple_cost (floating point)
Sets the planner's estimate of the cost of processing each row during a query. The default is 0.01.
cpu_index_tuple_cost (floating point)
Sets the planner's estimate of the cost of processing each index entry during an index scan. The default is 0.005.
cpu_operator_cost (floating point)
Sets the planner's estimate of the cost of processing each operator or function executed during a query. The default is 0.0025.
parallel_setup_cost (floating point)
Sets the planner's estimate of the cost of launching parallel worker processes. The default is 1000.
parallel_tuple_cost (floating point)
Sets the planner's estimate of the cost of transferring one tuple from a parallel worker process to another process. The default is 0.1.
min_parallel_table_scan_size (integer)
Sets the minimum amount of table data that must be scanned in order for a parallel scan to be considered. For a parallel sequential scan, the amount of table data scanned is always equal to the size of the table, but when indexes are used the amount of table data scanned will normally be less. The default is 8 megabytes (8MB).
min_parallel_index_scan_size (integer)
Sets the minimum amount of index data that must be scanned in order for a parallel scan to be considered. Note that a parallel index scan typically won't touch the entire index; it is the number of pages which the planner believes will actually be touched by the scan which is relevant. The default is 512 kilobytes (512kB).
effective_cache_size (integer)
Sets the planner's assumption about the effective size of the disk cache that is available to a single query. This is factored into estimates of the cost of using an index; a higher value makes it more likely index scans will be used, a lower value makes it more likely sequential scans will be used. When setting this parameter you should consider both PostgreSQL's shared buffers and the portion of the kernel's disk cache that will be used for PostgreSQL data files. Also, take into account the expected number of concurrent queries on different tables, since they will have to share the available space. This parameter has no effect on the size of shared memory allocated by PostgreSQL, nor does it reserve kernel disk cache; it is used only for estimation purposes. The system also does not assume data remains in the disk cache between queries. The default is 4 gigabytes (4GB).
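Taken together, a postgresql.conf fragment for a mostly-cached workload might look like this; the specific numbers are illustrative assumptions, not recommendations:

# planner cost settings (illustrative)
seq_page_cost = 1.0           # baseline for sequential page fetches
random_page_cost = 1.5        # mostly-cached or SSD storage
cpu_tuple_cost = 0.01
cpu_index_tuple_cost = 0.005
cpu_operator_cost = 0.0025
effective_cache_size = 16GB   # shared_buffers plus usable kernel cache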
geqo (boolean)
Enables or disables genetic query optimization. This is on by default. It is usually best not to turn it off in production; the geqo_threshold variable provides more granular control of GEQO.
geqo_threshold (integer)
Use genetic query optimization to plan queries with at least this many FROM items involved. (Note that a FULL OUTER JOIN construct counts as only one FROM item.) The default is 12. For simpler queries it is usually best to use the regular, exhaustive-search planner, but for queries with many tables the exhaustive search takes too long, often longer than the penalty of executing a suboptimal plan. Thus, a threshold on the size of the query is a convenient way to manage use of GEQO.
geqo_effort (integer)
Controls the trade-off between planning time and query plan quality in GEQO. This variable must be an integer in the range from 1 to 10. The default value is five. Larger values increase the time spent doing query planning, but also increase the likelihood that an efficient query plan will be chosen.
geqo_effort doesn't actually do anything directly; it is only used to compute the default values for the other variables that influence GEQO behavior (described below). If you prefer, you can set the other parameters by hand instead.
geqo_pool_size (integer)
Controls the pool size used by GEQO, that is the number of individuals in the genetic population. It must be at least two, and useful values are typically 100 to 1000. If it is set to zero (the default setting) then a suitable value is chosen based on geqo_effort and the number of tables in the query.
geqo_generations (integer)
Controls the number of generations used by GEQO, that is the number of iterations of the algorithm. It must be at least one, and useful values are in the same range as the pool size. If it is set to zero (the default setting) then a suitable value is chosen based on geqo_pool_size.
geqo_selection_bias (floating point)
Controls the selection bias used by GEQO. The selection bias is the selective pressure within the population. Values can be from 1.50 to 2.00; the latter is the default.
geqo_seed (floating point)
Controls the initial value of the random number generator used by GEQO to select random paths through the join order search space. The value can range from zero (the default) to one. Varying the value changes the set of join paths explored, and may result in a better or worse best path being found.
default_statistics_target (integer)
constraint_exclusion (enum)
Controls the query planner's use of table constraints to optimize queries. The allowed values of constraint_exclusion are on (examine constraints for all tables), off (never examine constraints), and partition (examine constraints only for inheritance child tables and UNION ALL subqueries). partition is the default setting. It is often used with inheritance and partitioned tables to improve performance.
When this parameter allows it for a particular table, the planner compares query conditions with the table's CHECK constraints, and omits scanning tables for which the conditions contradict the constraints. For example:
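A sketch of the kind of schema this refers to, reconstructed along the lines of the stock documentation (the elided column lists are left as in the original):

CREATE TABLE parent(key integer, ...);
CREATE TABLE child1000(check (key between 1000 and 1999)) INHERITS(parent);
CREATE TABLE child2000(check (key between 2000 and 2999)) INHERITS(parent);
...
SELECT * FROM parent WHERE key = 2400;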
With constraint exclusion enabled, this SELECT will not scan child1000 at all, improving performance.
Currently, constraint exclusion is enabled by default only for cases that are often used to implement table partitioning. Turning it on for all tables imposes extra planning overhead that is quite noticeable on simple queries, and most often will yield no benefit for simple queries. If you have no partitioned tables you might prefer to turn it off entirely.
cursor_tuple_fraction (floating point)
Sets the planner's estimate of the fraction of a cursor's rows that will be retrieved. The default is 0.1. Smaller values of this setting bias the planner towards using “fast start” plans for cursors, which will retrieve the first few rows quickly while perhaps taking a long time to fetch all rows. Larger values put more emphasis on the total estimated time. At the maximum setting of 1.0, cursors are planned exactly like regular queries, considering only the total estimated time and not how soon the first rows might be delivered.
from_collapse_limit (integer)
join_collapse_limit (integer)
The planner will rewrite explicit JOIN constructs (except FULL JOINs) into lists of FROM items whenever a list of no more than this many items would result. Smaller values reduce planning time but might yield inferior query plans.
force_parallel_mode (enum)
Allows the use of parallel queries for testing purposes even in cases where no performance benefit is expected. The allowed values of force_parallel_mode are off (use parallel mode only when it is expected to improve performance), on (force parallel query for all queries for which it is thought to be safe), and regress (like on, but with additional behavior changes as explained below).
More specifically, setting this value to on will add a Gather node to the top of any query plan for which this appears to be safe, so that the query runs inside of a parallel worker. Even when a parallel worker is not available or cannot be used, operations such as starting a subtransaction that would be prohibited in a parallel query context will be prohibited unless the planner believes that this will cause the query to fail. If failures or unexpected results occur when this option is set, some functions used by the query may need to be marked PARALLEL UNSAFE (or, possibly, PARALLEL RESTRICTED).
Setting this value to regress has all of the same effects as setting it to on plus some additional effects that are intended to facilitate automated regression testing. Normally, messages from a parallel worker include a context line indicating that, but a setting of regress suppresses this line so that the output is the same as in non-parallel execution. Also, the Gather nodes added to plans by this setting are hidden in EXPLAIN output so that the output matches what would be obtained if this setting were turned off.
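As a usage sketch, one might exercise the parallel machinery in a test session like this (the table name t is hypothetical):

SET force_parallel_mode = on;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t;
-- the resulting plan carries a Gather node on top, even for a trivial query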
When a client application connects to the database server, it specifies which PostgreSQL database user name it wants to connect as, much the same way one logs into a Unix server as a particular user. Within the SQL environment the active database user name determines access privileges to database objects; see for more information. Therefore, it is essential to restrict which database users can connect.
As explained in Chapter 21, PostgreSQL actually does privilege management in terms of "roles". In this chapter, we consistently use database user to mean "role with the LOGIN privilege".
Authentication is the process by which the database server establishes the identity of the client, and by extension determines whether the client application (or the user who runs the client application) is permitted to connect with the database user name that was requested.
PostgreSQL offers a number of different client authentication methods. The method used to authenticate a particular client connection can be selected on the basis of (client) host address, database, and user name.
PostgreSQL database user names are logically separate from user names of the operating system in which the server runs. If all the users of a particular server also have accounts on the server's machine, it makes sense to assign database user names that match their operating system user names. However, a server that accepts remote connections might have many database users who have no local operating system account, and in such cases there need be no connection between database user names and operating system user names.
The following “parameters” are read-only, and are determined when PostgreSQL is compiled or when it is installed. As such, they have been excluded from the sample postgresql.conf file. These options report various aspects of PostgreSQL behavior that might be of interest to certain applications, particularly administrative front-ends.
block_size (integer)
Reports the size of a disk block. It is determined by the value of BLCKSZ when building the server. The default value is 8192 bytes. The meaning of some configuration variables (such as ) is influenced by block_size. See for information.
data_checksums (boolean)
Reports whether data checksums are enabled for this cluster. See for more information.
debug_assertions (boolean)
Reports whether PostgreSQL has been built with assertions enabled. That is the case if the macro USE_ASSERT_CHECKING is defined when PostgreSQL is built (accomplished e.g. by the configure option --enable-cassert). By default PostgreSQL is built without assertions.
integer_datetimes (boolean)
Reports whether PostgreSQL was built with support for 64-bit-integer dates and times. As of PostgreSQL 10, this is always on.
lc_collate (string)
Reports the locale in which sorting of textual data is done. See for more information. This value is determined when a database is created.
lc_ctype (string)
Reports the locale that determines character classifications. See for more information. This value is determined when a database is created. Ordinarily this will be the same as lc_collate, but for special applications it might be set differently.
max_function_args (integer)
Reports the maximum number of function arguments. It is determined by the value of FUNC_MAX_ARGS when building the server. The default value is 100 arguments.
max_identifier_length (integer)
Reports the maximum identifier length. It is determined as one less than the value of NAMEDATALEN when building the server. The default value of NAMEDATALEN is 64; therefore the default max_identifier_length is 63 bytes, which can be less than 63 characters when using multibyte encodings.
max_index_keys (integer)
Reports the maximum number of index keys. It is determined by the value of INDEX_MAX_KEYS when building the server. The default value is 32 keys.
segment_size (integer)
Reports the number of blocks (pages) that can be stored within a file segment. It is determined by the value of RELSEG_SIZE when building the server. The maximum size of a segment file in bytes is equal to segment_size multiplied by block_size; by default this is 1GB.
server_encoding (string)
server_version (string)
Reports the version number of the server. It is determined by the value of PG_VERSION when building the server.
server_version_num (integer)
Reports the version number of the server as an integer. It is determined by the value of PG_VERSION_NUM when building the server.
wal_block_size (integer)
Reports the size of a WAL disk block. It is determined by the value of XLOG_BLCKSZ when building the server. The default value is 8192 bytes.
wal_segment_size (integer)
When using an external authentication system such as Ident or GSSAPI, the name of the operating system user that initiated the connection might not be the same as the database user (role) that is to be used. In this case, a user name map can be applied to map the operating system user name to a database user. To use user name mapping, specify map=map-name in the options field in pg_hba.conf. This option is supported for all authentication methods that receive external user names. Since different mappings might be needed for different connections, the name of the map to be used is specified in the map-name parameter in pg_hba.conf to indicate which map to use for each individual connection.
User name maps are defined in the ident map file, which by default is named pg_ident.conf and is stored in the cluster's data directory. (It is possible to place the map file elsewhere, however; see the configuration parameter.) The ident map file contains lines of the general form:
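map-name system-username database-username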
Comments and whitespace are handled in the same way as in pg_hba.conf. The map-name is an arbitrary name that will be used to refer to this mapping in pg_hba.conf. The other two fields specify an operating system user name and a matching database user name. The same map-name can be used repeatedly to specify multiple user-mappings within a single map.
There is no restriction regarding how many database users a given operating system user can correspond to, nor vice versa. Thus, entries in a map should be thought of as meaning “this operating system user is allowed to connect as this database user”, rather than implying that they are equivalent. The connection will be allowed if there is any map entry that pairs the user name obtained from the external authentication system with the database user name that the user has requested to connect as.
If the system-username field starts with a slash (/), the remainder of the field is treated as a regular expression. (See for details of PostgreSQL's regular expression syntax.) The regular expression can include a single capture, or parenthesized subexpression, which can then be referenced in the database-username field as \1 (backslash-one). This allows the mapping of multiple user names in a single line, which is particularly useful for simple syntax substitutions. For example, these entries
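(reconstructed along the lines of the stock documentation; the map name mymap is illustrative)

mymap   /^(.*)@mydomain\.com$      \1
mymap   /^.*@otherdomain\.com$     guest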
will remove the domain part for users with system user names that end with @mydomain.com, and allow any user whose system name ends with @otherdomain.com to log in as guest.
Keep in mind that by default, a regular expression can match just part of a string. It's usually wise to use ^ and $, as shown in the above example, to force the match to be to the entire system user name.
The pg_ident.conf file is read on start-up and when the main server process receives a SIGHUP signal. If you edit the file on an active system, you will need to signal the postmaster (using pg_ctl reload, calling the SQL function pg_reload_conf(), or using kill -HUP) to make it re-read the file.
A pg_ident.conf file that could be used in conjunction with the pg_hba.conf file in is shown in . In this example, anyone logged in to a machine on the 192.168 network that does not have the operating system user name bryanh, ann, or robert would not be granted access. Unix user robert would only be allowed access when he tries to connect as PostgreSQL user bob, not as robert or anyone else. ann would only be allowed to connect as ann. User bryanh would be allowed to connect as either bryanh or as guest1.
Example 20.2. An Example pg_ident.conf File
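A reconstruction consistent with the description above (the map name omicron follows the stock example and is otherwise arbitrary):

# MAPNAME       SYSTEM-USERNAME         PG-USERNAME
omicron         bryanh                  bryanh
omicron         ann                     ann
# bob has user name robert on these machines
omicron         robert                  bob
# bryanh can also connect as guest1
omicron         bryanh                  guest1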
For convenience there are also single letter command-line option switches available for some parameters. They are described in . Some of these options exist for historical reasons, and their presence as a single-letter option does not necessarily indicate an endorsement to use the option heavily.
This authentication method uses SSL client certificates to perform authentication. It is therefore only available for SSL connections. When using this authentication method, the server will require that the client provide a valid, trusted certificate. No password prompt will be sent to the client. The cn (Common Name) attribute of the certificate will be compared to the requested database user name, and if they match the login will be allowed. User name mapping can be used to allow cn to be different from the database user name.
The following configuration options are supported for SSL certificate authentication:
map
Allows for mapping between system and database user names. See for details.
In a pg_hba.conf record specifying certificate authentication, the authentication option clientcert is assumed to be verify-ca or verify-full, and it cannot be turned off since a client certificate is necessary for this method. What the cert method adds to the basic clientcert validity test is a check that the cn attribute matches the database user name.
There are several password-based authentication methods. These methods operate similarly but differ in how the users' passwords are stored on the server and how the password provided by a client is sent across the connection.
scram-sha-256
The method scram-sha-256 performs SCRAM-SHA-256 authentication, as described in . It is a challenge-response scheme that prevents password sniffing on untrusted connections and supports storing passwords on the server in a cryptographically hashed form that is thought to be secure.
This is the most secure of the currently provided methods, but it is not supported by older client libraries.
md5
The method md5 uses a custom less secure challenge-response mechanism. It prevents password sniffing and avoids storing passwords on the server in plain text but provides no protection if an attacker manages to steal the password hash from the server. Also, the MD5 hash algorithm is nowadays no longer considered secure against determined attacks.
The md5 method cannot be used with the feature.
To ease transition from the md5 method to the newer SCRAM method, if md5 is specified as a method in pg_hba.conf but the user's password on the server is encrypted for SCRAM (see below), then SCRAM-based authentication will automatically be chosen instead.
password
The method password sends the password in clear-text and is therefore vulnerable to password “sniffing” attacks. It should always be avoided if possible. If the connection is protected by SSL encryption then password can be used safely, though. (Though SSL certificate authentication might be a better choice if one is depending on using SSL).
PostgreSQL database passwords are separate from operating system user passwords. The password for each database user is stored in the pg_authid system catalog. Passwords can be managed with the SQL commands and , e.g., CREATE ROLE foo WITH LOGIN PASSWORD 'secret', or the psql command \password. If no password has been set up for a user, the stored password is null and password authentication will always fail for that user.
The availability of the different password-based authentication methods depends on how a user's password on the server is encrypted (or hashed, more accurately). This is controlled by the configuration parameter at the time the password is set. If a password was encrypted using the scram-sha-256 setting, then it can be used for the authentication methods scram-sha-256 and password (but password transmission will be in plain text in the latter case). The authentication method specification md5 will automatically switch to using the scram-sha-256 method in this case, as explained above, so it will also work. If a password was encrypted using the md5 setting, then it can be used only for the md5 and password authentication method specifications (again, with the password transmitted in plain text in the latter case). (Previous PostgreSQL releases supported storing the password on the server in plain text. This is no longer possible.) To check the currently stored password hashes, see the system catalog pg_authid.
To upgrade an existing installation from md5 to scram-sha-256, after having ensured that all client libraries in use are new enough to support SCRAM, set password_encryption = 'scram-sha-256' in postgresql.conf, make all users set new passwords, and change the authentication method specifications in pg_hba.conf to scram-sha-256.
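A minimal sketch of that upgrade sequence, assuming superuser access and up-to-date clients:

-- switch the hashing algorithm cluster-wide (writes postgresql.auto.conf)
ALTER SYSTEM SET password_encryption = 'scram-sha-256';
SELECT pg_reload_conf();

-- each user then re-sets their password so it is stored as a SCRAM hash,
-- for example with the psql command:
\password

-- finally, replace md5 with scram-sha-256 in pg_hba.conf and reload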
The following subsections describe the authentication methods in more detail.
When trust authentication is specified, PostgreSQL assumes that anyone who can connect to the server is authorized to access the database with whatever database user name they specify (even superuser names). Of course, restrictions made in the database and user columns still apply. This method should only be used when there is adequate operating-system-level protection on connections to the server.
trust authentication is appropriate and very convenient for local connections on a single-user workstation. It is usually not appropriate by itself on a multiuser machine. However, you might be able to use trust even on a multiuser machine, if you restrict access to the server's Unix-domain socket file using file-system permissions. To do this, set the unix_socket_permissions (and possibly unix_socket_group) configuration parameters as described in . Or you could set the unix_socket_directories configuration parameter to place the socket file in a suitably restricted directory.
Setting file-system permissions only helps for Unix-socket connections. Local TCP/IP connections are not restricted by file-system permissions. Therefore, if you want to use file-system permissions for local security, remove the host ... 127.0.0.1 ... line from pg_hba.conf, or change it to a non-trust authentication method.
trust authentication is only suitable for TCP/IP connections if you trust every user on every machine that is allowed to connect to the server by the pg_hba.conf lines that specify trust. It is seldom reasonable to use trust for any TCP/IP connections other than those from localhost (127.0.0.1).
GSSAPI is an industry-standard protocol for secure authentication defined in RFC 2743. PostgreSQL supports GSSAPI with Kerberos authentication according to RFC 1964. GSSAPI provides automatic authentication (single sign-on) for systems that support it. The authentication itself is secure, but the data sent over the database connection will be sent unencrypted unless SSL is used.
hostname is the fully qualified host name of the server machine. The service principal's realm is the preferred realm of the server machine.
Client principals can be mapped to different PostgreSQL database user names with pg_ident.conf. For example, pgusername@realm could be mapped to just pgusername. Alternatively, you can use the full username@realm principal as the role name in PostgreSQL without any mapping.
PostgreSQL also supports a parameter to strip the realm from the principal. This method is supported for backwards compatibility and is strongly discouraged as it is then impossible to distinguish different users with the same user name but coming from different realms. To enable this, set include_realm to 0. For simple single-realm installations, doing that combined with setting the krb_realm parameter (which checks that the principal's realm matches exactly what is in the krb_realm parameter) is still secure; but this is a less capable approach compared to specifying an explicit mapping in pg_ident.conf.
The keytab file is generated by the Kerberos software; see the Kerberos documentation for details. The following example is for MIT-compatible Kerberos 5 implementations:
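In the stock documentation the example runs along these lines (the host name is illustrative):

kadmin% ank -randkey postgres/server.my.domain.org
kadmin% ktadd -k krb5.keytab postgres/server.my.domain.org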
The following configuration options are supported for GSSAPI:
include_realm
Sets the realm to match user principal names against. If this parameter is set, only users of that realm will be accepted. If it is not set, users of any realm can connect, subject to whatever user name mapping is done.
SSPI is a Windows technology for secure authentication with single sign-on. PostgreSQL will use SSPI in negotiate mode, which will use Kerberos when possible and automatically fall back to NTLM in other cases. SSPI authentication only works when both server and client are running Windows, or, on non-Windows platforms, when GSSAPI is available.
The following configuration options are supported for SSPI:
include_realm
If set to 1, the domain's SAM-compatible name (also known as the NetBIOS name) is used for the include_realm option. This is the default. If set to 0, the true realm name from the Kerberos user principal name is used.
Do not disable this option unless your server runs under a domain account (this includes virtual service accounts on a domain member system) and all clients authenticating through SSPI are also using domain accounts, or authentication will fail.
upn_username
If this option is enabled along with compat_realm, the user name from the Kerberos UPN is used for authentication. If it is disabled (the default), the SAM-compatible user name is used. By default, these two names are identical for new user accounts.
Note that libpq uses the SAM-compatible name if no explicit user name is specified. If you use libpq or a driver based on it, you should leave this option disabled or explicitly specify a user name in the connection string.
map
Sets the realm to match user principal names against. If this parameter is set, only users of that realm will be accepted. If it is not set, users of any realm can connect, subject to whatever user name mapping is done.
The ident authentication method works by obtaining the client's operating system user name from an ident server and using it as the allowed database user name (with an optional user name mapping). This is only supported on TCP/IP connections.
The following configuration options are supported for ident:
map
The “Identification Protocol” is described in RFC 1413. Virtually every Unix-like operating system ships with an ident server that listens on TCP port 113 by default. The basic functionality of an ident server is to answer questions like “What user initiated the connection that goes out of your port X and connects to my port Y?”. Since PostgreSQL knows both X and Y when a physical connection is established, it can interrogate the ident server on the host of the connecting client and can theoretically determine the operating system user for any given connection.
The drawback of this procedure is that it depends on the integrity of the client: if the client machine is untrusted or compromised, an attacker could run just about any program on port 113 and return any user name they choose. This authentication method is therefore only appropriate for closed networks where each client machine is under tight control and where the database and system administrators operate in close contact. In other words, you must trust the machine running the ident server. Heed the warning of RFC 1413 itself: the Identification Protocol is not intended as an authorization or access control protocol; at best, it provides some additional auditing information with respect to TCP connections, and at worst, it can provide misleading, incorrect, or maliciously incorrect information.
Some ident servers have a nonstandard option that causes the returned user name to be encrypted, using a key that only the originating machine's administrator knows. This option must not be used when using the ident server with PostgreSQL, since PostgreSQL does not have any way to decrypt the returned string to determine the actual user name.
The peer authentication method works by obtaining the client's operating system user name from the kernel and using it as the allowed database user name (with optional user name mapping). This method is only supported on local connections.
The following configuration options are supported for peer:
map
Peer authentication is only available on operating systems providing the getpeereid() function, the SO_PEERCRED socket parameter, or similar mechanisms. Currently that includes Linux, most flavors of BSD including macOS, and Solaris.
This authentication method operates similarly to password except that it uses LDAP as the password verification method. LDAP is used only to validate the user name/password pairs. Therefore the user must already exist in the database before LDAP can be used for authentication.
LDAP authentication can operate in two modes. In the first mode, which we will call the simple bind mode, the server will bind to the distinguished name constructed as prefix username suffix. Typically, the prefix parameter is used to specify cn=, or DOMAIN\ in an Active Directory environment. suffix is used to specify the remaining part of the DN in a non-Active Directory environment.
In the second mode, which we will call the search+bind mode, the server first binds to the LDAP directory with a fixed user name and password, specified with ldapbinddn and ldapbindpasswd, and performs a search for the user trying to log in to the database. If no user and password is configured, an anonymous bind will be attempted to the directory. The search will be performed over the subtree at ldapbasedn, and will try to do an exact match of the attribute specified in ldapsearchattribute. Once the user has been found in this search, the server disconnects and re-binds to the directory as this user, using the password specified by the client, to verify that the login is correct. This mode is the same as that used by LDAP authentication schemes in other software, such as Apache mod_authnz_ldap and pam_ldap. This method allows for significantly more flexibility in where the user objects are located in the directory, but will cause two separate connections to the LDAP server to be made.
The following configuration options are used in both modes:
ldapserver
Names or IP addresses of LDAP servers to connect to. Multiple servers may be specified, separated by spaces.
ldapport
Port number on LDAP server to connect to. If no port is specified, the LDAP library's default port setting will be used.
ldaptls
Set to 1 to make the connection between PostgreSQL and the LDAP server use TLS encryption. Note that this only encrypts the traffic to the LDAP server — the connection to the client will still be unencrypted unless SSL is used.
The following options are used in simple bind mode only:
ldapprefix
String to prepend to the user name when forming the DN to bind as, when doing simple bind authentication.
ldapsuffix
String to append to the user name when forming the DN to bind as, when doing simple bind authentication.
The following options are used in search+bind mode only:
ldapbasedn
Root DN to begin the search for the user in, when doing search+bind authentication.
ldapbinddn
DN of user to bind to the directory with to perform the search when doing search+bind authentication.
ldapbindpasswd
Password for user to bind to the directory with to perform the search when doing search+bind authentication.
ldapsearchattribute
Attribute to match against the user name in the search when doing search+bind authentication. If no attribute is specified, the uid attribute will be used.
ldapurl
An RFC 4516 LDAP URL. This is an alternative way to write some of the other LDAP options in a more compact and standard form. The format is
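ldap://host[:port]/basedn[?[attribute][?[scope]]]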
scope must be one of base, one, sub, typically the latter. Only one attribute is used, and some other components of standard LDAP URLs such as filters and extensions are not supported.
For non-anonymous binds, ldapbinddn and ldapbindpasswd must be specified as separate options.
To use encrypted LDAP connections, the ldaptls option has to be used in addition to ldapurl. The ldaps URL scheme (direct SSL connection) is not supported.
LDAP URLs are currently only supported with OpenLDAP, not on Windows.
It is an error to mix configuration options for simple bind with options for search+bind.
Here is an example for a simple-bind LDAP configuration:
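A line along the lines of the stock documentation (server name and DN components are illustrative):

host ... ldap ldapserver=ldap.example.net ldapprefix="cn=" ldapsuffix=", dc=example, dc=net"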
When a connection to the database server as database user someuser is requested, PostgreSQL will attempt to bind to the LDAP server using the DN cn=someuser, dc=example, dc=net and the password provided by the client. If that connection succeeds, the database access is granted.
Here is an example for a search+bind configuration:
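Again along the lines of the stock documentation:

host ... ldap ldapserver=ldap.example.net ldapbasedn="dc=example, dc=net" ldapsearchattribute=uid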
When a connection to the database server as database user someuser is requested, PostgreSQL will attempt to bind anonymously (since ldapbinddn was not specified) to the LDAP server, perform a search for (uid=someuser) under the specified base DN. If an entry is found, it will then attempt to bind using that found information and the password supplied by the client. If that second connection succeeds, the database access is granted.
Here is the same search+bind configuration written as a URL:
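host ... ldap ldapurl="ldap://ldap.example.net/dc=example,dc=net?uid?sub"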
Some other software that supports authentication against LDAP uses the same URL format, so it will be easier to share the configuration.
Since LDAP often uses commas and spaces to separate the different parts of a DN, it is often necessary to use double-quoted parameter values when configuring LDAP options, as shown in the examples.
This authentication method operates similarly to password except that it uses RADIUS as the password verification method. RADIUS is used only to validate the user name/password pairs. Therefore the user must already exist in the database before RADIUS can be used for authentication.
When using RADIUS authentication, an Access Request message will be sent to the configured RADIUS server. This request will be of type Authenticate Only, and include parameters for user name, password (encrypted) and NAS Identifier. The request will be encrypted using a secret shared with the server. The RADIUS server will respond to this server with either Access Accept or Access Reject. There is no support for RADIUS accounting.
Multiple RADIUS servers can be specified, in which case they will be tried sequentially. If a negative response is received from a server, the authentication will fail. If no response is received, the next server in the list will be tried. To specify multiple servers, put the names within quotes and separate the server names with a comma. If multiple servers are specified, all other RADIUS options can also be given as a comma-separated list, to apply individual values to each server. They can also be specified as a single value, in which case this value will apply to all servers.
The following configuration options are supported for RADIUS:
radiusservers
The name or IP addresses of the RADIUS servers to connect to. This parameter is required.
radiussecrets
The shared secrets used when talking securely to the RADIUS server. This must have exactly the same value on the PostgreSQL and RADIUS servers. It is recommended that this be a string of at least 16 characters. This parameter is required.
The encryption vector used will only be cryptographically strong if PostgreSQL is built with support for OpenSSL. In other cases, the transmission to the RADIUS server should only be considered obfuscated, not secured, and external security measures should be applied if necessary.
radiusports
The port number on the RADIUS servers to connect to. If no port is specified, the default port 1812 will be used.
radiusidentifiers
The string used as NAS Identifier in the RADIUS requests. This parameter can be used, for example, as a second parameter identifying which database user the user is attempting to authenticate as, which can be used for policy matching on the RADIUS server. If no identifier is specified, the default postgresql will be used.
This authentication method uses SSL client certificates to perform authentication. It is therefore only available for SSL connections. When using this authentication method, the server will require that the client provide a valid, trusted certificate. No password prompt will be sent to the client. The cn (Common Name) attribute of the certificate will be compared to the requested database user name, and if they match the login will be allowed. User name mapping can be used to allow cn to be different from the database user name.
The following configuration options are supported for SSL certificate authentication:
map
In a pg_hba.conf record specifying certificate authentication, the authentication option clientcert is assumed to be 1, and it cannot be turned off since a client certificate is necessary for this method. What the cert method adds to the basic clientcert certificate validity test is a check that the cn attribute matches the database user name.
The following configuration options are supported for PAM:
pamservice
PAM service name.
pam_use_hostname
Determines whether the remote IP address or the host name is provided to PAM modules through the PAM_RHOST item. By default, the IP address is used. Set this option to 1 to use the resolved host name instead. Host name resolution can lead to login delays. (Most PAM configurations don't use this information, so it is only necessary to consider this setting if a PAM configuration was specifically created to make use of it.)
If PAM is set up to read /etc/shadow, authentication will fail because the PostgreSQL server is started by a non-root user. However, this is not an issue when PAM is configured to use LDAP or other authentication methods.
This authentication method operates similarly to password except that it uses BSD Authentication to verify the password. BSD Authentication is used only to validate user name/password pairs. Therefore the user's role must already exist in the database before BSD Authentication can be used for authentication. The BSD Authentication framework is currently only available on OpenBSD.
BSD Authentication in PostgreSQL uses the auth-postgresql login type and authenticates with the postgresql login class if that's defined in login.conf. By default that login class does not exist, and PostgreSQL will use the default login class.
To use BSD Authentication, the PostgreSQL user account (that is, the operating system user running the server) must first be added to the auth group. The auth group exists by default on OpenBSD systems.
Client authentication is controlled by a configuration file, which traditionally is named pg_hba.conf and is stored in the database cluster's data directory. (HBA stands for host-based authentication.) A default pg_hba.conf file is installed when the data directory is initialized by initdb. It is possible to place the authentication configuration file elsewhere, however; see the configuration parameter.
The general format of the pg_hba.conf file is a set of records, one per line. Blank lines are ignored, as is any text after the # comment character. Records cannot be continued across lines. A record is made up of a number of fields which are separated by spaces and/or tabs. Fields can contain white space if the field value is double-quoted. Quoting one of the keywords in a database, user, or address field (e.g., all or replication) makes the word lose its special meaning, and just match a database, user, or host with that name.
Each record specifies a connection type, a client IP address range (if relevant for the connection type), a database name, a user name, and the authentication method to be used for connections matching these parameters. The first record with a matching connection type, client address, requested database, and user name is used to perform authentication. There is no "fall-through" or "backup": if one record is chosen and the authentication fails, subsequent records are not considered. If no record matches, access is denied.
A record can have one of the seven formats shown below.
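The formats, as given in the stock pg_hba.conf documentation, are:

local         database  user  auth-method  [auth-options]
host          database  user  address  auth-method  [auth-options]
hostssl       database  user  address  auth-method  [auth-options]
hostnossl     database  user  address  auth-method  [auth-options]
host          database  user  IP-address  IP-mask  auth-method  [auth-options]
hostssl       database  user  IP-address  IP-mask  auth-method  [auth-options]
hostnossl     database  user  IP-address  IP-mask  auth-method  [auth-options]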
The meaning of the fields is as follows:
local
This record matches connection attempts using Unix-domain sockets. Without a record of this type, Unix-domain socket connections are disallowed.
host
This record matches connection attempts made using TCP/IP. host records match either SSL or non-SSL connection attempts.
Important: Remote TCP/IP connections will not be possible unless the server is started with an appropriate value for the configuration parameter, since the default behavior is to listen for TCP/IP connections only on the local loopback address localhost.
hostssl
This record matches connection attempts made using TCP/IP, but only when the connection is made with SSL encryption.
hostnossl
This record type has the opposite behavior of hostssl; it only matches connection attempts made over TCP/IP that do not use SSL.
database
Specifies which database name(s) this record matches. The value all specifies that it matches all databases. The value sameuser specifies that the record matches if the requested database has the same name as the requested user. The value samerole specifies that the requested user must be a member of the role with the same name as the requested database. (samegroup is an obsolete but still accepted spelling of samerole.) Superusers are not considered to be members of a role unless they are explicitly members of the role, directly or indirectly, and not just by virtue of being a superuser. The value replication specifies that the record matches if a physical replication connection is requested (note that replication connections do not specify any particular database). Otherwise, this is the name of a specific PostgreSQL database. Multiple database names can be supplied by separating them with commas. A separate file containing database names can be specified by preceding the file name with @.
user
Specifies which database user name(s) this record matches. The value all specifies that it matches all users. Otherwise, this is either the name of a specific database user, or a group name preceded by +. (Recall that there is no real distinction between users and groups in PostgreSQL; a + mark really means "match any of the roles that are directly or indirectly members of this role", while a name without a + mark matches only that specific role.) For this purpose, a superuser is only considered to be a member of a role if they are explicitly a member of the role, directly or indirectly, and not just by virtue of being a superuser. Multiple user names can be supplied by separating them with commas. A separate file containing user names can be specified by preceding the file name with @.
address
Specifies the client machine address(es) that this record matches. This field can contain either a host name, an IP address range, or one of the special key words mentioned below.
An IP address range is specified using standard numeric notation for the range's starting address, then a slash (/) and a CIDR mask length. The mask length indicates the number of high-order bits of the client IP address that must match. Bits to the right of this should be zero in the given IP address. There must not be any white space between the IP address, the /, and the CIDR mask length.
Typical examples of an IPv4 address range specified this way are 172.20.143.89/32 for a single host, or 172.20.143.0/24 for a small network, or 10.6.0.0/16 for a larger one. An IPv6 address range might look like ::1/128 for a single host (in this case the IPv6 loopback address) or fe80::7a31:c1ff:0000:0000/96 for a small network. 0.0.0.0/0 represents all IPv4 addresses, and ::0/0 represents all IPv6 addresses. To specify a single host, use a mask length of 32 for IPv4 or 128 for IPv6. In a network address, do not omit trailing zeroes.
An entry given in IPv4 format will match only IPv4 connections, and an entry given in IPv6 format will match only IPv6 connections, even if the represented address is in the IPv4-in-IPv6 range. Note that entries in IPv6 format will be rejected if the system's C library does not have support for IPv6 addresses.
You can also write all to match any IP address, samehost to match any of the server's own IP addresses, or samenet to match any address in any subnet that the server is directly connected to.
If a host name is specified (anything that is not an IP address range or a special key word is treated as a host name), that name is compared with the result of a reverse name resolution of the client's IP address (e.g., reverse DNS lookup, if DNS is used). Host name comparisons are case insensitive. If there is a match, then a forward name resolution (e.g., forward DNS lookup) is performed on the host name to check whether any of the addresses it resolves to are equal to the client's IP address. If both directions match, then the entry is considered to match. (The host name that is used in pg_hba.conf should be the one that address-to-name resolution of the client's IP address returns, otherwise the line won't be matched. Some host name databases allow associating an IP address with multiple host names, but the operating system will only return one host name when asked to resolve an IP address.)
A host name that starts with a dot (.) matches a suffix of the actual host name. So .example.com would match foo.example.com (but not just example.com).
When host names are specified in pg_hba.conf, you should make sure that name resolution is reasonably fast. It can be of advantage to set up a local name resolution cache such as nscd. Also, you may wish to enable the configuration parameter log_hostname to see the client's host name instead of the IP address in the log.
This field only applies to host, hostssl, and hostnossl records.
Users sometimes wonder why host names are handled in this seemingly complicated way, with two name resolutions including a reverse lookup of the client's IP address. This complicates use of the feature in case the client's reverse DNS entry is not set up or yields some undesirable host name. It is done primarily for efficiency: this way, a connection attempt requires at most two resolver lookups, one reverse and one forward. If there is a resolver problem with some address, it becomes only that user's problem. A hypothetical alternative implementation that only did forward lookups would have to resolve every host name mentioned in pg_hba.conf during every connection attempt. That could be quite slow if many names are listed. And if there is a resolver problem with one of the host names, it becomes everyone's problem.
Also, a reverse lookup is necessary to implement the suffix matching feature, because the actual client host name needs to be known in order to match it against the pattern.
Note that this behavior is consistent with other popular implementations of host name-based access control, such as the Apache HTTP Server and TCP Wrappers.
IP-address
IP-mask
These two fields can be used as an alternative to the IP-address/mask-length notation. Instead of specifying the mask length, the actual mask is specified in a separate column. For example, 255.0.0.0 represents an IPv4 CIDR mask length of 8, and 255.255.255.255 represents a CIDR mask length of 32.
These fields only apply to host, hostssl, and hostnossl records.
auth-method
trust
reject
Reject the connection unconditionally. This is useful for "filtering out" certain hosts from a group: for example, a reject line could block a specific host from connecting, while a later line allows the remaining hosts in a specific network to connect.
scram-sha-256
md5
password
gss
sspi
ident
peer
ldap
radius
cert
pam
bsd
auth-options
After the auth-method field, there can be field(s) of the form name=value that specify options for the authentication method. Details about which options are available for which authentication methods appear below.
In addition to the method-specific options listed below, there is one method-independent authentication option clientcert, which can be specified in any hostssl record. When set to 1, this option requires the client to present a valid (trusted) SSL certificate, in addition to the other requirements of the authentication method.
Files included by @ constructs are read as lists of names, which can be separated by either whitespace or commas. Comments are introduced by #, just as in pg_hba.conf, and nested @ constructs are allowed. Unless the file name following @ is an absolute path, it is taken to be relative to the directory containing the referencing file.
Since the pg_hba.conf records are examined sequentially for each connection attempt, the order of the records is significant. Typically, earlier records will have tight connection match parameters and weaker authentication methods, while later records will have looser match parameters and stronger authentication methods. For example, one might wish to use trust authentication for local TCP/IP connections but require a password for remote TCP/IP connections. In this case a record specifying trust authentication for connections from 127.0.0.1 would appear before a record specifying password authentication for a wider range of allowed client IP addresses.
The pg_hba.conf file is read on start-up and when the main server process receives a SIGHUP signal. If you edit the file on an active system, you will need to signal the postmaster (using pg_ctl reload or kill -HUP) to make it re-read the file.
The preceding statement is not true on Microsoft Windows: there, any changes in the pg_hba.conf file are immediately applied by subsequent new connections.
To connect to a particular database, a user must not only pass the pg_hba.conf checks, but must have the CONNECT privilege for the database. If you wish to restrict which users can connect to which databases, it's usually easier to control this by granting/revoking CONNECT privilege than to put the rules in pg_hba.conf entries.
Example 20.1. Example pg_hba.conf Entries
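An abridged reconstruction of the stock example entries (addresses and method choices are illustrative):

# Allow any user on the local system to connect to any database with
# any database user name using Unix-domain sockets (the default for
# local connections).
#
# TYPE  DATABASE        USER            ADDRESS                 METHOD
local   all             all                                     trust

# The same using local loopback TCP/IP connections.
host    all             all             127.0.0.1/32            trust

# Allow any user from host 192.168.12.10 to connect to database
# "postgres" if the user's password is correctly supplied.
host    postgres        all             192.168.12.10/32        scram-sha-256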
The following parameters are intended for work on the PostgreSQL source code, and in some cases to assist with recovery of severely damaged databases. There should be no reason to use them on a production database. As such, they have been excluded from the sample postgresql.conf file. Note that many of these parameters require special source compilation flags to work at all.
allow_system_table_mods (boolean)
Allows modification of the structure of system tables. This is used by initdb. This parameter can only be set at server start.
ignore_system_indexes (boolean)
Ignore system indexes when reading system tables (but still update the indexes when modifying the tables). This is useful when recovering from damaged system indexes. This parameter cannot be changed after session start.
post_auth_delay (integer)
The amount of time to delay when a new server process is started, after it conducts the authentication procedure. This is intended to give developers an opportunity to attach to the server process with a debugger. If this value is specified without units, it is taken as seconds. A value of zero (the default) disables the delay. This parameter cannot be changed after session start.
pre_auth_delay (integer)
The amount of time to delay just after a new server process is forked, before it conducts the authentication procedure. This is intended to give developers an opportunity to attach to the server process with a debugger to trace down misbehavior in authentication. If this value is specified without units, it is taken as seconds. A value of zero (the default) disables the delay. This parameter can only be set in the postgresql.conf file or on the server command line.
trace_notify (boolean)
Generates a great amount of debugging output for the LISTEN and NOTIFY commands. or must be DEBUG1 or lower to send this output to the client or server logs, respectively.
trace_recovery_messages (enum)
Enables logging of recovery-related debugging output that otherwise would not be logged. This parameter allows the user to override the normal setting of , but only for specific messages. This is intended for use in debugging Hot Standby. Valid values are DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1, and LOG. The default, LOG, does not affect logging decisions at all. The other values cause recovery-related debug messages of that priority or higher to be logged as though they had LOG priority; for common settings of log_min_messages this results in unconditionally sending them to the server log. This parameter can only be set in the postgresql.conf file or on the server command line.
trace_sort (boolean)
If on, emit information about resource usage during sort operations. This parameter is only available if the TRACE_SORT macro was defined when PostgreSQL was compiled. (However, TRACE_SORT is currently defined by default.)
trace_locks (boolean)
If on, emit information about lock usage. Information dumped includes the type of lock operation, the type of lock and the unique identifier of the object being locked or unlocked. Also included are bit masks for the lock types already granted on this object as well as for the lock types awaited on this object. For each lock type a count of the number of granted locks and waiting locks is also dumped as well as the totals. An example of the log file output is shown here:
Details of the structure being dumped may be found in src/include/storage/lock.h.
This parameter is only available if the LOCK_DEBUG macro was defined when PostgreSQL was compiled.
trace_lwlocks (boolean)
If on, emit information about lightweight lock usage. Lightweight locks are intended primarily to provide mutual exclusion of access to shared-memory data structures.
This parameter is only available if the LOCK_DEBUG macro was defined when PostgreSQL was compiled.
trace_userlocks (boolean)
If on, emit information about user lock usage. Output is the same as for trace_locks, only for advisory locks.
This parameter is only available if the LOCK_DEBUG macro was defined when PostgreSQL was compiled.
trace_lock_oidmin (integer)
If set, do not trace locks for tables below this OID. (use to avoid output on system tables)
This parameter is only available if the LOCK_DEBUG macro was defined when PostgreSQL was compiled.
trace_lock_table (integer)
Unconditionally trace locks on this table (OID).
This parameter is only available if the LOCK_DEBUG macro was defined when PostgreSQL was compiled.
debug_deadlocks (boolean)
If set, dumps information about all current locks when a deadlock timeout occurs.
This parameter is only available if the LOCK_DEBUG macro was defined when PostgreSQL was compiled.
log_btree_build_stats (boolean)
If set, logs system resource usage statistics (memory and CPU) on various B-tree operations.
This parameter is only available if the BTREE_BUILD_STATS macro was defined when PostgreSQL was compiled.
wal_consistency_checking (string)
This parameter is intended to be used to check for bugs in the WAL redo routines. When enabled, full-page images of any buffers modified in conjunction with the WAL record are added to the record. If the record is subsequently replayed, the system will first apply each record and then test whether the buffers modified by the record match the stored images. In certain cases (such as hint bits), minor variations are acceptable, and will be ignored. Any unexpected differences will result in a fatal error, terminating recovery.
The default value of this setting is the empty string, which disables the feature. It can be set to all to check all records, or to a comma-separated list of resource managers to check only records originating from those resource managers. Currently, the supported resource managers are heap, heap2, btree, hash, gin, gist, sequence, spgist, brin, and generic. Only superusers can change this setting.
wal_debug (boolean)
If on, emit WAL-related debugging output. This parameter is only available if the WAL_DEBUG macro was defined when PostgreSQL was compiled.
ignore_checksum_failure (boolean)
Detection of a checksum failure during a read normally causes PostgreSQL to report an error, aborting the current transaction. Setting ignore_checksum_failure to on causes the system to ignore the failure (but still report a warning), and continue processing. This behavior may cause crashes, propagate or hide corruption, or other serious problems. However, it may allow you to get past the error and retrieve undamaged tuples that might still be present in the table if the block header is still sane. If the header is corrupt an error will be reported even if this option is enabled. The default setting is off, and it can only be changed by a superuser.
zero_damaged_pages (boolean)
Detection of a damaged page header normally causes PostgreSQL to report an error, aborting the current transaction. Setting zero_damaged_pages to on causes the system to instead report a warning, zero out the damaged page in memory, and continue processing. This behavior will destroy data, namely all the rows on the damaged page. However, it does allow you to get past the error and retrieve rows from any undamaged pages that might be present in the table. It is useful for recovering data if corruption has occurred due to a hardware or software error. You should generally not set this on until you have given up hope of recovering data from the damaged pages of a table. Zeroed-out pages are not forced to disk so it is recommended to recreate the table or the index before turning this parameter off again. The default setting is off, and it can only be changed by a superuser.
jit_debugging_support (boolean)
If LLVM has the required functionality, register generated functions with GDB. This makes debugging easier. The default setting is off. This parameter can only be set at server start.
jit_dump_bitcode (boolean)
jit_expressions (boolean)
jit_profiling_support (boolean)
If LLVM has the required functionality, emit the data needed to allow perf to profile functions generated by JIT. This writes out files to $HOME/.debug/jit/; the user is responsible for performing cleanup when desired. The default setting is off. This parameter can only be set at server start.
jit_tuple_deforming (boolean)
Vacuum also allows removal of old files from the pg_xact subdirectory, which is why the default is a relatively low 200 million transactions. This parameter can only be set at server start, but the setting can be reduced for individual tables by changing table storage parameters. For more information see .
Vacuuming multixacts also allows removal of old files from the pg_multixact/members and pg_multixact/offsets subdirectories, which is why the default is a relatively low 400 million multixacts. This parameter can only be set at server start, but the setting can be reduced for individual tables by changing table storage parameters. For more information see .
Specifies the cost delay value that will be used in automatic VACUUM operations. If -1 is specified, the regular value will be used. The default value is 20 milliseconds. This parameter can only be set in the postgresql.conf file or on the server command line; but the setting can be overridden for individual tables by changing table storage parameters.
Specifies the cost limit value that will be used in automatic VACUUM operations. If -1 is specified (which is the default), the regular value will be used. Note that the value is distributed proportionally among the running autovacuum workers, if there is more than one, so that the sum of the limits for each worker does not exceed the value of this parameter. This parameter can only be set in the postgresql.conf file or on the server command line; but the setting can be overridden for individual tables by changing table storage parameters.
This access control mechanism is independent of the one described in .
Enables SSL connections. Please read before using this. This parameter can only be set in the postgresql.conf file or on the server command line. The default is off.
When a password is specified in or , this parameter determines the algorithm to use to encrypt the password. The default value is md5, which stores the password as an MD5 hash (on is also accepted, as an alias for md5). Setting this parameter to scram-sha-256 will encrypt the password with SCRAM-SHA-256.
Note that older clients might lack support for the SCRAM authentication mechanism, and hence not work with passwords encrypted with SCRAM-SHA-256. See for more details.
Sets the location of the Kerberos server key file. See for details. This parameter can only be set in the postgresql.conf file or on the server command line.
For more information on setting up SSL, see .
Enables or disables the query planner's use of index-only scan plan types (see ). The default is on.
Enables or disables the query planner's ability to eliminate a partitioned table's partitions from query plans. This also controls the planner's ability to generate query plans which allow the query executor to remove (ignore) partitions during query execution. The default is on. See for details.
Sets the planner's estimate of the cost of a disk page fetch that is part of a series of sequential fetches. The default is 1.0. This value can be overridden for tables and indexes in a particular tablespace by setting the tablespace parameter of the same name (see ).
Sets the planner's estimate of the cost of a non-sequentially-fetched disk page. The default is 4.0. This value can be overridden for tables and indexes in a particular tablespace by setting the tablespace parameter of the same name (see ).
The genetic query optimizer (GEQO) is an algorithm that does query planning using heuristic searching. This reduces planning time for complex queries (those joining many relations), at the cost of producing plans that are sometimes inferior to those found by the normal exhaustive-search algorithm. For more information see .
Sets the default statistics target for table columns without a column-specific target set via ALTER TABLE SET STATISTICS. Larger values increase the time needed to do ANALYZE, but might improve the quality of the planner's estimates. The default is 100. For more information on the use of statistics by the PostgreSQL query planner, refer to .
Refer to for more information on using constraint exclusion and partitioning.
The planner will merge sub-queries into upper queries if the resulting FROM list would have no more than this many items. Smaller values reduce planning time but might yield inferior query plans. The default is eight. For more information see .
Setting this value to or more may trigger use of the GEQO planner, resulting in non-optimal plans. See .
By default, this variable is set the same as from_collapse_limit, which is appropriate for most uses. Setting it to 1 prevents any reordering of explicit JOINs. Thus, the explicit join order specified in the query will be the actual order in which the relations are joined. Because the query planner does not always choose the optimal join order, advanced users can elect to temporarily set this variable to 1, and then specify the join order they desire explicitly. For more information see .
Setting this value to or more may trigger use of the GEQO planner, resulting in non-optimal plans. See .
Reports the database encoding (character set). It is determined when the database is created. Ordinarily, clients need only be concerned with the value of .
Reports the number of blocks (pages) in a WAL segment file. The total size of a WAL segment file in bytes is equal to wal_segment_size multiplied by wal_block_size; by default this is 16MB. See for more information.
GSSAPI support has to be enabled when PostgreSQL is built; see for more information.
When GSSAPI uses Kerberos, it uses a standard principal in the format servicename/hostname@realm. The PostgreSQL server will accept any principal that is included in the keytab used by the server, but care needs to be taken to specify the correct principal details when making the connection from the client using the krbsrvname connection parameter. (See also .) The installation default can be changed from the default postgres at build time using ./configure --with-krb-srvnam=whatever. In most environments, this parameter never needs to be changed. Some Kerberos implementations might require a different service name, such as Microsoft Active Directory which requires the service name to be in upper case (POSTGRES).
Make sure that your server keytab file is readable (and preferably only readable, not writable) by the PostgreSQL server account. (See also .) The location of the key file is specified by the configuration parameter. The default is /usr/local/pgsql/etc/krb5.keytab (or whatever directory was specified as sysconfdir at build time). For security reasons, it is recommended to use a separate keytab just for the PostgreSQL server rather than opening up permissions on the system keytab file.
When connecting to the database make sure you have a ticket for a principal matching the requested database user name. For example, for database user name fred, principal fred@EXAMPLE.COM would be able to connect. To also allow principal fred/users.example.com@EXAMPLE.COM, use a user name map, as described in .
The following configuration options are supported for GSSAPI:
include_realm
If set to 0, the realm name from the authenticated user principal is stripped off before being passed through the user name mapping (). This is discouraged and is primarily available for backwards compatibility, as it is not secure in multi-realm environments unless krb_realm is also used. It is recommended to leave include_realm set to the default (1) and to provide an explicit mapping in pg_ident.conf to convert principal names to PostgreSQL user names.
map
Allows for mapping between system and database user names. See for details. For a GSSAPI/Kerberos principal, such as username@EXAMPLE.COM (or, less commonly, username/hostbased@EXAMPLE.COM), the user name used for mapping is username@EXAMPLE.COM (or username/hostbased@EXAMPLE.COM, respectively), unless include_realm has been set to 0, in which case username (or username/hostbased) is what is seen as the system user name when mapping.
krb_realm
Matches the realm to restrict user principal names. If this parameter is set, only users of that realm will be accepted.
When using Kerberos authentication, SSPI works the same way GSSAPI does; see for details.
The following configuration options are supported for SSPI:
include_realm
If set to 0, the realm name from the authenticated user principal is stripped off before being passed through the user name mapping (). This is discouraged and is primarily available for backwards compatibility, as it is not secure in multi-realm environments unless krb_realm is also used. It is recommended to leave include_realm set to the default (1) and to provide an explicit mapping in pg_ident.conf to convert principal names to PostgreSQL user names.
compat_realm
If set to 1, the domain's SAM-compatible name (also known as the NetBIOS name) is used for the include_realm option. This is the default. If set to 0, the true realm name from the Kerberos user principal name is used.
map
Allows for mapping between system and database user names. See for details. For a SSPI/Kerberos principal, such as username@EXAMPLE.COM (or, less commonly, username/hostbased@EXAMPLE.COM), the user name used for mapping is username@EXAMPLE.COM (or username/hostbased@EXAMPLE.COM, respectively), unless include_realm has been set to 0, in which case username (or username/hostbased) is what is seen as the system user name when mapping.
krb_realm
Matches the realm to restrict user principal names. If this parameter is set, only users of that realm will be accepted.
When ident is specified for a local (non-TCP/IP) connection, peer authentication (see ) will be used instead.
Allows for mapping between system and database user names. See for details.
Allows for mapping between system and database user names. See for details.
Allows for mapping between system and database user names. See for details.
This authentication method operates similarly to password except that it uses PAM (Pluggable Authentication Modules) as the authentication mechanism. The default PAM service name is postgresql. PAM is used only to validate user name/password pairs and optionally the connected remote host name or IP address. Therefore the user must already exist in the database before PAM can be used for authentication. For more information about PAM, please read the Linux-PAM Page.
To make use of this option the server must be built with SSL support. Furthermore, SSL must be enabled by setting the ssl configuration parameter (see for more information). Otherwise, hostssl records are ignored except for logging a warning that they cannot match any connections.
Specifies the authentication method to use when a connection matches this record. The possible choices are summarized here; details are in .
Allow the connection unconditionally. This method allows anyone that can connect to the PostgreSQL database server to login as any PostgreSQL user they wish, without the need for a password or any other authentication. See for details.
Perform SCRAM-SHA-256 authentication to verify the user's password. See for details.
Perform SCRAM-SHA-256 or MD5 authentication to verify the user's password. See for details.
Require the client to supply an unencrypted password for authentication. Since the password is sent in clear text over the network, this should not be used on untrusted networks. See for details.
Use GSSAPI to authenticate the user. This is only available for TCP/IP connections. See for details.
Use SSPI to authenticate the user. This is only available on Windows. See for details.
Obtain the operating system user name of the client by contacting the ident server on the client and check if it matches the requested database user name. Ident authentication can only be used on TCP/IP connections. When specified for local connections, peer authentication will be used instead. See for details.
Obtain the client's operating system user name from the operating system and check if it matches the requested database user name. This is only available for local connections. See for details.
Authenticate using an LDAP server. See for details.
Authenticate using a RADIUS server. See for details.
Authenticate using SSL client certificates. See for details.
Authenticate using the Pluggable Authentication Modules (PAM) service provided by the operating system. See for details.
Authenticate using the BSD Authentication service provided by the operating system. See for details.
The system view pg_hba_file_rules can be helpful for pre-testing changes to the pg_hba.conf file, or for diagnosing problems if loading of the file did not have the desired effects. Rows in the view with non-null error fields indicate problems in the corresponding rules of the file.
Some examples of pg_hba.conf entries are shown in . See the next section for details on the different authentication methods.
Only has effect if data checksums are enabled.
Writes the generated LLVM IR out to the file system, inside data_directory. This is only useful for working on the internals of the JIT implementation. The default setting is off. This parameter can only be changed by a superuser.
Determines whether expressions are JIT compiled, when JIT compilation is activated (see ). The default is on.
Determines whether tuple deforming is JIT compiled, when JIT compilation is activated (see ). The default is on.
File | Contents | Effect
ssl_cert_file ($PGDATA/server.crt) | server certificate | sent to client to indicate server's identity
ssl_key_file ($PGDATA/server.key) | server private key | proves server certificate was sent by the owner; does not indicate certificate owner is trustworthy
ssl_ca_file ($PGDATA/root.crt) | trusted certificate authorities | checks that client certificate is signed by a trusted certificate authority
ssl_crl_file ($PGDATA/root.crl) | certificates revoked by certificate authorities | client certificate must not be on this list
CREATE TABLE AS
CREATE INDEX
CLUSTER
COPY
into tables that were created or truncated in the same transaction
Severity | Usage | syslog | eventlog
DEBUG1..DEBUG5 | Provides successively-more-detailed information for use by developers. | DEBUG | INFORMATION
INFO | Provides information implicitly requested by the user, e.g., output from VACUUM VERBOSE. | INFO | INFORMATION
NOTICE | Provides information that might be helpful to users, e.g., notice of truncation of long identifiers. | NOTICE | INFORMATION
WARNING | Provides warnings of likely problems, e.g., COMMIT outside a transaction block. | NOTICE | WARNING
ERROR | Reports an error that caused the current command to abort. | WARNING | ERROR
LOG | Reports information of interest to administrators, e.g., checkpoint activity. | INFO | INFORMATION
FATAL | Reports an error that caused the current session to abort. | ERR | ERROR
PANIC | Reports an error that caused all database sessions to abort. | CRIT | ERROR
Escape | Effect | Session only
%a | Application name | yes
%u | User name | yes
%d | Database name | yes
%r | Remote host name or IP address, and remote port | yes
%h | Remote host name or IP address | yes
%p | Process ID | no
%t | Time stamp without milliseconds | no
%m | Time stamp with milliseconds | no
%n | Time stamp with milliseconds (as a Unix epoch) | no
%i | Command tag: type of session's current command | yes
%e | SQLSTATE error code | no
%c | Session ID: see below | no
%l | Number of the log line for each session or process, starting at 1 | no
%s | Process start time stamp | no
%v | Virtual transaction ID (backendID/localXID) | no
%x | Transaction ID (0 if none is assigned) | no
%q | Produces no output, but tells non-session processes to stop at this point in the string; ignored by session processes | no
%% | Literal % | no
The Identification Protocol is not intended as an authorization or access control protocol.
--RFC 1413
Database roles are conceptually completely separate from operating system users. In practice it might be convenient to maintain a correspondence, but this is not required. Database roles are global across a database cluster installation (and not per individual database). To create a role use the CREATE ROLE SQL command:
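    CREATE ROLE name;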
name follows the rules for SQL identifiers: either unadorned without special characters, or double-quoted. (In practice, you will usually want to add additional options, such as LOGIN, to the command. More details appear below.) To remove an existing role, use the analogous DROP ROLE command:
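    DROP ROLE name;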
For convenience, the programs createuser and dropuser are provided as wrappers around these SQL commands that can be called from the shell command line:
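    createuser name
    dropuser name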
To determine the set of existing roles, examine the pg_roles system catalog, for example
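    SELECT rolname FROM pg_roles;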
The psql program's \du meta-command is also useful for listing the existing roles.
In order to bootstrap the database system, a freshly initialized system always contains one predefined role. This role is always a "superuser", and by default (unless altered when running initdb) it will have the same name as the operating system user that initialized the database cluster. Customarily, this role will be named postgres. In order to create more roles you first have to connect as this initial role.
Every connection to the database server is made using the name of some particular role, and this role determines the initial access privileges for commands issued in that connection. The role name to use for a particular database connection is indicated by the client that is initiating the connection request in an application-specific fashion. For example, the psql program uses the -U command line option to indicate the role to connect as. Many applications assume the name of the current operating system user by default (including createuser and psql). Therefore it is often convenient to maintain a naming correspondence between roles and operating system users.
The set of database roles a given client connection can connect as is determined by the client authentication setup, as explained in Chapter 20. (Thus, a client is not limited to connect as the role matching its operating system user, just as a person's login name need not match his or her real name.) Since the role identity determines the set of privileges available to a connected client, it is important to carefully configure privileges when setting up a multiuser environment.
It is frequently convenient to group users together to ease management of privileges: that way, privileges can be granted to, or revoked from, a group as a whole. In PostgreSQL this is done by creating a role that represents the group, and then granting membership in the group role to individual user roles.
To set up a group role, first create the role:
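    CREATE ROLE name;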
Typically a role being used as a group would not have the LOGIN attribute, though you can set it if you wish.
Once the group role exists, you can add and remove members using the GRANT and REVOKE commands:
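    GRANT group_role TO role1, ... ;
    REVOKE group_role FROM role1, ... ;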
You can grant membership to other group roles, too (since there isn't really any distinction between group roles and non-group roles). The database will not let you set up circular membership loops. Also, it is not permitted to grant membership in a role to PUBLIC.
The members of a group role can use the privileges of the role in two ways. First, every member of a group can explicitly do SET ROLE to temporarily "become" the group role. In this state, the database session has access to the privileges of the group role rather than the original login role, and any database objects created are considered owned by the group role not the login role. Second, member roles that have the INHERIT attribute automatically have use of the privileges of roles of which they are members, including any privileges inherited by those roles. As an example, suppose we have done:
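    CREATE ROLE joe LOGIN INHERIT;
    CREATE ROLE admin NOINHERIT;
    CREATE ROLE wheel NOINHERIT;
    GRANT admin TO joe;
    GRANT wheel TO admin;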
Immediately after connecting as role joe, a database session will have use of privileges granted directly to joe plus any privileges granted to admin, because joe "inherits" admin's privileges. However, privileges granted to wheel are not available, because even though joe is indirectly a member of wheel, the membership is via admin which has the NOINHERIT attribute. After:
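    SET ROLE admin;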
the session would have use of only those privileges granted to admin, and not those granted to joe. After:
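    SET ROLE wheel;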
the session would have use of only those privileges granted to wheel, and not those granted to either joe or admin. The original privilege state can be restored with any of:
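    SET ROLE joe;
    SET ROLE NONE;
    RESET ROLE;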
The SET ROLE command always allows selecting any role that the original login role is directly or indirectly a member of. Thus, in the above example, it is not necessary to become admin before becoming wheel.
In the SQL standard, there is a clear distinction between users and roles, and users do not automatically inherit privileges while roles do. This behavior can be obtained in PostgreSQL by giving roles being used as SQL roles the INHERIT attribute, while giving roles being used as SQL users the NOINHERIT attribute. However, PostgreSQL defaults to giving all roles the INHERIT attribute, for backward compatibility with pre-8.1 releases in which users always had use of permissions granted to groups they were members of.
The role attributes LOGIN, SUPERUSER, CREATEDB, and CREATEROLE can be thought of as special privileges, but they are never inherited as ordinary privileges on database objects are. You must actually SET ROLE to a specific role having one of these attributes in order to make use of the attribute. Continuing the above example, we might choose to grant CREATEDB and CREATEROLE to the admin role. Then a session connecting as role joe would not have these privileges immediately, only after doing SET ROLE admin.
To destroy a group role, use DROP ROLE:
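    DROP ROLE name;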
Any memberships in the group role are automatically revoked (but the member roles are not otherwise affected).
Because roles can own database objects and can hold privileges to access other objects, dropping a role is often not just a matter of a quick DROP ROLE. Any objects owned by the role must first be dropped or reassigned to other owners; and any permissions granted to the role must be revoked.
Ownership of objects can be transferred one at a time using ALTER commands, for example:
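    ALTER TABLE bobs_table OWNER TO alice;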
Alternatively, the REASSIGN OWNED command can be used to reassign ownership of all objects owned by the role-to-be-dropped to a single other role. Because REASSIGN OWNED cannot access objects in other databases, it is necessary to run it in each database that contains objects owned by the role. (Note that the first such REASSIGN OWNED will change the ownership of any shared-across-databases objects, that is, databases or tablespaces, that are owned by the role-to-be-dropped.)
Once any valuable objects have been transferred to new owners, any remaining objects owned by the role-to-be-dropped can be dropped with the DROP OWNED command. Again, this command cannot access objects in other databases, so it is necessary to run it in each database that contains objects owned by the role. Also, DROP OWNED will not drop entire databases or tablespaces, so it is necessary to do that manually if the role owns any databases or tablespaces that have not been transferred to new owners.
DROP OWNED also takes care of removing any privileges granted to the target role for objects that do not belong to it. Because REASSIGN OWNED does not touch such objects, it is typically necessary to run both REASSIGN OWNED and DROP OWNED (in that order!) to fully remove the dependencies of a role to be dropped.
In short, the most general recipe for removing a role that has been used to own objects is:
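    REASSIGN OWNED BY doomed_role TO successor_role;
    DROP OWNED BY doomed_role;
    -- repeat the above commands in each database of the cluster
    DROP ROLE doomed_role;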
When not all owned objects are to be transferred to the same successor owner, it is best to handle the exceptions manually and then perform the above steps to mop up.
If DROP ROLE is attempted while dependent objects still remain, messages will be issued identifying which objects need to be reassigned or dropped.
CREATE DATABASE actually works by copying an existing database. By default, it copies the standard system database named template1. Thus that database is the "template" from which new databases are made. If you add objects to template1, these objects will be copied into subsequently created user databases. This behavior allows site-local modifications to the standard set of objects in databases. For example, if you install the procedural language PL/Perl in template1, it will automatically be available in user databases without anything special being done when those databases are created.
There is a second standard system database named template0. This database contains the same data as the initial contents of template1, that is, only the standard objects predefined by your version of PostgreSQL. template0 should never be changed after the database cluster has been initialized. By instructing CREATE DATABASE to copy template0 instead of template1, you can create a "virgin" user database that contains none of the site-local additions in template1. This is particularly handy when restoring a pg_dump dump: the dump script should be restored in a virgin database to ensure that one recreates the correct contents of the dumped database, without conflicting with objects that might have been added to template1 later on.
Another common reason for copying template0 instead of template1 is that new encoding and locale settings can be specified when copying template0, whereas a copy of template1 must use the same settings it does. This is because template1 might contain encoding-specific or locale-specific data, while template0 is known not to.
To create a database by copying template0, use:
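    CREATE DATABASE dbname TEMPLATE template0;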
from the SQL environment, or:
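    createdb -T template0 dbname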
from the shell.
It is possible to create additional template databases, and indeed one can copy any database in a cluster by specifying its name as the template for CREATE DATABASE. It is important to understand, however, that this is not intended as a general-purpose "COPY DATABASE" facility. The principal limitation is that no other sessions can be connected to the source database while it is being copied. CREATE DATABASE will fail if any other connection exists when it starts; during the copy operation, new connections to the source database are prevented.
Two useful flags exist in pg_database for each database: the columns datistemplate and datallowconn. datistemplate can be set to indicate that a database is intended as a template for CREATE DATABASE. If this flag is set, the database can be cloned by any user with CREATEDB privileges; if it is not set, only superusers and the owner of the database can clone it. If datallowconn is false, then no new connections to that database will be allowed (but existing sessions are not terminated simply by setting the flag false). The template0 database is normally marked datallowconn = false to prevent its modification. Both template0 and template1 should always be marked with datistemplate = true.
template1 and template0 do not have any special status beyond the fact that the name template1 is the default source database name for CREATE DATABASE. For example, one could drop template1 and recreate it from template0 without any ill effects. This course of action might be advisable if one has carelessly added a bunch of junk in template1. (To delete template1, it must have pg_database.datistemplate = false.)
The postgres database is also created when a database cluster is initialized. This database is meant as a default database for users and applications to connect to. It is simply a copy of template1 and can be dropped and recreated if necessary.
PostgreSQL manages database access permissions using the concept of roles. A role can be thought of as either a database user, or a group of database users, depending on how the role is set up. Roles can own database objects (for example, tables and functions) and can assign privileges on those objects to other roles to control who has access to which objects. Furthermore, it is possible to grant membership in a role to another role, thus allowing the member role to use privileges assigned to another role.
The concept of roles subsumes the concepts of "users" and "groups". In PostgreSQL versions before 8.1, users and groups were distinct kinds of entities, but now there are only roles. Any role can act as a user, a group, or both.
This chapter describes how to create and manage roles. More information about the effects of role privileges on various database objects can be found in Section 5.7.
A database role can have a number of attributes that define its privileges and interact with the client authentication system.
login privilege
Only roles that have the LOGIN attribute can be used as the initial role name for a database connection. A role with the LOGIN attribute can be considered the same as a "database user". To create a role with login privilege, use either:
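    CREATE ROLE name LOGIN;
    CREATE USER name;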
(CREATE USER is equivalent to CREATE ROLE except that CREATE USER includes LOGIN by default, while CREATE ROLE does not.)
superuser status
A database superuser bypasses all permission checks, except the right to log in. This is a dangerous privilege and should not be used carelessly; it is best to do most of your work as a role that is not a superuser. To create a new database superuser, use CREATE ROLE name SUPERUSER. You must do this as a role that is already a superuser.
database creation
A role must be explicitly given permission to create databases (except for superusers, since those bypass all permission checks). To create such a role, use CREATE ROLE name CREATEDB.
role creation
A role must be explicitly given permission to create more roles (except for superusers, since those bypass all permission checks). To create such a role, use CREATE ROLE name CREATEROLE. A role with CREATEROLE privilege can alter and drop other roles, too, as well as grant or revoke membership in them. However, to create, alter, drop, or change membership of a superuser role, superuser status is required; CREATEROLE is insufficient for that.
initiating replication
A role must explicitly be given permission to initiate streaming replication (except for superusers, since those bypass all permission checks). A role used for streaming replication must have LOGIN permission as well. To create such a role, use CREATE ROLE name REPLICATION LOGIN.
password
A password is only significant if the client authentication method requires the user to supply a password when connecting to the database. The password and md5 authentication methods make use of passwords. Database passwords are separate from operating system passwords. Specify a password upon role creation with CREATE ROLE name PASSWORD 'string'.
A role's attributes can be modified after creation with ALTER ROLE. See the reference pages for the CREATE ROLE and ALTER ROLE commands for details.
It is good practice to create a role that has the CREATEDB and CREATEROLE privileges, but is not a superuser, and then use this role for all routine management of databases and roles. This approach avoids the dangers of operating as a superuser for tasks that do not really require it.
A role can also have role-specific defaults for many of the run-time configuration settings described in Chapter 19. For example, if for some reason you want to disable index scans (hint: not a good idea) anytime you connect, you can use:
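    ALTER ROLE rolename SET enable_indexscan TO off;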
This will save the setting (but not set it immediately). In subsequent connections by this role it will appear as though SET enable_indexscan TO off had been executed just before the session started. You can still alter this setting during the session; it will only be the default. To remove a role-specific default setting, use ALTER ROLE rolename RESET varname. Note that role-specific defaults attached to roles without LOGIN privilege are fairly useless, since they will never be invoked.
This authentication method operates similarly to password except that it uses PAM (Pluggable Authentication Modules) as the authentication mechanism. The default PAM service name is postgresql. PAM is used only to validate user name/password pairs and optionally the connected remote host name or IP address. Therefore the user must already exist in the database before PAM can be used for authentication. For more information about PAM, please read the Linux-PAM Page.
The following configuration options are supported for PAM:
pamservice
PAM service name.
pam_use_hostname
Determines whether the remote IP address or the host name is provided to PAM modules through the PAM_RHOST item. By default, the IP address is used. Set this option to 1 to use the resolved host name instead. Host name resolution can lead to login delays. (Most PAM configurations don't use this information, so it is only necessary to consider this setting if a PAM configuration was specifically created to make use of it.)
If PAM is set up to read /etc/shadow, authentication will fail because the PostgreSQL server is started by a non-root user. However, this is not an issue when PAM is configured to use LDAP or other authentication methods.
Tablespaces in PostgreSQL allow database administrators to define locations in the file system where the files representing database objects can be stored. Once created, a tablespace can be referred to by name when creating database objects.
By using tablespaces, an administrator can control the disk layout of a PostgreSQL installation. This is useful in at least two ways. First, if the partition or volume on which the cluster was initialized runs out of space and cannot be extended, a tablespace can be created on a different partition and used until the system can be reconfigured.
Second, tablespaces allow an administrator to use knowledge of the usage pattern of database objects to optimize performance. For example, an index which is very heavily used can be placed on a very fast, highly available disk, such as an expensive solid state device. At the same time, a table storing archived data which is rarely used or not performance critical can be stored on a less expensive, slower disk system.
Even though located outside the main PostgreSQL data directory, tablespaces are an integral part of the database cluster and cannot be treated as an autonomous collection of data files. They are dependent on metadata contained in the main data directory, and therefore cannot be attached to a different database cluster or backed up individually. Similarly, if you lose a tablespace (file deletion, disk failure, etc.), the database cluster might become unreadable or unable to start. Placing a tablespace on a temporary file system like a RAM disk therefore risks the reliability of the entire cluster.
To define a tablespace, use the CREATE TABLESPACE command, for example:
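    CREATE TABLESPACE fastspace LOCATION '/ssd1/postgresql/data';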
The location must be an empty directory that is owned by the PostgreSQL operating system user. All objects subsequently created within the tablespace will be stored in files underneath this directory. The location must not be on removable or transient storage, as the cluster might fail to function if the tablespace is missing or lost.
There is usually not much point in making more than one tablespace per logical file system, since you cannot control the location of individual files within a logical file system. However, PostgreSQL does not enforce any such limitation, and indeed it is not directly aware of the file system boundaries on your system. It just stores files in the directories you tell it to use.
Creation of the tablespace itself must be done as a database superuser, but after that you can allow ordinary database users to use it. To do that, grant them the CREATE privilege on it.
Tables, indexes, and entire databases can be assigned to particular tablespaces. To do so, a user with the CREATE privilege on a given tablespace must pass the tablespace name as a parameter to the relevant command. For example, the following creates a table in the tablespace space1:
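    CREATE TABLE foo(i int) TABLESPACE space1;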
Alternatively, use the default_tablespace parameter:
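    SET default_tablespace = space1;
    CREATE TABLE foo(i int);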
When default_tablespace is set to anything but an empty string, it supplies an implicit TABLESPACE clause for CREATE TABLE and CREATE INDEX commands that do not have an explicit one.
There is also a temp_tablespaces parameter, which determines the placement of temporary tables and indexes, as well as temporary files that are used for purposes such as sorting large data sets. This can be a list of tablespace names, rather than only one, so that the load associated with temporary objects can be spread over multiple tablespaces. A random member of the list is picked each time a temporary object is to be created.
The tablespace associated with a database is used to store the system catalogs of that database. Furthermore, it is the default tablespace used for tables, indexes, and temporary files created within the database, if no TABLESPACE clause is given and no other selection is specified by default_tablespace or temp_tablespaces (as appropriate). If a database is created without specifying a tablespace for it, it uses the same tablespace as the template database it is copied from.
Two tablespaces are automatically created when the database cluster is initialized. The pg_global tablespace is used for shared system catalogs. The pg_default tablespace is the default tablespace of the template1 and template0 databases (and, therefore, will be the default tablespace for other databases as well, unless overridden by a TABLESPACE clause in CREATE DATABASE).
Once created, a tablespace can be used from any database, provided the requesting user has sufficient privilege. This means that a tablespace cannot be dropped until all objects in all databases using the tablespace have been removed.
To remove an empty tablespace, use the DROP TABLESPACE command.
To determine the set of existing tablespaces, examine the pg_tablespace system catalog, for example
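    SELECT spcname FROM pg_tablespace;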
The psql program's \db meta-command is also useful for listing the existing tablespaces.
PostgreSQL makes use of symbolic links to simplify the implementation of tablespaces. This means that tablespaces can be used only on systems that support symbolic links.
The directory $PGDATA/pg_tblspc contains symbolic links that point to each of the non-built-in tablespaces defined in the cluster. Although not recommended, it is possible to adjust the tablespace layout by hand by redefining these links. Under no circumstances perform this operation while the server is running. Note that in PostgreSQL 9.1 and earlier you will also need to update the pg_tablespace catalog with the new locations. (If you do not, pg_dump will continue to output the old tablespace locations.)
Functions, triggers and row-level security policies allow users to insert code into the backend server that other users might execute unintentionally. Hence, these mechanisms permit users to "Trojan horse" others with relative ease. The strongest protection is tight control over who can define objects. Where that is infeasible, write queries referring only to objects having trusted owners. Remove from search_path the public schema and any other schemas that permit untrusted users to create objects.
Functions run inside the backend server process with the operating system permissions of the database server daemon. If the programming language used for the function allows unchecked memory accesses, it is possible to change the server's internal data structures. Hence, among many other things, such functions can circumvent any system access controls. Function languages that allow such access are considered “untrusted”, and PostgreSQL allows only superusers to create functions written in those languages.
Table of Contents
22.1. Overview
22.2. Creating a Database
22.3. Template Databases
22.4. Database Configuration
22.5. Destroying a Database
22.6. Tablespaces
Every instance of a running PostgreSQL server manages one or more databases. Databases are the topmost hierarchical level for organizing SQL objects ("database objects"). This chapter describes the properties of databases, and how to create, manage, and destroy them.
The collation feature allows specifying the sort order and character classification behavior of data per-column, or even per-operation. This alleviates the restriction that the LC_COLLATE and LC_CTYPE settings of a database cannot be changed after its creation.
Conceptually, every expression of a collatable data type has a collation. (The built-in collatable data types are text, varchar, and char. User-defined base types can also be marked collatable, and of course a domain over a collatable data type is collatable.) If the expression is a column reference, the collation of the expression is the defined collation of the column. If the expression is a constant, the collation is the default collation of the data type of the constant. The collation of a more complex expression is derived from the collations of its inputs, as described below.
The collation of an expression can be the “default” collation, which means the locale settings defined for the database. It is also possible for an expression's collation to be indeterminate. In such cases, ordering operations and other operations that need to know the collation will fail.
When the database system has to perform an ordering or a character classification, it uses the collation of the input expression. This happens, for example, with ORDER BY clauses and function or operator calls such as <. The collation to apply for an ORDER BY clause is simply the collation of the sort key. The collation to apply for a function or operator call is derived from the arguments, as described below. In addition to comparison operators, collations are taken into account by functions that convert between lower and upper case letters, such as lower, upper, and initcap; by pattern matching operators; and by to_char and related functions.
For a function or operator call, the collation that is derived by examining the argument collations is used at run time for performing the specified operation. If the result of the function or operator call is of a collatable data type, the collation is also used at parse time as the defined collation of the function or operator expression, in case there is a surrounding expression that requires knowledge of its collation.
The collation derivation of an expression can be implicit or explicit. This distinction affects how collations are combined when multiple different collations appear in an expression. An explicit collation derivation occurs when a COLLATE clause is used; all other collation derivations are implicit. When multiple collations need to be combined, for example in a function call, the following rules are used:
If any input expression has an explicit collation derivation, then all explicitly derived collations among the input expressions must be the same, otherwise an error is raised. If any explicitly derived collation is present, that is the result of the collation combination.
Otherwise, all input expressions must have the same implicit collation derivation or the default collation. If any non-default collation is present, that is the result of the collation combination. Otherwise, the result is the default collation.
If there are conflicting non-default implicit collations among the input expressions, then the combination is deemed to have indeterminate collation. This is not an error condition unless the particular function being invoked requires knowledge of the collation it should apply. If it does, an error will be raised at run-time.
For example, consider this table definition:
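    CREATE TABLE test1 (
        a text COLLATE "de_DE",
        b text COLLATE "es_ES",
        ...
    );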
Then in
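    SELECT a < 'foo' FROM test1;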
the < comparison is performed according to de_DE rules, because the expression combines an implicitly derived collation with the default collation. But in
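    SELECT a < ('foo' COLLATE "fr_FR") FROM test1;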
the comparison is performed using fr_FR rules, because the explicit collation derivation overrides the implicit one. Furthermore, given
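    SELECT a < b FROM test1;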
the parser cannot determine which collation to apply, since the a and b columns have conflicting implicit collations. Since the < operator does need to know which collation to use, this will result in an error. The error can be resolved by attaching an explicit collation specifier to either input expression, thus:
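    SELECT a < b COLLATE "de_DE" FROM test1;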
or equivalently
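    SELECT a COLLATE "de_DE" < b FROM test1;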
On the other hand, the structurally similar case
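    SELECT a || b FROM test1;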
does not result in an error, because the || operator does not care about collations: its result is the same regardless of the collation.
The collation assigned to a function or operator's combined input expressions is also considered to apply to the function or operator's result, if the function or operator delivers a result of a collatable data type. So, in
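    SELECT * FROM test1 ORDER BY a || 'foo';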
the ordering will be done according to de_DE rules. But this query:
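    SELECT * FROM test1 ORDER BY a || b;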
results in an error, because even though the || operator doesn't need to know a collation, the ORDER BY clause does. As before, the conflict can be resolved with an explicit collation specifier:
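    SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";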
A collation is an SQL schema object that maps an SQL name to locales provided by libraries installed in the operating system. A collation definition has a provider that specifies which library supplies the locale data. One standard provider name is libc, which uses the locales provided by the operating system C library. These are the locales that most tools provided by the operating system use. Another provider is icu, which uses the external ICU library. ICU locales can only be used if support for ICU was configured when PostgreSQL was built.
A collation object provided by libc maps to a combination of LC_COLLATE and LC_CTYPE settings, as accepted by the setlocale() system library call. (As the name would suggest, the main purpose of a collation is to set LC_COLLATE, which controls the sort order. But it is rarely necessary in practice to have an LC_CTYPE setting that is different from LC_COLLATE, so it is more convenient to collect these under one concept than to create another infrastructure for setting LC_CTYPE per expression.) Also, a libc collation is tied to a character set encoding (see Section 23.3). The same collation name may exist for different encodings.
A collation object provided by icu maps to a named collator provided by the ICU library. ICU does not support separate "collate" and "ctype" settings, so they are always the same. Also, ICU collations are independent of the encoding, so there is always only one ICU collation of a given name in a database.
23.2.2.1. Standard Collations
On all platforms, the collations named default, C, and POSIX are available. Additional collations may be available depending on operating system support. The default collation selects the LC_COLLATE and LC_CTYPE values specified at database creation time. The C and POSIX collations both specify "traditional C" behavior, in which only the ASCII letters "A" through "Z" are treated as letters, and sorting is done strictly by character code byte values.
Additionally, the SQL standard collation name ucs_basic is available for encoding UTF8. It is equivalent to C and sorts by Unicode code point.
23.2.2.2. Predefined Collations
If the operating system provides support for using multiple locales within a single program (newlocale and related functions), or if support for ICU is configured, then when a database cluster is initialized, initdb populates the system catalog pg_collation with collations based on all the locales it finds in the operating system at the time.
To inspect the currently available locales, use the query SELECT * FROM pg_collation, or the command \dOS+ in psql.
23.2.2.2.1. libc collations
For example, the operating system might provide a locale named de_DE.utf8. initdb would then create a collation named de_DE.utf8 for encoding UTF8 that has both LC_COLLATE and LC_CTYPE set to de_DE.utf8. It will also create a collation with the .utf8 tag stripped off the name. So you could also use the collation under the name de_DE, which is less cumbersome to write and makes the name less encoding-dependent. Note that, nevertheless, the initial set of collation names is platform-dependent.
The default set of collations provided by libc map directly to the locales installed in the operating system, which can be listed using the command locale -a. In case a libc collation is needed that has different values for LC_COLLATE and LC_CTYPE, or if new locales are installed in the operating system after the database system was initialized, then a new collation may be created using the CREATE COLLATION command. New operating system locales can also be imported en masse using the pg_import_system_collations() function.
Within any particular database, only collations that use that database's encoding are of interest. Other entries in pg_collation are ignored. Thus, a stripped collation name such as de_DE can be considered unique within a given database even though it would not be unique globally. Use of the stripped collation names is recommended, since it will make one less thing you need to change if you decide to change to another database encoding. Note however that the default, C, and POSIX collations can be used regardless of the database encoding.
PostgreSQL considers distinct collation objects to be incompatible even when they have identical properties. Thus for example,
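    SELECT a COLLATE "C" < b COLLATE "POSIX" FROM test1;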
will draw an error even though the C and POSIX collations have identical behaviors. Mixing stripped and non-stripped collation names is therefore not recommended.
23.2.2.2.2. ICU collations
With ICU, it is not sensible to enumerate all possible locale names. ICU uses a particular naming system for locales, but there are many more ways to name a locale than there are actually distinct locales. initdb uses the ICU APIs to extract a set of distinct locales to populate the initial set of collations. Collations provided by ICU are created in the SQL environment with names in BCP 47 language tag format, with a "private use" extension -x-icu appended, to distinguish them from libc locales.
Here are some example collations that might be created:
de-x-icu
German collation, default variant
de-AT-x-icu
German collation for Austria, default variant
(There are also, say, de-DE-x-icu or de-CH-x-icu, but as of this writing, they are equivalent to de-x-icu.)
und-x-icu (for "undefined")
ICU "root" collation. Use this to get a reasonable language-agnostic sort order.
Some (less frequently used) encodings are not supported by ICU. When the database encoding is one of these, ICU collation entries in pg_collation are ignored. Attempting to use one will draw an error along the lines of "collation "de-x-icu" for encoding "WIN874" does not exist".
23.2.2.3. Creating New Collation Objects
If the standard and predefined collations are not sufficient, users can create their own collation objects using the SQL command CREATE COLLATION.
The standard and predefined collations are in the schema pg_catalog, like all predefined objects. User-defined collations should be created in user schemas. This also ensures that they are saved by pg_dump.
23.2.2.3.1. libc collations
New libc collations can be created like this:
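    CREATE COLLATION german (provider = libc, locale = 'de_DE');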
The exact values that are acceptable for the locale clause in this command depend on the operating system. On Unix-like systems, the command locale -a will show a list.
Since the predefined libc collations already include all collations defined in the operating system when the database instance is initialized, it is not often necessary to manually create new ones. Reasons might be if a different naming system is desired (in which case see also Section 23.2.2.3.3) or if the operating system has been upgraded to provide new locale definitions (in which case see also pg_import_system_collations()).
23.2.2.3.2. ICU collations
ICU allows collations to be customized beyond the basic language+country set that is preloaded by initdb. Users are encouraged to define their own collation objects that make use of these facilities to suit the sorting behavior to their requirements. See http://userguide.icu-project.org/locale and http://userguide.icu-project.org/collation/api for information on ICU locale naming. The set of acceptable names and attributes depends on the particular ICU version.
Here are some examples:
CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de-u-co-phonebk');
CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de@collation=phonebook');
German collation with phone book collation type
The first example selects the ICU locale using a "language tag" per BCP 47. The second example uses the traditional ICU-specific locale syntax. The first style is preferred going forward, but it is not supported by older ICU versions.
Note that you can name the collation objects in the SQL environment anything you want. In this example, we follow the naming style that the predefined collations use, which in turn also follow BCP 47, but that is not required for user-defined collations.
CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = 'und-u-co-emoji');
CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = '@collation=emoji');
Root collation with Emoji collation type, per Unicode Technical Standard #51
Observe how in the traditional ICU locale naming system, the root locale is selected by an empty string.
CREATE COLLATION digitslast (provider = icu, locale = 'en-u-kr-latn-digit');
CREATE COLLATION digitslast (provider = icu, locale = 'en@colReorder=latn-digit');
Sort digits after Latin letters. (The default is digits before letters.)
CREATE COLLATION upperfirst (provider = icu, locale = 'en-u-kf-upper');
CREATE COLLATION upperfirst (provider = icu, locale = 'en@colCaseFirst=upper');
Sort upper-case letters before lower-case letters. (The default is lower-case letters first.)
CREATE COLLATION special (provider = icu, locale = 'en-u-kf-upper-kr-latn-digit');
CREATE COLLATION special (provider = icu, locale = 'en@colCaseFirst=upper;colReorder=latn-digit');
Combines both of the above options.
CREATE COLLATION numeric (provider = icu, locale = 'en-u-kn-true');
CREATE COLLATION numeric (provider = icu, locale = 'en@colNumeric=yes');
Numeric ordering, sorts sequences of digits by their numeric value, for example: A-21 < A-123 (also known as natural sort).
See Unicode Technical Standard #35 and BCP 47 for details. The list of possible collation types (co subtag) can be found in the CLDR repository. The ICU Locale Explorer can be used to check the details of a particular locale definition. The examples using the k* subtags require at least ICU version 54.
Note that while this system allows creating collations that "ignore case" or "ignore accents" or similar (using the ks key), PostgreSQL does not at the moment allow such collations to act in a truly case- or accent-insensitive manner. Any strings that compare equal according to the collation but are not byte-wise equal will be sorted according to their byte values.
By design, ICU will accept almost any string as a locale name and match it to the closest locale it can provide, using the fallback procedure described in its documentation. Thus, there will be no direct feedback if a collation specification is composed using features that the given ICU installation does not actually support. It is therefore recommended to create application-level test cases to check that the collation definitions satisfy one's requirements.
23.2.2.3.3. Copying Collations
The command CREATE COLLATION can also be used to create a new collation from an existing collation, which can be useful to be able to use operating-system-independent collation names in applications, create compatibility names, or use an ICU-provided collation under a more readable name. For example:
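    CREATE COLLATION german FROM "de_DE";
    CREATE COLLATION french FROM "fr-x-icu";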
As with everything that contains valuable data, PostgreSQL databases should be backed up regularly. While the procedure is essentially simple, it is important to have a clear understanding of the underlying techniques and assumptions.
There are three fundamentally different approaches to backing up PostgreSQL data:
SQL dump
File system level backup
Continuous archiving
Each has its own strengths and weaknesses; each is discussed in turn in the following sections.
In some situations it is worthwhile to rebuild indexes periodically with the REINDEX command or a series of individual rebuilding steps.
B-tree index pages that have become completely empty are reclaimed for re-use. However, there is still a possibility of inefficient use of space: if all but a few index keys on a page have been deleted, the page remains allocated. Therefore, a usage pattern in which most, but not all, keys in each range are eventually deleted will see poor use of space. For such usage patterns, periodic reindexing is recommended.
The potential for bloat in non-B-tree indexes has not been well researched. It is a good idea to periodically monitor the index's physical size when using any non-B-tree index type.
Also, for B-tree indexes, a freshly-constructed index is somewhat faster to access than one that has been updated many times, because logically adjacent pages are usually also physically adjacent in a newly built index. (This consideration does not apply to non-B-tree indexes.) It might be worthwhile to reindex periodically just to improve access speed.
REINDEX can be used safely and easily in all cases. But since the command requires an exclusive table lock, it is often preferable to execute an index rebuild with a sequence of creation and replacement steps. Index types that support CREATE INDEX with the CONCURRENTLY option can instead be recreated that way. If that is successful and the resulting index is valid, the original index can then be replaced by the newly built one using a combination of ALTER INDEX and DROP INDEX. When an index is used to enforce uniqueness or other constraints, ALTER TABLE might be necessary to swap the existing constraint with one enforced by the new index. Review this alternate multistep rebuild approach carefully before using it, as there are limitations on which indexes can be reindexed this way, and errors must be handled.
It is a good idea to save the database server's log output somewhere, rather than just discarding it via /dev/null. The log output is invaluable when diagnosing problems. However, the log output tends to be voluminous (especially at higher debug levels) so you won't want to save it indefinitely. You need to rotate the log files so that new log files are started and old ones removed after a reasonable period of time.
If you simply direct the stderr of postgres into a file, you will have log output, but the only way to truncate the log file is to stop and restart the server. This might be acceptable if you are using PostgreSQL in a development environment, but few production servers would find this behavior acceptable.
A better approach is to send the server's stderr output to some type of log rotation program. There is a built-in log rotation facility, which you can use by setting the configuration parameter logging_collector to true in postgresql.conf. The control parameters for this program are described in Section 19.8.1. You can also use this approach to capture the log data in machine readable CSV (comma-separated values) format.
Alternatively, you might prefer to use an external log rotation program if you have one that you are already using with other server software. For example, the rotatelogs tool included in the Apache distribution can be used with PostgreSQL. To do this, just pipe the server's stderr output to the desired program. If you start the server with pg_ctl, then stderr is already redirected to stdout, so you just need a pipe command, for example:
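    pg_ctl start | rotatelogs /var/log/pgsql_log 86400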
Another production-grade approach to managing log output is to send it to syslog and let syslog deal with file rotation. To do this, set the configuration parameter log_destination to syslog (to log to syslog only) in postgresql.conf. Then you can send a SIGHUP signal to the syslog daemon whenever you want to force it to start writing a new log file. If you want to automate log rotation, the logrotate program can be configured to work with log files from syslog.
On many systems, however, syslog is not very reliable, particularly with large log messages; it might truncate or drop messages just when you need them the most. Also, on Linux, syslog will flush each message to disk, yielding poor performance. (You can use a "-" at the start of the file name in the syslog configuration file to disable syncing.)
Note that all the solutions described above take care of starting new log files at configurable intervals, but they do not handle deletion of old, no-longer-useful log files. You will probably want to set up a batch job to periodically delete old log files. Another possibility is to configure the rotation program so that old log files are overwritten cyclically.
pgBadger is an external project that does sophisticated log file analysis. check_postgres provides Nagios alerts when important messages appear in the log files, as well as detection of many other extraordinary conditions.
PostgreSQL databases require periodic maintenance known as vacuuming. For many installations, it is sufficient to let vacuuming be performed by the autovacuum daemon, which is described in Section 24.1.6. You might need to adjust the autovacuuming parameters described there to obtain best results for your situation. Some database administrators will want to supplement or replace the daemon's activities with manually-managed VACUUM commands, which typically are executed according to a schedule by cron or Task Scheduler scripts. To set up manually-managed vacuuming properly, it is essential to understand the issues discussed in the next few subsections. Administrators who rely on autovacuuming may still wish to skim this material to help them understand and adjust autovacuuming.
PostgreSQL's VACUUM command has to process each table on a regular basis for several reasons:
To recover or reuse disk space occupied by updated or deleted rows.
To update data statistics used by the PostgreSQL query planner.
To update the visibility map, which speeds up index-only scans.
To protect against loss of very old data due to transaction ID wraparound or multixact ID wraparound.
Each of these reasons dictates performing VACUUM operations of varying frequency and scope, as explained in the following subsections.
There are two variants of VACUUM: standard VACUUM and VACUUM FULL. VACUUM FULL can reclaim more disk space but runs much more slowly. Also, the standard form of VACUUM can run in parallel with production database operations. (Commands such as SELECT, INSERT, UPDATE, and DELETE will continue to function normally, though you will not be able to modify the definition of a table with commands such as ALTER TABLE while it is being vacuumed.) VACUUM FULL requires an exclusive lock on the table it is working on, and therefore cannot be done in parallel with other use of the table. Generally, therefore, administrators should strive to use standard VACUUM and avoid VACUUM FULL.
VACUUM creates a substantial amount of I/O traffic, which can cause poor performance for other active sessions. There are configuration parameters that can be adjusted to reduce the performance impact of background vacuuming; see Section 19.4.4.
In PostgreSQL, an UPDATE or DELETE of a row does not immediately remove the old version of the row. This approach is necessary to gain the benefits of multiversion concurrency control (MVCC, see Chapter 13): the row version must not be deleted while it is still potentially visible to other transactions. But eventually, an outdated or deleted row version is no longer of interest to any transaction. The space it occupies must then be reclaimed for reuse by new rows, to avoid unbounded growth of disk space requirements. This is done by running VACUUM.
The standard form of VACUUM removes dead row versions in tables and indexes and marks the space available for future reuse. However, it will not return the space to the operating system, except in the special case where one or more pages at the end of a table become entirely free and an exclusive table lock can be easily obtained. In contrast, VACUUM FULL actively compacts tables by writing a complete new version of the table file with no dead space. This minimizes the size of the table, but can take a long time. It also requires extra disk space for the new copy of the table, until the operation completes.
The usual goal of routine vacuuming is to do standard VACUUMs often enough to avoid needing VACUUM FULL. The autovacuum daemon attempts to work this way, and in fact will never issue VACUUM FULL. In this approach, the idea is not to keep tables at their minimum size, but to maintain steady-state usage of disk space: each table occupies space equivalent to its minimum size plus however much space gets used up between vacuumings. Although VACUUM FULL can be used to shrink a table back to its minimum size and return the disk space to the operating system, there is not much point in this if the table will just grow again in the future. Thus, moderately-frequent standard VACUUM runs are a better approach than infrequent VACUUM FULL runs for maintaining heavily-updated tables.
Some administrators prefer to schedule vacuuming themselves, for example doing all the work at night when load is low. The difficulty with doing vacuuming according to a fixed schedule is that if a table has an unexpected spike in update activity, it may become bloated to the point that VACUUM FULL is really necessary to reclaim space. Using the autovacuum daemon alleviates this problem, since the daemon schedules vacuuming dynamically in response to update activity. It is unwise to disable the daemon completely unless you have an extremely predictable workload. One possible compromise is to set the daemon's parameters so that it will only react to unusually heavy update activity, thus keeping things from getting out of hand, while scheduled VACUUMs are expected to do the bulk of the work when the load is typical.
For those not using autovacuum, a typical approach is to schedule a database-wide VACUUM once a day during a low-usage period, supplemented by more frequent vacuuming of heavily-updated tables as necessary. (Some installations with extremely high update rates vacuum their busiest tables as often as once every few minutes.) If you have multiple databases in a cluster, don't forget to VACUUM each one; the program vacuumdb might be helpful.
Plain VACUUM may not be satisfactory when a table contains large numbers of dead row versions as a result of massive update or delete activity. If you have such a table and you need to reclaim the excess disk space it occupies, you will need to use VACUUM FULL, or alternatively CLUSTER or one of the table-rewriting variants of ALTER TABLE. These commands rewrite an entire new copy of the table and build new indexes for it. All these options require an exclusive lock. Note that they also temporarily use extra disk space approximately equal to the size of the table, since the old copies of the table and indexes can't be released until the new ones are complete.
If you have a table whose entire contents are deleted on a periodic basis, consider doing it with TRUNCATE rather than using DELETE followed by VACUUM. TRUNCATE removes the entire content of the table immediately, without requiring a subsequent VACUUM or VACUUM FULL to reclaim the now-unused disk space. The disadvantage is that strict MVCC semantics are violated.
The PostgreSQL query planner relies on statistical information about the contents of tables in order to generate good plans for queries. These statistics are gathered by the ANALYZE command, which can be invoked by itself or as an optional step in VACUUM. It is important to have reasonably accurate statistics, otherwise poor choices of plans might degrade database performance.
The autovacuum daemon, if enabled, will automatically issue ANALYZE commands whenever the content of a table has changed sufficiently. However, administrators might prefer to rely on manually-scheduled ANALYZE operations, particularly if it is known that update activity on a table will not affect the statistics of "interesting" columns. The daemon schedules ANALYZE strictly as a function of the number of rows inserted or updated; it has no knowledge of whether that will lead to meaningful statistical changes.
As with vacuuming for space recovery, frequent updates of statistics are more useful for heavily-updated tables than for seldom-updated ones. But even for a heavily-updated table, there might be no need for statistics updates if the statistical distribution of the data is not changing much. A simple rule of thumb is to think about how much the minimum and maximum values of the columns in the table change. For example, a timestamp column that contains the time of row update will have a constantly-increasing maximum value as rows are added and updated; such a column will probably need more frequent statistics updates than, say, a column containing URLs for pages accessed on a website. The URL column might receive changes just as often, but the statistical distribution of its values probably changes relatively slowly.
It is possible to run ANALYZE on specific tables and even just specific columns of a table, so the flexibility exists to update some statistics more frequently than others if your application requires it. In practice, however, it is usually best to just analyze the entire database, because it is a fast operation. ANALYZE uses a statistically random sampling of the rows of a table rather than reading every single row.
Although per-column tweaking of ANALYZE frequency might not be very productive, you might find it worthwhile to do per-column adjustment of the level of detail of the statistics collected by ANALYZE. Columns that are heavily used in WHERE clauses and have highly irregular data distributions might require a finer-grain data histogram than other columns. See ALTER TABLE SET STATISTICS, or change the database-wide default using the default_statistics_target configuration parameter.
Also, by default there is limited information available about the selectivity of functions. However, if you create an expression index that uses a function call, useful statistics will be gathered about the function, which can greatly improve query plans that use the expression index.
The autovacuum daemon does not issue ANALYZE commands for foreign tables, since it has no means of determining how often that might be useful. If your queries require statistics on foreign tables for proper planning, it's a good idea to run manually-managed ANALYZE commands on those tables on a suitable schedule.
Vacuum maintains a visibility map for each table to keep track of which pages contain only tuples that are known to be visible to all active transactions (and all future transactions, until the page is again modified). This has two purposes. First, vacuum itself can skip such pages on the next run, since there is nothing to clean up.
Second, it allows PostgreSQL to answer some queries using only the index, without reference to the underlying table. Since PostgreSQL indexes don't contain tuple visibility information, a normal index scan fetches the heap tuple for each matching index entry, to check whether it should be seen by the current transaction. An index-only scan, on the other hand, checks the visibility map first. If it's known that all tuples on the page are visible, the heap fetch can be skipped. This is most useful on large data sets where the visibility map can prevent disk accesses. The visibility map is vastly smaller than the heap, so it can easily be cached even when the heap is very large.
PostgreSQL's MVCC transaction semantics depend on being able to compare transaction ID (XID) numbers: a row version with an insertion XID greater than the current transaction's XID is "in the future" and should not be visible to the current transaction. But since transaction IDs have limited size (32 bits), a cluster that runs for a long time (more than 4 billion transactions) would suffer transaction ID wraparound: the XID counter wraps around to zero, and all of a sudden transactions that were in the past appear to be in the future, which means their output becomes invisible. In short, catastrophic data loss. (Actually the data is still there, but that's cold comfort if you cannot get at it.) To avoid this, it is necessary to vacuum every table in every database at least once every two billion transactions.
The reason that periodic vacuuming solves the problem is that VACUUM will mark rows as frozen, indicating that they were inserted by a transaction that committed sufficiently far in the past that the effects of the inserting transaction are certain to be visible to all current and future transactions. Normal XIDs are compared using modulo-2^32 arithmetic. This means that for every normal XID, there are two billion XIDs that are "older" and two billion that are "newer"; another way to say it is that the normal XID space is circular with no endpoint. Therefore, once a row version has been created with a particular normal XID, the row version will appear to be "in the past" for the next two billion transactions, no matter which normal XID we are talking about. If the row version still exists after more than two billion transactions, it will suddenly appear to be in the future. To prevent this, PostgreSQL reserves a special XID, FrozenTransactionId, which does not follow the normal XID comparison rules and is always considered older than every normal XID. Frozen row versions are treated as if the inserting XID were FrozenTransactionId, so that they will appear to be "in the past" to all normal transactions regardless of wraparound issues, and so such row versions will be valid until deleted, no matter how long that is.
In PostgreSQL versions before 9.4, freezing was implemented by actually replacing a row's insertion XID with FrozenTransactionId, which was visible in the row's xmin system column. Newer versions just set a flag bit, preserving the row's original xmin for possible forensic use. However, rows with xmin equal to FrozenTransactionId (2) may still be found in databases pg_upgrade'd from pre-9.4 versions.
Also, system catalogs may contain rows with xmin equal to BootstrapTransactionId (1), indicating that they were inserted during the first phase of initdb. Like FrozenTransactionId, this special XID is treated as older than every normal XID.
vacuum_freeze_min_age controls how old an XID value has to be before rows bearing that XID will be frozen. Increasing this setting may avoid unnecessary work if the rows that would otherwise be frozen will soon be modified again, but decreasing this setting increases the number of transactions that can elapse before the table must be vacuumed again.
VACUUM uses the visibility map to determine which pages of a table must be scanned. Normally, it will skip pages that don't have any dead row versions even if those pages might still have row versions with old XID values. Therefore, normal VACUUMs won't always freeze every old row version in the table. Periodically, VACUUM will perform an aggressive vacuum, skipping only those pages which contain neither dead rows nor any unfrozen XID or MXID values. vacuum_freeze_table_age controls when VACUUM does that: all-visible but not all-frozen pages are scanned if the number of transactions that have passed since the last such scan is greater than vacuum_freeze_table_age minus vacuum_freeze_min_age. Setting vacuum_freeze_table_age to 0 forces VACUUM to use this more aggressive strategy for all scans.
The maximum time that a table can go unvacuumed is two billion transactions minus the vacuum_freeze_min_age value at the time of the last aggressive vacuum. If it were to go unvacuumed for longer than that, data loss could result. To ensure that this does not happen, autovacuum is invoked on any table that might contain unfrozen rows with XIDs older than the age specified by the configuration parameter autovacuum_freeze_max_age. (This will happen even if autovacuum is disabled.)
This implies that if a table is not otherwise vacuumed, autovacuum will be invoked on it approximately once every autovacuum_freeze_max_age minus vacuum_freeze_min_age transactions. For tables that are regularly vacuumed for space reclamation purposes, this is of little importance. However, for static tables (including tables that receive inserts, but no updates or deletes), there is no need to vacuum for space reclamation, so it can be useful to try to maximize the interval between forced autovacuums on very large static tables. Obviously one can do this either by increasing autovacuum_freeze_max_age or decreasing vacuum_freeze_min_age.
The effective maximum for vacuum_freeze_table_age is 0.95 * autovacuum_freeze_max_age; a setting higher than that will be capped to the maximum. A value higher than autovacuum_freeze_max_age wouldn't make sense because an anti-wraparound autovacuum would be triggered at that point anyway, and the 0.95 multiplier leaves some breathing room to run a manual VACUUM before that happens. As a rule of thumb, vacuum_freeze_table_age should be set to a value somewhat below autovacuum_freeze_max_age, leaving enough gap so that a regularly scheduled VACUUM or an autovacuum triggered by normal delete and update activity is run in that window. Setting it too close could lead to anti-wraparound autovacuums, even though the table was recently vacuumed to reclaim space, whereas lower values lead to more frequent aggressive vacuuming.
The sole disadvantage of increasing autovacuum_freeze_max_age (and vacuum_freeze_table_age along with it) is that the pg_xact and pg_commit_ts subdirectories of the database cluster will take more space, because they must store the commit status and (if track_commit_timestamp is enabled) timestamp of all transactions back to the autovacuum_freeze_max_age horizon. The commit status uses two bits per transaction, so if autovacuum_freeze_max_age is set to its maximum allowed value of two billion, pg_xact can be expected to grow to about half a gigabyte and pg_commit_ts to about 20GB. If this is trivial compared to your total database size, setting autovacuum_freeze_max_age to its maximum allowed value is recommended. Otherwise, set it depending on what you are willing to allow for pg_xact and pg_commit_ts storage. (The default, 200 million transactions, translates to about 50MB of pg_xact storage and about 2GB of pg_commit_ts storage.)
One disadvantage of decreasing vacuum_freeze_min_age is that it might cause VACUUM to do useless work: freezing a row version is a waste of time if the row is modified soon thereafter (causing it to acquire a new XID). So the setting should be large enough that rows are not frozen until they are unlikely to change any more.
To track the age of the oldest unfrozen XIDs in a database, VACUUM stores XID statistics in the system tables pg_class and pg_database. In particular, the relfrozenxid column of a table's pg_class row contains the freeze cutoff XID that was used by the last aggressive VACUUM for that table. All rows inserted by transactions with XIDs older than this cutoff XID are guaranteed to have been frozen. Similarly, the datfrozenxid column of a database's pg_database row is a lower bound on the unfrozen XIDs appearing in that database; it is just the minimum of the per-table relfrozenxid values within the database. A convenient way to examine this information is to execute queries such as:
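    SELECT c.oid::regclass as table_name,
           greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as age
    FROM pg_class c
    LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
    WHERE c.relkind IN ('r', 'm');

    SELECT datname, age(datfrozenxid) FROM pg_database;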
The age column measures the number of transactions from the cutoff XID to the current transaction's XID.
VACUUM normally only scans pages that have been modified since the last vacuum, but relfrozenxid can only be advanced when every page of the table that might contain unfrozen XIDs is scanned. This happens when relfrozenxid is more than vacuum_freeze_table_age transactions old, when VACUUM's FREEZE option is used, or when all pages that are not already all-frozen happen to require vacuuming to remove dead row versions. When VACUUM scans every page in the table that is not already all-frozen, it should set age(relfrozenxid) to a value just a little more than the vacuum_freeze_min_age setting that was used (more by the number of transactions started since the VACUUM started). If no relfrozenxid-advancing VACUUM is issued on the table until autovacuum_freeze_max_age is reached, an autovacuum will soon be forced for the table.
If for some reason autovacuum fails to clear old XIDs from a table, the system will begin to emit warning messages like this when the database's oldest XIDs reach ten million transactions from the wraparound point:
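    WARNING:  database "mydb" must be vacuumed within 177009986 transactions
    HINT:  To avoid a database shutdown, execute a database-wide VACUUM in "mydb".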
(A manual VACUUM should fix the problem, as suggested by the hint; but note that the VACUUM must be performed by a superuser, else it will fail to process system catalogs and thus not be able to advance the database's datfrozenxid.) If these warnings are ignored, the system will shut down and refuse to start any new transactions once there are fewer than 1 million transactions left until wraparound:
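    ERROR:  database is not accepting commands to avoid wraparound data loss in database "mydb"
    HINT:  Stop the postmaster and vacuum that database in single-user mode.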
The 1-million-transaction safety margin exists to let the administrator recover without data loss, by manually executing the required VACUUM commands. However, since the system will not execute commands once it has gone into the safety shutdown mode, the only way to do this is to stop the server and start it in single-user mode to execute VACUUM. The shutdown mode is not enforced in single-user mode. See the postgres reference page for details about using single-user mode.
Multixact IDs are used to support row locking by multiple transactions. Since there is only limited space in a tuple header to store lock information, that information is encoded as a "multiple transaction ID", or multixact ID for short, whenever there is more than one transaction concurrently locking a row. Information about which transaction IDs are included in any particular multixact ID is stored separately in the pg_multixact subdirectory, and only the multixact ID appears in the xmax field in the tuple header. Like transaction IDs, multixact IDs are implemented as a 32-bit counter and corresponding storage, all of which requires careful aging management, storage cleanup, and wraparound handling. There is a separate storage area which holds the list of members in each multixact, which also uses a 32-bit counter and which must also be managed.
Whenever VACUUM scans any part of a table, it will replace any multixact ID it encounters which is older than vacuum_multixact_freeze_min_age by a different value, which can be the zero value, a single transaction ID, or a newer multixact ID. For each table, pg_class.relminmxid stores the oldest possible multixact ID still appearing in any tuple of that table. If this value is older than vacuum_multixact_freeze_table_age, an aggressive vacuum is forced. As discussed in the previous section, an aggressive vacuum means that only those pages which are known to be all-frozen will be skipped. mxid_age() can be used on pg_class.relminmxid to find its age.
Aggressive VACUUM scans, regardless of what causes them, enable advancing the value for that table. Eventually, as all tables in all databases are scanned and their oldest multixact values are advanced, on-disk storage for older multixacts can be removed.
As a safety device, an aggressive vacuum scan will occur for any table whose multixact-age is greater than autovacuum_multixact_freeze_max_age. Aggressive vacuum scans will also occur progressively for all tables, starting with those that have the oldest multixact-age, if the amount of used member storage space exceeds 50% of the addressable storage space. Both of these kinds of aggressive scans will occur even if autovacuum is nominally disabled.
PostgreSQL has an optional but highly recommended feature called autovacuum, whose purpose is to automate the execution of VACUUM and ANALYZE commands. When enabled, autovacuum checks for tables that have had a large number of inserted, updated or deleted tuples. These checks use the statistics collection facility; therefore, autovacuum cannot be used unless track_counts is set to true. In the default configuration, autovacuuming is enabled and the related configuration parameters are appropriately set.
The "autovacuum daemon" actually consists of multiple processes. There is a persistent daemon process, called the autovacuum launcher, which is in charge of starting autovacuum worker processes for all databases. The launcher will distribute the work across time, attempting to start one worker within each database every autovacuum_naptime seconds. (Therefore, if the installation has N databases, a new worker will be launched every autovacuum_naptime/N seconds.) A maximum of autovacuum_max_workers worker processes are allowed to run at the same time. If there are more than autovacuum_max_workers databases to be processed, the next database will be processed as soon as the first worker finishes. Each worker process will check each table within its database and execute VACUUM and/or ANALYZE as needed. log_autovacuum_min_duration can be set to monitor autovacuum workers' activity.
If several large tables all become eligible for vacuuming in a short amount of time, all autovacuum workers might become occupied with vacuuming those tables for a long period. This would result in other tables and databases not being vacuumed until a worker becomes available. There is no limit on how many workers might be in a single database, but workers do try to avoid repeating work that has already been done by other workers. Note that the number of running workers does not count towards the max_connections or superuser_reserved_connections limits.
Tables whose relfrozenxid value is more than autovacuum_freeze_max_age transactions old are always vacuumed (this also applies to those tables whose freeze max age has been modified via storage parameters; see below). Otherwise, if the number of tuples obsoleted since the last VACUUM exceeds the "vacuum threshold", the table is vacuumed. The vacuum threshold is defined as:
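    vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuples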
where the vacuum base threshold is autovacuum_vacuum_threshold, the vacuum scale factor is autovacuum_vacuum_scale_factor, and the number of tuples is pg_class.reltuples. The number of obsolete tuples is obtained from the statistics collector; it is a semi-accurate count updated by each UPDATE and DELETE operation. (It is only semi-accurate because some information might be lost under heavy load.) If the relfrozenxid value of the table is more than vacuum_freeze_table_age transactions old, an aggressive vacuum is performed to freeze old tuples and advance relfrozenxid; otherwise, only pages that have been modified since the last vacuum are scanned.
For analyze, a similar condition is used: the threshold, defined as:
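    analyze threshold = analyze base threshold + analyze scale factor * number of tuples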
is compared to the total number of tuples inserted, updated, or deleted since the last ANALYZE.
Temporary tables cannot be accessed by autovacuum. Therefore, appropriate vacuum and analyze operations should be performed via session SQL commands.
The default thresholds and scale factors are taken from postgresql.conf, but it is possible to override them (and many other autovacuum control parameters) on a per-table basis; see the storage parameters for more information. If a setting has been changed via a table's storage parameters, that value is used when processing that table; otherwise the global settings are used. See Section 19.10 for more details on the global settings.
When multiple workers are running, the autovacuum cost delay parameters (see Section 19.4.4) are "balanced" among all the running workers, so that the total I/O impact on the system is the same regardless of the number of workers actually running. However, any workers processing tables whose per-table autovacuum_vacuum_cost_delay or autovacuum_vacuum_cost_limit storage parameters have been set are not considered in the balancing algorithm.
Like any database software, PostgreSQL requires that certain tasks be performed regularly to achieve optimum performance. The tasks discussed here are required, but they are repetitive in nature and can easily be automated using standard tools such as cron scripts or Windows' Task Scheduler. It is the database administrator's responsibility to set up appropriate scripts, and to check that they execute successfully.
One obvious maintenance task is the creation of backup copies of the data on a regular schedule. Without a recent backup, you have no chance of recovery after a catastrophe (disk failure, fire, mistakenly dropping a critical table, etc.). The backup and recovery mechanisms available in PostgreSQL are discussed at length in Chapter 25.
The other main category of maintenance task is periodic "vacuuming" of the database. This activity is discussed in Section 24.1. Closely related to this is updating the statistics that will be used by the query planner, as discussed in Section 24.1.3.
Another task that might need periodic attention is log file management. This is discussed in Section 24.3.
check_postgres is available for monitoring database health and reporting unusual conditions. check_postgres integrates with Nagios and MRTG, but can be run standalone too.
PostgreSQL is low-maintenance compared to some other database management systems. Nonetheless, appropriate attention to these tasks will go far towards ensuring a pleasant and productive experience with the system.
Continuous archiving can be used to create a high availability (HA) cluster configuration with one or more standby servers ready to take over operations if the primary server fails. This capability is widely referred to as warm standby or log shipping.
The primary and standby server work together to provide this capability, though the servers are only loosely coupled. The primary server operates in continuous archiving mode, while each standby server operates in continuous recovery mode, reading the WAL files from the primary. No changes to the database tables are required to enable this capability, so it offers low administration overhead compared to some other replication solutions. This configuration also has relatively low performance impact on the primary server.
Directly moving WAL records from one database server to another is typically described as log shipping. PostgreSQL implements file-based log shipping by transferring WAL records one file (WAL segment) at a time. WAL files (16MB) can be shipped easily and cheaply over any distance, whether it be to an adjacent system, another system at the same site, or another system on the far side of the globe. The bandwidth required for this technique varies according to the transaction rate of the primary server. Record-based log shipping is more granular and streams WAL changes incrementally over a network connection (see Section 26.2.5).
It should be noted that log shipping is asynchronous, i.e., the WAL records are shipped after transaction commit. As a result, there is a window for data loss should the primary server suffer a catastrophic failure; transactions not yet shipped will be lost. The size of the data loss window in file-based log shipping can be limited by use of the archive_timeout parameter, which can be set as low as a few seconds. However, such a low setting will substantially increase the bandwidth required for file shipping. Streaming replication (see Section 26.2.5) allows a much smaller window of data loss.
Recovery performance is sufficiently good that the standby will typically be only moments away from full availability once it has been activated. As a result, this is called a warm standby configuration which offers high availability. Restoring a server from an archived base backup and rollforward will take considerably longer, so that technique only offers a solution for disaster recovery, not high availability. A standby server can also be used for read-only queries, in which case it is called a hot standby server. See Section 26.5 for more information.
It is usually wise to create the primary and standby servers so that they are as similar as possible, at least from the perspective of the database server. In particular, the path names associated with tablespaces will be passed across unmodified, so both primary and standby servers must have the same mount paths for tablespaces if that feature is used. Keep in mind that if CREATE TABLESPACE is executed on the primary, any new mount point needed for it must be created on the primary and all standby servers before the command is executed. Hardware need not be exactly the same, but experience shows that maintaining two identical systems is easier than maintaining two dissimilar ones over the lifetime of the application and system. In any case the hardware architecture must be the same; shipping from, say, a 32-bit to a 64-bit system will not work.
In general, log shipping between servers running different major PostgreSQL release levels is not possible. It is the policy of the PostgreSQL Global Development Group not to make changes to disk formats during minor release upgrades, so it is likely that running different minor release levels on primary and standby servers will work successfully. However, no formal support for that is offered and you are advised to keep primary and standby servers at the same release level as much as possible. When updating to a new minor release, the safest policy is to update the standby servers first; a new minor release is more likely to be able to read WAL files from a previous minor release than vice versa.
在備用模式下,伺服器連續套用從主要伺服器所接收的 WAL。備用伺服器可以透過 TCP 連線(串流複寫)從 WAL 歸檔(請參閱 restore_command)。備用伺服器也會嘗試恢復在備用集群的 pg_wal 目錄中能找到的任何 WAL。這通常發生在伺服器重新啟動之後,當備用資料庫再次重新執行在重新啟動之前從主服務器串流傳輸的 WAL 時,您也可以隨時手動將檔案複製到 pg_wal 以重新執行它們。
在啟動時,備用資料庫首先恢復存檔路徑中的所有可用的 WAL,然後呼叫 restore_command。一旦達到 WAL 可用的尾端並且 restore_command 失敗,它就會嘗試恢復 pg_wal 目錄中可用的任何WAL。如果失敗,並且已啟用串流複寫,則備用資料庫會嘗試連到主伺服器,並從 archive 或 pg_wal 中找到的最後一個有效記錄開始串流傳輸 WAL。 如果失敗或未啟用串流複寫,或者稍後中斷連線,則備用資料庫將返回步驟 1 並嘗試再次從存檔中還原交易。pg_wal 和串流複寫的重試循環一直持續到伺服器停止或觸發故障轉移為止。
退出備用模式,當執行 pg_ctl promote 或找到觸發器檔案(trigger_file)時,伺服器將切換到正常操作。在故障轉移之前,將恢復存檔或 pg_wal 中立即可用的 WAL,但不會嘗試連線到主要伺服器。
Set up continuous archiving on the primary to an archive directory accessible from the standby, as described in Section 25.3. The archive location should be accessible from the standby even when the master is down, i.e. it should reside on the standby server itself or another trusted server, not on the master server.
If you want to use streaming replication, set up authentication on the primary server to allow replication connections from the standby server(s); that is, create a role and provide a suitable entry or entries in pg_hba.conf with the database field set to replication. Also ensure max_wal_senders is set to a sufficiently large value in the configuration file of the primary server. If replication slots will be used, ensure that max_replication_slots is set sufficiently high as well.
Take a base backup as described in Section 25.3.2 to bootstrap the standby server.
To set up the standby server, restore the base backup taken from the primary server (see Section 25.3.4). Create a recovery command file recovery.conf in the standby's cluster data directory, and turn on standby_mode. Set restore_command to a simple command to copy files from the WAL archive. If you plan to have multiple standby servers for high availability purposes, set recovery_target_timeline to latest, to make the standby server follow the timeline change that occurs at failover to another standby.
Do not use pg_standby or similar tools with the built-in standby mode described here. restore_command should return immediately if the file does not exist; the server will retry the command again if necessary. See Section 26.4 for using tools like pg_standby.
If you want to use streaming replication, fill in primary_conninfo with a libpq connection string, including the host name (or IP address) and any additional details needed to connect to the primary server. If the primary needs a password for authentication, the password needs to be specified in primary_conninfo as well.
If you're setting up the standby server for high availability purposes, set up WAL archiving, connections and authentication like the primary server, because the standby server will work as a primary server after failover.
If you're using a WAL archive, its size can be minimized using the archive_cleanup_command parameter to remove files that are no longer required by the standby server. The pg_archivecleanup utility is designed specifically to be used with archive_cleanup_command in typical single-standby configurations; see pg_archivecleanup. Note, however, that if you're using the archive for backup purposes, you need to retain files needed to recover from at least the latest base backup, even if they're no longer needed by the standby.
A simple example of a recovery.conf is:
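```
# Host name, user, password and archive path below are illustrative.
standby_mode = 'on'
primary_conninfo = 'host=node1 user=foo password=foopass'
restore_command = 'cp /path/to/archive/%f %p'
archive_cleanup_command = 'pg_archivecleanup /path/to/archive %r'
```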
You can have any number of standby servers, but if you use streaming replication, make sure you set max_wal_senders high enough in the primary to allow them to be connected simultaneously.
Streaming replication allows a standby server to stay more up-to-date than is possible with file-based log shipping. The standby connects to the primary, which streams WAL records to the standby as they're generated, without waiting for the WAL file to be filled.
Streaming replication is asynchronous by default (see Section 26.2.8), in which case there is a small delay between committing a transaction in the primary and the changes becoming visible in the standby. This delay is however much smaller than with file-based log shipping, typically under one second assuming the standby is powerful enough to keep up with the load. With streaming replication, archive_timeout is not required to reduce the data loss window.
If you use streaming replication without file-based continuous archiving, the server might recycle old WAL segments before the standby has received them. If this occurs, the standby will need to be reinitialized from a new base backup. You can avoid this by setting wal_keep_segments to a value large enough to ensure that WAL segments are not recycled too early, or by configuring a replication slot for the standby. If you set up a WAL archive that's accessible from the standby, these solutions are not required, since the standby can always use the archive to catch up provided it retains enough segments.
To use streaming replication, set up a file-based log-shipping standby server as described in Section 26.2. The step that turns a file-based log-shipping standby into a streaming replication standby is setting the primary_conninfo setting in the recovery.conf file to point to the primary server. Set listen_addresses and authentication options (see pg_hba.conf) on the primary so that the standby server can connect to the replication pseudo-database on the primary server (see Section 26.2.5.1).
On systems that support the keepalive socket option, setting tcp_keepalives_idle, tcp_keepalives_interval and tcp_keepalives_count helps the primary promptly notice a broken connection.
Set the maximum number of concurrent connections from the standby servers (see max_wal_senders for details).
When the standby is started and primary_conninfo is set correctly, the standby will connect to the primary after replaying all WAL files available in the archive. If the connection is established successfully, you will see a walreceiver process in the standby, and a corresponding walsender process in the primary.
It is very important that the access privileges for replication be set up so that only trusted users can read the WAL stream, because it is easy to extract privileged information from it. Standby servers must authenticate to the primary as a superuser or an account that has the REPLICATION privilege. It is recommended to create a dedicated user account with REPLICATION and LOGIN privileges for replication. While REPLICATION privilege gives very high permissions, it does not allow the user to modify any data on the primary system, which the SUPERUSER privilege does.
Client authentication for replication is controlled by a pg_hba.conf record specifying replication in the database field. For example, if the standby is running on host IP 192.168.1.100 and the account name for replication is foo, the administrator can add the following line to the pg_hba.conf file on the primary:
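```
# md5 is one reasonable choice of authentication method here.
host  replication  foo  192.168.1.100/32  md5
```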
The host name and port number of the primary, connection user name, and password are specified in the recovery.conf file. The password can also be set in the ~/.pgpass file on the standby (specify replication in the database field). For example, if the primary is running on host IP 192.168.1.50, port 5432, the account name for replication is foo, and the password is foopass, the administrator can add the following line to the recovery.conf file on the standby:
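```
standby_mode = 'on'
primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
```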
An important health indicator of streaming replication is the amount of WAL records generated in the primary, but not yet applied in the standby. You can calculate this lag by comparing the current WAL write location on the primary with the last WAL location received by the standby. These locations can be retrieved using pg_current_wal_lsn on the primary and pg_last_wal_receive_lsn on the standby, respectively (see Table 9.79 and Table 9.80 for details). The last WAL receive location in the standby is also displayed in the process status of the WAL receiver process, displayed using the ps command (see Section 28.1 for details).
You can retrieve a list of WAL sender processes via the pg_stat_replication view. Large differences between pg_current_wal_lsn and the view's sent_lsn field might indicate that the master server is under heavy load, while differences between sent_lsn and pg_last_wal_receive_lsn on the standby might indicate network delay, or that the standby is under heavy load.
Replication slots provide an automated way to ensure that the master does not remove WAL segments until they have been received by all standbys, and that the master does not remove rows which could cause a recovery conflict even when the standby is disconnected.
In lieu of using replication slots, it is possible to prevent the removal of old WAL segments using wal_keep_segments, or by storing the segments in an archive using archive_command. However, these methods often result in retaining more WAL segments than required, whereas replication slots retain only the number of segments known to be needed. An advantage of these methods is that they bound the space requirement for pg_wal; there is currently no way to do this using replication slots.
Similarly, hot_standby_feedback and vacuum_defer_cleanup_age provide protection against relevant rows being removed by vacuum, but the former provides no protection during any time period when the standby is not connected, and the latter often needs to be set to a high value to provide adequate protection. Replication slots overcome these disadvantages.
Each replication slot has a name, which can contain lower-case letters, numbers, and the underscore character.
Existing replication slots and their state can be seen in the pg_replication_slots view.
Slots can be created and dropped either via the streaming replication protocol (see Section 52.4) or via SQL functions (see Section 9.26.6).
You can create a replication slot like this:
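For example, via the SQL function (the slot name node_a_slot is illustrative):
```
postgres=# SELECT * FROM pg_create_physical_replication_slot('node_a_slot');
  slot_name  | lsn
-------------+-----
 node_a_slot |
```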
To configure the standby to use this slot, primary_slot_name should be configured in the standby's recovery.conf. Here is a simple example:
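```
standby_mode = 'on'
primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
primary_slot_name = 'node_a_slot'
```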
The cascading replication feature allows a standby server to accept replication connections and stream WAL records to other standbys, acting as a relay. This can be used to reduce the number of direct connections to the master and also to minimize inter-site bandwidth overheads.
A standby acting as both a receiver and a sender is known as a cascading standby. Standbys that are more directly connected to the master are known as upstream servers, while those standby servers further away are downstream servers. Cascading replication does not place limits on the number or arrangement of downstream servers, though each standby connects to only one upstream server which eventually links to a single master/primary server.
A cascading standby sends not only WAL records received from the master but also those restored from the archive. So even if the replication connection in some upstream connection is terminated, streaming replication continues downstream for as long as new WAL records are available.
Cascading replication is currently asynchronous. Synchronous replication (see Section 26.2.8) settings have no effect on cascading replication at present.
Hot Standby feedback propagates upstream, whatever the cascaded arrangement.
If an upstream standby server is promoted to become the new master, downstream servers will continue to stream from the new master if recovery_target_timeline is set to 'latest'.
To use cascading replication, set up the cascading standby so that it can accept replication connections (that is, set max_wal_senders and hot_standby, and configure host-based authentication). You will also need to set primary_conninfo in the downstream standby to point to the cascading standby.
PostgreSQL streaming replication is asynchronous by default. If the primary server crashes then some transactions that were committed may not have been replicated to the standby server, causing data loss. The amount of data loss is proportional to the replication delay at the time of failover.
Synchronous replication offers the ability to confirm that all changes made by a transaction have been transferred to one or more synchronous standby servers. This extends the standard level of durability offered by a transaction commit. This level of protection is referred to as 2-safe replication in computer science theory, and group-1-safe (group-safe and 1-safe) when synchronous_commit is set to remote_write.
When requesting synchronous replication, each commit of a write transaction will wait until confirmation is received that the commit has been written to the write-ahead log on disk of both the primary and standby server. The only possibility that data can be lost is if both the primary and the standby suffer crashes at the same time. This can provide a much higher level of durability, though only if the sysadmin is cautious about the placement and management of the two servers. Waiting for confirmation increases the user's confidence that the changes will not be lost in the event of server crashes, but it also necessarily increases the response time for the requesting transaction. The minimum wait time is the round-trip time between primary and standby.
Read only transactions and transaction rollbacks need not wait for replies from standby servers. Subtransaction commits do not wait for responses from standby servers, only top-level commits. Long running actions such as data loading or index building do not wait until the very final commit message. All two-phase commit actions require commit waits, including both prepare and commit.
A synchronous standby can be a physical replication standby or a logical replication subscriber. It can also be any other physical or logical WAL replication stream consumer that knows how to send the appropriate feedback messages. Besides the built-in physical and logical replication systems, this includes special programs such as pg_receivewal and pg_recvlogical as well as some third-party replication systems and custom programs. Check the respective documentation for details on synchronous replication support.
Once streaming replication has been configured, configuring synchronous replication requires only one additional configuration step: synchronous_standby_names must be set to a non-empty value. synchronous_commit must also be set to on, but since this is the default value, typically no change is required. (See Section 19.5.1 and Section 19.6.2.) This configuration will cause each commit to wait for confirmation that the standby has written the commit record to durable storage. synchronous_commit can be set by individual users, so it can be configured in the configuration file, for particular users or databases, or dynamically by applications, in order to control the durability guarantee on a per-transaction basis.
After a commit record has been written to disk on the primary, the WAL record is then sent to the standby. The standby sends reply messages each time a new batch of WAL data is written to disk, unless wal_receiver_status_interval is set to zero on the standby. In the case that synchronous_commit is set to remote_apply, the standby sends reply messages when the commit record is replayed, making the transaction visible. If the standby is chosen as a synchronous standby, according to the setting of synchronous_standby_names on the primary, the reply messages from that standby will be considered along with those from other synchronous standbys to decide when to release transactions waiting for confirmation that the commit record has been received. These parameters allow the administrator to specify which standby servers should be synchronous standbys. Note that the configuration of synchronous replication is mainly on the master. Named standbys must be directly connected to the master; the master knows nothing about downstream standby servers using cascaded replication.
Setting synchronous_commit to remote_write will cause each commit to wait for confirmation that the standby has received the commit record and written it out to its own operating system, but not for the data to be flushed to disk on the standby. This setting provides a weaker guarantee of durability than on does: the standby could lose the data in the event of an operating system crash, though not a PostgreSQL crash. However, it's a useful setting in practice because it can decrease the response time for the transaction. Data loss could only occur if both the primary and the standby crash and the database of the primary gets corrupted at the same time.
Setting synchronous_commit to remote_apply will cause each commit to wait until the current synchronous standbys report that they have replayed the transaction, making it visible to user queries. In simple cases, this allows for load balancing with causal consistency.
Users will stop waiting if a fast shutdown is requested. However, as when using asynchronous replication, the server will not fully shut down until all outstanding WAL records are transferred to the currently connected standby servers.
Synchronous replication supports one or more synchronous standby servers; transactions will wait until all the standby servers which are considered as synchronous confirm receipt of their data. The number of synchronous standbys that transactions must wait for replies from is specified in synchronous_standby_names. This parameter also specifies a list of standby names and the method (FIRST and ANY) to choose synchronous standbys from the listed ones.
The method FIRST specifies a priority-based synchronous replication and makes transaction commits wait until their WAL records are replicated to the requested number of synchronous standbys chosen based on their priorities. The standbys whose names appear earlier in the list are given higher priority and will be considered as synchronous. Other standby servers appearing later in this list represent potential synchronous standbys. If any of the current synchronous standbys disconnects for whatever reason, it will be replaced immediately with the next-highest-priority standby.
An example of synchronous_standby_names for priority-based multiple synchronous standbys is:
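```
synchronous_standby_names = 'FIRST 2 (s1, s2, s3)'
```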
In this example, if four standby servers s1, s2, s3 and s4 are running, the two standbys s1 and s2 will be chosen as synchronous standbys because their names appear early in the list of standby names. s3 is a potential synchronous standby and will take over the role of synchronous standby when either of s1 or s2 fails. s4 is an asynchronous standby since its name is not in the list.
The method ANY specifies a quorum-based synchronous replication and makes transaction commits wait until their WAL records are replicated to at least the requested number of synchronous standbys in the list.
An example of synchronous_standby_names for quorum-based multiple synchronous standbys is:
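```
synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
```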
In this example, if four standby servers s1, s2, s3 and s4 are running, transaction commits will wait for replies from at least any two standbys of s1, s2 and s3. s4 is an asynchronous standby since its name is not in the list.
The synchronous states of standby servers can be viewed using the pg_stat_replication view.
Synchronous replication usually requires carefully planned and placed standby servers to ensure applications perform acceptably. Waiting doesn't utilize system resources, but transaction locks continue to be held until the transfer is confirmed. As a result, incautious use of synchronous replication will reduce performance for database applications because of increased response times and higher contention.
PostgreSQL allows the application developer to specify the durability level required via replication. This can be specified for the system overall, though it can also be specified for specific users or connections, or even individual transactions.
For example, an application workload might consist of: 10% of changes are important customer details, while 90% of changes are less important data that the business can more easily survive if it is lost, such as chat messages between users.
With synchronous replication options specified at the application level (on the primary) we can offer synchronous replication for the most important changes, without slowing down the bulk of the total workload. Application level options are an important and practical tool for allowing the benefits of synchronous replication for high performance applications.
You should consider that the network bandwidth must be higher than the rate of generation of WAL data.
synchronous_standby_names specifies the number and names of synchronous standbys that transaction commits made when synchronous_commit is set to on, remote_apply or remote_write will wait for responses from. Such transaction commits may never be completed if any one of the synchronous standbys should crash.
The best solution for high availability is to ensure you keep as many synchronous standbys as requested. This can be achieved by naming multiple potential synchronous standbys using synchronous_standby_names.
In a priority-based synchronous replication, the standbys whose names appear earlier in the list will be used as synchronous standbys. Standbys listed after these will take over the role of synchronous standby if one of current ones should fail.
In a quorum-based synchronous replication, all the standbys appearing in the list will be used as candidates for synchronous standbys. Even if one of them should fail, the other standbys will keep performing the role of candidates of synchronous standby.
When a standby first attaches to the primary, it will not yet be properly synchronized. This is described as catchup mode. Once the lag between standby and primary reaches zero for the first time we move to real-time streaming state. The catch-up duration may be long immediately after the standby has been created. If the standby is shut down, then the catch-up period will increase according to the length of time the standby has been down. The standby is only able to become a synchronous standby once it has reached streaming state. This state can be viewed using the pg_stat_replication view.
If the primary restarts while commits are waiting for acknowledgement, those waiting transactions will be marked fully committed once the primary database recovers. There is no way to be certain that all standbys have received all outstanding WAL data at the time of the crash of the primary. Some transactions may not show as committed on the standby, even though they show as committed on the primary. The guarantee we offer is that the application will not receive explicit acknowledgement of the successful commit of a transaction until the WAL data is known to be safely received by all the synchronous standbys.
If you really cannot keep as many synchronous standbys as requested then you should decrease the number of synchronous standbys that transaction commits must wait for responses from in synchronous_standby_names (or disable it) and reload the configuration file on the primary server.
If the primary is isolated from remaining standby servers you should fail over to the best candidate of those other remaining standby servers.
If you need to re-create a standby server while transactions are waiting, make sure that the commands pg_start_backup() and pg_stop_backup() are run in a session with synchronous_commit = off, otherwise those requests will wait forever for the standby to appear.
When continuous WAL archiving is used in a standby, there are two different scenarios: the WAL archive can be shared between the primary and the standby, or the standby can have its own WAL archive. When the standby has its own WAL archive, set archive_mode to always, and the standby will call the archive command for every WAL segment it receives, whether it's by restoring from the archive or by streaming replication. The shared archive can be handled similarly, but the archive_command must test if the file being archived exists already, and if the existing file has identical contents. This requires more care in the archive_command, as it must be careful to not overwrite an existing file with different contents, but return success if the exactly same file is archived twice. And all that must be done free of race conditions, if two servers attempt to archive the same file at the same time.
If archive_mode is set to on, the archiver is not enabled during recovery or standby mode. If the standby server is promoted, it will start archiving after the promotion, but will not archive any WAL it did not generate itself. To get a complete series of WAL files in the archive, you must ensure that all WAL is archived before it reaches the standby. This is inherently true with file-based log shipping, as the standby can only restore files that are found in the archive, but not if streaming replication is enabled. When a server is not in recovery mode, there is no difference between on and always modes.
This chapter discusses how to monitor the disk usage of a PostgreSQL database system.
Database servers can work together to allow a second server to take over quickly if the primary server fails (high availability), or to allow several servers to serve the same data (load balancing). Ideally, database servers could work together seamlessly. Web servers serving static web pages can be combined quite easily by merely load-balancing web requests to multiple machines. In fact, read-only database servers can be combined relatively easily too. Unfortunately, most database servers have a read/write mix of requests, and read/write servers are much harder to combine. This is because although read-only data needs to be placed on each server only once, a write to any server has to be propagated to all servers so that future read requests to those servers return consistent results.
This synchronization problem is the fundamental difficulty for servers working together. Because there is no single solution that eliminates the impact of the synchronization problem for all use cases, there are multiple solutions. Each solution addresses the problem in a different way and minimizes its impact for a specific workload.
Some solutions deal with synchronization by allowing only one server to modify the data. Servers that can modify data are called read/write, master or primary servers. Servers that track changes in the master are called standby or secondary servers. A standby server that cannot accept connections until it is promoted to a master server is called a warm standby server, and one that can accept connections and serve read-only queries is called a hot standby server.
Some solutions are synchronous, meaning that a data-modifying transaction is not considered committed until all servers have committed the transaction. This guarantees that a failover will not lose any data and that all load-balanced servers will return consistent results no matter which server is queried. In contrast, asynchronous solutions allow some delay between the time of a commit and its propagation to the other servers, opening the possibility that some transactions might be lost in the switch to a backup server, and that load-balanced servers might return slightly stale results. Asynchronous solutions are used when synchronous ones would be too slow.
Solutions can also be categorized by their granularity. Some solutions can deal only with an entire database server, while others allow control at the per-table or per-database level.
Performance must be considered in any choice; there is usually a trade-off between functionality and performance. For example, a fully synchronous solution might cut performance by more than half, while an asynchronous solution might have a much smaller performance impact.
The remainder of this section outlines various failover, replication, and load balancing solutions.
This chapter explains how the Write-Ahead Log (WAL) is used to achieve efficient, reliable operation.
PostgreSQL provides a set of default roles which provide access to certain, commonly needed, privileged capabilities and information. Administrators can GRANT these roles to users and/or other roles in their environment, providing those users with access to the specified capabilities and information.
The default roles are described in Table 21.1. Note that the specific permissions for each of the default roles may change in the future as additional capabilities are added. Administrators should monitor the release notes for changes.
The pg_monitor, pg_read_all_settings, pg_read_all_stats and pg_stat_scan_tables roles are intended to allow administrators to easily configure a role for the purpose of monitoring the database server. They grant a set of common privileges allowing the role to read various useful configuration settings, statistics and other system information normally restricted to superusers.
The pg_signal_backend role is intended to allow administrators to enable trusted, but non-superuser, roles to send signals to other backends. Currently this role enables sending of signals for canceling a query on another backend or terminating its session. A user granted this role cannot however send signals to a backend owned by a superuser. See Section 9.26.2.
The pg_read_server_files, pg_write_server_files and pg_execute_server_program roles are intended to allow administrators to have trusted, but non-superuser, roles which are able to access files and run programs on the database server as the user the database runs as. As these roles are able to access any file on the server file system, they bypass all database-level permission checks when accessing files directly and they could be used to gain superuser-level access, therefore great care should be taken when granting these roles to users.
Care should be taken when granting these roles to ensure they are only used where needed and with the understanding that these roles grant access to privileged information.
Administrators can grant access to these roles to users using the GRANT command, for example:
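```
GRANT pg_signal_backend TO admin_user;
```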
Locale support refers to an application respecting cultural preferences regarding alphabets, sorting, number formatting, etc. PostgreSQL uses the standard ISO C and POSIX locale facilities provided by the server operating system. For additional information refer to the documentation of your operating system.
Locale support is automatically initialized when a database cluster is created using initdb. initdb will initialize the database cluster with the locale setting of its execution environment by default, so if your system is already set to use the locale that you want in your database cluster then there is nothing else you need to do. If you want to use a different locale (or you are not sure which locale your system is set to), you can instruct initdb exactly which locale to use by specifying the --locale option. For example:
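```
initdb --locale=sv_SE
```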
This example for Unix systems sets the locale to Swedish (sv) as spoken in Sweden (SE). Other possibilities might include en_US (U.S. English) and fr_CA (French Canadian). If more than one character set can be used for a locale then the specification can take the form language_territory.codeset. For example, fr_BE.UTF-8 represents the French language (fr) as spoken in Belgium (BE), with a UTF-8 character set encoding.
What locales are available on your system depends on what was provided by the operating system vendor and what was installed. On most Unix systems, the command locale -a will provide a list of available locales. Windows uses more verbose locale names, such as German_Germany or Swedish_Sweden.1252, but the principles are the same.
Occasionally it is useful to mix rules from several locales, e.g., use English collation rules but Spanish messages. To support that, a set of locale subcategories exist that control only certain aspects of the localization rules:
The category names translate into names of initdb options to override the locale choice for a specific category. For instance, to set the locale to French Canadian, but use U.S. rules for formatting currency, use initdb --locale=fr_CA --lc-monetary=en_US.
If you want the system to behave as if it had no locale support, use the special locale name C, or equivalently POSIX.
Some locale categories must have their values fixed when the database is created. You can use different settings for different databases, but once a database is created, you cannot change them for that database anymore. LC_COLLATE and LC_CTYPE are these categories. They affect the sort order of indexes, so they must be kept fixed, or indexes on text columns would become corrupt. (But you can alleviate this restriction using collations, as discussed in Section 23.2.) The default values for these categories are determined when initdb is run, and those values are used when new databases are created, unless specified otherwise in the CREATE DATABASE command.
The other locale categories can be changed whenever desired by setting the server configuration parameters that have the same name as the locale categories (see Section 19.11.2 for details). The values that are chosen by initdb are actually only written into the configuration file postgresql.conf to serve as defaults when the server is started. If you remove these assignments from postgresql.conf then the server will inherit the settings from its execution environment.
Note that the locale behavior of the server is determined by the environment variables seen by the server, not by the environment of any client. Therefore, be careful to configure the correct locale settings before starting the server. A consequence of this is that if client and server are set up in different locales, messages might appear in different languages depending on where they originated.
Note: When we speak of inheriting the locale from the execution environment, this means the following on most operating systems: For a given locale category, say the collation, the following environment variables are consulted in this order until one is found to be set: LC_ALL, LC_COLLATE (or the variable corresponding to the respective category), and LANG. If none of these environment variables are set then the locale defaults to C.
Some message localization libraries also look at the environment variable LANGUAGE, which overrides all other locale settings for the purpose of setting the language of messages. If in doubt, please refer to the documentation of your operating system, in particular the documentation about gettext.
To enable messages to be translated to the user's preferred language, NLS must have been selected at build time (configure --enable-nls). All other locale support is built in automatically.
The locale settings influence the following SQL features:
Sort order in queries using ORDER BY or the standard comparison operators on textual data
The upper, lower, and initcap functions
Pattern matching operators (LIKE, SIMILAR TO, and POSIX-style regular expressions); locales affect both case-insensitive matching and the classification of characters by character-class regular expressions
The to_char family of functions
The ability to use indexes with LIKE clauses
The drawback of using locales other than C or POSIX in PostgreSQL is its performance impact. It slows character handling and prevents ordinary indexes from being used by LIKE. For this reason use locales only if you actually need them.
As a workaround to allow PostgreSQL to use indexes with LIKE clauses under a non-C locale, several custom operator classes exist. These allow the creation of an index that performs a strict character-by-character comparison, ignoring locale comparison rules. Refer to Section 11.9 for more information. Another approach is to create indexes using the C collation, as discussed in Section 23.2.
If locale support doesn't work according to the explanation above, check that the locale support in your operating system is correctly configured. To check which locales are installed on your system, you may use the command locale -a if your operating system provides it.
Check that PostgreSQL is actually using the locale that you think it is. The LC_COLLATE and LC_CTYPE settings are determined when a database is created, and cannot be changed except by creating a new database. Other locale settings, including LC_MESSAGES and LC_MONETARY, are initially determined by the environment the server is started in, but can be changed on-the-fly. You can check the active locale settings using the SHOW command.
The directory src/test/locale in the source distribution contains a test suite for PostgreSQL's locale support.
Client applications that handle server-side errors by parsing the text of the error message will obviously have problems when the server's messages are in a different language. Authors of such applications are advised to make use of the error code scheme instead.
Maintaining catalogs of message translations requires the ongoing efforts of many volunteers who want to see PostgreSQL speak their preferred language well. If messages in your language are currently not available or not fully translated, your assistance would be appreciated. If you want to help, refer to Chapter 54 or write to the developers' mailing list.
The character set support in PostgreSQL allows you to store text in a variety of character sets (also called encodings), including single-byte character sets such as the ISO 8859 series and multiple-byte character sets such as EUC (Extended Unix Code), UTF-8, and Mule internal code. All supported character sets can be used transparently by clients, but a few are not supported for use within the server (that is, as a server-side encoding). The default character set is selected while initializing your PostgreSQL database cluster using initdb. It can be overridden when you create a database, so you can have multiple databases each with a different character set.
An important restriction, however, is that each database's character set must be compatible with the database's LC_CTYPE (character classification) and LC_COLLATE (string sort order) locale settings. For C or POSIX locale, any character set is allowed, but for other libc-provided locales there is only one character set that will work correctly. (On Windows, however, UTF-8 encoding can be used with any locale.) If you have ICU support configured, ICU-provided locales can be used with most but not all server-side encodings.
Table 23.1 shows the character sets available for use in PostgreSQL.
Not all client APIs support all the listed character sets. For example, the PostgreSQL JDBC driver does not support MULE_INTERNAL, LATIN6, LATIN8, and LATIN10.
The SQL_ASCII setting behaves considerably differently from the other settings. When the server character set is SQL_ASCII, the server interprets byte values 0-127 according to the ASCII standard, while byte values 128-255 are taken as uninterpreted characters. No encoding conversion will be done when the setting is SQL_ASCII. Thus, this setting is not so much a declaration that a specific encoding is in use, as a declaration of ignorance about the encoding. In most cases, if you are working with any non-ASCII data, it is unwise to use the SQL_ASCII setting because PostgreSQL will be unable to help you by converting or validating non-ASCII characters.
initdb defines the default character set (encoding) for a PostgreSQL cluster. For example,
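```
initdb -E EUC_JP
```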
sets the default character set to EUC_JP (Extended Unix Code for Japanese). You can use --encoding instead of -E if you prefer longer option strings. If no -E or --encoding option is given, initdb attempts to determine the appropriate encoding to use based on the specified or default locale.
You can specify a non-default encoding at database creation time, provided that the encoding is compatible with the selected locale:
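```
createdb -E EUC_KR -T template0 --lc-collate=ko_KR.euckr --lc-ctype=ko_KR.euckr korean
```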
This will create a database named korean that uses the character set EUC_KR and locale ko_KR. Another way to accomplish this is to use this SQL command:
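```
CREATE DATABASE korean WITH ENCODING 'EUC_KR' LC_COLLATE='ko_KR.euckr' LC_CTYPE='ko_KR.euckr' TEMPLATE=template0;
```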
Notice that the above commands specify copying the template0 database. When copying any other database, the encoding and locale settings cannot be changed from those of the source database, because that might result in corrupt data. For more information see Section 22.3.
The encoding for a database is stored in the system catalog pg_database. You can see it by using the psql -l option or the \l command.
Note: On most modern operating systems, PostgreSQL can determine which character set is implied by the LC_CTYPE setting, and it will enforce that only the matching database encoding is used. On older systems it is your responsibility to ensure that you use the encoding expected by the locale you have selected. A mistake in this area is likely to lead to strange behavior of locale-dependent operations such as sorting.
PostgreSQL will allow superusers to create databases with SQL_ASCII encoding even when LC_CTYPE is not C or POSIX. As noted above, SQL_ASCII does not enforce that the data stored in the database has any particular encoding, and so this choice poses risks of locale-dependent misbehavior. Using this combination of settings is deprecated and may someday be forbidden altogether.
PostgreSQL supports automatic character set conversion between server and client for certain character set combinations. The conversion information is stored in the pg_conversion system catalog. PostgreSQL comes with some predefined conversions, as shown in Table 23.2. You can create a new conversion using the SQL command CREATE CONVERSION.
To enable automatic character set conversion, you have to tell PostgreSQL the character set (encoding) you would like to use in the client. There are several ways to accomplish this:
Using the \encoding command in psql. \encoding allows you to change the client encoding on the fly. For example, to change the encoding to SJIS, type:
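```
\encoding SJIS
```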
libpq (Section 33.10) has functions to control the client encoding.
Using SET client_encoding TO. Setting the client encoding can be done with this SQL command:
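```
SET CLIENT_ENCODING TO 'value';
```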
You can also use the standard SQL syntax SET NAMES for this purpose:
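```
SET NAMES 'value';
```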
To query the current client encoding:
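```
SHOW client_encoding;
```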
To return to the default encoding:
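```
RESET client_encoding;
```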
Using PGCLIENTENCODING. If the environment variable PGCLIENTENCODING is defined in the client's environment, that client encoding is automatically selected when a connection to the server is made. (This can subsequently be overridden using any of the other methods mentioned above.)
Using the configuration variable client_encoding. If the client_encoding variable is set, that client encoding is automatically selected when a connection to the server is made. (This can subsequently be overridden using any of the other methods mentioned above.)
If the conversion of a particular character is not possible (suppose you chose EUC_JP for the server and LATIN1 for the client, and some Japanese characters are returned that do not have a representation in LATIN1), an error is reported.
If the client character set is defined as SQL_ASCII, encoding conversion is disabled, regardless of the server's character set. Just as for the server, use of SQL_ASCII is unwise unless you are working with all-ASCII data.
These are good sources to start learning about various kinds of encoding systems:
CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing
Contains detailed explanations of EUC_JP, EUC_CN, EUC_KR, and EUC_TW.
The web site of the Unicode Consortium.
RFC 3629
UTF-8 (8-bit UCS/Unicode Transformation Format) is defined here.
An alternative backup strategy is to directly copy the files that PostgreSQL uses to store the data in the database; the location of these files is described elsewhere in this documentation. You can use whatever method you prefer for doing file system backups, for example:
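```
tar -cf backup.tar /usr/local/pgsql/data
```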
There are two restrictions, however, which make this method impractical, or at least inferior to the pg_dump method:
The database server must be shut down in order to get a usable backup. Half-way measures, such as disallowing all connections, will not work (partly because tar and similar tools do not take an atomic snapshot of the state of the file system, but also because of internal buffering within the server). Information about stopping the server can be found elsewhere in this documentation. Needless to say, you also need to shut down the server before restoring the data.
If you have dug into the details of the file system layout of the database, you might be tempted to try to back up or restore only certain individual tables or databases from their respective files or directories. This will not work, because the information contained in these files is not usable without the commit log files, pg_xact/*, which contain the commit status of all transactions. A table file is only usable with this information. Of course it is also impossible to restore only a table and the associated pg_xact data, because that would render all other tables in the database cluster useless. So file system backups only work for complete backup and restoration of an entire database cluster.
An alternative file-system backup approach is to make a "consistent snapshot" of the data directory, if the file system supports that functionality (and you are willing to trust that it is implemented correctly). The typical procedure is to make a "frozen snapshot" of the volume containing the database, then copy the whole data directory (not just parts, see above) from the snapshot to a backup device, then release the frozen snapshot. This will work even while the database server is running. However, a backup created in this way saves the database files in a state as if the database server was not properly shut down; therefore, when you start the database server on the backed-up data, it will think the previous server instance crashed and will replay the WAL log. This is not a problem; just be aware of it (and be sure to include the WAL files in your backup). You can perform a CHECKPOINT before taking the snapshot to reduce recovery time.
If your database is spread across multiple file systems, there might not be any way to obtain exactly-simultaneous frozen snapshots of all the volumes. For example, if your data files and WAL log are on different disks, or if tablespaces are on different file systems, it might not be possible to use snapshot backup because the snapshots must be simultaneous. Read your file system documentation very carefully before trusting the consistent-snapshot technique in such situations.
If simultaneous snapshots are not possible, one option is to shut down the database server long enough to establish all the frozen snapshots. Another option is to perform a continuous archiving base backup, because such backups are immune to file system changes during the backup. This requires enabling continuous archiving just during the backup process; restore is done using continuous archive recovery.
Another option is to use rsync to perform a file system backup. This is done by first running rsync while the database server is running, then shutting down the database server long enough to do an rsync --checksum. (--checksum is necessary because rsync only has file modification-time granularity of one second.) The second rsync will be quicker than the first, because it has relatively little data to transfer, and the end result will be consistent because the server was down. This method allows a file system backup to be performed with minimal downtime.
Note that a file system backup will typically be larger than an SQL dump. (pg_dump does not need to dump the contents of indexes, for example, just the commands to recreate them.) However, taking a file system backup might be faster.
The idea behind this dump method is to generate a file with SQL commands that, when fed back to the server, will recreate the database in the same state as it was at the time of the dump. PostgreSQL provides the utility program for this purpose. The basic usage of this command is:
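```
pg_dump dbname > dumpfile
```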
As you see, pg_dump writes its result to the standard output. We will see below how this can be useful. While the above command creates a text file, pg_dump can create files in other formats that allow for parallelism and more fine-grained control of object restoration.
pg_dump is a regular PostgreSQL client application (albeit a particularly clever one). This means that you can perform this backup procedure from any remote host that has access to the database. But remember that pg_dump does not operate with special permissions. In particular, it must have read access to all tables that you want to back up, so in order to back up the entire database you almost always have to run it as a database superuser. (If you do not have sufficient privileges to back up the entire database, you can still back up portions of the database to which you do have access using options such as -n schema or -t table.)
To specify which database server pg_dump should contact, use the command line options -h host and -p port. The default host is the local host or whatever your PGHOST environment variable specifies. Similarly, the default port is indicated by the PGPORT environment variable or, failing that, by the compiled-in default. (Conveniently, the server will normally have the same compiled-in default.)
Like any other PostgreSQL client application, pg_dump will by default connect with the database user name that is equal to the current operating system user name. To override this, either specify the -U option or set the environment variable PGUSER. Remember that pg_dump connections are subject to the normal client authentication mechanisms (which are described in the chapter on client authentication).
An important advantage of pg_dump over the other backup methods described later is that pg_dump's output can generally be re-loaded into newer versions of PostgreSQL, whereas file-level backups and continuous archiving are both extremely server-version-specific. pg_dump is also the only method that will work when transferring a database to a different machine architecture, such as going from a 32-bit to a 64-bit server.
Dumps created by pg_dump are internally consistent, meaning, the dump represents a snapshot of the database at the time pg_dump began running. pg_dump does not block other operations on the database while it is working. (Exceptions are those operations that need to operate with an exclusive lock, such as most forms of ALTER TABLE.)
Text files created by pg_dump are intended to be read in by the psql program. The general command form to restore a dump is
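```
psql dbname < infile
```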
where infile is the file output by the pg_dump command. The database dbname will not be created by this command, so you must create it yourself from template0 before executing psql (e.g., with createdb -T template0 dbname). psql supports options similar to pg_dump for specifying the database server to connect to and the user name to use. See the reference page for more information. Non-text file dumps are restored using the pg_restore utility.
Before restoring an SQL dump, all the users who own objects or were granted permissions on objects in the dumped database must already exist. If they do not, the restore will fail to recreate the objects with the original ownership and/or permissions. (Sometimes this is what you want, but usually it is not.)
By default, the psql script will continue to execute after an SQL error is encountered. You might wish to run psql with the ON_ERROR_STOP variable set to alter that behavior and have psql exit with an exit status of 3 if an SQL error occurs:
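```
psql --set ON_ERROR_STOP=on dbname < infile
```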
Either way, you will only have a partially restored database. Alternatively, you can specify that the whole dump should be restored as a single transaction, so the restore is either fully completed or fully rolled back. This mode can be specified by passing the -1 or --single-transaction command-line options to psql. When using this mode, be aware that even a minor error can roll back a restore that has already run for many hours. However, that might still be preferable to manually cleaning up a complex database after a partially restored dump.
The ability of pg_dump and psql to write to or read from pipes makes it possible to dump a database directly from one server to another, for example:
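```
pg_dump -h host1 dbname | psql -h host2 dbname
```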
The dumps produced by pg_dump are relative to template0. This means that any languages, procedures, etc. added via template1 will also be dumped by pg_dump. As a result, when restoring, if you are using a customized template1, you must create the empty database from template0, as in the example above.
The resulting dump can be restored with psql:
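```
psql -f dumpfile postgres
```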
(Actually, you can specify any existing database name to start from, but if you are loading into an empty cluster then postgres should usually be used.) It is always necessary to have database superuser access when restoring a pg_dumpall dump, as that is required to restore the role and tablespace information. If you use tablespaces, make sure that the tablespace paths in the dump are appropriate for the new installation.
pg_dumpall works by emitting commands to re-create roles, tablespaces, and empty databases, then invoking pg_dump for each database. This means that while each database will be internally consistent, the snapshots of different databases are not synchronized.
Cluster-wide data can be dumped alone using the pg_dumpall --globals-only option. This is necessary to fully back up the cluster if running the pg_dump command on individual databases.
Some operating systems have maximum file size limits that cause problems when creating large pg_dump output files. Fortunately, pg_dump can write to the standard output, so you can use standard Unix tools to work around this potential problem. There are several possible methods:
Use compressed dumps. You can use your favorite compression program, for example gzip:
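```
pg_dump dbname | gzip > filename.gz
```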
Reload with:
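```
gunzip -c filename.gz | psql dbname
```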
or:
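```
cat filename.gz | gunzip | psql dbname
```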
Use split. The split command allows you to split the output into smaller files that are acceptable in size to the underlying file system. For example, to make chunks of 1 megabyte:
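```
pg_dump dbname | split -b 1m - filename
```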
Reload with:
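```
cat filename* | psql dbname
```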
Use pg_dump's custom dump format. If PostgreSQL was built on a system with the zlib compression library installed, the custom dump format will compress data as it writes it to the output file. This will produce dump file sizes similar to using gzip, but it has the added advantage that tables can be restored selectively. The following command dumps a database using the custom dump format:
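```
pg_dump -Fc dbname > filename
```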
A custom-format dump is not a script for psql, but instead must be restored with pg_restore, for example:
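```
pg_restore -d dbname filename
```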
For very large databases, you might need to combine split with one of the other two approaches.
Use pg_dump's parallel dump feature. To speed up the dump of a large database, you can use pg_dump's parallel mode. This will dump multiple tables at the same time. You can control the degree of parallelism with the -j parameter. Parallel dumps are only supported for the "directory" archive format.
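A sketch of the command form (num and out.dir are placeholders for the degree of parallelism and the output directory):
```
pg_dump -j num -F d -f out.dir dbname
```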
You can use pg_restore -j to restore a dump in parallel. This will work for any archive of either the "custom" or the "directory" archive mode, whether or not it has been created with pg_dump -j.
Hot Standby is the term used to describe the ability to connect to the server and run read-only queries while the server is in archive recovery or standby mode. This is useful both for replication purposes and for restoring a backup to a desired state with great precision. The term Hot Standby also refers to the ability of the server to move from recovery through to normal operation while users continue running queries and/or keep their connections open.
Running queries in hot standby mode is similar to normal query operation, though there are several usage and administrative differences explained below.
When the hot_standby parameter is set to true on a standby server, it will begin accepting connections once the recovery has brought the system to a consistent state. All such connections are strictly read-only; not even temporary tables may be written.
The data on the standby takes some time to arrive from the primary server so there will be a measurable delay between primary and standby. Running the same query nearly simultaneously on both primary and standby might therefore return differing results. We say that data on the standby is eventually consistent with the primary. Once the commit record for a transaction is replayed on the standby, the changes made by that transaction will be visible to any new snapshots taken on the standby. Snapshots may be taken at the start of each query or at the start of each transaction, depending on the current transaction isolation level. For more details, see .
Transactions started during hot standby may issue the following commands:
Query access - SELECT, COPY TO
Cursor commands - DECLARE, FETCH, CLOSE
Parameters - SHOW, SET, RESET
Transaction management commands - BEGIN, END, ABORT, START TRANSACTION; SAVEPOINT, RELEASE, ROLLBACK TO SAVEPOINT; EXCEPTION blocks and other internal subtransactions
LOCK TABLE, though only when explicitly in one of these modes: ACCESS SHARE, ROW SHARE or ROW EXCLUSIVE
Plans and resources - PREPARE, EXECUTE, DEALLOCATE, DISCARD
Plugins and extensions - LOAD
Transactions started during hot standby will never be assigned a transaction ID and cannot write to the system write-ahead log. Therefore, the following actions will produce error messages:
Data Manipulation Language (DML) - INSERT, UPDATE, DELETE, COPY FROM, TRUNCATE. Note that there are no allowed actions that result in a trigger being executed during recovery. This restriction applies even to temporary tables, because table rows cannot be read or written without assigning a transaction ID, which is currently not possible in a Hot Standby environment.
Data Definition Language (DDL) - CREATE, DROP, ALTER, COMMENT. This restriction applies even to temporary tables, because carrying out these operations would require updating the system catalog tables.
SELECT ... FOR SHARE | UPDATE, because row locks cannot be taken without updating the underlying data files.
Rules on SELECT statements that generate DML commands.
LOCK that explicitly requests a mode higher than ROW EXCLUSIVE MODE.
LOCK in short default form, since it requests ACCESS EXCLUSIVE MODE.
Transaction management commands that explicitly set non-read-only state: BEGIN READ WRITE, START TRANSACTION READ WRITE; SET TRANSACTION READ WRITE, SET SESSION CHARACTERISTICS AS TRANSACTION READ WRITE; SET transaction_read_only = off
Two-phase commit commands - PREPARE TRANSACTION, COMMIT PREPARED, ROLLBACK PREPARED, because even read-only transactions need to write WAL in the prepare phase (the first phase of two-phase commit).
Sequence updates - nextval(), setval()
LISTEN, UNLISTEN, NOTIFY
In normal operation, “read-only” transactions are allowed to use LISTEN, UNLISTEN, and NOTIFY, so Hot Standby sessions operate under slightly tighter restrictions than ordinary read-only sessions. It is possible that some of these restrictions might be loosened in a future release.
During hot standby, the parameter transaction_read_only is always true and may not be changed. But as long as no attempt is made to modify the database, connections during hot standby will act much like any other database connection. If failover or switchover occurs, the database will switch to normal processing mode. Sessions will remain connected while the server changes mode. Once hot standby finishes, it will be possible to initiate read-write transactions (even from a session begun during hot standby).
The primary and standby servers are in many ways loosely connected. Actions on the primary will have an effect on the standby. As a result, there is potential for negative interactions or conflicts between them. The easiest conflict to understand is performance: if a huge data load is taking place on the primary then this will generate a similar stream of WAL records on the standby, so standby queries may contend for system resources, such as I/O.
There are also additional types of conflict that can occur with Hot Standby. These conflicts are hard conflicts in the sense that queries might need to be canceled and, in some cases, sessions disconnected to resolve them. The user is provided with several ways to handle these conflicts. Conflict cases include:
Access Exclusive locks taken on the primary server, including both explicit LOCK commands and various DDL actions, conflict with table accesses in standby queries.
Dropping a tablespace on the primary conflicts with standby queries using that tablespace for temporary work files.
Dropping a database on the primary conflicts with sessions connected to that database on the standby.
Application of a vacuum cleanup record from WAL conflicts with standby transactions whose snapshots can still “see” any of the rows to be removed.
Application of a vacuum cleanup record from WAL conflicts with queries accessing the target page on the standby, whether or not the data to be removed is visible.
On the primary server, these cases simply result in waiting; and the user might choose to cancel either of the conflicting actions. However, on the standby there is no choice: the WAL-logged action already occurred on the primary so the standby must not fail to apply it. Furthermore, allowing WAL application to wait indefinitely may be very undesirable, because the standby's state will become increasingly far behind the primary's. Therefore, a mechanism is provided to forcibly cancel standby queries that conflict with to-be-applied WAL records.
An example of the problem situation is an administrator on the primary server running DROP TABLE on a table that is currently being queried on the standby server. Clearly the standby query cannot continue if the DROP TABLE is applied on the standby. If this situation occurred on the primary, the DROP TABLE would wait until the other query had finished. But when DROP TABLE is run on the primary, the primary doesn't have information about what queries are running on the standby, so it will not wait for any such standby queries. The WAL change records come through to the standby while the standby query is still running, causing a conflict. The standby server must either delay application of the WAL records (and everything after them, too) or else cancel the conflicting query so that the DROP TABLE can be applied.
In a standby server that exists primarily for high availability, it's best to set the delay parameters relatively short, so that the server cannot fall far behind the primary due to delays caused by standby queries. However, if the standby server is meant for executing long-running queries, then a high or even infinite delay value may be preferable. Keep in mind however that a long-running query could cause other sessions on the standby server to not see recent changes on the primary, if it delays application of WAL records.
Once the delay specified by max_standby_archive_delay or max_standby_streaming_delay has been exceeded, conflicting queries will be canceled. This usually results just in a cancellation error, although in the case of replaying a DROP DATABASE the entire conflicting session will be terminated. Also, if the conflict is over a lock held by an idle transaction, the conflicting session is terminated (this behavior might change in the future).
Canceled queries may be retried immediately (after beginning a new transaction, of course). Since query cancellation depends on the nature of the WAL records being replayed, a query that was canceled may well succeed if it is executed again.
Keep in mind that the delay parameters are compared to the elapsed time since the WAL data was received by the standby server. Thus, the grace period allowed to any one query on the standby is never more than the delay parameter, and could be considerably less if the standby has already fallen behind as a result of waiting for previous queries to complete, or as a result of being unable to keep up with a heavy update load.
The most common reason for conflict between standby queries and WAL replay is “early cleanup”. Normally, PostgreSQL allows cleanup of old row versions when there are no transactions that need to see them to ensure correct visibility of data according to MVCC rules. However, this rule can only be applied for transactions executing on the master. So it is possible that cleanup on the master will remove row versions that are still visible to a transaction on the standby.
Experienced users should note that both row version cleanup and row version freezing will potentially conflict with standby queries. Running a manual VACUUM FREEZE is likely to cause conflicts even on tables with no updated or deleted rows.
Users should be clear that tables that are regularly and heavily updated on the primary server will quickly cause cancellation of longer running queries on the standby. In such cases the setting of a finite value for max_standby_archive_delay or max_standby_streaming_delay can be considered similar to setting statement_timeout.
Remedial possibilities exist if the number of standby-query cancellations is found to be unacceptable. The first option is to set the parameter hot_standby_feedback, which prevents VACUUM from removing recently-dead rows and so cleanup conflicts do not occur. If you do this, you should note that this will delay cleanup of dead rows on the primary, which may result in undesirable table bloat. However, the cleanup situation will be no worse than if the standby queries were running directly on the primary server, and you are still getting the benefit of off-loading execution onto the standby. If standby servers connect and disconnect frequently, you might want to make adjustments to handle the period when hot_standby_feedback feedback is not being provided. For example, consider increasing max_standby_archive_delay so that queries are not rapidly canceled by conflicts in WAL archive files during disconnected periods. You should also consider increasing max_standby_streaming_delay to avoid rapid cancellations by newly-arrived streaming WAL entries after reconnection.
The number of query cancels and the reason for them can be viewed using the pg_stat_database_conflicts system view on the standby server. The pg_stat_database system view also contains summary information.
If hot_standby is on in postgresql.conf (the default value) and there is a recovery.conf file present, the server will run in Hot Standby mode. However, it may take some time for Hot Standby connections to be allowed, because the server will not accept connections until it has completed sufficient recovery to provide a consistent state against which queries can run. During this period, clients that attempt to connect will be refused with an error message. To confirm the server has come up, either loop trying to connect from the application, or look for these messages in the server logs:
Consistency information is recorded once per checkpoint on the primary. It is not possible to enable hot standby when reading WAL written during a period when wal_level was not set to replica or logical on the primary. Reaching a consistent state can also be delayed in the presence of both of these conditions:
A write transaction has more than 64 subtransactions
Very long-lived write transactions
If you are running file-based log shipping ("warm standby"), you might need to wait until the next WAL file arrives, which could be as long as the archive_timeout setting on the primary.
The setting of some parameters on the standby will need reconfiguration if they have been changed on the primary. For these parameters, the value on the standby must be equal to or greater than the value on the primary. If these parameters are not set high enough then the standby will refuse to start. Higher values can then be supplied and the server restarted to begin recovery again. These parameters are:
max_connections
max_prepared_transactions
max_locks_per_transaction
max_worker_processes
Transaction status "hint bits" written on the primary are not WAL-logged, so data on the standby will likely re-write the hints again on the standby. Thus, the standby server will still perform disk writes even though all users are read-only; no changes occur to the data values themselves. Users will still write large sort temporary files and re-generate relcache info files, so no part of the database is truly read-only during hot standby mode. Note also that writes to remote databases using dblink module, and other operations outside the database using PL functions will still be possible, even though the transaction is read-only locally.
The following types of administration commands are not accepted during recovery mode:
Data Definition Language (DDL) - e.g. CREATE INDEX
Privilege and Ownership - GRANT, REVOKE, REASSIGN
Maintenance commands - ANALYZE, VACUUM, CLUSTER, REINDEX
Again, note that some of these commands are actually allowed during "read only" mode transactions on the primary.
As a result, you cannot create additional indexes that exist solely on the standby, nor statistics that exist solely on the standby. If these administration commands are needed, they should be executed on the primary, and eventually those changes will propagate to the standby.
pg_cancel_backend() and pg_terminate_backend() will work on user backends, but not the Startup process, which performs recovery. pg_stat_activity does not show recovering transactions as active. As a result, pg_prepared_xacts is always empty during recovery. If you wish to resolve in-doubt prepared transactions, view pg_prepared_xacts on the primary and issue commands to resolve transactions there, or resolve them after the end of recovery.
pg_locks will show locks held by backends, as normal. pg_locks also shows a virtual transaction managed by the Startup process that owns all AccessExclusiveLocks held by transactions being replayed by recovery. Note that the Startup process does not acquire locks to make database changes, and thus locks other than AccessExclusiveLocks do not show in pg_locks for the Startup process; they are just presumed to exist.
The Nagios plugin check_pgsql will work, because the simple information it checks for exists. The check_postgres monitoring script will also work, though some reported values could give different or confusing results. For example, last vacuum time will not be maintained, since no vacuum occurs on the standby. Vacuums running on the primary do still send their changes to the standby.
WAL file control commands will not work during recovery, e.g., pg_start_backup, pg_switch_wal, etc.
Dynamically loadable modules work, including pg_stat_statements.
Advisory locks work normally in recovery, including deadlock detection. Note that advisory locks are never WAL logged, so it is impossible for an advisory lock on either the primary or the standby to conflict with WAL replay. Nor is it possible to acquire an advisory lock on the primary and have it initiate a similar advisory lock on the standby. Advisory locks relate only to the server on which they are acquired.
Trigger-based replication systems such as Slony, Londiste and Bucardo won't run on the standby at all, though they will run happily on the primary server as long as the changes are not sent to standby servers to be applied. WAL replay is not trigger-based so you cannot relay from the standby to any system that requires additional database writes or relies on the use of triggers.
New OIDs cannot be assigned, though some UUID generators may still work as long as they do not rely on writing new status to the database.
Currently, temporary table creation is not allowed during read only transactions, so in some cases existing scripts will not run correctly. This restriction might be relaxed in a later release. This is both a SQL Standard compliance issue and a technical issue.
DROP TABLESPACE can only succeed if the tablespace is empty. Some standby users may be actively using the tablespace via their temp_tablespaces parameter. If there are temporary files in the tablespace, all active queries are canceled to ensure that temporary files are removed, so the tablespace can be removed and WAL replay can continue.
Running DROP DATABASE or ALTER DATABASE ... SET TABLESPACE on the primary will generate a WAL entry that will cause all users connected to that database on the standby to be forcibly disconnected. This action occurs immediately, whatever the setting of max_standby_streaming_delay. Note that ALTER DATABASE ... RENAME does not disconnect users, which in most cases will go unnoticed, though it might in some cases cause confusion in a program that depends in some way upon the database name.
In normal (non-recovery) mode, if you issue DROP USER or DROP ROLE for a role with login capability while that user is still connected, then nothing happens to the connected user; they remain connected. However, the user cannot reconnect. This behavior applies in recovery also, so a DROP USER on the primary does not disconnect that user on the standby.
The statistics collector is active during recovery. All scans, reads, blocks, index usage, etc., will be recorded normally on the standby. Replayed actions will not duplicate the statistical effects they had on the primary, so replaying an insert will not increment the Inserts column of pg_stat_user_tables on the standby. The stats file is deleted at the start of recovery, so stats from the primary and standby will differ; this is considered a feature, not a bug.
Autovacuum is not active during recovery. It will start normally at the end of recovery.
The background writer is active during recovery and will perform restartpoints (similar to checkpoints on the primary) and normal block cleaning activities. This can include updates of the hint bit information stored on the standby server. The CHECKPOINT command is accepted during recovery, though it performs a restartpoint rather than a new checkpoint.
There are several limitations of Hot Standby. These can and probably will be fixed in future releases:
Full knowledge of running transactions is required before snapshots can be taken. Transactions that use large numbers of subtransactions (currently greater than 64) will delay the start of read only connections until the completion of the longest running write transaction. If this situation occurs, explanatory messages will be sent to the server log.
Valid starting points for standby queries are generated at each checkpoint on the primary. If the standby is shut down while the primary is in a shutdown state, it might not be possible to re-enter Hot Standby until the primary is started up, so that it generates further starting points in the WAL logs. This situation isn't a problem in the most common situations where it might happen. Generally, if the primary is shut down and not available anymore, that's likely due to a serious failure that requires the standby being converted to operate as the new primary anyway. And in situations where the primary is being intentionally taken down, coordinating to make sure the standby becomes the new primary smoothly is also standard procedure.
At the end of recovery, AccessExclusiveLocks held by prepared transactions will require twice the normal number of lock table entries. If you plan on running either a large number of concurrent prepared transactions that normally take AccessExclusiveLocks, or you plan on having one large transaction that takes many AccessExclusiveLocks, you are advised to select a larger value of max_locks_per_transaction, perhaps as much as twice the value of the parameter on the primary server. You need not consider this at all if your setting of max_prepared_transactions is 0.
If the primary server fails, then the standby server should begin failover procedures.
If the standby server fails, then no failover need take place. If the standby server can be restarted, even some time later, then the recovery process can also be restarted immediately, taking advantage of restartable recovery. If the standby server cannot be restarted, then a full new standby server instance should be created.
If the primary server fails and the standby server becomes the new primary, and then the old primary restarts, you must have a mechanism for informing the old primary that it is no longer the primary. This is sometimes known as STONITH (Shoot The Other Node In The Head), which is necessary to avoid situations where both systems think they are the primary, a state that would lead to confusion and ultimately data loss.
Many failover systems use just two systems, the primary and the standby, connected by some kind of heartbeat mechanism to continually verify the connectivity between the two and the viability of the primary. It is also possible to use a third system (called a witness server) to prevent some cases of inappropriate failover, but the additional complexity might not be worthwhile unless it is set up with sufficient care and rigorous testing.
PostgreSQL does not provide the system software required to identify a failure on the primary and notify the standby database server. Many such tools exist and are well integrated with the operating system facilities required for successful failover, such as IP address migration.
Once failover to the standby occurs, there is only a single server in operation. This is known as a degenerate state. The former standby is now the primary, but the former primary is down and might stay down. To return to normal operation, a standby server must be recreated, either on the former primary system when it comes up, or on a third, possibly new, system. The pg_rewind utility can be used to speed up this process on large clusters. Once complete, the primary and standby can be considered to have switched roles. Some people choose to use a third server to provide backup for the new primary until the new standby server is recreated, though clearly this complicates the system configuration and operational processes.
So, switching from primary to standby server can be fast but requires some time to re-prepare the failover cluster. Regular switching from primary to standby is useful, since it allows regular downtime on each system for maintenance. This also serves as a test of the failover mechanism to ensure that it will really work when you need it. Written administration procedures are advised.
To trigger failover of a log-shipping standby server, run pg_ctl promote, call pg_promote(), or create a trigger file whose name and path are specified by promote_trigger_file. If you plan to use pg_ctl promote or to call pg_promote() for failover, then promote_trigger_file is not required. If you are setting up a reporting server that is only used to offload read-only queries from the primary, not for high availability purposes, there is no need to promote it.
A database administrator frequently wonders, "What is the system doing right now?" This chapter discusses how to find that out.
Several tools are available for monitoring database activity and analyzing performance. Most of this chapter is devoted to describing PostgreSQL's statistics collector, but one should not neglect ordinary Unix monitoring programs such as ps, top, iostat, and vmstat. Also, once a poorly performing query has been identified, further investigation might be needed using PostgreSQL's EXPLAIN command; EXPLAIN and other methods for understanding the behavior of an individual query are discussed elsewhere in this documentation.
At all times, PostgreSQL maintains a write ahead log (WAL) in the pg_wal/ subdirectory of the cluster's data directory. The log records every change made to the database's data files. This log exists primarily for crash-safety purposes: if the system crashes, the database can be restored to consistency by "replaying" the log entries made since the last checkpoint. However, the existence of the log makes it possible to use a third strategy for backing up databases: we can combine a file-system-level backup with backup of the WAL files. If recovery is needed, we restore the file system backup and then replay from the backed-up WAL files to bring the system to a current state. This approach is more complex to administer than either of the previous approaches, but it has some significant benefits:
We do not need a perfectly consistent file system backup as the starting point. Any internal inconsistency in the backup will be corrected by log replay (this is not significantly different from what happens during crash recovery). So we do not need a file system snapshot capability, just tar or a similar archiving tool.
Since we can combine an indefinitely long sequence of WAL files for replay, continuous backup can be achieved simply by continuing to archive the WAL files. This is particularly valuable for large databases, where it might not be convenient to take a full backup frequently.
It is not necessary to replay the WAL entries all the way to the end. We could stop the replay at any point and have a consistent snapshot of the database as it was at that time. Thus, this technique supports point-in-time recovery: it is possible to restore the database to its state at any time since your base backup was taken.
If we continuously feed the series of WAL files to another machine that has been loaded with the same base backup file, we have a warm standby system: at any point we can bring up the second machine and it will have a nearly-current copy of the database.
pg_dump and pg_dumpall do not produce file-system-level backups and cannot be used as part of a continuous-archiving solution. Such dumps are logical and do not contain enough information to be used by WAL replay.
As with the plain file-system-backup technique, this method can only support restoration of an entire database cluster, not a subset. Also, it requires a lot of archival storage: the base backup might be bulky, and a busy system will generate many megabytes of WAL traffic that must be archived. Still, it is the preferred backup technique in many situations where high reliability is needed.
To recover successfully using continuous archiving (also called "online backup" by many database vendors), you need a continuous sequence of archived WAL files that extends back at least as far as the start time of your backup. So to get started, you should set up and test your procedure for archiving WAL files before you take your first base backup. Accordingly, we first discuss the mechanics of archiving WAL files.
In an abstract sense, a running PostgreSQL system produces an indefinitely long sequence of WAL records. The system physically divides this sequence into WAL segment files, which are normally 16MB apiece (although the segment size can be altered during initdb). The segment files are given numeric names that reflect their position in the abstract WAL sequence. When not using WAL archiving, the system normally creates just a few segment files and then "recycles" them by renaming no-longer-needed segment files to higher segment numbers. It is assumed that segment files whose contents precede the last checkpoint are no longer of interest and can be recycled.
When archiving WAL data, we need to capture the contents of each segment file once it is filled, and save that data somewhere before the segment file is recycled for reuse. Depending on the application and the available hardware, there could be many different ways of "saving the data somewhere": we could copy the segment files to an NFS-mounted directory on another machine, write them onto a tape drive (ensuring that you have a way of identifying the original name of each file), batch them together and burn them onto CDs, or something else entirely. To provide the database administrator with flexibility, PostgreSQL tries not to make any assumptions about how the archiving will be done. Instead, PostgreSQL lets the administrator specify a shell command to be executed to copy a completed segment file to wherever it needs to go. The command could be as simple as cp, or it could invoke a complex shell script; it's all up to you.
To enable WAL archiving, set the wal_level configuration parameter to replica or higher, set archive_mode to on, and specify the shell command to use in the archive_command configuration parameter. In practice these settings will always be placed in the postgresql.conf file. In archive_command, %p is replaced by the path name of the file to archive, while %f is replaced by only the file name. (The path name is relative to the current working directory, i.e., the cluster's data directory.) Use %% if you need to embed an actual % character in the command. The simplest useful command is something like:
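```
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'  # Unix
```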
which will copy archivable WAL segments to the directory /mnt/server/archivedir. (This is an example, not a recommendation, and might not work on all platforms.) After the %p and %f parameters have been replaced, the actual command executed might look like this (with an illustrative segment file name):
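```
test ! -f /mnt/server/archivedir/00000001000000A900000065 && cp pg_wal/00000001000000A900000065 /mnt/server/archivedir/00000001000000A900000065
```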
A similar command will be generated for each new file to be archived.
The archive command will be executed under the ownership of the same user that the PostgreSQL server is running as. Since the series of WAL files being archived effectively contains everything in your database, you will want to be sure that the archived data is protected from prying eyes; for example, archive into a directory that does not have group or world read access.
It is important that the archive command return zero exit status if and only if it succeeds. Upon getting a zero result, PostgreSQL will assume that the file has been successfully archived, and will remove or recycle it. However, a nonzero status tells PostgreSQL that the file was not archived; it will try again periodically until it succeeds.
The archive command should generally be designed to refuse to overwrite any pre-existing archive file. This is an important safety feature to preserve the integrity of your archive in case of administrator error (such as sending the output of two different servers to the same archive directory).
It is advisable to test your proposed archive command to ensure that it indeed does not overwrite an existing file, and that it returns nonzero status in this case. The example command above for Unix ensures this by including a separate test step. On some Unix platforms, cp has switches such as -i that can be used to do the same thing less verbosely, but you should not rely on these without verifying that the right exit status is returned. (In particular, GNU cp will return status zero when -i is used and the target file already exists, which is not the desired behavior.)
While designing your archiving setup, consider what will happen if the archive command fails repeatedly because some aspect requires operator intervention or the archive runs out of space. For example, this could occur if you write to tape without an autochanger; when the tape fills, nothing further can be archived until the tape is swapped. You should ensure that any error condition or request to a human operator is reported appropriately so that the situation can be resolved reasonably quickly. The pg_wal/ directory will continue to fill with WAL segment files until the situation is resolved. (If the file system containing pg_wal/ fills up, PostgreSQL will do a PANIC shutdown. No committed transactions will be lost, but the database will remain offline until you free some space.)
The speed of the archiving command is unimportant as long as it can keep up with the average rate at which your server generates WAL data. Normal operation continues even if the archiving process falls a little behind. If archiving falls significantly behind, this will increase the amount of data that would be lost in the event of a disaster. It will also mean that the pg_wal/ directory will contain large numbers of not-yet-archived segment files, which could eventually exceed available disk space. You are advised to monitor the archiving process to ensure that it is working as you intend.
In writing your archive command, you should assume that the file names to be archived can be up to 64 characters long and can contain any combination of ASCII letters, digits, and dots. It is not necessary to preserve the original relative path (%p) but it is necessary to preserve the file name (%f).
It is not necessary to be concerned about the amount of time it takes to make a base backup. However, if you normally run the server with full_page_writes disabled, you might notice a drop in performance while the backup runs, since full_page_writes is effectively forced on during backup mode.
To make use of the backup, you will need to keep all the WAL segment files generated during and after the file system backup. To aid you in doing this, the base backup process creates a backup history file that is immediately stored into the WAL archive area. This file is named after the first WAL segment file that you need for the file system backup. For example, if the starting WAL file is 0000000100001234000055CD, the backup history file will be named something like 0000000100001234000055CD.007C9330.backup. (The second part of the file name stands for an exact position within the WAL file, and can ordinarily be ignored.) Once you have safely archived the file system backup and the WAL segment files used during the backup (as specified in the backup history file), all archived WAL segments with numerically lesser names are no longer needed to recover the file system backup and can be deleted. However, you should consider keeping several backup sets to be absolutely certain that you can recover your data.
Since you have to keep around all the archived WAL files back to your last base backup, the interval between base backups should usually be chosen based on how much storage you want to expend on archived WAL files. You should also consider how long you are prepared to spend recovering, if recovery should be necessary; the system will have to replay all those WAL segments, and that could take a while if it has been a long time since the last base backup.
Low level base backups can be made in a non-exclusive or an exclusive way. The non-exclusive method is recommended and the exclusive one is deprecated and will eventually be removed.
Ensure that WAL archiving is enabled and working.
Connect to the server (it does not matter which database) as a user with rights to run pg_start_backup (superuser, or a user who has been granted EXECUTE on the function) and issue the command:
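```
SELECT pg_start_backup('label', false, false);
```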
where label is any string you want to use to uniquely identify this backup operation. The connection calling pg_start_backup must be maintained until the end of the backup, or the backup will be automatically aborted.
The third parameter being false tells pg_start_backup to initiate a non-exclusive base backup.
In the same connection as before, issue the command:
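```
SELECT * FROM pg_stop_backup(false, true);
```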
This terminates backup mode. On a primary, it also performs an automatic switch to the next WAL segment. On a standby, it is not possible to automatically switch WAL segments, so you may wish to run pg_switch_wal on the primary to perform a manual switch. The reason for the switch is to arrange for the last WAL segment file written during the backup interval to be ready to archive.
pg_stop_backup will return one row with three values. The second of these fields should be written to a file named backup_label in the root directory of the backup. The third field should be written to a file named tablespace_map unless the field is empty. These files are vital to the backup working, and must be written without modification.
Once the WAL segment files active during the backup are archived, you are done. The file identified by pg_stop_backup's first return value is the last segment that is required to form a complete set of backup files. On a primary, if archive_mode is enabled and the wait_for_archive parameter is true, pg_stop_backup does not return until the last segment has been archived. On a standby, archive_mode must be always in order for pg_stop_backup to wait. Archiving of these files happens automatically since you have already configured archive_command. In most cases this happens quickly, but you are advised to monitor your archive system to ensure there are no delays. If the archive process has fallen behind because of failures of the archive command, it will keep retrying until the archive succeeds and the backup is complete. If you wish to place a time limit on the execution of pg_stop_backup, set an appropriate statement_timeout value, but make note that if pg_stop_backup terminates because of this, your backup may not be valid.
If the backup process monitors and ensures that all WAL segment files required for the backup are successfully archived, then the wait_for_archive parameter (which defaults to true) can be set to false to have pg_stop_backup return as soon as the stop backup record is written to the WAL. By default, pg_stop_backup will wait until all WAL has been archived, which can take some time. This option must be used with caution: if WAL archiving is not monitored correctly, then the backup might not include all of the WAL files and will therefore be incomplete and not able to be restored.
The exclusive backup method is deprecated and should be avoided. Prior to PostgreSQL 9.6, this was the only low-level method available, but it is now recommended that all users upgrade their scripts to use non-exclusive backups.
The process for an exclusive backup is mostly the same as for a non-exclusive one, but it differs in a few key steps. This type of backup can only be taken on a primary and does not allow concurrent backups. Moreover, because it creates a backup label file, as described below, it can block automatic restart of the master server after a crash. On the other hand, the erroneous removal of this file from a backup or standby is a common mistake, which can result in serious data corruption. If it is necessary to use this method, the following steps may be used.
Ensure that WAL archiving is enabled and working.
Connect to the server (it does not matter which database) as a user with rights to run pg_start_backup (superuser, or a user who has been granted EXECUTE on the function) and issue the command:
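```
SELECT pg_start_backup('label');
```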
where label is any string you want to use to uniquely identify this backup operation. pg_start_backup creates a backup label file, called backup_label, in the cluster directory with information about your backup, including the start time and label string. The function also creates a tablespace map file, called tablespace_map, in the cluster directory with information about tablespace symbolic links in pg_tblspc/ if one or more such links are present. Both files are critical to the integrity of the backup, should you need to restore from it.
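By default, pg_start_backup performs a spread checkpoint, which can take some time. If you want the backup to begin as soon as possible, you can pass true as the second parameter to request an immediate checkpoint:

```
SELECT pg_start_backup('label', true);
```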
This forces the checkpoint to be done as quickly as possible.
As noted above, if the server crashes during the backup it may not be possible to restart until the backup_label file has been manually deleted from the PGDATA directory. Note that it is very important to never remove the backup_label file when restoring a backup, because this will result in corruption. Confusion about when it is appropriate to remove this file is a common cause of data corruption when using this method; be very certain that you remove the file only on an existing master and never when building a standby or restoring a backup, even if you are building a standby that will subsequently be promoted to a new master.
Again connect to the database as a user with rights to run pg_stop_backup (superuser, or a user who has been granted EXECUTE on the function), and issue the command:
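```
SELECT pg_stop_backup();
```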
This function terminates backup mode and performs an automatic switch to the next WAL segment. The reason for the switch is to arrange for the last WAL segment written during the backup interval to be ready to archive.
Once the WAL segment files active during the backup are archived, you are done. The file identified by pg_stop_backup's result is the last segment that is required to form a complete set of backup files. If archive_mode is enabled, pg_stop_backup does not return until the last segment has been archived. Archiving of these files happens automatically since you have already configured archive_command. In most cases this happens quickly, but you are advised to monitor your archive system to ensure there are no delays. If the archive process has fallen behind because of failures of the archive command, it will keep retrying until the archive succeeds and the backup is complete.
When using exclusive backup mode, it is absolutely imperative to ensure that pg_stop_backup completes successfully at the end of the backup. Even if the backup itself fails, for example due to lack of disk space, failure to call pg_stop_backup will leave the server in backup mode indefinitely, causing future backups to fail and increasing the risk of a restart failure during the time that backup_label exists.
Some file system backup tools emit warnings or errors if the files they are trying to copy change while the copy proceeds. When taking a base backup of an active database, this situation is normal and not an error. However, you need to ensure that you can distinguish complaints of this sort from real errors. For example, some versions of rsync return a separate exit code for "vanished source files", and you can write a driver script to accept this exit code as a non-error case. Also, some versions of GNU tar return an error code indistinguishable from a fatal error if a file was truncated while tar was copying it. Fortunately, GNU tar versions 1.16 and later exit with 1 if a file was changed during the backup, and 2 for other errors. With GNU tar version 1.23 and later, you can use the warning options --warning=no-file-changed --warning=no-file-removed to hide the related warning messages.
Be certain that your backup includes all of the files under the database cluster directory (e.g., /usr/local/pgsql/data). If you are using tablespaces that do not reside underneath this directory, be careful to include them as well (and be sure that your backup archives symbolic links as links, otherwise the restore will corrupt your tablespaces).
You should, however, omit from the backup the files within the cluster's pg_wal/ subdirectory. This slight adjustment is worthwhile because it reduces the risk of mistakes when restoring. This is easy to arrange if pg_wal/ is a symbolic link pointing to someplace outside the cluster directory, which is a common setup anyway for performance reasons. You might also want to exclude postmaster.pid and postmaster.opts, which record information about the running postmaster, not about the postmaster which will eventually use this backup. (These files can confuse pg_ctl.)
It is often a good idea to also omit from the backup the files within the cluster's pg_replslot/ directory, so that replication slots that exist on the master do not become part of the backup. Otherwise, the subsequent use of the backup to create a standby may result in indefinite retention of WAL files on the standby, and possibly bloat on the master if hot standby feedback is enabled, because the clients that are using those replication slots will still be connecting to and updating the slots on the master, not the standby. Even if the backup is only intended for use in creating a new master, copying the replication slots isn't expected to be particularly useful, since the contents of those slots will likely be badly out of date by the time the new master comes on line.
Any file or directory beginning with pgsql_tmp can be omitted from the backup. These files are removed on postmaster start and the directories will be recreated as needed.
pg_internal.init files can be omitted from the backup whenever a file of that name is found. These files contain relation cache data that is always rebuilt when recovering.
The backup label file includes the label string you gave to pg_start_backup, as well as the time at which pg_start_backup was run, and the name of the starting WAL file. In case of confusion it is therefore possible to look inside a backup file and determine exactly which backup session the dump file came from. The tablespace map file includes the symbolic link names as they exist in the directory pg_tblspc/ and the full path of each symbolic link. These files are not merely for your information; their presence and contents are critical to the proper operation of the system's recovery process.
It is also possible to make a backup while the server is stopped. In this case, you obviously cannot use pg_start_backup or pg_stop_backup, and you will therefore be left to your own devices to keep track of which backup is which and how far back the associated WAL files go. It is generally better to follow the continuous archiving procedure above.
Okay, the worst has happened and you need to recover from your backup. Here is the procedure:
Stop the server, if it's running.
If you have the space to do so, copy the whole cluster data directory and any tablespaces to a temporary location in case you need them later. Note that this precaution will require that you have enough free space on your system to hold two copies of your existing database. If you do not have enough space, you should at least save the contents of the cluster's pg_wal subdirectory, as it might contain logs which were not archived before the system went down.
Remove all existing files and subdirectories under the cluster data directory and under the root directories of any tablespaces you are using.
Restore the database files from your file system backup. Be sure that they are restored with the right ownership (the database system user, not root!) and with the right permissions. If you are using tablespaces, you should verify that the symbolic links in pg_tblspc/ were correctly restored.
Remove any files present in pg_wal/; these came from the file system backup and are therefore probably obsolete rather than current. If you didn't archive pg_wal/ at all, then recreate it with proper permissions, being careful to ensure that you re-establish it as a symbolic link if you had it set up that way before.
If you have unarchived WAL segment files that you saved in step 2, copy them into pg_wal/. (It is best to copy them, not move them, so you still have the unmodified files if a problem occurs and you have to start over.)
Start the server. The server will go into recovery mode and proceed to read through the archived WAL files it needs. Should the recovery be terminated because of an external error, the server can simply be restarted and it will continue recovery. Upon completion of the recovery process, the server will remove recovery.signal (to prevent accidentally re-entering recovery mode later) and then commence normal database operations.
Inspect the contents of the database to ensure you have recovered to the desired state. If not, return to step 1. If all is well, allow your users to connect by restoring pg_hba.conf to normal.
The key part of all this is to set up a recovery configuration that describes how you want to recover and how far the recovery should run. The one thing that you absolutely must specify is the restore_command, which tells PostgreSQL how to retrieve archived WAL file segments. Like the archive_command, this is a shell command string. It can contain %f, which is replaced by the name of the desired log file, and %p, which is replaced by the path name to copy the log file to. (The path name is relative to the current working directory, i.e., the cluster's data directory.) Write %% if you need to embed an actual % character in the command. The simplest useful command is something like:
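```
restore_command = 'cp /mnt/server/archivedir/%f %p'
```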
which will copy previously archived WAL segments from the directory /mnt/server/archivedir. Of course, you can use something much more complicated, perhaps even a shell script that requests the operator to mount an appropriate tape.
It is important that the command return nonzero exit status on failure. The command will be called requesting files that are not present in the archive; it must return nonzero when so asked. This is not an error condition. An exception is that if the command was terminated by a signal (other than SIGTERM, which is used as part of a database server shutdown) or an error by the shell (such as command not found), then recovery will abort and the server will not start up.
Not all of the requested files will be WAL segment files; you should also expect requests for files with a suffix of .history. Also be aware that the base name of the %p path will be different from %f; do not expect them to be interchangeable.
WAL segments that cannot be found in the archive will be sought in pg_wal/; this allows use of recent un-archived segments. However, segments that are available from the archive will be used in preference to files in pg_wal/.
The stop point must be after the ending time of the base backup, i.e., the end time of pg_stop_backup. You cannot use a base backup to recover to a time when that backup was in progress. (To recover to such a time, you must go back to your previous base backup and roll forward from there.)
If recovery finds corrupted WAL data, recovery will halt at that point and the server will not start. In such a case the recovery process could be re-run from the beginning, specifying a "recovery target" before the point of corruption so that recovery can complete normally. If recovery fails for an external reason, such as a system crash or if the WAL archive has become inaccessible, then the recovery can simply be restarted and it will restart almost from where it failed. Recovery restart works much like checkpointing in normal operation: the server periodically forces all its state to disk, and then updates the pg_control file to indicate that the already-processed WAL data need not be scanned again.
The ability to restore the database to a previous point in time creates some complexities that are akin to science-fiction stories about time travel and parallel universes. For example, in the original history of the database, suppose you dropped a critical table at 5:15PM on Tuesday evening, but didn't realize your mistake until Wednesday noon. Unfazed, you get out your backup, restore to the point-in-time 5:14PM Tuesday evening, and are up and running. In this history of the database universe, you never dropped the table. But suppose you later realize this wasn't such a great idea, and would like to return to sometime Wednesday morning in the original history. You won't be able to if, while your database was up-and-running, it overwrote some of the WAL segment files that led up to the time you now wish you could get back to. Thus, to avoid this, you need to distinguish the series of WAL records generated after you've done a point-in-time recovery from those that were generated in the original database history.
To deal with this problem, PostgreSQL has a notion of timelines. Whenever an archive recovery completes, a new timeline is created to identify the series of WAL records generated after that recovery. The timeline ID number is part of WAL segment file names so a new timeline does not overwrite the WAL data generated by previous timelines. It is in fact possible to archive many different timelines. While that might seem like a useless feature, it's often a lifesaver. Consider the situation where you aren't quite sure what point-in-time to recover to, and so have to do several point-in-time recoveries by trial and error until you find the best place to branch off from the old history. Without timelines this process would soon generate an unmanageable mess. With timelines, you can recover to any prior state, including states in timeline branches that you abandoned earlier.
Every time a new timeline is created, PostgreSQL creates a “timeline history” file that shows which timeline it branched off from and when. These history files are necessary to allow the system to pick the right WAL segment files when recovering from an archive that contains multiple timelines. Therefore, they are archived into the WAL archive area just like WAL segment files. The history files are just small text files, so it's cheap and appropriate to keep them around indefinitely (unlike the segment files which are large). You can, if you like, add comments to a history file to record your own notes about how and why this particular timeline was created. Such comments will be especially valuable when you have a thicket of different timelines as a result of experimentation.
Some tips for configuring continuous archiving are given here.
It is possible to use PostgreSQL's backup facilities to produce standalone hot backups. These are backups that cannot be used for point-in-time recovery, yet are typically much faster to back up and restore than pg_dump dumps. (They are also much larger than pg_dump dumps, so in some cases the speed advantage might be negated.)
If more flexibility in copying the backup files is needed, a lower level process can be used for standalone hot backups as well. To prepare for low level standalone hot backups, make sure wal_level is set to replica or higher, archive_mode to on, and set up an archive_command that performs archiving only when a switch file exists. For example:
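```
archive_command = 'test ! -f /var/lib/pgsql/backup_in_progress || (test ! -f /var/lib/pgsql/archive/%f && cp %p /var/lib/pgsql/archive/%f)'
```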
This command will perform archiving when /var/lib/pgsql/backup_in_progress exists, and otherwise silently return zero exit status (allowing PostgreSQL to recycle the unwanted WAL file).
With this preparation, a backup can be taken using a script like the following:
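```
# example paths; adjust for your installation
touch /var/lib/pgsql/backup_in_progress
psql -c "select pg_start_backup('hot_backup');"
tar -cf /var/lib/pgsql/backup.tar /var/lib/pgsql/data/
psql -c "select pg_stop_backup();"
rm /var/lib/pgsql/backup_in_progress
tar -rf /var/lib/pgsql/backup.tar /var/lib/pgsql/archive/
```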
The switch file /var/lib/pgsql/backup_in_progress is created first, enabling archiving of completed WAL files to occur. After the backup the switch file is removed. Archived WAL files are then added to the backup so that both the base backup and all required WAL files are part of the same tar file. Please remember to add error handling to your backup scripts.
If archive storage size is a concern, you can use gzip to compress the archive files:
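```
archive_command = 'gzip < %p > /var/lib/pgsql/archive/%f'
```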
You will then need to use gunzip during recovery:
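```
restore_command = 'gunzip < /mnt/server/archivedir/%f > %p'
```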
Many people choose to use scripts to define their archive_command, so that their postgresql.conf entry looks very simple:
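```
archive_command = 'local_backup_script.sh "%p" "%f"'
```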
Using a separate script file is advisable any time you want to use more than a single command in the archiving process. This allows all complexity to be managed within the script, which can be written in a popular scripting language such as bash or perl.
Examples of requirements that might be solved within a script include:
Copying data to secure off-site data storage
Batching WAL files so that they are transferred every three hours, rather than one at a time
Interfacing with other backup and recovery software
Interfacing with monitoring software to report errors
At this writing, there are several limitations of the continuous archiving technique. These will probably be fixed in future releases:
PostgreSQL provides facilities to support dynamic tracing of the database server. This allows an external utility to be called at specific points in the code and thereby trace execution.
A number of probes or trace points are already inserted into the source code. These probes are intended to be used by database developers and administrators. By default the probes are not compiled into PostgreSQL; the user needs to explicitly tell the configure script to make the probes available.
Currently, the DTrace utility is supported, which, at the time of this writing, is available on Solaris, macOS, FreeBSD, NetBSD, and Oracle Linux. The SystemTap project for Linux provides a DTrace equivalent and can also be used. Supporting other dynamic tracing utilities is theoretically possible by changing the definitions for the macros in src/include/utils/probes.h.
By default, probes are not available, so you will need to explicitly tell the configure script to make the probes available in PostgreSQL. To include DTrace support, specify --enable-dtrace when running configure.
A number of standard probes are provided in the source code, along with the data types used in those probes. More probes can certainly be added to enhance PostgreSQL's observability.
The example below shows a DTrace script for analyzing transaction counts in the system, as an alternative to snapshotting pg_stat_database before and after a performance test:
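```
#!/usr/sbin/dtrace -qs

postgresql$1:::transaction-start
{
        @start["Start"] = count();
        self->ts = timestamp;
}

postgresql$1:::transaction-abort
{
        @abort["Abort"] = count();
}

postgresql$1:::transaction-commit
/self->ts/
{
        @commit["Commit"] = count();
        @time["Total time (ns)"] = sum(timestamp - self->ts);
        self->ts=0;
}
```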
When executed, the example D script gives output such as:
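Illustrative output (the counts will of course vary with the workload):

```
# ./txn_count.d `pgrep -n postgres`
^C

Start                                          71
Commit                                         70
Total time (ns)                        2312105013
```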
SystemTap uses a different notation for trace scripts than DTrace does, even though the underlying trace points are compatible. One point worth noting is that at this writing, SystemTap scripts must reference probe names using double underscores in place of hyphens. This is expected to be fixed in future SystemTap releases.
You should remember that DTrace scripts need to be carefully written and debugged, otherwise the trace information collected might be meaningless. In most cases where problems are found it is the instrumentation that is at fault, not the underlying system. When discussing information found using dynamic tracing, be sure to enclose the script used to allow that too to be checked and discussed.
New probes can be defined within the code wherever the developer desires, though this will require a recompilation. Below are the steps for inserting new probes:
Decide on probe names and data to be made available through the probes
Add the probe definitions to src/backend/utils/probes.d
Include pg_trace.h if it is not already present in the module(s) containing the probe points, and insert TRACE_POSTGRESQL probe macros at the desired locations in the source code
Recompile and verify that the new probes are available
Example: Here is an example of how you would add a probe to trace all new transactions by transaction ID.
Decide that the probe will be named transaction-start and requires a parameter of type LocalTransactionId
Add the probe definition to src/backend/utils/probes.d:
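```
probe transaction__start(LocalTransactionId);
```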
Note the use of the double underline in the probe name. In a DTrace script using the probe, the double underline needs to be replaced with a hyphen, so transaction-start is the name to document for users.
At compile time, transaction__start is converted to a macro called TRACE_POSTGRESQL_TRANSACTION_START (notice the underscores are single here), which is available by including pg_trace.h. Add the macro call to the appropriate location in the source code. In this case, it looks like the following:
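```
TRACE_POSTGRESQL_TRANSACTION_START(vxid.localTransactionId);
```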
After recompiling and running the new binary, check that your newly added probe is available by executing the following DTrace command. You should see similar output:
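For instance (the probe ID and provider suffix will differ on your system):

```
# dtrace -ln transaction-start
   ID    PROVIDER          MODULE                 FUNCTION NAME
29461 postgresql49878    postgres  StartTransactionCommand transaction-start
```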
There are a few things to be careful about when adding trace macros to the C code:
You should take care that the data types specified for a probe's parameters match the data types of the variables used in the macro. Otherwise, you will get compilation errors.
On most platforms, if PostgreSQL is built with --enable-dtrace, the arguments to a trace macro will be evaluated whenever control passes through the macro, even if no tracing is being done. This is usually not worth worrying about if you are just reporting the values of a few local variables. But beware of putting expensive function calls into the arguments. If you need to do that, consider protecting the macro with a check to see if the trace is actually enabled:
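```
if (TRACE_POSTGRESQL_TRANSACTION_START_ENABLED())
    TRACE_POSTGRESQL_TRANSACTION_START(some_function(...));
```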
Each trace macro has a corresponding ENABLED macro.
Write-Ahead Logging (WAL) is a standard method for ensuring data integrity. A detailed description can be found in most (if not all) books about transaction processing. Briefly, WAL's central concept is that changes to data files (where tables and indexes reside) must be written only after those changes have been logged, that is, after log records describing the changes have been flushed to permanent storage. If we follow this procedure, we do not need to flush data pages to disk on every transaction commit, because we know that in the event of a crash we will be able to recover the database using the log: any changes that have not been applied to the data pages can be redone from the log records. (This is roll-forward recovery, also known as REDO.)
Because WAL restores database file contents after a crash, journaled file systems are not necessary for reliable storage of the data files or WAL files. In fact, journaling overhead can reduce performance, especially if journaling causes file system data to be flushed to disk. Fortunately, data flushing during journaling can often be disabled with a file system mount option, e.g., data=writeback on a Linux ext3 file system. Journaled file systems do improve boot speed after a crash.
Using WAL results in a significantly reduced number of disk writes, because only the log file needs to be flushed to disk to guarantee that a transaction is committed, rather than every data file changed by the transaction. The log file is written sequentially, and so the cost of syncing the log is much less than the cost of flushing the data pages. This is especially true for servers handling many small transactions touching different parts of the data store. Furthermore, when the server is processing many small concurrent transactions, one fsync of the log file may suffice to commit many transactions.
WAL also makes it possible to support on-line backup and point-in-time recovery, as described above. By archiving the WAL data we can support reverting to any time instant covered by the available WAL data: we simply install a prior physical backup of the database, and replay the WAL log just as far as the desired time. What's more, the physical backup doesn't have to be an instantaneous snapshot of the database state; if it is made over some period of time, then replaying the WAL log for that period will fix any internal inconsistencies.
Reliability is an important property of any serious database system, and PostgreSQL does everything possible to guarantee reliable operation. One aspect of reliable operation is that all data recorded by a committed transaction should be stored in a nonvolatile area that is safe from power loss, operating system failure, and hardware failure (except failure of the nonvolatile area itself, of course). Successfully writing the data to the computer's permanent storage (disk drive or equivalent) ordinarily meets this requirement. In fact, even if a computer is fatally damaged, if the disk drives survive they can be moved to another computer with similar hardware and all committed transactions will remain intact.
While forcing data to the disk platters periodically might seem like a simple operation, it is not. Because disk drives are dramatically slower than main memory and CPUs, several layers of caching exist between the computer's main memory and the disk platters. First, there is the operating system's buffer cache, which caches frequently requested disk blocks and combines disk writes. Fortunately, all operating systems give applications a way to force writes from the buffer cache to disk, and PostgreSQL uses those features. (See the wal_sync_method parameter to adjust how this is done.)
Next, there might be a cache in the disk drive controller; this is particularly common on RAID controller cards. Some of these caches are write-through, meaning writes are sent to the drive as soon as they arrive. Others are write-back, meaning data is sent to the drive at some later time. Such caches can be a reliability hazard because the memory in the disk controller cache is volatile, and will lose its contents in a power failure. Better controller cards have battery-backup units (BBUs), meaning the card has a battery that maintains power to the cache in case of system power loss. After power is restored the data will be written to the disk drives.
And finally, most disk drives have caches. Some are write-through while some are write-back, and the same concerns about data loss exist for write-back drive caches as for disk controller caches. Consumer-grade IDE and SATA drives are particularly likely to have write-back caches that will not survive a power failure. Many solid-state drives (SSD) also have volatile write-back caches.
These caches can typically be disabled; however, the method for doing this varies by operating system and drive type:
On Linux, IDE and SATA drives can be queried using hdparm -I; write caching is enabled if there is a * next to Write cache. hdparm -W 0 can be used to turn off write caching. SCSI drives can be queried using sdparm. Use sdparm --get=WCE to check whether the write cache is enabled and sdparm --clear=WCE to disable it.
On FreeBSD, IDE drives can be queried using atacontrol and write caching turned off using hw.ata.wc=0 in /boot/loader.conf; SCSI drives can be queried using camcontrol identify, and the write cache both queried and changed using sdparm when available.
On Solaris, the disk write cache is controlled by format -e. (The Solaris ZFS file system is safe with disk write-cache enabled because it issues its own disk cache flush commands.)
On Windows, if wal_sync_method is open_datasync (the default), write caching can be disabled by unchecking My Computer\Open\disk drive\Properties\Hardware\Properties\Policies\Enable write caching on the disk. Alternatively, set wal_sync_method to fsync or fsync_writethrough, which prevent write caching.
On macOS, write caching can be prevented by setting wal_sync_method to fsync_writethrough.
Recent SATA drives (those following ATAPI-6 or later) offer a drive cache flush command (FLUSH CACHE EXT), while SCSI drives have long supported a similar command, SYNCHRONIZE CACHE. These commands are not directly accessible to PostgreSQL, but some file systems (e.g., ZFS, ext4) can use them to flush data to the platters on write-back-enabled drives. Unfortunately, such file systems behave suboptimally when combined with battery-backup unit (BBU) disk controllers. In such setups, the synchronize command forces all data from the controller cache to the disks, eliminating much of the benefit of the BBU. You can run the pg_test_fsync program to see if you are affected. If you are affected, the performance benefits of the BBU can be regained by turning off write barriers in the file system or reconfiguring the disk controller, if that is an option. If write barriers are turned off, make sure the battery remains functional; a faulty battery can potentially lead to data loss. Hopefully file system and disk controller designers will eventually address this suboptimal behavior.
When the operating system sends a write request to the storage hardware, there is little it can do to make sure the data has arrived at a truly non-volatile storage area. Rather, it is the administrator's responsibility to make certain that all storage components ensure integrity for both data and file-system metadata. Avoid disk controllers that have non-battery-backed write caches. At the drive level, disable write-back caching if the drive cannot guarantee the data will be written before shutdown. If you use SSDs, be aware that many of these do not honor cache flush commands by default. You can test for reliable I/O subsystem behavior using pg_test_fsync.
Another risk of data loss is posed by the disk platter write operations themselves. Disk platters are divided into sectors, commonly 512 bytes each. Every physical read or write operation processes a whole sector. When a write request arrives at the drive, it might be for some multiple of 512 bytes (PostgreSQL typically writes 8192 bytes, or 16 sectors, at a time), and the process of writing could fail due to power loss at any time, meaning some of the 512-byte sectors were written while others were not. To guard against such failures, PostgreSQL periodically writes full page images to permanent WAL storage before modifying the actual page on disk. By doing this, during crash recovery PostgreSQL can restore partially-written pages from WAL. If you have file-system software that prevents partial page writes (e.g., ZFS), you can turn off this page imaging by turning off the full_page_writes parameter. Battery-Backed Unit (BBU) disk controllers do not prevent partial page writes unless they guarantee that data is written to the BBU as full (8kB) pages.
PostgreSQL also protects against some kinds of data corruption on storage devices that may occur because of hardware errors or media failure over time, such as reading/writing garbage data.
Each individual record in a WAL file is protected by a CRC-32 (32-bit) check that allows us to tell if record contents are correct. The CRC value is set when we write each WAL record and checked during crash recovery, archive recovery and replication.
Data pages are not currently checksummed by default, though full page images recorded in WAL records will be protected; see the documentation on data checksums for details about enabling data page checksums.
Internal data structures such as pg_xact, pg_subtrans, pg_multixact, pg_serial, pg_notify, pg_stat, and pg_snapshots are not directly checksummed, nor are their pages protected by full page writes. However, where such data structures are persistent, WAL records are written that allow recent changes to be accurately rebuilt at crash recovery, and those WAL records are protected as discussed above.
Individual state files in pg_twophase are protected by CRC-32.
Temporary data files used in larger SQL queries for sorts, materializations and intermediate results are not currently checksummed, nor will WAL records be written for changes to those files.
PostgreSQL does not protect against correctable memory errors and it is assumed you will operate using RAM that uses industry standard Error Correcting Codes (ECC) or better protection.
After restoring a backup, it is wise to run ANALYZE on each database so the query optimizer has useful statistics. For more advice on how to load large amounts of data into PostgreSQL efficiently, refer to the tips on populating a database.
pg_dump dumps only a single database at a time, and it does not dump information about roles or tablespaces (because those are cluster-wide rather than per-database). To support convenient dumping of the entire contents of a database cluster, the pg_dumpall program is provided. pg_dumpall backs up each database in a given cluster, and also preserves cluster-wide data such as role and tablespace definitions. The basic usage of this command is:
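```
pg_dumpall > dumpfile
# the resulting dump can later be restored with:
psql -f dumpfile postgres
```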
See the pg_dump and pg_dumpall reference pages for details.
Users will be able to tell whether their session is read-only by issuing SHOW transaction_read_only. In addition, a set of system functions allows users to access information about the standby server. These allow you to write programs that are aware of the current state of the database. These can be used to monitor the progress of recovery, or to allow you to write complex programs that restore the database to particular states.
When a conflicting query is short, it's typically desirable to allow it to complete by delaying WAL application for a little bit; but a long delay in WAL application is usually not desirable. So the cancel mechanism has parameters, max_standby_archive_delay and max_standby_streaming_delay, that define the maximum allowed delay in WAL application. Conflicting queries will be canceled once it has taken longer than the relevant delay setting to apply any newly-received WAL data. There are two parameters so that different delay values can be specified for the case of reading WAL data from an archive (i.e., initial recovery from a base backup or "catching up" a standby server that has fallen far behind) versus reading WAL data via streaming replication.
Another option is to increase vacuum_defer_cleanup_age on the primary server, so that dead rows will not be cleaned up as quickly as they normally would be. This will allow more time for queries to execute before they are canceled on the standby, without having to set a high max_standby_streaming_delay. However, it is difficult to guarantee any specific execution-time window with this approach, since vacuum_defer_cleanup_age is measured in transactions executed on the primary server.
It is important that the administrator select appropriate settings for max_standby_archive_delay and max_standby_streaming_delay. The best choices vary depending on business priorities. For example, if the server is primarily tasked as a High Availability server, then you will want low delay settings, perhaps even zero, though that is a very aggressive setting. If the standby server is tasked as an additional server for decision support queries then it might be acceptable to set the maximum delay values to many hours, or even -1 which means wait forever for queries to complete.
Various parameters relevant to hot standby have been mentioned above.
On the primary, the wal_level and vacuum_defer_cleanup_age parameters can be used. max_standby_archive_delay and max_standby_streaming_delay have no effect if set on the primary.
On the standby, the hot_standby, max_standby_archive_delay, and max_standby_streaming_delay parameters can be used. vacuum_defer_cleanup_age has no effect as long as the server remains in standby mode, though it will become relevant if the standby becomes primary.
The Serializable transaction isolation level is not yet available in hot standby. An attempt to set a transaction to the serializable isolation level in hot standby mode will generate an error.
Note that although WAL archiving will allow you to restore any modifications made to the data in your PostgreSQL database, it will not restore changes made to configuration files (that is, postgresql.conf, pg_hba.conf and pg_ident.conf), since those are edited manually rather than through SQL operations. You might wish to keep the configuration files in a location that will be backed up by your regular file system backup procedures.
The archive command is only invoked on completed WAL segments. Hence, if your server generates little WAL traffic (or has slack periods where it does so), there could be a long delay between the completion of a transaction and its safe recording in archive storage. To put a limit on how old unarchived data can be, you can set archive_timeout to force the server to switch to a new WAL segment file at least that often. Note that archived files that are archived early due to a forced switch are still the same length as completely full files. It is therefore unwise to set a very short archive_timeout; it will bloat your archive storage. archive_timeout settings of a minute or so are usually reasonable.
Also, you can force a segment switch manually with pg_switch_wal if you want to ensure that a just-finished transaction is archived as soon as possible. Other utility functions related to WAL management are also available.
When wal_level is minimal, some SQL commands are optimized to avoid WAL logging. If archiving or streaming replication were turned on during execution of one of these statements, WAL would not contain enough information for archive recovery. (Crash recovery is unaffected.) For this reason, wal_level can only be changed at server start. However, archive_command can be changed with a configuration file reload. If you wish to temporarily stop archiving, one way to do it is to set archive_command to the empty string (''). This will cause WAL files to accumulate in pg_wal/ until a working archive_command is re-established.
The easiest way to perform a base backup is to use the pg_basebackup tool. It can create a base backup either as regular files or as a tar archive. If more flexibility than pg_basebackup can provide is required, you can also make a base backup using the low level API described above.
The backup history file is just a small text file. It contains the label string you gave to pg_start_backup, as well as the starting and ending times and WAL segments of the backup. If you used the label to identify the associated dump file, then the archived history file is enough to tell you which dump file to restore.
The procedure for making a base backup using the low level APIs contains a few more steps than the pg_basebackup method, but is relatively simple. It is very important that these steps are executed in sequence, and that the success of a step is verified before proceeding to the next step.
A non-exclusive low level backup is one that allows other concurrent backups to be running (both those started using the same backup API and those started using pg_basebackup).
By default, pg_start_backup can take a long time to finish. This is because it performs a checkpoint, and the I/O required for the checkpoint will be spread out over a significant period of time, by default half your inter-checkpoint interval (see the checkpoint_completion_target configuration parameter). This is usually what you want, because it minimizes the impact on query processing. If you want to start the backup as soon as possible, change the second parameter to true, which will issue an immediate checkpoint using as much I/O as available.
Perform the backup, using any convenient file-system-backup tool such as tar or cpio (not pg_dump or pg_dumpall). It is neither necessary nor desirable to stop normal operation of the database while you do this. See the notes above about which files to include in and exclude from the backup.
The contents of the directories pg_dynshmem/, pg_notify/, pg_serial/, pg_snapshots/, pg_stat_tmp/, and pg_subtrans/ (but not the directories themselves) can be omitted from the backup as they will be initialized on postmaster startup. If is set and is under the data directory then the contents of that directory can also be omitted.
Set recovery configuration settings in postgresql.conf (see ) and create a file recovery.signal in the cluster data directory. You might also want to temporarily modify pg_hba.conf to prevent ordinary users from connecting until you are sure the recovery was successful.
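A minimal sketch, assuming archived WAL lives under /mnt/server/archivedir (a hypothetical path); the setting can be written with ALTER SYSTEM before the restart, and recovery.signal is simply an empty file created in the data directory:

ALTER SYSTEM SET restore_command = 'cp /mnt/server/archivedir/%f %p';
-- then, at the shell, create the empty recovery.signal file in the
-- data directory and restart the server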
Normally, recovery will proceed through all available WAL segments, thereby restoring the database to the current point in time (or as close as possible given the available WAL segments). Therefore, a normal recovery will end with a “file not found” message, the exact text of the error message depending upon your choice of restore_command. You may also see an error message at the start of recovery for a file named something like 00000001.history. This is also normal and does not indicate a problem in simple recovery situations; see for discussion.
If you want to recover to some previous point in time (say, right before the junior DBA dropped your main transaction table), just specify the required . You can specify the stop point, known as the “recovery target”, either by date/time, named restore point or by completion of a specific transaction ID. As of this writing only the date/time and named restore point options are very usable, since there are no tools to help you identify with any accuracy which transaction ID to use.
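For example, a sketch of setting a date/time recovery target (the timestamp is purely illustrative):

ALTER SYSTEM SET recovery_target_time = '2021-05-01 12:00:00';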
The default behavior of recovery is to recover along the same timeline that was current when the base backup was taken. If you wish to recover into some child timeline (that is, you want to return to some state that was itself generated after a recovery attempt), you need to specify the target timeline ID in . You cannot recover into timelines that branched off earlier than the base backup.
As with base backups, the easiest way to produce a standalone hot backup is to use the tool. If you include the -X parameter when calling it, all the write-ahead log required to use the backup will be included in the backup automatically, and no special action is required to restore the backup.
When using an archive_command script, it's desirable to enable . Any messages written to stderr from the script will then appear in the database server log, allowing complex configurations to be diagnosed easily if they fail.
If a CREATE DATABASE command is executed while a base backup is being taken, and then the template database that the CREATE DATABASE copied is modified while the base backup is still in progress, it is possible that recovery will cause those modifications to be propagated into the created database as well. This is of course undesirable. To avoid this risk, it is best not to modify any template databases while taking a base backup.
CREATE TABLESPACE commands are WAL-logged with the literal absolute path, and will therefore be replayed as tablespace creations with the same absolute path. This might be undesirable if the log is being replayed on a different machine. It can be dangerous even if the log is being replayed on the same machine, but into a new data directory: the replay will still overwrite the contents of the original tablespace. To avoid potential gotchas of this sort, the best practice is to take a new base backup after creating or dropping tablespaces.
It should also be noted that the default WAL format is fairly bulky since it includes many disk page snapshots. These page snapshots are designed to support crash recovery, since we might need to fix partially-written disk pages. Depending on your system hardware and software, the risk of partial writes might be small enough to ignore, in which case you can significantly reduce the total volume of archived logs by turning off page snapshots using the parameter. (Read the notes and warnings in before you do so.) Turning off page snapshots does not prevent use of the logs for PITR operations. An area for future development is to compress archived WAL data by removing unnecessary page copies even when full_page_writes is on. In the meantime, administrators might wish to reduce the number of page snapshots included in WAL by increasing the checkpoint interval parameters as much as feasible.
Role | Allowed Access
pg_read_all_settings | Read all configuration variables, even those normally visible only to superusers.
pg_read_all_stats | Read all pg_stat_* views and use various statistics related extensions, even those normally visible only to superusers.
pg_stat_scan_tables | Execute monitoring functions that may take ACCESS SHARE locks on tables, potentially for a long time.
pg_monitor | Read/execute various monitoring views and functions. This role is a member of pg_read_all_settings, pg_read_all_stats and pg_stat_scan_tables.
pg_signal_backend | Signal another backend to cancel a query or terminate its session.
pg_read_server_files | Allow reading files from any location the database can access on the server with COPY and other file-access functions.
pg_write_server_files | Allow writing to files in any location the database can access on the server with COPY and other file-access functions.
pg_execute_server_program | Allow executing programs on the database server as the user the database runs as with COPY and other functions which allow executing a server-side program.
LC_COLLATE | String sort order
LC_CTYPE | Character classification (What is a letter? Its upper-case equivalent?)
LC_MESSAGES | Language of messages
LC_MONETARY | Formatting of currency amounts
LC_NUMERIC | Formatting of numbers
LC_TIME | Formatting of dates and times
Name | Description | Language | Server? | ICU? | Bytes/Char | Aliases
BIG5 | Big Five | Traditional Chinese | No | No | 1-2 | WIN950, Windows950
EUC_CN | Extended UNIX Code-CN | Simplified Chinese | Yes | Yes | 1-3 |
EUC_JP | Extended UNIX Code-JP | Japanese | Yes | Yes | 1-3 |
EUC_JIS_2004 | Extended UNIX Code-JP, JIS X 0213 | Japanese | Yes | No | 1-3 |
EUC_KR | Extended UNIX Code-KR | Korean | Yes | Yes | 1-3 |
EUC_TW | Extended UNIX Code-TW | Traditional Chinese, Taiwanese | Yes | Yes | 1-3 |
GB18030 | National Standard | Chinese | No | No | 1-4 |
GBK | Extended National Standard | Simplified Chinese | No | No | 1-2 | WIN936, Windows936
ISO_8859_5 | ISO 8859-5, ECMA 113 | Latin/Cyrillic | Yes | Yes | 1 |
ISO_8859_6 | ISO 8859-6, ECMA 114 | Latin/Arabic | Yes | Yes | 1 |
ISO_8859_7 | ISO 8859-7, ECMA 118 | Latin/Greek | Yes | Yes | 1 |
ISO_8859_8 | ISO 8859-8, ECMA 121 | Latin/Hebrew | Yes | Yes | 1 |
JOHAB | JOHAB | Korean (Hangul) | No | No | 1-3 |
KOI8R | KOI8-R | Cyrillic (Russian) | Yes | Yes | 1 | KOI8
KOI8U | KOI8-U | Cyrillic (Ukrainian) | Yes | Yes | 1 |
LATIN1 | ISO 8859-1, ECMA 94 | Western European | Yes | Yes | 1 | ISO88591
LATIN2 | ISO 8859-2, ECMA 94 | Central European | Yes | Yes | 1 | ISO88592
LATIN3 | ISO 8859-3, ECMA 94 | South European | Yes | Yes | 1 | ISO88593
LATIN4 | ISO 8859-4, ECMA 94 | North European | Yes | Yes | 1 | ISO88594
LATIN5 | ISO 8859-9, ECMA 128 | Turkish | Yes | Yes | 1 | ISO88599
LATIN6 | ISO 8859-10, ECMA 144 | Nordic | Yes | Yes | 1 | ISO885910
LATIN7 | ISO 8859-13 | Baltic | Yes | Yes | 1 | ISO885913
LATIN8 | ISO 8859-14 | Celtic | Yes | Yes | 1 | ISO885914
LATIN9 | ISO 8859-15 | LATIN1 with Euro and accents | Yes | Yes | 1 | ISO885915
LATIN10 | ISO 8859-16, ASRO SR 14111 | Romanian | Yes | No | 1 | ISO885916
MULE_INTERNAL | Mule internal code | Multilingual Emacs | Yes | No | 1-4 |
SJIS | Shift JIS | Japanese | No | No | 1-2 | Mskanji, ShiftJIS, WIN932, Windows932
SHIFT_JIS_2004 | Shift JIS, JIS X 0213 | Japanese | No | No | 1-2 |
SQL_ASCII | unspecified (see text) | any | Yes | No | 1 |
UHC | Unified Hangul Code | Korean | No | No | 1-2 | WIN949, Windows949
UTF8 | Unicode, 8-bit | all | Yes | Yes | 1-4 | Unicode
WIN866 | Windows CP866 | Cyrillic | Yes | Yes | 1 | ALT
WIN874 | Windows CP874 | Thai | Yes | No | 1 |
WIN1250 | Windows CP1250 | Central European | Yes | Yes | 1 |
WIN1251 | Windows CP1251 | Cyrillic | Yes | Yes | 1 | WIN
WIN1252 | Windows CP1252 | Western European | Yes | Yes | 1 |
WIN1253 | Windows CP1253 | Greek | Yes | Yes | 1 |
WIN1254 | Windows CP1254 | Turkish | Yes | Yes | 1 |
WIN1255 | Windows CP1255 | Hebrew | Yes | Yes | 1 |
WIN1256 | Windows CP1256 | Arabic | Yes | Yes | 1 |
WIN1257 | Windows CP1257 | Baltic | Yes | Yes | 1 |
WIN1258 | Windows CP1258 | Vietnamese | Yes | Yes | 1 | ABC, TCVN, TCVN5712, VSCII
Server Character Set | Available Client Character Sets
BIG5 | not supported as a server encoding
EUC_CN | EUC_CN, MULE_INTERNAL, UTF8
EUC_JP | EUC_JP, MULE_INTERNAL, SJIS, UTF8
EUC_KR | EUC_KR, MULE_INTERNAL, UTF8
EUC_TW | EUC_TW, BIG5, MULE_INTERNAL, UTF8
GB18030 | not supported as a server encoding
GBK | not supported as a server encoding
ISO_8859_5 | ISO_8859_5, KOI8R, MULE_INTERNAL, UTF8, WIN866, WIN1251
ISO_8859_6 | ISO_8859_6, UTF8
ISO_8859_7 | ISO_8859_7, UTF8
ISO_8859_8 | ISO_8859_8, UTF8
JOHAB | JOHAB, UTF8
KOI8R | KOI8R, ISO_8859_5, MULE_INTERNAL, UTF8, WIN866, WIN1251
KOI8U | KOI8U, UTF8
LATIN1 | LATIN1, MULE_INTERNAL, UTF8
LATIN2 | LATIN2, MULE_INTERNAL, UTF8, WIN1250
LATIN3 | LATIN3, MULE_INTERNAL, UTF8
LATIN4 | LATIN4, MULE_INTERNAL, UTF8
LATIN5 | LATIN5, UTF8
LATIN6 | LATIN6, UTF8
LATIN7 | LATIN7, UTF8
LATIN8 | LATIN8, UTF8
LATIN9 | LATIN9, UTF8
LATIN10 | LATIN10, UTF8
MULE_INTERNAL | MULE_INTERNAL, BIG5, EUC_CN, EUC_JP, EUC_KR, EUC_TW, ISO_8859_5, KOI8R, LATIN1 to LATIN4, SJIS, WIN866, WIN1250, WIN1251
SJIS | not supported as a server encoding
SQL_ASCII | any (no conversion will be performed)
UHC | not supported as a server encoding
UTF8 | all supported encodings
WIN866 | WIN866, ISO_8859_5, KOI8R, MULE_INTERNAL, UTF8, WIN1251
WIN874 | WIN874, UTF8
WIN1250 | WIN1250, LATIN2, MULE_INTERNAL, UTF8
WIN1251 | WIN1251, ISO_8859_5, KOI8R, MULE_INTERNAL, UTF8, WIN866
WIN1252 | WIN1252, UTF8
WIN1253 | WIN1253, UTF8
WIN1254 | WIN1254, UTF8
WIN1255 | WIN1255, UTF8
WIN1256 | WIN1256, UTF8
WIN1257 | WIN1257, UTF8
WIN1258 | WIN1258, UTF8
The most important disk monitoring task for a database administrator is to make sure disk space is sufficient. A data disk that fills up will not result in data corruption, but it may well prevent further data-processing activity from taking place. If the disk holding the WAL files grows full, the database server will panic and service will consequently be interrupted.

If you cannot free up additional space on the disk by deleting other things, you can move some of the database files to other file systems by making use of tablespaces. See Section 22.6 for more information.

Note: Some file systems perform badly when they are almost full, so do not wait until the disk is completely full to take action.

If your system supports per-user disk quotas, then the database will naturally be subject to whatever quota is placed on the user the server runs as. Exceeding the quota will have the same bad effects as running out of disk space entirely.

Each table has a primary heap disk file where most of the data is stored. If any of the table's columns may contain wide values, there may also be a TOAST table associated with the table, which is used to store values too wide to fit comfortably in the main table (see Section 68.2). There will be one valid index on the TOAST table, if present. There may also be indexes associated with the base table. Each table and index is stored in a separate disk file, possibly more than one file if it exceeds 1 GB. Naming conventions for these files are described in Section 68.1.

You can monitor disk space in three ways: using the SQL functions listed in Table 9.89, using the oid2name module, or using manual inspection of the system catalogs. The SQL functions are the easiest to use and are generally recommended. The remainder of this section shows how to do it by inspection of the system catalogs.

Using psql on a recently vacuumed or analyzed database, you can issue queries to see the disk usage of any table:
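A sketch using a hypothetical table named customer:

SELECT pg_relation_filepath(oid), relpages
FROM pg_class
WHERE relname = 'customer';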
Each page is typically 8 kilobytes. (Remember, relpages is only updated by VACUUM, ANALYZE, and a few DDL commands such as CREATE INDEX.) The file path name is of interest if you want to examine the table's disk file directly.
To show the space used by TOAST tables, use a query like the following:
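A sketch, again against the hypothetical customer table:

SELECT relname, relpages
FROM pg_class,
     (SELECT reltoastrelid
      FROM pg_class
      WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
      oid = (SELECT indexrelid
             FROM pg_index
             WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;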
You can easily display index sizes, too:
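For example (customer again being a hypothetical table name):

SELECT c2.relname, c2.relpages
FROM pg_class c, pg_class c2, pg_index i
WHERE c.relname = 'customer' AND
      c.oid = i.indrelid AND
      c2.oid = i.indexrelid
ORDER BY c2.relname;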
It is also easy to find your largest tables and indexes using this information:
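For example:

SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC
LIMIT 10;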
There are several WAL-related configuration parameters that affect database performance. This section explains their use. Consult Chapter 19 for general information about setting server configuration parameters.
Checkpoints are points in the sequence of transactions at which it is guaranteed that the heap and index data files have been updated with all information written before that checkpoint. At checkpoint time, all dirty data pages are flushed to disk and a special checkpoint record is written to the log file. (The change records were previously flushed to the WAL files.) In the event of a crash, the crash recovery procedure looks at the latest checkpoint record to determine the point in the log (known as the redo record) from which it should start the REDO operation. Any changes made to data files before that point are guaranteed to be already on disk. Hence, after a checkpoint, log segments preceding the one containing the redo record are no longer needed and can be recycled or removed. (When WAL archiving is being done, the log segments must be archived before being recycled or removed.)
The checkpoint requirement of flushing all dirty data pages to disk can cause a significant I/O load. For this reason, checkpoint activity is throttled so that I/O begins at checkpoint start and completes before the next checkpoint is due to start; this minimizes performance degradation during checkpoints.
The server's checkpointer process automatically performs a checkpoint every so often. A checkpoint is begun every checkpoint_timeout seconds, or if max_wal_size is about to be exceeded, whichever comes first. The default settings are 5 minutes and 1 GB, respectively. If no WAL has been written since the previous checkpoint, new checkpoints will be skipped even if checkpoint_timeout has passed. (If WAL archiving is being used and you want to put a lower limit on how often files are archived in order to bound potential data loss, you should adjust the archive_timeout parameter rather than the checkpoint parameters.) It is also possible to force a checkpoint by using the SQL command CHECKPOINT.
Reducing checkpoint_timeout and/or max_wal_size causes checkpoints to occur more often. This allows faster after-crash recovery, since less work will need to be redone. However, one must balance this against the increased cost of flushing dirty data pages more often. If full_page_writes is set (as is the default), there is another factor to consider. To ensure data page consistency, the first modification of a data page after each checkpoint results in logging the entire page content. In that case, a smaller checkpoint interval increases the volume of output to the WAL log, partially negating the goal of using a smaller interval, and in any case causing more disk I/O.
Checkpoints are fairly expensive, first because they require writing out all currently dirty buffers, and second because they result in extra subsequent WAL traffic as discussed above. It is therefore wise to set the checkpointing parameters high enough so that checkpoints don't happen too often. As a simple sanity check on your checkpointing parameters, you can set the checkpoint_warning parameter. If checkpoints happen closer together than checkpoint_warning seconds, a message will be output to the server log recommending increasing max_wal_size. Occasional appearance of such a message is not cause for alarm, but if it appears often then the checkpoint control parameters should be increased. Bulk operations such as large COPY transfers might cause a number of such warnings to appear if you have not set max_wal_size high enough.
To avoid flooding the I/O system with a burst of page writes, writing dirty buffers during a checkpoint is spread over a period of time. That period is controlled by checkpoint_completion_target, which is given as a fraction of the checkpoint interval. The I/O rate is adjusted so that the checkpoint finishes when the given fraction of checkpoint_timeout seconds have elapsed, or before max_wal_size is exceeded, whichever is sooner. With the default value of 0.5, PostgreSQL can be expected to complete each checkpoint in about half the time before the next checkpoint starts. On a system that's very close to maximum I/O throughput during normal operation, you might want to increase checkpoint_completion_target to reduce the I/O load from checkpoints. The disadvantage of this is that prolonging checkpoints affects recovery time, because more WAL segments will need to be kept around for possible use in recovery. Although checkpoint_completion_target can be set as high as 1.0, it is best to keep it less than that (perhaps 0.9 at most) since checkpoints include some other activities besides writing dirty buffers. A setting of 1.0 is quite likely to result in checkpoints not being completed on time, which would result in performance loss due to unexpected variation in the number of WAL segments needed.
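For example, a sketch of spreading checkpoint I/O over 90% of the interval:

ALTER SYSTEM SET checkpoint_completion_target = 0.9;
SELECT pg_reload_conf();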
On Linux and POSIX platforms, checkpoint_flush_after allows forcing the OS to flush pages written by the checkpoint to disk after a configurable number of bytes. Otherwise, these pages may be kept in the OS's page cache, inducing a stall when fsync is issued at the end of a checkpoint. This setting will often help to reduce transaction latency, but it also can have an adverse effect on performance, particularly for workloads that are bigger than shared_buffers but smaller than the OS's page cache.
The number of WAL segment files in the pg_wal directory depends on min_wal_size, max_wal_size and the amount of WAL generated in previous checkpoint cycles. When old log segment files are no longer needed, they are removed or recycled (that is, renamed to become future segments in the numbered sequence). If, due to a short-term peak of log output rate, max_wal_size is exceeded, the unneeded segment files will be removed until the system gets back under this limit. Below that limit, the system recycles enough WAL files to cover the estimated need until the next checkpoint, and removes the rest. The estimate is based on a moving average of the number of WAL files used in previous checkpoint cycles. The moving average is increased immediately if the actual usage exceeds the estimate, so it accommodates peak usage rather than average usage to some extent. min_wal_size puts a minimum on the amount of WAL files recycled for future usage; that much WAL is always recycled for future use, even if the system is idle and the WAL usage estimate suggests that little WAL is needed.
Independently of max_wal_size, the wal_keep_segments + 1 most recent WAL files are kept at all times. Also, if WAL archiving is used, old segments cannot be removed or recycled until they are archived. If WAL archiving cannot keep up with the pace that WAL is generated, or if archive_command fails repeatedly, old WAL files will accumulate in pg_wal until the situation is resolved. A slow or failed standby server that uses a replication slot will have the same effect (see Section 26.2.6).
In archive recovery or standby mode, the server periodically performs restartpoints, which are similar to checkpoints in normal operation: the server forces all its state to disk, updates the pg_control file to indicate that the already-processed WAL data need not be scanned again, and then recycles any old log segment files in the pg_wal directory. Restartpoints can't be performed more frequently than checkpoints in the master because restartpoints can only be performed at checkpoint records. A restartpoint is triggered when a checkpoint record is reached if at least checkpoint_timeout seconds have passed since the last restartpoint, or if WAL size is about to exceed max_wal_size. However, because of limitations on when a restartpoint can be performed, max_wal_size is often exceeded during recovery, by up to one checkpoint cycle's worth of WAL. (max_wal_size is never a hard limit anyway, so you should always leave plenty of headroom to avoid running out of disk space.)
There are two commonly used internal WAL functions: XLogInsertRecord and XLogFlush. XLogInsertRecord is used to place a new record into the WAL buffers in shared memory. If there is no space for the new record, XLogInsertRecord will have to write (move to kernel cache) a few filled WAL buffers. This is undesirable because XLogInsertRecord is used on every database low level modification (for example, row insertion) at a time when an exclusive lock is held on affected data pages, so the operation needs to be as fast as possible. What is worse, writing WAL buffers might also force the creation of a new log segment, which takes even more time. Normally, WAL buffers should be written and flushed by an XLogFlush request, which is made, for the most part, at transaction commit time to ensure that transaction records are flushed to permanent storage. On systems with high log output, XLogFlush requests might not occur often enough to prevent XLogInsertRecord from having to do writes. On such systems one should increase the number of WAL buffers by modifying the wal_buffers parameter. When full_page_writes is set and the system is very busy, setting wal_buffers higher will help smooth response times during the period immediately following each checkpoint.
The commit_delay parameter defines for how many microseconds a group commit leader process will sleep after acquiring a lock within XLogFlush, while group commit followers queue up behind the leader. This delay allows other server processes to add their commit records to the WAL buffers so that all of them will be flushed by the leader's eventual sync operation. No sleep will occur if fsync is not enabled, or if fewer than commit_siblings other sessions are currently in active transactions; this avoids sleeping when it's unlikely that any other session will commit soon. Note that on some platforms, the resolution of a sleep request is ten milliseconds, so that any nonzero commit_delay setting between 1 and 10000 microseconds would have the same effect. Note also that on some platforms, sleep operations may take slightly longer than requested by the parameter.
Since the purpose of commit_delay is to allow the cost of each flush operation to be amortized across concurrently committing transactions (potentially at the expense of transaction latency), it is necessary to quantify that cost before the setting can be chosen intelligently. The higher that cost is, the more effective commit_delay is expected to be in increasing transaction throughput, up to a point. The pg_test_fsync program can be used to measure the average time in microseconds that a single WAL flush operation takes. A value of half of the average time the program reports it takes to flush after a single 8kB write operation is often the most effective setting for commit_delay, so this value is recommended as the starting point to use when optimizing for a particular workload. While tuning commit_delay is particularly useful when the WAL log is stored on high-latency rotating disks, benefits can be significant even on storage media with very fast sync times, such as solid-state drives or RAID arrays with a battery-backed write cache; but this should definitely be tested against a representative workload. Higher values of commit_siblings should be used in such cases, whereas smaller commit_siblings values are often helpful on higher latency media. Note that it is quite possible that a setting of commit_delay that is too high can increase transaction latency by so much that total transaction throughput suffers.
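A sketch of applying that rule of thumb; the 250-microsecond figure is hypothetical, standing in for half of a single-write flush time measured with pg_test_fsync:

ALTER SYSTEM SET commit_delay = 250;
ALTER SYSTEM SET commit_siblings = 5;
SELECT pg_reload_conf();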
When commit_delay is set to zero (the default), it is still possible for a form of group commit to occur, but each group will consist only of sessions that reach the point where they need to flush their commit records during the window in which the previous flush operation (if any) is occurring. At higher client counts a “gangway effect” tends to occur, so that the effects of group commit become significant even when commit_delay is zero, and thus explicitly setting commit_delay tends to help less. Setting commit_delay can only help when (1) there are some concurrently committing transactions, and (2) throughput is limited to some degree by commit rate; but with high rotational latency this setting can be effective in increasing transaction throughput with as few as two clients (that is, a single committing client with one sibling transaction).
The wal_sync_method parameter determines how PostgreSQL will ask the kernel to force WAL updates out to disk. All the options should be the same in terms of reliability, with the exception of fsync_writethrough, which can sometimes force a flush of the disk cache even when other options do not do so. However, it's quite platform-specific which one will be the fastest. You can test the speeds of different options using the pg_test_fsync program. Note that this parameter is irrelevant if fsync has been turned off.
Enabling the wal_debug configuration parameter (provided that PostgreSQL has been compiled with support for it) will result in each XLogInsertRecord and XLogFlush WAL call being logged to the server log. This option might be replaced by a more general mechanism in the future.
WAL is automatically enabled; no action is required from the administrator except ensuring that the disk-space requirements for the WAL logs are met, and that any necessary tuning is done (see Section 29.4).
WAL records are appended to the WAL logs as each new record is written. The insert position is described by a Log Sequence Number (LSN) that is a byte offset into the logs, increasing monotonically with each new record. LSN values are returned as the datatype pg_lsn. Values can be compared to calculate the volume of WAL data that separates them, so they are used to measure the progress of replication and recovery.
WAL logs are stored in the directory pg_wal under the data directory, as a set of segment files, normally each 16 MB in size (but the size can be changed by altering the --wal-segsize initdb option). Each segment is divided into pages, normally 8 kB each (this size can be changed via the --with-wal-blocksize configure option). The log record headers are described in access/xlogrecord.h; the record content is dependent on the type of event that is being logged. Segment files are given ever-increasing numbers as names, starting at 000000010000000000000001. The numbers do not wrap, but it will take a very, very long time to exhaust the available stock of numbers.
It is advantageous if the log is located on a different disk from the main database files. This can be achieved by moving the pg_wal directory to another location (while the server is shut down, of course) and creating a symbolic link from the original location in the main data directory to the new location.
The aim of WAL is to ensure that the log is written before database records are altered, but this can be subverted by disk drives that falsely report a successful write to the kernel, when in fact they have only cached the data and not yet stored it on the disk. A power failure in such a situation might lead to irrecoverable data corruption. Administrators should try to ensure that disks holding PostgreSQL's WAL log files do not make such false reports. (See Section 29.1.)
After a checkpoint has been made and the log flushed, the checkpoint's position is saved in the file pg_control. Therefore, at the start of recovery, the server first reads pg_control and then the checkpoint record; then it performs the REDO operation by scanning forward from the log location indicated in the checkpoint record. Because the entire content of data pages is saved in the log on the first page modification after a checkpoint (assuming full_page_writes is not disabled), all pages changed since the checkpoint will be restored to a consistent state.
To deal with the case where pg_control is corrupt, we should support the possibility of scanning existing log segments in reverse order, newest to oldest, in order to find the latest checkpoint. This has not been implemented yet. pg_control is small enough (less than one disk page) that it is not subject to partial-write problems, and as of this writing there have been no reports of database failures due solely to the inability to read pg_control itself. So while it is theoretically a weak spot, pg_control does not seem to be a problem in practice.
Another useful tool for monitoring database activity is the pg_locks system table. It allows the database administrator to view information about the outstanding locks in the lock manager. For example, this capability can be used to:
View all the locks currently outstanding, all the locks on relations in a particular database, all the locks on a particular relation, or all the locks held by a particular PostgreSQL session.
Determine the relation in the current database with the most ungranted locks (which might be a source of contention among database clients).
Determine the effect of lock contention on overall database performance, as well as the extent to which contention varies with overall database traffic.
Details of the pg_locks view appear in Section 51.74. For more information on locking and managing concurrency with PostgreSQL, refer to Chapter 13.
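For instance, a sketch of one of the uses listed above, counting ungranted locks per relation in the current database:

SELECT relation::regclass AS relation, count(*) AS ungranted
FROM pg_locks
WHERE NOT granted AND relation IS NOT NULL
GROUP BY relation
ORDER BY ungranted DESC;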
On most Unix platforms, PostgreSQL modifies its command title as reported by ps, so that individual server processes can readily be identified. A sample display is
(The appropriate invocation of ps varies across different platforms, as do the details of what is shown. This example is from a recent Linux system.) The first process listed here is the master server process. The command arguments shown for it are the same ones used when it was launched. The next five processes are background worker processes automatically launched by the master process. (The “stats collector” process will not be present if you have set the system not to start the statistics collector; likewise the “autovacuum launcher” process can be disabled.) Each of the remaining processes is a server process handling one client connection. Each such process sets its command line display in the form
The user, database, and (client) host items remain the same for the life of the client connection, but the activity indicator changes. The activity can be idle (i.e., waiting for a client command), idle in transaction (waiting for client inside a BEGIN block), or a command type name such as SELECT. Also, waiting is appended if the server process is presently waiting on a lock held by another session. In the above example we can infer that process 15606 is waiting for process 15610 to complete its transaction and thereby release some lock. (Process 15610 must be the blocker, because there is no other active session. In more complicated cases it would be necessary to look into the pg_locks system view to determine who is blocking whom.)
If cluster_name has been configured, the cluster name will also be shown in ps output:
If you have turned off update_process_title then the activity indicator is not updated; the process title is set only once when a new process is launched. On some platforms this saves a measurable amount of per-command overhead; on others it's insignificant.
Solaris requires special handling. You must use /usr/ucb/ps, rather than /bin/ps. You also must use two w flags, not just one. In addition, your original invocation of the postgres command must have a shorter ps status display than that provided by each server process. If you fail to do all three things, the ps output for each server process will be the original postgres command line.
Publications can be defined on any physical replication master. The node where a publication is defined is referred to as the publisher. A publication is a set of changes generated from a table or a group of tables, and might also be described as a change set or replication set. Each publication exists in only one database.

Publications are different from schemas and do not affect how the table is accessed. Each table can be added to multiple publications if needed. Publications may currently only contain tables. Objects must be added explicitly, except when a publication is created for ALL TABLES.

Publications can choose to limit the changes they produce to any combination of INSERT, UPDATE, and DELETE, similar to how triggers are fired by particular event types. By default, all operation types are replicated.

A published table must have a "replica identity" configured in order to be able to replicate UPDATE and DELETE operations, so that appropriate rows to update or delete can be identified on the subscriber side. By default, this is the primary key, if there is one. Another unique index (with certain additional requirements) can also be set to be the replica identity. If the table does not have any suitable key, then it can be set to replica identity "full", which means the entire row becomes the key. This, however, is very inefficient and should only be used as a fallback if no other solution is possible. If a replica identity other than "full" is set on the publisher side, a replica identity comprising the same or fewer columns must also be set on the subscriber side. See REPLICA IDENTITY for details on how to set the replica identity. If a table without a replica identity is added to a publication that replicates UPDATE or DELETE operations, subsequent UPDATE or DELETE operations will cause an error on the publisher. INSERT operations can proceed regardless of any replica identity.

Every publication can have multiple subscribers.

A publication is created using the CREATE PUBLICATION command and may later be altered or dropped using the corresponding commands.

Individual tables can be added and removed dynamically using ALTER PUBLICATION. Both the ADD TABLE and DROP TABLE operations are transactional, so the table will start or stop replicating at the correct snapshot once the transaction has committed.
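A minimal sketch with hypothetical table names:

CREATE PUBLICATION mypub FOR TABLE users, departments;
ALTER PUBLICATION mypub ADD TABLE orders;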
Logical replication is a method of replicating data objects and their changes, based upon their replication identity (usually a primary key). We use the term logical in contrast to physical replication, which uses exact block addresses and byte-by-byte replication. PostgreSQL supports both mechanisms concurrently; see Chapter 26. Logical replication allows fine-grained control over both data replication and security.

Logical replication uses a publish and subscribe model, with one or more subscribers subscribing to one or more publications on a publisher node. Subscribers pull data from the publications they subscribe to and may subsequently re-publish data to allow cascading replication or more complex configurations.

Logical replication of a table typically starts with taking a snapshot of the data on the publisher database and copying that to the subscriber. Once that is done, the changes on the publisher are sent to the subscriber as they occur in real time. The subscriber applies the data in the same order as the publisher, so that transactional consistency is guaranteed for publications within a single subscription. This method of data replication is sometimes referred to as transactional replication.

The typical use-cases for logical replication are:

Sending incremental changes in a single database or a subset of a database to subscribers as they occur.

Firing triggers for individual changes as they arrive on the subscriber.

Consolidating multiple databases into a single one (for example, for analytical purposes).

Replicating between different versions of PostgreSQL.

Giving access to replicated data to different groups of users.

Sharing a subset of the database between multiple databases.

The subscriber database behaves in the same way as any other PostgreSQL instance and can be used as a publisher for other databases by defining its own publications. When the subscriber is treated as read-only by the application, there will be no conflicts from a single subscription. On the other hand, if there are other writes done either by an application or by other subscribers to the same set of tables, conflicts can arise.
Asynchronous commit is an option that allows transactions to complete more quickly, at the cost that the most recent transactions may be lost if the database should crash. In many applications this is an acceptable trade-off.

As described in the previous section, transaction commit is normally synchronous: the server waits for the transaction's WAL records to be flushed to permanent storage before returning a success indication to the client. The client is therefore guaranteed that a transaction reported to be committed will be preserved, even in the event of a server crash immediately afterward. However, for short transactions this delay is a major component of the total transaction time. Selecting asynchronous commit mode means that the server returns success as soon as the transaction is logically completed, before the WAL records it generated have actually made their way to disk. This can provide a significant boost in throughput for small transactions.

Asynchronous commit introduces the risk of data loss. There is a short time window between the report of transaction completion to the client and the time that the transaction is truly committed (that is, is guaranteed not to be lost if the server crashes). Thus asynchronous commit should not be used if the client will take external actions relying on the assumption that the transaction will be remembered. As an example, a bank would certainly not use asynchronous commit for a transaction recording an ATM cash dispensation. But in many scenarios, such as event logging, there is no need for a strong guarantee of this kind.
The risk that is taken by using asynchronous commit is of data loss, not data corruption. If the database should crash, it will recover by replaying WAL up to the last record that was flushed. The database will therefore be restored to a self-consistent state, but any transactions that were not yet flushed to disk will not be reflected in that state. The net effect is therefore loss of the last few transactions. Because the transactions are replayed in commit order, no inconsistency can be introduced — for example, if transaction B made changes relying on the effects of a previous transaction A, it is not possible for A's effects to be lost while B's effects are preserved.
The user can select the commit mode of each transaction, so that it is possible to have both synchronous and asynchronous commit transactions running concurrently. This allows flexible trade-offs between performance and certainty of transaction durability. The commit mode is controlled by the user-settable parameter synchronous_commit, which can be changed in any of the ways that a configuration parameter can be set. The mode used for any one transaction depends on the value of synchronous_commit when transaction commit begins.
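For example, a sketch of committing one noncritical transaction asynchronously while the rest of the session stays synchronous (the table is hypothetical):

BEGIN;
SET LOCAL synchronous_commit TO OFF;
INSERT INTO event_log VALUES ('page view logged');
COMMIT;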
Certain utility commands, for instance DROP TABLE, are forced to commit synchronously regardless of the setting of synchronous_commit. This is to ensure consistency between the server's file system and the logical state of the database. The commands supporting two-phase commit, such as PREPARE TRANSACTION, are also always synchronous.
If the database crashes during the risk window between an asynchronous commit and the writing of the transaction's WAL records, then changes made during that transaction will be lost. The duration of the risk window is limited because a background process (the “WAL writer”) flushes unwritten WAL records to disk every wal_writer_delay milliseconds. The actual maximum duration of the risk window is three times wal_writer_delay because the WAL writer is designed to favor writing whole pages at a time during busy periods.
An immediate-mode shutdown is equivalent to a server crash, and will therefore cause loss of any unflushed asynchronous commits.
Asynchronous commit provides behavior different from setting fsync = off. fsync is a server-wide setting that will alter the behavior of all transactions. It disables all logic within PostgreSQL that attempts to synchronize writes to different portions of the database, and therefore a system crash (that is, a hardware or operating system crash, not a failure of PostgreSQL itself) could result in arbitrarily bad corruption of the database state. In many scenarios, asynchronous commit provides most of the performance improvement that could be obtained by turning off fsync, but without the risk of data corruption.
commit_delay also sounds very similar to asynchronous commit, but it is actually a synchronous commit method (in fact, commit_delay is ignored during an asynchronous commit). commit_delay causes a delay just before a transaction flushes WAL to disk, in the hope that a single flush executed by one such transaction can also serve other transactions committing at about the same time. The setting can be thought of as a way of increasing the time window in which transactions can join a group about to participate in a single flush, to amortize the cost of the flush among multiple transactions.
Name | Parameters | Description |
| | Probe that fires at the start of a new transaction. arg0 is the transaction ID. |
| | Probe that fires when a transaction completes successfully. arg0 is the transaction ID. |
| | Probe that fires when a transaction completes unsuccessfully. arg0 is the transaction ID. |
| | Probe that fires when the processing of a query is started. arg0 is the query string. |
| | Probe that fires when the processing of a query is complete. arg0 is the query string. |
| | Probe that fires when the parsing of a query is started. arg0 is the query string. |
| | Probe that fires when the parsing of a query is complete. arg0 is the query string. |
| | Probe that fires when the rewriting of a query is started. arg0 is the query string. |
| | Probe that fires when the rewriting of a query is complete. arg0 is the query string. |
| | Probe that fires when the planning of a query is started. |
| | Probe that fires when the planning of a query is complete. |
| | Probe that fires when the execution of a query is started. |
| | Probe that fires when the execution of a query is complete. |
| | Probe that fires anytime the server process updates its |
| | Probe that fires when a checkpoint is started. arg0 holds the bitwise flags used to distinguish different checkpoint types, such as shutdown, immediate or force. |
| | Probe that fires when a checkpoint is complete. (The probes listed next fire in sequence during checkpoint processing.) arg0 is the number of buffers written. arg1 is the total number of buffers. arg2, arg3 and arg4 contain the number of WAL files added, removed and recycled respectively. |
| | Probe that fires when the CLOG portion of a checkpoint is started. arg0 is true for normal checkpoint, false for shutdown checkpoint. |
| | Probe that fires when the CLOG portion of a checkpoint is complete. arg0 has the same meaning as for |
| | Probe that fires when the SUBTRANS portion of a checkpoint is started. arg0 is true for normal checkpoint, false for shutdown checkpoint. |
| | Probe that fires when the SUBTRANS portion of a checkpoint is complete. arg0 has the same meaning as for |
| | Probe that fires when the MultiXact portion of a checkpoint is started. arg0 is true for normal checkpoint, false for shutdown checkpoint. |
| | Probe that fires when the MultiXact portion of a checkpoint is complete. arg0 has the same meaning as for |
| | Probe that fires when the buffer-writing portion of a checkpoint is started. arg0 holds the bitwise flags used to distinguish different checkpoint types, such as shutdown, immediate or force. |
| | Probe that fires when we begin to write dirty buffers during checkpoint (after identifying which buffers must be written). arg0 is the total number of buffers. arg1 is the number that are currently dirty and need to be written. |
| | Probe that fires after each buffer is written during checkpoint. arg0 is the ID number of the buffer. |
| | Probe that fires when all dirty buffers have been written. arg0 is the total number of buffers. arg1 is the number of buffers actually written by the checkpoint process. arg2 is the number that were expected to be written (arg1 of |
| | Probe that fires after dirty buffers have been written to the kernel, and before starting to issue fsync requests. |
| | Probe that fires when syncing of buffers to disk is complete. |
| | Probe that fires when the two-phase portion of a checkpoint is started. |
| | Probe that fires when the two-phase portion of a checkpoint is complete. |
| | Probe that fires when a buffer read is started. arg0 and arg1 contain the fork and block numbers of the page (but arg1 will be -1 if this is a relation extension request). arg2, arg3, and arg4 contain the tablespace, database, and relation OIDs identifying the relation. arg5 is the ID of the backend which created the temporary relation for a local buffer, or |
| | Probe that fires when a buffer read is complete. arg0 and arg1 contain the fork and block numbers of the page (if this is a relation extension request, arg1 now contains the block number of the newly added block). arg2, arg3, and arg4 contain the tablespace, database, and relation OIDs identifying the relation. arg5 is the ID of the backend which created the temporary relation for a local buffer, or |
| | Probe that fires before issuing any write request for a shared buffer. arg0 and arg1 contain the fork and block numbers of the page. arg2, arg3, and arg4 contain the tablespace, database, and relation OIDs identifying the relation. |
| | Probe that fires when a write request is complete. (Note that this just reflects the time to pass the data to the kernel; it's typically not actually been written to disk yet.) The arguments are the same as for |
| | Probe that fires when a server process begins to write a dirty buffer. (If this happens often, it implies that is too small or the background writer control parameters need adjustment.) arg0 and arg1 contain the fork and block numbers of the page. arg2, arg3, and arg4 contain the tablespace, database, and relation OIDs identifying the relation. |
| | Probe that fires when a dirty-buffer write is complete. The arguments are the same as for |
| | Probe that fires when a dirty WAL buffer write is complete. |
| | Probe that fires when a WAL record is inserted. arg0 is the resource manager (rmid) for the record. arg1 contains the info flags. |
| | Probe that fires when a WAL segment switch is requested. |
| | Probe that fires when beginning to read a block from a relation. arg0 and arg1 contain the fork and block numbers of the page. arg2, arg3, and arg4 contain the tablespace, database, and relation OIDs identifying the relation. arg5 is the ID of the backend which created the temporary relation for a local buffer, or |
| | Probe that fires when a block read is complete. arg0 and arg1 contain the fork and block numbers of the page. arg2, arg3, and arg4 contain the tablespace, database, and relation OIDs identifying the relation. arg5 is the ID of the backend which created the temporary relation for a local buffer, or |
| | Probe that fires when beginning to write a block to a relation. arg0 and arg1 contain the fork and block numbers of the page. arg2, arg3, and arg4 contain the tablespace, database, and relation OIDs identifying the relation. arg5 is the ID of the backend which created the temporary relation for a local buffer, or |
| | Probe that fires when a block write is complete. arg0 and arg1 contain the fork and block numbers of the page. arg2, arg3, and arg4 contain the tablespace, database, and relation OIDs identifying the relation. arg5 is the ID of the backend which created the temporary relation for a local buffer, or |
| | Probe that fires when a sort operation is started. arg0 indicates heap, index or datum sort. arg1 is true for unique-value enforcement. arg2 is the number of key columns. arg3 is the number of kilobytes of work memory allowed. arg4 is true if random access to the sort result is required. arg5 indicates serial when |
| | Probe that fires when a sort is complete. arg0 is true for external sort, false for internal sort. arg1 is the number of disk blocks used for an external sort, or kilobytes of memory used for an internal sort. |
| | Probe that fires when an LWLock has been acquired. arg0 is the LWLock's tranche. arg1 is the requested lock mode, either exclusive or shared. |
| | Probe that fires when an LWLock has been released (but note that any released waiters have not yet been awakened). arg0 is the LWLock's tranche. |
| | Probe that fires when an LWLock was not immediately available and a server process has begun to wait for the lock to become available. arg0 is the LWLock's tranche. arg1 is the requested lock mode, either exclusive or shared. |
| | Probe that fires when a server process has been released from its wait for an LWLock (it does not actually have the lock yet). arg0 is the LWLock's tranche. arg1 is the requested lock mode, either exclusive or shared. |
| | Probe that fires when an LWLock was successfully acquired when the caller specified no waiting. arg0 is the LWLock's tranche. arg1 is the requested lock mode, either exclusive or shared. |
| | Probe that fires when an LWLock was not successfully acquired when the caller specified no waiting. arg0 is the LWLock's tranche. arg1 is the requested lock mode, either exclusive or shared. |
| | Probe that fires when a request for a heavyweight lock (lmgr lock) has begun to wait because the lock is not available. arg0 through arg3 are the tag fields identifying the object being locked. arg4 indicates the type of object being locked. arg5 indicates the lock type being requested. |
| | Probe that fires when a request for a heavyweight lock (lmgr lock) has finished waiting (i.e., has acquired the lock). The arguments are the same as for |
| | Probe that fires when a deadlock is found by the deadlock detector. |
PostgreSQL's statistics collector is a subsystem that supports collection and reporting of information about server activity. Presently, the collector can count accesses to tables and indexes in both disk-block and individual-row terms. It also tracks the total number of rows in each table, and information about vacuum and analyze actions for each table. It can also count calls to user-defined functions and the total time spent in each one.

PostgreSQL also supports reporting dynamic information about exactly what is going on in the system right now, such as the exact command currently being executed by other server processes, and which other connections exist in the system. This facility is independent of the collector process.

Since collection of statistics adds some overhead to query execution, the system can be configured to collect or not collect information. This is controlled by configuration parameters that are normally set in postgresql.conf. (See Chapter 19 for details about setting configuration parameters.)
The parameter track_activities enables monitoring of the current command being executed by any server process.
The parameter track_counts controls whether statistics are collected about table and index accesses.
The parameter track_functions enables tracking of usage of user-defined functions.
The parameter track_io_timing enables monitoring of block read and write times.
Normally these parameters are set in postgresql.conf so that they apply to all server processes, but it is possible to turn them on or off in individual sessions using the SET command. (To prevent ordinary users from hiding their activity from the administrator, only superusers are allowed to change these parameters with SET.)
The statistics collector transmits the collected information to other PostgreSQL processes through temporary files. These files are stored in the directory named by the stats_temp_directory parameter, pg_stat_tmp by default. For better performance, stats_temp_directory can be pointed at a RAM-based file system, decreasing physical I/O requirements. When the server shuts down cleanly, a permanent copy of the statistics data is stored in the pg_stat subdirectory, so that statistics can be retained across server restarts. When recovery is performed at server start (e.g. after immediate shutdown, server crash, and point-in-time recovery), all statistics counters are reset.
Several predefined views, listed in Table 27.1, are available to show the current state of the system. There are also several other views, listed in Table 27.2, available to show the results of statistics collection. Alternatively, one can build custom views using the underlying statistics functions, as discussed in Section 27.2.3.
When using the statistics to monitor collected data, it is important to realize that the information does not update instantaneously. Each individual server process transmits new statistical counts to the collector just before going idle; so a query or transaction still in progress does not affect the displayed totals. Also, the collector itself emits a new report at most once per PGSTAT_STAT_INTERVAL milliseconds (500 ms unless altered while building the server). So the displayed information lags behind actual activity. However, current-query information collected by track_activities is always up-to-date.
Another important point is that when a server process is asked to display any of these statistics, it first fetches the most recent report emitted by the collector process and then continues to use this snapshot for all statistical views and functions until the end of its current transaction. So the statistics will show static information as long as you continue the current transaction. Similarly, information about the current queries of all sessions is collected when any such information is first requested within a transaction, and the same information will be displayed throughout the transaction. This is a feature, not a bug, because it allows you to perform several queries on the statistics and correlate the results without worrying that the numbers are changing underneath you. But if you want to see new results with each query, be sure to do the queries outside any transaction block. Alternatively, you can invoke pg_stat_clear_snapshot(), which will discard the current transaction's statistics snapshot (if any). The next use of statistical information will cause a new snapshot to be fetched.
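The call itself is trivial; for example:

SELECT pg_stat_clear_snapshot();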
A transaction can also see its own statistics (as yet untransmitted to the collector) in the views pg_stat_xact_all_tables, pg_stat_xact_sys_tables, pg_stat_xact_user_tables, and pg_stat_xact_user_functions. These numbers do not act as stated above; instead they update continuously throughout the transaction.
Some of the information in the dynamic statistics views shown in Table 27.1 is security restricted. Ordinary users can only see all the information about their own sessions (sessions belonging to a role that they are a member of). In rows about other sessions, many columns will be null. Note, however, that the existence of a session and its general properties such as its session user and database are visible to all users. Superusers and members of the built-in role pg_read_all_stats (see also Section 21.5) can see all the information about all sessions.
The per-index statistics are particularly useful to determine which indexes are being used and how effective they are.
The pg_statio_ views are primarily useful to determine the effectiveness of the buffer cache. When the number of actual disk reads is much smaller than the number of buffer hits, then the cache is satisfying most read requests without invoking a kernel call. However, these statistics do not give the entire story: due to the way in which PostgreSQL handles disk I/O, data that is not in the PostgreSQL buffer cache might still reside in the kernel's I/O cache, and might therefore still be fetched without requiring a physical read. Users interested in obtaining more detailed information on PostgreSQL I/O behavior are advised to use the PostgreSQL statistics collector in combination with operating system utilities that allow insight into the kernel's handling of I/O.
pg_stat_activity View
The pg_stat_activity view will have one row per server process, showing information related to the current activity of that process.
The wait_event and state columns are independent. If a backend is in the active state, it may or may not be waiting on some event. If the state is active and wait_event is non-null, it means that a query is being executed, but is being blocked somewhere in the system.
For tranches registered by extensions, the name is specified by the extension, and this will be displayed as the wait_event. It is quite possible that a user has registered the tranche in one of the backends (by having allocation in dynamic shared memory), in which case other backends won't have that information, so extension is displayed for such cases.
Here is an example of how wait events can be viewed
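A sketch of such a query against the standard view:

SELECT pid, wait_event_type, wait_event
FROM pg_stat_activity
WHERE wait_event IS NOT NULL;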
Table 27.5. pg_stat_replication View
The pg_stat_replication view will contain one row per WAL sender process, showing statistics about replication to that sender's connected standby server. Only directly connected standbys are listed; no information is available about downstream standby servers.
The lag times reported in the pg_stat_replication view are measurements of the time taken for recent WAL to be written, flushed and replayed and for the sender to know about it. These times represent the commit delay that was (or would have been) introduced by each synchronous commit level, if the remote server was configured as a synchronous standby. For an asynchronous standby, the replay_lag column approximates the delay before recent transactions became visible to queries. If the standby server has entirely caught up with the sending server and there is no more WAL activity, the most recently measured lag times will continue to be displayed for a short time and then show NULL.
Lag times work automatically for physical replication. Logical decoding plugins may optionally emit tracking messages; if they do not, the tracking mechanism will simply display NULL lag.
The reported lag times are not predictions of how long it will take for the standby to catch up with the sending server assuming the current rate of replay. Such a system would show similar times while new WAL is being generated, but would differ when the sender becomes idle. In particular, when the standby has caught up completely, pg_stat_replication shows the time taken to write, flush and replay the most recent reported WAL location rather than zero as some users might expect. This is consistent with the goal of measuring synchronous commit and transaction visibility delays for recent write transactions. To reduce confusion for users expecting a different model of lag, the lag columns revert to NULL after a short time on a fully replayed idle system. Monitoring systems should choose whether to represent this as missing data, zero or continue to display the last known value.
pg_stat_wal_receiver View
The pg_stat_wal_receiver view will contain only one row, showing statistics about the WAL receiver from that receiver's connected server.
pg_stat_subscription View
The pg_stat_subscription view will contain one row per subscription for the main worker (with null PID if the worker is not running), and additional rows for workers handling the initial data copy of the subscribed tables.
pg_stat_ssl View
The pg_stat_ssl view will contain one row per backend or WAL sender process, showing statistics about SSL usage on this connection. It can be joined to pg_stat_activity or pg_stat_replication on the pid column to get more details about the connection.
pg_stat_gssapi View
The pg_stat_gssapi view will contain one row per backend, showing information about GSSAPI usage on this connection. It can be joined to pg_stat_activity or pg_stat_replication on the pid column to get more details about the connection.
pg_stat_archiver View
The pg_stat_archiver view will always have a single row, containing data about the archiver process of the cluster.
pg_stat_bgwriter View
The pg_stat_bgwriter view will always have a single row, containing global data for the cluster.
pg_stat_database View
The pg_stat_database view will contain one row for each database in the cluster, plus one for the shared objects, showing database-wide statistics.
pg_stat_database_conflicts View
The pg_stat_database_conflicts view will contain one row per database, showing database-wide statistics about query cancels occurring due to conflicts with recovery on standby servers. This view will only contain information on standby servers, since conflicts do not occur on master servers.
pg_stat_all_tables View
The pg_stat_all_tables view will contain one row for each table in the current database (including TOAST tables), showing statistics about accesses to that specific table. The pg_stat_user_tables and pg_stat_sys_tables views contain the same information, but filtered to only show user and system tables respectively.
pg_stat_all_indexes View
The pg_stat_all_indexes view will contain one row for each index in the current database, showing statistics about accesses to that specific index. The pg_stat_user_indexes and pg_stat_sys_indexes views contain the same information, but filtered to only show user and system indexes respectively.
Indexes can be used by simple index scans, “bitmap” index scans, and the optimizer. In a bitmap scan the output of several indexes can be combined via AND or OR rules, so it is difficult to associate individual heap row fetches with specific indexes when a bitmap scan is used. Therefore, a bitmap scan increments the pg_stat_all_indexes.idx_tup_read count(s) for the index(es) it uses, and it increments the pg_stat_all_tables.idx_tup_fetch count for the table, but it does not affect pg_stat_all_indexes.idx_tup_fetch. The optimizer also accesses indexes to check for supplied constants whose values are outside the recorded range of the optimizer statistics because the optimizer statistics might be stale.
The idx_tup_read and idx_tup_fetch counts can be different even without any use of bitmap scans, because idx_tup_read counts index entries retrieved from the index while idx_tup_fetch counts live rows fetched from the table. The latter will be less if any dead or not-yet-committed rows are fetched using the index, or if any heap fetches are avoided by means of an index-only scan.
pg_statio_all_tables View
The pg_statio_all_tables view will contain one row for each table in the current database (including TOAST tables), showing statistics about I/O on that specific table. The pg_statio_user_tables and pg_statio_sys_tables views contain the same information, but filtered to only show user and system tables respectively.
pg_statio_all_indexes View
The pg_statio_all_indexes view will contain one row for each index in the current database, showing statistics about I/O on that specific index. The pg_statio_user_indexes and pg_statio_sys_indexes views contain the same information, but filtered to only show user and system indexes respectively.
pg_statio_all_sequences View
The pg_statio_all_sequences view will contain one row for each sequence in the current database, showing statistics about I/O on that specific sequence.
pg_stat_user_functions View
The pg_stat_user_functions view will contain one row for each tracked function, showing statistics about executions of that function. The track_functions parameter controls exactly which functions are tracked.
Other ways of looking at the statistics can be set up by writing queries that use the same underlying statistics access functions used by the standard views shown above. For details such as the functions' names, consult the definitions of the standard views. (For example, in psql you could issue \d+ pg_stat_activity.) The access functions for per-database statistics take a database OID as an argument to identify which database to report on. The per-table and per-index functions take a table or index OID. The functions for per-function statistics take a function OID. Note that only tables, indexes, and functions in the current database can be seen with these functions.
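As an illustration, one of the underlying per-database access functions (visible via \d+ on pg_stat_database) can be called directly; a minimal sketch:

    -- pg_stat_get_db_xact_commit() underlies the xact_commit
    -- column of the pg_stat_database view
    SELECT datname, pg_stat_get_db_xact_commit(oid) AS xact_commit
    FROM pg_database;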
Additional functions related to statistics collection are listed in Table 27.20.
pg_stat_get_activity, the underlying function of the pg_stat_activity view, returns a set of records containing all the available information about each backend process. Sometimes it may be more convenient to obtain just a subset of this information. In such cases, an older set of per-backend statistics access functions can be used; these are shown in Table 27.21. These access functions use a backend ID number, which ranges from one to the number of currently active backends. The function pg_stat_get_backend_idset provides a convenient way to generate one row for each active backend for invoking these functions. For example, to show the PIDs and current queries of all backends:
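    SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
           pg_stat_get_backend_activity(s.backendid) AS query
    FROM (SELECT pg_stat_get_backend_idset() AS backendid) AS s;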
PostgreSQL has the ability to report the progress of certain commands during command execution. Currently, the only commands which support progress reporting are CREATE INDEX, VACUUM and CLUSTER. This may be expanded in the future.
Whenever CREATE INDEX or REINDEX is running, the pg_stat_progress_create_index view will contain one row for each backend that is currently creating indexes. The tables below describe the information that will be reported and provide information about how to interpret it.
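For example, the progress of an index build might be watched with a query along these lines:

    SELECT pid, phase,
           blocks_done, blocks_total,
           tuples_done, tuples_total
    FROM pg_stat_progress_create_index;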
pg_stat_progress_create_index View
Whenever VACUUM is running, the pg_stat_progress_vacuum view will contain one row for each backend (including autovacuum worker processes) that is currently vacuuming. The tables below describe the information that will be reported and provide information about how to interpret it. Progress for VACUUM FULL commands is reported via pg_stat_progress_cluster because both VACUUM FULL and CLUSTER rewrite the table, while regular VACUUM only modifies it in place. See Section 27.4.3.
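As a sketch, the view can be joined to pg_stat_activity to see which statement each vacuum belongs to and roughly how far its heap scan has progressed:

    SELECT p.pid, a.query, p.phase,
           round(100.0 * p.heap_blks_scanned
                 / nullif(p.heap_blks_total, 0), 1) AS pct_scanned
    FROM pg_stat_progress_vacuum p
    JOIN pg_stat_activity a USING (pid);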
pg_stat_progress_vacuum View
Whenever CLUSTER or VACUUM FULL is running, the pg_stat_progress_cluster view will contain a row for each backend that is currently running either command. The tables below describe the information that will be reported and provide information about how to interpret it.
pg_stat_progress_cluster View
Probe that fires when a server process begins to write a dirty buffer. (If this happens often, it implies that shared_buffers is too small or the background writer control parameters need adjustment.) arg0 and arg1 contain the fork and block numbers of the page. arg2, arg3, and arg4 contain the tablespace, database, and relation OIDs identifying the relation.
Probe that fires when a server process begins to write a dirty WAL buffer because no more WAL buffer space is available. (If this happens often, it implies that wal_buffers is too small.)
View Name
Description
pg_stat_activity
One row per server process, showing information related to the current activity of that process, such as state and current query. See pg_stat_activity for details.
pg_stat_replication
One row per WAL sender process, showing statistics about replication to that sender's connected standby server. See pg_stat_replication for details.
pg_stat_wal_receiver
Only one row, showing statistics about the WAL receiver from that receiver's connected server. See pg_stat_wal_receiver for details.
pg_stat_subscription
At least one row per subscription, showing information about the subscription workers. See pg_stat_subscription for details.
pg_stat_ssl
One row per connection (regular and replication), showing information about SSL used on this connection. See pg_stat_ssl for details.
pg_stat_gssapi
One row per connection (regular and replication), showing information about GSSAPI authentication and encryption used on this connection. See pg_stat_gssapi for details.
pg_stat_progress_create_index
One row for each backend running CREATE INDEX or REINDEX, showing current progress. See Section 27.4.1.
pg_stat_progress_vacuum
One row for each backend (including autovacuum worker processes) running VACUUM, showing current progress. See Section 27.4.2.
pg_stat_progress_cluster
One row for each backend running CLUSTER or VACUUM FULL, showing current progress. See Section 27.4.3.
View Name
Description
pg_stat_archiver
One row only, showing statistics about the WAL archiver process's activity. See pg_stat_archiver for details.
pg_stat_bgwriter
One row only, showing statistics about the background writer process's activity. See pg_stat_bgwriter for details.
pg_stat_database
One row per database, showing database-wide statistics. See pg_stat_database for details.
pg_stat_database_conflicts
One row per database, showing database-wide statistics about query cancels due to conflict with recovery on standby servers. See pg_stat_database_conflicts for details.
pg_stat_all_tables
One row for each table in the current database, showing statistics about accesses to that specific table. See pg_stat_all_tables for details.
pg_stat_sys_tables
Same as pg_stat_all_tables, except that only system tables are shown.
pg_stat_user_tables
Same as pg_stat_all_tables, except that only user tables are shown.
pg_stat_xact_all_tables
Similar to pg_stat_all_tables, but counts actions taken so far within the current transaction (which are not yet included in pg_stat_all_tables and related views). The columns for numbers of live and dead rows and vacuum and analyze actions are not present in this view.
pg_stat_xact_sys_tables
Same as pg_stat_xact_all_tables, except that only system tables are shown.
pg_stat_xact_user_tables
Same as pg_stat_xact_all_tables, except that only user tables are shown.
pg_stat_all_indexes
One row for each index in the current database, showing statistics about accesses to that specific index. See pg_stat_all_indexes for details.
pg_stat_sys_indexes
Same as pg_stat_all_indexes, except that only indexes on system tables are shown.
pg_stat_user_indexes
Same as pg_stat_all_indexes, except that only indexes on user tables are shown.
pg_statio_all_tables
One row for each table in the current database, showing statistics about I/O on that specific table. See pg_statio_all_tables for details.
pg_statio_sys_tables
Same as pg_statio_all_tables, except that only system tables are shown.
pg_statio_user_tables
Same as pg_statio_all_tables, except that only user tables are shown.
pg_statio_all_indexes
One row for each index in the current database, showing statistics about I/O on that specific index. See pg_statio_all_indexes for details.
pg_statio_sys_indexes
Same as pg_statio_all_indexes, except that only indexes on system tables are shown.
pg_statio_user_indexes
Same as pg_statio_all_indexes, except that only indexes on user tables are shown.
pg_statio_all_sequences
One row for each sequence in the current database, showing statistics about I/O on that specific sequence. See pg_statio_all_sequences for details.
pg_statio_sys_sequences
Same as pg_statio_all_sequences, except that only system sequences are shown. (Presently, no system sequences are defined, so this view is always empty.)
pg_statio_user_sequences
Same as pg_statio_all_sequences, except that only user sequences are shown.
pg_stat_user_functions
One row for each tracked function, showing statistics about executions of that function. See pg_stat_user_functions for details.
pg_stat_xact_user_functions
Similar to pg_stat_user_functions, but counts only calls during the current transaction (which are not yet included in pg_stat_user_functions).
Column
Type
Description
datid
oid
OID of the database this backend is connected to
datname
name
Name of the database this backend is connected to
pid
integer
Process ID of this backend
usesysid
oid
OID of the user logged into this backend
usename
name
Name of the user logged into this backend
application_name
text
Name of the application that is connected to this backend
client_addr
inet
IP address of the client connected to this backend. If this field is null, it indicates either that the client is connected via a Unix socket on the server machine or that this is an internal process such as autovacuum.
client_hostname
text
Host name of the connected client, as reported by a reverse DNS lookup of client_addr. This field will only be non-null for IP connections, and only when log_hostname is enabled.
client_port
integer
TCP port number that the client is using for communication with this backend, or -1 if a Unix socket is used
backend_start
timestamp with time zone
Time when this process was started. For client backends, this is the time the client connected to the server.
xact_start
timestamp with time zone
Time when this process' current transaction was started, or null if no transaction is active. If the current query is the first of its transaction, this column is equal to the query_start column.
query_start
timestamp with time zone
Time when the currently active query was started, or if state is not active, when the last query was started
state_change
timestamp with time zone
Time when the state was last changed
wait_event_type
text
The type of event for which the backend is waiting, if any; otherwise NULL. Possible values are:
LWLock: The backend is waiting for a lightweight lock. Each such lock protects a particular data structure in shared memory. wait_event will contain a name identifying the purpose of the lightweight lock. (Some locks have specific names; others are part of a group of locks each with a similar purpose.)
Lock: The backend is waiting for a heavyweight lock. Heavyweight locks, also known as lock manager locks or simply locks, primarily protect SQL-visible objects such as tables. However, they are also used to ensure mutual exclusion for certain internal operations such as relation extension. wait_event will identify the type of lock awaited.
BufferPin: The server process is waiting to access a data buffer during a period when no other process can be examining that buffer. Buffer pin waits can be protracted if another process holds an open cursor that last read data from the buffer in question.
Activity: The server process is idle. This is used by system processes waiting for activity in their main processing loop. wait_event will identify the specific wait point.
Extension: The server process is waiting for activity in an extension module. This category is useful for modules to track custom waiting points.
Client: The server process is waiting for some activity on a socket from user applications; that is, the server expects something to happen that is independent of its internal processes. wait_event will identify the specific wait point.
IPC: The server process is waiting for some activity from another process in the server. wait_event will identify the specific wait point.
Timeout: The server process is waiting for a timeout to expire. wait_event will identify the specific wait point.
IO: The server process is waiting for an I/O operation to complete. wait_event will identify the specific wait point.
wait_event
text
Wait event name if backend is currently waiting, otherwise NULL. See Table 27.4 for details.
state
text
Current overall state of this backend. Possible values are:
active: The backend is executing a query.
idle: The backend is waiting for a new client command.
idle in transaction: The backend is in a transaction, but is not currently executing a query.
idle in transaction (aborted): This state is similar to idle in transaction, except one of the statements in the transaction caused an error.
fastpath function call: The backend is executing a fast-path function.
disabled: This state is reported if track_activities is disabled in this backend.
backend_xid
xid
Top-level transaction identifier of this backend, if any.
backend_xmin
xid
The current backend's xmin horizon.
query
text
Text of this backend's most recent query. If state is active this field shows the currently executing query. In all other states, it shows the last query that was executed. By default the query text is truncated at 1024 characters; this value can be changed via the parameter track_activity_query_size.
backend_type
text
Type of current backend. Possible types are autovacuum launcher, autovacuum worker, logical replication launcher, logical replication worker, parallel worker, background writer, client backend, checkpointer, startup, walreceiver, walsender and walwriter. In addition, background workers registered by extensions may have additional types.
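For instance, a rough snapshot of what the server is currently waiting on can be taken by aggregating the two wait columns:

    SELECT wait_event_type, wait_event, count(*)
    FROM pg_stat_activity
    WHERE wait_event IS NOT NULL
    GROUP BY wait_event_type, wait_event
    ORDER BY count(*) DESC;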
Wait Event Type
Wait Event Name
Description
LWLock
ShmemIndexLock
Waiting to find or allocate space in shared memory.
OidGenLock
Waiting to allocate or assign an OID.
XidGenLock
Waiting to allocate or assign a transaction id.
ProcArrayLock
Waiting to get a snapshot or to clear a transaction id at transaction end.
SInvalReadLock
Waiting to retrieve or remove messages from shared invalidation queue.
SInvalWriteLock
Waiting to add a message in shared invalidation queue.
WALBufMappingLock
Waiting to replace a page in WAL buffers.
WALWriteLock
Waiting for WAL buffers to be written to disk.
ControlFileLock
Waiting to read or update the control file, or to create a new WAL file.
CheckpointLock
Waiting to perform a checkpoint.
CLogControlLock
Waiting to read or update transaction status.
SubtransControlLock
Waiting to read or update sub-transaction information.
MultiXactGenLock
Waiting to read or update shared multixact state.
MultiXactOffsetControlLock
Waiting to read or update multixact offset mappings.
MultiXactMemberControlLock
Waiting to read or update multixact member mappings.
RelCacheInitLock
Waiting to read or write relation cache initialization file.
CheckpointerCommLock
Waiting to manage fsync requests.
TwoPhaseStateLock
Waiting to read or update the state of prepared transactions.
TablespaceCreateLock
Waiting to create or drop a tablespace.
BtreeVacuumLock
Waiting to read or update vacuum-related information for a B-tree index.
AddinShmemInitLock
Waiting to manage space allocation in shared memory.
AutovacuumLock
Autovacuum worker or launcher waiting to update or read the current state of autovacuum workers.
AutovacuumScheduleLock
Waiting to ensure that the table it has selected for a vacuum still needs vacuuming.
SyncScanLock
Waiting to get the start location of a scan on a table for synchronized scans.
RelationMappingLock
Waiting to update the relation map file used to store catalog to filenode mapping.
AsyncCtlLock
Waiting to read or update shared notification state.
AsyncQueueLock
Waiting to read or update notification messages.
SerializableXactHashLock
Waiting to retrieve or store information about serializable transactions.
SerializableFinishedListLock
Waiting to access the list of finished serializable transactions.
SerializablePredicateLockListLock
Waiting to perform an operation on a list of locks held by serializable transactions.
OldSerXidLock
Waiting to read or record conflicting serializable transactions.
SyncRepLock
Waiting to read or update information about synchronous replicas.
BackgroundWorkerLock
Waiting to read or update background worker state.
DynamicSharedMemoryControlLock
Waiting to read or update dynamic shared memory state.
AutoFileLock
Waiting to update the postgresql.auto.conf file.
ReplicationSlotAllocationLock
Waiting to allocate or free a replication slot.
ReplicationSlotControlLock
Waiting to read or update replication slot state.
CommitTsControlLock
Waiting to read or update transaction commit timestamps.
CommitTsLock
Waiting to read or update the last value set for the transaction timestamp.
ReplicationOriginLock
Waiting to setup, drop or use replication origin.
MultiXactTruncationLock
Waiting to read or truncate multixact information.
OldSnapshotTimeMapLock
Waiting to read or update old snapshot control information.
LogicalRepWorkerLock
Waiting for action on logical replication worker to finish.
CLogTruncationLock
Waiting to execute txid_status or update the oldest transaction id available to it.
clog
Waiting for I/O on a clog (transaction status) buffer.
commit_timestamp
Waiting for I/O on a commit timestamp buffer.
subtrans
Waiting for I/O on a subtransaction buffer.
multixact_offset
Waiting for I/O on a multixact offset buffer.
multixact_member
Waiting for I/O on a multixact_member buffer.
async
Waiting for I/O on an async (notify) buffer.
oldserxid
Waiting for I/O on an oldserxid buffer.
wal_insert
Waiting to insert WAL into a memory buffer.
buffer_content
Waiting to read or write a data page in memory.
buffer_io
Waiting for I/O on a data page.
replication_origin
Waiting to read or update the replication progress.
replication_slot_io
Waiting for I/O on a replication slot.
proc
Waiting to read or update the fast-path lock information.
buffer_mapping
Waiting to associate a data block with a buffer in the buffer pool.
lock_manager
Waiting to add or examine locks for backends, or waiting to join or exit a locking group (used by parallel query).
predicate_lock_manager
Waiting to add or examine predicate lock information.
serializable_xact
Waiting to perform an operation on a serializable transaction in a parallel query.
parallel_query_dsa
Waiting for parallel query dynamic shared memory allocation lock.
tbm
Waiting for TBM shared iterator lock.
parallel_append
Waiting to choose the next subplan during Parallel Append plan execution.
parallel_hash_join
Waiting to allocate or exchange a chunk of memory or update counters during Parallel Hash plan execution.
Lock
relation
Waiting to acquire a lock on a relation.
extend
Waiting to extend a relation.
page
Waiting to acquire a lock on a page of a relation.
tuple
Waiting to acquire a lock on a tuple.
transactionid
Waiting for a transaction to finish.
virtualxid
Waiting to acquire a virtual xid lock.
speculative token
Waiting to acquire a speculative insertion lock.
object
Waiting to acquire a lock on a non-relation database object.
userlock
Waiting to acquire a user lock.
advisory
Waiting to acquire an advisory user lock.
BufferPin
BufferPin
Waiting to acquire a pin on a buffer.
Activity
ArchiverMain
Waiting in main loop of the archiver process.
AutoVacuumMain
Waiting in main loop of autovacuum launcher process.
BgWriterHibernate
Waiting in background writer process, hibernating.
BgWriterMain
Waiting in main loop of background writer process.
CheckpointerMain
Waiting in main loop of checkpointer process.
LogicalApplyMain
Waiting in main loop of logical apply process.
LogicalLauncherMain
Waiting in main loop of logical launcher process.
PgStatMain
Waiting in main loop of the statistics collector process.
RecoveryWalAll
Waiting for WAL from a stream at recovery.
RecoveryWalStream
Waiting when WAL data is not available from any kind of source (local, archive or stream) before trying again to retrieve WAL data, at recovery.
SysLoggerMain
Waiting in main loop of syslogger process.
WalReceiverMain
Waiting in main loop of WAL receiver process.
WalSenderMain
Waiting in main loop of WAL sender process.
WalWriterMain
Waiting in main loop of WAL writer process.
Client
ClientRead
Waiting to read data from the client.
ClientWrite
Waiting to write data to the client.
GSSOpenServer
Waiting to read data from the client while establishing the GSSAPI session.
LibPQWalReceiverConnect
Waiting in WAL receiver to establish connection to remote server.
LibPQWalReceiverReceive
Waiting in WAL receiver to receive data from remote server.
SSLOpenServer
Waiting for SSL while attempting connection.
WalReceiverWaitStart
Waiting for startup process to send initial data for streaming replication.
WalSenderWaitForWAL
Waiting for WAL to be flushed in WAL sender process.
WalSenderWriteData
Waiting for any activity when processing replies from WAL receiver in WAL sender process.
Extension
Extension
Waiting in an extension.
IPC
BgWorkerShutdown
Waiting for background worker to shut down.
BgWorkerStartup
Waiting for background worker to start up.
BtreePage
Waiting for the page number needed to continue a parallel B-tree scan to become available.
CheckpointDone
Waiting for a checkpoint to complete.
CheckpointStart
Waiting for a checkpoint to start.
ClogGroupUpdate
Waiting for group leader to update transaction status at transaction end.
ExecuteGather
Waiting for activity from child process when executing Gather node.
Hash/Batch/Allocating
Waiting for an elected Parallel Hash participant to allocate a hash table.
Hash/Batch/Electing
Electing a Parallel Hash participant to allocate a hash table.
Hash/Batch/Loading
Waiting for other Parallel Hash participants to finish loading a hash table.
Hash/Build/Allocating
Waiting for an elected Parallel Hash participant to allocate the initial hash table.
Hash/Build/Electing
Electing a Parallel Hash participant to allocate the initial hash table.
Hash/Build/HashingInner
Waiting for other Parallel Hash participants to finish hashing the inner relation.
Hash/Build/HashingOuter
Waiting for other Parallel Hash participants to finish partitioning the outer relation.
Hash/GrowBatches/Allocating
Waiting for an elected Parallel Hash participant to allocate more batches.
Hash/GrowBatches/Deciding
Electing a Parallel Hash participant to decide on future batch growth.
Hash/GrowBatches/Electing
Electing a Parallel Hash participant to allocate more batches.
Hash/GrowBatches/Finishing
Waiting for an elected Parallel Hash participant to decide on future batch growth.
Hash/GrowBatches/Repartitioning
Waiting for other Parallel Hash participants to finish repartitioning.
Hash/GrowBuckets/Allocating
Waiting for an elected Parallel Hash participant to finish allocating more buckets.
Hash/GrowBuckets/Electing
Electing a Parallel Hash participant to allocate more buckets.
Hash/GrowBuckets/Reinserting
Waiting for other Parallel Hash participants to finish inserting tuples into new buckets.
LogicalSyncData
Waiting for logical replication remote server to send data for initial table synchronization.
LogicalSyncStateChange
Waiting for logical replication remote server to change state.
MessageQueueInternal
Waiting for other process to be attached in shared message queue.
MessageQueuePutMessage
Waiting to write a protocol message to a shared message queue.
MessageQueueReceive
Waiting to receive bytes from a shared message queue.
MessageQueueSend
Waiting to send bytes to a shared message queue.
ParallelBitmapScan
Waiting for parallel bitmap scan to become initialized.
ParallelCreateIndexScan
Waiting for parallel CREATE INDEX workers to finish heap scan.
ParallelFinish
Waiting for parallel workers to finish computing.
ProcArrayGroupUpdate
Waiting for group leader to clear transaction id at transaction end.
Promote
Waiting for standby promotion.
ReplicationOriginDrop
Waiting for a replication origin to become inactive to be dropped.
ReplicationSlotDrop
Waiting for a replication slot to become inactive to be dropped.
SafeSnapshot
Waiting for a snapshot for a READ ONLY DEFERRABLE transaction.
SyncRep
Waiting for confirmation from remote server during synchronous replication.
Timeout
BaseBackupThrottle
Waiting during base backup when throttling activity.
PgSleep
Waiting in process that called pg_sleep.
RecoveryApplyDelay
Waiting to apply WAL at recovery because it is delayed.
IO
BufFileRead
Waiting for a read from a buffered file.
BufFileWrite
Waiting for a write to a buffered file.
ControlFileRead
Waiting for a read from the control file.
ControlFileSync
Waiting for the control file to reach stable storage.
ControlFileSyncUpdate
Waiting for an update to the control file to reach stable storage.
ControlFileWrite
Waiting for a write to the control file.
ControlFileWriteUpdate
Waiting for a write to update the control file.
CopyFileRead
Waiting for a read during a file copy operation.
CopyFileWrite
Waiting for a write during a file copy operation.
DataFileExtend
Waiting for a relation data file to be extended.
DataFileFlush
Waiting for a relation data file to reach stable storage.
DataFileImmediateSync
Waiting for an immediate synchronization of a relation data file to stable storage.
DataFilePrefetch
Waiting for an asynchronous prefetch from a relation data file.
DataFileRead
Waiting for a read from a relation data file.
DataFileSync
Waiting for changes to a relation data file to reach stable storage.
DataFileTruncate
Waiting for a relation data file to be truncated.
DataFileWrite
Waiting for a write to a relation data file.
DSMFillZeroWrite
Waiting to write zero bytes to a dynamic shared memory backing file.
LockFileAddToDataDirRead
Waiting for a read while adding a line to the data directory lock file.
LockFileAddToDataDirSync
Waiting for data to reach stable storage while adding a line to the data directory lock file.
LockFileAddToDataDirWrite
Waiting for a write while adding a line to the data directory lock file.
LockFileCreateRead
Waiting to read while creating the data directory lock file.
LockFileCreateSync
Waiting for data to reach stable storage while creating the data directory lock file.
LockFileCreateWrite
Waiting for a write while creating the data directory lock file.
LockFileReCheckDataDirRead
Waiting for a read during recheck of the data directory lock file.
LogicalRewriteCheckpointSync
Waiting for logical rewrite mappings to reach stable storage during a checkpoint.
LogicalRewriteMappingSync
Waiting for mapping data to reach stable storage during a logical rewrite.
LogicalRewriteMappingWrite
Waiting for a write of mapping data during a logical rewrite.
LogicalRewriteSync
Waiting for logical rewrite mappings to reach stable storage.
LogicalRewriteTruncate
Waiting for truncate of mapping data during a logical rewrite.
LogicalRewriteWrite
Waiting for a write of logical rewrite mappings.
RelationMapRead
Waiting for a read of the relation map file.
RelationMapSync
Waiting for the relation map file to reach stable storage.
RelationMapWrite
Waiting for a write to the relation map file.
ReorderBufferRead
Waiting for a read during reorder buffer management.
ReorderBufferWrite
Waiting for a write during reorder buffer management.
ReorderLogicalMappingRead
Waiting for a read of a logical mapping during reorder buffer management.
ReplicationSlotRead
Waiting for a read from a replication slot control file.
ReplicationSlotRestoreSync
Waiting for a replication slot control file to reach stable storage while restoring it to memory.
ReplicationSlotSync
Waiting for a replication slot control file to reach stable storage.
ReplicationSlotWrite
Waiting for a write to a replication slot control file.
SLRUFlushSync
Waiting for SLRU data to reach stable storage during a checkpoint or database shutdown.
SLRURead
Waiting for a read of an SLRU page.
SLRUSync
Waiting for SLRU data to reach stable storage following a page write.
SLRUWrite
Waiting for a write of an SLRU page.
SnapbuildRead
Waiting for a read of a serialized historical catalog snapshot.
SnapbuildSync
Waiting for a serialized historical catalog snapshot to reach stable storage.
SnapbuildWrite
Waiting for a write of a serialized historical catalog snapshot.
TimelineHistoryFileSync
Waiting for a timeline history file received via streaming replication to reach stable storage.
TimelineHistoryFileWrite
Waiting for a write of a timeline history file received via streaming replication.
TimelineHistoryRead
Waiting for a read of a timeline history file.
TimelineHistorySync
Waiting for a newly created timeline history file to reach stable storage.
TimelineHistoryWrite
Waiting for a write of a newly created timeline history file.
TwophaseFileRead
Waiting for a read of a two phase state file.
TwophaseFileSync
Waiting for a two phase state file to reach stable storage.
TwophaseFileWrite
Waiting for a write of a two phase state file.
WALBootstrapSync
Waiting for WAL to reach stable storage during bootstrapping.
WALBootstrapWrite
Waiting for a write of a WAL page during bootstrapping.
WALCopyRead
Waiting for a read when creating a new WAL segment by copying an existing one.
WALCopySync
Waiting for a new WAL segment created by copying an existing one to reach stable storage.
WALCopyWrite
Waiting for a write when creating a new WAL segment by copying an existing one.
WALInitSync
Waiting for a newly initialized WAL file to reach stable storage.
WALInitWrite
Waiting for a write while initializing a new WAL file.
WALRead
Waiting for a read from a WAL file.
WALSenderTimelineHistoryRead
Waiting for a read from a timeline history file during walsender timeline command.
WALSync
Waiting for a WAL file to reach stable storage.
WALSyncMethodAssign
Waiting for data to reach stable storage while assigning WAL sync method.
WALWrite
Waiting for a write to a WAL file.
Column
Type
Description
pid
integer
Process ID of a WAL sender process
usesysid
oid
OID of the user logged into this WAL sender process
usename
name
Name of the user logged into this WAL sender process
application_name
text
Name of the application that is connected to this WAL sender
client_addr
inet
IP address of the client connected to this WAL sender. If this field is null, it indicates that the client is connected via a Unix socket on the server machine.
client_hostname
text
Host name of the connected client, as reported by a reverse DNS lookup of client_addr. This field will only be non-null for IP connections, and only when log_hostname is enabled.
client_port
integer
TCP port number that the client is using for communication with this WAL sender, or -1 if a Unix socket is used
backend_start
timestamp with time zone
Time when this process was started, i.e., when the client connected to this WAL sender
backend_xmin
xid
This standby's xmin horizon reported by hot_standby_feedback.
state
text
Current WAL sender state. Possible values are:
startup: This WAL sender is starting up.
catchup: This WAL sender's connected standby is catching up with the primary.
streaming: This WAL sender is streaming changes after its connected standby server has caught up with the primary.
backup: This WAL sender is sending a backup.
stopping: This WAL sender is stopping.
sent_lsn
pg_lsn
Last write-ahead log location sent on this connection
write_lsn
pg_lsn
Last write-ahead log location written to disk by this standby server
flush_lsn
pg_lsn
Last write-ahead log location flushed to disk by this standby server
replay_lsn
pg_lsn
Last write-ahead log location replayed into the database on this standby server
write_lag
interval
Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written it (but not yet flushed it or applied it). This can be used to gauge the delay that synchronous_commit level remote_write incurred while committing if this server was configured as a synchronous standby.
flush_lag
interval
Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written and flushed it (but not yet applied it). This can be used to gauge the delay that synchronous_commit level on incurred while committing if this server was configured as a synchronous standby.
replay_lag
interval
Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written, flushed and applied it. This can be used to gauge the delay that synchronous_commit level remote_apply incurred while committing if this server was configured as a synchronous standby.
sync_priority
integer
Priority of this standby server for being chosen as the synchronous standby in a priority-based synchronous replication. This has no effect in a quorum-based synchronous replication.
sync_state
text
Synchronous state of this standby server. Possible values are:
async: This standby server is asynchronous.
potential: This standby server is currently asynchronous, but can potentially become synchronous if one of the current synchronous standbys fails.
sync: This standby server is synchronous.
quorum: This standby server is considered as a candidate for quorum standbys.
reply_time
timestamp with time zone
Send time of last reply message received from standby server
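For example, the three lag columns can be inspected side by side for each standby:

    SELECT application_name, state, sync_state,
           write_lag, flush_lag, replay_lag
    FROM pg_stat_replication;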
Column
Type
Description
pid
integer
Process ID of the WAL receiver process
status
text
Activity status of the WAL receiver process
receive_start_lsn
pg_lsn
First write-ahead log location used when WAL receiver is started
receive_start_tli
integer
First timeline number used when WAL receiver is started
received_lsn
pg_lsn
Last write-ahead log location already received and flushed to disk, the initial value of this field being the first log location used when WAL receiver is started
received_tli
integer
Timeline number of last write-ahead log location received and flushed to disk, the initial value of this field being the timeline number of the first log location used when WAL receiver is started
last_msg_send_time
timestamp with time zone
Send time of last message received from origin WAL sender
last_msg_receipt_time
timestamp with time zone
Receipt time of last message received from origin WAL sender
latest_end_lsn
pg_lsn
Last write-ahead log location reported to origin WAL sender
latest_end_time
timestamp with time zone
Time of last write-ahead log location reported to origin WAL sender
slot_name
text
Replication slot name used by this WAL receiver
sender_host
text
Host of the PostgreSQL instance this WAL receiver is connected to. This can be a host name, an IP address, or a directory path if the connection is via Unix socket. (The path case can be distinguished because it will always be an absolute path, beginning with /.)
sender_port
integer
Port number of the PostgreSQL instance this WAL receiver is connected to.
conninfo
text
Connection string used by this WAL receiver, with security-sensitive fields obfuscated.
Column
Type
Description
subid
oid
OID of the subscription
subname
text
Name of the subscription
pid
integer
Process ID of the subscription worker process
relid
Oid
OID of the relation that the worker is synchronizing; null for the main apply worker
received_lsn
pg_lsn
Last write-ahead log location received, the initial value of this field being 0
last_msg_send_time
timestamp with time zone
Send time of last message received from origin WAL sender
last_msg_receipt_time
timestamp with time zone
Receipt time of last message received from origin WAL sender
latest_end_lsn
pg_lsn
Last write-ahead log location reported to origin WAL sender
latest_end_time
timestamp with time zone
Time of last write-ahead log location reported to origin WAL sender
Column
Type
Description
pid
integer
Process ID of a backend or WAL sender process
ssl
boolean
True if SSL is used on this connection
version
text
Version of SSL in use, or NULL if SSL is not in use on this connection
cipher
text
Name of SSL cipher in use, or NULL if SSL is not in use on this connection
bits
integer
Number of bits in the encryption algorithm used, or NULL if SSL is not used on this connection
compression
boolean
True if SSL compression is in use, false if not, or NULL if SSL is not in use on this connection
client_dn
text
Distinguished Name (DN) field from the client certificate used, or NULL if no client certificate was supplied or if SSL is not in use on this connection. This field is truncated if the DN field is longer than NAMEDATALEN (64 characters in a standard build).
client_serial
numeric
Serial number of the client certificate, or NULL if no client certificate was supplied or if SSL is not in use on this connection. The combination of certificate serial number and certificate issuer uniquely identifies a certificate (unless the issuer erroneously reuses serial numbers).
issuer_dn
text
DN of the issuer of the client certificate, or NULL if no client certificate was supplied or if SSL is not in use on this connection. This field is truncated like client_dn.
Column
Type
Description
pid
integer
Process ID of a backend
gss_authenticated
boolean
True if GSSAPI authentication was used for this connection
principal
text
Principal used to authenticate this connection, or NULL if GSSAPI was not used to authenticate this connection. This field is truncated if the principal is longer than NAMEDATALEN (64 characters in a standard build).
encrypted
boolean
True if GSSAPI encryption is in use on this connection
Column
Type
Description
archived_count
bigint
Number of WAL files that have been successfully archived
last_archived_wal
text
Name of the last WAL file successfully archived
last_archived_time
timestamp with time zone
Time of the last successful archive operation
failed_count
bigint
Number of failed attempts for archiving WAL files
last_failed_wal
text
Name of the WAL file of the last failed archival operation
last_failed_time
timestamp with time zone
Time of the last failed archival operation
stats_reset
timestamp with time zone
Time at which these statistics were last reset
Column
Type
Description
checkpoints_timed
bigint
Number of scheduled checkpoints that have been performed
checkpoints_req
bigint
Number of requested checkpoints that have been performed
checkpoint_write_time
double precision
Total amount of time that has been spent in the portion of checkpoint processing where files are written to disk, in milliseconds
checkpoint_sync_time
double precision
Total amount of time that has been spent in the portion of checkpoint processing where files are synchronized to disk, in milliseconds
buffers_checkpoint
bigint
Number of buffers written during checkpoints
buffers_clean
bigint
Number of buffers written by the background writer
maxwritten_clean
bigint
Number of times the background writer stopped a cleaning scan because it had written too many buffers
buffers_backend
bigint
Number of buffers written directly by a backend
buffers_backend_fsync
bigint
Number of times a backend had to execute its own fsync call (normally the background writer handles those even when the backend does its own write)
buffers_alloc
bigint
Number of buffers allocated
stats_reset
timestamp with time zone
Time at which these statistics were last reset
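As an illustration, the ratio of requested to scheduled checkpoints (often watched when tuning checkpoint-related settings) can be computed directly from this view; a sketch:

    SELECT checkpoints_timed, checkpoints_req,
           round(100.0 * checkpoints_req
                 / nullif(checkpoints_timed + checkpoints_req, 0), 1)
             AS pct_requested
    FROM pg_stat_bgwriter;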
Column
Type
Description
datid
oid
OID of this database, or 0 for objects belonging to a shared relation
datname
name
Name of this database, or NULL for the shared objects.
numbackends
integer
Number of backends currently connected to this database, or NULL for the shared objects. This is the only column in this view that returns a value reflecting current state; all other columns return the accumulated values since the last reset.
xact_commit
bigint
Number of transactions in this database that have been committed
xact_rollback
bigint
Number of transactions in this database that have been rolled back
blks_read
bigint
Number of disk blocks read in this database
blks_hit
bigint
Number of times disk blocks were found already in the buffer cache, so that a read was not necessary (this only includes hits in the PostgreSQL buffer cache, not the operating system's file system cache)
tup_returned
bigint
Number of rows returned by queries in this database
tup_fetched
bigint
Number of rows fetched by queries in this database
tup_inserted
bigint
Number of rows inserted by queries in this database
tup_updated
bigint
Number of rows updated by queries in this database
tup_deleted
bigint
Number of rows deleted by queries in this database
conflicts
bigint
Number of queries canceled due to conflicts with recovery in this database. (Conflicts occur only on standby servers; see pg_stat_database_conflicts for details.)
temp_files
bigint
Number of temporary files created by queries in this database. All temporary files are counted, regardless of why the temporary file was created (e.g., sorting or hashing), and regardless of the log_temp_files setting.
temp_bytes
bigint
Total amount of data written to temporary files by queries in this database. All temporary files are counted, regardless of why the temporary file was created, and regardless of the log_temp_files setting.
deadlocks
bigint
Number of deadlocks detected in this database
checksum_failures
bigint
Number of data page checksum failures detected in this database (or on a shared object), or NULL if data checksums are not enabled.
checksum_last_failure
timestamp with time zone
Time at which the last data page checksum failure was detected in this database (or on a shared object), or NULL if data checksums are not enabled.
blk_read_time
double precision
Time spent reading data file blocks by backends in this database, in milliseconds
blk_write_time
double precision
Time spent writing data file blocks by backends in this database, in milliseconds
stats_reset
timestamp with time zone
Time at which these statistics were last reset
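For example, blks_hit and blks_read can be combined into a rough buffer cache hit ratio per database:

    SELECT datname,
           blks_hit::float / nullif(blks_hit + blks_read, 0) AS cache_hit_ratio
    FROM pg_stat_database;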
Column
Type
Description
datid
oid
OID of a database
datname
name
Name of this database
confl_tablespace
bigint
Number of queries in this database that have been canceled due to dropped tablespaces
confl_lock
bigint
Number of queries in this database that have been canceled due to lock timeouts
confl_snapshot
bigint
Number of queries in this database that have been canceled due to old snapshots
confl_bufferpin
bigint
Number of queries in this database that have been canceled due to pinned buffers
confl_deadlock
bigint
Number of queries in this database that have been canceled due to deadlocks
Column
Type
Description
relid
oid
OID of a table
schemaname
name
Name of the schema that this table is in
relname
name
Name of this table
seq_scan
bigint
Number of sequential scans initiated on this table
seq_tup_read
bigint
Number of live rows fetched by sequential scans
idx_scan
bigint
Number of index scans initiated on this table
idx_tup_fetch
bigint
Number of live rows fetched by index scans
n_tup_ins
bigint
Number of rows inserted
n_tup_upd
bigint
Number of rows updated (includes HOT updated rows)
n_tup_del
bigint
Number of rows deleted
n_tup_hot_upd
bigint
Number of rows HOT updated (i.e., with no separate index update required)
n_live_tup
bigint
Estimated number of live rows
n_dead_tup
bigint
Estimated number of dead rows
n_mod_since_analyze
bigint
Estimated number of rows modified since this table was last analyzed
last_vacuum
timestamp with time zone
Last time at which this table was manually vacuumed (not counting VACUUM FULL)
last_autovacuum
timestamp with time zone
Last time at which this table was vacuumed by the autovacuum daemon
last_analyze
timestamp with time zone
Last time at which this table was manually analyzed
last_autoanalyze
timestamp with time zone
Last time at which this table was analyzed by the autovacuum daemon
vacuum_count
bigint
Number of times this table has been manually vacuumed (not counting VACUUM FULL)
autovacuum_count
bigint
Number of times this table has been vacuumed by the autovacuum daemon
analyze_count
bigint
Number of times this table has been manually analyzed
autoanalyze_count
bigint
Number of times this table has been analyzed by the autovacuum daemon
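For instance, tables accumulating dead rows (candidates for closer autovacuum attention) can be listed like this:

    SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC
    LIMIT 10;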
Column
Type
Description
relid
oid
OID of the table for this index
indexrelid
oid
OID of this index
schemaname
name
Name of the schema this index is in
relname
name
Name of the table for this index
indexrelname
name
Name of this index
idx_scan
bigint
Number of index scans initiated on this index
idx_tup_read
bigint
Number of index entries returned by scans on this index
idx_tup_fetch
bigint
Number of live table rows fetched by simple index scans using this index
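A common use of this view is spotting indexes that are never scanned; a sketch:

    SELECT schemaname, relname, indexrelname, idx_scan
    FROM pg_stat_user_indexes
    WHERE idx_scan = 0
    ORDER BY schemaname, relname;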
Column
Type
Description
relid
oid
OID of a table
schemaname
name
Name of the schema that this table is in
relname
name
Name of this table
heap_blks_read
bigint
Number of disk blocks read from this table
heap_blks_hit
bigint
Number of buffer hits in this table
idx_blks_read
bigint
Number of disk blocks read from all indexes on this table
idx_blks_hit
bigint
Number of buffer hits in all indexes on this table
toast_blks_read
bigint
Number of disk blocks read from this table's TOAST table (if any)
toast_blks_hit
bigint
Number of buffer hits in this table's TOAST table (if any)
tidx_blks_read
bigint
Number of disk blocks read from this table's TOAST table indexes (if any)
tidx_blks_hit
bigint
Number of buffer hits in this table's TOAST table indexes (if any)
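For example, the heap columns give a per-table buffer hit ratio:

    SELECT relname,
           heap_blks_hit::float
             / nullif(heap_blks_hit + heap_blks_read, 0) AS heap_hit_ratio
    FROM pg_statio_user_tables
    ORDER BY heap_blks_read DESC
    LIMIT 10;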
Column
Type
Description
relid
oid
OID of the table for this index
indexrelid
oid
OID of this index
schemaname
name
Name of the schema this index is in
relname
name
Name of the table for this index
indexrelname
name
Name of this index
idx_blks_read
bigint
Number of disk blocks read from this index
idx_blks_hit
bigint
Number of buffer hits in this index
Column
Type
Description
relid
oid
OID of a sequence
schemaname
name
Name of the schema this sequence is in
relname
name
Name of this sequence
blks_read
bigint
Number of disk blocks read from this sequence
blks_hit
bigint
Number of buffer hits in this sequence
Column
Type
Description
funcid
oid
OID of a function
schemaname
name
Name of the schema this function is in
funcname
name
Name of this function
calls
bigint
Number of times this function has been called
total_time
double precision
Total time spent in this function and all other functions called by it, in milliseconds
self_time
double precision
Total time spent in this function itself, not including other functions called by it, in milliseconds
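For example, with track_functions enabled ('pl' or 'all'), the most expensive tracked functions can be listed as:

    SELECT funcname, calls, total_time, self_time
    FROM pg_stat_user_functions
    ORDER BY total_time DESC
    LIMIT 10;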
Function
Return Type
Description
pg_backend_pid()
integer
Process ID of the server process handling the current session
pg_stat_get_activity(integer)
setof record
Returns a record of information about the backend with the specified PID, or one record for each active backend in the system if NULL is specified. The fields returned are a subset of those in the pg_stat_activity view.
pg_stat_get_snapshot_timestamp()
timestamp with time zone
Returns the timestamp of the current statistics snapshot
pg_stat_clear_snapshot()
void
Discard the current statistics snapshot
pg_stat_reset()
void
Reset all statistics counters for the current database to zero (requires superuser privileges by default, but EXECUTE for this function can be granted to others)
pg_stat_reset_shared(text)
void
Reset some cluster-wide statistics counters to zero, depending on the argument (requires superuser privileges by default, but EXECUTE for this function can be granted to others). Calling pg_stat_reset_shared('bgwriter') will zero all the counters shown in the pg_stat_bgwriter view. Calling pg_stat_reset_shared('archiver') will zero all the counters shown in the pg_stat_archiver view.
pg_stat_reset_single_table_counters(oid)
void
Reset statistics for a single table or index in the current database to zero (requires superuser privileges by default, but EXECUTE for this function can be granted to others)
pg_stat_reset_single_function_counters(oid)
void
Reset statistics for a single function in the current database to zero (requires superuser privileges by default, but EXECUTE for this function can be granted to others)
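A few example calls (the table name my_table below is only a placeholder):

    SELECT pg_stat_reset();                   -- counters for the current database
    SELECT pg_stat_reset_shared('bgwriter');  -- pg_stat_bgwriter counters
    SELECT pg_stat_reset_single_table_counters('my_table'::regclass);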
Function
Return Type
Description
pg_stat_get_backend_idset()
setof integer
Set of currently active backend ID numbers (from 1 to the number of active backends)
pg_stat_get_backend_activity(integer)
text
Text of this backend's most recent query
pg_stat_get_backend_activity_start(integer)
timestamp with time zone
Time when the most recent query was started
pg_stat_get_backend_client_addr(integer)
inet
IP address of the client connected to this backend
pg_stat_get_backend_client_port(integer)
integer
TCP port number that the client is using for communication
pg_stat_get_backend_dbid(integer)
oid
OID of the database this backend is connected to
pg_stat_get_backend_pid(integer)
integer
Process ID of this backend
pg_stat_get_backend_start(integer)
timestamp with time zone
Time when this process was started
pg_stat_get_backend_userid(integer)
oid
OID of the user logged into this backend
pg_stat_get_backend_wait_event_type(integer)
text
Wait event type name if backend is currently waiting, otherwise NULL. See Table 27.4 for details.
pg_stat_get_backend_wait_event(integer)
text
Wait event name if backend is currently waiting, otherwise NULL. See Table 27.4 for details.
pg_stat_get_backend_xact_start(integer)
timestamp with time zone
Time when the current transaction was started
Column
Type
Description
pid
integer
Process ID of backend.
datid
oid
OID of the database to which this backend is connected.
datname
name
Name of the database to which this backend is connected.
relid
oid
OID of the table on which the index is being created.
index_relid
oid
OID of the index being created or reindexed. During a non-concurrent CREATE INDEX, this is 0.
command
text
The command that is running: CREATE INDEX, CREATE INDEX CONCURRENTLY, REINDEX, or REINDEX CONCURRENTLY.
phase
text
Current processing phase of index creation. See Table 27.23.
lockers_total
bigint
Total number of lockers to wait for, when applicable.
lockers_done
bigint
Number of lockers already waited for.
current_locker_pid
bigint
Process ID of the locker currently being waited for.
blocks_total
bigint
Total number of blocks to be processed in the current phase.
blocks_done
bigint
Number of blocks already processed in the current phase.
tuples_total
bigint
Total number of tuples to be processed in the current phase.
tuples_done
bigint
Number of tuples already processed in the current phase.
partitions_total
bigint
When creating an index on a partitioned table, this column is set to the total number of partitions on which the index is to be created.
partitions_done
bigint
When creating an index on a partitioned table, this column is set to the number of partitions on which the index has been completed.
Phase
Description
initializing
CREATE INDEX or REINDEX is preparing to create the index. This phase is expected to be very brief.
waiting for writers before build
CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY is waiting for transactions with write locks that can potentially see the table to finish. This phase is skipped when not in concurrent mode. Columns lockers_total, lockers_done and current_locker_pid contain the progress information for this phase.
building index
The index is being built by the access method-specific code. In this phase, access methods that support progress reporting fill in their own progress data, and the subphase is indicated in this column. Typically, blocks_total and blocks_done will contain progress data, as well as potentially tuples_total and tuples_done.
waiting for writers before validation
CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY is waiting for transactions with write locks that can potentially write into the table to finish. This phase is skipped when not in concurrent mode. Columns lockers_total, lockers_done and current_locker_pid contain the progress information for this phase.
index validation: scanning index
CREATE INDEX CONCURRENTLY is scanning the index searching for tuples that need to be validated. This phase is skipped when not in concurrent mode. Columns blocks_total (set to the total size of the index) and blocks_done contain the progress information for this phase.
index validation: sorting tuples
CREATE INDEX CONCURRENTLY is sorting the output of the index scanning phase.
index validation: scanning table
CREATE INDEX CONCURRENTLY is scanning the table to validate the index tuples collected in the previous two phases. This phase is skipped when not in concurrent mode. Columns blocks_total (set to the total size of the table) and blocks_done contain the progress information for this phase.
waiting for old snapshots
CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY is waiting for transactions that can potentially see the table to release their snapshots. This phase is skipped when not in concurrent mode. Columns lockers_total, lockers_done and current_locker_pid contain the progress information for this phase.
waiting for readers before marking dead
REINDEX CONCURRENTLY is waiting for transactions with read locks on the table to finish, before marking the old index dead. This phase is skipped when not in concurrent mode. Columns lockers_total, lockers_done and current_locker_pid contain the progress information for this phase.
waiting for readers before dropping
REINDEX CONCURRENTLY is waiting for transactions with read locks on the table to finish, before dropping the old index. This phase is skipped when not in concurrent mode. Columns lockers_total, lockers_done and current_locker_pid contain the progress information for this phase.
Column
Type
Description
pid
integer
Process ID of backend.
datid
oid
OID of the database to which this backend is connected.
datname
name
Name of the database to which this backend is connected.
relid
oid
OID of the table being vacuumed.
phase
text
Current processing phase of vacuum. See Table 27.25.
heap_blks_total
bigint
Total number of heap blocks in the table. This number is reported as of the beginning of the scan; blocks added later will not be (and need not be) visited by this VACUUM.
heap_blks_scanned
bigint
Number of heap blocks scanned. Because the visibility map is used to optimize scans, some blocks will be skipped without inspection; skipped blocks are included in this total, so that this number will eventually become equal to heap_blks_total when the vacuum is complete. This counter only advances when the phase is scanning heap.
heap_blks_vacuumed
bigint
Number of heap blocks vacuumed. Unless the table has no indexes, this counter only advances when the phase is vacuuming heap. Blocks that contain no dead tuples are skipped, so the counter may sometimes skip forward in large increments.
index_vacuum_count
bigint
Number of completed index vacuum cycles.
max_dead_tuples
bigint
Number of dead tuples that we can store before needing to perform an index vacuum cycle, based on maintenance_work_mem.
num_dead_tuples
bigint
Number of dead tuples collected since the last index vacuum cycle.
Phase
Description
initializing
VACUUM is preparing to begin scanning the heap. This phase is expected to be very brief.
scanning heap
VACUUM is currently scanning the heap. It will prune and defragment each page if required, and possibly perform freezing activity. The heap_blks_scanned column can be used to monitor the progress of the scan.
vacuuming indexes
VACUUM is currently vacuuming the indexes. If a table has any indexes, this will happen at least once per vacuum, after the heap has been completely scanned. It may happen multiple times per vacuum if maintenance_work_mem is insufficient to store the number of dead tuples found.
vacuuming heap
VACUUM is currently vacuuming the heap. Vacuuming the heap is distinct from scanning the heap, and occurs after each instance of vacuuming indexes. If heap_blks_scanned is less than heap_blks_total, the system will return to scanning the heap after this phase is completed; otherwise, it will begin cleaning up indexes after this phase is completed.
cleaning up indexes
VACUUM is currently cleaning up indexes. This occurs after the heap has been completely scanned and all vacuuming of the indexes and the heap has been completed.
truncating heap
VACUUM is currently truncating the heap so as to return empty pages at the end of the relation to the operating system. This occurs after cleaning up indexes.
performing final cleanup
VACUUM is performing final cleanup. During this phase, VACUUM will vacuum the free space map, update statistics in pg_class, and report statistics to the statistics collector. When this phase is completed, VACUUM will end.
Column
Type
Description
pid
integer
Process ID of backend.
datid
oid
OID of the database to which this backend is connected.
datname
name
Name of the database to which this backend is connected.
relid
oid
OID of the table being clustered.
command
text
The command that is running: either CLUSTER or VACUUM FULL.
phase
text
Current processing phase. See Table 27.27.
cluster_index_relid
oid
If the table is being scanned using an index, this is the OID of the index being used; otherwise, it is zero.
heap_tuples_scanned
bigint
Number of heap tuples scanned. This counter only advances when the phase is seq scanning heap, index scanning heap or writing new heap.
heap_tuples_written
bigint
Number of heap tuples written. This counter only advances when the phase is seq scanning heap, index scanning heap or writing new heap.
heap_blks_total
bigint
Total number of heap blocks in the table. This number is reported as of the beginning of seq scanning heap.
heap_blks_scanned
bigint
Number of heap blocks scanned. This counter only advances when the phase is seq scanning heap.
index_rebuild_count
bigint
Number of indexes rebuilt. This counter only advances when the phase is rebuilding index.
Phase
Description
initializing
The command is preparing to begin scanning the heap. This phase is expected to be very brief.
seq scanning heap
The command is currently scanning the table using a sequential scan.
index scanning heap
CLUSTER is currently scanning the table using an index scan.
sorting tuples
CLUSTER is currently sorting tuples.
writing new heap
CLUSTER is currently writing the new heap.
swapping relation files
The command is currently swapping newly-built files into place.
rebuilding index
The command is currently rebuilding an index.
performing final cleanup
The command is performing final cleanup. When this phase is completed, CLUSTER or VACUUM FULL will end.