73.1. Database File Layout
This section describes the storage format at the level of files and directories.
Traditionally, the configuration and data files used by a database cluster are stored together within the cluster's data directory, commonly referred to as
PGDATA(after the name of the environment variable that can be used to define it). A common location for
/var/lib/pgsql/data. Multiple clusters, managed by different server instances, can exist on the same machine.
PGDATAdirectory contains several subdirectories and control files, as shown in Table 68.1. In addition to these required items, the cluster configuration files
pg_ident.confare traditionally stored in
PGDATA, although it is possible to place them elsewhere.
Table 68.1. Contents of
For each database in the cluster there is a subdirectory within
PGDATA/base, named after the database's OID in
pg_database. This subdirectory is the default location for the database's files; in particular, its system catalogs are stored there.
Each table and index is stored in a separate file. For ordinary relations, these files are named after the table or index's filenode number, which can be found in
relfilenode. But for temporary relations, the file name is of the form
BBBis the backend ID of the backend which created the file, and
FFFis the filenode number. In either case, in addition to the main file (a/k/a main fork), each table and index has a free space map (see Section 68.3), which stores information about free space available in the relation. The free space map is stored in a file named with the filenode number plus the suffix
_fsm. Tables also have a visibility map, stored in a fork with the suffix
_vm, to track which pages are known to have no dead tuples. The visibility map is described further in Section 68.4. Unlogged tables and indexes have a third fork, known as the initialization fork, which is stored in a fork with the suffix
_init(see Section 68.5).
Note that while a table's filenode often matches its OID, this is not necessarily the case; some operations, like
CLUSTERand some forms of
ALTER TABLE, can change the filenode while preserving the OID. Avoid assuming that filenode and table OID are the same. Also, for certain system catalogs including
relfilenodecontains zero. The actual filenode number of these catalogs is stored in a lower-level data structure, and can be obtained using the
When a table or index exceeds 1 GB, it is divided into gigabyte-sized segments. The first segment's file name is the same as the filenode; subsequent segments are named filenode.1, filenode.2, etc. This arrangement avoids problems on platforms that have file size limitations. (Actually, 1 GB is just the default segment size. The segment size can be adjusted using the configuration option
--with-segsizewhen building PostgreSQL.) In principle, free space map and visibility map forks could require multiple segments as well, though this is unlikely to happen in practice.
A table that has columns with potentially large entries will have an associated TOAST table, which is used for out-of-line storage of field values that are too large to keep in the table rows proper.
reltoastrelidlinks from a table to its TOAST table, if any. See Section 68.2 for more information.
Tablespaces make the scenario more complicated. Each user-defined tablespace has a symbolic link inside the
PGDATA/pg_tblspcdirectory, which points to the physical tablespace directory (i.e., the location specified in the tablespace's
CREATE TABLESPACEcommand). This symbolic link is named after the tablespace's OID. Inside the physical tablespace directory there is a subdirectory with a name that depends on the PostgreSQL server version, such as
PG_9.0_201008051. (The reason for using this subdirectory is so that successive versions of the database can use the same
CREATE TABLESPACElocation value without conflicts.) Within the version-specific subdirectory, there is a subdirectory for each database that has elements in the tablespace, named after the database's OID. Tables and indexes are stored within that directory, using the filenode naming scheme. The
pg_defaulttablespace is not accessed through
pg_tblspc, but corresponds to
PGDATA/base. Similarly, the
pg_globaltablespace is not accessed through
pg_tblspc, but corresponds to
pg_relation_filepath()function shows the entire path (relative to
PGDATA) of any relation. It is often useful as a substitute for remembering many of the above rules. But keep in mind that this function just gives the name of the first segment of the main fork of the relation — you may need to append a segment number and/or
_initto find all the files associated with the relation.
Temporary files (for operations such as sorting more data than can fit in memory) are created within
PGDATA/base/pgsql_tmp, or within a
pgsql_tmpsubdirectory of a tablespace directory if a tablespace other than
pg_defaultis specified for them. The name of a temporary file has the form
PPPis the PID of the owning backend and
NNNdistinguishes different temporary files of that backend.