PostgreSQL 正體中文使用手冊
PostgreSQL.TW官方使用手冊小島故事加入社團
11
11
  • 簡介
  • 前言
    • 1. 什麼是PostgreSQL?
    • 2. PostgreSQL沿革
    • 3. 慣例
    • 4. 其他參考資訊
    • 5. 問題回報指南
  • I. 新手教學
    • 1. 入門指南
      • 1.1. 安裝
      • 1.2. 基礎架構
      • 1.3. 建立一個資料庫
      • 1.4. 存取一個資料庫
    • 2. SQL查詢語言
      • 2.1. 簡介
      • 2.2. 概念
      • 2.3. 創建一個新的資料表
      • 2.4. 資料列是資料表的組成單位
      • 2.5. 資料表的查詢
      • 2.6. 交叉查詢
      • 2.7. 彙總查詢
      • 2.8. 更新資料
      • 2.9. 刪除資料
    • 3. 先進功能
      • 3.1. 簡介
      • 3.2. 檢視表(View)
      • 3.3. 外部索引鍵
      • 3.4. 交易安全
      • 3.5. 窗函數
      • 3.6. 繼承
      • 3.7. 結論
  • II. SQL查詢語言
    • 4. SQL語法
      • 4.1. 語法結構
      • 4.2. 參數表示式
      • 4.3. 函數呼叫
    • 5. 定義資料結構
      • 5.1. 認識資料表
      • 5.2. 預設值
      • 5.3. 限制條件
      • 5.4. 系統欄位
      • 5.5. 表格變更
      • 5.6. 權限
      • 5.7. 資料列安全原則
      • 5.8. Schemas
      • 5.9. 繼承
      • 5.10. 分割資料表
      • 5.11. 外部資料
      • 5.12. 其他資料庫物件
      • 5.13. 相依性追蹤
    • 6. 資料處理
      • 6.1. 新增資料
      • 6.2. 更新資料
      • 6.3. 刪除資料
      • 6.4. 修改並回傳資料
    • 7. 資料查詢
      • 7.1. 概觀
      • 7.2. 資料表表示式
      • 7.3. 取得資料列表
      • 7.4. 合併查詢結果
      • 7.5. 資料排序
      • 7.6. 指定資料範圍
      • 7.7. 列舉資料
      • 7.8. 遞迴查詢(Common Table Expressions)
    • 8. 資料型別
      • 8.1. 數字型別
      • 8.2. 貨幣型別
      • 8.3. 字串型別
      • 8.4. 位元組型別(bytea)
      • 8.5. 日期時間型別
      • 8.6. 布林型別
      • 8.7. 列舉型別
      • 8.8. 地理資訊型別
      • 8.9. 網路資訊型別
      • 8.10. 位元字串型別
      • 8.11. 全文檢索型別
      • 8.12. UUID型別
      • 8.13. XML型別
      • 8.14. JSON型別
      • 8.15. 陣列
      • 8.16. 複合型別
      • 8.17. 範圍型別
      • 8.18. 指標型別
      • 8.19. pg_lsn型別
      • 8.20. 概念型別
    • 9. 函式及運算子
      • 9.1. 邏輯運算子
      • 9.2. 比較函式及運算子
      • 9.3. 數學函式及運算子
      • 9.4. 字串函式及運算子
      • 9.5. 位元字串函式及運算子
      • 9.6. 二元字串函式及運算子
      • 9.7. 特徵比對
      • 9.8. 型別轉換函式
      • 9.9 日期時間函式及運算子
      • 9.10. 列舉型別函式
      • 9.11. 地理資訊函式及運算子
      • 9.12. 網路位址函式及運算子
      • 9.13. 文字檢索函式及運算子
      • 9.14. XML函式
      • 9.15. JSON函式及運算子
      • 9.16. 序列函式
      • 9.17. 條件表示式
      • 9.18. 陣列函式及運算子
      • 9.19. 範圍函式及運算子
      • 9.20. 彙總函數
      • 9.21. Window函式
      • 9.22. 子查詢
      • 9.23. 資料列與陣列的比較運算
      • 9.24. 集合回傳函式
      • 9.25. 系統資訊函數
      • 9.26. 系統管理函式
      • 9.27. 觸發函式
      • 9.28. 事件觸發函式
    • 10. 型別轉換
      • 10.1. 概觀
      • 10.2. 運算子
      • 10.3. 函式
      • 10.4. 資料儲存轉換規則
      • 10.5. UNION、CASE 等相關結構
      • 10.6. SELECT輸出規則
    • 11. 索引(Index)
      • 11.1. 簡介
      • 11.2. 索引型別
      • 11.3. 多欄位索引
      • 11.4. 索引與ORDER BY
      • 11.5. 善用多個索引
      • 11.6. 唯一值索引
      • 11.7. 表示式索引
      • 11.8. 部份索引(partial index)
      • 11.9. 運算子物件及家族
      • 11.10. 索引與排序規則
      • 11.11. 索引限定查詢(Index-only scan)
      • 11.12. 檢查索引運用
    • 12. 全文檢索
      • 12.1. 簡介
      • 12.2. 查詢與索引
      • 12.3. 細部控制
      • 12.4. 延伸功能
      • 12.5. 斷詞
      • 12.6. 字典
      • 12.7. 組態範例
      • 12.8. 測試與除錯
      • 12.9. GIN及GiST索引型別
      • 12.10. psql支援
      • 12.11. 功能限制
    • 13. 一致性管理(MVCC)
      • 13.1. 簡介
      • 13.2. 交易隔離
      • 13.3. 鎖定模式
      • 13.4. 在應用端檢視資料一致性
      • 13.5. 特別注意
      • 13.6. 鎖定與索引
    • 14. 效能技巧
      • 14.1. 善用EXPLAIN
      • 14.2. 統計資訊
      • 14.3. 使用確切的JOIN方式
      • 14.4. 快速建立資料庫內容
      • 14.5. 彈性設定
    • 15. 平行查詢
      • 15.1. 如何運作?
      • 15.2. 啓用時機?
      • 15.3. 平行查詢計畫
      • 15.4. 平行查詢的安全性
  • III. 系統管理
    • 16. 用原始碼安裝
      • 16.1. Short Version
      • 16.2. Requirements
      • 16.3. Getting The Source
      • 16.4. 安裝流程
      • 16.5. Post-Installation Setup
      • 16.6. Supported Platforms
      • 16.7. 平台相關的注意事項
    • 17. 用原始碼在 Windows 上安裝
      • 17.1. Building with Visual C++ or the Microsoft Windows SDK
    • 18. 服務配置與維運
      • 18.1. PostgreSQL 使用者帳號
      • 18.2. Creating a Database Cluster
      • 18.3. Starting the Database Server
      • 18.4. 核心資源管理
      • 18.5. Shutting Down the Server
      • 18.6. Upgrading a PostgreSQL Cluster
      • 18.7. Preventing Server Spoofing
      • 18.8. Encryption Options
      • 18.9. Secure TCP/IP Connections with SSL
      • 18.10. Secure TCP/IP Connections with SSH Tunnels
      • 18.11. 在 Windows 註冊事件日誌
    • 19. 服務組態設定
      • 19.1. Setting Parameters
      • 19.2. File Locations
      • 19.3. 連線與認證
      • 19.4. 資源配置
      • 19.5. Write Ahead Log
      • 19.6. 複寫(Replication)
      • 19.7. 查詢規畫
      • 19.8. 錯誤回報與日誌記錄
      • 19.9. Run-time Statistics
      • 19.10. 自動資料庫清理
      • 19.11. 用戶端連線預設參數
      • 19.12. 交易鎖定管理
      • 19.13. 版本與平台的相容性
      • 19.14. Error Handling
      • 19.15. 預先配置的參數
      • 19.16. Customized Options
      • 19.17. Developer Options
      • 19.18. Short Options
    • 20. 使用者認證
      • 20.1. 設定檔:pg_hba.conf
      • 20.2. User Name Maps
      • 20.3. Authentication Methods
      • 20.4. Authentication Problems
    • 21. 資料庫角色
      • 21.1. Database Roles
      • 21.2. Role Attributes
      • 21.3. Role Membership
      • 21.4. 移除角色
      • 21.5. Default Roles
      • 21.6. Function Security
    • 22. Managing Databases
      • 22.1. Overview
      • 22.2. Creating a Database
      • 22.3. 樣版資料庫
      • 22.4. Database Configuration
      • 22.5. Destroying a Database
      • 22.6. Tablespaces
    • 23. 語系
      • 23.1. 語系支援
      • 23.2. Collation Support
      • 23.3. 字元集支援
    • 24. 例行性資料庫維護工作
      • 24.1. 例行性資料清理
      • 24.2. 定期重建索引
      • 24.3. Log File Maintenance
    • 25. 備份及還原
      • 25.1. SQL Dump
      • 25.2. File System Level Backup
      • 25.3. Continuous Archiving and Point-in-Time Recovery (PITR)
    • 26. High Availability, Load Balancing, and Replication
      • 26.1. Comparison of Different Solutions
      • 26.2. 日誌轉送備用伺服器 Log-Shipping Standby Servers
      • 26.3. Failover
      • 26.4. Alternative Method for Log Shipping
      • 26.5. Hot Standby
    • 27. Recovery Configuration
      • 27.1. Archive Recovery Settings
      • 27.2. Recovery Target Settings
      • 27.3. Standby Server Settings
    • 28. 監控資料庫活動
      • 28.1. Standard Unix Tools
      • 28.2. 統計資訊收集器
      • 28.3. Viewing Locks
      • 28.4. Progress Reporting
      • 28.5. Dynamic Tracing
    • 29. Monitoring Disk Usage
      • 29.1. Determining Disk Usage
      • 29.2. Disk Full Failure
    • 30. 高可靠度及預寫日誌
      • 30.1. Reliability
      • 30.2. Write-Ahead Logging (WAL)
      • 30.3. Asynchronous Commit
      • 30.4. WAL Configuration
      • 30.5. WAL Internals
    • 31. 邏輯複寫(Logical Replication)
      • 31.1. 發佈(Publication)
      • 31.2. 訂閱(Subscription)
      • 31.3. 衝突處理
      • 31.4. 限制
      • 31.5. 架構
      • 31.6. 監控
      • 31.7. 安全性
      • 31.8. 系統設定
      • 31.9. 快速設定
    • 32. Just-in-Time Compilation (JIT)
      • 32.1. What is JIT compilation?
      • 32.2. When to JIT?
      • 32.3. Configuration
      • 32.4. Extensibility
    • 33. 迴歸測試
      • 33.1. Running the Tests
      • 33.2. Test Evaluation
      • 33.3. Variant Comparison Files
      • 33.4. TAP Tests
      • 33.5. Test Coverage Examination
  • IV. 用戶端介面
    • 34. libpq - C Library
      • 34.1. 資料庫連線控制函數
      • 34.2. 連線狀態函數
      • 34.3. Command Execution Functions
      • 34.4. Asynchronous Command Processing
      • 34.5. Retrieving Query Results Row-By-Row
      • 34.6. Canceling Queries in Progress
      • 34.7. The Fast-Path Interface
      • 34.8. Asynchronous Notification
      • 34.9. Functions Associated with the COPY Command
      • 34.10. Control Functions
      • 34.11. Miscellaneous Functions
      • 34.12. Notice Processing
      • 34.13. Event System
      • 34.14. 環境變數
      • 34.15. 密碼檔
      • 34.16. The Connection Service File
      • 34.17. LDAP Lookup of Connection Parameters
      • 34.18. SSL Support
      • 34.19. Behavior in Threaded Programs
      • 34.20. Building libpq Programs
      • 34.21. Example Programs
    • 35. Large Objects
      • 35.1. Introduction
      • 35.2. Implementation Features
      • 35.3. Client Interfaces
      • 35.4. Server-side Functions
      • 35.5. Example Program
    • 36. ECPG - Embedded SQL in C
      • 36.1. The Concept
      • 36.2. Managing Database Connections
      • 36.3. Running SQL Commands
      • 36.4. Using Host Variables
      • 36.5. Dynamic SQL
      • 36.6. pgtypes Library
      • 36.7. Using Descriptor Areas
      • 36.8. Error Handling
      • 36.9. Preprocessor Directives
      • 36.10. Processing Embedded SQL Programs
      • 36.11. Library Functions
      • 36.12. Large Objects
      • 36.13. C++ Applications
      • 36.14. Embedded SQL Commands
      • 36.15. Informix Compatibility Mode
      • 36.16. Internals
    • 37. The Information Schema
      • 37.1. The Schema
      • 37.2. Data Types
      • 37.3. information_schema_catalog_name
      • 37.4. administrable_role_authorizations
      • 37.5. applicable_roles
      • 37.6. attributes
      • 37.7. character_sets
      • 37.8. check_constraint_routine_usage
      • 37.9. check_constraints
      • 37.10. collations
      • 37.11. collation_character_set_applicability
      • 37.12. column_domain_usage
      • 37.13. column_options
      • 37.14. column_privileges
      • 37.15. column_udt_usage
      • 37.16. columns
      • 37.17. constraint_column_usage
      • 37.18. constraint_table_usage
      • 37.19. data_type_privileges
      • 37.20. domain_constraints
      • 37.21. domain_udt_usage
      • 37.22. domains
      • 37.23. element_types
      • 37.24. enabled_roles
      • 37.25. foreign_data_wrapper_options
      • 37.26. foreign_data_wrappers
      • 37.27. foreign_server_options
      • 37.28. foreign_servers
      • 37.29. foreign_table_options
      • 37.30. foreign_tables
      • 37.31. key_column_usage
      • 37.32. parameters
      • 37.33. referential_constraints
      • 37.34. role_column_grants
      • 37.35. role_routine_grants
      • 37.36. role_table_grants
      • 37.37. role_udt_grants
      • 37.38. role_usage_grants
      • 37.39. routine_privileges
      • 37.40. routines
      • 37.41. schemata
      • 37.42. sequences
      • 37.43. sql_features
      • 37.44. sql_implementation_info
      • 37.45. sql_languages
      • 37.46. sql_packages
      • 37.47. sql_parts
      • 37.48. sql_sizing
      • 37.49. sql_sizing_profiles
      • 37.50. table_constraints
      • 37.51. table_privileges
      • 37.52. tables
      • 37.53. transforms
      • 37.54. triggered_update_columns
      • 37.55. triggers
      • 37.56. udt_privileges
      • 37.57. usage_privileges
      • 37.58. user_defined_types
      • 37.59. user_mapping_options
      • 37.60. user_mappings
      • 37.61. view_column_usage
      • 37.62. view_routine_usage
      • 37.63. view_table_usage
      • 37.64. views
  • V. 資料庫程式設計
    • 38. SQL 延伸功能
      • 38.1. How Extensibility Works
      • 38.2. The PostgreSQL Type System
      • 38.3. 使用者自訂函數
      • 38.4. User-defined Procedures
      • 38.5. Query Language (SQL) Functions
      • 38.6. Function Overloading
      • 38.7. 函數易變性類別
      • 38.8. Procedural Language Functions
      • 38.9. Internal Functions
      • 38.10. C-Language Functions
      • 38.11. User-defined Aggregates
      • 38.12. User-defined Types
      • 38.13. User-defined Operators
      • 38.14. Operator Optimization Information
      • 38.15. Interfacing Extensions To Indexes
      • 38.16. Packaging Related Objects into an Extension
      • 38.17. Extension Building Infrastructure
    • 39. Triggers
    • 40. Event Triggers
    • 41. 規則系統
      • 41.1. The Query Tree
      • 41.2. Views and the Rule System
      • 41.3. Materialized Views
      • 41.4. Rules on INSERT, UPDATE, and DELETE
      • 41.5. 規則及權限
      • 41.6. Rules and Command Status
      • 41.7. Rules Versus Triggers
    • 42. Procedural Languages(程序語言)
      • 42.1. Installing Procedural Languages
    • 43. PL/pgSQL - SQL Procedural Language
      • 43.5. 基本語法
    • 44. PL/Tcl - Tcl Procedural Language
    • 45. PL/Perl - Perl Procedural Language
    • 46. PL/Python - Python Procedural Language
    • 47. Server Programming Interface
    • 48. Background Worker Processes
    • 49. Logical Decoding
    • 50. Replication Progress Tracking
  • VI. 參考資訊
    • I. SQL 指令
      • ALTER DATABASE
      • ALTER DEFAULT PRIVILEGES
      • ALTER EXTENSION
      • ALTER FUNCTION
      • ALTER INDEX
      • ALTER LANGUAGE
      • ALTER MATERIALIZED VIEW
      • ALTER POLICY
      • ALTER PUBLICATION
      • ALTER ROLE
      • ALTER RULE
      • ALTER SCHEMA
      • ALTER SEQUENCE
      • ALTER STATISTICS
      • ALTER SUBSCRIPTION
      • ALTER TABLE
      • ALTER TABLESPACE
      • ALTER TRIGGER
      • ALTER TYPE
      • ALTER VIEW
      • ANALYZE
      • CLUSTER
      • COMMENT
      • COPY
      • CREATE CAST
      • CREATE DATABASE
      • CREATE EXTENSION
      • CREATE FOREIGN TABLE
      • CREATE FOREIGN DATA WRAPPER
      • CREATE FUNCTION
      • CREATE INDEX
      • CREATE LANGUAGE
      • CREATE MATERIALIZED VIEW
      • CREATE DOMAIN
      • CREATE POLICY
      • CREATE PROCEDURE
      • CREATE PUBLICATION
      • CREATE ROLE
      • CREATE RULE
      • CREATE SCHEMA
      • CREATE SEQUENCE
      • CREATE SERVER
      • CREATE STATISTICS
      • CREATE SUBSCRIPTION
      • CREATE TABLE
      • CREATE TABLE AS
      • CREATE TABLESPACE
      • CREATE TRANSFORM
      • CREATE TRIGGER
      • CREATE TYPE
      • CREATE USER
      • CREATE USER MAPPING
      • CREATE VIEW
      • DELETE
      • DO
      • DROP DATABASE
      • DROP EXTENSION
      • DROP FUNCTION
      • DROP INDEX
      • DROP LANGUAGE
      • DROP MATERIALIZED VIEW
      • DROP OWNED
      • DROP POLICY
      • DROP ROLE
      • DROP RULE
      • DROP SCHEMA
      • DROP SEQUENCE
      • DROP STATISTICS
      • DROP SUBSCRIPTION
      • DROP TABLE
      • DROP TABLESPACE
      • DROP TRANSFORM
      • DROP TRIGGER
      • DROP TYPE
      • DROP USER
      • DROP VIEW
      • EXECUTE
      • EXPLAIN
      • GRANT
      • IMPORT FOREIGN SCHEMA
      • INSERT
      • LISTEN
      • LOAD
      • NOTIFY
      • PREPARE TRANSACTION
      • REASSIGN OWNED
      • REFRESH MATERIALIZED VIEW
      • REINDEX
      • RESET
      • REVOKE
      • SELECT
      • SELECT INTO
      • SET
      • SET CONSTRAINTS
      • SET ROLE
      • SET SESSION AUTHORIZATION
      • SET TRANSACTION
      • SHOW
      • TRUNCATE
      • UNLISTEN
      • UPDATE
      • VACUUM
      • VALUES
    • II. PostgreSQL 用戶端工具
      • createdb
      • createuser
      • dropdb
      • dropuser
      • pgbench
      • pg_dump
      • psql
      • vacuumdb
    • III. PostgreSQL 伺服器應用程式
      • pg_test_timing
      • postgres
  • VII. 資料庫進階
    • 52. 系統目錄
      • 52.3. pg_am
      • 52.7. pg_attribute
      • 52.8. pg_authid
      • 52.9. pg_auth_members
      • 52.11 pg_class
      • 52.12. pg_collation
      • 52.13. pg_constraint
      • 52.15 pg_database
      • 52.26 pg_index
      • 52.29. pg_language
      • 52.32. pg_namespace
      • 52.33. pg_opclass
      • 52.38. pg_policy
      • 52.39. pg_proc
      • 52.44. pg_rewrite
      • 52.50. pg_statistic
      • 52.51. pg_statistic_ext
      • 52.54. pg_tablespace
      • 52.56. pg_trigger
      • 52.62. pg_type
      • 52.79. pg_replication_origin_status
      • 52.81 pg_roles
      • 52.85. pg_settings
      • 52.87. pg_stats
    • 53. Frontend/Backend Protocol
      • 53.1. Overview
      • 53.2. Message Flow
      • 53.3. SASL Authentication
      • 53.4. Streaming Replication Protocol
      • 53.5. Logical Streaming Replication Protocol
      • 53.6. Message Data Types
      • 53.7. Message Formats
      • 53.8. Error and Notice Message Fields
      • 53.9. Logical Replication Message Formats
      • 53.10. Summary of Changes since Protocol 2.0
    • 54. PostgreSQL 程式撰寫慣例
      • 54.1. Formatting
      • 54.2. Reporting Errors Within the Server
      • 54.3. Error Message Style Guide
      • 54.4. Miscellaneous Coding Conventions
    • 56. Writing A Procedural Language Handler
    • 64. GiST Indexes
      • 64.1. Introduction
      • 64.2. Built-in Operator Classes
      • 64.3. Extensibility
      • 64.4. Implementation
      • 64.5. Examples
    • 65. SP-GiST Indexes
      • 65.1. Introduction
      • 65.2. Built-in Operator Classes
      • 65.3. Extensibility
      • 65.4. Implementation
      • 65.5. Examples
    • 66. GIN 索引
      • 66.1. 簡介
      • 66.2. 內建運算子類
      • 66.3. Extensibility
      • 66.4. Implementation
      • 66.5. GIN Tips and Tricks
      • 66.6. Limitations
      • 66.7. Examples
    • 67. BRIN Indexes
      • 67.1. Introduction
      • 67.2. Built-in Operator Classes
      • 67.3. Extensibility
    • 68. 資料庫實體儲存格式
      • 68.2. TOAST
      • 68.4 可視性映射表(Visibility Map)
    • 70. How the Planner Uses Statistics
      • 70.2. Multivariate Statistics Examples
  • VIII. 附錄
    • A. PostgreSQL錯誤代碼
    • B. 日期時間格式支援
      • B.1. 日期時間解譯流程
      • B.2. 日期時間慣用字
      • B.3. 日期時間設定檔
      • B.4. 日期時間的沿革
    • C. SQL 關鍵字
    • D. SQL 相容性
    • E. 版本資訊
    • F. 延伸支援模組
      • F.4. auto_explain
      • F.11. dblink
        • dblink
      • F.33. pg_visibility
    • G. Additional Supplied Programs
      • G.1. Client Applications
        • oid2name
        • vacuumlo
      • G.2. Server Applications
        • pg_standby
    • H. 外部專案
      • H.1. 用戶端介面
      • H.2. Administration Tools
      • H.3. Procedural Languages
      • H.4. Extensions
    • I. The Source Code Repository
      • I.1. Getting The Source via Git
    • J. 文件取得
    • K. 縮寫字
  • 參考書目
Powered by GitBook
On this page

Was this helpful?

Edit on Git
Export as PDF
  1. III. 系統管理
  2. 30. 高可靠度及預寫日誌

30.4. WAL Configuration

Previous30.3. Asynchronous CommitNext30.5. WAL Internals

Last updated 6 years ago

Was this helpful?

There are several WAL-related configuration parameters that affect database performance. This section explains their use. Consult for general information about setting server configuration parameters.

Checkpoints are points in the sequence of transactions at which it is guaranteed that the heap and index data files have been updated with all information written before that checkpoint. At checkpoint time, all dirty data pages are flushed to disk and a special checkpoint record is written to the log file. (The change records were previously flushed to the WAL files.) In the event of a crash, the crash recovery procedure looks at the latest checkpoint record to determine the point in the log (known as the redo record) from which it should start the REDO operation. Any changes made to data files before that point are guaranteed to be already on disk. Hence, after a checkpoint, log segments preceding the one containing the redo record are no longer needed and can be recycled or removed. (When WAL archiving is being done, the log segments must be archived before being recycled or removed.)

The checkpoint requirement of flushing all dirty data pages to disk can cause a significant I/O load. For this reason, checkpoint activity is throttled so that I/O begins at checkpoint start and completes before the next checkpoint is due to start; this minimizes performance degradation during checkpoints.

The server's checkpointer process automatically performs a checkpoint every so often. A checkpoint is begun every seconds, or if is about to be exceeded, whichever comes first. The default settings are 5 minutes and 1 GB, respectively. If no WAL has been written since the previous checkpoint, new checkpoints will be skipped even if checkpoint_timeout has passed. (If WAL archiving is being used and you want to put a lower limit on how often files are archived in order to bound potential data loss, you should adjust the parameter rather than the checkpoint parameters.) It is also possible to force a checkpoint by using the SQL command CHECKPOINT.

Reducing checkpoint_timeout and/or max_wal_size causes checkpoints to occur more often. This allows faster after-crash recovery, since less work will need to be redone. However, one must balance this against the increased cost of flushing dirty data pages more often. If is set (as is the default), there is another factor to consider. To ensure data page consistency, the first modification of a data page after each checkpoint results in logging the entire page content. In that case, a smaller checkpoint interval increases the volume of output to the WAL log, partially negating the goal of using a smaller interval, and in any case causing more disk I/O.

Checkpoints are fairly expensive, first because they require writing out all currently dirty buffers, and second because they result in extra subsequent WAL traffic as discussed above. It is therefore wise to set the checkpointing parameters high enough so that checkpoints don't happen too often. As a simple sanity check on your checkpointing parameters, you can set the parameter. If checkpoints happen closer together than checkpoint_warning seconds, a message will be output to the server log recommending increasing max_wal_size. Occasional appearance of such a message is not cause for alarm, but if it appears often then the checkpoint control parameters should be increased. Bulk operations such as large COPY transfers might cause a number of such warnings to appear if you have not set max_wal_size high enough.

To avoid flooding the I/O system with a burst of page writes, writing dirty buffers during a checkpoint is spread over a period of time. That period is controlled by , which is given as a fraction of the checkpoint interval. The I/O rate is adjusted so that the checkpoint finishes when the given fraction of checkpoint_timeout seconds have elapsed, or before max_wal_size is exceeded, whichever is sooner. With the default value of 0.5,PostgreSQL can be expected to complete each checkpoint in about half the time before the next checkpoint starts. On a system that's very close to maximum I/O throughput during normal operation, you might want to increasecheckpoint_completion_target to reduce the I/O load from checkpoints. The disadvantage of this is that prolonging checkpoints affects recovery time, because more WAL segments will need to be kept around for possible use in recovery. Although checkpoint_completion_target can be set as high as 1.0, it is best to keep it less than that (perhaps 0.9 at most) since checkpoints include some other activities besides writing dirty buffers. A setting of 1.0 is quite likely to result in checkpoints not being completed on time, which would result in performance loss due to unexpected variation in the number of WAL segments needed.

On Linux and POSIX platforms allows to force the OS that pages written by the checkpoint should be flushed to disk after a configurable number of bytes. Otherwise, these pages may be kept in the OS's page cache, inducing a stall when fsync is issued at the end of a checkpoint. This setting will often help to reduce transaction latency, but it also can an adverse effect on performance; particularly for workloads that are bigger than , but smaller than the OS's page cache.

The number of WAL segment files in pg_wal directory depends on min_wal_size, max_wal_size and the amount of WAL generated in previous checkpoint cycles. When old log segment files are no longer needed, they are removed or recycled (that is, renamed to become future segments in the numbered sequence). If, due to a short-term peak of log output rate, max_wal_size is exceeded, the unneeded segment files will be removed until the system gets back under this limit. Below that limit, the system recycles enough WAL files to cover the estimated need until the next checkpoint, and removes the rest. The estimate is based on a moving average of the number of WAL files used in previous checkpoint cycles. The moving average is increased immediately if the actual usage exceeds the estimate, so it accommodates peak usage rather than average usage to some extent. min_wal_size puts a minimum on the amount of WAL files recycled for future usage; that much WAL is always recycled for future use, even if the system is idle and the WAL usage estimate suggests that little WAL is needed.

Independently of max_wal_size, + 1 most recent WAL files are kept at all times. Also, if WAL archiving is used, old segments can not be removed or recycled until they are archived. If WAL archiving cannot keep up with the pace that WAL is generated, or if archive_command fails repeatedly, old WAL files will accumulate in pg_wal until the situation is resolved. A slow or failed standby server that uses a replication slot will have the same effect (see ).

In archive recovery or standby mode, the server periodically performs restartpoints, which are similar to checkpoints in normal operation: the server forces all its state to disk, updates the pg_control file to indicate that the already-processed WAL data need not be scanned again, and then recycles any old log segment files in the pg_wal directory. Restartpoints can't be performed more frequently than checkpoints in the master because restartpoints can only be performed at checkpoint records. A restartpoint is triggered when a checkpoint record is reached if at least checkpoint_timeout seconds have passed since the last restartpoint, or if WAL size is about to exceed max_wal_size. However, because of limitations on when a restartpoint can be performed, max_wal_size is often exceeded during recovery, by up to one checkpoint cycle's worth of WAL. (max_wal_size is never a hard limit anyway, so you should always leave plenty of headroom to avoid running out of disk space.)

There are two commonly used internal WAL functions: XLogInsertRecord and XLogFlush. XLogInsertRecord is used to place a new record into the WAL buffers in shared memory. If there is no space for the new record, XLogInsertRecord will have to write (move to kernel cache) a few filled WAL buffers. This is undesirable because XLogInsertRecord is used on every database low level modification (for example, row insertion) at a time when an exclusive lock is held on affected data pages, so the operation needs to be as fast as possible. What is worse, writing WAL buffers might also force the creation of a new log segment, which takes even more time. Normally, WAL buffers should be written and flushed by an XLogFlush request, which is made, for the most part, at transaction commit time to ensure that transaction records are flushed to permanent storage. On systems with high log output, XLogFlush requests might not occur often enough to prevent XLogInsertRecord from having to do writes. On such systems one should increase the number of WAL buffers by modifying the parameter. When is set and the system is very busy, setting wal_buffers higher will help smooth response times during the period immediately following each checkpoint.

The parameter defines for how many microseconds a group commit leader process will sleep after acquiring a lock within XLogFlush, while group commit followers queue up behind the leader. This delay allows other server processes to add their commit records to the WAL buffers so that all of them will be flushed by the leader's eventual sync operation. No sleep will occur if is not enabled, or if fewer than other sessions are currently in active transactions; this avoids sleeping when it's unlikely that any other session will commit soon. Note that on some platforms, the resolution of a sleep request is ten milliseconds, so that any nonzero commit_delaysetting between 1 and 10000 microseconds would have the same effect. Note also that on some platforms, sleep operations may take slightly longer than requested by the parameter.

Since the purpose of commit_delay is to allow the cost of each flush operation to be amortized across concurrently committing transactions (potentially at the expense of transaction latency), it is necessary to quantify that cost before the setting can be chosen intelligently. The higher that cost is, the more effective commit_delay is expected to be in increasing transaction throughput, up to a point. The program can be used to measure the average time in microseconds that a single WAL flush operation takes. A value of half of the average time the program reports it takes to flush after a single 8kB write operation is often the most effective setting for commit_delay, so this value is recommended as the starting point to use when optimizing for a particular workload. While tuning commit_delay is particularly useful when the WAL log is stored on high-latency rotating disks, benefits can be significant even on storage media with very fast sync times, such as solid-state drives or RAID arrays with a battery-backed write cache; but this should definitely be tested against a representative workload. Higher values of commit_siblings should be used in such cases, whereas smaller commit_siblings values are often helpful on higher latency media. Note that it is quite possible that a setting of commit_delay that is too high can increase transaction latency by so much that total transaction throughput suffers.

When commit_delay is set to zero (the default), it is still possible for a form of group commit to occur, but each group will consist only of sessions that reach the point where they need to flush their commit records during the window in which the previous flush operation (if any) is occurring. At higher client counts a “gangway effect” tends to occur, so that the effects of group commit become significant even when commit_delay is zero, and thus explicitly settingcommit_delay tends to help less. Setting commit_delay can only help when (1) there are some concurrently committing transactions, and (2) throughput is limited to some degree by commit rate; but with high rotational latency this setting can be effective in increasing transaction throughput with as few as two clients (that is, a single committing client with one sibling transaction).

The parameter determines how PostgreSQL will ask the kernel to force WAL updates out to disk. All the options should be the same in terms of reliability, with the exception of fsync_writethrough, which can sometimes force a flush of the disk cache even when other options do not do so. However, it's quite platform-specific which one will be the fastest. You can test the speeds of different options using the program. Note that this parameter is irrelevant if fsync has been turned off.

Enabling the configuration parameter (provided that PostgreSQL has been compiled with support for it) will result in each XLogInsertRecord and XLogFlush WAL call being logged to the server log. This option might be replaced by a more general mechanism in the future.

Chapter 19
checkpoint_timeout
max_wal_size
archive_timeout
full_page_writes
checkpoint_warning
checkpoint_completion_target
checkpoint_flush_after
shared_buffers
wal_keep_segments
Section 26.2.6
wal_buffers
full_page_writes
commit_delay
fsync
commit_siblings
pg_test_fsync
wal_sync_method
pg_test_fsync
wal_debug