PostgreSQL 正體中文使用手冊
PostgreSQL.TW官方使用手冊小島故事加入社團
12
12
  • 簡介
  • 前言
    • 1. 什麼是 PostgreSQL?
    • 2. PostgreSQL 沿革
    • 3. 慣例
    • 4. 其他參考資訊
    • 5. 問題回報指南
  • I. 新手教學
    • 1. 入門指南
      • 1.1. 安裝
      • 1.2. 基礎架構
      • 1.3. 建立一個資料庫
      • 1.4. 存取一個資料庫
    • 2. SQL 查詢語言
      • 2.1. 簡介
      • 2.2. 概念
      • 2.3. 創建一個新的資料表
      • 2.4. 資料列是資料表的組成單位
      • 2.5. 資料表的查詢
      • 2.6. 交叉查詢
      • 2.7. 彙總查詢
      • 2.8. 更新資料
      • 2.9. 刪除資料
    • 3. 先進功能
      • 3.1. 簡介
      • 3.2. 檢視表(View)
      • 3.3. 外部索引鍵
      • 3.4. 交易安全
      • 3.5. 窗函數
      • 3.6. 繼承
      • 3.7. 結論
  • II. SQL 查詢語言
    • 4. SQL 語法
      • 4.1. 語法結構
      • 4.2. 參數表示式
      • 4.3. 函數呼叫
    • 5. 定義資料結構
      • 5.1. 認識資料表
      • 5.2. 預設值
      • 5.3. Generated Columns
      • 5.4. 限制條件
      • 5.5. 系統欄位
      • 5.6. 表格變更
      • 5.7. 權限
      • 5.8. 資料列安全原則
      • 5.9. Schemas
      • 5.10. 繼承
      • 5.11. 分割資料表
      • 5.12. 外部資料
      • 5.13. 其他資料庫物件
      • 5.14. 相依性追蹤
    • 6. 資料處理
      • 6.1. 新增資料
      • 6.2. 更新資料
      • 6.3. 刪除資料
      • 6.4. 修改並回傳資料
    • 7. 資料查詢
      • 7.1. 概觀
      • 7.2. 資料表表示式
      • 7.3. 取得資料列表
      • 7.4. 合併查詢結果
      • 7.5. 資料排序
      • 7.6. 指定資料範圍
      • 7.7. 列舉資料
      • 7.8. WITH Querys(Common Table Expressions)
    • 8. 資料型別
      • 8.1. 數字型別
      • 8.2. 貨幣型別
      • 8.3. 字串型別
      • 8.4. 位元組型別(bytea)
      • 8.5. 日期時間型別
      • 8.6. 布林型別
      • 8.7. 列舉型別
      • 8.8. 地理資訊型別
      • 8.9. 網路資訊型別
      • 8.10. 位元字串型別
      • 8.11. 全文檢索型別
      • 8.12. UUID 型別
      • 8.13. XML 型別
      • 8.14. JSON 型別
      • 8.15. 陣列
      • 8.16. 複合型別
      • 8.17. 範圍型別
      • 8.18. Domain Types
      • 8.19. 物件指標型別
      • 8.20. pg_lsn 型別
      • 8.21. 概念型別
    • 9. 函式及運算子
      • 9.1. 邏輯運算子
      • 9.2. 比較函式及運算子
      • 9.3. 數學函式及運算子
      • 9.4. 字串函式及運算子
      • 9.5. 位元字串函式及運算子
      • 9.6. 二元字串函式及運算子
      • 9.7. 特徵比對
      • 9.8. 型別轉換函式
      • 9.9 日期時間函式及運算子
      • 9.10. 列舉型別函式
      • 9.11. 地理資訊函式及運算子
      • 9.12. 網路位址函式及運算子
      • 9.13. 文字檢索函式及運算子
      • 9.14. XML 函式
      • 9.15. JSON 函式及運算子
      • 9.16. 序列函式
      • 9.17. 條件表示式
      • 9.18. 陣列函式及運算子
      • 9.19. 範圍函式及運算子
      • 9.20. 彙總函數
      • 9.21. Window 函式
      • 9.22. 子查詢
      • 9.23. 資料列與陣列的比較運算
      • 9.24. 集合回傳函式
      • 9.25. 系統資訊函數
      • 9.26. 系統管理函式
      • 9.27. 觸發函式
      • 9.28. 事件觸發函式
      • 9.29. Statistics Information Functions
    • 10. 型別轉換
      • 10.1. 概觀
      • 10.2. 運算子
      • 10.3. 函式
      • 10.4. 資料儲存轉換規則
      • 10.5. UNION、CASE 等相關結構
      • 10.6. SELECT 輸出規則
    • 11. 索引(Index)
      • 11.1. 簡介
      • 11.2. 索引型別
      • 11.3. 多欄位索引
      • 11.4. 索引與 ORDER BY
      • 11.5. 善用多個索引
      • 11.6. 唯一值索引
      • 11.7. 表示式索引
      • 11.8. 部份索引(partial index)
      • 11.9. 運算子物件及家族
      • 11.10. 索引與排序規則
      • 11.11. 索引限定查詢(Index-only scan)
      • 11.12. 檢查索引運用
    • 12. 全文檢索
      • 12.1. 簡介
      • 12.2. 查詢與索引
      • 12.3. 細部控制
      • 12.4. 延伸功能
      • 12.5. 斷詞
      • 12.6. 字典
      • 12.7. 組態範例
      • 12.8. 測試與除錯
      • 12.9. GIN 及 GiST 索引型別
      • 12.10. psql支援
      • 12.11. 功能限制
    • 13. 一致性管理(MVCC)
      • 13.1. 簡介
      • 13.2. 交易隔離
      • 13.3. 鎖定模式
      • 13.4. 在應用端檢視資料一致性
      • 13.5. 特別注意
      • 13.6. 鎖定與索引
    • 14. 效能技巧
      • 14.1. 善用 EXPLAIN
      • 14.2. 統計資訊
      • 14.3. 使用確切的 JOIN 方式
      • 14.4. 快速建立資料庫內容
      • 14.5. 風險性彈性設定
    • 15. 平行查詢
      • 15.1. 如何運作?
      • 15.2. 啓用時機?
      • 15.3. 平行查詢計畫
      • 15.4. 平行查詢的安全性
  • III. 系統管理
    • 16. 用原始碼安裝
      • 16.1. Short Version
      • 16.2. Requirements
      • 16.3. Getting The Source
      • 16.4. 安裝流程
      • 16.5. Post-Installation Setup
      • 16.6. Supported Platforms
      • 16.7. 平台相關的注意事項
    • 17. 用原始碼在 Windows 上安裝
      • 17.1. Building with Visual C++ or the Microsoft Windows SDK
    • 18. 服務配置與維運
      • 18.1. PostgreSQL 使用者帳號
      • 18.2. Creating a Database Cluster
      • 18.3. Starting the Database Server
      • 18.4. 核心資源管理
      • 18.5. Shutting Down the Server
      • 18.6. Upgrading a PostgreSQL Cluster
      • 18.7. Preventing Server Spoofing
      • 18.8. Encryption Options
      • 18.9. Secure TCP/IP Connections with SSL
      • 18.10. Secure TCP/IP Connections with GSSAPI Encryption
      • 18.11. Secure TCP/IP Connections with SSH Tunnels
      • 18.12. 在 Windows 註冊事件日誌
    • 19. 服務組態設定
      • 19.1. Setting Parameters
      • 19.2. File Locations
      • 19.3. 連線與認證
      • 19.4. 資源配置
      • 19.5. Write Ahead Log
      • 19.6. 複寫(Replication)
      • 19.7. 查詢規畫
      • 19.8. 錯誤回報與日誌記錄
      • 19.9. 執行階段統計資訊
      • 19.10. 自動資料庫清理
      • 19.11. 用戶端連線預設參數
      • 19.12. 交易鎖定管理
      • 19.13. 版本與平台的相容性
      • 19.14. Error Handling
      • 19.15. 預先配置的參數
      • 19.16. Customized Options
      • 19.17. Developer Options
      • 19.18. Short Options
    • 20. 使用者認證
      • 20.1. 設定檔:pg_hba.conf
      • 20.2. User Name Maps
      • 20.3. Authentication Methods
      • 20.4. Authentication Problems
      • 20.5. Password Authentication
      • 20.12. Certificate Authentication
      • 20.13. PAM Authentication
    • 21. 資料庫角色
      • 21.1. Database Roles
      • 21.2. Role Attributes
      • 21.3. Role Membership
      • 21.4. 移除角色
      • 21.5. Default Roles
      • 21.6. Function Security
    • 22. Managing Databases
      • 22.1. Overview
      • 22.2. Creating a Database
      • 22.3. 樣版資料庫
      • 22.4. Database Configuration
      • 22.5. Destroying a Database
      • 22.6. Tablespaces
    • 23. 語系
      • 23.1. 語系支援
      • 23.2. Collation Support
      • 23.3. 字元集支援
    • 24. 例行性資料庫維護工作
      • 24.1. 例行性資料清理
      • 24.2. 定期重建索引
      • 24.3. Log File Maintenance
    • 25. 備份及還原
      • 25.1. SQL Dump
      • 25.2. 檔案系統層級備份
      • 25.3. Continuous Archiving and Point-in-Time Recovery (PITR)
    • 26. High Availability, Load Balancing, and Replication
      • 26.1. Comparison of Different Solutions
      • 26.2. 日誌轉送備用伺服器 Log-Shipping Standby Servers
      • 26.3. Failover
      • 26.4. Alternative Method for Log Shipping
      • 26.5. Hot Standby
    • 27. 監控資料庫活動
      • 27.1. Standard Unix Tools
      • 27.2. 統計資訊收集器
      • 27.3. Viewing Locks
      • 27.4. Progress Reporting
      • 27.5. Dynamic Tracing
    • 28. 監控磁碟使用情況
      • 28.1. 瞭解磁碟使用情形
      • 28.2. 磁碟空間不足錯誤
    • 29. 高可靠度及預寫日誌
      • 29.1. 可靠度
      • 29.2. Write-Ahead Logging(WAL)
      • 29.3. Asynchronous Commit
      • 29.4. WAL Configuration
      • 29.5. WAL Internals
    • 30. 邏輯複寫(Logical Replication)
      • 30.1. 發佈(Publication)
      • 30.2. 訂閱(Subscription)
      • 30.3. 衝突處理
      • 30.4. 限制
      • 30.5. 架構
      • 30.6. 監控
      • 30.7. 安全性
      • 30.8. 系統設定
      • 30.9. 快速設定
    • 31. Just-in-Time Compilation(JIT)
      • 31.1. What is JIT compilation?
      • 31.2. When to JIT?
      • 31.3. Configuration
      • 31.4. Extensibility
    • 32. 迴歸測試
      • 32.1. Running the Tests
      • 32.2. Test Evaluation
      • 32.3. Variant Comparison Files
      • 32.4. TAP Tests
      • 32.5. Test Coverage Examination
  • IV. 用戶端介面
    • 33. libpq - C Library
      • 33.1. 資料庫連線控制函數
      • 33.2. 連線狀態函數
      • 33.3. Command Execution Functions
      • 33.4. Asynchronous Command Processing
      • 33.5. Retrieving Query Results Row-By-Row
      • 33.6. Canceling Queries in Progress
      • 33.7. The Fast-Path Interface
      • 33.8. Asynchronous Notification
      • 33.9. Functions Associated with the COPY Command
      • 33.10. Control Functions
      • 33.11. Miscellaneous Functions
      • 33.12. Notice Processing
      • 33.13. Event System
      • 33.14. 環境變數
      • 33.15. 密碼檔
      • 33.16. The Connection Service File
      • 33.17. LDAP Lookup of Connection Parameters
      • 33.18. SSL Support
      • 33.19. Behavior in Threaded Programs
      • 33.20. Building libpq Programs
      • 33.21. Example Programs
    • 34. Large Objects
      • 35.1. Introduction
      • 35.2. Implementation Features
      • 35.3. Client Interfaces
      • 35.4. Server-side Functions
      • 35.5. Example Program
    • 35. ECPG - Embedded SQL in C
      • 36.1. The Concept
      • 36.2. Managing Database Connections
      • 36.3. Running SQL Commands
      • 36.4. Using Host Variables
      • 36.5. Dynamic SQL
      • 36.6. pgtypes Library
      • 36.7. Using Descriptor Areas
      • 36.8. Error Handling
      • 36.9. Preprocessor Directives
      • 36.10. Processing Embedded SQL Programs
      • 36.11. Library Functions
      • 36.12. Large Objects
      • 36.13. C++ Applications
      • 36.14. Embedded SQL Commands
      • 36.15. Informix Compatibility Mode
      • 36.16. Internals
    • 36. The Information Schema
      • 36.1. The Schema
      • 36.2. Data Types
      • 36.3. information_schema_catalog_name
      • 36.4. administrable_role_authorizations
      • 36.5. applicable_roles
      • 36.6. attributes
      • 36.7. character_sets
      • 36.8. check_constraint_routine_usage
      • 36.9. check_constraints
      • 36.10. collations
      • 36.11. collation_character_set_applicability
      • 36.12. column_domain_usage
      • 36.13. column_options
      • 36.14. column_privileges
      • 36.15. column_udt_usage
      • 36.17. columns
      • 36.18. constraint_column_usage
      • 37.18. constraint_table_usage
      • 37.19. data_type_privileges
      • 37.20. domain_constraints
      • 37.21. domain_udt_usage
      • 37.22. domains
      • 37.23. element_types
      • 37.24. enabled_roles
      • 37.25. foreign_data_wrapper_options
      • 37.26. foreign_data_wrappers
      • 37.27. foreign_server_options
      • 37.28. foreign_servers
      • 37.29. foreign_table_options
      • 37.30. foreign_tables
      • 36.32. key_column_usage
      • 37.32. parameters
      • 36.34. referential_constraints
      • 37.34. role_column_grants
      • 37.35. role_routine_grants
      • 37.36. role_table_grants
      • 37.37. role_udt_grants
      • 37.38. role_usage_grants
      • 37.39. routine_privileges
      • 37.40. routines
      • 36.42. schemata
      • 37.42. sequences
      • 37.43. sql_features
      • 37.44. sql_implementation_info
      • 37.45. sql_languages
      • 37.46. sql_packages
      • 37.47. sql_parts
      • 37.48. sql_sizing
      • 37.49. sql_sizing_profiles
      • 36.51. table_constraints
      • 37.51. table_privileges
      • 37.52. tables
      • 37.53. transforms
      • 37.54. triggered_update_columns
      • 37.55. triggers
      • 37.56. udt_privileges
      • 37.57. usage_privileges
      • 37.58. user_defined_types
      • 37.59. user_mapping_options
      • 37.60. user_mappings
      • 37.61. view_column_usage
      • 37.62. view_routine_usage
      • 37.63. view_table_usage
      • 37.64. views
  • V. 資料庫程式設計
    • 37. SQL 延伸功能
      • 37.1. How Extensibility Works
      • 37.2. The PostgreSQL Type System
      • 37.3. 使用者自訂函數
      • 37.4. User-defined Procedures
      • 37.5. Query Language (SQL) Functions
      • 37.6. Function Overloading
      • 37.7. 函數易變性類別
      • 37.8. Procedural Language Functions
      • 37.9. Internal Functions
      • 37.10. C-Language Functions
      • 37.11. Function Optimization Information
      • 37.12. User-defined Aggregates
      • 37.13. User-defined Types
      • 37.14. User-defined Operators
      • 37.15. Operator Optimization Information
      • 37.16. Interfacing Extensions To Indexes
      • 37.17. 封裝相關物件到延伸功能中
      • 37.18. Extension Building Infrastructure
    • 38. Triggers
    • 39. Event Triggers
    • 40. 規則系統
      • 40.1. The Query Tree
      • 40.2. Views and the Rule System
      • 40.3. Materialized Views
      • 40.4. Rules on INSERT, UPDATE, and DELETE
      • 40.5. 規則及權限
      • 40.6. Rules and Command Status
      • 40.7. Rules Versus Triggers
    • 41. Procedural Languages(程序語言)
      • 41.1. Installing Procedural Languages
      • 41.2. Structure of PL/pgSQL
      • 41.5. Basic Statements
      • 41.11. 深入了解 PL/pgSQL
    • 43. PL/pgSQL - SQL Procedural Language
      • 43.5. 基本語法
    • 44. PL/Tcl - Tcl Procedural Language
    • 45. PL/Python - Python Procedural Language
      • 45.1. Python 2 vs. Python 3
      • 45.2. PL/Python Functions
      • 45.3. Data Values
      • 45.4. Sharing Data
      • 45.5. Anonymous Code Blocks
      • 45.6. Trigger Functions
      • 45.7. Database Access
      • 45.8. Explicit Subtransactions
      • 45.9. Transaction Management
      • 45.10. Utility Functions
      • 45.11. Environment Variables
    • 46. PL/Python - Python Procedural Language
    • 47. Server Programming Interface
    • 48. Background Worker Processes
    • 49. Logical Decoding
    • 50. Replication Progress Tracking
  • VI. 參考資訊
    • I. SQL 指令
      • ALTER DATABASE
      • ALTER DEFAULT PRIVILEGES
      • ALTER EXTENSION
      • ALTER FUNCTION
      • ALTER INDEX
      • ALTER LANGUAGE
      • ALTER MATERIALIZED VIEW
      • ALTER POLICY
      • ALTER PUBLICATION
      • ALTER ROLE
      • ALTER RULE
      • ALTER SCHEMA
      • ALTER SEQUENCE
      • ALTER STATISTICS
      • ALTER SUBSCRIPTION
      • ALTER SYSTEM
      • ALTER TABLE
      • ALTER TABLESPACE
      • ALTER TRIGGER
      • ALTER TYPE
      • ALTER VIEW
      • ANALYZE
      • CLUSTER
      • COMMENT
      • COPY
      • CREATE CAST
      • CREATE DATABASE
      • CREATE EXTENSION
      • CREATE FOREIGN TABLE
      • CREATE FOREIGN DATA WRAPPER
      • CREATE FUNCTION
      • CREATE INDEX
      • CREATE LANGUAGE
      • CREATE MATERIALIZED VIEW
      • CREATE DOMAIN
      • CREATE POLICY
      • CREATE PROCEDURE
      • CREATE PUBLICATION
      • CREATE ROLE
      • CREATE RULE
      • CREATE SCHEMA
      • CREATE SEQUENCE
      • CREATE SERVER
      • CREATE STATISTICS
      • CREATE SUBSCRIPTION
      • CREATE TABLE
      • CREATE TABLE AS
      • CREATE TABLESPACE
      • CREATE TRANSFORM
      • CREATE TRIGGER
      • CREATE TYPE
      • CREATE USER
      • CREATE USER MAPPING
      • CREATE VIEW
      • DELETE
      • DO
      • DROP DATABASE
      • DROP EXTENSION
      • DROP FUNCTION
      • DROP INDEX
      • DROP LANGUAGE
      • DROP MATERIALIZED VIEW
      • DROP OWNED
      • DROP POLICY
      • DROP ROLE
      • DROP RULE
      • DROP SCHEMA
      • DROP SEQUENCE
      • DROP STATISTICS
      • DROP SUBSCRIPTION
      • DROP TABLE
      • DROP TABLESPACE
      • DROP TRANSFORM
      • DROP TRIGGER
      • DROP TYPE
      • DROP USER
      • DROP VIEW
      • EXECUTE
      • EXPLAIN
      • GRANT
      • IMPORT FOREIGN SCHEMA
      • INSERT
      • LISTEN
      • LOAD
      • NOTIFY
      • PREPARE TRANSACTION
      • REASSIGN OWNED
      • REFRESH MATERIALIZED VIEW
      • REINDEX
      • RESET
      • REVOKE
      • SECURITY LABEL
      • SELECT
      • SELECT INTO
      • SET
      • SET CONSTRAINTS
      • SET ROLE
      • SET SESSION AUTHORIZATION
      • SET TRANSACTION
      • SHOW
      • TRUNCATE
      • UNLISTEN
      • UPDATE
      • VACUUM
      • VALUES
    • II. PostgreSQL 用戶端工具
      • createdb
      • createuser
      • dropdb
      • dropuser
      • oid2name
      • pgbench
      • pg_basebackup
      • pg_dump
      • pg_dumpall
      • pg_restore
      • psql
      • vacuumdb
    • III. PostgreSQL 伺服器應用程式
      • initdb
      • pg_archivecleanup
      • pg_ctl
      • pg_standby
      • pg_test_timing
      • postgres
  • VII. 資料庫進階
    • 50. PostgreSQL 的內部架構
      • 50.1. The Path of a Query
      • 50.2. How Connections Are Established
      • 50.3. The Parser Stage
      • 50.4. The PostgreSQL Rule System
      • 50.5. Planner/Optimizer
      • 50.6. Executor
    • 51. 系統目錄
      • 51.3. pg_am
      • 51.7. pg_attribute
      • 51.8. pg_authid
      • 51.9. pg_auth_members
      • 51.11 pg_class
      • 51.12. pg_collation
      • 51.13. pg_constraint
      • 51.15 pg_database
      • 51.22. pg_extension
      • 51.26 pg_index
      • 51.29. pg_language
      • 51.32. pg_namespace
      • 51.33. pg_opclass
      • 51.38. pg_policy
      • 51.39. pg_proc
      • 51.44. pg_rewrite
      • 51.50. pg_statistic
      • 51.51. pg_statistic_ext
      • 51.54. pg_tablespace
      • 51.56. pg_trigger
      • 51.62. pg_type
      • 51.66. pg_available_extensions
      • 51.67. pg_available_extension_versions
      • 51.72. pg_hba_file_rules
      • 51.73. pg_indexes
      • 51.74. pg_locks
      • 51.79. pg_replication_origin_status
      • 51.82 pg_roles
      • 51.85. pg_settings
      • 51.87. pg_stats
      • 51.90. pg_tables
      • 51.95. pg_views
    • 52. Frontend/Backend Protocol
      • 52.1. Overview
      • 52.2. Message Flow
      • 52.3. SASL Authentication
      • 52.4. Streaming Replication Protocol
      • 52.5. Logical Streaming Replication Protocol
      • 52.6. Message Data Types
      • 52.7. Message Formats
      • 52.8. Error and Notice Message Fields
      • 52.9. Logical Replication Message Formats
      • 52.10. Summary of Changes since Protocol 2.0
    • 53. PostgreSQL 程式撰寫慣例
      • 53.1. Formatting
      • 53.2. Reporting Errors Within the Server
      • 53.3. Error Message Style Guide
      • 53.4. Miscellaneous Coding Conventions
    • 54. Native Language Support
      • 54.1. For the Translator
      • 54.2. For the Programmer
    • 55. 撰寫程序語言的處理程序
    • 56. Writing a Foreign Data Wrapper
      • 56.1. Foreign Data Wrapper Functions
      • 56.2. Foreign Data Wrapper Callback Routines
      • 56.3. Foreign Data Wrapper Helper Functions
      • 56.4. Foreign Data Wrapper Query Planning
      • 56.5. Row Locking in Foreign Data Wrappers
    • 59. Genetic Query Optimizer
      • 59.1. Query Handling as a Complex Optimization Problem
      • 59.2. Genetic Algorithms
      • 59.3. Genetic Query Optimization (GEQO) in PostgreSQL
      • 59.4. Further Reading
    • 64. GiST Indexes
      • 64.1. Introduction
      • 64.2. Built-in Operator Classes
      • 64.3. Extensibility
      • 64.4. Implementation
      • 64.5. Examples
    • 65. SP-GiST Indexes
      • 65.1. Introduction
      • 65.2. Built-in Operator Classes
      • 65.3. Extensibility
      • 65.4. Implementation
      • 65.5. Examples
    • 66. GIN 索引
      • 66.1. 簡介
      • 66.2. 內建運算子類
      • 66.3. 延伸介面
      • 66.4. 實作說明
      • 66.5. GIN 小巧技
      • 66.6. 限制
      • 66.7. 範例
    • 67. BRIN Indexes
      • 67.1. Introduction
      • 67.2. Built-in Operator Classes
      • 67.3. Extensibility
    • 68. 資料庫實體儲存格式
      • 68.1. Database File Layout
      • 68.2. TOAST
      • 68.3. Free Space Map
      • 68.4 可視性映射表(Visibility Map)
      • 68.5. The Initialization Fork
      • 68.6. Database Page Layout
    • 70. 查詢計畫如何使用統計資訊
      • 70.1. Row Estimation Examples
      • 70.2. Multivariate Statistics Examples
      • 70.3. Planner Statistics and Security
  • VIII. 附錄
    • A. PostgreSQL 錯誤代碼
    • B. 日期時間格式支援
      • B.1. 日期時間解譯流程
      • B.2. 日期時間慣用字
      • B.3. 日期時間設定檔
      • B.4. 日期時間的沿革
    • C. SQL 關鍵字
    • D. SQL 相容性
    • E. 版本資訊
    • F. 延伸支援模組
      • F.4. auto_explain
      • F.6. btree_gin
      • F.10. dblink
        • dblink_connect
        • dblink_connect_u
        • dblink_disconnect
        • dblink
        • dblink_exec
        • dblink_open
        • dblink_fetch
        • dblink_close
        • dblink_get_connections
        • dblink_error_message
        • dblink_send_query
        • dblink_is_busy
        • dblink_get_notify
        • dblink_get_result
        • dblink_cancel_query
        • dblink_get_pkey
        • dblink_build_sql_insert
        • dblink_build_sql_delete
        • dblink_build_sql_update
      • F.13. earthdistance
      • F.14. file_fdw
      • F.29. pg_stat_statements
      • F.31. pg_trgm
      • F.32. pg_visibility
      • F.33. postgres_fdw
      • F.35. sepgsql
      • F.38. tablefunc
      • F.41. tsm_system_rows
      • F.42. tsm_system_time
    • G. Additional Supplied Programs
      • G.1. Client Applications
        • oid2name
        • vacuumlo
      • G.2. Server Applications
        • pg_standby
    • H. 外部專案
      • H.1. 用戶端介面
      • H.2. Administration Tools
      • H.3. Procedural Languages
      • H.4. Extensions
    • I. The Source Code Repository
      • I.1. Getting The Source via Git
    • J. 文件取得
    • K. 縮寫字
  • 參考書目
Powered by GitBook
On this page
  • 12.1.1. What Is a Document?
  • Note
  • 12.1.2. Basic Text Matching
  • 12.1.3. Configurations

Was this helpful?

Export as PDF
  1. II. SQL 查詢語言
  2. 12. 全文檢索

12.1. 簡介

Previous12. 全文檢索Next12.2. 查詢與索引

Last updated 7 years ago

Was this helpful?

Full Text Searching (or justtext search) provides the capability to identify natural-languagedocuments_that satisfy a_query, and optionally to sort them by relevance to the query. The most common type of search is to find all documents containing given_query terms_and return them in order of their_similarity_to the query. Notions ofqueryandsimilarityare very flexible and depend on the specific application. The simplest search considersqueryas a set of words andsimilarityas the frequency of query words in the document.

Textual search operators have existed in databases for years.PostgreSQLhas~,~*,LIKE, andILIKEoperators for textual data types, but they lack many essential properties required by modern information systems:

  • There is no linguistic support, even for English. Regular expressions are not sufficient because they cannot easily handle derived words, e.g.,satisfiesandsatisfy. You might miss documents that containsatisfies, although you probably would like to find them when searching forsatisfy. It is possible to useORto search for multiple derived forms, but this is tedious and error-prone (some words can have several thousand derivatives).

  • They provide no ordering (ranking) of search results, which makes them ineffective when thousands of matching documents are found.

  • They tend to be slow because there is no index support, so they must process all documents for every search.

Full text indexing allows documents to be_preprocessed_and an index saved for later rapid searching. Preprocessing includes:

  • Parsing documents intotokens. It is useful to identify various classes of tokens, e.g., numbers, words, complex words, email addresses, so that they can be processed differently. In principle token classes depend on the specific application, but for most purposes it is adequate to use a predefined set of classes.PostgreSQLuses a_parser_to perform this step. A standard parser is provided, and custom parsers can be created for specific needs.

  • Converting tokens intolexemes. A lexeme is a string, just like a token, but it has beennormalized_so that different forms of the same word are made alike. For example, normalization almost always includes folding upper-case letters to lower-case, and often involves removal of suffixes (such assoresin English). This allows searches to find variant forms of the same word, without tediously entering all the possible variants. Also, this step typically eliminates_stop words, which are words that are so common that they are useless for searching. (In short, then, tokens are raw fragments of the document text, while lexemes are words that are believed useful for indexing and searching.)PostgreSQLuses_dictionaries_to perform this step. Various standard dictionaries are provided, and custom ones can be created for specific needs.

  • Storing preprocessed documents optimized for searching. For example, each document can be represented as a sorted array of normalized lexemes. Along with the lexemes it is often desirable to store positional information to use forproximity ranking, so that a document that contains a more“dense”region of query words is assigned a higher rank than one with scattered query words.

Dictionaries allow fine-grained control over how tokens are normalized. With appropriate dictionaries, you can:

  • Define stop words that should not be indexed.

  • Map synonyms to a single word usingIspell.

  • Map phrases to a single word using a thesaurus.

  • Map different variations of a word to a canonical form using anIspelldictionary.

  • Map different variations of a word to a canonical form usingSnowballstemmer rules.

A data typetsvectoris provided for storing preprocessed documents, along with a typetsqueryfor representing processed queries (). There are many functions and operators available for these data types (), the most important of which is the match operator@@, which we introduce in. Full text searches can be accelerated using indexes ().

12.1.1. What Is a Document?

A_document_is the unit of searching in a full text search system; for example, a magazine article or email message. The text search engine must be able to parse documents and store associations of lexemes (key words) with their parent document. Later, these associations are used to search for documents that contain query words.

For searches withinPostgreSQL, a document is normally a textual field within a row of a database table, or possibly a combination (concatenation) of such fields, perhaps stored in several tables or obtained dynamically. In other words, a document can be constructed from different parts for indexing and it might not be stored anywhere as a whole. For example:

SELECT title || ' ' ||  author || ' ' ||  abstract || ' ' || body AS document
FROM messages
WHERE mid = 12;

SELECT m.title || ' ' || m.author || ' ' || m.abstract || ' ' || d.body AS document
FROM messages m, docs d
WHERE mid = did AND mid = 12;

Note

Actually, in these example queries,coalesceshould be used to prevent a singleNULLattribute from causing aNULLresult for the whole document.

Another possibility is to store the documents as simple text files in the file system. In this case, the database can be used to store the full text index and to execute searches, and some unique identifier can be used to retrieve the document from the file system. However, retrieving files from outside the database requires superuser permissions or special function support, so this is usually less convenient than keeping all the data insidePostgreSQL. Also, keeping everything inside the database allows easy access to document metadata to assist in indexing and display.

For text search purposes, each document must be reduced to the preprocessedtsvectorformat. Searching and ranking are performed entirely on thetsvectorrepresentation of a document — the original text need only be retrieved when the document has been selected for display to a user. We therefore often speak of thetsvectoras being the document, but of course it is only a compact representation of the full document.

12.1.2. Basic Text Matching

Full text searching inPostgreSQLis based on the match operator@@, which returnstrueif atsvector(document) matches atsquery(query). It doesn't matter which data type is written first:

SELECT 'a fat cat sat on a mat and ate a fat rat'::tsvector @@ 'cat 
&
 rat'::tsquery;
 ?column?
----------
 t

SELECT 'fat 
&
 cow'::tsquery @@ 'a fat cat sat on a mat and ate a fat rat'::tsvector;
 ?column?
----------
 f
SELECT to_tsvector('fat cats ate fat rats') @@ to_tsquery('fat 
&
 rat');
 ?column? 
----------
 t

Observe that this match would not succeed if written as

SELECT 'fat cats ate fat rats'::tsvector @@ to_tsquery('fat 
&
 rat');
 ?column? 
----------
 f

since here no normalization of the wordratswill occur. The elements of atsvectorare lexemes, which are assumed already normalized, soratsdoes not matchrat.

The@@operator also supportstextinput, allowing explicit conversion of a text string totsvectorortsqueryto be skipped in simple cases. The variants available are:

tsvector @@ tsquery
tsquery  @@ tsvector
text @@ tsquery
text @@ text

The first two of these we saw already. The formtext@@tsqueryis equivalent toto_tsvector(x) @@ y. The formtext@@textis equivalent toto_tsvector(x) @@ plainto_tsquery(y).

Within atsquery, the&(AND) operator specifies that both its arguments must appear in the document to have a match. Similarly, the|(OR) operator specifies that at least one of its arguments must appear, while the!(NOT) operator specifies that its argument must_not_appear in order to have a match. For example, the queryfat & ! ratmatches documents that containfatbut notrat.

Searching for phrases is possible with the help of the<->(FOLLOWED BY)tsqueryoperator, which matches only if its arguments have matches that are adjacent and in the given order. For example:

SELECT to_tsvector('fatal error') @@ to_tsquery('fatal 
<
-
>
 error');
 ?column? 
----------
 t

SELECT to_tsvector('error is not fatal') @@ to_tsquery('fatal 
<
-
>
 error');
 ?column? 
----------
 f

There is a more general version of the FOLLOWED BY operator having the form<N>, where_N_is an integer standing for the difference between the positions of the matching lexemes.<1>is the same as<->, while<2>allows exactly one other lexeme to appear between the matches, and so on. Thephraseto_tsqueryfunction makes use of this operator to construct atsquerythat can match a multi-word phrase when some of the words are stop words. For example:

SELECT phraseto_tsquery('cats ate rats');
       phraseto_tsquery        
-------------------------------
 'cat' 
<
-
>
 'ate' 
<
-
>
 'rat'

SELECT phraseto_tsquery('the cats ate the rats');
       phraseto_tsquery        
-------------------------------
 'cat' 
<
-
>
 'ate' 
<
2
>
 'rat'

A special case that's sometimes useful is that<0>can be used to require that two patterns match the same word.

Parentheses can be used to control nesting of thetsqueryoperators. Without parentheses,|binds least tightly, then&, then<->, and!most tightly.

It's worth noticing that the AND/OR/NOT operators mean something subtly different when they are within the arguments of a FOLLOWED BY operator than when they are not, because within FOLLOWED BY the exact position of the match is significant. For example, normally!xmatches only documents that do not containxanywhere. But!x <-> ymatchesyif it is not immediately after anx; an occurrence ofxelsewhere in the document does not prevent a match. Another example is thatx & ynormally only requires thatxandyboth appear somewhere in the document, but(x & y) <-> zrequiresxandyto match at the same place, immediately before az. Thus this query behaves differently fromx <-> z & y <-> z, which will match a document containing two separate sequencesx zandy z. (This specific query is useless as written, sincexandycould not match at the same place; but with more complex situations such as prefix-match patterns, a query of this form could be useful.)

12.1.3. Configurations

The above are all simple text search examples. As mentioned before, full text search functionality includes the ability to do many more things: skip indexing certain words (stop words), process synonyms, and use sophisticated parsing, e.g., parse based on more than just white space. This functionality is controlled bytext search configurations.PostgreSQLcomes with predefined configurations for many languages, and you can easily create your own configurations. (psql's\dFcommand shows all available configurations.)

Each text search function that depends on a configuration has an optionalregconfigargument, so that the configuration to use can be specified explicitly.default_text_search_configis used only when this argument is omitted.

To make it easier to build custom text search configurations, a configuration is built up from simpler database objects.PostgreSQL's text search facility provides four types of configuration-related database objects:

  • _Text search parsers_break documents into tokens and classify each token (for example, as words or numbers).

  • _Text search dictionaries_convert tokens to normalized form and reject stop words.

  • _Text search templates_provide the functions underlying dictionaries. (A dictionary simply specifies a template and a set of parameters for the template.)

  • _Text search configurations_select a parser and a set of dictionaries to use to normalize the tokens produced by the parser.

Text search parsers and templates are built from low-level C functions; therefore it requires C programming ability to develop new ones, and superuser privileges to install one into a database. (There are examples of add-on parsers and templates in thecontrib/area of thePostgreSQLdistribution.) Since dictionaries and configurations just parameterize and connect together some underlying parsers and templates, no special privilege is needed to create a new dictionary or configuration. Examples of creating custom dictionaries and configurations appear later in this chapter.

As the above example suggests, atsqueryis not just raw text, any more than atsvectoris. Atsquerycontains search terms, which must be already-normalized lexemes, and may combine multiple terms using AND, OR, NOT, and FOLLOWED BY operators. (For syntax details see.) There are functionsto_tsquery,plainto_tsquery, andphraseto_tsquerythat are helpful in converting user-written text into a propertsquery, primarily by normalizing words appearing in the text. Similarly,to_tsvectoris used to parse and normalize a document string. So in practice a text search match would look more like this:

During installation an appropriate configuration is selected andis set accordingly inpostgresql.conf. If you are using the same text search configuration for the entire cluster you can use the value inpostgresql.conf. To use different configurations throughout the cluster but the same configuration within any one database, useALTER DATABASE ... SET. Otherwise, you can setdefault_text_search_configin each session.

Section 8.11
Section 9.13
Section 12.1.2
Section 12.9
Section 8.11.2
default_text_search_config