12.3. Controlling Text Search

To implement full text searching there must be a function to create a tsvector from a document and a tsquery from a user query. Also, we need to return results in a useful order, so we need a function that compares documents with respect to their relevance to the query. It's also important to be able to display the results nicely. PostgreSQL provides support for all of these functions.

12.3.1. Parsing Documents

PostgreSQL provides the function to_tsvector for converting a document to the tsvector data type.

to_tsvector([ config regconfig, ] document text) returns tsvector

to_tsvector parses a textual document into tokens, reduces the tokens to lexemes, and returns a tsvector which lists the lexemes together with their positions in the document. The document is processed according to the specified or default text search configuration. Here is a simple example:

SELECT to_tsvector('english', 'a fat  cat sat on a mat - it ate a fat rats');
                  to_tsvector
-----------------------------------------------------
 'ate':9 'cat':3 'fat':2,11 'mat':7 'rat':12 'sat':4

In the example above we see that the resulting tsvector does not contain the words a, on, or it, the word rats became rat, and the punctuation sign - was ignored.

The to_tsvector function internally calls a parser which breaks the document text into tokens and assigns a type to each token. For each token, a list of dictionaries (Section 12.6) is consulted, where the list can vary depending on the token type. The first dictionary that recognizes the token emits one or more normalized lexemes to represent the token. For example, rats became rat because one of the dictionaries recognized that the word rats is a plural form of rat. Some words are recognized as stop words (Section 12.6.1), which causes them to be ignored since they occur too frequently to be useful in searching. In our example these are a, on, and it. If no dictionary in the list recognizes the token then it is also ignored. In this example that happened to the punctuation sign - because there are in fact no dictionaries assigned for its token type (Space symbols), meaning space tokens will never be indexed. The choices of parser, dictionaries and which types of tokens to index are determined by the selected text search configuration (Section 12.7). It is possible to have many different configurations in the same database, and predefined configurations are available for various languages. In our example we used the default configuration english for the English language.

The function setweight can be used to label the entries of a tsvector with a given weight, where a weight is one of the letters A, B, C, or D. This is typically used to mark entries coming from different parts of a document, such as title versus body. Later, this information can be used for ranking of search results.
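
As an aside (an illustrative sketch, not taken from the original text), setweight simply appends the chosen label to every position in the vector, assuming the default english configuration:

SELECT setweight(to_tsvector('english', 'The Fat Rats'), 'A');
     setweight     
-------------------
 'fat':2A 'rat':3A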

Because to_tsvector(NULL) will return NULL, it is recommended to use coalesce whenever a field might be null. Here is the recommended method for creating a tsvector from a structured document:

UPDATE tt SET ti =
    setweight(to_tsvector(coalesce(title,'')), 'A')    ||
    setweight(to_tsvector(coalesce(keyword,'')), 'B')  ||
    setweight(to_tsvector(coalesce(abstract,'')), 'C') ||
    setweight(to_tsvector(coalesce(body,'')), 'D');

Here we have used setweight to label the source of each lexeme in the finished tsvector, and then merged the labeled tsvector values using the tsvector concatenation operator ||. (Section 12.4.1 gives details about these operations.)

12.3.2. Parsing Queries

PostgreSQL provides the functions to_tsquery, plainto_tsquery, and phraseto_tsquery for converting a query to the tsquery data type. to_tsquery offers access to more features than either plainto_tsquery or phraseto_tsquery, but it is less forgiving about its input.

to_tsquery([ config regconfig, ] querytext text) returns tsquery

to_tsquery creates a tsquery value from querytext, which must consist of single tokens separated by the tsquery operators & (AND), | (OR), ! (NOT), and <-> (FOLLOWED BY), possibly grouped using parentheses. In other words, the input to to_tsquery must already follow the general rules for tsquery input, as described in Section 8.11.2. The difference is that while basic tsquery input takes the tokens at face value, to_tsquery normalizes each token into a lexeme using the specified or default configuration, and discards any tokens that are stop words according to the configuration. For example:

SELECT to_tsquery('english', 'The & Fat & Rats');
  to_tsquery   
---------------
 'fat' & 'rat'

As in basic tsquery input, weight(s) can be attached to each lexeme to restrict it to match only tsvector lexemes of those weight(s). For example:

SELECT to_tsquery('english', 'Fat | Rats:AB');
    to_tsquery    
------------------
 'fat' | 'rat':AB

Also, * can be attached to a lexeme to specify prefix matching:

SELECT to_tsquery('supern:*A & star:A*B');
        to_tsquery        
--------------------------
 'supern':*A & 'star':*AB

Such a lexeme will match any word in a tsvector that begins with the given string.
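
As a quick check (an illustrative example, not from the original), prefix matching can be verified directly with the @@ match operator; the simple configuration is used here so that stemming does not interfere:

SELECT to_tsvector('simple', 'supermarket superstar') @@ to_tsquery('simple', 'super:*');
 ?column? 
----------
 t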

to_tsquery can also accept single-quoted phrases. This is primarily useful when the configuration includes a thesaurus dictionary that may trigger on such phrases. In the example below, a thesaurus contains the rule supernovae stars : sn:

SELECT to_tsquery('''supernovae stars'' & !crab');
  to_tsquery
---------------
 'sn' & !'crab'

Without quotes, to_tsquery will generate a syntax error for tokens that are not separated by an AND, OR, or FOLLOWED BY operator.
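
For instance, two adjacent words with no operator between them are rejected (an illustrative sketch; the exact error text may vary across versions):

SELECT to_tsquery('english', 'fat rats');
ERROR:  syntax error in tsquery: "fat rats"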

plainto_tsquery([ config regconfig, ] querytext text) returns tsquery

plainto_tsquery transforms the unformatted text querytext to a tsquery value. The text is parsed and normalized much as for to_tsvector, then the & (AND) tsquery operator is inserted between surviving words.

Example:

SELECT plainto_tsquery('english', 'The Fat Rats');
 plainto_tsquery 
-----------------
 'fat' & 'rat'

Note that plainto_tsquery will not recognize tsquery operators, weight labels, or prefix-match labels in its input:

SELECT plainto_tsquery('english', 'The Fat & Rats:C');
   plainto_tsquery   
---------------------
 'fat' & 'rat' & 'c'

Here, all the input punctuation was discarded as being space symbols.

phraseto_tsquery([ config regconfig, ] querytext text) returns tsquery

phraseto_tsquery behaves much like plainto_tsquery, except that it inserts the <-> (FOLLOWED BY) operator between surviving words instead of the & (AND) operator. Also, stop words are not simply discarded, but are accounted for by inserting <N> operators rather than <-> operators. This function is useful when searching for exact lexeme sequences, since the FOLLOWED BY operators check lexeme order, not just the presence of all the lexemes.

Example:

SELECT phraseto_tsquery('english', 'The Fat Rats');
 phraseto_tsquery
------------------
 'fat' <-> 'rat'

Like plainto_tsquery, the phraseto_tsquery function will not recognize tsquery operators, weight labels, or prefix-match labels in its input:

SELECT phraseto_tsquery('english', 'The Fat & Rats:C');
      phraseto_tsquery
-----------------------------
 'fat' <-> 'rat' <-> 'c'
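
The <N> operators mentioned above can be seen when a stop word sits between two indexed words; a hedged illustration (not part of the original text):

SELECT phraseto_tsquery('english', 'The Cat and Rats');
 phraseto_tsquery 
------------------
 'cat' <2> 'rat'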

12.3.3. Ranking Search Results

Ranking attempts to measure how relevant documents are to a particular query, so that when there are many matches the most relevant ones can be shown first. PostgreSQL provides two predefined ranking functions, which take into account lexical, proximity, and structural information; that is, they consider how often the query terms appear in the document, how close together the terms are in the document, and how important the part of the document where they occur is. However, the concept of relevancy is vague and very application-specific. Different applications might require additional information for ranking, e.g., document modification time. The built-in ranking functions are only examples. You can write your own ranking functions and/or combine their results with additional factors to fit your specific needs.

The two ranking functions currently available are:

ts_rank([ weights float4[], ] vector tsvector, query tsquery [, normalization integer ]) returns float4

Ranks vectors based on the frequency of their matching lexemes.

ts_rank_cd([ weights float4[], ] vector tsvector, query tsquery [, normalization integer ]) returns float4

This function computes the cover density ranking for the given document vector and query, as described in Clarke, Cormack, and Tudhope's "Relevance Ranking for One to Three Term Queries" in the journal "Information Processing and Management", 1999. Cover density is similar to ts_rank ranking except that the proximity of matching lexemes to each other is taken into consideration.

This function requires lexeme positional information to perform its calculation. Therefore, it ignores any "stripped" lexemes in the tsvector. If there are no unstripped lexemes in the input, the result will be zero. (See Section 12.4.1 for more information about the strip function and positional information in tsvectors.)
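
A minimal sketch of this behavior (not from the original): stripping the positions from a tsvector makes ts_rank_cd return zero even though the lexemes still match.

SELECT ts_rank_cd(strip(to_tsvector('english', 'fat cats ate fat rats')),
                  to_tsquery('english', 'fat & rat'));
 ts_rank_cd 
------------
          0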

For both these functions, the optional weights argument offers the ability to weigh word instances more or less heavily depending on how they are labeled. The weight arrays specify how heavily to weigh each category of word, in the order:

{D-weight, C-weight, B-weight, A-weight}

If no weights are provided, then these defaults are used:

{0.1, 0.2, 0.4, 1.0}

Typically weights are used to mark words from special areas of the document, like the title or an initial abstract, so they can be treated with more or less importance than words in the document body.
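
For example, one might pass an explicit weights array that ignores D-labeled (body) words entirely; a sketch reusing the apod table and textsearch column from the examples below, under the assumption that textsearch was built with setweight labels (output omitted, since it depends on the data):

SELECT title, ts_rank_cd('{0.0, 0.2, 0.4, 1.0}', textsearch, query) AS rank
FROM apod, to_tsquery('neutrino|(dark & matter)') query
WHERE query @@ textsearch
ORDER BY rank DESC
LIMIT 10;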

Since a longer document has a greater chance of containing a query term it is reasonable to take into account document size, e.g., a hundred-word document with five instances of a search word is probably more relevant than a thousand-word document with five instances. Both ranking functions take an integer normalization option that specifies whether and how a document's length should impact its rank. The integer option controls several behaviors, so it is a bit mask: you can specify one or more behaviors using | (for example, 2|4).

  • 0 (the default) ignores the document length

  • 1 divides the rank by 1 + the logarithm of the document length

  • 2 divides the rank by the document length

  • 4 divides the rank by the mean harmonic distance between extents (this is implemented only by ts_rank_cd)

  • 8 divides the rank by the number of unique words in document

  • 16 divides the rank by 1 + the logarithm of the number of unique words in document

  • 32 divides the rank by itself + 1

If more than one flag bit is specified, the transformations are applied in the order listed.
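
For instance, to divide by the document length and also by the harmonic distance between extents, the two flags can be OR-ed together; a sketch based on the apod example used below (output omitted, since the numbers depend on the data):

SELECT title, ts_rank_cd(textsearch, query, 2|4) AS rank
FROM apod, to_tsquery('neutrino|(dark & matter)') query
WHERE query @@ textsearch
ORDER BY rank DESC
LIMIT 10;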

It is important to note that the ranking functions do not use any global information, so it is impossible to produce a fair normalization to 1% or 100% as sometimes desired. Normalization option 32 (rank/(rank+1)) can be applied to scale all ranks into the range zero to one, but of course this is just a cosmetic change; it will not affect the ordering of the search results.

Here is an example that selects only the ten highest-ranked matches:

SELECT title, ts_rank_cd(textsearch, query) AS rank
FROM apod, to_tsquery('neutrino|(dark & matter)') query
WHERE query @@ textsearch
ORDER BY rank DESC
LIMIT 10;
                     title                     |   rank
-----------------------------------------------+----------
 Neutrinos in the Sun                          |      3.1
 The Sudbury Neutrino Detector                 |      2.4
 A MACHO View of Galactic Dark Matter          |  2.01317
 Hot Gas and Dark Matter                       |  1.91171
 The Virgo Cluster: Hot Plasma and Dark Matter |  1.90953
 Rafting for Solar Neutrinos                   |      1.9
 NGC 4650A: Strange Galaxy and Dark Matter     |  1.85774
 Hot Gas and Dark Matter                       |   1.6123
 Ice Fishing for Cosmic Neutrinos              |      1.6
 Weak Lensing Distorts the Universe            | 0.818218

This is the same example using normalized ranking:

SELECT title, ts_rank_cd(textsearch, query, 32 /* rank/(rank+1) */ ) AS rank
FROM apod, to_tsquery('neutrino|(dark & matter)') query
WHERE query @@ textsearch
ORDER BY rank DESC
LIMIT 10;
                     title                     |        rank
-----------------------------------------------+-------------------
 Neutrinos in the Sun                          | 0.756097569485493
 The Sudbury Neutrino Detector                 | 0.705882361190954
 A MACHO View of Galactic Dark Matter          | 0.668123210574724
 Hot Gas and Dark Matter                       |  0.65655958650282
 The Virgo Cluster: Hot Plasma and Dark Matter | 0.656301290640973
 Rafting for Solar Neutrinos                   | 0.655172410958162
 NGC 4650A: Strange Galaxy and Dark Matter     | 0.650072921219637
 Hot Gas and Dark Matter                       | 0.617195790024749
 Ice Fishing for Cosmic Neutrinos              | 0.615384618911517
 Weak Lensing Distorts the Universe            | 0.450010798361481

Ranking can be expensive since it requires consulting the tsvector of each matching document, which can be I/O bound and therefore slow. Unfortunately, it is almost impossible to avoid since practical queries often result in large numbers of matches.

12.3.4. Highlighting Results

To present search results it is ideal to show a part of each document and how it is related to the query. Usually, search engines show fragments of the document with marked search terms. PostgreSQL provides a function ts_headline that implements this functionality.

ts_headline([ config regconfig, ] document text, query tsquery [, options text ]) returns text

ts_headline accepts a document along with a query, and returns an excerpt from the document in which terms from the query are highlighted. The configuration to be used to parse the document can be specified by config; if config is omitted, the default_text_search_config configuration is used.

If an options string is specified it must consist of a comma-separated list of one or more option=value pairs. The available options are:

  • StartSel, StopSel: the strings with which to delimit query words appearing in the document, to distinguish them from other excerpted words. You must double-quote these strings if they contain spaces or commas.

  • MaxWords, MinWords: these numbers determine the longest and shortest headlines to output.

  • ShortWord: words of this length or less will be dropped at the start and end of a headline. The default value of three eliminates common English articles.

  • HighlightAll: Boolean flag; if true the whole document will be used as the headline, ignoring the preceding three parameters.

  • MaxFragments: maximum number of text excerpts or fragments to display. The default value of zero selects a non-fragment-oriented headline generation method. A value greater than zero selects fragment-based headline generation. This method finds text fragments with as many query words as possible and stretches those fragments around the query words. As a result query words are close to the middle of each fragment and have words on each side. Each fragment will be of at most MaxWords, and words of length ShortWord or less are dropped at the start and end of each fragment. If not all query words are found in the document, then a single fragment of the first MinWords in the document will be displayed.

  • FragmentDelimiter: When more than one fragment is displayed, the fragments will be separated by this string.

Any unspecified options receive these defaults:

StartSel=<b>, StopSel=</b>,
MaxWords=35, MinWords=15, ShortWord=3, HighlightAll=FALSE,
MaxFragments=0, FragmentDelimiter=" ... "

For example:

SELECT ts_headline('english',
  'The most common type of search
is to find all documents containing given query terms
and return them in order of their similarity to the
query.',
  to_tsquery('query & similarity'));
                        ts_headline                         
------------------------------------------------------------
 containing given <b>query</b> terms
 and return them in order of their <b>similarity</b> to the
 <b>query</b>.

SELECT ts_headline('english',
  'The most common type of search
is to find all documents containing given query terms
and return them in order of their similarity to the
query.',
  to_tsquery('query & similarity'),
  'StartSel = <, StopSel = >');
                      ts_headline                      
-------------------------------------------------------
 containing given <query> terms
 and return them in order of their <similarity> to the
 <query>.
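
The fragment-based mode selected by a nonzero MaxFragments can be exercised in the same way; a hedged sketch (output omitted, since the chosen fragments depend on the text and configuration):

SELECT ts_headline('english',
  'The most common type of search
is to find all documents containing given query terms
and return them in order of their similarity to the
query.',
  to_tsquery('query & similarity'),
  'MaxFragments=2, MinWords=3, MaxWords=6, FragmentDelimiter=" ... "');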

ts_headline uses the original document, not a tsvector summary, so it can be slow and should be used with care.
