1 of 5

15. 平行查詢

PostgreSQL 可以設計平行運算的查詢計劃，利用多個 CPU 來更快地回應查詢。此功能稱為平行查詢。許多查詢無法從平行查詢中受益，或者是由於目前實作的限制，或者因為沒有比序列查詢計劃可以想到更快的查詢計劃。但是，對於可以受益的查詢，平行查詢的加速通常非常重要。使用平行查詢時，許多查詢的執行速度可能會提高兩倍以上，並且某些查詢的執行速度可能會提高四倍甚至更多。涉及大量資料但只向使用者回傳少量資料列的查詢通常會受益最多。本章解釋了一些關於平行查詢如何工作的細節，以及在哪些情況下可以使用這些細節，以便希望使用它的使用者可以理解期望的內容。

15.1. 如何運作？

當優化器確定平行查詢是某個查詢的最快執行策略時，它將建立一個查詢計劃，其中包含一個 Gather 或 Gather Merge 節點。這是一個簡單的例子：

EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
                                     QUERY PLAN                                      
-------------------------------------------------------------------------------------
 Gather  (cost=1000.00..217018.43 rows=1 width=97)
   Workers Planned: 2
   ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..216018.33 rows=1 width=97)
         Filter: (filler ~~ '%x%'::text)
(4 rows)

在所有情況下，Gather 或 Gather Merge 合併節點都只有一個子計劃，它是平行執行計劃的一部分。如果 Gather 或 Gather Merge 節點位於計劃樹的頂部，則整個查詢將會平行執行。如果它在計劃樹中的其他位置，那麼只有它下面的計劃部分將平行運行。在上面的範例中，查詢只存取一個資料表，因此除了 Gather 節點本身之外，只有一個計劃節點；由於該計劃節點是 Gather 節點的子節點，因此它將平行運行。

使用 EXPLAIN，您可以看到規劃器選擇的後端程序數量。當在查詢執行過程中到達 Gather 節點時，執行使用者連線的程序將請求與規劃器選擇的後端程序數量相等的後端工作程序。規劃器將考慮使用的後端工作程序數量最多限制為 max_parallel_workers_per_gather。任何時候可以存在的後端工作程序總數受 max_worker_processes 和 max_parallel_workers 限制。因此，平行查詢可以用比計劃更少的工作程序運行，甚至根本不需要工作程序。最佳計劃可能取決於可用的工作程序數量，因此這可能會導致查詢性能較差。如果頻繁發生，請考慮增加 max_worker_processes 和 max_parallel_workers，以便可以同時運行更多工作程序，或者減少 max_parallel_workers_per_gather，以便規劃器請求更少的工作程序。

為給予的平行查詢成功啟動的每個後端工作程序都將執行計劃的平行部分。領導程序也會執行計劃的一部分，但它還有一個額外的責任：它還必須讀取其他後諯程序所生成的所有資料。當計劃的平行部分只産生少量資料時，領導程序通常會表現得非常像一個額外的後端程序，而加快了查詢的執行速度。相反地，當計劃的平行部分産生大量資料時，領導程序可能幾乎完全被讀取由後端程序産生的資料所佔據，並且執行 Gather 節點級之上的計劃節點所需的任何進一步處理步驟或是扮演 Gather Merge 節點。在這種情況下，領導程序將會少量執行計劃的平行部分。

當計劃的平行行部分頂部的節點是 Gather Merge 而不是 Gather 時，它表示執行計劃的平行部分的每個程序正在按排序順序産生資料，並且領導程序正依序執行保持合併。相反地，Gather 只以便利的順序讀取後端程序的資料，會破壞可能存在的任何排序順序。

15.2. 啓用時機？

There are several settings which can cause the query planner not to generate a parallel query plan under any circumstances. In order for any parallel query plans whatsoever to be generated, the following settings must be configured as indicated.

max_parallel_workers_per_gather must be set to a value which is greater than zero. This is a special case of the more general principle that no more workers should be used than the number configured via max_parallel_workers_per_gather.

In addition, the system must not be running in single-user mode. Since the entire database system is running in single process in this situation, no background workers will be available.

Even when it is in general possible for parallel query plans to be generated, the planner will not generate them for a given query if any of the following are true:

The query writes any data or locks any database rows. If a query contains a data-modifying operation either at the top level or within a CTE, no parallel plans for that query will be generated. As an exception, the commands CREATE TABLE ... AS, SELECT INTO, and CREATE MATERIALIZED VIEW which create a new table and populate it can use a parallel plan.
The query might be suspended during execution. In any situation in which the system thinks that partial or incremental execution might occur, no parallel plan is generated. For example, a cursor created using DECLARE CURSOR will never use a parallel plan. Similarly, a PL/pgSQL loop of the form FOR x IN query LOOP .. END LOOP will never use a parallel plan, because the parallel query system is unable to verify that the code in the loop is safe to execute while parallel query is active.
The query uses any function marked PARALLEL UNSAFE. Most system-defined functions are PARALLEL SAFE, but user-defined functions are marked PARALLEL UNSAFE by default. See the discussion of Section 15.4.
The query is running inside of another query that is already parallel. For example, if a function called by a parallel query issues an SQL query itself, that query will never use a parallel plan. This is a limitation of the current implementation, but it may not be desirable to remove this limitation, since it could result in a single query using a very large number of processes.

Even when parallel query plan is generated for a particular query, there are several circumstances under which it will be impossible to execute that plan in parallel at execution time. If this occurs, the leader will execute the portion of the plan below the Gather node entirely by itself, almost as if the Gather node were not present. This will happen if any of the following conditions are met:

No background workers can be obtained because of the limitation that the total number of background workers cannot exceed max_worker_processes.
No background workers can be obtained because of the limitation that the total number of background workers launched for purposes of parallel query cannot exceed max_parallel_workers.
The client sends an Execute message with a non-zero fetch count. See the discussion of the extended query protocol. Since libpq currently provides no way to send such a message, this can only occur when using a client that does not rely on libpq. If this is a frequent occurrence, it may be a good idea to set max_parallel_workers_per_gather to zero in sessions where it is likely, so as to avoid generating query plans that may be suboptimal when run serially.

15.3. 平行查詢計畫

Because each worker executes the parallel portion of the plan to completion, it is not possible to simply take an ordinary query plan and run it using multiple workers. Each worker would produce a full copy of the output result set, so the query would not run any faster than normal but would produce incorrect results. Instead, the parallel portion of the plan must be what is known internally to the query optimizer as a partial plan; that is, it must be constructed so that each process which executes the plan will generate only a subset of the output rows in such a way that each required output row is guaranteed to be generated by exactly one of the cooperating processes. Generally, this means that the scan on the driving table of the query must be a parallel-aware scan.

15.3.1. Parallel Scans

The following types of parallel-aware table scans are currently supported.

In a parallel sequential scan, the table's blocks will be divided among the cooperating processes. Blocks are handed out one at a time, so that access to the table remains sequential.
In a parallel bitmap heap scan, one process is chosen as the leader. That process performs a scan of one or more indexes and builds a bitmap indicating which table blocks need to be visited. These blocks are then divided among the cooperating processes as in a parallel sequential scan. In other words, the heap scan is performed in parallel, but the underlying index scan is not.
In a parallel index scan or parallel index-only scan, the cooperating processes take turns reading data from the index. Currently, parallel index scans are supported only for btree indexes. Each process will claim a single index block and will scan and return all tuples referenced by that block; other processes can at the same time be returning tuples from a different index block. The results of a parallel btree scan are returned in sorted order within each worker process.

Other scan types, such as scans of non-btree indexes, may support parallel scans in the future.

15.3.2. Parallel Joins

Just as in a non-parallel plan, the driving table may be joined to one or more other tables using a nested loop, hash join, or merge join. The inner side of the join may be any kind of non-parallel plan that is otherwise supported by the planner provided that it is safe to run within a parallel worker. Depending on the join type, the inner side may also be a parallel plan.

In a nested loop join, the inner side is always non-parallel. Although it is executed in full, this is efficient if the inner side is an index scan, because the outer tuples and thus the loops that look up values in the index are divided over the cooperating processes.
In a merge join, the inner side is always a non-parallel plan and therefore executed in full. This may be inefficient, especially if a sort must be performed, because the work and resulting data are duplicated in every cooperating process.
In a hash join (without the "parallel" prefix), the inner side is executed in full by every cooperating process to build identical copies of the hash table. This may be inefficient if the hash table is large or the plan is expensive. In a parallel hash join, the inner side is a parallel hash that divides the work of building a shared hash table over the cooperating processes.

15.3.3. Parallel Aggregation

PostgreSQL supports parallel aggregation by aggregating in two stages. First, each process participating in the parallel portion of the query performs an aggregation step, producing a partial result for each group of which that process is aware. This is reflected in the plan as a Partial Aggregate node. Second, the partial results are transferred to the leader via Gather or Gather Merge. Finally, the leader re-aggregates the results across all workers in order to produce the final result. This is reflected in the plan as a Finalize Aggregate node.

Because the Finalize Aggregate node runs on the leader process, queries which produce a relatively large number of groups in comparison to the number of input rows will appear less favorable to the query planner. For example, in the worst-case scenario the number of groups seen by the Finalize Aggregate node could be as many as the number of input rows which were seen by all worker processes in the Partial Aggregate stage. For such cases, there is clearly going to be no performance benefit to using parallel aggregation. The query planner takes this into account during the planning process and is unlikely to choose parallel aggregate in this scenario.

Parallel aggregation is not supported in all situations. Each aggregate must be safe for parallelism and must have a combine function. If the aggregate has a transition state of type internal, it must have serialization and deserialization functions. See CREATE AGGREGATE for more details. Parallel aggregation is not supported if any aggregate function call contains DISTINCT or ORDER BY clause and is also not supported for ordered set aggregates or when the query involves GROUPING SETS. It can only be used when all joins involved in the query are also part of the parallel portion of the plan.

15.3.4. Parallel Append

每當 PostgreSQL 需要將來自多個資料源的資料列合併到一個結果集合之中時，它都會使用一個 Append 或 MergeAppend 的查詢計劃節點。在執行 UNION ALL 或掃描分割區資料表時，通常會是這種情況。這樣的節點可以在平行查詢計劃中使用，就像在其他任何查詢計劃中一樣。但是，在平行查詢計劃中，查詢計劃程序可以改為使用「Parallel Append」節點。

當在平行查詢計劃中使用 Append 節點時，每個流程將按照它們出現的順序執行子計劃，以便所有參與的流程合作執行第一個子計劃，直到完成為止，然後移至第二個計劃。大約在同一時間。當使用 Parallel Append 時，執行程序時，將在其子計劃中盡可能平均地分散參與的程序，以便同時執行多個子計劃。這樣可以避免競爭使用，也可以避免在從未執行子計劃的程序中負擔子計劃的啟動成本。

同樣，與一般的 Append 節點不同，一般的 Append 節點在平行查詢計劃中使用時只能具有 partial 子項，而 Parallel Append 節點可以同時具有 partial 和 non-partial 子計劃。non-partial 子項只能透過一個程序進行掃描，因為多次掃描它們會產生重複的結果內容。因此，即使沒有有效的局部計劃，涉及附加多個結果集合的計劃也可以實作粗粒度的平行處理。例如，考慮對分割資料表的查詢，該查詢只能透過使用不支援平行掃描的索引來有效地實作。計劃程序可以選擇“一般索引掃描”計劃的“Parallel Append”；每個單獨的索引掃描都必須由一個程序執行才能完成，但是不同的程序可以同時執行不同的掃描。

enable_parallel_append 可用於停用此功能。

15.3.5. Parallel Plan Tips

If a query that is expected to do so does not produce a parallel plan, you can try reducing parallel_setup_cost or parallel_tuple_cost. Of course, this plan may turn out to be slower than the serial plan which the planner preferred, but this will not always be the case. If you don't get a parallel plan even with very small values of these settings (e.g. after setting them both to zero), there may be some reason why the query planner is unable to generate a parallel plan for your query. See Section 15.2 and Section 15.4 for information on why this may be the case.

When executing a parallel plan, you can use EXPLAIN (ANALYZE, VERBOSE) to display per-worker statistics for each plan node. This may be useful in determining whether the work is being evenly distributed between all plan nodes and more generally in understanding the performance characteristics of the plan.

15.4. 平行查詢的安全性

The planner classifies operations involved in a query as either parallel safe, parallel restricted, or parallel unsafe. A parallel safe operation is one which does not conflict with the use of parallel query. A parallel restricted operation is one which cannot be performed in a parallel worker, but which can be performed in the leader while parallel query is in use. Therefore, parallel restricted operations can never occur below a Gather or Gather Merge node, but can occur elsewhere in a plan which contains such a node. A parallel unsafe operation is one which cannot be performed while parallel query is in use, not even in the leader. When a query contains anything which is parallel unsafe, parallel query is completely disabled for that query.

The following operations are always parallel restricted.

Scans of common table expressions (CTEs).
Scans of temporary tables.
Scans of foreign tables, unless the foreign data wrapper has an IsForeignScanParallelSafe API which indicates otherwise.
Plan nodes to which an InitPlan is attached.
Plan nodes which reference a correlated SubPlan.

15.4.1. Parallel Labeling for Functions and Aggregates

The planner cannot automatically determine whether a user-defined function or aggregate is parallel safe, parallel restricted, or parallel unsafe, because this would require predicting every operation which the function could possibly perform. In general, this is equivalent to the Halting Problem and therefore impossible. Even for simple functions where it could conceivably be done, we do not try, since this would be expensive and error-prone. Instead, all user-defined functions are assumed to be parallel unsafe unless otherwise marked. When using CREATE FUNCTION or ALTER FUNCTION, markings can be set by specifying PARALLEL SAFE, PARALLEL RESTRICTED, or PARALLEL UNSAFE as appropriate. When using CREATE AGGREGATE, the PARALLEL option can be specified with SAFE, RESTRICTED, or UNSAFE as the corresponding value.

Functions and aggregates must be marked PARALLEL UNSAFE if they write to the database, access sequences, change the transaction state even temporarily (e.g. a PL/pgSQL function which establishes an EXCEPTION block to catch errors), or make persistent changes to settings. Similarly, functions must be marked PARALLEL RESTRICTED if they access temporary tables, client connection state, cursors, prepared statements, or miscellaneous backend-local state which the system cannot synchronize across workers. For example, setseed and random are parallel restricted for this last reason.

In general, if a function is labeled as being safe when it is restricted or unsafe, or if it is labeled as being restricted when it is in fact unsafe, it may throw errors or produce wrong answers when used in a parallel query. C-language functions could in theory exhibit totally undefined behavior if mislabeled, since there is no way for the system to protect itself against arbitrary C code, but in most likely cases the result will be no worse than for any other function. If in doubt, it is probably best to label functions as UNSAFE.

If a function executed within a parallel worker acquires locks which are not held by the leader, for example by querying a table not referenced in the query, those locks will be released at worker exit, not end of transaction. If you write a function which does this, and this behavior difference is important to you, mark such functions as PARALLEL RESTRICTED to ensure that they execute only in the leader.

Note that the query planner does not consider deferring the evaluation of parallel-restricted functions or aggregates involved in the query in order to obtain a superior plan. So, for example, if a WHERE clause applied to a particular table is parallel restricted, the query planner will not consider performing a scan of that table in the parallel portion of a plan. In some cases, it would be possible (and perhaps even efficient) to include the scan of that table in the parallel portion of the query and defer the evaluation of the WHERE clause so that it happens above the Gather node. However, the planner does not do this.