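15. Parallel Query

PostgreSQL can devise query plans which leverage multiple CPUs in order to answer queries faster. This feature is known as parallel query. Many queries cannot benefit from parallel query, either due to limitations of the current implementation or because there is no imaginable query plan which is any faster than the serial query plan. However, for queries that can benefit, the speedup from parallel query is often very significant. Many queries can run more than twice as fast when using parallel query, and some queries can run four times faster or even more. Queries that touch a large amount of data but return only a few rows to the user will typically benefit most. This chapter explains some details of how parallel query works and in which situations it can be used, so that users who wish to make use of it can understand what to expect.

15.1. How Parallel Query Works

When the optimizer determines that parallel query is the fastest execution strategy for a particular query, it will create a query plan which includes a Gather or Gather Merge node. Here is a simple example: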

EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
                                     QUERY PLAN                                      
-------------------------------------------------------------------------------------
 Gather  (cost=1000.00..217018.43 rows=1 width=97)
   Workers Planned: 2
   ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..216018.33 rows=1 width=97)
         Filter: (filler ~~ '%x%'::text)
(4 rows)

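In all cases, the Gather or Gather Merge node will have exactly one child plan, which is the portion of the plan that will be executed in parallel. If the Gather or Gather Merge node is at the very top of the plan tree, then the entire query will execute in parallel. If it is somewhere else in the plan tree, then only the portion of the plan below it will run in parallel. In the example above, the query accesses only one table, so there is only one plan node other than the Gather node itself; since that plan node is a child of the Gather node, it will run in parallel.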

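Using EXPLAIN, you can see the number of workers chosen by the planner. When the Gather node is reached during query execution, the process which is implementing the user's session will request a number of background worker processes equal to the number of workers chosen by the planner. The number of background workers that the planner will consider using is limited to at most max_parallel_workers_per_gather. The total number of background workers that can exist at any one time is limited by both max_worker_processes and max_parallel_workers. Therefore, it is possible for a parallel query to run with fewer workers than planned, or even with no workers at all. The optimal plan may depend on the number of workers that are available, so this can result in poor query performance. If this happens frequently, consider increasing max_worker_processes and max_parallel_workers so that more workers can be run simultaneously, or alternatively reducing max_parallel_workers_per_gather so that the planner requests fewer workers.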
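
The three limits named above can be inspected and adjusted from an ordinary session. The following is a minimal sketch; the value used in the SET command is purely illustrative and is not a recommendation.

```sql
-- Inspect the limits that bound how many workers a parallel query can obtain.
SHOW max_parallel_workers_per_gather;  -- most workers the planner considers per Gather node
SHOW max_parallel_workers;             -- most workers in use for parallel query at one time
SHOW max_worker_processes;             -- most background workers of any kind (set at server start)

-- Make the planner request fewer workers, for the current session only.
SET max_parallel_workers_per_gather = 1;

-- Return to the configured default.
RESET max_parallel_workers_per_gather;
```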

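Every background worker process which is successfully started for a given parallel query will execute the parallel portion of the plan. The leader will also execute that portion of the plan, but it has an additional responsibility: it must also read all of the tuples generated by the workers. When the parallel portion of the plan generates only a small number of tuples, the leader will often behave very much like an additional worker, speeding up query execution. Conversely, when the parallel portion of the plan generates a large number of tuples, the leader may be almost entirely occupied with reading the tuples generated by the workers and performing any further processing steps which are required by plan nodes above the level of the Gather or Gather Merge node. In such cases, the leader will do very little of the work of executing the parallel portion of the plan.

When the node at the top of the parallel portion of the plan is Gather Merge rather than Gather, it indicates that each process executing the parallel portion of the plan is producing tuples in sorted order, and that the leader is performing an order-preserving merge. In contrast, Gather reads tuples from the workers in whatever order is convenient, destroying any sort order that may have existed.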
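
A query that needs sorted output from a large table is one place where a Gather Merge node might appear. The sketch below assumes the pgbench_accounts table from the earlier example; whether the planner actually chooses Gather Merge depends on table size, available indexes, and cost settings, so the resulting plan is not shown.

```sql
-- Sorted output over a large table: each worker can sort its share of the
-- rows, and the leader can then merge the sorted streams under Gather Merge.
EXPLAIN SELECT aid, abalance
FROM pgbench_accounts
ORDER BY abalance;
```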

15.2. When Can Parallel Query Be Used?

There are several settings which can cause the query planner not to generate a parallel query plan under any circumstances. In order for any parallel query plans whatsoever to be generated, the following settings must be configured as indicated.

max_parallel_workers_per_gather must be set to a value which is greater than zero. This is a special case of the more general principle that no more workers should be used than the number configured via max_parallel_workers_per_gather.
dynamic_shared_memory_type must be set to a value other than none. Parallel query requires dynamic shared memory in order to pass data between cooperating processes.

In addition, the system must not be running in single-user mode. Since the entire database system is running in a single process in this situation, no background workers will be available.
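
Both prerequisite settings can be checked in one step by querying the pg_settings view. This is only a sketch of how one might verify them before investigating further.

```sql
-- Parallel plans require max_parallel_workers_per_gather > 0 and
-- dynamic_shared_memory_type set to something other than 'none'.
SELECT name, setting
FROM pg_settings
WHERE name IN ('max_parallel_workers_per_gather',
               'dynamic_shared_memory_type');
```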

Even when it is in general possible for parallel query plans to be generated, the planner will not generate them for a given query if any of the following are true:

The query writes any data or locks any database rows. If a query contains a data-modifying operation either at the top level or within a CTE, no parallel plans for that query will be generated. This is a limitation of the current implementation which could be lifted in a future release.
The query might be suspended during execution. In any situation in which the system thinks that partial or incremental execution might occur, no parallel plan is generated. For example, a cursor created using DECLARE CURSOR will never use a parallel plan. Similarly, a PL/pgSQL loop of the form FOR x IN query LOOP .. END LOOP will never use a parallel plan, because the parallel query system is unable to verify that the code in the loop is safe to execute while parallel query is active.
The query uses any function marked PARALLEL UNSAFE. Most system-defined functions are PARALLEL SAFE, but user-defined functions are marked PARALLEL UNSAFE by default. See the discussion of Section 15.4.
The query is running inside of another query that is already parallel. For example, if a function called by a parallel query issues an SQL query itself, that query will never use a parallel plan. This is a limitation of the current implementation, but it may not be desirable to remove this limitation, since it could result in a single query using a very large number of processes.
The transaction isolation level is serializable. This is a limitation of the current implementation.
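
Some of these planner restrictions can be observed by comparing EXPLAIN output. The sketch below assumes the pgbench_accounts table from the earlier example and a configuration in which the first query receives a parallel plan. It reflects the limitations as described in this chapter, and other releases may behave differently.

```sql
-- Baseline: a read-only query that is eligible for a parallel plan.
EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';

-- The same scan with row locking. Per the list above, a query that locks
-- rows is not given a parallel plan.
EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%' FOR UPDATE;

-- The same query under serializable isolation, which the list above names
-- as a limitation of the implementation described here.
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
ROLLBACK;
```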

Even when a parallel query plan is generated for a particular query, there are several circumstances under which it will be impossible to execute that plan in parallel at execution time. If this occurs, the leader will execute the portion of the plan below the Gather node entirely by itself, almost as if the Gather node were not present. This will happen if any of the following conditions are met:

No background workers can be obtained because of the limitation that the total number of background workers cannot exceed max_worker_processes.
No background workers can be obtained because of the limitation that the total number of background workers launched for purposes of parallel query cannot exceed max_parallel_workers.
The client sends an Execute message with a non-zero fetch count. See the discussion of the extended query protocol. Since libpq currently provides no way to send such a message, this can only occur when using a client that does not rely on libpq. If this is a frequent occurrence, it may be a good idea to set max_parallel_workers_per_gather to zero in sessions where it is likely, so as to avoid generating query plans that may be suboptimal when run serially.
A prepared statement is executed using a CREATE TABLE .. AS EXECUTE .. statement. This construct converts what otherwise would have been a read-only operation into a read-write operation, making it ineligible for parallel query.
The transaction isolation level is serializable. This situation does not normally arise, because parallel query plans are not generated when the transaction isolation level is serializable. However, it can happen if the transaction isolation level is changed to serializable after the plan is generated and before it is executed.
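
The prepared-statement case can be sketched as follows. The statement name x_rows and table name x_rows_copy are hypothetical and chosen only for illustration; pgbench_accounts is the table from the earlier example.

```sql
-- Prepare a read-only statement; on its own it is eligible for parallel query.
PREPARE x_rows AS
    SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';

-- Executed directly, the statement remains a read-only operation.
EXECUTE x_rows;

-- Executed through CREATE TABLE .. AS EXECUTE .., it becomes a read-write
-- operation, so a parallel plan cannot be run in parallel here.
CREATE TABLE x_rows_copy AS EXECUTE x_rows;

-- Clean up the illustrative objects.
DROP TABLE x_rows_copy;
DEALLOCATE x_rows;
```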

15.3. Parallel Plans

Because each worker executes the parallel portion of the plan to completion, it is not possible to simply take an ordinary query plan and run it using multiple workers. Each worker would produce a full copy of the output result set, so the query would not run any faster than normal but would produce incorrect results. Instead, the parallel portion of the plan must be what is known internally to the query optimizer as a partial plan; that is, it must be constructed so that each process which executes the plan will generate only a subset of the output rows in such a way that each required output row is guaranteed to be generated by exactly one of the cooperating processes.

15.3.1. Parallel Scans

The following types of parallel-aware table scans are currently supported.

In a parallel sequential scan, the table's blocks will be divided among the cooperating processes. Blocks are handed out one at a time, so that access to the table remains sequential.
In a parallel bitmap heap scan, one process is chosen as the leader. That process performs a scan of one or more indexes and builds a bitmap indicating which table blocks need to be visited. These blocks are then divided among the cooperating processes as in a parallel sequential scan. In other words, the heap scan is performed in parallel, but the underlying index scan is not.
In a parallel index scan or parallel index-only scan, the cooperating processes take turns reading data from the index. Currently, parallel index scans are supported only for btree indexes. Each process will claim a single index block and will scan and return all tuples referenced by that block; other processes can at the same time be returning tuples from a different index block. The results of a parallel btree scan are returned in sorted order within each worker process.

Only the scan types listed above may be used for a scan on the driving table within a parallel plan. Other scan types, such as parallel scans of non-btree indexes, may be supported in the future.
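
To see which scan type the planner picks, compare EXPLAIN output for predicates with and without a usable index. The sketch below assumes the pgbench_accounts table with the primary key on aid that pgbench creates by default; the plan chosen depends on table size and cost settings, so no output is shown.

```sql
-- No index can serve this predicate, so a parallel plan here would
-- typically use a Parallel Seq Scan.
EXPLAIN SELECT count(*) FROM pgbench_accounts WHERE filler LIKE '%x%';

-- A range predicate on the indexed column is a candidate for a parallel
-- index scan or parallel index-only scan on the btree index.
EXPLAIN SELECT count(*) FROM pgbench_accounts WHERE aid BETWEEN 1 AND 1000000;
```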

15.3.2. Parallel Joins

Just as in a non-parallel plan, the driving table may be joined to one or more other tables using a nested loop, hash join, or merge join. The inner side of the join may be any kind of non-parallel plan that is otherwise supported by the planner provided that it is safe to run within a parallel worker. For example, if a nested loop join is chosen, the inner plan may be an index scan which looks up a value taken from the outer side of the join.

Each worker will execute the inner side of the join in full. This is typically not a problem for nested loops, but may be inefficient for cases involving hash or merge joins. For example, for a hash join, this restriction means that an identical hash table is built in each worker process, which works fine for joins against small tables but may not be efficient when the inner table is large. For a merge join, it might mean that each worker performs a separate sort of the inner relation, which could be slow. Of course, in cases where a parallel plan of this type would be inefficient, the query planner will normally choose some other plan (possibly one which does not use parallelism) instead.
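
A join between a large driving table and a small inner table is the kind of case described above. The sketch assumes the standard pgbench tables pgbench_accounts and pgbench_branches; the join method chosen, and whether the plan is parallel at all, are left to the planner.

```sql
-- pgbench_accounts is the large driving table; pgbench_branches is small,
-- so running the inner side in full in every worker is cheap.
EXPLAIN
SELECT a.aid, b.bbalance
FROM pgbench_accounts a
JOIN pgbench_branches b ON b.bid = a.bid
WHERE a.filler LIKE '%x%';
```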

15.3.3. Parallel Aggregation

PostgreSQL supports parallel aggregation by aggregating in two stages. First, each process participating in the parallel portion of the query performs an aggregation step, producing a partial result for each group of which that process is aware. This is reflected in the plan as a Partial Aggregate node. Second, the partial results are transferred to the leader via the Gather node. Finally, the leader re-aggregates the results across all workers in order to produce the final result. This is reflected in the plan as a Finalize Aggregate node.

Because the Finalize Aggregate node runs on the leader process, queries which produce a relatively large number of groups in comparison to the number of input rows will appear less favorable to the query planner. For example, in the worst-case scenario the number of groups seen by the Finalize Aggregate node could be as many as the number of input rows which were seen by all worker processes in the Partial Aggregate stage. For such cases, there is clearly going to be no performance benefit to using parallel aggregation. The query planner takes this into account during the planning process and is unlikely to choose parallel aggregate in this scenario.
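
The effect of the group count can be explored with two queries over the same table. This sketch assumes the pgbench_accounts table, where bid takes few distinct values and aid is unique; the plans are not shown because they depend on data size and settings.

```sql
-- Few groups relative to input rows (one per branch): a good candidate
-- for Partial Aggregate below Gather and Finalize Aggregate above it.
EXPLAIN SELECT bid, count(*) FROM pgbench_accounts GROUP BY bid;

-- One group per input row (aid is unique): the worst case described above,
-- where the leader would have to finalize as many groups as there are rows.
EXPLAIN SELECT aid, count(*) FROM pgbench_accounts GROUP BY aid;
```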

Parallel aggregation is not supported in all situations. Each aggregate must be safe for parallelism and must have a combine function. If the aggregate has a transition state of type internal, it must have serialization and deserialization functions. See CREATE AGGREGATE for more details. Parallel aggregation is not supported if any aggregate function call contains a DISTINCT or ORDER BY clause, and is also not supported for ordered set aggregates or when the query involves GROUPING SETS. It can only be used when all joins involved in the query are also part of the parallel portion of the plan.
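
A user-defined aggregate that meets these requirements might be declared as in the sketch below. The aggregate name demo_sum is hypothetical; int8pl is the built-in addition function for bigint, used here both as the transition function and as the combine function. The state type is a plain bigint, so no serialization functions are needed.

```sql
-- A simple sum over bigint that can take part in parallel aggregation:
-- it has a combine function and is marked parallel safe.
CREATE AGGREGATE demo_sum(int8) (
    SFUNC       = int8pl,   -- adds each input value to the running state
    STYPE       = int8,     -- transition state is a plain bigint
    COMBINEFUNC = int8pl,   -- merges two partial states by adding them
    INITCOND    = '0',
    PARALLEL    = SAFE
);

-- See whether the planner uses Partial/Finalize Aggregate for it.
EXPLAIN SELECT demo_sum(abalance::int8) FROM pgbench_accounts;

-- Clean up the illustrative aggregate.
DROP AGGREGATE demo_sum(int8);
```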

15.3.4. Parallel Plan Tips

If a query that is expected to do so does not produce a parallel plan, you can try reducing parallel_setup_cost or parallel_tuple_cost. Of course, this plan may turn out to be slower than the serial plan which the planner preferred, but this will not always be the case. If you don't get a parallel plan even with very small values of these settings (e.g. after setting them both to zero), there may be some reason why the query planner is unable to generate a parallel plan for your query. See Section 15.2 and Section 15.4 for information on why this may be the case.
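
The experiment described above can be run in a single session without touching the server configuration. The query is the one from the earlier example and stands in for whatever query is under investigation.

```sql
-- Make parallel plans look as cheap as possible, for this session only.
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;

-- If no Gather node appears even now, something other than cost is
-- preventing a parallel plan for this query.
EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';

-- Restore the configured values.
RESET parallel_setup_cost;
RESET parallel_tuple_cost;
```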

When executing a parallel plan, you can use EXPLAIN (ANALYZE, VERBOSE) to display per-worker statistics for each plan node. This may be useful in determining whether the work is being evenly distributed between all plan nodes and more generally in understanding the performance characteristics of the plan.
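
For instance, using the query from the earlier example (note that ANALYZE actually executes the query, so this can take as long as the query itself):

```sql
-- Run the query and report per-worker statistics for each plan node.
EXPLAIN (ANALYZE, VERBOSE)
SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
```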

15.4. Parallel Safety

The planner classifies operations involved in a query as either parallel safe, parallel restricted, or parallel unsafe. A parallel safe operation is one which does not conflict with the use of parallel query. A parallel restricted operation is one which cannot be performed in a parallel worker, but which can be performed in the leader while parallel query is in use. Therefore, parallel restricted operations can never occur below a Gather node, but can occur elsewhere in a plan which contains a Gather node. A parallel unsafe operation is one which cannot be performed while parallel query is in use, not even in the leader. When a query contains anything which is parallel unsafe, parallel query is completely disabled for that query.

The following operations are always parallel restricted.

Scans of common table expressions (CTEs).
Scans of temporary tables.
Scans of foreign tables, unless the foreign data wrapper has an IsForeignScanParallelSafe API which indicates otherwise.
Access to an InitPlan or SubPlan.

15.4.1. Parallel Labeling for Functions and Aggregates

The planner cannot automatically determine whether a user-defined function or aggregate is parallel safe, parallel restricted, or parallel unsafe, because this would require predicting every operation which the function could possibly perform. In general, this is equivalent to the Halting Problem and therefore impossible. Even for simple functions where it could conceivably be done, we do not try, since this would be expensive and error-prone. Instead, all user-defined functions are assumed to be parallel unsafe unless otherwise marked. When using CREATE FUNCTION or ALTER FUNCTION, markings can be set by specifying PARALLEL SAFE, PARALLEL RESTRICTED, or PARALLEL UNSAFE as appropriate. When using CREATE AGGREGATE, the PARALLEL option can be specified with SAFE, RESTRICTED, or UNSAFE as the corresponding value.
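
The markings can be applied as in the following sketch. The function demo_add_one is hypothetical and exists only to show the syntax; it performs pure arithmetic, which is why labeling it safe is reasonable here.

```sql
-- Declare a function as parallel safe at creation time.
CREATE FUNCTION demo_add_one(i integer) RETURNS integer
    LANGUAGE sql
    IMMUTABLE
    PARALLEL SAFE
    AS 'SELECT i + 1';

-- Change the label of an existing function.
ALTER FUNCTION demo_add_one(integer) PARALLEL RESTRICTED;

-- Clean up the illustrative function.
DROP FUNCTION demo_add_one(integer);
```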

Functions and aggregates must be marked PARALLEL UNSAFE if they write to the database, access sequences, change the transaction state even temporarily (e.g. a PL/pgSQL function which establishes an EXCEPTION block to catch errors), or make persistent changes to settings. Similarly, functions must be marked PARALLEL RESTRICTED if they access temporary tables, client connection state, cursors, prepared statements, or miscellaneous backend-local state which the system cannot synchronize across workers. For example, setseed and random are parallel restricted for this last reason.
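
The label currently recorded for any function can be read from the pg_proc system catalog. As a sketch, the query below looks up the two functions just mentioned.

```sql
-- proparallel holds the label: 's' = safe, 'r' = restricted, 'u' = unsafe.
SELECT proname, proparallel
FROM pg_proc
WHERE proname IN ('random', 'setseed');
```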

In general, if a function is labeled as being safe when it is restricted or unsafe, or if it is labeled as being restricted when it is in fact unsafe, it may throw errors or produce wrong answers when used in a parallel query. C-language functions could in theory exhibit totally undefined behavior if mislabeled, since there is no way for the system to protect itself against arbitrary C code, but in most likely cases the result will be no worse than for any other function. If in doubt, it is probably best to label functions as UNSAFE.

If a function executed within a parallel worker acquires locks which are not held by the leader, for example by querying a table not referenced in the query, those locks will be released at worker exit, not end of transaction. If you write a function which does this, and this behavior difference is important to you, mark such functions as PARALLEL RESTRICTED to ensure that they execute only in the leader.

Note that the query planner does not consider deferring the evaluation of parallel-restricted functions or aggregates involved in the query in order to obtain a superior plan. So, for example, if a WHERE clause applied to a particular table is parallel restricted, the query planner will not consider placing the scan of that table below a Gather node. In some cases, it would be possible (and perhaps even efficient) to include the scan of that table in the parallel portion of the query and defer the evaluation of the WHERE clause so that it happens above the Gather node. However, the planner does not do this.
