1 of 7

13. 一致性管理（MVCC）

本章描述兩個或更多連線在同一時間嘗試存取相同資料時 PostgreSQL 資料庫系統的行為。這種情況下的目標是允許所有連線進行高效率的存取，同時保持嚴格的資料一致性。每個資料庫應用程式開發人員都應該熟悉本章所涉及的主題。

13.1. 簡介

PostgreSQL 為開發者們提供了豐富的工具來管理資料的同時存取。資料的一致性在內部會以多版本模式的多版本一致性控制（Multiversion Concurrency Control，MVCC），這表示無論目前資料的當下狀態如何，每個 SQL 指令會看見的是資料在一段時間前的快照（資料庫的某個版本）。這個機制可以避免指令看到由其他同時交易正在更新同個資料列所產生的資料不一致，也對每個資料庫的連線階段提供了交易隔離。MVCC 也藉由避開傳統資料庫系統的上鎖方式減少了鎖的競爭，以在多使用者的環境中提供合理的效能。

相較於鎖定的機制來說，一致性控制使用 MVCC 模式的主要優勢，是在於 MVCC 對於查詢（讀取）資料的鎖並不會和寫入資料的鎖發生衝突，因此讀取不會阻擋寫入、寫入也不會阻擋讀取。PostgreSQL 即使在提供最嚴格的交易隔離等級中，也會透過使用創新的可序列化快照隔離（Serializable Snapshot Isolation，SSI）等級來維持這個保證。

對於不需要完整的交易隔離、或者喜歡明確地管理特定衝突點的應用程式，PostgreSQL 也提供資料表和資料列等級的鎖定功能。然而，適當地使用 MVCC 一般能夠提供比鎖定功能更佳的效能。此外，應用程式定義的 advisory lock 提供了一種與交易事務無關的鎖定機制。

13.2. 交易隔離

SQL 標準中定義了四個等級的交易隔離，其中最嚴格的隔離是「Serializable（序列化）」。序列化在標準的描述中，被定義為任意序列化交易的並行操作，都會保證產出與依照某種任意順序一個一個執行它們的效果相同。其他三個等級則是經由現象來定義的，即在不同的並行交易間的互動，這些現象在各個等集中必不能發生。標準中也注意到，基於 Serializable 的定義，這些現象都不可能發生在 Serializable 等級。（這並不難理解 -- 如果交易的影響必須與一個一個執行的結果一致，你怎麼會看到這些因為互動而產生的現象呢？）

在不同等級中被禁止的現象是：

dirty read（髒讀） 交易讀取的資料是由尚未提交的並行交易寫入的。

nonrepeatable read（無法重複的讀取） 交易重新讀取它之前讀過的資料，但是卻發現資料被其他交易修改（在最初讀取之後提交）了。

phantom read（幻讀） 交易重新執行查詢，得到滿足搜尋條件的資料集，但卻發現得到的資料集因為其他最近剛提交的交易而變更了。

serialization anomaly（序列化異常） 在成功提交一群交易後，結果與以所有可能的順序依序執行交易的結果都不一致。

SQL 標準以及 PostgreSQL 實作的交易隔離等級，可參閱 Table 13.1。

Table 13.1. 交易隔離等級

在 PostgreSQL 當中，你可以要求上述四種交易隔離等級的任何一種，但在 PostgreSQL 內部實際上實作的只有三種隔離等級。舉例來說，PostgreSQL 的 Read Uncommitted 模式跟 Read Committed 的行為很相像，這是因為這是能夠把標準的隔離等級對應到 PostgreSQL 的 MVCC 架構的明智方法。

表格中也顯示了 PostgreSQL 的 Repeatable Read 的實作並不允許 phantom read。關於這點，在 SQL 標準中，更嚴格的行為是被容許的：標準中定義的四個隔離等級，只定義了哪些現象必須不會發生，而沒有定義哪些現象必須發生。可用的隔離等級的行為，在接下來的小節中會詳細描述。

若要設定交易的隔離等級，可使用指令 SET TRANSACTION。

有些 PostgreSQL 的資料型態和函式具有特殊的交易行為的規則。具體來說，對序列（以及以 serial 宣告的計算器欄位）造成的變更將會立刻能夠被所有其他交易所看見，並且即使造成變更的交易取消了也不會被還原。請參考 9.16 小節及 8.1.4 小節。

13.2.1. Read Committed 隔離等級

Read Committed（提交讀） 是 PostgreSQL 預設的隔離等級。當交易使用這個隔離等級時，一個 SELECT 的查詢（沒有 FOR UPDATE/SHARE 的宣告）只會看見在查詢開始前已經被提交的資料；它不會看見尚未提交的資料或者是在查詢執行過程中被並行的交易提交的變更。實際上，SELECT 查詢是看見了在查詢開始執行的那一瞬間的資料庫快照。然而，SELECT 會看見在它自己的交易中比它先執行的更新，即使那些更新尚未被提交。同時需要注意的是，對於兩個成功的 SELECT 指令，如果其他交易在第一個 SELECT 開始後、第二個 SELECT 開始前提交變更，則即使這兩個 SELECT 在同一個交易中也可能會看見不同的資料，

UPDATE、DELETE、SELECT FOR UPDATE、和 SELECT FOR SHARE 在搜尋目標資料列的行為與 SELECT 一樣：它們只會找出在指令開始時已經被提交的目標資料列。然而，這些資料列在被找到的時候，有可能已經被其他並行交易更新（或者刪除、鎖定），在這個情況下，這個更新者會等待第一個更新的交易提交或者還原（如果那個交易還正在進行中）。如果第一個更新者還原了，那麼它的影響就無效了，第二個更新者會以原本找到的資料列做更新。而當第一個更新者提交了的狀況，若它把資料列刪除，則第二個更新者會忽略這個資料列，否則其他狀況下第二個更新者會嘗試在更新後的資料列上套用它要做的操作。此時指令中的搜尋條件（WHERE 宣告）會被重新評估，看看更新後的資料列是否依然符合搜尋條件，如果符合的話，第二個更新者就會對更新後的資料列套用它的操作。在 SELECT FOR UPDATE 和 SELECT FOR SHARE 的案例中，這代表會對更新後的資料列上鎖並回傳給客戶端。

包含 ON CONFLICT DO UPDATE 宣告的 INSERT 的行為也很相似。在 Read Committed 模式中，每個插入的資料列會被插入或者更新，除非有其他不相關的錯誤，否則會保證兩種結果的其中一種發生。如果衝突來自於其影響尚未能被 INSERT 所看見的其他交易，即使指令可能沒有該資料列在慣例上可見的版本，UPDATE 宣告也將會影響那個資料列。

包含 ON CONFLICT DO NOTHING 宣告的 INSERT 在其他交易的影響尚未在 INSERT 的快照中可見的狀況，有可能不會執行插入。再次強調，這是只有在 Read Committed 模式的案例。

因為上述的規則，對更新的指令來說有可能會看見不一致的快照：它能夠看見其他並行的更新指令對它嘗試要更新的資料列的影響，但它不會看見這些其他指令對資料庫裡的其他資料列的影響。這個行為使得 Read Committed 模式並不適合牽涉到複雜的搜尋條件的指令；然而，它對於比較簡單的案例卻是剛剛好。例如，考慮以下更新銀行餘額的交易：

BEGIN;
UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 12345;
UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 7534;
COMMIT;

如果兩個這樣的交易並行嘗試更新帳戶 12345 的餘額，我們顯然會希望第二個交易以帳戶資料列的更新後版本開始。因為每個指令都只會影響可預先決定的資料列，使得讓它看到更新後的資料列版本並不會造成任何有問題的不一致。

其他更複雜的使用狀況，在 Read Committed 模式中可以產生出並非期望的結果。例如考慮一個 DELETE 指令，要操作正好從它的限制條件中加入和移除的資料，像是假設 website 是一張有兩個資料列的表格，其中 website.hits 分別為 9 和 10：

BEGIN;
UPDATE website SET hits = hits + 1;
-- run from another session:  DELETE FROM website WHERE hits = 10;
COMMIT;

即使在 UPDATE 之前與之後，都存在 website.hits = 10 的資料列，但 DELETE 指令不會產生任何影響。這是因為更新前值為 9 的資料列已經被忽略了，並且當 UPDATE 執行完成且 DELETE 取得鎖，新的資料列的值已經不是 10 而是 11，不再滿足限制條件了。

因為 Read Committed 模式在每個指令開始時會取得包含在那個瞬間已經提交的所有交易的快照，其後在同個交易中的指令無論如何都會看到並行交易提交的影響。上述的爭論點在於單一指令是否看見資料庫完全一致的視野。

Read Committed 提供的部份交易隔離對很多應用程式來說已經足夠，並且這個模式很快且容易使用；然而，它沒辦法滿足所有的案例。對於做複雜查詢和更新的應用程式，可能會需要比 Read Committed 更嚴格的資料庫一致性視野。

13.2.2. Repeatable Read 隔離等級

Repeatable Read（重複讀） 隔離等級只會看到在交易開始前已經被提交的資料；它永遠不會看見尚未提交的資料或者在交易期間被並行交易提交的變更。（然而，查詢會看見在它自己的交易中前面的更新所造成的影響，即使那些影響尚未被提交。）這是比 SQL 標準對這個隔離等級要求的還要更強的保證，並且能夠預防除了 serialization anomalies 以外所有在表格 13.1描述的現象。如前所述，這是標準所允許的，因為標準只描述了每個隔離等級一定要提供的最小保護。

這個等級跟 Read Committed 不同的地方在於 Repeatable Read 的交易看見的是在交易的第一個非交易控制指令開始時的快照，而不是當前指令開始時的。因此，一個成功的 SELECT 指令在單一交易之內都會看見相同的資料，例如它們不會看見在它們的交易開始以後其他交易所提交的變更。

使用這個等級的應用程式必須要準備好因為序列化失敗造成的交易重試。

UPDATE、DELETE、SELECT FOR UPDATE、和 SELECT FOR SHARE 指令的行為在搜尋目標資料列時與 SELECT 相同：它們將只會找在交易開始之前已經被提交的目標資料列。然而，這些目標資料列有可能在它被搜尋到時，已經被其他並行交易更新（或者刪除、上鎖），此時，Repeatable Read 的交易會等待第一個更新的交易提交或者還原（如果它正在進行中）。如果第一個更新者還原了，那麼它的影響就無效了，Repeatable Read 的交易就可以對原本找到的資料列做更新。但如果第一個更新者提交了（且確實更新或者刪除這個資料列，而非只是鎖定它而已），那麼 Repeatable Read 的交易將會還原並回應以下的訊息。

ERROR:  could not serialize access due to concurrent update

這是因為 Repeatable Read 的交易無法更新或者鎖定在 Repeatable Read 的交易開始後被其他交易變更過的資料列。

當應用程式收到這個錯誤訊息，它應該要放棄現在的交易並且從頭開始重試整個交易。在第二次的期間，交易會看見之前提交的變更作為它對於資料庫的初始視野，因此在新的交易的更新中，不會有因為使用資料列的新版本為起始點而產生的邏輯衝突。

需注意的是，只有更新的交易可能需要重試；只有讀取的交易永遠不會發生序列化衝突。

Repeatable Read 模式提供了嚴格的保證，每個交易會看見完全穩定的資料庫視野。然而，這個視野並不需要跟並行交易的某些依序（一次一個）執行的結果總是維持一致。舉例來說，在這個等級中即使是一個只有讀取的交易，也可能只看見反應一個批次完成的控制紀錄的更新，但卻沒看見邏輯上是批次的一部分的一個細節紀錄變更，因為它讀取到的是比較早的控制紀錄的版本。嘗試想要在這個隔離等級的交易下執行商業邏輯時，若沒有謹慎地使用明確的鎖去阻止並行交易的話，可能不會正確地運作。

注意

在 PostgreSQL 9.1 以前，一個採用 Serializable 交易隔離等級的要求，會提供跟這裡描述的完全一樣的行為。若想要保持過去的 Serializable 的行為，在現在應該要使用 Repeatable Read。

13.2.3. Serializable 隔離等級

Serializable（序列化） 隔離等級提供了最嚴格的交易隔離。這個等級模擬了對所有提交的交易的一系列交易執行；如同交易們被一個接著一個地執行，連續地、而不是並行地。然而，如同 Repeatable Read，應用程式使用這個等級時必須要準備好因為序列化失敗而重試交易。事實上，這個隔離等級運作地完全與 Repeatable Read 相同，除了它會監視可能造成並行的序列化交易的執行結果跟一個一個執行這些交易產生些微不一致的症狀。這個監視行為並不會比 Repeatable Read 產生額外任何的阻塞（blocking），但監視的確會有一些額外的負擔，並且偵測到會造成 serialization anomaly 的症狀將會觸發 serialization failure 。

舉例來說，考慮有一個表格 mytab，一開始有：

 class | value
-------+-------
     1 |    10
     1 |    20
     2 |   100
     2 |   200

假設有個序列化交易 A 要計算：

SELECT SUM(value) FROM mytab WHERE class = 1;

然後將結果（30）作為 value 插入為 class = 2 的新資料列。同時序列化交易 B 並行地計算：

SELECT SUM(value) FROM mytab WHERE class = 2;

並且得到結果 300，將它插入為 class = 1 的新資料列。接著兩個交易嘗試要提交。如果其中任何一個交易是以 Repeatable Read 隔離等級執行的，兩個交易都會被允許提交；但因為並沒有序列的執行順序與這個結果一致，使用 Serializable 的交易將會導致一個交易提交、另一個被還原並回覆這個訊息：

ERROR:  could not serialize access due to read/write dependencies among transactions

這是因為如果 A 在 B 之前執行的話，B 的 SUM 函數就會得到 330 的總和而不是 300。然而若是將執行順序反過來的話，A 的 SUM 函數也會有總合數值不同的類似問題。

當我們藉由序列化交易來預防 serialization anomaly 的時候，在一個正在進行讀取的交易成功提交之前，任何從持久性使用者資料表所讀取的資料都不該被視為合法的。即使是唯讀的交易也是如此，除非是在讀取時直接視作合法資料的 deferrable 的交易，因為他會一直等到能取得一個確保不被這類問題影響的 snapshot 後才會開始讀取資料。在所有其他的應用情境中，都不應該依賴於稍後被中斷的交易所讀取的資料；取而代之的是．他們應該持續重試整個交易直到成功。

在 PostgreSQL 中可以使用 predicate locking 確保正確的序列化，當一個交易優先執行時，他將會持有一把特別的鎖，用來決定一個寫入操作何時會對先前其他併發交易中的讀取操作產生影響。這種鎖在 PostgreSQL 中並不會導致任何阻塞或是死鎖的情況，他們主要是用來判斷與標記在進行序列化併發交易時，會導致 serialization anomalies 的特定組合。相對而言，一個 Read Committed 或 Repeatable Read 隔離等級的交易，就可能需要取得一整個資料表的鎖才能確保其資料一致性。然而這可能會阻塞其他想存取那張資料表的使用者，或是在使用 SELECT FOR UPDATE 或 SELECT FOR SHARE 的指令時阻塞其他交易進行且造成額外的硬碟資料存取。

就像其他大多數的資料庫系統一樣，PostgreSQL 的 predicate locks 是基於一筆交易所實際訪問的資料的。這將會在 pg_locks 系統介面中的 SIReadLock 模式下呈現。在一次查詢中所需要的鎖會取決於它採用的執行計畫，同時為了避免追蹤各個鎖的狀態而導致記憶體用罄，他將會傾向於把多個細粒度的鎖（比如 tuple locks）組合成更粗粒度的鎖（比如 page locks）。而在一筆唯讀的交易中，如果它檢測到已經沒有會導致 serialization anomaly 的衝突時，它可以提前釋放所持有的 SIRead 鎖，而且實際上唯讀的交易往往可以在開始時就這麼做來避免持有任何 predicate locks。但如果你特別請求一筆 SERIALIZABLE READ ONLY DEFERRABLE 的交易的話，他仍會持續阻塞直到能確認這種情況為止。（這也是唯一一個 Serializable 交易會阻塞，但 Repeatable Read 交易卻不會的狀況。）從另一方面來說，SIRead 鎖通常需要持有直到交易提交之後，直到所有存取資料相互重疊的讀寫交易都完成為止。

一致使用序列化交易可以簡化開發。對於任何成功提交的併發交易的集合，都會有和逐個執行相同的效果，這個保證意味著如果你能證明一筆交易在獨自執行時是正確的，那麼你便可以相信他在任何混雜的序列化交易中也能運作良好，即使你不知道其他交易做了什麼或是使否成功提交也絲毫不影響。對於一個使用這類型技術的環境而言，有一個通用的方法來處理序列化失敗是非常重要的（他通常會返回一個 '40001' 的 SQLSTATE），因為很難去準確的預測哪些交易可能會對相關的資料進行讀寫並進行回滾來預防 serialization anomalies。對於相關資料的讀寫進行監控，以及重啟因為序列化失敗而受影響的交易都需要付出成本，但相對於顯式鎖導致的阻塞以及 SELECT FOR UPDATE 或 SELECT FOR SHARE 指令來得更平衡些，序列化交易確實是某些環境中追求效能的最佳解。

在 PostgresSQL 的 Serializable transaction 的隔離等級中，併發的交易只有在能夠證明有一個可產生同樣結果的序列化執行順序時，才會被允許提交，但他仍無法完全避免實際序列化執行時所引發的錯誤。更準確的來說，即使再插入鍵值之前已經明確檢查確認過他並不存在，仍然可能因為序列化交易存取資料的重疊問題，而導致違反 unique constraint 的衝突。這可以透過在插入可能引發衝突的鍵值之前檢查所有的序列化交易來避免。舉例而言，想像有一個應用程式它向使用者索取一個新的鍵值，並預先透過在資料表中選取他來檢查確保他尚未存在，或是直接選擇現有的最大鍵值加上 1 來生成一個新的鍵值。如果有些序列化交易沒有遵循這個協議反而直接插入鍵值的話，可能會回報違反了 unique constraint，即使他在一連串序列化執行的併發交易中不可能發生。

若要仰賴序列化交易的併發控制來追求效能最佳化的話，你需要考慮到這些議題：

盡可能將交易宣告為 READ ONLY 的
妥善控制使用中的連線數，並在必要時刻使用連線池。這總會是一個重要的效能考量，尤其是在使用序列化交易的工作量繁重的系統中。
別在單一筆交易中安排不必要的事務，以實現簡潔的目的
別讓連線陷入過久的閒置狀態。idle_in_transaction_session_timeout 這個設定參數可以用來自動斷開閒置的會話。
當有了序列化交易所帶來的隔離與保護後，移除掉不必要的顯式鎖、SELECT FOR UPDATE, 和 SELECT FOR SHARE 指令。
當 predicate lock 的資料表因為記憶體不足，而被迫把多個 page-level predicate locks 合併成單一個 relation-level predicate lock 時，序列化失敗率可能會因此上升。你可以透過提高 max_pred_locks_per_transaction, max_pred_locks_per_relation, 且/或 max_pred_locks_per_page 來避免。
線性掃描始終都需要一個 relation-level predicate lock，這可能導致序列化失敗率的上升。透過減少 random_page_cost 且/或增加 cpu_tuple_cost 來鼓勵使用索引掃描可能會有所幫助。記得要在減少交易回滾及重試次數與執行查詢的整題時間變化之間做好權衡。

13.3. 鎖定模式

PostgreSQL provides various lock modes to control concurrent access to data in tables. These modes can be used for application-controlled locking in situations where MVCC does not give the desired behavior. Also, most PostgreSQL commands automatically acquire locks of appropriate modes to ensure that referenced tables are not dropped or modified in incompatible ways while the command executes. (For example, TRUNCATE cannot safely be executed concurrently with other operations on the same table, so it obtains an exclusive lock on the table to enforce that.)

To examine a list of the currently outstanding locks in a database server, use the pg_locks system view. For more information on monitoring the status of the lock manager subsystem, refer to Chapter 27.

13.3.1. Table-Level Locks

The list below shows the available lock modes and the contexts in which they are used automatically by PostgreSQL. You can also acquire any of these locks explicitly with the command LOCK. Remember that all of these lock modes are table-level locks, even if the name contains the word “row”; the names of the lock modes are historical. To some extent the names reflect the typical usage of each lock mode — but the semantics are all the same. The only real difference between one lock mode and another is the set of lock modes with which each conflicts (see Table 13.2). Two transactions cannot hold locks of conflicting modes on the same table at the same time. (However, a transaction never conflicts with itself. For example, it might acquire ACCESS EXCLUSIVE lock and later acquire ACCESS SHARE lock on the same table.) Non-conflicting lock modes can be held concurrently by many transactions. Notice in particular that some lock modes are self-conflicting (for example, an ACCESS EXCLUSIVE lock cannot be held by more than one transaction at a time) while others are not self-conflicting (for example, an ACCESS SHARE lock can be held by multiple transactions).

Table-Level Lock Modes

ACCESS SHARE

Conflicts with the ACCESS EXCLUSIVE lock mode only.

The SELECT command acquires a lock of this mode on referenced tables. In general, any query that only reads a table and does not modify it will acquire this lock mode.ROW SHARE

Conflicts with the EXCLUSIVE and ACCESS EXCLUSIVE lock modes.

The SELECT FOR UPDATE and SELECT FOR SHARE commands acquire a lock of this mode on the target table(s) (in addition to ACCESS SHARE locks on any other tables that are referenced but not selected FOR UPDATE/FOR SHARE).ROW EXCLUSIVE

Conflicts with the SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE lock modes.

The commands UPDATE, DELETE, and INSERT acquire this lock mode on the target table (in addition to ACCESS SHARE locks on any other referenced tables). In general, this lock mode will be acquired by any command that modifies data in a table.SHARE UPDATE EXCLUSIVE

Conflicts with the SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE lock modes. This mode protects a table against concurrent schema changes and VACUUM runs.

Acquired by VACUUM (without FULL), ANALYZE, CREATE INDEX CONCURRENTLY, REINDEX CONCURRENTLY, CREATE STATISTICS, and certain ALTER INDEX and ALTER TABLE variants (for full details see ALTER INDEX and ALTER TABLE).

SHARE

Conflicts with the ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE lock modes. This mode protects a table against concurrent data changes.

Acquired by CREATE INDEX (without CONCURRENTLY).

SHARE ROW EXCLUSIVE

Conflicts with the ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE lock modes. This mode protects a table against concurrent data changes, and is self-exclusive so that only one session can hold it at a time.

Acquired by CREATE TRIGGER and some forms of ALTER TABLE (see ALTER TABLE).

EXCLUSIVE

Conflicts with the ROW SHARE, ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE lock modes. This mode allows only concurrent ACCESS SHARE locks, i.e., only reads from the table can proceed in parallel with a transaction holding this lock mode.

Acquired by REFRESH MATERIALIZED VIEW CONCURRENTLY.

ACCESS EXCLUSIVE

Conflicts with locks of all modes (ACCESS SHARE, ROW SHARE, ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE). This mode guarantees that the holder is the only transaction accessing the table in any way.

Acquired by the DROP TABLE, TRUNCATE, REINDEX, CLUSTER, VACUUM FULL, and REFRESH MATERIALIZED VIEW (without CONCURRENTLY) commands. Many forms of ALTER INDEX and ALTER TABLE also acquire a lock at this level. This is also the default lock mode for LOCK TABLE statements that do not specify a mode explicitly.

Tip

Only an ACCESS EXCLUSIVE lock blocks a SELECT (without FOR UPDATE/SHARE) statement.

Once acquired, a lock is normally held until the end of the transaction. But if a lock is acquired after establishing a savepoint, the lock is released immediately if the savepoint is rolled back to. This is consistent with the principle that ROLLBACK cancels all effects of the commands since the savepoint. The same holds for locks acquired within a PL/pgSQL exception block: an error escape from the block releases locks acquired within it.

Table 13.2. Conflicting Lock Modes

13.3.2. Row-Level Locks

In addition to table-level locks, there are row-level locks, which are listed as below with the contexts in which they are used automatically by PostgreSQL. See Table 13.3 for a complete table of row-level lock conflicts. Note that a transaction can hold conflicting locks on the same row, even in different subtransactions; but other than that, two transactions can never hold conflicting locks on the same row. Row-level locks do not affect data querying; they block only writers and lockers to the same row. Row-level locks are released at transaction end or during savepoint rollback, just like table-level locks.

Row-Level Lock Modes

FOR UPDATE

FOR UPDATE causes the rows retrieved by the SELECT statement to be locked as though for update. This prevents them from being locked, modified or deleted by other transactions until the current transaction ends. That is, other transactions that attempt UPDATE, DELETE, SELECT FOR UPDATE, SELECT FOR NO KEY UPDATE, SELECT FOR SHARE or SELECT FOR KEY SHARE of these rows will be blocked until the current transaction ends; conversely, SELECT FOR UPDATE will wait for a concurrent transaction that has run any of those commands on the same row, and will then lock and return the updated row (or no row, if the row was deleted). Within a REPEATABLE READ or SERIALIZABLE transaction, however, an error will be thrown if a row to be locked has changed since the transaction started. For further discussion see Section 13.4.

The FOR UPDATE lock mode is also acquired by any DELETE on a row, and also by an UPDATE that modifies the values on certain columns. Currently, the set of columns considered for the UPDATE case are those that have a unique index on them that can be used in a foreign key (so partial indexes and expressional indexes are not considered), but this may change in the future.

FOR NO KEY UPDATE

Behaves similarly to FOR UPDATE, except that the lock acquired is weaker: this lock will not block SELECT FOR KEY SHARE commands that attempt to acquire a lock on the same rows. This lock mode is also acquired by any UPDATE that does not acquire a FOR UPDATE lock.

FOR SHARE

Behaves similarly to FOR NO KEY UPDATE, except that it acquires a shared lock rather than exclusive lock on each retrieved row. A shared lock blocks other transactions from performing UPDATE, DELETE, SELECT FOR UPDATE or SELECT FOR NO KEY UPDATE on these rows, but it does not prevent them from performing SELECT FOR SHARE or SELECT FOR KEY SHARE.

FOR KEY SHARE

Behaves similarly to FOR SHARE, except that the lock is weaker: SELECT FOR UPDATE is blocked, but not SELECT FOR NO KEY UPDATE. A key-shared lock blocks other transactions from performing DELETE or any UPDATE that changes the key values, but not other UPDATE, and neither does it prevent SELECT FOR NO KEY UPDATE, SELECT FOR SHARE, or SELECT FOR KEY SHARE.

PostgreSQL doesn't remember any information about modified rows in memory, so there is no limit on the number of rows locked at one time. However, locking a row might cause a disk write, e.g., SELECT FOR UPDATE modifies selected rows to mark them locked, and so will result in disk writes.

Table 13.3. Conflicting Row-Level Locks

13.3.3. Page-Level Locks

In addition to table and row locks, page-level share/exclusive locks are used to control read/write access to table pages in the shared buffer pool. These locks are released immediately after a row is fetched or updated. Application developers normally need not be concerned with page-level locks, but they are mentioned here for completeness.

13.3.4. Deadlocks

The use of explicit locking can increase the likelihood of deadlocks, wherein two (or more) transactions each hold locks that the other wants. For example, if transaction 1 acquires an exclusive lock on table A and then tries to acquire an exclusive lock on table B, while transaction 2 has already exclusive-locked table B and now wants an exclusive lock on table A, then neither one can proceed. PostgreSQL automatically detects deadlock situations and resolves them by aborting one of the transactions involved, allowing the other(s) to complete. (Exactly which transaction will be aborted is difficult to predict and should not be relied upon.)

Note that deadlocks can also occur as the result of row-level locks (and thus, they can occur even if explicit locking is not used). Consider the case in which two concurrent transactions modify a table. The first transaction executes:

UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 11111;

This acquires a row-level lock on the row with the specified account number. Then, the second transaction executes:

UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 22222;
UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 11111;

The first UPDATE statement successfully acquires a row-level lock on the specified row, so it succeeds in updating that row. However, the second UPDATE statement finds that the row it is attempting to update has already been locked, so it waits for the transaction that acquired the lock to complete. Transaction two is now waiting on transaction one to complete before it continues execution. Now, transaction one executes:

UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 22222;

Transaction one attempts to acquire a row-level lock on the specified row, but it cannot: transaction two already holds such a lock. So it waits for transaction two to complete. Thus, transaction one is blocked on transaction two, and transaction two is blocked on transaction one: a deadlock condition. PostgreSQL will detect this situation and abort one of the transactions.

The best defense against deadlocks is generally to avoid them by being certain that all applications using a database acquire locks on multiple objects in a consistent order. In the example above, if both transactions had updated the rows in the same order, no deadlock would have occurred. One should also ensure that the first lock acquired on an object in a transaction is the most restrictive mode that will be needed for that object. If it is not feasible to verify this in advance, then deadlocks can be handled on-the-fly by retrying transactions that abort due to deadlocks.

So long as no deadlock situation is detected, a transaction seeking either a table-level or row-level lock will wait indefinitely for conflicting locks to be released. This means it is a bad idea for applications to hold transactions open for long periods of time (e.g., while waiting for user input).

13.3.5. Advisory Locks

PostgreSQL 提供了一種建立具有應用程式定義的鎖定方法。這些被稱為 advisory lock，因為系統不會強制使用它們-取決於應用程式是否正確地使用它們。Advisory lock 對於不適用於 MVCC 模型的鎖定策略很有用。例如，Advisory lock 常見的用法是模擬所謂的「flat file」資料管理系統特有的悲觀鎖定策略。雖然儲存在資料表中的識別內容可以用於相同的目的，但 Advisory lock 更快，可以避免資料表膨脹，並在連線結束時由伺服器自動清除。

There are two ways to acquire an advisory lock in PostgreSQL: at session level or at transaction level. Once acquired at session level, an advisory lock is held until explicitly released or the session ends. Unlike standard lock requests, session-level advisory lock requests do not honor transaction semantics: a lock acquired during a transaction that is later rolled back will still be held following the rollback, and likewise an unlock is effective even if the calling transaction fails later. A lock can be acquired multiple times by its owning process; for each completed lock request there must be a corresponding unlock request before the lock is actually released. Transaction-level lock requests, on the other hand, behave more like regular lock requests: they are automatically released at the end of the transaction, and there is no explicit unlock operation. This behavior is often more convenient than the session-level behavior for short-term usage of an advisory lock. Session-level and transaction-level lock requests for the same advisory lock identifier will block each other in the expected way. If a session already holds a given advisory lock, additional requests by it will always succeed, even if other sessions are awaiting the lock; this statement is true regardless of whether the existing lock hold and new request are at session level or transaction level.

Like all locks in PostgreSQL, a complete list of advisory locks currently held by any session can be found in the pg_locks system view.

Both advisory locks and regular locks are stored in a shared memory pool whose size is defined by the configuration variables max_locks_per_transaction and max_connections. Care must be taken not to exhaust this memory or the server will be unable to grant any locks at all. This imposes an upper limit on the number of advisory locks grantable by the server, typically in the tens to hundreds of thousands depending on how the server is configured.

In certain cases using advisory locking methods, especially in queries involving explicit ordering and LIMIT clauses, care must be taken to control the locks acquired because of the order in which SQL expressions are evaluated. For example:

SELECT pg_advisory_lock(id) FROM foo WHERE id = 12345; -- ok
SELECT pg_advisory_lock(id) FROM foo WHERE id > 12345 LIMIT 100; -- danger!
SELECT pg_advisory_lock(q.id) FROM
(
  SELECT id FROM foo WHERE id > 12345 LIMIT 100
) q; -- ok

In the above queries, the second form is dangerous because the LIMIT is not guaranteed to be applied before the locking function is executed. This might cause some locks to be acquired that the application was not expecting, and hence would fail to release (until it ends the session). From the point of view of the application, such locks would be dangling, although still viewable in pg_locks.

The functions provided to manipulate advisory locks are described in Section 9.26.10.

13.4. 在應用端檢視資料一致性

It is very difficult to enforce business rules regarding data integrity using Read Committed transactions because the view of the data is shifting with each statement, and even a single statement may not restrict itself to the statement's snapshot if a write conflict occurs.

While a Repeatable Read transaction has a stable view of the data throughout its execution, there is a subtle issue with using MVCC snapshots for data consistency checks, involving something known as read/write conflicts. If one transaction writes data and a concurrent transaction attempts to read the same data (whether before or after the write), it cannot see the work of the other transaction. The reader then appears to have executed first regardless of which started first or which committed first. If that is as far as it goes, there is no problem, but if the reader also writes data which is read by a concurrent transaction there is now a transaction which appears to have run before either of the previously mentioned transactions. If the transaction which appears to have executed last actually commits first, it is very easy for a cycle to appear in a graph of the order of execution of the transactions. When such a cycle appears, integrity checks will not work correctly without some help.

As mentioned in Section 13.2.3, Serializable transactions are just Repeatable Read transactions which add nonblocking monitoring for dangerous patterns of read/write conflicts. When a pattern is detected which could cause a cycle in the apparent order of execution, one of the transactions involved is rolled back to break the cycle.

13.4.1. Enforcing Consistency With Serializable Transactions

If the Serializable transaction isolation level is used for all writes and for all reads which need a consistent view of the data, no other effort is required to ensure consistency. Software from other environments which is written to use serializable transactions to ensure consistency should “just work” in this regard in PostgreSQL.

When using this technique, it will avoid creating an unnecessary burden for application programmers if the application software goes through a framework which automatically retries transactions which are rolled back with a serialization failure. It may be a good idea to set default_transaction_isolation to serializable. It would also be wise to take some action to ensure that no other transaction isolation level is used, either inadvertently or to subvert integrity checks, through checks of the transaction isolation level in triggers.

See Section 13.2.3 for performance suggestions.

Warning

This level of integrity protection using Serializable transactions does not yet extend to hot standby mode (Section 26.5). Because of that, those using hot standby may want to use Repeatable Read and explicit locking on the master.

13.4.2. Enforcing Consistency With Explicit Blocking Locks

When non-serializable writes are possible, to ensure the current validity of a row and protect it against concurrent updates one must use SELECT FOR UPDATE, SELECT FOR SHARE, or an appropriate LOCK TABLE statement. (SELECT FOR UPDATE and SELECT FOR SHARE lock just the returned rows against concurrent updates, while LOCK TABLE locks the whole table.) This should be taken into account when porting applications to PostgreSQL from other environments.

Also of note to those converting from other environments is the fact that SELECT FOR UPDATE does not ensure that a concurrent transaction will not update or delete a selected row. To do that in PostgreSQL you must actually update the row, even if no values need to be changed. SELECT FOR UPDATE temporarily blocks other transactions from acquiring the same lock or executing an UPDATEor DELETE which would affect the locked row, but once the transaction holding this lock commits or rolls back, a blocked transaction will proceed with the conflicting operation unless an actual UPDATEof the row was performed while the lock was held.

Global validity checks require extra thought under non-serializable MVCC. For example, a banking application might wish to check that the sum of all credits in one table equals the sum of debits in another table, when both tables are being actively updated. Comparing the results of two successive SELECT sum(...) commands will not work reliably in Read Committed mode, since the second query will likely include the results of transactions not counted by the first. Doing the two sums in a single repeatable read transaction will give an accurate picture of only the effects of transactions that committed before the repeatable read transaction started — but one might legitimately wonder whether the answer is still relevant by the time it is delivered. If the repeatable read transaction itself applied some changes before trying to make the consistency check, the usefulness of the check becomes even more debatable, since now it includes some but not all post-transaction-start changes. In such cases a careful person might wish to lock all tables needed for the check, in order to get an indisputable picture of current reality. A SHARE mode (or higher) lock guarantees that there are no uncommitted changes in the locked table, other than those of the current transaction.

Note also that if one is relying on explicit locking to prevent concurrent changes, one should either use Read Committed mode, or in Repeatable Read mode be careful to obtain locks before performing queries. A lock obtained by a repeatable read transaction guarantees that no other transactions modifying the table are still running, but if the snapshot seen by the transaction predates obtaining the lock, it might predate some now-committed changes in the table. A repeatable read transaction's snapshot is actually frozen at the start of its first query or data-modification command (SELECT, INSERT, UPDATE, or DELETE), so it is possible to obtain locks explicitly before the snapshot is frozen.

13.5. 特別注意

有一些 DDL 命令，目前只有 TRUNCATE 和 ALTER TABLE 的資料表重寫語法，並不是 MVCC 安全的。這意味著在清除或重寫提交之後，如果使用在提交 DDL 指令之前的快照，會使該資料表對於平行處理中的事務將顯示為空。對於在 DDL 指令開始之前沒有存取相關資料表的事務來說，這會是一個問題 - 任何已經這樣做的事務都將至少保存一個 ACCESS SHARE 資料表鎖，這會在該事務完成之前阻擋 DDL 指令。因此，這些指令不會使目標資料表的連續查詢中導致資料表內容的明顯不一致，但它們可能會導致目標資料表的內容與資料庫中的其他資料表之間出現可見的不一致。

尚未將對 Serializable 事務隔離等級的支援增加到 Hot Standby 複製目標（如第 26.5 節中所述）。熱備份模式目前支援的最嚴格的隔離等級是 Repeatable Read。在主服務器上的 Serializable 事務中執行所有永久性資料庫寫操作時，將確保所有備用服務器最終能達到一致狀態，在備用服務器上執行的可重複讀事務處理有時會看到與任何串行執行的事務不一致的暫時狀態。

13.6. 鎖定與索引

Though PostgreSQL provides nonblocking read/write access to table data, nonblocking read/write access is not currently offered for every index access method implemented in PostgreSQL. The various index types are handled as follows:

B-tree, GiST and SP-GiST indexes

Short-term share/exclusive page-level locks are used for read/write access. Locks are released immediately after each index row is fetched or inserted. These index types provide the highest concurrency without deadlock conditions.

Hash indexes

Share/exclusive hash-bucket-level locks are used for read/write access. Locks are released after the whole bucket is processed. Bucket-level locks provide better concurrency than index-level ones, but deadlock is possible since the locks are held longer than one index operation.

GIN indexes

Short-term share/exclusive page-level locks are used for read/write access. Locks are released immediately after each index row is fetched or inserted. But note that insertion of a GIN-indexed value usually produces several index key insertions per row, so GIN might do substantial work for a single value's insertion.

Currently, B-tree indexes offer the best performance for concurrent applications; since they also have more features than hash indexes, they are the recommended index type for concurrent applications that need to index scalar data. When dealing with non-scalar data, B-trees are not useful, and GiST, SP-GiST or GIN indexes should be used instead.