1 of 7

13. 一致性管理（MVCC）

本章描述兩個或更多連線在同一時間嘗試存取相同資料時 PostgreSQL 資料庫系統的行為。這種情況下的目標是允許所有連線進行高效率的存取，同時保持嚴格的資料一致性。每個資料庫應用程式開發人員都應該熟悉本章所涉及的主題。

13.1. 簡介

PostgreSQL 為開發者們提供了豐富的工具來管理資料的同時存取。資料的一致性在資料內部是以多重資料版本的方式維護（Multiversion Concurrency Control，MVCC），這表示無論目前資料的當下狀態如何，每個 SQL 指令會看見的是資料在一段時間前的快照（資料庫的某個版本）。這個機制可以避免指令看到由其他同時交易正在更新同個資料列所產生的資料不一致，也對每個資料庫的連線階段提供了_交易隔離_。MVCC 也藉由避開傳統資料庫系統的上鎖方式減少了鎖的競爭，以在多使用者的環境中提供合理的效能。

相較於鎖定的機制來說，一致性控制使用 MVCC 模式的主要優勢，是在於 MVCC 對於查詢（讀取）資料的鎖並不會和寫入資料的鎖發生衝突，因此讀取不會阻擋寫入、寫入也不會阻擋讀取。PostgreSQL 即使在提供最嚴格的交易隔離等級中，也會透過使用創新的_可序列化快照隔離_（Serializable Snapshot Isolation，SSI）等級來維持這個保證。

對於不需要完整的交易隔離、或者喜歡明確地管理特定衝突點的應用程式，PostgreSQL 也提供資料表和資料列等級的鎖定功能。然而，適當地使用 MVCC 一般能夠提供比鎖定功能更佳的效能。此外，應用程式定義的 advisory lock 提供了一種與交易事務無關的鎖定機制。

13.2. 交易隔離

SQL 標準中定義了四個等級的交易隔離，其中最嚴格的隔離是「Serializable（序列化）」。序列化在標準的描述中，被定義為任意序列化交易的並行操作，都會保證產出與依照某種任意順序一個一個執行它們的效果相同。其他三個等級則是經由現象來定義的，即在不同的並行交易間的互動，這些現象在各個等級中必不能發生。標準中也注意到，基於 Serializable 的定義，這些現象都不可能發生在 Serializable 等級。（這並不難理解 -- 如果交易的影響必須與一個一個執行的結果一致，你怎麼會看到這些因為互動而產生的現象呢？）

在不同等級中被禁止的現象是：

dirty read（髒讀） 交易讀取的資料是由尚未提交的並行交易寫入的。

nonrepeatable read（無法重複的讀取） 交易重新讀取它之前讀過的資料，但是卻發現資料被其他交易修改（在最初讀取之後提交）了。

phantom read（幻讀） 交易重新執行查詢，得到滿足搜尋條件的資料集，但卻發現得到的資料集因為其他最近剛提交的交易而變更了。

serialization anomaly（序列化異常） 在成功提交一群交易後，結果與以所有可能的順序依序執行交易的結果都不一致。

SQL 標準以及 PostgreSQL 實作的交易隔離等級，可參閱 Table 13.1。

Table 13.1. 交易隔離等級

隔離等級

Dirty Read

Nonrepeatable Read

Phantom Read

Serialization Anomaly

在 PostgreSQL 當中，你可以要求上述四種交易隔離等級的任何一種，但在 PostgreSQL 內部實際上實作的只有三種隔離等級。舉例來說，PostgreSQL 的 Read Uncommitted 模式跟 Read Committed 的行為很相像，這是因為這是能夠把標準的隔離等級對應到 PostgreSQL 的 MVCC 架構的明智方法。

表格中也顯示了 PostgreSQL 的 Repeatable Read 的實作並不允許 phantom read。關於這點，在 SQL 標準中，更嚴格的行為是被容許的：標準中定義的四個隔離等級，只定義了哪些現象必須不會發生，而沒有定義哪些現象必須發生。可用的隔離等級的行為，在接下來的小節中會詳細描述。

若要設定交易的隔離等級，可使用指令 SET TRANSACTION。

有些 PostgreSQL 的資料型態和函式具有特殊的交易行為的規則。具體來說，對序列（以及以 serial 宣告的計算器欄位）造成的變更將會立刻能夠被所有其他交易所看見，並且即使造成變更的交易取消了也不會被還原。請參考 9.16 小節及 8.1.4 小節。

13.2.1. Read Committed 隔離等級

Read Committed（提交讀） 是 PostgreSQL 預設的隔離等級。當交易使用這個隔離等級時，一個 SELECT 的查詢（沒有 FOR UPDATE/SHARE 的宣告）只會看見在查詢開始前已經被提交的資料；它不會看見尚未提交的資料或者是在查詢執行過程中被並行的交易提交的變更。實際上，SELECT 查詢是看見了在查詢開始執行的那一瞬間的資料庫快照。然而，SELECT 會看見在它自己的交易中比它先執行的更新，即使那些更新尚未被提交。同時需要注意的是，對於兩個成功的 SELECT 指令，如果其他交易在第一個 SELECT 開始後、第二個 SELECT 開始前提交變更，則即使這兩個 SELECT 在同一個交易中也可能會看見不同的資料，

UPDATE、DELETE、SELECT FOR UPDATE、和 SELECT FOR SHARE 在搜尋目標資料列的行為與 SELECT 一樣：它們只會找出在指令開始時已經被提交的目標資料列。然而，這些資料列在被找到的時候，有可能已經被其他並行交易更新（或者刪除、鎖定），在這個情況下，這個更新者會等待第一個更新的交易提交或者還原（如果那個交易還正在進行中）。如果第一個更新者還原了，那麼它的影響就無效了，第二個更新者會以原本找到的資料列做更新。而當第一個更新者提交了的狀況，若它把資料列刪除，則第二個更新者會忽略這個資料列，否則其他狀況下第二個更新者會嘗試在更新後的資料列上套用它要做的操作。此時指令中的搜尋條件（WHERE 宣告）會被重新評估，看看更新後的資料列是否依然符合搜尋條件，如果符合的話，第二個更新者就會對更新後的資料列套用它的操作。在 SELECT FOR UPDATE 和 SELECT FOR SHARE 的案例中，這代表會對更新後的資料列上鎖並回傳給客戶端。

包含 ON CONFLICT DO UPDATE 宣告的 INSERT 的行為也很相似。在 Read Committed 模式中，每個插入的資料列會被插入或者更新，除非有其他不相關的錯誤，否則會保證兩種結果的其中一種發生。如果衝突來自於其影響尚未能被 INSERT 所看見的其他交易，即使指令可能沒有該資料列在慣例上可見的版本，UPDATE 宣告也將會影響那個資料列。

包含 ON CONFLICT DO NOTHING 宣告的 INSERT 在其他交易的影響尚未在 INSERT 的快照中可見的狀況，有可能不會執行插入。再次強調，這是只有在 Read Committed 模式的案例。

因為上述的規則，對更新的指令來說有可能會看見不一致的快照：它能夠看見其他並行的更新指令對它嘗試要更新的資料列的影響，但它不會看見這些其他指令對資料庫裡的其他資料列的影響。這個行為使得 Read Committed 模式並不適合牽涉到複雜的搜尋條件的指令；然而，它對於比較簡單的案例卻是剛剛好。例如，考慮以下更新銀行餘額的交易：

BEGIN;
UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 12345;
UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 7534;
COMMIT;

如果兩個這樣的交易並行嘗試更新帳戶 12345 的餘額，我們顯然會希望第二個交易以帳戶資料列的更新後版本開始。因為每個指令都只會影響可預先決定的資料列，使得讓它看到更新後的資料列版本並不會造成任何有問題的不一致。

其他更複雜的使用狀況，在 Read Committed 模式中可以產生出並非期望的結果。例如考慮一個 DELETE 指令，要操作正好從它的限制條件中加入和移除的資料，像是假設 website 是一張有兩個資料列的表格，其中 website.hits 分別為 9 和 10：

BEGIN;
UPDATE website SET hits = hits + 1;
-- run from another session:  DELETE FROM website WHERE hits = 10;
COMMIT;

即使在 UPDATE 之前與之後，都存在 website.hits = 10 的資料列，但 DELETE 指令不會產生任何影響。這是因為更新前值為 9 的資料列已經被忽略了，並且當 UPDATE 執行完成且 DELETE 取得鎖，新的資料列的值已經不是 10 而是 11，不再滿足限制條件了。

因為 Read Committed 模式在每個指令開始時會取得包含在那個瞬間已經提交的所有交易的快照，其後在同個交易中的指令無論如何都會看到並行交易提交的影響。上述的爭論點在於單一指令是否看見資料庫完全一致的視野。

Read Committed 提供的部份交易隔離對很多應用程式來說已經足夠，並且這個模式很快且容易使用；然而，它沒辦法滿足所有的案例。對於做複雜查詢和更新的應用程式，可能會需要比 Read Committed 更嚴格的資料庫一致性視野。

13.2.2. Repeatable Read 隔離等級

Repeatable Read（重複讀取） 隔離等級只會看到在交易開始前已經被提交的資料；它永遠不會看見尚未提交的資料或者在交易期間被並行交易提交的變更。（然而，查詢會看見在它自己的交易中前面的更新所造成的影響，即使那些影響尚未被提交。）這是比 SQL 標準對這個隔離等級要求的還要更強的保證，並且能夠預防除了 serialization anomalies 以外所有在表格 13.1描述的現象。如前所述，這是標準所允許的，因為標準只描述了每個隔離等級一定要提供的最小保護。

這個等級跟 Read Committed 不同的地方在於 Repeatable Read 的交易看見的是在交易的第一個非交易控制指令開始時的快照，而不是當前指令開始時的。因此，一個成功的 SELECT 指令在單一交易之內都會看見相同的資料，例如它們不會看見在它們的交易開始以後其他交易所提交的變更。

使用這個等級的應用程式必須要準備好因為序列化失敗造成的交易重試。

UPDATE、DELETE、SELECT FOR UPDATE、和 SELECT FOR SHARE 指令的行為在搜尋目標資料列時與 SELECT 相同：它們將只會找在交易開始之前已經被提交的目標資料列。然而，這些目標資料列有可能在它被搜尋到時，已經被其他並行交易更新（或者刪除、上鎖），此時，Repeatable Read 的交易會等待第一個更新的交易提交或者還原（如果它正在進行中）。如果第一個更新者還原了，那麼它的影響就無效了，Repeatable Read 的交易就可以對原本找到的資料列做更新。但如果第一個更新者提交了（且確實更新或者刪除這個資料列，而非只是鎖定它而已），那麼 Repeatable Read 的交易將會還原並回應以下的訊息。

ERROR:  could not serialize access due to concurrent update

這是因為 Repeatable Read 的交易無法更新或者鎖定在 Repeatable Read 的交易開始後被其他交易變更過的資料列。

當應用程式收到這個錯誤訊息，它應該要放棄現在的交易並且從頭開始重試整個交易。在第二次的期間，交易會看見之前提交的變更作為它對於資料庫的初始視野，因此在新的交易的更新中，不會有因為使用資料列的新版本為起始點而產生的邏輯衝突。

需注意的是，只有更新的交易可能需要重試；只有讀取的交易永遠不會發生序列化衝突。

Repeatable Read 模式提供了嚴格的保證，每個交易會看見完全穩定的資料庫視野。然而，這個視野並不需要跟並行交易的某些依序（一次一個）執行的結果總是維持一致。舉例來說，在這個等級中即使是一個只有讀取的交易，也可能只看見反應一個批次完成的控制紀錄的更新，但卻沒看見邏輯上是批次的一部分的一個細節紀錄變更，因為它讀取到的是比較早的控制紀錄的版本。嘗試想要在這個隔離等級的交易下執行商業邏輯時，若沒有謹慎地使用明確的鎖去阻止並行交易的話，可能不會正確地運作。

在 PostgreSQL 9.1 以前，一個採用 Serializable 交易隔離等級的要求，會提供跟這裡描述的完全一樣的行為。若想要保持過去的 Serializable 的行為，在現在應該要使用 Repeatable Read。

13.2.3. Serializable 隔離等級

Serializable（序列化） 隔離等級提供了最嚴格的交易隔離。這個等級模擬了對所有提交的交易的一系列交易執行；如同交易們被一個接著一個地執行，連續地、而不是並行地。然而，如同 Repeatable Read，應用程式使用這個等級時必須要準備好因為序列化失敗而重試交易。事實上，這個隔離等級運作地完全與 Repeatable Read 相同，除了它會監視可能造成並行的序列化交易的執行結果跟一個一個執行這些交易產生些微不一致的症狀。這個監視行為並不會比 Repeatable Read 產生額外任何的阻塞（blocking），但監視的確會有一些額外的負擔，並且偵測到會造成 serialization anomaly 的症狀將會觸發 serialization failure 。

舉例來說，考慮有一個表格 mytab，一開始有：

 class | value
-------+-------
     1 |    10
     1 |    20
     2 |   100
     2 |   200

假設有個序列化交易 A 要計算：

SELECT SUM(value) FROM mytab WHERE class = 1;

然後將結果（30）作為 value 插入為 class = 2 的新資料列。同時序列化交易 B 並行地計算：

SELECT SUM(value) FROM mytab WHERE class = 2;

並且得到結果 300，將它插入為 class = 1 的新資料列。接著兩個交易嘗試要提交。如果其中任何一個交易是以 Repeatable Read 隔離等級執行的，兩個交易都會被允許提交；但因為並沒有序列的執行順序與這個結果一致，使用 Serializable 的交易將會導致一個交易提交、另一個被還原並回覆這個訊息：

ERROR:  could not serialize access due to read/write dependencies among transactions

這是因為如果 A 在 B 之前執行的話，B 會得到結果 330 而非 300；同樣的，其他順序會導致 A 得到不同的結果。

When relying on Serializable transactions to prevent anomalies, it is important that any data read from a permanent user table not be considered valid until the transaction which read it has successfully committed. This is true even for read-only transactions, except that data read within a deferrable read-only transaction is known to be valid as soon as it is read, because such a transaction waits until it can acquire a snapshot guaranteed to be free from such problems before starting to read any data. In all other cases applications must not depend on results read during a transaction that later aborted; instead, they should retry the transaction until it succeeds.

To guarantee true serializability PostgreSQL uses predicate locking, which means that it keeps locks which allow it to determine when a write would have had an impact on the result of a previous read from a concurrent transaction, had it run first. In PostgreSQL these locks do not cause any blocking and therefore can not play any part in causing a deadlock. They are used to identify and flag dependencies among concurrent Serializable transactions which in certain combinations can lead to serialization anomalies. In contrast, a Read Committed or Repeatable Read transaction which wants to ensure data consistency may need to take out a lock on an entire table, which could block other users attempting to use that table, or it may use SELECT FOR UPDATE or SELECT FOR SHARE which not only can block other transactions but cause disk access.

Predicate locks in PostgreSQL, like in most other database systems, are based on data actually accessed by a transaction. These will show up in the pg_locks system view with a mode of SIReadLock. The particular locks acquired during execution of a query will depend on the plan used by the query, and multiple finer-grained locks (e.g., tuple locks) may be combined into fewer coarser-grained locks (e.g., page locks) during the course of the transaction to prevent exhaustion of the memory used to track the locks. A READ ONLYtransaction may be able to release its SIRead locks before completion, if it detects that no conflicts can still occur which could lead to a serialization anomaly. In fact, READ ONLYtransactions will often be able to establish that fact at startup and avoid taking any predicate locks. If you explicitly request a SERIALIZABLE READ ONLY DEFERRABLE transaction, it will block until it can establish this fact. (This is the only case where Serializable transactions block but Repeatable Read transactions don't.) On the other hand, SIRead locks often need to be kept past transaction commit, until overlapping read write transactions complete.

Consistent use of Serializable transactions can simplify development. The guarantee that any set of successfully committed concurrent Serializable transactions will have the same effect as if they were run one at a time means that if you can demonstrate that a single transaction, as written, will do the right thing when run by itself, you can have confidence that it will do the right thing in any mix of Serializable transactions, even without any information about what those other transactions might do, or it will not successfully commit. It is important that an environment which uses this technique have a generalized way of handling serialization failures (which always return with a SQLSTATE value of '40001'), because it will be very hard to predict exactly which transactions might contribute to the read/write dependencies and need to be rolled back to prevent serialization anomalies. The monitoring of read/write dependencies has a cost, as does the restart of transactions which are terminated with a serialization failure, but balanced against the cost and blocking involved in use of explicit locks and SELECT FOR UPDATE or SELECT FOR SHARE, Serializable transactions are the best performance choice for some environments.

While PostgreSQL's Serializable transaction isolation level only allows concurrent transactions to commit if it can prove there is a serial order of execution that would produce the same effect, it doesn't always prevent errors from being raised that would not occur in true serial execution. In particular, it is possible to see unique constraint violations caused by conflicts with overlapping Serializable transactions even after explicitly checking that the key isn't present before attempting to insert it. This can be avoided by making sure that all Serializable transactions that insert potentially conflicting keys explicitly check if they can do so first. For example, imagine an application that asks the user for a new key and then checks that it doesn't exist already by trying to select it first, or generates a new key by selecting the maximum existing key and adding one. If some Serializable transactions insert new keys directly without following this protocol, unique constraints violations might be reported even in cases where they could not occur in a serial execution of the concurrent transactions.

For optimal performance when relying on Serializable transactions for concurrency control, these issues should be considered:

Declare transactions as READ ONLY when possible.
Control the number of active connections, using a connection pool if needed. This is always an important performance consideration, but it can be particularly important in a busy system using Serializable transactions.
Don't put more into a single transaction than needed for integrity purposes.
Don't leave connections dangling “idle in transaction” longer than necessary. The configuration parameter idle_in_transaction_session_timeout may be used to automatically disconnect lingering sessions.
Eliminate explicit locks, SELECT FOR UPDATE, and SELECT FOR SHARE where no longer needed due to the protections automatically provided by Serializable transactions.
When the system is forced to combine multiple page-level predicate locks into a single relation-level predicate lock because the predicate lock table is short of memory, an increase in the rate of serialization failures may occur. You can avoid this by increasing max_pred_locks_per_transaction, max_pred_locks_per_relation, and/or max_pred_locks_per_page.
A sequential scan will always necessitate a relation-level predicate lock. This can result in an increased rate of serialization failures. It may be helpful to encourage the use of index scans by reducing random_page_cost and/or increasing cpu_tuple_cost. Be sure to weigh any decrease in transaction rollbacks and restarts against any overall change in query execution time.

13.3. 鎖定模式

PostgreSQL provides various lock modes to control concurrent access to data in tables. These modes can be used for application-controlled locking in situations where MVCC does not give the desired behavior. Also, most PostgreSQL commands automatically acquire locks of appropriate modes to ensure that referenced tables are not dropped or modified in incompatible ways while the command executes. (For example, TRUNCATE cannot safely be executed concurrently with other operations on the same table, so it obtains an exclusive lock on the table to enforce that.)

To examine a list of the currently outstanding locks in a database server, use the pg_locks system view. For more information on monitoring the status of the lock manager subsystem, refer to Chapter 27.

13.3.1. Table-Level Locks

The list below shows the available lock modes and the contexts in which they are used automatically by PostgreSQL. You can also acquire any of these locks explicitly with the command LOCK. Remember that all of these lock modes are table-level locks, even if the name contains the word “row”; the names of the lock modes are historical. To some extent the names reflect the typical usage of each lock mode — but the semantics are all the same. The only real difference between one lock mode and another is the set of lock modes with which each conflicts (see Table 13.2). Two transactions cannot hold locks of conflicting modes on the same table at the same time. (However, a transaction never conflicts with itself. For example, it might acquire ACCESS EXCLUSIVE lock and later acquire ACCESS SHARE lock on the same table.) Non-conflicting lock modes can be held concurrently by many transactions. Notice in particular that some lock modes are self-conflicting (for example, an ACCESS EXCLUSIVE lock cannot be held by more than one transaction at a time) while others are not self-conflicting (for example, an ACCESS SHARE lock can be held by multiple transactions).

Table-Level Lock Modes

ACCESS SHARE

Conflicts with the ACCESS EXCLUSIVE lock mode only.

The SELECT command acquires a lock of this mode on referenced tables. In general, any query that only reads a table and does not modify it will acquire this lock mode.

ROW SHARE

Conflicts with the EXCLUSIVE and ACCESS EXCLUSIVE lock modes.

The SELECT FOR UPDATE and SELECT FOR SHARE commands acquire a lock of this mode on the target table(s) (in addition to ACCESS SHARE locks on any other tables that are referenced but not selected FOR UPDATE/FOR SHARE).

ROW EXCLUSIVE

Conflicts with the SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE lock modes.

The commands UPDATE, DELETE, and INSERT acquire this lock mode on the target table (in addition to ACCESS SHARE locks on any other referenced tables). In general, this lock mode will be acquired by any command that modifies data in a table.

SHARE UPDATE EXCLUSIVE

與 SHARE UPDATE EXCLUSIVE、SHARE、SHARE ROW EXCLUSIVE，EXCLUSIVE 和 ACCESS EXCLUSIVE 鎖定模式衝突。此模式可防止資料表發生同時間的資料庫結構變更及 VACUUM 執行。

由 VACUUM（非 FULL）取用，ANALYZE，CREATE INDEX CONCURRENTLY、REINDEX CONCURRENTLY、CREATE STATISTICS 以及某些 ALTER INDEX 和 ALTER TABLE 的指令（有關詳細資訊，請參閱 ALTER INDEX 和 ALTER TABLE）。

SHARE

Conflicts with the ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE lock modes. This mode protects a table against concurrent data changes.

Acquired by CREATE INDEX (without CONCURRENTLY).

SHARE ROW EXCLUSIVE

Conflicts with the ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE lock modes. This mode protects a table against concurrent data changes, and is self-exclusive so that only one session can hold it at a time.

Acquired by CREATE TRIGGER and some forms of ALTER TABLE (see ALTER TABLE).

EXCLUSIVE

Conflicts with the ROW SHARE, ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE lock modes. This mode allows only concurrent ACCESS SHARE locks, i.e., only reads from the table can proceed in parallel with a transaction holding this lock mode.

Acquired by REFRESH MATERIALIZED VIEW CONCURRENTLY.

ACCESS EXCLUSIVE

Conflicts with locks of all modes (ACCESS SHARE, ROW SHARE, ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE). This mode guarantees that the holder is the only transaction accessing the table in any way.

Acquired by the DROP TABLE, TRUNCATE, REINDEX, CLUSTER, VACUUM FULL, and REFRESH MATERIALIZED VIEW (without CONCURRENTLY) commands. Many forms of ALTER INDEX and ALTER TABLE also acquire a lock at this level. This is also the default lock mode for LOCK TABLE statements that do not specify a mode explicitly.

僅 ACCESS EXCLUSIVE 鎖定會阻擋 SELECT（無 FOR UPDATE / SHARE）語句。

Once acquired, a lock is normally held until the end of the transaction. But if a lock is acquired after establishing a savepoint, the lock is released immediately if the savepoint is rolled back to. This is consistent with the principle that ROLLBACK cancels all effects of the commands since the savepoint. The same holds for locks acquired within a PL/pgSQL exception block: an error escape from the block releases locks acquired within it.

Table 13.2. Conflicting Lock Modes

13.3.2. Row-Level Locks

除資料表層級的鎖定外，還有資料列層級的鎖定，下面列出了這些資料列層級的鎖定以及 PostgreSQL 自動使用它們的情形。有關資料列層級鎖衝突的完整列表，請參閱 Table 13.3。請注意，即使在不同的子事務中，事務也可以在同一筆資料上上持有相衝突的鎖定。但是除此之外，兩個事務永遠不能在同一筆資料上持有相衝突的鎖定。資料列層級的鎖定不會影響資料查詢。它們只是阻止同一筆資料的寫入和鎖定。資料列層級的鎖定會在事務結束時或在保存點回溯期間釋放，就像資料表層級的鎖定一樣。

Row-Level Lock Modes

FOR UPDATE

FOR UPDATE 會使 SELECT 語句所檢索到的資料被鎖定，就像是要進行更新一樣。這樣可以防止它們被其他事務鎖定，修改或刪除，直到目前的事務結束為止。也就是說，將阻止其他想要嘗試對於這些資料進行 UPDATE，DELETE，SELECT FOR UPDATE，SELECT FOR NO KEY UPDATE，SELECT FOR SHARE 或 SELECT FOR KEY SHARE 的事務，直到目前事務結束為止。相反地，SELECT FOR UPDATE 將等待在同一筆資料上已經執行了以上這些命令的線上事務，然後將鎖定並回傳更新的資料（如果刪除了該筆資料，則不回傳任何資料）。但是，在 REPEATABLE READ 或 SERIALIZABLE 事務中，如果從事務起始以來要鎖定的資料已被修改，則將引發錯誤。有關更多討論，請參閱第 13.4 節。

資料列上的任何 DELETE 以及修改某些欄位值的 UPDATE 也會獲取 FOR UPDATE 鎖定模式。目前在UPDATE情況下考慮使用的欄位集合是可以在外部鍵上使用的唯一索引（因此不考慮部分索引和表式式索引），但是將來可能會有所改變。

FOR NO KEY UPDATE

行為與 FOR UPDATE 類似，除了獲取的鎖定較弱勢之外：此鎖定不會阻止試圖獲取同一筆資料上鎖定的 SELECT FOR KEY SHARE 指令。任何不會取得 FOR UPDATE 鎖定的 UPDATE 也會取得此鎖定模式。

FOR SHARE

行為類似於 FOR NO KEY UPDATE，不同之處在於它在每筆檢索到的資料上取得一個共享鎖定，而不是互斥鎖定。共享鎖定可以阻止其他事務在這些資料上執行 UPDATE，DELETE，SELECT FOR UPDATE 或 SELECT FOR NO KEY UPDATE，但不會阻止它們執行 SELECT FOR SHARE 或 SELECT FOR KEY SHARE。

FOR KEY SHARE

行為與 FOR SHARE 類似，除了鎖定較弱勢之外：SELECT FOR UPDATE 被阻擋，但 SELECT FOR NO KEY UPDATE 不會。 SELECT FOR NO KEY 鎖定會阻止其他事務執行 DELETE 或任何變更鍵值的 UPDATE 操作，但不會阻止其他 UPDATE 操作，也不會阻止 SELECT FOR NO KEY UPDATE，SELECT FOR SHARE 或 SELECT FOR KEY SHARE。

PostgreSQL 不會去記住記憶體中已修改資料的任何資訊，因此一次鎖定的資料筆數沒有限制。但是，鎖定一筆資料就可能會導致磁碟寫入行為。例如，SELECT FOR UPDATE 會修改所選資料以將其標記為已鎖定，因此將產生磁碟寫入行為。

Table 13.3. Conflicting Row-Level Locks

13.3.3. Page-Level Locks

除了資料表和資料列層級的鎖定之外，頁面層級共享/獨占鎖定用於控制對共享緩衝區中資料表頁面的讀寫存取。在取得或更新一筆資料後，就會立即釋放這些鎖定。應用程序開發人員通常不必關心頁面層級的鎖定，但是出於說明完整性的考量，還是在此稍作說明。

13.3.4. Deadlocks

The use of explicit locking can increase the likelihood of deadlocks, wherein two (or more) transactions each hold locks that the other wants. For example, if transaction 1 acquires an exclusive lock on table A and then tries to acquire an exclusive lock on table B, while transaction 2 has already exclusive-locked table B and now wants an exclusive lock on table A, then neither one can proceed. PostgreSQL automatically detects deadlock situations and resolves them by aborting one of the transactions involved, allowing the other(s) to complete. (Exactly which transaction will be aborted is difficult to predict and should not be relied upon.)

Note that deadlocks can also occur as the result of row-level locks (and thus, they can occur even if explicit locking is not used). Consider the case in which two concurrent transactions modify a table. The first transaction executes:

UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 11111;

This acquires a row-level lock on the row with the specified account number. Then, the second transaction executes:

UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 22222;
UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 11111;

The first UPDATE statement successfully acquires a row-level lock on the specified row, so it succeeds in updating that row. However, the second UPDATE statement finds that the row it is attempting to update has already been locked, so it waits for the transaction that acquired the lock to complete. Transaction two is now waiting on transaction one to complete before it continues execution. Now, transaction one executes:

UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 22222;

Transaction one attempts to acquire a row-level lock on the specified row, but it cannot: transaction two already holds such a lock. So it waits for transaction two to complete. Thus, transaction one is blocked on transaction two, and transaction two is blocked on transaction one: a deadlock condition. PostgreSQL will detect this situation and abort one of the transactions.

The best defense against deadlocks is generally to avoid them by being certain that all applications using a database acquire locks on multiple objects in a consistent order. In the example above, if both transactions had updated the rows in the same order, no deadlock would have occurred. One should also ensure that the first lock acquired on an object in a transaction is the most restrictive mode that will be needed for that object. If it is not feasible to verify this in advance, then deadlocks can be handled on-the-fly by retrying transactions that abort due to deadlocks.

只要未檢測到死鎖情況，尋求資料表層級或資料列層級鎖定的事務將無限期地等待衝突的鎖定被釋放。對於應用程序來說，長時間保持事務處於打開狀態（例如，在等待使用者輸入時）不是一件好事。

13.3.5. Advisory Locks

PostgreSQL 提供了一種建立具有應用程式定義的鎖定方法。這些被稱為 advisory lock，因為系統不會強制使用它們-取決於應用程式是否正確地使用它們。Advisory lock 對於不適用於 MVCC 模型的鎖定策略很有用。例如，Advisory lock 常見的用法是模擬所謂的「flat file」資料管理系統特有的悲觀鎖定策略。雖然儲存在資料表中的識別內容可以用於相同的目的，但 Advisory lock 更快，可以避免資料表膨脹，並在連線結束時由伺服器自動清除。

There are two ways to acquire an advisory lock in PostgreSQL: at session level or at transaction level. Once acquired at session level, an advisory lock is held until explicitly released or the session ends. Unlike standard lock requests, session-level advisory lock requests do not honor transaction semantics: a lock acquired during a transaction that is later rolled back will still be held following the rollback, and likewise an unlock is effective even if the calling transaction fails later. A lock can be acquired multiple times by its owning process; for each completed lock request there must be a corresponding unlock request before the lock is actually released. Transaction-level lock requests, on the other hand, behave more like regular lock requests: they are automatically released at the end of the transaction, and there is no explicit unlock operation. This behavior is often more convenient than the session-level behavior for short-term usage of an advisory lock. Session-level and transaction-level lock requests for the same advisory lock identifier will block each other in the expected way. If a session already holds a given advisory lock, additional requests by it will always succeed, even if other sessions are awaiting the lock; this statement is true regardless of whether the existing lock hold and new request are at session level or transaction level.

Like all locks in PostgreSQL, a complete list of advisory locks currently held by any session can be found in the pg_locks system view.

Both advisory locks and regular locks are stored in a shared memory pool whose size is defined by the configuration variables max_locks_per_transaction and max_connections. Care must be taken not to exhaust this memory or the server will be unable to grant any locks at all. This imposes an upper limit on the number of advisory locks grantable by the server, typically in the tens to hundreds of thousands depending on how the server is configured.

In certain cases using advisory locking methods, especially in queries involving explicit ordering and LIMIT clauses, care must be taken to control the locks acquired because of the order in which SQL expressions are evaluated. For example:

SELECT pg_advisory_lock(id) FROM foo WHERE id = 12345; -- ok
SELECT pg_advisory_lock(id) FROM foo WHERE id > 12345 LIMIT 100; -- danger!
SELECT pg_advisory_lock(q.id) FROM
(
  SELECT id FROM foo WHERE id > 12345 LIMIT 100
) q; -- ok

In the above queries, the second form is dangerous because the LIMIT is not guaranteed to be applied before the locking function is executed. This might cause some locks to be acquired that the application was not expecting, and hence would fail to release (until it ends the session). From the point of view of the application, such locks would be dangling, although still viewable in pg_locks.

The functions provided to manipulate advisory locks are described in Section 9.26.10.

13.4. 在應用端檢視資料一致性

It is very difficult to enforce business rules regarding data integrity using Read Committed transactions because the view of the data is shifting with each statement, and even a single statement may not restrict itself to the statement's snapshot if a write conflict occurs.

While a Repeatable Read transaction has a stable view of the data throughout its execution, there is a subtle issue with using MVCC snapshots for data consistency checks, involving something known as read/write conflicts. If one transaction writes data and a concurrent transaction attempts to read the same data (whether before or after the write), it cannot see the work of the other transaction. The reader then appears to have executed first regardless of which started first or which committed first. If that is as far as it goes, there is no problem, but if the reader also writes data which is read by a concurrent transaction there is now a transaction which appears to have run before either of the previously mentioned transactions. If the transaction which appears to have executed last actually commits first, it is very easy for a cycle to appear in a graph of the order of execution of the transactions. When such a cycle appears, integrity checks will not work correctly without some help.

As mentioned in Section 13.2.3, Serializable transactions are just Repeatable Read transactions which add nonblocking monitoring for dangerous patterns of read/write conflicts. When a pattern is detected which could cause a cycle in the apparent order of execution, one of the transactions involved is rolled back to break the cycle.

13.4.1. Enforcing Consistency With Serializable Transactions

If the Serializable transaction isolation level is used for all writes and for all reads which need a consistent view of the data, no other effort is required to ensure consistency. Software from other environments which is written to use serializable transactions to ensure consistency should “just work” in this regard in PostgreSQL.

When using this technique, it will avoid creating an unnecessary burden for application programmers if the application software goes through a framework which automatically retries transactions which are rolled back with a serialization failure. It may be a good idea to set default_transaction_isolation to serializable. It would also be wise to take some action to ensure that no other transaction isolation level is used, either inadvertently or to subvert integrity checks, through checks of the transaction isolation level in triggers.

See Section 13.2.3 for performance suggestions.

Warning

This level of integrity protection using Serializable transactions does not yet extend to hot standby mode (Section 26.5). Because of that, those using hot standby may want to use Repeatable Read and explicit locking on the master.

13.4.2. Enforcing Consistency With Explicit Blocking Locks

When non-serializable writes are possible, to ensure the current validity of a row and protect it against concurrent updates one must use SELECT FOR UPDATE, SELECT FOR SHARE, or an appropriate LOCK TABLE statement. (SELECT FOR UPDATE and SELECT FOR SHARE lock just the returned rows against concurrent updates, while LOCK TABLE locks the whole table.) This should be taken into account when porting applications to PostgreSQL from other environments.

Also of note to those converting from other environments is the fact that SELECT FOR UPDATE does not ensure that a concurrent transaction will not update or delete a selected row. To do that in PostgreSQL you must actually update the row, even if no values need to be changed. SELECT FOR UPDATE temporarily blocks other transactions from acquiring the same lock or executing an UPDATEor DELETE which would affect the locked row, but once the transaction holding this lock commits or rolls back, a blocked transaction will proceed with the conflicting operation unless an actual UPDATEof the row was performed while the lock was held.

Global validity checks require extra thought under non-serializable MVCC. For example, a banking application might wish to check that the sum of all credits in one table equals the sum of debits in another table, when both tables are being actively updated. Comparing the results of two successive SELECT sum(...) commands will not work reliably in Read Committed mode, since the second query will likely include the results of transactions not counted by the first. Doing the two sums in a single repeatable read transaction will give an accurate picture of only the effects of transactions that committed before the repeatable read transaction started — but one might legitimately wonder whether the answer is still relevant by the time it is delivered. If the repeatable read transaction itself applied some changes before trying to make the consistency check, the usefulness of the check becomes even more debatable, since now it includes some but not all post-transaction-start changes. In such cases a careful person might wish to lock all tables needed for the check, in order to get an indisputable picture of current reality. A SHARE mode (or higher) lock guarantees that there are no uncommitted changes in the locked table, other than those of the current transaction.

Note also that if one is relying on explicit locking to prevent concurrent changes, one should either use Read Committed mode, or in Repeatable Read mode be careful to obtain locks before performing queries. A lock obtained by a repeatable read transaction guarantees that no other transactions modifying the table are still running, but if the snapshot seen by the transaction predates obtaining the lock, it might predate some now-committed changes in the table. A repeatable read transaction's snapshot is actually frozen at the start of its first query or data-modification command (SELECT, INSERT, UPDATE, or DELETE), so it is possible to obtain locks explicitly before the snapshot is frozen.

13.5. 特別注意

有一些 DDL 命令，目前只有和的資料表重寫語法，並不是 MVCC 安全的。這意味著在清除或重寫提交之後，如果使用在提交 DDL 指令之前的快照，會使該資料表對於平行處理中的事務將顯示為空。對於在 DDL 指令開始之前沒有存取相關資料表的事務來說，這會是一個問題 - 任何已經這樣做的事務都將至少保存一個 ACCESS SHARE 資料表鎖，這會在該事務完成之前阻擋 DDL 指令。因此，這些指令不會使目標資料表的連續查詢中導致資料表內容的明顯不一致，但它們可能會導致目標資料表的內容與資料庫中的其他資料表之間出現可見的不一致。

尚未將對 Serializable 事務隔離等級的支援增加到 Hot Standby 複製目標（如中所述）。熱備份模式目前支援的最嚴格的隔離等級是 Repeatable Read。在主服務器上的 Serializable 事務中執行所有永久性資料庫寫操作時，將確保所有備用服務器最終能達到一致狀態，在備用服務器上執行的可重複讀事務處理有時會看到與任何串行執行的事務不一致的暫時狀態。

13.6. 鎖定與索引

Though PostgreSQL provides nonblocking read/write access to table data, nonblocking read/write access is not currently offered for every index access method implemented in PostgreSQL. The various index types are handled as follows:

B-tree, GiST and SP-GiST indexes

Short-term share/exclusive page-level locks are used for read/write access. Locks are released immediately after each index row is fetched or inserted. These index types provide the highest concurrency without deadlock conditions.

Hash indexes

Share/exclusive hash-bucket-level locks are used for read/write access. Locks are released after the whole bucket is processed. Bucket-level locks provide better concurrency than index-level ones, but deadlock is possible since the locks are held longer than one index operation.

GIN indexes

Short-term share/exclusive page-level locks are used for read/write access. Locks are released immediately after each index row is fetched or inserted. But note that insertion of a GIN-indexed value usually produces several index key insertions per row, so GIN might do substantial work for a single value's insertion.

Currently, B-tree indexes offer the best performance for concurrent applications; since they also have more features than hash indexes, they are the recommended index type for concurrent applications that need to index scalar data. When dealing with non-scalar data, B-trees are not useful, and GiST, SP-GiST or GIN indexes should be used instead.