Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Views in PostgreSQL are implemented using the rule system. In fact, there is essentially no difference between:
compared against the two commands:
because this is exactly what the CREATE VIEW
command does internally. This has some side effects. One of them is that the information about a view in the PostgreSQL system catalogs is exactly the same as it is for a table. So for the parser, there is absolutely no difference between a table and a view. They are the same thing: relations.
SELECT
Rules WorkRules ON SELECT
are applied to all queries as the last step, even if the command given is an INSERT
, UPDATE
or DELETE
. And they have different semantics from rules on the other command types in that they modify the query tree in place instead of creating a new one. So SELECT
rules are described first.
Currently, there can be only one action in an ON SELECT
rule, and it must be an unconditional SELECT
action that is INSTEAD
. This restriction was required to make rules safe enough to open them for ordinary users, and it restricts ON SELECT
rules to act like views.
The examples for this chapter are two join views that do some calculations and some more views using them in turn. One of the two first views is customized later by adding rules for INSERT
, UPDATE
, and DELETE
operations so that the final result will be a view that behaves like a real table with some magic functionality. This is not such a simple example to start from and this makes things harder to get into. But it's better to have one example that covers all the points discussed step by step rather than having many different ones that might mix up in mind.
For the example, we need a little min
function that returns the lower of 2 integer values. We create that as:
The real tables we need in the first two rule system descriptions are these:
As you can see, they represent shoe-store data.
The views are created as:
The CREATE VIEW
command for the shoelace
view (which is the simplest one we have) will create a relation shoelace
and an entry in pg_rewrite
that tells that there is a rewrite rule that must be applied whenever the relation shoelace
is referenced in a query's range table. The rule has no rule qualification (discussed later, with the non-SELECT
rules, since SELECT
rules currently cannot have them) and it is INSTEAD
. Note that rule qualifications are not the same as query qualifications. The action of our rule has a query qualification. The action of the rule is one query tree that is a copy of the SELECT
statement in the view creation command.
The two extra range table entries for NEW
and OLD
that you can see in the pg_rewrite
entry aren't of interest for SELECT
rules.
Now we populate unit
, shoe_data
and shoelace_data
and run a simple query on a view:
This is the simplest SELECT
you can do on our views, so we take this opportunity to explain the basics of view rules. The SELECT * FROM shoelace
was interpreted by the parser and produced the query tree:
and this is given to the rule system. The rule system walks through the range table and checks if there are rules for any relation. When processing the range table entry for shoelace
(the only one up to now) it finds the _RETURN
rule with the query tree:
To expand the view, the rewriter simply creates a subquery range-table entry containing the rule's action query tree, and substitutes this range table entry for the original one that referenced the view. The resulting rewritten query tree is almost the same as if you had typed:
There is one difference however: the subquery's range table has two extra entries shoelace old
and shoelace new
. These entries don't participate directly in the query, since they aren't referenced by the subquery's join tree or target list. The rewriter uses them to store the access privilege check information that was originally present in the range-table entry that referenced the view. In this way, the executor will still check that the user has proper privileges to access the view, even though there's no direct use of the view in the rewritten query.
That was the first rule applied. The rule system will continue checking the remaining range-table entries in the top query (in this example there are no more), and it will recursively check the range-table entries in the added subquery to see if any of them reference views. (But it won't expand old
or new
— otherwise we'd have infinite recursion!) In this example, there are no rewrite rules for shoelace_data
or unit
, so rewriting is complete and the above is the final result given to the planner.
Now we want to write a query that finds out for which shoes currently in the store we have the matching shoelaces (color and length) and where the total number of exactly matching pairs is greater or equal to two.
The output of the parser this time is the query tree:
The first rule applied will be the one for the shoe_ready
view and it results in the query tree:
Similarly, the rules for shoe
and shoelace
are substituted into the range table of the subquery, leading to a three-level final query tree:
It turns out that the planner will collapse this tree into a two-level query tree: the bottommost SELECT
commands will be “pulled up” into the middle SELECT
since there's no need to process them separately. But the middle SELECT
will remain separate from the top, because it contains aggregate functions. If we pulled those up it would change the behavior of the topmost SELECT
, which we don't want. However, collapsing the query tree is an optimization that the rewrite system doesn't have to concern itself with.
SELECT
StatementsTwo details of the query tree aren't touched in the description of view rules above. These are the command type and the result relation. In fact, the command type is not needed by view rules, but the result relation may affect the way in which the query rewriter works, because special care needs to be taken if the result relation is a view.
There are only a few differences between a query tree for a SELECT
and one for any other command. Obviously, they have a different command type and for a command other than a SELECT
, the result relation points to the range-table entry where the result should go. Everything else is absolutely the same. So having two tables t1
and t2
with columns a
and b
, the query trees for the two statements:
are nearly identical. In particular:
The range tables contain entries for the tables t1
and t2
.
The target lists contain one variable that points to column b
of the range table entry for table t2
.
The qualification expressions compare the columns a
of both range-table entries for equality.
The join trees show a simple join between t1
and t2
.
The consequence is, that both query trees result in similar execution plans: They are both joins over the two tables. For the UPDATE
the missing columns from t1
are added to the target list by the planner and the final query tree will read as:
and thus the executor run over the join will produce exactly the same result set as:
But there is a little problem in UPDATE
: the part of the executor plan that does the join does not care what the results from the join are meant for. It just produces a result set of rows. The fact that one is a SELECT
command and the other is an UPDATE
is handled higher up in the executor, where it knows that this is an UPDATE
, and it knows that this result should go into table t1
. But which of the rows that are there has to be replaced by the new row?
To resolve this problem, another entry is added to the target list in UPDATE
(and also in DELETE
) statements: the current tuple ID (CTID). This is a system column containing the file block number and position in the block for the row. Knowing the table, the CTID can be used to retrieve the original row of t1
to be updated. After adding the CTID to the target list, the query actually looks like:
Now another detail of PostgreSQL enters the stage. Old table rows aren't overwritten, and this is why ROLLBACK
is fast. In an UPDATE
, the new result row is inserted into the table (after stripping the CTID) and in the row header of the old row, which the CTID pointed to, the cmax
and xmax
entries are set to the current command counter and current transaction ID. Thus the old row is hidden, and after the transaction commits the vacuum cleaner can eventually remove the dead row.
Knowing all that, we can simply apply view rules in absolutely the same way to any command. There is no difference.
The above demonstrates how the rule system incorporates view definitions into the original query tree. In the second example, a simple SELECT
from one view created a final query tree that is a join of 4 tables (unit
was used twice with different names).
The benefit of implementing views with the rule system is, that the planner has all the information about which tables have to be scanned plus the relationships between these tables plus the restrictive qualifications from the views plus the qualifications from the original query in one single query tree. And this is still the situation when the original query is already a join over views. The planner has to decide which is the best path to execute the query, and the more information the planner has, the better this decision can be. And the rule system as implemented in PostgreSQL ensures, that this is all information available about the query up to that point.
What happens if a view is named as the target relation for an INSERT
, UPDATE
, or DELETE
? Doing the substitutions described above would give a query tree in which the result relation points at a subquery range-table entry, which will not work. There are several ways in which PostgreSQL can support the appearance of updating a view, however.
If the subquery selects from a single base relation and is simple enough, the rewriter can automatically replace the subquery with the underlying base relation so that the INSERT
, UPDATE
, or DELETE
is applied to the base relation in the appropriate way. Views that are “simple enough” for this are called automatically updatable. For detailed information on the kinds of view that can be automatically updated, see CREATE VIEW.
Alternatively, the operation may be handled by a user-provided INSTEAD OF
trigger on the view. Rewriting works slightly differently in this case. For INSERT
, the rewriter does nothing at all with the view, leaving it as the result relation for the query. For UPDATE
and DELETE
, it's still necessary to expand the view query to produce the “old” rows that the command will attempt to update or delete. So the view is expanded as normal, but another unexpanded range-table entry is added to the query to represent the view in its capacity as the result relation.
The problem that now arises is how to identify the rows to be updated in the view. Recall that when the result relation is a table, a special CTID entry is added to the target list to identify the physical locations of the rows to be updated. This does not work if the result relation is a view, because a view does not have any CTID, since its rows do not have actual physical locations. Instead, for an UPDATE
or DELETE
operation, a special wholerow
entry is added to the target list, which expands to include all columns from the view. The executor uses this value to supply the “old” row to the INSTEAD OF
trigger. It is up to the trigger to work out what to update based on the old and new row values.
Another possibility is for the user to define INSTEAD
rules that specify substitute actions for INSERT
, UPDATE
, and DELETE
commands on a view. These rules will rewrite the command, typically into a command that updates one or more tables, rather than views. That is the topic of Section 40.4.
Note that rules are evaluated first, rewriting the original query before it is planned and executed. Therefore, if a view has INSTEAD OF
triggers as well as rules on INSERT
, UPDATE
, or DELETE
, then the rules will be evaluated first, and depending on the result, the triggers may not be used at all.
Automatic rewriting of an INSERT
, UPDATE
, or DELETE
query on a simple view is always tried last. Therefore, if a view has rules or triggers, they will override the default behavior of automatically updatable views.
If there are no INSTEAD
rules or INSTEAD OF
triggers for the view, and the rewriter cannot automatically rewrite the query as an update on the underlying base relation, an error will be thrown because the executor cannot update a view as such.
Materialized views in PostgreSQL use the rule system like views do, but persist the results in a table-like form. The main differences between:
and:
are that the materialized view cannot subsequently be directly updated and that the query used to create the materialized view is stored in exactly the same way that a view's query is stored, so that fresh data can be generated for the materialized view with:
The information about a materialized view in the PostgreSQL system catalogs is exactly the same as it is for a table or view. So for the parser, a materialized view is a relation, just like a table or a view. When a materialized view is referenced in a query, the data is returned directly from the materialized view, like from a table; the rule is only used for populating the materialized view.
While access to the data stored in a materialized view is often much faster than accessing the underlying tables directly or through a view, the data is not always current; yet sometimes current data is not needed. Consider a table which records sales:
If people want to be able to quickly graph historical sales data, they might want to summarize, and they may not care about the incomplete data for the current date:
This materialized view might be useful for displaying a graph in the dashboard created for salespeople. A job could be scheduled to update the statistics each night using this SQL statement:
Another use for a materialized view is to allow faster access to data brought across from a remote system through a foreign data wrapper. A simple example using file_fdw
is below, with timings, but since this is using cache on the local system the performance difference compared to access to a remote system would usually be greater than shown here. Notice we are also exploiting the ability to put an index on the materialized view, whereas file_fdw
does not support indexes; this advantage might not apply for other sorts of foreign data access.
Setup:
Now let's spell-check a word. Using file_fdw
directly:
With EXPLAIN ANALYZE
, we see:
If the materialized view is used instead, the query is much faster:
Either way, the word is spelled wrong, so let's look for what we might have wanted. Again using file_fdw
:
Using the materialized view:
If you can tolerate periodic update of the remote data to the local database, the performance benefit can be substantial.
本章討論 PostgreSQL中的規則系統(rule system)。產品的規則系統在概念上很簡單,但實際使用它們會涉及許多細微的觀點。
其他一些資料庫系統定義了有效的資料庫規則,這些規則通常是儲存過程和觸發器。在 PostgreSQL 中,這些可以使用函數和觸發器來達成。
規則系統(更確切地說,查詢覆寫規則系統)與儲存過程和觸發器完全不同。它以規則修改查詢,然後將修改的查詢傳遞給查詢規劃器進行規劃和執行。它的功能非常強大,可用於查詢語言程序、view 和版本等多種功能。[] 和 [] 也討論了這個規則系統的理論基礎和能力。
The PostgreSQL server returns a command status string, such as INSERT 149592 1
, for each command it receives. This is simple enough when there are no rules involved, but what happens when the query is rewritten by rules?
Rules affect the command status as follows:
If there is no unconditional INSTEAD
rule for the query, then the originally given query will be executed, and its command status will be returned as usual. (But note that if there were any conditional INSTEAD
rules, the negation of their qualifications will have been added to the original query. This might reduce the number of rows it processes, and if so the reported status will be affected.)
If there is any unconditional INSTEAD
rule for the query, then the original query will not be executed at all. In this case, the server will return the command status for the last query that was inserted by an INSTEAD
rule (conditional or unconditional) and is of the same command type (INSERT
, UPDATE
, or DELETE
) as the original query. If no query meeting those requirements is added by any rule, then the returned command status shows the original query type and zeroes for the row-count and OID fields.
The programmer can ensure that any desired INSTEAD
rule is the one that sets the command status in the second case, by giving it the alphabetically last rule name among the active rules, so that it gets applied last.
由於PostgreSQL 規則系統重寫了查詢,因此可以存取除原始查詢中使用的資料表/檢視表之外的其他資料表/檢視表。使用規則更新時,可以包括對資料表的寫入存取。
重寫規則沒有單獨的擁有者。關連(資料表或檢視表)的擁有者自動成為為其定義的重寫規則的擁有者。PostgreSQL 規則系統改變了預設存取控制系統的行為。根據規則使用的關連將根據規則擁有者的權限進行檢查,而不是呼叫規則的使用者。這意味著使用者只需要在查詢中明確命名的資料表/檢視表所需的權限。
例如:使用者有一個電話號碼列表,其中一些是私人的,其他的是辦公室助理共享的。使用者可以建構以下內容:
除了該使用者(以及資料庫超級使用者)之外,沒有人可以存取 phone_data 資料表。 但是由於 GRANT,assistant 可以在 phone_number 檢視表上執行 SELECT。規則系統將 phone_number 中的 SELECT 重寫為來自 phone_data 的 SELECT。由於使用者是phone_number的所有者,因此是規則的擁有者,而現在根據使用者的權限檢查對 phone_data 的讀取,並允許查詢。 還執行了存取 phone_number 的檢查,但這是針對呼叫使用者完成的,因此除了使用者和助理之外,沒有人可以使用它。
按規則檢查權限。 所以助理現在是唯一可以看到公用電話號碼的人。 但助理可以設定另一個檢視表並向公眾授予存取權限。然後,任何人都可以透過助理的檢視表查看 phone_number 資料。助理不能做的是建立一個直接存取 phone_data 的檢視表。(實際上助理可以,但是由於在權限檢查期間每次存取都將被拒絕,所以它將無法使用。)一旦使用者注意到助理打開了他們的 phone_number 檢視表,使用者就可以撤銷助理的存取權限。馬上,對助理檢視表的任何存取都將失敗。
有人可能認為這種逐規則檢查是一個安全漏洞,但實際上並非如此。 但如果它不能以這種方式工作,助理可以設定一個與 phone_number 具有相同欄位的資料表,並將資料每天複製到那裡一次。然後它是助理自己的資料,助理可以授予他們想要給每個人的存取權限。GRANT 指令意味著「我相信你」。如果你信任的人做了上面的事情,是時候考慮一下然後使用 REVOKE。
請注意,雖然檢視表可用於使用上面顯示的技術隱藏某些欄位內容,但除非已設定 security_barrier 旗標,否則它們不能可靠地用於隱藏未顯示資料列的資料。 例如,以下檢視表不安全:
此檢視表可能看起來很安全,因為規則系統會將來自 phone_number 的任何 SELECT 重寫為來自 phone_data 的 SELECT,並增加僅需要電話不以 412 開頭項目的限定條件。但是,如果使用者可以建立自己的函數,則說服計劃程序在 NOT LIKE 表示式之前執行使用者定義的函數並不困難。 例如:
phone_data 資料表中的每個人和電話號碼都將以 NOTICE 輸出,因為計劃程序將選擇在更昂貴的 NOT LIKE 之前執行廉價的複雜功能。即使阻止使用者定義新功能,內建功能也可用於類似的攻擊。(例如,大多數強制轉換函數會在它們産生的錯誤訊息中包含它們的輸入值。)
類似的考慮適用於更新規則。在上一節的範例中,範例資料庫中資料表的擁有者可以將 shoelace 檢視表上的 SELECT,INSERT,UPDATE 和 DELETE 權限授予其他人,但僅在 shoelace_log 上授予 SELECT。寫入日誌項目的規則操作仍將成功執行,其他用戶可以查看日誌項目。 但他們無法建立假項目,也無法變更或刪除現有項目。在這種情況下,不可能通過說服規劃程予改變操作順序來破壞規則,因為引用 shoelace_log 的唯一規則是不合格的 INSERT。在更複雜的場景中可能不是這樣。
當檢視表需要提供資料列級安全性時,應將 security_barrier 屬性套用於檢視表。這可以防止惡意的函數和運算子從資料列傳遞值,直到檢視表完成其工作。例如,如果上面顯示的檢視表是這樣建立,那麼它將是安全的:
使用 security_barrier 建立的檢視表可能比沒有此選項建立的檢視表更糟糕。通常,沒有辦法避免這種情況:如果可能危及安全性,則必須拒絕最快的計劃。因此,預設情況下不啟用此選項。
在處理沒有副作用的函數時,查詢規劃程序具有更大的靈活性。 這些函數稱為 LEAKPROOF,包括許多簡單的常用運算子,例如許多相等運算子。查詢計劃程序可以安全地允許在查詢執行過程中的任何時候執行此類函數,因為在使用者不可見的資料列上呼叫它們不會洩漏有關不可見資料列的任何信息。此外,不帶參數或未從安全屏障檢視表傳遞任何參數的函數不必標記為 LEAKPROOF 以便向下推,因為它們從不從檢視表接收資料。相反地,根據作為參數接收的值(例如在溢出或除零時拋出錯誤的函數)可能拋出錯誤的函數都不是防漏的,並且可能提供關於不可見資料列的重要信息,如果在安全檢視表的資料列過濾器之前套用。
重要的是要理解即使是使用 security_barrier 選項建立的檢視圖也只是在有限的意義上是安全的,即不可見 tuple 的內容不會傳遞給可能不安全的函數。使用者可能還有其他方法可以推斷出看不見的資料;例如,他們可以使用 EXPLAIN 查看查詢計劃,或者根據檢視表測量查詢的執行時間。惡意攻擊者可能能夠推斷出有關未見資料量的訊息,甚至可以獲得有關資料分佈或最常見值的一些訊息(因為這些事情可能會影響計劃的執行時間;甚至,因為它們也會被反映出來。在最佳化程序統計中,選擇計劃)。如果關注這些類型的“隱蔽通道(covert channel)”攻擊,則根本不允許對資料進行任何存取。
Many things that can be done using triggers can also be implemented using the PostgreSQL rule system. One of the things that cannot be implemented by rules are some kinds of constraints, especially foreign keys. It is possible to place a qualified rule that rewrites a command to NOTHING
if the value of a column does not appear in another table. But then the data is silently thrown away and that's not a good idea. If checks for valid values are required, and in the case of an invalid value an error message should be generated, it must be done by a trigger.
In this chapter, we focused on using rules to update views. All of the update rule examples in this chapter can also be implemented using INSTEAD OF
triggers on the views. Writing such triggers is often easier than writing rules, particularly if complex logic is required to perform the update.
For the things that can be implemented by both, which is best depends on the usage of the database. A trigger is fired once for each affected row. A rule modifies the query or generates an additional query. So if many rows are affected in one statement, a rule issuing one extra command is likely to be faster than a trigger that is called for every single row and must re-determine what to do many times. However, the trigger approach is conceptually far simpler than the rule approach, and is easier for novices to get right.
Here we show an example of how the choice of rules versus triggers plays out in one situation. There are two tables:
Both tables have many thousands of rows and the indexes on hostname
are unique. The rule or trigger should implement a constraint that deletes rows from software
that reference a deleted computer. The trigger would use this command:
Since the trigger is called for each individual row deleted from computer
, it can prepare and save the plan for this command and pass the hostname
value in the parameter. The rule would be written as:
Now we look at different types of deletes. In the case of a:
the table computer
is scanned by index (fast), and the command issued by the trigger would also use an index scan (also fast). The extra command from the rule would be:
Since there are appropriate indexes set up, the planner will create a plan of
So there would be not that much difference in speed between the trigger and the rule implementation.
With the next delete we want to get rid of all the 2000 computers where the hostname
starts with old
. There are two possible commands to do that. One is:
The command added by the rule will be:
with the plan
The other possible command is:
which results in the following executing plan for the command added by the rule:
This shows, that the planner does not realize that the qualification for hostname
in computer
could also be used for an index scan on software
when there are multiple qualification expressions combined with AND
, which is what it does in the regular-expression version of the command. The trigger will get invoked once for each of the 2000 old computers that have to be deleted, and that will result in one index scan over computer
and 2000 index scans over software
. The rule implementation will do it with two commands that use indexes. And it depends on the overall size of the table software
whether the rule will still be faster in the sequential scan situation. 2000 command executions from the trigger over the SPI manager take some time, even if all the index blocks will soon be in the cache.
The last command we look at is:
Again this could result in many rows to be deleted from computer
. So the trigger will again run many commands through the executor. The command generated by the rule will be:
The plan for that command will again be the nested loop over two index scans, only using a different index on computer
:
In any of these cases, the extra commands from the rule system will be more or less independent from the number of affected rows in a command.
The summary is, rules will only be significantly slower than triggers if their actions result in large and badly qualified joins, a situation where the planner fails.
To understand how the rule system works it is necessary to know when it is invoked and what its input and results are.
The rule system is located between the parser and the planner. It takes the output of the parser, one query tree, and the user-defined rewrite rules, which are also query trees with some extra information, and creates zero or more query trees as result. So its input and output are always things the parser itself could have produced and thus, anything it sees is basically representable as an SQL statement.
Now what is a query tree? It is an internal representation of an SQL statement where the single parts that it is built from are stored separately. These query trees can be shown in the server log if you set the configuration parameters debug_print_parse
, debug_print_rewritten
, or debug_print_plan
. The rule actions are also stored as query trees, in the system catalog pg_rewrite
. They are not formatted like the log output, but they contain exactly the same information.
Reading a raw query tree requires some experience. But since SQL representations of query trees are sufficient to understand the rule system, this chapter will not teach how to read them.
When reading the SQL representations of the query trees in this chapter it is necessary to be able to identify the parts the statement is broken into when it is in the query tree structure. The parts of a query tree arethe command type
This is a simple value telling which command (SELECT
, INSERT
, UPDATE
, DELETE
) produced the query tree.the range table
The range table is a list of relations that are used in the query. In a SELECT
statement these are the relations given after the FROM
key word.
Every range table entry identifies a table or view and tells by which name it is called in the other parts of the query. In the query tree, the range table entries are referenced by number rather than by name, so here it doesn't matter if there are duplicate names as it would in an SQL statement. This can happen after the range tables of rules have been merged in. The examples in this chapter will not have this situation.the result relation
This is an index into the range table that identifies the relation where the results of the query go.
SELECT
queries don't have a result relation. (The special case of SELECT INTO
is mostly identical to CREATE TABLE
followed by INSERT ... SELECT
, and is not discussed separately here.)
For INSERT
, UPDATE
, and DELETE
commands, the result relation is the table (or view!) where the changes are to take effect.the target list
The target list is a list of expressions that define the result of the query. In the case of a SELECT
, these expressions are the ones that build the final output of the query. They correspond to the expressions between the key words SELECT
and FROM
. (*
is just an abbreviation for all the column names of a relation. It is expanded by the parser into the individual columns, so the rule system never sees it.)
DELETE
commands don't need a normal target list because they don't produce any result. Instead, the planner adds a special CTID entry to the empty target list, to allow the executor to find the row to be deleted. (CTID is added when the result relation is an ordinary table. If it is a view, a whole-row variable is added instead, by the rule system, as described in Section 40.2.4.)
For INSERT
commands, the target list describes the new rows that should go into the result relation. It consists of the expressions in the VALUES
clause or the ones from the SELECT
clause in INSERT ... SELECT
. The first step of the rewrite process adds target list entries for any columns that were not assigned to by the original command but have defaults. Any remaining columns (with neither a given value nor a default) will be filled in by the planner with a constant null expression.
For UPDATE
commands, the target list describes the new rows that should replace the old ones. In the rule system, it contains just the expressions from the SET column = expression
part of the command. The planner will handle missing columns by inserting expressions that copy the values from the old row into the new one. Just as for DELETE
, a CTID or whole-row variable is added so that the executor can identify the old row to be updated.
Every entry in the target list contains an expression that can be a constant value, a variable pointing to a column of one of the relations in the range table, a parameter, or an expression tree made of function calls, constants, variables, operators, etc.the qualification
The query's qualification is an expression much like one of those contained in the target list entries. The result value of this expression is a Boolean that tells whether the operation (INSERT
, UPDATE
, DELETE
, or SELECT
) for the final result row should be executed or not. It corresponds to the WHERE
clause of an SQL statement.the join tree
The query's join tree shows the structure of the FROM
clause. For a simple query like SELECT ... FROM a, b, c
, the join tree is just a list of the FROM
items, because we are allowed to join them in any order. But when JOIN
expressions, particularly outer joins, are used, we have to join in the order shown by the joins. In that case, the join tree shows the structure of the JOIN
expressions. The restrictions associated with particular JOIN
clauses (from ON
or USING
expressions) are stored as qualification expressions attached to those join-tree nodes. It turns out to be convenient to store the top-level WHERE
expression as a qualification attached to the top-level join-tree item, too. So really the join tree represents both the FROM
and WHERE
clauses of a SELECT
.the others
The other parts of the query tree like the ORDER BY
clause aren't of interest here. The rule system substitutes some entries there while applying rules, but that doesn't have much to do with the fundamentals of the rule system.
Rules that are defined on INSERT
, UPDATE
, and DELETE
are significantly different from the view rules described in the previous section. First, their CREATE RULE
command allows more:
They are allowed to have no action.
They can have multiple actions.
They can be INSTEAD
or ALSO
(the default).
The pseudorelations NEW
and OLD
become useful.
They can have rule qualifications.
Second, they don't modify the query tree in place. Instead they create zero or more new query trees and can throw away the original one.
In many cases, tasks that could be performed by rules on INSERT
/UPDATE
/DELETE
are better done with triggers. Triggers are notationally a bit more complicated, but their semantics are much simpler to understand. Rules tend to have surprising results when the original query contains volatile functions: volatile functions may get executed more times than expected in the process of carrying out the rules.
Also, there are some cases that are not supported by these types of rules at all, notably including WITH
clauses in the original query and multiple-assignment sub-SELECT
s in the SET
list of UPDATE
queries. This is because copying these constructs into a rule query would result in multiple evaluations of the sub-query, contrary to the express intent of the query's author.
Keep the syntax:
in mind. In the following, update rules means rules that are defined on INSERT
, UPDATE
, or DELETE
.
Update rules get applied by the rule system when the result relation and the command type of a query tree are equal to the object and event given in the CREATE RULE
command. For update rules, the rule system creates a list of query trees. Initially the query-tree list is empty. There can be zero (NOTHING
key word), one, or multiple actions. To simplify, we will look at a rule with one action. This rule can have a qualification or not and it can be INSTEAD
or ALSO
(the default).
What is a rule qualification? It is a restriction that tells when the actions of the rule should be done and when not. This qualification can only reference the pseudorelations NEW
and/or OLD
, which basically represent the relation that was given as object (but with a special meaning).
So we have three cases that produce the following query trees for a one-action rule.No qualification, with either ALSO
or INSTEAD
the query tree from the rule action with the original query tree's qualification addedQualification given and ALSO
the query tree from the rule action with the rule qualification and the original query tree's qualification addedQualification given and INSTEAD
the query tree from the rule action with the rule qualification and the original query tree's qualification; and the original query tree with the negated rule qualification added
Finally, if the rule is ALSO
, the unchanged original query tree is added to the list. Since only qualified INSTEAD
rules already add the original query tree, we end up with either one or two output query trees for a rule with one action.
For ON INSERT
rules, the original query (if not suppressed by INSTEAD
) is done before any actions added by rules. This allows the actions to see the inserted row(s). But for ON UPDATE
and ON DELETE
rules, the original query is done after the actions added by rules. This ensures that the actions can see the to-be-updated or to-be-deleted rows; otherwise, the actions might do nothing because they find no rows matching their qualifications.
The query trees generated from rule actions are thrown into the rewrite system again, and maybe more rules get applied resulting in more or less query trees. So a rule's actions must have either a different command type or a different result relation than the rule itself is on, otherwise this recursive process will end up in an infinite loop. (Recursive expansion of a rule will be detected and reported as an error.)
The query trees found in the actions of the pg_rewrite
system catalog are only templates. Since they can reference the range-table entries for NEW
and OLD
, some substitutions have to be made before they can be used. For any reference to NEW
, the target list of the original query is searched for a corresponding entry. If found, that entry's expression replaces the reference. Otherwise, NEW
means the same as OLD
(for an UPDATE
) or is replaced by a null value (for an INSERT
). Any reference to OLD
is replaced by a reference to the range-table entry that is the result relation.
After the system is done applying update rules, it applies view rules to the produced query tree(s). Views cannot insert new update actions so there is no need to apply update rules to the output of view rewriting.
40.4.1.1. A First Rule Step by Step
Say we want to trace changes to the sl_avail
column in the shoelace_data
relation. So we set up a log table and a rule that conditionally writes a log entry when an UPDATE
is performed on shoelace_data
.
Now someone does:
and we look at the log table:
That's what we expected. What happened in the background is the following. The parser created the query tree:
There is a rule log_shoelace
that is ON UPDATE
with the rule qualification expression:
and the action:
(This looks a little strange since you cannot normally write INSERT ... VALUES ... FROM
. The FROM
clause here is just to indicate that there are range-table entries in the query tree for new
and old
. These are needed so that they can be referenced by variables in the INSERT
command's query tree.)
The rule is a qualified ALSO
rule, so the rule system has to return two query trees: the modified rule action and the original query tree. In step 1, the range table of the original query is incorporated into the rule's action query tree. This results in:
In step 2, the rule qualification is added to it, so the result set is restricted to rows where sl_avail
changes:
(This looks even stranger, since INSERT ... VALUES
doesn't have a WHERE
clause either, but the planner and executor will have no difficulty with it. They need to support this same functionality anyway for INSERT ... SELECT
.)
In step 3, the original query tree's qualification is added, restricting the result set further to only the rows that would have been touched by the original query:
Step 4 replaces references to NEW
by the target list entries from the original query tree or by the matching variable references from the result relation:
Step 5 changes OLD
references into result relation references:
That's it. Since the rule is ALSO
, we also output the original query tree. In short, the output from the rule system is a list of two query trees that correspond to these statements:
These are executed in this order, and that is exactly what the rule was meant to do.
The substitutions and the added qualifications ensure that, if the original query would be, say:
no log entry would get written. In that case, the original query tree does not contain a target list entry for sl_avail
, so NEW.sl_avail
will get replaced by shoelace_data.sl_avail
. Thus, the extra command generated by the rule is:
and that qualification will never be true.
It will also work if the original query modifies multiple rows. So if someone issued the command:
four rows in fact get updated (sl1
, sl2
, sl3
, and sl4
). But sl3
already has sl_avail = 0
. In this case, the original query trees qualification is different and that results in the extra query tree:
being generated by the rule. This query tree will surely insert three new log entries. And that's absolutely correct.
Here we can see why it is important that the original query tree is executed last. If the UPDATE
had been executed first, all the rows would have already been set to zero, so the logging INSERT
would not find any row where 0 <> shoelace_data.sl_avail
.
A simple way to protect view relations from the mentioned possibility that someone can try to run INSERT
, UPDATE
, or DELETE
on them is to let those query trees get thrown away. So we could create the rules:
If someone now tries to do any of these operations on the view relation shoe
, the rule system will apply these rules. Since the rules have no actions and are INSTEAD
, the resulting list of query trees will be empty and the whole query will become nothing because there is nothing left to be optimized or executed after the rule system is done with it.
A more sophisticated way to use the rule system is to create rules that rewrite the query tree into one that does the right operation on the real tables. To do that on the shoelace
view, we create the following rules:
If you want to support RETURNING
queries on the view, you need to make the rules include RETURNING
clauses that compute the view rows. This is usually pretty trivial for views on a single table, but it's a bit tedious for join views such as shoelace
. An example for the insert case is:
Note that this one rule supports both INSERT
and INSERT RETURNING
queries on the view — the RETURNING
clause is simply ignored for INSERT
.
Now assume that once in a while, a pack of shoelaces arrives at the shop and a big parts list along with it. But you don't want to manually update the shoelace
view every time. Instead we set up two little tables: one where you can insert the items from the part list, and one with a special trick. The creation commands for these are:
Now you can fill the table shoelace_arrive
with the data from the parts list:
Take a quick look at the current data:
Now move the arrived shoelaces in:
and check the results:
It's a long way from the one INSERT ... SELECT
to these results. And the description of the query-tree transformation will be the last in this chapter. First, there is the parser's output:
Now the first rule shoelace_ok_ins
is applied and turns this into:
and throws away the original INSERT
on shoelace_ok
. This rewritten query is passed to the rule system again, and the second applied rule shoelace_upd
produces:
Again it's an INSTEAD
rule and the previous query tree is trashed. Note that this query still uses the view shoelace
. But the rule system isn't finished with this step, so it continues and applies the _RETURN
rule on it, and we get:
Finally, the rule log_shoelace
gets applied, producing the extra query tree:
After that the rule system runs out of rules and returns the generated query trees.
So we end up with two final query trees that are equivalent to the SQL statements:
The result is that data coming from one relation inserted into another, changed into updates on a third, changed into updating a fourth plus logging that final update in a fifth gets reduced into two queries.
There is a little detail that's a bit ugly. Looking at the two queries, it turns out that the shoelace_data
relation appears twice in the range table where it could definitely be reduced to one. The planner does not handle it and so the execution plan for the rule systems output of the INSERT
will be
while omitting the extra range table entry would result in a
which produces exactly the same entries in the log table. Thus, the rule system caused one extra scan on the table shoelace_data
that is absolutely not necessary. And the same redundant scan is done once more in the UPDATE
. But it was a really hard job to make that all possible at all.
Now we make a final demonstration of the PostgreSQL rule system and its power. Say you add some shoelaces with extraordinary colors to your database:
We would like to make a view to check which shoelace
entries do not fit any shoe in color. The view for this is:
Its output is:
Now we want to set it up so that mismatching shoelaces that are not in stock are deleted from the database. To make it a little harder for PostgreSQL, we don't delete it directly. Instead we create one more view:
and do it this way:
Voilà:
A DELETE
on a view, with a subquery qualification that in total uses 4 nesting/joined views, where one of them itself has a subquery qualification containing a view and where calculated view columns are used, gets rewritten into one single query tree that deletes the requested data from a real table.
There are probably only a few situations out in the real world where such a construct is necessary. But it makes you feel comfortable that it works.