EnterpriseDB 打算推出的 zheap,想要解 VACUUM 問題...

2018-02-05 10:32:53来源:https://blog.gslin.org/archives/2018/02/03/8112/enterprisedb作者:Gea-Suan Lin's BLOG人点击


前天被问到「DO or UNDO - there is no VACUUM

与 InnoDB 都是透过MVCC
的概念实做 transaction 之间的互动,但两者实际的作法不太一样。其中带来一个明显的差异就是 PostgreSQL 需要VACUUM
。这点在同一篇作者八年前 (2011) 的文章就有提过两者的差异以及优缺点:「MySQL vs. PostgreSQL, Part 2: VACUUM vs. Purge

时,InnoDB 会把新资料写到表格内,然后把可能会被 rollback 的旧资料放到表格外:

In InnoDB, only the most recent version of an updated row is retained in the table itself. Old versions of updated rows are moved to the rollback segment, while deleted row versions are left in place and marked for future cleanup. Thus, purge must get rid of any deleted rows from the table itself, and clear out any old versions of updated rows from the rollback segment.

清除的资料则是由 purge thread 处理:

All the information necessary to find the deleted records that might need to be purged is also written to the rollback segment, so it's quite easy to find the rows that need to be cleaned out; and the old versions of the updated records are all in the rollback segment itself, so those are easy to find, too.

所以可以在 InnoDB 看到 purge thread 相关的设定:「MySQL :: MySQL 5.7 Reference Manual :: 14.6.11 Configuring InnoDB Purge Scheduling

而在 PostgreSQL 的作法则是反过来,旧的资料放在原来地方,新资料另外存:

PostgreSQL takes a completely different approach. There is no rollback tablespace, or anything similar. When a row is updated, the old version is left in place; the new version is simply written into the table along with it.


Lacking a centralized record of what must be purged, PostgreSQL's VACUUM has historically needed to scan the entire table to look for records that might require cleanup.

这也使得 PostgreSQL 里需要autovacuum
之类的程序去扫,或是手动跑 vacuum。而在去年 (2017) 的文章里也有提到目前还是类似的情况:「MVCC and VACUUM

而在今年 (2018) 的文章里,EnterpriseDB
就提出了 zheap 的想法,在UPDATE
时写到 table 里,把可能被 rollback 的资料放到 undo log 里。其实就是把 InnoDB 那套方法拿过来用,只是整篇都没提到而已 XD:

That brings me to the design which EnterpriseDB is proposing. We are working to build a new table storage format for PostgreSQL, which we’re calling zheap. In a zheap, whenever possible, we handle an UPDATE by moving the old row version to an undo log, and putting the new row version in the place previously occupied by the old one. If the transaction aborts, we retrieve the old row version from undo and put it back in the original location; if a concurrent transaction needs to see the old row version, it can find it in undo. Of course, this doesn’t work when the block is full and the row is getting wider, and there are some other problem cases as well, but it covers many useful cases. In the typical case, therefore, even bulk updates do not force a zheap to grow. Instead, the undo grows. When a transaction commits, all row versions that will become dead are in the undo, not the zheap.

不过马上就会想到问题,如果要改善问题,不是个找地方记录哪些位置要回收就好了吗?顺便改变方法是为了避免 fragment 吗?