Vacuuming
Overview
ADPG/PostgreSQL maintains data consistency by using a multi-version model (Multiversion Concurrency Control, MVCC). It utilizes tuple multi-versioning — data is not changed and is not deleted during transaction processing, but a new version of the record is added to the table. This record is called a tuple. Tuples that are deleted or obsoleted by an update are not physically removed from their tables but are marked using special fields. A row in a table that has been marked for deletion but has not yet been physically removed is called a dead tuple. Vacuuming reclaims storage occupied by dead tuples.
It is necessary to vacuum each table regularly for several reasons:
-
to recover or reuse disk space occupied by updated or deleted rows;
-
to update data statistics used by the query planner;
-
to update the visibility map, which speeds up index-only scans;
-
to protect against loss of very old data due to transaction ID wraparound or multixact ID wraparound.
Different reasons dictate that vacuuming be performed at different frequencies and to different extents, as discussed in the subsections below.
For the most scenarios, it is preferable to let vacuuming be performed by the autovacuum daemon that automates the execution of the VACUUM
command.
VACUUM and VACUUM FULL commands
As mentioned above, tuples that are deleted or obsoleted by an update are not physically removed from tables. Each tuple contains service information — HeapTupleHeaderData that allows you to determine if the current tuple is the valid row version or is outdated. (For more information on the HeapTupleHeaderData structure, refer to Table row layout.)
Create and populate a table with data to view HeapTupleHeaderData fields:
CREATE TABLE test (
id serial PRIMARY KEY,
num int
);
INSERT INTO test (num) VALUES (10),(20),(30);
Delete a row from this table and update another row:
UPDATE test SET num = 25 where id = 2;
DELETE FROM test WHERE id = 3;
View table data with columns that contain hidden tuple attributes:
SELECT id, num, xmin,xmax,cmin,cmax,ctid FROM test;
Two rows are displayed, and the second one has a new number of the inserting transaction:
id | num | xmin | xmax | cmin | cmax | ctid ----+-----+---------+------+------+------+------- 1 | 10 | 2339945 | 0 | 7 | 7 | (0,1) 2 | 25 | 2339949 | 0 | 0 | 0 | (0,4) (2 rows)
The following hidden fields are shown:
-
xmin
— the identity (transaction ID) of the inserting transaction for this row version. -
xmax
— the identity (transaction ID) of the deleting transaction, or zero for an undeleted row version. -
cmin
— the command identifier (starting at zero) within the inserting transaction. -
cmax
— the command identifier within the deleting transaction, or zero. -
ctid
— the physical location of the current row version within its table. For the most current version of a row,ctid
refers to this version itself. It has the following form:(<page number>, <the index number in the array>)
.
To view dead tuples, install the pageinspect extension:
CREATE EXTENSION pageinspect;
Run the following query that uses the heap_page_items function to display all tuples:
SELECT lp as tuple, t_xmin, t_xmax, t_ctid FROM heap_page_items(get_raw_page('test',0));
Result:
tuple | t_xmin | t_xmax | t_cid | t_ctid -------+---------+---------+-------+-------- 1 | 2339945 | 0 | 7 | (0,1) 2 | 2339945 | 2339949 | 0 | (0,4) 3 | 2339945 | 2339949 | 1 | (0,3) 4 | 2339949 | 0 | 0 | (0,4)
The test
table contains two dead tuples (2
and 3
).
Then, run the VACUUM command. You can use the VERBOSE
key word to print a detailed vacuum activity report:
VACUUM VERBOSE test;
Result:
INFO: vacuuming "public.test" INFO: scanned index "test_pkey" to remove 1 row versions DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s INFO: table "test": removed 1 dead item identifiers in 1 pages DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s INFO: index "test_pkey" now contains 2 row versions in 2 pages DETAIL: 1 index row versions were removed. 0 index pages were newly deleted. 0 index pages are currently deleted, of which 0 are currently reusable. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. INFO: table "test": found 2 removable, 2 nonremovable row versions in 1 out of 1 pages DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 2378768 Skipped 0 pages due to buffer pins, 0 frozen pages. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. VACUUM
Also, each backend running VACUUM
without the FULL
option will report its progress in the pg_stat_progress_vacuum
view.
Execute the query mentioned above to view all tuples in the test
table:
SELECT lp as tuple, t_xmin, t_xmax, t_ctid FROM heap_page_items(get_raw_page('test',0));
Result:
tuple | t_xmin | t_xmax | t_ctid -------+---------+--------+------- 1 | 2339945 | 0 | (0,1) 2 | | | 3 | | | 4 | 2339949 | 0 | (0,4) (4 rows)
From the result, you can see that VACUUM
deletes the dead tuples, but does not free up the disk space. However, VACUUM
makes it available for re-use, and new tuples will be written in place of the deleted dead tuples.
You can also use the query against the pg_stat_user_tables
view to obtain tuple statistics:
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables WHERE relname = 'test';
Result:
relname | n_live_tup | n_dead_tup ---------+------------+------------ test | 2 | 2
If tables and columns are not specified in the VACUUM
command, it processes every table and materialized view in the current database. Also, VACUUM
cannot be executed inside a transaction block.
VACUUM
generates a significant amount of I/O traffic, which can cause poor performance for other active sessions. PostgreSQL provides configuration parameters that can be adjusted to reduce the performance impact of background vacuuming — see Cost-based vacuum delay. Use postgresql.conf custom section to set vacuum parameters as described in Configuration parameters.
You can retrieve current vacuum settings from the pg_settings view using the following query:
SELECT name, setting, context, short_desc
FROM pg_settings WHERE name like 'vacuum%';
Result:
name | setting | context | short_desc -----------------------------------+------------+---------+------------------------------------------------------------------------------------- vacuum_cost_delay | 0 | user | Vacuum cost delay in milliseconds. vacuum_cost_limit | 200 | user | Vacuum cost amount available before napping. vacuum_cost_page_dirty | 20 | user | Vacuum cost for a page dirtied by vacuum. vacuum_cost_page_hit | 1 | user | Vacuum cost for a page found in the buffer cache. vacuum_cost_page_miss | 2 | user | Vacuum cost for a page not found in the buffer cache. vacuum_defer_cleanup_age | 0 | sighup | Number of transactions by which VACUUM and HOT cleanup should be deferred, if any. vacuum_failsafe_age | 1600000000 | user | Age at which VACUUM should trigger failsafe to avoid a wraparound outage. vacuum_freeze_min_age | 50000000 | user | Minimum age at which VACUUM should freeze a table row. vacuum_freeze_table_age | 150000000 | user | Age at which VACUUM should scan whole table to freeze tuples. vacuum_multixact_failsafe_age | 1600000000 | user | Multixact age at which VACUUM should trigger failsafe to avoid a wraparound outage. vacuum_multixact_freeze_min_age | 5000000 | user | Minimum age at which VACUUM should freeze a MultiXactId in a table row. vacuum_multixact_freeze_table_age | 150000000 | user | Multixact age at which VACUUM should scan whole table to freeze tuples.
There are two variants of VACUUM
: standard VACUUM
and VACUUM FULL
. VACUUM FULL
can reclaim more disk space but runs much more slowly. Also, the standard form of VACUUM
can run in parallel with production database operations. The SELECT
, INSERT
, UPDATE
, and DELETE
commands will continue to operate normally, though you cannot modify the definition of a table with commands such as ALTER TABLE
while VACUUM
is running.
Run VACUUM FULL
for the test
table:
VACUUM FULL test;
Execute the query mentioned above to view all tuples in the test
table:
SELECT lp as tuple, t_xmin, t_xmax, t_ctid FROM heap_page_items(get_raw_page('test',0));
As a result, the empty rows have been deleted:
tuple | t_xmin | t_xmax | t_ctid -------+---------+--------+-------- 1 | 2339945 | 0 | (0,1) 2 | 2339949 | 0 | (0,2)
VACUUM FULL
creates a new copy of the table and recovers the disk space. VACUUM FULL
requires an ACCESS EXCLUSIVE
lock on the table, and cannot be done in parallel with other operations on the table. VACUUM FULL
can reclaim more space, but takes much longer. This method also requires extra disk space, since it writes a new copy of the table and does not release the old copy until the operation is complete. Generally, VACUUM FULL
should only be used when a significant amount of space needs to be reclaimed. Administrators should strive to use standard VACUUM
and avoid VACUUM FULL
.
TIP
You can use the query to the
Result: relname | n_live_tup | n_dead_tup ---------+------------+------------ test | 2 | 3 |
Recover disk space
As mentioned above, an UPDATE
and DELETE
command does not immediately remove the old version of the row. This approach is necessary to gain the benefits of multi-version concurrency control: the row version must not be deleted while it is still potentially visible to other transactions. Eventually, an outdated or deleted row version is no longer of interest to any transaction. The space it occupies must be reclaimed for reuse by new rows, to avoid unbounded growth of disk space requirements. As described above, the standard form of VACUUM
removes dead row versions in tables and indexes and marks the space available for future reuse. However, it does not return the space to the operating system, except in the special case where one or more pages at the end of a table become entirely free and an exclusive table lock can be easily obtained. In contrast, VACUUM FULL
writes a new version of the table file with no dead space. This minimizes the size of the table, but can take a long time and requires extra disk space.
The usual goal of regular vacuuming is to perform standard VACUUM
frequently enough to avoid the need for VACUUM FULL
. The main idea of this approach is not to minimize the size of tables, but to keep disk space usage at a stable level: each table occupies a space equal to its minimum size plus the space used up between vacuum operations. Although you can use VACUUM FULL
to shrink a table to its minimum and return disk space to the operating system, there is little point in doing so if the table grows again in the future. For tables that are updated frequently, it is better to run VACUUM
at a moderate frequency than to run VACUUM FULL
very rarely.
TIP
|
Update planner statistics
The query planner relies on statistical information about the contents of tables in order to generate good plans for queries. These statistics are gathered by the ANALYZE command, which can be invoked by itself or as an optional step in VACUUM
. It is important to have accurate statistics, otherwise poorly chosen query plans can reduce database performance.
The autovacuum daemon runs the ANALYZE
command whenever the content of a table has changed sufficiently. However, administrators may prefer to perform ANALYZE
manually, because the daemon schedules ANALYZE
execution based only on the number of rows inserted or modified. The daemon does not have information whether these changes will lead to meaningful changes in statistics.
Tuples changed in partitions and inheritance children do not trigger ANALYZE
on the parent table. If the parent table is empty or rarely changed, it may never be processed by autovacuum, and the statistics for the inheritance tree will not be collected. It is necessary to run ANALYZE
on the parent tables manually in order to keep the statistics up to date.
Update the visibility map
Vacuuming maintains a visibility map for each table to keep track of which pages contain only tuples that are visible to all active transactions (and all future transactions, until the page is modified). This has two purposes. First, vacuuming can skip such pages on the next run, since there is nothing to clean up. Second, it allows ADPG/PostgreSQL to execute some queries using only the index, without reference to the underlying table. Since indexes do not contain tuple visibility information, a standard index scan fetches a tuple for each matching index entry, to check whether it can be seen by the current transaction. An index-only scan checks the visibility map first. If all tuples on the page are visible, the fetch for checking visibility can be skipped. This is most useful for large data sets where the visibility map can prevent disk access.
Prevent transaction ID wraparound failures
PostgreSQL MVCC transaction semantics depend on being able to compare transaction ID (XID) numbers: a row version with an XID greater than the current transaction’s XID is "in the future" and should not be visible to the current transaction. Since transaction IDs have a limited size (32 bits), a cluster that runs for a long time (more than 4 billion transactions) would suffer transaction ID wraparound. The XID counter wraps around to zero, and transactions that were "in the past" appear to be "in the future". It means their output becomes invisible resulting in data loss. Actually, the data is still there, but you cannot get it. To avoid this, it is necessary to vacuum every table in every database at least once every two billion transactions.
Periodic vacuuming solves the problem. VACUUM
marks rows as frozen, indicating that they were inserted by a transaction that was committed sufficiently far in the past and should be visible to all current and future transactions.
PostgreSQL allocates 32 bits for XID (about 4 billion values). This means that for every transaction, there are two billion XIDs that are "older" and two billion that are "newer". The XID space is circular with no endpoint. Therefore, once a row version has been created with a particular XID, it will appear to be "in the past" for the next two bilbillionlion transactions. If the row version still exists after more than two billion transactions, it will suddenly appear to be "in the future". To prevent this, PostgreSQL reserves a special XID, FrozenTransactionId
, which does not follow the normal XID comparison rules and is always considered older than every normal XID. Frozen row versions are "in the past" to all normal transactions, and such row versions will be valid until deleted, no matter how long that is.
The vacuum_freeze_min_age parameter specifies the cutoff age of transactions that VACUUM
should use to decide whether to freeze row versions (use postgresql.conf custom section to set it). Increasing this setting may avoid unnecessary work if the rows that would otherwise be frozen will soon be modified again, but decreasing this setting increases the number of transactions that can elapse before the table must be vacuumed again.
VACUUM
uses the visibility map to determine which pages of a table must be scanned. Normally, it will skip pages that do not have any dead row versions even if those pages still have row versions with old XID values. Therefore, VACUUM
does not always freeze every old row version in the table. Periodically, VACUUM
performs an aggressive vacuum that skips only pages, which contain neither dead rows nor any unfrozen XID or MXID values. The vacuum_freeze_table_age parameter controls when VACUUM
does that. All visible but not all frozen pages are scanned if the number of transactions that have passed since the last such scan is greater than vacuum_freeze_table_age
minus vacuum_freeze_min_age
. Setting vacuum_freeze_table_age
to 0
forces VACUUM
to use this more aggressive strategy for all scans.
The maximum time that a table can be unvacuumed is two billion transactions minus the vacuum_freeze_min_age
value at the time of the last aggressive vacuum. If a table is not vacuumed for longer, data loss would be possible. To ensure that this does not happen, autovacuum is invoked on any table that might contain unfrozen rows with XIDs older than the age specified by the autovacuum_freeze_max_age
parameter. This happens even if autovacuum is disabled.
To track the age of the oldest unfrozen XIDs in a database, VACUUM
stores XID statistics in the system tables pg_class and pg_database. In particular, the relfrozenxid
column of pg_class
contains the freeze cutoff XID that was used by the last aggressive VACUUM
for the current table. All rows inserted by transactions with XIDs older than this cutoff XID are guaranteed to have been frozen. Similarly, the datfrozenxid
column of pg_database
contains a lower bound on the unfrozen XIDs appearing in the current database. It is the minimum of the per-table relfrozenxid
values within the database. A convenient way to examine this information is to execute the queries below. In their results, the age
column measures the number of transactions from the cutoff XID to the current transaction’s XID.
The query to pg_class
:
SELECT c.oid::regclass as table_name,
greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as age
FROM pg_class c
LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
WHERE c.relkind IN ('r', 'm');
Result:
table_name | age --------------------------------------------+--------- books_log | 649725 books | 650456 book_orders | 929364 accounts | 486357 pg_statistic | 2300534 pg_type | 2300534 employees | 447694 pg_foreign_table | 2300534 pg_authid | 2300534 pg_statistic_ext_data | 2300534 pg_user_mapping | 2300534 pg_subscription | 2300534 pg_attribute | 2300534 pg_proc | 2300534 pg_class | 2300534 pg_attrdef | 2300534 pg_constraint | 2300534 pg_inherits | 2300534 ...
The query to pg_database
:
SELECT datname, age(datfrozenxid) FROM pg_database;
Result:
datname | age --------------+--------- postgres | 2300549 adpg_metrics | 2300549 template1 | 2300549 template0 | 2300549 bookstore | 2300549 adpg_db1 | 2300549 adpg_db2 | 2300549 sb_tpcc | 2300549
If for some reason autovacuum fails to clear old XIDs from a table, the system begins to emit warning messages when the database’s oldest XIDs reach forty million transactions from the wraparound point:
WARNING: database "mydatabase" must be vacuumed within 39985967 transactions HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
Manual VACUUM
should fix the problem, as suggested by the hint. Note that VACUUM
should be performed by a superuser.
If these warnings are ignored, the system will refuse to assign new XIDs once there are fewer than three million transactions left until wraparound. In this condition, any transactions already in progress can continue, but only read-only transactions can be started. Operations that modify database records or truncate relations will fail.
Multixacts and wraparound
Multixact IDs are used to support row locking by multiple transactions. Since there is only limited space in a tuple header to store lock information, that information is encoded as a multiple transaction ID (multixact ID), whenever there is more than one transaction concurrently locking a row. Information about which transaction IDs belong to a particular multixact ID is stored separately in the pg_multixact
subdirectory, and only the multixact ID is stored in the xmax
field of the tuple header. Like transaction IDs, multixact IDs are implemented as a 32-bit counter and are stored similarly, which requires careful management of their age, storage cleanup, and wraparound handling.
Whenever VACUUM
scans any part of a table, it will replace any multixact ID (the xmax
field) it encounters which is older than vacuum_multixact_freeze_min_age by a different value, which can be the zero value, a single transaction ID, or a newer multixact ID. For each table, pg_class.relminmxid
stores the oldest possible multixact ID from any tuple of the current table. If this value is older than vacuum_multixact_freeze_table_age, an aggressive vacuum is forced.
For safety reasons, an aggressive vacuum scan will occur for any table whose multixact age is greater than autovacuum_multixact_freeze_max_age. Also, if the storage occupied by members of multi-transactions exceeds 2 GB, aggressive vacuum scans will occur more often for all tables, starting with tables that have the oldest multixact age. Both of these kinds of aggressive scans will occur even if autovacuum is nominally disabled.
Similar to the XID case, if autovacuum fails to clear old MXIDs from a table, the system will begin to emit warning messages when the database’s oldest MXIDs reach forty million transactions from the wraparound point. If these warnings are ignored, the system will refuse to generate new MXIDs once there are fewer than three million left until wraparound.
Autovacuum
ADPG/PostgreSQL has highly recommended feature called autovacuum that automates the execution of the VACUUM
and ANALYZE
commands. When enabled, autovacuum checks for tables that have a large number of inserted, updated, or deleted tuples. These checks use the statistics collection facility; therefore, autovacuum cannot be used unless the track_counts parameter is set to true
(the default value).
The autovacuum daemon consists of multiple processes. There is a persistent daemon process, called the autovacuum launcher, which is in charge of starting autovacuum worker processes for all databases. The launcher will distribute the work across time, attempting to start one worker within each database every autovacuum_naptime seconds. If there are N
databases, a new worker is launched every autovacuum_naptime
/N
seconds. The autovacuum_max_workers parameter is set the maximum of worker processes that are allowed to run at the same time. If there are more than autovacuum_max_workers
databases to be processed, the next database will be processed as soon as the first worker finishes. Each worker process will check each table within its database and execute VACUUM
and/or ANALYZE
as needed.
By default, autovacuuming is enabled, and the related configuration parameters are appropriately set. You can use the following query to display values of the autovacuum parameters:
SELECT name, setting, context, short_desc
FROM pg_settings WHERE name like 'autovacuum%';
Result:
name | setting | context | short_desc -------------------------------------+----------+------------+------------------------------------------------------------------------------------------- autovacuum | on | sighup | Starts the autovacuum subprocess. autovacuum_analyze_scale_factor | 0.1 | sighup | Number of tuple inserts, updates, or deletes prior to analyze as a fraction of reltuples. autovacuum_analyze_threshold | 50 | sighup | Minimum number of tuple inserts, updates, or deletes prior to analyze. autovacuum_freeze_max_age | 200000000| postmaster | Age at which to autovacuum a table to prevent transaction ID wraparound. autovacuum_max_workers | 3 | postmaster | Sets the maximum number of simultaneously running autovacuum worker processes. autovacuum_multixact_freeze_max_age | 400000000| postmaster | Multixact age at which to autovacuum a table to prevent multixact wraparound. autovacuum_naptime | 60 | sighup | Time to sleep between autovacuum runs. autovacuum_vacuum_cost_delay | 2 | sighup | Vacuum cost delay in milliseconds, for autovacuum. autovacuum_vacuum_cost_limit | -1 | sighup | Vacuum cost amount available before napping, for autovacuum. autovacuum_vacuum_insert_scale_factor| 0.2 | sighup | Number of tuple inserts prior to vacuum as a fraction of reltuples. autovacuum_vacuum_insert_threshold | 1000 | sighup | Minimum number of tuple inserts prior to vacuum, or -1 to disable insert vacuums. autovacuum_vacuum_scale_factor | 0.2 | sighup | Number of tuple updates or deletes prior to vacuum as a fraction of reltuples. autovacuum_vacuum_threshold | 50 | sighup | Minimum number of tuple updates or deletes prior to vacuum. autovacuum_work_mem | -1 | sighup | Sets the maximum memory to be used by each autovacuum worker process.
For more information on the autovacuum settings, refer to Automatic vacuuming.
Autovacuum runs not by a timer, but when certain conditions are met. Tables with the relfrozenxid
value is more than autovacuum_freeze_max_age transactions old are always vacuumed. Otherwise, if the number of tuples obsoleted since the last VACUUM
exceeds the vacuum threshold
, the table is vacuumed. The vacuum threshold
is defined as:
Where:
-
vacuum base threshold
— autovacuum_vacuum_threshold; -
vacuum scale factor
— autovacuum_vacuum_scale_factor; -
number of tuples
— pg_class.reltuples (number of live rows in the table).
The table is also vacuumed if the number of tuples inserted since the last vacuum has exceeded insert threshold
, which is defined as:
Where:
-
vacuum base insert threshold
— autovacuum_vacuum_insert_threshold; -
vacuum insert scale factor
— autovacuum_vacuum_insert_scale_factor; -
number of tuples
— pg_class.reltuples (number of live rows in the table).
Such vacuuming may allow a part of the table pages to be marked as all visible, and also allow tuples to be frozen, which can reduce the work required in subsequent vacuuming. For tables, which receive INSERT
operations but no or almost no UPDATE
/DELETE
operations, it may be beneficial to lower the table’s autovacuum_freeze_min_age as this may allow tuples to be frozen by earlier vacuums. The number of obsolete tuples and the number of inserted tuples are obtained from the statistics collector; it is an approximate number that is updated by each UPDATE
, DELETE
, and INSERT
operation. It is only approximate because some information might be lost under heavy load. If the relfrozenxid
value of the table is more than vacuum_freeze_table_age
transactions old, an aggressive vacuum is performed to freeze old tuples and advance relfrozenxid
; otherwise, only pages that have been modified since the last vacuum are scanned.
Autovacuum uses a similar condition for ANALYZE
— the analyze threshold
is compared to the total number of tuples inserted, updated, or deleted since the last ANALYZE
and is defined as:
Where:
-
analyze base threshold
— theautovacuum_analyze_threshold
parameter; -
analyze scale factor
— theautovacuum_analyze_scale_factor
parameter; -
number of tuples
— pg_class.reltuples (number of live rows in the table).
Root partitioned tables do not directly store tuples and consequently are not processed by autovacuum, but autovacuum processes the partitions, which are regular PostgreSQL tables. This means that autovacuum does not run ANALYZE
on root partitioned tables, and this can cause suboptimal plans for queries that reference partitioned table statistics. You can work around this problem by manually running ANALYZE
on partitioned tables when they are first populated, and whenever the distribution of data in their partitions changes significantly. See ANALYZE on partitioned and child tables.
Temporary tables cannot be accessed by autovacuum. Therefore, VACUUM
and ANALYZE
operations can be performed via SQL commands.
You can also override the autovacuum control parameters on a per-table basis. See Storage parameters for more information. If any values are defined through table storage parameters, they are used when processing the current table, otherwise, global parameters are utilized.