Performance tuning

Query performance can be affected by many factors. Some of them can be controlled by a user or administrator, while others are determined by the underlying system architecture. This article provides some tips for understanding and tuning PostgreSQL performance.

Optimize resource consumption

This section contains general guidelines on configuration parameter values that can be used as a starting point. Optimal parameter values depend on the workload of a particular cluster and the hardware used. Note that there are open-source configurators that calculate basic PostgreSQL settings based on a specified hardware configuration; some of them are available online.

Memory settings

The table below contains baseline recommendations for memory settings depending on the amount of server RAM.

Parameter name Description Recommended value

effective_cache_size

Sets the planner’s assumption about the effective size of the disk cache that is available to a single query. This is taken into account when estimating the cost of using the index. A higher value makes it more likely index scans will be used, a lower value makes it more likely sequential scans will be applied. When setting this parameter, you should consider both PostgreSQL shared buffers and the portion of the kernel’s disk cache that will be used for PostgreSQL data files, though some data might exist in both places. Also, take into account the expected number of concurrent queries on different tables, since they will have to share the available space. This parameter does not affect the size of shared memory allocated by PostgreSQL, and it does not reserve kernel disk cache. It is used only for estimation purposes. The system also does not assume data remains in the disk cache between queries. If this value is specified without units, it is taken as blocks, that is BLCKSZ bytes, typically 8 KB

Recommended value: 2/3 RAM

shared_buffers

Sets the amount of memory the database server uses for shared memory buffers. If you have a dedicated database server with 1 GB or more of RAM, a reasonable starting value for shared_buffers is 25% of the memory in your system. There are some workloads where even larger settings for shared_buffers are effective, but because PostgreSQL also relies on the operating system cache, it is unlikely that an allocation of more than 40% of RAM to shared_buffers will work better than a smaller amount. Larger settings for shared_buffers usually require a corresponding increase of the max_wal_size parameter to spread out the process of writing a large amount of new or changed data over a longer period of time. If this value is specified without units, it is taken as blocks, that is BLCKSZ bytes, typically 8 KB

Recommended value: RAM/4

temp_buffers

Sets the maximum amount of memory used for temporary buffers within each session. These are session-local buffers used only for access to temporary tables. If this value is specified without units, it is taken as blocks, that is BLCKSZ bytes, typically 8 KB

Recommended value: 256 MB

work_mem

Sets the base maximum amount of memory to be used by a query operation (such as a sort or hash table) before writing to temporary disk files. Sort operations are used for ORDER BY, DISTINCT, and merge joins; hash tables are used in hash joins, hash-based aggregation, and hash-based processing of IN subqueries. Note that a complex query can run several sort and hash operations at the same time, and each operation is generally allowed to use as much memory as this value specifies before it starts to write data into temporary files. In addition, several sessions can execute such operations concurrently, so the total memory used can be many times the value of work_mem; keep this in mind when choosing the value. If this value is specified without units, it is taken as kilobytes

Recommended value: RAM/32

maintenance_work_mem

Specifies the maximum amount of memory to be used by maintenance operations, such as VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY. If this value is specified without units, it is taken as kilobytes. Since only one of these operations can be executed at a time by a session, and they are usually not executed in parallel in a cluster, it is safe to set this value significantly larger than work_mem. Larger settings might improve performance of vacuuming and restoring database dumps.

Note that when autovacuum runs, this memory may be allocated up to autovacuum_max_workers times, so be careful not to set the value too high. It may be useful to control the memory used by autovacuum separately with the autovacuum_work_mem configuration parameter

Recommended value: RAM/16

You can set all the described parameters except autovacuum_work_mem and temp_buffers in the ADPG configurations section of the Configuration tree on the Clusters → ADPG cluster → Services → ADPG → Primary configuration page in ADCM UI. To specify autovacuum_work_mem and temp_buffers, use the ADPG configurations section of the Configuration tree. See Configuration parameters for details.

After changing the described settings, execute the Reconfigure & Restart action of the ADPG service to apply changes.
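
For example, you can verify which values are actually in effect before and after reconfiguration by querying the pg_settings view from psql. The following query is a sketch; the parameter list is illustrative and can be adjusted as needed:

SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('effective_cache_size', 'shared_buffers', 'temp_buffers',
               'work_mem', 'maintenance_work_mem');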

The table below lists parameters related to I/O operations whose values can significantly affect performance.

Parameter name Description

fsync

If this parameter is on, the PostgreSQL server tries to make sure that updates are physically written to disk, by issuing fsync() system calls or various equivalent methods (see wal_sync_method). This ensures that the database cluster can recover to a consistent state after an operating system or hardware crash. While turning off fsync is often a performance benefit, this can result in unrecoverable data corruption in case of a power failure or system crash. It is only advisable to turn off fsync if you can easily recreate your entire database from external data.

Examples where disabling fsync is safe include the initial loading of a new database cluster from a backup file, using a database cluster for processing a batch of data after which the database will be thrown away and recreated, or a read-only database clone that is recreated frequently and is not used for failover. High-quality hardware alone is not a sufficient reason to turn off fsync.

In many situations, turning off synchronous_commit for noncritical transactions can provide much of the potential performance benefit of turning off fsync, without the attendant risks of data corruption.

You can set this parameter in ADPG configurations on the Clusters → ADPG cluster → Services → ADPG → Primary configuration page in ADCM UI (see Configuration parameters).

synchronous_commit

Specifies how much WAL processing must complete before the database server returns a success indication to the client. See synchronous_commit for detailed description.

Setting this parameter to off does not create any risk of database inconsistency: an operating system or database failure may result in the loss of the last transactions considered committed, but the database state will be just the same as if those transactions had been aborted cleanly. So, turning synchronous_commit off can be a useful alternative when performance is more important than a reliable guarantee of the safety of each transaction. For additional information, refer to Asynchronous commit.

The behavior of each transaction is determined by the setting in effect when it commits. It is possible to commit some transactions synchronously and others asynchronously. For example, to make a single multistatement transaction commit asynchronously when the default is the opposite, issue SET LOCAL synchronous_commit TO OFF in that transaction (see the example after this table)

checkpoint_completion_target

Specifies the target time of checkpoint completion as a fraction of total time between checkpoints. The default is 0.9, which spreads the checkpoint across almost all of the available interval, providing fairly consistent I/O load while also leaving some time for checkpoint completion overhead. Reducing this parameter is not recommended because it causes the checkpoint to complete faster. This results in a higher rate of I/O during the checkpoint, followed by a period of less I/O between the checkpoint completion and the next scheduled checkpoint

effective_io_concurrency

Sets the number of concurrent disk I/O operations that PostgreSQL expects can be executed simultaneously. Raising this value will increase the number of I/O operations that a PostgreSQL session attempts to initiate in parallel. The allowed range is 1 to 1000, or zero to disable issuance of asynchronous I/O requests. Currently, this setting only affects bitmap heap scans.

For magnetic drives, a good starting point for this setting is the number of separate drives comprising a RAID 0 or RAID 1 being used for the database. For RAID 5 the parity drive should not be counted. If the database is often busy with multiple queries issued in concurrent sessions, lower values may be sufficient to keep the disk array busy. A value higher than needed to keep the disks busy will only result in extra CPU overhead. SSD storage can often process many concurrent requests, so the best value might be in the hundreds.

This value can be overridden for tables in a particular tablespace by setting the tablespace parameter of the same name (see ALTER TABLESPACE)

random_page_cost

Sets the planner’s estimate of the cost of a non-sequentially-fetched disk page. The default is 4.0.

Reducing this value relative to seq_page_cost will cause the system to prefer index scans; raising it will make index scans look relatively more expensive. You can raise or lower both values together to change the importance of disk I/O costs relative to CPU costs.

Random access to mechanical disk storage is normally much more expensive than four times sequential access. However, a lower default is used (4.0) because the majority of random accesses to disk, such as indexed reads, are assumed to be in cache. The default value can be thought of as modeling random access as 40 times slower than sequential, while expecting 90% of random reads to be cached.

If you believe a 90% cache rate is an incorrect assumption for your workload, you can increase random_page_cost to better reflect the true cost of random storage reads. Correspondingly, if your data is likely to be completely in cache, such as when the database is smaller than the total server memory, decreasing random_page_cost can be appropriate. Storage that has a low random read cost relative to sequential, e.g., SSDs, might also be better modeled with a lower value for random_page_cost, for example, 1.1.

This value can be overridden for tables and indexes in a particular tablespace by setting the tablespace parameter of the same name (see ALTER TABLESPACE)
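
Two of the parameters above can also be applied at a finer granularity than the server configuration. The sketch below shows a transaction that commits asynchronously with SET LOCAL synchronous_commit TO OFF, as mentioned in the synchronous_commit description, and a tablespace-level override of random_page_cost; the app_log table and the ssd_space tablespace are hypothetical names used for illustration:

-- Commit a noncritical transaction asynchronously
BEGIN;
SET LOCAL synchronous_commit TO OFF;
INSERT INTO app_log (message) VALUES ('noncritical event');  -- app_log is a hypothetical table
COMMIT;

-- Model cheaper random reads for an SSD-backed tablespace
ALTER TABLESPACE ssd_space SET (random_page_cost = 1.1);     -- ssd_space is a hypothetical tablespace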

Determine disk usage

In ADPG/PostgreSQL, each table has a primary heap disk file where most of the data is stored. If a table has columns with potentially large values, there can also be a TOAST file associated with the table, which is used to store these large values. If a TOAST table exists, it has one valid index. In addition, there can be indexes associated with the base table. PostgreSQL stores each table and index in a separate disk file. If a file exceeds 1 GB, additional files will be created. Naming conventions for these files are described in the Database file layout article.

You can monitor disk space in three ways:

Use database object size functions

These SQL functions are the easiest to use and are generally recommended. They are listed in the Database object size functions table.

For example, the pg_relation_size() function returns the size of the specified table (pgbench_accounts in this example) in bytes. You can use the psql tool to run the following query:

SELECT pg_relation_size('pgbench_accounts');

Result:

 pg_relation_size
------------------
          671481856

The pg_size_pretty() function converts a size in bytes into a more human-readable format with size units: bytes, KB, MB, GB, or TB:

SELECT pg_size_pretty(pg_relation_size('pgbench_accounts'));

Result:

 pg_size_pretty
----------------
 640 MB

You can use the pg_database_size() function to compute the total disk space used by the specified database (the example database in this case):

SELECT pg_size_pretty(pg_database_size('example'));

Result:

 pg_size_pretty
----------------
  756 MB
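
Related functions from the same family report the space used by a table together with its TOAST data and indexes. The following query is a sketch that breaks down the usage of the pgbench_accounts table used above:

SELECT pg_size_pretty(pg_table_size('pgbench_accounts'))          AS table_size,
       pg_size_pretty(pg_indexes_size('pgbench_accounts'))        AS indexes_size,
       pg_size_pretty(pg_total_relation_size('pgbench_accounts')) AS total_size;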

Inspect system catalogs

You can issue queries in psql to see the disk usage of a table. For up-to-date results, run VACUUM or ANALYZE on the table first. The following query determines the number of pages (the relpages column) that belong to the pgbench_accounts table:

SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'pgbench_accounts';

Result:

 pg_relation_filepath | relpages
----------------------+----------
 base/19191/19244     |    81968

Each page is typically 8 kilobytes, so 81968 pages correspond to 81968 × 8 KB = 655744 KB (671481856 bytes), which matches the result returned by the pg_relation_size() function. Note that relpages is only updated by VACUUM, ANALYZE, and a few DDL commands such as CREATE INDEX. The file path (the pg_relation_filepath column) is of interest if you want to examine the table’s disk file directly.

To determine the space used by the TOAST tables associated with the my_table table, use the following query:

SELECT relname, relpages
FROM pg_class,
     (SELECT reltoastrelid
      FROM pg_class
      WHERE relname = 'my_table') AS ss
WHERE oid = ss.reltoastrelid OR
      oid = (SELECT indexrelid
             FROM pg_index
             WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;

Result:

        relname        | relpages
-----------------------+----------
 pg_toast_295925       |        0
 pg_toast_295925_index |        1

The query below displays the index size for the pgbench_branches table:

SELECT c2.relname, c2.relpages
FROM pg_class c, pg_class c2, pg_index i
WHERE c.relname = 'pgbench_branches' AND
      c.oid = i.indrelid AND
      c2.oid = i.indexrelid
ORDER BY c2.relname;

Result:

        relname        | relpages
-----------------------+----------
 ind_bbalance          |        2
 pgbench_branches_pkey |        2

To find the largest tables and indexes, use the following query:

SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC LIMIT 5;

Result:

       relname         | relpages
-----------------------+----------
 pgbench_accounts      |    81968
 pgbench_accounts_pkey |    13713
 pg_proc               |       94
 pg_depend             |       65
 pg_toast_2618         |       63

Use the oid2name module

The oid2name utility program helps administrators examine the file structure used by ADPG/PostgreSQL.

oid2name connects to a target database and extracts OID, filenode (a unique numeric identifier used to name on-disk files belonging to a relation), and/or table name information:

$ oid2name

Result:

All databases:
     Oid  Database Name  Tablespace
-----------------------------------
  19191        example  pg_default
  14486       postgres  pg_default
  14485      template0  pg_default
      1      template1  pg_default

For example, examine the example database from the list above. To do this, change the current directory to the directory of this database:

cd /pg_data1/adpg16/base/19191

where 19191 is the OID of the example database from the command output above.

Display the five largest database objects in the default tablespace, ordered by size (largest first):

ls -lS * | head -5

Result:

-rw------- 1 postgres postgres 671481856 Jul 31 10:54 19244
-rw------- 1 postgres postgres 112336896 Jul 31 11:03 19261
-rw------- 1 postgres postgres    770048 Jul 31 10:43 1255
-rw------- 1 postgres postgres    532480 Aug  1 08:11 2608
-rw------- 1 postgres postgres    516096 Jul 31 10:43 2838

Determine what the 19244 file is:

oid2name -d example -f 19244

Result:

  Filenode        Table Name
----------------------------
     19244  pgbench_accounts

The oid2name utility requires a running database server with non-corrupt system catalogs. In situations with catastrophic database corruption, its use is limited.

Populate a database

This section provides some suggestions on how to efficiently insert a large amount of data when populating a database for the first time or during the process of database use.

Disable autocommit

When using multiple INSERT operators, turn off autocommit and do one commit at the end. In plain SQL, this means issuing BEGIN at the start and COMMIT at the end. Some client libraries may do this behind your back; in that case, make sure the library commits exactly when you want it to. If you allow each insertion to be committed separately, PostgreSQL does a lot of work for each added row. An additional benefit of doing all insertions in one transaction is that if the insertion of one row fails, the insertion of all rows will be rolled back. So, you will not be stuck with partially loaded data.
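
A minimal sketch of this approach is shown below; the measurements table is a hypothetical example:

-- Wrap a batch of inserts in a single transaction instead of committing each row
BEGIN;
INSERT INTO measurements (sensor_id, value) VALUES (1, 20.5);
INSERT INTO measurements (sensor_id, value) VALUES (2, 19.8);
-- ... many more INSERT commands ...
COMMIT;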

Use COPY

Use COPY to load all the rows in one command, instead of using a series of INSERT commands. The COPY command is optimized for loading large numbers of rows. It is less flexible than INSERT but has significantly less overhead for large data loads. Since COPY is a single command, there is no need to disable autocommit if you use this method to populate a table.

If you cannot use COPY, it may be useful to utilize PREPARE to create a prepared INSERT statement, and then use EXECUTE as many times as required. This allows you to avoid the overhead of repeatedly parsing and planning INSERT.
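
A possible sketch of this technique, again using the hypothetical measurements table, is shown below:

-- Parse and plan the INSERT once, then execute it repeatedly
PREPARE bulk_insert (integer, numeric) AS
    INSERT INTO measurements (sensor_id, value) VALUES ($1, $2);

EXECUTE bulk_insert(1, 20.5);
EXECUTE bulk_insert(2, 19.8);
-- ... repeat EXECUTE for each row ...

DEALLOCATE bulk_insert;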

Note that loading a large number of rows using COPY is almost always faster than using INSERT, even if PREPARE is used and multiple insertions are batched into a single transaction.

If the wal_level parameter is minimal, COPY is fastest when executed in the same transaction as the CREATE TABLE or TRUNCATE commands. In such cases, there is no need to write WAL, since in case of an error, the files containing the newly loaded data will be removed.
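
For example, assuming wal_level is set to minimal, a load into the hypothetical measurements table from a server-side CSV file might look as follows (the file path is an example):

BEGIN;
TRUNCATE measurements;
COPY measurements FROM '/tmp/measurements.csv' WITH (FORMAT csv);
COMMIT;

Note that COPY from a server-side file requires appropriate privileges; the psql \copy command is a client-side alternative.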

Remove indexes

If you load data into a table you just created, the fastest method is to add data using COPY and then create indexes. It takes less time to create an index for existing data than to update it sequentially as each row is added.

If you need to add large amounts of data to an existing table, it may make sense to drop the indexes, load the table, and then recreate the indexes. Note that the database performance for other users might suffer during the time the indexes are missing. Additionally, you should think twice before dropping unique indexes, as without them the corresponding key checks will not be performed.
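
The sketch below illustrates loading into an existing table with a hypothetical measurements_sensor_idx index dropped for the duration of the load:

-- Drop the index, load the data, then rebuild the index in one pass
DROP INDEX IF EXISTS measurements_sensor_idx;
COPY measurements FROM '/tmp/measurements.csv' WITH (FORMAT csv);
CREATE INDEX measurements_sensor_idx ON measurements (sensor_id);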

Remove foreign key constraints

Just as with indexes, a foreign key constraint can be checked "in bulk" more efficiently than row-by-row. It can be useful to drop foreign key constraints, load data, and re-create the constraints. In this case, you have to choose between the speed of loading data and loss of error checking while the constraint is missing.

Moreover, when you load data into a table with existing foreign key constraints, each new row requires an entry in the server’s list of pending trigger events, since it is the firing of a trigger that checks the row’s foreign key constraint. Loading many millions of rows can cause the trigger event queue to overflow available memory, leading to intolerable swapping or even the failure of the command. Therefore, it may be necessary, not just desirable, to drop and re-apply foreign keys when loading large amounts of data. If temporarily removing the constraint is not acceptable, the only possible solution may be to split the load operation into smaller transactions.
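
A sketch of this approach with a hypothetical measurements_sensor_id_fkey constraint referencing a hypothetical sensors table:

-- Drop the foreign key for the load, then re-create it so it is validated in bulk
ALTER TABLE measurements DROP CONSTRAINT measurements_sensor_id_fkey;
COPY measurements FROM '/tmp/measurements.csv' WITH (FORMAT csv);
ALTER TABLE measurements
    ADD CONSTRAINT measurements_sensor_id_fkey
    FOREIGN KEY (sensor_id) REFERENCES sensors (id);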

Increase maintenance_work_mem

Temporarily increasing the maintenance_work_mem configuration parameter when loading large amounts of data can improve performance. It helps to speed up the CREATE INDEX commands and ALTER TABLE ADD FOREIGN KEY commands. This does not affect the speed of the COPY command, so this tip is only useful when you are using one or both of the techniques described above. maintenance_work_mem is located in the ADPG configurations section of the Configuration tree on the Clusters → ADPG cluster → Services → ADPG → Primary configuration page in ADCM UI. See Configuration parameters.
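
If you perform a manual load in a single session, you can also override maintenance_work_mem only for that session instead of changing the cluster-wide setting. A sketch with a hypothetical index:

-- Raise maintenance_work_mem for the current session while rebuilding indexes
SET maintenance_work_mem = '1GB';
CREATE INDEX measurements_sensor_idx ON measurements (sensor_id);
RESET maintenance_work_mem;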

Increase max_wal_size

Temporarily increasing the max_wal_size configuration parameter can also make large data loads faster. Loading a large amount of data causes checkpoints to occur more often than the normal checkpoint frequency (specified by the checkpoint_timeout configuration parameter). Whenever a checkpoint occurs, all dirty pages must be flushed to disk. The number of checkpoints that are required can be reduced by increasing max_wal_size during bulk data loads. max_wal_size is also located in the ADPG configurations section of the Configuration tree on the Clusters → ADPG cluster → Services → ADPG → Primary configuration page in ADCM UI.
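
To judge whether checkpoints occur too often during a load, you can compare requested and scheduled checkpoints; in PostgreSQL 16 and earlier these counters are available in the pg_stat_bgwriter view:

SELECT checkpoints_timed, checkpoints_req
FROM pg_stat_bgwriter;

A high checkpoints_req value relative to checkpoints_timed indicates that checkpoints are triggered by WAL volume and that increasing max_wal_size may help.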

Disable WAL archiving and backups

When loading large amounts of data to a cluster that uses WAL archiving and streaming replication, it might be faster to take a new base backup after the load is complete than to process a large amount of incremental WAL data.

To disable WAL archiving, set the following configuration parameters:

  • wal_level to minimal;

  • archive_mode to off;

  • max_wal_senders to 0.

Use ADPG configurations on the Clusters → ADPG cluster → Services → ADPG → Primary configuration tab in ADCM UI to specify these parameters. See Configuration parameters for details.

To disable backup creation, switch off the Enable backups toggle button displayed on the Primary configuration tab of the ADPG service (Clusters → ADPG cluster → Services → ADPG → Primary configuration).

To apply changes of these settings, execute the Reconfigure & Restart action.

IMPORTANT
When you disable WAL archiving, backups taken before that become unusable for archive recovery and for setting up a standby server, which may lead to data loss.

These settings also make certain commands run faster because they do not write WAL if wal_level is set to minimal.

Run ANALYZE afterwards

Whenever you have significantly altered the distribution of data within a table, running the ANALYZE command is strongly recommended. Running ANALYZE (or VACUUM ANALYZE) ensures that the planner has up-to-date statistics about the table. With no statistics or obsolete statistics, the planner might make poor decisions during query planning, leading to poor performance. Note that if the autovacuum daemon is enabled, it might run ANALYZE automatically.
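
For example, after loading the hypothetical measurements table, you can refresh its statistics and check when it was last analyzed:

-- Refresh planner statistics and verify the last analyze timestamps
ANALYZE measurements;

SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'measurements';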

Improve pg_dump performance

Dump scripts generated by pg_dump automatically apply several, but not all, of the above guidelines. To restore a dump as quickly as possible, you need to do a few extra things manually. Note that these points apply while restoring a dump, not while creating it.

By default, pg_dump uses COPY, and when it is generating a complete schema-and-data dump, it is careful to load data before creating indexes and foreign keys. So, several guidelines are handled automatically. All you have to consider is the following:

  • Set appropriate (larger than normal) values for maintenance_work_mem and max_wal_size.

  • If using WAL archiving or streaming replication, disable them during the restore as described in the Disable WAL archiving and backups section.

  • Experiment with the parallel dump and restore modes of both the pg_dump and pg_restore utilities and find the optimal number of concurrent jobs. Parallel data copying and restoring, controlled by the -j option, should give significantly higher performance than the serial mode.

  • In some cases, it may be preferable to restore the dump in a single transaction. To do this, pass the -1 or --single-transaction command-line option to psql or pg_restore. COPY commands will also run faster if you use a single transaction and disable WAL archiving. Please note that when using this mode, any error will roll back the entire restore, potentially resulting in the loss of many hours of processing.

  • If multiple CPUs are available in the database server, consider using the --jobs option of pg_restore. This allows concurrent data loading and index creation.

  • Run ANALYZE after restoring the backup.

A data-only dump also uses COPY, but it does not drop or recreate indexes, and it does not touch foreign keys by default. So, when loading a data-only dump, it is your responsibility to drop and recreate indexes and foreign keys if you want to use these techniques. It is still useful to increase max_wal_size while loading the data, but not maintenance_work_mem; maintenance_work_mem is worth changing only if you manually recreate indexes and foreign keys. It is also recommended to run ANALYZE when the restore is complete.
