Analyze queries
PostgreSQL creates an execution plan for each query. Since the resulting plan is critical to performance, PostgreSQL includes a complex planner that chooses the best plan based on an estimated cost of each alternative option. You can use the EXPLAIN command to see what plan the planner produces for a query.
Create a table for test examples:
CREATE TABLE test_table (field1 integer, field2 text);
Add data to the table and execute the ANALYZE command to create table statistics after data update:
INSERT INTO test_table
SELECT i, md5(random()::text)
FROM generate_series(1, 1000000) AS i;
ANALYZE test_table;
Call the EXPLAIN
command to see the query plan:
EXPLAIN SELECT * FROM test_table;
The result:
QUERY PLAN --------------------------------------------------------------------- Seq Scan on test_table (cost=0.00..18334.00 rows=1000000 width=37)
PostgreSQL can read data from a table in several ways. Seq Scan
reports that PostgreSQL reads data sequentially, block by block.
The cost
option contains two values: estimated start-up and total costs of the query execution. The start-up cost has the 0.00
value. This is the cost value before the output phase begins, for example, the cost value required to sort in a sort node. The estimated total cost (18334.00
) shows the cost that is necessary for retrieving all available rows. The cost
option does not take into account resources taken to send result rows to the client, because the planner cannot change this value by altering the plan. By default, the cost
is measured in units of disk page fetches (seq_page_cost
). The cost
value most often indicates the consumption of resources, and not the required time, although these values correlate with each other. See Planner Cost Constants for details.
The rows
option is the estimated number of rows that Seq Scan
returns.
The width
option is the estimated average size of an output row in bytes.
When you call EXPLAIN
, PostgreSQL does not execute the query, instead it creates an estimated execution plan based on available statistics. The actual plan can differ. Use the ANALYZE
parameter to execute the query and get actual statistics.
EXPLAIN (ANALYZE) SELECT * FROM test_table;
The result:
QUERY PLAN ------------------------------------------------------------------ Seq Scan on test_table (cost=0.00..18334.00 rows=1000000 width=37) (actual time=0.005..62.035 rows=1000000 loops=1) Planning Time: 0.056 ms Execution Time: 96.757 ms
The actual time
option is the time expended to get the first row and all rows, in milliseconds.
The rows
option is the actual number of rows that Seq Scan
returns.
The loops
option shows how many times PostgreSQL scans the entire table, i.e. executes Seq Scan
on all table data.
Execution Time
is the total time to complete the request.
The EXPLAIN ANALYZE
command executes queries. If you need to get statistics on the modifying INSERT
, DELETE
, or UPDATE
statements and do not change data, use the BEGIN and ROLLBACK commands to execute EXPLAIN ANALYZE
inside a transaction block with a rollback:
BEGIN;
EXPLAIN (ANALYZE) INSERT INTO test_table
SELECT i, md5(random()::text)
FROM generate_series(1, 100) AS i;
ROLLBACK;
Cache usage
The EXPLAIN
command allows you to get information on buffer usage. Utilize the BUFFERS
parameter for this purpose.
To see correct results, restart ADPG to clear the buffer cache before testing BUFFERS
. You can use the Reconfigure & Restart action for this.
Execute the EXPLAIN
command with the ANALYZE
and BUFFERS
parameters:
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM test_table;
The result:
QUERY PLAN ------------------------------------------------------------------ Seq Scan on test_table (cost=0.00..18334.00 rows=1000000 width=37) (actual time=1.400..626.291 rows=1000000 loops=1) Buffers: shared read=8334 Planning: Buffers: shared hit=47 read=14 Planning Time: 35.535 ms Execution Time: 663.332 ms
PostgreSQL reads the table in parts called blocks. The Buffers: shared read
option is the number of blocks read from the disk. The cache is empty and the table is completely read from the disk, 8334 blocks are read.
Run the same query:
EXPLAIN (ANALYZE,BUFFERS) SELECT * FROM test_table;
The result:
QUERY PLAN ------------------------------------------------------------------- Seq Scan on test_table (cost=0.00..18334.00 rows=1000000 width=37) (actual time=0.037..73.519 rows=1000000 loops=1) Buffers: shared hit=32 read=8302 Planning Time: 0.033 ms Execution Time: 108.673 ms
PostgreSQL adds the Buffers: shared hit
parameter to the result. Buffers: shared hit
is the number of blocks read from the PostgreSQL cache.
If you repeat this query multiple times, you will see that PostgreSQL takes more and more data from the cache each time. PostgreSQL adds data to the cache on each request. It reads from the cache faster than from the disk. PostgreSQL increases Buffers: shared hit
and reduces Execution Time
on each query execution.
The shared_buffers
configuration parameter determines the cache size. You can change it via ADCM, see Configuration parameters.
Index usage
Add the WHERE
clause to the query:
EXPLAIN SELECT * FROM test_table WHERE field1 > 750;
The result:
QUERY PLAN --------------------------------------------------------------------- Seq Scan on test_table (cost=0.00..20834.00 rows=999288 width=37) Filter: (field1 > 750)
The table has no indexes, and PostgreSQL reads each record sequentially (Seq Scan
).
Filter: (field1 > 750)
means that each entry is compared with the condition field1 > 750
. If a field1
value meets the condition, PostgreSQL inserts the corresponding row into the result.
The cost
values is increased. The expected number of result rows (rows
) is reduced because of the WHERE clause.
Create an index on field1
and execute the same query:
CREATE INDEX ON test_table(field1);
EXPLAIN (ANALYZE) SELECT * FROM test_table WHERE field1 > 750;
The result:
QUERY PLAN --------------------------------------------------------------------- Seq Scan on test_table (cost=0.00..20834.00 rows=999279 width=37) (actual time=0.132..89.711 rows=999250 loops=1) Filter: (field1 > 750) Rows Removed by Filter: 750 Planning Time: 0.083 ms Execution Time: 121.256 ms
Only 750 rows were filtered out, query returns more than 99.9% of the table. Despite an index, the planner chose Seq Scan
.
Utilize the enable_seqscan setting to disable Seq Scan
and force the query planner to use the index:
SET enable_seqscan TO off;
EXPLAIN (ANALYZE) SELECT * FROM test_table WHERE field1 > 750;
The result:
QUERY PLAN --------------------------------------------------------------------- Index Scan using test_table_field1_idx on test_table (cost=0.42..36794.81 rows=999279 width=37) (actual time=0.054..162.518 rows=999250 loops=1) Index Cond: (field1 > 750) Planning Time: 0.087 ms Execution Time: 195.522 ms
Index Scan
means that PostgreSQL scans an index. Index Cond
contains the condition used to find rows.
PostgreSQL uses the index, but it increases the execution time. The planner chose the best plan without index in the previous example.
Enable the enable_seqscan
setting and modify the query as follows:
SET enable_seqscan TO on;
EXPLAIN (ANALYZE) SELECT * FROM test_table WHERE field1 < 750;
The result:
QUERY PLAN --------------------------------------------------------------------- Index Scan using test_table_field1_idx on test_table (cost=0.42..31.02 rows=720 width=37) (actual time=0.031..0.148 rows=749 loops=1) Index Cond: (field1 < 750) Planning Time: 0.074 ms Execution Time: 0.182 ms
The planner uses the test_table_field1_idx
index to get the best result.
Add an index on field2
. Since our database stores text fields in UTF8, create the index with the text_pattern_ops operator class.
CREATE INDEX ON test_table(field2 text_pattern_ops);
Run the following query with the condition for the text field:
EXPLAIN (ANALYZE) SELECT * FROM test_table WHERE field2 LIKE 'a%';
The result:
QUERY PLAN --------------------------------------------------------------------- Bitmap Heap Scan on test_table (cost=2711.24..11912.67 rows=70714 width=37) (actual time=7.672..25.626 rows=62382 loops=1) Filter: (field2 ~~ 'a%'::text) Heap Blocks: exact=8332 -> Bitmap Index Scan on test_table_field2_idx1 (cost=0.00..2693.57 rows=69314 width=0) (actual time=6.343..6.344 rows=62382 loops=1) Index Cond: ((field2 ~>=~ 'a'::text) AND (field2 ~<~ 'b'::text)) Planning Time: 0.097 ms Execution Time: 28.520 ms
In the EXPLAIN
command output, the ->
sign and indentation reflect the hierarchical plan tree structure.
Bitmap Index Scan
collects a bitmap with the positions of records that match the condition. After that, PostgreSQL sequentially scans the target table for these positions within Bitmap Heap Scan
.
Bitmap Heap Scan
works effectively with a large number of records that correspond to a condition.
PostgreSQL also supports Index Only Scan
operation that is used if a query only references indexed columns. Execute a query that returns a single indexed field:
EXPLAIN SELECT field1 FROM test_table WHERE field1 < 750;
The result:
QUERY PLAN --------------------------------------------------------------------- Index Only Scan using test_table_field1_idx on test_table (cost=0.42..29.69 rows=758 width=4) Index Cond: (field1 < 750)
Index Only Scan
is faster than Index Scan
, because Index Only Scan
does not need to read the entire row (width=4
), except checking the visibility of an index row.
Indexes improve performance of different operations, including sorting. Consider the ORDER BY
clause.
Drop the index for field1
:
DROP INDEX test_table_field1_idx;
Sort the table by field1
:
EXPLAIN (ANALYZE) SELECT * FROM test_table ORDER BY field1;
The result:
QUERY PLAN --------------------------------------------------------------------- Gather Merge (cost=63795.04..161033.70 rows=833416 width=37) (actual time=94.311..273.651 rows=1000000 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=62795.02..63836.79 rows=416708 width=37) (actual time=91103..127.215 rows=333367 loops=3) Sort Key: field1 Sort Method: external merge Disk: 13568kB Worker 0: Sort Method: external merge Disk: 12040kB Worker 1: Sort Method: external merge Disk: 20504kB -> Parallel Seq Scan on test_table (cost=0.00..12502.08 rows=416708 width=37) (actual time=0.008..29.574 rows=333367 loops=3) Planning Time: 0.099 ms Execution Time: 316.971 ms
When the planner determines that a parallel query is the fastest execution strategy for a particular query, it creates a plan that includes a Gather
or Gather Merge
node (in our case, the Gather Merge
node). This node includes the execution plan subtree, which is processed in parallel.
The first operation is Parallel Seq Scan
on test_table
. The table blocks are divided among the cooperating processes (Worker 0
and Worker 1
). Worker 0
and Worker 1
scan different consecutive blocks of the table in parallel.
Then, PostgreSQL sorts the result. Sort Key
contains the sort condition (field1
). Sort Method
is external merge Disk
: the system creates temporary files on disk.
The query execution plan above contains several operations and looks complex.
Add an index to the field1
and execute the same query:
CREATE INDEX ON test_table(field1);
EXPLAIN (ANALYZE) SELECT * FROM test_table ORDER BY field1;
The result:
QUERY PLAN --------------------------------------------------------------------- Index Scan using test_table_field1_idx on test_table (cost=0.42..34319.93 rows=1000100 width=37) (actual time=0.018..106.251 rows=1000100 loops=1) Planning Time: 0.140 ms Execution Time: 148.662 ms
With an index, the plan contains only the Index Scan
operation,
and as a consequence — the execution time is reduced.