Partitioning
Overview
PostgreSQL supports basic table partitioning. Partitioning allows you to split one large logical table into multiple smaller physical partitions. Partitioning is only useful for very large tables. Which tables will benefit from partitioning depends on the application, although it should generally be utilized for tables that are larger than the server RAM.
Partitioning can provide the following benefits:
-
Improve dramatically query performance in certain situations. For example, when the most part of the frequently requested rows is in one or only a few partitions. Partitioning effectively substitutes the top levels of the index tree, making it more likely that the most requested parts of the indexes can be in memory.
-
To improve the performance for queries that access a large percentage of rows from a single partition, PostgreSQL can use a sequential scan of this partition instead of an index, which requires random reading of data scattered across the whole table.
-
You can accomplish bulk loads and deletes by adding or removing partitions. The
DROP TABLE
command, which can be used to drop an individual partition, orALTER TABLE DETACH PARTITION
, are faster than bulk operations with rows. -
Rarely used data can be migrated to cheaper and slower storage media.
Declarative partitioning
PostgreSQL allows you to specify conditions according to which a table is divided into partitions. The divided table is a partitioned table. The partitioning declaration includes a partitioning method and a list of columns or expressions to be used as a partition key. The partitioning methods are described in the table below.
Method | Description |
---|---|
Range partitioning |
The table is partitioned into ranges defined by a key column or set of columns. For example, partitioning by date ranges, or by ranges of identifiers for particular business objects. The limits of each range are considered to include the lower value and exclude the upper one. For example, if the first partition range is from 1 to 10, and the second partition range is from 10 to 20, the value 10 belongs to the second partition, not the first |
List partitioning |
The table is partitioned by a list that specifies which key values should be in each partition |
Hash partitioning |
The table is partitioned by specifying a modulus and a remainder for partitions. Each partition holds the rows for which the hash value of the partition key divided by the specified modulus produces the specified remainder |
If your application needs to use other partitioning forms not listed above, you can use alternative methods, for example, use inheritance to implement partitioning. This method offers flexibility but does not have the performance benefits of built-in declarative partitioning.
Use the CREATE TABLE command to define a partitioned table and its partitions. The base syntax of the partitioned table creation is:
CREATE <table_name> PARTITION BY { RANGE | LIST | HASH } ( { <column_name> | ( <expression> ) } )
The base syntax of partitions:
CREATE TABLE <partition_name> PARTITION OF <table_name>
{ FOR VALUES FROM (<value_min>) TO (<value_max>)
| FOR VALUES IN ([<value>])
| FOR VALUES WITH (MODULUS <modulus_value>, REMAINDER <remainder_value>) }
The partitioned table is a virtual table that does not have rows. Partitions are ordinary tables associated with the partitioned one. Each partition stores a subset of rows defined by partition key values. PostgreSQL inserts rows into an appropriate partition based on the partition key. If you update a row partition key value and it no longer satisfies the partition values, the row is moved into a different partition.
You cannot convert a regular table to a partitioned table or vice versa. However, you can add an existing table as a partition to a partitioned table, or detach a partition and turn it into a separate table.
Range partitioning
Range partitioning is often used with timestamp
and date
columns.
The code below creates three partitions and distributes books according to their delivery_date
. Books delivered in February will be placed in the books_2022_02
partition, books delivered in March — in books_2022_03
, and in April — in books_2022_04
. It also creates a default partition for rows that do not suit other partitions:
CREATE TABLE books (
code char(5) NOT NULL,
title varchar(40) NOT NULL,
delivery_date date NOT NULL,
genre varchar(10)
) PARTITION BY RANGE (delivery_date);
CREATE TABLE books_2022_02 PARTITION OF books FOR VALUES FROM ('2022-02-01') TO ('2022-03-01');
CREATE TABLE books_2022_03 PARTITION OF books FOR VALUES FROM ('2022-03-01') TO ('2022-04-01');
CREATE TABLE books_2022_04 PARTITION OF books FOR VALUES FROM ('2022-04-01') TO ('2022-05-01');
CREATE TABLE books_default PARTITION OF books DEFAULT;
The following row is added to the books_2022_04
partition:
INSERT INTO books (code, title, delivery_date, genre) VALUES
('DC-34','Hyperion', date'2022-04-28', 'sci-fi');
Also, the special MINVALUE
and MAXVALUE
values can be used when creating a range partition to indicate that there is no lower or upper bound on the column value. For example:
CREATE TABLE table1_part1 PARTITION OF table1 FOR VALUES FROM (MINVALUE) TO (100);
CREATE TABLE table1_part2 PARTITION OF table1 FOR VALUES FROM (100) TO (1000);
CREATE TABLE table1_part3 PARTITION OF table1 FOR VALUES FROM (1000) TO (MAXVALUE);
List partitioning
The table below uses the genre
column as the partition key for list partitioning. You can specify multiple values for one partition. The code below creates a default partition, a partition for novels, a partition for sci-fi books, and one partition for detective and mystery books:
CREATE TABLE books (
code char(5) NOT NULL,
title varchar(40) NOT NULL,
delivery_date date NOT NULL,
genre varchar(10) NOT NULL
) PARTITION BY LIST (genre);
CREATE TABLE books_novel PARTITION OF books FOR VALUES IN ('novel');
CREATE TABLE books_scifi PARTITION OF books FOR VALUES IN ('sci-fi');
CREATE TABLE books_detective PARTITION OF books FOR VALUES IN ('detective', 'mystery');
CREATE TABLE books_default PARTITION OF books DEFAULT;
Hash partitioning
You can use hash partitioning if you need to equally distribute rows across partitions and a table does not have an appropriate partition key.
Example:
CREATE TABLE books (
code char(5) NOT NULL,
title varchar(40) NOT NULL,
delivery_date date NOT NULL,
genre varchar(10) NOT NULL
) PARTITION BY HASH (code);
CREATE TABLE books_part1 PARTITION OF books FOR VALUES WITH (MODULUS 5, REMAINDER 0);
CREATE TABLE books_part2 PARTITION OF books FOR VALUES WITH (MODULUS 5, REMAINDER 1);
CREATE TABLE books_part3 PARTITION OF books FOR VALUES WITH (MODULUS 5, REMAINDER 2);
CREATE TABLE books_part4 PARTITION OF books FOR VALUES WITH (MODULUS 5, REMAINDER 3);
CREATE TABLE books_part5 PARTITION OF books FOR VALUES WITH (MODULUS 5, REMAINDER 4);
Partition maintenance
You can create an index on the key columns on the partitioned table or any other indexes you may need. It is not necessary to create a key index, but in most cases it is useful. This operation automatically creates a corresponding index on each partition, and any partitions that you create or append later will also contain this index. Indexes or unique constraints created on a partitioned table are virtual. In fact, the data resides in the child indexes of the partition tables.
The following example implies that the delivery_date
column is used as the partitioning key and creates an index on it:
CREATE INDEX ON books (delivery_date);
Although all partitions must have the same columns as the partitioned parent table, each partition can have its own indexes, constraints, and default values.
NOTE
For query optimization, the enable_partition_pruning option must be set to on , which is the default value in ADPG. See Partition pruning for details.
|
Sub-partitioning
Partition tables can be partitioned to implement sub-partitioning. For example, we can add sub-partitions to the list partitioning example:
CREATE TABLE books_novel PARTITION OF books FOR VALUES IN ('novel');
CREATE TABLE books_detective PARTITION OF books FOR VALUES IN ('detective', 'mystery');
CREATE TABLE books_scifi PARTITION OF books FOR VALUES IN ('sci-fi') PARTITION BY RANGE (delivery_date);
CREATE TABLE books_scifi_2022_01 PARTITION OF books_scifi FOR VALUES FROM ('2022-01-01') TO ('2022-02-01');
CREATE TABLE books_scifi_2022_02 PARTITION OF books_scifi FOR VALUES FROM ('2022-02-01') TO ('2022-03-01');
CREATE TABLE books_scifi_2022_03 PARTITION OF books_scifi FOR VALUES FROM ('2022-03-01') TO ('2022-04-01');
PostgreSQL inserts the following row to the books_scifi_2022_02
table:
INSERT INTO books (code, title, delivery_date, genre) VALUES
('AC-29','The time machine', date'2022-02-28', 'sci-fi');
Remove a partition
The set of partitions can be changed. It is a standard operation to remove partitions with obsolete data and add partitions for new data. One of the most important benefits of partitioning is that it allows you to perform resource-consuming operations instantly by changing the structure of the partitions, rather than physically moving large amounts of data.
You can use the DROP TABLE
command to drop a partition that is no longer needed:
DROP TABLE books_novel;
This operation can delete millions of records quickly but requires the ACCESS EXCLUSIVE
lock on the partitioned table.
Another option is to remove a partition from the partitioned table but retain access to it as an ordinary table. It allows further operations to be performed on the data before it is dropped. It has two forms:
ALTER TABLE books DETACH PARTITION books_novel;
ALTER TABLE books DETACH PARTITION books_novel CONCURRENTLY;
The first form of the command requires the ACCESS EXCLUSIVE
lock on the parent table. The command with the CONCURRENTLY
qualifier utilizes only the SHARE UPDATE EXCLUSIVE
lock. See DETACH PARTITION for details.
Add a partition
You can use the following code to add a new partition:
CREATE TABLE books_2022_05 PARTITION OF books FOR VALUES FROM ('2022-05-01') TO ('2022-06-01');
Alternatively, it is possible to create a new table outside the partition structure and use the ATTACH PARTITION clause to make it a partition later. This allows new data to be loaded, checked, and transformed before it appears in the partitioned table:
CREATE TABLE books_2022_06 (LIKE books INCLUDING DEFAULTS INCLUDING CONSTRAINTS);
ALTER TABLE books_2022_06 ADD CONSTRAINT month_2022_06
CHECK ( delivery_date >= DATE '2022-06-01' AND delivery_date < DATE '2022-07-01' );
ALTER TABLE books ATTACH PARTITION books_2022_06
FOR VALUES FROM ('2022-06-01') TO ('2022-07-01' );
The ATTACH PARTITION
clause requires the SHARE UPDATE EXCLUSIVE
lock on the partitioned table. It is recommended to create a CHECK
constraint on the attached table that matches the new partition constraint, as illustrated above. This way, the system skips the scan which is needed to validate the implicit partition constraint.
Limitations
The following limitations are applied to partitioned tables:
-
To create a unique or primary key constraint on a partitioned table, the partition keys must not include any expressions or function calls and the constraint columns must include all the partition key columns.
-
It is impossible to create an exclusion constraint for the whole partitioned table. You can only put such a constraint on each partition individually.
-
The
BEFORE ROW
triggers onINSERT
cannot change the partition into which a new row will be inserted. -
It is impossible to mix temporary and permanent relations in the same partitioned table.
-
A partition table cannot have any parents other than the partitioned table.
-
Partitions cannot include columns that do not exist in the partitioned table.
-
The
CHECK
andNOT NULL
constraints of a partitioned table are always inherited by all its partitions. You cannot drop these constraints on a partition column if the same constraint exists in the partitioned table.
Use inheritance to implement partitioning
While the built-in declarative partitioning is suitable for most cases, there are some circumstances where a more flexible approach may be useful. Partitioning can be implemented using table inheritance, which allows for several features not supported by declarative partitioning:
-
For declarative partitioning, partitions must have the same set of columns as the partitioned table, whereas with table inheritance, child tables can include extra columns not existing in the parent.
-
Table inheritance allows for multiple inheritance.
-
Declarative partitioning only supports range, list, and hash partitioning, while table inheritance allows the partitioning of data according to the user’s choice. Note, if constraint exclusion is unable to prune child partition tables effectively, query performance might be poor.
Example
This example creates a partitioning structure equivalent to the declarative range partitioning example above.
Create the parent table, from which all the child tables inherit structure:
CREATE TABLE books (
code char(5) NOT NULL,
title varchar(40) NOT NULL,
delivery_date date NOT NULL,
genre varchar(10)
);
Create several child tables that inherit from the parent and add non-overlapping constraints to define the allowed partitioning key values in each table:
CREATE TABLE books_2022_02 (
CHECK ( delivery_date >= DATE '2022-02-01' AND delivery_date < DATE '2022-03-01' )
) INHERITS (books);
CREATE TABLE books_2022_03 (
CHECK ( delivery_date >= DATE '2022-03-01' AND delivery_date < DATE '2022-04-01' )
) INHERITS (books);
CREATE TABLE books_2022_04 (
CHECK ( delivery_date >= DATE '2022-04-01' AND delivery_date < DATE '2022-05-01' )
) INHERITS (books);
For each child table, create an index on the key column and any other indexes if you need them:
CREATE INDEX books_2022_02_delivery_date ON books_2022_02 (delivery_date);
CREATE INDEX books_2022_03_delivery_date ON books_2022_03 (delivery_date);
CREATE INDEX books_2022_04_delivery_date ON books_2022_04 (delivery_date);
PostgreSQL should insert data into an appropriate child table. To do this, attach a suitable trigger function to the parent table:
CREATE OR REPLACE FUNCTION books_select_part()
RETURNS TRIGGER AS $$
BEGIN
IF NEW.delivery_date BETWEEN date'2022-02-01' AND date'2022-03-01' - 1 then
INSERT INTO books_2022_02 VALUES (NEW.*);
ELSIF NEW.delivery_date BETWEEN date'2022-03-01' and date'2022-04-01' - 1 then
INSERT INTO books_2022_03 VALUES (NEW.*);
ELSIF NEW.delivery_date BETWEEN date'2022-04-01' and date'2022-05-01' - 1 then
INSERT INTO books_2022_04 VALUES (NEW.*);
ELSE
RAISE EXCEPTION 'This date not in your partitions. Add a partition.';
END IF;
RETURN NULL;
END;
$$
LANGUAGE plpgsql;
Create a trigger that calls the function:
CREATE TRIGGER insert_books_trigger
BEFORE INSERT ON books
FOR EACH ROW EXECUTE FUNCTION books_select_part();
In practice, it is best to check the condition on the last child table first if rows are inserted into it most often. For simplicity, we arranged the trigger tests in the same order as in the other code snippets for this example.
NOTE
If you use this type of partitioning, the constraint_exclusion parameter should not be disabled. Otherwise, child tables may be scanned when not required. In ADPG, the default value is partition .
|
Maintenance for inheritance partitioning
Remove a partition
You can use the DROP TABLE
command to drop a child table that is no longer needed:
DROP TABLE books_2022_02;
To remove the child table from the inheritance hierarchy but retain access to it as an ordinary table, use the ALTER TABLE
command:
ALTER TABLE books_2022_02 NO INHERIT books;
Add a partition
To add a new child table to handle new data, create an empty child table:
CREATE TABLE books_2022_05 (
CHECK ( delivery_date >= DATE '2022-05-01' AND delivery_date < DATE '2022-06-01' )
) INHERITS (books);
Alternatively, it is possible to create and populate a new child table before adding it to the table hierarchy. This allows new data to be loaded, checked, and transformed before it appears in the parent table:
CREATE TABLE books_2022_06
(LIKE books INCLUDING DEFAULTS INCLUDING CONSTRAINTS);
ALTER TABLE books_2022_06 ADD CONSTRAINT month_2022_06
CHECK ( delivery_date >= DATE '2022-06-01' AND delivery_date < DATE '2022-07-01' );
ALTER TABLE books_2022_06 INHERIT books;
You also need to update a trigger function:
CREATE OR REPLACE FUNCTION books_select_part()
RETURNS TRIGGER AS $$
BEGIN
IF NEW.delivery_date BETWEEN date'2022-02-01' AND date'2022-03-01' - 1 then
INSERT INTO books_2022_02 VALUES (NEW.*);
ELSIF NEW.delivery_date BETWEEN date'2022-03-01' and date'2022-04-01' - 1 then
INSERT INTO books_2022_03 VALUES (NEW.*);
ELSIF NEW.delivery_date BETWEEN date'2022-04-01' and date'2022-05-01' - 1 then
INSERT INTO books_2022_04 VALUES (NEW.*);
ELSIF NEW.delivery_date BETWEEN date'2022-05-01' and date'2022-06-01' - 1 then
INSERT INTO books_2022_05 VALUES (NEW.*);
ELSIF NEW.delivery_date BETWEEN date'2022-06-01' and date'2022-07-01' - 1 then
INSERT INTO books_2022_06 VALUES (NEW.*);
ELSE
RAISE EXCEPTION 'This date not in your partitions. Add a partition.';
END IF;
RETURN NULL;
END;
$$
LANGUAGE plpgsql;
Limitations
The partitioning implemented with inheritance has the following limitations:
-
PostgreSQL cannot automatically check if all
CHECK
constraints are mutually exclusive. It is safer to write code that generates child tables and creates or modifies associated objects. -
Indexes and foreign key constraints apply to single tables and not to their inheritance children.
-
The example above assumes that the key column values cannot be changed to move a row to another partition. An attempt to do that fails because of the
CHECK
constraints. If you need to handle such cases, put corresponding update triggers on the child tables. -
If you utilize the
VACUUM
orANALYZE
commands, you need to run them on the parent and each child table individually. -
The
INSERT
statements with theON CONFLICT
clauses do not work as expected because theON CONFLICT
action executes only in case of unique violations on the specified target relation, not its child relations. -
You need to create triggers to route rows to child tables. Triggers are complicated to write, and they are slower than the routing performed internally by declarative partitioning.