TOAST
Overview
PostgreSQL uses a fixed page size (the default value is 8 KB) and does not allow tuples to span multiple pages. Therefore, it is not possible to store large field values directly. PostgreSQL compresses large field values and/or splits them into multiple physical rows. The rows are saved to special storage. This method is known as TOAST (The Oversized-Attribute Storage Technique). PostgreSQL also uses the TOAST infrastructure to improve handling of large data values in memory.
Only certain data types support TOAST (for example, text, json, jsonb, and bytea). To support TOAST, a data type must have a variable-length (varlena) representation, in which, the first four-byte word contains the total value length in bytes (including itself). TOAST does not constrain the rest of the data type’s representation. TOAST values work by manipulating and interpreting the first word. Therefore, the C-level functions, which support TOASTed data types, must carefully handle potentially TOASTed input values. An input might not consist of a four-byte length word and contents until it is converted from the TOAST representation. To do this, invoke the PG_DETOAST_DATUM
macros before processing an input value. In some cases more efficient techniques are possible, see TOAST Considerations for detail.
TOAST uses two bits of the varlena length word for additional settings. It limits the logical size of any TOAST value to 1 GB. When both bits are zero, the value is an ordinary non-TOASTed value, and the remaining bits of the length word give the total datum size in bytes. When the highest-order or lowest-order bit (depending on the architecture) is set, the value has only a single-byte header instead of the four-byte header, and the remaining bits of that byte give the total datum size in bytes. This approach supports space-efficient storage of values shorter than 127 bytes, while still allowing the value of this data type to be extended to 1 GB if needed. As a special case, if the remaining bits of a single-byte header are all zero, the value is a TOAST pointer to out-of-line data, with several possible alternatives described below. The code stored in the second byte determines the type and size of the TOAST pointer. If the highest-order or lowest-order bit is clear but the adjacent bit is set, the content of the datum is compressed and must be decompressed before use. In this case, the remaining bits of the four-byte length word give the total size of the compressed datum, not the original data. The compression is also possible for out-of-line data but the varlena header does not include information about that. The content of the TOAST pointer contains this information instead.
You can select the compression technique used for either in-line or out-of-line compressed data for each column. To do this, set the COMPRESSION
column option in the CREATE TABLE
or ALTER TABLE
command. The default_toast_compression parameter determines the default COMPRESSION
value if the COMPRESSION
option is not set.
PostgreSQL provides multiple types of TOAST pointers:
-
The oldest and most common type is a pointer to out-of-line on-disk data stored in a separate TOAST table, associated with the table containing the TOAST pointer. PostgreSQL creates this on-disk pointer when a tuple is too large to be stored as is.
-
A TOAST pointer can refer to out-of-line data in memory. This data is necessarily short-lived, and will never appear on the disk. This technique is very useful to avoid copying and redundant processing of large data values.
Out-of-line, on-disk TOAST storage
If any columns of a table are stored in the TOAST format, the table has an associated TOAST table. The TOAST table OID
is saved in the pg_class.reltoastrelid
value of the main table. All TOAST values are kept in this TOAST table.
PostgreSQL divides out-of-line values (after compression if used) into chunks of at most TOAST_MAX_CHUNK_SIZE
bytes. The default value is about 2000 bytes, so that four chunk rows fit on a page. PostgreSQL stores each chunk as a separate row in the TOAST table. Every TOAST table has the following columns:
-
chunk_id
— an OID identifying the particular TOASTed value; -
chunk_seq
— a sequence number for the chunk within its value; -
chunk_data
— the actual data of the chunk.
A unique index on chunk_id
and chunk_seq
provides fast retrieval of the values. A TOAST pointer contains the OID of the related TOAST table and the OID of the specific value (its chunk_id
). The pointer also stores the logical size of the data (original uncompressed data length), physical stored size (different if compression is applied), and the compression method, if specified. The total size of an on-disk TOAST pointer is 18 bytes regardless of the actual size of the represented value.
The TOAST management code is triggered only when a row value is wider than TOAST_TUPLE_THRESHOLD
bytes (normally 2 KB). The TOAST code compress and/or move field values out-of-line until the row value is shorter than TOAST_TUPLE_TARGET
bytes (also normally 2 KB, adjustable), or it will be impossible to reduce the size. If out-of-line values are not changed during an UPDATE
operation, there are no costs associated with TOAST.
The TOAST management code recognizes four different strategies for storing TOAST-able columns on the disk:
-
PLAIN
prevents compression or out-of-line storage. It also disables the use of single-byte headers for varlena types. This is the only possible strategy for columns of data types that do not support TOAST. -
EXTENDED
allows both compression and out-of-line storage. This is the default value for most data types supported TOAST. PostgreSQL uses compression first, then out-of-line storage if the row is still too big. -
EXTERNAL
allows out-of-line storage but not compression.EXTERNAL
makes substring operations on text and bytea columns faster because these operations are optimized to fetch only the required parts of the out-of-line value when it is not compressed. -
MAIN
allows compression but not out-of-line storage. Actually, out-of-line storage is possible for such columns, but only when there is no other way to make the row small enough to fit on a page.
Each TOAST-able data type specifies a default strategy for columns of that data type, but you can change the strategy with the ALTER TABLE … SET STORAGE command.
You can also use the ALTER TABLE … SET (toast_tuple_target = N)
to adjust TOAST_TUPLE_TARGET
for each table.
This scheme has a number of advantages compared to a more straightforward approach such as allowing row values to span pages. Assuming that queries are usually qualified by comparisons against relatively small key values, most of the work will be done using the main row entry. The big values of TOASTed attributes is only pulled out (if selected) at the time the result is sent to the client. The main table is much smaller and more of its rows fit in the shared buffer cache than would be the case without any out-of-line storage. Sort sets shrink also, and PostgreSQL processes sorting in memory more often.
Out-of-Line, in-memory TOAST storage
TOAST pointers can also point to data located in the memory of the current server process. There are currently two types of pointers:
-
Indirect TOAST pointers refers to a non-indirect varlena value stored in memory. It is currently used during logical decoding to avoid creating physical tuples larger than 1 GB. Note, that the referenced data is available as long as the pointer exists, and there is no infrastructure to save this data.
-
Expanded TOAST pointers are useful for complex data types whose on-disk representation is not suited for computational purposes. As an example, the standard varlena representation of a PostgreSQL array includes dimensionality information, a NULLs bitmap if there are any NULL elements, then the values of all the elements in order. When the element type has a variable length, the only way to find an element is to scan through all the preceding elements. This representation is appropriate for on-disk storage. For computations, it is better to have an "expanded" or "deconstructed" representation in which all the element starting locations is identified. The TOAST pointer mechanism supports this by allowing a pass-by-reference Datum to point to a standard varlena value (the on-disk representation) or a TOAST pointer that points to an expanded representation in memory. The details of this expanded representation depend on the data type, though it must have a standard header and meet the other API requirements given in Declarations for access to "expanded" value representations.
TOAST pointers to expanded values can be read-write or read-only pointers. A function that receives a read-write pointer is allowed to modify the referenced value. A function, which gets a read-only pointer, must first create a copy to modify the value. This distinction and some associated conventions help to avoid unnecessary copying of extended values during query execution.
For all types of in-memory TOAST pointer, the TOAST management code ensures that no such pointer datum can accidentally be stored on disk. In-memory TOAST pointers are expanded to varlena values before storage — and can then be converted to TOAST pointers on disk if the containing tuple would otherwise be too big.