Tables

Table is a main data object in ADB. ADB tables are similar to tables in any relational database, except distribution of table rows across the different cluster segments. When creating a table, you specify the table distribution policy.

Also, ADB provides several table storage types and compression options that can be used to store data in the most efficient way.

Create a new table

Syntax

To create an initially empty table in the current database and define its structure, use the CREATE TABLE command. The user who issues the command owns the table. The basic syntax of this command is described below.

NOTE
For more details, see CREATE TABLE in the Greenplum documentation.
CREATE [ { TEMPORARY | TEMP } | UNLOGGED] TABLE [IF NOT EXISTS]
  <table_name> (
  [ { <column_name> <data_type> [ COLLATE <collation> ] [<column_constraint> [ ... ] ] [ ENCODING ( <storage_column_directive> [, ...] ) ]
    | <table_constraint>
    | LIKE <source_table> [ <like_option> ... ] }
    [, ... ]
  ])
[ INHERITS ( <parent_table> [, ... ] ) ]
[ WITH ( <storage_table_directive> [, ... ] ) ]
[ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
[ TABLESPACE <tablespace_name> ]
[ DISTRIBUTED BY (<column_name> [<opclass>], [ ... ] )
       | DISTRIBUTED RANDOMLY | DISTRIBUTED REPLICATED ]
[ PARTITION BY <partition_type> (<column_name>) <partition_specification>]
Basic clauses

TEMPORARY | TEMP

Specifies a temporary table, which is automatically dropped at the end of a session, or optionally at the end of the current transaction (see ON COMMIT below)

UNLOGGED

Specifies an unlogged table, which data is not written to the write-ahead log (WAL). This makes such tables faster than ordinary tables, yet not crash-safe. The contents of unlogged tables are not replicated to mirror segment instances

COLLATE

Defines the column collation. If not specified, the column data type’s default collation is used

ENCODING

Defines the column storage options: type of compression, compression level, and block size. This clause is valid only for append-optimized column-oriented tables

LIKE

Specifies a table from which the new table automatically copies all column names, their data types, NOT NULL-constraints. Other column definition parts can be included or excluded via the INCLUDING/EXCLUDING statements (see the <like_option> argument below). Unlike INHERITS, the new table and original table are completely decoupled after creation is complete. It means that changing the original table structure (via ALTER TABLE) has no impact on the new table. Pay attention that storage properties specified in the WITH clause and partition structure are not copied

INHERITS

Specifies a list of tables from which the new table automatically inherits all columns and their constraints. Compare to the LIKE clause, use of INHERITS creates a persistent relationship between the new child table and its parent table. All modifications applied to the parent table structure normally propagate to the child one as well

WITH

Defines storage parameters for the table

ON COMMIT

Specifies the behavior of temporary tables at the end of a transaction block. Possible values:

  • PRESERVE ROWS — no special action is taken. This is the default behavior.

  • DELETE ROWS — all rows in the temporary table should be deleted at the end of the current transaction. Essentially, an automatic TRUNCATE is done.

  • DROP — the temporary table should be dropped at the end of the current transaction.

TABLESPACE

Defines the name of the tablespace in which the new table is to be created. If not specified, the database’s default tablespace is used, or the server configuration parameter temp_tablespaces (in case of a temporary table)

DISTRIBUTED BY

Sets the table distribution policy. Possible values:

  • DISTRIBUTED BY (<column_name> [<opclass>], [ …​ ] ) — hash distribution.

  • DISTRIBUTED RANDOMLY — random distribution that is based on the round-robin algorithm.

  • DISTRIBUTED REPLICATED — full distribution (the table copy is stored on every cluster segment).

For more details, see Distribution

PARTITION BY

Specifies a partitioned table. For more details, see Partitioning

Arguments

<table_name>

A table name

<column_name>

A column name

<data_type>

A column data type. See Recommendations on choosing column data types below

<collation>

A column collation

<column_constraint>

A constraint defined at the column level. The name <constraint_name> is given optionally. Possible constraints:

  • NOT NULL — specifies that the column is not allowed to contain null values.

  • NULL — specifies that the column is allowed to contain null values. Is is the default behavior (unless NOT NULL is specified).

  • CHECK — specifies the boolean expression <expression> which new or updated table rows should satisfy for an INSERT or UPDATE operation to succeed. A constraint marked as NO INHERIT will not propagate to child tables (created via INHERITS).

  • DEFAULT — sets the default column value equal to <default_expr>.

  • UNIQUE — specifies that the column should contain only unique values. Null values are not considered equal. Only for heap tables.

  • PRIMARY KEY — specifies that the column should contain unique (non-duplicate) non-null values. Only one primary key can be specified for a table (it can include several columns). Only for heap tables.

For heap tables, referential integrity constraints (FOREIGN KEY, REFERENCES) are also allowed, but not enforced.

There are two groups of constraints:

  • DEFERRABLE | NOT DEFERRABLE. A constraint that is not deferrable will be checked immediately after every command. Checking of deferrable constraints can be postponed until the end of the transaction (using the SET CONSTRAINTS command). NOT DEFERRABLE is the default. Currently, only UNIQUE and PRIMARY KEY constraints can be deferrable. NOT NULL and CHECK constraints are not deferrable.

  • INITIALLY DEFERRED | INITIALLY IMMEDIATE. If a constraint is deferrable, this clause specifies the default time to check the constraint. If the constraint is INITIALLY IMMEDIATE, it is checked after each statement. This is the default behavior. If the constraint is INITIALLY DEFERRED, it is checked only at the end of the transaction. The constraint check time can be altered with the SET CONSTRAINTS command.

For indexes associated with a UNIQUE or PRIMARY KEY constraint you can define a tablespace and storage options as well as for ordinary tables (see <index_parameters> below).

 

Syntax:

[ CONSTRAINT <constraint_name>]
{ NOT NULL
  | NULL
  | CHECK  ( <expression> ) [ NO INHERIT ]
  | DEFAULT <default_expr>
  | UNIQUE <index_parameters>
  | PRIMARY KEY <index_parameters>
[ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]

Where <index_parameters> is:

[ WITH ( <storage_table_directive> [, ... ] ) ]
[ USING INDEX TABLESPACE <tablespace_name> ]

<storage_column_directive>

Specifies the column storage settings for append-optimized column-oriented tables:

  • compresstype — compression type.

  • compresslevel — compression level.

  • blocksize — block size (in bytes).

Syntax:

compresstype={ZLIB|ZSTD|RLE_TYPE|NONE}
[compresslevel={0-9}]
[blocksize={8192-2097152} ]

For more details on compression options, see Data compression

<table_constraint>

A constraint defined at the table level. The name <constraint_name> is given optionally. Unlike <column_constraint>, does not allow to set constraints NULL, NOT NULL, and DEFAULT. The rest of the syntax is the same (see <column_constraint> above).

Syntax:

[ CONSTRAINT <constraint_name> ]
{ CHECK ( <expression> ) [ NO INHERIT ]
  | UNIQUE ( <column_name> [, ... ] ) <index_parameters>
  | PRIMARY KEY ( <column_name> [, ... ] ) <index_parameters>
[ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]

Where <index_parameters> is:

[ WITH ( <storage_table_directive> [, ... ] ) ]
[ USING INDEX TABLESPACE <tablespace_name> ]

You can use this form of constraints to define primary keys based on several columns

<source_table>

An original table, which structure is used when a new table is being created via the LIKE statement

<like_option>

Defines whether to include or exclude the following parts of column definitions (when copying the original table via the LIKE statement):

  • DEFAULTS — default expressions. By default, they are excluded, which results in the copied columns in the new table not having defaults.

  • CONSTRAINTS — column constraints CHECK. NOT NULL-constraints are always copied to the new table.

  • INDEXES — indexes, PRIMARY KEY and UNIQUE constraints.

  • STORAGE — column storage settings. By default, they are excluded, which results in the copied columns in the new table having type-specific default settings.

  • COMMENTS — comments for the copied columns, constraints, and indexes.

  • ALL — all parts mentioned above.

Syntax:

{INCLUDING|EXCLUDING} {DEFAULTS|CONSTRAINTS|INDEXES|STORAGE|COMMENTS|ALL}

<parent_table>

A parent table, which structure is used when a new table is being created via the INHERITS statement

<storage_table_directive>

Specifies the table storage settings:

  • appendoptimized — whether or not the table should be created as append-optimized. If FALSE or not declared, the table will be created as a heap table.

  • blocksize — block size (in bytes). Only for append-optimized tables.

  • orientation — data orientation. If COLUMN or not declared, the table will be created as a column-oriented table, and as a row-oriented — otherwise. Only for append-optimized tables.

  • checksum — if TRUE or not declared, CRC checksum validation is enabled. The checksum is calculated during block creation and is stored on disk. Checksum validation is performed during block reads. If the checksum calculated during the read does not match the stored checksum, the transaction is cancelled. Only for append-optimized tables.

  • compresstype — compression type. Only for append-optimized tables.

  • compresslevel — compression level. Only for append-optimized tables.

  • fillfactor — a percentage of filling of a standard eight-byte block (between 10 and 100). 100 (complete packing) is the default value. When a smaller fillfactor is specified, INSERT operations pack table pages only to the indicated percentage; the remaining space on each page is reserved for updating rows on that page. For a table which entries are never updated, complete packing is the best choice, but in heavily updated tables smaller fillfactors are appropriate.

  • oids — if FALSE or not declared, object identifiers (OID) are not assigned to the table rows. It is not recommended to use OIDs for user-created tables, as using OIDs for large table can cause wrap-around of the 32-bit OID counter.

Syntax:

appendoptimized={TRUE|FALSE}
blocksize={8192-2097152}
orientation={COLUMN|ROW}
checksum={TRUE|FALSE}
compresstype={ZLIB|ZSTD|RLE_TYPE|NONE}
compresslevel={0-9}
fillfactor={10-100}
[oids=FALSE]

For more details on compression options, see Data compression.

The default values for some storage options listed above can be changed via the server configuration parameter gp_default_storage_options

<tablespace_name>

A name of the tablespace, which tha table should be stored in

<opclass>

An operator class name. Can be specified optionally to use a non-default hash function for distribution

<partition_type>

A partitioning type. Possible values:

  • LIST — list of values.

  • RANGE — numeric or date range.

<partition_description>

A partitioning specification including: partitions, subpartitions, default partition. For more details, see Partitioning

Recommendations on choosing column data types

There are some recommendations on how to choose data types for columns in the most efficient way:

  • Choose data types according to the stored data. For example, use numeric data types for numbers, character data types for strings, date or timestamp data types for dates, and so on.

  • The data types you select should use the least possible space but can still accommodate your data. For example, use INT instead of BIGINT, TEXT instead of CHAR(<n>), etc.

  • When using the smallest numeric data types, try to forecast their possible expansion in the future. Pay attention that changing from a smaller datatype to a larger one after loading large amounts of data can be costly. Therefore, if your current data values fit, for example, in the SMALLINT type properly but it is likely that the values will expand later, INT is the better long-term choice.

  • If possible, use specialized data types: INET, CIDR, JSON, JSONB, MACADDR, etc.

  • Use the same data types in the columns for which JOIN operations will be performed. When data types are different, the database should convert one of them in order to compare data values correctly, which adds unnecessary overhead.

  • Though Greenplum has a rich set of native data types, you can create your own ones. Usage of ENUM can give a huge performance boost.

Examples

Create a heap table with REPLICATED distribution type and column constraints
CREATE TABLE book_type(id SERIAL PRIMARY KEY, name TEXT UNIQUE NOT NULL)
DISTRIBUTED REPLICATED;
Create a heap table with REPLICATED distribution type and table constraints
CREATE TABLE book_cover(id SERIAL, name TEXT NOT NULL, PRIMARY KEY(id), UNIQUE(name))
DISTRIBUTED REPLICATED;
Create an append-optimized column-oriented table (distributed by the column value)
CREATE TABLE author(id SERIAL, name TEXT NOT NULL)
WITH (appendoptimized=true, orientation=column)
DISTRIBUTED BY(id);
Create an append-optimized row-oriented table with ZLIB compression (at the table level)
CREATE TABLE book(id SERIAL, title TEXT, author_id INT NOT NULL, public_year SMALLINT NULL, type_id INT NOT NULL, cover_id INT NOT NULL)
WITH (appendoptimized=true, orientation=row, compresstype=ZLIB, compresslevel=5)
DISTRIBUTED BY(id);
Create an append-optimized column-oriented table with ZSTD compression (at the column level)
CREATE TABLE book_free_fragment(id SERIAL, book_id INT, shop_id INT, fragment TEXT ENCODING(compresstype=ZSTD, compresslevel=5))
WITH (appendoptimized=true, orientation=column)
DISTRIBUTED BY(id);
Create a table via the LIKE statement

 

  1. Create the book_type table if it does not exist yet:

    CREATE TABLE IF NOT EXISTS book_type(id SERIAL PRIMARY KEY, name TEXT UNIQUE NOT NULL)
    DISTRIBUTED REPLICATED;

    The possible result is:

    NOTICE:  relation "book_type" already exists, skipping
    CREATE TABLE
  2. Check the structure of the created table via the psql command \d+:

    \d+ book_type

    The result is:

                                                  Table "public.book_type"
     Column |  Type   |                       Modifiers                        | Storage  | Stats target | Description
    --------+---------+--------------------------------------------------------+----------+--------------+-------------
     id     | integer | not null default nextval('book_type_id_seq'::regclass) | plain    |              |
     name   | text    | not null                                               | extended |              |
    Indexes:
        "book_type_pkey" PRIMARY KEY, btree (id)
        "book_type_name_key" UNIQUE CONSTRAINT, btree (name)
    Distributed Replicated
  3. Add another table book_type_arch with the same structure and constraints via the LIKE statement:

    CREATE TABLE book_type_arch(LIKE book_type INCLUDING ALL)
    DISTRIBUTED REPLICATED;

    The result is:

    CREATE TABLE
  4. Check the structure of the created table via the psql command \d+:

    \d+ book_type_arch

    You can see that it is similar to the book_type table:

                                                Table "public.book_type_arch"
     Column |  Type   |                       Modifiers                        | Storage  | Stats target | Description
    --------+---------+--------------------------------------------------------+----------+--------------+-------------
     id     | integer | not null default nextval('book_type_id_seq'::regclass) | plain    |              |
     name   | text    | not null                                               | extended |              |
    Indexes:
        "book_type_arch_pkey" PRIMARY KEY, btree (id)
        "book_type_arch_name_key" UNIQUE CONSTRAINT, btree (name)
    Distributed Replicated
    CAUTION
    Pay attention that both tables now use the same sequence book_type_id_seq for the first column id. It can lead to unexpected consequences.
  5. In order to check that two tables are not related anymore, alter the first table book_type:

    ALTER TABLE book_type
    DROP COLUMN name;

    The result is:

    ALTER TABLE
  6. Check the structure of the book_type_arch table again. It has not been changed.

    adb=# \d+ book_type_arch
                                               Table "public.book_type_arch"
     Column |  Type   |                       Modifiers                        | Storage  | Stats target | Description
    --------+---------+--------------------------------------------------------+----------+--------------+-------------
     id     | integer | not null default nextval('book_type_id_seq'::regclass) | plain    |              |
     name   | text    | not null                                               | extended |              |
    Indexes:
        "book_type_arch_pkey" PRIMARY KEY, btree (id)
        "book_type_arch_name_key" UNIQUE CONSTRAINT, btree (name)
    Distributed Replicated
Create a table via the INHERITS statement

 

  1. Create the book table if it does not exist yet:

    CREATE TABLE IF NOT EXISTS book(id SERIAL, title TEXT, author_id INT NOT NULL, public_year SMALLINT NULL, type_id INT NOT NULL, cover_id INT NOT NULL)
    WITH (appendoptimized=true, orientation=row, compresstype=ZLIB, compresslevel=5)
    DISTRIBUTED BY(id);

    The possible result is:

    NOTICE:  relation "book" already exists, skipping
    CREATE TABLE
  2. Check the structure of the created table via the psql command \d+:

    \d+ book

    The result is:

                                              Append-Only Table "public.book"
       Column    |   Type   |                     Modifiers                     | Storage  | Stats target | Description
    -------------+----------+---------------------------------------------------+----------+--------------+-------------
     id          | integer  | not null default nextval('book_id_seq'::regclass) | plain    |              |
     title       | text     |                                                   | extended |              |
     author_id   | integer  | not null                                          | plain    |              |
     public_year | smallint |                                                   | plain    |              |
     type_id     | integer  | not null                                          | plain    |              |
     cover_id    | integer  | not null                                          | plain    |              |
    Compression Type: zlib
    Compression Level: 5
    Block Size: 32768
    Checksum: t
    Distributed by: (id)
    Options: appendonly=true, orientation=row, compresstype=zlib, compresslevel=5
  3. Add another table book_arch with the same structure and constraints via the INHERITS statement:

    CREATE TABLE book_arch()
    INHERITS(book)
    WITH (appendoptimized=true, orientation=row, compresstype=ZLIB, compresslevel=5)
    DISTRIBUTED BY(id);

    The result is:

    CREATE TABLE
  4. Check the structure of the created table via the psql command \d+:

    \d+ book_arch

    You can see that it is similar to the book table:

                                            Append-Only Table "public.book_arch"
       Column    |   Type   |                     Modifiers                     | Storage  | Stats target | Description
    -------------+----------+---------------------------------------------------+----------+--------------+-------------
     id          | integer  | not null default nextval('book_id_seq'::regclass) | plain    |              |
     title       | text     |                                                   | extended |              |
     author_id   | integer  | not null                                          | plain    |              |
     public_year | smallint |                                                   | plain    |              |
     type_id     | integer  | not null                                          | plain    |              |
     cover_id    | integer  | not null                                          | plain    |              |
    Compression Type: zlib
    Compression Level: 5
    Block Size: 32768
    Checksum: t
    Inherits: book
    Distributed by: (id)
    Options: appendonly=true, orientation=row, compresstype=zlib, compresslevel=5
  5. In order to check that two tables stay related after creation is complete, alter the first table book:

    ALTER TABLE book
    DROP COLUMN type_id;

    The result is:

    ALTER TABLE
  6. Check the structure of the book_arch table again. The type_id column does not exist anymore, as well as in the parent table:

    adb=# \d+ book_arch
                                            Append-Only Table "public.book_arch"
       Column    |   Type   |                     Modifiers                     | Storage  | Stats target | Description
    -------------+----------+---------------------------------------------------+----------+--------------+-------------
     id          | integer  | not null default nextval('book_id_seq'::regclass) | plain    |              |
     title       | text     |                                                   | extended |              |
     author_id   | integer  | not null                                          | plain    |              |
     public_year | smallint |                                                   | plain    |              |
     cover_id    | integer  | not null                                          | plain    |              |
    Compression Type: zlib
    Compression Level: 5
    Block Size: 32768
    Checksum: t
    Inherits: book
    Distributed by: (id)
    Options: appendonly=true, orientation=row, compresstype=zlib, compresslevel=5
Create a temporary table

 

  1. Start a new transaction:

    START TRANSACTION;

    The result is:

    START TRANSACTION
  2. Create a tempora