Конференция Arenadata
Новое время — новый Greenplum
Мы приглашаем вас принять участие в конференции, посвященной будущему Open-Source Greenplum 19 сентября в 18:00:00 UTC +3. Встреча будет проходить в гибридном формате — и офлайн, и онлайн. Онлайн-трансляция будет доступна для всех желающих.
Внезапное закрытие Greenplum его владельцем — компанией Broadcom - стало неприятным сюрпризом для всех, кто использует или планирует начать использовать решения на базе этой технологии. Многие ожидают выхода стабильной версии Greenplum 7 и надеются на её дальнейшее активное развитие.
Arenadata не могла допустить, чтобы разрабатываемый годами Open-Source проект Greenplum прекратил своё существование, поэтому 19 сентября мы представим наш ответ на данное решение Broadcom, а участники сообщества получат исчерпывающие разъяснения на все вопросы о дальнейшей судьбе этой технологии.

На конференции вас ждёт обсуждение следующих тем:

  • План возрождения Greenplum;
  • Дорожная карта;
  • Экспертное обсуждение и консультации.
Осталось до события

Basic table syntax

Table is the main data object in ADPG. It consists of rows and columns. The number and order of columns is fixed, each column has a name and a data type that limits allowed values. The number of rows is variable — it reflects how much data is stored in the table. SQL does not guarantee the order of the rows in a table. When a table is read, the rows are returned in an unspecified order, unless sorting is explicitly requested.

PostgreSQL includes a significant number of built-in data types that meet the needs of most applications. Users can also define their own data types. Most built-in data types have obvious names and semantics. Some frequently used data types are integer for whole numbers, numeric for fractional numbers, text for character strings, date for dates, time for time-of-day values, and timestamp for values containing both date and time. For more information, see Data types.

Create a table

To create a table, use the CREATE TABLE command. In this command, you need to specify a name of a new table, the names of columns, and the data type of each column. An example of this command syntax is described below.

CREATE [ { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] <table_name> ( [
  { <column_name> <data_type> [ COMPRESSION <compression_method> ] [ COLLATE <collation> ] [ <column_constraint> [ ... ] ]
    | <table_constraint>
    | LIKE <source_table> [ <like_option> ... ] }
    [, ... ]
] )
[ INHERITS ( <parent_table> [, ... ] ) ]
[ PARTITION BY { RANGE | LIST | HASH } ( { <column_name> | ( <expression> ) } [ COLLATE <collation> ] [ <opclass> ] [, ... ] ) ]
[ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
[ TABLESPACE <tablespace_name> ]
Basic clauses and arguments

TEMPORARY or TEMP

If specified, the table is created as a temporary table. Temporary tables are automatically dropped at the end of a session, or optionally at the end of the current transaction (see ON COMMIT below)

UNLOGGED

If specified, the table is created as an unlogged table. Data written to unlogged tables is not written to the write-ahead log (WAL). It makes them faster than ordinary tables. However, they are not crash-safe. The contents of an unlogged table are also not replicated to standby servers. Any indexes created on an unlogged table are automatically unlogged

IF NOT EXISTS

PostgreSQL creates a new table if the table with the same name does not exist. When this option is specified, PostgreSQL does not throw an error if a relation with the specified name is in the database. There is no guarantee that the existing relation matches the one that would be created

<table_name>

The name of the new table. Can be schema-qualified (<schema_name>.<table_name>)

<column_name>

The name of a column to be created in the new table

<data_type>

The column data type. This can include array specifiers. For more information on supported data types, refer to Data types

COMPRESSION <compression_method>

Sets the compression method for the column. Compression is supported only for variable-width data types and is used only when the column’s storage mode is main or extended (see ALTER TABLE for information on column storage modes). The supported compression methods are pglz and lz4. Also, <compression_method> can have the default value to explicitly specify the default behavior, which is determined by the default_toast_compression setting

COLLATE <collation>

Defines the column collation. If this setting is not specified, PostgreSQL uses the column data type’s default collation

<column_constraint>

Column constraints are described in the Use constraints section

<table_constraint>

Table constraints are described in the Use constraints section

LIKE <source_table> [<like_option> …​ ]

Specifies a table (<source_table>) from which the new table automatically copies all column names, their data types, and their NOT NULL constraints. The new table and the original table become completely independent after creation. The optional <like_option> clause specifies which additional properties of the original table should be copied. The INCLUDING clause copies the property, and EXCLUDING omits the property. EXCLUDING is the default value. If multiple specifications are made for the same kind of object, the last one is used.

Syntax:

{INCLUDING|EXCLUDING}{COMMENTS|COMPRESSION|CONSTRAINTS|
DEFAULTS|GENERATED|IDENTITY|INDEXES|STATISTICS|STORAGE|ALL}

The available options with the INCLUDING clause are listed below:

  • INCLUDING COMMENTS — comments for columns, constraints, and indexes are copied. The default behavior is to exclude comments — columns and constraints do not have comments in the new table.

  • INCLUDING COMPRESSION — compression methods of the columns are copied. The default behavior is to exclude compression methods — columns have the default compression method in the new table.

  • INCLUDING CONSTRAINTS — CHECK constraints are copied. This option includes column and table constraints. NOT NULL constraints are always copied to the new table.

  • INCLUDING DEFAULTS — default expressions for column definitions are copied. Otherwise, columns in the new table have NULL defaults.

  • INCLUDING GENERATED — generation expressions of column definitions are copied.

  • INCLUDING IDENTITY — identity specifications of column definitions are copied. A new sequence is created for each identity column of the new table, separate from the sequences associated with the source table.

  • INCLUDING INDEXES — indexes, PRIMARY KEY, UNIQUE, and EXCLUDE constraints on the original table are created on the new table. Names for the new indexes and constraints are chosen according to the default rules.

  • INCLUDING STATISTICS — extended statistics are copied to the new table.

  • INCLUDING STORAGE — storage settings for column definitions are copied. The default behavior is to exclude storage settings — columns have type-specific default settings in the new table.

  • INCLUDING ALL — an abbreviated form selecting all the available individual options. It can be useful to write individual EXCLUDING clauses after INCLUDING ALL to copy all options except for some.

INHERITS ( <parent_table> [, …​ ] )

Specifies a list of tables from which the new table automatically inherits all columns. Parent tables can be ordinary tables or foreign tables. The use of INHERITS creates a persistent relationship between the new child table and parent tables. Parent schema modifications propagate to a child table, and the data of the child table is included in scans of parents.

If the same column name exists in more than one parent table, an error is reported unless the data types of the columns match each of the parent tables. If there is no conflict, then the duplicate columns are merged into a single column in the new table. If the column name list of the new table contains a column name that is also inherited, the data type must match the inherited column, and the column definitions are merged into one. If the new table explicitly specifies a default value for the column, this default overrides any defaults from inherited declarations of the column.

Compare to the LIKE clause, the use of INHERITS creates a persistent relationship between the new child table and its parent table. All modifications applied to the parent table structure normally propagate to the child one

PARTITION BY { RANGE | LIST | HASH }

Specifies the partitioning strategy for the table. A created table is called a partitioned table. The list of columns or expressions in parentheses forms the partition key. For the RANGE or HASH options, the key can include multiple columns or expressions. For LIST partitioning, the partition key must consist of a single column or expression. RANGE and LIST partitioning requires a B-tree operator class (<opclass>), while HASH partitioning requires a hash operator class. If the operator class is not set explicitly, PostgreSQL uses the default operator class of the appropriate type. A partitioned table is divided into sub-tables called partitions, which are created using separate CREATE TABLE commands. The partitioned table is itself empty. A data row inserted into the table is routed to a partition based on the value of columns or expressions in the partition key. If no existing partition matches the values in the new row, an error will be reported

ON COMMIT

Specifies the behavior of temporary tables at the end of a transaction block. Possible values:

  • PRESERVE ROWS — no special action is taken. This is the default behavior.

  • DELETE ROWS — all rows in the temporary table should be deleted at the end of the current transaction.

  • DROP — the temporary table should be dropped at the end of the current transaction.

TABLESPACE

Defines the name of the tablespace in which PostgreSQL creates the new table. If not specified, default_tablespace is used for ordinary tables and temp_tablespaces for temporary tables

The following example creates the table1 table with two columns:

CREATE TABLE table1 (
    first_column text,
    second_column integer
);

The first column is named first_column and has a data type of text, the second column has the second_column name and the integer type. The table and column names follow the identifier syntax explained in Identifiers and keywords.

TIP
When you create many interrelated tables, it makes sense to pre-select a table and column naming pattern. For example, decide whether table names will use plural or singular nouns.

Examples

Create a table with NOT NULL and PRIMARY KEY constrains
CREATE TABLE books (
    code        char(5) CONSTRAINT first_key PRIMARY KEY,
    title       varchar(40) NOT NULL,
    date_pub   date,
    genre        varchar(10)
);
Create a temporary table
CREATE TEMP TABLE books (
    code        char(5) CONSTRAINT firstkey PRIMARY KEY,
    title       varchar(40) NOT NULL,
    date_pub   date,
    genre        varchar(10)
);
Use INHERITS
CREATE TABLE book_location (
    shelf_no integer
) INHERITS (books);
Use LIKE
CREATE TABLE book_location(
    shelf_no integer,
  LIKE books INCLUDING ALL
);

Drop a table

If you no longer need a table, you can use the DROP TABLE command to remove it:

DROP TABLE books;

Attempting to drop a non-existent table causes an error. To avoid it, you can use the IF EXISTS clause:

DROP TABLE IF EXISTS books;

Specify default column values

You can assign a default value to a column. Use the DEFAULT keyword for this. When a new row is created and a value is not specified for a column, this column value is filled with the specified default value. If no default value is declared, the default value is NULL.

In a table definition, default values are listed after the column data type:

CREATE TABLE products (
    product_no integer,
    name text,
    price numeric DEFAULT 49.99
);

The default value can be an expression, which is evaluated when the default value is inserted. For example, a timestamp column can have the current_timestamp function as a default value. This function will return the time of row insertion:

CREATE TABLE products (
product_no integer,
name text,
price numeric DEFAULT 49.99,
shipping_date timestamp DEFAULT current_timestamp
);

Define generated columns

A generated column is a special kind of column that is computed from others. PostgreSQL supports only stored generated columns. A stored generated column is computed on write and takes up space in the table as a regular column.

To create a generated column, use the GENERATED ALWAYS AS clause in CREATE TABLE. It is also necessary to use the STORED keyword:

CREATE TABLE products (
    product_no integer,
    name text,
    packing_weight numeric,
    product_weight numeric GENERATED ALWAYS AS (packing_weight / 6) STORED
);

A generated column value cannot be specified in the INSERT or UPDATE commands, because it is impossible to write to the generated column. However, you can use the DEFAULT keyword to set the default value.

There are the following differences between generated columns and columns with a default value:

  • The default column value is evaluated once when a row is inserted into the table and no other value is set. The value of the generated column is updated whenever the row changes and cannot be overridden.

  • The default value expression cannot refer to other columns in the table, but the generating expression refers to them.

  • A column default can use volatile functions, for example, random or functions referring to the current time, but this is not allowed for generated columns.

Use constraints

Data types limit the amount and type of data that can be stored in a table. However, for many applications, such restrictions are too coarse. For example, a column containing the price of a product should only accept positive values. But there is no such standard data type. You may also want to restrict the data of a column in relation to other columns or rows. For example, in a table with information about a product, there should be only one row with a specific product code.

To solve this problem, SQL allows you to define constraints on columns and tables. If the user tries to store a value in a column that violates the constraints, PostgreSQL raises an error. The restrictions are also applied to default values.

Check constraints

A CHECK constraint is the most generic constraint type. It allows you to specify a condition (the boolean expression) for column values. For example, to require positive product prices, you can use:

CREATE TABLE products (
    product_no integer,
    name text,
    price numeric CHECK (price > 0)
);

The constraint definition comes after the data type, as default value definitions. Default values and constraints can be listed in any order.

A CHECK constraint consists of the CHECK keyword followed by an expression in parentheses. The check constraint expression should contain a column name.

You can also give the constraint a name. This clarifies error messages and allows you to refer to the constraint when you need to change it. The syntax is the CONSTRAINT keyword followed by an identifier followed by the constraint definition:

CREATE TABLE products (
    product_no integer,
    name text,
    price numeric CONSTRAINT positive_price CHECK (price > 0)
);

A CHECK constraint can also refer to several columns. It is a table constraint, and it is written as a separate item in the comma-separated column list:

CREATE TABLE products (
    product_no integer,
    name text,
    price numeric CHECK (price > 0),
    discounted_price numeric CHECK (discounted_price > 0),
    CONSTRAINT valid_discount CHECK (price > discounted_price)
);

Note that the check constraint is satisfied if the expression evaluates to true or NULL. Since most expressions with NULL operands will evaluate to NULL, CHECK constraints do not prevent NULL from being written to associated columns. To ensure that a column does not contain a NULL value, use the NOT NULL constraint described in the next section.

NOT NULL constraints

A NOT NULL constraint specifies that a column must not accept the NULL value.

CREATE TABLE products (
    product_no integer NOT NULL,
    name text NOT NULL,
    price numeric
);

A NOT NULL constraint is a column constraint. A not-null constraint is equivalent to CHECK (<column_name> IS NOT NULL), but in PostgreSQL creating an explicit not-null constraint is more efficient. In most database designs the majority of columns should be marked as NOT NULL.

The NOT NULL constraint has an inverse — the NULL constraint. It means that a column can contain the NULL value (the default behavior). The NULL constraint is not present in the SQL standard and should not be used in portable applications.

Unique constraints

Unique constraints ensure that the data contained in a column, or a group of columns, is unique among all the rows in the table. You can define a unique restriction as a column constraint:

CREATE TABLE products (
    product_no integer UNIQUE,
    name text,
    price numeric
);

It is also possible to specify a restriction as a table constraint.

CREATE TABLE products (
    product_no integer,
    name texCt,
    price numeric,
    UNIQUE (product_no)
);

To define a unique constraint for a group of columns, write it as a table constraint with the column names separated by commas. You can also assign your own name for a unique constraint:

CREATE TABLE products (
    serial_no integer,
    product_no integer,
    name text,
    price numeric,
    CONSTRAINT must_be_different UNIQUE (serial_no, product_no)
);

Adding a unique constraint automatically creates a unique B-tree index on the column or group of columns listed in the constraint.

Primary keys

A primary key constraint indicates that a column, or group of columns, can be used as a unique identifier for rows in the table. This requires that the values be both unique and not null. The following two table definitions accept the same data:

CREATE TABLE products (
    product_no integer UNIQUE NOT NULL,
    name text,
    price numeric
);

CREATE TABLE products (
    product_no integer PRIMARY KEY,
    name text,
    price numeric
);

A primary key can contain more than one column:

CREATE TABLE products (
    product_no integer,
    serial_no integer,
    name text,
    price numeric,
    PRIMARY KEY (product_no, serial_no)
);

Adding a primary key creates a unique B-tree index on the column or group of columns listed in the primary key, and forces the columns to be marked NOT NULL.

A table can have at most one primary key. There can be more than one unique and NOT NULL constraint, which is functionally almost equivalent to primary keys, but only one primary key constraint can be defined.

Foreign keys

A foreign key constraint specifies that the values in a column (or in a group of columns) must match the values from some row of another table. This maintains the referential integrity between two related tables.

For example, we have the products table:

CREATE TABLE products (
    product_no integer PRIMARY KEY,
    name text,
    price numeric
);

Also, we have the orders table that stores orders of these products. We want to ensure that the orders table only contains orders of products that actually exist. So we define a foreign key constraint in the orders table that references the products table:

CREATE TABLE orders (
    order_id integer PRIMARY KEY,
    product_no integer REFERENCES products (product_no),
    quantity integer
);

Now it is impossible to create orders with not null product_no entries that do not exist in the products table. The orders table is the referencing table and the products table is the referenced table. Similarly, there are referencing and referenced columns.

You can omit a column list in the command as follows:

CREATE TABLE orders (
    order_id integer PRIMARY KEY,
    product_no integer REFERENCES products,
    quantity integer
);

In this case, the primary key of the referenced table is used as the referenced column.

You can assign your own name for a foreign key constraint. A foreign key can also constrain and reference a group of columns. The number and type of the constrained columns need to match the number and type of the referenced columns.

CREATE TABLE table1 (
    field1 integer PRIMARY KEY,
    field2 integer,
    field3 integer,
    CONSTRAINT constraint1 FOREIGN KEY (field2, field3) REFERENCES other_table (field_a, field_b)
);

A foreign key constraint can refer to the same table (a self-referential foreign key). It allows you to implement a node tree structure.

CREATE TABLE tree (
    node_id integer PRIMARY KEY,
    parent_id integer REFERENCES tree,
    name text
);

The top level node has parent_id equal to NULL, other parent_id entries refer to valid table rows.

A table can have more than one foreign key constraint. This is used to implement many-to-many relationships between tables.

To ensure data integrity in many-to-many relationships, you can specify the restrictions used when updating and deleting rows. For example:

CREATE TABLE order_items (
    product_no integer REFERENCES products ON DELETE RESTRICT,
    order_id integer REFERENCES orders ON UPDATE CASCADE,
    quantity integer,
    PRIMARY KEY (product_no, order_id)
);

The table below lists available options.

Name Description

CASCADE

When a referenced row is deleted or updated, referencing rows should be automatically deleted or updated

NO ACTION

If referencing rows continue to exist when the constraint is checked, an error occurs. NO ACTION is the default value

RESTRICT

Prevents deletion or update of a referenced row

SET NULL

Values in the referencing rows are set to NULL when the referenced row is deleted

SET DEFAULT

Values in the referencing rows are set to their default values when the referenced row is deleted

The main difference between NO ACTION and RESTRICT is that NO ACTION allows you to defer validation during a transaction, while RESTRICT does not.

A foreign key must reference columns that either are primary keys or form a unique constraint.

Exclusion constraints

Exclusion constraints ensure that if any two rows are compared on the specified columns or expressions using the specified operators, at least one of these operator comparisons returns false or NULL. The example:

CREATE TABLE circles (
    c circle,
    EXCLUDE USING gist (c WITH &&)
);

System columns

Every table has several system columns that are implicitly defined by the system. Their names cannot be used as names of user-defined columns.

Name Description

tableoid

The OID of the table containing this row. This column is useful for queries that select from partitioned tables or inheritance hierarchies. The tableoid can be joined against the oid column of pg_class to obtain the table name

xmin

The identity (transaction ID) of the inserting transaction for this row version. A row version is an individual state of a row. Each row update creates a new row version for the same logical row

cmin

The command identifier (starting at zero) within the inserting transaction

xmax

The identity (transaction ID) of the deleting transaction, or zero for a non-deleted row version. If the deleting transaction is not committed yet, or a deletion is rolled back, this column can be non-zero in a visible row version

cmax

The command identifier within the deleting transaction, or zero

ctid

The physical row location within the table. It is not recommended to use ctid as a row identifier, because the VACUUM FULL operation changes a row’s ctid. A primary key should be used to identify logical rows

Transaction identifiers are 32-bit quantities. In a long-lived database, it is possible for transaction IDs to wrap around. This is not critical if routine database maintenance tasks are performed on time. However, you should not rely on the uniqueness of transaction codes for a long time (with more than a billion transactions).

Command identifiers are also 32-bit quantities. This creates a hard limit of SQL commands within a single transaction (about 4 billion). In practice this limit is not a problem — note that the limit is on the number of SQL commands, not the number of rows processed. Also, only commands that actually modify the database contents have command identifiers.

Found a mistake? Seleсt text and press Ctrl+Enter to report it