Tables
Table is a main data object in ADB. ADB tables are similar to tables in any relational database, except distribution of table rows across the different cluster segments. When creating a table, you specify the table distribution policy.
Also, ADB provides several table storage types and compression options that can be used to store data in the most efficient way.
Create a new table
Syntax
To create an initially empty table in the current database and define its structure, use the CREATE TABLE
command. The user who issues the command owns the table. The basic syntax of this command is described below.
NOTE
For more details, see CREATE TABLE in the Greenplum documentation.
|
CREATE [ { TEMPORARY | TEMP } | UNLOGGED] TABLE [IF NOT EXISTS]
<table_name> (
[ { <column_name> <data_type> [ COLLATE <collation> ] [<column_constraint> [ ... ] ] [ ENCODING ( <storage_column_directive> [, ...] ) ]
| <table_constraint>
| LIKE <source_table> [ <like_option> ... ] }
[, ... ]
])
[ INHERITS ( <parent_table> [, ... ] ) ]
[ WITH ( <storage_table_directive> [, ... ] ) ]
[ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
[ TABLESPACE <tablespace_name> ]
[ DISTRIBUTED BY (<column_name> [<opclass>], [ ... ] )
| DISTRIBUTED RANDOMLY | DISTRIBUTED REPLICATED ]
[ PARTITION BY <partition_type> (<column_name>) <partition_specification>]
TEMPORARY | TEMP |
Specifies a temporary table, which is automatically dropped at the end of a session, or optionally at the end of the current transaction (see |
UNLOGGED |
Specifies an unlogged table, which data is not written to the write-ahead log (WAL). This makes such tables faster than ordinary tables, yet not crash-safe. The contents of unlogged tables are not replicated to mirror segment instances |
COLLATE |
Defines the column collation. If not specified, the column data type’s default collation is used |
ENCODING |
Defines the column storage options: type of compression, compression level, and block size. This clause is valid only for append-optimized column-oriented tables |
LIKE |
Specifies a table from which the new table automatically copies all column names, their data types, |
INHERITS |
Specifies a list of tables from which the new table automatically inherits all columns and their constraints. Compare to the |
WITH |
Defines storage parameters for the table |
ON COMMIT |
Specifies the behavior of temporary tables at the end of a transaction block. Possible values:
|
TABLESPACE |
Defines the name of the tablespace in which the new table is to be created. If not specified, the database’s default tablespace is used, or the server configuration parameter |
DISTRIBUTED BY |
Sets the table distribution policy. Possible values:
For more details, see Distribution |
PARTITION BY |
Specifies a partitioned table. For more details, see Partitioning |
<table_name> |
A table name |
<column_name> |
A column name |
<data_type> |
A column data type. See Recommendations on choosing column data types below |
<collation> |
A column collation |
<column_constraint> |
A constraint defined at the column level. The name
For heap tables, referential integrity constraints ( There are two groups of constraints:
For indexes associated with a Syntax:
Where
|
<storage_column_directive> |
Specifies the column storage settings for append-optimized column-oriented tables:
Syntax:
For more details on compression options, see Data compression |
<table_constraint> |
A constraint defined at the table level. The name Syntax:
Where
You can use this form of constraints to define primary keys based on several columns |
<source_table> |
An original table, which structure is used when a new table is being created via the |
<like_option> |
Defines whether to include or exclude the following parts of column definitions (when copying the original table via the
Syntax:
|
<parent_table> |
A parent table, which structure is used when a new table is being created via the |
<storage_table_directive> |
Specifies the table storage settings:
Syntax:
For more details on compression options, see Data compression. The default values for some storage options listed above can be changed via the server configuration parameter gp_default_storage_options |
<tablespace_name> |
A name of the tablespace, which tha table should be stored in |
<opclass> |
An operator class name. Can be specified optionally to use a non-default hash function for distribution |
<partition_type> |
A partitioning type. Possible values:
|
<partition_description> |
A partitioning specification including: partitions, subpartitions, default partition. For more details, see Partitioning |
Recommendations on choosing column data types
There are some recommendations on how to choose data types for columns in the most efficient way:
-
Choose data types according to the stored data. For example, use numeric data types for numbers, character data types for strings, date or timestamp data types for dates, and so on.
-
The data types you select should use the least possible space but can still accommodate your data. For example, use
INT
instead ofBIGINT
,TEXT
instead ofCHAR(<n>)
, etc. -
When using the smallest numeric data types, try to forecast their possible expansion in the future. Pay attention that changing from a smaller datatype to a larger one after loading large amounts of data can be costly. Therefore, if your current data values fit, for example, in the
SMALLINT
type properly but it is likely that the values will expand later,INT
is the better long-term choice. -
If possible, use specialized data types:
INET
,CIDR
,JSON
,JSONB
,MACADDR
, etc. -
Use the same data types in the columns for which
JOIN
operations will be performed. When data types are different, the database should convert one of them in order to compare data values correctly, which adds unnecessary overhead. -
Though Greenplum has a rich set of native data types, you can create your own ones. Usage of
ENUM
can give a huge performance boost.
Examples
CREATE TABLE book_type(id SERIAL PRIMARY KEY, name TEXT UNIQUE NOT NULL)
DISTRIBUTED REPLICATED;
CREATE TABLE book_cover(id SERIAL, name TEXT NOT NULL, PRIMARY KEY(id), UNIQUE(name))
DISTRIBUTED REPLICATED;
CREATE TABLE author(id SERIAL, name TEXT NOT NULL)
WITH (appendoptimized=true, orientation=column)
DISTRIBUTED BY(id);
CREATE TABLE book(id SERIAL, title TEXT, author_id INT NOT NULL, public_year SMALLINT NULL, type_id INT NOT NULL, cover_id INT NOT NULL)
WITH (appendoptimized=true, orientation=row, compresstype=ZLIB, compresslevel=5)
DISTRIBUTED BY(id);
CREATE TABLE book_free_fragment(id SERIAL, book_id INT, shop_id INT, fragment TEXT ENCODING(compresstype=ZSTD, compresslevel=5))
WITH (appendoptimized=true, orientation=column)
DISTRIBUTED BY(id);
-
Create the
book_type
table if it does not exist yet:CREATE TABLE IF NOT EXISTS book_type(id SERIAL PRIMARY KEY, name TEXT UNIQUE NOT NULL) DISTRIBUTED REPLICATED;
The possible result is:
NOTICE: relation "book_type" already exists, skipping CREATE TABLE
-
Check the structure of the created table via the psql command
\d+
:\d+ book_type
The result is:
Table "public.book_type" Column | Type | Modifiers | Storage | Stats target | Description --------+---------+--------------------------------------------------------+----------+--------------+------------- id | integer | not null default nextval('book_type_id_seq'::regclass) | plain | | name | text | not null | extended | | Indexes: "book_type_pkey" PRIMARY KEY, btree (id) "book_type_name_key" UNIQUE CONSTRAINT, btree (name) Distributed Replicated
-
Add another table
book_type_arch
with the same structure and constraints via theLIKE
statement:CREATE TABLE book_type_arch(LIKE book_type INCLUDING ALL) DISTRIBUTED REPLICATED;
The result is:
CREATE TABLE
-
Check the structure of the created table via the psql command
\d+
:\d+ book_type_arch
You can see that it is similar to the
book_type
table:Table "public.book_type_arch" Column | Type | Modifiers | Storage | Stats target | Description --------+---------+--------------------------------------------------------+----------+--------------+------------- id | integer | not null default nextval('book_type_id_seq'::regclass) | plain | | name | text | not null | extended | | Indexes: "book_type_arch_pkey" PRIMARY KEY, btree (id) "book_type_arch_name_key" UNIQUE CONSTRAINT, btree (name) Distributed Replicated
CAUTIONPay attention that both tables now use the same sequencebook_type_id_seq
for the first columnid
. It can lead to unexpected consequences. -
In order to check that two tables are not related anymore, alter the first table
book_type
:ALTER TABLE book_type DROP COLUMN name;
The result is:
ALTER TABLE
-
Check the structure of the
book_type_arch
table again. It has not been changed.adb=# \d+ book_type_arch Table "public.book_type_arch" Column | Type | Modifiers | Storage | Stats target | Description --------+---------+--------------------------------------------------------+----------+--------------+------------- id | integer | not null default nextval('book_type_id_seq'::regclass) | plain | | name | text | not null | extended | | Indexes: "book_type_arch_pkey" PRIMARY KEY, btree (id) "book_type_arch_name_key" UNIQUE CONSTRAINT, btree (name) Distributed Replicated
-
Create the
book
table if it does not exist yet:CREATE TABLE IF NOT EXISTS book(id SERIAL, title TEXT, author_id INT NOT NULL, public_year SMALLINT NULL, type_id INT NOT NULL, cover_id INT NOT NULL) WITH (appendoptimized=true, orientation=row, compresstype=ZLIB, compresslevel=5) DISTRIBUTED BY(id);
The possible result is:
NOTICE: relation "book" already exists, skipping CREATE TABLE
-
Check the structure of the created table via the psql command
\d+
:\d+ book
The result is:
Append-Only Table "public.book" Column | Type | Modifiers | Storage | Stats target | Description -------------+----------+---------------------------------------------------+----------+--------------+------------- id | integer | not null default nextval('book_id_seq'::regclass) | plain | | title | text | | extended | | author_id | integer | not null | plain | | public_year | smallint | | plain | | type_id | integer | not null | plain | | cover_id | integer | not null | plain | | Compression Type: zlib Compression Level: 5 Block Size: 32768 Checksum: t Distributed by: (id) Options: appendonly=true, orientation=row, compresstype=zlib, compresslevel=5
-
Add another table
book_arch
with the same structure and constraints via theINHERITS
statement:CREATE TABLE book_arch() INHERITS(book) WITH (appendoptimized=true, orientation=row, compresstype=ZLIB, compresslevel=5) DISTRIBUTED BY(id);
The result is:
CREATE TABLE
-
Check the structure of the created table via the psql command
\d+
:\d+ book_arch
You can see that it is similar to the
book
table:Append-Only Table "public.book_arch" Column | Type | Modifiers | Storage | Stats target | Description -------------+----------+---------------------------------------------------+----------+--------------+------------- id | integer | not null default nextval('book_id_seq'::regclass) | plain | | title | text | | extended | | author_id | integer | not null | plain | | public_year | smallint | | plain | | type_id | integer | not null | plain | | cover_id | integer | not null | plain | | Compression Type: zlib Compression Level: 5 Block Size: 32768 Checksum: t Inherits: book Distributed by: (id) Options: appendonly=true, orientation=row, compresstype=zlib, compresslevel=5
-
In order to check that two tables stay related after creation is complete, alter the first table
book
:ALTER TABLE book DROP COLUMN type_id;
The result is:
ALTER TABLE
-
Check the structure of the
book_arch
table again. Thetype_id
column does not exist anymore, as well as in the parent table:adb=# \d+ book_arch Append-Only Table "public.book_arch" Column | Type | Modifiers | Storage | Stats target | Description -------------+----------+---------------------------------------------------+----------+--------------+------------- id | integer | not null default nextval('book_id_seq'::regclass) | plain | | title | text | | extended | | author_id | integer | not null | plain | | public_year | smallint | | plain | | cover_id | integer | not null | plain | | Compression Type: zlib Compression Level: 5 Block Size: 32768 Checksum: t Inherits: book Distributed by: (id) Options: appendonly=true, orientation=row, compresstype=zlib, compresslevel=5
-
Start a new transaction:
START TRANSACTION;
The result is:
START TRANSACTION
-
Create a tempora