Table types

Daria Barysheva

Contents

Overview
Heap vs Append-optimized
Row-oriented vs Column-oriented

Overview

ADB (as well as Greenplum) supports several types of tables with different data storage models:

Heap. This is the default table type. Heap tables are recommended for OLTP-type workloads. You can use them if the data is frequently modified after being initially loaded. It is also the most suitable kind of tables for single-row queries INSERT, UPDATE, and DELETE.
Append-optimized. These tables are preferable for OLAP-type workloads. They are also optimized for bulk data loading. You can use them if the data is rarely modified after being initially loaded, i.e. the read-only queries prevail. Unlike Heap tables (that can be only row-oriented), append-optimized tables support two forms of data orientation:
- Row-oriented. This storage model is recommended for retrieving all or the most part of the table columns at once.
- Column-oriented. This model is good for data computing over a small number of columns. You can also use it for tables that need regular updates of the small part of columns — without modifying the rest of them.

Table types

When you create a new table, you should choose the most suitable table type. The following subsections show the difference between the table types that are possible in ADB. This information can be useful for choosing the best storage model for your workloads.

Heap vs Append-optimized

Criteria Heap Append-optimized

Load

OLTP

OLAP

Size of tables

Small tables, e.g. dimension tables

Denormalized fact tables that are typically the largest ones in the system

Frequency of data updates after the initial loading

Often

Rarely

Prevailing queries

Many iterative transactions including single-row updates and parallel batch operations

Loading by batches.
Access by read-only queries.

Ability to define unique values

Supported — by using the UNIQUE and PRIMARY KEY column modifiers

—

Compression

—

Row-oriented: ZSTD, ZLIB (table level).
Column-oriented: ZSTD, ZLIB, RLE_TYPE (table and column level).

Syntax

Heap tables are created by default:

CREATE TABLE test_heap (a int, b text) DISTRIBUTED BY (a);

In order to create an append-optimized table, you should use the appendoptimized=true option in the WITH expression. The legacy form appendonly=true is possible, but not recommended:

CREATE TABLE test_ao (a int, b text)
  WITH (appendoptimized=true)
  DISTRIBUTED BY (a);

Recommendations on usage

OLTP-type workloads — with many iterative transactions over small tables

Data Warehouse environment (OLAP-type workloads) — with bulk loading and subsequent read-only requests over large denormalized tables

CAUTION

It is not recommended to perform the following commands over append-optimized tables due to their inefficiency:

The single-row INSERT, UPDATE or DELETE operations.
The parallel batch UPDATE or DELETE operations (this does not apply to the batch INSERT operations).

Row-oriented vs Column-oriented

Criteria Row-oriented Column-oriented

Table type

Heap & Append-optimized

Append-optimized

Frequency of data updates after the initial loading

Often

Rarely

Frequency of data inserting

Often

Rarely

(column-oriented tables are not optimized for the INSERT operations, as column values for a row are written to different places on disk)

Number of columns in the table

Relatively small

Many columns (dozens or more)

Number of requested columns

All or the majority of columns

A small subset of columns

Prevailing queries

Any

The SELECT queries that:
- Aggregate many values of a single column (the WHERE and HAVING predicates also use this column):
  SELECT SUM(price) FROM book; SELECT SUM(price) FROM book WHERE price > 1000;
- Return a relatively small number of columns (the WHERE predicate uses a single column):
  SELECT title FROM book WHERE public_year=2022;
Rare UPDATE queries that modify a small subset of columns without modifying other column data.

Compression

Possible on the table level in append-optimized tables.
Algorithms: ZSTD, ZLIB

Possible on the table and the column levels in append-optimized tables.
Algorithms: ZSTD, ZLIB, RLE_TYPE

Syntax

Heap:

CREATE TABLE test_heap (a int, b text) DISTRIBUTED BY (a);

Append-optimized:

CREATE TABLE test_ao_row (a int, b text)
  WITH (appendoptimized=true, orientation=row)
  DISTRIBUTED BY (a);

Append-optimized only:

CREATE TABLE test_ao_column (a int, b text)
  WITH (appendoptimized=true, orientation=column)
  DISTRIBUTED BY (a);

Recommendations on usage

OLTP-type workloads — with many iterative transactions including frequent inserts and updates.
OLAP-type workloads or mixed workloads — with queries where many columns of a single row are retrieved all at once.

Data Warehouse environment (OLAP-type workloads) — with aggregations of many values of a single column or regular selects/updates of small number of columns

Found a mistake? Seleсt text and press Ctrl+Enter to report it