Data compression
Overview
To improve I/O operations and decrease data size in ADB clusters, you can use data compression. It is available only for append-optimized tables. There are two types of data compression:
-
Table-level compression — can be applied to an entire table. It is possible both for row-oriented and column-oriented append-optimized tables.
-
Column-level compression — can be applied to a specific column. You can choose different column-level compression algorithms for different table columns. This data compression type is possible only for column-oriented append-optimized tables.
Data orientation | Compression level | Algorithms | Example |
---|---|---|---|
Row |
Table |
|
Table level only:
|
Column |
Table and column |
|
|
Regardless of the level where you define data compression (table or column), you can use the following storage directives. They should be specified in the WITH
or ENCODING
clause, accordingly:
-
compresstype
— type of data compression. Possible values:ZLIB
,ZSTD
, andRLE_TYPE
. Values are not case-sensitive. The default value isnone
— no compression. -
compresslevel
— compression level. Possible values for different compression algorithms are summarized in the table below. Levels with small numbers are the fastest, but they perform the least compression.
IMPORTANT
When an ENCODING clause conflicts with a WITH clause, the ENCODING clause has a higher precedence.
|
Directive | Deflate | Zstandard | Run-length encoding |
---|---|---|---|
compresstype |
ZLIB |
ZSTD |
RLE_TYPE |
compresslevel |
Possible values:
|
Possible values:
|
Possible values:
|
Recommendations on data compression usage
There are some recommendations on how to get the maximum performance when using data compression:
-
Choose the most suitable compression algorithm:
-
ZSTD — provides both speed and a good compression ratio that can be tuned with the
compresslevel
option. -
ZLIB — is usually used for backward compatibility. ZSTD outperforms this compression type on usual workloads.
-
RLE_TYPE — compresses data better than other algorithms when the same data values occur in many consecutive rows. But this compression type is not useful with tables that do not have large sets of repeated data.
-
-
When specifying the compression level, remember that the bigger that value is — the more (yet slower) the compression is. In fact, data compression allows you to "buy" I/O and disk space for CPU. So you should always balance between the available CPU resources and the required speed of compression. Try to find the optimal settings — to avoid long compression times or slow scan rates.
-
The final performance of compressed append-optimized tables depends on many factors, such as hardware, query tuning parameters, etc. You should perform comparison testing to determine whether data compression is useful in your environment.
-
It is not recommended to create compressed append-optimized tables on file systems that use compression.
TIP
|