Use timescaledb
Overview
The timescaledb extension offers additional capabilities for processing time-series data: it enables scalable inserts and complex queries, and improves the performance of queries over time-series data.
The package required for the timescaledb installation is shipped with ADPG. To use timescaledb, just run the CREATE EXTENSION command:
CREATE EXTENSION timescaledb;
NOTE
If the timescaledb extension is created in the template1 database used as the default template, all subsequently created databases will have this extension installed.
ADPG uses the timescaledb version 2.14.2. To check it, execute the following query:
SELECT extversion FROM pg_extension
WHERE extname = 'timescaledb';
Result:
 extversion
------------
 2.14.2
The timescaledb extension adds a number of functions, views, and tables to a database. For more information on the timescaledb API, see TimescaleDB API reference.
You can use the \dn psql meta-command to display the various schemas that timescaledb has created.
Result:
      List of schemas
           Name           |  Owner
--------------------------+----------
 _timescaledb_cache       | postgres
 _timescaledb_catalog     | postgres
 _timescaledb_config      | postgres
 _timescaledb_debug       | postgres
 _timescaledb_functions   | postgres
 _timescaledb_internal    | postgres
 public                   | postgres
 timescaledb_experimental | postgres
 timescaledb_information  | postgres
Hypertables
The timescaledb extension provides hypertables, a high-level abstraction that extends traditional PostgreSQL tables to operate with temporal data more effectively. Any operation that you can perform on a regular PostgreSQL table, you can perform on a hypertable. A hypertable simplifies the organization and retrieval of time-series information by providing built-in partitioning based on time intervals. timescaledb splits hypertables into chunks.
Create an empty hypertable
To generate a hypertable, you need to create an ordinary table and pass it as a parameter to the create_hypertable function:
CREATE TABLE base_table(
id uuid,
time_column timestamptz not null default clock_timestamp(),
data_column1 int default random()*1E5,
data_column2 int default random()*1E5
);
SELECT create_hypertable('base_table', by_range('time_column'));
Result:
 create_hypertable
-------------------
 (7,t)
The number from the output (in this case, 7) is used later in the names of chunks belonging to this hypertable.
The create_hypertable function has two required parameters:

- relation — an identifier of the table to convert to a hypertable. The relation must be of the regclass type ('base_table' in the example above).
- dimension — must be of the dimension_info type. To create a dimension_info instance, call the by_range or by_hash function. This parameter specifies the conditions according to which a table is divided into chunks (by_range('time_column') in the example above).
Populate the hypertable with data:
INSERT INTO base_table(id, time_column) SELECT gen_random_uuid(), generate_series(
now() - INTERVAL '6 months',
now(),
INTERVAL '1 minute'
);
Use the \d+ psql meta-command to display information on the base_table hypertable:
\d+ base_table
Result:
    Column    |           Type           | Collation | Nullable |                     Default                      | Storage | Compression | Stats target | Description
--------------+--------------------------+-----------+----------+--------------------------------------------------+---------+-------------+--------------+-------------
 id           | uuid                     |           |          |                                                  | plain   |             |              |
 time_column  | timestamp with time zone |           | not null | clock_timestamp()                                | plain   |             |              |
 data_column1 | integer                  |           |          | (random() * '100000'::numeric::double precision) | plain   |             |              |
 data_column2 | integer                  |           |          | (random() * '100000'::numeric::double precision) | plain   |             |              |
Indexes:
    "base_table_time_column_idx" btree (time_column DESC)
Triggers:
    ts_insert_blocker BEFORE INSERT ON base_table FOR EACH ROW EXECUTE FUNCTION _timescaledb_functions.insert_blocker()
Child tables: _timescaledb_internal._hyper_7_57_chunk,
              _timescaledb_internal._hyper_7_58_chunk,
              _timescaledb_internal._hyper_7_59_chunk,
              _timescaledb_internal._hyper_7_60_chunk,
              _timescaledb_internal._hyper_7_61_chunk,
              _timescaledb_internal._hyper_7_62_chunk,
              _timescaledb_internal._hyper_7_63_chunk,
              _timescaledb_internal._hyper_7_64_chunk,
              _timescaledb_internal._hyper_7_65_chunk,
              _timescaledb_internal._hyper_7_66_chunk,
              _timescaledb_internal._hyper_7_67_chunk,
              _timescaledb_internal._hyper_7_68_chunk,
              _timescaledb_internal._hyper_7_69_chunk,
              _timescaledb_internal._hyper_7_70_chunk,
              _timescaledb_internal._hyper_7_71_chunk,
              _timescaledb_internal._hyper_7_72_chunk,
              _timescaledb_internal._hyper_7_73_chunk,
              _timescaledb_internal._hyper_7_74_chunk,
              _timescaledb_internal._hyper_7_75_chunk,
              _timescaledb_internal._hyper_7_76_chunk,
              _timescaledb_internal._hyper_7_77_chunk,
              _timescaledb_internal._hyper_7_78_chunk,
              _timescaledb_internal._hyper_7_79_chunk,
              _timescaledb_internal._hyper_7_80_chunk,
              _timescaledb_internal._hyper_7_81_chunk,
              _timescaledb_internal._hyper_7_82_chunk,
              _timescaledb_internal._hyper_7_83_chunk,
              _timescaledb_internal._hyper_7_84_chunk
Access method: heap
In the output, you can see that timescaledb has created 28 chunks for base_table.
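You can also inspect the chunks directly. As a sketch (chunk names and time ranges will differ in your database), query the timescaledb_information.chunks view:

```sql
SELECT chunk_name, range_start, range_end
FROM timescaledb_information.chunks
WHERE hypertable_name = 'base_table'
ORDER BY range_start
LIMIT 3;
```

The show_chunks('base_table') function returns a similar list of chunk identifiers.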
You can also specify an interval for chunks. To do this, set an interval in the by_range function. For timestamptz columns, it is possible to set the interval in microseconds:
CREATE TABLE table1 (LIKE base_table);
SELECT create_hypertable('table1', by_range('time_column', 86400000000));
Alternatively, use the INTERVAL keyword:
CREATE TABLE table2 (LIKE base_table);
SELECT create_hypertable('table2', by_range('time_column', INTERVAL '1 day'));
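For an existing hypertable, the interval used for newly created chunks can be changed with the set_chunk_time_interval function; chunks that already exist keep their ranges. A sketch:

```sql
-- New chunks of table2 will cover one week instead of one day;
-- previously created chunks are not affected.
SELECT set_chunk_time_interval('table2', INTERVAL '1 week');
```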
When a hypertable is created, timescaledb generates indexes on partitioning columns. The optional create_default_indexes parameter controls this behavior; its default value is true. You can cancel index creation by setting it to false:
CREATE TABLE table3 (LIKE base_table);
SELECT create_hypertable('table3', by_range('time_column'), create_default_indexes=> false);
If you try to call the create_hypertable function for a table that is already a hypertable, an exception is raised: ERROR: table "base_table" is already a hypertable. You can set the optional if_not_exists parameter to true to print a warning instead of raising the exception:
SELECT create_hypertable('base_table', by_range('time_column'), if_not_exists=> true);
Result:
NOTICE:  table "base_table" is already a hypertable, skipping
 create_hypertable
-------------------
 (7,f)
(1 row)
Convert a table with data to a hypertable
To create a hypertable from a table with data, specify the optional migrate_data parameter. The code below creates a new table and populates it with data:
CREATE FUNCTION return_location() RETURNS text AS $$
BEGIN
    RETURN 'location ' || floor(random() * 5 + 1)::text;
END;
$$ LANGUAGE plpgsql;
CREATE TABLE "conditions"(
time TIMESTAMP WITH TIME ZONE NOT NULL,
location TEXT DEFAULT return_location(),
temperature NUMERIC,
humidity NUMERIC
);
INSERT INTO conditions(time, temperature, humidity) SELECT generate_series(
now() - INTERVAL '6 months',
now(),
INTERVAL '1 minute'
) AS time,
round(random()*100) AS temperature,
round(random()*100) AS humidity;
Convert conditions to a hypertable, passing the migrate_data parameter:
SELECT create_hypertable('conditions', 'time', migrate_data => true);
Result:
NOTICE:  migrating data to chunks
DETAIL:  Migration might take a while depending on the amount of data.
    create_hypertable
--------------------------
 (14,public,conditions,t)
If a converted table contains data, but the migrate_data parameter is not passed or is set to false, an error occurs:

ERROR:  table "conditions" is not empty
HINT:  You can migrate data by specifying 'migrate_data => true' when calling this function.
Compression
The timescaledb extension allows you to compress time-series data and reduce chunk size significantly. When you enable compression, the data in the hypertable is compressed chunk by chunk: multiple records are combined into a single row, which reduces the amount of disk space needed, as a single row occupies less space than multiple rows. For more information on this feature, see About compression.
To enable compression, follow the steps below.
- Use the ALTER TABLE statement to turn on compression and set compression options:

  ALTER TABLE conditions SET (
      timescaledb.compress,
      timescaledb.compress_segmentby = 'location'
  );
- Call the add_compression_policy function to add a compression policy. The following code compresses chunks that are older than 1 hour:

  SELECT add_compression_policy('conditions', INTERVAL '1 hour');
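Once the policy has compressed some chunks, you can estimate the saved disk space with the hypertable_compression_stats function (the byte counts depend entirely on your data):

```sql
SELECT before_compression_total_bytes, after_compression_total_bytes
FROM hypertable_compression_stats('conditions');
```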
It is possible to decompress a certain chunk using the decompress_chunk function, for example, to maximize performance when various DML operations need to be performed. You can use the timescaledb_information.chunks view to determine a chunk name and pass it as a parameter to decompress_chunk:
SELECT decompress_chunk('_timescaledb_internal._hyper_14_376_chunk');
Use the remove_compression_policy function to remove the compression policy:
SELECT remove_compression_policy('conditions');
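With the policy removed, chunks can still be compressed manually by passing a chunk name to the compress_chunk function. The chunk name below is taken from the decompression example above and will differ in your database:

```sql
SELECT compress_chunk('_timescaledb_internal._hyper_14_376_chunk');
```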
Use timescaledb hyperfunctions to analyze data
The timescaledb extension provides hyperfunctions to analyze data. This section describes some of them. The examples in this section utilize the conditions table, the creation of which is described above.
first and last
The first and last functions allow you to get the first or last value respectively of one column as ordered by another. The following example returns the earliest temperature value based on time within each location:
SELECT location, first(temperature, time) FROM conditions
GROUP BY location ORDER BY location;
Result:
  location  | first
------------+-------
 location 1 |    29
 location 2 |    30
 location 3 |    43
 location 4 |    87
 location 5 |    95
The code below returns the latest humidity value based on time within a location:
SELECT location, last(humidity, time) FROM conditions
GROUP BY location ORDER BY location;
Result:
  location  | last
------------+------
 location 1 |   15
 location 2 |   69
 location 3 |   47
 location 4 |   98
 location 5 |   69
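Both functions can also be combined in a single query, for example to get the temperature at the start and at the end of the observation period for each location:

```sql
SELECT location,
       first(temperature, time) AS first_temperature,
       last(temperature, time)  AS last_temperature
FROM conditions
GROUP BY location ORDER BY location;
```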
time_bucket
The time_bucket function allows you to aggregate data into buckets of time.
The following query calculates the average temperature per day for a specific location:
SELECT time_bucket('1 day', time) AS bucket_time, AVG(temperature) AS avg_temperature
FROM conditions WHERE location = 'location 3'
GROUP BY bucket_time ORDER BY bucket_time
LIMIT 5;
Result:
      bucket_time       |   avg_temperature
------------------------+---------------------
 2024-09-02 00:00:00+00 | 49.0534759358288770
 2024-09-03 00:00:00+00 | 49.6323529411764706
 2024-09-04 00:00:00+00 | 49.0774410774410774
 2024-09-05 00:00:00+00 | 47.7973856209150327
 2024-09-06 00:00:00+00 | 51.7582417582417582
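The bucket width is an arbitrary interval, and several aggregates can be computed per bucket. For example, the weekly minimum, average, and maximum temperature for the same location:

```sql
SELECT time_bucket('7 days', time) AS bucket_time,
       MIN(temperature) AS min_temperature,
       AVG(temperature) AS avg_temperature,
       MAX(temperature) AS max_temperature
FROM conditions WHERE location = 'location 3'
GROUP BY bucket_time ORDER BY bucket_time
LIMIT 3;
```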
histogram
The histogram function calculates how many times a value falls into a specified range (bucket). For example, the query below breaks up the humidity range 0-100 into 10 buckets and returns how many times the humidity value falls into each bucket for each location:
SELECT location, histogram(humidity, 0, 100, 10)
FROM conditions GROUP BY location ORDER BY location;
The return value is an array containing the specified number of buckets plus two extra buckets: the first bucket at the head of the array is for values under the lower min bound, and the last bucket is for values greater than or equal to the max bound:
  location  |                         histogram
------------+-----------------------------------------------------------
 location 1 | {0,4903,5316,5124,5226,5200,5219,5386,5173,5206,5172,278}
 location 2 | {0,4830,5338,5343,5150,5222,5189,5154,5333,5230,5275,267}
 location 3 | {0,4949,5264,5217,5216,5194,5278,5077,5214,5190,5246,268}
 location 4 | {0,4922,5075,5192,5134,5329,5083,5349,5302,5128,5262,277}
 location 5 | {0,4892,5224,5104,5290,5050,5291,5179,5140,5191,5320,260}
Values equal to the min bound of a bucket are included in the bucket starting with this value, but values equal to the max value are included in the next bucket. The test data of the example contains 1350 humidity values that equal 100. In the result, the last (twelfth) bucket contains these values distributed by location (location 1 — 278, location 2 — 267, location 3 — 268, location 4 — 277, location 5 — 260).
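You can verify these counts with a regular aggregate query; the per-location counts should match the last element of each histogram array:

```sql
SELECT location, count(*) FROM conditions
WHERE humidity = 100
GROUP BY location ORDER BY location;
```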