Use timescaledb
Overview
The timescaledb extension offers additional capabilities for processing time-series data: it enables scalable inserts and complex queries, and improves the performance of queries over time-series data.
The package required for the timescaledb installation is shipped with ADPG. To use timescaledb, just run the CREATE EXTENSION command:
CREATE EXTENSION timescaledb;
NOTE
If the timescaledb extension is created in the template1 database used as the default template, all subsequently created databases will have this extension installed.
ADPG uses the timescaledb version 2.14.2. To check it, execute the following query:
SELECT extversion FROM pg_extension
WHERE extname = 'timescaledb';
Result:
 extversion
------------
 2.14.2
The timescaledb extension adds a number of functions, views, and tables to a database. For more information on the timescaledb API, see TimescaleDB API reference.
You can use the \dn psql meta-command to display the various schemas that timescaledb has created.
Result:
      List of schemas
           Name           |  Owner
--------------------------+----------
 _timescaledb_cache       | postgres
 _timescaledb_catalog     | postgres
 _timescaledb_config      | postgres
 _timescaledb_debug       | postgres
 _timescaledb_functions   | postgres
 _timescaledb_internal    | postgres
 public                   | postgres
 timescaledb_experimental | postgres
 timescaledb_information  | postgres
Hypertables
The timescaledb extension provides hypertables, a high-level abstraction that extends traditional PostgreSQL tables to operate with temporal data more effectively. Any operation that you can perform on a regular PostgreSQL table, you can perform on a hypertable. A hypertable simplifies the organization and retrieval of time-series information by providing built-in partitioning based on time intervals. timescaledb splits hypertables into chunks.
Create an empty hypertable
To generate a hypertable, you need to create an ordinary table and pass it as a parameter to the create_hypertable function:
CREATE TABLE base_table(
id uuid,
time_column timestamptz not null default clock_timestamp(),
data_column1 int default random()*1E5,
data_column2 int default random()*1E5
);
SELECT create_hypertable('base_table', by_range('time_column'));
Result:
 create_hypertable
-------------------
 (7,t)
The number from the output (in this case, 7) is used later in the names of chunks belonging to this hypertable.
The create_hypertable function has two required parameters:

- relation — an identifier of the table to convert to a hypertable. The relation must be of the regclass type ('base_table' in the example above).
- dimension — must be of the dimension_info type. To create a dimension_info instance, call the by_range or by_hash function. This parameter specifies the conditions according to which a table is divided into chunks (by_range('time_column') in the example above).
Populate the hypertable with data:
INSERT INTO base_table(id, time_column) SELECT gen_random_uuid(), generate_series(
now() - INTERVAL '6 months',
now(),
INTERVAL '1 minute'
);
Use the \d+ psql meta-command to display information on the base_table hypertable:
\d+ base_table
Result:
    Column    |           Type           | Collation | Nullable |                     Default                      | Storage | Compression | Stats target | Description
--------------+--------------------------+-----------+----------+--------------------------------------------------+---------+-------------+--------------+-------------
 id           | uuid                     |           |          |                                                  | plain   |             |              |
 time_column  | timestamp with time zone |           | not null | clock_timestamp()                                | plain   |             |              |
 data_column1 | integer                  |           |          | (random() * '100000'::numeric::double precision) | plain   |             |              |
 data_column2 | integer                  |           |          | (random() * '100000'::numeric::double precision) | plain   |             |              |
Indexes:
    "base_table_time_column_idx" btree (time_column DESC)
Triggers:
    ts_insert_blocker BEFORE INSERT ON base_table FOR EACH ROW EXECUTE FUNCTION _timescaledb_functions.insert_blocker()
Child tables: _timescaledb_internal._hyper_7_57_chunk,
              _timescaledb_internal._hyper_7_58_chunk,
              _timescaledb_internal._hyper_7_59_chunk,
              _timescaledb_internal._hyper_7_60_chunk,
              _timescaledb_internal._hyper_7_61_chunk,
              _timescaledb_internal._hyper_7_62_chunk,
              _timescaledb_internal._hyper_7_63_chunk,
              _timescaledb_internal._hyper_7_64_chunk,
              _timescaledb_internal._hyper_7_65_chunk,
              _timescaledb_internal._hyper_7_66_chunk,
              _timescaledb_internal._hyper_7_67_chunk,
              _timescaledb_internal._hyper_7_68_chunk,
              _timescaledb_internal._hyper_7_69_chunk,
              _timescaledb_internal._hyper_7_70_chunk,
              _timescaledb_internal._hyper_7_71_chunk,
              _timescaledb_internal._hyper_7_72_chunk,
              _timescaledb_internal._hyper_7_73_chunk,
              _timescaledb_internal._hyper_7_74_chunk,
              _timescaledb_internal._hyper_7_75_chunk,
              _timescaledb_internal._hyper_7_76_chunk,
              _timescaledb_internal._hyper_7_77_chunk,
              _timescaledb_internal._hyper_7_78_chunk,
              _timescaledb_internal._hyper_7_79_chunk,
              _timescaledb_internal._hyper_7_80_chunk,
              _timescaledb_internal._hyper_7_81_chunk,
              _timescaledb_internal._hyper_7_82_chunk,
              _timescaledb_internal._hyper_7_83_chunk,
              _timescaledb_internal._hyper_7_84_chunk
Access method: heap
In the output, you can see that timescaledb has created 28 chunks for base_table.
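You can also inspect the chunks directly. As a sketch (chunk names and time ranges will differ in your database), query the timescaledb_information.chunks view:

```sql
SELECT chunk_name, range_start, range_end
FROM timescaledb_information.chunks
WHERE hypertable_name = 'base_table'
ORDER BY range_start
LIMIT 3;
```

The show_chunks('base_table') function returns a similar list of chunk identifiers.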
You can also specify an interval for chunks. To do this, set an interval in the by_range function. For timestamptz columns, it is possible to set the interval in microseconds:
CREATE TABLE table1 (LIKE base_table);
SELECT create_hypertable('table1', by_range('time_column', 86400000000));
Alternatively, use the INTERVAL keyword:
CREATE TABLE table2 (LIKE base_table);
SELECT create_hypertable('table2', by_range('time_column', INTERVAL '1 day'));
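For an existing hypertable, the interval used for newly created chunks can be changed with the set_chunk_time_interval function; chunks that already exist keep their ranges. A sketch:

```sql
-- New chunks of table2 will cover one week instead of one day;
-- previously created chunks are not affected.
SELECT set_chunk_time_interval('table2', INTERVAL '1 week');
```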
When a hypertable is created, timescaledb generates indexes on partitioning columns. The optional create_default_indexes parameter controls this behavior; its default value is true. You can cancel index creation by setting it to false:
CREATE TABLE table3 (LIKE base_table);
SELECT create_hypertable('table3', by_range('time_column'), create_default_indexes=> false);
If you try to call the create_hypertable function for a table that is already a hypertable, an exception is raised: ERROR: table "base_table" is already a hypertable. You can set the optional if_not_exists parameter to true to print a warning instead of raising the exception:
SELECT create_hypertable('base_table', by_range('time_column'), if_not_exists=> true);
Result:
NOTICE:  table "base_table" is already a hypertable, skipping
 create_hypertable
-------------------
 (7,f)
(1 row)
Convert a table with data to a hypertable
To create a hypertable from a table with data, specify the optional migrate_data parameter. The code below creates a new table and populates it with data:
CREATE FUNCTION return_location() RETURNS text AS $$
BEGIN
    RETURN 'location ' || floor(random() * 5 + 1)::text;
END;
$$ LANGUAGE plpgsql;
CREATE TABLE "conditions"(
time TIMESTAMP WITH TIME ZONE NOT NULL,
location TEXT DEFAULT return_location(),
temperature NUMERIC,
humidity NUMERIC
);
INSERT INTO conditions(time, temperature, humidity) SELECT generate_series(
now() - INTERVAL '6 months',
now(),
INTERVAL '1 minute'
) AS time,
round(random()*100) AS temperature,
round(random()*100) AS humidity;
Convert conditions to a hypertable, passing the migrate_data parameter:
SELECT create_hypertable('conditions', 'time', migrate_data => true);
Result:
NOTICE:  migrating data to chunks
DETAIL:  Migration might take a while depending on the amount of data.
    create_hypertable
--------------------------
 (14,public,conditions,t)
If a converted table contains data, but the migrate_data parameter is not passed or is set to false, an error occurs:

ERROR:  table "conditions" is not empty
HINT:  You can migrate data by specifying 'migrate_data => true' when calling this function.
Compression
The timescaledb extension allows you to compress time-series data and reduce chunk size significantly. When you enable compression, the data in the hypertable is compressed chunk by chunk: multiple records are combined into a single row, which reduces the amount of disk space needed, as a single row occupies less space than multiple rows. For more information on this feature, see About compression.
To enable compression, follow the steps below.
- Use the ALTER TABLE statement to turn on compression and set compression options:

  ALTER TABLE conditions SET (
      timescaledb.compress,
      timescaledb.compress_segmentby = 'location'
  );
- Call the add_compression_policy function to add a compression policy. The following code compresses chunks that are older than 1 hour:

  SELECT add_compression_policy('conditions', INTERVAL '1 hour');
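Once the policy has compressed some chunks, you can estimate the saved disk space with the hypertable_compression_stats function (the byte counts depend entirely on your data):

```sql
SELECT before_compression_total_bytes, after_compression_total_bytes
FROM hypertable_compression_stats('conditions');
```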
It is possible to decompress a certain chunk using the decompress_chunk function, for example, to maximize performance when various DML operations need to be performed. You can use the timescaledb_information.chunks view to determine a chunk name and pass it as a parameter to decompress_chunk:
SELECT decompress_chunk('_timescaledb_internal._hyper_14_376_chunk');
Use the remove_compression_policy function to remove the compression policy:
SELECT remove_compression_policy('conditions');
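With the policy removed, chunks can still be compressed manually by passing a chunk name to the compress_chunk function. The chunk name below is taken from the decompression example above and will differ in your database:

```sql
SELECT compress_chunk('_timescaledb_internal._hyper_14_376_chunk');
```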
Use timescaledb hyperfunctions to analyze data
The timescaledb extension provides hyperfunctions to analyze data. This section describes some of them. The examples in this section utilize the conditions table, the creation of which is described above.
first and last
The first and last functions allow you to get the first or last value respectively of one column as ordered by another. The following example returns the earliest temperature value based on time within each location:
SELECT location, first(temperature, time) FROM conditions
GROUP BY location ORDER BY location;
Result:
  location  | first
------------+-------
 location 1 |    29
 location 2 |    30
 location 3 |    43
 location 4 |    87
 location 5 |    95
The code below returns the latest humidity value based on time within a location:
SELECT location, last(humidity, time) FROM conditions
GROUP BY location ORDER BY location;
Result:
  location  | last
------------+------
 location 1 |   15
 location 2 |   69
 location 3 |   47
 location 4 |   98
 location 5 |   69
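Both functions can also be combined in a single query, for example to get the temperature at the start and at the end of the observation period for each location:

```sql
SELECT location,
       first(temperature, time) AS first_temperature,
       last(temperature, time)  AS last_temperature
FROM conditions
GROUP BY location ORDER BY location;
```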
time_bucket
The time_bucket function allows you to aggregate data into buckets of time.
The following query calculates the average temperature per day for a specific location:
SELECT time_bucket('1 day', time) AS bucket_time, AVG(temperature) AS avg_temperature
FROM conditions WHERE location = 'location 3'
GROUP BY bucket_time ORDER BY bucket_time
LIMIT 5;
Result:
      bucket_time       |   avg_temperature
------------------------+---------------------
 2024-09-02 00:00:00+00 | 49.0534759358288770
 2024-09-03 00:00:00+00 | 49.6323529411764706
 2024-09-04 00:00:00+00 | 49.0774410774410774
 2024-09-05 00:00:00+00 | 47.7973856209150327
 2024-09-06 00:00:00+00 | 51.7582417582417582
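The bucket width is an arbitrary interval, and several aggregates can be computed per bucket. For example, the weekly minimum, average, and maximum temperature for the same location:

```sql
SELECT time_bucket('7 days', time) AS bucket_time,
       MIN(temperature) AS min_temperature,
       AVG(temperature) AS avg_temperature,
       MAX(temperature) AS max_temperature
FROM conditions WHERE location = 'location 3'
GROUP BY bucket_time ORDER BY bucket_time
LIMIT 3;
```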
histogram
The histogram function calculates how many times a value falls into a specified range (bucket). For example, the query below breaks up the humidity range 0-100 into 10 buckets and returns how many times the humidity value falls into each bucket for each location:
SELECT location, histogram(humidity, 0, 100, 10)
FROM conditions GROUP BY location ORDER BY location;
The return value is an array containing the specified number of buckets plus two extra buckets: the first bucket at the head of the array is for values under the lower min bound, and the last bucket is for values greater than or equal to the max bound:
  location  |                         histogram
------------+-----------------------------------------------------------
 location 1 | {0,4903,5316,5124,5226,5200,5219,5386,5173,5206,5172,278}
 location 2 | {0,4830,5338,5343,5150,5222,5189,5154,5333,5230,5275,267}
 location 3 | {0,4949,5264,5217,5216,5194,5278,5077,5214,5190,5246,268}
 location 4 | {0,4922,5075,5192,5134,5329,5083,5349,5302,5128,5262,277}
 location 5 | {0,4892,5224,5104,5290,5050,5291,5179,5140,5191,5320,260}
Values equal to the min bound of a bucket are included in the bucket starting with this value, but values equal to the max value are included in the next bucket. The test data of the example contains 1350 humidity values that equal 100. In the result, the last (twelfth) bucket contains these values distributed by location (location 1 — 278, location 2 — 267, location 3 — 268, location 4 — 277, location 5 — 260).
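You can verify these counts with a regular aggregate query; the per-location counts should match the last element of each histogram array:

```sql
SELECT location, count(*) FROM conditions
WHERE humidity = 100
GROUP BY location ORDER BY location;
```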