Конференция Arenadata
Новое время — новый Greenplum
Мы приглашаем вас принять участие в конференции, посвященной будущему Open-Source Greenplum 19 сентября в 18:00:00 UTC +3. Встреча будет проходить в гибридном формате — и офлайн, и онлайн. Онлайн-трансляция будет доступна для всех желающих.
Внезапное закрытие Greenplum его владельцем — компанией Broadcom - стало неприятным сюрпризом для всех, кто использует или планирует начать использовать решения на базе этой технологии. Многие ожидают выхода стабильной версии Greenplum 7 и надеются на её дальнейшее активное развитие.
Arenadata не могла допустить, чтобы разрабатываемый годами Open-Source проект Greenplum прекратил своё существование, поэтому 19 сентября мы представим наш ответ на данное решение Broadcom, а участники сообщества получат исчерпывающие разъяснения на все вопросы о дальнейшей судьбе этой технологии.

На конференции вас ждёт обсуждение следующих тем:

  • План возрождения Greenplum;
  • Дорожная карта;
  • Экспертное обсуждение и консультации.
Осталось до события

ADB ClickHouse Connector configuration

To send data from ADB to ClickHouse via ADB ClickHouse Connector, you should first create the following objects on the ADB cluster side:

  1. Server — encapsulates connection information that a foreign data wrapper uses to access an external data source.

  2. User mapping — provides authentication to an external data source by mapping users.

  3. Foreign table — a table in ADB that defines the remote data structure. A foreign table has no storage in ADB, but can be used in queries just like a normal table.

To see how to use the listed objects to send data from ADB to ADQM, refer to ADB ClickHouse Connector usage examples.

IMPORTANT
  • For each ClickHouse cluster to which you plan to send queries from ADB, it is sufficient to create one server and one user mapping. However, there may be several foreign tables if you need to send queries against various ClickHouse tables.

  • All options that are listed in the tables below (Server options, User mapping options, Foreign table options) should be defined in the text format (i.e. in single quotes '). The Type column in the tables shows what data type will be used when validating input values.

Server

To create a server, use the CREATE SERVER command. The basic command syntax is listed below:

CREATE SERVER <server_name> [ TYPE '<server_type>' ] [ VERSION '<server_version>' ]
    FOREIGN DATA WRAPPER <fdw_name>
    [ OPTIONS ( [ <option> '<value>' [, ... ]] ) ]

where:

  • <server_name> — a server name in ADB. Should be unique within the current ADB database.

  • <server_type> — an optional server type.

  • <server_version> — an optional server version.

  • <fdw_name> — a foreign data wrapper name. You should use the tkh_fdw foreign data wrapper, which is automatically created after the connector installation (see step 8 in ADB ClickHouse Connector installation).

  • <option> — server options that define the connection details. The options that are available for ADB ClickHouse Connector are listed in the Server options table. Note that these options can be defined both at the server level and at the foreign table level. Options marked as required should be specified at least in one definition (server or table). Table-level options take precedence.

  • <value> — option values.

NOTE
  • To create a server, you need the USAGE privilege on the tkh_fdw foreign data wrapper. The user who creates the server becomes its owner.

  • The full description of the CREATE SERVER command syntax is available in the Greenplum documentation.

  • To edit a server definition, use the ALTER SERVER command. To delete a server, use DROP SERVER.

Server options
Name Type Description Default Required

database

TEXT

A name of the ClickHouse database where the target table is located.

Should match the following regular expression: ^[a-zA-Z_][0-9a-zA-Z_]*$

 — 

Yes

hosts

TEXT

A list of ClickHouse hosts where the target table is located. A ClickHouse cluster does not have a single entry point (Master node), so the requests can be sent to any host in the cluster. To reduce the load, data batches are sent to hosts from the connector according to the random or the round-robin distribution strategy (see distribution_type below).

Validation rules for the hosts option are listed below

 — 

Yes

lines_batch_size

INT

The maximum number of rows in the data batch sent to ClickHouse by the connector.

Positive integers are allowed

100000

No

bytes_batch_size_mb

INT

The maximum batch size of data sent to ClickHouse by the connector (in MB). If the lines_batch_size option is also defined, the bytes_batch_size_mb value will be used.

Positive integers are allowed

 — 

No

send_compressed

BOOL

The flag that indicates whether to use compression on the connector side before sending data to ClickHouse.

Possible values:

  • true — use data compression.

  • false — do not use data compression.

false

No

send_delay

INT

A delay between connector requests to ClickHouse (in milliseconds).

Positive integers are allowed

300

No

insert_distributed_sync

BOOL

Enables or disables synchronous data load to distributed tables.

Possible values:

  • true — data is inserted synchronously. The INSERT query is considered successful after all data is added to all shards (at least to one replica per chard if internal_replication = true in ClickHouse).

  • false — data is inserted asynchronously.

true

No

distribution_type

TEXT

Defines the strategy of load distribution between ClickHouse hosts (see hosts above) when performing data load operations from the connector side.

Possible values:

  • random — hosts are selected randomly.

  • round-robin — hosts are selected based on the round-robin algorithm. This option value is preferable in terms of the uniform load distribution.

random

No

use_staging

BOOL

The flag that indicates whether to create staging tables in ClickHouse before inserting data into the target table. Used to emulate transactions that are not present in ClickHouse.

Staging tables are supported only for the following ClickHouse engine families: MergeTree (including Replicated* engines) and Distributed. For other engines, an error will be returned, since not all engine types support partitioning in ClickHouse.

Possible values:

  • true — use staging tables.

  • false — do not use staging tables. In this mode, there are certain risks, as the consistency of data insertions can be disturbed. However, the mode is suitable for single insertions into empty target tables.

 — 

Yes

staging_table_name_format

TEXT

Defines the staging table name format (see use_staging above). Example: '$_tmp_$'. The first placeholder ($) is replaced with the name of the target table in ClickHouse (see resource in Foreign table options). The second placeholder is replaced with the integer identifier that is automatically generated on the Master node of ADB

 — 

Yes if use_staging = true

clickhouse_properties

TEXT

A list of ClickHouse custom options in the <custom_option>=<value>;[…​] format, where <custom_option> is an option name and <value> is an option value

 — 

No

fdw_startup_cost

DOUBLE

The cost estimate for obtaining the first tuple from the data source.

Positive double values are allowed.

Can be used only in SELECT queries to ADQM/ClickHouse

1000.0

No

fdw_tuple_cost

DOUBLE

The cost estimate for obtaining one tuple from the data source.

Positive double values are allowed.

Can be used only in SELECT queries to ADQM/ClickHouse

0.01

No

fdw_tuples_count

DOUBLE

The estimate for the number of all rows in the data source. Does not take into account the query selectivity, LIMIT clauses, etc.

Positive double values >= 1 are allowed.

Can be used only in SELECT queries to ADQM/ClickHouse

100000.0

No

Validation rules for the hosts option
  • The list of ClickHouse hosts should be specified in the following format: <hostname>:<port> [, …​], where <hostname> is a name of each host and <port> is an HTTP port number to connect to the corresponding host.

  • The maximum length of the <hostname> value is 255 characters.

  • You can use in the <hostname> value digits 0-9, ASCII characters a-zA-Z, hyphens -, underscores _, and dots ..

  • The presence of the <port> substring for each <hostname> is required. Only digits 0-9 are allowed in the <port> value. The maximum length of the <port> value is 5 characters.

  • The maximum length of the hosts value: (255 (the <hostname> limit) + 1 (:) + 5 (the <port> limit) + 1 (,)) x 127 (number of hosts) = 33274 characters.

User mapping

To create a user mapping, use the CREATE USER MAPPING command. The basic command syntax is listed below:

CREATE USER MAPPING FOR { <username> | USER | CURRENT_USER | PUBLIC }
    SERVER <server_name>
    [ OPTIONS ( <option> '<value>' [, ... ] ) ]

where:

  • <username> — a user name in ADB. The following constants are allowed:

    • USER or CURRENT_USER — the current user who established the ADB connection.

    • PUBLIC — all present and future ADB users.

  • <server_name> — a server name.

  • <option> — user mapping options, which typically include a user name and a password to connect to an external data source. The options that are available for ADB ClickHouse Connector are listed in the User mapping options table.

  • <value> — option values.

NOTE
User mapping options
Name Type Description Default Required

clickhouse_username

TEXT

A user name in ClickHouse

 — 

Yes

clickhouse_password

TEXT

A user password in ClickHouse. If there is no password (for example, in the test environment), an empty string is specified

 — 

Yes

Foreign table

To create a foreign table, use the CREATE FOREIGN TABLE command. The basic command syntax is listed below:

CREATE FOREIGN TABLE [ IF NOT EXISTS ] <table_name> ( [
    <column_name> <data_type> [ COLLATE <collation> ] [ <column_constraint> [ ... ] ]
      [, ... ]
] )
    SERVER <server_name>
  [ OPTIONS ( <option> '<value>' [, ... ] ) ]

where:

  • <table_name> — a foreign table name in ADB.

  • <column_name> — a column name.

  • <data_type> — a column data type.

  • <collation> — a column collation.

  • <column_constraint> — a constraint defined at the column level. The name <constraint_name> is given optionally. The syntax:

    [ CONSTRAINT <constraint_name> ]
    { NOT NULL |
      NULL |
      DEFAULT <default_expr> }

    Possible constraints:

    • NOT NULL — specifies that the column is not allowed to contain null values.

    • NULL — specifies that the column is allowed to contain null values. It is the default behavior (unless NOT NULL is specified).

    • DEFAULT — sets the default column value equal to <default_expr>.

  • <server_name> — a server name.

  • <option> — foreign table options. For ADB ClickHouse Connector, all options defined at the server level can be overwritten at the foreign table level (partially or fully). When you specify an option for a server and a foreign table at the same time, the table level takes precedence. Additionally, at the foreign table level, two more options are available (see Foreign table options below).

  • <value> — option values.

NOTE
  • To create a foreign table, you need the USAGE privilege on the corresponding server and on all column data types that are used in the table definition. The user who creates the foreign table becomes its owner.

  • The full description of the CREATE FOREIGN TABLE command syntax is available in the Greenplum documentation.

  • To edit a foreign table definition, use the ALTER FOREIGN TABLE command. To drop a foreign table, use DROP FOREIGN TABLE.

Foreign table options
Name Type Description Default Required

resource

TEXT

A table name in ClickHouse.

If you need your own strategy for distribution data across hosts (sharding), you should insert data into the distributed ClickHouse table with the sharding key configured. To do this, specify the distributed table name in the resource option. When writing data to a distributed table, remember that physical (local) tables should be present on all hosts in the ClickHouse cluster, otherwise the connector will return an error.

The option value should match the following regular expression: ^[a-zA-Z_][0-9a-zA-Z_]*$

 — 

Yes

cluster

TEXT

A cluster name in ClickHouse. For more information, see Typical cluster in the ADQM documentation.

If you specify the cluster option, the list of ClickHouse hosts should be defined in the hosts option at the foreign table level and consist of one element <hostname>:<port>, even if it has been already declared at the server level. Otherwise, there may be an ambiguous situation when there is a list of hosts at the server level, but the work is actually done with hosts received from the ClickHouse cluster metadata.

The cluster option provides the convenience of users. With that option, it is enough to specify a cluster name and one entry point, the full list of hosts will be received from ClickHouse. The load between hosts will be still distributed according to the selected distribution_type value (see Server options).

The maximum cluster length is 128 characters

 — 

No

Found a mistake? Seleсt text and press Ctrl+Enter to report it