Конференция Arenadata
Новое время — новый Greenplum
Мы приглашаем вас принять участие в конференции, посвященной будущему Open-Source Greenplum 19 сентября в 18:00:00 UTC +3. Встреча будет проходить в гибридном формате — и офлайн, и онлайн. Онлайн-трансляция будет доступна для всех желающих.
Внезапное закрытие Greenplum его владельцем — компанией Broadcom - стало неприятным сюрпризом для всех, кто использует или планирует начать использовать решения на базе этой технологии. Многие ожидают выхода стабильной версии Greenplum 7 и надеются на её дальнейшее активное развитие.
Arenadata не могла допустить, чтобы разрабатываемый годами Open-Source проект Greenplum прекратил своё существование, поэтому 19 сентября мы представим наш ответ на данное решение Broadcom, а участники сообщества получат исчерпывающие разъяснения на все вопросы о дальнейшей судьбе этой технологии.

На конференции вас ждёт обсуждение следующих тем:

  • План возрождения Greenplum;
  • Дорожная карта;
  • Экспертное обсуждение и консультации.
Осталось до события

ADB to ADB Connector overview

Features

ADB to ADB Connector provides two-way interaction between ADB clusters. The connector is implemented on the basis of a foreign data wrapper and parallel retrieve cursors.

ADB to ADB Connector has the following features:

  • Segments of a local cluster have the ability to select data directly from segments of a remote cluster in a parallel mode. For some queries (e.g. using aggregates), the "master-master" mode is supported.

  • Transactional data insertion from local to remote clusters in the "master-master" mode is available. ADB to ADB Connector provides transaction support similar to postgres_fdw in the following way:

    • Start a transaction when opening a connection.

    • Implement inner transaction processing via SAVEPOINT.

    • Process transaction rollbacks.

    • Close parameterized queries.

  • ADB to ADB Connector supports the ability to calculate the number of query executors (QE) automatically — see Supported work modes below. If users do not want to specify the number of segments in a remote cluster, they can set the use_remote_estimate option to true at the server or foreign table level — to automatically estimate this number based on the query (see ADB to ADB Connector configuration for more details). Also, changing the number of segments in a remote cluster does not affect ADB to ADB Connector.

IMPORTANT
  • ADB to ADB Connector is available in the Enterprise version of ADB 6 (starting with 6.25.1.51).

  • ADB to ADB Connector does not support the UPDATE and DELETE queries. Trying to call them will cause an error: ERROR: adb_fdw. UPDATE and DELETE queries are forbidden.

  • Sending INSERT queries to a target (remote) cluster is not the main operating mode of ADB to ADB Connector and is not suitable for bulk data loading. A more effective strategy is to use the connector in the SELECT mode on a remote cluster, and then to call INSERT on its side. In this case, the remote cluster segments receive data in parallel directly from the local cluster segments, and then during the INSERT operation the remote cluster distributes data to its own segments.

Query execution schemas

SELECT

SELECT queries are processed by ADB to ADB Connector in the following sequence:

  1. A user sends a SELECT query to the master host of the local cluster.

  2. The local master requests creation of a parallel cursor on the master of the remote cluster.

  3. The remote master creates endpoints.

  4. As a result of calling the gp_get_endpoints function, the remote master returns some service information about the endpoints created: their number, IP addresses and port numbers, as well as additional data required for authentication.

  5. On the basis of the received information about the number of endpoints, an equal number of query executors (QE) is created on the segments of the local cluster. The retreive sessions are started on local segments.

  6. Local cluster segments start reading data directly from remote cluster segments.

  7. Local cluster segments return data to the local master in order to pass the query result to the user.

SELECT execution schema
SELECT execution schema
SELECT execution schema
SELECT execution schema

INSERT

INSERT queries are processed by ADB to ADB Connector in the "master-master" mode, which means the local and remote cluster segments do not interact directly with each other. Currently, this mode is the only guarantee of data consistency.

INSERT execution schema
INSERT execution schema
INSERT execution schema
INSERT execution schema

Supported push-down predicates

Due to a number of limitations present in Greenplum 6, the current version of ADB to ADB Connector does not fully implement push-down predicate support for SELECT queries (with remote-side processing). The list of available predicates and their execution features are shown in the table below.

Predicate Push-down support Note

WHERE

Yes, with exceptions

Only predicates operating on built-in data types, operators, and functions are transferred to the remote cluster. Also, the operators and functions that are used in the WHERE clause should satisfy the IMMUTABLE condition. Internal system columns, except for ctid, are not transferred to the remote cluster

LIMIT

No

The condition is applied on the local cluster side

ORDER BY

No

The condition is applied on the local cluster side

Any aggregate function

No

The remote cluster returns all rows that meet the conditions of the WHERE clause (if they are present). Then the rows are processed within the aggregate function on the local cluster side

Any type of JOIN

No

The remote cluster returns all rows that meet the conditions of the WHERE clause (if they are present). Then the rows are processed within JOIN operator on the local cluster side

Supported work modes

NOTE

Define the number of executors automatically

In this mode, the use_remote_estimate option is set to true at the server or foreign table level. As a result, the number of required query executors is calculated automatically — equal to the number of endpoints of the parallel cursor.

The server definition example is given below:

CREATE SERVER test_server
FOREIGN DATA WRAPPER adb_fdw
OPTIONS (
    host 'bds-mdw-ext',
    port '5432',
    dbname 'adb',
    use_remote_estimate 'true');
Pros Cons
  • The number of executors is calculated automatically. Users do not need to know this number in advance.

  • Any changes in the number of remote segments are processed correctly — without the need to update the foreign table configuration.

A parallel cursor should be created and opened in a remote cluster during the query planning phase

Use the number of executors equal to the number of local segments

In this mode, the use_remote_estimate and num_segments options are not specified. As a result, the number of executors is static and equal to the number of segments in the local cluster. If the endpoints number for the query does not match the number of local segments, then the mapping to local executors is performed.

The server definition example is given below:

CREATE SERVER test_server
FOREIGN DATA WRAPPER adb_fdw
OPTIONS (
    host 'bds-mdw-ext',
    port '5432',
    dbname 'adb');
Pros Cons

There is no need to create and open a parallel cursor in a remote cluster during the query planning stage

If a remote cluster has a larger number of segments compared to a local cluster, some local cluster executors will get more endpoints which will cause a processing skew. The endpoints assigned to the same Query Executor will be processed sequentially, which, in turn, will reduce the processing parallelization level.

Define the number of executors explicitly

In this mode, the use_remote_estimate option is not specified (by default, it is false), but the num_segments option is defined at the server level, which explicitly specifies the number of executors. Unlike the previous scheme, you can define a number of executors other than the number of local segments — for example, equal to the number of segments in the remote cluster. If the endpoints number for the query does not match the number of local segments, then the mapping to local executors is performed.

The server definition example is given below:

CREATE SERVER test_server
FOREIGN DATA WRAPPER adb_fdw
OPTIONS (
    host 'bds-mdw-ext',
    port '5432',
    dbname 'adb',
    num_segments '16');
Pros Cons
  • There is no need to create and open a parallel cursor in a remote cluster during the query planning stage.

  • The advantage over the the previous scheme is the ability to eliminate a processing skew by explicitly matching the number of executors to the number of remote segments.

  • The number of query executors should be calculated manually. The automatic work mode does not have this drawback.

  • The possibility of a processing skew during query execution is not completely excluded (in case of incorrect num_segments definition).

Found a mistake? Seleсt text and press Ctrl+Enter to report it