ADB to ADB Connector overview
Features
ADB to ADB Connector provides two-way interaction between ADB clusters. The connector is implemented on the basis of a foreign data wrapper and parallel retrieve cursors.
ADB to ADB Connector has the following features:
-
Segments of a local cluster have the ability to select data directly from segments of a remote cluster in a parallel mode. For some queries (e.g. using aggregates), the "master-master" mode is supported.
-
Transactional data insertion from local to remote clusters in the "master-master" mode is available. ADB to ADB Connector provides transaction support similar to postgres_fdw in the following way:
-
Start a transaction when opening a connection.
-
Implement inner transaction processing via SAVEPOINT.
-
Process transaction rollbacks.
-
Close parameterized queries.
-
-
ADB to ADB Connector supports the ability to calculate the number of query executors (QE) automatically — see Supported work modes below. If users do not want to specify the number of segments in a remote cluster, they can set the
use_remote_estimate
option totrue
at the server or foreign table level — to automatically estimate this number based on the query (see ADB to ADB Connector configuration for more details). Also, changing the number of segments in a remote cluster does not affect ADB to ADB Connector.
IMPORTANT
|
Query execution schemas
SELECT
SELECT
queries are processed by ADB to ADB Connector in the following sequence:
-
A user sends a
SELECT
query to the master host of the local cluster. -
The local master requests creation of a parallel cursor on the master of the remote cluster.
-
The remote master creates endpoints.
-
As a result of calling the
gp_get_endpoints
function, the remote master returns some service information about the endpoints created: their number, IP addresses and port numbers, as well as additional data required for authentication. -
On the basis of the received information about the number of endpoints, an equal number of query executors (QE) is created on the segments of the local cluster. The retreive sessions are started on local segments.
-
Local cluster segments start reading data directly from remote cluster segments.
-
Local cluster segments return data to the local master in order to pass the query result to the user.
Supported push-down predicates
Due to a number of limitations present in Greenplum 6, the current version of ADB to ADB Connector does not fully implement push-down predicate support for SELECT
queries (with remote-side processing). The list of available predicates and their execution features are shown in the table below.
Predicate | Push-down support | Note |
---|---|---|
WHERE |
Yes, with exceptions |
Only predicates operating on built-in data types, operators, and functions are transferred to the remote cluster. Also, the operators and functions that are used in the |
LIMIT |
No |
The condition is applied on the local cluster side |
ORDER BY |
No |
The condition is applied on the local cluster side |
Any aggregate function |
No |
The remote cluster returns all rows that meet the conditions of the |
Any type of JOIN |
No |
The remote cluster returns all rows that meet the conditions of the |
Supported work modes
NOTE
|
Define the number of executors automatically
In this mode, the use_remote_estimate
option is set to true
at the server or foreign table level. As a result, the number of required query executors is calculated automatically — equal to the number of endpoints of the parallel cursor.
The server definition example is given below:
CREATE SERVER test_server
FOREIGN DATA WRAPPER adb_fdw
OPTIONS (
host 'bds-mdw-ext',
port '5432',
dbname 'adb',
use_remote_estimate 'true');
Pros | Cons |
---|---|
|
A parallel cursor should be created and opened in a remote cluster during the query planning phase |
Use the number of executors equal to the number of local segments
In this mode, the use_remote_estimate
and num_segments
options are not specified. As a result, the number of executors is static and equal to the number of segments in the local cluster. If the endpoints number for the query does not match the number of local segments, then the mapping to local executors is performed.
The server definition example is given below:
CREATE SERVER test_server
FOREIGN DATA WRAPPER adb_fdw
OPTIONS (
host 'bds-mdw-ext',
port '5432',
dbname 'adb');
Pros | Cons |
---|---|
There is no need to create and open a parallel cursor in a remote cluster during the query planning stage |
If a remote cluster has a larger number of segments compared to a local cluster, some local cluster executors will get more endpoints which will cause a processing skew. The endpoints assigned to the same Query Executor will be processed sequentially, which, in turn, will reduce the processing parallelization level. |
Define the number of executors explicitly
In this mode, the use_remote_estimate
option is not specified (by default, it is false
), but the num_segments
option is defined at the server level, which explicitly specifies the number of executors. Unlike the previous scheme, you can define a number of executors other than the number of local segments — for example, equal to the number of segments in the remote cluster. If the endpoints number for the query does not match the number of local segments, then the mapping to local executors is performed.
The server definition example is given below:
CREATE SERVER test_server
FOREIGN DATA WRAPPER adb_fdw
OPTIONS (
host 'bds-mdw-ext',
port '5432',
dbname 'adb',
num_segments '16');
Pros | Cons |
---|---|
|
|