
Use ADB DDBoost plugin

Overview

ADB DDBoost is a plugin-connector for native usage of the Dell EMC Data Domain storage system with Greenplum utilities for logical data backup (gpbackup) and restore (gprestore).

When the DDBoost plugin is used with gpbackup and gprestore, it connects to the Data Domain Platform server. To integrate with Data Domain Platform, the plugin requires the libDDBoost.so library version 7.7, which is supplied as part of the Dell DDBoostSDK 7.7 package. This library is proprietary and should be obtained through Dell Support or Dell Data Domain representatives in your country. Because the library deduplicates data, the DDBoost plugin can significantly speed up backup operations and reduce physical storage usage.

NOTE
  • For more information on the ADB DDBoost plugin architecture, advantages, and load testing results, see our technical blog.

  • The DDBoost plugin is available in the Enterprise version of ADB 6 (starting with 6.22.0.38).

  • The plugin is distributed as an RPM package for the CentOS 7 and ALT Linux 8.4 operating systems and the x86-64 CPU architecture.

Installation and configuration

To install the ADB DDBoost plugin, follow the steps:

  1. Ensure that you use Arenadata DB Enterprise 6.22.0.38 or higher. Update your ADB cluster if necessary.

  2. Get the adb-ddp-plugin RPM package from the Arenadata support team. The package name contains a specific version number, which is omitted in the example below for simplicity. Install the RPM package on all cluster hosts as a user with sudo privileges. The following example installs the package via the YUM package manager:

    $ sudo yum install -y adb-ddp-plugin.rpm

    After the command completes successfully, the plugin executable file is available at the following location: $GPHOME/bin/adb_ddp_plugin.

  3. Install the libDDBoost.so library on all cluster hosts if you have not done it before. Below is an example of installation via the YUM package manager:

    $ sudo yum install -y libddboost.rpm

    NOTE

    In the current example, the CentOS 7 operating system is used, and the libDDBoost.so file is created in the /lib64 directory. The path may differ in your environment.

  4. Connect to the ADB master host under the gpadmin user:

    $ sudo su - gpadmin

  5. On the master host, create a YAML configuration file in the /home/gpadmin directory (for example, adb_ddp_plugin.yaml). The file structure and field descriptions are given below.

    executablepath: /usr/lib/gpdb/bin/adb_ddp_plugin
    options:
      hostname: "testhost"
      username: "testuser"
      password: "XXXXXXXXXXXX"
      storage_unit: "testunit"
      directory: "testdir"
      write_buffer_size: "1048576"
      read_buffer_size: "1048576"
      log_level: "DEBUG"
      log_path: "/home/gpadmin"

Fields of the plugin configuration file

Each entry lists the field name, its description, its default value, and whether the field is required.

executablepath
    Description: An absolute path to the plugin executable file (including the file name) in the file system of ADB hosts. The plugin should be installed in the same directory on all cluster hosts. By default, the plugin executable file is located at $GPHOME/bin/adb_ddp_plugin.
    Default: none
    Required: Yes

hostname
    Description: An IP address or name of the host that provides operations with DDBoost. Can contain no more than 30 characters.
    Default: none
    Required: Yes

username
    Description: A name of the user who is granted permissions to work with DDBoost. The user is configured on the DDBoost side. This is neither an operating system user nor an ADB user. Can contain no more than 30 characters.
    Default: none
    Required: Yes

password
    Description: A password of the user who is granted permissions to work with DDBoost.
    Default: none
    Required: Yes

storage_unit
    Description: A name of the storage unit that is configured on the DDBoost side.
    Default: none
    Required: Yes

directory
    Description: A name of the directory in the DDBoost file system. This directory stores all backup files created when the gpbackup utility is run with the DDBoost plugin. In the /<storage_unit>/<directory> folder, the plugin automatically creates subdirectories that correspond to the creation date and time of each backup: /<storage_unit>/<directory>/YYYYMMDD/YYYYMMDDHHmmSS/
    Default: none
    Required: Yes

write_buffer_size
    Description: A buffer size for writing data to DDBoost (in bytes). Allowed range: 64 <= write_buffer_size <= 1048576.
    Default: none
    Required: Yes

read_buffer_size
    Description: A buffer size for reading data from DDBoost (in bytes). Allowed range: 64 <= read_buffer_size <= 1048576.
    Default: none
    Required: Yes

log_level
    Description: A log level. Possible values: DEBUG, INFO, NOTICE, WARN, ERROR, FATAL.
    Default: WARN
    Required: No

log_path
    Description: An absolute path to the directory where plugin logs are written. The adb_ddp_plugin.log file is created in the specified directory when the plugin is used.
    Default: /home/gpadmin/gpAdminLogs
    Required: No
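
The constraints in the field descriptions above can be checked before the configuration file is passed to gpbackup or gprestore. The following Python sketch is illustrative only and is not part of the plugin: the validate_config function and the example dictionary are hypothetical, and the code assumes the YAML file has already been parsed into a dictionary (for example, with a YAML library of your choice). It checks only the rules stated on this page.

```python
# Hypothetical helper (not shipped with the plugin): validates a parsed
# plugin configuration against the constraints listed on this page.

REQUIRED_OPTIONS = (
    "hostname", "username", "password", "storage_unit",
    "directory", "write_buffer_size", "read_buffer_size",
)
LOG_LEVELS = ("DEBUG", "INFO", "NOTICE", "WARN", "ERROR", "FATAL")
MIN_BUFFER, MAX_BUFFER = 64, 1048576  # allowed buffer size range, in bytes
MAX_NAME_LENGTH = 30  # limit for hostname and username


def validate_config(config):
    """Return a list of problems; an empty list means no problems found."""
    problems = []
    if not str(config.get("executablepath", "")).startswith("/"):
        problems.append("executablepath must be an absolute path")
    options = config.get("options") or {}
    for name in REQUIRED_OPTIONS:
        if not options.get(name):
            problems.append(f"{name} is required")
    for name in ("hostname", "username"):
        if len(str(options.get(name, ""))) > MAX_NAME_LENGTH:
            problems.append(f"{name} must not exceed {MAX_NAME_LENGTH} characters")
    for name in ("write_buffer_size", "read_buffer_size"):
        value = options.get(name)
        if not value:
            continue  # already reported as missing
        try:
            size = int(value)
        except (TypeError, ValueError):
            problems.append(f"{name} must be an integer number of bytes")
            continue
        if not MIN_BUFFER <= size <= MAX_BUFFER:
            problems.append(f"{name} must be between {MIN_BUFFER} and {MAX_BUFFER}")
    # Optional fields fall back to the defaults documented above.
    if options.get("log_level", "WARN") not in LOG_LEVELS:
        problems.append("log_level must be one of " + ", ".join(LOG_LEVELS))
    if not str(options.get("log_path", "/home/gpadmin/gpAdminLogs")).startswith("/"):
        problems.append("log_path must be an absolute directory path")
    return problems


# The sample configuration from this page, as a dictionary.
example = {
    "executablepath": "/usr/lib/gpdb/bin/adb_ddp_plugin",
    "options": {
        "hostname": "testhost",
        "username": "testuser",
        "password": "XXXXXXXXXXXX",
        "storage_unit": "testunit",
        "directory": "testdir",
        "write_buffer_size": "1048576",
        "read_buffer_size": "1048576",
        "log_level": "DEBUG",
        "log_path": "/home/gpadmin",
    },
}
print(validate_config(example))
```

Returning a list of problems rather than raising on the first one lets all issues in a file be reported in a single pass.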

Usage examples

gpbackup

To use the ADB DDBoost plugin when creating backups via the gpbackup utility, specify the --plugin-config <yaml_path> command argument, where <yaml_path> is an absolute path to the plugin configuration file in YAML format.
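
When backups are started from a script, the requirement that <yaml_path> be absolute can be enforced before gpbackup is called. The Python sketch below is an illustration under assumptions: gpbackup_args is a hypothetical helper name, and the flags it emits are only those that appear in the example command in this section. The resulting list could be passed to subprocess.run on the master host under the gpadmin user.

```python
import posixpath
import shlex


def gpbackup_args(dbname, plugin_config, include_table=None):
    """Build a gpbackup argument list that uses the DDBoost plugin."""
    # The plugin configuration must be referenced by an absolute path.
    if not posixpath.isabs(plugin_config):
        raise ValueError("--plugin-config requires an absolute path")
    args = ["gpbackup", "--dbname", dbname,
            "--no-compression", "--single-data-file"]
    if include_table:
        # Restrict the backup to a single table, for example public.test.
        args += ["--include-table", include_table]
    args += ["--plugin-config", plugin_config]
    return args


args = gpbackup_args("adb", "/home/gpadmin/adb_ddp_plugin.yaml", "public.test")
# Print the command in a form that can be pasted into a shell.
print(" ".join(shlex.quote(a) for a in args))
```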

The following example shows how to back up one table of the adb database:

  1. Connect to the adb database on ADB master under the gpadmin user (for example, via psql). Using the psql command \dt, ensure the required table (public.test in the current example) exists in the database:

    \dt

    The result:

                             List of relations
     Schema |      Name       | Type  |  Owner  |       Storage
    --------+-----------------+-------+---------+----------------------
     public | spatial_ref_sys | table | gpadmin | heap
     public | test            | table | gpadmin | heap
     public | test2           | table | gpadmin | append only columnar
     public | test3           | table | gpadmin | append only columnar
    (4 rows)

  2. Disconnect from adb and run gpbackup with the --plugin-config parameter under the gpadmin user. Note that the --include-table parameter allows you to back up a single table:

    $ gpbackup --dbname adb --no-compression --single-data-file --include-table public.test --plugin-config /home/gpadmin/adb_ddp_plugin.yaml

    TIP

    When the DDBoost plugin is used with gpbackup, it is recommended to use the --single-data-file mode, in which each segment writes all of its table data to a single file. Avoid the parallel backup mode (--jobs): with the plugin, its write performance is extremely low.

    If the data backup succeeds, the command output ends with the following message:

    [INFO]:-Backup completed successfully

  3. Ensure that the backup files are generated and stored on the selected Data Domain Platform server. To do this, list the contents of the /<storage_unit>/<directory>/YYYYMMDD/YYYYMMDDHHmmSS directory on the server, where:

    • <storage_unit> and <directory>: values of the fields with the same names in the plugin YAML configuration file.

    • YYYYMMDD: the backup creation date.

    • YYYYMMDDHHmmSS: the backup creation date and time. This timestamp is used as the --timestamp value when restoring data from this backup.

    For example:

    $ ls ./ddtest-dstu/gpbackup/20240130/20240130160839

    The directory contents are as follows:

    gpbackup_0_20240130160839                   gpbackup_3_20240130160839
    gpbackup_0_20240130160839_toc.yaml          gpbackup_3_20240130160839_toc.yaml
    gpbackup_1_20240130160839                   gpbackup_4_20240130160839
    gpbackup_1_20240130160839_toc.yaml          gpbackup_4_20240130160839_toc.yaml
    gpbackup_20240130160839_config.yaml         gpbackup_5_20240130160839
    gpbackup_20240130160839_metadata.sql        gpbackup_5_20240130160839_toc.yaml
    gpbackup_20240130160839_plugin_config.yaml  gpbackup_6_20240130160839
    gpbackup_20240130160839_report              gpbackup_6_20240130160839_toc.yaml
    gpbackup_20240130160839_toc.yaml            gpbackup_7_20240130160839
    gpbackup_2_20240130160839                   gpbackup_7_20240130160839_toc.yaml
    gpbackup_2_20240130160839_toc.yaml
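
The directory layout described in step 3 can be derived from the configuration values and the backup timestamp. The Python sketch below is a minimal illustration; backup_directory is a hypothetical helper, and the layout it assumes is only the /<storage_unit>/<directory>/YYYYMMDD/YYYYMMDDHHmmSS pattern given on this page.

```python
from datetime import datetime
import posixpath
import re


def backup_directory(storage_unit, directory, timestamp):
    """Return the DDBoost path that holds the backup with this timestamp."""
    # The timestamp must consist of exactly 14 digits: YYYYMMDDHHmmSS.
    if not re.fullmatch(r"[0-9]{14}", timestamp):
        raise ValueError("timestamp must have the form YYYYMMDDHHmmSS")
    # strptime also rejects impossible dates, such as month 13.
    created = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    return posixpath.join("/", storage_unit, directory,
                          created.strftime("%Y%m%d"), timestamp)


# Values taken from the listing above.
print(backup_directory("ddtest-dstu", "gpbackup", "20240130160839"))
# prints /ddtest-dstu/gpbackup/20240130/20240130160839
```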

gprestore

To use the ADB DDBoost plugin when restoring data via the gprestore utility, specify the --plugin-config <yaml_path> command argument, where <yaml_path> is an absolute path to the plugin configuration file in YAML format.

The following example shows how to restore the public.test table, for which a backup was previously created:

  1. Connect to the adb database on ADB master under the gpadmin user (for example, via psql). Drop the public.test table:

    DROP TABLE public.test;

  2. Ensure the table no longer exists by running the psql command \dt:

    \dt

    The result:

                             List of relations
     Schema |      Name       | Type  |  Owner  |       Storage
    --------+-----------------+-------+---------+----------------------
     public | spatial_ref_sys | table | gpadmin | heap
     public | test2           | table | gpadmin | append only columnar
     public | test3           | table | gpadmin | append only columnar
    (3 rows)

  3. Disconnect from adb and run gprestore with the --plugin-config parameter under the gpadmin user. Note that the --timestamp value should contain the backup creation timestamp in the YYYYMMDDHHmmSS format:

    $ gprestore --timestamp 20240130160839 --plugin-config /home/gpadmin/adb_ddp_plugin.yaml --on-error-continue --include-table public.test

    If the data restore succeeds, the command output ends with the following message:

    [INFO]:-Restore completed successfully

  4. Connect to the adb database again and check that the public.test table is available:

    \dt

    The result:

                             List of relations
     Schema |      Name       | Type  |  Owner  |       Storage
    --------+-----------------+-------+---------+----------------------
     public | spatial_ref_sys | table | gpadmin | heap
     public | test            | table | gpadmin | heap
     public | test2           | table | gpadmin | append only columnar
     public | test3           | table | gpadmin | append only columnar
    (4 rows)
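
If the timestamp of a backup is not known in advance, it can be recovered from the backup file names and then passed to gprestore. The Python sketch below is illustrative: timestamps_in_listing and gprestore_args are hypothetical helpers, the file name pattern is inferred only from the directory listing shown in the gpbackup example, and the gprestore flags are those used in the command above.

```python
import re

# File names in the example listing look like gpbackup_<N>_<timestamp>... or
# gpbackup_<timestamp>_...; this pattern is an assumption based on that listing.
BACKUP_FILE_RE = re.compile(r"^gpbackup_(?:[0-9]+_)?([0-9]{14})(?:_|$)")


def timestamps_in_listing(file_names):
    """Return the sorted, distinct backup timestamps found in file names."""
    found = set()
    for name in file_names:
        match = BACKUP_FILE_RE.match(name)
        if match:
            found.add(match.group(1))
    return sorted(found)


def gprestore_args(timestamp, plugin_config, include_table=None):
    """Build a gprestore argument list that uses the DDBoost plugin."""
    if not re.fullmatch(r"[0-9]{14}", timestamp):
        raise ValueError("timestamp must have the form YYYYMMDDHHmmSS")
    args = ["gprestore", "--timestamp", timestamp,
            "--plugin-config", plugin_config, "--on-error-continue"]
    if include_table:
        args += ["--include-table", include_table]
    return args


listing = [
    "gpbackup_0_20240130160839",
    "gpbackup_0_20240130160839_toc.yaml",
    "gpbackup_20240130160839_config.yaml",
    "gpbackup_20240130160839_report",
]
latest = timestamps_in_listing(listing)[-1]
print(gprestore_args(latest, "/home/gpadmin/adb_ddp_plugin.yaml", "public.test"))
```

Because the timestamps are fixed-width digit strings, sorting them as text also sorts them chronologically, so the last element is the most recent backup.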