Arenadata Documentation
Overview
Arenadata DB (ADB) is an open-source, massively parallel relational DBMS that is based on PostgreSQL and designed for column-oriented storage with flexible horizontal scalability. Thanks to its architecture and powerful query optimizer, ADB offers high reliability and fast SQL query processing on large data volumes, so it is widely used for Big Data analytics on an industrial scale.
To simplify operation and support practical tasks of any complexity, Arenadata DB comes with a number of additional tools that provide integration with external data storages, binary backup management, and real-time query monitoring. This functionality allows users to build solutions that cover all processes related to business system maintenance.
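The massively parallel approach described above can be illustrated with a small sketch. It assumes rows are assigned to segments by hashing a distribution key, and that each segment computes a partial result which a coordinator then combines. The segment count, the `crc32` hash, and all function names are illustrative assumptions, not ADB internals.

```python
from collections import defaultdict
from zlib import crc32

NUM_SEGMENTS = 4  # hypothetical cluster size, chosen for illustration


def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Map a distribution-key value to a segment number by hashing it.

    crc32 is only a stand-in here; it is not the hash function ADB uses.
    """
    return crc32(key.encode("utf-8")) % num_segments


def distribute(rows, key_field, num_segments=NUM_SEGMENTS):
    """Spread rows across segments according to the distribution key."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(str(row[key_field]), num_segments)].append(row)
    return segments


def parallel_sum(segments, value_field):
    """Each segment sums its own rows; the coordinator adds the partials."""
    partials = [sum(r[value_field] for r in rows) for rows in segments.values()]
    return sum(partials)


rows = [{"customer_id": i % 10, "amount": i} for i in range(100)]
segments = distribute(rows, "customer_id")
total = parallel_sum(segments, "amount")
print(total)
```

Because rows with the same key always land on the same segment, per-key work such as grouping can stay local to a segment, which is the property that lets this style of system add segments to handle more data.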
Use cases
Advanced data analytics

The advanced analytics provided by ADB is used across many industries, including finance, manufacturing, automotive, government, energy, education, and retail, to address a wide variety of problems.

Some of the Arenadata DB analytics capabilities include the ability to analyze a multitude of data types, leverage existing SQL knowledge, and train more models in less time by using the MPP architecture.

Additionally, ADB provides in-database analytics, which allows you to run analytics directly in the database instead of exporting your data and processing it in an external analytics engine.

Machine learning

Arenadata DB is an excellent database for machine learning – the study of computer algorithms that improve automatically through experience. Apache MADlib is an open-source, SQL-based machine learning library that runs in-database on ADB, as well as on PostgreSQL.

This combination helps to improve the parallelism, scalability, and predictive accuracy of a machine learning deployment. MADlib also provides data transformation and feature engineering capabilities, including descriptive and inferential statistics, pivoting, sessionization, and categorical variable encoding.
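As an illustration of one of the transformations named above, the following sketch shows the idea behind sessionization: events from the same user are grouped into one session until the gap between consecutive events exceeds a timeout. This is a plain Python illustration of the concept only. It is not the MADlib API, which runs in SQL inside the database, and the function name and sample data are assumptions.

```python
def sessionize(events, timeout):
    """Assign a per-user session number to each (user, timestamp) event.

    A new session starts when the gap since the user's previous event
    is greater than `timeout` (same units as the timestamps).
    """
    last_seen = {}   # user -> timestamp of the previous event
    session_no = {}  # user -> current session number
    result = []
    for user, ts in sorted(events):
        if user not in last_seen or ts - last_seen[user] > timeout:
            session_no[user] = session_no.get(user, 0) + 1
        last_seen[user] = ts
        result.append((user, ts, session_no[user]))
    return result


# Hypothetical click events: (user, seconds since start)
events = [("a", 0), ("a", 50), ("a", 400), ("b", 10), ("b", 20)]
sessions = sessionize(events, timeout=180)
print(sessions)
```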

Artificial intelligence

ADB's ability to ingest large volumes of data at high speed makes it a powerful tool for smart applications that need to interact intelligently based on an unlimited number of unique scenarios.

For example, a telecom company may use Arenadata DB AI capabilities in IoT (Internet of Things) systems with smart sensors to analyze and process events for maintenance, security, and operational efficiency purposes.

Enterprise
Core Greenplum functionality
gpbackup/gprestore
PXF
Deploy & upgrade automation
Monitoring & Alerting
Offline installation
WAL Backup management
ppc64le
x86
Technical support 24/7
Corporate training courses
Tailored solutions
Available integrations
ADQM (Arenadata QuickMarts)
The ADB ClickHouse connector provides high-speed, parallel data exchange between Arenadata DB and Arenadata QuickMarts (ADQM).
ADS
The PXF plugin writes data from Arenadata DB to Arenadata Streaming (ADS). To read data, an Arenadata DB extension implements transactional data loading from ADS.
Kafka
The PXF plugin writes data from Arenadata DB to Kafka. To read data, an Arenadata DB extension implements transactional data loading from Kafka.
Oracle
Two-way data exchange with Oracle Database is available via the PXF JDBC connector, with support for Oracle-specific features such as parallel query execution.
S3
The PXF service provides a connector to the S3 object store.
HBase
The PXF HBase connector reads data stored in HBase tables and supports filter push-down.
HDFS
PXF is compatible with generic Apache Hadoop distributions, including Arenadata Hadoop. PXF is installed with HDFS, Hive, and HBase connectors. You can use these connectors to access data in various formats from these Hadoop distributions.
JDBC
PXF provides access to external SQL databases via the PXF JDBC connector. It can read data from and write data to different SQL databases.
Hive
PXF is compatible with generic Apache Hadoop distributions, including Arenadata Hadoop. PXF is installed with HDFS, Hive, and HBase connectors. You can use these connectors to access data in various formats from these Hadoop distributions.
Operating systems
AltLinux 8 SP: Supported
CentOS 7: Supported
RedHat 7: Supported
Community
Core Greenplum functionality
gpbackup/gprestore
PXF
Deploy & upgrade automation
Monitoring & Alerting
Offline installation
WAL Backup management
ppc64le
x86
Technical support 24/7
Corporate training courses
Tailored solutions
Available integrations
ADQM (Arenadata QuickMarts)
Available only for Enterprise
ADS
Available only for Enterprise
Kafka
Available only for Enterprise
Oracle
Two-way data exchange with Oracle Database is available via the PXF JDBC connector, with support for Oracle-specific features such as parallel query execution.
S3
The PXF service provides a connector to the S3 object store.
HBase
The PXF HBase connector reads data stored in HBase tables and supports filter push-down.
HDFS
PXF is compatible with generic Apache Hadoop distributions, including Arenadata Hadoop. PXF is installed with HDFS, Hive, and HBase connectors. You can use these connectors to access data in various formats from these Hadoop distributions.
JDBC
PXF provides access to external SQL databases via the PXF JDBC connector. It can read data from and write data to different SQL databases.
Hive
PXF is compatible with generic Apache Hadoop distributions, including Arenadata Hadoop. PXF is installed with HDFS, Hive, and HBase connectors. You can use these connectors to access data in various formats from these Hadoop distributions.
Operating systems
AltLinux 8 SP: Available only for Enterprise
CentOS 7: Supported
RedHat 7: Supported
Features
Performance
ADB can scale horizontally without degrading query performance on petabytes of data
Safety
Built-in audit of user actions on a cluster: authentication, LDAP configuration, resource group configuration
Reliability
Mirroring, safe backup management, ddboost plugin for gpbackup/gprestore utilities
Convenience
Flexible deployment and configuration, upgrades with tested binaries and migrations for all the components
Contribution
Our team is one of the main Greenplum contributors. In addition, we maintain our own documentation and keep it up to date
ADB Control
Arenadata DB query monitoring system
It is designed for in-depth analysis of how commands and utilities that work with ADB clusters are executed.
Monitoring is based on real-time information about query-level resource consumption and the progress of query plan execution. Additionally, it is possible to monitor the execution of queries in the context of transactions.
The monitoring system has a convenient user interface that lets you connect several Arenadata DB clusters, collect statistics, view them in graphical form, and export metrics.
Arenadata DB Backup Manager
Service for ADB binary backup management
Its main feature is the asynchronous launch of binary backups on a running cluster.
There is a user interface built into ADB Control, from which you can work with several ADB clusters and for each of them:
  • configure backup schedules;
  • manage backup configurations;
  • create backups of different types (full, incremental, differential) on-demand;
  • restore cluster databases from existing backups;
  • audit backup-related actions.
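To show how the backup types listed above relate to each other at restore time, here is a minimal sketch. It assumes the common definitions: a full backup is self-contained, a differential backup depends on the most recent full backup, and an incremental backup depends on the most recent backup of any type. Whether Arenadata DB Backup Manager applies exactly these definitions is an assumption, and the function and variable names are hypothetical.

```python
def restore_chain(backups, target_index):
    """Return the indexes of the backups needed to restore `target_index`.

    `backups` is a chronological list of "full", "diff", or "incr".
    The chain is returned oldest first, starting with a full backup.
    """
    chain = []
    i = target_index
    while True:
        kind = backups[i]
        if kind not in ("full", "diff", "incr"):
            raise ValueError(f"unknown backup type: {kind!r}")
        chain.append(i)
        if kind == "full":
            break
        if kind == "diff":
            # A differential depends on the nearest earlier full backup.
            parent = i - 1
            while parent >= 0 and backups[parent] != "full":
                parent -= 1
        else:
            # An incremental depends on the immediately preceding backup.
            parent = i - 1
        if parent < 0:
            raise ValueError("no base backup found for the chain")
        i = parent
    return list(reversed(chain))


history = ["full", "incr", "diff", "incr", "incr"]
print(restore_chain(history, 4))
```

Under these assumptions, a differential backup shortens the chain because it skips the incremental backups taken since the last full one.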
ADB Spark Connector
Multifunctional connector with support for parallel read/write operations between Apache Spark and Arenadata DB. With it, you can build ETL solutions and perform in-memory data analysis.
It provides flexible configuration and many features:
  • high data transmission speed;
  • automatic data schema generation;
  • flexible partitioning;
  • support for push-down operators;
  • support for batch operations.
ADB Kafka Connector
Special connector for Apache Kafka integration with Arenadata DB.
Features:
  • ability to read and write AVRO data from Kafka topics;
  • support for CSV and text formats in data read operations;
  • support for transactions in Arenadata DB.
ADB PXF Connector
A framework for parallel, high-performance access to heterogeneous data sources from Arenadata DB, based on built-in connectors.
The data is accessed through external tables, which allows you to build complex federated queries.
To connect external data storages, the following connectors are provided: JDBC, S3, Hive, HDFS, and HBase. Authentication may include Kerberos and/or SSL.
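As a sketch of the external table mechanism, the helper below assembles a `CREATE EXTERNAL TABLE` statement that uses the `pxf://` protocol in the form found in Greenplum PXF (`LOCATION` with a `PROFILE` and an optional `SERVER` option). The table name, column list, source path, and server name are hypothetical, and the exact profile names and options available depend on the PXF version installed, so treat the output as a template to check against your environment rather than a definitive statement.

```python
def pxf_external_table_ddl(table, columns, path, profile, server=None,
                           fmt="CUSTOM",
                           format_opts="FORMATTER='pxfwritable_import'"):
    """Build the DDL text for a readable PXF external table.

    columns: list of (column_name, data_type) pairs.
    path:    connector-specific location, e.g. a remote table name for JDBC.
    """
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    options = [f"PROFILE={profile}"]
    if server:
        options.append(f"SERVER={server}")
    location = "pxf://" + path + "?" + "&".join(options)
    return (
        f"CREATE EXTERNAL TABLE {table} ({cols})\n"
        f"LOCATION ('{location}')\n"
        f"FORMAT '{fmt}' ({format_opts});"
    )


# Hypothetical example: expose a remote table through the JDBC connector.
ddl = pxf_external_table_ddl(
    table="ext_orders",
    columns=[("id", "int"), ("amount", "numeric")],
    path="public.orders",
    profile="jdbc",
    server="oracle_srv",  # assumed name of a configured PXF server
)
print(ddl)
```

Once such a table exists, it can be joined with regular ADB tables in one query, which is how a federated query over several sources is put together.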
ADB ClickHouse Connector
FDW connector for data transmission from Arenadata DB to Arenadata QuickMarts or ClickHouse.
Features:
  • load data transactionally through automatically created staging tables;
  • use multiple table engine families in ClickHouse;
  • flexibly distribute and parallelize the write load.
Roadmap
2023
ADB 6.23.3
  • Implemented Tkhemali connector 2.0
  • PXF JDBC: added processing of the IN predicate for filter push-down
  • Enabled SSL between Client and Master
  • Added PXF Monitoring Grafana Dashboards
ADB Control 4.2.1
  • Arenadata DB Command Center (ADBCC) was renamed to Arenadata DB Control (ADB Control)
  • Optimized storage space with the ability to export metrics to an external database for long-term storage
  • Improved security with CSRF (Cross-Site Request Forgery) protection support
  • Improved authentication security with forced password change at first login and user account blocking after several failed authentication attempts
  • Added the ability to view total monitoring metrics as well as current-time recalculations for active commands
ADBM 1.2.1
  • Ability to use ADBM on PowerPC
  • Ability to restore cluster without mirror segments
  • Improved filtering for Restore actions
ADB 6.22.1
  • Synced with upstream Greenplum Database 6.22.1
  • Upgraded pgbouncer to 1.18
  • Upgraded gpbackup to 1.27
  • Upgraded plcontainer to 2.2
  • Implemented a buffer parameter in the gpcheckperf utility
ADBM 1.1.0
  • Ability to restore from backups on a stopped cluster
  • Ability to delete the last backup from the stanza
  • Ability to restore specific databases from backups
ADB Control 4.1.0
  • Added the ability to use ADB Control along with gpperfmon within the same ADB cluster
  • LDAP search in several Organizational Units (OU) during authentication
  • Flexible sorting on query and transaction monitoring pages
ADB 6.22.0
  • Synced with upstream Greenplum Database 6.22.0
  • Added support for AltLinux 8.4 SP
  • Added Data Domain Boost 1.0.0
  • PXF: allowed setting of Oracle parallel instructions
  • Refactored Planchecker to use an external ADB Control database
  • gpbackup: fixed the metadata order so that now gprestore can restore functions after the tables that are used in functions as a returning type
  • gptkh: fixed fetching of the actual system.tables columns in ClickHouse (according to the ClickHouse version)
ADB Control 3.7.0
  • Added new performance metrics for commands and transactions: Cpu usage total, Read bytes total, Write bytes total
  • Added the ability to repeatedly change a resource group for a transaction
  • Fixed calculating the number of tuples affected by the request
  • Included the Planchecker database objects into migration
ADB 6.21.1
  • Synced with upstream Greenplum Database 6.21.1
  • Enabled core dump files for ADB processes
  • Fixed the problem with loss of resource group slots when moving a query
  • Added a Planchecker image to the ADCC service (docker-compose) in the ADB bundle
ADB Control 3.6.0
  • Added the ability to cancel a transaction
  • Added the ability to reassign queries to another resource group
  • Added the ability to filter commands by a planner
ADB 6.21.0
  • Synced with upstream Greenplum Database 6.21.0
  • Optimized DML queries against partitioned tables to avoid further planning if a partition was pruned
  • Excluded the gpmon background process from the shared memory user list
  • Implemented a fallback to PostgreSQL for an empty target list in CTE producer
  • ADB bundle: added the ability to specify a cluster network
ADB Control 3.5.1
  • A non-blocking socket is now used to communicate with an agent
  • Added monitoring of transactions
  • Added monitoring of SQL statement groups: DDL, DML, DCL, and TCL
  • Added Spill and Spill Skew calculation
ADB 6.20.1
  • Synced with upstream Greenplum Database 6.20.1
  • PXF: added PXF 6.3.0 to the ADB bundle (with ability to upgrade from PXF 5.x)
  • PXF: activated a PXF cluster sync command
  • PXF: added the ability to override data types mapping in external tables for PXF
  • Added ADB ClickHouse connector 1.0.1
ADB Control 3.4.0
  • Actual statistics from EXPLAIN ANALYZE are now processed for finished queries
  • Integrated the average cluster query metrics
  • Implemented compression for huge queries
ADB 6.19.3
  • Synced with upstream Greenplum Database 6.19.3
  • Added ADB Loader tools for RHEL 8
  • Added the ability to deploy maintenance scripts for several databases
  • Enabled the backlog_lock_waits GUC
ADB Control 3.3.1
  • Data audit support
  • Added the Background jobs history page
  • Support for virtual process memory in system metrics of commands
  • Service load ratio support
ADB 6.18.2
  • Synced with upstream Greenplum Database 6.18.2
  • Implemented archive_mode always
  • Added Kafka ADB connector 1.0.4
ADB Control 3.2.5
  • HTTPS support
  • Implemented a backpressure mechanism based on the amount of heap memory occupied on the agent
ADB 6.18.0
  • Synced with upstream Greenplum Database 6.18.0
  • gpbackup: added an explicit order of tables by using pg_class.relpages
  • PXF: added the partitioning query support for Sybase
ADB Control 3.2.4
  • Multi-clusters
  • New system metrics in the query context: CPU, RAM, IO
  • Status updates for hanging queries
  • Adding columns dynamically to the History and Monitoring pages
ADB 6.17.5
  • Synced with upstream Greenplum Database 6.17.5
  • ADB bundle: added the Ready to upgrade status for a bundle upgrade action
  • Fixed low CPU performance on Power with newly added CFLAGS build options
  • Added a build for Power8 LE platform (ppc64le arch)
ADB Control 3.1.3
  • Added an agent build for Power8 LE platform (ppc64le arch)
ADB 6.17.1
  • Synced with upstream Greenplum Database 6.17.1
  • ADB bundle: external database connection for ADB Control
  • Fixed: PostgreSQL query optimizer built a bad plan for replicated tables with indexes
  • adcc-extension: started to retrieve and send an error text
ADB Control 3.1.0
  • New user interface
  • Ability to cancel and terminate queries
  • Time-based Retention Policy
  • Extended information on errors
  • LDAP authentication
ADB 6.16.2
  • Synced with upstream Greenplum Database 6.16.2
  • Shrinking of relation segment files to zero on TRUNCATE and DELETE
  • PXF: removed the tuple count check for JDBC INSERT queries
  • Kafka ADB Connector: allowed users to set custom librdkafka options
  • Implemented the diskquota extension update
  • Added the gp_enable_gpperfmon=on parameter to Master and Segment servers
ADB Control 2.1.1
  • Added JVM arguments for logging
ADB 6.15.0
  • Synced with upstream Greenplum Database 6.15.0
  • Implemented switchover from Master to Standby via ADCM
  • Added the $PXF_CONF and the $PXF_HOME environment variables to PXF hosts
  • Kafka ADB connector: implemented signal handlers to interrupt consuming
  • Ported ADB to Alt Linux 8.2
ADB Control 2.0.3
  • Support for horizontal scaling of ADB Control backend
  • Added the UDS unlink processing, updated the library build for CentOS 7
  • Added the innerQueueCapacity parameter that defines the internal message queue size for an agent
ADB 6.14.1
  • Synced with upstream Greenplum Database 6.14.1
  • Removed online loading of static resources from the ADB Control web interface
ADB 6.14.0
  • Synced with upstream Greenplum Database 6.14.0
ADB 6.13.0
  • Synced with upstream Greenplum Database 6.13.0
  • Added support for auxiliary relations of append-optimized tables in the pgstattuple extension, so that the bloat of those relations can be estimated exactly
  • ADB ClickHouse connector: avoided the intermediate conversion to a byte array
  • Removed the obsolete batching options in ADQM connector (since the TEXT is used now)
ADB 6.12.1
  • Synced with upstream Greenplum Database 6.12.1
  • Added the offset function to Kafka connector
  • Added the rest committed function to Kafka ADB connector
  • Implemented the text format for Kafka ADB connector
  • Provided the JVM_OPTS setting for PXF
  • Supported AVRO logical types in Kafka ADB connector
  • Started to use rd_kafka_query_watermark_offsets to validate partition-offset pairs in Kafka ADB connector