ADH releases
4.0.0
Date: 25.06.2025
Added the Flink History Server component for the Flink service
Added a new Monitoring service. It includes Grafana, Prometheus, and all the necessary exporters and pre-configured dashboards for the following services and components:
The old service is marked as deprecated and will be removed in future releases
Upgraded Trino to 468_arenadata2 (Trino updates)
Upgraded Ozone to 1.4.1_arenadata2 (Ozone updates)
Upgraded SSM to 2.1.0 (SSM updates)
Upgraded Impala to 4.5.0_arenadata1 (Impala updates)
Upgraded Iceberg to 1.6.1_arenadata1 (Iceberg updates)
Upgraded Flink to 1.20.1_arenadata1 (Flink updates)
Upgraded HUE to 4.11.0_arenadata3 (HUE updates)
Upgraded Kyuubi to 1.10.1_arenadata1 (Kyuubi updates)
Upgraded Spark to 3.5.4_arenadata1 (Spark updates)
Upgraded ADQM Spark connector to 1.0.0-3.5.4_arenadata1
Upgraded ADB Spark connector to 1.0.5-3.5.4_arenadata1 (ADB Spark connector updates)
Upgraded Livy to 0.8.0_arenadata1
Upgraded Hive to 4.0.1_arenadata1 (Hive updates)
Upgraded Tez to 0.10.4_arenadata1
Eliminated interservice dependencies and added the ability to install multiple clusters with the required set of services and integrate them manually
The bundle now includes extensions for integrating Flink with Kafka
PyFlink is now available out of the box
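For reference, a minimal PyFlink sketch of reading from Kafka with the bundled connector; the topic name, broker address, and schema are hypothetical, and the exact connector JAR wiring depends on the cluster setup:

# Hypothetical topic, broker address, and schema; assumes the Kafka connector JAR
# shipped with the bundle is already on the Flink classpath.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
t_env.execute_sql("""
    CREATE TABLE events (
        id STRING,
        payload STRING
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'events',
        'properties.bootstrap.servers' = 'kafka-host:9092',
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'json'
    )
""")
# Inspect the resulting table definition
t_env.from_path("events").print_schema()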
HUE now has a pre-configured Trino interpreter
It is now possible to download client configurations from a cluster for HDFS, YARN, Hive, Solr, Core, HBase, Ozone, Flink, Impala, and Spark3
Added the ability to automatically install Java from the Arenadata repository
Added HA support for Impala State Store
Added HA support for Impala Catalog
Implemented Ranger plugin management for Trino
The management capabilities of Trino have been expanded. The following parameters are provided for configuration:
Added the ability to generate SQL Lineage and integrate with Open Metadata for Kyuubi when using the SparkSQL engine
Trino now supports Ozone via the
Added the ability to perform rolling restart for HDFS, ZooKeeper, YARN, and HBase
Implemented support for impersonation to perform actions in SSM
Implemented the ability to configure Ozone Topology awareness
Added the ability to manage logs from ADCM for Kyuubi
Added the ability to enable and configure LDAP authentication for Trino from ADCM
The Add, Remove, Shrink, Expand, and Move actions have been replaced with the Add/Remove components action for Kyuubi, Airflow, ADPG, Solr, Impala, SSM, Zeppelin, ZooKeeper, Flink, HBase, and Ozone
All the Trino configuration properties are now explicitly presented in ADCM
The Manage SSL action now also manages the state of the SSM service
Trino preconfigures
Improved the ability to run Flink in a YARN cluster
The web links update has been added to the Start and Restart component actions
When adding/removing services/components, you can now disable service checking to perform the action faster
Kyuubi now supports the Trino engine
Added hints to the Add/modify node-to-labels mapping and Add/modify node labels actions
Added the ability to edit the /etc/security/limits.d/om.conf file for Ozone in ADCM
Added the
Added the
The ADB Spark connector now respects the order of user-provided schema columns in the ADB batch entry
The bundle now manages directory creation from configuration parameters for the following services:
Further configuration and cluster topology changes are also taken into account
If Trino components had user-defined working directories,
Couldn’t change the SSL options for Hive catalog in Trino
Spark3: no web links were updated after removing the Livy and Connect components
Zeppelin: queries in the Spark interpreter failed because Zeppelin couldn’t create a directory
impalarc wasn’t present on the hosts to which Impala Client was expanded
The log4j2-repl.xml file for Kyuubi had an incorrect template
Installing a standalone Hive Metastore failed with an error
Hive: no web links were updated after removing the Hive TezUI component
An error while removing a decommissioned YARN NodeManager component from a host in maintenance mode
Spark AuthZ plugin did not create audit logs
Spark3 Livy did not work with Ozone
Incorrect parsing of arguments for the
[ADB Spark connector] The service did not work in some cases when the Spark Ranger plugin was enabled
[ADB Spark connector] Reading from a table returned no results if the partitions count was explicitly set to 1
[ADB Spark connector]
Starting with version 4.0.0, the distribution’s name has changed from Arenadata Hadoop (ADH) to Arenadata Hyperwave (ADH). Nevertheless, it’s still possible to perform a direct upgrade
Due to the deprecation of the Sqoop and Spark2 services, it’s mandatory to remove them from the cluster as part of the upgrade. The Spark3 Thrift Server component is also being deprecated, and it’s recommended to switch to Kyuubi
The ability to automatically install Java from the Arenadata repository when installing a cluster was added in this release. However, Arenadata is not an official JDK provider, and this functionality is provided to enable fully automatic installation on systems that do not have the necessary dependencies. In production environments, we recommend using JDKs from official vendors
The minimum ADCM version is now 2.5.0
Implemented package refactoring to eliminate interservice dependencies and configuration generation logic. The current major upgrade may require more time to fully update packages on all hosts. To increase the update speed, it is recommended to increase the number of Ansible forks in the cluster configuration
3.3.6.2
Date: 31.01.2025
Ozone 1.4.1_arenadata1. The following patches are also included: Ozone patches
Trino 468_arenadata1
Core configuration
Upgraded HUE to 4.11.0_arenadata2 (HUE updates)
Upgraded Kyuubi to 1.9.2_arenadata2 (Kyuubi updates)
Upgraded Spark3 to 3.5.2_arenadata2 (Spark3 updates)
Implemented the SPNEGO authentication for the Impala service
Implemented support for RedOS 7.3 in the Enterprise version of the ADH bundle
Implemented the ability to import an ADH cluster into ADS for Kafka tiered storage configuration
The Add, Remove, Shrink, Expand, and Move actions have been replaced with the Add/Remove components action for Kyuubi, Sqoop, Airflow, ADPG, Solr, Impala, SSM, Zeppelin, ZooKeeper, Flink, and HBase
The password is no longer visible when Samba Kerberos is enabled
Removed empty and logically redundant Kerberos-related tasks from the logs when deleting components
Corrected services where
Corrected the naming of the Maintenance/Decommiss DataNodes and Decommiss/Recommiss NodeManagers actions in HDFS
Improved the usability of the Enable custom ulimits parameter
Added the ability to manage log.conf via ADCM for HUE
Added the ability to manage impalad_flags for the Impala interpreter via ADCM for HUE
Improved management of the *-env.sh files for HBase
Added custom configuration for ranger-solr-audit.xml, ranger-solr-security.xml, and ranger-solr-policymgr-ssl.xml via ADCM for the Solr service
Added the ability to manage
Added database, user, and grants creation for HUE when the Restart action is performed with the Metastore DB schema init/upgrade flag active
Provided the ability to configure the following parameters in the configuration group for Hive:
HDFS did not restart after disabling its encryption with the Manage credential encryption action
The JDBC interface did not support retrieving table metadata from ADB/Greenplum in HUE
The Impala jobs did not work with SSL in HUE
Inability to execute commands in Hive JDBC after moving Kyuubi to a new host
Inability to perform
Inability to perform
The
ZooKeeper didn’t get disabled after being moved/removed from hosts
The JDBC connection did not work with the
The keystore and truststore were not hidden after enabling encryption
The Add/Remove actions without changing host-component mapping had excessive tasks
The ADCM status checker did not start with the host boot
The
Error launching the Solr’s Add Server(s) action on an empty host with SSL enabled
The Manage credential encryption action failed if a selected service wasn’t installed
The Hadoop services didn’t have rights to the /var/lock/subsys/ folder during installation
Java 23 is required for the Trino service and must be installed on the hosts manually. Also, add the installation path to the cluster configuration (
To kerberize Ozone, it must be installed in a cluster that is already kerberized
The minimum ADCM version is now 2.4.0
Marked as deprecated and to be removed in future releases: the Spark 2 service, the Sqoop service, and Spark3 Thrift Server (a Spark3 component)
When upgrading to this version, you have to manually install the Core configuration service
3.3.6.1
Date: 30.10.2024
Upgraded HDFS to 3.3.6_arenadata1 with the patches listed below (Hadoop patches)
Upgraded Hive to 4.0.0_arenadata1 with the patches listed below (Hive patches)
Upgraded Tez to 0.10.3_arenadata1
Upgraded Impala to 4.4.0_arenadata2 with the patches listed below (Impala patches)
Upgraded Spark 3 to 3.5.2_arenadata1 with the patches listed below (Spark 3 patches)
Upgraded ADQM Spark Connector to 1.0.0_3.5.2
Upgraded ADB Spark Connector to 1.0.5_3.5.2
Upgraded Flink to 1.19.1_arenadata1 with support for Hive 4.0.0
Upgraded HBase to 2.5.10_arenadata1 with support for Hive 4.0.0
Upgraded hbase-operator-tools to 1.2.0_arenadata3
Upgraded phoenix-queryserver to 6.0.0_arenadata3
Upgraded Phoenix to 5.2.0_arenadata1
Upgraded Iceberg to 1.5.2_arenadata1
Upgraded SSM to 2.0.0-alpha with the following features:
Upgraded Kyuubi to 1.9.2_arenadata1 with the following features:
Upgraded Solr to 8.11.3_arenadata1
Upgraded ZooKeeper to 3.8.4_arenadata1
Upgraded Sqoop to 1.4.7_arenadata3
Upgraded ADPG to 16
Added support for Ubuntu 22.04.2 LTS
Added a new cluster action, Manage Credential Encryption. It encrypts sensitive data in the configuration files for HDFS, YARN, Hive, HBase, Spark, Impala, Zeppelin, Kyuubi, and Solr
Implemented the ability to enable the SSL/TLS protocol for inter-component communication for the Flink service
Implemented HUE SPNEGO support
The distribution includes the following extensions and formats for Flink:
Added log settings for Livy in ADCM
Added the Custom nginx.conf checkbox parameter with a template for the Hive service
Now all the
Changed the
Changed classpath properties to the array type for Flink, Spark, Hive, and Kyuubi
The Add, Remove, Shrink, Expand, and Move actions have been replaced with the Add/Remove components action for Spark2, Spark3, Hive, and YARN
Hive now uses ADPG as the default metastore during installation
Updated the Flink configuration:
Added new sections for YARN in ADCM: mapred-env.sh and Custom mapred-env.sh
Relocated the
Added the Remove action for Airflow and Zeppelin in the community edition
Added the Custom yarn-env.sh parameter to the YARN configuration
Services now notify the user that a restart is needed if the topology of a dependent service has changed
Errors when connecting an external database to the SSM service
Incorrect process ID was written to the PID file for the SSM service
Error related to an undefined variable while enabling the YARN Ranger plugin
Spark3 Thrift Server didn’t restart
Kyuubi: the Check action failed if only the REST protocol was enabled
The
Incorrect permissions on the container-executor binary
Impala: couldn’t enable Ranger on some topologies
HUE: the bundle didn’t change
Flink: excessive parameters on the host when HA was disabled
Incorrect auth-to-local rules in the Spark3 History Server process
Invalid LDAP authentication configuration for Airflow
The SSM service received a major upgrade that is not compatible with the previous version. During the update, all security settings will change and the metastore will be cleaned and migrated
The minimum ADPS version is 1.2.0_b1
The minimum ADCM version is now 2.2.0
3.2.4.3
Date: 23.07.2024
Added the new HUE service
Bumped Zeppelin to 0.11.1_arenadata1
Bumped Spark3 to 3.4.3_arenadata1 and fixed the Hive version regex
Bumped Kyuubi to 1.9.0_arenadata1 and added the missing Impala alias for the JDBC engine
Upgraded Impala to 4.4.0_arenadata1 with the following patches:
Added the Kyuubi AuthZ plugin for Spark3
Implemented support for the Samba domain controller
Implemented the Apache Iceberg support for Spark3
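A minimal sketch of using Iceberg from Spark3, assuming the bundled iceberg-spark-runtime JAR is available to Spark; the catalog name, namespace, and table below are hypothetical:

from pyspark.sql import SparkSession

# Register a hypothetical Iceberg catalog named "demo" backed by the Hive Metastore
spark = (
    SparkSession.builder
    .appName("iceberg-example")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hive")
    .getOrCreate()
)

spark.sql("CREATE NAMESPACE IF NOT EXISTS demo.db")
spark.sql("CREATE TABLE IF NOT EXISTS demo.db.events (id BIGINT, ts TIMESTAMP) USING iceberg")
spark.sql("INSERT INTO demo.db.events VALUES (1, current_timestamp())")
spark.sql("SELECT * FROM demo.db.events").show()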
Implemented the Apache Iceberg support for Hive
Added the Spark3 and PySpark support for Zeppelin
Implemented the LDAP authentication for Kyuubi
Implemented the LDAP authentication for Impala
A new Manage SSL action has been added instead of Enable/Disable SSL, which allows you to manage the SSL encryption of all services in an ADH cluster more conveniently
Custom properties now overwrite the existing ones, even if they are read-only
Provided the ability to add an additional Livy Server for Spark3
Added the ability to specify multiple directories for Impala
The Arenadata PostgreSQL service can no longer be removed when Airflow2 is installed
Now the
Now the
Setting the
Couldn’t change a web port for Solr
No changes were applied to the log4j configuration for the HBase service
The Kerberos parameters disappeared from the configuration file after Spark3 History Server restart
The Restart YARN action didn’t work when the
Minor errors during Impala installation on Red Hat
The Impala service configuration wasn’t applied during a cluster start
The Precheck packages parameter being set to
3.2.4.2
Date: 27.04.2024
Date: 27.03.2024
Added the SSM service
Added the Kyuubi service
Added the JDBC checks
The Manage Ranger plugin action is now customizable and more understandable
ADQM Spark connector is now included in the ADH bundle
Added the Spark Connect component for the Spark3 service
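A minimal client-side sketch for the new Spark Connect component, assuming a hypothetical host name, the default Spark Connect port 15002, and a PySpark client with the Connect dependencies installed:

from pyspark.sql import SparkSession

# Connect to a remote Spark Connect endpoint instead of starting a local driver
spark = SparkSession.builder.remote("sc://spark-connect-host:15002").getOrCreate()
spark.range(10).selectExpr("id", "id * 2 AS doubled").show()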
Corrected permissions on directories and files in the bundle to support the ability to install the product on a file system with umask of
Added the ability to manage environment variables for Hive
Added the ability to manage environment variables for HBase
Bumped Spark to 3.4.2_arenadata1
Bumped Hive to 3.1.3_arenadata6
Bumped ADB Spark connector to 1.0.5-spark-3.4.x
Incorrectly generated hdfs-site.xml during the YARN’s Manage Ranger plugin action
Spark3 didn’t pick up SSL settings after the Remove/Install actions
ADB PySpark connector errors
LDAP didn’t work on Airflow2
Faulty removal of Spark2 in ADH 3.2.4
HDFS start balancer failed on Astra Linux
Wrong paths of logs for Impala in ADH v3.1.2_arenadata1_b1-2
HiveServer2 couldn’t start after upgrading ADH to 3.2.4 due to the SAN section in certificates
The SSM service is currently in the technology preview state and is not intended for use in a production environment. It is under development and is provided to clients for review
The path to the Python interpreter for the Spark 2/3 services has been changed from /opt/python3.10/bin/python3 to /opt/pyspark3-python/bin/python3. Take this into account when setting the
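For reference, a sketch of pointing a PySpark application at the relocated interpreter from the client side; the exact ADCM parameter to adjust is not shown here, this only illustrates the standard PySpark environment variable:

import os
from pyspark.sql import SparkSession

# Make executors use the relocated interpreter (client mode); PYSPARK_DRIVER_PYTHON
# would normally be set in the shell before launching the driver itself
os.environ["PYSPARK_PYTHON"] = "/opt/pyspark3-python/bin/python3"

spark = SparkSession.builder.appName("python-path-example").getOrCreate()
print(spark.range(5).count())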
To install Impala on Red Hat, you need to manually install the
The minimum ADCM version is now 2.0
3.2.4.1
Date: 26.12.2023
Upgraded Flink to 1.17.1_arenadata1 with the FLINK-32976 patch
Upgraded Tez to 0.10.1_arenadata1
Upgraded ADB Spark Connector to 1.0.5-spark-3.3.x with performance boosts, security improvements, and bug fixes
Added the jdbc-tools 1.0 package
Upgraded Spark2 to 2.3.2_arenadata2 with the SPARK-31644 patch
Upgraded Hadoop to 3.2.4_arenadata1 with the following patches:
Added zstd support in HDFS
Added PMDK support in HDFS
Upgraded Spark3 to 3.3.2_arenadata1 with the SPARK-39910 patch (internal development)
Upgraded Spark3 Livy to 0.7.2_arenadata5
Upgraded Sqoop to 1.4.7_arenadata2
Upgraded Phoenix Query Server to 6.0.0_arenadata2
Upgraded Phoenix to 5.1.3_arenadata2
Upgraded HBase operator tools to 1.2.0_arenadata2
Upgraded Hive to 3.1.3_arenadata5 with the following patches:
Upgraded HBase to 2.4.17_arenadata1
Upgraded Airflow to 2.6.3
Eliminated the vulnerability of the log4j library present in versions up to 2.15
Added log settings for Solr in ADCM
Added log settings for ZooKeeper in ADCM
Excluded Airflow1 from the bundle
Added the Thrift Server component for Spark3
Added the template for changing Ulimits to the ADCM component configuration
Added support for AstraLinux 1.7 SE "Орел"
Reconfigured Hive connection settings to the metadata store database to improve flexibility
Updated the Postgres JDBC driver supplied with the distribution
Existing keytabs are no longer completely recreated when expanding/installing services
Error adding roles to shiro.ini when setting up Zeppelin
Spark3 Server failed to start after upgrade
Due to the Airflow 1 deprecation, it’s mandatory to remove it from the cluster as part of the upgrade. Airflow 2 can be installed instead
The minimum ADPS version is 1.1.0
3.1.2.1
Date: 20.10.2023
Upgraded Flink to 1.16.2
Incorporated the following upstream updates:
Bumped Solr to 8.11.2 with a fixed vulnerability of the log4j library
Bumped Sqoop to 1.4.7 with a fixed vulnerability of the log4j library
Bumped HBase to 2.2.7
Bumped Phoenix to 5.13
Implemented Impala support
ADB Spark 3 Connector is now included in the ADH bundle
Implemented hbck2 (an HBase component)
Introduced the Maintenance mode, which provides the ability to remove any node from a cluster
Introduced High Availability auto-management for ADH services
Added the SQL Gateway component for Flink
Added PySpark 3 for customer installation
Added Knox SSO authorization for Zeppelin
Tweaked Kerberos management (enable, disable, configure)
Added logging settings for Spark in ADCM
Added the Precheck packages cluster parameter that enables/disables package checks. By default, the checks are disabled
Added logging settings for Sqoop in ADCM
Airflow 2 services failed after kerberization and restart
Metadata and statistics errors for Hive
Hive configuration failed without an installed ZooKeeper on CE
The problem with config groups for Hive
Spark 3 check failed if Spark 2 was not installed
Inability to remove Spark 3 without mapped components
The problem with HDFS Balancer with enabled Kerberos
The minimum ADPS version is 1.0.5
The minimum ADCM version is 2023.10.10.08
2.1.10
Date: 21.06.2023
Added the ability to select a TLS version for ADH services
Added support for custom Zeppelin interpreters
Added the new Spark History Server component for Spark3
The following upstream updates have been incorporated:
ResourceManager high availability mode activates automatically
Spark can work with a custom Hive Metastore
Added SSL support for Hive Metastore
Added the Remove action for the MariaDB service with the
Updated the Spark3 version to 3.3.2
ZooKeeper: added links to Admin Server endpoints on the ZooKeeper page in ADCM
Enhanced the Move action behavior
Fixed: couldn’t remove a faulty installed Airflow 1 after Airflow 2
Zeppelin: fixed incorrect Hive JDBC string with enabled Hive HA
Hive: fixed the timezone for
Fixed NiFi hive3streaming
Fixed GROUPING/DISTINCT limitations for Hive tables with 64+ columns
Hive: fixed the
ZooKeeper: changed the default port for Admin Server
2.1.8
Date: 02.03.2023
Airflow2: added the high availability mode
Airflow2: added LDAP authentication/authorization support
Airflow2: added support for external broker configuration
Hive version updated to 3.1.3
The following upstream updates have been incorporated:
Changed the
The
HDFS: removed RW permissions from the
Spark3: added the ADH Python package to the installation routine
Implemented a custom configuration for container-executor.cfg
Fixed CORS errors in ResourceManager UI2
Fixed: no retries for the Disable balancer task during the HBase shrink action
Fixed: missing timestamps in the Ansible log output in ADCM
Fixed: the configuration for checker-thriftserver is not overwritten during reconfiguration
Fixed: the
Fixed: status checker does not run for Spark3 after reinstalling
Fixed the checker-thriftserver conflict with other services
Fixed wrong setting of
Fixed the NameNode stop sequence for the JournalNode expand action
Fixed: HDFS doesn’t stop with enabled Kerberos
Fixed issues with host cert key permissions
Fixed .jceks file permissions
Fixed: Sqoop Hive import fails with enabled Ranger plugin
Fixed the NameNode stop action behavior
Airflow2: fixed support for external database connections
Airflow2: fixed Redis configuration
Airflow2: fixed template usage for airflow.cfg (cfg_properties_template)
Airflow2: fixed permissions for Redis config directory
Airflow2: fixed the Logging level list
Fixed a failed check of whether a Tez submit creates an entry in Timeline Server
Fixed the ZooKeeper zkCli.sh error with enabled Kerberos
Fixed: Spark checks failed after YARN expand
Fixed errors with disabling Kerberos
ADH MySQL Service: changed the display name and version
Fixed the description for
Fixed the description for
Updated the
Updated offline package versions
2.1.7
Date: 20.12.2022
Added the livy-spark3 component to the Spark3 service
Added Hive delegation token
Added the Apply configs from ADCM checkbox for all services
Flink build 1.15.1 is available
Added the ability to connect to Flink JobManager in the high availability mode
Added package check optimizations for the installation
Added a cleanup of MIT credentials after disabling Kerberos in an ADH cluster for the following services:
Passwords are now hidden for the Config check and Enable SSL actions for the following services:
Refactored Livy impersonation options
Added the ability to configure Livy impersonation
Added additional
The Disable HA HiveServer2 action uses the
Performed action optimizations
Added the ability to delete a service in the
Fixed the High Availability mode activation for YARN ResourceManager
Fixed: failed to enable SSL with Airflow2 on AltLinux
Fixed: Flink TaskManager doesn’t start in the High Availability mode
Fixed: no metrics when the Monitoring service was installed in a wrong order
Fixed: missing Spark3 actions Add Spark3 Livy and Remove Spark3 Livy
Fixed: Flink upgrade from 2.1.6 to 2.1.7 fails if Job Manager is not collocated with the HDFS Client
Fixed: missing parameter
Fixed: Flink installation failed
Fixed the Manage Ranger plugin action in HBase
Fixed SSL settings for the following services:
Fixed: the Enable SSL action failed due to the absence of ranger-hbase-plugin
Fixed the upgrade from 2.1.6.b4-1 to 2.1.7.b1-pre_rc
Fixed: incorrect minimum version for an upgrade
Fixed: the Enable HA/Disable HA actions lead to a configuration divergence of hive-site.xml on servers
Fixed the Disable HA HiveServer2 action error
Fixed the HBase SASL error while connecting to a kerberized ZooKeeper (no hbase-jaas.conf)
Fixed the error when expanding Hive Server 2 on a host with Hive Metastore
Fixed the Add Hive Metastore action
Fixed: no description for the Reliability Control → timeout field
Fixed: the Scheduler type does not pass to the yarn_scheduler_jmx filter
Fixed: passwords shown in the Ansible log during the Enable SSL action
Fixed: wrong SPNEGO keytab value when expanding Hive Server 2
Fixed the components list on the Hosts - Components page
Fixed: need to install the jdbc-mysql driver on nodes with the hive-client service when internal MariaDB is used
The Enable Ranger plugin action for Hive failed if no YARN service and policies are defined in Ranger
Fixed: enabling SSL repeatedly failed
Fixed: the Spark Thriftserver check fails due to FairScheduler being used
Fixed: YARN reconfiguration fails if FairScheduler is enabled
Fixed errors with the order of host decommissioning
Fixed the bug that prevented running Flink on YARN
Added
Fixed naming for service configs (HDFS, HBase, Hive)
Fixed naming for ADH Service actions
Fixed naming and typos for ADH install actions
Changed the sequence of actions for Kerberos, SSL, and cluster installation
2.1.6
Date: 16.09.2022
Added support for AltLinux 8.4
Added support for customization of ldap.conf via ADCM
Added support for FreeIPA kerberization
Improved error handling using cURL
Hive logging refactored
Fixed: cannot change parameters in ranger-yarn-policymgr-ssl.xml (YARN)
Fixed:
Fixed a configuration name after Flink restarts
Fixed: Zeppelin user does not exist when enabling Ranger for YARN
Fixed: cannot change nameservice when faulty installed (HDFS)
Fixed: a cluster upgrade always switches state to
Fixed: a Spark 3 upgrade does not disable old repositories on Alt Linux
Fixed: the Reconfigure Kerberos action has no
Fixed: YARN Resource Manager does not start on Alt 8.4
Fixed
Fixed inconsistency between security settings in Spark and Spark3 configs
Changed check code due to Scala version upgrade
2.1.4
Date: 08.08.2022
Added the ability to specify external nameservices
Added the ability to connect to HiveServer2 in the fault-tolerant mode
Cluster component states refactored
Refactored the order of Stop, Start, and Restart actions for the HDFS service
Enhanced monitoring metrics collection by YARN queues
Removed the read-only attribute from the
Fixed the cluster kerberization status error after a bundle upgrade
Fixed: Ansible variable is not resolved during HDFS installation
The
Fixed permissions for the
Date: 16.06.2022
The ability to install ADH components from a custom Docker registry is added
The Rewrite current service SSL parameters checkbox is added for the Enable SSL action
New parameters are added for the Zeppelin Hive interpreter to enable SSL and Kerberos
Retries for generating Kerberos principals are implemented |
Custom authentication (LDAP/AD) is enabled for Hive2Server |
The Move action is added for Spark Livy Server |
The Move action is added for Spark History Server |
The Move action is added for Flink Job Manager |
The Move action is added for Sqoop Metastore |
The Move action is added for YARN Timeline Server |
The Move action is added for YARN MapReduce History Server |
The Ranger plugin for Solr authorization is added |
The ability to remove services from the cluster is added |
The ability to customize configuration files via ADCM is added |
The support of Kerberos REALM is added |
The Solr connection for audits is changed from Solr server to ZK node |
Changing the property type from |
SSL default configuration parameters are changed from invisible to read-only |
Fixed: duplicate dictionary keys in the config.yaml file that did not pass the new YAML validation in ADCM |
Fixed: the error with Zeppelin installation without Hive |
Fixed: users could change some read-only Kerberos-related parameters in services |
Fixed failing jobs when enabling the GPU on YARN property |
Fixed the error with applying the Remove service action to Spark3 |
Fixed the error with an incorrect value in interpreter.json in Zeppelin when SSL is off
Fixed: after enabling SSL, policies did not work in Ranger
Fixed: the Container DN format when applying Enable Kerberos |
Fixed the error with incorrect saving the |
Fixed: YARN failed to copy container-executor.cfg |
Fixed: job status and result did not match when deleting optional (unnecessary) settings in the ZooKeeper service configuration |
Fixed: the action Check applied to Flink failed if hosts in ADCM had uppercase letters |
Fixed: services did not collect policy from Ranger in SSL |
Fixed: applying the action Remove internal database actually removed the service itself |
A fixed Ranger Solr plugins repository is added for ADH 2.1.4 |
The order of bundle upgrades is changed from particular to general |
Dependencies between components and services at the ADCM level are implemented |
Ranger Plugins are bumped to 1.0.3 |
The ability to download ADH offline packs from the Arenadata source directory to the customer proxy repository is added |
Date: 31.03.2022
The Kerberos authentication is enabled for Web UI |
SSL for Ranger plugins is enabled |
SSL for Flink is enabled |
SSL for Sqoop is enabled |
The rollback operation is enabled in the case of a failed kerberization process
SSL for Zeppelin is enabled |
SSL for Airflow is enabled |
SSL for Spark is enabled |
SSL for Solr is enabled |
SSL for Hive is enabled |
SSL for HBase is enabled |
SSL for YARN is enabled |
The ability to configure SSL in the Hyperwave clusters is added |
SSL for HDFS is enabled |
The Custom hive-site.xml block is placed after the hive-site.xml block in the configuration settings |
The links to NameNodes and HttpFS are moved to the top of the HDFS web links list |
The order of cluster stop actions is reversed |
The Reconfig and restart action is replaced by the Restart action that runs three operations: stops the service, applies configuration parameters, and starts the service |
The ability to execute the resourcemanager_enable_ha action without changing |
The ability to execute the resourcemanager_expand action without enabling High Availability (HA) is disabled for Resource Manager, since it does not work without enabling HA |
Fixed: the parameters from the httpfs-site.xml configuration file did not apply to the HttpFS service |
Fixed: SQL queries launched from Spark3 or Spark2 did not work correctly with the Ranger Hive plugin being enabled |
Fixed: the |
Fixed: the Phoenix Query Server could not work in the thin mode in the kerberized environment |
Fixed: the error with running spark-thrift-server-checker after enabling and disabling Kerberos |
Fixed: the error with saving the Flink configuration parameters after installation in the kerberized environment |
The ability to work with kerberized ADH clusters is fixed for the Windows operating system
Fixed the |
Fixed the error with checking privileges to the |
The cluster installation errors at the HBase check stage are fixed |
Fixed: application logs were unavailable in the legacy Resource Manager UI of the kerberized clusters |
Fixed: Kerberization on AD failed if two instances of Ranger Admin were installed |
Configuring SSL settings is added before enabling SSL in autotests |
Web links are rewritten to support the http/https schema change |
Date: 21.12.2021
Refactoring of the database for Hive Metastore checks is done |
The error with the |
Fixed the error with MapReduce jobs launched in the kerberized cluster not under the |
Fixed: mapped but not installed services caused the errors via installation of other services |
The HTTP mode is added for HiveServer2 |
The AD/LDAP/SIMPLE authorization is added for Zeppelin |
The HBase REST Server component is added for HBase |
The ability to use Active Directory as Kerberos storage is implemented |
The ability to set Kerberos principal for running Spark jobs via YARN is added. Before that Spark always launched tasks using the |
Fixed the error with DataNodes expanding via the Add DataNode action in the kerberized environment |
Fixed the error with the Livy interpreter working in the kerberized environment |
Fixed the error with the Hive interpreter working in the kerberized environment |
Fixed the error with Zeppelin checks after Kerberos activation |
Fixed the error with removing the monitoring component jmxtrans |
Fixed: the Enable Resource Manager HA action failed in the clusters with Kerberos and Ranger plugin being enabled |
Fixed the error with Airflow installation in the kerberized environment |
Fixed the error with enabling Kerberos after its disabling (enable → disable → enable) |
The full stack testing for using the RedHat 7.9 enterprise license in ADH is added |
Date: 01.11.2021
The Reinstall status-checker action is implemented. It runs the status-checker deployment scripts for services as well as for Docker containers |
The Solr check is changed: the number of live Nodes is compared instead of lists |
The timeout/retry count is increased for the Zeppelin check |
Fixed: the bundle version update error |
Fixed: the Ranger plugin worked incorrectly in the case of using some characters in the cluster name |
Fixed: the error with the inconsistent state of the actual DataNodes maintenance state after upgrading ADH from 2.1.3 to 2.1.4 |
Fixed: Keytabs permissions changed during some actions |
Fixed: the error with per-service installation in the kerberized ADH clusters |
Fixed: the error with parsing the list of containers by the Docker status checker in Airflow |
The error with the heap size test is fixed |
The broken compatibility with the current dev version of ADCM is fixed |
The test logic for per-service installation in the kerberized ADH clusters is changed: before each service installing it is necessary to add the service to the cluster and add its components to hosts (instead of adding all components to all hosts) |
Date: 30.09.2021
The MIT Kerberos integration is implemented in ADCM |
The ability to add the custom port for Kerberos Server is added |
Ranger plugin and kerberized YARN are integrated |
Ranger plugin and kerberized Hive are integrated |
Ranger plugin and kerberized HBase are integrated |
Ranger plugin and kerberized HDFS are integrated |
The Ranger plugin is made operable on kerberized services |
The split memory option is added for Hive services: resource management options can be configured for HiveMetastore and HiveServer2 separately |
The edit memory size option is added for Flink components |
The edit memory size option is added for Solr components |
The edit memory size option is added for Sqoop components |
The edit memory size option is added for Spark components |
The edit memory size option is added for Zeppelin components |
The edit memory size option is added for HBase components |
The edit memory size option is added for YARN components |
The Add/Remove actions are added for YARN Timeline server |
The Add/Remove actions are added for Sqoop Metastore |
The edit memory size option is added for HDFS components |
The ADH memory management option is added |
The Add/Remove actions are added for Flink Job Manager |
The Add/Remove actions are added for Spark Thrift Server |
The Add/Remove actions are added for Spark Livy Server |
The Add/Remove actions are added for Spark History Server |
The Add/Remove actions are added for Hive Tez UI |
The Move action is added for YARN MapReduce History Server |
Kerberos is implemented for ADH in ADCM |
The ability to move any service component to another Node or remove it from the cluster is added |
The unnecessary repository/packages check at the HDFS installation step for ADH EE is removed |
The path for docker-status-checker files is changed |
Fixed the error with the |
Fixed the error with Solr not working after applying host actions |
Fixed the error with the Enable Resource Manager HA action in the kerberized environment |
Fixed the error with Spark expanding/shrinking in the kerberized environment |
Fixed the error with Solr shrinking in the kerberized environment |
Fixed the error with YARN Node Manager expanding in the kerberized environment |
Fixed the error |
Fixed the error with Flink Server expanding/shrinking |
Fixed the error with Flink JobManager Server Port 6123 availability |
Fixed the error with Sqoop expanding in the kerberized environment |
Fixed the error with YARN Server expanding/shrinking |
Fixed: the Solr role tried to import the absent monitoring role even with the monitoring service being not installed |
Fixed the error with running the Install service action for Solr |
Fixed the error with Spark Livy server expanding in the kerberized environment |
Fixed the error with Spark Thrift Server shutting down in the kerberized environment |
The Solr shared memory reservation error is fixed |
Fixed the error with Solr kerberization with no Hadoop services being added |
Fixed the error with starting jmxtrans after the host reboot |
The incorrect URL for the Hive Server Web UI is fixed |
Fixed the error with opening link to the HiveServer2 UI after ADH installation |
Fixed the error with availability of host actions after the cluster upgrade from 2.1.3.0 to 2.1.4.b2 |
Fixed: the Enterprise cluster installation failing during the per-service action |
Fixed: the Reconfig and restart action failed for the Monitoring service with Airflow being installed |
Fixed: the Spark ThriftServer process did not stop after the Spark context being killed |
In order to speed up autotests and development process, the packages check is made optional for the specified environments |
The http and registry versions are bumped to the current ET release |
The specifications for new Spark and YARN MapReduce History Server actions are added |
Date: 20.07.2021
The ability to use external MySQL in Airflow is added
The ability to use external PostgreSQL in Airflow is added |
Host actions are added for the Spark3 service. Host actions here and below mean the actions managed at the host level
Host actions are added for the Monitoring service |
Host actions are added for the Sqoop service |
Host actions are added for the Airflow service |
Host actions are added for the Solr service |
Host actions are added for the Flink service |
Host actions are added for the Zeppelin service |
Host actions are added for the Spark service |
Host actions are added for the MySQL service |
Host actions are added for the Hive service |
Host actions are added for the HBase service |
Host actions are added for the YARN service |
Host actions are added for the HDFS service |
The Sqoop Check action is modified according to the new Hive external DB variables |
Host actions are renamed |
The unnecessary solr-tools.jar file is removed from the Solr submodule in the bundle, as it caused errors in CI |
The error with offline installation in the operation system RH 7.9 is fixed |
The error with applying the Reconfig and restart action to the Monitoring service is fixed |
Fixed the error with installing MySQL on the host from which it was removed earlier |
Fixed the error with the |
Fixed the error with the Maintenance DataNode action that occurred due to the incorrect content of the dfs.hosts file (if another DataNode has been switched to the maintenance state earlier) |
For debugging possible problems via Allure reports, logs collecting is implemented for Airflow service |
Fixed the wrong description in the autotest that implements migration to the external MySQL database |
Specifications for testing host actions are changed |
Tests for host actions are added |
Specifications and autotests are added for the ADH shrink scenarios |
Date: 22.06.2021
The ability to define custom HBase environment variables is added |
The action for removing MySQL from the ADH cluster is added |
The ability to use external PostgreSQL in Hive Metastore is added |
The ability to change the Hive Metastore |
The ability to configure Java Heap for HiveServer2 is added |
The ability to add/change/remove configuration options from the httpfs-site.xml file via ADCM is added |
Start checks for JournalNodes and NameNodes are added |
Spark 3.1.1 is implemented for ADH 2.X |
The offline installation is implemented for ADH |
The Check action is improved for Sqoop |
The build process for Solr is changed: Arenadata repositories are used instead of external Maven repositories
The ability to use Docker Registry from Arenadata repository is implemented |
In order to install services (e.g. Airflow) without DNS, host name resolution in the Docker containers is implemented
Airflow installation without DNS is implemented |
Implemented the DN check/wait for membership in the cluster from the DN itself |
The Spark component Spark History Server is made mandatory |
Refactoring of the ZooKeeper service is done |
Fixed: Hive installation failed when adding the Tez component without TezUI |
Fixed: the duplicate key |
Fixed: YARN applications could not run jobs after the Ranger plugin being enabled |
Fixed the error with the Enable Resource manager HA action in the ADH Community Edition |
Fixed the error with Sqoop installation after Ranger being installed |
Fixed problems with HBase logging |
Docker images in the package specifications are changed according to the new naming convention |
Packages for the 2.1.4 release are uploaded to the Google repository |
Fixed the logs collecting for HttpFS during autotests |
The repository for the 2.1.4 version of the product is created |
The ability to update a bundle without rebuilding packages is added |
Unnecessary garbage files are removed from the bundle build archive |
To resolve ansible |
Tests for checking the integration between ADH and Ranger are added |
Build 2.1.3.1 ADH |
2.1.3
Date: 14.01.2021
The Remove/Add Hive Tez actions are added |
The Add diamond and Remove diamond actions are added |
Build Ranger 2.0.0 |
The logic of the YARN Resource Manager expanding is changed |
The validation logic for Spark Client is changed from |
ADPS integration: the new ADPS bundle that contains Ranger has to be re-integrated with ADH after moving Ranger from it |
Fixed the error with closing Hive tasks after finishing checks |
Livy checks are temporarily disabled |
Fixed the error with bad cluster name that occurred when creating the HDFS service via Ranger Admin |
Fixed the error with the HDFS action Remove Client |
Fixed unsuccessful Hive CLI checks after the Ranger plugin being enabled |
Fixed the error with connecting multiple clusters to one ADPS |
The repository for plugins is added to the release bundle |
Packages for the 2.1.3.0 release are uploaded to the Google repository |
Build 2.1.3.0 ADH |
ADH is bumped to 2.1.3.0 |
New repositories for Ranger plugins are added |
Specifications on the workaround for the error with HBase expanding after HDFS expanding are edited |
Specifications on expanding ADH services are created |
2.1.2
Date: 19.11.2020
Client components for Flink are added |
Client components for HDFS are added |
Client components for YARN are added |
The timeout for the |
The default port number for MySQL in the Airflow Metastore is changed |
The volume for Hadoop configurations (e.g. /etc/hadoop/conf/, /etc/hive/conf, etc.) in Docker images with Airflow is increased |
Fixed the race condition within Sqoop checks (part 2, with multiple Clients)
Fixed the cluster installation error that occurred when MySQL was being installed
Fixed the cluster installation error that occurred during checks after installing Spark
Packages for the 2.1.2.5 release are uploaded to the Google repository |
Building the offline package for ADH to the ADH repository is implemented |
ADH is bumped to 2.1.2.5 |
Tests for the YARN/HDFS Client are created |
Specifications for autotests of the YARN/HDFS Client are added |
Changes in the Hadoop tests related to uploading bundles are made |
Autotests for Airflow are created |
All file accesses are made independent from the current working directory in autotests |
The dev repository for ADH 2.1.2.5 is initialized |
Date: 15.09.2020
The ADH bundle is divided into community and enterprise versions |
The High Availability for NameNodes is implemented |
Fixed the error that occurred at the Restart NameNodes step during the Remove NameNode action |
Fixed the error with checking Hive Tez on multiple hosts |
Fixed the error with switching dynamic allocation |
Fixed: ZKFC ignored the |
Packages for the 2.1.2.3 release are uploaded to the Google repository |
All specifications and BOMs related to ADH20 are moved to the prj_adh. Publishing of artifacts to the artifactory is changed |
The release and develop repositories are segregated in bundles |
Date: 05.06.2020
The |
The race condition within Sqoop checks is fixed |
Fixed the error with running the cluster Check action |
Packages for the 2.1.2.2 release are uploaded to the Google repository |
ADH is bumped to 2.1.2.2 |
Nginx is copied from the Epel repository to the ADH2 repository |
Date: 21.05.2020
Sqoop deployment is ported to ALT Linux |
Solr deployment is ported to ALT Linux |
Flink deployment is ported to ALT Linux |
The public ALT Linux repository for ZooKeeper 3.4.14 is created |
Airflow deployment is ported to ALT Linux |
The ability to set |
Sqoop is added into the ADH bundle |
ADH 2.X packages are built for ALT Linux |
Solr 8.2.0 is added for ADH 2.2 |
Refactoring of the ADH deployment process for ALT Linux is made |
The error with commissioning/decommissioning Nodes via ADCM is fixed |
Fixed the error |
Fixed the ordering of Generic components |
Fixed: web links in ADCM did not refresh after HDFS DataNodes or YARN Node Manager shrinking |
Fixed the error that occurred with YARN 3.1.2 in ALT Linux during ansible tasks |
Fixed the absence of |
The /var/run/sqoop directory is created for Sqoop Metastore |
The missing dependency for Flink-related packages is added |
Fixed the error with installing HBase and Solr when using the external ZooKeeper |
Airflow deployment is disabled (visible in ADCM and only for ALT Linux) |
The public repository for the release is changed |
Packages for the 2.1.2.1 release are uploaded to the Google repository |
Changes for libisal are merged |
Changes for bigtop-groovy, bigtop-jsvc, bigtop-tomcat, and bigtop-utils are merged |
Changes for Bigtop are merged |
Changes for Livy are merged |
Changes for ZooKeeper are merged |
Changes for Zeppelin are merged |
Changes for Spark are merged |
Changes for Phoenix are merged |
Changes for Tez are merged |
Changes for Hive are merged |
Changes for HBase are merged |
Changes for Hadoop are merged |
Bigtop branches for CentOS and ALT Linux are manually merged
The repository url is changed to 2.1.2 |
Autotests for ADH services are reviewed according to the current stack version |
Date: 19.02.2020
The ability to configure Hive ACID is added |
SELinux is disabled for all components during installation |
Support of the Flink 1.8.0 is implemented for ADCM |
Flink is added into the ADH bundle |
The logic of the Shrink action is improved |
GPU support is enabled for YARN |
Airflow is added into the ADH bundle |
The UI link is added for Solr at the main ADCM page |
The Shrink/Expand actions are implemented for HDFS HttpFS |
HDFS HttpFS checks are implemented |
The Solr Cloud Mode is implemented |
The Solr deployment is implemented |
Solr is added into the ADH bundle |
Tez libraries are installed on Hive Client Nodes |
Fixed: it was impossible to use Hive with Tez due to the configuration mismatch |
Fixed the error with saving configurations for HDFS and YARN |
Fixed the error with HBase checks after installation |
Fixed the error with YARN checks in the HA mode |
Tests/example DAGs for checking the Airflow functionality are added |
Tests for checking the Solr functionality are added |
2.1.1
Date: 21.11.2019
YARN Scheduler configuration is implemented |
HDFS mover is implemented |
The cluster-wide Install button is added to the ADCM UI |
The ability to define the external ZooKeeper in the core-site.xml file is added |
The ability to add custom/advanced configuration parameters to the *-site.xml files is added |
YARN Node labels are implemented |
HDFS HttpFS is implemented |
HDFS Short-Circuit Local Reads are implemented |
HDFS Disk Balancer is implemented |
HDFS Balancer is implemented |
The *-site.xml files are unified |
Asserts and fails are replaced with adcm_check |
Monitoring is refactored: code/dashboards are unified, metrics are redesigned, etc. |
The hostname variable is removed from the Zeppelin PID definition |
The HDFS dashboard is divided into HDFS and YARN dashboards in Grafana |
Hadoop PID file names are changed |
Manual testing of the ADH 2.1 installation is performed according to the documentation |
2.1.0
Date: 10.10.2019
Implemented the ability to get a status for the following services:
Implemented service management for the following services:
Prepared deployment scripts for the following services:
Implemented service checks for the following services:
Implemented the deployment of the following services:
The following builds are available:
Monitoring features are implemented for the following services:
Necessary configurations for Hive/Tez are added |
The |
Necessary configurations for Hadoop services are added |
Quick links for services are added |
The HDFS rack awareness is implemented via custom scripts
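A minimal sketch of such a topology script with a hypothetical host-to-rack mapping; Hadoop invokes the script configured in net.topology.script.file.name with host names or IP addresses as arguments and expects one rack path per line on stdout:

#!/usr/bin/env python3
import sys

# Hypothetical static mapping; real clusters usually generate this from inventory
RACKS = {
    "datanode-1.example.com": "/dc1/rack1",
    "datanode-2.example.com": "/dc1/rack2",
}

for host in sys.argv[1:]:
    print(RACKS.get(host, "/default-rack"))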
The YARN and MapReduce services are combined into a single service
The Resource Manager High Availability is implemented |
Checks for Decommission/Recommission for Node Managers are implemented |
Checks for Decommission/Recommission for DataNodes are implemented |
Zeppelin is bumped to 0.8.1 |
Zeppelin is implemented for ADCM |
Tez UI is implemented for ADCM |
The ability to add a new Node Manager to ADH is added |
The ability to add new DataNodes to ADH clusters is added |
Spark is implemented for ADCM |
Ranger is bumped to 1.1 |
The ZooKeeper Quorum configuration is added |
The MySQL role is added to the ADH bundle (as a service) |
Multiple configuration directories for Nodes are implemented |
The YARN logs aggregation is enabled |
Spark and Hive roles are reviewed |
The Hadoop role is divided into HDFS, YARN, MapReduce |
The Hadoop role for ADCM is refactored |
Separate roles for Hadoop are implemented |
The ZooKeeper service role is ported from the ADS Bundle |
Basic YARN service features are refactored |
Fixed the error with the |
Pre-release preparations are made for ADH 2.1.0 |
The EULA.txt file is added to the bundle root |
The repository for ZooKeeper packages is added to ADH |
All ADH bundle submodules are switched to Master |
Documentation on Decommission/Recommission/HA is prepared
Documentation on HBase deployment via ADCM is prepared |
Documentation on Spark deployment via ADCM is prepared |
Documentation on Hive deployment via ADCM is prepared |
Documentation on YARN deployment via ADCM is prepared |
Documentation on HDFS deployment via ADCM is prepared |
Documentation for the ADH bundle is prepared |
Spark autotests are implemented |
Hive autotests are implemented |
YARN autotests are implemented |
HDFS autotests are implemented |
Smoke tests for the Livy Server service check are prepared |
Smoke tests for the Spark Thrift Server service check are prepared |
Smoke tests for the Spark Server service check are prepared |
Smoke tests for the MySQL service check are prepared |
Smoke tests for the HBase service check are prepared |
Smoke tests for the Phoenix service check are prepared |
Smoke tests for the Hive service check are prepared |
Smoke tests for the HDFS service check are prepared |
The latest stable packages for ADH are built |