ADPS overview
Arenadata Platform Security (ADPS) is a service for centralized management of group security policies in an Apache Hadoop cluster. The diagram below shows how an ADPS cluster can be integrated with an ADH cluster that has Hive.
ADPS is based on the following services:
- Ranger. An infrastructure for monitoring and managing data security across the Hadoop platform. It can use a MariaDB or a PostgreSQL database to store policies and metadata. The Ranger audit logs are stored in Solr and can be viewed in the Ranger Admin UI.
- Knox. ADPS uses Knox as a reverse proxy that provides a single point of entry and perimeter security. Knox is also required to enable SSL.
- ZooKeeper. Since Solr doesn’t have a master node to coordinate its nodes, it relies on ZooKeeper for cluster management and service discovery.
Together, these services provide a comprehensive approach to security in the key areas described below.
Authorization (access control)
ADPS provides features that allow system administrators to control access to Hadoop data through role-based authorization. Authorization is implemented with Ranger policies, which require the corresponding Ranger plugins to be in use. Ranger policies allow you to create the following authorization models, among others:
- fine-grained access control for the data stored in HDFS;
- resource-level access control for YARN;
- service-level access control for the MapReduce operations;
- table/column family-level access control for the HBase data;
- table-level access control for Apache Hive.
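As a rough illustration of the last model in the list, the sketch below builds the JSON body of a table-level Hive policy in the shape accepted by the Ranger Admin public REST API (`POST /service/public/v2/api/policy`). The service name `adh_hive`, the policy name, the database, the table, and the user are hypothetical placeholders, and the example only builds and prints the body; it does not contact a Ranger server.

```python
import json


def build_hive_table_policy(service, name, database, table, users, access_types):
    """Build a Ranger policy body that grants table-level access in Hive."""
    return {
        "service": service,          # name of the Hive service registered in Ranger
        "name": name,                # policy name, unique within the service
        "isEnabled": True,
        "resources": {
            "database": {"values": [database]},
            "table": {"values": [table]},
            "column": {"values": ["*"]},   # all columns of the table
        },
        "policyItems": [
            {
                "users": list(users),
                "accesses": [
                    {"type": access, "isAllowed": True} for access in access_types
                ],
            }
        ],
    }


# All values below are hypothetical and only illustrate the structure.
policy = build_hive_table_policy(
    service="adh_hive",
    name="sales-orders-read",
    database="sales",
    table="orders",
    users=["analyst"],
    access_types=["select"],
)
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

In practice the same policy would more likely be created in the Ranger Admin UI; the JSON form is mainly useful for scripting or for keeping policies under version control.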
Security auditing and monitoring
ADPS lets you track system activity through audit logs. You can also use the perimeter security audit logs from the Knox Gateway. The following events can be logged:
- access requests;
- data processing operations;
- data changes.
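Because the Ranger audit logs are stored in Solr, they can in principle be queried through the standard Solr `select` handler in addition to the Ranger Admin UI. The sketch below only builds such a query URL. It assumes the default Ranger audit collection name `ranger_audits` and the field names `reqUser`, `result`, and `evtTime` from the stock Ranger audit schema; the host, port, and user name are hypothetical, and your deployment may differ.

```python
from urllib.parse import urlencode


def build_audit_query_url(solr_base_url, collection, user, denied_only=True, rows=20):
    """Build a Solr select URL that lists recent audit events for one user."""
    # Assumed field names from the stock Ranger audit schema.
    filters = ['reqUser:"{}"'.format(user)]
    if denied_only:
        filters.append("result:0")  # assumption: 0 means access was denied

    params = [("q", "*:*")]
    params += [("fq", f) for f in filters]
    params += [("sort", "evtTime desc"), ("rows", str(rows)), ("wt", "json")]
    return "{}/{}/select?{}".format(
        solr_base_url.rstrip("/"), collection, urlencode(params)
    )


# Hypothetical Solr address and user name.
url = build_audit_query_url(
    "http://solr.example.com:8983/solr", "ranger_audits", "analyst"
)
print(url)
```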
Data protection
ADPS provides mechanisms for real-time data encryption. ADPS does not require partner solutions for encrypting data at rest, data discovery, or data masking. ADPS supports the following wire encryption methods:
- SSL for ADPS components. This mode is not suitable for all environments and stack services. In particular, internal services may have problems interacting with each other when this protocol is used.
- RPC encryption.
- Data Transfer Protocol encryption.
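For orientation, in stock Apache Hadoop the last two methods correspond to the `hadoop.rpc.protection` property in `core-site.xml` and the `dfs.encrypt.data.transfer` property in `hdfs-site.xml`. This overview does not state how ADPS exposes these settings, so treat the following as a minimal sketch under that assumption: it renders the two properties as Hadoop-style XML fragments and does not modify any cluster.

```python
import xml.etree.ElementTree as ET

# Stock Apache Hadoop property names; whether and how ADPS sets them
# is an assumption, not something stated in this overview.
WIRE_ENCRYPTION_PROPERTIES = {
    "core-site.xml": {"hadoop.rpc.protection": "privacy"},      # RPC encryption
    "hdfs-site.xml": {"dfs.encrypt.data.transfer": "true"},     # Data Transfer Protocol encryption
}


def render_properties(properties):
    """Render a name/value mapping as a Hadoop <configuration> XML fragment."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-printing is available in Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


for file_name, properties in WIRE_ENCRYPTION_PROPERTIES.items():
    print("# " + file_name)
    print(render_properties(properties))
```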