Configure HDFS

You can configure HDFS in two different ways:

  • Using the Arenadata Cluster Manager.

  • Editing configuration files using any text editor.

 
There are several main configuration files:

  • core-site.xml

  • hdfs-site.xml

  • httpfs-site.xml

  • ranger-hdfs-audit.xml

  • ranger-hdfs-security.xml

  • hadoop-env.sh

All these configuration files are stored in the <hadoop_home>/etc/hadoop/ directory and can be edited with any text editor.

NOTE

To configure ADH, we recommend that you use Arenadata Cluster Manager (ADCM).

core-site.xml

This is the most important HDFS configuration file. It is used to configure runtime settings for Hadoop, such as where the NameNode runs and which ports are used.
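
For example, a minimal core-site.xml could define the default file system URI. This is a sketch, not a definitive configuration: the host name nn1.example.com and port 8020 are placeholders for your own NameNode address.

    <configuration>
      <property>
        <!-- URI of the default file system: the NameNode host and port -->
        <name>fs.defaultFS</name>
        <value>hdfs://nn1.example.com:8020</value>
      </property>
    </configuration>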

For more information about these parameters, see core-site.xml parameters.

hdfs-site.xml

This file contains the configuration settings for NameNodes and DataNodes. It also defines the default block size and the replication factor.
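
For example, a sketch of hdfs-site.xml that sets both values might look as follows. The values shown are the common Hadoop defaults (128 MiB blocks, 3 replicas), not a recommendation for your cluster.

    <configuration>
      <property>
        <!-- Default block size for new files, in bytes (128 MiB) -->
        <name>dfs.blocksize</name>
        <value>134217728</value>
      </property>
      <property>
        <!-- Default replication factor for new files -->
        <name>dfs.replication</name>
        <value>3</value>
      </property>
    </configuration>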

For more information about these parameters, see hdfs-site.xml parameters.

httpfs-site.xml

You can use the HttpFS service to interact with HDFS. In this case, you need to configure the httpfs-site.xml file properly. This file is used for the following purposes:

  • High Availability. WebHDFS does not support High Availability failover; use HttpFS instead.

  • Secure impersonation. The HDFS superuser is not available for secure impersonation. If you have enabled secure impersonation in an environment where the HDFS superuser is restricted from use, you can enable HttpFS and use the HttpFS superuser for secure impersonation (see the example after this list).
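
To illustrate the impersonation case, the sketch below lets a hypothetical hue user impersonate other users. The user name and the wildcard values are assumptions for the example; in production, restrict them to specific hosts and groups.

    <configuration>
      <property>
        <!-- Hosts from which the "hue" user may impersonate others -->
        <name>httpfs.proxyuser.hue.hosts</name>
        <value>*</value>
      </property>
      <property>
        <!-- Groups whose members the "hue" user may impersonate -->
        <name>httpfs.proxyuser.hue.groups</name>
        <value>*</value>
      </property>
    </configuration>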

TIP

If you are enabling HttpFS for use with High Availability, you should avoid enabling the HttpFS service on the primary NameNode of the ADH cluster.

For more information about these parameters, see httpfs-site.xml parameters.

hadoop-env.sh

This file is used to define configuration parameters related to the Hadoop operating environment, such as JAVA_HOME. Hadoop daemons run on a Java runtime (JRE), and the JAVA_HOME variable in the hadoop-env.sh file tells each daemon which Java installation to use.
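
For example, a hadoop-env.sh fragment could look like the following. The Java path is an assumption that depends on your system, and HDFS_NAMENODE_OPTS is the Hadoop 3 variable for NameNode JVM options.

    # Java installation used by all Hadoop daemons (path is system-specific)
    export JAVA_HOME=/usr/lib/jvm/java-11-openjdk
    # Optional: JVM options for the NameNode daemon, such as the heap size
    export HDFS_NAMENODE_OPTS="-Xmx4g"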

ranger-hdfs-audit.xml

This file contains configuration parameters for audit tracking and policy analytics, giving you deeper control over the environment.
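
As a sketch, audit logging could be enabled and pointed at a Solr destination as shown below. The Solr URL is a placeholder, and whether Solr is the right destination depends on your Ranger setup.

    <configuration>
      <property>
        <!-- Master switch for Ranger audit logging -->
        <name>xasecure.audit.is.enabled</name>
        <value>true</value>
      </property>
      <property>
        <!-- Send audit records to Solr -->
        <name>xasecure.audit.destination.solr</name>
        <value>true</value>
      </property>
      <property>
        <!-- Solr collection that stores the audit records (placeholder URL) -->
        <name>xasecure.audit.destination.solr.urls</name>
        <value>http://ranger.example.com:6083/solr/ranger_audits</value>
      </property>
    </configuration>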

For more information about these parameters, see ranger-hdfs-audit.xml parameters.

ranger-hdfs-security.xml

You can enable the Apache Ranger plugin for the HDFS service using the Ranger configuration. A security administrator can configure layers of authorization controls for access to HDFS by setting parameters in the ranger-hdfs-security.xml file.

When all the parameters are set, the authorization engine checks Ranger policies and falls back to HDFS ACLs (Access Control Lists) where no policy applies. If the ACL check is disabled, decisions are based solely on Ranger policies, without consulting HDFS ACLs. When a user attempts to access data through a data service such as Hive, the access policies of both the data service and HDFS are checked.
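
A minimal sketch of such a configuration is shown below. The service name adh_hadoop and the Ranger Admin URL are placeholders for your environment.

    <configuration>
      <property>
        <!-- Name of the Ranger service (repository) that holds the HDFS policies -->
        <name>ranger.plugin.hdfs.service.name</name>
        <value>adh_hadoop</value>
      </property>
      <property>
        <!-- URL of the Ranger Admin server that the plugin polls for policies -->
        <name>ranger.plugin.hdfs.policy.rest.url</name>
        <value>http://ranger.example.com:6080</value>
      </property>
      <property>
        <!-- Fall back to native HDFS permissions and ACLs when no Ranger policy matches -->
        <name>xasecure.add-hadoop-authorization</name>
        <value>true</value>
      </property>
    </configuration>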

For more information about these parameters, see ranger-hdfs-security.xml parameters.
