Protect files in HDFS

You can use two methods for file protection in HDFS:

- Setting file permissions.
- Creating access control lists (ACLs).
File and directory permissions
HDFS restricts access to files and directories using a standard permission model based on POSIX, with some modifications. You can grant permissions to a file for its owner, a specified user group, and all other users.
For HDFS files, you can use the following flags:

- r — for reading.
- w — for writing.
Unlike other file systems, HDFS stores only data files, so there is no way to execute a file stored in it. Therefore, HDFS files lack the x flag, which would grant the execution privilege. For the same reason, HDFS does not have the setUID and setGID access flags.
The following flags grant different access types to a directory in HDFS:
- r — for listing the directory contents.
- w — for creating, renaming, and deleting files and directories in it (assuming that the x permission is also granted).
- x — for entering the directory and accessing all of its children (makes the directory a working directory for the affected users).
As with files, there are no setUID and setGID flags for directories either.
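The flag triplets map directly onto the octal notation familiar from chmod. As a quick illustration, here is a minimal Python sketch (not part of HDFS; the decode_permissions function is made up for this example) that decodes a permission string as printed by hdfs dfs -ls:

```python
def decode_permissions(perm: str) -> str:
    """Decode a 10-character permission string, e.g. 'drwxr-xr-x',
    into its octal form. The first character is the entry type
    ('d' for a directory, '-' for a file) and is skipped."""
    bits = perm[1:]
    octal = ""
    for i in range(0, 9, 3):            # owner, group, other triplets
        triplet = bits[i:i + 3]
        value = 0
        if triplet[0] == "r":
            value += 4
        if triplet[1] == "w":
            value += 2
        if triplet[2] in ("x", "t"):    # 't' = sticky bit with x set
            value += 1
        octal += str(value)
    return octal

print(decode_permissions("drwxr-xr-x"))  # 755
print(decode_permissions("-rw-r--r--"))  # 644
```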
Before managing permissions, you should get access to the NameNode CLI (command-line interface). After that, follow these steps in the CLI:
- Read the root directory contents:

  $ hdfs dfs -ls /

  The output looks similar to this:

  -rw-r--r--   3 hdfs hadoop 20 2021-10-07 13:34 /hadoop
  drwxrwxrwt   - yarn hadoop  0 2021-09-15 16:58 /logs
  drwxr-xr-x   - hdfs hadoop  0 2021-10-12 13:17 /staging
  drwxr-xr-x   - hdfs hadoop  0 2021-09-15 16:57 /system
  drwxrwxrwx   - hdfs hadoop  0 2021-10-20 09:00 /tmp
  drwxr-xr-x   - hdfs hadoop  0 2021-09-27 12:24 /user
  Take a look at the permissions of the /logs directory, which has the flags drwxrwxrwt:

  - d specifies a directory.
  - The first three rwx flags indicate the file or directory owner's permissions.
  - The second rwx flags indicate permissions for the specified user group.
  - The third flags indicate permissions for other users; here the t in place of x is the sticky bit, which allows only a file's owner (or the superuser) to delete or move that file within the directory.
- List file permissions in the /tmp directory as in the following example:

  $ hdfs dfs -ls /tmp

  The output looks similar to this:

  drwxr-xr-x   - hdfs hadoop  0 2021-10-20 09:03 /tmp/hadoop01
  -rw-r--r--   3 hdfs hadoop 20 2021-10-07 15:36 /tmp/test01
  -rw-r--r--   3 hdfs hadoop 21 2021-10-08 08:40 /tmp/test02
  -rw-r--r--   3 hdfs hadoop 19 2021-10-08 08:27 /tmp/test03
  -rw-r--r--   3 hdfs hadoop  0 2021-10-20 09:00 /tmp/test04
  In this example, the files are assigned the read and write privileges for the owner (the hdfs user), read for the hadoop group, and read for other users.
- Before changing file permissions, use the sudo and su (switch user) commands to switch to the file owner, whose name is hdfs:

  $ sudo -s
  $ su - hdfs

  These actions grant you the privileges to access files and directories as their owner. All the following commands are executed with the hdfs user privileges.
- Change the permissions for the group and other users by adding the w flag for them:

  $ hdfs dfs -chmod go+w /tmp/test01

  Check the permissions of the /tmp/test01 file:

  $ hdfs dfs -ls /tmp

  Ensure that the access permissions for the hadoop group and other users have changed:

  drwxr-xr-x   - hdfs hadoop  0 2021-10-20 09:03 /tmp/hadoop01
  -rw-rw-rw-   3 hdfs hadoop 20 2021-10-07 15:36 /tmp/test01
  -rw-r--r--   3 hdfs hadoop 21 2021-10-08 08:40 /tmp/test02
  -rw-r--r--   3 hdfs hadoop 19 2021-10-08 08:27 /tmp/test03
  -rw-r--r--   3 hdfs hadoop  0 2021-10-20 09:00 /tmp/test04
The required permissions are granted. In this way, you can change permissions for any file or directory in your HDFS.
You can also change the group or owner of files and directories using the chgrp and chown commands.
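The go+w change above can be modeled with plain octal arithmetic. The following Python sketch (a hypothetical helper, not an HDFS API) shows why -rw-r--r-- (644) becomes -rw-rw-rw- (666):

```python
W = 0o2  # the write bit within one rwx triplet

def add_write_for_group_and_others(mode: int) -> int:
    """Model the symbolic change go+w: set the write bit in the
    group and other triplets, leaving the owner triplet untouched."""
    return mode | (W << 3) | W  # group write (0o020) + other write (0o002)

mode = 0o644                    # -rw-r--r--, as /tmp/test01 above
print(oct(add_write_for_group_and_others(mode)))  # 0o666, i.e. -rw-rw-rw-
```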
Access control list
An access control list (ACL) contains access permissions of users and groups to files and directories. ACLs give you a more flexible and readable way to manage access. For example, use an ACL if you need to set different permissions for several users and groups on the same file or directory. To enable ACLs in HDFS, activate the appropriate option on your Hyperwave cluster.
The following steps will guide you through the ACL setup process:
- Enable ACLs by adding the following property to the hdfs-site.xml file:

  dfs.namenode.acls.enabled : true

  NOTE
  To configure ADH, we recommend using Arenadata Cluster Manager (ADCM).
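For reference, properties in hdfs-site.xml use the standard Hadoop XML configuration format, so the same setting written by hand would look like the fragment below (if you manage the configuration through ADCM, set the value there instead of editing the file directly):

```xml
<property>
  <name>dfs.namenode.acls.enabled</name>
  <value>true</value>
</property>
```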
- Restart your HDFS service to apply the changed configuration. Using ADCM, run a task that reconfigures and restarts the cluster.

  IMPORTANT
  Check that all your services have the successful status (highlighted in green).
- Open the terminal on the selected NameNode and execute the following command to read the contents of your test Hyperwave directory, for example, /tmp:

  $ hdfs dfs -ls /tmp

  This command displays all files and directories in the /tmp directory.
- Choose a file, for example, test01, and execute the following command to get its ACL:

  $ hdfs dfs -getfacl /tmp/test01

  The output looks similar to the following:

  # file: /tmp/test01
  # owner: hdfs
  # group: hadoop
  user::rw-
  group::rw-
  other::rw-
- Add a user, for example, yarn, with all privileges to this file by using the following command:

  $ hdfs dfs -setfacl -m user:yarn:rwx /tmp/test01

- Repeat the hdfs dfs -getfacl /tmp/test01 command to check the updated access list:

  # file: /tmp/test01
  # owner: hdfs
  # group: hadoop
  user::rw-
  user:yarn:rwx
  group::rw-
  mask::rwx
  other::rw-
The new yarn user can now read and write to the test file. This example shows that ACLs provide more fine-grained control over access to HDFS files and directories.
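The mask:: entry that appeared after setfacl is the upper bound for all named users and groups: an entry's effective permissions are the entry combined with the mask, position by position, while the owner and other entries are not filtered. The following Python sketch (illustrative only; the effective function is made up for this example) models that rule for the ACL shown above:

```python
def effective(perms: str, mask: str) -> str:
    """Combine an ACL entry with the mask, position by position
    (positions are r, w, x): a flag survives only if the mask allows it."""
    return "".join(p if m != "-" else "-" for p, m in zip(perms, mask))

# Entries mirroring the getfacl output for /tmp/test01 above
acl = {
    "user::": "rw-",      # owner (hdfs): not filtered by the mask
    "user:yarn:": "rwx",  # named user added with setfacl
    "group::": "rw-",     # owning group (hadoop)
    "mask::": "rwx",
    "other::": "rw-",     # not filtered by the mask
}

print(effective(acl["user:yarn:"], acl["mask::"]))  # rwx
# If the mask were narrowed to r-x, yarn would lose write access:
print(effective("rwx", "r-x"))                      # r-x
```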