Protect files in HDFS
You can use two methods for file protection in HDFS:

- Setting file permissions.
- Creating access control lists (ACL).
File and directory permissions
In HDFS, you can restrict access to files and directories using a permission model based on POSIX, with some modifications. You can grant permissions to a file for its owner, a specified user group, and other users.
For HDFS files, you can use the following flags:

- r — for reading.
- w — for writing.

Unlike other file systems, HDFS stores only data files, so there is no way to execute a file directly. Therefore, HDFS files lack the x access flag, which would mean the execution privilege. For the same reason, HDFS does not have the setUID and setGID access flags.
The following flags grant different access types to a directory in HDFS:

- r — for listing the directory contents.
- w — for creating, renaming, and deleting files and directories in it (assuming that the x permission is also granted).
- x — for entering the directory and getting access to all children in it (makes the directory a working directory for the affected users).

As with files, there are no setUID and setGID flags for directories.
Before managing permissions, you should get access to the NameNode CLI (command-line interface). After that, follow these steps using the CLI:
- Read the root directory contents:

  $ hdfs dfs -ls /

  The output looks similar to this:

  -rw-r--r--   3 hdfs hadoop  20 2021-10-07 13:34 /hadoop
  drwxrwxrwt   - yarn hadoop   0 2021-09-15 16:58 /logs
  drwxr-xr-x   - hdfs hadoop   0 2021-10-12 13:17 /staging
  drwxr-xr-x   - hdfs hadoop   0 2021-09-15 16:57 /system
  drwxrwxrwx   - hdfs hadoop   0 2021-10-20 09:00 /tmp
  drwxr-xr-x   - hdfs hadoop   0 2021-09-27 12:24 /user
  Take a look at the permissions to the /logs directory, which has the flags drwxrwxrwt:

  - d specifies a directory.
  - The first rwx triplet indicates the permissions of the file or directory owner.
  - The second rwx triplet indicates permissions for the specified user group.
  - The third triplet indicates permissions for other users. The trailing t is the sticky bit: when it is set on a directory, only the owner of a file (or a superuser) can delete or move that file within the directory.
- List file permissions in the /tmp directory as in the following example:

  $ hdfs dfs -ls /tmp

  The output looks similar to this:

  drwxr-xr-x   - hdfs hadoop  0 2021-10-20 09:03 /tmp/hadoop01
  -rw-r--r--   3 hdfs hadoop 20 2021-10-07 15:36 /tmp/test01
  -rw-r--r--   3 hdfs hadoop 21 2021-10-08 08:40 /tmp/test02
  -rw-r--r--   3 hdfs hadoop 19 2021-10-08 08:27 /tmp/test03
  -rw-r--r--   3 hdfs hadoop  0 2021-10-20 09:00 /tmp/test04
  In this example, the files are assigned the read and write privileges for the owner (the hdfs user), read for the hadoop group, and read for other users.

- Before changing file permissions, use the sudo and su (switch user) commands to switch to the file owner whose name is hdfs:

  $ sudo -s
  $ su - hdfs
  These actions grant you the privileges to access files and directories as their owner. All the following commands are executed with the hdfs user privileges.

- Change the permissions for the group and other users by adding the w flag for them:

  $ hdfs dfs -chmod go+w /tmp/test01
  Check the permissions to the /tmp/test01 file:

  $ hdfs dfs -ls /tmp

  Ensure that the access permissions for the hadoop group and other users have changed:

  drwxr-xr-x   - hdfs hadoop  0 2021-10-20 09:03 /tmp/hadoop01
  -rw-rw-rw-   3 hdfs hadoop 20 2021-10-07 15:36 /tmp/test01
  -rw-r--r--   3 hdfs hadoop 21 2021-10-08 08:40 /tmp/test02
  -rw-r--r--   3 hdfs hadoop 19 2021-10-08 08:27 /tmp/test03
  -rw-r--r--   3 hdfs hadoop  0 2021-10-20 09:00 /tmp/test04
The required permissions are granted. In this way you can change permissions for any file or directory available in your HDFS.
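To change permissions for a directory and everything inside it at once, you can add the -R option to chmod. A sketch using the /tmp/hadoop01 directory from the listing above (requires a running HDFS cluster):

```shell
# Recursively grant the group write access to the directory
# and all files and subdirectories in it
$ hdfs dfs -chmod -R g+w /tmp/hadoop01
```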
You can also change the group or owner of files and directories using the chgrp and chown commands.
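A sketch of both commands, reusing the yarn user and hadoop group from this article's examples (changing the owner requires superuser privileges, and a running HDFS cluster is assumed):

```shell
# Change the owner of a file (run as the HDFS superuser)
$ hdfs dfs -chown yarn /tmp/test02

# Change the group of a file
$ hdfs dfs -chgrp hadoop /tmp/test03

# Change both owner and group at once, recursively
$ hdfs dfs -chown -R yarn:hadoop /tmp/hadoop01
```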
Access control list
An access control list (ACL) contains access permissions for users and groups to files and directories. ACLs provide a more flexible and visual way to manage access. For example, you can use an ACL if you need to set different permissions for several users and groups on the same file or directory. To enable ACLs in HDFS, activate the appropriate option on your Hadoop cluster.
The following steps will guide you through the ACL setup process:
- Enable ACL support by adding the following property in the hdfs-site.xml file:

  dfs.namenode.acls.enabled : true

  NOTE: To configure ADH, we recommend using Arenadata Cluster Manager (ADCM).
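If you edit hdfs-site.xml manually instead, the property is written as a standard Hadoop configuration entry:

```xml
<property>
  <name>dfs.namenode.acls.enabled</name>
  <value>true</value>
</property>
```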
- Restart your HDFS service to apply the changed configuration. Using ADCM, run a task that reconfigures and restarts the cluster.

  IMPORTANT: Check whether all your services have the successful status (highlighted in green).
- Open the terminal on the selected NameNode and execute the following command to read the contents of your test Hadoop directory, for example, /tmp:

  $ hdfs dfs -ls /tmp

  This command displays all files and directories in the /tmp directory.
- Choose a file, for example, test01, and execute the following command to get its ACL:

  $ hdfs dfs -getfacl /tmp/test01

  The output looks similar to the following:

  # file: /tmp/test01
  # owner: hdfs
  # group: hadoop
  user::rw-
  group::rw-
  other::rw-
- Add a user, for example, yarn, with all privileges to this file by using the following command:

  $ hdfs dfs -setfacl -m user:yarn:rwx /tmp/test01
- Repeat the hdfs dfs -getfacl /tmp/test01 command to check the updated access list, which is similar to this:

  # file: /tmp/test01
  # owner: hdfs
  # group: hadoop
  user::rw-
  user:yarn:rwx
  group::rw-
  mask::rwx
  other::rw-
The new yarn user now has full access to the test file. Note the mask entry, which defines the maximum effective permissions that can be granted to named users and to the group. This example illustrates that ACLs provide more advanced features for managing access to HDFS files and directories.
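ACLs also support default entries on directories (inherited by newly created children) and removal of entries. A sketch using the paths from this article (a running HDFS cluster is assumed):

```shell
# Set a default ACL entry on a directory: files created in it
# will inherit an ACL entry for the yarn user
$ hdfs dfs -setfacl -m default:user:yarn:rwx /tmp/hadoop01

# Remove a specific ACL entry from a file
$ hdfs dfs -setfacl -x user:yarn /tmp/test01

# Remove all ACL entries except the base ones
$ hdfs dfs -setfacl -b /tmp/test01
```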