Protect files in HDFS

You can use two methods for file protection in HDFS:

  • Setting file permissions.

  • Creating access control lists (ACLs).

File and directory permissions

In HDFS, you can restrict access to files and directories using a standard permission model based on POSIX, with some modifications. You can grant permissions on a file to its owner, a specified user group, and all other users.

For HDFS files, you can use the following flags:

  • r — for reading.

  • w — for writing.

Unlike other file systems, HDFS stores only data files and cannot execute them, so the x flag, which would grant the execution privilege, does not apply to HDFS files. For the same reason, HDFS files have no setuid and setgid bits.

The following flags grant different access types to a directory in HDFS:

  • r — for listing contents in the directory.

  • w — for creating, renaming, and deleting files and subdirectories in it (provided that the x permission is also granted).

  • x — for entering the directory and accessing its children (required to make the directory a working directory for the affected users).

As with files, directories have no setuid or setgid flags either. However, HDFS does support the sticky bit (t) for directories: when it is set, only a file’s owner, the directory’s owner, or the superuser can delete or move files within that directory.
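As a sketch of how the directory flags interact (this requires a running HDFS cluster; the /tmp/demo path is hypothetical and used only for illustration):

```shell
# Create a test directory (hypothetical path) and remove its x flag:
hdfs dfs -mkdir /tmp/demo
hdfs dfs -chmod 644 /tmp/demo

# With r but without x, other users can still list the names in /tmp/demo,
# but cannot access the files inside it, because x is required to reach
# a directory's children:
hdfs dfs -ls /tmp/demo
```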

Before managing permissions, you should get access to the NameNode CLI (command-line interface). After that, follow these steps in the CLI:

  1. Read the root directory contents:

    $ hdfs dfs -ls /

    The output looks similar to this:

    -rw-r--r--   3 hdfs hadoop         20 2021-10-07 13:34 /hadoop
    drwxrwxrwt   - yarn hadoop          0 2021-09-15 16:58 /logs
    drwxr-xr-x   - hdfs hadoop          0 2021-10-12 13:17 /staging
    drwxr-xr-x   - hdfs hadoop          0 2021-09-15 16:57 /system
    drwxrwxrwx   - hdfs hadoop          0 2021-10-20 09:00 /tmp
    drwxr-xr-x   - hdfs hadoop          0 2021-09-27 12:24 /user

    Take a look at the permissions of the /logs directory, which has the flags drwxrwxrwt:

    • d specifies a directory (a hyphen - in this position specifies a file).

    • The first rwx triplet indicates the owner’s permissions.

    • The second triplet indicates permissions for the specified user group.

    • The third triplet indicates permissions for other users. In this example, its last character is t instead of x, which means that the sticky bit is set on the directory.

  2. List file permissions in the /tmp directory as in the following example:

    $ hdfs dfs -ls /tmp

    The output looks similar to this:

    drwxr-xr-x   - hdfs hadoop          0 2021-10-20 09:03 /tmp/hadoop01
    -rw-r--r--   3 hdfs hadoop         20 2021-10-07 15:36 /tmp/test01
    -rw-r--r--   3 hdfs hadoop         21 2021-10-08 08:40 /tmp/test02
    -rw-r--r--   3 hdfs hadoop         19 2021-10-08 08:27 /tmp/test03
    -rw-r--r--   3 hdfs hadoop          0 2021-10-20 09:00 /tmp/test04

    In this example, each file grants the read and write privileges to the owner (the hdfs user), and read-only access to the hadoop group and to other users.

  3. Before changing file permissions, use the sudo and su (switch user) commands to switch to the hdfs user, which owns the files:

    $ sudo -s
    $ su - hdfs

    These commands grant you the privileges to access files and directories as their owner. All the following commands are executed with the hdfs user privileges.

  4. Change the permissions for the group and other users by adding the w flag for them:

    $ hdfs dfs -chmod go+w /tmp/test01

    Check the permissions to the /tmp/test01 file:

    $ hdfs dfs -ls /tmp

    Ensure that the access permissions for the hadoop group and other users have changed:

    drwxr-xr-x   - hdfs hadoop          0 2021-10-20 09:03 /tmp/hadoop01
    -rw-rw-rw-   3 hdfs hadoop         20 2021-10-07 15:36 /tmp/test01
    -rw-r--r--   3 hdfs hadoop         21 2021-10-08 08:40 /tmp/test02
    -rw-r--r--   3 hdfs hadoop         19 2021-10-08 08:27 /tmp/test03
    -rw-r--r--   3 hdfs hadoop          0 2021-10-20 09:00 /tmp/test04

The required permissions are granted. In the same way, you can change permissions for any file or directory in your HDFS.
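The permission strings shown by hdfs dfs -ls map directly to octal modes, which hdfs dfs -chmod also accepts. A minimal sketch of this mapping in plain Bash (perm_to_octal is a hypothetical helper, not part of Hadoop, and needs no running cluster):

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a 9-character rwx string, as printed by
# "hdfs dfs -ls", into the octal mode accepted by "hdfs dfs -chmod".
perm_to_octal() {
  local str=$1 octal="" triplet digit
  for i in 0 3 6; do
    triplet=${str:i:3}           # owner, group, and other triplets in turn
    digit=0
    [[ ${triplet:0:1} == r ]] && digit=$((digit + 4))
    [[ ${triplet:1:1} == w ]] && digit=$((digit + 2))
    [[ ${triplet:2:1} == x ]] && digit=$((digit + 1))
    octal+=$digit
  done
  echo "$octal"
}

perm_to_octal rw-r--r--   # prints 644
perm_to_octal rw-rw-rw-   # prints 666
```

For instance, hdfs dfs -chmod 666 /tmp/test01 has the same effect as the symbolic go+w change from step 4.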

You can also change the group or owner of files and directories using the chgrp and chown commands.
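For example (a sketch based on the files from the steps above; changing ownership requires superuser privileges, and the yarn user and group are used here purely for illustration):

```shell
# Change the owner of a file:
hdfs dfs -chown yarn /tmp/test01

# Change the group; -R applies the change recursively to a directory tree:
hdfs dfs -chgrp -R yarn /tmp/hadoop01

# Owner and group can also be set in one call:
hdfs dfs -chown yarn:yarn /tmp/test01
```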

Access control list

An access control list (ACL) contains access permissions of users and groups to files and directories. ACLs give you a more flexible and transparent way to manage access. For example, you can use an ACL when you need to set different permissions for several users and groups on the same file or directory. To enable ACLs in HDFS, activate the appropriate option on your Hadoop cluster.

The following steps will guide you through the ACL setup process:

  1. Enable ACL by adding the following property in the hdfs-site.xml file:

    dfs.namenode.acls.enabled : true

    To configure ADH, we recommend using Arenadata Cluster Manager (ADCM).

  2. Restart your HDFS service to apply the changed configuration. Using ADCM, run a task that reconfigures and restarts the cluster.

    Check whether all your services have the successful status (highlighted in green).

  3. Run the terminal on the selected NameNode and execute the following command to read the contents of your test Hadoop directory, for example, /tmp:

    $ hdfs dfs -ls /tmp

    This command displays all files and directories in the /tmp directory.

  4. Choose a file, for example, test01, and execute the following command to get its ACL:

    $ hdfs dfs -getfacl /tmp/test01

    The output looks similar to the following:

    # file: /tmp/test01
    # owner: hdfs
    # group: hadoop
    user::rw-
    group::rw-
    other::rw-

  5. Add a user, for example, yarn, with all privileges to this file by using the following command:

    $ hdfs dfs -setfacl -m user:yarn:rwx /tmp/test01

  6. Repeat the command hdfs dfs -getfacl /tmp/test01 to check the updated access list similar to this:

    # file: /tmp/test01
    # owner: hdfs
    # group: hadoop
    user::rw-
    user:yarn:rwx
    group::rw-
    mask::rwx
    other::rw-

The yarn user can now read from and write to the test file. This example illustrates that ACLs provide more fine-grained control over access to HDFS files and directories.
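Beyond the -m flag, the setfacl command supports flags for removing entries and for default ACLs. A short sketch continuing the example above (requires a running HDFS cluster):

```shell
# Remove the entry for the yarn user added above:
hdfs dfs -setfacl -x user:yarn /tmp/test01

# Remove all ACL entries, keeping only the base permission bits:
hdfs dfs -setfacl -b /tmp/test01

# On a directory, a default ACL is inherited by newly created children:
hdfs dfs -setfacl -m default:user:yarn:rwx /tmp/hadoop01
```

Files and directories that carry an ACL are marked with a + after the permission string in hdfs dfs -ls output.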
