File system requirements

Supported file systems

The Hadoop Distributed File System (HDFS) runs on top of the local file system of the operating system. The following file systems are supported:

  • ext3 — the most tested underlying file system for HDFS;

  • ext4 — a scalable extension of ext3;

  • XFS — the default file system in RHEL 7.

If you are choosing between ext3 and ext4, ext4 is recommended.
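To see which file system backs each DataNode data directory, you can look up the mount that contains it in /proc/mounts. The following Python sketch does that. The function names and example directories are hypothetical, it assumes you pass the directories configured in the dfs.datanode.data.dir property, and it does not decode escaped characters (such as \040 for a space) in mount points.

```python
import os
from pathlib import PurePosixPath

# File system types listed as supported in this section.
SUPPORTED_FS_TYPES = {"ext3", "ext4", "xfs"}


def parse_mounts(text):
    """Parse /proc/mounts-style text into (mount_point, fs_type) pairs.

    Escaped characters in mount points (for example, \\040 for a space)
    are not decoded in this sketch.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 3:
            entries.append((fields[1], fields[2]))
    return entries


def fs_type_for(path, entries):
    """Return the fs type of the deepest mount point that contains path."""
    target = PurePosixPath(path)
    best, best_type = None, None
    for mount_point, fs_type in entries:
        mp = PurePosixPath(mount_point)
        # Compare path components, so /data1 does not match /data10.
        if target == mp or mp in target.parents:
            # ">=" lets a later mount on the same point override an earlier one.
            if best is None or len(mp.parts) >= len(best.parts):
                best, best_type = mp, fs_type
    return best_type


def find_unsupported(paths, entries):
    """Return the paths that are not on ext3, ext4, or XFS."""
    return [p for p in paths if fs_type_for(p, entries) not in SUPPORTED_FS_TYPES]


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        mounts = parse_mounts(f.read())
    # Hypothetical data directories; use your dfs.datanode.data.dir values.
    print(find_unsupported(["/data1/hdfs", "/data2/hdfs"], mounts))
```

Matching on path components rather than string prefixes keeps a directory such as /data10 from being attributed to a /data1 mount.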

Use the noatime option to improve performance

Linux file systems store metadata that records when each file was last accessed (atime). Updating this timestamp means that read operations can also cause writes to the disk. To speed up file reads, disable access-time updates by adding the noatime mount option to each file system entry in the /etc/fstab file, for example:

/dev/sdb1 /data1 xfs defaults,noatime 0 0

Use the following command to apply changes without rebooting:

$ sudo mount -o remount /data1
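When a host has many data disks, editing /etc/fstab by hand is error-prone. The following Python sketch adds noatime to the entries for selected mount points. The function name is hypothetical, and dropping the atime, relatime, and strictatime options (which request access-time updates) is an assumption of this sketch. Review the output, write it back to /etc/fstab as root, and then remount each file system as shown above.

```python
# Options that request access-time updates; this sketch drops them
# when it adds noatime.
ATIME_OPTIONS = {"atime", "relatime", "strictatime"}


def add_noatime(fstab_text, mount_points):
    """Return fstab text with noatime added to entries for mount_points."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # Leave comments, blank lines, and other mount points unchanged.
        if (len(fields) >= 4 and not fields[0].startswith("#")
                and fields[1] in mount_points):
            options = [o for o in fields[3].split(",") if o not in ATIME_OPTIONS]
            if "noatime" not in options:
                options.append("noatime")
            fields[3] = ",".join(options)
            # Rebuilt lines use single spaces between fields.
            line = " ".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


example = """\
# data disks
/dev/sdb1 /data1 xfs defaults 0 0
/dev/sda1 / ext4 defaults 1 1
"""
print(add_noatime(example, {"/data1"}), end="")
```

The function is idempotent: running it again on its own output leaves the text unchanged.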

Limitations of using file system mount options

  • The sync mount option makes I/O to the file system synchronous. Using sync reduces performance for services that write data to disks (for example, HDFS and YARN). In ADH, most writes are replicated, so synchronous writes to a local disk are unnecessary and expensive, and they do not noticeably improve cluster stability. Do not use the sync option.

  • Mounting DataNode data directories from NFS or NAS storage is not supported.

  • Mounting /tmp with the noexec option is not supported. The noexec option prevents the execution of files stored on that file system.
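The limitations above can be checked before installation. The following Python sketch scans /etc/fstab-style text and reports the sync option, DataNode data directories on network file systems, and a /tmp mounted with noexec. The function name, the set of file system types that stands in for NFS and NAS, and the assumption that each data directory is its own mount point are all hypothetical choices of this sketch.

```python
# File system types treated as NFS/NAS mounts (an assumption of this
# sketch; extend the set for your environment).
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smb3"}


def check_fstab(text, data_dirs):
    """Return a list of problems found in /etc/fstab-style text.

    data_dirs holds the mount points used as DataNode data directories;
    each data directory is assumed to be its own mount point.
    """
    problems = []
    for line in text.splitlines():
        fields = line.split()
        # Skip comments, blank lines, and malformed entries.
        if len(fields) < 4 or fields[0].startswith("#"):
            continue
        mount_point, fs_type = fields[1], fields[2]
        options = set(fields[3].split(","))
        # Exact match, so "async" is not reported.
        if "sync" in options:
            problems.append(f"{mount_point}: sync reduces write performance")
        if mount_point in data_dirs and fs_type in NETWORK_FS_TYPES:
            problems.append(f"{mount_point}: DataNode data directory on {fs_type}")
        if mount_point == "/tmp" and "noexec" in options:
            problems.append("/tmp: mounted with noexec")
    return problems


# Hypothetical fstab contents used for illustration.
example = """\
/dev/sdb1 /data1 xfs defaults,noatime 0 0
/dev/sdc1 /data2 ext4 defaults,sync 0 0
nas:/export /data3 nfs4 defaults 0 0
tmpfs /tmp tmpfs defaults,noexec,nosuid 0 0
"""
for problem in check_fstab(example, {"/data1", "/data2", "/data3"}):
    print(problem)
```

The same function can be applied to the contents of /proc/mounts, which uses the same field layout, to check the options that are actually in effect.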

Umask settings

UNIX-based systems use umask (user file-creation mode mask) to set default permissions for newly created files and directories. In most Linux distributions, the default umask is 0022 (022) or 0002 (002).

The base permissions are 0777 (rwxrwxrwx) for a directory and 0666 (rw-rw-rw-) for a file. To get the resulting permissions, clear the bits that are set in the umask from the base permissions. For common values such as 0022 and 0002, this gives the same result as subtracting the umask from the base permissions.

Ordinary users typically get the umask 0002, which results in 775 for directories and 664 for files. For the superuser (root), the default umask is 0022, which results in 755 for directories and 644 for files.
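The umask arithmetic can be sketched in Python. The helper names are hypothetical. The sketch shows that the umask clears bits (a bitwise AND with the inverted mask) rather than subtracting, a difference that only matters for masks such as 0033 that contain bits (here, the execute bits) not present in the base permissions.

```python
def apply_umask(base, umask):
    """Clear the umask bits from the base permissions."""
    return base & ~umask


def to_symbolic(mode):
    """Render the owner, group, and others bits of mode as rwxrwxrwx."""
    letters = "rwxrwxrwx"
    return "".join(
        letter if mode & (1 << (8 - i)) else "-" for i, letter in enumerate(letters)
    )


FILE_BASE, DIR_BASE = 0o666, 0o777

for mask in (0o022, 0o002):
    f = apply_umask(FILE_BASE, mask)
    d = apply_umask(DIR_BASE, mask)
    print(f"umask {mask:04o}: file {f:o} {to_symbolic(f)}, "
          f"directory {d:o} {to_symbolic(d)}")

# Subtraction only works when every umask bit is set in the base permissions:
# 0o666 - 0o033 = 0o633, but the actual file permissions are 0o666 & ~0o033 = 0o644.
print(f"{FILE_BASE - 0o033:o} vs {apply_umask(FILE_BASE, 0o033):o}")
```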

You can set the umask in the /etc/bashrc or /etc/profile file.

ADH supports the umask values listed in the table below.

Supported umask values

umask value           File                              Directory
                      Result  Owner  Group  Others      Result  Owner  Group  Others
0022 (recommended)    644     rw-    r--    r--         755     rwx    r-x    r-x
0002                  664     rw-    rw-    r--         775     rwx    rwx    r-x
0000                  666     rw-    rw-    rw-         777     rwx    rwx    rwx
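To check a host against the supported umask values, you can read the current umask and confirm that new files and directories get the listed permissions. The following Python sketch does that for the current process. The helper names are hypothetical, and the result reflects only the umask of the Python process (inherited from the shell that starts it), not necessarily the umask that ADH services run with. It also assumes the temporary directory has no default ACLs, which would take precedence over the umask.

```python
import os
import tempfile

# Supported umask values and the resulting (file, directory) permissions,
# as listed in the supported umask values table.
SUPPORTED_UMASKS = {
    0o022: (0o644, 0o755),  # recommended
    0o002: (0o664, 0o775),
    0o000: (0o666, 0o777),
}


def current_umask():
    """Return the process umask without changing it.

    os.umask() sets a new mask and returns the old one, so the mask is
    set temporarily and then restored; this is not thread-safe.
    """
    mask = os.umask(0)
    os.umask(mask)
    return mask


def observed_modes(mask):
    """Create a file and a directory under mask and return their permissions."""
    old = os.umask(mask)
    try:
        with tempfile.TemporaryDirectory() as tmp:
            file_path = os.path.join(tmp, "file")
            dir_path = os.path.join(tmp, "dir")
            with open(file_path, "w"):
                pass  # open() requests mode 0o666 for new files
            os.mkdir(dir_path)  # os.mkdir() requests mode 0o777 by default
            # Keep only the rwx bits for owner, group, and others.
            return (os.stat(file_path).st_mode & 0o777,
                    os.stat(dir_path).st_mode & 0o777)
    finally:
        os.umask(old)


mask = current_umask()
if mask in SUPPORTED_UMASKS:
    print(f"umask {mask:04o} is supported by ADH")
else:
    print(f"umask {mask:04o} is not in the supported list")
```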
