Filesystem requirements
Supported file systems
The Hadoop Distributed File System (HDFS) is designed to operate on top of the underlying file system in the operating system. The following file systems are supported:
- ext3: the most tested underlying file system for HDFS;
- ext4: a scalable extension of ext3;
- XFS: the default file system in RHEL 7.
If you are choosing between ext3 and ext4, ext4 is recommended.
Use the noatime option to improve performance
Linux file systems store metadata that records when each file was last accessed, which means that each read operation also writes to the disk. Disable this behavior to speed up file reads. To do this, add the noatime mount option to each line in the /etc/fstab file that describes a file system, for example:
/dev/sdb1 /data1 xfs defaults,noatime 0
Use the following command to apply changes without rebooting:
$ mount -o remount /data1
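To confirm that the option is active after the remount, the current mount options can be inspected. The following is a minimal sketch, assuming the standard Linux /proc/mounts field order (device, mount point, type, options). The function name missing_noatime and the sample entries are illustrative and not part of ADH.

```shell
# Print mount points from /proc/mounts-style input (stdin) whose
# option list lacks noatime. Only ext3, ext4, and xfs are examined.
missing_noatime() {
    awk '$3 ~ /^(ext3|ext4|xfs)$/ {
        n = split($4, opts, ",")
        found = 0
        for (i = 1; i <= n; i++) if (opts[i] == "noatime") found = 1
        if (!found) print $2
    }'
}

# Sample input in /proc/mounts format (illustrative, not real output).
sample='/dev/sdb1 /data1 xfs rw,noatime,attr2 0 0
/dev/sdc1 /data2 ext4 rw,relatime 0 0
tmpfs /tmp tmpfs rw,nosuid 0 0'

printf '%s\n' "$sample" | missing_noatime   # prints: /data2
```

On a real host, the same function could be fed the live table with missing_noatime < /proc/mounts. An empty result would mean that every ext3, ext4, and XFS mount already has noatime.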
Limitations of using file system mount options
- The sync mount option makes all writes synchronous. Using sync reduces performance for services that write data to disks (for example, HDFS and YARN). In ADH, most writes are replicated, so synchronous writing to disk is unnecessary, expensive, and does not noticeably improve cluster stability. Using sync is not recommended.
- The nfs and nas options are not supported for mounting a DataNode data directory.
- Mounting /tmp as a file system with the noexec option is not supported. This approach is normally used to prevent the execution of files stored there.
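The limitations above can be checked mechanically against /etc/fstab. The following is a rough sketch, assuming the usual fstab field order (device, mount point, type, options). The function name check_mount_limits, the host name nas01, and the sample entries are hypothetical. Because the script cannot know which mount points hold DataNode data, it reports every NFS mount and leaves that decision to the reader.

```shell
# Report fstab-style entries (stdin) that conflict with the limitations above.
check_mount_limits() {
    awk '
    /^[ \t]*#/ || NF == 0 { next }      # skip comments and blank lines
    {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            if (opts[i] == "sync")
                print $2 ": sync option set"
            if (opts[i] == "noexec" && $2 == "/tmp")
                print $2 ": noexec option set on /tmp"
        }
        if ($3 ~ /^nfs/)
            print $2 ": NFS mount, do not use as a DataNode data directory"
    }'
}

# Sample fstab content (illustrative only).
sample='/dev/sdb1 /data1 xfs defaults,noatime 0
/dev/sdc1 /data2 ext4 defaults,sync 0 0
tmpfs /tmp tmpfs defaults,noexec 0 0
nas01:/export /data3 nfs defaults 0 0'

printf '%s\n' "$sample" | check_mount_limits
```

With the sample input, the first entry passes silently and the other three are reported. On a real host, the input would be /etc/fstab itself: check_mount_limits < /etc/fstab.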
Umask settings
UNIX-based systems use umask (user file-creation mode mask) to set default permissions for created files and directories. In most Linux distributions, the default umask value is 0022 (022) or 0002 (002). The base permissions are 0777 (rwxrwxrwx) for a directory and 0666 (rw-rw-rw-) for a file. To determine the resulting permissions, subtract the umask from the base permissions (more precisely, the bits set in the umask are cleared from the base permissions). The default umask for an ordinary user is 0002, which gives 775 for a directory and 664 for a file. For the superuser (root), the default umask is 0022, which gives 755 for a directory and 644 for a file.
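The permission arithmetic can be reproduced in the shell. This is a small sketch in which the function name apply_umask is illustrative. It clears the umask bits from the base permissions with a bitwise AND NOT, which for the umask values discussed here gives the same result as subtraction.

```shell
# Print the permissions that remain after a umask is applied.
# $1 = base permissions in octal (666 for files, 777 for directories)
# $2 = umask in octal
apply_umask() {
    # The leading 0 makes the shell read both arguments as octal numbers.
    printf '%03o\n' $(( 0$1 & ~0$2 ))
}

apply_umask 666 022   # file, umask 0022: prints 644
apply_umask 777 022   # directory, umask 0022: prints 755
apply_umask 666 002   # file, umask 0002: prints 664
apply_umask 777 002   # directory, umask 0002: prints 775
```

The bitwise form matters for masks that contain bits absent from the base permissions: apply_umask 666 111 prints 666, whereas plain subtraction would give 555.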
You can set umask in the /etc/bashrc or /etc/profile file.
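In either file the setting is a single line such as umask 0022. Before changing system files, the effect of a value can be tried in a subshell. The sketch below assumes a Linux host with mktemp available; the function name umask_demo is hypothetical.

```shell
# Line to add to /etc/profile or /etc/bashrc (0022 is used as an example):
#   umask 0022

# Show the mode strings that a given umask produces for a new file and a
# new directory. Runs in a subshell, so the caller's umask is unchanged.
umask_demo() {
    (
        umask "$1"
        dir=$(mktemp -d) || exit 1      # scratch directory, removed below
        touch "$dir/file"
        mkdir "$dir/subdir"
        # Keep only the ten-character mode string of each entry.
        ls -ld "$dir/file" "$dir/subdir" | cut -c1-10
        rm -r "$dir"
    )
}

umask_demo 0022
```

With 0022, the two printed lines should be -rw-r--r-- for the file and drwxr-xr-x for the directory.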
ADH supports umask values listed in the table below.
| umask value | File: result | File: owner | File: group | File: others | Directory: result | Directory: owner | Directory: group | Directory: others |
|---|---|---|---|---|---|---|---|---|
| 0022 (recommended) | 644 | rw- | r-- | r-- | 755 | rwx | r-x | r-x |
| 0002 | 664 | rw- | rw- | r-- | 775 | rwx | rwx | r-x |
| 0000 | 666 | rw- | rw- | rw- | 777 | rwx | rwx | rwx |