DataNode hot swapping

DataNode hot swapping in HDFS is the process of replacing or adding a disk on a DataNode without shutting the node down.

The following instructions describe how to configure a new disk and add it to a DataNode using the CLI and ADCM.

To add a new disk to the DataNode:

  1. Connect a disk to the desired host. You can check if it’s visible to the system by running the lsblk command on the host. Possible output:

    NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    vda    253:0    0  100G  0 disk
    ├─vda1 253:1    0    1M  0 part
    └─vda2 253:2    0  100G  0 part /
    vdb    253:16   0   20G  0 disk
  2. Create a directory for HDFS:

    $ mkdir -p /srv/hadoop-hdfs/data1
  3. Create a file system on the disk:

    $ mkfs.xfs /dev/vdb
  4. Mount the disk:

    $ mount /dev/vdb /srv/hadoop-hdfs/data1
  5. Add the new filesystem to the fstab:

    $ echo "/dev/vdb /srv/hadoop-hdfs/data1 xfs defaults,noatime 0 0" | sudo tee --append /etc/fstab
  6. Mount all file systems listed in fstab to verify the new entry:

    $ mount -a

    You can check if the system has been mounted successfully using the lsblk command. Possible output:

    NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    vda    253:0    0  100G  0 disk
    ├─vda1 253:1    0    1M  0 part
    └─vda2 253:2    0  100G  0 part /
    vdb    253:16   0   20G  0 disk /srv/hadoop-hdfs/data1
  7. Make hdfs the owner of the new directory and set the permissions as follows:

    $ chown -R hdfs:hadoop /srv/hadoop-hdfs/data1
    $ chmod -R 755 /srv/hadoop-hdfs/data1
  8. Specify the created directory in the dfs.datanode.data.dir parameter for the selected DataNode. You can do this manually by editing the hdfs-site.xml file on the DataNode host, or by creating a config group in ADCM. For more information on how to change the dfs.datanode.data.dir parameter value, see the Add HDFS data directories article. To learn how to create a config group, refer to the Set up configuration groups article.

    CAUTION

    Change the dfs.datanode.data.dir property only for the DataNode whose host has the required directory. Changing the parameter for the whole system may result in an error.
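    For reference, the resulting hdfs-site.xml entry might look as follows. The second path is the directory created in the steps above; /srv/hadoop-hdfs/data is an assumed pre-existing data directory — substitute the paths actually used on your DataNode:

    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/srv/hadoop-hdfs/data,/srv/hadoop-hdfs/data1</value>
    </property>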

  9. Start the DataNode reconfiguration by running the command:

    $ hdfs dfsadmin -reconfig datanode <HOST>:9867 start

    Where <HOST> is the FQDN of the DataNode host, and 9867 is the default DataNode IPC port (defined by the dfs.datanode.ipc.address property). To check the status of the reconfiguration task, run:

    $ hdfs dfsadmin -reconfig datanode <HOST>:9867 status

To remove a disk from the DataNode, delete its directory from the dfs.datanode.data.dir property on the DataNode host and run the same reconfiguration command.
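For example, to stop using the /srv/hadoop-hdfs/data1 directory added above, remove its path from the property value (assuming /srv/hadoop-hdfs/data is the remaining data directory) and start the reconfiguration again:

    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/srv/hadoop-hdfs/data</value>
    </property>

    $ hdfs dfsadmin -reconfig datanode <HOST>:9867 start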
