Remove files & directories

In HDFS, you can remove files and directories using the CLI, the HDFS UI, or APIs. For more information on these utilities, see the Connect to HDFS article.

When you delete a file in HDFS, the metadata of the file and its replicas is deleted first. File blocks that no longer have metadata on the NameNode are then deleted eventually.

To change a directory in HDFS, you must have write (w) permission for it. Otherwise, the system might return an error when you try to delete a file or when a trash directory has to be created. For details on how to get the necessary access rights or change permissions for a directory, see the Protect files in HDFS article.

Trash bin

The trash bin is a setting that makes deletion safer. If the trash option is enabled in the core-site.xml configuration file, deleting a file moves it to a .Trash directory, which the system creates if needed. The metadata still points to the file blocks, so the data itself is not deleted.
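To make the move into .Trash more concrete, here is a minimal sketch of how the trash location of a deleted file can be derived. It assumes the common default layout /user/<username>/.Trash/Current/<original path>, which may differ on your cluster (for example, inside encryption zones). The helper name trash_path, the user alice, and the path /data/report.csv are hypothetical.

```shell
# Hypothetical helper: where a deleted path lands in the trash, assuming the
# default layout /user/<username>/.Trash/Current/<original absolute path>.
trash_path() {
  user=$1
  original=$2   # absolute HDFS path of the deleted file
  echo "/user/${user}/.Trash/Current${original}"
}

# Example with placeholder names (alice and /data/report.csv are illustrative).
trash_path alice /data/report.csv

# On a cluster, a trashed file could be inspected and moved back, for example:
#   hadoop fs -ls "$(trash_path alice /data)"
#   hadoop fs -mv "$(trash_path alice /data/report.csv)" /data/report.csv
```

The hadoop commands are left as comments so that the sketch runs without a cluster. Uncomment them only after checking the paths.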

In ADH, the trash bin is enabled by default. The following parameters are responsible for managing the trash files:

  • fs.trash.interval — the time in minutes after which files in the trash are permanently deleted. To disable the trash bin, set this parameter to 0. The default value is 1440.

  • fs.trash.checkpoint.interval — how often, in minutes, the system checks the trash for expired files. The default value is 60.

With these defaults, the system checks every 60 minutes (the value of fs.trash.checkpoint.interval) how long the files have been in the trash. Files that have been in the trash longer than 1440 minutes (the value of fs.trash.interval) are deleted permanently.
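Because expired files are removed only when a check runs, a file can stay in the trash somewhat longer than fs.trash.interval. The sketch below estimates the upper bound under the simplified model described above (retention time plus one check interval). The function name max_trash_minutes is hypothetical, and the actual timing on a cluster may differ.

```shell
# Hypothetical helper: worst-case minutes a deleted file may stay in the trash,
# following the check-every-interval model described above.
max_trash_minutes() {
  interval=$1      # fs.trash.interval, in minutes
  checkpoint=$2    # fs.trash.checkpoint.interval, in minutes
  if [ "$interval" -eq 0 ]; then
    echo 0         # trash bin disabled: nothing is kept
  else
    echo $((interval + checkpoint))
  fi
}

# With the default values quoted above: prints 1500
max_trash_minutes 1440 60

# On a cluster node, the effective values could be read with hdfs getconf:
#   max_trash_minutes "$(hdfs getconf -confKey fs.trash.interval)" \
#                     "$(hdfs getconf -confKey fs.trash.checkpoint.interval)"
```

With the defaults, a file is kept for at least 1440 minutes and, under this model, for at most about 1500 minutes.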

Delete files with CLI

To delete a file or a directory, run one of the following commands:

  • rm — removes a file. With the -R option, removes a directory and its contents recursively.

  • rmdir — removes an empty directory.

  • rmr — removes a file or a directory recursively. This command is deprecated. Use the rm command with the -R option instead.

An example command for deleting a directory and skipping the trash bin:

$ hadoop fs -rm -R -skipTrash <URI>

Where <URI> is the path to a file or directory.
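The commands listed above can be tried safely with a dry run first. The following sketch prints each command instead of executing it. The paths under /tmp/example are placeholders, and the RUN variable is a convention of this sketch, not a Hadoop feature.

```shell
# Dry run by default: each command is printed instead of executed.
# To run against a real cluster, start the script with RUN set to an empty value.
RUN=${RUN-echo}

# Placeholder paths, not taken from a real cluster.
$RUN hadoop fs -rm /tmp/example/file.txt               # remove a single file
$RUN hadoop fs -rmdir /tmp/example/empty_dir           # remove an empty directory
$RUN hadoop fs -rm -R /tmp/example/dir                 # remove a directory recursively
$RUN hadoop fs -rm -R -skipTrash /tmp/example/scratch  # remove recursively, bypassing the trash bin
```

If the script is saved under a hypothetical name such as delete_examples.sh, running `RUN= sh delete_examples.sh` executes the commands for real. Data removed with -skipTrash cannot be restored from the trash bin.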
