Remove files & directories
In HDFS, you can remove files and directories using the CLI, the HDFS UI, or APIs. For more information on these utilities, see the Connect to HDFS article.
When you delete a file in HDFS, the metadata of the file and its replicas is deleted first. File blocks that no longer have metadata on the NameNode are then eventually deleted.
To make any changes to directories in HDFS, you must have write access (w). Otherwise, the system might return an error when trying to delete a file or create a trash directory. For details on how to get the necessary access rights or change permissions for a directory, see the Protect files in HDFS article.
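As an illustration of deletion through the CLI, the following minimal sketch assembles an hdfs dfs -rm command line. The helper name build_rm_command and the example paths are hypothetical. The -r option (recursive removal) and the -skipTrash option (bypass the trash bin) are standard options of the Hadoop file system shell. The sketch only builds the command and does not execute it.

```python
import shlex


def build_rm_command(path, recursive=False, skip_trash=False):
    """Return the argument list for an `hdfs dfs -rm` call (hypothetical helper)."""
    if not path:
        raise ValueError("path must not be empty")
    cmd = ["hdfs", "dfs", "-rm"]
    if recursive:
        cmd.append("-r")          # required to remove a directory and its contents
    if skip_trash:
        cmd.append("-skipTrash")  # delete right away instead of moving to the trash
    cmd.append(path)
    return cmd


# The paths below are made-up examples.
print(shlex.join(build_rm_command("/user/alice/old_report.csv")))
print(shlex.join(build_rm_command("/user/alice/tmp", recursive=True)))
```

On a host with the HDFS client installed, the returned list could be passed to subprocess.run(cmd, check=True). Without write access to the parent directory, the command is expected to fail with a permission error.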
Trash bin
The trash bin is a setting that makes deleting data safer. If the trash option is enabled in the core-site.xml configuration file, then when you delete a file, the system creates a .Trash directory and moves the file there. In this case, the path to the file blocks is preserved in the metadata, so the file data is not deleted right away.
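The move into the trash can be pictured with a short sketch. It assumes the common Hadoop layout in which each user's trash lives under the home directory, at /user/<name>/.Trash/Current, and the original absolute path is re-created below it. This layout is an assumption that the text above does not state, and it may differ in some setups (for example, in encryption zones). The function name trash_path and the sample path are hypothetical.

```python
import posixpath


def trash_path(user, original_path):
    """Where a deleted file would land, assuming /user/<name>/.Trash/Current."""
    if not original_path.startswith("/"):
        raise ValueError("expected an absolute HDFS path")
    trash_root = posixpath.join("/user", user, ".Trash", "Current")
    # The original absolute path is re-created under the trash root.
    return trash_root + original_path


print(trash_path("alice", "/data/logs/app.log"))
# /user/alice/.Trash/Current/data/logs/app.log
```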
In ADH, the trash bin is enabled by default. The following parameters are responsible for managing the trash files:
- fs.trash.interval: the time in minutes after which the files are deleted permanently. To disable the trash bin, set this parameter to 0. The default value is 1440.
- fs.trash.checkpoint.interval: the frequency of emptying the trash in minutes. The default value is 60.
This means that every 60 minutes (the value of fs.trash.checkpoint.interval) the system checks how long the files have been in the trash. Any files that have been in the trash for longer than 1440 minutes (the value of fs.trash.interval) are deleted permanently.
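The interplay of the two parameters can be worked through with a small sketch. It reads both values from a hypothetical core-site.xml fragment and then computes the minute of the first periodic check that finds a file older than the retention time. This is a simplified model of the description above (checks are assumed to run at exact multiples of the check interval), not the actual Hadoop implementation. The function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical core-site.xml fragment with the default values quoted above.
CORE_SITE = """
<configuration>
  <property>
    <name>fs.trash.interval</name>
    <value>1440</value>
  </property>
  <property>
    <name>fs.trash.checkpoint.interval</name>
    <value>60</value>
  </property>
</configuration>
"""


def read_trash_settings(xml_text):
    """Return (retention_minutes, check_every_minutes) from a core-site.xml string."""
    root = ET.fromstring(xml_text.strip())
    props = {p.findtext("name"): p.findtext("value") for p in root.iter("property")}
    retention = int(props.get("fs.trash.interval", "1440"))
    check_every = int(props.get("fs.trash.checkpoint.interval", "60"))
    return retention, check_every


def purge_minute(deleted_at, retention, check_every):
    """Minute of the first check at which the file is older than `retention`.

    Simplified model: checks run at minutes check_every, 2 * check_every, ...
    Returns None when the trash bin is disabled (retention == 0).
    """
    if retention == 0:
        return None
    if check_every <= 0:
        raise ValueError("check_every must be positive in this model")
    # Smallest multiple of check_every strictly greater than deleted_at + retention.
    return ((deleted_at + retention) // check_every + 1) * check_every


retention, check_every = read_trash_settings(CORE_SITE)
print(purge_minute(30, retention, check_every))  # 1500
```

In this model, a file moved to the trash at minute 30 is still kept at the check at minute 1440 (it is 1410 minutes old) and is removed at the check at minute 1500 (it is 1470 minutes old).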