Copy data from the local file system to HDFS
This page describes how to copy data from the local file system to the Hadoop Distributed File System (HDFS). For this task, HDFS provides the hdfs dfs -put command with the following syntax:
$ hdfs dfs -put <source_path> <destination_path>
NOTE
The destination path should be a directory.
For more information, run hdfs dfs -put without arguments in the terminal to list all the arguments of this command.
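For example, assuming a local file named data.csv in the current directory and an existing HDFS directory /landing (both names are purely illustrative), the command looks like this:
$ hdfs dfs -put data.csv /landing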
Follow these steps to copy data files from the local file system to HDFS.
-
Make sure the data file, for example file1.txt, is stored in the local file system, for instance in your home directory:
$ ls -a ~
The output looks similar to this:
. .. .ansible .bash_history .bash_logout .bash_profile .bashrc file1.txt .ssh test
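If you do not have a suitable data file yet, you can create a small test file first; the content below is only an example:
$ echo "This is a test file for HDFS" > ~/file1.txt
$ ls -l ~/file1.txt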
-
Check the HDFS root directory contents:
$ hdfs dfs -ls /
The output looks similar to this:
Found 6 items
-rw-r--r--   3 hdfs  hadoop         20 2021-10-07 13:34 /hadoop
drwxrwxrwt   - yarn  hadoop          0 2021-09-15 16:58 /logs
drwxr-xr-x   - hdfs  hadoop          0 2021-10-12 13:17 /staging
drwxr-xr-x   - hdfs  hadoop          0 2021-09-15 16:57 /system
drwxrwxrwx   - hdfs  hadoop          0 2021-10-20 09:00 /tmp
drwxr-xr-x   - hdfs  hadoop          0 2021-09-27 12:24 /user
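Optionally, you can also check how much HDFS space is available before copying data. The -df option behaves like the local df utility:
$ hdfs dfs -df -h /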
-
Create a separate HDFS directory to copy files to, for example:
$ hdfs dfs -mkdir /how_to_example
If you run this command with a regular user profile, HDFS usually returns an access error message similar to this:
mkdir: Permission denied: user=user_name, access=WRITE, inode="/user user_name":centos:centos:drwxr-xr-x
A regular user does not have privileges to write to the HDFS root directory.
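You can confirm this by inspecting the permissions of the root directory itself. The -d option lists the directory entry instead of its contents:
$ hdfs dfs -ls -d /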
-
To get around this limitation, carefully execute the required commands using the root and hdfs privileges. To do this, first obtain these privileges:
$ sudo -s
$ su - hdfs
CAUTION
It is not safe to use the root privileges constantly. Use them only if necessary.
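As an alternative to switching to an interactive hdfs shell, you can run individual commands as the hdfs user with sudo, provided your account is allowed to do so, for example:
$ sudo -u hdfs hdfs dfs -ls /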
-
Using the hdfs user privileges, repeat the creation of a new directory called hdfsDirectory and ensure this operation completes successfully:
$ hdfs dfs -mkdir /hdfsDirectory
$ hdfs dfs -ls /
The output looks similar to the following:
Found 7 items
-rw-r--r--   3 hdfs  hadoop         20 2021-10-07 13:34 /hadoop
drwxr-xr-x   - hdfs  hadoop          0 2021-11-18 13:35 /hdfsDirectory
drwxrwxrwt   - yarn  hadoop          0 2021-09-15 16:58 /logs
drwxr-xr-x   - hdfs  hadoop          0 2021-10-12 13:17 /staging
drwxr-xr-x   - hdfs  hadoop          0 2021-09-15 16:57 /system
drwxrwxrwx   - hdfs  hadoop          0 2021-10-20 09:00 /tmp
drwxr-xr-x   - hdfs  hadoop          0 2021-09-27 12:24 /user
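If you need a nested directory structure, the -p option creates any missing parent directories in a single command, similar to the local mkdir -p. The path below is only an example:
$ hdfs dfs -mkdir -p /hdfsDirectory/raw/2021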
-
Make your regular user, for example admin, the owner of the new directory:
$ hdfs dfs -chown admin /hdfsDirectory
$ hdfs dfs -ls /
The output looks similar to this:
Found 7 items
-rw-r--r--   3 hdfs  hadoop         20 2021-10-07 13:34 /hadoop
drwxr-xr-x   - admin hadoop          0 2021-11-18 13:35 /hdfsDirectory
drwxrwxrwt   - yarn  hadoop          0 2021-09-15 16:58 /logs
drwxr-xr-x   - hdfs  hadoop          0 2021-10-12 13:17 /staging
drwxr-xr-x   - hdfs  hadoop          0 2021-09-15 16:57 /system
drwxrwxrwx   - hdfs  hadoop          0 2021-10-20 09:00 /tmp
drwxr-xr-x   - hdfs  hadoop          0 2021-09-27 12:24 /user
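If you also want to change the owning group, -chown accepts a user:group argument, and -chgrp changes only the group, for example:
$ hdfs dfs -chown admin:hadoop /hdfsDirectory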
Now your regular user is the owner of the destination directory. Exit from the hdfs user and superuser contexts:
$ exit
$ exit
-
Copy file1.txt from the local file system to the HDFS directory and verify that the file appears in the destination HDFS directory. You can use either the put or the copyFromLocal command:
$ hdfs dfs -copyFromLocal ~/file1.txt /hdfsDirectory
$ hdfs dfs -ls /hdfsDirectory/file1.txt
The output looks similar to this:
-rw-r--r-- 3 admin hadoop 29 2021-11-18 14:01 /hdfsDirectory/file1.txt
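To double-check that the file arrived intact, you can print its content directly from HDFS. If the file already exists in the destination directory and you want to overwrite it, both put and copyFromLocal accept the -f option:
$ hdfs dfs -cat /hdfsDirectory/file1.txt
$ hdfs dfs -put -f ~/file1.txt /hdfsDirectory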
-
If you want to allow all users to copy files to /hdfsDirectory, grant the write privilege on that directory to everyone:
$ hdfs dfs -chmod 777 /hdfsDirectory
Any user that has access to HDFS can now copy their files to this directory.
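Keep in mind that mode 777 also lets any user modify or delete other users' files in this directory. A slightly safer option for shared directories is to add the sticky bit, so that users can delete only their own files:
$ hdfs dfs -chmod 1777 /hdfsDirectory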
By following these steps, you have copied the file1.txt file to HDFS and granted regular users the privilege to copy files to a particular HDFS directory.