Bulk loading in Phoenix
Bulk data loading is available not only in HBase, but in Phoenix as well. There are two basic methods:
- Using psql. Psql is a single-threaded tool well suited for loading several megabytes or gigabytes of data. It relies on the regular HBase write path and does not write directly to HFiles. All input files should have the .sql or .csv extension: the former are used for SQL queries, the latter for loading data.
- Using MapReduce. This method follows the ETL (Extract, Transform, Load) pattern. Files are uploaded to HDFS and then transformed by the built-in MapReduce job CsvBulkLoadTool, which writes the transformed data directly into HBase HFiles. Because the MapReduce job runs in parallel, this method can handle significantly larger amounts of data than psql and is recommended for production use. The supported file extensions are .csv and .json.
NOTE
You can find more information about bulk data loading in the Phoenix documentation.
Method 1. Use psql
- Prepare the best_books.sql SQL file for creating a new table BEST_BOOKS with the following structure:

CREATE TABLE IF NOT EXISTS best_books (author VARCHAR NOT NULL, title VARCHAR NOT NULL, public_year UNSIGNED_SMALLINT, CONSTRAINT pk PRIMARY KEY (author, title));

This table will contain information about famous books, including their authors, titles, and years of first publication. You can download the source file best_books.sql.
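For reference, the data files in this example are pipe-delimited, with columns following the order of the CREATE TABLE statement. A short illustrative fragment (the rows below are built from the query results shown later in this article, not copied from the downloadable files) looks like this:

A.S. Byatt|Possession|1990
Alan Moore|Watchmen|1987
Anthony Burgess|A Clockwork Orange|1962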
CAUTION
psql does not handle lowercase table names correctly. If you name the table "best_books" (in double quotes), loading data via psql fails, although manual upserts into the same table work fine. So, we recommend not using double quotes in table names, allowing them to be converted to uppercase. This restriction does not apply to column names.
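The underlying reason is the way Phoenix treats identifiers: unquoted names are automatically folded to uppercase, while double-quoted names are stored case-sensitively and must always be referenced with quotes. A small illustration with a hypothetical table name, not used elsewhere in this example:

-- Stored and referenced as DEMO_BOOKS; safe to use with psql
CREATE TABLE IF NOT EXISTS demo_books (id VARCHAR NOT NULL PRIMARY KEY);
-- Stored under the case-sensitive name "demo_books"; avoid this form when loading data via psql
CREATE TABLE IF NOT EXISTS "demo_books" (id VARCHAR NOT NULL PRIMARY KEY);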
- Prepare two CSV files for loading data into the BEST_BOOKS table: best_books1.csv and best_books2.csv.
- Prepare the best_books_query.sql file for selecting data from the BEST_BOOKS table:

SELECT public_year as "Year", count(*) as "Books count" FROM best_books GROUP BY public_year ORDER BY public_year;

This query returns the number of books grouped by the year of first publication. You can download the file best_books_query.sql.
- Download all the files to the local file system of one of your HBase cluster hosts. Make sure that the files are copied successfully using the following command:
$ ls -la ~
The output looks similar to the following:
total 96
drwx------. 6 dasha dasha  4096 Dec  1 14:33 .
drwxr-xr-x. 3 root  root     19 Aug 31 11:54 ..
drwx------. 3 dasha dasha    17 Aug 31 15:06 .ansible
-rw-------. 1 dasha dasha 23386 Dec  1 12:45 .bash_history
-rw-r--r--. 1 dasha dasha    18 Apr  1  2020 .bash_logout
-rw-r--r--. 1 dasha dasha   193 Apr  1  2020 .bash_profile
-rw-r--r--. 1 dasha dasha   231 Apr  1  2020 .bashrc
-rw-rw-r--. 1 dasha dasha  2736 Dec  1 14:33 best_books1.csv
-rw-rw-r--. 1 dasha dasha  1140 Dec  1 14:33 best_books2.csv
-rw-rw-r--. 1 dasha dasha   122 Dec  1 14:33 best_books_query.sql
-rw-rw-r--. 1 dasha dasha   176 Dec  1 14:33 best_books.sql
drwxrwxrwx. 2 dasha dasha    64 Nov 26 07:45 dasha
-rw-rw-r--. 1 dasha dasha 17651 Nov 26 07:46 people_ages.csv
drwxrwxr-x. 2 dasha dasha    21 Nov 30 14:04 .sqlline
drwx------. 2 dasha dasha    29 Sep 23 17:35 .ssh
- Change the directory:
$ cd /usr/lib/phoenix/bin
- Use the following command to run psql. The -t argument specifies the table name; it has to be set explicitly when it does not match the CSV file name (which is used by default). The -d argument specifies the delimiter used in the CSV file (the default one is a comma). Other possible arguments can be found in the Phoenix documentation.
You can pass several files at once, but keep in mind that they are processed one by one, so specify them in the right order. In our example, we first create a table, then load data into it, and then run a query:
$ ./psql.py -t BEST_BOOKS -d '|' ~/best_books.sql ~/best_books1.csv ~/best_books_query.sql
The output:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/phoenix/phoenix-5.0.0-HBase-2.0-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
21/12/01 16:13:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
no rows upserted
Time: 1.381 sec(s)
csv columns from database.
CSV Upsert complete. 70 rows upserted
Time: 0.068 sec(s)
Year                                     Books count
---------------------------------------- ----------------------------------------
1924                                     1
1925                                     2
1926                                     1
1927                                     3
1929                                     2
1932                                     1
1934                                     1
1936                                     1
1937                                     1
1939                                     2
1940                                     3
1945                                     2
1946                                     1
1947                                     1
1949                                     1
1950                                     1
1951                                     1
1952                                     1
1953                                     1
1954                                     2
1955                                     2
1957                                     1
1958                                     1
1959                                     1
1960                                     2
1961                                     4
1962                                     3
1963                                     1
1965                                     1
1966                                     2
1969                                     4
1970                                     2
1973                                     1
1975                                     1
1981                                     1
1984                                     1
1985                                     2
1987                                     2
1990                                     1
1992                                     1
1996                                     1
1997                                     1
2000                                     2
2001                                     2
2005                                     1
Time: 0.027 sec(s)
As you can see, this single command does three things at once: it creates a table, loads data into it, and queries the results.
- Check the results of the command. Run SQLLine and perform the following query to display the first ten records of the created table (if SQLLine is not running yet, see the note after the output below):
SELECT * FROM best_books LIMIT 10;
The output:
+-----------------------+-----------------------------------------------------+--------------+
| AUTHOR                | TITLE                                               | PUBLIC_YEAR  |
+-----------------------+-----------------------------------------------------+--------------+
| A.S. Byatt            | Possession                                          | 1990         |
| Alan Moore            | Watchmen                                            | 1987         |
| Anthony Burgess       | A Clockwork Orange                                  | 1962         |
| C.S. Lewis            | The Lion, the Witch and the Wardrobe                | 1950         |
| Carson McCullers      | The Heart Is a Lonely Hunter                        | 1940         |
| Chinua Achebe         | Chinua Achebe                                       | 1958         |
| Cormac McCarthy       | Blood Meridian, or the Evening Redness in the West  | 1985         |
| Dashiell Hammett      | Red Harvest                                         | 1929         |
| David Foster Wallace  | Infinite Jest                                       | 1996         |
| Don DeLillo           | White Noise                                         | 1985         |
+-----------------------+-----------------------------------------------------+--------------+
10 rows selected (0.059 seconds)
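NOTE
SQLLine ships with Phoenix and can be started from the same bin directory, passing the ZooKeeper quorum as an argument. The address below assumes that ZooKeeper runs on the local host; adjust it to your cluster:

$ /usr/lib/phoenix/bin/sqlline.py localhost:2181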
Use the HBase shell to scan the created table. The first ten cells are listed below. Note that since we defined a composite primary key for this table, the two fields included in it are stored as a single row key in HBase (AUTHOR + TITLE); the cell values are explained after the output:

scan 'BEST_BOOKS'
The output:
ROW                                                    COLUMN+CELL
 A.S. Byatt\x00Possession                              column=0:\x00\x00\x00\x00, timestamp=1638375201460, value=x
 A.S. Byatt\x00Possession                              column=0:\x80\x0B, timestamp=1638375201460, value=\x07\xC6
 Alan Moore\x00Watchmen                                column=0:\x00\x00\x00\x00, timestamp=1638375201460, value=x
 Alan Moore\x00Watchmen                                column=0:\x80\x0B, timestamp=1638375201460, value=\x07\xC3
 Anthony Burgess\x00A Clockwork Orange                 column=0:\x00\x00\x00\x00, timestamp=1638375201460, value=x
 Anthony Burgess\x00A Clockwork Orange                 column=0:\x80\x0B, timestamp=1638375201460, value=\x07\xAA
 C.S. Lewis\x00The Lion, the Witch and the Wardrobe    column=0:\x00\x00\x00\x00, timestamp=1638375201460, value=x
 C.S. Lewis\x00The Lion, the Witch and the Wardrobe    column=0:\x80\x0B, timestamp=1638375201460, value=\x07\x9E
 Carson McCullers\x00The Heart Is a Lonely Hunter      column=0:\x00\x00\x00\x00, timestamp=1638375201460, value=x
 Carson McCullers\x00The Heart Is a Lonely Hunter      column=0:\x80\x0B, timestamp=1638375201460, value=\x07\x94
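Each Phoenix row produces two HBase cells here: the cell with qualifier \x00\x00\x00\x00 and value x is a service cell that Phoenix adds to every row, and the second cell holds the PUBLIC_YEAR column. Phoenix stores UNSIGNED_SMALLINT as a two-byte big-endian value, so the hexadecimal values in the scan output can be decoded directly; a worked example based on the rows above:

\x07\xC6 = 0x07C6 = 7 * 256 + 198 = 1990   (A.S. Byatt, Possession)
\x07\xC3 = 0x07C3 = 7 * 256 + 195 = 1987   (Alan Moore, Watchmen)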
- After the table is created, you can run psql once more to load the second part of the data. Notice that this command uses only two files (the table creation script is not required anymore):
$ ./psql.py -t BEST_BOOKS -d '|' ~/best_books2.csv ~/best_books_query.sql
The output:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/phoenix/phoenix-5.0.0-HBase-2.0-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
21/12/01 16:16:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
csv columns from database.
CSV Upsert complete. 30 rows upserted
Time: 0.065 sec(s)
Year                                     Books count
---------------------------------------- ----------------------------------------
1924                                     1
1925                                     3
1926                                     1
1927                                     3
1929                                     2
1932                                     1
1934                                     5
1936                                     1
1937                                     1
1938                                     1
1939                                     4
1940                                     4
1945                                     4
1946                                     1
1947                                     1
1948                                     1
1949                                     2
1950                                     1
1951                                     1
1952                                     1
1953                                     2
1954                                     3
1955                                     4
1957                                     3
1958                                     1
1959                                     1
1960                                     3
1961                                     5
1962                                     4
1963                                     1
1964                                     1
1965                                     1
1966                                     2
1967                                     1
1969                                     4
1970                                     3
1973                                     1
1974                                     1
1975                                     1
1977                                     1
1980                                     1
1981                                     1
1984                                     2
1985                                     2
1986                                     1
1987                                     2
1990                                     1
1992                                     1
1996                                     1
1997                                     1
2000                                     2
2001                                     2
2005                                     1
Time: 0.033 sec(s)
- In conclusion, let's make sure that the data loaded into HBase via psql is not stored in HFiles yet. To do this, run the list_regions command. The value of the LOCALITY column, equal to 0, proves that (an optional way to flush the data to HFiles is shown after the output):

list_regions 'BEST_BOOKS'
The output:
 SERVER_NAME                                          | REGION_NAME                                                  | START_KEY  | END_KEY    | SIZE  | REQ   | LOCALITY   |
 ---------------------------------------------------- | ------------------------------------------------------------ | ---------- | ---------- | ----- | ----- | ---------- |
 bds-adh-2.ru-central1.internal,16020,1638426590048   | BEST_BOOKS,,1638430003108.36ee4f69dc1ab9d4a075091ff9ea3b80.  |            |            | 0     | 140   | 0.0        |
 1 rows
Took 0.5920 seconds
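At this point the rows loaded via psql still reside in the write-ahead log and the RegionServer MemStore. If you want to persist them to HFiles without waiting for an automatic flush, you could trigger a flush manually from the HBase shell (optional, not required for this example):

flush 'BEST_BOOKS'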
Method 2. Use MapReduce
- Using the HBase shell, truncate the table content to make it empty again. Note that the table must be created in Phoenix before using MapReduce.
truncate 'BEST_BOOKS'
The output looks like the following:
Truncating 'BEST_BOOKS' table (it may take a while):
Disabling table...
Truncating table...
Took 2.1019 seconds
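Optionally, you can confirm that the table is now empty before proceeding, either with the count command in the HBase shell or with SELECT COUNT(*) FROM BEST_BOOKS; in SQLLine:

count 'BEST_BOOKS'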
NOTE
Steps 2 and 3 are only necessary if you did not execute the bulk loading example in HBase, as described in Bulk loading in HBase.
- Create a directory for your user in HDFS if it does not exist yet:

$ sudo -u hdfs hdfs dfs -mkdir /user/dasha
$ hdfs dfs -ls /user
The output looks similar to the following:
Found 5 items
drwxr-xr-x   - hdfs   hadoop          0 2021-11-26 09:56 /user/dasha
drwx------   - hdfs   hadoop          0 2021-08-31 16:15 /user/hdfs
drwxr-xr-x   - mapred hadoop          0 2021-08-31 16:22 /user/history
drwxr-xr-x   - mapred mapred          0 2021-08-31 16:21 /user/mapred
drwxr-xr-x   - yarn   yarn            0 2021-09-01 06:57 /user/yarn
- Expand access rights to your folder using the following command (see more information about file protection in Protect files in HDFS):

$ sudo -u hdfs hdfs dfs -chmod 777 /user/dasha
$ hdfs dfs -ls /user
The output looks similar to the following:
Found 5 items
drwxrwxrwx   - hdfs   hadoop          0 2021-11-26 09:57 /user/dasha
drwx------   - hdfs   hadoop          0 2021-08-31 16:15 /user/hdfs
drwxr-xr-x   - mapred hadoop          0 2021-08-31 16:22 /user/history
drwxr-xr-x   - mapred mapred          0 2021-08-31 16:21 /user/mapred
drwxr-xr-x   - yarn   yarn            0 2021-09-01 06:57 /user/yarn
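Granting 777 is the simplest option for a test cluster. If it better suits your environment, a more restrictive alternative is to make the user the owner of the directory instead, for example:

$ sudo -u hdfs hdfs dfs -chown dasha:hadoop /user/dasha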
- Copy the best_books1.csv and best_books2.csv files from the local file system to HDFS:

$ hdfs dfs -copyFromLocal ~/best_books1.csv /user/dasha
$ hdfs dfs -copyFromLocal ~/best_books2.csv /user/dasha
- Make sure the files are located in HDFS:
$ hdfs dfs -ls /user/dasha
The output:
Found 6 items
drwx------   - dasha hadoop          0 2021-11-26 09:59 /user/dasha/.staging
-rw-r--r--   3 dasha hadoop       2736 2021-12-02 06:55 /user/dasha/best_books1.csv
-rw-r--r--   3 dasha hadoop       1140 2021-12-02 06:55 /user/dasha/best_books2.csv
drwxrwxrwx   - dasha hadoop          0 2021-11-26 09:59 /user/dasha/hbase-staging
-rw-r--r--   3 dasha hadoop      17651 2021-11-26 09:57 /user/dasha/people_ages.csv
drwxr-xr-x   - hbase hbase           0 2021-11-26 09:59 /user/dasha/test_output
- Use the following command to run the CsvBulkLoadTool MapReduce job. It performs both the data transformation and the loading of data into HFiles.
Enter hadoop jar and then define the full path to the phoenix-<phoenix_version>-HBase-<hbase_version>-client.jar JAR file. The arguments are listed below:

Arguments
-t   The table name (mandatory).
-d   The delimiter used in the CSV file (optional). The default value is a comma (,).
-i   The path to the CSV file (mandatory).

Other possible arguments can be found in the Phoenix documentation.
$ hadoop jar /usr/lib/phoenix/phoenix-5.0.0-HBase-2.0-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool -d '|' -t BEST_BOOKS -i /user/dasha/best_books1.csv
The output:2021-12-02 07:18:15,214 INFO util.QueryUtil: Creating connection with the jdbc url: jdbc:phoenix:localhost:2181:/hbase; 2021-12-02 07:18:15,543 INFO log.QueryLoggerDisruptor: Starting QueryLoggerDisruptor for with ringbufferSize=8192, waitStrategy=BlockingWaitStrategy, exceptionHandler=org.apache.phoenix.log.QueryLoggerDefaultExceptionHandler@1ee563dc... 2021-12-02 07:18:15,560 INFO query.ConnectionQueryServicesImpl: An instance of ConnectionQueryServices was created. 2021-12-02 07:18:15,690 INFO zookeeper.ReadOnlyZKClient: Connect 0x5ad1da90 to localhost:2181 with session timeout=90000ms, retries 30, retry interval 1000ms, keepAlive=60000ms 2021-12-02 07:18:15,709 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT 2021-12-02 07:18:15,709 INFO zookeeper.ZooKeeper: Client environment:host.name=bds-adh-1.ru-central1.internal 2021-12-02 07:18:15,709 INFO zookeeper.ZooKeeper: Client environment:java.version=1.8.0_312 2021-12-02 07:18:15,709 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Red Hat, Inc. 2021-12-02 07:18:15,709 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.312.b07-1.el7_9.x86_64/jre 2021-12-02 07:18:15,709 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jsch-0.1.54.jar:/usr/lib/hadoop/lib/accessors-smart-1.2.jar:/usr/lib/hadoop/lib/stax2-api.jar:/usr/lib/hadoop/lib/asm-5.0.4.jar:/usr/lib/hadoop/lib/jetty-io-9.3.24.v20180605.jar:/usr/lib/hadoop/lib/audience-annotations-0.5.0.jar:/usr/lib/hadoop/lib/token-provider-1.0.1.jar:/usr/lib/hadoop/lib/avro-1.7.7.jar:/usr/lib/hadoop/lib/json-smart-2.3.jar:/usr/lib/hadoop/lib/commons-beanutils-1.9.3.jar:/usr/lib/hadoop/lib/woodstox-core-5.0.3.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-codec-1.11.jar:/usr/lib/hadoop/lib/jetty-security-9.3.24.v20180605.jar:/usr/lib/hadoop/lib/commons-collections-3.2.2.jar:/usr/lib/hadoop/lib/jsr305-3.0.0.jar:/usr/lib/hadoop/lib/commons-compress-1.18.jar:/usr/lib/hadoop/lib/jetty-server-9.3.24.v20180605.jar:/usr/lib/hadoop/lib/commons-configuration2-2.1.1.jar:/usr/lib/hadoop/lib/woodstox-core.jar:/usr/lib/hadoop/lib/commons-io-2.5.jar:/usr/lib/hadoop/lib/jsr311-api-1.1.1.jar:/usr/lib/hadoop/lib/commons-lang-2.6.jar:/usr/lib/hadoop/lib/jul-to-slf4j-1.7.25.jar:/usr/lib/hadoop/lib/commons-lang3-3.4.jar:/usr/lib/hadoop/lib/kerb-server-1.0.1.jar:/usr/lib/hadoop/lib/commons-logging-1.1.3.jar:/usr/lib/hadoop/lib/kerb-simplekdc-1.0.1.jar:/usr/lib/hadoop/lib/commons-math3-3.1.1.jar:/usr/lib/hadoop/lib/zookeeper-3.4.13.jar:/usr/lib/hadoop/lib/commons-net-3.6.jar:/usr/lib/hadoop/lib/kerby-config-1.0.1.jar:/usr/lib/hadoop/lib/curator-client-2.13.0.jar:/usr/lib/hadoop/lib/jetty-servlet-9.3.24.v20180605.jar:/usr/lib/hadoop/lib/curator-framework-2.13.0.jar:/usr/lib/hadoop/lib/kerby-asn1-1.0.1.jar:/usr/lib/hadoop/lib/curator-recipes-2.13.0.jar:/usr/lib/hadoop/lib/gson-2.2.4.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/jetty-util-9.3.24.v20180605.jar:/usr/lib/hadoop/lib/htrace-core4-4.1.0-incubating.jar:/usr/lib/hadoop/lib/kerby-pkix-1.0.1.jar:/usr/lib/hadoop/lib/httpclient-4.5.2.jar:/usr/lib/hadoop/lib/httpcore-4.4.4.jar:/usr/lib/hadoop/lib/jetty-webapp-9.3.24.v20180605.jar:/usr/lib/hadoop/lib/jackson-annotations-2.7.8.jar:/usr/lib/hadoop/lib/kerby-util-1.0.1.jar:/usr/lib/hadoop/lib/jackson-core-2.7.8.jar:/usr/lib/hadoop/lib/kerby-xdr-1.0.1.jar
:/usr/lib/hadoop/lib/jackson-core-asl-1.9.13.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/jackson-databind-2.7.8.jar:/usr/lib/hadoop/lib/metrics-core-3.2.4.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.9.13.jar:/usr/lib/hadoop/lib/kerb-identity-1.0.1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.9.13.jar:/usr/lib/hadoop/lib/nimbus-jose-jwt-4.41.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.9.13.jar:/usr/lib/hadoop/lib/javax.servlet-api-3.1.0.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.11.jar:/usr/lib/hadoop/lib/netty-3.10.5.Final.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/jcip-annotations-1.0-1.jar:/usr/lib/hadoop/lib/protobuf-java-2.5.0.jar:/usr/lib/hadoop/lib/jersey-core-1.19.jar:/usr/lib/hadoop/lib/re2j-1.1.jar:/usr/lib/hadoop/lib/jersey-json-1.19.jar:/usr/lib/hadoop/lib/slf4j-api-1.7.25.jar:/usr/lib/hadoop/lib/jersey-server-1.19.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.25.jar:/usr/lib/hadoop/lib/jersey-servlet-1.19.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jetty-xml-9.3.24.v20180605.jar:/usr/lib/hadoop/lib/jetty-http-9.3.24.v20180605.jar:/usr/lib/hadoop/lib/kerb-admin-1.0.1.jar:/usr/lib/hadoop/lib/snappy-java-1.0.5.jar:/usr/lib/hadoop/lib/kerb-client-1.0.1.jar:/usr/lib/hadoop/lib/stax2-api-3.1.4.jar:/usr/lib/hadoop/lib/kerb-common-1.0.1.jar:/usr/lib/hadoop/lib/kerb-core-1.0.1.jar:/usr/lib/hadoop/lib/kerb-crypto-1.0.1.jar:/usr/lib/hadoop/lib/kerb-util-1.0.1.jar:/usr/lib/hadoop/.//hadoop-annotations-3.1.2.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-3.1.2.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop/.//hadoop-common-3.1.2-tests.jar:/usr/lib/hadoop/.//hadoop-common-3.1.2.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-kms-3.1.2.jar:/usr/lib/hadoop/.//hadoop-kms.jar:/usr/lib/hadoop/.//hadoop-nfs-3.1.2.jar:/usr/lib/hadoop/.//hadoop-nfs.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/kerb-client-1.0.1.jar:/usr/lib/hadoop-hdfs/lib/accessors-smart-1.2.jar:/usr/lib/hadoop-hdfs/lib/stax2-api-3.1.4.jar:/usr/lib/hadoop-hdfs/lib/asm-5.0.4.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-ajax-9.3.24.v20180605.jar:/usr/lib/hadoop-hdfs/lib/audience-annotations-0.5.0.jar:/usr/lib/hadoop-hdfs/lib/token-provider-1.0.1.jar:/usr/lib/hadoop-hdfs/lib/avro-1.7.7.jar:/usr/lib/hadoop-hdfs/lib/kerb-common-1.0.1.jar:/usr/lib/hadoop-hdfs/lib/commons-beanutils-1.9.3.jar:/usr/lib/hadoop-hdfs/lib/woodstox-core-5.0.3.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/kerb-core-1.0.1.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.11.jar:/usr/lib/hadoop-hdfs/lib/jetty-io-9.3.24.v20180605.jar:/usr/lib/hadoop-hdfs/lib/commons-collections-3.2.2.jar:/usr/lib/hadoop-hdfs/lib/kerb-crypto-1.0.1.jar:/usr/lib/hadoop-hdfs/lib/commons-compress-1.18.jar:/usr/lib/hadoop-hdfs/lib/jetty-xml-9.3.24.v20180605.jar:/usr/lib/hadoop-hdfs/lib/commons-configuration2-2.1.1.jar:/usr/lib/hadoop-hdfs/lib/kerb-identity-1.0.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.13.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.13.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.5.jar:/usr/lib/hadoop-hdfs/lib/kerb-server-1.0.1.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.6.jar:/usr/lib/hadoop-hdfs/lib/kerb-simplekdc-1.0.1.jar:/usr/lib/hadoop-hdfs/lib/commons-lang3-3.4.jar:/usr/lib/hadoop-hdfs/lib/kerb-util-1.0.1.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.3.jar:/usr/lib/hadoop-hdfs/lib/kerby-asn1-1.0.1.jar:/usr/lib/hadoop-hdfs/lib/commons-math3-3.1.1.jar:/usr/lib/hadoop-hdfs/lib/commons-net-3.6.jar:/usr/lib/had
oop-hdfs/lib/kerby-config-1.0.1.jar:/usr/lib/hadoop-hdfs/lib/curator-client-2.13.0.jar:/usr/lib/hadoop-hdfs/lib/jetty-webapp-9.3.24.v20180605.jar:/usr/lib/hadoop-hdfs/lib/curator-framework-2.13.0.jar:/usr/lib/hadoop-hdfs/lib/kerby-pkix-1.0.1.jar:/usr/lib/hadoop-hdfs/lib/curator-recipes-2.13.0.jar:/usr/lib/hadoop-hdfs/lib/gson-2.2.4.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/jsch-0.1.54.jar:/usr/lib/hadoop-hdfs/lib/htrace-core4-4.1.0-incubating.jar:/usr/lib/hadoop-hdfs/lib/kerby-util-1.0.1.jar:/usr/lib/hadoop-hdfs/lib/httpclient-4.5.2.jar:/usr/lib/hadoop-hdfs/lib/httpcore-4.4.4.jar:/usr/lib/hadoop-hdfs/lib/json-simple-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-annotations-2.7.8.jar:/usr/lib/hadoop-hdfs/lib/kerby-xdr-1.0.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-2.7.8.jar:/usr/lib/hadoop-hdfs/lib/leveldbjni-all-1.8.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.9.13.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jackson-databind-2.7.8.jar:/usr/lib/hadoop-hdfs/lib/netty-3.10.5.Final.jar:/usr/lib/hadoop-hdfs/lib/jackson-jaxrs-1.9.13.jar:/usr/lib/hadoop-hdfs/lib/json-smart-2.3.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.9.13.jar:/usr/lib/hadoop-hdfs/lib/netty-all-4.0.52.Final.jar:/usr/lib/hadoop-hdfs/lib/jackson-xc-1.9.13.jar:/usr/lib/hadoop-hdfs/lib/nimbus-jose-jwt-4.41.1.jar:/usr/lib/hadoop-hdfs/lib/javax.servlet-api-3.1.0.jar:/usr/lib/hadoop-hdfs/lib/jaxb-api-2.2.11.jar:/usr/lib/hadoop-hdfs/lib/okhttp-2.7.5.jar:/usr/lib/hadoop-hdfs/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-hdfs/lib/okio-1.6.0.jar:/usr/lib/hadoop-hdfs/lib/jcip-annotations-1.0-1.jar:/usr/lib/hadoop-hdfs/lib/paranamer-2.3.jar:/usr/lib/hadoop-hdfs/lib/jersey-core-1.19.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.5.0.jar:/usr/lib/hadoop-hdfs/lib/jersey-json-1.19.jar:/usr/lib/hadoop-hdfs/lib/re2j-1.1.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.19.jar:/usr/lib/hadoop-hdfs/lib/snappy-java-1.0.5.jar:/usr/lib/hadoop-hdfs/lib/jersey-servlet-1.19.jar:/usr/lib/hadoop-hdfs/lib/jettison-1.1.jar:/usr/lib/hadoop-hdfs/lib/jsr305-3.0.0.jar:/usr/lib/hadoop-hdfs/lib/jetty-http-9.3.24.v20180605.jar:/usr/lib/hadoop-hdfs/lib/jetty-security-9.3.24.v20180605.jar:/usr/lib/hadoop-hdfs/lib/jetty-server-9.3.24.v20180605.jar:/usr/lib/hadoop-hdfs/lib/jsr311-api-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/jetty-servlet-9.3.24.v20180605.jar:/usr/lib/hadoop-hdfs/lib/kerb-admin-1.0.1.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-9.3.24.v20180605.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-3.1.2-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-3.1.2.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-client-3.1.2-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-client-3.1.2.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-client.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-httpfs-3.1.2.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-httpfs.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-native-client-3.1.2-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-native-client-3.1.2.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-native-client.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-nfs-3.1.2.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-nfs.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-rbf-3.1.2-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-rbf-3.1.2.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-rbf.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-mapreduce/.//hadoop-streaming.jar:/usr/lib/hadoop-mapreduce/.//aliyun-sdk-oss-2.8.3.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-examples.jar:/usr/lib/hadoop-mapreduce/.//aws-java-sdk-bundle-1.11.271.jar:/usr/lib/hadoop-mapreduce/.//hadoop-open
stack-3.1.2.jar:/usr/lib/hadoop-mapreduce/.//azure-data-lake-store-sdk-2.2.7.jar:/usr/lib/hadoop-mapreduce/.//hadoop-openstack.jar:/usr/lib/hadoop-mapreduce/.//azure-keyvault-core-1.0.0.jar:/usr/lib/hadoop-mapreduce/.//jdom-1.1.jar:/usr/lib/hadoop-mapreduce/.//azure-storage-7.0.0.jar:/usr/lib/hadoop-mapreduce/.//kafka-clients-0.8.2.1.jar:/usr/lib/hadoop-mapreduce/.//hadoop-aliyun-3.1.2.jar:/usr/lib/hadoop-mapreduce/.//hadoop-aliyun.jar:/usr/lib/hadoop-mapreduce/.//hadoop-resourceestimator-3.1.2.jar:/usr/lib/hadoop-mapreduce/.//hadoop-archive-logs-3.1.2.jar:/usr/lib/hadoop-mapreduce/.//lz4-1.2.0.jar:/usr/lib/hadoop-mapreduce/.//hadoop-archive-logs.jar:/usr/lib/hadoop-mapreduce/.//netty-buffer-4.1.17.Final.jar:/usr/lib/hadoop-mapreduce/.//hadoop-archives-3.1.2.jar:/usr/lib/hadoop-mapreduce/.//hadoop-archives.jar:/usr/lib/hadoop-mapreduce/.//netty-codec-4.1.17.Final.jar:/usr/lib/hadoop-mapreduce/.//hadoop-aws-3.1.2.jar:/usr/lib/hadoop-mapreduce/.//hadoop-aws.jar:/usr/lib/hadoop-mapreduce/.//netty-codec-http-4.1.17.Final.jar:/usr/lib/hadoop-mapreduce/.//hadoop-azure-3.1.2.jar:/usr/lib/hadoop-mapreduce/.//hadoop-resourceestimator.jar:/usr/lib/hadoop-mapreduce/.//hadoop-azure-datalake-3.1.2.jar:/usr/lib/hadoop-mapreduce/.//netty-common-4.1.17.Final.jar:/usr/lib/hadoop-mapreduce/.//hadoop-azure-datalake.jar:/usr/lib/hadoop-mapreduce/.//hadoop-azure.jar:/usr/lib/hadoop-mapreduce/.//netty-handler-4.1.17.Final.jar:/usr/lib/hadoop-mapreduce/.//hadoop-datajoin-3.1.2.jar:/usr/lib/hadoop-mapreduce/.//hadoop-datajoin.jar:/usr/lib/hadoop-mapreduce/.//netty-resolver-4.1.17.Final.jar:/usr/lib/hadoop-mapreduce/.//hadoop-distcp-3.1.2.jar:/usr/lib/hadoop-mapreduce/.//hadoop-distcp.jar:/usr/lib/hadoop-mapreduce/.//netty-transport-4.1.17.Final.jar:/usr/lib/hadoop-mapreduce/.//hadoop-extras-3.1.2.jar:/usr/lib/hadoop-mapreduce/.//hadoop-extras.jar:/usr/lib/hadoop-mapreduce/.//ojalgo-43.0.jar:/usr/lib/hadoop-mapreduce/.//hadoop-fs2img-3.1.2.jar:/usr/lib/hadoop-mapreduce/.//hadoop-fs2img.jar:/usr/lib/hadoop-mapreduce/.//hadoop-gridmix-3.1.2.jar:/usr/lib/hadoop-mapreduce/.//hadoop-gridmix.jar:/usr/lib/hadoop-mapreduce/.//hadoop-kafka-3.1.2.jar:/usr/lib/hadoop-mapreduce/.//hadoop-kafka.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-jobclient-3.1.2-tests.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-app-3.1.2.jar:/usr/lib/hadoop-mapreduce/.//hadoop-rumen-3.1.2.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-app.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-jobclient-3.1.2.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-common-3.1.2.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-common.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-jobclient.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-core-3.1.2.jar:/usr/lib/hadoop-mapreduce/.//hadoop-sls-3.1.2.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-core.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-nativetask-3.1.2.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-hs-3.1.2.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-uploader-3.1.2.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-hs-plugins-3.1.2.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-uploader.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-hs-plugins.jar:/usr/lib/hadoop-mapreduce/.//hadoop-sls.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-hs.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-nativetask.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-exam
ples-3.1.2.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-shuffle-3.1.2.jar:/usr/lib/hadoop-mapreduce/.//hadoop-streaming-3.1.2.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-shuffle.jar:/usr/lib/hadoop-mapreduce/.//hadoop-rumen.jar:/usr/lib/hadoop-yarn/./:/usr/lib/hadoop-yarn/lib/HikariCP-java7-2.4.12.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/dnsjava-2.1.7.jar:/usr/lib/hadoop-yarn/lib/ehcache-3.3.1.jar:/usr/lib/hadoop-yarn/lib/fst-2.50.jar:/usr/lib/hadoop-yarn/lib/geronimo-jcache_1.0_spec-1.0-alpha-1.jar:/usr/lib/hadoop-yarn/lib/guice-4.0.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-4.0.jar:/usr/lib/hadoop-yarn/lib/jackson-jaxrs-base-2.7.8.jar:/usr/lib/hadoop-yarn/lib/jackson-jaxrs-json-provider-2.7.8.jar:/usr/lib/hadoop-yarn/lib/jackson-module-jaxb-annotations-2.7.8.jar:/usr/lib/hadoop-yarn/lib/java-util-1.9.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/jersey-client-1.19.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.19.jar:/usr/lib/hadoop-yarn/lib/json-io-2.5.1.jar:/usr/lib/hadoop-yarn/lib/metrics-core-3.2.4.jar:/usr/lib/hadoop-yarn/lib/mssql-jdbc-6.2.1.jre7.jar:/usr/lib/hadoop-yarn/lib/objenesis-1.0.jar:/usr/lib/hadoop-yarn/lib/snakeyaml-1.16.jar:/usr/lib/hadoop-yarn/lib/swagger-annotations-1.5.4.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-3.1.2.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-services-core.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-3.1.2.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-unmanaged-am-launcher-3.1.2.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-unmanaged-am-launcher.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-client-3.1.2.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-client.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-3.1.2.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-registry-3.1.2.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-registry.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-services-core-3.1.2.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-applicationhistoryservice-3.1.2.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-applicationhistoryservice.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-3.1.2.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-3.1.2.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-3.1.2.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-router-3.1.2.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-router.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-sharedcachemanager-3.1.2.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-sharedcachemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-3.1.2.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-timeline-pluginstorage-3.1.2.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-timeline-pluginstorage.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-3.1.2.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-services-api-3.1.2.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-services-api.jar 2021-12-02 07:18:15,710 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/lib/hadoop/lib/native 2021-12-02 07:18:15,710 INFO zookeeper.ZooKeeper: Client 
environment:java.io.tmpdir=/tmp 2021-12-02 07:18:15,710 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA> 2021-12-02 07:18:15,711 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux 2021-12-02 07:18:15,711 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64 2021-12-02 07:18:15,711 INFO zookeeper.ZooKeeper: Client environment:os.version=3.10.0-1160.11.1.el7.x86_64 2021-12-02 07:18:15,711 INFO zookeeper.ZooKeeper: Client environment:user.name=dasha 2021-12-02 07:18:15,711 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/dasha 2021-12-02 07:18:15,711 INFO zookeeper.ZooKeeper: Client environment:user.dir=/usr/lib/phoenix/bin 2021-12-02 07:18:15,714 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$13/1224047103@df7fe85 2021-12-02 07:18:15,736 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) 2021-12-02 07:18:15,741 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session 2021-12-02 07:18:15,750 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x100000335ca0024, negotiated timeout = 40000 2021-12-02 07:18:15,915 INFO query.ConnectionQueryServicesImpl: HConnection established. Stacktrace for informational purposes: hconnection-0xe487ec java.lang.Thread.getStackTrace(Thread.java:1559) org.apache.phoenix.util.LogUtil.getCallerStackTrace(LogUtil.java:55) org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:432) org.apache.phoenix.query.ConnectionQueryServicesImpl.access$400(ConnectionQueryServicesImpl.java:272) org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:2556) org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:2532) org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:76) org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:2532) org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:255) org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.createConnection(PhoenixEmbeddedDriver.java:150) org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:221) java.sql.DriverManager.getConnection(DriverManager.java:664) java.sql.DriverManager.getConnection(DriverManager.java:208) org.apache.phoenix.util.QueryUtil.getConnection(QueryUtil.java:400) org.apache.phoenix.util.QueryUtil.getConnection(QueryUtil.java:392) org.apache.phoenix.mapreduce.AbstractBulkLoadTool.loadData(AbstractBulkLoadTool.java:206) org.apache.phoenix.mapreduce.AbstractBulkLoadTool.run(AbstractBulkLoadTool.java:180) org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) org.apache.phoenix.mapreduce.CsvBulkLoadTool.main(CsvBulkLoadTool.java:109) sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) java.lang.reflect.Method.invoke(Method.java:498) org.apache.hadoop.util.RunJar.run(RunJar.java:318) org.apache.hadoop.util.RunJar.main(RunJar.java:232) 2021-12-02 07:18:18,132 INFO zookeeper.ReadOnlyZKClient: Connect 0x4ecebaad 
to localhost:2181 with session timeout=90000ms, retries 30, retry interval 1000ms, keepAlive=60000ms 2021-12-02 07:18:18,133 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$13/1224047103@df7fe85 2021-12-02 07:18:18,135 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) 2021-12-02 07:18:18,135 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session 2021-12-02 07:18:18,139 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x100000335ca0025, negotiated timeout = 40000 2021-12-02 07:18:18,145 INFO zookeeper.ReadOnlyZKClient: Connect 0x0aecb893 to localhost:2181 with session timeout=90000ms, retries 30, retry interval 1000ms, keepAlive=60000ms 2021-12-02 07:18:18,146 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$13/1224047103@df7fe85 2021-12-02 07:18:18,147 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) 2021-12-02 07:18:18,147 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session 2021-12-02 07:18:18,150 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x100000335ca0026, negotiated timeout = 40000 2021-12-02 07:18:18,362 INFO mapreduce.MultiHfileOutputFormat: the table logical name is BEST_BOOKS 2021-12-02 07:18:18,362 INFO client.ConnectionImplementation: Closing master protocol: MasterService 2021-12-02 07:18:18,363 INFO zookeeper.ReadOnlyZKClient: Close zookeeper connection 0x0aecb893 to localhost:2181 2021-12-02 07:18:18,365 INFO mapreduce.MultiHfileOutputFormat: Configuring 1 reduce partitions to match current region count 2021-12-02 07:18:18,365 INFO mapreduce.MultiHfileOutputFormat: Writing partition information to /tmp/hadoop-dasha/partitions_dfe383e6-d920-4829-8340-a86b2926db27 2021-12-02 07:18:18,366 INFO zookeeper.ZooKeeper: Session: 0x100000335ca0026 closed 2021-12-02 07:18:18,367 INFO zookeeper.ClientCnxn: EventThread shut down for session: 0x100000335ca0026 2021-12-02 07:18:18,489 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library 2021-12-02 07:18:18,490 INFO compress.CodecPool: Got brand-new compressor [.deflate] 2021-12-02 07:18:18,643 INFO mapreduce.AbstractBulkLoadTool: Running MapReduce import job from /user/dasha/best_books1.csv to /tmp/2533c78e-2a96-47f4-9e06-aaec783f979f 2021-12-02 07:18:18,836 INFO client.RMProxy: Connecting to ResourceManager at bds-adh-2.ru-central1.internal/10.92.6.9:8032 2021-12-02 07:18:18,965 INFO client.AHSProxy: Connecting to Application History server at bds-adh-3.ru-central1.internal/10.92.6.90:10200 2021-12-02 07:18:19,083 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /user/dasha/.staging/job_1638426545038_0003 2021-12-02 07:18:20,061 INFO input.FileInputFormat: Total input files to process : 1 2021-12-02 07:18:20,159 INFO mapreduce.JobSubmitter: number of splits:1 2021-12-02 07:18:20,198 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. 
Instead, use dfs.bytes-per-checksum 2021-12-02 07:18:20,291 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1638426545038_0003 2021-12-02 07:18:20,291 INFO mapreduce.JobSubmitter: Executing with tokens: [] 2021-12-02 07:18:20,445 INFO conf.Configuration: resource-types.xml not found 2021-12-02 07:18:20,445 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'. 2021-12-02 07:18:20,506 INFO impl.YarnClientImpl: Submitted application application_1638426545038_0003 2021-12-02 07:18:20,642 INFO mapreduce.Job: The url to track the job: http://bds-adh-2.ru-central1.internal:8088/proxy/application_1638426545038_0003/ 2021-12-02 07:18:20,642 INFO mapreduce.Job: Running job: job_1638426545038_0003 2021-12-02 07:18:29,739 INFO mapreduce.Job: Job job_1638426545038_0003 running in uber mode : false 2021-12-02 07:18:29,741 INFO mapreduce.Job: map 0% reduce 0% 2021-12-02 07:18:37,797 INFO mapreduce.Job: map 100% reduce 0% 2021-12-02 07:18:46,833 INFO mapreduce.Job: map 100% reduce 100% 2021-12-02 07:18:46,838 INFO mapreduce.Job: Job job_1638426545038_0003 completed successfully 2021-12-02 07:18:46,920 INFO mapreduce.Job: Counters: 54 File System Counters FILE: Number of bytes read=5824 FILE: Number of bytes written=534283 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=2839 HDFS: Number of bytes written=8614 HDFS: Number of read operations=14 HDFS: Number of large read operations=0 HDFS: Number of write operations=5 Job Counters Launched map tasks=1 Launched reduce tasks=1 Data-local map tasks=1 Total time spent by all maps in occupied slots (ms)=6406 Total time spent by all reduces in occupied slots (ms)=6027 Total time spent by all map tasks (ms)=6406 Total time spent by all reduce tasks (ms)=6027 Total vcore-milliseconds taken by all map tasks=6406 Total vcore-milliseconds taken by all reduce tasks=6027 Total megabyte-milliseconds taken by all map tasks=6559744 Total megabyte-milliseconds taken by all reduce tasks=6171648 Map-Reduce Framework Map input records=70 Map output records=70 Map output bytes=5678 Map output materialized bytes=5824 Input split bytes=103 Combine input records=0 Combine output records=0 Reduce input groups=70 Reduce shuffle bytes=5824 Reduce input records=70 Reduce output records=140 Spilled Records=140 Shuffled Maps =1 Failed Shuffles=0 Merged Map outputs=1 GC time elapsed (ms)=230 CPU time spent (ms)=8510 Physical memory (bytes) snapshot=1084018688 Virtual memory (bytes) snapshot=5900730368 Total committed heap usage (bytes)=815267840 Peak Map Physical memory (bytes)=597655552 Peak Map Virtual memory (bytes)=2946670592 Peak Reduce Physical memory (bytes)=486363136 Peak Reduce Virtual memory (bytes)=2954059776 Phoenix MapReduce Import Upserts Done=70 Shuffle Errors BAD_ID=0 CONNECTION=0 IO_ERROR=0 WRONG_LENGTH=0 WRONG_MAP=0 WRONG_REDUCE=0 File Input Format Counters Bytes Read=2736 File Output Format Counters Bytes Written=8614 2021-12-02 07:18:46,922 INFO mapreduce.AbstractBulkLoadTool: Loading HFiles from /tmp/2533c78e-2a96-47f4-9e06-aaec783f979f 2021-12-02 07:18:46,950 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. 
Instead, use dfs.bytes-per-checksum 2021-12-02 07:18:46,953 INFO zookeeper.ReadOnlyZKClient: Connect 0x59a2153e to localhost:2181 with session timeout=90000ms, retries 30, retry interval 1000ms, keepAlive=60000ms 2021-12-02 07:18:46,953 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$13/1224047103@df7fe85 2021-12-02 07:18:46,955 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) 2021-12-02 07:18:46,955 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session 2021-12-02 07:18:46,958 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x100000335ca0027, negotiated timeout = 40000 2021-12-02 07:18:46,959 INFO mapreduce.AbstractBulkLoadTool: Loading HFiles for BEST_BOOKS from /tmp/2533c78e-2a96-47f4-9e06-aaec783f979f/BEST_BOOKS 2021-12-02 07:18:47,126 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2-hbase.properties 2021-12-02 07:18:47,132 WARN impl.MetricsSystemImpl: Error creating sink 'tracing' org.apache.hadoop.metrics2.impl.MetricsConfigException: Error creating plugin: org.apache.phoenix.trace.PhoenixMetricsSink at org.apache.hadoop.metrics2.impl.MetricsConfig.getPlugin(MetricsConfig.java:211) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.newSink(MetricsSystemImpl.java:531) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.configureSinks(MetricsSystemImpl.java:503) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.configure(MetricsSystemImpl.java:479) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:188) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.init(MetricsSystemImpl.java:163) at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.init(DefaultMetricsSystem.java:62) at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.initialize(DefaultMetricsSystem.java:58) at org.apache.hadoop.hbase.metrics.BaseSourceImpl$DefaultMetricsSystemInitializer.init(BaseSourceImpl.java:54) at org.apache.hadoop.hbase.metrics.BaseSourceImpl.<init>(BaseSourceImpl.java:116) at org.apache.hadoop.hbase.io.MetricsIOSourceImpl.<init>(MetricsIOSourceImpl.java:46) at org.apache.hadoop.hbase.io.MetricsIOSourceImpl.<init>(MetricsIOSourceImpl.java:38) at org.apache.hadoop.hbase.regionserver.MetricsRegionServerSourceFactoryImpl.createIO(MetricsRegionServerSourceFactoryImpl.java:84) at org.apache.hadoop.hbase.io.MetricsIO.<init>(MetricsIO.java:35) at org.apache.hadoop.hbase.io.hfile.HFile.<clinit>(HFile.java:195) at org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.visitBulkHFiles(LoadIncrementalHFiles.java:1025) at org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.discoverLoadQueue(LoadIncrementalHFiles.java:941) at org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.prepareHFileQueue(LoadIncrementalHFiles.java:224) at org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:331) at org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:256) at org.apache.phoenix.mapreduce.AbstractBulkLoadTool.completebulkload(AbstractBulkLoadTool.java:355) at org.apache.phoenix.mapreduce.AbstractBulkLoadTool.submitJob(AbstractBulkLoadTool.java:331) at org.apache.phoenix.mapreduce.AbstractBulkLoadTool.loadData(AbstractBulkLoadTool.java:267) at 
org.apache.phoenix.mapreduce.AbstractBulkLoadTool.run(AbstractBulkLoadTool.java:180) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) at org.apache.phoenix.mapreduce.CsvBulkLoadTool.main(CsvBulkLoadTool.java:109) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:318) at org.apache.hadoop.util.RunJar.main(RunJar.java:232) Caused by: java.lang.ClassNotFoundException: org.apache.phoenix.trace.PhoenixMetricsSink at java.net.URLClassLoader.findClass(URLClassLoader.java:387) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) at java.lang.ClassLoader.loadClass(ClassLoader.java:351) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:348) at org.apache.hadoop.metrics2.impl.MetricsConfig.getPlugin(MetricsConfig.java:205) ... 32 more 2021-12-02 07:18:47,203 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s). 2021-12-02 07:18:47,203 INFO impl.MetricsSystemImpl: HBase metrics system started 2021-12-02 07:18:47,220 INFO metrics.MetricRegistries: Loaded MetricRegistries class org.apache.hadoop.hbase.metrics.impl.MetricRegistriesImpl 2021-12-02 07:18:47,346 INFO hfile.CacheConfig: Created cacheConfig: CacheConfig:disabled 2021-12-02 07:18:47,412 INFO tool.LoadIncrementalHFiles: Trying to load hfile=hdfs://adh1/tmp/2533c78e-2a96-47f4-9e06-aaec783f979f/BEST_BOOKS/0/f544e36fd2044f07bce46e920d6c4c60 first=Optional[A.S. Byatt\x00Possession] last=Optional[Zora Neale Hurston\x00Their Eyes Were Watching God] 2021-12-02 07:18:47,526 INFO mapreduce.AbstractBulkLoadTool: Incremental load complete for table=BEST_BOOKS 2021-12-02 07:18:47,526 INFO mapreduce.AbstractBulkLoadTool: Removing output directory /tmp/2533c78e-2a96-47f4-9e06-aaec783f979f
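In this run, ZooKeeper is reachable on the local host, so no connection details are passed. If you submit the job from a host without a local ZooKeeper, the quorum can usually be supplied explicitly via the -z argument (the host name below is a placeholder; check the Phoenix documentation for the options available in your version):

$ hadoop jar /usr/lib/phoenix/phoenix-5.0.0-HBase-2.0-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool -z zookeeper-host:2181 -d '|' -t BEST_BOOKS -i /user/dasha/best_books1.csv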
- Check the results of the command. You can run SQLLine and perform the following query to display the first ten records of the created table:
SELECT * FROM BEST_BOOKS LIMIT 10;
The output:
+-----------------------+-----------------------------------------------------+--------------+
| AUTHOR                | TITLE                                               | PUBLIC_YEAR  |
+-----------------------+-----------------------------------------------------+--------------+
| A.S. Byatt            | Possession                                          | 1990         |
| Alan Moore            | Watchmen                                            | 1987         |
| Anthony Burgess       | A Clockwork Orange                                  | 1962         |
| C.S. Lewis            | The Lion, the Witch and the Wardrobe                | 1950         |
| Carson McCullers      | The Heart Is a Lonely Hunter                        | 1940         |
| Chinua Achebe         | Chinua Achebe                                       | 1958         |
| Cormac McCarthy       | Blood Meridian, or the Evening Redness in the West  | 1985         |
| Dashiell Hammett      | Red Harvest                                         | 1929         |
| David Foster Wallace  | Infinite Jest                                       | 1996         |
| Don DeLillo           | White Noise                                         | 1985         |
+-----------------------+-----------------------------------------------------+--------------+
10 rows selected (0.092 seconds)
You can also get the row count of the table:
SELECT COUNT(*) FROM BEST_BOOKS;
The output:
+-----------+
| COUNT(1)  |
+-----------+
| 70        |
+-----------+
1 row selected (0.048 seconds)
- Finally, run the list_regions command in the HBase shell. You can see that the value of the LOCALITY column is now 1. This proves that the second bulk loading method writes data directly into HFiles in HBase:

list_regions 'BEST_BOOKS'
The output:
 SERVER_NAME                                          | REGION_NAME                                                  | START_KEY  | END_KEY    | SIZE  | REQ   | LOCALITY   |
 ---------------------------------------------------- | ------------------------------------------------------------ | ---------- | ---------- | ----- | ----- | ---------- |
 bds-adh-1.ru-central1.internal,16020,1638426590054   | BEST_BOOKS,,1638427992879.8a84103a2b4ed50e718587bcc8a1e63b.  |            |            | 0     | 81    | 1.0        |
 1 rows
Took 0.6029 seconds
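The MapReduce example above loaded only best_books1.csv. If you want the table to contain the full data set, as in the psql example, you can run the same job once more for the second file that was already copied to HDFS; the row count grows accordingly:

$ hadoop jar /usr/lib/phoenix/phoenix-5.0.0-HBase-2.0-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool -d '|' -t BEST_BOOKS -i /user/dasha/best_books2.csv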