Hive with HBase
Hive/HBase integration allows Hive QL statements to access HBase tables for both read (SELECT) and write (INSERT) operations. You can create a new HBase table in Hive or access an existing table.
Create a new HBase table in Hive
To create a new HBase table that can be managed by Hive, use the STORED BY clause in the CREATE TABLE statement. It allows you to specify a storage handler required for integration.
Example:
$ sudo -u hive hive
CREATE TABLE hbase_table_test(key int, value1 string, value2 int, value3 int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,col_family1:val1,col_family1:val2,col_family2:val3")
TBLPROPERTIES ("hbase.table.name" = "hive_test", "hbase.mapred.output.outputtable" = "hive_test");
Where:
- 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'— specifies a storage handler that allows Hive to access data stored in HBase. - 
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,col_family1:val1,col_family1:val2,col_family2:val3")— controls the mapping of HBase columns to Hive. Thehbase.columns.mappingcan contain the following entries::key,:timestamp, or<column-family-name>:[<column-name>][#(binary|string)]. The type specification is delimited by#. Ifhbase.columns.mappingdoes not include the type specification, the value fromhbase.table.default.storage.typeis used. - 
TBLPROPERTIES ("hbase.table.name" = "hive_test", "hbase.mapred.output.outputtable" = "hive_test")— specifies table properties. Thehbase.table.nameproperty defines the table name in HBase. In this example, the table is known ashbase_table_testwithin Hive, and ashive_testwithin HBase. Ifhbase.table.nameis not set, the table has the same name in Hive and HBase. Thehbase.mapred.output.outputtableproperty is required if you plan to insert data into the table (it is used by the hbase.mapreduce.TableOutputFormat class). 
Execute the list command to check if HBase has information about the new table:
$ hbase shell
$ list
The list output should contain the hive_test table.
In HBase shell, utilize the put command to add data to this table:
put 'hive_test', '1' , 'col_family1:val1', 'test'
put 'hive_test', '1' , 'col_family1:val2', '1'
put 'hive_test', '1' , 'col_family2:val3', '1'
put 'hive_test', '2' , 'col_family1:val1', 'test2'
put 'hive_test', '2' , 'col_family1:val2', '2'
put 'hive_test', '2' , 'col_family2:val3', '2'
Use the scan command to check the table data:
scan 'hive_test'
The result:
ROW COLUMN+CELL 1 column=col_family1:val1, timestamp=1694606917613, value=test 1 column=col_family1:val2, timestamp=1694606917647, value=1 1 column=col_family2:val3, timestamp=1694606922996, value=1 2 column=col_family1:val1, timestamp=1694607054746, value=test2 2 column=col_family1:val2, timestamp=1694607054763, value=2 2 column=col_family2:val3, timestamp=1694607056127, value=2 2 row(s) Took 0.0175 seconds
The data is added successfully to the table created in Hive.
Insert a new row to the same table through the Hive shell:
INSERT INTO hbase_table_test VALUES(3,'test3', 3, 3);
Check the table data in the HBase shell:
scan 'hive_test'
The result:
ROW COLUMN+CELL 1 column=col_family1:val1, timestamp=1698334191476, value=test 1 column=col_family1:val2, timestamp=1698334191504, value=1 1 column=col_family2:val3, timestamp=1698334191538, value=1 2 column=col_family1:val1, timestamp=1698334191591, value=test2 2 column=col_family1:val2, timestamp=1698334191616, value=2 2 column=col_family2:val3, timestamp=1698334193906, value=2 3 column=col_family1:val1, timestamp=1698334255870, value=test3 3 column=col_family1:val2, timestamp=1698334255870, value=3 3 column=col_family2:val3, timestamp=1698334255870, value=3 3 row(s) Took 0.0133 seconds
This example demonstrates that Hive and HBase can work with the same table.
Use an existing HBase table in Hive
You can also operate with an existing HBase table in Hive. The example below uses the people_ages table from the Bulk loading via built-in MapReduce jobs article.
For an existing HBase table, create an external table in the Hive shell and specify other properties according to the table structure:
CREATE EXTERNAL TABLE hbase_people_ages  (name string, age int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,basic:age")
TBLPROPERTIES ("hbase.table.name" = "people_ages", "hbase.mapred.output.outputtable" = "people_ages");
Select data to check whether the table is available:
SELECT * FROM hbase_people_ages LIMIT 5;
The result:
OK Abbott Delia 62 Abbott Howard 24 Abbott Jack 29 Adams Clyde 29 Aguilar Myrtie 23 Time taken: 0.161 seconds, Fetched: 5 row(s)
It is also possible to combine access to HBase tables with native Hive tables via JOIN and UNION clauses.
Create another table in Hive:
CREATE TABLE people (key int, name string, points int);
INSERT INTO people
    VALUES (1,'Phelps Rena', 35), (2,'Quinn Brian', 48), (1,'Reyes Danny', 52);
Execute the query with a JOIN clause:
SELECT hbase_people_ages.*, people.points FROM hbase_people_ages JOIN people ON (hbase_people_ages.name = people.name);
The result:
OK Phelps Rena 18 35 Quinn Brian 20 48 Reyes Danny 41 52 Time taken: 10.003 seconds, Fetched: 3 row(s)