Quick start with HBase shell
The simplest way to start working with HBase is to use the HBase shell, a JRuby console that is available on each node of the HBase cluster immediately after installation. To start the HBase shell, run the following command:
$ hbase shell
The shell prompt ends with a > character. All subsequent commands should be written after it.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/phoenix/phoenix-5.0.0-HBase-2.0-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/phoenix/phoenix-5.0.0-HBase-2.0-hive.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/phoenix/phoenix-5.0.0-HBase-2.0-pig.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/phoenix/phoenix-5.0.0-HBase-2.0-thin-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
Version 2.0.2, rUnknown, Thu Apr 15 20:20:26 UTC 2021
Took 0.0024 seconds
hbase(main):001:0>
To leave the HBase shell, use the following command:
exit
The main data operations available in HBase are listed below. They are based on the example described in HBase data model. For the full list of these and other useful shell commands, with descriptions of their parameters, see HBase shell commands.
NOTE
Before executing HBase commands, we recommend that you read about the HBase data model.
Step 1. Create a table
To create a new table, write the create keyword, followed by the table name in quotes and then the definitions of all column families of the table. Define each column family in one of the following ways:
- Write only the name of the column family in quotes, without curly brackets.
- Describe several attributes of the column family as key-value pairs in curly brackets.
The following command creates the articles table with two column families: basic and tags. Each column family is defined with two attributes: its name and the maximum number of stored value versions. As a result, every column in these families stores up to five versions of each data value.
create 'articles', {NAME => 'basic', VERSIONS => 5}, {NAME => 'tags', VERSIONS => 5}
The output is similar to the following:
Created table articles
Took 1.6225 seconds
=> Hbase::Table - articles
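To make the effect of the VERSIONS attribute more concrete, here is a minimal plain-Ruby sketch. It is not HBase code: VersionedCell is a hypothetical class invented for this illustration, and it only models what a reader of the table observes, namely that a cell keeps at most the configured number of newest values.

```ruby
# Hypothetical in-memory model of a single cell; not part of any HBase API.
class VersionedCell
  def initialize(max_versions)
    @max_versions = max_versions
    @versions = {} # timestamp => value
  end

  # Store a value under a timestamp and keep only the newest max_versions entries.
  def put(timestamp, value)
    @versions[timestamp] = value
    newest = @versions.keys.sort.reverse.first(@max_versions)
    @versions.select! { |ts, _value| newest.include?(ts) }
    self
  end

  # Return up to `count` [timestamp, value] pairs, newest first.
  def read(count = 1)
    @versions.sort_by { |ts, _value| -ts }.first(count)
  end
end

cell = VersionedCell.new(5)                   # same limit as VERSIONS => 5
(1..7).each { |i| cell.put(i, "v#{i}") }      # write seven versions
p cell.read(5).map(&:last)                    # only the five newest remain
```

In this model, the two oldest values are no longer visible after the seventh write, which mirrors the limit set for the basic and tags column families.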
Step 2. Get information about the table
To check whether a table exists and to get information about it, use the following commands:
- exists. This command checks whether the specified table exists:
exists 'articles'
The output is similar to the following:
Table articles does exist
Took 0.0705 seconds
=> true
If the specified table does not exist, the command returns the following message:
Table not_existed does not exist
Took 0.0076 seconds
=> false
- list. This command returns the table name if the specified table exists:
list 'articles'
The output is similar to the following:
TABLE
articles
1 row(s)
Took 0.3102 seconds
=> ["articles"]
If the specified table does not exist, the command returns the following message:
TABLE
0 row(s)
Took 0.0064 seconds
=> []
- describe. This command checks whether the table exists and is enabled and shows information about its column families:
describe 'articles'
The output is similar to the following:
Table articles is ENABLED
articles
COLUMN FAMILIES DESCRIPTION
{NAME => 'basic', VERSIONS => '5', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
{NAME => 'tags', VERSIONS => '5', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
2 row(s)
Took 0.1100 seconds
If the specified table does not exist, the command returns the following message:
ERROR: Table not_existed does not exist.
Describe the named table. For example:
hbase> describe 't1'
hbase> describe 'ns1:t1'
Alternatively, you can use the abbreviated 'desc' for the same thing.
hbase> desc 't1'
hbase> desc 'ns1:t1'
Took 0.0159 seconds
Step 3. Put new data into the table
To put new data into the table, use the put keyword followed by a comma-separated list of the table name, row key, column name, and data value. Each column name is a colon-separated combination of the column family name and the column qualifier. Column families are fixed: their names are defined during table creation. Column qualifiers are not fixed, so you can add them on the fly while putting new data into previously created tables.
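The structure of a column name can be illustrated with a short plain-Ruby sketch. This is not how the HBase shell is implemented: split_column, valid_column?, and FAMILIES are hypothetical names used only to show that the family part must already exist in the table, while the qualifier part can be anything.

```ruby
# Hypothetical helpers for illustration only; not part of the HBase shell.
FAMILIES = %w[basic tags].freeze # column families fixed at table creation

# Split 'family:qualifier' at the first colon.
def split_column(column)
  family, qualifier = column.split(':', 2)
  { family: family, qualifier: qualifier }
end

# A column is acceptable when its family exists; the qualifier is free-form.
def valid_column?(column)
  FAMILIES.include?(split_column(column)[:family])
end

p split_column('tags:ref')
p valid_column?('tags:ref')   # the family exists, the qualifier is new
p valid_column?('extra:ref')  # the family was never created
```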
The following commands put data into the articles table for the article1 row key. They define the author and header columns in the basic column family, and the arch, concepts, and tutorials columns in the tags column family. The data value is the last argument of each command. HBase generates timestamps for these data values automatically.
put 'articles', 'article1', 'basic:author', 'Test author'
put 'articles', 'article1', 'basic:header', 'Test article'
put 'articles', 'article1', 'tags:arch', true
put 'articles', 'article1', 'tags:concepts', true
put 'articles', 'article1', 'tags:tutorials', true
The following commands also put data into the articles table, but for another row key, article2. Note that you do not have to reuse the column qualifiers specified earlier. You can enter only the ones you need:
put 'articles', 'article2', 'basic:author', 'Test author2'
put 'articles', 'article2', 'basic:header', 'Test article2'
put 'articles', 'article2', 'tags:ref', true
The output of each put command is similar to the following:
Took 0.0051 seconds
Step 4. Get a specific table row
HBase provides various commands to read table rows as described in the following sections.
Read the whole row
To read a specific row from a table, use the get keyword and then define the table name and the required row key. For example, the following command gets all values with the article1 row key from the articles table:
get 'articles', 'article1'
The command returns data values of all the columns that the specified row contains. Each returned cell shows the value and timestamp of its creation:
COLUMN CELL
basic:author timestamp=1637054560096, value=Test author
basic:header timestamp=1637054560118, value=Test article
tags:arch timestamp=1637054560141, value=true
tags:concepts timestamp=1637054560160, value=true
tags:tutorials timestamp=1637054564066, value=true
1 row(s)
Took 0.0442 seconds
The output of the same command for the row key article2 is as follows:
COLUMN CELL
basic:author timestamp=1637054576501, value=Test author2
basic:header timestamp=1637054576516, value=Test article2
tags:ref timestamp=1637054577512, value=true
1 row(s)
Took 0.0099 seconds
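The automatically generated timestamps in these outputs are write times expressed in milliseconds since the Unix epoch. As a small illustration, the following plain-Ruby snippet converts the timestamp of the basic:author cell of article1 into a readable UTC time. The variable names are chosen for this example only.

```ruby
# Timestamp copied from the basic:author cell of the article1 row.
timestamp_ms = 1637054560096

# Split into whole seconds and the millisecond remainder to avoid float rounding.
written_at = Time.at(timestamp_ms / 1000, timestamp_ms % 1000, :millisecond).utc

puts written_at.strftime('%Y-%m-%d %H:%M:%S UTC')
```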
By default, the get command shows only the latest version of each data value, even though the table can store up to five versions for each cell, as defined during table creation. If you add a new value, the get command returns this new value because it has the latest timestamp.
The following example demonstrates how to add a new value for the basic:header column in the row with the article1 key:
put 'articles', 'article1', 'basic:header', 'Test article. Version 2'
get 'articles', 'article1'
The output of the subsequent get command is similar to this:
COLUMN CELL
basic:author timestamp=1637054560096, value=Test author
basic:header timestamp=1637055836875, value=Test article. Version 2
tags:arch timestamp=1637054560141, value=true
tags:concepts timestamp=1637054560160, value=true
tags:tutorials timestamp=1637054564066, value=true
1 row(s)
Took 0.0066 seconds
The next example shows how to add a third version of the value for the basic:header column in the row with the article1 key:
put 'articles', 'article1', 'basic:header', 'Test article. Version 3'
get 'articles', 'article1'
The output is similar to this:
COLUMN CELL
basic:author timestamp=1637054560096, value=Test author
basic:header timestamp=1637056832082, value=Test article. Version 3
tags:arch timestamp=1637054560141, value=true
tags:concepts timestamp=1637054560160, value=true
tags:tutorials timestamp=1637054564066, value=true
1 row(s)
Took 0.0064 seconds
Read a specific column
To read a specific column of the row, use the get command with the column name after the table name and the row key. The column name contains the column family name and the column qualifier, separated by a colon.
The following command gets the latest value added to the basic:header column in the row with the article1 key:
get 'articles', 'article1', 'basic:header'
The output is similar to this:
COLUMN CELL
basic:header timestamp=1637056832082, value=Test article. Version 3
1 row(s)
Took 0.0227 seconds
As you can see, this command returns the most recently added column value. To get a particular version of the value, specify its timestamp along with the column name in curly brackets:
get 'articles', 'article1', {COLUMN => 'basic:header', TIMESTAMP => 1637054560118}
The output is similar to this:
COLUMN CELL
basic:header timestamp=1637054560118, value=Test article
1 row(s)
Took 0.0171 seconds
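The difference between the two forms of get can be summarized with a minimal plain-Ruby sketch. It is a model of the behavior shown in this step, not HBase code: the header_versions hash is a hypothetical stand-in for the stored versions of the basic:header cell, filled with the timestamps and values from the outputs in this section.

```ruby
# Versions of basic:header for article1 as timestamp => value,
# copied from the outputs in this section.
header_versions = {
  1637054560118 => 'Test article',
  1637055836875 => 'Test article. Version 2',
  1637056832082 => 'Test article. Version 3'
}

# A get without TIMESTAMP corresponds to the entry with the greatest timestamp.
latest_timestamp = header_versions.keys.max
puts header_versions[latest_timestamp]

# A get with TIMESTAMP corresponds to the entry stored under exactly that timestamp.
puts header_versions[1637054560118]
```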
Step 5. Scan the table for all data at once
The scan command returns all rows of a table.
Scan all columns
To get all the columns for all table rows, use the scan keyword and the table name after it. The following command returns the content of the articles table:
scan 'articles'
The result is similar to this:
ROW COLUMN+CELL
article1 column=basic:author, timestamp=1637054560096, value=Test author
article1 column=basic:header, timestamp=1637056832082, value=Test article. Version 3
article1 column=tags:arch, timestamp=1637054560141, value=true
article1 column=tags:concepts, timestamp=1637054560160, value=true
article1 column=tags:tutorials, timestamp=1637054564066, value=true
article2 column=basic:author, timestamp=1637054576501, value=Test author2
article2 column=basic:header, timestamp=1637054576516, value=Test article2
article2 column=tags:ref, timestamp=1637054577512, value=true
2 row(s)
Took 0.0204 seconds
To get a specific number of value versions, not only the latest one, specify the required number of versions using the following command:
scan 'articles', {VERSIONS => 5}
The output is similar to this:
ROW COLUMN+CELL
article1 column=basic:author, timestamp=1637054560096, value=Test author
article1 column=basic:header, timestamp=1637056832082, value=Test article. Version 3
article1 column=basic:header, timestamp=1637055836875, value=Test article. Version 2
article1 column=basic:header, timestamp=1637054560118, value=Test article
article1 column=tags:arch, timestamp=1637054560141, value=true
article1 column=tags:concepts, timestamp=1637054560160, value=true
article1 column=tags:tutorials, timestamp=1637054564066, value=true
article2 column=basic:author, timestamp=1637054576501, value=Test author2
article2 column=basic:header, timestamp=1637054576516, value=Test article2
article2 column=tags:ref, timestamp=1637054577512, value=true
2 row(s)
Took 0.0138 seconds
Scan a particular column
To get values of a particular column for all table rows, specify the column name in curly brackets after the scan keyword and the table name. The following command gets all the values of the basic:author column in the articles table:
scan 'articles', {COLUMN => 'basic:author'}
The output is similar to this:
ROW COLUMN+CELL
article1 column=basic:author, timestamp=1637054560096, value=Test author
article2 column=basic:author, timestamp=1637054576501, value=Test author2
2 row(s)
Took 0.0101 seconds
You can also define the number of required versions for a column using the following command:
scan 'articles', {COLUMN => 'basic:header', VERSIONS => 5}
The output is similar to this:
ROW COLUMN+CELL
article1 column=basic:header, timestamp=1637056832082, value=Test article. Version 3
article1 column=basic:header, timestamp=1637055836875, value=Test article. Version 2
article1 column=basic:header, timestamp=1637054560118, value=Test article
article2 column=basic:header, timestamp=1637054576516, value=Test article2
2 row(s)
Took 0.0298 seconds
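How the COLUMN and VERSIONS options shape a scan result can be sketched in plain Ruby. The following is a simplified model, not HBase code: TABLE is a hypothetical nested hash that holds only the basic column family from this example, and scan is a hypothetical helper that imitates the two options.

```ruby
# Layout: row key => column name => { timestamp => value }.
# Only the basic column family is modeled to keep the sketch short.
TABLE = {
  'article1' => {
    'basic:author' => { 1637054560096 => 'Test author' },
    'basic:header' => {
      1637054560118 => 'Test article',
      1637055836875 => 'Test article. Version 2',
      1637056832082 => 'Test article. Version 3'
    }
  },
  'article2' => {
    'basic:author' => { 1637054576501 => 'Test author2' },
    'basic:header' => { 1637054576516 => 'Test article2' }
  }
}.freeze

# Hypothetical helper: return [row, column, timestamp, value] entries,
# newest first within each cell, limited to `versions` entries per cell.
def scan(table, column: nil, versions: 1)
  table.flat_map do |row, columns|
    selected = columns.select { |name, _cells| column.nil? || name == column }
    selected.flat_map do |name, cells|
      newest_first = cells.sort_by { |timestamp, _value| -timestamp }
      newest_first.first(versions).map { |timestamp, value| [row, name, timestamp, value] }
    end
  end
end

# Comparable to: scan 'articles', {COLUMN => 'basic:header', VERSIONS => 5}
scan(TABLE, column: 'basic:header', versions: 5).each { |entry| puts entry.join(' | ') }
```

With the default of one version per cell, the same helper returns a single entry for each column of each row, which corresponds to a scan without the VERSIONS option.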
Step 6. Delete cell values from the table
To delete a particular cell value from the table, use the delete keyword followed by the table name, the row key, the column name, and, optionally, a timestamp. If the timestamp is omitted, the command deletes only the latest version of the cell and leaves the other versions intact. If a timestamp is specified, the command deletes the cell version that has this timestamp.
The following command deletes the latest value of the basic:header column in the row with the article1 key:
delete 'articles', 'article1', 'basic:header'
The output is similar to this:
Took 0.0122 seconds
After this, the get command returns the previous value version for the basic:header column in the article1 row:
get 'articles', 'article1', {COLUMN => 'basic:header'}
The output looks similar to this:
COLUMN CELL
basic:header timestamp=1637055836875, value=Test article. Version 2
1 row(s)
Took 0.0239 seconds
Running the same command with an explicit number of versions to display shows that the Test article. Version 3 value has been deleted:
hbase(main):010:0> get 'articles', 'article1', {COLUMN => 'basic:header', VERSIONS => 5}
COLUMN CELL
basic:header timestamp=1637055836875, value=Test article. Version 2
basic:header timestamp=1637054560118, value=Test article
1 row(s)
Took 0.0077 seconds
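The two delete rules described in this step can be modeled with a short plain-Ruby sketch. It follows the behavior as stated above and is not HBase code: VERSIONS_BEFORE and delete_version are hypothetical names introduced for this illustration.

```ruby
# Versions of basic:header for article1 before the delete, as timestamp => value.
VERSIONS_BEFORE = {
  1637054560118 => 'Test article',
  1637055836875 => 'Test article. Version 2',
  1637056832082 => 'Test article. Version 3'
}.freeze

# Hypothetical helper: drop the version with the given timestamp,
# or the newest version when no timestamp is passed.
def delete_version(versions, timestamp = nil)
  target = timestamp || versions.keys.max
  versions.reject { |ts, _value| ts == target }
end

# Without a timestamp, only the latest version disappears.
after_plain_delete = delete_version(VERSIONS_BEFORE)
puts after_plain_delete.values

# With a timestamp, only the version stored under that timestamp disappears.
after_timed_delete = delete_version(VERSIONS_BEFORE, 1637054560118)
puts after_timed_delete.values
```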
Step 7. Alter the table
Altering the table structure is performed in three stages:
- Run the disable command to make sure that other operations can’t be applied to this table.
- Run the alter command to apply the necessary changes to your table.
- Enable the table.
NOTE
In the latest HBase versions, tables can be altered without disabling them first. However, because altering enabled tables caused problems in the past, use this feature carefully and test it before moving to production.
The following commands disable the articles table, add the new temp column family to it, and enable the table again:
disable 'articles'
alter 'articles', {NAME => 'temp', VERSIONS => 3}
enable 'articles'
The output of the alter command is similar to this:
Updating all regions with the new schema...
All regions updated.
Done.
Took 1.4006 seconds
Now, if you apply the describe command to the articles table, you will see that its structure has changed: the temp column family has been added.
describe 'articles'
The output looks similar to this:
Table articles is ENABLED
articles
COLUMN FAMILIES DESCRIPTION
{NAME => 'basic', VERSIONS => '5', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
{NAME => 'tags', VERSIONS => '5', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
{NAME => 'temp', VERSIONS => '3', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
3 row(s)
Took 0.0224 seconds
Step 8. Drop the table
Dropping a table is performed in two stages:
- Run the disable command to make sure that other operations can’t be applied to this table.
- Run the drop command to delete the table.
The following commands disable the articles table and drop it:
disable 'articles'
drop 'articles'
Now, if you try to apply the exists command to the articles table, you will see that it does not exist anymore:
hbase(main):008:0> exists 'articles'
Table articles does not exist
Took 0.0047 seconds
=> false