Quick start with HBase shell

The simplest way to begin working with HBase is to use the HBase shell utility. It is a JRuby console available on each node of the HBase cluster immediately after installation. To start the HBase shell, run the following command:

$ hbase shell

The shell prompt ends with a > character. All subsequent commands should be written after it.

The HBase shell prompt
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/phoenix/phoenix-5.0.0-HBase-2.0-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/phoenix/phoenix-5.0.0-HBase-2.0-hive.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/phoenix/phoenix-5.0.0-HBase-2.0-pig.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/phoenix/phoenix-5.0.0-HBase-2.0-thin-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
Version 2.0.2, rUnknown, Thu Apr 15 20:20:26 UTC 2021
Took 0.0024 seconds
hbase(main):001:0>

To leave the HBase shell, use the following command:

exit

The main data operations available in HBase are listed below. They are based on the example described in HBase data model. For the full list of these and other useful shell commands, along with descriptions of their parameters, see HBase shell commands.

NOTE

Before executing HBase commands, we recommend reading about the HBase data model.

Step 1. Create a table

To create a new table, write the create keyword followed by the table name in quotes and the definitions of all column families of this table. Define each column family in one of the following ways:

  • Write only the name of the column family in quotes, without curly brackets.

  • Describe several attributes of the column family using key-value pairs in curly brackets.

The following command creates the articles table with two column families: basic and tags. Each column family is defined with two attributes: its name and the maximum number of stored value versions. As a result, every column in these families stores up to five versions of each data value.

create 'articles', {NAME => 'basic', VERSIONS => 5}, {NAME => 'tags', VERSIONS => 5}

The output is similar to the following:

Created table articles
Took 1.6225 seconds
=> Hbase::Table - articles
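
The two ways of defining a column family differ only in how much detail they carry. As a rough illustration in plain Ruby (the language behind the HBase shell), the sketch below turns either form into the same kind of attribute map. The normalize_family helper and the ASSUMED_DEFAULT_VERSIONS constant are hypothetical names, not part of HBase, and the default of one version is an assumption about HBase 2.x; run describe on your own cluster to see the real attributes.

```ruby
# Hypothetical helper, not an HBase API: models the two accepted forms
# of a column family definition as plain Ruby values.
ASSUMED_DEFAULT_VERSIONS = 1 # assumption: default VERSIONS value in HBase 2.x

def normalize_family(spec)
  case spec
  when String
    # Name-only form, for example 'basic'
    { 'NAME' => spec, 'VERSIONS' => ASSUMED_DEFAULT_VERSIONS }
  when Hash
    # Attribute form, for example {'NAME' => 'basic', 'VERSIONS' => 5}
    { 'VERSIONS' => ASSUMED_DEFAULT_VERSIONS }.merge(spec)
  else
    raise ArgumentError, "unsupported column family definition: #{spec.inspect}"
  end
end

p normalize_family('basic')
p normalize_family({ 'NAME' => 'tags', 'VERSIONS' => 5 })
```

In this model the name-only form is simply the attribute form with every optional attribute left at its default.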

Step 2. Get information about the table

To check whether a table exists and to get information about it, use the following commands:

  • exists. It checks whether the specified table exists:

    exists 'articles'

    The output is similar to the following:

    Table articles does exist
    Took 0.0705 seconds
    => true

    If the specified table does not exist, the command returns the following message:

    Table not_existed does not exist
    Took 0.0076 seconds
    => false
  • list. This command returns the table name if the specified table exists:

    list 'articles'

    The output is similar to the following:

    TABLE
    articles
    1 row(s)
    Took 0.3102 seconds
    => ["articles"]

    If the specified table does not exist, the command returns the following message:

    TABLE
    0 row(s)
    Took 0.0064 seconds
    => []
  • describe. This command checks whether the table exists and is enabled and shows information about its column families:

    describe 'articles'

    The output is similar to the following:

    Table articles is ENABLED
    articles
    COLUMN FAMILIES DESCRIPTION
    {NAME => 'basic', VERSIONS => '5', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => '
    false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false'
    , IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '6553
    6'}
    {NAME => 'tags', VERSIONS => '5', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'f
    alse', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false',
     IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536
    '}
    2 row(s)
    Took 0.1100 seconds

    If the specified table does not exist, the command returns the following message:

    ERROR: Table not_existed does not exist.
    
    Describe the named table. For example:
      hbase> describe 't1'
      hbase> describe 'ns1:t1'
    
    Alternatively, you can use the abbreviated 'desc' for the same thing.
      hbase> desc 't1'
      hbase> desc 'ns1:t1'
    
    Took 0.0159 seconds

Step 3. Put new data into the table

To put new data into the table, use the put keyword followed by a comma-separated list of the table name, row key, column name, and data value. Each column name is a colon-separated combination of a column family name and a column qualifier. Column families are fixed: their names are defined when the table is created. Column qualifiers are not fixed, so you can add them on the fly while putting new data into an existing table.

The following commands put data into the articles table for the article1 row key. They define the author and header columns in the basic column family, and the arch, concepts, and tutorials columns in the tags column family. The data value is the last argument of each command. HBase generates timestamps for these data values automatically.
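
The structure of a column name can be illustrated with a few lines of plain Ruby. This is only a sketch: split_column, valid_column?, and FAMILIES are hypothetical names introduced here, and the HBase shell does its own parsing internally.

```ruby
# Hypothetical helpers for illustration; the HBase shell parses column names itself.
def split_column(column)
  family, qualifier = column.split(':', 2)
  raise ArgumentError, "missing column family in #{column.inspect}" if family.nil? || family.empty?
  [family, qualifier.to_s]
end

# Column families are fixed at table creation; qualifiers are free-form.
FAMILIES = %w[basic tags].freeze

def valid_column?(column)
  family, _qualifier = split_column(column)
  FAMILIES.include?(family)
end

p split_column('basic:author')   # => ["basic", "author"]
puts valid_column?('tags:ref')   # prints true: a new qualifier in a known family
puts valid_column?('extra:note') # prints false: the family was never defined
```

The check mirrors the rule above: any qualifier is acceptable as long as its column family already exists in the table.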

put 'articles', 'article1', 'basic:author', 'Test author'
put 'articles', 'article1', 'basic:header', 'Test article'
put 'articles', 'article1', 'tags:arch', true
put 'articles', 'article1', 'tags:concepts', true
put 'articles', 'article1', 'tags:tutorials', true

The following commands also put data into the articles table, but for another row key, article2. Note that a row does not have to use the column qualifiers of other rows. You specify only the columns you need:

put 'articles', 'article2', 'basic:author', 'Test author2'
put 'articles', 'article2', 'basic:header', 'Test article2'
put 'articles', 'article2', 'tags:ref', true

Each put command produces output similar to the following:

Took 0.0051 seconds

Step 4. Get a specific table row

HBase provides various commands to read table rows as described in the following sections.

Read the whole row

To read a specific table row from a table, use the get keyword and then define the table name and the required row key. For example, the following command gets all values with the article1 row key from the articles table:

get 'articles', 'article1'

The command returns data values of all the columns that the specified row contains. Each returned cell shows the value and timestamp of its creation:

COLUMN                                   CELL
 basic:author                            timestamp=1637054560096, value=Test author
 basic:header                            timestamp=1637054560118, value=Test article
 tags:arch                               timestamp=1637054560141, value=true
 tags:concepts                           timestamp=1637054560160, value=true
 tags:tutorials                          timestamp=1637054564066, value=true
1 row(s)
Took 0.0442 seconds

The output of the same command for the row key article2 is as follows:

COLUMN                                   CELL
 basic:author                            timestamp=1637054576501, value=Test author2
 basic:header                            timestamp=1637054576516, value=Test article2
 tags:ref                                timestamp=1637054577512, value=true
1 row(s)
Took 0.0099 seconds
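
The timestamps in this output are, by default, the number of milliseconds since the Unix epoch at the moment the value was written. The following plain Ruby sketch converts such a value to a readable UTC time. The format_cell_timestamp name is a hypothetical helper, not a shell command.

```ruby
# Convert an HBase cell timestamp (milliseconds since the Unix epoch) to UTC text.
# Hypothetical helper for illustration; not part of the HBase shell.
def format_cell_timestamp(millis)
  seconds, remainder_ms = millis.divmod(1000)
  # The second argument of Time.at is in microseconds.
  Time.at(seconds, remainder_ms * 1000).utc.strftime('%Y-%m-%d %H:%M:%S.%L UTC')
end

puts format_cell_timestamp(1637054560096) # prints 2021-11-16 09:22:40.096 UTC
```

Because the HBase shell is a JRuby console, a helper like this can likely be pasted into the shell session as well, although that is not verified here.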

By default, the get command shows only the latest version of each data value, even though the table can store up to five versions of each cell, as specified during table creation. If you add a new value, the get command returns this new value because it has the latest timestamp.

The following example demonstrates how to add a new value for the basic:header column in the row with the article1 key:

put 'articles', 'article1', 'basic:header', 'Test article. Version 2'
get 'articles', 'article1'

The output of the subsequent get command is similar to this:

COLUMN                                   CELL
 basic:author                            timestamp=1637054560096, value=Test author
 basic:header                            timestamp=1637055836875, value=Test article. Version 2
 tags:arch                               timestamp=1637054560141, value=true
 tags:concepts                           timestamp=1637054560160, value=true
 tags:tutorials                          timestamp=1637054564066, value=true
1 row(s)
Took 0.0066 seconds

The next example shows how to add the third value version for the basic:header column in the row with the article1 key:

put 'articles', 'article1', 'basic:header', 'Test article. Version 3'
get 'articles', 'article1'

The output is similar to this:

COLUMN                                   CELL
 basic:author                            timestamp=1637054560096, value=Test author
 basic:header                            timestamp=1637056832082, value=Test article. Version 3
 tags:arch                               timestamp=1637054560141, value=true
 tags:concepts                           timestamp=1637054560160, value=true
 tags:tutorials                          timestamp=1637054564066, value=true
1 row(s)
Took 0.0064 seconds
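
The versioning behavior shown in these examples can be summarized with a small in-memory model in plain Ruby. This is a sketch for intuition only: VersionedCell is a hypothetical class, not an HBase API, and it trims surplus versions immediately, whereas HBase removes them lazily.

```ruby
# Illustrative in-memory model of a single HBase cell; not an HBase API.
class VersionedCell
  def initialize(max_versions)
    @max_versions = max_versions
    @values = {} # timestamp in ms => value
  end

  # Store a value and keep only the newest max_versions entries.
  # (HBase discards surplus versions lazily; this model trims at once.)
  def put(value, timestamp)
    @values[timestamp] = value
    surplus = @values.size - @max_versions
    @values.keys.sort.first(surplus).each { |ts| @values.delete(ts) } if surplus > 0
    self
  end

  # Newest-first [timestamp, value] pairs; one version by default, like get.
  def get(versions = 1)
    @values.sort_by { |timestamp, _value| -timestamp }.first(versions)
  end
end

header = VersionedCell.new(5)
header.put('Test article', 1637054560118)
header.put('Test article. Version 2', 1637055836875)
header.put('Test article. Version 3', 1637056832082)

p header.get    # latest version only
p header.get(5) # every stored version, newest first
```

With a limit of five versions, a sixth put in this model pushes out the oldest value, which is the role of the VERSIONS attribute set at table creation.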

Read a specific column

To read a specific column of the row, use the get command with the column name after the table name and the row key. The column name contains the column family name and the column qualifier, separated by a colon.

The following command gets the latest value added to the basic:header column in the row with the article1 key:

get 'articles', 'article1', 'basic:header'

The output is similar to this:

COLUMN                                   CELL
 basic:header                            timestamp=1637056832082, value=Test article. Version 3
1 row(s)
Took 0.0227 seconds

As you can see, this command returns the most recently added column value. To get a particular value version, specify its timestamp along with the column name in curly brackets:

get 'articles', 'article1', {COLUMN => 'basic:header', TIMESTAMP => 1637054560118}

The output is similar to this:

COLUMN                                   CELL
 basic:header                            timestamp=1637054560118, value=Test article
1 row(s)
Took 0.0171 seconds

Step 5. Scan the table for all data at once

The scan command is used for getting all the table rows.

Scan all columns

To get all the columns for all table rows, use the scan keyword and the table name after it. The following command returns the content of the articles table:

scan 'articles'

The result is similar to this:

ROW                                      COLUMN+CELL
 article1                                column=basic:author, timestamp=1637054560096, value=Test author
 article1                                column=basic:header, timestamp=1637056832082, value=Test article. Version 3
 article1                                column=tags:arch, timestamp=1637054560141, value=true
 article1                                column=tags:concepts, timestamp=1637054560160, value=true
 article1                                column=tags:tutorials, timestamp=1637054564066, value=true
 article2                                column=basic:author, timestamp=1637054576501, value=Test author2
 article2                                column=basic:header, timestamp=1637054576516, value=Test article2
 article2                                column=tags:ref, timestamp=1637054577512, value=true
2 row(s)
Took 0.0204 seconds

To get a specific number of value versions, not only the latest one, specify the required number of versions using the following command:

scan 'articles', {VERSIONS => 5}

The output is similar to this:

ROW                                      COLUMN+CELL
 article1                                column=basic:author, timestamp=1637054560096, value=Test author
 article1                                column=basic:header, timestamp=1637056832082, value=Test article. Version 3
 article1                                column=basic:header, timestamp=1637055836875, value=Test article. Version 2
 article1                                column=basic:header, timestamp=1637054560118, value=Test article
 article1                                column=tags:arch, timestamp=1637054560141, value=true
 article1                                column=tags:concepts, timestamp=1637054560160, value=true
 article1                                column=tags:tutorials, timestamp=1637054564066, value=true
 article2                                column=basic:author, timestamp=1637054576501, value=Test author2
 article2                                column=basic:header, timestamp=1637054576516, value=Test article2
 article2                                column=tags:ref, timestamp=1637054577512, value=true
2 row(s)
Took 0.0138 seconds

Scan a particular column

To get values of a particular column for all table rows, specify the column name in curly brackets after the scan keyword and the table name. The following command gets all the values of the basic:author column in the articles table:

scan 'articles', {COLUMN => 'basic:author'}

The output is similar to this:

ROW                                      COLUMN+CELL
 article1                                column=basic:author, timestamp=1637054560096, value=Test author
 article2                                column=basic:author, timestamp=1637054576501, value=Test author2
2 row(s)
Took 0.0101 seconds

You can also define the number of required versions for a column using the following command:

scan 'articles', {COLUMN => 'basic:header', VERSIONS => 5}

The output is similar to this:

ROW                                      COLUMN+CELL
 article1                                column=basic:header, timestamp=1637056832082, value=Test article. Version 3
 article1                                column=basic:header, timestamp=1637055836875, value=Test article. Version 2
 article1                                column=basic:header, timestamp=1637054560118, value=Test article
 article2                                column=basic:header, timestamp=1637054576516, value=Test article2
2 row(s)
Took 0.0298 seconds
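
To make the shape of scan results easier to picture, the following plain Ruby sketch models a table as nested hashes and returns one entry per cell, ordered by row key. It is an illustration under stated assumptions: scan_model and ARTICLES are hypothetical names, the data is a shortened copy of the example rows, and nothing here calls HBase.

```ruby
# Illustrative model only: a table as nested hashes, row key => {column => value}.
ARTICLES = {
  'article2' => { 'basic:author' => 'Test author2', 'tags:ref' => 'true' },
  'article1' => { 'basic:author' => 'Test author', 'tags:arch' => 'true' }
}.freeze

# Return [row, column, value] triples ordered by row key, then by column name.
# Passing a column keeps only that column, like scan with COLUMN.
def scan_model(table, column = nil)
  table.sort.flat_map do |row, cells|
    selected = column ? cells.select { |name, _value| name == column } : cells
    selected.sort.map { |name, value| [row, name, value] }
  end
end

scan_model(ARTICLES, 'basic:author').each { |triple| puts triple.join(' | ') }
```

The model also shows why rows in the output may list different columns: each row stores only the cells that were actually written to it.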

Step 6. Delete cell values from the table

To delete a particular cell value from the table, use the delete keyword followed by the table name, the row key, the column name, and, optionally, a timestamp. If the timestamp is omitted, the command deletes only the latest version of the cell and leaves the other versions intact. If a timestamp is specified, the command deletes the cell version that has this timestamp.

The following command deletes the latest value of the basic:header column in the row with the article1 key:

delete 'articles', 'article1', 'basic:header'

The output is similar to this:

Took 0.0122 seconds

After this, the get command returns the previous value version for the basic:header column in the article1 row:

get 'articles', 'article1', {COLUMN => 'basic:header'}

The output looks similar to this:

COLUMN                                   CELL
 basic:header                            timestamp=1637055836875, value=Test article. Version 2
1 row(s)
Took 0.0239 seconds

Running the same command with a specified number of versions to display shows that the Test article. Version 3 value has been deleted:

hbase(main):010:0> get 'articles', 'article1', {COLUMN => 'basic:header', VERSIONS => 5}
COLUMN                                   CELL
 basic:header                            timestamp=1637055836875, value=Test article. Version 2
 basic:header                            timestamp=1637054560118, value=Test article
1 row(s)
Took 0.0077 seconds
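
The effect of delete on cell versions can be sketched in plain Ruby. The model below follows the rule described in this step and nothing more: delete_version and latest_value are hypothetical helpers that operate on an ordinary Hash of timestamps and values, not on an HBase table.

```ruby
# Illustrative model, not an HBase API: versions is a Hash of timestamp (ms) => value.
# Without a timestamp the newest version is removed; with one, only that version.
def delete_version(versions, timestamp = nil)
  target = timestamp || versions.keys.max
  versions.reject { |ts, _value| ts == target }
end

# Value with the highest timestamp, or nil when no versions are left.
def latest_value(versions)
  newest = versions.keys.max
  newest && versions[newest]
end

header = {
  1637054560118 => 'Test article',
  1637055836875 => 'Test article. Version 2',
  1637056832082 => 'Test article. Version 3'
}

after_delete = delete_version(header)
puts latest_value(after_delete) # prints Test article. Version 2
```

Passing a timestamp as the second argument removes only that version, so the latest value stays the same unless the newest version is the one being deleted.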

Step 7. Alter the table

Altering the table structure is performed in three stages:

  1. Run the disable command to make sure that other operations can’t be applied to this table.

  2. Run the alter command to apply the necessary changes to your table.

  3. Enable the table.

NOTE

In recent HBase versions, tables can be altered without being disabled first. However, because altering enabled tables caused problems in the past, use this feature carefully and test it before moving to production.

The following commands disable the articles table, add the new temp column family into it, and enable the table again:

disable 'articles'
alter 'articles', {NAME => 'temp', VERSIONS => 3}
enable 'articles'

The output of the alter command is similar to this:

Updating all regions with the new schema...
All regions updated.
Done.
Took 1.4006 seconds

Now, if you apply the describe command to the articles table, you will see that its structure has changed: the temp column family has been added.

describe 'articles'

The output looks similar to this:

Table articles is ENABLED
articles
COLUMN FAMILIES DESCRIPTION
{NAME => 'basic', VERSIONS => '5', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => '
false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false'
, IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '6553
6'}
{NAME => 'tags', VERSIONS => '5', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'f
alse', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false',
 IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536
'}
{NAME => 'temp', VERSIONS => '3', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'f
alse', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false',
 IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536
'}
3 row(s)
Took 0.0224 seconds

Step 8. Drop the table

Dropping a table requires two stages:

  1. Run the disable command to make sure that other operations can’t be applied to the table.

  2. Run the drop command to delete your table.

The following commands disable the articles table and drop it:

disable 'articles'
drop 'articles'

Now, if you try to apply the exists command to the articles table, you will see that it does not exist anymore:

hbase(main):008:0> exists 'articles'
Table articles does not exist
Took 0.0047 seconds
=> false
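
The order of operations in this step can be pictured as a small state model in plain Ruby. This is a sketch for intuition, assuming only the rule stated above (disable first, then drop); TableModel is a hypothetical class, and its error messages are not the ones HBase prints.

```ruby
# Illustrative state model, not an HBase API: a table must be disabled before it is dropped.
class TableModel
  attr_reader :state

  def initialize
    @state = :enabled
  end

  def disable
    raise 'table does not exist' if @state == :dropped
    @state = :disabled
    self
  end

  def drop
    raise 'table does not exist' if @state == :dropped
    raise 'table is enabled; disable it first' unless @state == :disabled
    @state = :dropped
    self
  end

  def exists?
    @state != :dropped
  end
end

table = TableModel.new
table.disable.drop
puts table.exists? # prints false
```

Calling drop on a table that is still enabled raises an error in this model, which mirrors the requirement to disable the table first.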