Use snapshots for data backup and restore in HBase
Overview
HBase snapshots let you back up data with minimal impact on HBase performance. Creating a table snapshot involves no data copying and does not require the table to be disabled. A snapshot is exported to another cluster by a distcp-based utility, without involving the Master or the Region Servers. When required, the table can be quickly restored from the snapshot.
A snapshot is essentially a collection of metadata: the table schema and links to the HFiles that contain the actual data of the table. Once created, an HFile cannot be changed in any way; it can only be deleted. When a snapshot is created, the HFiles containing the table data become linked to that snapshot. If an HFile linked to a snapshot undergoes compaction, it is archived rather than deleted. HFiles linked to a snapshot are kept for as long as the snapshot exists, so that the data can be extracted and the corresponding table restored. The same principle applies to snapshot clones, which are tables created from a snapshot but independent of the original table: while a clone exists, the HFiles it references exist as well. This is why a rotation policy for snapshots and their clones must be in place to avoid running out of disk space.
NOTE
Detailed information on the HBase shell commands used for working with snapshots is provided in the HBase shell commands → Snapshots commands section.
Usage
The examples below use a small test table named articles. Its creation is described in the Quick start with HBase shell article. Complete Step 1 and Step 3 from it if you wish to follow the examples.
The examples below follow the life cycle of a snapshot:
- Creation of a snapshot and verification of its existence.
- Restoration of a table from the snapshot after the table has been changed.
- Export of a snapshot to another HBase cluster and recreation of a table from it.
- Removal of a snapshot.
All commands are executed in the HBase shell unless specified otherwise.
Create a snapshot
To create a table snapshot, use the snapshot command. Example:
snapshot 'articles', 'articles_snp'
Use the list_snapshots command to see all snapshots, or the list_table_snapshots command to see the snapshots of a specific table. Example:
list_table_snapshots 'articles'
The output should look the following way:
SNAPSHOT          TABLE + CREATION TIME
 articles_snp     articles (2024-11-01 08:11:25 UTC)
1 row(s)
Took 0.0528 seconds
=> ["articles_snp"]
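When snapshots are taken on a schedule, a date suffix in the snapshot name makes later rotation easier. The following is a minimal sketch rather than part of the procedure above: the helper name snapshot_cmd and the <table>_snp_<YYYYMMDD> naming scheme are assumptions made for illustration. The helper only prints the HBase shell command and does not execute it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints an HBase shell `snapshot` command with a
# date-suffixed snapshot name (<table>_snp_<YYYYMMDD>, an assumed scheme).
snapshot_cmd() {
  local table="$1"
  local day="${2:-$(date -u +%Y%m%d)}"   # defaults to today's date in UTC
  printf "snapshot '%s', '%s_snp_%s'\n" "$table" "$table" "$day"
}

# Print the command for the articles table and a fixed date.
snapshot_cmd articles 20241101

# To actually run it (assumes the hbase client is on PATH):
#   snapshot_cmd articles | hbase shell -n
```

The -n option starts the HBase shell in non-interactive mode, which is convenient for scheduled jobs.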
Restore data from a snapshot
After creating a snapshot, alter the data in the table in any way you want. For example, you can delete one row entirely and add another one. Example:
deleteall 'articles', 'article2'
put 'articles', 'article3', 'basic:author', 'King'
put 'articles', 'article3', 'basic:header', 'Test article3'
put 'articles', 'article3', 'tags:ref', true
Scan the table to make sure that the changes took effect. Example:
scan 'articles'
The output should look the following way:
ROW          COLUMN+CELL
 article1    column=basic:author, timestamp=2024-11-01T08:03:55.730, value=Test author
 article1    column=basic:header, timestamp=2024-11-01T08:03:55.758, value=Test article
 article1    column=tags:arch, timestamp=2024-11-01T08:03:55.776, value=true
 article1    column=tags:concepts, timestamp=2024-11-01T08:03:55.794, value=true
 article1    column=tags:tutorials, timestamp=2024-11-01T08:03:58.771, value=true
 article3    column=basic:author, timestamp=2024-11-01T08:19:36.078, value=King
 article3    column=basic:header, timestamp=2024-11-01T08:19:36.093, value=Test article3
 article3    column=tags:ref, timestamp=2024-11-01T08:19:37.856, value=true
2 row(s)
Took 0.0638 seconds
To restore the table, use the restore_snapshot command. The table being restored must be disabled first. Example:
disable 'articles'
restore_snapshot 'articles_snp'
Re-enable the table and scan it to make sure that the restoration worked. Example:
enable 'articles'
scan 'articles'
The output should look the following way:
ROW          COLUMN+CELL
 article1    column=basic:author, timestamp=2024-11-01T08:03:55.730, value=Test author
 article1    column=basic:header, timestamp=2024-11-01T08:03:55.758, value=Test article
 article1    column=tags:arch, timestamp=2024-11-01T08:03:55.776, value=true
 article1    column=tags:concepts, timestamp=2024-11-01T08:03:55.794, value=true
 article1    column=tags:tutorials, timestamp=2024-11-01T08:03:58.771, value=true
 article2    column=basic:author, timestamp=2024-11-01T08:04:19.531, value=Test author2
 article2    column=basic:header, timestamp=2024-11-01T08:04:19.554, value=Test article2
 article2    column=tags:ref, timestamp=2024-11-01T08:04:20.802, value=true
2 row(s)
Took 0.0387 seconds
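Because restore_snapshot overwrites the current table contents, it can be useful to inspect a snapshot first by cloning it into a separate table. The sketch below is one possible way to do that; the helper name inspect_cmds and the scratch table name articles_snp_check are assumptions made for illustration. The helper only prints the HBase shell commands. The scratch table is dropped at the end because a clone keeps the HFiles of the snapshot linked for as long as it exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints HBase shell commands that clone a snapshot
# into a scratch table, scan it, and then drop the scratch table.
inspect_cmds() {
  local snap="$1" scratch="$2"
  printf "clone_snapshot '%s', '%s'\n" "$snap" "$scratch"
  printf "scan '%s'\n" "$scratch"
  printf "disable '%s'\n" "$scratch"   # a table must be disabled before drop
  printf "drop '%s'\n" "$scratch"
}

# Print the commands for the articles_snp snapshot.
inspect_cmds articles_snp articles_snp_check
```

The printed commands can be pasted into the HBase shell one by one, so the scan output can be reviewed before the scratch table is dropped.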
Export a snapshot
You can export a snapshot to another cluster and restore the table there. To do this, connect to a node of the source HBase cluster via SSH, switch to the hbase user, and then export the snapshot. The export command has the following syntax:
$ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot <snapshot_name> -copy-to hdfs://<namenode_active>:<port><hbase_dir> -mappers <map_number>
where:
- <snapshot_name>: name of the snapshot being exported;
- <namenode_active>: network address of the active NameNode server in the destination cluster (use the hdfs haadmin -getAllServiceState command to find out which server is active);
- <port>: port number of the NameNode server (8020 by default);
- <hbase_dir>: HBase root directory in the HDFS of the destination cluster, that is, the path set by the hbase.rootdir parameter in the hbase-site.xml configuration file of the destination cluster (typically /hbase);
- <map_number>: number of map tasks used to perform the export.
Example:
$ sudo -u hbase bash
$ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot articles_snp -copy-to hdfs://av-adh-backup-1.ru-central1.internal:8020/hbase -mappers 4
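For repeated exports, the command line can be assembled from its parts so that each placeholder from the syntax above is set in one place. This is a minimal sketch: the helper name export_cmd is an assumption made for illustration, and the helper only prints the command so that it can be reviewed before it is run as the hbase user.

```shell
#!/usr/bin/env bash
# Hypothetical helper: assembles the ExportSnapshot command line from
# the placeholders described above and prints it without running it.
export_cmd() {
  local snapshot="$1" namenode="$2" port="$3" hbase_dir="$4" mappers="$5"
  printf 'hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot %s -copy-to hdfs://%s:%s%s -mappers %s\n' \
    "$snapshot" "$namenode" "$port" "$hbase_dir" "$mappers"
}

# Reproduces the example command shown above.
export_cmd articles_snp av-adh-backup-1.ru-central1.internal 8020 /hbase 4
```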
Log in to the HBase shell of the destination cluster and restore the table from the snapshot:
restore_snapshot 'articles_snp'
NOTE
In this case the restored table is already enabled.
Rotate snapshots
It is important that old and unneeded snapshots do not pile up and are deleted in a timely manner. HFiles linked to snapshots (or their clones) cannot be removed from disk as long as those snapshots exist, even after the HFiles are archived. When you no longer need a snapshot, delete it using the delete_snapshot command. Example:
delete_snapshot 'articles_snp'
You can also delete all snapshots of a certain table using the delete_table_snapshots command, or delete all snapshots whose names match a certain regular expression using the delete_all_snapshot command.
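A rotation policy can be sketched as a small script that selects snapshots older than a cutoff date. The example below assumes that snapshot names end with a _YYYYMMDD date suffix, which is a naming convention introduced here for illustration and not something HBase enforces; the helper name rotation_cmds is hypothetical as well. The helper reads snapshot names from standard input and only prints delete_snapshot commands, so the list can be reviewed before anything is deleted.

```shell
#!/usr/bin/env bash
# Hypothetical rotation helper: reads snapshot names on stdin and prints
# a delete_snapshot command for each name whose _YYYYMMDD suffix is
# older than the cutoff date. Names without such a suffix are skipped.
rotation_cmds() {
  local cutoff="$1" name day
  while IFS= read -r name; do
    day="${name##*_}"                 # text after the last underscore
    case "$day" in
      [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;   # looks like YYYYMMDD
      *) continue ;;                                 # no date suffix, keep it
    esac
    if [ "$day" -lt "$cutoff" ]; then
      printf "delete_snapshot '%s'\n" "$name"
    fi
  done
}

# Only the first name is older than the cutoff of 2024-10-15.
printf '%s\n' articles_snp_20241001 articles_snp_20241101 articles_snp |
  rotation_cmds 20241015
```

With GNU date, a cutoff such as 30 days ago can be computed with date -u -d '30 days ago' +%Y%m%d.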