Scan over snapshots
Overview
Scan over snapshots is a feature that allows you to scan data directly from the HFiles that make up a table snapshot, bypassing the HBase servers. It works regardless of whether HBase is running and helps save resources on the HBase server side.
Scan over snapshots is provided by the TableSnapshotScanner and TableSnapshotInputFormat classes.
TableSnapshotScanner class
The TableSnapshotScanner class is intended for single scans over snapshot HFiles run from the client side. The snapshot files are copied into a temporary directory, and the contents of that directory are deleted when the scanner is closed. This directory must not be a subdirectory of the path specified by the hbase.rootdir parameter in the hbase-site.xml configuration file, and the user on the client side must have write permissions for it.
Below is an example of using the TableSnapshotScanner class:
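import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.TableSnapshotScanner;

// conf (an HBase Configuration) and snapshotName are assumed to be defined
Path restoreDir = new Path("<path>"); // temporary directory outside hbase.rootdir
Scan scan = new Scan();
// try-with-resources closes the scanner, which deletes the restored files
try (TableSnapshotScanner scanner = new TableSnapshotScanner(conf, restoreDir, snapshotName, scan)) {
    Result result = scanner.next();
    while (result != null) {
        ... // process the row
        result = scanner.next();
    }
}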
where <path> is the path to the temporary directory.
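Because the temporary directory must lie outside hbase.rootdir, it can help to validate <path> before creating the scanner. The following is a minimal sketch of such a check. RestoreDirCheck is a hypothetical helper, not part of HBase, and it assumes that both the restore directory and hbase.rootdir are given as fully qualified URIs (for example, hdfs://ns/hbase). It first compares the file system (scheme and authority), then compares the normalized paths on a path-segment boundary, so /hbase2 is not mistaken for a subdirectory of /hbase.

```java
import java.net.URI;
import java.util.Objects;

/**
 * Hypothetical helper (not part of HBase): verifies that a snapshot restore
 * directory is neither hbase.rootdir itself nor one of its subdirectories.
 * Assumes both locations are fully qualified URIs, e.g. hdfs://ns/hbase.
 */
final class RestoreDirCheck {

    private RestoreDirCheck() {}

    /** Returns true if candidate is the same directory as root or lies below it. */
    static boolean isUnder(URI candidate, URI root) {
        // Directories on different file systems cannot be nested.
        if (!sameIgnoreCase(candidate.getScheme(), root.getScheme())
                || !Objects.equals(candidate.getAuthority(), root.getAuthority())) {
            return false;
        }
        String c = cleanPath(candidate);
        String r = cleanPath(root);
        if (r.equals("/")) {
            return true; // everything on this file system is under "/"
        }
        // Compare on a path-segment boundary so /hbase2 is not treated as under /hbase.
        return c.equals(r) || c.startsWith(r + "/");
    }

    /** Throws IllegalArgumentException if restoreDir is inside hbaseRootDir. */
    static void requireOutsideRoot(URI restoreDir, URI hbaseRootDir) {
        if (isUnder(restoreDir, hbaseRootDir)) {
            throw new IllegalArgumentException("Restore directory " + restoreDir
                    + " must not be under hbase.rootdir " + hbaseRootDir);
        }
    }

    // Resolves "." and ".." segments and drops a trailing slash.
    private static String cleanPath(URI uri) {
        String p = uri.normalize().getPath();
        if (p == null || p.isEmpty()) {
            return "/";
        }
        return p.length() > 1 && p.endsWith("/") ? p.substring(0, p.length() - 1) : p;
    }

    private static boolean sameIgnoreCase(String a, String b) {
        return a == null ? b == null : a.equalsIgnoreCase(b);
    }
}
```

For example, with hbase.rootdir set to hdfs://ns/hbase, a restore directory of hdfs://ns/tmp/restore passes the check, while hdfs://ns/hbase/tmp/restore is rejected.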
TableSnapshotInputFormat class
The TableSnapshotInputFormat class is intended for scanning snapshot HFiles and using the results as input for MapReduce jobs.
Below is an example of using the TableSnapshotInputFormat class:
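import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;
Job job = Job.getInstance(conf); // conf is the HBase client Configuration
Path restoreDir = new Path("<path>"); // temporary directory outside hbase.rootdir
Scan scan = new Scan();
TableMapReduceUtil.initTableSnapshotMapperJob(snapshotName, scan, MyTableMapper.class, MyMapKeyOutput.class, MyMapOutputValueWritable.class, job, true, restoreDir); // sets TableSnapshotInputFormat as the job input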
Configuration
To use scan over snapshots, do the following:
- Go to the ADCM UI and select your ADH cluster.
- Navigate to Services → HDFS → Primary configuration and toggle Show advanced.
- Open the Custom hdfs-site.xml section and click Add property.
- For the property name, enter dfs.namenode.acls.enabled and set its value to true.
- Open the Custom core-site.xml section and click Add property.
- For the property name, enter fs.permissions.umask-mode and set its value to 027.
- Save the configuration by clicking Save → Create.
- Navigate to Services → HBase → Primary configuration and toggle Show advanced.
- Open the Custom hbase-site.xml section and click Add property.
- For the property name, enter hbase.coprocessor.master.classes and set its value to "org.apache.hadoop.hbase.security.access.AccessController,org.apache.hadoop.hbase.security.access.SnapshotScannerHDFSAclController".
- Add another property. For the property name, enter hbase.acl.sync.to.hdfs.enable and set its value to true.
- Save the configuration by clicking Save → Create and restart the service by clicking Actions → Reconfig and graceful restart.
- In the HBase shell, enable snapshot scanning for the required table by modifying its schema with a command of the following form:

  alter '<table>', CONFIGURATION => {'hbase.acl.sync.to.hdfs.enable' => 'true'}

  where <table> is the name of the table.
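After the configuration is saved, it can be useful to confirm that the expected values are in place. The following is a minimal pre-flight sketch. SnapshotScanConfigCheck is a hypothetical class, not part of ADCM, HDFS, or HBase, and it assumes the relevant values from hdfs-site.xml, core-site.xml, and hbase-site.xml have already been collected into a single java.util.Properties object (Hadoop's *-site.xml layout differs from the XML format that Properties.loadFromXML reads, so the sketch does not parse those files). It treats hbase.coprocessor.master.classes as a comma-separated class list, so other coprocessors may be listed too, and it compares the umask in the octal form 027 only. It also shows the permissions that umask 027 yields under the HDFS rules for new files (0666 & ~umask) and directories (0777 & ~umask).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

/**
 * Hypothetical pre-flight check (not part of ADCM, HDFS, or HBase) for the
 * scan over snapshots settings. Expects the values from hdfs-site.xml,
 * core-site.xml, and hbase-site.xml to be merged into one Properties object.
 */
final class SnapshotScanConfigCheck {

    static final String ACCESS_CONTROLLER =
            "org.apache.hadoop.hbase.security.access.AccessController";
    static final String ACL_CONTROLLER =
            "org.apache.hadoop.hbase.security.access.SnapshotScannerHDFSAclController";

    // Plain key/value settings; the umask is compared in the octal form "027" only.
    static final Map<String, String> REQUIRED = Map.of(
            "dfs.namenode.acls.enabled", "true",
            "fs.permissions.umask-mode", "027",
            "hbase.acl.sync.to.hdfs.enable", "true");

    private SnapshotScanConfigCheck() {}

    /** Returns a list of problems; an empty list means all settings are present. */
    static List<String> check(Properties props) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, String> e : REQUIRED.entrySet()) {
            String actual = props.getProperty(e.getKey());
            if (actual == null || !actual.trim().equals(e.getValue())) {
                problems.add(e.getKey() + " should be " + e.getValue() + " but is " + actual);
            }
        }
        // The coprocessor property is a comma-separated class list; surrounding
        // quotes are ignored and extra coprocessors are allowed.
        String raw = props.getProperty("hbase.coprocessor.master.classes", "");
        Set<String> classes = new HashSet<>();
        for (String name : raw.replace("\"", "").split(",")) {
            classes.add(name.trim());
        }
        for (String required : List.of(ACCESS_CONTROLLER, ACL_CONTROLLER)) {
            if (!classes.contains(required)) {
                problems.add("hbase.coprocessor.master.classes is missing " + required);
            }
        }
        return problems;
    }

    /** Renders base & ~umask as rwx flags (base is 0666 for files, 0777 for directories). */
    static String applyUmask(int base, String octalUmask) {
        int mode = base & ~Integer.parseInt(octalUmask, 8) & 0777;
        StringBuilder flags = new StringBuilder();
        for (int bit = 8; bit >= 0; bit--) {
            // Bits 8..6 are owner, 5..3 group, 2..0 others; each triple is r, w, x.
            flags.append((mode & (1 << bit)) != 0 ? "rwx".charAt(2 - bit % 3) : '-');
        }
        return flags.toString();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("dfs.namenode.acls.enabled", "true");
        props.setProperty("fs.permissions.umask-mode", "027");
        props.setProperty("hbase.acl.sync.to.hdfs.enable", "true");
        props.setProperty("hbase.coprocessor.master.classes",
                "\"" + ACCESS_CONTROLLER + "," + ACL_CONTROLLER + "\"");
        List<String> problems = check(props);
        System.out.println(problems.isEmpty() ? "OK" : problems); // prints OK
        // prints: files rw-r-----, directories rwxr-x---
        System.out.println("files " + applyUmask(0666, "027")
                + ", directories " + applyUmask(0777, "027"));
    }
}
```

In other words, with umask 027 the group keeps read access (and traversal on directories) while other users get no access.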