Scan over snapshots

Overview

Scan over snapshots is a feature that lets you read table data directly from the HFiles of a snapshot instead of sending requests through the HBase servers. It works whether or not HBase is running and saves resources on the HBase server side.

The feature is provided by the TableSnapshotScanner and TableSnapshotInputFormat classes.

TableSnapshotScanner class

The TableSnapshotScanner class is intended for single scans over HFiles that are run from the client side. The HFiles are copied into a temporary directory and deleted when the scanner is closed. This directory must not be a subdirectory of the path specified in the hbase.rootdir parameter of the hbase-site.xml configuration file, and the client-side user must have write permissions for it.

Below is an example of using the TableSnapshotScanner class:

Path restoreDir = new Path("<path>");
Scan scan = new Scan();
try (TableSnapshotScanner scanner = new TableSnapshotScanner(conf, restoreDir, snapshotName, scan)) {
    Result result = scanner.next();
    while (result != null) {
        ...
        result = scanner.next();
    }
}

where <path> is the path to the temporary directory, conf is the HBase Configuration object, and snapshotName is the name of an existing snapshot.
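
The restriction on the temporary directory can be checked on the client before a scanner is created. The following is a minimal sketch and not part of the HBase API: the RestoreDirCheck class and its isOutsideRootDir method are hypothetical names, and the sketch compares URI strings only, without contacting a file system. It assumes both locations are given as fully qualified URIs or as absolute paths; if either one has no scheme, only the paths are compared, which errs on the side of rejecting the directory.

```java
import java.net.URI;
import java.util.Objects;

// Hypothetical helper, not an HBase class.
final class RestoreDirCheck {
    private RestoreDirCheck() {}

    /** Returns true if restoreDir is neither rootDir itself nor nested under it. */
    static boolean isOutsideRootDir(String rootDir, String restoreDir) {
        // normalize() resolves "." and ".." segments before the comparison
        URI root = URI.create(rootDir).normalize();
        URI restore = URI.create(restoreDir).normalize();

        // Two fully qualified URIs on different file systems cannot be nested
        boolean bothQualified = root.getScheme() != null && restore.getScheme() != null;
        if (bothQualified
                && (!root.getScheme().equalsIgnoreCase(restore.getScheme())
                    || !Objects.equals(root.getAuthority(), restore.getAuthority()))) {
            return true;
        }

        // Compare whole path segments so that /hbase-restore is not mistaken for /hbase
        String rootPath = withTrailingSlash(root.getPath());
        String restorePath = withTrailingSlash(restore.getPath());
        return !restorePath.startsWith(rootPath);
    }

    private static String withTrailingSlash(String path) {
        return path.endsWith("/") ? path : path + "/";
    }

    public static void main(String[] args) {
        String rootDir = "hdfs://nn:8020/hbase"; // example value of hbase.rootdir
        System.out.println(isOutsideRootDir(rootDir, "hdfs://nn:8020/tmp/restore")); // true
        System.out.println(isOutsideRootDir(rootDir, "hdfs://nn:8020/hbase/tmp"));   // false
    }
}
```

In an application, the first argument would typically be the value of hbase.rootdir read from the configuration, and the second the path passed to the TableSnapshotScanner constructor.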

TableSnapshotInputFormat class

The TableSnapshotInputFormat class is intended for scanning HFiles and using the results in MapReduce jobs.

Below is an example of using the TableSnapshotInputFormat class:

Job job = Job.getInstance(conf);
Path restoreDir = new Path("<path>");
Scan scan = new Scan();
TableMapReduceUtil.initTableSnapshotMapperJob(snapshotName, scan, MyTableMapper.class, MyMapKeyOutput.class, MyMapOutputValueWritable.class, job, true, restoreDir);

Configuration

To use the scan over snapshots feature, do the following:

  1. Go to ADCM UI and select your ADH cluster.

  2. Navigate to Services → HDFS → Primary configuration and toggle Show advanced.

  3. Open the Custom hdfs-site.xml section and click Add property.

  4. For the property name, enter dfs.namenode.acls.enabled and set its value to true.

  5. Open the Custom core-site.xml section and click Add property.

  6. For the property name, enter fs.permissions.umask-mode and set its value to 027.

  7. Save the configuration by clicking Save → Create.

  8. Navigate to Services → HBase → Primary configuration and toggle Show advanced.

  9. Open the Custom hbase-site.xml section and click Add property.

  10. For the property name, enter hbase.coprocessor.master.classes and set its value to "org.apache.hadoop.hbase.security.access.AccessController,org.apache.hadoop.hbase.security.access.SnapshotScannerHDFSAclController".

  11. Add another property. For the property name, enter hbase.acl.sync.to.hdfs.enable and set its value to true.

  12. Save the configuration by clicking Save → Create and restart the service by clicking Actions → Reconfig and graceful restart.

  13. Open the HBase shell and enable snapshot scanning for the required table by modifying its schema with a command like the following:

    alter '<table>', CONFIGURATION => {'hbase.acl.sync.to.hdfs.enable' => 'true'}

    where <table> is the name of the table.
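
The cluster-level values set in the steps above can also be verified programmatically, for example in a preflight check that runs before a job is submitted. The sketch below is an illustration only: SnapshotScanConfigCheck and findProblems are hypothetical names, and the effective settings are passed in as a plain Map so that the example does not depend on the Hadoop Configuration class. How the map is populated is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of HBase or ADCM.
final class SnapshotScanConfigCheck {
    static final String MASTER_COPROCESSORS_KEY = "hbase.coprocessor.master.classes";

    // Property names and values taken from the configuration steps
    private static final Map<String, String> REQUIRED_VALUES = new LinkedHashMap<>();
    static {
        REQUIRED_VALUES.put("dfs.namenode.acls.enabled", "true");
        REQUIRED_VALUES.put("fs.permissions.umask-mode", "027");
        REQUIRED_VALUES.put("hbase.acl.sync.to.hdfs.enable", "true");
    }

    private static final List<String> REQUIRED_MASTER_COPROCESSORS = Arrays.asList(
            "org.apache.hadoop.hbase.security.access.AccessController",
            "org.apache.hadoop.hbase.security.access.SnapshotScannerHDFSAclController");

    private SnapshotScanConfigCheck() {}

    /** Returns one message per missing or mismatched setting; an empty list means no problems were found. */
    static List<String> findProblems(Map<String, String> effective) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, String> required : REQUIRED_VALUES.entrySet()) {
            String actual = effective.get(required.getKey());
            if (actual == null || !required.getValue().equalsIgnoreCase(actual.trim())) {
                problems.add(required.getKey() + ": expected " + required.getValue()
                        + ", found " + actual);
            }
        }

        // The coprocessor property is a comma-separated list that may hold other classes too
        String raw = effective.get(MASTER_COPROCESSORS_KEY);
        Set<String> configured = new HashSet<>();
        for (String name : (raw == null ? "" : raw).split(",")) {
            configured.add(name.trim());
        }
        for (String className : REQUIRED_MASTER_COPROCESSORS) {
            if (!configured.contains(className)) {
                problems.add(MASTER_COPROCESSORS_KEY + ": missing " + className);
            }
        }
        return problems;
    }
}
```

The check covers only the cluster-level properties; the table-level setting applied in the HBase shell is not inspected by this sketch.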
