Checkpointing in HDFS

Overview

Checkpointing in HDFS provides safer and more cost-effective handling of filesystem metadata. Normally, the NameNode loads the fsimage file and applies the edit logs during startup. If HDFS accumulates too many edits, the NameNode takes longer to restart.

The designated checkpointing service periodically merges the information from the edit logs into the fsimage. This prevents the edit logs from occupying too much space and enables a faster restart of the NameNode.

Creating a new fsimage is a computation-intensive task. To maintain system availability during this operation, the task is delegated to either the Secondary NameNode or the Standby NameNode.

In a regular Hadoop cluster, the Secondary NameNode is responsible for processing edit logs and updating the fsimage. For safety reasons, you can keep a local up-to-date copy of the filesystem by enabling either a Checkpoint node or a Backup node.

IMPORTANT
In High Availability (HA) Hadoop clusters, merging of the edit logs into a new fsimage is done by the Standby NameNode, and the edit logs are managed by the Quorum Journal Manager. For this reason, using Checkpoint nodes and Backup nodes is unnecessary in HA clusters.

For more information about HDFS components, refer to the HDFS architecture article.

Configuration

Checkpointing can be carried out by one of two services:

  • Checkpoint node — periodically updates fsimage on the active NameNode with the information from the edit logs.

  • Backup node — has the same functions as the checkpoint node and, additionally, keeps in memory an up-to-date copy of the file system namespace that is synchronized with the active NameNode state.

You can configure either one but not both.

Checkpoint nodes and the backup node have the same memory requirements as a regular NameNode. An attempt to run them on the same host as the NameNode will likely result in an error.

The following parameters trigger checkpoint creation:

  • dfs.namenode.checkpoint.period — the time period between checkpoints. You can use a suffix to specify the time unit: ms (milliseconds), s (seconds), m (minutes), h (hours), d (days). The default is 3600s.

  • dfs.namenode.checkpoint.txns — the maximum number of edits accumulated between checkpoints. The default is 1000000.

The dfs.namenode.checkpoint.check.period parameter specifies how often a checkpoint node or a backup node checks whether either of these conditions is met.
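As an illustration, the trigger parameters could be set in hdfs-site.xml as follows. The period and transaction values shown are the defaults; the check interval is an example, not a tuning recommendation:

```xml
<!-- hdfs-site.xml: checkpoint trigger settings (example values) -->
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600s</value> <!-- create a checkpoint at least every hour -->
</property>
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value> <!-- or after 1000000 edits, whichever comes first -->
</property>
<property>
  <name>dfs.namenode.checkpoint.check.period</name>
  <value>60s</value> <!-- how often to evaluate the two conditions above -->
</property>
```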

To configure checkpointing using ADCM, perform these steps:

  • Checkpoint node

  • Backup node

  1. On the Clusters page, select the desired cluster.

  2. Go to the Services tab and click HDFS.

  3. Toggle the Show advanced option and find the Custom hdfs-site.xml field.

  4. Add the dfs.namenode.backup.address parameter and its value <host>:50100, where <host> is the FQDN of the checkpoint node’s host. You can specify several checkpoint nodes if necessary.

  5. Optionally, add the dfs.namenode.backup.http-address parameter and its value <host>:50105, where <host> is the FQDN address of the checkpoint node’s host.

  6. Confirm changes to HDFS configuration by clicking Save.

  7. In the Actions drop-down menu, select Restart, make sure the Apply configs from ADCM option is set to true and click Run.

  8. Connect to a cluster host via SSH and run the command as the HDFS superuser:

    $ hdfs namenode -checkpoint
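After steps 4 and 5, the additions to the Custom hdfs-site.xml field would look roughly like this (cp-node.example.com is a placeholder FQDN, not a real host):

```xml
<property>
  <name>dfs.namenode.backup.address</name>
  <value>cp-node.example.com:50100</value>
</property>
<property>
  <name>dfs.namenode.backup.http-address</name>
  <value>cp-node.example.com:50105</value>
</property>
```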


  1. On the Clusters page, select the desired cluster.

  2. Go to the Services tab and click HDFS.

  3. Toggle the Show advanced option and find the Custom hdfs-site.xml field.

  4. Add the dfs.namenode.backup.address parameter and its value <host>:50100, where <host> is the FQDN of the backup node’s host. HDFS supports only one backup node in a cluster.

  5. Optionally, add the dfs.namenode.backup.http-address parameter and its value <host>:50105, where <host> is the FQDN address of the backup node’s host.

  6. Confirm changes to HDFS configuration by clicking Save.

  7. In the Actions drop-down menu, select Restart, make sure the Apply configs from ADCM option is set to true and click Run.

  8. Connect to a cluster host via SSH and run the command as the HDFS superuser:

    $ hdfs namenode -backup
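As with the checkpoint node, the resulting additions to Custom hdfs-site.xml would look roughly like this (backup-node.example.com is a placeholder FQDN; remember that only one backup node per cluster is supported):

```xml
<property>
  <name>dfs.namenode.backup.address</name>
  <value>backup-node.example.com:50100</value>
</property>
<property>
  <name>dfs.namenode.backup.http-address</name>
  <value>backup-node.example.com:50105</value>
</property>
```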
TIP
If the checkpointing service fails to start, a common cause is insufficient permissions or missing storage directories. You may need to create the required directories or adjust their permissions for the HDFS user.

Commands

To create a new checkpoint manually, run:

$ hdfs dfsadmin -saveNamespace
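The saveNamespace command is rejected unless the NameNode is in safe mode, so a manual checkpoint typically looks like this (run as the HDFS superuser):

```shell
# Enter safe mode so the namespace cannot change during the save
hdfs dfsadmin -safemode enter

# Write the current in-memory namespace to a new fsimage and reset the edit log
hdfs dfsadmin -saveNamespace

# Leave safe mode to resume normal operation
hdfs dfsadmin -safemode leave
```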

To clear the redundant trash checkpoints, use the hdfs dfs -expunge command.

For more information on checkpointing options, see the descriptions of the hdfs dfsadmin and hdfs namenode commands.

Import checkpoint

If the fsimage file has been lost or you want to create a new NameNode with the existing checkpoint, you can import a checkpoint manually.

To import a checkpoint to the NameNode:

  1. Make sure that the directory specified in the dfs.namenode.name.dir parameter exists and is empty.

  2. Make sure that the dfs.namenode.checkpoint.dir parameter contains the path to the checkpoint directory.

  3. Start the NameNode using the -importCheckpoint option:

$ hdfs namenode -importCheckpoint

The NameNode will load the checkpoint from the directory specified in dfs.namenode.checkpoint.dir and save it to the NameNode directory set in dfs.namenode.name.dir.
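A sketch of the whole import procedure, run as the HDFS superuser; the directory paths are placeholders and must match your dfs.namenode.name.dir and dfs.namenode.checkpoint.dir settings:

```shell
# 1. The NameNode metadata directory must exist and be empty
#    (example path; use whatever dfs.namenode.name.dir points to)
ls -A /hadoop/dfs/name    # should print nothing

# 2. The checkpoint must be present under dfs.namenode.checkpoint.dir
#    (example path; a valid checkpoint has a current/ subdirectory with an fsimage)
ls /hadoop/dfs/namesecondary/current

# 3. Start the NameNode with the import option; it copies the checkpoint
#    into the empty dfs.namenode.name.dir directory
hdfs namenode -importCheckpoint
```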
