NameNode

To ensure high availability, deploy both a primary and a secondary NameNode. NameNodes are crucial parts of any Hyperwave cluster, so they must remain available at all times. Both servers keep the HDFS state in the fsimage file and the log of changes in the edits file.

NameNodes perform the following:

  • handling all file operations in HDFS;

  • mapping files to blocks on DataNodes;

  • storing metadata of HDFS files and folders;

  • storing DataNode block locations;

  • controlling data replication.

The secondary NameNode keeps a backup copy of the fsimage and edits files. It periodically merges the edits log into the fsimage file, which prevents the edits log from growing too large.
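
The merge described above can be pictured with a toy model. This is only a sketch: the dictionary and tuple structures, the `checkpoint` function, and the file paths are hypothetical and far simpler than the real HDFS on-disk formats.

```python
# Toy model of a checkpoint: replay an edits log onto an fsimage snapshot.
# The data structures are hypothetical, not the real HDFS formats.

def checkpoint(fsimage, edits):
    """Return a new fsimage with every logged edit applied, plus an empty edits log."""
    merged = dict(fsimage)  # copy, so the previous snapshot stays intact
    for op, path, block_count in edits:
        if op == "create":
            merged[path] = block_count
        elif op == "delete":
            merged.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return merged, []


# fsimage maps a file path to its number of blocks (an assumption of this sketch)
fsimage = {"/data/a.csv": 2, "/data/b.csv": 1}
edits = [
    ("create", "/data/c.csv", 4),
    ("delete", "/data/a.csv", None),
]

new_fsimage, new_edits = checkpoint(fsimage, edits)
```

After the call, `new_fsimage` holds the merged state and the edits log starts again from empty, which is why regular checkpoints keep the log small.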

Storage options

Both NameNode servers should have highly reliable storage for their namespace image and edits log. Hardware RAID and reliable network storage are typically reasonable options.

For ADH NameNodes, the storage recommendations are the same regardless of the number of DataNodes. Use four SAS drives of about 1 TB each, attached to a hardware RAID controller configured for RAID 1+0. SAS drives are more expensive than SATA drives and have lower storage capacity, but they are faster and much more reliable.

Deploying your SAS drives as a RAID array ensures that the ADH management services have fast, stable, and redundant storage for their mission-critical data.
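
As a quick sanity check on capacity, the sketch below estimates the usable space of the recommended array. It assumes plain two-way mirroring, so half of the raw capacity is usable; the function name `raid10_usable_tb` is hypothetical.

```python
def raid10_usable_tb(drive_count, drive_size_tb):
    """Usable capacity of a RAID 1+0 array built from two-way mirrors."""
    if drive_count < 4 or drive_count % 2 != 0:
        raise ValueError("RAID 1+0 needs an even number of drives, at least four")
    # Every block is stored on two drives, so only half the raw space is usable.
    return drive_count * drive_size_tb / 2


usable = raid10_usable_tb(4, 1.0)
print(f"Usable capacity: {usable} TB")  # prints "Usable capacity: 2.0 TB"
```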

Memory options

Memory requirements vary considerably depending on the scale of an ADH cluster. Memory is a critical factor for NameNodes, because the active and standby NameNode servers rely heavily on RAM to manage HDFS. For this reason, use error-correcting code (ECC) memory in ADH NameNodes. NameNodes usually require 64 to 128 GB of RAM.

The NameNode memory requirement is a direct function of the number of file blocks stored in HDFS. As a rule, a NameNode uses roughly 1 GB of RAM per million HDFS blocks.

NOTE
Remember that files are split into individual blocks, and that replication applies to blocks rather than to whole files: HDFS keeps three copies of each block.
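
The rule of thumb above can be turned into a rough estimate. This is a minimal sketch under stated assumptions: the 128 MB block size and the function names are assumptions rather than values from this guide, and the estimate counts unique blocks, not replicas.

```python
import math

GB_PER_MILLION_BLOCKS = 1.0  # rule of thumb: roughly 1 GB of RAM per million blocks


def estimate_block_count(file_sizes_mb, block_size_mb=128):
    """Count HDFS blocks for the given files; each file occupies at least one block."""
    return sum(max(1, math.ceil(size / block_size_mb)) for size in file_sizes_mb)


def estimate_namenode_ram_gb(block_count):
    """Apply the rule of thumb to a number of unique blocks."""
    return block_count / 1_000_000 * GB_PER_MILLION_BLOCKS


# Four sample files: 1 MB, 128 MB, 129 MB, and 1000 MB
sample_blocks = estimate_block_count([1, 128, 129, 1000])

# A cluster holding 50 million blocks
ram_gb = estimate_namenode_ram_gb(50_000_000)
```

Note that the 1 MB file still costs a full block entry, so many small files inflate NameNode memory use much faster than the same volume of data stored in large files.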

Processors

We recommend dual-socket motherboards, with each CPU providing eight cores clocked at 2.5 to 3 GHz. Intel processors are commonly used.

Network

Fast communication is vital for the services on NameNodes, so we recommend using a pair of bonded 10 Gbps connections. The bonded pair provides redundancy and also doubles throughput to 20 Gbps. For smaller clusters (fewer than 50 nodes), 1 Gbps connections can also be used.
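
The network guidance can be summarised in a few lines of arithmetic. This is only a sketch: the function names are hypothetical, and the aggregate figure is an upper bound, since the throughput actually achieved depends on the bonding mode and the traffic pattern.

```python
def bonded_throughput_gbps(link_speed_gbps, link_count=2):
    """Aggregate upper bound for a bond of identical links."""
    return link_speed_gbps * link_count


def minimum_link_speed_gbps(node_count):
    """Slowest link speed the guidance allows for a cluster of this size."""
    # Clusters with fewer than 50 nodes can also use 1 Gbps links.
    return 1 if node_count < 50 else 10


aggregate = bonded_throughput_gbps(10)          # two bonded 10 Gbps links
small_cluster = minimum_link_speed_gbps(30)     # 30-node cluster
large_cluster = minimum_link_speed_gbps(200)    # 200-node cluster
```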
