Ozone rack awareness

Rack awareness in Ozone takes the physical network topology into account when placing data. It is crucial for data locality, fault tolerance, and overall performance, particularly in a geographically distributed cluster. If rack awareness is on, Ozone places each key replica on a host in a different rack. This ensures that data stays available if a rack becomes unreachable because of a network failure or another outage.

To configure rack awareness for Ozone via ADCM, perform the following steps:

  1. Go to the ADCM UI and select your cluster on the Clusters page.

  2. Go to the Services tab and select Ozone.

  3. Switch on the Show advanced toggle and locate the Topology script and Topology data parameters.

  4. Paste your network topology script as the value of the Topology script parameter.

    Example topology script
    #!/bin/bash
    
    # Adjust/Add the property "net.topology.script.file.name"
    # in core-site.xml with the absolute path to this
    # file. ENSURE the file is executable.
    
    # Supply appropriate rack prefix
    RACK_PREFIX=default
    
    # To test, supply a hostname as script input:
    if [ $# -gt 0 ]; then
    
    CTL_FILE=${CTL_FILE:-"topology.data"}
    
    HADOOP_CONF=${HADOOP_CONF:-"/etc/hadoop/conf"}
    
    if [ ! -f ${HADOOP_CONF}/${CTL_FILE} ]; then
      echo -n "/$RACK_PREFIX/rack "
      exit 0
    fi
    
    while [ $# -gt 0 ] ; do
      nodeArg=$1
      exec< ${HADOOP_CONF}/${CTL_FILE}
      result=""
      while read line ; do
        ar=( $line )
        if [ "${ar[0]}" = "$nodeArg" ] ; then
          result="${ar[1]}"
        fi
      done
      shift
      if [ -z "$result" ] ; then
        echo -n "/$RACK_PREFIX/rack "
      else
        echo -n "/$RACK_PREFIX/rack_$result "
      fi
    done
    
    else
      echo -n "/$RACK_PREFIX/rack "
    fi

    You can find additional script examples in the Rack Awareness article.

  5. As the value of the Topology data parameter, list the host IP addresses and their corresponding rack IDs, one host per line, as shown in the following example:

    Example topology data
    # This file should be placed in the /etc/hadoop/conf directory:
    #    - On the Namenode (and backups, i.e. HA, Failover, etc.)
    #    - On the Job Tracker OR Resource Manager (and any Failover JTs/RMs)
    
    # Add Hostnames to this file. Format <host ip> <rack_location>
    10.92.42.178 01
    10.92.43.172 02
    10.92.42.229 03

  6. Click Save, then Create.

  7. In the Actions drop-down menu, select Restart.

  8. Make sure the Apply configs from ADCM option is set to true and click Run.

Topology script and data fields in ADCM
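
Before saving the configuration, it can help to preview which rack path each host would resolve to. The following is a minimal local sketch, not part of ADCM or Ozone: `rack_for` is a hypothetical helper that mimics the lookup logic of the example topology script (the last matching line wins, unknown hosts fall back to the default rack), and the inline `TOPOLOGY_DATA` variable stands in for the real topology data. Unlike the example script, it prints one path per line.

```shell
#!/bin/bash
# Hypothetical local check; it does not touch the cluster.
RACK_PREFIX=default

# Sample topology data in the same format as the example: <host ip> <rack id>
TOPOLOGY_DATA='# comment lines never match a host, so they are ignored
10.92.42.178 01
10.92.43.172 02
10.92.42.229 03'

# rack_for HOST: print the rack path the example script would resolve for HOST.
rack_for() {
  local host=$1 rack
  # Keep the rack ID of the last line whose first field equals the host.
  rack=$(printf '%s\n' "$TOPOLOGY_DATA" |
    awk -v h="$host" '$1 == h { r = $2 } END { print r }')
  if [ -z "$rack" ]; then
    echo "/$RACK_PREFIX/rack"
  else
    echo "/$RACK_PREFIX/rack_$rack"
  fi
}

rack_for 10.92.42.178   # host listed in the topology data
rack_for 10.0.0.1       # host not listed: falls back to the default rack
```

With the sample data, the first call prints `/default/rack_01` and the second prints `/default/rack`.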

To check if rack awareness has been successfully configured, you can use the following command:

$ ozone admin datanode list

The output should look like this:

Datanode 10.92.42.178:9866 - Rack: /rack01 - Status: UP
Datanode 10.92.43.172:9866 - Rack: /rack02 - Status: UP
Datanode 10.92.42.229:9866 - Rack: /rack03 - Status: UP
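
To confirm at a glance that the datanodes are spread over more than one rack, the listing can be reduced to a count of distinct racks. This is a sketch that assumes the output format shown above (`Datanode <address> - Rack: <rack> - Status: <status>`); the actual format may differ between Ozone versions, in which case the pattern needs adjusting. `count_racks` is a hypothetical helper name.

```shell
#!/bin/bash
# count_racks: read datanode lines on stdin and print the number of distinct racks.
# Assumes each line looks like: Datanode <address> - Rack: <rack> - Status: <status>
count_racks() {
  sed -n 's/.* - Rack: \([^ ]*\) - Status: .*/\1/p' | sort -u | wc -l | tr -d ' '
}

# Sample input copied from the output above. On a live cluster, pipe the real
# command instead: ozone admin datanode list | count_racks
sample='Datanode 10.92.42.178:9866 - Rack: /rack01 - Status: UP
Datanode 10.92.43.172:9866 - Rack: /rack02 - Status: UP
Datanode 10.92.42.229:9866 - Rack: /rack03 - Status: UP'

printf '%s\n' "$sample" | count_racks
```

For the sample input this prints `3`. A result of `1` on a multi-rack cluster suggests that the topology script or data was not applied.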