Logging in MapReduce
Overview
MapReduce records command output and process events in text logs. This information can be useful when diagnosing technical issues.
MapReduce History Server stores logs locally, on the host where it runs. The logs have the .log extension and are located in the /var/log/hadoop-mapreduce/ directory. In the same directory, you can find .out files that contain component restart information.
All logs of all YARN components, including MapReduce logs, are also available in YARN UIs.
Log files are named according to the convention hadoop-mapred-<component>-<host>.log, where:
- <component>: a component name, for example, historyserver for History Server;
- <host>: the FQDN of the component host.
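The naming convention can be expressed as a small shell helper that builds the expected log path. This is a minimal sketch: the function name mapred_log_path is hypothetical, and the historyserver identifier is taken from the file name used in the Grep logs example rather than from a documented list of components.

```shell
# Hypothetical helper: build the log path for a MapReduce component.
# $1 = component identifier as it appears in file names (assumed, e.g. historyserver)
# $2 = FQDN of the component host
mapred_log_path() {
    printf '/var/log/hadoop-mapreduce/hadoop-mapred-%s-%s.log\n' "$1" "$2"
}

# Example call on the component host:
#   mapred_log_path historyserver "$(hostname -f)"
```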
The MapReduce logs configuration shares the same log4j.properties file with HDFS. You can edit this file to control the logging of MapReduce tasks.
Grep logs
You can search the logs for specific information, such as error messages. To do this, connect to the host of the component whose logs you want to inspect and use the grep command.
For example:
$ cat /var/log/hadoop-mapreduce/hadoop-mapred-historyserver-elenas-adh2.ru-central1.internal.log | grep -i -A3 -B1 error
This command searches for messages containing the word error in the History Server log located on the elenas-adh2.ru-central1.internal host. The -i option makes the search case-insensitive. The -A3 and -B1 options expand the output to include one line before and three lines after each matching line.
The example output:
2024-04-20 08:53:57,337 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: Starting scan to move intermediate done files
2024-04-20 08:55:59,664 ERROR org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer: RECEIVED SIGNAL 15: SIGTERM
2024-04-20 08:55:59,668 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping JobHistoryServer metrics system...
2024-04-20 08:55:59,668 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: JobHistoryServer metrics system stopped.
2024-04-20 08:55:59,669 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: JobHistoryServer metrics system shutdown complete.
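Before grepping for individual messages, it can help to see how many entries of each level a log contains. The sketch below is one possible way to do this. The function name count_by_level is hypothetical, and the code assumes that every entry starts with a date, a time, and a level, as in the example output above. Continuation lines such as stack traces are skipped.

```shell
# Hypothetical helper: count log entries per level in a log file.
# Assumes entries of the form "<date> <time> <LEVEL> <class>: <message>".
# $1 = path to the log file
count_by_level() {
    awk '$3 ~ /^(FATAL|ERROR|WARN|INFO|DEBUG|TRACE)$/ { n[$3]++ }
         END { for (level in n) print level, n[level] }' "$1" | sort
}

# Example call:
#   count_by_level /var/log/hadoop-mapreduce/hadoop-mapred-historyserver-elenas-adh2.ru-central1.internal.log
```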
Logging levels
MapReduce uses the Log4j2 library for logging, which supports the following log levels (from least to most informative):
- FATAL: indicates that an operation can't continue execution and will terminate.
- ERROR: notifies that a program is not working correctly or has stopped.
- WARN: warns about potential problems. This doesn't mean that a program is not working, but raises a concern.
- INFO: informs about the program lifecycle or state.
- DEBUG: prints debugging information about internal states of the program.
- TRACE: prints messages tracing the execution flow of a program.
The Log4j2 loggers also accept two special logging level values: OFF, which switches logging off, and ALL, which allows all types of messages.
Setting a logging level enables that level and all more severe levels, that is, the levels listed above it. For example, if you set the logging level to WARN, only WARN, ERROR, and FATAL messages are written to the log files, while INFO, DEBUG, and TRACE messages are not.
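The threshold rule can be illustrated with a short shell sketch. It does not use or configure Log4j. It only models the ordering of the levels described above. The function names level_rank and is_logged are hypothetical.

```shell
# Hypothetical helper: map a level name to its position in the ordering
# OFF < FATAL < ERROR < WARN < INFO < DEBUG < TRACE < ALL.
level_rank() {
    case "$1" in
        OFF)   echo 0 ;;
        FATAL) echo 1 ;;
        ERROR) echo 2 ;;
        WARN)  echo 3 ;;
        INFO)  echo 4 ;;
        DEBUG) echo 5 ;;
        TRACE) echo 6 ;;
        ALL)   echo 7 ;;
        *)     return 1 ;;
    esac
}

# Succeeds if a message of level $2 passes a logger configured with level $1.
# Returns 2 if either level name is unknown.
is_logged() {
    threshold=$(level_rank "$1") || return 2
    message=$(level_rank "$2") || return 2
    [ "$message" -le "$threshold" ]
}

# Example calls:
#   is_logged WARN ERROR && echo "written"    # ERROR passes a WARN threshold
#   is_logged WARN INFO  || echo "dropped"    # INFO does not
```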
Container logs
Applications submitted to YARN generate their own logs. To read application logs, use the yarn logs command.
For example:
$ yarn logs -applicationId application_1714649647710_0001
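Application logs can be long, so it is often convenient to save them to a file for later searching. The following sketch wraps the command above. The function names is_application_id and save_app_logs, as well as the output file name, are hypothetical. The ID check assumes the application_<cluster timestamp>_<sequence number> format seen in the example.

```shell
# Hypothetical helper: check that a string looks like a YARN application ID.
is_application_id() {
    printf '%s\n' "$1" | grep -Eq '^application_[0-9]+_[0-9]+$'
}

# Hypothetical wrapper: save the logs of one application to <application ID>.log
# in the current directory. Requires the yarn CLI on the host.
save_app_logs() {
    if ! is_application_id "$1"; then
        echo "invalid application ID: $1" >&2
        return 1
    fi
    yarn logs -applicationId "$1" > "$1.log"
}

# Example call:
#   save_app_logs application_1714649647710_0001
```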
Archive logs
You can make an archive of MapReduce logs by using the mapred archive-logs command. This process is similar to the YARN log aggregation feature.
The following example command creates an archive if at least ten log files are present:
$ mapred archive-logs -minNumberLogFiles 10
The default value of the -minNumberLogFiles option is 20, which means that running the mapred archive-logs command without this option creates an archive only if 20 or more log files are present.