ADH monitoring metrics
This article describes the monitoring metrics reported by ADH to a Monitoring cluster.
Metric types
Metrics fall into two groups: system metrics and service metrics. Both types are available in the Graphite and Grafana UIs once the Monitoring cluster is installed.
System metrics indicate general characteristics of cluster hosts, usually related to resource consumption. The reported system metrics are presented below.
| Metrics group | Description |
|---|---|
| cpu | CPU utilization |
| diskspace | Disk usage |
| files | File statistics |
| iostat | I/O performance |
| loadavg | System load averages |
| memory | Memory usage |
| netstat | Network connection statistics |
| network | Network interface performance |
| uptime | Uptime statistics |
Service metrics reflect the characteristics of individual ADH services. They are listed below.
| Component | Metrics group | Metric name | Description |
|---|---|---|---|
| RegionServer | IPC | numActiveHandler | Number of RPC handlers that serve incoming requests |
| | | numCallsInGeneralQueue | Number of user requests in the queue |
| | | numOpenConnections | Number of open connections (RPC) |
| | JvmMetrics | GcCount | Total number of GC iterations |
| | | GcCountConcurrentMarkSweep | Number of GC iterations done with the Concurrent Mark Sweep algorithm |
| | | GcCountParNew | Number of GC iterations done with the ParNew algorithm |
| | | GcTimeMillis | Total GC time in milliseconds |
| | | GcTimeMillisConcurrentMarkSweep | Total GC time (CMS) in milliseconds |
| | | GcTimeMillisParNew | Total GC time (ParNew) in milliseconds |
| | | MemHeapCommittedM | Currently allocated heap memory in MB |
| | | MemHeapMaxM | Maximum heap memory in MB |
| | | MemHeapUsedM | Currently utilized heap memory in MB |
| | Server | Append_95th_percentile | The 95th percentile latency for the append operation in the RegionServer |
| | | Append_median | The median latency for the append operation in the RegionServer |
| | | Delete_95th_percentile | The 95th percentile latency for the delete operation in the RegionServer |
| | | Delete_median | The median latency for the delete operation in the RegionServer |
| | | Get_95th_percentile | The 95th percentile latency for the get operation in the RegionServer |
| | | Get_median | The median latency for the get operation in the RegionServer |
| | | Increment_95th_percentile | The 95th percentile latency for the increment operation in the RegionServer |
| | | Increment_median | The median latency for the increment operation in the RegionServer |
| | | Put_95th_percentile | The 95th percentile latency for the put operation in the RegionServer |
| | | Put_median | The median latency for the put operation in the RegionServer |
| | | ScanTime_95th_percentile | The 95th percentile latency for the scan operation in the RegionServer |
| | | ScanTime_median | The median latency for the scan operation in the RegionServer |
| | | percentFilesLocal | The percentage of store file data that can be read from the local DataNode |
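The Server group reports a median and a 95th percentile for each operation. As a rough illustration of what these two statistics express, the sketch below computes them from a list of latency samples with the nearest-rank method. The sample values and the `percentile` helper are hypothetical, and this is not necessarily how HBase computes the metrics internally.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: mostly fast, one slow outlier.
latencies_ms = [2, 3, 3, 4, 4, 5, 5, 6, 7, 120]

median = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
print(f"median={median} ms, p95={p95} ms")
```

A median that stays low while the 95th percentile grows usually points to a small share of slow requests rather than a general slowdown.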
| Component | Metrics group | Metric name | Description |
|---|---|---|---|
| datanode | FSDatasetState | NumFailedVolumes | Number of failed volumes (disks) on the DataNode |
| | JvmMetrics | GcCount | Total number of GC iterations |
| | | GcCountPSMarkSweep | Number of GC iterations done with the PS MarkSweep algorithm |
| | | GcCountPSScavenge | Number of GC iterations done with the PS Scavenge algorithm |
| | | GcTimeMillis | Total GC time in milliseconds |
| | | GcTimeMillisPSMarkSweep | Total GC time (PS MarkSweep) in milliseconds |
| | | GcTimeMillisPSScavenge | Total GC time (PS Scavenge) in milliseconds |
| | | MemHeapCommittedM | Currently allocated heap memory in MB |
| | | MemHeapMaxM | Maximum heap memory in MB |
| | | MemHeapUsedM | Currently utilized heap memory in MB |
| | RpcActivityForPort | NumOpenConnections | Number of open RPC connections |
| | | RpcProcessingTimeAvgTime | Average request processing time |
| | | RpcQueueTimeAvgTime | Average time that requests spend waiting in the queue |
| namenode | FSNamesystem | BlockCapacity | Total space available for storing data blocks |
| | | BlocksTotal | Total number of data blocks |
| | | CapacityTotal | Total space available for storing data |
| | | CapacityUsed | Total capacity used for storing data |
| | | CorruptReplicatedBlocks | Number of corrupt replicated blocks (blocks with corrupt replicas) |
| | | FilesTotal | Total number of files in HDFS |
| | | MissingBlocks | Number of missing data blocks |
| | | UnderReplicatedBlocks | Number of data blocks that do not have sufficient replicas in HDFS |
| | JvmMetrics | GcCount | Total number of GC iterations |
| | | GcCountPSMarkSweep | Number of GC iterations done with the PS MarkSweep algorithm |
| | | GcCountPSScavenge | Number of GC iterations done with the PS Scavenge algorithm |
| | | GcTimeMillis | Total GC time in milliseconds |
| | | GcTimeMillisPSMarkSweep | Total GC time (PS MarkSweep) in milliseconds |
| | | GcTimeMillisPSScavenge | Total GC time (PS Scavenge) in milliseconds |
| | | MemHeapCommittedM | Currently allocated heap memory in MB |
| | | MemHeapMaxM | Maximum heap memory in MB |
| | | MemHeapUsedM | Currently utilized heap memory in MB |
| | RpcActivityForPort | NumOpenConnections | Number of open RPC connections |
| | | RpcProcessingTimeAvgTime | Average request processing time |
| | | RpcQueueTimeAvgTime | Average time that requests spend waiting in the queue |
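Several of the namenode FSNamesystem metrics are easier to read in combination. The following minimal sketch derives a usage percentage and two warning flags from a dictionary of metric values. The `hdfs_summary` helper and the sample numbers are hypothetical; only the metric names come from the table above.

```python
def hdfs_summary(metrics):
    """Summarize HDFS state from FSNamesystem metric values."""
    total = metrics["CapacityTotal"]
    used = metrics["CapacityUsed"]
    # Guard against a zero total, e.g. before any DataNode has registered.
    used_pct = 100.0 * used / total if total else 0.0
    return {
        "used_pct": round(used_pct, 1),
        "has_missing_blocks": metrics["MissingBlocks"] > 0,
        "has_under_replicated_blocks": metrics["UnderReplicatedBlocks"] > 0,
    }


# Hypothetical values, as they might be read from the monitoring storage.
sample = {
    "CapacityTotal": 1000,
    "CapacityUsed": 250,
    "MissingBlocks": 0,
    "UnderReplicatedBlocks": 12,
}

summary = hdfs_summary(sample)
print(summary)
```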
| Component | Metrics group | Metric name | Description |
|---|---|---|---|
hiveserver2 |
ClassLoading |
LoadedClassCount |
Number of loaded classes |
GarbageCollector |
PSMarkSweep |
GC time spent using the PS MarkSweep algorithm |
|
PSScavenge |
GC time spent using the PS Scavenge algorithm |
||
HS2 |
active_calls_api_Driver_execute |
The |
|
active_calls_api_Driver_run |
|||
active_calls_api_PostHook_org_apache_hadoop_hive_ql_stats_OperatorStatsReaderHook |
|||
active_calls_api_compile |
|||
active_calls_api_hs2_operation_INITIALIZED |
Number of active operations in HiveServer2 (HS2) with the corresponding status |
||
active_calls_api_hs2_operation_PENDING |
|||
active_calls_api_hs2_operation_RUNNING |
|||
active_calls_api_hs2_sql_operation_PENDING |
Number of active SQL operations with the corresponding status |
||
active_calls_api_hs2_sql_operation_RUNNING |
|||
active_calls_api_parse |
Number of requests submitted to the HiveServer via the API, which were successfully parsed and ready for processing |
||
active_calls_api_releaseLocks |
The |
||
active_calls_api_runTasks |
|||
active_calls_api_semanticAnalyze |
|||
active_calls_api_waitCompile |
|||
active_calls_hs2_compiling_queries |
Number of requests that are currently compiling |
||
active_calls_hs2_executing_queries |
Number of requests that are currently executing |
||
active_calls_hs2_submitted_queries |
Number of requests submitted for execution |
||
api_Driver_execute |
The |
||
api_Driver_run |
|||
api_PostHook_org_apache_hadoop_hive_ql_stats_OperatorStatsReaderHook |
|||
api_compile |
|||
api_hs2_operation_INITIALIZED |
|||
api_hs2_operation_PENDING |
|||
api_hs2_operation_RUNNING |
|||
api_hs2_sql_operation_PENDING |
|||
api_hs2_sql_operation_RUNNING |
|||
api_parse |
|||
api_releaseLocks |
|||
api_runTasks |
|||
api_semanticAnalyze |
|||
api_waitCompile |
|||
cumulative_connection_count |
The total number of established connections to HiveServer2 since the server startup |
||
exec_async_pool_size |
The current size of the HiveServer2 asynchronous thread pool |
||
exec_async_queue_size |
The current size of the HiveServer2 asynchronous operation queue |
||
hs2_active_sessions |
Number of active sessions on HiveServer2 |
||
hs2_compiling_queries |
Number of queries being compiled on HiveServer2 |
||
hs2_completed_operation_CLOSED |
Number of completed operations with the corresponding status |
||
hs2_completed_operation_FINISHED |
|||
hs2_completed_sql_operation_CLOSED |
Number of completed SQL operations with the corresponding status |
||
hs2_completed_sql_operation_FINISHED |
|||
hs2_executing_queries |
Number of queries being executed on HiveServer2 |
||
hs2_open_sessions |
Number of open sessions on HiveServer2 |
||
hs2_sql_operation_active_user |
The current number of active users performing SQL operations on HiveServer2 |
||
hs2_submitted_queries |
Number of queries submitted to HiveServer2 |
||
hs2_succeeded_queries |
Number of queries succeeded on HiveServer2 |
||
buffers_direct_count |
JVM metrics |
||
buffers_direct_used |
|||
buffers_mapped_capacity |
|||
buffers_mapped_count |
|||
buffers_mapped_used |
|||
classLoading_loaded |
|||
classLoading_unloaded |
|||
gc_PS-MarkSweep_count |
|||
gc_PS-MarkSweep_time |
|||
gc_PS-Scavenge_count |
|||
gc_PS-Scavenge_time |
|||
jvm_pause_extraSleepTime |
|||
memory_heap_committed |
|||
memory_heap_init |
|||
memory_heap_max |
|||
memory_heap_usage |
|||
memory_heap_used |
|||
memory_non-heap_committed |
|||
memory_non-heap_init |
|||
memory_non-heap_max |
|||
memory_non-heap_usage |
|||
memory_non-heap_used |
|||
memory_pools_Code-Cache_usage |
|||
memory_pools_Compressed-Class-Space_usage |
|||
memory_pools_Metaspace_usage |
|||
memory_pools_PS-Eden-Space_usage |
|||
memory_pools_PS-Old-Gen_usage |
|||
memory_pools_PS-Survivor-Space_usage |
|||
memory_total_committed |
|||
memory_total_init |
|||
memory_total_max |
|||
memory_total_used |
|||
open_connections |
|||
open_operations |
|||
qc_current_size |
|||
qc_max_size |
|||
threads_blocked_count |
|||
threads_count |
|||
threads_daemon_count |
|||
threads_deadlock_count |
|||
threads_new_count |
|||
threads_runnable_count |
|||
threads_terminated_count |
|||
threads_timed_waiting_count |
|||
threads_waiting_count |
|||
waiting_compile_ops |
|||
Memory |
HeapMemoryUsage_committed |
Amount of memory allocated for heap |
|
HeapMemoryUsage_init |
Initial heap memory size |
||
HeapMemoryUsage_max |
Maximum heap memory size |
||
HeapMemoryUsage_used |
Utilized heap memory size |
||
NonHeapMemoryUsage_committed |
Amount of memory allocated for non-heap JVM areas |
||
NonHeapMemoryUsage_init |
Initial memory size allocated for non-heap JVM areas |
||
NonHeapMemoryUsage_max |
Maximum memory size allocated for non-heap JVM areas |
||
NonHeapMemoryUsage_used |
Utilized memory for non-heap JVM areas |
||
MemoryPool |
CodeCache |
Code cache size |
|
CompressedClassSpace |
Size of the compressed class space |
||
Metaspace |
Metaspace size |
||
PSEdenSpace |
Size of the Eden space in HiveServer2 using the PS algorithm |
||
PSOldGen |
Size of the Old Generation space in HiveServer2 using the PS algorithm |
||
PSSurvivorSpace |
Size of the Survivor space in HiveServer2 using the PS algorithm |
||
OperatingSystem |
ProcessCpuLoad |
CPU load |
|
Threading |
DaemonThreadCount |
Number of daemon threads |
|
PeakThreadCount |
Maximum number of threads |
||
ThreadCount |
Current number of threads |
||
TotalStartedThreadCount |
Total number of threads started since the JVM launch |
||
metastore |
ClassLoading |
LoadedClassCount |
Number of loaded classes |
GarbageCollector |
PSMarkSweep |
GC time spent using the PS MarkSweep algorithm |
|
PSScavenge |
GC time spent using the PS Scavenge algorithm |
||
Memory |
HeapMemoryUsage_committed |
Amount of memory allocated for heap |
|
HeapMemoryUsage_init |
Initial heap memory size |
||
HeapMemoryUsage_max |
Maximum heap memory size |
||
HeapMemoryUsage_used |
Utilized heap memory size |
||
NonHeapMemoryUsage_committed |
Amount of memory allocated for non-heap JVM areas |
||
NonHeapMemoryUsage_init |
Initial memory size allocated for non-heap JVM areas |
||
NonHeapMemoryUsage_max |
Maximum memory size allocated for non-heap JVM areas |
||
NonHeapMemoryUsage_used |
Utilized memory for non-heap JVM areas |
||
MemoryPool |
CodeCache |
Code cache size |
|
CompressedClassSpace |
Size of the compressed class space |
||
Metaspace |
Metaspace size |
||
PSEdenSpace |
Size of the Eden space in Hive Metastore using the PS algorithm |
||
PSOldGen |
Size of the Old Generation space in Hive Metastore using the PS algorithm |
||
PSSurvivorSpace |
Size of the Survivor space in Hive Metastore using the PS algorithm |
||
OperatingSystem |
ProcessCpuLoad |
CPU load |
|
Threading |
DaemonThreadCount |
Number of daemon threads |
|
PeakThreadCount |
Maximum number of threads |
||
ThreadCount |
Current number of threads |
||
TotalStartedThreadCount |
Total number of threads started since the JVM launch |
||
metastore |
PS-MarkSweep_count |
Number of GC iterations done using the PS MarkSweep algorithm |
|
PS-MarkSweep_time |
The total GC time elapsed using the PS MarkSweep algorithm |
||
PS-Scavenge_count |
Number of GC iterations done using the PS Scavenge algorithm |
||
PS-Scavenge_time |
The total GC time elapsed using the PS Scavenge algorithm |
||
active_calls_create_table |
The |
||
active_calls_drop_table |
|||
active_calls_get_all_functions |
|||
active_calls_get_config_value |
|||
active_calls_get_database |
|||
active_calls_get_databases |
|||
active_calls_get_functions |
|||
active_calls_get_multi_table |
|||
active_calls_get_table |
|||
active_calls_get_tables |
|||
active_calls_get_tables_by_type |
|||
api_create_table |
The |
||
api_create_table_with_environment_context |
|||
api_drop_table |
|||
api_drop_table_with_environment_context |
|||
api_flushCache |
|||
api_get_all_databases |
|||
api_get_all_functions |
|||
api_get_config_value |
|||
api_get_current_notificationEventId |
|||
api_get_database |
|||
api_get_databases |
|||
api_get_functions |
|||
api_get_multi_table |
|||
api_get_next_notification |
|||
api_get_table |
|||
api_get_table_objects_by_name_req |
|||
api_get_table_req |
|||
api_get_tables |
|||
api_get_tables_by_type |
|||
api_init |
|||
api_set_ugi |
|||
api_shutdown |
|||
blocked_count |
Number of threads blocked |
||
create_total_count_dbs |
Number of databases created |
||
create_total_count_partitions |
Number of partitions created |
||
create_total_count_tables |
Number of tables created |
||
daemon_count |
Number of daemon threads |
||
deadlock_count |
Number of deadlocks detected |
||
delete_total_count_dbs |
The total number of deleted databases |
||
delete_total_count_partitions |
The total number of deleted partitions |
||
delete_total_count_tables |
The total number of deleted tables |
||
direct_count |
JVM metrics |
||
direct_used |
|||
directsql_errors |
|||
heap_committed |
|||
heap_init |
|||
heap_max |
|||
heap_usage |
|||
heap_used |
|||
jvm_pause_extraSleepTime |
|||
jvm_pause_info-threshold |
|||
jvm_pause_warn-threshold |
|||
loaded |
|||
mapped_capacity |
|||
mapped_count |
|||
mapped_used |
|||
new_count |
|||
non-heap_committed |
|||
non-heap_init |
|||
non-heap_max |
|||
non-heap_usage |
|||
non-heap_used |
|||
open_connections |
|||
pools_Code-Cache_usage |
|||
pools_Compressed-Class-Space_usage |
|||
pools_Metaspace_usage |
|||
pools_PS-Eden-Space_usage |
|||
pools_PS-Old-Gen_usage |
|||
pools_PS-Survivor-Space_usage |
|||
runnable_count |
|||
terminated_count |
|||
timed_waiting_count |
|||
total_committed |
|||
total_count_dbs |
|||
total_count_partitions |
|||
total_count_tables |
|||
total_init |
|||
total_max |
|||
total_used |
|||
unloaded |
|||
waiting_count |
| Component | Metrics group | Metric name | Description |
|---|---|---|---|
| historyserver | ClassLoading | LoadedClassCount | Number of loaded classes |
| | GarbageCollector | PSMarkSweep | GC time spent using the PS MarkSweep algorithm |
| | | PSScavenge | GC time spent using the PS Scavenge algorithm |
| | Memory | HeapMemoryUsage_committed | Amount of memory allocated for heap |
| | | HeapMemoryUsage_init | Initial heap memory size |
| | | HeapMemoryUsage_max | Maximum heap memory size |
| | | HeapMemoryUsage_used | Utilized heap memory size |
| | | NonHeapMemoryUsage_committed | Amount of memory allocated for non-heap JVM areas |
| | | NonHeapMemoryUsage_init | Initial memory size allocated for non-heap JVM areas |
| | | NonHeapMemoryUsage_max | Maximum memory size allocated for non-heap JVM areas |
| | | NonHeapMemoryUsage_used | Utilized memory for non-heap JVM areas |
| | MemoryPool | CodeCache | Code cache size |
| | | CompressedClassSpace | Size of the compressed class space |
| | | Metaspace | Metaspace size |
| | | PSEdenSpace | Size of the Eden space in the Spark History Server using the PS algorithm |
| | | PSOldGen | Size of the Old Generation space in the Spark History Server using the PS algorithm |
| | | PSSurvivorSpace | Size of the Survivor space in the Spark History Server using the PS algorithm |
| | OperatingSystem | ProcessCpuLoad | CPU load |
| | Threading | DaemonThreadCount | Number of daemon threads |
| | | PeakThreadCount | Maximum number of threads |
| | | ThreadCount | Current number of threads |
| | | TotalStartedThreadCount | Total number of threads started since the JVM launch |
| Component | Metrics group | Metric name | Description |
|---|---|---|---|
| historyserver | JvmMetrics | GcCount | Total number of GC iterations |
| | | GcCountPSMarkSweep | Number of GC iterations done with the PS MarkSweep algorithm |
| | | GcCountPSScavenge | Number of GC iterations done with the PS Scavenge algorithm |
| | | GcTimeMillis | Total GC time in milliseconds |
| | | GcTimeMillisPSMarkSweep | Total GC time (PS MarkSweep) in milliseconds |
| | | GcTimeMillisPSScavenge | Total GC time (PS Scavenge) in milliseconds |
| | | MemHeapCommittedM | Currently allocated heap memory in MB |
| | | MemHeapMaxM | Maximum heap memory in MB |
| | | MemHeapUsedM | Currently utilized heap memory in MB |
| nodemanager | JvmMetrics | GcCount | Total number of GC iterations |
| | | GcCountPSMarkSweep | Number of GC iterations done with the PS MarkSweep algorithm |
| | | GcCountPSScavenge | Number of GC iterations done with the PS Scavenge algorithm |
| | | GcTimeMillis | Total GC time in milliseconds |
| | | GcTimeMillisPSMarkSweep | Total GC time (PS MarkSweep) in milliseconds |
| | | GcTimeMillisPSScavenge | Total GC time (PS Scavenge) in milliseconds |
| | | MemHeapCommittedM | Currently allocated heap memory in MB |
| | | MemHeapMaxM | Maximum heap memory in MB |
| | | MemHeapUsedM | Currently utilized heap memory in MB |
| | NodeManagerMetrics | AllocatedContainers | Number of allocated containers |
| | | AllocatedGB | The size of allocated memory in GB |
| | | AllocatedVCores | Number of allocated cores |
| | | AvailableGB | The size of available memory in GB |
| | | AvailableVCores | Number of available cores |
| | | BadLocalDirs | Number of directories on the local disk that cannot be used for storing task data due to errors |
| | | BadLogDirs | Number of directories on the local disk that cannot be used for storing task log files due to errors |
| | | ContainerLaunchDurationAvgTime | The average time spent on launching a task container |
| | | ContainersCompleted | Number of task containers that were completed successfully |
| | | ContainersFailed | Number of task containers that failed to complete |
| | | ContainersIniting | Number of task containers in the initialization state |
| | | ContainersKilled | Number of task containers that were forcibly stopped |
| | | ContainersLaunched | Number of task containers that were started successfully |
| | | ContainersRunning | Number of task containers that are currently running |
| | | GoodLocalDirsDiskUtilizationPerc | The percentage of disk space utilization in directories on a local disk that can be used for storing task data |
| | | GoodLogDirsDiskUtilizationPerc | The percentage of disk space utilization in directories on a local disk that can be used for storing task log files |
| resourcemanager | JvmMetrics | GcCount | Total number of GC iterations |
| | | GcCountPSMarkSweep | Number of GC iterations done with the PS MarkSweep algorithm |
| | | GcCountPSScavenge | Number of GC iterations done with the PS Scavenge algorithm |
| | | GcTimeMillis | Total GC time in milliseconds |
| | | GcTimeMillisPSMarkSweep | Total GC time (PS MarkSweep) in milliseconds |
| | | GcTimeMillisPSScavenge | Total GC time (PS Scavenge) in milliseconds |
| | | MemHeapCommittedM | Currently allocated heap memory in MB |
| | | MemHeapMaxM | Maximum heap memory in MB |
| | | MemHeapUsedM | Currently utilized heap memory in MB |
| | QueueMetrics | AllocatedVCores | Number of allocated cores |
| | | AppsFailed | Number of applications that exited with an error |
| | | AppsKilled | Number of applications killed by a user |
| | | AppsPending | Number of applications waiting for resources |
| | | AppsRunning | Number of applications that are currently running |
| | | AppsSubmitted | Number of applications submitted to a queue |
| | | AvailableVCores | Number of available cores |
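As an example of reading the resourcemanager QueueMetrics together, the sketch below estimates how much of a queue's vcore capacity is in use and whether applications are waiting. It assumes that allocated plus available vcores approximates the total and ignores reserved resources. The `queue_summary` helper and the sample values are hypothetical.

```python
def queue_summary(metrics):
    """Estimate vcore utilization of a YARN queue from QueueMetrics values."""
    allocated = metrics["AllocatedVCores"]
    available = metrics["AvailableVCores"]
    # Assumption: total capacity is roughly allocated + available vcores.
    total = allocated + available
    used_pct = 100.0 * allocated / total if total else 0.0
    return {
        "vcores_used_pct": round(used_pct, 1),
        "apps_waiting": metrics["AppsPending"] > 0,
    }


# Hypothetical values, as they might be read from the monitoring storage.
sample = {"AllocatedVCores": 30, "AvailableVCores": 10, "AppsPending": 2}

summary = queue_summary(sample)
print(summary)
```

High vcore utilization combined with a non-zero AppsPending value suggests that the queue is short of resources rather than idle.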
Metrics visualization
To visualize monitoring metrics as graphs and charts, use the Graphite and Grafana UIs available after the installation of a monitoring cluster.
Graphite
Graphite is a monitoring tool that stores numeric time-series data and visualizes it as graphs in a web UI.
To view metrics in Graphite, enter the address of the host where your monitoring cluster is installed into the browser address bar. For example, http://10.20.30.444:<port>. By default, Graphite runs on port 80, so you can omit the port number unless you specified a different port during the installation of your monitoring cluster.
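Besides the web UI, Graphite exposes a render endpoint that can return the same series as JSON, which is convenient for scripts. The sketch below only builds a request URL and reads the latest value from a response of that shape, without making a network call. The host name `monitoring-host` and the metric path `adh.host1.loadavg.01` are hypothetical placeholders; the actual metric paths depend on your monitoring cluster configuration.

```python
from urllib.parse import urlencode


def graphite_render_url(base_url, target, since="-1h"):
    """Build a Graphite render API URL that returns JSON for one target."""
    query = urlencode({"target": target, "from": since, "format": "json"})
    return f"{base_url.rstrip('/')}/render?{query}"


def latest_value(series):
    """Return the most recent non-null value of one series.

    A JSON render response is a list of series; each series holds
    "datapoints" as [value, timestamp] pairs, where value may be null.
    """
    for value, _timestamp in reversed(series["datapoints"]):
        if value is not None:
            return value
    return None


# Hypothetical host and metric path.
url = graphite_render_url("http://monitoring-host", "adh.host1.loadavg.01")
print(url)

# Hypothetical response fragment in the shape described above.
sample_series = {
    "target": "adh.host1.loadavg.01",
    "datapoints": [[0.5, 1700000000], [0.7, 1700000060], [None, 1700000120]],
}
print(latest_value(sample_series))
```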
Grafana
Grafana allows you to query data and visualize metrics stored in Graphite.
To view the Grafana web UI, enter the address of the host where your monitoring cluster is installed into the browser address bar. For example, http://10.20.30.444:<port>. By default, the Grafana UI is available on port 3000.
On the Grafana home page, click Home, and then select the required dashboard to view the metrics.