View a Hive job via TEZ

Konstantin Alpashkin

Each job that you run in Hive appears in the Tez UI. To see how this works, complete the following steps:

Open the Tez UI. You can find the link to the interface on the TEZ overview page.
Launch the Hive shell on the host with the Hive HiveServer2 component. More details on using the Hive shell are available on the Work with Hive tables page.
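
Jobs show up in the Tez UI only when Hive executes them on Tez. The following is a minimal sketch of a check you can run in the Hive shell first, assuming your cluster does not already force the engine through its configuration:

```sql
-- Print the current value of the execution engine property.
SET hive.execution.engine;

-- Switch the current session to Tez if another engine is reported.
SET hive.execution.engine=tez;
```

The setting applies to the current session only, so it does not change the cluster-wide default.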

Create a table using the Hive shell:

CREATE TABLE customers
( customer_id int NOT NULL,
  customer_name char(50) NOT NULL,
  address char(50),
  city char(50),
  state char(25),
  zip_code char(10)
);

Insert data into the created table:

INSERT INTO TABLE customers VALUES
(1,'John Smith','Holliwood blvd','Los Angeles','CA','435267'),
(2,'Alice Brown','42 st','New York','NY','876234'),
(3,'Maria Jensen','Paris drv','Chicago','IL','293752');

Use the Tez UI during the data insertion to monitor the process.

Adding data to Tez

You can further check the inserted data with the following query:

SELECT *
FROM customers
WHERE state = 'CA'
ORDER BY customer_id ASC;

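The output looks similar to the one below. This sample was captured from a table that contains more rows than the three inserted above. With only those three rows, the query returns the single row for customer 1.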

Query ID = aai_20211215195428_e60e1dc5-f6f3-4c5e-896a-6e56407845c1
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1639550101374_0007)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0
Reducer 2 ...... container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 5,38 s
----------------------------------------------------------------------------------------------
OK
1	John Smith                                        	Holliwood blvd                                    	Los Angeles                                       	CA                       	435267
4	Peter Stormik                                     	International blv                                 	San Francisco                                     	CA                       	231987
5	Daniel Morrowetz                                  	Bond rd                                           	Elk Grove                                         	CA                       	390817
Time taken: 6.447 seconds, Fetched: 3 row(s)

The SELECT job appears in the Tez UI in the same way as the INSERT job.
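
To preview the vertices that a query produces before you run it, you can ask Hive for the execution plan. This sketch reuses the SELECT query above. On the Tez engine the plan typically lists the vertices and the edges between them (for example, a reducer fed by a map vertex), but the exact layout depends on the Hive version and settings.

```sql
-- Show the execution plan instead of running the query.
-- The vertex names in the plan should correspond to those
-- shown in the console progress table and in the Tez UI.
EXPLAIN
SELECT *
FROM customers
WHERE state = 'CA'
ORDER BY customer_id ASC;
```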

Run the Hive computation example to collect data about it in the Tez UI. When the job completes, you can see its full statistics on the job page.

Job statistics

DAG Details

Here you can see detailed information about the generated DAG. The Query Details tab provides information such as:

Hive query ID;
user who executed the query;
start and end time;
total time needed to execute the query;
tables that were read and written;
application ID;
DAG IDs;
session ID;
thread ID;
queue against which the query was run.

DAG Details

DAG Counters

DAG counters provide a way to measure the progress or the number of operations that occur within a generated DAG. Counters are used to gather statistics for quality control purposes or for problem diagnosis.

DAG counters provide the following details:

Group name;
Counter name;
Counter value.

DAG Counters

Graphical View

The DAG Graphical View tab displays the DAG that is generated by Hive in the form of a flowchart.

Graphical View

Here, the inputs to the vertices Map 10, Map 6, Map 4, and Map 1 are the tables displayed in green boxes. Map 6 depends on the result set generated by Map 10 and passes its own result as input to Reducer 7. When Reducer 7 finishes its execution, the results are passed on to Reducer 8 and then to Reducer 9. Reducer 3 is the last vertex in the DAG flow. After Reducer 3 successfully completes its execution, the query output is written to a file in HDFS.
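
A DAG with several map and reducer vertices usually comes from a query that combines joins, aggregation, and sorting. The sketch below is only an illustration: the `orders` table and its columns are hypothetical and are not part of the original example, and the number and names of the vertices that Hive generates depend on the data and optimizer settings.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE orders
( order_id int,
  customer_id int,
  amount decimal(10,2)
);

-- Join, aggregation, and sorting in one query.
-- Each table scan is read by a map vertex; the later stages
-- may run in separate reducer vertices.
SELECT c.state,
       COUNT(o.order_id) AS orders_cnt,
       SUM(o.amount) AS total_amount
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.state
ORDER BY total_amount DESC;
```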

There are a few options to change the layout of the DAG flow. You can hide the input and the output nodes to view only the task vertices by clicking the Toggle source/sink visibility icon. You can switch between the horizontal and vertical orientation by clicking the Toggle orientation icon.

When you hover over an element, you see the vertex details, status, progress, duration, and number of tasks.

Element details

If you click an element, the full statistics for this vertex open on several tabs.

Vertex Details

All Vertices

On this tab, you can see a table that shows full information about Maps and Reducers. Each vertex name is a link to the detailed information about that Map or Reducer.

All Vertices
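
A similar per-vertex breakdown can also be printed directly in the Hive shell. This is a sketch that assumes your Hive build supports the `hive.tez.exec.print.summary` property. The aggregate query is only an example that runs against the `customers` table created earlier.

```sql
-- Ask the Hive shell to print an execution summary after each query.
SET hive.tez.exec.print.summary=true;

-- Any query works; this one adds a reducer stage for the GROUP BY.
SELECT state, COUNT(*) AS customers_cnt
FROM customers
GROUP BY state;
```

The console summary does not replace the Tez UI, but it is convenient when you only need a quick look at how long each vertex took.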

All Tasks

This tab shows the detailed statistics about Reducer tasks. Using the links, you can view the detailed information about any task or Reducer.

All Tasks

All Task Attempts

This tab shows the detailed statistics about all Reducer task attempts. Using the links, you can view the full information about any attempt, task, or Reducer.

All Task Attempts

Vertex Swimlane

This tab shows the DAG of the vertices against time. Each mapping and reducing task is a vertex.

Vertex Swimlane

Each horizontal line of the swimlane represents the total time taken by the vertex to complete. The vertical lines indicate the time when:

the vertex was initialized;
the vertex was started;
the first task was started;
the last task was completed;
the vertex finished its execution.

When you hover over a vertical line, the bubble displays the stage of the vertex execution and a timestamp.

To learn more about a particular vertex, hover the mouse anywhere over its horizontal line.

Vertex Information

The following details can help you to view the time taken by a particular task and debug the query:

Vertex ID — a unique identifier for a particular vertex.
Status — indicates the query status.
Progress — the progress of the vertex, usually for long-running queries.
Start time — indicates when a particular vertex started.
End time — indicates when a particular vertex ended.
Duration (in milliseconds) — the total time taken by the vertex to complete its execution.
Description — the description of the vertex.
First task start time — when the first task within that vertex started its execution.
Tasks — the total number of the tasks executed in a particular vertex.
Succeeded tasks — the number of the tasks that were executed successfully within that vertex.
Running tasks — the tasks that are still running.
Pending tasks — the tasks that have not yet started their execution.
Processor class — the Hive processor for Tez that forms the vertices in Tez and processes the data.

The vertical lines that connect two vertices indicate that one vertex depends on the other.

Vertex dependencies

In this example, Map 6 depends on the results of Map 10. Map 6 will finish its execution only when Map 10 ends its execution successfully. Similarly, Reducer 8 depends on Reducer 7 to complete its execution.
