
Connect to YARN via REST API

Overview

Hadoop provides REST APIs for all YARN components. These APIs give URI-based access to the cluster, nodes, applications, and application history.

Most YARN APIs support only GET requests, but some YARN components, like ResourceManager, support POST. The only fields used in the request header are Accept and Accept-Encoding; all other header fields are ignored.

In the Accept field, you can specify XML or JSON as the format you expect in the response. The default format is JSON. In the Accept-Encoding field, you can specify gzip to get gzip-compressed output.

The URI syntax for YARN REST APIs is as follows:

http://<IP:PORT>/ws/<version>/<resourcepath>

Where:

  • <IP:PORT> — the IP or FQDN of the component host and its port;

  • <version> — the API version;

  • <resourcepath> — a path that defines which API function to call.
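As a rough illustration of the header and URI rules above, the following shell sketch builds a request URL and an Accept header value. The function names yarn_url and accept_header are hypothetical helpers, not part of Hadoop, and the host, port, and cluster/info path are example values only.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the URI syntax and the Accept header.

# yarn_url <host> <port> <version> <resourcepath>
yarn_url() {
  printf 'http://%s:%s/ws/%s/%s\n' "$1" "$2" "$3" "$4"
}

# accept_header <json|xml>: prints the header line, fails on other formats.
accept_header() {
  case "$1" in
    json|xml) printf 'Accept: application/%s\n' "$1" ;;
    *) echo "unsupported format: $1" >&2; return 1 ;;
  esac
}

# Usage against a live cluster (not run here, authentication options omitted).
# --compressed makes curl request a compressed response and decompress it:
#   curl -s --compressed -H "$(accept_header xml)" \
#     "$(yarn_url 127.0.0.1 8088 v1 cluster/info)"
```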

Authentication

 
Using the API requires authentication. If your cluster has no security configured, add the ?user.name=<user> parameter to your requests, where <user> is the name of the user to run the request as, for example, admin.

If security is on, the authentication method depends on the security model.

When Kerberos SPNEGO is on, use the --negotiate and -u options:

$ curl -i --negotiate -u : "http://<HOST>:<HTTP_PORT>/ws/<version>/<resourcepath>"

When authentication with the Hadoop delegation token is on, pass the token in the delegation parameter instead of the username:

$ curl -i "http://<HOST>:<HTTP_PORT>/ws/<version>/<resourcepath>?delegation=<TOKEN>"

For more information about authentication methods, see Authentication for Hadoop HTTP web-consoles.
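The three cases above can be summarized in a small dry-run helper. This is only a sketch: yarn_auth_cmd is a hypothetical name, the mode labels simple, spnego, and token are made up for this example, and the function prints the curl command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints the curl command for a security mode.
# yarn_auth_cmd <simple|spnego|token> <url-without-query> [user-or-token]
yarn_auth_cmd() {
  local mode="$1" url="$2" cred="$3"
  case "$mode" in
    # No security configured: pass the user name as a query parameter.
    simple) printf 'curl -i "%s?user.name=%s"\n' "$url" "$cred" ;;
    # Kerberos SPNEGO: curl negotiates using the current Kerberos ticket.
    spnego) printf 'curl -i --negotiate -u : "%s"\n' "$url" ;;
    # Hadoop delegation token: pass the token in the delegation parameter.
    token)  printf 'curl -i "%s?delegation=%s"\n' "$url" "$cred" ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}
```

For example, yarn_auth_cmd simple http://127.0.0.1:8088/ws/v1/cluster/info admin prints a command that uses the user.name parameter.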

An example request to the Timeline server:

$ curl "http://127.0.0.1:8188/ws/v1/timeline?user.name=admin"

The response contains information about the server:

{"About":"Timeline API","timeline-service-version":"3.1.2","timeline-service-build-version":"3.1.2 from 2bfc95569d9993d795ded4878847f3f3db76e77c by jenkins source checksum 7954337bcd9688eca8aa32720d2c74","timeline-service-version-built-on":"2023-09-07T08:01Z","hadoop-version":"3.1.2","hadoop-build-version":"3.1.2 from 2bfc95569d9993d795ded4878847f3f3db76e77c by jenkins source checksum 38903f2495a81dfd8e2d8fc4f659a92","hadoop-version-built-on":"2023-09-07T07:57Z"}
NOTE

The fields within the response body are not limited in number and have no specific order. When developing applications that use the YARN REST API, use parsing routines that don’t depend on a particular order.
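One way to follow this advice in a shell script is to look fields up by key rather than by position. The sketch below uses sed and handles only string values in a single-line response such as the Timeline server output above. The json_field name is hypothetical, and a dedicated JSON parser is likely a better choice for anything beyond quick checks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the string value of a key from JSON on stdin.
# Looks the field up by name, so the order of fields does not matter.
# Limitation: only matches "key":"string value" pairs on one line.
json_field() {
  local key="$1"
  sed -n "s/.*\"${key}\":\"\([^\"]*\)\".*/\1/p"
}

# Usage against a live Timeline server (not run here):
#   curl -s "http://127.0.0.1:8188/ws/v1/timeline?user.name=admin" | json_field hadoop-version
```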

ResourceManager API

The ResourceManager REST API enables access to information about the cluster and allows managing applications. The full API specification is available at ResourceManager REST API’s.

Here’s an example of using ResourceManager REST API for submitting a new application:

  1. Request an ID for your application by calling:

    $ curl -X POST "http://127.0.0.1:8088/ws/v1/cluster/apps/new-application?user.name=admin"

    The server’s response:

    {"application-id":"application_1705573015955_0001","maximum-resource-capability":{"memory":7057,"vCores":2,"resourceInformations":{"resourceInformation":[{"maximumAllocation":9223372036854775807,"minimumAllocation":0,"name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":7057},{"maximumAllocation":9223372036854775807,"minimumAllocation":0,"name":"vcores","resourceType":"COUNTABLE","units":"","value":2}]}}}
  2. The response contains the application ID and the information on available resources. Create a JSON file with parameters of your application and pass it to the ResourceManager:

    $ curl -v -X POST -d @example-app.json -H "Content-Type: application/json" "http://127.0.0.1:8088/ws/v1/cluster/apps/?user.name=admin"

    Example of the application parameters file (example-app.json):
     {
        "application-id":"application_1705573015955_0001",
        "application-name":"test",
        "am-container-spec":
        {
           "local-resources":
           {
              "entry":
             [
                {
                   "key":"hadoop-mapreduce-examples.jar",
                   "value":
                   {
                      "resource":"hdfs://127.0.0.2:8020/tmp/hadoop-mapreduce-examples.jar",
                      "type":"FILE",
                      "visibility":"APPLICATION",
                      "size": "30897",
                      "timestamp": "1405452071209"
                   }
                }
              ]
           },
           "commands":
           {
              "command":"{{JAVA_HOME}}/bin/java -Xmx10m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 10 --container_vcores 1 --num_containers 1 --priority 0 1><LOG_DIR>/AppMaster.stdout 2><LOG_DIR>/AppMaster.stderr"
           },
           "environment":
           {
              "entry":
              [
                 {
                    "key": "DISTRIBUTEDSHELLSCRIPTTIMESTAMP",
                    "value": "1405459400754"
                 },
                 {
                    "key": "CLASSPATH",
                    "value": "{{CLASSPATH}}<CPS>./*<CPS>{{HADOOP_CONF_DIR}}<CPS>{{HADOOP_COMMON_HOME}}/share/hadoop/common/*<CPS>{{HADOOP_COMMON_HOME}}/share/hadoop/common/lib/*<CPS>{{HADOOP_HDFS_HOME}}/share/hadoop/hdfs/*<CPS>{{HADOOP_HDFS_HOME}}/share/hadoop/hdfs/lib/*<CPS>{{HADOOP_YARN_HOME}}/share/hadoop/yarn/*<CPS>{{HADOOP_YARN_HOME}}/share/hadoop/yarn/lib/*<CPS>./log4j.properties"
                 },
                 {
                    "key": "DISTRIBUTEDSHELLSCRIPTLEN",
                    "value": "6"
                 },
                 {
                    "key": "DISTRIBUTEDSHELLSCRIPTLOCATION",
                    "value": "hdfs://127.0.0.2:8020/tmp/hadoop-mapreduce-examples.jar"
                 }
              ]
           }
        },
        "unmanaged-AM":"false",
        "max-app-attempts":"2",
        "resource":
        {
           "memory":"1024",
           "vCores":"1"
        },
        "application-type":"YARN",
        "keep-containers-across-application-attempts":"false"
      }

    If the application has been submitted successfully, the server returns the HTTP/1.1 202 Accepted code and the application’s location URL, which can be used to check the application status.
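The two steps above can be tied together in a script. The sketch below assumes the ResourceManager address and user name from the examples. The helper names app_id_from_response and app_state_url are hypothetical, and the state URL follows the /ws/v1/cluster/apps/<application-id>/state path of the ResourceManager REST API.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the defaults mirror the example requests above.
RM_URL="${RM_URL:-http://127.0.0.1:8088}"
YARN_USER="${YARN_USER:-admin}"

# Read a new-application response on stdin and print the application ID.
app_id_from_response() {
  sed -n 's/.*"application-id":"\([^"]*\)".*/\1/p'
}

# Print the URL that reports the state of the given application.
app_state_url() {
  printf '%s/ws/v1/cluster/apps/%s/state?user.name=%s\n' "$RM_URL" "$1" "$YARN_USER"
}

# Usage against a live cluster (not run here):
#   app_id=$(curl -s -X POST \
#     "$RM_URL/ws/v1/cluster/apps/new-application?user.name=$YARN_USER" \
#     | app_id_from_response)
#   # Put "$app_id" into example-app.json, submit the file, then check the state:
#   curl -s "$(app_state_url "$app_id")"
```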

To enable cross-origin support (CORS) for the ResourceManager only, set the following configuration parameters:

  • core-site.xml — set hadoop.http.filter.initializers to org.apache.hadoop.security.HttpCrossOriginFilterInitializer;

  • yarn-site.xml — set yarn.resourcemanager.webapp.cross-origin.enabled to true.

NodeManager

The NodeManager REST API provides information about the node, applications, and containers running on that node. The full API specification is available at NodeManager REST API’s.

Here’s an example of the command that prints information about a node:

$ curl -X GET "http://<IP>:8042/ws/v1/node/info?user.name=admin"

Where <IP> is the IP address or FQDN of the host.

To enable cross-origin support (CORS) for the NodeManager, set the following configuration parameters:

  • core-site.xml — set hadoop.http.filter.initializers to org.apache.hadoop.security.HttpCrossOriginFilterInitializer;

  • yarn-site.xml — set yarn.nodemanager.webapp.cross-origin.enabled to true.
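After changing these settings and restarting the service, one way to check the result is to send a request with an Origin header and look for Access-Control-Allow-Origin in the response headers. This is a sketch under assumptions: has_cors_header is a hypothetical helper, the Origin value is a placeholder, and whether the header is returned may also depend on the allowed-origins settings of the filter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the HTTP response headers on stdin
# contain an Access-Control-Allow-Origin header (case-insensitive match).
has_cors_header() {
  grep -qi '^access-control-allow-origin:'
}

# Usage against a live NodeManager (not run here). -D - prints the response
# headers to stdout and -o /dev/null discards the body:
#   curl -s -o /dev/null -D - -H "Origin: http://example.com" \
#     "http://127.0.0.1:8042/ws/v1/node/info?user.name=admin" \
#     | has_cors_header && echo "CORS header present"
```

The same check can be pointed at the ResourceManager by changing the port and resource path.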

Timeline Server

The Timeline Server REST API supports two methods: GET and POST. Two API versions are currently available.

Here’s an example of the command that prints the list of domains for the specified owner:

$ curl -X GET "http://<IP>:8188/ws/v1/timeline/domain?owner=<owner>"

Where <IP> is the IP address or FQDN of the host and <owner> is the domain owner name.
