Adding services

In ADCM, a service is a piece of software that performs a specific function. Examples of services for ADH clusters are HDFS, HBase, and Hive. To add services to a cluster, follow the steps below:

  1. From the CLUSTERS tab, open the cluster configuration. To do this, click the row that contains the added cluster or click the icon in the Config column. Both methods open the cluster menu.

    [Figure: Switching to the cluster configuration]
  2. Open the Services tab in the cluster menu and click Add services.

    [Figure: Switching to adding services]
  3. In the dialog that opens, select the services to add to the cluster and click Add.

    [Figure: Choosing services]

    Brief descriptions of the available services are listed below.

    Services that can be added to the ADH cluster
    Service Purpose

    Airflow

    A service for creating, scheduling, and monitoring workflows defined as Directed Acyclic Graphs (DAGs) of tasks. Can be used in Hadoop clusters to build ETL/ELT processes

    Flink

    A distributed platform used in high-load Big Data applications for analyzing data stored in Hadoop clusters. It can be used in different streaming use cases: event-driven applications, stream and batch analytics, data pipelines and ETL, etc.

    HBase

    A non-relational, distributed database written in Java that runs on top of HDFS. It belongs to the class of column-oriented key-value stores and is useful for random, real-time read/write access to Big Data

    HDFS

    A distributed file system used in Hadoop clusters for storing large files. Provides streaming access to data that is distributed block by block across cluster nodes

    Hive

    Software for building data warehouses (DWH) and analyzing Big Data. It runs on top of HDFS and other compatible systems, such as Apache HBase, and facilitates writing, reading, and managing large datasets stored in distributed systems

    Monitoring

    A service to add if you plan to monitor the ADH cluster

    MySQL

    MariaDB, a relational database derived from MySQL and compatible with it. Some MariaDB commands and interfaces are closer to NoSQL than to SQL. For example, it provides storage engines such as ColumnStore (column data storage and distributed architecture support) and OQGRAPH (storage of tree and graph structures)

    Solr

    A search platform based on the Apache Lucene project. Its main features include full-text search, faceted search, highlighting search results, distributed indexing, integration with databases, processing documents with a complex format (Word, PDF, etc.), load-balanced querying, centralized configuration, and others

    Spark

    Spark 2.x. A fast analytics engine used for large-scale data processing and compatible with Hadoop data. It can run in Hadoop clusters using YARN or Spark's standalone mode. It can process data in HDFS, HBase, Cassandra, Hive, and other Hadoop input formats. Supports both batch processing and newer workloads such as streaming, machine learning, and interactive queries

    Spark3

    Spark 3.x. In comparison with the Spark 2.x version, it offers such new features as adaptive execution of Spark SQL, Dynamic Partition Pruning (DPP), graph processing, enhanced support for Deep Learning, and others

    Sqoop

    A service designed to transfer bulk data between Hadoop and relational databases or mainframes. You can use it, for example, to import data from MySQL, Oracle or other relational database management systems (RDBMS) to Hadoop clusters, convert the data in a certain way, and then export the data back to the RDBMS

    YARN

    A service needed for managing cluster resources and scheduling/monitoring jobs. Uses a special daemon (ResourceManager) that abstracts all the computing resources of the cluster and manages their provision to distributed applications

    Zeppelin

    A web-based notebook service for interactive data analytics. It lets you run queries against data in Hadoop clusters and display the results as tables, graphs, charts, etc.

    Zookeeper

    A centralized coordination service for distributed applications. It is used in Hadoop clusters for failure detection, active NameNode election, health monitoring, session management, etc.

    The minimal set of services recommended for ADH clusters is listed below:

    • HDFS;

    • YARN;

    • Zookeeper (optional for the Community Edition of ADH).

    These services make up the core of Hadoop and are sufficient to organize distributed data storage and processing. The full list of services depends on the requirements of a particular project.

  4. As a result, the added services are displayed on the Services tab.

    [Figure: The result of successfully adding services to a cluster]
NOTE
You can also add services later. Adding new services to an already running cluster works the same way as installing a service from scratch.
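
The minimal service set recommended in step 3 can be checked before you click Add. The sketch below is illustrative only: the function and constant names are hypothetical and are not part of ADCM, and it assumes service names are written exactly as in the table above.

```python
# Hypothetical helper, not an ADCM API: checks a planned service selection
# against the minimal set recommended for ADH clusters in step 3.

# Core services recommended for every ADH cluster.
REQUIRED_CORE = {"HDFS", "YARN"}
# Zookeeper is recommended too, but optional for the Community Edition.
OPTIONAL_IN_COMMUNITY = {"Zookeeper"}


def missing_core_services(selected, community_edition=False):
    """Return the recommended core services absent from `selected`, sorted by name."""
    recommended = set(REQUIRED_CORE)
    if not community_edition:
        recommended |= OPTIONAL_IN_COMMUNITY
    return sorted(recommended - set(selected))


if __name__ == "__main__":
    # A selection that lacks YARN and Zookeeper.
    print(missing_core_services(["HDFS", "Hive"]))
    # Community Edition: Zookeeper is not reported as missing.
    print(missing_core_services(["HDFS", "YARN"], community_edition=True))
```

An empty result means the selection covers the recommended core. Any other services, such as Hive or Spark3, depend on the requirements of the particular project.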