In ADCM, a service is a piece of software that performs a specific function. Examples of services for ADH clusters include HDFS, HBase, and Hive. The steps for adding services to a cluster are listed below:
On the CLUSTERS tab, open the cluster configuration. To do this, click the row that contains the added cluster or click the icon in the Config column. Both methods open the cluster menu.
Figure: Switching to the cluster configuration
Open the Services tab in the cluster menu and click Add services.
Figure: Switching to adding services
In the dialog that opens, select the services to add to the cluster and click Add.
Figure: Choosing services
Brief descriptions of the available services are listed below.
Services that can be added to the ADH cluster
Service | Purpose
Airflow | A service for creating, scheduling, and monitoring workflows in the form of Directed Acyclic Graphs (DAGs) of tasks. Can be used in Hadoop clusters for building ETL/ELT processes.
Flink | A distributed platform used in high-load Big Data applications for analyzing data stored in Hadoop clusters. It covers various streaming use cases: event-driven applications, stream and batch analytics, data pipelines and ETL, etc.
HBase | A non-relational, distributed database written in Java that runs on top of HDFS. Belongs to the class of column-oriented key-value stores. It is useful for random, real-time read/write access to Big Data.
HDFS | A distributed file system used in Hadoop clusters for storing large files. Provides streaming access to data distributed block by block across cluster nodes.
Hive | Software designed for building data warehouses (DWH) and analyzing Big Data. It runs on top of HDFS and other compatible systems, such as Apache HBase. It facilitates writing, reading, and managing large datasets stored in distributed systems.
Monitoring | A service that should be added if monitoring of the ADH cluster is planned.
MariaDB | A relational database based on MySQL and compatible with it. Some MariaDB commands and interfaces are closer to NoSQL than to SQL. For example, it provides storage engines such as ColumnStore (columnar data storage with distributed architecture support) and OQGRAPH (storage of tree and graph structures).
Solr | A search platform based on the Apache Lucene project. Its main features include full-text search, faceted search, search result highlighting, distributed indexing, integration with databases, processing of documents in complex formats (Word, PDF, etc.), load-balanced querying, centralized configuration, and others.
Spark 2.x | A fast analytics engine used for large-scale data processing and compatible with Hadoop data. It can run in Hadoop clusters using YARN or Spark's standalone mode. It can process data in HDFS, HBase, Cassandra, Hive, and other Hadoop input formats. Supports both batch processing and newer workloads such as streaming, machine learning, and interactive queries.
Spark 3.x | Compared with Spark 2.x, it offers new features such as adaptive execution of Spark SQL, Dynamic Partition Pruning (DPP), graph processing, enhanced support for Deep Learning, and others.
Sqoop | A service designed to transfer bulk data between Hadoop and relational databases or mainframes. You can use it, for example, to import data from MySQL, Oracle, or other relational database management systems (RDBMS) into Hadoop clusters, transform the data, and then export it back to the RDBMS.
YARN | A service for managing cluster resources and for scheduling and monitoring jobs. Uses a dedicated daemon (ResourceManager) that abstracts all the computing resources of the cluster and manages their allocation to distributed applications.
Zeppelin | A web-based notebook service that enables interactive data analytics. Allows you to query data in Hadoop clusters and display the results as tables, graphs, charts, etc.
Zookeeper | A centralized coordination service for distributed applications. It is used in Hadoop clusters for failure detection, active NameNode election, health monitoring, session management, etc.
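The descriptions above state that some services run on top of others: HBase and Hive both run on top of HDFS. As a minimal sketch, a planned selection could be checked against such relationships before clicking Add. This is not ADCM's actual dependency logic; the `RUNS_ON` mapping and the `missing_prerequisites` function are hypothetical names introduced for illustration, and the mapping covers only the relationships stated in the descriptions.

```python
# Illustrative only: relationships taken from the service descriptions above,
# not from ADCM's real dependency rules.
RUNS_ON = {
    "HBase": {"HDFS"},
    "Hive": {"HDFS"},
}


def missing_prerequisites(selected):
    """Return {service: sorted list of underlying services missing from the selection}."""
    chosen = set(selected)
    missing = {}
    for service in sorted(chosen):
        absent = RUNS_ON.get(service, set()) - chosen
        if absent:
            missing[service] = sorted(absent)
    return missing


if __name__ == "__main__":
    # HDFS is not selected, so both HBase and Hive are reported.
    print(missing_prerequisites(["HBase", "Hive", "YARN"]))
    # HDFS is selected, so nothing is reported.
    print(missing_prerequisites(["HDFS", "HBase"]))
```

An empty result means that every selected service has the services it runs on in the same selection.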
The minimal set of services recommended for ADH clusters is described below:
Zookeeper (optional for the Community Edition of ADH).
These services make up the core of Hadoop and are sufficient to organize distributed data storage and processing. The full list of services depends on the requirements of a particular project.
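A selection can also be compared with the recommended minimum before it is added. The sketch below is only an illustration: the function name `check_minimal_set` and the `community_edition` flag are hypothetical, the `core` argument has to be filled in from the recommended list, and the HDFS and YARN values in the usage lines are assumed placeholders rather than values confirmed by this page.

```python
def check_minimal_set(selected, core, community_edition=False):
    """Return the recommended services that are missing from `selected`.

    `core` is the recommended minimal set without Zookeeper (an input supplied
    by the caller). Zookeeper is treated as optional only when
    `community_edition` is True.
    """
    required = set(core)
    if not community_edition:
        required.add("Zookeeper")
    return sorted(required - set(selected))


if __name__ == "__main__":
    assumed_core = ["HDFS", "YARN"]  # placeholder values, an assumption
    print(check_minimal_set(["HDFS", "YARN"], assumed_core))
    print(check_minimal_set(["HDFS", "YARN"], assumed_core, community_edition=True))
```

An empty list means that the selection covers the recommended minimum for the given edition.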
As a result, the added services are displayed on the Services tab.
Figure: The result of successfully adding services to a cluster
NOTE: You can also add services later. Adding new services to an already running cluster does not differ from installing a service from scratch.