SSM architecture

Overview

SSM (Smart Storage Manager) is an optimization tool that analyzes HDFS metadata and helps find more efficient ways to store and manage data, in compliance with user-created rules and existing data policies.

SSM monitors clusters by collecting data metrics, such as file access count, and labels data according to its temperature: frequently accessed data is HOT, rarely accessed data is COLD, and data in between is WARM. The precise labeling criteria are defined by rules that users set.
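The labeling described above can be illustrated with a minimal sketch. The thresholds and file paths here are hypothetical; in SSM the actual criteria come from user rules, not from fixed constants.

```python
from enum import Enum


class Temperature(Enum):
    HOT = "HOT"
    WARM = "WARM"
    COLD = "COLD"


def classify(access_count: int, hot_threshold: int = 10, cold_threshold: int = 2) -> Temperature:
    """Label data by how often it was accessed in an observation window.

    Both thresholds are assumptions made for this illustration.
    """
    if access_count >= hot_threshold:
        return Temperature.HOT
    if access_count <= cold_threshold:
        return Temperature.COLD
    return Temperature.WARM


# Hypothetical access counts per file path.
access_counts = {
    "/data/logs/today.log": 42,
    "/data/reports/q1.csv": 5,
    "/archive/2015.tar": 0,
}
labels = {path: classify(count) for path, count in access_counts.items()}
```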

SSM architecture

SSM introduces five new features to HDFS:

  • Data moving: move data between different storage types according to existing rules.

  • Asynchronous replication: replicate data from the current HDFS cluster to other HDFS clusters or to cloud storage. SSM monitors data modification operations (e.g. create, delete, append, and rename) on HDFS to achieve real-time data synchronization and avoid MapReduce computational costs.

  • Small files optimization: compact a certain number of small files into a container file. The container file is stored in HDFS, but it is transparent to upper-layer applications.

  • Erasure coding (EC): set rules for converting between EC files and replication files, or between any pair of files with different EC policies.

  • Compression: compress data in HDFS without impeding access to it for external applications.

SSM provides a web UI where users can submit rules and actions and check execution status and cluster metrics.

You can manage SSM and its components using ADCM or by manually updating its configuration files. For more information on SSM configuration properties, see the Configuration parameters page.

SSM Server

SSM Server (Smart Server) is the main SSM component. It’s responsible for collecting metadata from the NameNode, implementing rules, and performing actions.

The server polls the edit log from the NameNode, analyzes it, and stores the results in its SQL database, the SSM Metastore. If the information pulled from the cluster triggers a rule, the server starts the corresponding action.
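The poll, store, and trigger cycle can be sketched roughly as follows. This is an illustration only: the event tuples, the `poll_once` function, the list standing in for the Metastore, and the rule format are all hypothetical and do not reflect the actual SSM implementation.

```python
from typing import Callable, List, Tuple

Edit = Tuple[int, str, str]                       # (transaction id, operation, path)
Rule = Tuple[Callable[[str, str], bool], str]     # (predicate, action name)


def poll_once(metastore: List[Edit], edits: List[Edit], last_txid: int,
              rules: List[Rule]) -> Tuple[int, List[Tuple[str, str]]]:
    """Store edits newer than last_txid and collect actions for matching rules.

    Returns the new last transaction id and the list of (action, path) pairs.
    """
    triggered = []
    for txid, op, path in sorted(edits):
        if txid <= last_txid:
            continue  # already processed in an earlier poll
        metastore.append((txid, op, path))
        for predicate, action in rules:
            if predicate(op, path):
                triggered.append((action, path))
        last_txid = txid
    return last_txid, triggered


# Hypothetical edit log entries and a single rule that reacts to appends under /data/.
edits = [(1, "create", "/tmp/a"), (2, "append", "/data/b"), (3, "delete", "/tmp/a")]
rules = [(lambda op, path: op == "append" and path.startswith("/data/"), "sync")]

metastore: List[Edit] = []
last_txid, actions = poll_once(metastore, edits, 0, rules)
# A second poll over the same log finds nothing new.
last_txid_again, actions_again = poll_once(metastore, edits, last_txid, rules)
```

Tracking the last processed transaction id keeps repeated polls from storing or acting on the same edit twice.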

A user can interact with the server via APIs:

  • Admin API

  • Applications API

To achieve high availability (HA), you can deploy additional Smart Servers. In this case, the redundant servers act as Standby servers.

SSM Server failover scenario

If the active server is down, one of the Standby servers becomes the active one.
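The failover behavior can be sketched as below. The source does not state how the new active server is chosen, so picking the first live server in list order is purely an assumption, as are the `SmartServer` class and server names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SmartServer:
    name: str
    alive: bool = True


def elect_active(servers: List[SmartServer],
                 current_active: Optional[SmartServer]) -> Optional[SmartServer]:
    """Keep the current active server if it is alive; otherwise promote a standby.

    Assumption: the first live server in list order is promoted.
    Returns None when no server is alive.
    """
    if current_active is not None and current_active.alive:
        return current_active
    for server in servers:
        if server.alive:
            return server
    return None


servers = [SmartServer("ssm-1"), SmartServer("ssm-2"), SmartServer("ssm-3")]
active = elect_active(servers, None)        # ssm-1 becomes active
active.alive = False                        # simulate a failure of the active server
new_active = elect_active(servers, active)  # a standby takes over
```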

Rules

SSM rules are scripts that users can create to automate data management tasks. The rules are written in a domain-specific language and have four parts: object, trigger, condition, and command.

For example, a user can classify data that has been accessed more than 3 times during the last 10 minutes as hot data and submit a custom rule that makes SSM automatically move such data to a faster storage medium.
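To make the four parts of a rule and the example above concrete, here is a small Python model. It is not the SSM rule language: the `Rule` class, the trigger text, and the command string are hypothetical and only mirror the object, trigger, condition, and command structure.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Rule:
    obj: str                                          # what the rule applies to
    trigger: str                                      # when the rule is evaluated
    condition: Callable[[List[float], float], bool]   # whether the command should run
    command: str                                      # what to do when the condition holds


def accessed_more_than(times: int, window_seconds: float) -> Callable[[List[float], float], bool]:
    """Build a condition: more than `times` accesses within the last `window_seconds`."""
    def condition(access_times: List[float], now: float) -> bool:
        recent = [t for t in access_times if now - window_seconds <= t <= now]
        return len(recent) > times
    return condition


# "Accessed over 3 times during the last 10 minutes" (600 seconds).
hot_rule = Rule(
    obj="file",
    trigger="every 1 min",              # hypothetical evaluation schedule
    condition=accessed_more_than(3, 600),
    command="move to faster storage",   # hypothetical command name
)


def evaluate(rule: Rule, access_times: List[float], now: float) -> Optional[str]:
    """Return the rule's command if its condition holds, otherwise None."""
    return rule.command if rule.condition(access_times, now) else None


now = 1000.0
hot_result = evaluate(hot_rule, [500.0, 600.0, 700.0, 900.0], now)   # 4 recent accesses
cold_result = evaluate(hot_rule, [100.0, 200.0, 900.0, 950.0], now)  # only 2 are recent
```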

For more information on how to create rules in SSM, see the Define actions in SSM article. For more usage examples, see SSM rules usage examples.

Actions

An SSM action is a task that SSM executes when a rule condition is met or when a user starts it manually. Actions are handled by dedicated services:

  • Mover Scheduler: handles operations related to moving data;

  • Copy Scheduler: replicates data;

  • EC Scheduler: manages erasure-coded files;

  • SmallFiles Scheduler: manages the containers with small files;

  • NameSpace service: parses the edit logs;

  • Metrics service: collects access count information.
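The relationship between action types and the schedulers listed above can be pictured as a simple lookup. The action type keys below are hypothetical labels chosen for this sketch; only the scheduler names come from the list.

```python
# Hypothetical action type keys mapped to the scheduler names from the list above.
SCHEDULERS = {
    "move": "Mover Scheduler",
    "copy": "Copy Scheduler",
    "ec": "EC Scheduler",
    "compact": "SmallFiles Scheduler",
}


def route(action_type: str) -> str:
    """Return the name of the scheduler responsible for the given action type."""
    try:
        return SCHEDULERS[action_type]
    except KeyError:
        raise ValueError(f"no scheduler registered for action type {action_type!r}") from None


mover = route("move")
small_files = route("compact")
```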

SSM components interaction

More information about actions in SSM is available in the Define actions in SSM article.

SSM Agent

Smart Agent is a dedicated process that executes tasks dispatched by SSM Server. To expand SSM’s task processing capability, you can deploy multiple Smart Agents in the cluster.

SSM Client

SSM Client (SmartDFSClient, Smart Client) is a wrapper for the HDFS client that additionally gathers data for the SSM Server.

When an application uses this client to access files in HDFS, SmartDFSClient reports the access event to the Smart Server. Once the event is received, the Smart Server updates the access count for the corresponding data in the SSM Metastore.
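The wrapper idea can be sketched in Python as follows. The classes here (`FakeHdfsClient`, `FakeSmartServer`, `ReportingClient`) are stand-ins invented for this illustration and are not the SmartDFSClient API.

```python
from collections import Counter
from typing import Dict


class FakeHdfsClient:
    """Stand-in for an HDFS client; serves file contents from a dictionary."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self._files = files

    def read(self, path: str) -> bytes:
        return self._files[path]


class FakeSmartServer:
    """Stand-in for the Smart Server; keeps access counts in memory."""

    def __init__(self) -> None:
        self.access_counts: Counter = Counter()

    def report_access(self, path: str) -> None:
        self.access_counts[path] += 1


class ReportingClient:
    """Wraps a file system client and reports every successful read to the server."""

    def __init__(self, hdfs: FakeHdfsClient, server: FakeSmartServer) -> None:
        self._hdfs = hdfs
        self._server = server

    def read(self, path: str) -> bytes:
        data = self._hdfs.read(path)        # delegate the actual read
        self._server.report_access(path)    # then report the access event
        return data


server = FakeSmartServer()
client = ReportingClient(FakeHdfsClient({"/data/a.txt": b"hello"}), server)
content = client.read("/data/a.txt")
client.read("/data/a.txt")
```

Because the wrapper exposes the same `read` call as the wrapped client, the application code does not need to change to produce access statistics.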

If the data is compressed or compacted by SSM, the client provides applications with the ability to read it.

SSM Metastore

The SSM Metastore is the SSM database that stores such data as access count, rules, and metrics.
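As a rough picture of what such a store might hold, the sketch below uses an in-memory SQLite database. The table and column names are assumptions made for this example and are not the real SSM Metastore schema.

```python
import sqlite3

# In-memory database with hypothetical tables for access counts and rules.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE access_count (
    path TEXT PRIMARY KEY,
    hits INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE rules (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    rule_text TEXT NOT NULL,
    state TEXT NOT NULL
);
""")


def record_access(db: sqlite3.Connection, path: str) -> None:
    """Increment the access counter for a path, creating the row if needed."""
    db.execute("INSERT OR IGNORE INTO access_count (path, hits) VALUES (?, 0)", (path,))
    db.execute("UPDATE access_count SET hits = hits + 1 WHERE path = ?", (path,))
    db.commit()


def get_hits(db: sqlite3.Connection, path: str) -> int:
    """Return the stored access count for a path, or 0 if it was never accessed."""
    row = db.execute("SELECT hits FROM access_count WHERE path = ?", (path,)).fetchone()
    return row[0] if row else 0


record_access(conn, "/data/a.txt")
record_access(conn, "/data/a.txt")
hits = get_hits(conn, "/data/a.txt")
```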
