Ozone architecture
Overview
Apache Ozone (O3) is a distributed key-value object store optimized for working with both Hadoop services and S3 object storage. It is highly scalable, can share the same cluster and security policies with HDFS, and handles files of any size efficiently.
Ozone’s main advantages:
- Unlike HDFS, Ozone is designed to work efficiently with a large number of small files.
- Integrates well with Hadoop ecosystem components such as Hive, Spark, and MapReduce, and with security tools such as Kerberos and Apache Ranger.
- Supports working with remote object storage via the Amazon S3 API.
- Runs in container environments such as Kubernetes and YARN.
Concepts
Ozone is based on the following concepts:
- Blocks
  As in HDFS, blocks are the basic unit of storage. Each file in the system consists of blocks. In Ozone, the default block size is 256 MB.
- Containers
  A container is a group of blocks that is replicated as a single unit.
- Keys
  A key is a marker that indicates which blocks belong to a particular file. Keys act as file IDs for all data operations performed by clients.
- Buckets
  A bucket is the Ozone counterpart of a directory. Buckets can contain any number of keys, but cannot contain other buckets.
- Volumes
  A volume is the home directory of a user in Ozone and can only be created by an administrator. The Ozone namespace can contain any number of volumes. Once a volume has been created for a user, that user can create buckets that belong to the volume.
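The relationships between these concepts can be sketched in a few lines of Python. This is a toy in-memory model for illustration only, not Ozone's actual data structures: the class names, the `is_admin` flag, and the `blocks_needed` helper are all hypothetical. Only the 256 MB default block size and the containment rules come from the text above.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 256 * 1024 * 1024  # default block size from the text: 256 MB


def blocks_needed(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Return how many blocks a file of file_size bytes occupies."""
    return (file_size + block_size - 1) // block_size  # ceiling division


@dataclass
class Bucket:
    name: str
    # key name -> block IDs; a bucket holds keys only, never other buckets
    keys: dict = field(default_factory=dict)


@dataclass
class Volume:
    name: str
    owner: str
    buckets: dict = field(default_factory=dict)

    def create_bucket(self, bucket_name: str) -> Bucket:
        # Reuse the bucket if it already exists, otherwise create it.
        if bucket_name not in self.buckets:
            self.buckets[bucket_name] = Bucket(bucket_name)
        return self.buckets[bucket_name]


class Namespace:
    def __init__(self):
        self.volumes = {}

    def create_volume(self, name: str, owner: str, is_admin: bool) -> Volume:
        # Only an administrator may create a volume.
        if not is_admin:
            raise PermissionError("only an administrator can create a volume")
        self.volumes[name] = Volume(name, owner)
        return self.volumes[name]


ns = Namespace()
volume = ns.create_volume("vol1", owner="alice", is_admin=True)
bucket = volume.create_bucket("bucket1")
# A 600 MB file is split into blocks; the key maps the file to its block IDs.
block_count = blocks_needed(600 * 1024 * 1024)
bucket.keys["logs/app.log"] = [f"block-{i}" for i in range(block_count)]
print(bucket.keys)
```

With the default block size, the 600 MB file in this example occupies three blocks, and the key `logs/app.log` is the only handle a client needs to locate them.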
Components
The Ozone architecture consists of the following services:
- Ozone Manager
  A system service that manages the namespace in the same manner as the HDFS NameNode. It records which blocks are located on the DataNodes and provides the block IDs on request.
- Ozone Storage Container Manager
  A system service that manages the containers. It allocates blocks and assigns them to DataNodes. Clients read and write these blocks directly. Storage Container Manager keeps track of all block replicas. If data or a disk is lost, Storage Container Manager detects the loss and instructs DataNodes to copy the missing blocks to ensure high availability.
- Ozone DataNode
  The Ozone service that stores blocks and aggregates them into storage containers.
- Ozone Recon
  Ozone Recon Server interacts with all other Ozone components and provides a unified management API and user interface.
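The recovery behavior described for Storage Container Manager can be illustrated with a short sketch. It is a simplified model rather than Ozone's actual algorithm: the function name `plan_re_replication`, the replica map layout, and the replication factor of 3 are assumptions made for the example.

```python
def plan_re_replication(replicas, live_nodes, factor=3):
    """Return {block_id: [target nodes]} for blocks that lost replicas.

    replicas   -- block ID -> set of node names expected to hold a replica
    live_nodes -- set of node names that are currently healthy
    factor     -- desired number of replicas (assumed value, not from the text)
    """
    plan = {}
    for block_id, nodes in replicas.items():
        healthy = nodes & live_nodes          # replicas that still exist
        missing = factor - len(healthy)
        if missing > 0 and healthy:           # a source copy must survive
            candidates = sorted(live_nodes - healthy)
            plan[block_id] = candidates[:missing]
    return plan


replicas = {
    "b1": {"dn1", "dn2", "dn3"},
    "b2": {"dn1", "dn2", "dn4"},
}
live_nodes = {"dn1", "dn2", "dn4", "dn5"}     # dn3 has failed
print(plan_re_replication(replicas, live_nodes))
```

In this example only `b1` had a replica on the failed node, so it is the only block scheduled for copying, and the copy goes to a healthy node that does not already hold it.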
Data operations
Ozone has three main functional layers:
- The metadata layer, controlled by Ozone Manager and Storage Container Manager.
- The data storage layer, controlled by Ozone DataNodes.
- The replication layer, controlled by Apache Ratis and used to replicate metadata.
The following sections describe how these layers interact during read and write operations.
Write operation
In Ozone, a write operation goes through the following steps:
- To write a file to Ozone, a client sends a request to Ozone Manager. The request specifies the volume and bucket in which the key will be created.
- To allocate blocks for the new key, Ozone Manager sends a request to Storage Container Manager.
- Storage Container Manager allocates the blocks on the DataNodes and returns their IDs to Ozone Manager.
- Ozone Manager records the metadata and returns a block token (a security permission to write data to the block) to the client.
- The client uses the block token to write data to the DataNode.
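The write path above can be simulated with a minimal in-memory sketch. None of the classes or methods below belong to the real Ozone client API; they are hypothetical stand-ins for the services named in the steps. The HMAC-based token is likewise an assumption, used only to show that the DataNode checks the permission before accepting data.

```python
import hashlib
import hmac
import itertools

SECRET = b"demo-shared-secret"  # hypothetical; stands in for real token signing


def sign(block_id: str) -> str:
    """Produce a toy block token for block_id."""
    return hmac.new(SECRET, block_id.encode(), hashlib.sha256).hexdigest()


class DataNode:
    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def write(self, block_id, token, data):
        # The DataNode accepts data only with a valid block token.
        if not hmac.compare_digest(token, sign(block_id)):
            raise PermissionError("invalid block token")
        self.blocks[block_id] = data


class StorageContainerManager:
    def __init__(self, datanodes):
        self._ids = itertools.count(1)
        self._targets = itertools.cycle(datanodes)  # round-robin placement

    def allocate_blocks(self, count):
        # Allocate blocks on DataNodes and return their IDs with locations.
        return [(f"block-{next(self._ids)}", next(self._targets)) for _ in range(count)]


class OzoneManager:
    def __init__(self, scm):
        self.scm = scm
        self.keys = {}  # (volume, bucket, key) -> block IDs

    def create_key(self, volume, bucket, key, block_count):
        # Ask Storage Container Manager for blocks, then record the metadata.
        allocated = self.scm.allocate_blocks(block_count)
        self.keys[(volume, bucket, key)] = [block_id for block_id, _ in allocated]
        # Return a block token for every allocated block.
        return [(block_id, node, sign(block_id)) for block_id, node in allocated]


def write_file(om, volume, bucket, key, chunks):
    """Client side: request the key, then write each chunk to its DataNode."""
    grants = om.create_key(volume, bucket, key, len(chunks))
    for (block_id, node, token), chunk in zip(grants, chunks):
        node.write(block_id, token, chunk)
    return [block_id for block_id, _, _ in grants]


datanodes = [DataNode("dn1"), DataNode("dn2")]
om = OzoneManager(StorageContainerManager(datanodes))
block_ids = write_file(om, "vol1", "bucket1", "key1", [b"aa", b"bb", b"cc"])
print(block_ids)
```

Note that the data itself never passes through Ozone Manager or Storage Container Manager in this sketch: they handle only metadata and allocation, while the client sends the bytes straight to the DataNodes.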