HDFS vs Ozone
Ozone (O3) and HDFS are both open-source distributed storage systems for Hadoop, but they differ in several key ways, which this article discusses.
Feature comparison
The table below compares the key features.
Feature | HDFS | Ozone |
---|---|---|
Data model | A file-based storage system where data is stored in files and directories | An object store that works with large amounts of unstructured data and is optimized for cloud |
Data replication | Block replication across DataNodes by default to ensure fault tolerance | Software-defined storage that allows for custom data replication policies and data redundancy |
Scalability | Good scalability for handling massive processing jobs | Designed to provide even better scalability than HDFS |
Namespace management | Single namespace for the entire cluster | Multiple namespaces for different use cases |
Object storage | No | Yes |
Support for S3 and other object storage protocols | No | Yes |
Access control | POSIX-style permissions | S3-style permissions and bucket-level access controls |
Authentication and authorization | Kerberos | Kerberos, Ozone Token |
Data consistency | Strong consistency for metadata through a single active NameNode; write-once, single-writer files | Strong consistency due to protocols like Raft |
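To make the data model row more concrete, the sketch below splits an HDFS location and an Ozone location into their parts. It assumes Ozone's rooted `ofs://` scheme, in which the first two path components name a volume and a bucket and the remainder is the key. The host names, the `HdfsFile` and `OzoneKey` classes, and both helper functions are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class HdfsFile:
    namenode: str  # NameNode host[:port] or nameservice ID
    path: str      # hierarchical path of directories and a file


@dataclass
class OzoneKey:
    om: str        # Ozone Manager host or service ID
    volume: str
    bucket: str
    key: str       # everything after the bucket


def parse_hdfs(uri: str) -> HdfsFile:
    """Split an hdfs:// URI into the NameNode address and the file path."""
    parts = urlparse(uri)
    if parts.scheme != "hdfs":
        raise ValueError(f"not an hdfs:// URI: {uri}")
    return HdfsFile(namenode=parts.netloc, path=parts.path)


def parse_ofs(uri: str) -> OzoneKey:
    """Split an ofs:// URI into volume, bucket, and key."""
    parts = urlparse(uri)
    if parts.scheme != "ofs":
        raise ValueError(f"not an ofs:// URI: {uri}")
    # Expected layout (assumed): ofs://<om>/<volume>/<bucket>/<key>
    segments = parts.path.lstrip("/").split("/", 2)
    if len(segments) < 3 or not all(segments):
        raise ValueError("expected ofs://<om>/<volume>/<bucket>/<key>")
    volume, bucket, key = segments
    return OzoneKey(om=parts.netloc, volume=volume, bucket=bucket, key=key)


if __name__ == "__main__":
    print(parse_hdfs("hdfs://nn1:8020/data/logs/app.log"))
    print(parse_ofs("ofs://om1/vol1/bucket1/logs/app.log"))
```

The HDFS location is a single path under one namespace, while the Ozone location always carries a volume and a bucket in front of the object key.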
Pros and cons
HDFS
HDFS is the default file system in Hadoop, and it has the following pros:
- massive data storage support;
- quick detection and response to hardware failures;
- support for data streaming;
- simplified consistency model;
- high fault tolerance and easy recovery;
- designed for commodity hardware.
However, it also has some disadvantages:
- not suitable for a large number of small files;
- doesn’t support in-place file modification (HDFS 2.x supports only appending content to files);
- struggles with over 400 million files;
- doesn’t support concurrent writes to the same file.
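One commonly cited reason for the file-count limits above is that the NameNode keeps the whole namespace in memory. The estimate below is a back-of-the-envelope sketch: it assumes the frequently quoted rule of thumb of roughly 150 bytes of NameNode heap per namespace object (a file or a block), which is an approximation and not a measured figure for any particular cluster. The function name and its parameters are illustrative.

```python
BYTES_PER_OBJECT = 150  # assumed rule of thumb, not a measured value


def estimate_namenode_heap_gib(
    num_files: int,
    blocks_per_file: int = 1,
    bytes_per_object: int = BYTES_PER_OBJECT,
) -> float:
    """Rough NameNode heap needed for file and block objects, in GiB."""
    # Each file contributes one file object plus one object per block.
    objects = num_files * (1 + blocks_per_file)
    return objects * bytes_per_object / 1024**3


if __name__ == "__main__":
    for files in (10_000_000, 100_000_000, 400_000_000):
        heap = estimate_namenode_heap_gib(files)
        print(f"{files:>11,} single-block files -> ~{heap:.1f} GiB of heap")
```

Under these assumptions, 400 million single-block files need roughly 112 GiB of heap for namespace objects alone, which illustrates why many small files are more costly for HDFS than the same volume of data stored in fewer large files.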
Ozone
Because these HDFS limitations fit poorly with modern big data storage needs, Ozone was created as a new solution with the following key advantages:
- strong consistency;
- designed to store more than 100 billion objects in a single cluster;
- great scalability due to layered architecture;
- just as fault tolerant and easily recoverable as HDFS;
- can work alongside HDFS on the same hosts.
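The strong consistency and fault tolerance items above rest on Raft-style replication, in which a write is committed once a majority of replicas acknowledge it. The arithmetic below is a minimal sketch of that majority rule, not Ozone's implementation; the function names are hypothetical.

```python
def quorum_size(replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Replicas that can be lost while a majority is still reachable."""
    return replicas - quorum_size(replicas)


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(
            f"replicas={n}: quorum={quorum_size(n)}, "
            f"tolerated failures={tolerated_failures(n)}"
        )
```

With three replicas the quorum is two, so one replica can fail without blocking writes, and any two majorities overlap in at least one replica that has seen the latest committed write.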
Since the project is relatively new, it also has some disadvantages:
- few deployment cases to learn from;
- although it is designed to integrate with the Hadoop ecosystem, it’s still not widely supported, and some services may require additional configuration to work with Ozone;
- no local socket support, and overall performance is slower than with HDFS.
Use cases
Apache Ozone has the advantage over HDFS in environments that require scalable storage of small files, S3 compatibility, or cloud-native capabilities. HDFS remains suitable for Hadoop workloads that do not need to store large numbers of small files that cannot be combined and do not require cloud integration.