HDFS vs Ozone
Both Ozone (O3) and HDFS are open-source distributed storage systems for the Hadoop ecosystem, but there are several key differences between them, which are discussed in this article.
Feature comparison
A comparison of key features is presented in the table below.
| Feature | HDFS | Ozone | 
|---|---|---|
| Data model | A file-based storage system where data is stored in files and directories | An object store designed for large amounts of unstructured data and optimized for cloud environments | 
| Data replication | Replication among DataNodes to ensure fault tolerance by default | Software-defined storage that allows for custom data replication policies and data redundancy | 
| Scalability | Good scalability for handling massive processing jobs | Designed to provide even better scalability than HDFS | 
| Namespace management | Single namespace for the entire cluster | Multiple namespaces for different use cases | 
| Object storage | No | Yes | 
| Support for S3 and other object storage protocols | No | Yes | 
| Access control | POSIX-style permissions | S3-style permissions and bucket-level access controls | 
| Authentication and authorization | Kerberos | Kerberos, Ozone Token | 
| Data consistency | Eventual consistency | Strong consistency based on the Raft consensus protocol (Apache Ratis) | 
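To illustrate the S3 compatibility rows above, here is a minimal sketch of writing and reading an object through Ozone's S3-compatible gateway using the AWS SDK for Java. The gateway host, bucket name, and key are assumptions for illustration only; substitute the values of your own cluster.

```java
import com.amazonaws.client.builder.AwsClientBuilder.EndpointConfiguration;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class OzoneS3Sketch {
    public static void main(String[] args) {
        // Endpoint of the Ozone S3 Gateway; the host below is a placeholder,
        // 9878 is the gateway's default HTTP port. Credentials are resolved
        // through the standard AWS default credentials chain.
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withEndpointConfiguration(
                        new EndpointConfiguration("http://s3g.example.com:9878", "us-east-1"))
                .withPathStyleAccessEnabled(true) // path-style addressing for a non-AWS endpoint
                .build();

        // "demo-bucket" and "hello.txt" are hypothetical names used only for illustration.
        s3.putObject("demo-bucket", "hello.txt", "written through the S3-compatible gateway");
        System.out.println(s3.getObjectAsString("demo-bucket", "hello.txt"));
    }
}
```

Because the gateway speaks the S3 protocol, existing S3 tools and SDKs can be pointed at an Ozone cluster by changing only the endpoint configuration.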
Pros and cons
HDFS
HDFS is the default file system in Hadoop, and it has the following pros:
- massive data storage support;
- quick detection of and response to hardware failures;
- support for data streaming;
- simplified consistency model;
- high fault tolerance and easy recovery;
- designed for commodity hardware.
However, it also has some disadvantages:
- not suitable for a large number of small files;
- doesn’t support in-place file modification (appending to files is supported since HDFS 2.x; see the sketch after this list);
- struggles with more than 400 million files;
- doesn’t support parallel writes to the same file.
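As a small illustration of the modification limitation, the following sketch uses the Hadoop FileSystem API: new bytes can be appended to a file, but content already written cannot be rewritten in place. The file path is hypothetical, and an HDFS client configuration is assumed to be on the classpath.

```java
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsAppendSketch {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical path used only for illustration.
        Path log = new Path("/data/events.log");

        // HDFS files are write-once: new bytes can be appended (HDFS 2.x and later),
        // but bytes already written cannot be modified in place.
        try (FSDataOutputStream out = fs.exists(log) ? fs.append(log) : fs.create(log)) {
            out.write("one more record\n".getBytes(StandardCharsets.UTF_8));
        }
        fs.close();
    }
}
```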
Ozone
Because these HDFS limitations clash with modern big data storage needs, a new solution was implemented with the following key advantages:
- strong consistency;
- designed to store more than 100 billion objects in a single cluster;
- great scalability due to its layered architecture;
- just as fault tolerant and easily recoverable as HDFS;
- can run alongside HDFS on the same hosts (see the sketch after this list).
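To illustrate this coexistence, the sketch below accesses both storages through the same Hadoop FileSystem API and copies a file from HDFS into an Ozone bucket over the ofs:// scheme. The NameNode and Ozone Manager addresses, volume, bucket, and file names are placeholders, and the Ozone filesystem client jar is assumed to be on the classpath.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class HdfsAndOzoneSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Depending on the distribution, the ofs scheme may need to be mapped to the
        // Ozone client explicitly (class name as documented for recent Ozone releases).
        conf.set("fs.ofs.impl", "org.apache.hadoop.fs.ozone.RootedOzoneFileSystem");

        // HDFS namespace (placeholder NameNode address).
        FileSystem hdfs = FileSystem.get(URI.create("hdfs://namenode.example.com:8020"), conf);
        // Ozone namespace via ofs://<ozone-manager>/<volume>/<bucket>/<key> (placeholder host).
        FileSystem ozone = FileSystem.get(URI.create("ofs://om.example.com"), conf);

        // Copy a file from HDFS into an Ozone bucket; volume/bucket/file names are hypothetical.
        FileUtil.copy(hdfs, new Path("/data/input.csv"),
                      ozone, new Path("/vol1/bucket1/input.csv"),
                      false /* deleteSource */, conf);

        hdfs.close();
        ozone.close();
    }
}
```

Since both storages expose the same FileSystem abstraction, applications can migrate from HDFS to Ozone largely by changing paths and configuration rather than code.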
Since the project is rather new, there are some cons:
- few production deployments to learn from;
- although designed to integrate with the Hadoop ecosystem, it is not yet widely supported, and some services may require additional configuration to work with Ozone;
- no local socket access (as used by HDFS short-circuit local reads), so overall performance is slower.
Use cases
Apache Ozone is advantageous over HDFS in environments that require scalability for large numbers of small files, S3 compatibility, or cloud-native capabilities. However, HDFS remains suitable for traditional Hadoop workloads that do not need to store many small files (or can combine them into larger ones) and do not require cloud integration.