This topic describes how to use Alibaba Cloud Object Storage Service (OSS) and OSS-HDFS, and describes their benefits and features.
Background information
OSS is a secure, cost-effective, and highly reliable cloud storage service that allows you to store large amounts of data. OSS is designed to provide 99.9999999999% (twelve 9's) data durability and 99.995% data availability. OSS provides multiple storage classes to help you manage and reduce storage costs. For more information, see What is OSS?.
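To put the design targets above in concrete terms, the following minimal sketch converts the availability and durability percentages into yearly figures. The function names and the one-million-object example are illustrative assumptions. The calculation also treats each percentage as a simple annual rate, which is a simplification of how a real service level is measured.

```python
from decimal import Decimal

# Minutes in a non-leap year; an assumption used to express availability as downtime.
MINUTES_PER_YEAR = Decimal(365 * 24 * 60)


def max_downtime_minutes_per_year(availability_percent: str) -> Decimal:
    """Return the yearly downtime implied by an availability percentage."""
    unavailable_fraction = (Decimal(100) - Decimal(availability_percent)) / Decimal(100)
    return unavailable_fraction * MINUTES_PER_YEAR


def expected_objects_lost_per_year(durability_percent: str, object_count: int) -> Decimal:
    """Return the expected yearly object loss, treating durability as an annual rate."""
    loss_fraction = (Decimal(100) - Decimal(durability_percent)) / Decimal(100)
    return loss_fraction * Decimal(object_count)


if __name__ == "__main__":
    # Percentages are passed as strings so that Decimal keeps them exact.
    print(max_downtime_minutes_per_year("99.995"))
    print(expected_objects_lost_per_year("99.9999999999", 1_000_000))
```

Under these assumptions, 99.995% availability corresponds to about 26.28 minutes of downtime per year, and twelve 9's of durability corresponds to an expected loss of about one object per year for every trillion objects stored.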
OSS-HDFS (JindoFS) is a cloud-native data lake storage service. OSS-HDFS provides centralized metadata management capabilities and is fully compatible with the Hadoop Distributed File System (HDFS) API. OSS-HDFS also supports Portable Operating System Interface (POSIX). You can use OSS-HDFS to manage data in data lake-based computing scenarios in the big data and AI fields. For more information, see What is OSS-HDFS?.
JindoData is a storage acceleration suite for data lake systems that is developed by the Alibaba Cloud big data team. Built on a unified architecture and kernel, JindoData provides end-to-end solutions for data lake systems of Alibaba Cloud and other vendors in big data and AI scenarios. JindoData provides the following components: JindoFS (the original JindoFS in block storage mode), JindoFSx (the original JindoFS in cache mode), and JindoSDK. JindoData also provides fully compatible tools, such as JindoFuse and Jindo DistCp, as well as plug-ins. For more information, see Overview.
Usage
By default, JindoSDK is deployed in E-MapReduce (EMR) clusters. You can use JindoSDK to access OSS or OSS-HDFS.
In other Alibaba Cloud services, you must download the latest version of the JindoSDK JAR package and install JindoSDK before you can use it. For more information, see Deploy JindoSDK in an environment other than EMR.
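As an illustration of accessing storage through JindoSDK, the following sketch builds a `hadoop fs -ls` command line for an `oss://` path. The bucket name `examplebucket`, the directory `dir/`, the region, and the OSS-HDFS endpoint form `<region>.oss-dls.aliyuncs.com` are assumptions made for illustration. Check the JindoSDK documentation for the exact path format and the credential configuration that apply to your environment.

```python
import shlex
from typing import List, Optional


def oss_uri(bucket: str, path: str, endpoint: Optional[str] = None) -> str:
    """Build an oss:// URI; the endpoint is appended to the bucket when given."""
    authority = f"{bucket}.{endpoint}" if endpoint else bucket
    return f"oss://{authority}/{path.lstrip('/')}"


def hadoop_ls_command(uri: str) -> List[str]:
    """Return the argument list for listing a path with the Hadoop shell."""
    return ["hadoop", "fs", "-ls", uri]


if __name__ == "__main__":
    # Hypothetical bucket, directory, and endpoint; replace with your own values.
    plain = oss_uri("examplebucket", "dir/")
    with_endpoint = oss_uri("examplebucket", "dir/", "cn-hangzhou.oss-dls.aliyuncs.com")

    # Print the commands instead of running them, so the sketch works without Hadoop.
    print(shlex.join(hadoop_ls_command(plain)))
    print(shlex.join(hadoop_ls_command(with_endpoint)))
```

The sketch only prints the commands. On a node where Hadoop and JindoSDK are installed and configured, the returned argument list could be passed to `subprocess.run`.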
Benefits
OSS and OSS-HDFS provide the following benefits when they are used as the underlying storage service:
Ready to use. OSS and OSS-HDFS are cloud-native storage services. You can use OSS and OSS-HDFS by calling RESTful APIs without the need to deploy the services. By default, JindoSDK is deployed in EMR clusters. You can use JindoSDK to access OSS or OSS-HDFS.
Cost-effective. OSS and OSS-HDFS provide multiple storage classes, such as Infrequent Access (IA), Archive, and Cold Archive. You can store cold data in these storage classes to reduce storage costs.
High scalability. OSS and OSS-HDFS are highly scalable. The storage space of OSS and OSS-HDFS is not limited by hard disk capacity, so you do not need to manually expand storage capacity.
Features
The following table describes the differences between the features of OSS and OSS-HDFS.
Scenario | Feature | OSS | OSS-HDFS |
Big data scenario (Hadoop) | Operations on files and directories, and related operations | Supported | Supported |
Big data scenario (Hadoop) | Support for granting permissions on files and directories | Not supported | Supported |
Big data scenario (Hadoop) | Atomic operations for directories and rename operations | Supported (poor performance) | Supported (millisecond-level rename operations) |
Big data scenario (Hadoop) | Support for setting file access and modification times by using setTimes | Not supported | Supported |
Big data scenario (Hadoop) | Extended attributes (XAttrs) | Not supported | Supported |
Big data scenario (Hadoop) | ACL | Not supported | Supported |
Big data scenario (Hadoop) | Support for accelerating on-premises read caching | Supported | Supported |
Big data scenario (Hadoop) | Snapshots | Not supported | Supported |
Big data scenario (Hadoop) | File-related operations, such as flush, sync, truncate, and append | Not supported | Supported |
Big data scenario (Hadoop) | Truncate operations on files | Not supported | Supported |
Big data scenario (Hadoop) | Checksum verification | Supported | Supported |
Big data scenario (Hadoop) | Automatic clean-up of the HDFS recycle bin | Not supported | Supported |
AI scenario (POSIX) | Metadata consistency | Weak | Strong |
AI scenario (POSIX) | File-related operations, such as flush, sync, truncate, and append | Supported (However, limits on the operations exist. For more information, see Limits.) | Supported |
AI scenario (POSIX) | Truncate operations on files | Not supported | Supported |
AI scenario (POSIX) | Random writes to files | Not supported | Supported |
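The comparison above can serve as a quick lookup when you choose between the two services. The following sketch encodes a few rows of the table in a Python dictionary and returns the services that support every required feature. The short feature keys and the function name are hypothetical, and only rows with a plain Supported or Not supported value are included.

```python
from typing import Dict, Iterable, List

SERVICES = ("OSS", "OSS-HDFS")

# A subset of the feature table; True means "Supported", False means "Not supported".
FEATURES: Dict[str, Dict[str, bool]] = {
    "permissions": {"OSS": False, "OSS-HDFS": True},
    "xattrs": {"OSS": False, "OSS-HDFS": True},
    "acl": {"OSS": False, "OSS-HDFS": True},
    "snapshots": {"OSS": False, "OSS-HDFS": True},
    "checksum": {"OSS": True, "OSS-HDFS": True},
    "random_writes": {"OSS": False, "OSS-HDFS": True},
}


def services_supporting(required: Iterable[str]) -> List[str]:
    """Return the services that support every feature in `required`."""
    required = list(required)
    return [
        service
        for service in SERVICES
        if all(FEATURES[feature][service] for feature in required)
    ]


if __name__ == "__main__":
    print(services_supporting(["checksum"]))
    print(services_supporting(["checksum", "snapshots"]))
```

A workload that needs only checksum verification can use either service, whereas a workload that also needs snapshots is limited to OSS-HDFS.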