OSS/OSS-HDFS

This topic describes how to use Alibaba Cloud Object Storage Service (OSS) and OSS-HDFS, and explains their benefits and features.

Background information

OSS is a secure, cost-effective, and highly reliable cloud storage service that allows you to store large amounts of data. OSS is designed to provide 99.9999999999% (twelve 9's) data durability and 99.995% data availability. OSS provides multiple storage classes to help you manage and reduce storage costs. For more information, see "What is OSS?"
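
To put the durability and availability figures above into perspective, the following minimal sketch converts them into an expected downtime and an expected number of lost objects. It assumes that both percentages are design targets measured over one year, and that the durability figure applies to each object independently. Neither assumption is stated in this topic, and the function names are illustrative only.

```python
from fractions import Fraction

MINUTES_PER_YEAR = 365 * 24 * 60  # non-leap year


def failure_fraction(percent: str) -> Fraction:
    """Return the complement of a percentage, for example "99.995" -> 1/20000."""
    return 1 - Fraction(percent) / 100


def expected_downtime_minutes(availability_percent: str) -> Fraction:
    """Expected unavailable minutes per year (assumes a yearly measurement window)."""
    return failure_fraction(availability_percent) * MINUTES_PER_YEAR


def expected_objects_lost(durability_percent: str, object_count: int) -> Fraction:
    """Expected objects lost per year (assumes an independent per-object figure)."""
    return failure_fraction(durability_percent) * object_count


if __name__ == "__main__":
    print(float(expected_downtime_minutes("99.995")))            # minutes per year
    print(float(expected_objects_lost("99.9999999999", 10**9)))  # objects per year
```

Under these assumptions, 99.995% availability corresponds to roughly 26 minutes of unavailability per year, and twelve 9's of durability corresponds to about one lost object per year for every trillion objects stored.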

OSS-HDFS (JindoFS) is a cloud-native data lake storage service. OSS-HDFS provides centralized metadata management and is fully compatible with the Hadoop Distributed File System (HDFS) API. OSS-HDFS also supports the Portable Operating System Interface (POSIX). You can use OSS-HDFS to manage data in data lake-based computing scenarios in the big data and AI fields. For more information, see "What is OSS-HDFS?"

JindoData is a suite developed by the Alibaba Cloud big data team for storage acceleration of data lake systems. JindoData provides end-to-end solutions for data lake systems of Alibaba Cloud and other vendors in big data and AI scenarios. JindoData is built on top of a unified architecture and kernel. JindoData provides the following components: JindoFS (the original JindoFS in block storage mode), JindoFSx (the original JindoFS in cache mode), and JindoSDK. JindoData also provides fully compatible tools, such as JindoFuse and Jindo DistCp, and plug-ins. For more information, see Overview.

Usage

By default, JindoSDK is deployed in E-MapReduce (EMR) clusters. You can use JindoSDK to access OSS or OSS-HDFS.
In environments other than EMR, download the latest version of the JindoSDK JAR package and install JindoSDK before you use it. For more information, see Deploy JindoSDK in an environment other than EMR.
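
As a rough illustration of what access through JindoSDK looks like from a client, the following sketch builds an `oss://` path and the matching `hadoop fs -ls` command line. The bucket name `examplebucket` is hypothetical, and the endpoint format `<region>.oss-dls.aliyuncs.com` for OSS-HDFS is an assumption that is not stated in this topic. Verify both the path format and the endpoint against the JindoSDK documentation for your version. The helper names are illustrative and are not part of JindoSDK.

```python
import shlex
from typing import List, Optional


def build_oss_uri(bucket: str, path: str = "", endpoint: Optional[str] = None) -> str:
    """Build an oss:// URI; the endpoint, if given, is appended to the bucket name."""
    authority = f"{bucket}.{endpoint}" if endpoint else bucket
    return f"oss://{authority}/{path.lstrip('/')}"


def hadoop_ls_command(uri: str) -> List[str]:
    """Return the argument list for listing a path with the Hadoop shell."""
    return ["hadoop", "fs", "-ls", uri]


if __name__ == "__main__":
    # Hypothetical bucket and assumed OSS-HDFS endpoint format.
    uri = build_oss_uri("examplebucket", "data/logs", "cn-hangzhou.oss-dls.aliyuncs.com")
    # Print the command instead of running it, because Hadoop may not be installed.
    print(shlex.join(hadoop_ls_command(uri)))
```

The sketch only prints the command. On a cluster where JindoSDK is deployed and credentials are configured, the printed command could be run in a shell.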

Benefits

OSS and OSS-HDFS provide the following benefits when used as the underlying storage service:

Ready to use. OSS and OSS-HDFS are cloud-native storage services. You can use them by calling RESTful APIs, without deploying the services yourself. In EMR clusters, JindoSDK is deployed by default, so you can use it to access OSS or OSS-HDFS right away.
Cost-effective. OSS and OSS-HDFS provide multiple storage classes, such as Infrequent Access (IA), Archive, and Cold Archive, which reduce the cost of storing cold data.
Highly scalable. The storage space of OSS and OSS-HDFS is not limited by hard disk capacity, so you do not need to manually expand storage capacity.

Features

The following table compares the features that OSS and OSS-HDFS support.

Scenario	Feature	OSS	OSS-HDFS
Big data scenario (Hadoop)	Basic operations on files and directories	Supported	Supported
	Permissions on files and directories	Not supported	Supported
	Atomic directory operations and rename operations	Supported (poor performance)	Supported (millisecond-level rename operations)
	Setting file modification and access times by using setTimes	Not supported	Supported
	Extended attributes (XAttrs)	Not supported	Supported
	ACL	Not supported	Supported
	Read acceleration by using local caching	Supported	Supported
	Snapshots	Not supported	Supported
	File-related operations, such as flush, sync, truncate, and append	Not supported	Supported
	Truncate operations on files	Not supported	Supported
	Checksum verification	Supported	Supported
	Automatic clean-up of the HDFS recycle bin	Not supported	Supported
AI scenario (POSIX)	Metadata consistency	Weak	Strong
	File-related operations, such as flush, sync, truncate, and append	Supported (However, limits on the operations exist. For information, see Limits.)	Supported
	Truncate operations on files	Not supported	Supported
	Random writes to files	Not supported	Supported
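
The difference in rename performance in the table can be pictured with a toy model. The sketch below is not how OSS or OSS-HDFS is implemented. It only contrasts a flat key namespace, where renaming a directory means rewriting every key under a prefix, with a hierarchical namespace managed as metadata, where renaming a directory relinks a single node. All class and method names are hypothetical.

```python
class FlatObjectStore:
    """Toy flat namespace: each object is stored under its full key."""

    def __init__(self):
        self.objects = {}  # full key -> data

    def put(self, key, data):
        self.objects[key] = data

    def rename_dir(self, src, dst):
        """Rewrite every key under the src prefix; return the number of keys touched."""
        src_prefix = src.rstrip("/") + "/"
        dst_prefix = dst.rstrip("/") + "/"
        moved = [k for k in self.objects if k.startswith(src_prefix)]
        for key in moved:
            self.objects[dst_prefix + key[len(src_prefix):]] = self.objects.pop(key)
        return len(moved)


class TreeNamespace:
    """Toy hierarchical namespace: directories are dicts that hold their children."""

    def __init__(self):
        self.root = {}

    def _walk(self, parts, create=False):
        node = self.root
        for part in parts:
            node = node.setdefault(part, {}) if create else node[part]
        return node

    def put(self, path, data):
        *dirs, name = path.strip("/").split("/")
        self._walk(dirs, create=True)[name] = data

    def get(self, path):
        *dirs, name = path.strip("/").split("/")
        return self._walk(dirs)[name]

    def rename_dir(self, src, dst):
        """Detach the subtree and attach it under the new name; return nodes relinked."""
        *src_parent, src_name = src.strip("/").split("/")
        *dst_parent, dst_name = dst.strip("/").split("/")
        subtree = self._walk(src_parent).pop(src_name)
        self._walk(dst_parent, create=True)[dst_name] = subtree
        return 1


if __name__ == "__main__":
    flat, tree = FlatObjectStore(), TreeNamespace()
    for i in range(1000):
        flat.put(f"warehouse/tmp/part-{i}", b"x")
        tree.put(f"warehouse/tmp/part-{i}", b"x")
    print(flat.rename_dir("warehouse/tmp", "warehouse/final"))  # keys rewritten
    print(tree.rename_dir("warehouse/tmp", "warehouse/final"))  # nodes relinked
```

In this model the cost of a rename in the flat store grows with the number of objects in the directory, while the cost in the tree stays constant. This matters for Hadoop jobs that commit output by renaming a temporary directory.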