E-MapReduce:OSS/OSS-HDFS

Last Updated:Dec 13, 2024

This topic describes how to use Alibaba Cloud Object Storage Service (OSS) and OSS-HDFS, and summarizes their benefits and features.

Background information

OSS is a secure, cost-effective, and highly reliable cloud storage service that allows you to store large amounts of data. OSS is designed to provide 99.9999999999% (twelve 9's) data durability and 99.995% data availability. OSS provides multiple storage classes to help you manage and reduce storage costs. For more information, see What is OSS?.
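To give a sense of scale for these design targets, the following sketch converts the two percentages into more tangible numbers. It assumes that durability is read as an annual per-object figure and that availability is read as a fraction of time over a 365-day year; neither interpretation is stated in this topic, and the function names are illustrative.

```python
from fractions import Fraction

# Design targets quoted in the paragraph above.
DURABILITY = Fraction("99.9999999999") / 100
AVAILABILITY = Fraction("99.995") / 100

# Assumption: a 365-day year is used as the measurement window.
MINUTES_PER_YEAR = 365 * 24 * 60


def expected_objects_lost(object_count: int) -> Fraction:
    """Expected number of objects lost per year.

    Assumes durability is an annual, per-object probability of survival.
    """
    return object_count * (1 - DURABILITY)


def max_downtime_minutes() -> Fraction:
    """Downtime per year that is still consistent with the availability target."""
    return MINUTES_PER_YEAR * (1 - AVAILABILITY)


if __name__ == "__main__":
    # One billion stored objects.
    print(float(expected_objects_lost(10**9)))
    print(float(max_downtime_minutes()))
```

Under these assumptions, the durability target corresponds to an expected loss of one object per year for every 10^12 objects stored, and the availability target corresponds to roughly 26 minutes of unavailability per year.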

OSS-HDFS (JindoFS) is a cloud-native data lake storage service. OSS-HDFS provides centralized metadata management and is fully compatible with the Hadoop Distributed File System (HDFS) API. OSS-HDFS also supports the Portable Operating System Interface (POSIX). You can use OSS-HDFS to manage data in data lake-based computing scenarios in the big data and AI fields. For more information, see What is OSS-HDFS?.

JindoData is a suite developed by the Alibaba Cloud big data team for storage acceleration of data lake systems. JindoData provides end-to-end solutions for data lake systems of Alibaba Cloud and other vendors in big data and AI scenarios. JindoData is built on top of a unified architecture and kernel. JindoData provides the following components: JindoFS (the original JindoFS in block storage mode), JindoFSx (the original JindoFS in cache mode), and JindoSDK. JindoData also provides fully compatible tools, such as JindoFuse and Jindo DistCp, and plug-ins. For more information, see Overview.

Use methods

  • By default, JindoSDK is deployed in E-MapReduce (EMR) clusters. You can use JindoSDK to access OSS or OSS-HDFS.

  • In other Alibaba Cloud services, download the latest version of the JindoSDK JAR package and install JindoSDK before you use it to access OSS or OSS-HDFS. For more information, see Deploy JindoSDK in an environment other than EMR.
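Once JindoSDK is available, access typically goes through `oss://` paths with standard Hadoop tools. The following minimal sketch only builds such paths and the matching `hadoop fs -ls` command line; it does not run anything. The bucket name, region, and helper functions are hypothetical, and the `<region>.oss-dls.aliyuncs.com` endpoint pattern for OSS-HDFS is an assumption that you should check against the OSS-HDFS documentation for your region.

```python
import shlex

# Assumption: OSS-HDFS endpoints follow this pattern. Verify it for your region.
OSS_HDFS_ENDPOINT_TEMPLATE = "{region}.oss-dls.aliyuncs.com"


def oss_uri(bucket: str, path: str) -> str:
    """Build a path for a regular OSS bucket."""
    return f"oss://{bucket}/{path.lstrip('/')}"


def oss_hdfs_uri(bucket: str, region: str, path: str) -> str:
    """Build a path for a bucket that has OSS-HDFS enabled."""
    endpoint = OSS_HDFS_ENDPOINT_TEMPLATE.format(region=region)
    return f"oss://{bucket}.{endpoint}/{path.lstrip('/')}"


def hadoop_ls_command(uri: str) -> list[str]:
    """Return the argument list for listing a path with the Hadoop shell."""
    return ["hadoop", "fs", "-ls", uri]


if __name__ == "__main__":
    # "examplebucket" and "cn-hangzhou" are placeholder values.
    for uri in (
        oss_uri("examplebucket", "/dir/"),
        oss_hdfs_uri("examplebucket", "cn-hangzhou", "/dir/"),
    ):
        print(shlex.join(hadoop_ls_command(uri)))
```

On a node where JindoSDK and credentials are already configured, the printed commands can be pasted into a shell to list the corresponding directory.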

Benefits

OSS and OSS-HDFS provide the following benefits when used as the underlying storage service:

  • Ready to use. OSS and OSS-HDFS are cloud-native storage services that you access by calling RESTful APIs, so you do not need to deploy them yourself. In EMR clusters, JindoSDK is deployed by default and can be used to access OSS or OSS-HDFS.

  • Cost-effective. OSS and OSS-HDFS provide multiple storage classes, such as Infrequent Access (IA), Archive, and Cold Archive. Storing cold data in these classes reduces storage costs.

  • Highly scalable. The storage space of OSS and OSS-HDFS is not limited by hard disk capacity, and you do not need to manually expand storage capacity.

Features

The following table describes the differences between the features of OSS and OSS-HDFS.

| Scenario | Feature | OSS | OSS-HDFS |
| --- | --- | --- | --- |
| Big data scenario (Hadoop) | Operations for files and directories, and related operations | Supported | Supported |
| Big data scenario (Hadoop) | Support for granting permissions on files and directories | Not supported | Supported |
| Big data scenario (Hadoop) | Atomic operations for directories and rename operations | Supported (poor performance) | Supported (millisecond-granularity rename operations) |
| Big data scenario (Hadoop) | Support for specifying a point in time by using setTimes | Not supported | Supported |
| Big data scenario (Hadoop) | Extended attributes (XAttrs) | Not supported | Supported |
| Big data scenario (Hadoop) | ACL | Not supported | Supported |
| Big data scenario (Hadoop) | Support for accelerating on-premises read caching | Supported | Supported |
| Big data scenario (Hadoop) | Snapshots | Not supported | Supported |
| Big data scenario (Hadoop) | File-related operations, such as flush, sync, truncate, and append | Not supported | Supported |
| Big data scenario (Hadoop) | Truncate operations on files | Not supported | Supported |
| Big data scenario (Hadoop) | Checksum verification | Supported | Supported |
| Big data scenario (Hadoop) | Automatic clean-up of the HDFS recycle bin | Not supported | Supported |
| AI scenario (POSIX) | Metadata consistency | Weak | Strong |
| AI scenario (POSIX) | File-related operations, such as flush, sync, truncate, and append | Supported (However, limits on the operations exist. For information, see Limits.) | Supported |
| AI scenario (POSIX) | Truncate operations on files | Not supported | Supported |
| AI scenario (POSIX) | Random writes to files | Not supported | Supported |
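
The gap in directory rename performance between the two services can be illustrated with a toy model. The sketch below is not how OSS or OSS-HDFS is implemented; it only assumes the general behavior of a flat object namespace, where renaming a "directory" means copying and deleting every object under a prefix, and contrasts it with a namespace that keeps directories as metadata entries. Both classes and their method names are hypothetical.

```python
class FlatObjectStore:
    """Toy flat namespace: every key is a full path, and directories do not exist."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def rename_prefix(self, src, dst):
        """Rename a 'directory' by copying and deleting each object under it.

        Returns the number of per-object operations performed.
        """
        ops = 0
        for key in [k for k in self.objects if k.startswith(src)]:
            self.objects[dst + key[len(src):]] = self.objects[key]  # copy
            del self.objects[key]  # delete
            ops += 2
        return ops


class HierarchicalStore:
    """Toy namespace that tracks each directory as a single metadata entry."""

    def __init__(self):
        self.dirs = {}

    def put(self, directory, name, data):
        self.dirs.setdefault(directory, {})[name] = data

    def rename_dir(self, src, dst):
        """Rename a directory by re-pointing one metadata entry."""
        self.dirs[dst] = self.dirs.pop(src)
        return 1


if __name__ == "__main__":
    flat, tree = FlatObjectStore(), HierarchicalStore()
    for i in range(1000):
        flat.put(f"tmp/part-{i}", b"")
        tree.put("tmp/", f"part-{i}", b"")
    print(flat.rename_prefix("tmp/", "final/"))
    print(tree.rename_dir("tmp/", "final/"))
```

In this model the cost of a rename in the flat store grows with the number of objects, and the operation can be observed half-finished, whereas the metadata-based rename is a single step regardless of directory size.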