Overview of the Storage API

To enhance integration with the big data ecosystem and facilitate external engines accessing data in MaxCompute, Alibaba Cloud offers the Storage API. This feature, currently in public preview, allows mainstream third-party compute engines to directly access MaxCompute's underlying storage, significantly boosting data access and interaction efficiency.

Storage API

The Storage API is a data service interface that enables efficient, low-latency, and secure data reads. It lets mainstream third-party compute engines, such as Spark on EMR, StarRocks, Presto, and PAI, directly access MaxCompute's underlying storage system, which improves integration and data processing efficiency between MaxCompute and open-source compute engines and machine learning platforms. Spark on EMR, StarRocks, and Presto can also read MaxCompute data directly through a connector, which simplifies the read path and improves data access performance. The architecture is illustrated in the following figure.

Scenarios

The Storage API is applicable to scenarios involving data openness and multi-engine computing. It serves as a bridge for enterprises or developers who need to flexibly switch between different computing frameworks or leverage specific engine features to process data in MaxCompute, promoting data flow and processing diversity.

Key features

High throughput: Reads data efficiently at the column level, pushes predicates down so that data is filtered before transmission, and supports the Arrow format.
Secure and user-friendly: Offers direct read access to underlying storage with table semantics, concealing storage complexities while adhering to security policies such as project isolation, access control, and data encryption.
Ecosystem integration: Spark on EMR and StarRocks can directly read MaxCompute data through a connector, simplifying the integration of compute engines.
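
The column-level reading and predicate pushdown described under "High throughput" can be illustrated with a small, self-contained Python sketch. It does not call the real Storage API: `ReadRequest`, `scan`, and the `orders` table are hypothetical names invented for illustration, and an in-memory list stands in for MaxCompute storage. The point is only that rows are filtered at the source and that only the requested columns are returned, in a columnar layout loosely similar in spirit to Arrow.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence

Row = Dict[str, Any]


@dataclass
class ReadRequest:
    """Hypothetical read request; not the real Storage API request type."""
    columns: Sequence[str]            # columns the engine actually needs
    predicate: Callable[[Row], bool]  # filter evaluated at the source


def scan(table: List[Row], request: ReadRequest) -> Dict[str, List[Any]]:
    """Filter rows at the source, then return only the requested columns.

    The result is column-oriented: one list of values per column name.
    """
    out: Dict[str, List[Any]] = {name: [] for name in request.columns}
    for row in table:
        if request.predicate(row):          # predicate pushdown
            for name in request.columns:    # column pruning
                out[name].append(row[name])
    return out


# Toy table standing in for data stored in MaxCompute (illustrative only).
orders = [
    {"order_id": 1, "region": "hangzhou", "amount": 120.0, "note": "a"},
    {"order_id": 2, "region": "beijing", "amount": 75.5, "note": "b"},
    {"order_id": 3, "region": "hangzhou", "amount": 310.0, "note": "c"},
]

request = ReadRequest(
    columns=["order_id", "amount"],
    predicate=lambda r: r["region"] == "hangzhou",
)
result = scan(orders, request)
print(result)  # prints {'order_id': [1, 3], 'amount': [120.0, 310.0]}
```

In this sketch the `note` and `region` columns and the non-matching row never leave the source, which is the reason filtering before transmission reduces the volume of data transferred.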

Limits

Third-party engines can read partitioned tables, clustered tables, and materialized views in MaxCompute. They cannot read foreign tables, logical views, or Delta Tables.
Data of the JSON type cannot be read.
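
As a rough illustration, the limits above could be encoded as a pre-flight check that an integration runs before it attempts a read. This is a sketch under stated assumptions: the table-kind labels, the `check_readable` helper, and the idea of passing column type names as strings are all hypothetical and are not part of the Storage API.

```python
from typing import Iterable, List

# Table kinds taken from the limits listed above; the string labels are made up.
READABLE_KINDS = {"partitioned_table", "clustered_table", "materialized_view"}
UNREADABLE_KINDS = {"foreign_table", "logical_view", "delta_table"}


def check_readable(kind: str, column_types: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means no listed limit applies."""
    problems: List[str] = []
    if kind in UNREADABLE_KINDS:
        problems.append(f"table kind '{kind}' cannot be read by third-party engines")
    elif kind not in READABLE_KINDS:
        problems.append(f"table kind '{kind}' is not covered by the listed limits")
    if any(t.upper() == "JSON" for t in column_types):
        problems.append("JSON data is not readable")
    return problems


print(check_readable("partitioned_table", ["BIGINT", "STRING"]))
print(check_readable("delta_table", ["BIGINT", "JSON"]))
```

Returning a list of problems rather than a single boolean lets the caller report every violated limit at once.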

Data transmission resources

For data transmission tasks that use the MaxCompute Storage API, third-party engines can use resources from an exclusive resource group of Data Transmission Service (subscription). The following table describes this resource group.

Resource group name	Billing description	Supported regions	Usage description
Exclusive resource group of Data Transmission Service (subscription)	Subscription, billed based on the number of purchased concurrent instances. For details, see Fees for exclusive resources in data transmission service (subscription).	China (Hangzhou), China (Beijing), China (Shanghai), China (Shenzhen)	Purchase and use exclusive resource groups for data transmission service

You can view the usage details of the Data Transmission Service exclusive resource group (subscription) on the Resource Observation page. For more information, see Use resource observation.