Overview of storage API

Updated at: 2025-01-27 06:44

To enhance integration with the big data ecosystem and facilitate external engines accessing data in MaxCompute, Alibaba Cloud offers the Storage API. This feature, currently in public preview, allows mainstream third-party compute engines to directly access MaxCompute's underlying storage, significantly boosting data access and interaction efficiency.

Storage API

The Storage API is a data service interface that enables efficient, low-latency, and secure data read operations. It allows mainstream third-party compute engines, such as Spark on EMR, StarRocks, Presto, and PAI, to directly access MaxCompute's underlying storage system. This improves integration and data processing efficiency between MaxCompute and open-source compute engines and machine learning platforms. Spark on EMR, StarRocks, and Presto can also read MaxCompute data directly through a connector, which simplifies the read path and improves data access performance. The architecture is illustrated in the following figure.

[Figure: Storage API architecture]

Scenarios

The Storage API is applicable to scenarios involving data openness and multi-engine computing. It serves as a bridge for enterprises or developers who need to flexibly switch between different computing frameworks or leverage specific engine features to process data in MaxCompute, promoting data flow and processing diversity.

Key features

  • High throughput: Reads data at the column level, filters data through predicate pushdown before transmission, and supports the Arrow format.

  • Secure and user-friendly: Offers direct read access to underlying storage with table semantics, concealing storage complexities while adhering to security policies such as project isolation, access control, and data encryption.

  • Ecosystem integration: Spark on EMR and StarRocks can directly read MaxCompute data through a connector, simplifying the integration of compute engines.
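The high-throughput feature above combines two ideas: only the requested columns are returned, and rows are filtered on the storage side before anything is transmitted. The following is a minimal conceptual sketch in plain Python. It does not use the Storage API or any MaxCompute SDK; the `scan` function, the `TABLE` sample data, and the column names are all hypothetical and exist only to illustrate the technique.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def scan(rows: Iterable[Row],
         columns: List[str],
         predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Hypothetical storage-side scan (not a real Storage API call).

    Rows are filtered first (predicate pushdown), then reduced to the
    requested columns (column-level reading), so only the matching
    values would need to be transmitted to the compute engine.
    """
    for row in rows:
        if predicate(row):
            yield {name: row[name] for name in columns}


# Toy in-memory table standing in for data held in storage (illustrative only).
TABLE: List[Row] = [
    {"id": 1, "region": "hangzhou", "amount": 10.0},
    {"id": 2, "region": "beijing", "amount": 25.5},
    {"id": 3, "region": "hangzhou", "amount": 7.25},
]

# Request two of the three columns and only the rows for one region.
result = list(scan(TABLE, ["id", "amount"], lambda r: r["region"] == "hangzhou"))
print(result)
```

In this sketch one of three rows and one of three columns never leave the scan, which is the effect that pushdown and column-level reading aim for at scale.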

Limits

  • Third-party engines can read partitioned tables, clustered tables, and materialized views in MaxCompute. However, they cannot read foreign tables, logical views, or Delta Tables.

  • JSON data is not readable.
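A client can check these limits before it submits a read request. The sketch below encodes the limits listed above as a small lookup. The function name `is_readable`, the table-kind strings, and the reading of "JSON data" as a column of type JSON are assumptions made for illustration; they are not part of the Storage API. Table kinds that this page does not mention return `None` rather than a guess.

```python
from typing import List, Optional

# Table kinds taken from the limits listed on this page.
READABLE_KINDS = {"partitioned table", "clustered table", "materialized view"}
UNREADABLE_KINDS = {"foreign table", "logical view", "delta table"}


def is_readable(kind: str, column_types: List[str]) -> Optional[bool]:
    """Hypothetical pre-check, not an SDK function.

    Returns True or False when the limits above decide the case,
    and None when this page does not cover the table kind.
    """
    kind = kind.strip().lower()
    if kind in UNREADABLE_KINDS:
        return False
    # Assumption: "JSON data" refers to columns whose data type is JSON.
    if any(t.strip().upper() == "JSON" for t in column_types):
        return False
    if kind in READABLE_KINDS:
        return True
    return None


print(is_readable("Partitioned table", ["BIGINT", "STRING"]))
print(is_readable("Delta Table", ["BIGINT"]))
```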

Data transmission resources

Third-party engines that run data transmission tasks through the MaxCompute Storage API can use exclusive resource groups of Data Transmission Service (subscription). Details are provided below.

Resource group name: Exclusive resource group of Data Transmission Service (subscription)

Billing description: Subscription, billed based on the number of purchased concurrent instances. For details, see Fees for exclusive resources in data transmission service (subscription).

Supported regions:

  • China (Hangzhou)

  • China (Beijing)

  • China (Shanghai)

  • China (Shenzhen)

Usage description: See Purchase and use exclusive resource groups for data transmission service.

You can view the usage details of the Data Transmission Service exclusive resource group (subscription) on the Resource Observation page. For more information, see Use resource observation.

Usage examples
