MaxCompute: Overview of external tables

Last Updated: Jan 05, 2026

MaxCompute lets you use external tables to query and analyze data stored in external storage systems, such as OSS.

Function introduction

MaxCompute SQL is the primary tool for distributed data processing and can quickly process exabytes of offline data. As big data services expand, new data scenarios continue to emerge, with large amounts of data centralized in data lakes, real-time data warehouses, NoSQL databases, or other systems.

MaxCompute suits both data warehouse and data lake scenarios. Its underlying architecture uses distributed storage and computing for big data, which provides the capacity, computing throughput, multi-engine capabilities, and openness that data lake scenarios require. As a result, there are two main patterns for computing on external data:

  • Pattern 1: Import then compute

    You can import data into MaxCompute. Structured data in a table format can be computed using SQL or exposed to third-party engines. Unstructured data can also be stored and computed in MaxCompute. This approach provides higher data read and write efficiency and a more integrated experience within MaxCompute.

  • Pattern 2: Compute directly on external data

    To build a more flexible data architecture based on a data lake, MaxCompute SQL can also act as a compute engine to process data outside the data warehouse.

MaxCompute provides computing capabilities within a strongly managed data warehouse framework and can also connect to external data storage systems and their management frameworks. The most direct way to access external data is to use external tables.

  • Definition and principle

    • You can use Data Definition Language (DDL) statements to define the table name, schema, properties, permissions, location, and protocol for accessing external data. This information is recorded in MaxCompute metadata.

    • SQL then uses this metadata to connect to the external data source and applies the appropriate method for each external table format. This process includes retrieving or updating the metadata of the external data and enabling data to be read, computed, and written.

  • Primary use cases

    Using external tables, you can directly compute data outside of MaxCompute. You can also manage external data sources within the MaxCompute management scope and use the data under a defined management system. Examples include batch processing of structured or unstructured data in a data lake, data sharing and exchange, and archiving data from a real-time data warehouse into a data warehouse model.
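
    To make the definition step concrete, the following is a minimal Python sketch of the information a DDL statement records for an external table and how it might be rendered into a statement. The class name ExternalTableDefinition, its fields, the example table, and the OSS path are hypothetical. The rendered CREATE EXTERNAL TABLE ... STORED AS ... LOCATION shape is an assumption about the general form of the DDL, so check the OSS external tables topic for the exact clauses that your format needs.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ExternalTableDefinition:
    """Hypothetical model of what a DDL statement records in MaxCompute metadata."""
    name: str                # table name
    columns: Dict[str, str]  # schema: column name -> column type
    file_format: str         # how the external data is encoded
    location: str            # where the external data lives

    def to_ddl(self) -> str:
        """Render an illustrative CREATE EXTERNAL TABLE statement (assumed shape)."""
        cols = ",\n".join(f"  {col} {typ}" for col, typ in self.columns.items())
        return (
            f"CREATE EXTERNAL TABLE IF NOT EXISTS {self.name} (\n{cols}\n)\n"
            f"STORED AS {self.file_format}\n"
            f"LOCATION '{self.location}';"
        )


# Hypothetical table that maps to Parquet files in an OSS directory.
orders = ExternalTableDefinition(
    name="orders_ext",
    columns={"order_id": "BIGINT", "amount": "DOUBLE"},
    file_format="PARQUET",
    location="oss://example-endpoint/example-bucket/orders/",
)
print(orders.to_ddl())
```

    The sketch covers only the name, schema, format, and location. Properties, permissions, and the access protocol are also part of the definition but depend on the data source, so they are left out here.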

Billing information

  1. Storage costs

    • Conclusion: No storage costs are incurred.

    • Reason: When you use external tables, the data is not copied into or stored in MaxCompute. The data remains in the external system, so MaxCompute does not charge data warehouse storage fees for it. For the storage costs that do apply, see the billing rules of the data source.

  2. Computing costs

    Computing costs follow the billing rules for MaxCompute compute resources. The costs vary based on the billing method:

    • For subscription or elastically reserved compute units (CUs):

      • The computing costs for external tables are included in the prepaid fees for compute resources.

    • For the pay-as-you-go billing method:

      • Billable part: Currently, billing is based only on the amount of data scanned when computing tasks access OSS and Tablestore.

      • Non-billable part: Accessing data sources such as HDFS, Hologres, RDS, HBase, and Lindorm is not currently billed. This applies whether you access them through external tables or the external schema method of Data Lakehouse 2.0. The amount of scanned data is not tracked, and no computing costs are generated.

  3. Network costs

    If you use a public MaxCompute endpoint to connect to an external table, you incur Internet traffic and download fees. For more information about MaxCompute fees, see Billable items and billing methods.

  4. Note on external data source costs

    When you use MaxCompute external tables to access external data sources, the data sources may generate costs for computing, access, and data transmission. The specific charges depend on the billing method of the external data source. For more information, see the documentation for the corresponding products.
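
    The computing cost rules above can be summarized as a small lookup. The following Python sketch is only an illustration of those rules: the function name computing_cost_rule and the billing method labels are hypothetical, and the sketch returns a description of the rule rather than a price, because this topic does not state any rates.

```python
# Sources whose scanned data is billed under pay-as-you-go, per the rules above.
SCAN_BILLED_SOURCES = {"oss", "tablestore"}
# Sources that are not currently billed for computing under pay-as-you-go.
SCAN_FREE_SOURCES = {"hdfs", "hologres", "rds", "hbase", "lindorm"}


def computing_cost_rule(billing_method: str, source: str) -> str:
    """Return which computing cost rule applies to a query on an external table.

    billing_method uses hypothetical labels: "subscription",
    "elastic_reserved", or "pay_as_you_go".
    """
    if billing_method in ("subscription", "elastic_reserved"):
        return "included in prepaid compute resources"
    if billing_method != "pay_as_you_go":
        raise ValueError(f"unknown billing method: {billing_method}")
    src = source.lower()
    if src in SCAN_BILLED_SOURCES:
        return "billed by amount of data scanned"
    if src in SCAN_FREE_SOURCES:
        return "not currently billed"
    return "not covered by this overview"


print(computing_cost_rule("pay_as_you_go", "OSS"))
print(computing_cost_rule("pay_as_you_go", "Hologres"))
```

    Storage, network, and data source charges are separate from this rule and still apply as described in the other items of this list.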

Scope

  • The Tunnel feature and Tunnel SDK do not support operations on external tables. You can upload data directly to MaxCompute internal tables using Tunnel. You can also upload data to OSS using the OSS Python SDK and then create a mapping to it using an external table in MaxCompute.

  • MaxCompute external tables support writing data to data sources. However, the write capability and consistency are limited by the external system. For example:

    • Hologres: MaxCompute accesses Hologres metadata based on the Java Database Connectivity (JDBC) protocol and cannot guarantee atomicity for write transaction control. The MaxCompute SQL engine can only read, not write, the underlying Pangu data of Hologres. Due to the complexity of parallel writing by multiple processes in the MaxCompute distributed computing environment, writing data from MaxCompute to Hologres does not support INSERT OVERWRITE semantics. If a job fails, only partial data might be written.

    • HDFS: Similarly, when the external table is based on Hive Metastore (HMS), writes to HDFS have a small chance of leaving the data inconsistent.

    • OSS: When writing to an OSS external table, using a .odps metadata file for control can reduce the probability of incomplete writes. However, if you abandon this control mechanism for engine compatibility, there is also a small chance of incomplete writes.

  • When you use INSERT OVERWRITE semantics to write data from MaxCompute to an external data source, the new data is written first. Then, during the DDL commit phase, the existing data in the table or partition is deleted and replaced with the new data. MaxCompute cannot roll back or recover the deleted data, so back up your data in advance. After the write operation, validate the data. If you find any issues, repeat the full write operation.
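
The back up, write, validate, and repeat sequence recommended above can be sketched as a small driver. This Python example is a hedged illustration, not a MaxCompute API: the function overwrite_with_validation and its backup, write, and validate callables are hypothetical placeholders for whatever you use to copy the existing data, submit the INSERT OVERWRITE job, and check the result.

```python
from typing import Callable


def overwrite_with_validation(
    backup: Callable[[], None],
    write: Callable[[], None],
    validate: Callable[[], bool],
    max_attempts: int = 3,
) -> int:
    """Back up, run the full overwrite, validate, and repeat the full write on failure.

    Returns the number of write attempts that were needed.
    """
    backup()  # deleted data cannot be rolled back, so back up first
    for attempt in range(1, max_attempts + 1):
        try:
            write()  # for example, submit the INSERT OVERWRITE job
        except Exception:
            continue  # a failed job may leave partial data, so repeat the full write
        if validate():  # for example, compare row counts with the source
            return attempt
    raise RuntimeError(f"overwrite did not validate after {max_attempts} attempts")


# Usage with stand-in callables: the first validation fails, the second passes.
calls = []
outcomes = iter([False, True])
attempts = overwrite_with_validation(
    backup=lambda: calls.append("backup"),
    write=lambda: calls.append("write"),
    validate=lambda: next(outcomes),
)
print(attempts, calls)
```

The backup runs once, before the first write, because the overwrite deletes the existing data only at commit time and the deletion cannot be undone.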

Related Topics

Supported external tables

MaxCompute supports various external tables, such as those for OSS, Hologres, and RDS.

External table examples

Use the following examples to learn how to process various types of unstructured data using MaxCompute external tables:

  • To access unstructured data in OSS and Tablestore (OTS), see OSS external tables and Accessing unstructured data in OTS.

  • To grant MaxCompute permissions to access OSS using a custom RAM role for an external table, see STS mode authorization.

  • The MaxCompute unstructured data framework supports directly writing data from MaxCompute to OSS using the INSERT statement. For more information, see Write data to OSS.

  • To process data in various open source formats, see OSS external tables.

  • You can use DataWorks with MaxCompute to visually create, search, query, configure, process, and analyze external tables. For more information, see External tables.

FAQ