MaxCompute: Paimon external table

Last Updated: Feb 07, 2026

MaxCompute supports creating Paimon external tables that map to Paimon table directories stored in OSS buckets and access their data. This topic describes how to create a Paimon external table and query it from MaxCompute.

Overview

Apache Paimon is a unified lake storage format that supports both batch and streaming workloads, offering high-throughput writes and low-latency queries. Common compute engines, such as Spark, Hive, and Trino in E-MapReduce, as well as Realtime Compute for Apache Flink, integrate seamlessly with Paimon. With Apache Paimon, you can quickly build a data lake on OSS and connect it to MaxCompute for analytics. During reads, metadata filtering skips unnecessary files in OSS directories, which further improves query performance.

Applicability

  • Paimon external tables do not automatically update their schema when the underlying Paimon file schema changes.

  • You cannot set cluster attributes or primary keys on Paimon external tables.

  • Paimon external tables do not support querying historical versions of data.

  • You can use INSERT INTO or INSERT OVERWRITE statements to write data into Paimon external tables (see the sketch in the Write data section below). However, writing to Dynamic Bucket tables and Cross Partition tables is not supported.

  • Apart from these INSERT statements, do not write data into Paimon external tables through other channels. To export data to OSS by other means, use methods such as UNLOAD.

  • UPDATE and DELETE operations are not supported on Paimon external tables.

  • MaxCompute and OSS must be in the same region.

  • Only specific data types are supported. See the Supported data types section of this topic.

Create a Paimon external table

Syntax structure

For details about the syntax of external tables in various formats, see OSS external tables.

CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name>
(
  <col_name> <data_type>,
  ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
WITH serdeproperties (
  'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
LOCATION '<oss_location>';

Common parameters

For more information about common parameters, see Basic syntax parameters.

Write data

For more information about the MaxCompute write syntax, see Write syntax.
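As a minimal sketch, assuming the partitioned example table oss_extable_paimon_pt that is created later in this topic (and that the target is neither a Dynamic Bucket table nor a Cross Partition table), a write can look like the following:

-- Hedged sketch: appends two rows into the dt='2024-07-18' partition of the example table.
INSERT INTO oss_extable_paimon_pt PARTITION (dt='2024-07-18') VALUES (3, 'EEE'), (4, 'FFF');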

Query analysis

  • The data splitting (split) logic of Paimon tables differs from that of native MaxCompute tables. Paimon uses its own internal file organization and sharding mechanism, so MaxCompute split-related parameters may not fully take effect.

  • For details about SELECT syntax, see Query syntax description.

  • For details about optimizing query plans, see Query optimization. A minimal EXPLAIN sketch follows this list.

  • For details about BadRowSkipping, see BadRowSkipping.
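To check how MaxCompute plans a query against a Paimon external table, for example to confirm that a partition filter takes effect, you can prepend EXPLAIN to the statement. The following is a minimal sketch that uses the example table created later in this topic.

EXPLAIN
SELECT * FROM oss_extable_paimon_pt WHERE dt='2024-07-18';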

Usage example

Step 1: Prerequisites

  1. You have created a MaxCompute project.

  2. You have prepared an OSS bucket and OSS directory. For more information, see Create a bucket and Manage folders.

    Because MaxCompute is available only in specific regions, cross-region connectivity issues may occur. We recommend using an OSS bucket in the same region as your MaxCompute project.
  3. Authorization

    1. You have permissions to access OSS. An Alibaba Cloud account (primary account), Resource Access Management (RAM) user, or RAM role can access OSS external tables. For authorization details, see STS-mode authorization for OSS.

    2. You have CreateTable permissions in the MaxCompute project. For details about table operation permissions, see MaxCompute permissions.

Step 2: Prepare data in Flink

Create a Paimon catalog and a Paimon table, then insert data into the table as shown in the following example.

Note

If Paimon table data already exists in OSS, skip this step.

  1. Create a Paimon Filesystem Catalog

    1. Log on to the Flink console. In the upper-left corner of the page, select a region.

    2. Click the name of the target workspace. In the navigation pane on the left, select Catalogs.

    3. On the Catalog List page, click Create Catalog. In the Create Catalog dialog box, select Apache Paimon, click Next, and then configure the following parameters:

      | Parameter | Required | Description |
      | --- | --- | --- |
      | metastore | Required | The metastore type. In this example, select filesystem. |
      | catalog name | Required | A custom catalog name, such as paimon-catalog. |
      | warehouse | Required | The data warehouse directory in OSS. In this example, use oss://paimon-fs/paimon-test/. |
      | fs.oss.endpoint | Required | The OSS endpoint. For example, the internal endpoint for the China (Hangzhou) region is oss-cn-hangzhou-internal.aliyuncs.com. |
      | fs.oss.accessKeyId | Required | The AccessKey ID required to access OSS. |
      | fs.oss.accessKeySecret | Required | The AccessKey secret required to access OSS. |
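      If you prefer SQL to the console form, the same catalog can be declared in a Flink SQL script. The following is a minimal sketch that assumes the example values from the table above; replace the placeholder AccessKey pair with your own credentials.

      -- Hedged sketch: declares the filesystem-backed Paimon catalog described above.
      CREATE CATALOG `paimon-catalog` WITH (
          'type' = 'paimon',
          'metastore' = 'filesystem',
          'warehouse' = 'oss://paimon-fs/paimon-test/',
          'fs.oss.endpoint' = 'oss-cn-hangzhou-internal.aliyuncs.com',
          'fs.oss.accessKeyId' = '<yourAccessKeyId>',
          'fs.oss.accessKeySecret' = '<yourAccessKeySecret>'
      );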

  2. Create a Paimon table

    1. Log on to the Flink console. In the upper-left corner of the page, select a region.

    2. Click the name of the target workspace. In the navigation pane on the left, select Development > Scripts.

    3. On the New Script tab, click the new-script icon to create a query script.

      Enter the following commands and click Run.

      CREATE TABLE `paimon-catalog`.`default`.test_tbl (
          id BIGINT,
          data STRING,
          dt STRING,
          PRIMARY KEY (dt, id) NOT ENFORCED
      ) PARTITIONED BY (dt);
      
      INSERT INTO `paimon-catalog`.`default`.test_tbl VALUES (1,'CCC','2024-07-18'), (2,'DDD','2024-07-18');
  3. If your SQL job is a bounded (finite-stream) job, such as the preceding INSERT INTO ... VALUES ... statement, perform the following steps so that checkpoints can still be generated after tasks finish:

    1. Click the name of the target workspace. In the navigation pane on the left, select O&M > Deployments.

    2. On the Deployments page, click the name of the target job to open the Configuration page.

    3. In the Runtime Parameter Settings area, click Edit. Set the execution.checkpointing.checkpoints-after-tasks-finish.enabled: true configuration in the Other Configuration section and save your changes.

      For details about configuring job runtime parameters, see Configure job deployment information.
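Before moving on, you can optionally verify the inserted rows from the Flink side. The following is a minimal sketch that reuses the catalog and table names from the example above.

SELECT * FROM `paimon-catalog`.`default`.test_tbl;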

Step 3: Create a Paimon external table in MaxCompute

Run the following SQL code in MaxCompute to create a Paimon external table.

CREATE EXTERNAL TABLE oss_extable_paimon_pt
(
    id BIGINT,
    data STRING
)
PARTITIONED BY (dt STRING)
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
WITH serdeproperties (
    'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
LOCATION 'oss://oss-cn-<your region>-internal.aliyuncs.com/<table_path>'
;

In the preceding code, table_path is the path of the Paimon table created in Flink, such as paimon-fs/paimon-test/default.db/test_tbl. To obtain this path, perform the following steps (a fully resolved LOCATION example follows the steps):

  1. Log on to the Flink console. In the upper-left corner of the page, select a region.

  2. Click the name of the target workspace. In the navigation pane on the left, select Catalogs.

  3. On the Metadata page, click default under the target catalog. On the default page, click View in the Actions column for the target table.

  4. On the Table Schema tab, in the Properties area, obtain the value of the path parameter. For table_path, enter only the path that follows oss://.
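Putting the pieces together: with the example path above and the China (Hangzhou) internal endpoint from Step 2, the LOCATION clause of the DDL would look like the following sketch.

LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/paimon-fs/paimon-test/default.db/test_tbl'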

Step 4: Load partition data

If the OSS external table you created is partitioned, you must load partition data separately. For more information, see Syntax for loading partition data for OSS external tables.

MSCK REPAIR TABLE oss_extable_paimon_pt ADD PARTITIONS;
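Alternatively, if you only need specific partitions, you can add them one at a time with the standard ALTER TABLE syntax. The following is a sketch that uses the partition value from the Flink example.

ALTER TABLE oss_extable_paimon_pt ADD IF NOT EXISTS PARTITION (dt='2024-07-18');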

Step 5: Read the Paimon external table from MaxCompute

Run the following commands in MaxCompute to query the Paimon external table oss_extable_paimon_pt.

SET odps.sql.common.table.planner.ext.hive.bridge = true;
SET odps.sql.hive.compatible = true;
SELECT * FROM oss_extable_paimon_pt WHERE dt='2024-07-18';

The result is as follows:

+------------+------------+------------+
| id         | data       | dt         | 
+------------+------------+------------+
| 1          | CCC        | 2024-07-18 | 
| 2          | DDD        | 2024-07-18 | 
+------------+------------+------------+
Note

If the schema in the Paimon file differs from the external table schema:

  • Column count mismatch: If the Paimon file has fewer columns than defined in the external table DDL, missing column values are filled with NULL during reads. If it has more columns, extra columns are discarded.

  • Data type mismatch: MaxCompute does not support reading STRING data from Paimon files into INT columns. It supports reading INT data into STRING columns, although this is not recommended.
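As an illustration of the column-count rule, the following sketch declares a hypothetical table oss_extable_paimon_narrow with fewer columns than the Paimon files from Step 2. On read, the data column in the files is discarded; conversely, any column declared in the DDL but absent from the files would return NULL.

CREATE EXTERNAL TABLE oss_extable_paimon_narrow
(
    id BIGINT
)
PARTITIONED BY (dt STRING)
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
WITH serdeproperties (
    'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
LOCATION 'oss://oss-cn-<your region>-internal.aliyuncs.com/<table_path>'
;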

Supported data types

For MaxCompute data types, see Data type edition 1.0 and Data type edition 2.0.

| Open source Paimon data type | MaxCompute 2.0 data type | Read/write support | Description |
| --- | --- | --- | --- |
| TINYINT | TINYINT | Supported | 8-bit signed integer. |
| SMALLINT | SMALLINT | Supported | 16-bit signed integer. |
| INT | INT | Supported | 32-bit signed integer. |
| BIGINT | BIGINT | Supported | 64-bit signed integer. |
| BINARY(MAX_LENGTH) | BINARY | Supported | Binary data type. Maximum length is 8 MB. |
| FLOAT | FLOAT | Supported | 32-bit binary floating-point number. |
| DOUBLE | DOUBLE | Supported | 64-bit binary floating-point number. |
| DECIMAL(precision,scale) | DECIMAL(precision,scale) | Supported | Exact decimal number. Default is decimal(38,18). You can customize precision (the maximum number of digits, 1 <= precision <= 38) and scale (the number of digits after the decimal point, 0 <= scale <= 18). |
| VARCHAR(n) | VARCHAR(n) | Supported | Variable-length character string. n is the length, ranging from 1 to 65535. |
| CHAR(n) | CHAR(n) | Supported | Fixed-length character string. n is the length, ranging from 1 to 255. |
| VARCHAR(MAX_LENGTH) | STRING | Supported | String type. Maximum length is 8 MB. |
| DATE | DATE | Supported | Date format: yyyy-mm-dd. |
| TIME, TIME(p) | None | Not supported | Paimon TIME represents a time without a time zone, composed of hours, minutes, and seconds, with nanosecond precision. TIME(p) specifies a fractional-second precision from 0 to 9 (default is 0). MaxCompute has no corresponding data type. |
| TIMESTAMP, TIMESTAMP(p) | TIMESTAMP_NTZ | Supported | Timestamp without a time zone, precise to nanoseconds. To read this type, disable native mode: SET odps.sql.common.table.jni.disable.native=true; |
| TIMESTAMP WITH LOCAL TIME ZONE(9) | TIMESTAMP | Supported | Timestamp precise to nanoseconds, formatted as yyyy-mm-dd hh:mm:ss.xxxxxxxxx. When low-precision TIMESTAMP values are written from Paimon source tables, the precision is aligned: 0-3 to 3 digits, 4-6 to 6 digits, and 7-9 to 9 digits. |
| TIMESTAMP WITH LOCAL TIME ZONE(9) | DATETIME | Not supported | Timestamp precise to nanoseconds. Format: yyyy-mm-dd hh:mm:ss.xxxxxxxxx |
| BOOLEAN | BOOLEAN | Supported | Boolean type. |
| ARRAY | ARRAY | Supported | Complex type. |
| MAP | MAP | Supported | Complex type. |
| ROW | STRUCT | Supported | Complex type. |
| MULTISET<t> | None | Not supported | MaxCompute has no corresponding data type. |
| VARBINARY, VARBINARY(n), BYTES | BINARY | Supported | Variable-length binary string. |

FAQ

Error kSIGABRT when reading a Paimon external table

  • Error message:

    ODPS-0123144: Fuxi job failed - kSIGABRT(errCode:6) at Odps/*****_SQL_0_1_0_job_0/M1@f01b17437.cloud.eo166#3. 
      Detail error msg: CRASH_CORE, maybe caused by jvm crash, please check your java udf/udaf/udtf. 
      | fatalInstance: Odps/*****_SQL_0_1_0_job_0/M1#0_0 
  • Cause:

    This error occurs when reading TIMESTAMP_NTZ data in JNI mode.

  • Solution:

    Before you read from a table, disable the Native feature by running the command SET odps.sql.common.table.jni.disable.native=true;.
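    In practice, the workaround can be combined with the session settings from Step 5. The following sketch reuses the example table from this topic:

    SET odps.sql.common.table.jni.disable.native=true;
    SET odps.sql.common.table.planner.ext.hive.bridge = true;
    SET odps.sql.hive.compatible = true;
    SELECT * FROM oss_extable_paimon_pt WHERE dt='2024-07-18';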

References

You can create a MaxCompute Paimon external table in Flink as a custom catalog, write data to it, and then query and consume the Paimon data from MaxCompute. For more information, see Create a MaxCompute Paimon external table based on Flink.