MaxCompute supports creating Paimon external tables that map to Paimon table directories stored in OSS buckets and access their data. This topic describes how to create a Paimon external table and query it from MaxCompute.
Function introduction
Apache Paimon is a unified lake storage format that supports both batch and streaming workloads, offering high-throughput writes and low-latency queries. Common compute engines in Realtime Compute for Apache Flink and E-MapReduce, such as Spark, Hive, and Trino, integrate seamlessly with Paimon. With Apache Paimon, you can quickly build a data lake on OSS and connect it to MaxCompute for analytics. During reads, metadata-based filtering skips unnecessary files in the OSS directory, which further improves query performance.
Limitations
Paimon external tables do not automatically update their schema when the underlying Paimon file schema changes.
You cannot set cluster attributes or primary keys on Paimon external tables.
Paimon external tables do not support querying historical versions of data.
You can use INSERT INTO or INSERT OVERWRITE statements to write data into Paimon external tables. However, writing to Dynamic Bucket tables and Cross Partition tables is not supported. If you only need to export MaxCompute data to OSS, use methods such as UNLOAD instead of writing through a Paimon external table.
UPDATE and DELETE operations are not supported on Paimon external tables.
MaxCompute and OSS must be in the same region.
Create a Paimon external table
Syntax structure
For details about the syntax of external tables in various formats, see OSS external tables.
CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name>
(
<col_name> <data_type>,
...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
WITH serdeproperties (
'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
LOCATION '<oss_location>';

Common parameters
For more information about common parameters, see Basic syntax parameters.
Write data
For more information about the MaxCompute write syntax, see Write syntax.
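As a sketch, the following statements append rows to the partitioned Paimon external table oss_extable_paimon_pt that is created later in this topic. This example assumes that the table is not a Dynamic Bucket table or a Cross Partition table, and it sets the same Hive-compatibility flags that the query example in this topic uses; the inserted values are illustrative:

```sql
-- Enable Hive-compatible processing for the external table
-- (the same flags as in the read example in this topic).
SET odps.sql.common.table.planner.ext.hive.bridge = true;
SET odps.sql.hive.compatible = true;

-- Append two illustrative rows into the dt='2024-07-18' partition.
INSERT INTO oss_extable_paimon_pt PARTITION (dt='2024-07-18')
VALUES (3, 'EEE'), (4, 'FFF');
```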
Query analysis
Paimon table data splitting (Split) logic differs from that of native MaxCompute tables. Paimon uses its own internal file organization and sharding mechanism, which does not fully align with MaxCompute parameters.
For details about SELECT syntax, see Query syntax description.
For details about optimizing query plans, see Query optimization.
For details about BadRowSkipping, see BadRowSkipping.
Usage example
Step 1: Prerequisites
You have created a MaxCompute project.
You have prepared an OSS bucket and OSS directory. For more information, see Create a bucket and Manage folders.
Because MaxCompute is available only in specific regions, cross-region connectivity issues may occur. We recommend using an OSS bucket in the same region as your MaxCompute project.
Authorization
You have permissions to access OSS. An Alibaba Cloud account (primary account), Resource Access Management (RAM) user, or RAM role can access OSS external tables. For authorization details, see STS-mode authorization for OSS.
You have CreateTable permissions in the MaxCompute project. For details about table operation permissions, see MaxCompute permissions.
Step 2: Prepare data in Flink
Create a Paimon catalog and a Paimon table, then insert data into the table as shown in the following example.
If Paimon table data already exists in OSS, skip this step.
Create a Paimon Filesystem Catalog
Log on to the Flink console. In the upper-left corner of the page, select a region.
Click the name of the target workspace. In the navigation pane on the left, select Catalogs.
On the Catalog List page, click Create Catalog. In the Create Catalog dialog box, select Apache Paimon, click Next, and then configure the following parameters:
Parameter | Required | Description |
metastore | Required | The metastore type. In this example, select filesystem. |
catalog name | Required | A custom catalog name, such as paimon-catalog. |
warehouse | Required | The data warehouse directory in OSS. In this example, use oss://paimon-fs/paimon-test/. |
fs.oss.endpoint | Required | The OSS endpoint. For example, the endpoint for the China (Hangzhou) region is oss-cn-hangzhou-internal.aliyuncs.com. |
fs.oss.accessKeyId | Required | The AccessKey ID required to access OSS. |
fs.oss.accessKeySecret | Required | The AccessKey secret required to access OSS. |
Create a Paimon table
Log on to the Flink console. In the upper-left corner of the page, select a region.
Click the name of the target workspace. In the navigation pane on the left, open the SQL development page.
On the New Script tab, create a new query script. Enter the following commands and click Run.
CREATE TABLE `paimon-catalog`.`default`.test_tbl (
    id BIGINT,
    data STRING,
    dt STRING,
    PRIMARY KEY (dt, id) NOT ENFORCED
) PARTITIONED BY (dt);

INSERT INTO `paimon-catalog`.`default`.test_tbl VALUES
    (1, 'CCC', '2024-07-18'),
    (2, 'DDD', '2024-07-18');
If your SQL job is rate-limited, for example, when running an INSERT INTO ... VALUES ... statement, perform the following steps:
Click the name of the target workspace. In the navigation pane on the left, select Deployments.
On the Deployments page, click the name of the target job to open the Configuration page.
In the Runtime Parameter Settings area, click Edit. Add the execution.checkpointing.checkpoints-after-tasks-finish.enabled: true configuration in the Other Configuration section and save your changes.
For details about configuring job runtime parameters, see Configure job deployment information.
Step 3: Create a Paimon external table in MaxCompute
Run the following SQL code in MaxCompute to create a Paimon external table.
CREATE EXTERNAL TABLE oss_extable_paimon_pt
(
id BIGINT,
data STRING
)
PARTITIONED BY (dt STRING )
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
WITH serdeproperties (
'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
LOCATION 'oss://oss-cn-<your region>-internal.aliyuncs.com/<table_path>'
;

In the preceding code, table_path is the path of the Paimon table created in Flink, such as paimon-fs/paimon-test/default.db/test_tbl. To obtain this path:
Log on to the Flink console. In the upper-left corner of the page, select a region.
Click the name of the target workspace. In the navigation pane on the left, select Catalogs.
On the Metadata page, click default under the target catalog. On the default page, click View in the Actions column for the target table.
On the Table Schema tab, in the Properties area, obtain the value of the path parameter. For table_path, enter only the path that follows oss://.
Step 4: Load partition data
If the OSS external table you created is partitioned, you must load partition data separately. For more information, see Syntax for loading partition data for OSS external tables.
MSCK REPAIR TABLE oss_extable_paimon_pt ADD PARTITIONS;

Step 5: Read the Paimon external table from MaxCompute
Run the following commands in MaxCompute to query the Paimon external table oss_extable_paimon_pt.
SET odps.sql.common.table.planner.ext.hive.bridge = true;
SET odps.sql.hive.compatible = true;
SELECT * FROM oss_extable_paimon_pt WHERE dt='2024-07-18';

The result is as follows:
+------------+------------+------------+
| id | data | dt |
+------------+------------+------------+
| 1 | CCC | 2024-07-18 |
| 2 | DDD | 2024-07-18 |
+------------+------------+------------+

If the schema in the Paimon file differs from the external table schema:
Column count mismatch: If the Paimon file has fewer columns than defined in the external table DDL, missing column values are filled with NULL during reads. If it has more columns, extra columns are discarded.
Data type mismatch: MaxCompute does not support reading STRING data from Paimon files into INT columns. It supports reading INT data into STRING columns, although this is not recommended.
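To illustrate the column-count behavior, the following hypothetical external table maps the same OSS path as oss_extable_paimon_pt but declares one extra column. The table name and the note column are illustrative, not part of the example above:

```sql
-- Hypothetical external table over the same Paimon directory,
-- declaring one column that the Paimon files do not contain.
CREATE EXTERNAL TABLE oss_extable_paimon_extra
(
    id   BIGINT,
    data STRING,
    note STRING  -- not present in the Paimon files; reads return NULL
)
PARTITIONED BY (dt STRING)
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
WITH serdeproperties (
    'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
LOCATION 'oss://oss-cn-<your region>-internal.aliyuncs.com/<table_path>';
```

After you load partitions, querying note from this table returns NULL for every row, because the Paimon files contain only the id, data, and dt columns.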
Supported data types
For MaxCompute data types, see Data type edition 1.0 and Data type edition 2.0.
Open source Paimon data type | MaxCompute 2.0 data type | Description |
TINYINT | TINYINT | 8-bit signed integer. |
SMALLINT | SMALLINT | 16-bit signed integer. |
INT | INT | 32-bit signed integer. |
BIGINT | BIGINT | 64-bit signed integer. |
BINARY(MAX_LENGTH) | BINARY | Binary data type. Maximum length is 8 MB. |
FLOAT | FLOAT | 32-bit binary floating-point number. |
DOUBLE | DOUBLE | 64-bit binary floating-point number. |
DECIMAL(precision,scale) | DECIMAL(precision,scale) | Exact decimal number. |
VARCHAR(n) | VARCHAR(n) | Variable-length character string. n is the length, ranging from 1 to 65535. |
CHAR(n) | CHAR(n) | Fixed-length character string. n is the length, ranging from 1 to 255. |
VARCHAR(MAX_LENGTH) | STRING | String type. Maximum length is 8 MB. |
DATE | DATE | Date type. |
TIME, TIME(p) | Not supported | The Paimon TIME type represents a time without a time zone, composed of hours, minutes, and seconds, with nanosecond precision. TIME(p) specifies a fractional-second precision from 0 to 9 (default is 0). MaxCompute has no corresponding data type. |
TIMESTAMP, TIMESTAMP(p) | TIMESTAMP_NTZ | Timestamp without time zone, precise to nanoseconds. Reading this type requires the native mode switch to be enabled. |
TIMESTAMP WITH LOCAL TIME_ZONE(9) | TIMESTAMP | |
TIMESTAMP WITH LOCAL TIME_ZONE(9) | DATETIME | Timestamp type precise to nanoseconds. |
BOOLEAN | BOOLEAN | Boolean type. |
ARRAY | ARRAY | Complex type. |
MAP | MAP | Complex type. |
ROW | STRUCT | Complex type. |
MULTISET<t> | Not supported | MaxCompute has no corresponding data type. |
VARBINARY, VARBINARY(n), BYTES | BINARY | Variable-length binary string. |
FAQ
Error kSIGABRT when reading a Paimon external table
Error message:
ODPS-0123144: Fuxi job failed - kSIGABRT(errCode:6) at Odps/*****_SQL_0_1_0_job_0/M1@f01b17437.cloud.eo166#3. Detail error msg: CRASH_CORE, maybe caused by jvm crash, please check your java udf/udaf/udtf. | fatalInstance: Odps/*****_SQL_0_1_0_job_0/M1#0_0

Cause:
This error occurs when reading TIMESTAMP_NTZ data in JNI mode.
Solution:
Before you read from the table, disable the native feature by running the command SET odps.sql.common.table.jni.disable.native=true;.
References
You can create a MaxCompute Paimon external table in Flink as a custom catalog, write data to it, and then query and consume the Paimon data from MaxCompute. For more information, see Create a MaxCompute Paimon external table based on Flink.