After you configure an Apache Paimon catalog, you can directly access the Apache Paimon tables that the catalog manages in Alibaba Cloud Object Storage Service (OSS) from Realtime Compute for Apache Flink. This topic describes how to create, view, and delete an Apache Paimon catalog and how to manage Apache Paimon databases and tables in the development console of Realtime Compute for Apache Flink.
Background information
Apache Paimon catalogs can be used to efficiently manage all Apache Paimon tables in the same directory. Apache Paimon catalogs can also be used by other Alibaba Cloud services. The following table describes the supported metadata storage types. You can select a metadata storage type based on your business requirements.
| Metadata storage type | Description | Other Alibaba Cloud services that can access Apache Paimon tables in an Apache Paimon catalog |
| --- | --- | --- |
| filesystem | Stores metadata only in a specific path in OSS. | Compute engines such as Spark, Hive, and Trino in E-MapReduce (EMR). For more information, see Apache Paimon of EMR. |
| dlf | Stores metadata in a specific path in OSS and synchronizes the metadata to Alibaba Cloud Data Lake Formation (DLF). | |
| maxcompute | Stores metadata in a specific path in OSS and creates, modifies, or deletes an external table in a specified MaxCompute project when you create, modify, or delete an Apache Paimon table. This helps you query the data of an Apache Paimon table in MaxCompute. | MaxCompute. For more information, see Apache Paimon external tables. |
| sync | Combines the features of Apache Paimon DLF catalogs and Apache Paimon MaxCompute catalogs. This helps you connect to Hologres and MaxCompute by using the same catalog. Note: In Apache Paimon Sync catalogs, the metadata of Apache Paimon DLF catalogs is synchronized to Apache Paimon MaxCompute catalogs. To ensure metadata consistency, do not manually modify or delete Apache Paimon external tables in MaxCompute. | |
Precautions
You can create and configure Apache Paimon catalogs and Apache Paimon tables only in Realtime Compute for Apache Flink that uses Ververica Runtime (VVR) 8.0.5 or later.
OSS is used to store files related to Apache Paimon tables, including data files and metadata files. Make sure that you have activated OSS and that the storage class of the OSS bucket is Standard. For more information, see Get started by using the OSS console and Overview.
Important: You can also use the OSS bucket that you specify when you activate the Realtime Compute for Apache Flink service. However, to better distinguish data and prevent misoperations, we recommend that you create and use a dedicated OSS bucket that resides in the same region as Realtime Compute for Apache Flink.
The OSS bucket that you specify when you create an Apache Paimon catalog must reside in the same region as the MaxCompute project. The AccessKey pair that you specify when you create the Apache Paimon catalog must belong to an account that has read and write permissions on the OSS bucket, MaxCompute project, and DLF directory.
After you create or delete a catalog, database, or table by using SQL statements, you can click the refresh icon to refresh the Catalogs page.
Create an Apache Paimon catalog
All the preceding metadata storage types allow you to create an Apache Paimon catalog by using an SQL statement, whereas only the filesystem and dlf metadata storage types allow you to create an Apache Paimon catalog on the UI. The parameters for both methods are basically the same. This section describes how to create an Apache Paimon catalog by using an SQL statement and describes the required parameters.
Create an Apache Paimon catalog
Create an Apache Paimon Filesystem catalog
CREATE CATALOG `my-catalog` WITH (
'type' = 'paimon',
'metastore' = 'filesystem',
'warehouse' = '<warehouse>',
'fs.oss.endpoint' = '<fs.oss.endpoint>',
'fs.oss.accessKeyId' = '<fs.oss.accessKeyId>',
'fs.oss.accessKeySecret' = '<fs.oss.accessKeySecret>'
);
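After the statement succeeds, the catalog can be used directly in SQL drafts. The following is a minimal hedged sketch; `my_table` is a placeholder name, not an object created in this topic:

```sql
-- Switch to the new catalog. A database named default is created automatically.
USE CATALOG `my-catalog`;
-- Tables can also be referenced by their fully qualified names from any catalog:
-- SELECT * FROM `my-catalog`.`default`.`my_table`;
```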
The following table describes the parameters in the sample code.
Common parameters

| Parameter | Description | Required | Remarks |
| --- | --- | --- | --- |
| my-catalog | The name of the Apache Paimon catalog. | Yes | Enter a custom name. |
| type | The type of the catalog. | Yes | Set the value to paimon. |
| metastore | The metadata storage type. | Yes | Valid values: filesystem for an Apache Paimon Filesystem catalog, dlf for an Apache Paimon DLF catalog, maxcompute for an Apache Paimon MaxCompute catalog, and sync for an Apache Paimon Sync catalog. |

OSS parameters

| Parameter | Description | Required | Remarks |
| --- | --- | --- | --- |
| warehouse | The data warehouse directory that is specified in OSS. | Yes | The format is oss://<bucket>/<object>, where bucket is the name of the OSS bucket that you created and object is the path in which your data is stored. You can view the bucket name and object path in the OSS console. |
| fs.oss.endpoint | The endpoint of OSS. | Yes | If OSS resides in the same region as Realtime Compute for Apache Flink, use the virtual private cloud (VPC) endpoint of OSS. Otherwise, use the public endpoint. If you want to store Apache Paimon tables in OSS-HDFS, the value must be in the cn-<region>.oss-dls.aliyuncs.com format. Example: cn-hangzhou.oss-dls.aliyuncs.com. |
| fs.oss.accessKeyId | The AccessKey ID of the Alibaba Cloud account or Resource Access Management (RAM) user that has read and write permissions on OSS. | Yes | For more information about how to obtain the required information, see Regions and endpoints and Create an AccessKey pair. |
| fs.oss.accessKeySecret | The AccessKey secret of the Alibaba Cloud account or RAM user that has read and write permissions on OSS. | Yes | |
Create an Apache Paimon DLF catalog
CREATE CATALOG `my-catalog` WITH (
'type' = 'paimon',
'metastore' = 'dlf',
'warehouse' = '<warehouse>',
'dlf.catalog.id' = '<dlf.catalog.id>',
'dlf.catalog.accessKeyId' = '<dlf.catalog.accessKeyId>',
'dlf.catalog.accessKeySecret' = '<dlf.catalog.accessKeySecret>',
'dlf.catalog.endpoint' = '<dlf.catalog.endpoint>',
'dlf.catalog.region' = '<dlf.catalog.region>',
'fs.oss.endpoint' = '<fs.oss.endpoint>',
'fs.oss.accessKeyId' = '<fs.oss.accessKeyId>',
'fs.oss.accessKeySecret' = '<fs.oss.accessKeySecret>'
);
The following table describes the parameters in the sample code.
Common parameters

| Parameter | Description | Required | Remarks |
| --- | --- | --- | --- |
| my-catalog | The name of the Apache Paimon catalog. | Yes | Enter a custom name. |
| type | The type of the catalog. | Yes | Set the value to paimon. |
| metastore | The metadata storage type. | Yes | Valid values: filesystem for an Apache Paimon Filesystem catalog, dlf for an Apache Paimon DLF catalog, maxcompute for an Apache Paimon MaxCompute catalog, and sync for an Apache Paimon Sync catalog. |

OSS parameters

| Parameter | Description | Required | Remarks |
| --- | --- | --- | --- |
| warehouse | The data warehouse directory that is specified in OSS. | Yes | The format is oss://<bucket>/<object>, where bucket is the name of the OSS bucket that you created and object is the path in which your data is stored. You can view the bucket name and object path in the OSS console. |
| fs.oss.endpoint | The endpoint of OSS. | Yes | If OSS resides in the same region as Realtime Compute for Apache Flink, use the virtual private cloud (VPC) endpoint of OSS. Otherwise, use the public endpoint. If you want to store Apache Paimon tables in OSS-HDFS, the value must be in the cn-<region>.oss-dls.aliyuncs.com format. Example: cn-hangzhou.oss-dls.aliyuncs.com. |
| fs.oss.accessKeyId | The AccessKey ID of the Alibaba Cloud account or Resource Access Management (RAM) user that has read and write permissions on OSS. | Yes | For more information about how to obtain the required information, see Regions and endpoints and Create an AccessKey pair. |
| fs.oss.accessKeySecret | The AccessKey secret of the Alibaba Cloud account or RAM user that has read and write permissions on OSS. | Yes | |

DLF parameters

| Parameter | Description | Required | Remarks |
| --- | --- | --- | --- |
| dlf.catalog.id | The ID of the DLF data directory. | Yes | You can view the ID of the data directory in the DLF console. |
| dlf.catalog.accessKeyId | The AccessKey ID that is used to access the DLF service. | Yes | For more information about how to obtain your AccessKey ID, see Create an AccessKey pair. |
| dlf.catalog.accessKeySecret | The AccessKey secret that is used to access the DLF service. | Yes | For more information about how to obtain your AccessKey secret, see Create an AccessKey pair. |
| dlf.catalog.endpoint | The endpoint of the DLF service. | Yes | For more information, see Supported regions and endpoints. Note: If DLF resides in the same region as Realtime Compute for Apache Flink, use the VPC endpoint of DLF. Otherwise, use the public endpoint. |
| dlf.catalog.region | The region in which the DLF service resides. | Yes | For more information, see Supported regions and endpoints. Note: Make sure that the value of this parameter matches the endpoint that is specified by the dlf.catalog.endpoint parameter. |
Create an Apache Paimon MaxCompute catalog
Prerequisites
The paimon_maxcompute_connector.jar file is uploaded to your MaxCompute project by using one of the following methods:
- Use the MaxCompute client (odpscmd) to access the MaxCompute project, and run the `ADD JAR <path_to_paimon_maxcompute_connector.jar>;` command to upload the Apache Paimon plug-in file to the MaxCompute project.
- Create a resource in the DataWorks console to upload the Apache Paimon plug-in file to the MaxCompute project. For more information, see Create and use MaxCompute resources.
SQL statement
CREATE CATALOG `my-catalog` WITH (
  'type' = 'paimon',
  'metastore' = 'maxcompute',
  'warehouse' = '<warehouse>',
  'maxcompute.endpoint' = '<maxcompute.endpoint>',
  'maxcompute.project' = '<maxcompute.project>',
  'maxcompute.accessid' = '<maxcompute.accessid>',
  'maxcompute.accesskey' = '<maxcompute.accesskey>',
  'maxcompute.oss.endpoint' = '<maxcompute.oss.endpoint>',
  'fs.oss.endpoint' = '<fs.oss.endpoint>',
  'fs.oss.accessKeyId' = '<fs.oss.accessKeyId>',
  'fs.oss.accessKeySecret' = '<fs.oss.accessKeySecret>'
);
Note: When you create an Apache Paimon table in the Apache Paimon MaxCompute catalog, an Apache Paimon external table is automatically created in the MaxCompute project. To query the Apache Paimon external table in MaxCompute, you must execute the following SET statements before you execute the SELECT statement in MaxCompute. For more information, see Step 4: Read data from the Apache Paimon external table on the MaxCompute client (odpscmd) or by using a tool that can execute MaxCompute SQL statements.

SET odps.sql.common.table.planner.ext.hive.bridge = true;
SET odps.sql.hive.compatible = true;
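For example, a hedged sketch of such a query session on the MaxCompute client; `my_table` is a placeholder for the automatically created Apache Paimon external table:

```sql
-- Run in MaxCompute (for example, on the odpscmd client), not in Flink SQL.
SET odps.sql.common.table.planner.ext.hive.bridge = true;
SET odps.sql.hive.compatible = true;
-- my_table is a placeholder name for the Apache Paimon external table.
SELECT * FROM my_table LIMIT 10;
```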
The following table describes the parameters in the sample code.
Common parameters

| Parameter | Description | Required | Remarks |
| --- | --- | --- | --- |
| my-catalog | The name of the Apache Paimon catalog. | Yes | Enter a custom name. |
| type | The type of the catalog. | Yes | Set the value to paimon. |
| metastore | The metadata storage type. | Yes | Valid values: filesystem for an Apache Paimon Filesystem catalog, dlf for an Apache Paimon DLF catalog, maxcompute for an Apache Paimon MaxCompute catalog, and sync for an Apache Paimon Sync catalog. |

OSS parameters

| Parameter | Description | Required | Remarks |
| --- | --- | --- | --- |
| warehouse | The data warehouse directory that is specified in OSS. | Yes | The format is oss://<bucket>/<object>, where bucket is the name of the OSS bucket that you created and object is the path in which your data is stored. You can view the bucket name and object path in the OSS console. |
| fs.oss.endpoint | The endpoint of OSS. | Yes | If OSS resides in the same region as Realtime Compute for Apache Flink, use the VPC endpoint of OSS. Otherwise, use the public endpoint. Note: The fs.oss.endpoint, fs.oss.accessKeyId, and fs.oss.accessKeySecret parameters are required if the OSS bucket specified by the warehouse parameter does not reside in the same region as the Realtime Compute for Apache Flink workspace, or if an OSS bucket within another Alibaba Cloud account is used. |
| fs.oss.accessKeyId | The AccessKey ID of the Alibaba Cloud account or RAM user that has read and write permissions on OSS. | Yes | For more information about how to obtain the required information, see Regions and endpoints and Create an AccessKey pair. |
| fs.oss.accessKeySecret | The AccessKey secret of the Alibaba Cloud account or RAM user that has read and write permissions on OSS. | Yes | |

MaxCompute parameters

| Parameter | Description | Required | Remarks |
| --- | --- | --- | --- |
| maxcompute.endpoint | The endpoint of the MaxCompute service. | Yes | For more information, see Endpoints. |
| maxcompute.project | The name of the MaxCompute project. | Yes | MaxCompute projects for which the schema feature is enabled are not supported. |
| maxcompute.accessid | The AccessKey ID of the Alibaba Cloud account that has permissions on MaxCompute. | Yes | For more information about how to obtain the AccessKey ID, see Create an AccessKey pair. |
| maxcompute.accesskey | The AccessKey secret of the Alibaba Cloud account that has permissions on MaxCompute. | Yes | For more information about how to obtain the AccessKey secret, see Create an AccessKey pair. |
| maxcompute.oss.endpoint | The endpoint that is used to access OSS from MaxCompute. | No | If you do not configure this parameter, the value of the fs.oss.endpoint parameter is used by default. For more information, see Regions and endpoints. Note: We recommend that you set the maxcompute.oss.endpoint parameter to an internal endpoint because the OSS bucket resides in the same region as the MaxCompute project. |
| maxcompute.life-cycle | The lifecycle of the MaxCompute external table. | No | Unit: days. |
Create an Apache Paimon Sync catalog
CREATE CATALOG `my-catalog` WITH (
'type' = 'paimon',
'metastore' = 'sync',
'source' = 'dlf',
'target' = 'maxcompute',
'warehouse' = '<warehouse>',
'dlf.catalog.id' = '<dlf.catalog.id>',
'dlf.catalog.accessKeyId' = '<dlf.catalog.accessKeyId>',
'dlf.catalog.accessKeySecret' = '<dlf.catalog.accessKeySecret>',
'dlf.catalog.endpoint' = '<dlf.catalog.endpoint>',
'dlf.catalog.region' = '<dlf.catalog.region>',
'maxcompute.endpoint' = '<maxcompute.endpoint>',
'maxcompute.project' = '<maxcompute.project>',
'maxcompute.accessid' = '<maxcompute.accessid>',
'maxcompute.accesskey' = '<maxcompute.accesskey>',
'maxcompute.oss.endpoint' = '<maxcompute.oss.endpoint>',
'fs.oss.endpoint' = '<fs.oss.endpoint>',
'fs.oss.accessKeyId' = '<fs.oss.accessKeyId>',
'fs.oss.accessKeySecret' = '<fs.oss.accessKeySecret>'
);
The following table describes the parameters in the sample code.
Common parameters

| Parameter | Description | Required | Remarks |
| --- | --- | --- | --- |
| my-catalog | The name of the Apache Paimon catalog. | Yes | Enter a custom name. |
| type | The type of the catalog. | Yes | Set the value to paimon. |
| metastore | The metadata storage type. | Yes | Valid values: filesystem for an Apache Paimon Filesystem catalog, dlf for an Apache Paimon DLF catalog, maxcompute for an Apache Paimon MaxCompute catalog, and sync for an Apache Paimon Sync catalog. |

Parameters only for Apache Paimon Sync catalogs

| Parameter | Description | Required | Remarks |
| --- | --- | --- | --- |
| source | The storage service from which metadata is synchronized. | Yes | Set the value to dlf. |
| target | The storage service to which metadata is synchronized. | Yes | Set the value to maxcompute. |

OSS parameters

| Parameter | Description | Required | Remarks |
| --- | --- | --- | --- |
| warehouse | The data warehouse directory that is specified in OSS. | Yes | The format is oss://<bucket>/<object>, where bucket is the name of the OSS bucket that you created and object is the path in which your data is stored. You can view the bucket name and object path in the OSS console. |
| fs.oss.endpoint | The endpoint of OSS. | Yes | If OSS resides in the same region as Realtime Compute for Apache Flink, use the VPC endpoint of OSS. Otherwise, use the public endpoint. Note: The fs.oss.endpoint, fs.oss.accessKeyId, and fs.oss.accessKeySecret parameters are required if the OSS bucket specified by the warehouse parameter does not reside in the same region as the Realtime Compute for Apache Flink workspace, or if an OSS bucket within another Alibaba Cloud account is used. |
| fs.oss.accessKeyId | The AccessKey ID of the Alibaba Cloud account or RAM user that has read and write permissions on OSS. | Yes | For more information about how to obtain the required information, see Regions and endpoints and Create an AccessKey pair. |
| fs.oss.accessKeySecret | The AccessKey secret of the Alibaba Cloud account or RAM user that has read and write permissions on OSS. | Yes | |

DLF parameters

| Parameter | Description | Required | Remarks |
| --- | --- | --- | --- |
| dlf.catalog.id | The ID of the DLF data directory. | Yes | You can view the ID of the data directory in the DLF console. |
| dlf.catalog.accessKeyId | The AccessKey ID that is used to access the DLF service. | Yes | For more information about how to obtain your AccessKey ID, see Create an AccessKey pair. |
| dlf.catalog.accessKeySecret | The AccessKey secret that is used to access the DLF service. | Yes | For more information about how to obtain your AccessKey secret, see Create an AccessKey pair. |
| dlf.catalog.endpoint | The endpoint of the DLF service. | Yes | For more information, see Supported regions and endpoints. Note: If DLF resides in the same region as Realtime Compute for Apache Flink, use the VPC endpoint of DLF. Otherwise, use the public endpoint. |
| dlf.catalog.region | The region in which the DLF service resides. | Yes | For more information, see Supported regions and endpoints. Note: Make sure that the value of this parameter matches the endpoint that is specified by the dlf.catalog.endpoint parameter. |

MaxCompute parameters

| Parameter | Description | Required | Remarks |
| --- | --- | --- | --- |
| maxcompute.endpoint | The endpoint of the MaxCompute service. | Yes | For more information, see Endpoints. |
| maxcompute.project | The name of the MaxCompute project. | Yes | MaxCompute projects for which the schema feature is enabled are not supported. |
| maxcompute.accessid | The AccessKey ID of the Alibaba Cloud account that has permissions on MaxCompute. | Yes | For more information about how to obtain the AccessKey ID, see Create an AccessKey pair. |
| maxcompute.accesskey | The AccessKey secret of the Alibaba Cloud account that has permissions on MaxCompute. | Yes | For more information about how to obtain the AccessKey secret, see Create an AccessKey pair. |
| maxcompute.oss.endpoint | The endpoint that is used to access OSS from MaxCompute. | No | If you do not configure this parameter, the value of the fs.oss.endpoint parameter is used by default. For more information, see Regions and endpoints. Note: We recommend that you set the maxcompute.oss.endpoint parameter to an internal endpoint because the OSS bucket resides in the same region as the MaxCompute project. |
| maxcompute.life-cycle | The lifecycle of the MaxCompute external table. | No | Unit: days. |
Manage an Apache Paimon database
In the script editor, enter and select the following code, and click Run in the upper-left corner of the script editor.
Create a database
After you create an Apache Paimon catalog, a database named default is automatically created in the catalog.

-- Replace my-catalog with the name of the Apache Paimon catalog that you created.
USE CATALOG `my-catalog`;
-- Replace my_db with a custom database name.
CREATE DATABASE `my_db`;
Delete a database
Important: You cannot delete the default database from an Apache Paimon DLF catalog, an Apache Paimon MaxCompute catalog, or an Apache Paimon Sync catalog. You can delete the default database only from an Apache Paimon Filesystem catalog.

-- Replace my-catalog with the name of the Apache Paimon catalog that you created.
USE CATALOG `my-catalog`;
-- Replace my_db with the name of the database that you want to delete.
-- Delete the database. This works only if the database does not contain tables.
DROP DATABASE `my_db`;
-- Delete the database and all tables in the database.
DROP DATABASE `my_db` CASCADE;
Manage an Apache Paimon table
Create an Apache Paimon table
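The console snippet for this step is not reproduced in this topic. As a hedged sketch, an Apache Paimon table can be created with a regular Flink SQL CREATE TABLE statement executed in the catalog; all object and column names below (my_db, my_table, order_id, order_amount, dt) are placeholder examples:

```sql
-- Run in the Apache Paimon catalog that you created.
USE CATALOG `my-catalog`;
USE `my_db`;

-- A primary-key table partitioned by dt. For Apache Paimon primary-key
-- tables, the partition columns must be part of the primary key.
CREATE TABLE `my_table` (
  order_id BIGINT,
  order_amount DOUBLE,
  dt STRING,
  PRIMARY KEY (order_id, dt) NOT ENFORCED
) PARTITIONED BY (dt);
```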
Modify the schema of an Apache Paimon table
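As a hedged sketch of common schema changes, the following ALTER TABLE statements follow the Apache Paimon schema-evolution syntax for the Flink SQL engine; the column and property names are examples only:

```sql
USE CATALOG `my-catalog`;
USE `my_db`;

-- Add a nullable column (hypothetical column name).
ALTER TABLE `my_table` ADD (buyer_name STRING);

-- Rename an existing column.
ALTER TABLE `my_table` RENAME buyer_name TO customer_name;

-- Change a table property, for example how long snapshots are retained.
ALTER TABLE `my_table` SET ('snapshot.time-retained' = '2 h');
```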
Drop an Apache Paimon table
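As a hedged sketch, a table can be dropped with a standard DROP TABLE statement. Note that for tables managed by an Apache Paimon catalog this typically also removes the table's files in OSS, so proceed with caution:

```sql
USE CATALOG `my-catalog`;
USE `my_db`;

-- Drop the placeholder table. IF EXISTS avoids an error if it is absent.
DROP TABLE IF EXISTS `my_table`;
```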
View or delete an Apache Paimon catalog
In the Realtime Compute for Apache Flink console, find the workspace that you want to manage and click Console in the Actions column.
In the left-side navigation pane, click Catalogs. On the Catalogs page, view or delete an Apache Paimon catalog.
View an Apache Paimon catalog: On the Catalog List page, find the desired catalog and view the Name and Type columns of the catalog. To view the databases and tables in the catalog, click View in the Actions column.
Delete an Apache Paimon catalog: On the Catalog List page, find the catalog that you want to delete and click Delete in the Actions column.
Note: After the Apache Paimon catalog is deleted, only the catalog information on the Catalogs page in the Realtime Compute for Apache Flink namespace is deleted. The data files of the Apache Paimon tables are not deleted. After you delete the catalog, you can re-create it by executing an SQL statement and then reuse the Apache Paimon tables in the catalog.
You can also enter `DROP CATALOG <catalog name>;` in the script editor, select the code, and then click Run in the upper-left corner of the script editor.
References
If the built-in catalogs of Realtime Compute for Apache Flink cannot meet your business requirements, you can use custom catalogs. For more information, see Manage custom catalogs.