This topic describes how to use the basic features of Apache Paimon in the development console of Realtime Compute for Apache Flink: creating and deleting Apache Paimon catalogs, creating and deleting Apache Paimon tables, writing data to Apache Paimon tables, and updating and consuming data in Apache Paimon tables.
Prerequisites
If you want to use a RAM user or RAM role to access the development console of Realtime Compute for Apache Flink, make sure that the RAM user or RAM role has the required permissions. For more information, see Permission management.
A workspace is created. For more information, see Activate Realtime Compute for Apache Flink.
Object Storage Service (OSS) is activated and an OSS bucket whose storage class is Standard is created. For more information, see Get started by using the OSS console. OSS is used to store files related to Apache Paimon tables, such as data files and metadata files.
Only Realtime Compute for Apache Flink that uses Ververica Runtime (VVR) 8.0.5 or later supports Apache Paimon tables.
Step 1: Create an Apache Paimon catalog
Go to the Scripts tab.
Log on to the Realtime Compute for Apache Flink console.
Find the workspace that you want to manage and click Console in the Actions column.
In the left-side navigation pane, choose Development > Scripts. On the Scripts tab, create a script.
In the script editor, enter the following code to create an Apache Paimon catalog:
-- my-catalog is the name of the custom catalog.
CREATE CATALOG `my-catalog` WITH (
  'type' = 'paimon',
  'metastore' = 'filesystem',
  'warehouse' = '<warehouse>',
  'fs.oss.endpoint' = '<fs.oss.endpoint>',
  'fs.oss.accessKeyId' = '<fs.oss.accessKeyId>',
  'fs.oss.accessKeySecret' = '<fs.oss.accessKeySecret>'
);
The following list describes the parameters.
- type (required): The type of the catalog. Set the value to paimon.
- metastore (required): The metadata storage type. In this example, the parameter is set to filesystem. For more information about other types, see Manage Apache Paimon catalogs.
- warehouse (required): The data warehouse directory in OSS, in the oss://<bucket>/<object> format. bucket is the name of the OSS bucket that you created, and object is the path in which your data is stored. You can view the bucket name and object name in the OSS console.
- fs.oss.endpoint (optional): The endpoint of OSS. This parameter is required if the OSS bucket specified by the warehouse parameter is not in the same region as the Realtime Compute for Apache Flink workspace, or if the bucket belongs to another Alibaba Cloud account. For more information, see Regions and endpoints.
  Note: To store the Apache Paimon table in OSS-HDFS, you must configure the fs.oss.endpoint, fs.oss.accessKeyId, and fs.oss.accessKeySecret parameters. In this case, the value of the fs.oss.endpoint parameter is in the cn-<region>.oss-dls.aliyuncs.com format, such as cn-hangzhou.oss-dls.aliyuncs.com.
- fs.oss.accessKeyId (optional): The AccessKey ID of the Alibaba Cloud account or RAM user that has read and write permissions on OSS. This parameter is required under the same conditions as fs.oss.endpoint. For more information about how to obtain an AccessKey pair, see Create an AccessKey pair.
- fs.oss.accessKeySecret (optional): The AccessKey secret of the Alibaba Cloud account or RAM user that has read and write permissions on OSS. This parameter is required under the same conditions as fs.oss.endpoint.
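For reference, the following sketch shows the statement with the placeholders filled in. The bucket name examplebucket, the warehouse path, and the cn-hangzhou endpoint are hypothetical values for illustration; replace them with your own.

-- A sketch with hypothetical values; replace the bucket, path, endpoint, and AccessKey pair with your own.
CREATE CATALOG `my-catalog` WITH (
  'type' = 'paimon',
  'metastore' = 'filesystem',
  'warehouse' = 'oss://examplebucket/warehouse',
  -- The following three parameters can be omitted if the bucket is in the same region
  -- and the same Alibaba Cloud account as the workspace.
  'fs.oss.endpoint' = 'oss-cn-hangzhou-internal.aliyuncs.com',
  'fs.oss.accessKeyId' = '<yourAccessKeyId>',
  'fs.oss.accessKeySecret' = '<yourAccessKeySecret>'
);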
Select the code for creating the Apache Paimon catalog, and click Run on the left side of the script editor.
If the "The following statement has been executed successfully!" message is returned, the catalog is created.
Step 2: Create an Apache Paimon table
On the Scripts tab, enter the following code in the script editor to create an Apache Paimon database named my_db and an Apache Paimon table named my_tbl:
CREATE DATABASE `my-catalog`.`my_db`;

CREATE TABLE `my-catalog`.`my_db`.`my_tbl` (
  dt STRING,
  id BIGINT,
  content STRING,
  PRIMARY KEY (dt, id) NOT ENFORCED
) PARTITIONED BY (dt) WITH (
  'changelog-producer' = 'lookup'
);
Note: In this example, the changelog-producer parameter is set to lookup in the WITH clause to use the lookup policy to generate change logs. This way, data can be consumed from the Apache Paimon table in streaming mode. For more information about change log generation, see Change data generation mechanism.
Select the code for creating the Apache Paimon database and the Apache Paimon table, and click Run on the left side of the script editor.
If the "The following statement has been executed successfully!" message is returned, the Apache Paimon database named my_db and the Apache Paimon table named my_tbl are created.
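As an optional check that is not part of the original procedure, you can print the full definition of the new table, including its WITH options, by running a statement such as the following on the Scripts tab:

-- Optional: show the complete definition of my_tbl, including 'changelog-producer'.
SHOW CREATE TABLE `my-catalog`.`my_db`.`my_tbl`;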
Step 3: Write data to the Apache Paimon table
On the Drafts tab of the SQL Editor page, click New. On the SQL Scripts tab of the New Draft dialog box, click Blank Stream Draft. For more information about how to develop an SQL draft, see Develop an SQL draft. Copy the following INSERT statement to the SQL editor:

-- The Apache Paimon result table commits data only after each checkpoint is complete.
-- In this example, the checkpointing interval is reduced to 10s to help you quickly obtain the results.
-- In the production environment, set the checkpointing interval and the minimum pause between checkpoints
-- based on your business requirements for latency. In most cases, they are set to 1 to 10 minutes.
SET 'execution.checkpointing.interval' = '10s';

INSERT INTO `my-catalog`.`my_db`.`my_tbl` VALUES
  ('20240108', 1, 'apple'),
  ('20240108', 2, 'banana'),
  ('20240109', 1, 'cat'),
  ('20240109', 2, 'dog');
In the upper-right corner of the SQL editor page, click Deploy. In the Deploy draft dialog box, configure the parameters and click Confirm.
On the Deployments page, find the desired deployment, and click Start in the Actions column. In the Start Job panel, select Initial Mode, and click Start.
If the deployment status changes to FINISHED, data is written to the Apache Paimon table.
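If you want to verify the written rows before moving on, a one-off bounded query is one option. The following is a minimal sketch that assumes you run it as a separate draft or script in batch mode; it is not part of the original procedure.

-- Run the query as a bounded batch job instead of an unbounded streaming job.
SET 'execution.runtime-mode' = 'batch';
-- Read the current snapshot of the table; the four rows inserted above should be returned.
SELECT * FROM `my-catalog`.`my_db`.`my_tbl`;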
Step 4: Consume data from the Apache Paimon table in streaming mode
Create a blank streaming draft, and copy the following code to the SQL editor. The code uses the Print connector to export all data from the my_tbl table to logs.
CREATE TEMPORARY TABLE Print (
  dt STRING,
  id BIGINT,
  content STRING
) WITH (
  'connector' = 'print'
);

INSERT INTO Print SELECT * FROM `my-catalog`.`my_db`.`my_tbl`;
In the upper-right corner of the SQL editor page, click Deploy. In the Deploy draft dialog box, configure the parameters and click Confirm.
On the Deployments page, find the desired deployment, and click Start in the Actions column. In the Start Job panel, select Initial Mode, and click Start.
On the Deployments page, view the computing result.
In the left-side navigation pane, click Deployments. On the Deployments page, click the name of the deployment that you want to manage.
On the Logs tab, click the value in the Path, ID column on the Running Task Managers tab.
Click the Stdout tab to view the consumed Apache Paimon data.
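For the four rows inserted in Step 3, the Stdout tab should show output similar to the following sketch. The exact format can vary by Flink version and parallelism (for example, each line may carry a subtask prefix), but +I marks an insert record:

+I[20240108, 1, apple]
+I[20240108, 2, banana]
+I[20240109, 1, cat]
+I[20240109, 2, dog]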
Step 5: Update data in the Apache Paimon table
Create a blank streaming draft, and copy the following code to the SQL editor:
SET 'execution.checkpointing.interval' = '10s';

INSERT INTO `my-catalog`.`my_db`.`my_tbl` VALUES
  ('20240108', 1, 'hello'),
  ('20240109', 2, 'world');
In the upper-right corner of the SQL editor page, click Deploy. In the Deploy draft dialog box, configure the parameters and click Confirm.
On the Deployments page, find the desired deployment, and click Start in the Actions column. In the Start Job panel, select Initial Mode, and click Start.
If the deployment status changes to FINISHED, data is written to the Apache Paimon table.
Go to the Stdout tab from the Deployments page as described in Step 4, and view the data that is updated in the Apache Paimon table.
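Because (dt, id) is the primary key of my_tbl, the two new rows overwrite the existing rows with the same keys, and the lookup changelog producer emits update-before (-U) and update-after (+U) records for them. The Stdout output may therefore look similar to the following sketch, with the exact ordering and prefixes depending on your Flink version and parallelism:

-U[20240108, 1, apple]
+U[20240108, 1, hello]
-U[20240109, 2, dog]
+U[20240109, 2, world]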
(Optional) Step 6: Cancel the deployment in which data is consumed in streaming mode and clear the resources
After the test is complete, you can perform the following steps to cancel the deployment in which data is consumed in streaming mode and clear the resources:
On the Deployments page, find the deployment that you want to cancel and click Cancel in the Actions column.
On the SQL Editor page, click the Scripts tab. In the script editor, enter the following code to delete the Apache Paimon data files and the Apache Paimon catalog:
-- Delete the Apache Paimon database and all of its data files stored in OSS.
DROP DATABASE `my-catalog`.`my_db` CASCADE;
-- Delete the Apache Paimon catalog from the metadata in the development console of
-- Realtime Compute for Apache Flink. Data files stored in OSS are not deleted.
DROP CATALOG `my-catalog`;
If the "The following statement has been executed successfully!" message is returned, the Apache Paimon data files and the Apache Paimon catalog are deleted.
References
For more information about how to write data to or consume data from an Apache Paimon table, see Write data to or consume data from an Apache Paimon table.
For more information about how to modify the schema of an Apache Paimon table, such as adding a column and changing the data type of a column, and how to temporarily modify the parameters of an Apache Paimon table, see Modify the schema of an Apache Paimon table.
For more information about how to optimize Apache Paimon primary key tables and Append Scalable tables in different scenarios, see Optimize performance of Apache Paimon tables.
For more information about how to resolve issues related to Apache Paimon, see FAQ about upstream and downstream storage.