
Realtime Compute for Apache Flink:Getting started with basic features of Apache Paimon

Last Updated: Sep 12, 2024

This topic describes how to use the basic features of Apache Paimon in the development console of Realtime Compute for Apache Flink: creating and deleting Apache Paimon catalogs and tables, writing data to an Apache Paimon table, and updating and consuming data in the table.

Prerequisites

  • If you want to use a RAM user or RAM role to access the development console of Realtime Compute for Apache Flink, make sure that the RAM user or RAM role has the required permissions. For more information, see Permission management.

  • A workspace is created. For more information, see Activate Realtime Compute for Apache Flink.

  • Object Storage Service (OSS) is activated and an OSS bucket whose storage class is Standard is created. For more information, see Get started by using the OSS console. OSS is used to store files related to Apache Paimon tables, such as data files and metadata files.

  • Apache Paimon tables are supported only in Realtime Compute for Apache Flink that uses Ververica Runtime (VVR) 8.0.5 or later.

Step 1: Create an Apache Paimon catalog

  1. Go to the Scripts tab.

    1. Log on to the Realtime Compute for Apache Flink console.

    2. Find the workspace that you want to manage and click Console in the Actions column.

    3. In the left-side navigation pane, click Development > Scripts. On the Scripts tab, create a script.

  2. In the script editor, enter the following code to create an Apache Paimon catalog:

    -- my-catalog is the name of the custom catalog.
    CREATE CATALOG `my-catalog` WITH (
      'type' = 'paimon',
      'metastore' = 'filesystem',
      'warehouse' = '<warehouse>',
      'fs.oss.endpoint' = '<fs.oss.endpoint>',
      'fs.oss.accessKeyId' = '<fs.oss.accessKeyId>',
      'fs.oss.accessKeySecret' = '<fs.oss.accessKeySecret>'
    );

    The following list describes the parameters.

    • type (required): The type of the catalog. Set the value to paimon.

    • metastore (required): The metadata storage type. In this example, the parameter is set to filesystem. For more information about other types, see Manage Apache Paimon catalogs.

    • warehouse (required): The data warehouse directory in OSS, in the oss://<bucket>/<object> format. bucket indicates the name of the OSS bucket that you created, and object indicates the path in which your data is stored. You can view the bucket name and object name in the OSS console.

    • fs.oss.endpoint (optional): The endpoint of OSS. This parameter is required if the OSS bucket specified by the warehouse parameter is not in the same region as the Realtime Compute for Apache Flink workspace, or if an OSS bucket within another Alibaba Cloud account is used. For more information, see Regions and endpoints.

      Note: To store the Apache Paimon table in OSS-HDFS, you must configure the fs.oss.endpoint, fs.oss.accessKeyId, and fs.oss.accessKeySecret parameters. The value of the fs.oss.endpoint parameter is in the cn-<region>.oss-dls.aliyuncs.com format, such as cn-hangzhou.oss-dls.aliyuncs.com.

    • fs.oss.accessKeyId (optional): The AccessKey ID of the Alibaba Cloud account or RAM user that has read and write permissions on OSS. This parameter is required under the same conditions as fs.oss.endpoint. For more information about how to obtain an AccessKey pair, see Create an AccessKey pair.

    • fs.oss.accessKeySecret (optional): The AccessKey secret of the Alibaba Cloud account or RAM user that has read and write permissions on OSS. This parameter is required under the same conditions as fs.oss.endpoint.
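    For reference, a filled-in version of the statement might look like the following sketch. The bucket name, path, and endpoint here are hypothetical placeholders; substitute your own values, and avoid storing real AccessKey credentials in shared scripts.

    CREATE CATALOG `my-catalog` WITH (
      'type' = 'paimon',
      'metastore' = 'filesystem',
      -- Hypothetical bucket and path; replace with your own.
      'warehouse' = 'oss://my-test-bucket/paimon-warehouse',
      -- Needed only for cross-region or cross-account buckets, or OSS-HDFS.
      'fs.oss.endpoint' = 'oss-cn-hangzhou.aliyuncs.com',
      'fs.oss.accessKeyId' = '<yourAccessKeyId>',
      'fs.oss.accessKeySecret' = '<yourAccessKeySecret>'
    );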

  3. Select the code for creating the Apache Paimon catalog, and click Run on the left side of the script editor.

    If the "The following statement has been executed successfully!" message is returned, the catalog is created.

Step 2: Create an Apache Paimon table

  1. On the Scripts tab, enter the following code in the script editor to create an Apache Paimon database named my_db and an Apache Paimon table named my_tbl:

    CREATE DATABASE `my-catalog`.`my_db`;
    CREATE TABLE `my-catalog`.`my_db`.`my_tbl` (
      dt STRING,
      id BIGINT,
      content STRING,
      PRIMARY KEY (dt, id) NOT ENFORCED
    ) PARTITIONED BY (dt) WITH (
      'changelog-producer' = 'lookup'  
    );
    Note

    In this example, the changelog-producer parameter is set to lookup in the WITH clause to use the lookup policy to generate change logs. This way, data can be consumed from the Apache Paimon table in streaming mode. For more information about change log generation, see Change data generation mechanism.

  2. Select the code for creating the Apache Paimon database and the Apache Paimon table, and click Run on the left side of the script editor.

    If the "The following statement has been executed successfully!" message is returned, the Apache Paimon database named my_db and the Apache Paimon table named my_tbl are created.
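To confirm that the database and table exist, you can run a few catalog statements on the Scripts tab. This is an optional check that uses only standard Flink SQL statements; the exact output format depends on your console version.

    USE CATALOG `my-catalog`;
    SHOW DATABASES;    -- my_db should appear in the list.
    USE `my_db`;
    SHOW TABLES;       -- my_tbl should appear in the list.
    DESCRIBE `my_tbl`; -- Displays the table schema.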

Step 3: Write data to the Apache Paimon table

  1. On the Drafts tab of the Development > ETL page, click New. On the SQL Scripts tab of the New Draft dialog box, click Blank Stream Draft. For more information about how to develop an SQL draft, see Develop an SQL draft. Copy the following INSERT statement to the SQL editor:

    -- An Apache Paimon result table commits data only after each checkpoint is complete. 
    -- In this example, the checkpointing interval is reduced to 10s so that you can see results quickly. 
    -- In production, set the checkpointing interval and the minimum pause between checkpoints based on your latency requirements. In most cases, values of 1 to 10 minutes are appropriate. 
    SET 'execution.checkpointing.interval' = '10s';
    INSERT INTO `my-catalog`.`my_db`.`my_tbl` VALUES ('20240108',1,'apple'), ('20240108',2,'banana'), ('20240109',1,'cat'), ('20240109',2,'dog');
  2. In the upper-right corner of the SQL editor page, click Deploy. In the Deploy draft dialog box, configure the parameters and click Confirm.

  3. On the O&M > Deployments page, find the desired deployment, and click Start in the Actions column. In the Start Job panel, select Initial Mode, and click Start.

    If the deployment status changes to FINISHED, the data has been written to the Apache Paimon table.
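Before you set up streaming consumption, you can optionally inspect the written rows with a bounded batch query. The following sketch assumes the catalog from Step 1 is still registered; depending on your console version, SELECT statements may need to be run as a draft debug session rather than on the Scripts tab.

    SET 'execution.runtime-mode' = 'batch';
    -- Should return the four rows inserted above.
    SELECT * FROM `my-catalog`.`my_db`.`my_tbl` ORDER BY dt, id;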

Step 4: Consume data from the Apache Paimon table in streaming mode

  1. Create a blank streaming draft, and copy the following code to the SQL editor. The code uses the Print connector to export all data from the my_tbl table to logs.

    CREATE TEMPORARY TABLE Print (
      dt STRING,
      id BIGINT,
      content STRING
    ) WITH (
      'connector' = 'print'
    );
    INSERT INTO Print SELECT * FROM `my-catalog`.`my_db`.`my_tbl`;
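    The preceding query first reads a full snapshot of the table and then continuously consumes incremental changes. If you only want the changes produced after the job starts, Apache Paimon scan modes can be set through a dynamic table option, as in the following sketch; see the Apache Paimon documentation for the full list of scan.mode values.

    -- Consume only the changes produced after the job starts.
    INSERT INTO Print SELECT * FROM `my-catalog`.`my_db`.`my_tbl` /*+ OPTIONS('scan.mode' = 'latest') */;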
  2. In the upper-right corner of the SQL editor page, click Deploy. In the Deploy draft dialog box, configure the parameters and click Confirm.

  3. On the O&M > Deployments page, find the desired deployment, and click Start in the Actions column. In the Start Job panel, select Initial Mode, and click Start.

  4. On the Deployments page, view the computing result.

    1. In the left-side navigation pane, click O&M > Deployments. On the Deployments page, click the name of the deployment that you want to manage.

    2. On the Logs tab, click the value in the Path, ID column on the Running Task Managers tab.

    3. Click the Stdout tab to view the consumed Apache Paimon data.


Step 5: Update data in the Apache Paimon table

  1. Create a blank streaming draft, and copy the following code to the SQL editor:

    SET 'execution.checkpointing.interval' = '10s';
    INSERT INTO `my-catalog`.`my_db`.`my_tbl` VALUES ('20240108', 1, 'hello'), ('20240109', 2, 'world');
  2. In the upper-right corner of the SQL editor page, click Deploy. In the Deploy draft dialog box, configure the parameters and click Confirm.

  3. On the O&M > Deployments page, find the desired deployment, and click Start in the Actions column. In the Start Job panel, select Initial Mode, and click Start.

    If the deployment status changes to FINISHED, data is written to the Apache Paimon table.

  4. Go to the Stdout tab from the Deployments page as described in Step 4, and view the data that is updated in the Apache Paimon table.
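Because dt and id form the primary key and the table uses Apache Paimon's default deduplicate merge engine, the rows written in this step replace the existing records with the same keys instead of being appended. A bounded batch query such as the sketch in Step 3 should return the merged contents:

    SET 'execution.runtime-mode' = 'batch';
    SELECT * FROM `my-catalog`.`my_db`.`my_tbl` ORDER BY dt, id;
    -- Expected result, merged by primary key:
    -- ('20240108', 1, 'hello'), ('20240108', 2, 'banana'),
    -- ('20240109', 1, 'cat'),   ('20240109', 2, 'world')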


(Optional) Step 6: Cancel the deployment in which data is consumed in streaming mode and clear the resources

After the test is complete, you can perform the following steps to cancel the deployment in which data is consumed in streaming mode and clear the resources:

  1. On the O&M > Deployments page, find the deployment that you want to cancel and click Cancel in the Actions column.

  2. On the Scripts tab of the SQL Editor page, enter the following code in the script editor to delete the Apache Paimon data files and the Apache Paimon catalog:

    DROP DATABASE `my-catalog`.`my_db` CASCADE; -- Deletes all data files of the Apache Paimon database stored in OSS. 
    DROP CATALOG `my-catalog`; -- Deletes the Apache Paimon catalog from the metadata in the development console of Realtime Compute for Apache Flink. Data files stored in OSS are not deleted.

    If the "The following statement has been executed successfully!" message is returned, the Apache Paimon data files and the Apache Paimon catalog are deleted.
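As a final check, you can list the remaining catalogs to confirm that the deletion took effect:

    SHOW CATALOGS; -- my-catalog should no longer appear in the list.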
