Realtime Compute for Apache Flink:Manage DLF catalogs

Last Updated: Jun 28, 2024

After you create a Data Lake Formation (DLF) catalog, you can access the tables of the DLF catalog in the development console of Realtime Compute for Apache Flink. You do not need to register DLF tables. This helps improve draft development efficiency and ensure data correctness. This topic describes how to create, view, use, and drop a DLF catalog.

Background information

Data Lake Formation (DLF) is a unified metadata management service provided by Alibaba Cloud. You can use DLF to manage tables in open source formats, such as Iceberg, Hudi, Delta, Parquet, ORC, and Avro.

This topic describes the following operations that you can perform to manage DLF catalogs:

  • Create a DLF catalog

  • View a DLF catalog

  • Use a DLF catalog

  • Drop a DLF catalog

Prerequisites

The Alibaba Cloud DLF service is activated.

Limits

  • Only Realtime Compute for Apache Flink whose engine version is vvr-4.0.12-flink-1.13 or later supports DLF catalogs.

  • Realtime Compute for Apache Flink can manage only the Iceberg and Hudi data lake formats in DLF catalogs.

Create a DLF catalog

You can create a DLF catalog on the UI or by executing an SQL statement. We recommend that you create a DLF catalog on the UI.

Create a DLF catalog on the UI

  1. Go to the Catalogs page.

    1. Log on to the Realtime Compute for Apache Flink console.

    2. Find the workspace that you want to manage and click Console in the Actions column.

    3. In the left-side navigation pane, click Catalogs.

  2. On the Catalog List page, click Create Catalog.

  3. In the Create Catalog dialog box, click DLF on the Built-in Catalog tab in the Choose Catalog Type step and click Next.

  4. Create a DLF catalog.

    1. Configure the catalog information.

      The following list describes the parameters of a DLF catalog.

      catalogname
        Description: The name of the DLF catalog.
        Required: Yes
        Remarks: Set the value to a custom name.

      access.key.id
        Description: The AccessKey ID of your Alibaba Cloud account that is used to access Object Storage Service (OSS).
        Required: Yes
        Remarks: For more information about how to obtain an AccessKey pair, see Obtain an AccessKey pair.

      access.key.secret
        Description: The AccessKey secret of your Alibaba Cloud account that is used to access OSS.
        Required: Yes
        Remarks: For more information about how to obtain an AccessKey pair, see Obtain an AccessKey pair.

      warehouse
        Description: The default OSS path in which tables of the DLF catalog are stored.
        Required: Yes
        Remarks: The OSS and OSS-HDFS services are supported.
          • An OSS path is in the oss://<bucket>/<object> format.
          • An OSS-HDFS path is in the oss://<bucket>.<oss-hdfs-endpoint>/<object> format.
          Parameters in the path:
          • bucket: the name of the OSS bucket that you created. You can log on to the OSS console to view the name.
          • object: the path in which your data is stored. You can log on to the OSS console to view the path.
          • oss-hdfs-endpoint: the endpoint of the OSS-HDFS service. To view the endpoint, log on to the OSS console, click Buckets in the left-side navigation pane, click the name of the desired bucket, click Overview, and then view the endpoint of the OSS-HDFS service in the Port section.
          Note: Only Realtime Compute for Apache Flink that uses VVR 8.0.3 or later allows you to set this parameter to an OSS-HDFS path.

      oss.endpoint
        Description: The endpoint of OSS, such as oss-cn-hangzhou-internal.aliyuncs.com.
        Required: Yes
        Remarks: The OSS and OSS-HDFS services are supported.
          • For the endpoints of OSS, see Regions and endpoints.
          • For the endpoint of OSS-HDFS, view it in the OSS console in the same way as described for the warehouse parameter.

      dlf.endpoint
        Description: The endpoint of the DLF service.
        Required: Yes

      dlf.region-id
        Description: The ID of the region in which the DLF service resides.
        Required: Yes
        Remarks: Make sure that the region that you select matches the endpoint that you specify for dlf.endpoint.

      more configuration
        Description: Other parameters that you want to configure for the DLF catalog, such as dlf.catalog.id. Separate multiple parameters with line feeds.
        Required: No
        Remarks: Example: dlf.catalog.id:my_catalog.

    2. Click Confirm.

  5. View the catalog that you created in the Catalogs pane on the left side of the Catalog List page.
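
After you create the catalog, you can optionally verify it in the SQL editor. The following is a minimal sketch that assumes you named the catalog dlf, as in the other examples in this topic.

    SHOW CATALOGS;    -- the new catalog appears in the returned list
    USE CATALOG dlf;  -- switch to the DLF catalog
    SHOW DATABASES;   -- list the databases that the catalog exposes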

Create a DLF catalog by executing an SQL statement

  1. In the code editor of the Scripts tab on the SQL Editor page, enter the following statement to create a DLF catalog (a filled-in sample statement is provided after these steps):

    CREATE CATALOG <yourcatalogname> WITH (
       'type' = 'dlf',
       'access.key.id' = '<YourAliyunAccessKeyId>',
       'access.key.secret' = '<YourAliyunAccessKeySecret>',
       'warehouse' = '<YourAliyunOSSLocation>',
       'oss.endpoint' = '<YourAliyunOSSEndpoint>',
       'dlf.region-id' = '<YourAliyunDLFRegionId>',
       'dlf.endpoint' = '<YourAliyunDLFEndpoint>'
    );

    The following list describes the parameters in the statement.

    yourcatalogname
      Description: The name of the DLF catalog.
      Required: Yes
      Remarks: Set the value to a custom name.
        Important: You must remove the angle brackets (<>) when you replace the value of the parameter with the name of your catalog. Otherwise, an error is returned during the syntax check.

    type
      Description: The type of the catalog.
      Required: Yes
      Remarks: Set the value to dlf.

    access.key.id
      Description: The AccessKey ID of your Alibaba Cloud account.
      Required: Yes
      Remarks: For more information about how to obtain an AccessKey pair, see Obtain an AccessKey pair.

    access.key.secret
      Description: The AccessKey secret of your Alibaba Cloud account.
      Required: Yes
      Remarks: For more information about how to obtain an AccessKey pair, see Obtain an AccessKey pair.

    warehouse
      Description: The default OSS path in which tables of the DLF catalog are stored.
      Required: Yes
      Remarks: The path must be in the oss://<bucket>/<object> format. Parameters in the path:
        • bucket: the name of the OSS bucket that you created.
        • object: the path in which your data is stored.
        Note: Log on to the OSS console to view your bucket name and object name.

    oss.endpoint
      Description: The endpoint of OSS.
      Required: Yes
      Remarks: For more information, see Regions and endpoints.

    dlf.endpoint
      Description: The endpoint of the DLF service.
      Required: Yes

    dlf.region-id
      Description: The ID of the region in which the DLF service resides.
      Required: Yes
      Remarks: Make sure that the region that you select matches the endpoint that you specify for dlf.endpoint.

  2. Select the statement that creates the catalog and click Run on the left side of the code editor.

  3. In the Catalogs pane on the left side of the Catalog List page, view the catalog that you created.
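
The following filled-in example shows what the statement might look like for a catalog named dlf in the China (Hangzhou) region. All values are placeholders: the AccessKey pair is masked, the bucket name is hypothetical, and the endpoints are examples only, so replace them with the values for your own account and region.

    CREATE CATALOG dlf WITH (
      'type' = 'dlf',
      'access.key.id' = 'LTAI****************',
      'access.key.secret' = '****************************',
      'warehouse' = 'oss://my-dlf-bucket/flink/warehouse',
      'oss.endpoint' = 'oss-cn-hangzhou-internal.aliyuncs.com',
      'dlf.region-id' = 'cn-hangzhou',
      'dlf.endpoint' = 'dlf-vpc.cn-hangzhou.aliyuncs.com'
    );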

View a DLF catalog

After you create the DLF catalog, you can perform the following steps to view the DLF metadata.

  1. Go to the Catalog List page.

    1. Log on to the Realtime Compute for Apache Flink console.

    2. Find the workspace that you want to manage and click Console in the Actions column.

    3. In the left-side navigation pane, click Catalogs.

  2. On the Catalog List page, find the desired catalog and view the Catalog Name and Type columns of the catalog.

    Note

    If you want to view the databases and tables in the catalog, click View in the Actions column.

Use a DLF catalog

Manage DLF databases

In the code editor of the Scripts tab on the SQL Editor page, enter the following statement to create or drop a DLF database based on your business requirements. Select the statement and click Run on the left of the code editor. After you create or drop a DLF database, you can click the Catalogs tab on the left side of the SQL Editor page to check whether the DLF database is created or dropped.

  • Create a DLF database

    CREATE DATABASE dlf.dlf_testdb;
  • Drop a DLF database

    DROP DATABASE dlf.dlf_testdb;
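
If you rerun scripts, you can use the IF NOT EXISTS and IF EXISTS clauses so that the statements do not fail when the database already exists or has already been dropped. The following is a minimal sketch based on the database name used above.

    -- Create the database only if it does not already exist.
    CREATE DATABASE IF NOT EXISTS dlf.dlf_testdb;
    -- Drop the database only if it exists.
    DROP DATABASE IF EXISTS dlf.dlf_testdb;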

Manage DLF tables

  • Create a DLF table

    • Create a DLF table by using a connector

      Create a DLF table by executing an SQL statement

      In the code editor of the Scripts tab on the SQL Editor page, enter one of the following statements to create a DLF table by using a connector. Select the statement and click Run on the left of the code editor. After the DLF table is created, you can click the Catalogs tab on the left side of the SQL Editor page to view the DLF table that you created.

      CREATE TABLE dlf.dlf_testdb.iceberg (
        id    BIGINT,
        data  STRING,
        dt    STRING
      ) PARTITIONED BY (dt) WITH(
        'connector' = 'iceberg'
      );
      
      CREATE TABLE dlf.dlf_testdb.hudi (
        id    BIGINT PRIMARY KEY NOT ENFORCED,
        data  STRING,
        dt    STRING
      ) PARTITIONED BY (dt) WITH(
        'connector' = 'hudi'
      );

      Create a DLF table on the UI

      1. Go to the Catalogs page.

        1. Log on to the Realtime Compute for Apache Flink console.

        2. Find the workspace that you want to manage and click Console in the Actions column.

        3. In the left-side navigation pane, click Catalogs.

      2. On the Catalog List page, find the desired catalog and click View in the Actions column.

      3. On the page that appears, find the desired database and click View in the Actions column.

      4. On the page that appears, click Create Table.

      5. On the Built-in tab of the Create Table dialog box, click Connection Type and select a table type.

      6. Click Next.

      7. Enter the table creation statement and configure related parameters. Sample code:

        CREATE TABLE dlf.dlf_testdb.iceberg (
          id    BIGINT,
          data  STRING,
          dt    STRING
        ) PARTITIONED BY (dt) WITH(
          'connector' = 'iceberg'
        );
        
        CREATE TABLE dlf.dlf_testdb.hudi (
          id    BIGINT PRIMARY KEY NOT ENFORCED,
          data  STRING,
          dt    STRING
        ) PARTITIONED BY (dt) WITH(
          'connector' = 'hudi'
        );
      8. Click Confirm.

    • Quickly create a table with the same schema as an existing table (only for Apache Iceberg tables)

      In the code editor of the Scripts tab on the SQL Editor page, enter the following statement. Select the statement and click Run on the left of the code editor.

      CREATE TABLE iceberg_table_like LIKE iceberg_table;
  • Drop a DLF table

    DROP TABLE iceberg_table;
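
To confirm the result of the table operations above, you can inspect a table's schema in the SQL editor. The following is a minimal sketch that uses the Iceberg table from the examples in this topic.

    -- Show the columns, data types, and constraints of the table.
    DESCRIBE dlf.dlf_testdb.iceberg;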

Change the schema of an Apache Iceberg table

In the code editor of the Scripts tab on the SQL Editor page, enter one of the following statements based on the operation that you want to perform. Select the statement and click Run on the left of the code editor.

  • Modify a table attribute

    ALTER TABLE iceberg_table SET ('write.format.default'='avro');

  • Rename a table

    ALTER TABLE iceberg_table RENAME TO new_iceberg_table;

  • Change the name of a column

    ALTER TABLE iceberg_table RENAME id TO index;

    Note

    Only Realtime Compute for Apache Flink that uses VVR 8.0.7 or later supports this operation.

  • Change the data type of a column

    ALTER TABLE iceberg_table MODIFY (id BIGINT);

    The data type of a column can be changed based on the following rules:

    • INT -> BIGINT

    • FLOAT -> DOUBLE

    • DECIMAL -> DECIMAL with a larger precision

    Note

    Only Realtime Compute for Apache Flink that uses VVR 8.0.7 or later supports this operation.

Write data

INSERT INTO dlf.dlf_testdb.iceberg VALUES (1, 'AAA', '2022-02-01'), (2, 'BBB', '2022-02-01');
INSERT INTO dlf.dlf_testdb.hudi VALUES (1, 'AAA', '2022-02-01'), (2, 'BBB', '2022-02-01');
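
In addition to inserting literal rows, you can continuously write the result of a query into a DLF table. The following is a minimal sketch; my_source_table and its columns are hypothetical and must be replaced with a source table that exists in your workspace.

-- Hypothetical source table: my_source_table (id BIGINT, data STRING, event_time TIMESTAMP).
INSERT INTO dlf.dlf_testdb.iceberg
SELECT
  id,
  data,
  DATE_FORMAT(event_time, 'yyyy-MM-dd') AS dt  -- derive the partition column from the event time
FROM my_source_table;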

Read data

SELECT * FROM dlf.dlf_testdb.iceberg LIMIT 2;
SELECT * FROM dlf.dlf_testdb.hudi LIMIT 2;
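
Because the tables are partitioned by dt, you can also filter on the partition column. The following query reads only the partition that the preceding INSERT statements wrote to.

SELECT * FROM dlf.dlf_testdb.iceberg WHERE dt = '2022-02-01';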

Drop a DLF catalog

Warning

After you drop a DLF catalog, the deployments that are running are not affected. However, the deployments that use a table of the catalog can no longer find the table if the deployments are published or restarted. Proceed with caution when you drop a DLF catalog.

You can drop a DLF catalog on the UI or by executing an SQL statement. We recommend that you drop a DLF catalog on the UI.

Drop a DLF catalog on the UI

  1. Go to the Catalogs page.

    1. Log on to the Realtime Compute for Apache Flink console.

    2. Find the workspace that you want to manage and click Console in the Actions column.

    3. In the left-side navigation pane, click Catalogs.

  2. On the Catalog List page, find the desired catalog and click Delete in the Actions column.

  3. In the message that appears, click Delete.

  4. View the Catalogs pane on the left side of the Catalog List page to check whether the catalog is dropped.

Drop a DLF catalog by executing an SQL statement

  1. In the code editor of the Scripts tab on the SQL Editor page, enter the following statement:

    DROP CATALOG ${catalog_name}

    catalog_name indicates the name of the DLF catalog that you want to drop in the development console of Realtime Compute for Apache Flink.

  2. Right-click the statement that is used to drop the catalog and select Run from the shortcut menu.

  3. View the Catalogs pane on the left side of the Catalog List page to check whether the catalog is dropped.

References

  • For more information about how to use the Apache Iceberg connector, see Apache Iceberg connector.

  • For more information about how to use the Hudi connector, see Hudi connector.

  • If the built-in catalogs of Realtime Compute for Apache Flink cannot meet your business requirements, you can use custom catalogs. For more information, see Manage custom catalogs.