
Realtime Compute for Apache Flink: May 16, 2022

Last Updated: Jun 05, 2024

This topic describes the release notes for the version of Realtime Compute for Apache Flink that was released on May 16, 2022, including the major updates and bug fixes in this version, and provides links to relevant references.

Overview

Ververica Runtime (VVR) 4.0.13 was officially released on May 16, 2022. This version is developed based on Apache Flink V1.13 and includes the following feature updates:

  1. In scenarios where data is synchronized from multiple tables in a sharded database, the merging and synchronization of these tables are optimized based on the capabilities of real-time data ingestion into data lakes and data warehouses. After this optimization, Realtime Compute for Apache Flink can merge the data of tables that have the same name across database shards and synchronize the merged data to a table of the corresponding name in the Hologres destination database.

  2. Kafka catalogs are supported. You can register a Kafka catalog and directly use its topics as source tables or result tables in Flink SQL jobs.

  3. The Hologres connector can perform full synchronization and then switch to incremental synchronization by consuming binary log data within one job. This enables efficient end-to-end data synchronization and real-time data warehousing.

  4. The ApsaraDB for Redis connector allows you to configure the time to live (TTL) for ApsaraDB for Redis result tables.

  5. Enhancements to multiple types of connectors are released.

  6. The user experience for specific operations is improved. For example, you can stop a session cluster, view documents and service notices, and view logs on the Logs tab in the development console of Realtime Compute for Apache Flink.

In addition, some defects that were fixed in the Apache Flink community and some defects of fully managed Flink are fixed in this version.

New features


Support for Kafka catalogs

Kafka catalogs can automatically parse Kafka messages to infer table information. This way, you can directly access the topics of a Kafka cluster without executing DDL statements. You can also use Kafka catalogs to parse JSON-formatted messages to obtain a topic schema. This improves the efficiency and accuracy of Flink SQL development.

Manage Kafka JSON catalogs
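A minimal sketch of how such a catalog might be registered and used is shown below. The catalog name, the WITH options, and the table path are illustrative assumptions; see Manage Kafka JSON catalogs for the exact option names that your VVR version supports.

    -- Register a Kafka catalog (option names are illustrative).
    CREATE CATALOG kafka_catalog WITH (
      'type' = 'kafka',
      'properties.bootstrap.servers' = '<yourKafkaBrokers>'
    );

    -- Query a topic directly as a source table. The schema is inferred from
    -- the JSON-formatted messages; the database segment of the path depends
    -- on the catalog configuration.
    SELECT * FROM `kafka_catalog`.`kafka`.`orders_topic`;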

Data synchronization of multiple tables in a sharded database by using CREATE DATABASE AS

A regular expression can be used as the database name to match the source tables in multiple database shards of the data source. The data from the shards is merged and then synchronized to a downstream destination table whose name corresponds to each source table. This improves the efficiency of data synchronization for sharded databases.

CREATE DATABASE AS statement
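A minimal sketch of such a statement, assuming a MySQL catalog named mysql and a Hologres catalog named holo; the regular expression in the source database name matches all shards, such as order_db01 and order_db02:

    -- Merge and synchronize all tables from the database shards that match
    -- order_db.* into the Hologres database order_dw. Tables with the same
    -- name across shards are written to a single destination table of that name.
    CREATE DATABASE IF NOT EXISTS holo.order_dw
    AS DATABASE mysql.`order_db.*` INCLUDING ALL TABLES;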

Full and incremental data consumption of source tables by using the Hologres connector

The Hologres connector can synchronize full data from a Hologres source table and then smoothly switch to incremental synchronization by consuming binary log data. This way, data can be synchronized efficiently when you build a data pipeline for real-time data warehousing.

Create a Hologres source table
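A sketch of a Hologres source table DDL that reads the full data first and then consumes binlog data. The binlog-related options (binlog, cdcMode, and binlogStartupMode) are assumptions based on the Hologres connector documentation; verify the exact names and values in Create a Hologres source table.

    CREATE TEMPORARY TABLE holo_source (
      order_id BIGINT,
      order_amount DECIMAL(10, 2),
      gmt_modified TIMESTAMP(3)
    ) WITH (
      'connector' = 'hologres',
      'endpoint' = '<yourEndpoint>',
      'dbname' = '<yourDbName>',
      'tablename' = '<yourTableName>',
      'username' = '<yourAccessKeyId>',
      'password' = '<yourAccessKeySecret>',
      -- Read the existing full data first, then switch to consuming binlog data.
      'binlog' = 'true',
      'cdcMode' = 'true',
      'binlogStartupMode' = 'initial'
    );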

TTL for keys in an ApsaraDB for Redis result table

In most cases, an expiration time needs to be configured for data in an ApsaraDB for Redis database. To meet this requirement, you can configure a TTL for keys when you write data to an ApsaraDB for Redis result table.

Create an ApsaraDB for Redis result table
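A sketch of an ApsaraDB for Redis result table DDL. The option that sets the key TTL is shown under an assumed name (expiration, in seconds); check Create an ApsaraDB for Redis result table for the actual option name and unit.

    CREATE TEMPORARY TABLE redis_sink (
      key_name VARCHAR,
      key_value VARCHAR,
      PRIMARY KEY (key_name) NOT ENFORCED
    ) WITH (
      'connector' = 'redis',
      'host' = '<yourHost>',
      'port' = '6379',
      'password' = '<yourPassword>',
      -- Assumed option name for the key TTL (one day, in seconds).
      'expiration' = '86400'
    );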

Support for MaxCompute Streaming Tunnel, and data compression based on MaxCompute Streaming Tunnel or Batch Tunnel

MaxCompute Streaming Tunnel can be used to write data to MaxCompute in streaming mode. If a job does not require exactly-once semantics, you can use MaxCompute Streaming Tunnel to avoid the performance issues caused by slow checkpointing. In addition, data can be compressed by the tunnels to improve data transmission efficiency.
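A sketch of a MaxCompute result table that writes through Streaming Tunnel with compression enabled. The useStreamTunnel and compressAlgorithm options are assumptions based on this release note; verify the exact option names in the MaxCompute connector documentation.

    CREATE TEMPORARY TABLE odps_sink (
      id BIGINT,
      name VARCHAR
    ) WITH (
      'connector' = 'odps',
      'endpoint' = '<yourEndpoint>',
      'project' = '<yourProjectName>',
      'tablename' = '<yourTableName>',
      'accessid' = '<yourAccessKeyId>',
      'accesskey' = '<yourAccessKeySecret>',
      -- Assumed option names: write through Streaming Tunnel and compress the data.
      'useStreamTunnel' = 'true',
      'compressAlgorithm' = 'zlib'
    );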

DataStream APIs supported by the Hologres connector

The Hologres DataStream connector is supported.

-

retry_on_conflict supported by the Elasticsearch connector

If you want to update data in an Elasticsearch result table, you can configure the retry_on_conflict parameter to specify the maximum number of retries that occur due to version conflicts.

Create an Elasticsearch result table
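A sketch of an Elasticsearch result table that sets retry_on_conflict; the connector name and connection options are common Elasticsearch connector settings and are illustrative here.

    CREATE TEMPORARY TABLE es_sink (
      user_id BIGINT,
      user_name VARCHAR,
      PRIMARY KEY (user_id) NOT ENFORCED
    ) WITH (
      'connector' = 'elasticsearch-7',
      'hosts' = '<yourHosts>',
      'index' = '<yourIndex>',
      -- Maximum number of retries when an update hits a version conflict.
      'retry_on_conflict' = '5'
    );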

Compatibility between Flink CDC 2.2 and the MySQL CDC connector and Postgres CDC connector

The MySQL Change Data Capture (CDC) connector and Postgres CDC connector are compatible with all features of Flink CDC 2.2. All defects in Flink CDC 2.2 are also fixed in this version of Realtime Compute for Apache Flink.

None

Heartbeat events used to identify the latest position of the binary log file that is read from the source

Heartbeat events are used to identify the latest position of the binary log file that is read from the source. This is especially useful for MySQL tables that are updated infrequently, because the source can advance the binlog position based on heartbeat events instead of update events. This prevents the recorded binlog position from expiring.

Create a MySQL CDC source table
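For example, a MySQL CDC source table might set the heartbeat interval as follows; the schema and connection options are placeholders, and heartbeat.interval is the option name as understood from the MySQL CDC connector documentation.

    CREATE TEMPORARY TABLE mysql_source (
      id BIGINT,
      name VARCHAR,
      PRIMARY KEY (id) NOT ENFORCED
    ) WITH (
      'connector' = 'mysql-cdc',
      'hostname' = '<yourHostname>',
      'port' = '3306',
      'username' = '<yourUsername>',
      'password' = '<yourPassword>',
      'database-name' = '<yourDatabaseName>',
      'table-name' = '<yourTableName>',
      -- Emit a heartbeat event every 30 seconds so that the binlog position
      -- advances even when the table is rarely updated.
      'heartbeat.interval' = '30s'
    );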

Support for UNSIGNED FLOAT, DOUBLE, and DECIMAL data types

The UNSIGNED FLOAT, DOUBLE, and DECIMAL data types are supported by the MySQL CDC connector and MySQL catalogs.

Create a MySQL CDC source table
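For example, a MySQL table with FLOAT UNSIGNED, DOUBLE UNSIGNED, and DECIMAL UNSIGNED columns can now be declared in a Flink DDL as follows; the mapping shown is a plausible one, so consult the data type mapping in Create a MySQL CDC source table for the authoritative rules.

    CREATE TEMPORARY TABLE product_source (
      id BIGINT,
      weight FLOAT,          -- MySQL column type: FLOAT UNSIGNED
      price DOUBLE,          -- MySQL column type: DOUBLE UNSIGNED
      total DECIMAL(10, 2),  -- MySQL column type: DECIMAL(10, 2) UNSIGNED
      PRIMARY KEY (id) NOT ENFORCED
    ) WITH (
      'connector' = 'mysql-cdc',
      'hostname' = '<yourHostname>',
      'port' = '3306',
      'username' = '<yourUsername>',
      'password' = '<yourPassword>',
      'database-name' = '<yourDatabaseName>',
      'table-name' = 'product'
    );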

Configuration of JDBC parameters for the MySQL CDC connector

Java Database Connectivity (JDBC) parameters can be configured for the MySQL CDC connector to access MySQL instances.

Create a MySQL CDC source table
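For example, JDBC connection parameters can be passed with the jdbc.properties. prefix, as shown in the following sketch; the two parameters below are ordinary MySQL JDBC URL parameters and are used here only for illustration.

    CREATE TEMPORARY TABLE mysql_source (
      id BIGINT,
      name VARCHAR,
      PRIMARY KEY (id) NOT ENFORCED
    ) WITH (
      'connector' = 'mysql-cdc',
      'hostname' = '<yourHostname>',
      'port' = '3306',
      'username' = '<yourUsername>',
      'password' = '<yourPassword>',
      'database-name' = '<yourDatabaseName>',
      'table-name' = '<yourTableName>',
      -- Pass JDBC URL parameters to the underlying connection.
      'jdbc.properties.useSSL' = 'false',
      'jdbc.properties.characterEncoding' = 'utf-8'
    );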

Forced termination of session clusters

Session clusters are widely used to save resources. However, due to architectural limits, session clusters may affect the stability of production. If a session cluster becomes abnormal, none of the jobs in the cluster can run as expected.

To prevent this issue, we recommend that you do not publish jobs in the production environment to a session cluster. If a job fails due to an exception of the session cluster to which it belongs, you can forcefully terminate the session cluster.

Configure a development and test environment (session cluster)

Intelligent analysis of JobManager exceptions

If an error occurs while a Realtime Compute for Apache Flink job is running, the JobManager records the exceptions of TaskManagers in logs. You can view these exception logs on the Logs tab in the development console of Realtime Compute for Apache Flink. However, exception logs are retained only for a short period of time, and if a job fails repeatedly, the root cause may be buried under subsequent stack traces. In this version, exception logs are retained for a longer period of time and are classified. This helps you identify the root cause of an exception more easily.

View the exception logs of a deployment

Built-in Alibaba Cloud documentation

During job development and O&M, developers previously had to leave the development console of Realtime Compute for Apache Flink and open the Alibaba Cloud Documentation Center to view documents, and the frequent window switching may interrupt their work. To improve the development experience, the console of fully managed Flink now provides built-in Alibaba Cloud documentation, so you can view the documentation directly in the console without switching windows.

None

Service notices

Service notices are added to the console of Realtime Compute for Apache Flink. This way, you can directly receive various notices, including product updates, in the console. This avoids the issue where notices sent by text message, internal message, or DingTalk group fail to reach users.

None

UI optimization

  • The new Alibaba Cloud theme style is supported.

  • The job status description is optimized.

Performance optimization

N/A.

Fixed issues

  • The following issue is fixed: If the number of shards is changed but the Log Service connector fails to obtain a new list of shards, data cannot be read.

  • The error "[J cannot be cast to [Ljava.lang.Object;" that is triggered by aggregation optimization features such as miniBatch is fixed.

  • The issue that data in an ApsaraDB for HBase result table becomes out-of-order during asynchronous data processing is fixed.

  • The issue that a null pointer exception occurs in join operations of two data streams is fixed.

  • The issue that checkpointing always fails when the MySQL CDC connector is used to write data to Hudi is fixed.

  • The computational logic that is used to report the pendingRecords metric for Message Queue for Apache Kafka source tables is optimized.

  • The issue that specific member names are not displayed in the development console of Realtime Compute for Apache Flink is fixed.

  • The issue that an error occurs during the verification of specific valid DDL syntax is fixed.