This topic describes the release notes for Realtime Compute for Apache Flink and provides links to relevant references. The release notes cover the major updates and bug fixes in the version of Realtime Compute for Apache Flink that was released on May 16, 2022.
Overview
Ververica Runtime (VVR) 4.0.13 was officially released on May 16, 2022. This version is developed based on Apache Flink V1.13 and includes the following feature updates:

1. In scenarios where data is synchronized from multiple tables in a sharded database, the merging and synchronization of these tables are optimized based on the capabilities of real-time data ingestion into data lakes and data warehouses. After this optimization, Realtime Compute for Apache Flink can merge the data of tables that have the same name across database shards and synchronize the data to a table in the Hologres destination database whose name corresponds to each source table.
2. Kafka catalogs are supported. You can register a Kafka catalog and directly use its topics as source tables or result tables in a Flink SQL job.
3. The Hologres connector can perform full synchronization and then switch to incremental synchronization that consumes binary log data within a single job. This enables efficient end-to-end data synchronization and real-time data warehousing.
4. The ApsaraDB for Redis connector allows you to configure the time to live (TTL) for keys in ApsaraDB for Redis result tables.
5. Enhancements to multiple types of connectors are released.
6. User experience for specific operations is improved. For example, you can stop a session cluster, view documentation, view logs on the Logs tab in the development console of Realtime Compute for Apache Flink, and view service notices.

In addition, this version fixes some defects that were fixed in the Apache Flink community and some defects of fully managed Flink.
New features
| Feature | Description | References |
| --- | --- | --- |
| Support for Kafka catalogs | Kafka catalogs can be used to automatically parse Kafka messages and infer table information. This way, you can directly access the topics of a Kafka cluster without the need to execute DDL statements. You can also use Kafka catalogs to parse JSON-formatted messages to obtain a topic schema. This improves the development efficiency and accuracy of Flink SQL. See the example after this table. | |
| Data synchronization of multiple tables in a sharded database by using CREATE DATABASE AS | Regular expressions can be used in database names to match the source tables in multiple database shards of the data source. After the data in the database shards is merged, it can be synchronized to a downstream destination table whose name corresponds to each source table. This improves the efficiency of data synchronization for database shards. See the example after this table. | |
| Full and incremental data consumption of source tables by using the Hologres connector | The Hologres connector can synchronize full data from a Hologres source table and then smoothly switch to incremental synchronization that consumes binary log data. This way, data can be synchronized in an efficient manner when you build a data pipeline for real-time data warehousing. See the example after this table. | |
| TTL for keys in an ApsaraDB for Redis result table | In most cases, an expiration time needs to be configured for data in an ApsaraDB for Redis database. You can now configure the TTL for keys when you write data to an ApsaraDB for Redis result table. See the example after this table. | |
| Support for MaxCompute Streaming Tunnel, and data compression based on MaxCompute Streaming Tunnel or Batch Tunnel | MaxCompute Streaming Tunnel can be used to write data to MaxCompute in streaming mode. If a job does not require exactly-once semantics, you can use MaxCompute Streaming Tunnel to prevent the performance issues that occur when checkpoints are created slowly. In addition, data can be compressed by using tunnels to improve data transmission efficiency. See the example after this table. | |
| DataStream APIs supported by the Hologres connector | The Hologres DataStream connector is supported. | - |
| retry_on_conflict supported by the Elasticsearch connector | If you want to update data in an Elasticsearch result table, you can configure the retry_on_conflict parameter to specify the maximum number of retries after version conflicts. See the example after this table. | |
| Compatibility of the MySQL CDC connector and Postgres CDC connector with Flink CDC 2.2 | The MySQL Change Data Capture (CDC) connector and Postgres CDC connector are compatible with all features of Flink CDC 2.2. All defects that are fixed in Flink CDC 2.2 are also fixed in this version of Realtime Compute for Apache Flink. | None |
| Heartbeat events used to identify the latest position of the binary log file that is read from the source | Heartbeat events are used to identify the latest position in the binary log file that is read from the source. This is effective for slowly updated tables in MySQL: the source can advance the binary log position based on heartbeat events instead of update events, which prevents the binary log position from expiring. See the example after this table. | |
| Support for UNSIGNED FLOAT, DOUBLE, and DECIMAL data types | The UNSIGNED FLOAT, DOUBLE, and DECIMAL data types are supported by the MySQL CDC connector and MySQL catalogs. | |
| Configuration of JDBC parameters for the MySQL CDC connector | Java Database Connectivity (JDBC) parameters can be configured for the MySQL CDC connector to access MySQL instances. See the example after this table. | |
| Forced termination of session clusters | Session clusters are widely used to save resources. However, session clusters may affect the stability of production jobs due to their architecture limits. If a session cluster becomes abnormal, none of the jobs in the cluster can run as expected. Therefore, we recommend that you do not publish production jobs to a session cluster. If a job fails due to an exception of the session cluster to which it belongs, you can forcefully terminate the session cluster. | Configure a development and test environment (session cluster) |
| Intelligent analysis of JobManager exceptions | If an error occurs while a Realtime Compute for Apache Flink job is running, the JobManager records the exceptions of TaskManagers in logs. You can view the exception logs on the Logs tab in the development console of Realtime Compute for Apache Flink. Previously, exception logs were stored only for a short period of time, and if a job failed repeatedly, the root cause could be buried by subsequent stack information. In this version, the retention period of exception logs is extended and exception logs are classified, which helps you identify the root cause of an exception. | |
| Built-in Alibaba Cloud documentation | During job development and O&M, developers need to switch from the development console of Realtime Compute for Apache Flink to the Alibaba Cloud Documentation Center to view documentation, and frequent window switching interrupts their work. To improve the development experience, fully managed Flink provides built-in Alibaba Cloud documentation in its console, so you can view the documentation directly without switching windows. | None |
| Service notices | Service notices are added to the console of Realtime Compute for Apache Flink. This way, you can directly receive notices such as product updates in the console, which avoids the issue that notices fail to reach users by text message, internal message, or DingTalk group. | None |
| UI optimization | | |
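The following Flink SQL sketch shows how a Kafka catalog might be registered and used. The catalog name, broker endpoint, and the WITH options other than 'type' are illustrative assumptions rather than confirmed values; refer to the Kafka catalog documentation for the exact options that your VVR version supports.

```sql
-- Sketch only: option names other than 'type' are assumptions, not confirmed values.
CREATE CATALOG my_kafka_catalog WITH (
  'type' = 'kafka',
  -- Assumed option for the broker address; verify the exact key in the Kafka catalog documentation.
  'properties.bootstrap.servers' = 'host1:9092,host2:9092',
  -- Assumed option: let the catalog parse JSON messages to infer the topic schema.
  'format' = 'json'
);

-- A topic can then be queried directly as a source table, without a CREATE TABLE statement.
-- The middle (database) segment of the identifier depends on the catalog implementation.
SELECT * FROM `my_kafka_catalog`.`kafka`.`orders_topic`;
```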
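The following sketch illustrates a CREATE DATABASE AS (CDAS) statement that merges and synchronizes tables from database shards. The catalog names, database names, and the regular expression are placeholders, and the exact clause layout should be checked against the CREATE DATABASE AS documentation.

```sql
-- Sketch only: catalog and database names are placeholders.
-- The source database name uses a regular expression (order_db[0-9]+) to match all shards,
-- so tables with the same name across shards are merged into one Hologres destination table
-- per source table name.
CREATE DATABASE IF NOT EXISTS `holo_catalog`.`order_dw`
AS DATABASE `mysql_catalog`.`order_db[0-9]+` INCLUDING ALL TABLES;
```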
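The sketch below shows a Hologres source table that first reads full data and then consumes binary log data. The option names binlog, cdcMode, and binlogStartupMode are written from memory of the Hologres connector options and should be treated as assumptions; the connection options are placeholders.

```sql
-- Sketch only: endpoint and credentials are placeholders; option names are assumptions to verify
-- against the Hologres connector documentation.
CREATE TEMPORARY TABLE holo_source (
  order_id BIGINT,
  order_status STRING,
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'connector' = 'hologres',
  'endpoint' = '<hologres-endpoint>',
  'dbname' = '<database>',
  'tablename' = '<schema.table>',
  'username' = '<access-key-id>',
  'password' = '<access-key-secret>',
  'binlog' = 'true',               -- consume binary log data (assumed option name)
  'cdcMode' = 'true',              -- interpret binlog records as CDC changes (assumed option name)
  'binlogStartupMode' = 'initial'  -- full scan first, then switch to incremental binlog (assumed option name)
);
```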
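The following sketch shows where a key TTL could be configured on an ApsaraDB for Redis result table. The release notes only state that a TTL can be configured; the option name 'ttl' below is a hypothetical placeholder, and the actual parameter name and connector name must be taken from the ApsaraDB for Redis connector documentation.

```sql
-- Sketch only: 'ttl' is a hypothetical placeholder for the actual TTL option of the Redis connector,
-- and the connector name and connection options are placeholders as well.
CREATE TEMPORARY TABLE redis_sink (
  user_id STRING,
  last_login STRING,
  PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
  'connector' = 'redis',
  'host' = '<redis-host>',
  'port' = '6379',
  'ttl' = '3600'   -- hypothetical: expire written keys after 3600 seconds
);
```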
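The sketch below indicates how a MaxCompute result table might enable Streaming Tunnel and tunnel-level compression. The option names useStreamTunnel and compressAlgorithm are assumptions rather than confirmed parameters; check the MaxCompute connector documentation for the actual keys and the supported compression algorithms.

```sql
-- Sketch only: connection options are placeholders; tunnel and compression option names are assumptions.
CREATE TEMPORARY TABLE odps_sink (
  id BIGINT,
  content STRING
) WITH (
  'connector' = 'odps',
  'endpoint' = '<maxcompute-endpoint>',
  'project' = '<project>',
  'tablename' = '<table>',
  'accessid' = '<access-key-id>',
  'accesskey' = '<access-key-secret>',
  'useStreamTunnel' = 'true',    -- assumed option: write via Streaming Tunnel (no exactly-once semantics)
  'compressAlgorithm' = 'zlib'   -- assumed option: compress data transmitted through the tunnel
);
```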
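The sketch below shows the retry_on_conflict parameter, which is named in this release, on an Elasticsearch result table. The other WITH options are placeholders, and the connector name and exact option spelling should be verified against the Elasticsearch connector documentation.

```sql
-- Sketch only: hosts and index are placeholders; retry_on_conflict is the parameter named in this release.
CREATE TEMPORARY TABLE es_sink (
  doc_id STRING,
  payload STRING,
  PRIMARY KEY (doc_id) NOT ENFORCED
) WITH (
  'connector' = 'elasticsearch-7',
  'hosts' = 'http://<host>:9200',
  'index' = '<index-name>',
  'retry_on_conflict' = '3'   -- retry an update up to 3 times after a version conflict
);
```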
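The sketch below shows a heartbeat interval on a MySQL CDC source table, which is how heartbeat-based advancement of the binary log position is typically enabled. The option name heartbeat.interval comes from the open source MySQL CDC connector and is an assumption here; the connection options are placeholders.

```sql
-- Sketch only: connection options are placeholders; 'heartbeat.interval' is assumed from the
-- open source MySQL CDC connector options.
CREATE TEMPORARY TABLE mysql_source (
  id BIGINT,
  updated_at TIMESTAMP(3),
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = '<host>',
  'port' = '3306',
  'username' = '<user>',
  'password' = '<password>',
  'database-name' = '<db>',
  'table-name' = '<table>',
  'heartbeat.interval' = '30s'   -- emit heartbeat events so the binlog position advances for slowly updated tables
);
```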
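The sketch below shows how custom JDBC parameters might be passed to the MySQL CDC connector. The jdbc.properties.* prefix is taken from the open source MySQL CDC connector and is an assumption for the VVR connector; the specific properties shown are examples only.

```sql
-- Sketch only: the 'jdbc.properties.*' prefix is assumed from the open source MySQL CDC connector.
CREATE TEMPORARY TABLE mysql_source_with_jdbc_opts (
  id BIGINT,
  name STRING,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = '<host>',
  'port' = '3306',
  'username' = '<user>',
  'password' = '<password>',
  'database-name' = '<db>',
  'table-name' = '<table>',
  'jdbc.properties.useSSL' = 'false',          -- example JDBC URL property
  'jdbc.properties.connectTimeout' = '30000'   -- example JDBC URL property, in milliseconds
);
```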
Performance optimization
N/A.
Fixed issues
- The issue that data cannot be read when the number of shards changes but the Log Service connector fails to obtain the new list of shards is fixed.
- The error "[J cannot be cast to [Ljava.lang.Object;" that is triggered by aggregation optimization features such as miniBatch is fixed.
- The issue that data in an ApsaraDB for HBase result table becomes out of order during asynchronous data processing is fixed.
- The issue that a null pointer exception occurs in join operations on two data streams is fixed.
- The issue that checkpointing always fails when the MySQL CDC connector is used to write data to Hudi is fixed.
- The computational logic that is used to report the pendingRecords metric for Message Queue for Apache Kafka source tables is optimized.
- The issue that the names of specific members are not displayed in the development console of Realtime Compute for Apache Flink is fixed.
- The issue that an error occurs during the verification of specific valid DDL syntax is fixed.