This topic describes the major updates and bug fixes of the Realtime Compute for Apache Flink version released on September 11, 2024.
The version upgrade is incrementally rolled out across the network by using a canary release plan. For information about the upgrade schedule, see the latest announcement on the right side of the Realtime Compute for Apache Flink console. You can use the new features in this version only after the upgrade is complete for your account. To apply for the upgrade at the earliest opportunity, submit a ticket.
Overview
This release includes platform updates, engine updates, connector updates, performance optimization, and bug fixes.
Platform updates
Platform updates in this release focus on ease of use, system stability, security, and O&M efficiency. The following section describes major platform updates:
Support for YAML deployments for data ingestion by Flink Change Data Capture (CDC): As a solution for real-time database synchronization, Flink CDC has been widely adopted by developers and enterprise users. Alibaba's official donation of the Flink CDC project to the Apache Software Foundation marks its evolution from a Flink source connector for change data capture into a Flink-based streaming extract, transform, and load (ETL) framework. This evolution includes a new data ingestion module that enhances the overall capabilities of Flink CDC.
Optimized task orchestration capabilities: The alerting capabilities for task orchestration are enriched. CloudMonitor can report alerts through multiple channels, such as DingTalk messages and phone calls. In addition, dynamic variables can be used in task orchestration to periodically run the same code at preset intervals. Task orchestration continues to be optimized for ease of use.
Extended key hosting capabilities: Key hosting, which is widely used in SQL deployments, is now extended to the growing number of JAR deployments. As a result, Realtime Compute for Apache Flink supports key hosting for JAR and Python deployments. Because some shared values, such as IP addresses, are not secrets but may still be referenced by multiple deployments, key hosting is officially renamed variable management and now also covers plaintext variables.
Reorganization of level-1 console menus: As new modules were added, the flat, tiled menu layout made it difficult to locate specific modules. The layout of the menu items in the left-side navigation pane of the development console is therefore optimized to be more intuitive and easier to use, so you can quickly locate the features you need.
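The YAML-based data ingestion mentioned above follows the Flink CDC 3.0 pipeline format, in which a source, a sink, and pipeline-level options are declared. The following is a minimal sketch, not a definitive template: the hostnames, credentials, database and table names are placeholders, and the sink options vary by destination type.

```yaml
# Flink CDC 3.0 pipeline sketch: synchronize all tables in app_db
# from MySQL into a Paimon warehouse. All endpoints, credentials,
# and paths below are illustrative placeholders.
source:
  type: mysql
  hostname: mysql.example.internal
  port: 3306
  username: flink_user
  # A ciphertext variable managed by variable management can be
  # referenced here instead of a plaintext password.
  password: ${secret_values.mysql_password}
  tables: app_db.\.*

sink:
  type: paimon
  catalog.properties.metastore: filesystem
  catalog.properties.warehouse: oss://my-bucket/warehouse

pipeline:
  name: Sync app_db to Paimon
  parallelism: 2
```

The `tables` option accepts a regular expression, so newly created tables that match the pattern can also be captured. Verify the exact option names against the data ingestion documentation for your engine version.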
Engine updates
Ververica Runtime (VVR) 8.0.9 is officially released to provide an enterprise-class engine based on Apache Flink 1.17.2. VVR 8.0.9 includes the following updates:
The MySQL CDC connector now exposes binlog parsing thread parameters, so you can increase the binlog parsing concurrency as required.
The Zstandard compression algorithm is supported for the Kafka connector to improve the data transmission efficiency. The built-in Protobuf format is supported to facilitate processing of structured data.
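Because the Flink Kafka connector passes options with the `properties.` prefix through to the underlying Kafka client, Zstandard compression can be enabled on a sink table by setting the standard Kafka producer property. A hedged sketch follows; the topic, broker addresses, and columns are illustrative.

```sql
-- Kafka sink table with Zstandard compression. The producer-side
-- compression.type property is forwarded to the Kafka client through
-- the properties.* prefix. Topic, brokers, and schema are placeholders.
CREATE TABLE orders_sink (
  order_id BIGINT,
  amount DECIMAL(10, 2),
  ts TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',
  'topic' = 'orders',
  'properties.bootstrap.servers' = 'broker1:9092,broker2:9092',
  'properties.compression.type' = 'zstd',
  'format' = 'json'
);
```

To use the built-in Protobuf format instead, set `'format' = 'protobuf'` together with `'protobuf.message-class-name'` pointing to a compiled message class on the classpath; check these option names against the connector documentation for your VVR version.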
The sink performance and processing speed of the Redis connector are improved. The connection pool parameters can be configured to allow for more flexible connection management.
The delete action is supported for Paimon sinks, which makes it easier to delete and correct existing data.
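One way the delete action surfaces in practice is when a changelog stream is written into a Paimon primary-key table: delete messages from the source are applied as deletions in the sink. The following sketch assumes a registered Paimon catalog named `my_paimon_catalog` and uses placeholder endpoints and credentials.

```sql
-- MySQL CDC source; deletions in the upstream table are emitted as
-- delete (-D) messages. Hostname, credentials, and names are placeholders.
CREATE TEMPORARY TABLE orders_src (
  order_id BIGINT,
  order_status STRING,
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = 'mysql.example.internal',
  'port' = '3306',
  'username' = 'flink_user',
  'password' = '${secret_values.mysql_password}',
  'database-name' = 'app_db',
  'table-name' = 'orders'
);

-- Rows deleted upstream are removed from the Paimon primary-key table.
INSERT INTO my_paimon_catalog.db.orders
SELECT order_id, order_status FROM orders_src;
```

How retraction messages are interpreted (delete versus update) is configurable on the Paimon sink; see the Paimon connector documentation for the relevant option.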
Batch Flink deployments can use the Celeborn remote shuffle service to store shuffle data in high-performance clusters. This breaks the disk capacity limit of Flink nodes, enhances ultra-large-scale data processing capabilities, and maintains stability and cost-effectiveness of deployments.
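In open source Flink, the Celeborn integration described above is typically enabled through a few configuration entries. The following keys are a sketch based on the Apache Celeborn Flink plugin; the procedure in Realtime Compute for Apache Flink may differ, and the endpoints are placeholders.

```yaml
# Flink configuration sketch for using Apache Celeborn as the remote
# shuffle service in batch jobs. Factory class and option names follow
# the open source Celeborn Flink plugin; endpoints are placeholders.
shuffle-service-factory.class: org.apache.celeborn.plugin.flink.RemoteShuffleServiceFactory
celeborn.master.endpoints: celeborn-master-1:9097,celeborn-master-2:9097
execution.batch-shuffle-mode: ALL_EXCHANGES_BLOCKING
```

With this setup, shuffle data is written to the Celeborn cluster instead of the local disks of Flink task managers, which is what removes the node disk capacity limit for large batch jobs.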
For more information about the major updates in this release and the related references, see the next section of this topic. The version upgrade is incrementally rolled out across the network by using a canary release plan. After the upgrade is complete for your account, we recommend that you upgrade the VVR engine to this version. For more information, see Upgrade the engine version of a deployment. We look forward to your feedback.
Features
| Feature | Description | References |
| --- | --- | --- |
| Data ingestion module added | A YAML draft can be developed based on Flink CDC 3.0 to synchronize data from the source to the destination. | - |
| Interconnection with Data Lake Formation (DLF) 2.0 | If you select DLF 2.0 as the metadata storage type when you create a Paimon catalog, you do not need to specify configurations such as an AccessKey pair. | |
| Optimization of access permissions | When you create a Realtime Compute for Apache Flink workspace for the first time, permissions on DLF-related operations are granted so that you can access DLF-related catalogs. This improves the user experience of DLF 2.0. By default, DLF permissions are automatically granted to existing users. | |
| Quick creation of a session cluster | If no session cluster is available when you run a query script, you can create an execution environment by configuring key parameters to directly run the script. | N/A |
| Optimized task orchestration capabilities | The alerting capabilities of workflows are enriched. CloudMonitor can report event alerts by using multiple methods, such as DingTalk and phone calls. | |
| Extended key hosting capabilities | Key hosting is renamed variable management. You can reference plaintext or ciphertext variables in JAR and Python deployments. | |
| Reorganization of level-1 console menus | New modules such as data ingestion are added. The layout of menu items in the left-side navigation pane of the development console is optimized so that you can easily locate the required modules. | N/A |
| Enhanced MySQL connector performance | Binlog parsing threads can be configured to improve the asynchronous parsing capability. | |
| Enhanced Kafka connector performance | The Zstandard compression algorithm is supported to improve data transmission efficiency. The built-in Protobuf format is supported to facilitate processing of structured data. | |
| Enhanced Redis connector performance | The sink performance and processing speed are improved. Connection pool parameters can be configured to allow for more flexible connection management. | |
| Simple Log Service connector refactoring | | |
| Enhanced Paimon connector | The semantics of received retraction messages (delete or update) can be configured to improve delete action performance. | |
| Enhanced dimension table lookup capabilities in MongoDB | | |
| Enhanced StarRocks connector stability | The write retry mechanism is optimized for network exceptions, and the default value of a related connector parameter is adjusted. | |
| Optimized HBase connector | | |
| Optimized Lindorm connector | Data can be written to a result table. Data in specific columns can be excluded from the update operation. | |
| Remote shuffle service | After the remote shuffle service is enabled for batch deployments in Realtime Compute for Apache Flink, the shuffle data is stored in high-performance Apache Celeborn clusters. Deployments are no longer limited by the disk capacity of compute nodes of Realtime Compute for Apache Flink. This enhances ultra-large-scale data processing capabilities and maintains the high stability and cost-effectiveness of the deployments. The remote shuffle service is in public preview and free of charge. | Enable the remote shuffle service in a batch deployment (public preview) |
Fixed issues
If MySQL CDC consumes data from a specified checkpoint, it cannot recover from that checkpoint after a primary/secondary switchover.
When the StarRocks connector uses the CREATE TABLE AS statement in VVR 8.0.8, the java.lang.ClassNotFoundException error is reported.
When the Realtime Compute for Apache Flink console connects to Elasticsearch by using the Elasticsearch connector, connector version V8 is not supported.
The Hologres connector forcibly checks the table ID during startup.