This topic describes the major updates and fixed issues in the version of Realtime Compute for Apache Flink that was released on March 4, 2022, and provides links to relevant references.
Overview
Ververica Runtime (VVR) 4.0.12 was officially released on March 4, 2022. This version is developed based on Apache Flink 1.13. In this version, Realtime Compute for Apache Flink can synchronize JSON schema changes from Message Queue for Apache Kafka to Hologres, and provides an enterprise-level Hudi connector that works with Data Lake Formation (DLF). To improve development efficiency, this version provides more than 20 common Flink SQL job templates. To enhance O&M capabilities, this version supports powerful job diagnostics and dynamic log level adjustment without the need to stop jobs. This version also adds various data processing capabilities, such as enterprise-level ClickHouse features, new connectors, and new syntax for ingesting data into data warehouses and data lakes. Some issues that were fixed in the Apache Flink community are also fixed in this version.
New features
| Feature | Description | References |
| --- | --- | --- |
| Synchronization of JSON schema changes to Hologres | JSON is one of the most common event formats in stream processing. Schema changes are expected to be transparent to real-time streaming jobs and to the tables in the backend storage engine. This version provides enhancements to meet this requirement. | |
| Enhanced data lake building capabilities for Iceberg and Hudi | | |
| Improvement on ease of use for log viewing and configuration | | |
| Multiple enterprise-level ClickHouse features supported by Realtime Compute for Apache Flink | | |
| Optimized job diagnostics rules and the Diagnosis panel | | |
| Addition of computed columns during data synchronization | When you use the CREATE TABLE AS statement to synchronize data, you can add a computed column to the source table, specify its position, and materialize it as a physical column in the destination table. The results of the computed column are synchronized to the destination table in real time. You can also use the new column as the primary key column of the destination table. | |
| Generation of test data | The Faker connector is supported. You can use the Faker connector to generate test data that meets your business requirements, which helps you verify your business logic during development and testing. | |
| Template center provided to accelerate job development | | |
| Display of resource utilization | The CPU utilization and memory usage of the current project are displayed in the lower-left corner of the development console of Realtime Compute for Apache Flink. You can manage project resources based on this information. | N/A |
| Fast locating of the logs of jobs for which checkpoints are created at a low speed | The snapshot status of nodes in the snapshot history can be sorted. You can also navigate from the Flink Checkpoints History tab to the Logs tab of the Running Task Managers tab to view the cause of slow checkpoint creation for a job. | |
| Creation of an AnalyticDB for PostgreSQL result table and an AnalyticDB for PostgreSQL dimension table | | |
| Improvement on ease of use of the enterprise-level state backend storage | | |
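The computed-column feature above is exposed through the CREATE TABLE AS shorthand; as a minimal sketch of the underlying idea in standard Flink SQL 1.13 (all table names, columns, and connector options here are hypothetical placeholders, not taken from this release note), a computed column can be declared on the source and materialized as a physical primary key column on the destination:

```sql
-- Hypothetical sketch: a computed column derived from an event timestamp,
-- materialized as a physical column and reused in the sink's primary key.
CREATE TEMPORARY TABLE orders_src (
  order_id BIGINT,
  order_ts TIMESTAMP(3),
  order_date AS CAST(order_ts AS DATE)   -- computed column
) WITH (
  'connector' = 'datagen'                -- placeholder for a real source
);

CREATE TEMPORARY TABLE orders_sink (
  order_id BIGINT,
  order_date DATE,                       -- computed column stored as a physical column
  PRIMARY KEY (order_id, order_date) NOT ENFORCED
) WITH (
  'connector' = 'print'                  -- placeholder for e.g. a Hologres sink
);

INSERT INTO orders_sink
SELECT order_id, order_date FROM orders_src;
```

With the CREATE TABLE AS statement, the destination schema and the extra column are derived for you instead of being declared by hand; see the product documentation for the exact syntax of that statement.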
Performance improvement
The enterprise-level state backend storage is significantly improved in this version, which significantly improves the performance of dual-stream and multi-stream JOIN jobs. The average computing resource utilization can be increased by 50%, and by 100% to 200% in typical scenarios. This helps you run stateful stream processing applications more smoothly.
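As a hedged sketch of the kind of stateful job that benefits from this improvement (table names and connector options are illustrative placeholders), a dual-stream JOIN buffers both input streams in the state backend, so state backend performance directly affects throughput:

```sql
-- Hypothetical dual-stream JOIN: a regular (unbounded) join keeps rows from
-- both sides in state, so its throughput depends on the state backend.
CREATE TEMPORARY TABLE clicks (
  user_id BIGINT,
  url STRING
) WITH ('connector' = 'datagen');       -- placeholder source

CREATE TEMPORARY TABLE users (
  user_id BIGINT,
  name STRING
) WITH ('connector' = 'datagen');       -- placeholder source

CREATE TEMPORARY TABLE enriched (
  user_id BIGINT,
  name STRING,
  url STRING
) WITH ('connector' = 'print');         -- placeholder sink

INSERT INTO enriched
SELECT c.user_id, u.name, c.url
FROM clicks AS c
JOIN users AS u ON c.user_id = u.user_id;
```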
Fixed issues
- The catalog service is optimized to fix the issue that data does not appear after a refresh when a database or table contains a large amount of data.
- The issue that the Flink version is not displayed for a session cluster is fixed.
- The issue that the watermarkLag curve is not displayed as expected on the Metrics tab is fixed.
- The paging display of curve charts on the Metrics tab is optimized.
- Flink CDC issues, such as an issue with the currentFetchEventTimeLag metric and class conflicts, are fixed.
- The issue that the CREATE TABLE AS statement cannot be used to modify existing columns is fixed.