This topic provides the release notes for Realtime Compute for Apache Flink and links to relevant references. The release notes cover the major updates and bug fixes in the version of Realtime Compute for Apache Flink that was released on September 19, 2022.
Overview
The new version of Realtime Compute for Apache Flink was officially released on September 19, 2022. This version provides performance optimization, bug fixes, and major updates on the platform, engine, and connectors. The Ververica Runtime (VVR) versions that are released are VVR 4.0.15 for Apache Flink 1.13 and VVR 6.0.2 for Apache Flink 1.15. The following text provides an overview of the new version of Realtime Compute for Apache Flink:
On the engine side, VVR 6.0.2 is officially released. This version is the first release of an enterprise-level Flink engine based on Apache Flink 1.15. Apache Flink 1.15 provides feature and performance improvements, such as enhancements to window table-valued functions, CAST functions, the type system, and JSON functions. These improvements are now available in the cloud service.
State management is a key concern for users. This version lets you centrally manage the checkpoints and savepoints of a deployment in a status set. Specifically, savepoints are generated and restored significantly faster and are significantly smaller than before. The success rate and stability of savepoints are also greatly improved.
In addition, savepoints are no longer deleted automatically when you cancel a deployment. Checkpoints and savepoints are clearly distinguished, and you can explicitly create and manage savepoints. In this version, the optimization of the GeminiDB engine also reduces costs. With status set management, the Object Storage Service (OSS) storage cost is reduced by 15% to 40% per year. Moreover, the new version allows you to start or restore a deployment from a specified savepoint of another deployment, which makes dual-run tests such as A/B testing more convenient.
This version also improves resource utilization. The resource tuning feature now supports a scheduled tuning policy that automatically adjusts the resources of a deployment at specific times based on your settings. If your service has clear peak and off-peak hours, you can use this policy to reduce manual effort.
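The idea behind a scheduled tuning policy can be sketched in a few lines of Python. This is an illustration only, not the product's API or implementation: `TimedRule`, `resources_at`, the time windows, and the resource sizes are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class TimedRule:
    """Hypothetical rule: from `start` (inclusive) to `end` (exclusive), use `size` resource units."""
    start: time
    end: time
    size: int


def resources_at(now: time, rules: list[TimedRule], default_size: int) -> int:
    """Return the size of the first rule whose window contains `now`, else the default."""
    for rule in rules:
        if rule.start <= rule.end:
            inside = rule.start <= now < rule.end
        else:
            # The window wraps past midnight, for example 23:00 to 06:00.
            inside = now >= rule.start or now < rule.end
        if inside:
            return rule.size
    return default_size


# Hypothetical policy: scale up for a daytime peak, scale down overnight.
POLICY = [
    TimedRule(time(9, 0), time(18, 0), size=16),
    TimedRule(time(23, 0), time(6, 0), size=2),
]
```

In this sketch, a call such as `resources_at(time(10, 0), POLICY, default_size=8)` selects the daytime peak size, and any time that falls outside both windows falls back to the default size.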
This version also supports quick task restart, which provides fast recovery when a deployment fails over and improves business continuity. If your business can tolerate duplicate and lost data and has high requirements for business continuity, you can configure quick task restart to quickly recover failed tasks. The delay caused by deployment failover can even be reduced from minutes to milliseconds.
Warning: In this version, this feature cannot prevent data duplication or data loss. Before you use the feature, make sure that your business can tolerate lost and duplicate data. Quick task restart is disabled by default. To enable it for a deployment, you must add configuration items to the deployment. For more information about the principles and configuration details, see Configure quick task restart.
The deployment diagnostics feature introduces the concept of a health score in the new version. The feature analyzes a deployment in all of its states and provides diagnostic items and suggestions to help you operate your real-time compute deployments.
In terms of data integration, the platform provides a new API that you can use for business integration.
Real-time risk management is one of the main application scenarios of Flink. The new complex event processing (CEP) feature targets continuous event behaviors. The feature was available to whitelisted users in the preview version and has been verified in production for the new version.
This version makes a series of Flink CEP enhancements available to all users. First, it supports hot updates of CEP rules, the feature that users anticipated most. With this capability, you can update rules promptly, even during peak hours, and avoid the 10-minute interruption to risk control business that redeploying a task causes. This greatly improves business availability. Second, the CEP SQL syntax is enhanced. A new SQL extension syntax makes CEP SQL more expressive, so you can simplify complex DataStream deployments into SQL deployments. This improves development efficiency and allows the deployments to be integrated into the lineage system used for data governance. Finally, this version introduces more metrics that describe the rules in Flink CEP.
In terms of performance optimization, for JOIN operators that join two data streams in SQL streaming deployments, the Flink engine can automatically infer whether to enable the key-value separation feature. This significantly improves the performance of deployments that involve dual-stream joins. The Hive versions supported by Hive catalogs are expanded to Hive 2.1.0 to 2.3.9 and Hive 3.1.0 to 3.1.3. In terms of connectors, this version adds support for reading from Tablestore and for JDBC source tables, dimension tables, and result tables.
New features
Feature | Description | References |
Status set management | Status set management applies to all stateful Flink deployments. State management is decoupled from starting and stopping a deployment, and savepoints are no longer deleted when you cancel a deployment. You can create and delete savepoints, including at scheduled times, on a single management page. | |
Scheduled tuning | The scheduled tuning feature applies to Flink deployments whose business has clear peak and off-peak hours. You can set up custom timing policies for deployments in the console. The deployment resources are then automatically adjusted to the preset sizes at the specified times. This feature helps you handle data fluctuations and reduces manual effort. | |
Health score | The health score feature applies to all deployments that are being started or running. The feature detects the issues in deployments and provides suggestions by using various expert rules. Users can better understand the deployment status and adjust parameters by using the feature. | |
Optimized process for granting permissions to an account | The process of granting permissions to an account is optimized. When you grant permissions, all RAM users are listed for you to select. You no longer need to manually enter the information of RAM users. | |
Flink CEP | CEP is a capability for matching patterns over real-time data streams. Building on Flink CEP, this version lets you store CEP rules externally in a database so that the rules can be dynamically loaded and take effect. This feature is available through the DataStream API. | |
Enhancement of CEP SQL | The MATCH_RECOGNIZE statement allows you to use SQL statements to describe CEP rules. This version builds on the MATCH_RECOGNIZE statement of Flink CEP and provides enhanced capabilities, such as outputting matches for events that do not arrive within a specified time interval, and relaxed non-contiguity by using notFollowedBy(). In addition, new metrics are introduced. | |
The fast recovery capability in case of deployment failover | After the quick task restart feature is enabled, only the failed task is restarted if an exception occurs in a task. This reduces the impact of failover on the deployment. Warning: This is an experimental feature. Before you use the feature, make sure that your business can tolerate lost and duplicate data. | |
Database synchronization that supports synchronizing data to Kafka | This feature writes data to the Upsert Kafka tables that correspond to the tables in the MySQL database. Deployments can then read the tables in Kafka instead of the tables in the MySQL database, which reduces the load that multiple deployments place on the MySQL database. | |
A DDL statement that defines a partitioned table in a Hologres result table | If you create a Hologres result table, you can specify the PARTITION BY parameter to define the partitioned table. | |
The timeout period for performing an asynchronous request in a Hologres dimension table | You can specify the asyncTimeoutMs parameter to ensure that the data request can be performed within a specific period. | |
You can configure table attributes when you create a Hologres table. | You can use suitable table attribute settings to sort and query data in an efficient manner. When you create a Hologres table, you can configure physical table attributes in the WITH clause. | |
MaxCompute sink connectors support the Binary data type. | | |
Hive Catalog supports more Hive versions. | This version supports Hive 2.1.0 to 2.3.9 and Hive 3.1.0 to 3.1.3. | |
Tablestore connector | You can read the incremental logs in Tablestore. | |
JDBC connector | The JDBC connector is built in. | |
The parallelism for a Message Queue for Apache RocketMQ source table can be more than the number of partitions that are defined in a Message Queue for Apache RocketMQ message topic. | Users can reserve resources for the possible number of partitions in a topic before consumption. | |
The message keys of a Message Queue for Apache RocketMQ result table can be specified. | You can specify the key of a Message Queue for Apache RocketMQ message. | |
AnalyticDB for MySQL catalog | You can use the catalog to read metadata from AnalyticDB for MySQL, so you no longer need to manually register AnalyticDB for MySQL tables. This improves the efficiency of deployment development and helps ensure data correctness. | |
Performance optimization
The native format for savepoints is introduced. It addresses the problem that savepoints in the canonical format are prone to timing out when the state is large, and it improves the overall stability of deployments. The following table describes the advantages of the native format.
Item | Benefits |
Time required to create a savepoint | On average, creation is 5 to 10 times faster. The smaller the incremental state, the greater the improvement. In a typical deployment, creation can even be 100 times faster. |
Time required to restore a deployment | On average, restoration is about 5 times faster. The larger the state, the greater the improvement. |
Space overhead of a savepoint | On average, the space overhead is reduced by a factor of 2. The larger the state, the greater the saving. |
Network overhead of a savepoint | On average, the network overhead is reduced by a factor of 5 to 10. The smaller the incremental state, the greater the saving. |
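To get a feel for the averages in the table above, the following Python sketch applies them to a made-up baseline. Only the factors come from the table. The baseline figures (600 seconds to create a savepoint, 300 seconds to restore, 100 GB of savepoint data) and the function name `estimate_native` are assumptions for illustration, and real results depend on the state size and the deployment.

```python
def estimate_native(canonical_value: float, low_factor: float, high_factor: float) -> tuple[float, float]:
    """Return the (best, worst) native-format estimate for a canonical-format cost.

    Dividing by the larger factor gives the best case; dividing by the
    smaller factor gives the worst case.
    """
    return canonical_value / high_factor, canonical_value / low_factor


# Hypothetical canonical-format baseline, not measured data.
create_seconds = 600.0
restore_seconds = 300.0
savepoint_gb = 100.0

print(estimate_native(create_seconds, 5, 10))  # creation: 5 to 10 times on average
print(estimate_native(restore_seconds, 5, 5))  # restoration: about 5 times on average
print(estimate_native(savepoint_gb, 2, 2))     # space: factor of 2 on average
```

Under these assumed baselines, creation would drop from 600 seconds to between 60 and 120 seconds, restoration from 300 seconds to about 60 seconds, and the savepoint size from 100 GB to about 50 GB.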
For JOIN operators that join two data streams in SQL streaming deployments, the Flink engine can automatically infer, based on the deployment characteristics, whether to enable the key-value separation feature. This improves dual-stream join performance. In performance tests of typical scenarios, performance improves by more than 40% on average. For more information, see Optimize Flink SQL and Configurations of GeminiStateBackend.
The deployment startup speed is optimized. On average, deployments start 15% faster.
Bug fixes
The issue that the modification time of a deployment is abnormally updated is fixed.
The issue that the state cannot be determined after specific deployments are suspended and restarted is fixed.
The issue that the JAR package cannot be uploaded locally from Alibaba Finance Cloud is fixed.
The issue that the total number of resources configured for running a deployment is inconsistent with that on the Statistics page is fixed.
The issue that you cannot log on to the Logs page is fixed.
The issue that the Upsert Kafka tables are returned when you directly read the Kafka catalog is fixed.
The issue that the NullPointerException error is returned when the intermediate results are used for nested operations of multiple user-defined functions (UDFs) is fixed.
The following issues in the MySQL CDC connector are fixed: chunks are abnormal, out-of-memory (OOM) errors occur, and the time zone of initialization data is inconsistent with that of incremental data. For more information about how to configure resource parameters, see Create a MySQL CDC source table.