Upgrade the engine version of a deployment - Realtime Compute for Apache Flink

Apache Flink is one of the most popular stream computing engines, and is actively updated. You can upgrade the engine version of your deployment to a new version to use updated or new features. This topic describes how to upgrade the engine version of a Realtime Compute for Apache Flink deployment.

Usage notes

Before a new engine version of Realtime Compute for Apache Flink is released, various compatibility tests are performed. In principle, minor versions of the same major version are compatible with each other. However, the upgrade compatibility between major versions is not guaranteed. For more information, see the "Engine version and meaning of each digit in a version number" section of the Engine version topic and the "Compatibility Table" section of the Upgrading Applications and Flink Versions topic.

If you want to change the engine version of a deployment, take note of the following items:

If you upgrade the engine version of your deployment to a later minor version of the same major version, for example, from vvr-4.0.15-flink-1.13 to vvr-4.0.18-flink-1.13, the upgraded deployment is compatible with the existing state data and can use checkpoints or savepoints that were generated before the upgrade.
If you upgrade the engine version of your deployment to a later major version, for example, from vvr-4.0.15-flink-1.13 to vvr-6.0.2-flink-1.15, the upgraded deployment is incompatible with the existing state data. You must restart your deployment without the state data.
The versions of the Realtime Compute for Apache Flink dependencies in an SQL deployment or a DataStream deployment must be the same as the Realtime Compute for Apache Flink version that is selected for the deployment.
For Apache Flink 1.13.0 and later, BlinkPlanner is used as the default SQL planner. BlinkPlanner was contributed to the Apache Flink community by Alibaba Group. Apache Flink 1.13.0 differs from earlier versions in specific ways. For more information about the differences, see Apache Flink 1.13.0 Release Announcement. As a result, if you migrate a deployment that is based on an Apache Flink version earlier than 1.13.0 to a Realtime Compute for Apache Flink deployment that uses Ververica Runtime (VVR) 4.0 or later, which is based on Apache Flink 1.13.0, specific syntax and APIs of Realtime Compute for Apache Flink may be incompatible with the syntax and APIs of the Apache Flink community.
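The compatibility rule in the preceding notes can be sketched as a small check. This is a minimal illustration, not part of the product: the function names are hypothetical, and the sketch assumes that the first number after "vvr-" is the major version, as the two examples above suggest. It also treats a downgrade as not reusable, because an older engine may not be able to read state written by a newer one.

```python
import re
from typing import NamedTuple, Tuple


class EngineVersion(NamedTuple):
    vvr: Tuple[int, int, int]   # for example (4, 0, 15)
    flink: Tuple[int, int]      # for example (1, 13)


# Matches strings such as "vvr-4.0.15-flink-1.13".
_VERSION_RE = re.compile(r"^vvr-(\d+)\.(\d+)\.(\d+)-flink-(\d+)\.(\d+)$")


def parse_engine_version(text: str) -> EngineVersion:
    """Split an engine version string into its VVR and Flink parts."""
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized engine version: {text!r}")
    v1, v2, v3, f1, f2 = (int(g) for g in match.groups())
    return EngineVersion((v1, v2, v3), (f1, f2))


def state_reuse_expected(old: str, new: str) -> bool:
    """Return True if state from `old` is expected to be usable on `new`.

    Assumption: reuse is expected only for an upgrade within the same
    VVR major version (first number after "vvr-").
    """
    before, after = parse_engine_version(old), parse_engine_version(new)
    same_major = before.vvr[0] == after.vvr[0]
    not_a_downgrade = after.vvr >= before.vvr
    return same_major and not_a_downgrade
```

With the versions used in the notes, state_reuse_expected("vvr-4.0.15-flink-1.13", "vvr-4.0.18-flink-1.13") returns True, and state_reuse_expected("vvr-4.0.15-flink-1.13", "vvr-6.0.2-flink-1.15") returns False.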

Procedure

Step 1: Back up a draft and deploy the draft

To ensure the stable running of a deployment, we recommend that you clone the draft of the deployment and upgrade the engine version of the cloned draft.

SQL

Log on to the management console of Realtime Compute for Apache Flink. Find the workspace that you want to manage and click Console in the Actions column.
Back up the SQL draft.
1. In the left-side navigation pane, choose Development > ETL. On the Drafts tab of the page that appears, click the name of the draft that you want to manage.
2. Click Save As in the upper part of the draft.
3. In the dialog box that appears, enter a file name for the Name parameter, specify the Location parameter, and then click Save.
Upgrade the engine version of the new draft.
When you change the engine version of the new draft, we recommend that you select a stable version or a recommended version. Known defects and related issues in other versions are fixed in the stable versions and recommended versions. These versions provide the latest features and higher stability.
1. On the right side of the new draft, click the Configurations tab. In the Configurations panel, select the version that you want to use from the Engine Version drop-down list, and then click Deploy in the upper-right corner.
2. In the left-side navigation pane, choose O&M > Deployments. On the Deployments page, click the name of the deployment that you want to manage. In the Basic section of the Configuration tab, check whether the engine version of the deployment is changed.

DataStream

Log on to the management console of Realtime Compute for Apache Flink. Find the workspace that you want to manage and click Console in the Actions column.
Back up the DataStream draft and select the new engine version for the new draft.
When you change the engine version of the new draft, we recommend that you select a stable version or a recommended version. Known defects and related issues in other versions are fixed in the stable versions and recommended versions. These versions provide the latest features and higher stability.
1. In the left-side navigation pane, choose O&M > Deployments. On the Deployments page, click the name of the deployment that you want to manage.
2. In the upper-right corner of the page that appears, click Clone.
3. Enter a new name in the Deployment Name field, and select the new version from the Engine Version drop-down list.
Click Deploy.

Step 2: Back up the deployment state

In the left-side navigation pane of the Realtime Compute for Apache Flink console, choose O&M > Deployments. On the Deployments page, click the name of the deployment that you want to manage and view the state set of the deployment on the State tab. For more information, see the "View the state generation overview" section in the Manage a state set topic.

If your deployment uses stateful computing, you must consider whether the state data can be reused for the deployment after the upgrade.
Before you upgrade the engine version of your deployment, manually create a savepoint for the deployment so that you can quickly roll back the deployment if an exception occurs during the upgrade. For more information, see the "Manually create a savepoint" section of the Manage a state set topic.
Important
- In principle, minor versions of the same major version are compatible with each other. However, the upgrade compatibility between major versions is not guaranteed. If the versions before and after the upgrade are compatible with each other, the new version of the engine can read the savepoints that are generated by the old version of the engine. The old version of the engine may not be able to read the savepoints that are generated by the new version of the engine.
- Realtime Compute for Apache Flink that uses VVR 6.X or later supports two savepoint formats: native format and standard format. The native format supports faster savepoint generation. The standard format provides better compatibility. Therefore, if you want to upgrade the engine version of a deployment to a later minor version of the same major version, we recommend that you use the native format for savepoint generation. If you want to upgrade the engine version of a deployment to a later major version, we recommend that you use the standard format.
If your deployment is stateless, proceed to the next step.
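The savepoint format recommendation in the preceding note can be expressed as a short helper. This is a sketch under stated assumptions: the function names are hypothetical, the major version is taken to be the first number after "vvr-", and only upgrades are considered. For engines earlier than VVR 6.X the helper returns None, because the note describes the two formats only for VVR 6.X or later.

```python
from typing import Optional


def vvr_major(engine_version: str) -> int:
    """Extract the VVR major version, for example 6 from "vvr-6.0.2-flink-1.15"."""
    prefix = "vvr-"
    if not engine_version.startswith(prefix):
        raise ValueError(f"unrecognized engine version: {engine_version!r}")
    return int(engine_version[len(prefix):].split(".", 1)[0])


def recommended_savepoint_format(current: str, target: str) -> Optional[str]:
    """Suggest a savepoint format for an upgrade from `current` to `target`.

    Returns "native" for an upgrade within the same major version,
    "standard" for an upgrade to a later major version, and None when
    the current engine is earlier than VVR 6.X.
    """
    cur, tgt = vvr_major(current), vvr_major(target)
    if cur < 6:
        return None  # the format choice is described only for VVR 6.X or later
    return "native" if cur == tgt else "standard"
```

The target version strings passed to the helper are up to the caller. For example, an upgrade between two 6.x versions yields "native", which favors faster savepoint generation, whereas an upgrade from a 6.x version to a later major version yields "standard", which favors compatibility.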

(Optional) Step 3: Cancel the deployment

In the left-side navigation pane, choose O&M > Deployments. On the Deployments page, find the deployment you want to manage and click Cancel in the Actions column. For more information, see Cancel a deployment.

If the downstream operators support idempotent write operations or duplicate data is acceptable for your business, you can skip this step and run the original deployment and the new deployment at the same time.

Step 4: Start the new deployment

If the new deployment is stateful, perform the following operations to start the deployment: In the Start Job panel, select Resume Mode, select Specific State, and then select the savepoint that you created for the original deployment in Step 2.
If the new deployment is stateless, you can select Initial Mode to start the deployment.

For more information about how to start a deployment, see Start a deployment.
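The choice between the two start options can be summarized in a few lines. The sketch below only models the decision; it does not call any console or API. StartPlan and plan_start are hypothetical names, and the savepoint identifier in the usage note is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StartPlan:
    mode: str                        # "Resume Mode" or "Initial Mode", as labeled in the Start Job panel
    state: Optional[str] = None      # "Specific State" when resuming from a savepoint
    savepoint: Optional[str] = None  # the savepoint created before the upgrade


def plan_start(stateful: bool, savepoint: Optional[str] = None) -> StartPlan:
    """Pick the start options for the new deployment."""
    if not stateful:
        # A stateless deployment has nothing to restore.
        return StartPlan(mode="Initial Mode")
    if savepoint is None:
        raise ValueError("a stateful deployment needs the savepoint created in Step 2")
    return StartPlan(mode="Resume Mode", state="Specific State", savepoint=savepoint)
```

For example, plan_start(True, "savepoint-before-upgrade") selects Resume Mode with Specific State, and plan_start(False) selects Initial Mode. Raising an error when a stateful deployment has no savepoint makes a missed backup visible before the deployment is started.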

Step 5: Check the online status of the new deployment and drop the deployment for which you created a backup

In most cases, if the deployment starts as expected and the first checkpoint is generated after the upgrade, the upgrade is considered initially successful. We recommend that you also verify that the business data is correct to confirm that the upgrade produced the expected results.
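The "initially successful" condition can be sketched as a polling loop. This is an illustration only: the fetch callable is a stand-in for however you read the deployment status and the number of completed checkpoints (for example, from the console or your own monitoring), and the status strings "RUNNING" and "FAILED" are assumptions rather than documented values.

```python
import time
from typing import Callable, Tuple


def wait_for_first_checkpoint(
    fetch: Callable[[], Tuple[str, int]],
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the deployment runs and has completed a checkpoint.

    `fetch` returns (status, completed_checkpoint_count). Returns True once
    the status is "RUNNING" with at least one completed checkpoint, and
    False if the deployment fails or the timeout elapses first.
    """
    deadline = clock() + timeout_s
    while True:
        status, completed = fetch()
        if status == "FAILED":
            return False
        if status == "RUNNING" and completed >= 1:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A True result covers only the first check described above. Verifying the business data remains a separate, manual step.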

If the new deployment runs as expected and the data correctness is verified, you can manually delete the original deployment and the savepoint that was generated before the upgrade. For more information about how to delete a savepoint, see the "Manually delete a specified savepoint" section of the Manage a state set topic.

Rollback for upgrade failures

If the deployment cannot be started or the business data is abnormal after the upgrade, we recommend that you immediately cancel the deployment, change the engine version of the deployment back to the version that was used before the upgrade, and then use the savepoint that you created before the upgrade to restore your business. You can also submit a ticket to provide feedback.

If you cannot track the entire upgrade process, we recommend that you configure a deployment failure alert to notify you of exceptions at the earliest opportunity. For more information, see Configure monitoring and alerts.