The storage-operator component can automate cross-zone migration and cross-zone pod spreading for stateful applications that use disk volumes. With its precheck and rollback features, the component can restore an application in the source zone when an exception occurs during the migration, which helps ensure business availability. This topic describes how to migrate stateful applications that use disk volumes across zones.
Scenarios
You may want to migrate a stateful application across zones due to deployment plan changes or when the resources in the current zone are insufficient.
File Storage NAS (NAS) and Object Storage Service (OSS) volumes support cross-zone mounting and multi-pod mounting. Disks, however, cannot be migrated across zones, so the persistent volumes (PVs) and persistent volume claims (PVCs) of disk volumes cannot be reused in a different zone. Therefore, if your stateful application uses disk volumes, you must migrate the application together with its disk data to the destination zone.
How it works and the migration procedure
To migrate applications that use disk volumes across zones, the component relies on the disk snapshot feature. You can also specify the retention period of the snapshots that are created during the migration. For more information about disk snapshots, see Introduction to snapshots. For more information about the billing rules of snapshots, see Snapshots.
The storage-operator component performs the following steps to migrate a stateful application that uses disk volumes:
Performs a precheck. For example, the component checks whether the application runs as expected and whether the application has disks that need to be migrated. The application is not migrated if it fails the precheck.
Scales the pods of the application to zero. After the scale-in activity, the application is suspended.
Creates snapshots for the disks that are used by the application. Snapshots can be used across zones.
Uses the snapshots to create disks in the destination zone after confirming that the snapshots are valid. The newly created disks store the same data as the original disks.
Creates PVs and PVCs with the same names and binds the PVs and PVCs to the disks.
Scales the pods of the application to the original number and mounts the disk volumes to the pods.
Important: The component migrates the application only after it passes the precheck. A rollback policy is prepared for each step of the migration. To avoid data loss after migration, make sure that the application runs as expected before the component deletes the original disks.
(Optional) Deletes the original PVs and disks after you confirm that the application runs as expected. For more information about the billing rules of disks, see EBS devices.
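The zone binding that makes this procedure necessary can be seen on the PV object itself. The following fragment is only an illustration of what a disk PV provisioned by the Alibaba Cloud disk CSI driver typically looks like; the PV name, disk ID, capacity, and zone are placeholders, not output from a real cluster. After the migration, a PV with the same name references the new disk ID and carries a node-affinity rule for the destination zone.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: d-2ze0exampleoriginal                 # placeholder; dynamically provisioned PVs are usually named after the disk ID
spec:
  capacity:
    storage: 20Gi
  accessModes: ["ReadWriteOnce"]
  csi:
    driver: diskplugin.csi.alibabacloud.com   # Alibaba Cloud disk CSI driver
    volumeHandle: d-2ze0exampleoriginal       # ID of the cloud disk that backs this PV (placeholder)
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.diskplugin.csi.alibabacloud.com/zone
          operator: In
          values:
          - cn-beijing-i                      # the disk, and therefore the PV, can be used only in this zone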
Usage notes
Make sure that the application to be migrated uses only enhanced SSDs (ESSDs).
The component creates instant access (IA) snapshots during the migration, which require less time to create, and IA snapshots can be created only for ESSDs. For more information, see Enable or disable the instant access feature. If your application uses disks other than ESSDs, use one of the following methods:
Change the type of the disks to ESSD before you migrate the application. For more information, see Change the type of a cloud disk.
Follow the steps in the Use volume snapshots created from disks topic to manually recreate the disks in the destination zone.
Make sure that the destination zone supports ESSDs and the cluster contains an idle node to host the pods of the application in the destination zone.
Make sure that the service provided by the application can tolerate an interruption. If the stateful application has more than one pod, all pods are scaled to zero before the migration starts to ensure data consistency and are scaled back to the original number after the migration is complete.
Business interruptions may occur when a stateful application is migrated across zones. The interruption duration depends on the number of pods, container launch speed, and disk capacity.
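For the ESSD requirement in the first note above, you can check the disk category of the application's volumes before you start. The following commands are a sketch that assumes dynamically provisioned disk volumes in the default namespace and the alicloud-disk-essd StorageClass used in the examples later in this topic; for the Alibaba Cloud disk CSI driver, ESSDs correspond to the StorageClass parameter type=cloud_essd, which you should verify for your driver version.
# List the StorageClass used by each PVC of the application.
kubectl get pvc -n default -o custom-columns=NAME:.metadata.name,STORAGECLASS:.spec.storageClassName

# Check the disk category configured on that StorageClass. ESSDs are reported as cloud_essd.
kubectl get sc alicloud-disk-essd -o jsonpath='{.parameters.type}{"\n"}'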
Prerequisites
An ACK cluster that runs Kubernetes 1.20 or later is created. The Container Storage Interface (CSI) plug-in is used as the volume plug-in. For more information, see Create an ACK managed cluster.
If you use an ACK dedicated cluster, you need to grant the following permissions to the worker role and master role. For more information, see Create custom policies.
Note: If you use an ACK Pro cluster, you do not need to grant the preceding RAM permissions.
The version of the storage-operator component is v1.26.2-1de13b6-aliyun or later. For more information about how to update storage-operator, see Manage components.
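To check the installed version from the command line, you can read the image tag of the component's workload. This is a sketch that assumes storage-operator runs as a Deployment of the same name in the kube-system namespace; you can also check the version on the Add-ons page of the console.
kubectl -n kube-system get deployment storage-operator \
    -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'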
Use the storage-operator component to migrate a stateful application
Run the following command to modify the ConfigMap of storage-operator:
kubectl patch configmap/storage-operator \
    -n kube-system \
    --type merge \
    -p '{"data":{"storage-controller":"{\"imageRep\":\"acs/storage-controller\",\"imageTag\":\"\",\"install\":\"true\",\"template\":\"/acs/templates/storage-controller/install.yaml\",\"type\":\"deployment\"}"}}'
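The patch sets install to true for the storage-controller template, which installs the controller as a Deployment. Assuming the Deployment is named storage-controller and resides in the kube-system namespace (an assumption based on the template path and type fields in the patch), you can verify that it is running:
kubectl -n kube-system get deployment storage-controller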
Run the following command to create a stateful application migration task:
cat <<EOF | kubectl apply -f -
apiVersion: storage.alibabacloud.com/v1beta1
kind: ContainerStorageOperator
metadata:
  name: default
spec:
  operationType: APPMIGRATE
  operationParams:
    stsName: web
    stsNamespace: default
    stsType: kube
    targetZone: cn-beijing-h,cn-beijing-j
    checkWaitingMinutes: "1"
    healthDurationMinutes: "1"
    snapshotRetentionDays: "2"
    retainSourcePV: "true"
EOF
The following list describes the parameters in the preceding manifest.
operationType
Required. Set the value to APPMIGRATE, which specifies that the operation migrates a stateful application.
stsName
Required. The name of the stateful application. You can specify only one application.
Note: If you create multiple stateful application migration tasks, the component runs the tasks in the order in which they are created.
stsNamespace
Required. The namespace to which the application belongs.
targetZone
Required. The destination zones to which the application is migrated. Separate multiple zones with commas (,). Example: cn-beijing-h,cn-beijing-j.
If a disk that is used by the application already resides in a zone in the list, the disk is not migrated. If multiple destination zones are specified, the remaining disks are spread across the zones in the order in which the zones are listed.
stsType
Optional. The type of the stateful application. Default value: kube. Valid values:
kube: Kubernetes StatefulSet.
kruise: Advanced StatefulSet provided by the OpenKruise component.
checkWaitingMinutes
Optional. The interval, in minutes, at which the component checks the status of the application in the destination zone. Default value: "1". The component checks the status of the application at this interval until the pods of the application reach the original number, or rolls the application back to the source zone if the application consecutively fails the status check.
Important: Increase the interval if the application has a large number of pods, image pulling is time-consuming, or the application takes a long time to start. Otherwise, the application may be rolled back after it fails the status check multiple times.
healthDurationMinutes
Optional. The period of time, in minutes, to wait before the component double-checks the status of the application. The component double-checks the status of the application at the scheduled time after the pods of the application reach the expected number. This helps improve the reliability of businesses that are sensitive to data. Default value: "0", which indicates that the component does not double-check the application.
snapshotRetentionDays
Optional. The retention period of IA snapshots. Unit: days. Valid values:
"1": one day. This is the default value.
"-1": permanently retains the IA snapshots.
retainSourcePV
Optional. Specifies whether to retain the original disks and PVs. Valid values:
"false": does not retain the original disks and PVs. This is the default value.
"true": retains the original disks and PVs. You can find the original disks in the Elastic Compute Service (ECS) console. The status of the original disk volumes changes to Released.
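After the task is created, you can track its progress through the ContainerStorageOperator object. cso is the short name used throughout the examples below, and default is the metadata.name from the manifest above:
kubectl describe cso default | grep Status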
Examples
In the following examples, the testing cluster is an ACK Pro cluster deployed in the cn-beijing region. The cluster contains the node-zone-i, node-zone-j, and node-zone-k nodes deployed in the cn-beijing-i, cn-beijing-j, and cn-beijing-k zones, respectively.
Example 1: Migrate disks across zones
Step 1: Create a stateful application that uses ESSDs
Run the following command to deploy a stateful application named nginx in the cluster:
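The manifest for the application is not reproduced in this topic. The following is a minimal sketch that is consistent with the rest of the walkthrough: a StatefulSet named web with two replicas behind a headless Service named nginx, and one ESSD disk volume per pod through the alicloud-disk-essd StorageClass. The image, mount path, and disk size are assumptions.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  clusterIP: None           # headless Service for the StatefulSet
  selector:
    app: nginx
  ports:
  - name: web
    port: 80
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: "nginx"
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
        ports:
        - name: web
          containerPort: 80
        volumeMounts:
        - name: disk-essd
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: disk-essd
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: alicloud-disk-essd   # ESSD disks are required for IA snapshots
      resources:
        requests:
          storage: 20Gi                      # 20 GiB is the minimum ESSD size
EOF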
Run the following command to query the deployment of the pods for the application:
kubectl get pod -owide | grep web-
Expected output:
NAME    READY   STATUS    RESTARTS   AGE   IP             NODE          NOMINATED NODE   READINESS GATES
web-0   1/1     Running   0          44s   172.29.XX.XX   node-zone-i   <none>           <none>
web-1   1/1     Running   0          3s    172.29.XX.XX   node-zone-j   <none>           <none>
The output indicates that two pods are deployed on nodes in the cn-beijing-i and cn-beijing-j zones, respectively. The actual deployment depends on the scheduler.
Step 2: Create a stateful application migration task
Run the following command to create a stateful application migration task:
The following migration task migrates the two pods to the cn-beijing-k zone. Before the migration starts, make sure that the node in the cn-beijing-k zone has sufficient resources and that the zone and the node support ESSDs.
cat <<EOF | kubectl apply -f -
apiVersion: storage.alibabacloud.com/v1beta1
kind: ContainerStorageOperator
metadata:
  name: migrate-to-k
spec:
  operationType: APPMIGRATE
  operationParams:
    stsName: web
    stsNamespace: default
    stsType: kube
    targetZone: cn-beijing-k       # Specify cn-beijing-k as the destination zone.
    healthDurationMinutes: "1"     # Wait 1 minute and then check the status of the application after the migration is complete.
    snapshotRetentionDays: "-1"    # Permanently retain the snapshots. You can manually delete the snapshots in the console.
    retainSourcePV: "true"         # Retain the original disks and PVs.
EOF
Run the following command to query the status of the migration task:
kubectl describe cso migrate-to-k | grep Status
Expected output:
Status: SUCCESS
If the output displays SUCCESS, the migration task runs as expected. If the output displays FAILED, the migration task failed to be created. For more information about how to troubleshoot the issue, see (Optional) Troubleshoot migration task creation failures.
Run the following command to query the deployment of the two pods:
kubectl get pod -owide | grep web-
Expected output:
NAME    READY   STATUS    RESTARTS   AGE   IP             NODE          NOMINATED NODE   READINESS GATES
web-0   1/1     Running   0          25m   172.29.XX.XX   node-zone-k   <none>           <none>
web-1   1/1     Running   0          25m   172.29.XX.XX   node-zone-k   <none>           <none>
The output indicates that the pods are migrated to the node in the cn-beijing-k zone.
Log on to the ECS console.
Confirm the following information:
Whether the newly created IA snapshots are permanently retained.
Whether the newly created disks reside in the cn-beijing-k zone.
Whether the disks in the cn-beijing-i and cn-beijing-j zones and the PVs are retained, because the retainSourcePV parameter of the migration task is set to "true".
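You can also confirm the newly created disks from within the cluster. For CSI disk volumes, the spec.csi.volumeHandle field of each PV holds the cloud disk ID, which you can cross-check against the disks shown in the ECS console (the column names below are only for readability):
kubectl get pv -o custom-columns=NAME:.metadata.name,CLAIM:.spec.claimRef.name,DISK:.spec.csi.volumeHandle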
(Optional) Troubleshoot migration task creation failures
If the output in Step 2 indicates that the migration task is in the FAILED state, perform the following steps to troubleshoot and fix the issue and then try again.
Run the following command to confirm that the application is rolled back:
kubectl get pod -owide | grep web-
Expected output:
NAME    READY   STATUS    RESTARTS   AGE   IP             NODE          NOMINATED NODE   READINESS GATES
web-0   1/1     Running   0          12m   172.29.XX.XX   node-zone-i   <none>           <none>
web-1   1/1     Running   0          12m   172.29.XX.XX   node-zone-j   <none>           <none>
Run the following command to query the cause of failure:
kubectl describe cso migrate-to-k | grep Message -A 1
Expected output:
Message: Consume: no pvc mounted in statefulset or no pvc need to migrated web
The output indicates that the migration task failed to be created because the PVCs of the disk volumes to be migrated do not exist. This issue occurs if the volumes are not mounted to the application, the volumes already reside in the destination zone, or the system fails to retrieve the PVC information. Modify the configurations based on the cause and try again.
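This topic does not state whether the controller reprocesses a task object that has already failed. A conservative way to retry after you fix the cause is therefore to delete the failed task and apply a corrected manifest; the resource name comes from the example above:
kubectl delete cso migrate-to-k
# Then re-apply the corrected ContainerStorageOperator manifest.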
Example 2: Spread disks across zones
In this example, the stateful application has two pods deployed on the node-zone-k node in the cn-beijing-k zone. To improve the availability of the application, you can spread the pods across the cn-beijing-i and cn-beijing-j zones. To do this, perform the following steps.
Run the following command to create a stateful application migration task:
cat <<EOF | kubectl apply -f -
apiVersion: storage.alibabacloud.com/v1beta1
kind: ContainerStorageOperator
metadata:
  name: migrate-to-i-and-j
spec:
  operationType: APPMIGRATE
  operationParams:
    stsName: web
    stsNamespace: default
    stsType: kube
    targetZone: cn-beijing-i,cn-beijing-j   # Specify cn-beijing-i and cn-beijing-j as the destination zones.
    healthDurationMinutes: "1"              # Wait 1 minute and then check the status of the application after the migration is complete.
    snapshotRetentionDays: "-1"             # Permanently retain the snapshots. You can manually delete the snapshots in the console.
    retainSourcePV: "true"                  # Retain the original disks and PVs.
EOF
Run the following command to query the status of the migration task:
kubectl describe cso migrate-to-i-and-j | grep Status
Expected output:
Status: SUCCESS
Run the following command to query the deployment of the two pods:
kubectl get pod -owide | grep web-
Expected output:
NAME    READY   STATUS    RESTARTS   AGE   IP             NODE          NOMINATED NODE   READINESS GATES
web-0   1/1     Running   0          12m   172.29.XX.XX   node-zone-i   <none>           <none>
web-1   1/1     Running   0          12m   172.29.XX.XX   node-zone-j   <none>           <none>
The output indicates that the two pods are spread to the cn-beijing-i and cn-beijing-j zones.
Log on to the ECS console.
Confirm the following information:
Whether the newly created IA snapshots are permanently retained.
Whether the newly created disks reside in the cn-beijing-i and cn-beijing-j zones.
Whether the disks in the cn-beijing-k zone and the PVs are retained, because the retainSourcePV parameter of the migration task is set to "true".