Container Service for Kubernetes: Migrate stateful applications that use disk volumes across zones

Last Updated: Nov 01, 2024

The storage-operator component can automate cross-zone migration and cross-zone pod spreading for stateful applications that use disk volumes. With its precheck and rollback features, the component can restore an application in the source zone to ensure business availability if an exception occurs during the migration. This topic describes how to migrate stateful applications that use disk volumes across zones.

Scenarios

  • You may want to migrate a stateful application across zones due to deployment plan changes or when the resources in the current zone are insufficient.

  • File Storage NAS (NAS) and Object Storage Service (OSS) volumes support cross-zone mounting and multi-pod mounting. The persistent volumes (PVs) and persistent volume claims (PVCs) of disk volumes cannot be reused in different zones because disks cannot be migrated across zones. Therefore, if your stateful application uses disk volumes, you need to migrate the application to the destination zone.

How it works and the migration procedure

Cross-zone migration of applications that use disk volumes relies on the disk snapshot feature, which also allows you to specify the retention period of disk snapshots. For more information about disk snapshots, see Introduction to snapshots. For more information about the billing rules of snapshots, see Snapshots.

The storage-operator component performs the following steps to migrate a stateful application that uses disk volumes:

  1. Performs a precheck. For example, the component checks whether the application runs as expected and whether the application has disks that need to be migrated. The application is not migrated if it fails the precheck.

  2. Scales the pods of the application to zero. After the scale-in activity, the application is suspended (see the kubectl sketch after this list).

  3. Creates snapshots for the disks that are used by the application. Snapshots can be used across zones.

  4. Uses the snapshots to create disks in the destination zone after confirming that the snapshots are valid. The newly created disks store the same data as the original disks.

  5. Creates PVs and PVCs with the same names and binds the PVs and PVCs to the disks.

  6. Scales the pods of the application to the original number and mounts the disk volumes to the pods.

    Important

    The component starts the migration only after the application passes the precheck, and a rollback policy is prepared for each step of the migration. To avoid data loss, make sure that the application runs as expected before the component deletes the original disks.

  7. (Optional) Deletes the original PVs and disks after you confirm that the application runs as expected. For more information about the billing rules of disks, see EBS devices.
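
For reference, the scale-in in step 2 and the scale-out in step 6 correspond to the following kubectl operations. The component performs them automatically; this minimal sketch, which assumes the StatefulSet named web from the Examples section, is only an illustration:

    # Step 2: suspend the application by scaling it to zero
    kubectl scale statefulset web --replicas=0 -n default
    # ... snapshots are created, and disks, PVs, and PVCs are re-created in the destination zone ...
    # Step 6: restore the original replica count
    kubectl scale statefulset web --replicas=2 -n default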

Usage notes

  • Make sure that the application to be migrated uses only enhanced SSDs (ESSDs).

    You can also select instant access (IA) snapshots, which require less time to create. For more information, see Enable or disable the instant access feature. You can create IA snapshots only for ESSDs. If your application uses disks other than ESSDs, change the disk category to ESSD before the migration.

  • Make sure that the destination zone supports ESSDs and the cluster contains an idle node to host the pods of the application in the destination zone.

  • Make sure that the service provided by the application can tolerate an interruption. If the stateful application has more than one pod, all pods are scaled to zero before the migration starts to ensure data consistency, and are scaled back to the original number after the migration is complete.

Important

Business interruptions may occur when a stateful application is migrated across zones. The interruption duration depends on the number of pods, container launch speed, and disk capacity.

Prerequisites

  • An ACK cluster that runs Kubernetes 1.20 or later is created. The Container Storage Interface (CSI) plug-in is used as the volume plug-in. For more information, see Create an ACK managed cluster.

  • If you use an ACK dedicated cluster, you need to grant the following permissions to the worker role and master role. For more information, see Create custom policies.

    The following permission policy must be attached to the worker role and master role:

    {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ecs:CreateSnapshot",
                    "ecs:DescribeSnapshot",
                    "ecs:DeleteSnapshot",
                    "ecs:ModifyDiskSpec",
                    "ecs:DescribeTaskAttribute"
                ],
                "Resource": "*"
            }
        ]
    }
    Note

    If you use an ACK Pro cluster, you do not need to grant the preceding RAM permissions.

  • The version of the storage-operator component is v1.26.2-1de13b6-aliyun or later. For more information about how to update storage-operator, see Manage components. You can check the installed version as shown in the sketch below.
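
    A minimal sketch for checking the installed version, assuming storage-operator runs as a Deployment named storage-operator in the kube-system namespace:

    kubectl get deployment storage-operator -n kube-system \
      -o jsonpath='{.spec.template.spec.containers[0].image}'

    The image tag in the output indicates the component version.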

Use the storage-operator component to migrate a stateful application

  1. Run the following command to modify the ConfigMap of storage-operator:

    kubectl patch configmap/storage-operator \
      -n kube-system \
      --type merge \
      -p '{"data":{"storage-controller":"{\"imageRep\":\"acs/storage-controller\",\"imageTag\":\"\",\"install\":\"true\",\"template\":\"/acs/templates/storage-controller/install.yaml\",\"type\":\"deployment\"}"}}'
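
    After the patch is applied, storage-operator installs the storage-controller from the template. To verify the installation, you can check the controller workload; a minimal sketch, assuming the controller is created as a Deployment named storage-controller in the kube-system namespace (as the type and template fields above suggest):

    kubectl get deployment storage-controller -n kube-system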
  2. Run the following command to create a stateful application migration task:

    cat <<EOF | kubectl apply -f -
    apiVersion: storage.alibabacloud.com/v1beta1
    kind: ContainerStorageOperator
    metadata:
      name: default
    spec:
      operationType: APPMIGRATE
      operationParams:
        stsName: web
        stsNamespace: default
        stsType: kube
        targetZone: cn-beijing-h,cn-beijing-j
        checkWaitingMinutes: "1"
        healthDurationMinutes: "1"
        snapshotRetentionDays: "2"
        retainSourcePV: "true"
    EOF
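
    After the task is created, you can track its progress by querying the status of the ContainerStorageOperator resource (cso), in the same way as in the Examples section below:

    kubectl describe cso default | grep Status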

    The following list describes the parameters:

    • operationType (required): Set the value to APPMIGRATE, which specifies that the operation migrates a stateful application.

    • stsName (required): The name of the stateful application. You can specify only one application.

      Note: If you create multiple stateful application migration tasks, the component runs the tasks in the order in which they are created.

    • stsNamespace (required): The namespace to which the application belongs.

    • targetZone (required): The destination zones to which the application is migrated. Separate multiple zones with commas (,). Example: cn-beijing-h,cn-beijing-j.

      • If a disk used by the application already resides in a zone in the list, the disk is not migrated.

      • If multiple destination zones are specified, the remaining disks are spread across the zones in the order in which the zones are listed.

    • stsType (optional): The type of the stateful application. Default value: kube. Valid values:

      • kube: Kubernetes StatefulSet.

      • kruise: Advanced StatefulSet provided by the OpenKruise component.

    • checkWaitingMinutes (optional): The interval at which the component checks the status of the application in the destination zone. Unit: minutes. Default value: "1". The component checks the status of the application at 1-minute intervals until the pods of the application reach the original number, or rolls the application back to the source zone if the application repeatedly fails the status check.

      Important: Increase the interval if the application has a large number of pods, image pulling is time-consuming, or the application takes a long time to start. Otherwise, the application may be rolled back after it fails the status check multiple times.

    • healthDurationMinutes (optional): The period of time to wait before the component double-checks the status of the application. Unit: minutes. The component double-checks the status at the scheduled time after the pods of the application reach the expected number. This helps improve reliability for businesses that are sensitive to data. Default value: "0", which indicates that the component does not double-check the application.

    • snapshotRetentionDays (optional): The retention period of IA snapshots. Unit: days. Valid values:

      • "1": retains the IA snapshots for one day. This is the default value.

      • "-1": permanently retains the IA snapshots.

    • retainSourcePV (optional): Specifies whether to retain the original disks and PVs. Valid values:

      • "false": does not retain the original disks and PVs. This is the default value.

      • "true": retains the original disks and PVs. You can find the original disks in the Elastic Compute Service (ECS) console, and the status of the original disk volumes changes to Released (see the verification sketch after this list).

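    For example, if retainSourcePV is set to "true", you can verify after the migration that the original PVs are retained and in the Released state; a minimal sketch:

    kubectl get pv | grep Released
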
Examples

In the following examples, the testing cluster is an ACK Pro cluster deployed in the cn-beijing region. The cluster contains the node-zone-i, node-zone-j, and node-zone-k nodes deployed in the cn-beijing-i, cn-beijing-j, and cn-beijing-k zones, respectively.

Example 1: Migrate disks across zones

Step 1: Create a stateful application that uses ESSDs

  1. Run the following command to deploy a stateful application named web in the cluster. The StatefulSet uses the alicloud-disk-essd StorageClass to provision ESSD disk volumes:

    cat << EOF | kubectl apply -f -
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: web
    spec:
      selector:
        matchLabels:
          app: nginx
      serviceName: "nginx"
      replicas: 2
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
            - name: nginx
              image: nginx:1.14.2
              ports:
                - containerPort: 80
                  name: web
              volumeMounts:
                - name: www
                  mountPath: /usr/share/nginx/html
      volumeClaimTemplates:
        - metadata:
            name: www
            labels:
              app: nginx
          spec:
            accessModes: [ "ReadWriteOnce" ]
            storageClassName: "alicloud-disk-essd"
            resources:
              requests:
                storage: 20Gi
    EOF
    
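    You can optionally confirm that the persistent volume claims defined in volumeClaimTemplates are created and bound; a minimal sketch that uses the app: nginx label from the template above:

    kubectl get pvc -l app=nginx

    The expected output lists the PVCs www-web-0 and www-web-1 in the Bound state.
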
  2. Run the following command to query the deployment of the pods for the application:

    kubectl get pod -owide | grep web-

    Expected output:

    NAME    READY   STATUS    RESTARTS   AGE   IP             NODE          NOMINATED NODE   READINESS GATES
    web-0   1/1     Running   0          44s   172.29.XX.XX   node-zone-i   <none>           <none>
    web-1   1/1     Running   0          3s    172.29.XX.XX   node-zone-j   <none>           <none>

    The output indicates that the two pods are deployed on nodes in the cn-beijing-i and cn-beijing-j zones, respectively. The actual placement depends on the scheduler. To confirm the zone of each disk volume, see the sketch below.
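
    To confirm the zone of each disk PV rather than infer it from the node, you can inspect the node affinity of the PVs; a minimal sketch, assuming the default topology key of the disk CSI driver (topology.diskplugin.csi.alibabacloud.com/zone):

    kubectl get pv -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values[0]}{"\n"}{end}'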

Step 2: Create a stateful application migration task

  1. Run the following command to create a stateful application migration task:

    The following migration task migrates the two pods to the cn-beijing-k zone. Before the migration starts, make sure that the node in the cn-beijing-k zone has sufficient resources and that the zone and the node support ESSDs.

    cat <<EOF | kubectl apply -f -
    apiVersion: storage.alibabacloud.com/v1beta1
    kind: ContainerStorageOperator
    metadata:
      name: migrate-to-k
    spec:
      operationType: APPMIGRATE
      operationParams:
        stsName: web
        stsNamespace: default
        stsType: kube
        targetZone: cn-beijing-k      # Specify cn-beijing-k as the destination zone.
        healthDurationMinutes: "1"    # Wait 1 minute and then check the status of the application after the migration is complete.
        snapshotRetentionDays: "-1"   # Permanently retain the snapshots. You can manually delete the snapshots in the console.
        retainSourcePV: "true"        # Retain the original disks and PVs.
    EOF
  2. Run the following command to query the status of the migration task:

    kubectl describe cso migrate-to-k | grep Status

    Expected output:

      Status:  SUCCESS

    If the output displays SUCCESS, the migration task runs as expected. If the output displays FAILED, the migration task failed to be created. For more information about how to troubleshoot the issue, see (Optional) Troubleshoot migration task creation failures.

  3. Run the following command to query the deployment of the two pods:

    kubectl get pod -owide | grep web-

    Expected output:

    NAME    READY   STATUS    RESTARTS   AGE   IP             NODE          NOMINATED NODE   READINESS GATES
    web-0   1/1     Running   0          25m   172.29.XX.XX   node-zone-k   <none>           <none>
    web-1   1/1     Running   0          25m   172.29.XX.XX   node-zone-k   <none>           <none>

    The output indicates that the pods are migrated to the node in the cn-beijing-k zone.

  4. Log on to the ECS console.

    Confirm the following information:

    • Whether the newly created IA snapshots are permanently retained.

    • Whether the newly created disks reside in the cn-beijing-k zone.

    • Whether the disks in the cn-beijing-i and cn-beijing-j zones and the PVs are retained because the retainSourcePV parameter of the migration task is set to true.

(Optional) Troubleshoot migration task creation failures

If the output in Step 2 indicates that the migration task is in the FAILED state, perform the following steps to troubleshoot and fix the issue and then try again.

  1. Run the following command to confirm that the application is rolled back:

    kubectl get pod -owide | grep web-

    Expected output:

    NAME    READY   STATUS    RESTARTS   AGE   IP             NODE          NOMINATED NODE   READINESS GATES
    web-0   1/1     Running   0          12m   172.29.XX.XX   node-zone-i   <none>           <none>
    web-1   1/1     Running   0          12m   172.29.XX.XX   node-zone-j   <none>           <none>
  2. Run the following command to query the cause of failure:

    kubectl describe cso migrate-to-k | grep Message -A 1

    Expected output:

      Message:
        Consume: no pvc mounted in statefulset or no pvc need to migrated web

    The output indicates that the migration task failed to be created because the PVCs of the disk volumes to be migrated do not exist. This issue occurs if the volumes are not mounted to the application, the volumes already reside in the destination zone, or the system fails to retrieve the PVC information. Modify the configurations based on the cause and try again. You can check whether disk-backed PVCs exist as shown in the sketch below.
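
    For example, to confirm that the application has disk-backed PVCs to migrate, you can list the PVCs created from the volumeClaimTemplates; a minimal sketch that assumes the PVC name prefix www from the example template:

    kubectl get pvc -n default | grep www-web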

Example 2: Spread disks across zones

In this example, the stateful application has two pods deployed on the node-zone-k node in the cn-beijing-k zone. To improve the availability of the application, you can spread the pods across the cn-beijing-i and cn-beijing-j zones. To do this, perform the following steps.

  1. Run the following command to create a stateful application migration task:

    cat <<EOF | kubectl apply -f -
    apiVersion: storage.alibabacloud.com/v1beta1
    kind: ContainerStorageOperator
    metadata:
      name: migrate-to-i-and-j
    spec:
      operationType: APPMIGRATE
      operationParams:
        stsName: web
        stsNamespace: default
        stsType: kube
        targetZone: cn-beijing-i,cn-beijing-j   # Specify cn-beijing-i and cn-beijing-j as the destination zones. 
        healthDurationMinutes: "1"              # Wait 1 minute and then check the status of the application after the migration is complete. 
        snapshotRetentionDays: "-1"             # Permanently retain the snapshots. You can manually delete the snapshots in the console. 
        retainSourcePV: "true"                  # Retain the original disks and PVs. 
    EOF
  2. Run the following command to query the status of the migration task:

    kubectl describe cso migrate-to-i-and-j | grep Status

    Expected output:

      Status:  SUCCESS
  3. Run the following command to query the deployment of the two pods:

    kubectl get pod -owide | grep web-

    Expected output:

    NAME    READY   STATUS    RESTARTS   AGE   IP             NODE          NOMINATED NODE   READINESS GATES
    web-0   1/1     Running   0          12m   172.29.XX.XX   node-zone-i   <none>           <none>
    web-1   1/1     Running   0          12m   172.29.XX.XX   node-zone-j   <none>           <none>

    The output indicates that the two pods are spread to the cn-beijing-i and cn-beijing-j zones.

  4. Log on to the ECS console.

    Confirm the following information:

    • Whether the newly created IA snapshots are permanently retained.

    • Whether the newly created disks reside in the cn-beijing-i and cn-beijing-j zones.

    • Whether the disks in the cn-beijing-k zone and the PVs are retained because the retainSourcePV parameter of the migration task is set to true.