Best Practices for Kubernetes Migration: Flexible Management of Resource Backup for Application Recovery

This article introduces ACK Backup Center, a Kubernetes disaster recovery and migration solution that simplifies cross-cluster application restoration...

By Yashi Su (Sisi)

Introduction

In today's rapidly changing cloud-native field, the O&M of Kubernetes clusters faces many challenges, among which disaster recovery and business migration are particularly critical. An efficient and reliable mechanism for resource backup and recovery is indispensable for cluster backup and disaster recovery in response to emergencies, synchronization of primary and secondary services, migration from IDC to the cloud, and cloud migration in hybrid cloud scenarios.

In both scenarios, a common pain point stands out: cross-cluster business recovery often involves environmental differences that require manual resource adjustments. This not only increases operational complexity but can also lengthen the actual recovery time and put the recovery time objective (RTO) at risk, affecting service continuity.

To handle this challenge, ACK Backup Center supports a variety of resource adjustment strategies. These strategies automatically adapt the restored resources to the target cluster environment during recovery, so that business can resume without manual rework. This simplifies the tedious operations involved in disaster recovery and migration, lowers the operational threshold, and speeds up business recovery. It provides strong support for Kubernetes cluster management in pursuit of high availability and flexibility.

Introduction to ACK Backup Center

For businesses running on Kubernetes clusters, ACK provides a one-stop solution for containerized business disaster recovery and migration known as the Backup Center.

Backup center overview: https://www.alibabacloud.com/help/en/ack/ack-managed-and-ack-dedicated/user-guide/backup-center-overview

Cluster O&M engineers can perform the following operations: back up applications, configure resource adjustment strategies (optional), and restore applications.

Backup: Cluster O&M engineers can create a periodic backup schedule or a single application backup in the console of the backup cluster with one click. Compared with etcd backup, the backup center supports selecting applications to back up based on dimensions such as namespaces, labels, and resource types. For stateful applications, it can also back up the data of the storage volumes mounted by the business at the same time. Enterprises that already have a robust GitOps process can use the backup center's data protection capabilities to back up only storage volume data.

After the backup is completed and uploaded to the OSS bucket associated with the backup vault, the backup center does not make any changes to the backup stored in the cloud.

Configure resource adjustment strategies and restoration: The backup center supports the following methods to modify resources during restoration:

• Default modifications: No configuration is required; the components apply these by default during recovery. Examples include removing transient information such as resource UIDs, converting FlexVolume to CSI when restoring storage volumes, automatically upgrading API versions, and applying some known compatibility modifications in cross-cloud scenarios.

• Common modifications: These can be implemented by configuring fields of the recovery task. Examples include mapping namespaces, storage classes, and image registry addresses, as well as rewriting annotations on Services and Ingresses for compatibility with the network plug-in.

• General modifications: More flexible modification needs rely on Velero's resource modifier feature. You write rules into a configuration item (ConfigMap) to change specific fields, with support for JSON Patch operations such as add, remove, and replace.
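
As a rough illustration of the general modification approach, the rule below follows the resource modifier format documented by Velero. The target (Deployments in the default namespace) and the patched fields are hypothetical and chosen only for demonstration; treat it as a sketch rather than a configuration taken from the backup center documentation.

```yaml
# Hypothetical resource modifier rule, for illustration only.
version: v1
resourceModifierRules:
- conditions:
    # Assumed target: every Deployment in the default namespace
    groupResource: deployments.apps
    namespaces:
    - default
  patches:
  # replace: overwrite the value of an existing field
  - operation: replace
    path: "/spec/replicas"
    value: "1"
  # remove: drop a field that should not be carried to the target cluster
  - operation: remove
    path: "/spec/template/spec/nodeSelector"
```

As in the ConfigMap used later in this article, the rule text is stored as a value inside a ConfigMap that the restore task references.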

Default modifications are applied only to resources and fields that require modifications to ensure successful deployment of the resources. In most in-place disaster recovery scenarios, additional resource adjustment strategies are not required.

Common modifications and general modifications are both optional configurations for O&M engineers, which are used to:

• Ensure compatibility with new clusters or cloud resource environments when restarting services. This is important for disaster recovery scenarios across different locations such as hybrid clouds. For example, it addresses the issues caused by address changes after images are uploaded to the cloud and differences in underlying cloud resources among various cloud providers.

• Customize the business operation logic as needed. For reasons such as stability, business migration often involves changing configuration files, adjusting replica counts, forcing port changes, reusing load balancers, and forcibly overwriting listeners when load balancers are reused. Common modifications only require setting fields of the restore task in the restoration step, whereas the more flexible general modifications require creating a ConfigMap in advance in this step.
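
One of the customizations listed above, a forced port change, could be expressed with a resource modifier rule along the following lines. The Service name, namespace, port index, and new port number are all assumptions made for the sake of the example.

```yaml
# Hypothetical rule: force a Service port change during restoration.
version: v1
resourceModifierRules:
- conditions:
    groupResource: services
    # Assumed Service name; adjust the regex to match real resources
    resourceNameRegex: "^nginx$"
    namespaces:
    - default
  patches:
  # JSON Patch addresses list entries by index: 0 is the first port
  - operation: replace
    path: "/spec/ports/0/port"
    value: "8080"
```

Because the path selects the port by its position in the list, a rule like this is only safe when the order of the ports in the backed-up Service is known.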

After configuring a resource adjustment strategy, O&M engineers can restore a backup record, including cluster resources and storage volume data, in the console of the cluster.

Best Practices for Cluster Resource Adjustment

Next, this article walks through the migration of a stateful application from a self-built Kubernetes cluster to an ACK cluster. The example demonstrates how to change the namespace and image registry address through the restore task configuration, and how to delete nodeAffinity by using the resource modifier.

Example of stateful application deployment

The self-built cluster consists of Elastic Compute Service (ECS) instances and has the open-source version of the Container Storage Interface (CSI) storage plug-in installed.

Connect the self-built cluster to the registered cluster in ACK One and install the backup center component, migrate-controller.

Overview of registered clusters in ACK One: https://www.alibabacloud.com/help/en/ack/overview-9

The stateful application uses the public NGINX image provided by the OpenAnolis community and is scheduled onto nodes that carry the is_idc label.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
  namespace: default
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  serviceName: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: is_idc
                operator: Exists
      containers:
      - image: anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis/nginx:1.14.1-8.6
        imagePullPolicy: IfNotPresent
        name: nginx
        ports:
        - containerPort: 80
          name: web
          protocol: TCP
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/share/nginx/html/
          name: www
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      labels:
        app: nginx
      name: www
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 20Gi
      storageClassName: alicloud-disk-topology-alltype
      volumeMode: Filesystem
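
The StatefulSet above refers to a governing Service through serviceName: nginx, but the article does not show that Service. A minimal sketch of a matching headless Service, assuming one does not already exist in the cluster, might look like this:

```yaml
# Assumed companion Service; not part of the original example.
apiVersion: v1
kind: Service
metadata:
  name: nginx          # must match spec.serviceName of the StatefulSet
  namespace: default
  labels:
    app: nginx
spec:
  clusterIP: None      # headless: each Pod gets a stable DNS name
  selector:
    app: nginx
  ports:
  - name: web
    port: 80
```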

The created stateful application and its corresponding storage volumes are shown in the following figure:

Back up stateful applications

The example uses the following YAML to create an immediate backup task. The backup vault associated with the OSS bucket must be created in advance, and its name is specified in storageLocation.

apiVersion: csdr.alibabacloud.com/v1beta1
kind: ApplicationBackup
metadata:
  name: <backupName>
  namespace: csdr
spec:
  backupType: AppAndPvBackup
  includedNamespaces:
  - default
  pvBackup:
    defaultPvBackup: true
  storageLocation: <backuplocationName>
  ttl: 720h0m0s

Wait for the status of the backup task to change to Completed.

Configure custom resource adjustment strategies

A custom resource adjustment strategy has two parameters: conditions and patches. The conditions parameter specifies the resources to be modified, while the patches parameter specifies the fields to be modified and how the fields are modified.

In this example, both StatefulSet and Pod resources are modified. Specifically, the node affinity configuration is deleted.

apiVersion: v1
data:
  modifier: |
    version: v1
    resourceModifierRules:
    - conditions:
        groupResource: statefulsets.apps
        namespaces:
        - default
        labelSelector:
          matchLabels:
            app: nginx
      patches:
      - operation: remove
        path: "/spec/template/spec/affinity/nodeAffinity"
    - conditions:
        groupResource: pods
        resourceNameRegex: "^web.*$"
        namespaces:
        - default
        labelSelector:
          matchLabels:
            app: nginx
      patches:
      - operation: remove
        path: "/spec/affinity"
kind: ConfigMap
metadata:
  name: <backupName>-resource-modifier
  namespace: csdr

Restore adjusted stateful applications

Switch to the ACK cluster, install the backup center component migrate-controller, and then associate it with the same backup vault. Wait for the backup to synchronize to the new cluster.

In addition to the preceding custom adjustment strategy, the recovery task configures namespace and image registry mapping through the namespaceMapping and imageRegistryMapping fields. During restoration, the resources (including storage volume data) in the default namespace of the backup are restored to the default1 namespace, and the image registry address of the OpenAnolis community is changed to the ACR image address (the image must be synchronized in advance).

apiVersion: csdr.alibabacloud.com/v1beta1
kind: ApplicationRestore
metadata:
  name: <restoreName>
  namespace: csdr
spec:
  backupName: <backupName>
  namespaceMapping:
    default: default1
  imageRegistryMapping:
    anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis: registry.cn-beijing.aliyuncs.com/<acrRegistry>
  resourceModifier:
    kind: ConfigMap
    name: <backupName>-resource-modifier

Wait for the status of the recovery task to change to Completed.

Verify that the recovery has been modified as required

In the restored StatefulSet, the namespace and the image registry address of the container image have been changed, and the node affinity configuration has been removed. The remaining new fields are populated automatically by Kubernetes.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      # Omitted
  generation: 1
  labels:
    app: nginx
    velero.io/backup-name: <backupName>
    velero.io/restore-name: <restoreName>
  name: web
  namespace: default1
  resourceVersion: "119622"
  uid: d23878ea-0b9f-40ba-b61b-1ff6bb77eb43
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain
    whenScaled: Retain
  podManagementPolicy: OrderedReady
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nginx
  serviceName: nginx
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx
    spec:
      affinity: {}
      containers:
      - image: registry.cn-beijing.aliyuncs.com/<acrRegistry>/nginx:1.14.1-8.6
        imagePullPolicy: IfNotPresent
        name: nginx
        ports:
        - containerPort: 80
          name: web
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/share/nginx/html/
          name: www
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      labels:
        app: nginx
      name: www
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 20Gi
      storageClassName: alicloud-disk-topology-alltype
      volumeMode: Filesystem
    status:
      phase: Pending
status:
  availableReplicas: 0
  collisionCount: 0
  currentRevision: web-7b454646b4
  observedGeneration: 1
  replicas: 2
  updateRevision: web-7b454646b4

The stateful application starts successfully on nodes of the ACK cluster that do not carry the is_idc label, and new disks are restored from snapshots and mounted.

Summary

To meet the challenges caused by different environments during the backup and recovery of Kubernetes clusters, ACK backup center provides flexible resource adjustment strategies to ensure smooth migration and seamless resumption of your business.

Alibaba Cloud ACK Backup Center: A One-stop Disaster Recovery Solution for Kubernetes Cluster Business Applications

