By Yashi Su (Sisi)
In today's rapidly changing cloud-native field, the O&M of Kubernetes clusters faces many challenges, among which disaster recovery and business migration are particularly critical. An efficient and reliable mechanism for resource backup and recovery is indispensable for cluster backup and disaster recovery in response to emergencies, synchronization of primary and secondary services, migration from IDC to the cloud, and cloud migration in hybrid cloud scenarios.
These two scenarios share a common pain point: cross-cluster business recovery often involves environmental differences that require manual resource adjustments. This not only increases operational complexity but can also lengthen recovery time beyond the recovery time objective (RTO), affecting service continuity.
To address this challenge, ACK Backup Center supports a variety of resource adjustment strategies that automatically adapt resources to the target cluster environment during recovery. This removes much of the manual work in disaster recovery and migration, lowers the barrier to operation, and speeds up business recovery for Kubernetes clusters that need high availability and flexibility.
For businesses running on Kubernetes clusters, ACK provides a one-stop solution for containerized business disaster recovery and migration known as the Backup Center.
Backup center overview: https://www.alibabacloud.com/help/en/ack/ack-managed-and-ack-dedicated/user-guide/backup-center-overview
Cluster O&M engineers can perform the following operations: back up, configure resource adjustment strategies (optional), and restore.
Backup: Cluster O&M engineers can create a periodic backup schedule or a one-time application backup in the console of the backup cluster with one click. Compared with etcd backup, the backup center lets you select the applications to back up by namespace, label, and resource type. For stateful applications, it can also back up the data of the storage volumes mounted by the business. Enterprises that already manage manifests through a robust GitOps process can choose to back up only storage volume data.
After the backup is completed and uploaded to the OSS bucket associated with the backup vault, the backup center does not make any changes to the backup stored in the cloud.
Configure resource adjustment strategies and restore: The backup center supports the following methods to modify resources during restoration:
• Default modifications: No configuration is required; the components apply these by default during recovery. Examples include removing transient information such as resource UIDs, switching from FlexVolume to CSI when restoring storage volumes, automatically upgrading API versions, and applying known compatibility fixes in cross-cloud scenarios.
• Common modifications: These are configured through fields of the recovery task. Examples include mapping namespaces, storage classes, and image registry addresses, as well as rewriting annotations on Services and Ingresses for compatibility with the network plug-in.
• General modifications: More flexible changes rely on Velero's resource modifier feature. You write the rules in a ConfigMap to change specific fields, using JSON Patch operations such as add, remove, and replace.
Default modifications are applied only to resources and fields that require modifications to ensure successful deployment of the resources. In most in-place disaster recovery scenarios, additional resource adjustment strategies are not required.
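For the in-place case just described, a restore task may need little more than the name of the backup. The following is a minimal sketch, assuming that the ApplicationRestore resource accepts a spec that contains only backupName and that the optional mapping and modifier fields can simply be left out. The placeholder names are illustrative.

```yaml
# Sketch of an in-place restore that relies only on default modifications.
# Assumption: all adjustment-related fields are optional and omitted here.
apiVersion: csdr.alibabacloud.com/v1beta1
kind: ApplicationRestore
metadata:
  name: <restoreName>        # placeholder, choose any unique name
  namespace: csdr
spec:
  backupName: <backupName>   # name of a backup that has completed
```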
Common modifications and general modifications are both optional configurations for O&M engineers, which are used to:
• Ensure compatibility with new clusters or cloud resource environments when restarting services. This is important for disaster recovery scenarios across different locations such as hybrid clouds. For example, it addresses the issues caused by address changes after images are uploaded to the cloud and differences in underlying cloud resources among various cloud providers.
• Customize the business operation logic as needed. For stability and similar reasons, business migration often involves modifying configuration files, changing replica counts, forcing port changes, reusing load balancers, and forcibly overwriting listeners when a load balancer is reused. Common modifications only require setting fields in the restoration step, whereas the more flexible general modifications require you to create a ConfigMap before the restoration step.
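As an illustration of the general modifications described above, the following sketch shows rules in Velero's resource modifier format that change a replica count, add a label, and force a Service port. The resource names, namespace, label key, and values are hypothetical. Whether a given migrate-controller version accepts every operation shown here is an assumption to verify against the installed component.

```yaml
# Hypothetical rule file. It would be stored under a key of a ConfigMap
# and referenced by the restore task.
version: v1
resourceModifierRules:
# Rule 1: adjust a StatefulSet before it starts in the target cluster.
- conditions:
    groupResource: statefulsets.apps
    resourceNameRegex: "^web$"
    namespaces:
    - default
  patches:
  # Start with a single replica in the target cluster.
  - operation: replace
    path: "/spec/replicas"
    value: "1"
  # Record where the workload came from (label key is illustrative).
  - operation: add
    path: "/metadata/labels/migrated-from"
    value: "idc"
# Rule 2: force the first port of a Service to a new value.
- conditions:
    groupResource: services
    resourceNameRegex: "^nginx$"
    namespaces:
    - default
  patches:
  - operation: replace
    path: "/spec/ports/0/port"
    value: "8080"
```

The values are written as quoted strings, following the style of Velero's documented examples. Note that the add operation assumes /metadata/labels already exists on the resource, because JSON Patch requires the parent of an added member to exist.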
After configuring a resource adjustment strategy, O&M engineers can restore a backup record, including cluster resources and storage volume data, in the console of the cluster.
Next, this article walks through migrating a stateful application from a self-built Kubernetes cluster to an ACK cluster. The example demonstrates how to modify the namespace and image registry address through restore configuration and how to delete nodeAffinity by using the resource modifier.
The self-built cluster consists of Elastic Compute Service (ECS) instances and has the open-source version of the Container Storage Interface (CSI) storage plug-in installed.
Connect the self-built cluster to ACK One as a registered cluster and install the backup center component, migrate-controller.
Overview of registered clusters in ACK One: https://www.alibabacloud.com/help/en/ack/overview-9
The created stateful application uses the public NGINX image provided by the OpenAnolis community and is scheduled on the node marked with is_idc.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
  namespace: default
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  serviceName: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: is_idc
                operator: Exists
      containers:
      - image: anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis/nginx:1.14.1-8.6
        imagePullPolicy: IfNotPresent
        name: nginx
        ports:
        - containerPort: 80
          name: web
          protocol: TCP
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/share/nginx/html/
          name: www
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      labels:
        app: nginx
      name: www
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 20Gi
      storageClassName: alicloud-disk-topology-alltype
      volumeMode: Filesystem
The created stateful application and its corresponding storage volumes are shown in the following figure:
The example uses the following YAML to create an immediate backup task. The backup vault associated with the OSS bucket must be created in advance, and its name is supplied in storageLocation.
apiVersion: csdr.alibabacloud.com/v1beta1
kind: ApplicationBackup
metadata:
  name: <backupName>
  namespace: csdr
spec:
  backupType: AppAndPvBackup
  includedNamespaces:
  - default
  pvBackup:
    defaultPvBackup: true
  storageLocation: <backuplocationName>
  ttl: 720h0m0s
Wait for the status of the backup task to change to Completed.
A custom resource adjustment strategy has two parameters: conditions and patches. The conditions parameter specifies the resources to be modified, while the patches parameter specifies the fields to be modified and how the fields are modified.
In this example, both StatefulSet and Pod resources are modified. Specifically, the node affinity configuration is deleted.
apiVersion: v1
data:
  modifier: |
    version: v1
    resourceModifierRules:
    - conditions:
        groupResource: statefulsets.apps
        namespaces:
        - default
        labelSelector:
          matchLabels:
            app: nginx
      patches:
      - operation: remove
        path: "/spec/template/spec/affinity/nodeAffinity"
    - conditions:
        groupResource: pods
        resourceNameRegex: "^web.*$"
        namespaces:
        - default
        labelSelector:
          matchLabels:
            app: nginx
      patches:
      - operation: remove
        path: "/spec/affinity"
kind: ConfigMap
metadata:
  name: <backupName>-resource-modifier
  namespace: csdr
Switch to the ACK cluster, install the backup center component migrate-controller, and then associate it with the same backup vault. Wait for the backup to synchronize to the new cluster.
In addition to the preceding custom adjustment strategy, the recovery task configures namespace and image registry mapping through the namespaceMapping and imageRegistryMapping fields. During restoration, the resources (including storage volume data) in the default namespace of the backup are restored to the default1 namespace, and the image registry address of the OpenAnolis community is changed to the ACR image address (the image must be synchronized to ACR in advance).
apiVersion: csdr.alibabacloud.com/v1beta1
kind: ApplicationRestore
metadata:
  name: <restoreName>
  namespace: csdr
spec:
  backupName: <backupName>
  namespaceMapping:
    default: default1
  imageRegistryMapping:
    anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis: registry.cn-beijing.aliyuncs.com/<acrRegistry>
  resourceModifier:
    kind: ConfigMap
    name: <backupName>-resource-modifier
Wait for the status of the recovery task to change to Completed.
In the restored StatefulSet, the image registry address of the container image and the namespace of the resource have been modified, and the node affinity configuration has been removed. The remaining new fields are populated automatically by Kubernetes.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      # Omitted
  generation: 1
  labels:
    app: nginx
    velero.io/backup-name: <backupName>
    velero.io/restore-name: <restoreName>
  name: web
  namespace: default1
  resourceVersion: "119622"
  uid: d23878ea-0b9f-40ba-b61b-1ff6bb77eb43
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain
    whenScaled: Retain
  podManagementPolicy: OrderedReady
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nginx
  serviceName: nginx
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx
    spec:
      affinity: {}
      containers:
      - image: registry.cn-beijing.aliyuncs.com/<acrRegistry>/nginx:1.14.1-8.6
        imagePullPolicy: IfNotPresent
        name: nginx
        ports:
        - containerPort: 80
          name: web
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/share/nginx/html/
          name: www
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      labels:
        app: nginx
      name: www
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 20Gi
      storageClassName: alicloud-disk-topology-alltype
      volumeMode: Filesystem
    status:
      phase: Pending
status:
  availableReplicas: 0
  collisionCount: 0
  currentRevision: web-7b454646b4
  observedGeneration: 1
  replicas: 2
  updateRevision: web-7b454646b4
The stateful application starts successfully on ACK cluster nodes that do not carry the is_idc label, and new disks are restored from snapshots and mounted.
To meet the challenges caused by environmental differences during the backup and recovery of Kubernetes clusters, ACK Backup Center provides flexible resource adjustment strategies to ensure smooth migration and seamless resumption of your business.
[1] Backup Center
https://www.alibabacloud.com/help/en/ack/ack-managed-and-ack-dedicated/user-guide/backup-center-overview
[2] Registered clusters in ACK One
https://www.alibabacloud.com/help/en/ack/distributed-cloud-container-platform-for-kubernetes/user-guide/overview-9