The backup center can help you migrate applications across clusters that use different volume plug-ins or run different Kubernetes versions. For example, you can migrate applications from a Container Service for Kubernetes (ACK) cluster that uses FlexVolume to an ACK cluster that uses CSI, or from an ACK cluster that runs an old Kubernetes version to an ACK cluster that runs a new Kubernetes version. The backup center can also back up idle cluster-level resources and automatically switch resources to an API version supported by the cluster where the resources are restored. This topic describes how to use the backup center to migrate applications. In this topic, applications in a cluster that uses FlexVolume and runs Kubernetes 1.16 are migrated to a cluster that uses CSI and runs Kubernetes 1.28.
Usage notes
The backup cluster and the restore cluster must be deployed in the same region, and the backup cluster must run Kubernetes 1.16 or later. We recommend that you do not use the backup center to migrate applications from a cluster that runs a new Kubernetes version to a cluster that runs an old Kubernetes version, because API version compatibility issues may occur.
The backup center does not back up resources that are being deleted.
To restore backups to File Storage NAS (NAS) volumes managed by CNFS (by setting StorageClass to alibabacloud-cnfs-nas), you need to create a StorageClass first. For more information, see Use CNFS to manage NAS file systems (recommended).
The backup center preferentially restores applications to the API version recommended by the restore cluster. If none of the API versions of a resource is supported by both the old and the new Kubernetes versions, you must manually deploy the resource. Examples:
- Deployments in a cluster that runs Kubernetes 1.16 support `extensions/v1beta1`, `apps/v1beta1`, `apps/v1beta2`, and `apps/v1`. In this scenario, Deployments restored to a cluster that runs Kubernetes 1.28 use the `apps/v1` API version.
- Ingresses in a cluster that runs Kubernetes 1.16 support only `extensions/v1beta1` and `networking.k8s.io/v1beta1`. In this scenario, you cannot restore the Ingresses to a cluster that runs Kubernetes 1.22 or later.
For more information about API updates for different Kubernetes versions, see Release notes for Kubernetes versions supported by ACK and Deprecated API Migration Guide.
Important: In a cluster that runs Kubernetes 1.16, groups such as `apps` and `rbac.authorization.k8s.io` already support API version v1, so resources in these groups can be restored directly. However, the v1 API versions of Ingress and CronJob are not available in Kubernetes 1.16. After you migrate applications to a cluster that runs Kubernetes 1.28, you need to manually restore the Ingress and CronJob resources.
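Before you migrate, you can check which API versions and resource kinds the restore cluster serves. The following commands are a quick check, run with kubectl configured for the restore cluster:

```shell
# List all API group/version combinations served by the cluster.
kubectl api-versions

# Check which group/version serves a specific kind, such as Ingress.
kubectl api-resources | grep -i ingress
```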
Use scenarios
Migrate applications across clusters that use different volume plug-ins
ACK clusters that run Kubernetes 1.20 or later no longer support FlexVolume. You can use the backup center to migrate stateful applications from a cluster that uses FlexVolume to a cluster that uses CSI.
Note: The backup cluster can use FlexVolume or CSI, but the restore cluster must use CSI.
Migrate applications across clusters that run different Kubernetes versions
In some scenarios, you may need to migrate applications from a cluster that runs an old Kubernetes version (1.16 or later) to a cluster that runs a new Kubernetes version. For example, you may want to switch the network plug-in from Flannel to Terway, which requires a new cluster. You can use the backup center to migrate applications across multiple Kubernetes versions. The backup center automatically adapts the basic configuration of the application template, such as the API version, to the new Kubernetes version.
Prerequisites
Cloud Backup is activated. To back up volumes that use NAS file systems, OSS buckets, and local disks or back up volumes in hybrid cloud scenarios, you must configure the backup center to use Cloud Backup to create backups. For more information, see Cloud backup.
A restore cluster is created. To ensure that you can use Elastic Compute Service (ECS) disk snapshots to restore disk data, we recommend that the restore cluster runs Kubernetes 1.18 or later. For more information, see Create an ACK managed cluster, Create an ACK dedicated cluster, or Create a cluster registration proxy and register a Kubernetes cluster that is deployed in a data center.
Important: The restore cluster must use the Container Storage Interface (CSI) plug-in. Application restoration is not supported in clusters that use FlexVolume or that use both csi-compatible-controller and FlexVolume.
The backup center backs up and restores only applications. Before you run a restore task, you must install and configure the system components that the applications depend on in the restore cluster. Examples:
- aliyun-acr-credential-helper: You need to grant permissions to the restore cluster and configure acr-configuration.
- alb-ingress-controller: You need to configure an ALBConfig.
migrate-controller is installed and permissions are granted. For more information, see Install migrate-controller and grant permissions.
To create disk snapshots to back up volumes, you must install CSI 1.1.0 or later. For more information about how to install the CSI plug-in, see Manage the CSI plug-in.
Migration workflow
The migration workflow varies based on the volume plug-in used by the backup cluster. The following figures show the details.
- No application in the backup cluster uses volumes
- The backup cluster uses FlexVolume
- The backup cluster uses CSI
Procedure
An ACK cluster that uses FlexVolume and runs Kubernetes 1.16 is used in this example. The example demonstrates how to migrate applications, configurations, and volumes from this cluster to an ACK cluster that uses CSI and runs Kubernetes 1.28. You can perform the migration with or without changing the data source. To migrate applications that do not use volumes or to migrate applications from a cluster that uses CSI, skip the optional steps.
If you do not change the data source, set the reclaim policy of the persistent volumes (PVs) in the backup cluster to Retain. Otherwise, the data is deleted when you delete the volumes.
kubectl patch pv/<pv-name> --type='json' -p '[{"op":"replace","path":"/spec/persistentVolumeReclaimPolicy","value":"Retain"}]'
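You can verify that the change took effect before you continue. A quick check; replace `<pv-name>` with the name of your PV:

```shell
# Confirm that the reclaim policy is now Retain.
kubectl get pv <pv-name> -o jsonpath='{.spec.persistentVolumeReclaimPolicy}'
```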
| Method | Description | Use scenario |
| --- | --- | --- |
| Change the data source | Back up the data stored in volumes in the backup cluster and synchronize the backups to the restore cluster. The backup and restore clusters each use their own data source. The restoration process uses dynamically provisioned volumes, and you can convert the volume type by changing the StorageClass. For example, you can convert NAS volumes to disk volumes. | |
| Do not change the data source | The restoration process uses statically provisioned volumes restored from the persistent volumes (PVs) and persistent volume claims (PVCs) in the backups. The backup and restore clusters use the same data source, such as the same disk IDs or Object Storage Service (OSS) buckets. To migrate applications from FlexVolume to CSI, you must manually create statically provisioned PVs and PVCs because CSI does not support FlexVolume YAML templates. | You cannot suspend write operations during data restoration and your business requires data consistency. |
Prepare the environment
| Item | Backup cluster | Restore cluster |
| --- | --- | --- |
| Kubernetes version | 1.16.9-aliyun.1 | 1.28.3-aliyun.1 |
| Runtime version | Docker 19.03.5 | containerd 1.6.20 |
| Volume plug-in version | FlexVolume: v1.14.8.109-649dc5a-aliyun | CSI: v1.26.5-56d1e30-aliyun |
| Others | None | Install the csi-plugin and csi-provisioner plug-ins. For more information, see Manage components. |
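If you are not sure which volume plug-in a cluster uses, you can check the components in the kube-system namespace. A quick heuristic check, assuming the default component names in ACK clusters:

```shell
# FlexVolume clusters run flexvolume pods in kube-system.
kubectl -n kube-system get pods | grep flexvolume

# CSI clusters run csi-plugin and csi-provisioner pods in kube-system.
kubectl -n kube-system get pods | grep csi
```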
Step 1: Deploy a test application
Run the following command to deploy a dynamically provisioned disk volume. Replace `alicloud-disk-topology` with the name of the default disk StorageClass installed by the FlexVolume plug-in in your cluster.

```shell
cat << EOF | kubectl apply -f -
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: disk-essd
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: alicloud-disk-topology
  resources:
    requests:
      storage: 20Gi
EOF
```
Run the following command to deploy a statically provisioned NAS volume. Replace `server` with the mount target of your NAS file system.

```shell
cat << EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nas
spec:
  capacity:
    storage: 5Gi
  storageClassName: nas
  accessModes:
    - ReadWriteMany
  flexVolume:
    driver: "alicloud/nas"
    options:
      server: "1758axxxxx-xxxxx.cn-beijing.nas.aliyuncs.com"
      vers: "3"
      options: "nolock,tcp,noresvport"
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc-nas
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nas
  resources:
    requests:
      storage: 5Gi
EOF
```
Run the following command to deploy the application and mount the disk volume and the NAS volume to the application. `apiVersion` in the following code is set to `extensions/v1beta1`. This API version is deprecated in Kubernetes 1.28.

```shell
cat << EOF | kubectl apply -f -
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
            - containerPort: 80
          volumeMounts:
            - name: nas
              mountPath: /cold
            - name: disk
              mountPath: /hot
      volumes:
        - name: nas
          persistentVolumeClaim:
            claimName: pvc-nas
        - name: disk
          persistentVolumeClaim:
            claimName: disk-essd
EOF
```
Run the following command to confirm that the application runs normally:
kubectl get pod -l app=nginx
Expected results:
```
NAME                    READY   STATUS    RESTARTS   AGE
nginx-5ffbc895b-xxxxx   1/1     Running   0          2m28s
```
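Optionally, write a marker file to each mounted volume so that you can verify the volume data after the migration. A minimal example; replace the pod name with the name from the preceding output:

```shell
# Write marker files to the disk volume (/hot) and the NAS volume (/cold).
kubectl exec nginx-5ffbc895b-xxxxx -- sh -c 'echo migrate-test > /hot/marker.txt; echo migrate-test > /cold/marker.txt'
```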
Step 2: Install the backup center in the backup cluster
Install the backup service component in the backup cluster. For more information, see Install migrate-controller and grant permissions.
Note: You can directly install migrate-controller 1.7.6 or later in a cluster whose Kubernetes version is later than 1.16 from the Add-ons page in the ACK console.
If the backup cluster is an ACK dedicated cluster, a registered cluster, or a cluster that uses a volume plug-in other than CSI (such as FlexVolume), you need to grant additional permissions. For more information, see Registered cluster.
(Optional) If your cluster uses FlexVolume, run the following command to confirm that the required permissions are granted:
kubectl -n csdr get secret alibaba-addon-secret
(Optional) If your cluster uses FlexVolume, run the following command to add the `USE_FLEXVOLUME` environment variable to the migrate-controller Deployment in the kube-system namespace.

Important: After migrate-controller is installed in a cluster that uses FlexVolume, the migrate-controller pod unexpectedly exits and the application backup page displays a 404 error. In this case, you need to add the USE_FLEXVOLUME environment variable to the YAML file of migrate-controller.
kubectl -n kube-system patch deployment migrate-controller --type json -p '[{"op":"add","path":"/spec/template/spec/containers/0/env/-","value":{"name":"USE_FLEXVOLUME","value":"true"}}]'
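You can confirm that the environment variable was added before you proceed:

```shell
# Confirm that USE_FLEXVOLUME appears in the container env list.
kubectl -n kube-system get deployment migrate-controller -o jsonpath='{.spec.template.spec.containers[0].env}'
```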
Run the following command to confirm that migrate-controller runs normally:
kubectl -n kube-system get pod -l app=migrate-controller
kubectl -n csdr get pod
Expected results:
```
NAME                                  READY   STATUS    RESTARTS   AGE
migrate-controller-6c8b9c6cbf-967x7   1/1     Running   0          3m55s

NAME                               READY   STATUS    RESTARTS   AGE
csdr-controller-69787f6dc8-f886h   1/1     Running   0          3m39s
csdr-velero-58494f6bf4-52mv6       1/1     Running   0          3m37s
```
Step 3: Create backups in the backup cluster
Create an OSS bucket named `cnfs-oss-*` in the region of the backup cluster to store backups. For more information, see Create a bucket.

Note: By default, ACK managed clusters have permissions on OSS buckets named `cnfs-oss-*`. If your OSS bucket does not comply with the naming convention, you need to grant additional permissions. For more information, see Install migrate-controller and grant permissions.

Create a backup vault. For more information, see Create a backup vault.
Run the following command to create a real-time backup task.
For more information about how to configure backup parameters in the console, see Back up and restore applications in an ACK cluster. You can modify the suggested configuration in this step based on the actual scenario.
```shell
cat << EOF | kubectl apply -f -
apiVersion: csdr.alibabacloud.com/v1beta1
kind: ApplicationBackup
metadata:
  annotations:
    csdr.alibabacloud.com/backuplocations: '{"name":"<Backup vault name>","region":"<Region ID, such as cn-beijing>","bucket":"<Name of the OSS bucket associated with the backup vault>","provider":"alibabacloud"}'
  labels:
    csdr/schedule-name: fake-name
  name: <Backup name>
  namespace: csdr
spec:
  excludedNamespaces:
    - csdr
    - kube-system
    - kube-public
    - kube-node-lease
  excludedResources:
    - storageclasses
    - clusterroles
    - clusterrolebindings
    - events
    - persistentvolumeclaims
    - persistentvolumes
  includeClusterResources: true
  pvBackup:
    defaultPvBackup: true
  storageLocation: <Backup vault name>
  ttl: 720h0m0s
EOF
```
The following list describes the parameters:

- excludedNamespaces: the namespaces to exclude from the backup task. We recommend that you exclude the following namespaces:
  - `csdr`: the namespace of the backup center. The backup center automatically synchronizes data between clusters. Do not manually back up the backup and restore tasks in the csdr namespace. Otherwise, exceptions may occur.
  - `kube-system`, `kube-public`, and `kube-node-lease`: the default namespaces in ACK clusters. These namespaces cannot be directly restored because cluster parameters and configurations differ across clusters.
- excludedResources: the resources to exclude from the backup task. Configure the list based on your business requirements.
- includeClusterResources: specifies whether to back up all cluster-level resources, such as StorageClasses, CustomResourceDefinitions (CRDs), and webhooks.
  - `true`: backs up all cluster-level resources.
  - `false`: backs up only the cluster-level resources that are used by namespace-level resources in the specified namespaces. For example, when the system backs up a pod whose service account is bound to a cluster role, the cluster role is automatically backed up. When the system backs up a CustomResource (CR), the corresponding CRD is backed up.

  Note: By default, `includeClusterResources` is set to `false` for backup tasks created in the ACK console.
- defaultPvBackup: specifies whether to back up the data in volumes.
  - `true`: backs up applications and the volume data used by running pods.
  - `false`: backs up only applications.

  Important:
  - By default, Elastic Compute Service (ECS) snapshots are created to back up disk volumes in clusters whose Kubernetes and CSI versions are both 1.18 or later. Cloud Backup is used to back up other types of volumes in such clusters and to back up disk volumes in clusters whose Kubernetes and CSI versions are between 1.16 and 1.18 (excluding 1.18).
  - For volumes that are not used by running pods, you must manually create statically provisioned PVs and PVCs in the restore cluster and specify the volume source, such as the disk ID or OSS bucket. This means that you cannot change the data source in this scenario.
  - If your business strongly relies on data consistency, you need to suspend write operations during the backup process. Alternatively, back up only the applications and do not change the data source.
Run the following command to query the status of the backup task:
kubectl -ncsdr describe applicationbackup <Backup name>
If `Phase` in the `Status` section of the output changes to `Completed`, the backup task is complete.

Run the following command to confirm the list of backed-up resources:
kubectl -ncsdr get pod | grep csdr-velero
kubectl -ncsdr exec -it <Name of the csdr-velero pod> -- /velero describe backup <Backup name> --details
You can view resources that are skipped in the backup list, modify the backup parameters, and then rerun the backup task.
```
Resource List:
  apiextensions.k8s.io/v1/CustomResourceDefinition:
    - volumesnapshots.snapshot.storage.k8s.io
  v1/Endpoints:
    - default/kubernetes
  v1/Namespace:
    - default
  v1/PersistentVolume:
    - d-2ze88915lz1il01v1yeq
    - pv-nas
  v1/PersistentVolumeClaim:
    - default/disk-essd
    - default/pvc-nas
  v1/Secret:
    - default/default-token-n7jss
    - default/oss-secret
    - default/osssecret
  v1/Service:
    - default/kubernetes
  v1/ServiceAccount:
    - default/default
  ...
```
Step 4: Install the backup center in the restore cluster
Install the backup center in the restore cluster. For more information, see Step 2: Install the backup center in the backup cluster.
Associate the preceding backup vault with the restore cluster.
Log on to the ACK console.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Operations > Application Backup.
On the Application Backup page, click Restore.
Select your backup vault from the Backup Vaults drop-down list, click Initialize Backup Vault, and then wait for the system to synchronize backups to the current cluster.
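After the synchronization is complete, you can confirm that the backup records are visible in the restore cluster. A quick check, based on the ApplicationBackup resources that the backup center synchronizes to the csdr namespace:

```shell
# The backup created in the backup cluster should be listed here.
kubectl -n csdr get applicationbackup
```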
(Optional) Step 5: Manually create PVs and PVCs in the restore cluster
In most scenarios, you need only to perform Step 6 to create a restore task in the restore cluster. The backup center will automatically create PVs and PVCs from backups.
To protect data, the backup center does not recreate or overwrite existing PVs and PVCs that have the same names as those in the backups when it runs a restore task. In the following scenarios, create PVs and PVCs before the restore task starts.
- You backed up volumes but want to skip the data, such as logs, that is stored in some volumes.
- You backed up volumes that are not used by running pods.
- You did not back up volumes (persistentvolumeclaims and persistentvolumes are specified in the excludedResources list), or you want to migrate applications from FlexVolume to CSI.
Procedure:
You cannot mount disks across zones. If you want to restore resources in a different zone in the restore cluster, use one of the following solutions.
Synchronize data by changing the data source.
Log on to the ECS console. Create a snapshot of the disk and create a disk from the snapshot in the new zone. For more information, see Create a snapshot for a disk. Then, replace the zone ID in nodeAffinity and the disk ID in the `outputfile.txt` YAML file that is generated in the following step.
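For reference, a statically provisioned CSI disk PV with zone affinity looks similar to the following sketch. The PV name, disk ID, zone ID, and capacity are placeholders; edit the corresponding values in your generated file:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: disk-essd
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: diskplugin.csi.alibabacloud.com
    volumeHandle: d-2ze88915lz1il0xxxxxx      # Replace with the ID of the disk in the new zone.
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.diskplugin.csi.alibabacloud.com/zone
              operator: In
              values:
                - cn-beijing-h                # Replace with the zone ID of the new disk.
```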
(Optional) If the backup cluster uses FlexVolume, you need to use the Flexvolume2CSI CLI to batch convert YAML files because the YAML files of PVs and PVCs managed by FlexVolume and CSI are different. For more information, see Use Flexvolume2CSI to batch convert PVs and PVCs.
Run the following command to apply the CSI-compatible YAML file generated by Flexvolume2CSI. `outputfile.txt` is the YAML file generated by Flexvolume2CSI.

kubectl apply -f outputfile.txt
Run the following command to confirm that the PVCs in the restore cluster are in the Bound state:
kubectl get pvc
Expected results:
```
NAME        STATUS   VOLUME                   CAPACITY   ACCESS MODES   STORAGECLASS         AGE
disk-essd   Bound    d-2ze88915lz1il0xxxxxx   20Gi       RWO            alicloud-disk-essd   29m
pvc-nas     Bound    pv-nas                   5Gi        RWX            nas                  29m
```
Step 6: Create a restore task in the restore cluster
If a resource with the same name already exists in the restore cluster, the restore task skips the resource.
The backup center backs up and restores only applications. Before you run a restore task, you must install and configure the system components that the applications depend on in the restore cluster. Examples:
- Container Registry password-free image pulling component: You need to grant permissions to the restore cluster and configure acr-configuration.
- ALB Ingress component: You need to configure an ALBConfig.
Pay attention to the following items when you restore Services:
- NodePort Services: By default, Service ports are retained when you restore Services across clusters.
- LoadBalancer Services: By default, a random HealthCheckNodePort is used when ExternalTrafficPolicy is set to Local. To retain the original port, specify `spec.preserveNodePorts: true` when you create the restore task, as shown in the snippet after this list.
- If you restore a Service that uses an existing SLB instance in the backup cluster, the restored Service reuses the same SLB instance and disables the listeners by default. You need to log on to the SLB console to configure the listeners.
- If LoadBalancer Services are managed by the cloud controller manager (CCM) in the backup cluster, the CCM automatically creates new SLB instances for the restored Services. For more information, see Considerations for configuring a LoadBalancer type Service.
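For example, the following fragment shows where the field goes in the ApplicationRestore object that is created later in this step:

```yaml
spec:
  preserveNodePorts: true   # Retain the original NodePort and HealthCheckNodePort values.
```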
If you choose to back up volumes and restore them from the backups, the restore cluster uses a new data source. You can use `convertedarg` to convert volume types during restoration. For example, you can convert NAS volumes to disk volumes. Choose a StorageClass based on your business requirements.
In this example, the backup cluster runs Kubernetes 1.16 and uses FlexVolume, so disk volumes are backed up by using Cloud Backup. You can specify a disk StorageClass for the disk-essd PVC to convert the volume to a disk volume managed by CSI. In this example, the default StorageClass alicloud-disk-topology-alltype is used. If the backup cluster runs Kubernetes 1.18 or later and uses CSI, you do not need to convert disk volumes.
In this example, the alibabacloud-cnfs-nas StorageClass is specified for the pvc-nas PVC. This converts NAS volumes managed by FlexVolume to NAS volumes managed by CNFS. If the alibabacloud-cnfs-nas StorageClass is not used in your cluster, refer to Use CNFS to manage NAS file systems (recommended).
Procedure:
Run the following command to create a restore task.
For more information about how to configure a restore task in the console, see Back up and restore applications in an ACK cluster. You can modify the suggested configuration in this step based on the actual scenario.
```shell
cat << EOF | kubectl apply -f -
apiVersion: csdr.alibabacloud.com/v1beta1
kind: ApplicationRestore
metadata:
  annotations:
    csdr.alibabacloud.com/backuplocations: '{"name":"<Backup vault name>","region":"<Region ID, such as cn-beijing>","bucket":"<Name of the OSS bucket associated with the backup vault>","provider":"alibabacloud"}'
  name: <Restore name>
  namespace: csdr
spec:
  backupName: <Backup name>
  excludedNamespaces:
    - arms-prom
  excludedResources:
    - secrets
  appRestoreOnly: false
  convertedarg:
    - convertToStorageClassType: alicloud-disk-topology-alltype
      namespace: default
      persistentVolumeClaim: disk-essd
    - convertToStorageClassType: alibabacloud-cnfs-nas
      namespace: default
      persistentVolumeClaim: pvc-nas
  namespaceMapping:
    <backupNamespace>: <restoreNamespace>
EOF
```
The following list describes the parameters:

- excludedNamespaces: the namespaces to exclude from the restore task. You can exclude namespaces that are included in the backup.
- excludedResources: the resources to exclude from the restore task. You can exclude resources that are included in the backup.
- appRestoreOnly: specifies whether to restore only applications without restoring volumes from the volume backups.
  - `false`: The backup center creates dynamically provisioned PVs and PVCs that point to a new data source. The parameter is set to false for restore tasks created in the console.
  - `true`: You need to manually create statically provisioned volumes before the restore task starts.

  Note: In most cases, set the parameter to `false` if you want to change the data source and set the parameter to `true` if you do not change the data source.
- convertedarg: the StorageClass conversion list. For volumes of the FileSystem type, such as OSS, NAS, CPFS, and local volumes, you can configure this parameter to convert the StorageClasses of their PVCs to a specified StorageClass during the restoration process. For example, you can convert NAS volumes to disk volumes.
  - convertToStorageClassType: the destination StorageClass. Make sure that the StorageClass exists in the current cluster. You can specify only a disk or NAS StorageClass.
  - namespace: the namespace of the PVC.
  - persistentVolumeClaim: the name of the PVC.

  You can run the `kubectl -n csdr describe applicationbackup <Backup name>` command to query the PVC information of a backup. In the returned `status.resourceList.dataResource.pvcBackupInfo` list, the dataType field displays the data type of the PVC, which can be FileSystem or Snapshot. The nameSpace and pvcName fields display the namespace and the name of the PVC.

Run the following command to query the status of the restore task:
kubectl -ncsdr describe applicationrestore <Restore name>
If `Phase` in the `Status` section of the output changes to `Completed`, the restore task is complete.

Run the following command to check whether any resources failed to be restored and to view the causes:
kubectl -ncsdr get pod | grep csdr-velero
kubectl -ncsdr exec -it <Name of the csdr-velero pod> -- /velero describe restore <Restore name> --details
Expected results:
```
Warnings:
  Velero:     <none>
  Cluster:    could not restore, ClusterRoleBinding "kubernetes-proxy" already exists. Warning: the in-cluster version is different than the backed-up version.
  Namespaces:
    demo-ns:  could not restore, ConfigMap "kube-root-ca.crt" already exists. Warning: the in-cluster version is different than the backed-up version.
              could not restore, Endpoints "kubernetes" already exists. Warning: the in-cluster version is different than the backed-up version.
              could not restore, Service "kubernetes" already exists. Warning: the in-cluster version is different than the backed-up version.

Errors:
  Velero:     <none>
  Cluster:    <none>
  Namespaces:
    demo-ns:  error restoring endpoints/xxxxxx/kubernetes: Endpoints "kubernetes" is invalid: subsets[0].addresses[0].ip: Invalid value: "169.254.128.9": may not be in the link-local range (169.xxx.0.0/16, fe80::/10)
              error restoring endpointslices.discovery.k8s.io/demo-ns/kubernetes: EndpointSlice.discovery.k8s.io "kubernetes" is invalid: endpoints[0].addresses[0]: Invalid value: "169.xxx.128.9": may not be in the link-local range (169.xxx.0.0/16, fe80::/10)
              error restoring services/xxxxxx/kubernetes-extranet: Service "kubernetes-extranet" is invalid: spec.ports[0].nodePort: Invalid value: 31882: provided port is already allocated
```
The Warnings information in the preceding output indicates that some resources were skipped by the restore task because they already exist. The Errors information indicates that NodePort reuse failed: the original ports are retained when Services are restored across clusters, and the port is already allocated in the restore cluster.
Check whether the application runs normally.
After the application is restored, check whether resources are in abnormal status due to business limits, container exceptions, or other issues. If yes, manually fix the issues.
After the restoration is complete, the `apiVersion` of the application is automatically set to apps/v1, which is the API version recommended for ACK clusters that run Kubernetes 1.28.
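To verify the migration end to end, you can check the restored application, its API version, and the volume data. A quick verification sketch, assuming the nginx example from Step 1 and the optional marker files written before the backup:

```shell
# Confirm that the restored application is running.
kubectl get pod -l app=nginx

# Confirm that the Deployment is now served as apps/v1.
kubectl get deployment nginx -o jsonpath='{.apiVersion}'

# If you backed up volume data, confirm that it was restored.
kubectl exec deploy/nginx -- cat /hot/marker.txt /cold/marker.txt
```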