Implement zone-disaster recovery using an ALB multi-cluster gateway in ACK One

The Application Load Balancer (ALB) multi-cluster gateways of Distributed Cloud Container Platform for Kubernetes (ACK One) can be used together with ACK One GitOps or the multi-cluster application distribution feature to quickly implement zone-disaster recovery. This allows you to ensure the high availability of your business and automatically switch traffic in a seamless manner when a fault occurs. This topic describes how to build a zone-disaster recovery system by using multi-cluster gateways.

Disaster recovery overview

Disaster recovery solutions in the cloud can be classified into the following types:

Zone-disaster recovery: This solution includes active zone-redundancy and primary/secondary disaster recovery. The network latency between data centers located in the same region is low. Therefore, zone-disaster recovery is suitable for protecting data against zone-level hazardous events, such as fire, network interruptions, and power outages. This solution uses relatively simple methods to back up and restore data and covers most common scenarios.
Active geo-redundancy: The network latency between data centers will be higher if the active geo-redundancy solution is used. However, this solution can efficiently protect data against region-level disasters, such as floods and earthquakes.
Disaster recovery based on three data centers across two regions: This solution combines the benefits of zone-disaster recovery and active geo-redundancy. This solution is suitable for scenarios where you need to ensure the continuity and availability of applications.

In most cases, the business architecture of an enterprise can be divided into the following layers from the top down: access layer, application layer, and data layer.

Access layer: serves as an entry point for ingress traffic. This layer routes ingress traffic to the backend application layer based on forwarding rules.
Application layer: hosts applications. This layer processes ingress traffic and sends the results back to the upper layer.
Data layer: stores data. This layer provides data and storage services for the application layer.

When you build a disaster recovery system for your business, you need to enforce recovery measures on each layer.

Access layer: ACK One uses multi-cluster gateways to build the access layer. The multi-cluster gateways of ACK One support zone-disaster recovery. Therefore, the access layer built on ACK One is highly available.
Application layer: ACK One uses multi-cluster gateways to implement disaster recovery on the application layer. The multi-cluster gateways of ACK One support active zone-redundancy, primary/secondary disaster recovery, and geo-redundancy.
Data layer: Disaster recovery and data synchronization on the data layer depend on the middleware that you use.

Benefits

Disaster recovery by using the multi-cluster gateways of ACK One has the following advantages over disaster recovery by using DNS traffic distribution:

Disaster recovery by using DNS traffic distribution requires multiple load balancer IP addresses (one IP address for each cluster). Disaster recovery by using multi-cluster gateways uses only one load balancer IP address in one region and uses multi-zone deployment in the same region by default to ensure high availability.
Disaster recovery by using multi-cluster gateways supports request forwarding at Layer 7, while disaster recovery by using DNS traffic distribution does not support this feature.
In a disaster recovery system that uses DNS traffic distribution, clients typically cache DNS query results, so switching IP addresses causes temporary service interruptions. Disaster recovery by using multi-cluster gateways avoids this problem by seamlessly failing over to the backend pods in another cluster.
Multi-cluster gateways are region-level gateways. Therefore, you can complete all the operations on a Fleet instance without the need to install an Ingress controller or create Ingresses in each Container Service for Kubernetes (ACK) cluster. This helps you manage traffic in a region and reduce multi-cluster management costs.

Architecture

In this example, a web application is used to show how to use ALB multi-cluster gateways to implement zone-disaster recovery. The web application consists of a Deployment and a Service. The following figure shows the architecture of the zone-disaster recovery system.

Create Cluster 1 and Cluster 2 in AZ 1 and AZ 2 in the China (Hong Kong) region.
Use ACK One GitOps to distribute the application to Cluster 1 and Cluster 2.
Use an AlbConfig to create an ALB multi-cluster gateway on the ACK One Fleet instance.
After the ALB multi-cluster gateway is created, you can configure Ingress rules to route traffic to the clusters based on weights and request headers. When one of the clusters is down, traffic is automatically switched to the other cluster.
ApsaraDB RDS is used for data synchronization in this example. Data synchronization depends on the middleware that you use.

Prerequisites

ALB is activated.
The Fleet management feature is enabled. For more information, see Enable multi-cluster management.
The ACK One Fleet instance is associated with two ACK clusters that are deployed in the same virtual private cloud (VPC) as the ACK One Fleet instance. For more information, see Manage associated clusters.

The kubeconfig file of the Fleet instance is obtained in the ACK One console and a kubectl client is connected to the Fleet instance.
The latest version of Alibaba Cloud CLI is installed and Alibaba Cloud CLI is configured.

Step 1: Use GitOps or the application distribution feature to distribute an application to multiple clusters

ACK One allows you to use GitOps or the application distribution feature to distribute an application to multiple clusters. For more information, see Getting started with GitOps, Create a multi-cluster application, and Get started with application distribution. In this step, GitOps is used.

Log on to the ACK One console. In the left-side navigation pane, choose Fleet > Multi-cluster Applications.
In the upper-left corner of the Multi-cluster Applications page, click the icon to the right of the Fleet instance name and select your Fleet instance from the drop-down list.
Choose Create Multi-cluster Application > GitOps to go to the Create Multi-cluster Application - GitOps page.
Note
- If GitOps is not enabled for the ACK One Fleet instance, enable GitOps. For more information, see Enable GitOps for the Fleet instance.
- For more information about how to enable Internet access to GitOps, see Enable public access to Argo CD.

On the Create from YAML tab, copy the following YAML template to the code editor. Then, click OK to deploy the application.

Note

The following YAML template is used to deploy an application named web-demo to each associated cluster. You can also select the clusters where you want to deploy the application on the Quick Create tab. The configuration changes that you make on the Quick Create tab will be automatically synchronized to the YAML template on the Create from YAML tab.

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: appset-web-demo
  namespace: argocd
spec:
  template:
    metadata:
      name: '{{.metadata.annotations.cluster_id}}-web-demo'
      namespace: argocd
    spec:
      destination:
        name: '{{.name}}'
        namespace: gateway-demo
      project: default
      source:
        repoURL: https://github.com/AliyunContainerService/gitops-demo.git
        path: manifests/helm/web-demo
        targetRevision: main
        helm:
          valueFiles:
            - values.yaml
          parameters:
            - name: envCluster
              value: '{{.metadata.annotations.cluster_name}}'
      syncPolicy:
        automated: {}
        syncOptions:
          - CreateNamespace=true
  generators:
    - clusters:
        selector:
          matchExpressions:
            - values:
                - cluster
              key: argocd.argoproj.io/secret-type
              operator: In
            - values:
                - in-cluster
              key: name
              operator: NotIn
  goTemplateOptions:
    - missingkey=error
  syncPolicy:
    preserveResourcesOnDeletion: false
  goTemplate: true

Step 2: Use kubectl to deploy an ALB multi-cluster gateway from the ACK One Fleet instance

You can use an AlbConfig to create an ALB multi-cluster gateway from the ACK One Fleet instance. You can associate clusters with the gateway.

Obtain the IDs of two vSwitches that belong to the VPC where the ACK One Fleet instance resides.
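One way to look up the vSwitch IDs is Alibaba Cloud CLI, which the prerequisites already require. The following is a minimal sketch, assuming the `DescribeVSwitches` operation of the VPC API and the CLI's `--output cols=... rows=...` table filter. The function name `list_vswitches` and the IDs in the usage comment are hypothetical placeholders, not values from this topic.

```shell
# Hypothetical helper: list the vSwitches of a VPC together with their zones.
# Usage: list_vswitches <region-id> <vpc-id>
list_vswitches() {
  aliyun vpc DescribeVSwitches \
    --RegionId "$1" \
    --VpcId "$2" \
    --output cols=VSwitchId,ZoneId,CidrBlock 'rows=VSwitches.VSwitch[]'
}

# Example (placeholder IDs, replace with your own):
#   list_vswitches cn-hongkong vpc-xxxx
```

Pick two vSwitches whose zones differ, so that the gateway is deployed across zones.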

Create a file named gateway.yaml and copy the following content to the file.

Note

Replace ${vsw-id1} and ${vsw-id2} with the vSwitch IDs obtained from the preceding step, and replace ${cluster1} and ${cluster2} with the IDs of the associated clusters you want to add.
For the associated clusters ${cluster1} and ${cluster2}, you must configure the inbound rules of their security groups to allow access on all ports from all IP addresses in the CIDR blocks of the vSwitches.

apiVersion: alibabacloud.com/v1
kind: AlbConfig
metadata:
  name: ackone-gateway-demo
  annotations:
    # Specify the IDs of the clusters that you want to associate with the ALB instance. 
    alb.ingress.kubernetes.io/remote-clusters: ${cluster1},${cluster2}
spec:
  config:
    name: one-alb-demo
    addressType: Internet
    addressAllocatedMode: Fixed
    zoneMappings:
    - vSwitchId: ${vsw-id1}
    - vSwitchId: ${vsw-id2}
  listeners:
  - port: 8001
    protocol: HTTP
---
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: alb
spec:
  controller: ingress.k8s.alibabacloud/alb
  parameters:
    apiGroup: alibabacloud.com
    kind: AlbConfig
    name: ackone-gateway-demo

You need to configure the following parameters.

Parameter	Required	Description
`metadata.name`	Yes	The name of the AlbConfig.
`metadata.annotations:` `alb.ingress.kubernetes.io/remote-clusters`	Yes	The comma-separated list of associated clusters to be added to the ALB multi-cluster gateway. The cluster IDs listed here must already be associated with the Fleet instance.
`spec.config.name`	No	The name of the ALB instance.
`spec.config.addressType`	No	The network type of the ALB instance. Valid values: `Internet` (default): The ALB instance provides services to the Internet and is accessible over the Internet. An Internet-facing ALB instance must be associated with an elastic IP address (EIP), and you are charged instance fees and bandwidth or data transfer fees for the associated EIPs. For more information, see Pay-as-you-go. `Intranet`: The ALB instance provides services within a VPC and cannot be accessed over the Internet.
`spec.config.zoneMappings`	Yes	The IDs of the vSwitches that are associated with the ALB instance. For more information about how to create a vSwitch, see Create and manage a vSwitch. Note The specified vSwitches must be deployed in the zones supported by the ALB instance and deployed in the same VPC as the cluster. For more information about regions and zones supported by ALB, refer to Regions and zones in which ALB is available. ALB supports multi-zone deployment. If the current region supports two or more zones, select vSwitches in at least two zones to ensure high availability.
`spec.listeners`	No	The listener port and protocol of the ALB instance. The example provided in this topic configures an HTTP listener on port 8001. A listener defines how ALB receives traffic. We recommend that you retain the listener configuration. Otherwise, you must create a listener before you can use ALB Ingresses.
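Instead of editing the placeholders by hand, you can substitute them with `sed`. The following is a minimal sketch, assuming the four placeholders appear exactly as written in the template above. The function name `render_gateway`, the template file name `gateway.yaml.tpl`, and the IDs in the usage comment are hypothetical.

```shell
# Hypothetical helper: fill the placeholders of the AlbConfig template and
# print the result to standard output.
# Usage: render_gateway <template> <vsw-id1> <vsw-id2> <cluster1> <cluster2>
render_gateway() {
  sed -e 's|\${vsw-id1}|'"$2"'|g' \
      -e 's|\${vsw-id2}|'"$3"'|g' \
      -e 's|\${cluster1}|'"$4"'|g' \
      -e 's|\${cluster2}|'"$5"'|g' "$1"
}

# Example (placeholder IDs): keep the unedited template as gateway.yaml.tpl
# and write the rendered manifest to gateway.yaml.
#   render_gateway gateway.yaml.tpl vsw-aaaa vsw-bbbb c1111 c2222 > gateway.yaml
```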

Run the following command to deploy the gateway.yaml file to create an ALB multi-cluster gateway and an IngressClass:
```
kubectl apply -f gateway.yaml
```

Wait 1 to 3 minutes and run the following command to check whether the ALB multi-cluster gateway is created:

kubectl get albconfig ackone-gateway-demo

Expected output:

NAME                  ALBID      DNSNAME                                PORT&PROTOCOL   CERTID   AGE
ackone-gateway-demo   alb-xxxx   alb-xxxx.<regionid>.alb.aliyuncs.com                            4d9h
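Step 4 needs the value in the DNSNAME column. The following is a minimal sketch for extracting it, assuming the column order shown in the expected output (NAME, ALBID, DNSNAME) and a domain name that contains `.alb.`. The function name `alb_dnsname` is hypothetical.

```shell
# Hypothetical helper: print the DNSNAME column of an AlbConfig.
# Prints nothing while the third column does not look like an ALB domain name,
# for example when ALBID and DNSNAME are still empty.
alb_dnsname() {
  kubectl get albconfig "$1" --no-headers | awk '$3 ~ /\.alb\./ { print $3 }'
}

# Example:
#   ALB_DNS="$(alb_dnsname ackone-gateway-demo)"
```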

Run the following command to check whether the associated clusters are connected to the gateway:
```
kubectl get albconfig ackone-gateway-demo -ojsonpath='{.status.loadBalancer.subClusters}'
```
The IDs of the associated clusters are returned in the output.
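To script this check, you can test whether each expected cluster ID appears in that output. The following is a minimal sketch that treats the output as plain text, because the exact structure of `subClusters` is not described in this topic. The function name `clusters_connected` and the cluster IDs in the usage comment are hypothetical.

```shell
# Hypothetical helper: succeed only if every given cluster ID appears in the
# subClusters status of the AlbConfig.
# Usage: clusters_connected <albconfig-name> <cluster-id>...
clusters_connected() {
  albconfig="$1"
  shift
  connected="$(kubectl get albconfig "$albconfig" \
    -o jsonpath='{.status.loadBalancer.subClusters}')" || return 1
  for id in "$@"; do
    case "$connected" in
      *"$id"*) ;;
      *) echo "cluster $id is not connected yet" >&2; return 1 ;;
    esac
  done
}

# Example (placeholder IDs):
#   clusters_connected ackone-gateway-demo c1111 c2222 && echo "all connected"
```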

Step 3: Use Ingresses to implement zone-disaster recovery

Multi-cluster gateways use Ingresses to manage traffic across clusters. You can create Ingress objects on the ACK One Fleet instance to implement active zone-redundancy.

Create a namespace named gateway-demo on the ACK One Fleet instance. The name must be the same as the namespace of the Service that you deployed in Step 1.
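With the kubectl client connected to the Fleet instance, the namespace can be created as follows. This is a minimal sketch. The function name `ensure_namespace` is hypothetical, and it only wraps `kubectl get namespace` and `kubectl create namespace` so that the step can be repeated safely.

```shell
# Hypothetical helper: create the namespace only if it does not exist yet.
ensure_namespace() {
  kubectl get namespace "$1" >/dev/null 2>&1 || kubectl create namespace "$1"
}

# Example:
#   ensure_namespace gateway-demo
```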

Create a file named ingress-demo.yaml and copy the following content to the file.

Note

The sum of all weights specified in the alb.ingress.kubernetes.io/cluster-weight annotations must be 100.
The /svc1 forwarding rule under the domain name alb.ingress.alibaba.com is used to expose the backend Service named service1. Replace ${cluster1-id} and ${cluster2-id} with the actual cluster IDs.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    alb.ingress.kubernetes.io/listen-ports: |
     [{"HTTP": 8001}]
    alb.ingress.kubernetes.io/cluster-weight.${cluster1-id}: "20"
    alb.ingress.kubernetes.io/cluster-weight.${cluster2-id}: "80"
  name: web-demo
  namespace: gateway-demo
spec:
  ingressClassName: alb
  rules:
  - host: alb.ingress.alibaba.com
    http:
      paths:
      - path: /svc1
        pathType: Prefix
        backend:
          service:
            name: service1
            port:
              number: 80

Run the following command to deploy the Ingress on the ACK One Fleet instance:
```
kubectl apply -f ingress-demo.yaml -n gateway-demo
```

Step 4: Verify active zone-redundancy

Forward traffic to different clusters by ratio

Run the following command to access the web application:

curl -H "host: alb.ingress.alibaba.com" alb-xxxx.<regionid>.alb.aliyuncs.com:<listeners port>/svc1

You need to configure the following parameters.

Parameter	Description
`alb-xxxx.<regionid>.alb.aliyuncs.com`	Set the value to the domain name in the `DNSNAME` column in the AlbConfig details you obtained in Step 2.
`<listeners port>`	Set the value to 8001, which is the listener port specified in the AlbConfig and in the `alb.ingress.kubernetes.io/listen-ports` annotation of the Ingress.

Run the following command to send 500 requests and save the responses to res.txt. The output shows that approximately 20% of traffic is forwarded to Cluster 1 (poc-ack-1) and approximately 80% of traffic is forwarded to Cluster 2 (poc-ack-2).

for i in {1..500}; do curl -H "host: alb.ingress.alibaba.com" alb-xxxx.<regionid>.alb.aliyuncs.com:8001/svc1; done > res.txt
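To read the ratio from res.txt, you can count how often each cluster name occurs. The following is a minimal sketch, assuming every response body contains the name of the cluster that served it exactly once. This topic does not show the response format, so adjust the search strings to what your application returns. The function name `tally_clusters` is hypothetical.

```shell
# Hypothetical helper: print how many times each cluster name occurs in a file.
# Usage: tally_clusters <file> <cluster-name>...
tally_clusters() {
  file="$1"
  shift
  for name in "$@"; do
    count="$(grep -o -- "$name" "$file" | wc -l | tr -d ' ')"
    printf '%s %s\n' "$name" "$count"
  done
}

# Example:
#   tally_clusters res.txt poc-ack-1 poc-ack-2
```

With the 20/80 weights configured in Step 3, the two counts for 500 requests should be close to 100 and 400.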

Automatically and seamlessly switch traffic when a fault occurs in one cluster

Run the following command. While the command is running, decrease the number of application pods in Cluster 2 to 0. After the change takes effect, traffic is automatically switched to Cluster 1 in a seamless manner.

for i in {1..500}; do curl -H "host: alb.ingress.alibaba.com" alb-xxxx.<regionid>.alb.aliyuncs.com:8001/svc1; sleep 1; done
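To confirm that no requests fail during the switchover, you can record only the HTTP status codes and count the ones that are not 2xx. The following is a minimal sketch that uses curl's `-w '%{http_code}'` option. The function name `count_failures` is hypothetical.

```shell
# Hypothetical helper: read one HTTP status code per line from standard input
# and print the number of codes that are not 2xx.
count_failures() {
  awk '$1 !~ /^2[0-9][0-9]$/ { failed++ } END { print failed + 0 }'
}

# Example (same loop as above, but printing only the status code):
#   for i in {1..500}; do
#     curl -s -o /dev/null -w '%{http_code}\n' \
#       -H "host: alb.ingress.alibaba.com" \
#       "alb-xxxx.<regionid>.alb.aliyuncs.com:8001/svc1"
#     sleep 1
#   done | count_failures
```

A result of 0 means that every request received a 2xx response while traffic moved to Cluster 1.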