
Use multi-cluster gateways to implement zone-disaster recovery

Last Updated: Feb 28, 2024

Multi-cluster gateways are cloud-native gateways provided by Distributed Cloud Container Platform for Kubernetes (ACK One) to manage north-south traffic in multi-cloud and multi-cluster scenarios. Multi-cluster gateways use fully-managed Microservices Engine (MSE) Ingresses and Ingress APIs to manage north-south traffic at Layer 7. You can use multi-cluster gateways to implement zone-disaster recovery and verify canary versions based on request headers. This simplifies multi-cluster application O&M and reduces costs. By using the Continuous Delivery (CD) capability of ACK One GitOps, you can deploy applications across clusters and build an active zone-redundancy or primary/secondary disaster recovery system. This solution does not include disaster recovery for data services.

Disaster recovery overview

Disaster recovery solutions in the cloud include:

  • Zone-disaster recovery

    Zone-disaster recovery includes active zone-redundancy and primary/secondary disaster recovery. The network latency between data centers located in the same region is low. Therefore, zone-disaster recovery is suitable for protecting data against zone-level hazardous events, such as fires, network interruptions, and power outages.

  • Active geo-redundancy

    The network latency between data centers is higher if the active geo-redundancy solution is used. However, this solution can effectively protect data against region-level disasters, such as floods and earthquakes.

  • Three data centers across two zones

    Disaster recovery based on three data centers across two zones provides the benefits of zone-disaster recovery and active geo-redundancy. This solution is suitable for scenarios where you need to ensure the continuity and availability of applications.

Zone-disaster recovery is easier to implement than active geo-redundancy and therefore still plays an important role in disaster recovery.

Benefits

Disaster recovery by using multi-cluster gateways has the following advantages over disaster recovery by using DNS traffic distribution:

  • Disaster recovery by using DNS traffic distribution requires multiple load balancer IP addresses (one IP address for each cluster). Disaster recovery by using multi-cluster gateways uses only one load balancer IP address in one region and uses multi-zone deployment in the same region by default to ensure high availability.

  • Disaster recovery by using multi-cluster gateways supports request forwarding at Layer 7, while disaster recovery by using DNS traffic distribution does not support this feature.

  • In a disaster recovery system that uses DNS traffic distribution, clients typically cache DNS query results. During an IP address switchover, the cached results cause temporary service interruptions. Disaster recovery by using multi-cluster gateways resolves this problem by seamlessly failing over to the backend pods in another cluster.

  • Multi-cluster gateways are region-level gateways. Therefore, you can complete all the operations on a Fleet instance, including creating gateways and Ingresses. You do not need to install an Ingress controller or create Ingresses in each Container Service for Kubernetes (ACK) cluster. This helps you manage traffic in a region with reduced multi-cluster management costs.

Architecture

Multi-cluster gateways provided by ACK One route traffic based on fully-managed MSE Ingresses. You can use multi-cluster gateways together with the multi-cluster application distribution feature of ACK One GitOps to quickly build a zone-disaster recovery system. In this example, GitOps is used to deploy an application in ACK Cluster 1 and ACK Cluster 2 in the China (Hong Kong) region to implement active zone-redundancy and primary/secondary disaster recovery. Cluster 1 and Cluster 2 are deployed in two different zones of the region.

The application is a web application that uses Deployment and Service resources. A multi-cluster gateway is used to implement active zone-redundancy and primary/secondary disaster recovery, as shown in the following figure.

  • Use the MseIngressConfig object in an ACK One Fleet instance to create an MSE gateway.

  • Create Cluster 1 and Cluster 2 in AZ 1 and AZ 2 in the China (Hong Kong) region.

  • Use ACK One GitOps to distribute the application to Cluster 1 and Cluster 2.

  • After a multi-cluster gateway is created, you can configure Ingress rules to route traffic to the clusters based on weights and request headers. When one of the clusters is down, traffic is automatically switched to the other cluster.

Prerequisites

Step 1: Use GitOps to distribute an application to multiple clusters

Use the Argo CD UI to deploy an application

  1. Log on to the ACK One console. In the left-side navigation pane, choose Fleet > GitOps.

  2. On the GitOps page of the Fleet instance, click GitOps Console to go to the GitOps console.

    Note
    • If GitOps is not enabled for the ACK One Fleet instance, click Enable GitOps to log on to the GitOps console.

    • For more information about how to enable Internet access to GitOps, see Enable public access to Argo CD.

  3. Add an application repository.

    1. In the left-side navigation pane of the ArgoCD UI, click Settings and then choose Repositories > + Connect Repo.

    2. In the panel that appears, configure the following parameters and click CONNECT.

      • Choose your connection method: VIA HTTPS

      • Type: git

      • Project: default

      • Repository URL: https://github.com/AliyunContainerService/gitops-demo.git

      • Skip server verification: Select the check box.

      After the Git repository is connected, CONNECTION STATUS displays Successful.

  4. Create an application.

    On the Applications page of Argo CD, click + NEW APP and configure the following parameters.

    GENERAL

    • Application Name: the name of the application. You can specify a custom application name.

    • Project Name: default

    SYNC POLICY

    Select a synchronization policy based on your requirements. Valid values:

    • Manual: When changes are made to the image in the Git repository, you need to manually synchronize the changes to the destination cluster.

    • Automatic: Argo CD checks the Git repository for image changes every three minutes and automatically synchronizes the changes to the destination cluster.

    SYNC OPTIONS

    Select AUTO-CREATE NAMESPACE.

    SOURCE

    • Repository URL: Select the URL https://github.com/AliyunContainerService/gitops-demo.git that you added in the preceding step from the drop-down list.

    • Revision: Select gateway-demo from the Branches drop-down list.

    • Path: manifests/helm/web-demo

    DESTINATION

    • Cluster URL: Select the URL of Cluster 1 or Cluster 2. You can also click the URL drop-down list on the right side and select NAME to select a cluster by name.

    • Namespace: gateway-demo is used in this example. Application resources (Services and Deployments) will be created in this namespace.

    Helm

    • Parameters: Set envCluster to cluster-demo-1 or cluster-demo-2 to specify the backend pods in a cluster to process requests.
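For reference, the SYNC POLICY and SYNC OPTIONS settings above correspond to the syncPolicy field of the Argo CD Application spec. The following is a minimal sketch of what the Automatic option with AUTO-CREATE NAMESPACE produces, assuming the standard Application schema:

```yaml
syncPolicy:
  automated: {}            # Automatic: Argo CD polls the repository and syncs changes
  syncOptions:
  - CreateNamespace=true   # AUTO-CREATE NAMESPACE: create the destination namespace if missing
```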

Use the Argo CD CLI to deploy an application

  1. Enable GitOps on the ACK One Fleet instance. For more information, see Enable GitOps for the Fleet instance.

  2. Access Argo CD. For more information, see Use the Argo CD CLI to log on to Argo CD.

  3. Create and deploy an application.

    1. Run the following command to add a Git repository:

      argocd repo add https://github.com/AliyunContainerService/gitops-demo.git --name ackone-gitops-demos

      Expected output:

      Repository 'https://github.com/AliyunContainerService/gitops-demo.git' added
    2. Run the following command to query Git repositories:

      argocd repo list

      Expected output:

      TYPE  NAME  REPO                                                       INSECURE  OCI    LFS    CREDS  STATUS      MESSAGE  PROJECT
      git         https://github.com/AliyunContainerService/gitops-demo.git  false     false  false  false  Successful           default
    3. Run the following command to query clusters:

      argocd cluster list

      Expected output:

      SERVER                          NAME                                        VERSION  STATUS      MESSAGE                                                  PROJECT
      https://1.1.XX.XX:6443      c83f3cbc90a****-temp01   1.22+    Successful
      https://2.2.XX.XX:6443      c83f3cbc90a****-temp02   1.22+    Successful
      https://kubernetes.default.svc  in-cluster                                           Unknown     Cluster has no applications and is not being monitored.
    4. Use the Application mode to create and deploy the application to the destination cluster.

      1. Create a file named apps-web-demo.yaml and add the following content to the file:

        Replace repoURL with the actual repository URL.


        apiVersion: argoproj.io/v1alpha1
        kind: Application
        metadata:
          name: app-demo-cluster1
          namespace: argocd
        spec:
          destination:
            namespace: gateway-demo
            # https://1.1.XX.XX:6443
            server: ${cluster1_url}
          project: default
          source:
            helm:
              releaseName: "web-demo"
              parameters:
              - name: envCluster
                value: cluster-demo-1
              valueFiles:
              - values.yaml
            path: manifests/helm/web-demo
            repoURL: https://github.com/AliyunContainerService/gitops-demo.git
            targetRevision: gateway-demo
          syncPolicy:
            syncOptions:
            - CreateNamespace=true
        ---
        apiVersion: argoproj.io/v1alpha1
        kind: Application
        metadata:
          name: app-demo-cluster2
          namespace: argocd
        spec:
          destination:
            namespace: gateway-demo
            server: ${cluster2_url}
          project: default
          source:
            helm:
              releaseName: "web-demo"
              parameters:
              - name: envCluster
                value: cluster-demo-2
              valueFiles:
              - values.yaml
            path: manifests/helm/web-demo
            repoURL: https://github.com/AliyunContainerService/gitops-demo.git
            targetRevision: gateway-demo
          syncPolicy:
            syncOptions:
            - CreateNamespace=true
      2. Run the following command to deploy the application:

        kubectl apply -f apps-web-demo.yaml
      3. Run the following command to query applications:

        argocd app list

        Expected output:

        NAME                      CLUSTER                 NAMESPACE  PROJECT  STATUS  HEALTH   SYNCPOLICY  CONDITIONS  REPO                                                       PATH                     TARGET
        argocd/app-demo-cluster1  https://1.1.XX.XX:6443             default  Synced  Healthy  Auto        <none>      https://github.com/AliyunContainerService/gitops-demo.git  manifests/helm/web-demo  gateway-demo
        argocd/app-demo-cluster2  https://2.2.XX.XX:6443             default  Synced  Healthy  Auto        <none>      https://github.com/AliyunContainerService/gitops-demo.git  manifests/helm/web-demo  gateway-demo

Step 2: Use kubectl to deploy a multi-cluster gateway from the ACK One Fleet instance

Create an MseIngressConfig object on the ACK One Fleet instance to deploy a multi-cluster gateway and then connect the associated clusters to the gateway.

  1. Obtain and record the vSwitch ID of the ACK One Fleet instance. For more information, see Obtain a vSwitch ID.

  2. Create a file named gateway.yaml and add the following content to the file.

    Note: Replace ${cluster1} and ${cluster2} with the IDs of the clusters to be connected to the gateway, and replace ${vsw-id} with the vSwitch ID that you obtained in the preceding step.

    apiVersion: mse.alibabacloud.com/v1alpha1
    kind: MseIngressConfig
    metadata:
      annotations:
        mse.alibabacloud.com/remote-clusters: ${cluster1},${cluster2}
      name: ackone-gateway-hongkong
    spec:
      common:
        instance:
          replicas: 3
          spec: 2c4g
        network:
          vSwitches:
          - ${vsw-id}
      ingress:
        local:
          ingressClass: mse
      name: mse-ingress

    Parameter descriptions:

    • mse.alibabacloud.com/remote-clusters: the clusters to be connected to the multi-cluster gateway. Enter the IDs of clusters that are associated with the ACK One Fleet instance. Separate multiple IDs with commas.

    • spec.name: the name of the gateway instance.

    • spec.common.instance.spec: optional. The specification of the gateway instance. Default value: 4c8g.

    • spec.common.instance.replicas: optional. The number of gateway replicas. Default value: 3.

    • spec.ingress.local.ingressClass: optional. The name of the IngressClass that the multi-cluster gateway listens on. In this example, the multi-cluster gateway listens on all Ingresses whose IngressClass is mse.

  3. Run the following command to deploy the multi-cluster gateway:

    kubectl apply -f gateway.yaml
  4. Run the following command to check whether the multi-cluster gateway is created and listening on Ingresses:

    kubectl get mseingressconfig ackone-gateway-hongkong

    Expected output:

    NAME                      STATUS      AGE
    ackone-gateway-hongkong   Listening   3m15s

    The output indicates that the gateway is in the Listening state. This means that the multi-cluster gateway is created and running. The gateway listens on Ingresses whose IngressClasses are mse.

    A gateway created from an MseIngressConfig goes through the following states: Pending, Running, and Listening. State description:

    • Pending: The cloud-native gateway is being created. It requires about 3 minutes to create the gateway.

    • Running: The cloud-native gateway is created and running.

    • Listening: The cloud-native gateway is running and listens on Ingresses.

    • Failed: The cloud-native gateway is abnormal. You can check the message in the Status field to troubleshoot the issue.
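The state descriptions above can be summed up in a tiny helper. This is an illustrative sketch only (the function name is hypothetical, not part of any API); it maps the STATUS column printed by kubectl to its meaning:

```shell
# Hedged sketch: map the MseIngressConfig STATUS column to the state
# descriptions above. describe_state is an illustrative helper, not an API.
describe_state() {
  case "$1" in
    Pending)   echo "gateway is being created (about 3 minutes)" ;;
    Running)   echo "gateway is created and running" ;;
    Listening) echo "gateway is running and listening on Ingresses" ;;
    Failed)    echo "gateway is abnormal; check the message in the Status field" ;;
    *)         echo "unknown state: $1" ;;
  esac
}
describe_state Listening
```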

  5. Run the following command to check whether the associated cluster is connected to the gateway:

    kubectl get mseingressconfig ackone-gateway-hongkong -ojsonpath="{.status.remoteClusters}"

    Expected output:

    [{"clusterId":"c7fb82****"},{"clusterId":"cd3007****"}]

    The output indicates the IDs of the associated clusters and no Failed error is returned. This means that the associated clusters are connected to the multi-cluster gateway.
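The check above can be scripted. The following sketch hard-codes a sample of the jsonpath output and tests it for a Failed entry; in practice you would capture the kubectl output into STATUS instead:

```shell
# Hedged sketch: check the remoteClusters status for failed connections.
# STATUS is a hard-coded sample of the jsonpath output shown above.
STATUS='[{"clusterId":"c7fb82****"},{"clusterId":"cd3007****"}]'
if printf '%s' "$STATUS" | grep -q 'Failed'; then
  RESULT="at least one cluster failed to connect"
else
  RESULT="all clusters connected"
fi
echo "$RESULT"
```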

Step 3: Use Ingresses to implement zone-disaster recovery

Multi-cluster gateways use Ingresses to manage traffic across clusters. You can create Ingress objects on the ACK One Fleet instance to implement active zone-redundancy and primary/secondary disaster recovery.

Important

Make sure that the gateway-demo namespace is created on the ACK One Fleet instance. The Ingress objects must belong to the same namespace as the Service and Deployment objects of the application: gateway-demo.

Active zone-redundancy

Create an Ingress object to implement active zone-redundancy

Create an Ingress object on the ACK One Fleet instance to implement active zone-redundancy.

By default, traffic is distributed across all replicated pods of the application in Cluster 1 and Cluster 2, which are connected to the multi-cluster gateway. In this example, the ratio of replicated pods in Cluster 1 to those in Cluster 2 is 9:1. Therefore, 90% of the traffic is routed to Cluster 1 and 10% to Cluster 2. When all backend pods in Cluster 1 are down, the gateway automatically routes all traffic to Cluster 2. The following figure shows the topology.
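The arithmetic behind the split can be sketched as follows: a gateway that balances evenly across all backend pods routes traffic in proportion to the replica counts.

```shell
# Sketch of the 9:1 split: traffic share is proportional to replica counts.
C1_REPLICAS=9
C2_REPLICAS=1
TOTAL=$((C1_REPLICAS + C2_REPLICAS))
C1_SHARE=$((100 * C1_REPLICAS / TOTAL))
C2_SHARE=$((100 * C2_REPLICAS / TOTAL))
echo "Cluster 1: ${C1_SHARE}%, Cluster 2: ${C2_SHARE}%"
```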

  1. Create a file named ingress-demo.yaml and copy the following content to the file.

    In the following code, the /svc1 forwarding rule below the domain name example.com is used to expose the backend Service named service1.

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: web-demo
    spec:
      ingressClassName: mse
      rules:
      - host: example.com
        http:
          paths:
          - path: /svc1
            pathType: Exact
            backend:
              service:
                name: service1
                port: 
                  number: 80
  2. Run the following command to deploy the Ingresses on the ACK One Fleet instance:

    kubectl apply -f ingress-demo.yaml -n gateway-demo

Verify the canary version

In active zone-redundancy scenarios, you can use the following methods to verify the canary version of an application without affecting your businesses.

Create an application in an existing cluster. Specify a name and selector for the Service and Deployment. Make sure that the name and selector are different from those configured for the original application. Then, verify the canary version of the application based on request headers.

  1. Create a file named new-app.yaml in Cluster 1 and copy the following content to the file. Retain the original configurations except for the configurations of the Service and Deployment.

    apiVersion: v1       
    kind: Service
    metadata:
      name: service1-canary-1
      namespace: gateway-demo
    spec:
      ports:
      - port: 80
        protocol: TCP
        targetPort: 8080
      selector:
        app: web-demo-canary-1
      sessionAffinity: None
      type: ClusterIP
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web-demo-canary-1
      namespace: gateway-demo
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: web-demo-canary-1
      template:
        metadata:
          labels:
            app: web-demo-canary-1
        spec:
          containers:
            - env:
                - name: ENV_NAME
                  value: cluster-demo-1-canary
              image: 'registry-cn-hangzhou.ack.aliyuncs.com/acs/web-demo:0.6.0'
              imagePullPolicy: Always
              name: web-demo
  2. Run the following command to deploy the canary version in Cluster 1:

    kubectl apply -f new-app.yaml
  3. Create a file named new-ingress.yaml and copy the following content to the file.

    Create an Ingress on the ACK One Fleet instance to route requests based on request headers. You can add annotations to enable the canary feature and specify the canary-dest: cluster1 header to route requests that carry this header to the canary version.

    • nginx.ingress.kubernetes.io/canary: Set the value to "true" to enable the canary feature.

    • nginx.ingress.kubernetes.io/canary-by-header: the header key of requests routed to the canary version.

    • nginx.ingress.kubernetes.io/canary-by-header-value: the header value of requests routed to the canary version.

      apiVersion: networking.k8s.io/v1
      kind: Ingress
      metadata:
        name: web-demo-canary-1
        namespace: gateway-demo
        annotations:
          nginx.ingress.kubernetes.io/canary: "true"
          nginx.ingress.kubernetes.io/canary-by-header: "canary-dest"
          nginx.ingress.kubernetes.io/canary-by-header-value: "cluster1"
      spec:
        ingressClassName: mse
        rules:
        - host: example.com
          http:
            paths:
            - path: /svc1
              pathType: Exact
              backend:
                service:
                  name: service1-canary-1
                  port: 
                    number: 80
  4. Run the following command to deploy an Ingress on the ACK One Fleet instance to route requests based on request headers.

    kubectl apply -f new-ingress.yaml

Verify active zone-redundancy

Change the number of replicated pods in Cluster 1 to 9 and the number in Cluster 2 to 1. This way, 90% of the traffic is routed to Cluster 1 and 10% to Cluster 2 by default. When all backend pods in Cluster 1 are down, all traffic is automatically routed to Cluster 2.

Run the following command to query the public IP address of the multi-cluster gateway:

kubectl get ingress web-demo -n gateway-demo -ojsonpath="{.status.loadBalancer}"
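The command above prints the load-balancer status as JSON. The following sketch extracts only the IP, assuming the standard Ingress status layout (status.loadBalancer.ingress[0].ip); LB is a hard-coded sample of the output:

```shell
# Hedged sketch: pull just the IP out of the load-balancer status JSON.
# LB is a hard-coded sample; in practice, capture the kubectl output.
LB='{"ingress":[{"ip":"XX.XX.XX.XX"}]}'
IP=$(printf '%s' "$LB" | sed -n 's/.*"ip":"\([^"]*\)".*/\1/p')
echo "$IP"
```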
  • Route traffic to the two clusters based on the specified ratio

    Run the following command to query the traffic routing information.

    Replace XX.XX.XX.XX with the public IP address of the multi-cluster gateway that you obtained in the preceding step.

    for i in {1..100}; do curl -H "host: example.com" XX.XX.XX.XX/svc1; sleep 1; done

    The output indicates that 90% of the traffic is routed to Cluster 1 and 10% to Cluster 2.

  • Route all traffic to Cluster 2 when all backend pods in Cluster 1 are down

    If the replicas value of the Deployment in Cluster 1 is set to 0, the output indicates that all traffic is automatically failed over to Cluster 2.

  • Route requests that carry the specified header to the canary version

    Run the following command to query the traffic routing information.

    Replace XX.XX.XX.XX in the following command with the IP address that is obtained after the application and Ingress are deployed in the Verify the canary version section.

    for i in {1..100}; do curl -H "host: example.com" -H "canary-dest: cluster1" XX.XX.XX.XX/svc1; sleep 1; done

    The output indicates that requests with the canary-dest: cluster1 header are routed to the canary version in Cluster 1.
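To make the verification concrete, the curl loop's responses can be tallied. This sketch assumes the demo application echoes its envCluster value (cluster-demo-1 or cluster-demo-2) in the response body; the sample responses are hard-coded for illustration:

```shell
# Hedged sketch: tally which cluster served each response. The sample
# bodies below stand in for the output of the curl loop above.
RESPONSES="cluster-demo-1
cluster-demo-1
cluster-demo-2
cluster-demo-1"
C1=$(printf '%s\n' "$RESPONSES" | grep -c 'cluster-demo-1')
C2=$(printf '%s\n' "$RESPONSES" | grep -c 'cluster-demo-2')
echo "Cluster 1 served $C1 requests, Cluster 2 served $C2 requests"
```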

Primary/secondary disaster recovery

Create an Ingress object on the ACK One Fleet instance to implement primary/secondary disaster recovery.

If the backend pods in both clusters are normal, traffic is routed only to the backend pods in Cluster 1. If the backend pods in Cluster 1 are down, the gateway automatically routes traffic to Cluster 2. The following figure shows the topology.

Create an Ingress to implement primary/secondary disaster recovery

  1. Create a file named ingress-demo-cluster-one.yaml and add the following content to the file.

    Add the mse.ingress.kubernetes.io/service-subset and mse.ingress.kubernetes.io/subset-labels annotations to the YAML file of the Ingress object to use /service1 below the domain name example.com to expose the backend Service service1. For more information about the annotations supported by MSE Ingresses, see Annotations supported by MSE Ingress gateways.

    • mse.ingress.kubernetes.io/service-subset: the name of the subset of the Service. We recommend that you use a name related to the destination cluster.

    • mse.ingress.kubernetes.io/subset-labels: the ID of the destination cluster.

      apiVersion: networking.k8s.io/v1
      kind: Ingress
      metadata:
        annotations:
          mse.ingress.kubernetes.io/service-subset: cluster-demo-1
          mse.ingress.kubernetes.io/subset-labels: |
            topology.istio.io/cluster ${cluster1-id}
        name: web-demo-cluster-one
      spec:
        ingressClassName: mse
        rules:
        - host: example.com
          http:
            paths:
            - path: /service1
              pathType: Exact
              backend:
                service:
                  name: service1
                  port: 
                    number: 80
  2. Run the following command to deploy the Ingress on the ACK One Fleet instance:

    kubectl apply -f ingress-demo-cluster-one.yaml -n gateway-demo

Cluster-level canary releases

Multi-cluster gateways allow you to deploy an Ingress to route requests to specific clusters based on request headers. You can use this feature and the Ingress object created in the Create an Ingress to implement primary/secondary disaster recovery section to evenly distribute traffic to all replicated pods for load balancing and route requests that carry the specified header to the canary version.

  1. Create an Ingress by following the steps in the Create an Ingress to implement primary/secondary disaster recovery section.

  2. Create an Ingress that contains header-related annotations on the ACK One Fleet instance to implement cluster-level canary releases. When the headers of requests match the Ingress rule, the requests are routed to the backend pods of the canary version.

    1. Create a file named ingress-demo-cluster-gray.yaml and add the following content to the file.

      In the YAML file of the following Ingress object, replace ${cluster2-id} with the ID of the destination cluster. In addition to the mse.ingress.kubernetes.io/service-subset annotation and the mse.ingress.kubernetes.io/subset-labels annotation, you must add the following annotations to expose the backend Service named service1 by using the /service1 Ingress rule below the domain name example.com.

      • nginx.ingress.kubernetes.io/canary: Set the value to "true" to enable canary releases.

      • nginx.ingress.kubernetes.io/canary-by-header: the header key of requests routed to the cluster.

      • nginx.ingress.kubernetes.io/canary-by-header-value: the header value of requests routed to the cluster.

      apiVersion: networking.k8s.io/v1
      kind: Ingress
      metadata:
        annotations:
          mse.ingress.kubernetes.io/service-subset: cluster-demo-2
          mse.ingress.kubernetes.io/subset-labels: |
            topology.istio.io/cluster ${cluster2-id}
          nginx.ingress.kubernetes.io/canary: "true"
          nginx.ingress.kubernetes.io/canary-by-header: "app-web-demo-version"
          nginx.ingress.kubernetes.io/canary-by-header-value: "gray"
        name: web-demo-cluster-gray
      spec:
        ingressClassName: mse
        rules:
        - host: example.com
          http:
            paths:
            - path: /service1
              pathType: Exact
              backend:
                service:
                  name: service1
                  port: 
                    number: 80
    2. Run the following command to deploy the Ingress on the ACK One Fleet instance:

      kubectl apply -f ingress-demo-cluster-gray.yaml -n gateway-demo

Verify primary/secondary disaster recovery

To verify primary/secondary disaster recovery, set the number of replicated pods to 1 for Cluster 1 and Cluster 2. This way, requests that carry the specified header are routed to Cluster 2 and other requests are routed to Cluster 1 by default. When the backend pods in Cluster 1 are down, traffic is routed to Cluster 2.

Run the following command to query the public IP address of the multi-cluster gateway:

kubectl get ingress web-demo -n gateway-demo -ojsonpath="{.status.loadBalancer}"
  • Route traffic to Cluster 1 by default

    Run the following command to check whether the traffic is routed to Cluster 1 by default:

    Replace XX.XX.XX.XX with the public IP address of the multi-cluster gateway that you obtained in the preceding step.

    for i in {1..100}; do curl -H "host: example.com" XX.XX.XX.XX/service1; sleep 1; done

    The output indicates that all traffic is routed to Cluster 1 by default.

  • Route requests that carry the specified header to the canary version

    Run the following command to check whether requests that carry the specified header are routed to the canary version.

    Replace XX.XX.XX.XX with the public IP address of the multi-cluster gateway that you obtained in the preceding step.

    for i in {1..50}; do curl -H "host: example.com" -H "app-web-demo-version: gray" XX.XX.XX.XX/service1; sleep 1; done

    The output indicates that requests with the app-web-demo-version: gray header are routed to the canary version in Cluster 2.

  • Route traffic to Cluster 2 when Cluster 1 is down

    If the replicas value of the Deployment in Cluster 1 is set to 0, the output indicates that all traffic is automatically failed over to Cluster 2.
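The routing decision that the two Ingresses implement can be sketched as a simple function: requests carrying the app-web-demo-version: gray header go to Cluster 2, and all other requests go to Cluster 1 (the primary). The function below is illustrative only, not part of any API:

```shell
# Hedged sketch of the header-based routing that the canary Ingress adds
# on top of the primary/secondary Ingress.
route_request() {
  # $1 is the value of the app-web-demo-version header ("" if absent)
  if [ "$1" = "gray" ]; then
    echo "cluster-demo-2"   # canary Ingress: service-subset cluster-demo-2
  else
    echo "cluster-demo-1"   # default Ingress: service-subset cluster-demo-1
  fi
}
route_request ""
route_request "gray"
```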

References