Alibaba Cloud ACK One: Quickly Build A Zone-disaster Recovery System with Multi-cluster Gateways

This article introduces the ACK One multi-cluster gateways and their benefits in implementing zone-disaster recovery for multi-cluster applications.

Multi-cluster Gateways, a feature of Alibaba Cloud Distributed Cloud Container Platform for Kubernetes (ACK One) [1], is a cloud-native gateway solution designed for multi-cloud and multi-cluster scenarios [2]. It provides a unified gateway for managing north-south traffic across multiple clusters.

With ACK One Multi-cluster Gateways, you can quickly build a zone-disaster recovery system for multi-cluster applications, ensuring the continuity and availability of application data in a simpler and more efficient manner.

Why Zone-disaster Recovery is Needed

Due to the complexity and unpredictability of failures, it's essential to ensure high application and data availability in extreme cases such as network or power outages, fires, or earthquakes. As a result, disaster recovery solutions are necessary. Cloud-based disaster recovery can be categorized into three types:

  1. Zone-disaster recovery
  2. Cross-region disaster recovery
  3. Three data centers across two regions

Zone-disaster recovery includes active zone-redundancy and primary-secondary disaster recovery. Because data centers in the same region are physically closer and have lower network latency between them, zone-disaster recovery is well suited to protecting against zone-level disasters such as fires, network outages, and power outages. Cross-region disaster recovery, on the other hand, incurs higher network latency but protects against regional disasters such as earthquakes and floods. The three-data-centers-across-two-regions solution combines the advantages of zone-disaster recovery and cross-region disaster recovery, making it suitable for applications with high data continuity and availability requirements.

In practice, however, zone-disaster recovery is much easier to implement than cross-region disaster recovery, so it remains an important option.

Existing Solutions and Problems of Zone-disaster Recovery

For applications running on Kubernetes clusters, existing zone-disaster recovery and multi-zone/multi-cluster disaster recovery solutions are generally built on DNS and multiple Ingress controllers. The following figure shows the general architecture:

[Figure 1: General architecture of DNS-based disaster recovery with multiple Ingress controllers]

Traffic management services such as Global Traffic Manager (GTM) can be added between DNS and the Server Load Balancer (SLB) instances to support nearby access, high-concurrency load balancing, and health checks with failover, which improves the disaster recovery solution. However, existing DNS-based disaster recovery solutions still have some problems:

  1. DNS-based solutions cannot forward requests at Layer 7.
  2. Clients usually cache DNS records, so when the IP address is switched, some clients keep using the old address for a while. This causes short-term service unavailability and affects user experience.
  3. DNS-based solutions usually require multiple load balancers (one for each cluster or zone), and require an Ingress controller to be installed and Ingress objects to be created in each cluster, which increases both management overhead and fees.
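
The client-cache problem in item 2 can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `CachingClient` class, the `lb-zone-a` and `lb-zone-b` names, and the timings are hypothetical and do not model any specific DNS resolver or Alibaba Cloud product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CachingClient:
    """A client that caches a DNS answer for `ttl` seconds (illustrative model)."""
    ttl: float
    cached_ip: Optional[str] = None
    cached_at: float = 0.0

    def resolve(self, now: float, authoritative_ip: str) -> str:
        # Re-query DNS only when nothing is cached or the cached answer has expired.
        if self.cached_ip is None or now - self.cached_at >= self.ttl:
            self.cached_ip = authoritative_ip
            self.cached_at = now
        return self.cached_ip


def failed_requests(ttl: float, failover_at: float,
                    request_times: List[float]) -> List[float]:
    """Return the request times that still hit the failed zone's load balancer."""
    client = CachingClient(ttl=ttl)
    failures = []
    for now in request_times:
        # DNS points at zone A's load balancer until the failover, then at zone B's.
        authoritative = "lb-zone-a" if now < failover_at else "lb-zone-b"
        ip = client.resolve(now, authoritative)
        if now >= failover_at and ip == "lb-zone-a":
            failures.append(now)
    return failures


if __name__ == "__main__":
    # With a 60-second TTL and a failover at t=100, requests sent before the
    # cached record expires still go to the failed zone.
    print(failed_requests(60, 100, [0, 30, 59, 61, 90, 105, 115, 125, 150]))
```

In this model, requests keep failing until the cached record expires, which is the window that a gateway-level failover is meant to avoid.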

Zone-disaster Recovery Solutions Based on ACK One Multi-cluster Gateways

What are ACK One Multi-cluster Gateways?

ACK One Fleet implements multi-cluster gateways by integrating MSE cloud-native gateways [3]. They use the Ingress API to manage the north-south traffic of multi-cluster applications. They help you manage Layer 7 traffic and implement the capabilities that multi-cluster applications require, such as automatic zone-disaster recovery, header-based canary release verification, and traffic load balancing based on weights and on the number of replicas.

In addition, multi-cluster gateways are region-level resources. You only need to create gateways and Ingresses in the ACK One Fleet instance. You do not need to install an Ingress controller or create Ingress resources in each ACK cluster. This provides region-level global traffic management and reduces the cost of multi-cluster management.

[Figure 2: ACK One multi-cluster gateways]

About MSE Ingress and MSE Cloud-native Gateway

MSE Ingress [4] provides a more powerful Ingress traffic management method based on MSE cloud-native gateways. MSE Ingress is compatible with NGINX Ingress, supports more than 50 NGINX Ingress annotations, and covers more than 90% of NGINX Ingress business scenarios. It supports the simultaneous canary release of multiple service versions, flexible service governance, and comprehensive security protection to meet the traffic governance requirements of large-scale cloud-native distributed applications. For more information about the supported annotations, see Annotations supported by MSE Ingress gateways [5].

In the ACK One Fleet, the administrator uses kubectl to create an MseIngressConfig resource, which creates an MSE cloud-native gateway for managing multi-cluster traffic. The administrator then creates Ingress resources to define traffic routing.
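
As a rough illustration of the second step, the sketch below builds a standard `networking.k8s.io/v1` Ingress manifest as a Python dictionary and prints it as JSON, which kubectl also accepts. The resource name, namespace, host, and Service name are hypothetical, and the `mse` Ingress class name is an assumption that should be checked against the MSE Ingress documentation [4]. The MseIngressConfig resource is not shown because its schema is not described in this article.

```python
import json
from typing import Any, Dict


def build_ingress(name: str, namespace: str, host: str,
                  service_name: str, service_port: int,
                  ingress_class: str = "mse") -> Dict[str, Any]:
    """Build a networking.k8s.io/v1 Ingress that routes one host to one Service.

    `ingress_class` defaults to "mse", which is an assumption for illustration.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "ingressClassName": ingress_class,
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {
                        "name": service_name,
                        "port": {"number": service_port},
                    }},
                }]},
            }],
        },
    }


if __name__ == "__main__":
    # All values below are placeholders, not names from the article.
    manifest = build_ingress("web-demo", "gateway-demo", "example.com", "service1", 80)
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied to the Fleet instance with `kubectl apply -f`, assuming the gateway already exists.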

Overview of Traffic Management of Kubernetes Services

In Kubernetes traffic management, the Ingress controller is a key component. Acting as a specialized Layer 4 and Layer 7 proxy, it manages north-south traffic: traffic enters the Kubernetes cluster, reaches the Services, and the responses are returned to the clients.

The main goal of the Kubernetes Ingress API [6] is to expose services in a cluster through a proxy by using simple, declarative APIs. The Ingress API has been available as a Beta API since the early Kubernetes 1.x releases and became stable (networking.k8s.io/v1) in Kubernetes 1.19. Many vendors support Ingress and have extended it with widely used features, so the learning cost of the Ingress API is low.

It is worth mentioning that compared with Ingress, Kubernetes Gateway API [7] provides more general proxy APIs, supports more protocols, and can implement more capabilities. MSE will provide support for Gateway APIs in the future to meet more complex traffic management requirements of users.

Taking all aspects into consideration, ACK One uses Ingress APIs to support north-south traffic management across clusters.

Zone-disaster Recovery Solutions Based on ACK One Multi-cluster Gateways

ACK One multi-cluster gateways allow you to build a zone-disaster recovery system for multi-cluster applications, including active zone-redundancy and primary-secondary zone-disaster recovery. The overall architecture is as follows:

[Figure 3: Overall architecture of zone-disaster recovery based on ACK One multi-cluster gateways]

• Create one ACK cluster in each of two different zones (AZ 1 and AZ 2) in a region: Cluster 1 and Cluster 2.

• In the ACK One Fleet, use the MseIngressConfig resource to create an MSE gateway in the region and VPC where the Fleet is located.

• Distribute applications to multiple clusters (the created Cluster 1 and Cluster 2) through ACK One GitOps

• Create an Ingress in Fleet to set traffic rules. The following capabilities are supported across multiple ACK clusters:

  • HTTP routing: Layer 7 routing, such as header-based routing, which routes traffic to a specified cluster based on a request header and enables header-based canary verification.

  • Traffic splitting: Perform traffic routing, A/B testing, blue-green deployment, and canary release based on weight.

  • Health-based automatic disaster recovery: Traffic across multiple clusters or multiple zones fails over automatically. After a cluster or a replica in a cluster becomes unhealthy, traffic is automatically and smoothly routed to another cluster or to other healthy replicas.

  • Traffic mirroring and traffic load balancing based on the number of replicas.
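
The first two capabilities above can be sketched as a simple per-request decision. This is a conceptual model only, not the gateway's actual implementation: the `x-canary` header name, the cluster names, and the `pick_cluster` function are hypothetical.

```python
import random
from typing import Dict, Optional


def pick_cluster(headers: Dict[str, str],
                 weights: Dict[str, int],
                 canary_header: str = "x-canary",    # hypothetical header name
                 canary_cluster: str = "cluster2",   # hypothetical cluster name
                 rng: Optional[random.Random] = None) -> str:
    """Choose a target cluster for one request."""
    if rng is None:
        rng = random.Random()
    # HTTP header names are case-insensitive, so normalize them before matching.
    normalized = {key.lower(): value for key, value in headers.items()}
    # Header-based routing: requests carrying the canary header bypass the weights.
    if normalized.get(canary_header) == "true":
        return canary_cluster
    # Weight-based splitting: pick a cluster with probability proportional to its weight.
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    picks = [pick_cluster({}, {"cluster1": 80, "cluster2": 20}, rng=rng)
             for _ in range(10000)]
    print("share sent to cluster1:", picks.count("cluster1") / len(picks))
```

A request with the canary header always reaches the canary cluster, while all other requests follow the configured weights, which is the behavior that header-based canary verification relies on.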

Advantages

Based on the preceding architecture, it's clear that the disaster recovery solution based on ACK One multi-cluster gateways has several advantages over the disaster recovery solution based on DNS traffic distribution:

• The disaster recovery solution based on DNS traffic distribution requires multiple LB IP addresses (one for each cluster), while the disaster recovery solution based on multi-cluster gateways requires only one LB IP address at the region level, and provides high availability of multiple zones in the same region by default.

• Disaster recovery based on multi-cluster gateways supports the Layer 7 routing capability, while DNS-based disaster recovery does not.

• In the DNS-based disaster recovery solution, client-side DNS caching usually makes the service temporarily unavailable during IP switching, while the disaster recovery solution based on multi-cluster gateways can smoothly fall back to the service backend of another cluster.

• Multi-cluster gateways are based on regions. Therefore, you only need to create gateways and Ingresses in Fleet instances. You do not need to install Ingress controllers and create Ingresses in each ACK cluster. This reduces the cost of multi-cluster management while providing region-level global traffic management capability.

Active Zone-redundancy Based on Multi-cluster Gateways

The following figure shows the active zone-redundancy based on multi-cluster gateways:

[Figure 4: Active zone-redundancy based on multi-cluster gateways]

• Cluster 1 and Cluster 2 run Services with the same name in the same namespace, and the replica ratio between the two clusters is 1:1.

• The gateway routes traffic to Cluster 1 and Cluster 2 at a ratio of 1:1.

• When Cluster 1 becomes abnormal (for example, its number of replicas drops to 0), the 50% of traffic that was routed to Cluster 1 is automatically shifted to Cluster 2, so 100% of traffic is routed to Cluster 2.
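
The redistribution described in these bullets can be modeled as splitting traffic in proportion to the healthy replicas in each cluster. The sketch below assumes this proportional rule for illustration; the `traffic_shares` function and the replica counts are hypothetical.

```python
from typing import Dict


def traffic_shares(healthy_replicas: Dict[str, int]) -> Dict[str, float]:
    """Split traffic across clusters in proportion to their healthy replicas."""
    total = sum(healthy_replicas.values())
    if total == 0:
        raise RuntimeError("no healthy replica in any cluster")
    return {name: count / total for name, count in healthy_replicas.items()}


if __name__ == "__main__":
    # Normal operation: replicas are 1:1, so traffic is split 1:1.
    print(traffic_shares({"cluster1": 3, "cluster2": 3}))
    # Cluster 1 drops to 0 replicas: all traffic moves to Cluster 2.
    print(traffic_shares({"cluster1": 0, "cluster2": 3}))
```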

Primary-secondary Zone-disaster Recovery Based on Multi-cluster Gateways

The following figure shows the primary-secondary zone-disaster recovery based on multi-cluster gateways:

[Figure 5: Primary-secondary zone-disaster recovery based on multi-cluster gateways]

• A Service with the same name and namespace exists in both Cluster 1 (AZ 1) and Cluster 2 (AZ 2).

• Cluster 1 hosts the primary service and Cluster 2 hosts the secondary service. By default, only the primary cluster serves requests, so 100% of traffic is routed to Cluster 1.

• When the primary cluster encounters an exception, 100% of traffic is automatically and smoothly routed to the secondary cluster (Cluster 2 in AZ 2).
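
The primary-secondary behavior above amounts to a simple selection rule. The following sketch is a conceptual model with hypothetical names, not the gateway's implementation, and it assumes that the health of each cluster is already known.

```python
from typing import Dict


def route_primary_secondary(healthy: Dict[str, bool],
                            primary: str = "cluster1",
                            secondary: str = "cluster2") -> str:
    """Send all traffic to the primary while it is healthy, else to the secondary."""
    if healthy.get(primary, False):
        return primary
    if healthy.get(secondary, False):
        return secondary
    raise RuntimeError("neither the primary nor the secondary cluster is healthy")


if __name__ == "__main__":
    # Normal operation: the primary serves 100% of traffic.
    print(route_primary_secondary({"cluster1": True, "cluster2": True}))
    # Primary failure: traffic fails over to the secondary.
    print(route_primary_secondary({"cluster1": False, "cluster2": True}))
```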

Summary

ACK One multi-cluster gateways provide region-level Layer 7 traffic management and automatic failover across clusters. Combined with the multi-cluster distribution and continuous deployment capabilities of ACK One GitOps [8], they help you build a disaster recovery system for multi-cluster applications at a low cost. For more information, see Use multi-cluster gateways to implement zone-disaster recovery [9] and Overview of multi-cluster gateways [10].

References:

[1] ACK One
https://www.alibabacloud.com/help/en/ack/distributed-cloud-container-platform-for-kubernetes/product-overview/ack-one-overview
[2] Multi-cluster gateways
https://www.alibabacloud.com/help/en/ack/distributed-cloud-container-platform-for-kubernetes/user-guide/multi-cluster-traffic-gateway-overview
[3] MSE cloud-native gateways
https://www.alibabacloud.com/help/en/mse/product-overview/cloud-native-gateway-overview
[4] MSE Ingress
https://www.alibabacloud.com/help/en/mse/user-guide/overview-of-mse-ingress-gateways#task-2193958
[5] Annotations supported by MSE Ingress gateways
https://www.alibabacloud.com/help/en/mse/user-guide/annotations-supported-by-mse-ingress-gateways
[6] Kubernetes Ingress API
https://kubernetes.io/docs/concepts/services-networking/ingress/
[7] Kubernetes Gateway API
https://gateway-api.sigs.k8s.io/
[8] ACK One GitOps
https://www.alibabacloud.com/help/en/ack/distributed-cloud-container-platform-for-kubernetes/user-guide/gitops-overview
[9] Use multi-cluster gateways to implement zone-disaster recovery
https://www.alibabacloud.com/help/en/ack/distributed-cloud-container-platform-for-kubernetes/use-cases/zone-disaster-recovery-based-on-multi-cluster-gateway
[10] Overview of multi-cluster gateways
https://www.alibabacloud.com/help/en/ack/distributed-cloud-container-platform-for-kubernetes/user-guide/multi-cluster-gateway-overview
