By Jing Cai and Yu Zhuang
As enterprises grow, the need to run multiple Kubernetes clusters becomes increasingly apparent:
• Disaster Recovery, Active Multi-zone Deployment, High Availability, and Low Latency: Deploy your business in multiple regions and zones to migrate traffic after faults to improve service availability; deploy your business in multiple clusters to distribute traffic; deploy your business in multiple regions to provide nearby access for reducing latency.
• Multi-cloud and Hybrid Cloud Deployment: Manage IDC clusters on the cloud in a centralized manner; promote the use of elastic resources on the cloud when traffic burst occurs; manage clusters across multiple cloud vendors in a centralized manner to prevent vendor lock-in.
• Business and Fault Isolation: Use multiple clusters to isolate businesses with different attributes (for example, separate clusters for the dev, staging, and production environments). Compared with namespace-based multi-tenancy, this provides better isolation and performance and reduces the impact of faults.
• Security Compliance and Upper Limits on Nodes and Pods in a Single Kubernetes Cluster.
To address these multi-cluster use cases, many multi-cluster or fleet management solutions have emerged to manage multiple clusters efficiently and centrally. Multi-cluster management solutions began with the KubeFed project proposed by the Kubernetes community in Kubernetes 1.5 and 1.6. However, due to issues such as the low extensibility of the API, complex management, and insufficient maturity, Federation v1 was archived by the community and was never widely used. Federation v2 made many improvements, including using the CRD mechanism to extend the API, and was more widely recognized and used. However, it ultimately failed to achieve broad adoption because of the following design defects:
• It is not compatible with the native Kubernetes API and uses a new set of Federated APIs, which significantly increases the learning costs for users.
• It lacks extensibility: its rigid design cannot be extended to meet the needs of different scenarios.
Currently, the multi-cluster management solutions widely adopted by the open-source community include Open Cluster Management (OCM) and Karmada. Meanwhile, various cloud vendors have launched their own multi-cluster or fleet management solutions, such as the Fleet management feature of ACK One.
ACK One is an enterprise-class distributed cloud container platform developed by Alibaba Cloud to meet container management requirements in hybrid cloud, multi-cluster, distributed computing, and disaster recovery scenarios. You can use ACK One registered clusters to connect Kubernetes clusters from other public cloud vendors and from IDCs to the ACK console. ACK One Fleet manages these registered clusters together with ACK and ACK Edge clusters on the cloud to achieve centralized application distribution, traffic management, O&M management, and security management.
The Fleet management feature of ACK One is a solution for the centralized management of multiple clusters based on Open Cluster Management (OCM) from the open-source community. Each Fleet instance is managed by ACK. You can focus on application development without much O&M work.
ACK One Fleet includes the following key capabilities:
According to a CNCF microsurvey on GitOps usage trends published in late 2023, GitOps has become the top choice of most developers for fast, consistent, and secure delivery.
Based on the CNCF graduated project Argo CD, ACK One GitOps provides GitOps continuous delivery capabilities for multi-cluster applications in multi-cloud, multi-cluster, and hybrid cloud scenarios. ACK One GitOps is integrated with fully managed Argo CD, the multi-cluster management feature of ACK One, and Alibaba Cloud Resource Access Management (RAM) and single sign-on (SSO). With these capabilities, ACK One GitOps provides out-of-the-box Argo CD features and a secure and comprehensive GitOps CD experience for applications among clusters and allows you to implement continuous hybrid cloud application deployment across clusters in a fast, consistent, and secure manner.
ACK One GitOps has the following advantages:
• Integrates open-source Argo CD as an out-of-the-box, O&M-free service, with a CLI and a UI that offer the same user experience as those provided by Argo CD.
• Provides a separate Argo CD console, integrated with Alibaba Cloud Resource Access Management (RAM) and single sign-on (SSO), and supports Argo CD multi-tenancy permission management.
• Supports application distribution across clusters in hybrid cloud scenarios. Argo CD is automatically enabled for clusters that are associated with ACK One. The associated clusters use GitOps for application distribution.
• Supports Argo CD ApplicationSet to improve the user experience in application distribution across clusters.
• Publishes multi-cluster applications more securely: supports Secret management in GitOps and accesses sub-clusters at the ServiceAccount level.
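To illustrate how an ApplicationSet can fan one application out to several clusters, the following sketch builds a manifest that uses the cluster generator of open-source Argo CD. It is a generic Argo CD example rather than an ACK One-specific one: the repository URL, path, target namespace, application name, and the "argocd" namespace are all hypothetical placeholders.

```python
import json


def build_application_set(name, repo_url, path, namespace, project="default"):
    """Return an Argo CD ApplicationSet manifest as a plain dict.

    The cluster generator produces one Application per cluster registered
    in Argo CD. The {{name}} and {{server}} placeholders are substituted by
    Argo CD, not by this script.
    """
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "ApplicationSet",
        # Assumption: Argo CD resources live in a namespace called "argocd".
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "generators": [{"clusters": {}}],
            "template": {
                "metadata": {"name": "{{name}}-" + name},
                "spec": {
                    "project": project,
                    "source": {
                        "repoURL": repo_url,
                        "targetRevision": "HEAD",
                        "path": path,
                    },
                    "destination": {
                        "server": "{{server}}",
                        "namespace": namespace,
                    },
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical repository and path, for illustration only.
    manifest = build_application_set(
        name="guestbook",
        repo_url="https://example.com/org/repo.git",
        path="manifests/guestbook",
        namespace="demo",
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl, which accepts JSON manifests as well as YAML.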
Currently, multiple ACK One customers use GitOps to build continuous deployment of applications in hybrid cloud and multi-cluster scenarios across multiple teams. In such deployments, ACK One Fleet manages dozens of cloud clusters and on-premises clusters and uses GitOps to rapidly deploy thousands of applications. Because Argo CD is automatically enabled for clusters that are associated with ACK One, the application distribution process across clusters is simplified.
Multi-tenancy permission management in GitOps involves granting Argo CD RBAC permissions to RAM users or RAM roles, and using Argo CD Projects to manage their RBAC permissions on clusters, repositories, and applications.
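As a minimal sketch of the Project-based scoping described above, the following code builds an open-source Argo CD AppProject that restricts one team to specific repositories and destinations and grants a role read and sync rights on the applications of that project. The team name, repository URL, cluster URL, and group name are hypothetical. How a RAM user or RAM role is represented as an Argo CD subject is not described here, so the `group` value is only a placeholder.

```python
import json


def build_app_project(team, repos, destinations, group):
    """Return an Argo CD AppProject manifest (dict) that scopes one team.

    repos:        Git repository URLs the team may deploy from
    destinations: (cluster_server_url, namespace) pairs the team may deploy to
    group:        placeholder identity bound to the project role
    """
    role = "developer"
    subject = f"proj:{team}:{role}"
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "AppProject",
        # Assumption: Argo CD resources live in a namespace called "argocd".
        "metadata": {"name": team, "namespace": "argocd"},
        "spec": {
            "sourceRepos": list(repos),
            "destinations": [
                {"server": server, "namespace": ns} for server, ns in destinations
            ],
            "roles": [
                {
                    "name": role,
                    "groups": [group],
                    # Argo CD policy rows: p, <subject>, <resource>, <action>, <object>, <effect>
                    "policies": [
                        f"p, {subject}, applications, get, {team}/*, allow",
                        f"p, {subject}, applications, sync, {team}/*, allow",
                    ],
                }
            ],
        },
    }


if __name__ == "__main__":
    project = build_app_project(
        team="team-a",
        repos=["https://example.com/team-a/apps.git"],
        destinations=[("https://cluster-1.example.com:6443", "team-a")],
        group="team-a-developers",
    )
    print(json.dumps(project, indent=2))
```

Keeping one Project per team means that repositories, target clusters, and application permissions are all bounded in a single object, which is what makes the per-team isolation easy to audit.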
The following four steps are used to build a continuous deployment case for applications in hybrid cloud and multi-cluster scenarios across multiple teams:
ACK One Serverless Argo Workflows (Argo Workflows) are fully managed by ACK One. The performance, stability, observability, and O&M capabilities of Argo Workflows are improved. EventBridge is a serverless event bus service provided by Alibaba Cloud, which has significant advantages in availability, usability, and security.
Combining EventBridge, Argo Workflows, and Argo CD, ACK One GitOps allows you to easily, quickly, efficiently, and cost-effectively deliver your applications and implement an automated CI/CD system that delivers code when you submit it. For more information about how to build a CI Pipeline, see Event-driven CI Pipeline based on EventBridge.
Multi-cluster gateways are cloud-native gateways provided by ACK One for multi-cloud and multi-cluster scenarios. They manage north-south traffic at Layer 7 across multiple clusters in a single region. ACK One manages MSE Ingress and uses Ingress APIs to define traffic routing rules. Multi-cluster gateways support the following features across multiple clusters: HTTP routing, traffic splitting, health-based automatic disaster recovery, traffic mirroring, and traffic load balancing based on the number of replicas. You can use multi-cluster gateways to build capabilities such as disaster recovery, tag-based routing, and weight-based routing.
Multi-cluster gateways have the following advantages:
• A multi-cluster Global Ingress at the region level to centrally manage north-south traffic at Layer 7 in multiple clusters.
• Simplified multi-cluster traffic management: You can configure Ingress rules for multiple clusters in a Fleet instance without separately managing each sub-cluster. Multi-cluster gateways are also compatible with NGINX Ingress.
• Multi-cluster gateways ensure high availability across zones.
• Fallback in milliseconds: When a backend of a cluster fails, the multi-cluster gateway smoothly migrates traffic to other backends.
• Gateways are fully managed and O&M-free.
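The combination of replica-based load balancing and health-based failover can be pictured with a small conceptual model. The sketch below is not the gateway's implementation. It only shows the idea of splitting traffic in proportion to replica counts and moving all traffic away from a cluster whose backend is unhealthy. The function name, the input shape, and the cluster names are assumptions made for illustration.

```python
def traffic_weights(clusters):
    """Split 100% of traffic across healthy clusters in proportion to replicas.

    clusters: dict of cluster name -> {"replicas": int, "healthy": bool}
    Returns a dict of cluster name -> percentage of traffic (float).
    Unhealthy clusters and clusters with no replicas receive 0.0.
    """
    eligible = {
        name: info["replicas"]
        for name, info in clusters.items()
        if info["healthy"] and info["replicas"] > 0
    }
    total = sum(eligible.values())
    if total == 0:
        # No healthy backend anywhere: nothing can receive traffic.
        return {name: 0.0 for name in clusters}
    return {name: 100.0 * eligible.get(name, 0) / total for name in clusters}


if __name__ == "__main__":
    state = {
        "cluster-a": {"replicas": 3, "healthy": True},
        "cluster-b": {"replicas": 1, "healthy": True},
    }
    print(traffic_weights(state))  # 75% to cluster-a, 25% to cluster-b

    # Simulate a backend failure in cluster-a: all traffic moves to cluster-b.
    state["cluster-a"]["healthy"] = False
    print(traffic_weights(state))
```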
The following five steps are required to build a hybrid cloud disaster recovery system. For more information, please refer to Use ACK One to implement zone-disaster recovery in hybrid cloud environments.
ACK One Fleet provides cross-cluster service discovery by implementing the multi-cluster services defined in the Multi-Cluster Services (MCS) API of the Kubernetes community, which can help you in the following scenarios:
The following is the architecture of ACK One multi-cluster services:
1. Connections marked ① in the figure are used by the Fleet instance to manage the ServiceExport and ServiceImport resources in the associated Container Service for Kubernetes (ACK) clusters.
• A ServiceExport is created in ACK Cluster 1 to export Service 1. ACK Cluster 1 serves as the Service provider. Service 1 provides external services.
• A ServiceImport is created in ACK Cluster 2 to allow ACK Cluster 2 to access Service 1 exported by the Service provider. ACK Cluster 2 serves as the Service consumer.
2. The connection marked ② in the figure is used for data exchange. After Service 1 is exported in ACK Cluster 1 and imported in ACK Cluster 2, you can access Service 1 in ACK Cluster 1 from ACK Cluster 2. This way, you can access Services across Kubernetes clusters.
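The export and import pair described above can be sketched as two small manifests. The code below builds them as plain dicts using the API group and version of the Kubernetes community's Multi-Cluster Services API (multicluster.x-k8s.io/v1alpha1). The Service name `service1` and namespace `provider-ns` follow the domain names used in this article, while the port number is an assumption for illustration.

```python
import json

# API group/version of the community Multi-Cluster Services API.
MCS_API_VERSION = "multicluster.x-k8s.io/v1alpha1"


def service_export(name, namespace):
    """ServiceExport for the provider cluster: marks an existing Service as exported."""
    return {
        "apiVersion": MCS_API_VERSION,
        "kind": "ServiceExport",
        "metadata": {"name": name, "namespace": namespace},
    }


def service_import(name, namespace, port, protocol="TCP"):
    """ServiceImport for the consumer cluster: makes the exported Service reachable."""
    return {
        "apiVersion": MCS_API_VERSION,
        "kind": "ServiceImport",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "ClusterSetIP",
            "ports": [{"port": port, "protocol": protocol}],
        },
    }


if __name__ == "__main__":
    # Name and namespace must match on both sides so the import refers to the export.
    print(json.dumps(service_export("service1", "provider-ns"), indent=2))
    # Port 80 is a hypothetical value.
    print(json.dumps(service_import("service1", "provider-ns", port=80), indent=2))
```

The first manifest belongs in ACK Cluster 1 (the provider) and the second in ACK Cluster 2 (the consumer).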
The following figure shows how the Client Pod in ACK Cluster 2 can access Service 1 in ACK Cluster 1. The principle of multi-cluster services is as follows:
A Service named amcs-service1 (the name of the exported Service prefixed with amcs-) is created in ACK Cluster 2 and associated with the EndpointSlice. The Client Pod can then access Service 1 through either of the following domain names:
a) service1.provider-ns.svc.clusterset.local: The multi-cluster plug-in must be enabled in CoreDNS for the Client Pod to use this domain name. After the Client Pod resolves the domain name, the IP address of the ServiceImport is returned. Then, the Client Pod can use the IPs in the associated EndpointSlice to access the Pods in ACK Cluster 1.
b) amcs-service1.provider-ns.svc.cluster.local: The Client Pod only needs normal Service domain name resolution in the Kubernetes cluster to use this domain name. Then, the Client Pod can use the IPs in the associated EndpointSlice to access the Pods in ACK Cluster 1.
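The two naming patterns above can be expressed as a small helper. This is only a string-formatting sketch based on the two example domain names. The function name and the dictionary keys are hypothetical.

```python
def multi_cluster_service_hosts(service, namespace):
    """Return both domain names through which an imported Service can be reached.

    "clusterset": resolved through the multi-cluster plug-in in CoreDNS
    "amcs":       resolved through normal in-cluster Service DNS
    """
    return {
        "clusterset": f"{service}.{namespace}.svc.clusterset.local",
        "amcs": f"amcs-{service}.{namespace}.svc.cluster.local",
    }


if __name__ == "__main__":
    hosts = multi_cluster_service_hosts("service1", "provider-ns")
    for kind, host in hosts.items():
        print(f"{kind}: {host}")
```

A client that cannot rely on the CoreDNS plug-in being enabled could use the "amcs" form, since it only depends on standard Service resolution.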
For example, you can use headless multi-cluster services to implement read/write splitting for MySQL primary and secondary clusters. This helps improve the performance, throughput, reliability, and fault tolerance of the MySQL database.
A headless multi-cluster service provides cross-cluster access to a specific instance of a stateful service. The main steps are as follows:
a) mysql-0.mysql.provider-ns.svc.clusterset.local
b) mysql-0.amcs-mysql.provider-ns.svc.cluster.local
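The per-Pod domain names above make read/write splitting straightforward to sketch. The code below derives both name forms for a given Pod and then separates a write host from read hosts. It assumes, purely for illustration, that the StatefulSet shares the Service name `mysql` and that ordinal 0 is the primary while all other ordinals are secondaries. Both function names are hypothetical.

```python
def stateful_pod_hosts(pod, service, namespace):
    """Return the two per-Pod domain names for a headless multi-cluster service."""
    return [
        f"{pod}.{service}.{namespace}.svc.clusterset.local",
        f"{pod}.amcs-{service}.{namespace}.svc.cluster.local",
    ]


def mysql_read_write_hosts(replicas, service="mysql", namespace="provider-ns"):
    """Split MySQL Pods into one write host and a list of read hosts.

    Assumption (not from the article): Pods are named <service>-<ordinal>,
    ordinal 0 is the primary, and every other ordinal is a secondary.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    hosts = [
        stateful_pod_hosts(f"{service}-{i}", service, namespace)[0]
        for i in range(replicas)
    ]
    # With a single replica, reads fall back to the primary.
    return {"write": hosts[0], "read": hosts[1:] or hosts[:1]}


if __name__ == "__main__":
    print(stateful_pod_hosts("mysql-0", "mysql", "provider-ns"))
    print(mysql_read_write_hosts(3))
```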
The global observability of the Fleet management feature of ACK One includes global monitoring and global FinOps. Additional capabilities, such as a global event center, are under development.
ACK One Fleet manages all Kubernetes clusters in a centralized manner to prevent differences in management and control. It uses an aggregation instance of Alibaba Cloud Managed Service for Prometheus to aggregate the metrics of each cluster, providing you with a global and centralized monitoring view to ensure your business stability.
The Fleet management feature of ACK One is a multi-cluster management solution provided by Alibaba Cloud. It features GitOps application distribution, multi-cluster gateways, multi-cluster services, global observability, service mesh, centralized permission management, and multi-cluster workload scheduling and distribution. This solution simplifies multi-cluster management in scenarios such as hybrid cloud, multi-cluster, and disaster recovery. Managed Fleet instances (Kubernetes clusters) and Argo CD also minimize your O&M efforts, allowing you to focus more on business development.