By Tang Zhimin and Xie Yaoyao (Chuyang)
On February 12, 2020, Alibaba Cloud and the Cloud Native Computing Foundation (CNCF) jointly held a webinar. During the webinar, Alibaba Cloud presented, for the first time, more than 20 open-source Kubernetes projects in 10 categories that together support complete Kubernetes lifecycle management. This article summarizes the full video, provides the documents for download, and answers the questions left open at the webinar.
Over the years, more and more enterprises have adopted Kubernetes in their production environments. Kubernetes is widely accepted thanks to its sound design and thriving community. So far, about 20 Special Interest Groups (SIGs) have formed around Kubernetes. One important SIG in the Kubernetes community, SIG Cloud Provider, is devoted to helping all cloud vendors provide Kubernetes services with standard capabilities.
SIG-Cloud-Provider-Alibaba is the only sub-project of SIG Cloud Provider in China. SIG Cloud Provider is a cloud vendor interest group for the Kubernetes community. It ensures that the Kubernetes ecosystem is evolving in a way that is neutral to all cloud vendors and establishes standards and requirements that are common to all providers to ensure optimal Kubernetes integration. Currently, there are seven cloud vendors in SIG Cloud Provider, including Amazon Web Services (AWS), GCP, Alibaba Cloud, and IBM Cloud.
As enterprises migrate fully to the cloud, their IT architectures are being reshaped around it. Cloud-native computing is a set of best practices and methodologies for building scalable, robust, and loosely coupled applications in Alibaba Cloud, Apsara Stack, and multi-cloud environments. These practices enable fast innovation and low-cost experimentation.
As a world-leading cloud vendor, Alibaba Cloud hopes to promote Kubernetes standardization and deepen its cooperation with other cloud vendors such as AWS, Google, and Azure, in order to improve the integration between cloud platforms and Kubernetes and to agree on modular, standardized protocols for different components.
We hope to build the best runtime environment for Kubernetes developers and users and to offer Alibaba Cloud's Kubernetes plug-ins as open source. Alibaba Cloud Container Service for Kubernetes (ACK) uses these same components.
As a cloud-native operating system for applications, Kubernetes has become a standard. Through its Kubernetes practice, Alibaba Cloud has released many open-source projects that provide full-stack lifecycle management for user applications. These projects fall into five underlying categories (CloudController, computing, storage, network, and security) and five upper-layer categories (AI, ServiceBroker, application management, migration, and serverless).
SIG-Cloud-Provider-Alibaba provides a channel for sharing Kubernetes cloud-native best practices on Alibaba Cloud. Individuals and organizations alike can learn how Cloud Provider works and apply it in production to realize business value. For more information, please see the following:
CloudController is a cloud controller manager (CCM) for Kubernetes. It integrates Kubernetes with the basic services of different cloud vendors, including Server Load Balancer (SLB), Virtual Private Cloud (VPC) routing, Elastic Compute Service (ECS), and Alibaba Cloud Domain Name System (DNS) services. It does so through four controllers: NodeController, ServiceController, RouteController, and PVLController.
NodeController manages compute nodes, including the lifecycle of ECS instances. By labeling nodes with zones, regions, and hostnames, it gives the orchestration system the information it needs to schedule workloads across compute pools. It also regularly polls the IP addresses of ECS instances and checks whether ECS resources have been released, so that node information is updated dynamically. This ensures that the orchestration system can respond promptly to compute node events.
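To make the labeling and release-check steps more concrete, here is a minimal Python sketch. It is not taken from the Alibaba Cloud CCM: the `Instance` record and both helper functions are hypothetical, and the instance data is assumed to have been fetched already. Only the label keys are real; they are the well-known Kubernetes topology labels.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class Instance:
    """Hypothetical, simplified view of an ECS instance (illustration only)."""
    instance_id: str
    region: str
    zone: str
    hostname: str


def node_labels(instance: Instance) -> Dict[str, str]:
    """Build the well-known Kubernetes topology labels for a node."""
    return {
        "topology.kubernetes.io/region": instance.region,
        "topology.kubernetes.io/zone": instance.zone,
        "kubernetes.io/hostname": instance.hostname,
    }


def released_nodes(registered: Set[str], existing: Set[str]) -> Set[str]:
    """Return registered nodes whose backing instances no longer exist."""
    return registered - existing
```

A real controller would run the release check on a timer and delete the corresponding Node objects; the sketch shows only the comparison.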
ServiceController manages load balancing for applications. It watches for changes to Kubernetes Service objects, automatically configures and manages the corresponding SLB resources in the cloud (SLB instances, listeners, and virtual server groups), and adjusts the backend server groups of SLB instances as application replicas change. On top of this, we have defined a wide range of annotations for customizing an application's load balancing configuration. We have also worked with the Kubernetes community to standardize configurations and have added the Elastic Network Interface (ENI) mode to the Kubernetes service discovery model. This simplifies the network hierarchy of service discovery and improves overall application network performance by 10%.
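The backend adjustment described above is essentially a reconciliation between the desired and the current state. The following Python sketch illustrates that idea under simplifying assumptions: backends are plain address strings, and `reconcile_backends` is a hypothetical helper, not a function from the actual controller.

```python
from typing import Set, Tuple


def reconcile_backends(desired: Set[str], current: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Compute which backends to add to and remove from a server group.

    desired: backend addresses derived from the Service's current endpoints.
    current: backend addresses presently registered on the load balancer.
    """
    to_add = desired - current
    to_remove = current - desired
    return to_add, to_remove


# Example: one replica was replaced, so one backend is added and one removed.
to_add, to_remove = reconcile_backends(
    desired={"10.0.0.1", "10.0.0.3"},
    current={"10.0.0.1", "10.0.0.2"},
)
print(sorted(to_add), sorted(to_remove))  # ['10.0.0.3'] ['10.0.0.2']
```

Computing a diff rather than rewriting the whole group keeps unchanged backends in service while replicas scale up or down.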
Terway implements the Kubernetes Container Network Interface (CNI) specification and is optimized specifically for Alibaba Cloud. It supports multiple enterprise features, including the VPC routing mode, ENI mode, and inclusive ENI mode. Its performance in ENI mode is about 10% higher than that in the native VPC.
Terway is integrated with Alibaba Cloud Infrastructure as a Service (IaaS). It allows pods to use network products such as Cloud Enterprise Network (CEN) and SLB, and to use ENIs to avoid network performance loss, so containerizing an application does not compromise its user experience or performance. It also supports advanced features such as Kubernetes network policies and quality of service (QoS)-based throttling.
The Alibaba Cloud Container Storage Interface (CSI) plug-in enables you to manage the lifecycle of container volumes in Kubernetes, including creating, mounting, and using cloud volumes. The CSI plug-in is implemented based on Kubernetes versions later than V1.14. It supports Alibaba Cloud storage services, such as disks, Apsara File Storage NAS, Cloud Paralleled File System (CPFS), Object Storage Service (OSS), and Logical Volume Manager (LVM).
Log-pilot is used to efficiently collect logs from containers. It can easily collect the standard output logs of containers and dynamically discover and collect log files from containers. In declarative configuration mode, it can automatically detect the status of a container in the cluster to configure the container log collection function. It has many advanced features, such as automatic checkpoint and handle retention, tagging, and tag customization. With these features, log-pilot can flexibly collect and save log data to various log storage backends, such as Elasticsearch, Message Queue for Apache Kafka, Logstash, Redis, and Graylog.
Arena is a lightweight solution for the Machine Learning Platform for AI based on Kubernetes. It covers the full lifecycle of data preparation, model development, model training, and model prediction, improving the work efficiency of data scientists. With it, data scientists and algorithm engineers can quickly run data preparation, model development, model training, evaluation, and prediction tasks on Alibaba Cloud resources, including ECS, Elastic GPU Service, Apsara File Storage NAS, CPFS, OSS, E-MapReduce, and SLB instances. The platform can also expose deep learning capabilities as service APIs to accelerate business application integration, and it improves the utilization of Elastic GPU Service resources in a cluster through visual resource management and shared scheduling of devices.
This webinar presented, for the first time, the strategic arrangement of Alibaba Cloud products for the Kubernetes community. We cannot cover every open-source component in detail here. Instead, we hope that developers who are interested in Kubernetes can find the open-source projects relevant to them. Developers are welcome to open pull requests and issues or to suggest roadmap items. SIG-Cloud-Provider-Alibaba will share principles and best practices for specific components.
Q1: Does the Cloud Provider of Alibaba Cloud Kubernetes offer parameters to enable or disable individual functions?
A1: Yes. You can add annotations for this purpose.
Q2: Will it cause issues if we use a specified version of Kubernetes to make modifications based on Alibaba Cloud CCM?
A2: No, this will not cause issues because CCM is independent of the Kubernetes version.
Q3: Do Alibaba Cloud Kubernetes-based container services directly use open-source CCM? If so, what adjustments have been made internally before launch, and what is the specific format of provider_id?
A3: Yes, Alibaba Cloud Kubernetes-based container services directly use open-source CCM. The format of provider_id is ${regionid}.${nodeid}.
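As an illustration of the `${regionid}.${nodeid}` format from A3, the following Python sketch builds and parses such an ID. The helper names and the sample values are hypothetical, and the parser assumes that the region ID itself contains no dot.

```python
from typing import Tuple


def format_provider_id(region_id: str, node_id: str) -> str:
    """Join a region ID and a node ID into the ${regionid}.${nodeid} form."""
    return f"{region_id}.{node_id}"


def parse_provider_id(provider_id: str) -> Tuple[str, str]:
    """Split a provider ID at the first dot (assumes no dot in the region ID)."""
    region_id, sep, node_id = provider_id.partition(".")
    if not sep or not region_id or not node_id:
        raise ValueError(f"unexpected provider ID: {provider_id!r}")
    return region_id, node_id


# Illustrative values only; real region and instance IDs will differ.
pid = format_provider_id("cn-hangzhou", "i-example123")
print(pid)                     # cn-hangzhou.i-example123
print(parse_provider_id(pid))  # ('cn-hangzhou', 'i-example123')
```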
Q4: Must the node name of Kubernetes be the same as the instance ID of Alibaba Cloud for CCM? O&M personnel said they must be the same.
A4: No. Currently, only the provider ID needs to be set.
Q5: How is the underlying layer of Terway accelerated: at the kernel level or with the Data Plane Development Kit (DPDK)?
A5: Terway can work on different networks with different configurations.
Q6: Can underlying kernel parameters of a pod be set in namespaces?
A6: It depends on the kernel. In newer kernels, such as Linux kernel 4.19 in Aliyun Linux 2, most kernel parameters can be set and modified in a pod.
Q7: What security container products of Alibaba Cloud are available now?
A7: Alibaba Cloud Container Service currently provides the security sandbox as a container engine for users. In addition, some Alibaba Cloud serverless products, such as Serverless App Engine (SAE) and Elastic Container Instance (ECI), are also built on security containers.
Q8: Does Arena support multitenancy and virtual graphics processing units (vGPUs)?
A8: Arena reuses the existing user authorization and multitenancy policies of Kubernetes. Different users can authenticate with different kubeconfig files and use namespaces to isolate and share resources. Within Arena, users can view only the training and inference tasks in their own namespace. As for vGPUs, the term here refers to the NVIDIA vGPU technology. The vGPU technology that supports P4 in Alibaba Cloud has already been integrated with ACK, and you can get started with it in Alibaba Cloud Container Service. From Arena's point of view, a vGPU is simply a resource that can be scheduled and orchestrated, not a special resource.
Q9: Does the multi-container GPU sharing solution support resource isolation and can the GPU be limited?
A9: Alibaba Cloud Container Service provides the only open-source GPU sharing solution in the industry. Currently, the solution implements multi-container GPU sharing at the scheduling layer and can be integrated with frameworks such as TensorFlow to limit GPU resources at the application layer. For usage details, please see the user guide. We are also working with the Alibaba Cloud team to develop a secure, high-performance GPU isolation solution. In the near future, a complete solution with both GPU sharing and isolation should become available.
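To give a feel for what GPU sharing at the scheduling layer can mean, here is a minimal Python sketch that places containers on GPUs by requested GPU memory. It is an assumption-laden illustration, not the algorithm used by the Alibaba Cloud solution: the best-fit policy, the function names, and the memory units are all hypothetical, and nothing here enforces isolation at run time.

```python
from typing import List, Optional


def pick_gpu(free_mem: List[int], request: int) -> Optional[int]:
    """Return the index of the GPU with the least free memory that still fits."""
    best: Optional[int] = None
    for idx, free in enumerate(free_mem):
        if free >= request and (best is None or free < free_mem[best]):
            best = idx
    return best


def schedule(capacities: List[int], requests: List[int]) -> List[Optional[int]]:
    """Place each request in order; None means no GPU had enough free memory."""
    free = list(capacities)
    placement: List[Optional[int]] = []
    for req in requests:
        idx = pick_gpu(free, req)
        if idx is not None:
            free[idx] -= req  # account for the memory now reserved on that GPU
        placement.append(idx)
    return placement


# Two 16 GiB GPUs; the last request (12 GiB) no longer fits anywhere.
print(schedule([16, 16], [8, 8, 8, 12]))  # [0, 0, 1, None]
```

The scheduler only does bookkeeping. As the answer notes, actually limiting what each container uses is left to the application layer, for example through framework settings.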
Q10: Does ExternalDNS support Alibaba Cloud DNS?
A10: Alibaba Cloud DNS PrivateZone is supported now. DNS records for services and pods can be synchronized from the Kubernetes cluster to Alibaba Cloud DNS, which reduces the overhead of running CoreDNS in the cluster.
Q11: What is the major difference between the ingress-nginx of Alibaba Cloud and that of the Kubernetes community?
A11: The ingress-nginx of Alibaba Cloud provides more advanced features, such as the dynamic update of the ingress-nginx configuration. It also supports a phased release policy based on the headers, cookies, request parameters, and weight.
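The phased release policy mentioned in A11 can be pictured as a per-request routing decision. The following Python sketch shows one plausible shape of that decision for headers, cookies, and weight; request parameters are omitted for brevity. It is not the ingress-nginx implementation: the header name `x-canary`, the cookie name `canary`, and the `always` value are hypothetical choices for illustration.

```python
import random
from typing import Mapping, Optional


def route_to_canary(
    headers: Mapping[str, str],
    cookies: Mapping[str, str],
    weight: float,
    rng: Optional[random.Random] = None,
) -> bool:
    """Return True if the request should go to the new (canary) version.

    weight is the percentage (0-100) of otherwise unmatched traffic
    that is sent to the canary.
    """
    # An explicit opt-in by header or cookie takes priority over the weight.
    if headers.get("x-canary") == "always":
        return True
    if cookies.get("canary") == "always":
        return True
    # Otherwise, send roughly `weight` percent of requests to the canary.
    rng = rng or random.Random()
    return rng.random() * 100 < weight
```

With `weight=0`, only requests that carry the opt-in header or cookie reach the new version, which is a common first stage before the weight is raised gradually.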
Q12: What is the release cycle of ACK and its development kits?
A12: A major version of ACK is released every six months. Bug fixes are released as needed rather than on a fixed schedule.
Q13: Has the business edition of ACK@Edge been released and which users are using it?
A13: ACK@Edge has been launched for production. Its users come from many fields and industries, such as online education, video, Alibaba Cloud IoT, and Alibaba Cloud CDN. The business edition is expected to launch before June 2020.
Q14: Are there any control group (cgroup) memory leaks on the worker node on the host? If so, how can I solve the problem?
A14: The cgroup driver used by Container Service is the systemd cgroup driver. So far, no cgroup memory leaks have been reported.
Q15: Are the CPU and memory resources of a pod isolated from the host? If so, how are they isolated?
A15: You can configure the kubelet to reserve resources for the host, so that pods are confined to the resources that remain.
Q16: Does Alibaba Cloud have a tool similar to eksctl from AWS, such as an ackctl?
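The arithmetic behind this reservation can be sketched as follows. Kubernetes documents a node's allocatable resources as its capacity minus the kubelet's `--kube-reserved`, `--system-reserved`, and hard eviction threshold settings. The Python helper below and the numbers in the example are illustrative assumptions, not values recommended by Alibaba Cloud.

```python
def allocatable(
    capacity: int,
    kube_reserved: int = 0,
    system_reserved: int = 0,
    eviction_threshold: int = 0,
) -> int:
    """Resources left for pods after host-side reservations (same unit throughout)."""
    remaining = capacity - kube_reserved - system_reserved - eviction_threshold
    return max(remaining, 0)


# Example in MiB: a 16 GiB node with 1 GiB reserved for Kubernetes daemons,
# 512 MiB for system daemons, and a 100 MiB hard eviction threshold.
print(allocatable(16384, kube_reserved=1024, system_reserved=512, eviction_threshold=100))
# 14748
```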
A16: Please see aliyun-cli for the answer.
Q17: How does Alibaba Cloud support Windows containers?
A17: Windows 10 of version 1809 is currently supported and version 1903 will be supported soon. Windows nodes can be added to Linux clusters.
Q18: Can I integrate an open component into an existing Kubernetes cluster?
A18: Yes, as long as the existing Kubernetes cluster meets the requirements of Kubernetes conformance testing.
You can find the complete video of the live presentation (in Chinese) here.