By Jiuzhu (Siyu Wang) from the Alibaba Developer Community
OpenKruise is an open source cloud-native application automation engine that Alibaba Cloud open-sourced in June 2019. Essentially, it is a set of extended workloads built on Kubernetes standards. Used alongside native Kubernetes, OpenKruise provides more powerful and efficient management of application containers, sidecars, and image distribution. By automating operations along several dimensions, it addresses the problems of large-scale application O&M and site building on Kubernetes, including deployment, upgrades, elastic scaling, Quality of Service (QoS) adjustment, health checks, migration, and repair.
Kruise is a homophone of the word "cruise." The "K" stands for Kubernetes, and the name evokes navigating and automatically cruising applications on Kubernetes. Kruise embodies the best practices Alibaba has accumulated over years of deploying, publishing, and managing large-scale applications. It also reflects the demands of thousands of Alibaba Cloud Kubernetes customers.
While moving the Alibaba economy fully to cloud-native, the Alibaba Technical Team gradually developed a set of technical concepts and best practices that are consistent with the standards of the upstream community and suitable for large-scale Internet scenarios. The most important of these concern how applications are published, run, and managed automatically. The Alibaba Cloud container team contributes these capabilities back to the community through OpenKruise to share cloud-native application best practices with the industry.
During the 2020 Double 11 Global Shopping Festival, Alibaba made its core systems fully cloud-native. Alibaba has been running nearly 100,000 OpenKruise workloads and managing millions of containers.
The following figure shows the relationship between the OpenKruise running internally for Alibaba and the open source OpenKruise.
The preceding figure shows that the OpenKruise repository on GitHub is the upstream repository that holds the main body of the code. The internal downstream repository only implements a few internally coupled features on top of public interfaces, and this internal-only code accounts for less than 5% of what runs. In other words, more than 95% of the OpenKruise code that runs inside Alibaba comes from the community repository.
There are two notable points:
Engineers who are responsible for upper-layer business may not be familiar with the concept of a "workload." Have you ever wondered how an application is scaled up or down, or how it is published? In the cloud-native environment, application deployment requirements, such as the required number of machines and the image version, are described in a final state-oriented (declarative) way, as shown in the following figures:
The workload mainly refers to these YAML definitions and their corresponding controllers.
When an application is scaled up or down, the Platform as a Service (PaaS) layer, which serves as the O&M platform, modifies the required number of machines in the preceding YAML object. For example, scaling up from 10 machines to 110 and then scaling down by 5 leaves a final count of 105. The controller then adjusts the number of pods (the units that wrap the application's containers) to match the number declared in the workload.
When there is a request to publish or roll back, the PaaS modifies the image version and the publishing policy in the YAML object. The controller then rebuilds all managed pods into the desired version according to the specified publishing policy. The actual mechanism is considerably more complex than this summary suggests.
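The publishing flow described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not OpenKruise's implementation: the `Pod` class, the `rolling_update` function, and the `max_unavailable` parameter are hypothetical names introduced for illustration, and "rebuilding" a pod is reduced to overwriting its image field.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    # Hypothetical, simplified stand-in for a managed pod.
    name: str
    image: str


def rolling_update(pods, target_image, max_unavailable=1):
    """Move outdated pods to target_image in batches.

    max_unavailable is an illustrative publishing-policy knob: at most
    this many pods are rebuilt at the same time.
    Returns the list of batches (pod names) in the order they were updated.
    """
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1")
    outdated = [p for p in pods if p.image != target_image]
    batches = []
    for start in range(0, len(outdated), max_unavailable):
        batch = outdated[start:start + max_unavailable]
        for pod in batch:
            # Stands in for deleting the old pod and creating a new one.
            pod.image = target_image
        batches.append([p.name for p in batch])
    return batches


pods = [Pod(f"app-{i}", "app:v1") for i in range(5)]
batches = rolling_update(pods, "app:v2", max_unavailable=2)
```

A rollback is the same operation with the previous image as the target, which is why the text treats publishing and rolling back as one kind of request.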
In short, the workload manages the lifecycle of all containers in the application. Scaling and publishing applications both depend on the workload. The workload also constantly maintains the number of pods in the running application to ensure that the expected number of instances is always running. If a host fails and the instances on it are evicted, the workload immediately creates new pods for the application.
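The scaling and self-healing behavior described above comes down to repeatedly comparing a desired count with the actual one. The sketch below is a minimal illustration under assumptions: the `Workload` class and its `reconcile` method are hypothetical names, pods are plain strings, and a real controller reacts to cluster events rather than being called by hand.

```python
from itertools import count


class Workload:
    """Toy model of a workload: desired state plus the pods it manages."""

    def __init__(self, name, replicas):
        self.name = name
        self.replicas = replicas  # desired state, edited by the PaaS
        self.pods = []            # actual state, owned by the controller
        self._ids = count()       # source of unique pod suffixes

    def reconcile(self):
        """Create or delete pods until the actual count matches the desired one.

        Returns the number of pods created (positive) or deleted (negative).
        """
        diff = self.replicas - len(self.pods)
        if diff > 0:
            for _ in range(diff):
                self.pods.append(f"{self.name}-{next(self._ids)}")
        elif diff < 0:
            del self.pods[diff:]  # drop the last -diff pods
        return diff


wl = Workload("shop", replicas=10)
wl.reconcile()              # initial deployment: 10 pods

wl.replicas = 110           # the PaaS scales up from 10 to 110
wl.reconcile()
wl.replicas -= 5            # then scales down by 5
wl.reconcile()              # 105 pods remain

wl.pods = wl.pods[:-3]      # a host fails and 3 pods are evicted
wl.reconcile()              # the controller brings the count back to 105
```

Because the PaaS only edits `replicas` and never creates pods itself, scaling and recovery from eviction go through the same code path.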
After Alibaba's core systems were migrated to the cloud, Double 11-related services such as e-commerce, along with applications such as middleware, moved to the cloud-native environment. These services and applications are all deployed through OpenKruise workloads. OpenKruise provides several different types of workloads to support different deployment methods.
Therefore, from the upper-layer e-commerce business to middleware, and on to O&M containers and basic components, the entire upstream and downstream chain is deployed and operated on the workloads provided by OpenKruise. At all times, OpenKruise handles operations such as maintaining the number of machines, managing versions, urgent scale-ups, and publishing while applications are running.
Imagine what would happen if OpenKruise failed.
Confronted with these high-risk problems, Alibaba has taken multiple protective measures to ensure the stability and availability of services.
This is OpenKruise: the platform behind Alibaba's application deployment on the cloud, which took over nearly all O&M work for the Double 11 Global Shopping Festival.
What brought about OpenKruise? Why was OpenKruise designed?
As the cloud becomes a major trend and cloud-native becomes the standard, the workloads provided by native Kubernetes struggle to meet the challenges Alibaba faces in its ultra-large-scale business scenarios:
In this context, OpenKruise was created. Through newly developed workloads, such as CloneSet and SidecarSet, and compatible enhancements of existing ones, including Advanced StatefulSet and Advanced DaemonSet, Alibaba was finally able to run its upper-layer business in a cloud-native way.
The most important feature of OpenKruise is its "in-place upgrade." When customers need to upgrade an application, this feature only upgrades the images in the original pod, without migrating or rebuilding the pod. Some of the benefits are listed below:
Furthermore, OpenKruise provides many other advanced features that meet business demands in large-scale scenarios. The following figure compares OpenKruise workloads with Kubernetes-native workloads in terms of their features for stateless and stateful applications.
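To make the in-place upgrade idea from earlier in this section more concrete, the following Python sketch contrasts it with a rebuild-style upgrade. It is a toy model under assumptions, not OpenKruise code: the `Pod` class, both functions, and the IP pool are hypothetical, and the pod's IP address merely stands in for the kind of state a rebuild would not preserve.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical address pool: every newly created pod gets the next address.
_ip_pool = count(1)


@dataclass
class Pod:
    name: str
    image: str
    ip: str = field(default_factory=lambda: f"10.0.0.{next(_ip_pool)}")
    restarts: int = 0


def recreate_upgrade(pod, new_image):
    """Rebuild-style upgrade: the old pod is discarded and a new one is
    created, so everything tied to the old pod (here, its IP) is lost."""
    return Pod(name=pod.name, image=new_image)


def in_place_upgrade(pod, new_image):
    """In-place upgrade: only the image changes and the container restarts
    inside the same pod, so the pod object and its IP are kept."""
    pod.image = new_image
    pod.restarts += 1
    return pod


rebuilt = recreate_upgrade(Pod("web-0", "app:v1"), "app:v2")
kept = in_place_upgrade(Pod("web-1", "app:v1"), "app:v2")
```

In this model the rebuilt pod comes back with a fresh address, while the in-place path returns the very same object with only its image and restart counter changed.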
On November 11, 2020, after a vote by all members of the Cloud Native Computing Foundation (CNCF) Technical Oversight Committee, the open source OpenKruise project from Alibaba Cloud officially became a CNCF-hosted project.
OpenKruise is now a community open source project, and its internal and external versions are almost the same. Moreover, Alibaba has added OpenKruise to the application catalog of Alibaba Cloud Container Service for Kubernetes (ACK), so any customer on the public cloud can install and use it. OpenKruise has thus achieved a "trinity" of adoption across Alibaba's internal business, cloud services, and the open source community. Currently, Douyu TV, STO Express, and Youzan are the main customers that use OpenKruise on ACK. In the open source community, enterprises such as Ctrip and Lyft are both users of and contributors to OpenKruise.
The cloud-native workload capabilities that OpenKruise releases, built on Alibaba's ultra-large-scale scenarios, are significant in several respects. They fill a gap in application workloads in the cloud-native community, and they give cloud users access to Alibaba's years of experience in deploying and managing applications and to its cloud-native best practices. Since OpenKruise became an open source project, it has reached several milestones:
Going forward, OpenKruise will focus on the following objectives.