By Wang Siyu (Jiuzhu)
OpenKruise is an open-source cloud-native application automation management suite from Alibaba Cloud. It is currently a Sandbox project hosted by the Cloud Native Computing Foundation (CNCF). Built on Alibaba's years of experience with container and cloud-native technologies, OpenKruise is a standard Kubernetes-based extension component that is widely used in Alibaba's internal production environment. It follows upstream community standards while incorporating technical concepts and best practices from large-scale Internet scenarios.
OpenKruise released v0.8.0 on March 4, 2021. Please see the changelog for the full list of changes. This article gives an overview of the new version.
If you have used OpenKruise before, you already know kruise-manager, a centrally deployed operator component that includes a series of controllers and webhooks.
v0.8.0 adds kruise-daemon, a node-level component that is deployed to each node as a DaemonSet. It enables node-side capabilities such as image warm-up and container restart.
The nodes that kruise-daemon is deployed to can be controlled with the daemon.affinity parameter during Helm installation.
In the Kubernetes ecosystem, there was previously no mature open-source solution for image warm-up. Some companies, including Alibaba, built internal warm-up tools adapted to their own scenarios. OpenKruise v0.8.0 generalizes Alibaba's image warm-up capabilities and makes them available as open source.
The implementation of OpenKruise image warm-up will be described in detail in subsequent articles. Here is a simple example of an image warm-up job:
apiVersion: apps.kruise.io/v1alpha1
kind: ImagePullJob
metadata:
  name: job-nginx
spec:
  image: nginx:1.9.1             # [required] Full image name, name:tag
  parallelism: 10                # [optional] Maximum number of nodes pulling concurrently; default is 1
  selector:                      # [optional] Either a list of node names or a label selector (set only one); if unset, all nodes are selected
    names:
    - node-1
    - node-2
    # matchLabels:               # Alternative to names; cannot be set together with names
    #   node-type: xxx
  completionPolicy:
    type: Always                 # [optional] Default is Always
    activeDeadlineSeconds: 1200  # [optional] No default; only valid for the Always type
    ttlSecondsAfterFinished: 300 # [optional] No default; only valid for the Always type
  pullPolicy:                    # [optional] Per-node pull retry and timeout policy; default is backoffLimit=3, timeoutSeconds=600
    backoffLimit: 3
    timeoutSeconds: 300
An ImagePullJob supports two completionPolicy types:
Always indicates that the job is a one-time warm-up that ends regardless of success or failure. Here, activeDeadlineSeconds is the deadline for the entire job, and ttlSecondsAfterFinished is the number of seconds after the job finishes before it is deleted automatically.
Never indicates that the job never ends and re-pulls the specified images on the matching nodes every day.
For more information, please see the official documentation.
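For comparison, a job that uses the Never policy to keep an image warm on a group of nodes might look like the following sketch. It reuses only fields from the example above; the job name is a hypothetical placeholder, and the node label is the same placeholder used earlier. activeDeadlineSeconds and ttlSecondsAfterFinished are left out because they only apply to the Always type.

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: ImagePullJob
metadata:
  name: job-nginx-daily          # hypothetical name
spec:
  image: nginx:1.9.1             # image to keep warm on the selected nodes
  parallelism: 10                # at most 10 nodes pull at the same time
  selector:
    matchLabels:
      node-type: xxx             # placeholder label, as in the example above
  completionPolicy:
    type: Never                  # job never finishes; images are re-pulled daily
  pullPolicy:
    backoffLimit: 3              # retries per node
    timeoutSeconds: 300          # per-pull timeout on each node
```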
SidecarSet is a controller that manages sidecar containers. After a SidecarSet is created, OpenKruise can automatically inject sidecar containers into pods that meet the specified conditions. OpenKruise upgrades the injected sidecar container in-place without affecting the running of business containers.
In previous versions, SidecarSet had many limitations. For example, users could not restrict it to a single namespace, and its gray release capability for in-place sidecar upgrades was weak. In v0.8.0, the SidecarSet controller and webhooks were refactored, and more powerful policy fields were added to the CRD definition, as follows:
spec.namespace: Limits sidecar injection and upgrades to pods in the specified namespace.
There are also several injection policies:
podInjectPolicy: Determines whether the sidecar container is injected before or after the pod's original container list.
shareVolumePolicy: Determines whether the sidecar container shares the volumes of the pod's original containers.
transferEnv: Copies specified environment variables from other containers in the pod into the sidecar container.
There are also several in-place upgrade policies:
maxUnavailable: The maximum number of unavailable pods during the upgrade.
partition: The number of pods that keep the old sidecar version, which enables gray and batch releases.
selector: Only sidecars in pods that match the selector are upgraded, which enables canary releases.
scatter: Scatters the upgrade of pods that carry specified labels across the release.
For more information, please see the official documentation.
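A hedged sketch of a SidecarSet that combines these fields is shown below. It assumes the v1alpha1 field names as we understand them from the OpenKruise documentation, in particular that the scatter policy is expressed as updateStrategy.scatterStrategy and that shareVolumePolicy.type takes enabled or disabled. The SidecarSet name, sidecar name, image, labels, source container, and environment variable are all hypothetical placeholders; check the CRD shipped with your OpenKruise version before relying on any of them.

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: SidecarSet
metadata:
  name: log-agent-sidecarset           # hypothetical name
spec:
  namespace: ns-1                      # only manage pods in this namespace
  selector:
    matchLabels:
      app: nginx                       # hypothetical label of target pods
  containers:
  - name: log-agent                    # hypothetical sidecar container
    image: busybox:1.32                # hypothetical image
    command: ["sleep", "999d"]
    podInjectPolicy: BeforeAppContainer  # or AfterAppContainer
    shareVolumePolicy:
      type: disabled                   # enabled would mount the app containers' volumes too
    transferEnv:
    - sourceContainerName: main        # hypothetical app container name
      envName: PROXY_IP                # hypothetical variable copied into the sidecar
  updateStrategy:
    type: RollingUpdate
    maxUnavailable: 1                  # at most one pod unavailable during the upgrade
    partition: 2                       # keep 2 pods on the old sidecar version
    selector:
      matchLabels:
        canary: "true"                 # only upgrade sidecars in matching pods
    scatterStrategy:                   # assumed field name for the scatter policy
    - key: zone                        # hypothetical label to spread across batches
      value: zone-a
```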
In the past, the CRD and controller/webhook switches in OpenKruise were mainly configured through CUSTOM_RESOURCE_ENABLE, while all other configurable switches were command-line parameters. As a result, configuration was scattered, and per-CRD switches could not cleanly control features that span multiple CRDs.
Therefore, the new feature-gate mechanism replaces CUSTOM_RESOURCE_ENABLE, and switches are now organized by feature.
v0.8.0 provides two switches: PodWebhook and KruiseDaemon. When PodWebhook is disabled, OpenKruise no longer intercepts pod creation with its webhook, which also disables SidecarSet. When KruiseDaemon is disabled, the kruise-daemon component is not deployed, and image warm-up is disabled as well. In later versions, the remaining switch parameters will gradually be unified into feature-gate.
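As an illustration, turning off kruise-daemon at install time could look like the Helm values override below. This is a sketch under assumptions: it assumes the chart passes a featureGates value to kruise-manager as comma-separated Name=true|false pairs, which is how later OpenKruise charts expose it. The file name is hypothetical, so check the values.yaml of your chart version first.

```yaml
# values-override.yaml (hypothetical file name)
# Assumption: the chart exposes a "featureGates" value that is passed to
# kruise-manager as comma-separated Name=true|false pairs.
featureGates: "PodWebhook=true,KruiseDaemon=false"
# KruiseDaemon=false: kruise-daemon is not deployed and image warm-up is unavailable.
# PodWebhook=true: the pod webhook stays on, so SidecarSet keeps working.
```

Such a file would be passed to Helm with the -f flag, for example helm install kruise <chart> -f values-override.yaml.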
Other optimizations:
OpenKruise v0.8.0 is the first open-source project in the Kubernetes community to support large-scale image warm-up. In upcoming versions later this year, we plan to use image warm-up to accelerate application releases, and to add application security protection, gray releases of the controller, and sharded control. v1.0 is planned for release in the middle of this year.
OpenKruise is a mature CNCF Sandbox project. In addition to its large-scale use within Alibaba, it has a wide range of users across the industry.
Ctrip uses CloneSet and Advanced StatefulSet in production to manage stateless and stateful applications, respectively, relying on features such as in-place upgrade and gray release. The number of OpenKruise workloads in a single cluster reaches tens of thousands.
OPPO uses OpenKruise at large scale and has further strengthened in-place upgrade downstream in its customized Kubernetes, where it is widely used by the backend services of multiple businesses. In-place updates cover about 87% of its upgrade and deployment requirements.
Many Chinese companies use OpenKruise, including DouYu TV, Youzan, Suning, BX app, Boss app, STO Express, Xiaohongshu, Huohua, VIPKID, Zhangmen, Bank of Hangzhou Consumer Finance Company, Vanyi Technology Co., Ltd., Dmall, Zuojiang Science and Technology, Xiangzhuzhihui, Ihomefnt, Yonghui Science and Technology Center, Genshuixue, and Deepexi. Global companies such as Lyft, Bringg, Arkane Systems, and Spectro Cloud also use OpenKruise.
Anyone interested in cloud-native technology is welcome to contribute to OpenKruise and help build an industry-leading cloud-native application automation engine.
Recently, Alibaba Cloud Data Accelerator for Disaggregated Infrastructure (DADI) was open-sourced. DADI is a container image acceleration project that is already widely used within Alibaba.
Instead of downloading and decompressing entire images, DADI pulls fine-grained data blocks on demand. This greatly reduces the amount of data downloaded, hides compute and data transfer latency, and significantly reduces container startup latency.
DADI combines the layered structure of container images with the block device interface of virtual machine images to form a new layered block device image format called overlaybd. Because it exposes a block device interface, DADI can support native file systems such as ext4, XFS, and NTFS. The block device interface also supports virtualized secure containers with minimal attack surface exposure. In addition, because the block device image is simple and efficient, overlaybd can provide users with better I/O performance.
For more information, please see the DADI paper and GitHub repository.
If you are interested in the OpenKruise project or have topics to discuss, please visit the OpenKruise website and GitHub repository.