By Jiuzhu, Technical Expert at Alibaba Cloud
OpenKruise is an open-source automatic management engine for large-scale applications provided by Alibaba Cloud. In addition to features similar to those of Kubernetes native controllers such as Deployment and StatefulSet, OpenKruise provides more enhancements, including graceful in-place upgrade, release priority and dispersion policy, multi-zone workload abstraction management, and unified sidecar container injection management. All these core features have been tested in ultra-large-scale application scenarios at Alibaba Cloud. These features help to cope with more diverse deployment environments and requirements and bring more flexible deployment and release policies for cluster maintainers and application developers.
Currently, in Alibaba's cloud-native environment, most applications use OpenKruise for pod deployment and release management. Besides many Alibaba Cloud customers, several companies across industries use OpenKruise to deploy applications when native Kubernetes Deployment doesn't fully meet their requirements.
First, let's take a look at the release capabilities provided by the native Kubernetes workload.
These policies are feasible in test environments or small-scale application scenarios, but they cannot meet the requirements of large-scale application scenarios. For example:
This section describes two main features of CloneSet and SidecarSet in V0.5.0. Check the version update details here.
In Alibaba's cloud-native environment, most stateless applications are managed by CloneSet. To meet the deployment requirements of ultra-large-scale applications, we use the following methods:
In Kruise V0.4.0, released in February 2020, we open-sourced CloneSet. It has attracted a lot of attention since its release and is now used by many well-known Internet companies.
The initial version of CloneSet supported policies such as maxUnavailable and partition, but not maxSurge (scale out first, then scale in). This is not a problem for the large-scale applications in Alibaba Group. However, many community users run small-scale applications on their platforms. Without scale-out-then-scale-in, application availability may be affected during a release.
Based on community feedback in issues #250 and #260, we added support for the maxSurge policy to CloneSet in V0.5.0. We thank community members such as fatedier and shiyan2016 for their contributions and valuable suggestions. CloneSet now covers all the release policies of the native Kubernetes workloads. The following figure shows the release features of CloneSet.
We will elaborate on the release policies of CloneSet in a later article. For now, let's look at some examples of how maxSurge works with streaming and phased releases:
1) Release Based on the maxSurge, maxUnavailable, and Partition Policies
apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
# ...
spec:
  replicas: 5 # The total number of pods is 5.
  updateStrategy:
    maxSurge: 20% # One extra pod is created: 5 x 20% = 1 (rounded up).
    maxUnavailable: 0 # At least five pods stay available during the release: 5 - 0 = 5.
    partition: 3 # Three old pods are retained (two pods are updated: 5 - 3 = 2).
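The arithmetic in the comments above can be written out as a small Python sketch. `scaled_value` and `release_plan` are hypothetical helpers, not OpenKruise code. The source confirms only that maxSurge rounds up. The assumption that maxUnavailable rounds down follows the Kubernetes Deployment convention and is not confirmed for CloneSet.

```python
def scaled_value(value, total, round_up):
    """Resolve an int-or-percentage field (for example 20 or "20%") against total."""
    if isinstance(value, int):
        return value
    if not value.endswith("%"):
        raise ValueError(f"expected an int or a percentage, got {value!r}")
    scaled = total * int(value[:-1])  # percentage times total, still scaled by 100
    # Integer ceiling/floor division avoids floating-point rounding surprises.
    return -(-scaled // 100) if round_up else scaled // 100


def release_plan(replicas, max_surge, max_unavailable, partition):
    """Derive the pod-count bounds implied by a CloneSet-style updateStrategy (assumed rounding)."""
    surge = scaled_value(max_surge, replicas, round_up=True)                # maxSurge rounds up
    unavailable = scaled_value(max_unavailable, replicas, round_up=False)   # assumption: rounds down
    return {
        "max_total_pods": replicas + surge,
        "min_available_pods": replicas - unavailable,
        "old_pods_kept": partition,
        "pods_to_update": replicas - partition,
    }


plan = release_plan(replicas=5, max_surge="20%", max_unavailable=0, partition=3)
print(plan)
# {'max_total_pods': 6, 'min_available_pods': 5, 'old_pods_kept': 3, 'pods_to_update': 2}
```

With these values, the release never runs more than six pods or fewer than five available pods, and it stops after two pods are updated.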
When the release starts, CloneSet first creates one extra pod based on maxSurge, bringing the total to 6 (five old pods and one new pod).
$ kubectl get clone demo
NAME   DESIRED   UPDATED   UPDATED_READY   READY   TOTAL   AGE
demo   5         1         0               5       6       17m
While respecting maxUnavailable, CloneSet gradually deletes and creates pods until only three old pods remain (partition = 3). CloneSet then deletes one of the new pods to bring the total back to 5 (three old pods and two new pods), as required.
$ kubectl get clone demo
NAME   DESIRED   UPDATED   UPDATED_READY   READY   TOTAL   AGE
demo   5         2         2               5       5       17m
To continue the release, set partition to 0 so that no old pods are retained. CloneSet again creates one extra pod based on maxSurge, bringing the total to 6 (three old pods and three new pods).
$ kubectl get clone demo
NAME   DESIRED   UPDATED   UPDATED_READY   READY   TOTAL   AGE
demo   5         3         2               5       6       17m
While respecting maxUnavailable, CloneSet gradually deletes and creates pods until all pods are new (partition = 0). Finally, CloneSet deletes the surplus new pod, leaving five pods, all of the new version.
$ kubectl get clone demo
NAME   DESIRED   UPDATED   UPDATED_READY   READY   TOTAL   AGE
demo   5         5         5               5       5       17m
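The following Python sketch checks the four `kubectl get clone demo` snapshots against the bounds set by the policies. `CloneStatus`, `parse_row`, and `violations` are hypothetical helpers, not part of OpenKruise or kubectl. The rows are copied from the walkthrough, with whitespace-separated columns, and maxSurge is taken as its resolved value of one pod (20% of 5).

```python
from dataclasses import dataclass


@dataclass
class CloneStatus:
    desired: int
    updated: int
    updated_ready: int
    ready: int
    total: int


def parse_row(line):
    """Parse one data row of the `kubectl get clone` output shown above."""
    _name, desired, updated, updated_ready, ready, total, _age = line.split()
    return CloneStatus(int(desired), int(updated), int(updated_ready), int(ready), int(total))


def violations(status, max_surge, max_unavailable):
    """List the release constraints that one observed status breaks (empty if none)."""
    problems = []
    if status.total > status.desired + max_surge:
        problems.append(f"{status.total} pods exceed desired + maxSurge")
    if status.ready < status.desired - max_unavailable:
        problems.append(f"only {status.ready} pods are ready")
    return problems


# The four snapshots from the walkthrough; maxSurge 20% of 5 resolves to 1 pod.
rows = [
    "demo 5 1 0 5 6 17m",
    "demo 5 2 2 5 5 17m",
    "demo 5 3 2 5 6 17m",
    "demo 5 5 5 5 5 17m",
]
statuses = [parse_row(row) for row in rows]
for status in statuses:
    print(status.updated, violations(status, max_surge=1, max_unavailable=0))
```

Every snapshot stays within six total pods and five ready pods. The second and last rows show the partition targets being met (2 and then 5 updated pods).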
2) In-place Upgrade Using maxSurge
CloneSet supports both in-place upgrades and upgrades that recreate pods. Either method can be combined with policies such as maxSurge, maxUnavailable, and partition.
apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
# ...
spec:
  updateStrategy:
    type: InPlaceIfPossible
    maxSurge: 20%
When maxSurge is configured together with in-place upgrade, CloneSet first creates the extra pods allowed by maxSurge. It then upgrades the old pods in place by updating the images in the pod spec. Once the specified partition is reached, it deletes the extra pods.
This preserves service availability while keeping pod information such as IP addresses and volumes unchanged during the release.
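The following Python sketch is a toy model of the three steps above, not CloneSet's controller logic. The `Pod` type, `in_place_release`, the pod names, the IP addresses, and the `app:v1`/`app:v2` images are all hypothetical. It shows that an in-place upgrade changes only the image, so each pod keeps its name and IP. The surge pods exist only while the release is in progress.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Pod:
    name: str
    ip: str
    image: str


def in_place_release(pods, new_image, max_surge, partition, surge_ips):
    """Toy model of an InPlaceIfPossible release with maxSurge (not CloneSet's controller).

    surge_ips supplies one hypothetical IP per surge pod. Returns the pod list at
    the peak of the release and the final pod list after the surge pods are removed.
    """
    # 1. Scale out: create the extra pods allowed by maxSurge on the new image.
    surge_pods = [Pod(f"surge-{i}", ip, new_image) for i, ip in enumerate(surge_ips[:max_surge])]
    # 2. Upgrade old pods in place (only the image changes) until `partition` old pods remain.
    to_upgrade = len(pods) - partition
    upgraded = [replace(p, image=new_image) if i < to_upgrade else p for i, p in enumerate(pods)]
    peak = upgraded + surge_pods
    # 3. Once the partition is met, the surge pods are deleted again.
    return peak, upgraded


old_pods = [Pod(f"demo-{i}", f"10.0.0.{i + 1}", "app:v1") for i in range(5)]
peak, final = in_place_release(old_pods, "app:v2", max_surge=1, partition=0, surge_ips=["10.0.0.100"])
print(len(peak), len(final))                              # 6 5
print([p.ip for p in final] == [p.ip for p in old_pods])  # True
```

The model upgrades pods in list order for simplicity. The order in which CloneSet picks pods is not described here.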
SidecarSet is another key feature provided by Kruise. Unlike workloads such as CloneSet and StatefulSet, which manage business pods, SidecarSet centrally manages the versions and injection of sidecar containers in a cluster.
The new feature in V0.5.0 handles volumes that are defined in both the SidecarSet and the pod when a sidecar container is injected. It comes from community feedback in issue #254. The users who raised it manage log collection sidecar containers with SidecarSet and want to inject them into all pods in bypass mode.
For example, suppose we need to inject a log collection sidecar container into every pod in a cluster. We cannot require every application developer to add the container definition to their CloneSets and Deployments. Even if the definition were added to the workloads of all applications, upgrading the image of this log collection container would mean updating every one of those workloads, which is costly.
SidecarSet in OpenKruise is designed to solve this problem. We only need to write the sidecar definition into a global SidecarSet. Whether you deploy with CloneSet, Deployment, or StatefulSet, the defined sidecar container is injected into all newly created pods.
Taking log collection as an example, first define a SidecarSet.
apiVersion: apps.kruise.io/v1alpha1
kind: SidecarSet
metadata:
  name: log-sidecar
spec:
  selector:
    matchLabels:
      app-type: long-term # Inject the container into all pods labeled app-type: long-term.
  containers:
  - name: log-collector
    image: xxx:latest
    volumeMounts:
    - name: log-volume
      mountPath: /var/log # Mount log-volume at /var/log and collect logs from that path.
  volumes:
  - name: log-volume # Define a volume named log-volume.
    emptyDir: {}
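The `selector` field is a standard Kubernetes label selector. The following Python sketch shows the `matchLabels` rule that decides which pods receive the sidecar. `matches_labels` is a hypothetical helper, and `matchExpressions` are not modeled.

```python
def matches_labels(match_labels, pod_labels):
    """matchLabels semantics: every key/value pair in the selector must be present on the pod.

    An empty selector matches every pod; matchExpressions are not modeled here.
    """
    return all(key in pod_labels and pod_labels[key] == value
               for key, value in match_labels.items())


selector = {"app-type": "long-term"}
print(matches_labels(selector, {"app-type": "long-term", "app": "web"}))  # True
print(matches_labels(selector, {"app-type": "batch"}))                    # False
print(matches_labels(selector, {}))                                       # False
```

Extra labels on the pod do not prevent a match. A missing or different value for any selector key does.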
You may wonder what to do if the log file directory varies for each application. This is why volume merge is required.
The application's original pod definition, before scale-out, is as follows:
apiVersion: v1
kind: Pod
metadata:
  labels:
    app-type: long-term
spec:
  containers:
  - name: app
    image: xxx:latest
    volumeMounts:
    - name: log-volume
      mountPath: /app/logs # The log path of the application.
  volumes:
  - name: log-volume # Define a volume named log-volume.
    persistentVolumeClaim:
      claimName: pvc-xxx
The Kruise webhook injects the log sidecar container defined in the SidecarSet into the pod, which then looks like this:
apiVersion: v1
kind: Pod
metadata:
  labels:
    app-type: long-term
spec:
  containers:
  - name: app
    image: xxx:latest
    volumeMounts:
    - name: log-volume
      mountPath: /app/logs # The log path of the application.
  - name: log-collector
    image: xxx:latest
    volumeMounts:
    - name: log-volume
      mountPath: /var/log
  volumes:
  - name: log-volume # Define a volume named log-volume.
    persistentVolumeClaim:
      claimName: pvc-xxx
The SidecarSet and the pod both define a log volume named log-volume, so the volume defined in the pod takes precedence during injection. In this example, the pod's volume is backed by a persistent volume (PV) through a persistent volume claim (PVC). After sidecar injection, the same volume is also mounted at /var/log in the sidecar container, which then collects the logs.
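The volume-merge rule can be sketched in Python on plain dictionaries that mirror the YAML above. `inject_sidecar` is a hypothetical helper, not the Kruise webhook's implementation, and it assumes the pod has already matched the SidecarSet selector. The `xxx:latest` image and `pvc-xxx` claim are the placeholders from the document.

```python
import copy


def inject_sidecar(pod, sidecarset):
    """Toy model of sidecar injection with volume merge (not the Kruise webhook itself).

    Sidecar containers are appended to the pod. A SidecarSet volume is added only
    when the pod has no volume with the same name, so the pod's definition wins.
    """
    result = copy.deepcopy(pod)  # leave the caller's pod untouched
    spec = result["spec"]
    spec["containers"].extend(copy.deepcopy(sidecarset["spec"]["containers"]))
    volumes = spec.setdefault("volumes", [])
    existing = {v["name"] for v in volumes}
    for volume in sidecarset["spec"].get("volumes", []):
        if volume["name"] not in existing:
            volumes.append(copy.deepcopy(volume))
    return result


sidecarset = {"spec": {
    "containers": [{"name": "log-collector", "image": "xxx:latest",
                    "volumeMounts": [{"name": "log-volume", "mountPath": "/var/log"}]}],
    "volumes": [{"name": "log-volume", "emptyDir": {}}],
}}
pod = {"spec": {
    "containers": [{"name": "app", "image": "xxx:latest",
                    "volumeMounts": [{"name": "log-volume", "mountPath": "/app/logs"}]}],
    "volumes": [{"name": "log-volume", "persistentVolumeClaim": {"claimName": "pvc-xxx"}}],
}}

injected = inject_sidecar(pod, sidecarset)
print([c["name"] for c in injected["spec"]["containers"]])  # ['app', 'log-collector']
print(injected["spec"]["volumes"])
# [{'name': 'log-volume', 'persistentVolumeClaim': {'claimName': 'pvc-xxx'}}]
```

If the pod defined no log-volume, the SidecarSet's emptyDir volume would be added instead. The sidecar's volumeMount refers to the volume only by name, so it works either way.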
In this way, sidecar containers are managed by SidecarSet. On the one hand, sidecar containers are decoupled from application deployment and release. On the other hand, sidecar containers share volumes with application containers to implement related sidecar functions such as log collection and monitoring.
Upgrading to the latest version, V0.5.0, enables lossless application releases and more convenient management of sidecar containers.
OpenKruise will continue to improve its application deployment and release capabilities. We welcome more users to join the OpenKruise community and help build complete Kubernetes application management, delivery, and extension capabilities for larger-scale, more complex scenarios that demand extreme performance.