By Wang Siyu (Jiuzhu)
OpenKruise is an open-source automated management engine for large-scale applications developed by Alibaba Cloud. In terms of functions, it is similar to Kubernetes-native controllers, such as Deployment and StatefulSet. However, OpenKruise provides many additional features, including graceful in-place upgrades, release priority/dispersion policies, multi-zone workload abstraction management, and unified container injection management of Sidecar. These features are all core capabilities that have been tested by the ultra-large-scale application scenarios of Alibaba. They help Alibaba Cloud address more diverse deployment environments and requirements and provide cluster maintainers and application developers with more flexible deployment and release policies.
Currently, OpenKruise is used for pod deployment and release management for all applications in Alibaba's internal cloud-native environment. Many companies in the industry and users of Alibaba Cloud also use OpenKruise to deploy applications because Kubernetes-native workload controllers, such as Deployment, cannot fully meet the requirements. Alibaba Cloud hopes OpenKruise can enable every Kubernetes developer and Alibaba Cloud user to use the same deployment and release capabilities the Alibaba cloud-native applications use!
Please see: OpenKruise: The Cloud-Native Platform for the Comprehensive Process of Alibaba's Double 11
OpenKruise v0.7.0 was released on November 16, 2020. It added several major features, optimizations, and iterations. The following sections provide an overview of this version.
StatefulSet
Based on the native StatefulSet, Advanced StatefulSet provides enhanced release capabilities, such as maxUnavailable for parallel release and in-place upgrade.
Official Documentation: https://openkruise.io/en-us/docs/advanced_statefulset.html
In the past, all custom workloads provided by OpenKruise were in v1alpha1. As these workloads are widely used within Alibaba and by many community members, the stable ones will be gradually promoted to later versions. Advanced StatefulSet is the first CRD in v1beta1. Other resources, such as CloneSet and SidecarSet, will be promoted gradually.
If users have used the Advanced StatefulSet of v1alpha1 in the past, will upgrading to v1beta1 cause any problems? The answer is clear: no. Existing Advanced StatefulSet objects are automatically converted to v1beta1. Moreover, users can continue to use the v1alpha1 interface and client to operate on these objects.
Let's look at the CRD definition in the new-version StatefulSet:
The conversion configuration in the CRD points to kruise-webhook-service. The kruise-controller-manager serves as the backend of kruise-webhook-service. The same service is also configured in the MutatingWebhookConfiguration/ValidatingWebhookConfiguration of OpenKruise.
Now, let's look at the conversion procedure shown in the figure above:
When using the v1beta1 interface to perform operations on Advanced StatefulSet, conversion is not required, so the apiserver can interact directly with etcd.
When using the v1alpha1 interface to perform operations on Advanced StatefulSet, the apiserver calls the conversion webhook to convert objects between v1alpha1 and v1beta1.
For details of the multi-version conversion logic, please see: https://github.com/openkruise/kruise/blob/master/apis/apps/v1alpha1/statefulset_conversion.go
Generally, pods and PVCs are created in ordinal sequence by both the community-native StatefulSet and Advanced StatefulSet. For example, for a StatefulSet with 4 replicas, the ordinals of the created pods are [0, 1, 2, 3].
However, in some cases, users need to delete the pod with a specific ordinal and want StatefulSet to stop using that ordinal. This is especially true in scenarios where Local PVs are used. When a node becomes abnormal, deleting the original pod does not help: the new pod with the same ordinal reuses the original PVC/PV, so it is scheduled back to the original node.
Starting from Advanced StatefulSet v1beta1 (corresponding to OpenKruise v0.7.0 and later versions), an ordinal reservation function is provided:
apiVersion: apps.kruise.io/v1beta1
kind: StatefulSet
spec:
  # ...
  replicas: 4
  reserveOrdinals:
  - 1
By writing reserved ordinals in the reserveOrdinals field, the Advanced StatefulSet will not create pods with these ordinals. If these pods already exist, they will be deleted. Note: spec.replicas is the expected number of pods to be run, and spec.reserveOrdinals contains the ordinals of pods that will not be created.
Therefore, for an Advanced StatefulSet with 4 replicas and [1] in reserveOrdinals, the ordinals of running pods are [0, 2, 3, 4].
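The ordinal rule can be illustrated with a small Python sketch. This is not the controller's actual code (OpenKruise is written in Go); the function name `running_ordinals` is hypothetical, and the sketch only assumes the behavior described in this section: ordinals are assigned from 0 upward, reserved ordinals are skipped, and `replicas` pods must run.

```python
def running_ordinals(replicas, reserve_ordinals=()):
    """Return the ordinals of the pods expected to run.

    Illustrative only: walks upward from 0, skipping reserved ordinals,
    until `replicas` ordinals have been collected.
    """
    reserved = set(reserve_ordinals)
    ordinals = []
    candidate = 0
    while len(ordinals) < replicas:
        if candidate not in reserved:
            ordinals.append(candidate)
        candidate += 1
    return ordinals


print(running_ordinals(4))          # [0, 1, 2, 3]
print(running_ordinals(4, [1]))     # [0, 2, 3, 4]
print(running_ordinals(4, [1, 3]))  # [0, 2, 4, 5]
print(running_ordinals(3, [1, 3]))  # [0, 2, 4]
```

The last two calls correspond to reserving ordinal 3 as well, first with the replica number unchanged and then with it reduced to 3.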
Suppose 3 is added to reserveOrdinals. Then, the controller deletes Pod-3 and creates Pod-5. The ordinals of running pods will be [0, 2, 4, 5].
Suppose 3 is added to reserveOrdinals, and the replica number is reduced to 3. Then, the controller deletes Pod-3, and the ordinals of running pods will be [0, 2, 4].
CloneSet
The CloneSet controller provides the capability to manage stateless applications efficiently. It is similar to the native Deployment, but it offers many enhanced functions.
Official Documentation: https://openkruise.io/en-us/docs/cloneset.html
In CloneSet, users can use the partition field to control the scale of a gray release. In previous versions, this field could only be set to an absolute value. Starting from v0.7.0, it can also be set to a percentage. Semantically, partition is the number or percentage of pods kept on the old version. It is 0 by default.
apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
spec:
  # ...
  updateStrategy:
    partition: 80% # This means that, only 20% of pods are upgraded to the new version. Users can also set the partition to the absolute value of the number of reserved pods in old versions.
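There are two cases for the setting of the partition value during the release process:
If partition is an absolute number, that many pods are kept on the old version and the remaining pods are upgraded.
If partition is a percentage, that share of the pods is kept on the old version and the remaining pods are upgraded.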
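As a rough illustration of the partition semantics, the following Python sketch computes how many pods would be upgraded for a given partition value. The function name `pods_to_update` is hypothetical, and the rounding of percentages (the number of pods kept on the old version is rounded up here) is an assumption, not something stated in this article; the real controller may round differently.

```python
def pods_to_update(replicas, partition=0):
    """Return how many pods move to the new version.

    `partition` is either an int (pods kept on the old version) or a
    string such as "80%" (share of pods kept on the old version).
    Assumption: a percentage is rounded UP to a whole number of kept pods.
    """
    if isinstance(partition, str):
        if not partition.endswith("%"):
            raise ValueError("percentage partition must end with '%'")
        percent = int(partition[:-1])
        # Integer ceiling division avoids floating-point surprises.
        kept_old = (replicas * percent + 99) // 100
    else:
        kept_old = partition
    # Clamp so the result always stays within [0, replicas].
    kept_old = max(0, min(replicas, kept_old))
    return replicas - kept_old


print(pods_to_update(10, "80%"))  # 2
print(pods_to_update(10, 3))      # 7
print(pods_to_update(10))         # 10
```

With 10 replicas, a partition of 80% keeps 8 pods on the old version and upgrades 2, which matches the 20% mentioned in the example above.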
Some bugs in edge scenarios have been fixed, thanks to the feedback and contributions of community members:
An issue involving resourceVersionExpectation is resolved.
An issue that occurred when the gracePeriodSeconds mode is used for continuous upgrades is solved.
AdvancedCronJob (New Controller)
AdvancedCronJob is a new controller added in v0.7.0. It is an extended version of CronJob. It was contributed by Rishi Anand from Spectro Cloud!
The native CronJob only allows users to create a Job to execute tasks. AdvancedCronJob allows users to create different types of templates. This means users can configure the schedule rule to create a Job or BroadcastJob periodically to execute the task. BroadcastJob can distribute the Job to all or specific nodes to execute the task.
apiVersion: apps.kruise.io/v1alpha1
kind: AdvancedCronJob
spec:
  template:
    # Option 1: use jobTemplate, which is equivalent to original CronJob
    jobTemplate:
      # ...
    # Option 2: use broadcastJobTemplate, which will create a BroadcastJob object when cron schedule triggers
    broadcastJobTemplate:
      # ...
    # Options 3(future): ...
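With jobTemplate, AdvancedCronJob is equivalent to the native CronJob, and it creates a Job for task execution.
With broadcastJobTemplate, it creates a BroadcastJob periodically to execute tasks.
The kruise-controller-manager of OpenKruise contains multiple controllers and webhooks.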
A webhook needs a complete set of TLS certificates, which the webhook server uses to serve HTTPS. In addition, the CA certificate needs to be written to the caBundle of MutatingWebhookConfiguration, ValidatingWebhookConfiguration, and the CRD conversion.
How can we generate certificates automatically and configure them to the preceding configuration resources? How can we rewrite the configurations after they are reset? These are the O&M challenges that webhook encounters.
This version of OpenKruise implements a webhook controller that supports self-maintenance for TLS certificates and related configuration resources of OpenKruise. The process is listed below:
The webhook controller writes the CA certificate to MutatingWebhookConfiguration, ValidatingWebhookConfiguration, and the CRD conversion, and performs a continuous "list watch" operation on these resources. The CA certificate will be rewritten once any change occurs.
For more information, please see:
https://github.com/openkruise/kruise/blob/master/pkg/webhook/util/controller/webhook_controller.go
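The "rewrite the CA certificate once any change occurs" step can be pictured with the following Python sketch. It is not the code from the linked Go file. It treats resources as plain dictionaries, the helper names `client_configs` and `ensure_ca_bundle` are hypothetical, and it assumes the standard Kubernetes field layout: `webhooks[].clientConfig.caBundle` for webhook configurations and `spec.conversion.webhook.clientConfig.caBundle` for an apiextensions.k8s.io/v1 CRD.

```python
import base64
import copy


def client_configs(resource):
    """Collect the clientConfig dicts that carry a caBundle in a resource."""
    kind = resource.get("kind")
    if kind in ("MutatingWebhookConfiguration", "ValidatingWebhookConfiguration"):
        return [w.setdefault("clientConfig", {}) for w in resource.get("webhooks", [])]
    if kind == "CustomResourceDefinition":
        webhook = resource.get("spec", {}).get("conversion", {}).get("webhook")
        return [webhook.setdefault("clientConfig", {})] if webhook else []
    return []


def ensure_ca_bundle(resource, ca_pem):
    """Return (patched_copy, changed) with every caBundle set to the CA.

    A controller would call this on each watch event and write the copy
    back only when `changed` is True.
    """
    expected = base64.b64encode(ca_pem).decode("ascii")
    patched = copy.deepcopy(resource)
    changed = False
    for config in client_configs(patched):
        if config.get("caBundle") != expected:
            config["caBundle"] = expected
            changed = True
    return patched, changed


# Hypothetical resource and CA bytes, for illustration only.
ca = b"example CA certificate bytes"
mutating = {
    "kind": "MutatingWebhookConfiguration",
    "webhooks": [{"name": "example.kruise.io", "clientConfig": {"caBundle": ""}}],
}
patched, changed = ensure_ca_bundle(mutating, ca)
print(changed)                            # True
print(ensure_ca_bundle(patched, ca)[1])   # False
```

Returning a `changed` flag keeps the loop idempotent: if someone resets the configuration, the next event triggers a rewrite, and an already correct resource is left untouched.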
In the future, Alibaba Cloud plans to move these functions into a public repository, so that users writing their own webhooks can easily reuse this self-maintenance capability.
OpenKruise will continue to make deeper optimizations in application automation. The next version on the roadmap, v0.8.0, was released on March 4, 2021, and you can learn more about this release in this article. Alibaba Cloud will no longer limit its work to workload management capabilities and will invest in more fields, such as risk prevention and control and operator enhancement.
Alibaba Cloud welcomes every cloud-native enthusiast to participate in the construction of OpenKruise. Unlike some other open-source projects, OpenKruise is not a copy of Alibaba's internal code. On the contrary, the OpenKruise GitHub repository is the upstream of Alibaba's internal code repository. Therefore, every line of code you contribute will run in all Kubernetes clusters within Alibaba and will jointly support Alibaba's world-leading cloud-native application scenarios!