By Mingshan Zhao, Member of OpenKruise
We’re pleased to announce the release of OpenKruise 1.3, which is a CNCF Sandbox level project.
OpenKruise is an extended component suite for Kubernetes, which mainly focuses on application automations, such as deployment, upgrade, ops and availability protection. Mostly features provided by OpenKruise are built primarily based on CRD extensions. They can work in pure Kubernetes clusters without any other dependences.
In release v1.3, OpenKruise provides a new CRD named PodProbeMarker
, improves its performance in large-scale clusters, Advanced DaemonSet support pre-download image, and some new features have been added to CloneSet, WorkloadSpread, AdvancedCronJob, SidecarSet etc.
Here we are going to introduce some changes of it.
Kubernetes provides three Pod lifecycle management:
So the Probe capabilities provided in Kubernetes have defined specific semantics and related behaviors. In addition, there is actually a need to customize Probe semantics and related behaviors, such as:
OpenKruise provides the ability to customize the Probe and return the result to the Pod Status, and the user can decide the follow-up behavior based on the probe result.
An object of PodProbeMarker may look like this:
apiVersion: apps.kruise.io/v1alpha1
kind: PodProbeMarker
metadata:
name: game-server-probe
namespace: ns
spec:
selector:
matchLabels:
app: game-server
probes:
- name: Idle
containerName: game-server
probe:
exec: /home/game/idle.sh
initialDelaySeconds: 10
timeoutSeconds: 3
periodSeconds: 10
successThreshold: 1
failureThreshold: 3
markerPolicy:
- state: Succeeded
labels:
gameserver-idle: 'true'
annotations:
controller.kubernetes.io/pod-deletion-cost: '-10'
- state: Failed
labels:
gameserver-idle: 'false'
annotations:
controller.kubernetes.io/pod-deletion-cost: '10'
podConditionType: game.io/idle
PodProbeMarker results can be viewed at Pod Object:
apiVersion: v1
kind: Pod
metadata:
labels:
app: game-server
gameserver-idle: 'true'
annotations:
controller.kubernetes.io/pod-deletion-cost: '-10'
name: game-server-58cb9f5688-7sbd8
namespace: ns
spec:
...
status:
conditions:
# podConditionType
- type: game.io/idle
# Probe State 'Succeeded' indicates 'True', and 'Failed' indicates 'False'
status: "True"
lastProbeTime: "2022-09-09T07:13:04Z"
lastTransitionTime: "2022-09-09T07:13:04Z"
# If the probe fails to execute, the message is stderr
message: ""
SidecarSet will record historical revision of some fields such as containers
, volumes
, initContainers
, imagePullSecrets
and patchPodMetadata
via ControllerRevision. Based on this feature, you can easily select a specific historical revision to inject when creating Pods, rather than always inject the latest revision of sidecar.
SidecarSet records ControllerRevision in the same namespace as Kruse Manager. You can execute kubectl get controllerrvisions -n kruise-system -l kruise.io/sidecarset-name=<your-sidecarset-name>
to list the ControllerRevisions of your SidecarSet. Moreover, the ControllerRevision name of current SidecarSet revision is shown in status.latestRevision
field, so you can record it very easily.
There are two configuration methods as follows:
apiVersion: apps.kruise.io/v1alpha1
kind: SidecarSet
metadata:
name: sidecarset
spec:
...
updateStrategy:
partition: 90%
injectionStrategy:
revisionName: <specific-controllerrevision-name>
You can add or update the label apps.kruise.io/sidecarset-custom-version=<your-version-id>
to SidecarSet when creating or publishing SidecarSet, to mark each historical revision. SidecarSet will bring this label down to the corresponding ControllerRevision object, and you can easily use the <your-version-id>
to describe which historical revision you want to inject.
Assume that you are publishing version-2
in canary way (you wish only 10% Pods will be upgraded), and you want to inject the stable version-1
to newly-created Pods to reduce the risk of the canary publishing:
apiVersion: apps.kruise.io/v1alpha1
kind: SidecarSet
metadata:
name: sidecarset
labels:
apps.kruise.io/sidecarset-custom-version: example/version-2
spec:
...
updateStrategy:
partition: 90%
injectionStrategy:
customVersion: example/version-1
SidecarSet support inject pod annotations, as follows:
apiVersion: apps.kruise.io/v1alpha1
kind: SidecarSet
spec:
containers:
...
patchPodMetadata:
- annotations:
oom-score: '{"log-agent": 1}'
custom.example.com/sidecar-configuration: '{"command": "/home/admin/bin/start.sh", "log-level": "3"}'
patchPolicy: MergePatchJson
- annotations:
apps.kruise.io/container-launch-priority: Ordered
patchPolicy: Overwrite | Retain
patchPolicy is the injected policy, as follows:
Note: When the patchPolicy is Overwrite and MergePatchJson, the annotations can be updated synchronously when the SidecarSet in-place update the Sidecar Container. However, if only the annotations are modified, it will not take effect. It must be in-place update together with the sidecar container image. When patchPolicy is Retain, the annotations will not be updated when the SidecarSet in-place update the Sidecar Container.
According to the above configuration, when the sidecarSet is injected into the sidecar container, it will inject Pod annotations synchronously, as follows:
apiVersion: v1
kind: Pod
metadata:
annotations:
apps.kruise.io/container-launch-priority: Ordered
oom-score: '{"log-agent": 1, "main": 2}'
custom.example.com/sidecar-configuration: '{"command": "/home/admin/bin/start.sh", "log-level": "3"}'
name: test-pod
spec:
containers:
...
Note: SidecarSet should not modify any configuration outside the sidecar container for security consideration, so if you want to use this capability, you need to first configure SidecarSet_PatchPodMetadata_WhiteList whitelist or turn off whitelist checks via Feature-gate SidecarSetPatchPodMetadataDefaultsAllowed=true.
If you have enabled the PreDownloadImageForDaemonSetUpdate
feature-gate, DaemonSet controller will automatically pre-download the image you want to update to the nodes of all old Pods. It is quite useful to accelerate the progress of applications upgrade.
The parallelism of each new image pre-downloading by DaemonSet is 1
, which means the image is downloaded on nodes one by one. You can change the parallelism using apps.kruise.io/image-predownload-parallelism
annotation on DaemonSet according to the capability of image registry, for registries with more bandwidth and P2P image downloading ability, a larger parallelism can speed up the pre-download process.
apiVersion: apps.kruise.io/v1alpha1
kind: DaemonSet
metadata:
annotations:
apps.kruise.io/image-predownload-parallelism: "10"
CloneSet considers Pods in PreparingDelete
state as normal by default, which means these Pods will still be calculated in the replicas
number.
In this situation:
replicas
from N
to N-1
, when the Pod to be deleted is still in PreparingDelete
, you scale up replicas
to N
, the CloneSet will move the Pod back to Normal
.replicas
from N
to N-1
and put a Pod into podsToDelete
, when the specific Pod is still in PreparingDelete
, you scale up replicas
to N
, the CloneSet will not create a new Pod until the specific Pod goes into terminating.replicas
changed, when the specific Pod is still in PreparingDelete
, the CloneSet will not create a new Pod until the specific Pod goes into terminating.Since Kruise v1.3.0, you can put a apps.kruise.io/cloneset-scaling-exclude-preparing-delete
: "true" label into CloneSet, which indicates Pods in PreparingDelete
will not be calculated in the replicas
number.
In this situation:
replicas
from N
to N-1
, when the Pod to be deleted is still in PreparingDelete
, you scale up replicas
to N
, the CloneSet will move the Pod back to Normal
.replicas
from N
to N-1
and put a Pod into podsToDelete
, even if the specific Pod is still in PreparingDelete
, you scale up replicas
to N
, the CloneSet will create a new Pod immediately.replicas
changed, even if the specific Pod is still in PreparingDelete
, the CloneSet will create a new Pod immediately.All CronJob schedule: times are based on the timezone of the kruise-controller-manager by default, which means the timezone set for the kruise-controller-manager container determines the timezone that the cron job controller uses.
However, we have introduce a spec.timeZone
field in v1.3.0. You can set it to the name of a valid time zone name. For example, setting spec.timeZone: "Etc/UTC"
instructs Kruise to interpret the schedule relative to Coordinated Universal Time.
A time zone database from the Go standard library is included in the binaries and used as a fallback in case an external database is not available on the system.
For more changes, their authors and commits, you can read the Github release.
Have something you’d like to broadcast to our community?
Join the community on Slack.
503 posts | 48 followers
FollowAlibaba Cloud Community - October 21, 2022
Alibaba Cloud Native Community - June 19, 2024
Alibaba Developer - March 31, 2021
Alibaba Cloud Community - February 20, 2024
Alibaba Cloud Native Community - March 11, 2024
Alibaba Cloud Native Community - February 1, 2024
503 posts | 48 followers
FollowAlibaba Cloud Container Service for Kubernetes is a fully managed cloud container management service that supports native Kubernetes and integrates with other Alibaba Cloud products.
Learn MoreMulti-source metrics are aggregated to monitor the status of your business and services in real time.
Learn MoreProvides a control plane to allow users to manage Kubernetes clusters that run based on different infrastructure resources
Learn MoreAccelerate and secure the development, deployment, and management of containerized applications cost-effectively.
Learn MoreMore Posts by Alibaba Cloud Native Community