By Mingzhou
Kruise Rollout is an open-source progressive delivery framework from OpenKruise. It aims to provide a set of standard, bypass-style (non-intrusive) Kubernetes release components that combine traffic-based release with instance-level grayscale release, support multiple release strategies (such as canary, blue-green, and A/B testing), and run automated release processes driven by custom metrics (such as Prometheus metrics) that are transparent to the application and easy to extend.
The latest version, Kruise Rollout 0.3.0, brings several interesting new features. First, it enhances the release capabilities for Deployment, the most widely used workload in the Kubernetes community. Second, it extends the traffic grayscale capability. Third, it supports more gateway protocols through pluggable Lua scripts.
Before introducing the new features, let's look at the mainstream release forms for Kubernetes workloads:
1. Rolling Upgrade: The default release mode of native Deployment. In this mode, you cannot pause the release at a specified checkpoint (for example, after a given batch).
2. Canary Release: A release mode that Flagger and Kruise Rollout support for Deployment. When a Deployment is released, a canary Deployment is created for verification. After verification passes, the original workload is fully upgraded and the canary Deployment is deleted.
Figure 1: Canary Release Mode
3. Standard Batch Release: A batch release performed with the partition feature of StatefulSet or CloneSet. During the release, the workload's metadata (such as its name) remains unchanged, and no additional workloads are created.
Figure 2: Standard Batch Release Mode
4. Non-Standard Batch Release: Native Deployment logic does not support batch release. Therefore, the rollout solution proposed by the KubeVela community performs a rolling release across two Deployments: each release creates a new Deployment, and the old Deployment is scaled in as the new one is scaled out. As a result, the Deployment is replaced with every release.
Figure 3: Non-Standard Batch Release Method
5. A/B Testing: User traffic is divided into two disjoint paths (A and B) based on certain rules, and each path is routed to a different version of pods so that the new version can be observed, compared, or rolled out gradually. In general, A/B testing is combined with canary release or batch release.
Figure 4: A/B Testing
Among these release forms, only the rolling upgrade provided by Deployment works without third-party components; the others rely to some extent on additional components or an upper-layer PaaS platform. How does Kruise Rollout compare with the alternatives? We compared it with two solutions that are currently popular in the open-source community: Flagger [1] and Argo Rollouts [2].
In general, the advantages of Kruise-Rollout are summarized below:
Before introducing the new features, let's talk about why the OpenKruise community is obsessed with Rollout:
With batch release in the scenario above, the blast radius of a problem can be kept within the grayscale scope as far as possible, leaving sufficient time for grayscale verification and observation. However, native Deployment logic does not support batch operations. Adopting Argo Rollouts would require migrating all workloads and pods, which is risky and laborious. Adopting Flagger would still require migrating pods and doubling resources during each release, which is too expensive.
At this point, Kruise Rollout may be what you need. It takes only two steps to make your existing Deployment support standard batch release.
Use an existing Kubernetes cluster (version 1.19 or later) or create a new one:
Note: This version requirement stems mainly from the major changes to the Ingress API in Kubernetes 1.19. If you do not need complex traffic grayscale capabilities (that is, you do not configure the TrafficRouting field), you can pull and modify the chart to bypass this requirement.
$ helm repo add openkruise https://openkruise.github.io/charts/
$ helm repo update
$ helm install kruise-rollout openkruise/kruise-rollout --version 0.3.0
cat <<EOF | kubectl apply -f -
apiVersion: rollouts.kruise.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
  namespace: default
  annotations:
    rollouts.kruise.io/rolling-style: partition
spec:
  objectRef: # Bind your Deployment
    workloadRef:
      apiVersion: apps/v1
      kind: Deployment
      name: echoserver
  strategy: # Define your batch release rules
    canary:
      steps:
      - replicas: 1 # Batch 1 upgrades one pod, then pauses; after manual confirmation, the next batch starts.
      - replicas: 60% # Batch 2 upgrades 60% of the pods, then pauses; after manual confirmation, the next batch starts.
      - replicas: 100% # Batch 3 upgrades all pods; the release completes automatically after this last batch.
EOF
Once this is in place, subsequent updates to the Deployment are released in batches instead of through a plain rolling upgrade. The following example uses a Deployment named echoserver to walk through the batch release process.
Check that the Deployment has 5 replicas and that its current version is 789b88f977.
Next, modify an environment variable of the container to trigger a release. Only one pod is upgraded in the first batch, and its version is d8db56c5b.
After the first batch is released, assume that we have verified it and want to proceed to the second batch. We can confirm the batch with the kubectl-kruise command-line tool, a kubectl extension maintained by the OpenKruise community.
Note: The command to release the next batch is kubectl-kruise rollout approve rollout/rollouts-demo.
As shown in the process above, the Rollout is in the StepUpgrade state while a batch is being released, and enters the StepPaused state once that batch has been released.
After the second batch is confirmed and the last batch is released, the Rollout enters the Completed state, indicating that the release is complete.
Note that within a single batch, the standard rolling-update rules still apply. In other words, you can tune the maxUnavailable and maxSurge settings of a Deployment to balance stability and efficiency. For example, the following Deployment configurations suit different scenarios:
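As a quick cross-check of what each batch means for this 5-replica Deployment, the sketch below resolves the `replicas` value of every step in `rollouts-demo` into a pod count. The helper names are hypothetical and this is not Kruise Rollout code; it assumes each step's value is the cumulative number of upgraded pods (as the YAML comments suggest) and that percentages are rounded up, similar to Kubernetes' int-or-percent scaling. For 1, 60%, and 100% of 5 replicas the values divide evenly, so the rounding choice does not matter here.

```python
from typing import List, Union

ReplicasValue = Union[int, str]  # e.g. 1 or "60%", as written in the Rollout steps


def resolve_step_replicas(value: ReplicasValue, total: int) -> int:
    """Resolve one step's `replicas` value to a pod count.

    Assumption: percentages are rounded up and the result is capped at `total`.
    """
    if isinstance(value, str) and value.endswith("%"):
        percent = int(value[:-1])
        count = -(-total * percent // 100)  # integer ceiling division
    else:
        count = int(value)
    return min(count, total)


def batch_plan(steps: List[ReplicasValue], total: int) -> List[int]:
    """Cumulative number of upgraded pods after each batch."""
    return [resolve_step_replicas(step, total) for step in steps]


if __name__ == "__main__":
    # Steps from rollouts-demo applied to the 5-replica echoserver Deployment.
    print(batch_plan([1, "60%", "100%"], total=5))  # [1, 3, 5]
```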
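The state transitions in this walkthrough can be summarized as a small state machine. The sketch below is a simplified, hypothetical model (the class and method names are illustrative, not Kruise Rollout's controller or API); only the state names StepUpgrade, StepPaused, and Completed come from the walkthrough, and `approve()` stands in for running `kubectl-kruise rollout approve`.

```python
from dataclasses import dataclass

# State names taken from the walkthrough; the transition logic is a simplified model.
STEP_UPGRADE = "StepUpgrade"
STEP_PAUSED = "StepPaused"
COMPLETED = "Completed"


@dataclass
class RolloutModel:
    total_steps: int
    current_step: int = 1
    state: str = STEP_UPGRADE

    def batch_ready(self) -> None:
        """Call when every pod of the current batch has been upgraded."""
        if self.state != STEP_UPGRADE:
            raise ValueError(f"no batch in progress (state={self.state})")
        # The last batch finishes the release; earlier batches wait for approval.
        self.state = COMPLETED if self.current_step == self.total_steps else STEP_PAUSED

    def approve(self) -> None:
        """Stand-in for `kubectl-kruise rollout approve`: start the next batch."""
        if self.state != STEP_PAUSED:
            raise ValueError(f"nothing to approve (state={self.state})")
        self.current_step += 1
        self.state = STEP_UPGRADE


if __name__ == "__main__":
    rollout = RolloutModel(total_steps=3)
    history = [rollout.state]
    for _ in range(2):
        rollout.batch_ready()
        history.append(rollout.state)
        rollout.approve()
        history.append(rollout.state)
    rollout.batch_ready()
    history.append(rollout.state)
    print(history)
    # ['StepUpgrade', 'StepPaused', 'StepUpgrade', 'StepPaused', 'StepUpgrade', 'Completed']
```

Rejecting `approve()` outside StepPaused mirrors the idea that approval only makes sense while a batch is waiting for confirmation.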
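kind: Deployment
spec:
  strategy:
    rollingUpdate:
      # No old pod is taken down before a new one is added (requires spare capacity).
      maxUnavailable: 0
      maxSurge: 20%

kind: Deployment
spec:
  strategy:
    rollingUpdate:
      # No extra pods are created; up to 20% of pods may be unavailable at a time.
      maxUnavailable: 20%
      maxSurge: 0

kind: Deployment
spec:
  strategy:
    rollingUpdate:
      # Balanced: the Kubernetes default of 25% for both settings.
      maxUnavailable: 25%
      maxSurge: 25%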
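To see what these percentages mean in pod counts, the sketch below resolves `maxSurge` and `maxUnavailable` the way the Kubernetes Deployment controller does for a plain rolling update: `maxSurge` percentages round up, `maxUnavailable` percentages round down, and if both resolve to 0 the controller allows one unavailable pod. The function names are hypothetical, and how Kruise Rollout applies these limits inside an individual batch is not modeled.

```python
from typing import Tuple, Union

IntOrPercent = Union[int, str]  # e.g. 0 or "20%"


def _scale(value: IntOrPercent, replicas: int, round_up: bool) -> int:
    """Turn an int-or-percent value into an absolute pod count."""
    if isinstance(value, str) and value.endswith("%"):
        numerator = replicas * int(value[:-1])
        return -(-numerator // 100) if round_up else numerator // 100
    return int(value)


def resolve_fenceposts(max_surge: IntOrPercent, max_unavailable: IntOrPercent,
                       replicas: int) -> Tuple[int, int]:
    """Return (surge, unavailable) pod counts for a RollingUpdate Deployment."""
    surge = _scale(max_surge, replicas, round_up=True)
    unavailable = _scale(max_unavailable, replicas, round_up=False)
    if surge == 0 and unavailable == 0:
        # Both rounded to zero: allow one unavailable pod so the update can progress.
        unavailable = 1
    return surge, unavailable


if __name__ == "__main__":
    replicas = 5
    for surge_cfg, unavailable_cfg in [("20%", 0), (0, "20%"), ("25%", "25%")]:
        surge, unavailable = resolve_fenceposts(surge_cfg, unavailable_cfg, replicas)
        print(f"maxSurge={surge_cfg}, maxUnavailable={unavailable_cfg}: "
              f"at most {replicas + surge} pods, at least {replicas - unavailable} available")
```

For 5 replicas, the three configurations resolve to (surge, unavailable) = (1, 0), (0, 1), and (2, 1) respectively.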
In addition, the solution fully considers various release scenarios to maximize flexibility:
In Kruise Rollout versions earlier than v0.3.0, we provided a traffic grayscale solution based on adjusting traffic weights. However, in most scenarios, Ingress and other traffic entry points already load-balance across pods, which meets everyday traffic grayscale requirements: for example, canary pods that make up 10% of the replicas automatically receive about 10% of the traffic. Unless you need to decouple traffic from replicas (for example, sending only 1% of the traffic to canary pods that make up 10% of the replicas), you do not need to configure this capability separately.
However, some release-sensitive businesses may need special release forms such as A/B testing. When only specifically marked traffic should reach the new version of pods in a given batch, the traffic of the old and new versions must be isolated, as in the following scenarios.
Kruise Rollout users can enable this capability with the following configuration:
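The difference between replica-proportional traffic and an explicit weight can be illustrated with a toy calculation. This is a sketch under an assumption: without a weight, the Ingress spreads requests evenly across ready pods, so the canary share simply follows the pod ratio (real load balancing is only approximately even). The function name is hypothetical.

```python
from typing import Optional


def canary_traffic_percent(canary_pods: int, total_pods: int,
                           weight: Optional[int] = None) -> float:
    """Approximate share of requests (in %) that reach canary pods.

    Assumption: without an explicit weight, requests are spread evenly across
    all ready pods, so the share follows the pod ratio. An explicit weight wins.
    """
    if total_pods <= 0:
        raise ValueError("total_pods must be positive")
    if weight is not None:
        return float(weight)
    return 100.0 * canary_pods / total_pods


if __name__ == "__main__":
    print(canary_traffic_percent(1, 10))            # 10.0: follows the replica ratio
    print(canary_traffic_percent(1, 10, weight=1))  # 1.0: traffic decoupled from replicas
```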
apiVersion: rollouts.kruise.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
  namespace: default
  annotations:
    rollouts.kruise.io/rolling-style: partition
spec:
  objectRef:
    workloadRef:
      apiVersion: apps/v1
      kind: Deployment
      name: echoserver
  strategy:
    canary:
      steps:
      - matches: # Set header & cookie matching rules.
        - headers:
          - name: UserAgent
            type: Exact
            value: iOS
        pause: {}
        replicas: 1
      - replicas: 50%
      - replicas: 100%
      trafficRoutings:
      - ingress:
          classType: nginx
          name: echoserver
        service: echoserver
Compared with the plain batch release configuration, this one adds header/cookie matching rules and a TrafficRouting reference. The configuration uses Ingress-Nginx as an example, so the Ingress controller in use must itself support this kind of routing: the controller (here, Nginx) provides the data plane, while Kruise Rollout provides the control plane.
With this configuration, a Deployment with ten replicas is released in three batches. The specific behavior is listed below:
With the development of cloud-native technology, cloud-native gateways are flourishing. In addition to Nginx Ingress and the Gateway API from the Kubernetes community, there are many network provider solutions: Alibaba Cloud ALB, MSE, and ASM; community projects such as Istio, Kong, and Apisix; and gateway solutions and protocols from other companies. From the beginning of its design, Kruise Rollout considered how to support this growing ecosystem. Hard-coding each gateway is time-consuming and laborious, and makes it inconvenient for developers from different companies to use and maintain.
Kruise Rollout therefore uses Lua scripts so that users can add support for more gateway protocols as plug-ins. (This version supports only Ingress-based protocols; other custom resource protocols will be supported in the next version.) Kruise Rollout implements the common parts of the capability, while the provider-specific logic of each network provider is implemented in Lua, so you only need to write the corresponding Lua script for each implementation. See the NGINX and ALB Lua script examples [3] for more information. To make it easier to write your own Lua scripts, the following explains the Lua script for Nginx Ingress (for the corresponding Rollout configuration, refer to new feature 2). The script can be placed in a specific directory or in a specific ConfigMap.
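To make the data-plane side concrete, the sketch below models how ingress-nginx evaluates the canary annotations on a request, following its documented precedence: canary-by-header, then canary-by-cookie, then canary-weight. It is a simplified model written for illustration (the function and parameter names are hypothetical, and details such as regex error handling are omitted), not code from ingress-nginx or Kruise Rollout.

```python
import random
import re
from typing import Callable, Mapping

NGINX = "nginx.ingress.kubernetes.io/"


def routes_to_canary(
    annotations: Mapping[str, str],
    headers: Mapping[str, str],
    cookies: Mapping[str, str],
    roll: Callable[[], int] = lambda: random.randint(1, 100),
) -> bool:
    """Decide whether one request goes to the canary backend (simplified model)."""
    # HTTP header names are case-insensitive, so compare them in lower case.
    request_headers = {name.lower(): value for name, value in headers.items()}
    header_name = annotations.get(NGINX + "canary-by-header")
    if header_name and header_name.lower() in request_headers:
        value = request_headers[header_name.lower()]
        expected = annotations.get(NGINX + "canary-by-header-value")
        pattern = annotations.get(NGINX + "canary-by-header-pattern")
        if expected:
            if value == expected:  # exact match; any other value falls through
                return True
        elif pattern:
            if re.search(pattern, value):
                return True
        elif value == "always":
            return True
        elif value == "never":
            return False
    cookie_name = annotations.get(NGINX + "canary-by-cookie")
    if cookie_name and cookie_name in cookies:
        if cookies[cookie_name] == "always":
            return True
        if cookies[cookie_name] == "never":
            return False
    # Weighted split: a roll in 1..100 at or below the weight goes to the canary.
    weight = int(annotations.get(NGINX + "canary-weight", "0"))
    return roll() <= weight


if __name__ == "__main__":
    demo = {NGINX + "canary": "true",
            NGINX + "canary-by-header": "UserAgent",
            NGINX + "canary-by-header-value": "iOS"}
    print(routes_to_canary(demo, {"UserAgent": "iOS"}, {}))      # True
    print(routes_to_canary(demo, {"UserAgent": "Android"}, {}))  # False (no weight set)
```

Passing `roll` as a parameter keeps the weighted branch deterministic in tests.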
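The batch sizes for this configuration can be worked out with a short sketch. It treats each step's `replicas` as the cumulative number of upgraded pods and rounds percentages up; these are assumptions, and the helper names and the dict layout of the steps are hypothetical. The sketch only reports what each step declares (pod count and header rules); how traffic is routed in steps without `matches` depends on the traffic-routing provider and is not modeled.

```python
from typing import Any, Dict, List, Union


def resolve_replicas(value: Union[int, str], total: int) -> int:
    """Resolve 1 / "50%" style values; assumption: percentages round up, capped at total."""
    if isinstance(value, str) and value.endswith("%"):
        return min(total, -(-total * int(value[:-1]) // 100))
    return min(total, int(value))


def describe_steps(steps: List[Dict[str, Any]], total: int) -> List[Dict[str, Any]]:
    """For each step, report the upgraded pod count and any header rules it declares."""
    plan = []
    for batch, step in enumerate(steps, start=1):
        rules = [
            f"{header['name']} {header['type']} {header['value']}"
            for match in step.get("matches", [])
            for header in match.get("headers", [])
        ]
        plan.append({"batch": batch,
                     "upgraded": resolve_replicas(step["replicas"], total),
                     "header_rules": rules})
    return plan


# The steps of the A/B testing Rollout above, written as plain dicts.
DEMO_STEPS = [
    {"replicas": 1, "matches": [{"headers": [
        {"name": "UserAgent", "type": "Exact", "value": "iOS"}]}]},
    {"replicas": "50%"},
    {"replicas": "100%"},
]

if __name__ == "__main__":
    for entry in describe_steps(DEMO_STEPS, total=10):
        print(entry)
```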
-- Because the Ingress grayscale release protocol is implemented with annotations, every operation in
-- this script modifies the annotations toward the target state. Kruise Rollout then patches these
-- annotations onto the canary Ingress resource.
annotations = {}
-- obj.annotations is Ingress.Annotations. This part does not need to be changed.
if ( obj.annotations )
then
    annotations = obj.annotations
end
-- This is the standard nginx grayscale release protocol; other implementations can adjust it as needed.
annotations["nginx.ingress.kubernetes.io/canary"] = "true"
-- The nginx grayscale protocol mainly involves the annotations below. To avoid the complexity of
-- switching back and forth between batches, clear them all first every time.
annotations["nginx.ingress.kubernetes.io/canary-by-cookie"] = nil
annotations["nginx.ingress.kubernetes.io/canary-by-header"] = nil
annotations["nginx.ingress.kubernetes.io/canary-by-header-pattern"] = nil
annotations["nginx.ingress.kubernetes.io/canary-by-header-value"] = nil
annotations["nginx.ingress.kubernetes.io/canary-weight"] = nil
-- obj.weight is rollout.spec.strategy.canary.steps[x].weight, the grayscale percentage of the
-- current batch. When it is not set, it is '-1' (nil is not supported here, so '-1' stands in for it).
-- If it is not '-1', copy obj.weight into the annotations.
if ( obj.weight ~= "-1" )
then
    annotations["nginx.ingress.kubernetes.io/canary-weight"] = obj.weight
end
-- obj.matches is rollout.spec.strategy.canary.steps[x].matches (same data structure).
-- If it is not set, this step does not use A/B testing, so return directly.
if ( not obj.matches )
then
    return annotations
end
-- A/B testing: traverse matches and write them into the annotations.
-- Note: Nginx does not support multiple headers, so only the first header of each match is used.
for _,match in ipairs(obj.matches) do
    -- Note that Lua arrays start at index 1.
    local header = match.headers[1]
    -- cookie
    if ( header.name == "canary-by-cookie" )
    then
        annotations["nginx.ingress.kubernetes.io/canary-by-cookie"] = header.value
    -- header
    else
        annotations["nginx.ingress.kubernetes.io/canary-by-header"] = header.name
        -- Regular-expression match or exact match.
        if ( header.type == "RegularExpression" )
        then
            annotations["nginx.ingress.kubernetes.io/canary-by-header-pattern"] = header.value
        else
            annotations["nginx.ingress.kubernetes.io/canary-by-header-value"] = header.value
        end
    end
end
-- The script must return the annotations.
return annotations
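If you adapt this script for another provider, it can help to pin down the expected annotations in ordinary unit tests first. The sketch below is a line-by-line Python port of the Nginx script above; it is not part of Kruise Rollout, and modeling `obj` as a plain dict with `annotations`, `weight`, and `matches` keys is an assumption that mirrors the fields the Lua code reads.

```python
from typing import Any, Dict

NGINX = "nginx.ingress.kubernetes.io/"
# Annotations the script clears before every batch.
CANARY_RULE_KEYS = ("canary-by-cookie", "canary-by-header", "canary-by-header-pattern",
                    "canary-by-header-value", "canary-weight")


def nginx_canary_annotations(obj: Dict[str, Any]) -> Dict[str, str]:
    """Python mirror of the Nginx Lua script (hypothetical test helper)."""
    # Work on a copy so the caller's dict is untouched (the Lua version edits its table).
    annotations = dict(obj.get("annotations") or {})
    annotations[NGINX + "canary"] = "true"
    for key in CANARY_RULE_KEYS:
        annotations.pop(NGINX + key, None)  # Lua: annotations[...] = nil
    weight = obj.get("weight", "-1")  # "-1" means the step sets no weight
    if weight != "-1":
        annotations[NGINX + "canary-weight"] = weight
    matches = obj.get("matches")
    if not matches:
        return annotations
    for match in matches:
        header = match["headers"][0]  # Lua's headers[1]: only the first header is used
        if header["name"] == "canary-by-cookie":
            annotations[NGINX + "canary-by-cookie"] = header["value"]
        else:
            annotations[NGINX + "canary-by-header"] = header["name"]
            if header.get("type") == "RegularExpression":
                annotations[NGINX + "canary-by-header-pattern"] = header["value"]
            else:
                annotations[NGINX + "canary-by-header-value"] = header["value"]
    return annotations


if __name__ == "__main__":
    step = {"annotations": {NGINX + "canary-weight": "30"}, "weight": "-1",
            "matches": [{"headers": [{"name": "UserAgent", "type": "Exact", "value": "iOS"}]}]}
    print(nginx_canary_annotations(step))
```

Note how a stale `canary-weight` left over from an earlier batch is removed, which is exactly why the script clears every rule annotation before applying the current step.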
Note: This version supports only Ingress resources. Other custom resources (CRDs), such as those of Apisix and Kong, will be supported in the next version. A related PR [4] has been submitted on GitHub, and you are welcome to join the discussion.
You are welcome to get involved with OpenKruise by joining us on GitHub or Slack [5].
[1] Flagger
https://github.com/fluxcd/flagger
[2] Argo Rollouts
https://github.com/argoproj/argo-rollouts
[3] Nginx and ALB Lua script samples
https://github.com/openkruise/rollouts/tree/master/lua_configuration/trafficrouting_ingress
[4] Related PR
https://github.com/openkruise/rollouts/pull/111
[5] Slack channel
https://kubernetes.slack.com/?redir=%2Farchives%2Fopenkruise