By Zhao Mingshan (Liheng)
OpenKruise is an open-source suite from Alibaba Cloud for automating the management of cloud-native applications. It is currently hosted as a Sandbox project under the Cloud Native Computing Foundation (CNCF). It grew out of Alibaba's years of experience with containerization and cloud-native technology: it is a set of standard Kubernetes extension components used for large-scale applications in Alibaba's internal production environment, and it embodies practices that stay close to upstream community standards while adapting to large-scale Internet scenarios. In addition to its original workload and sidecar management capabilities, Kruise is now exploring progressive delivery.
The term Progressive Delivery originated from large and complex industrial projects. It attempts to dismantle complex projects in stages and reduce delivery costs and time through continuous small closed-loop iterations. With the popularization of Kubernetes and cloud-native concepts, especially after the emergence of continuous deployment pipelines, progressive delivery provides the infrastructure and implementation methods for Internet applications.
Progressive delivery behaviors can be attached to the pipeline as a product iterates, so the whole delivery pipeline can be viewed as both a product iteration process and a progressive delivery cycle. In practice, progressive delivery relies on techniques such as A/B testing and canary release. Take Taobao product recommendation as an example: every major feature release goes through a typical progressive delivery process, which improves the stability and efficiency of delivery.
Kubernetes only provides controllers such as Deployment for application delivery and the Ingress and Service abstractions for traffic. It has no standard definition of how to combine these into an easy-to-use progressive delivery solution. Argo Rollouts and Flagger are currently popular progressive delivery solutions in the community, but they differ from our ideas in some capabilities and concepts. First, they only support Deployment, not StatefulSet or DaemonSet, let alone custom operators. Second, they are intrusive. For example, Argo Rollouts cannot work with the native Kubernetes Deployment, and Flagger copies the Deployment created by the business, which changes its name and causes compatibility problems with GitOps or self-built PaaS platforms.
In addition, free development is a major feature of cloud-native. The Alibaba Cloud Container Team is responsible for evolving the cloud-native architecture of the entire container platform, and it also has a strong demand for progressive application delivery. Therefore, based on the community solutions and Alibaba's internal scenarios, we set the following goals when designing Rollout:
Kruise Rollout is Kruise's abstract definition model for progressive delivery. A complete Rollout definition covers canary release, blue-green release, and A/B testing release, keeping application traffic aligned with the instances actually deployed. The release process can proceed in batches and pause automatically based on Prometheus metrics. Rollout attaches to existing workloads (Deployment, CloneSet, DaemonSet) as a non-intrusive bypass and remains compatible with them. The architecture is shown below:
Canary release and phased release are the most commonly used release methods in progressive delivery practices:
apiVersion: rollouts.kruise.io/v1alpha1
kind: Rollout
spec:
  objectRef:
    workloadRef:
      apiVersion: apps/v1
      # Deployment, CloneSet, Advanced DaemonSet, etc.
      kind: Deployment
      name: echoserver
  strategy:
    canary:
      steps:
      # Route 5% of the traffic to the new version
      - weight: 5
        # Wait for manual confirmation before releasing the following steps
        pause: {}
        # Optional: number of Pods released in the first step. If not set, 'weight' is used (5% here).
        replicas: 1
      - weight: 40
        # Wait 600s, then release the following steps
        pause: {duration: 600}
      - weight: 60
        pause: {duration: 600}
      - weight: 80
        pause: {duration: 600}
      # No configuration is required for the last batch.
      trafficRoutings:
      # echoserver Service name
      - service: echoserver
        # NGINX Ingress
        type: nginx
        # echoserver Ingress name
        ingress:
          name: echoserver
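To make the step semantics above concrete, here is a minimal Python sketch that expands a step list into a per-batch plan. It is not Kruise Rollout's implementation: the names `Step` and `plan_canary`, the round-up rule used to derive Pods from a percentage, the total of 10 replicas, and the implicit final 100% batch are all assumptions made for illustration.

```python
from typing import List, Optional, TypedDict


class Step(TypedDict, total=False):
    weight: int                     # percentage of traffic for the new version
    replicas: int                   # optional explicit count of new-version Pods
    pause_seconds: Optional[int]    # None stands for "wait for manual approval"


def plan_canary(steps: List[Step], total_replicas: int) -> List[dict]:
    """Expand canary steps into rows of traffic %, new-version Pods, and pause."""
    plan = []
    for step in steps:
        weight = step["weight"]
        # Assumption: without 'replicas', derive the Pod count from 'weight', rounding up.
        derived = (total_replicas * weight + 99) // 100
        pods = step.get("replicas", derived)
        plan.append({"traffic": weight, "pods": pods, "pause": step.get("pause_seconds")})
    # Assumption: the last batch needs no configuration and releases everything.
    plan.append({"traffic": 100, "pods": total_replicas, "pause": 0})
    return plan


steps: List[Step] = [
    {"weight": 5, "replicas": 1, "pause_seconds": None},
    {"weight": 40, "pause_seconds": 600},
    {"weight": 60, "pause_seconds": 600},
    {"weight": 80, "pause_seconds": 600},
]
for row in plan_canary(steps, total_replicas=10):
    print(row)
```

The sketch only shows how `weight`, `replicas`, and `pause` relate to each other within a batch. The real controller may round and sequence batches differently.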
During the rollout, Prometheus metrics can be analyzed automatically and combined with the steps to decide whether the rollout should continue or be suspended. In the example below, the HTTP status codes of the service over the past five minutes are analyzed after each batch is published. If the proportion of non-5xx responses falls below the successCondition threshold (0.95 in this template), the rollout is suspended.
apiVersion: rollouts.kruise.io/v1alpha1
kind: Rollout
spec:
  objectRef:
    ...
  strategy:
    canary:
      steps:
      - weight: 5
      ...
      # metrics analysis
      analysis:
        templates:
        - templateName: success-rate
        startingStep: 2 # delay starting the analysis run until the 40% step
        args:
        - name: service-name
          value: guestbook-svc.default.svc.cluster.local
---
# metrics analysis template
apiVersion: rollouts.kruise.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
  - name: service-name
  metrics:
  - name: success-rate
    interval: 5m
    # NOTE: Prometheus queries return results in the form of a vector,
    # so it is common to read index 0 of the returned array to obtain the value.
    successCondition: result[0] >= 0.95
    failureLimit: 3
    provider:
      prometheus:
        address: http://prometheus.example.com:9090
        query: |
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}",response_code!~"5.*"}[5m]
          )) /
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}"}[5m]
          ))
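The interplay between successCondition and failureLimit in the template above can be sketched in a few lines of Python. This is an illustration, not the controller's code: `run_analysis` and `success_rate` are hypothetical helpers, the sample request counts are invented, and the sketch assumes that the analysis fails once the number of failed measurements exceeds failureLimit.

```python
from typing import Iterable


def success_rate(non_5xx: float, total: float) -> float:
    """Ratio computed by the query: non-5xx request rate over total request rate."""
    # Assumption: treat "no traffic" as healthy to avoid dividing by zero.
    return 1.0 if total == 0 else non_5xx / total


def run_analysis(measurements: Iterable[float], threshold: float = 0.95,
                 failure_limit: int = 3) -> str:
    """Return 'Failed' once more than failure_limit measurements miss the threshold."""
    failures = 0
    for value in measurements:
        # Mirrors successCondition: result[0] >= 0.95
        if value >= threshold:
            continue
        failures += 1
        if failures > failure_limit:
            return "Failed"  # at this point the rollout would be suspended
    return "Successful"


# One measurement per 5m interval (invented numbers).
samples = [success_rate(998, 1000), success_rate(940, 1000), success_rate(990, 1000)]
print(run_analysis(samples))
```

With these samples only one measurement falls below 0.95, which stays within the limit of 3, so the sketch reports a successful analysis.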
1. Assume that a user has deployed the echoserver service on Kubernetes (below) and exposes it externally through Nginx Ingress:
2. Define a Kruise Rollout canary release (1 new-version Pod and 5% of the traffic) and apply it to the Kubernetes cluster with kubectl apply -f:
apiVersion: rollouts.kruise.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
spec:
  objectRef:
    ...
  strategy:
    canary:
      steps:
      - weight: 5
        pause: {}
        replicas: 1
      trafficRoutings:
      ...
3. Upgrade the echoserver image version (1.10.2 -> 1.10.3) and apply the change to the Kubernetes cluster with kubectl apply -f:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echoserver
...
spec:
  ...
  containers:
  - name: echoserver
    image: cilium/echoserver:1.10.3
Once Kruise Rollout detects this change, the canary release process starts automatically. As shown below, the canary Deployment, Service, and Ingress are generated automatically, and 5% of the traffic is routed to the new-version Pods.
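With the NGINX Ingress controller, weighted canary traffic is expressed through the nginx.ingress.kubernetes.io/canary and nginx.ingress.kubernetes.io/canary-weight annotations on a second Ingress. The Python sketch below derives such a canary Ingress from a stable one. The helper name `make_canary_ingress`, the `-canary` suffix, the host, and the port are assumptions for illustration and are not taken from Kruise Rollout's source.

```python
import copy


def make_canary_ingress(stable: dict, canary_service: str, weight: int) -> dict:
    """Return a copy of `stable` that sends `weight` percent of traffic to canary_service."""
    if not 0 <= weight <= 100:
        raise ValueError("weight must be between 0 and 100")
    canary = copy.deepcopy(stable)  # leave the user's Ingress untouched
    meta = canary["metadata"]
    meta["name"] = meta["name"] + "-canary"  # assumed naming convention
    annotations = meta.setdefault("annotations", {})
    annotations["nginx.ingress.kubernetes.io/canary"] = "true"
    annotations["nginx.ingress.kubernetes.io/canary-weight"] = str(weight)
    # Point every backend at the canary Service instead of the stable one.
    for rule in canary["spec"]["rules"]:
        for path in rule["http"]["paths"]:
            path["backend"]["service"]["name"] = canary_service
    return canary


stable_ingress = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "Ingress",
    "metadata": {"name": "echoserver"},
    "spec": {"rules": [{
        "host": "echoserver.example.com",  # hypothetical host
        "http": {"paths": [{
            "path": "/", "pathType": "Prefix",
            "backend": {"service": {"name": "echoserver", "port": {"number": 80}}},
        }]},
    }]},
}
canary_ingress = make_canary_ingress(stable_ingress, "echoserver-canary", 5)
print(canary_ingress["metadata"])
```

Copying the stable Ingress rather than modifying it keeps the user's resource unchanged, which matches the non-intrusive goal described earlier.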
4. After R&D personnel confirm that the new version has shown no exceptions for a period, they can run the command kubectl-kruise rollout approve rollout/rollouts-demo -n default to publish all remaining Pods. Rollout precisely controls the subsequent process. When the release is complete, all canary resources are reclaimed and restored to the user-deployed state.
5. If the new version behaves abnormally during the canary process, you can set the image back to the previous version (1.10.2) and apply the change to the Kubernetes cluster with kubectl apply -f. Kruise Rollout detects this change and reclaims all canary resources to achieve a quick rollback.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echoserver
...
spec:
  ...
  containers:
  - name: echoserver
    image: cilium/echoserver:1.10.2
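Steps 3 to 5 amount to a small state machine driven by the image recorded in the workload. The Python sketch below models it. It is an illustrative model only: the class `RolloutModel`, its state names, and its methods are hypothetical and do not mirror the Kruise Rollout controller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RolloutModel:
    stable_image: str
    state: str = "Stable"               # Stable -> CanaryPaused -> Stable
    canary_image: Optional[str] = None

    def apply_image(self, image: str) -> None:
        """Simulate `kubectl apply -f` of a Deployment with the given image."""
        if image == self.stable_image:
            # Going back to the stable image during a canary is a rollback:
            # canary resources are reclaimed and the stable version is kept.
            self.state, self.canary_image = "Stable", None
        else:
            # A new image starts the canary, which pauses at the first step.
            self.state, self.canary_image = "CanaryPaused", image

    def approve(self) -> None:
        """Simulate `kubectl-kruise rollout approve`: release the remaining Pods."""
        if self.state != "CanaryPaused" or self.canary_image is None:
            raise RuntimeError("nothing to approve")
        self.stable_image, self.canary_image = self.canary_image, None
        self.state = "Stable"


rollout = RolloutModel(stable_image="cilium/echoserver:1.10.2")
rollout.apply_image("cilium/echoserver:1.10.3")  # step 3: canary starts
rollout.apply_image("cilium/echoserver:1.10.2")  # step 5: rollback
print(rollout.state, rollout.stable_image)
```

Calling `approve()` instead of the second `apply_image` would model step 4, where the new image becomes the stable version.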
As more and more applications are deployed on Kubernetes, balancing rapid business iteration against application stability is a problem that platform builders must solve. Kruise Rollout is OpenKruise's new exploration in the field of progressive delivery. It aims to solve the problems of traffic scheduling and batch deployment in application delivery. Kruise Rollout has officially released v0.1.0 and is integrated with the community OAM KubeVela project, so Vela users can quickly deploy and use Rollout capabilities through Addons. We also hope community users will join us in exploring the application delivery field.