By Fog Dong
Serverless Application Engine (SAE) is a Kubernetes-based cloud product that combines the Serverless architecture with the microservice model. As a rapidly iterating cloud product, it has faced many challenges during its fast development. How can these challenges be solved in the booming cloud-native era, and how can the architecture be upgraded reliably and quickly? The SAE team and the KubeVela community worked closely to address these challenges and came up with a replicable open-source solution: KubeVela Workflow.
This article describes how to use KubeVela Workflow to upgrade the architecture of SAE and walks through several practical scenarios.
SAE is an application hosting platform for business applications and microservices. It is a Kubernetes-based cloud product that combines the Serverless architecture with the microservice model.
As shown in the preceding architecture diagram, SAE users can host multiple types of applications on SAE. Within SAE, the Java business layer processes the relevant business logic and interacts with Kubernetes resources. At the bottom, SAE relies on highly available, O&M-free, and pay-as-you-go elastic resource pools.
In this architecture, SAE relies mainly on its Java business layer to provide capabilities to users. This design makes it easy for users to deploy applications, but it also brings some challenges.
With the continuous development of Serverless, SAE has encountered three major challenges:
The preceding three challenges show that SAE needs some kind of orchestration engine to upgrade the delivery process, integrate with internal capabilities, and automate operations.
So what requirements must this orchestration engine meet to solve these challenges?
Based on these challenges and considerations, the SAE team and the KubeVela community cooperated closely and launched the KubeVela Workflow project as the orchestration engine.
Thanks to the booming cloud-native ecosystem, the community already has many mature workflow projects (such as Tekton and Argo), and Alibaba Cloud also has some internal orchestration engines. So why reinvent the wheel instead of using existing technology?
This is because KubeVela Workflow is fundamentally different in design. Its steps are designed for the cloud-native IaC system and support abstraction, encapsulation, and reuse. This means that each step can invoke an atomic capability much like a function call, instead of merely creating pods or containers.
In KubeVela Workflow, each step has a step type, and each step type corresponds to a resource called WorkflowStepDefinition. You can write this step definition in the CUE language (an IaC language that is a superset of JSON) or directly use the step types defined by the community.
You can think of a WorkflowStepDefinition as a function declaration: each time a new step type is defined, a new function is defined. Just as a function takes input parameters, so does a step definition: you declare the input parameters that the step type requires in its parameter field. When the workflow runs, the workflow controller executes the CUE code in the corresponding step definition with the actual parameter values supplied by the user, just as it would execute your custom function.
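To make the function analogy concrete, here is a minimal Go sketch of an engine that treats step types as registered functions. It only illustrates the idea and is not KubeVela's implementation (which evaluates CUE step definitions); the Params, StepFunc, Define, and Run names and the print-message step type are hypothetical.

```go
package main

import "fmt"

// Params holds the actual parameter values a user supplies in a step's properties.
type Params map[string]any

// StepFunc plays the role of a step definition: a "function declaration"
// that receives parameters and returns outputs or an error.
type StepFunc func(p Params) (map[string]any, error)

// registry maps a step type name to its definition (hypothetical).
var registry = map[string]StepFunc{}

// Define registers a new step type, like declaring a new function.
func Define(stepType string, fn StepFunc) { registry[stepType] = fn }

// Step is one entry in a workflow: a name, a step type, and its properties.
type Step struct {
	Name       string
	Type       string
	Properties Params
}

// Run "calls" each step's definition with the user's parameters, in order.
func Run(steps []Step) (map[string]map[string]any, error) {
	results := map[string]map[string]any{}
	for _, s := range steps {
		fn, ok := registry[s.Type]
		if !ok {
			return results, fmt.Errorf("step %q: unknown step type %q", s.Name, s.Type)
		}
		out, err := fn(s.Properties)
		if err != nil {
			return results, fmt.Errorf("step %q: %w", s.Name, err)
		}
		results[s.Name] = out
	}
	return results, nil
}

func main() {
	// A toy step type: validates its required parameter and echoes it as an output.
	Define("print-message", func(p Params) (map[string]any, error) {
		msg, ok := p["message"].(string)
		if !ok {
			return nil, fmt.Errorf("parameter %q (string) is required", "message")
		}
		return map[string]any{"printed": msg}, nil
	})

	results, err := Run([]Step{
		{Name: "hello", Type: "print-message", Properties: Params{"message": "hi"}},
	})
	fmt.Println(results["hello"]["printed"], err)
}
```

Adding a capability in this model means registering one more function; workflows that use other step types are unaffected.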
With this layer of abstraction, steps gain a great deal of flexibility. For example, to build an image and push it to a registry, you can use the build-push-image step type and then specify your input parameters: the code source and branch used to build the image, the name of the image, and the secret of the image registry to push to:
```yaml
apiVersion: core.oam.dev/v1alpha1
kind: WorkflowRun
metadata:
  name: build-push-image
  namespace: default
spec:
  workflowSpec:
    steps:
    - name: build-push
      type: build-push-image
      properties:
        context:
          git: github.com/FogDong/simple-web-demo
          branch: main
        image: fogdong/simple-web-demo:v1
        credentials:
          image:
            name: image-secret
```
In such an architecture, abstracting steps gives the workflow enormous flexibility. When you need to add a step to the workflow, you no longer need to compile, build, and package business code and then run it in a pod. You only need to modify the configuration code in the step definition and, together with the workflow engine's orchestration and control capabilities, integrate the new feature.
This is also the main reason why SAE chose KubeVela Workflow. Thanks to this extensibility, we can fully leverage the power of the ecosystem and accelerate product upgrades.
Next, let's take a closer look at the use cases in SAE.
The first scenario is an automated operations scenario for SREs in SAE.
In SAE, we write and update some base images for users. We need to preload these images to multiple clusters in different regions to provide a better experience for users who use these base images.
The original operation process was very complicated. It involved building images and pushing them to multiple regions with ACR, as well as creating image cache templates and managing those image caches. The regions include Shanghai, US West, Shenzhen, and Singapore. These operations were non-standardized and time-consuming: when an SRE pushed these images from a local environment to overseas clusters, the push was likely to fail because of network problems. As a result, SREs had to spend effort on operations that could otherwise be automated.
This is exactly the kind of scenario KubeVela Workflow suits: each of these operations can be programmatically converted into a workflow step, so the steps can be orchestrated and reused. KubeVela Workflow also provides a visual dashboard, so SREs only need to configure a pipeline template once. After that, they can automate the process simply by triggering an execution, optionally entering specific runtime parameters each time.
The simplified steps are as follows:
1. Use the HTTP request step type to build an image by calling the ACR service, and pass the image ID to the next step through inputs/outputs. The step definition waits until the ACR service finishes before ending the current step.
2. Handle build failures with an error handle step.
3. Use the HTTP request step to build the image cache, with the service logs used as the log source of the current step. You can view the step logs directly in the dashboard to troubleshoot problems.
4. Use a step group with the deploy step type to preload images for clusters in the China (Shanghai) and US (West) regions. The multi-cluster management and control capabilities of KubeVela Workflow distribute the ImagePullJob workload to multiple clusters to preload the images.
In the preceding process, without KubeVela Workflow, you might need to write a lot of business code to connect multiple services and clusters. Take the last step, distributing the ImagePullJob workload to multiple clusters, as an example: not only do you need to manage the configuration of multiple clusters, but you also need to watch the status of the workload (a CRD) until it becomes Ready before proceeding to the next step. This process actually corresponds to the reconcile logic of a simple Kubernetes Operator: first create or update a resource; if its status is as expected, end the reconcile; if not, keep waiting.
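The reconcile logic just described can be sketched in Go as follows. The Cluster type is a hypothetical in-memory stand-in for a real cluster client, and the Pending/Ready phases are illustrative; a real Operator would typically be built on client-go or controller-runtime and react to watch events rather than spin in a loop.

```go
package main

import "fmt"

// Resource is a simplified stand-in for a Kubernetes object with a status phase.
type Resource struct {
	Name  string
	Phase string // illustrative values: "Pending" or "Ready"
}

// Cluster is a hypothetical in-memory cluster. Each Get moves a pending
// resource one step closer to Ready, simulating a controller doing work.
type Cluster struct {
	objects map[string]*Resource
	delay   int // remaining status checks before the resource becomes Ready
}

// Apply creates the resource if it does not exist yet (create-or-update).
func (c *Cluster) Apply(name string) {
	if _, ok := c.objects[name]; !ok {
		c.objects[name] = &Resource{Name: name, Phase: "Pending"}
	}
}

// Get returns the resource, advancing its simulated progress.
func (c *Cluster) Get(name string) *Resource {
	r := c.objects[name]
	if r != nil && r.Phase != "Ready" {
		if c.delay == 0 {
			r.Phase = "Ready"
		} else {
			c.delay--
		}
	}
	return r
}

// Reconcile is one pass: apply the desired state, then report whether the
// observed status matches it. false means "requeue and check again".
func Reconcile(c *Cluster, name string) bool {
	c.Apply(name)
	r := c.Get(name)
	return r != nil && r.Phase == "Ready"
}

func main() {
	c := &Cluster{objects: map[string]*Resource{}, delay: 2}
	attempts := 0
	// Requeue until Reconcile reports success. A real controller would wait
	// between passes or be woken by watch events instead of spinning.
	for {
		attempts++
		if Reconcile(c, "image-pull-job") {
			break
		}
	}
	fmt.Println("Ready after", attempts, "reconcile passes")
}
```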
Do we need to implement an Operator every time our operations must manage a new type of resource? Is there a more convenient way to avoid complicated Operator development?
Because steps in KubeVela Workflow are programmable, they can fully cover these operations and this resource management in SAE scenarios, which reduces the manpower engineers must spend. The KubeVela step definition corresponding to the logic above is very simple: whatever the resource type (or even an HTTP request), it can be covered by a step template similar to the following:
```cue
import (
	"vela/op"
)

template: {
	// First, read the resource from the specified cluster
	read: op.#Read & {
		value: {
			apiVersion: parameter.apiVersion
			kind:       parameter.kind
			metadata: {
				name:      parameter.name
				namespace: parameter.namespace
			}
		}
		cluster: parameter.cluster
	}
	// Second, wait until the resource is Ready; otherwise, the step keeps waiting
	wait: op.#ConditionalWait & {
		continue: read.value.status != _|_ && read.value.status.phase == "Ready"
	}
	// Third (optional): if the resource is Ready, then...
	// custom logic...

	// Users must provide these parameters when using the step type
	parameter: {
		apiVersion: string
		kind:       string
		name:       string
		namespace:  *context.namespace | string
		cluster:    *"" | string
	}
}
```
In the current case, this corresponds to:
1. Read the ImagePullJob in a specified cluster, such as the cluster in the China (Shanghai) region.
2. Wait until the ImagePullJob is Ready, meaning the image has been preloaded, and then continue.
3. Once the ImagePullJob is Ready, clean it up from the cluster.
In this way, no matter how many regional clusters or new resource types are added in later O&M scenarios, you can let KubeVela Workflow manage each cluster's KubeConfig and reuse the defined step types with different cluster names or resource types. This gives you a simplified Kubernetes Operator reconcile process and reduces development costs.
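The three phases above, reused across clusters by changing only the cluster parameter, can be modeled with a short Go sketch. MultiCluster, preloadStep, the job name, and the cluster names are hypothetical; in KubeVela Workflow this logic would live in a CUE step definition such as the template above, while the engine holds each cluster's KubeConfig.

```go
package main

import (
	"errors"
	"fmt"
)

// jobState tracks whether an ImagePullJob in a cluster has become Ready.
type jobState struct {
	ready bool
}

// MultiCluster is a hypothetical fake: cluster name -> job name -> state.
type MultiCluster map[string]map[string]*jobState

// StepStatus is the outcome of one execution of the step.
type StepStatus string

const (
	Waiting   StepStatus = "waiting"   // the wait condition is not satisfied yet
	Succeeded StepStatus = "succeeded" // the job was Ready and has been cleaned up
)

// preloadStep is one reusable step type; the cluster and job are parameters.
func preloadStep(mc MultiCluster, cluster, job string) (StepStatus, error) {
	jobs, ok := mc[cluster]
	if !ok {
		return "", errors.New("unknown cluster: " + cluster)
	}
	// 1. Read the ImagePullJob from the specified cluster.
	st, ok := jobs[job]
	if !ok {
		return "", fmt.Errorf("job %s not found in cluster %s", job, cluster)
	}
	// 2. Wait until it is Ready; the engine would run the step again later.
	if !st.ready {
		return Waiting, nil
	}
	// 3. Clean up the ImagePullJob once the image has been preloaded.
	delete(jobs, job)
	return Succeeded, nil
}

func main() {
	mc := MultiCluster{
		"cn-shanghai": {"preload-base-image": {ready: true}},
		"us-west-1":   {"preload-base-image": {ready: false}},
	}
	for _, cluster := range []string{"cn-shanghai", "us-west-1"} {
		status, err := preloadStep(mc, cluster, "preload-base-image")
		fmt.Println(cluster, status, err)
	}
}
```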
In addition to automating internal O&M operations, upgrading the original SAE product architecture to improve delivery efficiency for users is another important reason for choosing KubeVela Workflow.
In SAE's delivery scenario, a delivery process corresponds to a series of tasks, such as initializing the environment, building images, and releasing in batches. This series of tasks corresponds to the SAE Tasks in the figure below.
In the original SAE architecture, these tasks are handed to the Java Executor one by one, which runs the business logic, such as creating resources in Kubernetes and synchronizing the status of the current task to the MySQL database.
After the current task is completed, the Java Executor gets the next task from SAE's original orchestration engine, while the orchestration engine keeps putting new tasks into the initial task list.
The biggest problem with this old architecture is polling. The Java Executor queries the SAE task list every second to check for new tasks; in addition, after the Java Executor creates Kubernetes resources, it tries to get the status of those resources from the cluster every second.
The original SAE architecture follows a polling model rather than the controller model of the Kubernetes ecosystem. If the orchestration engine layer is upgraded to the controller model, which watches for events, it can integrate better with the entire Kubernetes ecosystem and improve efficiency.
However, business logic is deeply coupled in the original architecture. If a traditional container-based cloud-native workflow were used, SAE would need to package the original business logic into images and then maintain and update a large number of images, which is not a sustainable path. We wanted the upgraded workflow engine to integrate easily with task orchestration, the business execution layer, and Kubernetes clusters.
With the high scalability of KubeVela Workflow, SAE engineers do not need to repackage the original capabilities into images or make large-scale modifications.
The new process is shown in the preceding figure. After a delivery plan is created on the SAE product side, the business side writes the model to the database, converts the model, and generates a KubeVela workflow, which corresponds to the YAML on the right side of the figure.
At the same time, SAE's original Java Executor exposes its original business capabilities as microservice APIs. When KubeVela Workflow runs, each step is IaC-based and implemented in CUE. Some steps call SAE's business microservice APIs, while others interact directly with the underlying Kubernetes resources. Data can be passed between steps, and if a call fails, you can use the step's conditional judgment to handle the error.
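The data passing and conditional error handling described above can be modeled in Go as shown below. In KubeVela Workflow these are declared in YAML (step outputs/inputs and a per-step condition) rather than coded by hand; the step, Ctx, execute, pipeline, and callAPI names here are hypothetical and only illustrate the control flow.

```go
package main

import (
	"errors"
	"fmt"
)

// Ctx carries step outputs forward (like workflow outputs/inputs) and records failures.
type Ctx struct {
	Outputs map[string]string
	Failed  map[string]bool
}

// step is a hypothetical workflow step with an optional run condition.
type step struct {
	Name string
	If   func(ctx *Ctx) bool // nil: run only if no earlier step has failed
	Run  func(ctx *Ctx) error
}

// execute runs steps in order. A failure is recorded instead of aborting,
// so a later error-handling step whose condition matches can react to it.
func execute(steps []step) *Ctx {
	c := &Ctx{Outputs: map[string]string{}, Failed: map[string]bool{}}
	for _, s := range steps {
		cond := s.If
		if cond == nil {
			cond = func(ctx *Ctx) bool { return len(ctx.Failed) == 0 }
		}
		if !cond(c) {
			continue
		}
		if err := s.Run(c); err != nil {
			c.Failed[s.Name] = true
		}
	}
	return c
}

// callAPI stands in for a call to one of SAE's business microservice APIs.
func callAPI(ok bool, result string) (string, error) {
	if !ok {
		return "", errors.New("service unavailable")
	}
	return result, nil
}

// pipeline builds a three-step workflow; apiOK controls whether the API call succeeds.
func pipeline(apiOK bool) []step {
	return []step{
		{Name: "build", Run: func(ctx *Ctx) error {
			id, err := callAPI(apiOK, "image-123")
			if err != nil {
				return err
			}
			ctx.Outputs["imageID"] = id // output consumed by the next step
			return nil
		}},
		{Name: "deploy", Run: func(ctx *Ctx) error {
			ctx.Outputs["deployed"] = ctx.Outputs["imageID"] // input from "build"
			return nil
		}},
		{Name: "handle-error", If: func(ctx *Ctx) bool { return ctx.Failed["build"] }, Run: func(ctx *Ctx) error {
			ctx.Outputs["alert"] = "build failed, notify the on-call engineer"
			return nil
		}},
	}
}

func main() {
	fmt.Println(execute(pipeline(true)).Outputs)
	fmt.Println(execute(pipeline(false)).Outputs)
}
```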
This optimization is extensible and fully reuses the Kubernetes ecosystem. It extends workflow processes and atomic capabilities and is oriented to the final state. This combination of extensibility and process control covers the original business functions while reducing the amount of development. At the same time, state update latency drops from the minute level to the millisecond level, which is agile and native. The approach offers YAML-level description capabilities and also cuts development time from one day to one hour.
In addition to automated O&M and upgrading the original architecture, what else can KubeVela Workflow provide?
Beyond the upgrade itself, the reusability of steps and the ease of integrating with other ecosystems bring SAE further benefits: development shifts from writing business code to orchestrating different steps, so new product features can be launched quickly!
SAE has built a strong Java foundation and supports a wealth of features for Java microservices, such as single-batch release, multi-batch release, and canary release. However, as the number of customers grew, some customers raised new requirements: they want canary releases based on north-south traffic for applications written in multiple languages.
There are also many mature open-source products for canary releases, such as Kruise Rollout. After investigation, SAE engineers found that Kruise Rollout can provide canary release capabilities and can work with Alibaba Cloud's internal ingress controller (ALB) to split traffic.
This solution is shown in the architecture diagram above. SAE issues a KubeVela Workflow whose steps integrate with Alibaba Cloud ALB, open-source Kruise Rollout, and SAE's business components at the same time. Batch management is completed within the steps, which allows the feature to launch quickly.
In fact, with KubeVela Workflow, there is no need to write new business code for this feature. You only need to write a step type for updating canary batches.
Due to the programmability of step types, we can easily use different patch policies in the definition to update the release batches of Rollout objects in the different clusters. Moreover, in the workflow, the step type of each step is reusable. This means when you develop a new step type, you are laying the foundation for the next time a new feature is launched. This reuse allows you to launch features quickly and reduce development costs.
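A reusable step for advancing canary batches might look like the Go sketch below. It does not use the Kruise Rollout API: the Rollout struct, its fields, the batch percentages, and the cluster names are hypothetical, and the sketch only shows how one parameterized step can drive releases in different clusters.

```go
package main

import "fmt"

// Rollout is a hypothetical, simplified view of a canary release in one cluster.
type Rollout struct {
	Cluster  string
	Replicas int
	Batches  []int // cumulative percentage of replicas upgraded after each batch
	Current  int   // index of the released batch; -1 before the first batch
}

// canaryReplicas returns how many replicas run the new version for a percentage,
// rounding up so that a small non-zero percentage upgrades at least one replica.
func canaryReplicas(total, percent int) int {
	return (total*percent + 99) / 100
}

// advanceBatch is the reusable step: move one rollout to its next batch and
// return the number of upgraded replicas and whether the release is complete.
func advanceBatch(r *Rollout) (upgraded int, done bool) {
	if len(r.Batches) == 0 {
		return 0, true
	}
	if r.Current < len(r.Batches)-1 {
		r.Current++
	}
	upgraded = canaryReplicas(r.Replicas, r.Batches[r.Current])
	return upgraded, r.Current == len(r.Batches)-1
}

func main() {
	// The same step is reused for two clusters with different parameters.
	rollouts := []*Rollout{
		{Cluster: "cluster-a", Replicas: 10, Batches: []int{20, 50, 100}, Current: -1},
		{Cluster: "cluster-b", Replicas: 3, Batches: []int{10, 100}, Current: -1},
	}
	for _, r := range rollouts {
		for {
			n, done := advanceBatch(r)
			fmt.Printf("%s: batch %d -> %d/%d replicas\n", r.Cluster, r.Current+1, n, r.Replicas)
			if done {
				break
			}
		}
	}
}
```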
After SAE upgraded its architecture with KubeVela Workflow, it improved delivery efficiency and reduced development costs. Combined with its underlying Serverless infrastructure, SAE can make full use of its strengths in application hosting.
In addition, the KubeVela Workflow architecture upgrade used in SAE is a replicable open-source solution. The community provides more than 50 built-in step types (covering image building, image pushing, image scanning, multi-cluster deployment, infrastructure management with Terraform, conditional waits, message notifications, and more) to help you build CI/CD pipelines easily.
You can refer to the following documents for more usage scenarios:
You can learn more about KubeVela and the OAM project through the following materials: