By Tang Changzheng and Shimian
“The end-to-end canary release feature provided by Microservices Engine (MSE) improves system stability and lets us release new requirements with more confidence, at low cost and in a non-intrusive manner.”
-- Laidian Technology Architect Tang Changzheng
Laidian Technology entered the shared charging field in 2014. It defined and pioneered the industry and was the first enterprise to offer shared charging. Its main business covers self-service rental of shared power banks, development of customized shopping mall navigation machines, advertising display equipment, and advertising communication. Laidian Technology has a comprehensive product line, with cabinets of all sizes as well as desktop models. Its services are currently available in more than 90% of the cities in the country, with more than 200 million registered users, and they cover user needs in all scenarios.
Laidian Technology runs a wide range of services on diverse systems whose architecture is built on containers and microservices. The microservice framework is based on Spring Cloud and Dubbo. In recent years, the number of shared power bank devices and the business volume have grown rapidly, and the overall application architecture of Laidian Technology has evolved along with the business. Microservice governance is a necessary step in deepening microservice adoption. This article describes what Laidian Technology explored along the way:
Origin: A review of Laidian Technology's business, architecture, and pain points at the time
First Sight: Why Alibaba Cloud Microservices Engine (MSE) was chosen during technology selection
Landing: How end-to-end canary release and lossless online and offline were achieved step by step, at low cost and in a short time
Outlook: How MSE and Laidian Technology will work together to deepen microservice adoption
Laidian Technology conforms to the following three points:
In October 2019, Laidian Technology began the full microservice transformation of its services and completed the containerization transformation. By December 2020, all services had been converted to microservices and connected to Kubernetes.
As microservice adoption deepened, a series of challenges gradually emerged. These challenges fall into three major areas: efficiency, stability, and cost. The mission of microservices is to make business iteration more efficient. However, as the number of microservices grows and call links get longer, the efficiency problems may outweigh the architectural dividends of the microservice model unless further governance is applied.
Therefore, Laidian Technology began building observability for its microservices in June 2021 and began building microservice governance in September 2021.
In summary, containerization has the following advantages:
Comprehensive containerization brought the following benefits to Laidian Technology's systems. First, application deployment became very convenient. Because Kubernetes is standardized, CI/CD became simple, and overall release efficiency improved significantly. Applications deployed on Kubernetes can scale elastically, so they cope effectively with traffic peaks. Previously, servers were provisioned for the long term according to the previous peak, which resulted in low resource utilization. With Kubernetes, services use resources on demand, which saves server costs. Traditional cluster O&M is complicated and demands a lot from operations staff: they must be proficient in Lua and Ansible scripts and understand cloud product network configuration, monitoring, and operations, so the O&M cost of the system is very high. The standardized interface of Alibaba Cloud Container Service (ACK) addresses the problems of high-density deployment and system O&M, which significantly reduces costs.
During a release, the following mistaken assumptions may appear:
These assumptions may lead to a faulty release.
Alibaba has three principles for safe production: canary release, observation, and rollback. All R&D engineers must know how to use the canary release, observation, and rollback functions of the release system.
Frequent releases are the norm for Internet companies, and the same is true for Laidian Technology. Canary release, observation, and rollback are necessary capabilities for microservice systems. Canary release is a necessary step before a full release and a key factor in improving online stability. When a new version of a service is launched, diverting a small part of the traffic to the new version allows program problems to be found in time and effectively prevents large-scale failures. The industry already has mature release strategies, such as blue-green release, A/B testing, and canary release. These strategies mainly focus on releasing a single service.
Laidian Technology has a large number of microservices, and the dependencies between them are complex. Hard isolation of multiple environments would significantly increase costs and complicate the release process. Sometimes, releasing a feature depends on multiple services being upgraded and launched at the same time, and the new versions of these services need to be validated together with a small amount of traffic. By building environment isolation from the Ingress gateway through the entire backend, multiple services of different versions can be validated at the same time. This is the end-to-end canary release capability in microservice governance.
Laidian Technology considered developing its own microservice governance, and Laidian Technology architect Tang Changzheng has also participated in the Dubbo open-source community. Developing microservice governance is not very difficult for Laidian Technology, but self-developed governance components still carry the following fundamental costs:
In production applications, the microservice framework usually introduces service governance logic, and the business code usually depends on this logic in the form of an SDK. Any change or upgrade to this logic therefore requires modifying the code of every microservice, which makes access and upgrade costs very high. At the same time, the governance functions of the open-source service framework have to be developed in-house, which means more people are needed to manage and operate the governance components. In addition, self-built functions tend to stay close to the current business, so they are relatively few and narrow, and their future scalability is relatively weak. End-to-end canary release also involves many technical details, such as dynamic routing, node labeling, traffic labeling, and distributed tracing, so the implementation cost is high. Because service frameworks such as Dubbo and Spring Cloud are complex, and because the number of microservices keeps growing and links keep getting longer, locating and resolving microservice governance issues has also become a challenge. With the support of professional teams (such as Spring Cloud Alibaba and Dubbo), deepening microservice adoption becomes much smoother.
When we first came into contact with MSE service governance, many points met our demands. The following points are very attractive for a microservice governance transformation:
MSE microservice governance capabilities are implemented with Java Agent bytecode enhancement. They seamlessly support all Spring Cloud and Dubbo versions released in the past five years, so users can adopt them without changing code. Users only need to enable MSE Microservice Governance Pro and configure it online, and the configuration takes effect in real time.
To get access, users only need to install mse-pilot from the application market of Alibaba Cloud Container Service, enable namespace-level service governance in the MSE console, and restart their applications. Uninstalling service governance is just as easy: users disable service governance in the console and uninstall mse-pilot. Users do not need to change the existing business architecture, and they can enable or disable governance at any time without lock-in.
Service governance covers the whole lifecycle, from development and testing to operation, which lets developers focus more on the business.
MSE microservice governance also provides the following solutions to solve difficulties and quickly improve the microservice governance capabilities of enterprises:
Stability Areas: MSE microservice governance provides online emergency diagnosis, troubleshooting and recovery, online release of stability solutions, and full-link grayscale solutions for microservices.
Cost Reduction and Efficiency Enhancement Areas: MSE microservice governance provides an isolation solution that reduces the cost of daily test environments, a solution for seamlessly migrating microservices to the cloud, and a solution for improving the efficiency of microservice development and testing.
MSE Service Governance Pro provides a visual view of microservice governance traffic:
For canary release traffic, routing rules take effect in real time once they are configured, and the resulting traffic can be understood at a glance:
MSE also provides end-to-end, full-lifecycle protection for lossless online and offline scenarios. Users can see at a glance whether traffic is lost and where the loss occurs.
A Kubernetes-based cloud-native system emphasizes flexible scheduling within and between clusters. Any resource scheduled as a Pod gets a new Pod IP after it is rescheduled. Traditional service governance systems usually configure governance policies by IP, but MSE configures microservice governance policies by tag, which is a more cloud-native approach.
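The difference between IP-keyed and tag-keyed policies can be illustrated with a minimal sketch. It is not MSE code: the `Instance` class, the `version` tag key, and both selector functions are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """A running service instance (hypothetical model, not an MSE type)."""
    ip: str
    tags: dict = field(default_factory=dict)


def select_by_ip(instances, ips):
    """IP-keyed policy: matches only instances whose IP is listed."""
    return [i for i in instances if i.ip in ips]


def select_by_tag(instances, key, value):
    """Tag-keyed policy: matches instances by label, whatever their IP."""
    return [i for i in instances if i.tags.get(key) == value]


# Before rescheduling: one canary Pod and one online Pod.
before = [Instance("10.0.0.5", {"version": "gray"}), Instance("10.0.0.6")]

# After rescheduling: the canary Pod keeps its tag but gets a new IP.
after = [Instance("10.0.1.9", {"version": "gray"}), Instance("10.0.0.6")]

ip_policy = {"10.0.0.5"}
matched_by_ip_after = select_by_ip(after, ip_policy)
matched_by_tag_after = select_by_tag(after, "version", "gray")
```

In this sketch the IP-keyed policy silently stops matching once the Pod is rescheduled, while the tag-keyed policy still selects the canary instance.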
MSE is also deeply integrated with Kubernetes and offers a variety of complete solutions for Kubernetes environments. Lossless online and offline lets applications avoid traffic loss during auto-scaling. A CI/CD pipeline built with Jenkins implements canary release in the Kubernetes environment, and end-to-end canary release is implemented based on Ingress.
In September 2021, when Laidian Technology first evaluated MSE microservice governance, some limitations were found in MSE's full-link canary capability. First, it only supported the microservice gateways Spring Cloud Gateway and Zuul and the cloud gateway. At that time, it could not support self-built Nginx gateways. In the Dubbo scenario, it only supported routing by interface parameters, which required operations staff to know how the service interfaces were implemented. Such fine-grained control made it costly to use in production. In addition, end-to-end canary release only supported HTTP gateways or applications as the ingress of canary traffic and could not use TCP gateways as the traffic ingress.
After in-depth discussions with the architects of Laidian Technology, we further abstracted and summarized users' canary release scenarios. Customer needs can only be understood by going deep into the business. Three scenarios are summarized below:
The client adds the identifier of the development environment to the request, and the access layer forwards the request to the gateway that represents that environment. The gateway of that environment uses the isolation plug-in to attach the identifier of the corresponding project isolation environment when it calls downstream, so the request stays within the business project isolation environment.
A specified header is added to the canary request, and the entire call link passes that header downstream. Users only need to configure header-based routing rules in the corresponding application. A canary request that carries the specified header is routed to the canary machines, so canary traffic is directed as needed.
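The header-based scenario can be sketched as follows. This is only an illustration of the idea under stated assumptions: the header name `x-canary`, the rule format, and the service names are hypothetical and are not taken from MSE.

```python
# Hypothetical header that marks a canary request.
CANARY_HEADER = "x-canary"

# Hypothetical per-application routing rules: (header name, expected value).
# Only applications that have a rule route on the header.
ROUTING_RULES = {
    "order-service": (CANARY_HEADER, "true"),
    "stock-service": (CANARY_HEADER, "true"),
}


def choose_group(service, headers):
    """Return 'canary' if the service has a rule matching the header."""
    rule = ROUTING_RULES.get(service)
    if rule is not None and headers.get(rule[0]) == rule[1]:
        return "canary"
    return "online"


def outbound_headers(inbound):
    """Copy the canary header onto the downstream call, if present."""
    if CANARY_HEADER in inbound:
        return {CANARY_HEADER: inbound[CANARY_HEADER]}
    return {}


def walk_chain(chain, headers):
    """Follow a call chain, routing at each hop and forwarding the header."""
    path = []
    for service in chain:
        path.append((service, choose_group(service, headers)))
        headers = outbound_headers(headers)
    return path


chain = ["order-service", "user-service", "stock-service"]
canary_path = walk_chain(chain, {CANARY_HEADER: "true", "accept": "*/*"})
normal_path = walk_chain(chain, {"accept": "*/*"})
```

Note that `user-service` has no rule in this sketch, so it serves the request from its online group but still forwards the header, which lets `stock-service` route the request back to its canary group.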
Scenario 1 can perfectly meet the canary release demand of Laidian Technology, and it is also the demand of the vast majority of cloud customers. Scenarios 2 and 3 can fit in well with higher-level demand.
Because traffic is marked at the application level and end-to-end canary release is driven by that mark, any traffic ingress is supported, including canary releases behind Ingress and self-built gateways. Application-level canary release is supported, and it remains compatible with custom routes. This more flexible approach meets Laidian Technology's needs for end-to-end canary release.
This is the service architecture of Laidian Technology:
The top layer is the user interface, such as the mobile client. The self-built Nginx gateway is the access layer. The service layer includes various services. Spring Cloud and Dubbo are used as the service framework.
The architecture of the end-to-end canary release is shown below:
Traffic diversion is configured in the Nginx layer: 10% of traffic enters the canary release environment, and 90% of traffic enters the formal online environment without being marked. MSE then automatically dyes the traffic that passes through the canary release environment with the color of that environment, so the whole link is routed by canary rules and the traffic stays within the canary environment. If a service has no machines in the canary environment (for example, the payment center only has online machines), the traffic goes to the online environment for that hop. When a later service, such as the data center, does have machines in the canary environment, the canary traffic returns to the canary environment.
Releases during daytime peak hours usually lead to business traffic loss, so R&D personnel have to make changes at night, during off-peak hours. Staying up late and working overtime significantly lowers their happiness index. If releases can be made during the daytime peak, R&D efficiency improves significantly.
Laidian Technology encountered similar problems. When an application is released while business traffic is heavy, the service has only just started. Because of the cold start process, its capacity at this point is often lower than normal. However, online traffic cannot tell whether a service has just started, so a large volume of traffic keeps flowing in, which overloads the system and can bring it down, resulting in traffic loss. With service warm-up in MSE, traffic increases slowly along a configured curve so that services are fully warmed up, and the application starts safely even under high concurrency and heavy traffic.
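The behavior described above (a 10% split at the ingress, tagging of canary traffic, and fallback to online machines when a service has no canary machines) can be sketched in a few lines. This is a simplified model, not the Nginx or MSE implementation. The tag value `gray`, the `order-center` service, and the instance names are assumptions made for illustration. The payment center and data center follow the example in the text.

```python
import random


def ingress_split(rng, canary_ratio=0.10):
    """Tag roughly `canary_ratio` of requests as canary; leave the rest untagged."""
    return "gray" if rng.random() < canary_ratio else None


def pick_instances(instances, tag):
    """Prefer instances carrying the request tag; otherwise use online ones."""
    if tag is not None:
        tagged = [i for i in instances if i["tag"] == tag]
        if tagged:
            return tagged
    return [i for i in instances if i["tag"] is None]


def trace(chain, topology, tag):
    """Return which instance serves each hop for a request with this tag."""
    return [(svc, pick_instances(topology[svc], tag)[0]["name"]) for svc in chain]


# Hypothetical topology: the payment center has no canary machines.
topology = {
    "order-center": [{"name": "order-gray", "tag": "gray"},
                     {"name": "order-online", "tag": None}],
    "payment-center": [{"name": "payment-online", "tag": None}],
    "data-center": [{"name": "data-gray", "tag": "gray"},
                    {"name": "data-online", "tag": None}],
}
chain = ["order-center", "payment-center", "data-center"]

gray_trace = trace(chain, topology, "gray")
online_trace = trace(chain, topology, None)

# Estimate the share of tagged requests with a seeded generator.
rng = random.Random(0)
tags = [ingress_split(rng) for _ in range(10_000)]
canary_share = tags.count("gray") / len(tags)
```

The tag travels with the request rather than with the instance that served the previous hop, which is why the request can fall back to an online payment machine and still return to the canary environment at the data center.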
MSE provides a non-intrusive warm-up method based on Agent that can enable users to provide service warm-up capability for applications without modifying any code.
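The idea of warm-up, in which a newly started instance receives a share of traffic that grows along a curve, can be sketched as follows. The power-curve formula, the parameter names, and the default values here are assumptions for illustration only and do not describe how MSE computes weights.

```python
def warmup_weight(uptime_s, warmup_s=120.0, curve=2.0, floor=0.01):
    """Relative weight of an instance that started `uptime_s` seconds ago.

    The weight rises from `floor` to 1.0 over `warmup_s` seconds following
    (uptime / warmup) ** curve. A larger `curve` keeps traffic low for longer.
    """
    if uptime_s >= warmup_s:
        return 1.0
    ratio = max(uptime_s, 0.0) / warmup_s
    return max(floor, ratio ** curve)


def traffic_share(new_weight, warm_instances):
    """Share of traffic sent to the new instance next to fully warm ones."""
    return new_weight / (new_weight + warm_instances)


# Weight of a new instance at several points during a 120-second warm-up.
samples = [warmup_weight(t) for t in (0, 30, 60, 90, 120)]

# Halfway through warm-up, next to three fully warmed instances.
share_at_60s = traffic_share(warmup_weight(60), warm_instances=3)
```

A load balancer that picks instances in proportion to these weights would send the new instance only a small fraction of requests right after startup and an equal share once the warm-up window has passed.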
MSE Service Governance Professional Edition provides non-intrusive core capabilities, such as end-to-end canary release, outlier instance removal, canary release, and microservice governance traffic observability. They help Laidian Technology quickly build a complete microservice governance system on the cloud in a cost-effective manner. Therefore, enterprises can ensure the efficiency of services, improve online stability, and ensure 99.9% service availability.
As Laidian Technology deepens its microservice adoption, more scenarios will arise beyond end-to-end canary release and lossless online and offline. Governance of the full microservice lifecycle will cover release, operation, troubleshooting, fault recovery, and full-link traffic. MSE microservice governance will work with Laidian Technology to keep improving the R&D efficiency of its microservices and the availability of its services.