This article focuses on gateways and discusses how to keep them highly available in big promotion scenarios.
Why is it important to use gateways for high availability protection in big promotion scenarios? In short, gateways can turn various uncertain factors into deterministic ones, and this ability is irreplaceable. Consider three points:
The first point is the uncertainty of the traffic peak. Traffic-limiting rules turn unpredictable traffic into a bounded, predictable load. It is hard for a business service to limit itself, because limiting only works if the service absorbing the burst can still keep its CPU load normal. Even if the service enforces a QPS limit at the application layer, an instantaneous spike in concurrency can still drive CPU usage up through the large number of new connections at the network layer, which renders the limiting rules useless. Business modules should focus on application-layer business logic, and scaling them up just to cope with network-layer load is expensive. As the entry point for business traffic, the gateway must be good at handling highly concurrent network traffic, and this performance is an important measure of a gateway's capability. The better the gateway handles high concurrency, the lower the resource cost and the stronger its ability to turn promotion traffic from uncertain into certain.
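To make the idea of turning uncertain traffic into a bounded load concrete, here is a minimal token-bucket sketch in Python. It is only an illustration and not the implementation used by the MSE gateway or Sentinel. The class name `TokenBucket` and the parameters `rate`, `capacity`, and `clock` are hypothetical names chosen for this example.

```python
import time
from typing import Callable


class TokenBucket:
    """Illustrative limiter: admits bursts up to `capacity` requests and
    refills at `rate` tokens per second. All names here are assumptions."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.clock = clock          # injectable clock, useful for testing
        self.tokens = capacity      # start with a full bucket
        self.last = clock()         # time of the last refill

    def allow(self) -> bool:
        """Return True if the request is admitted, False if it is limited."""
        now = self.clock()
        # Refill in proportion to the elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

However large the incoming burst is, the number of admitted requests is bounded by `capacity` plus `rate` multiplied by the elapsed time, which is the deterministic load the backend has to be sized for.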
The second point is the uncertainty of user behavior. To find system bottlenecks and optimization points in advance, user behavior must be simulated in multiple rounds of stress testing across the various promotion scenarios. The gateway is both the entry point for user traffic and the final exit for backend service responses. This makes it a necessary stop for simulating user behavior in stress tests and a necessary place to observe the stress-testing metrics that reflect user experience. Running stress tests, observing the results, and adjusting the limiting configuration on the gateway all help build a highly available system.
The third point is the uncertainty of security attacks. The underground market is usually active during promotions, and abnormal transaction traffic is likely to trigger limiting rules and thereby block normal users. The gateway's traffic security capabilities (such as WAF) can identify abnormal traffic, intercept it early, and automatically blacklist abnormal IP addresses and cookies. This keeps abnormal traffic from counting toward the limiting threshold and protects the backend business logic, which makes it an essential part of high availability protection during promotions.
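The ordering described above, where abnormal traffic is intercepted before the limiting rule is evaluated, can be sketched as follows. This is a simplified illustration under stated assumptions and not how WAF or the MSE gateway is implemented: the class `GatewaySketch`, its method `handle`, and the fixed per-window quota are all hypothetical.

```python
from typing import Iterable, Set


class GatewaySketch:
    """Hypothetical request pipeline: blacklist check first, limit second."""

    def __init__(self, limit_per_window: int,
                 blocked_ips: Iterable[str]) -> None:
        self.limit = limit_per_window           # quota for one time window
        self.blocked_ips: Set[str] = set(blocked_ips)
        self.admitted = 0                       # requests counted so far

    def handle(self, client_ip: str) -> str:
        # Step 1: intercept blacklisted clients. They are rejected here
        # and never consume any of the limiting quota.
        if client_ip in self.blocked_ips:
            return "blocked"
        # Step 2: only the remaining traffic counts toward the threshold.
        if self.admitted >= self.limit:
            return "limited"
        self.admitted += 1
        return "forwarded"

    def reset_window(self) -> None:
        """Start a new time window (called by a timer in a real system)."""
        self.admitted = 0
```

If the two steps were swapped, a flood of requests from a blacklisted address could exhaust the quota and cause normal users to be limited.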
The MSE cloud-native gateway implements a three-in-one next-generation gateway architecture that combines the traffic gateway, microservice gateway, and security gateway. The comparison with the common multi-layer gateway architecture is shown below:
In the common multi-layer architecture, the WAF gateway provides the security capabilities, SLB provides load balancing, the Ingress gateway serves as the cluster ingress (Nginx is also deployed in non-Kubernetes scenarios), and Zuul serves as the microservice gateway. To handle burst traffic during a big promotion under this architecture, the capacity of every gateway layer must be evaluated, because each layer is a potential bottleneck that may need to be scaled out. Both the resource costs and the operation and maintenance labor costs are therefore high. In addition, every extra gateway layer adds another availability risk.
With the MSE cloud-native gateway, the capabilities of the cluster ingress gateway, WAF gateway, and microservice gateway are all implemented in a single gateway layer, while SLB is retained for load balancing. During a big promotion, operation and maintenance personnel only need to focus on the MSE gateway to manage all ingress traffic and provide high availability protection. This next-generation gateway architecture keeps things simple, and simplicity promotes reliability.
As shown in the following figure, the throughput of the MSE cloud-native gateway is twice that of the Nginx Ingress Controller. See Kubernetes Gateway Selection: Nginx or Envoy? for the detailed performance comparison and analysis. If the gateway's performance falls short at peak promotion traffic, enterprises have to pay for more ECS resources and still worry about whether the gateway can carry the load. Once the gateway fails, the loss is immeasurable.
Gateway Specifications: 16 cores, 32 GB × 4 nodes
ECS Model: ecs.c7.8xlarge
In terms of high availability, the MSE cloud-native gateway has a built-in Alibaba Sentinel high availability module. Proven over years of Double 11 events, it provides a wide range of traffic-limiting protection capabilities, including traffic control rules, concurrency rules, and circuit breaker rules, which help ensure the high availability of backend services. In addition, the MSE cloud-native gateway supports preheating. Sending only a small share of traffic to a node at first effectively avoids the slow responses and blocked requests that occur when a large number of requests hit a node whose resources are still initializing. This prevents newly added nodes from failing to serve requests normally and degrading user experience.
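The small-traffic preheating idea can be illustrated with a weight that ramps up over a warm-up period. The sketch below assumes a simple linear ramp, which is a simplification chosen for this example and not the algorithm used by Sentinel or the MSE gateway. The function name `warmup_weight` and its parameters are hypothetical.

```python
def warmup_weight(elapsed_s: float, warmup_s: float,
                  full_weight: int = 100, min_weight: int = 1) -> int:
    """Load-balancing weight for a node that joined `elapsed_s` seconds ago.

    Assumption: the weight grows linearly from `min_weight` to
    `full_weight` over `warmup_s` seconds, then stays at `full_weight`.
    """
    if warmup_s <= 0 or elapsed_s >= warmup_s:
        return full_weight      # warm-up disabled or already finished
    if elapsed_s <= 0:
        return min_weight       # node has just joined
    ramped = int(full_weight * elapsed_s / warmup_s)
    return max(min_weight, ramped)


def traffic_share(new_node_weight: int, other_weights: list) -> float:
    """Fraction of requests a weighted balancer sends to the new node."""
    total = new_node_weight + sum(other_weights)
    return new_node_weight / total
```

With three existing nodes at weight 100 and a 60-second warm-up, a node that joined 6 seconds ago gets weight 10 under this ramp, so it receives 10/310 of the traffic instead of an immediate quarter, which gives connection pools and caches time to initialize.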
In terms of stress testing, you can easily run stress tests against specific gateway instances using the MSE gateway stress testing scenario of Alibaba Cloud PTS. Combined with the traffic limiting and observability of the MSE cloud-native gateway, you can adjust the traffic-limiting configuration while you run and observe the tests, which gives you a one-stop way to build a high availability protection system.
In terms of security, the MSE cloud-native gateway integrates the functions of the WAF gateway and provides various authentication and security protection plugins in the plugin market. Users can also write Wasm plugins in multiple languages (such as Golang, JS, Rust, and C++) to implement authentication and protection logic specific to their business scenarios. These plugins intercept abnormal traffic before the limiting rules are matched, so that it does not affect normal access.