By Guyi
Auto scaling is one of the core technology dividends of the cloud computing era, but in the IT world, no system capability applies to every scenario unconditionally. This article systematically sorts out the problems customers have encountered when designing system architectures for auto-scaling scenarios on Enterprise Distributed Application Service (EDAS) and summarizes them into five conditions and six lessons.
Whether manual intervention is required is the essential difference between auto scaling and manual scaling. In traditional application O&M, starting a process requires a series of manual preparations on a machine: building the environment, sorting out the configuration of dependent services, and adjusting the local environment. For applications on the cloud, security group rules and the access control of dependent services also need to be adjusted by hand. Any step that still requires such manual work becomes infeasible once scaling is automatic.
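To make this concrete, here is a minimal Java sketch of a startup path with no manual steps: every input the process needs is injected by the platform through environment variables, and a missing value aborts the startup instead of waiting for a human. The variable names are illustrative, not EDAS-defined.

```java
// A minimal sketch: all startup inputs come from the environment (injected by
// the platform at scale-out time), so a new instance needs no manual preparation.
// DB_URL and CACHE_HOST are illustrative names, not platform-defined variables.
public class AppBootstrap {
    public static void main(String[] args) {
        String dbUrl = requireEnv("DB_URL");
        String cacheHost = requireEnv("CACHE_HOST");
        // ... wire up connections and start serving; no human touches the machine.
        System.out.printf("starting with db=%s cache=%s%n", dbUrl, cacheHost);
    }

    private static String requireEnv(String name) {
        String value = System.getenv(name);
        if (value == null || value.isEmpty()) {
            // Fail fast: a missing setting must abort startup, not wait for a person.
            throw new IllegalStateException("missing required env var: " + name);
        }
        return value;
    }
}
```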
To be exact, statelessness mainly refers to how much the business system depends on locally held data while it is running. Data is generated as the process executes, and that data continuously influences subsequent program behavior. When writing the logic, programmers need to ask whether this data would produce inconsistent behavior if the system restarted the process in a new environment. The recommended approach is to keep the authoritative copy of the data in a storage system, achieving a real separation of storage and computing.
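As a minimal sketch of that separation, the following Java example routes all session state through an external store interface. `SessionStore` and the in-memory stand-in are hypothetical names used to keep the example self-contained; a production deployment would back the interface with Redis or a database rather than process memory.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical abstraction: the process never keeps the authoritative copy of
// user state. In production, back this with Redis or a database.
interface SessionStore {
    void put(String sessionId, String data);
    String get(String sessionId);
}

// In-memory stand-in used only to make this sketch runnable on its own.
class InMemorySessionStore implements SessionStore {
    private final Map<String, String> backing = new ConcurrentHashMap<>();
    public void put(String sessionId, String data) { backing.put(sessionId, data); }
    public String get(String sessionId) { return backing.get(sessionId); }
}

public class CheckoutService {
    private final SessionStore store;

    public CheckoutService(SessionStore store) { this.store = store; }

    // Every request round-trips through the external store, so a restarted or
    // newly scaled-out instance sees exactly the same state as the old one.
    public void addItem(String sessionId, String itemId) {
        String cart = store.get(sessionId);
        store.put(sessionId, cart == null ? itemId : cart + "," + itemId);
    }

    public static void main(String[] args) {
        CheckoutService svc = new CheckoutService(new InMemorySessionStore());
        svc.addItem("session-42", "sku-1");
        svc.addItem("session-42", "sku-2");
    }
}
```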
One of the features of auto scaling, especially auto scaling on the cloud, is that it happens frequently, and traffic-burst businesses in particular bring a kind of uncertainty. A system is often in a "cold" state right after startup, so warming up quickly is the key to making auto scaling effective. When the burst ends, an automatic scale-in usually follows. Since that process is also automatic, the instance needs the ability to remove traffic automatically, and the traffic here includes not only HTTP/RPC calls but also messages and task (background thread pool) scheduling.
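Below is a minimal Java sketch of automatic traffic removal on scale-in, assuming the platform delivers SIGTERM before killing the process (as Kubernetes-based platforms do). The `ready` flag and `inFlight` counter are illustrative names, not a platform API; a real service would also pause its message consumers and scheduled tasks in the same hook.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// A sketch of graceful traffic removal during scale-in, assuming the platform
// sends SIGTERM first. `ready` and `inFlight` are illustrative names.
public class GracefulShutdown {
    private static final AtomicBoolean ready = new AtomicBoolean(true);
    private static final AtomicInteger inFlight = new AtomicInteger();

    public static void main(String[] args) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            ready.set(false);  // 1. fail readiness checks so no new traffic arrives
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(30);
            while (inFlight.get() > 0 && System.nanoTime() < deadline) {
                try { Thread.sleep(100); }  // 2. drain in-flight requests
                catch (InterruptedException e) { Thread.currentThread().interrupt(); break; }
            }
            // 3. also stop message consumers and scheduled tasks here, not just HTTP
        }));
        // Request handlers would wrap work with inFlight.incrementAndGet() /
        // decrementAndGet(), and the readiness endpoint would report `ready`.
    }
}
```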
Our applications may read some startup dependencies from disk while starting and use the disk to print logs or record data while running. Under auto scaling, however, processes start fast and end fast, and any data placed on the disk is lost once the process ends, so we should be prepared for the loss of disk data. What about logs, then? Logs should be shipped through a log collection component for unified aggregation, cleaning, and review, a point that the 12-factor app methodology also emphasizes.
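As one way to follow that advice, the sketch below configures `java.util.logging` to treat logs as an event stream on stdout, flushing each record so nothing is lost when a short-lived instance exits; the platform's log collector is assumed to pick the stream up from there.

```java
import java.util.logging.*;

// A sketch of the 12-factor approach: write logs as an event stream to stdout
// and let the platform's collector aggregate them; never depend on local files.
public class StdoutLogging {
    public static void main(String[] args) {
        Logger root = Logger.getLogger("");
        for (Handler h : root.getHandlers()) root.removeHandler(h); // drop defaults
        StreamHandler stdout = new StreamHandler(System.out, new SimpleFormatter()) {
            @Override public synchronized void publish(LogRecord record) {
                super.publish(record);
                flush();  // flush per record so nothing is lost on a fast exit
            }
        };
        root.addHandler(stdout);
        Logger.getLogger(StdoutLogging.class.getName()).info("order accepted");
    }
}
```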
A large-scale business system is rarely built from a single service; the most typical architectures also rely on central services such as caches and databases. After one service is elastically scaled out, it is easy to overlook the availability of these central dependencies, and if a dependent service becomes unavailable, it can trigger an avalanche effect across the entire system.
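One defensive pattern is a per-instance bulkhead that caps concurrent calls to the shared dependency, so scaling the service out cannot multiply load on the database without bound. The sketch below uses a plain `Semaphore`; the limit of 20 permits and the 200 ms wait are illustrative values, not recommendations.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// A sketch of a per-instance bulkhead: cap concurrent calls to a shared
// dependency (e.g., the database) so scale-out cannot overload it unbounded.
public class DatabaseBulkhead {
    private final Semaphore permits = new Semaphore(20); // illustrative limit

    public <T> T call(java.util.function.Supplier<T> query) throws InterruptedException {
        if (!permits.tryAcquire(200, TimeUnit.MILLISECONDS)) {
            // Fail fast instead of queueing: back-pressure beats an avalanche.
            throw new IllegalStateException("database bulkhead full");
        }
        try {
            return query.get();
        } finally {
            permits.release();
        }
    }
}
```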
Auto scaling is divided into three stages: metric acquisition, rule calculation, and scaling. Metrics are generally acquired through the monitoring system or components on the PaaS, and common basic metrics include CPU, memory, and load. Over short periods these values can be unstable, but over a longer window they normally settle into a "stable" state. When setting metrics, a reasonable threshold should be based on long-term data as a watermark rather than on short-term characteristics. In addition, there should not be too many metrics, and scale-in thresholds should differ significantly from scale-out thresholds.
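A minimal sketch of the rule-calculation stage follows: it averages CPU over a long sliding window and keeps a wide hysteresis band between the scale-out and scale-in thresholds so the system does not flap. The window size and thresholds are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A sketch of the rule-calculation stage: decide on a long-window average
// (a watermark over time), not a single spiky sample, and keep the scale-in
// threshold well below the scale-out threshold. All constants are illustrative.
public class ScalingRule {
    private static final int WINDOW = 60;          // last 60 samples (e.g., one per 10s)
    private static final double SCALE_OUT = 0.70;  // avg CPU above 70% -> add instances
    private static final double SCALE_IN  = 0.30;  // avg CPU below 30% -> remove instances

    private final Deque<Double> samples = new ArrayDeque<>();

    public String onSample(double cpu) {
        samples.addLast(cpu);
        if (samples.size() > WINDOW) samples.removeFirst();
        if (samples.size() < WINDOW) return "HOLD";  // not enough history yet
        double avg = samples.stream().mapToDouble(Double::doubleValue).average().orElse(0);
        if (avg > SCALE_OUT) return "SCALE_OUT";
        if (avg < SCALE_IN)  return "SCALE_IN";
        return "HOLD";  // the gap between thresholds is the hysteresis band
    }
}
```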
Most of the time, the quickest way people judge system availability is to look for a spinning icon on the screen, which means the system is running slowly. By common sense, a very slow system needs to be scaled out, so some of our customers directly use the system's average RT as the scale-out metric. However, a system's RT is multi-dimensional. Health checks, for example, are generally fast, and once such APIs are called slightly more often, the average RT drops. Some customers refine the metric to per-API RT, but even a single API runs different logic for different parameters, producing different RTs. In short, it is very dangerous to drive auto-scaling strategies from latency.
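A short worked example makes the pitfall concrete: with the illustrative numbers below, fast health checks pull the average RT down to roughly 42 ms even though every real user request takes 800 ms, so an RT-based rule would never fire.

```java
// A worked example of why average RT misleads: cheap health checks dominate
// the mean. All numbers are illustrative.
public class AverageRtPitfall {
    public static void main(String[] args) {
        int healthChecks = 950;  double healthRt = 2;    // ms per health check
        int userRequests = 50;   double userRt   = 800;  // ms, already far too slow
        double avg = (healthChecks * healthRt + userRequests * userRt)
                   / (double) (healthChecks + userRequests);
        // avg = 41.9 ms: the system looks healthy while real users wait 800 ms.
        System.out.printf("average RT = %.1f ms%n", avg);
    }
}
```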
Scale-out specifications are the resource specifications of the new instances. On the cloud, for example, the same 4c8g shape may be offered as a memory instance, a compute instance, or a network-enhanced instance. The cloud is a large resource pool, but a specific specification can sell out; if only a single specification is configured, the resources may not be available and the scale-out will fail. The most dangerous part is not the scale-out failure itself, but that troubleshooting the resulting business failure is particularly time-consuming.
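The usual mitigation is to configure several interchangeable specifications and fall back through them in order. The sketch below expresses that logic; `Provisioner` is a hypothetical interface, and the spec names are illustrative placeholders rather than a vetted list of equivalent shapes.

```java
import java.util.List;

// Hypothetical provisioning abstraction, not an actual cloud SDK call.
interface Provisioner {
    boolean tryProvision(String spec, int count);  // false if the pool is sold out
}

// A sketch of spec fallback: request capacity across several interchangeable
// instance specifications instead of pinning one that may be sold out.
public class SpecFallback {
    // Interchangeable variants in order of preference; names are illustrative.
    private static final List<String> SPECS =
            List.of("spec-4c8g-compute", "spec-4c8g-memory", "spec-4c8g-network");

    public static boolean scaleOut(Provisioner p, int count) {
        for (String spec : SPECS) {
            if (p.tryProvision(spec, count)) return true;  // first available spec wins
        }
        return false;  // every variant sold out: alert loudly, do not fail silently
    }
}
```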
It is often simple to sort out a single application but difficult to sort out the whole business scenario. A practical approach is to sort out inter-application call scenarios, of which there are generally three types: synchronous calls (RPC, with middleware such as Spring Cloud), asynchronous calls (messages, with middleware such as RocketMQ), and task calls (distributed scheduling, with middleware such as SchedulerX). The first is usually sorted out quickly, but the latter two are easy to overlook, and when problems occur in them, troubleshooting and diagnosis are the most time-consuming.
Auto scaling is a typical background task, and when managing the background tasks of a large cluster, it is better to have a dashboard for intuitive, visual management. A failed scale-out must not be handled silently: if a core business fails to scale out, it may cause a direct business failure, yet when that failure occurs, people rarely think to check whether the scaling strategy took effect, which makes the root cause difficult to trace.
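The sketch below shows one way to keep scaling actions from failing silently: every attempt is recorded as an event for a dashboard, and failures raise an alert immediately. `ScaleEvent` and `AlertClient` are hypothetical names to be wired to real monitoring and paging channels.

```java
import java.time.Instant;

// A sketch of making scaling actions visible rather than silent. ScaleEvent and
// AlertClient are hypothetical; the point is that every attempt is recorded and
// every failure raises an alert that can be correlated with business faults.
public class ScalingAudit {
    interface AlertClient { void page(String message); }  // wire to a real channel

    record ScaleEvent(Instant at, String app, int from, int to,
                      boolean ok, String reason) {}

    private final AlertClient alerts;

    public ScalingAudit(AlertClient alerts) { this.alerts = alerts; }

    public void record(ScaleEvent e) {
        // Always emit the event so a dashboard can show scaling history per app.
        System.out.println(e);
        if (!e.ok()) {
            // A failed scale-out of a core service is an incident, not a log line.
            alerts.page("scale-out FAILED for " + e.app() + ": " + e.reason());
        }
    }
}
```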
Although cloud computing provides an almost endless resource pool for auto scaling, it only relieves users of the work of preparing resources. A microservice system is complex, and a capacity change in a single component affects the whole call chain: after one risk is averted, the system's bottleneck may migrate, and invisible constraints gradually surface as capacity changes. A good auto-scaling strategy therefore needs end-to-end stress testing and verification. We also recommend understanding the various high-availability techniques across multiple dimensions in advance, so that multiple contingency plans are ready for use.
Auto scaling capabilities are richer in cloud-native scenarios, and the metrics available for scaling can be more business-specific as well. An application PaaS (such as Enterprise Distributed Application Service (EDAS) or Serverless App Engine (SAE)) helps combine cloud vendors' basic capabilities in computing, storage, and networking to reduce the cost of using the cloud. There are still challenges for business applications, such as statelessness and decoupling configuration from code; from a broader perspective, these are the challenges facing application architecture in the cloud-native era. The more cloud-native an application becomes, the more of the cloud's technology dividend becomes available to us.