Serverless App Engine (SAE) provides auto scaling policies that help you handle traffic changes and keep your business stable. After you configure an auto scaling policy for an application, the application is automatically scaled out during peak hours and scaled in during off-peak hours. The auto scaling feature is maintenance-free and provides high reliability at low cost. For example, to keep your platform stable and respond quickly to customer requests during e-commerce promotions, you can deploy applications on SAE and configure auto scaling policies. SAE lets you monitor and modify the policies in real time, so that you can carry out subsequent O&M optimization efficiently.
Process of configuring auto scaling
The following figure shows the process of configuring SAE auto scaling.
Limits
The auto scaling feature applies only to microservices applications.
Preparations
Configure application health checks. This configuration ensures the availability of applications. An application can receive traffic only after the application instances are started, running, and ready. For more information, see Configure application health checks.
Configure application lifecycle management. This ensures that an application can be gracefully shut down as expected when a scale-in operation is performed. To configure application lifecycle management, configure PreStop Settings. For more information, see Configure application lifecycle management.
Configure the exponential retry mechanism. Service calls may fail if scaling lags behind traffic, if an application starts slowly, or if an application is not started or shut down gracefully. To prevent these errors, you can configure the Java exponential retry mechanism.
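The retry behavior described above can be sketched as follows. This is only an illustration of exponential backoff with jitter, written in Python for brevity; it is not the Java mechanism that SAE refers to. The function name retry_with_backoff and all of its parameters are hypothetical.

```python
import random
import time


def retry_with_backoff(call, max_attempts=5, base_delay=0.1, max_delay=2.0,
                       sleep=time.sleep, rng=random.random):
    """Run call() until it succeeds or max_attempts is reached.

    Before retry n (1-based), wait a random time in
    [0, min(max_delay, base_delay * 2**(n - 1))]. The random factor
    ("jitter") keeps many clients from retrying at the same instant.
    All names and defaults here are illustrative assumptions.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            # A real client should retry only transient errors
            # (timeouts, connection refused), not every exception.
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(cap * rng())
```

The sleep and rng parameters exist only so that the delays can be observed or replaced in tests. With the defaults, the upper bound of the wait doubles from 0.1 s per failed attempt until it reaches the 2 s cap.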
Optimize the application startup speed.
Optimize software packages. Shorten the application startup time by reducing the impact of external factors such as class loading and caching.
Optimize images. You can reduce the size of an image to reduce the amount of time that is required for image pulling when instances are created. You can use an open source tool to analyze and simplify image layer information.
Optimize Java application startup. When you create a Java application in the SAE console, select the Dragonwell 11 environment and enable the application startup acceleration feature. For more information, see Configure startup acceleration for a Java application.
Configure an auto scaling policy
Configure auto scaling metrics
SAE allows you to flexibly configure multiple metrics in the basic monitoring and application monitoring modules based on the attributes of the current application, such as CPU sensitivity, memory sensitivity, or I/O sensitivity.
You can view the historical data of the metrics in the Basic Monitoring and Application Monitoring modules, such as the peak values and the P95 or P99 values over the previous 6 hours, 12 hours, 1 day, or 7 days, and use the data to estimate the expected values of the metrics. To estimate the peak capacity of an application, you can use a stress testing tool, such as Performance Testing (PTS), to find out how many concurrent requests the application can handle, how many CPU cores and how much memory the application requires, and how the application responds under high load.
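If you export metric samples for your own analysis, the peak, P95, and P99 values can be estimated with a few lines of code. The sketch below uses the nearest-rank definition of a percentile; the monitoring modules may use a different definition, and the function name percentile is hypothetical.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]
```

For example, for CPU utilization samples collected over the previous 7 days, percentile(samples, 95) gives a value that is exceeded in at most 5% of the samples. Such a value is often a steadier basis for an expected metric value than the absolute peak, which max(samples) returns.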
When you configure an auto scaling policy, you must consider the following factors:
The weights of business availability and costs. Specify the expected value for the related metric based on the weights. Examples:
To configure an availability optimization policy, set the metric value to 40%.
To configure an availability and cost balancing policy, set the metric value to 50%.
To configure a cost optimization policy, set the metric value to 70%.
The dependencies on upstream and downstream services, middleware, and databases. Configure auto scaling policies or throttling and degradation methods for these dependencies to ensure end-to-end availability when scale-out operations are performed.
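To see how the expected metric values of 40%, 50%, and 70% trade availability against cost, consider a proportional scaling rule of the kind that the Kubernetes Horizontal Pod Autoscaler uses. The source does not state which algorithm SAE applies, so the sketch below is an assumption for illustration only, and the function name desired_instances is hypothetical.

```python
import math


def desired_instances(current_instances, current_util, target_util,
                      min_instances, max_instances):
    """Proportional rule (an assumption, not SAE's documented behavior):
    scale the instance count by current_util / target_util, round up,
    then clamp the result to [min_instances, max_instances]."""
    if target_util <= 0:
        raise ValueError("target_util must be positive")
    raw = math.ceil(current_instances * current_util / target_util)
    return max(min_instances, min(max_instances, raw))
```

Under this rule, an application that runs 4 instances at 60% CPU utilization with limits of 2 and 50 instances is scaled to 6 instances at a 40% target, to 5 instances at a 50% target, and stays at 4 instances at a 70% target. A lower target keeps more headroom per instance at the price of more instances.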
After you configure the settings, you can modify the auto scaling policy based on monitoring data to reduce the difference between the specified capacity and the actual load of the application. For information about how to view monitoring data, see Basic monitoring.
Configure memory metrics
Runtime optimization for Java applications releases physical memory and strengthens the correlation between memory metrics and business load. You can add a Java Virtual Machine (JVM) parameter in a Dragonwell runtime environment to enable ElasticHeap, which provides scaling capability for Java heap memory. This reduces the physical memory that a running Java application actually uses. For more information about ElasticHeap, see G1ElasticHeap.
We recommend that you select a Dragonwell environment and enable ElasticHeap periodic uncommit to automatically uncommit memory. For more information, see Procedure and Configure a startup command.
Java environment: In the Configure JAR Package section, select a Dragonwell environment from the Java Environment drop-down list.
JVM parameter: In the Startup Command Settings section, enter -XX:+ElasticHeapPeriodicUncommit.
Memory metrics are not suitable for applications that manage memory dynamically, for example by using the JVM memory management tool, the glibc malloc memory allocator, or explicit free operations. If an application does not release idle memory to the operating system in a timely manner, the physical memory consumed by its instances and the average memory consumed by new instances do not decrease in real time. As a result, scale-in operations cannot be performed.
Configure instances
Specify the minimum number of instances
Set the minimum number of instances to a value that is greater than or equal to 2, and specify vSwitches in multiple zones for your application. Otherwise, if an underlying node exception occurs, instances may be evicted or a zone may have no available instances, and the application stops. We recommend that you configure appropriate settings to prevent this issue.
Specify the maximum number of instances
Set the maximum number of instances to a value that is less than or equal to the number of available IP addresses in the zone. If the maximum number of instances exceeds the number of available IP addresses, instances cannot be added to the application. We recommend that you specify an appropriate value to prevent this issue.
You can view the number of available IP addresses for the current application in the Application Information section of the Basic Information page. If the available IP addresses are insufficient, use another vSwitch or add more vSwitches. For more information, see Verify an auto scaling policy.
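The two recommendations in this section can be expressed as a simple pre-flight check. The sketch below is a hypothetical helper, not an SAE API; it assumes that you have read the number of available IP addresses from the Basic Information page and pass it in yourself.

```python
def check_instance_bounds(min_instances, max_instances, available_ips):
    """Return a list of warnings for an instance range.

    An empty list means that the range follows the recommendations in
    this section. All names here are illustrative assumptions.
    """
    problems = []
    if min_instances < 2:
        problems.append("minimum number of instances should be at least 2")
    if max_instances < min_instances:
        problems.append("maximum number of instances is below the minimum")
    if max_instances > available_ips:
        problems.append(
            "maximum number of instances exceeds the available IP addresses; "
            "use another vSwitch or add more vSwitches"
        )
    return problems
```

For example, check_instance_bounds(2, 10, 20) returns an empty list, whereas a range of 1 to 30 instances with 20 available IP addresses produces two warnings.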
Observe the scaling process
Maximum instances after scaling
On the Overview page, you can view the applications for which auto scaling policies are enabled and monitor the applications whose number of instances reaches the upper limit. You can re-evaluate the configurations of the auto scaling policies based on the data.
If you want to add more than 50 instances to an application, join the DingTalk group 32874633 to add your account to the whitelist.
Zone rebalancing
After a scale-in operation is performed based on an auto scaling policy, the application instances may not be evenly allocated to zones. On the Basic Information page of the application, you can view the zone to which each instance belongs. If the instances are not evenly allocated, you can restart an instance to perform zone rebalancing.
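Whether the instances are evenly allocated can be judged from the per-zone instance counts. The following sketch assumes that you have copied the zone of each instance from the Basic Information page into a list; the function name zone_imbalance and the zone names in the example are hypothetical.

```python
from collections import Counter


def zone_imbalance(instance_zones, zones):
    """Return the difference between the largest and smallest per-zone
    instance counts. Zones that currently have no instance count as 0."""
    counts = Counter({zone: 0 for zone in zones})
    counts.update(instance_zones)
    if not counts:
        return 0
    return max(counts.values()) - min(counts.values())
```

For example, three instances in zone-a and one in zone-b give an imbalance of 2. A result of 0 or 1 means that the allocation is as even as the instance count allows; a larger value suggests that restarting an instance for zone rebalancing may be worthwhile.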
Configure automatic resumption for auto scaling policies
When you run a change order, such as an application deployment, SAE disables the auto scaling policy of the application to prevent conflicts between the two operations. If you want the auto scaling policy to resume after the change order is complete, select Automatic on the Deploy Application page.
Maintain auto scaling policies
View application events
On the Application Events page of an application, you can view the actions taken by the related auto scaling policy, including the scaling time and the scaling type. You can use this information to evaluate the effectiveness of the auto scaling policy and modify the policy based on your business requirements. For more information, see View application events.
View the monitoring chart of application instances
On the Basic Information tab of the Basic Information page of an application, you can view the Application Instance Trend Chart. The chart displays the metric data of the previous seven days, including the CPU utilization, memory usage, number of active TCP connections, number of service requests, and average response time. For more information, see View the metrics of application instances (invitational preview).