By Li Wei (Muyuan), a core R&D staff member of Cloud-Native AnalyticDB for MySQL with ten years of R&D experience in data warehouses, data lakes, big data, and cloud-native technology. He currently focuses on cloud-native data warehouses and serverless elasticity.
Economic growth is slow and market demand is sluggish worldwide. For enterprises, an effective way to reduce costs is to strengthen their digital infrastructure and improve operational efficiency. In this context, the cloud-native data warehouse AnalyticDB for MySQL (data lakehouse edition) lets users consume resources flexibly and on demand.
The following figure shows the data lakehouse edition. The orange parts are functions that are new in the data lakehouse edition compared with the data warehouse edition, and the gray parts are functions that have been iteratively upgraded from the data warehouse edition.
Compared with serverful computing, serverless computing is optimized in the following three aspects (from the Berkeley paper [1]):
When you use a cloud-native data warehouse service, have you encountered the following problems and expected the service provider to help you solve them?
Case 1: The mixed SQL load of the service includes short queries and offline ETL. When offline ETL is running, the response time of short queries is affected.
Case 2: In order to run a large offline SQL statement, the instance is scaled out. When the offline SQL statement is not running, the resources of the instance are wasted.
Case 3: During the peak period of the online load, you need to scale out the instances manually, which is error-prone.
Case 4: In emergencies, the instances may fail to be scaled out due to the insufficient underlying resources caused by load increases.
Case 5: The instance elasticity efficiency is low, and the startup time is significant relative to the time the service resources are actually used.
The AnalyticDB for MySQL team continues to focus on issues related to Serverless elasticity. By turning this technology into product capabilities, it aims to help enterprises build a digital infrastructure with better Serverless elasticity.
Serverless elasticity is used to help users solve the problems above. At the same time, it poses corresponding challenges to AnalyticDB for MySQL in terms of scheduling, cost, inventory, and elasticity efficiency. For example:
These challenges have been gradually solved by AnalyticDB for MySQL. We're also trying to share the technologies used in the process so everyone can use the related product capabilities of AnalyticDB for MySQL to meet business needs.
To provide Serverless elastic product capabilities, the architecture needs two foundations: a fine-grained definition of the elastic unit, and an end-to-end pooled scheduling architecture that covers engines, resource scheduling, and resource inventory.
The ACU is introduced as the normalized resource unit for measuring the usage of elastic resources: 1 ACU is approximately equal to 1 core and 4 GB of memory. Because 1 ACU is a small unit, AnalyticDB for MySQL can scale at a very fine granularity, which helps users keep costs as low as possible.
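As a rough illustration of how a resource request maps onto ACUs, the sketch below uses only the ratio stated above (1 ACU is approximately 1 core and 4 GB). The function name `acus_needed` is hypothetical, and rounding up to whole ACUs is an assumption, not a documented billing rule.

```python
import math

# From the text: 1 ACU is approximately 1 core and 4 GB of memory.
ACU_CORES = 1
ACU_MEMORY_GB = 4


def acus_needed(cores: float, memory_gb: float) -> int:
    """Smallest whole number of ACUs that covers both the CPU and the memory demand.

    Rounding up to whole ACUs is an assumption made for this sketch.
    """
    by_cpu = math.ceil(cores / ACU_CORES)
    by_memory = math.ceil(memory_gb / ACU_MEMORY_GB)
    return max(by_cpu, by_memory)


print(acus_needed(6, 16))  # CPU-bound request: 6
print(acus_needed(2, 30))  # memory-bound request: 8
```

The dominant dimension decides the result: a memory-heavy job needs more ACUs than its core count alone would suggest.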
A traditional ECS-based architecture with exclusive deployment struggles to guarantee elastic inventory, elasticity efficiency, and fine-grained resource elasticity at the same time. In the data lakehouse edition of AnalyticDB for MySQL, resource scheduling is built on ACK/Kubernetes, and resource pools use a two-level inventory (fixed and elastic). The overall architecture can be divided into three layers:
In terms of product capabilities, the Serverless elasticity of AnalyticDB for MySQL is practical (two-level inventory assurance), fast (high elasticity efficiency), and accurate (matched to business needs without wasting resources).
Both query-level elasticity for offline loads and node-level elasticity for online instances require inventory assurance. If a batch of machines is stocked up just to meet elasticity requirements, the AnalyticDB for MySQL service bears a heavy inventory cost whenever user resources are scaled in. To guarantee the supply of elastic resources while minimizing the cost to AnalyticDB for MySQL, a two-level elastic inventory supply capability based on user profiles is established.
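The idea of a two-level inventory (fixed and elastic) can be sketched as follows. This is a minimal illustration under assumptions, not the AnalyticDB for MySQL implementation: the class `TwoLevelInventory`, its fields, and the "fixed pool first, elastic pool second" policy are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TwoLevelInventory:
    """Hypothetical model of a resource pool with a fixed and an elastic level."""

    fixed_free: int    # ACUs free in the pre-provisioned (fixed) pool
    elastic_free: int  # ACUs obtainable on demand from the elastic pool

    def allocate(self, acus: int) -> dict:
        """Serve a request from the fixed pool first, then spill into the elastic pool."""
        if acus > self.fixed_free + self.elastic_free:
            raise RuntimeError("insufficient inventory")
        from_fixed = min(acus, self.fixed_free)
        from_elastic = acus - from_fixed
        self.fixed_free -= from_fixed
        self.elastic_free -= from_elastic
        return {"fixed": from_fixed, "elastic": from_elastic}


inventory = TwoLevelInventory(fixed_free=100, elastic_free=500)
print(inventory.allocate(80))  # served entirely from the fixed pool
print(inventory.allocate(50))  # 20 ACUs from the fixed pool, 30 from the elastic pool
```

Keeping the fixed pool small limits the standing inventory cost, while the elastic pool absorbs bursts that the fixed pool cannot cover.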
Besides inventory supply, elasticity efficiency is the other key technology behind load elasticity. If an offline query takes ten minutes to start, the user experience suffers and the additional cost is high. In the on-demand resource mode for offline queries, AnalyticDB for MySQL can provision resources for queries at the scale of 1,200 ACUs in only about 10 seconds. To achieve this efficiency, the AnalyticDB for MySQL team made end-to-end optimizations ranging from the query execution model to pod storage and pod networking.
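To make the impact of startup time concrete, the sketch below computes the share of resource-holding time that is spent starting up rather than running. The two startup figures (ten minutes and about 10 seconds) come from the text; the five-minute query duration and the function name `startup_overhead` are assumptions chosen for illustration.

```python
def startup_overhead(startup_s: float, run_s: float) -> float:
    """Fraction of the total resource-holding time spent on startup."""
    return startup_s / (startup_s + run_s)


RUN_SECONDS = 300  # assumption: a five-minute offline query

# Ten-minute startup: two thirds of the time goes to waiting.
print(round(startup_overhead(600, RUN_SECONDS), 3))  # 0.667

# About 10 seconds of startup: roughly 3% overhead.
print(round(startup_overhead(10, RUN_SECONDS), 3))   # 0.032
```

The shorter the query, the more the startup time dominates, which is why fast elasticity matters most for query-level, on-demand resources.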
After online and offline loads are decoupled, offline loads can achieve extreme resource elasticity at the query level. Online queries, however, have strict response time (RT) requirements, so instance node elasticity is recommended to cope with load changes. On-demand elasticity for the online load of AnalyticDB for MySQL is automated by a closed feedback loop: load awareness → inventory supply → instance elasticity.
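One iteration of such a feedback loop (load awareness → inventory supply → instance elasticity) might look like the sketch below. It is not the AnalyticDB for MySQL algorithm: the function names, the 60% utilization target, and the node limits are hypothetical, and the sizing rule is the proportional formula used by the Kubernetes Horizontal Pod Autoscaler, swapped in here for illustration.

```python
def desired_nodes(current_nodes: int, cpu_pct: int, target_pct: int = 60,
                  min_nodes: int = 1, max_nodes: int = 64) -> int:
    """Load awareness: size the instance so utilization approaches target_pct.

    Uses ceil(current * observed / target) with integer arithmetic.
    """
    wanted = -(-(current_nodes * cpu_pct) // target_pct)  # integer ceiling division
    return max(min_nodes, min(max_nodes, wanted))


def elasticity_step(current_nodes: int, cpu_pct: int, free_inventory_nodes: int) -> int:
    """Return the node count after one pass of the loop."""
    wanted = desired_nodes(current_nodes, cpu_pct)
    if wanted > current_nodes:
        # Inventory supply: a scale-out is capped by what the inventory can provide.
        granted = min(wanted - current_nodes, free_inventory_nodes)
        return current_nodes + granted
    # Instance elasticity: scale in (or stay put) when the load is at or below target.
    return wanted


print(elasticity_step(4, 90, free_inventory_nodes=10))  # scale out: 6
print(elasticity_step(4, 90, free_inventory_nodes=1))   # inventory-limited: 5
print(elasticity_step(4, 30, free_inventory_nodes=10))  # scale in: 2
```

The second call shows why inventory assurance sits inside the loop: without enough free inventory, the instance cannot reach the size the load calls for.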
A hybrid load on AnalyticDB for MySQL includes both online analysis and offline ETL analysis. In an architecture where offline loads are not decoupled from the online load, online queries and offline analysis tasks run on the same compute nodes, which causes two problems:
To solve this problem, AnalyticDB for MySQL supplies elastic resources at the level of individual offline queries. The resources required for offline queries are completely isolated from online resources, so online loads are not affected. Resources for offline queries are requested on demand, so users do not have to pay for resources that are kept running permanently.
AnalyticDB for MySQL is equipped with the four basic technical capabilities above to support Serverless elasticity. In the future, AnalyticDB for MySQL will continue to strengthen its technical construction in terms of intelligence, speed, and cost-saving.
Based on the preceding technologies, the Serverless elasticity of AnalyticDB for MySQL can bring about the following effects:
AnalyticDB for MySQL is suitable for users who need low-cost offline ETL processing together with high-performance online analysis to support BI reports, interactive queries, and applications.
[1] The Berkeley Paper: https://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-3.pdf