By Ning Qu
This article describes the business scenarios and resource requirements of modern cloud data warehouses and compares different resource delivery modes. It then introduces best practices for Serverless MaxCompute based on its features and value.
The following figure shows the Serverless architecture of MaxCompute, which consists of four main modules: the ingest service, the multi-computing environment, the storage service, and management.
The main features of each module are as follows:
Serverless ingest service
Serverless multi-computing environment
Serverless storage service
Serverless management
The above is a brief introduction to the Serverless architecture. The focus of this article is how to use the Serverless computing resources of MaxCompute to meet the requirements of data warehouses.
The following figure shows the logical model for managing and using MaxCompute computing resources. A project in MaxCompute is a logical isolation unit of a data warehouse, and different projects can be created for different management objectives. For example, a test project and a development project can be created separately. The two projects have independent, unrelated data and permission management systems, so they are managed in isolation. However, isolating data and permissions is not enough, because computing tasks must also be bound to computing resources. Each project is bound to a payment method, so different payment methods can be set for different projects as needed, and different isolated spaces use different computing resources.
Under this model, MaxCompute has some unique characteristics. First, it is a multi-tenant environment: multiple isolated data warehouse spaces can be created in MaxCompute according to different management requirements. Enterprises can also purchase multiple groups of logical computing resources, each serving as the computing resources of an isolated environment, to better meet the needs of different scenarios.
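The binding between projects and computing resources can be pictured with a small data model. This is only an illustrative sketch: the class names, project names, and pool names are hypothetical and are not part of any MaxCompute API.

```python
from dataclasses import dataclass
from enum import Enum


class BillingMode(Enum):
    PAY_AS_YOU_GO = "pay-as-you-go"
    SUBSCRIPTION = "subscription"


@dataclass(frozen=True)
class ComputeResource:
    """A group of computing resources paid for in one billing mode (hypothetical)."""
    name: str
    billing_mode: BillingMode


@dataclass
class Project:
    """A logical isolation unit with its own data and permissions (hypothetical)."""
    name: str
    resource: ComputeResource  # the computing resource this project's jobs run on


def resources_by_project(projects):
    """Map each project name to the name of its bound computing resource."""
    return {p.name: p.resource.name for p in projects}


# Assumed example: development and test share an on-demand pool,
# while production is bound to a purchased, fixed-size pool.
on_demand = ComputeResource("shared_on_demand_pool", BillingMode.PAY_AS_YOU_GO)
reserved = ComputeResource("reserved_pool", BillingMode.SUBSCRIPTION)

projects = [
    Project("dev_project", on_demand),
    Project("test_project", on_demand),
    Project("prod_project", reserved),
]
binding = resources_by_project(projects)
```

In this sketch, data isolation comes from the project and resource isolation comes from the binding, so changing a project's billing mode only changes which resource it points to.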
As shown in the following figure, in the ideal Serverless resource model, resource usage is planned so that it closely tracks actual requirements (the black line).
However, customers have different resource requirements and diversified demand scenarios. The main scenarios are as follows:
In these scenarios, big data computing does not always call for computing resources in a purely Serverless on-demand allocation mode. Demands differ from stage to stage, and different types of demand place different requirements on computing resources.
Main features of computing resource demands are as follows:
Business agility requirements
Significant differences between periodic peaks and valleys
Stable business with focus on the SLA-based output of critical jobs
Resource governance: Computing power demands become stably predictable rather than rapidly changing
In general, the ultimate goal is to minimize the cost of computing power while meeting these differentiated real-world needs.
How does MaxCompute Serverless meet the demands of these scenarios? For an enterprise whose business is developing and changing rapidly, the on-demand computing resources of MaxCompute Serverless are recommended. From a management point of view, different projects can be created for isolation, such as a development and test environment and a production environment.
Some analysts often need to run ad hoc exploration or machine learning analysis on detailed data. These jobs can cause sudden and very large demands for computing power. Because they are infrequent but analyze massive amounts of data, they should be isolated from other environments.
Environments can also be isolated by organization. For example, a large enterprise can isolate environments by department, so that each department works as an independent organization in its own isolated environment. Because departments require relatively independent data and computing resources, the Serverless on-demand allocation mode is a good fit. With this mode, enterprises do not need capacity planning and can start with pay-as-you-go billing. The large resource pool meets the resource demands of the various departments and avoids resource competition.
In summary, Serverless fits a variety of job scenarios. For a single job, it can meet the resource demands of jobs of different scales. For concurrent jobs, it can meet the combined resource demands and avoid resource competition. For cost control, MaxCompute supports cost estimation and can keep spending in check by preventing costly jobs from running. In these ways, MaxCompute Serverless can greatly improve business agility and accelerate value realization.
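The idea of estimating a job's cost before it runs and blocking jobs that exceed a limit can be sketched as follows. The cost formula (input size multiplied by a complexity factor and a unit price), the function names, and all numbers are assumptions made for illustration; they are not taken from this article and should be checked against the current pricing documentation.

```python
def estimate_cost(input_gb: float, complexity: float, unit_price: float) -> float:
    """Estimate the cost of one job.

    Assumed model: cost = scanned input (GB) x complexity factor x unit price.
    """
    return input_gb * complexity * unit_price


def admit_job(input_gb: float, complexity: float,
              unit_price: float, cost_limit: float):
    """Return (allowed, estimated_cost); a job over the limit is not allowed."""
    cost = estimate_cost(input_gb, complexity, unit_price)
    return cost <= cost_limit, cost


# Hypothetical numbers: a 100 GB scan passes the limit, a 10,000 GB scan is blocked.
small_ok, small_cost = admit_job(100, 1.5, 0.25, cost_limit=50.0)
large_ok, large_cost = admit_job(10_000, 1.5, 0.25, cost_limit=50.0)
```

The check runs before any resources are consumed, which is what makes it a cost control measure rather than a reporting feature.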
In addition, some enterprises prefer a relatively stable resource pool for their daily management, because they already have resource planning and resource governance capabilities. In this case, they can purchase resources of a fixed specification and isolate the environments by function or organization. They can then use the quota group management capability provided by MaxCompute to divide the resources into multiple resource groups. These resource groups can meet the demands of different businesses and organizations while keeping costs predictable.
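Dividing a purchased pool into resource groups can be sketched as a simple allocation check. The function name, group names, and compute unit (CU) figures below are hypothetical; the sketch only illustrates that the groups must fit inside the fixed pool, not how MaxCompute implements quota groups.

```python
def allocate_quota_groups(total_cu: int, requested: dict) -> dict:
    """Split a fixed pool of compute units (CU) into named quota groups.

    Raises ValueError if the groups ask for more than the pool holds.
    Any remainder is reported under the key "unallocated".
    """
    used = sum(requested.values())
    if used > total_cu:
        raise ValueError(
            f"requested {used} CU exceeds purchased pool of {total_cu} CU"
        )
    groups = dict(requested)
    groups["unallocated"] = total_cu - used
    return groups


# Assumed example: a 100 CU pool shared by an ETL group and a BI group.
groups = allocate_quota_groups(100, {"etl": 60, "bi": 30})
```

Because the pool size is fixed, the total spend stays the same regardless of how the groups are resized, which is what keeps costs predictable in this mode.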
The key technical features of this mode are as follows:
The third scenario is balancing cost against business agility. Data platform managers often face a mix of jobs. Daily jobs usually require fixed-size resources with controllable and predictable costs. Key jobs need additional computing power, and extra cost is acceptable in order to accelerate them for business needs. Exploration jobs are run by data scientists.
Exploration jobs should not interfere with production jobs, yet data scientists still need powerful computing resources to test business assumptions quickly. These jobs can be assigned to an on-demand resource pool. Some enterprises also have innovative businesses that need a new environment for data development and application innovation. An isolated data warehouse environment can be created for them, with resources allocated on demand, to help them verify business assumptions quickly.
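The mapping from job type to resource pool described above can be written down as a small routing table. The job kinds follow the text, while the pool names and the function are hypothetical and serve only to make the policy concrete.

```python
from enum import Enum


class JobKind(Enum):
    DAILY = "daily"              # routine jobs with predictable cost
    KEY = "key"                  # jobs that may pay extra to finish faster
    EXPLORATION = "exploration"  # ad hoc jobs run by data scientists


# Hypothetical pool names; the policy mirrors the scenario in the text.
ROUTING = {
    JobKind.DAILY: "subscription_pool",
    JobKind.KEY: "subscription_pool_plus_extra_capacity",
    JobKind.EXPLORATION: "on_demand_pool",
}


def choose_pool(kind: JobKind) -> str:
    """Return the resource pool a job of the given kind should run on."""
    return ROUTING[kind]


pool_for_exploration = choose_pool(JobKind.EXPLORATION)
```

Keeping the policy in one table makes it easy to review which job types are allowed to incur on-demand costs.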
Two capabilities are provided for users:
The three scenarios above all occur in daily operations. A fourth scenario arises when the business stabilizes after a customer has used pay-as-you-go resources for some time, and the customer wants to move the project to a fixed resource pool in subscription mode. The customer then needs a way to assess the resource demand, because no estimate was required in pay-as-you-go mode. MaxCompute provides the capacity planning feature to solve this problem. It uses Information Schema to estimate the overall computing power demand of a project based on historical computing power consumption.
The key information includes:
Based on the preceding information, MaxCompute can predict the computing power demands of the business and conduct capacity planning based on certain rules. For more information about this feature, see the relevant articles in the Alibaba Cloud community.
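One simple way to turn historical consumption into a purchase size is a percentile rule. The sketch below is an assumption for illustration, not the rule MaxCompute applies: the input is a hypothetical list of hourly compute unit (CU) usage samples, and the percentile and purchase step are made-up parameters.

```python
import math


def recommend_cu(hourly_cu_usage, percent: int = 95, step: int = 10) -> int:
    """Recommend a subscription size from historical hourly CU usage.

    Uses the nearest-rank percentile of the samples, then rounds up to a
    multiple of `step` (an assumed minimum purchase increment).
    """
    if not hourly_cu_usage:
        raise ValueError("at least one usage sample is required")
    ordered = sorted(hourly_cu_usage)
    # Nearest-rank percentile: smallest value covering `percent` of samples.
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    demand = ordered[rank - 1]
    return math.ceil(demand / step) * step


# Hypothetical samples with one large spike (180 CU).
usage = [20, 35, 40, 42, 45, 50, 55, 60, 180, 48]
steady_size = recommend_cu(usage, percent=90, step=10)
peak_size = recommend_cu(usage, percent=100, step=50)
```

Sizing to a percentile below 100 deliberately leaves rare spikes uncovered, which matches the combined approach in which occasional peaks are served by pay-as-you-go resources.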
This article describes how to use Serverless to better manage resources and meet the resource demands of different businesses at low cost. It can be summarized as follows:
(1) The pay-as-you-go billing mode suits fast-growing businesses and fast-changing demands. With the cost control features of MaxCompute, costs can be kept in check while the computing power needs of the business are met.
(2) For subscription resources, quota management divides the computing resources into multiple groups for load isolation and time-based scaling. The baseline job priority feature provided by DataWorks and MaxCompute helps guarantee the SLA of key jobs.
(3) When pay-as-you-go and subscription modes are combined, computing resources can be chosen based on the job level. For urgent jobs, pay-as-you-go resources can meet sudden demands for computing power. For routine periodic jobs with spikes, pay-as-you-go resources can also be used, so that resources are utilized efficiently and costs are reduced.
(4) Metadata can be used to assess the computing power demand for capacity planning, which helps customers choose between the pay-as-you-go and subscription modes. Metadata can also be used to analyze resource consumption, optimize resources, reduce the number of jobs with high resource consumption, and manage resources accordingly.