Serverless Data Warehouse Exploration for Agile Data Analysis

This article explains how enterprises can move to a more agile analytics platform architecture with Serverless OLAP, reducing architectural complexity and improving analysis efficiency.

By Longcheng

Agile Cloud-Native Data Warehouse Architecture

Traditional cloud-native data warehouses require users to provision resources that run 24/7 over the long term. This is burdensome for startups that favor agile database technology, and it offers no flexible usage model for exploratory or growing businesses that need low-cost data analysis.

As more enterprises pay attention to this issue, major vendors have gradually begun to promote Serverless.

AnalyticDB for PostgreSQL in Serverless mode was commercially released in December 2022 and can help enterprises build a more modern data strategy. Once the service is activated, analysis can start as soon as data is loaded. In Serverless mode, computing resources are billed while analysis is running and cost nothing while they are idle, which can significantly reduce the cost of using data. Whether you are designing the data architecture for the entire enterprise or exploring an innovative business without disrupting the current architecture, this mode provides an efficient, lightweight data architecture service. It can help enterprises carry out low-cost exploration within minutes to hours.
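
To make the billing model concrete, here is a minimal sketch that compares an always-on warehouse with a Serverless one billed only while analysis runs. The price per ACU-hour, the ACU count, and the function names are hypothetical values chosen for illustration. They are not taken from the AnalyticDB for PostgreSQL price list, and storage costs are ignored.

```python
def always_on_cost(acus: int, price_per_acu_hour: float, hours_in_period: float) -> float:
    """Compute cost when resources are provisioned for the whole period."""
    return acus * price_per_acu_hour * hours_in_period


def serverless_cost(acus: int, price_per_acu_hour: float, busy_hours: float) -> float:
    """Compute cost when only the hours spent running analysis are billed."""
    return acus * price_per_acu_hour * busy_hours


if __name__ == "__main__":
    # Hypothetical numbers: 8 ACUs, 0.25 currency units per ACU-hour,
    # a 30-day month (720 hours), and 40 hours of actual analysis.
    provisioned = always_on_cost(acus=8, price_per_acu_hour=0.25, hours_in_period=720)
    on_demand = serverless_cost(acus=8, price_per_acu_hour=0.25, busy_hours=40)
    print(f"always-on: {provisioned:.2f}, serverless: {on_demand:.2f}")
```

The gap between the two figures grows as the share of idle time grows, which is why the model suits exploratory or intermittent workloads rather than steady ones.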

A Serverless instance can schedule resources automatically. When you create a Serverless instance, you can configure a threshold in Analytic Compute Units (ACUs). The threshold caps the computing resources the instance can use when computing is triggered. When a computation starts, the system quickly allocates the resources needed to serve it. You can view the current ACU usage in real time in the console. Capping instantaneous resource usage helps keep costs under control, and you can adjust the threshold manually as resource requirements change over time.
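
The effect of the ACU threshold can be sketched as a simple cap. This is only a model of the limit described above, not the service's actual scheduling logic, which the article does not describe. The function names and the price are hypothetical.

```python
def granted_acus(requested_acus: int, acu_threshold: int) -> int:
    """Return the ACUs a computation may use, never exceeding the threshold."""
    if acu_threshold <= 0:
        raise ValueError("acu_threshold must be positive")
    # Negative demand makes no sense, so treat it as zero.
    return max(0, min(requested_acus, acu_threshold))


def max_hourly_spend(acu_threshold: int, price_per_acu_hour: float) -> float:
    """Worst-case compute spend per hour implied by the threshold."""
    return acu_threshold * price_per_acu_hour


if __name__ == "__main__":
    threshold = 16  # hypothetical threshold chosen at instance creation
    for demand in (4, 16, 40):
        print(demand, "->", granted_acus(demand, threshold))
    print("hourly ceiling:", max_hourly_spend(threshold, price_per_acu_hour=0.25))
```

Because usage can never exceed the threshold, the worst-case spend per hour is known in advance, which is the cost-control property the threshold is meant to provide.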

Three Recommended Scenarios for Serverless

1. Build an Agile Analysis Platform to Support Flexible Data Lake Analytics (DLA) and Federated Analytics

Large amounts of data are stored in data lakes (such as OSS and ODPS), but continuous analysis of that data is not the norm. More often, the business needs to analyze small samples of the data in the lake. In this case, users can rely on AnalyticDB for PostgreSQL (ADB PG) in Serverless mode to quickly spin up a lightweight data analysis framework that supports the business and delivers cost-effective analysis without heavy IT planning.

1.  Data Lake Analytics of ADB PostgreSQL

  1. Data Analysis for ODPS: https://www.alibabacloud.com/help/en/analyticdb-for-postgresql/latest/use-maxcompute-foreign-tables-to-access-maxcompute-data
  2. Data Analysis for OSS: https://www.alibabacloud.com/help/en/analyticdb-for-postgresql/latest/analyze-data-lakes-using-oss-external-tables
  3. Data Analysis for Hadoop: https://www.alibabacloud.com/help/en/analyticdb-for-postgresql/latest/use-external-tables-for-federated-analytics-of-hadoop-data-sources

2.  Federated Analytics of ADB PostgreSQL

  1. Federated Analytics for Mainstream Databases: https://www.alibabacloud.com/help/en/analyticdb-for-postgresql/latest/use-external-tables-for-federated-analytics-of-external-sql-databases

2. Build a Read/Write Splitting Architecture

The subscription model suits stable data workloads, whose demand is predictable and steady. Users can obtain a large discount through a subscription, which makes it the best financial choice for such workloads. Analysis workloads, however, are driven more by short-term business objectives. They are exploratory, urgent, highly uncertain, and time-sensitive. This creates tension between business-side requirements and the technical architecture team's need for stability. In this case, the best approach is to quickly build a physically isolated, flexible analysis architecture.

When analysis is needed, the data of existing instances can be accessed quickly through data sharing, which provides flexible data analysis capabilities. If there are too many analysis requirements for one instance, multiple instances can be started to meet them.

3. Cost-Effective Data Archiving

Must the low-frequency analysis data generated by a production database be moved into a data lake? Serverless technology provides another option. You can use the built-in data archiving capability of DMS to archive data in the production database that is no longer used or is analyzed only infrequently, and store it at a low cost. This solution has several benefits:

  1. You can configure and run batch data archiving through the graphical console.
  2. Resources are started only during the archiving process. After data is archived, only low-cost storage is required for retention.
  3. Archived data can be analyzed at any time, and resources are charged only for the time of analysis.

Create a Serverless Instance

Next, let's walk through a quick demo to show how to use Serverless mode.

First, create a Serverless instance, which you can do for free. Select Serverless under the Pay-As-You-Go billing method and choose automatic scheduling as the Serverless mode.

After the instance is created, go to the instance console, where you can manage the instance you just created.

After the instance is created, it stays idle as long as no SQL is running. No computing fees are charged in this state.

Next, let's run a test with the pre-loaded sample data:

1.  Create a primary database account

2.  Load the sample dataset in the console and view the sample SQL statements

3.  After you log on to the database, you can execute the sample SQL statements in the analysis interface of the instance.

4.  You can go back to the Instance Details page and view instance resource usage on the Monitoring page.

5.  If the computing power does not match your workload, or you want resources to be released sooner, you can manually adjust the instance's resource threshold and cooling time. The threshold changes the maximum computing resources available to a computation at any instant, and the cooling time changes how long the instance waits after SQL computing completes before returning to the idle state. Currently, the minimum cooling time is 60 seconds.
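
The cooling time in the last step can be illustrated with a small model: the instance is treated as active from the start of a query until the cooling time has elapsed after the most recent query finished. The function name and the query timestamps are hypothetical. The article does not say whether the cooling window itself is billed, so the sketch reports active time rather than cost.

```python
MIN_COOLDOWN_SECONDS = 60  # minimum cooling time stated in the article


def active_seconds(queries, cooldown_seconds):
    """Total seconds the instance is not idle.

    queries: iterable of (start, end) times in seconds.
    The instance stays active until cooldown_seconds after the latest
    query end; a query that starts before that moment extends the window.
    """
    if cooldown_seconds < MIN_COOLDOWN_SECONDS:
        raise ValueError("cooldown_seconds is below the 60-second minimum")

    total = 0
    window_start = window_end = None
    for start, end in sorted(queries):
        if end < start:
            raise ValueError("query ends before it starts")
        release_at = end + cooldown_seconds
        if window_end is None or start > window_end:
            # The instance had already gone idle: close the previous window.
            if window_end is not None:
                total += window_end - window_start
            window_start, window_end = start, release_at
        else:
            # The query arrived during the active window: extend it.
            window_end = max(window_end, release_at)
    if window_end is not None:
        total += window_end - window_start
    return total


if __name__ == "__main__":
    sample_queries = [(0, 30), (50, 80), (500, 520)]  # hypothetical workload
    for cooldown in (60, 300):
        print(cooldown, "->", active_seconds(sample_queries, cooldown))
```

A shorter cooling time returns the instance to idle sooner after sporadic queries. A longer one keeps resources ready between queries that arrive close together.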

Summary

As we can see, leading vendors in the industry are already investing in Serverless technology. Leaving complexity to the vendor and simplicity to the customer is a principle that cloud vendors have always followed, and Serverless has become synonymous with flexibility and ease of use.

We welcome your participation and valuable suggestions to provide enterprises with easier-to-use, more flexible, and more cost-effective cloud-native data warehouse services.
