By Kong Liang (Lianyi)
A data warehouse is subject-oriented, integrated, non-volatile, and time-variant, and it is used to support management decision-making. It collects all of the enterprise data and gives every department a centralized, standardized point of access to that data.
A data warehouse (model) is a methodology of best practices for manual data collection, storage, understanding, organization, management, and decision-making. The model is independent of where it is applied and of the technology used to implement it. In the final solution, however, the logical model and the physical model are tightly integrated, so a data warehouse has to provide both business and technical capabilities.
The core features and value of a data warehouse include collection, synchronization, processing, storage, modeling, governance, and query. To deliver these capabilities, the data warehouse must be deployed, activated, and routinely maintained in an Internet Data Center (IDC). It must also be highly available, secure, and scalable. Together, these requirements make up the total cost of ownership (TCO) of the data warehouse. TCO can be broken down from three perspectives: TCO = Core capability costs + Basic costs = Product costs + Service costs = Current costs + Long-term costs + Evolution costs.
MaxCompute is an enterprise-grade data warehousing service based on the Software-as-a-Service (SaaS) model. SaaS cloud-based data warehouses provide the following characteristics:
SaaS cloud-based data warehouses free enterprises from investment in non-core capabilities, such as infrastructure construction, maintenance, and long-term evolution.
Possible Scenarios of SaaS Cloud-Based Data Warehouses:
Benefits of SaaS Cloud-Based Data Warehouses:
Recommended SaaS Cloud-Based Data Warehouse Scenarios and Product Combination:
Here, we will focus on real-time analysis scenarios:
The following figure shows the user-oriented features and data flows of cloud-based data warehouses. After you activate the MaxCompute cloud-based data warehouse service, you can use all these features.
The closer analysis and decision-making are to the data source, the more value the data delivers.
The following two analogies are used to describe the evolution of real-time analysis scenarios:
Analogy 1: A grand hotel also has a wide range of other businesses, such as providing real-time food services, to leverage the advantages of collaboration.
Evolution 1: In a data warehouse analysis scenario, real-time analysis is added to meet real-time business requirements. Real-time channels and interactive analytics are implemented alongside the data warehouse, forming a Lambda architecture.
Analogy 2: The hotel expands from real-time food services, requiring more external support and moving toward comprehensive development.
Evolution 2: In a real-time analysis scenario, a stream architecture is created that extracts data from data warehouses and plays it back together with the data sources, forming a Kappa architecture. You must then consider how to bring the real-time data and models into the data warehouse.
The two evolution scenarios are analyzed in detail below:
In the data warehouse analysis scenario, you analyze data in real time to meet real-time business requirements. Real-time channels and real-time interactive analytics are added alongside the data warehouse, forming a Lambda architecture. For example, in Internet of Things (IoT) device monitoring and analysis, data is reported and analyzed immediately after policies are delivered to a device. You can then compare the new results with previous ones to repeat the analysis and optimize the policies.
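The Lambda pattern described above can be sketched in a few lines of Python. This is a toy, in-memory illustration rather than anything specific to MaxCompute: the class name `LambdaView`, the device IDs, and the "sum of reported readings per device" metric are all assumptions made for the example. It shows the three moving parts that the architecture combines: an append-only master log, a batch view that is periodically recomputed from that log, and a speed view that covers only the events the batch view has not seen yet.

```python
class LambdaView:
    """Toy Lambda architecture: a batch view and a speed view merged at query time.

    Hypothetical example; the names and the metric are illustrative only.
    """

    def __init__(self):
        self.master_log = []   # append-only raw events: (device_id, reading)
        self.batch_view = {}   # recomputed from the whole log on each batch run
        self.speed_view = {}   # incremental totals for events since the last batch run

    def ingest(self, device_id, reading):
        """Real-time path: store the raw event and update the speed view."""
        self.master_log.append((device_id, reading))
        self.speed_view[device_id] = self.speed_view.get(device_id, 0) + reading

    def run_batch(self):
        """Offline path: rebuild the batch view from scratch, then reset the speed view."""
        rebuilt = {}
        for device_id, reading in self.master_log:
            rebuilt[device_id] = rebuilt.get(device_id, 0) + reading
        self.batch_view = rebuilt
        self.speed_view = {}

    def query(self, device_id):
        """Serving layer: combine the offline result with the real-time delta."""
        return self.batch_view.get(device_id, 0) + self.speed_view.get(device_id, 0)


view = LambdaView()
view.ingest("device-1", 3)
view.ingest("device-1", 4)
view.run_batch()             # batch view now covers both events
view.ingest("device-1", 5)   # only the speed view knows about this one
total = view.query("device-1")
```

Note that the same aggregation logic appears twice, once in `ingest` and once in `run_batch`. In a real deployment those two paths usually live in different systems, which is where the inconsistency and maintenance costs of the Lambda architecture tend to come from.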
In the real-time analysis scenario, you create a stream architecture that extracts data from data warehouses and plays it back together with the data sources, forming a Kappa architecture. You must then consider how to bring the real-time data and models into the data warehouse. For example, in fraud monitoring, you obtain the analysis result in a timely manner and associate it with tags for accurate identification. Finally, you store the real-time data in the data warehouse, where it is combined with other data to generate knowledge.
The main capability requirements of real-time analysis are listed below:
1. Application Ecosystem:
2. Rapid Query Response:
3. Real-Time Storage:
4. Data Warehouse Query Acceleration:
5. Joint Computing:
The common Lambda architecture has three major problems:
1. Inconsistency:
2. Interlocking Systems With Complex O&M and High Costs:
3. Long Development Cycle and Cumbersome Businesses:
In refined operations for search and recommendation scenarios, the capabilities of open-source solutions are fragmented. The following figure shows KVStore, Massively Parallel Processing (MPP), real-time data warehouses, and data warehouses that support multiple capabilities. We recommend using one technical solution to integrate these capabilities into one engine. For example, the storage, real-time data warehouse, interactive analytics, point query, and online analytical processing (OLAP) analysis capabilities can be integrated. MaxCompute Hologres is such a solution.
MaxCompute Hologres makes the real-time analysis architecture simple and efficient. Hologres focuses on real-time analysis and supports real-time data writing, analysis, and queries. With a cloud-native hybrid serving/analytical processing (HSAP) architecture, MaxCompute Hologres lets the same data be used for real-time analysis and online services, and it unifies real-time and offline storage. It also integrates closely with MaxCompute.
In another scenario, MaxCompute Hologres can be used as an analysis and acceleration capability module and an Application Data Service (ADS) modeling capability module for MaxCompute. No data is migrated, and data analysis is highly efficient. At the ADS layer, modeling and services are integrated, and the OLAP capability is enhanced, as shown in the following figure:
The Kappa architecture is an upgrade built on the stream architecture. It requires playing back data from the data warehouse and associating it with the stream. You also need to consider how to bring the real-time data and models into the data warehouse. Open-source real-time data warehouses are costly to run in real time, have long development cycles, and offer inflexible service support.
The Kappa architecture is optimized based on the Lambda architecture by combining real-time analysis and streaming and replacing data storage and channels with message queues. Therefore, the Kappa architecture still focuses on stream processing. However, data is stored and modeled at the data lake layer and will be played back in message queues for offline analysis or re-computing. The Kappa architecture seems simple but is difficult to implement, especially for data playback.
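The playback idea behind the Kappa architecture can be illustrated with a minimal Python sketch. It assumes an in-memory append-only list standing in for a message queue with retention; the names `EventLog`, `materialize`, `count_transactions`, and `flag_large`, as well as the 1000 threshold, are hypothetical and chosen only to echo the fraud-monitoring example. The point is that there is a single processing path: when the logic changes, the view is rebuilt by replaying the log from the beginning instead of maintaining a separate batch job.

```python
class EventLog:
    """Append-only log standing in for a message queue with retention (illustrative)."""

    def __init__(self):
        self._events = []

    def append(self, event):
        """Store an event and return its offset in the log."""
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, offset=0):
        """Return a snapshot of all events at or after the given offset."""
        return list(self._events[offset:])


def materialize(log, transform, offset=0):
    """Replay the log from `offset` and fold the transformed events into a fresh view."""
    view = {}
    for event in log.read_from(offset):
        key, value = transform(event)
        view[key] = view.get(key, 0) + value
    return view


def count_transactions(event):
    """Version 1 of the logic: count every transaction per account."""
    account, _amount = event
    return account, 1


def flag_large(event):
    """Version 2 of the logic: count only transactions above a hypothetical threshold."""
    account, amount = event
    return account, 1 if amount > 1000 else 0


log = EventLog()
for event in [("acct-a", 50), ("acct-a", 5000), ("acct-b", 20)]:
    log.append(event)

view_v1 = materialize(log, count_transactions)  # original stream job
view_v2 = materialize(log, flag_large)          # new logic, same log, full replay
```

The sketch also hints at why playback is hard in practice: `materialize` rereads every retained event, so replay cost grows with history, and the log must keep enough history to make the replay meaningful.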
MaxCompute Hologres integrates real-time, offline, analysis, and service capabilities. This allows it to support joint real-time and offline analysis and provides insight into cold, hot, and warm data, as shown in the following figure:
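As a rough illustration of joint real-time and offline analysis, the sketch below queries hot and cold data through a single SQL view. It uses Python's built-in `sqlite3` module purely so that the example runs anywhere; it is not Hologres or MaxCompute syntax, and the table names `offline_orders` and `realtime_orders`, the view `all_orders`, and the sample rows are assumptions made for the example.

```python
import sqlite3


def build_demo_connection():
    """Create an in-memory database with one cold (offline) and one hot (real-time) table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE offline_orders  (day TEXT, user_id TEXT, amount REAL);
        CREATE TABLE realtime_orders (day TEXT, user_id TEXT, amount REAL);

        -- One logical table over both tiers, tagged by where each row lives.
        CREATE VIEW all_orders AS
            SELECT day, user_id, amount, 'cold' AS tier FROM offline_orders
            UNION ALL
            SELECT day, user_id, amount, 'hot'  AS tier FROM realtime_orders;
        """
    )
    conn.executemany(
        "INSERT INTO offline_orders VALUES (?, ?, ?)",
        [("2021-03-01", "u1", 10.0), ("2021-03-02", "u2", 5.0)],
    )
    conn.executemany(
        "INSERT INTO realtime_orders VALUES (?, ?, ?)",
        [("2021-03-25", "u1", 2.5)],
    )
    return conn


conn = build_demo_connection()

# Joint analysis: one query spans historical and freshly written rows.
totals = conn.execute(
    "SELECT user_id, SUM(amount) FROM all_orders GROUP BY user_id ORDER BY user_id"
).fetchall()

# Insight by temperature tier.
rows_per_tier = conn.execute(
    "SELECT tier, COUNT(*) FROM all_orders GROUP BY tier ORDER BY tier"
).fetchall()
```

In this toy version both tiers sit in the same database file. In an integrated engine the offline tier would stay in the data warehouse and only the query layer would be shared, but the way the query is written is the same idea: the analyst addresses one logical table.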
In common real-time analysis scenarios, MaxCompute provides a solution that integrates real-time, offline, analysis, and service capabilities by using Hologres. These capabilities have been mentioned in the preceding section, including Lambda architecture simplification, interactive query enhancement, Kappa architecture enhancement, joint real-time and offline analysis, and full insight into cold, hot, and warm data.
This solution applies to data-driven operations in Internet industries, such as e-commerce, gaming, and social networking, including but not limited to intelligent recommendations, log collection and analysis, user profiling, data governance, business dashboards, and search.
VivaVideo is a short video community app for original videos with a wide range of editing features. It provides short video editing tools, including filming, editing, and tutorials. It ranks among the top five (by income) in the Google Play store and serves more than 890 million users worldwide.