By Ananda Budi Prasetya, Solution Architect Lead, Alibaba Cloud Indonesia.
Many organizations today want to serve data from several sources to their stakeholders in order to empower data-driven decision-making. However, constructing a robust data platform that serves as a single source of truth presents significant challenges.
Alibaba Cloud offers several services that can be combined into a simple data pipeline: one that ingests data from several sources into one place and lets users build a data warehouse on top of it.
In this brief blog post, we explain which essential services are needed, along with practical implementation guidance.
Let's first take a look at the high-level architecture of this simple data platform pipeline:
We can divide our architecture into several layers:
In this layer, we use a change data capture (CDC) mechanism to capture changes incrementally from the source databases and apply them to the destination database. To do this, we can use Data Transmission Service (DTS) and connect the service to our source database. DTS reads the database log and performs CDC-based migration.
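To make the CDC idea concrete, here is a minimal conceptual sketch of replaying row-level changes onto a destination table. It is not the DTS API: the `ChangeEvent` shape, the `apply_changes` helper, and the sample rows are all hypothetical and exist only to illustrate how log-ordered inserts, updates, and deletes keep a copy in sync.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ChangeEvent:
    """One row-level change read from a source database log (hypothetical shape)."""
    op: str                               # "insert", "update", or "delete"
    key: int                              # primary key of the affected row
    row: Optional[Dict[str, Any]] = None  # new row image; None for deletes


def apply_changes(target: Dict[int, Dict[str, Any]], events: List[ChangeEvent]) -> None:
    """Replay change events, in log order, onto an in-memory copy of the table."""
    for event in events:
        if event.op in ("insert", "update"):
            if event.row is None:
                raise ValueError(f"{event.op} event for key {event.key} has no row image")
            target[event.key] = dict(event.row)
        elif event.op == "delete":
            target.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")


# Hypothetical sample: two inserts, then an update and a delete.
replica: Dict[int, Dict[str, Any]] = {}
apply_changes(replica, [
    ChangeEvent("insert", 1, {"customer": "A", "amount": 100}),
    ChangeEvent("insert", 2, {"customer": "B", "amount": 50}),
    ChangeEvent("update", 1, {"customer": "A", "amount": 120}),
    ChangeEvent("delete", 2),
])
print(replica)
```

Only the changed rows travel from source to destination, which is why this style of sync can run continuously instead of reloading whole tables.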
For sources other than databases, such as spreadsheets or APIs, we can use batch ingestion to get the latest data with Data Integration, which is part of DataWorks.
To store the ingestion results, we use AnalyticDB for PostgreSQL, which acts as our data lake. It is a cloud-native, real-time data warehouse service that supports high-performance, low-latency ad hoc analysis of structured, unstructured, and semi-structured data.
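As a rough illustration of what one batch ingestion run does, the sketch below reads a spreadsheet export and stamps each row with the load time before staging it. This is not how Data Integration is configured; the `load_batch` helper, the `loaded_at` column, and the sample CSV content are assumptions made for the example.

```python
import csv
import io
from datetime import datetime, timezone
from typing import Dict, List


def load_batch(csv_text: str, batch_time: datetime) -> List[Dict[str, str]]:
    """Parse one spreadsheet export and tag every row with the batch load time."""
    reader = csv.DictReader(io.StringIO(csv_text))
    loaded_at = batch_time.isoformat()
    return [{**row, "loaded_at": loaded_at} for row in reader]


# Hypothetical sheet export with a header row and two data rows.
SHEET_EXPORT = "region,target\nJakarta,500\nBandung,300\n"

staged = load_batch(SHEET_EXPORT, datetime(2024, 1, 1, tzinfo=timezone.utc))
for row in staged:
    print(row)
```

Unlike CDC, each run here reloads a full snapshot, so the load timestamp is what lets downstream queries pick the most recent batch.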
Since our aim is to minimize the need for complex data transformation pipelines, we can build our data warehouse, or even a data mart, on the Real-time Materialized View (RMV) feature in AnalyticDB for PostgreSQL. We create the RMV with a SQL query, and that query acts as the data transformer.
With this approach, we don't need to think about scheduling transformations. Whenever the source is updated, the RMV refreshes with the latest changes, and the data is ready to be used by visualization tools or other consumers.
As stated in the previous paragraph, we can use the RMV as the reporting layer, that is, the layer connected directly to our business users. There they can view or create dashboards and reports based on flattened data that the RMV shapes for specific business needs.
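A small sketch of what such a definition might look like follows. It assumes the `CREATE INCREMENTAL MATERIALIZED VIEW` statement described in the AnalyticDB for PostgreSQL documentation; please verify the exact syntax and the supported query shapes for your instance version. The `build_rmv_ddl` helper, the `orders` table, its columns, and the view name are hypothetical.

```python
def build_rmv_ddl(view_name: str, select_sql: str) -> str:
    """Wrap a transformation query in a real-time materialized view definition.

    Assumption: the target database accepts CREATE INCREMENTAL MATERIALIZED VIEW.
    """
    return f"CREATE INCREMENTAL MATERIALIZED VIEW {view_name} AS\n{select_sql.strip()};"


# Hypothetical transformation: flatten raw orders into per-customer totals.
SALES_BY_CUSTOMER = """
SELECT customer_id,
       SUM(amount) AS total_amount,
       COUNT(*)    AS order_count
FROM orders
GROUP BY customer_id
"""

ddl = build_rmv_ddl("mart_sales_by_customer", SALES_BY_CUSTOMER)
print(ddl)
```

The generated statement would be run once against the warehouse with any PostgreSQL client; after that, the view itself carries the transformation logic.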
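The reason no schedule is needed can be shown with a toy model of incremental maintenance: each source change adjusts the stored result directly instead of recomputing it. This is a conceptual sketch only, not a description of how AnalyticDB for PostgreSQL implements RMV internally; the `RunningTotals` class and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict


class RunningTotals:
    """Toy incrementally maintained view: total amount per customer."""

    def __init__(self) -> None:
        self.totals: DefaultDict[str, int] = defaultdict(int)

    def on_insert(self, customer: str, amount: int) -> None:
        # A new source row adds its amount to the stored total.
        self.totals[customer] += amount

    def on_delete(self, customer: str, amount: int) -> None:
        # A removed source row subtracts its amount again.
        self.totals[customer] -= amount

    def on_update(self, customer: str, old_amount: int, new_amount: int) -> None:
        # An update only applies the difference between old and new values.
        self.totals[customer] += new_amount - old_amount


view = RunningTotals()
view.on_insert("A", 100)
view.on_insert("B", 50)
view.on_update("A", 100, 120)
view.on_delete("B", 50)
print(dict(view.totals))
```

Because every change is folded in as it arrives, readers of the view always see a result that reflects the latest source state.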
To complete the data platform, we can use Alibaba Cloud's CloudMonitor service to oversee the whole process, from upstream to downstream. For example, we can monitor whether the CDC data sync in Data Transmission Service (DTS) is delayed by more than 20 minutes.
As demonstrated above, building a simple data pipeline platform is very feasible with the Alibaba Cloud Big Data stack. Some of our customers are non-technical users who care mainly about their data and want a data platform that serves as a single source of truth for their company. With Alibaba Cloud, we can deliver a straightforward solution. Our goal is to provide a unified data platform serving as a single source of truth, enabling your organization to leverage data effectively.
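The 20-minute rule above boils down to a simple threshold check. The sketch below shows that logic in isolation; it does not call CloudMonitor or DTS, and the `sync_delay_alert` function and its message format are hypothetical. In practice, the delay value would come from the monitoring metrics of the sync task.

```python
from datetime import timedelta
from typing import Optional

# Threshold taken from the example in the text: 20 minutes of sync delay.
MAX_SYNC_DELAY = timedelta(minutes=20)


def sync_delay_alert(delay: timedelta, threshold: timedelta = MAX_SYNC_DELAY) -> Optional[str]:
    """Return an alert message when the observed delay exceeds the threshold."""
    if delay > threshold:
        delay_min = int(delay.total_seconds() // 60)
        limit_min = int(threshold.total_seconds() // 60)
        return f"CDC sync delay of {delay_min} min exceeds the {limit_min} min limit"
    return None


print(sync_delay_alert(timedelta(minutes=5)))
print(sync_delay_alert(timedelta(minutes=25)))
```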