
AnalyticDB: Product introduction

Last Updated: Oct 31, 2024

What is AnalyticDB?

AnalyticDB is a cloud-native real-time data warehouse service developed in-house by Alibaba Cloud. It ingests data from online transaction processing (OLTP) databases and log files in real time and can analyze petabytes of data within seconds. Built on a cloud-native architecture that decouples storage from computing, AnalyticDB offers pay-as-you-go billing for storage and elastic scaling for computing. It combines batch processing and real-time analysis on isolated resources to meet enterprise requirements for data processing efficiency, cost control, and system stability, and it is compatible with the MySQL, PostgreSQL, and Spark ecosystems.

AnalyticDB provides two engines: AnalyticDB for MySQL and AnalyticDB for PostgreSQL.

Ecosystem

  • AnalyticDB for MySQL: highly compatible with MySQL and with Spark.

  • AnalyticDB for PostgreSQL: fully compatible with PostgreSQL and highly compatible with Oracle.

Architecture

  • Both engines use a storage-compute decoupled architecture.

Scalability

  • Both engines support vertical scaling and horizontal scaling.

  • AnalyticDB for MySQL: uses a multi-cluster scaling model to automatically scale resources, and a min-max model to automatically scale resources in a scheduled manner.

  • AnalyticDB for PostgreSQL: uses scheduled jobs to change configurations in a scheduled manner, and scales resources on demand in Serverless mode.

Features

  • Both engines provide vector search, full-text search, batch processing, and real-time materialized views.

  • AnalyticDB for MySQL: data lakes, Spark batch processing, and intelligent diagnostics and optimization of query performance.

  • AnalyticDB for PostgreSQL: Retrieval-Augmented Generation (RAG) service and spatio-temporal data analysis.

Scenarios

  • Both engines serve real-time data warehouses, real-time log analysis, and business intelligence (BI) reports.

  • AnalyticDB for MySQL: precision marketing, multi-source joint analysis, big data storage and analysis, accelerated query of offline data, and data migration from other data lake or data warehouse services, such as Databricks, Athena, and self-managed Spark or Presto clusters.

  • AnalyticDB for PostgreSQL: end-to-end building of Large Language Model (LLM) applications, dedicated enterprise knowledge bases, Geographic Information System (GIS)-based big data analysis, integrated batch processing and real-time analysis, and data migration from other data warehouse services, such as Greenplum, Redshift, Synapse, Snowflake, and BigQuery.

Industries

  • AnalyticDB for MySQL: gaming, retail, and automobile.

  • AnalyticDB for PostgreSQL: retail, e-commerce, and education.

Cost-effectiveness

  • Both engines charge storage fees based on actual data volumes, use tiered storage of hot and cold data to reduce storage costs, and support scheduled scaling based on regular traffic fluctuations to ensure sufficient resources during traffic spikes and prevent idle resources afterward.

  • AnalyticDB for MySQL: auto scaling based on business workloads.

  • AnalyticDB for PostgreSQL: manual instance starting or pausing based on business requirements.

Introduction to AnalyticDB for MySQL


Data source

AnalyticDB Pipeline Service (APS) provides low-cost access to data sources such as databases, logs, and big data platforms.

Storage layer and compute layer

Data Lakehouse Edition provides two in-house engines, the XIHE compute engine and the XUANWU storage engine, and also supports the open source Spark compute engine and Hudi storage engine. The edition covers a wide range of data analysis scenarios and allows the in-house and open source engines to access each other's data for centralized data management.

  • Storage layer: One copy of full data can be used for both batch processing and real-time analysis.

    Batch processing favors low-cost storage media to keep costs down, whereas real-time analysis favors fast storage media to deliver performance. To serve batch processing, Data Lakehouse Edition stores one copy of full data on low-cost, high-throughput storage media, which reduces storage and I/O costs while sustaining high throughput. To serve real-time analysis within 100 milliseconds, it keeps real-time data on separate elastic I/O units (EIUs), which meets the timeliness requirements of row data queries, full indexing, and cache acceleration.

  • Compute layer: The system automatically selects an appropriate computing mode for the XIHE compute engine. The open source Spark compute engine is suitable for various scenarios.

    The XIHE compute engine provides two computing modes: massively parallel processing (MPP) and bulk synchronous parallel (BSP). The MPP mode uses stream computing, which is not suitable for low-cost, high-throughput batch processing. The BSP mode divides a job into tasks along a directed acyclic graph (DAG), computes each task in turn, and can spill intermediate data to disk, so large amounts of data can be processed with limited resources. If the MPP mode cannot finish processing within a specific period of time, the XIHE compute engine automatically switches to the BSP mode; a conceptual sketch of this fallback pattern follows this list.

    The open source Spark compute engine is suitable for more complex batch processing and machine learning scenarios. The compute layer and storage layer are separated but interconnected, which allows you to easily create and configure Spark resource groups.
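The automatic MPP-to-BSP switch happens inside the XIHE engine and requires no client-side logic. The snippet below is only a conceptual sketch of the underlying timeout-based fallback pattern; run_mpp and run_bsp are hypothetical stand-ins that simulate the two execution paths, not AnalyticDB APIs.

```python
# Conceptual sketch of the timeout-based fallback described above.
# The real switch happens inside the XIHE engine; run_mpp and run_bsp are
# hypothetical stand-ins that simulate the two execution paths.
import concurrent.futures
import time


def run_mpp(sql: str) -> list:
    # Streaming MPP path: low latency, but memory-bound for very large jobs.
    time.sleep(5)  # simulate a query too heavy to finish within the deadline
    return [("mpp", sql)]


def run_bsp(sql: str) -> list:
    # Disk-backed BSP path: higher latency, but handles large data with limited resources.
    return [("bsp", sql)]


def execute_with_fallback(sql: str, mpp_timeout_s: float = 1.0) -> list:
    """Try the interactive MPP path; rerun on the BSP path if the deadline is missed."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(run_mpp, sql)
        return future.result(timeout=mpp_timeout_s)
    except concurrent.futures.TimeoutError:
        return run_bsp(sql)
    finally:
        pool.shutdown(wait=False)


print(execute_with_fallback("SELECT category, SUM(amount) FROM sales GROUP BY category"))
# -> [('bsp', 'SELECT category, SUM(amount) FROM sales GROUP BY category')]
```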

Access layer

The access layer provides unified billing units, metadata and permissions, development languages, and transmission links to improve development efficiency.

For more information about AnalyticDB for MySQL editions, see Editions.
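Because AnalyticDB for MySQL is highly compatible with the MySQL protocol, a standard MySQL driver can usually connect to a cluster without special tooling. The following is a minimal sketch using pymysql; the endpoint, credentials, database, and table are placeholders rather than values from this document.

```python
# Minimal sketch: querying an AnalyticDB for MySQL cluster with a standard
# MySQL client library. Endpoint, credentials, database, and table are placeholders.
import pymysql

conn = pymysql.connect(
    host="am-xxxxxxxx.ads.aliyuncs.com",  # placeholder cluster endpoint
    port=3306,
    user="analytics_user",                # placeholder database account
    password="your_password",
    database="demo_db",
)

try:
    with conn.cursor() as cur:
        # Ordinary MySQL-compatible SQL works; here, a simple aggregation.
        cur.execute("SELECT status, COUNT(*) FROM orders GROUP BY status")
        for status, cnt in cur.fetchall():
            print(status, cnt)
finally:
    conn.close()
```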

Introduction to AnalyticDB for PostgreSQL


AnalyticDB for PostgreSQL is available in elastic storage mode and Serverless mode. The elastic storage mode uses a shared-nothing architecture based on Elastic Compute Service (ECS) and Enhanced SSDs (ESSDs) and provides MPP capabilities. The Serverless mode uses a shared-storage architecture based on ECS, local cache, and Object Storage Service (OSS) and decouples storage from computing.

An AnalyticDB for PostgreSQL instance consists of a coordinator node and multiple compute nodes. The coordinator node is responsible for metadata management and load balancing. The compute nodes are responsible for data processing. The compute nodes integrate the Orca optimizer and the self-developed Laser execution engine and Beam storage engine to implement high-performance queries. The compute nodes also use incremental materialized views (IMVs) to build real-time materialized views. AnalyticDB for PostgreSQL stores hot data on ESSDs attached to the compute nodes and cold data in OSS. The tiered storage of hot and cold data helps improve query performance and reduce storage costs. You can separately scale the computing and storage resources of the compute nodes.
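As noted in the comparison above, AnalyticDB for PostgreSQL is fully compatible with PostgreSQL, so standard PostgreSQL drivers and SQL generally work against an instance. Below is a minimal sketch using psycopg2; the endpoint, credentials, database, and table are placeholders rather than values from this document.

```python
# Minimal sketch: using a standard PostgreSQL driver against an
# AnalyticDB for PostgreSQL instance. Endpoint, credentials, and the table
# definition are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="gp-xxxxxxxx.gpdb.rds.aliyuncs.com",  # placeholder instance endpoint
    port=5432,
    user="analytics_user",                     # placeholder database account
    password="your_password",
    dbname="demo_db",
)

try:
    with conn, conn.cursor() as cur:
        # Ordinary PostgreSQL DDL and DML work unchanged.
        cur.execute(
            """
            CREATE TABLE IF NOT EXISTS page_views (
                view_time  timestamp,
                user_id    bigint,
                url        text
            )
            """
        )
        cur.execute(
            "INSERT INTO page_views VALUES (now(), %s, %s)",
            (1001, "/home"),
        )
        cur.execute("SELECT count(*) FROM page_views")
        print(cur.fetchone()[0])
finally:
    conn.close()
```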

References

Benefits

Scenarios