Data Lake Analytics (DLA) is discontinued. AnalyticDB for MySQL supports the features of DLA and provides additional features and enhanced performance. For more information about how to use AnalyticDB for MySQL, see What is AnalyticDB for MySQL?
DLA is a next-generation big data solution that separates data computing from data storage. DLA can archive messages and database data and build data warehouses in real time. The supported databases include relational databases, PolarDB databases, and NoSQL databases. In addition, DLA provides the serverless Spark and Presto engines to meet the requirements for online interactive search, stream processing, batch processing, and machine learning. As a robust alternative to traditional Hadoop solutions, DLA facilitates a seamless transition to cloud-based analytics.
Data sources supported by DLA
For more information about the data sources that are supported by DLA, see Compatibility matrix for data sources and SQL statements.
Data source | Serverless Presto engine | Serverless Spark engine |
 | Supported | Supported |
 | Supported | Supported |
 | Supported | Supported |
 | To be supported | Supported |
 | Supported | To be supported |
 | Supported | Supported |
 | Supported | Supported |
 | Supported | Supported |
 | Supported | Supported |
 | Supported | Supported |
 | Supported | Supported |
 | Supported | Supported |
Kudu | Supported | Supported |
Self-managed Druid database hosted on an Elastic Compute Service (ECS) instance | Supported | Supported |
Features
DLA provides an end-to-end, cloud-native data lake analytics and computing solution for data that is stored in Object Storage Service (OSS). DLA offers the following benefits:
End-to-end data lake solution: DLA provides Data Lake Formation (DLF) and the serverless Presto and Spark engines, which together cover data ingestion, extract, transform, and load (ETL) processing, machine learning, and interactive analytics.
Secure data processing: DLA applies separate security controls to database tables and to the stored data, which helps prevent data misuse.
Cost-effective data processing: The serverless, cloud-native architecture of DLA keeps data processing cost-effective.
Smooth evolution: DLA ensures a smooth evolution from a Hadoop system to a data lake solution.
Support for serverless Presto and Spark engines
The serverless Presto engine of DLA is built on open source Presto and runs all computing jobs in memory. It delivers high-performance interactive analysis and returns analysis results in seconds. The serverless Spark engine is compatible with all the API operations provided by Apache Spark.
We recommend that you use the serverless Spark engine of DLA in the following scenarios:
You need custom code, or SQL statements cannot meet your business requirements.
A large amount of data needs to be cleansed. For example, one terabyte to one petabyte of data stored in OSS must be cleansed per day.
A wide range of algorithms must be supported. The serverless Spark engine of DLA supports all Spark algorithms.
Streaming must be supported.
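The scenarios above can be read as a simple decision rule. The following Python sketch is only an illustration of that rule and is not part of DLA: the `Workload` class, its field names, the `recommend_engine` function, and the 1 TB threshold (taken from the lower bound of the cleansing example) are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a job; every field name is illustrative."""
    needs_custom_code: bool = False   # SQL alone cannot express the logic
    daily_cleansing_tb: float = 0.0   # OSS data cleansed per day, in terabytes
    needs_algorithms: bool = False    # algorithm support beyond plain SQL
    needs_streaming: bool = False     # stream processing is required


# Assumed threshold: the text cites 1 TB to 1 PB per day as a large volume.
LARGE_CLEANSING_TB = 1.0


def recommend_engine(w: Workload) -> str:
    """Return "spark" if any listed scenario applies, otherwise "presto"."""
    if (
        w.needs_custom_code
        or w.daily_cleansing_tb >= LARGE_CLEANSING_TB
        or w.needs_algorithms
        or w.needs_streaming
    ):
        return "spark"
    # No Spark scenario applies: fall back to the in-memory engine
    # that targets interactive analysis.
    return "presto"


if __name__ == "__main__":
    print(recommend_engine(Workload()))
    print(recommend_engine(Workload(daily_cleansing_tb=50)))
    print(recommend_engine(Workload(needs_streaming=True)))
```

A single matching scenario is enough to select Spark in this sketch. Treating Presto as the default is a design choice of the example and reflects only the description of Presto as the engine for interactive analysis.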