Data Lake Analytics (DLA) is a next-generation big data solution that separates computing from storage. DLA can archive database data and messages and create data warehouses in real time. DLA provides a serverless Spark engine and a serverless Presto-compatible SQL engine to meet the requirements of online interactive queries, stream processing, batch processing, and machine learning. DLA is a competitive cloud-based alternative to traditional Hadoop solutions. Scalability is the core strength of DLA.
Scalability
If the serverless Spark engine uses the billing method based on the number of compute units (CUs) used, you are charged only for the CUs actually used for a job. Compared with the billing method of a traditional solution, this billing method reduces costs by more than 50%.
If the serverless Presto-compatible SQL engine uses the billing method based on the number of CUs used, you are charged only for the CUs used during specified periods.
If the serverless Presto-compatible SQL engine uses the billing method based on the number of bytes scanned, you are charged only for the data scanned by the SQL statements that are executed.
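The arithmetic behind these billing methods can be illustrated with a minimal Python sketch. The unit prices, function names, and workload figures below are hypothetical placeholders chosen for illustration. They are not DLA list prices, and the sketch does not reproduce or verify the 50% figure quoted above.

```python
# Hypothetical unit prices for illustration only (not DLA list prices).
CU_HOUR_PRICE = 0.10       # assumed price of one CU for one hour
SCAN_PRICE_PER_TB = 5.00   # assumed price per TB of data scanned

BYTES_PER_TB = 1024 ** 4


def cu_job_cost(cus: int, runtime_seconds: float,
                cu_hour_price: float = CU_HOUR_PRICE) -> float:
    """CU-based billing for a job: pay only for CUs held while the job runs."""
    return cus * (runtime_seconds / 3600) * cu_hour_price


def reserved_cu_cost(cus: int, hours: float,
                     cu_hour_price: float = CU_HOUR_PRICE) -> float:
    """CU-based billing for a period: pay for CUs kept for the whole period."""
    return cus * hours * cu_hour_price


def scan_cost(bytes_scanned: int,
              price_per_tb: float = SCAN_PRICE_PER_TB) -> float:
    """Scan-based billing: pay for the bytes that executed statements read."""
    return bytes_scanned / BYTES_PER_TB * price_per_tb


if __name__ == "__main__":
    # A job that holds 20 CUs for 30 minutes.
    print("per-job CU cost:", cu_job_cost(20, 30 * 60))
    # The same 20 CUs kept running for a full day.
    print("24-hour CU cost:", reserved_cu_cost(20, 24))
    # A query that scans 512 GB.
    print("scan cost:", scan_cost(512 * 1024 ** 3))
```

Under these assumed prices, a short job billed per CU costs far less than keeping the same CUs allocated all day, which is the effect the per-job billing method is meant to capture. The actual savings depend on real prices and workload patterns.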
Benefits
Category | Self-managed Hadoop system | Alibaba Cloud DLA + OSS solution |
---|---|---|
Product system | The system is complex and consists of a large number of components. | The solution is integrated and provides an end-to-end workflow to improve user experience. The workflow covers data lake creation, data lake management, extract, transform, and load (ETL), and analysis and queries. Two serverless engines are provided: the Presto-compatible SQL engine and the Spark engine. |
Scalability | N/A | This solution is a cloud-native solution. It can scale out a cluster to 300 nodes within one minute for data computing. |
Cost-effectiveness | It is an open source solution. | This solution provides optimizations and high scalability based on the open source solution. Compared with self-managed open source clusters, virtual clusters (VCs) of DLA reduce costs by more than 50%. |
Archiving of database data and messages (such as Kafka messages) to Hudi (stored in OSS) | Not available, or requires custom code | Optimizations for the archiving links and Hudi are planned and will be available in the future. |
Learning and O&M costs | High (deployment, configuration, O&M, and learning take a long time) | Low (out-of-the-box experience and zero O&M cost) |
Security and multi-tenancy | Security authentication is based on Kerberos and Ranger, which is complex. | Database-style authorization at the schema and table levels and multi-tenancy are supported. |
Features | It uses open source features and does not support cloud connectors. Integration and optimization are limited to the system's own internal components. | Optimizations are implemented for Alibaba Cloud data sources, such as Object Storage Service (OSS), Tablestore, AnalyticDB for MySQL, and AnalyticDB for PostgreSQL. Kernel optimizations are implemented for the serverless Presto-compatible SQL engine and the serverless Spark engine. |