This topic provides the release notes for Data Lake Analytics (DLA) and links to the relevant references.
June 2021
Category | Feature | Description | References |
---|---|---|---|
Cluster management | Monitoring and alerting | Spark virtual clusters support monitoring and alerting. | View the metrics of the serverless Spark engine |
Data lake management | Data reading from secondary databases | The DLA lakehouse solution allows you to read data from secondary databases of ApsaraDB RDS and PolarDB for MySQL. | N/A |
Data lake management | Performance improvement | The DLA lakehouse solution allows you to read data from database tables in parallel during full synchronization. This improves full synchronization performance by a factor of 2.5. | N/A
Data lake management | Time sequence issue fixing | The DLA lakehouse solution fixes the time sequence issue that occurs when DLA uses Data Transmission Service (DTS) to perform parallel writes. | N/A
DLA Spark | Table data reading across accounts | Spark SQL allows you to use different accounts to read data from tables in the DLA metadata system. | N/A |
DLA Spark | OSS optimization | OSS optimization is enabled by default. This ensures that OSS performance is not affected when you perform a deep copy in OSS. | N/A
DLA Spark | Configuration of the maximum number of job failures on Spark executors | DLA allows you to configure the maximum number of job failures on Spark executors. By default, this value is twice the number of Spark executors. | N/A
DLA Spark | Job retries | Spark jobs support automatic retries to mitigate stability issues caused by the platform framework. | Configure a Spark job
DLA Spark | Monitoring and alerting | Spark jobs support monitoring and alerting. | View the metrics of the serverless Spark engine
DLA Presto | Configuration of a path not required for table creation | When you create a table in DLA, you do not need to configure the Location parameter to specify a path. | N/A |
DLA Presto | Partition projection for better table performance | If partition projection is enabled for a table, DLA lists OSS directories for that table more efficiently. | N/A
DLA Presto | Fixing of metadata system issues | When a table fails to be created in the DLA metadata system, the returned error message identifies the cause. | N/A
DLA Presto | Fixing of issues on tables for which partition projection is enabled | The following issue is fixed: If partition projection is enabled for a table, no data is found in the table after the INSERT OVERWRITE statement is executed. | N/A
DLA Presto | Operator pushdown | Data computations of operators, such as filter, aggregation, and limit operators, can be pushed down to Tablestore (see the query sketch after this table). | Use computing pushdown for Tablestore
DLA Presto | Parameter control | DLA allows you to configure the task_writer_count and task_concurrency parameters. | N/A
DLA Presto | Improved read mode | Data is read from AnalyticDB for MySQL 3.0 in streaming mode. This resolves the high memory usage caused by the non-streaming read mode. | N/A
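
The operator pushdown item above can be illustrated with a minimal query sketch. The database, table, and column names (ots_db.orders, customer_id, amount, gmt_create) are hypothetical, and whether a given operator is actually pushed down depends on the connector; the exact setup is described in Use computing pushdown for Tablestore.

```sql
-- Sketch only: ots_db.orders is a hypothetical DLA table mapped to a Tablestore table.
-- The WHERE filter, the aggregation, and the LIMIT can be pushed down to Tablestore
-- instead of being fully evaluated in DLA.
SELECT customer_id,
       COUNT(*)    AS order_cnt,
       SUM(amount) AS total_amount
FROM ots_db.orders
WHERE gmt_create >= '2021-06-01'   -- filter pushdown
GROUP BY customer_id               -- aggregation pushdown
ORDER BY total_amount DESC
LIMIT 10;                          -- limit pushdown
```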
1.0.0
Category | Feature | Description |
---|---|---|
Data analysis | Analysis of data in OSS files | Data in a single OSS file can be analyzed. In addition, association analysis can be performed for files across different OSS buckets. |
Data analysis | Writing of analysis results to OSS | The analysis results can be written back to OSS.
Data analysis | Analysis of data in Tablestore | Data in Tablestore can be analyzed.
Data analysis | Analysis of data in ApsaraDB RDS | Data in ApsaraDB RDS can be analyzed.
Data analysis | Analysis of data from multiple data sources | Data from multiple data sources, such as OSS, Tablestore, and ApsaraDB RDS, can be analyzed (see the cross-source query sketch after this table).
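
The cross-source analysis listed above can be sketched as a single query that joins data in OSS with data in ApsaraDB RDS. All schema, table, and column names below are hypothetical; the tables must first be created in DLA and mapped to the corresponding data sources.

```sql
-- Sketch only: oss_db.web_logs is assumed to map to files in an OSS bucket,
-- and rds_db.users to a table in an ApsaraDB RDS database.
SELECT u.region,
       COUNT(*) AS page_views
FROM oss_db.web_logs AS l
JOIN rds_db.users    AS u
  ON l.user_id = u.id
WHERE l.log_date = '2019-01-01'
GROUP BY u.region;
```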
1.1.0
Category | Feature | Description |
---|---|---|
Core features | PolarDB data source | Alibaba Cloud PolarDB data sources are supported. |
Core features | Redis connector | The ApsaraDB for Redis connector is supported.
Core features | Data reading from ApsaraDB for MongoDB | Data can be read from ApsaraDB for MongoDB.
Core features | Logical view | Logical views are supported.
Core features | MySQL 8.0 protocol | The MySQL 8.0 protocol is supported.
Core features | OSS data sources | The DDL table creation wizard supports OSS data sources.
Core features | Public datasets | Public datasets are supported.
Other features | JSON_EXTRACT function | The JSON_EXTRACT function is supported to process data from ApsaraDB for MongoDB (see the query sketch after this table).
Other features | IP address resolution function | A new IP address resolution function translates IP addresses into location information, such as countries, provinces, and cities.
Other features | PreparedStatement | PreparedStatement is supported.
Other features | OSS API calls | The number of calls to the OSS API is reduced.
Other features | Limits on the number of partitions | The number of partitions to which data can be written at a time is limited.
Other features | Table and field names | Table and field names can start with a digit.
Other features | ALTER PARTITION | The ALTER PARTITION command is supported.
Other features | Logstash | Logstash is supported.
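
The JSON_EXTRACT item above can be sketched as follows. The table mongo_db.events and its columns are hypothetical; JSON_EXTRACT takes a JSON document and a JSON path such as $.device.os and returns the matching value.

```sql
-- Sketch only: mongo_db.events is a hypothetical table whose payload column
-- stores semi-structured JSON documents read from ApsaraDB for MongoDB.
SELECT event_id,
       JSON_EXTRACT(payload, '$.device.os') AS device_os,
       JSON_EXTRACT(payload, '$.user.id')   AS user_id
FROM mongo_db.events
WHERE event_date = '2019-06-01';
```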
1.2.0
Category | Feature | Description |
---|---|---|
Ease of use | Console reconstruction and optimization | The following features in the new version of the DLA console are optimized: overview, account management, and endpoint management. |
Ease of use | Pop-up window for version releases | A pop-up window that describes the updates is displayed each time a new version is released.
Ease of use | Optimized process of account management | The account management process is optimized. This helps you manage accounts and passwords and add comments for DLA sub-accounts.
Ease of use | New page for SQL interaction | A new SQL interaction page helps you explore data lakes and accelerates SQL interaction.
Ease of use | Schema wizard | The schema creation and table creation wizards are introduced and optimized. This significantly improves the efficiency of data lake formation and of data exploration and discovery.
Ease of use | GUI-based database and table operations | Tables and databases can be deleted by using GUI-based operations.
Ease of use | Optimized data writing to partitions | The INSERT OVERWRITE SELECT statement can be used to perform extract, transform, load (ETL) operations and write data to the destination partition. This simplifies data cleansing and processing during ETL operations (see the statement sketch after this table).
Deep integration | Integration with data analysis and writing | DLA allows you to analyze data from various data sources and write data to these data sources. The data sources include OSS, Tablestore, AnalyticDB for PostgreSQL, AnalyticDB for MySQL, ApsaraDB RDS for MySQL, ApsaraDB RDS for PostgreSQL, ApsaraDB RDS for SQL Server, self-managed MySQL, PostgreSQL, and SQL Server databases, ApsaraDB for Redis, self-managed Redis databases, ApsaraDB for MongoDB, self-managed MongoDB databases, and PolarDB. OSS data includes more than seven types of structured and semi-structured data and data files in multiple compression formats. |
Deep integration | Integration with DataWorks | DLA is integrated with DataWorks. This helps you customize data processing procedures in a visualized manner and create big data workflows in the cloud.
Deep integration | Integration with Function Compute | DLA is integrated with Function Compute. This helps you create cloud-native serverless workflows based on the serverless Spark and Presto engines.
Deep integration | Integration with Message Service (MNS) and Message Queue | DLA is integrated with MNS and Message Queue. This significantly improves the data processing efficiency for DLA and facilitates business integration.
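
The INSERT OVERWRITE SELECT item above can be sketched with a Hive-compatible statement that cleanses data from a raw table and overwrites a single destination partition. The table, column, and partition names are hypothetical.

```sql
-- Sketch only: dw.orders_clean is a hypothetical destination table partitioned by dt,
-- and staging.orders_raw is a hypothetical raw source table.
INSERT OVERWRITE TABLE dw.orders_clean PARTITION (dt = '2020-01-01')
SELECT order_id,
       customer_id,
       CAST(amount AS DECIMAL(16, 2)) AS amount   -- basic cleansing during ETL
FROM staging.orders_raw
WHERE dt = '2020-01-01'
  AND amount IS NOT NULL;
```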