This topic provides the release notes for E-MapReduce (EMR) and links to the relevant references.
For more information about the release notes, see Overview.
2024
August 2024
Feature | Description | Release date | References |
Support for the monitoring and diagnostics feature | The monitoring and diagnostics feature provides intelligent O&M for clusters. It is built on a large model and draws on the Alibaba Cloud EMR team's knowledge of open source big data and EMR observability, together with the diagnostic experience of technical experts. The feature improves the observability of EMR: real-time health diagnostics help you identify issues in abnormal clusters and troubleshoot them based on the suggestions in the diagnostic results, which reduces O&M costs. Daily cluster reports also provide global optimization suggestions to help you run clusters more efficiently. | 2024-08-20 | |
Optimization of the cluster cloning capability | Cluster cloning now also copies modified service configurations, added node groups, and configured auto scaling rules to the new cluster, regardless of whether the changes were made during cluster creation or later use. This helps you quickly create a cluster with the same configurations as an existing cluster. | 2024-08-20 | |
Association of more security groups with a node group | A maximum of four security groups can be associated with a node group. This helps you implement access control on ECS instances in a cluster in a flexible manner. | 2024-08-20 |
June 2024
Feature | Description | Release date | References |
Support for enabling auto-renewal during scale-out | If you turn on Auto-renewal when you scale out an EMR cluster, the added nodes are automatically renewed. This reduces asynchronous operations. You can modify the renewal duration or disable the auto-renewal feature on the Auto-renewal page. | 2024-06-19 | |
Switchover of the billing method from pay-as-you-go to subscription at the node group level | The billing method of core, task, or gateway node groups in a subscription cluster can be changed from pay-as-you-go to subscription. This helps you manage the billing method of resources in a flexible manner. | 2024-06-19 | |
Creation and custom deployment of Master-Extend node groups | Master-Extend node groups can be created for an EMR cluster. You can deploy components of Spark, Hive, and Kyuubi in a Master-Extend node group based on your business requirements. The system automatically synchronizes the configurations of related components to nodes that require the components. This helps reduce the load on the master node group of an EMR cluster. | 2024-06-19 |
March 2024
Feature | Description | Release date | References |
Creation and management of OSS-HDFS buckets in the EMR console | OSS-HDFS buckets can be created when you create a cluster in the EMR console. You can view the storage overview and object list of the buckets on the Services tab of the cluster in the EMR console. You no longer need to perform these operations in the Object Storage Service (OSS) console. This simplifies the process of using buckets and prevents misoperations that may cause the Hadoop Distributed File System (HDFS) service to become unavailable. | 2024-03-14 | |
Creation of gateway node groups | Gateway nodes are provided to reduce the load on the master node. They can serve as task submission machines. This way, you can submit tasks on gateway nodes with simple operations. Gateway nodes also help implement automatic synchronization of configurations that are related to clusters and task submission environments. This helps you deploy and configure a task submission environment with ease. | 2024-03-14 | |
Management of health check items | Health check items can be managed. EMR checks the health status of the nodes and services of EMR clusters based on preset health check items. This helps you handle exceptions and risks at the earliest opportunity. You can view the check content for the nodes and services of a cluster and modify check items. | 2024-03-14 | |
Diversification of health check items for services and components | The health check items of YARN, HDFS, Hive, Kafka, and ZooKeeper are diversified to improve the check accuracy on the health status of services and components. | 2024-03-14 |
2023
October 2023
Feature | Description | Release date | References |
Recommendation of auto scaling rules | The feature of recommending auto scaling rules is optimized. You can view the overview information about cluster resources on the Auto Scaling tab in the EMR console. The auto scaling feature helps you analyze the resource utilization of clusters and provides recommended auto scaling rules for the clusters that meet specific conditions. You can enable auto scaling based on the overview information to improve the elasticity of cluster resources. | 2023-10-24 | |
Alert rule management | The alert rule management feature is provided. This feature is implemented based on CloudMonitor. You can create and view alert rules for clusters in the EMR console. If resource metrics meet specific alert conditions, alerts are triggered and CloudMonitor sends alert notifications. This way, you can identify and handle the exceptions of monitored clusters at the earliest opportunity. | 2023-10-24 | |
Display of node health status | Node health status is displayed so that you can check whether a node runs as expected. You can view the health status of nodes on the Nodes tab and identify abnormal nodes at the earliest opportunity. | 2023-10-24 | |
Configuration of disk performance levels (PLs) | PLs can be configured for disks. When you create a cluster or add a node group, you can specify different PLs for enhanced SSDs (ESSDs) to meet different cluster performance requirements. | 2023-10-24 |
August 2023
Feature | Description | Release date | References |
Cluster template | The cluster template feature is a persistent EMR instance configuration feature that can be used to create an EMR cluster with a few clicks. | 2023-08-29 | |
Viewing of overview information about cluster resources | You can view the overview information about cluster resources on the Auto Scaling tab in the EMR console. The auto scaling feature helps you analyze the resource utilization of clusters and provides auto scaling rules for the clusters that meet specific conditions. You can enable auto scaling based on the overview information to improve the elasticity of cluster resources. | 2023-08-29 | |
Viewing of configuration items | If a configuration item is modified at the node group or node level, its settings at that level are displayed on the Configure tab when you select Node Group Configuration or Independent Node Configuration from the Default Cluster Configuration drop-down list. | 2023-08-29 |
July 2023
Feature | Description | Release date | References |
Auto scaling management | EMR provides a dedicated management module that allows you to manage the auto scaling feature in an efficient manner. You can use the module to manage auto scaling rules and view the elastic resource usage and cost allocation of your cluster. This way, you can evaluate the cost savings brought by auto scaling and optimize the resource utilization of your cluster. | 2023-07-12 | |
Automatic supplementation | The automatic supplementation feature of EMR is optimized. The feature can replace abnormal nodes in a cluster. Prompts and event notifications are provided so that you can track how automatic supplementation is performed. Note: From 18:00 (UTC+8) on July 10, 2023, Automatic Compensation is turned on for new pay-as-you-go task node groups by default. | 2023-07-12 | |
Service configuration | Service configuration is optimized. The To Be Delivered prompt and Not Effective Yet prompt are added. The prompts indicate which operations to perform after you modify configurations so that the modifications take effect. | 2023-07-12 | |
Stateless clusters | Stateless clusters are supported. EMR provides a default data lake architecture, which does not depend on Hadoop Distributed File System (HDFS). If you do not need to use services that depend on core nodes, you can remove the core node group to build a completely stateless cluster. This helps further reduce the O&M costs of your cluster. | 2023-07-12 | |
Association of YARN partitions with queues | You can associate YARN partitions with queues and allocate capacity in the EMR console, without the need to configure complex settings. | 2023-07-12 | |
Per-second billing for pay-as-you-go resources | Per-second billing is supported for pay-as-you-go resources. The finer billing granularity helps effectively reduce resource costs. | 2023-07-12 |
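The effect of billing granularity on cost can be illustrated with a short calculation. The following is a minimal sketch, not EMR's billing logic: the function name `billed_cost`, the price of 1.20 per hour, and the coarser one-hour granularity used for comparison are all hypothetical values chosen for illustration.

```python
def billed_cost(runtime_seconds: int, price_per_hour: float,
                granularity_seconds: int = 1) -> float:
    """Return the cost when usage is rounded up to whole billing units.

    All inputs are illustrative assumptions; real prices and rounding
    rules are defined by the billing documentation, not by this sketch.
    """
    if runtime_seconds < 0 or granularity_seconds <= 0:
        raise ValueError("runtime must be >= 0 and granularity must be > 0")
    # Integer ceiling division: number of billing units that cover the runtime.
    units = -(-runtime_seconds // granularity_seconds)
    billed_seconds = units * granularity_seconds
    return billed_seconds * price_per_hour / 3600


# Hypothetical task node that runs for 10 minutes 30 seconds.
runtime = 10 * 60 + 30
per_second = billed_cost(runtime, 1.20, granularity_seconds=1)
per_hour = billed_cost(runtime, 1.20, granularity_seconds=3600)

print(f"billed per second: {per_second:.4f}")
print(f"billed per hour:   {per_hour:.4f}")
```

The shorter and more bursty the workload, the larger the gap between the two figures, which is why finer granularity mainly benefits short-lived elastic nodes.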
June 2023
Feature | Description | Release date | References |
Version update | | 2023-06-01 | |
Paimon | Paimon is added. Apache Paimon is a data lake platform that allows you to process data in streaming and batch modes. Apache Paimon supports high-throughput data writing and low-latency data queries. | 2023-06-01 | |
Presto | Presto is added. Presto (PrestoDB) is a flexible and scalable distributed SQL query engine. | 2023-06-07 |
April 2023
Feature | Description | Release date | References |
Version update | | 2023-04-03 | |
New capability in data lakehouse scenarios | Hologres and MaxCompute tables can be accessed by using the Spark and Trino compute engines. | 2023-04-03 | New capability in data lakehouse scenarios: EMR supports Hologres and MaxCompute data sources |
Access to Hologres by using Spark | Spark can be used to read data from Hologres tables. | 2023-04-03 | |
Node configuration upgrade | The ECS instance configurations of a node group can be upgraded. | 2023-04-03 | |
Management of YARN partitions in the EMR console | EMR allows you to manage YARN partitions in the console in a visualized manner. You can establish mappings between multiple node groups and partitions at a time. | 2023-04-13 |
March 2023
Feature | Description | Release date | References |
Flink Table Store | Flink Table Store is added. Flink Table Store is a unified data lake storage that allows you to process data in streaming and batch modes. You can use Flink Table Store to write data at high throughput and query data at low latency. | 2023-03-03 | |
Export and import of service configurations | Service configurations can be exported in the XML or JSON format. This way, you can back up, migrate, or restore the service configurations of an EMR cluster. | 2023-03-02 |
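An exported configuration file can be compared against a later export before you restore or migrate it. The following is a minimal sketch that assumes the XML export follows the common Hadoop `*-site.xml` layout (`<configuration>` containing `<property>` elements with `<name>` and `<value>`); the actual EMR export format may differ. The sample data and the helper names `xml_config_to_dict` and `changed_keys` are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical backup in the assumed Hadoop-style layout.
SAMPLE_XML = """<configuration>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4096</value>
  </property>
</configuration>"""


def xml_config_to_dict(xml_text: str) -> dict:
    """Parse name/value properties from the assumed XML layout into a dict."""
    root = ET.fromstring(xml_text)
    config = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is None:
            continue  # Skip malformed entries that have no name.
        config[name.strip()] = (prop.findtext("value") or "").strip()
    return config


def changed_keys(old: dict, new: dict) -> dict:
    """Return {key: (old_value, new_value)} for keys whose values differ."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


backup = xml_config_to_dict(SAMPLE_XML)
# Hypothetical current settings, for example loaded from a JSON export.
current = dict(backup, **{"yarn.scheduler.maximum-allocation-mb": "6144"})

print(json.dumps(backup, indent=2))
print(changed_keys(backup, current))
```

Reviewing the differences first makes it easier to see which settings a restore would overwrite.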
February 2023
Feature | Description | Release date | References |
Version update | | 2023-02-28 |
2022
December 2022
Feature | Description | Release date | References |
Version update | | 2022-12-01 | |
Node label of YARN | The node label feature of YARN is supported. This feature allows you to manage nodes on which NodeManagers are deployed in a cluster based on different partitions. | 2022-12-14 |
November 2022
Feature | Description | Release date | References |
Version update | | 2022-11-08 | |
Log management | The log management feature is supported. This feature allows you to query the logs that are generated for the open source components in the EMR console. | 2022-11-29 |
October 2022
Feature | Description | Release date | References |
Version update | | 2022-10-14 | |
HBase Shell | HBase Shell can be used to connect to HBase that is deployed in an EMR cluster. | 2022-10-21 | |
DataServing cluster | DataServing clusters based on Apache HBase are provided. | 2022-10-28 |
September 2022
Feature | Description | Release date | References |
Automatic supplementation | Automatic supplementation is supported. After you enable this feature for an EMR cluster, the abnormal ECS instances in the EMR cluster can be automatically replaced when EMR identifies that the ECS instances cannot run the engine services as expected. | 2022-09-07 | |
Cluster cloning | The cluster cloning feature provided by EMR can be used to create a cluster based on an existing cluster. | 2022-09-09 |
August 2022
Feature | Description | Release date | References |
Version update | | 2022-08-05 | |
Deployment set | Deployment sets provided by Alibaba Cloud ECS can be used to manage the distribution of ECS instances. Deployment sets can help improve the disaster recovery capability and availability of ECS instances. | 2022-08-05 | |
Gateway deployment by using EMR-CLI | The EMR-CLI tool provided by EMR can be used to deploy a gateway on an ECS instance. | 2022-08-05 |
July 2022
Feature | Description | Release date | References |
EMR Doctor | EMR Doctor is provided. It is an intelligent O&M system developed by the Alibaba Cloud EMR team for open source big data clusters. | 2022-07-25 |
June 2022
Feature | Description | Release date | References |
DataLake cluster | DataLake clusters are supported. A DataLake cluster is a big data computing cluster that allows you to analyze data in a flexible, reliable, and efficient manner. You can create a DataLake cluster only in the new EMR console. | 2022-06-01 | |
Association of a Spark cluster with a Shuffle Service cluster | Remote Shuffle Service (RSS) is an extension provided by Alibaba Cloud EMR to improve the stability and performance of Spark Shuffle. You can associate a Spark cluster that is created on the EMR on ACK page with a Shuffle Service cluster. | 2022-06-09 |
May 2022
Feature | Description | Release date | References |
Memory management | Memory resources can be managed. The topic in the References column describes the memory usage categories and memory configuration parameters that are related to a backend (BE) in StarRocks. The topic also describes how to view memory usage. | 2022-05-10 |