E-MapReduce (EMR) V5.2.1 is the first stable version of the EMR V5.X series. This topic provides the release notes for EMR V5.2.X, including the release date, updates, and release version information.
Release date
July 16, 2021 for EMR V5.2.1
Updates
Service | Description |
SmartData | SmartData is updated to 3.6.1. For more information, see SmartData 3.6.X. |
Hive | An issue where the output of the SHOW CREATE TABLE command is inaccurate for tables whose metadata is stored in Data Lake Formation (DLF) is fixed. The default Hive parameters are optimized to improve the performance of Hive jobs. In the EMR console, the parameter names on the hive-env tab of the Configure tab for the Hive service are changed to uppercase, which makes the parameters easier to use. An issue where user-defined functions (UDFs) cause a HiveServer2 memory leak is fixed. The error message that is reported when you write data to a Hive table and the file system is incompatible with the Hive metastore is improved. |
HDFS | The data compression algorithm Zstandard is supported. |
Delta Lake | |
Flink | Flink is updated to 1.12-vvr-3.0.2. |
Hudi | |
Spark | Important: In EMR V5.2.1, Spark 3.1.1 is incompatible with Kudu 1.11.1. Delta Lake and Hudi are supported. Remote Shuffle Service is supported. Livy is supported. In the EMR console, the parameter names on the spark-defaults tab of the Configure tab for the Spark service are optimized. The cost-based optimization (CBO), dynamic partition pruning, and Z-order features are optimized, and their performance is 50% higher than in open source Spark 3. Log Service, DataHub, and Message Queue for Apache RocketMQ can be used as data sources. |
Tez | The default parameters of Tez are optimized to improve the performance of Tez jobs. |
Ranger | The warning that is reported in the logs when Spark starts with Ranger enabled is fixed. An issue where user information fails to be automatically synchronized after Ranger is connected to a Lightweight Directory Access Protocol (LDAP) server is fixed. |
Knox | |
Kafka | Cruise Control can be used to balance Kafka clusters. Disks for Kafka clusters are hot-swappable, so you can replace a damaged disk without stopping the Kafka broker. The default values of some parameters are changed. |
Phoenix | The issue that no JDBC driver is found when Hive or Spark SQL is used to access Phoenix tables is fixed. |
EMR Remote Shuffle Service (ESS) | Spark 3 is supported. |
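HDFS in this release supports the Zstandard compression algorithm. Apache Hadoop exposes Zstandard through the `org.apache.hadoop.io.compress.ZStandardCodec` class, and one common way to use a codec is to compress MapReduce output. The following minimal sketch renders the standard Apache Hadoop 3.x properties for that purpose in the `<configuration>` XML format used by `mapred-site.xml`. It assumes that the native zstd library is available to Hadoop on every node; how these properties map to tabs in the EMR console is an assumption to verify on your cluster.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Apache Hadoop 3.x codec class for Zstandard. Assumption: the native zstd
# library is installed and loadable by Hadoop on every node.
ZSTD_CODEC = "org.apache.hadoop.io.compress.ZStandardCodec"

ZSTD_OUTPUT_PROPERTIES = {
    # Compress the final output of MapReduce jobs with Zstandard.
    "mapreduce.output.fileoutputformat.compress": "true",
    "mapreduce.output.fileoutputformat.compress.codec": ZSTD_CODEC,
    # Also compress intermediate map output with Zstandard.
    "mapreduce.map.output.compress": "true",
    "mapreduce.map.output.compress.codec": ZSTD_CODEC,
}


def render_hadoop_config(properties: Dict[str, str]) -> str:
    """Render name/value pairs as a Hadoop <configuration> XML document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # Python 3.9+: pretty-print with two-space indentation
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_hadoop_config(ZSTD_OUTPUT_PROPERTIES))
```

Generating the XML from a dictionary keeps the codec class name in one place, so switching another codec in or out changes a single constant.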
Release version information
Hadoop clusters
Service | Version |
HDFS | 3.2.1 |
YARN | 3.2.1 |
Hive | 3.1.2 |
Spark | 3.1.1 |
Knox | 1.1.0 |
Tez | 0.9.2 |
Ganglia | 3.7.2 |
Sqoop | 1.4.7 |
SmartData | 3.6.1 |
Bigboot | 3.6.1 |
Hudi | 0.8.0 |
OpenLDAP | 2.4.44 |
Hue | 4.9.0 |
HBase | 2.3.4 |
ZooKeeper | 3.6.2 |
Presto | 338 |
Impala | 3.4.0 |
Zeppelin | 0.9.0 |
Flume | 1.9.0 |
Livy | 0.7.1 |
Superset | 0.36.0 |
Ranger | 2.1.0 |
Storm | 1.2.2 |
ESS | 1.0.0 |
Alluxio | 2.5.0 |
Kudu | 1.11.1 |
Oozie | 5.1.0 |
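Hadoop clusters include Livy 0.7.1 alongside Spark 3.1.1, and the Spark update notes that Livy is supported. Livy exposes a REST API in which `POST /batches` submits a Spark application. The following sketch only builds that request with the Python standard library so that you can inspect it before sending it. The host `emr-header-1` is hypothetical, 8998 is Livy's default port, and the jar path and main class in the usage block are placeholders.

```python
import json
import urllib.request
from typing import List, Optional

# Hypothetical Livy endpoint: replace the host with the node that runs Livy.
LIVY_URL = "http://emr-header-1:8998"


def build_batch_request(livy_url: str, jar_path: str, main_class: str,
                        args: Optional[List[str]] = None) -> urllib.request.Request:
    """Build (but do not send) a Livy POST /batches request for a Spark jar."""
    payload = {
        "file": jar_path,         # application jar, for example on HDFS
        "className": main_class,  # main class of the Spark application
        "args": args or [],       # command-line arguments for the main class
    }
    return urllib.request.Request(
        url=f"{livy_url.rstrip('/')}/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_batch_request(
        LIVY_URL,
        "hdfs:///tmp/spark-examples.jar",  # hypothetical jar path
        "org.apache.spark.examples.SparkPi",
        ["100"],
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # Sending it with urllib.request.urlopen(req) would submit the batch.
```

If CSRF protection is enabled on the Livy server (`livy.server.csrf-protection.enabled`), POST requests must also carry an `X-Requested-By` header.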
Kafka clusters
Service | Version |
ZooKeeper | 3.6.2 |
Ganglia | 3.7.2 |
Kafka | 2.4.1 |
Kafka Manager | 1.3.3.16 |
OpenLDAP | 2.4.44 |
Knox | 1.1.0 |
Ranger | 2.1.0 |
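The Kafka update notes that Cruise Control can be used to balance Kafka clusters. Cruise Control exposes a REST API in which `POST /kafkacruisecontrol/rebalance` computes, and optionally executes, a partition rebalance plan. The following sketch builds a dry-run rebalance request with the Python standard library. The host `emr-header-1` and Cruise Control's default port 9090 are assumptions, because this topic does not state how EMR deploys Cruise Control.

```python
import urllib.parse
import urllib.request

# Hypothetical Cruise Control address; verify the host and port on your cluster.
CRUISE_CONTROL_URL = "http://emr-header-1:9090"


def build_rebalance_request(base_url: str, dryrun: bool = True) -> urllib.request.Request:
    """Build (but do not send) a Cruise Control rebalance request.

    With dryrun=true, Cruise Control only returns the proposed replica
    movements and does not move any partitions.
    """
    query = urllib.parse.urlencode({
        "dryrun": str(dryrun).lower(),  # "true" or "false"
        "json": "true",                 # request a JSON response body
    })
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/kafkacruisecontrol/rebalance?{query}",
        method="POST",
    )


if __name__ == "__main__":
    req = build_rebalance_request(CRUISE_CONTROL_URL)
    print(req.get_method(), req.full_url)
```

Keeping `dryrun=true` (the Cruise Control default) lets you review the proposed plan before you send the same request with `dryrun=false` to execute it.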