
E-MapReduce: Enhanced features of Hive in EMR

Last Updated: May 30, 2024

This topic describes the mappings between E-MapReduce (EMR) versions and Hive versions, and the enhanced features of Hive in EMR.

The following tables list the Hive version and the enhanced Hive features of each EMR version, grouped by EMR release series.

EMR V5.X series

EMR version

Hive version

Enhanced feature

EMR V5.12.1

Hive 3.1.3

By default, Hive warehouse data is stored in OSS-HDFS.

EMR V5.9.0

Hive 3.1.3

Kerberos authentication is supported.

EMR V5.8.0

Hive 3.1.2

LDAP authentication can be enabled with one click.

EMR V5.6.0

Hive 3.1.2

The following issue is fixed: After speculative execution is enabled for Hive on Tez, both the original task and the speculative task are committed.
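The duplicate-commit problem fixed here is easiest to see as a commit-once rule: when a task has both an original attempt and a speculative attempt, only one of them may commit its output. The following Python sketch illustrates that rule only. The `CommitCoordinator` class and its `can_commit` method are hypothetical names that are not part of Hive, Tez, or EMR, and the sketch does not describe how the EMR fix is implemented.

```python
import threading


class CommitCoordinator:
    """Hypothetical helper: lets at most one attempt per task commit output."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        # task_id -> attempt_id of the attempt that is allowed to commit
        self._winners: dict[str, str] = {}

    def can_commit(self, task_id: str, attempt_id: str) -> bool:
        # The first attempt that asks wins. Later attempts for the same task
        # (for example, the speculative duplicate) are refused, so the task's
        # output is committed only once.
        with self._lock:
            winner = self._winners.setdefault(task_id, attempt_id)
            return winner == attempt_id


coordinator = CommitCoordinator()
committed_files: list[str] = []

# The original attempt asks first, then the speculative attempt.
for attempt in ["task_0_attempt_0", "task_0_attempt_1"]:
    if coordinator.can_commit("task_0", attempt):
        committed_files.append(f"part-00000 from {attempt}")

print(committed_files)
# prints ['part-00000 from task_0_attempt_0']
```

The lock keeps the check-and-record step atomic, which matters because the two attempts of a task normally run concurrently.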

EMR V5.5.0

Hive 3.1.2

  • A batch deletion issue that occurs when Hive uses Jindo is fixed.

  • An out-of-memory (OOM) issue that occurs on HiveServer2 is fixed.

  • Hive on Spark is optimized.

  • Hive is adapted to JindoSDK.

EMR V5.4.0

Hive 3.1.2

In JindoFS in block storage mode, the metadata of multiple Hive tables can be optimized at the same time. By default, this feature is disabled.

EMR V5.3.0

Hive 3.1.2

In JindoFS in block storage mode, the metadata of multiple Hive tables can be optimized at the same time.

EMR V5.2.1

Hive 3.1.2

  • Inaccurate output of the SHOW CREATE TABLE command for tables whose metadata is stored in Data Lake Formation (DLF) is fixed.

  • The default parameters of Hive are optimized to improve the performance of Hive jobs.

  • In the EMR console, the parameter names on the hive-env tab of the Configure tab for the Hive service are changed to uppercase to make the parameters easier to use.

  • A HiveServer2 memory leak caused by user-defined functions (UDFs) is fixed.

  • The error message reported when data cannot be written to a Hive table because the file system is incompatible with the Hive metastore is improved.

EMR V3.X series

EMR version

Hive version

Enhanced feature

EMR V3.46.1

Hive 2.3.9

By default, Hive warehouse data is stored in OSS-HDFS.

EMR V3.40.0

Hive 2.3.8

  • The following issue is fixed: After speculative execution is enabled for Hive on Tez, both the original task and the speculative task are committed.

  • The following issue is fixed: User-defined functions (UDFs) can be called only after you reload the functions.

EMR V3.39.1

Hive 2.3.8

Hive is adapted to JindoSDK.

EMR V3.36.1

Hive 2.3.8

  • Hive is updated to 2.3.8.

  • Inaccurate output of the SHOW CREATE TABLE command for tables whose metadata is stored in Data Lake Formation (DLF) is fixed.

  • The default parameters of Hive are optimized to improve the performance of Hive jobs.

  • In the EMR console, the parameter names on the hive-env tab of the Configure tab for the Hive service are changed to uppercase to make the parameters easier to use.

  • The error message reported when data cannot be written to a Hive table because the file system is incompatible with the Hive metastore is improved.

EMR V3.35.0

Hive 2.3.7

Community issues that are related to fetch tasks are resolved.

EMR V3.34.0

Hive 2.3.7

  • Some default configurations are optimized.
  • Performance is optimized. The cost-based optimization (CBO) feature is enhanced.
  • LDAP authentication can be enabled or disabled with one click.
  • Calcite is updated to 1.12.0.
  • The hive.security.authorization.sqlstd.confwhitelist.append parameter is added.
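With SQL standard-based authorization enabled, Hive lets a session change only the parameters whose names match a whitelist regular expression, and hive.security.authorization.sqlstd.confwhitelist.append supplies an additional expression without replacing the built-in one. The following Python sketch approximates that check. The whitelist values are made-up examples, `is_settable` is a hypothetical helper rather than a Hive API, and Python's `re` syntax stands in for the Java regular expressions that Hive actually uses.

```python
import re

# Hypothetical example values. In Hive, both the built-in whitelist and the
# value of hive.security.authorization.sqlstd.confwhitelist.append are Java
# regular expressions; Python's re module is used here as an approximation.
BUILTIN_WHITELIST = r"hive\.exec\.reducers\.max|mapreduce\.job\.queuename"
APPENDED_WHITELIST = r"mapred\..*|hive\.custom\..*"

# The appended expression acts as an extra alternative to the built-in one.
COMBINED = re.compile(f"(?:{BUILTIN_WHITELIST})|(?:{APPENDED_WHITELIST})")


def is_settable(param: str) -> bool:
    """Return True if a session may SET this parameter."""
    # The whole parameter name must match, not just a substring of it.
    return COMBINED.fullmatch(param) is not None


for name in ["mapred.reduce.tasks", "hive.exec.reducers.max",
             "hive.security.authorization.enabled"]:
    print(name, is_settable(name))
# prints:
# mapred.reduce.tasks True
# hive.exec.reducers.max True
# hive.security.authorization.enabled False
```

Full matching is the important design point: with a substring search, a pattern such as `mapred\..*` would also accept names that merely contain `mapred.`.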

EMR V3.33.0

Hive 2.3.7

  • Hive is updated to 2.3.7.
  • Metadata from Alibaba Cloud Data Lake Formation (DLF) in an HCatalog table is supported.
  • Hive metadata and job running information can be sent to DataWorks.

EMR V3.32.0

Hive 2.3.5

  • A connection leak in the HiveServer connection pool is fixed.
  • The data collection feature of JindoTable can be enabled or disabled.
  • The performance of ADD COLUMN is optimized.
  • An issue that caused invalid data to be read from Hudi tables is fixed.
  • The default configurations can be adjusted based on the sizes of cluster nodes.

EMR V3.30.0

Hive 2.3.5

  • Metadata from Alibaba Cloud DLF is supported.
  • The issue caused when you read an empty Delta table directory and write data into a dummy file is fixed.
  • Has dependencies are updated to 2.0.1.

EMR V3.29.0

Hive 2.3.5

  • Hive is updated to 2.3.5.6.0.

  • A third-party metastore is supported.

  • The datalake metastore-client parameter is added.

EMR V3.28.0

Hive 2.3.5

Delta 0.6.0 is supported.

EMR V3.27.2

Hive 2.3.5

  • The magic committer in an HCatalog table is supported.
  • Some outdated default configurations are removed.

EMR V3.26.3

Hive 2.3.5

The direct committer in an HCatalog table is supported.

EMR V3.25.0

Hive 2.3.5

An issue where MapReduce tasks failed to run in automatic local mode is fixed.

EMR V3.24.0

Hive 2.3.5

  • SQL statement compatibility can be checked.
  • Hive 2.3.5 is released together with Hadoop 2.8.5.
  • When Hive is restarted, the content in hiveserver2-site.xml is not synchronized to hive-site.xml in the spark-conf folder.
  • The MSCK command can be used to add incremental directories.
  • The bug triggered by the reuse of a Tez container in Hive is fixed.
  • The MSCK command can be used to optimize column directories.
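Conceptually, `MSCK REPAIR TABLE table_name` compares the partition directories under a table's location with the partitions registered in the metastore and adds the ones that are missing. The following Python sketch shows that comparison under simplifying assumptions: directory names follow the `key=value` layout, no storage or metastore API is called, and "incremental" is taken to mean that only new directories are registered, which is an assumption about the EMR feature. `partitions_to_add` is a hypothetical helper, not part of Hive or EMR.

```python
def partitions_to_add(fs_dirs, partition_keys, metastore_parts):
    """Return partition paths found on storage but missing from the metastore.

    fs_dirs: directory paths relative to the table location,
             for example "dt=2024-05-01/hr=00".
    partition_keys: the table's partition columns, in order.
    metastore_parts: partition paths already registered in the metastore.
    """
    found = set()
    for d in fs_dirs:
        levels = d.strip("/").split("/")
        # Only key=value directories whose keys match the partition columns
        # are treated as partitions; anything else (such as staging dirs) is skipped.
        if (all("=" in lvl for lvl in levels)
                and [lvl.split("=", 1)[0] for lvl in levels] == partition_keys):
            found.add("/".join(levels))
    # Only the difference is registered; existing partitions are left untouched.
    return sorted(found - set(metastore_parts))


dirs = ["dt=2024-05-01/hr=00", "dt=2024-05-01/hr=01",
        "dt=2024-05-02/hr=00/", "_tmp/staging"]
known = {"dt=2024-05-01/hr=00"}
print(partitions_to_add(dirs, ["dt", "hr"], known))
# prints ['dt=2024-05-01/hr=01', 'dt=2024-05-02/hr=00']
```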

EMR V3.23.0

Hive 2.3.5

  • Hive hooks configured in earlier versions of Hive are removed.
  • Multiple COUNT(DISTINCT) expressions are supported when hive.groupby.skewindata is enabled to optimize skewed data.
  • An issue that caused data loss when tables with different bucket versions are joined is fixed.
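With hive.groupby.skewindata enabled, Hive splits a GROUP BY into two stages so that rows for a hot key are spread over several reducers before the final aggregation. The following Python sketch shows one general way to fit several COUNT(DISTINCT) expressions into a two-stage plan: expand each row into one (key, column index, value) tuple per distinct column, de-duplicate the tuples, and then count them per key and column. It runs in a single process, `multi_count_distinct` is a hypothetical helper, and the sketch is not claimed to match how EMR implements the feature.

```python
def multi_count_distinct(rows, key_col, distinct_cols):
    """COUNT(DISTINCT c) for every c in distinct_cols, grouped by key_col."""
    # Stage 1: one tuple per (row, distinct column); the set removes duplicates.
    # In a cluster these tuples would be shuffled by the whole tuple, so rows
    # that share a hot key can still be spread over many reducers.
    stage1 = {
        (row[key_col], i, row[col])
        for row in rows
        for i, col in enumerate(distinct_cols)
        if row[col] is not None  # COUNT(DISTINCT ...) ignores NULLs
    }
    # Stage 2: count the surviving tuples per key and column index.
    # Every key starts at 0 so that all-NULL columns still report a count.
    counts = {row[key_col]: [0] * len(distinct_cols) for row in rows}
    for key, i, _value in stage1:
        counts[key][i] += 1
    return counts


rows = [
    {"shop": "a", "user": 1, "item": "x"},
    {"shop": "a", "user": 1, "item": "y"},
    {"shop": "a", "user": 2, "item": "x"},
    {"shop": "b", "user": 3, "item": None},
]
print(multi_count_distinct(rows, "shop", ["user", "item"]))
# prints {'a': [2, 2], 'b': [1, 0]}
```

The column index in each tuple keeps values from different DISTINCT columns apart, so a single de-duplication pass serves all of the COUNT(DISTINCT) expressions.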

Versions earlier than EMR V3.23.0

Hive 2.X

Hive metadata is stored in an external database that serves as the Hive metastore. Clusters that use the same Hive metastore share its metadata.

EMR V4.X series

EMR version

Hive version

Enhanced feature

EMR V4.10.0

Hive 3.1.2

  • An issue where garbled characters are displayed when Hue is used to query historical records is fixed.

  • The UI display exception that occurs when you use Hue together with Oozie is fixed.

  • An issue where YARN Job Browser sometimes cannot display or terminate jobs is fixed.

  • YARN Job Browser is accessible by default.

  • The Presto protocol is supported by default.

EMR V4.8.0

Hive 3.1.2

  • Some default configurations are optimized.

  • Performance is optimized. The cost-based optimization (CBO) feature is enhanced.

  • LDAP authentication can be enabled or disabled with one click.

EMR V4.6.0

Hive 3.1.2

  • Metadata from Alibaba Cloud Data Lake Formation (DLF) in an HCatalog table is supported.

  • Hive metadata and job running information can be sent to DataWorks.

EMR V4.5.0

Hive 3.1.2

  • Metadata stored in Alibaba Cloud DLF is supported.

  • Ownership-related permissions of Ranger are supported.

EMR V4.4.1

Hive 3.1.2

Default parameter settings are optimized.

EMR V4.4.0

Hive 3.1.2

  • Hive is updated to 3.1.2.
  • JindoFS is optimized.
  • Metastore consistency check (MSCK) is optimized.
  • The Jindo Job Committer in an HCatalog table is supported.
  • Has dependencies are updated.

EMR V4.3.0

Hive 3.1.1

Custom deployment is supported.