Data Lake Formation: Upgrade the EMRHOOK component in an EMR gateway

Last Updated: Aug 09, 2024

This topic describes how to upgrade the EMRHOOK component in an E-MapReduce (EMR) gateway.

Background information

Scenario: Compute jobs run on an EMR gateway that is deployed by using EMR-CLI. If you use the number of visits, which is a metric in the Data Lake Formation (DLF) service, and lifecycle rules that are based on the access frequency of data, you must manually upgrade the EMRHOOK component on the gateway.

Note

The upgrade does not affect computing tasks that are already running. The upgraded EMRHOOK component takes effect for computing tasks that are started or restarted after the upgrade.

Applicable conditions

Procedure (applicable to EMR V5.10.1 and later 5.X versions, and to EMR V3.44.1 and later 3.X versions)

1. Update JAR packages

Log on to the gateway by using SSH and run the following script. Root permissions are required. Before you run the script, replace ${region} with the ID of the region in which the gateway resides, such as cn-hangzhou.

sudo mkdir -p /opt/apps/EMRHOOK/upgrade/
sudo wget https://dlf-repo-${region}.oss-${region}-internal.aliyuncs.com/emrhook/latest/emrhook.tar.gz -P /opt/apps/EMRHOOK/upgrade
sudo tar -p -zxf /opt/apps/EMRHOOK/upgrade/emrhook.tar.gz -C /opt/apps/EMRHOOK/upgrade/
sudo cp -p /opt/apps/EMRHOOK/upgrade/emrhook/* /opt/apps/EMRHOOK/emrhook-current/
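
The ${region} placeholder in the download URL is not defined by the gateway, so an unset value produces an invalid URL. A minimal sketch, assuming you prefer to set the region ID once as a shell variable instead of editing the URL by hand (the variable names region and url are illustrative and not part of the product):

```shell
# Illustrative only: set the region ID once and build the download URL from it.
region="cn-hangzhou"   # replace with the region ID of your gateway
url="https://dlf-repo-${region}.oss-${region}-internal.aliyuncs.com/emrhook/latest/emrhook.tar.gz"

# Print the URL so that you can check it before you download anything.
echo "$url"
```

If the printed URL looks correct, you can pass "$url" to wget in place of the literal URL in the script above.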

2. Modify EMR Hive configurations

Important

The value of ${hive-jar} varies based on EMR Hive versions. For EMR Hive 2, set ${hive-jar} to hive-hook-hive23.jar. For EMR Hive 3, set ${hive-jar} to hive-hook-hive31.jar.

  • hive-site.xml (/etc/taihao-apps/hive-conf/hive-site.xml)

    Configuration item: hive.aux.jars.path. Add the following information to the end of the value of hive.aux.jars.path: ,/opt/apps/EMRHOOK/emrhook-current/${hive-jar}. Note that the comma (,) is used as a separator.

    Configuration item: hive.exec.post.hooks. Set hive.exec.post.hooks to com.aliyun.emr.meta.hive.hook.LineageLoggerHook.

  • hive-env.sh (/etc/taihao-apps/hive-conf/hive-env.sh)

    Configuration item: HIVE_AUX_JARS_PATH. Add the following information to the end of the value of HIVE_AUX_JARS_PATH: ,/opt/apps/EMRHOOK/emrhook-current/${hive-jar}. Note that the comma (,) is used as a separator.
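
Appending to a comma-separated value by hand makes it easy to add the same JAR twice or to drop the separator. The following sketch shows one way to compute the new value. The helper append_entry and the existing value in current are hypothetical, and the sketch only prints the result; it does not edit hive-site.xml or hive-env.sh.

```shell
# Hypothetical helper: append an entry to a delimited list unless it is already present.
append_entry() {
  list="$1"; sep="$2"; entry="$3"
  case "${sep}${list}${sep}" in
    *"${sep}${entry}${sep}"*)
      printf '%s\n' "$list"                    # entry already present: keep the value as-is
      ;;
    *)
      if [ -z "$list" ]; then
        printf '%s\n' "$entry"                 # empty value: no separator needed
      else
        printf '%s\n' "${list}${sep}${entry}"  # append with the separator
      fi
      ;;
  esac
}

hive_jar="hive-hook-hive31.jar"              # EMR Hive 3; use hive-hook-hive23.jar for EMR Hive 2
current="/opt/apps/example/lib/example.jar"  # placeholder for the existing value
new_value=$(append_entry "$current" "," "/opt/apps/EMRHOOK/emrhook-current/${hive_jar}")
echo "$new_value"
```

Because the helper leaves a value unchanged when the entry is already present, it is safe to run again after a repeated upgrade.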

3. Modify EMR Spark configurations

Important

The value of ${spark-jar} varies based on EMR Spark versions. For EMR Spark 2, set ${spark-jar} to spark-hook-spark24.jar. For EMR Spark 3, set ${spark-jar} to spark-hook-spark30.jar.

  • spark-defaults.conf (/etc/taihao-apps/spark-conf/spark-defaults.conf)

    Configuration item: spark.driver.extraClassPath. Add the following information to the end of the value of spark.driver.extraClassPath: :/opt/apps/EMRHOOK/emrhook-current/${spark-jar}. Note that the colon (:) is used as a separator.

    Configuration item: spark.executor.extraClassPath. Add the following information to the end of the value of spark.executor.extraClassPath: :/opt/apps/EMRHOOK/emrhook-current/${spark-jar}. Note that the colon (:) is used as a separator.

    Configuration item: spark.sql.queryExecutionListeners. Set spark.sql.queryExecutionListeners to com.aliyun.emr.meta.spark.listener.EMRQueryLogger.
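
After you edit spark-defaults.conf, a quick check can confirm that all three items are in place. The following is a minimal sketch: the function check_spark_conf is hypothetical, and the demo runs against a temporary sample file whose existing classpath entries are placeholders.

```shell
# Hypothetical check: confirm that a spark-defaults.conf file contains the three settings.
check_spark_conf() {
  conf="$1"; jar_path="$2"
  for key in spark.driver.extraClassPath spark.executor.extraClassPath; do
    # The key must be followed by whitespace or "=", and its line must contain the JAR path.
    if ! grep -E "^${key}[[:space:]=]" "$conf" | grep -qF "$jar_path"; then
      echo "missing ${jar_path} in ${key}"
      return 1
    fi
  done
  if ! grep -E "^spark.sql.queryExecutionListeners[[:space:]=]" "$conf" \
      | grep -qF "com.aliyun.emr.meta.spark.listener.EMRQueryLogger"; then
    echo "spark.sql.queryExecutionListeners is not set to EMRQueryLogger"
    return 1
  fi
  echo "ok"
}

# Demo against a sample file. On a gateway, pass
# /etc/taihao-apps/spark-conf/spark-defaults.conf instead.
sample=$(mktemp)
cat > "$sample" <<'EOF'
spark.driver.extraClassPath /opt/apps/example/lib/*:/opt/apps/EMRHOOK/emrhook-current/spark-hook-spark30.jar
spark.executor.extraClassPath /opt/apps/example/lib/*:/opt/apps/EMRHOOK/emrhook-current/spark-hook-spark30.jar
spark.sql.queryExecutionListeners com.aliyun.emr.meta.spark.listener.EMRQueryLogger
EOF
result=$(check_spark_conf "$sample" "/opt/apps/EMRHOOK/emrhook-current/spark-hook-spark30.jar")
echo "$result"
```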

Procedure (applicable to EMR 5.X versions earlier than V5.10.1 and EMR 3.X versions earlier than V3.44.1)

1. Update JAR packages

(1) Log on to the gateway by using SSH and run the following script. You must have root permissions. Note that you must replace ${region} with the current region ID, such as cn-hangzhou.

The following script is used to download and decompress the latest EMRHOOK JAR package. After the EMRHOOK JAR package is decompressed, perform the following two steps to update the decompressed JAR packages.

sudo mkdir -p /opt/apps/EMRHOOK/upgrade/
sudo wget https://dlf-repo-${region}.oss-${region}-internal.aliyuncs.com/emrhook/latest/emrhook.tar.gz -P /opt/apps/EMRHOOK/upgrade
sudo tar -p -zxf /opt/apps/EMRHOOK/upgrade/emrhook.tar.gz -C /opt/apps/EMRHOOK/upgrade/

(2) Rename the decompressed JAR packages. The version of the EMRHOOK component varies based on EMR versions, and in these EMR versions the JAR file names include that version. Therefore, before you replace the original JAR packages, you must manually add the version of the current EMRHOOK component to the name of each decompressed JAR package. After the JAR packages are renamed, copy them to replace the original JAR packages.

For example, in EMR V3.43.1, the version of the EMRHOOK component is 1.1.4, and the naming rule of JAR packages is hive-hook-${version}-hive20.jar. In this case, replace ${version} with 1.1.4, the version of the current EMRHOOK component. The following sample code provides an example:

cd /opt/apps/EMRHOOK/upgrade/emrhook
sudo mv hive-hook-hive20.jar hive-hook-1.1.4-hive20.jar
sudo mv hive-hook-hive23.jar hive-hook-1.1.4-hive23.jar
sudo mv hive-hook-hive31.jar hive-hook-1.1.4-hive31.jar
sudo mv spark-hook-spark24.jar spark-hook-1.1.4-spark24.jar
sudo mv spark-hook-spark30.jar spark-hook-1.1.4-spark30.jar

(3) Copy the renamed JAR packages to replace the original JAR packages.

sudo cp -p /opt/apps/EMRHOOK/upgrade/emrhook/* /opt/apps/EMRHOOK/emrhook-current/
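
The renaming in step (2) follows one pattern: the EMRHOOK version is inserted after the -hook- part of each file name. The following sketch expresses that pattern as a loop. The helper versioned_name is hypothetical, and the demo works on empty files in a temporary directory instead of /opt/apps/EMRHOOK/upgrade/emrhook.

```shell
# Hypothetical helper: insert the EMRHOOK version into a JAR file name.
versioned_name() {
  name="$1"; ver="$2"
  prefix="${name%%-hook-*}"   # "hive" or "spark"
  suffix="${name#*-hook-}"    # for example, "hive23.jar"
  case "$suffix" in
    "${ver}-"*) printf '%s\n' "$name" ;;                        # already versioned: keep the name
    *) printf '%s-hook-%s-%s\n' "$prefix" "$ver" "$suffix" ;;
  esac
}

version="1.1.4"         # EMRHOOK version in EMR V3.43.1, as in the example above
demo_dir=$(mktemp -d)   # stand-in for /opt/apps/EMRHOOK/upgrade/emrhook
touch "$demo_dir/hive-hook-hive23.jar" "$demo_dir/spark-hook-spark24.jar"

for f in "$demo_dir"/*-hook-*.jar; do
  target="$demo_dir/$(versioned_name "$(basename "$f")" "$version")"
  # Skip files whose names already contain the version.
  [ "$f" = "$target" ] || mv "$f" "$target"
done
ls "$demo_dir"
```

On a gateway, the same loop would need root permissions, because the upgrade directory is created with sudo.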

2. Modify EMR Hive configurations

Important

The value of ${hive-jar} varies based on EMR Hive versions. For EMR Hive 2, set ${hive-jar} to hive-hook-${emrhook-version}-hive23.jar. For EMR Hive 3, set ${hive-jar} to hive-hook-${emrhook-version}-hive31.jar. Set ${emrhook-version} to the version of the EMRHOOK component. For example, if the version is 1.1.4, the file name for EMR Hive 2 is hive-hook-1.1.4-hive23.jar.

  • hive-site.xml (/etc/taihao-apps/hive-conf/hive-site.xml)

    Configuration item: hive.aux.jars.path. Add the following information to the end of the value of hive.aux.jars.path: ,/opt/apps/EMRHOOK/emrhook-current/${hive-jar}. Note that the comma (,) is used as a separator.

    Configuration item: hive.exec.post.hooks. Set hive.exec.post.hooks to com.aliyun.emr.meta.hive.hook.LineageLoggerHook.

  • hive-env.sh (/etc/taihao-apps/hive-conf/hive-env.sh)

    Configuration item: HIVE_AUX_JARS_PATH. Add the following information to the end of the value of HIVE_AUX_JARS_PATH: ,/opt/apps/EMRHOOK/emrhook-current/${hive-jar}. Note that the comma (,) is used as a separator.
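
The value of ${hive-jar} depends on two inputs: the EMR Hive major version and the EMRHOOK version. A minimal sketch of that mapping, assuming only the two combinations named above (the function name hive_hook_jar is hypothetical):

```shell
# Hypothetical helper: build the Hive hook JAR name for EMR versions that
# include the EMRHOOK version in the file name.
hive_hook_jar() {
  hive_major="$1"; emrhook_version="$2"
  case "$hive_major" in
    2) flavor="hive23" ;;
    3) flavor="hive31" ;;
    *) echo "unsupported Hive major version: $hive_major" >&2; return 1 ;;
  esac
  printf 'hive-hook-%s-%s.jar\n' "$emrhook_version" "$flavor"
}

hive_jar=$(hive_hook_jar 3 1.1.4)

# The string to append to hive.aux.jars.path and HIVE_AUX_JARS_PATH,
# including the leading comma separator.
echo ",/opt/apps/EMRHOOK/emrhook-current/${hive_jar}"
```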

3. Modify EMR Spark configurations

Important

The value of ${spark-jar} varies based on EMR Spark versions. For EMR Spark 2, set ${spark-jar} to spark-hook-${emrhook-version}-spark24.jar. For EMR Spark 3, set ${spark-jar} to spark-hook-${emrhook-version}-spark30.jar. Set ${emrhook-version} to the version of the EMRHOOK component. For example, if the version is 1.1.4, the file name for EMR Spark 2 is spark-hook-1.1.4-spark24.jar.

  • spark-defaults.conf (/etc/taihao-apps/spark-conf/spark-defaults.conf)

    Configuration item: spark.driver.extraClassPath. Add the following information to the end of the value of spark.driver.extraClassPath: :/opt/apps/EMRHOOK/emrhook-current/${spark-jar}. Note that the colon (:) is used as a separator.

    Configuration item: spark.executor.extraClassPath. Add the following information to the end of the value of spark.executor.extraClassPath: :/opt/apps/EMRHOOK/emrhook-current/${spark-jar}. Note that the colon (:) is used as a separator.

    Configuration item: spark.sql.queryExecutionListeners. Set spark.sql.queryExecutionListeners to com.aliyun.emr.meta.spark.listener.EMRQueryLogger.
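
The two classpath items can also be appended with sed instead of a text editor. The following sketch runs against a temporary sample file, and the existing classpath entries in that file are placeholders. It assumes that each key already exists on its own line and that the JAR has not been appended before, because running it twice appends the entry twice.

```shell
hook_dir="/opt/apps/EMRHOOK/emrhook-current"
spark_jar="spark-hook-1.1.4-spark24.jar"   # EMR Spark 2 with EMRHOOK 1.1.4

# Sample file; on a gateway the real file is
# /etc/taihao-apps/spark-conf/spark-defaults.conf (back it up first).
conf=$(mktemp)
cat > "$conf" <<'EOF'
spark.driver.extraClassPath      /opt/apps/example/lib/*
spark.executor.extraClassPath    /opt/apps/example/lib/*
spark.sql.queryExecutionListeners com.aliyun.emr.meta.spark.listener.EMRQueryLogger
EOF

# On the driver and executor classpath lines only, replace any trailing
# whitespace with ":<path to the hook JAR>". Other lines are left unchanged.
sed -E "/^spark\.(driver|executor)\.extraClassPath[[:space:]]/ s|[[:space:]]*\$|:${hook_dir}/${spark_jar}|" \
  "$conf" > "$conf.new"
mv "$conf.new" "$conf"
cat "$conf"
```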