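This topic describes how to upgrade the EMRHOOK component in a Gateway environment.
Background
Use case: you run compute jobs in an EMR Gateway environment, the gateway was custom-deployed with EMR-CLI, and you use the access-count metric in the Data Lake Formation (DLF) data overview or the data access frequency rules in lifecycle management. In this case, you must upgrade the EMRHOOK component manually.
The upgrade does not affect compute jobs that are already running. Jobs that are started after the upgrade completes pick up the new version automatically.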
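Prerequisites
The EMR cluster uses Data Lake Formation (DLF) for metadata management.
The following steps apply only to gateway clusters that were custom-deployed with EMR-CLI.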
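The two procedures that follow are split by EMR release. As a minimal sketch, assuming the release string has the form EMR-X.Y.Z and that only the 5.x and 3.x release lines matter, a helper like the one below could tell you which procedure applies. The name emrhook_procedure is hypothetical and is not part of EMR or EMR-CLI.

```shell
# Hypothetical helper, not an EMR or EMR-CLI command.
# Prints "unversioned" when the JAR names carry no emrhook version
# (EMR 5.x >= 5.10.1, EMR 3.x >= 3.44.1), "versioned" for older
# 5.x / 3.x releases, and "unknown" for any other major version.
emrhook_procedure() (
  release="${1#EMR-}"              # "EMR-3.43.1" -> "3.43.1"
  major="${release%%.*}"           # "3"
  rest="${release#*.}"             # "43.1"
  minor="${rest%%.*}"              # "43"
  patch="${rest#*.}"               # "1"
  case "$major" in
    5) min_minor=10 ;;
    3) min_minor=44 ;;
    *) echo "unknown"; exit 0 ;;
  esac
  # Both thresholds (5.10.1 and 3.44.1) end in patch level 1.
  if [ "$minor" -gt "$min_minor" ] ||
     { [ "$minor" -eq "$min_minor" ] && [ "$patch" -ge 1 ]; }; then
    echo "unversioned"
  else
    echo "versioned"
  fi
)
```

For example, emrhook_procedure EMR-3.43.1 selects the procedure in which the JAR file names include the emrhook version.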
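Upgrade steps (for EMR 5.x versions >= EMR-5.10.1 and EMR 3.x versions >= EMR-3.44.1)
1. Upgrade the JAR packages
Log on to the gateway over SSH and run the following commands (root privileges are required). Replace ${region} with the ID of your current region, for example cn-hangzhou.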
sudo mkdir -p /opt/apps/EMRHOOK/upgrade/
sudo wget https://dlf-repo-${region}.oss-${region}-internal.aliyuncs.com/emrhook/latest/emrhook.tar.gz -P /opt/apps/EMRHOOK/upgrade
sudo tar -p -zxf /opt/apps/EMRHOOK/upgrade/emrhook.tar.gz -C /opt/apps/EMRHOOK/upgrade/
sudo cp -p /opt/apps/EMRHOOK/upgrade/emrhook/* /opt/apps/EMRHOOK/emrhook-current/
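Because ${region} appears twice in the download URL, it is easy to replace only one occurrence. The sketch below wraps the four commands above in two helpers so that the region is supplied once. The names emrhook_url and fetch_emrhook are hypothetical and exist only for illustration.

```shell
# Sketch only: emrhook_url and fetch_emrhook are hypothetical helper names.
# Build the package URL for a region ID such as cn-hangzhou.
emrhook_url() {
  printf 'https://dlf-repo-%s.oss-%s-internal.aliyuncs.com/emrhook/latest/emrhook.tar.gz\n' "$1" "$1"
}

# Download, unpack, and copy the JARs; each command runs only if the
# previous one succeeded.
fetch_emrhook() (
  region="${1:?usage: fetch_emrhook <region>}"
  upgrade_dir=/opt/apps/EMRHOOK/upgrade
  sudo mkdir -p "$upgrade_dir" &&
  sudo wget "$(emrhook_url "$region")" -P "$upgrade_dir" &&
  sudo tar -p -zxf "$upgrade_dir/emrhook.tar.gz" -C "$upgrade_dir/" &&
  sudo cp -p "$upgrade_dir"/emrhook/* /opt/apps/EMRHOOK/emrhook-current/
)
```

With these definitions, fetch_emrhook cn-hangzhou would run the same steps for the cn-hangzhou region.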
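2. Modify the Hive configuration
The value of ${hive-jar} depends on the Hive version: use hive-hook-hive23.jar for Hive 2 and hive-hook-hive31.jar for Hive 3.
hive-site.xml (/etc/taihao-apps/hive-conf/hive-site.xml)
Configuration item: hive.aux.jars.path. Append the following to the end of the value (note that the separator is a comma):
,/opt/apps/EMRHOOK/emrhook-current/${hive-jar}
Configuration item: hive.exec.post.hooks. Add the following to the value:
com.aliyun.emr.meta.hive.hook.LineageLoggerHook
hive-env.sh (/etc/taihao-apps/hive-conf/hive-env.sh)
Configuration item: HIVE_AUX_JARS_PATH. Append the following to the end of the value (note that the separator is a comma):
,/opt/apps/EMRHOOK/emrhook-current/${hive-jar}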
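When a value is edited by hand it is easy to add the same entry twice or to leave a stray leading comma on an empty value. The following is a minimal sketch of a helper that computes the new comma-separated value. The name append_csv is hypothetical; the helper only prints the result and does not edit hive-site.xml or hive-env.sh.

```shell
# Hypothetical helper: append an item to a comma-separated value
# unless it is already present. Prints the resulting value.
append_csv() (
  value="$1"; item="$2"
  case ",$value," in
    *",$item,"*) printf '%s\n' "$value" ;;         # already listed, keep as-is
    ",,")        printf '%s\n' "$item" ;;          # value was empty, no leading comma
    *)           printf '%s\n' "$value,$item" ;;   # normal case: append after a comma
  esac
)
```

For example, append_csv "$current_value" /opt/apps/EMRHOOK/emrhook-current/hive-hook-hive23.jar prints the value to write back for a Hive 2 gateway, where current_value is assumed to hold the existing setting.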
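3. Modify the Spark configuration
The value of ${spark-jar} depends on the Spark version: use spark-hook-spark24.jar for Spark 2 and spark-hook-spark30.jar for Spark 3.
spark-defaults.conf (/etc/taihao-apps/spark-conf/spark-defaults.conf)
Configuration item: spark.driver.extraClassPath. Append the following to the end of the value (note that the separator is a colon):
:/opt/apps/EMRHOOK/emrhook-current/${spark-jar}
Configuration item: spark.executor.extraClassPath. Append the following to the end of the value (note that the separator is a colon):
:/opt/apps/EMRHOOK/emrhook-current/${spark-jar}
Configuration item: spark.sql.queryExecutionListeners. Add the following to the value:
com.aliyun.emr.meta.spark.listener.EMRQueryLogger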
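After editing, a quick check can confirm that all three Spark settings were changed. The sketch below assumes the usual spark-defaults.conf layout of one "key value" or "key=value" pair per line. The name check_spark_hook is hypothetical.

```shell
# Hypothetical check: succeed only if both classpath settings mention the
# given JAR and the listener setting mentions EMRQueryLogger.
check_spark_hook() (
  conf="$1"; jar="$2"
  listener=com.aliyun.emr.meta.spark.listener.EMRQueryLogger
  grep -E '^[[:space:]]*spark\.driver\.extraClassPath[[:space:]=]' "$conf" | grep -qF "$jar" &&
  grep -E '^[[:space:]]*spark\.executor\.extraClassPath[[:space:]=]' "$conf" | grep -qF "$jar" &&
  grep -E '^[[:space:]]*spark\.sql\.queryExecutionListeners[[:space:]=]' "$conf" | grep -qF "$listener"
)
```

For example, check_spark_hook /etc/taihao-apps/spark-conf/spark-defaults.conf spark-hook-spark30.jar returns a zero exit status only when all three settings are in place for Spark 3.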
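Upgrade steps (for EMR 5.x versions < EMR-5.10.1 and EMR 3.x versions < EMR-3.44.1)
1. Upgrade the JAR packages
i). Log on to the gateway over SSH and run the following commands (root privileges are required). Replace ${region} with the ID of your current region, for example cn-hangzhou.
The following commands download and extract the latest emrhook JAR packages. After extraction, follow the subsequent steps to upgrade the JARs.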
sudo mkdir -p /opt/apps/EMRHOOK/upgrade/
sudo wget https://dlf-repo-${region}.oss-${region}-internal.aliyuncs.com/emrhook/latest/emrhook.tar.gz -P /opt/apps/EMRHOOK/upgrade
sudo tar -p -zxf /opt/apps/EMRHOOK/upgrade/emrhook.tar.gz -C /opt/apps/EMRHOOK/upgrade/
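ii). The emrhook minor version differs between EMR versions, so before you replace the JARs you must manually rename them to match the emrhook version that is currently installed, and then copy them into place.
For example, in EMR-3.43.1 the emrhook component version is 1.1.4 and the JARs follow the naming pattern hive-hook-${version}-hive20.jar, so rename the extracted JARs to match: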
cd /opt/apps/EMRHOOK/upgrade/emrhook
sudo mv hive-hook-hive20.jar hive-hook-1.1.4-hive20.jar
sudo mv hive-hook-hive23.jar hive-hook-1.1.4-hive23.jar
sudo mv hive-hook-hive31.jar hive-hook-1.1.4-hive31.jar
sudo mv spark-hook-spark24.jar spark-hook-1.1.4-spark24.jar
sudo mv spark-hook-spark30.jar spark-hook-1.1.4-spark30.jar
iii). After renaming, run:
sudo cp -p /opt/apps/EMRHOOK/upgrade/emrhook/* /opt/apps/EMRHOOK/emrhook-current/
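The renaming in step ii can also be scripted so that the emrhook version is typed only once. The following is a minimal sketch with two hypothetical helpers, versioned_jar_name and rename_hook_jars. It assumes you have write access to the directory (for example, by running as root) and that the extracted JARs use the unversioned names shown above.

```shell
# Hypothetical helper: insert the emrhook version into a JAR name,
# e.g. hive-hook-hive20.jar -> hive-hook-1.1.4-hive20.jar.
versioned_jar_name() (
  name="$1"; version="$2"
  case "$name" in
    *"-$version-"*) printf '%s\n' "$name" ;;   # already carries this version
    hive-hook-*)    printf 'hive-hook-%s-%s\n'  "$version" "${name#hive-hook-}" ;;
    spark-hook-*)   printf 'spark-hook-%s-%s\n' "$version" "${name#spark-hook-}" ;;
    *)              printf '%s\n' "$name" ;;   # leave other files untouched
  esac
)

# Rename every hook JAR in a directory; safe to run more than once.
rename_hook_jars() (
  dir="$1"; version="$2"
  cd "$dir" || exit 1
  for jar in hive-hook-*.jar spark-hook-*.jar; do
    [ -e "$jar" ] || continue                  # glob matched nothing
    new="$(versioned_jar_name "$jar" "$version")"
    [ "$jar" = "$new" ] || mv "$jar" "$new"
  done
)
```

For the EMR-3.43.1 example, rename_hook_jars /opt/apps/EMRHOOK/upgrade/emrhook 1.1.4 would perform the same renames as the mv commands in step ii.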
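2. Modify the Hive configuration
The value of ${hive-jar} depends on the Hive version: use hive-hook-${emrhook-version}-hive23.jar for Hive 2 and hive-hook-${emrhook-version}-hive31.jar for Hive 3. Replace ${emrhook-version} with the component version, for example hive-hook-1.1.4-hive23.jar.
hive-site.xml (/etc/taihao-apps/hive-conf/hive-site.xml)
Configuration item: hive.aux.jars.path. Append the following to the end of the value (note that the separator is a comma):
,/opt/apps/EMRHOOK/emrhook-current/${hive-jar}
Configuration item: hive.exec.post.hooks. Add the following to the value:
com.aliyun.emr.meta.hive.hook.LineageLoggerHook
hive-env.sh (/etc/taihao-apps/hive-conf/hive-env.sh)
Configuration item: HIVE_AUX_JARS_PATH. Append the following to the end of the value (note that the separator is a comma):
,/opt/apps/EMRHOOK/emrhook-current/${hive-jar}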
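Because the JAR name now depends on both the Hive version and the emrhook version, it can help to derive the full path in one place. The following sketch uses a hypothetical helper, hive_hook_jar, that follows the naming rule described above. It only prints the path and does not modify any file.

```shell
# Hypothetical helper: print the full path of the Hive hook JAR for a
# Hive major version (2 or 3) and an emrhook version such as 1.1.4.
hive_hook_jar() (
  hive_major="$1"; emrhook_version="$2"
  case "$hive_major" in
    2) suffix=hive23 ;;
    3) suffix=hive31 ;;
    *) echo "unsupported Hive major version: $hive_major" >&2; exit 1 ;;
  esac
  printf '/opt/apps/EMRHOOK/emrhook-current/hive-hook-%s-%s.jar\n' \
    "$emrhook_version" "$suffix"
)
```

For example, hive_hook_jar 2 1.1.4 prints the path to append, after a comma, to hive.aux.jars.path and HIVE_AUX_JARS_PATH on a Hive 2 gateway.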
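3. Modify the Spark configuration
The value of ${spark-jar} depends on the Spark version: use spark-hook-${emrhook-version}-spark24.jar for Spark 2 and spark-hook-${emrhook-version}-spark30.jar for Spark 3. Replace ${emrhook-version} with the component version, for example spark-hook-1.1.4-spark24.jar.
spark-defaults.conf (/etc/taihao-apps/spark-conf/spark-defaults.conf)
Configuration item: spark.driver.extraClassPath. Append the following to the end of the value (note that the separator is a colon):
:/opt/apps/EMRHOOK/emrhook-current/${spark-jar}
Configuration item: spark.executor.extraClassPath. Append the following to the end of the value (note that the separator is a colon):
:/opt/apps/EMRHOOK/emrhook-current/${spark-jar}
Configuration item: spark.sql.queryExecutionListeners. Add the following to the value:
com.aliyun.emr.meta.spark.listener.EMRQueryLogger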
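The three Spark changes can be summarized for a given Spark and emrhook version before you edit the file. The sketch below prints each configuration item followed by a tab and the fragment to add. The name spark_hook_settings is hypothetical, and the output is a checklist rather than valid spark-defaults.conf content.

```shell
# Hypothetical helper: list the fragments to add to spark-defaults.conf
# for a Spark major version (2 or 3) and an emrhook version such as 1.1.4.
# Output format: "<configuration item><TAB><fragment to add>".
spark_hook_settings() (
  spark_major="$1"; emrhook_version="$2"
  case "$spark_major" in
    2) suffix=spark24 ;;
    3) suffix=spark30 ;;
    *) echo "unsupported Spark major version: $spark_major" >&2; exit 1 ;;
  esac
  jar="/opt/apps/EMRHOOK/emrhook-current/spark-hook-$emrhook_version-$suffix.jar"
  # Classpath entries are appended after a colon.
  printf '%s\t:%s\n' spark.driver.extraClassPath "$jar"
  printf '%s\t:%s\n' spark.executor.extraClassPath "$jar"
  printf '%s\t%s\n' spark.sql.queryExecutionListeners \
    com.aliyun.emr.meta.spark.listener.EMRQueryLogger
)
```

For example, spark_hook_settings 2 1.1.4 lists the fragments that use spark-hook-1.1.4-spark24.jar.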