This topic describes how to upgrade JindoSDK that is deployed in an E-MapReduce (EMR) cluster in different scenarios.
Prerequisites
An EMR cluster is created. For more information, see Create a cluster.
Scenario 1: Upgrade JindoSDK in an existing cluster
For clusters of EMR V3.40.0 or a later minor version, or clusters of EMR V5.6.0 or a later minor version, you can upgrade JindoSDK in the clusters if you encounter known version-specific issues in JindoData or you want to use new features of JindoSDK. For information about known version-specific issues, see Known issues in JindoData 4.X.
If you upgrade JindoSDK from 4.6.8 or earlier to 4.6.9 or later, or to a version of the 6.X series, the default temporary job path used by JindoCommitter changes. To prevent data loss caused by the upgrade, you must add fs.jdo.committer.allow.concurrent=false to the core-site.xml file of Hadoop-Common or add spark.hadoop.fs.jdo.committer.allow.concurrent=false to the configurations of Spark. After JindoSDK is upgraded on all nodes of your cluster, including gateway nodes, you can remove these settings.
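Before you start the upgrade, it can help to confirm that the setting is really present. The following is a minimal sketch that checks a core-site.xml file for the property. The function name has_concurrent_commit_disabled is hypothetical and is not part of JindoSDK or EMR, and the sketch assumes that the <name> and <value> elements of the property are adjacent in the file.

```shell
# Hypothetical helper, not part of JindoSDK or EMR.
# Succeeds (exit code 0) only if the given core-site.xml sets
# fs.jdo.committer.allow.concurrent to false.
has_concurrent_commit_disabled() {
  conf_file="$1"
  [ -r "$conf_file" ] || return 1
  # Join the XML into one line so that <name> and <value> can be matched
  # together even when they are on separate lines.
  tr -d '\n' < "$conf_file" | grep -Eq \
    '<name>[[:space:]]*fs\.jdo\.committer\.allow\.concurrent[[:space:]]*</name>[[:space:]]*<value>[[:space:]]*false[[:space:]]*</value>'
}
```

For example, you might call has_concurrent_commit_disabled "$HADOOP_CONF_DIR/core-site.xml" on each node, assuming that HADOOP_CONF_DIR points to the Hadoop configuration directory in your environment.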
Step 1: Prepare software packages and upgrade scripts
Log on to the master node of your cluster. For more information, see Log on to a cluster.
Download the patch package to the home directory of the user emr-user and decompress the package.
su - emr-user
cd /home/emr-user/
wget https://jindodata-binary.oss-cn-shanghai.aliyuncs.com/resources/emr-taihao/jindosdk-patches.tar.gz
tar zxf jindosdk-patches.tar.gz
Download the JindoSDK software package jindosdk-{VERSION}-{PLATFORM}.tar.gz to the jindosdk-patches directory that you obtained in the previous step.
In this example, JindoSDK is upgraded to 6.3.4.
cd jindosdk-patches
wget https://jindodata-binary.oss-cn-shanghai.aliyuncs.com/release/6.3.4/jindosdk-6.3.4-linux.tar.gz
ls -l
Sample content of the jindosdk-patches directory:
-rwxrwxr-x 1 emr-user emr-user 2439 May 01 00:00 apply_all.sh
-rwxrwxr-x 1 emr-user emr-user 7315 May 01 00:00 apply.sh
-rw-rw-r-- 1 emr-user emr-user 40 May 01 00:00 hosts
-rw-r----- 1 emr-user emr-user xxxxxxxxx May 01 00:00 jindosdk-6.3.4-linux.tar.gz
-rwxrwxr-x 1 emr-user emr-user 1112 May 01 00:00 revert_all.sh
-rwxrwxr-x 1 emr-user emr-user 2042 May 01 00:00 revert.sh
Step 2: Configure node information
Manual configuration
Run the following command to edit the hosts file in the jindosdk-patches directory:
vim hosts
Add the hostnames, such as master-1-1 and core-1-1, of all nodes in the cluster to the hosts file. Enter one hostname in each line.
Sample file content:
master-1-1
core-1-1
core-1-2
Automatic configuration
You can run the following command to obtain the information of all nodes. If you fail to obtain the hosts file, you need to manually configure the node information.
cat /usr/local/taihao-executor-all/data/cache/.cluster_context | jq --raw-output '.nodes[].hostname.alias[]' > hosts
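Whichever method you use, a quick sanity check of the hosts file can catch an empty or malformed file before the upgrade script reads it. The following sketch uses a hypothetical function name, hosts_file_ok, and checks only the format that this topic describes: one hostname in each line.

```shell
# Hypothetical helper: succeeds only if the hosts file is usable, that is,
# it is not empty and every line holds exactly one hostname.
hosts_file_ok() {
  hosts_file="${1:-hosts}"
  # -s is true only if the file exists and is not empty.
  [ -s "$hosts_file" ] || return 1
  # Reject blank lines and lines that contain spaces or tabs.
  if grep -Eq '^$|[[:space:]]' "$hosts_file"; then
    return 1
  fi
  return 0
}
```

If hosts_file_ok hosts fails after the automatic configuration, fall back to the manual configuration.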
Step 3: Upgrade JindoSDK
Run the apply_all.sh script to upgrade JindoSDK to a specific version.
./apply_all.sh $NEW_JINDOSDK_VERSION # Replace $NEW_JINDOSDK_VERSION with the version of JindoSDK that you want to upgrade to.
For example, you can run the following command to upgrade JindoSDK in an EMR cluster to 6.3.4:
./apply_all.sh 6.3.4
If the returned information contains ### DONE, the script is successfully run.
>>> updating ... master-1-1
>>> updating ... core-1-1
>>> updating ... core-1-2
### DONE
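On a cluster with many nodes, the output can be long. If you save it to a file, a check like the following sketch can confirm that the ### DONE marker is present and that every host in the hosts file was reported. The function name check_apply_log and the log file are hypothetical, and the sketch assumes that each node is reported in exactly the format shown in the sample output above.

```shell
# Hypothetical helper: prints every host from the hosts file that has no
# ">>> updating ... <host>" line in the saved script output. Fails if the
# output lacks the "### DONE" marker or if any host is missing.
check_apply_log() {
  hosts_file="$1"
  log_file="$2"
  rc=0
  grep -q '^### DONE' "$log_file" || rc=1
  # The "|| [ -n ... ]" part also handles a last line without a newline.
  while IFS= read -r host || [ -n "$host" ]; do
    [ -n "$host" ] || continue
    # -F: fixed string, -x: the whole line must match.
    if ! grep -Fxq -e ">>> updating ... $host" "$log_file"; then
      echo "$host"
      rc=1
    fi
  done < "$hosts_file"
  return "$rc"
}
```

For example, you might run ./apply_all.sh 6.3.4 | tee apply.log and then check_apply_log hosts apply.log. An empty result with exit code 0 means that all listed hosts were reported.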
Step 4: Check the upgrade result
ls -l /opt/apps/JINDOSDK/jindosdk-current/lib
If you successfully upgrade JindoSDK from 6.2.0 to 6.3.4, the following information is returned:
lrwxrwxrwx 1 emr-user emr-user 64 Apr 12 11:08 jindo-core-6.2.0.jar -> /opt/apps/JINDOSDK/jindosdk-6.3.4-linux/lib/jindo-core-6.3.4.jar
lrwxrwxrwx 1 emr-user emr-user 82 Apr 12 11:08 jindo-core-linux-el7-aarch64-6.2.0.jar -> /opt/apps/JINDOSDK/jindosdk-6.3.4-linux/lib/jindo-core-linux-el7-aarch64-6.3.4.jar
lrwxrwxrwx 1 emr-user emr-user 63 Apr 12 11:08 jindo-sdk-6.2.0.jar -> /opt/apps/JINDOSDK/jindosdk-6.3.4-linux/lib/jindo-sdk-6.3.4.jar
lrwxrwxrwx 1 emr-user emr-user 50 Apr 12 11:08 native -> /opt/apps/JINDOSDK/jindosdk-6.3.4-linux/lib/native
lrwxrwxrwx 1 emr-user emr-user 57 Apr 12 11:08 site-packages -> /opt/apps/JINDOSDK/jindosdk-6.3.4-linux/lib/site-packages
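As the sample output shows, the file names keep the old version number while the link targets carry the new one, so the targets are what you need to inspect. The following sketch automates that check for one node. The function name lib_links_match_version is hypothetical, and the sketch assumes that the installation directory is named jindosdk-<version>-<platform>, as in the sample output.

```shell
# Hypothetical helper: succeeds only if every symbolic link in the lib
# directory points to a path that contains "/jindosdk-<version>-".
lib_links_match_version() {
  lib_dir="$1"
  version="$2"
  found=0
  for entry in "$lib_dir"/*; do
    [ -L "$entry" ] || continue
    found=1
    target=$(readlink "$entry")
    case "$target" in
      *"/jindosdk-$version-"*) ;;
      *) echo "unexpected target: $entry -> $target"; return 1 ;;
    esac
  done
  # Fail if no symbolic links were found at all.
  [ "$found" -eq 1 ]
}
```

For example: lib_links_match_version /opt/apps/JINDOSDK/jindosdk-current/lib 6.3.4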
Step 5: Restart services
For jobs that run on YARN, such as Spark Streaming or Flink jobs, perform a rolling restart on YARN NodeManager after the jobs stop.
After you complete the upgrade, restart the related services in the EMR console, such as Hive, Presto, Impala, Flink, Ranger, Spark, and Zeppelin.
For example, to restart the Hive service, go to the Hive service page of the desired EMR cluster and use the restart option in the upper-right corner.
Scenario 2: Upgrade JindoSDK when you create a cluster or scale out an existing cluster
If you want to upgrade JindoSDK when you create a cluster or scale out an existing cluster, you can add a bootstrap action in the EMR console. This ensures that JindoSDK can be upgraded to the latest version. To upgrade JindoSDK in an efficient and accurate manner, perform the following operations:
Step 1: Prepare an upgrade package
Run the following commands to download the jindosdk-patches.tar.gz and jindosdk-{VERSION}-{PLATFORM}.tar.gz packages and the bootstrap_jindosdk.sh script:
In this example, JindoSDK is upgraded to 6.3.4.
mkdir jindo-patch
cd jindo-patch
wget https://jindodata-binary.oss-cn-shanghai.aliyuncs.com/resources/emr-taihao/jindosdk-patches.tar.gz
wget https://jindodata-binary.oss-cn-shanghai.aliyuncs.com/release/6.3.4/jindosdk-6.3.4-linux.tar.gz
wget https://jindodata-binary.oss-cn-shanghai.aliyuncs.com/resources/emr-taihao/bootstrap_jindosdk.sh
ls -l
The following information is returned:
-rw-r----- 1 hadoop hadoop xxxx May 01 00:00 bootstrap_jindosdk.sh
-rw-r----- 1 hadoop hadoop xxxxxxxxx May 01 00:00 jindosdk-6.3.4-linux.tar.gz
-rw-r----- 1 hadoop hadoop xxxx May 01 00:00 jindosdk-patches.tar.gz
Run the following command to prepare an upgrade package:
bash bootstrap_jindosdk.sh -gen $NEW_JINDOSDK_VERSION # Replace $NEW_JINDOSDK_VERSION with the version of JindoSDK that you want to upgrade to.
Note: If you want to upgrade JindoSDK when you scale out a cluster, use the -gen parameter to generate a lightweight upgrade package. If you want to upgrade JindoSDK when you create a cluster, use the -gen-full parameter to generate a complete upgrade package.
The following command generates the package for upgrading JindoSDK to 6.3.4 when you scale out a cluster:
bash bootstrap_jindosdk.sh -gen 6.3.4
After you prepare the upgrade package, the following information is returned:
Generated patch at /home/emr-user/jindo-patch/jindosdk-bootstrap-patches.tar.gz
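If you generate packages for both scenarios from a script, the choice between the two parameters can be made explicit. The following sketch maps a scenario name to the parameter described in the note in this step. The function name gen_flag_for and the scenario names create and scale-out are hypothetical labels introduced only for this illustration.

```shell
# Hypothetical helper: maps a scenario name to the generation parameter
# of bootstrap_jindosdk.sh described in this step.
gen_flag_for() {
  case "$1" in
    scale-out) printf '%s\n' "-gen" ;;       # lightweight upgrade package
    create)    printf '%s\n' "-gen-full" ;;  # complete upgrade package
    *)         echo "unknown scenario: $1" >&2; return 1 ;;
  esac
}
```

For example, bash bootstrap_jindosdk.sh "$(gen_flag_for create)" 6.3.4 would generate a complete package, assuming that -gen-full takes the version in the same way as -gen.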
Step 2: Upload the upgrade package
Upload the patch package and bootstrap script to Object Storage Service (OSS). You can upload the patch package and the script for EMR clusters by running Hadoop commands, by using OSSUtils or OSS Browser, or in the OSS console.
For example, you can upload the bootstrap script to oss://<bucket-name>/path/to/patch/bootstrap_jindosdk.sh and the patch package to oss://<bucket-name>/path/to/patch/jindosdk-bootstrap-patches.tar.gz.
hadoop fs -mkdir -p oss://<bucket-name>/path/to/patch/
cd /home/emr-user/jindo-patch/
hadoop fs -put jindosdk-bootstrap-patches.tar.gz oss://<bucket-name>/path/to/patch/
hadoop fs -put bootstrap_jindosdk.sh oss://<bucket-name>/path/to/patch/
hadoop fs -ls oss://<bucket-name>/path/to/patch/
The following information is returned:
Found 2 items
-rw-rw-rw- 1 2634 2022-05-13 14:07 oss://<bucket-name>/.../bootstrap_jindosdk.sh
-rw-rw-rw- 1 597342992 2022-05-13 13:41 oss://<bucket-name>/.../jindosdk-bootstrap-patches.tar.gz
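If you upload from a script, you may want the script to stop when either file is missing from the listing. The following sketch reads the listing on standard input and looks for both file names. The function name listing_has_upgrade_files is hypothetical, and the sketch assumes that the object path is the last field of each listing line, as in the output above.

```shell
# Hypothetical helper: reads a directory listing on standard input and
# succeeds only if both uploaded files appear in it.
listing_has_upgrade_files() {
  listing=$(cat)
  for name in bootstrap_jindosdk.sh jindosdk-bootstrap-patches.tar.gz; do
    # Compare the part after the last slash of the last field with the name.
    printf '%s\n' "$listing" | awk -v n="$name" '
      { k = split($NF, parts, "/"); if (parts[k] == n) found = 1 }
      END { exit !found }' || {
      echo "missing: $name" >&2
      return 1
    }
  done
}
```

To use it, pipe the output of the listing command shown above into listing_has_upgrade_files.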
Step 3: Add a bootstrap action
Add a bootstrap action in the EMR console. For more information, see Manage bootstrap actions.
The following table describes the parameters that you can configure to add a bootstrap action.
Parameter | Description | Example |
Name | The name of the bootstrap action that you want to add. | update_jindosdk |
Script Address | The OSS path where the script file is located. The path must start with oss://. | oss://<bucket-name>/path/to/patch/bootstrap_jindosdk.sh |
Parameter | The parameter of the bootstrap action script. The parameter is used to specify the value of the variable that is referenced in the script. | |
Execution Scope | Select Cluster. | Cluster |
Execution Time | Select After Component Startup. | After Component Startup |
Execution Failure Policy | Select Proceed. | Proceed |
Step 4: Restart services
Restart related services for the upgrade to take effect.
After you create a cluster, restart the related services, such as Hive, Presto, Impala, Flink, Ranger, Spark, and Zeppelin.
After you scale out an existing cluster, restart the related services for the new nodes, such as Hive, Presto, Impala, Flink, Ranger, Spark, and Zeppelin.
Scenario 3: Roll back JindoSDK to the default version
For clusters of EMR V3.40.0 or a later minor version, or clusters of EMR V5.6.0 or a later minor version, if you encounter issues during the upgrade of JindoSDK, you can perform the following operations to roll back JindoSDK to the default version:
Step 1: Prepare a rollback script
Log on to the master node of your cluster. For more information, see Log on to a cluster.
Download the patch package to the home directory of the user emr-user and decompress the package.
su - emr-user
cd /home/emr-user/
wget https://jindodata-binary.oss-cn-shanghai.aliyuncs.com/resources/emr-taihao/jindosdk-patches.tar.gz
tar zxf jindosdk-patches.tar.gz
cd jindosdk-patches
ls -l
The following information is returned:
-rwxrwxr-x 1 emr-user emr-user 2439 May 01 00:00 apply_all.sh
-rwxrwxr-x 1 emr-user emr-user 7315 May 01 00:00 apply.sh
-rw-rw-r-- 1 emr-user emr-user 40 May 01 00:00 hosts
-rwxrwxr-x 1 emr-user emr-user 1112 May 01 00:00 revert_all.sh
-rwxrwxr-x 1 emr-user emr-user 2042 May 01 00:00 revert.sh
Step 2: Configure node information
Manual configuration
Run the following command to edit the hosts file in the jindosdk-patches directory:
vim hosts
Add the hostnames, such as master-1-1 and core-1-1, of all nodes in the cluster to the hosts file. Enter one hostname in each line.
Sample file content:
master-1-1
core-1-1
core-1-2
Automatic configuration
You can run the following command to obtain the information of all nodes. If you fail to obtain the hosts file, you need to manually configure the node information.
cat /usr/local/taihao-executor-all/data/cache/.cluster_context | jq --raw-output '.nodes[].hostname.alias[]' > hosts
Step 3: Perform a rollback
Run the following script to roll back all changes:
./revert_all.sh
If the returned information contains ### DONE, the script is successfully run.
>>> updating ... master-1-1
>>> updating ... core-1-1
>>> updating ... core-1-2
### DONE
Step 4: Confirm the rollback result
ls -l /opt/apps/JINDOSDK/jindosdk-current/lib
If you successfully roll JindoSDK back to 6.2.0, the following information is returned:
-rw-r--r-- 1 emr-user emr-user 1253740 Apr 24 17:40 jindo-core-6.2.0.jar
-rw-r--r-- 1 emr-user emr-user 13110547 Apr 24 17:40 jindo-core-linux-el7-aarch64-6.2.0.jar
-rw-r--r-- 1 emr-user emr-user 4432227 Apr 24 17:40 jindo-sdk-6.2.0.jar
drwxr-xr-x 2 emr-user emr-user 4096 Apr 24 17:40 native
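In the sample outputs of this topic, an upgraded lib directory contains only symbolic links, whereas a rolled-back one contains regular files. Under that assumption, the following sketch checks one node for leftover links. The function name lib_is_rolled_back is hypothetical, and the check is only as reliable as the assumption it rests on.

```shell
# Hypothetical helper: succeeds only if the lib directory contains at least
# one jindo-sdk-*.jar file and none of its entries is a symbolic link.
lib_is_rolled_back() {
  lib_dir="$1"
  for entry in "$lib_dir"/*; do
    if [ -L "$entry" ]; then
      echo "still a symbolic link: $entry"
      return 1
    fi
  done
  # If no file matches, the pattern stays unexpanded and the -f test fails.
  for jar in "$lib_dir"/jindo-sdk-*.jar; do
    [ -f "$jar" ] && return 0
  done
  return 1
}
```

For example: lib_is_rolled_back /opt/apps/JINDOSDK/jindosdk-current/lib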
Step 5: Restart services
For jobs that run on YARN, such as Spark Streaming or Flink jobs, perform a rolling restart on YARN NodeManager after the jobs stop.
Restart related services, such as Hive, Presto, Impala, Flink, Ranger, Spark, and Zeppelin, for the rollback to take effect.
For example, to restart the Hive service, go to the Hive service page of the desired EMR cluster and use the restart option in the upper-right corner.