This topic provides answers to some frequently asked questions about Jindo DistCp.
What do I do if objects are listed at a low speed?
Problem description
When I use Jindo DistCp, objects are listed at a low speed, and the following message is returned:
Successfully list objects with prefix xxx/yyy/ in bucket xxx recursive 0 result 315 dur 100036.615031MS
In the message, dur 100036.615031MS indicates the time taken to list the objects, in milliseconds. Under normal conditions, about 1,000 Object Storage Service (OSS) objects can be listed within 1 second, so you can use this rate to judge whether the listing time for a directory is normal. For example, the preceding message shows that listing 315 objects in a directory took about 100 seconds, which is abnormal.
Solution
Run the following command to increase the memory of your client by raising the maximum JVM heap size to 4,096 MB:
export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Xmx4096m"
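To judge whether a listing is slow, you can compare the rate implied by a log line with the baseline of roughly 1,000 objects per second. The following is a minimal sketch, not part of Jindo DistCp: list_rate is a hypothetical helper that assumes the log line contains "result <count> dur <milliseconds>MS" as in the message above, and prints the number of objects listed per second.

```shell
# Hypothetical helper (not shipped with Jindo DistCp): print objects listed per second.
# $1: a log line that contains "result <count> dur <millis>MS".
list_rate() {
  printf '%s\n' "$1" | awk '{
    for (i = 1; i < NF; i++) {
      if ($i == "result") count = $(i + 1) + 0   # number of objects listed
      if ($i == "dur")    ms    = $(i + 1) + 0   # "+ 0" drops the trailing "MS"
    }
    # Print a rate only when a positive duration was found.
    if (ms > 0) printf "%.2f\n", count / (ms / 1000)
  }'
}

list_rate "Successfully list objects with prefix xxx/yyy/ in bucket xxx recursive 0 result 315 dur 100036.615031MS"
```

For the sample message, the rate is about 3 objects per second, far below the baseline.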
What do I do if a checksum-related error occurs?
Problem description
The following error message is reported when Jindo DistCp is used:
Failed to get checksum store.
Solution
By default, OSS-HDFS uses the COMPOSITE_CRC checksum algorithm. If the dfs.checksum.combine.mode parameter of HDFS is set to MD5MD5CRC, you must also set the fs.oss.checksum.combine.mode parameter to MD5MD5CRC so that both sides use the same algorithm. Sample command:
hadoop jar jindo-distcp-${version}.jar --src /data --dest oss://destBucket/ --hadoopConf fs.oss.checksum.combine.mode=MD5MD5CRC
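If you are not sure which checksum mode HDFS uses, you can read it from the client configuration before you run the copy. The following is a minimal sketch: checksum_opt is a hypothetical helper, and the usage lines assume that the hdfs command is on your PATH and reads the same configuration as the source cluster.

```shell
# Hypothetical helper: print the extra Jindo DistCp option for a given HDFS
# checksum mode. Prints nothing for any other mode, such as COMPOSITE_CRC,
# which is the OSS-HDFS default.
checksum_opt() {
  case "$1" in
    MD5MD5CRC) echo "--hadoopConf fs.oss.checksum.combine.mode=MD5MD5CRC" ;;
    *)         : ;;
  esac
}

# Example usage (assumption: the hdfs CLI reflects the source cluster configuration):
#   mode="$(hdfs getconf -confKey dfs.checksum.combine.mode)"
#   hadoop jar jindo-distcp-${version}.jar --src /data --dest oss://destBucket/ $(checksum_opt "$mode")
```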
What do I do if an error occurs when I copy an Object Storage Service (OSS) object to OSS-HDFS?
Problem description
The following error message is returned when Jindo DistCp is used to copy an OSS object to OSS-HDFS:
Exception raised while copying data file, verify checksum failed
Solution
If the objects in OSS were not migrated from HDFS to OSS by using Jindo DistCp, you must add the --disableChecksum parameter to disable the checksum feature. Sample command:
hadoop jar jindo-distcp-${version}.jar --src oss://ossBucket/ --dest oss://dlsBucket/ --disableChecksum
How do I check whether Jindo DistCp is successfully run?
If you do not add the --ignore parameter when you run Jindo DistCp and an exception occurs during the copy process, the system reports an error and stops the copy operation. If you add the --ignore parameter when you run Jindo DistCp, you can view the information about Jindo DistCp counters, such as COPY_FAILED and CHECKSUM_DIFF, to check whether data is complete. For more information, see Jindo DistCp counters in the Use Jindo DistCp topic.
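When --ignore is used, the counter check can be scripted. The following is a minimal sketch under stated assumptions: the job output has been saved to a file, and each counter appears on its own line in NAME=VALUE form, as in typical Hadoop job counter output. check_counters is a hypothetical helper, and the exact counter layout may differ in your Jindo DistCp version.

```shell
# Hypothetical helper: read job output from stdin and report failure counters.
# Assumption: counters are printed one per line as NAME=VALUE.
# Prints each nonzero COPY_FAILED or CHECKSUM_DIFF counter and returns 1 if
# any is found. Returns 0 otherwise.
check_counters() {
  awk -F'=' '
    { name = $1; gsub(/^[ \t]+|[ \t]+$/, "", name) }   # trim indentation around the name
    (name == "COPY_FAILED" || name == "CHECKSUM_DIFF") && ($2 + 0) > 0 {
      print name "=" ($2 + 0)
      bad = 1
    }
    END { exit bad + 0 }
  '
}

# Example usage (distcp.log is a hypothetical file that holds the saved job output):
#   check_counters < distcp.log || echo "Copy incomplete: inspect the counters above"
```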