This topic provides answers to some frequently asked questions about Jindo DistCp.
What do I do if objects are listed at a low speed?
Problem description
When I use Jindo DistCp, objects are listed at a low speed, and the following message is returned:
Successfully list objects with prefix xxx/yyy/ in bucket xxx recursive 0 result 315 dur 100036.615031MS
In the message, result 315 is the number of objects that were listed, and dur 100036.615031MS is the time taken to list them, in milliseconds. Normally, about 1,000 Object Storage Service (OSS) objects can be listed per second, so you can use this rate to judge whether the time taken to list a directory is normal. For example, the preceding message shows that listing 315 objects in the directory took about 100 seconds, which is abnormal.
Solution
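Run the following command to increase the memory of your client (the -Xmx4096m option raises the maximum JVM heap size of the client to 4,096 MB), and then run Jindo DistCp again from the same shell: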
export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Xmx4096m"
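To apply the 1,000-objects-per-second rule of thumb to a specific message, you can pull the object count (result) and the duration (dur) out of the log line and compute the rate. The following is a minimal shell sketch, not part of Jindo DistCp: the function name listing_rate is hypothetical, and it assumes that the message has the same layout as the one shown above. Very small directories may fall below the baseline simply because of per-request latency, so treat the result as a rough signal.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): estimate the listing rate from a Jindo DistCp
# log line and compare it with the rule of thumb of about 1,000 objects/second.
# ASSUMPTION: the line contains "result <count> dur <milliseconds>MS".
listing_rate() {
  local line="$1" count dur_ms
  # Object count: the number after "result".
  count=$(printf '%s\n' "$line" | sed -n 's/.* result \([0-9][0-9]*\) dur .*/\1/p')
  # Duration in milliseconds: the number between "dur " and "MS".
  dur_ms=$(printf '%s\n' "$line" | sed -n 's/.* dur \([0-9.][0-9.]*\)MS.*/\1/p')
  if [ -z "$count" ] || [ -z "$dur_ms" ]; then
    echo "unparsed"
    return 1
  fi
  awk -v c="$count" -v d="$dur_ms" 'BEGIN {
    secs = d / 1000
    # A zero duration counts as instantaneous (avoids division by zero).
    if (secs <= 0) { printf "%d objects in 0.0 s: ok\n", c; exit }
    rate = c / secs
    status = (rate < 1000) ? "below baseline" : "ok"
    printf "%d objects in %.1f s (%.1f objects/s): %s\n", c, secs, rate, status
  }'
}

# Usage (not run here; distcp.log is a hypothetical file name):
# grep 'Successfully list objects' distcp.log | while IFS= read -r l; do listing_rate "$l"; done
```

For the message shown above, the function reports 315 objects in 100.0 s (3.1 objects/s), far below the baseline.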
What do I do if a checksum-related error occurs?
Problem description
The following error message is reported when Jindo DistCp is used:
Failed to get checksum store.
Solution
By default, OSS-HDFS uses the COMPOSITE_CRC checksum algorithm. If the dfs.checksum.combine.mode parameter of HDFS is set to MD5MD5CRC, set the fs.oss.checksum.combine.mode parameter to MD5MD5CRC as well so that HDFS and OSS-HDFS combine checksums in the same way. Sample command:
hadoop jar jindo-distcp-${version}.jar --src /data --dest oss://destBucket/ --hadoopConf fs.oss.checksum.combine.mode=MD5MD5CRC
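If you are not sure which mode HDFS uses, you can read it with hdfs getconf -confKey dfs.checksum.combine.mode and add the override only when it is needed. The following shell sketch assumes that approach. The function name checksum_conf_args is hypothetical, and the function takes the mode as an argument so that it can be tried without a cluster.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print the extra Jindo DistCp option that matches
# the HDFS checksum combine mode passed in as $1, for example the output of
# `hdfs getconf -confKey dfs.checksum.combine.mode`.
checksum_conf_args() {
  case "$1" in
    MD5MD5CRC)
      # HDFS combines checksums with MD5MD5CRC, so OSS-HDFS must do the same.
      printf '%s\n' "--hadoopConf fs.oss.checksum.combine.mode=MD5MD5CRC"
      ;;
    COMPOSITE_CRC)
      # Matches the OSS-HDFS default; no override is needed, so print nothing.
      ;;
    *)
      printf 'unknown checksum combine mode: %s\n' "$1" >&2
      return 1
      ;;
  esac
}

# Usage on a cluster (not run here; ${version} is your Jindo DistCp version).
# $extra is left unquoted on purpose so that it splits into separate arguments
# or disappears entirely when it is empty.
# extra=$(checksum_conf_args "$(hdfs getconf -confKey dfs.checksum.combine.mode)")
# hadoop jar jindo-distcp-${version}.jar --src /data --dest oss://destBucket/ $extra
```

Failing on an unrecognized value, instead of silently adding no option, keeps a typo or an empty lookup from producing a copy that later fails checksum verification.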
What do I do if an error occurs when I copy an Object Storage Service (OSS) object to OSS-HDFS?
Problem description
The following error message is returned when Jindo DistCp is used to copy an OSS object to OSS-HDFS:
Exception raised while copying data file, verify checksum failed
Solution
If the objects in OSS were not migrated from HDFS to OSS by using Jindo DistCp, add the --disableChecksum parameter to disable checksum verification. Sample command:
hadoop jar jindo-distcp-${version}.jar --src oss://ossBucket/ --dest oss://dlsBucket/ --disableChecksum
How do I check whether Jindo DistCp is successfully run?
If you do not add the --ignore parameter when you run Jindo DistCp and an exception occurs during the copy process, the system reports an error and stops the copy operation. If you add the --ignore parameter, the copy operation does not stop at such exceptions, so check the Jindo DistCp counters, such as COPY_FAILED and CHECKSUM_DIFF, to determine whether the data is complete. For more information, see Jindo DistCp counters in the Use Jindo DistCp topic.
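When --ignore is used, the counter check can be scripted. The following shell sketch is a hypothetical helper (check_distcp_counters), not a Jindo DistCp tool. It assumes that the counters are printed as NAME=value lines, the usual layout of Hadoop job counters, and that a counter that does not appear has a value of 0. Verify both assumptions against the output of your version before you rely on it.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): read Jindo DistCp output on stdin and report
# whether COPY_FAILED or CHECKSUM_DIFF is greater than 0.
# ASSUMPTIONS: counters appear as "NAME=value" lines, possibly indented;
# a counter that is missing from the output is treated as 0.
check_distcp_counters() {
  awk '
    # Trim leading and trailing whitespace so indented lines still match.
    { sub(/^[ \t]+/, ""); sub(/[ \t\r]+$/, "") }
    /^(COPY_FAILED|CHECKSUM_DIFF)=[0-9]+$/ {
      split($0, kv, "=")
      if (kv[2] + 0 > 0) { bad = bad (bad == "" ? "" : " ") kv[1] "=" kv[2] }
    }
    END {
      if (bad == "") { print "complete"; exit 0 }
      print "incomplete: " bad
      exit 1
    }'
}

# Usage (not run here; distcp.log is a hypothetical file name):
# hadoop jar jindo-distcp-${version}.jar --src /data --dest oss://destBucket/ --ignore 2>&1 | tee distcp.log
# check_distcp_counters < distcp.log
```

The helper exits with a non-zero status when data may be incomplete, so it can gate a follow-up step in a script.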