You can use snapshots created by using the Snapshot command to restore data that is accidentally deleted, or to back up data so that services can continue to run when an error occurs. You can use the snapshot feature of OSS-HDFS in the same manner that you use the snapshot feature of HDFS. The snapshot feature of OSS-HDFS supports directory-level operations.
This feature is in the trial phase and is intended for small-scale use. We recommend that you do not use it at large scale.
Prerequisites
A Hadoop environment, Hadoop cluster, or Hadoop client is created. For more information about how to install Hadoop, see Step 2: Create a Hadoop runtime environment.
OSS-HDFS is enabled for specific buckets. For more information about how to enable OSS-HDFS, see Enable OSS-HDFS and grant access permissions.
JindoSDK 4.5.0 or later is installed and configured. For more information, see Connect non-EMR clusters to OSS-HDFS.
Step 1: Configure environment variables
Connect to an Elastic Compute Service (ECS) instance. For more information, see Connect to an instance.
Go to the bin directory of the installed JindoSDK JAR package.
cd jindosdk-x.x.x/bin/
Note: x.x.x indicates the version number of the JindoSDK JAR package.
Grant the file owner read, write, and execute permissions on the jindo-util file in the bin directory.
chmod 700 jindo-util
Rename the jindo-util file to jindo.
mv jindo-util jindo
Create a configuration file named jindosdk.cfg, and then add the following parameters to the configuration file:
[common]
Retain the following default configurations:
logger.dir = /tmp/jindo-util/
logger.sync = false
logger.consolelogger = false
logger.level = 0
logger.verbose = 0
logger.cleaner.enable = true
hadoopConf.enable = false
[jindosdk]
Specify the following parameters:
<!-- In this example, the China (Hangzhou) region is used. Specify your actual region. -->
fs.oss.endpoint = cn-hangzhou.oss-dls.aliyuncs.com
<!-- Configure the AccessKey ID and AccessKey secret that are used to access OSS-HDFS. -->
fs.oss.accessKeyId = LTAI********
fs.oss.accessKeySecret = KZo1********
Configure environment variables.
export JINDOSDK_CONF_DIR=<JINDOSDK_CONF_DIR>
Set <JINDOSDK_CONF_DIR> to the absolute path of the directory that contains the jindosdk.cfg configuration file.
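Before you run any ./jindo command, it can help to confirm that the environment variable is set correctly. The following is a minimal sketch, not part of JindoSDK: the check_jindosdk_conf function name is hypothetical, and the sketch assumes that JINDOSDK_CONF_DIR names the directory that holds jindosdk.cfg.

```shell
#!/bin/sh
# Sketch: verify that JINDOSDK_CONF_DIR is usable before running ./jindo.
# Assumption: the variable names the directory that contains jindosdk.cfg.
# check_jindosdk_conf is a hypothetical helper, not a JindoSDK command.
check_jindosdk_conf() {
    # Use the first argument if given, otherwise the environment variable.
    conf_dir="${1:-${JINDOSDK_CONF_DIR:-}}"

    case "$conf_dir" in
        /*) ;;  # absolute path, as required
        *)
            echo "JINDOSDK_CONF_DIR must be an absolute path: '$conf_dir'" >&2
            return 1
            ;;
    esac

    if [ ! -f "$conf_dir/jindosdk.cfg" ]; then
        echo "jindosdk.cfg not found in $conf_dir" >&2
        return 1
    fi

    echo "OK: $conf_dir/jindosdk.cfg"
}
```

After you export the variable, call check_jindosdk_conf without arguments. A non-zero exit status means that the path is relative or that the configuration file is missing.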
Step 2: Perform snapshot-related operations
Enable the snapshot feature
For example, you have a bucket named examplebucket and a directory named exampledir in the bucket. To enable the snapshot feature for the exampledir directory, run the following command in the JindoSDK shell CLI:
./jindo admin -allowSnapshot -dlsUri oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir
For more information about how to configure the endpoint of OSS-HDFS, see Connect non-EMR clusters to OSS-HDFS.
Create a snapshot
After you enable the snapshot feature for the exampledir directory in the examplebucket bucket, perform the following operations to create a snapshot:
Create subdirectories and objects.
Create subdirectories named dir1 and dir2 and objects named file1.txt and file2.txt in the exampledir directory.
# Create the dir1 subdirectory.
hdfs dfs -mkdir oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir/dir1
# Create the dir2 subdirectory.
hdfs dfs -mkdir oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir/dir2
# Create the file1.txt object.
hdfs dfs -touchz oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir/file1.txt
# Create the file2.txt object.
hdfs dfs -touchz oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir/file2.txt
Create a snapshot named S1 for the exampledir directory.
hdfs dfs -createSnapshot oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir S1
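If you create snapshots on a schedule, a fixed name such as S1 can be used only once per directory. One possible approach, shown here as a sketch, is to derive the name from the current UTC time. The snapshot_name function and the SNAPSHOT_ROOT variable are hypothetical names introduced for illustration. The sketch only prints the command so that you can review it before you run it.

```shell
#!/bin/sh
# Sketch: build a sortable, timestamped snapshot name, for example s20240131-235959.
# snapshot_name is a hypothetical helper, not part of HDFS or JindoSDK.
snapshot_name() {
    printf 's%s\n' "$(date -u +%Y%m%d-%H%M%S)"
}

# Snapshot-enabled directory taken from the example in this topic.
SNAPSHOT_ROOT="oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir"

# Print the command instead of running it, so the sketch is safe to try anywhere.
echo hdfs dfs -createSnapshot "$SNAPSHOT_ROOT" "$(snapshot_name)"
```

Remove the leading echo to run the command against your own directory.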
Rename a snapshot
You can run the following command in the HDFS shell CLI to rename the S1 snapshot to S2:
hdfs dfs -renameSnapshot oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir S1 S2
Access directories and objects in a snapshot
To access the dir1 subdirectory in the exampledir root directory of the examplebucket bucket, run the following command in the HDFS shell CLI:
hdfs dfs -ls oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir/dir1
A snapshot is exposed under the .snapshot path of the directory for which it was created. To access the dir1 subdirectory as it was captured in the S1 snapshot of the exampledir directory, run the following command in the HDFS shell CLI:
hdfs dfs -ls oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir/.snapshot/S1/dir1
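The path in the preceding command follows the pattern <directory>/.snapshot/<snapshot name>/<relative path>. The following sketch builds such a path from its parts. The snapshot_path function is a hypothetical helper introduced for illustration and performs string handling only. It does not contact OSS-HDFS.

```shell
#!/bin/sh
# Sketch: map a path under a snapshot-enabled directory to the same path
# inside a snapshot. snapshot_path is a hypothetical helper.
# Arguments: <snapshot-enabled directory> <snapshot name> [<relative path>]
snapshot_path() {
    root="${1%/}"      # drop one trailing slash, if any
    name="$2"
    rel="${3:-}"
    rel="${rel#/}"     # drop one leading slash, if any

    if [ -n "$rel" ]; then
        printf '%s/.snapshot/%s/%s\n' "$root" "$name" "$rel"
    else
        # No relative path: return the root of the snapshot itself.
        printf '%s/.snapshot/%s\n' "$root" "$name"
    fi
}
```

For example, hdfs dfs -ls "$(snapshot_path oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir S1 dir1)" lists the same path as the preceding command.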
Compare snapshots
To compare the S1 and S2 snapshots in the exampledir directory, run the following command in the JindoSDK shell CLI:
./jindo admin -snapshotDiff \
-dlsUri oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir \
-fromSnapshot S1 \
-toSnapshot S2
Use a snapshot to restore data
You can use the snapshot feature to back up data and to quickly restore data that you accidentally deleted. For example, assume that you delete the dir1 subdirectory from the exampledir directory of the examplebucket bucket by running the following command:
hdfs dfs -rm -r oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir/dir1
To restore the deleted subdirectory from the S1 snapshot of the exampledir directory, run the following command in the HDFS shell CLI. The command copies dir1 from the snapshot back into the exampledir directory:
hdfs dfs -cp oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir/.snapshot/S1/dir1 oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir
After the data is restored, run the following command to verify that the dir1 subdirectory is available again:
hdfs dfs -ls oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir/dir1
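The restore steps can be wrapped in a small script. The following is a sketch under stated assumptions, not an official tool: restore_from_snapshot, run, and the DRY_RUN variable are hypothetical names. The sketch uses only the standard hdfs dfs -test -e and hdfs dfs -cp commands. By default it prints the copy command without running it. Set DRY_RUN=0 to run the command.

```shell
#!/bin/sh
# Sketch: restore one path from a snapshot. Safe by default (dry run).
DRY_RUN="${DRY_RUN:-1}"

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Arguments: <snapshot-enabled directory> <snapshot name> <relative path>
restore_from_snapshot() {
    root="${1%/}"
    snap="$2"
    rel="${3#/}"
    rel="${rel%/}"
    if [ -z "$rel" ]; then
        echo "a relative path to restore is required" >&2
        return 1
    fi

    src="$root/.snapshot/$snap/$rel"

    # Copy into the parent of the restore target, as in the manual command.
    case "$rel" in
        */*) dest_parent="$root/${rel%/*}" ;;
        *)   dest_parent="$root" ;;
    esac

    # In a real run, refuse to copy over a path that still exists.
    if [ "$DRY_RUN" != "1" ] && hdfs dfs -test -e "$root/$rel"; then
        echo "refusing to restore: $root/$rel already exists" >&2
        return 1
    fi

    run hdfs dfs -cp "$src" "$dest_parent"
}

# Dry run for the example in this topic: prints the hdfs dfs -cp command.
restore_from_snapshot "oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir" S1 dir1
```

The existence check is a design choice of this sketch: it avoids mixing restored data with data that is still present in the directory.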
Delete a snapshot
If you no longer need the S1 snapshot of the exampledir directory, or the S2 snapshot obtained by renaming S1, run the corresponding command in the HDFS shell CLI to delete the snapshot:
Delete the S1 snapshot
hdfs dfs -deleteSnapshot oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir S1
Delete the S2 snapshot
hdfs dfs -deleteSnapshot oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir S2
Disable the snapshot feature
If you no longer need to use the snapshot feature, run the following command in the JindoSDK shell CLI to disable the snapshot feature:
./jindo admin -disallowSnapshot -dlsUri oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir
Before you disable the snapshot feature, make sure that all snapshots of the directory are deleted. Otherwise, an error occurs when you disable the snapshot feature.
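Because every snapshot must be deleted before the feature can be disabled, the two steps can be combined. The following sketch assumes that you already know the names of the remaining snapshots and that the script runs from the JindoSDK bin directory. disable_snapshots, run, and DRY_RUN are hypothetical names. By default the sketch prints the commands without running them. Set DRY_RUN=0 to run them.

```shell
#!/bin/sh
# Sketch: delete the named snapshots, then disable the snapshot feature.
DRY_RUN="${DRY_RUN:-1}"

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Arguments: <snapshot-enabled directory> [<snapshot name>...]
disable_snapshots() {
    root="${1%/}"
    shift

    # Delete each remaining snapshot; stop at the first failure.
    for snap in "$@"; do
        run hdfs dfs -deleteSnapshot "$root" "$snap" || return 1
    done

    # Disable the feature only after all deletions succeeded.
    run ./jindo admin -disallowSnapshot -dlsUri "$root"
}

# Dry run for the example in this topic, where S1 was renamed to S2.
disable_snapshots "oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir" S2
```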