You can use snapshots created with the snapshot commands to restore accidentally deleted data, or to back up data so that services can continue when an error occurs. The snapshot feature of OSS-HDFS works at the directory level and is fully compatible with the snapshot feature of HDFS.
This feature is in trial and small-scale use and is not recommended for large-scale use.
Prerequisites
A Hadoop environment, Hadoop cluster, or Hadoop client is created. For more information about how to install Hadoop, see Step 2: Create a Hadoop runtime environment.
OSS-HDFS is enabled for specific buckets. For more information, see Enable OSS-HDFS and grant access permissions.
JindoSDK 4.5.0 or later is installed and configured. For more information, see Connect non-EMR clusters to OSS-HDFS.
Step 1: Configure environment variables
Connect to an Elastic Compute Service (ECS) instance. For more information, see Connect to an instance.
Go to the bin directory of the installed JindoSDK JAR package.
cd jindosdk-x.x.x/bin/
Note: x.x.x indicates the version number of the JindoSDK JAR package.
Grant the file owner read, write, and execute permissions on the jindo-util file in the bin directory.
chmod 700 jindo-util
Rename the jindo-util file to jindo.
mv jindo-util jindo
Create a configuration file named jindosdk.cfg and add the following parameters. Retain the default configurations in the [common] section and specify your own values in the [jindosdk] section. In this example, the endpoint of the China (Hangzhou) region is used; specify your actual endpoint. The AccessKey ID and AccessKey secret are the credentials that are used to access OSS-HDFS.
[common]
logger.dir = /tmp/jindo-util/
logger.sync = false
logger.consolelogger = false
logger.level = 0
logger.verbose = 0
logger.cleaner.enable = true
hadoopConf.enable = false
[jindosdk]
fs.oss.endpoint = cn-hangzhou.oss-dls.aliyuncs.com
fs.oss.accessKeyId = LTAI********
fs.oss.accessKeySecret = KZo1********
Configure environment variables.
export JINDOSDK_CONF_DIR=<JINDOSDK_CONF_DIR>
Set <JINDOSDK_CONF_DIR> to the absolute path of the directory that contains the jindosdk.cfg configuration file.
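Before you run snapshot commands, it can help to confirm that the variable is set correctly. The following is a minimal sketch, assuming that JINDOSDK_CONF_DIR names the directory that holds jindosdk.cfg, as the _DIR suffix suggests. The check_jindo_conf function is a hypothetical helper and is not part of JindoSDK.

```shell
# Hypothetical helper (not part of JindoSDK): verify that JINDOSDK_CONF_DIR
# is set and that the directory it names contains jindosdk.cfg.
check_jindo_conf() {
  if [ -z "${JINDOSDK_CONF_DIR:-}" ]; then
    echo "JINDOSDK_CONF_DIR is not set" >&2
    return 1
  fi
  if [ ! -f "${JINDOSDK_CONF_DIR}/jindosdk.cfg" ]; then
    echo "jindosdk.cfg not found in ${JINDOSDK_CONF_DIR}" >&2
    return 1
  fi
  echo "Using ${JINDOSDK_CONF_DIR}/jindosdk.cfg"
}
```

Call check_jindo_conf after the export command. A non-zero return value indicates that the variable is empty or points to a location without the configuration file.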
Step 2: Perform snapshot-related operations
The following examples use a bucket named examplebucket that contains a directory named exampledir.
Enable the snapshot feature
Run the following command to enable the snapshot feature for the exampledir directory:
./jindo admin -allowSnapshot -dlsUri oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir
For more information about how to configure the endpoint of OSS-HDFS, see Connect non-EMR clusters to OSS-HDFS.
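The URIs in this topic all follow the same pattern: oss://, the bucket name, a period, the OSS-HDFS endpoint, and then the path. As a rough sketch, the pattern can be captured in a small function so that the bucket and endpoint are written only once. The dls_uri function and the EXAMPLE_DIR variable are hypothetical names used for illustration.

```shell
# Hypothetical helper: join a bucket name, an OSS-HDFS endpoint, and a path
# into a URI of the form used in this topic.
dls_uri() {
  # ${3#/} drops one leading slash, so "/exampledir" and "exampledir"
  # produce the same URI.
  printf 'oss://%s.%s/%s\n' "$1" "$2" "${3#/}"
}

EXAMPLE_DIR=$(dls_uri examplebucket cn-shanghai.oss-dls.aliyuncs.com exampledir)
echo "$EXAMPLE_DIR"
# prints oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir
```

The resulting value can be passed wherever the full URI appears, for example as the argument of -dlsUri.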
Create a snapshot
Create subdirectories and files.
Create subdirectories named dir1 and dir2 and empty files named file1.txt and file2.txt in the exampledir directory.
hdfs dfs -mkdir oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir/dir1
hdfs dfs -mkdir oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir/dir2
hdfs dfs -touchz oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir/file1.txt
hdfs dfs -touchz oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir/file2.txt
Create a snapshot named S1 for the exampledir directory.
hdfs dfs -createSnapshot oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir S1
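If you create snapshots regularly, fixed names such as S1 collide quickly. One possible approach, sketched below, is to derive the snapshot name from the current UTC time. The snapshot_name function, the daily prefix, and the EXAMPLE_DIR variable are assumptions made for illustration. The sketch prints the command instead of running it.

```shell
# Hypothetical helper: build a snapshot name from a prefix and the current
# UTC time, for example daily-20240101-120000.
snapshot_name() {
  printf '%s-%s\n' "${1:-snap}" "$(date -u +%Y%m%d-%H%M%S)"
}

EXAMPLE_DIR="oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir"

# Print the command instead of running it. Remove "echo" to create the snapshot.
echo hdfs dfs -createSnapshot "$EXAMPLE_DIR" "$(snapshot_name daily)"
```

Names in this format sort chronologically, which makes it easier to pick the snapshots to compare or delete later.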
Rename a snapshot
Rename the S1 snapshot to S2:
hdfs dfs -renameSnapshot oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir S1 S2
Access directories and objects in a snapshot
Access the dir1 subdirectory in the exampledir root directory of examplebucket:
hdfs dfs -ls oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir/dir1
Access the same subdirectory as it was captured in the S1 snapshot, through the .snapshot path:
hdfs dfs -ls oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir/.snapshot/S1/dir1
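As the two commands show, the snapshot copy of a path is found by inserting .snapshot and the snapshot name between the snapshot-enabled directory and the relative path. A minimal sketch of that mapping follows. The snapshot_path function is a hypothetical helper.

```shell
# Hypothetical helper: map a path inside a snapshot-enabled directory to its
# location in a given snapshot.
#   $1 = URI of the snapshot-enabled directory
#   $2 = snapshot name
#   $3 = path relative to that directory
snapshot_path() {
  # ${1%/} drops a trailing slash and ${3#/} drops a leading slash, so the
  # joined path never contains a double slash.
  printf '%s/.snapshot/%s/%s\n' "${1%/}" "$2" "${3#/}"
}

snapshot_path oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir S1 dir1
# prints oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir/.snapshot/S1/dir1
```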
Compare snapshots
Compare the S1 and S2 snapshots:
./jindo admin -snapshotDiff \
-dlsUri oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir \
-fromSnapshot S1 \
-toSnapshot S2
Use a snapshot to restore data
For example, you accidentally delete the dir1 subdirectory from the exampledir directory of examplebucket:
hdfs dfs -rm -r oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir/dir1
Restore the deleted subdirectory by copying it from the S1 snapshot back to the exampledir directory:
hdfs dfs -cp oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir/.snapshot/S1/dir1 oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir
After the data is restored, list the restored subdirectory to verify the result:
hdfs dfs -ls oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir/dir1
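The restore step can be wrapped in a small function that builds the source and destination paths and, by default, only prints the copy command so that you can review it first. This is a sketch under stated assumptions: restore_from_snapshot and the DRY_RUN variable are hypothetical names, and the destination directory is assumed to still exist.

```shell
# Hypothetical helper: copy a path from a snapshot back to its original
# parent directory.
#   $1 = URI of the snapshot-enabled directory
#   $2 = snapshot name
#   $3 = path relative to that directory
# By default (DRY_RUN=1) the command is only printed. Set DRY_RUN=0 to run it.
restore_from_snapshot() (
  root="${1%/}"
  snap="$2"
  rel="${3#/}"
  src="$root/.snapshot/$snap/$rel"
  # Copy into the parent of the original location, so that the restored
  # entry gets its original name back.
  case "$rel" in
    */*) dest="$root/${rel%/*}" ;;
    *)   dest="$root" ;;
  esac
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "hdfs dfs -cp $src $dest"
  else
    hdfs dfs -cp "$src" "$dest"
  fi
)

restore_from_snapshot oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir S1 dir1
```

In dry-run mode, the call prints the same hdfs dfs -cp command that is used in the restore step of this topic.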
Delete a snapshot
Delete the S1 or S2 snapshot when you no longer need it:
Delete the S1 snapshot
hdfs dfs -deleteSnapshot oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir S1
Delete the S2 snapshot
hdfs dfs -deleteSnapshot oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir S2
Disable the snapshot feature
If you no longer need to use the snapshot feature, run the following command to disable it:
./jindo admin -disallowSnapshot -dlsUri oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampledir
Before you disable the snapshot feature, make sure that all snapshots of the directory are deleted. Otherwise, the command returns an error.
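Because every snapshot must be deleted first, a cleanup loop can save time when a directory has many snapshots. The sketch below assumes that hdfs dfs -ls on the .snapshot path prints the usual Hadoop listing, in which each entry line starts with a permission string and ends with the full path, and that snapshot names contain no spaces. The snapshot_names and delete_all_snapshots functions are hypothetical helpers. The second function is only defined here, not called.

```shell
# Hypothetical helper: read "hdfs dfs -ls" output on standard input and print
# the last path component of each entry, which is the snapshot name.
snapshot_names() {
  # Entry lines start with a permission string such as drwxr-xr-x.
  # Header lines such as "Found 2 items" are skipped.
  awk '/^[d-]/ { n = split($NF, parts, "/"); print parts[n] }'
}

# Hypothetical helper: delete every snapshot of the directory given as $1.
delete_all_snapshots() (
  dir="${1%/}"
  hdfs dfs -ls "$dir/.snapshot" | snapshot_names | while read -r name; do
    hdfs dfs -deleteSnapshot "$dir" "$name"
  done
)
```

After delete_all_snapshots completes for a directory, the -disallowSnapshot command can be run for that directory. Check the listing format in your environment before you rely on the parsing step.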