OSS-HDFS supports RootPolicy. You can use RootPolicy to configure a custom prefix for OSS-HDFS. This way, jobs can run on OSS-HDFS without modifying the original hdfs:// access prefix.
Prerequisites
A Hadoop environment, Hadoop cluster, or Hadoop client is created. For more information about how to install Hadoop, see Step 2: Create a Hadoop runtime environment.
OSS-HDFS is enabled for specific buckets. For more information about how to enable OSS-HDFS, see Enable OSS-HDFS and grant access permissions.
JindoSDK 4.5.0 or later is installed and configured. For more information, see Connect non-EMR clusters to OSS-HDFS.
Procedure
Configure environment variables.
Connect to an ECS instance. For more information, see Connect to an ECS instance.
Go to the bin directory of the installed JindoSDK JAR package.
cd jindosdk-x.x.x/bin/
Note: x.x.x indicates the version number of the JindoSDK JAR package.
Grant the owner read, write, and execute permissions on the jindo-util file in the bin directory.
chmod 700 jindo-util
Rename the jindo-util file to jindo.
mv jindo-util jindo
Create a configuration file named jindosdk.cfg, and then add the following parameters to the configuration file.
[common]
# Retain the following default configurations.
logger.dir = /tmp/jindo-util/
logger.sync = false
logger.consolelogger = false
logger.level = 0
logger.verbose = 0
logger.cleaner.enable = true
hadoopConf.enable = false
[jindosdk]
# Specify the following parameters.
# In this example, the China (Hangzhou) region is used. Specify your actual region.
fs.oss.endpoint = cn-hangzhou.oss-dls.aliyuncs.com
# Specify the AccessKey ID and AccessKey secret that are used to access OSS-HDFS.
fs.oss.accessKeyId = LTAI********
fs.oss.accessKeySecret = KZo1********
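As a sketch, the file can also be created in place with a heredoc; the path, endpoint, and AccessKey values below are placeholders (assumptions), not working credentials:

```shell
# Create jindosdk.cfg in the current directory (hypothetical location).
# The endpoint and AccessKey values are placeholders; substitute your own.
cat > ./jindosdk.cfg <<'EOF'
[common]
logger.dir = /tmp/jindo-util/
[jindosdk]
fs.oss.endpoint = cn-hangzhou.oss-dls.aliyuncs.com
fs.oss.accessKeyId = LTAI********
fs.oss.accessKeySecret = KZo1********
EOF
# Quick sanity check: the three fs.oss.* keys should be present.
grep -c '^fs\.oss\.' ./jindosdk.cfg
```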
Configure environment variables.
export JINDOSDK_CONF_DIR=<JINDOSDK_CONF_DIR>
Set <JINDOSDK_CONF_DIR> to the absolute path of the directory in which the jindosdk.cfg configuration file is stored.
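For example, if jindosdk.cfg is stored in /etc/jindosdk (a hypothetical path), the variable can be set and persisted as follows:

```shell
# Hypothetical directory that contains jindosdk.cfg; replace with your own path.
export JINDOSDK_CONF_DIR=/etc/jindosdk
# To make the setting survive new shell sessions, append it to ~/.bashrc:
# echo 'export JINDOSDK_CONF_DIR=/etc/jindosdk' >> ~/.bashrc
echo "$JINDOSDK_CONF_DIR"
```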
Configure RootPolicy.
Run the following SetRootPolicy command to specify a registered address that contains a custom prefix for a bucket:
jindo admin -setRootPolicy oss://<bucket_name>.<dls_endpoint>/ hdfs://<your_ns_name>/
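For illustration, with a hypothetical bucket named examplebucket in the China (Hangzhou) region and a custom nsname of test (all three values are assumptions, not defaults), the command would be assembled as follows:

```shell
# All three values are hypothetical; substitute your own.
BUCKET=examplebucket
DLS_ENDPOINT=cn-hangzhou.oss-dls.aliyuncs.com
NS_NAME=test
# Registers hdfs://test/ as a custom access prefix for the bucket root.
CMD="jindo admin -setRootPolicy oss://${BUCKET}.${DLS_ENDPOINT}/ hdfs://${NS_NAME}/"
echo "$CMD"
```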
The following table describes the parameters in the SetRootPolicy command.
Parameter: bucket_name
Description: The name of the bucket for which OSS-HDFS is enabled.

Parameter: dls_endpoint
Description: The endpoint of the region in which the bucket for which OSS-HDFS is enabled resides. Example: cn-hangzhou.oss-dls.aliyuncs.com.
If you do not want to repeatedly add the <dls_endpoint> parameter to the SetRootPolicy command each time you run RootPolicy, you can use one of the following methods to add configuration items to the core-site.xml file of Hadoop:
Method 1:
<configuration>
    <property>
        <name>fs.oss.endpoint</name>
        <value><dls_endpoint></value>
    </property>
</configuration>
Method 2:
<configuration>
    <property>
        <name>fs.oss.bucket.<bucket_name>.endpoint</name>
        <value><dls_endpoint></value>
    </property>
</configuration>

Parameter: your_ns_name
Description: The custom nsname that is used to access OSS-HDFS. Any non-empty string is supported, such as test. The current version supports only the root directory.

Configure the Access Policy discovery address and the Scheme implementation class.
You must configure the following parameters in the core-site.xml file of Hadoop:
<configuration>
    <property>
        <name>fs.accessPolicies.discovery</name>
        <value>oss://<bucket_name>.<dls_endpoint>/</value>
    </property>
    <property>
        <name>fs.AbstractFileSystem.hdfs.impl</name>
        <value>com.aliyun.jindodata.hdfs.HDFS</value>
    </property>
    <property>
        <name>fs.hdfs.impl</name>
        <value>com.aliyun.jindodata.hdfs.JindoHdfsFileSystem</value>
    </property>
</configuration>
If you want to configure Access Policy discovery addresses and Scheme implementation classes for multiple buckets, separate the bucket addresses with commas (,). Example: oss://bucket1.<dls_endpoint>/,oss://bucket2.<dls_endpoint>/.
Run the following command to check whether RootPolicy is successfully configured:
hadoop fs -ls hdfs://<your_ns_name>/
If the following results are returned, RootPolicy is successfully configured:
drwxr-x--x   - hdfs  hadoop  0 2023-01-05 12:27 hdfs://<your_ns_name>/apps
drwxrwxrwx   - spark hadoop  0 2023-01-05 12:27 hdfs://<your_ns_name>/spark-history
drwxrwxrwx   - hdfs  hadoop  0 2023-01-05 12:27 hdfs://<your_ns_name>/tmp
drwxrwxrwx   - hdfs  hadoop  0 2023-01-05 12:27 hdfs://<your_ns_name>/user
Use a custom prefix to access OSS-HDFS.
After you restart services such as Hive and Spark, you can access OSS-HDFS by using a custom prefix.
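As a sketch, assuming the nsname test was registered in the earlier step (a hypothetical value), job and shell paths keep their hdfs:// form while being served by OSS-HDFS:

```shell
# Hypothetical nsname registered via RootPolicy; replace with your own.
NS_NAME=test
# Paths used by jobs keep the hdfs:// prefix but resolve to OSS-HDFS, e.g.:
INPUT="hdfs://${NS_NAME}/user/hive/warehouse"
echo "hadoop fs -ls ${INPUT}"
echo "hadoop fs -put localfile.txt hdfs://${NS_NAME}/tmp/"
```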
Optional. Use RootPolicy for other purposes.
List all registered addresses that contain a custom prefix specified for a bucket
Run the following listAccessPolicies command to list all registered addresses that contain a custom prefix specified for a bucket:
jindo admin -listAccessPolicies oss://<bucket_name>.<dls_endpoint>/
Delete all registered addresses that contain a custom prefix specified for a bucket
Run the following unsetRootPolicy command to delete all registered addresses that contain a custom prefix specified for a bucket:
jindo admin -unsetRootPolicy oss://<bucket_name>.<dls_endpoint>/ hdfs://<your_ns_name>/
For more information about the RootPolicy commands, see Jindo CLI user guide.