This topic describes how to access instances in an Alibaba Cloud virtual private cloud (VPC) from Spark on MaxCompute.
Directly access instances in a VPC
You can access instances in a VPC or custom private domain names from Spark on MaxCompute. Instances in a VPC include Elastic Compute Service (ECS) instances, ApsaraDB for HBase instances, and ApsaraDB RDS instances.
If you want to access instances in a VPC from Spark on MaxCompute, you must add the spark.hadoop.odps.cupid.vpc.domain.list parameter to the spark-defaults.conf file of MaxCompute or the related configuration file of DataWorks to specify one or more instances that you want to access. The value of this parameter is in the JSON format. When you configure this parameter, you must remove spaces and line feeds from the JSON text and merge it into one line.
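The merging step is easy to get wrong by hand. As a minimal sketch, the one-line form can be produced from a readable draft with the standard library (the sample values are taken from the RDS example in this topic):

```python
import json

# Pretty-printed JSON as you would draft it (sample values from this topic).
pretty = """
{
  "regionId": "cn-beijing",
  "vpcs": [
    {
      "vpcId": "vpc-2zeaeq21mb1dmkqh0****",
      "zones": [
        {"urls": [{"domain": "rm-2zem49k73c54z****.mysql.rds.aliyuncs.com", "port": 3306}]}
      ]
    }
  ]
}
"""

# Parse and re-serialize without spaces or line feeds, as the parameter requires.
merged = json.dumps(json.loads(pretty), separators=(",", ":"))
print(merged)
```

This also validates the JSON: a draft with a syntax error fails at json.loads instead of silently producing a broken parameter value.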
The following examples show how to configure the spark.hadoop.odps.cupid.vpc.domain.list parameter when you access different types of instances. The values of the regionId, vpcId, domain, and port parameters in the following examples are for reference only. For information about the ID of each region, see Project operations.

- Access ApsaraDB for MongoDB instances

  The following code shows the value of spark.hadoop.odps.cupid.vpc.domain.list when you access ApsaraDB for MongoDB instances. In this example, a primary instance and a secondary instance are specified.
```json
{
  "regionId": "cn-beijing",
  "vpcs": [
    {
      "vpcId": "vpc-2zeaeq21mb1dmkqh0****",
      "zones": [
        {
          "urls": [
            {"domain": "dds-2ze3230cfea08****.mongodb.rds.aliyuncs.com", "port": 3717},
            {"domain": "dds-2ze3230cfea08****.mongodb.rds.aliyuncs.com", "port": 3717}
          ]
        }
      ]
    }
  ]
}
```

Results of merging JSON text into one line:

```json
{"regionId":"cn-beijing","vpcs":[{"vpcId":"vpc-2zeaeq21mb1dmkqh0****","zones":[{"urls":[{"domain":"dds-2ze3230cfea08****.mongodb.rds.aliyuncs.com","port":3717},{"domain":"dds-2ze3230cfea08****.mongodb.rds.aliyuncs.com","port":3717}]}]}]}
```
- Access an ApsaraDB RDS instance

  The following code shows the value of spark.hadoop.odps.cupid.vpc.domain.list when you access an ApsaraDB RDS instance.
```json
{
  "regionId": "cn-beijing",
  "vpcs": [
    {
      "vpcId": "vpc-2zeaeq21mb1dmkqh0****",
      "zones": [
        {
          "urls": [
            {"domain": "rm-2zem49k73c54z****.mysql.rds.aliyuncs.com", "port": 3306}
          ]
        }
      ]
    }
  ]
}
```

Results of merging JSON text into one line:

```json
{"regionId":"cn-beijing","vpcs":[{"vpcId":"vpc-2zeaeq21mb1dmkqh0****","zones":[{"urls":[{"domain":"rm-2zem49k73c54z****.mysql.rds.aliyuncs.com","port":3306}]}]}]}
```
- Access ApsaraDB for HBase instances

  The following code shows the value of spark.hadoop.odps.cupid.vpc.domain.list when you access ApsaraDB for HBase instances.
```json
{
  "regionId": "cn-beijing",
  "vpcs": [
    {
      "vpcId": "vpc-2zeaeq21mb1dmkqh0exox",
      "zones": [
        {
          "urls": [
            {"domain": "hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com", "port": 2181},
            {"domain": "hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com", "port": 16000},
            {"domain": "hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com", "port": 16020},
            {"domain": "hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com", "port": 2181},
            {"domain": "hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com", "port": 16000},
            {"domain": "hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com", "port": 16020},
            {"domain": "hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com", "port": 2181},
            {"domain": "hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com", "port": 16000},
            {"domain": "hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com", "port": 16020},
            {"domain": "hb-2zecxg2ltnpeg8me4-cor*-***.hbase.rds.aliyuncs.com", "port": 16020},
            {"domain": "hb-2zecxg2ltnpeg8me4-cor*-***.hbase.rds.aliyuncs.com", "port": 16020},
            {"domain": "hb-2zecxg2ltnpeg8me4-cor*-***.hbase.rds.aliyuncs.com", "port": 16020}
          ]
        }
      ]
    }
  ]
}
```

Results of merging JSON text into one line:

```json
{"regionId":"cn-beijing","vpcs":[{"vpcId":"vpc-2zeaeq21mb1dmkqh0exox","zones":[{"urls":[{"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com","port":2181},{"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com","port":16000},{"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com","port":16020},{"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com","port":2181},{"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com","port":16000},{"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com","port":16020},{"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com","port":2181},{"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com","port":16000},{"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com","port":16020},{"domain":"hb-2zecxg2ltnpeg8me4-cor*-***.hbase.rds.aliyuncs.com","port":16020},{"domain":"hb-2zecxg2ltnpeg8me4-cor*-***.hbase.rds.aliyuncs.com","port":16020},{"domain":"hb-2zecxg2ltnpeg8me4-cor*-***.hbase.rds.aliyuncs.com","port":16020}]}]}]}
```
- Access an ApsaraDB for Redis instance

  The following code shows the value of spark.hadoop.odps.cupid.vpc.domain.list when you access an ApsaraDB for Redis instance.
```json
{
  "regionId": "cn-beijing",
  "vpcs": [
    {
      "vpcId": "vpc-2zeaeq21mb1dmkqh0****",
      "zones": [
        {
          "urls": [
            {"domain": "r-2zebda0d3c05****.redis.rds.aliyuncs.com", "port": 3717}
          ]
        }
      ]
    }
  ]
}
```

Results of merging JSON text into one line:

```json
{"regionId":"cn-beijing","vpcs":[{"vpcId":"vpc-2zeaeq21mb1dmkqh0****","zones":[{"urls":[{"domain":"r-2zebda0d3c05****.redis.rds.aliyuncs.com","port":3717}]}]}]}
```
- Access a LogHub instance

  The following code shows the value of spark.hadoop.odps.cupid.vpc.domain.list when you access a LogHub instance.
```json
{
  "regionId": "cn-beijing",
  "vpcs": [
    {
      "zones": [
        {
          "urls": [
            {"domain": "cn-beijing-intranet.log.aliyuncs.com", "port": 80}
          ]
        }
      ]
    }
  ]
}
```

Results of merging JSON text into one line:

```json
{"regionId":"cn-beijing","vpcs":[{"zones":[{"urls":[{"domain":"cn-beijing-intranet.log.aliyuncs.com","port":80}]}]}]}
```
Set the domain parameter to the classic network endpoint or the VPC endpoint of the LogHub instance. For the endpoint of each region, see Endpoints.
- Access a DataHub instance

  The following code shows the value of spark.hadoop.odps.cupid.vpc.domain.list when you access a DataHub instance.
```json
{
  "regionId": "cn-beijing",
  "vpcs": [
    {
      "zones": [
        {
          "urls": [
            {"domain": "dh-cn-beijing.aliyun-inc.com", "port": 80}
          ]
        }
      ]
    }
  ]
}
```

Results of merging JSON text into one line:

```json
{"regionId":"cn-beijing","vpcs":[{"zones":[{"urls":[{"domain":"dh-cn-beijing.aliyun-inc.com","port":80}]}]}]}
```
Set the domain parameter to the ECS endpoint on the classic network.
- Access a custom domain name

  In this example, the custom domain name example.aliyundoc.com is configured in a VPC. Spark on MaxCompute accesses the domain name by using example.aliyundoc.com:80, which is a combination of the domain name and a port number. Perform the following operations before you access the domain name:

  - Associate a zone with the VPC in PrivateZone.
  - On the Cloud Resource Access Authorization page in the RAM console, click Confirm Authorization Policy to grant MaxCompute the read-only permissions on PrivateZone.
  - Add the following parameters to the configurations of the Spark node:

    spark.hadoop.odps.cupid.pvtz.rolearn=acs:ram::xxxxxxxxxxx:role/aliyunodpsdefaultrole
    spark.hadoop.odps.cupid.vpc.usepvtz=true

    The spark.hadoop.odps.cupid.pvtz.rolearn parameter specifies the Alibaba Cloud Resource Name (ARN) of the role, which can be obtained from the RAM console.
  - Add the spark.hadoop.odps.cupid.vpc.domain.list parameter to the configuration file of your Spark job. The following code shows the value of this parameter:
```json
{
  "regionId": "cn-beijing",
  "vpcs": [
    {
      "vpcId": "vpc-2zeaeq21mb1dmkqh0****",
      "zones": [
        {
          "urls": [
            {"domain": "example.aliyundoc.com", "port": 80}
          ],
          "zoneId": "9b7ce89c6a6090e114e0f7c415ed****"
        }
      ]
    }
  ]
}
```

Results of merging JSON text into one line:

```json
{"regionId":"cn-beijing","vpcs":[{"vpcId":"vpc-2zeaeq21mb1dmkqh0****","zones":[{"urls":[{"domain":"example.aliyundoc.com","port":80}],"zoneId":"9b7ce89c6a6090e114e0f7c415ed****"}]}]}
```
- Access an HDFS instance
- Add the hdfs-site.xml file to enable HDFS support. Sample configurations in the file:
```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>dfs://DfsMountpointDomainName:10290</value>
  </property>
  <property>
    <name>fs.dfs.impl</name>
    <value>com.alibaba.dfs.DistributedFileSystem</value>
  </property>
  <property>
    <name>fs.AbstractFileSystem.dfs.impl</name>
    <value>com.alibaba.dfs.DFS</value>
  </property>
</configuration>
```
- Add the spark.hadoop.odps.cupid.vpc.domain.list parameter to the configuration file of your Spark job. The following code shows the value of this parameter:
```json
{
  "regionId": "cn-shanghai",
  "vpcs": [
    {
      "vpcId": "vpc-xxxxxx",
      "zones": [
        {
          "urls": [
            {"domain": "DfsMountpointDomainName", "port": 10290}
          ]
        }
      ]
    }
  ]
}
```

Results of merging JSON text into one line:

```json
{"regionId":"cn-shanghai","vpcs":[{"vpcId":"vpc-xxxxxx","zones":[{"urls":[{"domain":"DfsMountpointDomainName","port":10290}]}]}]}
```
Access instances over VPCs
Compared with the direct access method described in Directly access instances in a VPC, this access method provides higher stability and better performance. In addition, this access method supports Internet access.
- You can use this access method to access instances in a VPC. If your Spark job needs to access instances across multiple VPCs at the same time, you can establish connections between the VPC that you have accessed and other VPCs.
- For a Spark job that runs in a MaxCompute project, the user ID (UID) of the Alibaba Cloud account that owns the MaxCompute project must be the same as the UID of the Alibaba Cloud account that owns the VPC. Otherwise, the following error message appears:

  You are not allowed to use this vpc - vpc owner and project owner must be the same person
For more information about how to establish a VPC connection, see Network connection process.