
MaxCompute: Network connection process

Last Updated: Feb 12, 2025

By default, MaxCompute cannot access a service over the Internet or in a virtual private cloud (VPC). To allow access to the service, you must establish a network connection between MaxCompute and the specified object, such as an IP address, domain name, ApsaraDB RDS instance, ApsaraDB for HBase cluster, or Hadoop cluster. This topic describes the network architecture between MaxCompute and the object that you want to access, along with the supported network connection schemes.

Disclaimer

You can use MaxCompute to establish network connections with services over the Internet or in a VPC free of charge. Before you use MaxCompute to establish a network connection, take note of the following limits:

  • MaxCompute ensures network connectivity. If your network-related code triggers a failover, MaxCompute may rerun tasks on nodes, so you must make sure that your code tolerates reruns. We recommend that you perform only read operations so that repeated write operations do not generate dirty data.

  • Access requests must be forwarded by a proxy, and the number of requests that can be forwarded by a proxy is limited. We recommend that you use persistent connections and manage the number of nodes. An excessive concurrency or a large number of connections may cause network requests to fail.

  • MaxCompute does not provide guaranteed bandwidth and is not responsible if jobs run slowly.

  • The number of outbound proxy IP addresses is limited. If a connection exception occurs due to the limited number of outbound proxy IP addresses, contact Alibaba Cloud technical support.

  • Outbound proxy IP addresses may change. We recommend that you do not enable access control for the service that you want to access. If you configure an IP address whitelist for the service, access to the service may be denied due to changes of outbound proxy IP addresses.
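
The recommendation to manage concurrency can be sketched in Java as follows. This is a minimal illustration and not part of MaxCompute: the class name BoundedCaller, the limit of 4, and the placeholder task are hypothetical and chosen only for the example. The sketch caps how many requests a single worker has in flight at the same time.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

// Hypothetical helper (not a MaxCompute API): caps the number of requests
// that one worker has in flight at the same time.
final class BoundedCaller {
    private final Semaphore permits;

    BoundedCaller(int maxInFlight) {
        if (maxInFlight < 1) {
            throw new IllegalArgumentException("maxInFlight must be at least 1");
        }
        this.permits = new Semaphore(maxInFlight);
    }

    // Waits for a free permit, runs the task, and always returns the permit.
    <T> T call(Callable<T> task) throws Exception {
        permits.acquire();
        try {
            return task.call();
        } finally {
            permits.release();
        }
    }

    // Number of additional requests that could start right now.
    int availablePermits() {
        return permits.availablePermits();
    }

    public static void main(String[] args) throws Exception {
        BoundedCaller caller = new BoundedCaller(4); // 4 is an arbitrary example limit
        String result = caller.call(() -> "ok");     // replace with a real read-only request
        System.out.println(result);
    }
}
```

A suitable limit depends on the job and on the proxy capacity, which this topic does not quantify, so treat the value as something to tune.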

Important

After you establish a network connection between MaxCompute and the destination service, you may still not be able to access the destination service from MaxCompute. This may be caused by network restrictions of the tool where MaxCompute jobs are run. For example, when you use MaxCompute in the DataWorks console to synchronize or cleanse data, you must also establish a network connection between a DataWorks resource group and the destination service, and make sure that DataWorks allows access to the destination service. If restrictions are configured, you must add the IP address or CIDR block of the destination service to the sandbox whitelist of DataWorks. For more information about network connection and sandbox configurations of DataWorks, see Establish a network connection between a resource group and a data source.

Feature description

The following figure shows the network architecture between MaxCompute and the services that you want to access, along with the supported network connection schemes.

MaxCompute accesses the destination service in three scenarios:

  • Scheme 1: Access over the Internet

    You can use this scheme if you want to access an IP address or a domain name over the Internet by using user-defined functions (UDFs), Spark, MapReduce, PyODPS, or Mars in MaxCompute. If you choose a public IP address or domain name that is commonly used, such as aliyun.com, you can directly add and remove the public IP address or domain name on the Projects page in the MaxCompute console. If the public IP address or domain name fails to pass automatic verification, you can submit a ticket to apply for access to the IP address or domain name. If no security restrictions are imposed on the IP address or domain name that you want to access, you can access the destination IP address or domain name after your application is approved. The review period is three business days.

    Note

    If security restrictions are imposed on the IP address or domain name that you want to access over the Internet, contact the owner of your organization to resolve the issue based on the security restrictions.

  • Scheme 2: Access over a VPC (dedicated connection)

    You can use this scheme if you want to access an ApsaraDB RDS instance, ApsaraDB for HBase cluster, or Hadoop cluster that resides in a VPC by using SQL statements, UDFs, Spark, MapReduce, PyODPS, Mars, external tables, or the data lakehouse solution on MaxCompute. You must authorize MaxCompute to create elastic network interfaces (ENIs) by using the Alibaba Cloud account to which the VPC belongs, and establish a connection between MaxCompute and the VPC in the MaxCompute console. Before you establish the connection, you must configure a security group to allow the connection between MaxCompute and the destination service. The security group specifies the access rules of the ENIs created by MaxCompute. You can view the created ENIs in the MaxCompute console.

    Note
    • If an access control policy is configured for the destination service, you must add the IP addresses of the ENIs or the CIDR block of the vSwitch to the IP address whitelist of the destination service.

    • MaxCompute can access only the VPC that is specified for the dedicated connection. If you want to access a VPC in another region, or another VPC in the same region, you must establish a network connection between the VPC specified for the dedicated connection and that VPC.

  • Scheme 3: Access to specific Alibaba Cloud services

    You can use this scheme if you want to access Alibaba Cloud services such as Object Storage Service (OSS), Data Lake Formation (DLF), Tablestore, and Hologres by using SQL statements, UDFs, Spark, MapReduce, PyODPS, Mars, external tables, or the data lakehouse solution in MaxCompute. In this scheme, the endpoints of the cloud product interconnection network of Alibaba Cloud services are used.

    • If you create an OSS or Tablestore external table, you can access OSS or Tablestore by using its internal endpoint.

    • If you call a UDF to access OSS or Tablestore, you can only access OSS or Tablestore by using its public endpoint.

    For more information about the configurations and endpoint-based access in different scenarios, see Access to specific Alibaba Cloud services in this topic.

Prerequisites

Before you apply for establishing a network connection between MaxCompute and a service, make sure that the following conditions are met:

  • A MaxCompute project is created. If a MaxCompute project exists, you can use it without the need to create another one. If you use the data lakehouse solution, we recommend that you set the data type edition for your MaxCompute project to the Hive-compatible data type edition. For more information about how to create a MaxCompute project, see Create a MaxCompute project.

  • If you want to access an object in a VPC, make sure that the account of the VPC owner, the account that is used to access the MaxCompute project, and the administrator account of the destination object are the same Alibaba Cloud account or are RAM users that belong to the same Alibaba Cloud account.

Supported regions

The following table describes the regions where a network connection can be established between MaxCompute and an object over the Internet or a VPC.

Scheme: Access over the Internet

Regions:

  • China (Beijing)

  • China (Shanghai)

  • China (Zhangjiakou)

  • China (Ulanqab)

  • China (Hangzhou)

  • China (Shenzhen)

  • China (Chengdu)

  • China (Hong Kong)

  • Singapore

  • Malaysia (Kuala Lumpur)

  • Indonesia (Jakarta)

  • Germany (Frankfurt)

  • US (Silicon Valley)

  • US (Virginia)

Connected object:

  • Public IP address or domain name

Scheme: Access over a VPC (dedicated connection)

Regions:

  • China (Beijing)

  • China (Shanghai)

  • China (Zhangjiakou)

  • China (Ulanqab)

  • China (Hangzhou)

  • China (Shenzhen)

  • China (Hong Kong)

  • China East 2 Finance Zone F

  • Japan (Tokyo)

  • Singapore

  • Malaysia (Kuala Lumpur)

  • Indonesia (Jakarta)

  • Germany (Frankfurt)

  • US (Silicon Valley)

  • US (Virginia)

Connected objects:

  • IP address or domain name in a VPC

  • ApsaraDB RDS instance

  • ApsaraDB for HBase cluster

  • Hadoop cluster

Access over the Internet

Step 1: Manage a public IP address or domain name on the Projects page

If you choose a public IP address or domain name that is commonly used, such as aliyun.com, you can directly add and remove the public IP address or domain name on the Projects page in the MaxCompute console. To manage a public IP address or domain name, perform the following steps:

  1. Log on to the MaxCompute console. In the upper-left corner of the console, select a region.

  2. In the left-side navigation pane, click Projects.

  3. On the Projects page, find the project that you want to manage and click Manage in the Actions column.

  4. In the External Network section of the Parameter Configuration tab, add the desired public IP address or domain name.

  5. Click Submit.

Note
  • The following domain names are supported: aliyuncs.com, aliyun.com, amap.com, dingtalk.com, alicloudapi.com, cainiao.com, alicdn.com, taobao.com, alibaba.com, alipaydev.com, and alibabadns.com.

  • You cannot configure IPv6 addresses. The number of public IP addresses is unlimited.

  • If the public IP address or domain name fails the automatic verification, you can try removing it and adding another. If you still fail to pass automatic verification, apply to configure an external network address by submitting a ticket. For more information, see Submit an application.
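
Before you add an address in the console, a local pre-check against the domain list in the preceding note can save a round trip. The following Java sketch is an assumption-laden illustration: the class SupportedDomains and the method isLikelySupported are hypothetical names, and the rule that subdomains of a listed domain qualify is an assumption. The actual verification logic of the console is not described in this topic.

```java
import java.util.List;
import java.util.Locale;

// Local pre-check sketch. The real verification performed by the console
// is not documented in this topic and may differ.
final class SupportedDomains {
    // Domains copied from the note above.
    private static final List<String> DOMAINS = List.of(
        "aliyuncs.com", "aliyun.com", "amap.com", "dingtalk.com",
        "alicloudapi.com", "cainiao.com", "alicdn.com", "taobao.com",
        "alibaba.com", "alipaydev.com", "alibabadns.com");

    // Assumption: a host qualifies if it equals a listed domain
    // or is a subdomain of one.
    static boolean isLikelySupported(String host) {
        String h = host.toLowerCase(Locale.ROOT);
        for (String domain : DOMAINS) {
            if (h.equals(domain) || h.endsWith("." + domain)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isLikelySupported("www.aliyun.com"));
        System.out.println(isLikelySupported("example.org"));
    }
}
```

A host for which this check returns false is a candidate for the ticket-based application described in Submit an application.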

Step 2: Access public endpoints

After completing the operation described in Step 1: Manage a public IP address or domain name on the Projects page, add the following configurations for SQL statements or Spark jobs accessing the Internet.

Note

For other job types, configure settings accordingly.

SQL UDF jobs

  • Use the following settings:

    -- Set the public IP address or endpoint and port number that are configured in the network connection application form. Specify the public IP address or endpoint and port number in the following SQL statement.
    -- If you want to add multiple endpoints and port numbers, separate them with commas (,)
    SET odps.internet.access.list=<ip_address:port|realm_name:port>;
    -- Execute the following SQL statement to call a UDF.
    SELECT <UDF_name>("<http://ip_address|realm_name>");
  • ip_address:port | realm_name:port: Required. Specify the public IP address or endpoint and port number you want to access.

  • UDF_name: The UDF used to access the public IP address or endpoint.

  • UDF code example:

    package com.aliyun.odps.test.udf;
    import com.aliyun.odps.udf.UDF;
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.URL;
    public class <UDF_name> extends UDF {
        public String evaluate(String urlStr) throws IOException {
            URL url = new URL(urlStr);
            StringBuilder sb = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    sb.append(line).append('\n');
                }
            }
            return sb.toString();
        }
    }
  • Execution: After the network connection is approved, execute the following sample command, where the UDF created from the sample code is named url_fetch:

    SET odps.internet.access.list=www.aliyun.com:80;
    SELECT url_fetch("http://www.aliyun.com");
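
When several endpoints are needed, the comma-separated value of odps.internet.access.list can be assembled programmatically. The following Java sketch is illustrative only: AccessListBuilder and buildSetStatement are hypothetical names that do not belong to any MaxCompute SDK, and the validation rules (a host, a colon, and a port from 1 to 65535) are assumptions about what a well-formed entry looks like.

```java
import java.util.List;

// Hypothetical helper: builds the session-level SET statement from
// host:port entries, separated with commas as described above.
final class AccessListBuilder {
    static String buildSetStatement(List<String> endpoints) {
        if (endpoints.isEmpty()) {
            throw new IllegalArgumentException("at least one endpoint is required");
        }
        for (String endpoint : endpoints) {
            int colon = endpoint.lastIndexOf(':');
            // Reject entries without a host or without a port.
            if (colon <= 0 || colon == endpoint.length() - 1) {
                throw new IllegalArgumentException("expected host:port but got " + endpoint);
            }
            // parseInt throws NumberFormatException for a non-numeric port.
            int port = Integer.parseInt(endpoint.substring(colon + 1));
            if (port < 1 || port > 65535) {
                throw new IllegalArgumentException("port out of range in " + endpoint);
            }
        }
        return "SET odps.internet.access.list=" + String.join(",", endpoints) + ";";
    }

    public static void main(String[] args) {
        // Endpoints taken from the AMAP example in this topic.
        System.out.println(buildSetStatement(
            List.of("restapi.amap.com:443", "restapi.amap.com:80")));
    }
}
```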

Spark on MaxCompute jobs

Add the following configuration to the Spark client's conf file or the Spark job configuration submitted through DataWorks.

spark.hadoop.odps.cupid.smartnat.enable = true
spark.hadoop.odps.cupid.internet.access.list=<ip_address:port>

Step 3: (Optional) Add to the whitelist

If access control is configured for your service, add MaxCompute's outbound IP address for Internet access to the service whitelist. You can submit a ticket to obtain the outbound IP address.

Submit an application

To allow MaxCompute to access a public IP address or domain name that fails to pass automatic verification, perform the following steps:

  1. Submit a ticket to apply for configuring an IP address whitelist.

  2. Enter the destination IP address or domain name and the port number in the form. If you want to add multiple IP addresses or domain names and port numbers, separate them with commas (,). For example, if you want to access an Alibaba Cloud domain name, provide the network configuration information www.aliyun.com:80. If you want to access the AMAP service, provide the network configuration information restapi.amap.com:443,restapi.amap.com:80.

  3. After the MaxCompute technical support team receives the application, the team reviews and completes the network configuration. After the review is passed, you can proceed with the subsequent steps. The review requires approximately three business days. If you have a question about the review result, you can search for the DingTalk group ID 11782920 to join the DingTalk group of the MaxCompute developer community to provide feedback.

Access over a VPC (dedicated connection)

Step 1: Establish a dedicated network connection

The process for establishing a dedicated network connection is as follows:

Step 1: Authorization

  • Authorize the user: Grant the permissions to create network connection objects to the logon user. For details, see Networklink. The authorized user must be the project owner or a user with the Super_Administrator or Admin role at the tenant level. For more information, see Role planning and Permissions on objects in a tenant.

  • Authorize MaxCompute: This authorization allows MaxCompute to create ENIs in the VPC to establish a connection between MaxCompute and the VPC. To grant the permissions, you must use an Alibaba Cloud account to log on to the Alibaba Cloud Management Console, visit the Cloud Resource Access Authorization page, and click Confirm Authorization Policy.

    Note

    For more information about the billing rules when you access a VPC from MaxCompute, see VPC peering connections.

Step 2: Configure security group rules

Create a security group for MaxCompute to manage the access of MaxCompute to various resources in the VPC.

  1. Log on to the VPC console. On the VPC page, click the ID of the destination VPC. On the page that appears, click the Resource Management tab.

  2. In the VPC Resources section of the Resource Management tab, move the pointer over the value of Security Group and click Add. On the Create Security Group page, click Create Security Group to create a security group for MaxCompute and record the ID of the security group. You must create a basic security group, not an advanced one. By default, basic security groups allow outbound traffic, while advanced ones do not. If you use an advanced security group, no objects in the VPC can be accessed. You must select the same VPC as the object that MaxCompute needs to access. For more information about how to create a security group, see Create a security group.

Note
  • By default, MaxCompute automatically creates two ENIs based on bandwidth requirements and provides them free of charge. The ENIs created by MaxCompute are part of the security group you created.

  • If you need to establish a connection between MaxCompute and an ApsaraDB for HBase cluster but the security group does not allow access to the ApsaraDB for HBase cluster, add the IP addresses of the ENIs created by MaxCompute to the ApsaraDB for HBase cluster's whitelist. Because the IP addresses of the ENIs may change, we recommend that you add the CIDR block of the vSwitch in which the ENIs are created to the whitelist instead. To obtain the IP addresses of the ENIs, log on to the Elastic Compute Service (ECS) console. In the left-side navigation pane, click Elastic Network Interfaces in the Network & Security section to view the IP addresses of the ENIs.
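
The reason a CIDR block is more robust than individual ENI addresses is that any address an ENI receives from the vSwitch still falls inside the block. The following Java sketch shows that membership test for IPv4 only. The class name CidrCheck and the sample addresses are hypothetical and are not values from a real VPC.

```java
// IPv4-only sketch of a CIDR membership test. Addresses used with it in
// this topic are made-up examples.
final class CidrCheck {
    // Converts dotted-quad IPv4 text to a 32-bit value held in a long.
    static long toLong(String ip) {
        String[] parts = ip.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ip);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range in " + ip);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    // Returns true if the address lies inside the block, for example
    // a block written as 192.168.0.0/24.
    static boolean contains(String cidr, String ip) {
        String[] parts = cidr.split("/");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected address/prefix: " + cidr);
        }
        int prefix = Integer.parseInt(parts[1]);
        if (prefix < 0 || prefix > 32) {
            throw new IllegalArgumentException("prefix out of range in " + cidr);
        }
        // Keep only the leading prefix bits of both addresses and compare.
        long mask = prefix == 0 ? 0L : (0xFFFFFFFFL << (32 - prefix)) & 0xFFFFFFFFL;
        return (toLong(parts[0]) & mask) == (toLong(ip) & mask);
    }

    public static void main(String[] args) {
        System.out.println(contains("192.168.0.0/24", "192.168.0.17"));
        System.out.println(contains("192.168.0.0/24", "192.168.1.17"));
    }
}
```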

Step 3: Create a network connection between MaxCompute and the destination VPC

An Alibaba Cloud account or a RAM user that is assigned the tenant-level Super_Administrator or Admin role can establish a connection between MaxCompute and a VPC in the MaxCompute console. For more information about the roles, see Role planning. To establish a connection between MaxCompute and a VPC, perform the following steps:

  1. Log on to the MaxCompute console.

  2. In the left-side navigation pane, choose Tenants > Network Connection > Add Network Connection.

  3. In the Add Network Connection dialog box, configure the parameters and click OK. The following table describes the parameters:

    Parameter

    Description

    Network Connection Name

    The name of the custom network connection. The name must meet the following format requirements:

    • Starts with a letter.

    • Contains only letters, underscores (_), and digits.

    • Is 1 to 63 characters in length.

    Type

    The network connection type. Default value: Passthrough.

    Note

    The default value indicates that the VPC connection scheme is used.

    Region

    The region in which you can use the VPC connection scheme to establish a network connection between MaxCompute and the specified object. For more information, see Supported regions.

    Selected VPC

    The ID of the VPC.

    To obtain the ID of the VPC, perform the following operations:

    • If you want to establish a network connection between MaxCompute and an ApsaraDB for HBase cluster or a Hadoop cluster, you can obtain the VPC ID in the network connection information in the console of the related service.

    • In other cases, you can perform the following operations: Log on to the VPC console. On the VPC page, view the ID of the desired VPC in the Instance ID/Name column.

    vSwitch ID

    The ID of a vSwitch that belongs to the VPC.

    To obtain the ID of the vSwitch, perform the following operations:

    • If you want to establish a network connection between MaxCompute and an ApsaraDB for HBase cluster or a Hadoop cluster, you can obtain the vSwitch ID in the network connection information in the console of the related service.

    • In other cases, you can perform the following operations: Log on to the VPC console. In the left-side navigation pane, click vSwitch. On the vSwitch page, click the name of the desired vSwitch. On the page that appears, view the vSwitch ID in the vSwitch Basic Information section.

    Security Group

    The ID of the security group that you recorded in Step 2: Configure security group rules.
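
The format requirements for Network Connection Name in the preceding table can be checked before you open the dialog box. The following Java sketch encodes the three rules as one regular expression. The class name NetworkLinkName is hypothetical, and the sketch assumes that "letters" means ASCII letters, which this topic does not state explicitly.

```java
import java.util.regex.Pattern;

// Hypothetical validator for the Network Connection Name rules:
// starts with a letter; only letters, underscores, and digits; 1 to 63 characters.
final class NetworkLinkName {
    // One leading letter followed by up to 62 more allowed characters.
    // Assumption: letters are limited to ASCII A-Z and a-z.
    private static final Pattern VALID = Pattern.compile("[A-Za-z][A-Za-z0-9_]{0,62}");

    static boolean isValid(String name) {
        return name != null && VALID.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("testLink"));
        System.out.println(isValid("1link"));
    }
}
```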

Step 4: Configure the security group of the destination service

After the preceding operations are complete and the dedicated connection is established, you must add rules to the security group of the destination service to allow the MaxCompute security group created in Step 2 to access the destination service by using specific ports, such as port 9200 and port 31000.

For example, if you want to access an ApsaraDB RDS instance, you must add rules to the security group of the ApsaraDB RDS instance to allow access from the security group created in Step 2. If the service that you want to access does not support security groups, and only IP addresses can be added, you must add the CIDR block of the vSwitch that is used by the destination service.

  • Configure the security group of the Hadoop cluster that MaxCompute needs to access.

    • To ensure that MaxCompute can access a Hadoop cluster, perform the following configurations for the security group of the Hadoop cluster:

      • Add inbound rules to the security group of the Hadoop cluster.

      • Set the authorization object to the security group to which the ENIs belong. In this case, the security group refers to the group that you created in Step 2.

      • Set the port number for the Hive metastore service to 9083.

      • Set the port number for NameNode of Hadoop Distributed File System (HDFS) to 8020.

      • Set the port number for DataNode of HDFS to 50010.

    • For example, if you want to allow MaxCompute to access a Hadoop cluster that is deployed on Alibaba Cloud E-MapReduce (EMR), you must configure the security group rules that are shown in the following figure. For more information, see Create a security group.

  • Configure the security group of an ApsaraDB for HBase cluster.

    • Add the security group that is created for MaxCompute to the security group of the ApsaraDB for HBase cluster or add the IP addresses of the ENIs that are created by MaxCompute to a whitelist of the ApsaraDB for HBase cluster.

    • For example, if you want to allow MaxCompute to access an ApsaraDB for HBase cluster, you can perform the following operations: Log on to the ApsaraDB for HBase console. On the Clusters page, click the name of the ApsaraDB for HBase cluster that you want to access in the ID/Name column. In the left-side navigation pane, click Access Control. Then, add the security group of MaxCompute on the Security Group tab or add the IP addresses of the ENIs created by MaxCompute to a whitelist of the ApsaraDB for HBase cluster on the Whitelist Setting tab. For more information about how to add a security group or add an IP address to a whitelist, see Configure a whitelist and a security group.

      Note

      If the security group of MaxCompute cannot be added, you can add the IP addresses of the ENIs that are created by MaxCompute to a whitelist of the ApsaraDB for HBase cluster on the Whitelist Setting tab. If the MaxCompute configuration is changed, the IP addresses of the ENIs may also change. We recommend that you add the CIDR block of the vSwitch in which the ENIs are created to the whitelist.

  • Configure the security group of an ApsaraDB RDS instance.

    • Add the security group that is created for MaxCompute to the security group of the ApsaraDB RDS instance or add the IP addresses of the ENIs that are created by MaxCompute to a whitelist of the ApsaraDB RDS instance.

    • For example, if you want to allow MaxCompute to access an ApsaraDB RDS instance, you can perform the following operations: Log on to the ApsaraDB RDS console. On the Instances page, click the name of the ApsaraDB RDS instance that you want to access in the Instance ID/Name column. In the left-side navigation pane, click Whitelist and SecGroup. Then, add a security group on the Security Group tab or configure an IP address whitelist on the Whitelist Settings tab. For more information about how to add a security group or configure an IP address whitelist, see Configure a security group for an ApsaraDB RDS for MySQL instance or Configure an IP address whitelist.

    Note

    If the MaxCompute configuration is changed, the IP addresses of the ENIs may also change. We recommend that you add the CIDR block of the vSwitch in which the ENIs are created to the whitelist.
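
After you add the rules, a plain TCP connection attempt is a quick way to see whether a port such as 9083, 8020, or 50010 accepts connections. The following Java sketch is a generic probe, not a MaxCompute tool: PortProbe and isReachable are hypothetical names, and the 2000 ms timeout is an arbitrary choice. The result reflects only the network path of the machine that runs the probe, so it is meaningful only when it runs from a location that is subject to the same security group rules.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Generic TCP reachability probe (hypothetical helper, not a MaxCompute API).
final class PortProbe {
    // Returns true if a TCP connection is established within the timeout.
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable host.
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass the cluster address as the first argument; 127.0.0.1 is a placeholder.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int[] ports = {9083, 8020, 50010}; // Hive metastore, HDFS NameNode, HDFS DataNode
        for (int port : ports) {
            System.out.println(port + " reachable: " + isReachable(host, port, 2000));
        }
    }
}
```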

Step 2: Access the address in the VPC by using the network connection

After completing the steps in Step 1: Establish a dedicated network connection, add the following configurations for SQL statements or Spark jobs accessing a VPC.

Note

For other job types, configure settings accordingly.

Access the VPC by using SQL statements

  • Access resources in a VPC by using UDFs. For details, see Use UDFs to access resources in VPCs. The following code shows an example:

    -- Configure the name of the network connection that you established based on the VPC connection scheme. This setting is valid only for the current session.
    SET odps.session.networklink=testLink;
  • Access resources in a VPC by using external tables. The following code shows an example:

    -- Configure parameters in the table creation statement.
     TBLPROPERTIES(
    'networklink'='<networklink_name>')
  • Configure Networklink for the data lakehouse solution. For details, see Lakehouse of MaxCompute.

Access the VPC by using Spark

To run a Spark job, add the following configurations to use the ENI dedicated connection for accessing services in the destination VPC. For details, see Access instances in a VPC from Spark on MaxCompute.

  • spark.hadoop.odps.cupid.eni.enable = true

  • spark.hadoop.odps.cupid.eni.info=<region_id>:<vpc_id>

Step 3: (Optional) Add to the whitelist

If access control is configured for the object that you want to access, you must add the security group for the established dedicated network connection to the IP address whitelist of the object.

Access to specific Alibaba Cloud services

You can use this scheme if you want to access Alibaba Cloud services such as OSS, DLF, Tablestore, and Hologres by using SQL statements, UDFs, Spark, MapReduce, PyODPS, Mars, external tables, or the data lakehouse solution in MaxCompute. In this scheme, the endpoints of the cloud product interconnection network of Alibaba Cloud services are used.

Access OSS or Tablestore by using external tables

If you create an OSS or Tablestore external table, you can access OSS or Tablestore by using the internal endpoint of OSS or Tablestore.

  • For more information about the internal endpoints of OSS in each region, see the Internal endpoint column in Regions and endpoints.

  • For more information about the internal endpoints of Tablestore in each region, see Classic network endpoints in Endpoints.

For more information about how to use an external table to access OSS or Tablestore, see Hologres external tables.

Access OSS or Tablestore by calling UDFs

If you want to access OSS or Tablestore by calling UDFs, you must use the public endpoint of OSS or Tablestore and add the public endpoint of OSS or Tablestore to the MaxCompute whitelist.

  1. Add the public endpoint of OSS or Tablestore to the MaxCompute whitelist.

    You can submit a ticket to add the public endpoint of OSS or Tablestore to the MaxCompute whitelist. References for the public endpoint of OSS or Tablestore in each region:

    • Public endpoint of OSS in each region: Public endpoint column in Regions and endpoints.

    • Public endpoint of Tablestore in each region: Public endpoints in Endpoints.

  2. Use the public endpoint to access OSS or Tablestore.

    For more information about examples of Spark on MaxCompute jobs, see Access OSS from Spark on MaxCompute.

FAQ

What should I do if high concurrency leads to DNS resolution failures?

Problem description: During the execution of UDF or Spark tasks, many concurrent requests are generated to access a peer domain, leading to DNS resolution failures.

Solution: We recommend that you resolve the domain name to an IP address during initialization of the tasks and use the IP address for access during execution. For more information, see DNS resolution fails due to high concurrency.
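
The resolve-once approach can be sketched in Java as follows. The class ResolvedEndpoint and its methods are hypothetical names for illustration. Create one instance in whatever one-time initialization step your job type offers, and reuse it for every request so that no further DNS lookups are issued during execution.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Sketch: resolve the domain name once during initialization and reuse
// the resulting IP address for every request (hypothetical helper).
final class ResolvedEndpoint {
    private final String hostName;
    private final String ipAddress;

    ResolvedEndpoint(String hostName) throws UnknownHostException {
        this.hostName = hostName;
        // The only DNS lookup happens here.
        this.ipAddress = InetAddress.getByName(hostName).getHostAddress();
    }

    // Original name, kept in case a request still needs to carry it.
    String hostName() {
        return hostName;
    }

    String ipAddress() {
        return ipAddress;
    }

    // Builds a plain-HTTP URL that targets the cached IP address.
    String httpUrl(int port, String path) {
        // IPv6 literals must be wrapped in brackets inside a URL.
        String host = ipAddress.contains(":") ? "[" + ipAddress + "]" : ipAddress;
        return "http://" + host + ":" + port + path;
    }

    public static void main(String[] args) throws UnknownHostException {
        ResolvedEndpoint endpoint = new ResolvedEndpoint("localhost");
        System.out.println(endpoint.httpUrl(80, "/"));
    }
}
```

Note that a cached address does not follow later DNS changes, so this fits short-lived tasks better than long-running services.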

What should I do if accessing HTTPS services by IP addresses leads to failures?

Problem description: Spark or UDF tasks must use HTTPS to access remote services in a VPC, such as KMS and OSS. However, errors occur if the services are accessed through an IP address.

Solution: Add the domain name to the request's host field to solve the issue caused by IP address restrictions when accessing HTTPS services. For more information, see Access HTTPS service by using IP.