DataWorks: Network connectivity solutions

Last Updated: Aug 20, 2024

To ensure that your data synchronization tasks and data scheduling tasks in DataWorks can run as expected, you must establish a network connection between the virtual private cloud (VPC) with which your resource group is associated and the data source that you want to access. The data source can be a database, a data service, or other data in a network environment. This topic describes the network connectivity solutions for data sources that are deployed in different types of network environments.

Background information

If a data source used in a data synchronization, data development, or data scheduling task is not deployed in the VPC with which your resource group is associated, you must select an appropriate network connectivity solution to establish a network connection between the VPC and the network environment in which the data source is deployed. For example, if the data source to access is deployed in another VPC or in a data center, you must establish a network connection between the VPC or data center and the VPC with which the resource group is associated.

For example, when you configure a data synchronization task, you must establish two network connections: one between the VPC with which your resource group is associated and the source, and one between that VPC and the destination.

Prerequisites

A resource group with appropriate specifications is purchased. For more information, see Create and use a serverless resource group.

Note
  • For more information about resource groups, see Resource group overview.

  • The network connectivity solutions provided in this topic are suitable for serverless resource groups and the following types of old-version resource groups: exclusive resource groups for Data Integration, exclusive resource groups for scheduling, and exclusive resource groups for DataService Studio.

  • To facilitate the management of DataWorks resources and improve user experience, DataWorks releases serverless resource groups. A serverless resource group can implement the core features of the preceding old-version resource groups. You can perform operations such as data synchronization, task scheduling and running, and API calling and management by using only one serverless resource group. We recommend that you purchase a serverless resource group. For more information, see Create and use a serverless resource group.

Precautions

  • Network connectivity directly determines whether your task can run as expected.

  • Network connections cannot be established between a resource group and data sources that are deployed in the classic network. If the data source or business that you want to access is deployed in the classic network, we recommend that you migrate the data source or business to a VPC.

  • If you run a data synchronization task to synchronize data over the Internet, the efficiency and stability of the task cannot be ensured. We recommend that you run the data synchronization task to synchronize data over an internal network or use Cloud Enterprise Network (CEN) for data synchronization.

  • You can associate a serverless resource group with a VPC to enable the resource group to access a data source or an address in a complex network environment over an internal network. By default, serverless resource groups cannot access the Internet. If you want to use a serverless resource group to access a data source or a network environment over the Internet, you must configure an Internet NAT gateway for the VPC with which the resource group is associated and associate the Internet NAT gateway with an elastic IP address (EIP). For more information, see the Scenario 5: Establish a network connection between a resource group and a data source that is deployed on the Internet section in this topic.

Configure network connectivity

Step 1: Associate a resource group with a VPC

The network connectivity solution that you can use varies based on the network relationship between your resource group and a data source. The following table describes the network connectivity solutions.

Network type: VPC

Data source environment: Alibaba Cloud

  • Data sources that are hosted on Elastic Compute Service (ECS) instances

  • Alibaba Cloud services

Relationship between the data source and the resource group: Same Alibaba Cloud account and same region

  Common logic for establishing a network connection: Associate the resource group with the VPC in which the data source is deployed.

  Sample configuration: Scenario 1: Establish a network connection between a resource group and a data source that belong to the same Alibaba Cloud account and reside in the same region

Relationship between the data source and the resource group:

  • Different Alibaba Cloud accounts

  • Different regions

  Common logic for establishing a network connection:

  1. Use a network connection tool, such as a CEN instance, an Express Connect circuit, or a VPN gateway, to perform one of the following operations:

    • Establish a network connection between the VPC in which the data source is deployed and a VPC in the region in which the resource group resides.

    • Establish a network connection between the VPC in which the data source is deployed and a VPC within the Alibaba Cloud account to which the resource group belongs.

  2. Associate the resource group with the VPC that is connected to the VPC in which the data source is deployed.

    Note

    If you select an advanced security group when you associate a resource group with a VPC, you must add the following security group rules to the advanced security group on the Security Groups page in the ECS console after the association:

    • Outbound rule: Add the IP address of the data source that the resource group needs to access as the authorization object.

    • Inbound rule: Add the CIDR block of the vSwitch with which the resource group is associated as the authorization object.

  3. Add a route that points to the IP address of the data source for the resource group in the DataWorks console. For more information, see General reference: Add a route.

Data source environment: Environment other than Alibaba Cloud

  • Data sources or business in data centers

  • Data sources in other cloud environments

  Sample configuration: Scenario 4: Establish a network connection between a resource group and a data source that is deployed in a data center

Network type: Internet

Data source environment: Internet

  Common logic for establishing a network connection:

  • By default, serverless resource groups cannot access the Internet. If you want to use a serverless resource group to access a data source over the Internet, you must configure an Internet NAT gateway for the VPC with which the resource group is associated. This way, the resource group can use the EIP associated with the NAT gateway to access the data source over the Internet.

  • Old-version resource groups can access the Internet and can directly connect to the data source over the Internet.

  Sample configuration: Scenario 5: Establish a network connection between a resource group and a data source that is deployed on the Internet
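
The mapping from a data source location to a sample scenario can be sketched in code. The following Python snippet is a minimal illustration and is not part of DataWorks: the `DataSourceLocation` class and the `pick_scenario` function are hypothetical names introduced here, and the function only returns the name of the sample scenario in this topic that matches a given location.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSourceLocation:
    """Hypothetical description of where a data source is deployed."""
    network_type: str        # "vpc" or "internet"
    on_alibaba_cloud: bool   # False for data centers or other clouds
    same_account: bool       # same Alibaba Cloud account as the resource group
    same_region: bool        # same region as the resource group


def pick_scenario(loc: DataSourceLocation) -> str:
    """Return the sample scenario in this topic that matches the location."""
    if loc.network_type == "internet":
        return "Scenario 5"   # Internet NAT gateway with an EIP
    if loc.network_type != "vpc":
        raise ValueError(f"unsupported network type: {loc.network_type!r}")
    if not loc.on_alibaba_cloud:
        return "Scenario 4"   # data center or other cloud environment
    if not loc.same_account:
        return "Scenario 3"   # different Alibaba Cloud accounts
    if not loc.same_region:
        return "Scenario 2"   # same account, different regions
    return "Scenario 1"       # same account, same region
```

For example, `pick_scenario(DataSourceLocation("vpc", True, True, False))` returns `"Scenario 2"`. Classic-network data sources are deliberately not modeled because they cannot be connected to a resource group.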

Step 2: Configure the IP address whitelist of a data source

If the data source that you want to access has an IP address whitelist, you must add one of the following items to the whitelist, regardless of the network connectivity solution that you use: the CIDR block of the vSwitch with which the resource group is associated, the EIP of the old-version resource group, or the EIP configured for the VPC with which the serverless resource group is associated.

  • If you want to access a data source over a VPC, you must add the CIDR block of the vSwitch with which the resource group is associated to the IP address whitelist of the data source.

    On the Exclusive Resource Groups tab of the Resource Groups page in the DataWorks console, find the desired resource group and click Network Settings in the Actions column. On the VPC Binding tab of the page that appears, view and record the CIDR block of the related vSwitch. Then, add the CIDR block to the IP address whitelist of the data source.

  • If you want to access a data source over the Internet, you must perform one of the following operations to configure the IP address whitelist of the data source:

    • If you use a serverless resource group, you must add the EIP configured for the VPC with which the resource group is associated to the IP address whitelist of the data source.

      On the Internet NAT Gateway page of the VPC console, find the source network address translation (SNAT) entry that is configured, and obtain the public IP address that is associated with the related vSwitch. Then, add the public IP address to the IP address whitelist of the data source.

    • If you use an old-version resource group, you must add the EIP of the resource group to the IP address whitelist of the data source.

      On the Exclusive Resource Groups tab of the Resource Groups page in the DataWorks console, find the desired resource group and click Details in the Actions column. In the Basic Information section of the page that appears, view and record the EIP displayed next to the EIPAddress parameter. Then, add the EIP to the IP address whitelist of the data source.

      Note

      If you scale out the resource group in subsequent operations, you must check whether the EIP changes. If the EIP changes, we recommend that you add the latest EIP to the IP address whitelist of the data source at the earliest opportunity after the scale-out operation. This ensures that your task can run as expected.
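
The whitelist rules in this step can be summarized as a small lookup. The following Python sketch uses only the standard `ipaddress` module. The function names `whitelist_entry` and `is_allowed`, the labels `"vpc"`, `"internet"`, `"serverless"`, and `"old-version"`, and all addresses are assumptions made for illustration. The snippet does not call any DataWorks or data source API. It only shows which value belongs in the whitelist and how to check that a given source address would match.

```python
import ipaddress
from collections.abc import Iterable


def whitelist_entry(access_path: str, group_type: str, *,
                    vswitch_cidr: str = "", nat_eip: str = "",
                    group_eip: str = "") -> str:
    """Return the value to add to the data source whitelist.

    access_path: "vpc" or "internet" (hypothetical labels).
    group_type:  "serverless" or "old-version" (hypothetical labels).
    """
    if access_path == "vpc":
        # Over a VPC, the vSwitch CIDR block is added for any resource group.
        return str(ipaddress.ip_network(vswitch_cidr))
    if access_path == "internet":
        if group_type == "serverless":
            # EIP bound to the Internet NAT gateway of the associated VPC.
            return str(ipaddress.ip_address(nat_eip))
        if group_type == "old-version":
            # EIP shown next to EIPAddress for the resource group.
            return str(ipaddress.ip_address(group_eip))
        raise ValueError(f"unknown resource group type: {group_type!r}")
    raise ValueError(f"unknown access path: {access_path!r}")


def is_allowed(source_ip: str, whitelist: Iterable[str]) -> bool:
    """Check whether source_ip matches any address or CIDR block in whitelist."""
    ip = ipaddress.ip_address(source_ip)
    return any(ip in ipaddress.ip_network(entry, strict=False)
               for entry in whitelist)
```

For example, with a sample vSwitch CIDR block of `192.168.10.0/24`, `is_allowed("192.168.10.23", ["192.168.10.0/24"])` is true, whereas an address in `192.168.11.0/24` does not match and would be rejected by the data source.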

Step 3: Test network connectivity

  • If your resource group needs to access a data source that is supported by DataWorks, you can add the data source to DataWorks and test the network connectivity of the data source.

    1. Go to the Data Integration page.

      Log on to the DataWorks console. In the top navigation bar, select the desired region. Then, click Data Integration in the left-side navigation pane. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Integration.

    2. In the left-side navigation pane, click Data Source. On the Data Sources page, click Add Data Source. In the Add Data Source dialog box, select the desired data source type and configure the related parameters to add a data source of this type.

    3. Find the resource group that you purchased and click Test Network Connectivity in the Connection Status column.

      Note

      If Connection failed is displayed in the Connection Status column, you can click Self-service Troubleshoot in the column to resolve the issue.

  • If your resource group needs to access a service that is deployed in a different network environment, you can test the network connectivity between the service and the desired resource group in the business code based on your business requirements.

    Note

    If your resource group needs to access a service that is deployed on an ECS instance, you must also configure a security group rule that allows access to the service from the CIDR block of the vSwitch with which the resource group is associated or from the public IP address configured for the VPC with which the resource group is associated.
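
A connectivity test in business code can be as simple as a TCP connection attempt. The following Python sketch uses only the standard `socket` module. The function name `can_connect` is hypothetical, and the host and port in the usage comment are placeholders. The check only reports whether a TCP handshake succeeds from the machine that runs it, so it is meaningful only when the code runs on the resource group, for example inside a task. It does not verify credentials or application-level access.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Usage (placeholder endpoint, replace with your own service address):
# print(can_connect("service.example.internal", 3306))
```

If the function returns `False`, check the route, the security group rules, and the IP address whitelist described in the preceding steps before you investigate the service itself.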

Sample configurations for different scenarios

This section provides examples on how to establish network connections between a resource group and data sources that are deployed in different network environments. An ApsaraDB RDS database or a self-managed database that is deployed in a data center or on the Internet is used as a data source in the examples.

Note

In the following scenarios, the resource group is associated with a basic security group. For more information about basic security groups, see Overview.

Scenario 1: Establish a network connection between a resource group and a data source that belong to the same Alibaba Cloud account and reside in the same region

Instruction on establishing a network connection

  1. Associate the resource group with the VPC in which the data source is deployed.

  2. Add the CIDR block of the vSwitch with which the resource group is associated to the IP address whitelist of the data source.

Illustration

[Figure: same account, same region]

Scenario 2: Establish a network connection between a resource group and a data source that belong to the same Alibaba Cloud account but reside in different regions

Instruction on establishing a network connection

  1. Establish a network connection between the region in which the resource group resides and the region in which the data source resides.

    Use a network connection tool, such as a CEN instance or a VPN gateway, to establish a network connection between the VPC in which the data source is deployed and a VPC in the region where the resource group resides.

  2. Establish a network connection between the resource group and the data source.

    1. Associate the resource group with the VPC that is connected to the VPC in which the data source is deployed.

    2. Add a route that points to the CIDR block of the data source for the resource group in the DataWorks console. For more information, see General reference: Add a route.

  3. Add the CIDR block of the vSwitch with which the resource group is associated to the IP address whitelist of the data source.

Illustration

[Figure: same account, different regions]

Scenario 3: Establish a network connection between a resource group and a data source that belong to different Alibaba Cloud accounts

Instruction on establishing a network connection

  1. Establish a network connection between the Alibaba Cloud accounts.

    Use a network connection tool, such as a CEN instance or a VPN gateway, to establish a network connection between the VPC in which the data source is deployed and a VPC within the Alibaba Cloud account to which the resource group belongs.

  2. Establish a network connection between the resource group and the data source.

    1. Associate the resource group with the VPC that is connected to the VPC in which the data source is deployed.

    2. Add a route that points to the CIDR block of the data source for the resource group in the DataWorks console. For more information, see General reference: Add a route.

  3. Add the CIDR block of the vSwitch with which the resource group is associated to the IP address whitelist of the data source.

Illustration

[Figure: different accounts]

Scenario 4: Establish a network connection between a resource group and a data source that is deployed in a data center

If the data source that you want to use is not deployed on Alibaba Cloud, for example, if it is deployed in a data center or in another cloud environment, you can refer to this scenario to establish a network connection between the data source and your resource group.

  1. Establish a network connection between the network environment in which the data source is deployed and Alibaba Cloud.

    Use an Express Connect circuit to establish a network connection between the network environment of the data center and a VPC within the Alibaba Cloud account to which the resource group belongs.

  2. Establish a network connection between the resource group and the data source.

    1. Associate the resource group with the VPC that is connected to the network environment of the data center.

    2. Add a route that points to the CIDR block of the data source for the resource group in the DataWorks console. For more information, see General reference: Add a route.

  3. Add the CIDR block of the vSwitch with which the resource group is associated to the IP address whitelist of the data source.
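
The route step in Scenarios 2 to 4 works only if the destination of the route actually covers the address of the data source. The following Python sketch checks this with the standard `ipaddress` module. The function name `matching_route` and the sample addresses are hypothetical. Picking the most specific block follows the general longest-prefix convention of IP routing. Whether DataWorks resolves overlapping routes in the same way is an assumption, not something this topic states.

```python
import ipaddress
from collections.abc import Sequence
from typing import Optional


def matching_route(data_source_ip: str,
                   route_cidrs: Sequence[str]) -> Optional[str]:
    """Return the most specific route CIDR block that contains the IP.

    Returns None if no configured route covers the data source address.
    """
    ip = ipaddress.ip_address(data_source_ip)
    networks = [ipaddress.ip_network(cidr) for cidr in route_cidrs]
    matches = [net for net in networks if ip in net]
    if not matches:
        return None
    # Longest prefix wins, as in conventional IP routing.
    return str(max(matches, key=lambda net: net.prefixlen))
```

For example, `matching_route("10.20.3.15", ["10.20.0.0/16", "10.20.3.0/24"])` returns `"10.20.3.0/24"`, and a result of `None` indicates that a route for the data source still needs to be added.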

Scenario 5: Establish a network connection between a resource group and a data source that is deployed on the Internet

Note

This network connectivity solution is suitable only for serverless resource groups. Old-version resource groups are automatically associated with EIPs and can access the Internet without additional configuration.

Instruction on establishing a network connection

  1. Configure an SNAT entry on the Internet NAT gateway for the VPC and vSwitch with which the resource group is associated. For more information, see Use the SNAT feature of an Internet NAT gateway to access the Internet.

  2. Configure the IP address whitelist of the data source to allow the public IP address associated with the VPC and vSwitch to access the data source.

  3. Use the public connection address of the data source to add the data source to the desired workspace, and then test the network connectivity between the resource group and the data source.

Illustration

[Figure: network diagram for this scenario]

References