To ensure that your data synchronization tasks and data scheduling tasks in DataWorks can run as expected, you must establish a network connection between the virtual private cloud (VPC) with which your resource group is associated and the data source that you want to access. The data source can be a database, a data service, or other data in a network environment. This topic describes the network connectivity solutions for data sources that are deployed in different types of network environments.
Background information
If a data source used in a data synchronization, data development, or data scheduling task is not deployed in the VPC with which your resource group is associated, you must select an appropriate network connectivity solution to establish a network connection between the VPC and the network environment in which the data source is deployed. For example, if the data source to access is deployed in another VPC or in a data center, you must establish a network connection between the VPC or data center and the VPC with which the resource group is associated.
When you configure a data synchronization task, for instance, you must establish two network connections: one between the VPC with which your resource group is associated and the source, and one between that VPC and the destination.
Prerequisites
A resource group with appropriate specifications is purchased. For more information, see Create and use a serverless resource group.
For more information about resource groups, see Resource group overview.
The network connectivity solutions provided in this topic are suitable for serverless resource groups and the following types of old-version resource groups: exclusive resource groups for Data Integration, exclusive resource groups for scheduling, and exclusive resource groups for DataService Studio.
To facilitate the management of DataWorks resources and improve user experience, DataWorks releases serverless resource groups. A serverless resource group can implement the core features of the preceding old-version resource groups. You can perform operations such as data synchronization, task scheduling and running, and API calling and management by using only one serverless resource group. We recommend that you purchase a serverless resource group. For more information, see Create and use a serverless resource group.
Precautions
Network connectivity is an important factor that affects the running result of your task.
Network connections cannot be established between a resource group and data sources that are deployed in the classic network. If the data source or business that you want to access is deployed in the classic network, we recommend that you migrate the data source or business to a VPC.
If you run a data synchronization task to synchronize data over the Internet, the efficiency and stability of the task cannot be ensured. We recommend that you run the data synchronization task to synchronize data over an internal network or use Cloud Enterprise Network (CEN) for data synchronization.
You can associate a serverless resource group with a virtual private cloud (VPC) to enable the resource group to access a data source or an address in a complex network environment over an internal network. By default, serverless resource groups cannot access the Internet. If you want to use a serverless resource group to access a data source or a network environment over the Internet, you must configure an Internet NAT gateway for the VPC with which the resource group is associated and associate the Internet NAT gateway with an elastic IP address (EIP). For more information, see the Scenario 5: Establish a network connection between a resource group and a data source that is deployed on the Internet section in this topic.
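The precautions above can be condensed into a small decision helper. The following is a minimal sketch only: the enum, the function name, and the returned strings are hypothetical and are not part of any DataWorks API. It restates the rules from this topic (classic network is unsupported, serverless resource groups need an Internet NAT gateway with an EIP for Internet access, and old-version resource groups already have an EIP).

```python
from enum import Enum


class DataSourceNetwork(Enum):
    """Where the data source lives (hypothetical labels for illustration)."""
    VPC = "vpc"
    CLASSIC = "classic"
    INTERNET = "internet"


def access_requirement(network: DataSourceNetwork, serverless: bool) -> str:
    """Summarize what this topic requires for a given data source network.

    This is an illustrative helper, not a DataWorks API.
    """
    if network is DataSourceNetwork.CLASSIC:
        # Resource groups cannot connect to data sources in the classic network.
        return "unsupported: migrate the data source to a VPC"
    if network is DataSourceNetwork.INTERNET:
        if serverless:
            # Serverless resource groups cannot access the Internet by default.
            return "configure an Internet NAT gateway with an EIP for the associated VPC"
        # Old-version resource groups are associated with an EIP automatically.
        return "use the EIP of the old-version resource group"
    return "associate the resource group with a VPC that can reach the data source"


print(access_requirement(DataSourceNetwork.INTERNET, serverless=True))
```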
Configure network connectivity
Step 1: Associate a resource group with a VPC
The network connectivity solution that you can use varies based on the network relationship between your resource group and a data source. The following table describes the network connectivity solutions.
Network type | Data source environment | Relationship between the data source and the resource group | Common logic for establishing a network connection | Sample configuration |
VPC | Alibaba Cloud | Same Alibaba Cloud account and same region | Associate the resource group with the VPC in which the data source is deployed. | |
VPC | Environment other than Alibaba Cloud | | | |
Internet | Internet | | | |
Step 2: Configure the IP address whitelist of a data source
Regardless of the network connectivity solution that you use, if the data source has an IP address whitelist, you must add one of the following to the whitelist: the CIDR block of the vSwitch with which the resource group is associated, the EIP of the old-version resource group, or the EIP of the VPC with which the serverless resource group is associated.
If you want to access the data source over a VPC, you must add the CIDR block of the vSwitch with which the serverless resource group is associated to the IP address whitelist of the data source.
On the Exclusive Resource Groups tab of the Resource Groups page in the DataWorks console, find the desired resource group and click Network Settings in the Actions column. On the VPC Binding tab of the page that appears, view and record the CIDR block of the related vSwitch. Then, add the CIDR block to the IP address whitelist of the data source.
If you want to access a data source over the Internet, you must perform one of the following operations to configure the IP address whitelist of the data source:
If you use a serverless resource group, you must add the EIP configured for the VPC with which the resource group is associated to the IP address whitelist of the data source.
On the Internet NAT Gateway page of the VPC console, find the source network address translation (SNAT) entry that is configured, and obtain the public IP address that is associated with the related vSwitch. Then, add the public IP address to the IP address whitelist of the data source.
If you use an old-version resource group, you must add the EIP of the resource group to the IP address whitelist of the data source.
On the Exclusive Resource Groups tab of the Resource Groups page in the DataWorks console, find the desired resource group and click Details in the Actions column. In the Basic Information section of the page that appears, view and record the EIP displayed next to the EIPAddress parameter. Then, add the EIP to the IP address whitelist of the data source.
If you scale out the resource group in subsequent operations, you must check whether the EIP changes. If the EIP changes, we recommend that you add the latest EIP to the IP address whitelist of the data source at the earliest opportunity after the scale-out operation. This ensures that your task can run as expected.
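Before you run a task, or after a scale-out that may have changed the EIP, it can help to check whether the addresses recorded in this step are already covered by the whitelist. The following is a minimal sketch that uses only the Python standard library. The function names and all addresses are placeholders chosen for illustration, and the whitelist is assumed to be a plain list of IP addresses and CIDR blocks that you exported from the data source.

```python
import ipaddress
from typing import List


def is_covered(address: str, whitelist: List[str]) -> bool:
    """Return True if an IP address or CIDR block falls inside any whitelist entry."""
    target = ipaddress.ip_network(address, strict=False)
    for entry in whitelist:
        allowed = ipaddress.ip_network(entry, strict=False)
        # subnet_of raises TypeError on mixed IPv4/IPv6, so compare versions first.
        if allowed.version == target.version and target.subnet_of(allowed):
            return True
    return False


def missing_entries(required: List[str], whitelist: List[str]) -> List[str]:
    """Return the required addresses that no whitelist entry covers."""
    return [addr for addr in required if not is_covered(addr, whitelist)]


# Placeholder values: a vSwitch CIDR block and an EIP from a documentation range.
required = ["192.168.10.0/24", "203.0.113.25"]
whitelist = ["192.168.0.0/16", "203.0.113.7"]
print(missing_entries(required, whitelist))  # prints ['203.0.113.25']
```

In this example the vSwitch CIDR block is already covered by a broader whitelist entry, while the EIP is reported as missing, which is the situation that can arise after the EIP of a resource group changes.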
Step 3: Test network connectivity
If your resource group needs to access a data source that is supported by DataWorks, you can add the data source to DataWorks and test the network connectivity of the data source.
Go to the Data Integration page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Integration.
In the left-side navigation pane, click Data Source. On the Data Sources page, click Add Data Source. In the Add Data Source dialog box, select the desired data source type and configure the related parameters to add a data source of this type.
Find the resource group that you purchased and click Test Network Connectivity in the Connection Status column.
Note: If Connection failed is displayed in the Connection Status column, you can click Self-service Troubleshoot in the column to resolve the issue.
If your resource group needs to access a service that is deployed in a different network environment, you can test the network connectivity between the service and the desired resource group in the business code based on your business requirements.
Note: If your resource group needs to access the business that is deployed on an ECS instance, you must also configure a security group to allow access to the business from the CIDR block of the vSwitch with which the resource group is associated or the public IP address configured for the VPC with which the resource group is associated.
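For a service that cannot be added as a DataWorks data source, a connectivity test in business code can be as simple as attempting a TCP connection from a task that runs on the resource group. The following is a minimal sketch that uses the Python standard library. The function name is hypothetical, and the host and port that you pass in are your own values.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, unreachable networks, and DNS failures.
        return False
```

For example, can_connect("db.example.internal", 3306) (a placeholder host name) returns False if the route, the IP address whitelist, or the security group blocks the connection. A successful TCP connection shows only that the network path is open. It does not verify credentials or database permissions.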
Sample configurations for different scenarios
This section provides examples on how to establish network connections between a resource group and data sources that are deployed in different network environments. An ApsaraDB RDS database or a self-managed database that is deployed in a data center or on the Internet is used as a data source in the examples.
In the following scenarios, the resource group is associated with a basic security group. For more information about basic security groups, see Overview.
Scenario 1: Establish a network connection between a resource group and a data source that belong to the same Alibaba Cloud account and reside in the same region
Scenario 2: Establish a network connection between a resource group and a data source that belong to the same Alibaba Cloud account but reside in different regions
Scenario 3: Establish a network connection between a resource group and a data source that belong to different Alibaba Cloud accounts
Scenario 4: Establish a network connection between a resource group and a data source that is deployed in a data center
If the data source that you want to use is not deployed on Alibaba Cloud, you can refer to this scenario to establish a network connection between the data source and your resource group.
Establish a network connection between the network environment in which the data source is deployed and Alibaba Cloud.
Use an Express Connect circuit to establish a network connection between the network environment of the data center and a VPC within the Alibaba Cloud account to which the resource group belongs.
Establish a network connection between the resource group and the data source.
Associate the resource group with the VPC that is connected to the network environment of the data center.
Add a route that points to the CIDR block of the data source for the resource group in the DataWorks console. For more information, see General reference: Add a route.
Add the CIDR block of the vSwitch with which the resource group is associated to the IP address whitelist of the data source.
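When you add the route in this scenario, the destination CIDR block must contain the address of the data source in the data center. The following is a minimal sketch of that check. It assumes that you have copied the route destinations into a plain list of CIDR strings. The function name and all addresses are placeholders, and nothing here calls a DataWorks API.

```python
import ipaddress
from typing import List, Optional


def matching_route(destination_ip: str, route_cidrs: List[str]) -> Optional[str]:
    """Return the most specific route CIDR that contains destination_ip, or None."""
    ip = ipaddress.ip_address(destination_ip)
    best = None
    for cidr in route_cidrs:
        network = ipaddress.ip_network(cidr, strict=False)
        if ip.version == network.version and ip in network:
            # Prefer the longest prefix, as routing tables usually do.
            if best is None or network.prefixlen > best.prefixlen:
                best = network
    return str(best) if best is not None else None


# Placeholder values: a database in the data center and two configured routes.
print(matching_route("10.20.3.4", ["10.0.0.0/8", "10.20.0.0/16"]))  # prints 10.20.0.0/16
```

A return value of None indicates that no configured route covers the data source, in which case traffic from the resource group has no path to the data center.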
Scenario 5: Establish a network connection between a resource group and a data source that is deployed on the Internet
This network connectivity solution is suitable only for serverless resource groups. EIPs are automatically associated with old-version resource groups.
References
For more information about resource groups, see Resource group overview.
For more information about how to create and use a serverless resource group, see Create and use a serverless resource group.
For information about how to associate a resource group with a VPC, see Associate a resource group with a VPC.
For information about how to configure an SNAT entry on the Internet NAT gateway for the VPC and vSwitch with which the resource group is associated, see Use the SNAT feature of an Internet NAT gateway to access the Internet.