When you activate DataWorks, the system provides you with the shared resource group for scheduling and the shared resource group for DataService Studio. You can use these resource groups to perform operations such as data development, node running, and node testing. Resources in a shared resource group are used by multiple tenants. During peak hours, the tenants may compete for resources, and the resources may be insufficient. This topic provides an overview of shared resource groups.
Scenarios
We recommend that you use a shared resource group only if the number of nodes that you want to run is small and the requirement for the timeliness of data output is low.
Limits
Resources in a shared resource group are used by multiple tenants. During peak hours, the tenants may compete for resources, and the resources may be insufficient.
A maximum of 40 nodes can be run in parallel on the shared resource group for scheduling. During the peak hours from 00:00 to 09:00, nodes may compete for resources in the shared resource group for scheduling. In this case, the maximum number of nodes that can be run in parallel on the shared resource group for scheduling may be less than 40.
The shared resource group for DataService Studio cannot meet requirements for frequent and highly concurrent API calls.
If you want to ensure separate, sufficient resources for your nodes, we recommend that you purchase exclusive resource groups. The following table describes the different types of exclusive resource groups that you can purchase.
Resource group type | Description | References |
Exclusive resource group for scheduling | If a large number of nodes must be run in parallel and each node needs a large number of parallel threads, exclusive computing resources are required to ensure that the nodes are run as scheduled. In this case, we recommend that you use an exclusive resource group for scheduling. | Billing of exclusive resource groups for scheduling (subscription) |
Exclusive resource group for Data Integration | If a large number of Data Integration nodes must be run in parallel and each node needs a large number of parallel threads, exclusive computing resources are required to ensure fast and stable data transmission. In this case, we recommend that you use an exclusive resource group for Data Integration. | Billing of exclusive resource groups for Data Integration (subscription) |
Exclusive resource group for DataService Studio | If you require high queries per second (QPS) and service level agreement (SLA) guarantees when you call APIs in DataService Studio, you must use exclusive resources for DataService Studio to ensure successful API calls. In addition, exclusive resource groups for DataService Studio can meet the requirements of highly concurrent, frequent API calls and help return responses at the earliest opportunity. | Billing of exclusive resource groups for DataService Studio (subscription) |
Billing and related operations
1. Billing rules
After you activate DataWorks, you can use the shared resource groups that are provided by DataWorks. You do not need to separately purchase shared resource groups.
You are charged based on items such as Elastic Compute Service (ECS) instances in the shared resource groups and the data synchronization threads that are used. The shared resource groups support the pay-as-you-go billing method. For more information, see the DataWorks billing topics for shared resource groups.
2. Deductions and overdue payments
The settlement method for deductions and overdue payments varies based on the types of shared resource groups in DataWorks. For more information, see Deduction and overdue payments.
Use a shared resource group
To ensure service efficiency, you can select an appropriate type of shared resource group to run nodes for data integration or data development based on your business requirements.
A shared resource group is a public resource pool. Nodes that use resources in a shared resource group may not be run as scheduled if resources in the resource group are insufficient. If you want your nodes to be run as expected, use an exclusive resource group. For more information, see Create and use an exclusive resource group for Data Integration and Create and use an exclusive resource group for scheduling.
You need to provide information about a shared resource group when you configure network connectivity. For more information, see Appendix: Configure a security group for an ECS instance on which a self-managed database is hosted and Configure an IP address whitelist.
A maximum of five nodes can be run in parallel on the shared resource group for Data Integration when the resource group is not in use by other nodes. When you run nodes on the shared resource group for Data Integration, other nodes may compete for resources in the resource group. Therefore, the maximum number of nodes that can be run in parallel may be less than five. The actual maximum varies based on the resource usage of the resource group.
You cannot change the memory size of a shared resource group. Instead, you can change the number of nodes that can be run in parallel on the shared resource group.
The memory size of a shared resource group is calculated by using the following formula: Memory size = Number of nodes that are run in parallel on the resource group × 512 MB.
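The memory formula above can be sketched as a small calculation. This is an illustrative example only: the function name `shared_group_memory_mb` and the constant name are hypothetical and are not part of any DataWorks API. The 512 MB factor is taken from the formula in this topic.

```python
# Illustrative sketch only; these names are hypothetical, not a DataWorks API.
MB_PER_PARALLEL_NODE = 512  # factor from the formula in this topic


def shared_group_memory_mb(parallel_nodes: int) -> int:
    """Return the memory size, in MB, for the given number of parallel nodes."""
    if parallel_nodes < 0:
        raise ValueError("parallel_nodes must not be negative")
    return parallel_nodes * MB_PER_PARALLEL_NODE


# Five nodes running in parallel correspond to 5 x 512 MB.
print(shared_group_memory_mb(5))  # 2560
```

Because the memory size cannot be set directly, changing the number of nodes that run in parallel is the only input to this calculation.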
Network connectivity solutions
A DataWorks resource group is a group of Alibaba Cloud ECS instances. To run nodes for data integration or data development, you must make sure that resource groups and data sources are connected to each other. You must also make sure that special security settings such as an IP address whitelist do not affect the connections between resource groups and data sources.
Network connectivity
A network connection can be established between a shared resource group and a data source that belongs to Alibaba Cloud. The network connectivity between a data source and a shared resource group varies based on the network environment of the data source:
Shared resource group for scheduling
If you want the shared resource group for scheduling to access a public IP address, you must add the public IP address or domain name and port number to a sandbox whitelist on the Workspace Management page. If the shared resource group for scheduling cannot access the public IP address after you perform the preceding operation, we recommend that you use an exclusive resource group for scheduling.
You can use the shared resource group for scheduling to access only the data sources for which no IP address whitelist is configured. To access a data source for which an IP address whitelist is configured or a data source that is deployed in a virtual private cloud (VPC), we recommend that you use an exclusive resource group for scheduling.
Note: We recommend that you use an exclusive resource group for scheduling to access a data source that is deployed on the Internet or in a VPC. For more information about how to use an exclusive resource group for scheduling, see Exclusive resource groups for scheduling.
Shared resource group for DataService Studio
The following table describes the network connectivity between the shared resource group for DataService Studio and data sources that are deployed in different network environments.
Network environment | Accessible |
Internet | Yes |
Classic network | Yes |
VPC | No |
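If you want to check connectivity in a script before you choose a resource group, the table above can be encoded as a simple lookup. This is a minimal sketch: the dictionary and function names are hypothetical, and the values only restate the table in this topic. They are not read from any DataWorks API.

```python
# Illustrative sketch only; the names are hypothetical and the values
# restate the connectivity table in this topic.
DATASERVICE_SHARED_GROUP_ACCESS = {
    "internet": True,
    "classic network": True,
    "vpc": False,
}


def shared_dataservice_group_can_access(network_environment: str) -> bool:
    """Return True if the table lists the network environment as accessible."""
    key = network_environment.strip().lower()
    if key not in DATASERVICE_SHARED_GROUP_ACCESS:
        raise ValueError(f"unknown network environment: {network_environment!r}")
    return DATASERVICE_SHARED_GROUP_ACCESS[key]


print(shared_dataservice_group_can_access("VPC"))  # False
```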
Whitelist settings
The shared resource group for scheduling provides the security sandbox feature for nodes. This feature can be used to limit access to the resource group from unknown IP addresses. If you want to access the resource group for scheduling, you must add the IP address that you use to the IP address whitelist of the security sandbox. For more information, see the Security Settings section of the "Create and manage workspaces" topic.