You can purchase an exclusive resource group for scheduling based on your business requirements and use the resource group to schedule tasks. Before you use the exclusive resource group for scheduling, you may need to perform operations such as configuring network settings and IP address whitelists. This topic describes the process from the purchase of an exclusive resource group for scheduling to the use of the resource group.
Prerequisites
You are familiar with the performance and billing of exclusive resource groups for scheduling with specific specifications. The performance of an exclusive resource group for scheduling is measured based on the number of tasks that can be run in parallel. The billing details of an exclusive resource group for scheduling vary based on the specifications of the resource group. We recommend that you determine the specifications and subscription duration based on your business requirements before you purchase an exclusive resource group for scheduling. For more information, see Billing of exclusive resource groups for scheduling (subscription).
Optional. You understand the network connectivity solutions between an exclusive resource group for scheduling and a data source or compute engine instance, and the precautions for configuring the IP address whitelist of a data source. This applies if your exclusive resource group for scheduling must interact with a data source or a different network environment (for example, a Shell node accesses a self-managed database or a private IP address in a scheduling scenario), or if the resource group runs a task that uses an E-MapReduce (EMR) or Cloudera's Distribution including Apache Hadoop (CDH) compute engine instance. For information about the network connectivity solutions for different scenarios and how to configure the IP address whitelist of a data source, see Exclusive resource groups for scheduling.
Note: If you do not need to connect an exclusive resource group to a data source and only want to resolve task delays caused by insufficient resources in the shared resource group for scheduling, you can ignore the network configurations described in this topic. You can purchase an exclusive resource group for scheduling in any zone without configuring network settings.
You understand the use scenarios of exclusive resource groups for scheduling. For more information, see Scenarios of exclusive resource groups for scheduling.
Procedure
To purchase and use an exclusive resource group for scheduling, you must perform the following steps.
Step | Description | References |
1 | Create an exclusive resource group for scheduling. Exclusive resource groups for scheduling are charged based on the subscription billing method. | Create an exclusive resource group for scheduling |
2 | Associate the exclusive resource group for scheduling with a workspace based on your business requirements. After an exclusive resource group is created, the resource group does not belong to any workspace. Therefore, you must associate the exclusive resource group with a workspace. | Associate the exclusive resource group with a workspace |
3 | If you want to use the exclusive resource group for scheduling to access a data source that is deployed in a virtual private cloud (VPC), associate the resource group with this VPC or a VPC that connects to the data source. | (Optional) Associate the exclusive resource group with a VPC |
4 | If the access of the exclusive resource group for scheduling to the data source is restricted by the IP address whitelist of the data source, add the elastic IP address (EIP) of the resource group or the CIDR block of the vSwitch with which the resource group is associated to the IP address whitelist. | (Optional) Configure the IP address whitelist of a data source |
5 | Test the network connectivity between the exclusive resource group for scheduling and the data source on the Data Source page of the DataWorks console. This ensures that a task that uses the data source can be normally configured. | (Optional) Test the network connectivity of the exclusive resource group |
6 | Change the resource groups used by tasks to the exclusive resource group for scheduling. After you associate the exclusive resource group for scheduling with a workspace, the resource groups used by the tasks in the workspace remain unchanged. If you want to use the exclusive resource group for scheduling to schedule tasks in the workspace, you must manually change the resource groups used by the tasks to the exclusive resource group for scheduling. | Change the resource groups used by tasks to the exclusive resource group |
7 | View and monitor the resource usage of the exclusive resource group for scheduling and the number of instances that are waiting for resources in the resource group. | View the resource usage of the exclusive resource group and monitor the resource group |
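8 | If you want to perform command-related operations on the exclusive resource group for scheduling, such as installing a third-party Python package, use the O&M Assistant feature to perform the operations. | (Optional) Use the O&M Assistant feature to perform command-related operations on the exclusive resource group |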
Precautions
Only an Alibaba Cloud account or a RAM user to which the AliyunBSSOrderAccess and AliyunDataWorksFullAccess policies are attached can create a resource group.
Only a workspace administrator can associate a resource group with a workspace and change the workspace with which a resource group is associated.
For information about the permissions that are required to use the features and perform operations on the Resource Groups page of the DataWorks console, see Policies that can be used to manage permissions on resource groups.
For information about how to create a custom policy and attach the custom policy to a RAM user, see (Optional) Create a custom policy.
You can associate an exclusive resource group for scheduling that uses the specifications of 4 vCPUs and 8 GiB of memory with a maximum of two VPCs. You can associate an exclusive resource group for scheduling that uses other specifications with a maximum of three VPCs.
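The VPC limit above can be written as a small lookup. This is a minimal sketch for illustration only: the function names are hypothetical and are not part of any DataWorks API, and the only facts encoded are the two limits stated in this section.

```python
def max_vpc_bindings(vcpus: int, memory_gib: int) -> int:
    """Maximum number of VPCs per the limit stated in this section."""
    # Only the 4 vCPU / 8 GiB specification is capped at two VPCs;
    # every other specification may be associated with up to three.
    return 2 if (vcpus, memory_gib) == (4, 8) else 3


def can_add_binding(vcpus: int, memory_gib: int, current_bindings: int) -> bool:
    """True if one more VPC binding stays within the limit."""
    return current_bindings < max_vpc_bindings(vcpus, memory_gib)
```

For example, `can_add_binding(4, 8, 2)` is false because a resource group with 4 vCPUs and 8 GiB of memory that is already associated with two VPCs has reached its limit.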
Create an exclusive resource group for scheduling
Only an Alibaba Cloud account or a RAM user to which the AliyunBSSOrderAccess and AliyunDataWorksFullAccess policies are attached can create a resource group.
Log on to the DataWorks console.
In the left-side navigation pane, click Resource Groups. On the Exclusive Resource Groups tab of the Resource Groups page, click Create Resource Group for Scheduling to go to the DataWorks Exclusive Resources page. On this page, configure the following parameters.
Parameter | Description |
Region | The region in which you want to use the exclusive resource group for scheduling. Note: An exclusive resource group for scheduling cannot be shared across regions. For example, an exclusive resource group for scheduling in the China (Shanghai) region can be used only by workspaces in the China (Shanghai) region. |
Type | The type of the exclusive resource group. Select Exclusive Resources for Scheduling for this parameter. |
Exclusive Resources for Scheduling | The specifications of the exclusive resource group for scheduling. The fee for using an exclusive resource group for scheduling and the maximum number of parallel tasks that can be run on an exclusive resource group for scheduling vary based on the specifications of the resource group. For information about the billing of exclusive resource groups for scheduling, see Billing of exclusive resource groups for scheduling (subscription). |
Units | The number of machines in the exclusive resource group for scheduling. To ensure the high availability of the exclusive resource group for scheduling in the production environment, we recommend that you set this parameter to 2 or a value greater than 2. |
Duration | Exclusive resource groups for scheduling are charged based on the subscription billing method. To ensure service continuity, we recommend that you select Auto-renewal. You can also go to the Renewal Management page to enable or disable auto renewal after the resource group is created. For more information, see General reference: Stop using DataWorks features or resources. |
Resource Group Name | The name of the exclusive resource group for scheduling. The name must be unique within a tenant. Otherwise, an error is reported when you confirm the purchase operation. Note: A tenant refers to an Alibaba Cloud account. Each tenant can have multiple RAM users. |
Click Buy Now and complete the payment based on the instructions.
Then, DataWorks starts to initialize the exclusive resource group for scheduling. When the resource group enters the Running state, the resource group is created in the DataWorks console.
Note: DataWorks requires approximately 20 minutes to initialize the exclusive resource group for scheduling. Wait until the status of the resource group changes to Running.
After the exclusive resource group for scheduling is created in the DataWorks console, you must associate the resource group with a workspace. This way, you can select the resource group when you configure a task in the workspace.
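If you script around the initialization wait, the pattern is a plain polling loop. The sketch below is an assumption-heavy illustration: `get_status` is a hypothetical callable that you supply (this topic does not describe an API for reading the status), and the only status string taken from this topic is `Running`. The default timeout and interval are illustrative values chosen to exceed the approximate 20-minute initialization time.

```python
import time
from typing import Callable


def wait_until_running(
    get_status: Callable[[], str],   # hypothetical: returns the current status text
    timeout_s: float = 30 * 60,      # assumption: allow some margin over ~20 minutes
    interval_s: float = 60,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the resource group reports Running or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        if get_status() == "Running":
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist only so that the loop can be exercised without real waiting.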
Associate the exclusive resource group with a workspace
Only a workspace administrator can associate a resource group with a workspace and change the workspace with which a resource group is associated.
You must associate the exclusive resource group for scheduling with a workspace before you can select the resource group in the workspace. An exclusive resource group for scheduling can be shared among multiple workspaces but cannot be used across regions. For example, you can associate an exclusive resource group for scheduling in the China (Shanghai) region only with workspaces in the China (Shanghai) region. To associate an exclusive resource group for scheduling with a workspace, perform the following steps:
Log on to the DataWorks console.
In the left-side navigation pane, click Resource Groups. On the Exclusive Resource Groups tab of the Resource Groups page, find the created resource group and click Change Workspace in the Actions column.
In the Modify home workspace dialog box, find the workspace with which you want to associate the resource group and click Bind in the Actions column.
(Optional) Associate the exclusive resource group with a VPC
If an interaction is required between your exclusive resource group for scheduling and a data source or a different network environment (for example, you need to use a Shell node to access a self-managed database or a private IP address in a scheduling scenario) or if an exclusive resource group for scheduling is required to run a task that uses an EMR or CDH compute engine instance, you must associate the resource group with a VPC and configure the IP address whitelist of the data source.
Exclusive resource groups are deployed in the VPC in which DataWorks is hosted. You must associate your exclusive resource group with the required VPC. This way, a network connection can be established between the exclusive resource group and the data source. To associate an exclusive resource group for scheduling with a VPC, perform the following steps:
You can associate an exclusive resource group for scheduling that uses the specifications of 4 vCPUs and 8 GiB of memory with a maximum of two VPCs. You can associate an exclusive resource group for scheduling that uses other specifications with a maximum of three VPCs.
Log on to the DataWorks console.
In the left-side navigation pane, click Resource Groups. On the Exclusive Resource Groups tab of the Resource Groups page, find the created exclusive resource group for scheduling and click Network Settings in the Actions column. On the VPC Binding tab of the page that appears, click Add Binding.
Before you associate the exclusive resource group with a VPC, you must log on to the RAM console with your Alibaba Cloud account and authorize DataWorks to access your cloud resources.
Associate the exclusive resource group with a VPC.
On the VPC Binding tab of the Network Settings page, click Add Binding. In the Add VPC Binding panel, configure the parameters. You must configure the parameters based on the network environments of your data source and resource group. The following table describes the details.
Note: If you want to use the resource group to access a data source, such as an Alibaba Cloud data source or a self-managed data source hosted on an ECS instance, you can select a network connectivity solution and configure network settings based on whether the resource group and data source belong to the same Alibaba Cloud account.
Parameter | Description (same region and Alibaba Cloud account) | Description (different regions or Alibaba Cloud accounts) |
VPC | If your data source and the exclusive resource group belong to the same Alibaba Cloud account, we recommend that you select the VPC in which your data source resides. If your data source and the exclusive resource group belong to different Alibaba Cloud accounts, configure this parameter based on the description for the scenario where your data source and the exclusive resource group reside in different regions. | If your data source and the exclusive resource group belong to different Alibaba Cloud accounts or reside in different regions, you must select a VPC that connects to the data source. For example, if your data source does not reside in a VPC, you can click Create VPC to create a VPC for the exclusive resource group. After the VPC is created, you can select it from the VPC drop-down list. You can also select a VPC that connects to your data source. Note: If your data source and the exclusive resource group reside in different regions or belong to different Alibaba Cloud accounts, you must use VPN Gateway or Express Connect to establish a connection between the VPC with which the exclusive resource group is associated and the VPC in which the data source resides and add a route that points to the IP address of the data source for the exclusive resource group. For more information, see Network connectivity solutions. |
Zone | Select the zone in which your data source resides. | Select a zone from which a network connection to your data source is established. |
vSwitch | If you set the VPC parameter to the VPC in which your data source resides, we recommend that you select the vSwitch with which the data source is associated. Note: After you associate the exclusive resource group with the VPC in which the data source resides and a vSwitch that resides in the VPC, a route that points to the CIDR block of the VPC is automatically added. This ensures that the exclusive resource group can access the data sources in this VPC. | Select the vSwitch to which the data source is connected. If no vSwitch is available, you can click Create VSwitch to create a vSwitch for the exclusive resource group. After a vSwitch is created, select the vSwitch. |
Security Groups | Security groups allow or deny access to the exclusive resource group over the Internet or an internal network. You can select an existing security group based on your business requirements, or click Create Security Group to create a security group for the resources in the exclusive resource group. For information about how to create a security group, see Add a security group rule. | Same as for the same region and Alibaba Cloud account. |
Click OK.
Note: If your data source and the exclusive resource group reside in different regions or belong to different Alibaba Cloud accounts, you must add a route that points to the IP address of your data source after you associate the exclusive resource group with a VPC.
Add host configurations. This operation is optional.
Some data sources cannot be accessed by IP address and can be accessed only by hostname. In this case, you must perform the following steps to add host configurations. Otherwise, the connectivity test fails when you add the data source by using its hostname.
Click the Hostname-to-IP Mapping tab. On this tab, click Add. In the Create Hostname-to-IP Mapping dialog box, configure the parameters. The following table describes the parameters.
Parameter | Description |
IP Address | The actual IP address of the data source. |
Hostname | The hostname that is used to access the data source. If you want to specify multiple hostnames, place each hostname on a separate line. |
If the data source has multiple IP addresses, click Add to add more host configurations.
Note: The IP address and hostnames that you add in a host configuration must be different from the IP addresses and hostnames in existing host configurations.
You can map one IP address to multiple hostnames in a host configuration. However, one hostname can point to only one IP address.
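The mapping rules in the preceding note can be checked before you enter the values in the console. The following is a minimal sketch under stated assumptions: the function name, the input shape (a list of IP address and hostname-list pairs), and the message wording are all hypothetical, and the sample addresses and hostnames are made up. It only encodes the two rules above: an IP address appears in one host configuration, and a hostname points to one IP address.

```python
import ipaddress
from typing import Dict, List, Sequence, Set, Tuple


def find_mapping_conflicts(configs: Sequence[Tuple[str, List[str]]]) -> List[str]:
    """Return human-readable problems found in (ip, hostnames) host configurations."""
    seen_ips: Set[str] = set()
    host_to_ip: Dict[str, str] = {}
    problems: List[str] = []
    for ip, hostnames in configs:
        ipaddress.ip_address(ip)  # raises ValueError if the address is malformed
        if ip in seen_ips:
            problems.append(f"duplicate IP address: {ip}")
        seen_ips.add(ip)
        for host in hostnames:
            # One hostname may point to only one IP address.
            owner = host_to_ip.setdefault(host, ip)
            if owner != ip:
                problems.append(f"hostname {host} already maps to {owner}")
    return problems
```

An empty result means that the proposed configurations satisfy both rules. One IP address with several hostnames is accepted, which matches the note.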
(Optional) Configure the IP address whitelist of a data source
Even if your exclusive resource group for scheduling and your data source reside in the same zone, same VPC, and same vSwitch, the access from the resource group to the data source may still fail due to restrictions of the IP address whitelist of the data source. In this case, you must configure the IP address whitelist of your data source based on the following instructions:
If you want to establish a network connection between your exclusive resource group and your data source over an internal network, you must add the CIDR block of the vSwitch with which the exclusive resource group is associated to the IP address whitelist of the data source.
To view the CIDR block of the vSwitch with which the exclusive resource group is associated, perform the following operations: Log on to the DataWorks console and click Resource Groups in the left-side navigation pane. On the Exclusive Resource Groups tab, find the exclusive resource group and click Network Settings in the Actions column. On the VPC Binding tab, you can view the CIDR block in the VSwitch CIDR Block column.
If you want to establish a network connection between your exclusive resource group and your data source over the Internet, you must add the EIP of the exclusive resource group to the IP address whitelist of the data source.
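The two whitelist rules above can be sketched with Python's standard `ipaddress` module. This is an illustration, not a DataWorks feature: the function names and the `"internal"` and `"internet"` labels are assumptions, and the sample addresses are placeholders. The first function picks the entry to add to the whitelist, and the second checks whether a given source IP address would be covered by a whitelist.

```python
import ipaddress
from typing import List, Optional


def whitelist_entry(connection: str,
                    vswitch_cidr: Optional[str] = None,
                    eip: Optional[str] = None) -> str:
    """Pick the whitelist entry for an internal-network or Internet connection."""
    if connection == "internal":
        if vswitch_cidr is None:
            raise ValueError("the vSwitch CIDR block is required for an internal network")
        return str(ipaddress.ip_network(vswitch_cidr))  # validates the CIDR block
    if connection == "internet":
        if eip is None:
            raise ValueError("the EIP is required for an Internet connection")
        return str(ipaddress.ip_address(eip))  # validates the address
    raise ValueError(f"unknown connection type: {connection}")


def is_allowed(source_ip: str, whitelist: List[str]) -> bool:
    """True if source_ip falls inside any whitelist entry (CIDR block or single IP)."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(entry, strict=False) for entry in whitelist)
```

For example, with a vSwitch CIDR block of `192.168.1.0/24` on the whitelist, `is_allowed("192.168.1.37", ["192.168.1.0/24"])` is true, while an address in `192.168.2.0/24` is rejected.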
(Optional) Test the network connectivity of the exclusive resource group
After you complete the preceding network configuration, you need to test the network connectivity between the resource group and your data source by performing the following operations:
Go to the Data Sources page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Management Center.
In the left-side navigation pane of the SettingCenter page, go to the Data Sources page.
Find the desired data source and click Edit in the Actions column.
In the Connection Configuration section of the page on which you configure a data source, find the exclusive resource group for scheduling and click Test Connectivity in the Connectivity Status (Production Environment) column. If the connectivity status of the exclusive resource group for scheduling is Connectable, the network connectivity test is successful.
Note: The displayed configuration information varies based on the data source type. You can view the configurations in the DataWorks console.
If you want to use a data source that is separately added for the development environment and production environment, you must separately test the network connectivity between the data sources in the environments and the resource group.
For information about the solutions for network connectivity between an exclusive resource group and data sources that reside in various network environments, see Network connectivity solutions.
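When a connectivity test fails, a generic TCP reachability check can help separate network problems from credential or configuration problems. The sketch below is not the Test Connectivity feature of the DataWorks console and does not reproduce what that feature checks. It assumes that you can run Python on a host that shares the network path in question, and the host and port that you pass in are your own values.

```python
import socket


def tcp_reachable(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """True if a TCP connection to host:port can be opened within the timeout."""
    try:
        # create_connection resolves the hostname and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A false result only shows that no TCP connection could be opened. It does not indicate whether the cause is a route, a security group rule, an IP address whitelist, or a hostname mapping.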
Change the resource groups used by tasks to the exclusive resource group
Environment for the operation | Supported change operation | Entry point |
Production environment | Change the resource groups for scheduling for multiple tasks in the production environment at the same time | Important: You cannot change the resource groups for zero load nodes, workflow nodes, or Platform for AI (PAI) nodes. |
Development environment | | Go to the DataStudio page. |
DataStudio page | Change the resource group for scheduling for a single task on the DataStudio page | Go to the configuration tab of the task for which you want to change the resource group on the DataStudio page and click the icon in the top toolbar to change the resource group for scheduling that is used to test the task. |
View the resource usage of the exclusive resource group and monitor the resource group
You can view the resource usage of the exclusive resource group for scheduling and the number of instances that are waiting for resources in the resource group in the DataWorks console. You can also use the intelligent monitoring feature provided in Operation Center to monitor the resource usage of the resource group and the number of instances that are waiting for resources in the resource group. For more information about how to view the resource usage of a resource group, see View the resource usage of an exclusive resource group. For more information about how to monitor a resource group, see Create a custom alert rule.
(Optional) Use the O&M Assistant feature to perform command-related operations on the exclusive resource group
If you need to perform command-related operations on the exclusive resource group for scheduling during data development, you can use the O&M Assistant feature. For example, you can use the O&M Assistant feature to install a third-party Python package. For more information about the O&M Assistant feature, see Use the O&M Assistant feature.
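After you install a third-party Python package, you may want to confirm that it can be imported. The following is a generic Python check and not part of the O&M Assistant feature. The function name is hypothetical, and the check assumes that it runs in the same Python environment in which the package was installed. It handles top-level module names only.

```python
import importlib.util
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found for import."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty list means that every listed module can be located by the import system.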