DataWorks: Create and use a resource group of the new version

Last Updated: Jun 20, 2024

To facilitate the management of resources in different DataWorks services and improve user experience, DataWorks releases a new version of resource groups. A resource group of the new version can implement the core features of an exclusive resource group for scheduling, an exclusive resource group for Data Integration, and an exclusive resource group for DataService Studio at the same time. You can perform operations such as data synchronization, task scheduling and running, and API calling and management by using only one resource group of the new version. This topic describes how to create and use a resource group of the new version.

Prerequisites

  • You are familiar with the details about resource groups of the new version, such as the specifications, performance, and billing standards. You have determined the specifications and subscription duration that you require based on your business scenario.

  • You are granted the required permissions.

Comparison between resource groups of the new version and resource groups of the old version

| Comparison item | Resource group of the old version (exclusive resource groups and shared resource groups) | Resource group of the new version (resource groups of the general-purpose type) |
| --- | --- | --- |
| Classification | Resource groups are classified into resource groups for Data Integration, resource groups for scheduling, and resource groups for DataService Studio based on their purposes. | Resource groups are used for general purposes and are not classified. |
| Support for features | Some capabilities of DataWorks are not supported. | All capabilities of DataWorks are supported. |
| Support for mixed use | Each type of resource group serves only one purpose. | A resource group can be used in Data Integration, scheduling, and DataService Studio at the same time. |
| Sales mode | Resource groups are charged based on the specifications and the number of machines. A resource group must contain at least one machine, and the minimum specifications of a machine are 4 vCPUs and 8 GiB of memory. The minimum step size for scaling out a resource group is one machine whose specifications are 4 vCPUs and 8 GiB of memory. | Resource groups are sold by compute unit (CU). A resource group must contain at least two CUs. The minimum step size for scaling out a resource group is one CU. |
| Billing method | Exclusive resource groups support only the subscription billing method. Shared resource groups support only the pay-as-you-go billing method. | Both the subscription and pay-as-you-go billing methods are supported. |
| Resource waste | DataWorks provides only limited types of specifications for resource groups. This causes a specific amount of resource fragments to be generated on machines of each type of specifications. As a result, resources are wasted. | You can determine the number of CUs based on your business requirements. This prevents resource waste. |
| Scalability | You can upgrade or downgrade the specifications of a resource group, and increase or reduce the number of machines in the resource group. | You can directly change the number of CUs for a resource group. |
| Impact generated by scale-out or scale-in | Running tasks are affected. | Running tasks are not affected. |
| Network security | DataWorks manages inbound and outbound Internet traffic for resource groups. The Internet bandwidth of resource groups is shared by multiple users. This causes resource competition. | Users use their own Internet capabilities, which makes the behavior of users controllable. |
| Development trend | Resource groups of the old version will be discontinued in the future. | Resource groups of the new version will become the only resource groups that are supported by DataWorks. |
| Support for custom images | Custom images are not supported. | If you use a resource group of the new version to deploy tasks, you can create an image that contains all components required for running tasks. This helps meet more conditions for running tasks. |
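The Sales mode and Resource waste items above can be illustrated with a small calculation. The following is a minimal sketch, not an official sizing tool: it looks at vCPUs only (memory is ignored), assumes an old-version resource group built from the minimum 4 vCPU machines, and relies on the 1 CU = 1 vCPU core relation stated in the Resource Group Specifications note in this topic. The function names are hypothetical.

```python
import math

OLD_MACHINE_VCPUS = 4  # minimum machine specification of an old-version resource group
NEW_MIN_CUS = 2        # a new-version resource group must contain at least two CUs
VCPUS_PER_CU = 1       # 1 CU = 1 vCPU core + 4 GiB of memory


def old_version_vcpus(required_vcpus: int) -> int:
    """vCPUs purchased when capacity grows in whole 4 vCPU machines (at least one)."""
    machines = max(1, math.ceil(required_vcpus / OLD_MACHINE_VCPUS))
    return machines * OLD_MACHINE_VCPUS


def new_version_cus(required_vcpus: int) -> int:
    """CUs purchased when capacity grows in single-CU steps (at least two)."""
    return max(NEW_MIN_CUS, math.ceil(required_vcpus / VCPUS_PER_CU))


if __name__ == "__main__":
    for need in (1, 5, 9):
        old = old_version_vcpus(need)
        new = new_version_cus(need) * VCPUS_PER_CU
        print(f"need {need} vCPUs: old version buys {old}, new version buys {new}")
```

Under these assumptions, a requirement of 5 vCPUs rounds up to 8 vCPUs with machine-sized steps but to exactly 5 CUs with CU-sized steps, which is the kind of fragment the Resource waste item describes.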

Precautions

To ensure that your resource group can access the desired data source, such as a database, a data service, or other data in a specific network environment, you must establish a network connection between the resource group and the data source in advance based on the network environment of the data source. For more information, see Establish a network connection between a resource group and a data source.

Step 1: Create a resource group of the new version

  1. Go to the Resource Groups page.

    Log on to the DataWorks console. In the left-side navigation pane, click Resource Groups.

  2. On the Exclusive Resource Groups tab, click Create Resource Group to go to the buy page for resource groups of the new version. On the buy page, configure the following parameters.

    Parameter

    Description

    Region and Zone

    The region in which you want to create the resource group. The region must be the same as the region in which the workspace resides.

    Billing Method

    • Subscription: You must pay for the resource group before you use it.

    • Pay-as-you-go: You can use the resource group before you pay for it.

    Resource Group Specifications

    This parameter is required only if you set the Billing Method parameter to Subscription.

    Valid values: 2 to 99999999. Unit: CU.

    Note
    • 1 CU = 1 vCPU core + 4 GiB of memory. If you want to use the resource group in DataService Studio, you must purchase at least four CUs.

    • The value 99999999 indicates that the number of CUs you can purchase is not limited. However, the number of CUs that you can actually purchase may be limited by the inventory. If the inventory is insufficient, follow the prompt that is displayed when you purchase a resource group of the new version.

    Resource Group Name

    The name of the resource group.

    Resource Group Description

    The description of the resource group.

    VPC and vSwitch

    The virtual private cloud (VPC) and vSwitch with which you want to associate the resource group. Select a VPC based on the network that the resource group needs to access.

    • If the resource group needs to access a data source that belongs to the same Alibaba Cloud account and resides in the same region as the resource group, you can select the VPC where the data source resides and a vSwitch in the VPC.

    • If the resource group needs to access a data source that resides in a complex network environment, you must use a connection tool such as VPN Gateway or Express Connect to establish a network connection between the VPC with which the resource group is associated and the VPC where the data source resides. For more information, see Establish a network connection between a resource group and a data source.

    Note
    • If no VPCs or vSwitches are available, you must go to the VPC console to create a VPC or a vSwitch. For more information about VPCs, see What is a VPC?

    • You can associate the resource group with one or more other VPCs after the resource group is created.

    • If you set the Billing Method parameter to Subscription and the VPC that you select is used in DataService Studio, data computing, and Data Integration, you cannot associate the resource group with another VPC or change the associated VPC while the resource group is used in DataService Studio. Plan your network configuration in advance.

    Billing Cycle

    The subscription duration of the resource group. This parameter is required only if you set the Billing Method parameter to Subscription.

    Important

    To prevent your business from being affected due to service suspension or resource release when the resource group expires, we recommend that you select Auto-renewal. After you select Auto-renewal, fees are automatically deducted from your Alibaba Cloud account based on the actual prices before the resource group expires. The auto-renewal cycle is one month. You can disable auto-renewal if you no longer require this feature.

    Service-linked Role

    The service-linked role. The first time you create a resource group of the new version, you must create a service-linked role named AliyunServiceRoleForDataWorks. When you create a resource group of the new version in subsequent operations, the system automatically assigns the service-linked role.

    Note

    The service-linked role AliyunServiceRoleForDataWorks is used to access resources in a VPC, in an elastic network interface (ENI), and in a security group. For more information about the service-linked role, see DataWorks service-linked role.
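Before you fill in the Resource Group Specifications parameter, it can help to check a planned CU count against the limits listed above. The following is a minimal sketch that encodes only the rules stated in this topic (valid values from 2 to 99999999, 1 CU = 1 vCPU core + 4 GiB of memory, and at least four CUs for DataService Studio). The function name and the returned dictionary layout are hypothetical, and the sketch does not call any DataWorks API or check inventory.

```python
MIN_CUS = 2                  # lower bound of the Resource Group Specifications parameter
MAX_CUS = 99_999_999         # upper bound; means "not limited" on the buy page
MIN_CUS_DATASERVICE = 4      # minimum CUs if the group is used in DataService Studio
VCPUS_PER_CU = 1             # 1 CU = 1 vCPU core ...
MEMORY_GIB_PER_CU = 4        # ... + 4 GiB of memory


def check_specifications(cus: int, use_dataservice_studio: bool = False) -> dict:
    """Validate a planned CU count and return the resources it corresponds to."""
    if not MIN_CUS <= cus <= MAX_CUS:
        raise ValueError(f"CU count must be between {MIN_CUS} and {MAX_CUS}, got {cus}")
    if use_dataservice_studio and cus < MIN_CUS_DATASERVICE:
        raise ValueError(
            f"DataService Studio requires at least {MIN_CUS_DATASERVICE} CUs, got {cus}"
        )
    return {
        "cus": cus,
        "vcpus": cus * VCPUS_PER_CU,
        "memory_gib": cus * MEMORY_GIB_PER_CU,
    }
```

For example, `check_specifications(4, use_dataservice_studio=True)` passes and reports 4 vCPUs and 16 GiB of memory, whereas the same call with 2 CUs raises a `ValueError`.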

Step 2: Associate the resource group with a workspace

After the resource group is created, you must associate the resource group with a workspace. Then, you can select the resource group when you create tasks in the workspace.

  • Associate the resource group when you create a workspace

    1. Log on to the DataWorks console. In the left-side navigation pane, click Workspaces.

    2. On the Workspaces page, click Create Workspace. In the Create Workspace panel, select the created resource group from the Default Resource Group drop-down list.

  • Associate the resource group with an existing workspace

    1. Go to the Resource Groups page.

      Log on to the DataWorks console. In the left-side navigation pane, click Resource Groups.

    2. On the Resource Groups page, find the created resource group, click the icon in the Actions column, and then click Change Workspace. In the Change Workspace panel, find the workspace with which you want to associate the resource group and click Associate in the Actions column.

Step 3: Configure network connectivity

To ensure that your task can run as expected, you must complete network connectivity configuration. This way, the resource group can access the desired data source. For more information, see Establish a network connection between a resource group and a data source.

Step 4: Modify configuration items for the resource group

Manage quotas

If you want to use the resource group in data computing, Data Integration, and DataService Studio, you can configure the maximum CU quotas or the minimum CU quotas for the resource group. This ensures that your tasks can run as expected.

Note
  • If the billing method of the resource group is pay-as-you-go, you can configure the maximum CU quotas for the resource group to prevent excessive resource consumption.

  • If the billing method of the resource group is subscription, you can configure the minimum CU quotas for the resource group.

  1. Go to the Resource Groups page.

    Log on to the DataWorks console. In the left-side navigation pane, click Resource Groups.

  2. Change quotas for the resource group.

    • Change quotas for the resource group on the Resource Groups page.

      On the Resource Groups page, find the resource group, click the icon in the Actions column, and then click Manage Quota. In the Manage Quota dialog box, change the maximum CU quotas or the minimum CU quotas for different purposes.

    • Change quotas for the resource group on the details page of the resource group.

      On the Resource Groups page, find the resource group and click the resource group name to go to the details page of the resource group. In the upper-right corner of the details page, click Manage Quota. In the Manage Quota dialog box, change the maximum CU quotas or the minimum CU quotas for the resource group.

Change the maximum number of parallel tasks allowed in data scheduling

If you want to use the resource group in data scheduling, you can specify the maximum number of parallel tasks that are allowed to run on the resource group.

Note

By default, the maximum number of parallel tasks that are allowed is 50. The upper limit for the maximum number of parallel tasks that are allowed is 200.

  1. Go to the Resource Groups page.

    Log on to the DataWorks console. In the left-side navigation pane, click Resource Groups.

  2. Change the maximum number of parallel tasks allowed in data scheduling.

    • Change the maximum number of parallel tasks on the Resource Groups page.

      On the Resource Groups page, find the resource group, click the icon in the Actions column, and then click Specify Threshold for Parallel Threads of Data Scheduling. In the Specify Threshold for Parallel Threads of Data Scheduling dialog box, change the value of the Specify Threshold for Parallel Threads of Data Scheduling parameter.

    • Change the maximum number of parallel tasks on the details page of the resource group.

      On the Resource Groups page, find the resource group and click the resource group name to go to the details page of the resource group. In the upper-right corner of the details page, click Specify Threshold for Parallel Threads of Data Scheduling. In the Specify Threshold for Parallel Threads of Data Scheduling dialog box, change the value of the Specify Threshold for Parallel Threads of Data Scheduling parameter.

    Note

    The value that you specify for the Specify Threshold for Parallel Threads of Data Scheduling parameter applies only to data scheduling tasks.
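The limits in the note at the start of this section (a default of 50 parallel tasks and an upper limit of 200) can be captured in a small pre-check before you enter a value in the console. This is a minimal sketch with a hypothetical function name. The lower bound of 1 is an assumption, because this topic does not state a minimum.

```python
DEFAULT_PARALLEL_TASKS = 50   # default stated in this topic
MAX_PARALLEL_TASKS = 200      # upper limit stated in this topic
MIN_PARALLEL_TASKS = 1        # assumption: the topic does not state a lower bound


def resolve_parallel_threshold(requested=None) -> int:
    """Return the threshold to use, falling back to the default if none is requested."""
    if requested is None:
        return DEFAULT_PARALLEL_TASKS
    if not MIN_PARALLEL_TASKS <= requested <= MAX_PARALLEL_TASKS:
        raise ValueError(
            f"threshold must be between {MIN_PARALLEL_TASKS} and "
            f"{MAX_PARALLEL_TASKS}, got {requested}"
        )
    return requested
```

Rejecting an out-of-range value, rather than silently clamping it, keeps the planned setting and the value shown in the console from drifting apart.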

Next step: Configure the resource group for different tasks

After the resource group is created and configured, you must select the resource group for your Data Integration, data scheduling, and DataService Studio tasks so that the tasks run on the resource group. For more information, see General reference: Change the resource groups used by tasks.

Other operations

View the resource usage of a resource group of the new version

If the resource usage of a subscription resource group is high, the running of related tasks may be blocked. You can perform the following steps to view the tasks that are running on a resource group, the current resource usage of a resource group, the resource usage of a resource group at a specific point in time, and the amount of resources used by each task.

  1. Go to the Resource Groups page.

    Log on to the DataWorks console. In the left-side navigation pane, click Resource Groups.

  2. View the resource usage of a resource group.

    • View the resource usage of a resource group on the Resource Groups page.

      On the Resource Groups page, find the resource group and view the resource usage displayed in the Used CUs column.

    • View the resource usage of a resource group on the details page of the resource group.

      On the Resource Groups page, find the resource group and click the resource group name to go to the details page of the resource group. On the Resource Usage tab of the details page, view the resource usage of the resource group at a specific point in time in the curve chart for resource usage, and the information about tasks that are running or waiting to run in different scenarios.

Scale out or in a resource group

If the resource usage displayed on the details page of your subscription resource group is excessively high, you can scale out the resource group to improve the processing performance of the resource group in Data Integration, task scheduling, and DataService Studio. If the resource usage of your subscription resource group is low, you can scale in the resource group to reduce costs.

  1. Go to the Resource Groups page.

    Log on to the DataWorks console. In the left-side navigation pane, click Resource Groups.

  2. On the Resource Groups page, find the resource group, click the icon in the Actions column, and then click Scale Out or Scale In.

    Note

    Scaling in a resource group may slow the running of tasks that use the resource group. Before you perform this operation, make sure that the scale-in operation does not affect your business.

  3. On the page that appears, change the value of the Resource Group Specifications parameter, read the terms of service, select the check boxes, and then click Buy Now.
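The scale-out and scale-in guidance above can be expressed as a simple rule over the used and purchased CUs of a subscription resource group. The following is a minimal sketch only. The 80% and 30% thresholds and the function name are hypothetical, because this topic does not define what counts as high or low usage. The two-CU floor comes from the minimum specifications stated earlier in this topic.

```python
HIGH_USAGE = 0.80  # hypothetical threshold for "excessively high" usage
LOW_USAGE = 0.30   # hypothetical threshold for "low" usage
MIN_CUS = 2        # a resource group must contain at least two CUs


def scaling_advice(used_cus: float, total_cus: int) -> str:
    """Suggest 'scale out', 'scale in', or 'keep' from current CU usage."""
    if total_cus < MIN_CUS:
        raise ValueError(f"a resource group has at least {MIN_CUS} CUs, got {total_cus}")
    if used_cus < 0:
        raise ValueError("used CUs cannot be negative")
    usage = used_cus / total_cus
    if usage >= HIGH_USAGE:
        return "scale out"
    # Only suggest scaling in when the group is above the two-CU minimum.
    if usage <= LOW_USAGE and total_cus > MIN_CUS:
        return "scale in"
    return "keep"
```

A single reading can be misleading, so in practice you would apply a rule like this to the usage curve on the Resource Usage tab over a representative period, not to one point in time.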

References

  • For more information about resource groups, see Overview.

  • You can use the intelligent monitoring feature provided in Operation Center to monitor the resource usage of a resource group and the number of instances that are waiting for resources in a resource group. For more information about how to use the intelligent monitoring feature, see Create a custom alert rule.

  • When you view the status of a resource group on the Resource Groups page, take note of the following items:

    • If the resource group has expired, you can click the icon in the Actions column and then click Renew to renew the resource group.

    • If the resource usage of the resource group reaches the warning threshold, you can click the icon in the Actions column and then click Scale Out to scale out the resource group. For more information, see the Scale out or in a resource group section in this topic.