DataWorks: Create and use a serverless resource group

Last Updated: Oct 31, 2024

To simplify resource management and improve user experience, DataWorks provides serverless resource groups. A serverless resource group provides the core features of an exclusive resource group for scheduling, an exclusive resource group for Data Integration, and an exclusive resource group for DataService Studio. With a single serverless resource group, you can synchronize data, schedule and run tasks, and call and manage APIs. This topic describes how to create and use a serverless resource group.

Prerequisites

  • You are familiar with serverless resource groups, including their specifications, performance, and billing. You have determined the specifications and subscription duration that your business scenario requires. For more information, see Overview of DataWorks resource groups and Billing of serverless resource groups.

  • Serverless resource groups are supported in the following regions: China (Beijing), China (Shanghai), China (Shenzhen), China (Hangzhou), China (Hong Kong), China (Zhangjiakou), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Japan (Tokyo), UK (London), US (Silicon Valley), Germany (Frankfurt), and US (Virginia).

  • You are granted the required permissions.

  • If you want to use a serverless resource group in a virtual network operator (VNO) environment, you must first check whether your service provider allows you to purchase serverless resource groups.

Comparison between serverless resource groups and old-version resource groups

In the following comparison, old-version resource groups are exclusive resource groups and shared resource groups.

  • Classification

    • Old-version resource groups: Resource groups are classified by purpose into resource groups for Data Integration, resource groups for scheduling, and resource groups for DataService Studio.

    • Serverless resource groups: Resource groups are general-purpose and are not classified.

  • Support for features

    • Old-version resource groups: Some DataWorks capabilities are not supported.

    • Serverless resource groups: All DataWorks capabilities are supported.

  • Support for mixed use

    • Old-version resource groups: Each type of resource group serves only one purpose.

    • Serverless resource groups: A single resource group can be used for data synchronization, scheduling, and DataService Studio at the same time.

  • Sales mode

    • Old-version resource groups: Resource groups are charged based on machine specifications and the number of machines. A resource group must contain at least one machine, and the minimum machine specifications are 4 vCPUs and 8 GiB of memory. The minimum scale-out step is one machine with 4 vCPUs and 8 GiB of memory.

    • Serverless resource groups: Resource groups are sold by compute unit (CU). A resource group must contain at least two CUs. The minimum scale-out step is one CU.

  • Billing method

    • Old-version resource groups: Exclusive resource groups support only the subscription billing method. Shared resource groups support only the pay-as-you-go billing method.

    • Serverless resource groups: Both the subscription and pay-as-you-go billing methods are supported.

  • Resource waste

    • Old-version resource groups: DataWorks provides only a limited set of specifications, so machines of each specification can leave unused resource fragments. These fragments are wasted.

    • Serverless resource groups: You can purchase the number of CUs that your business requires. This prevents resource waste.

  • Scalability

    • Old-version resource groups: You can upgrade or downgrade the specifications of a resource group, and you can increase or reduce the number of machines in it.

    • Serverless resource groups: You can directly change the number of CUs in a resource group.

  • Impact of scale-out or scale-in

    • Old-version resource groups: Running tasks are affected.

    • Serverless resource groups: Running tasks are not affected.

  • Network security

    • Old-version resource groups: DataWorks manages inbound and outbound Internet traffic for resource groups, and multiple users share the Internet bandwidth. This causes resource contention.

    • Serverless resource groups: Users access the Internet through their own network resources, so network behavior remains under their control.

  • Development trend

    • Old-version resource groups: Old-version resource groups will be discontinued in the future.

    • Serverless resource groups: Serverless resource groups will become the only type of resource group that DataWorks supports.

  • Support for custom images

    • Old-version resource groups: Custom images are not supported.

    • Serverless resource groups: Custom images are supported. You can create an image that contains all components required to run your tasks, so that tasks with specific environment requirements can run on the resource group.
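To make the Sales mode and Resource waste rows concrete, the following Python sketch compares how many vCPUs you end up paying for under each sizing model. It is a simplified illustration under stated assumptions: it counts vCPUs only and ignores memory, it assumes every old-version machine has the minimum 4 vCPU / 8 GiB specification, and the function names are hypothetical, not DataWorks APIs.

```python
import math

# Simplified model of the Sales mode row, counting vCPUs only:
# - old-version groups scale in whole machines; every machine is ASSUMED to have
#   the minimum specification of 4 vCPUs (real groups may use larger machines)
# - serverless groups scale in CUs of 1 vCPU each, with a minimum of 2 CUs
MACHINE_VCPUS = 4
MIN_MACHINES = 1
CU_VCPUS = 1
MIN_CUS = 2


def provisioned_vcpus_old(demand_vcpus: int) -> int:
    """vCPUs purchased when capacity comes in whole 4-vCPU machines."""
    machines = max(MIN_MACHINES, math.ceil(demand_vcpus / MACHINE_VCPUS))
    return machines * MACHINE_VCPUS


def provisioned_vcpus_serverless(demand_vcpus: int) -> int:
    """vCPUs purchased when capacity comes in 1-vCPU CUs (at least 2 CUs)."""
    cus = max(MIN_CUS, math.ceil(demand_vcpus / CU_VCPUS))
    return cus * CU_VCPUS


if __name__ == "__main__":
    for demand in (2, 5, 9):
        old = provisioned_vcpus_old(demand)
        new = provisioned_vcpus_serverless(demand)
        print(f"{demand} vCPUs needed: machines give {old} ({old - demand} idle), "
              f"CUs give {new} ({new - demand} idle)")
```

Under these assumptions, a workload that needs 5 vCPUs occupies two 4-vCPU machines (8 vCPUs, 3 idle) in an old-version group but exactly 5 CUs in a serverless group. Each CU also carries 4 GiB of memory, so treat this only as an illustration of step size, not as a sizing tool.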

Precautions

  • To ensure that your resource group can access a data source, such as a database, a data service, or data in a specific network environment, you must first establish a network connection between the resource group and the data source based on how the data source is deployed. For more information, see Network connectivity solutions.

    Important

    You can associate a serverless resource group with a virtual private cloud (VPC) so that the resource group can access a data source or an address in a complex network environment over an internal network. By default, serverless resource groups cannot access the Internet. To use a serverless resource group to access a data source or a network environment over the Internet, you must configure an Internet NAT gateway for the VPC with which the resource group is associated and associate the Internet NAT gateway with an elastic IP address (EIP). For more information, see Scenario 5: Establish a network connection between a resource group and a data source that is deployed on the Internet.

  • If you have associated a serverless resource group with a VPC and a vSwitch, do not modify the configurations of the VPC and vSwitch. Otherwise, DataWorks tasks that run on the serverless resource group may fail.

Billing of serverless resource groups

For information about billing of serverless resource groups, see Billing of serverless resource groups.

Step 1: Create a serverless resource group

  1. Go to the Resource Groups page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, click Resource Group.

  2. On the Exclusive Resource Groups tab, click Create Resource Group to go to the purchase page of serverless resource groups. Configure the following parameters.

    • Region and Zone: The region in which you want to create the resource group. The region must be the same as the region in which the workspace resides.

    • Billing Method:

      • Subscription: You must pay for the resource group before you use it.

      • Pay-as-you-go: You can use the resource group before you pay for it.

    • Resource Group Specifications: The number of CUs in the resource group. This parameter is required only if you set the Billing Method parameter to Subscription. Valid values: 2 to 99999999. Unit: CU.

      Note
      • 1 CU = 1 vCPU core + 4 GiB of memory. For purchase suggestions and the minimum specifications required to run different types of tasks, see Performance metrics and purchase suggestions.

      • The value 99999999 means that the number of CUs you can purchase is not limited. However, the number of CUs that you can actually purchase depends on inventory. If inventory is insufficient, follow the prompt that is displayed when you purchase the serverless resource group.

    • Resource Group Name: The name of the resource group.

    • Resource Group Description: The description of the resource group.

    • VPC and vSwitch: The VPC and vSwitch with which you want to associate the resource group. Select a VPC based on the network that the resource group needs to access.

      • If the resource group needs to access a data source that belongs to the same Alibaba Cloud account and resides in the same region as the resource group, select the VPC in which the data source resides and a vSwitch in that VPC.

      • If the resource group needs to access a data source that resides in a complex network environment, use a connection tool such as VPN Gateway or Express Connect to connect the VPC with which the resource group is associated to the VPC in which the data source resides. For more information, see Network connectivity solutions.

      Note
      • If no VPC or vSwitch is available, create one in the VPC console. For more information about VPCs, see What is a VPC?

      • After the resource group is created, you can associate it with one or more additional VPCs.

      • If you set the Billing Method parameter to Subscription, the VPC that you select here is used for DataService Studio, data computing, and data synchronization. For DataService Studio, you cannot later associate the resource group with another VPC or change its associated VPC. Plan your network in advance.

      • After you associate a serverless resource group with a VPC and a vSwitch, do not modify the configurations of the VPC and vSwitch. Otherwise, DataWorks tasks that run on the serverless resource group may fail.

    • Billing Cycle: The subscription duration of the resource group. This parameter is required only if you set the Billing Method parameter to Subscription.

      Important

      To prevent service suspension or resource release from affecting your business when the resource group expires, we recommend that you select Auto-renewal. If you select Auto-renewal, fees are automatically deducted from your Alibaba Cloud account at the actual prices before the resource group expires. The auto-renewal cycle is one month.

    • Service-linked Role: The first time you create a serverless resource group, you must create a service-linked role named AliyunServiceRoleForDataWorks. When you create serverless resource groups later, the system automatically assigns this role.

      Note

      The service-linked role AliyunServiceRoleForDataWorks is used to access VPC, elastic network interface (ENI), and security group resources. For more information, see DataWorks service-linked role.
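Before you fill in the purchase page, it can help to check your planned values against the parameter rules above. The following Python sketch encodes those rules as a local pre-check. The `PurchaseRequest` class and the `validate` and `capacity` functions are hypothetical helpers for illustration; they do not call DataWorks or any Alibaba Cloud API, and the console remains the authority on which values it accepts.

```python
from dataclasses import dataclass
from typing import Optional

# Rules taken from the parameter descriptions in this topic.
MIN_CUS = 2
MAX_CUS = 99_999_999  # means "not limited"; actual purchases also depend on inventory
VCPUS_PER_CU = 1
GIB_PER_CU = 4


@dataclass
class PurchaseRequest:
    """Hypothetical container for the values entered on the purchase page."""
    billing_method: str                       # "Subscription" or "Pay-as-you-go"
    region: str                               # region of the resource group
    workspace_region: str                     # region of the target workspace
    specifications_cus: Optional[int] = None  # Resource Group Specifications
    billing_cycle: Optional[str] = None       # Billing Cycle, e.g. "1 month" (format is illustrative)


def validate(req: PurchaseRequest) -> list[str]:
    """Return a list of problems; an empty list means no rule above is violated."""
    problems: list[str] = []
    if req.region != req.workspace_region:
        problems.append("The resource group must be in the same region as the workspace.")
    if req.billing_method == "Subscription":
        if req.specifications_cus is None:
            problems.append("Resource Group Specifications is required for Subscription.")
        elif not MIN_CUS <= req.specifications_cus <= MAX_CUS:
            problems.append(f"Resource Group Specifications must be {MIN_CUS} to {MAX_CUS} CUs.")
        if req.billing_cycle is None:
            problems.append("Billing Cycle is required for Subscription.")
    elif req.billing_method != "Pay-as-you-go":
        problems.append(f"Unknown billing method: {req.billing_method}")
    return problems


def capacity(cus: int) -> tuple[int, int]:
    """Convert CUs to (vCPU cores, GiB of memory) using 1 CU = 1 vCPU + 4 GiB."""
    return cus * VCPUS_PER_CU, cus * GIB_PER_CU


if __name__ == "__main__":
    req = PurchaseRequest("Subscription", "China (Hangzhou)", "China (Hangzhou)",
                          specifications_cus=8, billing_cycle="1 month")
    print(validate(req))
    print(capacity(8))
```

For example, an 8-CU subscription resource group corresponds to 8 vCPU cores and 32 GiB of memory under the 1 CU = 1 vCPU + 4 GiB rule.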

Step 2: Associate the resource group with a workspace

After the resource group is created, you must associate the resource group with a workspace. Then, you can select the resource group when you create tasks in the workspace.

  • Associate the resource group when you create a workspace

    1. Log on to the DataWorks console. In the top navigation bar, select the desired region. Then, click Workspace in the left-side navigation pane.

    2. On the Workspaces page, click Create Workspace. On the Create Workspace page, select the created resource group from the Default Resource Group drop-down list.

  • Associate the resource group with an existing workspace

    1. Go to the Resource Groups page.

      Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, click Resource Group.

    2. On the Resource Groups page, find the created resource group and click Associate Workspace in the Actions column. In the Associate Workspace panel, find the workspace with which you want to associate the resource group and click Associate in the Actions column.

Step 3: Configure network connectivity

To ensure that your tasks run as expected, configure network connectivity so that the resource group can access the required data sources. For more information, see Network connectivity solutions.

Important

You can associate a serverless resource group with a virtual private cloud (VPC) so that the resource group can access a data source or an address in a complex network environment over an internal network. By default, serverless resource groups cannot access the Internet. To use a serverless resource group to access a data source or a network environment over the Internet, you must configure an Internet NAT gateway for the VPC with which the resource group is associated and associate the Internet NAT gateway with an elastic IP address (EIP). For more information, see Scenario 5: Establish a network connection between a resource group and a data source that is deployed on the Internet.

You can configure an internal DNS resolution record for the VPC with which the serverless resource group is associated. This way, the serverless resource group can access the related data source by using a custom internal domain name. For example, if you associate the serverless resource group with the VPC in which a CDH cluster is deployed and configure an internal DNS resolution record for the VPC, the serverless resource group can access the CDH cluster by using a custom domain name. For more information, see Preparations: Obtain configuration information about a CDH or CDP cluster and configure network connectivity.
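The connectivity choices described in this topic depend on where the data source is reachable. The following Python sketch summarizes them as a lookup so you can see which steps apply to your case. The `SourceLocation` categories and the `connectivity_plan` function are hypothetical simplifications; the sketch configures nothing, and Network connectivity solutions remains the reference for each scenario.

```python
from enum import Enum


class SourceLocation(Enum):
    """Hypothetical categories for where a data source is reachable."""
    SAME_ACCOUNT_AND_REGION_VPC = "VPC in the same Alibaba Cloud account and region"
    COMPLEX_NETWORK = "complex network environment"
    INTERNET = "Internet"


def connectivity_plan(location: SourceLocation) -> list[str]:
    """Return the connectivity steps this topic describes for each location."""
    if location is SourceLocation.SAME_ACCOUNT_AND_REGION_VPC:
        return ["Associate the resource group with the data source's VPC and a vSwitch in that VPC."]
    if location is SourceLocation.COMPLEX_NETWORK:
        return [
            "Associate the resource group with a VPC.",
            "Connect that VPC to the data source's VPC by using VPN Gateway or Express Connect.",
        ]
    # Serverless resource groups cannot access the Internet by default.
    return [
        "Associate the resource group with a VPC.",
        "Configure an Internet NAT gateway for that VPC.",
        "Associate the Internet NAT gateway with an elastic IP address (EIP).",
    ]


if __name__ == "__main__":
    for loc in SourceLocation:
        print(f"{loc.value}:")
        for step in connectivity_plan(loc):
            print(f"  - {step}")
```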

Step 4: Modify configuration items for the resource group

Manage quotas

If you use the resource group for data computing, data synchronization, and DataService Studio, you can configure maximum or minimum CU quotas for each purpose to ensure that your tasks run as expected.

Note
  • If the billing method of the resource group is pay-as-you-go, you can configure the maximum CU quotas for the resource group to prevent excessive resource consumption.

  • If the billing method of the resource group is subscription, you can configure the minimum CU quotas for the resource group.

  1. Go to the Resource Groups page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, click Resource Group.

  2. Change quotas for the resource group.

    • Change quotas for the resource group on the Resource Groups page.

      On the Resource Groups page, find the resource group, click the icon in the Actions column, and then click Manage Quota. In the Manage Quota dialog box, change the maximum CU quotas or the minimum CU quotas for different purposes.

    • Change quotas for the resource group on the details page of the resource group.

      On the Resource Groups page, find the resource group and click the resource group name to go to the details page of the resource group. In the upper-right corner of the details page, click Manage Quota. In the Manage Quota dialog box, change the maximum CU quotas or the minimum CU quotas for the resource group.
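The following Python sketch illustrates one way to reason about the two kinds of quota described above. It rests on assumptions that this topic does not state: that minimum quotas on a subscription resource group reserve CUs and therefore must fit within the purchased CUs, and that a maximum quota blocks new work for a purpose once that purpose's usage would exceed it. The purpose keys and functions are hypothetical; DataWorks enforces quotas on its side.

```python
from typing import Mapping, Optional

# Hypothetical keys for the purposes named in this topic.
PURPOSES = ("data_computing", "data_synchronization", "dataservice_studio")


def min_quotas_fit(total_cus: int, min_quotas: Mapping[str, int]) -> bool:
    """Subscription (assumed rule): reserved minimums must not exceed the purchased CUs."""
    unknown = set(min_quotas) - set(PURPOSES)
    if unknown:
        raise ValueError(f"unknown purposes: {sorted(unknown)}")
    return sum(min_quotas.values()) <= total_cus


def within_max_quota(purpose: str, requested_cus: int,
                     used_cus: Mapping[str, int],
                     max_quotas: Mapping[str, Optional[int]]) -> bool:
    """Pay-as-you-go (assumed rule): new work fits only if the purpose stays within its cap."""
    cap = max_quotas.get(purpose)
    if cap is None:
        return True  # no maximum quota configured for this purpose
    return used_cus.get(purpose, 0) + requested_cus <= cap


if __name__ == "__main__":
    print(min_quotas_fit(10, {"data_computing": 4, "data_synchronization": 4}))
    print(within_max_quota("data_synchronization", 2,
                           {"data_synchronization": 7}, {"data_synchronization": 8}))
```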

Change the maximum number of parallel tasks allowed in data scheduling

If you want to use the resource group in data scheduling, you can specify the maximum number of parallel tasks that are allowed to run on the resource group.

Note

The default maximum number of parallel tasks is 50. You can set it to a value of up to 200.

  1. Go to the Resource Groups page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, click Resource Group.

  2. Change the maximum number of parallel tasks allowed in data scheduling.

    • Change the maximum number of parallel tasks on the Resource Groups page.

      On the Resource Groups page, find the resource group, click the icon in the Actions column, and then click Specify Threshold for Parallel Threads of Data Scheduling. In the Specify Threshold for Parallel Threads of Data Scheduling dialog box, change the value of the Specify Threshold for Parallel Threads of Data Scheduling parameter.

    • Change the maximum number of parallel tasks on the details page of the resource group.

      On the Resource Groups page, find the resource group and click the resource group name to go to the details page of the resource group. In the upper-right corner of the details page, click Specify Threshold for Parallel Threads of Data Scheduling. In the Specify Threshold for Parallel Threads of Data Scheduling dialog box, change the value of the Specify Threshold for Parallel Threads of Data Scheduling parameter.

    Note

    The value of the Specify Threshold for Parallel Threads of Data Scheduling parameter is the upper limit for the number of tasks that can be scheduled in parallel on the resource group. It limits only parallel scheduling and does not otherwise restrict how tasks run.
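The following Python sketch models what the threshold controls: how many additional tasks a scheduler could dispatch to the resource group at a given moment. The default (50) and upper limit (200) come from this topic; the lower bound of 1 and the `validate_threshold` and `tasks_to_dispatch` functions are assumptions for illustration, not DataWorks internals. Consistent with the note above, the sketch only caps dispatch and says nothing about how a task behaves once it runs.

```python
DEFAULT_THRESHOLD = 50  # default value stated in this topic
MAX_THRESHOLD = 200     # upper limit stated in this topic
MIN_THRESHOLD = 1       # ASSUMPTION: the topic does not state a lower bound


def validate_threshold(value: int) -> int:
    """Reject values outside the assumed valid range."""
    if not MIN_THRESHOLD <= value <= MAX_THRESHOLD:
        raise ValueError(f"threshold must be {MIN_THRESHOLD} to {MAX_THRESHOLD}, got {value}")
    return value


def tasks_to_dispatch(pending: int, in_flight: int, threshold: int = DEFAULT_THRESHOLD) -> int:
    """Number of pending tasks that can be dispatched now without exceeding the threshold."""
    threshold = validate_threshold(threshold)
    free_slots = max(0, threshold - in_flight)
    return min(pending, free_slots)


if __name__ == "__main__":
    # 120 tasks waiting, 30 already dispatched, default threshold of 50.
    print(tasks_to_dispatch(pending=120, in_flight=30))
```

With the default threshold, 120 waiting tasks and 30 already dispatched leave room for 20 more; the rest wait until slots free up.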

Next step: Configure the resource group for different tasks

After the resource group is created and configured, specify it as the resource group for your data synchronization, data scheduling, and DataService Studio tasks so that the tasks run on it. For more information, see General reference: Change the resource groups used by tasks.

Other operations

View the resource usage of a serverless resource group

If the resource usage of a subscription resource group is high, related tasks may be blocked. Perform the following steps to view the tasks that are running on a resource group, its current resource usage, its resource usage at a specific point in time, and the resources used by each task.

  1. Go to the Resource Groups page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, click Resource Group.

  2. View the resource usage of a resource group.

    • View the resource usage of a resource group on the Resource Groups page.

      On the Resource Groups page, find the resource group and view the resource usage displayed in the Used CUs column.

    • View the resource usage of a resource group on the details page of the resource group.

      On the Resource Groups page, find the resource group and click its name to go to its details page. On the Resource Usage tab, use the resource usage chart to view resource usage at a specific point in time, and view information about tasks that are running or waiting to run in different scenarios.
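If you record Used CUs values over time (for example, by reading them from the resource usage chart), a short calculation shows how close the resource group runs to its purchased capacity. The following Python sketch is a hypothetical helper for that. The 80% "high" mark is an arbitrary example, not a DataWorks warning threshold, and the sample values are illustrative.

```python
from typing import Sequence


def utilization_pct(used_cus: Sequence[float], total_cus: int) -> list[float]:
    """Each Used CUs sample as a percentage of the purchased CUs."""
    if total_cus <= 0:
        raise ValueError("total_cus must be positive")
    return [100.0 * used / total_cus for used in used_cus]


def summarize(used_cus: Sequence[float], total_cus: int, high_pct: float = 80.0) -> dict:
    """Peak and average utilization, plus how many samples reached high_pct."""
    if not used_cus:
        raise ValueError("at least one sample is required")
    pct = utilization_pct(used_cus, total_cus)
    return {
        "peak_pct": max(pct),
        "avg_pct": sum(pct) / len(pct),
        "samples_at_or_above_high": sum(1 for p in pct if p >= high_pct),
    }


if __name__ == "__main__":
    # Illustrative samples for a 10-CU subscription resource group.
    print(summarize([2, 4, 8, 10], total_cus=10))
```

A peak near 100% with several samples above your chosen mark suggests the resource group may be blocking tasks and is a candidate for scale-out.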

Scale out or in a resource group

If the resource usage displayed on the details page of your subscription resource group is high, you can scale out the resource group to improve its processing performance for data synchronization, task scheduling, and DataService Studio. If the resource usage is low, you can scale in the resource group to reduce costs.

  1. Go to the Resource Groups page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, click Resource Group.

  2. On the Resource Groups page, find the resource group, click the icon in the Actions column, and then click Scale Out or Scale In.

    Note

    Scaling in a resource group may slow down tasks that use the resource group. Before you scale in, make sure that the operation does not affect your business.

  3. On the page that appears, change the value of the Resource Group Specifications parameter, read the terms of service, select the check boxes, and then click Buy Now.
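To choose a new value for Resource Group Specifications, one simple approach is to size the group so that the observed peak usage lands at a chosen target utilization. The following Python sketch does this with integer arithmetic to avoid floating-point rounding. The 70% default target and the `target_cus` and `scale_action` functions are assumptions for illustration; the only rule taken from this topic is the minimum of two CUs. Review the scale-in caution above before you reduce CUs.

```python
MIN_CUS = 2  # a serverless resource group must contain at least two CUs


def target_cus(peak_used_cus: int, target_pct: int = 70) -> int:
    """Smallest CU count that keeps the observed peak at or below target_pct percent."""
    if not 0 < target_pct <= 100:
        raise ValueError("target_pct must be in the range 1 to 100")
    if peak_used_cus < 0:
        raise ValueError("peak_used_cus must not be negative")
    # Ceiling division on integers: ceil(peak * 100 / target_pct).
    needed = -(-peak_used_cus * 100 // target_pct)
    return max(MIN_CUS, needed)


def scale_action(current_cus: int, peak_used_cus: int, target_pct: int = 70) -> tuple[str, int]:
    """Suggest Scale Out, Scale In, or no change, with the suggested CU count."""
    target = target_cus(peak_used_cus, target_pct)
    if target > current_cus:
        return "Scale Out", target
    if target < current_cus:
        return "Scale In", target
    return "No change", current_cus


if __name__ == "__main__":
    # An 8-CU group that peaked at 7 used CUs.
    print(scale_action(current_cus=8, peak_used_cus=7))
```

Because the minimum scale step for a serverless resource group is one CU, the suggested value can be entered directly as the new Resource Group Specifications value.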

References

  • For more information about resource groups, see Overview of DataWorks resource groups.

  • You can use the intelligent monitoring feature provided in Operation Center to monitor the resource usage of a resource group and the number of instances that are waiting for resources in a resource group. For more information about how to use the intelligent monitoring feature, see Create a custom alert rule.

  • When you view the status of a resource group on the Resource Groups page, take note of the following items:

    • If the resource group has expired, click the icon in the Actions column and then click Renew to renew the resource group.

    • If the resource usage of the resource group reaches the warning threshold, click the icon in the Actions column and then click Scale Out to scale out the resource group. For more information, see the Scale out or in a resource group section in this topic.

  • If a specific development environment, such as an environment with third-party library dependencies, is required for running your tasks on a serverless resource group, you can create a custom image that contains all required development packages and dependencies. Then, you can use the custom image as the runtime environment when you run tasks on the serverless resource group.