DataWorks: Change from an old-version resource group to a serverless resource group

Last Updated: Nov 15, 2024

Before you move your tasks from an old-version resource group to a serverless resource group, evaluate how many resources the tasks consume. Then switch them to a serverless resource group that is large enough to run all of them, so that the change goes smoothly. This topic gives examples of how to estimate the number of compute units (CUs) that different tasks require. It also describes the impacts of the change and explains how to move tasks from old-version resource groups to a serverless resource group.

Background information

DataWorks provides exclusive resource groups for Data Integration, exclusive resource groups for scheduling, and exclusive resource groups for DataService Studio. You must purchase and configure each type separately based on your business requirements. To simplify the management of DataWorks resources and improve user experience, DataWorks introduces serverless resource groups. A single serverless resource group can serve data synchronization, task scheduling, and DataService Studio at the same time. This reduces the interactions between separate resource groups and keeps operations consistent.

Billing

Procedure

Step 1: Query tasks for which you want to change resource groups

Data synchronization

  • Data Integration page

    1. Go to the Data Integration page.

      Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and Governance > Data Integration. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Integration.

    2. In the left-side navigation pane, click Synchronization Task. In the Nodes section of the Data Integration page, select the resource group for Data Integration that you want to change from the Resource Group drop-down list.

  • DataStudio page

    1. Go to the DataStudio page.

      Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and Governance > Data Development. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.

    2. In the left-side navigation pane, click Scheduled Workflow. In the Scheduled Workflow pane, find the desired workflow, right-click the workflow name, and then select Batch Operation.

    3. On the Node tab, select Offline synchronization and Real-time Synchronization from the Node Type drop-down list, and select the resource group for Data Integration that you want to change from the Resource Group for Data Integration drop-down list.

Task scheduling

  1. Go to the Operation Center page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and Governance > Operation Center. On the page that appears, select the desired workspace from the drop-down list and click Go to Operation Center.

  2. In the left-side navigation pane, choose Auto Triggered Node O&M > Auto Triggered Tasks. On the page that appears, select the resource group for scheduling that you want to change from the Scheduling Resource Group drop-down list.

DataService Studio

  1. Go to the DataService Studio page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and Governance > DataService Studio. On the page that appears, select the desired workspace from the drop-down list and click Go to DataService Studio.

  2. In the left-side navigation pane, click Service Development. In the Service Development pane, click the batch operation icon to open the Batch Operation tab. On the Batch Operation tab, select the resource group for DataService Studio that you want to change from the Resource Group drop-down list.

Step 2: Evaluate the required specifications of the serverless resource group to purchase

Before you change resource groups for data synchronization tasks, scheduling tasks, and DataService Studio tasks, you must evaluate the resource consumption of the tasks to determine the specifications of the serverless resource group to purchase. This ensures that the resource group can handle the workloads of related business.

The following sections provide detailed evaluation suggestions.

Data synchronization

Batch synchronization tasks

  • Batch synchronization tasks configured by using the codeless user interface (UI)

    Parallelism configured for the batch synchronization task | Specifications of a serverless resource group
    Less than 4 | 0.5 CUs
    4 or greater | (Parallelism - 4) × 0.07 + 0.5 CUs

  • Batch synchronization tasks configured by using the code editor with JVM parameters

    -Xmx value | Specifications of a serverless resource group
    1.8 GB or less | 0.5 CUs
    Any other value, X (in GB) | X / 0.9 / 4 CUs
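The two batch synchronization rules above can be written as two small functions. This is a minimal sketch. The function names are hypothetical, and the results are raw CU values that are not rounded to any purchasable increment, because the tables do not state a rounding rule.

```python
def batch_sync_cus_codeless(parallelism: int) -> float:
    """CUs for a batch synchronization task configured in the codeless UI."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    if parallelism < 4:
        # Parallelism below 4 maps to the flat minimum of 0.5 CUs.
        return 0.5
    # Each unit of parallelism above 4 adds 0.07 CUs on top of 0.5 CUs.
    return (parallelism - 4) * 0.07 + 0.5


def batch_sync_cus_xmx(xmx_gb: float) -> float:
    """CUs for a code-editor batch synchronization task, from its -Xmx value in GB."""
    if xmx_gb <= 0:
        raise ValueError("xmx_gb must be positive")
    if xmx_gb <= 1.8:
        return 0.5
    # X / 0.9 / 4 as stated in the table; at X = 1.8 this also equals 0.5,
    # so the two rows meet at the boundary.
    return xmx_gb / 0.9 / 4


if __name__ == "__main__":
    print(batch_sync_cus_codeless(10))  # roughly 0.92
    print(batch_sync_cus_xmx(3.6))      # roughly 1.0
```

For example, a codeless task with a parallelism of 10 needs about (10 - 4) × 0.07 + 0.5 = 0.92 CUs.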

Real-time synchronization tasks

Synchronization task type | Specifications of an old-version resource group | Specifications of a serverless resource group
Real-time synchronization from MySQL, one source database | 4 vCPUs 8 GiB | 2.5 CUs
Real-time synchronization from MySQL, two to five source databases | 8 vCPUs 16 GiB | 4 CUs
Real-time synchronization from MySQL, six or more source databases | 12 vCPUs 24 GiB | 7 CUs
Real-time synchronization from PolarDB-X 1.0 | 12 vCPUs 24 GiB | 7 CUs
Real-time synchronization from Kafka | 4 vCPUs 8 GiB | 2.5 CUs
Real-time synchronization of data in a single table of another source type | 4 vCPUs 8 GiB | 2.5 CUs
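The real-time synchronization table can be treated as a lookup. The following minimal sketch uses hypothetical source-type labels ("mysql", "polardb-x-1.0", "kafka", "single-table-other") that exist only for this example. It also assumes that all real-time tasks run at the same time, so their per-task CUs are added together to size one serverless resource group.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical source-type labels; CU values copied from the table above.
FIXED_CUS = {
    "polardb-x-1.0": 7.0,
    "kafka": 2.5,
    "single-table-other": 2.5,
}


def realtime_sync_cus(source_type: str, source_db_count: Optional[int] = None) -> float:
    """Serverless CUs for one real-time synchronization task."""
    if source_type == "mysql":
        if source_db_count is None or source_db_count < 1:
            raise ValueError("MySQL tasks need source_db_count >= 1")
        if source_db_count == 1:
            return 2.5
        if source_db_count <= 5:
            return 4.0
        return 7.0
    if source_type in FIXED_CUS:
        return FIXED_CUS[source_type]
    raise ValueError(f"unknown source type: {source_type}")


def total_realtime_cus(tasks: Iterable[Tuple[str, Optional[int]]]) -> float:
    """Sum per-task CUs, assuming (not confirmed) that all tasks run concurrently."""
    return sum(realtime_sync_cus(source_type, count) for source_type, count in tasks)


if __name__ == "__main__":
    # One MySQL task that reads three source databases, plus one Kafka task.
    print(total_realtime_cus([("mysql", 3), ("kafka", None)]))  # 6.5
```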

Task scheduling

  • If your scheduling tasks include data computing tasks such as PyODPS2 and EMR Hive tasks, the data computing tasks use a serverless resource group for computing. You must plan the specifications of the serverless resource group that you require based on your business requirements.

    Note

    For information about the default number of CUs that are allowed for data computing tasks, see Appendix: Data computing tasks.

  • If your scheduling tasks do not include data computing tasks, the maximum number of parallel instances supported by a serverless resource group is 200, which is greater than the maximum number of parallel instances supported by an old-version resource group with the highest specifications. In this case, the default specifications of a serverless resource group can meet your business requirements and you do not need to adjust the specifications.

    The maximum number of parallel instances that are supported by old-version resource groups with different specifications varies. The following table provides details.

    Old-version resource group specifications | Maximum parallel instances (old-version resource group) | Maximum parallel instances (serverless resource group)
    4 vCPUs 8 GiB | 16 | 200
    8 vCPUs 16 GiB | 32 | 200
    12 vCPUs 24 GiB | 48 | 200
    16 vCPUs 32 GiB | 64 | 200
    24 vCPUs 48 GiB | 96 | 200
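When scheduling tasks include no data computing tasks, the comparison above reduces to a limit check. The following minimal sketch assumes you might move tasks from several old-version resource groups into one serverless resource group, and that their parallel-instance limits add up. The function name is hypothetical.

```python
# Maximum parallel instances per old-version resource group specification,
# copied from the table above.
OLD_MAX_PARALLEL = {
    "4 vCPUs 8 GiB": 16,
    "8 vCPUs 16 GiB": 32,
    "12 vCPUs 24 GiB": 48,
    "16 vCPUs 32 GiB": 64,
    "24 vCPUs 48 GiB": 96,
}
SERVERLESS_MAX_PARALLEL = 200


def fits_default_serverless(old_specs: list) -> bool:
    """Return True if the combined parallel-instance limits of the given
    old-version resource groups do not exceed the serverless limit of 200.

    Applies only to scheduling tasks that include no data computing tasks.
    """
    total = sum(OLD_MAX_PARALLEL[spec] for spec in old_specs)
    return total <= SERVERLESS_MAX_PARALLEL


if __name__ == "__main__":
    print(fits_default_serverless(["24 vCPUs 48 GiB"]))                     # True
    print(fits_default_serverless(["16 vCPUs 32 GiB", "24 vCPUs 48 GiB"]))  # True (160)
    print(fits_default_serverless(["24 vCPUs 48 GiB"] * 3))                 # False (288)
```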

DataService Studio

Maximum QPS | Specifications of an old-version resource group | Specifications of a serverless resource group | SLA
500 | api.s2.small | 4 CUs | 99.95%
1000 | api.s2.medium | 8 CUs | 99.95%
2000 | api.s2.large | 16 CUs | 99.95%
500 | api.s1.small | 4 CUs | 99.95%
1000 | api.s1.medium | 8 CUs | 99.95%
2000 | api.s1.large | 16 CUs | 99.95%

Note

The specifications api.s1.small, api.s1.medium, and api.s1.large are no longer available for purchase. If your tasks run on an old-version resource group with one of these specifications, change the resource group as soon as possible.
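The DataService Studio table can be used in two ways: map an existing old-version specification to its serverless equivalent, or pick CUs from a target peak QPS. The following is a minimal sketch. Choosing the smallest tier whose maximum QPS covers the peak is an assumption made for illustration, and the function name is hypothetical.

```python
# Serverless CUs that correspond to each old-version DataService Studio
# specification, copied from the table above.
SPEC_TO_CUS = {
    "api.s2.small": 4,
    "api.s2.medium": 8,
    "api.s2.large": 16,
    "api.s1.small": 4,
    "api.s1.medium": 8,
    "api.s1.large": 16,
}

# (maximum QPS, serverless CUs) tiers from the table, in ascending order.
QPS_TIERS = [(500, 4), (1000, 8), (2000, 16)]


def cus_for_peak_qps(peak_qps: float) -> int:
    """Return the CUs of the smallest tier whose maximum QPS covers peak_qps."""
    for max_qps, cus in QPS_TIERS:
        if peak_qps <= max_qps:
            return cus
    raise ValueError("peak QPS above 2000 is not covered by the table")


if __name__ == "__main__":
    print(SPEC_TO_CUS["api.s1.medium"])  # 8
    print(cus_for_peak_qps(1500))        # 16
```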

Step 3: Purchase a serverless resource group

You can purchase a serverless resource group based on the evaluation result. For information about how to purchase a serverless resource group, see Create and use a serverless resource group.

Step 4: Change to the purchased serverless resource group

What to do next

If you no longer require old-version resource groups after you change to a serverless resource group for your tasks, you can unsubscribe from the old-version resource groups. For more information, see Unsubscribe from DataWorks subscription features or resources.

Appendix: Comparison between billable items of different tasks after a resource group change

After you change from old-version resource groups to a serverless resource group for your tasks, the billable items that are involved in task running change. This section describes the changes in the billable items.

For example, if you use a serverless resource group to schedule data computing tasks that are configured with data quality monitoring rules, such as EMR Hive tasks, DataWorks charges you the following fees: scheduling fee, computing fee generated by code execution, fee for running data quality monitoring rules, and data computing fee generated by executing SQL statements to check data quality based on monitoring rules.

Task type | Resource group | Scheduling fee | Computing fee generated by code execution | Fee for running data quality monitoring rules | Data computing fee generated by executing SQL statements to check data quality based on monitoring rules
Tasks that use a DataWorks resource group for computing | Old-version resource groups | Charged | Not applicable | Charged | Not applicable
Tasks that use a DataWorks resource group for computing | Serverless resource groups | Charged | Charged | Charged | Charged
Tasks that do not use a DataWorks resource group for computing | Old-version resource groups | Charged | Not applicable | Charged | Not applicable
Tasks that do not use a DataWorks resource group for computing | Serverless resource groups | Charged | Not applicable | Charged | Not applicable

Note

To determine whether a task uses a DataWorks resource group for computing, see Appendix: Data computing tasks. You are not charged computing fees for tasks that do not use a DataWorks resource group for computing. For example, data computing tasks that run in MaxCompute do not use a DataWorks resource group for computing, so you are not charged DataWorks resource group computing fees for them.
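The billing comparison table can be encoded as a small function that returns the fees that apply to a task. This is a minimal sketch. Like the EMR Hive example earlier in this appendix, it assumes that the task has data quality monitoring rules configured. The fee labels and the function name are hypothetical.

```python
from typing import FrozenSet

SCHEDULING = "scheduling fee"
CODE_COMPUTING = "computing fee generated by code execution"
DQ_RULES = "fee for running data quality monitoring rules"
DQ_SQL_COMPUTING = "data computing fee for data quality check SQL"


def billable_items(uses_dataworks_compute: bool, serverless: bool) -> FrozenSet[str]:
    """Fees charged for a task that has data quality monitoring rules."""
    # Charged in every row of the table.
    charged = {SCHEDULING, DQ_RULES}
    if serverless and uses_dataworks_compute:
        # Only this combination adds the two computing fees.
        charged |= {CODE_COMPUTING, DQ_SQL_COMPUTING}
    return frozenset(charged)


if __name__ == "__main__":
    for fee in sorted(billable_items(uses_dataworks_compute=True, serverless=True)):
        print(fee)
```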