Before you change the resource group used by your tasks from an old-version resource group to a serverless resource group, evaluate the resource consumption of the tasks and make sure that the serverless resource group can support all of them. This helps ensure a smooth change. This topic provides examples of how to evaluate the number of compute units (CUs) that different tasks require. It also describes the impacts of the change and explains how to change from old-version resource groups to a serverless resource group for tasks.
Background information
DataWorks supports exclusive resource groups for Data Integration, exclusive resource groups for scheduling, and exclusive resource groups for DataService Studio. You must separately purchase and configure the resource groups based on your business requirements. To facilitate the management of DataWorks resources and improve user experience, DataWorks introduces serverless resource groups. You can use a serverless resource group in data synchronization, task scheduling, and DataService Studio at the same time. This simplifies interactions between resource groups and ensures operation consistency.
Billing
For information about the billing of old-version resource groups, see the topics in the Billing of old-version resource groups directory.
For information about the billing of serverless resource groups, see Billing of serverless resource groups.
If you change from an old-version resource group to a serverless resource group for a task, the billable items may change. For more information, see the Appendix: Comparison between billable items of different tasks after a resource group change section in this topic.
For a data computing task such as a PyODPS2 or E-MapReduce (EMR) task, you are not charged computing fees while the task uses an old-version resource group. After you change to a serverless resource group for the task, you are charged computing fees.
Procedure
Step 1: Query tasks for which you want to change resource groups
Data synchronization
Data Integration page
Go to the Data Integration page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Integration.
In the left-side navigation pane, click Synchronization Task. In the Nodes section of the Data Integration page, select the resource group for Data Integration that you want to change from the Resource Group drop-down list.
DataStudio page
Go to the DataStudio page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
In the left-side navigation pane, click Scheduled Workflow. In the Scheduled Workflow pane, find the desired workflow, right-click the workflow name, and then select Batch Operation.
On the Node tab, select Offline synchronization and Real-time Synchronization from the Node Type drop-down list, and select the resource group for Data Integration that you want to change from the Resource Group for Data Integration drop-down list.
Task scheduling
Go to the Operation Center page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Operation Center.
In the left-side navigation pane, choose . On the page that appears, select the resource group for scheduling that needs to be changed from the Scheduling Resource Group drop-down list.
DataService Studio
Go to the DataService Studio page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to DataService Studio.
In the left-side navigation pane, click Service Development. In the Service Development pane, click the icon to go to the Batch Operation tab. On the Batch Operation tab, select the resource group for DataService Studio that you want to change from the Resource Group drop-down list.
Step 2: Evaluate the required specifications of the serverless resource group to purchase
Before you change resource groups for data synchronization tasks, scheduling tasks, and DataService Studio tasks, you must evaluate the resource consumption of the tasks to determine the specifications of the serverless resource group to purchase. This ensures that the resource group can handle the workloads of related business.
The following sections provide detailed evaluation suggestions.
Data synchronization
Batch synchronization tasks
Batch synchronization task configured by using the codeless user interface (UI)
Parallelism configured for the batch synchronization task | Specifications of a serverless resource group |
<4 | 0.5 CUs |
>=4 | ((Parallelism - 4) × 0.07 + 0.5) CUs |
Batch synchronization task configured by using the code editor and configured with JVM parameters
-Xmx value | Specifications of a serverless resource group |
<=1.8g | 0.5 CUs |
Fixed value (GB) | (Fixed value/0.9/4) CUs |
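The two sizing rules for batch synchronization tasks can be sketched as small Python helpers. This is a minimal sketch, not part of any DataWorks API: the function names are hypothetical, the -Xmx value is assumed to be expressed in GB, and rounding to two decimal places is an assumption made only for readability.

```python
def cus_for_parallelism(parallelism: int) -> float:
    """Estimate CUs for a batch synchronization task configured in the codeless UI."""
    if parallelism < 1:
        raise ValueError("parallelism must be a positive integer")
    if parallelism < 4:
        return 0.5
    # Rule from the table: (Parallelism - 4) x 0.07 + 0.5
    return round((parallelism - 4) * 0.07 + 0.5, 2)


def cus_for_xmx(xmx_gb: float) -> float:
    """Estimate CUs for a code-editor task that sets the JVM -Xmx value (assumed GB)."""
    if xmx_gb <= 0:
        raise ValueError("xmx_gb must be positive")
    if xmx_gb <= 1.8:
        return 0.5
    # Rule from the table: fixed value / 0.9 / 4
    return round(xmx_gb / 0.9 / 4, 2)
```

For example, a task with a parallelism of 8 would need about 0.78 CUs, and a task with -Xmx set to 3.6 GB would need about 1 CU. Both rules return 0.5 CUs at their thresholds (a parallelism of 4 and an -Xmx value of 1.8 GB).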
Real-time synchronization tasks
Synchronization task type | Specifications of an old-version resource group | Specifications of a serverless resource group |
Real-time synchronization from MySQL (one source database) | 4 vCPUs 8 GiB | 2.5 CUs |
Real-time synchronization from MySQL (two to five source databases) | 8 vCPUs 16 GiB | 4 CUs |
Real-time synchronization from MySQL (six or more source databases) | 12 vCPUs 24 GiB | 7 CUs |
Real-time synchronization from PolarDB-X 1.0 | 12 vCPUs 24 GiB | 7 CUs |
Real-time synchronization from Kafka | 4 vCPUs 8 GiB | 2.5 CUs |
Real-time synchronization of data in a single table of another source type | 4 vCPUs 8 GiB | 2.5 CUs |
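The real-time synchronization suggestions can also be expressed as a lookup, which may be convenient when you size many tasks at once. The sketch below is illustrative only: the function name and the source type keys ("mysql", "polardb-x-1.0", "kafka", "single-table-other") are hypothetical labels chosen for this example, not DataWorks identifiers.

```python
def realtime_sync_cus(source_type: str, source_databases: int = 1) -> float:
    """Look up the suggested CUs for a real-time synchronization task."""
    if source_type == "mysql":
        # For MySQL, the suggestion depends on the number of source databases.
        if source_databases < 1:
            raise ValueError("source_databases must be at least 1")
        if source_databases == 1:
            return 2.5
        if source_databases <= 5:
            return 4.0
        return 7.0
    # Other source types have a single suggested value in the table.
    fixed = {"polardb-x-1.0": 7.0, "kafka": 2.5, "single-table-other": 2.5}
    if source_type not in fixed:
        raise ValueError(f"no suggestion listed for source type: {source_type}")
    return fixed[source_type]
```

Summing the results over all real-time tasks that run at the same time gives a rough lower bound for the CUs to reserve for real-time synchronization.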
Task scheduling
If your scheduling tasks include data computing tasks such as PyODPS2 and EMR Hive tasks, the data computing tasks use a serverless resource group for computing. You must plan the specifications of the serverless resource group that you require based on your business requirements.
Note: For information about the default number of CUs that are allowed for data computing tasks, see Appendix: Data computing tasks.
If your scheduling tasks do not include data computing tasks, the maximum number of parallel instances supported by a serverless resource group is 200, which is greater than the maximum number of parallel instances supported by an old-version resource group with the highest specifications. In this case, the default specifications of a serverless resource group can meet your business requirements and you do not need to adjust the specifications.
The maximum number of parallel instances that are supported by old-version resource groups with different specifications varies. The following table provides details.
Specifications of an old-version resource group | Maximum number of parallel instances (old-version resource group) | Maximum number of parallel instances (serverless resource group) |
4 vCPUs 8 GiB | 16 | 200 |
8 vCPUs 16 GiB | 32 | 200 |
12 vCPUs 24 GiB | 48 | 200 |
16 vCPUs 32 GiB | 64 | 200 |
24 vCPUs 48 GiB | 96 | 200 |
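As a quick sanity check, the limits in this table can be compared with the peak number of parallel instances that you observe. The following sketch assumes that your scheduling tasks do not include data computing tasks. The names are hypothetical, and the dictionary is keyed by the vCPU count of the old-version specification.

```python
# Maximum parallel instances per old-version specification, keyed by vCPU count.
OLD_VERSION_MAX_PARALLEL = {4: 16, 8: 32, 12: 48, 16: 64, 24: 96}
# Maximum parallel instances of a serverless resource group, as stated in this topic.
SERVERLESS_MAX_PARALLEL = 200


def default_serverless_is_enough(peak_parallel_instances: int) -> bool:
    """Return True if the observed peak fits within the serverless limit."""
    return peak_parallel_instances <= SERVERLESS_MAX_PARALLEL


def extra_headroom(old_vcpus: int) -> int:
    """Return how many more parallel instances the serverless group allows."""
    return SERVERLESS_MAX_PARALLEL - OLD_VERSION_MAX_PARALLEL[old_vcpus]
```

Even for the largest old-version specification (24 vCPUs 48 GiB), the serverless resource group allows 104 more parallel instances.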
DataService Studio
Maximum QPS | Specifications of an old-version resource group | Specifications of a serverless resource group | SLA |
500 | api.s2.small | 4 CUs | 99.95% |
1000 | api.s2.medium | 8 CUs | 99.95% |
2000 | api.s2.large | 16 CUs | 99.95% |
500 | api.s1.small | 4 CUs | 99.95% |
1000 | api.s1.medium | 8 CUs | 99.95% |
2000 | api.s1.large | 16 CUs | 99.95% |
The specifications api.s1.small, api.s1.medium, and api.s1.large are no longer available for purchase. If you are using an old-version resource group with one of these specifications to run your tasks, perform a resource group change at the earliest opportunity.
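The DataService Studio mapping can be sketched as a tier lookup, either from an existing old-version specification or from a target peak QPS. This is a minimal sketch with hypothetical names. It covers only the tiers listed in this topic and raises an error for anything beyond them rather than extrapolating.

```python
# Serverless CUs that correspond to each old-version specification.
OLD_SPEC_TO_CUS = {
    "api.s2.small": 4, "api.s2.medium": 8, "api.s2.large": 16,
    "api.s1.small": 4, "api.s1.medium": 8, "api.s1.large": 16,
}

# (maximum QPS, CUs) tiers in ascending order.
QPS_TIERS = [(500, 4), (1000, 8), (2000, 16)]


def cus_for_qps(peak_qps: int) -> int:
    """Return the CUs of the smallest listed tier that covers peak_qps."""
    if peak_qps < 0:
        raise ValueError("peak_qps must not be negative")
    for max_qps, cus in QPS_TIERS:
        if peak_qps <= max_qps:
            return cus
    raise ValueError("peak_qps exceeds the tiers listed in this topic")
```

For example, an API workload that peaks at 501 QPS falls into the 1,000 QPS tier and would need 8 CUs.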
Step 3: Purchase a serverless resource group
You can purchase a serverless resource group based on the evaluation result. For information about how to purchase a serverless resource group, see Create and use a serverless resource group.
Step 4: Change to the purchased serverless resource group
Change the resource groups for Data Integration used by synchronization tasks
Note: After you change to a serverless resource group for your tasks, DataWorks automatically configures the recommended number of CUs for the tasks based on the configurations of the tasks. If you want to manually configure the number of CUs for the tasks, you can refer to Step 2: Evaluate the required specifications of the serverless resource group to purchase.
Change the resource groups for scheduling used by tasks
Note: If your scheduling tasks involve data computing in your resource group, DataWorks allocates a specific amount of resources in the resource group to the involved data computing tasks. When you change the resource groups for scheduling used by scheduling tasks, the resource groups used for computing are also changed.
Change the resource groups for DataService Studio used by APIs
Note: Before you change the resource groups for DataService Studio used by APIs to a serverless resource group, you must configure CU quotas for DataService Studio. Otherwise, you cannot select a serverless resource group when you perform the change. For information about how to configure CU quotas for DataService Studio, see Manage quotas.
What to do next
If you no longer require old-version resource groups after you change to a serverless resource group for your tasks, you can unsubscribe from the old-version resource groups. For more information, see Unsubscribe from DataWorks subscription features or resources.
Appendix: Comparison between billable items of different tasks after a resource group change
After you change from old-version resource groups to a serverless resource group for your tasks, the billable items that are involved in task running change. This section describes the changes in the billable items.
For example, if you use a serverless resource group to schedule data computing tasks that are configured with data quality monitoring rules, such as EMR Hive tasks, DataWorks charges you the following fees: scheduling fee, computing fee generated by code execution, fee for running data quality monitoring rules, and data computing fee generated by executing SQL statements to check data quality based on monitoring rules.
Task type | Resource group | Scheduling fee | Computing fee generated by code execution | Fee for running data quality monitoring rules | Data computing fee generated by executing SQL statements to check data quality based on monitoring rules |
Tasks that use a DataWorks resource group for computing | Old-version resource groups | | | | |
Tasks that use a DataWorks resource group for computing | Serverless resource groups | | | | |
Tasks that do not use a DataWorks resource group for computing | Old-version resource groups | | | | |
Tasks that do not use a DataWorks resource group for computing | Serverless resource groups | | | | |
You can refer to Appendix: Data computing tasks to determine whether a task uses a DataWorks resource group for computing. You are not charged computing fees for tasks that do not use a DataWorks resource group for computing. For example, data computing tasks that are run in MaxCompute do not use a DataWorks resource group for computing, and you are not charged computing fees for using a DataWorks resource group.