DataWorks supports many task types, such as PyODPS and E-MapReduce (EMR) Hive tasks. Some task types are issued by the scheduling system directly to their compute engines. Other task types either run on DataWorks resource groups or are issued to their compute engines by using resources that the resource groups start. Running task code consumes computing resources. For tasks that are issued directly to their compute engines, data computing fees are included in the bill of the corresponding compute engine service. For tasks that run on DataWorks serverless resource groups, or that are issued to their compute engines by using resources started by serverless resource groups, data computing fees are included in your DataWorks bill. This topic describes how data computing tasks that run on a serverless resource group are billed.
Fee generation scenarios
Data computing fees are generated if you use a DataWorks serverless resource group to run data computing tasks in the following services:
DataStudio: Data computing fees are generated when task code is run.
Data Quality: If data computing tasks are configured with data quality monitoring rules, data computing fees are generated when SQL statements for the rule-based check are executed.
DataAnalysis: Data computing fees are generated when you run data computing tasks, such as Shell and Python tasks, in DataAnalysis.
Operation Center: Data computing fees are generated when you run data computing tasks in Operation Center.
For more information about the types of data computing tasks supported by DataWorks, see the Appendix: Data computing tasks section in this topic.
Data computing fees are not generated when you use old-version resource groups to run various types of tasks.
Billing of serverless resource groups
You are charged for serverless resource groups based on the number of CUs. One CU equals 1 vCPU core and 4 GiB of memory.
If you use the pay-as-you-go billing method, you are charged based on the number of CUs that are actually used to run tasks. For more information, see the Pay-as-you-go serverless resource group section of the "Billing of serverless resource groups" topic.
If you use the subscription billing method, you are charged based on the number of CUs that you purchase and the number of months of your subscription duration. For more information, see the Subscription serverless resource group section of the "Billing of serverless resource groups" topic.
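The two billing methods above can be sketched as simple arithmetic. This is a minimal illustration, not an official price calculator: the function names and the unit-price parameters are hypothetical, no real prices are included, and metering pay-as-you-go usage in CU-hours is an assumption that this topic does not state. Only the CU-to-resource ratio (1 vCPU core and 4 GiB of memory per CU) comes from the text.

```python
VCPU_PER_CU = 1        # from this topic: one CU equals 1 vCPU core
MEMORY_GIB_PER_CU = 4  # from this topic: one CU equals 4 GiB of memory


def cu_to_resources(cus):
    """Return (vCPU cores, memory in GiB) that a given number of CUs represents."""
    return cus * VCPU_PER_CU, cus * MEMORY_GIB_PER_CU


def subscription_fee(purchased_cus, months, price_per_cu_month):
    """Subscription: purchased CUs x subscription months x unit price.

    price_per_cu_month is a hypothetical input; look up the real price
    in the "Billing of serverless resource groups" topic.
    """
    return purchased_cus * months * price_per_cu_month


def pay_as_you_go_fee(cu_hours, price_per_cu_hour):
    """Pay-as-you-go: CUs actually used x unit price.

    Assumption: usage is metered in CU-hours. Both parameters are hypothetical.
    """
    return cu_hours * price_per_cu_hour


if __name__ == "__main__":
    vcpu, mem = cu_to_resources(0.5)
    print(f"0.5 CU = {vcpu} vCPU, {mem} GiB")
    # Placeholder unit price of 10, used only to show the arithmetic.
    print("subscription:", subscription_fee(2, 3, 10))
```

Passing the unit prices in as arguments keeps the sketch free of invented rates; substitute the values from your own bill or the pricing topic.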
Appendix: Data computing tasks
Identify data computing tasks
To check the type of a task, go to the configuration tab of the corresponding node on the DataStudio page and click Properties in the right-side navigation pane. In the Resource Group section of the Properties tab, check whether a parameter that specifies the number of CUs for running the task is displayed.
Data computing task: A parameter that specifies the number of CUs required for task running is displayed in the Resource Group section.
Scenario 1: Adjust the number of CUs required for task running.
Scenario 2: Use the default number of CUs for task running. The default number cannot be changed.
Scheduling task: You can select only a resource group for scheduling in the Resource Group section. No CUs need to be configured.
Configuration of CUs for data computing tasks
The following content describes the default configuration and actual configuration of CUs when you use a serverless resource group to run data computing tasks:
Default number of CUs: Each time you run a task, the system allocates a default number of CUs based on the task type. If you specify fewer CUs than the default number, the task may run inefficiently.
Configured number of CUs: the actual number of CUs that you configure for task running. By default, the system displays the default number of CUs. You can adjust the number based on your business requirements. Principles for CU configuration:
The minimum number is 0.25 CU. The scaling step size is 0.25 CU. If an error message that indicates an insufficient CU quota for the current resource group appears, you can adjust the CU quota for data computing tasks.
To prevent the issue of insufficient or excess CUs, configure the number of CUs based on the default number of CUs and the CU quota for data computing tasks. For more information, see Quota management.
You can adjust the number of CUs only for specific types of tasks. Examples:
By default, Hologres SQL tasks run with 0.25 CU. You cannot adjust this number.
By default, PyODPS 2 tasks run with 0.5 CU. You can adjust this number based on your business requirements. For example, you can set the number to 0.25 or 0.75, in line with the 0.25 CU step size.
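The configuration principles above can be expressed as a small pre-check. The following is a minimal sketch, not DataWorks code: `validate_cu` and `NODE_DEFAULTS` are hypothetical names, the lookup holds only the two node types named in the examples above, and the optional `quota` argument stands in for the CU quota for data computing tasks of the resource group.

```python
from decimal import Decimal

MIN_CU = Decimal("0.25")   # minimum number of CUs, from the principles above
STEP_CU = Decimal("0.25")  # scaling step size, from the principles above

# Hypothetical lookup: node type -> (default CUs, adjustable?).
# Only the two examples given in the text are listed here.
NODE_DEFAULTS = {
    "Hologres SQL": (Decimal("0.25"), False),
    "PyODPS 2": (Decimal("0.5"), True),
}


def validate_cu(node_type, requested, quota=None):
    """Return a list of problems with a requested CU value (empty means OK)."""
    default, adjustable = NODE_DEFAULTS[node_type]
    requested = Decimal(str(requested))  # avoid binary float artifacts
    problems = []
    if not adjustable and requested != default:
        problems.append(f"{node_type} is fixed at {default} CU")
    if requested < MIN_CU:
        problems.append(f"minimum is {MIN_CU} CU")
    if requested % STEP_CU != 0:
        problems.append(f"value must be a multiple of {STEP_CU} CU")
    if requested < default:
        problems.append("below the default; the task may run inefficiently")
    if quota is not None and requested > Decimal(str(quota)):
        problems.append("exceeds the CU quota for data computing tasks")
    return problems


if __name__ == "__main__":
    for node, cus in [("PyODPS 2", 0.75), ("Hologres SQL", 0.5)]:
        print(node, cus, validate_cu(node, cus) or "OK")
```

`Decimal` is used so that the step-size check is exact; with binary floats, a remainder test on fractional values can give misleading results.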
Example: The following figures show the number of CUs configured for Hologres SQL and PyODPS 2 tasks in DataStudio.
The following table describes the types of data computing tasks supported by DataWorks, the default number of CUs for each type of task, and whether the number of CUs can be changed.
Data source type | Node type | Default number of CUs | Number of CUs adjustable |
MaxCompute | 0.5 | Yes | |
0.5 | Yes | ||
0.5 | Yes | ||
Hologres | 0.25 | No | |
Node to synchronize schemas of MaxCompute tables with a few clicks | 0.25 | Yes | |
0.25 | Yes | ||
EMR | 0.25 | No | |
0.25 | No | ||
EMR Impala | 0.25 | No | |
EMR Trino | 0.25 | No | |
0.25 | Yes | ||
0.25 | No | ||
0.5 | Yes | ||
0.5 | Yes | ||
0.25 | Yes | ||
CDH | 0.25 | No | |
0.25 | No | ||
0.25 | No | ||
CDH Trino | 0.25 | No | |
0.25 | Yes | ||
0.25 | No | ||
0.5 | Yes | ||
CDH Spark Streaming | 0.5 | Yes | |
CDH Shell | 0.25 | Yes | |
AnalyticDB for PostgreSQL | 0.25 | Yes | |
AnalyticDB for MySQL | 0.25 | Yes | |
ClickHouse | 0.25 | No | |
General | 0.25 | Yes | |
Do-while node | N/A | Yes | |
0.25 | Yes | ||
0.25 | Yes | ||
0.25 | No | ||
0.25 | No | ||
0.25 | Yes | ||
Algorithm | 0.25 | No |
References
For more information about the billing of serverless resource groups, see Billing of serverless resource groups.
For more information about how to create and use a serverless resource group, see Create and use a serverless resource group.