
DataWorks:Billing of data computing

Last Updated:Aug 29, 2024

DataWorks supports various types of tasks, such as PyODPS and E-MapReduce (EMR) Hive tasks. Some task types are issued by the scheduling system directly to their compute engines for execution. Other task types run on DataWorks resource groups, or are issued to their compute engines based on resources started by those resource groups. Running task code consumes computing resources. For tasks that are issued directly to a compute engine, data computing fees are included in the bill of the corresponding compute engine service. For tasks that run on DataWorks serverless resource groups, or that are issued to a compute engine based on resources started by a serverless resource group, data computing fees are included in your DataWorks bill. This topic describes the billing details of data computing tasks that are run on a serverless resource group.

Fee generation scenarios

Data computing fees are generated if you use a DataWorks serverless resource group to run data computing tasks in the following services:

  • DataStudio: Data computing fees are generated when task code is run.

  • Data Quality: If data computing tasks are configured with data quality monitoring rules, data computing fees are generated when SQL statements for the rule-based check are executed.

  • DataAnalysis: Data computing fees are generated when you run data computing tasks, such as Shell and Python tasks, in DataAnalysis.

  • Operation Center: Data computing fees are generated when you run data computing tasks in Operation Center.

For more information about the types of data computing tasks supported by DataWorks, see the Appendix: Data computing tasks section in this topic.

Note

Data computing fees are not generated when you use old-version resource groups to run tasks.

Billing of serverless resource groups

You are charged for serverless resource groups based on the number of compute units (CUs) consumed. One CU equals 1 vCPU and 4 GiB of memory.

  • If you use the pay-as-you-go billing method, you are charged based on the number of CUs that are actually used to run tasks. For more information, see the Pay-as-you-go serverless resource group section of the "Billing of serverless resource groups" topic.

  • If you use the subscription billing method, you are charged based on the number of CUs that you purchase and your subscription duration in months. For more information, see the Subscription serverless resource group section of the "Billing of serverless resource groups" topic.
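To make the two billing methods concrete, the sketch below compares a month of pay-as-you-go usage against a subscription of the same size. The unit prices and the helper names (`payg_cost`, `subscription_cost`) are hypothetical placeholders, not published DataWorks rates; substitute the prices from your own price list or bill.

```python
# Sketch comparing the two billing methods for a serverless resource group.
# One CU equals 1 vCPU and 4 GiB of memory. The unit prices below are
# HYPOTHETICAL placeholders, not published DataWorks prices.

PAYG_PRICE_PER_CU_HOUR = 0.10          # hypothetical: price per CU-hour
SUBSCRIPTION_PRICE_PER_CU_MONTH = 50.0 # hypothetical: price per CU per month


def payg_cost(cu_hours: float) -> float:
    """Pay-as-you-go: billed on the CUs actually consumed by tasks."""
    return cu_hours * PAYG_PRICE_PER_CU_HOUR


def subscription_cost(cus: float, months: int) -> float:
    """Subscription: billed on purchased CUs times the subscription duration."""
    return cus * SUBSCRIPTION_PRICE_PER_CU_MONTH * months


# Example: tasks consuming 2 CUs for 8 hours a day over a 30-day month,
# versus a one-month 2-CU subscription.
usage_cu_hours = 2 * 8 * 30            # 480 CU-hours
print(payg_cost(usage_cu_hours))       # ~48.0 with the placeholder rate
print(subscription_cost(2, 1))         # ~100.0 with the placeholder rate
```

With these placeholder rates, intermittent workloads favor pay-as-you-go, while sustained round-the-clock usage would favor a subscription; rerun the comparison with your actual prices before deciding.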

Appendix: Data computing tasks

Identify data computing tasks

To identify the type of a task, go to the configuration tab of the corresponding node on the DataStudio page, click Properties in the right-side navigation pane, and check whether the Resource Group section of the Properties tab displays a parameter that specifies the number of CUs for task running.

  • Data computing task: A parameter that specifies the number of CUs required for task running is displayed in the Resource Group section.

    • Scenario 1: Adjust the number of CUs required for task running.


    • Scenario 2: Use the default number of CUs for task running. The default number cannot be changed.


  • Scheduling task: You can select only a resource group for scheduling in the Resource Group section. No CUs need to be configured.


Configuration of CUs for data computing tasks

The following content describes the default configuration and actual configuration of CUs when you use a serverless resource group to run data computing tasks:

  • Default number of CUs: Each time you run a task, the system allocates a default number of CUs based on the task type. If the number of CUs that you specify is less than the default number, the task may not run efficiently.

  • Configured number of CUs: the actual number of CUs that you configure for task running. By default, the system displays the default number of CUs. You can adjust the number based on your business requirements. Principles for CU configuration:

    • The minimum number is 0.25 CU. The scaling step size is 0.25 CU. If an error message that indicates an insufficient CU quota for the current resource group appears, you can adjust the CU quota for data computing tasks.

    • To prevent the issue of insufficient or excess CUs, configure the number of CUs based on the default number of CUs and the CU quota for data computing tasks. For more information, see Quota management.
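The two configuration rules above (a 0.25 CU minimum and a 0.25 CU step size) can be checked before a value is submitted. The `validate_cu` helper below is an illustrative sketch, not a DataWorks API.

```python
# Sketch: validate a CU setting against the rules above (minimum 0.25 CU,
# adjustable in 0.25-CU steps). Illustrative only, not a DataWorks API.

def validate_cu(cus: float) -> float:
    """Return `cus` unchanged if it is a valid setting; raise otherwise."""
    MIN_CU, STEP = 0.25, 0.25
    # Compare against the nearest multiple of the step size with a small
    # tolerance to avoid floating-point rounding surprises.
    nearest_multiple = round(cus / STEP) * STEP
    if cus < MIN_CU or abs(nearest_multiple - cus) > 1e-9:
        raise ValueError(
            f"CUs must be at least {MIN_CU} in steps of {STEP}; got {cus}"
        )
    return cus


validate_cu(0.5)    # valid: a common default
validate_cu(0.25)   # valid: the minimum
# validate_cu(0.4)  # would raise ValueError: not a multiple of 0.25
```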

Note

You can adjust the number of CUs only for specific types of tasks. Examples:

  • The default number of CUs required to run a Hologres SQL task is 0.25, and the number cannot be adjusted.

  • The default number of CUs required to run a PyODPS 2 task is 0.5, and the number can be adjusted based on your business requirements. For example, you can set it to 0.25 or 0.75.

For example, in DataStudio the Resource Group section shows a fixed number of CUs for a Hologres SQL node, whereas the number of CUs for a PyODPS 2 node can be edited.

The following table describes the types of data computing tasks supported by DataWorks, the default number of CUs for each type of task, and whether the number of CUs can be changed.

| Data source type | Node type | Default number of CUs | Number of CUs adjustable |
| --- | --- | --- | --- |
| MaxCompute | ODPS MR | 0.5 | Yes |
| MaxCompute | PyODPS 2 | 0.5 | Yes |
| MaxCompute | PyODPS 3 | 0.5 | Yes |
| Hologres | Hologres SQL | 0.25 | No |
| Hologres | Node to synchronize schemas of MaxCompute tables with a few clicks | 0.25 | Yes |
| Hologres | Node to synchronize MaxCompute data with a few clicks | 0.25 | Yes |
| EMR | EMR Hive | 0.25 | No |
| EMR | EMR Presto | 0.25 | No |
| EMR | EMR Impala | 0.25 | No |
| EMR | EMR Trino | 0.25 | No |
| EMR | EMR MR | 0.25 | Yes |
| EMR | EMR Spark SQL | 0.25 | No |
| EMR | EMR Spark | 0.5 | Yes |
| EMR | EMR Spark Streaming | 0.5 | Yes |
| EMR | EMR Shell | 0.25 | Yes |
| CDH | CDH Hive | 0.25 | No |
| CDH | CDH Presto | 0.25 | No |
| CDH | CDH Impala | 0.25 | No |
| CDH | CDH Trino | 0.25 | No |
| CDH | CDH MR | 0.25 | Yes |
| CDH | CDH Spark SQL | 0.25 | No |
| CDH | CDH Spark | 0.5 | Yes |
| CDH | CDH Spark Streaming | 0.5 | Yes |
| CDH | CDH Shell | 0.25 | Yes |
| AnalyticDB for PostgreSQL | AnalyticDB for PostgreSQL | 0.25 | Yes |
| AnalyticDB for MySQL | AnalyticDB for MySQL | 0.25 | Yes |
| ClickHouse | ClickHouse SQL | 0.25 | No |
| General | Assignment node | 0.25 | Yes |
| General | Do-while node | N/A | Yes |
| General | Shell | 0.25 | Yes |
| General | SSH node | 0.25 | Yes |
| General | FTP Check | 0.25 | No |
| General | MySQL | 0.25 | No |
| General | Custom node | 0.25 | Yes |
| Algorithm | PAI DLC | 0.25 | No |
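The defaults in the table above can be used to estimate the baseline CU demand of a workflow before you size a resource group. The sketch below reproduces a few rows of the table as a lookup; `baseline_cus` is an illustrative helper, not a DataWorks API, and it assumes every task keeps its default setting and all tasks run concurrently.

```python
# Sketch: a subset of the default-CU table above as a lookup, used to
# estimate the peak CU demand of a set of concurrently running tasks.
# Illustrative only; extend the dict with the rows you actually use.

DEFAULT_CUS = {
    "ODPS MR": 0.5,
    "PyODPS 2": 0.5,
    "PyODPS 3": 0.5,
    "Hologres SQL": 0.25,
    "EMR Spark": 0.5,
    "CDH Spark Streaming": 0.5,
    "Shell": 0.25,
}


def baseline_cus(node_types: list[str]) -> float:
    """Sum the default CU demand of the given node types, assuming each
    task keeps its default setting and all of them run at the same time."""
    return sum(DEFAULT_CUS[t] for t in node_types)


print(baseline_cus(["PyODPS 2", "Hologres SQL", "Shell"]))  # 1.0
```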
