MaxCompute: Use resource observation

Last Updated: Sep 10, 2024

The resource observation feature allows you to view the monitoring data of various resources, such as Tunnel resources, computing resources, and storage resources, in a specific period of time. You can view metric data in line charts or tables to optimize and adjust execution plans and resource configurations of jobs. This helps improve the execution efficiency and performance of jobs. This topic describes how to view the resource usage of MaxCompute.

Supported regions

The following table describes the regions in which the resource observation feature can be used to observe various resources.

| Resource type | Supported region |
| --- | --- |
| Computing resources | China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), China (Ulanqab), China (Chengdu), China (Hong Kong), US (Silicon Valley), US (Virginia), Malaysia (Kuala Lumpur), Japan (Tokyo), Germany (Frankfurt), Indonesia (Jakarta), UK (London), and Singapore |
| Storage resources | China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Chengdu), China (Zhangjiakou), and China (Ulanqab) |
| Tunnel resources | China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), and China (Chengdu) |
| Job performance | China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Ulanqab), China (Shenzhen), China (Chengdu), China (Hong Kong), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Japan (Tokyo), US (Silicon Valley), US (Virginia), Germany (Frankfurt), UK (London), and SAU (Riyadh - Partner Region) |

Permissions

Alibaba Cloud accounts: have full read and operation permissions for resource observation.

RAM users: must be granted the required RAM permissions. For more information, see RAM permissions.

Computing resources

You can view the consumption of compute units (CUs) in a subscription or pay-as-you-go quota.

Procedure

  1. Log on to the MaxCompute console, and select a region in the top navigation bar.

  2. In the left-side navigation pane, click Resource Observation.

  3. On the Resource Observation page, click the Computing Resources tab.

  4. Select a level-1 quota and a time range to query the data of each metric.

    Note

    The Resource Observation page in the MaxCompute console displays a maximum of 60 time points for each metric in a chart. If the selected time range is longer than one hour, the data is therefore aggregated, and each point covers an interval whose length in minutes is the number of minutes in the selected time range divided by 60. By default, the average aggregation algorithm is used. You can set the Aggregation Algorithm parameter to Maximum based on your business requirements to analyze resource consumption more comprehensively.

  5. Click the icon on the left side of the desired level-2 quota to view the resource usage trend chart of the level-2 quota. You can view charts of multiple level-2 quotas at the same time.

  6. View the list of projects that are associated with each level-2 quota.
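
The aggregation behavior described in the note of step 4 can be illustrated with a short sketch. It assumes per-minute samples and equal-width windows of `ceil(minutes / 60)` minutes per displayed point. The function names and the sample data are hypothetical, and the console's exact bucketing may differ.

```python
import math
from statistics import mean

MAX_POINTS = 60  # maximum number of time points shown per metric (from the note)


def window_minutes(range_minutes: int) -> int:
    """Minutes covered by one displayed point, assuming equal-width windows."""
    return max(1, math.ceil(range_minutes / MAX_POINTS))


def aggregate(samples: list[float], range_minutes: int, algorithm: str = "average") -> list[float]:
    """Collapse per-minute samples into at most MAX_POINTS points."""
    width = window_minutes(range_minutes)
    reducer = mean if algorithm == "average" else max
    return [reducer(samples[i:i + width]) for i in range(0, len(samples), width)]


# Hypothetical data: 3 hours of per-minute CU usage with one short spike.
samples = [10.0] * 180
samples[90] = 100.0

avg_points = aggregate(samples, 180, "average")
max_points = aggregate(samples, 180, "maximum")
```

In this sample, the window that contains the spike averages to 40.0 but keeps 100.0 under the maximum algorithm, which is why Maximum is the better choice when you look for short peaks.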

Metrics

| Metric name | Description |
| --- | --- |
| CPU Resources | The trend of the CPU utilization of the current quota. Click a time point to view the job snapshot list that corresponds to the time point. |
| Memory Resources (Unit: MB/100) | The trend of the memory usage of the current quota group. |

Important

The pay-as-you-go resources come from a shared resource pool. These resources are consumed to run computing jobs. Computing jobs compete for resources, and the resources that can be used by each job cannot be specified. If a user continuously requests a large number of resources, MaxCompute limits the resource usage of the user to ensure that other users can use pay-as-you-go computing resources.

Quotas and associated projects: allow you to identify the projects that define a level-2 quota as the default quota.

Storage resources

You can view the total storage usage and the storage usage percentages of different storage types in the current region. You can also view the storage usage trends of different storage types and the detailed table or partition storage information based on the project and the time range that you select.

Procedure

  1. Log on to the MaxCompute console. In the top navigation bar, select a region.

  2. In the left-side navigation pane, click Resource Observation.

  3. On the Resource Observation page, click the Storage Resource tab to view the total storage usage and storage usage distribution of different storage types on the current day.

  4. Optional. Select a time range and one or more projects to view the storage usage trend. By default, 7d, indicating 7 days, is selected as the time range, and all projects are selected. You can manually select up to 8 projects.

  5. Optional. On the Project Details tab in the Storage Details section, select a date to view the storage usage of each project on the date. The default date is the current day.

  6. Optional. On the Table/Partition Details tab in the Storage Details section, select a date and a project to view the storage usage of tables and partitions in the project on the date. The default date is the current day.

Metrics

| Metric name | Description |
| --- | --- |
| Storage Usage on the Current Day | Displays the total storage usage and the storage usage percentage of each storage type in the current region. The data is updated approximately every hour. |
| Storage Distribution | Displays the number of projects, tables, and partitions in the current region. The data is updated every day. |
| Storage Trend | Group by storage type: displays the storage usage of all projects or selected projects in the current region and the storage usage trend of each storage type over time. Group by project: displays the storage usage trends of different storage types over time for the top N (8 by default) projects that have the highest total storage usage or for the selected projects. |
| Project Details | Displays the storage usage details of different storage types for projects whose total storage usage is greater than 0 on a specified date in the current region. You can select a date within the last year. The Project Details tab also compares the total storage usage of the projects on the current day with that on the previous day, 7 days ago, or 30 days ago. |
| Table/Partition Details | Displays the storage types, the storage size, and a comparison between the storage usage on the current day and that on the previous day, 7 days ago, or 30 days ago. |
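
The "Group by project" view ranks projects by total storage usage and keeps the top N (8 by default). The following is a minimal sketch of that ranking. The project names, storage-type labels, and byte counts are hypothetical and serve only to show the calculation.

```python
def top_projects(usage: dict[str, dict[str, int]], n: int = 8) -> list[tuple[str, int]]:
    """Return the n projects with the highest total storage, largest first.

    `usage` maps project name -> {storage type -> bytes}.
    """
    totals = {project: sum(by_type.values()) for project, by_type in usage.items()}
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Hypothetical per-project usage in bytes, split by storage type.
usage = {
    "proj_a": {"standard": 500, "long_term": 100},
    "proj_b": {"standard": 50, "long_term": 0},
    "proj_c": {"standard": 900, "long_term": 300},
}

top_two = top_projects(usage, n=2)
```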

Tunnel resources

You can view the usage of resources in shared resource groups and subscription-based exclusive resource groups of MaxCompute Tunnel. You can also view the metric data of a project within a specific time range.

Procedure

  1. Log on to the MaxCompute console, and select a region in the top navigation bar.

  2. In the left-side navigation pane, click Resource Observation.

  3. On the Resource Observation page, click the Data Transmission Service tab.

  4. Select a quota, a project, and a time range to query the data of each metric.

Metrics

| Metric name | Description |
| --- | --- |
| Request Parallelism | The request parallelism of a resource group, including the parallelism of upload requests, the parallelism of download requests, and the parallelism of all requests. The metric data is displayed in a line chart. |
| Throughput(B/s) | The throughput of a resource group, including the upload throughput and download throughput. The metric data is displayed in a line chart. |
| Top N Access Tables(Concurrency) | The parallelism of requests to the selected table. For example, if you select Tunnel Batch Upload from the Mode of use drop-down list and testtable from the Table Name drop-down list, the chart shows the parallelism of Tunnel Batch uploads to the testtable table in the current resource group. The metric data is displayed in a line chart. |
| Source IP Address(B/s) | The amount of data that is transmitted per second from each source IP address. For example, if you select Tunnel Batch Upload from the Mode of use drop-down list and testtable from the Table Name drop-down list, the chart shows the amount of data per second that each source IP address uploads to the testtable table by using Tunnel Batch in the current resource group. The metric data is displayed in a line chart. |
| Number of Error Codes | The number of times the status codes 500 and 429 are reported for a resource group. |

Quotas and associated projects: allow you to identify the projects that define a level-2 quota as the default quota.
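
The Number of Error Codes metric counts only the status codes 500 and 429. As a rough illustration, the sketch below counts those two codes in a hypothetical list of response codes. The list and the function name are made up for this example, and this is not how the console collects the data.

```python
from collections import Counter

REPORTED_CODES = (429, 500)  # the two status codes named in the metric description


def count_error_codes(status_codes: list[int]) -> dict[int, int]:
    """Count occurrences of the reported error codes and ignore all others."""
    counts = Counter(code for code in status_codes if code in REPORTED_CODES)
    return {code: counts.get(code, 0) for code in REPORTED_CODES}


# Hypothetical response codes observed for one resource group.
status_codes = [200, 200, 429, 500, 200, 429, 404]
errors = count_error_codes(status_codes)
```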

Job performance

You can view the quantity, CU usage, and running durations of computing jobs and determine whether the job performance meets your expectations.

Procedure

  1. Log on to the MaxCompute console, and select a region in the top navigation bar.

  2. In the left-side navigation pane, click Resource Observation.

  3. On the Resource Observation page, click the Job Performance Observation tab.

  4. Filter the jobs that you want to view by using the following parameters, and select the dimension by which metric data is grouped in charts.

    Parameter

    Description

    Time Period

    Required. The time range (start time and end time) that is used to filter completed jobs.

    You can select a preset time range or configure a custom time range.

    • 1d: previous day.

    • 3d: previous 3 days.

    • 7d: previous 7 days.

    • Custom time range: Click the drop-down list, select a date, and then click Select Time to select a time range.

    Note

    The default time range is the previous day. The maximum time range is 7 days and the minimum time range is 1 hour. You can query only jobs from the previous 45 days.

    Project Name

    The names of the MaxCompute projects that are used to filter completed jobs.

    Note

    By default, all projects are selected. You can select up to eight projects.

    Quota

    The computing quotas that are used to filter completed jobs.

    Note

    By default, all computing quotas are selected. You can select up to eight level-2 quotas. For more information about computing quotas, see Manage quotas for computing resources in the new MaxCompute console.

    Group By

    Required. The dimension by which data is grouped in charts. The available chart types depend on the dimension that you select.

    Valid values:

    • No Group: displays the trends of metrics over time for all jobs within the selected filter range. This is the default value.

    • Project: displays the metrics of all jobs within the selected filter range by project.

      Note

      If you select Project, you must specify Project Name in the filter parameters and select up to eight projects.

    • Quota: displays the metrics of all jobs within the selected filter range by level-2 quota.

      Note

      If you select Quota, you must specify Quota in the filter parameters and select up to eight level-2 quotas.

    • Job Type: displays the metrics of all jobs within the selected filter range by job type.

      • SQL: SQL job.

      • SQLRT: MaxCompute Query Acceleration (MCQA) SQL job.

      • LOT: MapReduce job.

      • CUPID: Spark or Mars job.

      • Algo_Task: machine learning job.

      • GRAPH: graph computing job.

    • Job End Status: displays the metrics of all jobs within the selected filter range based on the status when the job ends.

      • Success: The job succeeds.

      • Failed: The job fails.

      • Canceled: The job is canceled.

  5. Click Search to view the statistics of each metric.

  6. Optional. Select Data Summary to view the statistics of each metric based on the selected time range.

    | Parameter | Description |
    | --- | --- |
    | By Hour | One hour is a time granularity. If you select By Hour, data statistics of jobs that are completed in the current hour are displayed. By default, data at hourly granularity is displayed. For example, if the current point in time is 14:00 on May 6, 2024, the statistics of each metric of jobs completed in the time range from 14:00 to 15:00 on May 6, 2024 are displayed. |
    | By Day | One day is a time granularity. If you select By Day, data statistics of jobs that are completed in the current day are displayed. For example, if the current date is May 6, 2024, the statistics of each metric of the jobs completed in the time range from 00:00 on May 6, 2024 to 00:00 on May 7, 2024 are displayed. |

  7. Select an option for Comparison Period to compare the current statistics with the statistics of the same hour or day a specified number of days earlier.

    Default value: No Comparison. You can also select Previous 30 Days, Previous 7 Days, or Previous 1 Day. For example, if the current point in time is 14:00 on May 6, 2024 and Previous 30 Days is selected, the data at 14:00 on April 6, 2024 is compared with the current data.
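
The time windows in steps 6 and 7 can be reproduced with plain date arithmetic. The sketch below uses hypothetical function names and reuses the dates from the examples above. It assumes that a window includes its start time and excludes its end time.

```python
from datetime import datetime, timedelta


def summary_window(now: datetime, granularity: str) -> tuple[datetime, datetime]:
    """Return the [start, end) window for "hour" or "day" granularity."""
    if granularity == "hour":
        start = now.replace(minute=0, second=0, microsecond=0)
        return start, start + timedelta(hours=1)
    start = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return start, start + timedelta(days=1)


def comparison_point(now: datetime, days_back: int) -> datetime:
    """Shift the current point back by the comparison period (1, 7, or 30 days)."""
    return now - timedelta(days=days_back)


now = datetime(2024, 5, 6, 14, 0)
hour_window = summary_window(now, "hour")  # 14:00 to 15:00 on May 6, 2024
day_window = summary_window(now, "day")    # 00:00 on May 6 to 00:00 on May 7, 2024
compared = comparison_point(now, 30)       # 14:00 on April 6, 2024
```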

Metrics

  • CU Usage Trend (Unit: Core × Hour)

    | Metric name | Description |
    | --- | --- |
    | CPU-hour (Unit: Core × Hour) | The number of CPU-hours consumed for completed jobs within the selected filter range. 1 CPU-hour means that 1 CPU core is used for 1 hour. Number of CPU-hours = Number of CPU cores × Duration. |
    | Memory-hour (Unit: GB × Hour) | The number of memory-hours consumed for completed jobs within the selected filter range. 1 memory-hour means that 1 GB of memory is used for 1 hour. Number of memory-hours = Memory space × Duration. |
    | Top 10 CPU-Hour Consumption Analysis or Top 10 Memory-Hour Consumption Analysis | Displays the top 10 jobs that consume the most CPU-hours or memory-hours and the top 10 signatures and ExtNodeIds of jobs that are ranked by the highest total CPU-hour, the highest average CPU-hour, the highest total memory-hour, and the highest average memory-hour within the selected filter range. |

  • Job Runtime Period (Unit: seconds)

    | Metric name | Description |
    | --- | --- |
    | Average Runtime Duration | The average job duration of completed jobs within the selected filter range. |
    | Maximum Runtime Duration | The maximum job duration of completed jobs within the selected filter range. |
    | Minimum Runtime Duration | The minimum job duration of completed jobs within the selected filter range. |
    | Select a quantile runtime duration | The duration that is taken to complete a specified quantile of jobs within the selected filter range. The quantile can be the 1st quantile, 5th quantile, 10th quantile, 50th quantile, 90th quantile, 95th quantile, or 99th quantile. For example, for the 99th quantile, this metric indicates the duration within which 99% of jobs are completed. |
    | Top 10 Job Runtime Analysis | Displays the top 10 jobs that have the longest running durations and the top 10 signatures and ExtNodeIds of jobs that are ranked by the longest total running durations and longest average running durations. |

  • Job Count Trend (Unit: counts): displays the number of jobs that are completed within the selected filter range.

  • Job Scan Amount Trend (Unit: GB): displays the amount of data that is scanned by completed jobs within the selected filter range. The unit adapts to the data volume, and the charts show the unit that is actually used.

  • Trend of Job Scan Amount per CU-Hour (Unit: GB): displays the average amount of data that is scanned by jobs per CU-hour within the selected filter range. The unit adapts to the data volume, and the charts show the unit that is actually used. 1 CU contains 1 CPU core and 4 GB of memory. The number of CU-hours is calculated by using MAX(CPU-hours, Roundup(Memory-hours/4)).
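
The formulas in this metric list can be checked with a few lines of arithmetic. The sketch below is illustrative only. The function names and the sample job are hypothetical. The quantile helper uses the nearest-rank method, which is an assumption because this topic does not state how the console computes quantiles.

```python
import math


def cpu_hours(cores: float, hours: float) -> float:
    """Number of CPU-hours = Number of CPU cores x Duration."""
    return cores * hours


def memory_hours(memory_gb: float, hours: float) -> float:
    """Number of memory-hours = Memory space (GB) x Duration."""
    return memory_gb * hours


def cu_hours(cpu_h: float, mem_h: float) -> float:
    """MAX(CPU-hours, Roundup(Memory-hours / 4)), where 1 CU = 1 core + 4 GB."""
    return max(cpu_h, math.ceil(mem_h / 4))


def scan_gb_per_cu_hour(scanned_gb: float, cu_h: float) -> float:
    """Average amount of data scanned per CU-hour (0.0 if no CU-hours were used)."""
    return scanned_gb / cu_h if cu_h else 0.0


def quantile_duration(durations: list[float], q: int) -> float:
    """Nearest-rank q-th quantile: the shortest duration that covers q% of jobs."""
    ordered = sorted(durations)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical job: 2 cores and 10 GB of memory held for 3 hours, 120 GB scanned.
cpu_h = cpu_hours(2, 3)                  # 6 CPU-hours
mem_h = memory_hours(10, 3)              # 30 memory-hours
cu_h = cu_hours(cpu_h, mem_h)            # max(6, ceil(7.5)) = 8 CU-hours
ratio = scan_gb_per_cu_hour(120, cu_h)   # 15.0 GB per CU-hour

# Hypothetical job durations in seconds.
durations = [5, 1, 9, 3, 7, 2, 8, 4, 6, 10]
p50 = quantile_duration(durations, 50)
p90 = quantile_duration(durations, 90)
```

In this sample the job is memory-bound. The memory-hours term, rounded up, exceeds the CPU-hours, so it determines the CU-hours.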

You can also use the tenant-level Information Schema to collect statistics on the preceding metrics. Note that the Information Schema tasks_history table contains task instances that are generated by all operations, whereas the metric data displayed on the Job Performance Observation tab in the console is obtained only from jobs that consume computing resources. Therefore, the statistical results obtained by using the tenant-level Information Schema may differ from the results displayed on the Job Performance Observation tab.

The following SQL statements show an example.

SET odps.namespace.schema=TRUE;
SELECT to_char(end_time, 'yyyy-mm-dd hh'), -- The hour in which the job completes.
       -- to_char(end_time, 'yyyy-mm-dd'), -- The date on which the job completes. For daily granularity, use this line instead of the preceding line.
       sum(cast(cost_cpu/100/3600 AS DECIMAL(18,5))) cost_cpuh, -- The CPU-hours.
       sum(cast(cost_mem/1024/3600 AS DECIMAL(18,5))) cost_memh, -- The memory-hours.
       avg(datediff(end_time, start_time, 'ss')), -- The average runtime duration of jobs.
       min(datediff(end_time, start_time, 'ss')), -- The minimum runtime duration of jobs.
       max(datediff(end_time, start_time, 'ss'))  -- The maximum runtime duration of jobs.
       -- , status -- Optional grouping column. Use status for the job status, task_catalog for the project, or task_type for the job type.
FROM SYSTEM_CATALOG.INFORMATION_SCHEMA.tasks_history
WHERE ds >= to_char(date_add(getdate(), -7), 'yyyymmdd') -- You can add other filter conditions based on your business requirements.
  AND task_type IN ('SQL','SQLRT','LOT','CUPID','AlgoTask')
GROUP BY to_char(end_time, 'yyyy-mm-dd hh')
         -- to_char(end_time, 'yyyy-mm-dd') -- For daily granularity, use this line instead of the preceding line.
         -- , status -- Optional grouping column. It must match the grouping column in the SELECT list.
ORDER BY to_char(end_time, 'yyyy-mm-dd hh') ASC;
         -- to_char(end_time, 'yyyy-mm-dd'); -- For daily granularity, use this line instead of the preceding line.
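
The unit conversions in the statement can be checked outside MaxCompute. The sketch below mirrors the divisors used above (`cost_cpu/100/3600` and `cost_mem/1024/3600`) and the three duration aggregates. Reading `cost_cpu` as hundredths of a core-second and `cost_mem` as MB-seconds is an assumption inferred from those divisors. The sample rows are hypothetical.

```python
from datetime import datetime


def to_cpu_hours(cost_cpu: float) -> float:
    """Mirror of cost_cpu/100/3600 in the SQL statement."""
    return cost_cpu / 100 / 3600


def to_memory_hours(cost_mem: float) -> float:
    """Mirror of cost_mem/1024/3600 in the SQL statement."""
    return cost_mem / 1024 / 3600


# Hypothetical rows for jobs that completed in the same hour.
rows = [
    {"start": datetime(2024, 5, 6, 14, 0), "end": datetime(2024, 5, 6, 14, 10),
     "cost_cpu": 360000, "cost_mem": 3686400},
    {"start": datetime(2024, 5, 6, 14, 20), "end": datetime(2024, 5, 6, 14, 50),
     "cost_cpu": 720000, "cost_mem": 7372800},
]

# Runtime of each job in seconds, like datediff(end_time, start_time, 'ss').
durations = [(row["end"] - row["start"]).total_seconds() for row in rows]

summary = {
    "cost_cpuh": sum(to_cpu_hours(row["cost_cpu"]) for row in rows),
    "cost_memh": sum(to_memory_hours(row["cost_mem"]) for row in rows),
    "avg_seconds": sum(durations) / len(durations),
    "min_seconds": min(durations),
    "max_seconds": max(durations),
}
```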

FAQ

  • Question 1:

    • Problem description: After jobs are grouped by project or quota, some projects or quotas are not displayed in charts.

    • Possible causes: No jobs are available in the projects or quotas.

  • Question 2:

    • Problem description: After a comparison period is selected, the data that corresponds to the comparison period is not available.

    • Possible causes: The project or quota may not have existed yet or may have been deleted during the comparison period. Alternatively, no jobs are available in the project or quota within the comparison period.

References

After you view the data of each metric on the Resource Observation page, you can optimize and adjust the execution plans and resource allocation of jobs based on your business requirements.

  • You can reconfigure resources. For more information about how to configure quota plans and time plans in a quota resource group, see Configure quotas.

  • You can configure job priorities. For more information, see Job priority.