Monitoring and alerting

Updated at: 2025-01-27 03:19

When you use MaxCompute, you may need to monitor subscription resources and the consumption of pay-as-you-go jobs. Knowing the status of your resources helps you upgrade resources or plan jobs in time. You can also configure alert rules. If the status of a resource meets an alert rule, CloudMonitor automatically sends you an alert notification.

Monitoring and alerting scheme

The monitoring and alerting feature of MaxCompute allows you to perform the following operations:

  • Use CloudMonitor to configure metrics. This way, you can monitor subscription resources and real-time job consumption.

    Note

    You can log on to the MaxCompute console and view the number of alerts for each metric in the Alert and Risk Warnings section of the Overview page.

    • View monitoring charts on a dashboard to track changes in metrics in real time. For more information, see Configure a dashboard.

    • Configure custom alert rules and add alert contacts. If the value of a metric reaches or exceeds the specified alert threshold, CloudMonitor automatically sends alert notifications to the specified contacts. Alert notifications can be sent by using phone calls, text messages, emails, or a DingTalk chatbot. For more information, see Configure alert rules.

  • Use the MaxCompute client to monitor the resource consumption of an SQL statement. For more information, see Configure an upper limit for the resources that are consumed by an SQL statement.

Metrics

The following table describes the metrics.

| Metric type | Metric category | Metric | Description |
| --- | --- | --- | --- |
| MaxCompute_Subscription | level1 | PrePayQuotaCpuUsagePercentageOnLevel1 | The CPU utilization of level-1 quotas (sum of the CPU utilization of reserved CUs and elastically reserved CUs). Unit: %. The system collects data once every minute. |
| MaxCompute_Subscription | level1 | PrePayQuotaCpuUsageValueOnLevel1 | The total number of utilized CPU cores of level-1 quotas. Unit: core. The system collects data once every minute. |
| MaxCompute_Subscription | level1 | PrePayQuotaMemUsagePercentageOnLevel1 | The memory usage of level-1 quotas (sum of the memory usage of reserved CUs and elastically reserved CUs). Unit: %. The system collects data once every minute. |
| MaxCompute_Subscription | level1 | PrePayQuotaMemUsageValueOnLevel1 | The total size of used memory resources of level-1 quotas. Unit: MB. The system collects data once every minute. |
| MaxCompute_Subscription | level2 | PrePayQuotaCpuUsagePercentageOnLevel2 | The CPU utilization of level-2 quotas (sum of the CPU utilization of reserved CUs specified by minCU and elastically reserved CUs). Unit: %. The system collects data once every minute. |
| MaxCompute_Subscription | level2 | PrePayQuotaCpuUsageValueOnLevel2 | The total number of used CPU cores of level-2 quotas. Unit: core. The system collects data once every minute. |
| MaxCompute_Subscription | level2 | PrePayQuotaMemUsagePercentageOnLevel2 | The memory usage of level-2 quotas (sum of the memory usage of reserved CUs specified by minCU and elastically reserved CUs). Unit: %. The system collects data once every minute. |
| MaxCompute_Subscription | level2 | PrePayQuotaMemUsageValueOnLevel2 | The total size of used memory resources of level-2 quotas. Unit: MB. The system collects data once every minute. |
| MaxCompute_Subscription | level2 | PrePayQuotaJobWaitingNumberOnLevel2 | The number of waiting jobs that use level-2 quotas. The system collects data once every minute. |
| MaxCompute_PayAsYouGo | N/A | PayAsYouGo jobs' daily consumption in USD | The metric that is used to monitor the daily fees of SQL and MapReduce jobs in a project. You can specify the maximum daily consumption amount (USD). If the daily consumption amount of a project reaches or exceeds the value of this parameter, an alert is triggered. |
| MaxCompute_PayAsYouGo | N/A | PayAsYouGo jobs' monthly consumption in USD | The metric that is used to monitor the monthly fees of SQL and MapReduce jobs in a project. You can specify the maximum monthly consumption amount (USD). If the monthly consumption amount of a project reaches or exceeds the value of this parameter, an alert is triggered. |

You can configure a dashboard or alert rules for a metric. For more information, see Configure a dashboard or Configure alert rules.
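
The pay-as-you-go consumption metrics trigger an alert when the daily (or monthly) fees of a project reach or exceed the configured maximum. The following Python sketch only illustrates that comparison. It is not CloudMonitor or MaxCompute code: the job records, the project names, and the helper functions `daily_consumption` and `breaches` are hypothetical and exist only for this example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical job records: (project, day, job type, fee in USD).
JOBS = [
    ("proj_a", date(2025, 1, 26), "SQL", 3.50),
    ("proj_a", date(2025, 1, 26), "MapReduce", 2.00),
    ("proj_a", date(2025, 1, 27), "SQL", 1.25),
    ("proj_b", date(2025, 1, 26), "SQL", 0.75),
]


def daily_consumption(jobs):
    """Sum the fees of all jobs per (project, day)."""
    totals = defaultdict(float)
    for project, day, _job_type, fee in jobs:
        totals[(project, day)] += fee
    return dict(totals)


def breaches(totals, max_daily_usd):
    """Return the (project, day) pairs whose total reaches or exceeds the limit."""
    return sorted(key for key, total in totals.items() if total >= max_daily_usd)


totals = daily_consumption(JOBS)
alerts = breaches(totals, max_daily_usd=5.50)
```

With a limit of 5.50 USD, only `proj_a` on 2025-01-26 is reported, because its total equals the limit and the rule uses "reaches or exceeds" rather than strictly "exceeds". A monthly check would group by month instead of by day.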

Configure a dashboard

  1. Log on to the CloudMonitor console.

  2. In the left-side navigation pane, choose Dashboard > Custom Dashboard.

  3. On the Custom Dashboards page, click Create Dashboard. On the page that appears, click Add Chart in the upper-right corner.

  4. On the New Board page, select a chart type and metric.

    | Operation | Parameter | Description |
    | --- | --- | --- |
    | Select a chart type | Line, Area, Table, Heat Map, or Pie Chart | Select a chart type based on your business requirements. The following chart types are provided on a dashboard: Line, Area, Table, Heat Map, and Pie Chart. |
    | Select metrics | Product Name | Select a metric type of MaxCompute from the Cloud Services drop-down list on the Dashboards tab. For more information about the metric types of MaxCompute, see Metrics. |
    | Select metrics | Metric Name | Select a metric from the Monitoring indicators drop-down list. For more information about the metrics of MaxCompute, see Metrics. |
    | Select metrics | Resource | Select the regions and projects that you want to monitor from the Resource drop-down list. |

  5. After the configuration is complete, click OK. Then, you can view the charts of the metrics in your custom dashboard.

    Note

    For more information about how to add a monitoring chart, see Manage the monitoring charts of a custom dashboard.

Configure alert rules

You can configure alert rules for each metric described in Metrics.

You can configure an alert rule for a metric of a resource group so that an alert is triggered if the CPU utilization or memory usage of the CUs in a MaxCompute_Subscription quota group exceeds the specified threshold. For example, assume that the resource group that you want to monitor has 150 CUs. The CPU utilization of a fully used CU is 100%, so the maximum CPU utilization of all CUs is 150 × 100% = 15000%. You can configure an alert rule that triggers an alert if the CPU utilization exceeds 12000%, which is 80% of the maximum. If you receive the alert, the resource group is about to be fully utilized, and jobs may be queued if you continue to submit them. You can then upgrade the configurations of the resource group or plan jobs based on your business requirements. To configure alert rules in this scenario, perform the following steps:

  1. Log on to the CloudMonitor console.

  2. In the left-side navigation pane, choose Alerts > Alert Rules.

  3. On the Alert Rules page, click Create Alert Rule.

  4. In the Create Alert Rule panel, configure alert rule parameters for this scenario. For more information about the parameters, see Create an alert rule. For more information about how to configure an alert contact, see Create an alert contact or alert contact group.

    The following table describes the key parameters that you must configure for an alert rule in the preceding scenario.

    Parameter

    Description

    Product

    Select MaxCompute_Subscription from the Product drop-down list.

    Resource Range

    Select Instances.

    Associated Resources

    • Region: Select the region in which the MaxCompute project resides from the Region drop-down list in the upper-left corner of the pane that appears.

    • QuotaGroup: Select the quota group that you want to monitor from the quota group list. For more information about quota groups, see Quota management (new).

    Add Rule

    • Rule Name: the name of the alert rule.

    • Metric Type: Select Single Metric.

    • Metric: Select Subscription quota cpu usage from the drop-down list.

      Note

      You can also select QuotaGroupJobWaitQueue from the Rule Description drop-down list. If the CPU utilization is high and a large number of jobs are queued for N consecutive periods, manual intervention may be required.

  5. Click Confirm.
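
The threshold arithmetic in this scenario, and the idea of alerting only after N consecutive periods above the threshold, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not CloudMonitor's implementation: the functions `cpu_threshold_percent` and `should_alert` and the sample values are hypothetical, and the strict "greater than" comparison mirrors the "exceeds" wording of the scenario.

```python
def cpu_threshold_percent(cu_count, capacity_percent):
    """Alert threshold in percent, assuming each CU contributes up to 100%.

    capacity_percent is the share of total capacity at which to alert,
    given as an integer, for example 80 for 80%.
    """
    max_percent = cu_count * 100          # 150 CUs -> 15000%
    return max_percent * capacity_percent // 100


def should_alert(samples, threshold, periods):
    """Return True if the last `periods` samples all exceed `threshold`."""
    if periods <= 0 or len(samples) < periods:
        return False
    return all(value > threshold for value in samples[-periods:])


threshold = cpu_threshold_percent(150, 80)    # 12000

# Hypothetical per-minute CPU utilization samples, in percent.
samples = [11000, 12500, 13000, 12100]
alert_3 = should_alert(samples, threshold, periods=3)   # True
alert_4 = should_alert(samples, threshold, periods=4)   # False
```

Requiring several consecutive periods above the threshold avoids alerts on short spikes. In the sketch, the last three samples all exceed 12000%, so a three-period rule fires, whereas a four-period rule does not because the first sample is below the threshold.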

References

For more information about the limits on the consumption of pay-as-you-go computing tasks and related alerts, see Consumption control.
