MaxCompute: Manage jobs

Last Updated: Nov 19, 2024

The Operation and Maintenance feature of MaxCompute allows you to view historical jobs and jobs that are running, view job details, and analyze the resource load of a job when the job is running. This helps you manage jobs.

Feature description

The Operation and Maintenance feature of MaxCompute allows you to view and manage historical jobs and jobs that are running in your project.

  • Data developers can use this feature to view job details, identify job exceptions, and troubleshoot job issues at the earliest opportunity. For example, data developers can terminate one or more jobs in which exceptions occur to handle job issues.

  • Administrators can use this feature to view the resource load at a specific point in time and allocate and manage system resources in an efficient manner based on the quota group to which the resource belongs. This helps improve job execution efficiency and performance.

You can configure filter conditions to filter jobs on the Jobs page of the MaxCompute console. This helps you query the details of a job and analyze a job. You can perform the following operations on the Jobs page.

Operations

  • Filter jobs

    You can configure filter parameters to query the details of jobs. The following table describes the filter parameters.

    Filter parameters of a job

    Parameter

    Description

    Time Range

    Filters jobs by time range. You must specify the start time and end time of a query to filter jobs. This parameter is required.

    Note

    This parameter specifies a filter condition to filter all jobs. The chart that displays job statistics and the job list change based on the configuration of this parameter.

    You can specify the time range to obtain information about the following jobs:

    • Jobs that are complete within this time range.

    • Jobs that are running at the end time of the query or within the 3 minutes before the end time. This 3-minute window is the observation period for job snapshots, so the job snapshots that are generated within it are returned.

    The default time range is the last 1 hour. The maximum time range is 7 days and the minimum time range is 2 minutes. You can search only for jobs from the last 45 days.

    You can select a preset time range. You can also click the Time Range field and specify a time range in the panel that appears. The following options are available:

    • 1h: the previous 1 hour

    • 12h: the previous 12 hours

    • 1d: the previous 1 day

    • Select a time range: Select the year, month, and day that you want to query, and click Select Time to select a time range.

    Choose Project

    Filters jobs by project.

    Note

    This parameter specifies a filter condition to filter all jobs. The chart that displays job statistics and the job list change based on the configuration of this parameter.

    You can select multiple MaxCompute projects at the same time. This parameter is empty by default.

    Select Quota

    Filters jobs by quota group.

    Note

    This parameter specifies a filter condition to filter all jobs. The chart that displays job statistics and the job list change based on the configuration of this parameter.

    You can select only subscription quota groups. This parameter is left empty by default.

    Note

    You do not need to specify this parameter when you query pay-as-you-go jobs.

    For more information about quota groups, see Manage quotas in the new MaxCompute console.

    Job Type

    Filters jobs by job type.

    Note

    This parameter specifies a filter condition to filter all jobs. The chart that displays job statistics and the job list change based on the configuration of this parameter.

    Valid values:

    • SQL: SQL jobs

    • SQLRT: MaxCompute Query Acceleration (MCQA) SQL jobs

    • LOT: MapReduce jobs

    • CUPID: Spark or Mars jobs

    • AlgoTask: Platform for AI (PAI) jobs

    • Graph: graph jobs

    Instance ID

    Filters jobs by instance ID. You can enter the instance ID of the job that you want to find to perform an exact match.

    Note

    You can specify this parameter to filter jobs in the job list. The job list changes based on the configuration of this parameter.

    This parameter is empty by default.

    For more information about instance IDs, see View instance information.

    Job Owner

    Filters jobs by submitter. A submitter indicates an account that submits MaxCompute jobs.

    Note

    You can specify this parameter to filter jobs in the job list. The job list changes based on the configuration of this parameter.

    This parameter is empty by default.

    Fuzzy match is not supported. The value of this parameter must be a complete account name, such as ALIYUN$xxx or RAM$xxx.

    ExtNodeId

    Filters jobs by the job ID of the data source, such as a node ID of DataWorks. For more information about node IDs of DataWorks, see Configure basic properties.

    Note

    You can specify this parameter to filter jobs in the job list. The job list changes based on the configuration of this parameter.

    Signature

    Filters jobs by the signature of SQL jobs.

    Note

    You can specify this parameter to filter jobs in the job list. The job list changes based on the configuration of this parameter.

    This parameter is available only for SQL jobs. You can use the signature to find the instances that are generated each time the same SQL statement is executed.

    This parameter is empty by default.

    Latest Status

    Filters jobs by job state.

    Note

    You can specify this parameter to filter jobs in the job list. The job list changes based on the configuration of this parameter.

    Valid values:

    • Running: The job is running or is not complete.

    • Success: The job is complete.

    • Failed: The job fails to run.

    • Canceled: The job is canceled.

    • Submitted: The job is submitted and waiting for computing resources.

    This parameter is empty by default. This indicates that jobs in all states are queried.

    Note

    The state in the Latest Status column is the overall state of a job. However, multiple jobs may be run in parallel and each job can be in a specific substate. You can go to the LogView page to view the job states. For more information, see Use LogView V2.0 to view job information.

    Intelligent diagnosis

    Filters jobs by the labels in the intelligent diagnosis results. By default, no label is selected. For more information about what each label means, see Intelligent diagnostics for jobs.

  • Sort jobs

    The job filtering results are sorted by job completion time in descending order, with unfinished jobs appearing at the top. Basic single-column sorting and advanced multi-column sorting are supported.

    • Basic single-column sorting: Sort the column with a sort button in the list in ascending or descending order.

    • Advanced multi-column sorting: Click the Advanced Sorting button in the upper-right corner of the list, add columns by clicking Add Sort, and specify the sort order such as ascending and descending for each column. Click OK to apply the multi-column sorting.

    Note

    When advanced sorting conditions are applied, basic single-column sorting cannot be performed. You need to click the Advanced Sorting button in the upper-right corner of the list, then click Reset and OK before you can perform basic single-column sorting again.

  • View job details

    To view the details of a job, perform the following steps: In the job list, find the desired job and click LogView in the Actions column to go to the LogView page. On the page that appears, view the status, details, and results of the job.

  • Terminate jobs

    You can terminate one or more jobs that are in the Running state at a time.

  • Jobs Insight

    You can use this feature on an individual job to view the job summary, the resource consumption of the job, and the resource allocation of the quota at a specific point in time, and to trigger intelligent diagnostics for the job. For more information, see Intelligent diagnostics for jobs.

    Note
    • Intelligent diagnostics are available exclusively for SQL jobs.

    • Jobs with a runtime less than 2 minutes or jobs of types other than SQL, MapReduce, Spark, and Mars do not have job-level resource consumption data.
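The limits stated for the Time Range filter parameter (default of 1 hour, minimum of 2 minutes, maximum of 7 days, and a 45-day search history) can be checked before you enter a custom range. The following is a minimal sketch, not console behavior: the function names check_time_range and default_time_range are hypothetical, and whether the console treats the boundaries as inclusive is an assumption.

```python
from datetime import datetime, timedelta

# Limits quoted from the Time Range parameter description.
DEFAULT_RANGE = timedelta(hours=1)
MIN_RANGE = timedelta(minutes=2)
MAX_RANGE = timedelta(days=7)
LOOKBACK_LIMIT = timedelta(days=45)


def check_time_range(start, end, now):
    """Return a list of problems with a query window; an empty list means valid.

    Hypothetical helper. Boundary handling (inclusive limits) is an assumption.
    """
    problems = []
    span = end - start
    if span < MIN_RANGE:
        problems.append("range is shorter than 2 minutes")
    if span > MAX_RANGE:
        problems.append("range is longer than 7 days")
    if start < now - LOOKBACK_LIMIT:
        problems.append("start time is more than 45 days ago")
    return problems


def default_time_range(now):
    """Return the default query window: the last 1 hour."""
    return now - DEFAULT_RANGE, now
```

For example, a window from 50 days ago to 49 days ago is one day long, which satisfies the length limits, but it is rejected because it starts before the 45-day history limit.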

View the chart that displays job statistics

The stacked column chart shows the number of jobs by time and job state, based on the query results. This gives you an overview of the status of the jobs.

Description of the chart

The time interval represented by each column in the stacked column chart varies based on the Time Range parameter.

  • If the value specified by the Time Range parameter is less than or equal to 24 hours, the minimum time interval for each column in the stacked column chart is 2 minutes. The number of columns varies based on the length of the stacked column chart and can be up to 24.

  • If the value specified by the Time Range parameter is greater than 24 hours and less than or equal to 48 hours, the time interval for each column in the stacked column chart is 2 hours. The number of columns varies based on the length of the stacked column chart and can be up to 24.

  • If the value specified by the Time Range parameter is greater than 48 hours and less than or equal to 7 days, the time interval for each column in the stacked column chart is 6 hours. The number of columns varies based on the length of the stacked column chart and can be up to 29.
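The mapping from the Time Range value to the column interval can be expressed as a small lookup. This is an illustrative sketch only: the function name chart_column_interval is hypothetical, and for ranges of 24 hours or less the text gives only a minimum interval, so the function returns that lower bound rather than an exact column width.

```python
from datetime import timedelta


def chart_column_interval(time_range):
    """Map a Time Range value to the column interval of the job statistics chart.

    For ranges of 24 hours or less, only the documented minimum (2 minutes)
    is known, so that value is returned as a lower bound.
    """
    if time_range <= timedelta(0):
        raise ValueError("time range must be positive")
    if time_range <= timedelta(hours=24):
        return timedelta(minutes=2)  # documented minimum, not an exact width
    if time_range <= timedelta(hours=48):
        return timedelta(hours=2)
    if time_range <= timedelta(days=7):
        return timedelta(hours=6)
    raise ValueError("time range exceeds the 7-day maximum")
```

As a consistency check, a 48-hour range divides into 24 two-hour intervals, and a 7-day range divides into 28 six-hour intervals, which fits within the stated limit of 29 columns.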

A job can be in one of the following states:

  • In operation: The job is in the Running state in the snapshot.

  • Ended: The job is complete, fails, or is terminated.

Note

The system collects job snapshots at an interval of 3 minutes. Therefore, snapshots of some jobs may not be collected. As a result, snapshots of some jobs that are running may be empty.

You can drag-select some consecutive time intervals on the chart to shorten the time range.

Jobs

Query results are obtained based on filter conditions and provide job information for you to manage jobs.

Note

The following job information cannot be collected:

  • Snapshot information of some jobs. The system collects the snapshot information at an interval of 3 minutes. In this case, the system does not collect the snapshots of jobs that are started within 3 minutes before collection.

  • Information about specific MaxCompute jobs that are created based on the PAI service, especially the jobs that are created by using RAM users.

  • Information about jobs in the projects of the MaxCompute developer edition. The MaxCompute developer edition will be phased out.

Data is processed at specific intervals. Therefore, a job may appear in the Running state in the query results even though the LogView page shows that the job is complete. In most cases, this occurs when a job runs for a very short period of time. If this occurs, use the job state that is shown on the LogView page.

Parameters in the job list

Column

Description

Instance ID

The instance ID of the job. Each MaxCompute job runs as an instance and has an instance ID. This column also shows the project to which the job belongs, the type of the job, and the priority of the job.

You can find the desired instance and click LogView in the Actions column to go to the LogView page and view the progress of the job. For more information about how to view the job progress on the LogView page, see Use LogView V2.0 to view job information.

Job Owner

The Alibaba Cloud account that is used to run the MaxCompute job.

You can find the job owner based on the account information. If a job occupies an excessive number of resources and affects the execution of other jobs, you can contact the job owner to terminate the job. For more information about how to terminate a job, see Instance operations.

Latest Status

The latest status of the job.

Intelligent diagnosis

Tags are generated based on the results of the intelligent diagnosis for the job.

Submission Time

The time at which the job was submitted.

Start Time

The time when the job received the first batch of computing resources. For the jobs that run for a short period of time or do not consume computing resources such as DDL statements, the job submission time is used instead. By default, the column is not displayed. You can click the relevant option in the Choose Display Fields dialog box to display the column.

Waiting Time

The duration from the time the job is submitted to the time the job starts to run. By default, the column is not displayed. You can click the relevant option in the Choose Display Fields dialog box to display the column.

Running Time

The duration from the start time to the end time of the job. By default, the column is not displayed. You can click the relevant option in the Choose Display Fields dialog box to display the column.

End Time

The time at which the instance finished running.

Total Run Time

The interval from the time when the job was submitted to the time when the job was complete.

Quota

The quota group in which the job runs.

Snapshot Status

The state of the job when the job was observed.

CPU Utilization Percentage Snapshot

The CPU resources used by the MaxCompute job, as a percentage of the maximum CPU resources of the quota group, when the job is observed. The percentage is calculated by using the following formula: Amount of used CPU resources/(Amount of reserved CPU resources + Amount of elastically reserved CPU resources). This parameter is not available for the jobs that use pay-as-you-go resources and the jobs whose snapshots cannot be collected.

Memory Usage Percentage Snapshot

The memory resources used by the MaxCompute job, as a percentage of the maximum memory resources of the quota group, when the job is observed. The percentage is calculated by using the following formula: Amount of used memory resources/(Amount of reserved memory resources + Amount of elastically reserved memory resources). This parameter is not available for the jobs that use pay-as-you-go resources and the jobs whose snapshots cannot be collected.

Total Used CPU Resources

The total CPU resources that are used when you run a job. Unit: 100 CPU cores per second.

Total Amount of Used Memory Resources

The total memory consumption when you run a job. Unit: MB per second.

ExtPlantFrom

The client on which the job runs, such as DataWorks. This parameter is passed in when the client starts to run the job.

ExtNodeId

The ID of the task on which the job runs, such as the node ID of DataWorks. This parameter is passed in when the client starts to run the job.

ExtNodeOnDuty

The Alibaba Cloud account ID of the owner, such as the account of the DataWorks node owner. This parameter is passed in when the client starts to run the job.

Signature

The signature of the SQL job.

You can use the signature to find the instances that are generated each time the same SQL statement is executed.
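The derived columns in the job list follow directly from the definitions above: Waiting Time, Running Time, and Total Run Time are differences between the submission, start, and end times, and the two snapshot percentages divide used resources by the sum of reserved and elastically reserved resources. The following sketch illustrates the arithmetic only. The function names durations and utilization_percent are hypothetical, and returning None when no reserved capacity exists is an assumption made for this example.

```python
from datetime import datetime


def durations(submitted, started, ended):
    """Return (waiting, running, total) as timedelta values."""
    waiting = started - submitted   # Waiting Time: submission to start
    running = ended - started       # Running Time: start to end
    total = ended - submitted       # Total Run Time: submission to end
    return waiting, running, total


def utilization_percent(used, reserved, elastic_reserved):
    """Used resources as a percentage of reserved plus elastically reserved.

    All three arguments must use the same unit. Returns None when there is
    no reserved capacity to divide by (an assumption for this sketch).
    """
    capacity = reserved + elastic_reserved
    if capacity <= 0:
        return None
    return 100.0 * used / capacity
```

For example, a job that uses 30 units of CPU in a quota group with 50 reserved and 10 elastically reserved units has a CPU utilization snapshot of 30/(50 + 10), or 50%. By these definitions, Waiting Time plus Running Time equals Total Run Time.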

Examples of O&M scenarios

View the details about a specific job

  • Scenario

    You want to view the details of a specific MaxCompute job or a job that is scheduled by a DataWorks node on an hourly basis.

  • Procedure

    1. Log on to the MaxCompute console. In the left-side navigation pane, click Jobs.

    2. Specify the Time Range parameter based on your business requirements.

    3. Click Search.

    4. Select ExtNodeId or Instance ID from the drop-down list below the query results and enter the value of ExtNodeId or Instance ID for your job.

    5. Click the search icon to filter the jobs.

      In the query results, you can find the desired instance and click LogView in the Actions column to view the details of the job on the LogView page. For more information about LogView, see Use LogView V2.0 to view job information.

View the details about a job in a specific time range

  • Scenario

    You want to view the jobs that ran in the Project_1 and Project_2 projects during the last day, identify failed jobs, and troubleshoot errors.

  • Procedure

    1. Log on to the MaxCompute console. In the left-side navigation pane, click Jobs.

    2. On the Jobs page, set the Time Range parameter to 1d, or specify a custom time range from the same time on the previous day to the current time.

    3. Select Project_1 and Project_2 from the Choose Project drop-down list.

    4. In the query results, you can find the desired instance and click LogView in the Actions column to view the details of the job on the LogView page. For more information about LogView, see Use LogView V2.0 to view job information.

View the resources occupied by a job with a subscription quota at a specific point in time

  • Scenario

    A large number of resources in the quota group named Subscription Default Quota are occupied. As a result, multiple jobs are waiting for the resources of the quota group. You can use the following method to view the jobs that use the quota.

  • Procedure

    1. Log on to the MaxCompute console. In the left-side navigation pane, click Jobs.

    2. Set the Time Range parameter to 1h. Alternatively, specify a custom start time and set the end time to the current time. The end time is the time when you observe the job.

    3. Set the Select Quota parameter to Subscription Default Quota.

    4. Click Search.

      You can view the CPU Utilization Percentage Snapshot and Memory Usage Percentage Snapshot parameters of the jobs whose Snapshot Status is Running in the query results. For a job that has large values of these two parameters, check whether the resource usage meets your business requirements. Based on other job information, you can then determine whether the job runs as expected or needs to be terminated.

      Note

      In the query results, you can find the desired instance and click LogView in the Actions column to view the details of the job on the LogView page. For more information about LogView, see Use LogView V2.0 to view job information.

View details of an MCQA job

  • Scenario

    You want to view the status and details of MCQA jobs in the last day.

  • Procedure

    1. Log on to the MaxCompute console. In the left-side navigation pane, click Jobs.

    2. Set the Time Range parameter to 1d and select SQLRT (Query Acceleration) from the Job Type drop-down list.

    3. Click Search.

      In the query results, you can find the desired instance and click LogView in the Actions column to view the details of the job on the LogView page. For more information about LogView, see Use LogView V2.0 to view job information.

      Note

      For MCQA jobs, multiple SQL statements may be executed in the same session. One session corresponds to one instance ID. You can click an instance ID to view the status of all SQL statements in a session on the LogView page. Take note of the following points when you query a job of this type on the Operation and Maintenance page:

      • An active session indicates that some SQL statements are still being executed. If a session remains active, the job is in the Running state.

      • If a session expires or is closed, the job is in the Canceled state.

View the resource consumption of a job and the resource allocation of computing quotas at a specific point in time

  • Scenario

    If a job is not complete for a long period of time and you cannot locate the cause on the LogView page, or if a completed job ran slowly, you can analyze the job to check whether the issue is caused by insufficient resources.

  • Procedure

    1. Log on to the MaxCompute console. In the left-side navigation pane, click Jobs.

    2. On the Jobs page, specify the Time Range and Select Quota parameters and click Search to filter MaxCompute jobs.

    3. In the obtained results, find the desired job and click Insight in the Actions column to go to the Job Insights page.

    4. On the CU Usage tab, you can view the resource consumption in the lifecycle of the job.

      • Based on the resource consumption chart, you can view the trend of the number of compute units (CUs) that a job uses and the number of CUs that the job waits for within a specific period of time, together with the trend of the CU metrics at the quota group level in the same period. If the number of CUs used by the job is small, but the number of CUs used in the quota group is large or even continuously reaches the upper limit, the resources in the quota group are insufficient. In this case, other jobs preempt computing resources from the current job.

      • You can click a time point in the horizontal axis of the resource consumption trend chart to view the resource allocation in the quota group at the point in time. You can view the number of jobs that are using CUs and the number of jobs that wait to use CUs and view the statistics on the priorities of existing jobs. You can click the legend that corresponds to the desired priority to go to the job list and view the details of the jobs. This way, you can identify the jobs that preempt computing resources from the current job. You can adjust job priorities or manage computing resources to optimize job execution based on your business requirements. For more information, see Job priority or Manage quotas in the new MaxCompute console.
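The reasoning in this step (a job uses few CUs while the quota group stays at or near its upper limit) can be expressed as a simple check over CU samples read from the chart. This is a rough heuristic sketch and not part of MaxCompute: the function name diagnose_cu_contention is hypothetical, and both thresholds are illustrative assumptions rather than documented values.

```python
def diagnose_cu_contention(job_cu, quota_cu, quota_limit,
                           saturation=0.95, small_share=0.10):
    """Classify CU samples that were taken at the same points in time.

    job_cu      -- CUs used by the job at each sample
    quota_cu    -- CUs used by all jobs in the quota group at each sample
    quota_limit -- upper limit of the quota group, in CUs
    saturation and small_share are assumed thresholds for illustration.
    """
    if not job_cu or len(job_cu) != len(quota_cu):
        raise ValueError("need two non-empty series of equal length")
    if quota_limit <= 0:
        raise ValueError("quota_limit must be positive")
    # Count samples where the quota group is near its limit while the job
    # itself holds only a small share of the quota.
    contended = sum(
        1
        for j, q in zip(job_cu, quota_cu)
        if q >= saturation * quota_limit and j <= small_share * quota_limit
    )
    if contended * 2 >= len(job_cu):
        return "quota likely insufficient"
    return "no clear contention"
```

A result of "quota likely insufficient" only suggests where to look next. Confirm it by checking which jobs hold the CUs at that point in time, as described in the step above.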

What to do next

If a job occupies an excessive number of resources and affects the execution of other jobs, take one of the following actions:

  • If the job does not meet your business requirements, you can terminate the job.

  • If the job meets your business requirements, the resource configuration of the quota group is not appropriate. In this case, you must optimize the resource configuration plan. For more information, see Computing cost optimization.

References

You can run commands to view the details and status of a job and terminate a job. For more information, see Instance operations.