Data Lake Analytics (DLA) allows you to use a pre-defined alert template to perform monitoring and alerting for a single job or all jobs. This topic describes how to perform monitoring and alerting for a specific job.
Prerequisites
- A virtual cluster of DLA is purchased.
- The AliyunARMSFullAccess policy is attached to the RAM user that you use. This is required if you use the credentials of a RAM user to view the metrics of virtual clusters.
- A Spark job is created. For more information about how to create a Spark job, see Create and run Spark jobs.
Configure a job delay alert for a specific job
In most cases, the template for job delay alerts sends an alert every time any job is delayed. To monitor and alert on a specific job in a specific virtual cluster, select Spark Structure Streaming Job Delay Longer Than 10s from the Alarm template drop-down list on the Create Alert panel. Then, change the value of Alarm expression (PromQL) based on the following syntax:
spark_structured_streaming_driver_latency{vcName="$(vcName)",app_id=~"$(job_id).*"} / 1000 > $(latency_sec)
The following table describes the parameters in Alarm expression (PromQL).
Parameter | Description |
---|---|
vcName | The name of the virtual cluster related to the job. |
job_id | The ID of the job. |
latency_sec | The delay threshold for processing the job, in seconds. An alert is triggered when the delay exceeds this value. |
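As an illustration of how the placeholders are filled in, the following Python sketch builds the delay alert expression from the three parameters. The function name `delay_alert_expr`, the sample values `my-vc` and `j2020abc`, and the character checks are assumptions made for this example and are not part of DLA. The division by 1000 in the expression suggests that the metric is reported in milliseconds, which would explain why the threshold is given in seconds.

```python
import re

# Assumption: job IDs contain only letters, digits, underscores, and hyphens.
# The ID is embedded in a PromQL regex matcher, so other characters are rejected.
_SAFE_JOB_ID = re.compile(r"[A-Za-z0-9_-]+")


def delay_alert_expr(vc_name: str, job_id: str, latency_sec: float) -> str:
    """Return the job delay alert expression with the placeholders filled in."""
    if '"' in vc_name or "\\" in vc_name or not vc_name:
        raise ValueError("vc_name must be non-empty and free of quotes and backslashes")
    if not _SAFE_JOB_ID.fullmatch(job_id):
        raise ValueError("job_id contains characters that are unsafe in a regex matcher")
    if latency_sec <= 0:
        raise ValueError("latency_sec must be positive")
    # Doubled braces produce literal { and } in an f-string.
    return (
        "spark_structured_streaming_driver_latency"
        f'{{vcName="{vc_name}",app_id=~"{job_id}.*"}}'
        f" / 1000 > {latency_sec}"
    )


if __name__ == "__main__":
    # Hypothetical virtual cluster name and job ID, 30-second threshold.
    print(delay_alert_expr("my-vc", "j2020abc", 30))
```

The printed string can then be pasted into the Alarm expression (PromQL) field.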
Configure a job stop alert for a specific job
In most cases, the template for job stop alerts sends an alert every time any job is stopped. To monitor and alert on a specific job in a specific virtual cluster, select Spark Job Stop from the Alarm template drop-down list on the Create Alert panel. Then, change the value of Alarm expression (PromQL) based on the following syntax:
sum by (parent_job) (label_replace(up{pod_name=~"${job_id}.*-driver"}, "parent_job", "$1", "pod_name", "(.*?)-(.*)")) < 1
The following table describes the parameter in Alarm expression (PromQL).
Parameter | Description |
---|---|
job_id | The ID of the job. |
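The following Python sketch shows one way to fill in the job stop expression, and uses Python's `re` module to illustrate what the `label_replace` pattern `(.*?)-(.*)` captures as `parent_job`. The function names, the job ID `j2020abc`, and the pod name `j2020abc-0001-driver` are hypothetical and chosen only for this example. Prometheus evaluates the pattern with its own regex engine and anchors it to the whole label value, so `re.fullmatch` is used here as an approximation.

```python
import re
from typing import Optional

# Assumption: job IDs contain only letters, digits, underscores, and hyphens.
_SAFE_JOB_ID = re.compile(r"[A-Za-z0-9_-]+")


def stop_alert_expr(job_id: str) -> str:
    """Return the job stop alert expression with job_id filled in."""
    if not _SAFE_JOB_ID.fullmatch(job_id):
        raise ValueError("job_id contains characters that are unsafe in a regex matcher")
    # Doubled braces produce literal { and } in an f-string.
    return (
        "sum by (parent_job) (label_replace("
        f'up{{pod_name=~"{job_id}.*-driver"}}, '
        '"parent_job", "$1", "pod_name", "(.*?)-(.*)")) < 1'
    )


def parent_job_of(pod_name: str) -> Optional[str]:
    """Approximate the label_replace step: return capture group 1, or None if no match."""
    match = re.fullmatch(r"(.*?)-(.*)", pod_name)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(stop_alert_expr("j2020abc"))
    # The lazy first group stops at the first hyphen in the pod name.
    print(parent_job_of("j2020abc-0001-driver"))
```

Because the first capture group is lazy, `parent_job` is the part of the pod name before the first hyphen. The expression then sums the `up` values per `parent_job`, so the comparison `< 1` holds when no driver pod for the job reports as up.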