You can configure alert rules in the Cloud Monitor console to handle alerts in a timely manner.
Terms
running metric
The metric that helps you monitor the status of instances. The metric is dynamically collected and calculated by Artificial Intelligence Recommendation (AIRec) based on the running instances. You do not need to worry about the process of calculating metrics.
Cloud Monitor
An Alibaba Cloud service. AIRec synchronizes the calculated running metrics to Cloud Monitor. You can view the metrics in the Cloud Monitor console and create alert rules. After you activate AIRec, you can view the running metrics in the Cloud Monitor console for a free trial.
dashboard
The page that displays metrics in the Cloud Monitor console. You can view multiple metrics and reorganize the sizes and locations of charts in a dashboard.
chart
The display form of running metrics. One chart can show multiple metrics.
RAM user authorization
By default, a RAM user does not have permissions to view the metrics in the Cloud Monitor console. If you want to view the metrics as a RAM user, you must log on to the Cloud Monitor console by using your Alibaba Cloud account and authorize the RAM user. For more information, see Authorize Cloud Monitor to access your logs.
quick view of metrics
AIRec allows you to quickly view all metrics. Log on to the Cloud Monitor console by using the account that is used to log on to the AIRec console. Then, specify the instance whose metrics you want to view.
Metrics
The following table describes the metrics that are calculated by AIRec. You can set a threshold for each metric in an alert rule. For example, a rule on BehaviorFailedTps can trigger an alert when behavior data fails to be pushed at a rate greater than or equal to (>=) N times per second for one 20-minute period. You can adjust the number of 20-minute periods, the comparison operator, and N for this metric.
Metric | Unit | Calculation cycle | Description |
---|---|---|---|
RecommendQPS | QPS | One minute | The average number of recommendation requests per second in a cycle. |
RecommendLatency | Seconds | One minute | The average response time of recommendation requests. |
OverRateQPS | QPS | One minute | The average number of failed requests per second due to throttling. |
UserQuotaUsedRatio | Percentage | 10 minutes | The ratio of the number of entries actually uploaded to the user table to the purchased quota, as of the time of calculation. |
ItemQuotaUsedRatio | Percentage | 10 minutes | The ratio of the number of entries actually uploaded to the item table to the purchased quota, as of the time of calculation. |
UserTps | Count/Second | One minute | The average number of users uploaded per second. All upload commands are counted. |
ItemTps | Count/Second | One minute | The average number of items uploaded per second. All upload commands are counted. |
BehaviorTps | Count/Second | One minute | The average number of behaviors uploaded per second. All upload commands are counted. |
UserFailedTps | Count/Second | One minute | The average number of users that fail to upload per second. All upload commands are counted. |
ItemFailedTps | Count/Second | One minute | The average number of items that fail to upload per second. All upload commands are counted. |
BehaviorFailedTps | Count/Second | One minute | The average number of behaviors that fail to upload per second. All upload commands are counted. |
BehaviorTimeLag | Seconds | One minute | The average interval between bhv_time and the time at which the data is uploaded. This value reflects the latency of uploading behavior data. |
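The way an alert condition combines a number of periods, a comparison operator, and a threshold N can be illustrated with a short sketch. This is not how Cloud Monitor is implemented. The function name `rule_triggered`, the `OPERATORS` mapping, and the idea that each list element is the metric value for one evaluation period are assumptions made only for illustration.

```python
import operator

# Hypothetical mapping from the operator shown in a rule to a Python comparison.
OPERATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
}


def rule_triggered(samples, op, threshold, periods):
    """Return True if the most recent `periods` samples all satisfy
    `value <op> threshold`.

    samples   -- metric values, one per evaluation period, oldest first
    op        -- one of the keys in OPERATORS
    threshold -- the value N to compare against
    periods   -- how many consecutive periods must meet the condition
    """
    if periods <= 0 or len(samples) < periods:
        return False
    compare = OPERATORS[op]
    return all(compare(value, threshold) for value in samples[-periods:])
```

For example, `rule_triggered([0.0, 2.5, 3.0], ">=", 2.0, 2)` checks whether the last two periods of a metric such as BehaviorFailedTps both reached at least 2 failures per second.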
The following table describes the recommended alert rules.
Name | Condition | Remarks |
---|---|---|
OverRateQPS | >0 | Check the purchased QPS quota against the actual QPS. |
UserQuotaUsedRatio | >90 | Check the purchased quota against the actual number of entries uploaded to the user table. |
ItemQuotaUsedRatio | >90 | Check the purchased quota against the actual number of entries uploaded to the item table. |
BehaviorTimeLag | >7200 | Latency in uploading behavior data affects the recommendation effect. |
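The recommended thresholds can also be written down as data and checked against a snapshot of metric values, for example in a script that reviews exported metrics. The sketch below is only an illustration. The dictionary `RECOMMENDED_RULES` copies the thresholds from the table, while the function `breached_rules` and the shape of its `metrics` argument are hypothetical and not part of AIRec or Cloud Monitor.

```python
# Thresholds taken from the recommended alert rules table.
# Each rule fires when the metric value is strictly greater than the threshold.
RECOMMENDED_RULES = {
    "OverRateQPS": 0,
    "UserQuotaUsedRatio": 90,
    "ItemQuotaUsedRatio": 90,
    "BehaviorTimeLag": 7200,
}


def breached_rules(metrics):
    """Return the sorted names of metrics that exceed their recommended threshold.

    metrics -- dict that maps a metric name to its latest value (hypothetical
               input format; metrics that are missing from the dict are skipped)
    """
    return sorted(
        name
        for name, threshold in RECOMMENDED_RULES.items()
        if name in metrics and metrics[name] > threshold
    )
```

A value that is exactly equal to a threshold, such as a UserQuotaUsedRatio of 90, does not count as a breach because the recommended conditions use the greater-than operator.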
Procedure
To create an alert rule, perform the following steps by using your Alibaba Cloud account or the authorized RAM user:
1. Log on to the AIRec console. In the left-side navigation pane, click Overview. In the upper-right corner of the Overview page, the Alert History section displays the details of historical alerts. If no alert rules exist, click Configure Alert Rules.
2. Click Create Alert Rule.
3. Configure the parameters in the Related Resource section. If you set Resource Range to All Resources, AIRec sends you alert notifications if any instance meets the condition of the alert rule. If you specify an instance, AIRec sends you alert notifications only if the specified instance meets the condition of the alert rule.
4. Set Alert Rules. The Mute for parameter specifies the interval at which AIRec resends the alert notification if the alert is not handled after it is detected.
5. Configure the parameters in the Notification Method section.
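The effect of the Mute for parameter can be pictured with a small simulation. The sketch assumes that a notification is sent when an alert is first detected and is repeated only after the mute interval has elapsed while the alert is still active. The exact behavior of Cloud Monitor may differ, and the function name `notification_times` is hypothetical.

```python
def notification_times(firing_times, mute_seconds):
    """Return the timestamps at which a notification would be sent.

    firing_times -- ascending timestamps (in seconds) at which the alert
                    condition was evaluated and found to be met
    mute_seconds -- the mute interval: minimum number of seconds between
                    two notifications for the same unresolved alert
    """
    sent = []
    last_sent = None
    for timestamp in firing_times:
        # Send the first notification immediately, then wait out the mute interval.
        if last_sent is None or timestamp - last_sent >= mute_seconds:
            sent.append(timestamp)
            last_sent = timestamp
    return sent
```

With a mute interval of 3600 seconds, an alert that stays active for two hours and is evaluated every minute would produce a notification at the start, after one hour, and after two hours, instead of one notification per minute.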
Features provided by Cloud Monitor
Custom Dashboard
Log on to the Cloud Monitor console
Log on to the Cloud Monitor console by using the account that is used to log on to the AIRec console. In the left-side navigation pane, choose Dashboard > Custom Dashboard.
Create Dashboard
On the page that appears, click Create Dashboard. In the dialog box that appears, enter the name of the custom dashboard. If the current account is associated with multiple instances, we recommend that you create a dashboard for each instance.
Add charts for metrics
We recommend that you add a chart for each metric. Perform the following steps to add charts for metrics:
1. Click Add View.
2. In the Select Metrics section of the Add View panel, select AIRec.
3. Set the Metrics parameter.
4. Enter a chart name that is easy to identify in the dashboard.
5. Set the Resource parameter.
6. Click Save. The chart is displayed in the dashboard.