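This topic provides typical query, analysis, and alert configuration examples for Web Application Firewall (WAF) logs. You can use the alert parameters described here to add monitoring charts and configure alerts in a custom WAF log dashboard.
This topic uses the legacy Log Service alert configuration as an example. If you have upgraded to the new version of Log Service alerting, combine the query statements and recommended alert parameters in this topic with the topic on quickly setting up log alerts to complete the configuration.
4XX ratio anomaly alert
Recommended alert parameters:
Chart name: 4XX ratio (blocked requests ignored)
Query statement: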
user_id: <your Alibaba Cloud account ID> and not real_client_ip: <blocked request IP> | SELECT user_id, host AS "域名", Rate_2XX AS "2XX比例", Rate_3XX AS "3XX比例", Rate_4XX AS "4XX比例", Rate_5XX AS "5XX比例", countall AS "aveQPS", status_2XX, status_3XX, status_4XX, status_5XX, countall FROM( SELECT user_id, host, round( round(status_2XX * 1.0000 / countall, 4) * 100, 2 ) AS Rate_2XX, round( round(status_3XX * 1.0000 / countall, 4) * 100, 2 ) AS Rate_3XX, round( round(status_4XX * 1.0000 / countall, 4) * 100, 2 ) AS Rate_4XX, round( round(status_5XX * 1.0000 / countall, 4) * 100, 2 ) AS Rate_5XX, status_2XX, status_3XX, status_4XX, status_5XX, countall FROM( SELECT user_id, host, count_if( status >= 200 and status < 300 ) AS status_2XX, count_if( status >= 300 and status < 400 ) AS status_3XX, count_if( status >= 400 and status < 500 and status <> 444 and status <> 405 ) AS status_4XX, count_if( status >= 500 and status < 600 ) AS status_5XX, COUNT(*) AS countall FROM log GROUP BY host, user_id ) ) WHERE countall > 120 ORDER BY Rate_4XX DESC LIMIT 5
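The chart contains the following fields: aveQPS, 2XX比例 (2XX ratio), 3XX比例 (3XX ratio), 4XX比例 (4XX ratio), and 5XX比例 (5XX ratio). aveQPS reflects the request volume of the domain name (in this query it is an alias for countall, the number of requests within the time range), and the other fields are the percentages of each class of response status code. 4XX比例 excludes the 444 and 405 status codes that WAF returns when it blocks HTTP flood (CC) attacks and web attacks, so the chart shows only status code changes caused by the service itself. You can freely combine these fields in the trigger condition. For example, aveQPS>10 && 2XX比例<60 matches a domain name whose aveQPS value is greater than 10 and whose 2XX ratio is below 60% within the time range.
Time range: 5 minutes (relative)
Check frequency: fixed interval, 5 minutes
Trigger condition:
$0.countall > 3000 && $0.4XX比例 > 80
Notification trigger threshold: 2
Notification interval: 10 minutes
Notification content:
- [Time]: ${FireTime} - [Uid]: ${Results[0].RawResults[0].user_id} - Domain name: ${Results[0].RawResults[0].域名} - Product: WAF - Total requests in the last 5 minutes: ${Results[0].RawResults[0].countall} - 2XX ratio: ${Results[0].RawResults[0].2XX比例}% - 3XX ratio: ${Results[0].RawResults[0].3XX比例}% - 4XX ratio: ${Results[0].RawResults[0].4XX比例}% - 5XX ratio: ${Results[0].RawResults[0].5XX比例}%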
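The ratio calculation and trigger condition above can be reproduced offline as a sanity check. The following is a minimal Python sketch, not part of Log Service: the function names `status_rates` and `should_alert` are hypothetical, the input is assumed to be a plain list of response status codes for one domain name, and Python's `round` may differ from the query engine's rounding on exact ties.

```python
from collections import Counter

# Status codes that WAF returns for blocked requests, per the description above.
WAF_BLOCK_CODES = {444, 405}


def status_rates(statuses):
    """Return (total, rates) where rates maps '2XX'..'5XX' to a percentage.

    Blocked requests (444, 405) still count toward the total but are left
    out of the 4XX bucket, mirroring the count_if filter in the query.
    """
    total = len(statuses)
    if total == 0:
        return 0, {f"{c}XX": 0.0 for c in (2, 3, 4, 5)}
    counts = Counter()
    for status in statuses:
        if status in WAF_BLOCK_CODES:
            continue
        counts[status // 100] += 1
    rates = {
        f"{c}XX": round(round(counts[c] / total, 4) * 100, 2)
        for c in (2, 3, 4, 5)
    }
    return total, rates


def should_alert(total, rates, min_requests=3000, max_4xx=80):
    """Mirror of the trigger condition: countall > 3000 and 4XX ratio > 80."""
    return total > min_requests and rates["4XX"] > max_4xx


if __name__ == "__main__":
    sample = [200] * 500 + [404] * 3400 + [444] * 100
    total, rates = status_rates(sample)
    print(total, rates, should_alert(total, rates))
```

Keeping the blocked requests in the denominator but out of the 4XX numerator is what lets the ratio reflect only errors caused by the service itself.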
5XX ratio anomaly alert
Recommended alert parameters:
Chart name: 5XX ratio
Query statement:
user_id: <your Alibaba Cloud account ID> and not real_client_ip: <blocked request IP> | SELECT user_id, host AS "域名", Rate_2XX AS "2XX比例", Rate_3XX AS "3XX比例", Rate_4XX AS "4XX比例", Rate_5XX AS "5XX比例", countall AS "相对时间内访问量", status_2XX, status_3XX, status_4XX, status_5XX, countall FROM( SELECT user_id, host, round( round(status_2XX * 1.0000 / countall, 4) * 100, 2 ) AS Rate_2XX, round( round(status_3XX * 1.0000 / countall, 4) * 100, 2 ) AS Rate_3XX, round( round(status_4XX * 1.0000 / countall, 4) * 100, 2 ) AS Rate_4XX, round( round(status_5XX * 1.0000 / countall, 4) * 100, 2 ) AS Rate_5XX, status_2XX, status_3XX, status_4XX, status_5XX, countall FROM( SELECT user_id, host, count_if( status >= 200 and status < 300 ) AS status_2XX, count_if( status >= 300 and status < 400 ) AS status_3XX, count_if( status >= 400 and status < 500 ) AS status_4XX, count_if( status >= 500 and status < 600 ) AS status_5XX, COUNT(*) AS countall FROM log GROUP BY host, user_id ) ) WHERE countall > 120 ORDER BY Rate_5XX DESC LIMIT 5
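Time range: 5 minutes (relative)
Check frequency: fixed interval, 5 minutes
Trigger condition:
$0.countall > 3000 && $0.5XX比例 > 80
Notification trigger threshold: 2
Notification interval: 10 minutes
Notification content:
- [Time]: ${FireTime} - [Uid]: ${Results[0].RawResults[0].user_id} - Domain name: ${Results[0].RawResults[0].域名} - Product: WAF - Total requests in the last 5 minutes: ${Results[0].RawResults[0].countall} - 2XX ratio: ${Results[0].RawResults[0].2XX比例}% - 3XX ratio: ${Results[0].RawResults[0].3XX比例}% - 4XX ratio: ${Results[0].RawResults[0].4XX比例}% - 5XX ratio: ${Results[0].RawResults[0].5XX比例}%
QPS anomaly alert
Recommended alert parameters:
Chart name: QPS TOP 5
Query statement: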
user_id: <your Alibaba Cloud account ID> and not real_client_ip: <blocked request IP> | SELECT user_id, host, Rate_2XX, Rate_3XX, Rate_4XX, Rate_5XX, countall / 60 AS "aveQPS", status_2XX, status_3XX, status_4XX, status_5XX, countall FROM( SELECT user_id, host, round( round(status_2XX * 1.0000 / countall, 4) * 100, 2 ) AS Rate_2XX, round( round(status_3XX * 1.0000 / countall, 4) * 100, 2 ) AS Rate_3XX, round( round(status_4XX * 1.0000 / countall, 4) * 100, 2 ) AS Rate_4XX, round( round(status_5XX * 1.0000 / countall, 4) * 100, 2 ) AS Rate_5XX, status_2XX, status_3XX, status_4XX, status_5XX, countall FROM( SELECT user_id, host, count_if( status >= 200 and status < 300 ) AS status_2XX, count_if( status >= 300 and status < 400 ) AS status_3XX, count_if( status >= 400 and status < 500 and status <> 444 and status <> 405 ) AS status_4XX, count_if( status >= 500 and status < 600 ) AS status_5XX, COUNT(*) AS countall FROM log GROUP BY host, user_id ) ) WHERE countall > 120 ORDER BY aveQPS DESC LIMIT 5
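Time range: 1 minute (relative)
Check frequency: fixed interval, 1 minute
Trigger condition:
$0.aveQPS >= 50
Notification trigger threshold: 1
Notification interval: 5 minutes
Notification content:
- [Time]: ${FireTime} - [Uid]: ${Results[0].RawResults[0].user_id} - Domain name: ${Results[0].RawResults[0].host} - Product: WAF - Average QPS over the last minute: ${Results[0].RawResults[0].aveQPS} - Response code 2xx_rate: ${Results[0].RawResults[0].Rate_2XX}% - Response code 3xx_rate: ${Results[0].RawResults[0].Rate_3XX}% - Response code 4xx_rate: ${Results[0].RawResults[0].Rate_4XX}% - Response code 5xx_rate: ${Results[0].RawResults[0].Rate_5XX}%
QPS surge alert
Recommended alert parameters:
Chart name: QPS surge monitoring
Query statement: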
user_id: <your Alibaba Cloud account ID> | SELECT t1.user_id, t1.now1mQPS, t1.past1mQPS, in_ratio, t1.host, t2.Rate_2XX, Rate_3XX, Rate_4XX, Rate_5XX, aveQPS FROM ( ( SELECT user_id, round(c[1] / 60, 0) AS now1mQPS, round(c[2] / 60, 0) AS past1mQPS, round( round(c[1] / 60, 0) / round(c[2] / 60, 0) * 100 - 100, 0 ) AS in_ratio, host FROM ( SELECT compare(t, 60) AS c, host, user_id FROM ( SELECT COUNT(*) AS t, host, user_id FROM log GROUP BY host, user_id ) GROUP BY host, user_id ) WHERE c[3] > 1.1 AND ( c[1] > 180 OR c[2] > 180 ) ) t1 JOIN ( SELECT user_id, host, Rate_2XX, Rate_3XX, Rate_4XX, Rate_5XX, countall / 60 AS "aveQPS", status_2XX, status_3XX, status_4XX, status_5XX, countall FROM ( SELECT user_id, host, round( round(status_2XX * 1.0000 / countall, 4) * 100, 2 ) AS Rate_2XX, round( round(status_3XX * 1.0000 / countall, 4) * 100, 2 ) AS Rate_3XX, round( round(status_4XX * 1.0000 / countall, 4) * 100, 2 ) AS Rate_4XX, round( round(status_5XX * 1.0000 / countall, 4) * 100, 2 ) AS Rate_5XX, status_2XX, status_3XX, status_4XX, status_5XX, countall FROM ( SELECT user_id, host, count_if( status >= 200 and status < 300 ) AS status_2XX, count_if( status >= 300 and status < 400 ) AS status_3XX, count_if( status >= 400 and status < 500 and status <> 444 and status <> 405 ) AS status_4XX, count_if( status >= 500 and status < 600 ) AS status_5XX, COUNT(*) AS countall FROM log GROUP BY host, user_id ) ) WHERE countall > 1 ) t2 ON t1.host = t2.host ) ORDER BY in_ratio DESC LIMIT 5
Time range: 1 minute (relative)
Check frequency: fixed interval, 1 minute
Trigger condition:
$0.now1mqps > 50 && $0.in_ratio > 300
Notification trigger threshold: 1
Notification interval: 5 minutes
Notification content:
- [Time]: ${FireTime} - [Uid]: ${Results[0].RawResults[0].user_id} - Domain name: ${Results[0].RawResults[0].host} - Product: WAF - Average QPS over the last minute: ${Results[0].RawResults[0].now1mqps} - QPS surge rate: ${Results[0].RawResults[0].in_ratio}% - Response code 2xx_rate: ${Results[0].RawResults[0].rate_2xx}% - Response code 3xx_rate: ${Results[0].RawResults[0].Rate_3XX}% - Response code 4xx_rate: ${Results[0].RawResults[0].Rate_4XX}% - Response code 5xx_rate: ${Results[0].RawResults[0].Rate_5XX}%
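To make the in_ratio arithmetic concrete, here is a small Python sketch of the same calculation. It assumes the two inputs are the request counts of the current minute and of the minute before it (the first two elements of the array that the query names c); the function names `qps_surge` and `surge_alert` are hypothetical. Unlike the query, the sketch returns None when the previous window rounds to 0 QPS instead of dividing by zero.

```python
def qps_surge(now_count, past_count, window_seconds=60):
    """Return (now_qps, past_qps, in_ratio) for two consecutive windows.

    in_ratio is the percentage increase of the rounded QPS values,
    i.e. now_qps / past_qps * 100 - 100, or None if past_qps is 0.
    """
    now_qps = round(now_count / window_seconds)
    past_qps = round(past_count / window_seconds)
    if past_qps == 0:
        return now_qps, past_qps, None
    in_ratio = round(now_qps / past_qps * 100 - 100)
    return now_qps, past_qps, in_ratio


def surge_alert(now_qps, in_ratio, min_qps=50, min_ratio=300):
    """Mirror of the trigger condition: now1mqps > 50 and in_ratio > 300."""
    return in_ratio is not None and now_qps > min_qps and in_ratio > min_ratio


if __name__ == "__main__":
    now_qps, past_qps, ratio = qps_surge(18000, 3600)
    print(now_qps, past_qps, ratio, surge_alert(now_qps, ratio))
```

With these thresholds, an alert requires the current QPS to be more than four times the previous minute's value, because an in_ratio of 300 means a 300% increase.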
QPS drop alert
Recommended alert parameters:
Chart name: QPS drop monitoring
Query statement:
user_id: <your Alibaba Cloud account ID> | SELECT t1.user_id, t1.now1mQPS, t1.past1mQPS, de_ratio, t1.host, t2.Rate_2XX, Rate_3XX, Rate_4XX, Rate_5XX, aveQPS FROM ( ( SELECT user_id, round(c[1] / 60, 0) AS now1mQPS, round(c[2] / 60, 0) AS past1mQPS, round( 100 - round(c[1] / 60, 0) / round(c[2] / 60, 0) * 100, 2 ) AS de_ratio, host FROM ( SELECT compare(t, 60) AS c, host, user_id FROM ( SELECT COUNT(*) AS t, host, user_id FROM log GROUP BY host, user_id ) GROUP BY host, user_id ) WHERE c[3] < 0.9 AND ( c[1] > 180 OR c[2] > 180 ) ) t1 JOIN ( SELECT user_id, host, Rate_2XX, Rate_3XX, Rate_4XX, Rate_5XX, countall / 60 AS "aveQPS", status_2XX, status_3XX, status_4XX, status_5XX, countall FROM ( SELECT user_id, host, round( round(status_2XX * 1.0000 / countall, 4) * 100, 2 ) AS Rate_2XX, round( round(status_3XX * 1.0000 / countall, 4) * 100, 2 ) AS Rate_3XX, round( round(status_4XX * 1.0000 / countall, 4) * 100, 2 ) AS Rate_4XX, round( round(status_5XX * 1.0000 / countall, 4) * 100, 2 ) AS Rate_5XX, status_2XX, status_3XX, status_4XX, status_5XX, countall FROM ( SELECT user_id, host, count_if( status >= 200 and status < 300 ) AS status_2XX, count_if( status >= 300 and status < 400 ) AS status_3XX, count_if( status >= 400 and status < 500 and status <> 444 and status <> 405 ) AS status_4XX, count_if( status >= 500 and status < 600 ) AS status_5XX, COUNT(*) AS countall FROM log GROUP BY host, user_id ) ) WHERE countall > 1 ) t2 ON t1.host = t2.host ) ORDER BY de_ratio DESC LIMIT 5
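The chart contains fields such as now1mqps (average QPS in the current minute), past1mqps (average QPS in the previous minute), de_ratio (QPS drop rate), and host. You can use these fields to set alert conditions as needed.
Time range: 1 minute (relative)
Check frequency: fixed interval, 1 minute
Trigger condition:
$0.now1mqps > 10 && $0.de_ratio > 50
Notification trigger threshold: 2
Notification interval: 5 minutes
Notification content:
- [Time]: ${FireTime} - [Uid]: ${Results[0].RawResults[0].user_id} - Domain name: ${Results[0].RawResults[0].host} - Product: WAF (overseas) - Average QPS over the last minute: ${Results[0].RawResults[0].now1mqps} - QPS drop rate: ${Results[0].RawResults[0].de_ratio}% - Response code 2xx_rate: ${Results[0].RawResults[0].rate_2xx}% - Response code 3xx_rate: ${Results[0].RawResults[0].Rate_3XX}% - Response code 4xx_rate: ${Results[0].RawResults[0].Rate_4XX}% - Response code 5xx_rate: ${Results[0].RawResults[0].Rate_5XX}%
ACL blocking alert (5 minutes)
Recommended alert parameters:
Chart name: ACL rule blocks
Query statement: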
user_id: <your Alibaba Cloud account ID> | SELECT user_id, host, count_if( final_plugin = 'waf' AND final_action = 'block' ) AS "规则防护引擎拦截量", count_if( final_plugin = 'cc' AND final_action = 'block' ) AS "CC拦截量", count_if( final_plugin = 'acl' AND final_action = 'block' ) AS "ACL拦截量", count_if( final_plugin = 'antiscan' AND final_action = 'block' ) AS "扫描防护拦截量", count_if( (final_plugin = 'waf' AND final_action = 'block') OR (final_plugin = 'cc' AND final_action = 'block') OR (final_plugin = 'acl' AND final_action = 'block') OR (final_plugin = 'antiscan' AND final_action = 'block') ) AS totalblock GROUP BY host, user_id HAVING ( "ACL拦截量" >= 0 AND "规则防护引擎拦截量" >= 0 AND "CC拦截量" >= 0 AND "扫描防护拦截量" >= 0 AND totalblock > 10 ) ORDER BY "ACL拦截量" DESC LIMIT 5
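Time range: 5 minutes (relative)
Check frequency: fixed interval, 5 minutes
Trigger condition:
$0.totalblock >= 500 && ($0.ACL拦截量 >= 500)
Notification trigger threshold: 1
Notification interval: 5 minutes
Notification content:
- [Time]: ${FireTime} - [Uid]: ${Results[0].RawResults[0].user_id} - Domain name: ${Results[0].RawResults[0].host} - Product: WAF - Total blocks in the last 5 minutes: ${Results[0].RawResults[0].totalblock} - ACL blocks: ${Results[0].RawResults[0].ACL拦截量} - Rule protection engine blocks: ${Results[0].RawResults[0].规则防护引擎拦截量} - CC blocks: ${Results[0].RawResults[0].CC拦截量} - Scan protection blocks: ${Results[0].RawResults[0].扫描防护拦截量}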
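The per-module block counts used by this alert (and by the three blocking alerts that follow) can be sketched in Python as below. This is an illustration only: it assumes each log entry is a dict with the `host`, `final_plugin`, and `final_action` fields that the query uses, and the function name `block_counts` and the `sort_by` parameter are hypothetical.

```python
from collections import defaultdict

# Protection modules counted by the query: rule protection engine, CC,
# ACL, and scan protection.
PLUGINS = ("waf", "cc", "acl", "antiscan")


def block_counts(logs, sort_by="acl", min_total=10, limit=5):
    """Aggregate blocked requests per domain name and protection module.

    Only entries with final_action == 'block' from a known module are
    counted. Domains with totalblock <= min_total are dropped, and the
    rest are sorted by the chosen module's count in descending order.
    """
    per_host = defaultdict(lambda: dict.fromkeys(PLUGINS, 0))
    for entry in logs:
        plugin = entry.get("final_plugin")
        if entry.get("final_action") == "block" and plugin in PLUGINS:
            per_host[entry["host"]][plugin] += 1
    rows = []
    for host, counts in per_host.items():
        total = sum(counts.values())
        if total > min_total:
            rows.append({"host": host, **counts, "totalblock": total})
    rows.sort(key=lambda row: row[sort_by], reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    logs = [{"host": "a.example", "final_plugin": "acl", "final_action": "block"}] * 600
    top = block_counts(logs)
    print(top[0], top[0]["totalblock"] >= 500 and top[0]["acl"] >= 500)
```

Changing `sort_by` to `"waf"`, `"cc"`, or `"antiscan"` corresponds to changing the ORDER BY column in the queries for the other blocking alerts.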
Rule protection engine blocking alert (5 minutes)
Recommended alert parameters:
Chart name: Rule protection engine blocks
Query statement:
user_id: <your Alibaba Cloud account ID> | SELECT user_id, host, count_if( final_plugin = 'waf' AND final_action = 'block' ) AS "规则防护引擎拦截量", count_if( final_plugin = 'cc' AND final_action = 'block' ) AS "CC拦截量", count_if( final_plugin = 'acl' AND final_action = 'block' ) AS "ACL拦截量", count_if( final_plugin = 'antiscan' AND final_action = 'block' ) AS "扫描防护拦截量", count_if( (final_plugin = 'waf' AND final_action = 'block') OR (final_plugin = 'cc' AND final_action = 'block') OR (final_plugin = 'acl' AND final_action = 'block') OR (final_plugin = 'antiscan' AND final_action = 'block') ) AS totalblock GROUP BY host, user_id HAVING ( "ACL拦截量" >= 0 AND "规则防护引擎拦截量" >= 0 AND "CC拦截量" >= 0 AND "扫描防护拦截量" >= 0 AND totalblock > 10 ) ORDER BY "规则防护引擎拦截量" DESC LIMIT 5
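Time range: 5 minutes (relative)
Check frequency: fixed interval, 5 minutes
Trigger condition:
$0.totalblock >= 500 && ($0.规则防护引擎拦截量 >= 500)
Notification trigger threshold: 1
Notification interval: 5 minutes
Notification content:
- [Time]: ${FireTime} - [Uid]: ${Results[0].RawResults[0].user_id} - Domain name: ${Results[0].RawResults[0].host} - Product: WAF - Total blocks in the last 5 minutes: ${Results[0].RawResults[0].totalblock} - ACL blocks: ${Results[0].RawResults[0].ACL拦截量} - Rule protection engine blocks: ${Results[0].RawResults[0].规则防护引擎拦截量} - CC blocks: ${Results[0].RawResults[0].CC拦截量} - Scan protection blocks: ${Results[0].RawResults[0].扫描防护拦截量}
CC blocking alert (5 minutes)
Recommended alert parameters:
Chart name: CC protection rule blocks
Query statement: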
user_id: <your Alibaba Cloud account ID> | SELECT user_id, host, count_if( final_plugin = 'waf' AND final_action = 'block' ) AS "规则防护引擎拦截量", count_if( final_plugin = 'cc' AND final_action = 'block' ) AS "CC拦截量", count_if( final_plugin = 'acl' AND final_action = 'block' ) AS "ACL拦截量", count_if( final_plugin = 'antiscan' AND final_action = 'block' ) AS "扫描防护拦截量", count_if( (final_plugin = 'waf' AND final_action = 'block') OR (final_plugin = 'cc' AND final_action = 'block') OR (final_plugin = 'acl' AND final_action = 'block') OR (final_plugin = 'antiscan' AND final_action = 'block') ) AS totalblock GROUP BY host, user_id HAVING ( "ACL拦截量" >= 0 AND "规则防护引擎拦截量" >= 0 AND "CC拦截量" >= 0 AND "扫描防护拦截量" >= 0 AND totalblock > 10 ) ORDER BY "CC拦截量" DESC LIMIT 5
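Time range: 5 minutes (relative)
Check frequency: fixed interval, 5 minutes
Trigger condition:
$0.totalblock >= 500 && ($0.CC拦截量 >= 500)
Notification trigger threshold: 1
Notification interval: 5 minutes
Notification content:
- [Time]: ${FireTime} - [Uid]: ${Results[0].RawResults[0].user_id} - Domain name: ${Results[0].RawResults[0].host} - Product: WAF - Total blocks in the last 5 minutes: ${Results[0].RawResults[0].totalblock} - ACL blocks: ${Results[0].RawResults[0].ACL拦截量} - Rule protection engine blocks: ${Results[0].RawResults[0].规则防护引擎拦截量} - CC blocks: ${Results[0].RawResults[0].CC拦截量} - Scan protection blocks: ${Results[0].RawResults[0].扫描防护拦截量}
Scan blocking alert (5 minutes)
Recommended alert parameters:
Chart name: Scan protection blocks
Query statement: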
user_id: <your Alibaba Cloud account ID> | SELECT user_id, host, count_if( final_plugin = 'waf' AND final_action = 'block' ) AS "规则防护引擎拦截量", count_if( final_plugin = 'cc' AND final_action = 'block' ) AS "CC拦截量", count_if( final_plugin = 'acl' AND final_action = 'block' ) AS "ACL拦截量", count_if( final_plugin = 'antiscan' AND final_action = 'block' ) AS "扫描防护拦截量", count_if( (final_plugin = 'waf' AND final_action = 'block') OR (final_plugin = 'cc' AND final_action = 'block') OR (final_plugin = 'acl' AND final_action = 'block') OR (final_plugin = 'antiscan' AND final_action = 'block') ) AS totalblock GROUP BY host, user_id HAVING ( "ACL拦截量" >= 0 AND "规则防护引擎拦截量" >= 0 AND "CC拦截量" >= 0 AND "扫描防护拦截量" >= 0 AND totalblock > 10 ) ORDER BY "扫描防护拦截量" DESC LIMIT 5
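Time range: 5 minutes (relative)
Check frequency: fixed interval, 5 minutes
Trigger condition:
$0.totalblock >= 500 && ($0.扫描防护拦截量 >= 500)
Notification trigger threshold: 1
Notification interval: 5 minutes
Notification content:
- [Time]: ${FireTime} - [Uid]: ${Results[0].RawResults[0].user_id} - Domain name: ${Results[0].RawResults[0].host} - Product: WAF (overseas) - Total blocks in the last 5 minutes: ${Results[0].RawResults[0].totalblock} - ACL blocks: ${Results[0].RawResults[0].ACL拦截量} - Rule protection engine blocks: ${Results[0].RawResults[0].规则防护引擎拦截量} - CC blocks: ${Results[0].RawResults[0].CC拦截量} - Scan protection blocks: ${Results[0].RawResults[0].扫描防护拦截量}
Single-IP attack volume alert
Recommended alert parameters:
Chart name: Single-IP attack volume
Query statement: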
user_id: <your Alibaba Cloud account ID> | SELECT user_id, real_client_ip, concat( 'ACL blocks: ', cast(aclblock AS varchar(10)), ' ', 'Rule protection engine blocks: ', cast(wafblock AS varchar(10)), ' ', 'CC blocks: ', cast(ccblock AS varchar(10)) ) AS blockNum, totalblock, allRequest FROM ( SELECT user_id, real_client_ip, count_if( final_plugin = 'acl' AND final_action = 'block' ) AS aclblock, count_if( final_plugin = 'waf' AND final_action = 'block' ) AS wafblock, count_if( final_plugin = 'cc' AND final_action = 'block' ) AS ccblock, count_if( ( final_plugin = 'acl' AND final_action = 'block' ) OR ( final_plugin = 'waf' AND final_action = 'block' ) OR ( final_plugin = 'cc' AND final_action = 'block' ) ) AS totalblock, COUNT(*) AS allRequest FROM log GROUP BY user_id, real_client_ip HAVING totalblock > 1 ORDER BY totalblock DESC LIMIT 5 )
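The chart contains the following fields: real_client_ip (attacking IP address), blockNum (a summary string that contains the ACL, rule protection engine, and CC block counts), totalblock (total number of blocked requests), and allRequest (total number of requests). You can use these fields to set alert conditions as needed.
Time range: 5 minutes (relative)
Check frequency: fixed interval, 5 minutes
Trigger condition:
$0.totalblock >= 500
Notification trigger threshold: 1
Notification interval: 5 minutes
Notification content:
- [Time]: ${FireTime} - [Uid]: ${Results[0].RawResults[0].user_id} - Product: WAF - Top 3 attacking IP addresses in the last 5 minutes: - ${Results[0].RawResults[0].real_client_ip} (${Results[0].RawResults[0].blockNum}) - ${Results[0].RawResults[1].real_client_ip} (${Results[0].RawResults[1].blockNum}) - ${Results[0].RawResults[2].real_client_ip} (${Results[0].RawResults[2].blockNum})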
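The notification above always references the first three result rows, but fewer than three IP addresses may satisfy the query in a quiet period. As a rough illustration of how the blockNum summary and the Top 3 list fit together, here is a Python sketch; the helper names `summarize_ip` and `top_attackers` and the exact summary wording are assumptions for this example, and how Log Service renders a missing row is not covered here.

```python
def summarize_ip(ip, aclblock, wafblock, ccblock):
    """Build one result row with a blockNum summary and a totalblock count."""
    block_num = (
        f"ACL blocks: {aclblock} "
        f"Rule protection engine blocks: {wafblock} "
        f"CC blocks: {ccblock}"
    )
    return {
        "real_client_ip": ip,
        "blockNum": block_num,
        "totalblock": aclblock + wafblock + ccblock,
    }


def top_attackers(rows, limit=3):
    """Return notification lines for at most `limit` rows, highest totalblock first."""
    ranked = sorted(rows, key=lambda row: row["totalblock"], reverse=True)
    return [f"- {row['real_client_ip']} ({row['blockNum']})" for row in ranked[:limit]]


if __name__ == "__main__":
    rows = [
        summarize_ip("198.51.100.7", 10, 600, 5),
        summarize_ip("203.0.113.9", 700, 0, 100),
    ]
    print("\n".join(top_attackers(rows)))
```

Each count is formatted from its own variable, which is the property to check in the query's concat expression as well.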
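Single-IP attacked domain count alert
Recommended alert parameters:
Chart name: Number of domain names attacked by a single IP
Query statement: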
user_id: <your Alibaba Cloud account ID> | SELECT user_id, real_client_ip, totalblock, domainnum FROM ( SELECT user_id, real_client_ip, count_if( ( final_plugin = 'acl' AND final_action = 'block' ) OR ( final_plugin = 'waf' AND final_action = 'block' ) OR ( final_plugin = 'cc' AND final_action = 'block' ) ) AS totalblock, COUNT(DISTINCT host) AS domainnum FROM log GROUP BY user_id, real_client_ip HAVING totalblock > 1 ORDER BY domainnum DESC LIMIT 5 )
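The chart contains fields such as real_client_ip (attacking IP address), totalblock (total number of blocked requests), and domainnum (number of domain names attacked by the IP address). You can freely combine these fields in the trigger condition. For example, totalblock>500 && domainnum>5 matches an IP address that had more than 500 blocked requests and attacked more than 5 domain names within the time range.
Time range: 5 minutes (relative)
Check frequency: fixed interval, 1 minute
Trigger condition:
$0.domainnum >= 10
Notification trigger threshold: 1
Notification interval: 5 minutes
Notification content:
- [Time]: ${FireTime} - [Uid]: ${Results[0].RawResults[0].user_id} - Product: WAF - Attacking IP: ${Results[0].RawResults[0].real_client_ip} - Number of attacked domain names: ${Results[0].RawResults[0].domainnum} - Total attack requests in the last 5 minutes: ${Results[0].RawResults[0].totalblock} - Please follow up promptly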
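The totalblock and domainnum fields described above can be illustrated with a short Python sketch. It rests on assumptions that are not confirmed by this topic: each log entry is a dict with `real_client_ip`, `host`, `final_plugin`, and `final_action`; domainnum is taken to be the number of distinct host values requested by the IP address; and the function name `ip_domain_stats` is hypothetical.

```python
from collections import defaultdict

# Modules whose blocks count toward totalblock in this sketch.
BLOCKING_PLUGINS = {"acl", "waf", "cc"}


def ip_domain_stats(logs, limit=5):
    """Return per-IP rows with totalblock and domainnum, most domains first.

    IP addresses with totalblock <= 1 are dropped.
    """
    blocks = defaultdict(int)
    hosts = defaultdict(set)
    for entry in logs:
        ip = entry["real_client_ip"]
        hosts[ip].add(entry["host"])
        if entry["final_action"] == "block" and entry["final_plugin"] in BLOCKING_PLUGINS:
            blocks[ip] += 1
    rows = [
        {"real_client_ip": ip, "totalblock": blocks[ip], "domainnum": len(hosts[ip])}
        for ip in hosts
        if blocks[ip] > 1
    ]
    rows.sort(key=lambda row: row["domainnum"], reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    logs = [
        {"real_client_ip": "203.0.113.9", "host": f"site{i}.example",
         "final_plugin": "waf", "final_action": "block"}
        for i in range(12)
    ]
    top = ip_domain_stats(logs)
    print(top[0], top[0]["domainnum"] >= 10)
```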
Average latency anomaly alert (5 minutes)
Recommended alert parameters:
Chart name: Average latency monitoring
Query statement:
user_id: <your Alibaba Cloud account ID> and not upstream_status: 504 and not upstream_addr: '-' and request_time_msec < 5000 and upstream_status: 200 and not ua_browser: bot | SELECT user_id, host, upstream_time, request_time, requestnum FROM ( SELECT user_id, host, round(avg(upstream_response_time), 2) * 1000 AS upstream_time, round(avg(request_time_msec), 2) AS request_time, COUNT(*) AS requestnum FROM log GROUP BY host, user_id ) WHERE requestnum > 30 ORDER BY request_time DESC LIMIT 5
Time range: 5 minutes (relative)
Check frequency: fixed interval, 5 minutes
Trigger condition:
$0.request_time > 1000 && $0.requestnum > 30
Notification trigger threshold: 2
Notification interval: 10 minutes
Notification content:
- [Time]: ${FireTime} - [Uid]: ${Results[0].RawResults[0].user_id} - Domain name: ${Results[0].RawResults[0].host} - Product: WAF (overseas) - [Trigger condition]: ${condition} - Top 3 latencies in the last 5 minutes (milliseconds) - Host1: ${Results[0].RawResults[0].host} Delay_time: ${Results[0].RawResults[0].upstream_time} - Host2: ${Results[0].RawResults[1].host} Delay_time: ${Results[0].RawResults[1].upstream_time} - Host3: ${Results[0].RawResults[2].host} Delay_time: ${Results[0].RawResults[2].upstream_time}
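The averaging in this alert can be sketched in Python as follows. This is a simplified illustration under stated assumptions: each log entry is a dict with `host`, `upstream_status` (an integer), `request_time_msec` (milliseconds), and `upstream_response_time` (treated as seconds, since the query multiplies it by 1000); only the status and duration filters from the search part are reproduced; and the function name `avg_latency` is hypothetical.

```python
from statistics import mean


def avg_latency(logs, min_requests=30, limit=5):
    """Return per-domain average latency rows, slowest request_time first.

    Keeps only entries with upstream_status 200 and request_time_msec
    below 5000, then drops domains with requestnum <= min_requests.
    """
    by_host = {}
    for entry in logs:
        if entry["upstream_status"] != 200 or entry["request_time_msec"] >= 5000:
            continue
        by_host.setdefault(entry["host"], []).append(entry)
    rows = []
    for host, entries in by_host.items():
        if len(entries) <= min_requests:
            continue
        rows.append({
            "host": host,
            # Upstream time converted to milliseconds, as in the query.
            "upstream_time": round(mean(e["upstream_response_time"] for e in entries), 2) * 1000,
            "request_time": round(mean(e["request_time_msec"] for e in entries), 2),
            "requestnum": len(entries),
        })
    rows.sort(key=lambda row: row["request_time"], reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    entry = {"host": "a.example", "upstream_status": 200,
             "request_time_msec": 1500, "upstream_response_time": 1.2}
    top = avg_latency([entry] * 40)
    print(top[0], top[0]["request_time"] > 1000 and top[0]["requestnum"] > 30)
```

Note that the trigger condition tests request_time, while the notification content reports upstream_time, so the two values in an alert message can differ.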
Traffic drop alert
Recommended alert parameters:
Chart name: Traffic drop monitoring
Query statement:
user_id: <your Alibaba Cloud account ID> | SELECT t1.user_id, t1.now1mQPS, t1.past1mQPS, de_ratio, t2.Rate_2XX, Rate_3XX, Rate_4XX, Rate_5XX, aveQPS FROM ( ( SELECT user_id, round(c[1] / 60, 0) AS now1mQPS, round(c[2] / 60, 0) AS past1mQPS, round( 100 - round(c[1] / 60, 0) / round(c[2] / 60, 0) * 100, 2 ) AS de_ratio FROM ( SELECT compare(t, 60) AS c, user_id FROM ( SELECT COUNT(*) AS t, user_id FROM log GROUP BY user_id ) GROUP BY user_id ) WHERE c[3] < 0.9 AND ( c[1] > 180 OR c[2] > 180 ) ) t1 JOIN ( SELECT user_id, Rate_2XX, Rate_3XX, Rate_4XX, Rate_5XX, countall / 60 AS "aveQPS", status_2XX, status_3XX, status_4XX, status_5XX, countall FROM ( SELECT user_id, round( round(status_2XX * 1.0000 / countall, 4) * 100, 2 ) AS Rate_2XX, round( round(status_3XX * 1.0000 / countall, 4) * 100, 2 ) AS Rate_3XX, round( round(status_4XX * 1.0000 / countall, 4) * 100, 2 ) AS Rate_4XX, round( round(status_5XX * 1.0000 / countall, 4) * 100, 2 ) AS Rate_5XX, status_2XX, status_3XX, status_4XX, status_5XX, countall FROM ( SELECT user_id, count_if( status >= 200 AND status < 300 ) AS status_2XX, count_if( status >= 300 AND status < 400 ) AS status_3XX, count_if( status >= 400 AND status < 500 AND status <> 444 AND status <> 405 ) AS status_4XX, count_if( status >= 500 AND status < 600 ) AS status_5XX, COUNT(*) AS countall FROM log GROUP BY user_id ) ) WHERE countall > 0 ) t2 ON t1.user_id = t2.user_id ) ORDER BY de_ratio DESC LIMIT 5
Time range: 1 minute (relative)
Check frequency: fixed interval, 1 minute
Trigger condition:
$0.de_ratio > 50 && $0.now1mqps > 20
Notification trigger threshold: 1
Notification interval: 5 minutes
Notification content:
- [Time]: ${FireTime} - [UID]: ${Results[0].RawResults[0].user_id} - Product: WAF - Average QPS over the last minute: ${Results[0].RawResults[0].now1mqps} - [Trigger condition (drop rate & QPS)]: ${condition} - QPS drop rate: ${Results[0].RawResults[0].de_ratio}% - Response code 2xx_rate: ${Results[0].RawResults[0].rate_2xx}% - Response code 3xx_rate: ${Results[0].RawResults[0].Rate_3XX}% - Response code 4xx_rate: ${Results[0].RawResults[0].Rate_4XX}% - Response code 5xx_rate: ${Results[0].RawResults[0].Rate_5XX}%