By Xining Wang (xining.wxn@alibaba-inc.com)
This is the third article in a series on configuring service level objectives for application services in Alibaba Cloud Service Mesh.
You can configure service level objectives (SLOs) manually based on Prometheus metrics, but the process is cumbersome. Alibaba Cloud Service Mesh (ASM) simplifies it: you declare SLOs and their associated alert rules in a custom resource YAML configuration, and ASM generates the corresponding Prometheus rules. This article explains how to configure SLOs for applications in ASM.
In this example, an availability SLO is configured for the httpbin application in the default namespace. The objective is 99%, and the period during which the SLO takes effect is 30 days. Alerts are configured at two severity levels: pageAlert and ticketAlert.
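To make the 99% / 30-day objective concrete, the following Python sketch computes the error budget it implies. It assumes a request-based availability SLI (which is what the generated rules later in this article measure), so the budget is a fraction of requests; the "full-outage equivalent" additionally assumes evenly distributed traffic. The function names are illustrative only.

```python
from datetime import timedelta

# Values from the example SLO: objective "99" (percent) over a 30-day period.
OBJECTIVE_PERCENT = 99.0
PERIOD = timedelta(days=30)


def error_budget_ratio(objective_percent: float) -> float:
    """Fraction of requests allowed to fail: 1 - objective."""
    return 1.0 - objective_percent / 100.0


def allowed_failures(total_requests: int, objective_percent: float) -> int:
    """Maximum number of failed requests in the period that still meets the SLO."""
    # round() absorbs float noise such as 1 - 0.99 == 0.010000000000000009
    return int(round(total_requests * error_budget_ratio(objective_percent)))


def full_outage_equivalent(period: timedelta, objective_percent: float) -> timedelta:
    """Duration of 100% failure that would exhaust the budget (assumes uniform traffic)."""
    return period * error_budget_ratio(objective_percent)


if __name__ == "__main__":
    print(f"error budget ratio: {error_budget_ratio(OBJECTIVE_PERCENT):.4f}")
    print("allowed failures per 1,000,000 requests:",
          allowed_failures(1_000_000, OBJECTIVE_PERCENT))
    print("full-outage equivalent:", full_outage_equivalent(PERIOD, OBJECTIVE_PERCENT))
```

In other words, a 99% objective over 30 days leaves room for roughly 1 failed request in 100, or about 7 hours and 12 minutes of complete unavailability if traffic is uniform.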
Save the following configuration as a file named prometheusservicelevel.yaml:

```yaml
apiVersion: istio.alibabacloud.com/v1beta1
kind: ServiceLevelObjective
metadata:
  name: asm-slo-default-httpbin
  namespace: default # Namespace to which the custom resource belongs
spec:
  service: httpbin # Name of the application
  period: 30d # Period during which the SLO takes effect
  slos:
    - name: asm-slo # Name of the SLO
      objective: "99" # Objective, in percent
      sli:
        plugin:
          id: availability # Type of the SLI plug-in
      alerting:
        name: asm-alert # Name of the alert rule
```

Then use the kubeconfig file of the ASM instance to connect to the mesh, and run the following kubectl command to deploy the configuration:

```shell
kubectl apply -f prometheusservicelevel.yaml
```
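If you need the same SLO for many services, you can generate the manifests instead of copying the YAML by hand. The following Python sketch is a hypothetical helper, not part of ASM: it builds a manifest with the same fields as the example and prints it as JSON, which `kubectl apply -f` also accepts. The naming pattern `asm-slo-<namespace>-<service>` is copied from the example and is an assumption, and `reviews` is a made-up second service.

```python
import json


def make_slo_manifest(service: str, namespace: str = "default",
                      objective: str = "99", period: str = "30d") -> dict:
    """Build a ServiceLevelObjective manifest shaped like the YAML example.

    The name pattern asm-slo-<namespace>-<service> is taken from the example
    and is an assumption, not a documented requirement.
    """
    return {
        "apiVersion": "istio.alibabacloud.com/v1beta1",
        "kind": "ServiceLevelObjective",
        "metadata": {"name": f"asm-slo-{namespace}-{service}", "namespace": namespace},
        "spec": {
            "service": service,
            "period": period,
            "slos": [{
                "name": "asm-slo",
                "objective": objective,  # kept as a string, as in the YAML example
                "sli": {"plugin": {"id": "availability"}},
                "alerting": {"name": "asm-alert"},
            }],
        },
    }


def make_manifest_list(services, namespace: str = "default") -> dict:
    """Wrap several manifests in a v1 List so a single kubectl apply call covers them."""
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [make_slo_manifest(s, namespace) for s in services],
    }


if __name__ == "__main__":
    # "reviews" is a hypothetical second service used only for illustration.
    print(json.dumps(make_manifest_list(["httpbin", "reviews"]), indent=2))
```

Redirect the output to a file (for example `slos.json`) and pass that file to `kubectl apply -f`, just like the YAML file above.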
You can also define the SLO in the ASM console.
Run the following command to view the result:

```shell
# In the resource name asm-slo-default-httpbin, default is the namespace of the application and httpbin is the name of the application.
kubectl get prometheusservicelevel asm-slo-default-httpbin -o yaml
```
The status field in the command output looks similar to the following:

```yaml
status:
  ......
  status: success
  prometheusRules: # Automatically generated YAML configuration of the Prometheus rules
```
The value of the prometheusRules field is the automatically generated Prometheus rule configuration in YAML format. The following is an example:
```yaml
groups:
- name: asm-slo-sli-recordings-httpbin-asm-slo
  rules:
  - record: slo:sli_error:ratio_rate5m
    expr: "(\n(\n sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[5m])) \n / \n (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[5m])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 5m
  - record: slo:sli_error:ratio_rate30m
    expr: "(\n(\n sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[30m])) \n / \n (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[30m])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 30m
  - record: slo:sli_error:ratio_rate1h
    expr: "(\n(\n sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[1h])) \n / \n (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[1h])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 1h
  - record: slo:sli_error:ratio_rate2h
    expr: "(\n(\n sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[2h])) \n / \n (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[2h])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 2h
  - record: slo:sli_error:ratio_rate6h
    expr: "(\n(\n sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[6h])) \n / \n (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[6h])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 6h
  - record: slo:sli_error:ratio_rate1d
    expr: "(\n(\n sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[1d])) \n / \n (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[1d])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 1d
  - record: slo:sli_error:ratio_rate3d
    expr: "(\n(\n sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[3d])) \n / \n (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[3d])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 3d
  - record: slo:sli_error:ratio_rate30d
    expr: |
      sum_over_time(slo:sli_error:ratio_rate5m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}[30d])
      / ignoring (slo_window)
      count_over_time(slo:sli_error:ratio_rate5m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}[30d])
    labels:
      slo_window: 30d
- name: asm-slo-meta-recordings-httpbin-asm-slo
  rules:
  - record: slo:objective:ratio
    expr: vector(0.99)
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:error_budget:ratio
    expr: vector(1-0.99)
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:time_period:days
    expr: vector(30)
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:current_burn_rate:ratio
    expr: |
      slo:sli_error:ratio_rate5m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}
      / on(slo_id, asm_slo, slo_service) group_left
      slo:error_budget:ratio{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:period_burn_rate:ratio
    expr: |
      slo:sli_error:ratio_rate30d{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}
      / on(slo_id, asm_slo, slo_service) group_left
      slo:error_budget:ratio{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:period_error_budget_remaining:ratio
    expr: 1 - slo:period_burn_rate:ratio{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: asm_slo_info
    expr: vector(1)
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_mode: cli-gen-prom
      slo_objective: "99"
      slo_service: httpbin
      slo_spec: prometheus/v1
      slo_version: dev
- name: asm-slo-alerts-httpbin-asm-slo
  rules:
  - alert: asm-alert
    expr: |
      (
        (slo:sli_error:ratio_rate5m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (14.4 * 0.01))
        and ignoring (slo_window)
        (slo:sli_error:ratio_rate1h{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (14.4 * 0.01))
      )
      or ignoring (slo_window)
      (
        (slo:sli_error:ratio_rate30m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (6 * 0.01))
        and ignoring (slo_window)
        (slo:sli_error:ratio_rate6h{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (6 * 0.01))
      )
    labels:
      slo_severity: page
    annotations:
      summary: '{{$labels.slo_service}} {{$labels.asm_slo}} SLO error budget burn rate is over expected.'
      title: (page) {{$labels.slo_service}} {{$labels.asm_slo}} SLO error budget burn rate is too fast.
  - alert: asm-alert
    expr: |
      (
        (slo:sli_error:ratio_rate2h{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (3 * 0.01))
        and ignoring (slo_window)
        (slo:sli_error:ratio_rate1d{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (3 * 0.01))
      )
      or ignoring (slo_window)
      (
        (slo:sli_error:ratio_rate6h{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (1 * 0.01))
        and ignoring (slo_window)
        (slo:sli_error:ratio_rate3d{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (1 * 0.01))
      )
    labels:
      slo_severity: ticket
    annotations:
      summary: '{{$labels.slo_service}} {{$labels.asm_slo}} SLO error budget burn rate is over expected.'
      title: (ticket) {{$labels.slo_service}} {{$labels.asm_slo}} SLO error budget burn rate is too fast.
```
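The recording rules in the sli-recordings and meta-recordings groups boil down to a few ratios. The following Python sketch mirrors them for illustration only; it is not how ASM or Prometheus compute anything internally. Plain request counts per window stand in for the `rate()` values (the numerator and denominator share the same window, so the ratio is the same), and the function names are hypothetical.

```python
import re
from statistics import mean

# PromQL's =~ matcher is fully anchored, so "(5..|429)" matches 5xx codes and 429 only.
ERROR_CODE = re.compile(r"5..|429")


def sli_error_ratio(request_counts: dict) -> float:
    """Mirror of slo:sli_error:ratio_rate<window>.

    request_counts maps a response code (string) to the number of requests
    in the window. With no traffic the rule falls back to `OR on() vector(0)`,
    so this returns 0.0.
    """
    total = sum(request_counts.values())
    if total == 0:
        return 0.0
    errors = sum(n for code, n in request_counts.items() if ERROR_CODE.fullmatch(code))
    return errors / total


def period_error_ratio(ratios_5m: list) -> float:
    """Mirror of slo:sli_error:ratio_rate30d: sum_over_time / count_over_time.

    This is an unweighted average of the recorded 5m samples, not a
    traffic-weighted ratio over the whole 30 days.
    """
    return mean(ratios_5m)


def period_budget_remaining(period_ratio: float, objective: float = 0.99) -> float:
    """Mirror of slo:period_error_budget_remaining:ratio = 1 - period_burn_rate."""
    error_budget = 1 - objective            # slo:error_budget:ratio
    period_burn_rate = period_ratio / error_budget  # slo:period_burn_rate:ratio
    return 1 - period_burn_rate
```

For example, 3 responses with status 503 and 2 with status 429 out of 100 requests give an error ratio of 0.05 for that window, and a 30-day average error ratio of 0.005 leaves about half of the error budget.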
Save the generated rule configuration (the value of the prometheusRules field) to Prometheus.
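The two asm-alert rules use the same window pairs and burn-rate factors as the multiwindow, multi-burn-rate approach described in Google's SRE Workbook: a condition fires only when both a short and a long window exceed `factor * error_budget`, which filters out brief spikes. The following Python sketch reproduces that logic for illustration; the class and function names are hypothetical. In Prometheus the page and ticket rules are evaluated independently, so the sketch returns every severity whose condition holds rather than only the most severe one.

```python
from dataclasses import dataclass

ERROR_BUDGET = 0.01        # 1 - 0.99, written as 0.01 in the generated thresholds
PERIOD_HOURS = 30 * 24     # period: 30d


@dataclass(frozen=True)
class BurnCondition:
    short_window: str
    long_window: str
    burn_rate: float
    long_window_hours: float

    def threshold(self) -> float:
        # Same form as the generated rules, for example (14.4 * 0.01)
        return self.burn_rate * ERROR_BUDGET

    def budget_consumed(self) -> float:
        # Share of the 30-day budget spent if this burn rate lasts for the long window
        return self.burn_rate * self.long_window_hours / PERIOD_HOURS

    def firing(self, ratios: dict) -> bool:
        # Both windows must exceed the threshold ("and ignoring (slo_window)")
        t = self.threshold()
        return ratios[self.short_window] > t and ratios[self.long_window] > t


# Window pairs and burn rates copied from the generated alert rules
PAGE = [BurnCondition("5m", "1h", 14.4, 1), BurnCondition("30m", "6h", 6, 6)]
TICKET = [BurnCondition("2h", "1d", 3, 24), BurnCondition("6h", "3d", 1, 72)]


def firing_severities(ratios: dict) -> list:
    """Return the severities whose alert expression would be true for these error ratios."""
    result = []
    if any(c.firing(ratios) for c in PAGE):      # "or ignoring (slo_window)"
        result.append("page")
    if any(c.firing(ratios) for c in TICKET):
        result.append("ticket")
    return result
```

The `budget_consumed` values show why these factors were chosen: a page fires after 2% (14.4 for 1 hour) or 5% (6 for 6 hours) of the 30-day budget is burned, and a ticket after 10% (3 for 1 day, or 1 for 3 days).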
Previous article in this series: Configure SLO for Application Service in Alibaba Cloud Service Mesh (2): SLO Definition in ASM