The multi-cluster alert management feature allows you to create or modify alert rules on a Fleet instance. However, on its own the Fleet instance propagates identical alert rules to all clusters that are associated with it. If your clusters need different alert rules to meet business requirements, you can override the alerting configurations for specific clusters. This topic describes how to create and apply such an override.
Prerequisites
The Fleet management feature is enabled. For more information, see Enable multi-cluster management.
Two clusters (the service provider cluster and service consumer cluster) are associated with the Fleet instance. For more information, see Associate clusters with a Fleet instance.
Components required for multi-cluster alert management are installed in the clusters that you want to manage. For more information, see Install and update the components.
Background Information
The multi-cluster management feature allows you to create KubeVela override policies on a Fleet instance to override alerting configurations or application configurations. You can create alert rules on a Fleet instance and then create an override policy to override the alert rules of specific clusters. For example, you can create an override policy to enable GPU alerting, set different alert thresholds, and specify different contacts. After you complete the configurations, you can use the Fleet instance to propagate the alert rules to the clusters associated with the Fleet instance and then apply the override policy.
The following figure shows how alerting configurations are overridden for specific clusters. An override policy is created on the Fleet instance and applied to ACK Cluster 2 to override its alerting configurations. ACK Cluster 1 still uses the original alerting configurations.
Step 1: Create a contact and a contact group
Create a contact and a contact group. For more information, see the Step 1: Create a contact and a contact group section of the "Multi-cluster alert management" topic.
Step 2: Obtain the contact group ID
Obtain the contact group ID. For more information, see the Step 2: Obtain the contact group ID section of the "Multi-cluster alert management" topic.
Step 3: Create alert rules
Create alert rules. For more information, see the Step 3: Create alert rules section of the "Multi-cluster alert management" topic.
Step 4: Create an override policy and apply the policy to override the alert rules
Use KubeVela to create an override policy on the Fleet instance, and then apply the policy from the Fleet instance to override the alert rules. To do this, perform the following steps.
Run the following command to query the IDs of the clusters to which you want to propagate the alert rules:
kubectl get managedcluster
Expected output:
NAME         HUB ACCEPTED   MANAGED CLUSTER URLS   JOINED   AVAILABLE   AGE
c565e4****   true                                  True     True        12d
cbaa12****   true                                  True     True        12d
Note: You can also select clusters by specifying cluster labels. For more information, see the Method 2: Specify a label in the cluster selector section of the "Select a cluster to distribute applications" topic.
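As a rough sketch of the label-based alternative mentioned in the note, a topology policy could select clusters by label instead of listing cluster IDs. This assumes the clusterLabelSelector property of KubeVela topology policies; the label key and value (accelerator: gpu) are hypothetical and must match labels that you have actually set on your clusters. Check the referenced topic for the exact syntax supported by your Fleet instance.

```yaml
apiVersion: core.oam.dev/v1alpha1
kind: Policy
metadata:
  name: cluster-gpu
  namespace: kube-system
type: topology
properties:
  # Assumption: select every associated cluster that carries this label,
  # instead of listing cluster IDs under "clusters".
  clusterLabelSelector:
    accelerator: gpu # Hypothetical label; replace with a label set on your GPU clusters.
```

With a selector like this, the policy does not need to be edited when another cluster with the same label is associated with the Fleet instance.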
Create a file named ackalertrule-app-override.yaml based on the following content to define the configurations to override:
In this example, ack-cluster-1 is a CPU-accelerated cluster and ack-cluster-2 is a GPU-accelerated cluster. This example shows how to override the alert rules of ack-cluster-2. The override policy enables GPU alerting, modifies the alert thresholds, and changes the contacts.
apiVersion: core.oam.dev/v1alpha1 # Specify the cluster to which the alert rules are propagated by cluster ID.
kind: Policy
metadata:
  name: cluster-cpu
  namespace: kube-system
type: topology
properties:
  clusters: ["<ack-cluster-1>"] # Replace <ack-cluster-1> with the cluster ID of ack cluster 1.
---
apiVersion: core.oam.dev/v1alpha1 # Specify the cluster to which the alert rules are propagated by cluster ID.
kind: Policy
metadata:
  name: cluster-gpu
  namespace: kube-system
type: topology
properties:
  clusters: ["<ack-cluster-2>"] # Replace <ack-cluster-2> with the cluster ID of ack cluster 2.
---
apiVersion: core.oam.dev/v1alpha1 # Define an override policy.
kind: Policy
metadata:
  name: override-gpu
  namespace: kube-system
type: override
properties:
  components:
  - name: ackalertrules # The component name in the associated application.
    traits:
    - type: alert-rule # alert-rule trait is used to modify the alert rules.
      properties:
        groups: # The override configurations, whose structure is the same as that of the alert rules. You can define multiple groups and alert rules to be overridden.
        - name: res-exceptions # Specify the name of the alert group to be overridden.
          rules:
          - contactGroups: # Override the contact group.
            - arms_contact_group_id: "12345"
              cms_contact_group_name: ack_Default Contact Group
              id: "1234"
            enable: enable # Change the value to enable.
            name: node_cpu_util_high # Specify the name of the alert rule to be overridden.
            thresholds: # Modify the threshold.
            - key: CMS_ESCALATIONS_CRITICAL_Threshold
              unit: percent
              value: "60"
        - name: cluster-error # Specify the name of the alert group to override.
          rules:
          - enable: enable # Change the value to enable.
            name: gpu-xid-error # Specify the name of the alert rule to override.
---
apiVersion: core.oam.dev/v1alpha1 # Define a KubeVela workflow.
kind: Workflow
metadata:
  name: deploy-ackalertrules
  namespace: kube-system
steps:
- type: deploy
  name: deploy-cpu
  properties:
    policies: ["cluster-cpu"] # Deploy the alert rules to cluster-cpu.
- type: deploy
  name: deploy-gpu
  properties:
    policies: ["override-gpu", "cluster-gpu"] # Apply the override policy to override the alert rules of cluster-gpu.
---
apiVersion: core.oam.dev/v1beta1 # Define a KubeVela application.
kind: Application
metadata:
  name: alertrules
  namespace: kube-system
  annotations:
    app.oam.dev/publishVersion: version1 # Repropagate the alert rules when resources are updated. The value of publishVersion must be modified.
spec:
  components:
  - name: ackalertrules
    type: ref-objects
    properties:
      objects:
      - resource: ackalertrules # Reference the alert rules created in Step 3.
        name: default
  workflow:
    ref: deploy-ackalertrules # Use the propagate rules defined in the workflow to propagate the alert rules.
Run the following command to apply the override policy and override the alert rules:
kubectl apply -f ackalertrule-app-override.yaml
Run the following command to view the propagation progress of the alert rules:
kubectl amc appstatus alertrules -n kube-system --tree --detail
Expected output:
CLUSTER                       NAMESPACE       RESOURCE             STATUS    APPLY_TIME            DETAIL
c565e4**** (ack-cluster-1)─── kube-system─── AckAlertRule/default updated   2022-**-** **:**:**   Age: **
cbaa12**** (ack-cluster-2)─── kube-system─── AckAlertRule/default updated   2022-**-** **:**:**   Age: **
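The comment on the publishVersion annotation in the application definition states that its value must be modified for the alert rules to be repropagated after resources are updated. A minimal sketch of what the Application section of ackalertrule-app-override.yaml might look like after such a change is shown below; the value version2 is only an illustrative choice, and the rest of the file is assumed to be unchanged.

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: alertrules
  namespace: kube-system
  annotations:
    # Changed from version1. The new value is illustrative; it only needs to differ from the previous one.
    app.oam.dev/publishVersion: version2
spec:
  components:
  - name: ackalertrules
    type: ref-objects
    properties:
      objects:
      - resource: ackalertrules
        name: default
  workflow:
    ref: deploy-ackalertrules
```

After editing the file, running kubectl apply -f ackalertrule-app-override.yaml again and then the appstatus command above should show whether the updated rules have reached both clusters.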