Container Service for Kubernetes: Overview of the AIOps suite

Last Updated: Jun 07, 2024

Kubernetes is a large-scale distributed container orchestration engine. Because of its complexity, managing and operating Kubernetes clusters requires specialized expertise. To simplify cluster management and O&M, Container Service for Kubernetes (ACK) provides the AIOps suite, which consists of cluster check, cluster inspection, and cluster diagnostics. These features help you troubleshoot issues and improve O&M efficiency. This topic describes the benefits and features of the AIOps suite.

Benefits

The following table describes the benefit of each feature in the AIOps suite.

| Feature | Benefit |
| --- | --- |
| Cluster check | Before the system performs an O&M operation on a cluster, a cluster check is triggered to evaluate whether the cluster meets the requirements of the operation. This increases the success rate of O&M operations. |
| Cluster inspection | Cluster inspections are performed on a schedule to identify potential risks in clusters. |
| Cluster diagnostics | A set of cluster diagnostics tools is provided to diagnose pods, nodes, Ingresses, memory, and Services. This simplifies troubleshooting. |

Note

The AIOps suite is supported for ACK managed clusters, ACK dedicated clusters, and ACK Serverless Pro clusters.

Cluster check

The cluster check feature covers key O&M operations, such as cluster upgrades, cluster migration, component installation, component upgrades, and node pool upgrades. Before you perform one of these operations, a cluster check is automatically triggered, and you can proceed only after the cluster passes the check. For each failed check item, the system displays the reason in a visualized manner and suggests how to fix it. For more information, see Cluster Check.
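The gating behavior described above can be pictured with a small sketch. It is not the ACK implementation or an ACK API: `CheckItem`, `failed_items`, `run_operation`, and the sample check names are hypothetical, chosen only to show an operation that proceeds when every check item passes and that otherwise reports a reason and a suggested fix for each failed item.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CheckItem:
    """One pre-check result (hypothetical model, not an ACK type)."""
    name: str
    passed: bool
    reason: str = ""      # why the item failed
    suggestion: str = ""  # how to fix it


def failed_items(items: List[CheckItem]) -> List[CheckItem]:
    """Return the check items that did not pass."""
    return [item for item in items if not item.passed]


def run_operation(items: List[CheckItem], operation: Callable[[], str]) -> str:
    """Run `operation` only if every check item passed."""
    failures = failed_items(items)
    if failures:
        lines = [f"{f.name}: {f.reason} (fix: {f.suggestion})" for f in failures]
        return "blocked\n" + "\n".join(lines)
    return operation()


# Illustrative data only; real check items are defined by ACK.
checks = [
    CheckItem("node status", True),
    CheckItem("deprecated APIs", False,
              "deprecated API versions in use",
              "migrate the affected resources"),
]
print(run_operation(checks, lambda: "upgrade started"))
```

In this sketch the operation is passed in as a callable so that it cannot run before the check results have been evaluated.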

Cluster inspection

The cluster inspection feature draws on the experience that ACK has accumulated from managing clusters across a wide range of use cases. You can use the cluster inspection feature to complete the following tasks:

  • Scan the status of a cluster to identify potential risks.

  • Periodically check the resource usage, resource quotas, cluster certificates, and component versions of a cluster, and view the results in a visualized manner.

  • View the severity levels of anomalies and the suggested solutions to maintain your clusters efficiently.

For more information, see Cluster Inspections.
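As a rough illustration of what an inspection pass might look like, the following sketch checks a certificate expiry date and a resource quota, then orders the findings by severity. Everything here is an assumption made for the example: the `Finding` type, the severity names, the 30-day and 80% thresholds, and the function names are hypothetical and do not reflect how ACK implements or rates inspections.

```python
from datetime import datetime, timedelta, timezone
from typing import List, NamedTuple


class Finding(NamedTuple):
    """One inspection finding (hypothetical model, not an ACK type)."""
    item: str
    severity: str  # "high", "medium", or "low" (assumed levels)
    solution: str


SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def check_certificate(expires_at: datetime, now: datetime,
                      warn_days: int = 30) -> List[Finding]:
    """Flag an expired or soon-to-expire certificate (assumed thresholds)."""
    remaining = expires_at - now
    if remaining <= timedelta(0):
        return [Finding("cluster certificate", "high",
                        "renew the expired certificate")]
    if remaining <= timedelta(days=warn_days):
        return [Finding("cluster certificate", "medium",
                        "schedule a certificate renewal")]
    return []


def check_quota(name: str, used: int, limit: int,
                threshold: float = 0.8) -> List[Finding]:
    """Flag a quota whose usage ratio reaches the assumed threshold."""
    if limit > 0 and used / limit >= threshold:
        return [Finding(f"{name} quota", "low",
                        f"request a higher {name} quota or release unused resources")]
    return []


def sort_by_severity(findings: List[Finding]) -> List[Finding]:
    """Order findings so that the most severe come first."""
    return sorted(findings, key=lambda f: SEVERITY_ORDER[f.severity])


# Illustrative inputs only.
now = datetime(2024, 6, 7, tzinfo=timezone.utc)
findings = check_quota("SLB", 9, 10) + check_certificate(now + timedelta(days=10), now)
for finding in sort_by_severity(findings):
    print(finding.severity, finding.item, finding.solution)
```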

Cluster diagnostics

The cluster diagnostics feature allows you to diagnose clusters with a few clicks. This feature can help diagnose pods, nodes, Ingresses, memory, and Services in your cluster. For more information, see Work with cluster diagnostics.

| Item | Description |
| --- | --- |
| Pod diagnostics | Diagnoses common pod issues, such as pod startup failures, container image pulling failures, and pod exceptions, displays the root causes, and provides suggestions on how to fix the issues. |
| Node diagnostics | Diagnoses common node issues, such as the NotReady issue, node network issues, and runtime issues, displays the root causes, and provides suggestions on how to fix the issues. |
| Service diagnostics | Diagnoses common issues with Services, such as those related to Service exception events, Server Load Balancer (SLB) backend server quotas, and SLB instance count quotas, displays the root causes, and provides suggestions on how to fix the issues. |
| Ingress diagnostics | Collects Ingress component check results, startup parameters, Ingress pod error logs, and information about the SLB instances used by the Ingress controller to help troubleshoot application access issues. |
| Memory diagnostics | Diagnoses common memory issues in ACK clusters, such as memory leaks, memory fragmentation, and cgroup leaks, displays the root causes, and provides suggestions on how to fix the issues. |
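To make the pod diagnostics item more concrete, the sketch below reads the container waiting reasons from a pod object, in the shape returned by `kubectl get pod <name> -o json`, and maps them to the issue types named above. It is an illustration under stated assumptions, not the logic that ACK uses: the function name `classify_pod` and the grouping of reasons are hypothetical, while the field path `status.containerStatuses[].state.waiting.reason` and the reason strings come from Kubernetes itself.

```python
from typing import Dict, List

# Waiting reasons reported by Kubernetes, grouped for illustration.
# The grouping is an assumption, not ACK's classification.
IMAGE_PULL_REASONS = {"ErrImagePull", "ImagePullBackOff"}
STARTUP_REASONS = {"CrashLoopBackOff", "CreateContainerConfigError"}


def classify_pod(pod: Dict) -> List[str]:
    """Return one message per container that is waiting for a known reason."""
    issues = []
    statuses = pod.get("status", {}).get("containerStatuses") or []
    for status in statuses:
        waiting = status.get("state", {}).get("waiting")
        if not waiting:
            continue  # the container is running or terminated
        reason = waiting.get("reason", "")
        name = status.get("name", "unknown")
        if reason in IMAGE_PULL_REASONS:
            issues.append(f"{name}: image pulling failure ({reason})")
        elif reason in STARTUP_REASONS:
            issues.append(f"{name}: startup failure ({reason})")
    return issues


# Illustrative pod object containing only the fields the sketch reads.
pod = {
    "status": {
        "containerStatuses": [
            {"name": "app", "state": {"waiting": {"reason": "ImagePullBackOff"}}},
            {"name": "sidecar", "state": {"running": {}}},
        ]
    }
}
print(classify_pod(pod))
```

A real diagnosis would also consider pod events, init containers, and node conditions, which this sketch leaves out.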