Container Service for Kubernetes (ACK) supports cluster update check, cluster migration check, component check, and node pool check. This topic describes cluster check items and provides suggestions on how to fix cluster issues.
Cluster check items
Cluster update check
A new Kubernetes version may change the container runtime, deprecate certain Kubernetes APIs, and introduce new features, so updating a cluster carries risk. To help you update your cluster smoothly, ACK provides the cluster update check feature. A precheck is automatically triggered before a cluster is updated, and the cluster is updated only if it passes the precheck.
Cluster update check consists of the following checks:
Cluster resource check: checks cloud resources related to ACK clusters, such as Server Load Balancer (SLB) instances, Elastic Compute Service (ECS) instances, and virtual private clouds (VPCs).
Cluster component check: checks the configurations of ACK clusters, components, and applications. For example, the system checks whether the component versions meet the requirement or whether the applications are using deprecated APIs.
Cluster configuration check: checks configurations related to the nodes in ACK clusters. To perform the cluster configuration check, the system needs to create a pod on each node to collect information.
The cluster update check items vary based on the type, runtime, and version of the cluster. The check items in the following table are for reference only. The check items displayed in the console take precedence.
Type | Check item | Description |
Cluster resources | APIServer SLB | Checks whether the SLB instance exists. |
Cluster resources | APIServer SLB | Checks whether the status of the SLB instance is normal. |
Cluster resources | APIServer SLB | Checks whether the configurations of the SLB listeners are valid, including the listener ports and protocol. |
Cluster resources | APIServer SLB | Checks whether the configurations of the SLB backend server groups are valid. |
Cluster resources | APIServer SLB | Checks whether the configuration of SLB access control is valid. If no access control is configured, this check item displays Normal. |
Cluster resources | VPC | Checks whether the VPC exists. |
Cluster resources | VPC | Checks whether the status of the VPC is normal. |
Cluster resources | vSwitch | Checks whether the vSwitch exists. |
Cluster resources | vSwitch | Checks whether the status of the vSwitch is normal. |
Cluster resources | vSwitch | Checks whether the vSwitch can provide no less than two idle IP addresses. |
Cluster resources | ECS | Checks whether the ECS instance exists. |
Cluster resources | ECS | Checks whether the status of the ECS instance is normal. |
Cluster resources | ECS | Checks whether the security group of the ECS instance is normal. |
Cluster resources | ECS | Checks whether the ECS instance has expired. |
Cluster resources | ECS | Checks whether the instance type of the ECS instance meets the requirement. |
Cluster resources | ECS | Checks whether the status of the Cloud Assistant client is normal. |
Cluster components | Kube Proxy Master | Checks whether the component exists. |
Cluster components | Kube Proxy Worker | Checks whether the component exists. |
Cluster components | API Service | Checks whether unavailable API Services exist. |
Cluster components | Cluster instances | Checks whether the number of master instances in the cluster is three or five. |
Cluster components | Cluster components | Checks whether the version of Terway meets the requirement. |
Cluster components | Cluster components | Checks whether the version of CoreDNS meets the requirement. |
Cluster components | Cluster components | Checks whether the version of the cloud controller manager meets the requirement. |
Cluster components | Cluster components | Checks whether the version of the NGINX Ingress controller meets the requirement. |
Cluster components | Cluster components | Checks whether the version of ACK Virtual Node meets the requirement. |
Cluster components | Cluster components | Checks whether the version of the metric server meets the requirement. |
Cluster components | Nodes | Checks whether the node IP address exists. |
Cluster components | Nodes | Checks whether the node is schedulable. |
Cluster components | Nodes | Checks whether the node is ready. |
Cluster components | Nodes | Checks whether the operating system of the node can be updated. |
Cluster components | Nodes | Checks whether the number of available pods on the node is greater than two. |
Cluster components | Deprecated APIs | Checks whether the cluster uses deprecated APIs. |
Cluster configurations | iptables configurations | Checks whether the iptables configurations are valid. |
Cluster configurations | Operating systems | Checks whether the operating system can be updated. |
Cluster configurations | yum | Checks whether Yum is normal. |
Cluster configurations | Disks | Checks whether the file system of the node is normal. |
Cluster configurations | Disks | Checks whether the free disk space of the node exceeds 5% of the total disk space. |
Cluster configurations | Swap | Checks whether the node has Swap enabled. |
Cluster configurations | NTP | Checks whether the NTP of the node is normal. |
Cluster configurations | Systemd | Checks whether the Systemd version of the node is later than systemd-219-67. |
Cluster configurations | Kubelet | Checks whether the kubelet configuration meets the requirement. |
Cluster configurations | Container runtime | Checks whether the Docker runtime or Containerd runtime is normal. |
Cluster configurations | Kernel configuration | Checks whether the kernel configuration of the node is normal. |
Cluster configurations | Manifest configuration | Checks whether the manifest file meets the requirement. |
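Two of the node-level items in the preceding table, the free disk space check and the Systemd version check, are simple threshold comparisons. The following Python sketch shows how such comparisons could be reproduced locally before you trigger a precheck. It is not the implementation that ACK uses. The function names are hypothetical, and the sketch assumes that the Systemd version string has the form systemd-<major>-<release>.

```python
import re
import shutil

MIN_FREE_RATIO = 0.05     # free space must exceed 5% of the total disk space
MIN_SYSTEMD = (219, 67)   # version must be later than systemd-219-67


def free_space_ok(total_bytes: int, free_bytes: int) -> bool:
    """Return True if free space exceeds 5% of the total disk space."""
    return total_bytes > 0 and free_bytes / total_bytes > MIN_FREE_RATIO


def parse_systemd_version(text: str) -> tuple:
    """Extract (major, release) from a string such as 'systemd-219-78.el7_9.7'."""
    match = re.search(r"systemd-(\d+)-(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized systemd version: {text!r}")
    return int(match.group(1)), int(match.group(2))


def systemd_ok(text: str) -> bool:
    """Return True if the version is strictly later than systemd-219-67."""
    return parse_systemd_version(text) > MIN_SYSTEMD


if __name__ == "__main__":
    usage = shutil.disk_usage("/")  # root file system of the local machine
    print("free disk space check passed:", free_space_ok(usage.total, usage.free))
```

Both comparisons are strict, which matches the wording "exceeds" and "later than" in the table. A node with exactly 5% free space or exactly systemd-219-67 does not pass in this sketch.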
Cluster migration check
A precheck is automatically triggered before a cluster is migrated. The cluster is migrated only if the cluster passes the precheck. Cluster migration check is suitable for the following scenarios:
Migrate from an ACK dedicated cluster to an ACK Pro cluster.
Migrate from an ACK Basic cluster to an ACK Pro cluster.
Cluster migration check consists of the following checks:
Cluster resource check: checks cloud resources related to ACK clusters, such as SLB instances, ECS instances, and VPCs.
Cluster component check: checks the configurations of the components in ACK clusters. For example, the system checks whether unavailable API Services exist.
Cluster configuration check: checks configurations related to the nodes in ACK clusters. To perform the cluster configuration check, the system needs to create a pod on each node to collect information.
Use of components: After you migrate from an ACK dedicated cluster to an ACK Pro cluster, some components are managed by ACK. Therefore, the system checks whether these components are normal before the migration.
The cluster migration check items vary based on the type, runtime, and version of the cluster. The check items in the following table are for reference only. The check items displayed in the console take precedence.
Type | Check item | Description |
Cluster resources | APIServer SLB | Checks whether the SLB instance exists. |
Cluster resources | APIServer SLB | Checks whether the status of the SLB instance is normal. |
Cluster resources | APIServer SLB | Checks whether the configurations of the SLB listeners are valid, including the listener ports and protocol. |
Cluster resources | APIServer SLB | Checks whether the configurations of the SLB backend server groups are valid. |
Cluster resources | APIServer SLB | Checks whether the configuration of SLB access control is normal. If no access control is configured, this check item displays Normal. |
Cluster resources | VPC | Checks whether the VPC exists. |
Cluster resources | VPC | Checks whether the status of the VPC is normal. |
Cluster resources | vSwitch | Checks whether the vSwitch exists. |
Cluster resources | vSwitch | Checks whether the status of the vSwitch is normal. |
Cluster resources | vSwitch | Checks whether the vSwitch can provide no less than two idle IP addresses. |
Cluster resources | ECS | Checks whether the ECS instance exists. |
Cluster resources | ECS | Checks whether the status of the ECS instance is normal. |
Cluster resources | ECS | Checks whether the security group of the ECS instance is normal. |
Cluster resources | ECS | Checks whether the status of the Cloud Assistant client is normal. |
Cluster components | Kube Proxy Master | Checks whether the component exists. |
Cluster components | Kube Proxy Worker | Checks whether the component exists. |
Cluster components | API Service | Checks whether unavailable API Services exist. |
Cluster components | Cluster instances | Checks whether the number of master instances in the cluster is three or five. |
Cluster components | Nodes | Checks whether the node IP address exists. |
Cluster components | Nodes | Checks whether the node is schedulable. |
Cluster components | Nodes | Checks whether the node is ready. |
Cluster components | Nodes | Checks whether the operating system of the node can be updated. |
Cluster components | Nodes | Checks whether the number of available pods on the node is greater than two. |
Cluster configurations | Operating systems | Checks whether the operating system can be updated. |
Cluster configurations | yum | Checks whether Yum is normal. |
Use of components | Cloud Controller Manager | Checks whether the cloud controller manager is normal. |
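The node items in the preceding table check whether a node is schedulable and ready. As a minimal sketch, assuming you have saved the output of kubectl get nodes -o json and parsed it into a Python dictionary, the two conditions can be read from the standard Node fields spec.unschedulable and status.conditions. The function names and the sample data are hypothetical, and the sketch does not describe how ACK performs the check.

```python
def node_is_schedulable(node: dict) -> bool:
    """A node is unschedulable when spec.unschedulable is true, for example after a cordon."""
    return not node.get("spec", {}).get("unschedulable", False)


def node_is_ready(node: dict) -> bool:
    """A node is ready when its Ready condition has the status "True"."""
    for condition in node.get("status", {}).get("conditions", []):
        if condition.get("type") == "Ready":
            return condition.get("status") == "True"
    return False  # no Ready condition reported


def failing_nodes(node_list: dict) -> list:
    """Return the names of nodes that are not ready or not schedulable."""
    return [
        node["metadata"]["name"]
        for node in node_list.get("items", [])
        if not (node_is_ready(node) and node_is_schedulable(node))
    ]


# Hypothetical sample that mimics the shape of a NodeList object.
SAMPLE = {"items": [
    {"metadata": {"name": "node-a"}, "spec": {},
     "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
    {"metadata": {"name": "node-b"}, "spec": {"unschedulable": True},
     "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
    {"metadata": {"name": "node-c"}, "spec": {},
     "status": {"conditions": [{"type": "Ready", "status": "Unknown"}]}},
]}

if __name__ == "__main__":
    print(failing_nodes(SAMPLE))  # node-b is cordoned, node-c is not ready
```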
Component check
Component check is suitable for component update scenarios. A precheck is automatically triggered before a component is updated. The component is updated only if the cluster passes the precheck.
The component check items vary based on the type, runtime, and version of the cluster. The check items in the following table are for reference only. The check items displayed in the console take precedence.
Type | Check item | Description |
cloud-controller-manager | Addon_CCM | Checks whether the update causes SLB changes. |
cloud-controller-manager | Component_Block_Version | Checks whether the cloud controller manager can be updated. |
csi-plugin | DaemonSet_Annotation | Checks whether the annotations of the DaemonSet meet the requirement. |
csi-plugin | Csi_Driver_Attributes | Checks whether the CSI driver attribute meets the requirement. |
csi-plugin | Node_Status_Ready | Checks whether the node is ready. |
csi-provisioner | Stateful_Set_Exist | Checks whether the resource is a StatefulSet. |
csi-provisioner | Deployment_Annotation | Checks whether the annotations of the Deployment meet the requirement. |
csi-provisioner | Storage_Class_Attributes | Checks whether the StorageClass attribute meets the requirement. |
csi-provisioner | Csi_Provisioner_Node_Count | Checks whether the number of ready nodes is equal to or greater than two. |
terway-eniip | Systemd | Checks whether the Systemd version of the node is later than systemd-219-67. |
nginx-ingress-controller | Deployment_Healthy | Checks whether the NGINX Ingress Deployment is healthy. |
nginx-ingress-controller | Deployment_Not_Under_HPA | Checks whether a horizontal pod autoscaler (HPA) is configured for the Deployment. |
nginx-ingress-controller | Deployment_Not_Modified | Checks whether the Deployment is changed. |
nginx-ingress-controller | Nginx_Ingress_Pod_Error_Log | Checks whether NGINX error logs are generated. |
nginx-ingress-controller | LoadBalancer_Service_Healthy | Checks whether the NGINX Services are healthy. |
nginx-ingress-controller | Nginx_Ingress_Configuration | Checks whether incompatible configurations exist in Ingresses. |
aliyun-acr-credential-helper | RamRole_Exist | Checks whether the component is assigned the AliyunCSManagedAcrRole role. |
ack-cost-exporter | RamRole_Exist | Checks whether the component is assigned the AliyunCSManagedCostRole role. |
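The Deployment_Not_Under_HPA item in the preceding table checks whether an HPA is configured for the NGINX Ingress Deployment. The following Python sketch illustrates one way to look for such an HPA yourself, assuming you have parsed the output of kubectl get hpa --all-namespaces -o json into a dictionary. It relies on the standard HPA field spec.scaleTargetRef. The function name is hypothetical, and the namespace and Deployment name in the sample are placeholders, not values confirmed by ACK.

```python
def hpas_targeting(hpa_list: dict, namespace: str, deployment: str) -> list:
    """Return the names of HPAs in the namespace whose scale target is the given Deployment."""
    matches = []
    for hpa in hpa_list.get("items", []):
        metadata = hpa.get("metadata", {})
        target = hpa.get("spec", {}).get("scaleTargetRef", {})
        if (metadata.get("namespace") == namespace
                and target.get("kind") == "Deployment"
                and target.get("name") == deployment):
            matches.append(metadata.get("name"))
    return matches


# Hypothetical sample that mimics the shape of a HorizontalPodAutoscalerList object.
SAMPLE_HPAS = {"items": [
    {"metadata": {"name": "ingress-hpa", "namespace": "kube-system"},
     "spec": {"scaleTargetRef": {"kind": "Deployment", "name": "nginx-ingress-controller"}}},
    {"metadata": {"name": "web-hpa", "namespace": "default"},
     "spec": {"scaleTargetRef": {"kind": "Deployment", "name": "web"}}},
]}

if __name__ == "__main__":
    found = hpas_targeting(SAMPLE_HPAS, "kube-system", "nginx-ingress-controller")
    print("HPAs that target the Deployment:", found)
```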
Node pool check
Node pool check is suitable for node pool update scenarios. A precheck is automatically triggered before a node pool is updated. The node pool is updated only if the node pool passes the precheck.
Node pool check consists of the following checks:
Cluster resource check: checks cloud resources related to ACK clusters, such as SLB instances and VPCs.
Cluster component check: checks the configurations of ACK clusters, nodes, and applications.
Cluster configuration check: checks configurations related to the nodes in ACK clusters. To perform the cluster configuration check, the system needs to create a pod on each node to collect information.
The node pool check items vary based on the type, runtime, and version of the cluster. The check items in the following table are for reference only. The check items displayed in the console take precedence.
Type | Check item | Description |
Cluster resources | APIServer SLB | Checks whether the SLB instance exists. |
Cluster resources | APIServer SLB | Checks whether the status of the SLB instance is normal. |
Cluster resources | APIServer SLB | Checks whether the configurations of the SLB listeners are valid, including the listener ports and protocol. |
Cluster resources | APIServer SLB | Checks whether the configurations of the SLB backend server groups are valid. |
Cluster resources | APIServer SLB | Checks whether the configuration of SLB access control is valid. If no access control is configured, this check item displays Normal. |
Cluster resources | VPC | Checks whether the VPC exists. |
Cluster resources | VPC | Checks whether the status of the VPC is normal. |
Cluster resources | vSwitch | Checks whether the vSwitch exists. |
Cluster resources | vSwitch | Checks whether the status of the vSwitch is normal. |
Cluster resources | vSwitch | Checks whether the vSwitch can provide no less than two idle IP addresses. |
Cluster components | API Service | Checks whether unavailable API Services exist. |
Cluster components | Cluster instances | Checks whether the number of master instances in the cluster is three or five. |
Cluster components | Nodes | Checks whether the node is ready. |
Cluster components | Nodes | Checks whether the number of available pods on the node is greater than two. |
Cluster components | HostPath | Checks whether pods that use hostPath exist on the node. |
Cluster configurations | iptables configurations | Checks whether the iptables configurations are valid. |
Cluster configurations | Operating systems | Checks whether the operating system can be updated. |
Cluster configurations | yum | Checks whether Yum is normal. |
Cluster configurations | Disks | Checks whether the file system of the node is normal. |
Cluster configurations | Free disk space on nodes | Checks whether the free disk space of the node exceeds 5% of the total disk space. |
Cluster configurations | Swap | Checks whether the node has Swap enabled. |
Cluster configurations | NTP | Checks whether the NTP of the node is normal. |
Cluster configurations | Systemd | Checks whether the Systemd version of the node is later than systemd-219-67. |
Cluster configurations | Kubelet | Checks whether the kubelet configuration meets the requirement. |
Cluster configurations | Container runtime | Checks whether the Docker runtime or Containerd runtime is normal. |
Cluster configurations | Kernel configuration | Checks whether the kernel configuration of the node is normal. |
Cluster configurations | Manifest configuration | Checks whether the manifest file meets the requirement. |
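The HostPath item in the preceding table looks for pods that use hostPath on a node. As a rough sketch, assuming you have parsed the output of kubectl get pods --all-namespaces -o json into a dictionary, the following Python function lists the pods on a given node that declare hostPath volumes, together with the host directories they mount. The function name and the sample data are hypothetical. The sketch relies only on the standard Pod fields spec.nodeName and spec.volumes and is not the logic that ACK runs.

```python
def pods_using_hostpath(pod_list: dict, node_name: str) -> dict:
    """Map 'namespace/name' to the host paths mounted by each pod on the node."""
    result = {}
    for pod in pod_list.get("items", []):
        spec = pod.get("spec", {})
        if spec.get("nodeName") != node_name:
            continue
        # A volume is a hostPath volume when it carries a non-empty hostPath entry.
        paths = [
            volume["hostPath"]["path"]
            for volume in spec.get("volumes", [])
            if volume.get("hostPath")
        ]
        if paths:
            metadata = pod["metadata"]
            result[f'{metadata["namespace"]}/{metadata["name"]}'] = paths
    return result


# Hypothetical sample that mimics the shape of a PodList object.
SAMPLE_PODS = {"items": [
    {"metadata": {"namespace": "default", "name": "log-agent"},
     "spec": {"nodeName": "node-a",
              "volumes": [{"name": "logs", "hostPath": {"path": "/var/log"}}]}},
    {"metadata": {"namespace": "default", "name": "web"},
     "spec": {"nodeName": "node-a",
              "volumes": [{"name": "cache", "emptyDir": {}}]}},
    {"metadata": {"namespace": "default", "name": "other"},
     "spec": {"nodeName": "node-b",
              "volumes": [{"name": "data", "hostPath": {"path": "/data"}}]}},
]}

if __name__ == "__main__":
    print(pods_using_hostpath(SAMPLE_PODS, "node-a"))
```

Reviewing the listed directories helps you decide whether data stored on the system disk of the node needs to be backed up before the node pool is updated.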
Suggestions on how to fix cluster issues
Issue | Suggestion |
Role Aliyun_ARMS_CMonitor_Role missing | Grant the cluster permissions on Managed Service for Prometheus. For more information about how to manually grant permissions on Application Real-Time Monitoring Service (ARMS) and Tracing Analysis, see Enable Kubernetes Monitoring for a Kubernetes cluster. |
Outdated Systemd version | |
Outdated component version | Update the component. For more information, see Manage components. |
Yum timeout | Run the following command to check whether Yum times out. The default timeout period is 10 seconds. |
Unavailable API Services | |
Pods using hostPath | When the system updates a node by replacing its system disk, data loss may occur if pods on the node use hostPath volumes to mount host directories. Check the directories that are mounted by these pods. If hostPath is not used or no risk of data loss exists, you can proceed with the update. The check result is for reference only. |
Use of deprecated APIs | Identify the resource that uses the deprecated APIs and take actions accordingly. For more information, see Deprecated APIs. |
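For the Unavailable API Services issue in the preceding table, the following Python sketch shows one way to identify the affected API Services. It assumes you have parsed the output of kubectl get apiservices -o json into a dictionary and treats an API Service as unavailable when it has no Available condition with the status "True". The function name and the sample data are hypothetical, and the sketch is not the check that ACK performs.

```python
def unavailable_apiservices(apiservice_list: dict) -> list:
    """Return the names of API Services whose Available condition is not "True"."""
    unavailable = []
    for service in apiservice_list.get("items", []):
        conditions = service.get("status", {}).get("conditions", [])
        available = any(
            condition.get("type") == "Available" and condition.get("status") == "True"
            for condition in conditions
        )
        if not available:
            unavailable.append(service["metadata"]["name"])
    return unavailable


# Hypothetical sample that mimics the shape of an APIServiceList object.
SAMPLE_APISERVICES = {"items": [
    {"metadata": {"name": "v1.apps"},
     "status": {"conditions": [{"type": "Available", "status": "True"}]}},
    {"metadata": {"name": "v1beta1.metrics.k8s.io"},
     "status": {"conditions": [{"type": "Available", "status": "False",
                                "reason": "MissingEndpoints"}]}},
]}

if __name__ == "__main__":
    print(unavailable_apiservices(SAMPLE_APISERVICES))
```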
Deprecated APIs
If your cluster runs Kubernetes 1.20 or later, the precheck checks whether deprecated APIs are used in your cluster. You can view the deprecated APIs that are used by the cluster in the check report.
For example, before you update your cluster from Kubernetes 1.20 to Kubernetes 1.22, the system checks whether deprecated APIs are used in your cluster by scanning the audit logs that were generated the previous day.
The precheck result is for reference only. You can proceed with the update even if your cluster runs Kubernetes 1.20 and uses deprecated APIs.
If you continue to use the deprecated APIs in Kubernetes 1.22, security risks may exist.
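The precheck described above scans audit logs for deprecated API usage. How ACK implements the scan is not described in this topic. The following Python sketch only illustrates the general idea under two assumptions: the audit log is available as one JSON event per line, and the Kubernetes API server has marked requests to deprecated APIs with the k8s.io/deprecated audit annotation. The function name and the sample events are hypothetical.

```python
import json
from collections import Counter


def deprecated_api_usage(lines) -> Counter:
    """Count requests to deprecated APIs, grouped by (API, user agent)."""
    usage = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        annotations = event.get("annotations") or {}
        if annotations.get("k8s.io/deprecated") != "true":
            continue
        ref = event.get("objectRef") or {}
        # Build a label such as "extensions/v1beta1/ingresses"; the core group has no apiGroup.
        parts = (ref.get("apiGroup"), ref.get("apiVersion"), ref.get("resource"))
        api = "/".join(part for part in parts if part)
        usage[(api, event.get("userAgent", "unknown"))] += 1
    return usage


# Hypothetical audit events, trimmed to the fields that the sketch reads.
SAMPLE_LINES = [
    json.dumps({"userAgent": "kubectl/v1.20.4",
                "objectRef": {"apiGroup": "extensions", "apiVersion": "v1beta1",
                              "resource": "ingresses"},
                "annotations": {"k8s.io/deprecated": "true",
                                "k8s.io/removed-release": "1.22"}}),
    json.dumps({"userAgent": "kubectl/v1.20.4",
                "objectRef": {"apiGroup": "extensions", "apiVersion": "v1beta1",
                              "resource": "ingresses"},
                "annotations": {"k8s.io/deprecated": "true"}}),
    json.dumps({"userAgent": "okhttp/3.12.1",
                "objectRef": {"apiVersion": "v1", "resource": "pods"}}),
]

if __name__ == "__main__":
    for (api, agent), count in deprecated_api_usage(SAMPLE_LINES).items():
        print(f"{api} requested {count} time(s) by {agent}")
```

Grouping by user agent helps you map each deprecated API call to a client, which corresponds to the examples such as kubectl, Go-http-client, and okhttp that appear in the check report.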
The following table describes the types of deprecated APIs. Before you update a cluster that uses deprecated APIs, we recommend that you refer to the Type column of the following table and perform operations that correspond to the type of deprecated API used by the cluster.
Type | Suggestion | Example |
core | Key Kubernetes components: ACK automatically updates key Kubernetes components. You do not need to update the components. Information about the components is not displayed on the precheck page. | apiserver, scheduler, and kube-controller-manager |
ack | ACK components: ACK components require manual update. You can update ACK components based on the instructions on the Add-ons page of the ACK console. | metrics-server, nginx-ingress-controller, and coredns |
opensource | Open source components: Some open source components are listed in the ACK console. You can decide whether to update the components. These components can only be manually updated. Other open source components may be classified into the unknown type. Note: Deprecated APIs in the precheck result are for reference only. You can proceed with the update even if your cluster uses deprecated APIs. Update the components based on your business requirements. | rancher and elasticsearch-operator |
unknown | Unknown sources: Deprecated APIs that do not belong to the preceding types are considered unknown resources and listed in the ACK console. You can decide whether to update the components. These components can only be manually updated. Note: Deprecated APIs in the precheck result are for reference only. You can proceed with the update even if your cluster uses deprecated APIs. Update the components based on your business requirements. | kubectl, agent, Go-http-client, and okhttp |
Perform the following operations to view the information about a deprecated API:
On the Upgrade Cluster page, click Precheck and then click View Details.
On the Report page, click the Cluster Components tab, and then click the Troubleshoot tab.
Click the button next to Deprecated Kubernetes APIs.
In the dialog box that appears, click Deprecated Kubernetes APIs and click the link below.
On the Deprecated Kubernetes APIs page, you can view the information about the deprecated APIs.