This topic describes the diagnostic procedure for storage and how to troubleshoot storage exceptions.
Diagnostic procedure
Run the following command to view the pod events and check whether the pod fails to start because of a storage issue:
kubectl describe pods <pod-name>
If the events show that the volume is already mounted to the pod, the pod fails to be launched due to other issues, such as CrashLoopBackOff. To resolve the issue, submit a ticket.
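As a quick filter, the pod events can be narrowed down to volume-related failures. This is only a sketch: the pod name my-app-0 and the event keywords are illustrative and should be adjusted to your workload.

```shell
# Illustrative pod name; replace my-app-0 with your pod.
# Keep only events that relate to volume attach or mount failures.
# grep exits non-zero when nothing matches; "|| true" keeps scripts going.
kubectl describe pod my-app-0 | grep -iE 'failedmount|failedattachvolume' || true
```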
Run the following command to check whether the Container Storage Interface (CSI) plug-in works as expected:
kubectl get pod -n kube-system | grep csi
Expected output:
NAME                  READY   STATUS    RESTARTS   AGE
csi-plugin-***        4/4     Running   0          23d
csi-provisioner-***   7/7     Running   0          14d
Note: If the status of the pod is not Running, run the
kubectl describe pods <pod-name> -n kube-system
command to view the pod events and identify why the containers exited.
Run the following command to check whether the version of the CSI plug-in is up-to-date:
kubectl get ds csi-plugin -n kube-system -oyaml | grep image
Expected output:
image: registry.cn-****.aliyuncs.com/acs/csi-plugin:v*****-aliyun
For more information about the latest CSI version, see csi-plugin and csi-provisioner. If your cluster uses an earlier CSI version, update the plug-in to the latest version. For more information, see Manage components. For more information about how to troubleshoot volume plug-in update failures, see Troubleshoot component update failures.
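To compare the deployed version with the latest release, the tag can be cut off the image reference. This is a sketch: the container index 0 is an assumption, so verify it against your DaemonSet spec.

```shell
# Read the image of the first container in the csi-plugin DaemonSet
# (index 0 is an assumption; check your DaemonSet if it has several containers).
IMAGE=$(kubectl get ds csi-plugin -n kube-system -o jsonpath='{.spec.template.spec.containers[0].image}' 2>/dev/null || true)
# The tag is the part after the last colon of the image reference.
echo "current csi-plugin tag: ${IMAGE##*:}"
```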
Troubleshoot the pod pending issue.
If the pod uses a disk, refer to The status of the pod that uses a disk is not Running.
If the pod uses a File Storage NAS (NAS) file system, refer to The status of the pod that uses a NAS file system is not Running.
If the pod uses an Object Storage Service (OSS) bucket, refer to The status of the pod that uses an OSS bucket is not Running.
Troubleshoot the issue that the status of the persistent volume claim (PVC) is not Bound.
If the PVC corresponds to a disk, refer to The status of the PVC is not Bound in the Disk troubleshooting section.
If the PVC corresponds to a NAS file system, refer to The status of the PVC is not Bound in the NAS troubleshooting section.
If the PVC corresponds to an OSS bucket, refer to The status of the PVC is not Bound in the OSS troubleshooting section.
If the issue persists, submit a ticket.
Troubleshoot component update failures
If you fail to update the csi-provisioner and csi-plugin components, perform the following steps to troubleshoot the issue.
csi-provisioner
By default, the csi-provisioner component is deployed by using a Deployment that creates two pods. The pods are mutually exclusive and cannot be scheduled to the same node. If you fail to update the component, check whether the cluster has only one schedulable node.
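A minimal check for this condition, assuming the standard kubectl node output where STATUS is the second column:

```shell
# csi-provisioner runs two pods that repel each other, so the update
# needs at least two schedulable nodes in the Ready state.
READY_NODES=$(kubectl get nodes --no-headers 2>/dev/null | awk '$2 == "Ready"' | wc -l)
if [ "$READY_NODES" -lt 2 ]; then
  echo "only $READY_NODES Ready node(s); csi-provisioner cannot place both pods"
else
  echo "$READY_NODES Ready nodes available"
fi
```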
For version 1.14 or earlier, the csi-provisioner component is deployed by using a StatefulSet. If the csi-provisioner component in your cluster is deployed by using a StatefulSet, you can run the
kubectl delete sts csi-provisioner -n kube-system
command to delete the current csi-provisioner component. Then, log on to the ACK console and reinstall the csi-provisioner component. For more information, see Manage components.
csi-plugin
Check whether the cluster contains nodes that are in the NotReady state. If NotReady nodes exist, ACK fails to update the DaemonSet that is used to deploy the csi-plugin component.
If you fail to update the csi-plugin component but all plug-ins work as expected, the issue is caused by an update timeout error. If a timeout error occurs when the component center updates the csi-plugin component, the component center automatically rolls back the update. To resolve this issue, submit a ticket.
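The NotReady check above can be sketched as follows, again assuming the standard kubectl STATUS column:

```shell
# List nodes whose STATUS column is not exactly "Ready"; these block the
# rolling update of the csi-plugin DaemonSet.
kubectl get nodes --no-headers 2>/dev/null | awk '$2 != "Ready" {print $1, $2}'
```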
Disk troubleshooting
To mount a disk to a node, make sure that the node and disk are created in the same region and zone. If they are created in different regions or zones, the disk cannot be mounted to the node.
The types of disks supported by different types of Elastic Compute Service (ECS) instances vary. For more information, see Overview of instance families.
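A hedged sketch of the zone check: my-node and the disk zone value are placeholders, and the node zone is read from the well-known Kubernetes topology label.

```shell
# Read the zone of the node from the well-known topology label.
NODE_ZONE=$(kubectl get node my-node -o jsonpath='{.metadata.labels.topology\.kubernetes\.io/zone}' 2>/dev/null || true)
# Placeholder: copy the real zone from the disk details in the ECS console.
DISK_ZONE="cn-hangzhou-b"
if [ "$NODE_ZONE" = "$DISK_ZONE" ]; then
  echo "zones match: $NODE_ZONE"
else
  echo "zone mismatch: node=$NODE_ZONE disk=$DISK_ZONE"
fi
```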
The status of the pod is not Running
Problem:
The status of the PVC is Bound but the status of the pod is not Running.
Cause:
No node is available for scheduling.
An error occurs when the system mounts the disk.
The ECS instance does not support the specified disk type.
Solution:
Schedule the pod to another node. For more information, see Schedule pods to specific nodes.
Run the
kubectl describe pods <pod-name>
command to view the pod events. Troubleshoot the issue based on the events.
If an error occurs when the system mounts a disk, refer to FAQ about disk volumes.
If an error occurs when the system unmounts a disk, refer to FAQ about disk volumes.
If no event is displayed, submit a ticket.
If the ECS instance does not support the specified disk type, select a disk type that is supported by the ECS instance. For more information, see Overview of instance families.
To troubleshoot ECS API issues, refer to ErrorCode.
The status of the PVC is not Bound
Problem:
The status of the PVC is not Bound and the status of the pod is not Running.
Cause:
Static: The selectors of the PVC and persistent volume (PV) fail to meet certain conditions. Therefore, the PV and PVC cannot be associated. For example, the selector configuration of the PVC is different from that of the PV, the selectors use different StorageClass names, or the status of the PV is Released.
Dynamic: The csi-provisioner component fails to create the disk.
Solution:
Static: Check the relevant YAML content. For more information, see Mount a statically provisioned disk volume by using kubectl.
Note: If the status of the PV is Released, the PV cannot be reused. You need to create a new PV to use the disk.
Dynamic: Run the
kubectl describe pvc <pvc-name> -n <namespace>
command to view the PVC events. Troubleshoot the issue based on the events.
If an error occurs when the system expands a disk, refer to FAQ about disk volumes.
If no event is displayed, submit a ticket.
If an error occurs when you call the ECS API to create a disk, refer to ErrorCode and troubleshoot the issue. If the issue persists, submit a ticket.
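For the static case, PVs stuck in the Released state can be listed directly, because a Released PV can never bind to a new PVC. The sketch assumes the default kubectl get pv column layout, where STATUS is the fifth column.

```shell
# STATUS is the fifth column of "kubectl get pv"; Released PVs cannot be
# re-bound and must be recreated.
kubectl get pv --no-headers 2>/dev/null | awk '$5 == "Released" {print $1}'
```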
NAS troubleshooting
To mount a NAS file system to a node, make sure that the node and NAS file system are deployed in the same virtual private cloud (VPC). If the node and NAS file system are deployed in different VPCs, use Cloud Enterprise Network (CEN) to connect them.
You can mount a NAS file system to a node that is deployed in a zone different from the NAS file system.
The path to which an Extreme NAS file system or CPFS 2.0 file system is mounted must start with /share.
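A sketch of the path check, assuming the NAS CSI plug-in stores the mount path in the path volume attribute of the PV; my-nas-pv and the attribute name are assumptions to verify against your PV spec.

```shell
# Read the mount path from the PV (attribute name "path" is an assumption).
PV_PATH=$(kubectl get pv my-nas-pv -o jsonpath='{.spec.csi.volumeAttributes.path}' 2>/dev/null || true)
case "$PV_PATH" in
  /share*) echo "path OK: $PV_PATH" ;;
  *)       echo "path must start with /share for Extreme NAS and CPFS 2.0: $PV_PATH" ;;
esac
```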
The status of the pod is not Running
Problem:
The status of the PVC is Bound but the status of the pod is not Running.
Cause:
fsGroup is configured when you mount the NAS file system. In this case, chmod is run on a large number of files and slows down the mount.
Port 2049 is blocked by the security group rules.
The NAS file system and node are deployed in different VPCs.
Solution:
Check whether fsGroup is configured. If it is, delete the fsGroup setting, restart the pod, and try to mount the NAS file system again.
Check whether port 2049 of the node that hosts the pod is blocked. If it is, unblock the port and try again. For more information, see Add a security group rule.
If the NAS file system and node are deployed in different VPCs, use CEN to connect them.
For other causes, run the
kubectl describe pods <pod-name>
command to view the pod events. Troubleshoot the issue based on the events. For more information, see FAQ about NAS volumes.
If no event is displayed, submit a ticket.
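The port 2049 check above can be scripted on a node without extra tools by using the bash /dev/tcp pseudo-device; the mount target domain below is a placeholder.

```shell
# Probe TCP port 2049 (NFS) by using the bash /dev/tcp pseudo-device.
check_nfs_port() {
  if (echo > "/dev/tcp/$1/2049") 2>/dev/null; then
    echo "port 2049 on $1 is reachable"
  else
    echo "port 2049 on $1 is blocked or unreachable"
  fi
}
# Placeholder: replace with the mount target of your NAS file system.
check_nfs_port example.cn-hangzhou.nas.aliyuncs.com
```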
The status of the PVC is not Bound
Problem:
The status of the PVC is not Bound and the status of the pod is not Running.
Cause:
Static: The selectors of the PVC and PV fail to meet certain conditions. Therefore, the PV and PVC cannot be associated. For example, the selector configuration of the PVC is different from that of the PV, the selectors use different StorageClass names, or the status of the PV is Released.
Dynamic: The csi-provisioner component fails to mount the NAS file system.
Solution:
Static: Check the relevant YAML content. For more information, see Mount a statically provisioned NAS volume.
Note: If the status of the PV is Released, the PV cannot be reused. Create a new PV that uses the NAS file system.
Dynamic: Run the
kubectl describe pvc <pvc-name> -n <namespace>
command to view the PVC events. Troubleshoot the issue based on the events. For more information, see FAQ about NAS volumes.
If no event is displayed, submit a ticket.
OSS troubleshooting
When you mount an OSS bucket to a node, you need to specify the AccessKey pair in the PV. You can store the AccessKey pair in a Secret.
If the OSS bucket and node are created in different regions, set Bucket URL to the public endpoint of the OSS bucket. If the OSS bucket and node are created in the same region, we recommend that you use the private endpoint of the OSS bucket.
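A small helper can tell the two endpoint types apart; it relies on the fact that internal (private) OSS endpoints carry the -internal suffix, for example oss-cn-hangzhou-internal.aliyuncs.com.

```shell
# Classify an OSS endpoint as private (internal) or public.
classify_endpoint() {
  case "$1" in
    *-internal.aliyuncs.com*) echo "private endpoint" ;;
    *)                        echo "public endpoint" ;;
  esac
}
classify_endpoint "oss-cn-hangzhou-internal.aliyuncs.com"   # private endpoint
classify_endpoint "oss-cn-hangzhou.aliyuncs.com"            # public endpoint
```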
The status of the pod is not Running
Problem:
The status of the PVC is Bound but the status of the pod is not Running.
Cause:
fsGroup is configured when you mount the OSS bucket. In this case, chmod is run on a large number of files and slows down the mount.
The OSS bucket and node are created in different regions and the private endpoint of the OSS bucket is used. As a result, the node fails to connect to the bucket endpoint.
Solution:
Check whether fsGroup is configured. If it is, delete the fsGroup setting, restart the pod, and try to mount the OSS bucket again.
Check whether the OSS bucket and node are created in the same region. If they are created in different regions, check whether the private endpoint of the OSS bucket is used. If it is, change to the public endpoint of the OSS bucket.
For other causes, run the
kubectl describe pods <pod-name>
command to view the pod events. Troubleshoot the issue based on the events. For more information, see FAQ about OSS volumes.
If no event is displayed, submit a ticket.
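The fsGroup check can be sketched as follows; my-app-0 is a placeholder pod name.

```shell
# An empty result means the pod does not set fsGroup; a non-empty result means
# a recursive chmod runs on the volume at mount time and can stall the mount.
FS_GROUP=$(kubectl get pod my-app-0 -o jsonpath='{.spec.securityContext.fsGroup}' 2>/dev/null || true)
if [ -n "$FS_GROUP" ]; then
  echo "fsGroup=$FS_GROUP is set; remove it and restart the pod"
else
  echo "fsGroup is not set"
fi
```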
The status of the PVC is not Bound
Problem:
The status of the PVC is not Bound and the status of the pod is not Running.
Cause:
Static: The selectors of the PVC and PV fail to meet certain conditions. Therefore, the PV and PVC cannot be associated. For example, the selector configuration of the PVC is different from that of the PV, the selectors use different StorageClass names, or the status of the PV is Released.
Dynamic: The csi-provisioner component fails to mount the OSS bucket.
Solution:
Static: Check the relevant YAML content. For more information, see Mount a statically provisioned OSS volume.
Note: If the status of the PV is Released, the PV cannot be reused. Create a new PV that uses the OSS bucket.
Dynamic: Run the
kubectl describe pvc <pvc-name> -n <namespace>
command to view the PVC events. Troubleshoot the issue based on the events. For more information, see FAQ about OSS volumes.
If no event is displayed, submit a ticket.