This topic describes how to troubleshoot common issues related to storage and provides answers to some frequently asked questions about disk volumes and File Storage NAS (NAS) volumes.
Common issues
Perform the following steps to view the log of a volume plug-in and identify issues.
Run the following command to check whether events related to persistent volume claims (PVCs) or pods are generated:
kubectl get events
Expected output:
LAST SEEN   TYPE     REASON                 OBJECT                                                  MESSAGE
2m56s       Normal   FailedBinding          persistentvolumeclaim/data-my-release-mariadb-0         no persistent volumes available for this claim and no storage class is set
41s         Normal   ExternalProvisioning   persistentvolumeclaim/pvc-nas-dynamic-create-subpath8   waiting for a volume to be created, either by external provisioner "nasplugin.csi.alibabacloud.com" or manually created by system administrator
3m31s       Normal   Provisioning           persistentvolumeclaim/pvc-nas-dynamic-create-subpath8   External provisioner is provisioning volume for claim "default/pvc-nas-dynamic-create-subpath8"
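In clusters that generate many events, you can narrow the output to events that involve PVCs. A minimal example (run it in the namespace of the PVC, or add -A for all namespaces):

```shell
# List only the events whose involved object is a PersistentVolumeClaim
kubectl get events --field-selector involvedObject.kind=PersistentVolumeClaim
```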
Check whether the FlexVolume or CSI plug-in is deployed in the cluster.
Run the following command to check whether the FlexVolume plug-in is deployed in the cluster:
kubectl get pod -n kube-system | grep flexvolume
Expected output:
NAME             READY   STATUS    RESTARTS   AGE
flexvolume-***   4/4     Running   0          23d
Run the following command to check whether the CSI plug-in is deployed in the cluster:
kubectl get pod -n kube-system | grep csi
Expected output:
NAME                  READY   STATUS    RESTARTS   AGE
csi-plugin-***        4/4     Running   0          23d
csi-provisioner-***   7/7     Running   0          14d
Check whether the volume template matches the template of the volume plug-in used in the cluster. The supported volume plug-ins are FlexVolume and CSI.
If this is the first time you mount volumes in the cluster, check whether the driver specified in the persistent volume (PV) and StorageClass is a CSI driver or a FlexVolume driver. The name of the driver that you specified must be the same as the type of the volume plug-in that is deployed in the cluster.
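To verify the match, you can list the provisioner of each StorageClass in the cluster. A quick check (the provisioner naming patterns in the comment are the common ACK conventions; verify them against your cluster):

```shell
# Print each StorageClass together with its provisioner. CSI drivers end with
# "csi.alibabacloud.com" (for example, diskplugin.csi.alibabacloud.com);
# FlexVolume drivers use the "alicloud/..." form (for example, alicloud/disk).
kubectl get sc -o custom-columns=NAME:.metadata.name,PROVISIONER:.provisioner
```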
Check whether the volume plug-in is updated to the latest version.
Run the following command to query the image version of the FlexVolume plug-in:
kubectl get ds flexvolume -n kube-system -o yaml | grep image
Expected output:
image: registry.cn-hangzhou.aliyuncs.com/acs/Flexvolume:v1.14.8.109-649dc5a-aliyun
For more information about FlexVolume, see FlexVolume (Deprecated).
Run the following command to query the image version of the CSI plug-in:
kubectl get ds csi-plugin -n kube-system -o yaml | grep image
Expected output:
image: registry.cn-hangzhou.aliyuncs.com/acs/csi-plugin:v1.18.8.45-1c5d2cd1-aliyun
For more information about the CSI plug-in, see csi-plugin and csi-provisioner.
View logs.
If a PVC of the disk type is in the Pending state, the related PV has not been created. In this case, check the log of the provisioner plug-in.
If the FlexVolume plug-in is deployed in the cluster, run the following command to print the log of alicloud-disk-controller:
podid=$(kubectl get pod -n kube-system | grep alicloud-disk-controller | awk '{print $1}')
kubectl logs $podid -n kube-system
If the CSI plug-in is deployed in the cluster, run the following command to print the log of csi-provisioner:
podid=$(kubectl get pod -n kube-system | grep csi-provisioner | awk '{print $1}')
kubectl logs $podid -n kube-system -c csi-provisioner
Note: Two pods are created to run csi-provisioner. Therefore, the kubectl get pod -n kube-system | grep csi-provisioner | awk '{print $1}' command returns two pod IDs. Run the kubectl logs <PodID> -n kube-system -c csi-provisioner command for each of the two pods.
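To print the logs of both replicated pods in one pass, you can loop over the pod IDs, for example:

```shell
# Iterate over all csi-provisioner pods and print the log of each
for pod in $(kubectl get pod -n kube-system | grep csi-provisioner | awk '{print $1}'); do
  echo "==== $pod ===="
  kubectl logs "$pod" -n kube-system -c csi-provisioner
done
```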
If a mounting error occurs when the system starts a pod, you must check the log of FlexVolume or csi-plugin.
If the FlexVolume plug-in is deployed in the cluster, run the following command to print the log of FlexVolume:
kubectl get pod <pod-name> -owide
Log on to the Elastic Compute Service (ECS) instance where the pod runs and check the FlexVolume log files that match /var/log/alicloud/flexvolume_**.log.
If the CSI plug-in is deployed in the cluster, run the following command to print the log of csi-plugin:
nodeID=$(kubectl get pod <pod-name> -o wide | awk 'NR>1 {print $7}')
podID=$(kubectl get pods -n kube-system -o wide -l app=csi-plugin | grep $nodeID | awk '{print $1}')
kubectl logs $podID -n kube-system
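The awk expressions in these commands only select columns from the tabular kubectl output. The following self-contained sketch shows the extraction on sample output (the pod and node names are made up):

```shell
# Sample `kubectl get pod -o wide` output. NR>1 skips the header row,
# and $7 is the NODE column.
sample='NAME    READY   STATUS    RESTARTS   AGE   IP          NODE                   NOMINATED NODE   READINESS GATES
mypod   1/1     Running   0          1d    10.0.0.12   cn-hangzhou.10.0.0.1   <none>           <none>'
printf '%s\n' "$sample" | awk 'NR>1 {print $7}'   # prints cn-hangzhou.10.0.0.1
```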
View the log of kubelet.
Run the following command to query the node on which the pod runs:
kubectl get pod <pod-name> -owide | awk 'NR>1 {print $7}'
Log on to the node and check the /var/log/messages log file.
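To narrow the system log down to kubelet entries about volume mounting, you can filter it on the node; for example (the grep patterns are only illustrative):

```shell
# Show recent kubelet entries about volume mounting from the system log
grep kubelet /var/log/messages | grep -i mount | tail -n 20
# Equivalent on nodes that log through journald
journalctl -u kubelet --since "1 hour ago" | grep -i mount
```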
Quick recovery
If you fail to mount volumes to most of the pods on a node, you can schedule the pods to other nodes. For more information, see Schedule pods to specific nodes.
csi-plugin update failures
csi-plugin is deployed through a DaemonSet. If nodes that are in the NotReady state or a state other than Running exist in the cluster, Container Service for Kubernetes (ACK) fails to update csi-plugin. You need to manually fix the nodes and perform the update again. For more information, see Manage the CSI plug-in.
csi-plugin startup failures
Issue
csi-provisioner and csi-plugin fail to start. The main container logs of csi-plugin and csi-provisioner report the 403 - Forbidden error.
Cause
Security hardening is enabled for the metadata servers on nodes. The metadata cannot be accessed because CSI does not support security hardening.
Solution
Submit a ticket to contact the ECS team for technical support.
What do I do if the csi-provisioner update fails because the number of nodes in the cluster does not meet the requirements of the update precheck?
Issues
The csi-provisioner plug-in fails to pass the precheck because the number of nodes in the cluster does not meet the requirement.
The csi-provisioner plug-in passes the precheck and can be updated. However, the csi-provisioner pod crashes and the following 403 - Forbidden error is found in the log:
time="2023-08-05T13:54:00+08:00" level=info msg="Use node id : <?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\n \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\">\n <head>\n <title>403 - Forbidden</title>\n </head>\n <body>\n <h1>403 - Forbidden</h1>\n </body>\n</html>\n"
Cause
Cause for issue 1:
To ensure the high availability of csi-provisioner, csi-provisioner runs in a primary pod and a secondary pod. The primary and secondary pods are scheduled to different nodes. If your cluster has only one node, you cannot update csi-provisioner.
Cause for issue 2:
The security hardening mode is enabled for the node where csi-provisioner resides. This mode prevents access to the metadata server on the node.
Solutions
Solution for issue 1:
Update csi-provisioner. For more information, see Manage the CSI plug-in.
Solution for issue 2:
Disable the security hardening mode on the node to allow CSI to access the metadata of the node.
What do I do if the csi-provisioner update fails due to StorageClasses attribute changes?
Issue
csi-provisioner fails the precheck because the attributes of StorageClasses do not meet the requirements.
Cause
The attributes of the default StorageClasses were modified, or StorageClasses that use the same names as the default StorageClasses were deleted and recreated with different attributes. The attributes of the default StorageClasses must not be changed. Otherwise, csi-provisioner may fail to be updated.
Solution
Delete the following default StorageClasses: alicloud-disk-essd, alicloud-disk-available, alicloud-disk-efficiency, alicloud-disk-ssd, and alicloud-disk-topology. The deletion operation does not affect the applications in the cluster. Then, reinstall csi-provisioner. After csi-provisioner is reinstalled, the preceding default StorageClasses are automatically recreated.
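The deletion step described above can be performed in one command, for example:

```shell
# Delete the default disk StorageClasses; they are recreated automatically
# after csi-provisioner is reinstalled
kubectl delete sc alicloud-disk-essd alicloud-disk-available \
  alicloud-disk-efficiency alicloud-disk-ssd alicloud-disk-topology
```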
If you want to create custom StorageClasses, use names that are different from the names of the preceding default StorageClasses.
Do StorageClass changes affect existing volumes?
StorageClass changes do not affect existing volumes if the YAML files of the PVCs or PVs are not modified. For example, after you modify the allowVolumeExpansion setting in a StorageClass, the new setting takes effect on an existing volume only when you modify the storage capacity in the YAML file of the PVC.
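For example, assuming the StorageClass allows volume expansion, you can trigger the expansion of an existing volume by raising the requested capacity in the PVC. The PVC name and the size below are placeholders:

```shell
# Raise the requested storage of an existing PVC to trigger volume expansion
kubectl patch pvc <pvc-name> -p '{"spec":{"resources":{"requests":{"storage":"40Gi"}}}}'
```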
What do I do if the "failed to renew lease xxx timed out waiting for the condition" error is displayed in the log of csi-provisioner?
Issue
After you run the kubectl logs csi-provisioner-xxxx -n kube-system command to query the log of csi-provisioner, the failed to renew lease xxx timed out waiting for the condition error appears in the log.
Cause
Multiple replicated pods are provisioned for csi-provisioner to implement high availability. Kubernetes uses Leases to perform a leader election among the replicated pods of a component. During the election, csi-provisioner accesses the Kubernetes API server of the cluster to request the specified Lease. The replicated pod that acquires the Lease becomes the leader and provides services in the cluster. This issue occurs when csi-provisioner cannot access the Kubernetes API server of the cluster.
Solution
Check whether the cluster network and Kubernetes API server of the cluster are in the normal state. If the issue persists, submit a ticket.
OOM issues caused by volume plug-ins
csi-provisioner is a centralized volume plug-in. Its sidecar containers cache information about pods, PVs, and PVCs. As the cluster grows, out of memory (OOM) errors may occur. In this case, modify the resource limits based on the size of the cluster.
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Operations > Add-ons.
On the Add-ons page, click the icon in the lower-right part of the csi-provisioner component and click View in YAML.
Modify the resource limits in the YAML file based on the size of the cluster.
Why does the system prompt no volume plugin matched for the PVC when I create or mount a volume?
Issue
The system prompts Unable to attach or mount volumes: unmounted volumes=[xxx], unattached volumes=[xxx]: failed to get Plugin from volumeSpec for volume "xxx" err=no volume plugin matched for the PVC when you create or mount a volume.
Cause
The volume plug-in does not match the YAML template. As a result, the system cannot find the corresponding volume plug-in when creating or mounting a volume.
Solution
Check whether the volume plug-in exists in the cluster.
If the volume plug-in is not installed, install the plug-in. For more information, see Manage components.
If the volume plug-in is already installed, check whether the volume plug-in matches the YAML templates of the PV and PVC and whether the YAML templates meet the following requirements:
The CSI plug-in is deployed by following the steps as required. For more information, see CSI overview.
The FlexVolume plug-in is deployed by following the steps as required. For more information, see FlexVolume overview.
ImportantFlexVolume is deprecated. If the version of your ACK cluster is earlier than 1.18, we recommend that you migrate from FlexVolume to CSI. For more information, see Migrate from FlexVolume to CSI.
What do I do if a large volume of traffic is recorded in the monitoring data of the csi-plugin pod?
Issue
A large volume of traffic is recorded in the monitoring data of the csi-plugin pod.
Cause
csi-plugin is responsible for mounting NAS volumes to nodes. If a NAS volume is mounted to a pod on a node, requests from the pod to the NAS volume pass through the namespace where csi-plugin is deployed. The requests are monitored by the cluster. As a result, a large volume of traffic is recorded in the monitoring data of the csi-plugin pod.
Solution
You do not need to fix this issue. The volume of traffic that flows through csi-plugin does not double. In addition, the traffic that flows through csi-plugin does not consume additional network bandwidth.
Why does the system generate the 0/x nodes are available: x pod has unbound immediate PersistentVolumeClaims event for a pod?
Issue
The system generates the 0/x nodes are available: x pod has unbound immediate PersistentVolumeClaims. preemption: 0/x nodes are available: x Preemption is not helpful for scheduling event for a pod.
Cause
The custom StorageClass referenced by the pod is not found because the custom StorageClass does not exist.
Solution
If the pod uses a dynamically provisioned volume, find the custom StorageClass that is referenced by the pod. If the StorageClass does not exist, create one.
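A quick way to check whether the referenced StorageClass exists (the PVC name is a placeholder):

```shell
# Read the StorageClass that the PVC references, then verify that it exists
sc=$(kubectl get pvc <pvc-name> -o jsonpath='{.spec.storageClassName}')
kubectl get sc "$sc"
```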
What do I do if the PV is in the Released state and cannot be bound to the recreated PVC?
Issue
You accidentally deleted the PVC. The PV is in the Released state and cannot be bound to the PVC that you recreated.
Cause
If the reclaim policy (persistentVolumeReclaimPolicy) of the PV is Retain, the status of the PV changes to Released after you delete the PVC.
Solution
Delete the pv.spec.claimRef field from the PV and then bind the PV to the recreated PVC as a statically provisioned volume. The status of the PV then changes to Bound.
For more information about how to bind a statically provisioned PV that uses a NAS file system, see Use a statically provisioned NAS volume.
For more information about how to bind a statically provisioned PV that uses an Object Storage Service (OSS) bucket, see Use a statically provisioned OSS volume.
For more information about how to bind a statically provisioned PV that uses a disk, see Use a statically provisioned disk volume.
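The claimRef removal described above can be done with a JSON patch, for example (the PV name is a placeholder):

```shell
# Remove the stale claimRef so that the Released PV can bind to the new PVC
kubectl patch pv <pv-name> --type json -p '[{"op":"remove","path":"/spec/claimRef"}]'
```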
What do I do if the PV is in the Lost state and cannot be bound to the recreated PVC?
Issue
After the PVC and PV are created, the PV remains in the Lost state and cannot be bound to the PVC.
Cause
The PVC name that is specified in the claimRef field of the PV does not exist. As a result, the status of the PV changes to Lost.
Solution
Delete the pv.spec.claimRef field from the PV and then bind the PV to the PVC as a statically provisioned volume. The status of the PV then changes to Bound.
For more information about how to bind a statically provisioned PV that uses a NAS file system, see Use a statically provisioned NAS volume.
For more information about how to bind a statically provisioned PV that uses an Object Storage Service (OSS) bucket, see Use a statically provisioned OSS volume.
For more information about how to bind a statically provisioned PV that uses a disk, see Use a statically provisioned disk volume.
FAQ about migrating from FlexVolume to CSI
In earlier ACK versions, FlexVolume is used as the volume plug-in. FlexVolume is deprecated in later versions. If the version of your ACK cluster is earlier than 1.18, we recommend that you migrate from FlexVolume to CSI. For more information, see Migrate from FlexVolume to CSI.
Other StorageClass issues
If the mountOptions parameter contains spelling errors, the StorageClass referenced by a PVC does not exist, or the domain name of a mount target does not exist, the volume fails to be mounted. In these cases, we recommend that you use Container Network File System (CNFS) volumes. For more information about CNFS, see CNFS overview.
Can multiple applications in a cluster use the same volume?
Disk volumes: not supported
A disk volume can be mounted only to one pod and cannot be used by multiple applications.
NAS and OSS volumes: supported
NAS and OSS volumes can be shared by multiple pods. This means that a PVC can be used by multiple applications at the same time. For more information about the limits on concurrent writes to NAS, see How do I prevent exceptions that may occur when multiple processes or clients concurrently write data to a log file? and How do I resolve the latency in writing data to an NFS file system?
For more information about how to mount a NAS volume, see Use CNFS to manage NAS file systems (recommended), Mount a statically provisioned NAS volume, and Mount a dynamically provisioned NAS volume.
For more information about how to mount OSS volumes, see Mount a statically provisioned OSS volume. For more information about how to use CNFS to mount dynamically provisioned OSS volumes, see Manage the lifecycle of OSS buckets.
How do I change the configurations of the StorageClasses automatically created for a disk?
You cannot modify the StorageClasses that are automatically created.
After csi-provisioner is installed, StorageClasses such as alicloud-disk-topology-alltype are automatically created in the cluster. Do not modify these StorageClasses. For more information about the StorageClasses of disks, see StorageClass. If you need to modify the configurations of a StorageClass, such as the volume type, performance, and reclaim policy, you can create a new StorageClass. The number of StorageClasses that you can create is unlimited. For more information, see Create a StorageClass.