File operation errors or issues about read and write access to files - File Storage NAS

When you access files in a file system, certain limits may apply and cause file operation errors, an unresponsive mount target, or unresponsive access. This topic describes how to resolve common file operation errors, inconsistent file owners, unsynchronized data, and unresponsive access.

Cross-mount compatibility FAQ
- FAQ about compatibility issues for mounting an SMB file system on Linux
- FAQ about compatibility issues for mounting a General-purpose NFS file system on Windows
FAQ about other issues related to file read and write

What do I do if a server does not respond within 35 seconds when multiple clients concurrently access a file on the server?

Cause: The kernel driver of the Server Message Block (SMB) client does not work as expected. If the SMB protocol version is 2.1 or 3.0, the clients cannot send SMB break acknowledgment packets to the server, and the server does not respond within 35 seconds.

Solution 1: Set the vers parameter to 2.0 if you mount the file system on a Linux Elastic Compute Service (ECS) instance.

Solution 2: Perform the following operations:

If the Common Internet File System (CIFS) module is not loaded yet, run the following command to load it with the oplocks feature disabled:
# modprobe cifs enable_oplocks=0
If the CIFS module is already loaded, run the following command to disable the oplocks feature:
# echo 0 > /sys/module/cifs/parameters/enable_oplocks
Run the following command to check the status of the oplocks feature:
# cat /sys/module/cifs/parameters/enable_oplocks
In the command output, Y indicates that the feature is enabled. N indicates that the feature is disabled.
Note
- To apply the preceding changes, unmount and remount the SMB file system.
- To permanently apply the preceding changes, create the /etc/modprobe.d/cifs.conf file and add the options cifs enable_oplocks=0 statement to the file.
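
The status check above can also be scripted, for example as part of a health check. The following is a minimal Python sketch and not part of any NAS tooling. The function name `oplocks_enabled` is hypothetical. The sysfs path is the one used in the preceding commands and exists only while the cifs module is loaded.

```python
from pathlib import Path

# Path taken from the commands above; present only while the cifs module is loaded.
OPLOCKS_PARAM = "/sys/module/cifs/parameters/enable_oplocks"


def oplocks_enabled(param_path: str = OPLOCKS_PARAM) -> bool:
    """Return True if the parameter file reports Y (enabled), False if it reports N."""
    value = Path(param_path).read_text().strip()
    if value not in ("Y", "N"):
        raise ValueError(f"unexpected value in {param_path}: {value!r}")
    return value == "Y"
```

Calling `oplocks_enabled()` without arguments reads the real parameter file. Passing a different path is useful only for testing the function.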

Why am I unable to create a symbolic link?

Cause

When you mounted the SMB file system on Linux, the mfsymlinks option was not specified or the protocol version was set to 2.0.

Solution

When you mount an SMB file system on Linux, use protocol version 2.1 or 3.0 and specify the mfsymlinks option. The following sample mount command shows the mount parameters. For more information, see Mount an SMB file system on a Linux ECS instance.

sudo mount -t cifs //file-system-id.region.nas.aliyuncs.com/myshare /mnt -o vers=2.1,guest,uid=0,gid=0,dir_mode=0755,file_mode=0755,mfsymlinks,cache=strict,rsize=1048576,wsize=1048576

Why does the mount target of an SMB file system not respond?

Cause

If the kernel version of your Linux distribution is 3.10.0-514 or earlier, the kernel driver of the SMB protocol may fail to respond when multiple clients concurrently access the file system. As a result, the mount target cannot be accessed. The following record is included in the kernel log:

...
[<ffffffffc03c9bc1>] cifs_oplock_break+0x1f1/0x270 [cifs]
[<ffffffff810a881a>] process_one_work+0x17a/0x440
[<ffffffff810a8d74>] rescuer_thread+0x294/0x3c0
...

Solution

- Remount the file system with the cache parameter set to none. This may affect the performance of the file system.
- Upgrade the operating system of the Linux ECS instance.
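
To decide whether a client falls in the affected range, the kernel release string can be compared against the 3.10.0-514 threshold named in the cause. The following Python sketch is illustrative only. The function names are hypothetical, and the comparison assumes a release string of the form `major.minor.patch-build`, where the build number is meaningful mainly on CentOS-style kernels.

```python
import platform
import re

# Threshold taken from the cause above: 3.10.0-514 or earlier.
AFFECTED_UP_TO = (3, 10, 0, 514)


def kernel_tuple(release: str) -> tuple:
    """Parse a release string such as '3.10.0-514.el7.x86_64' into (3, 10, 0, 514)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)(?:-(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # A missing build number is treated as 0 (assumption for this sketch).
    return tuple(int(part) if part else 0 for part in match.groups())


def may_be_affected(release: str) -> bool:
    """Return True if the release is 3.10.0-514 or earlier."""
    return kernel_tuple(release) <= AFFECTED_UP_TO


if __name__ == "__main__":
    current = platform.release()
    print(current, "-> may be affected:", may_be_affected(current))
```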

What do I do if the "cp: error writing '</path/to/file>': Bad file descriptor" error message is returned when I copy a large file?

Cause

A temporary minor fault occurs on the network or at the backend. SMB clients on some Linux distributions, such as SUSE, have limited features and may not support failover.

Solution

We recommend that you select a Linux version supported by File Storage NAS (NAS) SMB. The following table lists the Linux versions supported by NAS SMB.

Operating system	Version
CentOS	CentOS 7.6 64-bit: 3.10.0-957.21.3.el7.x86_64 and later
Alibaba Cloud Linux	Alibaba Cloud Linux 2.1903 64-bit: 4.19.43-13.2.al7.x86_64 and later
Alibaba Cloud Linux	Alibaba Cloud Linux 3.2104 64-bit: 5.10.23-4.al8.x86_64 and later
Debian	Debian 9.10 64-bit: 4.9.0-9-amd64 and later
Ubuntu	Ubuntu 18.04 64-bit: 4.15.0-52-generic and later
openSUSE	openSUSE 42.3 64-bit: 4.4.90-28-default and later
SUSE Linux	SUSE Linux Enterprise Server 12 SP2 64-bit: 4.4.74-92.35-default and later
CoreOS	CoreOS 2079.4.0 64-bit: 4.19.43-coreos and later

What do I do if Chinese characters written to a NAS file system are displayed in garbled text on the client?

Issue

You write Chinese characters (contained in the names or content of files) to a NAS file system on a Linux or Windows client. When you use a different client to read data, the Chinese characters are displayed in garbled text.

Cause

On Windows clients, Chinese characters are encoded and decoded by using the GBK character set by default. On Linux clients, Chinese characters are encoded and decoded by using the UTF-8 character set by default. Before data is written to the NAS file system, the data is encoded by using the character set that corresponds to the platform. When the data is read on another platform, the data cannot be decoded because the GBK character set and the UTF-8 character set are incompatible. As a result, Chinese characters are displayed in garbled text.
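
The mismatch described above can be reproduced locally without a file system. The following Python sketch encodes a sample file name with one character set and decodes it with the other. The file name is a made-up example, and the sketch only illustrates the encoding behavior, not how either client handles file names internally.

```python
# Hypothetical sample file name that contains Chinese characters.
name = "测试文件.txt"

# Bytes that a GBK client and a UTF-8 client would produce for the same name.
gbk_bytes = name.encode("gbk")
utf8_bytes = name.encode("utf-8")

# Reading UTF-8 bytes as GBK does not fail outright; it yields different,
# garbled characters.
garbled = utf8_bytes.decode("gbk", errors="replace")

# Reading GBK bytes as strict UTF-8 fails because the bytes are not valid UTF-8.
try:
    gbk_bytes.decode("utf-8")
    gbk_readable_as_utf8 = True
except UnicodeDecodeError:
    gbk_readable_as_utf8 = False

if __name__ == "__main__":
    print("original:", name)
    print("UTF-8 bytes read as GBK:", garbled)
    print("GBK bytes readable as UTF-8:", gbk_readable_as_utf8)
```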

Solution

To ensure compatibility, we recommend that you use a Windows client to mount an SMB file system and use a Linux client to mount a Network File System (NFS) file system.

Why does it take a long time to create or open a file when I mount an NFS file system on Windows?

Cause

When you mount an NFS file system on Windows, compatibility issues related to case sensitivity may occur. The directory is traversed every time you create a file, so file creation slows down as the number of directories grows. When the number of directories reaches 100,000, a single directory traversal takes more than 10 seconds.

Solution

Add the -o casesensitive=yes field to the mount parameters to prevent directory traversal. Run the following command to mount the file system:

mount -o nolock -o mtype=hard -o timeout=60 -o casesensitive=yes \\file-system-id.region.nas.aliyuncs.com\! Z:

You must replace the drive letter Z and the domain name of the mount target file-system-id.region.nas.aliyuncs.com with the actual drive letter and domain name.

Note

The casesensitive option conflicts with the native semantics of Windows. When you use NFS directories, make sure that no two names differ only in letter case. For example, a.txt conflicts with A.TXT. Modifying the mount parameters may have unexpected effects. We recommend that you mount an SMB file system on Windows.
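
Before you enable casesensitive=yes, it may help to check whether a directory already contains names that differ only in letter case. The following Python sketch groups names by their case-folded form. The function name `find_case_conflicts` is hypothetical, and `str.casefold` is only an approximation of how Windows compares names.

```python
from collections import defaultdict


def find_case_conflicts(names):
    """Return groups of names that are identical when letter case is ignored."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    # Keep only the groups that contain more than one spelling.
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

For a real directory, the input can be the result of `os.listdir(path)`.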

What do I do if the `invalid device` error is returned when I rename a file in an NFS file system on a Windows client?

If you mount an NFS file system on a subdirectory of an ECS instance, the invalid device error message is returned when you rename a file. To fix this error, mount the file system on the root directory of the ECS instance. For more information, see Step 2: Mount the General-purpose NFS file system.

How do I resolve the latency in creating files in an NFS file system?

Issue
ECS-1 creates the abc file, but it takes some time for ECS-2 to read the abc file. The latency ranges from 1 second to 1 minute.
Cause
This behavior is expected and is caused by the lookup cache, which keeps entries for a period of time T. For example, if ECS-2 accesses the abc file before ECS-1 creates it, the lookup on ECS-2 fails and a record indicating that the abc file does not exist is cached. As long as the cached file attributes (FileAttr) have not expired within time T, subsequent accesses from ECS-2 still hit the record that was cached during the first access and report that the file does not exist.
Solution
To ensure that ECS-2 can read the file immediately after ECS-1 creates it, you can use one of the following solutions:
- Solution 1: Disable negative lookup cache for ECS-2 so that files that do not exist are not cached. This solution causes the minimum overhead.
  Add the lookupcache=positive (default value: lookupcache=all) field when you mount the file system. Run the following command to mount the file system:
```
sudo mount -t nfs -o vers=3,nolock,proto=tcp,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,lookupcache=positive file-system-id.region.nas.aliyuncs.com:/ /mnt
```
- Solution 2: Disable all caches on ECS-2. This solution results in poor performance. Select an appropriate solution based on your business requirements.
  Add the actimeo=0 field when you mount the file system. Run the following command to mount the file system:
```
sudo mount -t nfs -o vers=3,nolock,proto=tcp,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,actimeo=0 file-system-id.region.nas.aliyuncs.com:/ /mnt
```

How do I resolve the latency in writing data to an NFS file system?

Issue
ECS-1 has updated the abc file. However, when ECS-2 reads the file immediately, the file content is still not updated.
Cause
The following two causes are involved:
- Cause 1: After ECS-1 writes the abc file, ECS-1 does not flush the content immediately. Instead, it caches the content into the page cache and relies on the application layer to call the fsync or close operation.
- Cause 2: File caches exist on ECS-2. Therefore, ECS-2 may not immediately obtain the latest file content from the server. For example, ECS-2 has already cached the data when ECS-1 updates the abc file. As a result, the cached content is still used when ECS-2 reads the file.
Solution
To ensure that ECS-2 can read the latest content immediately after ECS-1 updates the file, you can use one of the following solutions:
- Solution 1: Apply the file-based close-to-open (CTO) consistency model so that the read and write operations on ECS-1 or ECS-2 conform to CTO consistency. This way, ECS-2 can definitely read the latest data. Specifically, ECS-1 executes the close or fsync operation after it updates a file. ECS-2 executes the open operation before it reads the file.
- Solution 2: Disable all caches on ECS-1 and ECS-2. This solution results in poor performance. Select an appropriate solution based on your business requirements.
  - Disable caching on ECS-1. When you mount the file system, add the noac field to ensure that all written data is immediately flushed into the disk. Run the following command to mount the file system:
```
sudo mount -t nfs -o vers=3,nolock,proto=tcp,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,noac file-system-id.region.nas.aliyuncs.com:/ /mnt
```
    Note
    If you call the fsync operation after the write operation on ECS-1 is complete or you call the sync operation to write data, replace "noac" in the preceding command with "actimeo=0" to improve the performance slightly.
    noac is equivalent to actimeo=0 plus sync. In this case, all write operations are forcibly executed by using sync.
  - Disable caching on ECS-2. When you mount the file system, add the actimeo=0 field to omit all caches. Run the following command to mount the file system:
```
sudo mount -t nfs -o vers=3,nolock,proto=tcp,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,actimeo=0 file-system-id.region.nas.aliyuncs.com:/ /mnt
```
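
The close-to-open pattern in Solution 1 can be sketched in application code as follows. This is a minimal Python illustration, not a complete synchronization scheme. The function names are hypothetical, and the sketch assumes that the writer runs on ECS-1 and the reader on ECS-2 against the same mounted path.

```python
import os


def write_then_close(path: str, data: bytes) -> None:
    """Writer side (ECS-1): flush the update before the file is closed."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # move Python's userspace buffer into the kernel
        os.fsync(f.fileno())  # ask the kernel to write the cached pages out
    # Leaving the with block closes the file.


def open_then_read(path: str) -> bytes:
    """Reader side (ECS-2): open the file anew for every read."""
    with open(path, "rb") as f:
        return f.read()
```

The point of the reader function is that it does not keep a long-lived file descriptor. Each read starts with a fresh open operation, which is what close-to-open consistency relies on.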

Why does a file in a NAS file system belong to different owners when I query the file on two ECS instances?

In NAS file systems, users are identified by User Identifiers (UIDs) or Group Identifiers (GIDs) instead of user names. The owner name of a file that you query on an ECS instance is converted from a UID. If a UID is converted into different user names on different ECS instances, the UID is identified as a different owner on each ECS instance.

For example, the admin user creates a file named admin_on_machine1 on ECS Instance 1 and a file named admin_on_machine2 on ECS Instance 2. When you run the ll command on ECS Instance 1 and then on ECS Instance 2 to view the created files, the two instances display different owner names for the same file.

Run the id command on the two ECS instances to query the information about the admin user. The UID of the admin user is 505 on ECS Instance 1 and 2915 on ECS Instance 2. Run the stat admin_on_machine1 admin_on_machine2 command. The results indicate that the two files belong to two different UIDs.
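
The UID-to-name conversion described above happens on each client. The following Python sketch shows the numeric UID that is stored with a file next to the name that the local user database resolves it to. The function name `describe_owner` is hypothetical, and the `pwd` module is available only on POSIX systems.

```python
import os
import pwd  # POSIX only


def describe_owner(path: str) -> tuple:
    """Return (uid, local_name) for a file; local_name is None if the UID is unknown locally."""
    uid = os.stat(path).st_uid
    try:
        # The name comes from the local user database, so the same UID
        # can resolve to different names on different instances.
        name = pwd.getpwuid(uid).pw_name
    except KeyError:
        name = None
    return uid, name
```

Running the function against the same file on two instances returns the same UID on both, while the resolved name depends on each instance's local accounts.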

How do I prevent exceptions that may occur when multiple processes or clients concurrently write data to a log file?

Issue

NAS allows multiple clients to write data to different files in the same namespace. However, NFS does not support atomic appends. If multiple processes or clients concurrently write data to the same file, such as a log file, the content may be overwritten, interleaved, or out of order. This is because each process independently maintains its own context information, including the file descriptor and the write position.

Solution

- Recommended: Configure different processes or clients to write data to different files in the same file system, and merge the files when you analyze or process the data. This solution resolves the issues caused by concurrent write operations without file locks and does not affect system performance.
- Use the flock and seek functions together. This ensures the atomicity and consistency of write operations. However, this solution is time-consuming and may significantly affect system performance. The following section describes this method.

Use the flock and seek functions together

NFS does not support atomic appends. If multiple clients append data to the same file such as a log, data entries may overwrite each other. On Linux, you can use the flock and seek functions together to simulate atomic appends in an NFS file system. This ensures data consistency when multiple processes concurrently append data to the same file.

To use the flock and seek functions together, perform the following steps:

1. Call fd=open(filename, O_WRONLY | O_APPEND | O_DIRECT) to open the file in append mode and obtain the file descriptor. O_WRONLY opens the file as write-only. O_DIRECT specifies direct writes, so no page cache is used.
2. Call the flock(fd, LOCK_EX|LOCK_NB) function to obtain a file lock. If the function fails to obtain a file lock, an error is returned. A possible cause is that the file lock is held by another process. You can try again or troubleshoot the failure.
3. After the file lock is obtained, call the lseek(fd, 0, SEEK_END) function to set the current file offset of the file descriptor to the end of the file.
4. Write data to the end of the file. The file lock prevents data entries from overwriting each other.
5. After data is written to the file, call the flock(fd, LOCK_UN) function to release the file lock.
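
The steps above can be sketched in Python, whose os and fcntl modules wrap the same open, flock, and lseek calls. This is an illustrative sketch under several assumptions rather than a reference implementation. The function name `locked_append` and its retry parameters are hypothetical. The O_CREAT flag is added for convenience. O_DIRECT is deliberately left out because direct I/O requires block-aligned buffers, and the sketch calls fsync before it releases the lock instead.

```python
import fcntl
import os
import time


def locked_append(path: str, data: bytes, retries: int = 50, delay: float = 0.1) -> None:
    """Append data to path while holding an exclusive flock."""
    # O_CREAT is an addition for this sketch; O_DIRECT from the steps is omitted.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        # LOCK_NB makes flock fail immediately if another process holds the lock,
        # so retry a limited number of times.
        for _ in range(retries):
            try:
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                break
            except BlockingIOError:
                time.sleep(delay)
        else:
            raise TimeoutError(f"could not lock {path}")
        try:
            # Move the offset to the current end of the file.
            os.lseek(fd, 0, os.SEEK_END)
            # os.write may write fewer bytes than requested, so loop until done.
            view = memoryview(data)
            while view:
                written = os.write(fd, view)
                view = view[written:]
            # Flush to the server before other writers can take the lock.
            os.fsync(fd)
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

Every writer must go through the same locking routine. A process that appends without taking the lock is not blocked by flock and can still interleave its data.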

What do I do if a `523` error is returned when I run the `ls` command on a Linux client on which an NFS file system is mounted?

Issue

When you run the ls command on a Linux client on which an NFS file system is mounted, a 523 error is returned.

Cause

If you run the ls command on a directory of a file system when multiple rename operations are concurrently performed, a 523 error occurs.

Solution

Try again later. If the error persists, contact NAS technical support.

Why am I unable to mount an SMB file system?

Issue

The net use command is used to mount SMB file systems. If you accidentally use the command to mount NFS file systems, you can no longer use the command to mount an SMB file system.

Solution

Make sure that the protocol of the file system is SMB. Then, stop the mount operation and mount the file system again 5 minutes later. If the issue persists, contact NAS technical support.

Why is a mounted SMB directory visible only to an administrator?

Windows user accounts are isolated from each other. For example, if you log on to Windows as User A, you cannot view the directory that you mounted as User B.

If you want to allow multiple users to share data, create a shared directory. For example, you can run the following command to create a shared directory named myshare on drive C:

mklink /D C:\myshare \\xxxxxxx-xxxx.cn-beijing.nas.aliyuncs.com\myshare\

How do I improve the performance of an SMB file system that is mounted on Linux?

If the performance of your SMB file system does not meet your requirements, identify which of the following causes applies and use the corresponding solution:

Cause 1: The throughput of an SMB file system varies based on the storage capacity of the SMB file system. The maximum read and write throughput of the SMB file system has a linear relationship with the capacity of the file system.
Solution: Use the fio tool to test the performance of the SMB file system. For more information, see Test the performance of a NAS file system.
Cause 2: The bandwidth of the Linux ECS instance is low.
Solution: Use multiple Linux ECS instances to make sure that the file system can provide expected performance.
Cause 3: Caching is disabled for the SMB client.
Solution: If the cache parameter is set to strict, caching is enabled. If the cache parameter is set to none, caching is disabled. By default, caching is enabled for an SMB client. You can run the sudo mount | grep cifs command to check the value of the cache parameter.
Cause 4: The I/O size of the SMB client does not meet your business requirements.
Solution: Configure the rsize and wsize parameters based on your business requirements. The default value of the two parameters is 1048576.
Cause 5: The Linux ECS instance uses low-specification CPU or memory, or most CPU or memory resources are occupied by other processes.
Solution: Specify CPU and memory specifications for the Linux ECS instance based on your business requirements. This ensures that the file system can function as expected. You can run the top command to check the CPU utilization and memory usage.
Cause 6: The atime parameter is configured when you mount the file system.
Solution: Do not configure the atime parameter unless your business depends on file access times.
Cause 7: The web server such as Apache HTTP Server on the Linux ECS instance processes a few write requests. These requests require notifications and frequent read operations on a large number of small files.
Solution: Configure the caching mechanism of the web server on the Linux ECS instance. You can also contact Alibaba Cloud to enable the acceleration feature for the web server.
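
When you check the cache, rsize, and wsize values mentioned in Cause 3 and Cause 4, the output of the mount command can be parsed instead of being read by eye. The following Python sketch extracts the options from one output line. The function name `cifs_mount_options` is hypothetical, and the sketch assumes the usual `device on mountpoint type cifs (options)` layout of a mount output line.

```python
def cifs_mount_options(mount_line: str) -> dict:
    """Parse the parenthesized option list at the end of a mount output line."""
    start = mount_line.rindex("(") + 1
    end = mount_line.rindex(")")
    options = {}
    for item in mount_line[start:end].split(","):
        key, sep, value = item.partition("=")
        # Flags without a value, such as rw, are stored as True.
        options[key] = value if sep else True
    return options
```

For example, `cifs_mount_options(line).get("cache")` returns the cache mode of that mount, or None if the line does not list one.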

How do I fix the Permission denied error that occurs when I access an SMB file system on Linux?

Cause: You specified an invalid value for the uid, gid, file_mode, or dir_mode parameter when you mounted the file system.

Solution: Verify that the values that are specified for the uid, gid, file_mode, and dir_mode parameters are valid. For more information, see Mount an SMB file system.

How do I rename the files of an SMB file system by changing the letter case?

The file names of an SMB file system are case-insensitive. This also applies to Windows systems. You cannot rename a file in an SMB file system by changing only the letter case.

However, you can change a file name to a different name that contains different letters. Then, you can change the file name to the original name with a different letter case.
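
The two-step rename can be sketched as follows. This Python example is illustrative only. The function name `rename_case_only` and the temporary name pattern are hypothetical.

```python
import os
import uuid


def rename_case_only(directory: str, old_name: str, new_name: str) -> None:
    """Change only the letter case of a file name by renaming through a temporary name."""
    if old_name.casefold() != new_name.casefold():
        raise ValueError("the names differ by more than letter case")
    # Step 1: rename to a name that uses different letters.
    temp_name = f"{old_name}.{uuid.uuid4().hex}.tmp"
    os.rename(os.path.join(directory, old_name), os.path.join(directory, temp_name))
    # Step 2: rename to the original name with the new letter case.
    os.rename(os.path.join(directory, temp_name), os.path.join(directory, new_name))
```

If the second rename fails, the file remains under the temporary name, so a caller may want to handle that case explicitly.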

Why am I unable to change the owner of a file and the access mode of a file or a directory?

You can specify the owner of a file and the access mode of a file or directory in a file system only when you mount the file system. For more information, see Mount an SMB file system.

How are files whose names are suffixed with .nfs generated? How do I delete the files?

If you delete a file that is being used by an application, a temporary file whose name is suffixed with .nfs is generated. When the process that uses the file ends, the temporary file is automatically deleted.

What do I do if the following error message is returned when I access a file in the directory of a NAS file system: bind conn to session failed on NFSv4 server?

Cause
The error message is returned because you mounted the file system by using the NFSv4.1 protocol. NAS does not support this protocol.
Solution
Use the NFSv3.0 or NFSv4.0 protocol to remount the file system based on your business requirements. For more information, see Usage notes.

What do I do if data is not synchronized when I mount an NFS file system on multiple ECS instances?

Issue

When you use multiple clients to perform real-time synchronization for a NAS file system that has multiple mount targets, a high latency occurs.

Cause

By default, the kernel of the operating system maintains the properties of files and directories and generates a metadata cache to reduce the need to call the NFSPROC_GETATTR remote procedure.

Solution

Run the following mount command to disable caching of file and directory attributes:

mount -t nfs4 -o noac file-system-id.region.nas.aliyuncs.com:/ /mnt

In the preceding command, file-system-id.region.nas.aliyuncs.com specifies the domain name of the mount target of the NAS file system and /mnt specifies the local directory that resides on the current ECS instance. You need to replace the values with the actual values.

Why does a pod still write data to an unmounted NAS file system after I mount a new NAS file system?

Cause

When you mount a NAS file system on an ECS instance and map the mount directory of the NAS file system to a container by using the local volume (HostPath) mapping method, the mount information of the container is independent of that of the ECS instance. When you detach the mount directory from the ECS instance or mount a new NAS file system on the ECS instance, the container that has been started still uses the NAS file system that was mounted at startup.

Solution

Remount the new NAS file system on the ECS instance. Then, restart the pod.

Why am I unable to view files in a NAS file system after my server is restarted or stopped?

If the file system still exists, the common cause is that automatic mounting of NAS file systems is not configured on the server.

In this case, manually mount the NAS file system again. For more information, see Usage notes.

For more information about how to enable automatic mounting of NAS file systems at restart, see the following topics:

When I mount an SMB file system on Linux, why does it take a long time to migrate or replicate files?

Check the performance of the file system. If the performance of the file system meets your requirements, a possible cause is that the files are not concurrently migrated or replicated. You can use the following open source tools to migrate or replicate files.

- GNU Parallel
  Specify the number of concurrent jobs based on your system resources. Example: find * -type f | parallel --will-cite -j 10 cp {} /mnt/smb/ &
- Fpart
- Fpsync
- multi
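
If none of these tools is available, the same idea of copying files concurrently can be sketched with a thread pool. The following Python example is an alternative illustration and not a replacement for the tools listed above. The function name `copy_tree_parallel` is hypothetical. Unlike the GNU Parallel example, it keeps the relative directory structure, and the default of 10 workers simply mirrors the -j 10 value.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def copy_tree_parallel(src_root: str, dst_root: str, workers: int = 10) -> int:
    """Copy every file under src_root to dst_root with a pool of worker threads."""
    jobs = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for filename in filenames:
            jobs.append((os.path.join(dirpath, filename), os.path.join(target_dir, filename)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all copies and re-raises the first error, if any.
        list(pool.map(lambda job: shutil.copyfile(*job), jobs))
    return len(jobs)
```

The return value is the number of files copied. The destination must not be located inside the source directory.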

Why is the "Disk quota exceeded" error message returned when data is written to a file system?

Cause
The size or number of files in a directory exceeds the specified user quota. As a result, write operations such as increasing file sizes, creating files or directories, and moving files to another directory fail. The Disk quota exceeded error message is returned.
Solution
1. We recommend that you clear data at your earliest opportunity to free up space, or increase the capacity limit of the directory quota. For more information, see Modify a directory quota that is assigned to a user.
2. After you clear data, we recommend that you perform test write operations on the directory configured with quotas, for example, creating and writing data to a test file, to trigger asynchronous refresh of the quota cache. Then, restart your service after the test write operations are successful.
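
The test write in step 2 can be automated. The following Python sketch creates a small file in the directory, writes to it, and removes it again. It is an illustration only. The function name `probe_write` and the probe file name are hypothetical, and the sketch assumes that a quota failure surfaces as the EDQUOT error code.

```python
import errno
import os
import uuid


def probe_write(directory: str) -> bool:
    """Return True if a test write succeeds, False if it fails with Disk quota exceeded."""
    probe_path = os.path.join(directory, f".quota_probe_{uuid.uuid4().hex}")
    try:
        with open(probe_path, "wb") as f:
            f.write(b"quota probe\n")
            f.flush()
            os.fsync(f.fileno())
    except OSError as exc:
        if exc.errno == errno.EDQUOT:
            return False
        raise  # any other error is not a quota problem
    finally:
        # Remove the probe file if it was created.
        if os.path.exists(probe_path):
            os.remove(probe_path)
    return True
```

A service start script could call `probe_write` in a loop with a short pause and start the service only after the function returns True.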

What can I do if I do not have the permissions to access an NFS file system?

You can perform the following steps to configure the AnonymousGID and AnonymousUID registry keys:

1. Log on to the ECS instance on which the file system is mounted.
2. Open the Command Prompt and run the regedit command to open the Registry Editor window.
3. Choose HKEY_LOCAL_MACHINE > SOFTWARE > Microsoft > ClientForNFS > CurrentVersion > Default.
4. Right-click a blank area, choose New > DWORD (32-bit) Value, and then create the following registry keys:
   - AnonymousGID: Set the value of the key to 0.
   - AnonymousUID: Set the value of the key to 0.
5. Restart the ECS instance.
6. Run the following command to remount the NFS file system:
```
mount -o nolock -o mtype=hard -o timeout=60 \\file-system-id.region.nas.aliyuncs.com\! Z:
```
   You must replace the drive letter Z: and the domain name file-system-id.region.nas.aliyuncs.com with the actual drive letter and domain name.
7. Run the mount command to check whether the mount is successful.
   If the command output contains mount=hard, locking=no, and timeout=<a value that is greater than or equal to 10>, the NFS file system is mounted. Otherwise, the NFS file system fails to be mounted.

Can I modify the permissions on the root directory of a NAS file system by running the chown command?

No, you cannot modify the permissions on the root directory of a NAS file system.

To manage the permissions on a local directory where the NAS file system is mounted, you can use a subdirectory to mount the file system. For example, if you mount the root directory of a NAS file system to a local /data directory, you cannot change the owner and group of the /data directory by running the chown command. If you mount a subdirectory of the NAS file system to the local /data directory, you can change the owner and group of the /data directory by running the chown command. You must create the subdirectory in advance. Note that you must mount the root directory of the NAS file system before you create a subdirectory of the file system. For more information about how to create and mount a subdirectory of a NAS file system, see How do I create and mount a subdirectory of a NAS file system on Linux?
