This topic describes why a Non-Volatile Memory Express (NVMe) disk on a Linux Elastic Compute Service (ECS) instance can become unavailable due to an invalid io_timeout parameter, and how to resolve the issue.
Problem description
When a Linux ECS instance uses an NVMe disk as the system disk, I/O read/write operations on the instance are slow. As a result, the operating system of the instance or the applications that are hosted on the instance cannot perform I/O read/write operations on the NVMe disk. The status of the file systems on the NVMe disk changes from read-write to read-only. Subsequent write operations fail on the NVMe disk, and the operating system and the applications cannot provide services as expected.
Slow I/O read/write refers to a scenario in which I/O read/write operations on disks are performed at a rate that is lower than expected or require an extended period of time to complete.
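One way to confirm the symptom is to look for NVMe-backed file systems that are currently mounted read-only. The following is a minimal sketch rather than an official diagnostic. The function name list_readonly_mounts is hypothetical, and the sketch assumes that the affected file systems are listed in /proc/mounts with a device name that starts with /dev/nvme.

```shell
# Hypothetical helper: print the mount points of NVMe-backed file systems
# that are mounted read-only.
# The optional argument defaults to /proc/mounts; it exists only so that the
# function can be pointed at a sample file.
list_readonly_mounts() {
    mounts_file="${1:-/proc/mounts}"
    # Fields per line: device, mount point, file system type, options, dump, pass.
    # The options field is a comma-separated list; "ro" marks a read-only mount.
    awk '$1 ~ /^\/dev\/nvme/ && $4 ~ /(^|,)ro(,|$)/ { print $2 }' "$mounts_file"
}
```

Calling list_readonly_mounts without an argument reads /proc/mounts. Empty output means that no file system on a /dev/nvme device is currently read-only.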
Cause
The io_timeout parameter of the NVMe driver specifies the maximum I/O timeout period. If the latency of an I/O operation exceeds the value of the io_timeout parameter, the NVMe driver fails the I/O operation and returns an error. As a result, the status of the file systems on the NVMe disk may change from read-write to read-only. If the status of the NVMe disk changes to read-only, subsequent write operations on the disk fail, and the operating system of the instance or the applications that are hosted on the instance cannot provide services as expected.
In most Linux distributions, the io_timeout parameter is set to a default value of 30 seconds. To prevent timeout errors of I/O operations on the NVMe disk, set the io_timeout parameter to the supported maximum value. For recent kernel versions, the supported maximum value of the io_timeout parameter is 4294967295 seconds. For earlier kernel versions, the supported maximum value is 255 seconds.
The kernel module of the NVMe driver varies based on the kernel version. The following kernel modules are used for the NVMe driver: nvme.ko and nvme_core.ko. The full name of the io_timeout parameter may be nvme.io_timeout or nvme_core.io_timeout.
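Because the full parameter name depends on the kernel module in use, it can help to see which module exposes the parameter and what its current value is. The following is a minimal sketch that assumes the parameter is exposed under /sys/module, as the commands in the Solutions section suggest. The function name show_io_timeout and its optional argument are hypothetical.

```shell
# Hypothetical helper: print the full io_timeout parameter name and its
# current value, checking nvme_core first and then nvme.
# The optional argument (default /sys) exists only so that the function can be
# run against a fake directory tree.
show_io_timeout() {
    sysfs_root="${1:-/sys}"
    for module in nvme_core nvme; do
        param_file="$sysfs_root/module/$module/parameters/io_timeout"
        if [ -r "$param_file" ]; then
            echo "$module.io_timeout=$(cat "$param_file")"
            return 0
        fi
    done
    echo "no io_timeout parameter found under $sysfs_root/module" >&2
    return 1
}
```

On an instance that uses the nvme_core.ko module with the default setting, show_io_timeout would be expected to report nvme_core.io_timeout=30.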
Solutions
Temporarily configure the io_timeout parameter
You can perform the following steps to set the io_timeout parameter. The value that you specify for the io_timeout parameter is temporary. You must reset the io_timeout parameter each time you restart the Linux ECS instance.
Connect to the Linux ECS instance.
For more information, see Connection method overview.
Check the path of the kernel module that contains the io_timeout parameter.
Run the following command to check whether the /sys/module/nvme_core/parameters/io_timeout path exists. If the path exists, the full name of the io_timeout parameter is nvme_core.io_timeout.
cat /sys/module/nvme_core/parameters/io_timeout
If the path does not exist, run the following command to check whether the /sys/module/nvme/parameters/io_timeout path exists. If the path exists, the full name of the io_timeout parameter is nvme.io_timeout.
cat /sys/module/nvme/parameters/io_timeout
Run one of the following commands to write 4294967295 to the path that you obtained in the preceding step.
nvme.ko kernel module
sudo sh -c 'echo 4294967295 > /sys/module/nvme/parameters/io_timeout'
nvme_core.ko kernel module
sudo sh -c 'echo 4294967295 > /sys/module/nvme_core/parameters/io_timeout'
If the command runs successfully and no errors are returned, the io_timeout parameter is set to 4294967295.
If an error message that is similar to Numerical result out of range is returned, repeat this step to set the io_timeout parameter to 255.
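The write-and-fall-back logic of the last step can be sketched as a small function. This is an illustration under stated assumptions, not a supported tool: the function name set_io_timeout is hypothetical, the caller passes the path found in the earlier step, and the function must run with root privileges to write under /sys.

```shell
# Hypothetical helper: write the largest accepted io_timeout value to the
# given parameter file. It tries 4294967295 first and falls back to 255 if
# the write is rejected (for example, with "Numerical result out of range").
set_io_timeout() {
    param_file="$1"
    for value in 4294967295 255; do
        # The write fails with a non-zero status if the kernel rejects the value
        # or if the file cannot be opened for writing.
        if echo "$value" > "$param_file" 2>/dev/null; then
            echo "io_timeout set to $value"
            return 0
        fi
    done
    echo "could not write to $param_file" >&2
    return 1
}
```

From a root shell, a call such as set_io_timeout /sys/module/nvme_core/parameters/io_timeout applies the setting. As with the manual commands, the value is lost when the instance restarts.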
Permanently configure the io_timeout parameter
You can change the value of the io_timeout parameter in the GRand Unified Bootloader (GRUB) configuration file or use the ecs_nvme_config plug-in of Cloud Assistant to configure the NVMe-related settings in the instance operating system. For more information, see How do I install the NVMe driver for a custom image? The value that you specify for the io_timeout parameter in this way is permanent and persists across instance restarts.
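As a rough sketch of the GRUB approach: on many distributions, kernel parameters are kept in the GRUB_CMDLINE_LINUX line of /etc/default/grub. The helper name add_kernel_param, that file location, and the use of nvme_core.io_timeout=4294967295 in the example are assumptions. Verify them against your distribution and the linked topic before you change a real instance.

```shell
# Hypothetical helper: append a kernel parameter (name=value) to the
# GRUB_CMDLINE_LINUX line of a GRUB defaults file.
# Assumes GNU sed and a double-quoted GRUB_CMDLINE_LINUX="..." line.
add_kernel_param() {
    grub_file="$1"
    param="$2"
    name="${param%%=*}"
    if ! grep -q '^GRUB_CMDLINE_LINUX="' "$grub_file"; then
        echo "no GRUB_CMDLINE_LINUX line in $grub_file" >&2
        return 1
    fi
    # Leave the file unchanged if the parameter is already present.
    if grep -q "^GRUB_CMDLINE_LINUX=\".*$name=" "$grub_file"; then
        echo "$name is already set in $grub_file" >&2
        return 0
    fi
    sed -i "s|^GRUB_CMDLINE_LINUX=\"\(.*\)\"|GRUB_CMDLINE_LINUX=\"\1 $param\"|" "$grub_file"
}
```

For example, run add_kernel_param /etc/default/grub nvme_core.io_timeout=4294967295 as root. Then regenerate the GRUB configuration with the tool that your distribution provides (for example, grub2-mkconfig or update-grub) and restart the instance. Use nvme.io_timeout or the value 255 instead if that is what the instance supports.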