What do I do if an NVMe disk on a Linux ECS instance is unavailable due to an invalid io_timeout parameter?

Last Updated: Jul 29, 2024

This topic describes the cause of and solutions to the issue that a Non-Volatile Memory Express (NVMe) disk on a Linux Elastic Compute Service (ECS) instance is unavailable due to an invalid io_timeout parameter.

Problem description

On a Linux ECS instance that uses an NVMe disk as the system disk, I/O read/write operations become slow, and the operating system of the instance or the applications that are hosted on the instance can no longer perform I/O read/write operations on the NVMe disk. The status of the file systems on the NVMe disk changes from read-write to read-only. Subsequent write operations on the NVMe disk fail, and the operating system and the applications cannot provide services as expected.

Note

Slow I/O read/write means that I/O read/write operations on a disk run at a lower rate than expected or take an extended period of time to complete.

Cause

The io_timeout parameter of the NVMe driver specifies the I/O timeout period, which is the maximum time that an I/O operation is allowed to take. If the latency of an I/O operation exceeds the value of the io_timeout parameter, the NVMe driver fails the I/O operation and returns an error. As a result, the status of the file systems on the NVMe disk may change from read-write to read-only. After the file systems become read-only, subsequent write operations on the disk fail, and the operating system of the instance or the applications that are hosted on the instance cannot provide services as expected.
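
To confirm the symptom, you can check which NVMe-backed file systems are currently mounted read-only. The following is a minimal sketch, not an official tool: list_ro_nvme_mounts is a hypothetical helper name, and the sketch assumes that NVMe block devices are named /dev/nvme* and that the mount table is available at /proc/mounts.

```shell
# Sketch: list NVMe-backed file systems that are mounted read-only.
# list_ro_nvme_mounts is a hypothetical helper. It reads /proc/mounts by
# default and assumes that NVMe devices are named /dev/nvme*.
list_ro_nvme_mounts() {
    awk '$1 ~ /^\/dev\/nvme/ {
        # Field 4 holds the comma-separated mount options.
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $1, $2; break }
    }' "${1:-/proc/mounts}"
}
```

Run list_ro_nvme_mounts without arguments on the instance. Each output line contains the device and the mount point of a read-only file system. No output means that no matching file system is mounted read-only.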

Note
  • In most Linux distributions, the io_timeout parameter defaults to 30 seconds. To prevent I/O timeout errors on the NVMe disk, set the io_timeout parameter to the maximum value that the kernel supports: 4294967295 seconds for recent kernel versions, or 255 seconds for earlier kernel versions.

  • The kernel module of the NVMe driver that provides the io_timeout parameter varies based on the kernel version and is either nvme.ko or nvme_core.ko. Accordingly, the full name of the io_timeout parameter is nvme.io_timeout or nvme_core.io_timeout.

Solutions

Temporarily configure the io_timeout parameter

Perform the following steps to set the io_timeout parameter. The value that you specify takes effect immediately but is lost when the Linux ECS instance restarts, so you must set the io_timeout parameter again after each restart.

  1. Connect to the Linux ECS instance.

    For more information, see Connection method overview.

  2. Check the path of the kernel module that contains the io_timeout parameter.

    • Run the following command to check whether the /sys/module/nvme_core/parameters/io_timeout path exists. If the path exists, the full name of the io_timeout parameter is nvme_core.io_timeout.

      cat /sys/module/nvme_core/parameters/io_timeout
    • If the path does not exist, run the following command to check whether the /sys/module/nvme/parameters/io_timeout path exists. If the path exists, the full name of the io_timeout parameter is nvme.io_timeout.

      cat /sys/module/nvme/parameters/io_timeout
  3. Run one of the following commands to write 4294967295 to the path that you obtained in the preceding step.

    nvme.ko kernel module

    sudo sh -c 'echo 4294967295 > /sys/module/nvme/parameters/io_timeout'

    nvme_core.ko kernel module

    sudo sh -c 'echo 4294967295 > /sys/module/nvme_core/parameters/io_timeout'
    • If the command runs successfully and no errors are returned, the io_timeout parameter is set to 4294967295.

    • If an error message that is similar to Numerical result out of range is returned, repeat this step to set the io_timeout parameter to 255.
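
The preceding steps can be combined into one shell function. The following is a minimal sketch under stated assumptions: set_nvme_io_timeout and its optional directory argument are hypothetical names used for illustration, and the function must be run as the root user to write to /sys/module.

```shell
# Sketch: set the NVMe io_timeout parameter at runtime (lost on restart).
# set_nvme_io_timeout is a hypothetical helper. The optional argument
# overrides the module directory, which defaults to /sys/module.
set_nvme_io_timeout() {
    root="${1:-/sys/module}"
    # Check nvme_core first, then nvme, as in step 2.
    for module in nvme_core nvme; do
        param="$root/$module/parameters/io_timeout"
        [ -e "$param" ] || continue
        # Try the maximum for recent kernels, then the limit for earlier kernels.
        for value in 4294967295 255; do
            if echo "$value" > "$param" 2>/dev/null; then
                echo "$module.io_timeout=$(cat "$param")"
                return 0
            fi
        done
        echo "could not write io_timeout for $module" >&2
        return 1
    done
    echo "io_timeout parameter not found under $root" >&2
    return 1
}
```

After you define the function in a root shell, run set_nvme_io_timeout. On success, the function prints the full parameter name and the value that is now in effect.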

Permanently configure the io_timeout parameter

To make the setting permanent, change the value of the io_timeout parameter in the GRand Unified Bootloader (GRUB) configuration file, or use the ecs_nvme_config plug-in of Cloud Assistant to configure the NVMe-related settings in the instance operating system. For more information, see How do I install the NVMe driver for a custom image? A value that is configured in this way persists across instance restarts.
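
For the GRUB method, the parameter is typically passed as a kernel command-line argument in the form module.io_timeout=value. The following is a minimal sketch and not the documented procedure: add_io_timeout_arg is a hypothetical helper, and the sketch assumes that the distribution keeps its GRUB defaults in /etc/default/grub with a double-quoted GRUB_CMDLINE_LINUX line and that GNU sed is available. Back up the file before you change it.

```shell
# Sketch: append <module>.io_timeout=<value> to GRUB_CMDLINE_LINUX.
# add_io_timeout_arg is a hypothetical helper. The /etc/default/grub path and
# the double-quoted GRUB_CMDLINE_LINUX line are assumptions about the distribution.
add_io_timeout_arg() {
    module="$1"                      # nvme_core or nvme, as found under /sys/module
    value="$2"                       # 4294967295, or 255 on earlier kernels
    file="${3:-/etc/default/grub}"
    # Leave the file unchanged if the argument is already configured.
    if grep -q "${module}\.io_timeout=" "$file"; then
        echo "${module}.io_timeout is already set in $file" >&2
        return 0
    fi
    sed -i "s/^\(GRUB_CMDLINE_LINUX=\"[^\"]*\)\"/\1 ${module}.io_timeout=${value}\"/" "$file"
}
# Afterwards, regenerate the GRUB configuration with the tool of your
# distribution (for example, grub2-mkconfig or update-grub) and restart
# the instance for the change to take effect.
```

For example, run add_io_timeout_arg nvme_core 4294967295 as the root user. After the instance restarts, check the parameter again in /sys/module to verify that the value is applied.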