A printk deadlock refers to a situation in which multiple threads in the system wait for each other to release resources and cannot continue when the printk
function in the Linux kernel is called to print logs. printk deadlocks adversely affect the operation of system applications and services and cause kernel downtime. Therefore, preventing and resolving printk deadlocks at the earliest opportunity is critical to ensuring system stability and reliability. This topic describes why a downtime issue occurs in Alibaba Cloud Linux due to a printk deadlock and how to resolve the issue.
Problem description
When a downtime issue occurs in the kernel of the Alibaba Cloud Linux operating system, the analysis of the vmcore file reveals the following symptoms.
When the downtime issue occurs, a dump file named vmcore is generated. You can view kernel logs in the vmcore file, obtain the calltrace information that starts with "Call Trace:" to analyze the cause, and then troubleshoot the issue.
The kernel logs displayed after the dmesg command is run contains
warning
logs that are related to scheduling and work_queue.The calltrace information of a specific process has the following characteristics:
The functions that were last called are intended to acquire spinlocks, such as the
_raw_spin_lock
,queued_spin_lock_slowpath
, andraw_spin_rq_lock_netsted
functions.The system calls the
printk
function,console_unlock
function, and the preceding functions that acquire spinlocks in sequence.
Cause
The issue is caused by a deadlock that occurs when the printk mechanism of the Linux community is used. A printk deadlock is unlikely to occur, but may occur in kernel version 5.10.134-16.3
of Alibaba Cloud Linux 3.
Why does a printk deadlock occur?
If the printk function is called to print kernel logs after the kernel holds the spinlock of work_queue or runqueue (rq), printk calls the underlying Direct Rendering Manager (DRM) driver to attempt to lock work_queue or rq again. As a result, a printk deadlock occurs, which causes a system downtime issue.
NoteFor information about how the DRM driver attempts to lock an object, refer to the drm/fb-helper: Add fb_deferred_io support patch.
Why do
warning
logs related to scheduling and work_queue appear?When the kernel holds the spinlock of work_queue or rq and calls the printk function to print logs, printk prints warning messages about scheduling and work_queue to kernel logs. This is why the demesg command output contains warning logs for scheduling and work_queue.
Why does kernel version
5.10.134-16.3
of Alibaba Cloud Linux 3 have a higher probability of printk deadlocks?Warning
logs for scheduling and work_queue are printed only in a few scenarios. In kernel version5.10.134-16.3
of Alibaba Cloud Linux 3, the asynchronous unthrottle feature that is backported from the Linux community has a regression defect. The defect increases the probability of printingwarning
logs and results in a high probability of printk deadlocks in Alibaba Cloud Linux 3.
Affected versions
printk deadlocks are a known issue introduced in the drm/fb-helper: Add fb_deferred_io support patch in Linux 4.10 released by the Linux community.
The issue exists in kernel versions
4.19
and5.10
of Alibaba Cloud Linux.The probability of this issue is high in kernel version
5.10.134-16.3
of Alibaba Cloud Linux 3.
Solution
Run the following command to change the log level to prevent the printk function from printing warning
logs to a serial port.
If your journal system captures logs on a serial port but not kernel logs printed by running the dmesg command, exercise caution when you change the log level.
Change in the log level causes the
warning
logs on the serial port to be missing, but does not affect thewarning
logs in the kernel logs that are printed by using the dmesg command.
sysctl -w kernel.printk="<console_loglevel> <default_message_loglevel> <minimum_console_loglevel> <default_console_loglevel>" >> /etc/sysctl.conf
Valid values of the kernel.printk
parameter:
console_loglevel
: specifies a log level. The printk function prints the logs whose log levels are higher than the specified log level on the serial port.default_message_loglevel
: specifies the default log level. If no log level is specified, the printk function is called to print logs of the default level.minimum_console_loglevel
: specifies the minimum value of theconsole_loglevel
parameter.default_console_loglevel
: specifies the default value of theconsole_loglevel
parameter.
Linux defines eight log levels of the kernel logs. A lower log level indicates a higher priority. To prevent warning
logs from being printed on the serial port, we recommend that you set the console_loglevel
parameter to a value less than or equal to 4. Sample command:
sysctl -w kernel.printk="4 4 1 7" >> /etc/sysctl.conf
The following log levels are available:
#define LOGLEVEL_EMERG 0 /* system is unusable */
#define LOGLEVEL_ALERT 1 /* action must be taken immediately */
#define LOGLEVEL_CRIT 2 /* critical conditions */
#define LOGLEVEL_ERR 3 /* error conditions */
#define LOGLEVEL_WARNING 4 /* warning conditions */
#define LOGLEVEL_NOTICE 5 /* normal but significant condition */
#define LOGLEVEL_INFO 6 /* informational */
#define LOGLEVEL_DEBUG 7 /* debug-level messages */