In Linux memory management, slab_unreclaimable is memory that is allocated by the slab allocator and marked as unreclaimable. A high ratio of unreclaimable memory to total memory can reduce available memory and degrade system performance. This topic describes how to troubleshoot high slab_unreclaimable memory usage on an Alibaba Cloud Linux system.
Symptoms
When you run the cat /proc/meminfo | grep "SUnreclaim" command on a Linux instance to view the SUnreclaim value, you may find that the value is high (for example, SUnreclaim: 6069340 kB). If this value exceeds 10% of the total system memory, it indicates that the slab_unreclaimable memory usage is too high and that the system may have a slab memory leak.
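The 10% check described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original procedure: the function names are hypothetical, and the MemTotal and MemFree values in the sample are invented so that the SUnreclaim value from the example above has something to be compared against.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> value in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def sunreclaim_ratio(text):
    """Return SUnreclaim as a fraction of MemTotal."""
    fields = parse_meminfo(text)
    return fields["SUnreclaim"] / fields["MemTotal"]


# Hypothetical sample: MemTotal and MemFree are invented for illustration.
# SUnreclaim is the value quoted in the symptom description.
SAMPLE = """\
MemTotal:       32780412 kB
MemFree:         1203344 kB
SUnreclaim:      6069340 kB
"""

ratio = sunreclaim_ratio(SAMPLE)
print(f"SUnreclaim is {ratio:.1%} of MemTotal")
if ratio > 0.10:
    print("Above the 10% threshold: slab_unreclaimable usage is high")
```

On a real instance, you could pass the content of /proc/meminfo instead of SAMPLE, for example sunreclaim_ratio(open("/proc/meminfo").read()).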
Causes
In Linux memory management, the slab is a caching mechanism that the kernel uses to efficiently allocate small blocks of memory. Kernel components or drivers request memory from the slab allocator by calling memory allocation interfaces, such as kmalloc. If these components or drivers fail to properly release the memory, the amount of unreclaimable memory increases, and available memory decreases.
Troubleshooting steps
Connect to the Linux instance that you want to troubleshoot.
For more information, see Choose an ECS remote connection method.
Run the following commands to find the name of the slab that uses many objects or a large amount of memory and is marked as unreclaimable.
View information about the slab that uses the most objects or memory.
slabtop -s -a
In the command output, view and record the name (in the NAME column) of the slab that has a high value in the OBJ/SLAB column.
Check if the slab memory is unreclaimable.
In the command, replace <slab NAME> with the name of the slab that has a high value in the OBJ/SLAB column from the previous step.
cat /sys/kernel/slab/<slab NAME>/reclaim_account
For example, check if the slab named kmalloc-192 is unreclaimable.
cat /sys/kernel/slab/kmalloc-192/reclaim_account
A result of 0 indicates the slab memory is unreclaimable. A result of 1 indicates it is reclaimable.
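If you want to check the reclaim_account flag for every slab at once instead of one cache at a time, the loop below is one possible sketch. It assumes that the kernel exposes /sys/kernel/slab (which depends on the slab allocator in use) and that the current user can read the files. The function name is hypothetical, and unreadable or missing files are skipped.

```python
import os


def unreclaimable_caches(slab_root="/sys/kernel/slab"):
    """Return the names of slab caches whose reclaim_account file contains 0."""
    names = []
    for name in sorted(os.listdir(slab_root)):
        path = os.path.join(slab_root, name, "reclaim_account")
        try:
            with open(path) as f:
                value = f.read().strip()
        except OSError:
            continue  # entry has no such file, or it is not readable
        if value == "0":
            names.append(name)
    return names


# Only touch the real sysfs tree when it exists on this machine.
if os.path.isdir("/sys/kernel/slab"):
    for cache in unreclaimable_caches():
        print(cache)
```

The result only tells you which caches are accounted as unreclaimable. You still need the slabtop output to decide which of them is large enough to investigate.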
Identify the cause of the high slab_unreclaimable memory usage.
You can use the crash tool for static analysis or the perf tool for dynamic analysis to find the cause of the slab memory leak. In this example scenario, the name of the slab with the memory leak is kmalloc-192.
Static analysis using the crash tool
Run the following command to install the crash tool.
sudo yum install crash -y
Run the following command to install the kernel-debuginfo package.
Alibaba Cloud Linux 3
sudo yum install -y kernel-debuginfo-<kernel_version> --enablerepo=alinux3-plus-debug
Note: Replace <kernel_version> with the actual kernel version of your system. You can run the uname -r command to query the kernel version.
Alibaba Cloud Linux 2
sudo yum install kernel-debuginfo -y
Run the following command to start the crash tool.
sudo crash
In the crash tool, run the following command to view memory statistics for kmalloc-192.
kmem -S kmalloc-192
If a large amount of statistical information is returned, you can display only the last few lines, for example, 10 lines.
kmem -S kmalloc-192 | tail -n 10
The following sample output is returned:
SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
ffffea004c94e780  ffff88132539e000     0     42         29    13
ffffea004cbef900  ffff88132fbe4000     0     42         40     2
ffffea000a0e6280  ffff88028398a000     0     42         40     2
ffffea004bfa8000  ffff8812fea00000     0     42         41     1
ffffea006842b380  ffff881a10ace000     0     42         41     1
ffffea0009e7dc80  ffff880279f72000     0     42         34     8
ffffea004e67ae80  ffff881399eba000     0     42         40     2
ffffea00b18d6f80  ffff882c635be000     0     42         42     0
In the statistics for ffff88028398a000, the number of free objects (in the FREE column) is low, and the number of allocated objects (in the ALLOCATED column) is high.
In the crash tool, run the following command to view memory information for ffff88028398a000.
rd ffff88028398a000 512 -S
The command returns a large amount of information. You can follow the prompts to print multiple pages for analysis.
For example, if the put_cred_rcu function appears multiple times in the returned information, search for the put_cred_rcu function in the Linux kernel source code.
void __put_cred(struct cred *cred)
{
    call_rcu(&cred->rcu, put_cred_rcu);
}
The put_cred_rcu function is used to asynchronously release the cred struct. The presence of put_cred_rcu at the end of the cred struct indicates a slab memory leak in the kernel's cred struct.
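When the kmem -S output is long, picking candidate slabs by eye is tedious. The sketch below filters rows such as those in the sample output above and keeps the slabs that have few free objects. It assumes that you saved the kmem -S output as text and that the rows follow the six-column layout of the sample. The function name and the max_free threshold are hypothetical choices for illustration.

```python
import re

# A data row: two 16-digit hex addresses followed by four integers.
ROW = re.compile(
    r"^\s*([0-9a-f]{16})\s+([0-9a-f]{16})\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$"
)


def nearly_full_slabs(text, max_free=2):
    """Return (memory, total, allocated, free) rows with free <= max_free."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header or unrelated line
        memory = m.group(2)
        _node, total, allocated, free = (int(g) for g in m.groups()[2:])
        if free <= max_free:
            rows.append((memory, total, allocated, free))
    rows.sort(key=lambda r: r[3])  # fewest free objects first
    return rows


# Rows copied from the sample output in this topic.
SAMPLE = """\
SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
ffffea004c94e780  ffff88132539e000     0     42         29    13
ffffea004cbef900  ffff88132fbe4000     0     42         40     2
ffffea000a0e6280  ffff88028398a000     0     42         40     2
ffffea004bfa8000  ffff8812fea00000     0     42         41     1
ffffea006842b380  ffff881a10ace000     0     42         41     1
ffffea0009e7dc80  ffff880279f72000     0     42         34     8
ffffea004e67ae80  ffff881399eba000     0     42         40     2
ffffea00b18d6f80  ffff882c635be000     0     42         42     0
"""

for memory, total, allocated, free in nearly_full_slabs(SAMPLE):
    print(f"rd {memory} 512 -S    # allocated {allocated}/{total}, free {free}")
```

Each printed line is an rd command that you can paste into the crash tool for the corresponding slab. A nearly full slab is only a candidate: the rd output still has to be read to see which function or struct keeps recurring.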
Dynamic analysis using the perf tool
Run the following command to install the perf tool.
sudo yum install perf -y
Run the following command to use the perf tool to dynamically record the memory allocation and release events for kmalloc-192. In this example, sleep 200 limits the recording to 200 seconds.
sudo perf record -a -e kmem:kmalloc --filter 'bytes_alloc == 192' -e kmem:kfree --filter ' ptr != 0' sleep 200
In the current directory, print the dynamically recorded data to a temporary file.
In this example, the temporary file is named testperf.txt.
sudo perf script > testperf.txt
Run the following command to view the content of the testperf.txt file.
cat testperf.txt
Manually look for allocation entries (kmalloc) that have no matching release entry (kfree) for the same pointer. Then, search the Linux kernel source code for the function that is causing the slab memory leak.
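Matching allocations against releases by hand does not scale to a 200-second trace. The following sketch pairs kmalloc and kfree events by pointer and counts the call sites whose allocations were never released during the recording. It assumes that each event line in testperf.txt contains call_site= and ptr= fields. The exact perf script layout varies by kernel and perf version, so the sample lines, addresses, and process names below are invented, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches kmem:kmalloc and kmem:kfree event lines and captures call_site and ptr.
EVENT = re.compile(r"kmem:(kmalloc|kfree):.*?call_site=(\S+).*?ptr=(\S+)")


def unfreed_call_sites(text):
    """Return [(call_site, count)] for allocations with no later kfree."""
    live = {}  # ptr -> call_site of the allocation that currently owns it
    for line in text.splitlines():
        m = EVENT.search(line)
        if not m:
            continue
        event, call_site, ptr = m.groups()
        if event == "kmalloc":
            live[ptr] = call_site
        else:
            live.pop(ptr, None)  # ignore frees of memory allocated before recording
    return Counter(live.values()).most_common()


# Hypothetical perf script lines, for illustration only.
SAMPLE = """\
sshd  1201 [000]  100.000001: kmem:kmalloc: call_site=ffffffff810a1b2c ptr=0xffff88028398a000 bytes_req=168 bytes_alloc=192 gfp_flags=GFP_KERNEL
sshd  1201 [000]  100.000050: kmem:kmalloc: call_site=ffffffff810a1b2c ptr=0xffff88028398a0c0 bytes_req=168 bytes_alloc=192 gfp_flags=GFP_KERNEL
cron   877 [001]  100.000120: kmem:kmalloc: call_site=ffffffff812f0d44 ptr=0xffff88028398a180 bytes_req=192 bytes_alloc=192 gfp_flags=GFP_KERNEL
cron   877 [001]  100.000300: kmem:kfree: call_site=ffffffff812f0e10 ptr=0xffff88028398a180
"""

for call_site, count in unfreed_call_sites(SAMPLE):
    print(f"{count} unreleased allocation(s) from call_site {call_site}")
```

Treat the result as a list of suspects rather than proof. Objects allocated shortly before the recording ends may simply not have been released yet, so a call site is more convincing when its count keeps growing across repeated or longer recordings.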
After using tools such as crash and perf to determine the function call path or the affected kernel data structure, identify the specific source of the memory leak with the guidance of a kernel developer or professional operations and maintenance (O&M) engineer. Then, resolve the memory leak.
The following are some possible solutions:
Upgrade the kernel or apply a patch.
Adjust kernel parameters.
Restart the affected services or modules.
Optimize the application or driver.
Restart the system.
References
A slab memory leak reduces the available memory for applications on an instance and causes memory fragmentation. This can trigger the system's out-of-memory (OOM) Killer and cause system performance fluctuations.
Memory fragmentation: Mitigate Linux memory fragmentation
Polkit memory leak: How to resolve a memory leak in polkit in Alibaba Cloud Linux 2?
System OOM Killer: Troubleshoot the OOM Killer