slab_unreclaimable memory is memory that the Linux slab allocator has allocated and marked as unreclaimable. If slab_unreclaimable memory takes up a high percentage of the total memory, the amount of available memory decreases and system performance degrades. This topic describes how to identify the causes of a high percentage of slab_unreclaimable memory on an Elastic Compute Service (ECS) instance that runs Alibaba Cloud Linux.
Problem description
When you run the cat /proc/meminfo | grep "SUnreclaim" command on a Linux instance, the SUnreclaim value is large (for example, SUnreclaim: 6069340 kB), which indicates a large amount of slab_unreclaimable memory. If slab_unreclaimable memory takes up more than 10% of the total memory, a slab memory leak may exist.
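The 10% check described above can be scripted. The following is a minimal sketch, not an official tool: the function name sunreclaim_pct is hypothetical, and the sketch assumes that the file it reads uses the /proc/meminfo layout, in which the MemTotal and SUnreclaim lines both report values in kB.

```shell
#!/bin/sh
# Print SUnreclaim as a percentage of MemTotal, with one decimal place.
# $1: path to a meminfo-format file (defaults to /proc/meminfo).
# sunreclaim_pct is a hypothetical helper name used only for this sketch.
sunreclaim_pct() {
    awk '
        $1 == "MemTotal:"   { total = $2 }
        $1 == "SUnreclaim:" { unreclaim = $2 }
        END {
            # Fail instead of dividing by zero if MemTotal was not found.
            if (total > 0) printf "%.1f\n", unreclaim * 100 / total
            else exit 1
        }
    ' "${1:-/proc/meminfo}"
}
```

Calling sunreclaim_pct without an argument reads /proc/meminfo on the instance. A result above 10 corresponds to the threshold described above.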
Cause
In Linux memory management, the kernel uses slabs as a caching mechanism to efficiently allocate small chunks of memory. A slab memory leak occurs when a kernel component or driver requests memory from the slab allocator by calling a memory allocation API (such as kmalloc) but does not properly release the memory, which results in less available memory.
Troubleshooting procedure
Connect to the Linux instance that has a high percentage of slab_unreclaimable memory.
For more information, see Connection method overview.
Run the following command to view the name of the slab that has the largest number of objects or the largest amount of memory and whose memory cannot be reclaimed:
slabtop -s -a
In the command output, record the name (the value in the NAME column) of the slab that has the largest value in the OBJ/SLAB column.
Run the following command to determine whether the slab memory is reclaimable:
In the following command, replace <slab NAME> with the name of the slab that you recorded in the previous step (the slab that has the largest value in the OBJ/SLAB column).
cat /sys/kernel/slab/<slab NAME>/reclaim_account
For example, run the following command to determine whether the memory of the slab named kmalloc-192 is marked as reclaimable:
cat /sys/kernel/slab/kmalloc-192/reclaim_account
If the slab memory is unreclaimable, 0 is displayed in the command output. If the slab memory is reclaimable, 1 is displayed in the command output.
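The 0 or 1 result can be turned into a readable answer with a small wrapper. The following is a minimal sketch: the function name slab_reclaim_state and its optional second argument are hypothetical, and the sketch assumes that the slab exposes a reclaim_account file in the location used by the preceding commands.

```shell
#!/bin/sh
# Report whether a slab is accounted as reclaimable.
# $1: slab name, for example kmalloc-192
# $2: sysfs base directory (defaults to /sys/kernel/slab)
# slab_reclaim_state is a hypothetical helper name used only for this sketch.
slab_reclaim_state() {
    file="${2:-/sys/kernel/slab}/$1/reclaim_account"
    if [ ! -r "$file" ]; then
        echo "unknown"
        return 1
    fi
    case "$(cat "$file")" in
        0) echo "unreclaimable" ;;
        1) echo "reclaimable" ;;
        *) echo "unknown"; return 1 ;;
    esac
}
```

For example, slab_reclaim_state kmalloc-192 prints unreclaimable if the reclaim_account file of the slab contains 0.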
Identify the causes of the high percentage of the slab_unreclaimable memory.
You can use the crash tool for static analysis or the perf tool for dynamic analysis to identify the causes of slab memory leaks. In the example scenario in this topic, the slab named kmalloc-192 has memory leaks.
Method 1: Use crash to perform static analysis
Run the following command to install the crash tool:
sudo yum install crash -y
Run the following command to install the kernel-debuginfo package:
Alibaba Cloud Linux 3
sudo yum install -y kernel-debuginfo-<kernel version> --enablerepo=alinux3-plus-debug
Note: Replace <kernel version> with the actual kernel version of the system. Run the uname -r command to query the kernel version.
Alibaba Cloud Linux 2
sudo yum install kernel-debuginfo -y
Run the following command to start the crash tool:
sudo crash
Run the following command in crash to view memory statistics about kmalloc-192:
kmem -S kmalloc-192
If the command returns a large amount of data, you can view only the last few rows. For example, you can run the following command to view the last 10 rows of data:
kmem -S kmalloc-192 | tail -n 10
Sample command output:
SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
ffffea004c94e780  ffff88132539e000     0     42         29    13
ffffea004cbef900  ffff88132fbe4000     0     42         40     2
ffffea000a0e6280  ffff88028398a000     0     42         40     2
ffffea004bfa8000  ffff8812fea00000     0     42         41     1
ffffea006842b380  ffff881a10ace000     0     42         41     1
ffffea0009e7dc80  ffff880279f72000     0     42         34     8
ffffea004e67ae80  ffff881399eba000     0     42         40     2
ffffea00b18d6f80  ffff882c635be000     0     42         42     0
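When the output is long, the rows with few free objects can be picked out with awk. The following is a minimal sketch that assumes the rows were saved to a text file with the six columns shown in the sample output. The function name nearly_full_slabs and the default threshold of 95% are hypothetical choices, not part of crash.

```shell
#!/bin/sh
# Print the MEMORY address and ALLOCATED/TOTAL counts of rows in which the
# allocated share is at least a given percentage.
# $1: file that holds saved rows (SLAB MEMORY NODE TOTAL ALLOCATED FREE)
# $2: minimum allocated percentage (defaults to 95, an arbitrary choice)
# nearly_full_slabs is a hypothetical helper name used only for this sketch.
nearly_full_slabs() {
    awk -v min="${2:-95}" '
        # Data rows have six fields and a numeric TOTAL; the header is skipped.
        NF == 6 && $4 ~ /^[0-9]+$/ && $4 > 0 {
            if ($5 * 100 / $4 >= min) print $2, $5 "/" $4
        }
    ' "$1"
}
```

With the sample output above and the default threshold, the row for ffff88028398a000 is printed as ffff88028398a000 40/42, and the rows with 29 or 34 allocated objects are omitted.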
The sample output indicates that the slab at ffff88028398a000 has few free objects (the value in the FREE column) and many allocated objects (the value in the ALLOCATED column).
Run the following command in crash to view the memory data at ffff88028398a000:
rd ffff88028398a000 512 -S
If the command output contains a large amount of data, you can have the command output displayed in pages.
For example, if the put_cred_rcu function appears repeatedly in the command output, search the source code of the Linux kernel for the put_cred_rcu function:
void __put_cred(struct cred *cred)
{
    call_rcu(&cred->rcu, put_cred_rcu);
}
The put_cred_rcu function is the callback that asynchronously releases credentials, and it appears at the end of the cred structure. If the function repeats in the memory data of the slab, the cred structure in the kernel is the likely source of the slab memory leak.
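Counting how often each symbol appears makes a repeated function such as put_cred_rcu easier to spot than reading the dump by eye. The following is a minimal sketch with stated assumptions: the rd output was saved to a text file, each line starts with an address column, raw memory words are printed as hexadecimal numbers of at least 8 digits, slab names are printed in square brackets, and symbols are printed as a name with an optional +offset suffix. The function name count_symbols is hypothetical. Adjust the patterns if the output of your crash version differs.

```shell
#!/bin/sh
# Count how often each symbol name appears in saved "rd -S" output.
# $1: file that holds the saved output
# count_symbols is a hypothetical helper name used only for this sketch.
count_symbols() {
    awk '
        {
            # Field 1 is assumed to be the "address:" column, so start at 2.
            for (i = 2; i <= NF; i++) {
                tok = $i
                # Skip bracketed slab names such as [kmalloc-192].
                if (tok ~ /^\[/) continue
                # Skip raw hexadecimal memory words.
                if (tok ~ /^[0-9a-f]+$/ && length(tok) >= 8) continue
                # Drop a trailing +offset so that all hits share one name.
                sub(/\+[0-9a-fx]+$/, "", tok)
                count[tok]++
            }
        }
        END { for (s in count) print count[s], s }
    ' "$1" | sort -rn
}
```

The most frequent symbol is printed first. A symbol that dominates the list is a candidate to search for in the kernel source code.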
Method 2: Use perf to perform dynamic analysis
Run the following command to install the perf tool:
sudo yum install perf -y
Run the following command to use perf to record, over a period of 200 seconds, the memory allocations from kmalloc-192 and the memory release events, so that you can find the memory that is not released:
sudo perf record -a -e kmem:kmalloc --filter 'bytes_alloc == 192' -e kmem:kfree --filter ' ptr != 0' sleep 200
Save the dynamically obtained data to a temporary file in the current directory.
In this example, the dynamically obtained data is saved to a temporary file named testperf.txt. Run the following command:
sudo perf script > testperf.txt
Run the following command to view the content of testperf.txt:
cat testperf.txt
You must manually identify the allocations that have no matching free event (kfree), and then manually look up the function that causes the slab memory leak in the source code of the Linux kernel.
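Part of this manual matching can be approximated with awk. The following is a minimal sketch, assuming that each kmem:kmalloc and kmem:kfree line in testperf.txt contains call_site=... and ptr=... fields, which is the usual layout of these tracepoints in perf script output. The function name unfreed_by_call_site is hypothetical. An allocation that is released after the recording window ends is still reported, so treat the result as a list of candidates, not as proof of a leak.

```shell
#!/bin/sh
# List call sites whose kmalloc events have no later kfree for the same
# pointer, most frequent first.
# $1: file that holds "perf script" output, for example testperf.txt
# unfreed_by_call_site is a hypothetical helper name used only for this sketch.
unfreed_by_call_site() {
    awk '
        # Return the value of the key=value field named key, or "" if absent.
        function field(key,    i, n) {
            n = length(key) + 1
            for (i = 1; i <= NF; i++)
                if (substr($i, 1, n) == (key "=")) return substr($i, n + 1)
            return ""
        }
        # Remember the call site of every allocation, keyed by pointer.
        /kmem:kmalloc:/ { p = field("ptr"); if (p != "") site[p] = field("call_site") }
        # A free for the same pointer clears the pending allocation.
        /kmem:kfree:/   { p = field("ptr"); if (p != "") delete site[p] }
        END {
            for (p in site) pending[site[p]]++
            for (s in pending) print pending[s], s
        }
    ' "$1" | sort -rn
}
```

For example, unfreed_by_call_site testperf.txt prints one line per call site with the number of allocations that were not released during the recording. The call sites at the top of the list are the first places to check in the kernel source code.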
After you use tools such as crash and perf to determine the function call path or the affected kernel data structure related to the memory leaks, we recommend that you identify the specific sources of the memory leaks under the guidance of kernel developers or professional O&M personnel, and then resolve the memory leak issue.
To resolve the issue, perform the following operations:
Upgrade the kernel or apply a patch.
Adjust kernel parameters.
Restart affected services or modules.
Optimize applications or drivers.
Restart the system.
References
Perform the following operations if the slab memory leaks cause less available memory for businesses running on the instances, memory fragmentation, out-of-memory (OOM) killer issue, and system performance jitters: