When a Linux operating system runs low on memory, the system reclaims memory and allocates the reclaimed memory to other processes. If memory reclamation cannot resolve the memory shortage, the system triggers the Out of Memory Killer (OOM Killer), which terminates processes to forcibly free up the memory that they occupy and thereby relieve memory pressure. This topic describes the possible causes of OOM Killer being triggered in Alibaba Cloud Linux and how to resolve the issue.
Problem description
The following sample log indicates that the test process triggered OOM Killer in Alibaba Cloud Linux:
[Sat Sep 11 12:24:42 2021] test invoked oom-killer: gfp_mask=0x62****(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null), order=0, oom_score_adj=0
[Sat Sep 11 12:24:42 2021] test cpuset=/ mems_allowed=0
[Sat Sep 11 12:24:42 2021] CPU: 1 PID: 29748 Comm: test Kdump: loaded Not tainted 4.19.91-24.1.al7.x86_64 #1
[Sat Sep 11 12:24:42 2021] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS e62**** 04/01/2014
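To find similar records on your own instance, you can filter the kernel log for "invoked oom-killer" messages. The following is a minimal sketch that assumes the message format shown in the sample log above (the process name directly before "invoked" and an "order=<n>" field); the function name oom_events is hypothetical. It prints the name of the process that triggered OOM Killer and the order of the failed allocation, which helps tell a fragmentation-related OOM (a higher order) from an ordinary one (order=0).

```shell
#!/bin/sh
# oom_events (hypothetical helper): read kernel log lines on stdin and, for
# each "invoked oom-killer" record, print the triggering process name and the
# allocation order. Assumes the message format shown in the sample log.
oom_events() {
  awk '/invoked oom-killer:/ {
    comm = ""; order = "?"
    # The process name is the field immediately before "invoked".
    for (i = 2; i <= NF; i++) if ($i == "invoked") { comm = $(i - 1); break }
    # Extract the number that follows "order=".
    if (match($0, /order=-?[0-9]+/)) order = substr($0, RSTART + 6, RLENGTH - 6)
    print comm, "order=" order
  }'
}

# Example: scan the kernel ring buffer (reading it may require root).
if command -v dmesg >/dev/null 2>&1; then
  dmesg 2>/dev/null | oom_events
fi
```

Applied to the sample log above, the function prints "test order=0".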
Possible causes
OOM Killer is triggered when an instance or a cgroup in the instance does not have sufficient memory. The following table describes the possible causes of OOM Killer being triggered in Alibaba Cloud Linux.
Cause | Example scenario |
A cgroup does not have sufficient memory. | In the scenario recorded in the following log, OOM Killer is triggered in the
Cause: The memory usage of the |
A parent cgroup does not have sufficient memory. | In the scenario recorded in the following log, the
Cause: The memory usage of the |
An instance does not have sufficient memory. | In the scenario recorded in the following log,
Cause: The amount of free memory on the instance is below the lower limit of free memory (the min watermark), and memory reclamation cannot free enough memory to resolve the shortage. |
A memory node does not have sufficient memory. | In the scenario recorded in the following log, the log provides the following information:
Cause: In a Non-Uniform Memory Access (NUMA) architecture, the operating system may have multiple memory nodes. You can run the cat /proc/buddyinfo command to query the free memory blocks of each node. If you use the |
The buddy system does not have sufficient memory due to memory fragmentation. | In the scenario recorded in the following log, the log provides the following information:
Cause: If the buddy system has no free memory block of the requested size when the operating system allocates memory, and memory reclamation and compaction cannot produce one, the system triggers OOM Killer to free up memory for the buddy system. Note: The buddy system is a kernel memory management mechanism in Linux that mitigates memory fragmentation and efficiently allocates and frees memory blocks of different sizes. |
Solutions
Perform the following steps based on the scenario to troubleshoot the issue.
A cgroup or parent cgroup does not have sufficient memory
We recommend that you assess the processes that are occupying memory and terminate unnecessary processes to free up memory. If your workloads require more memory than your current instance type provides, you can upgrade to an instance type that has a larger memory size.
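To assess which processes occupy the most memory, you can rank them by resident set size (VmRSS) as reported in /proc/<pid>/status. The following is a minimal sketch that assumes a Linux /proc file system; the function name top_rss and its optional PROC_ROOT argument (added only so that the function can be run against a copy of /proc) are hypothetical.

```shell
#!/bin/sh
# top_rss [N] [PROC_ROOT] (hypothetical helper): print the N processes with
# the largest resident set size as "<VmRSS in kB> <PID> <name>", read from
# PROC_ROOT/<pid>/status.
top_rss() {
  n="${1:-10}"
  root="${2:-/proc}"
  for status_file in "$root"/[0-9]*/status; do
    [ -r "$status_file" ] || continue
    pid=$(basename "$(dirname "$status_file")")
    # Kernel threads have no VmRSS line, so awk prints nothing for them.
    # A process may exit while being read; such errors are ignored.
    awk -v pid="$pid" '
      /^Name:/  { name = $0; sub(/^Name:[ \t]*/, "", name) }
      /^VmRSS:/ { print $2, pid, name }
    ' "$status_file" 2>/dev/null || true
  done | sort -rn | head -n "$n"
}

if [ -d /proc/self ]; then
  top_rss 10
fi
```

Processes near the top of this list are the first candidates to check before you terminate processes or change the instance type.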
Upgrade the instance type of your instance.
For more information, see Overview of instance configuration changes.
Run the following command to adjust the upper limit of memory for the specified cgroup:
sudo bash -c 'echo <value> > /sys/fs/cgroup/memory/<cgroup_name>/memory.limit_in_bytes'
Replace <value> with the new upper limit of memory, in bytes (the suffixes K, M, and G are also accepted), and <cgroup_name> with the actual cgroup name.
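Before you change the limit, it can help to confirm that the cgroup is actually running into it. The following minimal sketch assumes the cgroup v1 memory controller mounted at /sys/fs/cgroup/memory, as in the command above; the function name cgroup_mem_report is hypothetical. It reads memory.usage_in_bytes, memory.limit_in_bytes, and memory.failcnt, a counter of how many times the cgroup's memory usage hit its limit.

```shell
#!/bin/sh
# cgroup_mem_report CGROUP_DIR (hypothetical helper): print the current usage,
# the limit, the usage as a percentage of the limit, and the limit-hit counter
# of a cgroup v1 memory cgroup, e.g. /sys/fs/cgroup/memory/<cgroup_name>.
cgroup_mem_report() {
  dir="$1"
  usage=$(cat "$dir/memory.usage_in_bytes")
  limit=$(cat "$dir/memory.limit_in_bytes")
  failcnt=$(cat "$dir/memory.failcnt")
  # awk computes in floating point, so the very large value that the kernel
  # reports for a cgroup without a limit does not overflow.
  awk -v u="$usage" -v l="$limit" -v f="$failcnt" 'BEGIN {
    printf "usage=%s limit=%s pct=%.1f failcnt=%s\n", u, l, (l > 0 ? u * 100 / l : 0), f
  }'
}

if [ $# -gt 0 ]; then
  cgroup_mem_report "$1"
fi
```

If you save the block as a script, pass the cgroup directory as the first argument. A usage close to 100% together with a failcnt that keeps increasing suggests that the cgroup limit, rather than the instance, is the bottleneck.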
An instance does not have sufficient memory
If an instance does not have sufficient memory, check the following items:
Usage of the slab_unreclaimable memory
cat /proc/meminfo | grep "SUnreclaim"
The slab_unreclaimable memory is the memory that cannot be reclaimed by the system. When the slab_unreclaimable memory takes up more than 10% of the total memory, the system may have slab memory leaks. For information about how to troubleshoot memory leaks, see What do I do if an instance has a high percentage of slab_unreclaimable memory? If the issue persists, submit a ticket.
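The 10% check above can be computed directly from /proc/meminfo, where both MemTotal and SUnreclaim are reported in kB. A minimal sketch; the function name sunreclaim_check is hypothetical, and the optional file argument exists only to make the function testable.

```shell
#!/bin/sh
# sunreclaim_check [MEMINFO] (hypothetical helper): print SUnreclaim as a
# percentage of MemTotal and flag values above the 10% threshold.
sunreclaim_check() {
  awk '/^MemTotal:/   { total = $2 }
       /^SUnreclaim:/ { sunr = $2 }
       END {
         if (total == 0) { print "MemTotal not found"; exit }
         pct = sunr * 100 / total
         printf "SUnreclaim=%d kB (%.1f%% of MemTotal)%s\n", sunr, pct,
                (pct > 10 ? " WARNING: above 10%" : "")
       }' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
  sunreclaim_check
fi
```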
Usage of the systemd memory
cat /proc/1/status | grep "RssAnon"
When the kernel triggers OOM Killer, the init process (PID 1, which is systemd) is never selected, so memory held by systemd cannot be freed in this way. Normally, the memory usage (RssAnon) of systemd does not exceed 200 MB. If it is abnormally high, update systemd to a later version.
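The 200 MB reference value above can be checked against the RssAnon field of /proc/1/status, which is reported in kB. A minimal sketch; the function name systemd_rss_check is hypothetical, and the optional file argument exists only to make the function testable.

```shell
#!/bin/sh
# systemd_rss_check [STATUS_FILE] (hypothetical helper): report the anonymous
# resident memory (RssAnon, in kB) of PID 1 and flag it above 200 MB.
systemd_rss_check() {
  awk '/^RssAnon:/ {
         found = 1
         mb = $2 / 1024
         printf "RssAnon=%d kB (%.1f MB)%s\n", $2, mb,
                (mb > 200 ? " WARNING: above 200 MB" : "")
       }
       END { if (!found) print "RssAnon not found" }' "${1:-/proc/1/status}"
}

if [ -r /proc/1/status ]; then
  systemd_rss_check
fi
```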
Usage of the Transparent Huge Pages (THP) feature
If THP is enabled, memory bloat may occur and trigger OOM Killer. You can tune THP to reduce this risk. For more information, see How do I use THP to tune performance in Alibaba Cloud Linux?
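To see which THP mode is active, you can read /sys/kernel/mm/transparent_hugepage/enabled (and defrag), where the kernel marks the selected option with brackets, for example "always [madvise] never". In "always" mode, THP is applied to all eligible anonymous memory, whereas "madvise" limits it to regions that applications mark with madvise(MADV_HUGEPAGE). A minimal sketch; the function name thp_mode is hypothetical.

```shell
#!/bin/sh
# thp_mode [FILE] (hypothetical helper): print the option that the kernel
# marks as selected with brackets, e.g. "madvise" for "always [madvise] never".
thp_mode() {
  sed -n 's/.*\[\([^]]*\)\].*/\1/p' "${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
}

thp=/sys/kernel/mm/transparent_hugepage
if [ -r "$thp/enabled" ]; then
  echo "enabled: $(thp_mode "$thp/enabled")"
  echo "defrag:  $(thp_mode "$thp/defrag")"
fi
```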
A memory node does not have sufficient memory
If OOM Killer is triggered because a memory node does not have sufficient memory, reconfigure the cpuset.mems interface so that the cgroup can use the memory of the appropriate memory nodes.
Run the following command to query the number of memory nodes in the system:
cat /proc/buddyinfo
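Each line of /proc/buddyinfo starts with "Node <n>, zone <name>", so the node numbers can be extracted directly. The following minimal sketch prints them as a comma-separated list, which happens to be a format that the cpuset.mems interface accepts; the function name numa_nodes is hypothetical, and the optional file argument exists only to make the function testable.

```shell
#!/bin/sh
# numa_nodes [BUDDYINFO] (hypothetical helper): print the memory node numbers
# found in /proc/buddyinfo as a comma-separated list, for example "0,1".
numa_nodes() {
  awk '$1 == "Node" { sub(/,$/, "", $2); print $2 }' "${1:-/proc/buddyinfo}" |
    sort -un | paste -sd, -
}

if [ -r /proc/buddyinfo ]; then
  echo "memory nodes: $(numa_nodes)"
fi
```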
Run the following command to set the cpuset.mems interface:
sudo bash -c 'echo <value> > /sys/fs/cgroup/cpuset/<cgroup_name>/cpuset.mems'
Replace <value> with the numbers of the memory nodes that the cgroup can use and <cgroup_name> with the actual cgroup name. For example, if the instance has three memory nodes (Node 0, Node 1, and Node 2) and you want the cgroup to use the memory of Node 0 and Node 2, set <value> to 0,2.
The buddy system does not have sufficient memory due to memory fragmentation
If OOM Killer is triggered due to memory fragmentation, defragment the memory on a regular basis during off-peak hours. You can run the following command to defragment the memory:
sudo bash -c 'echo 1 > /proc/sys/vm/compact_memory'
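To judge whether compaction is needed, and whether it helped, you can compare /proc/buddyinfo before and after you run the command. In each line, the k-th count after the zone name (k starting at 0) is the number of free blocks of 2^k contiguous pages. The following is a heuristic sketch, not an official fragmentation metric: it reports, per node and zone, how many free blocks are of at least a given order and what share of free pages they hold. The function name high_order_free and the default minimum order of 3 are arbitrary assumptions. A low percentage while total free memory is still large points to fragmentation.

```shell
#!/bin/sh
# high_order_free [MIN_ORDER] [BUDDYINFO] (hypothetical helper): for each node
# and zone, print the number of free blocks of order >= MIN_ORDER and the
# percentage of free pages that sit in such blocks.
high_order_free() {
  awk -v min="${1:-3}" '$1 == "Node" {
    node = $2; sub(/,$/, "", node)
    free = 0; high_pages = 0; high_blocks = 0
    for (i = 5; i <= NF; i++) {
      order = i - 5
      pages = $i * 2 ^ order          # a block of order k spans 2^k pages
      free += pages
      if (order >= min) { high_pages += pages; high_blocks += $i }
    }
    printf "node=%s zone=%s blocks(order>=%d)=%d high_order_pct=%.1f\n",
           node, $4, min, high_blocks, (free ? high_pages * 100 / free : 0)
  }' "${2:-/proc/buddyinfo}"
}

if [ -r /proc/buddyinfo ]; then
  high_order_free 3
fi
```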