If an Elastic Compute Service (ECS) instance that uses the memory cgroup kmem feature goes down and the "Objects remaining in kmalloc" message appears in the kernel log, you can use the solution described in this topic to resolve the issue.
Problem description
When you use the memory cgroup kmem feature in an instance, the instance goes down and a log entry similar to the following one appears in the kernel log of the instance:
[80569.393775] BUG kmalloc-256(15:94ef869ce655ebab64b08cd78ee00d16c20efd5737493b48293de41fe41b04a0) (Tainted: P B W OE ------------ T):
Objects remaining in kmalloc-256(15:94ef869ce655ebab64b08cd78ee00d16c20efd5737493b48293de41fe41b04a
[80569.397756] -----------------------------------------------------------------------------
[80569.397756]
[80569.400724] INFO: Slab 0xffffea0001e94a00 objects=32 used=1 fp=0xffff88007a528000 flags=0x1fffff00004080
[80569.402702] CPU: 21 PID: 26626 Comm: dockerd Tainted: P B W OE ------------ T 3.10.0-693.2.2.el7.x86_64 #1
[80569.404898] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8f19b21 04/01/2014
[80569.406747] ffffea0001e94a00 000000004eb9a19f ffff883afee53aa0 ffffffff816a3db1
[80569.408833] ffff883afee53b78 ffffffff811dbf54 ffffffff00000020 ffff883afee53b88
[80569.410731] ffff883afee53b38 656a624f8190fff8 616d657220737463 6e6920676e696e69
[80569.412630] Call Trace:
[80569.414005] [<ffffffff816a3db1>] dump_stack+0x19/0x1b
[80569.415627] [<ffffffff811dbf54>] slab_err+0xb4/0xe0
[80569.417204] [<ffffffff811e0623>] ? __kmalloc+0x1e3/0x230
[80569.420419] [<ffffffff811e1939>] kmem_cache_close+0x149/0x2e0
[80569.422006] [<ffffffff811e1ae4>] __kmem_cache_shutdown+0x14/0x80
[80569.423606] [<ffffffff811a6874>] kmem_cache_destroy+0x44/0xf0
[80569.425149] [<ffffffff811f6019>] kmem_cache_destroy_memcg_children+0x89/0xb0
[80569.426800] [<ffffffff811a6849>] kmem_cache_destroy+0x19/0xf0
[80569.428309] [<ffffffff8123b18e>] bioset_free+0xce/0x110
[80569.431306] [<ffffffffc06d0b43>] dm_destroy+0x13/0x20 [dm_mod]
[80569.432803] [<ffffffffc06d69be>] dev_remove+0x11e/0x180 [dm_mod]
[80569.435851] [<ffffffffc06d7015>] ctl_ioctl+0x1e5/0x500 [dm_mod]
[80569.437363] [<ffffffffc06d7343>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
[80569.438882] [<ffffffff8121524d>] do_vfs_ioctl+0x33d/0x540
[80569.443291] [<ffffffff812154f1>] SyS_ioctl+0xa1/0xc0
[80569.446228] [<ffffffff816b5009>] system_call_fastpath+0x16/0x1b
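To check whether an instance has hit this issue, you can search its kernel log for the two signature strings in the preceding trace. The following is a minimal sketch, not a definitive diagnostic: the function name kmalloc_bug_lines is hypothetical, and the log file path in the usage comment is an assumption about where your distribution stores kernel messages.

```shell
# Print log lines that match the signature shown in the trace above.
# Reads log text from standard input, so it works with dmesg output
# or with a saved log file.
kmalloc_bug_lines() {
    grep -E 'BUG kmalloc-[0-9]+|Objects remaining in kmalloc-[0-9]+'
}

# Usage (the log file path is an assumption; adjust it for your system):
#   dmesg | kmalloc_bug_lines
#   kmalloc_bug_lines < /var/log/messages
```

If the function prints nothing, the signature is not present in the log text that was supplied.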
Cause
When you use the memory cgroup kmem feature in an instance, kmem_cache_destroy deletes the memcg cache of a kmem_cache and checks whether refcount is 0 before it destroys the kmem_cache. If refcount is not 0, some tasks may still be attempting to allocate slab memory from the memcg cache of the kmem_cache. In this case, a race condition is triggered and, as a result, the instance goes down.
Solution
We recommend that you disable the memory cgroup kmem feature in ECS instances. Perform the following steps to disable the memory cgroup kmem feature in an instance:
Run the following command to open the /etc/default/grub file:
vim /etc/default/grub
Press the I key to enter Insert mode and add the following content to the line that starts with GRUB_CMDLINE_LINUX:
cgroup.memory=nokmem
Press the Esc key to exit Insert mode, enter :wq, and then press the Enter key to save and close the file.
Run the following command to update the GRand Unified Bootloader (GRUB) configuration:
grub2-mkconfig -o /boot/grub2/grub.cfg
Run the following command to restart the instance:
reboot
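The preceding edit can also be made without an interactive editor, and the result can be checked after the restart. The following is a minimal sketch under stated assumptions: the function names add_nokmem and nokmem_active are hypothetical, the GRUB_CMDLINE_LINUX value is assumed to be enclosed in double quotation marks on a single line, and the backup file suffix .bak is an arbitrary choice.

```shell
# Append cgroup.memory=nokmem to the GRUB_CMDLINE_LINUX line of the given
# file, unless the parameter is already there.
# ASSUMPTION: the value is double-quoted and sits on one line.
add_nokmem() {
    grub_file="$1"
    if grep -q '^GRUB_CMDLINE_LINUX=.*cgroup\.memory=nokmem' "$grub_file"; then
        return 0    # already present, nothing to do
    fi
    # Keep a backup, then rewrite the original file from the backup.
    cp "$grub_file" "$grub_file.bak"
    sed 's/^\(GRUB_CMDLINE_LINUX="[^"]*\)"/\1 cgroup.memory=nokmem"/' \
        "$grub_file.bak" > "$grub_file"
}

# Succeed if the given kernel command line (default: /proc/cmdline)
# contains the parameter.
nokmem_active() {
    grep -qwF 'cgroup.memory=nokmem' "${1:-/proc/cmdline}"
}

# Usage (run as root; the paths are the ones used in the steps above):
#   add_nokmem /etc/default/grub
#   grub2-mkconfig -o /boot/grub2/grub.cfg
#   reboot
#   nokmem_active && echo "parameter present on the kernel command line"
```

nokmem_active only confirms that the parameter was passed to the kernel at boot. It does not confirm that the running kernel version honors the parameter.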
If the memory cgroup kmem feature cannot be disabled from the command line in the operating system of your instance, we recommend that you do not set the value of memory.kmem.limit_in_bytes in any program within your instance. This ensures that the memory cgroup kmem feature remains disabled.
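To find out whether a program has already set memory.kmem.limit_in_bytes, you can scan the memory cgroup hierarchy for values that differ from the unlimited default. The following is a minimal sketch that relies on two assumptions you should verify on your own instance: the cgroup v1 memory controller is mounted at /sys/fs/cgroup/memory, and 9223372036854771712 is the value the kernel reports when no limit is set. The function name kmem_limits_set is hypothetical.

```shell
# List every memory.kmem.limit_in_bytes file under a cgroup root whose
# value is below the "unlimited" sentinel, which indicates that some
# program has set a kmem limit for that cgroup.
# ASSUMPTIONS: default root path and sentinel value (both overridable).
kmem_limits_set() {
    root="${1:-/sys/fs/cgroup/memory}"
    unlimited="${2:-9223372036854771712}"
    find "$root" -name memory.kmem.limit_in_bytes 2>/dev/null |
    while IFS= read -r f; do
        v=$(cat "$f" 2>/dev/null)
        # Skip unreadable or non-numeric values silently.
        if [ -n "$v" ] && [ "$v" -lt "$unlimited" ] 2>/dev/null; then
            echo "$f $v"
        fi
    done
}

# Usage:
#   kmem_limits_set                      # scan the default hierarchy
#   kmem_limits_set /sys/fs/cgroup/memory/docker
```

Each line of output names a cgroup file and the limit that was written to it. An empty result suggests that no kmem limit has been set under the scanned root.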