What do I do if an ECS instance goes down and the "Objects remaining in kmalloc" message appears in an alert log?

If your Elastic Compute Service (ECS) instance goes down and the "Objects remaining in kmalloc" message appears in an alert log while the memory cgroup kmem feature is in use on the instance, you can use the solution described in this topic to resolve the issue.

Problem description

When you use the memory cgroup kmem feature in an instance, the instance goes down and the kernel log of the instance contains an alert similar to the following one:

[80569.393775] BUG kmalloc-256(15:94ef869ce655ebab64b08cd78ee00d16c20efd5737493b48293de41fe41b04a0) (Tainted: P    B   W  OE  ------------ T):
Objects remaining in kmalloc-256(15:94ef869ce655ebab64b08cd78ee00d16c20efd5737493b48293de41fe41b04a
[80569.397756] -----------------------------------------------------------------------------
[80569.397756]
[80569.400724] INFO: Slab 0xffffea0001e94a00 objects=32 used=1 fp=0xffff88007a528000 flags=0x1fffff00004080
[80569.402702] CPU: 21 PID: 26626 Comm: dockerd Tainted: P    B   W  OE  ------------ T 3.10.0-693.2.2.el7.x86_64 #1
[80569.404898] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8f19b21 04/01/2014
[80569.406747]  ffffea0001e94a00 000000004eb9a19f ffff883afee53aa0 ffffffff816a3db1
[80569.408833]  ffff883afee53b78 ffffffff811dbf54 ffffffff00000020 ffff883afee53b88
[80569.410731]  ffff883afee53b38 656a624f8190fff8 616d657220737463 6e6920676e696e69
[80569.412630] Call Trace:
[80569.414005]  [<ffffffff816a3db1>] dump_stack+0x19/0x1b
[80569.415627]  [<ffffffff811dbf54>] slab_err+0xb4/0xe0
[80569.417204]  [<ffffffff811e0623>] ? __kmalloc+0x1e3/0x230
[80569.420419]  [<ffffffff811e1939>] kmem_cache_close+0x149/0x2e0
[80569.422006]  [<ffffffff811e1ae4>] __kmem_cache_shutdown+0x14/0x80
[80569.423606]  [<ffffffff811a6874>] kmem_cache_destroy+0x44/0xf0
[80569.425149]  [<ffffffff811f6019>] kmem_cache_destroy_memcg_children+0x89/0xb0
[80569.426800]  [<ffffffff811a6849>] kmem_cache_destroy+0x19/0xf0
[80569.428309]  [<ffffffff8123b18e>] bioset_free+0xce/0x110
[80569.431306]  [<ffffffffc06d0b43>] dm_destroy+0x13/0x20 [dm_mod]
[80569.432803]  [<ffffffffc06d69be>] dev_remove+0x11e/0x180 [dm_mod]
[80569.435851]  [<ffffffffc06d7015>] ctl_ioctl+0x1e5/0x500 [dm_mod]
[80569.437363]  [<ffffffffc06d7343>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
[80569.438882]  [<ffffffff8121524d>] do_vfs_ioctl+0x33d/0x540
[80569.443291]  [<ffffffff812154f1>] SyS_ioctl+0xa1/0xc0
[80569.446228]  [<ffffffff816b5009>] system_call_fastpath+0x16/0x1b
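To check whether a saved kernel log contains this signature, a minimal sketch such as the following may help. The function name `has_kmalloc_bug` is hypothetical and the pattern is derived only from the sample log above, so adjust it if your message differs.

```shell
# Hypothetical helper: exits with status 0 if the given log file
# contains the "Objects remaining in kmalloc-<size>" signature.
has_kmalloc_bug() {
    # -E: extended regular expression; -q: print nothing, only set the exit status
    grep -Eq 'Objects remaining in kmalloc-[0-9]+' "$1"
}

# Example usage (commented out; reading the kernel ring buffer may require root):
# dmesg > /tmp/dmesg.txt && has_kmalloc_bug /tmp/dmesg.txt && echo "signature found"
```

The check only looks for the message text. It does not confirm that the call trace matches the one shown in this topic.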

Cause

When you use the memory cgroup kmem feature in an instance, kmem_cache_destroy first deletes the memcg caches of a kmem_cache and checks whether refcount is 0 before it destroys the kmem_cache itself. If refcount is not 0, other tasks may still be attempting to allocate slab memory from a memcg cache of that kmem_cache. This triggers a race condition, and as a result, the instance goes down.

Solution

Important: Before you perform the operations, we recommend that you create snapshots of the ECS instance to back up its data and prevent data loss caused by accidental operations. For more information about snapshots, see Overview.

We recommend that you disable the memory cgroup kmem feature in ECS instances. Perform the following steps to disable the memory cgroup kmem feature in an instance:

Run the following command to open the /etc/default/grub file:
```
vim /etc/default/grub
```
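Press the I key to enter Insert mode and append the following parameter to the value of the line that starts with GRUB_CMDLINE_LINUX. Place it inside the quotation marks and separate it from existing parameters with a space: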
```
cgroup.memory=nokmem
```
Press the Esc key to exit Insert mode, enter :wq, and then press the Enter key to save and close the file.
Run the following command to regenerate the GRand Unified Bootloader (GRUB) configuration file:
```
grub2-mkconfig -o /boot/grub2/grub.cfg
```
Run the following command to restart the instance:
```
reboot
```
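The edit in the preceding steps can also be made without an interactive editor, and the result can be checked after the restart. The following is a sketch only: the function names `add_nokmem` and `nokmem_active` are hypothetical, and it assumes GNU sed and a GRUB_CMDLINE_LINUX value enclosed in double quotation marks on a single line.

```shell
# Hypothetical helper: append cgroup.memory=nokmem to GRUB_CMDLINE_LINUX
# in the given file. Assumes GNU sed and a double-quoted, single-line value.
add_nokmem() {
    grub_file="$1"
    # Do nothing if the parameter is already present on that line.
    if grep -q '^GRUB_CMDLINE_LINUX=.*cgroup\.memory=nokmem' "$grub_file"; then
        return 0
    fi
    # Keep a backup copy before editing in place.
    cp "$grub_file" "$grub_file.bak"
    # Insert the parameter just before the closing quotation mark.
    sed -i 's/^\(GRUB_CMDLINE_LINUX="[^"]*\)"/\1 cgroup.memory=nokmem"/' "$grub_file"
}

# Hypothetical helper: exits with status 0 if the parameter appears in the
# kernel command line (defaults to /proc/cmdline of the running kernel).
nokmem_active() {
    grep -qF 'cgroup.memory=nokmem' "${1:-/proc/cmdline}"
}

# Example usage (commented out; requires root and modifies the system):
# add_nokmem /etc/default/grub
# grub2-mkconfig -o /boot/grub2/grub.cfg
# reboot
# ...after the restart:
# nokmem_active && echo "parameter is on the kernel command line"
```

The presence of the parameter in /proc/cmdline shows only that the boot loader passed it to the kernel. Whether the kernel honors it depends on the kernel version in use.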

If you cannot disable the memory cgroup kmem feature in the operating system of your instance by running the preceding commands, we recommend that you make sure that no program in your instance sets the value of memory.kmem.limit_in_bytes. This ensures that the memory cgroup kmem feature remains disabled.
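To see whether any cgroup currently has a kmem limit, you can list every memory.kmem.limit_in_bytes file with its value. The sketch below assumes that the cgroup v1 memory controller is mounted at /sys/fs/cgroup/memory, which may not be the case on your instance. The function name `list_kmem_limits` is hypothetical.

```shell
# Hypothetical helper: print "<path> <value>" for every
# memory.kmem.limit_in_bytes file under the given cgroup root.
# The default root is an assumption about where cgroup v1 is mounted.
list_kmem_limits() {
    root="${1:-/sys/fs/cgroup/memory}"
    find "$root" -name memory.kmem.limit_in_bytes 2>/dev/null |
    while read -r f; do
        printf '%s %s\n' "$f" "$(cat "$f")"
    done
}

# Example usage (commented out):
# list_kmem_limits
```

A cgroup without a limit typically reports a very large value. A much smaller value suggests that a program set a limit for that cgroup, and the path usually hints at which program created the cgroup.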