If a softlockup error caused by hung tasks occurs when you delete cgroups on an Elastic Compute Service (ECS) instance, you can use the solution described in this topic to fix the issue.
Problem description
When you delete cgroups on the instance, the kernel panics and logs a call trace similar to the following:
[3302742.447940] Kernel panic - not syncing: softlockup: hung tasks
[3302742.448677] CPU: 18 PID: 1 Comm: systemd Kdump: loaded Tainted: G OEL ------------ T 3.10.0-862.14.4.el7.x86_64 #1
[3302742.450167] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8a46cfe 04/01/2014
[3302742.462123] [] mem_cgroup_reparent_charges+0x16d/0x3c0
[3302742.463243] [] mem_cgroup_css_offline+0x84/0x140
[3302742.464327] [] cgroup_destroy_locked+0xea/0x370
[3302742.465414] [] cgroup_rmdir+0x22/0x40
[3302742.466434] [] vfs_rmdir+0xdc/0x150
[3302742.467449] [] do_rmdir+0x1f1/0x220
[3302742.468470] [] ? ____fput+0xe/0x10
[3302742.469495] [] ? task_work_run+0xc0/0xe0
[3302742.470578] [] SyS_rmdir+0x16/0x20
[3302742.471628] [] system_call_fastpath+0x22/0x27
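To check whether a saved kernel log shows the same signature, a search along the following lines may help. This is a minimal sketch: the function name `matches_cgroup_softlockup` and the example log path are illustrative assumptions, and the check only looks for two strings that appear in the trace above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given log file contains both the
# softlockup panic message and the memory cgroup reparenting frame shown
# in the call trace above.
matches_cgroup_softlockup() {
    log_file="$1"
    grep -q 'softlockup: hung tasks' "$log_file" &&
        grep -q 'mem_cgroup_reparent_charges' "$log_file"
}

# Example usage (the path is an assumption; point it at wherever your logs are kept):
# matches_cgroup_softlockup /var/log/messages && echo "signature found"
```

A match only indicates that the log resembles the trace in this topic. It does not confirm the root cause on its own.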
Cause
When you delete a cgroup on the instance, the kernel repeatedly moves (reparents) the charges for the memory pages that are still in use to the parent cgroup. If the cgroup holds a large amount of memory, this reparenting takes a long time. The reparenting loop contains no scheduling points, so the CPU is not yielded to other tasks, which results in a softlockup error.
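Because the time spent depends on how much memory is charged to the cgroup, it can be useful to see that figure before you remove a cgroup. The following is a minimal sketch, assuming cgroup v1 with the memory controller mounted at `/sys/fs/cgroup/memory` (common on 3.10 kernels, but not stated in this topic). The function name and the example cgroup path are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the number of bytes currently charged to a
# memory cgroup. Assumes cgroup v1, where each memory cgroup directory
# exposes a memory.usage_in_bytes file.
cgroup_memory_usage() {
    cgroup_dir="$1"
    cat "$cgroup_dir/memory.usage_in_bytes"
}

# Example usage (the cgroup name "mygroup" is an assumption):
# cgroup_memory_usage /sys/fs/cgroup/memory/mygroup
```

A large value suggests that deleting the cgroup may take longer on an affected kernel. The sketch only reports usage and does not prevent the softlockup.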
Solution
- If your instance runs a CentOS operating system, we recommend that you upgrade the kernel version.
  - Run the following command to upgrade the kernel version:
    yum update kernel
  - Run the following command to restart the instance:
    reboot
  - Run the following command to check whether the kernel version is 3.10.0-1160 or later:
    uname -r
- If your instance runs an Alibaba Cloud Linux operating system, this softlockup error does not occur.
- If your instance runs an operating system other than the preceding ones, we recommend that you manually upgrade the kernel version to 4.17 or later.
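The version check for CentOS can also be scripted. The following is a minimal sketch, assuming GNU `sort` with the `-V` (version sort) option is available. The helper name `version_ge` is hypothetical, and the threshold shown is the CentOS value from this topic. For other distributions, substitute the 4.17 threshold.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if version $1 is greater than or equal to
# version $2. Relies on the version-sort option (-V) of GNU sort: if $2
# sorts first (or the two are equal), $1 is at least $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="3.10.0-1160"   # minimum CentOS kernel version named in this topic
current="$(uname -r)"    # kernel release of the running system

if version_ge "$current" "$required"; then
    echo "Kernel $current is at or above $required"
else
    echo "Kernel $current is older than $required; upgrade the kernel and restart"
fi
```

Run the check after the restart, because `uname -r` reports the kernel that is currently running, not the one most recently installed.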
If you have requests or feedback, you can submit a ticket to contact Alibaba Cloud.