What are the differences between cgroup v1 and cgroup v2?

In Linux, Control Groups (cgroups) provide a resource management and restriction mechanism that limits, records, and isolates physical resources, such as CPUs, memory, and I/O resources, that are allocated to tasks (processes) in cgroups. A parent cgroup can be used to control the resource utilization of descendant cgroups. cgroup v1 and cgroup v2 are two major versions of cgroups and significantly differ in design and usage. This topic describes the main differences between cgroup v1 and cgroup v2.
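
Because the two versions expose different interface files, it helps to know which one a host uses before reading the tables below. The following is a minimal sketch, not an official detection method. It assumes that a cgroup v2 hierarchy exposes cgroup.controllers in its root directory and that a cgroup v1 mount point contains one directory per subsystem. The function name, the V1_SUBSYSTEM_DIRS tuple, and the default mount point /sys/fs/cgroup are illustrative assumptions.

```python
import os
import tempfile

# Hypothetical list of per-subsystem directories commonly seen under a v1 mount point.
V1_SUBSYSTEM_DIRS = ("cpu", "cpuset", "memory", "blkio")


def detect_cgroup_version(mount_point="/sys/fs/cgroup"):
    """Guess the cgroup version at mount_point (heuristic, see assumptions above)."""
    # cgroup.controllers is a cgroup v2 common interface and has no v1 counterpart.
    if os.path.isfile(os.path.join(mount_point, "cgroup.controllers")):
        return "v2"
    # In cgroup v1, each subsystem is usually mounted in its own directory.
    if any(os.path.isdir(os.path.join(mount_point, name)) for name in V1_SUBSYSTEM_DIRS):
        return "v1"
    return "unknown"


# Demonstration on a throwaway directory instead of the real cgroup file system.
with tempfile.TemporaryDirectory() as fake_root:
    os.mkdir(os.path.join(fake_root, "memory"))
    looks_like_v1 = detect_cgroup_version(fake_root)
    open(os.path.join(fake_root, "cgroup.controllers"), "w").close()
    looks_like_v2 = detect_cgroup_version(fake_root)
```

Hosts that mount both versions side by side may need a more careful check than this sketch performs.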

Common interface differences

cgroup v1 interfaces

Interface name	Purpose	In-house interface	Corresponding cgroup v2 interface
cgroup.procs	Writes process IDs (PIDs) to migrate tasks to a cgroup.	No	cgroup.procs.
cgroup.clone_children	A value of 1 indicates that the child cgroups inherit the cpuset configurations of the parent cgroup. Note This interface takes effect only on the cpuset subsystem and is classified as a common interface due to historical reasons.	No	N/A
cgroup.sane_behavior	A leftover of the experimental cgroup v2 implementation. The interface is retained only for backward compatibility.	No	N/A
notify_on_release	A value of 1 indicates that the program specified in the release_agent interface is executed when a cgroup becomes empty. Note The release_agent interface exists only in the root cgroup.	No	cgroup.events, which implements similar functionality
release_agent	Specifies the path of the program that is executed when a cgroup whose notify_on_release value is 1 becomes empty.	No	cgroup.events, which implements similar functionality
tasks	Writes thread IDs (TIDs) to migrate threads to a cgroup.	No	cgroup.threads.
pool_size	Controls the size of the cgroup cache pool. The cgroup cache pool helps accelerate the creation and binding of cgroups in high-concurrency scenarios. Note The interface depends on cgroup_rename and cannot be used in cgroup v2.	Yes	N/A

cgroup v2 interfaces

Interface name	Purpose	In-house interface	Corresponding cgroup v1 interface
cgroup.procs	Writes PIDs to migrate tasks to a cgroup.	No	cgroup.procs.
cgroup.type	Writes the string "threaded" to change a cgroup to a threaded cgroup to provide thread-granularity control. Note Only cpu, pids, and perf_event threaded controllers are supported.	No	N/A
cgroup.threads	Writes TIDs to migrate threads to a cgroup. Note The string "threaded" must be written to the cgroup.type interface file.	No	tasks.
cgroup.controllers	Queries all subsystems available for the current cgroup.	No	N/A
cgroup.subtree_control	Specifies which subsystems are enabled to control resource distribution from the cgroup to its child cgroups. Note The subsystems can be queried by using the cgroup.controllers interface.	No	N/A
cgroup.events	Queries whether active processes exist in the current cgroup and whether the current cgroup is frozen. You can use fsnotify to listen on this interface to check whether the interface status is changed. Note This interface does not exist in the root cgroup.	No	notify_on_release and release_agent, which are used together to implement similar functionality
cgroup.max.descendants	Controls the maximum number of the descendant cgroups allowed in the current cgroup.	No	N/A
cgroup.max.depth	Controls the maximum depth of descendant cgroups allowed in the current cgroup.	No	N/A
cgroup.stat	Queries the number of descendant cgroups underneath the current cgroup and the descendant cgroups that are in the Dying (deleted) state.	No	N/A
cgroup.freeze	Controls whether to freeze tasks in a cgroup. Note This interface does not exist in the root cgroup.	No	freezer.state in the freezer subsystem
cpu.stat	Queries statistics about CPU utilization.	No	N/A
io.pressure	Queries Pressure Stall Information (PSI) for I/O, memory, and CPU resources. The information can be polled. For more information, see psi.rst and Enable the PSI feature for cgroup v1.	No	io.pressure, memory.pressure, and cpu.pressure interfaces in the cpuacct subsystem, which implement the PSI feature
memory.pressure		No
cpu.pressure		No
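
The cgroup.events and *.pressure interfaces in the preceding table return small text records that are easy to parse. The following sketch shows one way to do so. It assumes the upstream formats: cgroup.events contains one "key value" pair per line (populated and frozen), and a pressure file contains lines that start with some or full, followed by avg10, avg60, avg300, and total fields. The function names are illustrative, and the sample strings are made up for demonstration.

```python
def parse_flat_keyed(text):
    """Parse a flat-keyed file such as cgroup.events into a dict of integers."""
    result = {}
    for line in text.splitlines():
        if line.strip():
            key, value = line.split()
            result[key] = int(value)
    return result


def parse_pressure(text):
    """Parse PSI output from io.pressure, memory.pressure, or cpu.pressure."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, stats = fields[0], {}  # kind is "some" or "full"
        for field in fields[1:]:
            name, _, value = field.partition("=")
            # total is a counter in microseconds, the avg* fields are percentages.
            stats[name] = int(value) if name == "total" else float(value)
        result[kind] = stats
    return result


# Sample contents (invented values) in the assumed formats.
events = parse_flat_keyed("populated 1\nfrozen 0\n")
psi = parse_pressure(
    "some avg10=0.00 avg60=1.50 avg300=0.30 total=12345\n"
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=0\n"
)
```

In practice, the input text would come from reading the interface file of the cgroup, for example /sys/fs/cgroup/<cgroup>/cgroup.events.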

Subsystem interface differences

CPU

cgroup v1 interfaces

Interface name	Purpose	In-house interface	Corresponding cgroup v2 interface
cpu.shares	Controls the weight, based on which CPU time slices are allocated to tasks in a cgroup. Default value: 1024.	No	cpu.weight and cpu.weight.nice, which use a different unit
cpu.idle	Controls whether to use an idle scheduling policy for the current cgroup. An idle scheduling policy allocates time slices based on the smallest CPU share and no longer guarantees a minimum runtime, which allows CPU resources to be yielded to non-idle tasks more easily. Note If the cpu.idle value is set to 1, the cpu.shares interface becomes unwritable and its value is fixed at 3.	No	cpu.idle
cpu.priority	Specifies the fine-grained preemption priority. Preemption is evaluated on clock interrupts and task wakeups, and the preemption granularity varies with the difference between priorities so that high-priority tasks can preempt low-priority tasks.	Yes	cpu.priority
cpu.cfs_quota_us	The CPU runtime controlled by using Completely Fair Scheduler (CFS). cpu.cfs_quota_us specifies the maximum CPU runtime of tasks in a cgroup within a period defined by the cpu.cfs_period_us interface.	No	cpu.max
cpu.cfs_period_us		No	cpu.max
cpu.cfs_burst_us	The duration in which tasks can burst within a period defined by the cpu.cfs_period_us interface. For more information, see Enable the CPU burst feature for cgroup v1.	No	cpu.max.burst
cpu.cfs_init_buffer_us	The duration in which tasks in a cgroup can burst when the tasks are initiated.	Yes	cpu.max.init_buffer
cpu.stat	Queries statistics about CPU runtime, such as the number of cpu.cfs_period_us periods and the number of times CPU resources used by tasks were throttled.	No	cpu.stat
cpu.rt_runtime_us	Controls the real-time CPU runtime. cpu.rt_runtime_us specifies the maximum runtime of real-time tasks in a cgroup within the period defined by the cpu.rt_period_us interface.	No	N/A
cpu.rt_period_us		No	N/A
cpu.bvt_warp_ns	Controls the group identity attribute, which changes the identities of cgroups so that offline tasks can be distinguished from online tasks and online tasks receive better CPU quality of service (QoS) guarantees. For more information, see Group identity feature.	Yes	cpu.bvt_warp_ns
cpu.identity		Yes	cpu.identity
cpu.ht_stable	Specifies whether to generate simultaneous multithreading (SMT) peer noise to maintain consistent SMT computing power.	Yes	N/A
cpu.ht_ratio	Controls whether to use quotas to provide extra computing power when the SMT peer is idle to maintain consistent SMT computing power.	Yes	cpu.ht_ratio

cgroup v2 interfaces

Note

cgroup v2 no longer supports the cpuacct subsystem. Specific interfaces or related features of the cpuacct subsystem are implemented by the CPU subsystem in cgroup v2.

Interface name	Purpose	In-house interface	Corresponding cgroup v1 interface
cpu.weight	Controls the weight, based on which CPU time slices are allocated to tasks in a cgroup. Default value: 100.	No	cpu.shares, which uses a different unit
cpu.weight.nice	Controls the same weight as cpu.weight, expressed as a nice value. Default value: 0.	No	cpu.shares, which uses a different unit
cpu.idle	Controls whether to use an idle scheduling policy for the current cgroup. An idle scheduling policy allocates time slices based on the smallest CPU share and no longer guarantees a minimum runtime, which allows CPU resources to be yielded to non-idle tasks more easily. Note When the cpu.idle value is 1, the cpu.weight and cpu.weight.nice interfaces become unwritable, and a minimum weight of 0.3 takes effect. In this case, the cpu.weight value is rounded to 0.	No	cpu.idle
cpu.priority	Specifies the fine-grained preemption priority. Preemption is evaluated on clock interrupts and task wakeups, and the preemption granularity varies with the difference between priorities so that high-priority tasks can preempt low-priority tasks.	Yes	cpu.priority
cpu.max	The CPU runtime controlled by using CFS. The interface takes two values, the quota and the period. The quota specifies the maximum CPU runtime of tasks in a cgroup within one period. The two values correspond to cpu.cfs_quota_us and cpu.cfs_period_us in cgroup v1.	No	cpu.cfs_quota_us, cpu.cfs_period_us
cpu.max.burst	The duration in which tasks can burst within a period defined by the cpu.max interface.	No	cpu.cfs_burst_us
cpu.max.init_buffer	The duration in which tasks in a cgroup can burst when the tasks are initiated.	Yes	cpu.cfs_init_buffer_us
cpu.bvt_warp_ns	Controls the group identity attribute, which changes the identities of cgroups so that offline tasks can be distinguished from online tasks and online tasks receive better CPU QoS guarantees.	Yes	cpu.bvt_warp_ns
cpu.identity		Yes	cpu.identity
cpu.sched_cfs_statistics	Queries statistics about CFS, such as the runtime of a cgroup and the waiting time of cgroups at the same level or different levels. Note The kernel.sched_schedstats option must be enabled.	Yes	cpuacct.sched_cfs_statistics
cpu.wait_latency	Queries the latency of tasks waiting in the queue. Note The kernel.sched_schedstats and /proc/cpusli/sched_lat_enabled options must be enabled.	Yes	cpuacct.wait_latency
cpu.cgroup_wait_latency	Queries the latency of cgroups waiting in the queue. The wait_latency interface counts the latency of task scheduling entities (SEs), and the cgroup_wait_latency interface counts the latency of group SEs. Note The kernel.sched_schedstats and /proc/cpusli/sched_lat_enabled options must be enabled.	Yes	cpuacct.cgroup_wait_latency
cpu.block_latency	Queries the latency of tasks blocked due to non-I/O causes. Note The kernel.sched_schedstats and /proc/cpusli/sched_lat_enabled options must be enabled.	Yes	cpuacct.block_latency
cpu.ioblock_latency	Queries the latency of tasks blocked due to I/O operations. Note The kernel.sched_schedstats and /proc/cpusli/sched_lat_enabled options must be enabled.	Yes	cpuacct.ioblock_latency
cpu.ht_ratio	Controls whether to use quotas to provide extra computing power when the SMT peer is idle to maintain consistent SMT computing power. Note This interface takes effect only if the core scheduling feature is enabled.	Yes	cpu.ht_ratio
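
The CPU tables state that cpu.shares and cpu.weight use different units and that cpu.max replaces the cpu.cfs_quota_us and cpu.cfs_period_us pair. The following sketch illustrates how existing cgroup v1 values could be translated. It rests on two assumptions that are not part of this topic: cpu.shares accepts values from 2 to 262144 and cpu.weight accepts values from 1 to 10000, and the linear mapping between the two ranges is a convention used by some container runtimes rather than a rule defined by the kernel. The function names are illustrative.

```python
def shares_to_weight(shares):
    """Map cpu.shares [2, 262144] linearly onto cpu.weight [1, 10000] (assumed convention)."""
    shares = min(max(shares, 2), 262144)  # clamp to the assumed valid range
    return 1 + ((shares - 2) * 9999) // 262142


def cfs_to_cpu_max(quota_us, period_us):
    """Build a cpu.max value from cgroup v1 quota and period values.

    A negative quota means "no limit" in cgroup v1 and is written as "max" in cpu.max.
    """
    quota = "max" if quota_us < 0 else str(quota_us)
    return f"{quota} {period_us}"


half_cpu = cfs_to_cpu_max(50000, 100000)   # half of one CPU per 100 ms period
unlimited = cfs_to_cpu_max(-1, 100000)     # no CPU runtime limit
default_shares_as_weight = shares_to_weight(1024)
```

Under this linear mapping, the cgroup v1 default of 1024 does not land on the cgroup v2 default of 100, so weights are best compared only among cgroups that were converted in the same way.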

cpuset

cgroup v1 interfaces

Interface name	Purpose	In-house interface	Corresponding cgroup v2 interface
cpuset.cpus	Controls the CPUs on which tasks can run. Note Tasks cannot be attached to a cgroup when this interface is empty.	No	cpuset.cpus
cpuset.mems	Controls the non-uniform memory access (NUMA) nodes that can be allocated to tasks in a cgroup. Note Tasks cannot be attached to a cgroup when this interface is empty.	No	cpuset.mems
cpuset.effective_cpus	Queries the effective CPUs on which tasks are running. The value of this interface is affected by CPU hotplug events.	No	cpuset.cpus.effective
cpuset.effective_mems	Queries the effective NUMA nodes that are allocated to the running tasks. The value of this interface is affected by memory nodes hotplug events.	No	cpuset.mems.effective
cpuset.cpu_exclusive	Controls which CPUs are exclusively used by a cgroup and cannot be used by other cpusets at the same level in a cgroup.	No	cpuset.cpus.partition, which supports similar functionality
cpuset.mem_exclusive	Controls which NUMA nodes are exclusively used by a cgroup and cannot be used by other cpusets at the same level in a cgroup.	No	N/A
cpuset.mem_hardwall	A value of 1 indicates that memory only from the memory nodes that are attached to the cpuset can be allocated to tasks.	No	N/A
cpuset.sched_load_balance	Controls whether CPUs are load-balanced within the cpuset. By default, the feature is enabled.	No	N/A
cpuset.sched_relax_domain_level	Controls the range in which to search for CPUs when a scheduler migrates tasks to load-balance CPUs for the tasks. Default value: -1. -1: enforces the default system policy. 0: does not perform a search. 1: searches for hyperthreads within the same core. 2: searches for cores in the same package. 3: searches for CPUs on the same node. 4: searches for CPUs on nodes in the same chunk. 5: searches for CPUs in the entire system.	No	N/A
cpuset.memory_migrate	A non-zero value indicates that if a task is allocated a memory page in a cpuset and migrated to another cpuset, the memory page can also be migrated to the new cpuset.	No	N/A
cpuset.memory_pressure	Calculates the memory paging pressure of the current cpuset.	No	N/A
cpuset.memory_spread_page	A value of 1 indicates that the kernel evenly allocates the page cache to the memory nodes of the cpuset.	No	N/A
cpuset.memory_spread_slab	A value of 1 indicates that the kernel evenly allocates the slab caches to the memory nodes of the cpuset.	No	N/A
cpuset.memory_pressure_enabled	A value of 1 indicates that memory pressure statistics collection is enabled for the cpuset.	No	N/A

cgroup v2 interfaces

Interface name	Purpose	In-house interface	Corresponding cgroup v1 interface
cpuset.cpus	Controls the CPUs on which tasks can run. Note When the value of this interface is empty, the CPUs of the parent cpuset are used.	No	cpuset.cpus
cpuset.mems	Controls the NUMA nodes that can be allocated to tasks in a cgroup. Note When the value of this interface is empty, the NUMA nodes of the parent cpuset are used.	No	cpuset.mems
cpuset.cpus.effective	Queries the effective CPUs on which tasks are running. The value of this interface is affected by CPU hotplug events.	No	cpuset.effective_cpus
cpuset.mems.effective	Queries the effective NUMA nodes that are allocated to the running tasks. The value of this interface is affected by memory nodes hotplug events.	No	cpuset.effective_mems
cpuset.cpus.partition	Controls whether CPUs of a cpuset are exclusively used. If root is written into the interface, CPUs of a cpuset are exclusively used.	No	cpuset.cpu_exclusive, which implements similar functionality
.__DEBUG__.cpuset.cpus.subpartitions	Queries which CPUs are used exclusively when root is written into the cpuset.cpus.partition interface. Note This interface is available only if the cgroup_debug option is specified on the kernel command line.	No	N/A
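
The cpuset.cpus, cpuset.cpus.effective, cpuset.mems, and cpuset.mems.effective interfaces use a comma-separated list of IDs and ID ranges, such as 0-3,8. The following sketch expands such a list into individual IDs, which can be useful when the configured and effective values are compared. The function name is illustrative, and the sketch assumes well-formed input.

```python
def parse_cpu_list(text):
    """Expand a cpuset list such as "0-3,8,10-11" into a sorted list of IDs."""
    ids = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty interface file yields an empty list
        if "-" in part:
            first, last = part.split("-")
            ids.update(range(int(first), int(last) + 1))
        else:
            ids.add(int(part))
    return sorted(ids)


configured = parse_cpu_list("0-3,8,10-11\n")
inherited = parse_cpu_list("")  # empty value: the parent cpuset applies in cgroup v2
```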

blkio

cgroup v1 interfaces

Interface name	Purpose	In-house interface	Corresponding cgroup v2 interface
blkio.throttle.read_bps_device	Specifies the maximum number of bytes per second that a cgroup can read from a device. Example: echo "<major>:<minor> <bps>" > /sys/fs/cgroup/blkio/<cgroup>/blkio.throttle.read_bps_device	No	io.max
blkio.throttle.write_bps_device	Specifies the maximum number of bytes per second that a cgroup can write to a device. Example: echo "<major>:<minor> <bps>" > /sys/fs/cgroup/blkio/<cgroup>/blkio.throttle.write_bps_device	No	io.max
blkio.throttle.read_iops_device	Specifies the maximum number of read operations per second that a cgroup can perform on a device. Example: echo "<major>:<minor> <iops>" > /sys/fs/cgroup/blkio/<cgroup>/blkio.throttle.read_iops_device	No	io.max
blkio.throttle.write_iops_device	Specifies the maximum number of write operations per second that a cgroup can perform on a device. Example: echo "<major>:<minor> <iops>" > /sys/fs/cgroup/blkio/<cgroup>/blkio.throttle.write_iops_device	No	io.max
blkio.throttle.io_service_bytes	Queries bandwidth statistics. This interface collects the read, write, sync, async, discard, and total bandwidth statistics of all devices. Unit: bytes.	No	io.stat
blkio.throttle.io_service_bytes_recursive	The recursive version of the blkio.throttle.io_service_bytes interface. Statistics collected by using the blkio.throttle.io_service_bytes_recursive interface include data of descendant cgroups.	No	N/A
blkio.throttle.io_serviced	Queries IOPS statistics. This interface collects the read, write, sync, async, discard, and total IOPS statistics of all devices.	No	io.stat
blkio.throttle.io_serviced_recursive	The recursive version of the blkio.throttle.io_serviced interface. Statistics collected by using the blkio.throttle.io_serviced_recursive interface include data of descendant cgroups.	No	N/A
blkio.throttle.io_service_time	Queries the duration between request dispatch and request completion for I/O operations, which is used to measure the average I/O latency. For more information, see Enhance the monitoring of block I/O throttling.	Yes	io.extstat
blkio.throttle.io_wait_time	Queries the duration when I/O operations wait in scheduler queues, which is used to measure the average I/O latency. For more information, see Enhance the monitoring of block I/O throttling.	Yes	io.extstat
blkio.throttle.io_completed	Queries the number of completed I/O operations, which is used to measure the average I/O latency. For more information, see Enhance the monitoring of block I/O throttling.	Yes	io.extstat
blkio.throttle.total_bytes_queued	Queries the number of I/O bytes that were throttled, which is used to analyze whether I/O latency is related to throttling. For more information, see Enhance the monitoring of block I/O throttling.	Yes	io.extstat
blkio.throttle.total_io_queued	Queries the number of I/O operations that were throttled, which is used to analyze whether I/O latency is related to throttling. For more information, see Enhance the monitoring of block I/O throttling.	Yes	io.extstat
blkio.cost.model	Specifies the blk-iocost cost model. The control mode (ctrl) can be set to auto or user. This interface exists only in the root cgroup. Example: echo "<major>:<minor> ctrl=user model=linear rbps=<rbps> rseqiops=<rseqiops> rrandiops=<rrandiops> wbps=<wbps> wseqiops=<wseqiops> wrandiops=<wrandiops>" > /sys/fs/cgroup/blkio/blkio.cost.model For more information, see Configure the blk-iocost weight-based throttling feature.	Yes	io.cost.model
blkio.cost.qos	Controls the blk-iocost feature and configures a QoS policy to check for disk congestion. This interface exists only in the root cgroup. Example: echo "<major>:<minor> enable=1 ctrl=user rpct= rlat=5000 wpct=95.00 wlat=5000 min=50.00 max=150.00" > /sys/fs/cgroup/blkio/blkio.cost.qos For more information, see Configure blk-iocost weight throttling.	Yes	io.cost.qos
blkio.cost.weight	Specifies the cgroup weight. This interface exists only in non-root cgroups and can be configured in the following modes: weight: sets the same weight for all devices. major:minor + weight: sets the weight of a specific device. For more information, see Configure the blk-iocost weight-based throttling feature.	Yes	io.cost.weight
blkio.cost.stat	Queries the blk-iocost statistics. The interface exists only in non-root cgroups.	Yes	N/A

cgroup v2 interfaces

Interface name	Purpose	In-house interface	Corresponding cgroup v1 interface
io.max	The throttling interface that specifies the read and write throttling rates in byte/s and IOPS. Example: echo "<major>:<minor> rbps=<bps> wbps=<bps> riops=<iops> wiops=<iops>" > /sys/fs/cgroup/<cgroup>/io.max	No	blkio.throttle.read_bps_device blkio.throttle.read_iops_device blkio.throttle.write_bps_device blkio.throttle.write_iops_device
io.stat	Queries I/O operation statistics, which include the number of bytes and the number of I/O operations for read, write, and discard operations.	No	blkio.throttle.io_service_bytes blkio.throttle.io_serviced
io.extstat	Queries extended I/O statistics, including the wait time, service time, number of completed I/O operations, and throttling rates in byte/s and IOPS.	No	blkio.throttle.io_service_time blkio.throttle.io_wait_time blkio.throttle.io_completed blkio.throttle.total_bytes_queued blkio.throttle.total_io_queued
io.cost.model	Specifies the blk-iocost cost model. The control mode (ctrl) can be set to auto or user. This interface exists only in the root cgroup. Example: echo "<major>:<minor> ctrl=user model=linear rbps=<rbps> rseqiops=<rseqiops> rrandiops=<rrandiops> wbps=<wbps> wseqiops=<wseqiops> wrandiops=<wrandiops>" > /sys/fs/cgroup/io.cost.model For more information, see Configure blk-iocost weight throttling.	No	blkio.cost.model
io.cost.qos	Controls the blk-iocost feature and configures a QoS policy to check for disk congestion. This interface exists only in the root cgroup. Example: echo "<major>:<minor> enable=1 ctrl=user rpct= rlat=5000 wpct=95.00 wlat=5000 min=50.00 max=150.00" > /sys/fs/cgroup/io.cost.qos For more information, see Configure blk-iocost weight throttling.	No	blkio.cost.qos
io.cost.weight	Specifies the cgroup weight. This interface exists only in non-root cgroups and can be configured in the following modes: weight: sets the same weight for all devices. major:minor + weight: sets the weight of a specific device. For more information, see Configure blk-iocost weight throttling.	No	blkio.cost.weight
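
In cgroup v2, the io.max interface merges the four blkio.throttle.*_device interfaces of cgroup v1 into one line per device. The following sketch builds such a line from cgroup v1 style values by using the rbps, wbps, riops, and wiops keys shown in the io.max example above. The function name and parameters are illustrative. The sketch also assumes that a value of 0 removes a limit in cgroup v1 and that the equivalent value in io.max is max. Verify this assumption against your kernel before you rely on it.

```python
def v1_throttle_to_io_max(device, read_bps=None, write_bps=None,
                          read_iops=None, write_iops=None):
    """Build one io.max line from cgroup v1 throttle values.

    device is a "<major>:<minor>" string. A value of None leaves the key unset,
    and a value of 0 is translated to "max" (assumption: 0 means "no limit" in v1).
    """
    keys = (
        ("rbps", read_bps),
        ("wbps", write_bps),
        ("riops", read_iops),
        ("wiops", write_iops),
    )
    fields = [f"{key}={'max' if value == 0 else value}"
              for key, value in keys if value is not None]
    return " ".join([device] + fields)


line = v1_throttle_to_io_max("8:16", read_bps=1048576, write_iops=120)
```

The resulting string can then be written to /sys/fs/cgroup/<cgroup>/io.max in the same way as the echo example in the table.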

memory

cgroup v1 interfaces

Interface name	Purpose	In-house interface	Corresponding cgroup v2 interface
memory.usage_in_bytes	Queries the current memory usage.	No	N/A
memory.max_usage_in_bytes	Queries the maximum memory usage.	No	N/A
memory.limit_in_bytes	Specifies the hard upper limit on memory usage.	No	N/A
memory.soft_limit_in_bytes	Specifies the soft lower limit on memory usage.	No	N/A
memory.failcnt	Queries the number of times the memory usage reached the upper limit.	No	N/A
memory.mglru_batch_size	Specifies the size of memory that is proactively reclaimed based on the Multi-Generational Least Recently Used (MGLRU) framework. An attempt is made to release CPUs between batches of memory reclamation.	Yes	N/A
memory.mglru_reclaim_kbytes	Specifies the size of memory that is proactively reclaimed based on the MGLRU framework.	Yes	N/A
memory.wmark_ratio	Controls the memcg backend asynchronous reclaim feature and sets the memcg memory watermark that triggers asynchronous reclamation. Unit: percent of the memcg memory upper limit. Valid values: 0 to 100. The default value is 0, which indicates that the memcg backend asynchronous reclaim feature is disabled. When the value is not 0, the memcg backend asynchronous reclaim feature is enabled. You can set the corresponding watermark. For more information, see Memcg backend asynchronous reclaim.	Yes	memory.wmark_ratio
memory.wmark_high	A read-only interface. When the memcg memory usage exceeds the value of this interface, backend asynchronous reclamation is started. The value of this interface is calculated by using the following formula: memory.wmark_high = memory.limit_in_bytes × memory.wmark_ratio/100. When the memcg backend asynchronous reclaim feature is disabled, memory.wmark_high defaults to a large value to prevent backend asynchronous reclamation from being triggered. This interface file is not stored in the memcg root directory. For more information, see Memcg backend asynchronous reclaim.	Yes
memory.wmark_low	A read-only interface. When the memcg memory usage falls below the value of this interface, backend asynchronous reclamation ends. The value of this interface is calculated by using the following formula: memory.wmark_low = memory.wmark_high - memory.limit_in_bytes × memory.wmark_scale_factor/10000. This interface file is not stored in the memcg root directory. For more information, see Memcg backend asynchronous reclaim.	Yes
memory.wmark_scale_factor	Specifies the interval between the memory.wmark_high value and the memory.wmark_low value. Unit: 0.01 percent of the memcg memory upper limit. Valid values: 1 to 1000. This interface inherits the value of its parent group when the interface is created. The inherited value is 50, which indicates 0.50% of the memcg memory upper limit. This is also the default value. This interface file is not stored in the memcg root directory. For more information, see Memcg backend asynchronous reclaim.	Yes
memory.wmark_min_adj	The factor that is used in the memcg global minimum watermark rating feature. The value of this interface indicates an adjustment in percentage over the global minimum watermark. Valid values: -25 to 50. This interface inherits a value of 0 from the parent cgroup when the interface is created. Therefore, the default value is 0. A negative value in the value range is an adjustment in percentage over the [0, WMARK_MIN] range, where WMARK_MIN is the value of global wmark_min. For example, if memory.wmark_min_adj is -25, WMARK_MIN of a memcg is calculated by using the following formula: memcg WMARK_MIN = WMARK_MIN + (WMARK_MIN - 0) × -25%. A positive value in the range is an adjustment in percentage over the [WMARK_MIN, WMARK_LOW] range, where WMARK_MIN is the value of global wmark_min and WMARK_LOW is the value of global wmark_low. When the offset global minimum watermark is triggered, throttling is performed, and the throttling time is linearly proportional to the excess memory usage. Valid values of the throttling time: 1 to 1000. Unit: milliseconds. For more information, see Memcg global minimum watermark rating.	Yes
memory.force_empty	Specifies whether to forcefully reclaim memory pages.	No	N/A
memory.use_hierarchy	Specifies whether to collect hierarchical statistics.	No	N/A
memory.swappiness	Specifies the swappiness parameter of vmscan, which controls the tendency of the kernel to use the swap partition.	No	N/A
memory.priority	Specifies the memcg priority. This interface provides 13 memcg out-of-memory (OOM) priorities to rank workloads. Valid values: 0 to 12. A larger value indicates a higher priority. The priority of a parent cgroup is not inherited by its descendant cgroups. Default value: 0. This interface is used to implement memcg QoS. Priorities are not global. They are compared only among sibling cgroups that share the same parent cgroup. Sibling memcgs with the same priority are sorted by memory usage, and an OOM error is triggered on the memcg that consumes the largest amount of memory.	Yes	memory.priority
memory.move_charge_at_immigrate	Specifies whether charges of a task are moved along the task when the task is migrated between cgroups, which is a statistical control policy.	No	N/A
memory.oom_control	Specifies whether to trigger the OOM killer to terminate tasks when an OOM error occurs and generate notifications about OOM status.	No	N/A
memory.oom.group	Controls the OOM group feature that can terminate all tasks in a memcg if an OOM error occurs.	Yes	memory.oom.group
memory.pressure_level	Specifies memory pressure notifications.	No	N/A
memory.kmem.limit_in_bytes	Specifies the hard limit on the memory usage of the kernel.	No	N/A
memory.kmem.usage_in_bytes	Queries the memory usage of the kernel.	No	N/A
memory.kmem.failcnt	Queries the number of times the memory usage of the kernel reached the upper limit.	No	N/A
memory.kmem.max_usage_in_bytes	Queries the maximum memory usage of the kernel.	No	N/A
memory.kmem.slabinfo	Queries the slab memory usage of the kernel.	No	N/A
memory.kmem.tcp.limit_in_bytes	Specifies the hard limit on the TCP memory usage of the kernel.	No	N/A
memory.kmem.tcp.usage_in_bytes	Queries the TCP memory usage of the kernel.	No	N/A
memory.kmem.tcp.failcnt	Queries the number of times the TCP memory usage of the kernel reached the upper limit.	No	N/A
memory.kmem.tcp.max_usage_in_bytes	Queries the maximum TCP memory usage of the kernel.	No	N/A
memory.memsw.usage_in_bytes	Queries the memory usage and swap memory usage.	No	N/A
memory.memsw.max_usage_in_bytes	Queries the maximum usage of memory and swap memory.	No	N/A
memory.memsw.limit_in_bytes	Specifies the upper limit on the total usage of memory and swap memory used by tasks in the cgroup.	No	N/A
memory.memsw.failcnt	Queries the number of times the total usage of memory and swap memory reached the upper limit.	No	N/A
memory.swap.high	Specifies the upper limit on available swap memory usage in a cgroup.	Yes	memory.swap.high
memory.swap.events	Queries the events that occur when the swap memory usage reaches the upper limit.	Yes	memory.swap.events
memory.min	Specifies a minimum amount of memory that a cgroup must retain, which is a hard guarantee of memory. For more information, see Memcg QoS feature of the cgroup v1 interface.	Yes	memory.min
memory.low	Specifies the lower limit of memory that a cgroup can retain, which is a soft guarantee of memory. For more information, see Memcg QoS feature of the cgroup v1 interface.	Yes	memory.low
memory.high	Specifies the throttle limit of the memory usage. For more information, see Memcg QoS feature of the cgroup v1 interface.	Yes	memory.high
memory.allow_duptext	When the /sys/kernel/mm/duptext/enabled parameter is configured to globally enable the code duptext feature, the interface is used to control whether to enable the code duptext feature for tasks in a specific memcg. Valid values: 0 and 1. Default value: 0. 1: enables the code duptext feature for tasks in a specific memcg. 0: disables the code duptext feature for tasks in a specific memcg. For more information, see Code duptext feature.	Yes	memory.allow_duptext
memory.allow_duptext_refresh	Specifies whether the code duptext feature is immediately started when a binary file is generated or downloaded. The code duptext feature does not take effect in case of PageDirty or PageWriteback. The interface uses the asynchronous task mode to refresh tasks when the code duptext feature does not take effect in scenarios of PageDirty or PageWriteback.	Yes	memory.allow_duptext_refresh
memory.duptext_nodes	Limits the duptext memory allocation nodes.	Yes	memory.duptext_nodes
memory.allow_text_unevictable	Specifies whether the code snippet of the memcg is locked.	Yes	memory.allow_text_unevictable
memory.text_unevictable_percent	Specifies the ratio of the amount of memory used by locked memcg code snippet to the total amount of memory used by memcg code.	Yes	memory.text_unevictable_percent
memory.thp_reclaim	Controls the Transparent Huge Pages (THP) reclaim feature. Valid values: reclaim: enables the THP reclaim feature. swap: is reserved for future use. disable: disables the THP reclaim feature. Default value: disable. For more information, see THP reclaim.	Yes	memory.thp_reclaim
memory.thp_reclaim_stat	Queries the status of the THP reclaim feature. Parameters of this interface: queue_length: the number of THPs in the queue of each node. If the THP reclaim feature is enabled, THPs are added to a reclaim queue. split_hugepage: the total number of THPs that are split by the THP reclaim feature for each node. reclaim_subpage: the total number of zero subpages that are reclaimed by the THP reclaim feature for each node. The values of the preceding parameters are listed in ascending order by NUMA node ID, such as node0 and node1, from left to right. For more information, see THP reclaim.	Yes	memory.thp_reclaim_stat
memory.thp_reclaim_ctrl	Specifies how the THP reclaim feature is triggered. Parameters of this interface: threshold: the maximum number of zero subpages in a THP. If the number of zero subpages in a THP exceeds the threshold value, the THP reclaim feature is triggered. Default value: 16. reclaim: triggers the THP reclaim feature. For more information, see THP reclaim.	Yes	memory.thp_reclaim_ctrl
memory.thp_control	Controls the memcg THP feature. This interface can be used to prohibit the application of anon, shmem, and file THPs. For example, an offline memcg is not allowed to use THPs. This helps reduce THP contention and memory waste, even though memory fragmentation cannot be prevented.	Yes	memory.thp_control
memory.reclaim_caches	Specifies whether the kernel proactively reclaims the cache in memcgs. Example: `echo 100M > memory.reclaim_caches`.	Yes	memory.reclaim_caches
memory.pgtable_bind	Specifies whether to forcefully apply for page table memory on the current node.	Yes	memory.pgtable_bind
memory.pgtable_misplaced	Queries statistics about page memory in page tables when page memory is allocated across nodes.	Yes	memory.pgtable_misplaced
memory.oom_offline	In the Quick OOM feature, you can use this interface to mark the memcg of an offline task.	Yes	memory.oom_offline
memory.async_fork	Controls the Async-fork feature, formerly known as fast convergent merging (FCM), for memcgs.	Yes	memory.async_fork
memory.direct_compact_latency	Specifies the latency in direct memory compaction of the memsli feature.	Yes	memory.direct_compact_latency
memory.direct_reclaim_global_latency	Specifies the latency in direct global memory reclamation of the memsli feature.	Yes	memory.direct_reclaim_global_latency
memory.direct_reclaim_memcg_latency	Specifies the latency in direct memcg memory reclamation of the memsli feature.	Yes	memory.direct_reclaim_memcg_latency
memory.direct_swapin_latency	Specifies the latency in direct memory swap-in of the memsli feature.	Yes	memory.direct_swapin_latency
memory.direct_swapout_global_latency	Specifies the latency in direct global memory swap-out of the memsli feature.	Yes	memory.direct_swapout_global_latency
memory.direct_swapout_memcg_latency	Specifies the latency in direct memcg memory swap-out of the memsli feature.	Yes	memory.direct_swapout_memcg_latency
memory.exstat	Queries statistics about extended memory and extra memory. Statistics about the following in-house features are collected: wmark_min_throttled_ms: the throttling time elapsed since the offset global minimum watermark was exceeded. wmark_reclaim_work_ms: the duration in which the kernel attempts to reclaim memory from a cgroup. unevictable_text_size_kb: the size of a code snippet to be locked. pagecache_limit_reclaimed_kb: the limit of a page cache. For more information, see Memcg Exstat feature.	Yes	memory.exstat
memory.idle_page_stats	Queries statistics about kidled memory usage of a memcg and the hierarchical information of the cgroup.	Yes	memory.idle_page_stats
memory.idle_page_stats.local	Queries statistics about kidled memory usage of a memcg.	Yes	memory.idle_page_stats.local
memory.numa_stat	Queries NUMA statistics for anonymous, file, and locked memory.	No	memory.numa_stat
memory.pagecache_limit.enable	Controls the Page Cache Limit feature. For more information, see Page Cache Limit feature.	Yes	memory.pagecache_limit.enable
memory.pagecache_limit.size	Specifies the size of the limited page cache.	Yes	memory.pagecache_limit.size
memory.pagecache_limit.sync	Specifies the mode of the Page Cache Limit feature, which is synchronous or asynchronous.	Yes	memory.pagecache_limit.sync
memory.reap_background	Specifies whether the zombie memcg reapers reclaim the memory of memcgs asynchronously in the background.	Yes	memory.reap_background
memory.stat	Queries memory statistics.	No	memory.stat
memory.use_priority_oom	Controls the memcg OOM priority policy feature. For more information, see Memcg OOM priority policy.	Yes	memory.use_priority_oom
memory.use_priority_swap	Specifies whether the memory is swapped based on the priorities of cgroups. For more information, see Memcg OOM priority policy.	Yes	memory.use_priority_swap

cgroup v2 interfaces

Interface name	Purpose	In-house interface	Corresponding cgroup v1 interface
memory.current	Queries the memory usage.	No	N/A
memory.min	Specifies a minimum amount of memory that a cgroup must retain, which is a hard guarantee of memory. For more information, see Memcg QoS feature of the cgroup v1 interface.	No	memory.min
memory.low	Specifies the lower limit of memory that a cgroup can retain, which is a soft guarantee of memory. For more information, see Memcg QoS feature of the cgroup v1 interface.	No	memory.low
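memory.high	Specifies the throttle limit on memory usage. If the memory usage of a cgroup exceeds this limit, the processes in the cgroup are throttled and put under heavy reclaim pressure. For more information, see Memcg QoS feature of the cgroup v1 interface.	No	memory.high
memory.max	Specifies the hard limit on memory usage. If the memory usage of a cgroup reaches this limit and cannot be reduced, the OOM killer is invoked in the cgroup.	No	memory.max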
memory.swap.current	Queries swap memory in use.	No	N/A
memory.swap.high	Specifies the throttle limit on swap usage in a cgroup.	No	N/A
memory.swap.max	Specifies a hard limit on swap memory.	No	N/A
memory.swap.events	Queries the events that occur when the swap memory usage reaches the upper limit.	No	N/A
memory.oom.group	Specifies whether the OOM group feature is enabled, which can kill all tasks in a memcg if an OOM error occurs.	No	memory.oom.group
memory.wmark_ratio	Controls the memcg backend asynchronous reclaim feature and sets the memcg memory watermark that triggers asynchronous reclamation. Unit: percent of the memcg memory upper limit. Valid values: 0 to 100. The default value is 0, which indicates that the memcg backend asynchronous reclaim feature is disabled. When the value is not 0, the memcg backend asynchronous reclaim feature is enabled. You can set the corresponding watermark. For more information, see Memcg backend asynchronous reclaim.	Yes	memory.wmark_ratio
memory.wmark_high	A read-only interface. When the memcg memory usage exceeds the value of this interface, backend asynchronous reclamation is started. The value of this interface is calculated by using the following formula: memory.wmark_high = memory.limit_in_bytes × memory.wmark_ratio/100. When the memcg backend asynchronous reclaim feature is disabled, memory.wmark_high defaults to a large value to prevent backend asynchronous reclamation from being triggered. This interface file is not stored in the memcg root directory. For more information, see Memcg backend asynchronous reclaim.	Yes	memory.wmark_high
memory.wmark_low	A read-only interface. When the memcg memory usage falls below the value of this interface, backend asynchronous reclamation ends. The value of this interface is calculated by using the following formula: memory.wmark_low = memory.wmark_high - memory.limit_in_bytes × memory.wmark_scale_factor/10000. This interface file is not stored in the memcg root directory. For more information, see Memcg backend asynchronous reclaim.	Yes	memory.wmark_low
memory.wmark_scale_factor	Specifies the interval between the memory.wmark_high value and the memory.wmark_low value. Unit: 0.01 percent of the memcg memory upper limit. Valid values: 1 to 1000. This interface inherits the value of its parent group when the interface is created. The inherited value is 50, which indicates 0.50% of the memcg memory upper limit. This is also the default value. This interface file is not stored in the memcg root directory. For more information, see Memcg backend asynchronous reclaim.	Yes	memory.wmark_scale_factor
memory.wmark_min_adj	The factor that is used in the memcg global minimum watermark rating feature. The value of this interface indicates an adjustment in percentage over the global minimum watermark. Valid values: -25 to 50. This interface inherits a value of 0 from the parent cgroup when the interface is created. Therefore, the default value is 0. A negative value in the value range is an adjustment in percentage over the [0, WMARK_MIN] range, where WMARK_MIN is the value of global wmark_min. For example, if memory.wmark_min_adj is -25, WMARK_MIN of a memcg is calculated by using the following formula: memcg WMARK_MIN = WMARK_MIN + (WMARK_MIN - 0) × -25%. A positive value in the range is an adjustment in percentage over the `[WMARK_MIN, WMARK_LOW]` range. `WMARK_MIN` is the value of global wmark_min, and `WMARK_LOW` is the value of global wmark_low. When the offset global minimum watermark is triggered, throttling is performed, and the throttling time is linearly proportional to the excess memory usage. Valid values of the throttling time: 1 to 1000. Unit: milliseconds. For more information, see Memcg global minimum watermark rating.	Yes	memory.wmark_min_adj
memory.priority	Specifies the memcg priority. This interface provides 13 memcg OOM priorities that rank workloads by importance. Valid values: 0 to 12. A larger value indicates a higher priority. Default value: 0. The priority of a parent cgroup is not inherited by its descendant cgroups. This interface is used to implement memcg QoS. Priorities are not global: they are compared only among sibling cgroups under the same parent cgroup. Sibling memcgs with the same priority are sorted by memory usage, and OOM handling targets the memcg that consumes the largest amount of memory. For more information, see Memcg OOM priority policy.	Yes	memory.priority
memory.use_priority_oom	Controls the memcg OOM priority policy feature. For more information, see Memcg OOM priority policy.	Yes	memory.use_priority_oom
memory.use_priority_swap	Specifies whether the memory is swapped based on the priorities of cgroups. For more information, see Memcg OOM priority policy.	Yes	memory.use_priority_swap
memory.direct_reclaim_global_latency	Specifies the latency in direct global memory reclamation of the memsli feature.	Yes	memory.direct_reclaim_global_latency
memory.direct_reclaim_memcg_latency	Specifies the latency in direct memcg memory reclamation of the memsli feature.	Yes	memory.direct_reclaim_memcg_latency
memory.direct_compact_latency	Specifies the latency in direct memory compaction of the memsli feature.	Yes	memory.direct_compact_latency
memory.direct_swapout_global_latency	Specifies the latency in direct global memory swap-out of the memsli feature.	Yes	memory.direct_swapout_global_latency
memory.direct_swapout_memcg_latency	Specifies the latency in direct memcg memory swap-out of the memsli feature.	Yes	memory.direct_swapout_memcg_latency
memory.direct_swapin_latency	Specifies the latency in direct memory swap-in of the memsli feature.	Yes	memory.direct_swapin_latency
memory.exstat	Queries statistics about extended memory and extra memory. Statistics about the following in-house features are collected: wmark_min_throttled_ms: the throttling time elapsed since the offset global minimum watermark was exceeded. wmark_reclaim_work_ms: the duration in which the kernel attempts to reclaim memory from a cgroup. unevictable_text_size_kb: the size of a code snippet to be locked. pagecache_limit_reclaimed_kb: the limit of a page cache. For more information, see Memcg Exstat.	Yes	memory.exstat
memory.pagecache_limit.enable	Controls the Page Cache Limit feature. For more information, see Page Cache Limit feature.	Yes	memory.pagecache_limit.enable
memory.pagecache_limit.size	Specifies the size of the limited page cache. For more information, see Page Cache Limit feature.	Yes	memory.pagecache_limit.size
memory.pagecache_limit.sync	Specifies the mode of the Page Cache Limit feature, which is synchronous or asynchronous. For more information, see Page Cache Limit feature.	Yes	memory.pagecache_limit.sync
memory.idle_page_stats	Queries statistics about kidled memory of individual memcgs of each hierarchy.	Yes	memory.idle_page_stats
memory.idle_page_stats.local	Queries statistics about kidled memory of individual memcgs.	Yes	memory.idle_page_stats.local
memory.numa_stat	Queries NUMA statistics for anonymous, file, and locked memory.	Yes	memory.numa_stat
memory.reap_background	Specifies whether the zombie memcg reapers reclaim the memory of memcgs asynchronously in the background.	Yes	memory.reap_background
memory.stat	Queries memory statistics.	No	memory.stat
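
The watermark formulas listed for memory.wmark_high, memory.wmark_low, and memory.wmark_min_adj can be checked with a few lines of integer arithmetic. The following sketch is illustrative only: the function names are hypothetical, the inputs are plain integers rather than values read from a cgroup, and the kernel may round differently. The formula used for positive memory.wmark_min_adj values is an assumption derived from the description of the `[WMARK_MIN, WMARK_LOW]` range; the table states a formula only for the negative case.

```shell
#!/bin/sh
# Worked example of the memcg watermark formulas (illustrative helper names).

# memory.wmark_high = memory limit x memory.wmark_ratio / 100
wmark_high() {
    limit_bytes=$1
    ratio=$2
    echo $(( limit_bytes * ratio / 100 ))
}

# memory.wmark_low = memory.wmark_high - memory limit x memory.wmark_scale_factor / 10000
wmark_low() {
    limit_bytes=$1
    ratio=$2
    scale_factor=$3
    high=$(wmark_high "$limit_bytes" "$ratio")
    echo $(( high - limit_bytes * scale_factor / 10000 ))
}

# Offset global minimum watermark of a memcg.
# Negative adj: adjustment over [0, WMARK_MIN], as stated in the table.
# Positive adj: adjustment over [WMARK_MIN, WMARK_LOW] (assumed formula).
memcg_wmark_min() {
    wmark_min=$1
    wmark_low_global=$2
    adj=$3
    if [ "$adj" -lt 0 ]; then
        echo $(( wmark_min + (wmark_min - 0) * adj / 100 ))
    else
        echo $(( wmark_min + (wmark_low_global - wmark_min) * adj / 100 ))
    fi
}

# Memory limit of 1,000,000,000 bytes, wmark_ratio 80, wmark_scale_factor 50 (default)
wmark_high 1000000000 80        # prints 800000000
wmark_low  1000000000 80 50     # prints 795000000
# Global WMARK_MIN 1000, global WMARK_LOW 2000, wmark_min_adj -25
memcg_wmark_min 1000 2000 -25   # prints 750
```

With these inputs, asynchronous reclamation would start above 800,000,000 bytes and stop below 795,000,000 bytes, a gap of 0.50% of the limit.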

cpuacct

Interface name	Purpose	In-house interface	Corresponding cgroup v2 interface
cpuacct.usage	Queries the total CPU time used. Unit: nanoseconds.	No	cpu.stat, which displays similar data
cpuacct.usage_user	Queries the CPU time used in user mode. Unit: nanoseconds.	No
cpuacct.usage_sys	Queries the CPU time used in kernel mode. Unit: nanoseconds.	No
cpuacct.usage_percpu	Queries the CPU time used on each CPU. Unit: nanoseconds.	No
cpuacct.usage_percpu_user	Queries the CPU time used on each CPU in user mode. Unit: nanoseconds.	No
cpuacct.usage_percpu_sys	Queries the CPU time used on each CPU in kernel mode. Unit: nanoseconds.	No
cpuacct.usage_all	Queries the summary of the cpuacct.usage_percpu_user and cpuacct.usage_percpu_sys interfaces. Unit: nanoseconds.	No
cpuacct.stat	Queries the CPU time used in user mode and kernel mode. Unit: ticks (USER_HZ).	No
cpuacct.proc_stat	Queries data such as the CPU time, average loads (loadavg), and number of running tasks at the container level.	Yes
cpuacct.enable_sli	Controls whether to count loadavgs at the container level.	Yes	N/A
cpuacct.sched_cfs_statistics	Queries statistics about CFS, such as the runtime of a cgroup and the waiting time of cgroups at the same level or different levels.	Yes	cpu.sched_cfs_statistics
cpuacct.wait_latency	Queries the latency of tasks waiting in the queue.	Yes	cpu.wait_latency
cpuacct.cgroup_wait_latency	Queries the latency of cgroups waiting in the queue. The wait_latency interface counts the latency of task SEs, and the cgroup_wait_latency interface counts the latency of group SEs.	Yes	cpu.cgroup_wait_latency
cpuacct.block_latency	Queries the latency of tasks blocked due to non-I/O causes.	Yes	cpu.block_latency
cpuacct.ioblock_latency	Queries the latency of tasks blocked due to I/O operations.	Yes	cpu.ioblock_latency
io.pressure	Queries pressure stall information (PSI) for I/O, memory, and CPU resources. The information can be polled. For more information, see psi.rst and Enable the PSI feature for cgroup v1.	No	N/A
memory.pressure		No
cpu.pressure		No
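
When you move from cpuacct.usage to cpu.stat, note that the units differ: cpuacct.usage reports nanoseconds, whereas the usage_usec, user_usec, and system_usec fields of cpu.stat in cgroup v2 report microseconds. The following sketch converts the usage_usec field to nanoseconds so that the two values can be compared. The function name and the example path are hypothetical.

```shell
#!/bin/sh
# Print the total CPU time from a cgroup v2 cpu.stat file in nanoseconds,
# which is the unit that cpuacct.usage uses in cgroup v1.
# $1: path to a cpu.stat file (lines of the form "<key> <value>").
cpu_stat_usage_ns() {
    while read -r key value; do
        if [ "$key" = "usage_usec" ]; then
            # Microseconds to nanoseconds
            echo $(( value * 1000 ))
        fi
    done < "$1"
}

# Example call (hypothetical cgroup path):
#   cpu_stat_usage_ns /sys/fs/cgroup/mygroup/cpu.stat
```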

freezer

Interface name	Purpose	In-house interface	Corresponding cgroup v2 interface
freezer.state	Controls the freeze status. Valid values: `FROZEN` and `THAWED`.	No	cgroup.freeze
freezer.self_freezing	Queries whether a cgroup is frozen because of its own frozen state.	No	N/A
freezer.parent_freezing	Queries whether a cgroup is frozen because its ancestor is frozen.	No	N/A
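
Scripts that write `FROZEN` or `THAWED` to freezer.state must write 1 or 0 to cgroup.freeze in cgroup v2. The following sketch shows one way to translate the value; the function name and the example path are hypothetical.

```shell
#!/bin/sh
# Translate a cgroup v1 freezer.state value into the value that
# cgroup v2 expects in cgroup.freeze (1 = freeze, 0 = thaw).
freezer_state_to_v2() {
    case "$1" in
        FROZEN) echo 1 ;;
        THAWED) echo 0 ;;
        *)
            echo "unsupported freezer.state value: $1" >&2
            return 1
            ;;
    esac
}

# Example call (hypothetical cgroup path):
#   freezer_state_to_v2 FROZEN > /sys/fs/cgroup/mygroup/cgroup.freeze
```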

ioasids

The cgroup v1 interfaces and the cgroup v2 interfaces of the ioasids subsystem are the same.

Interface name	Purpose	In-house interface
ioasids.current	Queries the number of ioasids allocated to the current cgroup.	Yes
ioasids.events	Queries the number of events that occurred because the upper limit of allocable ioasids was exceeded.	Yes
ioasids.max	Queries the total number of ioasids that can be allocated to the current cgroup.	Yes

net_cls and net_prio

Interface name	Purpose	In-house interface	Corresponding cgroup v2 interface
net_cls.classid	Specifies the class identifier that tags network packets of the current cgroup. This interface works with qdisc or iptables.	No	N/A Note The corresponding interfaces are removed from cgroup v2. You can use eBPF to filter and shape traffic.
net_prio.prioidx	Queries the index value of the current cgroup in the data structure. The interface is read-only and used internally by the kernel.	No
net_prio.ifpriomap	Specifies the network priority value for each network interface controller (NIC).	No

perf_event

The perf_event subsystem does not provide interfaces. The perf_event subsystem is enabled by default for cgroup v2 and provides the same functionality as the perf_event subsystem in cgroup v1.

pids

The cgroup v1 interfaces and the cgroup v2 interfaces of the pids subsystem are the same.

Interface name	Purpose	In-house interface
pids.max	Specifies the maximum number of tasks in a cgroup.	No
pids.current	Queries the current number of tasks in a cgroup.	No
pids.events	Queries the number of events in which the fork operation fails because the maximum number of supported tasks is reached. The file supports fsnotify, so file system notifications are generated for these events.	No
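
The pids.events file stores its counter as a `max <count>` line. The following sketch reads that counter, for example to check whether fork failures occurred in a cgroup. The function name and the example path are hypothetical.

```shell
#!/bin/sh
# Print the number of times the pids.max limit caused a fork to fail.
# $1: path to a pids.events file (a line of the form "max <count>").
pids_events_max() {
    while read -r key value; do
        if [ "$key" = "max" ]; then
            echo "$value"
        fi
    done < "$1"
}

# Example call (hypothetical cgroup path):
#   pids_events_max /sys/fs/cgroup/mygroup/pids.events
```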

rdma

The cgroup v1 interfaces and the cgroup v2 interfaces of the rdma subsystem are the same.

Interface name	Purpose	In-house interface
rdma.max	Specifies the upper limit on the resource usage of the Remote Direct Memory Access (RDMA) adapter.	No
rdma.current	Queries the resource usage of the RDMA adapter.	No