Differences between cgroup v1 and cgroup v2

Last Updated: Aug 07, 2024

In Linux, Control Groups (cgroups) provide a resource management and restriction mechanism that limits, records, and isolates physical resources, such as CPUs, memory, and I/O resources, that are allocated to tasks (processes) in cgroups. A parent cgroup can be used to control the resource utilization of descendant cgroups. cgroup v1 and cgroup v2 are two major versions of cgroups and significantly differ in design and usage. This topic describes the main differences between cgroup v1 and cgroup v2.

Common interface differences

cgroup v1 interfaces

Interface name

Purpose

In-house interface

Corresponding cgroup v2 interface

cgroup.procs

Writes process IDs (PIDs) to migrate tasks to a cgroup.

No

cgroup.procs

cgroup.clone_children

A value of 1 indicates that the child cgroups inherit the cpuset configurations of the parent cgroup.

Note

This interface takes effect only on the cpuset subsystem and is classified as a common interface for historical reasons.

No

N/A

cgroup.sane_behavior

Indicates whether the experimental cgroup v2 behaviors are enabled in the cgroup v1 hierarchy. The interface is retained for backward compatibility.

No

N/A

notify_on_release

A value of 1 indicates that the program specified in the release_agent interface is executed when the cgroup becomes empty.

Note

The release_agent interface exists only in the root cgroup. The notify_on_release interface exists in every cgroup.

No

cgroup.events, which implements similar functionality

release_agent

No

tasks

Writes thread IDs (TIDs) to migrate threads to a cgroup.

No

cgroup.threads

pool_size

Controls the size of the cgroup cache pool. The cgroup cache pool helps accelerate the creation and binding of cgroups in high-concurrency scenarios.

Note

The interface depends on cgroup_rename and cannot be used in cgroup v2.

Yes

N/A

cgroup v2 interfaces

Interface name

Purpose

In-house interface

Corresponding cgroup v1 interface

cgroup.procs

Writes PIDs to migrate tasks to a cgroup.

No

cgroup.procs

cgroup.type

Write the string "threaded" to this interface to convert a cgroup into a threaded cgroup, which provides thread-granularity control.

Note

Only cpu, pids, and perf_event threaded controllers are supported.

No

N/A

cgroup.threads

Writes TIDs to migrate threads to a cgroup.

Note

Before you use this interface, write the string "threaded" to the cgroup.type interface.

No

tasks

cgroup.controllers

Queries all subsystems available for the current cgroup.

No

N/A

cgroup.subtree_control

Specifies which subsystems are enabled to control resource distribution from the cgroup to its child cgroups.

Note

The subsystems can be queried by using the cgroup.controllers interface.

No

N/A

cgroup.events

Queries whether the current cgroup contains live processes and whether the current cgroup is frozen. You can use fsnotify to monitor this interface for status changes.

Note

This interface does not exist in the root cgroup.

No

notify_on_release and release_agent, which are used together to implement similar functionality

cgroup.max.descendants

Controls the maximum number of descendant cgroups allowed under the current cgroup.

No

N/A

cgroup.max.depth

Controls the maximum depth of descendant cgroups allowed in the current cgroup.

No

N/A

cgroup.stat

Queries the number of descendant cgroups under the current cgroup and the number of descendant cgroups in the dying state (deleted but not yet completely destroyed).

No

N/A

cgroup.freeze

Controls whether to freeze tasks in a cgroup.

Note

This interface does not exist in the root cgroup.

No

freezer.state in the freezer subsystem

cpu.stat

Queries statistics about CPU utilization.

No

N/A

io.pressure

Queries Pressure Stall Information (PSI) for I/O, memory, and CPU resources. The information can be polled.

No

io.pressure, memory.pressure, and cpu.pressure interfaces in the cpuacct subsystem, which can implement the PSI feature

memory.pressure

No

cpu.pressure

No
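Two cgroup v2 formats from the table above are easy to get wrong by hand: cgroup.subtree_control accepts space-separated "+name" and "-name" tokens, and only controllers listed in cgroup.controllers can be enabled; the PSI interfaces (cpu.pressure, memory.pressure, io.pressure) report "some" and "full" lines with avg10, avg60, avg300, and total fields. The following Python sketch handles both formats as plain strings only. The function names are hypothetical, and reading or writing the real files under /sys/fs/cgroup is deliberately left out.

```python
# Sketch: string helpers for two cgroup v2 interface files.
# File contents are passed in as strings; on a real system they would be read
# from /sys/fs/cgroup/<cgroup>/cgroup.controllers and <cgroup>/cpu.pressure.
# Function names are hypothetical, not part of any library.


def subtree_control_request(available, enable=(), disable=()):
    """Build the string written to cgroup.subtree_control.

    available: content of cgroup.controllers (space-separated names).
    enable/disable: controller names to prefix with "+" or "-".
    """
    offered = set(available.split())
    missing = [name for name in enable if name not in offered]
    if missing:
        # Only controllers listed in cgroup.controllers can be enabled.
        raise ValueError("controllers not available: " + ", ".join(missing))
    tokens = ["+" + name for name in enable] + ["-" + name for name in disable]
    return " ".join(tokens)


def parse_pressure(text):
    """Parse a PSI file such as cpu.pressure, memory.pressure, or io.pressure.

    Each line looks like "some avg10=0.12 avg60=0.05 avg300=0.01 total=12345".
    The avgN fields are percentages; total is the cumulative stall time in
    microseconds.
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        values = {}
        for field in fields:
            key, value = field.split("=", 1)
            values[key] = int(value) if key == "total" else float(value)
        result[kind] = values
    return result
```

For example, subtree_control_request("cpuset cpu io memory pids", enable=["cpu", "memory"]) returns "+cpu +memory", which is the string you would write to the parent's cgroup.subtree_control.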

Subsystem interface differences

CPU

cgroup v1 interfaces

Interface name

Purpose

In-house interface

Corresponding cgroup v2 interface

cpu.shares

Controls the weight based on which CPU time slices are allocated to tasks in a cgroup. Default value: 1024.

No

cpu.weight and cpu.weight.nice, which use a different unit

cpu.idle

Controls whether to use an idle scheduling policy for the current cgroup. Under an idle scheduling policy, CPU time slices are allocated based on the minimum CPU share and no minimum runtime is guaranteed, so CPU resources can be more easily yielded to non-idle tasks.

Note

If the cpu.idle value is set to 1, the cpu.shares interface becomes read-only and its value is fixed at 3.

No

cpu.idle

cpu.priority

Specifies the fine-grained preemption priority. Preemption checks are performed on clock interrupts and task wakeups. The preemption behavior is adjusted based on the priority difference between tasks, which allows high-priority tasks to preempt low-priority tasks.

Yes

cpu.priority

cpu.cfs_quota_us

Controls CPU runtime under the Completely Fair Scheduler (CFS). cpu.cfs_quota_us specifies the maximum CPU runtime, in microseconds, of tasks in a cgroup within each period defined by the cpu.cfs_period_us interface. A value of -1 indicates no limit.

No

cpu.max

cpu.cfs_period_us

No

cpu.cfs_burst_us

Specifies the amount of CPU time by which tasks can exceed the quota (burst) within a period defined by the cpu.cfs_period_us interface. For more information, see Enable the CPU burst feature for cgroup v1.

No

cpu.max.burst

cpu.cfs_init_buffer_us

Specifies the amount of CPU time that tasks in a cgroup can burst when the tasks start.

Yes

cpu.max.init_buffer

cpu.stat

Queries CPU runtime statistics, such as the number of elapsed cpu.cfs_period_us periods and the number of times tasks in the cgroup were throttled.

No

cpu.stat

cpu.rt_runtime_us

Controls the real-time CPU runtime. cpu.rt_runtime_us specifies the maximum runtime of real-time tasks in a cgroup within each cpu.rt_period_us period.

No

N/A

cpu.rt_period_us

No

N/A

cpu.bvt_warp_ns

Controls the group identity attribute, which assigns identities to cgroups to distinguish online tasks from offline tasks and provide better CPU quality of service (QoS) guarantees for online tasks. For more information, see Group identity feature.

Yes

cpu.bvt_warp_ns

cpu.identity

Yes

cpu.identity

cpu.ht_stable

Specifies whether to generate simultaneous multithreading (SMT) peer noise to maintain consistent SMT computing power.

Yes

N/A

cpu.ht_ratio

Controls whether to use quotas to provide extra computing power when the SMT peer is idle to maintain consistent SMT computing power.

Yes

cpu.ht_ratio
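The table above notes that cpu.weight uses a different unit than cpu.shares. The following Python sketch converts between the two, assuming a linear mapping in which the defaults correspond (cpu.shares 1024 is cpu.weight 100) and results are rounded to the nearest integer. Exact rounding can differ between kernel versions, and container runtimes may apply their own mapping when they translate cpu.shares, so treat this as an approximation. The function names are hypothetical.

```python
# Sketch: convert between cgroup v1 cpu.shares and cgroup v2 cpu.weight.
# Assumption: a linear mapping in which the defaults match
# (cpu.shares 1024 <-> cpu.weight 100), rounded to the nearest integer.
# Exact rounding may differ between kernel versions.

V1_DEFAULT_SHARES = 1024
V2_DEFAULT_WEIGHT = 100
V2_WEIGHT_MIN = 1
V2_WEIGHT_MAX = 10000


def div_round_closest(numerator, denominator):
    """Round-to-nearest integer division for non-negative integers."""
    return (numerator + denominator // 2) // denominator


def shares_to_weight(shares):
    """cpu.weight value that corresponds to a cpu.shares value."""
    if shares < 0:
        raise ValueError("cpu.shares must be non-negative")
    return div_round_closest(shares * V2_DEFAULT_WEIGHT, V1_DEFAULT_SHARES)


def weight_to_shares(weight):
    """cpu.shares value that corresponds to a cpu.weight value (1 to 10000)."""
    if not V2_WEIGHT_MIN <= weight <= V2_WEIGHT_MAX:
        raise ValueError("cpu.weight must be between 1 and 10000")
    return div_round_closest(weight * V1_DEFAULT_SHARES, V2_DEFAULT_WEIGHT)
```

Under this mapping, shares_to_weight(3) returns 0, which is consistent with the cpu.idle note in the cgroup v2 table: a cgroup whose cpu.shares is fixed at 3 has an effective weight of about 0.3, which reads as 0 after rounding.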

cgroup v2 interfaces

Note

cgroup v2 no longer supports the cpuacct subsystem. Some interfaces and features of the cpuacct subsystem are provided by the cpu subsystem in cgroup v2.

Interface name

Purpose

In-house interface

Corresponding cgroup v1 interface

cpu.weight

Controls the weight based on which CPU time slices are allocated to tasks in a cgroup. Valid values: 1 to 10000. Default value: 100.

No

cpu.shares, which uses a different unit

cpu.weight.nice

Controls the same weight as cpu.weight, expressed as a nice value. Valid values: -20 to 19. Default value: 0.

No

cpu.shares, which uses a different unit

cpu.idle

Controls whether to use an idle scheduling policy for the current cgroup. Under an idle scheduling policy, CPU time slices are allocated based on the minimum CPU share and no minimum runtime is guaranteed, so CPU resources can be more easily yielded to non-idle tasks.

Note

When the cpu.idle value is 1, the cpu.weight and cpu.weight.nice interfaces become read-only, and a minimum weight of about 0.3 (equivalent to a cpu.shares value of 3) takes effect. Because the value is rounded, cpu.weight reads as 0.

No

cpu.idle

cpu.priority

Specifies the fine-grained preemption priority. Preemption checks are performed on clock interrupts and task wakeups. The preemption behavior is adjusted based on the priority difference between tasks, which allows high-priority tasks to preempt low-priority tasks.

Yes

cpu.priority

cpu.max

Controls CPU runtime under CFS. The value is in the format "$MAX $PERIOD", in microseconds, and specifies the maximum CPU runtime of tasks in a cgroup within each period. A $MAX value of max indicates no limit. Default value: max 100000.

No

cpu.cfs_quota_us, cpu.cfs_period_us

cpu.max.burst

Specifies the amount of CPU time by which tasks can exceed the quota (burst) within a period defined by the cpu.max interface.

No

cpu.max.burst

cpu.max.init_buffer

Specifies the amount of CPU time that tasks in a cgroup can burst when the tasks start.

Yes

cpu.cfs_init_buffer_us

cpu.bvt_warp_ns

Controls the group identity attribute, which assigns identities to cgroups to distinguish online tasks from offline tasks and provide better CPU QoS guarantees for online tasks.

Yes

cpu.bvt_warp_ns

cpu.identity

Yes

cpu.identity

cpu.sched_cfs_statistics

Queries statistics about CFS, such as the runtime of a cgroup and the waiting time of cgroups at the same level or different levels.

Note

The kernel.sched_schedstats option must be enabled.

Yes

cpuacct.sched_cfs_statistics

cpu.wait_latency

Queries the latency of tasks waiting in the queue.

Note

The kernel.sched_schedstats and /proc/cpusli/sched_lat_enabled options must be enabled.

Yes

cpuacct.wait_latency

cpu.cgroup_wait_latency

Queries the latency of cgroups waiting in the queue. The wait_latency interface counts the latency of task scheduling entities (SEs), and the cgroup_wait_latency interface counts the latency of group SEs.

Note

The kernel.sched_schedstats and /proc/cpusli/sched_lat_enabled options must be enabled.

Yes

cpuacct.cgroup_wait_latency

cpu.block_latency

Queries the latency of tasks blocked due to non-I/O causes.

Note

The kernel.sched_schedstats and /proc/cpusli/sched_lat_enabled options must be enabled.

Yes

cpuacct.block_latency

cpu.ioblock_latency

Queries the latency of tasks blocked due to I/O operations.

Note

The kernel.sched_schedstats and /proc/cpusli/sched_lat_enabled options must be enabled.

Yes

cpuacct.ioblock_latency

cpu.ht_ratio

Controls whether to use quotas to provide extra computing power when the SMT peer is idle to maintain consistent SMT computing power.

Note

This interface takes effect only if the core scheduling feature is enabled.

Yes

cpu.ht_ratio
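The two cgroup v1 CFS bandwidth interfaces, cpu.cfs_quota_us and cpu.cfs_period_us, map onto the single cpu.max interface, which holds "$MAX $PERIOD" in microseconds, where max means no limit. A minimal Python sketch of that translation follows, assuming a default period of 100000 microseconds when none is given. The helper names are hypothetical.

```python
# Sketch: translate cgroup v1 CFS bandwidth settings to the cgroup v2 cpu.max
# format "$MAX $PERIOD" (microseconds), where "max" means no limit.
# In cgroup v1, a cpu.cfs_quota_us value of -1 means no limit.

DEFAULT_PERIOD_US = 100000  # assumed default period (100 ms)


def v1_to_cpu_max(cfs_quota_us, cfs_period_us=DEFAULT_PERIOD_US):
    """Build the string to write to cpu.max."""
    if cfs_period_us <= 0:
        raise ValueError("period must be positive")
    quota = "max" if cfs_quota_us < 0 else str(cfs_quota_us)
    return f"{quota} {cfs_period_us}"


def parse_cpu_max(text):
    """Return (quota_us, period_us); quota_us is None when unlimited."""
    quota, period = text.split()
    return (None if quota == "max" else int(quota)), int(period)


def cpu_limit_in_cores(text):
    """Number of CPUs' worth of runtime per period, or None if unlimited."""
    quota, period = parse_cpu_max(text)
    return None if quota is None else quota / period
```

For example, cpu_limit_in_cores("200000 100000") returns 2.0, meaning the cgroup can use up to two CPUs' worth of runtime in each period.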

cpuset

cgroup v1 interfaces

Interface name

Purpose

In-house interface

Corresponding cgroup v2 interface

cpuset.cpus

Controls the CPUs on which tasks can run.

Note

Tasks cannot be attached to a cgroup when this interface is empty.

No

cpuset.cpus

cpuset.mems

Controls the non-uniform memory access (NUMA) nodes that can be allocated to tasks in a cgroup.

Note

Tasks cannot be attached to a cgroup when this interface is empty.

No

cpuset.mems

cpuset.effective_cpus

Queries the effective CPUs, which are the online CPUs actually granted to the cgroup. The value of this interface is affected by CPU hotplug events.

No

cpuset.cpus.effective

cpuset.effective_mems

Queries the effective NUMA nodes, which are the online memory nodes actually granted to the cgroup. The value of this interface is affected by memory node hotplug events.

No

cpuset.mems.effective

cpuset.cpu_exclusive

Specifies whether the CPUs of the cpuset are used exclusively by the cgroup. Exclusive CPUs cannot be used by sibling cpusets.

No

cpuset.cpus.partition, which supports similar functionality

cpuset.mem_exclusive

Specifies whether the NUMA nodes of the cpuset are used exclusively by the cgroup. Exclusive nodes cannot be used by sibling cpusets.

No

N/A

cpuset.mem_hardwall

A value of 1 indicates that memory only from the memory nodes that are attached to the cpuset can be allocated to tasks.

No

N/A

cpuset.sched_load_balance

Controls whether CPUs are load-balanced within the cpuset. By default, the feature is enabled.

No

N/A

cpuset.sched_relax_domain_level

Specifies the range in which the scheduler searches for CPUs when it migrates tasks for load balancing. Default value: -1.

  • -1: enforces the default system policy.

  • 0: does not perform a search.

  • 1: searches for hyperthreads within the same core.

  • 2: searches for cores in the same package.

  • 3: searches for CPUs on the same node.

  • 4: searches for CPUs on nodes in the same chunk.

  • 5: searches for CPUs in the entire system.

No

N/A

cpuset.memory_migrate

A non-zero value indicates that if a task is allocated a memory page in a cpuset and migrated to another cpuset, the memory page can also be migrated to the new cpuset.

No

N/A

cpuset.memory_pressure

Queries the memory paging pressure of the current cpuset.

No

N/A

cpuset.memory_spread_page

A value of 1 indicates that the kernel evenly allocates the page cache to the memory nodes of the cpuset.

No

N/A

cpuset.memory_spread_slab

A value of 1 indicates that the kernel evenly allocates the slab caches to the memory nodes of the cpuset.

No

N/A

cpuset.memory_pressure_enabled

A value of 1 indicates that memory pressure statistics collection is enabled for the cpuset.

No

N/A

cgroup v2 interfaces

Interface name

Purpose

In-house interface

Corresponding cgroup v1 interface

cpuset.cpus

Controls the CPUs on which tasks can run.

Note

When the value of this interface is empty, the CPUs of the parent cpuset are used.

No

cpuset.cpus

cpuset.mems

Controls the NUMA nodes that can be allocated to tasks in a cgroup.

Note

When the value of this interface is empty, the NUMA nodes of the parent cpuset are used.

No

cpuset.mems

cpuset.cpus.effective

Queries the effective CPUs, which are the online CPUs actually granted to the cgroup. The value of this interface is affected by CPU hotplug events.

No

cpuset.effective_cpus

cpuset.mems.effective

Queries the effective NUMA nodes, which are the online memory nodes actually granted to the cgroup. The value of this interface is affected by memory node hotplug events.

No

cpuset.effective_mems

cpuset.cpus.partition

Controls whether the CPUs of a cpuset are used exclusively. If root is written to the interface, the cgroup becomes a partition root and its CPUs are used exclusively.

No

cpuset.cpu_exclusive, which implements similar functionality

.__DEBUG__.cpuset.cpus.subpartitions

Queries which CPUs are used exclusively when root is written into the cpuset.cpus.partition interface.

Note

This interface is available only if cgroup_debug is specified in the kernel command line.

No

N/A
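cpuset.cpus, cpuset.mems, and their effective counterparts in both cgroup versions use the kernel's list format, such as 0-3,8,10-11. The following Python sketch expands and compresses that format, which can help when you compare cpuset.cpus with cpuset.cpus.effective. It handles only plain comma-separated IDs and ranges, and the function names are hypothetical.

```python
# Sketch: expand and compress the list format used by cpuset.cpus,
# cpuset.mems, cpuset.cpus.effective, and cpuset.mems.effective.
# Only plain IDs and "start-end" ranges separated by commas are handled.


def parse_cpu_list(text):
    """Expand a list such as "0-3,8,10-11" into sorted CPU or node IDs."""
    text = text.strip()
    if not text:
        # An empty file: in cgroup v2 this means the parent's CPUs/nodes are used.
        return []
    ids = set()
    for part in text.split(","):
        if "-" in part:
            start, end = part.split("-", 1)
            ids.update(range(int(start), int(end) + 1))
        else:
            ids.add(int(part))
    return sorted(ids)


def format_cpu_list(ids):
    """Compress IDs back into the range format, e.g. [0, 1, 2, 3, 8] -> "0-3,8"."""
    ids = sorted(set(ids))
    ranges = []
    i = 0
    while i < len(ids):
        start = ids[i]
        # Advance while the next ID continues the current run.
        while i + 1 < len(ids) and ids[i + 1] == ids[i] + 1:
            i += 1
        end = ids[i]
        ranges.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(ranges)
```

For example, the CPUs configured in cpuset.cpus but missing from cpuset.cpus.effective can be found as a set difference of the two parsed lists.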

blkio

cgroup v1 interfaces

Interface name

Purpose

In-house interface

Corresponding cgroup v2 interface

blkio.throttle.read_bps_device

Specifies the maximum number of bytes per second that a cgroup can read from a device.

Example:

echo "<major>:<minor> <bps>" > /sys/fs/cgroup/blkio/<cgroup>/blkio.throttle.read_bps_device

No

io.max

blkio.throttle.write_bps_device

Specifies the maximum number of bytes per second that a cgroup can write to a device.

Example:

echo "<major>:<minor> <bps>" > /sys/fs/cgroup/blkio/<cgroup>/blkio.throttle.write_bps_device

No

io.max

blkio.throttle.read_iops_device

Specifies the maximum number of read operations per second that a cgroup can perform on a device.

Example:

echo "<major>:<minor> <iops>" > /sys/fs/cgroup/blkio/<cgroup>/blkio.throttle.read_iops_device

No

io.max

blkio.throttle.write_iops_device

Specifies the maximum number of write operations per second that a cgroup can perform on a device.

Example:

echo "<major>:<minor> <iops>" > /sys/fs/cgroup/blkio/<cgroup>/blkio.throttle.write_iops_device

No

io.max

blkio.throttle.io_service_bytes

Queries bandwidth statistics.

This interface collects the read, write, sync, async, discard, and total bandwidth statistics of all devices. Unit: bytes.

No

io.stat

blkio.throttle.io_service_bytes_recursive

The recursive version of the blkio.throttle.io_service_bytes interface.

Unlike blkio.throttle.io_service_bytes, the statistics collected by this interface include data of descendant cgroups.

No

N/A

blkio.throttle.io_serviced

Queries IOPS statistics.

This interface collects the read, write, sync, async, discard, and total IOPS statistics of all devices.

No

io.stat

blkio.throttle.io_serviced_recursive

The recursive version of the blkio.throttle.io_serviced interface.

Unlike blkio.throttle.io_serviced, the statistics collected by this interface include data of descendant cgroups.

No

N/A

blkio.throttle.io_service_time

Queries the duration between request dispatch and request completion for I/O operations, which is used to measure the average I/O latency.

For more information, see Enhance the monitoring of block I/O throttling.

Yes

io.extstat

blkio.throttle.io_wait_time

Queries the time that I/O operations spend waiting in scheduler queues, which is used to measure the average I/O latency.

For more information, see Enhance the monitoring of block I/O throttling.

Yes

io.extstat

blkio.throttle.io_completed

Queries the number of completed I/O operations, which is used to measure the average I/O latency.

For more information, see Enhance the monitoring of block I/O throttling.

Yes

io.extstat

blkio.throttle.total_bytes_queued

Queries the number of I/O bytes that were throttled, which is used to analyze whether I/O latency is related to throttling.

For more information, see Enhance the monitoring of block I/O throttling.

Yes

io.extstat

blkio.throttle.total_io_queued

Queries the number of I/O operations that were throttled, which is used to analyze whether I/O latency is related to throttling.

For more information, see Enhance the monitoring of block I/O throttling.

Yes

io.extstat

blkio.cost.model

Specifies the blk-iocost cost model. The control mode (ctrl) can be set to auto or user.

This interface exists only in the root cgroup. Example:

echo "<major>:<minor> ctrl=user model=linear rbps=<rbps> rseqiops=<rseqiops> rrandiops=<rrandiops> wbps=<wbps> wseqiops=<wseqiops> wrandiops=<wrandiops>" > /sys/fs/cgroup/blkio/blkio.cost.model

For more information, see Configure the blk-iocost weight-based throttling feature.

Yes

io.cost.model

blkio.cost.qos

Controls the blk-iocost feature and configures a QoS policy to check for disk congestion.

This interface exists only in the root cgroup. Example:

echo "<major>:<minor> enable=1 ctrl=user rpct= rlat=5000 wpct=95.00 wlat=5000 min=50.00 max=150.00" > /sys/fs/cgroup/blkio/blkio.cost.qos

For more information, see Configure blk-iocost weight throttling.

Yes

io.cost.qos

blkio.cost.weight

Specifies the cgroup weight.

This interface exists only in non-root cgroups and can be configured in the following modes:

  • weight: sets the same weight for all devices.

  • major:minor + weight: sets the weight of a specific device.

For more information, see Configure the blk-iocost weight-based throttling feature.

Yes

io.cost.weight

blkio.cost.stat

Queries the blk-iocost statistics. The interface exists only in non-root cgroups.

Yes

N/A
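The four blkio.throttle limit interfaces of cgroup v1 collapse into a single io.max line per device in cgroup v2. The following Python sketch builds that line from cgroup v1 values. It assumes that limits never set in cgroup v1 are simply omitted, because io.max leaves keys that are not specified unchanged; the value max removes a limit. The function and constant names, and the device number 8:16 in the usage note, are illustrative.

```python
# Sketch: merge cgroup v1 blkio.throttle.* limits for one device into the
# single line accepted by the cgroup v2 io.max interface.
# Limits that are not given are omitted; "max" can be passed to remove a limit.

V1_FILE_TO_IO_MAX_KEY = {
    "blkio.throttle.read_bps_device": "rbps",
    "blkio.throttle.write_bps_device": "wbps",
    "blkio.throttle.read_iops_device": "riops",
    "blkio.throttle.write_iops_device": "wiops",
}


def io_max_line(device, v1_limits):
    """device: "<major>:<minor>", e.g. "8:16".
    v1_limits: {cgroup v1 interface file name: limit value (int or "max")}.
    """
    unknown = set(v1_limits) - set(V1_FILE_TO_IO_MAX_KEY)
    if unknown:
        raise ValueError("not a blkio.throttle limit file: " + ", ".join(sorted(unknown)))
    parts = [device]
    # Emit keys in a fixed order (rbps, wbps, riops, wiops) for readability.
    for v1_file, key in V1_FILE_TO_IO_MAX_KEY.items():
        if v1_file in v1_limits:
            parts.append(f"{key}={v1_limits[v1_file]}")
    return " ".join(parts)
```

For example, a read limit of 1048576 byte/s and a write limit of 100 IOPS on device 8:16 produce "8:16 rbps=1048576 wiops=100", which can be written with echo to /sys/fs/cgroup/<cgroup>/io.max.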

cgroup v2 interfaces

Interface name

Purpose

In-house interface

Corresponding cgroup v1 interface

io.max

Specifies read and write throttling limits in bytes per second (bps) and I/O operations per second (IOPS). Example:

echo "<major>:<minor> rbps=<bps> wbps=<bps> riops=<iops> wiops=<iops>" > /sys/fs/cgroup/<cgroup>/io.max

No

blkio.throttle.read_bps_device

blkio.throttle.read_iops_device

blkio.throttle.write_bps_device

blkio.throttle.write_iops_device

io.stat

Queries I/O statistics, which include the numbers of bytes and I/O operations for reads, writes, and discards.

No

blkio.throttle.io_service_bytes

blkio.throttle.io_serviced

io.extstat

Queries extended I/O statistics, including the wait time, service time, number of completed I/O operations, and the numbers of bytes and I/O operations that were throttled.

No

blkio.throttle.io_service_time

blkio.throttle.io_wait_time

blkio.throttle.io_completed

blkio.throttle.total_bytes_queued

blkio.throttle.total_io_queued

io.cost.model

Specifies the blk-iocost cost model. The control mode (ctrl) can be set to auto or user.

This interface exists only in the root cgroup. Example:

echo "<major>:<minor> ctrl=user model=linear rbps=<rbps> rseqiops=<rseqiops> rrandiops=<rrandiops> wbps=<wbps> wseqiops=<wseqiops> wrandiops=<wrandiops>" > /sys/fs/cgroup/io.cost.model

For more information, see Configure blk-iocost weight throttling.

No

blkio.cost.model

io.cost.qos

Controls the blk-iocost feature and configures a QoS policy to check for disk congestion.

This interface exists only in the root cgroup. Example:

echo "<major>:<minor> enable=1 ctrl=user rpct= rlat=5000 wpct=95.00 wlat=5000 min=50.00 max=150.00" > /sys/fs/cgroup/io.cost.qos

For more information, see Configure blk-iocost weight throttling.

No

blkio.cost.qos

io.cost.weight

Specifies the cgroup weight.

This interface exists only in non-root cgroups and can be configured in the following modes:

  • weight: sets the same weight for all devices.

  • major:minor + weight: sets the weight of a specific device.

For more information, see Configure blk-iocost weight throttling.

No

blkio.cost.weight
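io.stat reports one line per device, keyed by <major>:<minor>, with nested key=value fields such as rbytes, wbytes, rios, wios, dbytes, and dios. A minimal Python parsing sketch follows. The helper names are hypothetical, and any additional fields that appear depending on the kernel configuration are parsed generically as numbers.

```python
# Sketch: parse the cgroup v2 io.stat interface into nested dictionaries.
# Example line: "8:16 rbytes=4096 wbytes=8192 rios=1 wios=2 dbytes=0 dios=0"


def _number(value):
    """Parse an integer, falling back to float for fractional fields."""
    try:
        return int(value)
    except ValueError:
        return float(value)


def parse_io_stat(text):
    """Return {device: {field: value}} for every line of io.stat."""
    stats = {}
    for line in text.strip().splitlines():
        device, *fields = line.split()
        stats[device] = {}
        for field in fields:
            key, value = field.split("=", 1)
            stats[device][key] = _number(value)
    return stats


def average_write_size(stats, device):
    """Average bytes per write operation, or 0.0 if there were no writes."""
    s = stats[device]
    return s["wbytes"] / s["wios"] if s.get("wios") else 0.0
```

The counters are cumulative, so to obtain rates in byte/s or IOPS, read io.stat twice and divide the difference by the sampling interval.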

memory

cgroup v1 interfaces

Interface name

Purpose

In-house interface

Corresponding cgroup v2 interface

memory.usage_in_bytes

Queries the current memory usage.

No

N/A

memory.max_usage_in_bytes

Queries the maximum memory usage.

No

N/A

memory.limit_in_bytes

Specifies the hard upper limit on memory usage.

No

N/A

memory.soft_limit_in_bytes

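Specifies the soft limit on memory usage. When the system is under memory pressure, memory is reclaimed from cgroups that exceed their soft limits.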

No

N/A

memory.failcnt

Queries the number of times the memory usage reached the upper limit.

No

N/A

memory.mglru_batch_size

Specifies the amount of memory reclaimed in each batch of proactive memory reclamation based on the Multi-Generational Least Recently Used (MGLRU) framework. The kernel attempts to yield the CPU between batches.

Yes

N/A

memory.mglru_reclaim_kbytes

Specifies the size of memory that is proactively reclaimed based on the MGLRU framework.

Yes

N/A

memory.wmark_ratio

Controls the memcg backend asynchronous reclaim feature and sets the memcg memory watermark that triggers asynchronous reclamation. Unit: percent of the memcg memory upper limit. Valid values: 0 to 100.

  • The default value is 0, which indicates that the memcg backend asynchronous reclaim feature is disabled.

  • When the value is not 0, the memcg backend asynchronous reclaim feature is enabled. You can set the corresponding watermark.

For more information, see Memcg backend asynchronous reclaim.

Yes

memory.wmark_ratio

memory.wmark_high

A read-only interface.

  • When the memcg memory usage exceeds the value of this interface, backend asynchronous reclamation is started.

  • The value of this interface is calculated by using the following formula: memory.wmark_high = memory.limit_in_bytes × memory.wmark_ratio/100.

  • When the memcg backend asynchronous reclaim feature is disabled, memory.wmark_high defaults to a large value to prevent backend asynchronous reclamation from being triggered.

  • This interface does not exist in the root memcg.

For more information, see Memcg backend asynchronous reclaim.

Yes

memory.wmark_low

A read-only interface.

  • When the memcg memory usage falls below the value of this interface, backend asynchronous reclamation ends.

  • The value of this interface is calculated by using the following formula: memory.wmark_low = memory.wmark_high - memory.limit_in_bytes × memory.wmark_scale_factor/10000.

  • This interface does not exist in the root memcg.

For more information, see Memcg backend asynchronous reclaim.

Yes

memory.wmark_scale_factor

Specifies the interval between the memory.wmark_high value and the memory.wmark_low value. Unit: 0.01 percent of the memcg memory upper limit. Valid values: 1 to 1000.

  • When a memcg is created, this interface inherits the value of its parent cgroup. The inherited value is 50, which indicates 0.50% of the memcg memory upper limit. This is also the default value.

  • This interface does not exist in the root memcg.

For more information, see Memcg backend asynchronous reclaim.

Yes

memory.wmark_min_adj

The factor that is used in the memcg global minimum watermark rating feature.

The value of this interface indicates an adjustment in percentage over the global minimum watermark. Valid values: -25 to 50.

  • This interface inherits a value of 0 from the parent cgroup when the interface is created. Therefore, the default value is 0.

  • A negative value in the value range is an adjustment in percentage over the [0, WMARK_MIN] range, where WMARK_MIN is the value of global wmark_min. For example, if memory.wmark_min_adj is -25, WMARK_MIN of a memcg is calculated by using the following formula: memcg WMARK_MIN = WMARK_MIN + (WMARK_MIN - 0) × -25%.

  • A positive value in the range is an adjustment in percentage over the [WMARK_MIN, WMARK_LOW] range. WMARK_MIN is the value of global wmark_min, and WMARK_LOW is the value of global wmark_low.

  • When the offset global minimum watermark is triggered, throttling is performed, and the throttling time is linearly proportional to the excess memory usage. Valid values of the throttling time: 1 to 1000. Unit: milliseconds.

For more information, see Memcg global minimum watermark rating.

Yes

memory.force_empty

Specifies whether to forcefully reclaim memory pages.

No

N/A

memory.use_hierarchy

Specifies whether to collect hierarchical statistics.

Yes

N/A

memory.swappiness

Specifies the swappiness parameter of vmscan, which controls the tendency of the kernel to use the swap partition.

No

N/A

memory.priority

Specifies the memcg priority. This interface provides 13 memcg out-of-memory (OOM) priorities for ranking workloads. Valid values: 0 to 12. A larger value indicates a higher priority. The priority of a parent cgroup is not inherited by its descendant cgroups. Default value: 0.

  • This interface is used to implement memcg QoS. Priorities are not global; they are compared only among sibling cgroups under the same parent cgroup.

  • Sibling memcgs that have the same priority are sorted by memory usage, and the OOM killer selects the child memcg that consumes the most memory.

Yes

memory.priority

memory.move_charge_at_immigrate

Specifies whether the memory charges of a task are moved with the task when the task is migrated to another cgroup.

No

N/A

memory.oom_control

Specifies whether to trigger the OOM killer to terminate tasks when an OOM error occurs and generate notifications about OOM status.

No

N/A

memory.oom.group

Controls the OOM group feature that can terminate all tasks in a memcg if an OOM error occurs.

Yes

memory.oom.group

memory.pressure_level

Configures memory pressure notifications.

No

N/A

memory.kmem.limit_in_bytes

Specifies the hard limit on the memory usage of the kernel.

No

N/A

memory.kmem.usage_in_bytes

Queries the memory usage of the kernel.

No

N/A

memory.kmem.failcnt

Queries the number of times the memory usage of the kernel reached the upper limit.

No

N/A

memory.kmem.max_usage_in_bytes

Queries the maximum memory usage of the kernel.

No

N/A

memory.kmem.slabinfo

Queries the slab memory usage of the kernel.

No

N/A

memory.kmem.tcp.limit_in_bytes

Specifies the hard limit on the TCP memory usage of the kernel.

No

N/A

memory.kmem.tcp.usage_in_bytes

Queries the TCP memory usage of the kernel.

No

N/A

memory.kmem.tcp.failcnt

Queries the number of times the TCP memory usage of the kernel reached the upper limit.

No

N/A

memory.kmem.tcp.max_usage_in_bytes

Queries the maximum TCP memory usage of the kernel.

No

N/A

memory.memsw.usage_in_bytes

Queries the memory usage and swap memory usage.

No

N/A

memory.memsw.max_usage_in_bytes

Queries the maximum usage of memory and swap memory.

No

N/A

memory.memsw.limit_in_bytes

Specifies the upper limit on the total usage of memory and swap memory used by tasks in the cgroup.

No

N/A

memory.memsw.failcnt

Queries the number of times the total usage of memory and swap memory reached the upper limit.

No

N/A

memory.swap.high

Specifies the upper limit on available swap memory usage in a cgroup.

Yes

memory.swap.high

memory.swap.events

Queries the events that occur when the swap memory usage reaches the upper limit.

Yes

memory.swap.events

memory.min

Specifies a minimum amount of memory that a cgroup must retain, which is a hard guarantee of memory.

For more information, see Memcg QoS feature of the cgroup v1 interface.

Yes

memory.min

memory.low

Specifies the amount of memory that a cgroup should retain, which is a soft (best-effort) guarantee of memory. For more information, see Memcg QoS feature of the cgroup v1 interface.

Yes

memory.low

memory.high

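Specifies the throttle limit on memory usage. When the memory usage exceeds this limit, tasks in the cgroup are throttled and memory reclamation is triggered. For more information, see Memcg QoS feature of the cgroup v1 interface.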

Yes

memory.high

memory.allow_duptext

When the /sys/kernel/mm/duptext/enabled parameter is configured to globally enable the code duptext feature, the interface is used to control whether to enable the code duptext feature for tasks in a specific memcg. Valid values: 0 and 1. Default value: 0.

  • 1: enables the code duptext feature for tasks in a specific memcg.

  • 0: disables the code duptext feature for tasks in a specific memcg.

For more information, see Code duptext feature.

Yes

memory.allow_duptext

memory.allow_duptext_refresh

Specifies whether the code duptext feature takes effect immediately for a binary file that was just generated or downloaded. The code duptext feature does not take effect on pages in the PageDirty or PageWriteback state. In this case, this interface refreshes the affected tasks asynchronously so that the feature can take effect.

Yes

memory.allow_duptext_refresh

memory.duptext_nodes

Limits the duptext memory allocation nodes.

Yes

memory.duptext_nodes

memory.allow_text_unevictable

Specifies whether the code snippets of the memcg are locked in memory.

Yes

memory.allow_text_unevictable

memory.text_unevictable_percent

Specifies the ratio of the memory used by locked memcg code snippets to the total memory used by memcg code.

Yes

memory.text_unevictable_percent

memory.thp_reclaim

Controls the Transparent Huge Pages (THP) reclaim feature. Valid values:

  • reclaim: enables the THP reclaim feature.

  • swap: reserved for future use.

  • disable: disables the THP reclaim feature.

Default value: disable.

For more information, see THP reclaim.

Yes

memory.thp_reclaim

memory.thp_reclaim_stat

Queries the status of the THP reclaim feature. Parameters of this interface:

  • queue_length: the number of THPs in the queue of each node. If the THP reclaim feature is enabled, THPs are added to a reclaim queue.

  • split_hugepage: the total number of THPs that are split by the THP reclaim feature for each node.

  • reclaim_subpage: the total number of zero subpages that are reclaimed by the THP reclaim feature for each node.

The values of the preceding parameters are listed in ascending order by NUMA node ID, such as node0 and node1, from left to right.

For more information, see THP reclaim.

Yes

memory.thp_reclaim_stat

memory.thp_reclaim_ctrl

Specifies how the THP reclaim feature is triggered. Parameters of this interface:

  • threshold: the maximum number of zero subpages in a THP. If the number of zero subpages in a THP exceeds the threshold value, the THP reclaim feature is triggered. Default value: 16.

  • reclaim: triggers the THP reclaim feature.

For more information, see THP reclaim.

Yes

memory.thp_reclaim_ctrl

memory.thp_control

Controls the memcg THP feature. This interface can be used to prohibit the use of anonymous, shmem, and file THPs.

For example, you can prevent an offline memcg from using THPs. This helps reduce THP contention and memory waste, although it does not prevent memory fragmentation.

Yes

memory.thp_control

memory.reclaim_caches

Triggers the kernel to proactively reclaim the specified amount of cache from the memcg. Example: echo 100M > memory.reclaim_caches.

Yes

memory.reclaim_caches

memory.pgtable_bind

Specifies whether page table memory must be allocated from the current node.

Yes

memory.pgtable_bind

memory.pgtable_misplaced

Queries statistics about page table memory that is allocated from other NUMA nodes.

Yes

memory.pgtable_misplaced

memory.oom_offline

Marks the memcg of an offline task for the Quick OOM feature.

Yes

memory.oom_offline

memory.async_fork

Controls the Async-fork feature, formerly known as fast convergent merging (FCM), for memcgs.

Yes

memory.async_fork

memory.direct_compact_latency

Queries the latency of direct memory compaction, which is collected by the memsli feature.

Yes

memory.direct_compact_latency

memory.direct_reclaim_global_latency

Queries the latency of direct global memory reclamation, which is collected by the memsli feature.

Yes

memory.direct_reclaim_global_latency

memory.direct_reclaim_memcg_latency

Queries the latency of direct memcg memory reclamation, which is collected by the memsli feature.

Yes

memory.direct_reclaim_memcg_latency

memory.direct_swapin_latency

Queries the latency of direct memory swap-in, which is collected by the memsli feature.

Yes

memory.direct_swapin_latency

memory.direct_swapout_global_latency

Queries the latency of direct global memory swap-out, which is collected by the memsli feature.

Yes

memory.direct_swapout_global_latency

memory.direct_swapout_memcg_latency

Queries the latency of direct memcg memory swap-out, which is collected by the memsli feature.

Yes

memory.direct_swapout_memcg_latency

memory.exstat

Queries extended memory statistics that are collected by the following in-house features:

  • wmark_min_throttled_ms: the total time for which the memcg has been throttled because the adjusted global minimum watermark was exceeded.

  • wmark_reclaim_work_ms: the time that the kernel has spent reclaiming memory from the cgroup.

  • unevictable_text_size_kb: the size of the code segments (text) that are locked in memory.

  • pagecache_limit_reclaimed_kb: the amount of page cache reclaimed by the Page Cache Limit feature.

For more information, see Memcg Exstat feature.

Yes

memory.exstat

memory.idle_page_stats

Queries statistics about the idle memory that kidled tracks for the memcg, including its descendant cgroups (hierarchical statistics).

Yes

memory.idle_page_stats

memory.idle_page_stats.local

Queries statistics about the idle memory that kidled tracks for the memcg itself, excluding its descendant cgroups.

Yes

memory.idle_page_stats.local

memory.numa_stat

Queries NUMA statistics for anonymous, file, and locked memory.

No

memory.numa_stat

memory.pagecache_limit.enable

Controls the Page Cache Limit feature.

For more information, see Page Cache Limit feature.

Yes

memory.pagecache_limit.enable

memory.pagecache_limit.size

Specifies the maximum size of the page cache for the memcg.

Yes

memory.pagecache_limit.size

memory.pagecache_limit.sync

Specifies whether the Page Cache Limit feature reclaims page cache synchronously or asynchronously.

Yes

memory.pagecache_limit.sync

memory.reap_background

Specifies whether the zombie memcg reaper reclaims the memory of zombie memcgs asynchronously in the background.

Yes

memory.reap_background

memory.stat

Queries memory statistics.

No

memory.stat

memory.use_priority_oom

Controls the memcg OOM priority policy feature.

For more information, see Memcg OOM priority policy.

Yes

memory.use_priority_oom

memory.use_priority_swap

Specifies whether memory is swapped out based on cgroup priorities.

For more information, see Memcg OOM priority policy.

Yes

memory.use_priority_swap
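The per-node layout of memory.thp_reclaim_stat described above can be consumed with a small parser. The following Python sketch assumes that each line of the file holds a parameter name followed by one integer per NUMA node, separated by whitespace. The function names and the sample content are hypothetical, so verify the actual file format on your kernel first.

```python
from typing import Dict, List


def parse_thp_reclaim_stat(text: str) -> Dict[str, List[int]]:
    """Parse memory.thp_reclaim_stat into {parameter: [node0, node1, ...]}.

    Assumed layout (hypothetical): one parameter per line, written as
    "<name> <node0 value> <node1 value> ...".
    """
    stats: Dict[str, List[int]] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        name, values = fields[0], fields[1:]
        # Values appear left to right in ascending NUMA node ID order.
        stats[name] = [int(v) for v in values]
    return stats


def totals_across_nodes(stats: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum each parameter over all NUMA nodes."""
    return {name: sum(per_node) for name, per_node in stats.items()}


if __name__ == "__main__":
    # Hypothetical content for a two-node machine; real values come from
    # reading memory.thp_reclaim_stat in the memcg directory.
    sample = "queue_length 12 3\nsplit_hugepage 40 7\nreclaim_subpage 1024 96\n"
    parsed = parse_thp_reclaim_stat(sample)
    print(parsed["split_hugepage"])  # [40, 7]
    print(totals_across_nodes(parsed)["reclaim_subpage"])  # 1120
```

Summing across nodes gives a quick per-memcg total, while the per-node lists show on which node THPs are being split and zero subpages reclaimed.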

cgroup v2 interfaces

Interface name

Purpose

In-house interface

Corresponding cgroup v1 interface

memory.current

Queries the memory usage.

No

N/A

memory.min

Specifies the minimum amount of memory that the cgroup is guaranteed to retain. This is a hard memory protection.

For more information, see Memcg QoS feature of the cgroup v1 interface.

No

memory.min

memory.low

Specifies the amount of memory that the cgroup retains on a best-effort basis. This is a soft memory protection.

For more information, see Memcg QoS feature of the cgroup v1 interface.

No

memory.low

memory.high

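Specifies the throttle limit on memory usage. If usage exceeds this value, the tasks in the cgroup are throttled and put under heavy reclaim pressure.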

For more information, see Memcg QoS feature of the cgroup v1 interface.

No

memory.high

memory.max

Specifies the hard limit on memory usage. If usage reaches this limit and cannot be reduced, the OOM killer is invoked in the cgroup.

No

memory.max

memory.swap.current

Queries the amount of swap memory currently used by the cgroup and its descendants.

No

N/A

memory.swap.high

Specifies the throttle limit on swap usage in a cgroup.

No

N/A

memory.swap.max

Specifies a hard limit on swap memory.

No

N/A

memory.swap.events

Queries the number of events that occurred when swap usage reached its limits.

No

N/A

memory.oom.group

Specifies whether the OOM group feature is enabled. If it is enabled, the OOM killer treats the memcg as an indivisible workload and kills all tasks in the memcg together.

No

memory.oom.group

memory.wmark_ratio

Controls the memcg backend asynchronous reclaim feature and sets the memcg memory watermark that triggers asynchronous reclamation. Unit: percent of the memcg memory upper limit. Valid values: 0 to 100.

  • The default value is 0, which indicates that the memcg backend asynchronous reclaim feature is disabled.

  • A nonzero value enables the memcg backend asynchronous reclaim feature and sets the watermark to that percentage of the memcg memory upper limit.

For more information, see Memcg backend asynchronous reclaim.

Yes

memory.wmark_ratio

memory.wmark_high

A read-only interface.

  • When the memcg memory usage exceeds the value of this interface, backend asynchronous reclamation is started.

  • The value of this interface is calculated by using the following formula: memory.wmark_high = memory.limit_in_bytes × memory.wmark_ratio/100.

  • When the memcg backend asynchronous reclaim feature is disabled, memory.wmark_high defaults to a large value to prevent backend asynchronous reclamation from being triggered.

  • This interface file does not exist in the root memcg directory.

For more information, see Memcg backend asynchronous reclaim.

Yes

memory.wmark_high

memory.wmark_low

A read-only interface.

  • When the memcg memory usage falls below the value of this interface, backend asynchronous reclamation ends.

  • The value of this interface is calculated by using the following formula: memory.wmark_low = memory.wmark_high - memory.limit_in_bytes × memory.wmark_scale_factor/10000.

  • This interface file does not exist in the root memcg directory.

For more information, see Memcg backend asynchronous reclaim.

Yes

memory.wmark_low

memory.wmark_scale_factor

Specifies the gap between the memory.wmark_high value and the memory.wmark_low value. Unit: 0.01 percent of the memcg memory upper limit. Valid values: 1 to 1000.

  • When a cgroup is created, this interface inherits the value of its parent cgroup. The default value is 50, which indicates 0.50% of the memcg memory upper limit.

  • This interface file does not exist in the root memcg directory.

For more information, see Memcg backend asynchronous reclaim.

Yes

memory.wmark_scale_factor

memory.wmark_min_adj

The factor that is used in the memcg global minimum watermark rating feature.

The value of this interface indicates an adjustment in percentage over the global minimum watermark. Valid values: -25 to 50.

  • When a cgroup is created, this interface inherits the value of its parent cgroup. The default value is 0.

  • A negative value is a percentage adjustment over the [0, WMARK_MIN] range, where WMARK_MIN is the value of global wmark_min. For example, if memory.wmark_min_adj is -25, the WMARK_MIN of the memcg is calculated by using the following formula: memcg WMARK_MIN = WMARK_MIN + (WMARK_MIN - 0) × (-25%).

  • A positive value in the range is an adjustment in percentage over the [WMARK_MIN, WMARK_LOW] range. WMARK_MIN is the value of global wmark_min, and WMARK_LOW is the value of global wmark_low.

  • If memory usage exceeds the adjusted global minimum watermark, the memcg is throttled. The throttling time is linearly proportional to the excess memory usage and ranges from 1 to 1000 milliseconds.

For more information, see Memcg global minimum watermark rating.

Yes

memory.wmark_min_adj

memory.priority

Specifies the memcg priority. This interface provides 13 memcg OOM priorities for ranking workloads. Valid values: 0 to 12. A larger value indicates a higher priority. Descendant cgroups do not inherit the priority of their parent cgroup. Default value: 0.

  • This interface is used to implement memcg QoS. Priorities are not global: they are compared only among sibling cgroups under the same parent cgroup.

  • Sibling memcgs with the same priority are ranked by memory usage, and the child memcg that consumes the most memory is selected for the OOM kill.

For more information, see Memcg OOM priority policy.

Yes

memory.priority

memory.use_priority_oom

Controls the memcg OOM priority policy feature.

For more information, see Memcg OOM priority policy.

Yes

memory.use_priority_oom

memory.use_priority_swap

Specifies whether memory is swapped out based on cgroup priorities.

For more information, see Memcg OOM priority policy.

Yes

memory.use_priority_swap

memory.direct_reclaim_global_latency

Queries the latency of direct global memory reclamation, which is collected by the memsli feature.

Yes

memory.direct_reclaim_global_latency

memory.direct_reclaim_memcg_latency

Queries the latency of direct memcg memory reclamation, which is collected by the memsli feature.

Yes

memory.direct_reclaim_memcg_latency

memory.direct_compact_latency

Queries the latency of direct memory compaction, which is collected by the memsli feature.

Yes

memory.direct_compact_latency

memory.direct_swapout_global_latency

Queries the latency of direct global memory swap-out, which is collected by the memsli feature.

Yes

memory.direct_swapout_global_latency

memory.direct_swapout_memcg_latency

Queries the latency of direct memcg memory swap-out, which is collected by the memsli feature.

Yes

memory.direct_swapout_memcg_latency

memory.direct_swapin_latency

Queries the latency of direct memory swap-in, which is collected by the memsli feature.

Yes

memory.direct_swapin_latency

memory.exstat

Queries extended memory statistics that are collected by the following in-house features:

  • wmark_min_throttled_ms: the total time for which the memcg has been throttled because the adjusted global minimum watermark was exceeded.

  • wmark_reclaim_work_ms: the time that the kernel has spent reclaiming memory from the cgroup.

  • unevictable_text_size_kb: the size of the code segments (text) that are locked in memory.

  • pagecache_limit_reclaimed_kb: the amount of page cache reclaimed by the Page Cache Limit feature.

For more information, see Memcg Exstat.

Yes

memory.exstat

memory.pagecache_limit.enable

Controls the Page Cache Limit feature.

For more information, see Page Cache Limit feature.

Yes

memory.pagecache_limit.enable

memory.pagecache_limit.size

Specifies the maximum size of the page cache for the memcg.

For more information, see Page Cache Limit feature.

Yes

memory.pagecache_limit.size

memory.pagecache_limit.sync

Specifies whether the Page Cache Limit feature reclaims page cache synchronously or asynchronously.

For more information, see Page Cache Limit feature.

Yes

memory.pagecache_limit.sync

memory.idle_page_stats

Queries statistics about the idle memory that kidled tracks for the memcg, including its descendant cgroups (hierarchical statistics).

Yes

memory.idle_page_stats

memory.idle_page_stats.local

Queries statistics about the idle memory that kidled tracks for the memcg itself, excluding its descendant cgroups.

Yes

memory.idle_page_stats.local

memory.numa_stat

Queries NUMA statistics for anonymous, file, and locked memory.

Yes

memory.numa_stat

memory.reap_background

Specifies whether the zombie memcg reaper reclaims the memory of zombie memcgs asynchronously in the background.

Yes

memory.reap_background

memory.stat

Queries memory statistics.

No

memory.stat

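The watermark formulas for memory.wmark_high, memory.wmark_low, and memory.wmark_min_adj can be checked with a short calculation. The following Python sketch is illustrative only: it uses integer division, which may not match the kernel's exact rounding, and the formula for positive memory.wmark_min_adj values is an assumption derived by analogy with the negative-value example (a percentage of the [WMARK_MIN, WMARK_LOW] range added to WMARK_MIN). The function names are hypothetical.

```python
from typing import Optional


def wmark_high(limit_bytes: int, wmark_ratio: int) -> Optional[int]:
    """memory.wmark_high = limit * memory.wmark_ratio / 100.

    Returns None when wmark_ratio is 0, because the feature is then disabled
    and the kernel reports a large value instead of a computed watermark.
    """
    if not 0 <= wmark_ratio <= 100:
        raise ValueError("memory.wmark_ratio must be between 0 and 100")
    if wmark_ratio == 0:
        return None
    return limit_bytes * wmark_ratio // 100


def wmark_low(limit_bytes: int, wmark_ratio: int,
              wmark_scale_factor: int = 50) -> Optional[int]:
    """memory.wmark_low = wmark_high - limit * memory.wmark_scale_factor / 10000."""
    if not 1 <= wmark_scale_factor <= 1000:
        raise ValueError("memory.wmark_scale_factor must be between 1 and 1000")
    high = wmark_high(limit_bytes, wmark_ratio)
    if high is None:
        return None
    return high - limit_bytes * wmark_scale_factor // 10000


def memcg_wmark_min(global_min: int, global_low: int, adj_percent: int) -> float:
    """Apply memory.wmark_min_adj to the global minimum watermark."""
    if not -25 <= adj_percent <= 50:
        raise ValueError("memory.wmark_min_adj must be between -25 and 50")
    if adj_percent < 0:
        # Negative values scale over the [0, WMARK_MIN] range.
        return global_min + (global_min - 0) * adj_percent / 100
    # Positive values scale over [WMARK_MIN, WMARK_LOW] (assumed by analogy).
    return global_min + (global_low - global_min) * adj_percent / 100


if __name__ == "__main__":
    limit = 100 * 1024 * 1024  # 100 MiB memcg memory upper limit
    print(wmark_high(limit, 80))  # 83886080
    print(wmark_low(limit, 80))  # 83361792 (default scale factor 50)
    print(memcg_wmark_min(1000, 1250, -25))  # 750.0
    print(memcg_wmark_min(1000, 1250, 50))  # 1125.0
```

With the default memory.wmark_scale_factor of 50, background reclamation in this example starts at 80% of the limit and stops 0.50% of the limit (512 KiB) below that point.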

cpuacct

Interface name

Purpose

In-house interface

Corresponding cgroup v2 interface

cpuacct.usage

Queries the total CPU time used. Unit: nanoseconds.

No

cpu.stat, which displays similar data

cpuacct.usage_user

Queries the CPU time used in user mode. Unit: nanoseconds.

No

cpuacct.usage_sys

Queries the CPU time used in kernel mode. Unit: nanoseconds.

No

cpuacct.usage_percpu

Queries the CPU time used on each CPU. Unit: nanoseconds.

No

cpuacct.usage_percpu_user

Queries the user-mode CPU time used on each CPU. Unit: nanoseconds.

No

cpuacct.usage_percpu_sys

Queries the kernel-mode CPU time used on each CPU. Unit: nanoseconds.

No

cpuacct.usage_all

Queries the per-CPU user-mode and kernel-mode CPU time, which combines the data of the cpuacct.usage_percpu_user and cpuacct.usage_percpu_sys interfaces. Unit: nanoseconds.

No

cpuacct.stat

Queries the CPU time used in user mode and kernel mode. Unit: clock ticks (USER_HZ).

No

cpuacct.proc_stat

Queries data such as the CPU time, average loads (loadavg), and number of running tasks at the container level.

Yes

cpuacct.enable_sli

Specifies whether to collect loadavg statistics at the container level.

Yes

N/A

cpuacct.sched_cfs_statistics

Queries Completely Fair Scheduler (CFS) statistics, such as the runtime of the cgroup and the time that the cgroup spends waiting for cgroups at the same level or at other levels.

Yes

cpu.sched_cfs_statistics

cpuacct.wait_latency

Queries the latency of tasks waiting in the run queue.

Yes

cpu.wait_latency

cpuacct.cgroup_wait_latency

Queries the latency of cgroups waiting in the run queue. The wait_latency interface measures the latency of task scheduling entities (SEs), whereas the cgroup_wait_latency interface measures the latency of group SEs.

Yes

cpu.cgroup_wait_latency

cpuacct.block_latency

Queries the latency of tasks blocked due to non-I/O causes.

Yes

cpu.block_latency

cpuacct.ioblock_latency

Queries the latency of tasks blocked due to I/O operations.

Yes

cpu.ioblock_latency

io.pressure

Queries pressure stall information (PSI) for I/O, memory, and CPU, respectively. These files can be polled.

No

N/A

memory.pressure

No

cpu.pressure

No
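The io.pressure, memory.pressure, and cpu.pressure files use the standard Linux PSI text format: a some line and, where supported, a full line, each with avg10, avg60, and avg300 percentages and a cumulative total stall time in microseconds. The following Python sketch parses that format. The function names, the example path, and the sample values are hypothetical.

```python
from typing import Dict, Union

Number = Union[int, float]


def parse_psi(text: str) -> Dict[str, Dict[str, Number]]:
    """Parse io.pressure, memory.pressure, or cpu.pressure content.

    Each line looks like:
      some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    avg* values are percentages; total is the cumulative stall time in microseconds.
    """
    result: Dict[str, Dict[str, Number]] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind = fields[0]  # "some" or "full"
        metrics: Dict[str, Number] = {}
        for pair in fields[1:]:
            key, _, value = pair.partition("=")
            # Keep the cumulative counter exact; averages are decimals.
            metrics[key] = int(value) if key == "total" else float(value)
        result[kind] = metrics
    return result


def read_psi(path: str) -> Dict[str, Dict[str, Number]]:
    """Read and parse a PSI file, for example <cgroup dir>/io.pressure."""
    with open(path) as f:
        return parse_psi(f.read())


if __name__ == "__main__":
    sample = (
        "some avg10=1.50 avg60=0.75 avg300=0.20 total=123456\n"
        "full avg10=0.00 avg60=0.00 avg300=0.00 total=0\n"
    )
    psi = parse_psi(sample)
    print(psi["some"]["avg10"], psi["some"]["total"])  # 1.5 123456
```

Sampling the total counter at two points in time and dividing the difference by the interval gives the stall ratio over any window, not just the fixed 10, 60, and 300 second averages.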

freezer

Interface name

Purpose

In-house interface

Corresponding cgroup v2 interface

freezer.state

Controls the freeze state. Valid values to write: FROZEN and THAWED. When read, the interface can also return FREEZING while a freeze is in progress.

No

cgroup.freeze

freezer.self_freezing

Queries whether the cgroup is frozen because its own freezer.state is set to FROZEN.

No

N/A

freezer.parent_freezing

Queries whether a cgroup is frozen because its ancestor is frozen.

No

N/A
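Freezing a cgroup uses a different file and value in each version: cgroup v1 writes FROZEN or THAWED to freezer.state, whereas cgroup v2 writes 1 or 0 to cgroup.freeze and reports completion through the frozen key in cgroup.events. The following Python sketch shows this mapping. The function names are hypothetical, the cgroup directory you pass in (for example, /sys/fs/cgroup/freezer/mygroup on v1 or /sys/fs/cgroup/mygroup on v2) is an assumption, and writing these files requires root privileges.

```python
from pathlib import Path
from typing import Tuple


def freeze_request(cgroup_version: int, freeze: bool) -> Tuple[str, str]:
    """Return (interface file, value) that freezes or thaws a cgroup."""
    if cgroup_version == 1:
        return "freezer.state", ("FROZEN" if freeze else "THAWED")
    if cgroup_version == 2:
        return "cgroup.freeze", ("1" if freeze else "0")
    raise ValueError("cgroup_version must be 1 or 2")


def frozen_from_events(events_text: str) -> bool:
    """cgroup v2: True once cgroup.events reports 'frozen 1'."""
    for line in events_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "frozen":
            return value.strip() == "1"
    return False


def set_frozen(cgroup_dir: Path, cgroup_version: int, freeze: bool) -> None:
    """Write the freeze request; the state change completes asynchronously."""
    name, value = freeze_request(cgroup_version, freeze)
    (cgroup_dir / name).write_text(value)


def is_frozen(cgroup_dir: Path, cgroup_version: int) -> bool:
    """Check whether the freeze has completed."""
    if cgroup_version == 1:
        # freezer.state may read FREEZING while the freeze is still in progress.
        return (cgroup_dir / "freezer.state").read_text().strip() == "FROZEN"
    return frozen_from_events((cgroup_dir / "cgroup.events").read_text())
```

Because freezing is not instantaneous in either version, a caller would typically call set_frozen and then poll is_frozen until it returns True.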

ioasids

The cgroup v1 interfaces and the cgroup v2 interfaces of the ioasids subsystem are the same.

Interface name

Purpose

In-house interface

ioasids.current

Queries the number of ioasids allocated to the current cgroup.

Yes

ioasids.events

Queries the number of events that occurred because the upper limit of allocable ioasids was exceeded.

Yes

ioasids.max

Queries the total number of ioasids that can be allocated to the current cgroup.

Yes

net_cls and net_prio

Interface name

Purpose

In-house interface

Corresponding cgroup v2 interface

net_cls.classid

Specifies the class identifier that is used to tag network packets from the current cgroup. This interface works with qdisc or iptables.

No

N/A

Note

The corresponding interfaces are removed from cgroup v2. You can use eBPF to filter and shape traffic instead.

net_prio.prioidx

Queries the index value of the current cgroup in the data structure. The interface is read-only and used internally by the kernel.

No

net_prio.ifpriomap

Specifies the network priority value for each network interface controller (NIC).

No
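net_cls.classid stores a traffic control handle in the form 0xAAAABBBB, where AAAA is the major number and BBBB is the minor number, as described in the Linux kernel net_cls documentation. For example, writing 0x100001 sets the handle 10:1. The following Python sketch converts between the two notations. The function names are hypothetical.

```python
def encode_classid(major: int, minor: int) -> int:
    """Pack a major:minor handle into the 0xAAAABBBB net_cls.classid value."""
    if not (0 <= major <= 0xFFFF and 0 <= minor <= 0xFFFF):
        raise ValueError("major and minor must each fit in 16 bits")
    return (major << 16) | minor


def decode_classid(classid: int) -> str:
    """Render a net_cls.classid value in hexadecimal 'major:minor' notation."""
    major, minor = classid >> 16, classid & 0xFFFF
    return f"{major:x}:{minor:x}"


if __name__ == "__main__":
    print(hex(encode_classid(0x10, 0x1)))  # 0x100001
    print(decode_classid(1048577))  # 10:1
```

When the file is read back, the kernel reports the value in decimal (1048577 for 0x100001), which is why decode_classid accepts a plain integer.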

perf_event

The perf_event subsystem does not provide interfaces. The perf_event subsystem is enabled by default for cgroup v2 and provides the same functionality as the perf_event subsystem in cgroup v1.

pids

The cgroup v1 interfaces and the cgroup v2 interfaces of the pids subsystem are the same.

Interface name

Purpose

In-house interface

pids.max

Specifies the maximum number of tasks in a cgroup.

No

pids.current

Queries the current number of tasks in a cgroup.

No

pids.events

Queries the number of times that a fork operation failed because the maximum number of tasks was reached. Changes to this file generate file-modified events through fsnotify, so you can monitor it for new events.

No
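pids.max holds either an integer or the string max (no limit), pids.current holds an integer, and pids.events has a max entry that counts fork failures caused by the limit. The following Python sketch, with hypothetical function names, turns these values into the remaining task headroom and the failure count. It works on file contents rather than paths, so it can be tried without a real cgroup.

```python
from typing import Optional


def parse_pids_max(text: str) -> Optional[int]:
    """Return the pids.max limit, or None when the file holds 'max' (no limit)."""
    value = text.strip()
    return None if value == "max" else int(value)


def remaining_tasks(current_text: str, max_text: str) -> Optional[int]:
    """Number of tasks that can still be created before fork() starts failing."""
    limit = parse_pids_max(max_text)
    if limit is None:
        return None  # unlimited
    return max(limit - int(current_text.strip()), 0)


def fork_failures(events_text: str) -> int:
    """Read the 'max' entry of pids.events: fork failures caused by the limit."""
    for line in events_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "max":
            return int(value)
    return 0


if __name__ == "__main__":
    print(remaining_tasks("12\n", "64\n"))  # 52
    print(remaining_tasks("12\n", "max\n"))  # None
    print(fork_failures("max 3\n"))  # 3
```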

rdma

The cgroup v1 interfaces and the cgroup v2 interfaces of the rdma subsystem are the same.

Interface name

Purpose

In-house interface

rdma.max

Specifies the upper limit on the resource usage of the Remote Direct Memory Access (RDMA) adapter.

No

rdma.current

Queries the resource usage of the RDMA adapter.

No
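rdma.max and rdma.current use one line per RDMA device, such as mlx4_0 hca_handle=2 hca_object=2000, where max means no limit. The following Python sketch parses that format and builds a line to write to rdma.max. The device names in the example are placeholders taken from the Linux kernel cgroup documentation, and the function names are hypothetical.

```python
from typing import Dict, Optional

Limits = Dict[str, Optional[int]]


def parse_rdma(text: str) -> Dict[str, Limits]:
    """Parse rdma.max or rdma.current into {device: {resource: value}}.

    A value of 'max' (no limit) is returned as None.
    """
    devices: Dict[str, Limits] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        device, pairs = fields[0], fields[1:]
        limits: Limits = {}
        for pair in pairs:
            key, _, value = pair.partition("=")
            limits[key] = None if value == "max" else int(value)
        devices[device] = limits
    return devices


def format_rdma_limit(device: str, hca_handle: Optional[int] = None,
                      hca_object: Optional[int] = None) -> str:
    """Build one line for rdma.max; None is written as 'max' (no limit)."""
    def fmt(v: Optional[int]) -> str:
        return "max" if v is None else str(v)
    return f"{device} hca_handle={fmt(hca_handle)} hca_object={fmt(hca_object)}"


if __name__ == "__main__":
    sample = "mlx4_0 hca_handle=2 hca_object=2000\nocrdma1 hca_handle=3 hca_object=max\n"
    print(parse_rdma(sample)["ocrdma1"])  # {'hca_handle': 3, 'hca_object': None}
    print(format_rdma_limit("mlx4_0", 2, 2000))  # mlx4_0 hca_handle=2 hca_object=2000
```

Writing the string returned by format_rdma_limit to rdma.max sets the limits for one device; comparing parse_rdma output for rdma.current and rdma.max shows how close each device is to its limits.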