In Linux, Control Groups (cgroups) provide a resource management and restriction mechanism that limits, accounts for, and isolates the physical resources, such as CPU, memory, and I/O resources, that are allocated to the tasks (processes) in a cgroup. A parent cgroup can be used to control the resource utilization of its descendant cgroups. cgroup v1 and cgroup v2 are the two major versions of cgroups and differ significantly in design and usage. This topic describes the main differences between cgroup v1 and cgroup v2.
Common interface differences
cgroup v1 interfaces
Interface name | Purpose | In-house interface | Corresponding cgroup v2 interface |
cgroup.procs | Writes process IDs (PIDs) to migrate tasks to a cgroup. | No | cgroup.procs. |
cgroup.clone_children | A value of 1 indicates that child cgroups inherit the cpuset configuration of the parent cgroup. Note This interface takes effect only on the cpuset subsystem and is classified as a common interface for historical reasons. | No | N/A |
cgroup.sane_behavior | Provides access to the experimental "sane behavior" features that preceded cgroup v2. The interface is retained for backward compatibility. | No | N/A |
notify_on_release | A value of 1 indicates that the command specified in the release_agent interface is executed when a cgroup becomes empty. Note The release_agent interface exists only in the root cgroup. | No | cgroup.events, which implements similar functionality |
release_agent | No | ||
tasks | Writes thread IDs (TIDs) to migrate threads to a cgroup. | No | cgroup.threads. |
pool_size | Controls the size of the cgroup cache pool. The cgroup cache pool helps accelerate the creation and binding of cgroups in high-concurrency scenarios. Note The interface depends on cgroup_rename and cannot be used in cgroup v2. | Yes | N/A |
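For example, in cgroup v1 a process is migrated by writing its PID to cgroup.procs, and a single thread by writing its TID to tasks. A minimal sketch, assuming the cpu hierarchy is mounted at /sys/fs/cgroup/cpu and using a hypothetical cgroup named test:

# Create a cgroup in the cpu hierarchy (the name "test" is an example)
mkdir /sys/fs/cgroup/cpu/test
# Migrate the current shell into the cgroup
echo $$ > /sys/fs/cgroup/cpu/test/cgroup.procs
# Migrate a single thread by its TID instead
echo <tid> > /sys/fs/cgroup/cpu/test/tasks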
cgroup v2 interfaces
Interface name | Purpose | In-house interface | Corresponding cgroup v1 interface |
cgroup.procs | Writes PIDs to migrate tasks to a cgroup. | No | cgroup.procs. |
cgroup.type | Writes the string "threaded" to change a cgroup to a threaded cgroup to provide thread-granularity control. Note Only cpu, pids, and perf_event threaded controllers are supported. | No | N/A |
cgroup.threads | Writes TIDs to migrate threads to a cgroup. Note The string "threaded" must be written to the cgroup.type interface file. | No | tasks. |
cgroup.controllers | Queries all subsystems available for the current cgroup. | No | N/A |
cgroup.subtree_control | Specifies which subsystems are enabled to control resource distribution from the cgroup to its child cgroups. Note The subsystems can be queried by using the cgroup.controllers interface. | No | N/A |
cgroup.events | Queries whether active processes exist in the current cgroup and whether the current cgroup is frozen. You can use fsnotify to monitor this interface for status changes. Note This interface does not exist in the root cgroup. | No | notify_on_release and release_agent, which are used together to implement similar functionality |
cgroup.max.descendants | Controls the maximum number of descendant cgroups allowed in the current cgroup. | No | N/A |
cgroup.max.depth | Controls the maximum depth of descendant cgroups allowed in the current cgroup. | No | N/A |
cgroup.stat | Queries the number of descendant cgroups of the current cgroup and the number of descendant cgroups in the Dying (deleted) state. | No | N/A |
cgroup.freeze | Controls whether to freeze tasks in a cgroup. Note This interface does not exist in the root cgroup. | No | freezer.state in the freezer subsystem |
cpu.stat | Queries statistics about CPU utilization. | No | N/A |
io.pressure | Queries Pressure Stall Information (PSI) for I/O, memory, and CPU resources. The information can be polled. | No | io.pressure, memory.pressure, and cpu.pressure interfaces in the cpuacct subsystem, which implement the PSI feature |
memory.pressure | No | ||
cpu.pressure | No |
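In cgroup v2, a controller's interface files appear in a child cgroup only after the controller is enabled in the parent's cgroup.subtree_control. A minimal sketch, assuming the unified hierarchy is mounted at /sys/fs/cgroup and using a hypothetical child cgroup named test:

# Enable the cpu and memory controllers for children of the root cgroup
echo "+cpu +memory" > /sys/fs/cgroup/cgroup.subtree_control
mkdir /sys/fs/cgroup/test
# List the controllers now available to the child
cat /sys/fs/cgroup/test/cgroup.controllers
# Check whether the cgroup has live processes and whether it is frozen
cat /sys/fs/cgroup/test/cgroup.events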
Subsystem interface differences
CPU
cgroup v1 interfaces
Interface name | Purpose | In-house interface | Corresponding cgroup v2 interface |
cpu.shares | Controls the weight, based on which CPU time slices are allocated to tasks in a cgroup. Default value: 1024. | No | cpu.weight and cpu.weight.nice, which use a different unit |
cpu.idle | Controls whether to use an idle scheduling policy for the current cgroup. An idle scheduling policy allocates time slices based on the smallest CPU share. The minimum runtime is no longer guaranteed, which allows CPU resources to be easily preempted by non-idle tasks. Note If the cpu.idle value is set to 1, the cpu.shares interface becomes unwritable and a minimum share of 3 takes effect. | No | cpu.idle |
cpu.priority | The fine-grained preemptive priority. Preemption is checked on clock ticks and task wake-ups. The priority difference determines whether a high-priority task preempts CPU resources from a low-priority task. | Yes | cpu.priority |
cpu.cfs_quota_us | The CPU runtime controlled by using Completely Fair Scheduler (CFS). cpu.cfs_quota_us specifies the maximum CPU runtime of tasks in a cgroup within a period defined by the cpu.cfs_period_us interface. | No | cpu.max |
cpu.cfs_period_us | No | ||
cpu.cfs_burst_us | The duration in which tasks can burst within a period defined by the cpu.cfs_period_us interface. For more information, see Enable the CPU burst feature for cgroup v1. | No | cpu.max.burst |
cpu.cfs_init_buffer_us | The duration in which tasks in a cgroup can burst when the tasks are initiated. | Yes | cpu.max.init_buffer |
cpu.stat | Queries statistics about CPU runtime, such as the number of cpu.cfs_period_us periods and the number of times CPU resources used by tasks were throttled. | No | cpu.stat |
cpu.rt_runtime_us | Controls the real-time CPU runtime. cpu.rt_runtime_us specifies the maximum runtime of real-time tasks in a cgroup within the period defined by the cpu.rt_period_us interface. | No | N/A |
cpu.rt_period_us | No | N/A | |
cpu.bvt_warp_ns | Controls the group identity attribute to change the identities of cgroups, which is used to distinguish online tasks from offline tasks and provide better CPU quality of service (QoS) guarantees for online tasks. For more information, see Group identity feature. | Yes | cpu.bvt_warp_ns |
cpu.identity | Yes | cpu.identity | |
cpu.ht_stable | Specifies whether to generate simultaneous multithreading (SMT) peer noise to maintain consistent SMT computing power. | Yes | N/A |
cpu.ht_ratio | Controls whether to use quotas to provide extra computing power when the SMT peer is idle to maintain consistent SMT computing power. | Yes | cpu.ht_ratio |
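For example, cpu.cfs_quota_us and cpu.cfs_period_us work as a pair: the quota is the CPU time budget granted in each period. The following sketch, using a hypothetical cgroup named test, limits its tasks to half of one CPU:

# 50 ms of runtime per 100 ms period, that is, 0.5 CPU in total
echo 100000 > /sys/fs/cgroup/cpu/test/cpu.cfs_period_us
echo 50000 > /sys/fs/cgroup/cpu/test/cpu.cfs_quota_us
# Double the relative weight against sibling cgroups (default: 1024)
echo 2048 > /sys/fs/cgroup/cpu/test/cpu.shares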
cgroup v2 interfaces
cgroup v2 no longer supports the cpuacct subsystem. Specific interfaces or related features of the cpuacct subsystem are implemented by the CPU subsystem in cgroup v2.
Interface name | Purpose | In-house interface | Corresponding cgroup v1 interface |
cpu.weight | Controls the weight, based on which CPU time slices are allocated to tasks in a cgroup. Default value: 100. | No | cpu.shares, which uses a different unit |
cpu.weight.nice | Controls the weight, based on which CPU time slices are allocated to tasks in a cgroup. Default value: 0. | No | cpu.shares, which uses a different unit |
cpu.idle | Controls whether to use an idle scheduling policy for the current cgroup. An idle scheduling policy allocates time slices based on the smallest CPU share. The minimum runtime is no longer guaranteed, which allows CPU resources to be easily preempted by non-idle tasks. Note When the cpu.idle value is 1, the cpu.weight and cpu.weight.nice interfaces become unwritable, and a minimum weight of 0.3 takes effect. In this case, the displayed cpu.weight value is rounded to 0. | No | cpu.idle |
cpu.priority | The fine-grained preemptive priority. Preemption is checked on clock ticks and task wake-ups. The priority difference determines whether a high-priority task preempts CPU resources from a low-priority task. | Yes | cpu.priority |
cpu.max | The CPU runtime controlled by using CFS. This interface combines the v1 cpu.cfs_quota_us and cpu.cfs_period_us interfaces: it specifies the maximum CPU runtime of tasks in a cgroup within each period. | No | cpu.cfs_quota_us, cpu.cfs_period_us |
cpu.max.burst | The duration in which tasks can burst within a period defined by the cpu.max interface. | No | cpu.max.burst |
cpu.max.init_buffer | The duration in which tasks in a cgroup can burst when the tasks are initiated. | Yes | cpu.cfs_init_buffer_us |
cpu.bvt_warp_ns | Controls the group identity attribute to change the identities of cgroups, which is used to distinguish online tasks from offline tasks and provide better CPU QoS guarantees for online tasks. | Yes | cpu.bvt_warp_ns |
cpu.identity | Yes | cpu.identity | |
cpu.sched_cfs_statistics | Queries statistics about CFS, such as the runtime of a cgroup and the waiting time of cgroups at the same level or different levels. Note The kernel.sched_schedstats option must be enabled. | Yes | cpuacct.sched_cfs_statistics |
cpu.wait_latency | Queries the latency of tasks waiting in the queue. Note The kernel.sched_schedstats and /proc/cpusli/sched_lat_enabled options must be enabled. | Yes | cpuacct.wait_latency |
cpu.cgroup_wait_latency | Queries the latency of cgroups waiting in the queue. The wait_latency interface counts the latency of task scheduling entities (SEs), and the cgroup_wait_latency interface counts the latency of group SEs. Note The kernel.sched_schedstats and /proc/cpusli/sched_lat_enabled options must be enabled. | Yes | cpuacct.cgroup_wait_latency |
cpu.block_latency | Queries the latency of tasks blocked due to non-I/O causes. Note The kernel.sched_schedstats and /proc/cpusli/sched_lat_enabled options must be enabled. | Yes | cpuacct.block_latency |
cpu.ioblock_latency | Queries the latency of tasks blocked due to I/O operations. Note The kernel.sched_schedstats and /proc/cpusli/sched_lat_enabled options must be enabled. | Yes | cpuacct.ioblock_latency |
cpu.ht_ratio | Controls whether to use quotas to provide extra computing power when the SMT peer is idle to maintain consistent SMT computing power. Note This interface takes effect only if the core scheduling feature is enabled. | Yes | cpu.ht_ratio |
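cpu.max merges the v1 quota and period into a single file that takes both values on one line; writing the string max removes the quota. A sketch that applies the same 0.5-CPU limit as the v1 example, again for a hypothetical cgroup named test:

# Quota and period in one write: 50 ms of runtime per 100 ms period
echo "50000 100000" > /sys/fs/cgroup/test/cpu.max
# Remove the quota but keep the 100 ms period
echo "max 100000" > /sys/fs/cgroup/test/cpu.max
# v2 weights range from 1 to 10000 with a default of 100
echo 200 > /sys/fs/cgroup/test/cpu.weight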
cpuset
cgroup v1 interfaces
Interface name | Purpose | In-house interface | Corresponding cgroup v2 interface |
cpuset.cpus | Controls the CPUs on which tasks can run. Note Tasks cannot be attached to a cgroup when this interface is empty. | No | cpuset.cpus |
cpuset.mems | Controls the non-uniform memory access (NUMA) nodes that can be allocated to tasks in a cgroup. Note Tasks cannot be attached to a cgroup when this interface is empty. | No | cpuset.mems |
cpuset.effective_cpus | Queries the effective CPUs on which tasks are running. The value of this interface is affected by CPU hotplug events. | No | cpuset.cpus.effective |
cpuset.effective_mems | Queries the effective NUMA nodes that are allocated to the running tasks. The value of this interface is affected by memory node hotplug events. | No | cpuset.mems.effective |
cpuset.cpu_exclusive | Controls which CPUs are exclusively used by a cgroup and cannot be shared with sibling cpusets. | No | cpuset.cpus.partition, which implements similar functionality |
cpuset.mem_exclusive | Controls which NUMA nodes are exclusively used by a cgroup and cannot be shared with sibling cpusets. | No | N/A |
cpuset.mem_hardwall | A value of 1 indicates that memory only from the memory nodes that are attached to the cpuset can be allocated to tasks. | No | N/A |
cpuset.sched_load_balance | Controls whether CPUs are load-balanced within the cpuset. By default, the feature is enabled. | No | N/A |
cpuset.sched_relax_domain_level | Controls the range in which to search for CPUs when a scheduler migrates tasks to load-balance CPUs for the tasks. Default value: -1. | No | N/A |
cpuset.memory_migrate | A non-zero value indicates that if a task is allocated a memory page in a cpuset and migrated to another cpuset, the memory page can also be migrated to the new cpuset. | No | N/A |
cpuset.memory_pressure | Calculates the memory paging pressure of the current cpuset. | No | N/A |
cpuset.memory_spread_page | A value of 1 indicates that the kernel evenly allocates the page cache to the memory nodes of the cpuset. | No | N/A |
cpuset.memory_spread_slab | A value of 1 indicates that the kernel evenly allocates the slab caches to the memory nodes of the cpuset. | No | N/A |
cpuset.memory_pressure_enabled | A value of 1 indicates that memory pressure statistics collection is enabled for the cpuset. | No | N/A |
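As noted above, a v1 cpuset accepts tasks only after both cpuset.cpus and cpuset.mems are set. A minimal sketch with hypothetical values for a cgroup named test:

# Pin the cgroup to CPUs 0-3 and NUMA node 0; both writes are required
echo 0-3 > /sys/fs/cgroup/cpuset/test/cpuset.cpus
echo 0 > /sys/fs/cgroup/cpuset/test/cpuset.mems
# Only now can tasks be attached
echo <pid> > /sys/fs/cgroup/cpuset/test/cgroup.procs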
cgroup v2 interfaces
Interface name | Purpose | In-house interface | Corresponding cgroup v1 interface |
cpuset.cpus | Controls the CPUs on which tasks can run. Note When the value of this interface is empty, the CPUs of the parent cpuset are used. | No | cpuset.cpus |
cpuset.mems | Controls the NUMA nodes that can be allocated to tasks in a cgroup. Note When the value of this interface is empty, the NUMA nodes of the parent cpuset are used. | No | cpuset.mems |
cpuset.cpus.effective | Queries the effective CPUs on which tasks are running. The value of this interface is affected by CPU hotplug events. | No | cpuset.effective_cpus |
cpuset.mems.effective | Queries the effective NUMA nodes that are allocated to the running tasks. The value of this interface is affected by memory node hotplug events. | No | cpuset.effective_mems |
cpuset.cpus.partition | Controls whether CPUs of a cpuset are exclusively used. If root is written into the interface, CPUs of a cpuset are exclusively used. | No | cpuset.cpu_exclusive, which implements similar functionality |
.__DEBUG__.cpuset.cpus.subpartitions | Queries which CPUs are exclusively used when root is written into the cpuset.cpus.partition interface. Note This interface is available only if the cgroup_debug feature is enabled in the kernel cmdline. | No | N/A |
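For example, writing root to cpuset.cpus.partition turns a v2 cpuset into an exclusive partition, which replaces the v1 cpuset.cpu_exclusive flag. A sketch, assuming the cpuset controller is enabled for a hypothetical cgroup named test:

echo 4-7 > /sys/fs/cgroup/test/cpuset.cpus
# Claim CPUs 4-7 exclusively for this subtree
echo root > /sys/fs/cgroup/test/cpuset.cpus.partition
# Reads back "root", or "root invalid" if the request cannot be honored
cat /sys/fs/cgroup/test/cpuset.cpus.partition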
blkio
cgroup v1 interfaces
Interface name | Purpose | In-house interface | Corresponding cgroup v2 interface |
blkio.throttle.read_bps_device | Specifies the maximum number of bytes per second that a cgroup can read from a device. Example: echo "<major>:<minor> <bps>" > /sys/fs/cgroup/blkio/<cgroup>/blkio.throttle.read_bps_device | No | io.max |
blkio.throttle.write_bps_device | Specifies the maximum number of bytes per second that a cgroup can write to a device. Example: echo "<major>:<minor> <bps>" > /sys/fs/cgroup/blkio/<cgroup>/blkio.throttle.write_bps_device | No | io.max |
blkio.throttle.read_iops_device | Specifies the maximum number of read operations per second that a cgroup can perform on a device. Example: echo "<major>:<minor> <iops>" > /sys/fs/cgroup/blkio/<cgroup>/blkio.throttle.read_iops_device | No | io.max |
blkio.throttle.write_iops_device | Specifies the maximum number of write operations per second that a cgroup can perform on a device. Example: echo "<major>:<minor> <iops>" > /sys/fs/cgroup/blkio/<cgroup>/blkio.throttle.write_iops_device | No | io.max |
blkio.throttle.io_service_bytes | Queries bandwidth statistics. This interface collects the read, write, sync, async, discard, and total bandwidth statistics of all devices. Unit: bytes. | No | io.stat |
blkio.throttle.io_service_bytes_recursive | The recursive version of the blkio.throttle.io_service_bytes interface. Statistics collected by using the blkio.throttle.io_service_bytes_recursive interface include data of descendant cgroups. | No | N/A |
blkio.throttle.io_serviced | Queries IOPS statistics. This interface collects the read, write, sync, async, discard, and total IOPS statistics of all devices. | No | io.stat |
blkio.throttle.io_serviced_recursive | The recursive version of the blkio.throttle.io_serviced interface. Statistics collected by using the blkio.throttle.io_serviced_recursive interface include data of descendant cgroups. | No | N/A |
blkio.throttle.io_service_time | Queries the duration between request dispatch and request completion for I/O operations, which is used to measure the average I/O latency. For more information, see Enhance the monitoring of block I/O throttling. | Yes | io.extstat |
blkio.throttle.io_wait_time | Queries the duration when I/O operations wait in scheduler queues, which is used to measure the average I/O latency. For more information, see Enhance the monitoring of block I/O throttling. | Yes | io.extstat |
blkio.throttle.io_completed | Queries the number of completed I/O operations, which is used to measure the average I/O latency. For more information, see Enhance the monitoring of block I/O throttling. | Yes | io.extstat |
blkio.throttle.total_bytes_queued | Queries the number of I/O bytes that were throttled, which is used to analyze whether I/O latency is related to throttling. For more information, see Enhance the monitoring of block I/O throttling. | Yes | io.extstat |
blkio.throttle.total_io_queued | Queries the number of I/O operations that were throttled, which is used to analyze whether I/O latency is related to throttling. For more information, see Enhance the monitoring of block I/O throttling. | Yes | io.extstat |
blkio.cost.model | Specifies the blk-iocost cost model. The control mode (ctrl) can be set to auto or user. This interface exists only in the root cgroup. Example: echo "<major>:<minor> ctrl=user model=linear rbps=<rbps> rseqiops=<rseqiops> rrandiops=<rrandiops> wbps=<wbps> wseqiops=<wseqiops> wrandiops=<wrandiops>" > /sys/fs/cgroup/blkio/blkio.cost.model For more information, see Configure the blk-iocost weight-based throttling feature. | Yes | io.cost.model |
blkio.cost.qos | Controls the blk-iocost feature and configures a QoS policy to check for disk congestion. This interface exists only in the root cgroup. Example: echo "<major>:<minor> enable=1 ctrl=user rpct= rlat=5000 wpct=95.00 wlat=5000 min=50.00 max=150.00" > /sys/fs/cgroup/blkio/blkio.cost.qos For more information, see Configure blk-iocost weight throttling. | Yes | io.cost.qos |
blkio.cost.weight | Specifies the cgroup weight. This interface exists only in non-root cgroups and can be configured in multiple modes. For more information, see Configure the blk-iocost weight-based throttling feature. | Yes | io.cost.weight |
blkio.cost.stat | Queries the blk-iocost statistics. The interface exists only in non-root cgroups. | Yes | N/A |
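Each v1 throttle interface takes one "<major>:<minor> <value>" tuple per write. For example, to cap reads from a hypothetical device 253:0 at 10 MB/s and 1,000 IOPS for a cgroup named test:

# Find device numbers with lsblk or ls -l /dev/<device>
echo "253:0 10485760" > /sys/fs/cgroup/blkio/test/blkio.throttle.read_bps_device
echo "253:0 1000" > /sys/fs/cgroup/blkio/test/blkio.throttle.read_iops_device
# Read back per-device bandwidth statistics
cat /sys/fs/cgroup/blkio/test/blkio.throttle.io_service_bytes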
cgroup v2 interfaces
Interface name | Purpose | In-house interface | Corresponding cgroup v1 interface |
io.max | The throttling interface that specifies the read and write throttling rates in byte/s and IOPS. Example: echo "<major>:<minor> rbps=<bps> wbps=<bps> riops=<iops> wiops=<iops>" > /sys/fs/cgroup/<cgroup>/io.max | No | blkio.throttle.read_bps_device blkio.throttle.read_iops_device blkio.throttle.write_bps_device blkio.throttle.write_iops_device |
io.stat | Queries I/O operation statistics, which include the rates of read, write, and discard operations in byte/s and IOPS. | No | blkio.throttle.io_service_bytes blkio.throttle.io_serviced |
io.extstat | Queries extended I/O statistics, including the wait time, service time, number of completed I/O operations, and throttling rates in byte/s and IOPS. | No | blkio.throttle.io_service_time blkio.throttle.io_wait_time blkio.throttle.io_completed blkio.throttle.total_bytes_queued blkio.throttle.total_io_queued |
io.cost.model | Specifies the blk-iocost cost model. The control mode (ctrl) can be set to auto or user. This interface exists only in the root cgroup. Example: echo "<major>:<minor> ctrl=user model=linear rbps=<rbps> rseqiops=<rseqiops> rrandiops=<rrandiops> wbps=<wbps> wseqiops=<wseqiops> wrandiops=<wrandiops>" > /sys/fs/cgroup/io.cost.model For more information, see Configure blk-iocost weight throttling. | No | blkio.cost.model |
io.cost.qos | Controls the blk-iocost feature and configures a QoS policy to check for disk congestion. This interface exists only in the root cgroup. Example: echo "<major>:<minor> enable=1 ctrl=user rpct= rlat=5000 wpct=95.00 wlat=5000 min=50.00 max=150.00" > /sys/fs/cgroup/io.cost.qos For more information, see Configure blk-iocost weight throttling. | No | blkio.cost.qos |
io.cost.weight | Specifies the cgroup weight. This interface exists only in non-root cgroups and can be configured in multiple modes. For more information, see Configure blk-iocost weight throttling. | No | blkio.cost.weight |
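io.max collapses the four v1 throttle files into a single line per device, and any key that is omitted stays unlimited. The same 10 MB/s read cap as the v1 example, assuming the io controller is enabled for a hypothetical cgroup named test:

# rbps/wbps are in bytes per second; riops/wiops are operations per second
echo "253:0 rbps=10485760 riops=1000" > /sys/fs/cgroup/test/io.max
# Clear a single limit by writing "max" for its key
echo "253:0 rbps=max" > /sys/fs/cgroup/test/io.max
cat /sys/fs/cgroup/test/io.stat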
memory
cgroup v1 interfaces
Interface name | Purpose | In-house interface | Corresponding cgroup v2 interface |
memory.usage_in_bytes | Queries the current memory usage. | No | N/A |
memory.max_usage_in_bytes | Queries the maximum memory usage. | No | N/A |
memory.limit_in_bytes | Specifies the hard upper limit on memory usage. | No | N/A |
memory.soft_limit_in_bytes | Specifies the soft limit on memory usage, which is enforced on a best-effort basis under memory pressure. | No | N/A |
memory.failcnt | Queries the number of times the memory usage reached the upper limit. | No | N/A |
memory.mglru_batch_size | Specifies the batch size of memory that is proactively reclaimed based on the Multi-Generational Least Recently Used (MGLRU) framework. The CPU is yielded between reclamation batches. | Yes | N/A |
memory.mglru_reclaim_kbytes | Specifies the size of memory that is proactively reclaimed based on the MGLRU framework. | Yes | N/A |
memory.wmark_ratio | Controls the memcg backend asynchronous reclaim feature and sets the memcg memory watermark that triggers asynchronous reclamation. Unit: percent of the memcg memory upper limit. Valid values: 0 to 100. For more information, see Memcg backend asynchronous reclaim. | Yes | memory.wmark_ratio |
memory.wmark_high | A read-only interface. For more information, see Memcg backend asynchronous reclaim. | Yes | memory.wmark_high |
memory.wmark_low | A read-only interface. For more information, see Memcg backend asynchronous reclaim. | Yes | memory.wmark_low |
memory.wmark_scale_factor | Specifies the interval between the memory.wmark_high value and the memory.wmark_low value. Unit: 0.01 percent of the memcg memory upper limit. Valid values: 1 to 1000. For more information, see Memcg backend asynchronous reclaim. | Yes | memory.wmark_scale_factor |
memory.wmark_min_adj | The factor that is used in the memcg global minimum watermark rating feature. The value of this interface indicates an adjustment in percentage over the global minimum watermark. Valid values: -25 to 50. For more information, see Memcg global minimum watermark rating. | Yes | memory.wmark_min_adj |
memory.force_empty | Specifies whether to forcefully reclaim memory pages. | No | N/A |
memory.use_hierarchy | Specifies whether to collect hierarchical statistics. | Yes | N/A |
memory.swappiness | Specifies the swappiness parameter of vmscan, which controls the tendency of the kernel to use the swap partition. | No | N/A |
memory.priority | Specifies the memcg priority. This interface provides 13 memcg out-of-memory (OOM) priorities to rank workloads. Valid values: 0 to 12. A larger value indicates a higher priority. The priority of a parent cgroup is not inherited by its descendant cgroups. Default value: 0. For more information, see Memcg OOM priority policy. | Yes | memory.priority |
memory.move_charge_at_immigrate | Specifies whether the charges of a task are moved along with the task when the task is migrated between cgroups, which is a statistical control policy. | No | N/A |
memory.oom_control | Specifies whether to trigger the OOM killer to terminate tasks when an OOM error occurs and generate notifications about OOM status. | No | N/A |
memory.oom.group | Controls the OOM group feature that can terminate all tasks in a memcg if an OOM error occurs. | Yes | memory.oom.group |
memory.pressure_level | Configures memory pressure notifications. | No | N/A |
memory.kmem.limit_in_bytes | Specifies the hard limit on the memory usage of the kernel. | No | N/A |
memory.kmem.usage_in_bytes | Queries the memory usage of the kernel. | No | N/A |
memory.kmem.failcnt | Queries the number of times the memory usage of the kernel reached the upper limit. | No | N/A |
memory.kmem.max_usage_in_bytes | Queries the maximum memory usage of the kernel. | No | N/A |
memory.kmem.slabinfo | Queries the slab memory usage of the kernel. | No | N/A |
memory.kmem.tcp.limit_in_bytes | Specifies the hard limit on the TCP memory usage of the kernel. | No | N/A |
memory.kmem.tcp.usage_in_bytes | Queries the TCP memory usage of the kernel. | No | N/A |
memory.kmem.tcp.failcnt | Queries the number of times the TCP memory usage of the kernel reached the upper limit. | No | N/A |
memory.kmem.tcp.max_usage_in_bytes | Queries the maximum TCP memory usage of the kernel. | No | N/A |
memory.memsw.usage_in_bytes | Queries the memory usage and swap memory usage. | No | N/A |
memory.memsw.max_usage_in_bytes | Queries the maximum total usage of memory and swap memory. | No | N/A |
memory.memsw.limit_in_bytes | Specifies the upper limit on the total usage of memory and swap memory used by tasks in the cgroup. | No | N/A |
memory.memsw.failcnt | Queries the number of times the total usage of memory and swap memory reached the upper limit. | No | N/A |
memory.swap.high | Specifies the upper limit on available swap memory usage in a cgroup. | Yes | memory.swap.high |
memory.swap.events | Queries the events that occur when the swap memory usage reaches the upper limit. | Yes | memory.swap.events |
memory.min | Specifies a minimum amount of memory that a cgroup must retain, which is a hard guarantee of memory. For more information, see Memcg QoS feature of the cgroup v1 interface. | Yes | memory.min |
memory.low | Specifies the lower limit of memory that a cgroup can retain, which is a soft guarantee of memory. For more information, see Memcg QoS feature of the cgroup v1 interface. | Yes | memory.low |
memory.high | Specifies the throttle limit of the memory usage. For more information, see Memcg QoS feature of the cgroup v1 interface. | Yes | memory.high |
memory.allow_duptext | When the /sys/kernel/mm/duptext/enabled parameter is configured to globally enable the code duptext feature, the interface is used to control whether to enable the code duptext feature for tasks in a specific memcg. Valid values: 0 and 1. Default value: 0. For more information, see Code duptext feature. | Yes | memory.allow_duptext |
memory.allow_duptext_refresh | Specifies whether the code duptext feature is applied immediately when a binary file is generated or downloaded. The feature does not take effect on pages in the PageDirty or PageWriteback state; in those cases, this interface uses asynchronous tasks to refresh the pages. | Yes | memory.allow_duptext_refresh |
memory.duptext_nodes | Limits the duptext memory allocation nodes. | Yes | memory.duptext_nodes |
memory.allow_text_unevictable | Specifies whether the code sections of a memcg are locked as unevictable. | Yes | memory.allow_text_unevictable |
memory.text_unevictable_percent | Specifies the ratio of the locked code-section memory of a memcg to its total code-section memory. | Yes | memory.text_unevictable_percent |
memory.thp_reclaim | Controls the Transparent Huge Pages (THP) reclaim feature. Default value: disable. For more information, see THP reclaim. | Yes | memory.thp_reclaim |
memory.thp_reclaim_stat | Queries the status of the THP reclaim feature. The parameter values are listed in ascending order by NUMA node ID, such as node0 and node1, from left to right. For more information, see THP reclaim. | Yes | memory.thp_reclaim_stat |
memory.thp_reclaim_ctrl | Specifies how the THP reclaim feature is triggered. For more information, see THP reclaim. | Yes | memory.thp_reclaim_ctrl |
memory.thp_control | Controls the memcg THP feature. This interface can be used to prohibit the application of anon, shmem, and file THPs. For example, an offline memcg is not allowed to use THPs. This helps reduce THP contention and memory waste, even though memory fragmentation cannot be prevented. | Yes | memory.thp_control |
memory.reclaim_caches | Specifies whether the kernel proactively reclaims the cache in memcgs. | Yes | memory.reclaim_caches |
memory.pgtable_bind | Specifies whether to forcefully apply for page table memory on the current node. | Yes | memory.pgtable_bind |
memory.pgtable_misplaced | Queries statistics about page memory in page tables when page memory is allocated across nodes. | Yes | memory.pgtable_misplaced |
memory.oom_offline | In the Quick OOM feature, you can use this interface to mark the memcg of an offline task. | Yes | memory.oom_offline |
memory.async_fork | Controls the Async-fork feature, formerly known as fast convergent merging (FCM), for memcgs. | Yes | memory.async_fork |
memory.direct_compact_latency | Queries the direct memory compaction latency monitored by the memsli feature. | Yes | memory.direct_compact_latency |
memory.direct_reclaim_global_latency | Queries the direct global memory reclaim latency monitored by the memsli feature. | Yes | memory.direct_reclaim_global_latency |
memory.direct_reclaim_memcg_latency | Queries the direct memcg memory reclaim latency monitored by the memsli feature. | Yes | memory.direct_reclaim_memcg_latency |
memory.direct_swapin_latency | Queries the direct memory swap-in latency monitored by the memsli feature. | Yes | memory.direct_swapin_latency |
memory.direct_swapout_global_latency | Queries the direct global memory swap-out latency monitored by the memsli feature. | Yes | memory.direct_swapout_global_latency |
memory.direct_swapout_memcg_latency | Queries the direct memcg memory swap-out latency monitored by the memsli feature. | Yes | memory.direct_swapout_memcg_latency |
memory.exstat | Queries statistics about extended memory and extra memory that are collected for in-house features. For more information, see Memcg Exstat feature. | Yes | memory.exstat |
memory.idle_page_stats | Queries statistics about kidled memory usage of a memcg and the hierarchical information of the cgroup. | Yes | memory.idle_page_stats |
memory.idle_page_stats.local | Queries statistics about kidled memory usage of a memcg. | Yes | memory.idle_page_stats.local |
memory.numa_stat | Queries NUMA statistics for anonymous, file, and locked memory. | No | memory.numa_stat |
memory.pagecache_limit.enable | Controls the Page Cache Limit feature. For more information, see Page Cache Limit feature. | Yes | memory.pagecache_limit.enable |
memory.pagecache_limit.size | Specifies the size of the limited page cache. | Yes | memory.pagecache_limit.size |
memory.pagecache_limit.sync | Specifies the mode of the Page Cache Limit feature, which is synchronous or asynchronous. | Yes | memory.pagecache_limit.sync |
memory.reap_background | Specifies whether zombie memcg reapers reap the memory of memcgs asynchronously in the background. | Yes | memory.reap_background |
memory.stat | Queries memory statistics. | No | memory.stat |
memory.use_priority_oom | Controls the memcg OOM priority policy feature. For more information, see Memcg OOM priority policy. | Yes | memory.use_priority_oom |
memory.use_priority_swap | Specifies whether the memory is swapped based on the priorities of cgroups. For more information, see Memcg OOM priority policy. | Yes | memory.use_priority_swap |
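As a v1 example, the limit and usage interfaces combine as follows for a hypothetical cgroup named test (the k, m, and g suffixes are accepted):

# Hard limit: allocations beyond 1 GiB trigger reclaim and, if needed, OOM
echo 1g > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
# Soft limit: a best-effort target enforced under global memory pressure
echo 512m > /sys/fs/cgroup/memory/test/memory.soft_limit_in_bytes
cat /sys/fs/cgroup/memory/test/memory.usage_in_bytes
# Number of times the hard limit was hit
cat /sys/fs/cgroup/memory/test/memory.failcnt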
cgroup v2 interfaces
Interface name | Purpose | In-house interface | Corresponding cgroup v1 interface |
memory.current | Queries the memory usage. | No | N/A |
memory.min | Specifies a minimum amount of memory that a cgroup must retain, which is a hard guarantee of memory. For more information, see Memcg QoS feature of the cgroup v1 interface. | No | memory.min |
memory.low | Specifies the lower limit of memory that a cgroup can retain, which is a soft guarantee of memory. For more information, see Memcg QoS feature of the cgroup v1 interface. | No | memory.low |
memory.high | Specifies the throttle limit on memory usage. For more information, see Memcg QoS feature of the cgroup v1 interface. | No | memory.high |
memory.max | Specifies the hard upper limit on memory usage. | No | memory.max |
memory.swap.current | Queries swap memory in use. | No | N/A |
memory.swap.high | Specifies the upper limit on available swap memory usage in a cgroup. | No | N/A |
memory.swap.max | Specifies a hard limit on swap memory. | No | N/A |
memory.swap.events | Queries the events that occur when the swap memory usage reaches the upper limit. | No | N/A |
memory.oom.group | Specifies whether the OOM group feature is enabled, which can kill all tasks in a memcg if an OOM error occurs. | No | memory.oom.group |
memory.wmark_ratio | Controls the memcg backend asynchronous reclaim feature and sets the memcg memory watermark that triggers asynchronous reclamation. Unit: percent of the memcg memory upper limit. Valid values: 0 to 100. For more information, see Memcg backend asynchronous reclaim. | Yes | memory.wmark_ratio |
memory.wmark_high | A read-only interface. For more information, see Memcg backend asynchronous reclaim. | Yes | memory.wmark_high |
memory.wmark_low | A read-only interface. For more information, see Memcg backend asynchronous reclaim. | Yes | memory.wmark_low |
memory.wmark_scale_factor | Specifies the interval between the memory.wmark_high value and the memory.wmark_low value. Unit: 0.01 percent of the memcg memory upper limit. Valid values: 1 to 1000. For more information, see Memcg backend asynchronous reclaim. | Yes | memory.wmark_scale_factor |
memory.wmark_min_adj | The factor that is used in the memcg global minimum watermark rating feature. The value of this interface indicates an adjustment in percentage over the global minimum watermark. Valid values: -25 to 50. For more information, see Memcg global minimum watermark rating. | Yes | memory.wmark_min_adj |
memory.priority | Specifies the memcg priority. This interface provides 13 memcg OOM priorities to rank workloads. Valid values: 0 to 12. A larger value indicates a higher priority. The priority of a parent cgroup is not inherited by its descendant cgroups. Default value: 0. For more information, see Memcg OOM priority policy. | Yes | memory.priority |
memory.use_priority_oom | Controls the memcg OOM priority policy feature. For more information, see Memcg OOM priority policy. | Yes | memory.use_priority_oom |
memory.use_priority_swap | Specifies whether the memory is swapped based on the priorities of cgroups. For more information, see Memcg OOM priority policy. | Yes | memory.use_priority_swap |
memory.direct_reclaim_global_latency | Queries the direct global memory reclaim latency monitored by the memsli feature. | Yes | memory.direct_reclaim_global_latency |
memory.direct_reclaim_memcg_latency | Queries the direct memcg memory reclaim latency monitored by the memsli feature. | Yes | memory.direct_reclaim_memcg_latency |
memory.direct_compact_latency | Queries the direct memory compaction latency monitored by the memsli feature. | Yes | memory.direct_compact_latency |
memory.direct_swapout_global_latency | Queries the direct global memory swap-out latency monitored by the memsli feature. | Yes | memory.direct_swapout_global_latency |
memory.direct_swapout_memcg_latency | Queries the direct memcg memory swap-out latency monitored by the memsli feature. | Yes | memory.direct_swapout_memcg_latency |
memory.direct_swapin_latency | Queries the direct memory swap-in latency monitored by the memsli feature. | Yes | memory.direct_swapin_latency |
memory.exstat | Queries statistics about extended memory and extra memory that are collected for in-house features. For more information, see Memcg Exstat. | Yes | memory.exstat |
memory.pagecache_limit.enable | Controls the Page Cache Limit feature. For more information, see Page Cache Limit feature. | Yes | memory.pagecache_limit.enable |
memory.pagecache_limit.size | Specifies the size of the limited page cache. For more information, see Page Cache Limit feature. | Yes | memory.pagecache_limit.size |
memory.pagecache_limit.sync | Specifies the mode of the Page Cache Limit feature, which is synchronous or asynchronous. For more information, see Page Cache Limit feature. | Yes | memory.pagecache_limit.sync |
memory.idle_page_stats | Queries statistics about kidled memory of individual memcgs of each hierarchy. | Yes | memory.idle_page_stats |
memory.idle_page_stats.local | Queries statistics about kidled memory of individual memcgs. | Yes | memory.idle_page_stats.local |
memory.numa_stat | Queries NUMA statistics for anonymous, file, and locked memory. | Yes | memory.numa_stat |
memory.reap_background | Specifies whether zombie memcg reapers reap the memory of memcgs asynchronously in the background. | Yes | memory.reap_background |
memory.stat | Queries memory statistics. | No | memory.stat |
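In cgroup v2, the same policy is expressed through the min/low/high/max ladder. A sketch for a hypothetical cgroup named test with the memory controller enabled:

# Hard guarantee: this memory is never reclaimed from the cgroup
echo 256M > /sys/fs/cgroup/test/memory.min
# Throttle limit: usage above this is aggressively reclaimed
echo 768M > /sys/fs/cgroup/test/memory.high
# Hard limit: the OOM killer fires if usage cannot be kept below it
echo 1G > /sys/fs/cgroup/test/memory.max
cat /sys/fs/cgroup/test/memory.current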
cpuacct
Interface name | Purpose | In-house interface | Corresponding cgroup v2 interface |
cpuacct.usage | Queries the total CPU time used. Unit: nanoseconds. | No | cpu.stat, which displays similar data |
cpuacct.usage_user | Queries the CPU time used in user mode. Unit: nanoseconds. | No | |
cpuacct.usage_sys | Queries the CPU time used in kernel mode. Unit: nanoseconds. | No | |
cpuacct.usage_percpu | Queries the CPU time used on each CPU. Unit: nanoseconds. | No |
cpuacct.usage_percpu_user | Queries the CPU time used on each CPU in user mode. Unit: nanoseconds. | No |
cpuacct.usage_percpu_sys | Queries the CPU time used on each CPU in kernel mode. Unit: nanoseconds. | No |
cpuacct.usage_all | Queries the summary of the cpuacct.usage_percpu_user and cpuacct.usage_percpu_sys interfaces. Unit: nanoseconds. | No | |
cpuacct.stat | Queries the CPU time used in user mode and kernel mode. Unit: tick. | No | |
cpuacct.proc_stat | Queries data such as the CPU time, average loads (loadavg), and number of running tasks at the container level. | Yes | |
cpuacct.enable_sli | Controls whether to collect loadavg statistics at the container level. | Yes | N/A |
cpuacct.sched_cfs_statistics | Queries statistics about CFS, such as the runtime of a cgroup and the waiting time of cgroups at the same level or different levels. | Yes | cpu.sched_cfs_statistics |
cpuacct.wait_latency | Queries the latency of tasks waiting in the queue. | Yes | cpu.wait_latency |
cpuacct.cgroup_wait_latency | Queries the latency of cgroups waiting in the queue. The wait_latency interface counts the latency of task SEs, and the cgroup_wait_latency interface counts the latency of group SEs. | Yes | cpu.cgroup_wait_latency |
cpuacct.block_latency | Queries the latency of tasks blocked due to non-I/O causes. | Yes | cpu.block_latency |
cpuacct.ioblock_latency | Queries the latency of tasks blocked due to I/O operations. | Yes | cpu.ioblock_latency |
io.pressure | Queries PSI for I/O, memory, and CPU resources. The information can be polled. | No | io.pressure, memory.pressure, and cpu.pressure common interfaces in cgroup v2 |
memory.pressure | No | ||
cpu.pressure | No |
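The cpuacct files are read-only. For example, for a hypothetical v1 cgroup named test:

# Total CPU time consumed by the cgroup, in nanoseconds
cat /sys/fs/cgroup/cpuacct/test/cpuacct.usage
# PSI output: "some" and "full" lines with avg10/avg60/avg300 percentages
cat /sys/fs/cgroup/cpuacct/test/cpu.pressure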
freezer
Interface name | Purpose | In-house interface | Corresponding cgroup v2 interface |
freezer.state | Controls the freeze status. Writable values: FROZEN and THAWED. A read can also return FREEZING while tasks are being frozen. | No | cgroup.freeze |
freezer.self_freezing | Queries whether a cgroup is frozen because of its own frozen state. | No | N/A |
freezer.parent_freezing | Queries whether a cgroup is frozen because its ancestor is frozen. | No | N/A |
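For example, v1 freezes a cgroup by writing a state name to freezer.state, whereas v2 uses a 0/1 flag in cgroup.freeze:

# cgroup v1: freeze, check, and thaw a hypothetical cgroup named test
echo FROZEN > /sys/fs/cgroup/freezer/test/freezer.state
cat /sys/fs/cgroup/freezer/test/freezer.state
echo THAWED > /sys/fs/cgroup/freezer/test/freezer.state
# cgroup v2 equivalent
echo 1 > /sys/fs/cgroup/test/cgroup.freeze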
ioasids
The cgroup v1 interfaces and the cgroup v2 interfaces of the ioasids subsystem are the same.
Interface name | Purpose | In-house interface |
ioasids.current | Queries the number of ioasids allocated to the current cgroup. | Yes |
ioasids.events | Queries the number of events that occurred because the upper limit of allocable ioasids was exceeded. | Yes |
ioasids.max | Specifies the maximum number of ioasids that can be allocated to the current cgroup. | Yes |
net_cls and net_prio
Interface name | Purpose | In-house interface | Corresponding cgroup v2 interface |
net_cls.classid | Specifies the class identifier that tags network packets of the current cgroup. This interface works with tc qdisc or iptables rules. | No | N/A Note The corresponding interfaces are removed from cgroup v2. You can use eBPF to filter and shape traffic. |
net_prio.prioidx | Queries the index value of the current cgroup in the data structure. The interface is read-only and used internally by the kernel. | No | |
net_prio.ifpriomap | Specifies the network priority value for each network interface controller (NIC). | No |
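net_cls.classid encodes a tc class handle as 0xAAAABBBB, where AAAA is the major number and BBBB the minor. A sketch that tags the traffic of a hypothetical cgroup named test with class 10:1 (the NIC name eth0 is an example):

# Tag packets from this cgroup with tc class 10:1 (0x0010:0x0001)
echo 0x00100001 > /sys/fs/cgroup/net_cls/test/net_cls.classid
# Give the cgroup's traffic priority 5 on eth0
echo "eth0 5" > /sys/fs/cgroup/net_prio/test/net_prio.ifpriomap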
perf_event
The perf_event subsystem does not provide interfaces. The perf_event subsystem is enabled by default for cgroup v2 and provides the same functionality as the perf_event subsystem in cgroup v1.
pids
The cgroup v1 interfaces and the cgroup v2 interfaces of the pids subsystem are the same.
Interface name | Purpose | In-house interface |
pids.max | Specifies the maximum number of tasks in a cgroup. | No |
pids.current | Queries the current number of tasks in a cgroup. | No |
pids.events | Queries the number of times a fork operation failed because the maximum number of tasks was reached. fsnotify can be used to monitor the interface for notifications about these events. | No |
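For example, to cap the number of tasks in a hypothetical cgroup named test and check for rejected forks:

# Allow at most 100 tasks; further fork()/clone() calls fail
echo 100 > /sys/fs/cgroup/pids/test/pids.max
cat /sys/fs/cgroup/pids/test/pids.current
# The "max" counter shows how many forks were rejected by the limit
cat /sys/fs/cgroup/pids/test/pids.events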
rdma
The cgroup v1 interfaces and the cgroup v2 interfaces of the rdma subsystem are the same.
Interface name | Purpose | In-house interface |
rdma.max | Specifies the upper limit on the resource usage of the Remote Direct Memory Access (RDMA) adapter. | No |
rdma.current | Queries the resource usage of the RDMA adapter. | No |
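For example, rdma.max takes per-device limits on HCA handles and HCA objects (the device name mlx5_0 is an example):

# Limit the cgroup to 3 HCA handles and 10000 HCA objects on mlx5_0
echo "mlx5_0 hca_handle=3 hca_object=10000" > /sys/fs/cgroup/rdma/test/rdma.max
cat /sys/fs/cgroup/rdma/test/rdma.current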