Alibaba Cloud Linux 2 with kernel version 4.19.91-24.al7 and later and Alibaba Cloud Linux 3 with kernel version 5.10.46-7.al8 and later support the group identity feature. You can use the group identity feature to configure different identities for CPU control groups (cgroups) to define the priorities of processes (tasks) in the cgroups.
Background information
When you deploy latency-sensitive tasks and computing tasks on the same instance, the Linux kernel scheduler must provide more scheduling opportunities to high-priority tasks to minimize their scheduling latency and the impact of low-priority tasks on kernel scheduling. For this scenario, Alibaba Cloud Linux provides the group identity feature and adds interfaces that you can use to configure scheduling priorities for CPU cgroups. Tasks that have different priorities have the following characteristics:
High-priority tasks have minimal wakeup latency.
Low-priority tasks do not affect the performance of high-priority tasks.
Waking up low-priority tasks does not affect the performance of high-priority tasks.
Low-priority tasks do not degrade the performance of high-priority tasks by sharing hardware units with them.
Prerequisites
In Alibaba Cloud Linux 2 with kernel version 4.19.91-26, 4.19.91-26.1, 4.19.91-26.2, or 4.19.91-26.3, the group identity feature is disabled in the kernel. You can run the uname -r command to query the kernel version of Alibaba Cloud Linux 2.
In Alibaba Cloud Linux 3 with kernel version 5.10.112-11.al8, 5.10.112-11.1.al8, 5.10.112-11.2.al8, 5.10.134-12.al8, 5.10.134-12.1.al8, or 5.10.134-12.2.al8, the group identity feature is disabled in the kernel.
If you use the group identity feature on Alibaba Cloud Linux 2 with a kernel version in the range of 4.19.91-25.1.al7 to 4.19.91-25.5.al7, downtime occurs. In this case, upgrade the kernel to 4.19.91-25.6.al7 or later. For more information, see the Upgrade the kernel section of the "Change the kernel version" topic.
If Alibaba Cloud Linux 3 with kernel version 5.10.134-12.2.al8 uses the x86_64 architecture, run the following commands to enable the group identity feature:
yum makecache
sudo yum install scheduler-group-identity.x86_64 -y
In Alibaba Cloud Linux 2 with kernel version 4.19.91-26.4 or later and Alibaba Cloud Linux 3 with kernel version 5.10.134-13.al8 or later, the /proc/sys/kernel/sched_group_identity_enabled interface is added to allow you to enable the group identity feature. Before you can use the group identity feature, you must run the sudo sh -c 'echo 1 > /proc/sys/kernel/sched_group_identity_enabled' command to enable the feature.
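The version rules in this section can be checked from a script. The following is a minimal sketch, not an official tool: the function names gi_kernel_note and gi_enabled_state are hypothetical, the version patterns only mirror the lists above, and the script assumes that uname -r prints the version followed by a suffix that starts with .al7 or .al8.

```shell
#!/bin/sh
# Hypothetical helpers; the version lists are copied from the prerequisites above.

# Print "disabled", "unstable", or "unlisted" for a kernel release string
# such as the output of `uname -r`.
gi_kernel_note() {
    base=${1%%.al*}    # drop the ".al7..." or ".al8..." suffix
    case "$base" in
        4.19.91-26|4.19.91-26.[1-3])  echo disabled ;;
        5.10.112-11|5.10.112-11.[12]) echo disabled ;;
        5.10.134-12|5.10.134-12.[12]) echo disabled ;;
        # Downtime can occur; upgrade to 4.19.91-25.6.al7 or later.
        4.19.91-25.[1-5])             echo unstable ;;
        *)                            echo unlisted ;;
    esac
}

# Print the content of the enable switch, or "missing" if this kernel
# does not provide it. The path can be overridden, for example in tests.
gi_enabled_state() {
    f=${1:-/proc/sys/kernel/sched_group_identity_enabled}
    if [ -r "$f" ]; then cat "$f"; else echo missing; fi
}

gi_kernel_note "$(uname -r)"
gi_enabled_state
```

A result of unlisted only means that the release does not match one of the versions named above. It does not by itself confirm that the feature is supported.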
How the group identity feature works
The group identity feature allows you to configure identities for CPU cgroups to define the priorities of tasks in the cgroups. The group identity feature relies on a dual red-black tree architecture. A low-priority red-black tree based on the red-black tree of the Completely Fair Scheduler (CFS) scheduling queue is added to store low-priority tasks.
When the kernel schedules the tasks for which identities are configured, the kernel processes the tasks based on the priorities of the tasks. The following table describes the identities in descending order of priority.
Identity | Description |
ID_HIGHCLASS | Identifies a high-priority task. A high-priority task has more opportunities to preempt resources than a normal- or low-priority task. When the CFS schedules high-priority tasks, the following scenarios may occur: |
ID_NORMAL | Identifies a normal-priority task. A normal-priority task has more opportunities to preempt resources than a low-priority task. When the CFS schedules normal-priority tasks, the following scenarios may occur: |
ID_UNDERCLASS | Identifies a low-priority task. When the CFS schedules low-priority tasks, the following scenarios may occur: If an |
The preceding identities apply based on the resource management policies of the CPU cgroups.
For tasks in CPU cgroups of the same level, the identity priorities take effect.
For tasks in CPU cgroups of different levels, the identity priorities do not take effect on tasks in the parent cgroups but take effect on tasks in the child cgroups.
Tasks that have the same identity priority compete for resources in compliance with the CFS policies. Take note that the runtime of tasks identified by the ID_UNDERCLASS or ID_NORMAL identity may not reach the minimum value.
Other identities
Identity | Description |
ID_SMT_EXPELLER | Identifies an SMT expeller task. When an SMT expeller task runs on an SMT CPU, the task evicts the tasks that are identified by the |
ID_IDLE_SEEKER | Specifies that when a task wakes up, the task attempts to find idle CPUs within the limits of the scheduler policies. |
ID_IDLE_SAVER | Used together with the |
Interfaces
Interfaces used to configure identities
The group identity feature provides the following interfaces to allow you to configure task identities: /sys/fs/cgroup/cpu/$cg/cpu.identity and /sys/fs/cgroup/cpu/$cg/cpu.bvt_warp_ns. The $cg variable specifies the child cgroup directory node on which a task runs. Before you use the interfaces, take note of the following items:
The cpu.bvt_warp_ns interface is a quick configuration interface. The value written to the interface is converted into identity values.
You can use the cpu.identity and cpu.bvt_warp_ns interfaces to change the identities of cgroups.
The identity value that is written by using the cpu.identity interface overwrites the identity value that is previously written by using the cpu.bvt_warp_ns interface, but the value of the cpu.bvt_warp_ns interface remains unchanged.
The identity value that is written by using the cpu.bvt_warp_ns interface overwrites the identity value that is previously written by using the cpu.identity interface, but the value of the cpu.identity interface remains unchanged.
You can use one of the interfaces to configure task identities. We recommend that you do not use the interfaces at the same time.
If you are unfamiliar with the operations related to the operating system kernel, we recommend that you do not use the cpu.identity interface.
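Because each interface keeps its own value after the other one is written, reading both files shows whether a cgroup was configured through both of them. The following is a minimal sketch under stated assumptions: gi_config_source is a hypothetical helper, and it assumes that both files read back as plain decimal integers. It cannot detect a later write of 0, because 0 is also the default value.

```shell
#!/bin/sh
# Hypothetical helper: report which interface was used to configure a cgroup.
# $1 is the cgroup directory, for example /sys/fs/cgroup/cpu/$cg.
gi_config_source() {
    id=$(cat "$1/cpu.identity" 2>/dev/null) || return 1
    warp=$(cat "$1/cpu.bvt_warp_ns" 2>/dev/null) || return 1
    if [ "$id" -ne 0 ] && [ "$warp" -ne 0 ]; then
        # Both interfaces hold a non-default value, so the effective
        # identity depends on which one was written last.
        echo "mixed"
    elif [ "$id" -ne 0 ]; then
        echo "cpu.identity"
    elif [ "$warp" -ne 0 ]; then
        echo "cpu.bvt_warp_ns"
    else
        echo "default"
    fi
}
```

For example, gi_config_source "/sys/fs/cgroup/cpu/$cg" prints mixed for a cgroup that should be reviewed, because both interfaces were used on it.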
The following table describes the interfaces.
Interface | Description |
cpu.identity | The default value is 0, which specifies the ID_NORMAL identity. The interface is a 5-bit field. Valid values of each bit: 0 and 1. A value of 0 specifies that the identity is not assumed. A value of 1 specifies that the identity is assumed. If the interface is left empty, the ID_NORMAL identity is used. Description of each bit: Bit 0: specifies the ID_UNDERCLASS identity. Bit 1: specifies the ID_HIGHCLASS identity. Bit 2: specifies the ID_SMT_EXPELLER identity. Bit 3: specifies the ID_IDLE_SAVER identity. Bit 4: specifies the ID_IDLE_SEEKER identity. For example, if you want to set the identity of a cgroup to ID_HIGHCLASS and ID_IDLE_SEEKER, set bit 1 and bit 4 to 1 and the other bits to 0 to obtain a binary value of 10010, which is converted into a decimal value of 18. Then, run the echo 18 > /sys/fs/cgroup/cpu/$cg/cpu.identity command to write 18 to the cpu.identity interface. |
cpu.bvt_warp_ns | The default value is 0, which specifies the ID_NORMAL identity. Valid values: 2: specifies the ID_SMT_EXPELLER, ID_IDLE_SEEKER, and ID_HIGHCLASS identities. The corresponding value in the cpu.identity interface is 22. 1: specifies the ID_HIGHCLASS and ID_IDLE_SEEKER identities. The corresponding value in the cpu.identity interface is 18. 0: specifies the ID_NORMAL identity. The corresponding value in the cpu.identity interface is 0. -1: specifies the ID_UNDERCLASS and ID_IDLE_SAVER identities. The corresponding value in the cpu.identity interface is 9. -2: specifies the ID_UNDERCLASS and ID_IDLE_SAVER identities. The corresponding value in the cpu.identity interface is 9. |
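The bit layout and the cpu.bvt_warp_ns mapping described in the table can be reproduced with shell arithmetic. The following sketch is for illustration only: gi_identity_value and gi_warp_to_identity are hypothetical helper names, and the script only computes values. It does not write to any cgroup.

```shell
#!/bin/sh
# Hypothetical helpers that mirror the bit layout and mapping in the table.

# Combine identity names into the decimal value for cpu.identity.
gi_identity_value() {
    value=0
    for name in "$@"; do
        case "$name" in
            ID_UNDERCLASS)   bit=0 ;;
            ID_HIGHCLASS)    bit=1 ;;
            ID_SMT_EXPELLER) bit=2 ;;
            ID_IDLE_SAVER)   bit=3 ;;
            ID_IDLE_SEEKER)  bit=4 ;;
            *) echo "unknown identity: $name" >&2; return 1 ;;
        esac
        value=$(( value | (1 << bit) ))
    done
    echo "$value"
}

# Translate a cpu.bvt_warp_ns value into the equivalent cpu.identity value.
gi_warp_to_identity() {
    case "$1" in
        2)     gi_identity_value ID_SMT_EXPELLER ID_IDLE_SEEKER ID_HIGHCLASS ;;
        1)     gi_identity_value ID_HIGHCLASS ID_IDLE_SEEKER ;;
        0)     echo 0 ;;
        -1|-2) gi_identity_value ID_UNDERCLASS ID_IDLE_SAVER ;;
        *) echo "unsupported cpu.bvt_warp_ns value: $1" >&2; return 1 ;;
    esac
}

gi_identity_value ID_HIGHCLASS ID_IDLE_SEEKER   # prints 18
gi_warp_to_identity 2                           # prints 22
gi_warp_to_identity -1                          # prints 9
```

Computing the value from identity names avoids manual binary-to-decimal conversion mistakes when a combination other than the predefined cpu.bvt_warp_ns values is needed.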
Note: By default, Alibaba Cloud Linux supports the cgroup v1 interfaces. Alibaba Cloud Linux 3 with kernel version 5.10.134-13 and later in the 5.10 kernel series also supports the following cgroup v2 interfaces for the group identity feature: /sys/fs/cgroup/$cg/cpu.identity and /sys/fs/cgroup/$cg/cpu.bvt_warp_ns. The $cg variable specifies the child cgroup directory node on which a task runs.
Interfaces used to enable or disable kernel scheduling features
You can run the following command to view the default settings of kernel scheduling features by using the sched_features interface:
sudo cat /sys/kernel/debug/sched_features
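The output of sched_features is a list of feature names in which a disabled feature carries a NO_ prefix. The default values in the table below follow the same convention. The following sketch checks the state of a single feature in that output. The helper name gi_feature_state is hypothetical, and the function works on text passed to it, so it does not need debugfs access itself.

```shell
#!/bin/sh
# Hypothetical helper: report whether a scheduling feature is enabled.
# $1 is the feature name; $2 is the text read from sched_features.
gi_feature_state() {
    # $2 is left unquoted on purpose so that it splits into feature names.
    for word in $2; do
        case "$word" in
            "$1")    echo enabled;  return 0 ;;
            "NO_$1") echo disabled; return 0 ;;
        esac
    done
    echo absent
}
```

For example, gi_feature_state ID_LOOSE_EXPEL "$(sudo cat /sys/kernel/debug/sched_features)" prints disabled on a system that uses the default settings. In the upstream Linux kernel, writing a feature name to the sched_features file enables the feature and writing the name with the NO_ prefix disables it. This sketch assumes that the group identity features behave the same way.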
The following table describes the kernel scheduling features.
Kernel scheduling feature | Description | Default value |
ID_IDLE_AVG | This feature is used together with the ID_IDLE_SAVER identity to count the runtime of ID_UNDERCLASS tasks towards the idle time. This ensures that no CPUs remain idle when only ID_UNDERCLASS tasks are running, and prevents resource waste. | ID_IDLE_AVG: indicates that the feature is enabled. |
ID_RESCUE_EXPELLEE | This feature is used in load balancing scenarios. If tasks cannot find available CPU resources, CPUs that are evicting ID_UNDERCLASS tasks are used to balance loads. This feature helps move ID_UNDERCLASS tasks out of the evicted state at the earliest opportunity. | ID_RESCUE_EXPELLEE: indicates that the feature is enabled. |
ID_EXPELLEE_NEVER_HOT | After this feature is enabled, if a request is initiated to migrate a task that is being evicted to another CPU, hot cache does not cause the migration request to be denied. This feature helps move ID_UNDERCLASS tasks out of the evicted state at the earliest opportunity. | NO_ID_EXPELLEE_NEVER_HOT: indicates that the feature is disabled. |
ID_LOOSE_EXPEL | After this feature is enabled, CPUs do not update the eviction status every time they select tasks. Instead, the status is updated automatically at the interval specified by the sched_expel_update_interval kernel parameter. This feature affects only the status updates that occur when CPUs select tasks. The updates for inter-processor interrupts (IPIs) are not affected. | NO_ID_LOOSE_EXPEL: indicates that the feature is disabled. |
ID_LAST_HIGHCLASS_STAY | After this feature is enabled, the last ID_HIGHCLASS task that runs on a CPU cannot be migrated to another CPU. | ID_LAST_HIGHCLASS_STAY: indicates that the feature is enabled. |
ID_EXPELLER_SHARE_CORE | If this feature is enabled, ID_SMT_EXPELLER tasks can preferentially run on physical cores on which ID_SMT_EXPELLER tasks are already running. If this feature is disabled, ID_SMT_EXPELLER tasks are distributed across physical cores. This way, the ID_SMT_EXPELLER tasks do not interfere with each other. | ID_EXPELLER_SHARE_CORE: indicates that the feature is enabled. |
ID_ABSOLUTE_EXPEL | In Alibaba Cloud Linux 3, this feature is available in kernel version 5.10.134-16.3 and later in the 5.10 kernel series. After this feature is enabled, ID_UNDERCLASS tasks are absolutely suppressed and cannot be scheduled if ID_NORMAL or ID_HIGHCLASS tasks exist in the run queues. In the worst case, ID_UNDERCLASS tasks starve. In hybrid deployment scenarios, assess the loads of tasks that have different identities before you enable the feature. | NO_ID_ABSOLUTE_EXPEL: indicates that the feature is disabled. |
ID_LOAD_BALANCE | In Alibaba Cloud Linux 3, this feature is available in kernel version 5.10.134-16.3 and later in the 5.10 kernel series. After this feature is enabled, the scheduler considers the CPUs on which only ID_UNDERCLASS tasks run to be idle and attempts to migrate ID_HIGHCLASS tasks to the idle CPUs when it balances loads. During the migration, the scheduler tries to distribute the ID_HIGHCLASS tasks among the CPUs. This prevents CPU resource contention and Hyper-Threading (HT) interference between the ID_HIGHCLASS tasks and ensures that each ID_HIGHCLASS task obtains sufficient CPU resources. | NO_ID_LOAD_BALANCE: indicates that the feature is disabled. |
Interfaces used by sysctl to configure kernel parameters
Specific capabilities of the group identity feature depend on the values of kernel parameters. The following table describes the kernel parameters.
Kernel parameter | Description | Unit | Default value |
/proc/sys/kernel/sched_expel_update_interval | The interval at which the eviction status is automatically updated when a CPU selects tasks. This kernel parameter takes effect only if the ID_LOOSE_EXPEL feature is enabled. | ms | 10 |
/proc/sys/kernel/sched_expel_idle_balance_delay | The minimum idle balance interval when a CPU is evicting tasks. A value of -1 specifies that idle balance is not allowed. If only ID_UNDERCLASS tasks exist on a CPU and the tasks are being evicted, the CPU is idle. Idle balance is performed on the CPU to improve load-balancing effects. However, this may harm the ID_UNDERCLASS tasks. You can specify the sched_expel_idle_balance_delay parameter to alleviate this issue. | ms | -1 |
/proc/sys/kernel/sched_idle_saver_wmark | The watermark for CPU idle time. When an ID_IDLE_SAVER task wakes up, the task attempts to find an idle CPU whose idle time exceeds the specified watermark. | ns | 0 |
/proc/sys/kernel/sched_group_identity_enabled | Starting from kernel version 4.19.91-26.4, the /proc/sys/kernel/sched_group_identity_enabled interface is added to allow you to enable the group identity feature. Before you can use the group identity feature, you must run the echo 1 > /proc/sys/kernel/sched_group_identity_enabled command to enable the feature. After the group identity feature is enabled, data cannot be written to the /proc/sys/kernel/sched_group_identity_enabled interface if the value of the cpu.bvt_warp_ns or cpu.identity interface of the cgroup is not zero. Note: If your kernel version is 4.19.91-26.4.al7, 4.19.91-26.5.al7, or 4.19.91-26.6.al7, the sched_group_identity_enabled interface is set to 1, and the value of the cpu.bvt_warp_ns interface of the cgroup is not zero, errors occur when you read the /proc/sys/kernel/sched_group_identity_enabled settings. This is a read bug that does not affect the normal usage of the interface. This bug is fixed in kernel version 4.19.91-27.al7 and later. | N/A | 0 |
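To review the parameters in the table at a glance, the current values can be printed together with their units. The following is a minimal sketch. The helper name gi_show_params is hypothetical, and the directory argument exists only so that the function can be pointed at a different location, for example in tests.

```shell
#!/bin/sh
# Hypothetical helper: print the group identity kernel parameters with units.
# $1 is the directory that holds the parameters (default: /proc/sys/kernel).
gi_show_params() {
    dir=${1:-/proc/sys/kernel}
    # Each entry is "name:unit"; the last parameter has no unit.
    for entry in sched_expel_update_interval:ms \
                 sched_expel_idle_balance_delay:ms \
                 sched_idle_saver_wmark:ns \
                 sched_group_identity_enabled:; do
        name=${entry%%:*}
        unit=${entry#*:}
        if [ -r "$dir/$name" ]; then
            echo "$name = $(cat "$dir/$name")${unit:+ $unit}"
        else
            echo "$name = (not available)"
        fi
    done
}

gi_show_params
```

On a kernel that does not provide a parameter, the corresponding line reads (not available) instead of causing the script to fail.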
Information output
When you use the group identity feature, you can run the following command to view various parameters:
cat /proc/sched_debug
The following table describes the output parameters.
Parameter | Description |
| The number of |
| The number of |
| The number of non- |
| Indicates whether |
| Indicates whether |
| The cumulative runtime of |
| The cumulative runtime of |
| The number of non- |
| The difference between the minimum vruntimes of the two red-black trees when the CPU starts to evict tasks. |
| The cumulative difference between the minimum vruntimes of the two red-black trees caused by the CPU eviction status. |
| The minimum vruntime of the low-priority red-black tree. |