Alibaba Cloud Linux 2 with kernel 4.19.91-24.al7 or later and Alibaba Cloud Linux 3 with kernel 5.10.46-7.al8 or later support the group identity feature. You can use the group identity feature to configure different identities for CPU control groups (cgroups) to define the priorities of process tasks in the cgroups.
Prerequisites
Alibaba Cloud Linux 2 with kernel 4.19.91-26, 4.19.91-26.1, 4.19.91-26.2, or 4.19.91-26.3 does not support the group identity feature because the feature is disabled in the kernel. You can run the uname -r command to view the kernel version of Alibaba Cloud Linux 2.
Alibaba Cloud Linux 3 with kernel 5.10.112-11.al8, 5.10.112-11.1.al8, 5.10.112-11.2.al8, 5.10.134-12.al8, 5.10.134-12.1.al8, or 5.10.134-12.2.al8 does not support the group identity feature because the feature is disabled in the kernel. You can run the uname -r command to view the kernel version of Alibaba Cloud Linux 3.
If you use the group identity feature on Alibaba Cloud Linux 2 with a kernel version in the range of 4.19.91-25.1.al7 to 4.19.91-25.5.al7, downtime occurs. Before you use the group identity feature, upgrade the kernel to 4.19.91-25.6.al7 or later. For more information, see the FAQ section of this topic.
If Alibaba Cloud Linux 3 with kernel 5.10.134-12.2.al8 runs on the x86_64 architecture, run the following commands to use the group identity feature:
sudo yum makecache
sudo yum install scheduler-group-identity.x86_64 -y
In Alibaba Cloud Linux 2 with kernel 4.19.91-26.4 or later and Alibaba Cloud Linux 3 with kernel 5.10.134-13.al8 or later, the /proc/sys/kernel/sched_group_identity_enabled interface is added to allow you to enable the group identity feature. Before you can use the group identity feature, you must run the following command to enable the feature:
echo 1 > /proc/sys/kernel/sched_group_identity_enabled
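The last prerequisite can be wrapped in a small helper that checks for the switch before writing to it. The following is a minimal sketch, not an official tool: the function name enable_group_identity and the GI_SWITCH variable are hypothetical, and the write requires root privileges on a real system.

```shell
#!/bin/sh
# Switch file named in the prerequisites. It can be overridden so that the
# sketch can be tried on a scratch file (GI_SWITCH is a hypothetical name).
GI_SWITCH="${GI_SWITCH:-/proc/sys/kernel/sched_group_identity_enabled}"

# Writes 1 to the switch if it exists, then prints the value read back.
# Returns 1 if the kernel does not expose the switch.
enable_group_identity() {
    if [ ! -e "$GI_SWITCH" ]; then
        echo "no switch at $GI_SWITCH on kernel $(uname -r)" >&2
        return 1
    fi
    echo 1 > "$GI_SWITCH" && cat "$GI_SWITCH"
}
```

On kernels earlier than the versions listed above, the switch file does not exist and the helper only reports that fact.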
Background information
When you deploy latency-sensitive tasks and computing tasks on the same instance, the Linux kernel scheduler must provide more scheduling opportunities to high-priority tasks to minimize scheduling latency and the impacts of low-priority tasks on kernel scheduling. In the preceding scenario, Alibaba Cloud Linux provides the group identity feature and adds interfaces that can be used to configure scheduling priorities for CPU cgroups. Tasks that have different priorities have the following characteristics:
High-priority tasks have the minimum wakeup latency.
Low-priority tasks do not affect the performance of high-priority tasks.
The wakeup of low-priority tasks does not affect the performance of high-priority tasks.
Low-priority tasks do not affect the performance of high-priority tasks by sharing hardware units.
How the group identity feature works
The group identity feature allows you to configure identities for CPU cgroups to define the priorities of tasks in the cgroups. The group identity feature relies on a dual red-black tree architecture. A low-priority red-black tree is added based on the red-black tree of the Completely Fair Scheduler (CFS) scheduling queue to store low-priority tasks.
When the kernel schedules the tasks for which identities are configured, the kernel processes the tasks based on their priorities. The following table describes the identities in descending order of priority.
Identity | Description |
ID_HIGHCLASS | Identifies a high-priority task. A high-priority task has more opportunities to preempt resources than a normal- or low-priority task. When the CFS schedules high-priority tasks, the following scenarios may occur: |
ID_NORMAL | Identifies a normal-priority task. A normal-priority task has more opportunities to preempt resources than a low-priority task. When the CFS schedules normal-priority tasks, the following scenarios may occur: |
ID_UNDERCLASS | Identifies a low-priority task. When the CFS schedules low-priority tasks, the following scenarios may occur: If an |
The preceding identities apply based on the resource management policies of CPU cgroups.
For tasks in CPU cgroups of the same level, identity priorities take effect.
For tasks in CPU cgroups of different levels, identity priorities do not take effect on tasks in parent cgroups but take effect on tasks in child cgroups.
Tasks that have the same identity priority compete for resources in compliance with CFS policies. Take note that the runtime of tasks identified by the ID_UNDERCLASS or ID_NORMAL identity may not reach the minimum value.
Other identities
Identity | Description |
ID_SMT_EXPELLER | Identifies an SMT expeller task. When an SMT expeller task runs on an SMT CPU, the tasks that are identified by the |
ID_IDLE_SEEKER | Specifies that when a task wakes up, the task attempts to find idle CPUs within the limits of scheduler policies. |
ID_IDLE_SAVER | Used with the |
Interfaces
Interfaces used to configure identities
The group identity feature provides the following interfaces for you to configure task identities: /sys/fs/cgroup/cpu/$cg/cpu.identity and /sys/fs/cgroup/cpu/$cg/cpu.bvt_warp_ns. The $cg variable specifies the child cgroup directory node on which a task is located. Before you use the interfaces, take note of the following items:
The cpu.bvt_warp_ns interface is a quick configuration interface. The value written to the interface is converted into identities.
You can use the cpu.identity and cpu.bvt_warp_ns interfaces to change the identities of CPU cgroups.
The identity value that is written by using the cpu.identity interface overwrites the identity value that is previously written by using the cpu.bvt_warp_ns interface, but the value of the cpu.bvt_warp_ns interface remains unchanged.
The identity value that is written by using the cpu.bvt_warp_ns interface overwrites the identity value that is previously written by using the cpu.identity interface, but the value of the cpu.identity interface remains unchanged.
You can use one of the interfaces to configure task identities. We recommend that you do not use the interfaces at the same time.
If you are unfamiliar with the operations related to the operating system kernel, we recommend that you do not use the cpu.identity interface.
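To follow the recommendation of using only one interface, a script can route every identity change through cpu.bvt_warp_ns. The following is a minimal sketch under stated assumptions: the function name set_bvt and the CPU_CGROUP_ROOT variable are hypothetical, the accepted values (-2 to 2) are the ones listed for cpu.bvt_warp_ns in the interface table of this topic, and the write requires root privileges on a real cgroup v1 hierarchy.

```shell
#!/bin/sh
# cgroup v1 CPU hierarchy used by the interfaces above. It can be overridden
# for dry runs on a scratch directory (CPU_CGROUP_ROOT is a hypothetical name).
CPU_CGROUP_ROOT="${CPU_CGROUP_ROOT:-/sys/fs/cgroup/cpu}"

# set_bvt <cgroup> <value>
# Writes <value> to <cgroup>/cpu.bvt_warp_ns and never touches cpu.identity,
# so that only one of the two interfaces is used for the cgroup.
set_bvt() {
    cg_dir="$CPU_CGROUP_ROOT/$1"
    case "$2" in
        -2|-1|0|1|2) ;;
        *) echo "unsupported cpu.bvt_warp_ns value: $2" >&2; return 1 ;;
    esac
    # On cgroupfs, creating a directory creates the child cgroup.
    mkdir -p "$cg_dir" || return 1
    printf '%s\n' "$2" > "$cg_dir/cpu.bvt_warp_ns"
}
```

For example, set_bvt latency-app 2 would configure a child cgroup named latency-app (a made-up name) through the quick configuration interface.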
The following table describes the interfaces.
Interface | Description |
cpu.identity | The default value is 0, which specifies the ID_NORMAL identity. The interface is a 5-bit field. Valid values of each bit: 0 and 1. 0 specifies that the identity is not assumed. 1 specifies that the identity is assumed. Description of each bit: If the interface is left empty, the ID_NORMAL identity is used. Bit 0: specifies the ID_UNDERCLASS identity. Bit 1: specifies the ID_HIGHCLASS identity. Bit 2: specifies the ID_SMT_EXPELLER identity. Bit 3: specifies the ID_IDLE_SAVER identity. Bit 4: specifies the ID_IDLE_SEEKER identity. For example, if you want to set the identity of a cgroup to ID_HIGHCLASS and ID_IDLE_SEEKER, set bit 1 and bit 4 to 1 and the other bits to 0 to obtain a binary value of 10010, which is converted into a decimal value of 18. Then, run the echo 18 > /sys/fs/cgroup/cpu/$cg/cpu.identity command to write 18 to the cpu.identity interface. |
cpu.bvt_warp_ns | The default value is 0, which specifies the ID_NORMAL identity. Valid values: 2: specifies the ID_SMT_EXPELLER, ID_IDLE_SEEKER, and ID_HIGHCLASS identities. The corresponding value in the cpu.identity interface is 22. 1: specifies the ID_HIGHCLASS and ID_IDLE_SEEKER identities. The corresponding value in the cpu.identity interface is 18. 0: specifies the ID_NORMAL identity. The corresponding value in the cpu.identity interface is 0. -1: specifies the ID_UNDERCLASS and ID_IDLE_SAVER identities. The corresponding value in the cpu.identity interface is 9. -2: specifies the ID_UNDERCLASS and ID_IDLE_SAVER identities. The corresponding value in the cpu.identity interface is 9. |
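The bit arithmetic in the table above can be checked with a few lines of shell. The following sketch is illustrative only: the function names identity_value and bvt_to_identity are hypothetical, and the functions only compute numbers from the bit positions and value mappings listed in the table. They do not write to any interface.

```shell
#!/bin/sh
# identity_value NAME...
# ORs together the bits listed in the table and prints the decimal value
# that would be written to cpu.identity.
identity_value() {
    value=0
    for name in "$@"; do
        case "$name" in
            ID_NORMAL)       bit=0 ;;   # no bit set
            ID_UNDERCLASS)   bit=1 ;;   # bit 0
            ID_HIGHCLASS)    bit=2 ;;   # bit 1
            ID_SMT_EXPELLER) bit=4 ;;   # bit 2
            ID_IDLE_SAVER)   bit=8 ;;   # bit 3
            ID_IDLE_SEEKER)  bit=16 ;;  # bit 4
            *) echo "unknown identity: $name" >&2; return 1 ;;
        esac
        value=$((value | bit))
    done
    echo "$value"
}

# bvt_to_identity VALUE
# Prints the cpu.identity value that the table gives for a cpu.bvt_warp_ns value.
bvt_to_identity() {
    case "$1" in
        2)     identity_value ID_SMT_EXPELLER ID_IDLE_SEEKER ID_HIGHCLASS ;;
        1)     identity_value ID_HIGHCLASS ID_IDLE_SEEKER ;;
        0)     identity_value ID_NORMAL ;;
        -1|-2) identity_value ID_UNDERCLASS ID_IDLE_SAVER ;;
        *) echo "unsupported cpu.bvt_warp_ns value: $1" >&2; return 1 ;;
    esac
}
```

Running identity_value ID_HIGHCLASS ID_IDLE_SEEKER prints 18, which matches the worked example in the table.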
Note: By default, Alibaba Cloud Linux supports the cgroup v1 interfaces. Alibaba Cloud Linux 3 with kernel 5.10.134-13 and later in the 5.10 kernel series also supports the following cgroup v2 interfaces for the group identity feature: /sys/fs/cgroup/$cg/cpu.identity and /sys/fs/cgroup/$cg/cpu.bvt_warp_ns. The $cg variable specifies the child cgroup directory node on which a task is located.
Interfaces used to enable or disable kernel scheduling features
You can run the following command to view the default settings of kernel scheduling features by using the sched_features interface:
sudo cat /sys/kernel/debug/sched_features
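The output of this command is a list of space-separated feature names, in which a NO_ prefix marks a disabled feature. The following sketch shows one way to look up a single feature in that list. The function name feature_state is hypothetical, and the sample string in the usage note is made up for illustration rather than captured from a real system.

```shell
#!/bin/sh
# feature_state NAME "FEATURES"
# Prints "enabled" if NAME appears in the space-separated list,
# "disabled" if NO_NAME appears, and "unknown" if neither is present.
feature_state() {
    for token in $2; do
        case "$token" in
            "$1")    echo enabled;  return 0 ;;
            "NO_$1") echo disabled; return 0 ;;
        esac
    done
    echo unknown
}
```

For example, feature_state ID_LOOSE_EXPEL "$(sudo cat /sys/kernel/debug/sched_features)" reports the state of one feature on the running kernel. On mainline kernels of these series, a feature is typically switched by writing its name, or its name with the NO_ prefix, to the same file as root. This sketch assumes that the group identity features follow the same convention.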
The following table describes the scheduling features.
Scheduling feature | Description | Default value |
ID_IDLE_AVG | This feature is used together with the ID_IDLE_SAVER identity to count the runtime of ID_UNDERCLASS tasks towards the idle time. This ensures that no CPUs remain idle when only ID_UNDERCLASS tasks are running, and prevents resource waste. | ID_IDLE_AVG: indicates that the feature is enabled. |
ID_RESCUE_EXPELLEE | This feature is used in load balancing scenarios. If tasks cannot find available CPU resources, CPUs that are evicting ID_UNDERCLASS tasks are used for balancing loads. This feature helps move ID_UNDERCLASS tasks out of the evicted state at the earliest opportunity. | ID_RESCUE_EXPELLEE: indicates that the feature is enabled. |
ID_EXPELLEE_NEVER_HOT | After this feature is enabled, if a request is initiated to migrate a task that is being evicted to another CPU, the migration request is not denied due to hot cache. This feature helps move ID_UNDERCLASS tasks out of the evicted state at the earliest opportunity. | NO_ID_EXPELLEE_NEVER_HOT: indicates that the feature is disabled. |
ID_LOOSE_EXPEL | After this feature is enabled, CPUs do not update the eviction status every time they select tasks. Instead, the status is automatically updated at the interval specified by the sched_expel_update_interval kernel parameter. This feature affects only status updates when CPUs select tasks. Updates triggered by inter-processor interrupts (IPIs) are not affected. | NO_ID_LOOSE_EXPEL: indicates that the feature is disabled. |
ID_LAST_HIGHCLASS_STAY | After this feature is enabled, the last ID_HIGHCLASS task that runs on a CPU cannot be migrated to another CPU. | ID_LAST_HIGHCLASS_STAY: indicates that the feature is enabled. |
ID_EXPELLER_SHARE_CORE | If this feature is enabled, ID_SMT_EXPELLER tasks can preferentially run on physical cores on which ID_SMT_EXPELLER tasks are already running. If this feature is disabled, ID_SMT_EXPELLER tasks are distributed across physical cores. This way, the ID_SMT_EXPELLER tasks do not interfere with each other. | ID_EXPELLER_SHARE_CORE: indicates that the feature is enabled. |
ID_ABSOLUTE_EXPEL | In Alibaba Cloud Linux 3, this feature is introduced in kernel 5.10.134-16.3 and is available in that kernel and later kernels in the 5.10 kernel series. After this feature is enabled, ID_UNDERCLASS tasks are absolutely suppressed and cannot be scheduled if ID_NORMAL or ID_HIGHCLASS tasks are waiting in the run queues. In worst-case scenarios, ID_UNDERCLASS tasks starve. In hybrid deployment scenarios, assess the loads of tasks that have different identities before you enable the feature. | NO_ID_ABSOLUTE_EXPEL: indicates that the feature is disabled. |
ID_LOAD_BALANCE | In Alibaba Cloud Linux 3, this feature is introduced in kernel 5.10.134-16.3 and is available in that kernel and later kernels in the 5.10 kernel series. After this feature is enabled, when the scheduler balances loads, the scheduler considers the CPUs on which only ID_UNDERCLASS tasks run to be idle and attempts to migrate ID_HIGHCLASS tasks to the idle CPUs. During the migration, the scheduler tries to distribute the ID_HIGHCLASS tasks across the CPUs. This prevents CPU resource contention and Hyper-Threading (HT) interference between the ID_HIGHCLASS tasks and ensures that each ID_HIGHCLASS task can obtain sufficient CPU resources. | NO_ID_LOAD_BALANCE: indicates that the feature is disabled. |
Interfaces used by sysctl to configure kernel parameters
Some capabilities of the group identity feature depend on the values of kernel parameters. The following table describes the kernel parameters.
Kernel parameter | Description | Unit | Default value |
/proc/sys/kernel/sched_expel_update_interval | The interval at which the eviction status is automatically updated when a CPU selects tasks. This kernel parameter takes effect only when the ID_LOOSE_EXPEL feature is enabled. | ms | 10 |
/proc/sys/kernel/sched_expel_idle_balance_delay | The minimum idle balance interval when a CPU is evicting tasks. A value of -1 specifies that idle balance is not allowed. If only ID_UNDERCLASS tasks exist on a CPU and the tasks are being evicted, the CPU is idle. idle balance is performed on the CPU to improve load-balancing effects. However, this may adversely affect ID_UNDERCLASS tasks. You can configure the sched_expel_idle_balance_delay parameter to alleviate the issue. | ms | -1 |
/proc/sys/kernel/sched_idle_saver_wmark | The watermark for CPU idle time. When an ID_IDLE_SAVER task wakes up, the task attempts to find an idle CPU whose idle time exceeds the specified watermark. | ns | 0 |
/proc/sys/kernel/sched_group_identity_enabled | In kernel 4.19.91-26.4 and later, the /proc/sys/kernel/sched_group_identity_enabled interface is added to allow you to enable the group identity feature. Before you can use the group identity feature, you must run the echo 1 > /proc/sys/kernel/sched_group_identity_enabled command to enable the feature. After the group identity feature is enabled, data cannot be written to the /proc/sys/kernel/sched_group_identity_enabled interface if the value of the cpu.bvt_warp_ns or cpu.identity interface of the cgroup is not zero. Note: If your kernel version is 4.19.91-26.4.al7, 4.19.91-26.5.al7, or 4.19.91-26.6.al7, the sched_group_identity_enabled interface is set to 1, and the value of the cpu.bvt_warp_ns interface of the cgroup is not zero, errors occur when you read the /proc/sys/kernel/sched_group_identity_enabled settings. This is a read bug that does not affect the normal usage of the interface. This bug is fixed in kernel 4.19.91-27.al7 and later. | N/A | 0 |
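Because these parameters are under /proc/sys, they can also be addressed by their dotted sysctl keys. The following sketch converts a path from the table into a key and prints the current value of each parameter that the running kernel exposes. The function names are hypothetical, and the parameter names are copied from the table above.

```shell
#!/bin/sh
# proc_to_sysctl_key PATH
# Converts a /proc/sys path into the dotted key that sysctl expects.
proc_to_sysctl_key() {
    echo "${1#/proc/sys/}" | tr '/' '.'
}

# show_group_identity_params
# Prints "key = value" for each parameter in the table that is readable
# on the running kernel. Parameters that are missing are skipped.
show_group_identity_params() {
    for p in sched_expel_update_interval sched_expel_idle_balance_delay \
             sched_idle_saver_wmark sched_group_identity_enabled; do
        f="/proc/sys/kernel/$p"
        [ -r "$f" ] && echo "$(proc_to_sysctl_key "$f") = $(cat "$f")"
    done
    return 0
}
```

A key produced this way can be passed to sysctl, for example sudo sysctl -w kernel.sched_expel_update_interval=10, where 10 is the default value from the table.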
Information output
When you use the group identity feature, you can run the following command to view various parameters:
cat /proc/sched_debug
The following table describes the output parameters.
Parameter | Description |
| The number of |
| The number of |
| The number of non- |
| Indicates whether |
| Indicates whether |
| The cumulative runtime of |
| The cumulative runtime of |
| The number of non- |
| The difference between the minimum vruntimes of the two red-black trees when the CPU starts to evict tasks. |
| The cumulative difference between the minimum vruntimes of the two red-black trees caused by CPU eviction status. |
| The minimum vruntime of the low-priority red-black tree. |
FAQ
Question: How do I upgrade a kernel version in the range of 4.19.91-25.1.al7 to 4.19.91-25.5.al7 to 4.19.91-25.6.al7 or later?
Answer: Perform the following steps:
Log on to the Elastic Compute Service (ECS) instance whose kernel version you want to upgrade.
For more information, see Connect to a Linux instance by using a password or key.
Query the kernel version:
uname -r
Upgrade the kernel to the latest version:
sudo yum update kernel -y
Run the following command to restart the ECS instance for the new kernel version to take effect:
sudo reboot
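Before you perform these steps, you can check whether the running kernel is in the affected range. The following is a minimal sketch: the function name needs_upgrade is hypothetical, and the pattern assumes that uname -r reports the release followed by the architecture, for example 4.19.91-25.3.al7.x86_64.

```shell
#!/bin/sh
# needs_upgrade RELEASE
# Returns 0 if RELEASE is in the range 4.19.91-25.1.al7 to 4.19.91-25.5.al7.
# Assumption: the release string may carry an architecture suffix.
needs_upgrade() {
    case "$1" in
        4.19.91-25.[1-5].al7*) return 0 ;;
        *) return 1 ;;
    esac
}

if needs_upgrade "$(uname -r)"; then
    echo "kernel $(uname -r) is in the affected range; upgrade and restart before you use group identity"
else
    echo "kernel $(uname -r) is outside the affected range"
fi
```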