Alibaba Cloud Linux: Group identity feature

Last Updated: Nov 26, 2024

Alibaba Cloud Linux 2 with kernel version 4.19.91-24.al7 and later and Alibaba Cloud Linux 3 with kernel version 5.10.46-7.al8 and later support the group identity feature. You can use the group identity feature to configure different identities for CPU control groups (cgroups) to define the priorities of processes (tasks) in the cgroups.

Background information

When you deploy latency-sensitive tasks and compute-intensive tasks on the same instance, the Linux kernel scheduler must give high-priority tasks more scheduling opportunities. This keeps their scheduling latency low and minimizes interference from low-priority tasks. For this scenario, Alibaba Cloud Linux provides the group identity feature, which adds interfaces that you can use to configure scheduling priorities for CPU cgroups. Tasks that have different priorities have the following characteristics:

  • High-priority tasks have minimal wakeup latency.

  • Low-priority tasks do not affect the performance of high-priority tasks:

    • Waking up low-priority tasks does not affect the performance of high-priority tasks.

    • Low-priority tasks do not share hardware units with high-priority tasks and therefore do not degrade the performance of high-priority tasks.

Prerequisites

Note
  • In Alibaba Cloud Linux 2 with kernel version 4.19.91-26, 4.19.91-26.1, 4.19.91-26.2, or 4.19.91-26.3, the group identity feature is disabled in the kernel. You can run the uname -r command to query the kernel version of Alibaba Cloud Linux 2.

  • In Alibaba Cloud Linux 3 with kernel version 5.10.112-11.al8, 5.10.112-11.1.al8, 5.10.112-11.2.al8, 5.10.134-12.al8, 5.10.134-12.1.al8, or 5.10.134-12.2.al8, the group identity feature is disabled in the kernel.

  • If you use the group identity feature on Alibaba Cloud Linux 2 with a kernel version from 4.19.91-25.1.al7 to 4.19.91-25.5.al7, system downtime occurs. Before you use the feature on these kernels, upgrade the kernel to 4.19.91-25.6.al7 or later. For more information, see the Upgrade the kernel section of the "Change the kernel version" topic.

  • If Alibaba Cloud Linux 3 with kernel version 5.10.134-12.2.al8 uses the x86_64 architecture, run the following commands to enable the group identity feature:

    sudo yum makecache
    sudo yum install scheduler-group-identity.x86_64 -y

  • In Alibaba Cloud Linux 2 with kernel version 4.19.91-26.4 or later and Alibaba Cloud Linux 3 with kernel version 5.10.134-13.al8 or later, the /proc/sys/kernel/sched_group_identity_enabled interface is added to allow you to enable the group identity feature. Before you can use the group identity feature, you must run the sudo sh -c 'echo 1 > /proc/sys/kernel/sched_group_identity_enabled' command to enable the feature.
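The enable step above can be wrapped in a small POSIX shell function that first checks whether the kernel exposes the interface. This is a minimal sketch, not an official tool: the function name enable_group_identity and the GI_KNOB variable are hypothetical, GI_KNOB exists only so the path can be overridden for testing, and the function must run as root.

```sh
#!/bin/sh
# Hypothetical helper: turn on the group identity feature through the
# sched_group_identity_enabled interface. Must run as root.
# GI_KNOB can be overridden (for example, in tests); it defaults to the real path.
GI_KNOB="${GI_KNOB:-/proc/sys/kernel/sched_group_identity_enabled}"

enable_group_identity() {
    # Kernels older than 4.19.91-26.4 or 5.10.134-13.al8 lack this interface.
    if [ ! -e "$GI_KNOB" ]; then
        echo "missing $GI_KNOB: this kernel does not provide the interface" >&2
        return 1
    fi
    # Skip the write if the feature is already enabled.
    if [ "$(cat "$GI_KNOB" 2>/dev/null)" = "1" ]; then
        return 0
    fi
    # The write can fail while a cgroup has a non-zero cpu.identity or
    # cpu.bvt_warp_ns value; the non-zero exit status is returned as-is.
    echo 1 > "$GI_KNOB"
}
```

If the function is saved in a file such as gi-enable.sh (a hypothetical name), it can be run with sudo sh -c '. ./gi-enable.sh && enable_group_identity'.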

How the group identity feature works

The group identity feature allows you to configure identities for CPU cgroups to define the priorities of tasks in the cgroups. The feature uses a dual red-black tree design: in addition to the existing red-black tree of the Completely Fair Scheduler (CFS) run queue, a second, low-priority red-black tree is added to store low-priority tasks.

When the kernel schedules the tasks for which identities are configured, the kernel processes the tasks based on the priorities of the tasks. The following table describes the identities in descending order of priority.

Identity

Description

ID_HIGHCLASS

Identifies a high-priority task. A high-priority task has more opportunities to preempt resources than a normal- or low-priority task.

When CFS schedules high-priority tasks, the following rules apply:

  • If a high-priority task wakes up while a low-priority task is running, the high-priority task can unconditionally preempt the low-priority task.

  • If a high-priority task wakes up while a normal-priority task is running and the virtual runtime (vruntime) of the high-priority task is smaller than that of the normal-priority task, the high-priority task can preempt the normal-priority task regardless of the original scheduling policy. Under the original policy, a running task cannot be preempted until it has run on the CPU for at least the minimum runtime.

  • If a low- or normal-priority task is running while other tasks are waiting in the run queue, a waiting high-priority task whose vruntime is smaller than that of the running task can preempt it, likewise ignoring the minimum runtime of the original scheduling policy.

ID_NORMAL

Identifies a normal-priority task. A normal-priority task has more opportunities to preempt resources than a low-priority task.

When CFS schedules normal-priority tasks, the following rules apply:

  • If a normal-priority task wakes up while a low-priority task is running, the normal-priority task can unconditionally preempt the low-priority task.

  • If a low-priority task is running while other tasks are waiting in the run queue, a waiting normal-priority task whose vruntime is smaller than that of the running task can preempt it regardless of the original scheduling policy. Under the original policy, a running task cannot be preempted until it has run on the CPU for at least the minimum runtime.

ID_UNDERCLASS

Identifies a low-priority task.

When CFS schedules low-priority tasks, the following scenario may occur:

  • If an ID_SMT_EXPELLER task runs on the peer simultaneous multithreading (SMT) CPU, low-priority tasks are evicted from the current CPU.
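To make the wakeup rules in the preceding table concrete, here is a simplified model written as a POSIX shell function. It encodes only the wakeup cases listed for ID_HIGHCLASS and ID_NORMAL; it is not the kernel implementation. The function name wakeup_preempt, its arguments, and the "default" result (meaning that no identity rule applies and the ordinary CFS wakeup policy decides) are hypothetical, and plain integers stand in for vruntime values.

```sh
#!/bin/sh
# Hypothetical, simplified model of the documented wakeup-preemption rules.
# It is NOT the kernel implementation.
# Usage: wakeup_preempt WAKER CURR WAKER_VRUNTIME CURR_VRUNTIME
#   WAKER, CURR: high | normal | under (ID_HIGHCLASS, ID_NORMAL, ID_UNDERCLASS)
# Prints "preempt" when a documented identity rule lets the waking task
# preempt the running task, or "default" when no such rule applies.
wakeup_preempt() {
    waker=$1 curr=$2 waker_vr=$3 curr_vr=$4
    case "$waker:$curr" in
        high:under|normal:under)
            # A higher identity always preempts a running ID_UNDERCLASS task.
            echo preempt ;;
        high:normal)
            # Preempt, ignoring the minimum runtime, only if the waking
            # task's vruntime is smaller than the running task's.
            if [ "$waker_vr" -lt "$curr_vr" ]; then
                echo preempt
            else
                echo default
            fi ;;
        *)
            echo default ;;
    esac
}
```

For example, wakeup_preempt high normal 100 900 prints preempt, while swapping the two vruntime values prints default.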

The preceding identities take effect according to the hierarchy of the CPU cgroups:

  • For tasks in CPU cgroups of the same level, the identity priorities take effect.

  • For tasks in CPU cgroups of different levels, the identity priorities do not take effect on tasks in the parent cgroups but take effect on tasks in the child cgroups.

  • Tasks that have the same identity priority compete for CPU resources according to the standard CFS policies. Note that tasks identified by the ID_UNDERCLASS or ID_NORMAL identity may be preempted before their runtime reaches the minimum runtime.
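As an illustration of identities applied to sibling cgroups at the same level, the sketch below creates two child cgroups under the cgroup v1 CPU hierarchy and assigns identities through cpu.bvt_warp_ns: 1 (ID_HIGHCLASS and ID_IDLE_SEEKER) for a latency-sensitive group and -1 (ID_UNDERCLASS and ID_IDLE_SAVER) for a batch group. The cgroup names online and offline, the function names, and the CPU_CG_ROOT override are assumptions for illustration. Run it as root after the feature is enabled.

```sh
#!/bin/sh
# Hypothetical sketch: two sibling cgroups at the same level of the cgroup v1
# CPU hierarchy, so that their identity priorities apply to each other.
# Run as root after the feature is enabled. CPU_CG_ROOT can be overridden.
CPU_CG_ROOT="${CPU_CG_ROOT:-/sys/fs/cgroup/cpu}"

# set_warp CGROUP VALUE: create the cgroup if needed, then write cpu.bvt_warp_ns.
set_warp() {
    mkdir -p "$CPU_CG_ROOT/$1" || return 1
    printf '%s\n' "$2" > "$CPU_CG_ROOT/$1/cpu.bvt_warp_ns"
}

# attach_pid CGROUP PID: move a process (all of its threads) into the cgroup.
attach_pid() {
    printf '%s\n' "$2" > "$CPU_CG_ROOT/$1/cgroup.procs"
}

# online: ID_HIGHCLASS + ID_IDLE_SEEKER (1); offline: ID_UNDERCLASS + ID_IDLE_SAVER (-1).
setup_colocation() {
    set_warp online 1 && set_warp offline -1
}
```

A typical call is setup_colocation && attach_pid offline 12345, where 12345 is a placeholder PID of a batch process.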

Other identities

Identity

Description

ID_SMT_EXPELLER

Identifies an SMT expeller task. When an SMT expeller task runs on an SMT CPU, it evicts the tasks identified by the ID_UNDERCLASS identity from the peer SMT CPU.

ID_IDLE_SEEKER

Identifies an idle seeker task. When such a task wakes up, it attempts to find an idle CPU within the limits of the scheduler policies.

ID_IDLE_SAVER

Identifies an idle saver task. This identity is used together with the sched_idle_saver_wmark kernel parameter, which sets a watermark for CPU idle time. When a task identified by the ID_IDLE_SAVER identity wakes up, it attempts to find a CPU whose idle time exceeds the watermark.
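A sketch of pairing ID_IDLE_SAVER with its watermark: the function below writes a watermark in nanoseconds to the system-wide sched_idle_saver_wmark parameter and assigns ID_UNDERCLASS plus ID_IDLE_SAVER (cpu.identity value 9, that is, bits 0 and 3) to a cgroup. The function name make_idle_saver and the PROC_KERNEL and CPU_CG_ROOT overrides are hypothetical, and it writes cpu.identity directly, which this document recommends only for users familiar with the kernel. Run it as root.

```sh
#!/bin/sh
# Hypothetical sketch: configure a cgroup as ID_UNDERCLASS + ID_IDLE_SAVER and
# set the idle-time watermark that ID_IDLE_SAVER tasks compare against.
# Run as root. PROC_KERNEL and CPU_CG_ROOT can be overridden.
PROC_KERNEL="${PROC_KERNEL:-/proc/sys/kernel}"
CPU_CG_ROOT="${CPU_CG_ROOT:-/sys/fs/cgroup/cpu}"

# make_idle_saver CGROUP WMARK_NS
make_idle_saver() {
    # The watermark is a non-negative integer in nanoseconds.
    case "$2" in
        ''|*[!0-9]*) echo "watermark must be a non-negative integer (ns)" >&2; return 1 ;;
    esac
    # sched_idle_saver_wmark is a single system-wide value, not per cgroup.
    printf '%s\n' "$2" > "$PROC_KERNEL/sched_idle_saver_wmark" || return 1
    mkdir -p "$CPU_CG_ROOT/$1" || return 1
    # Bit 0 (ID_UNDERCLASS) + bit 3 (ID_IDLE_SAVER) = 1 + 8 = 9.
    printf '%s\n' "$(( (1 << 0) | (1 << 3) ))" > "$CPU_CG_ROOT/$1/cpu.identity"
}
```

For example, make_idle_saver batch 1000000 sets a 1 ms watermark; the value is only an illustration, not a recommended setting.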

Interfaces

  • Interfaces used to configure identities

    The group identity feature provides two interfaces for configuring task identities: /sys/fs/cgroup/cpu/$cg/cpu.identity and /sys/fs/cgroup/cpu/$cg/cpu.bvt_warp_ns. $cg is the directory of the child cgroup that contains the task. Before you use the interfaces, take note of the following items:

    • The cpu.bvt_warp_ns interface is a quick configuration interface. Each value written to it is converted into a predefined combination of identities.

    • You can use the cpu.identity and cpu.bvt_warp_ns interfaces to change the identities of cgroups.

    • The identity value that is written by using the cpu.identity interface overwrites the identity value that is previously written by using the cpu.bvt_warp_ns interface, but the value of the cpu.bvt_warp_ns interface remains unchanged.

    • The identity value that is written by using the cpu.bvt_warp_ns interface overwrites the identity value that is previously written by using the cpu.identity interface, but the value of the cpu.identity interface remains unchanged.

    • Use only one of the two interfaces to configure task identities. We recommend that you do not use both interfaces at the same time.

    • If you are unfamiliar with the operations related to the operating system kernel, we recommend that you do not use the cpu.identity interface.

    The following table describes the interfaces.

    Interface

    Description

    cpu.identity

    The default value is 0, which specifies the ID_NORMAL identity.

    The value is a 5-bit bitmask. Each bit is 0 or 1: 1 assigns the corresponding identity, and 0 does not. The bits are defined as follows:

    • If no bit is set (the value is 0), the ID_NORMAL identity is used.

    • Bit 0: specifies the ID_UNDERCLASS identity.

    • Bit 1: specifies the ID_HIGHCLASS identity.

    • Bit 2: specifies the ID_SMT_EXPELLER identity.

    • Bit 3: specifies the ID_IDLE_SAVER identity.

    • Bit 4: specifies the ID_IDLE_SEEKER identity.

    For example, to assign the ID_HIGHCLASS and ID_IDLE_SEEKER identities to a cgroup, set bit 1 and bit 4 to 1 and the other bits to 0. This gives the binary value 10010, which is 18 in decimal. Then, run the echo 18 > /sys/fs/cgroup/cpu/$cg/cpu.identity command as the root user to write 18 to the cpu.identity interface.

    cpu.bvt_warp_ns

    The default value is 0, which specifies the ID_NORMAL identity. Valid values:

    • 2: specifies the ID_SMT_EXPELLER, ID_IDLE_SEEKER, and ID_HIGHCLASS identities. The corresponding value in the cpu.identity interface is 22.

    • 1: specifies the ID_HIGHCLASS and ID_IDLE_SEEKER identities. The corresponding value in the cpu.identity interface is 18.

    • 0: specifies the ID_NORMAL identity. The corresponding value in the cpu.identity interface is 0.

    • -1: specifies the ID_UNDERCLASS and ID_IDLE_SAVER identities. The corresponding value in the cpu.identity interface is 9.

    • -2: specifies the ID_UNDERCLASS and ID_IDLE_SAVER identities. The corresponding value in the cpu.identity interface is 9.
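The bit layout of cpu.identity and the cpu.bvt_warp_ns mapping in the preceding table can be captured in a few shell helpers. This is a sketch for computing or decoding values before you write them; the function names identity_value, identity_names, and warp_to_identity are hypothetical and do not touch any kernel interface.

```sh
#!/bin/sh
# Hypothetical helpers for the cpu.identity bit layout:
# bit 0 ID_UNDERCLASS, bit 1 ID_HIGHCLASS, bit 2 ID_SMT_EXPELLER,
# bit 3 ID_IDLE_SAVER, bit 4 ID_IDLE_SEEKER.

# identity_value NAME...: print the decimal cpu.identity value for the names.
identity_value() {
    v=0
    for name in "$@"; do
        case "$name" in
            ID_UNDERCLASS)   v=$(( v | 1 )) ;;
            ID_HIGHCLASS)    v=$(( v | 2 )) ;;
            ID_SMT_EXPELLER) v=$(( v | 4 )) ;;
            ID_IDLE_SAVER)   v=$(( v | 8 )) ;;
            ID_IDLE_SEEKER)  v=$(( v | 16 )) ;;
            ID_NORMAL)       ;;  # ID_NORMAL sets no bit
            *) echo "unknown identity: $name" >&2; return 1 ;;
        esac
    done
    echo "$v"
}

# identity_names VALUE: decode a cpu.identity value into identity names.
identity_names() {
    v=$1 out=""
    if [ $(( $v & 1 )) -ne 0 ]; then out="$out ID_UNDERCLASS"; fi
    if [ $(( $v & 2 )) -ne 0 ]; then out="$out ID_HIGHCLASS"; fi
    if [ $(( $v & 4 )) -ne 0 ]; then out="$out ID_SMT_EXPELLER"; fi
    if [ $(( $v & 8 )) -ne 0 ]; then out="$out ID_IDLE_SAVER"; fi
    if [ $(( $v & 16 )) -ne 0 ]; then out="$out ID_IDLE_SEEKER"; fi
    # No bit set means ID_NORMAL.
    if [ -z "$out" ]; then out=" ID_NORMAL"; fi
    echo "${out# }"
}

# warp_to_identity VALUE: cpu.identity value that a cpu.bvt_warp_ns value maps to.
warp_to_identity() {
    case "$1" in
        2)     echo 22 ;;
        1)     echo 18 ;;
        0)     echo 0 ;;
        -1|-2) echo 9 ;;
        *) echo "invalid cpu.bvt_warp_ns value: $1" >&2; return 1 ;;
    esac
}
```

For example, identity_value ID_HIGHCLASS ID_IDLE_SEEKER prints 18, the same value as warp_to_identity 1, and identity_names 22 prints ID_HIGHCLASS ID_SMT_EXPELLER ID_IDLE_SEEKER.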

    Note

    By default, Alibaba Cloud Linux supports the cgroup v1 interfaces. Alibaba Cloud Linux 3 with kernel version 5.10.134-13 and later in the 5.10 kernel series also supports the following cgroup v2 interfaces for the group identity feature:

    /sys/fs/cgroup/$cg/cpu.identity and /sys/fs/cgroup/$cg/cpu.bvt_warp_ns. The $cg variable specifies the child cgroup directory node on which a task runs.
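A sketch of choosing between the cgroup v1 and cgroup v2 paths for the two interfaces. It relies on the fact that the root of a cgroup v2 (unified) hierarchy contains a cgroup.controllers file, whereas on cgroup v1 the CPU controller is mounted at /sys/fs/cgroup/cpu. The function name gi_path and the CG_MOUNT override are hypothetical, and hybrid setups that mount cgroup v2 elsewhere are not handled. On cgroup v2, the cpu controller generally must also be enabled in the parent's cgroup.subtree_control before cpu.* files appear.

```sh
#!/bin/sh
# Hypothetical helper: print the path of a group identity interface file for
# a cgroup under either cgroup v1 or cgroup v2. CG_MOUNT can be overridden.
CG_MOUNT="${CG_MOUNT:-/sys/fs/cgroup}"

# gi_path CGROUP FILE   (FILE is cpu.identity or cpu.bvt_warp_ns)
gi_path() {
    case "$2" in
        cpu.identity|cpu.bvt_warp_ns) ;;
        *) echo "unsupported interface: $2" >&2; return 1 ;;
    esac
    if [ -e "$CG_MOUNT/cgroup.controllers" ]; then
        # cgroup v2: one unified hierarchy mounted at $CG_MOUNT.
        echo "$CG_MOUNT/$1/$2"
    else
        # cgroup v1: the CPU controller has its own hierarchy under cpu/.
        echo "$CG_MOUNT/cpu/$1/$2"
    fi
}
```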

  • Interfaces used to enable or disable kernel scheduling features

    Run the following command to view the current settings of the kernel scheduling features through the sched_features interface:

    sudo cat /sys/kernel/debug/sched_features

    The following table describes the kernel scheduling features.

    Kernel scheduling feature

    Description

    Default value

    ID_IDLE_AVG

    This feature is used together with the ID_IDLE_SAVER identity to count the runtime of ID_UNDERCLASS tasks towards the idle time. This ensures that no CPUs remain idle when only ID_UNDERCLASS tasks are running, and prevents resource waste.

    ID_IDLE_AVG: indicates that the feature is enabled.

    ID_RESCUE_EXPELLEE

    This feature is used in load balancing scenarios. If tasks cannot find available CPU resources, CPUs that are evicting ID_UNDERCLASS tasks are used to balance loads. This feature helps move ID_UNDERCLASS tasks out of the evicted state at the earliest opportunity.

    ID_RESCUE_EXPELLEE: indicates that the feature is enabled.

    ID_EXPELLEE_NEVER_HOT

    After this feature is enabled, a task that is being evicted is never treated as cache-hot, so a request to migrate it to another CPU is not denied because of a hot cache. This feature helps move ID_UNDERCLASS tasks out of the evicted state at the earliest opportunity.

    NO_ID_EXPELLEE_NEVER_HOT: indicates that the feature is disabled.

    ID_LOOSE_EXPEL

    After this feature is enabled, CPUs do not update the eviction status every time the CPUs select tasks but automatically update the status based on the time specified by the sched_expel_update_interval kernel parameter. The configuration of this feature affects only status updates when CPUs select tasks. The updates for inter-processor interrupts (IPIs) are not affected.

    NO_ID_LOOSE_EXPEL: indicates that the feature is disabled.

    ID_LAST_HIGHCLASS_STAY

    After this feature is enabled, the last ID_HIGHCLASS task that runs on a CPU cannot be migrated to another CPU.

    ID_LAST_HIGHCLASS_STAY: indicates that the feature is enabled.

    ID_EXPELLER_SHARE_CORE

    • If this feature is enabled, ID_SMT_EXPELLER tasks can preferentially run on physical cores on which ID_SMT_EXPELLER tasks are already running.

    • If this feature is disabled, ID_SMT_EXPELLER tasks are distributed across physical cores. This way, the ID_SMT_EXPELLER tasks do not interfere with each other.

    ID_EXPELLER_SHARE_CORE: indicates that the feature is enabled.

    ID_ABSOLUTE_EXPEL

    In Alibaba Cloud Linux 3, this feature is available in kernel version 5.10.134-16.3 and later in the 5.10 kernel series. After this feature is enabled, ID_UNDERCLASS tasks are absolutely suppressed: they cannot be scheduled while ID_NORMAL or ID_HIGHCLASS tasks are waiting in the run queue. In the worst case, ID_UNDERCLASS tasks starve. In hybrid deployment scenarios, assess the loads of tasks that have different identities before you enable the feature.

    NO_ID_ABSOLUTE_EXPEL: indicates that the feature is disabled.

    ID_LOAD_BALANCE

    In Alibaba Cloud Linux 3, this feature is available in kernel version 5.10.134-16.3 and later in the 5.10 kernel series. After this feature is enabled, the scheduler treats CPUs on which only ID_UNDERCLASS tasks run as idle and, during load balancing, attempts to migrate ID_HIGHCLASS tasks to these CPUs. During the migration, the scheduler tries to spread the ID_HIGHCLASS tasks across the CPUs. This prevents CPU resource contention and Hyper-Threading (HT) interference between ID_HIGHCLASS tasks and ensures that each ID_HIGHCLASS task obtains sufficient CPU resources.

    NO_ID_LOAD_BALANCE: indicates that the feature is disabled.
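As the Default value column shows, the sched_features file lists each feature either by name (enabled) or with a NO_ prefix (disabled). On Linux, a feature is conventionally toggled by writing its name or NO_ followed by its name to the same file. The sketch below follows that convention; it assumes debugfs is mounted at /sys/kernel/debug and must run as root. The function names and the SCHED_FEATURES override are hypothetical.

```sh
#!/bin/sh
# Hypothetical helpers for /sys/kernel/debug/sched_features. Run as root.
# SCHED_FEATURES can be overridden (for example, in tests).
SCHED_FEATURES="${SCHED_FEATURES:-/sys/kernel/debug/sched_features}"

# feature_enabled NAME: succeed if NAME is listed without the NO_ prefix.
feature_enabled() {
    # The file is one line of space-separated names; match one name exactly.
    tr ' ' '\n' < "$SCHED_FEATURES" | grep -qxF "$1"
}

# set_feature NAME on|off: write NAME or NO_NAME to sched_features.
set_feature() {
    case "$2" in
        on)  printf '%s\n' "$1" > "$SCHED_FEATURES" ;;
        off) printf '%s\n' "NO_$1" > "$SCHED_FEATURES" ;;
        *) echo "usage: set_feature NAME on|off" >&2; return 1 ;;
    esac
}
```

For example, feature_enabled ID_LOOSE_EXPEL || set_feature ID_LOOSE_EXPEL on enables ID_LOOSE_EXPEL only if it is currently disabled.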

  • Interfaces used by sysctl to configure kernel parameters

    Specific capabilities of the group identity feature depend on the values of kernel parameters. The following table describes the kernel parameters.

    Kernel parameter

    Description

    Unit

    Default value

    /proc/sys/kernel/sched_expel_update_interval

    The interval at which the eviction status is automatically updated when a CPU selects tasks. This kernel parameter takes effect only if the ID_LOOSE_EXPEL feature is enabled.

    ms

    10

    /proc/sys/kernel/sched_expel_idle_balance_delay

    The minimum idle balance interval when a CPU is evicting tasks. A value of -1 specifies that idle balance is not allowed.

    If only ID_UNDERCLASS tasks exist on a CPU and the tasks are being evicted, the CPU is considered idle, and idle balance is performed on the CPU to improve load balancing. However, this may degrade the performance of ID_UNDERCLASS tasks. You can set the sched_expel_idle_balance_delay parameter to alleviate this issue.

    ms

    -1

    /proc/sys/kernel/sched_idle_saver_wmark

    The watermark for CPU idle time. When an ID_IDLE_SAVER task wakes up, the task attempts to find an idle CPU whose idle time exceeds the specified watermark.

    ns

    0

    /proc/sys/kernel/sched_group_identity_enabled

    Starting from kernel version 4.19.91-26.4 on Alibaba Cloud Linux 2 and 5.10.134-13.al8 on Alibaba Cloud Linux 3, the /proc/sys/kernel/sched_group_identity_enabled interface is added to allow you to enable the group identity feature. Before you can use the feature, you must run the echo 1 > /proc/sys/kernel/sched_group_identity_enabled command as the root user to enable it.

    After the group identity feature is enabled, you cannot write to the /proc/sys/kernel/sched_group_identity_enabled interface while the cpu.bvt_warp_ns or cpu.identity value of a cgroup is not zero.

    Note

    If your kernel version is 4.19.91-26.4.al7, 4.19.91-26.5.al7, or 4.19.91-26.6.al7, the sched_group_identity_enabled interface is set to 1, and the value of the cpu.bvt_warp_ns interface of the cgroup is not zero, errors occur when you read the /proc/sys/kernel/sched_group_identity_enabled settings. This is a read bug that does not affect the normal usage of the interface. This bug is fixed in kernel version 4.19.91-27.al7 and later.

    N/A

    0
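A sketch that prints the current values of the kernel parameters in the preceding table and sets sched_expel_update_interval, which matters only while ID_LOOSE_EXPEL is enabled. The function names and the PROC_KERNEL override are hypothetical; sysctl -w kernel.sched_expel_update_interval=VALUE is the equivalent standard command. Writing requires root.

```sh
#!/bin/sh
# Hypothetical helpers for the group identity kernel parameters.
# PROC_KERNEL can be overridden (for example, in tests).
PROC_KERNEL="${PROC_KERNEL:-/proc/sys/kernel}"

# show_gi_params: print NAME=VALUE for each parameter the kernel exposes.
show_gi_params() {
    for p in sched_group_identity_enabled sched_expel_update_interval \
             sched_expel_idle_balance_delay sched_idle_saver_wmark; do
        if [ -r "$PROC_KERNEL/$p" ]; then
            printf '%s=%s\n' "$p" "$(cat "$PROC_KERNEL/$p")"
        else
            printf '%s=(not available)\n' "$p"
        fi
    done
}

# set_expel_interval MS: set the eviction-status update interval in milliseconds.
set_expel_interval() {
    case "$1" in
        ''|*[!0-9]*) echo "interval must be a non-negative integer (ms)" >&2; return 1 ;;
    esac
    printf '%s\n' "$1" > "$PROC_KERNEL/sched_expel_update_interval"
}
```

On the 4.19.91-26.4.al7 to 4.19.91-26.6.al7 kernels described in the preceding note, reading sched_group_identity_enabled may report an error, in which case its value is printed empty.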

Information output

When you use the group identity feature, run the following command to view the related statistics:

cat /proc/sched_debug

The following table describes the output parameters.

Parameter

Description

nr_high_running

The number of ID_HIGHCLASS tasks that are running on the current CPU.

nr_under_running

The number of ID_UNDERCLASS tasks that are running on the current CPU.

nr_expel_immune

The number of non-ID_UNDERCLASS tasks that are running on the current CPU.

smt_expeller

Indicates whether ID_SMT_EXPELLER tasks are running on the current CPU. A value of 1 indicates that ID_SMT_EXPELLER tasks are running on the current CPU. A value of 0 indicates that no ID_SMT_EXPELLER tasks are running on the current CPU.

on_expel

Indicates whether ID_SMT_EXPELLER tasks are running on the peer SMT CPU. A value of 1 indicates that ID_SMT_EXPELLER tasks are running on the peer SMT CPU. A value of 0 indicates that no ID_SMT_EXPELLER tasks are running on the peer SMT CPU.

high_exec_sum

The cumulative runtime of ID_HIGHCLASS tasks on the current CPU.

under_exec_sum

The cumulative runtime of ID_UNDERCLASS tasks on the current CPU.

h_nr_expel_immune

The number of non-ID_UNDERCLASS tasks that are running on the CFS run queue (cfs_rq).

expel_start

The difference between the minimum vruntimes of the two red-black trees when the CPU starts to evict tasks.

expel_spread

The cumulative difference between the minimum vruntimes of the two red-black trees caused by the CPU eviction status.

min_under_vruntime

The minimum vruntime of the low-priority red-black tree.
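A sketch that extracts one of these fields for each CPU from /proc/sched_debug with awk. It assumes each field appears as a ".name : value" line below a "cpu#N" header, which is the usual sched_debug layout; the exact placement of the group identity fields is not specified in this topic, so treat the pattern as an assumption. Fields printed per cfs_rq, such as h_nr_expel_immune, can appear several times for one CPU, and the sketch prints every occurrence. The function name gi_stat and the SCHED_DEBUG override are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: print "cpuN VALUE" for each occurrence of a field.
# SCHED_DEBUG can be overridden (for example, in tests).
SCHED_DEBUG="${SCHED_DEBUG:-/proc/sched_debug}"

# gi_stat FIELD   (for example, nr_high_running)
gi_stat() {
    awk -v field="$1" '
        /^cpu#/ {
            # Header such as "cpu#3, 2500.000 MHz": keep the CPU number only.
            cpu = $1; sub(/^cpu#/, "", cpu); sub(/,$/, "", cpu)
        }
        $1 == ("." field) {
            # The value is whatever follows the first colon.
            val = $0; sub(/^[^:]*:[ \t]*/, "", val)
            print "cpu" cpu, val
        }
    ' "$SCHED_DEBUG"
}
```

To total a per-CPU field across CPUs, pipe the output into awk, for example gi_stat nr_high_running | awk '{ s += $2 } END { print s }'.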