The blk-iocost weight-based throttling feature is an Alibaba Cloud Linux improvement of the weight-based disk throttling feature of the cgroup I/O subsystem (blkcg). blk-iocost is an I/O controller that is used to allocate bandwidth to I/O operations on block devices based on the priorities of applications or processes. blk-iocost can also control the usage of the block device I/O bandwidth by specific applications or processes based on specified weight values. blk-iocost helps you better control and manage disk I/O resources.
Note: cgroup v1 and cgroup v2 are two versions of the resource management feature in the Linux kernel. In the Alibaba Cloud Linux kernel, the blk-iocost feature supports both the cgroup v1 and cgroup v2 interfaces. In most cases, only one version is active in a system. To check whether the system uses the cgroup v1 interface or the cgroup v2 interface, run the stat -fc %T /sys/fs/cgroup command.
If tmpfs is returned, the cgroup v1 interface is used.
If cgroup2fs is returned, the cgroup v2 interface is used.
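The check above can be wrapped in a small helper for use in scripts. The following is a minimal sketch; the function name cgroup_version_from_fstype is hypothetical, and the mapping is only the one stated in the note (tmpfs for cgroup v1, cgroup2fs for cgroup v2).

```shell
# Map the filesystem type reported for /sys/fs/cgroup to a cgroup version.
# cgroup_version_from_fstype is a hypothetical helper name.
cgroup_version_from_fstype() {
    case "$1" in
        tmpfs)     echo "v1" ;;
        cgroup2fs) echo "v2" ;;
        *)         echo "unknown" ;;
    esac
}

# Fixed inputs, so the example runs without touching the system.
cgroup_version_from_fstype tmpfs
cgroup_version_from_fstype cgroup2fs

# On a real system (not run here):
#   cgroup_version_from_fstype "$(stat -fc %T /sys/fs/cgroup)"
```

Any other filesystem type is reported as unknown rather than guessed.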
Usage notes
cost.qos
This interface is used to enable or disable the blk-iocost feature and to configure the latency-based quality of service (QoS) parameters that bound the I/O rate. It is a read/write interface whose file exists only in the root group of the blkcg. The full name of the interface file varies based on the cgroup version:
cgroup v1: blkio.cost.qos
cgroup v2: io.cost.qos
Interface configuration:
Each line in the configuration file starts with the major (MAJ) and minor (MIN) numbers of a disk in the MAJ:MIN format, followed by key=value settings such as enable, ctrl, rpct, rlat, wpct, wlat, min, and max. To query the MAJ and MIN numbers of a disk, run the lsblk | grep <disk name> command.
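To use the MAJ:MIN value in a script, the second column of the lsblk output can be extracted instead of read by eye. The sketch below assumes the default lsblk column order (NAME first, MAJ:MIN second). The helper name majmin_from_lsblk and the sample output are illustrative, not taken from a real host.

```shell
# Print the MAJ:MIN column for one disk from default lsblk output on stdin.
# majmin_from_lsblk is a hypothetical helper; it assumes the default lsblk
# column order, where NAME is column 1 and MAJ:MIN is column 2.
majmin_from_lsblk() {
    awk -v disk="$1" '$1 == disk { print $2; exit }'
}

# Sample lsblk output (illustrative values).
sample='NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda  254:0    0  40G  0 disk
vdd  254:48   0 100G  0 disk'

printf '%s\n' "$sample" | majmin_from_lsblk vdd

# On a real system (not run here):
#   lsblk | majmin_from_lsblk vdd
```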
cost.model
This interface is used to configure the cost model. It is a read/write interface whose file exists only in the root group of the blkcg. The full name of the interface file varies based on the cgroup version:
cgroup v1: blkio.cost.model
cgroup v2: io.cost.model
Interface configuration:
Each line in the configuration file starts with the major (MAJ) and minor (MIN) numbers of a disk in the MAJ:MIN format, followed by the following configurations. To query the MAJ and MIN numbers of the disk, run the lsblk | grep <disk name> command.
ctrl: the control mode. Valid values: auto and user.
model: the model parameter. Valid value: linear. If you set the model parameter to linear, you must specify the following modeling parameters:
[r|w]bps: the maximum sequential read or write throughput, in bytes per second.
[r|w]seqiops: the maximum sequential read or write input/output operations per second (IOPS).
[r|w]randiops: the maximum random read or write IOPS.
Note You can use the tools/cgroup/iocost_coef_gen.py script in the kernel source code to generate the preceding parameters and then write the parameters to the cost.model interface file to configure the cost model.
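The parameters above combine into a single configuration line. As a minimal sketch, the hypothetical helper build_cost_model_line below assembles such a line from a MAJ:MIN pair and the six modeling parameters. The sample values are the ones used in Step 2 of this topic. The helper only prints the line and does not write to the interface file.

```shell
# Assemble a cost.model line for the linear model (hypothetical helper).
# Arguments: MAJ:MIN rbps rseqiops rrandiops wbps wseqiops wrandiops
build_cost_model_line() {
    printf '%s ctrl=user model=linear rbps=%s rseqiops=%s rrandiops=%s wbps=%s wseqiops=%s wrandiops=%s\n' \
        "$1" "$2" "$3" "$4" "$5" "$6" "$7"
}

# Values from the Step 2 example in this topic.
build_cost_model_line 254:48 2706339840 89698 110036 1063126016 135560 130734
```

The printed line can then be redirected to the cost.model interface file with the same sudo sh -c pattern that the procedure uses.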
weight (Alibaba Cloud Linux 3) or cost.weight (Alibaba Cloud Linux 2)
This interface is used to set a weight for a subgroup on a specific disk or to change the default weight of the subgroup, which is 100. Valid values: 1 to 10000. It is a read/write interface whose file exists only in subgroups of the blkcg, not in the root group.
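The full name of the interface file varies based on the cgroup version and the operating system:
cgroup v1: blkio.cost.weight
cgroup v2: io.weight (Alibaba Cloud Linux 3) or io.cost.weight (Alibaba Cloud Linux 2)
Interface configuration:
Write a weight value alone to change the default weight of the subgroup, or write the MAJ:MIN numbers of a disk followed by a weight value to set the weight for that disk, as shown in Step 3.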
Limits
Only Alibaba Cloud Linux images that contain the following kernel versions support the blk-iocost feature:
Procedure
Step 1: Use cost.qos to enable the blk-iocost feature
Example scenario: Use the cost.qos interface to enable the blk-iocost feature for the 254:48 disk. If more than 5% of read or write requests have a latency (rlat|wlat) longer than 5 milliseconds, the disk is considered saturated. The kernel adjusts the rate at which requests are sent to the disk within the range of 50% to 150% of the original rate. Run the command that corresponds to your cgroup version:
Command for cgroup v1
sudo sh -c 'echo "254:48 enable=1 ctrl=user rpct=95.00 rlat=5000 wpct=95.00 wlat=5000 min=50.00 max=150.00" > /sys/fs/cgroup/blkio/blkio.cost.qos'
Command for cgroup v2
sudo sh -c 'echo "254:48 enable=1 ctrl=user rpct=95.00 rlat=5000 wpct=95.00 wlat=5000 min=50.00 max=150.00" > /sys/fs/cgroup/io.cost.qos'
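The values in these commands follow from the scenario by simple arithmetic: "more than 5% of requests are slow" corresponds to the 95th percentile (rpct and wpct), and 5 milliseconds corresponds to 5000 in rlat and wlat, which implies that the thresholds are given in microseconds. The sketch below shows that conversion. The helper name build_cost_qos_line is hypothetical, and it applies the same thresholds to reads and writes, as the example does.

```shell
# build_cost_qos_line MAJ:MIN SLOW_PCT LAT_MS MIN_PCT MAX_PCT  (hypothetical helper)
# SLOW_PCT: share of requests allowed to exceed the latency threshold.
# LAT_MS:   latency threshold in milliseconds (rlat/wlat assumed to be microseconds).
build_cost_qos_line() {
    awk -v dev="$1" -v slow="$2" -v ms="$3" -v lo="$4" -v hi="$5" 'BEGIN {
        pct = 100 - slow          # 5% slow requests -> 95th percentile
        lat = ms * 1000           # milliseconds -> microseconds
        printf "%s enable=1 ctrl=user rpct=%.2f rlat=%d wpct=%.2f wlat=%d min=%.2f max=%.2f\n",
               dev, pct, lat, pct, lat, lo, hi
    }'
}

# Scenario from this step: 5% slow requests, 5 ms threshold, 50% to 150% range.
build_cost_qos_line 254:48 5 5 50 150
```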
Step 2: Use cost.model to configure a cost model
Example scenario: Use the cost.model interface to set model to linear and specify the modeling parameters to configure a cost model for the 254:48 disk. Run the command that corresponds to your cgroup version:
Command for cgroup v1
sudo sh -c 'echo "254:48 ctrl=user model=linear rbps=2706339840 rseqiops=89698 rrandiops=110036 wbps=1063126016 wseqiops=135560 wrandiops=130734" > /sys/fs/cgroup/blkio/blkio.cost.model'
Command for cgroup v2
sudo sh -c 'echo "254:48 ctrl=user model=linear rbps=2706339840 rseqiops=89698 rrandiops=110036 wbps=1063126016 wseqiops=135560 wrandiops=130734" > /sys/fs/cgroup/io.cost.model'
Step 3: Modify the weight
Example scenario: After you configure cost.qos in Step 1 and cost.model in Step 2, the blk-iocost feature is enabled. You can then create the blkcg1 (cgroup v1) or cg1 (cgroup v2) control group and use the cost.weight interface (cgroup v1, or cgroup v2 on Alibaba Cloud Linux 2) or the weight interface (cgroup v2 on Alibaba Cloud Linux 3) to change the default weight of the control group to 50. Then, set the weight of the control group on the 254:48 disk to 50. Run the commands that correspond to your cgroup version:
Commands for cgroup v1
sudo mkdir /sys/fs/cgroup/blkio/blkcg1 # Create the control group named blkcg1.
sudo sh -c 'echo "50" > /sys/fs/cgroup/blkio/blkcg1/blkio.cost.weight' # Change the default weight to 50.
sudo sh -c 'echo "254:48 50" > /sys/fs/cgroup/blkio/blkcg1/blkio.cost.weight' # Set the weight for the disk to 50.
Commands for cgroup v2
Alibaba Cloud Linux 2
sudo mkdir /sys/fs/cgroup/cg1 # Create the control group named cg1.
sudo sh -c 'echo "50" > /sys/fs/cgroup/cg1/io.cost.weight' # Change the default weight to 50.
sudo sh -c 'echo "254:48 50" > /sys/fs/cgroup/cg1/io.cost.weight' # Set the weight for the disk to 50.
Alibaba Cloud Linux 3
sudo mkdir /sys/fs/cgroup/cg1 # Create the control group named cg1.
sudo sh -c 'echo "50" > /sys/fs/cgroup/cg1/io.weight' # Change the default weight to 50.
sudo sh -c 'echo "254:48 50" > /sys/fs/cgroup/cg1/io.weight' # Set the weight for the disk to 50.
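The three command sets differ only in the path of the weight file. As a sketch, the hypothetical helper weight_file below picks the path from the cgroup version, the Alibaba Cloud Linux major release, and the control group name. It uses only the paths that appear in the commands above.

```shell
# weight_file CGROUP_VERSION ALINUX_RELEASE GROUP  (hypothetical helper)
# Prints the weight interface file path used in the Step 3 commands.
weight_file() {
    case "$1:$2" in
        v1:*) echo "/sys/fs/cgroup/blkio/$3/blkio.cost.weight" ;;
        v2:2) echo "/sys/fs/cgroup/$3/io.cost.weight" ;;
        v2:3) echo "/sys/fs/cgroup/$3/io.weight" ;;
        *)    echo "unsupported combination: $1 $2" >&2; return 1 ;;
    esac
}

weight_file v1 3 blkcg1
weight_file v2 2 cg1
weight_file v2 3 cg1
```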
Common monitoring tools
To use blk-iocost effectively, you need to monitor and evaluate the I/O performance of your system. You can use the following tools or interfaces to monitor I/O resource usage and then tune the configuration.
iocost monitor script
The tools/cgroup/iocost_monitor.py script in the kernel source code uses the drgn debugger to obtain kernel parameters and provide I/O performance monitoring data. Perform the following steps to use the script:
Install the drgn debugger. Sample command:
sudo pip3 install drgn
For information about the drgn debugger, see drgn.
(Optional) Download iocost_monitor.py.
If you did not download the complete Linux kernel source code, download the iocost_monitor.py script from the public repository of the Linux kernel. Sample command:
wget https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/tools/cgroup/iocost_monitor.py
Run the iocost_monitor.py script. In the following example, the vdd disk is used. Sample command:
sudo python3 ./iocost_monitor.py vdd
The following command output is returned:
vdd RUN per=500.0ms cur_per=3930.839:v14620.321 busy= +1 vrate=6136.22% params=hdd
active weight hweight% inflt% dbt delay usages%
blkcg1 * 50/ 50 9.09/ 9.09 0.00 0 0*000 009:009:009
blkcg2 * 500/ 500 90.91/ 90.91 0.00 0 0*000 089:091:092
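The hweight% column in this output is consistent with each group's weight divided by the sum of the weights of the active groups: 50 / (50 + 500) is about 9.09% and 500 / (50 + 500) is about 90.91%. The sketch below reproduces that arithmetic, assuming that both groups are active siblings directly under the root group. The helper name hweight_pct is hypothetical.

```shell
# hweight_pct WEIGHT [SIBLING_WEIGHT...]  (hypothetical helper)
# Prints WEIGHT as a percentage of the sum of all listed weights,
# including WEIGHT itself.
hweight_pct() {
    awk -v w="$1" 'BEGIN {
        total = 0
        for (i = 1; i < ARGC; i++) total += ARGV[i]
        printf "%.2f\n", 100 * w / total
    }' "$@"
}

hweight_pct 50 500    # share of blkcg1
hweight_pct 500 50    # share of blkcg2
```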
blkio.cost.stat interface file of cgroup v1
The Alibaba Cloud Linux kernel provides the blk-iocost interface file (blkio.cost.stat) of the cgroup v1 interface. This interface file records the QoS data of each controlled device. Run the following command to view the interface file:
cat /sys/fs/cgroup/blkio/blkcg1/blkio.cost.stat
The following command output is returned:
254:48 is_active=1 active=50 inuse=50 hweight_active=5957 hweight_inuse=5957 vrate=159571
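The hweight_active and hweight_inuse fields appear to be scaled so that 65536 represents 100%. This is an assumption based on the sample outputs in this topic: 5957 / 65536 is about 9.09%, which matches the share that a weight of 50 gets next to a weight of 500. The sketch below extracts a field from a stat line and converts it under that assumption. The helper names stat_field and scaled_to_pct are hypothetical.

```shell
# stat_field LINE KEY: print the value of KEY from a "key=value" stat line.
stat_field() {
    printf '%s\n' "$1" | tr ' ' '\n' | awk -F= -v k="$2" '$1 == k { print $2 }'
}

# scaled_to_pct VALUE: convert a value to a percentage,
# assuming that 65536 represents 100%.
scaled_to_pct() {
    awk -v v="$1" 'BEGIN { printf "%.2f\n", 100 * v / 65536 }'
}

# Sample line from the blkio.cost.stat output in this topic.
line='254:48 is_active=1 active=50 inuse=50 hweight_active=5957 hweight_inuse=5957 vrate=159571'
scaled_to_pct "$(stat_field "$line" hweight_active)"
```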
ftrace tool
The Alibaba Cloud Linux kernel provides ftrace trace events for the blk-iocost feature. These events trace the decisions of the scheduler and the processing of I/O requests in detail, which supports in-depth performance analysis. Perform the following steps to use the ftrace tool:
Run the following command to set the enable attribute to 1 to enable the iocost trace events:
sudo sh -c 'echo 1 > /sys/kernel/debug/tracing/events/iocost/enable'
Run the following command to view the output information:
sudo cat /sys/kernel/debug/tracing/trace_pipe
The following command output is returned:
dd-1593 [008] d... 688.565349: iocost_iocg_activate: [vdd:/blkcg1] now=689065289:57986587662878 vrate=137438 period=22->22 vtime=0->57986365150756 weight=50/50 hweight=65536/65536
dd-1593 [008] d.s. 688.575374: iocost_ioc_vrate_adj: [vdd] vrate=137438->137438 busy=0 missed_ppm=0:0 rq_wait_pct=0 lagging=1 shortages=0 surpluses=1
<idle>-0 [008] d.s. 688.608369: iocost_ioc_vrate_adj: [vdd] vrate=137438->137438 busy=0 missed_ppm=0:0 rq_wait_pct=0 lagging=1 shortages=0 surpluses=1
dd-1594 [006] d... 688.620002: iocost_iocg_activate: [vdd:/blkcg2] now=689119946:57994099611644 vrate=137438 period=22->26 vtime=0->57993412421644 weight=250/250 hweight=65536/65536
<idle>-0 [008] d.s. 688.631367: iocost_ioc_vrate_adj: [vdd] vrate=137438->137438 busy=0 missed_ppm=0:0 rq_wait_pct=0 lagging=1 shortages=0 surpluses=1
<idle>-0 [008] d.s. 688.642368: iocost_ioc_vrate_adj: [vdd] vrate=137438->137438 busy=0 missed_ppm=0:0 rq_wait_pct=0 lagging=1 shortages=0 surpluses=1
<idle>-0 [008] d.s. 688.653366: iocost_ioc_vrate_adj: [vdd] vrate=137438->137438 busy=0 missed_ppm=0:0 rq_wait_pct=0 lagging=1 shortages=0 surpluses=1
<idle>-0 [008] d.s. 688.664366: iocost_ioc_vrate_adj: [vdd] vrate=137438->137438 busy=0 missed_ppm=0:0 rq_wait_pct=0 lagging=1 shortages=0 surpluses=1
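Because most trace lines are periodic iocost_ioc_vrate_adj events, it can help to filter the stream for the events of interest. The sketch below keeps only iocost_iocg_activate events and prints the device, control group, and active weight for each. The helper name iocg_activations is hypothetical, and the field layout is assumed to match the sample output above.

```shell
# Print "device cgroup weight" for each iocost_iocg_activate event on stdin.
# iocg_activations is a hypothetical helper name.
iocg_activations() {
    awk '/iocost_iocg_activate:/ {
        dev = ""; grp = ""; w = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^\[.*:.*\]$/) {          # for example [vdd:/blkcg1]
                s = substr($i, 2, length($i) - 2)
                p = index(s, ":")
                dev = substr(s, 1, p - 1); grp = substr(s, p + 1)
            } else if ($i ~ /^weight=/) {      # for example weight=50/50
                split(substr($i, 8), parts, "/")
                w = parts[1]
            }
        }
        print dev, grp, w
    }'
}

# Three lines from the sample output above.
iocg_activations <<'EOF'
dd-1593 [008] d... 688.565349: iocost_iocg_activate: [vdd:/blkcg1] now=689065289:57986587662878 vrate=137438 period=22->22 vtime=0->57986365150756 weight=50/50 hweight=65536/65536
dd-1593 [008] d.s. 688.575374: iocost_ioc_vrate_adj: [vdd] vrate=137438->137438 busy=0 missed_ppm=0:0 rq_wait_pct=0 lagging=1 shortages=0 surpluses=1
dd-1594 [006] d... 688.620002: iocost_iocg_activate: [vdd:/blkcg2] now=689119946:57994099611644 vrate=137438 period=22->26 vtime=0->57993412421644 weight=250/250 hweight=65536/65536
EOF

# On a real system (not run here):
#   sudo cat /sys/kernel/debug/tracing/trace_pipe | iocg_activations
```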