By Guangcheng (Wen Maoquan) and Wuya (Li Jingxuan), core members of the OpenAnolis Tracking Diagnosis Technology SIG.
ssar is a system performance monitoring tool developed by Alibaba and contributed to OpenAnolis. The tool has low overhead and has been deployed at scale across multiple business units in the company for more than a year.
ssar was recently open-sourced in OpenAnolis, a leading operating system open-source community and innovation platform. All are welcome to visit OpenAnolis to download the source code. The compilation and installation steps are provided together with the source code, and a detailed "User Reference Manual" is also available.
The ssar tool solves many practical production problems in daily use. For a quick understanding of the use of ssar, here we introduce typical usage through 10 specific user scenarios.
In daily operation and maintenance, whole-machine CPU and memory usage often rises, causing system instability. To diagnose this, it is necessary to review historical data and understand the process-level CPU and memory profiles during the fluctuation period.
ssar gives a detailed view of the CPU and memory trends of the whole machine. user, system, and anonpages are the indicator names the kernel uses for user-space CPU usage, kernel-space CPU usage, and anonymous-page memory, respectively.
$ ssar -i 1 -o user,system,anonpages
collect_datetime user/s system/s anonpages
2021-09-08T18:36:00 787.38 145.30 6590885
2021-09-08T18:37:00 299.93 230.25 12808418
2021-09-08T18:38:00 116.58 79.43 14119096
2021-09-08T18:39:00 401.88 101.75 10580258
2021-09-08T18:40:00 332.68 279.62 8759586
Use ssar to view process-level CPU impact factors. The procs subcommand displays the top 3 processes by kernel-space CPU usage during the 2-minute window from 2021-09-08T18:38:00 to 2021-09-08T18:40:00. pucpu, pscpu, and pcpu denote, respectively, the user-space CPU utilization, kernel-space CPU utilization, and total CPU utilization of each process.
$ ssar procs -f 2021-09-08T18:40:00 -r 2 -o pid,ppid,pcpu,pucpu,pscpu,cmd -k pscpu -l 3
pid ppid pcpu pucpu pscpu cmd
1537 1536 95.3 65.0 30.3 java
29349 1 0.3 0.1 0.2 sresar
11 2 0.1 0.0 0.1 migration/0
Use ssar to view process-level memory impact factors. The procs subcommand displays the 3 processes whose memory requests surged during the 2-minute window from 2021-09-08T18:36:00 to 2021-09-08T18:38:00. rss_dlt denotes the delta (increment) of rss memory.
$ ssar procs -f 2021-09-08T18:38:00 -r 2 -o pid,ppid,rss,rss_dlt,nlwp,cmd -k rss_dlt -l 3
pid ppid rss rss_dlt nlwp cmd
197779 1 14624 9472 1 syslog-ng
185017 1 136328 5400 1 systemd-journal
27495 77722 360 360 1 sleep
The preceding method identifies the process that drives the fluctuation of the whole-machine indicators; the proc subcommand can then track the CPU and memory changes of that specific process over time. A left angle bracket (<) indicates that the sresar process had not yet started at 2021-08-09T11:39:00, and a right angle bracket (>) indicates that it had already exited at 2021-08-09T11:47:00.
$ ssar proc -p $(pidof sresar) -i1 -o collect_datetime,rss,min_flt,cmd
collect_datetime rss min_flt cmd start_datetime
2021-08-09T11:39:00 < < < <
2021-08-09T11:40:00 1524 631 sresar 2021-08-09T11:39:18
2021-08-09T11:41:00 1708 1231 sresar 2021-08-09T11:39:18
2021-08-09T11:42:00 3552 1748 sresar 2021-08-09T11:39:18
2021-08-09T11:43:00 3552 1748 sresar 2021-08-09T11:39:18
2021-08-09T11:44:00 3552 1749 sresar 2021-08-09T11:39:18
2021-08-09T11:45:00 3552 1749 sresar 2021-08-09T11:39:18
2021-08-09T11:46:00 3552 1749 sresar 2021-08-09T11:39:18
2021-08-09T11:47:00 > > > >
$ ssar proc -p $(pidof sresar) -i 1 --mem
$ ssar proc -p $(pidof sresar) -i 1 --cpu
The load5s subcommand of ssar not only refines the load sampling granularity to 5 seconds but also creatively adds the load5s indicator. load5s is more precise than the load1, load5, and load15 indicators and better reflects the system profile at the moment of load pressure. The legacy load1 index still lags for a while after the pressure on the system has subsided; in comparison, the load5s index provided by ssar reflects the machine's load pressure more truthfully.
$ ssar load5s
collect_datetime threads load1 runq load5s
2021-08-09T14:17:35 1047 0.28 2 1
2021-08-09T14:17:40 1058 0.25 1 0
2021-08-09T14:17:47 3047 113.46 1453 1414
2021-08-09T14:17:53 3053 264.62 2002 2001
2021-08-09T14:17:59 3053 403.74 2002 2002
2021-08-09T14:18:05 1049 371.41 1 0
2021-08-09T14:18:10 1055 341.67 1 0
2021-08-09T14:18:15 1048 314.31 1 0
$ ssar load5s -f 2021-07-15T19:00:00 -r 300 -z
If the load5s value exceeds a certain threshold relative to the number of CPU cores, collection of load details is triggered, during which the kernel call stacks of D-status (uninterruptible) threads are sampled and recorded. Use load2p to display the detailed D-status call stack information.
$ ssar load2p -c 2021-09-08T17:25:00 --stackinfo
# The system problem can be identified through call stack information.
page_fault
do_page_fault
__do_page_fault
handle_mm_fault
__do_fault
filemap_fault
__lock_page_or_retry
wait_on_page_bit_killable
sleep_on_page_killable
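The trigger condition described above can be pictured with a short sketch; the threshold factor here is purely illustrative, not ssar's actual setting:

```python
import os

# Sketch: detail collection (including D-status thread kernel stacks) is
# triggered when the 5-second load exceeds a multiple of the CPU count.
# The factor 1.5 is an illustrative assumption, not ssar's real threshold.
def should_collect_details(load5s, factor=1.5):
    ncpu = os.cpu_count() or 1
    return load5s > factor * ncpu

# A load spike like the 14:17:59 sample above would clearly trip the trigger.
print(should_collect_details(2002))
```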
Problems caused by insufficient system memory are among the most common in daily operation and maintenance. Sometimes we see logs such as "page allocation failure: order:4" in the system logs. This is a manifestation of insufficient available memory: due to memory fragmentation, the kernel cannot obtain high-order memory blocks.
To identify such problems, it is necessary to diagnose the memory profile of the whole machine. The anonpages index is the number of anonymous pages, and memfree is the free memory of the whole machine. pgsteal_kswapd is the number of pages reclaimed asynchronously per second, and pgsteal_direct is the number of pages reclaimed directly per second.
$ ssar -i 1 -o anonpages,memfree,pgsteal_kswapd,pgsteal_direct
collect_datetime anonpages memfree pgsteal_kswapd/s pgsteal_direct/s
2021-09-09T10:40:00 11151058 1716004 0.00 0.00
2021-09-09T10:41:00 12008117 1615129 0.00 0.00
2021-09-09T10:42:00 13993906 1404292 0.00 0.00
2021-09-09T10:43:00 15869526 1233360 0.00 0.00
2021-09-09T10:44:00 17135683 1083402 43821.88 0.00
2021-09-09T10:45:00 19488155 829303 45080.95 0.00
2021-09-09T10:46:00 20787968 712429 10452.02 455.90
2021-09-09T10:47:00 21725142 614698 0.00 2313.68
It can be seen that as the system's anonpages memory grows, memfree decreases. After free memory drops below a certain watermark, asynchronous reclamation (pgsteal_kswapd) starts; a further reduction of free memory then triggers direct reclamation (pgsteal_direct).
When free memory is severely short, high-order kernel memory also runs out. The data shows that the count of free order4 blocks at 2021-09-09T10:46:00 is 0.
$ ssar -i 1 --node0
collect_datetime order0 order1 order2 order3 order4 order5
2021-09-09T10:40:00 139116 429294 226754 51287 37276 44181
2021-09-09T10:41:00 20070 89672 37399 51721 37277 44181
2021-09-09T10:42:00 31154 71323 36612 49952 37277 44181
2021-09-09T10:43:00 73309 93733 36843 49249 36256 44157
2021-09-09T10:44:00 76885 73331 36409 48408 35477 42181
2021-09-09T10:45:00 46915 140381 40744 47815 34313 44171
2021-09-09T10:46:00 28346 16054 6420 141 0 0
2021-09-09T10:47:00 34694 29549 16750 3042 1481 319
When there is insufficient memory at the cgroup level, memory thrashing can also occur, which causes serious system problems.
$ tsar2 --io -i 1 -I nvme0n1p1 -s rs,rarqsz,util
Time nvme0n1p1 nvme0n1p1 nvme0n1p1
Time rs rarqsz util
09/09/21-14:50 66.1K 10.25 100.00
09/09/21-14:51 71.1K 13.25 100.00
09/09/21-14:52 63.8K 10.00 100.00
09/09/21-14:53 57.0K 14.50 100.00
09/09/21-14:54 64.0K 14.00 100.00
Despite the superior performance of NVMe disks, the disk util can sometimes surge to 100% while the corresponding read IOPS is in the tens of thousands, yet each individual read IO is only a little over 10 KB in size. One of the most common causes of this pattern is cgroup-level memory thrashing. The kernel index pgmajfault shows that the number of major page faults per second is high while the IO is saturated; this is the direct cause of the full IO utilization.
$ ssar -i 1 -o pgmajfault
collect_datetime pgmajfault/s
2021-09-09T14:50:00 43491.31
2021-09-09T14:51:00 44432.01
2021-09-09T14:52:00 45464.78
2021-09-09T14:53:00 43991.22
2021-09-09T14:54:00 45922.34
$ ssar procs -f 2021-09-09T14:51:00 -r 1 -o pid,rss,maj_flt,maj_dlt,cmd -k maj_dlt -l 5
pid rss maj_flt maj_dlt cmd
58532 8240736 126575484 1305199 java
46873 8237292 173405708 1257924 java
51357 577796 13201 63 python
253136 566352 11687 46 syslog-ng
46025 1229520 478 42 bash
Further, through the ssar procs subcommand, you can locate the 2 java processes, pid 58532 and 46873, that generate a large number of major page faults. The reason is that each process's rss is close to the memory.limit_in_bytes value set for its cgroup, leaving insufficient space in the cgroup to keep all the clean code-segment pages resident. The code pages can only be discarded and re-read from disk during program execution.
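To confirm this kind of pressure, one can compare a process's RSS with its cgroup memory limit; when the headroom approaches zero, clean page-cache pages (such as code segments) get evicted and re-read, producing the pgmajfault storm seen above. A minimal sketch, assuming the cgroup v1 memory controller and with illustrative helper names and numbers:

```python
# Sketch: estimate how close a workload is to its cgroup memory limit.
# The helper names and the sample numbers below are illustrative.

def parse_kv(text, key):
    """Return the integer value following `key` in a /proc-style file."""
    for line in text.splitlines():
        if line.startswith(key):
            return int(line.split()[1])
    return None

def memory_headroom(usage_bytes, limit_bytes):
    """Fraction of the cgroup limit still free. Near 0 means clean pages
    (e.g. code segments) are evicted and re-read from disk, which shows
    up as a major page fault (pgmajfault) storm."""
    return (limit_bytes - usage_bytes) / limit_bytes

# Made-up numbers close to the scenario in the text:
usage = 8240736 * 1024                    # rss of the java process, bytes
limit = 8 * 1024 ** 3 + 64 * 1024 ** 2    # a hypothetical memory.limit_in_bytes
print(f"headroom: {memory_headroom(usage, limit):.1%}")
```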
At the host network level, routine O&M also runs into problems such as network packet loss, retransmission, and out-of-order delivery. Several sets of more in-depth network metrics are provided here for diagnosing such problems.
$ tsar2 --retran
Time --retran--- --retran--- --retran-- --retran-- -retran- ----retran----
Time RetransSegs FastRetrans LossProbes RTORetrans RTORatio LostRetransmit
08/09/21-16:45 49.44 34.41 16.02 6.41 12.97 0.26
08/09/21-16:50 44.07 29.03 16.84 8.21 18.63 0.13
08/09/21-16:55 37.58 18.72 10.91 11.44 30.44 0.02
$ tsar2 --tcpofo
Time -tcpofo-- ---tcpofo--- ---tcpofo--- -tcpofo- tcpofo- -tcpofo-
Time OfoPruned DSACKOfoSent DSACKOfoRecv OFOQueue OFODrop OFOMerge
08/09/21-16:45 0.00 0.00 0.00 251.86 0.00 0.00
08/09/21-16:50 0.00 0.00 0.00 182.35 0.00 0.00
08/09/21-16:55 0.00 0.00 0.00 115.70 0.00 0.00
In terms of TCP retransmission, we provide many more detailed indicators than a single TCP retransmission rate. Besides the classic 200 ms timeout retransmission, modern Linux kernels have a variety of retransmission mechanisms to accelerate recovery from network jitter or anomalies. Here TCP retransmissions are classified in detail into fast retransmission, tail loss probes, and timeout retransmission. Different retransmissions have different impacts on the service: fast retransmissions and tail loss probes are observed more often and usually do not affect the service, but a large number of timeout retransmissions often means a real network exception.
--tcpofo allows us to see the out-of-order segments currently being received. Out-of-order delivery and retransmission are often correlated. Generally, when out-of-order segments are received, OFOQueue increases. However, once OFODrop appears, received out-of-order segments are being dropped; at that point, there are likely too many out-of-order segments in the queue, or the application process is failing to drain the TCP receive queue in time.
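These columns are derived from the cumulative TcpExt counters in /proc/net/netstat, where each protocol block is a header line of field names followed by a line of values; a monitoring tool diffs consecutive samples to get per-interval rates. A minimal parsing sketch (the helper is illustrative, not ssar's actual code):

```python
# Sketch: extract TcpExt out-of-order counters from /proc/net/netstat text.
# The file alternates "Proto: name name ..." and "Proto: val val ..." lines.

def parse_netstat(text):
    fields = {}
    lines = text.splitlines()
    for header, values in zip(lines[::2], lines[1::2]):
        proto = header.split(':')[0]
        names = header.split(':')[1].split()
        vals = [int(v) for v in values.split(':')[1].split()]
        fields[proto] = dict(zip(names, vals))
    return fields

sample = (
    "TcpExt: TCPOFOQueue TCPOFODrop TCPOFOMerge\n"
    "TcpExt: 251 0 0\n"
)
stats = parse_netstat(sample)
print(stats["TcpExt"]["TCPOFOQueue"])  # cumulative count of queued OFO segments
```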
The preceding data can be combined with ssar historical data to find retransmission or out-of-order anomalies, which helps locate the causes of some TCP problems.
$ tsar2 --tcpdrop
Time ----tcpdrop----- --tcpdrop-- ----tcpdrop---- ----tcpdrop---- --tcpdrop--
Time LockDroppedIcmps ListenDrops ListenOverflows PrequeueDropped BacklogDrop
08/09/21-16:45 0.00 0.28 0.28 0.00 0.00
08/09/21-16:50 0.00 1.97 1.97 0.00 0.00
08/09/21-16:55 0.00 21.96 21.96 0.00 0.00
$ tsar2 --tcperr
Time ---tcperr--- ---tcperr--- ---tcperr--- ---tcperr---- --tcperr---
Time RenoFailures SackFailures LossFailures AbortOnMemory AbortFailed
08/09/21-16:45 0.00 0.00 0.00 0.00 0.00
08/09/21-16:50 0.00 0.00 0.00 0.00 0.00
08/09/21-16:55 0.00 0.00 0.00 0.00 0.00
Some typical TCP packet losses and exceptions can be observed through --tcpdrop/--tcperr. For example, the higher ListenDrops and ListenOverflows at 08/09/21-16:55 indicate that TCP SYN connection requests arrived faster than the server could consume them with accept(), so the TCP accept queue overflowed. Either a burst of connection requests occurred, or the server did not call accept() for a long time to drain the connections waiting in the accept queue.
Driver developers often want to record performance data in their driver modules. The ssar tool provides a simple and complete collection scheme for this scenario: developers only need to expose the performance metrics from their kernel modules to user space through pseudo files in /proc/ or /sys/. The values can be cumulative or instantaneous. For example, there are 2 values in the mydev pseudo file below: indicator1 on the left is a cumulative value, and indicator2 on the right is an instantaneous value.
$ cat /proc/mydev
104377580 7252
ssar provides a simple way to configure the collection of such data:
$ cat /etc/ssar/sys.conf
[file]
default=[
{src_path='/proc/mydev', turn=true, cfile='mydev'},
]
$ sudo systemctl restart sresar # Restart the collection process to reload the configuration
After modifying the configurations, you can run the following command to obtain historical data.
$ ssar -i 1 -f 2021-09-09T17:20:00 -r 60 -o 'metric=d:cfile=mydev:line=1:column=1:alias=indicator1;metric=c:cfile=mydev:line=1:column=2:alias=indicator2'
collect_datetime indicator1/s indicator2
2021-09-09T17:30:00 884.40 7192
2021-09-09T17:31:00 652.15 7078
2021-09-09T17:32:00 799.73 7676
2021-09-09T17:33:00 708.37 7953
2021-09-09T17:34:00 596.47 7738
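A metric=d indicator is reported as a per-second rate: two consecutive samples of the cumulative counter are diffed and divided by the interval, whereas metric=c reports the raw instantaneous value. A minimal sketch of the rate computation (the second sample value below is made up to match the output above):

```python
# Sketch: how a cumulative counter (metric=d) becomes a per-second rate.

def rate(prev_value, curr_value, interval_seconds):
    """Per-second rate derived from two cumulative samples."""
    return (curr_value - prev_value) / interval_seconds

# e.g. indicator1 sampled one minute apart (second value is illustrative):
prev, curr = 104377580, 104430644
print(rate(prev, curr, 60))  # ~884.4, matching the indicator1/s column
```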
The command can be changed to obtain real-time data as well:
$ ssar -i 1s -f +10 -o 'metric=d|cfile=mydev|line=1|column=1|alias=indicator1;metric=c|cfile=mydev|line=1|column=2|alias=indicator2'
collect_datetime indicator1/s indicator2
2021-09-09T17:40:00 668.50 7004
2021-09-09T17:40:01 480.38 6115
2021-09-09T17:40:02 651.55 6496
2021-09-09T17:40:03 789.15 6433
2021-09-09T17:40:04 867.93 7508
The ssar tool does not require installation; simply copy the ssar binary to obtain real-time data.
$ ./ssar -i 1s -f +10 -o 'metric=d|src_path=/proc/mydev|line=1|column=1|alias=indicator1;metric=c|src_path=/proc/mydev|line=1|column=2|alias=indicator2'
If multiple rows or columns have similar meanings, you can refer to the following 2 usages.
$ ssar -o 'metric=d:cfile=stat:line=2:column=2-11:alias=cpu0_{column};' # cpu0 data in columns 2 to 11
$ ssar -o 'metric=d|cfile=stat|line=2-17|column=5|alias=idle_{line};' # idle of cpu0 to cpu15
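The {column} and {line} placeholders in the alias expand into one metric name per extracted value, roughly like this sketch (the helper function is illustrative):

```python
# Sketch: expand a range spec such as column=2-11 with alias=cpu0_{column}
# into one named metric per column.
def expand_aliases(template, start, end, field):
    return [template.replace("{" + field + "}", str(i))
            for i in range(start, end + 1)]

print(expand_aliases("cpu0_{column}", 2, 11, "column"))
```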
ssar also supports indicators that are not integers but real numbers or strings, for example:
$ ssar -o 'metric=c:cfile=loadavg:line=1:column=1:dtype=float:alias=load1'
# load1 real-number data
$ ssar -o 'metric=c|cfile=loadavg|line=1|column=4|dtype=string|alias=rp'
# rp string information
If you only wish to collect custom data such as mydev, ssar also provides two configuration parameters, load5s_flag and proc_flag, to disable the load and process-level metric collection. This saves unnecessary disk storage overhead.
Excessive softirq consumption on a specific CPU affects user-space resource usage on that CPU; therefore, when interrupts are unevenly distributed, the execution of user-space processes suffers. For diagnosis, first check whether softirq resource consumption is balanced across CPUs.
$ tsar2 --cpu -I cpu,cpu0,cpu1,cpu2,cpu3,cpu4,cpu5,cpu6,cpu7 -s sirq -i 1
Time --cpu-- -cpu0-- -cpu1-- -cpu2-- -cpu3-- -cpu4-- -cpu5-- -cpu6-- -cpu7--
Time sirq sirq sirq sirq sirq sirq sirq sirq sirq
09/09/21-21:15 1.44 0.01 0.01 0.01 2.28 0.02 15.69 2.72 5.57
09/09/21-21:16 1.98 0.01 0.01 0.01 2.56 0.03 18.40 2.56 8.93
09/09/21-21:17 2.61 0.01 0.01 0.01 2.31 0.09 17.28 2.21 3.59
09/09/21-21:18 1.84 0.01 0.01 0.01 1.97 0.03 12.16 2.04 8.75
09/09/21-21:19 1.32 0.01 0.01 0.01 1.66 0.03 16.52 1.69 6.77
It can be seen from the command output that softirq load is not evenly distributed among the CPUs, and softirq on cpu5 exceeds 10%. For machines with many CPUs, you can use the tsar2 cputop subcommand to sort the CPUs by softirq footprint.
tsar2 irqtop can then quickly determine that the high softirq usage on cpu5 is caused by interrupt 158, named virtio-output.1, and it shows the average number of interrupts per second.
$ tsar2 irqtop -i 1 -C 5
Time --N1--- N1- N1- --------N1-------- --N2--- N2- N2- --------N2------
Time count irq CPU name count irq CPU name
09/09/21-22:58 981.80 158 5 virtio-output.1 304.02 154 5 nvme0q0, nvme0q1
09/09/21-22:59 765.32 158 5 virtio-output.1 234.97 154 5 nvme0q0, nvme0q1
09/09/21-23:00 763.63 154 5 nvme0q0, nvme0q1 393.88 158 5 virtio-output.1
09/09/21-23:01 531.87 158 5 virtio-output.1 225.03 154 5 nvme0q0, nvme0q1
09/09/21-23:02 557.23 158 5 virtio-output.1 150.25 154 5 nvme0q0, nvme0q1
By default, historical data collection is not enabled for per-CPU interrupts. To enable it, change the interrupts item in the configuration to the following:
{src_path='interrupts', turn=true, gzip=true},
If you need to obtain real-time data, append the -l option to the preceding commands:
$ tsar2 --cpu -I cpu,cpu0,cpu1,cpu2,cpu3,cpu4,cpu5,cpu6,cpu7 -s sirq -i 1 -l
$ tsar2 irqtop -i 1 -C 5 -l
Not only does ssar provide rich data metrics, it also produces output that is easy to use and parse. All ssar data output supports JSON format, which facilitates parsing with high-level languages such as Python; simply add the --api option to the command.
$ ssar -r 5 -o user,system,memfree,anonpages --api
[{"anonpages":"280824","collect_datetime":"2021-09-10T12:00:00","memfree":"181753024","system":"-","user":"-"},{"anonpages":"280812","collect_datetime":"2021-09-10T12:05:00","memfree":"181752704","system":"2.35","user":"1.22"}]
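The JSON output can be consumed directly from Python. Note that all values arrive as strings and that "-" marks a data point that could not be computed (such as a rate at the first sample), so conversion should be defensive; a minimal sketch:

```python
# Sketch: parse `ssar ... --api` JSON output.
import json

def load_ssar(json_text):
    records = json.loads(json_text)
    for rec in records:
        for key, val in rec.items():
            if key == "collect_datetime":
                continue
            # "-" means "no value available for this sample"
            rec[key] = None if val == "-" else float(val)
    return records

sample = ('[{"anonpages":"280824","collect_datetime":"2021-09-10T12:00:00",'
          '"memfree":"181753024","system":"-","user":"-"}]')
for rec in load_ssar(sample):
    print(rec["collect_datetime"], rec["memfree"], rec["user"])
```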
For shell and other script parsing, ssar also offers convenient options. For example, the -H option can be used to hide various header information.
$ ssar -r 5 -o user,system,memfree,anonpages -H
2021-09-10T12:00:00 - - 81754836 280296
2021-09-10T12:05:00 1.29 2.57 81756712 280228
Supplying the -P option hides symbols such as "-", "*", and "<" that can appear in data items, reducing the abnormal-data handling required of the calling script.
$ ssar -r 5 -o user,system,memfree,anonpages -H -P
2021-09-10T12:10:00 1.31 2.58 81747740 281676
Code repository of the ssar tool: https://gitee.com/anolis/ssar.git
The SIG is open and strives to make its deliverables part of the community distribution. It is governed by its core members, and anyone can communicate with group members through the mailing list. At present, there are more than 20 SIGs in OpenAnolis, including the Tracking Diagnosis Technology SIG, the commercial secret software stack, the high-performance storage technology SIG, the Java language and virtual machine SIG, Cloud Kernel, and the OceanBase SIG.
You are welcome to join the tracking diagnosis technology SIG: https://openanolis.cn/sig/tracing