This topic describes the metrics that you must monitor during Shared Memory Communication (SMC) maintenance to determine network health.
Prerequisites
smc-tools, which is a user-mode SMC maintenance toolset provided by Alibaba Cloud Linux 3, is installed.
If the smc-tools toolset is not installed, run the following command to install the toolset:
sudo yum install -y smc-tools
Monitor protocol stacks
The SMC stack provides statistical metrics related to connections, traffic, and shared memory, and transfers the metrics to user space by using netlink. The smcr command in the smc-tools toolset retrieves and interprets the statistical metrics from netlink.
Query statistics about the SMC-R stack
Run the following command to query statistics about the SMC over Remote Direct Memory Access (SMC-R) stack in the current net namespace:
smcr stats
Sample command output:

# smcr stats
SMC-R Connections Summary
  Total connections handled          5076
  SMC connections                    5076
  Handshake errors                      0
  Avg requests per SMC conn        1977.0
  TCP fallback                          0

RX Stats
  Data transmitted (Bytes)      200705600 (200.7M)
  Total requests                  5017741
  Buffer usage (Bytes)                  0 (0)
  Buffer full                           0 (0.00%)
            8KB    16KB    32KB    64KB   128KB   256KB   512KB  >512KB
  Bufs        0       0       0       0       0  5.076K       0       0
  Reqs   5.018M       0       0       0       0       0       0       0

TX Stats
  Data transmitted (Bytes)     1194173445 (1.194G)
  Total requests                  5017640
  Buffer usage (Bytes)                  0 (0)
  Buffer full                           0 (0.00%)
  Buffer full (remote)                  0 (0.00%)
  Buffer too small                      0 (0.00%)
  Buffer too small (remote)             0 (0.00%)
            8KB    16KB    32KB    64KB   128KB   256KB   512KB  >512KB
  Bufs        0       0       0       0       0  5.076K       0       0
  Reqs   5.018M       0       0       0       0       0       0       0

Extras
  Special socket calls                  5

Take note of the following parameters.
| Parameter | Description |
| --- | --- |
| Total connections handled | The total number of connections handled by the SMC-R stack, which is the sum of the SMC connections, Handshake errors, and TCP fallback values. |
| SMC connections | The total number of connections converted into SMC-R connections. |
| Handshake errors | The total number of connections that failed due to errors during the handshake phase, such as no response received from the peer. |
| Avg requests per SMC conn | The average number of requests received or sent per SMC connection. |
| TCP fallback | The total number of connections that fell back to TCP/IP. |
| Rx/Data transmitted (Bytes) | The total number of bytes received over SMC-R connections. |
| Rx/Total requests | The total number of requests received over SMC-R connections. |
| Rx/Buffer usage (Bytes) | The total size of receive (Rx) buffers used by SMC-R connections. Unit: bytes. |
| Rx/Buffer full | The total number of times that the Rx buffers for SMC-R connections became full. An Rx buffer may become full if the user-mode application that uses the SMC-R connection does not read data from the buffer in a timely manner. While an Rx buffer is full, backpressure is applied to the sender and the receiver cannot receive new data. To decrease the value of this parameter, configure the user-mode applications to read data from the Rx buffers as early as possible, or increase the capacity of the Rx buffers. |
| Rx/Bufs | The distribution of the Rx buffers used by SMC-R connections. SMC-R maintains a memory pool for each link group. When a connection is established, SMC-R allocates an idle memory block of a suitable size from the memory pool to the connection as the Rx buffer. If no idle memory block is available, SMC-R creates a new memory block of a suitable size. After the connection is closed, the memory block is returned to the memory pool. This parameter counts how many times Rx buffers of each size were allocated from the memory pool to SMC-R connections, including both newly created and reused buffers. It does not indicate the actual number of Rx buffers that consume memory. |
| Rx/Reqs | The distribution of sizes of requests received over SMC-R connections. |
| Tx/Data transmitted (Bytes) | The total number of bytes sent over SMC-R connections. |
| Tx/Total requests | The total number of requests sent over SMC-R connections. |
| Tx/Buffer full | The total number of times that the transmit (Tx) buffers for SMC-R connections became full. A Tx buffer may become full if the SMC-R stack does not send the data in the buffer to the link in a timely manner. If the percentage is high, increase the capacity of the Tx buffers based on your business requirements. |
| Tx/Buffer full (remote) | The total number of times that the peer Rx buffers for SMC-R connections became full. If the peer Rx buffer for an SMC-R connection is full, the local end cannot send data to the peer. If the percentage is high, increase the capacity of the peer Rx buffers based on your business requirements. |
| Tx/Buffer too small | The total number of times that the size of a request sent over an SMC-R connection exceeded the size of the corresponding Tx buffer, which indicates that the Tx buffer is too small. If the percentage is high, increase the capacity of the Tx buffers based on your business requirements. |
| Tx/Buffer too small (remote) | The total number of times that the size of a request sent over an SMC-R connection exceeded the size of the corresponding peer Rx buffer, which indicates that the peer Rx buffer is too small. If the percentage is high, increase the capacity of the peer Rx buffers based on your business requirements. |
| Tx/Bufs | The distribution of the Tx buffers used by SMC-R connections. SMC-R maintains a memory pool for each link group. When a connection is established, SMC-R allocates an idle memory block of a suitable size from the memory pool to the connection as the Tx buffer. If no idle memory block is available, SMC-R creates a new memory block of a suitable size. After the connection is closed, the memory block is returned to the memory pool. This parameter counts how many times Tx buffers of each size were allocated from the memory pool to SMC-R connections, including both newly created and reused buffers. It does not indicate the actual number of Tx buffers that consume memory. |
| Tx/Reqs | The distribution of sizes of requests sent over SMC-R connections. |
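The relationship between the summary counters can be checked with a few lines of Python. The following is a minimal sketch, not part of smc-tools: the function names parse_summary and check_summary are hypothetical, and the text layout is assumed to match the sample output. In the sample, the average request count also happens to equal the Rx and Tx request totals divided by the number of SMC connections; whether smcr computes it exactly that way is an assumption.

```python
import re

# Summary lines copied from the sample `smcr stats` output.
SAMPLE_SUMMARY = """\
Total connections handled          5076
SMC connections                    5076
Handshake errors                      0
Avg requests per SMC conn        1977.0
TCP fallback                          0
"""


def parse_summary(text):
    """Map each 'label   number' line to a float (hypothetical helper)."""
    values = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?)\s+(\d+(?:\.\d+)?)\s*$", line)
        if match:
            values[match.group(1)] = float(match.group(2))
    return values


def check_summary(values, rx_requests, tx_requests):
    """Derive a few health indicators from the parsed counters."""
    total = values["Total connections handled"]
    smc = values["SMC connections"]
    parts = smc + values["Handshake errors"] + values["TCP fallback"]
    return {
        # Total should be the sum of the three connection outcomes.
        "sum_matches_total": parts == total,
        # Share of handled connections that fell back to TCP/IP.
        "fallback_ratio": values["TCP fallback"] / total if total else 0.0,
        # Assumed formula: (Rx requests + Tx requests) / SMC connections.
        "avg_requests": round((rx_requests + tx_requests) / smc, 1) if smc else 0.0,
    }


if __name__ == "__main__":
    summary = parse_summary(SAMPLE_SUMMARY)
    print(check_summary(summary, rx_requests=5017741, tx_requests=5017640))
```

A fallback ratio or handshake error count that grows over time is usually the first thing to look at before inspecting the buffer counters.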
Query statistics about SMC-R link groups
Note: By default, a link group contains one link and can carry up to 32 SMC-R connections. Each link group maintains a set of RDMA resources in SMC-R, including queue pairs (QPs), protection domains (PDs), and memory regions (MRs).
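As a rough capacity estimate based on the default in the preceding note, the minimum number of link groups for a given number of connections is the connection count divided by 32, rounded up. The sketch below is illustrative only: min_link_groups is a hypothetical helper, and it assumes that all connections run between the same pair of peers and that every link group is filled before a new one is created.

```python
import math

# Default capacity of one link group, as stated in the note above.
CONNS_PER_LINK_GROUP = 32


def min_link_groups(connections, per_group=CONNS_PER_LINK_GROUP):
    """Smallest number of link groups that can carry `connections`."""
    if connections < 0 or per_group <= 0:
        raise ValueError("connections must be >= 0 and per_group > 0")
    return math.ceil(connections / per_group)


if __name__ == "__main__":
    for count in (1, 32, 33, 5076):
        print(count, "connections ->", min_link_groups(count), "link groups")
```

Because each default link group contains one link, the same number approximates the QPs needed under these assumptions.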
Run the following command to query statistics about all SMC-R link groups that are associated with RDMA devices accessible in the current net namespace:
smcr linkgroup
Sample command output:

# smcr linkgroup
LG-ID     LG-Role  LG-Type  VLAN  #Conns  PNET-ID
00000300  SERV     SINGLE   0     0

Take note of the number of link groups, which indicates the number of QPs in use, and the parameter in the following table.
| Parameter | Description |
| --- | --- |
| #Conns | The number of SMC connections carried by the link group. |

To query more information about link groups, you can run the smcr -d linkgroup command.
Sample command output:
LG-ID    : 00000500
LG-Role  : CLNT
LG-Type  : SINGLE
VLAN     : 0
PNET-ID  :
Version  : 2
Peer-Rel : 1
Peer-Host:
Peer-OS  : LINUX
Direct   : Yes
EID      : SMCV2-DEFAULT-UEID
#Conns   : 32
Sndbuf   : 8388608 B
RMB      : 8388608 B

Take note of the following parameters.

| Parameter | Description |
| --- | --- |
| Sndbuf | The total amount of memory occupied by the Tx buffer pool maintained by the link group. Unit: bytes. |
| RMB | The total amount of memory occupied by the Rx buffer pool maintained by the link group. Unit: bytes. |
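The pool sizes can be related to the per-connection buffer size with simple arithmetic. In the sample output, each pool occupies 8388608 bytes and the link group carries 32 connections, which works out to 256 KB per connection, the same size bucket that the Bufs rows report in the smcr stats sample. The sketch below assumes that every connection holds exactly one equally sized buffer from each pool and that the pool contains no idle blocks; the helper names are hypothetical.

```python
def per_connection_buffer(pool_bytes, connections):
    """Average buffer size per connection, assuming one buffer each."""
    if connections <= 0:
        raise ValueError("connections must be positive")
    return pool_bytes // connections


def format_kb(size_bytes):
    """Render a byte count in KB (1 KB = 1024 bytes)."""
    return f"{size_bytes // 1024} KB"


if __name__ == "__main__":
    # Values taken from the sample `smcr -d linkgroup` output.
    size = per_connection_buffer(pool_bytes=8388608, connections=32)
    print(size, "bytes =", format_kb(size))
```

If the pool size is much larger than the connection count multiplied by the buffer size, the pool probably holds idle blocks that are waiting to be reused.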
Query statistics about SMC-R devices
Run the following command to query information about the RDMA devices used by the SMC-R stack:
smcr device
Sample command output:

# smcr device
Net-Dev  IB-Dev   IB-P  IB-State  Type    Crit  #Links  PNET-ID
eth1     erdma_0  1     ACTIVE    0x107f  No    0

Take note of the following parameters.
| Parameter | Description |
| --- | --- |
| Net-Dev | The name of the Ethernet device. |
| IB-Dev | The name of the RDMA device. |
| IB-P | The port of the RDMA device. |
| IB-State | The status of the RDMA device. |
| Type | The type of the RDMA device. If the device is an elastic RDMA (eRDMA) device of Alibaba Cloud, 0x107f is displayed. |
| #Links | The number of links associated with the RDMA device. |
| PNET-ID | The physical network (PNET) ID of the RDMA device. For more information, see configure PNET ID parameters. |
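For scripted health checks, the device table can be turned into dictionaries keyed by column name. This is a minimal sketch under stated assumptions: parse_devices and inactive_devices are hypothetical helpers, the input is the table without the shell prompt line, and only trailing columns (such as PNET-ID in the sample) are ever blank, because a simple whitespace split cannot detect an empty column in the middle of a row.

```python
# Table copied from the sample `smcr device` output (prompt line omitted).
SAMPLE_DEVICES = """\
Net-Dev  IB-Dev   IB-P  IB-State  Type    Crit  #Links  PNET-ID
eth1     erdma_0  1     ACTIVE    0x107f  No    0
"""


def parse_devices(text):
    """Return one dict per device row, keyed by the header names."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        cells = line.split()
        # Pad blank trailing columns (for example, an empty PNET-ID).
        cells += [""] * (len(header) - len(cells))
        rows.append(dict(zip(header, cells)))
    return rows


def inactive_devices(rows):
    """Names of RDMA devices whose IB-State is not ACTIVE."""
    return [row["IB-Dev"] for row in rows if row["IB-State"] != "ACTIVE"]


if __name__ == "__main__":
    devices = parse_devices(SAMPLE_DEVICES)
    print(devices)
    print("inactive:", inactive_devices(devices))
```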
Monitor connections
Similar to ss, the SMC user-mode tool smcss in the smc-tools toolset monitors SMC sockets. The smcss tool retrieves socket information from netlink, including information about sockets that use SMC after negotiation or fall back to TCP if negotiation fails.
Query basic information about SMC sockets
Run the following command to query basic information about connecting, closing, or connected SMC sockets that run on the SMC stack or fall back to the TCP stack in the current net namespace:
smcss
Sample command output:
# smcss
State   UID    Inode    Local Address     Peer Address        Intf  Mode
ACTIVE  00994  2954337  xxx.xxx.x.xx:80   xxx.xxx.x.xx:36000  0000  SMCR
ACTIVE  00994  2953297  xxx.xxx.x.xx:80   xxx.xxx.x.xx:35948  0000  TCP 0x03010000

Take note of the following parameters.

| Parameter | Description |
| --- | --- |
| State | The status of the socket. The valid values are listed after this table. |
| Local Address | The local IPv4 or IPv4-mapped IPv6 address and port. SMC only supports the IPv4 protocol. |
| Peer Address | The peer IPv4 or IPv4-mapped IPv6 address and port. SMC only supports the IPv4 protocol. |
| Mode | The communication mode. The valid values are listed after this table. |

Valid values of State:
INIT: The socket is being initialized.
CLOSED: The socket is closed.
LISTEN: The socket is a listening socket.
ACTIVE: The SMC socket has an established connection.
PEERCLW1: No further data will be sent to the peer.
PEERCLW2: No further data will be sent to or received from the peer.
APPLCLW1: No further data will be received from the peer.
APPLCLW2: No further data will be received from or sent to the peer.
APPLFINCLW: The peer has closed the socket.
PEERFINCLW: The socket is closed locally.
PEERABORTW: The socket was abnormally closed locally.
PROCESSABORT: The peer has closed the socket abnormally.

Valid values of Mode:
SMCR: The socket uses the SMC-R stack for communication.
TCP <fallback reason>: The socket falls back to the TCP/IP stack. A numeric code indicates the fallback reason. For information about the meaning of a numeric code, see fallback to TCP/IP after enabling SMC.

Query statistics about SMC sockets in the LISTEN state
Run the following command to query statistics about SMC sockets in the listening (LISTEN) state in the current net namespace:
smcss -l
The parameters in the smcss -l command output are the same as those in the smcss command output.
Query statistics about SMC sockets that run on the SMC-R stack
Run the following command to query statistics about SMC sockets that run on the SMC-R stack in the current net namespace:
smcss -R
Sample command output:

# smcss -R
State   UID    Inode    Local Address       Peer Address     Intf  Mode  Role  IB-device  Port  Linkid  GID                                      Peer-GID
ACTIVE  00000  1833669  xxx.xxx.x.xx:33618  xxx.xxx.x.xx:80  0000  SMCR  CLNT  erdma_0    01    01      0000:0000:0000:0000:0000:xxxx:xxxx:xxxx  0000:0000:0000:0000:0000:xxxx:xxxx:xxxx

In addition to the preceding basic parameters in the smcss command output, take note of the following parameters.

| Parameter | Description |
| --- | --- |
| IB-device | The name of the RDMA device used for the connection. |
| Port | The port of the RDMA device used for the connection. |
| GID | The global ID (GID) of the RDMA device used for the connection. |
| Peer-GID | The GID of the peer RDMA device. |

Query statistics about all SMC sockets
Run the following command to query statistics about all SMC sockets in the current net namespace:
smcss -a
The parameters in the smcss -a command output are the same as those in the smcss command output.
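To see at a glance how many sockets use SMC-R and how many fell back to TCP, the smcss output described in this section can be tallied by its Mode column. The following sketch is an illustration rather than a supported tool: count_modes is a hypothetical helper, the IP addresses are placeholders from the documentation range, and the parser assumes that both address columns are populated, so rows with fewer columns (for example, listening sockets without a peer) are skipped.

```python
from collections import Counter

# Layout follows the sample `smcss` output; addresses are placeholders.
SAMPLE_SMCSS = """\
State   UID    Inode    Local Address  Peer Address      Intf  Mode
ACTIVE  00994  2954337  192.0.2.10:80  192.0.2.20:36000  0000  SMCR
ACTIVE  00994  2953297  192.0.2.10:80  192.0.2.20:35948  0000  TCP 0x03010000
"""


def count_modes(text):
    """Count sockets per mode and fallback sockets per reason code."""
    modes = Counter()
    reasons = Counter()
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 7:
            continue  # row without both addresses; not handled here
        mode = fields[6]
        modes[mode] += 1
        if mode == "TCP" and len(fields) > 7:
            reasons[fields[7]] += 1  # numeric fallback reason code
    return modes, reasons


if __name__ == "__main__":
    mode_counts, reason_counts = count_modes(SAMPLE_SMCSS)
    print(dict(mode_counts))
    print(dict(reason_counts))
```

Grouping by reason code makes it easier to look up the most frequent fallback cause first.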
Monitor devices
SMC-R uses Alibaba Cloud eRDMA devices as underlying RDMA devices. eRDMA provides various user-mode tools to collect statistics about RDMA resources and devices. For more information, see Monitor and diagnose eRDMA.
Integrated monitoring tool
smc_monitor_ex is a monitoring script provided by smc-tools. It calls the basic commands of smc-tools, such as smcr and smcss, to collect statistics about SMC traffic, connections, and memory usage.
Tool usage
Run the following command to query the usage of smc_monitor_ex.
Warning: smc_monitor_ex is an experimental tool, and its usage may change in the future.

# smc_monitor_ex -h
usage: smc_monitor_ex [-h] {speed,s,connection,c,memory,m,base,b} ...

SMC Monitor Tool (Experimental)

positional arguments:
  {speed,s,connection,c,memory,m,base,b}
                        commands
    speed (s)           View transfer rates
    connection (c)      View connection counts
    memory (m)          View memory usages
    base (b)            View transfer rates, connection counts, and memory usages

optional arguments:
  -h, --help            show this help message and exit

Query statistics about SMC traffic
Run the speed subcommand in smc_monitor_ex to query the SMC traffic rate and requests per second (RPS) in the current net namespace.

# smc_monitor_ex speed -h
usage: smc_monitor_ex speed [-h] [-i INTERVAL] [-r] [-m {smcr,smcd,smc}]

optional arguments:
  -h, --help            show this help message and exit
  -i INTERVAL, --interval INTERVAL
                        Interval in seconds to display transfer rates.
  -r, --raw             Display rates in B/s without converting units.
  -m {smcr,smcd,smc}, --mode {smcr,smcd,smc}
                        Mode to check, either 'smc', 'smcr' or 'smcd', default is 'smc'

For example, run the following command to query the SMC-R traffic rate and RPS in the current net namespace every second:

# smc_monitor_ex speed -m smcr -i 1
Date                 Mode  Rx Rate  Rx Rps  Tx Rate  Tx Rps
2025-02-21 14:01:48  smcr  0.0 B/s  0.0 /s  0.0 B/s  0.0 /s
Date                 Mode  Rx Rate  Rx Rps  Tx Rate  Tx Rps
2025-02-21 14:01:49  smcr  0.0 B/s  0.0 /s  0.0 B/s  0.0 /s
Date                 Mode  Rx Rate  Rx Rps  Tx Rate  Tx Rps
2025-02-21 14:01:50  smcr  0.0 B/s  0.0 /s  0.0 B/s  0.0 /s

Query statistics about SMC connections
Run the connection subcommand in smc_monitor_ex to query the number of connections that use SMC or fall back to TCP in the current net namespace.

# smc_monitor_ex connection -h
usage: smc_monitor_ex connection [-h] [-i INTERVAL] [-m {smc,smcr,smcd,fallback,all}]

optional arguments:
  -h, --help            show this help message and exit
  -i INTERVAL, --interval INTERVAL
                        Interval in seconds to display connections.
  -m {smc,smcr,smcd,fallback,all}, --mode {smc,smcr,smcd,fallback,all}
                        Mode to check, either 'all', 'smc', 'smcr', 'smcd' or 'fallback', default is 'all'

For example, run the following command to query the number of connections that use SMC-R in the current net namespace every second:

# smc_monitor_ex connection -m smcr -i 1
Date                 Mode  #Conn
2025-02-21 14:06:47  smcr  0
Date                 Mode  #Conn
2025-02-21 14:06:48  smcr  0
Date                 Mode  #Conn
2025-02-21 14:06:49  smcr  0

Query statistics about memory usage
Run the memory subcommand in smc_monitor_ex to query the total size of ring buffers used by SMC connections in the current net namespace.

# smc_monitor_ex memory -h
usage: smc_monitor_ex memory [-h] [-i INTERVAL] [-r] [-m {smcr,smcd,smc}]

optional arguments:
  -h, --help            show this help message and exit
  -i INTERVAL, --interval INTERVAL
                        Interval in seconds to display ringbuf usages.
  -r, --raw             Display memory usages in bytes without converting units.
  -m {smcr,smcd,smc}, --mode {smcr,smcd,smc}
                        Mode to check, either 'smcr', 'smcd' or 'smc', default is 'smc'

For example, run the following command to query the total size of ring buffers used by SMC-R connections in the current net namespace every second:

# smc_monitor_ex memory -m smcr -i 1
Date                 Mode  Rx Bufs    Tx Bufs
2025-01-06 15:14:20  smcr  512.00 KB  512.00 KB
Date                 Mode  Rx Bufs    Tx Bufs
2025-01-06 15:14:21  smcr  512.00 KB  512.00 KB

Query all SMC statistics
Run the base subcommand in smc_monitor_ex to query all the preceding SMC-related statistics in the current net namespace.
For example, run the following command to query all SMC-R related statistics in the current net namespace every second:

# smc_monitor_ex base -m smcr -i 1
Date                 Mode  Rx Rate    Rx Rps     Tx Rate  Tx Rps  #Conn  Rx Bufs    Tx Bufs
2025-01-06 15:17:23  smcr  1.81 GB/s  21.66 K/s  0 B/s    0 /s    2      512.00 KB  512.00 KB
Date                 Mode  Rx Rate    Rx Rps     Tx Rate  Tx Rps  #Conn  Rx Bufs    Tx Bufs
2025-01-06 15:17:24  smcr  1.82 GB/s  21.81 K/s  0 B/s    0 /s    2      512.00 KB  512.00 KB
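Per-second figures such as Rx Rate and Rx Rps can also be derived by hand from cumulative counters. One plausible approach, shown in the sketch below, is to read Data transmitted (Bytes) and Total requests from smcr stats at two points in time and divide the differences by the elapsed interval. This is not necessarily how smc_monitor_ex is implemented; the function name and the counter values in the example are hypothetical.

```python
def rates(previous, current, interval_seconds):
    """Return (bytes per second, requests per second).

    `previous` and `current` are (bytes, requests) tuples of cumulative
    counters sampled `interval_seconds` apart.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta_bytes = current[0] - previous[0]
    delta_requests = current[1] - previous[1]
    return delta_bytes / interval_seconds, delta_requests / interval_seconds


if __name__ == "__main__":
    # Hypothetical counter snapshots taken one second apart.
    first = (200_000_000, 5_000_000)
    second = (200_705_600, 5_017_741)
    byte_rate, request_rate = rates(first, second, interval_seconds=1)
    print(f"{byte_rate:.1f} B/s, {request_rate:.1f} requests/s")
```

Dividing the byte rate by the request rate gives the average request size for the interval, which can be compared against the Reqs size distribution.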