Alibaba Cloud Linux: SMC monitoring

Last Updated: Mar 25, 2025

This topic describes the metrics that you must monitor during Shared Memory Communication (SMC) maintenance to determine network health.

Prerequisites

smc-tools, which is a user-mode SMC maintenance toolset provided by Alibaba Cloud Linux 3, is installed.

If the smc-tools toolset is not installed, run the following command to install the toolset:

sudo yum install -y smc-tools

Monitor protocol stacks

The SMC stack provides statistical metrics related to connections, traffic, and shared memory, and transfers the metrics to user space over netlink. The smcr command in the smc-tools toolset retrieves and interprets these metrics.

  • Query statistics about the SMC-R stack

    Run the following command to query statistics about the SMC over Remote Direct Memory Access (SMC-R) stack in the current net namespace:

    smcr stats

    Sample command output:

    # smcr stats
    SMC-R Connections Summary
      Total connections handled          5076
      SMC connections                    5076
      Handshake errors                      0
      Avg requests per SMC conn          1977.0
      TCP fallback                          0
    
    RX Stats
      Data transmitted (Bytes)      200705600 (200.7M)
      Total requests                  5017741
      Buffer usage (Bytes)                  0 (0)
      Buffer full                           0 (0.00%)
                8KB    16KB    32KB    64KB   128KB   256KB   512KB  >512KB
      Bufs        0       0       0       0       0  5.076K       0       0
      Reqs   5.018M       0       0       0       0       0       0       0
    
    TX Stats
      Data transmitted (Bytes)     1194173445 (1.194G)
      Total requests                  5017640
      Buffer usage (Bytes)                  0 (0)
      Buffer full                           0 (0.00%)
      Buffer full (remote)                  0 (0.00%)
      Buffer too small                      0 (0.00%)
      Buffer too small (remote)             0 (0.00%)
                8KB    16KB    32KB    64KB   128KB   256KB   512KB  >512KB
      Bufs        0       0       0       0       0  5.076K       0       0
      Reqs   5.018M       0       0       0       0       0       0       0
    
    Extras
      Special socket calls                  5

    Take note of the following parameters.

    Parameter

    Description

    Total connections handled

    The total number of connections handled by the SMC-R stack, which is the sum of the SMC connections, Handshake errors, and TCP fallback values.

    SMC connections

    The total number of connections converted into SMC-R connections.

    Handshake errors

    The total number of connections that failed due to errors during the handshake phase, such as no responses received from the peer.

    Avg requests per SMC conn

    The average number of requests received or sent per SMC connection.

    TCP fallback

    The total number of connections that fell back to TCP/IP.

    Rx/Data transmitted (Bytes)

    The total number of bytes received over SMC-R connections.

    Rx/Total requests

    The total number of requests received over SMC-R connections.

    Rx/Buffer usage (Bytes)

    The total size of receive (Rx) buffers used by SMC-R connections. Unit: bytes.

    Rx/Buffer full

    The total number of times that the Rx buffers for SMC-R connections became full. An Rx buffer may become full if the user-mode application that uses the SMC-R connection does not read data from the buffer in a timely manner. While the Rx buffer is full, the sender is subject to backpressure and the receiver cannot receive new data. To decrease the value of this parameter, configure the user-mode applications to read data from the Rx buffers as early as possible, or increase the capacity of the Rx buffers.

    Rx/Bufs

    The distribution of the Rx buffers used by SMC-R connections. SMC-R maintains a memory pool for each link group. When a connection is established, SMC-R allocates an idle memory block of a suitable size from the memory pool to the connection as the Rx buffer. If no idle memory block is available, SMC-R creates a new memory block of a suitable size. After the connection is closed, the memory block is reclaimed to the memory pool. This parameter specifies the total number of times that Rx buffers were allocated from the memory pool to SMC-R connections and the distribution of the Rx buffer sizes, including the number of times that Rx buffers were created and reused, but does not specify the actual number of Rx buffers that consume memory.

    Rx/Reqs

    The distribution of sizes of requests received over SMC-R connections.

    Tx/Data transmitted (Bytes)

    The total number of bytes sent over SMC-R connections.

    Tx/Total requests

    The total number of requests sent over SMC-R connections.

    Tx/Buffer full

    The total number of times that the transmit (Tx) buffers for SMC-R connections became full. If the SMC-R stack does not send the data in a Tx buffer to links in a timely manner, the Tx buffer may become full. If the percentage is high, increase the capacity of the Tx buffers based on your business requirements.

    Tx/Buffer full (remote)

    The total number of times that the peer Rx buffers for SMC-R connections became full. If the peer Rx buffer for an SMC-R connection is full, the local end cannot send data to the peer. If the percentage is high, increase the capacity of the peer Rx buffers based on your business requirements.

    Tx/Buffer too small

    The total number of times that the size of requests sent over an SMC-R connection exceeded the size of the corresponding Tx buffer. If the size of requests sent over an SMC-R connection exceeds the size of the corresponding Tx buffer, the size of the Tx buffer is excessively small. If the percentage is high, increase the capacity of the Tx buffers based on your business requirements.

    Tx/Buffer too small (remote)

    The total number of times that the size of requests sent over an SMC-R connection exceeded the size of the corresponding peer Rx buffer. If the size of requests sent over an SMC-R connection exceeds the size of the corresponding peer Rx buffer, the size of the peer Rx buffer is excessively small. If the percentage is high, increase the capacity of the peer Rx buffers based on your business requirements.

    Tx/Bufs

    The distribution of the Tx buffers used by SMC-R connections. SMC-R maintains a memory pool for each link group. When a connection is established, SMC-R allocates an idle memory block of a suitable size from the memory pool to the connection as the Tx buffer. If no idle memory block is available, SMC-R creates a new memory block of a suitable size. After the connection is closed, the memory block is reclaimed to the memory pool. This parameter specifies the total number of times that Tx buffers were allocated from the memory pool to SMC-R connections and the distribution of the Tx buffer sizes, including the number of times that Tx buffers were created and reused, but does not specify the actual number of Tx buffers that consume memory.

    Tx/Reqs

    The distribution of sizes of requests sent over SMC-R connections.
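
The counters described above lend themselves to simple automated checks. The following Python sketch parses text shaped like the sample smcr stats output and derives the share of connections that fell back to TCP. The names parse_smcr_stats and fallback_ratio are hypothetical helpers, not part of smc-tools, and the parser assumes the section titles and counter labels exactly as they appear in the sample; other smc-tools versions may print different labels.

```python
import re

# Abbreviated copy of the sample output shown in this topic.
SAMPLE = """\
SMC-R Connections Summary
  Total connections handled          5076
  SMC connections                    5076
  Handshake errors                      0
  Avg requests per SMC conn          1977.0
  TCP fallback                          0

RX Stats
  Data transmitted (Bytes)      200705600 (200.7M)
  Total requests                  5017741
  Buffer full                           0 (0.00%)
  Bufs        0       0       0       0       0  5.076K       0       0

TX Stats
  Data transmitted (Bytes)     1194173445 (1.194G)
  Total requests                  5017640
  Buffer full                           0 (0.00%)
  Buffer full (remote)                  0 (0.00%)
"""

# Section titles as printed in the sample (assumed stable across versions).
SECTIONS = {
    "SMC-R Connections Summary": "summary",
    "RX Stats": "rx",
    "TX Stats": "tx",
    "Extras": "extras",
}

# A counter line: indented label (words separated by single spaces), a gap of
# two or more spaces, one number, and an optional "(...)" annotation.
# Histogram rows such as "Bufs" carry several numbers and do not match.
LINE = re.compile(r"^\s+(\S+(?: \S+)*)\s{2,}(\d+(?:\.\d+)?)(?:\s+\(.*\))?\s*$")


def parse_smcr_stats(text):
    """Return {section: {label: value}} for the single-value counters."""
    stats = {}
    section = None
    for line in text.splitlines():
        title = line.strip()
        if title in SECTIONS:
            section = SECTIONS[title]
            stats[section] = {}
            continue
        match = LINE.match(line)
        if match and section:
            stats[section][match.group(1)] = float(match.group(2))
    return stats


def fallback_ratio(stats):
    """Share of handled connections that fell back to TCP/IP."""
    summary = stats["summary"]
    total = summary["Total connections handled"]
    return summary["TCP fallback"] / total if total else 0.0


stats = parse_smcr_stats(SAMPLE)
print(fallback_ratio(stats), stats["rx"]["Buffer full"], stats["tx"]["Buffer full (remote)"])
```

In practice, the text would come from running smcr stats and capturing its standard output. Tracking the fallback ratio and the Buffer full counters over time is one way to notice when buffer sizes need tuning.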

  • Query statistics about SMC-R link groups

    Note

    By default, a link group contains one link and can carry 32 SMC-R connections. Each link group maintains a set of RDMA resources in SMC-R, including queue pairs (QPs), protection domains (PDs), and memory regions (MRs).

    Run the following command to query statistics about all SMC-R link groups that are associated with RDMA devices accessible in the current net namespace:

    smcr linkgroup

    Sample command output:

    # smcr linkgroup
    LG-ID    LG-Role  LG-Type  VLAN  #Conns  PNET-ID
    00000300 SERV     SINGLE      0       0

    Take note of the number of link groups, which indicates the number of QPs in use, and the parameter in the following table.

    Parameter

    Description

    #Conns

    The number of SMC connections carried by the link group.

    To query more information about link groups, you can run the smcr -d linkgroup command.

    Sample command output:

    LG-ID    : 00000500
    LG-Role  : CLNT
    LG-Type  : SINGLE
    VLAN     : 0
    PNET-ID  :
    Version  : 2
    Peer-Rel : 1
    Peer-Host:
    Peer-OS  : LINUX
    Direct   : Yes
    EID      : SMCV2-DEFAULT-UEID
    #Conns   : 32
    Sndbuf   : 8388608 B
    RMB      : 8388608 B

    Take note of the following parameters.

    Parameter

    Description

    Sndbuf

    The total amount of memory occupied by the Tx buffer pool maintained by the link group. Unit: bytes.

    RMB

    The total amount of memory occupied by the Rx buffer pool maintained by the link group. Unit: bytes.
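
To see how much memory the buffer pools of all link groups occupy together, the detailed output can be summed per field. The following Python sketch assumes the "key : value" layout shown in the sample and assumes that each record starts with an LG-ID line. The helper names parse_linkgroup_details and total_bytes are hypothetical, and the second record in the input is invented for illustration.

```python
# Abbreviated, illustrative input. The first record follows the sample in this
# topic; the second record is hypothetical.
SAMPLE = """\
LG-ID    : 00000500
LG-Role  : CLNT
PNET-ID  :
#Conns   : 32
Sndbuf   : 8388608 B
LG-ID    : 00000600
LG-Role  : CLNT
PNET-ID  :
#Conns   : 4
Sndbuf   : 8388608 B
"""


def parse_linkgroup_details(text):
    """Split 'key : value' lines into one dict per link group."""
    groups = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank or unrelated line
        key, value = key.strip(), value.strip()
        if key == "LG-ID":  # assumption: every record begins with LG-ID
            groups.append({})
        if groups:
            groups[-1][key] = value
    return groups


def total_bytes(groups, field):
    """Sum a field printed as '<number> B' across all link groups."""
    return sum(int(g[field].split()[0]) for g in groups if g.get(field))


groups = parse_linkgroup_details(SAMPLE)
print(len(groups), "link groups,", total_bytes(groups, "Sndbuf"), "bytes of Tx buffer pool")
```

The same helper can total the Rx buffer pool: pass the name of that field as it is printed on your system.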

  • Query statistics about SMC-R devices

    Run the following command to query information about the RDMA devices used by the SMC-R stack:

    smcr device

    Sample command output:

    # smcr device
    Net-Dev         IB-Dev   IB-P  IB-State  Type          Crit  #Links  PNET-ID
    eth1            erdma_0     1    ACTIVE  0x107f          No       0  

    Take note of the following parameters.

    Parameter

    Description

    Net-Dev

    The name of the Ethernet device.

    IB-Dev

    The name of the RDMA device.

    IB-P

    The port of the RDMA device.

    IB-State

    The status of the RDMA device.

    Type

    The type of the RDMA device. If the device is an elastic RDMA (eRDMA) device of Alibaba Cloud, 0x107f is displayed.

    #Links

    The number of links associated with the RDMA device.

    PNET-ID

    The physical network (PNET) ID of the RDMA device. For more information, see configure PNET ID parameters.
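
A health check can flag RDMA devices that are not in the ACTIVE state. The following Python sketch parses text shaped like the sample smcr device output. It assumes that no column value contains spaces and that PNET-ID, the last column, may be empty. The helper names parse_devices and inactive_devices are hypothetical.

```python
# Columns that are always expected to carry a value (PNET-ID may be empty).
COLUMNS = ["Net-Dev", "IB-Dev", "IB-P", "IB-State", "Type", "Crit", "#Links"]

# Copy of the sample output shown in this topic.
SAMPLE = """\
Net-Dev         IB-Dev   IB-P  IB-State  Type          Crit  #Links  PNET-ID
eth1            erdma_0     1    ACTIVE  0x107f          No       0
"""


def parse_devices(text):
    """Return one dict per device row, skipping the header and short lines."""
    devices = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < len(COLUMNS) or fields[0] == "Net-Dev":
            continue
        device = dict(zip(COLUMNS, fields))
        # An eighth field, if present, is taken to be the PNET ID.
        device["PNET-ID"] = fields[len(COLUMNS)] if len(fields) > len(COLUMNS) else ""
        devices.append(device)
    return devices


def inactive_devices(devices):
    """Names of RDMA devices whose IB-State is anything other than ACTIVE."""
    return [d["IB-Dev"] for d in devices if d["IB-State"] != "ACTIVE"]


devices = parse_devices(SAMPLE)
print(inactive_devices(devices))
```

An empty list means that every listed RDMA device reports ACTIVE. The Type field can be compared with 0x107f in the same way to confirm that a device is an eRDMA device.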

Monitor connections

Similar to ss, the SMC user-mode tool smcss in the smc-tools toolset monitors SMC sockets. The smcss tool retrieves socket information from netlink, including information about sockets that use SMC after negotiation or fall back to TCP if negotiation fails.

  • Query basic information about SMC sockets

    Run the following command to query basic information about connecting, closing, or connected SMC sockets that run on the SMC stack or fall back to the TCP stack in the current net namespace:

    smcss

    Sample command output:

    # smcss
    State          UID   Inode   Local Address           Peer Address            Intf Mode
    ACTIVE         00994 2954337 xxx.xxx.x.xx:80         xxx.xxx.x.xx:36000      0000 SMCR
    ACTIVE         00994 2953297 xxx.xxx.x.xx:80         xxx.xxx.x.xx:35948      0000 TCP 0x03010000

    Take note of the following parameters.

    Parameter

    Description

    State

    The status of the socket. Valid values:

    • INIT: The socket is being initialized.

    • CLOSED: The socket is closed.

    • LISTEN: The socket is a listening socket.

    • ACTIVE: The SMC socket has an established connection.

    • PEERCLW1: No further data will be sent to the peer.

    • PEERCLW2: No further data will be sent to or received from the peer.

    • APPLCLW1: No further data will be received from the peer.

    • APPLCLW2: No further data will be received from or sent to the peer.

    • APPLFINCLW: The peer has closed the socket.

    • PEERFINCLW: The socket is closed locally.

    • PEERABORTW: The socket was abnormally closed locally.

    • PROCESSABORT: The peer has closed the socket abnormally.

    Local Address

    The local IPv4 or IPv4-mapped IPv6 address and port. SMC only supports the IPv4 protocol.

    Peer Address

    The peer IPv4 or IPv4-mapped IPv6 address and port. SMC only supports the IPv4 protocol.

    Mode

    The communication mode.

    • SMCR: uses the SMC-R stack for communication.

    • TCP <fallback reason>: falls back to the TCP/IP stack. A numeric code indicates the fallback reason. For information about the meaning of a numeric code, see fallback to TCP/IP after enabling SMC.
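
Counting sockets by the Mode column shows at a glance how many connections use SMC-R and how many fell back to TCP, and for which reason codes. The following Python sketch tallies text shaped like the sample smcss output. The helper name tally_modes is hypothetical, the 192.0.2.x addresses are placeholders that stand in for the masked addresses in the sample, and rows without both a local and a peer address are assumed to be irrelevant and are skipped.

```python
from collections import Counter

# Sample modeled on the output above; addresses are placeholders.
SAMPLE = """\
State          UID   Inode   Local Address           Peer Address            Intf Mode
ACTIVE         00994 2954337 192.0.2.10:80           192.0.2.20:36000        0000 SMCR
ACTIVE         00994 2953297 192.0.2.10:80           192.0.2.20:35948        0000 TCP 0x03010000
"""


def tally_modes(text):
    """Count sockets per mode and TCP fallbacks per reason code."""
    modes = Counter()
    reasons = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Expected fields: State, UID, Inode, Local, Peer, Intf, Mode, [reason]
        if len(fields) < 7 or fields[0] == "State":
            continue
        mode = fields[6]
        modes[mode] += 1
        if mode == "TCP" and len(fields) > 7:
            reasons[fields[7]] += 1
    return modes, reasons


modes, reasons = tally_modes(SAMPLE)
print(dict(modes), dict(reasons))
```

A reason code that dominates the second counter is a good starting point when you look up fallback causes.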

  • Query statistics about SMC sockets in the LISTEN state

    Run the following command to query statistics about SMC sockets in the listening (LISTEN) state in the current net namespace:

    smcss -l

    The parameters in the smcss -l command output are the same as those in the smcss command output.

  • Query statistics about SMC sockets that run on the SMC-R stack

    Run the following command to query statistics about SMC sockets that run on the SMC-R stack in the current net namespace:

    smcss -R

    Sample command output:

    # smcss -R
    State          UID   Inode   Local Address           Peer Address            Intf Mode Role IB-device       Port Linkid GID                                      Peer-GID
    ACTIVE         00000 1833669 xxx.xxx.x.xx:33618      xxx.xxx.x.xx:80         0000 SMCR CLNT erdma_0         01   01     0000:0000:0000:0000:0000:xxxx:xxxx:xxxx  0000:0000:0000:0000:0000:xxxx:xxxx:xxxx

    In addition to the preceding basic parameters in the smcss command output, take note of the following parameters.

    Parameter

    Description

    IB-device

    The name of the RDMA device used for the connection.

    Port

    The port of the RDMA device used for the connection.

    GID

    The global ID (GID) of the RDMA device used for the connection.

    Peer-GID

    The GID of the peer RDMA device.

  • Query statistics about all SMC sockets

    Run the following command to query statistics about all SMC sockets in the current net namespace:

    smcss -a

    The parameters in the smcss -a command output are the same as those in the smcss command output.

Monitor devices

SMC-R uses Alibaba Cloud eRDMA devices as underlying RDMA devices. eRDMA provides various user-mode tools to collect statistics about RDMA resources and devices. For more information, see Monitor and diagnose eRDMA.

Integrated monitoring tool

smc_monitor_ex is a monitoring script provided by smc-tools. It calls the atomic commands of smc-tools, such as smcr and smcss, to collect statistics about SMC traffic, connections, and memory usage.

  • Tool usage

    Run the following command to query the usage of smc_monitor_ex.

    Warning

    smc_monitor_ex is an experimental tool, and its usage may change in the future.

    # smc_monitor_ex -h
    usage: smc_monitor_ex [-h] {speed,s,connection,c,memory,m,base,b} ...
    
    SMC Monitor Tool (Experimental)
    
    positional arguments:
      {speed,s,connection,c,memory,m,base,b}
                            commands
        speed (s)           View transfer rates
        connection (c)      View connection counts
        memory (m)          View memory usages
        base (b)            View transfer rates, connection counts, and memory
                            usages
    
    optional arguments:
      -h, --help            show this help message and exit

  • Query statistics about SMC traffic

    Run the speed subcommand in smc_monitor_ex to query the SMC traffic rate and requests per second (RPS) in the current net namespace.

    # smc_monitor_ex speed -h
    usage: smc_monitor_ex speed [-h] [-i INTERVAL] [-r] [-m {smcr,smcd,smc}]
    
    optional arguments:
      -h, --help            show this help message and exit
      -i INTERVAL, --interval INTERVAL
                            Interval in seconds to display transfer rates.
      -r, --raw             Display rates in B/s without converting units.
      -m {smcr,smcd,smc}, --mode {smcr,smcd,smc}
                            Mode to check, either 'smc', 'smcr' or 'smcd', default
                            is 'smc'

    For example, run the following command to query the SMC-R traffic rate and RPS in the current net namespace every second:

    # smc_monitor_ex speed -m smcr -i 1
    Date                  Mode            Rx Rate            Rx Rps           Tx Rate            Tx Rps
    2025-02-21 14:01:48   smcr            0.0 B/s            0.0 /s           0.0 B/s            0.0 /s
    Date                  Mode            Rx Rate            Rx Rps           Tx Rate            Tx Rps
    2025-02-21 14:01:49   smcr            0.0 B/s            0.0 /s           0.0 B/s            0.0 /s
    Date                  Mode            Rx Rate            Rx Rps           Tx Rate            Tx Rps
    2025-02-21 14:01:50   smcr            0.0 B/s            0.0 /s           0.0 B/s            0.0 /s

  • Query statistics about SMC connections

    Run the connection subcommand in smc_monitor_ex to query the number of connections that use SMC or fall back to TCP in the current net namespace.

    # smc_monitor_ex connection -h
    usage: smc_monitor_ex connection [-h] [-i INTERVAL]
                                     [-m {smc,smcr,smcd,fallback,all}]
    
    optional arguments:
      -h, --help            show this help message and exit
      -i INTERVAL, --interval INTERVAL
                            Interval in seconds to display connections.
      -m {smc,smcr,smcd,fallback,all}, --mode {smc,smcr,smcd,fallback,all}
                            Mode to check, either 'all', 'smc', 'smcr', 'smcd' or
                            'fallback', default is 'all'

    For example, run the following command to query the number of connections that use SMC-R in the current net namespace every second:

    # smc_monitor_ex connection -m smcr -i 1
    Date                  Mode                 #Conn
    2025-02-21 14:06:47   smcr                     0
    Date                  Mode                 #Conn
    2025-02-21 14:06:48   smcr                     0
    Date                  Mode                 #Conn
    2025-02-21 14:06:49   smcr                     0

  • Query statistics about memory usage

    Run the memory subcommand in smc_monitor_ex to query the total size of ring buffers used by SMC connections in the current net namespace.

    # smc_monitor_ex memory -h
    usage: smc_monitor_ex memory [-h] [-i INTERVAL] [-r] [-m {smcr,smcd,smc}]
    
    optional arguments:
      -h, --help            show this help message and exit
      -i INTERVAL, --interval INTERVAL
                            Interval in seconds to display ringbuf usages.
      -r, --raw             Display memory usages in bytes without converting
                            units.
      -m {smcr,smcd,smc}, --mode {smcr,smcd,smc}
                            Mode to check, either 'smcr', 'smcd' or 'smc', default
                            is 'smc'

    For example, run the following command to query the total size of ring buffers used by SMC-R connections in the current net namespace every second:

    # smc_monitor_ex memory -m smcr -i 1
    Date                  Mode            Rx Bufs           Tx Bufs
    2025-01-06 15:14:20   smcr          512.00 KB         512.00 KB
    Date                  Mode            Rx Bufs           Tx Bufs
    2025-01-06 15:14:21   smcr          512.00 KB         512.00 KB

  • Query all SMC statistics

    Run the base subcommand in smc_monitor_ex to query all the preceding SMC-related statistics in the current net namespace.

    For example, run the following command to query all SMC-R related statistics in the current net namespace every second:

    # smc_monitor_ex base -m smcr -i 1
    Date                  Mode             Rx Rate      Rx Rps        Tx Rate      Tx Rps     #Conn     Rx Bufs     Tx Bufs
    2025-01-06 15:17:23   smcr           1.81 GB/s   21.66 K/s          0 B/s        0 /s         2   512.00 KB   512.00 KB
    Date                  Mode             Rx Rate      Rx Rps        Tx Rate      Tx Rps     #Conn     Rx Bufs     Tx Bufs
    2025-01-06 15:17:24   smcr           1.82 GB/s   21.81 K/s          0 B/s        0 /s         2   512.00 KB   512.00 KB
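
Because smc_monitor_ex repeats the header line before every sample, the output is easier to post-process once the rows are turned into records. The following Python sketch does this for text shaped like the sample base output. It assumes that columns are separated by at least two spaces and that values such as "1.81 GB/s" contain only single spaces. The helper name parse_monitor_rows is hypothetical. Because the tool is experimental, its output format may change.

```python
import re

# Copy of the sample output shown above.
SAMPLE = """\
Date                  Mode             Rx Rate      Rx Rps        Tx Rate      Tx Rps     #Conn     Rx Bufs     Tx Bufs
2025-01-06 15:17:23   smcr           1.81 GB/s   21.66 K/s          0 B/s        0 /s         2   512.00 KB   512.00 KB
Date                  Mode             Rx Rate      Rx Rps        Tx Rate      Tx Rps     #Conn     Rx Bufs     Tx Bufs
2025-01-06 15:17:24   smcr           1.82 GB/s   21.81 K/s          0 B/s        0 /s         2   512.00 KB   512.00 KB
"""


def parse_monitor_rows(text):
    """Turn repeated header/value line pairs into a list of dicts."""
    header = None
    rows = []
    for line in text.splitlines():
        # Columns are separated by runs of two or more spaces.
        cells = re.split(r"\s{2,}", line.strip())
        if cells[0] == "Date":
            header = cells  # remember the most recent header
        elif header and len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


for row in parse_monitor_rows(SAMPLE):
    print(row["Date"], row["Rx Rate"], row["#Conn"])
```

The values are kept as strings with their units. Converting them to numbers is left out on purpose because the sample output does not state whether the units are decimal or binary.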