Alibaba Cloud Linux:Monitor and check SMC

Last Updated:Jul 22, 2024

This topic describes the tools that you can use to monitor and check Shared Memory Communications over Remote Direct Memory Access (SMC-R), which is part of the Shared Memory Communications (SMC) stack in the kernel, and explains how to use the tools. Based on the output of the tools, you can analyze traffic metrics in an SMC network and determine the health status of the network.

Prerequisites

The Alibaba Cloud Linux 3 operating system is used.

Use the smc-tools package to monitor and check SMC

The smc-tools package provided by Alibaba Cloud Linux 3 helps you obtain information about SMC connections, SMC resources, and the SMC stack.

Install the smc-tools package

sudo yum install -y smc-tools

Query information about the SMC-R stack

  • smcr device: queries information about Remote Direct Memory Access (RDMA) devices that are used by the SMC-R stack.

    Sample command output:

    # smcr device
    Net-Dev         IB-Dev   IB-P  IB-State  Type          Crit  #Links  PNET-ID
    eth1            erdma_0     1    ACTIVE  0x107f          No       0  

    Take note of the parameters in the following table.

    Parameter  Description
    Net-Dev    The name of the Ethernet device.
    IB-Dev     The name of the RDMA device.
    IB-P       The port of the RDMA device.
    IB-State   The status of the RDMA device.
    Type       The type of the RDMA device. If the device is an elastic RDMA (eRDMA) device of Alibaba Cloud, 0x107f is displayed.
    #Links     The number of queue pairs (QPs) used by the RDMA device.
    PNET-ID    The physical network (PNET) ID of the RDMA device.

  • smcr linkgroup: queries information about SMC-R link groups.

    Note

    A link group in SMC-R consists of RDMA resources, including QPs, Protection Domains (PDs), and Memory Registrations (MRs). By default, a link group supports 32 SMC connections.

    Sample command output:

    # smcr linkgroup
    LG-ID    LG-Role  LG-Type  VLAN  #Conns  PNET-ID
    00000300 SERV     SINGLE      0       0  1234

    Take note of the number of link groups, which indicates the number of QPs that are in use, and the parameter in the following table.

    Parameter  Description
    #Conns     The number of SMC connections carried on the link group.

  • smcr stats: queries statistics about the SMC-R stack in the current network namespace.

    Sample command output:

    # smcr stats
    SMC-R Connections Summary
      Total connections handled             7
      SMC connections                       5
      Handshake errors                      0
      Avg requests per SMC conn        518103.6
      TCP fallback                          2
    
    RX Stats
      Data transmitted (Bytes)       18133584 (18.13M)
      Total requests                  1295262
      Buffer full                           0 (0.00%)
                8KB    16KB    32KB    64KB   128KB   256KB   512KB  >512KB
      Bufs        0       0       0       0       0       5       0       0
      Reqs   1.295M       0       0       0       0       0       0       0
    
    TX Stats
      Data transmitted (Bytes)       18133584 (18.13M)
      Total requests                  1295256
      Buffer full                           0 (0.00%)
      Buffer full (remote)                  0 (0.00%)
      Buffer too small                      0 (0.00%)
      Buffer too small (remote)             0 (0.00%)
                8KB    16KB    32KB    64KB   128KB   256KB   512KB  >512KB
      Bufs        0       0       0       0       0       5       0       0
      Reqs   1.295M       0       0       0       0       0       0       0
    
    Extras
      Special socket calls                  5

    Take note of the parameters in the following table.

    Parameter                     Description
    Total connections handled     The total number of connections that were handled by the SMC-R stack, which is the sum of the SMC connections, Handshake errors, and TCP fallback values.
    SMC connections               The total number of connections that were converted into SMC-R connections.
    Handshake errors              The total number of connections that failed due to errors during the handshake phase, for example, because no response was received from the peer.
    Avg requests per SMC conn     The average number of requests (received and sent) per SMC connection.
    TCP fallback                  The total number of connections that fell back to TCP/IP.
    Rx/Data transmitted (Bytes)   The total number of bytes that were received over SMC-R connections.
    Rx/Total requests             The total number of requests that were received over SMC-R connections.
    Rx/Buffer full                The total number of times that the receive (Rx) buffers for SMC-R connections became full. The Rx buffer allocated to a connection may become full if the user-mode application that uses the connection does not read data from the buffer promptly. When the Rx buffer is full, the sender is backpressured and the receiver cannot receive new data. To decrease the value of this parameter, make sure that the user-mode applications read data from the Rx buffers promptly, or increase the capacity of the Rx buffers.
    Rx/Bufs                       The distribution of the Rx buffers used by SMC-R connections. SMC-R maintains a memory pool for each link group. When a connection is established, SMC-R allocates an idle memory block of a suitable size from the memory pool to the connection as the Rx buffer. If no idle memory block is available, SMC-R creates a new memory block of a suitable size. After the connection is closed, the memory block is returned to the memory pool. The value of this parameter indicates the total number of times that Rx buffers were allocated from the memory pool to SMC-R connections and the distribution of the Rx buffer sizes. The value does not represent the actual number of Rx buffers that are consuming memory.
    Rx/Reqs                       The distribution of sizes of requests that were received over SMC-R connections.
    Tx/Data transmitted (Bytes)   The total number of bytes that were sent over SMC-R connections.
    Tx/Total requests             The total number of requests that were sent over SMC-R connections.
    Tx/Buffer full                The total number of times that the transmit (Tx) buffers for SMC-R connections became full. The Tx buffer of a connection may become full if the SMC-R stack does not promptly send the data that the user-mode application delivers to the connection. If the percentage is high, increase the capacity of the Tx buffers based on your business requirements.
    Tx/Buffer full (remote)       The total number of times that the peer Rx buffers for SMC-R connections became full. If the peer Rx buffer for an SMC-R connection is full, the local end cannot send data to the peer. If the percentage is high, increase the capacity of the peer Rx buffers based on your business requirements.
    Tx/Buffer too small           The total number of times that a request sent over an SMC-R connection was larger than the Tx buffer of the connection, which indicates that the Tx buffer is too small. If the percentage is high, increase the capacity of the Tx buffers based on your business requirements.
    Tx/Buffer too small (remote)  The total number of times that a request sent over an SMC-R connection was larger than the peer Rx buffer of the connection, which indicates that the peer Rx buffer is too small. If the percentage is high, increase the capacity of the peer Rx buffers based on your business requirements.
    Tx/Bufs                       The distribution of the Tx buffers used by SMC-R connections. SMC-R maintains a memory pool for each link group. When a connection is established, SMC-R allocates an idle memory block of a suitable size from the memory pool to the connection as the Tx buffer. If no idle memory block is available, SMC-R creates a new memory block of a suitable size. After the connection is closed, the memory block is returned to the memory pool. The value of this parameter indicates the total number of times that Tx buffers were allocated from the memory pool to SMC-R connections and the distribution of the Tx buffer sizes. The value does not represent the actual number of Tx buffers that are consuming memory.
    Tx/Reqs                       The distribution of sizes of requests that were sent over SMC-R connections.
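
The counters described above can also be checked by a script. The following Python sketch is illustrative only. It parses text in the layout of the sample smcr stats output in this topic, without the leading indentation, and reports the share of connections that fell back to TCP/IP and any non-zero buffer counters. The names parse_smcr_stats, fallback_ratio, and nonzero_buffer_events are hypothetical helpers and are not part of the smc-tools package. The text layout of the real output may differ between smc-tools versions.

```python
import re

# Trimmed copy of the sample `smcr stats` output shown in this topic.
SAMPLE = """\
SMC-R Connections Summary
  Total connections handled             7
  SMC connections                       5
  Handshake errors                      0
  Avg requests per SMC conn        518103.6
  TCP fallback                          2

RX Stats
  Data transmitted (Bytes)       18133584 (18.13M)
  Total requests                  1295262
  Buffer full                           0 (0.00%)
            8KB    16KB    32KB    64KB   128KB   256KB   512KB  >512KB
  Bufs        0       0       0       0       0       5       0       0
  Reqs   1.295M       0       0       0       0       0       0       0

TX Stats
  Data transmitted (Bytes)       18133584 (18.13M)
  Total requests                  1295256
  Buffer full                           0 (0.00%)
  Buffer full (remote)                  0 (0.00%)
  Buffer too small                      0 (0.00%)
  Buffer too small (remote)             0 (0.00%)
"""

# An indented counter line: a label made of single-spaced words, a gap of two
# or more spaces, one number, and an optional parenthesized annotation.
# Distribution rows (size header, Bufs, Reqs) do not match and are skipped.
COUNTER_RE = re.compile(
    r"^\s+(?P<label>\S+(?: \S+)*)\s{2,}(?P<value>\d+(?:\.\d+)?)(?:\s+\(.*\))?\s*$"
)


def parse_smcr_stats(text):
    """Return {section: {label: number}} for the single-value counter lines."""
    stats = {}
    section = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Assumption: an unindented line starts a new section.
            section = line.strip()
            stats[section] = {}
            continue
        match = COUNTER_RE.match(line)
        if match and section is not None:
            stats[section][match.group("label")] = float(match.group("value"))
    return stats


def fallback_ratio(stats):
    """Share of handled connections that fell back to TCP/IP."""
    summary = stats["SMC-R Connections Summary"]
    total = summary["Total connections handled"]
    return summary["TCP fallback"] / total if total else 0.0


def nonzero_buffer_events(stats):
    """List (section, label, count) for every 'Buffer ...' counter above zero."""
    return [
        (section, label, int(value))
        for section, counters in stats.items()
        for label, value in counters.items()
        if label.startswith("Buffer") and value > 0
    ]


if __name__ == "__main__":
    parsed = parse_smcr_stats(SAMPLE)
    print(f"TCP fallback ratio: {fallback_ratio(parsed):.1%}")
    print("Non-zero buffer counters:", nonzero_buffer_events(parsed))
```

To use the sketch on a live system, replace SAMPLE with the captured output of smcr stats, for example by reading it from standard input. An empty list of buffer counters means that none of the Buffer full or Buffer too small conditions described in the table occurred.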

Query information about SMC-R connections

  • smcss: queries basic information about SMC sockets that are being established, are being closed, and are established in the current network namespace.

    Note

    The output of this command includes the SMC sockets that fell back to TCP/IP.

    Sample command output:

    # smcss
    State          UID   Inode   Local Address           Peer Address            Intf Mode
    ACTIVE         00994 2954337 192.168.4.78:80         192.168.4.79:36000      0000 SMCR
    ACTIVE         00994 2954336 192.168.4.78:80         192.168.4.79:35994      0000 SMCR
    ACTIVE         00994 2954333 192.168.4.78:80         192.168.4.79:35978      0000 SMCR
    ACTIVE         00994 2950860 192.168.4.78:80         192.168.4.79:35972      0000 SMCR
    ACTIVE         00994 2953298 192.168.4.78:80         192.168.4.79:35966      0000 SMCR
    ACTIVE         00994 2953297 192.168.4.78:80         192.168.4.79:35948      0000 TCP 0x03010000
    ACTIVE         00994 2954330 192.168.4.78:80         192.168.4.79:35922      0000 TCP 0x03010000
    ACTIVE         00994 2947957 192.168.4.78:80         192.168.4.79:35920      0000 TCP 0x03010000
    ACTIVE         00994 2953293 192.168.4.78:80         192.168.4.79:35822      0000 TCP 0x03010000
    ACTIVE         00994 2955286 192.168.4.78:80         192.168.4.79:35752      0000 TCP 0x03010000

    Take note of the parameters in the following table.

    Parameter      Description
    State          The status of the socket. The valid values are listed after this table.
    Local Address  The local IPv4 address and port number. SMC supports only IPv4.
    Peer Address   The peer IPv4 address and port number. SMC supports only IPv4.
    Mode           The communication mode. SMCR indicates that the connection uses SMC-R. TCP indicates that the connection fell back to TCP/IP and is followed by a numeric code that indicates the fallback reason.

    Valid values of State:

    • INIT: The socket is being initialized.

    • CLOSED: The socket is closed.

    • LISTEN: The socket is a listening socket.

    • ACTIVE: The connection is established.

    • PEERCLW1: The socket no longer sends data to the peer.

    • PEERCLW2: The socket no longer sends data to or receives data from the peer.

    • APPLCLW1: The socket no longer receives data from the peer.

    • APPLCLW2: The socket no longer receives data from or sends data to the peer.

    • APPLFINCLW: The socket is closed by the peer.

    • PEERFINCLW: The socket is closed locally.

    • PEERABORTW: The socket is unexpectedly closed locally.

    • PROCESSABORT: The socket is unexpectedly closed by the peer.

  • smcss -l: queries the sockets in the LISTEN state in the current network namespace.

    The smcss -l command has the same output parameters as the smcss command.

  • smcss -R: queries the sockets that run on the SMC-R stack in the current network namespace.

    Sample command output:

    # smcss -R
    State          UID   Inode   Local Address           Peer Address            Intf Mode Role IB-device       Port Linkid GID                                      Peer-GID
    ACTIVE         00000 1833669 192.168.4.79:33618      192.168.4.78:80         0000 SMCR CLNT erdma_0         01   01     0000:0000:0000:0000:0000:ffff:c0a8:044f  0000:0000:0000:0000:0000:ffff:c0a8:044e
    ACTIVE         00000 1833667 192.168.4.79:33604      192.168.4.78:80         0000 SMCR CLNT erdma_0         01   01     0000:0000:0000:0000:0000:ffff:c0a8:044f  0000:0000:0000:0000:0000:ffff:c0a8:044e
    ACTIVE         00000 1828405 192.168.4.79:33590      192.168.4.78:80         0000 SMCR CLNT erdma_0         01   01     0000:0000:0000:0000:0000:ffff:c0a8:044f  0000:0000:0000:0000:0000:ffff:c0a8:044e
    ACTIVE         00000 1833665 192.168.4.79:33578      192.168.4.78:80         0000 SMCR CLNT erdma_0         01   01     0000:0000:0000:0000:0000:ffff:c0a8:044f  0000:0000:0000:0000:0000:ffff:c0a8:044e
    ACTIVE         00000 1833663 192.168.4.79:33564      192.168.4.78:80         0000 SMCR CLNT erdma_0         01   01     0000:0000:0000:0000:0000:ffff:c0a8:044f  0000:0000:0000:0000:0000:ffff:c0a8:044e

    In addition to the preceding basic parameters, take note of the parameters in the following table.

    Parameter  Description
    IB-device  The name of the RDMA device used for the connection.
    Port       The port of the RDMA device used for the connection.
    GID        The global ID (GID) of the RDMA device used for the connection.
    Peer-GID   The GID of the peer RDMA device.

  • smcss -a: queries SMC sockets in all states in the current network namespace, including the SMC sockets that fell back to TCP/IP.

    The smcss -a command has the same output parameters as the smcss command.

    Note

    The numeric code in the Mode field for a connection indicates the fallback reason of the connection. For more information, see the "SMC falls back to TCP and RDMA cannot be used to accelerate communications" section of the "Troubleshoot SMC" topic.
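
The Mode column described above can be summarized by a script to show how many sockets use SMC-R and how many fell back to TCP/IP, grouped by fallback reason code. The following Python sketch is illustrative only. It assumes that the text has the whitespace-separated layout of the sample smcss output in this topic and that every column is populated in each row. The names parse_smcss and summarize are hypothetical helpers and are not part of the smc-tools package.

```python
from collections import Counter

# Trimmed copy of the sample `smcss` output shown in this topic.
SAMPLE = """\
State          UID   Inode   Local Address           Peer Address            Intf Mode
ACTIVE         00994 2954337 192.168.4.78:80         192.168.4.79:36000      0000 SMCR
ACTIVE         00994 2954336 192.168.4.78:80         192.168.4.79:35994      0000 SMCR
ACTIVE         00994 2953297 192.168.4.78:80         192.168.4.79:35948      0000 TCP 0x03010000
ACTIVE         00994 2954330 192.168.4.78:80         192.168.4.79:35922      0000 TCP 0x03010000
"""


def parse_smcss(text):
    """Return one dict per socket row, skipping the header and short rows."""
    sockets = []
    for line in text.splitlines():
        fields = line.split()
        # Assumption: State, UID, Inode, Local Address, Peer Address, Intf,
        # and Mode are all present, so a data row has at least 7 fields.
        if len(fields) < 7 or fields[0] == "State":
            continue
        mode = fields[6]
        sockets.append({
            "state": fields[0],
            "local": fields[3],
            "peer": fields[4],
            "mode": mode,
            # Only TCP rows carry a fallback reason code after the mode.
            "fallback_reason": fields[7] if mode == "TCP" and len(fields) > 7 else None,
        })
    return sockets


def summarize(sockets):
    """Count sockets per mode and per fallback reason code."""
    modes = Counter(s["mode"] for s in sockets)
    reasons = Counter(s["fallback_reason"] for s in sockets if s["fallback_reason"])
    return modes, reasons


if __name__ == "__main__":
    modes, reasons = summarize(parse_smcss(SAMPLE))
    print("Sockets per mode:", dict(modes))
    print("Fallback reason codes:", dict(reasons))
```

The sketch only counts the reason codes. To interpret a code, see the "Troubleshoot SMC" topic.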

References

  • For information about how to monitor and maintain SMC-R when eRDMA is used, see Monitor and check eRDMA.

  • For information about how to resolve SMC issues, such as communication failure, unavailable ports, and no application performance improvement compared with TCP, see Troubleshoot SMC.