ACK Net Exporter is a component that enhances the observability of cluster networks. You can deploy ACK Net Exporter in your cluster to collect various metrics of container networks. This allows you to identify and troubleshoot network issues at the earliest opportunity. This topic describes how to use ACK Net Exporter to troubleshoot container network issues.
Prerequisites
A Container Service for Kubernetes (ACK) managed cluster is created. For more information, see Create an ACK managed cluster.
Background information
ACK Net Exporter runs as a DaemonSet pod on each node. ACK Net Exporter uses the Extended Berkeley Packet Filter (eBPF) technology to collect network information from the node and aggregates the information by pod. ACK Net Exporter provides a standard interface to allow you to monitor high-level network information. The following figure shows the architecture of ACK Net Exporter.
Install and configure ACK Net Exporter
Install ACK Net Exporter
Log on to the ACK console. In the left-side navigation pane, choose .
On the Marketplace page, search for ack-net-exporter and click the component.
In the upper-right corner of the ack-net-exporter component page, click Deploy. The Deploy panel appears.
In the Basic Information step, configure the Cluster and Namespace parameters and click Next.
In the Parameters step, configure the parameters that are described in the following table and click OK.
Parameter | Description | Default value |
name | The name of the ACK Net Exporter component. | ack-net-exporter-default |
namespace | The namespace to which ACK Net Exporter belongs. | kube-system |
config.enableEventServer | Specifies whether to enable the event tracing feature. Valid values: false (disables the event tracing feature) and true (enables the event tracing feature). | false |
config.enableMetricServer | Specifies whether to enable the metric collection feature. Valid values: false (disables the metric collection feature) and true (enables the metric collection feature). | true |
config.remoteLokiAddress | The Grafana Loki service address to which events are pushed. | By default, this parameter is left empty. |
config.metricLabelVerbose | Specifies whether to enable the metric verbose feature. Valid values: false (disables the metric verbose feature) and true (enables the metric verbose feature. After you enable this feature, pod IP addresses and labels are saved as the label information of metrics). | false |
config.metricServerPort | The port that is used by the metric service to provide HTTP services. | 9102 |
config.eventServerPort | The port that is used by the event service to provide gRPC streaming services. | 19102 |
config.metricProbes | The metric probes that you want to enable. For more information, see the ACK Net Exporter metrics section of this topic. | By default, this parameter is left empty and only the required metric probes are enabled. |
config.eventProbes | The event probes that you want to enable. For more information, see the ACK Net Exporter events section of this topic. | By default, this parameter is left empty and only the required event probes are enabled. |
Configure ACK Net Exporter
You can run the following command to modify the ConfigMap of ACK Net Exporter:
kubectl edit cm inspector-config -n kube-system
Alternatively, you can configure ACK Net Exporter in the ACK console.
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose .
On the ConfigMap page, set the Namespace parameter to kube-system, search for kubeskoop-config, and then click Edit in the Actions column.
In the Edit panel, configure the parameters and click OK.
The following table describes the parameters supported by ACK Net Exporter.
Parameter | Description | Default value |
debugmode | Specifies whether to enable the debugging mode. Valid values: false (disables the debugging mode) and true (enables the debugging mode. After you enable this mode, debug-level logs, interface debugging, Go pprof, and gops are supported). | false |
event_config.loki_enable | Specifies whether to enable the feature of pushing events to Grafana Loki. For more information, see the Use Grafana Loki to collect and visualize events section of this topic. Valid values: false (disables the feature) and true (enables the feature). | false |
event_config.loki_address | The Grafana Loki service address. By default, the system automatically discovers a service named grafana-loki in the specified namespace. | By default, this parameter is left empty. |
event_config.probes | The event probes that you want to enable. For more information, see the ACK Net Exporter events section of this topic. | By default, this parameter is left empty and only the required event probes are enabled. |
event_config.port | The port that is used by the event service to provide gRPC streaming services. | 19102 |
metric_config.verbose | Specifies whether to enable the metric verbose feature. Valid values: false (disables the metric verbose feature) and true (enables the metric verbose feature. After you enable this feature, pod IP addresses and labels are saved as the label information of metrics). | false |
metric_config.port | The port that is used by the metric service to provide HTTP services. | 9102 |
metric_config.probes | The metric probes that you want to enable. For more information, see the ACK Net Exporter metrics section of this topic. | By default, this parameter is left empty and only the required metric probes are enabled. |
metric_config.interval | The interval at which metrics are collected. Metric collection compromises performance. Therefore, ACK Net Exporter caches the periodically collected metrics in memory. | 5 |
In ACK Net Exporter versions earlier than 0.2.3, you must recreate all ACK Net Exporter containers after you modify the configuration for the changes to take effect. ACK Net Exporter 0.2.3 and later support hot configuration updates, so this step is no longer required.
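For reference, the following is a minimal sketch of what the ConfigMap might look like, assuming that the parameters in the preceding table map onto a nested YAML document stored under a single data key. The ConfigMap name (inspector-config or kubeskoop-config, both of which appear in this topic), the data key name, and the exact nesting are assumptions; check the actual ConfigMap in your cluster before you edit it.
# A hypothetical layout of the ACK Net Exporter ConfigMap; the schema may differ between versions.
apiVersion: v1
kind: ConfigMap
metadata:
  name: inspector-config
  namespace: kube-system
data:
  config.yaml: |
    debugmode: false
    event_config:
      loki_enable: false
      loki_address: ""
      port: 19102
      probes: []
    metric_config:
      verbose: false
      port: 9102
      interval: 5
      probes: []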
Usage notes for ACK Net Exporter
Use ACK Net Exporter in operating systems other than Alinux
Some key features of ACK Net Exporter rely on eBPF programs to collect information. To meet the requirements of different operating system kernels, ACK Net Exporter uses CO-RE to distribute eBPF programs. When ACK Net Exporter starts, it needs to load the BPF Type Format (BTF) file that is associated with the operating system kernel. The BTF file stores the metadata of the kernel debugging information. If no corresponding BTF file is loaded, the key features become unavailable. Most operating systems of later versions have built-in BTF files. For more information about the operating systems, see BPF Type Format (BTF).
To run ACK Net Exporter on nodes whose operating system is not Alinux2 or Alinux3, make sure that the following requirements are met:
The kernel version of the operating system must be later than 4.10.
One of the following files is installed:
The kernel-debuginfo file, which stores the kernel debugging information.
The vmlinux file, which stores the debugging information. This is the uncompressed file that is produced when the operating system kernel is compiled.
The BTF file provided by the operating system.
ACK Net Exporter is updated to 0.2.9 or later, and the config.enableLegacyVersion parameter is set to false when you install ACK Net Exporter.
If the preceding requirements are met, you can perform the following steps to use the advanced features provided by ACK Net Exporter:
Store the BTF file in the /boot/ path of the node.
If you installed a complete vmlinux file, you can store the vmlinux file in the /boot/ path of the operating system.
If you installed the kernel-debuginfo package, find the vmlinux file in the /usr/lib/debug/lib/modules/ path of the node and copy the file to the /boot/ path.
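For example, on a node where the kernel-debuginfo package is installed, the copy described in the preceding step might look like the following commands. The exact directory under /usr/lib/debug/lib/modules/ depends on the installed kernel version, so treat this as a sketch:
# Locate the vmlinux file that matches the running kernel. The path follows the usual kernel-debuginfo layout.
ls /usr/lib/debug/lib/modules/$(uname -r)/vmlinux
# Copy the file to the /boot/ path so that ACK Net Exporter can discover it.
cp /usr/lib/debug/lib/modules/$(uname -r)/vmlinux /boot/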
Run the following command to check whether valid BTF information is loaded and ACK Net Exporter can run as expected:
# You can use an alternative tool such as docker, podman, or ctr, instead of nerdctl, to run a comparable command for conducting the test.
nerdctl run -it -v /boot:/boot registry.cn-hangzhou.aliyuncs.com/acs/btfhack:latest -- btfhack discover
If the path of the BTF file is returned, the configuration is complete. You can trigger the system to recreate the containers of ACK Net Exporter and wait a period of time. Then, you can view the collected metrics and events.
Metrics and metric format supported by ACK Net Exporter
ACK Net Exporter supports Prometheus metrics. After you install ACK Net Exporter, you can access the service port of a pod that is created for ACK Net Exporter to query metrics.
If you install ACK Net Exporter from the Marketplace page of the ACK console, you can run the following command to query all ACK Net Exporter pods:
kubectl get pod -l app=net-exporter -n kube-system -o wide
Expected results:
NAME      READY   STATUS    RESTARTS   AGE   IP           NODE      NOMINATED NODE   READINESS GATES
anp-***   1/1     Running   0          32s   10.1.XX.XX   cn-***    <none>           <none>
Run the following command to query metrics. Replace 10.1.XX.XX in the command with the IP address of the ACK Net Exporter pod that is obtained in the previous step:
curl http://<10.1.XX.XX>:9102/metrics
ACK Net Exporter returns metric data in the following format:
inspector_pod_udprcvbuferrors{namespace="elastic-system",netns="ns402653****",node="iZbp179u0bgzhofjupc****",pod="elastic-operator-0"} 0 1654487977826
The preceding format includes the following fields:
inspector_pod_udprcvbuferrors: indicates that the metric is provided by ACK Net Exporter and that it is a pod metric. Metrics of both pods and nodes are collected. The name of the metric is udprcvbuferrors, which indicates the number of UDP receive buffer errors that occur because the receive queue within a pod is full.
namespace, pod, node, and netns: the labels of the metric. You can use PromQL statements to filter labels. The pod label indicates the pod that the metric describes. The namespace label indicates the namespace to which the pod belongs. The node label indicates the name of the node that hosts the pod; the hostname specified in the /etc/hostname file is used as the default hostname. The netns label indicates the ID of the network namespace of a container in the pod.
0 and 1654487977826: the value of the metric and the point in time when the metric value is collected. The point in time is a UNIX timestamp.
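If you only want to inspect a single series, you can filter the plain-text output of the metrics endpoint. The following example assumes the pod IP placeholder and the pod label value shown above; adjust both to your environment.
# Fetch the metrics endpoint and keep only the udprcvbuferrors series that belongs to one pod.
curl -s http://<10.1.XX.XX>:9102/metrics | grep inspector_pod_udprcvbuferrors | grep 'pod="elastic-operator-0"'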
Events and event format supported by ACK Net Exporter
ACK Net Exporter can collect events of network exceptions that occur on nodes. This section describes the network exceptions that may occur. These exceptions occasionally occur and are difficult to reproduce. No efficient methods are available to troubleshoot these exceptions.
Connection failures and request timeouts caused by data packet loss.
Performance issues caused by time-consuming data processing.
Business interruptions that occur due to the anomalies of the stateful connection mechanism, such as TCP or connection tracking.
ACK Net Exporter provides eBPF-based context observability for operating system kernels to help you troubleshoot the preceding issues. ACK Net Exporter captures the status of the operating system in real time when an exception occurs, and then generates an event log. For more information about the events and event probes supported by ACK Net Exporter, see the ACK Net Exporter events section of this topic.
You can check the relevant information in the event log. In this example, the tcp_reset probe is used. When a pod receives a normal packet that is destined for an unknown port, ACK Net Exporter captures the following event:
type=TCPRESET_NOSOCK pod=storage-monitor-5775dfdc77-fj767 namespace=kube-system protocol=TCP saddr=100.103.42.233 sport=443 daddr=10.1.17.188 dport=33488
type=TCPRESET_NOSOCK: indicates a TCPRESET_NOSOCK event. This type of event is captured by the tcp_reset probe. The event indicates that a reset packet is returned for a packet that is destined for an unknown port because no matching socket is found. In most cases, this event occurs when NAT fails. For example, this event occurs when an IPVS timeout occurs.
pod/namespace: the pod metadata that is associated with the event after ACK Net Exporter finds the matching IP address and network device serial number based on the network namespace of the packet.
saddr/sport/daddr/dport: the packet information obtained by ACK Net Exporter from the kernel. The packet information varies based on the event. For example, an event captured by the net_softirq probe does not contain IP addresses. Instead, the event contains the serial number of the CPU on which the interrupt occurs and the delay.
For events that require valid operating system kernel stacking information, ACK Net Exporter captures the stacking context in the operating system kernel when these events occur, such as the following event:
type=PACKETLOSS pod=hostNetwork namespace=hostNetwork protocol=TCP saddr=10.1.17.172 sport=6443 daddr=10.1.17.176 dport=43018 stacktrace:skb_release_data+0xA3 __kfree_skb+0xE tcp_recvmsg+0x61D inet_recvmsg+0x58 sock_read_iter+0x92 new_sync_read+0xE8 vfs_read+0x89 ksys_read+0x5A
ACK Net Exporter allows you to view events by using multiple methods. For more information, see the "Collect monitoring data from ACK Net Exporter" section of this topic.
Collect monitoring data from ACK Net Exporter
Scenario 1: Export monitoring data to Prometheus or Grafana and visualize the data
ACK Net Exporter can export monitoring data to a Prometheus server. If you use a self-managed Prometheus server, you can add the following scrape_config to enable the Prometheus server to collect monitoring data from ACK Net Exporter:
# In the following example, only one endpoint is specified for data collection.
scrape_configs:
  # The job=<job_name> label is added to each time series that is collected based on the configuration. In this example, the job is named net-exporter_sample.
  - job_name: "net-exporter_sample"
    static_configs:
      - targets: ["{kubernetes pod ip}:9102"]
If your Prometheus server runs in an ACK cluster, you can use the service discovery feature of Prometheus to automatically obtain all ACK Net Exporter pods that function as expected. To do this, add the following configurations to the Prometheus server:
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-server-conf
  labels:
    name: prometheus-server-conf
  namespace: kube-system
data:
  prometheus.yml: |-
    # Add the following configurations to the Prometheus server:
    - job_name: 'net-exporter'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_endpoints_name]
        regex: 'net-exporter'
        action: keep
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
After you add the configurations, open the Targets page of the Prometheus server. The page shows the ACK Net Exporter pods that run as expected. You can also enter inspector into the search box on the Graph page of the Prometheus server to view the ACK Net Exporter metrics.
You can configure Grafana to visualize the monitoring data that is collected by Prometheus:
In the left-side navigation pane of the Grafana page, choose .
On the New dashboard page, click Add a new panel.
In the lower part of the Edit Panel page, enter Prometheus in the Data source field. Then, enter the address of the Prometheus server.
Click Metric browser and enter inspector. Then, Grafana displays all available ACK Net Exporter metrics. Click Save in the upper-right part. In the dialog box that appears, click Save. Grafana then displays the visualized data, as shown in the following figure.
Configure how the metrics are displayed on a Grafana dashboard based on the configurations that are displayed in the preceding figure. For example, you can use the following configurations to display the increment trend of the inspector_pod_tcppassiveopens metric. This metric indicates the total number of sockets that are created for TCP handshake requests received from clients within a network namespace after the system is started or the container is created. To view the increment trend of this metric, use the following configurations:
// Use the rate() method provided by PromQL to calculate the increment trend of the metric.
rate(inspector_pod_tcppassiveopens[1m])
// Use the labels provided by net-exporter to configure a legend to display the metric.
{{namespace}}/{{pod}}/{{node}}
Scenario 2: Export monitoring data to ARMS and visualize the data
To export monitoring data from ACK Net Exporter to Application Real-Time Monitoring Service (ARMS) and visualize the data, perform the following steps.
Configure custom ACK Net Exporter metrics.
Log on to the ARMS console.
In the top navigation bar, select the region in which the ACK cluster is deployed.
In the left-side navigation pane, choose .
On the Instances page, find the Prometheus instance that you want to manage and click its name to go to the instance details page. In most cases, the name of the Prometheus instance is the same as the name of your cluster.
In the left-side navigation pane, click Service Discovery, and then click the Targets tab. In the lower part of the Targets tab, click kubernetes-pods. The information shows that the custom ACK Net Exporter metrics are configured.
If kubernetes-pods is not displayed, you need to click the Configure tab and turn on the switch on the Default Service Discovery tab.
In the left-side navigation pane, click Dashboards. On the Dashboards page, click a dashboard to log on to Grafana. Click Add panel, select Graph, and then enable the data sources related to your cluster in the Data source section.
Click Metric browser and enter inspector. Then, Grafana automatically displays all available ACK Net Exporter metrics. In the upper-right corner, click Save. In the dialog box that appears, click Save. Grafana then displays the visualized data, as shown in the following figure.
Scenario 3: Export monitoring data to Grafana Loki and visualize the data
You can push anomaly events collected by ACK Net Exporter to your pre-configured Grafana Loki service in real time. This helps you manage these events in a centralized manner. To export monitoring data from ACK Net Exporter to Grafana Loki, perform the following steps.
Note: Deploy Grafana Loki in a network that the ACK Net Exporter pods can access. ACK Net Exporter can automatically push event logs to Grafana Loki.
On the configuration page of ACK Net Exporter, set the enableEventServer parameter to true and the lokiServerAddress parameter to the address of the Grafana Loki service. You can specify the IP address or domain name of the Grafana Loki service as the service address.
Run the following command to access the service address and check whether Grafana Loki is available:
curl http://[Address of Grafana Loki]:3100/ready
If Grafana Loki is available, add Grafana Loki as a Grafana data source.
Open Grafana. In the left-side navigation pane, choose . Enter the address of Grafana Loki and click Save & test.
In the left-side navigation pane, click Explore. In the top navigation bar, set the data source to Loki and view the events pushed to Grafana Loki.
You can view the events of a node by selecting the node from the Label filters drop-down list, or enter keywords in the Line contains field to search for specific events.
You can click Add to dashboard in the top navigation bar to add a configured event panel to the dashboard.
The content of the events provided by ACK Net Exporter varies based on the event type. You can check the event details to view the relevant content.
For more information about the LogQL query language supported by Grafana Loki, see LogQL: Log query language.
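As a quick check outside the Grafana UI, you can also query Grafana Loki directly over its HTTP API with a LogQL expression. The following command is a sketch: the label name (node) and the keyword are assumptions about how the events are labeled, and the Loki address placeholder is the same as in the preceding command.
# Query the most recent ACK Net Exporter events that contain the TCPRESET keyword for one node.
curl -G "http://[Address of Grafana Loki]:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={node="cn-xxx"} |= "TCPRESET"' \
  --data-urlencode 'limit=20'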
Scenario 4: Use the ACK Net Exporter CLI to collect events
The ACK Net Exporter CLI (inspector-cli) is a scenario-specific troubleshooting and analysis tool developed by the ACK team based on ACK Net Exporter. You can use inspector-cli to collect kernel exception events in real time. inspector-cli can help quickly identify the cause of common exceptions in cloud-native scenarios.
You can run inspector-cli by starting a container on an on-premises machine.
# Start a temporary container to run inspector-cli. You can replace the image with a later version to update inspector-cli.
docker run -it --name=inspector-cli --network=host registry.cn-hangzhou.aliyuncs.com/acs/inspector:v0.0.1-12-gff0558c-aliyun
which inspector
# /bin/inspector is the working path of inspector-cli. You can directly run inspector-cli in the container.
The following example shows how to use inspector-cli to collect the events of a node captured by ACK Net Exporter.
# Set the -e parameter to the address of the event service of ACK Net Exporter.
inspector watch -e 10.1.16.255
# Expected results:
INFO TCP_RCV_RST_ESTAB Namespace=kube-system Pod=kube-proxy-worker-tbv5s Node=iZbp1jesgumdx66l8ym8j8Z Netns=4026531993 10.1.16.255:43186 -> 100.100.27.15:3128
...
You can also log on to the inspector container of ACK Net Exporter to troubleshoot issues.
# When you run the following command, set the -n parameter to the namespace of ACK Net Exporter and specify the ACK Net Exporter pod that you want to access.
kubectl exec -it -n kube-system -c inspector net-exporter-2rvfh -- sh
# Run the following command to view the distribution of network entities on the current node.
inspector list entity
# Run the following command to listen for network exception events and other relevant information in the local network.
inspector watch -d -v
#{"time":"2023-02-03T09:01:03.402118044Z","level":"INFO","source":"/go/src/net-exporter/cmd/watch.go:63","msg":"TCPRESET_PROCESS","meta":"hostNetwork/hostNetwork node=izbp1dnsn1bwv9oyu2gaupz netns=ns0 ","event":"protocol=TCP saddr=10.1.17.113 sport=6443 daddr=10.1.17.113 dport=44226 state:TCP_OTHER "}
# You can also specify multiple ACK Net Exporter nodes to view the time when the event occurs on these nodes.
inspector watch -s 10.1.17.113 -s 10.1.18.14 -d -v
Use ACK Net Exporter to troubleshoot occasional container network issues
This section describes how to troubleshoot occasional network issues in cloud-native scenarios. ACK Net Exporter can help you quickly obtain information that is required for fixing these issues.
DNS timeout issues
DNS timeout issues in cloud-native environments can cause service access failures. The following reasons can cause DNS timeout issues:
The DNS server fails to reply before the DNS query times out.
The DNS client cannot deliver the DNS query promptly or fails to deliver the DNS query.
The DNS server responds to the DNS query. However, the response is lost due to a DNS client issue, such as insufficient memory.
You can use the following metrics to troubleshoot DNS timeout issues.
Metric | Description |
inspector_pod_udpsndbuferrors | The number of errors that are reported when UDP packets are sent over the network layer. |
inspector_pod_udpincsumerrors | The number of checksum errors that are reported when UDP packets are received. |
inspector_pod_udpnoports | The number of times that the corresponding socket cannot be found when the network layer invokes __udp4_lib_rcv to receive packets. |
inspector_pod_udpinerrors | The number of errors that are reported when UDP packets are received. |
inspector_pod_udpoutdatagrams | The number of UDP packets that are successfully sent over the network layer. |
inspector_pod_udprcvbuferrors | The number of times that UDP fails to replicate protocol data from the application layer to a socket queue because the socket queue is full. |
A large number of services in cloud-native environments rely on the DNS resolution service provided by CoreDNS. If a DNS issue related to CoreDNS occurs, you also need to check the metrics of the CoreDNS pods.
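If the metrics are already collected by a Prometheus server, you can query them around the time of the timeouts. The following command is a sketch that assumes a self-managed Prometheus server address and the usual CoreDNS pod naming convention in the kube-system namespace; adjust both to your environment.
# Query the rate of UDP receive buffer errors for CoreDNS pods over the last 5 minutes.
curl -G "http://<prometheus-address>:9090/api/v1/query" \
  --data-urlencode 'query=rate(inspector_pod_udprcvbuferrors{namespace="kube-system", pod=~"coredns.*"}[5m])'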
Nginx Ingress 499, 502, 503, and 504 issues
In cloud-native environments, exceptions often occur on Ingress gateways and on services that serve as proxies or brokers. The following 499, 502, 503, and 504 errors commonly occur on an NGINX Ingress or other proxy services that use NGINX as the base:
499: This error is returned if the NGINX client closes the TCP connection without receiving a response from the NGINX server. Common reasons:
The NGINX client does not send the request immediately after the TCP connection is created. As a result, the client times out before the NGINX server responds to the request. This issue commonly occurs with asynchronous requests sent by Android clients.
The NGINX server requires a long period of time to handle the TCP connection. In this scenario, you need to check all possible causes.
The NGINX server is waiting for the response from the upstream backend.
502: This error is usually caused by connection issues between the NGINX server and the upstream backend, such as connection failures or unexpected connection disruptions. Common reasons:
A DNS resolution failure occurs on the backend. This issue commonly occurs when a Kubernetes Service is specified as the backend.
The NGINX server fails to connect to the upstream backend.
Business interaction is interrupted because the size of the upstream request or response is too large or memory cannot be allocated.
503: This error is returned to the client if all upstream backends are unavailable. Common reasons in cloud-native environments:
No backends are available. This issue occurs only occasionally.
The Ingress triggers rate limiting due to heavy traffic.
504: This error is returned when packets exchanged between the NGINX server and the upstream backend time out. One of the common reasons is that the response from the upstream backend fails to reach the NGINX server before the timeout period ends.
When the preceding errors are returned, collect the following information to narrow down the scope for further troubleshooting:
The access_log information provided by NGINX, including request_time, upstream_connect_time, and upstream_response_time.
The error_log information provided by NGINX. Check whether error messages are returned when the issue occurs.
If you have configured liveness probes or readiness probes, check the health check information.
Take note of the metrics that are described in the following table if the issues occur due to connection failures.
Metric | Description |
inspector_pod_tcpextlistenoverflows | The number of times that the SYN queue is full when the socket in the LISTEN state accepts connections. |
inspector_pod_tcpextlistendrops | The number of times that the socket in the LISTEN state fails to create a socket in the SYN_RECV state. |
inspector_pod_netdevtxdropped | The number of packet drops due to network interface controller (NIC) sending errors. |
inspector_pod_netdevrxdropped | The number of packet drops due to NIC receiving errors. |
inspector_pod_tcpactiveopens | The number of times that TCP SYN succeeds within a pod, excluding SYN retransmissions. The value of this metric also increases when connection failures occur. |
inspector_pod_tcppassiveopens | The number of times that TCP handshake succeeds and a socket is allocated within a pod. In most cases, this metric indicates the number of new connections. |
inspector_pod_tcpretranssegs | The total number of packets that are retransmitted within a pod. TCP segments generated by TCP segmentation offload (TSO) are already counted. |
inspector_pod_tcpestabresets | The number of TCP connections that are abnormally closed within a pod. The value is calculated only based on results. |
inspector_pod_tcpoutrsts | The number of TCP reset packets sent within a pod. |
inspector_pod_conntrackinvalid | The number of times that connection tracking fails to create connections but does not drop the packets. |
inspector_pod_conntrackdrop | The number of times that connection tracking drops packets due to connection failures. |
Take note of the metrics that are described in the following table if the NGINX server responds slowly. For example, the request processing time (request_time) is short but the request still times out.
Metric | Description |
inspector_pod_tcpsummarytcpestablishedconn | The number of TCP connections in the ESTABLISHED state. |
inspector_pod_tcpsummarytcptimewaitconn | The number of TCP connections in the TIMEWAIT state. |
inspector_pod_tcpsummarytcptxqueue | The size of data packets in the send queue of TCP connections in the ESTABLISHED state. Unit: bytes. |
inspector_pod_tcpsummarytcprxqueue | The size of data packets in the receive queue of TCP connections in the ESTABLISHED state. Unit: bytes. |
inspector_pod_tcpexttcpretransfail | The number of errors other than EBUSY that are returned after a retransmission. The errors indicate that the retransmission fails. |
You can check the changes of the preceding metrics at the point in time when the issue occurs to narrow down the scope. If you still cannot locate the cause, submit a ticket and include the preceding information in your ticket to request technical support.
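For example, to check whether accept queue overflows happened around the time of the 5xx errors, you can query the change in the inspector_pod_tcpextlistendrops metric. The Prometheus address, namespace, and pod name pattern below are placeholders, so adjust them to your Ingress deployment.
# Query the increase in listen drops for the NGINX Ingress pods over the last 5 minutes.
curl -G "http://<prometheus-address>:9090/api/v1/query" \
  --data-urlencode 'query=increase(inspector_pod_tcpextlistendrops{namespace="<ingress-namespace>", pod=~"nginx-ingress-controller.*"}[5m])'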
TCP reset issues
A host returns a TCP reset packet when it receives an unexpected TCP packet. TCP reset has the following impacts on your applications:
connection reset by peer: In most cases, this error occurs on NGINX services that rely on the C library.
Broken pipe: In most cases, this error occurs on Java and Python applications that are encapsulated with TCP.
TCP reset is common in cloud-native environments. The cause of TCP reset varies and is hard to identify. The following section lists some common reasons for TCP reset:
The server cannot provide services as expected. For example, the memory allocated to TCP is insufficient. In this scenario, TCP proactively sends reset packets.
Requests are forwarded to an unexpected backend due to a stateful mechanism error, such as an endpoint or Conntrack error, when services or load balancers are used.
Connections are released due to security reasons.
Protection Against Wrapped Sequence numbers (PAWS) or sequence number wrapping issues occur in NAT or high-concurrency scenarios.
Connections remain idle for a long period of time when TCP keepalive is used.
You can collect the following information to help quickly distinguish the preceding causes:
Analyze the network topology between the client and server when TCP reset packets are generated.
Take note of the metrics that are described in the following table.
Metric | Description |
inspector_pod_tcpexttcpabortontimeout | The number of times that TCP reset packets are sent to close connections because the upper limit of keepalive, window probe, and retransmission calls is reached. |
inspector_pod_tcpexttcpabortonlinger | The number of times that TCP reset packets are sent to close FIN_WAIT2 connections when the TCP Linger_2 option is enabled. |
inspector_pod_tcpexttcpabortonclose | The number of times that TCP reset packets are sent to close TCP connections when data reception is still in progress due to a reason other than the state machine. |
inspector_pod_tcpexttcpabortonmemory | The number of times that TCP reset packets are sent to close connections because tcp_check_oom triggers an out of memory error during memory allocation to tw_sock or tcp_sock. |
inspector_pod_tcpexttcpabortondata* | The number of times that TCP reset packets are sent to close connections because the Linger or Linger2 option is enabled. |
inspector_pod_tcpexttcpackskippedsynrecv | The number of times that the socket in the SYN_RECV state does not respond to ACK. |
inspector_pod_tcpexttcpackskippedpaws | The number of times that ACK packets are limited by the Out-of-Window (OOW) rate limiting mechanism because PAWS is triggered. |
inspector_pod_tcpestabresets | The number of TCP connections that are abnormally closed within a pod. The value is calculated only based on results. |
inspector_pod_tcpoutrsts | The number of TCP reset packets sent within a pod. |
If TCP reset occurs in a specific pattern, you can enable the events feature of ACK Net Exporter to collect the corresponding events.
Event | Description |
TCP_SEND_RST | This event is generated when TCP reset packets are sent to close connections, except in the cases covered by the TCP_SEND_RST_NOSock and TCP_SEND_RST_ACTIVE events. |
TCP_SEND_RST_NOSock | This event is generated when TCP reset packets are sent because no local socket is found. |
TCP_SEND_RST_ACTIVE | This event is generated when TCP reset packets are sent due to a resource issue or because the user mode is disabled. |
TCP_RCV_RST_SYN | This event is generated when TCP reset packets are received during the three-way handshake phase. |
TCP_RCV_RST_ESTAB | This event is generated when TCP reset packets are received after connections are established. |
TCP_RCV_RST_TW | This event is generated when TCP reset packets are received during the four-way handshake phase. |
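For example, you can watch the event stream and keep only the TCP reset events. The following command reuses the -e flag that is shown in the Scenario 4 section; the event service address is a placeholder, and filtering with grep is only one possible approach.
# Watch ACK Net Exporter events and keep only TCP reset related entries.
inspector watch -e <node-ip> | grep -E "TCP_SEND_RST|TCP_RCV_RST"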
Occasional network latency and jitter issues
Network latency and network jitter issues in cloud-native environments are hard to troubleshoot. The cause of these issues varies. In addition, the network latency issue may further cause the preceding three types of issues. In container networks, network latency issues in nodes usually occur due to the following reasons:
A real-time process managed by the RT scheduler requires a long period of time to complete. As a result, user processes or network kernel processes pile up in the queue or run slowly.
An external call made by the user process occasionally requires a long period of time to complete. For example, requests are processed slowly because the disk responds slowly or the round-trip time of an RDS instance increases.
Some CPUs or NUMA nodes are overwhelmed due to improper node configurations. As a result, system stuttering occurs.
The stateful mechanism of the kernel causes the increased latency. For example, the confirm operation performed by connection tracking increases latency, or a large number of orphan sockets slow down socket lookups.
In most cases, these network issues are caused by operating system issues. Take note of the metrics that are described in the following table to narrow down the scope.
Metric | Description |
inspector_pod_netsoftirqshed | The duration from the time when a software interrupt is initiated to the time when the ksoftirqd process starts to perform the software interrupt. |
inspector_pod_netsoftirq | The duration from the time when the ksoftirqd process starts to perform the software interrupt to the time when the ksoftirqd process changes to the offcpu state. |
inspector_pod_ioioreadsyscall | The number of read operations performed by the process, such as the number of reads or preads. |
inspector_pod_ioiowritesyscall | The number of write operations performed by the process, such as the number of writes or pwrites. |
inspector_pod_ioioreadbytes | The number of bytes that the process reads from a file system, which is a block device in most cases. |
inspector_pod_ioiowritebyres | The number of bytes that the process writes into a file system. |
inspector_pod_virtsendcmdlat | The duration of virtual calls for NIC operations. |
inspector_pod_tcpexttcptimeouts | The number of times that SYN packets are retransmitted because the SYN packets are not answered while the status of TCP_CA is not recovery, loss, or disorder. |
inspector_pod_tcpsummarytcpestablishedconn | The number of TCP connections in the ESTABLISHED state. |
inspector_pod_tcpsummarytcptimewaitconn | The number of TCP connections in the TIMEWAIT state. |
inspector_pod_tcpsummarytcptxqueue | The size of data packets in the send queue of TCP connections in the ESTABLISHED state. Unit: bytes. |
inspector_pod_tcpsummarytcprxqueue | The size of data packets in the receive queue of TCP connections in the ESTABLISHED state. Unit: bytes. |
inspector_pod_softnetprocessed | The number of backlog packets that all CPUs receive from the NIC within a pod. |
inspector_pod_softnettimesqueeze | The number of times that all CPUs fail to receive the complete packet or the receive operation times out within a pod. |
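For example, you can check the latency-related series directly on the metrics endpoint of the ACK Net Exporter pod on the affected node. The pod IP placeholder is the same as in the earlier curl example, and the grep pattern simply matches the metric names in the preceding table.
# Fetch the metrics endpoint and keep only the softirq and virtio command latency series.
curl -s http://<10.1.XX.XX>:9102/metrics | grep -E "netsoftirq|virtsendcmdlat"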
Case study
The following cases show how to use ACK Net Exporter to troubleshoot container network issues.
Case 1: Occasional DNS resolution timeout
Symptom
Customer A submitted a ticket to request technical support to handle DNS resolution timeouts that occasionally occur. The application of Customer A is written in PHP. CoreDNS is configured to perform DNS resolution.
Troubleshooting
Obtain DNS metrics from the monitoring system of Customer A.
Analyze the metrics at the moments of reported timeouts. The analysis reveals the following issues:
Each time a DNS resolution timeout occurs, the value of inspector_pod_udpnoports increases by 1. The value of this metric is small.
The number of __udp4_lib_rcv packet drops indicated by the inspector_pod_packetloss metric increases by 1. However, the change in the number of packet drops is minor.
Customer A specified that the IP address of the DNS server is a public IP address provided by an Internet service provider (ISP). Based on the obtained metrics, the DNS timeouts occurred because the responses took a long time to reach the client and arrived after the DNS queries had already timed out in user mode.
Case 2: Occasional Java application connection failure
Symptom
Customer B submitted a ticket to request technical support to resolve the following issue: Tomcat occasionally becomes unavailable and the issue lasts 5 to 10 seconds each time.
Troubleshooting
The log analysis result shows that the Java runtime was performing a garbage collection (GC) operation when the issue occurred.
Customer B deployed ACK Net Exporter and analyzed the monitoring data. Customer B found that the value of the inspector_pod_tcpextlistendrops metric increased significantly at the time when the issue occurred.
The analysis results indicate that request processing slowed down while the Java runtime performed the GC operation. However, new requests were not throttled. As a result, a large number of connections were created and the backlog of the LISTEN socket overflowed, which caused the value of the inspector_pod_tcpextlistendrops metric to increase.
The issue of TCP connections piling up was transient and was not attributed to the request processing capability of TCP. In this scenario, Customer B modified the Tomcat parameters as we recommended and resolved the issue.
Case 3: Occasional network jitter
Symptom
Customer C submitted a ticket to request technical support to resolve the following issue: The round-trip time between the Redis instance and application significantly increases. As a result, timeout errors occur. The issue cannot be reproduced.
Troubleshooting
After Customer C analyzed the log, Customer C identified that the response time of Redis requests occasionally exceeds 300 milliseconds.
Customer C also identified that the value of the inspector_node_virtsendcmdlat metric increased at the time when the issue occurred. The affected monitoring levels in Prometheus were 15 and 18. After calculation, Customer C identified two virtual calls with a long response time: the response time of the call with a monitoring level of 15 exceeded 36 milliseconds, and the response time of the call with a monitoring level of 18 exceeded 200 milliseconds.
The kernel occupies the CPU when it processes virtual calls, and these CPU resources cannot be preempted by other operations. As a result, the execution of virtual calls slows down when pods are added or deleted in batches, which further increases the response time.
Case 4: NGINX Ingress occasionally fails to pass health checks
Symptom
Customer D submitted a ticket to request technical support to resolve the following issue: The NGINX Ingress occasionally fails to pass health checks. As a result, request failures occur.
Troubleshooting
After Customer D deployed ACK Net Exporter, Customer D identified that the following metrics are abnormal:
The values of the inspector_pod_tcpsummarytcprxqueue and inspector_pod_tcpsummarytcptxqueue metrics increased.
The value of the inspector_pod_tcpexttcptimeouts metric increased.
The value of the inspector_pod_tcpsummarytcptimewaitconn metric decreased and the value of the inspector_pod_tcpsummarytcpestablishedconn metric increased.
The analysis results indicate that the kernel ran as expected when the issue occurred and that connections were created as expected. However, exceptions occurred when the user process read packets from the socket receive queue and sent packets. In this scenario, the health check failure may be caused by a scheduling or rate limiting issue.
Customer D checked the monitoring data of the cgroups as we recommended and identified CPU throttling at the point in time when the health check failure occurred. The results indicate that the user process occasionally failed to schedule CPU resources due to a cgroup issue.
To resolve this issue, refer to Enable CPU Burst and configure CPU Burst for the NGINX Ingress.
Appendix
ACK Net Exporter metrics
The metrics supported by ACK Net Exporter are constantly updated. For more information, see the instructions on the Marketplace page of the ACK console. All metrics and events provide pod-specific information, except net_softirq and virtcmdlat, which are not related to pods.
Metric | Description | Probe |
inspector_pod_netdevrxbytes | The number of bytes received by the NIC. | netdev |
inspector_pod_netdevtxbytes | The number of bytes sent by the NIC. | netdev |
inspector_pod_netdevtxerrors | The number of NIC send errors. | netdev |
inspector_pod_netdevrxerrors | The number of NIC receive errors. | netdev |
inspector_pod_netdevtxdropped | The number of packet drops due to NIC send errors. | netdev |
inspector_pod_netdevrxdropped | The number of packet drops due to NIC receive errors. | netdev |
inspector_pod_netdevtxpackets | The number of packets that are successfully sent by the NIC. | netdev |
inspector_pod_netdevrxpackets | The number of packets that are successfully received by the NIC. | netdev |
inspector_pod_softnetprocessed | The number of backlog packets that all CPUs receive from the NIC within a pod. | softnet |
inspector_pod_softnetdropped | The number of backlog packets that are dropped by all CPUs after the CPUs receive the packets from the NIC within a pod. | softnet |
inspector_pod_softnettimesqueeze | The number of times that all CPUs fail to receive the complete packet or the receive operation times out within a pod. | softnet |
inspector_pod_tcpactiveopens | The number of times that TCP SYN succeeds within a pod, excluding SYN retransmissions. The value of this metric also increases when connection failures occur. | tcp |
inspector_pod_tcppassiveopens | The number of times that TCP handshake succeeds and a socket is allocated within a pod. In most cases, this metric indicates the number of new connections. | tcp |
inspector_pod_tcpretranssegs | The total number of packets that are retransmitted within a pod. TCP segments generated by TSO are already counted. | tcp |
inspector_pod_tcpestabresets | The number of TCP connections that are abnormally closed within a pod. The value is calculated only based on results. | tcp |
inspector_pod_tcpoutrsts | The number of TCP reset packets sent within a pod. | tcp |
inspector_pod_tcpcurrestab | The number of active TCP connections within a pod. | tcp |
inspector_pod_tcpexttcpabortontimeout | The number of times that TCP reset packets are sent to close connections because the upper limit of keepalive, window probe, and retransmission calls is reached. | tcpext |
inspector_pod_tcpexttcpabortonlinger | The number of times that TCP reset packets are sent to close FIN_WAIT2 connections when the TCP Linger_2 option is enabled. | tcpext |
inspector_pod_tcpexttcpabortonclose | The number of times that TCP reset packets are sent to close TCP connections when data reception is still in progress due to a reason other than the state machine. | tcpext |
inspector_pod_tcpexttcpabortonmemory | The number of times that TCP reset packets are sent to close connections because tcp_check_oom triggers an out of memory error during memory allocation to tw_sock or tcp_sock. | tcpext |
inspector_pod_tcpexttcpabortondata* | The number of times that TCP reset packets are sent to close connections because the Linger or Linger2 option is enabled. | tcpext |
inspector_pod_tcpextlistenoverflows | The number of times that the SYN queue is full when the socket in the LISTEN state accepts connections. | tcpext |
inspector_pod_tcpextlistendrops | The number of times that the socket in the LISTEN state fails to create a socket in the SYN_RECV state. | tcpext |
inspector_pod_tcpexttcpackskippedsynrecv | The number of times that the socket in the SYN_RECV state does not respond to ACK. | tcpext |
inspector_pod_tcpexttcpackskippedpaws | The number of times that ACK packets are limited by the OOW rate limiting mechanism because PAWS is triggered. | tcpext |
inspector_pod_tcpexttcpackskippedseq | The number of times that ACK packets are limited by the OOW rate limiting mechanism because sequence numbers are out of window. | tcpext |
inspector_pod_tcpexttcpackskippedchallenge | The number of times that challenge ack packets are limited by the OOW rate limiting mechanism. These packets are usually sent to confirm TCP reset packets. | tcpext |
inspector_pod_tcpexttcpackskippedtimewait | The number of times that ACK packets are ignored by the OOW rate limiting mechanism in the time_wait state. | tcpext |
inspector_pod_tcpexttcpackskippedfinwait2 | The number of times that ACK packets are ignored by the OOW rate limiting mechanism in the fin_wait_2 state. | tcpext |
inspector_pod_tcpextpawsestabrejected* | The number of times that TCP inbound packets are dropped because PAWS is triggered. | tcpext |
inspector_pod_tcpexttcprcvqdrop | The value of this metric increases when memory allocation fails and the TCP receive queue is full. | tcpext |
inspector_pod_tcpexttcpretransfail | The number of errors other than EBUSY that are returned after a retransmission. The errors indicate that the retransmission fails. | tcpext |
inspector_pod_tcpexttcpsynretrans | The number of SYN packets that are retransmitted. | tcpext |
inspector_pod_tcpexttcpfastretrans | The number of times that retransmission is triggered when the status of TCP_CA is not Loss. | tcpext |
inspector_pod_tcpexttcptimeouts | The number of times that SYN packets are retransmitted because the SYN packets are not answered while the status of TCP_CA is not recovery, loss, or disorder. | tcpext |
inspector_pod_tcpsummarytcpestablishedconn | The number of TCP connections in the ESTABLISHED state. | tcpsummary |
inspector_pod_tcpsummarytcptimewaitconn | The number of TCP connections in the TIMEWAIT state. | tcpsummary |
inspector_pod_tcpsummarytcptxqueue | The size of data packets in the send queue of TCP connections in the ESTABLISHED state. Unit: bytes. | tcpsummary |
inspector_pod_tcpsummarytcprxqueue | The size of data packets in the receive queue of TCP connections in the ESTABLISHED state. Unit: bytes. | tcpsummary |
inspector_pod_udpindatagrams | The number of UDP packets that are successfully received. | udp |
inspector_pod_udpsndbuferrors | The number of errors that are reported when UDP packets are sent over the network layer. | udp |
inspector_pod_udpincsumerrors | The number of checksum errors that are reported when UDP packets are received. | udp |
inspector_pod_udpignoredmulti | The number of multicast packets that are ignored by UDP. | udp |
inspector_pod_udpnoports | The number of times that the corresponding socket cannot be found when the network layer invokes __udp4_lib_rcv to receive packets. | udp |
inspector_pod_udpinerrors | The number of errors that are reported when UDP packets are received. | udp |
inspector_pod_udpoutdatagrams | The number of UDP packets that are successfully sent over the network layer. | udp |
inspector_pod_udprcvbuferrors | The number of times that UDP fails to replicate protocol data from the application layer to a socket queue because the socket queue is full. | udp |
inspector_pod_conntrackentries* | The number of existing entries. | conntrack |
inspector_pod_conntrackfound | The number of times that connection tracking records are found. | conntrack |
inspector_pod_conntrackinsert | The metric is not in use. | conntrack |
inspector_pod_conntrackinvalid | The number of times that connection tracking fails to create connections but does not drop the packets. | conntrack |
inspector_pod_conntrackignore | The number of times that connection tracking is skipped before connections are already created or connection tracking is not required. | conntrack |
inspector_pod_conntrackinsertfailed | The metric is not in use. | conntrack |
inspector_pod_conntrackdrop | The number of times that connection tracking drops packets due to connection failures. | conntrack |
inspector_pod_conntrackearlydrop | The metric is not in use. | conntrack |
inspector_pod_conntracksearchrestart | The number of attempts to retry a search during connection tracking. | conntrack |
inspector_pod_fdopenfd | The number of file descriptors of all processes within a pod. | fd |
inspector_pod_fdopensocket | The number of file descriptors of socket type within a pod. | fd |
inspector_pod_slabtcpslabobjperslab | The number of objects included in a single page of a TCP slab. | slab |
inspector_pod_slabtcpslabpagesperslab | The number of pages in a TCP slab. | slab |
inspector_pod_slabtcpslabobjactive | The number of active objects in a TCP slab. | slab |
inspector_pod_slabtcpslabobjnum | The number of objects in a TCP slab. | slab |
inspector_pod_slabtcpslabobjsize | The size of each object in a TCP slab. The size varies based on the kernel version. | slab |
inspector_pod_ioioreadsyscall | The number of read operations performed by the process, such as the number of reads or preads. | io |
inspector_pod_ioiowritesyscall | The number of write operations performed by the process, such as the number of writes or pwrites. | io |
inspector_pod_ioioreadbytes | The number of bytes that the process reads from a file system, which is a block device in most cases. | io |
inspector_pod_ioiowritebyres | The number of bytes that the process writes into a file system. | io |
inspector_pod_net_softirq_schedslow100ms | The number of times that a network software interrupt waits more than 100 milliseconds to be scheduled. | net_softirq |
inspector_pod_net_softirq_excuteslow100ms | The number of times that the execution of a network software interrupt lasts more than 100 milliseconds. | net_softirq |
inspector_pod_abnormalloss(inspector_pod_packetloss_abnormal) | The number of times that packets are dropped by the kernel due to errors other than packet issues, such as packet integrity issues or packet checksum errors. | packetloss |
inspector_pod_totalloss(inspector_pod_packetloss_total) | The total number of packets dropped by the kernel. | packetloss |
inspector_pod_virtcmdlatency100ms | The number of times that virtualized communication performed by the NIC lasts more than 100 milliseconds. | virtcmdlat |
inspector_pod_socketlatencyread100ms | The number of times that the user program requires more than 100 milliseconds to read content from the network socket file. | socketlatency |
inspector_pod_socketlatencywrite100ms | The number of times that the user program requires more than 100 milliseconds to write content to the network socket file. | socketlatency |
kernellatency_rxslow100ms | The number of times that the operating system kernel requires more than 100 milliseconds to receive a packet. | kernellatency |
kernellatency_txslow100ms | The number of times that the operating system kernel requires more than 100 milliseconds to send a packet. | kernellatency |
ACK Net Exporter events
The following table describes the operating system network-related events that can be captured by using the latest ACK Net Exporter version.
Probe | Description |
netiftxlat | Queuing Disciplines (qdiscs) of traffic control needs to wait a long period of time before it can send data packets in the queue. |
packetloss | Normal data packets are dropped by the operating system kernel. |
net_softirq | Packet scheduling by NET_RX or NET_TX is interrupted or packet processing is severely delayed due to kernel software interrupt processing. |
socketlatency | Processes in a pod require a long period of time to complete socket-related read and write operations. |
kernellatency | The kernel requires a long period of time to process packets at the network layer. |
virtcmdlatency | Communication between Virtio-net and the host requires a long period of time. |
tcpreset | TCP reset packets are received or sent. |
tcptwrcv | TCP receives and processes packets when TCP is in the TIMEWAIT state. |
Recommended Grafana configuration file
If you use a Grafana version later than 8.4.0, click ACK Net Exporter-0.2.9.json to download the Grafana configuration file.
If you use Grafana 8.4.0 or earlier, click ACK Net Exporter-legacy.json to download the Grafana configuration file.