Problem description
Packet loss occasionally occurs when you connect to applications on an ECS instance. The peripheral network of the ECS instance runs as expected. However, when you run the dmesg command to query kernel logs, the "kernel: nf_conntrack: table full, dropping packet" error message appears. The ECS instance on which the issue occurs meets the following conditions:
Image: aliyun-2.1903-x64-20G-alibase-20190327.vhd or later
Kernel: kernel-4.19.24-9.al7 or later
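To confirm the symptom, you can search the kernel log for the error message. The following is a minimal sketch, assuming that the current user is allowed to read the output of dmesg. The function name count_table_full is hypothetical and is used only for illustration.

```shell
#!/bin/sh
# Count kernel log lines that report a full conntrack table.
# The log text is read from standard input, so the function works with
# dmesg output or with a saved log file.
count_table_full() {
    # grep -c prints 0 and exits with status 1 when nothing matches;
    # "|| true" keeps the function from failing in that case.
    grep -c 'nf_conntrack: table full, dropping packet' || true
}

# Example usage (may require root privileges on some systems):
#   dmesg | count_table_full
#   uname -r    # compare with the kernel version listed above
```

A result greater than 0 indicates that the kernel dropped packets because the table was full at some point since the log was last cleared.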
Cause
nf_conntrack is the netfilter connection tracking module of the Linux kernel. It tracks connection entries and is required by features such as NAT. The nf_conntrack module uses a hash table to record tracked connections, including established TCP connections. When the entries in the hash table are exhausted, packets that would create new connections are dropped and the module reports "nf_conntrack: table full, dropping packet" errors. Take note of the following parameters of the nf_conntrack module:
nf_conntrack_buckets: the size of the hash table. You can specify this parameter when you load the module or modify the parameter by running the sysctl command. When the amount of system memory is greater than or equal to 4 GB, the default value is 65536.
nf_conntrack_max: the maximum number of nodes in the hash table, which is the maximum number of connections supported by the nf_conntrack module. When the amount of system memory is greater than or equal to 4 GB, the default value is 262144. For servers that handle a large number of connections, you can increase the value based on your business requirements.
nf_conntrack_tcp_timeout_time_wait: the period for which the entry of a TCP connection in the TIME_WAIT state is kept in the nf_conntrack table. The default value is 120. Unit: seconds.
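To see how close an instance is to the limit, you can compare the current number of tracked connections with nf_conntrack_max. The following is a minimal sketch, assuming that the nf_conntrack module is loaded so that the files under /proc/sys/net/netfilter exist. The function names are hypothetical.

```shell
#!/bin/sh
# Print conntrack table usage as an integer percentage.
# $1: current number of tracked connections (nf_conntrack_count)
# $2: configured limit (nf_conntrack_max)
conntrack_usage_percent() {
    count=$1
    max=$2
    if [ "$max" -le 0 ]; then
        echo "max must be positive" >&2
        return 1
    fi
    echo $(( count * 100 / max ))
}

# Read the live values from procfs. The files exist only while the
# nf_conntrack module is loaded.
show_live_usage() {
    dir=/proc/sys/net/netfilter
    conntrack_usage_percent "$(cat "$dir/nf_conntrack_count")" \
                            "$(cat "$dir/nf_conntrack_max")"
}

# Example usage:
#   show_live_usage
```

A value that stays close to 100 during peak hours suggests that the limit is too low for the workload.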
Solutions
Use one of the following solutions based on your business scenario.
Solution 1: Use the sysctl interface to change parameter values in the nf_conntrack module
Estimate the nf_conntrack_max value required for applications in advance and run sysctl commands to change the parameter values in the nf_conntrack module. Sample commands:
sysctl -w net.netfilter.nf_conntrack_max=1503232
sysctl -w net.netfilter.nf_conntrack_buckets=375808 # This option cannot be modified during runtime if the kernel version is not 4.19.
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=60
The parameter values in the commands are provided only for reference. Change the values based on your business requirements. Before you modify the parameter values, we recommend that you create snapshots for the ECS instance or back up important files to ensure data security.
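Values that are set by running sysctl -w are lost after a restart. If you want to keep them, one option is to write them to a sysctl configuration file. The following is a minimal sketch: the function name write_conntrack_conf and the file name 99-nf-conntrack.conf are hypothetical, and the values are the sample values from the preceding commands. The net.netfilter keys exist only after the nf_conntrack module is loaded, so settings that are applied before the module is loaded at startup may not take effect.

```shell
#!/bin/sh
# Write conntrack settings to a sysctl configuration file.
# $1: target file
# $2: nf_conntrack_max
# $3: nf_conntrack_buckets
# $4: nf_conntrack_tcp_timeout_time_wait, in seconds
write_conntrack_conf() {
    cat > "$1" <<EOF
net.netfilter.nf_conntrack_max = $2
net.netfilter.nf_conntrack_buckets = $3
net.netfilter.nf_conntrack_tcp_timeout_time_wait = $4
EOF
}

# Example usage (as root; the file name is a hypothetical choice):
#   write_conntrack_conf /etc/sysctl.d/99-nf-conntrack.conf 1503232 375808 60
#   sysctl -p /etc/sysctl.d/99-nf-conntrack.conf
```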
Suggestions on how to change the parameter values:
If your applications involve a large number of concurrent short-lived connections, consider the following suggestions when you change parameter values:
To prevent excessive connections from exhausting the entries in the nf_conntrack hash table, we recommend that you increase the values of the nf_conntrack_max and nf_conntrack_buckets parameters. In most cases, we recommend that you set the nf_conntrack_max parameter to four times the value of the nf_conntrack_buckets parameter.
We recommend that you change the values of the nf_conntrack_buckets and nf_conntrack_max parameters together. If you change only the value of the nf_conntrack_max parameter, the linked lists in the hash table may become long, which reduces lookup efficiency. If you change only the value of the nf_conntrack_buckets parameter, the preceding packet dropping issue persists.
Before you change the value of the nf_conntrack_tcp_timeout_time_wait parameter, familiarize yourself with how the parameter works and the possible impacts of the change. Then, change the value with caution based on your application scenarios and performance monitoring data. Here are some suggestions for changing the parameter value:
For high-concurrency applications that handle a large number of short-lived connections in a short period of time, such as web servers, we recommend that you set the nf_conntrack_tcp_timeout_time_wait parameter to a small value, such as 30 or 60. This way, connection tracking entries can be reclaimed faster and more new connections are supported. However, you must make sure that the applications can tolerate the potential retransmission of a small amount of data or latency.
If your applications have strict requirements for the integrity of data transmitted, such as financial transaction systems, accept the default value of the nf_conntrack_tcp_timeout_time_wait parameter or set the parameter to a value that is close to the default value. This ensures that all data packets are transmitted as expected.
On a high-latency or unstable network, a small value of the nf_conntrack_tcp_timeout_time_wait parameter may increase the risk of data loss, and a larger value may be required.
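The relationship between nf_conntrack_max and nf_conntrack_buckets can be sketched with simple integer arithmetic. The following example assumes the 4:1 ratio described in this section. The function names are hypothetical.

```shell
#!/bin/sh
# Derive nf_conntrack_buckets from a target nf_conntrack_max by using a
# 4:1 ratio (integer division, rounded up).
# $1: target nf_conntrack_max
buckets_for_max() {
    max=$1
    echo $(( (max + 3) / 4 ))
}

# Average number of entries per hash bucket when the table is full
# (integer division, rounded down).
# $1: nf_conntrack_max
# $2: nf_conntrack_buckets
avg_chain_length() {
    echo $(( $1 / $2 ))
}

# Example usage:
#   buckets_for_max 1503232          # prints 375808
#   avg_chain_length 1503232 65536   # prints 22
```

With the sample value 1503232 for nf_conntrack_max, the 4:1 ratio gives 375808 buckets. If only nf_conntrack_max is raised and the default 65536 buckets are kept, a full table holds about 22 entries per bucket on average instead of 4, which is why the two parameters are changed together.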
Solution 2: Use the iptables utility to filter out connections that do not need to be tracked
Run the following commands to add rules with the -j NOTRACK target to the raw table so that connections that do not need to be tracked are skipped. This method keeps the records of such connections out of the hash table and prevents excessive connections from causing "kernel: nf_conntrack: table full, dropping packet" errors.
sudo iptables -t raw -A PREROUTING -p udp -j NOTRACK
sudo iptables -t raw -A PREROUTING -p tcp --dport 22 -j NOTRACK
The preceding commands are provided only for reference. They prevent the nf_conntrack module from tracking UDP packets and TCP connections to port 22. You can modify the commands based on your business requirements.
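The PREROUTING chain handles only incoming packets. Packets that are generated on the instance itself, such as replies from a local service, pass through the OUTPUT chain of the raw table. The following is a minimal sketch that prints, without running them, a pair of rules for one TCP port in both directions. The function name notrack_rules_for_port is hypothetical, and whether you need the OUTPUT rule depends on your setup. If your firewall rules match on connection state, untracked packets no longer match states such as ESTABLISHED, so review those rules before you apply the output.

```shell
#!/bin/sh
# Print (do not run) the raw-table rules that exempt one local TCP port
# from connection tracking in both directions.
# $1: local TCP port
notrack_rules_for_port() {
    echo "iptables -t raw -A PREROUTING -p tcp --dport $1 -j NOTRACK"
    echo "iptables -t raw -A OUTPUT -p tcp --sport $1 -j NOTRACK"
}

# Example usage:
#   notrack_rules_for_port 22              # review the generated rules
#   notrack_rules_for_port 22 | sudo sh    # apply them
#   sudo iptables -t raw -L -n -v          # check the packet counters
```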