An Ingress is an API object that provides Layer 7 load balancing to manage external access to Services in a Kubernetes cluster. The NGINX Ingress controller implements the Ingress rules and performs load balancing for external traffic based on them. In high-load scenarios, insufficient CPU resources and network connections can degrade application performance. This topic describes how to improve application performance in high-load scenarios by tuning the NGINX Ingress controller.
Prerequisites
The NGINX Ingress controller in your Container Service for Kubernetes (ACK) cluster runs as normal.
A kubectl client is connected to the cluster. For more information, see Connect to an ACK cluster by using kubectl.
Deployment notes
When you deploy the NGINX Ingress controller in a high-load scenario, take note of the following items:
Elastic Compute Service (ECS) instance specifications
When the cluster receives a large number of concurrent requests, Ingresses consume a large amount of CPU resources and network connections. We recommend that you use ECS instance types with enhanced performance, such as the following instance types:
ecs.c6e.8xlarge (32 vCPUs, 64 GB of memory): a compute-optimized instance type with enhanced performance. This instance type supports up to 6,000,000 packets per second (PPS).
ecs.g6e.8xlarge (32 vCPUs, 128 GB of memory): a general-purpose instance type with enhanced performance. This instance type supports up to 6,000,000 PPS.
For more information about ECS instance types, see Overview of instance families.
Kubernetes configurations
Use exclusive nodes to deploy the NGINX Ingress controller. Run the following commands to add labels and taints to the nodes:
```shell
kubectl label nodes $node_name ingress-pod="yes"
kubectl taint nodes $node_name ingress-pod="yes":NoExecute
```
Set CPU Policy to `static`.
We recommend that you select Super I (slb.s3.large) as the Server Load Balancer (SLB) specification for the ingress-controller Service.
We recommend that you use Terway as the network plug-in and use the exclusive ENI mode.
NGINX Ingress controller configurations
Configure Guaranteed pods for the NGINX Ingress controller.
Set the `requests` and `limits` parameters of the nginx-ingress-controller containers to 30 cores and 40 GiB.
Set the `requests` and `limits` parameters of the init-sysctl init container to 100 m and 70 MiB.
Set the number of replicas of the NGINX Ingress controller Deployment to the number of newly added nodes.
Set `worker-processes` in the ConfigMap of the NGINX Ingress controller to 28. This starts 28 NGINX worker processes and reserves the remaining CPU resources for the system.
Set `keepalive` in the ConfigMap of the NGINX Ingress controller to specify the maximum number of requests that can be sent over a connection.
Disable logging.
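The relationship between the figures above can be sketched as simple arithmetic. The exact split is an assumption for illustration: two cores of the 32-core node are left for system daemons, and two cores of the container limit are left as headroom for the controller process itself.

```shell
#!/bin/bash
# Back-of-the-envelope sizing sketch (assumed split, based on the figures above).
node_cores=32                                    # ecs.c6e.8xlarge / ecs.g6e.8xlarge
container_cpu_limit=$(( node_cores - 2 ))        # leave 2 cores for system daemons
worker_processes=$(( container_cpu_limit - 2 ))  # leave headroom inside the pod
echo "$container_cpu_limit $worker_processes"    # 30 cores for the pod, 28 workers
```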
Step 1: Add nodes
Create a node pool in the ACK cluster and add two nodes to the node pool.
Configure the node pool based on the following description. For more information, see Create and manage a node pool.
Set Operating System to Alibaba Cloud Linux 3.
Set Node Labels and Taints.
Add a taint. Set Key to `ingress-pod`, set Value to `yes`, and set Effect to NoExecute.
Add a node label. Set Key to `ingress-pod` and set Value to `yes`.
Set CPU Policy to Static.
Step 2: Update the NGINX Ingress controller
Run the `kubectl edit deploy nginx-ingress-controller -n kube-system` command to edit the configuration of the NGINX Ingress controller based on the following description.
Delete the pod anti-affinity settings.
```yaml
podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: app
        operator: In
        values:
        - ingress-nginx
    topologyKey: kubernetes.io/hostname
```
Set the `requests` and `limits` parameters for the init container.
```yaml
resources:
  limits:
    cpu: 100m
    memory: 70Mi
  requests:
    cpu: 100m
    memory: 70Mi
```
Set the `requests` and `limits` parameters of the nginx-ingress-controller containers to 30 cores and 40 GiB.
```yaml
resources:
  limits:
    cpu: "30"
    memory: 40Gi
  requests:
    cpu: "30"
    memory: 40Gi
```
Set node affinity settings and tolerations.
```yaml
nodeSelector:
  ingress-pod: "yes"
tolerations:
- effect: NoExecute
  key: ingress-pod
  operator: Equal
  value: "yes"
```
Set the number of replicas of the NGINX Ingress controller Deployment to the number of newly added nodes.
Disable metric collection by adding `--enable-metrics=false` to the startup parameters.
Note: If you do not need metrics, we recommend that you disable metric collection. If the NGINX Ingress controller runs v1.9.3 or later, you can alternatively use the `--exclude-socket-metrics` startup parameter to exclude specific metrics, such as `nginx_ingress_controller_ingress_upstream_latency_seconds`. This mitigates the impact on system performance.
```yaml
containers:
- args:
  - /nginx-ingress-controller
  - --configmap=$(POD_NAMESPACE)/nginx-configuration
  - --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
  - --udp-services-configmap=$(POD_NAMESPACE)/udp-services
  - --annotations-prefix=nginx.ingress.kubernetes.io
  - --publish-service=$(POD_NAMESPACE)/nginx-ingress-lb
  - --enable-metrics=false
  - --v=1
```
Step 3: Update the ConfigMap of the NGINX Ingress controller
Run the `kubectl edit cm -n kube-system nginx-configuration` command to edit the ConfigMap of the NGINX Ingress controller. Modify the ConfigMap based on the following template:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration
  namespace: kube-system
data:
  allow-backend-server-header: "true"
  enable-underscores-in-headers: "true"
  generate-request-id: "true"
  ignore-invalid-headers: "true"
  log-format-upstream: $remote_addr - [$remote_addr] - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_length $request_time [$proxy_upstream_name] $upstream_addr $upstream_response_length $upstream_response_time $upstream_status $req_id $host [$proxy_alternative_upstream_name]
  max-worker-connections: "65536"
  proxy-body-size: 20m
  proxy-connect-timeout: "3"
  proxy-read-timeout: "5"
  proxy-send-timeout: "5"
  reuse-port: "true"
  server-tokens: "false"
  ssl-redirect: "false"
  upstream-keepalive-timeout: "900"
  worker-processes: "28" # The number of NGINX worker processes to start. Set this value based on your node specifications. We recommend the number of CPU cores - 2.
  worker-cpu-affinity: auto
  upstream-keepalive-connections: "300"
  upstream-keepalive-requests: "1000" # The maximum number of requests over a persistent connection.
  keep-alive: "900"
  keep-alive-requests: "10000"
```
Write logs to a file and set up log rotation.
By default, logs are written to /dev/stdout. When the cluster receives a large number of requests, this consumes a large amount of CPU. In this case, we recommend that you write logs to a file and set up log rotation.
Use SSH to log on to the ECS instance where the ingress-controller pods are deployed. For more information, see Connect to a Linux instance by using an SSH key pair.
Add the following content to the end of the /etc/crontab file:
*/15 * * * * root /root/nginx-log-rotate.sh
Note: In this example, the logs are rotated every 15 minutes. You can change the interval based on your requirements.
Create a file named nginx-log-rotate.sh in the /root directory.
Docker clusters
```shell
#!/bin/bash
# Specify the maximum number of log files that are retained. You can change the number based on your requirements.
keep_log_num=5
ingress_nginx_container_ids=$(docker ps | grep nginx-ingress-controller | grep -v pause | awk '{print $1}')
if [[ -z "$ingress_nginx_container_ids" ]]; then
  echo "error: failed to get ingress nginx container ids"
  exit 1
fi
# Make the NGINX Ingress controller pods sleep for a random period between 5 and 10 seconds.
sleep $(( RANDOM % (10 - 5 + 1) + 5 ))
for id in $ingress_nginx_container_ids; do
  docker exec $id bash -c "cd /var/log/nginx; if [[ \$(ls access.log-* | wc -l) -gt $keep_log_num ]]; then rm -f \$(ls -t access.log-* | tail -1); fi ; mv access.log access.log-\$(date +%F:%T) ; kill -USR1 \$(cat /tmp/nginx/nginx.pid)"
done
```
containerd clusters
```shell
#!/bin/bash
# Specify the maximum number of log files that are retained. You can change the number based on your requirements.
keep_log_num=5
ingress_nginx_container_ids=$(crictl ps | grep nginx-ingress-controller | grep -v pause | awk '{print $1}')
if [[ -z "$ingress_nginx_container_ids" ]]; then
  echo "error: failed to get ingress nginx container ids"
  exit 1
fi
# Make the NGINX Ingress controller pods sleep for a random period between 5 and 10 seconds.
sleep $(( RANDOM % (10 - 5 + 1) + 5 ))
for id in $ingress_nginx_container_ids; do
  crictl exec $id bash -c "cd /var/log/nginx; if [[ \$(ls access.log-* | wc -l) -gt $keep_log_num ]]; then rm -f \$(ls -t access.log-* | tail -1); fi ; mv access.log access.log-\$(date +%F:%T) ; kill -USR1 \$(cat /tmp/nginx/nginx.pid)"
done
```
Run the following command to make the nginx-log-rotate.sh file executable:
chmod 755 /root/nginx-log-rotate.sh
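The retention logic in the rotation script can be exercised locally before deploying it, without Docker or crictl. The following dry run applies the same "drop the oldest rotated log when over the limit" check to plain files in a temporary directory; all file names and dates are made up for the test.

```shell
#!/bin/bash
# Dry run of the retention logic from nginx-log-rotate.sh using a temp directory.
keep_log_num=5
dir=$(mktemp -d)
cd "$dir" || exit 1
# Create 6 rotated logs with distinct, ordered mtimes (made-up dates).
for i in 1 2 3 4 5 6; do
  touch -d "2024-01-0${i}" "access.log-2024-01-0${i}"
done
touch access.log
# Same check as the script: when rotated logs exceed the limit, drop the oldest.
if [[ $(ls access.log-* | wc -l) -gt $keep_log_num ]]; then
  rm -f "$(ls -t access.log-* | tail -1)"
fi
mv access.log "access.log-$(date +%F:%T)"
rotated=$(ls access.log-* | wc -l)
echo "$rotated"  # 5 kept + the newly rotated file = 6
```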
Step 4: Optimize configurations
This step describes how to optimize kernel parameters, application configurations, compression settings, and HTTPS performance.
Optimize kernel parameters
Before you optimize kernel parameters, familiarize yourself with the parameters and exercise caution.
Modify the TIME_WAIT setting
To optimize the performance of NGINX Ingresses, you can modify OS parameters to enable the reuse of TIME_WAIT ports and reduce the timeout period of connections in the FIN_WAIT2 and TIME_WAIT states.
`net.ipv4.tcp_fin_timeout=15`: reduces the timeout period of connections in the FIN_WAIT2 state so that connections are released sooner.
`net.netfilter.nf_conntrack_tcp_timeout_time_wait=30`: reduces the time period in which a connection remains in the TIME_WAIT state so that connections are released sooner.
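To gauge whether these timeouts matter on a node, you can tally TCP connection states from `ss -tan` output. The sketch below runs the tally over captured sample output (the heredoc and its addresses are made up); on a node you would pipe live `ss -tan` output into the same `awk` command instead.

```shell
#!/bin/bash
# Count TCP connection states from `ss -tan`-style output.
result=$(
  awk 'NR > 1 { states[$1]++ } END { for (s in states) print s, states[s] }' <<'EOF' | sort
State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
ESTAB      0      0      10.0.0.1:443        10.0.0.9:52100
TIME-WAIT  0      0      10.0.0.1:443        10.0.0.9:52101
TIME-WAIT  0      0      10.0.0.1:443        10.0.0.9:52102
FIN-WAIT-2 0      0      10.0.0.1:443        10.0.0.9:52103
EOF
)
echo "$result"
```

A large and growing TIME-WAIT count under load is the symptom these sysctl settings address.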
Configuration steps
Run the following command to modify the configurations of the Deployment created for the NGINX Ingress controller:
kubectl edit deployments -n kube-system nginx-ingress-controller
Add the `sysctl -w net.ipv4.tcp_fin_timeout=15` and `sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=30` settings to the `initContainers` section. Then, save the changes and exit.
```yaml
initContainers:
- command:
  # Add the sysctl -w net.ipv4.tcp_fin_timeout=15 and
  # sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=30 settings.
  - /bin/sh
  - -c
  - |
    if [ "$POD_IP" != "$HOST_IP" ]; then
      mount -o remount rw /proc/sys
      sysctl -w net.core.somaxconn=65535
      sysctl -w net.ipv4.ip_local_port_range="1024 65535"
      sysctl -w kernel.core_uses_pid=0
      sysctl -w net.ipv4.tcp_fin_timeout=15
      sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=30
    fi
```
Optimize the application configurations
Perform the following steps to optimize the application configurations of the NGINX Ingress controller:
Run the following command to modify the ConfigMap of the NGINX Ingress controller:
kubectl edit cm -n kube-system nginx-configuration
Modify the parameters in the ConfigMap based on the parameter description in the following table.
| Item | Parameter | Description |
| --- | --- | --- |
| Keep-alive settings for downstream connections | `keep-alive: "60"` | The timeout period of keep-alive connections. Unit: seconds. |
| | `keep-alive-requests: "10000"` | The maximum number of requests that can be sent over a keep-alive connection. |
| Keep-alive settings for upstream connections | `upstream-keepalive-connections: "1000"` | The maximum number of keep-alive connections. |
| | `upstream-keepalive-requests: "2147483647"` | The maximum number of requests that can be sent over a keep-alive connection. |
| | `upstream-keepalive-time: 1h` | The maximum period of time during which requests can be processed over a keep-alive connection. |
| | `upstream-keepalive-timeout: "150"` | The timeout period of an idle upstream keep-alive connection. Unit: seconds. |
| Connection upper limit of each worker process | `max-worker-connections: "65536"` | The maximum number of simultaneous connections that can be opened by a worker process. |
| Timeout settings. Note: You can modify the parameter values based on your business requirements. | `proxy-connect-timeout: "3"` | The timeout period for establishing a connection. Unit: seconds. |
| | `proxy-read-timeout: "5"` | The timeout period for reading data. Unit: seconds. |
| | `proxy-send-timeout: "5"` | The timeout period for sending data. Unit: seconds. |
| Retry settings. Note: When errors occur on backend services, multiple retries may lead to excessive requests. This may increase the load on the backend services or even cause a service avalanche. For more information, see the ingress-nginx official documentation. | `proxy-next-upstream-tries: "3"` | The number of attempts for a request. Default value: 3, which includes the original request and two retries. |
| | `proxy-next-upstream: "off"` | The conditions that trigger retries. To disable retries, set the value to off. |
| | `proxy-next-upstream-timeout` | The timeout period of a request retry. Unit: seconds. You can modify the value based on your business requirements. |
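These ConfigMap values set cluster-wide defaults. ingress-nginx also lets you override the timeout and retry behavior for an individual Ingress through annotations, which take precedence over the ConfigMap. A sketch follows; the Ingress name, host, and Service name are made up for illustration.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: latency-sensitive-app   # made-up name
  annotations:
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "3"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "5"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "5"
    nginx.ingress.kubernetes.io/proxy-next-upstream: "off"  # disable retries for this Ingress only
spec:
  ingressClassName: nginx
  rules:
  - host: app.example.com       # made-up host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app-svc       # made-up Service name
            port:
              number: 80
```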
Enable Brotli compression
Although data compression consumes additional CPU time, compressed data packets reduce bandwidth usage, which increases network throughput. Brotli is an open source compression algorithm developed by Google. Compared with the commonly used `gzip` algorithm, Brotli achieves a 20% to 30% higher compression ratio. `gzip` is the default compression algorithm used by the NGINX Ingress controller. To enable Brotli compression for ingress-nginx, configure the following parameters. For more information, see the ingress-nginx official documentation.
`enable-brotli`: specifies whether to enable Brotli. Valid values: `true` and `false`.
`brotli-level`: the compression level. Valid values: 1 to 11. Default value: 4. A higher compression level consumes more CPU resources.
`brotli-types`: the Multipurpose Internet Mail Extensions (MIME) types that are compressed on the fly by Brotli.
Run the following command to enable Brotli compression:
kubectl edit cm -n kube-system nginx-configuration
Sample configurations:
```yaml
enable-brotli: "true"
brotli-level: "6"
brotli-types: "text/xml image/svg+xml application/x-font-ttf image/vnd.microsoft.icon application/x-font-opentype application/json font/eot application/vnd.ms-fontobject application/javascript font/otf application/xml application/xhtml+xml text/javascript application/x-javascript text/plain application/x-font-truetype application/xml+rss image/x-icon font/opentype text/css image/x-win-bitmap"
```
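The CPU-for-bandwidth trade described above can be seen from the command line. The sketch below uses `gzip` (the controller's default algorithm) rather than Brotli, because the brotli CLI is not installed everywhere; the exact compressed size depends on the input and the tool version, so no specific ratio is asserted.

```shell
#!/bin/bash
# Compress a highly repetitive 10 KB payload and compare sizes.
raw=$(head -c 10000 /dev/zero | tr '\0' 'a')
raw_size=$(printf '%s' "$raw" | wc -c)
gz_size=$(printf '%s' "$raw" | gzip -c | wc -c)
echo "raw=${raw_size} gzip=${gz_size}"
```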
Optimize HTTPS performance
To improve HTTPS performance, run the `kubectl edit cm -n kube-system nginx-configuration` command to modify the nginx-configuration ConfigMap and configure the following settings: SSL session caching, Online Certificate Status Protocol (OCSP) stapling, support for TLS 1.3 early data, and cipher suite priorities.
SSL session caching and timeout
Set the size of the shared SSL session cache and the time period in which a client can reuse the session parameters stored in a cache. This helps you reduce the overheads of SSL handshakes.
ConfigMap configurations:
```yaml
ssl-session-cache-size: "10m"
ssl-session-timeout: "10m"
```
The corresponding `nginx.conf` configurations on the NGINX side. You can modify them based on your business requirements:
```
ssl_session_cache shared:SSL:120m; # 1 MB of cache can store about 4,000 sessions.
ssl_session_timeout 1h; # Sessions time out after 1 hour.
```
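Given the rule of thumb in the comment above (about 4,000 sessions per 1 MB of cache), the capacity of a cache size can be estimated with simple arithmetic:

```shell
#!/bin/bash
# Estimate SSL session cache capacity from the ~4,000 sessions/MB rule of thumb.
cache_mb=120          # matches the shared:SSL:120m example above
sessions_per_mb=4000
capacity=$(( cache_mb * sessions_per_mb ))
echo "$capacity"      # about 480,000 cached sessions
```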
Enable OCSP stapling
OCSP stapling can reduce the time required by client certificate verification.
```yaml
enable-ocsp: "true"
```
Support for TLS 1.3 early data (0-RTT)
The TLS 1.3 early data feature, also known as zero round-trip time (0-RTT), allows clients to send data before the handshake is completed. This helps reduce the response time.
```yaml
ssl-early-data: "true"
ssl-protocols: "TLSv1.3"
```
Modify the cipher suite priorities (optimized)
You can modify the cipher suite priorities to reduce network latency. ACK has optimized cipher suite priorities for the NGINX Ingress controller configurations.
```
ssl_ciphers 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384';
ssl_prefer_server_ciphers on; # The server-side cipher preferences take precedence.
```