NLB health check overview

Network Load Balancer (NLB) uses health checks to test the availability of backend servers. After you enable health checks, if a backend server fails a health check, NLB automatically forwards requests that are destined for the backend server to other healthy backend servers. When the backend server is declared healthy again, NLB automatically resumes forwarding requests to it. Health checks are a key measure for ensuring high service availability. They improve the overall availability of your business and prevent an unhealthy server from becoming a single point of failure (SPOF).

Health check process

NLB instances are deployed in clusters. Node servers in a cluster are used to forward data and perform health checks.

Servers in a cluster are independent of each other and forward data and run health checks in parallel based on the NLB policy. If NLB detects an unhealthy backend server, connections and requests are processed as described in the following scenarios:

If connection draining is disabled for the backend server, connections on the backend server are closed after all existing sessions are completed. Meanwhile, new requests are no longer forwarded to the backend server.
If connection draining is enabled for the backend server, the backend server continues processing existing sessions. The connections are closed when the connection draining timeout period ends. Meanwhile, new requests are no longer forwarded to the backend server.

Note

NLB uses its local IP address to run health checks. Make sure that the IP address is not blocked by the backend servers. You do not need to configure an Allow rule in the Elastic Compute Service (ECS) security group for the IP address. However, if other security policies are used, such as iptables, allow the IP address in the security policies.

How it works

TCP health checks

To improve the efficiency of TCP health checks, NLB sends customized TCP probes to test the availability of backend servers, as shown in the following figure.

How NLB performs TCP health checks:

NLB sends TCP-SYN packets to the internal IP address and health check port of the backend servers based on the health check settings of the listener.
If the backend server ports are alive, the backend servers return SYN-ACK packets after they receive the TCP-SYN packets.
If NLB does not receive a SYN-ACK packet from a backend server before the response timeout period ends, the backend server is declared unhealthy. Then, NLB sends an RST packet to the backend server to close the TCP connection.
If NLB receives a SYN-ACK packet from a backend server before the response timeout period ends, the backend server passes the health check. NLB sends an ACK packet and then immediately sends an RST packet to close the TCP connection.
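The probe sequence above can be approximated from user space. The following is a minimal sketch, not NLB's actual implementation: the function name `tcp_health_check` and the default timeout are assumptions made for illustration. It relies on the standard `SO_LINGER` socket option with a linger time of zero so that `close()` sends an RST instead of a FIN, which mirrors the ACK-then-RST behavior described above. The `struct` layout used for `SO_LINGER` assumes Linux or macOS.

```python
import socket
import struct


def tcp_health_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if the port completes a TCP handshake within `timeout` seconds.

    Hypothetical helper for illustration; the name and defaults are not part of NLB.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # connect() sends the SYN and returns once the SYN-ACK has arrived
        # and the kernel has replied with the ACK.
        sock.connect((host, port))
    except OSError:
        # Timeout, refusal, or unreachable host: the check fails.
        sock.close()
        return False
    # l_onoff=1, l_linger=0: close() aborts the connection with an RST
    # instead of the normal FIN exchange.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    sock.close()
    return True
```

A backend application that has already accepted such a connection sees it reset, which is the source of the Connection reset by peer log entries discussed in the following note.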

Note

In normal cases, a three-way TCP handshake is performed. After NLB receives a SYN-ACK packet from a backend server, NLB sends an ACK packet and then immediately sends an RST packet to close the TCP connection. This mechanism may cause false TCP connection errors on backend servers. As a result, the backend servers may record the Connection reset by peer error message in software logs, such as Java connection pool logs.

Solutions:

Configure HTTP health checks for TCP listeners.
Enable client IP preservation so that backend servers can identify connections that originate from NLB CIDR blocks and ignore the false TCP connection errors that these health check connections trigger.

UDP health checks

You can use the following methods to perform UDP health checks:

Method 1: Health checks on ports

The following figure shows how health checks are performed.

The following process describes how health checks are performed on ports by a UDP listener:

NLB sends an ICMP request packet to the internal IP address of the backend server based on the health check settings of the listener.
NLB sends a UDP probe packet to the internal IP address and health check port of the backend server based on the health check settings of the listener.
If the backend server returns an ICMP response packet before the response timeout period ends and does not return the Port XX Unreachable message, the backend server passes the health check. Otherwise, the backend server fails the health check.
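The pass/fail rule above can be sketched as follows. This is an illustration under stated assumptions, not NLB's implementation. Sending an ICMP echo request normally needs a raw socket and elevated privileges, so the ICMP step is passed in as a caller-supplied callable (`icmp_echo_ok`, a hypothetical parameter). The Port Unreachable detection relies on the behavior of connected UDP sockets on Linux, where the kernel reports a received ICMP Port Unreachable error as `ConnectionRefusedError`. All function names are hypothetical.

```python
import socket
from typing import Callable


def udp_port_unreachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a UDP probe triggers an ICMP Port Unreachable error."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # connect() on a UDP socket lets the OS deliver ICMP errors to it.
        sock.connect((host, port))
        try:
            sock.send(b"\x00")
            sock.recv(1024)
        except ConnectionRefusedError:
            # The kernel received ICMP Port Unreachable for the probe.
            return True
        except socket.timeout:
            # Silence within the timeout: no Port Unreachable was reported.
            return False
        # The server answered with a datagram, so the port is open.
        return False


def udp_port_check(
    host: str,
    port: int,
    icmp_echo_ok: Callable[[str, float], bool],
    timeout: float = 2.0,
) -> bool:
    """Pass only if the ICMP echo succeeds and no Port Unreachable is seen."""
    if not icmp_echo_ok(host, timeout):
        return False
    return not udp_port_unreachable(host, port, timeout)
```

Note that a UDP server that silently ignores the probe still passes this check, which is consistent with the rule above: only an explicit Port Unreachable message or a missing ICMP response counts as a failure.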

Method 2: Custom health checks

The following figure shows how health checks are performed.

The following process describes how custom health checks are performed by a UDP listener:

NLB sends a UDP probe packet that contains specified characters to the internal IP address and health check port of the backend server.
If NLB receives the specified message before the response timeout period ends, the backend server passes the health check. Otherwise, the backend server fails the health check.
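A custom UDP check of this kind can be sketched in a few lines. The function name `udp_custom_check` and its parameters are hypothetical, and the sketch assumes that the reply must match the expected message exactly; this document does not state how NLB compares the response.

```python
import socket


def udp_custom_check(
    host: str,
    port: int,
    request: bytes,
    expected: bytes,
    timeout: float = 2.0,
) -> bool:
    """Send `request` over UDP and pass only if `expected` comes back in time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # A connected UDP socket only accepts replies from the probed address.
        sock.connect((host, port))
        try:
            sock.send(request)
            reply = sock.recv(4096)
        except OSError:
            # Timeout or ICMP error: the check fails.
            return False
        # Assumption: an exact match is required.
        return reply == expected
```

For example, `udp_custom_check("10.0.0.5", 5353, b"ping", b"pong")` would pass only if the backend answers the `ping` probe with `pong` before the timeout; the address, port, and payloads here are placeholders.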

HTTP health checks

For Layer 4 (TCP or UDP) listeners, you can configure HTTP health checks, which send HEAD or GET requests to query the availability of backend servers. The following figure shows how health checks are performed.

The following process describes how HTTP health checks are performed:

NLB sends an HTTP HEAD or GET request that includes the domain name to the internal IP address and health check path of the backend server based on the health check settings of the listener.
After the backend server receives the HTTP request, the backend server returns an HTTP status code based on the server status.
If NLB does not receive a response from the backend server before the response timeout period ends, the backend server fails the health check.
If NLB receives a response from the backend server before the response timeout period ends, NLB compares the received HTTP status code with the HTTP status codes configured in the health check settings. If the returned status code matches one of the specified status codes, the backend server is declared healthy. Otherwise, the backend server is declared unhealthy.
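The HTTP check logic above can be sketched with Python's standard `http.client` module. The function name `http_health_check`, its parameters, and the default set of healthy status codes (200 to 399) are assumptions for illustration and do not reflect NLB defaults.

```python
import http.client
from typing import Collection, Optional


def http_health_check(
    host: str,
    port: int,
    path: str = "/",
    method: str = "HEAD",
    domain: Optional[str] = None,
    healthy_codes: Collection[int] = frozenset(range(200, 400)),
    timeout: float = 2.0,
) -> bool:
    """Pass if the backend answers in time with one of `healthy_codes`."""
    if method not in ("HEAD", "GET"):
        raise ValueError("method must be HEAD or GET")
    # If a domain is configured, send it as the Host header.
    # Note: without one, http.client still adds a Host header derived from
    # host:port, which differs from the NLB behavior described in this topic.
    headers = {"Host": domain} if domain else {}
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request(method, path, headers=headers)
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # No valid response before the timeout: the check fails.
        return False
    finally:
        conn.close()
    # The checker, not the backend, compares the status code.
    return status in healthy_codes
```

A call such as `http_health_check("10.0.0.5", 8080, path="/health", domain="www.example.com")` sends a HEAD request to `/health` and treats any 2xx or 3xx response as healthy; the address and domain are placeholders.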

Scenarios

TCP health checks

File Transfer Protocol (FTP) services: You can use TCP health checks to test whether an FTP service can receive and respond to connection requests. TCP health checks ensure the stability and reliability of FTP services.
Email services: You can use TCP health checks to test whether an email server accepts connections on its mail ports, which is a prerequisite for sending and receiving emails. TCP health checks help ensure the reliability of email services.
Financial transactions: For financial transaction systems, the reliability of the transaction server is a key factor. You can use TCP health checks to detect system failures in time and prevent transaction interruptions.
Remote logon: You can use TCP health checks to test the availability and performance of remote logon services. TCP health checks ensure secure and stable connections between users and remote servers.

UDP health checks

Traditional industries

DNS services: You can use UDP health checks to quickly test whether DNS servers can respond to queries as expected.
Voice over Internet Protocol (VoIP) services: You can send small UDP packets to test the key metrics, including network latency, packet loss rate, and network jitter, of VoIP services, such as Skype and VoIP phone systems. UDP health checks help improve the communication quality.
Online games: You can use UDP health checks to monitor the response time and availability of game servers. UDP health checks help improve game fluency and user experience.
Streaming media services: You can use UDP health checks to assess the availability and quality of video streaming for streaming media services, such as video conferencing and real-time video streaming. UDP health checks help improve the response efficiency and streaming stability.
Instant messaging services: You can use UDP health checks to monitor the stability and network latency of connections in real time. UDP health checks ensure fast and reliable message delivery and improve user experience.

Emerging industries

QUIC adoption in the IT industry: You can use UDP health checks in QUIC scenarios to check connection status. UDP health checks help ensure efficient, stable, and real-time data transmission.
Internet of Things (IoT) industry: You can use UDP health checks to quickly check the status of sensor devices. UDP health checks ensure low network latency and high efficiency for power-sensitive or cost-sensitive IoT devices.
Vehicle-to-Everything (V2X) industry: You can perform UDP health checks between vehicles and infrastructure to ensure real-time data exchange and quick response. UDP health checks help ensure communication stability and reliability for V2X services.
Virtual reality (VR) and augmented reality (AR) industries: You can use UDP health checks to ensure quick transmission of visual and interaction data and improve user experience.
Cloud game industry: You can use UDP health checks to monitor cloud games in real time. UDP health checks ensure low network latency and improve game fluency.

HTTP health checks

Health checks on web services: If your backend server runs an HTTP or HTTPS web service, you can use HTTP health checks to query the server status. To test whether the server can process HTTP requests, you can send HTTP GET or HEAD requests to a specified path on the server, such as /health.
Custom health checks on applications: Some applications use custom health check logic. For example, you can run custom health checks to check database connection pools and the cache status.
Microservices architectures: In a microservices architecture, each microservice may communicate by using an HTTP interface. You can use HTTP health checks to detect application-layer errors of microservice instances. The health check responses can provide more detailed diagnostic information.
API gateways and reverse proxies: If a backend server runs an API gateway or a reverse proxy, such as NGINX or HAProxy, the component exposes an HTTP interface. You can use HTTP health checks to monitor the health status of these services.

Domain names for HTTP health checks

You can specify a domain name for HTTP health checks. This setting is optional. Some application servers verify the Host header before they accept a request, so requests sent to them must carry the Host header. If you specify a domain name in the health check configuration, NLB adds the domain name to the Host header. Otherwise, NLB does not add the Host header to health check requests. Application servers that verify the Host header then reject these requests, which may cause health check failures.

If your application server verifies the Host header in requests, you must configure a domain name for health checks to ensure that the health check feature works as expected.
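The effect of the domain name setting can be illustrated without any network traffic. In the sketch below, `build_probe_request` and `strict_backend_status` are hypothetical helpers: the first builds the raw request with or without a Host header, and the second models an application server that rejects requests whose Host header is missing or does not match the domain it serves. HTTP/1.0 is used only because the Host header is optional in that version; the protocol version that NLB uses is not stated in this topic, and the 400 status code is likewise an assumption.

```python
from typing import Optional


def build_probe_request(method: str, path: str, domain: Optional[str] = None) -> bytes:
    """Build a raw health check request; Host is added only if a domain is set."""
    lines = [f"{method} {path} HTTP/1.0"]
    if domain:
        lines.append(f"Host: {domain}")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def strict_backend_status(raw_request: bytes, served_domain: str) -> int:
    """Status code returned by a hypothetical backend that verifies Host."""
    header_lines = raw_request.decode("ascii").split("\r\n")[1:]
    for line in header_lines:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            matches = value.strip().lower() == served_domain.lower()
            return 200 if matches else 400
    # No Host header at all: a strict server rejects the request.
    return 400
```

With a domain configured, `strict_backend_status(build_probe_request("HEAD", "/health", "www.example.com"), "www.example.com")` yields 200. Without one, the same backend yields 400 and the health check would fail.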

References

For more information about how to configure health checks, see Create and manage a server group.
For more information about how to troubleshoot health check issues, see Troubleshooting methods for NLB health check issues.