GWLB health check overview

Gateway Load Balancer (GWLB) performs health checks to determine the availability of backend servers. After the health check feature is enabled, if GWLB determines that a backend server is unhealthy, GWLB stops forwarding requests to the backend server and distributes subsequent requests to other backend servers. When GWLB determines that the backend server becomes healthy again, GWLB forwards traffic to that backend server again.

Health check status

The following table describes different states of health checks on backend servers.

Health check state	Description
Initializing	A GWLB instance is configured with health checks, and the backend server list is being initialized.
Healthy	The backend server is running as expected.
Unhealthy	The backend server did not respond to or failed the health check.
Idle	The backend server is not in use.
Health Check Disabled	Health checks are disabled.

How it works

Note

During a health check performed by GWLB, request messages are not encapsulated using the Geneve protocol.

TCP health check

To improve the efficiency of TCP health checks, GWLB sends customized TCP probes to test the availability of backend servers, as shown in the following figure.

How GWLB performs TCP health checks:

GWLB sends TCP-SYN packets to the internal IP address and health check port of the backend servers based on the health check settings of the listener.
If the backend server ports are alive, the backend servers return SYN-ACK packets after they receive the TCP-SYN packets.
If GWLB does not receive a SYN-ACK packet from a backend server before the response timeout period ends, the backend server fails the health check. Then, GWLB sends an RST packet to the backend server to close the TCP connection.
If GWLB receives a SYN-ACK packet from a backend server before the response timeout period ends, the backend server passes the health check. GWLB sends an ACK packet and then immediately sends an RST packet to close the TCP connection.
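
The steps above can be approximated from an ordinary client for testing purposes. The following is a minimal sketch, not GWLB's actual probe: the function name tcp_probe is hypothetical, and the SO_LINGER trick for closing with RST assumes a Linux-style socket API. A standard socket completes the full handshake and then resets the connection, which mirrors the SYN, SYN-ACK, ACK, RST sequence described above.

```python
import socket
import struct


def tcp_probe(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP handshake with host:port completes within timeout.

    Hypothetical helper for illustration; GWLB itself sends customized probes.
    """
    try:
        # create_connection sends SYN and waits for SYN-ACK, then sends ACK.
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers both a refused connection and an expired response timeout.
        return False
    # Assumption: Linux-style linger struct (two ints). With linger enabled and
    # a zero linger time, close() resets the connection (RST) instead of FIN.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    sock.close()
    return True
```

Running tcp_probe against the internal IP address and health check port of a backend server shows whether the port would answer a probe of this kind.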

HTTP health check

HTTP health checks obtain status information through GET probes, as shown in the following figure.

The HTTP health check mechanism is as follows:

Servers in the GWLB cluster send an HTTP GET request to the backend server based on the configured health check settings. The request targets the private IP address of the backend server, the [health check port], and the [check path], and carries the configured [domain name], if any.
After receiving the request, the backend server returns an HTTP status code based on the operation status of the corresponding service.
If the servers in the GWLB cluster do not receive a response from the backend server within the [response timeout period], the service is considered unresponsive and the health check fails.
If the servers in the GWLB cluster receive a response from the backend server within the [response timeout period], the returned status code is compared with the configured status code. If it matches, the health check succeeds; otherwise, it fails.
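
The mechanism above can be sketched with the Python standard library. This is an illustrative approximation, not GWLB's implementation: the function name http_probe and its parameter names are hypothetical, and the default healthy status code of 200 is an assumption.

```python
import http.client


def http_probe(ip, port, path="/", domain=None, healthy_codes=(200,), timeout=5.0):
    """Send GET to ip:port/path and compare the status with healthy_codes.

    Hypothetical helper; parameter names and defaults are assumptions.
    """
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        # If a domain is configured, send it in the Host header.
        # Note: when no Host header is supplied, http.client fills one in
        # from ip:port, which differs from the GWLB behavior described here.
        headers = {"Host": domain} if domain else {}
        conn.request("GET", path, headers=headers)
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # No valid response within the timeout: the check fails.
        return False
    finally:
        conn.close()
    # The check succeeds only if the status matches a configured code.
    return status in healthy_codes
```

For example, http_probe("192.168.0.10", 80, "/health", domain="www.example.com") would probe a backend server at a placeholder address and path.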

Health check time window

The health check feature improves the availability of your services. However, frequent failovers caused by unhealthy backend servers may affect system availability. Health check time windows are introduced to control failovers. A failover is performed only when a backend server consecutively passes or fails a certain number of health checks within a time window. The health check time window is determined by the following factors:

Health check interval: the time between two health checks.
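Response timeout: the maximum time that GWLB waits for a backend server to respond to a health check.
Health check threshold: the number of consecutive health checks that a backend server must pass (healthy threshold) or fail (unhealthy threshold) before its health state changes.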

The health check time window is calculated based on the following formula:

Time window for health check failures = Response timeout × Unhealthy threshold + Health check interval × (Unhealthy threshold - 1)
Time window for health check successes = Response time of a successful health check × Healthy threshold + Health check interval × (Healthy threshold - 1)
Note
The response time of a successful health check is the duration from the time when the health check request is sent to the time when the response is received. When TCP health checks are configured, the response time is short and almost negligible because the only check item is whether the probed port is alive. When HTTP health checks are configured, the response time depends on the performance and load of the application server and is typically within a few seconds.
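
The two formulas translate directly into code. The following sketch uses hypothetical function and parameter names; all durations are in seconds.

```python
def failure_window(response_timeout, interval, unhealthy_threshold):
    """Time needed to declare a non-responding backend server unhealthy."""
    return response_timeout * unhealthy_threshold + interval * (unhealthy_threshold - 1)


def success_window(response_time, interval, healthy_threshold):
    """Time needed to declare a responding backend server healthy."""
    return response_time * healthy_threshold + interval * (healthy_threshold - 1)


# A 5-second timeout, 2-second interval, and threshold of 3:
print(failure_window(5, 2, 3))  # 19
# A 1-second response time, 2-second interval, and threshold of 3:
print(success_window(1, 2, 3))  # 7
```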

The health check result has the following impacts on request forwarding:

If a backend server fails health checks, new requests are distributed to other backend servers. GWLB remains accessible to clients.
If a backend server passes health checks, new requests are distributed to the backend server. GWLB remains accessible to clients.
If a backend server encounters an error and fails a health check but has not yet reached the unhealthy threshold, requests are still distributed to the backend server, and those requests fail. By default, a backend server is declared unhealthy after it fails three consecutive health checks.
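
The threshold behavior described above can be modeled as a small state tracker. This is a sketch under stated assumptions, not GWLB's internal logic: the class name HealthTracker is hypothetical, the server is assumed to start in the healthy state, and the healthy threshold of 3 is an assumed default (the text above only states the default of three for failures).

```python
class HealthTracker:
    """Flip the health state only after enough consecutive contrary results."""

    def __init__(self, healthy_threshold=3, unhealthy_threshold=3, healthy=True):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = healthy  # assumption: servers start out healthy
        self.streak = 0         # consecutive results that contradict the state

    def record(self, passed: bool) -> bool:
        """Record one health check result and return the current state."""
        if passed == self.healthy:
            # The result agrees with the current state: reset the streak.
            self.streak = 0
        else:
            self.streak += 1
            threshold = (
                self.unhealthy_threshold if self.healthy else self.healthy_threshold
            )
            if self.streak >= threshold:
                self.healthy = not self.healthy
                self.streak = 0
        return self.healthy
```

With the default thresholds, a healthy server that fails two checks still receives traffic; only the third consecutive failure changes its state.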

Examples of health check response timeout and health check interval

In this example, the following health check settings are used:

Response timeout period: 5 seconds
Health check interval: 2 seconds
Healthy threshold: 3 times
Unhealthy threshold: 3 times

Time window for health check failures = Response timeout × Unhealthy threshold + Health check interval × (Unhealthy threshold - 1). In this example, the time window is 19 seconds based on the formula 5 × 3 + 2 × (3 - 1). If the backend server does not respond for 19 seconds, the backend server is declared unhealthy.

The following figure shows the time window from a healthy status to an unhealthy status.

Time window for health check successes = Response time of a successful health check × Healthy threshold + Health check interval × (Healthy threshold - 1). In this example, assuming that each successful health check takes 1 second to respond, the time window is 7 seconds based on the formula 1 × 3 + 2 × (3 - 1). If the backend server passes three consecutive health checks, the backend server is declared healthy after 7 seconds.

The following figure shows the time window from an unhealthy status to a healthy status. In the following figure, the time that is required for the server to respond to a health check request is 1 second.
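
The time windows in this example can also be traced check by check. The sketch below uses a hypothetical function name and assumes that the health check interval is counted from the end of one check to the start of the next, which is what the formulas imply.

```python
def probe_timeline(check_duration, interval, threshold):
    """Return (start, end) times in seconds for each consecutive check.

    check_duration is the response timeout for failed checks, or the
    response time for successful checks.
    """
    events = []
    t = 0
    for _ in range(threshold):
        start = t
        end = start + check_duration
        events.append((start, end))
        # Assumption: the next check starts one interval after this one ends.
        t = end + interval
    return events


# Failures: each check times out after 5 seconds, interval of 2 seconds.
print(probe_timeline(5, 2, 3))  # [(0, 5), (7, 12), (14, 19)]
# Successes: each check responds in 1 second, interval of 2 seconds.
print(probe_timeline(1, 2, 3))  # [(0, 1), (3, 4), (6, 7)]
```

The end time of the last check in each list matches the 19-second and 7-second windows calculated above.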

Domain names for HTTP health checks

You can specify a domain name for HTTP health checks. This setting is optional. Some application servers verify the Host header in requests before they accept the requests. In this case, the request must carry the Host header. If a domain name is configured for health checks, GWLB inserts the domain name into the Host header. Otherwise, health check requests do not carry the Host header. An application server that verifies the Host header rejects such requests, and the server may be declared unhealthy.

If your application server verifies the Host header in requests, you must configure a domain name for health checks to ensure that the health check feature works as expected.
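
To illustrate why a missing or unexpected Host header causes health checks to fail, the following sketch models a server-side Host check. It is purely hypothetical: the function name, the allowed host list, and the 400 and 403 status codes are assumptions, and real application servers implement this check in their own ways.

```python
def host_check_status(headers, allowed_hosts):
    """Return the HTTP status a Host-verifying server might send.

    headers is a dict of request headers keyed by exact name (for example,
    "Host"); allowed_hosts is a set of lowercase host names.
    """
    host = headers.get("Host")
    if host is None:
        return 400  # assumption: reject requests without a Host header
    # Drop an optional ":port" suffix before comparing (IPv6 is not handled).
    hostname = host.split(":")[0].lower()
    if hostname not in allowed_hosts:
        return 403  # assumption: reject unknown hosts
    return 200


allowed = {"www.example.com"}
print(host_check_status({}, allowed))                           # 400
print(host_check_status({"Host": "www.example.com"}, allowed))  # 200
```

A health check that expects status code 200 fails against such a server unless the configured domain name matches one of the hosts that the server accepts.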

References

For more information, see Configure and manage health checks.