After a backend server is removed or declared unhealthy, existing connections to the server remain open for a period of time, and requests may still be forwarded to it. As a result, requests may fail, or services on the backend server cannot be taken offline gracefully. To prevent such errors, you can enable connection draining for your Application Load Balancer (ALB) instance. When a backend server is removed or declared unhealthy, data transmission on existing connections continues until the connection draining timeout period ends. When the timeout period ends, ALB proactively closes the remaining connections. This ensures graceful undeployment of services.
Scenarios
Connection draining is ideal for the following scenarios:
Removing backend servers: Before you remove a backend server, we recommend that you specify a long connection draining timeout period to ensure that the server has time to process existing requests.
Unhealthy backend servers: We recommend that you specify a short connection draining timeout period to ensure that faulty connections are closed before errors are returned to clients.
To use connection draining in the preceding scenarios, you must specify a proper connection draining timeout period based on your business requirements.
Scenario 1: Removing backend servers
The following figure shows the scenario in this example. After you remove ECS01, ALB no longer distributes requests to ECS01. ECS01 processes only existing requests and does not accept new requests.
If you disable connection draining, ECS01 closes the sessions when all existing requests are processed.
If you enable connection draining and specify a connection draining timeout period:
If ECS01 has in-progress requests, ALB closes sessions on ECS01 when the connection draining timeout period ends.
If ECS01 has no in-progress requests or active connections, ALB removes ECS01 immediately, even if the connection draining timeout period has not ended.
If ECS01 is still transmitting responses when the connection draining timeout period ends, the connections are closed and clients receive the HTTP 500 status code. For example, if you set the connection draining timeout period to 15 seconds but ECS01 requires 30 seconds to process a request, the connection is closed before ECS01 sends the complete response, and the client receives the HTTP 500 status code.
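The removal rules above can be summarized as a small model. The following Python sketch is an illustration, not ALB's implementation: the function name drain_on_removal is hypothetical, each in-flight request is described only by the seconds it still needs after removal, and each connection is assumed to close as soon as its request completes.

```python
def drain_on_removal(remaining_s, drain_timeout_s):
    """Simplified model of removing a backend server with connection draining.

    remaining_s: seconds that each in-flight request still needs after removal.
    drain_timeout_s: the connection draining timeout period, in seconds.
    Returns (seconds until the server is fully removed, requests that are cut off).
    Assumes each connection closes as soon as its request completes.
    """
    if not remaining_s:
        # No in-progress requests or active connections: removed immediately.
        return 0, []
    # Requests that need more time than the timeout allows are cut off when
    # the timeout ends; their clients receive HTTP 500.
    cut_off = [r for r in remaining_s if r > drain_timeout_s]
    # Removal completes when the last request finishes or the timeout ends,
    # whichever comes first.
    return min(max(remaining_s), drain_timeout_s), cut_off


if __name__ == "__main__":
    print(drain_on_removal([30], 15))    # (15, [30]): the example in the text
    print(drain_on_removal([5, 8], 15))  # (8, []): every request finishes
    print(drain_on_removal([], 15))      # (0, []): removed immediately
```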
Note: If ECS01 is removed and then re-added, existing sessions are not affected until the connection draining timeout period ends. The connection draining timeout period starts when ECS01 is removed. The status of ECS01 remains unchanged until the existing sessions are closed. During this period, ECS01 processes only existing requests and does not accept new requests. The existing sessions are closed when the connection draining timeout period ends.
The following figure shows the status changes of ECS01 after you enable connection draining, specify a connection draining timeout period, and remove ECS01.
Scenario 2: Unhealthy backend servers
If ECS01 is declared unhealthy by health checks, ALB no longer distributes requests to ECS01. ECS01 can still process existing requests but does not accept new requests.
If you disable connection draining, ECS01 does not accept new requests until ECS01 is declared healthy again.
If you enable connection draining and specify a connection draining timeout period:
ALB closes existing sessions on ECS01 when the connection draining timeout period ends.
If the server group is updated, for example, if the configuration of ECS01 is modified, the connection status of ECS01 remains unchanged: ECS01 can process existing requests but does not accept new requests. ALB still closes the existing sessions on ECS01 when the connection draining timeout period ends, even if ECS01 is declared healthy.
Note: ALB closes existing sessions on ECS01 when the connection draining timeout period ends. If ECS01 is healthy, ECS01 can then accept new requests. If ECS01 is unhealthy, ECS01 does not accept new requests.
If a backend server is declared unhealthy during a configuration update, connection draining is not triggered. Connection draining is triggered only when a backend server is declared unhealthy due to service errors.
The following figure shows the status changes of ECS01 when it is declared unhealthy.
Specify an appropriate connection draining timeout period based on your business requirements. The following example uses Scenario 1 (removing backend servers) to demonstrate how connection draining works for WebSocket and HTTP sessions.
Precautions
Only standard and WAF-enabled ALB instances support connection draining. Basic ALB instances do not support connection draining.
Server groups of the Function Compute type do not support connection draining.
We recommend that you use connection draining with the WebSocket protocol. In HTTP scenarios, HTTP requests may time out or be limited. To prevent such issues, we recommend that you set the connection draining timeout period to a value larger than the connection request timeout period of the ALB listener. By default, the connection draining timeout period is longer than the connection request timeout period, which prevents HTTP requests from being closed by mistake. For more information about how to configure the connection request timeout period, see Add an HTTP listener.
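The HTTP recommendation above can be expressed as a pre-flight check to run before you change either setting. This is a hypothetical helper (check_http_draining is not part of any Alibaba Cloud SDK): you supply the connection draining timeout of the server group, the connection request timeout of the listener, and your own estimate of the longest backend processing time.

```python
def check_http_draining(drain_timeout_s, request_timeout_s, max_processing_s):
    """Return warnings for a draining/request timeout pair (hypothetical helper)."""
    warnings = []
    # Recommendation in this topic: draining timeout > connection request timeout.
    if drain_timeout_s <= request_timeout_s:
        warnings.append(
            f"connection draining timeout ({drain_timeout_s}s) is not larger than "
            f"the connection request timeout ({request_timeout_s}s)"
        )
    # Requests that outlive the draining timeout are interrupted (HTTP 500).
    if drain_timeout_s < max_processing_s:
        warnings.append(
            f"requests that take {max_processing_s}s may be interrupted after "
            f"{drain_timeout_s}s of draining"
        )
    return warnings


if __name__ == "__main__":
    print(check_http_draining(300, 60, 30))    # []
    for w in check_http_draining(15, 60, 30):  # the HTTP example in this topic
        print("warning:", w)
```

The HTTP example later in this topic deliberately violates both checks so that the HTTP 500 status code can be observed.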
Prerequisites
A standard or WAF-enabled ALB instance is created, and a server group of the server type is created for the ALB instance. In this example, a standard ALB instance is used. For more information, see Create an ALB instance and Create and manage server groups.
An HTTP listener that uses port 80 is created for the ALB instance, and the listener is associated with the server group. For more information, see Add an HTTP listener.
ECS01 and ECS02 are created. For more information about how to create an instance, see Create an instance by using the wizard.
ECS02 is added to the server group, and services on ECS02 are accessible from clients. For more information, see Use an ALB instance to provide IPv4 services and Use ALB to balance loads for IPv6 services.
Note: In this example, a client that runs the 64-bit Alibaba Cloud Linux 3.2104 operating system is used. Make sure that Python is installed on the operating system of your client and on ECS01. For more information about how to install Python, see Download Python. In this example, Python 3.x is used.
In this example, services run on ECS02. If you have a backend server that can run services, you do not need to create ECS02.
Procedure
This procedure demonstrates how to use connection draining to drain WebSocket and HTTP sessions and how requests are processed by ALB in different connection draining states.
Connection draining for WebSocket sessions
Step 1: Enable connection draining
In this example, a server group is already prepared. Connection draining is configured by modifying the server group. If you do not have a server group, you can enable connection draining when you create a server group.
Log on to the ALB console.
In the top navigation bar, select the region where the server group is deployed.
In the left-side navigation pane, go to the Server Groups page and click the ID of the server group that you want to manage.
On the Details tab, click Modify Basic Information in the Basic Information section.
In the Modify Basic Information dialog box, click Advanced Settings and turn on Connection Draining.
Set Connection Draining Timeout to 300 seconds and click Save.
Step 2: Verify the result
Configure the WebSocket service on the backend server
Remotely log on to ECS01. For more information, see Methods for connecting to an ECS instance.
Run the following commands to create a WebSocket directory and switch to it:
mkdir WebSocket
cd WebSocket
Run the following commands to install the dependency packages:
pip install tornado
pip install websocket-client
Run the following command to create and edit the server.py file:
vim server.py
Press the I key to enter insert mode and add the following code, which implements a WebSocket service:
#!/usr/bin/env python3
# encoding=utf-8
import tornado.websocket
import tornado.ioloop
import tornado.web
from datetime import datetime


# The WebSocket handler: logs connection events and echoes each message
class WebSocketHandler(tornado.websocket.WebSocketHandler):
    def open(self):
        current_time = datetime.now()
        formatted_time = current_time.strftime("%Y-%m-%d %H:%M:%S")
        print("Time:", formatted_time, "The WebSocket connection is established")

    def on_message(self, message):
        current_time = datetime.now()
        formatted_time = current_time.strftime("%Y-%m-%d %H:%M:%S")
        print("Time:", formatted_time, "Received message:", message)
        self.write_message("The server received your message: " + message)

    def on_close(self):
        current_time = datetime.now()
        formatted_time = current_time.strftime("%Y-%m-%d %H:%M:%S")
        print("Time:", formatted_time, "The WebSocket connection is closed")


# Routes
application = tornado.web.Application([
    (r"/websocket", WebSocketHandler),
])

if __name__ == "__main__":
    print("WebSocket Server Start on 8080 ...")
    application.listen(8080)
    tornado.ioloop.IOLoop.current().start()
After you add the code, press the Esc key, enter :wq, and then press the Enter key to save and close the file.
In the directory of the server.py file, run the following command to start the WebSocket service:
python3 server.py
The following output indicates that the WebSocket service is running:
WebSocket Server Start on 8080 ...
Add ECS01 to the server group
Log on to the ALB console.
In the top navigation bar, select the region where the server group is deployed.
In the left-side navigation pane, go to the Server Groups page, find the server group that you want to manage, and click Modify Backend Server in the Actions column.
On the Backend Servers tab, click Add Backend Server, select ECS01 in the Select Servers step, and then click Next.
In the Ports/Weights step, select ECS01, set the port to 8080, and then click OK.
Configure the WebSocket client
Log on to the client and open the command-line interface (CLI). Run the following commands to create a WebSocket directory and switch to it:
mkdir WebSocket
cd WebSocket
Run the following command to install dependency packages:
pip install websocket-client
Run the following command to create and edit the client.py file:
vim client.py
Press the I key to enter insert mode and add the following code, which implements a WebSocket client that sends a message every second:
#!/usr/bin/env python3
# encoding=utf-8
import websocket
import time
from datetime import datetime


def on_message(ws, message):
    print("Received server message:", message)


if __name__ == "__main__":
    ws = websocket.WebSocket()
    ws.connect("ws://<Domain name>:80/websocket")  # Replace <Domain name> with your actual domain name
    print("The WebSocket connection is established")
    try:
        while True:
            current_time = datetime.now()
            formatted_time = current_time.strftime("%Y-%m-%d %H:%M:%S")
            print("Packet time:", formatted_time)
            ws.send("Hello, Server!")
            result = ws.recv()
            on_message(ws, result)
            time.sleep(1)
    except Exception:
        # Raised when ALB closes the connection, for example after draining ends
        print("The WebSocket connection is closed")
After you add the code, press the Esc key, enter :wq, and then press the Enter key to save and close the file.
In the directory of the client.py file, run the following command to access ECS01 through the ALB instance:
python3 client.py
The following output indicates that the client can access ECS01:
The WebSocket connection is established
Packet time: 2024-04-28 17:00:53
Received server message: The server received your message: Hello, Server!
Packet time: 2024-04-28 17:00:54
Received server message: The server received your message: Hello, Server!
ECS01 prints the following output:
WebSocket Server Start on 8080 ...
Time: 2024-04-28 17:00:53 The WebSocket connection is established
Time: 2024-04-28 17:00:53 Received message: Hello, Server!
Time: 2024-04-28 17:00:54 Received message: Hello, Server!
Remove backend servers
Specify a connection draining timeout period before you remove a backend server.
Log on to the ALB console.
In the top navigation bar, select the region where the server group is deployed.
In the left-side navigation pane, go to the Server Groups page and click the ID of the server group from which you want to remove backend servers.
On the Backend Servers tab, find ECS01 and click Remove in the Actions column.
In the Remove Backend Server message, click OK.
Wait for the connection draining timeout period to end
In this example, the connection draining timeout period is set to 300 seconds. As a result, the connections on ECS01 are closed 300 seconds after you remove ECS01.
In the test results, the WebSocket connection on ECS01 lasted 330 seconds, from when it was established to when it was closed. The connection draining timeout period covers only the time from when ECS01 was removed to when the WebSocket connection was closed, which is about 300 seconds. ECS01 was therefore removed about 30 seconds after the connection was established.
The client prints the following output:
Packet time: 2024-04-28 17:06:23
Received server message: The server received your message: Hello, Server!
Packet time: 2024-04-28 17:06:24
The WebSocket connection is closed
ECS01 prints the following output:
Time: 2024-04-28 17:06:22 Received message: Hello, Server!
Time: 2024-04-28 17:06:23 Received message: Hello, Server!
Time: 2024-04-28 17:06:23 The WebSocket connection is closed
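The 330-second and 300-second figures can be checked against the ECS01 log timestamps. The following sketch computes the lifetime of the WebSocket connection from the "established" and "closed" log entries and, assuming that the connection was closed exactly when the connection draining timeout period ended, estimates how long after the connection was established ECS01 was removed.

```python
from datetime import datetime

LOG_FORMAT = "%Y-%m-%d %H:%M:%S"  # the strftime format used by server.py


def seconds_between(start, end):
    """Return the whole seconds between two timestamps in LOG_FORMAT."""
    delta = datetime.strptime(end, LOG_FORMAT) - datetime.strptime(start, LOG_FORMAT)
    return int(delta.total_seconds())


established = "2024-04-28 17:00:53"  # ECS01: The WebSocket connection is established
closed = "2024-04-28 17:06:23"       # ECS01: The WebSocket connection is closed
drain_timeout_s = 300                # Connection Draining Timeout set in Step 1

lifetime_s = seconds_between(established, closed)
print("Connection lifetime:", lifetime_s, "seconds")  # 330
# Assumption: the connection was closed exactly when draining ended.
print("ECS01 removed about", lifetime_s - drain_timeout_s, "seconds after connect")  # 30
```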
Connection draining for HTTP sessions
In HTTP scenarios, the response that clients receive depends on the connection draining timeout period, the connection request timeout period, and the backend server processing time.
If the connection draining timeout period is shorter than the backend server processing time, responses from ECS01 are interrupted. As a result, clients receive the HTTP 500 status code.
If the backend server processing time is longer than the connection request timeout period, responses from ECS01 time out. As a result, clients receive the HTTP 504 status code.
In this example, the connection draining timeout period is set to 15 seconds and the backend server processing time is 30 seconds. Therefore, responses from ECS01 are interrupted and clients receive the HTTP 500 status code.
In Step 2, the connection request timeout period is set to the default value of 60 seconds, which is longer than the backend server processing time (30 seconds). As a result, the HTTP 504 status code is not returned. Instead, the HTTP 500 status code is returned because the connection draining timeout period (15 seconds) is shorter than the backend server processing time (30 seconds).
In this example, the time.sleep function is used in the Python code to simulate the backend server processing time.
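The rules above can be condensed into a rough predictor of the status code that the client receives. This is a simplified sketch under stated assumptions: expected_status is a hypothetical name, the backend server is assumed to be removed at the moment the request arrives, and when both limits are exceeded, the shorter timer is assumed to decide. This topic does not specify that tie-break.

```python
def expected_status(drain_timeout_s, request_timeout_s, processing_s):
    """Rough prediction of the status code a client receives (sketch).

    Assumes the backend server is removed the moment the request arrives, and
    that when both limits are exceeded, the shorter timer decides.
    """
    if processing_s <= min(drain_timeout_s, request_timeout_s):
        return 200  # The backend finishes in time (a normal GET returns 200 here).
    if drain_timeout_s < request_timeout_s:
        return 500  # Draining ends first and interrupts the response.
    return 504      # The connection request timeout fires first.


if __name__ == "__main__":
    print(expected_status(15, 60, 30))   # 500: the values in this example
    print(expected_status(300, 60, 30))  # 200: draining outlasts processing
    print(expected_status(300, 60, 90))  # 504: processing exceeds the request timeout
```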
Step 1: Enable connection draining
In this example, a server group is already prepared. Connection draining is configured by modifying the server group. If you do not have a server group, you can enable connection draining when you create a server group.
Log on to the ALB console.
In the top navigation bar, select the region where the server group is deployed.
In the left-side navigation pane, go to the Server Groups page and click the ID of the server group that you want to manage.
On the Details tab, click Modify Basic Information in the Basic Information section.
In the Modify Basic Information dialog box, click Advanced Settings and turn on Connection Draining.
Set Timeout Period to 15 seconds and click Save.
Step 2: Specify a connection request timeout period
Log on to the ALB console.
In the top navigation bar, select the region where the ALB instance is deployed.
On the Instances page, click the ID of the ALB instance that you want to manage.
On the Listener tab, click the ID of the HTTP listener that you want to manage.
In the Basic Information section, click Modify Listener.
In the Modify Listener dialog box, click Modify on the right side of Advanced Settings.
Set Connection Request Timeout to 60 seconds, which is the default timeout period, and click Save.
Step 3: Create a DNS record
In actual business scenarios, we recommend that you use CNAME records to map custom domain names to the domain name of your ALB instance.
In the left-side navigation pane of the ALB console, go to the Instances page and copy the domain name of the ALB instance.
Perform the following steps to create a CNAME record:
Note: If your domain name is not registered by using Alibaba Cloud Domains, you must add your domain name to Alibaba Cloud DNS before you can configure a DNS record. For more information, see Manage domain names.
Log on to the Alibaba Cloud DNS console.
On the Authoritative DNS Resolution page, find your domain name and click DNS Settings in the Actions column.
On the DNS Settings tab of the domain name details page, click Add DNS Record.
In the Add DNS Record panel, configure the parameters and click OK. The following table describes the parameters.
Parameter | Description
Record Type | Select CNAME from the drop-down list.
Hostname | Enter the prefix of the domain name. In this example, @ is entered. Note: If you use a root domain name, enter @.
DNS Request Source | Select Default.
Record Value | Enter the CNAME, which is the domain name of the ALB instance.
TTL | Select a time-to-live (TTL) value for the CNAME record to be cached on the DNS server. In this example, the default value is used.
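After the record takes effect, you can check from the client that your domain name resolves through the domain name of the ALB instance. The following sketch uses Python's socket.gethostbyname_ex, which returns the canonical name, an alias list, and the IP addresses. Where the CNAME chain appears (canonical name or alias list) depends on the system resolver, so both are checked. The function name cname_points_to and the script name check_dns.py are hypothetical.

```python
import socket
import sys


def cname_points_to(hostname, alb_domain, resolver=socket.gethostbyname_ex):
    """Return True if alb_domain appears in the resolution chain of hostname.

    resolver must behave like socket.gethostbyname_ex and return a tuple
    (canonical_name, alias_list, ip_address_list).
    """
    canonical, aliases, _addresses = resolver(hostname)
    # Compare case-insensitively and ignore a trailing dot.
    names = {name.rstrip(".").lower() for name in [canonical, *aliases]}
    return alb_domain.rstrip(".").lower() in names


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 check_dns.py <your domain name> <ALB domain name>
    print(cname_points_to(sys.argv[1], sys.argv[2]))
```

The resolver parameter exists so the comparison logic can be exercised without DNS access; in normal use, leave it at its default.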
Step 4: Verify the result
Configure the HTTP service on the backend server
Remotely log on to ECS01. For more information, see Methods for connecting to an ECS instance.
Run the following commands to create an http directory and switch to it:
mkdir http
cd http
Run the following command to create and edit the http_server.py file:
vim http_server.py
Press the I key to enter insert mode and add the following code, which implements an HTTP server that delays each GET response by 30 seconds:
#!/usr/bin/env python3
# encoding=utf-8
from http.server import SimpleHTTPRequestHandler, HTTPServer
from datetime import datetime
import time


class DelayedHTTPRequestHandler(SimpleHTTPRequestHandler):
    def do_GET(self):
        current_time = datetime.now()
        formatted_time = current_time.strftime("%Y-%m-%d %H:%M:%S")
        print("Time:", formatted_time, "Received the GET request and will respond in 30 seconds....")
        time.sleep(30)  # Simulate the backend server processing time
        SimpleHTTPRequestHandler.do_GET(self)


PORT = 8080
server = HTTPServer(("", PORT), DelayedHTTPRequestHandler)
print(f"Serving HTTP on 0.0.0.0 port {PORT} (http://0.0.0.0:{PORT}/) ...")
server.serve_forever()
After you add the code, press the Esc key, enter :wq, and then press the Enter key to save and close the file.
Enter the directory of http_server.py and run the following command to start the HTTP Server service:
python3 http_server.py
The following output indicates that the HTTP server is running:
Serving HTTP on 0.0.0.0 port 8080 (http://0.0.0.0:8080/) ...
Add ECS01 to a server group
Log on to the ALB console.
In the top navigation bar, select the region where the server group is deployed.
In the left-side navigation pane, go to the Server Groups page, find the server group that you want to manage, and click Modify Backend Server in the Actions column.
On the Backend Servers tab, click Add Backend Server, select ECS01 in the Select Servers step, and then click Next.
In the Ports/Weights step, select ECS01, set the port to 8080, and then click OK.
Access the backend service from the client
Log on to the client and open the command-line interface (CLI). Run the following command to access ECS01 through the ALB instance:
curl http://<Domain name>:80/ -v
The following output indicates that the ALB instance can access the backend service:
* About to connect() to www.example.com port 80 (#0)
*   Trying 10.X.X.225...
* Connected to www.example.com (10.X.X.225) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: www.example.com
> Accept: */*
The backend server prints the following output:
Serving HTTP on 0.0.0.0 port 8080 (http://0.0.0.0:8080/) ...
Time: 2024-02-07 13:57:33 Received the GET request and will respond in 30 seconds....
Remove backend servers
Specify a connection draining timeout period before you remove a backend server.
Log on to the ALB console.
In the top navigation bar, select the region where the server group is deployed.
In the left-side navigation pane, go to the Server Groups page and click the ID of the server group from which you want to remove backend servers.
On the Backend Servers tab, find ECS01 and click Remove in the Actions column.
In the Remove Backend Server message, click OK.
Wait for the connection draining timeout period to end
The result shows that when the connection draining timeout period is shorter than the backend server processing time, clients receive the HTTP 500 status code.
* About to connect() to www.example.com port 80 (#0)
* Trying 10.X.X.224...
* Connected to www.example.com (10.XX.XX.224) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: www.example.com
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Date: Wed, 07 Feb 2024 06:02:24 GMT
< Content-Type: text/html
< Content-Length: 186
< Connection: close
< Via: HTTP/1.1 SLB.87
<
<html>
<head><title>500 Internal Server Error</title></head>
<body bgcolor="white">
<center><h1>500 Internal Server Error</h1></center>
<hr><center>nginx</center>
</body>
</html>
* Closing connection 0
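curl -v shows the status code but not how long the client waited for it. To record both, the following sketch sends a single GET request with Python's standard urllib module and reports the status code and elapsed time. urllib raises HTTPError for status codes such as 500 and 504, so the sketch reads the code from the exception. The function name probe and the script name probe.py are only examples.

```python
import sys
import time
import urllib.error
import urllib.request


def probe(url, timeout_s=120):
    """Send one GET request and return (status code or None, elapsed seconds).

    HTTP error responses such as 500 or 504 raise HTTPError in urllib; the
    status code is read from the exception. Connection-level failures
    (refused, unreachable, socket timeout) return None as the status.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:  # must come before the OSError clause
        status = err.code
    except OSError:  # URLError and socket timeouts are OSError subclasses
        status = None
    return status, round(time.monotonic() - start, 1)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 probe.py http://<Domain name>:80/
    code, elapsed = probe(sys.argv[1])
    print(f"status={code} elapsed={elapsed}s")
```

Run it before you remove ECS01, as with the curl command, and compare the elapsed time with the connection draining timeout period and the 30-second processing time.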
References
For more information about how to enable connection draining when you create a server group, see Create and manage server groups.
To implement graceful deployment of services, enable the slow start mode. For more information, see Use slow starts to implement graceful deployment of services.
For more information about WebSocket and HTTP, see Add an HTTP listener and Use WebSocket to enable real-time messaging.