
Server Load Balancer: Use connection draining to implement graceful undeployment of services

Last Updated: Sep 26, 2024

After a backend server is removed or declared unhealthy, existing connections to it remain open for a period of time, and requests can still be forwarded to it. This can cause request errors or prevent services on the backend server from being undeployed. To prevent these problems, enable connection draining for your Application Load Balancer (ALB) instance. With connection draining enabled, data transmission on existing connections continues after a backend server is removed or declared unhealthy until the connection draining timeout period ends. At that point, ALB proactively closes the connections. This ensures graceful undeployment of services.

Scenarios

Connection draining is ideal for the following scenarios:

  • Removing backend servers: Before you remove a backend server, we recommend that you specify a long connection draining timeout period to ensure that the server has time to process existing requests.

  • Unhealthy backend servers: We recommend that you specify a short connection draining timeout period to ensure that faulty connections can be closed before errors are returned to clients.

To use connection draining in the preceding scenarios, you must specify a proper connection draining timeout period based on your business requirements.

Scenario 1: Removing backend servers

The following figure shows the scenario in this example. When you remove ECS01, ALB no longer distributes requests to ECS01. ECS01 processes only existing requests and does not accept new requests.

  • If you disable connection draining, ECS01 closes the sessions when all existing requests are processed.

  • If you enable connection draining and specify a connection draining timeout period:

    • If ECS01 has in-progress requests, ALB closes sessions on ECS01 when the connection draining timeout period ends.

    • If ECS01 does not have in-progress requests or active connections, ALB immediately removes ECS01, regardless of whether the connection draining timeout period ends.

    • If ECS01 still has requests in transit when the connection draining timeout period ends, the connections are closed and the 500 error code is returned to the clients. For example, if you set the connection draining timeout period to 15 seconds but ECS01 requires 30 seconds to process the requests, the connections are closed before ECS01 sends all responses. As a result, the clients receive the 500 error code.

    Note

    If ECS01 is removed and then re-added, existing sessions are not affected until the connection draining timeout period ends. The connection draining timeout period starts when ECS01 is removed. Until the existing sessions are closed, the status of ECS01 remains unchanged: ECS01 only processes existing requests and does not accept new requests. The existing sessions are closed when the connection draining timeout period ends.

image

Remove ECS01, enable connection draining, and specify a connection draining timeout period. The following figure shows the status changes of ECS01.

image
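The Scenario 1 rules above can be summarized in a small decision function. This is only an illustrative sketch: the function name `removal_outcome` and its parameters are hypothetical, and all durations are assumed to be in seconds measured from the moment ECS01 is removed.

```python
def removal_outcome(draining_enabled, draining_timeout,
                    remaining_processing, has_active_connections=True):
    """Describe what happens to existing sessions after a backend server is removed.

    All names here are illustrative; durations are in seconds, counted
    from the moment the server is removed from the server group.
    """
    if not draining_enabled:
        # Without connection draining, the server closes sessions itself
        # once all existing requests are processed.
        return "server closes sessions after existing requests finish"
    if not has_active_connections:
        # Nothing to drain, so the timeout period is not waited out.
        return "server removed immediately"
    if remaining_processing > draining_timeout:
        # The timeout ends before the responses are complete.
        return "connections closed at timeout, clients receive 500"
    return "requests finish, sessions closed at timeout"


# The example from the text: 15-second timeout, 30 seconds of processing.
print(removal_outcome(True, 15, 30))
```

Under these assumptions, the 15-second and 30-second values from the example fall into the branch where clients receive the 500 error code.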

Scenario 2: Unhealthy backend servers

If ECS01 is declared unhealthy by health checks, ALB no longer distributes requests to ECS01. In this case, ECS01 can process existing requests but no longer accepts new requests.

  • If you disable connection draining, ECS01 does not accept new requests until ECS01 is declared healthy again.

  • If you enable connection draining and specify a connection draining timeout period:

    • ALB closes existing sessions on ECS01 when the connection draining timeout period ends.

    • If the server group is updated during this period, for example when the configuration of ECS01 is changed, the connection status of ECS01 remains unchanged. ECS01 can process existing requests but does not accept new requests. ALB still closes existing sessions on ECS01 when the connection draining timeout period ends, even if ECS01 is declared healthy.

    Note
    • ALB closes existing sessions on ECS01 when the connection draining timeout period ends. If ECS01 is healthy, ECS01 can accept new requests. If ECS01 is unhealthy, ECS01 does not accept new requests.

    • If a backend server is declared unhealthy during a configuration update, connection draining is not triggered. Connection draining is triggered only when a backend server is declared unhealthy due to service errors.

    image

    The following figure shows the status changes of ECS01 when it is declared unhealthy.

    image
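One way to read the Scenario 2 rules is as a function of the time elapsed since ECS01 was declared unhealthy. The sketch below is an interpretation rather than a description of ALB internals: the name `unhealthy_server_state` and the tuple it returns are hypothetical, and it assumes connection draining is enabled and was triggered by a service error.

```python
def unhealthy_server_state(elapsed, draining_timeout, healthy_now):
    """Return (existing_sessions_open, accepts_new_requests).

    elapsed          -- seconds since the server was declared unhealthy
    draining_timeout -- configured connection draining timeout in seconds
    healthy_now      -- current health check result for the server
    """
    if elapsed < draining_timeout:
        # While draining, existing requests are still processed and no new
        # requests are accepted, even if the health check has recovered.
        return (True, False)
    # After the timeout, existing sessions are closed. New requests are
    # accepted only if the server is healthy again.
    return (False, healthy_now)


print(unhealthy_server_state(elapsed=10, draining_timeout=15, healthy_now=True))
print(unhealthy_server_state(elapsed=20, draining_timeout=15, healthy_now=True))
```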

Specify a proper connection draining timeout period based on your business requirements. In this example, Scenario 1: Removing backend servers is used to demonstrate how to enable connection draining for WebSocket and HTTP sessions.

Precautions

  • Only standard and WAF-enabled ALB instances support connection draining. Basic ALB instances do not support connection draining.

  • Server groups of the Function Compute type do not support connection draining.

  • To enable connection draining, we recommend that you use the WebSocket protocol. In HTTP scenarios, HTTP requests may be timed out or limited. To prevent such issues, we recommend that you set the connection draining timeout period to a value that is larger than the ALB connection request timeout period. The default connection draining timeout period is longer than the connection request timeout period to prevent HTTP requests from being mistakenly closed. For more information about how to configure the connection request timeout period, see Add an HTTP listener.
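The recommendation about HTTP timeouts can be expressed as a simple pre-flight check. The following is a minimal sketch with a hypothetical helper name, `draining_timeout_warnings`. The 60-second default is the connection request timeout value used later in this topic. The check does not call any ALB API.

```python
DEFAULT_REQUEST_TIMEOUT = 60  # seconds, the default used later in this topic


def draining_timeout_warnings(draining_timeout,
                              request_timeout=DEFAULT_REQUEST_TIMEOUT):
    """Return a list of warnings for a planned timeout combination (seconds)."""
    warnings = []
    if draining_timeout <= request_timeout:
        warnings.append(
            "Connection draining timeout (%ds) is not larger than the "
            "connection request timeout (%ds); in-flight HTTP requests "
            "may be closed before they complete."
            % (draining_timeout, request_timeout)
        )
    return warnings


print(draining_timeout_warnings(15))   # one warning
print(draining_timeout_warnings(300))  # no warnings
```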

Prerequisites

  • A standard or WAF-enabled ALB instance is created, and a server group of the server type is created for the ALB instance. In this example, a standard ALB instance is used. For more information, see Create an ALB instance and Create and manage server groups.

  • An HTTP listener that uses port 80 is created for the ALB instance, and the listener is associated with the server group. For more information, see Add an HTTP listener.

  • ECS01 and ECS02 are created. For information about how to create an instance, see Create an instance by using the wizard.

  • ECS02 is added to the server group, and services on ECS02 are accessible from clients. For more information, see Use an ALB instance to provide IPv4 services and Use ALB to balance loads for IPv6 services.

    Note
    • In this example, a client that runs the 64-bit Alibaba Cloud Linux 3.2104 operating system is used. Make sure that Python is installed on the operating system of your client and on ECS01. For more information about how to install Python, see Download Python. In this example, Python 3.x is used.

    • In this example, services run on ECS02. If you have a backend server that can run services, you do not need to create ECS02.

Procedure

This procedure demonstrates how to use connection draining to drain WebSocket and HTTP sessions and how requests are processed by ALB in different connection draining states.

Connection draining for WebSocket sessions

Step 1: Enable connection draining

In this example, a server group is already prepared. Connection draining is configured by modifying the server group. If you do not have a server group, you can enable connection draining when you create a server group.

  1. Log on to the ALB console.
  2. In the top navigation bar, select the region where the server group is deployed.

  3. In the left-side navigation pane, choose ALB > Server Groups.

  4. On the Server Groups page, click the ID of the server group that you want to manage.

  5. On the Details tab, click Modify Basic Information in the Basic Information section.

  6. In the Modify Basic Information dialog box, click Advanced Settings and turn on Connection Draining.

  7. Set Connection Draining Timeout to 300 seconds and click Save.

Step 2: Verify the result

Set up the WebSocket service on the backend server

  1. Remotely log on to ECS01. For more information, see Connection method overview.

  2. Run the following commands to create a directory named WebSocket and switch to it:

    mkdir WebSocket
    cd WebSocket
  3. Run the following commands to install dependency packages:

    pip install tornado
    pip install websocket-client
  4. Run the following command to create and open the server.py file:

    vim server.py
    1. Press the I key to enter insert mode and add the following code, which starts a WebSocket service:

      #!/usr/bin/env python3
      # encoding=utf-8
      
      import tornado.websocket
      import tornado.ioloop
      import tornado.web
      from datetime import datetime
      
      
      # The WebSocket handler
      class WebSocketHandler(tornado.websocket.WebSocketHandler):
          def open(self):
              current_time = datetime.now()
              formatted_time = current_time.strftime("%Y-%m-%d %H:%M:%S")
              print("Time:", formatted_time, "The WebSocket connection is established")
      
          def on_message(self, message):
              current_time = datetime.now()
              formatted_time = current_time.strftime("%Y-%m-%d %H:%M:%S")
              print("Time:", formatted_time, "Received message:", message)
              self.write_message("The server received your message: " + message)
      
          def on_close(self):
              current_time = datetime.now()
              formatted_time = current_time.strftime("%Y-%m-%d %H:%M:%S")
              print("Time:", formatted_time, "The WebSocket connection is closed")
      
      # Routes
      application = tornado.web.Application([
          (r"/websocket", WebSocketHandler),
      ])
      
      if __name__ == "__main__":
          print("WebSocket Server Start on 8080 ...")
          application.listen(8080)
          tornado.ioloop.IOLoop.current().start()
      
    2. After you finish editing, press the Esc key, enter :wq, and then press the Enter key to save and close the file.

  5. In the directory of the server.py file, run the following command to enable the WebSocket service:

    python3 server.py

    The following response message indicates that the WebSocket service is enabled:

    WebSocket Server Start on 8080 ...
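Before you add ECS01 to the server group, it can help to confirm that the service is actually listening on port 8080. The following sketch uses only the Python standard library. The helper name `port_is_open` is hypothetical, and the check only verifies that a TCP connection can be opened, not that the WebSocket handshake succeeds.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection raises OSError (for example, connection refused
        # or timed out) if the port cannot be reached.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, running `port_is_open("127.0.0.1", 8080)` in a second session on ECS01 should return True while server.py is running.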

Add ECS01 to the server group

  1. Log on to the ALB console.
  2. In the top navigation bar, select the region where the server group is deployed.

  3. In the left-side navigation pane, choose ALB > Server Groups.

  4. On the Server Groups page, find the server group that you want to manage and click Modify Backend Server in the Actions column.

  5. On the Backend Servers tab, click Add Backend Server, select ECS01 in the Select Servers step, and then click Next.

  6. In the Ports/Weights step, select ECS01, set the port to 8080, and then click OK.

Set up the WebSocket client

  1. Log on to the client and open the command-line interface (CLI). Run the following commands to create a directory named WebSocket and switch to it:

    mkdir WebSocket
    cd WebSocket
  2. Run the following command to install dependency packages:

    pip install websocket-client
  3. Run the following command to create and open the client.py file:

    vim client.py
    1. Press the I key to enter insert mode and add the following code, which connects to the WebSocket service through ALB and sends one message per second:

      #!/usr/bin/env python3
      # encoding=utf-8
      
      import websocket
      import time
      from datetime import datetime
      
      def on_message(ws, message):
          print("Received server message:", message)
      
      if __name__ == "__main__":
          ws = websocket.WebSocket()
          ws.connect("ws://<ALB domain name>:80/websocket") # Enter the domain name of your ALB instance
          print("The WebSocket connection is established")
          try:
              while True:
                  current_time = datetime.now()
                  formatted_time = current_time.strftime("%Y-%m-%d %H:%M:%S")
                  print("Packet time:", formatted_time)
                  ws.send("Hello, Server!")
                  result = ws.recv()
                  on_message(ws, result)
                  time.sleep(1)
          except Exception:
              print("The WebSocket connection is closed")
      
    2. After you finish editing, press the Esc key, enter :wq, and then press the Enter key to save and close the file.

  4. In the directory of the client.py file, run the following command to access ECS01:

    python3 client.py

    The following response message indicates that the client can access ECS01:

    The WebSocket connection is established
    Packet time: 2024-04-28 17:00:53
    Received server message: The server received your message: Hello, Server!
    Packet time: 2024-04-28 17:00:54
    Received server message: The server received your message: Hello, Server!

    ECS01 returns the following response message:

    WebSocket Server Start on 8080 ...
    Time: 2024-04-28 17:00:53 The WebSocket connection is established
    Time: 2024-04-28 17:00:53 Received message: Hello, Server!
    Time: 2024-04-28 17:00:54 Received message: Hello, Server!

Remove backend servers

Important

Specify a connection draining timeout period before you remove a backend server.

  1. Log on to the ALB console.
  2. In the top navigation bar, select the region where the server group is deployed.

  3. In the left-side navigation pane, choose ALB > Server Groups.

  4. Click the ID of the server group from which you want to remove backend servers.

  5. On the Backend Servers tab, find ECS01 and click Remove in the Actions column.

  6. In the Remove Backend Server message, click OK.

Wait for the connection draining timeout period to end

In this example, the connection draining timeout period is set to 300 seconds. As a result, the connections on ECS01 are closed 300 seconds after you remove ECS01.

Note

In the test results, the time from when the WebSocket connection is established on ECS01 to when the WebSocket connection is closed on ECS01 is 330 seconds. The connection draining timeout period refers to the time from when ECS01 is removed to when the WebSocket connection is closed, which is about 300 seconds.

  • The client returns the following response message:

    Packet time: 2024-04-28 17:06:23
    Received server message: The server received your message: Hello, Server!
    Packet time: 2024-04-28 17:06:24
    The WebSocket connection is closed

  • ECS01 returns the following response message:

    Time: 2024-04-28 17:06:22 Received message: Hello, Server!
    Time: 2024-04-28 17:06:23 Received message: Hello, Server!
    Time: 2024-04-28 17:06:23 The WebSocket connection is closed
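The 330-second figure in the preceding note can be reproduced from the sample log timestamps. The sketch below parses the two timestamps in the same format that server.py prints. The helper name `seconds_between` is hypothetical. The removal time itself is not logged, so the difference between 330 seconds and the 300-second timeout is only an estimate of how long the connection existed before ECS01 was removed.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # same format as used in server.py


def seconds_between(start, end):
    """Return the number of seconds between two log timestamps."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return delta.total_seconds()


established = "2024-04-28 17:00:53"  # "The WebSocket connection is established"
closed = "2024-04-28 17:06:23"       # "The WebSocket connection is closed"

lifetime = seconds_between(established, closed)
print(lifetime)        # 330.0
print(lifetime - 300)  # 30.0, approximate time before ECS01 was removed
```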

Connection draining for HTTP sessions

In HTTP scenarios, the response that clients receive varies based on the connection draining timeout period, connection request timeout period, and backend server processing time.

  • If the connection draining timeout period is shorter than the backend server processing time, responses from ECS01 are interrupted. As a result, clients receive the HTTP 500 status code.

  • If the backend server processing time is longer than the connection request timeout period, responses from ECS01 time out. As a result, clients receive the HTTP 504 status code.

In this example, the connection draining timeout period is set to 15 seconds and the backend server processing time is set to 30 seconds. As a result, responses from ECS01 are interrupted and clients receive the HTTP 500 status code.

Note
  • In Step 2, the connection request timeout period is set to the default value of 60 seconds, which is longer than the backend server processing time (30 seconds). As a result, the HTTP 504 status code is not returned but the HTTP 500 status code is returned because the connection draining timeout period (15 seconds) is shorter than the backend server processing time (30 seconds).

  • In this example, the time.sleep function is used in the Python code to simulate the backend server processing time.
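The interaction between the three durations can be sketched as a small function. This is a simplified model, not ALB's actual logic. It assumes that the backend server is removed at the moment the request starts, that a request that finishes in time returns 200, and that whichever timeout ends first determines the error. The name `expected_http_status` is hypothetical, and the handling of equal timeouts is arbitrary.

```python
def expected_http_status(draining_timeout, request_timeout, processing_time):
    """Predict the status code a client sees (all values in seconds).

    Simplifying assumption: the server is removed when the request starts,
    so both timers begin at the same moment.
    """
    first_deadline = min(draining_timeout, request_timeout)
    if processing_time <= first_deadline:
        return 200  # the response completes before either timeout ends
    if draining_timeout < request_timeout:
        return 500  # draining ends first and the response is interrupted
    return 504      # the connection request timeout ends first


# Values from this example: draining 15 s, request timeout 60 s, processing 30 s.
print(expected_http_status(15, 60, 30))  # 500
```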

Step 1: Enable connection draining

In this example, a server group is already prepared. Connection draining is configured by modifying the server group. If you do not have a server group, you can enable connection draining when you create a server group.

  1. Log on to the ALB console.
  2. In the top navigation bar, select the region where the server group is deployed.

  3. In the left-side navigation pane, choose ALB > Server Groups.

  4. On the Server Groups page, click the ID of the server group that you want to manage.

  5. On the Details tab, click Modify Basic Information in the Basic Information section.

  6. In the Modify Basic Information dialog box, click Advanced Settings and turn on Connection Draining.

  7. Set Connection Draining Timeout to 15 seconds and click Save.

Step 2: Specify a connection request timeout period

  1. Log on to the ALB console.
  2. In the top navigation bar, select the region where the ALB instance is deployed.

  3. On the Instances page, click the ID of the ALB instance that you want to manage.

  4. On the Listener tab, click the ID of the HTTP listener that you want to manage.

  5. In the Basic Information section, click Modify Listener.

  6. In the Modify Listener dialog box, click Modify on the right side of Advanced Settings.

  7. Set Connection Request Timeout to 60 seconds, which is the default timeout period, and click Save.

Step 3: Verify the result

Set up the HTTP service on the backend server

  1. Remotely log on to ECS01. For more information, see Connection method overview.

  2. Run the following commands to create a directory named http and switch to it:

    mkdir http
    cd http
  3. Run the following command to create and open the http_server.py file:

    vim http_server.py
    1. Press the I key to enter insert mode and add the following code, which starts an HTTP service that delays each response by 30 seconds:

      #!/usr/bin/env python3
      # encoding=utf-8
      
      from http.server import SimpleHTTPRequestHandler, HTTPServer
      from datetime import datetime
      import time
      
      class DelayedHTTPRequestHandler(SimpleHTTPRequestHandler):
          def do_GET(self):
              current_time = datetime.now()
              formatted_time = current_time.strftime("%Y-%m-%d %H:%M:%S")
              print("Time:", formatted_time, "Received the GET request and will respond in 30 seconds....")
              time.sleep(30) # Configure the time.sleep function to simulate the backend server processing time
              SimpleHTTPRequestHandler.do_GET(self)
      
      PORT = 8080
      server = HTTPServer(("", PORT), DelayedHTTPRequestHandler)
      print(f"Serving HTTP on 0.0.0.0 port {PORT} (http://0.0.0.0:{PORT}/) ...")
      server.serve_forever()
      
    2. After you finish editing, press the Esc key, enter :wq, and then press the Enter key to save and close the file.

  4. Enter the directory of http_server.py and run the following command to start the HTTP Server service:

    python3 http_server.py

    The following response message indicates that the HTTP Server service is running:

    Serving HTTP on 0.0.0.0 port 8080 (http://0.0.0.0:8080/) ...

Add ECS01 to a server group

  1. Log on to the ALB console.
  2. In the top navigation bar, select the region where the server group is deployed.

  3. In the left-side navigation pane, choose ALB > Server Groups.

  4. On the Server Groups page, find the server group that you want to manage and click Modify Backend Server in the Actions column.

  5. On the Backend Servers tab, click Add Backend Server, select ECS01 in the Select Servers step, and then click Next.

  6. In the Ports/Weights step, select ECS01, set the port to 8080, and then click OK.

Configure connection draining on the client

  1. Log on to the client and open the command-line interface (CLI). Run the following command to access ECS01:

    curl http://<ALB domain name>:80/ -v

    The following response message indicates that the ALB instance can access the backend service:

    * About to connect() to alb-nssnq5a********.cn-guangzhou.alb.aliyuncs.com port 80 (#0)
    *   Trying 10.X.X.225...
    * Connected to alb-nssnq5a********.cn-guangzhou.alb.aliyuncs.com (10.X.X.225) port 80 (#0)
    > GET / HTTP/1.1
    > User-Agent: curl/7.29.0
    > Host: alb-nssnq5a********.cn-guangzhou.alb.aliyuncs.com
    > Accept: */*

    ECS01 prints the following output:

    Serving HTTP on 0.0.0.0 port 8080 (http://0.0.0.0:8080/) ...
    Time: 2024-02-07 13:57:33 Received the GET request and will respond in 30 seconds....
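As an alternative to curl, the request can be sent from a short Python script that records the status code and how long the request took, which makes it easier to compare the result with the configured timeouts. This is a minimal sketch that uses only the standard library. The function name `timed_get` and the 120-second client timeout are assumptions, not part of the original procedure.

```python
import time
import urllib.error
import urllib.request


def timed_get(url, timeout=120.0):
    """Send a GET request and return (status_code, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()  # wait for the full response body
            status = response.status
    except urllib.error.HTTPError as error:
        # urlopen raises HTTPError for 4xx and 5xx responses;
        # the status code is still available on the exception.
        status = error.code
    return status, time.monotonic() - start
```

For example, `timed_get("http://<ALB domain name>:80/")` returns the status code together with the elapsed time. Replace the placeholder with the domain name of your ALB instance.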

Remove backend servers

Important

Specify a connection draining timeout period before you remove a backend server.

  1. Log on to the ALB console.
  2. In the top navigation bar, select the region where the server group is deployed.

  3. In the left-side navigation pane, choose ALB > Server Groups.

  4. Click the ID of the server group from which you want to remove backend servers.

  5. On the Backend Servers tab, find ECS01 and click Remove in the Actions column.

  6. In the Remove Backend Server message, click OK.

Wait for the connection draining timeout period to end

The result shows that when the connection draining timeout period is shorter than the backend server processing time, clients receive the HTTP 500 status code.

* About to connect() to alb-nssnq5a********.cn-guangzhou.alb.aliyuncs.com port 80 (#0)
*   Trying 10.X.X.224...
* Connected to alb-nssnq5a********.cn-guangzhou.alb.aliyuncs.com (10.X.X.224) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: alb-nssnq5a********.cn-guangzhou.alb.aliyuncs.com
> Accept: */*
> 
< HTTP/1.1 500 Internal Server Error
< Date: Wed, 07 Feb 2024 06:02:24 GMT
< Content-Type: text/html
< Content-Length: 186
< Connection: close
< Via: HTTP/1.1 SLB.87
< 
<html>
<head><title>500 Internal Server Error</title></head>
<body bgcolor="white">
<center><h1>500 Internal Server Error</h1></center>
<hr><center>nginx</center>
</body>
</html>
* Closing connection 0
