Platform For AI: Configure the health check feature

Last Updated: Nov 28, 2024

Elastic Algorithm Service (EAS) provides the health check feature, which uses the health check mechanism of Kubernetes. The health check feature can automatically detect and recover failed containers to ensure that only healthy instances receive traffic and resources are not allocated to unhealthy instances. This topic describes how to configure the health check feature.

Limits

You can configure the health check feature only when you use a custom image that contains the health check logic to deploy a service.

How it works

The health check feature of EAS is built on the health check mechanism of Kubernetes. You combine a probe type, which determines what a check result is used for, with a health check method, which determines how the check is performed. The following tables describe the probe types and health check methods.

  • Probe types

    Probe type

    Description

    Liveness probe

    The kubelet uses liveness probes to check whether containers are alive. If a liveness probe fails, the kubelet kills the container and then handles it based on the restart policy. If no liveness probe is configured for a container, the kubelet treats the liveness result as Success, which means that the container is considered alive.

    Readiness probe

    Readiness probes check whether a container is ready to receive requests. Only pods that are in the Ready state can receive requests. Whether the IP address of a pod is listed in the endpoints of a service depends on whether the pod is ready:

    • If the value of the Ready field is False, Kubernetes removes the IP address of the pod from the list of endpoints that are associated with the services.

    • After the value of the Ready field changes to True, Kubernetes adds the IP address of the pod to the list of endpoints that are associated with the services.

    Startup probe

    The kubelet uses startup probes to determine when a container has finished starting. Liveness probes and readiness probes are sent to the container only after the startup probe succeeds. Use startup probes for containers that start slowly so that the kubelet does not kill them before they finish starting.

  • Health check methods

    Health check method

    Description

    http_get

    Send HTTP GET requests to check the health status and liveness of services, and confirm whether the probes are successful based on the returned status codes.

    tcp_socket

    Attempt to create a TCP connection to check the health status and liveness of services.

    exec

    Run specific commands in containers and confirm whether the probes are successful based on the exit codes.
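
The three health check methods can be sketched with the Python standard library. This is a minimal illustration of the success criteria described in this topic, not the code that the kubelet or EAS runs. The function names and the timeout default are assumptions made for this example.

```python
import socket
import subprocess
import urllib.error
import urllib.request


def http_get_check(host, port, path="/", timeout=1.0):
    """Succeed if an HTTP GET returns a status code >= 200 and < 400."""
    url = f"http://{host}:{port}{path}"
    try:
        # urlopen follows redirects and raises HTTPError for 4xx and 5xx codes.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        return False


def tcp_socket_check(host, port, timeout=1.0):
    """Succeed if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def exec_check(command, timeout=1.0):
    """Succeed if the command (a list of arguments) exits with code 0."""
    try:
        result = subprocess.run(
            command,
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False
```

In all three functions, a check that does not finish within the timeout counts as failed. Calling these functions against your own container before deployment is a quick way to confirm that the path, port, or command you plan to configure behaves as expected.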

Prepare a custom image

You can choose a web framework to encapsulate the prediction logic. In this example, the Flask framework is used. Sample app.py file:

import json
from flask import Flask, request, make_response

app = Flask(__name__)

@app.route('/', methods=['GET', 'POST'])
def process_handle_func():
    # An http_get probe sends a GET request without a body. Return 200 directly
    # so that the probe does not fail when an empty body is parsed as JSON.
    if request.method == 'GET':
        return make_response('ok', 200)
    # Parse the request body based on your business requirements.
    data = request.get_data().decode('utf-8')
    body = json.loads(data)
    res = process(body)
    # Configure the response based on your business requirements.
    response = make_response(res)
    response.status_code = 200
    return response

def process(data):
    """Your prediction logic."""
    return 'result'

if __name__ == '__main__':
    # You must set the host parameter to 0.0.0.0. Otherwise, the health check
    # may fail during service deployment.
    # The port number that you specify for the port parameter must be the same
    # as the port number specified in the JSON configuration file of the
    # service that you deploy.
    app.run(host='0.0.0.0', port=8000)

You can write a simple Dockerfile that copies the prediction code into the image and installs the required packages. The following sample code provides an example of the content of the Dockerfile:

# In this example, Python is used.
FROM registry.cn-shanghai.aliyuncs.com/eas/bashbase-amd64:0.0.1
COPY ./process_code  /eas
RUN /xxx/pip install Name of the package that you require
CMD ["/xxx/python", "/eas/xxx/app.py"] 

For information about how to create a custom image, see Use a Container Registry Enterprise Edition instance to build an image. For more information about custom images, see Deploy a model service by using a custom image. You can also store the code in a File Storage NAS (NAS) file system or a Git repository and mount the storage to a service instance so that the code is written to the instance during service deployment. For more information, see Mount storage to services. The following section describes how to configure the health check feature during service deployment, using an image into which the Dockerfile copied the prediction code.

Configure the health check feature during service deployment

Configure the health check feature in the PAI console

  1. Log on to the PAI console. Select a region on the top of the page. Then, select the desired workspace and click Enter Elastic Algorithm Service (EAS).

  2. Click Deploy Service. In the Custom Model Deployment section, click Custom Deployment.

  3. On the Custom Deployment page, configure the following key parameters. For information about other parameters, see Deploy a model service in the PAI console.

    1. In the Environment Information section, configure the parameters. The following table describes the parameters.

      Parameter

      Description

      Image Configuration

      Select Image Address and enter the address of the prepared custom image. Example: registry-vpc.cn-shanghai.aliyuncs.com/xxx/yyy:zzz.

      Command to Run

      The entry command of the image. You can enter only a single command. Complex scripts are not supported. The command must be consistent with the command in the Dockerfile. Example: /data/eas/ENV/bin/python /data/eas/app.py.

      You must also enter the port number, which is the local HTTP port on which the image listens after the image is started. Example: 8000.

      Important
      • We recommend that you do not specify port 8080 or port 9090 because the EAS engine listens on these ports.

      • The port number must be the same as the port number configured in the xxx.py file specified in the command.

    2. In the Service Configuration section, enable Health Check and configure the following parameters.

      Note

      You can add up to three health check items. Only one probe type can be configured for each health check item, and the probe type configured for each health check item must be unique.

      Parameter

      Description

      Probe Type

      The following types of probes are supported:

      • Liveness probe: checks whether containers are running as expected.

      • Readiness probe: ensures that containers are initialized and ready to process requests.

      • Startup probe: prevents applications from being incorrectly marked as failed due to slow launch of containers. This probe is designed for applications that require a long period of time to be initialized.

      For information about the working principles of each type of probe, see How it works.

      Check Method

      The following health check methods are supported:

      • http_get: Call the HTTP GET method by using the IP address, port number, and path of a container. If the status code of the response is greater than or equal to 200 and less than 400, the container is healthy.

      • tcp_socket: Perform a TCP check by using the IP address and port number of a container. If a TCP connection is established, the container is healthy.

      • exec (Custom Health Check): Run specific commands in a container. If the command exits with code 0, the health check is successful.

      Call Path

      This parameter is available only if you set the Check Method parameter to http_get.

      The path of the HTTP endpoint on which you want to perform the health check. The endpoint always starts with http://localhost. You specify only the path that follows this prefix. Default value: /.

      Port Number

      This parameter is available only if you set the Check Method parameter to http_get or tcp_socket.

      The port number for the health check. Example: 8000.

      Command

      This parameter is available only if you set the Check Method parameter to exec (Custom Health Check).

      The command that you want to run. The frontend automatically converts the command into the corresponding format and writes the command into the JSON service configuration file.

      Latency for Check Initialization

      The delay between the launch of the container and the first health check. Default value: 0. Unit: seconds.

      Check Interval

      The interval between two consecutive health checks. Default value: 10. Unit: seconds. A short interval generates additional overhead for pods. A long interval may cause container errors to go undetected for a long time.

      Check Timeout Period

      The timeout period of the health check. Default value: 1. Unit: seconds. If a health check times out, the health check is considered failed.

      Check Success Threshold

      The minimum number of consecutive successful health checks after a health check fails before the service is considered healthy. Default value for the readiness probe: 3. Default value for the liveness and startup probes: 1.

      Check Failure Threshold

      The minimum number of consecutive failed health checks after a successful health check before the service is considered unhealthy. Default value: 1.

    3. Click OK.

  4. After you configure the parameters, click Deploy.
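
The interaction of the two thresholds can be sketched as a small state machine. The sketch follows Kubernetes probe semantics, on which this feature is built: the success threshold is the number of consecutive successful checks required to mark an unhealthy container as healthy again, and the failure threshold is the number of consecutive failed checks required to mark a healthy container as unhealthy. The ProbeState class and its record method are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProbeState:
    """Tracks consecutive probe results against the two thresholds."""
    success_threshold: int
    failure_threshold: int
    healthy: bool = True
    consecutive_successes: int = 0
    consecutive_failures: int = 0

    def record(self, ok: bool) -> bool:
        """Record one probe result and return the resulting health status."""
        if ok:
            self.consecutive_successes += 1
            self.consecutive_failures = 0
            if (not self.healthy
                    and self.consecutive_successes >= self.success_threshold):
                self.healthy = True
        else:
            self.consecutive_failures += 1
            self.consecutive_successes = 0
            if (self.healthy
                    and self.consecutive_failures >= self.failure_threshold):
                self.healthy = False
        return self.healthy


# Example values: success threshold 2, failure threshold 4.
state = ProbeState(success_threshold=2, failure_threshold=4)
history = [state.record(ok) for ok in (False, False, False, False, True, True)]
print(history)  # [True, True, True, False, False, True]
```

With these values, the container is marked unhealthy only on the fourth consecutive failure and recovers only after two consecutive successes. A single result of the opposite kind resets the running count.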

Configure the health check feature on an on-premises client

  1. Download the EASCMD client and complete identity authentication. In this example, the client for 64-bit Windows is used.

  2. Create a service configuration file named service.json in the directory in which the client is located. The following sample code provides an example of the content of the file:

    {
        "metadata": {
            "name": "test",
            "instance": 1,
            "enable_webservice": true
        },
        "cloud": {
            "computing": {
                "instance_type": "ml.gu7i.c16m60.1-gu30"
            }
        },
        "containers": [
            {
                "image":"registry-vpc.cn-shanghai.aliyuncs.com/xxx/yyy:zzz",
                "env":[
                    {
                        "name":"VAR_NAME",
                        "value":"var_value"
                    }
                ],
                "liveness_check":{
                    "http_get":{
                        "path":"/",
                        "port":8000
                    },
                    "initial_delay_seconds":3,
                    "period_seconds":3,
                    "timeout_seconds":1,
                    "success_threshold":2,
                    "failure_threshold":4
                },
                "command":"/data/eas/ENV/bin/python /data/eas/app1.py",
                "port":8000
            }
        ]
    }

    The following table describes the key parameters. For information about other parameters, see All Parameters of model services.

    Parameter

    Description

    image

    The address of the custom image used to deploy a model service.

    EAS does not support Internet access. You need to access the image by using the virtual private cloud (VPC) endpoint of the image repository to which the image is uploaded. Example: registry-vpc.cn-shanghai.aliyuncs.com/xxx/yyy:zzz.

    env

    name

    The name of the environment variable that is used to launch a container based on the image.

    value

    The value of the environment variable that is used to launch a container based on the image.

    command

    The entry command of the image. You can enter only a single command. Complex scripts are not supported. Example: /data/eas/ENV/bin/python /data/eas/app.py.

    port

    The network port on which the process in the image listens. Example: 8000.

    Important

    The port number must be consistent with the port number configured in the xxx.py file specified in the command.

    liveness_check

    Note

    liveness_check indicates that a liveness probe is used in the health check. You can also specify readiness_check (readiness probe) or startup_check (startup probe).

    http_get

    The HTTP GET check method that is used to send requests over port 8000. Take note of the following parameters:

    • http_get.path: the path of the HTTP endpoint on which you perform the health check. The endpoint always starts with http://localhost. You specify only the path that follows this prefix. Default value: /.

    • http_get.port: the port of the HTTP server on which you perform the health check.

    You can also use the following health check methods:

    • tcp_socket: Perform a TCP check by using the IP address and port number of a container. If a TCP connection is established, the container is healthy. Configuration method:

      "tcp_socket":{
          "port":8000
      }
    • exec: Run a specific command in the container. If the command exits with code 0, the health check is successful. Configuration method:

      "exec":{
          "command":[
              "your_script",
              "with_args"
          ]
      }

    initial_delay_seconds

    The delay between the launch of the container and the first health check. Default value: 0. Unit: seconds.

    period_seconds

    The interval between two consecutive health checks. Default value: 10. Unit: seconds. A short interval generates additional overhead for pods. A long interval may cause container errors to go undetected for a long time.

    timeout_seconds

    The timeout period of the health check. Default value: 1. Unit: seconds. If a health check times out, the health check is considered failed.

    success_threshold

    The minimum number of consecutive successful health checks after a health check fails before the service is considered healthy. Default value for the readiness probe: 3. Default value for the liveness and startup probes: 1.

    failure_threshold

    The minimum number of consecutive failed health checks after a successful health check before the service is considered unhealthy. Default value: 1.

  3. Run the following command in the directory in which the JSON file is located to create the service. For more information, see Run commands to use the EASCMD client.

    eascmdwin64.exe create <service.json>

    Replace <service.json> with the name of the JSON file that you created.
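
If you generate the service configuration from a script, the container entry can be built and checked before you run the command above. The following sketch uses only the keys and sample values shown in the service.json example in this topic. The build_container helper and the RESERVED_PORTS constant are hypothetical names, and rejecting ports 8080 and 9090 with an error is a design choice of this sketch: this topic only recommends that you avoid these ports.

```python
import json

# Ports on which the EAS engine listens, according to this topic.
RESERVED_PORTS = {8080, 9090}
PROBE_KEYS = ("liveness_check", "readiness_check", "startup_check")


def build_container(image, command, port, probe_key="liveness_check", path="/"):
    """Return one entry of the "containers" list with an http_get probe."""
    if probe_key not in PROBE_KEYS:
        raise ValueError(f"probe_key must be one of {PROBE_KEYS}")
    if port in RESERVED_PORTS:
        raise ValueError(f"port {port} is used by the EAS engine")
    return {
        "image": image,
        "command": command,
        "port": port,
        probe_key: {
            # The probe targets the same port on which the service listens.
            "http_get": {"path": path, "port": port},
            # Timing and threshold values copied from the sample file.
            "initial_delay_seconds": 3,
            "period_seconds": 3,
            "timeout_seconds": 1,
            "success_threshold": 2,
            "failure_threshold": 4,
        },
    }


service = {
    "metadata": {"name": "test", "instance": 1, "enable_webservice": True},
    "cloud": {"computing": {"instance_type": "ml.gu7i.c16m60.1-gu30"}},
    "containers": [
        build_container(
            image="registry-vpc.cn-shanghai.aliyuncs.com/xxx/yyy:zzz",
            command="/data/eas/ENV/bin/python /data/eas/app.py",
            port=8000,
            probe_key="readiness_check",
        )
    ],
}

service_json = json.dumps(service, indent=4)
print(service_json)
```

Because the helper derives the probe port from the container port, the two values cannot drift apart. Save the printed output as service.json to use it with the EASCMD client.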