This topic describes how to use Serverless Devs to invoke a GPU function based on asynchronous tasks and pass the invocation results to the configured asynchronous destination functions.
Background
GPU-accelerated Instance
With the widespread application of machine learning, especially deep learning, CPUs can no longer meet the computing power requirements generated by large numbers of vector, matrix, and tensor operations. These requirements include high-precision calculations in training scenarios and low-precision calculations in inference scenarios. In 2007, NVIDIA launched the Compute Unified Device Architecture (CUDA) framework, a programmable general-purpose computing platform. Researchers and developers ported numerous algorithms to CUDA and improved performance by dozens or even thousands of times. Since machine learning became popular, GPUs have become part of the basic infrastructure behind a wide range of tools, algorithms, and frameworks.
At Apsara Conference 2021, Alibaba Cloud Function Compute officially launched GPU-accelerated instances based on the Turing architecture. Serverless developers can use GPU hardware to accelerate AI training and inference tasks, which improves the efficiency of model training and inference services.
Asynchronous tasks
Function Compute provides full-stack capabilities for distributing, executing, and monitoring asynchronous tasks. This allows you to focus on writing task processing logic: you only need to create and submit the task processing functions. Function Compute provides monitoring features such as asynchronous task logs, metrics, and duration statistics for each phase. Function Compute also provides features such as auto scaling of instances, task deduplication, termination of specified tasks, and batch task suspension, resumption, and deletion. For more information, see Overview.
Scenarios
In non-real-time and offline AI inference scenarios, AI training scenarios, and audio and video production scenarios, GPU functions are invoked based on asynchronous tasks. This allows developers to focus on their business and quickly achieve business goals. This approach provides the following benefits:
GPU resources can be used in 1/8, 1/4, 1/2, or exclusive mode by using GPU virtualization technology. This way, GPU-accelerated instances can be configured at a fine granularity.
Various mature asynchronous task processing capabilities, such as asynchronous mode management, task deduplication, task monitoring, task retry, event triggering, result callback, and task orchestration, are provided.
Developers can focus on code development and the achievement of business objectives without the need to perform O&M on GPU clusters, such as driver and CUDA version management, machine operation management, and GPU bad card management.
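The asynchronous invocation described above is requested per call. As a rough illustration only, the sketch below builds the HTTP headers that a client might attach to mark a Function Compute invocation as an asynchronous (stateful) task. The header names `x-fc-invocation-type` and `x-fc-stateful-async-invocation-id` are based on the public Function Compute API; verify them against the current API documentation before relying on them.

```python
# Sketch: headers that mark a Function Compute invocation as an asynchronous
# (stateful) task. Header names are assumptions based on the public FC API;
# check the current Function Compute documentation before use.
def async_task_headers(task_id):
    return {
        "x-fc-invocation-type": "Async",               # invoke asynchronously
        "x-fc-stateful-async-invocation-id": task_id,  # track or terminate the task by ID
    }

headers = async_task_headers("job-0001")
print(headers["x-fc-invocation-type"])  # Async
```

A client SDK or raw HTTP request would pass these headers when invoking the function; the task can then be tracked by its ID.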
How it works
This topic describes how to deploy a GPU function and implement result callbacks. In this topic, the tgpu_basic_func GPU function is deployed, the async-callback-succ-func function is specified as the callback function for successful invocations, and the async-callback-fail-func function is specified as the callback function for failed invocations. The following table lists the information about these functions.
| Function | Description | Runtime environment | Instance type | Function type |
| --- | --- | --- | --- | --- |
| tgpu_basic_func | A function that runs AI quasi-real-time tasks and AI offline tasks based on GPU-accelerated instances of Function Compute | Custom Container | GPU-accelerated instance | HTTP function |
| async-callback-succ-func | The destination callback function for successful task executions | Python 3 | Elastic instance | Event function |
| async-callback-fail-func | The destination callback function for failed task executions | Python 3 | Elastic instance | Event function |
The following figure describes the workflow.
Before you begin
Step 1: Deploy the callback function for successful invocations
Initialize a project
s init devsapp/start-fc-event-python3 -d async-succ-callback
The following sample code shows the directory of the created project:
├── async-succ-callback
│   ├── code
│   │   └── index.py
│   └── s.yaml
Go to the directory where the project resides.
cd async-succ-callback
Modify the parameter configurations in the project files based on your business requirements.
Edit the s.yaml file. Example:

edition: 1.0.0
name: hello-world-app
# access specifies the key information required by the current application.
# For information about how to configure keys, visit https://www.serverless-devs.com/serverless-devs/command/config.
# For more information about how to use keys, visit https://www.serverless-devs.com/serverless-devs/tool.
access: "default"
vars: # The global variable.
  region: "cn-shenzhen"
services:
  helloworld: # The name of the service or module.
    component: fc
    props:
      region: ${vars.region}
      service:
        name: "async-callback-service"
        description: 'async callback service'
        # Obtain the logConfig configuration document from https://gitee.com/devsapp/fc/blob/main/docs/zh/yaml/service.md#logconfig.
        logConfig:
          project: tgpu-prj-sh # The project that stores the request logs. You must create the project in Simple Log Service in advance. We recommend that you configure this item.
          logstore: tgpu-logstore-sh # The Logstore that stores the request logs. You must create the Logstore in Simple Log Service in advance. We recommend that you configure this item.
          enableRequestMetrics: true
          enableInstanceMetrics: true
          logBeginRule: DefaultRegex
      function:
        name: "async-callback-succ-func"
        description: 'async callback succ func'
        runtime: python3
        codeUri: ./code
        handler: index.handler
        memorySize: 128
        timeout: 60
Edit the index.py file. Example:
# -*- coding: utf-8 -*-
import logging

# To enable the initializer feature,
# implement the initializer function as follows:
# def initializer(context):
#     logger = logging.getLogger()
#     logger.info('initializing')

def handler(event, context):
    logger = logging.getLogger()
    logger.info('hello async callback succ')
    return 'hello async callback succ'
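In practice, the callback function receives a destination event that describes the original asynchronous invocation rather than a plain string. The exact payload schema is defined by Function Compute; the sketch below parses a hypothetical payload whose requestContext and responsePayload field names are assumptions, so check the Function Compute destination documentation for the real schema.

```python
import json

# Hypothetical destination-event payload. The field names (requestContext,
# requestId, condition, responsePayload) are assumptions; verify them against
# the Function Compute asynchronous-destination documentation.
sample_event = json.dumps({
    "requestContext": {"requestId": "req-123", "condition": "Succeeded"},
    "responsePayload": "hello async callback succ",
})

def callback_handler(event, context=None):
    # Parse the destination event and summarize the outcome of the
    # original asynchronous invocation.
    body = json.loads(event)
    ctx = body.get("requestContext", {})
    return "request %s finished with condition %s" % (
        ctx.get("requestId"), ctx.get("condition"))

print(callback_handler(sample_event))
# request req-123 finished with condition Succeeded
```

A real callback would typically log these fields or forward them to a downstream system instead of returning a string.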
Deploy the code to Function Compute.
s deploy
You can view the deployed function in the Function Compute console.
Invoke and debug the function from your local machine.
s invoke
After the invocation is complete, hello async callback succ is returned.
Step 2: Deploy the callback function for failed invocations
Initialize a project
s init devsapp/start-fc-event-python3 -d async-fail-callback
The following sample code shows the directory of the created project:
├── async-fail-callback
│   ├── code
│   │   └── index.py
│   └── s.yaml
Go to the directory where the project resides.
cd async-fail-callback
Modify the parameter configurations in the project files based on your business requirements.
Edit the s.yaml file. Example:

edition: 1.0.0
name: hello-world-app
# access specifies the key information required by the current application.
# For information about how to configure keys, visit https://www.serverless-devs.com/serverless-devs/command/config.
# For more information about how to use keys, visit https://www.serverless-devs.com/serverless-devs/tool.
access: "default"
vars: # The global variable.
  region: "cn-shenzhen"
services:
  helloworld: # The name of the service or module.
    component: fc
    props:
      region: ${vars.region}
      service:
        name: "async-callback-service"
        description: 'async callback service'
        # Obtain the logConfig configuration document from https://gitee.com/devsapp/fc/blob/main/docs/zh/yaml/service.md#logconfig.
        logConfig:
          project: tgpu-prj-sh # The project that stores the request logs. You must create the project in Simple Log Service in advance. We recommend that you configure this item.
          logstore: tgpu-logstore-sh # The Logstore that stores the request logs. You must create the Logstore in Simple Log Service in advance. We recommend that you configure this item.
          enableRequestMetrics: true
          enableInstanceMetrics: true
          logBeginRule: DefaultRegex
      function:
        name: "async-callback-fail-func"
        description: 'async callback fail func'
        runtime: python3
        codeUri: ./code
        handler: index.handler
        memorySize: 128
        timeout: 60
Edit the index.py file. Example:

# -*- coding: utf-8 -*-
import logging

# To enable the initializer feature,
# implement the initializer function as follows:
# def initializer(context):
#     logger = logging.getLogger()
#     logger.info('initializing')

def handler(event, context):
    logger = logging.getLogger()
    logger.info('hello async callback fail')
    return 'hello async callback fail'
Deploy the code to Function Compute.
s deploy
You can view the deployed function in the Function Compute console.
Invoke and debug the function from your local machine.
s invoke
After the invocation is complete, hello async callback fail is returned.
Step 3: Deploy a GPU function
Create a project directory.
mkdir fc-gpu-async-job && cd fc-gpu-async-job
Create the files based on the following directory structure. Replace the parameter values with your actual configurations when you create the files.
Directory structure:
├── fc-gpu-async-job
    ├── code
    │   ├── app.py
    │   └── Dockerfile
    └── s.yaml
Edit the s.yaml file. Example:
edition: 1.0.0
name: gpu-container-demo
# access specifies the key information required by the current application.
# For information about how to configure keys, visit https://www.serverless-devs.com/serverless-devs/command/config.
# For information about the order in which keys are used, visit https://www.serverless-devs.com/serverless-devs/tool.
access: default
vars:
  region: cn-shenzhen
services:
  customContainer-demo:
    component: devsapp/fc
    props:
      region: ${vars.region}
      service:
        name: tgpu_basic_service
        internetAccess: true
        # Obtain the logConfig configuration document from https://gitee.com/devsapp/fc/blob/main/docs/zh/yaml/service.md#logconfig.
        logConfig:
          project: aliyun**** # The project that stores the request logs. You must create the project in Simple Log Service in advance. We recommend that you configure this item.
          logstore: func**** # The Logstore that stores the request logs. You must create the Logstore in Simple Log Service in advance. We recommend that you configure this item.
          enableRequestMetrics: true
          enableInstanceMetrics: true
          logBeginRule: DefaultRegex
      function:
        name: tgpu_basic_func
        description: test gpu basic
        handler: not-used
        timeout: 600
        caPort: 9000
        # You can select an appropriate GPU-accelerated instance type based on the actual GPU memory usage. The following example uses the 1/8 virtualized GPU specification:
        instanceType: fc.gpu.tesla.1
        gpuMemorySize: 2048
        cpu: 1
        memorySize: 4096
        diskSize: 512
        instanceConcurrency: 1
        runtime: custom-container
        customContainerConfig:
          # Specify the information about your image. You must create a Container Registry Personal Edition or Enterprise Edition instance in advance. You must also create a namespace and an image repository.
          image: registry.cn-shenzhen.aliyuncs.com/my****/my****
          # Enable image acceleration. This feature can optimize the cold start of gigabyte-level images.
          accelerationType: Default
        codeUri: ./code
        # Asynchronous mode configurations. For more information, see https://gitee.com/devsapp/fc/blob/main/docs/zh/yaml/function.md#asyncconfiguration.
        asyncConfiguration:
          destination:
            # Specify the Alibaba Cloud Resource Name (ARN) of the callback function for failed invocations.
            onFailure: "acs:fc:cn-shenzhen:164901546557****:services/async-callback-service.LATEST/functions/async-callback-fail-func"
            # Specify the ARN of the callback function for successful invocations.
            onSuccess: "acs:fc:cn-shenzhen:164901546557****:services/async-callback-service.LATEST/functions/async-callback-succ-func"
          statefulInvocation: true
      triggers:
        - name: httpTrigger
          type: http
          config:
            authType: anonymous
            methods:
              - GET
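The onFailure and onSuccess destinations are Alibaba Cloud Resource Names (ARNs). Following the format shown in the example above, an ARN can be assembled programmatically, for instance when generating s.yaml files for several environments. A minimal sketch:

```python
def fc_function_arn(account_id, region, service, function, qualifier="LATEST"):
    # Matches the ARN format used in the asyncConfiguration example above:
    # acs:fc:<region>:<account-id>:services/<service>.<qualifier>/functions/<function>
    return "acs:fc:%s:%s:services/%s.%s/functions/%s" % (
        region, account_id, service, qualifier, function)

arn = fc_function_arn("164901546557****", "cn-shenzhen",
                      "async-callback-service", "async-callback-fail-func")
print(arn)
# acs:fc:cn-shenzhen:164901546557****:services/async-callback-service.LATEST/functions/async-callback-fail-func
```

Replace the masked account ID with your own Alibaba Cloud account ID when you configure the destinations.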
Edit the Dockerfile file. Example:
FROM nvidia/cuda:11.0-base
FROM ubuntu
WORKDIR /usr/src/app
RUN apt-get update
RUN apt-get install -y python3
COPY . .
CMD [ "python3", "-u", "/usr/src/app/app.py" ]
EXPOSE 9000
Edit the app.py file. Example:
# -*- coding: utf-8 -*-
from http.server import HTTPServer, BaseHTTPRequestHandler
import json
import os
import time

host = ('0.0.0.0', 9000)

class Request(BaseHTTPRequestHandler):
    def do_GET(self):
        print("simulate long execution scenario, sleep 10 seconds")
        time.sleep(10)
        print("show me GPU info")
        msg = os.popen("nvidia-smi -L").read()
        data = {'result': msg}
        self.send_response(200)
        self.send_header('Content-type', 'application/json')
        self.end_headers()
        self.wfile.write(json.dumps(data).encode())

if __name__ == '__main__':
    server = HTTPServer(host, Request)
    print("Starting server, listen at: %s:%s" % host)
    server.serve_forever()
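Before you build and push the image, you can smoke-test the handler logic locally. The sketch below reuses the same JSON response logic as app.py, with the 10-second sleep removed and an ephemeral port chosen automatically. On a machine without a GPU, nvidia-smi is absent, so the result field is simply an empty string.

```python
import json
import os
import threading
import urllib.request
from http.server import HTTPServer, BaseHTTPRequestHandler

class QuickRequest(BaseHTTPRequestHandler):
    # Same JSON response shape as app.py, without the 10-second sleep.
    def do_GET(self):
        msg = os.popen("nvidia-smi -L").read()  # empty on machines without a GPU
        self.send_response(200)
        self.send_header('Content-type', 'application/json')
        self.end_headers()
        self.wfile.write(json.dumps({'result': msg}).encode())

    def log_message(self, *args):
        pass  # silence per-request logging during the smoke test

server = HTTPServer(('127.0.0.1', 0), QuickRequest)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/' % server.server_port
data = json.loads(urllib.request.urlopen(url).read())
server.shutdown()
print(sorted(data.keys()))  # ['result']
```

This confirms the response format before the function is deployed; on a GPU-accelerated instance, the result field contains the output of nvidia-smi -L.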
Deploy the code to Function Compute.
s deploy
You can view the deployed GPU function and the asynchronous configuration of the function in the Function Compute console.
Invoke and debug the function from your local machine.
s invoke
After the invocation is complete, a JSON response that contains the output of nvidia-smi -L is returned.
Submit the asynchronous task
View the preparation status of image acceleration for the GPU function.
We recommend that you initiate an asynchronous task only after the status of image acceleration changes to Available. Otherwise, exceptions such as connection timeouts may occur.
Log on to the Function Compute console and find the GPU function tgpu_basic_func. On the Asynchronous Tasks tab, click Submit Task.
After the execution is complete, the task status changes to Successful.
You can then open the configured callback function for successful invocations, async-callback-succ-func, and find the result line of the asynchronous request in its request logs to check whether the invocation was successful.
Additional information
For more information about the best practices of GPU functions, see Use cases for serverless GPU applications.