This topic describes how to use Serverless Devs to invoke a GPU function based on asynchronous tasks and pass the invocation results to the configured asynchronous destination functions.
Background
GPU-accelerated Instance
With the widespread application of machine learning, especially deep learning, CPUs can no longer meet the computing power requirements generated by large numbers of vector, matrix, and tensor operations. These requirements include high-precision calculations in training scenarios and low-precision calculations in inference scenarios. In 2007, NVIDIA launched the Compute Unified Device Architecture (CUDA) framework, a programmable general-purpose computing platform. Researchers and developers ported numerous algorithms to CUDA and improved performance by dozens or even thousands of times. Since machine learning became popular, GPUs have become part of the basic infrastructure behind a wide range of tools, algorithms, and frameworks.
At Apsara Conference 2021, Alibaba Cloud Function Compute officially launched GPU-accelerated instances based on the Turing architecture. Serverless developers can use GPU hardware to accelerate AI training and inference tasks, which improves the efficiency of model training and inference services.
Asynchronous tasks
Function Compute provides full-stack capabilities for distributing, executing, and monitoring asynchronous tasks. This allows you to focus on writing task processing logic: you only need to create and submit the task processing functions. Function Compute provides monitoring features such as asynchronous task logs, metrics, and duration statistics for each phase. Function Compute also provides features such as auto scaling of instances, task deduplication, termination of specified tasks, and batch task suspension, resumption, and deletion. For more information, see Overview.
Scenarios
In non-real-time and offline AI inference scenarios, AI training scenarios, and audio and video production scenarios, GPU functions are invoked based on asynchronous tasks. This allows developers to focus on their business and quickly achieve business goals. This approach provides the following benefits:
GPU resources can be used in 1/8, 1/4, 1/2, or exclusive mode by using GPU virtualization technology. This way, GPU-accelerated instances can be configured at a fine granularity.
Various mature asynchronous task processing capabilities, such as asynchronous mode management, task deduplication, task monitoring, task retry, event triggering, result callback, and task orchestration, are provided.
Developers can focus on code development and the achievement of business objectives without the need to perform O&M on GPU clusters, such as driver and CUDA version management, machine operation management, and GPU bad card management.
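The asynchronous invocation described above is requested per call. As a rough illustration only, the sketch below builds the HTTP headers that a client might attach to mark a Function Compute invocation as an asynchronous (stateful) task. The header names `x-fc-invocation-type` and `x-fc-stateful-async-invocation-id` are based on the public Function Compute API; verify them against the current API documentation before relying on them.

```python
# Sketch: headers that mark a Function Compute invocation as an asynchronous
# (stateful) task. Header names are assumptions based on the public FC API;
# check the current Function Compute documentation before use.
def async_task_headers(task_id):
    return {
        "x-fc-invocation-type": "Async",               # invoke asynchronously
        "x-fc-stateful-async-invocation-id": task_id,  # track or terminate the task by ID
    }

headers = async_task_headers("job-0001")
print(headers["x-fc-invocation-type"])  # Async
```

A client SDK or raw HTTP request would pass these headers when invoking the function; the task can then be tracked by its ID.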
How it works
This topic describes how to deploy a GPU function and implement result callbacks. In this topic, the tgpu_basic_func GPU function is deployed, the async-callback-succ-func function is specified as the callback function for successful invocations, and the async-callback-fail-func function is specified as the callback function for failed invocations. The following table lists the information about these functions.
| Function | Description | Runtime environment | Instance type | Function type |
| --- | --- | --- | --- | --- |
| tgpu_basic_func | A function that runs AI quasi-real-time tasks and AI offline tasks based on GPU-accelerated instances of Function Compute | Custom Container | GPU-accelerated instance | HTTP function |
| async-callback-succ-func | The destination callback function for successful task executions | Python 3 | Elastic instance | Event function |
| async-callback-fail-func | The destination callback function for failed task executions | Python 3 | Elastic instance | Event function |
The following figure describes the workflow.
Before you begin
Step 1: Deploy the callback function for successful invocations
Initialize a project
s init devsapp/start-fc-event-python3 -d async-succ-callback
The following sample code shows the directory of the created project:
├── async-succ-callback
│   ├── code
│   │   └── index.py
│   └── s.yaml
Go to the directory where the project resides.
cd async-succ-callback
Modify the parameter configurations in the project files based on your business requirements.
Edit the s.yaml file. Example:

edition: 1.0.0
name: hello-world-app
# access specifies the key information required by the current application.
# For information about how to configure keys, visit https://www.serverless-devs.com/serverless-devs/command/config.
# For more information about how to use keys, visit https://www.serverless-devs.com/serverless-devs/tool.
access: "default"
vars: # The global variable.
  region: "cn-shenzhen"
services:
  helloworld: # The name of the service or module.
    component: fc
    props:
      region: ${vars.region}
      service:
        name: "async-callback-service"
        description: 'async callback service'
        # Obtain the logConfig configuration document from https://gitee.com/devsapp/fc/blob/main/docs/zh/yaml/service.md#logconfig.
        logConfig:
          project: tgpu-prj-sh # The project that stores the request logs. You must create the project in Simple Log Service in advance. We recommend that you configure this item.
          logstore: tgpu-logstore-sh # The Logstore that stores the request logs. You must create the Logstore in Simple Log Service in advance. We recommend that you configure this item.
          enableRequestMetrics: true
          enableInstanceMetrics: true
          logBeginRule: DefaultRegex
      function:
        name: "async-callback-succ-func"
        description: 'async callback succ func'
        runtime: python3
        codeUri: ./code
        handler: index.handler
        memorySize: 128
        timeout: 60
Edit the index.py file. Example:
# -*- coding: utf-8 -*-
import logging

# To enable the initializer feature,
# implement the initializer function as follows:
# def initializer(context):
#     logger = logging.getLogger()
#     logger.info('initializing')

def handler(event, context):
    logger = logging.getLogger()
    logger.info('hello async callback succ')
    return 'hello async callback succ'
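In practice, the callback function receives a destination event that describes the original asynchronous invocation rather than a plain string. The exact payload schema is defined by Function Compute; the sketch below parses a hypothetical payload whose requestContext and responsePayload field names are assumptions, so check the Function Compute destination documentation for the real schema.

```python
import json

# Hypothetical destination-event payload. The field names (requestContext,
# requestId, condition, responsePayload) are assumptions; verify them against
# the Function Compute asynchronous-destination documentation.
sample_event = json.dumps({
    "requestContext": {"requestId": "req-123", "condition": "Succeeded"},
    "responsePayload": "hello async callback succ",
})

def callback_handler(event, context=None):
    # Parse the destination event and summarize the outcome of the
    # original asynchronous invocation.
    body = json.loads(event)
    ctx = body.get("requestContext", {})
    return "request %s finished with condition %s" % (
        ctx.get("requestId"), ctx.get("condition"))

print(callback_handler(sample_event))
# request req-123 finished with condition Succeeded
```

A real callback would typically log these fields or forward them to a downstream system instead of returning a string.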
Deploy the code to Function Compute.
s deploy
You can view the deployed function in the Function Compute console.
Invoke and debug the function from your local machine.
s invoke
After the invocation is complete, hello async callback succ is returned.
Step 2: Deploy the callback function for failed invocations
Initialize a project
s init devsapp/start-fc-event-python3 -d async-fail-callback
The following sample code shows the directory of the created project:
├── async-fail-callback
│   ├── code
│   │   └── index.py
│   └── s.yaml
Go to the directory where the project resides.
cd async-fail-callback
Modify the parameter configurations in the project files based on your business requirements.
Edit the s.yaml file. Example:

edition: 1.0.0
name: hello-world-app
# access specifies the key information required by the current application.
# For information about how to configure keys, visit https://www.serverless-devs.com/serverless-devs/command/config.
# For more information about how to use keys, visit https://www.serverless-devs.com/serverless-devs/tool.
access: "default"
vars: # The global variable.
  region: "cn-shenzhen"
services:
  helloworld: # The name of the service or module.
    component: fc
    props:
      region: ${vars.region}
      service:
        name: "async-callback-service"
        description: 'async callback service'
        # Obtain the logConfig configuration document from https://gitee.com/devsapp/fc/blob/main/docs/zh/yaml/service.md#logconfig.
        logConfig:
          project: tgpu-prj-sh # The project that stores the request logs. You must create the project in Simple Log Service in advance. We recommend that you configure this item.
          logstore: tgpu-logstore-sh # The Logstore that stores the request logs. You must create the Logstore in Simple Log Service in advance. We recommend that you configure this item.
          enableRequestMetrics: true
          enableInstanceMetrics: true
          logBeginRule: DefaultRegex
      function:
        name: "async-callback-fail-func"
        description: 'async callback fail func'
        runtime: python3
        codeUri: ./code
        handler: index.handler
        memorySize: 128
        timeout: 60
Edit the index.py file. Example:

# -*- coding: utf-8 -*-
import logging

# To enable the initializer feature,
# implement the initializer function as follows:
# def initializer(context):
#     logger = logging.getLogger()
#     logger.info('initializing')

def handler(event, context):
    logger = logging.getLogger()
    logger.info('hello async callback fail')
    return 'hello async callback fail'
Deploy the code to Function Compute.
s deploy
You can view the deployed function in the Function Compute console.
Invoke and debug the function from your local machine.
s invoke
After the invocation is complete, hello async callback fail is returned.
Step 3: Deploy a GPU function
Create a project directory.
mkdir fc-gpu-async-job && cd fc-gpu-async-job
Create the files based on the following directory structure. Replace the parameter values with your actual configurations when you create the files.
Directory structure:
├── fc-gpu-async-job
    ├── code
    │   ├── app.py
    │   └── Dockerfile
    └── s.yaml
Edit the s.yaml file. Example:
edition: 1.0.0
name: gpu-container-demo
# access specifies the key information required by the current application.
# For information about how to configure keys, visit https://www.serverless-devs.com/serverless-devs/command/config.
# For information about the order in which keys are used, visit https://www.serverless-devs.com/serverless-devs/tool.
access: default
vars:
  region: cn-shenzhen
services:
  customContainer-demo:
    component: devsapp/fc
    props:
      region: ${vars.region}
      service:
        name: tgpu_basic_service
        internetAccess: true
        # Obtain the logConfig configuration document from https://gitee.com/devsapp/fc/blob/main/docs/zh/yaml/service.md#logconfig.
        logConfig:
          project: aliyun**** # The project that stores the request logs. You must create the project in Simple Log Service in advance. We recommend that you configure this item.
          logstore: func**** # The Logstore that stores the request logs. You must create the Logstore in Simple Log Service in advance. We recommend that you configure this item.
          enableRequestMetrics: true
          enableInstanceMetrics: true
          logBeginRule: DefaultRegex
      function:
        name: tgpu_basic_func
        description: test gpu basic
        handler: not-used
        timeout: 600
        caPort: 9000
        # You can select an appropriate GPU-accelerated instance type based on the actual GPU memory usage. The following example uses the 1/8 virtualized GPU specification:
        instanceType: fc.gpu.tesla.1
        gpuMemorySize: 2048
        cpu: 1
        memorySize: 4096
        diskSize: 512
        instanceConcurrency: 1
        runtime: custom-container
        customContainerConfig:
          # Specify the information about your image. You must create a Container Registry Personal Edition or Enterprise Edition instance in advance. You must also create a namespace and an image repository.
          image: registry.cn-shenzhen.aliyuncs.com/my****/my****
          # Enable image acceleration. This feature can optimize the cold start of gigabyte-level images.
          accelerationType: Default
        codeUri: ./code
        # Asynchronous mode configurations. For more information, see https://gitee.com/devsapp/fc/blob/main/docs/zh/yaml/function.md#asyncconfiguration.
        asyncConfiguration:
          destination:
            # Specify the Alibaba Cloud Resource Name (ARN) of the callback function for failed invocations.
            onFailure: "acs:fc:cn-shenzhen:164901546557****:services/async-callback-service.LATEST/functions/async-callback-fail-func"
            # Specify the ARN of the callback function for successful invocations.
            onSuccess: "acs:fc:cn-shenzhen:164901546557****:services/async-callback-service.LATEST/functions/async-callback-succ-func"
          statefulInvocation: true
      triggers:
        - name: httpTrigger
          type: http
          config:
            authType: anonymous
            methods:
              - GET
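The onFailure and onSuccess destinations are Alibaba Cloud Resource Names (ARNs). Following the format shown in the example above, an ARN can be assembled programmatically, for instance when generating s.yaml files for several environments. A minimal sketch:

```python
def fc_function_arn(account_id, region, service, function, qualifier="LATEST"):
    # Matches the ARN format used in the asyncConfiguration example above:
    # acs:fc:<region>:<account-id>:services/<service>.<qualifier>/functions/<function>
    return "acs:fc:%s:%s:services/%s.%s/functions/%s" % (
        region, account_id, service, qualifier, function)

arn = fc_function_arn("164901546557****", "cn-shenzhen",
                      "async-callback-service", "async-callback-fail-func")
print(arn)
# acs:fc:cn-shenzhen:164901546557****:services/async-callback-service.LATEST/functions/async-callback-fail-func
```

Replace the masked account ID with your own Alibaba Cloud account ID when you configure the destinations.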
Edit the Dockerfile file. Example:
FROM nvidia/cuda:11.0-base
FROM ubuntu
WORKDIR /usr/src/app
RUN apt-get update
RUN apt-get install -y python3
COPY . .
CMD [ "python3", "-u", "/usr/src/app/app.py" ]
EXPOSE 9000
Edit the app.py file. Example:
# -*- coding: utf-8 -*-
from http.server import HTTPServer, BaseHTTPRequestHandler
import json
import os
import time

host = ('0.0.0.0', 9000)

class Request(BaseHTTPRequestHandler):
    def do_GET(self):
        print("simulate long execution scenario, sleep 10 seconds")
        time.sleep(10)
        print("show me GPU info")
        msg = os.popen("nvidia-smi -L").read()
        data = {'result': msg}
        self.send_response(200)
        self.send_header('Content-type', 'application/json')
        self.end_headers()
        self.wfile.write(json.dumps(data).encode())

if __name__ == '__main__':
    server = HTTPServer(host, Request)
    print("Starting server, listen at: %s:%s" % host)
    server.serve_forever()
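Before you build and push the image, you can smoke-test the handler logic locally. The sketch below reuses the same JSON response logic as app.py, with the 10-second sleep removed and an ephemeral port chosen automatically. On a machine without a GPU, nvidia-smi is absent, so the result field is simply an empty string.

```python
import json
import os
import threading
import urllib.request
from http.server import HTTPServer, BaseHTTPRequestHandler

class QuickRequest(BaseHTTPRequestHandler):
    # Same JSON response shape as app.py, without the 10-second sleep.
    def do_GET(self):
        msg = os.popen("nvidia-smi -L").read()  # empty on machines without a GPU
        self.send_response(200)
        self.send_header('Content-type', 'application/json')
        self.end_headers()
        self.wfile.write(json.dumps({'result': msg}).encode())

    def log_message(self, *args):
        pass  # silence per-request logging during the smoke test

server = HTTPServer(('127.0.0.1', 0), QuickRequest)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/' % server.server_port
data = json.loads(urllib.request.urlopen(url).read())
server.shutdown()
print(sorted(data.keys()))  # ['result']
```

This confirms the response format before the function is deployed; on a GPU-accelerated instance, the result field contains the output of nvidia-smi -L.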
Deploy the code to Function Compute.
s deploy
You can view the deployed GPU function and the asynchronous configuration of the function in the Function Compute console.
Invoke and debug the function from your local machine.
s invoke
After the invocation is complete, a JSON response that contains the output of nvidia-smi -L is returned.
Submit the asynchronous task
View the preparation status of image acceleration for the GPU function.
We recommend that you initiate an asynchronous task only after the status of image acceleration changes to Available. Otherwise, exceptions such as connection timeouts may occur.
Log on to the Function Compute console and find the GPU function tgpu_basic_func. On the Asynchronous Tasks tab, click Submit Task.
After the execution is complete, the task status changes to Successful.
You can then open the configured callback function for successful invocations, async-callback-succ-func, and find the result line of the asynchronous request in its request logs to check whether the invocation was successful.
Additional information
For more information about the best practices of GPU functions, see Use cases for serverless GPU applications.