How to enable GPU acceleration for DirectX in Windows containers - Container Service for Kubernetes

For many workloads on Windows nodes, GPUs provide far more parallel computing power than CPUs and can speed up operations by orders of magnitude, which reduces costs and improves throughput. Windows containers support GPU acceleration for DirectX and all frameworks that are built on top of DirectX. This topic describes how to install the DirectX device plug-in on Windows nodes and how to enable GPU acceleration for DirectX.

Prerequisites

A Container Service for Kubernetes (ACK) managed cluster is created and the Kubernetes version is 1.20.4 or later. For more information, see Create an ACK managed cluster.
A kubectl client is connected to the ACK cluster. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.
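As a quick pre-flight check, the version requirement above can be tested from the machine where kubectl is configured. The following is only a sketch: `version_at_least` is a hypothetical helper, it relies on `sort -V` (available in GNU coreutils), and it assumes the server version string looks like `v1.20.4-aliyun.1`.

```shell
# Hypothetical helper: succeeds if version $2 is greater than or equal to the minimum version $1.
# A leading "v" and any "-suffix" or "+suffix" are stripped first,
# for example v1.22.3-aliyun.1 becomes 1.22.3.
version_at_least() {
  min=$1
  cur=$(printf '%s' "$2" | sed -e 's/^v//' -e 's/[-+].*$//')
  # sort -V orders version strings; the minimum must sort first (or be equal).
  [ "$(printf '%s\n%s\n' "$min" "$cur" | sort -V | head -n 1)" = "$min" ]
}

# Usage: pass the Server Version reported by "kubectl version" as the second argument.
#   version_at_least 1.20.4 v1.22.3-aliyun.1 && echo "cluster version OK"
```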

Introduction

DirectX is a set of APIs that improves execution efficiency and enhances 3D graphics and sound effects for Windows-based games and multimedia programs. It gives developers a common hardware driver standard, which simplifies installation and setup. With DirectX, you can use GPUs to handle parallel, compute-intensive tasks. It also reduces overhead and makes better use of GPUs as parallel processors.

Step 1: Create an elastic Windows node pool with GPU acceleration

Create a standard Windows node pool

Activate the GRID driver with a license. You can install the GRID driver in one of the following ways:
- If you are an NVIDIA enterprise user, download and install the GRID driver from the NVIDIA enterprise licensing site.
- If you are not an NVIDIA enterprise user, load the GRID driver by using a community image, provided by Alibaba Cloud, that has the driver pre-installed.
Create a Windows node pool that meets the following requirements. For more information, see Create a Windows node pool.
- Instance type: GPU-accelerated compute-optimized instance types or vGPU-accelerated instance types. For more information about supported instance types, see GPU-accelerated compute-optimized instance families or vGPU-accelerated instance families.
- Operating system: Select the OS based on your business requirements. Example: Windows Server 2022.

Create an elastic Windows node pool

By default, ACK supports only ECS public images as node images. To create an elastic Windows node, you must use a custom image. The process is as follows.

Submit a ticket to request a shared Windows image with a GRID driver that has an activated license. Only Windows Server 2019 and Windows Server 2022 are supported. Specify the Windows version in the ticket, along with any other requirements that you have.
Create a Windows node pool that meets the following requirements. For more information, see Create a Windows node pool.
- Instance type: GPU-accelerated compute-optimized instance types or vGPU-accelerated instance types. For more information about supported instance types, see GPU-accelerated compute-optimized instance families or vGPU-accelerated instance families.
- Operating system: Select the OS based on your business requirements. Example: Windows Server 2022.
- Custom image: Select the image that you requested.
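
After the node pool is ready, it can help to confirm that the Windows nodes carry the labels that the DaemonSet in Step 2 selects on. The sketch below copies the label keys and values from that manifest's nodeAffinity. The helper names are hypothetical, and whether ACK applies these labels automatically is not covered in this topic.

```shell
# Hypothetical helper: prints a label selector built from the node labels
# that the Step 2 DaemonSet requires in its nodeAffinity.
directx_node_selector() {
  printf '%s,%s,%s,%s' \
    'type!=virtual-kubelet' \
    'kubernetes.io/os=windows' \
    'windows.alibabacloud.com/deployment-topology=2.0' \
    'windows.alibabacloud.com/directx-supported=true'
}

# Lists the nodes that the device plug-in could be scheduled on.
list_directx_nodes() {
  kubectl get nodes -l "$(directx_node_selector)" -o wide
}
```

Running `list_directx_nodes` against the cluster and getting an empty list would suggest that the DaemonSet has no node to run on. The manifest also accepts the older `beta.kubernetes.io/os` label, which this sketch does not check.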

Step 2: Install the DirectX device plug-in on Windows nodes

Deploy the DirectX device plug-in as a DaemonSet on Windows nodes.

Create a file named directx-device-plugin-windows.yaml and copy the following code to the file:

```
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    k8s-app: directx-device-plugin-windows
  name: directx-device-plugin-windows
  namespace: kube-system
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: directx-device-plugin-windows
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        k8s-app: directx-device-plugin-windows
    spec:
      tolerations:
        - operator: Exists
      # since 1.18, we can specify "hostNetwork: true" for Windows workloads, so we can deploy an application without NetworkReady.
      hostNetwork: true
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: type
                    operator: NotIn
                    values:
                      - virtual-kubelet
                  - key: beta.kubernetes.io/os
                    operator: In
                    values:
                      - windows
                  - key: windows.alibabacloud.com/deployment-topology
                    operator: In
                    values:
                      - "2.0"
                  - key: windows.alibabacloud.com/directx-supported
                    operator: In
                    values:
                      - "true"
              - matchExpressions:
                  - key: type
                    operator: NotIn
                    values:
                      - virtual-kubelet
                  - key: kubernetes.io/os
                    operator: In
                    values:
                      - windows
                  - key: windows.alibabacloud.com/deployment-topology
                    operator: In
                    values:
                      - "2.0"
                  - key: windows.alibabacloud.com/directx-supported
                    operator: In
                    values:
                      - "true"
      containers:
        - name: directx
          command:
            - pwsh.exe
            - -NoLogo
            - -NonInteractive
            - -File
            - entrypoint.ps1
          # Modify the region information in the image address below according to the region of your cluster.
          image: registry-cn-hangzhou-vpc.ack.aliyuncs.com/acs/directx-device-plugin-windows:v1.0.0
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: host-binary
              mountPath: c:/host/opt/bin
            - name: wins-pipe
              mountPath: \\.\pipe\rancher_wins
      volumes:
        - name: host-binary
          hostPath:
            path: c:/opt/bin
            type: DirectoryOrCreate
        - name: wins-pipe
          hostPath:
            path: \\.\pipe\rancher_wins
```
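
The comment in the manifest asks you to adjust the region in the image address. One way to do this without editing the file by hand is a small `sed` rewrite. The sketch below assumes that the address always follows the `registry-<region>-vpc.ack.aliyuncs.com` pattern shown above. `set_image_region` is a hypothetical helper name, and `cn-beijing` is only an example region ID.

```shell
# Hypothetical helper: rewrites the region in ACK VPC registry addresses read from stdin.
# $1 is the target region ID, for example cn-beijing.
set_image_region() {
  sed "s#registry-[a-z0-9-]*-vpc\.ack\.aliyuncs\.com#registry-$1-vpc.ack.aliyuncs.com#g"
}

# Usage: write a region-specific copy of the manifest and review it before deploying.
#   set_image_region cn-beijing < directx-device-plugin-windows.yaml > directx-device-plugin-windows.cn-beijing.yaml
```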

Run the following command to deploy the directx-device-plugin-windows.yaml file and install the DirectX device plug-in.
```
kubectl create -f directx-device-plugin-windows.yaml
```
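
After the manifest is applied, you may want to confirm that the DaemonSet has started and that the nodes advertise the DirectX resource. The sketch below wraps two standard kubectl commands. The function name and the 10-minute timeout are illustrative assumptions, and the resource name `windows.alibabacloud.com/directx` is the one used in the workload examples in this topic.

```shell
# Hypothetical helper: waits for the device plug-in DaemonSet to roll out, then
# prints each Windows node with its allocatable windows.alibabacloud.com/directx count.
# Dots inside the resource name are escaped so that they are not read as path separators.
check_directx_plugin() {
  kubectl -n kube-system rollout status daemonset/directx-device-plugin-windows --timeout=10m &&
    kubectl get nodes -l kubernetes.io/os=windows \
      -o 'custom-columns=NODE:.metadata.name,DIRECTX:.status.allocatable.windows\.alibabacloud\.com/directx'
}
```

A node that shows `<none>` in the DIRECTX column has not registered the resource, so pods that request it are not expected to be scheduled there.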

Step 3: Deploy a Windows workload that has GPU acceleration enabled for DirectX

The DirectX device plug-in automatically adds the class/<interface class GUID> device to Windows containers so that they can access DirectX services on the Elastic Compute Service (ECS) host. For more information, see Devices in containers on Windows.

Add the following resources settings (the lines marked with +) to the Windows workload that requires GPU acceleration, and then redeploy the workload:

```
spec:
  ...
  template:
    ...
    spec:
      ...
      containers:
        - name: gpu-user
          ...
+         resources:
+           limits:
+             windows.alibabacloud.com/directx: "1"
+           requests:
+             windows.alibabacloud.com/directx: "1"
```

Important

The preceding configuration neither allocates all GPU resources on the ECS host to the container nor prevents other applications from accessing the GPUs on the ECS host. Instead, GPU resources are dynamically scheduled between the ECS host and the containers. This means that you can run multiple Windows containers on the ECS host, and each container can use DirectX hardware acceleration.

For more information about GPU acceleration in Windows containers, see GPU acceleration in Windows containers.
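
For a workload that is already running, the same resources settings can be applied with a strategic merge patch instead of editing the manifest. This is a sketch under assumptions: `directx_patch` is a hypothetical helper, `my-app` is a placeholder Deployment name, and the container name `gpu-user` is taken from the snippet above.

```shell
# Hypothetical helper: prints a strategic merge patch that requests one DirectX
# device for the container named $1. Containers in a pod template are merged by
# name, so the other fields of that container are left unchanged.
directx_patch() {
  printf '{"spec":{"template":{"spec":{"containers":[{"name":"%s","resources":{"limits":{"windows.alibabacloud.com/directx":"1"},"requests":{"windows.alibabacloud.com/directx":"1"}}}]}}}}' "$1"
}

# Usage (my-app is a placeholder Deployment name). Changing the pod template
# triggers a rolling update, which redeploys the workload.
#   kubectl patch deployment my-app -p "$(directx_patch gpu-user)"
```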

Step 4: Verify whether GPU acceleration is enabled for the Windows workload

You can run the following sample Job to verify that the DirectX device plug-in is deployed on the Windows nodes and that GPU acceleration is available to Windows containers.

Create a file named gpu-job-windows.yaml and copy the following code to the file:

```
apiVersion: batch/v1
kind: Job
metadata:
  labels:
    k8s-app: gpu-job-windows
  name: gpu-job-windows
  namespace: default
spec:
  parallelism: 1
  completions: 1
  backoffLimit: 3
  manualSelector: true
  selector:
    matchLabels:
      k8s-app: gpu-job-windows
  template:
    metadata:
      labels:
        k8s-app: gpu-job-windows
    spec:
      restartPolicy: Never
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: type
                    operator: NotIn
                    values:
                      - virtual-kubelet
                  - key: beta.kubernetes.io/os
                    operator: In
                    values:
                      - windows
              - matchExpressions:
                  - key: type
                    operator: NotIn
                    values:
                      - virtual-kubelet
                  - key: kubernetes.io/os
                    operator: In
                    values:
                      - windows
      tolerations:
        - key: os
          value: windows
      containers:
        - name: gpu
          # Modify the region information in the image address below according to the region of your cluster.
          image: registry-cn-hangzhou-vpc.ack.aliyuncs.com/acs/sample-gpu-windows:v1.0.0
          imagePullPolicy: IfNotPresent
          resources:
            limits:
              windows.alibabacloud.com/directx: "1"
            requests:
              windows.alibabacloud.com/directx: "1"
```

Note

The registry-{region}-vpc.ack.aliyuncs.com/acs/sample-gpu-windows image is a sample image that ACK provides for GPU acceleration in Windows containers. This image is built on top of Microsoft Windows. For more information, see microsoft-windows.
In this example, WinMLRunner is used to generate simulated input data. After GPU acceleration is enabled for the gpu-job-windows Job, 100 evaluations are performed based on the Tiny YOLOv2 model to output the final performance data. Actual results may vary depending on your operating environment.
The image is 15.3 GB in size, so pulling it may take a long time when you deploy the application.

Run the following command to deploy gpu-job-windows.yaml and create the sample application:
```
kubectl create -f gpu-job-windows.yaml
```
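
Because the sample image is large, the Job may spend a long time pulling the image before it runs. A sketch of waiting for completion with `kubectl wait` follows. The helper name is hypothetical and the default 60-minute timeout is an arbitrary assumption, so adjust it to your environment.

```shell
# Hypothetical helper: blocks until the sample Job reports the Complete condition.
# $1 is an optional timeout (default 60m). If the Job fails instead of completing,
# the command keeps waiting until the timeout expires and then returns non-zero.
wait_for_gpu_job() {
  kubectl -n default wait --for=condition=complete job/gpu-job-windows --timeout="${1:-60m}"
}

# Usage:
#   wait_for_gpu_job        # wait up to 60 minutes
#   wait_for_gpu_job 15m    # wait up to 15 minutes
```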

Run the following command to query the log of the gpu-job-windows Job. The job/ prefix is required because the pod that the Job creates has a generated name:
```
kubectl logs -f job/gpu-job-windows
```

Expected output:

```
INFO: Executing model of "tinyyolov2-7" 100 times within GPU driver ...

Created LearningModelDevice with GPU: NVIDIA GRID T4-8Q
Loading model (path = c:\data\tinyyolov2-7\model.onnx)...
=================================================================
Name: Example Model
Author: OnnxMLTools
Version: 0
Domain: onnxconverter-common
Description: The Tiny YOLO network from the paper 'YOLO9000: Better, Faster, Stronger' (2016), arXiv:1612.08242
Path: c:\data\tinyyolov2-7\model.onnx
Support FP16: false

Input Feature Info:
Name: image
Feature Kind: Image (Height: 416, Width:  416)

Output Feature Info:
Name: grid
Feature Kind: Float
```

The output shows that GPU acceleration is enabled for the gpu-job-windows application.
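
To make this check scriptable, the log can be searched for the line in which WinMLRunner reports the device it created. The sketch below keys on the `Created LearningModelDevice with GPU:` text from the expected output above. The helper name is hypothetical, and other WinMLRunner versions may word the line differently.

```shell
# Hypothetical helper: reads a log on stdin and prints the GPU name if a
# "Created LearningModelDevice with GPU: <name>" line is present.
# Returns non-zero if no such line is found.
gpu_from_log() {
  # Windows logs may use CRLF line endings, so carriage returns are removed first.
  tr -d '\r' | sed -n 's/^Created LearningModelDevice with GPU: //p' | grep .
}

# Usage:
#   kubectl logs job/gpu-job-windows | gpu_from_log
# Clean up the sample Job when you are done:
#   kubectl delete -f gpu-job-windows.yaml
```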