GPUs provide higher parallel computing power than CPUs for workloads on Windows nodes and can accelerate operations by orders of magnitude. This reduces costs and improves throughput. Windows containers support GPU acceleration for Direct eXtension (DirectX) and all the frameworks that are built on top of DirectX. This topic describes how to install the DirectX device plug-in on Windows nodes and how to enable GPU acceleration for DirectX.
Prerequisites
A Container Service for Kubernetes (ACK) managed cluster is created and the Kubernetes version is 1.20.4 or later. For more information, see Create an ACK managed cluster.
A kubectl client is connected to the ACK cluster. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.
Introduction
DirectX is a collection of APIs that improves execution efficiency and enhances 3D graphics and sound effects for Windows-based games and multimedia programs. It gives designers a common hardware driver standard, which simplifies installation and setup. DirectX also lets you use GPUs for parallel, compute-intensive tasks, which reduces overhead and makes better use of GPUs as parallel processors.
Step 1: Create an elastic Windows node pool with GPU acceleration
Create a standard Windows node pool
Activate the GRID driver with a license. You can install a GRID driver in the following two ways:
If you are an enterprise user of NVIDIA, you can download and install the GRID driver from the NVIDIA enterprise licensing site.
If you are not an enterprise user of NVIDIA, you can use a community image provided by Alibaba Cloud that has the GRID driver pre-installed. For more information, see Load a GRID driver by using a community image pre-installed with the driver.
Create a Windows node pool that meets the following requirements. For more information, see Create a Windows node pool.
Instance type: GPU-accelerated compute-optimized instance types or vGPU-accelerated instance types. For more information about supported instance types, see GPU-accelerated compute-optimized instance families or vGPU-accelerated instance families.
Operating system: Select the OS based on your business requirements. Example: Windows Server 2022.
Create an elastic Windows node pool
By default, ACK supports only ECS public images as node images. To create an elastic Windows node, you must use a custom image. The process is as follows:
Submit a ticket to request a shared Windows image with a GRID driver that has an activated license. Only Windows Server 2019 and Windows Server 2022 are supported. Specify the Windows version in the ticket, and include any other requirements that you have.
Request the feature to create a node pool using a custom image.
To use this feature, submit an application in the Quota Center console.
Create a Windows node pool that meets the following requirements. For more information, see Create a Windows node pool.
Instance type: GPU-accelerated compute-optimized instance types or vGPU-accelerated instance types. For more information about supported instance types, see GPU-accelerated compute-optimized instance families or vGPU-accelerated instance families.
Operating system: Select the OS based on your business requirements. Example: Windows Server 2022.
Custom image: Select the image that you requested.
Step 2: Install the DirectX device plug-in on Windows nodes
Deploy the DirectX device plug-in as a DaemonSet on Windows nodes.
Create a file named directx-device-plugin-windows.yaml and copy the following code to the file:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    k8s-app: directx-device-plugin-windows
  name: directx-device-plugin-windows
  namespace: kube-system
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: directx-device-plugin-windows
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        k8s-app: directx-device-plugin-windows
    spec:
      tolerations:
        - operator: Exists
      # since 1.18, we can specify "hostNetwork: true" for Windows workloads, so we can deploy an application without NetworkReady.
      hostNetwork: true
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: type
                    operator: NotIn
                    values:
                      - virtual-kubelet
                  - key: beta.kubernetes.io/os
                    operator: In
                    values:
                      - windows
                  - key: windows.alibabacloud.com/deployment-topology
                    operator: In
                    values:
                      - "2.0"
                  - key: windows.alibabacloud.com/directx-supported
                    operator: In
                    values:
                      - "true"
              - matchExpressions:
                  - key: type
                    operator: NotIn
                    values:
                      - virtual-kubelet
                  - key: kubernetes.io/os
                    operator: In
                    values:
                      - windows
                  - key: windows.alibabacloud.com/deployment-topology
                    operator: In
                    values:
                      - "2.0"
                  - key: windows.alibabacloud.com/directx-supported
                    operator: In
                    values:
                      - "true"
      containers:
        - name: directx
          command:
            - pwsh.exe
            - -NoLogo
            - -NonInteractive
            - -File
            - entrypoint.ps1
          # Modify the region information in the image address below according to the region of your cluster.
          image: registry-vpc.cn-hangzhou.aliyuncs.com/acs/directx-device-plugin-windows:v1.0.0
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: host-binary
              mountPath: c:/host/opt/bin
            - name: wins-pipe
              mountPath: \\.\pipe\rancher_wins
      volumes:
        - name: host-binary
          hostPath:
            path: c:/opt/bin
            type: DirectoryOrCreate
        - name: wins-pipe
          hostPath:
            path: \\.\pipe\rancher_wins
Run the following command to deploy the directx-device-plugin-windows.yaml file and install the DirectX device plug-in.
kubectl create -f directx-device-plugin-windows.yaml
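The nodeAffinity rules in the DaemonSet manifest above decide which nodes run the plug-in. The following is a minimal Python sketch of how Kubernetes evaluates them: the entries under nodeSelectorTerms are ORed, and the matchExpressions inside one term are ANDed. It covers only the In and NotIn operators used in the manifest, and the two node label sets at the end are hypothetical examples, not labels read from a real cluster.

```python
from typing import Dict, List

Labels = Dict[str, str]


def expression_matches(labels: Labels, expr: dict) -> bool:
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op, values = expr["key"], expr["operator"], expr.get("values", [])
    if op == "In":
        # A missing label never satisfies In.
        return labels.get(key) in values
    if op == "NotIn":
        # A missing label always satisfies NotIn.
        return labels.get(key) not in values
    raise ValueError(f"operator not covered by this sketch: {op}")


def node_matches(labels: Labels, terms: List[dict]) -> bool:
    """Terms are ORed; expressions inside one term are ANDed."""
    return any(
        all(expression_matches(labels, e) for e in term["matchExpressions"])
        for term in terms
    )


def plugin_terms() -> List[dict]:
    """Rebuild the two nodeSelectorTerms from the DaemonSet manifest."""
    def term(os_key: str) -> dict:
        return {"matchExpressions": [
            {"key": "type", "operator": "NotIn", "values": ["virtual-kubelet"]},
            {"key": os_key, "operator": "In", "values": ["windows"]},
            {"key": "windows.alibabacloud.com/deployment-topology",
             "operator": "In", "values": ["2.0"]},
            {"key": "windows.alibabacloud.com/directx-supported",
             "operator": "In", "values": ["true"]},
        ]}
    # One term for the legacy OS label, one for the current OS label.
    return [term("beta.kubernetes.io/os"), term("kubernetes.io/os")]


# Hypothetical label sets, for illustration only.
gpu_node = {
    "kubernetes.io/os": "windows",
    "windows.alibabacloud.com/deployment-topology": "2.0",
    "windows.alibabacloud.com/directx-supported": "true",
}
plain_node = {"kubernetes.io/os": "windows"}

print(node_matches(gpu_node, plugin_terms()))    # True
print(node_matches(plain_node, plugin_terms()))  # False
```

If the plug-in pod does not appear on a node, comparing the node's actual labels (for example, from kubectl get nodes --show-labels) with these four expressions is a reasonable first check.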
Step 3: Deploy a Windows workload that has GPU acceleration enabled for DirectX
The DirectX device plug-in can automatically add the class/<interface class GUID> device to Windows containers to enable accessing DirectX services on the Elastic Compute Service (ECS) host. For more information, see Devices in containers on Windows.
Add the following resources parameter to the Windows workload that requires GPU acceleration, and then redeploy the workload:
  spec:
    ...
    template:
      ...
      spec:
        ...
        containers:
          - name: gpu-user
            ...
+           resources:
+             limits:
+               windows.alibabacloud.com/directx: "1"
+             requests:
+               windows.alibabacloud.com/directx: "1"
The preceding configuration neither allocates all GPU resources on the ECS host to the container nor prevents other applications from accessing the GPUs on the ECS host. Instead, GPU resources are dynamically scheduled between the ECS host and containers, so you can run multiple Windows containers on the ECS host and each container can use DirectX hardware acceleration.
For more information about GPU acceleration in Windows containers, see GPU acceleration in Windows containers.
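The resources change in this step can also be applied programmatically. The following is a minimal Python sketch that assumes the workload manifest has already been loaded into a dictionary (for example, by a YAML parser). The function name request_directx is hypothetical; the resource name and the container name gpu-user come from the snippet above. Because windows.alibabacloud.com/directx is an extended resource, the sketch sets requests and limits to the same value.

```python
import copy

# Resource name taken from the snippet in this step.
DIRECTX_RESOURCE = "windows.alibabacloud.com/directx"


def request_directx(workload: dict, container_name: str, count: int = 1) -> dict:
    """Return a copy of the workload with the DirectX resource set on one container."""
    patched = copy.deepcopy(workload)  # leave the caller's manifest untouched
    containers = patched["spec"]["template"]["spec"]["containers"]
    for container in containers:
        if container["name"] == container_name:
            resources = container.setdefault("resources", {})
            # Extended resources cannot be overcommitted, so requests equal limits.
            resources.setdefault("limits", {})[DIRECTX_RESOURCE] = str(count)
            resources.setdefault("requests", {})[DIRECTX_RESOURCE] = str(count)
            return patched
    raise KeyError(f"no container named {container_name!r}")


# Skeleton of a workload with the elided fields left out.
workload = {
    "spec": {"template": {"spec": {"containers": [{"name": "gpu-user"}]}}}
}
patched = request_directx(workload, "gpu-user")
print(patched["spec"]["template"]["spec"]["containers"][0]["resources"])
```

The function returns a patched copy instead of editing the manifest in place, which keeps the original available for comparison before you redeploy.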
Step 4: Verify whether GPU acceleration is enabled for the Windows workload
Use the following sample Job to verify that the DirectX device plug-in is deployed on Windows nodes and that GPU acceleration works.
Create a file named gpu-job-windows.yaml and copy the following code to the file:
apiVersion: batch/v1
kind: Job
metadata:
  labels:
    k8s-app: gpu-job-windows
  name: gpu-job-windows
  namespace: default
spec:
  parallelism: 1
  completions: 1
  backoffLimit: 3
  manualSelector: true
  selector:
    matchLabels:
      k8s-app: gpu-job-windows
  template:
    metadata:
      labels:
        k8s-app: gpu-job-windows
    spec:
      restartPolicy: Never
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: type
                    operator: NotIn
                    values:
                      - virtual-kubelet
                  - key: beta.kubernetes.io/os
                    operator: In
                    values:
                      - windows
              - matchExpressions:
                  - key: type
                    operator: NotIn
                    values:
                      - virtual-kubelet
                  - key: kubernetes.io/os
                    operator: In
                    values:
                      - windows
      tolerations:
        - key: os
          value: windows
      containers:
        - name: gpu
          # Modify the region information in the image address below according to the region of your cluster.
          image: registry-vpc.cn-hangzhou.aliyuncs.com/acs/sample-gpu-windows:v1.0.0
          imagePullPolicy: IfNotPresent
          resources:
            limits:
              windows.alibabacloud.com/directx: "1"
            requests:
              windows.alibabacloud.com/directx: "1"
Note
The image registry-vpc.{region}.aliyuncs.com/acs/sample-gpu-windows is a sample image for GPU acceleration in Windows containers provided by ACK. This image is built on top of Microsoft Windows. For more information, see microsoft-windows.
In this example, WinMLRunner is used to generate simulated input data. After GPU acceleration is enabled for the gpu-job-windows task, 100 evaluations are performed based on the Tiny YOLOv2 model to output the final performance data. Actual results may vary depending on your operating environment.
The image is 15.3 GB in size, so pulling it may take a long time when you use it to deploy applications.
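The image addresses in this topic follow the pattern registry-vpc.{region}.aliyuncs.com/acs/<repository>:<tag>. The following is a minimal Python sketch that fills in the region for a given cluster. The function name regional_image is hypothetical, and the sketch only builds the string: it does not check whether the image is actually available in the region you pass.

```python
def regional_image(repository: str, tag: str, region: str) -> str:
    """Build a registry-vpc image address for one region ID."""
    # Reject values that would change the host or path part of the address.
    if not region or "/" in region or "." in region:
        raise ValueError(f"unexpected region ID: {region!r}")
    return f"registry-vpc.{region}.aliyuncs.com/acs/{repository}:{tag}"


# Reproduces the address used in the sample Job.
print(regional_image("sample-gpu-windows", "v1.0.0", "cn-hangzhou"))
# registry-vpc.cn-hangzhou.aliyuncs.com/acs/sample-gpu-windows:v1.0.0
```

Replace cn-hangzhou with the region ID of your cluster, and use the resulting address in the image field of the manifest.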
Run the following command to deploy gpu-job-windows.yaml and create the sample application:
kubectl create -f gpu-job-windows.yaml
Run the following command to query the log of the gpu-job-windows application:
kubectl logs -f job/gpu-job-windows
Expected output:
INFO: Executing model of "tinyyolov2-7" 100 times within GPU driver ...
Created LearningModelDevice with GPU: NVIDIA GRID T4-8Q
Loading model (path = c:\data\tinyyolov2-7\model.onnx)...
=================================================================
Name: Example Model
Author: OnnxMLTools
Version: 0
Domain: onnxconverter-common
Description: The Tiny YOLO network from the paper 'YOLO9000: Better, Faster, Stronger' (2016), arXiv:1612.08242
Path: c:\data\tinyyolov2-7\model.onnx
Support FP16: false
Input Feature Info:
Name: image
Feature Kind: Image (Height: 416, Width: 416)
Output Feature Info:
Name: grid
Feature Kind: Float
The line Created LearningModelDevice with GPU: NVIDIA GRID T4-8Q in the output shows that GPU acceleration is enabled for the gpu-job-windows application.
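If you want to automate this check, the following minimal Python sketch looks for the "Created LearningModelDevice with GPU:" line in the log text and returns the reported GPU name. The function name detected_gpu is hypothetical, and the sketch assumes that the log wording matches the expected output above. The sample string is a shortened copy of that output.

```python
import re
from typing import Optional

# Matches the device line printed in the expected output and captures the GPU name.
GPU_LINE = re.compile(
    r"^Created LearningModelDevice with GPU:\s*(?P<gpu>.+?)\s*$",
    re.MULTILINE,
)


def detected_gpu(log_text: str) -> Optional[str]:
    """Return the GPU name reported in the log, or None if no GPU line is found."""
    match = GPU_LINE.search(log_text)
    return match.group("gpu") if match else None


# Shortened copy of the expected output, used as sample input.
sample = (
    'INFO: Executing model of "tinyyolov2-7" 100 times within GPU driver ...\n'
    "Created LearningModelDevice with GPU: NVIDIA GRID T4-8Q\n"
    "Loading model (path = c:\\data\\tinyyolov2-7\\model.onnx)...\n"
)

print(detected_gpu(sample))  # NVIDIA GRID T4-8Q
```

Pass the saved output of the kubectl logs command to detected_gpu. A return value of None suggests that the job did not create a GPU device, in which case the node labels and the resources setting are worth rechecking.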