After you migrate from an ACK dedicated cluster to an ACK Pro cluster, you cannot directly use the cGPU component that was installed in the dedicated cluster. You must update the cGPU component before you can use GPU scheduling and isolation. This topic describes how to update the cGPU component in an ACK Pro cluster.
Prerequisites
Your applications are migrated from an ACK dedicated cluster to an ACK Pro cluster. The cGPU component is installed in the ACK dedicated cluster. For more information, see Hot migration from ACK dedicated clusters to ACK Pro clusters.
Procedure
Obtain the kubeconfig file of the cluster and connect a kubectl client to the cluster.
Download the job YAML file that is used to change the node label and uninstall the cGPU component. To download the YAML file, click gpushare-label-change.yaml.
Run the following command to deploy the job, which changes the node labels and uninstalls the existing cGPU component:
kubectl apply -f gpushare-label-change.yaml
Run the following command to check whether the job has finished running:
kubectl get po -l app=change-gpushare-labels -n kube-system
Expected output:
NAME                             READY   STATUS      RESTARTS   AGE
gpushare-label-migration-v****   0/1     Completed   0          89s
The output indicates that the job is in the Completed state.
Install the cGPU component. For more information, see Install the cGPU component.
Install the GPU memory inspection tool in the cluster. For more information, see Install and use the GPU memory inspection tool.
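The status check in the procedure above can be scripted so that you do not install the cGPU component before the label-change job has finished. The following is a minimal sketch, not part of the official procedure. The function name all_pods_completed is hypothetical, and the sketch assumes the default table output of kubectl get po, in which STATUS is the third column.

```shell
#!/bin/sh
# Hypothetical helper: reads the table printed by `kubectl get po` on stdin.
# Succeeds only if at least one pod is listed and every listed pod has
# STATUS "Completed" (third column of the default output, an assumption).
all_pods_completed() {
  awk 'NR > 1 { seen = 1; if ($3 != "Completed") bad = 1 }
       END { exit ((seen && !bad) ? 0 : 1) }'
}

# Example use against a live cluster (label and namespace come from the
# procedure above); left commented out so the sketch has no side effects:
# kubectl get po -l app=change-gpushare-labels -n kube-system \
#   | all_pods_completed && echo "Label-change job completed"
```

If the helper returns a nonzero exit code, the job is still running, has failed, or no matching pod was found. In that case, rerun the kubectl get po command and inspect the pod before you continue.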
What to do next
For more information about how to verify the GPU sharing and memory isolation features, see Examples of using cGPU to share GPUs.