Container Service for Kubernetes: Configure a GPU selection policy for nodes with the GPU sharing feature enabled

Last Updated: Jun 06, 2024

By default, the scheduler allocates all resources of a GPU on a node to pods before it switches to another GPU. This helps prevent GPU fragmentation. In some scenarios, you may want to spread pods across different GPUs on a node so that your business is less affected if a GPU becomes faulty. This topic describes how to configure a GPU selection policy for nodes with the GPU sharing feature enabled.

Prerequisites

Policy description

If a node with the GPU sharing feature enabled has multiple GPUs, you can choose one of the following GPU selection policies:

  • Binpack: This is the default policy. The scheduler allocates all resources of a GPU to pods before it switches to another GPU. This helps prevent GPU fragmentation.

  • Spread: The scheduler attempts to spread pods across different GPUs on the node so that your business is less affected if a GPU becomes faulty.

In this example, a node has two GPUs. Each GPU provides 15 GiB of memory. Pod1 requests 2 GiB of memory and Pod2 requests 3 GiB of memory.

(Figure: GPU allocation of Pod1 and Pod2 under the binpack and spread policies)
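The GPU selection policy is configured per node through node labels, which Step 1 applies through the node pool configuration. The following snippet is a minimal, illustrative sketch of how the two labels appear on a node object; the node name is taken from the example output in Step 3 and is not a requirement:

apiVersion: v1
kind: Node
metadata:
  name: cn-shanghai.192.0.2.109      # Illustrative node name.
  labels:
    ack.node.gpu.schedule: cgpu      # Enables GPU sharing and GPU memory isolation.
    ack.node.gpu.placement: spread   # Selects the spread policy. Binpack is used when this label is not set.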

Step 1: Create a node pool

By default, the binpack policy is used to select GPUs. To use the spread policy, perform the following steps:

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its name to go to the cluster details page. In the left-side navigation pane, choose Nodes > Node Pools.

  3. In the upper-right corner of the Node Pools page, click Create Node Pool.

  4. In the Create Node Pool dialog box, configure the parameters for the node pool and click Confirm Order. The following list describes the key parameters. For more information about other parameters, see Create a node pool.

    • Instance Type: Set Architecture to GPU-accelerated and select multiple GPU-accelerated instance types. The spread policy takes effect only on nodes that have more than one GPU. Therefore, select instance types that provide multiple GPUs.

    • Expected Nodes: Specify the initial number of nodes in the node pool. If you do not want to create nodes in the node pool, set this parameter to 0.

    • Node Label: Click the add icon and add the following two labels:

      • A label whose key is ack.node.gpu.schedule and value is cgpu. This label enables the GPU sharing and GPU memory isolation features.

      • A label whose key is ack.node.gpu.placement and value is spread. This label enables the spread policy.
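After nodes are added to the node pool, you can check whether the two labels are applied. The following command is a minimal sketch that relies only on standard kubectl; it lists the nodes together with the values of the two labels:

kubectl get nodes -L ack.node.gpu.schedule -L ack.node.gpu.placement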

Step 2: Submit a job

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Workloads > Jobs.

  3. Click Create from YAML in the upper-right part of the page, copy the following code to the Template editor, and then modify the parameters based on the following comments. After you complete the configuration, click Create.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: tensorflow-mnist-spread
    spec:
      parallelism: 3
      template:
        metadata:
          labels:
            app: tensorflow-mnist-spread
        spec:
          nodeSelector:
            kubernetes.io/hostname: <NODE_NAME> # Replace <NODE_NAME> with the name of a GPU-accelerated node in the cluster, such as cn-shanghai.192.0.2.109.
          containers:
          - name: tensorflow-mnist-spread
            image: registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:tensorflow-1.5
            command:
            - python
            - tensorflow-sample-code/tfjob/docker/mnist/main.py
            - --max_steps=100000
            - --data_dir=tensorflow-sample-code/data
            resources:
              limits:
                aliyun.com/gpu-mem: 4 # Request 4 GiB of GPU memory.
            workingDir: /root
          restartPolicy: Never

    YAML template description:

    • This YAML template defines a TensorFlow MNIST job. The job creates three pods, and each pod requests 4 GiB of GPU memory.

    • The resource limit aliyun.com/gpu-mem: 4 is used to request GPU memory for the pods.

    • To make the GPU selection policy take effect on a specific node, the nodeSelector kubernetes.io/hostname: <NODE_NAME> is added to the YAML template to schedule the pods to that node.
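After the job is created, you can confirm that the three pods are scheduled to the specified node before you proceed to the next step. The following command is a minimal sketch that uses the app label defined in the YAML template; the -o wide option shows the node that runs each pod:

kubectl get pods -l app=tensorflow-mnist-spread -o wide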

Step 3: Verify whether the spread policy is used

Use the GPU inspection tool to query the GPU allocation information of the node.

kubectl inspect cgpu

NAME                   IPADDRESS      GPU0(Allocated/Total)  GPU1(Allocated/Total)  GPU2(Allocated/Total)  GPU3(Allocated/Total)  GPU Memory(GiB)
cn-shanghai.192.0.2.109  192.0.2.109  4/15                   4/15                   0/15                   4/15                   12/60
--------------------------------------------------------------------------------------
Allocated/Total GPU Memory In Cluster:
12/60 (20%)

The output shows that the three pods are allocated to three different GPUs (GPU0, GPU1, and GPU3) on the node. This indicates that the spread policy has taken effect.
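If the GPU inspection tool is not installed, you can still view the total amount of aliyun.com/gpu-mem allocated on the node with standard kubectl, although the output does not show the per-GPU distribution. A minimal sketch, assuming the node name from this example:

kubectl describe node cn-shanghai.192.0.2.109 | grep -A 10 "Allocated resources"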