
Container Service for Kubernetes: Configure GPU resources for a Knative Service and enable GPU sharing

Last Updated: Nov 04, 2024

When you deploy AI, high-performance computing (HPC), or other workloads that require GPU resources in Knative, you can specify GPU-accelerated instance types in the Knative Service to create GPU-accelerated instances. You can also enable the GPU sharing feature to allow multiple pods to share a single GPU and maximize GPU utilization.

Prerequisites

Knative has been deployed in your cluster. For more information, see Deploy Knative.

Configure GPU resources

To specify a GPU-accelerated ECS instance type, add the k8s.aliyun.com/eci-use-specs annotation to the spec.template.metadata.annotations section of the Knative Service configuration. To specify the amount of GPU resources that the Service requires, add the nvidia.com/gpu field to the spec.template.spec.containers.resources.limits section.

The following code block is an example:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    metadata:
      labels:
        app: helloworld-go
      annotations:
        k8s.aliyun.com/eci-use-specs: ecs.gn5i-c4g1.xlarge  # Specify a GPU-accelerated ECS instance type that is supported by Knative. 
    spec:
      containers:
        - image: registry.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:73fbdd56
          ports:
          - containerPort: 8080
          resources:
            limits:
              nvidia.com/gpu: '1'    # Specify the number of GPUs that are required by the container. This field is required. If you do not specify this field, an error is returned when the pod is launched. 
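
After you deploy the Service, you can check that the revision pod was created with the expected GPU configuration. The following commands are a minimal sketch; they assume that the preceding manifest is saved as helloworld-go.yaml (a hypothetical file name) and that the Service runs in the default namespace:

# Deploy the Knative Service.
kubectl apply -f helloworld-go.yaml

# Wait for the revision pod to be created, then view its status.
kubectl get pods -l app=helloworld-go

# Confirm the GPU limit that is set on the container.
kubectl get pods -l app=helloworld-go -o jsonpath='{.items[0].spec.containers[0].resources.limits}'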

The following GPU-accelerated ECS instance families are supported:

  • gn7i, a GPU-accelerated compute-optimized instance family that uses NVIDIA A10 GPUs. This instance family includes a variety of instance types, such as ecs.gn7i-c8g1.2xlarge.

  • gn7, a GPU-accelerated compute-optimized instance family that uses NVIDIA A100 GPUs. This instance family includes a variety of instance types, such as ecs.gn7-c12g1.3xlarge.

  • gn6v, a GPU-accelerated compute-optimized instance family that uses NVIDIA V100 GPUs. This instance family includes a variety of instance types, such as ecs.gn6v-c8g1.2xlarge.

  • gn6e, a GPU-accelerated compute-optimized instance family that uses NVIDIA V100 GPUs. This instance family includes a variety of instance types, such as ecs.gn6e-c12g1.3xlarge.

  • gn6i, a GPU-accelerated compute-optimized instance family that uses NVIDIA T4 GPUs. This instance family includes a variety of instance types, such as ecs.gn6i-c4g1.xlarge.

  • gn5i, a GPU-accelerated compute-optimized instance family that uses NVIDIA P4 GPUs. This instance family includes a variety of instance types, such as ecs.gn5i-c2g1.large.

  • gn5, a GPU-accelerated compute-optimized instance family that uses NVIDIA P100 GPUs. This instance family includes a variety of instance types, such as ecs.gn5-c4g1.xlarge.

    Note: The gn5 instance family is equipped with local disks. You can mount local disks to elastic container instances. For more information, see Create an elastic container instance that has local disks attached.
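
If the preferred instance type may be out of stock in a region, the k8s.aliyun.com/eci-use-specs annotation also accepts a comma-separated list of candidate instance types, which are tried in the order listed. The following snippet is a sketch of such a fallback configuration; the combination of instance types shown here is only an example:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    metadata:
      labels:
        app: helloworld-go
      annotations:
        # Candidate GPU-accelerated instance types, tried in the order listed.
        k8s.aliyun.com/eci-use-specs: "ecs.gn6i-c4g1.xlarge,ecs.gn5i-c4g1.xlarge"
    spec:
      containers:
        - image: registry.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:73fbdd56
          ports:
          - containerPort: 8080
          resources:
            limits:
              nvidia.com/gpu: '1'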

Enable GPU sharing

  1. Enable the GPU sharing feature for the nodes in your cluster. For more information, see the examples.

  2. Configure the aliyun.com/gpu-mem field in the Knative Service to specify the amount of GPU memory (in GiB) that is allocated to each pod. Example:

    apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: helloworld-go
      namespace: default
    spec:
      template:
        metadata:
          annotations:
            autoscaling.knative.dev/maxScale: "100"
            autoscaling.knative.dev/minScale: "0"
        spec:
          containerConcurrency: 1
          containers:
          - image: registry-vpc.cn-hangzhou.aliyuncs.com/hz-suoxing-test/test:helloworld-go
            name: user-container
            ports:
            - containerPort: 6666
              name: http1
              protocol: TCP
            resources:
              limits:
                aliyun.com/gpu-mem: "3" # Specify the amount of GPU memory in GiB that is allocated to the pod.
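
     After the Service is deployed, you can verify that the GPU memory limit was applied to the user container. The following commands are a minimal sketch and assume that the preceding manifest is saved as helloworld-go-share.yaml (a hypothetical file name):

    # Deploy the Knative Service that requests shared GPU memory.
    kubectl apply -f helloworld-go-share.yaml

    # Inspect the GPU memory limit on the user container of the revision pod.
    kubectl get pods -l serving.knative.dev/service=helloworld-go \
      -o jsonpath='{.items[0].spec.containers[?(@.name=="user-container")].resources.limits}'

     If the nodes run cGPU with memory isolation enabled, nvidia-smi inside the container is expected to report only the allocated memory, which is about 3 GiB in this example.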

References

  • You can deploy AI models as inference services in Knative pods, configure auto scaling, and flexibly allocate GPU resources to improve the utilization of GPU resources and boost the performance of AI inference. For more information, see Best practices for deploying AI inference services in Knative.

  • For answers to frequently asked questions about GPUs and their solutions, see FAQ.