Elastic Container Instance: GPU-accelerated ECS instance types

Last updated: Dec 11, 2024

This topic describes how to specify GPU-accelerated Elastic Compute Service (ECS) instance types when you create an Elastic Container Instance (ECI) pod.

Supported instance type families

GPU-accelerated ECS instance types provide GPUs and are suitable for scenarios such as deep learning and image processing. An NVIDIA GPU driver is pre-installed on a GPU-accelerated elastic container instance, so you can run GPU-based Docker images on the instance directly. The supported driver and CUDA versions vary by instance family.

Note

The gn8ia and gn8is instance families in the following table are available only in specific regions outside the Chinese mainland. To use the instance families, contact Alibaba Cloud sales personnel.

| Category | GPU-accelerated instance family | Driver and CUDA versions |
| --- | --- | --- |
| vGPU-accelerated instance families | sgn7i-vws, vgn7i-vws, vgn6i-vws | NVIDIA 470.161.03 and CUDA 11.4 |
| GPU-accelerated compute-optimized instance families | gn7e, gn7i, gn7s, gn7, gn6v, gn6e, gn6i, gn5i, gn5 | NVIDIA 470.82.01 and CUDA 11.4 (default); NVIDIA 525.85.12 and CUDA 12.0; NVIDIA 535.161.08 and CUDA 12.2 |
| GPU-accelerated compute-optimized instance families | gn8ia, gn8is | NVIDIA 535.161.08 and CUDA 12.2 |

For more information about these instance families, see the instance family topics in the ECS documentation.

Configurations

To specify GPU-accelerated ECS instance types, add the k8s.aliyun.com/eci-use-specs annotation to the metadata of the pod. You must then add the nvidia.com/gpu field to the resources.limits section of the container to specify how many GPUs to allocate to it.

Important
  • You must specify the nvidia.com/gpu field when you create a GPU-accelerated pod. If you omit it, an error is returned when the pod starts.

  • By default, the containers in an elastic container instance share its GPUs. Make sure that the number of GPUs allocated to any single container does not exceed the number of GPUs that the specified instance type provides.

Example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  labels:
    app: test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      name: nginx-test
      labels:
        app: nginx
        alibabacloud.com/eci: "true" 
      annotations:
        k8s.aliyun.com/eci-use-specs: "ecs.gn6i-c4g1.xlarge,ecs.gn6i-c8g1.2xlarge" # Up to five GPU-accelerated ECS instance types, separated by commas.
    spec:
      containers:
      - name: nginx
        image: registry.cn-shanghai.aliyuncs.com/eci_open/nginx:1.14.2
        resources:
          limits:
            nvidia.com/gpu: "1" # The number of GPUs allocated to the Nginx container. The GPUs are shared.
        ports:
        - containerPort: 80
      - name: busybox
        image: registry.cn-shanghai.aliyuncs.com/eci_open/busybox:1.30
        command: ["sleep"]
        args: ["999999"]
        resources:
          limits:
            nvidia.com/gpu: "1" # The number of GPUs allocated to the BusyBox container. The GPUs are shared.
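The annotation belongs to the pod's own metadata. In a Deployment it sits under spec.template.metadata, as in the example above; for a standalone Pod it goes directly under metadata. The following is a minimal sketch of a standalone Pod that reuses the image and one instance type from the example above; the pod name gpu-pod is illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod                  # illustrative name
  labels:
    alibabacloud.com/eci: "true"
  annotations:
    # For a standalone Pod, the annotation is placed directly under metadata.
    k8s.aliyun.com/eci-use-specs: "ecs.gn6i-c4g1.xlarge"
spec:
  containers:
  - name: nginx
    image: registry.cn-shanghai.aliyuncs.com/eci_open/nginx:1.14.2
    resources:
      limits:
        nvidia.com/gpu: "1"      # must not exceed the GPUs provided by the instance type
    ports:
    - containerPort: 80
```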

By default, a GPU-accelerated elastic container instance installs the default driver and CUDA versions for the specified instance type. If different elastic container instances need different driver and CUDA versions, specify them with the k8s.aliyun.com/eci-gpu-driver-version annotation. For example, if you specify ecs.gn6i-c4g1.xlarge, NVIDIA 470.82.01 and CUDA 11.4 are installed by default. If you add the k8s.aliyun.com/eci-gpu-driver-version: tesla=525.85.12 annotation, NVIDIA 525.85.12 and CUDA 12.0 are installed instead. The following YAML shows an example.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  labels:
    app: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      name: nginx-test
      labels:
        app: nginx
        alibabacloud.com/eci: "true" 
      annotations:
        k8s.aliyun.com/eci-use-specs: ecs.gn6i-c4g1.xlarge # The instance type must support changing the driver version.
        k8s.aliyun.com/eci-gpu-driver-version: tesla=525.85.12 # The GPU driver version to install.
    spec:
      containers:
      - name: nginx
        image: registry.cn-shanghai.aliyuncs.com/eci_open/nginx:1.14.2
        resources:
          limits:
            nvidia.com/gpu: "1" # The number of GPUs allocated to the container.
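To confirm which driver and CUDA versions an instance actually received, you can run nvidia-smi once and read its output. The following Pod is a sketch under several assumptions: the pod name gpu-driver-check and the image nvidia/cuda:12.0.1-base-ubuntu22.04 are illustrative, the instance must be able to pull that image (pulling from Docker Hub usually requires Internet access), and nvidia-smi is assumed to be available inside the container, as it typically is for CUDA base images.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-driver-check          # illustrative name
  labels:
    alibabacloud.com/eci: "true"
  annotations:
    k8s.aliyun.com/eci-use-specs: ecs.gn6i-c4g1.xlarge
    k8s.aliyun.com/eci-gpu-driver-version: tesla=525.85.12
spec:
  restartPolicy: Never            # run nvidia-smi once, then exit
  containers:
  - name: check
    image: nvidia/cuda:12.0.1-base-ubuntu22.04   # assumed image; any CUDA base image should work
    command: ["nvidia-smi"]       # prints the driver version and supported CUDA version
    resources:
      limits:
        nvidia.com/gpu: "1"
```

After the pod completes, run `kubectl logs gpu-driver-check`. With the tesla=525.85.12 annotation, the nvidia-smi header should report driver version 525.85.12 and CUDA version 12.0 (nvidia-smi shows the highest CUDA version the driver supports). Without the annotation, you would expect the default versions for the instance type.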