This topic describes how to specify GPU-accelerated Elastic Compute Service (ECS) instance types to create an Elastic Container Instance (ECI) pod.
Supported instance type families
For more information about the GPU-accelerated ECS instance families that are supported, see the ECS instance family documentation.
Configurations
You can add annotations to the metadata in the configuration file of a pod to specify GPU-accelerated ECS instance types. After you specify GPU-accelerated ECS instance types, you must add the nvidia.com/gpu field to the containers.resources section to specify the number of GPUs to allocate to each container.
The value of the nvidia.com/gpu field specifies the number of GPUs that you want to allocate to a container. You must specify this field when you create a GPU-accelerated pod. If you do not specify this field, an error is returned when the pod is started. By default, multiple containers in an elastic container instance can share the GPUs. Make sure that the number of GPUs that you allocate to a single container does not exceed the number of GPUs provided by the specified GPU-accelerated ECS instance type.
Example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  labels:
    app: test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      name: nginx-test
      labels:
        app: nginx
        alibabacloud.com/eci: "true"
      annotations:
        k8s.aliyun.com/eci-use-specs: "ecs.gn6i-c4g1.xlarge,ecs.gn6i-c8g1.2xlarge" # Specify a maximum of five GPU-accelerated ECS instance types at a time.
    spec:
      containers:
      - name: nginx
        image: registry.cn-shanghai.aliyuncs.com/eci_open/nginx:1.14.2
        resources:
          limits:
            nvidia.com/gpu: "1" # The number of GPUs required by the Nginx container. The GPUs are shared.
        ports:
        - containerPort: 80
      - name: busybox
        image: registry.cn-shanghai.aliyuncs.com/eci_open/busybox:1.30
        command: ["sleep"]
        args: ["999999"]
        resources:
          limits:
            nvidia.com/gpu: "1" # The number of GPUs required by the BusyBox container. The GPUs are shared.
By default, a GPU-accelerated elastic container instance installs the driver and CUDA versions that are supported by the specified GPU-accelerated ECS instance type. In some scenarios, you may need to use different driver and CUDA versions for different GPU-accelerated elastic container instances. In this case, you can add the k8s.aliyun.com/eci-gpu-driver-version annotation to specify the driver version. For example, if you specify ecs.gn6i-c4g1.xlarge as the GPU-accelerated ECS instance type, NVIDIA driver 470.82.01 and CUDA 11.4 are installed by default. After you add the k8s.aliyun.com/eci-gpu-driver-version: tesla=525.85.12 annotation, NVIDIA driver 525.85.12 and CUDA 12.0 are installed instead. The following code provides an example in YAML format.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  labels:
    app: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      name: nginx-test
      labels:
        app: nginx
        alibabacloud.com/eci: "true"
      annotations:
        k8s.aliyun.com/eci-use-specs: ecs.gn6i-c4g1.xlarge # Specify a GPU-accelerated ECS instance type that supports changing the driver version.
        k8s.aliyun.com/eci-gpu-driver-version: tesla=525.85.12 # Specify the GPU driver version.
    spec:
      containers:
      - name: nginx
        image: registry.cn-shanghai.aliyuncs.com/eci_open/nginx:1.14.2
        resources:
          limits:
            nvidia.com/gpu: "1" # The number of GPUs required by the container.