When you deploy AI tasks, high-performance computing jobs, or other workloads that require GPU resources in Knative, you can specify GPU-accelerated instance types in the Knative Service to create GPU-accelerated instances. You can also enable the GPU sharing feature to allow multiple pods to share a single GPU and maximize GPU utilization.
Prerequisites
Knative has been deployed in your cluster. For more information, see Deploy Knative.
Configure GPU resources
To specify a GPU-accelerated ECS instance type, add the k8s.aliyun.com/eci-use-specs annotation to the spec.template.metadata.annotations section of the Knative Service configuration. To specify the amount of GPU resources that the Service requires, add the nvidia.com/gpu field to the resources.limits section of the container.
The following code block is an example:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    metadata:
      labels:
        app: helloworld-go
      annotations:
        k8s.aliyun.com/eci-use-specs: ecs.gn5i-c4g1.xlarge # Specify a GPU-accelerated ECS instance type that is supported by Knative.
    spec:
      containers:
      - image: registry.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:73fbdd56
        ports:
        - containerPort: 8080
        resources:
          limits:
            nvidia.com/gpu: '1' # Specify the number of GPUs that are required by the container. This field is required. If you do not specify this field, an error is returned when the pod is launched.
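As a quick sanity check, you can save the manifest to a file and confirm that the GPU annotation and resource limit are present before deploying. This is a minimal sketch; the file path is an arbitrary choice, and the final kubectl step assumes access to your cluster:

```shell
# Write the Service manifest to a local file (path is an assumption).
cat > /tmp/helloworld-gpu.yaml <<'EOF'
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    metadata:
      annotations:
        k8s.aliyun.com/eci-use-specs: ecs.gn5i-c4g1.xlarge
    spec:
      containers:
      - image: registry.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:73fbdd56
        ports:
        - containerPort: 8080
        resources:
          limits:
            nvidia.com/gpu: '1'
EOF
# Verify that both GPU-related fields are set before applying.
grep -q 'k8s.aliyun.com/eci-use-specs' /tmp/helloworld-gpu.yaml \
  && grep -q 'nvidia.com/gpu' /tmp/helloworld-gpu.yaml \
  && echo "GPU instance type and GPU limit are set"
# Deploy to the cluster (requires kubectl access to a cluster with Knative):
# kubectl apply -f /tmp/helloworld-gpu.yaml
```

If either field is missing, the grep chain prints nothing, which gives you an early warning before the pod fails to launch.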
The following GPU-accelerated ECS instance families are supported:
gn7i, a GPU-accelerated compute-optimized instance family that uses NVIDIA A10 GPUs. This instance family includes a variety of instance types, such as ecs.gn7i-c8g1.2xlarge.
gn7, a GPU-accelerated compute-optimized instance family that uses NVIDIA A100 GPUs. This instance family includes a variety of instance types, such as ecs.gn7-c12g1.3xlarge.
gn6v, a GPU-accelerated compute-optimized instance family that uses NVIDIA V100 GPUs. This instance family includes a variety of instance types, such as ecs.gn6v-c8g1.2xlarge.
gn6e, a GPU-accelerated compute-optimized instance family that uses NVIDIA V100 GPUs. This instance family includes a variety of instance types, such as ecs.gn6e-c12g1.3xlarge.
gn6i, a GPU-accelerated compute-optimized instance family that uses NVIDIA T4 GPUs. This instance family includes a variety of instance types, such as ecs.gn6i-c4g1.xlarge.
gn5i, a GPU-accelerated compute-optimized instance family that uses NVIDIA P4 GPUs. This instance family includes a variety of instance types, such as ecs.gn5i-c2g1.large.
gn5, a GPU-accelerated compute-optimized instance family that uses NVIDIA P100 GPUs. This instance family includes a variety of instance types, such as ecs.gn5-c4g1.xlarge.
The gn5 instance family is equipped with local disks. You can mount local disks to elastic container instances. For more information, see Create an elastic container instance that has local disks attached.
GPU-accelerated elastic container instances support NVIDIA driver version 460.73.01 and CUDA Toolkit version 11.2.
For more information about GPU-accelerated ECS instance families, see ECS instance types available for each region and Overview of instance families.
Enable GPU sharing
Before you begin, enable the GPU sharing feature for your nodes. For more information, see the examples.
You can configure the aliyun.com/gpu-mem field in the Knative Service to specify the GPU memory size. Example:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/maxScale: "100"
        autoscaling.knative.dev/minScale: "0"
    spec:
      containerConcurrency: 1
      containers:
      - image: registry-vpc.cn-hangzhou.aliyuncs.com/hz-suoxing-test/test:helloworld-go
        name: user-container
        ports:
        - containerPort: 6666
          name: http1
          protocol: TCP
        resources:
          limits:
            aliyun.com/gpu-mem: "3" # Specify the GPU memory size.
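The same pre-deployment check works for the GPU sharing configuration: confirm the aliyun.com/gpu-mem limit is present in the manifest before applying it. A minimal sketch, assuming a local file path of your choosing and kubectl access to the cluster for the final step:

```shell
# Write the GPU-sharing Service manifest to a local file (path is an assumption).
cat > /tmp/helloworld-gpu-share.yaml <<'EOF'
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
  namespace: default
spec:
  template:
    spec:
      containerConcurrency: 1
      containers:
      - image: registry-vpc.cn-hangzhou.aliyuncs.com/hz-suoxing-test/test:helloworld-go
        name: user-container
        resources:
          limits:
            aliyun.com/gpu-mem: "3"
EOF
# Verify that the shared-GPU memory limit is set.
grep -q 'aliyun.com/gpu-mem' /tmp/helloworld-gpu-share.yaml && echo "gpu-mem limit set"
# Deploy to the cluster (requires kubectl access and GPU sharing enabled on nodes):
# kubectl apply -f /tmp/helloworld-gpu-share.yaml
```

After the pods are running, you can run nvidia-smi inside the container (for example, via kubectl exec) to confirm that only the requested share of GPU memory is visible to the pod.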
References
You can deploy AI models as inference services in Knative pods, configure auto scaling, and flexibly allocate GPU resources to improve the utilization of GPU resources and boost the performance of AI inference. For more information, see Best practices for deploying AI inference services in Knative.
For frequently asked questions about GPUs and their solutions, see FAQ.