ack-kube-queue is designed to manage AI, machine learning, and batch workloads in Kubernetes. It allows system administrators to customize job queue management and improve the flexibility of queues. Integrated with a quota system, ack-kube-queue automates and optimizes the management of workloads and resource quotas to maximize resource utilization in Kubernetes clusters. This topic describes how to install and use ack-kube-queue.
Limits
Only Container Service for Kubernetes (ACK) Pro clusters whose Kubernetes versions are 1.18.aliyun.1 or later are supported.
Install ack-kube-queue
This section describes how to install ack-kube-queue in two scenarios.
Scenario 1: The cloud-native AI suite is not installed
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Applications > Cloud-native AI Suite.
In the lower part of the Cloud-native AI Suite page, click Deploy.
In the Scheduling section, select Kube Queue. In the Interactive Mode section, select Arena. In the lower part of the page, click Deploy Cloud-native AI Suite.
Scenario 2: The cloud-native AI suite is installed
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Applications > Cloud-native AI Suite.
Install ack-arena and ack-kube-queue.
On the Cloud-native AI Suite page, find ack-arena and click Deploy in the Actions column. In the Parameters panel, click OK.
On the Cloud-native AI Suite page, find ack-kube-queue and click Deploy in the Actions column. In the message that appears, click Confirm.
After ack-arena and ack-kube-queue are installed, Deployed is displayed in the Status column of the Components section.
Supported job types
ack-kube-queue supports TensorFlow jobs, PyTorch jobs, MPI jobs, Argo workflows, Ray jobs, Spark applications, and Kubernetes-native jobs.
Limits
You must use the Operator provided by ack-arena to submit TensorFlow jobs, PyTorch jobs, and MPI jobs to a queue.
To submit Kubernetes-native jobs to a queue, make sure that the Kubernetes version of the cluster is 1.22 or later.
You can submit MPI jobs to a queue only by using Arena.
You can submit only entire Argo workflows to a queue. You cannot submit individual steps of an Argo workflow to a queue. You can add the following annotation to declare the resources requested by an Argo workflow:
```
annotations:
  kube-queue/min-resources: |
    cpu: 5
    memory: 5G
```
Submit different types of jobs to a queue
By default, ack-kube-queue supports TensorFlow jobs and PyTorch jobs. You can change the job types supported by ack-kube-queue as needed.
Versions before v0.4.0
The kube-queue feature for each job type is controlled by a separate Deployment. To disable the feature for a job type, set the number of replicas of the corresponding Deployment in the kube-queue namespace to 0.
Versions v0.4.0 and later
Except for Argo workflows, the kube-queue feature for the other job types is controlled by the Job-Extension component. You can enable or disable the feature for a job type by modifying the value of --enabled-extensions in the startup command. Separate job types with commas (,). The following table describes the job types and the names used in the command.
| Job type | Name in the command |
| --- | --- |
| TfJob | tfjob |
| PytorchJob | pytorchjob |
| Job | job |
| SparkApplication | sparkapp |
| RayJob | rayjob |
| RayJob (v1alpha1) | rayjobv1alpha1 |
| MpiJob | mpiv1 |
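For illustration, the following sketch shows how the flag might appear in the container spec of the Job-Extension Deployment in the kube-queue namespace. The container name and image below are placeholders, not the component's actual values:

```
containers:
- name: job-extension          # placeholder container name
  image: <job-extension-image> # placeholder image
  args:
  # Enable queuing only for TfJob, PytorchJob, and Kubernetes-native Job.
  - --enabled-extensions=tfjob,pytorchjob,job
```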
Submit TensorFlow jobs, PyTorch jobs, and MPI jobs to a queue
You must add the scheduling.x-k8s.io/suspend="true"
annotation to a job. The following sample code submits a TensorFlow job to a queue.
```
apiVersion: "kubeflow.org/v1"
kind: "TFJob"
metadata:
  name: "job1"
  annotations:
    scheduling.x-k8s.io/suspend: "true"
spec:
  ...
```
Submit Kubernetes-native jobs to a queue
You must set the suspend field of the job to true. The following sample code submits a Kubernetes-native job to a queue.
```
apiVersion: batch/v1
kind: Job
metadata:
  generateName: pi-
spec:
  suspend: true
  completions: 1
  parallelism: 1
  template:
    spec:
      schedulerName: default-scheduler
      containers:
      - name: pi
        image: perl:5.34.0
        command: ["sleep", "3s"]
        resources:
          requests:
            cpu: 100m
          limits:
            cpu: 100m
      restartPolicy: Never
```
In the preceding example, the job requests 100m of CPU resources and is queued. When the job is dequeued, the suspend field of the job is set to false and the job is executed by the cluster.
Submit Argo workflows to a queue
Install ack-workflow from the marketplace in the console.
You must add a custom suspend template named kube-queue-suspend to the Argo workflow and set the suspend field to true when you submit the workflow. Example:
```
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: $example-name
spec:
  suspend: true # Set this field to true.
  entrypoint: $example-entrypoint
  templates:
  # Add a suspend template named kube-queue-suspend.
  - name: kube-queue-suspend
    suspend: {}
  - name: $example-entrypoint
    ...
```
Submit Spark applications to a queue
Install ack-spark-operator from the marketplace in the console.
You must add the scheduling.x-k8s.io/suspend="true"
annotation to a Spark application.
```
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  generateName: spark-pi-suspend-
  namespace: spark-operator
  annotations:
    scheduling.x-k8s.io/suspend: "true"
spec:
  type: Scala
  mode: cluster
  image: registry-cn-beijing.ack.aliyuncs.com/acs/spark:v3.1.1
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar
  sparkVersion: "3.1.1"
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    serviceAccount: ack-spark-operator3.0-spark
  executor:
    cores: 1
    instances: 1
    memory: "512m"
```
Submit Ray jobs to a queue
Install ack-kuberay-operator from the marketplace in the console.
You must set the spec.suspend
field of a Ray job to true when you submit the Ray job to a queue.
```
apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: rayjob-sample
spec:
  entrypoint: python /home/ray/samples/sample_code.py
  runtimeEnvYAML: |
    pip:
      - requests==2.26.0
      - pendulum==2.1.2
    env_vars:
      counter_name: "test_counter"
  # Suspend specifies whether the RayJob controller should create a RayCluster instance.
  # If a job is applied with the suspend field set to true, the RayCluster will not be created and we will wait for the transition to false.
  # If the RayCluster is already created, it will be deleted. In the case of transition to false, a new RayCluster will be created.
  suspend: true
  rayClusterSpec:
    rayVersion: '2.9.0' # should match the Ray version in the image of the containers
    headGroupSpec:
      rayStartParams:
        dashboard-host: '0.0.0.0'
      template:
        spec:
          containers:
          - name: ray-head
            image: rayproject/ray:2.9.0
            ports:
            - containerPort: 6379
              name: gcs-server
            - containerPort: 8265 # Ray dashboard
              name: dashboard
            - containerPort: 10001
              name: client
            resources:
              limits:
                cpu: "1"
              requests:
                cpu: "200m"
            volumeMounts:
            - mountPath: /home/ray/samples
              name: code-sample
          volumes:
          # You set volumes at the Pod level, then mount them into containers inside that Pod
          - name: code-sample
            configMap:
              # Provide the name of the ConfigMap you want to mount.
              name: ray-job-code-sample
              # An array of keys from the ConfigMap to create as files
              items:
              - key: sample_code.py
                path: sample_code.py
    workerGroupSpecs:
    - replicas: 1
      minReplicas: 1
      maxReplicas: 5
      groupName: small-group
      rayStartParams: {}
      template:
        spec:
          containers:
          - name: ray-worker # must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name' or '123-abc')
            image: rayproject/ray:2.9.0
            lifecycle:
              preStop:
                exec:
                  command: [ "/bin/sh","-c","ray stop" ]
            resources:
              limits:
                cpu: "1"
              requests:
                cpu: "200m"
```
Change the type of quota system
If a cluster is shared by multiple users, you must allocate a fixed amount of resources to each user to prevent the users from competing for resources. The traditional method is to use Kubernetes resource quotas to allocate a fixed amount of resources to each user. However, the resource requirements may differ among users. By default, ack-kube-queue uses elastic quotas to improve the overall resource utilization. For more information about elastic quotas, see ElasticQuota. If you want to use Kubernetes resource quotas instead, perform the following steps:
Run the following command to switch from elastic quotas to Kubernetes resource quotas:
```
kubectl edit deploy kube-queue-controller -n kube-queue
```
Change the value of the QueueGroupPlugin environment variable from elasticquota to resourcequota:
```
env:
- name: QueueGroupPlugin
  value: resourcequota
```
Save the file after you modify the configuration. Wait for kube-queue-controller to start up. Then, use Kubernetes resource quotas to allocate resources.
Enable blocking queues
Like kube-scheduler, ack-kube-queue processes jobs in a round-robin manner by default. The jobs in a queue request resources one after another. Jobs that fail to obtain resources are moved to the Unschedulable queue to wait for the next round of scheduling. When a cluster contains a large number of jobs with low resource demand, processing these jobs in a round-robin manner is time-consuming, and jobs with high resource demand are more likely to remain pending because the jobs with low resource demand keep acquiring the available resources. To avoid this problem, ack-kube-queue provides blocking queues. After you enable this feature, only the job at the head of the queue is scheduled, which makes it easier for jobs with high resource demand to obtain resources.
Procedure
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Workloads > Deployments.
Set the Namespace parameter to kube-queue. Then, click Edit in the Actions column of kube-queue-controller.
Click Add to the right of Environment Variable and add the following environment variable.
| Parameter | Value |
| --- | --- |
| Type | Custom |
| Variable Key | StrictPriority |
| Value/ValueFrom | true |
Click Update on the right side of the page. In the message that appears, click Confirm.
Enable strict priority scheduling
By default, jobs that fail to obtain resources are moved to the Unschedulable queue to wait for the next round of scheduling. When a job is completed, the resources that it occupied are released. However, these resources are not scheduled to high-priority jobs that are still in the Unschedulable queue. Consequently, idle resources may be scheduled to jobs with low priorities. To preferentially schedule idle resources to jobs with high priorities, ack-kube-queue provides the strict priority scheduling feature. After a job releases resources, the system attempts to schedule the resources to the job with the highest priority, which is the first job in the queue. This ensures that idle resources are preferentially scheduled to jobs with high priorities.
Jobs with low priorities can compete for idle resources when the idle resources are insufficient to fulfill jobs with high priorities.
Procedure
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Workloads > Deployments.
Set the Namespace parameter to kube-queue. Then, click Edit in the Actions column of kube-queue-controller.
Click Add to the right of Environment Variable and add the following environment variable.
| Parameter | Value |
| --- | --- |
| Type | Custom |
| Variable Key | StrictConsistency |
| Value/ValueFrom | true |
Click Update on the right side of the page. In the message that appears, click Confirm.
Use case of resource quotas
ElasticQuota
Use the following YAML template to create an ElasticQuotaTree:
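The template itself is not included above. The following is a minimal sketch of an ElasticQuotaTree consistent with the rest of this example, which allocates four vCores and 4 GiB of memory to the default namespace. The apiVersion, field layout, and child name are assumptions based on common ElasticQuota conventions; verify them against the ElasticQuota documentation referenced earlier before use:

```
apiVersion: scheduling.sigs.k8s.io/v1beta1  # assumed apiVersion
kind: ElasticQuotaTree
metadata:
  name: elasticquotatree
  namespace: kube-system
spec:
  root:
    name: root          # root quota of the tree
    max:
      cpu: 4
      memory: 4Gi
    min:
      cpu: 4
      memory: 4Gi
    children:
    - name: child       # placeholder child name
      max:
        cpu: 4
        memory: 4Gi
      min:
        cpu: 4
        memory: 4Gi
      namespaces:       # the quota applies to the default namespace
      - default
```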
Run the following command to check whether the ElasticQuotaTree is created:
kubectl get elasticquotatree -A
Expected output:
```
NAMESPACE     NAME               AGE
kube-system   elasticquotatree   7s
```
The ElasticQuotaTree is created.
Create jobs.
Note: To test the job queue management feature of ack-kube-queue, the resource quota must be less than the total amount of resources requested by the jobs that you create.
To simplify the test, the TensorFlow image used by the TensorFlow jobs is replaced by the BusyBox image. Each container sleeps for 30 seconds to simulate the training process.
Use the following YAML template to create two TensorFlow jobs:
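The job manifests are not included above. The following is a sketch of job1 consistent with the description in this example: one parameter server pod and two worker pods, each requesting one vCore, with the TensorFlow image replaced by BusyBox and each container sleeping for 30 seconds. job2 would be identical except for its name. The exact field values are illustrative:

```
apiVersion: "kubeflow.org/v1"
kind: "TFJob"
metadata:
  name: "job1"
  annotations:
    scheduling.x-k8s.io/suspend: "true"   # submit the job to a queue
spec:
  tfReplicaSpecs:
    PS:
      replicas: 1                         # one parameter server pod
      restartPolicy: Never
      template:
        spec:
          containers:
          - name: tensorflow
            image: busybox                # BusyBox replaces the TensorFlow image
            command: ["sleep", "30s"]     # simulate the training process
            resources:
              requests:
                cpu: 1
              limits:
                cpu: 1
    Worker:
      replicas: 2                         # two worker pods
      restartPolicy: Never
      template:
        spec:
          containers:
          - name: tensorflow
            image: busybox
            command: ["sleep", "30s"]
            resources:
              requests:
                cpu: 1
              limits:
                cpu: 1
```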
Query the status of the jobs after you submit the jobs.
Run the following command to query the status of the jobs:
kubectl get tfjob
Expected output:
```
NAME   STATE     AGE
job1   Running   3s
job2   Queuing   2s
```
The output indicates that job1 is in the Running state and job2 is in the Queuing state. This is because each TensorFlow job requests three vCores but the ElasticQuotaTree that you created allocates four vCores to the default namespace. Consequently, the two TensorFlow jobs cannot run at the same time.
Wait a period of time and run the following command again:
kubectl get tfjob
Expected output:
```
NAME   STATE       AGE
job1   Succeeded   77s
job2   Running     77s
```
The output indicates that job1 is completed. After job1 is completed, job2 starts to run. This shows that ack-kube-queue manages job queues as expected.
ResourceQuota
Use the following YAML template to create a ResourceQuota:
```
apiVersion: v1
kind: ResourceQuota
metadata:
  name: default
spec:
  hard:
    cpu: "4"
    memory: 4Gi
```
Run the following command to check whether the ResourceQuota is created:
kubectl get resourcequota default -o wide
Expected output:
```
NAME      AGE   REQUEST                   LIMIT
default   76s   cpu: 0/4, memory: 0/4Gi
```
The ResourceQuota is created.
Use the following YAML template to create two TensorFlow jobs:
After the two jobs are submitted, run the following command to query the status of the jobs:
```
kubectl get tfjob
NAME   STATE     AGE
job1   Running   5s
job2   Queuing   5s

kubectl get pods
NAME            READY   STATUS    RESTARTS   AGE
job1-ps-0       1/1     Running   0          8s
job1-worker-0   1/1     Running   0          8s
job1-worker-1   1/1     Running   0          8s
```
job1 is in the Running state and job2 is in the Queuing state. This is because each TensorFlow job requests three vCores: one vCore for the parameter server pod and one vCore for each of the two worker pods. However, the ResourceQuota that you created allows only four vCores in the default namespace. Consequently, the two TensorFlow jobs cannot run at the same time. The result indicates that ack-kube-queue manages job queues as expected.
Wait a period of time and then run the following command:
```
kubectl get tfjob
NAME   STATE       AGE
job1   Succeeded   77s
job2   Running     77s

kubectl get pods
NAME            READY   STATUS      RESTARTS   AGE
job1-worker-0   0/1     Completed   0          54s
job1-worker-1   0/1     Completed   0          54s
job2-ps-0       1/1     Running     0          22s
job2-worker-0   1/1     Running     0          22s
job2-worker-1   1/1     Running     0          21s
```
job1 is completed. job2 starts to run after job1 is completed. The result indicates that ack-kube-queue manages job queues as expected.
Limit the number of jobs that can be concurrently dequeued
In scenarios where an application is automatically scaled, the amount of resources required by the application may be unpredictable. In this case, you can limit the number of jobs that can be concurrently dequeued. To do this, define a kube-queue/max-jobs resource in the ElasticQuotaTree. After the limit is set, the number of queue units that can be dequeued below the quota cannot exceed the maximum number of jobs multiplied by the overcommitment ratio. Example:
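The example is not included above. The following sketch shows an ElasticQuotaTree in which the quota declares kube-queue/max-jobs: 2, so that at most two jobs (multiplied by the overcommitment ratio) from the default namespace can be dequeued concurrently. As with the quota values, the apiVersion and field layout are assumptions based on common ElasticQuota conventions:

```
apiVersion: scheduling.sigs.k8s.io/v1beta1  # assumed apiVersion
kind: ElasticQuotaTree
metadata:
  name: elasticquotatree
  namespace: kube-system
spec:
  root:
    name: root
    max:
      cpu: 4
      memory: 4Gi
      kube-queue/max-jobs: 2   # limit on concurrently dequeued jobs
    min:
      cpu: 4
      memory: 4Gi
    children:
    - name: child              # placeholder child name
      max:
        cpu: 4
        memory: 4Gi
        kube-queue/max-jobs: 2
      min:
        cpu: 4
        memory: 4Gi
      namespaces:
      - default
```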