Container Service for Kubernetes (ACK) provides the Slurm on Kubernetes solution and the ack-slurm-operator component. Together, they allow you to deploy and manage the Simple Linux Utility for Resource Management (Slurm) scheduling system in ACK clusters for high performance computing (HPC) and large-scale AI and machine learning (ML) workloads.
Introduction to Slurm
Slurm is a powerful open source platform for cluster resource management and job scheduling. It is designed to optimize the performance and efficiency of supercomputers and large compute clusters. The following figure shows how its key components work together.
slurmctld: The Slurm control daemon. As the central management component of Slurm, slurmctld monitors system resources, schedules jobs, and manages the cluster status. You can configure a secondary slurmctld for failover to ensure high availability.
slurmd: The Slurm node daemon. Deployed on each compute node, slurmd receives instructions from slurmctld and manages the job lifecycle, including starting and executing jobs, reporting job status, and preparing for new job assignments. slurmctld schedules jobs and dispatches them to slurmd for execution.
slurmdbd: The Slurm database daemon. This optional component maintains a centralized database for job history and accounting information. It is essential for long-term management and auditing of large clusters. slurmdbd can aggregate data across multiple Slurm-managed clusters to simplify data management.
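To illustrate how these daemons are wired together, the following is a minimal slurm.conf fragment. The hostnames and node names are hypothetical placeholders, not values from this topic:

```
# Hypothetical example: host and node names are placeholders.
SlurmctldHost=head-0                          # Where slurmctld runs.
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=dbd-0                   # Where the optional slurmdbd runs.
NodeName=worker-[0-3] CPUs=4 State=UNKNOWN    # Compute nodes running slurmd.
PartitionName=debug Nodes=ALL Default=YES State=UP
```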
Slurm CLI: Slurm provides the following command-line tools for job management and system monitoring:
scontrol: Manages clusters and controls cluster configurations.
squeue: Queries the status of jobs in the queue.
srun: Submits and manages jobs.
sbatch: Submits batch job scripts for later scheduling and execution.
sinfo: Queries the overall status of a cluster, including node availability.
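For example, the following minimal batch script (the file name and directives are illustrative) is submitted with sbatch; squeue then shows the queued job, and scontrol show job <id> prints its details:

```
#!/bin/bash
#SBATCH --job-name=hello        # Arbitrary job name.
#SBATCH --nodes=2               # Request two nodes.
#SBATCH --output=hello-%j.out   # %j expands to the job ID.
srun hostname                   # Run the command on the allocated nodes.
```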
Introduction to Slurm on ACK
The Slurm Operator uses the SlurmCluster CustomResource (CR) to define the configuration files required for managing Slurm clusters. This simplifies the deployment and maintenance of Slurm-managed clusters and resolves control plane management issues. The following figure shows the architecture of Slurm on ACK.
A cluster administrator deploys and manages a Slurm-managed cluster by defining a SlurmCluster CR. The Slurm Operator then creates the Slurm control components in the cluster based on this CR. A Slurm configuration file can be mounted to a control component by using a shared volume or a ConfigMap.
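The shape of the CR can be sketched as follows. This is an abbreviated illustration based on the fields used later in this topic (slurmctld, workerGroupSpecs, slurmConfPath, mungeConfPath); the paths are examples, and the sketch is not a complete, deployable manifest:

```
apiVersion: kai.alibabacloud.com/v1
kind: SlurmCluster
metadata:
  name: slurm-job-demo
spec:
  slurmConfPath: /etc/slurm-configs   # Illustrative; must match the volume mount targets below.
  mungeConfPath: /etc/munge-configs   # Illustrative; must match the volume mount targets below.
  slurmctld:
    template:
      spec:
        # Pod template for the control daemon, including volumes for the
        # Slurm configuration and the MUNGE Secret.
        ...
  workerGroupSpecs:
  - replicas: 4
    template:
      spec:
        # Pod template for the slurmd worker nodes.
        ...
```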
Prerequisites
An ACK cluster that runs Kubernetes 1.22 or later is created, and the cluster contains at least one GPU-accelerated node. For more information, see Create an ACK cluster with GPU-accelerated nodes and Update clusters.
Step 1: Install the ack-slurm-operator component
Log on to the ACK console. In the left-side navigation pane, choose Marketplace > Marketplace.
On the Marketplace page, search for ack-slurm-operator and click the component. On the details page, click Deploy in the upper-right corner. In the Deploy panel, configure the parameters. You need to specify only the Cluster parameter. Use the default settings for all other parameters.
After you configure the parameters, click OK.
Step 2: Create a Slurm-managed cluster
You can create a Slurm-managed cluster either manually or by using Helm. Choose the method that best fits your needs.
Manually create a Slurm-managed cluster
Create a MUNGE authentication Secret
MUNGE (MUNGE Uid 'N' Gid Emporium) provides authentication between Slurm components. You must create a Kubernetes Secret to store the MUNGE key.
Run the following command to generate a key by using OpenSSL:
```
openssl rand -base64 512 | tr -d '\r\n'
```

Run the following command to create a Secret that stores the generated key:

Replace <$MungeKeyName> with a custom name for your key, such as mungekey.
Replace <$MungeKey> with the key string generated in the previous step.

```
kubectl create secret generic <$MungeKeyName> --from-literal=munge.key=<$MungeKey>
```
After you create the Secret, you can configure or associate it with the Slurm-managed cluster for MUNGE-based authentication.
Create a ConfigMap for the Slurm-managed cluster
In this example, a ConfigMap is mounted to a pod by specifying the slurmConfPath parameter in the CR. This ensures that the pod configuration is automatically restored to the expected state even if the pod is recreated.
The data parameter in the following sample code specifies a sample ConfigMap. To generate a ConfigMap, we recommend that you use the Easy Configurator or Full Configurator tool.
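A ConfigMap of the expected shape might look like the following sketch. The name slurm-test matches the expected output below; the slurm.conf content is abbreviated here and should be generated with the configurator tools:

```
apiVersion: v1
kind: ConfigMap
metadata:
  name: slurm-test
data:
  slurm.conf: |
    # Content generated by the Easy Configurator or Full Configurator tool.
    ClusterName=slurm-job-demo
    ...
```

Applying such a file with kubectl apply -f produces the confirmation shown in the expected output.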
Expected output:
```
configmap/slurm-test created
```

This output indicates that the ConfigMap is created.
Submit the SlurmCluster CR
Create a file named slurmcluster.yaml and copy the following content to the file. This SlurmCluster CR creates a Slurm-managed cluster with one head node and four worker nodes. Each node of the cluster runs as a pod in the ACK cluster. The values of the mungeConfPath and slurmConfPath parameters in the SlurmCluster CR must match the mount targets specified in the slurmctld and workerGroupSpecs sections.

Run the following command to deploy the slurmcluster.yaml file to the cluster:

```
kubectl apply -f slurmcluster.yaml
```

Expected output:

```
slurmcluster.kai.alibabacloud.com/slurm-job-demo created
```

Run the following command to verify that the Slurm-managed cluster runs as expected:

```
kubectl get slurmcluster
```

Expected output:

```
NAME             AVAILABLE WORKERS   STATUS   AGE
slurm-job-demo   5                   ready    14m
```

This output indicates that the Slurm-managed cluster is deployed and its five nodes are ready.

Run the following command to verify that all pods in the Slurm-managed cluster named slurm-job-demo are in the Running state:

```
kubectl get pod
```

Expected output:

```
NAME                           READY   STATUS    RESTARTS   AGE
slurm-job-demo-head-x9sgs      1/1     Running   0          14m
slurm-job-demo-worker-cpu-0    1/1     Running   0          14m
slurm-job-demo-worker-cpu-1    1/1     Running   0          14m
slurm-job-demo-worker-cpu1-0   1/1     Running   0          14m
slurm-job-demo-worker-cpu1-1   1/1     Running   0          14m
```

This output confirms that the head node and four worker nodes are running as expected.
Create a Slurm-managed cluster by using Helm
To quickly install and manage a Slurm-managed cluster with flexible configuration, you can use Helm to install the SlurmCluster chart provided by Alibaba Cloud. Download the Helm chart from charts-incubator (the Alibaba Cloud chart repository). After you configure the parameters, Helm creates resources such as role-based access control (RBAC), ConfigMap, Secret, and the Slurm-managed cluster.
Resources created by the Helm chart
| Resource type | Resource name | Description |
|---|---|---|
| ConfigMap | {{ .Values.slurmConfigs.configMapName }} | When .Values.slurmConfigs.createConfigsByConfigMap is set to True, this ConfigMap stores user-defined Slurm configurations. It is mounted to the path specified by .Values.slurmConfigs.slurmConfigPathInPod, which is also rendered as .Spec.SlurmConfPath of the Slurm-managed cluster. When the pod starts, the ConfigMap is copied to /etc/slurm/ and access is restricted. |
| ServiceAccount | {{ .Release.Namespace }}/{{ .Values.clusterName }} | Allows the slurmctld pod to modify SlurmCluster CR configurations, enabling auto scaling of on-cloud nodes. |
| Role | {{ .Release.Namespace }}/{{ .Values.clusterName }} | Grants the slurmctld pod permissions to modify SlurmCluster CR configurations for auto scaling. |
| RoleBinding | {{ .Release.Namespace }}/{{ .Values.clusterName }} | Binds the Role to the ServiceAccount for auto scaling permissions. |
| Role | {{ .Values.slurmOperatorNamespace }}/{{ .Values.clusterName }} | Allows the slurmctld pod to modify Secrets in the SlurmOperator namespace. When Slurm and Kubernetes are deployed on the same batch of physical servers, the Slurm-managed cluster can use this resource to renew tokens. |
| RoleBinding | {{ .Values.slurmOperatorNamespace }}/{{ .Values.clusterName }} | Binds the operator namespace Role to the ServiceAccount for token renewal. |
| Secret | {{ .Values.mungeConfigs.secretName }} | Stores the MUNGE authentication key for Slurm component communication. When .Values.mungeConfigs.createConfigsBySecret is set to True, this Secret is created automatically with "munge.key"={{ .Values.mungeConfigs.content }}. The mount path is rendered from .Spec.MungeConfPath, and the pod startup commands initialize /etc/munge/munge.key from this path. |
| SlurmCluster | N/A | The rendered Slurm-managed cluster. |
Helm chart parameters
| Parameter | Example | Description |
|---|---|---|
| clusterName | "" | The cluster name. Used to generate Secrets and roles. The value must match the cluster name in your Slurm configuration files. |
| headNodeConfig | N/A | Required. Configures the slurmctld pod. |
| workerNodesConfig | N/A | Configures the slurmd pods. |
| workerNodesConfig.deleteSelfBeforeSuspend | true | When set to true, a preStop hook is automatically added to the worker pod. This triggers automatic node draining before the node is removed and marks the node as unschedulable. |
| slurmdbdConfigs | N/A | Configures the slurmdbd pod. If left empty, no slurmdbd pod is created. |
| slurmrestdConfigs | N/A | Configures the slurmrestd pod. If left empty, no slurmrestd pod is created. |
| headNodeConfig.hostNetwork / slurmdbdConfigs.hostNetwork / slurmrestdConfigs.hostNetwork / workerNodesConfig.workerGroups[].hostNetwork | false | Rendered as the hostNetwork parameter of the respective pod. |
| headNodeConfig.setHostnameAsFQDN / slurmdbdConfigs.setHostnameAsFQDN / slurmrestdConfigs.setHostnameAsFQDN / workerNodesConfig.workerGroups[].setHostnameAsFQDN | false | Rendered as the setHostnameAsFQDN parameter of the respective pod. |
| headNodeConfig.nodeSelector / slurmdbdConfigs.nodeSelector / slurmrestdConfigs.nodeSelector / workerNodesConfig.workerGroups[].nodeSelector | nodeSelector: example: example | Rendered as the nodeSelector parameter of the respective pod. |
| headNodeConfig.tolerations / slurmdbdConfigs.tolerations / slurmrestdConfigs.tolerations / workerNodesConfig.workerGroups[].tolerations | tolerations:- key: value: operator: | Rendered as the tolerations of the respective pod. |
| headNodeConfig.affinity / slurmdbdConfigs.affinity / slurmrestdConfigs.affinity / workerNodesConfig.workerGroups[].affinity | affinity: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: topology.kubernetes.io/zone operator: In values: - zone-a preferredDuringSchedulingIgnoredDuringExecution: - weight: 1 preference: matchExpressions: - key: another-node-label-key operator: In values: - another-node-label-value | Rendered as the affinity rules of the respective pod. |
| headNodeConfig.resources / slurmdbdConfigs.resources / slurmrestdConfigs.resources / workerNodesConfig.workerGroups[].resources | resources: requests: cpu: 1 limits: cpu: 1 | Rendered as the resources of the primary container in the respective pod. The resource limit of the worker pod primary container is rendered as the Slurm node resource limit. |
| headNodeConfig.image / slurmdbdConfigs.image / slurmrestdConfigs.image / workerNodesConfig.workerGroups[].image | "registry-cn-hangzhou.ack.aliyuncs.com/acs/slurm:23.06-1.6-aliyun-49259f59" | Rendered as the container image. You can also build a custom image from ai-models-on-ack/framework/slurm/building-slurm-image. |
| headNodeConfig.imagePullSecrets / slurmdbdConfigs.imagePullSecrets / slurmrestdConfigs.imagePullSecrets / workerNodesConfig.workerGroups[].imagePullSecrets | imagePullSecrets:- name: example | Rendered as the Secret used to pull the container image. |
| headNodeConfig.podSecurityContext / slurmdbdConfigs.podSecurityContext / slurmrestdConfigs.podSecurityContext / workerNodesConfig.workerGroups[].podSecurityContext | podSecurityContext: runAsUser: 1000 runAsGroup: 3000 fsGroup: 2000 supplementalGroups: [4000] | Rendered as the pod-level security context. |
| headNodeConfig.securityContext / slurmdbdConfigs.securityContext / slurmrestdConfigs.securityContext / workerNodesConfig.workerGroups[].securityContext | securityContext: allowPrivilegeEscalation: false | Rendered as the security context of the primary container. |
| headNodeConfig.volumeMounts / slurmdbdConfigs.volumeMounts / slurmrestdConfigs.volumeMounts / workerNodesConfig.workerGroups[].volumeMounts | N/A | Rendered as the volume mounting configurations of the primary container. |
| headNodeConfig.volumes / slurmdbdConfigs.volumes / slurmrestdConfigs.volumes / workerNodesConfig.workerGroups[].volumes | N/A | Rendered as the volumes mounted to the pod. |
| slurmConfigs.slurmConfigPathInPod | "" | The mount path of the Slurm configuration file in the pod. If the configuration file is mounted as a volume, set this value to the path where slurm.conf is mounted. The pod startup commands copy the file to /etc/slurm/ and restrict access. |
| slurmConfigs.createConfigsByConfigMap | true | Specifies whether to automatically create a ConfigMap for Slurm configurations. |
| slurmConfigs.configMapName | "" | The name of the ConfigMap that stores the Slurm configurations. |
| slurmConfigs.filesInConfigMap | "" | The content in the automatically created ConfigMap. |
| mungeConfigs.mungeConfigPathInPod | N/A | The mount path of the MUNGE configuration file in the pod. If the configuration file is mounted as a volume, set this value to the path where munge.key is mounted. The pod startup commands copy the file to /etc/munge/ and restrict access. |
| mungeConfigs.createConfigsBySecret | N/A | Specifies whether to automatically create a Secret for MUNGE configurations. |
| mungeConfigs.secretName | N/A | The name of the Secret that stores the MUNGE configurations. |
| mungeConfigs.content | N/A | The content in the automatically created Secret. |
For more information about slurmConfigs.filesInConfigMap, see Slurm System Configuration Tool (schedmd.com).
If you modify slurmConfigs.filesInConfigMap after the pod is created, you must recreate the pod for the change to take effect. We recommend that you verify the modification before recreating the pod.
Install the Helm chart
Run the following command to add the Alibaba Cloud Helm repository to your local Helm client. This allows you to access various charts provided by Alibaba Cloud, including the Slurm-managed cluster chart.

```
helm repo add aliyun https://aliacs-app-catalog.oss-cn-hangzhou.aliyuncs.com/charts-incubator/
```

Run the following command to pull and decompress the Helm chart. This creates a directory named ack-slurm-cluster in the current directory. The directory contains all chart files and templates.

```
helm pull aliyun/ack-slurm-cluster --untar=true
```

Modify the chart parameters in the values.yaml file. The values.yaml file contains the default chart configurations. Modify this file to customize parameter settings such as Slurm configurations, resource requests and limits, and storage based on your requirements.

```
cd ack-slurm-cluster
vi values.yaml
```

Use Helm to install the chart. This deploys the Slurm-managed cluster.

```
cd ..
helm install my-slurm-cluster ack-slurm-cluster # Replace my-slurm-cluster with your desired release name.
```

Verify that the Slurm-managed cluster is deployed. After the deployment is complete, use kubectl to check the deployment status and confirm that the Slurm-managed cluster runs as expected:

```
kubectl get pods -l app.kubernetes.io/name=slurm-cluster
```
Step 3: Log on to the Slurm-managed cluster
Log on as a Kubernetes cluster administrator
A Kubernetes cluster administrator has the permissions to manage the entire Kubernetes cluster. Because a Slurm-managed cluster runs as a pod in the Kubernetes cluster, the administrator can use kubectl to log on to any pod of any Slurm-managed cluster and has root permissions by default.
Run the following command to log on to a pod of the Slurm-managed cluster:

```
# Replace slurm-job-demo-head-x9sgs with the name of the pod in your cluster.
kubectl exec -it slurm-job-demo-head-x9sgs -- bash
```

Log on as a regular user of the Slurm-managed cluster
Administrators or regular users of a Slurm-managed cluster may not have permissions to run the kubectl exec command. In this case, you can log on to the Slurm-managed cluster by using SSH. Two methods are available:
LoadBalancer Service: Use an external IP address of a Service to access the head pod. This method is suitable for long-term, stable connections. You access the Slurm-managed cluster from anywhere within the internal network by using a Classic Load Balancer (CLB) instance and its external IP address.
Port forwarding: Use the kubectl port-forward command for temporary access. This method is suitable for short-term operations and maintenance (O&M) or debugging because it requires continuous execution of the port-forward command.
Log on to the head pod by using a LoadBalancer Service
Create a LoadBalancer Service to expose internal services in the cluster to external access. For more information, see Use an existing SLB instance to expose an application or Use an automatically created SLB instance to expose an application.
The LoadBalancer Service must use an internal-facing Classic Load Balancer (CLB) instance.
Add the following labels to the Service so that it routes incoming requests to the expected pod:

```
kai.alibabacloud.com/slurm-cluster: ack-slurm-cluster-1
kai.alibabacloud.com/slurm-node-type: head
```
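Putting this together, an internal-facing LoadBalancer Service for SSH access to the head pod might look as follows. This is a sketch that assumes the labels above are used as pod selectors; the Service name is arbitrary, and the annotation is the standard ACK annotation for an internal-facing CLB instance:

```
apiVersion: v1
kind: Service
metadata:
  name: slurm-head-ssh    # Arbitrary Service name.
  annotations:
    # Use an internal-facing CLB instance.
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-address-type: intranet
spec:
  type: LoadBalancer
  selector:
    kai.alibabacloud.com/slurm-cluster: ack-slurm-cluster-1
    kai.alibabacloud.com/slurm-node-type: head
  ports:
  - port: 22        # SSH uses port 22 by default.
    targetPort: 22
```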
Run the following command to obtain the external IP address of the LoadBalancer Service:

```
kubectl get svc
```

Run the following command to log on to the head pod by using SSH:

```
# Replace $YOURUSER with your username and $EXTERNAL_IP with the external IP address of the Service.
ssh $YOURUSER@$EXTERNAL_IP
```
Forward requests by using the port-forward command
To use the port-forward command, you must save the kubeconfig file of the Kubernetes cluster to your local host. This may cause security risks. We recommend that you do not use this method in production environments.
Run the following command to enable a local port for request forwarding and map it to port 22 of the head pod running slurmctld. SSH uses port 22 by default.

```
# Replace $NAMESPACE, $CLUSTERNAME, and $LOCALPORT with the actual values.
kubectl port-forward -n $NAMESPACE svc/$CLUSTERNAME $LOCALPORT:22
```

While the port-forward command is running, run the following command to log on. All users on the current host can log on to the cluster and submit jobs.

```
# Replace $YOURUSER with the username you want to use to log on to the head pod.
ssh -p $LOCALPORT $YOURUSER@localhost
```
Step 4: Use the Slurm-managed cluster
The following sections describe how to synchronize users across nodes, share logs across nodes, and perform auto scaling for the Slurm-managed cluster.
Synchronize users across nodes
Slurm does not provide a centralized user authentication service. When you use the sbatch command to submit jobs, the jobs may fail if the submitting user's account does not exist on the node selected to execute the jobs. To resolve this issue, you can configure Lightweight Directory Access Protocol (LDAP) for the Slurm-managed cluster. LDAP serves as a centralized backend service for authentication, allowing Slurm to authenticate user identities.
Deploy the LDAP backend
Create a file named ldap.yaml and copy the following content to the file. This creates a basic LDAP instance that stores and manages user information. The ldap.yaml file defines an LDAP backend pod and its associated Service. The pod contains an LDAP container, and the Service exposes the LDAP service within the network.

Run the following command to deploy the LDAP backend Service:

```
kubectl apply -f ldap.yaml
```

Expected output:

```
deployment.apps/ldap created
service/ldap-service created
secret/ldap-secret created
```
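The manifest can be sketched as follows. The resource names match the expected output; the container image and the password handling are assumptions for illustration (the osixia/openldap image is a common choice):

```
apiVersion: v1
kind: Secret
metadata:
  name: ldap-secret
type: Opaque
stringData:
  LDAP_ADMIN_PASSWORD: change-me      # Illustrative value; use your own password.
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ldap
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ldap
  template:
    metadata:
      labels:
        app: ldap
    spec:
      containers:
      - name: ldap
        image: osixia/openldap:1.5.0  # Assumed image; replace as needed.
        envFrom:
        - secretRef:
            name: ldap-secret
        ports:
        - containerPort: 389          # Standard LDAP port.
---
apiVersion: v1
kind: Service
metadata:
  name: ldap-service
spec:
  selector:
    app: ldap
  ports:
  - port: 389
    targetPort: 389
```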
(Optional) Deploy the LDAP frontend
Create a file named phpldapadmin.yaml and copy the following content to the file. This deploys an LDAP frontend pod and its associated Service for improved management efficiency through a web interface.

Run the following command to deploy the LDAP frontend Service:

```
kubectl apply -f phpldapadmin.yaml
```
Configure the LDAP client
Log on to a pod in the Slurm-managed cluster as described in Step 3, and run the following commands to install the LDAP client package:
```
apt update
apt install libnss-ldapd
```

After the libnss-ldapd package is installed, configure the network authentication service for the Slurm-managed cluster in the pod.

Modify the following parameters in the /etc/nslcd.conf file to define the connection to the LDAP server. If the pod does not include a text editor, install one first:

```
apt update
apt install vim
```

```
...
uri ldap://ldap-service    # Replace the value with the uniform resource identifier (URI) of your LDAP server.
base dc=example,dc=org     # Replace the value with the distinguished name of the root node in your LDAP directory structure.
...
tls_cacertfile /etc/ssl/certs/ca-certificates.crt # Specify the path to the certificate authority (CA) certificate file used to verify the LDAP server certificate.
...
```
Share and access logs
By default, the job logs generated by the sbatch command are stored on the node that executes the jobs. This can make it difficult to view logs centrally. To simplify log management, you can create a File Storage NAS (NAS) file system to store all job logs in accessible directories. This allows logs to be centrally collected and accessed regardless of which node executes the computing jobs.
Create a NAS file system to store and share the logs of each node. For more information, see Create a file system.
Log on to the ACK console, and create a persistent volume (PV) and a persistent volume claim (PVC) for the NAS file system. For more information, see Mount a statically provisioned NAS volume.
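A statically provisioned NAS PV/PVC pair could be sketched as follows. This is a simplified illustration: the NAS mount target domain is a placeholder, and the PVC is named test only so that it can be referenced when mounting /home; see the linked topic for the authoritative manifest:

```
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nas-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
  - ReadWriteMany
  csi:
    driver: nasplugin.csi.alibabacloud.com
    volumeHandle: nas-pv    # Must match the PV name.
    volumeAttributes:
      server: xxxx.cn-hangzhou.nas.aliyuncs.com   # Replace with your NAS mount target.
      path: /
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test                # Illustrative name, referenced when mounting /home.
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 100Gi
```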
Modify the SlurmCluster CR. Configure the volumeMounts and volumes parameters in the slurmctld and workerGroupSpecs sections to reference the created PVC and mount it to the /home directory. Example:

```
slurmctld:
  ...
  # Specify /home as the mount target.
  volumeMounts:
  - mountPath: /home
    name: test            # The name of the volume that references the PVC.
  # Add the PVC definition.
  volumes:
  - name: test            # Must match the name in volumeMounts.
    persistentVolumeClaim:
      claimName: test     # Replace with the name of your PVC.
  ...
workerGroupSpecs:
  # Repeat the volumeMounts and volumes configuration for each worker group.
```

Run the following command to deploy the SlurmCluster CR. After the SlurmCluster CR is deployed, worker nodes can share the NAS file system.

```
kubectl apply -f slurmcluster.yaml
```

Important: If the SlurmCluster CR fails to deploy, run the kubectl delete slurmcluster slurm-job-demo command to delete the CR and then redeploy it.
Perform auto scaling for the Slurm-managed cluster
The root path of the default Slurm image contains executable files and scripts such as slurm-resume.sh, slurm-suspend.sh, and slurmctld-copilot. These interact with slurmctld to scale the Slurm-managed cluster.
Auto scaling for Slurm clusters based on on-cloud nodes
Slurm on ACK supports two types of nodes:
Local nodes: Physical compute nodes that are directly connected to slurmctld.
On-cloud nodes: Logical nodes backed by VM instances that can be created and destroyed on demand by cloud service providers.
Auto scaling for Slurm on ACK
Procedure
Configure auto scaling permissions. If Helm is installed, auto scaling permissions are automatically configured for the slurmctld pod and you can skip this step. The head pod requires permissions to access and update the SlurmCluster CR for auto scaling. We recommend that you use RBAC to grant the required permissions. Follow these steps:

First, create the ServiceAccount, Role, and RoleBinding for the slurmctld pod. In the following example, the Slurm-managed cluster name is slurm-job-demo and the namespace is default. Create a file named rbac.yaml and copy the following content to the file:

```
apiVersion: v1
kind: ServiceAccount
metadata:
  name: slurm-job-demo
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: slurm-job-demo
rules:
- apiGroups: ["kai.alibabacloud.com"]
  resources: ["slurmclusters"]
  verbs: ["get", "watch", "list", "update", "patch"]
  resourceNames: ["slurm-job-demo"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: slurm-job-demo
subjects:
- kind: ServiceAccount
  name: slurm-job-demo
roleRef:
  kind: Role
  name: slurm-job-demo
  apiGroup: rbac.authorization.k8s.io
```

Run kubectl apply -f rbac.yaml to submit the resource list.

Next, grant permissions to the slurmctld pod. Run kubectl edit slurmcluster slurm-job-demo to modify the Slurm-managed cluster. Set Spec.Slurmctld.Template.Spec.ServiceAccountName to the ServiceAccount you created:

```
apiVersion: kai.alibabacloud.com/v1
kind: SlurmCluster
...
spec:
  slurmctld:
    template:
      spec:
        serviceAccountName: slurm-job-demo
...
```

To apply the changes, rebuild the StatefulSet that manages slurmctld. Run kubectl get sts slurm-job-demo to find the StatefulSet, then run kubectl delete sts slurm-job-demo to delete it. The Slurm Operator rebuilds the StatefulSet and applies the new configurations.

Configure the auto scaling parameters in /etc/slurm/slurm.conf.

Method A: Manage ConfigMaps by using a shared volume

```
# The following parameters are required if you use on-cloud nodes.
# The SuspendProgram and ResumeProgram features are developed by Alibaba Cloud.
SuspendTimeout=600
ResumeTimeout=600
# The interval at which the node is automatically suspended when no job runs on the node.
SuspendTime=600
# Set the number of nodes that can be scaled per minute.
ResumeRate=1
SuspendRate=1
# Set the value of the NodeName parameter in the ${cluster_name}-worker-${group_name}- format.
# You must specify the amount of resources for the node in this line. Otherwise, the slurmctld pod
# considers that the node has only one vCPU. Make sure that the resources that you specify for the
# on-cloud nodes are the same as those declared in the workerGroupSpecs parameter. Otherwise,
# resources may be wasted.
NodeName=slurm-job-demo-worker-cpu-[0-10] Feature=cloud State=CLOUD
# The following configurations are fixed. Keep them unchanged.
CommunicationParameters=NoAddrCache
ReconfigFlags=KeepPowerSaveSettings
SuspendProgram="/slurm-suspend.sh"
ResumeProgram="/slurm-resume.sh"
```

Method B: Manually manage ConfigMaps
If slurm.conf is stored in the ConfigMap named slurm-config, run kubectl edit configmap slurm-config to add the following configurations:

```
slurm.conf: |
  ...
  # The following parameters are required if you use on-cloud nodes.
  # The SuspendProgram and ResumeProgram features are developed by Alibaba Cloud.
  SuspendTimeout=600
  ResumeTimeout=600
  # The interval at which the node is automatically suspended when no job runs on the node.
  SuspendTime=600
  # Set the number of nodes that can be scaled per minute.
  ResumeRate=1
  SuspendRate=1
  # Set the value of the NodeName parameter in the ${cluster_name}-worker-${group_name}- format.
  # You must specify the amount of resources for the node in this line. Otherwise, the slurmctld pod
  # considers that the node has only one vCPU. Make sure that the resources that you specify for the
  # on-cloud nodes are the same as those declared in the workerGroupSpecs parameter. Otherwise,
  # resources may be wasted.
  NodeName=slurm-job-demo-worker-cpu-[0-10] Feature=cloud State=CLOUD
  # The following configurations are fixed. Keep them unchanged.
  CommunicationParameters=NoAddrCache
  ReconfigFlags=KeepPowerSaveSettings
  SuspendProgram="/slurm-suspend.sh"
  ResumeProgram="/slurm-resume.sh"
```

Method C: Use Helm to manage ConfigMaps
Run the helm upgrade command to update the Slurm configuration.

```
slurm.conf: |
  ...
  # The following parameters are required if you use on-cloud nodes.
  # The SuspendProgram and ResumeProgram features are developed by Alibaba Cloud.
  SuspendTimeout=600
  ResumeTimeout=600
  # The interval at which the node is automatically suspended when no job runs on the node.
  SuspendTime=600
  # Set the number of nodes that can be scaled per minute.
  ResumeRate=1
  SuspendRate=1
  # Set the value of the NodeName parameter in the ${cluster_name}-worker-${group_name}- format.
  # You must specify the amount of resources for the node in this line. Otherwise, the slurmctld pod
  # considers that the node has only one vCPU. Make sure that the resources that you specify for the
  # on-cloud nodes are the same as those declared in the workerGroupSpecs parameter. Otherwise,
  # resources may be wasted.
  NodeName=slurm-job-demo-worker-cpu-[0-10] Feature=cloud State=CLOUD
  # The following configurations are fixed. Keep them unchanged.
  CommunicationParameters=NoAddrCache
  ReconfigFlags=KeepPowerSaveSettings
  SuspendProgram="/slurm-suspend.sh"
  ResumeProgram="/slurm-resume.sh"
```

Apply the new configuration. If the name of the Slurm-managed cluster is slurm-job-demo, run kubectl delete sts slurm-job-demo to apply the new configuration for the slurmctld pod.

Set the number of worker node replicas to 0 in the slurmcluster.yaml file so that you can observe node scaling activities in subsequent steps.

Manual management
Run kubectl edit slurmcluster slurm-job-demo and change the value of workerCount to 0 in the Slurm-managed cluster. This sets the number of worker node replicas to 0.

Manage by using Helm
In the values.yaml file, change .Values.workerGroup[].workerCount to 0. Then run helm upgrade slurm-job-demo . to update the current Helm chart. This sets the number of worker node replicas to 0.

Submit a job by using the sbatch command. Enter the following content after the command prompt:

```
cat << EOF > cloudnodedemo.sh
#!/bin/bash
srun hostname
EOF
```

Run cat cloudnodedemo.sh to check the script. Expected output:

```
#!/bin/bash
srun hostname
```

This output confirms that the script content is correct.

Run the following command to submit the script to the Slurm-managed cluster for processing:

```
sbatch cloudnodedemo.sh
```

Expected output:

```
Submitted batch job 1
```

This output indicates that the job is submitted and assigned a job ID.

View the cluster scaling results.

Run the following command to view the scale-out log:

```
cat /var/log/slurm-resume.log
```

Expected output:

```
namespace: default cluster: slurm-demo
resume called, args [slurm-demo-worker-cpu-0]
slurm cluster metadata: default slurm-demo
get SlurmCluster CR slurm-demo succeed
hostlists: [slurm-demo-worker-cpu-0]
resume node slurm-demo-worker-cpu-0
resume worker -cpu-0
resume node -cpu-0 end
```

This output indicates that the Slurm-managed cluster automatically added one compute node to execute the submitted job.

Run the following command to view the pods:

```
kubectl get pod
```

Expected output:

```
NAME                      READY   STATUS    RESTARTS   AGE
slurm-demo-head-9hn67     1/1     Running   0          21m
slurm-demo-worker-cpu-0   1/1     Running   0          43s
```

This output shows that the slurm-demo-worker-cpu-0 pod was added to the cluster, confirming that the cluster scaled out when the job was submitted.

Run the following command to view the node status:

```
sinfo
```

Expected output:

```
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite     10  idle~ slurm-job-demo-worker-cpu-[2-10]
debug*       up   infinite      1   idle slurm-job-demo-worker-cpu-[0-1]
```

This output shows that slurm-demo-worker-cpu-0 is the newly started node and another 10 on-cloud nodes are available for scale-out.

Run the following command to view the job details:

```
scontrol show job 1
```

Expected output:

```
JobId=1 JobName=cloudnodedemo.sh
   UserId=root(0) GroupId=root(0) MCS_label=N/A
   Priority=4294901757 Nice=0 Account=(null) QOS=(null)
   JobState=COMPLETED Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=UNLIMITED TimeMin=N/A
   SubmitTime=2024-05-28T11:37:36 EligibleTime=2024-05-28T11:37:36
   AccrueTime=2024-05-28T11:37:36
   StartTime=2024-05-28T11:37:36 EndTime=2024-05-28T11:37:36 Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2024-05-28T11:37:36 Scheduler=Main
   Partition=debug AllocNode:Sid=slurm-job-demo:93
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=slurm-job-demo-worker-cpu-0
   BatchHost=slurm-job-demo-worker-cpu-0
   NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   ReqTRES=cpu=1,mem=1M,node=1,billing=1
   AllocTRES=cpu=1,mem=1M,node=1,billing=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=//cloudnodedemo.sh
   WorkDir=/
   StdErr=//slurm-1.out
   StdIn=/dev/null
   StdOut=//slurm-1.out
   Power=
```

In the output, NodeList=slurm-job-demo-worker-cpu-0 indicates that the job was executed on the newly added node.

After a period of time, run the following command to view the scale-in results:

```
sinfo
```

Expected output:

```
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite     11  idle~ slurm-demo-worker-cpu-[0-10]
```

This output shows that the number of nodes available for scale-out has increased to 11, confirming that the automatic scale-in is complete.