You can use the Deep Learning Containers (DLC) client to submit training jobs of various types. This topic describes the commands used to submit training jobs, including call formats, parameter descriptions, and usage examples.
Common parameters that are used to submit training jobs
The parameters described in the following table are common to all training jobs that you submit by using the DLC client, regardless of whether the jobs are of the TensorFlow, PyTorch, or XGBoost type.
Table 1. Common parameters used to submit training jobs
Parameter | Required | Description | Type | Supported in the parameter description file |
name | Yes | The name of the job. The name does not need to be unique. | STRING | Yes |
command | Yes | The command that is run to start the node. | STRING | Yes |
data_sources | No | The ID of the associated dataset. You can obtain the dataset ID on the Datasets page. For more information, see Create and manage datasets. Separate multiple data sources with commas (,). By default, this parameter is left empty. | STRING | Yes |
code_source | No | The ID of the code set. You can obtain the code set ID on the Source Code Repositories page. For more information, see Code builds. You can specify only a single code source. By default, this parameter is left empty. | STRING | Yes |
code_branch | No | The branch of the code repository. This parameter is used together with the code_source parameter. | STRING | Yes |
code_commit | No | The commit ID of the code repository. This parameter is used together with the code_source parameter. | STRING | Yes |
thirdparty_libs | No | The third-party Python library. Separate multiple libraries with commas (,). By default, this parameter is left empty. | STRING | Yes |
thirdparty_lib_dir | No | The directory that contains the text file named requirements.txt. The file is used to install third-party Python libraries. By default, this parameter is left empty. | STRING | No |
vpc_id | No | The ID of the available virtual private cloud (VPC) for the job. By default, this parameter is left empty. | STRING | Yes |
switch_id | No (required if the vpc_id parameter is configured) | The ID of the available vSwitch for the job in the VPC that is specified by the vpc_id parameter. By default, this parameter is left empty. | STRING | Yes |
security_group_id | No (required if the vpc_id parameter is configured) | The ID of the available security group for the job in the VPC that is specified by the vpc_id parameter. By default, this parameter is left empty. | STRING | Yes |
job_file | No | The parameter description file of the job. If this parameter is specified, the parameters described in the file take precedence. Specify the parameters in the description file in the <parameterName>=<parameterValue> format. | STRING | No |
interactive | No | Specifies whether to start the job in interactive mode. | BOOL | Yes |
job_max_running_time_minutes | No | The maximum uptime of the job. The default value is 0, which indicates that the uptime of the job is unlimited. | INT64 | Yes |
success_policy | No | The success policy of the job. This parameter is supported only for TensorFlow jobs. By default, this parameter is left empty, which is equivalent to AllWorkers. | STRING | Yes |
envs | No | The environment variables for the worker nodes. Separate multiple environment variables with commas (,). Separate the key and value of each environment variable with an equal sign (=). For an example of this format, see the sketch that follows this table. | StringToString | Yes |
tags | No | The tags that you want to add to the job. Separate multiple tags with commas (,). Separate the key and value of each tag with an equal sign (=). | StringToString | Yes |
oversold_type | No | The way in which computing resources for off-peak hours are used for the job. | STRING | Yes |
driver | No | The GPU driver version used for the job. | STRING | Yes |
default_route | No | The method that is used to access the Internet when a virtual private cloud (VPC) is selected. | STRING | Yes |
priority | No | The priority of the job. Valid values: 1 to 9. Default value: 1. | INT32 | Yes |
exit_code_on_stopped | No | The exit code that the CLI returns when a job that runs in interactive mode is stopped. Default value: 0. | INT32 | Yes |
job_reserved_minutes | No | The retention period after the job ends. Unit: minutes. Default value: 0. | INT32 | Yes |
job_reserved_policy | No | The policy that is used to retain the job after it ends. | STRING | Yes |
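For reference, the following sketch shows how several of the preceding common parameters look in a parameter description file. All values are placeholders, and the job-type-specific parameters that are also required, such as workspace_id and the node settings, are described in the following sections.
name=example_job
command=python train.py
data_sources=data-2021xxxxxxxxxx-xxxxxxxxxxxx,data-2022xxxxxxxxxx-xxxxxxxxxxxx
envs=ENV_A=value_a,ENV_B=value_b
tags=team=algo,stage=test
job_max_running_time_minutes=120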
Submit TensorFlow training jobs
Feature description
Submit TensorFlow training jobs.
Syntax
You can use a command that contains related parameters or use a parameter description file to submit a TensorFlow training job.
./dlc submit tfjob [flags]
Parameter description
If you use a command that contains related parameters, include both the parameter keys and their actual values in the command. If you use a parameter description file, specify related parameters in the <parameterName>=<parameterValue> format in the file. The parameters common to all types of training jobs are described in the "Common parameters used to submit training jobs" section of this topic. The following table describes the parameters specific to submitting TensorFlow training jobs.
Table 2. Parameters specific to submitting TensorFlow training jobs
Parameter | Required | Description | Type | Supported in the parameter description file |
workspace_id | Yes | The ID of the workspace that is used to submit the job. By default, this parameter is left empty. For information about how to create a workspace, see Create a workspace. | STRING | Yes |
chief | No | Specifies whether to start the chief node. Default value: false. Valid values: true (starts the chief node) and false (does not start the chief node). | BOOL | Yes |
chief_image | No | The image of the chief node. By default, this parameter is left empty. | STRING | Yes |
chief_spec | No | The node type of the chief node. By default, this parameter is left empty. | STRING | Yes |
master_image | No | The image of the master node. By default, this parameter is left empty. | STRING | Yes |
master_spec | No | The node type of the master node. | STRING | Yes |
masters | No | The number of master nodes. Default value: 0. | INT | Yes |
ps | No | The number of parameter servers. Default value: 0. | INT | Yes |
ps_image | No | The image of the parameter server. By default, this parameter is left empty. | STRING | Yes |
ps_spec | No | The node type of the parameter server. By default, this parameter is left empty. | STRING | Yes |
worker_image | No | The image of the worker node. By default, this parameter is left empty. | STRING | Yes |
worker_spec | No | The node type of the worker node. By default, this parameter is left empty. | STRING | Yes |
workers | No | The number of worker nodes. Default value: 0. | INT | Yes |
evaluator_image | No | The image of the evaluator node. By default, this parameter is left empty. | STRING | Yes |
evaluator_spec | No | The node type of the evaluator node. By default, this parameter is left empty. | STRING | Yes |
evaluators | No | The number of evaluator nodes. Default value: 0. | INT | Yes |
graphlearn_image | No | The image of the GraphLearn node. By default, this parameter is left empty. | STRING | Yes |
graphlearn_spec | No | The node type of the GraphLearn node. By default, this parameter is left empty. | STRING | Yes |
graphlearns | No | The number of GraphLearn nodes. Default value: 0. | INT | Yes |
Table 3. Parameters specific to submitting TensorFlow training jobs to dedicated resource groups
Parameter | Required | Description | Type | Supported in the parameter description file |
resource_id | No (required if you want to submit a job to a dedicated resource group) | The ID of the dedicated resource quota. By default, this parameter is left empty. For more information about how to create a dedicated resource quota, see General computing resource quotas. | STRING | Yes |
priority | No | The priority of the job. Default value: 1. | INT | Yes |
chief_cpu | No | The number of CPU cores used by the chief node. By default, this parameter is left empty. | STRING | Yes |
chief_gpu | No | The number of GPU cores used by the chief node. By default, this parameter is left empty. | STRING | Yes |
chief_gpu_type | No | The GPU type used by the chief node. By default, this parameter is left empty. Example: GU50. | STRING | Yes |
chief_memory | No | The amount of memory used by the chief node. By default, this parameter is left empty. Examples: 500Mi and 1Gi. | STRING | Yes |
chief_shared_memory | No | The amount of memory shared by the chief node. By default, this parameter is left empty. Examples: 500Mi and 1Gi. | STRING | Yes |
master_cpu | No | The number of CPU cores used by the master node. By default, this parameter is left empty. | STRING | Yes |
master_gpu | No | The number of GPU cores used by the master node. By default, this parameter is left empty. | STRING | Yes |
master_gpu_type | No | The GPU type used by the master node. By default, this parameter is left empty. Example: GU50. | STRING | Yes |
master_memory | No | The amount of memory used by the master node. By default, this parameter is left empty. Examples: 500Mi and 1Gi. | STRING | Yes |
master_shared_memory | No | The amount of memory shared by the master node. By default, this parameter is left empty. Examples: 500Mi and 1Gi. | STRING | Yes |
*_cpu | No | The number of CPU cores used by the specified type of node, which is indicated by the wildcard character (*). By default, this parameter is left empty. The wildcard character (*) can represent a parameter server, worker, evaluator, or GraphLearn node. | STRING | Yes |
*_gpu | No | The number of GPU cores used by the specified type of node, which is indicated by the wildcard character (*). By default, this parameter is left empty. The wildcard character (*) can represent a parameter server, worker, evaluator, or GraphLearn node. | STRING | Yes |
*_gpu_type | No | The GPU type used by the specified type of node, which is indicated by the wildcard character (*). By default, this parameter is left empty. Example: GU50. The wildcard character (*) can represent a parameter server, worker, evaluator, or GraphLearn node. | STRING | Yes |
*_memory | No | The amount of memory used by the specified type of node, which is indicated by the wildcard character (*). By default, this parameter is left empty. Examples: 500Mi and 1Gi. The wildcard character (*) can represent a parameter server, worker, evaluator, or GraphLearn node. | STRING | Yes |
*_shared_memory | No | The amount of memory shared by the specified type of node, which is indicated by the wildcard character (*). By default, this parameter is left empty. Examples: 500Mi and 1Gi. The wildcard character (*) can represent a parameter server, worker, evaluator, or GraphLearn node. | STRING | Yes |
Examples
Run a command to submit a job that involves two worker nodes and one parameter server.
./dlc submit tfjob --name=test_2021 --ps=1 \
  --ps_spec=ecs.g6.8xlarge \
  --ps_image=registry-vpc.cn-beijing.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-cpu-py27-ubuntu16.04 \
  --workers=2 \
  --worker_spec=ecs.g6.4xlarge \
  --worker_image=registry-vpc.cn-beijing.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-cpu-py27-ubuntu16.04 \
  --command="python /root/data/dist_mnist/code/dist-main.py --max_steps=10000 --data_dir=/root/data/dist_mnist/data/" \
  --workspace_id=***** \
  --data_sources=data-2021xxxxxxxxxx-xxxxxxxxxxxx
The system displays information similar to the following output:
+----------------------------------+--------------------------------------+
| JobId                            | RequestId                            |
+----------------------------------+--------------------------------------+
| dlcmp6vwljkz****                 | xxxxxxxx-79AF-4EFC-9CE9-xxxxxxxxxxxx |
+----------------------------------+--------------------------------------+
Use a parameter description file to submit a job that involves two worker nodes and one parameter server.
./dlc submit tfjob --job_file=job_file.dist_mnist.1ps2w
job_file.dist_mnist.1ps2w indicates the parameter description file in which parameters are provided in the <parameterName>=<parameterValue> format. The job_file.dist_mnist.1ps2w file contains the following content:
name=test_2021
workers=2
worker_spec=ecs.g6.4xlarge
worker_image=registry-vpc.cn-beijing.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-cpu-py27-ubuntu16.04
ps=1
ps_spec=ecs.g6.8xlarge
ps_image=registry-vpc.cn-beijing.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-cpu-py27-ubuntu16.04
command=python /root/data/dist_mnist/code/dist-main.py --max_steps=10000 --data_dir=/root/data/dist_mnist/data/
workspace_id=*****
data_sources=data-2021xxxxxxxxxx-xxxxxxxxxxxx
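For a submission to a dedicated resource quota, the parameters in Table 3 are used instead of ECS specifications. The following parameter description file is a minimal sketch of this case; the workspace ID, resource quota ID, dataset ID, and per-node CPU and memory values are placeholders.
name=test_2021_dedicated
workers=2
worker_image=registry-vpc.cn-beijing.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-cpu-py27-ubuntu16.04
worker_cpu=4
worker_memory=8Gi
ps=1
ps_image=registry-vpc.cn-beijing.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-cpu-py27-ubuntu16.04
ps_cpu=8
ps_memory=16Gi
command=python /root/data/dist_mnist/code/dist-main.py --max_steps=10000 --data_dir=/root/data/dist_mnist/data/
workspace_id=*****
resource_id=quota*********
data_sources=data-2021xxxxxxxxxx-xxxxxxxxxxxx
The file is submitted in the same way as the preceding example, for example by running ./dlc submit tfjob --job_file=<file name>.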
Submit PyTorch training jobs
Feature description
Submit PyTorch training jobs.
Syntax
You can use a command that contains related parameters or use a parameter description file to submit a PyTorch training job.
./dlc submit pytorchjob [flags]
Parameter description
If you use a command that contains related parameters, include both the parameter keys and their actual values in the command. If you use a parameter description file, specify related parameters in the <parameterName>=<parameterValue> format in the file. The parameters common to all types of training jobs are described in the "Common parameters used to submit training jobs" section of this topic. The following table describes the parameters specific to submitting PyTorch training jobs.
Table 4. Parameters specific to submitting PyTorch training jobs
Parameter | Required | Description | Type | Supported in the parameter description file |
workspace_id | Yes | The ID of the workspace that is used to submit the job. By default, this parameter is left empty. For information about how to create a workspace, see Create a workspace. | STRING | Yes |
master_image | No | The image of the master node. By default, this parameter is left empty. | STRING | Yes |
master_spec | No | The node type of the master node. By default, this parameter is left empty. | STRING | Yes |
masters | No | The number of master nodes. Default value: 0. | INT | Yes |
worker_image | No | The image of the worker node. By default, this parameter is left empty. | STRING | Yes |
worker_spec | No | The node type of the worker node. By default, this parameter is left empty. | STRING | Yes |
workers | No | The number of worker nodes. Default value: 0. | INT | Yes |
Table 5. Parameters specific to submitting PyTorch training jobs to dedicated resource groups
Parameter | Required | Description | Type | Supported in the parameter description file |
resource_id | No (required if you want to submit a job to a dedicated resource group) | The ID of the dedicated resource quota. By default, this parameter is left empty. For more information about how to create a dedicated resource quota, see General computing resource quotas. | STRING | Yes |
priority | No | The priority of the job. Default value: 1. | INT | Yes |
master_cpu | No | The number of CPU cores used by the master node. By default, this parameter is left empty. | STRING | Yes |
master_gpu | No | The number of GPU cores used by the master node. By default, this parameter is left empty. | STRING | Yes |
master_gpu_type | No | The GPU type used by the master node. By default, this parameter is left empty. Example: GU50. | STRING | Yes |
master_memory | No | The amount of memory used by the master node. By default, this parameter is left empty. Examples: 500Mi and 1Gi. | STRING | Yes |
master_shared_memory | No | The amount of memory shared by the master node. By default, this parameter is left empty. Examples: 500Mi and 1Gi. | STRING | Yes |
worker_cpu | No | The number of CPU cores used by the worker node. By default, this parameter is left empty. | STRING | Yes |
worker_gpu | No | The number of GPU cores used by the worker node. By default, this parameter is left empty. | STRING | Yes |
worker_gpu_type | No | The GPU type used by the worker node. By default, this parameter is left empty. Example: GU50. | STRING | Yes |
worker_memory | No | The amount of memory used by the worker node. By default, this parameter is left empty. Examples: 500Mi and 1Gi. | STRING | Yes |
worker_shared_memory | No | The amount of memory shared by the worker node. By default, this parameter is left empty. Examples: 500Mi and 1Gi. | STRING | Yes |
Examples
Run a command that contains related parameters to submit a GPU model training job.
./dlc submit pytorchjob --name=test_pt_face \
  --workers=1 \
  --worker_spec=ecs.gn6e-c12g1.3xlarge \
  --worker_image=registry-vpc.cn-beijing.aliyuncs.com/pai-dlc/pytorch-training:1.7.1-gpu-py37-cu110-ubuntu18.04 \
  --command="apt-get update; apt-get -y --allow-downgrades install libpcre3=2:8.38-3.1 libpcre3-dev libgl1-mesa-glx libglib2.0-dev; cd /root/data/face; python train.py --num_workers 0 --save_folder outputs" \
  --data_sources=data-20210410224621-xxxxxxxxxxxx \
  --workspace_id=*****
The system displays information similar to the following output:
+----------------------------------+--------------------------------------+
| JobId                            | RequestId                            |
+----------------------------------+--------------------------------------+
| dlcu704xxuxk****                 | xxxxxxxx-79AF-4EFC-9CE9-xxxxxxxxxxxx |
+----------------------------------+--------------------------------------+
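The same job can also be described in a parameter description file and submitted by using a command such as ./dlc submit pytorchjob --job_file=<file name>. The following sketch mirrors the preceding command; the workspace ID and dataset ID are placeholders.
name=test_pt_face
workers=1
worker_spec=ecs.gn6e-c12g1.3xlarge
worker_image=registry-vpc.cn-beijing.aliyuncs.com/pai-dlc/pytorch-training:1.7.1-gpu-py37-cu110-ubuntu18.04
command=apt-get update; apt-get -y --allow-downgrades install libpcre3=2:8.38-3.1 libpcre3-dev libgl1-mesa-glx libglib2.0-dev; cd /root/data/face; python train.py --num_workers 0 --save_folder outputs
data_sources=data-20210410224621-xxxxxxxxxxxx
workspace_id=*****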
Submit XGBoost training jobs
Feature description
Submit XGBoost training jobs.
Syntax
You can use a command that contains related parameters or use a parameter description file to submit an XGBoost training job.
./dlc submit xgboostjob [flags]
Parameter description
If you use a command that contains related parameters, include both the parameter keys and their actual values in the command. If you use a parameter description file, specify related parameters in the <parameterName>=<parameterValue> format in the file. The parameters common to all types of training jobs are described in the "Common parameters used to submit training jobs" section of this topic. The following table describes the parameters specific to submitting XGBoost training jobs.
Table 6. Parameters specific to submitting XGBoost training jobs
Parameter | Required | Description | Type | Supported in the parameter description file |
workspace_id | Yes | The ID of the workspace that is used to submit the job. By default, this parameter is left empty. For information about how to create a workspace, see Create a workspace. | STRING | Yes |
master_image | No | The image of the master node. By default, this parameter is left empty. | STRING | Yes |
master_spec | No | The node type of the master node. By default, this parameter is left empty. | STRING | Yes |
masters | No | The number of master nodes. Default value: 0. | INT | Yes |
worker_image | No | The image of the worker node. By default, this parameter is left empty. | STRING | Yes |
worker_spec | No | The node type of the worker node. By default, this parameter is left empty. | STRING | Yes |
workers | No | The number of worker nodes. Default value: 0. | INT | Yes |
Table 7. Parameters specific to submitting XGBoost training jobs to dedicated resource groups
Parameter | Required | Description | Type | Supported in the parameter description file |
resource_id | No (required if you want to submit a job to a dedicated resource group) | The ID of the dedicated resource quota. By default, this parameter is left empty. For more information about how to create a dedicated resource quota, see General computing resource quotas. | STRING | Yes |
priority | No | The priority of the job. Default value: 1. | INT | Yes |
master_cpu | No | The number of CPU cores used by the master node. By default, this parameter is left empty. | STRING | Yes |
master_gpu | No | The number of GPU cores used by the master node. By default, this parameter is left empty. | STRING | Yes |
master_gpu_type | No | The GPU type used by the master node. By default, this parameter is left empty. Example: GU50. | STRING | Yes |
master_memory | No | The amount of memory used by the master node. By default, this parameter is left empty. Examples: 500Mi and 1Gi. | STRING | Yes |
master_shared_memory | No | The amount of memory shared by the master node. By default, this parameter is left empty. Examples: 500Mi and 1Gi. | STRING | Yes |
worker_cpu | No | The number of CPU cores used by the worker node. By default, this parameter is left empty. | STRING | Yes |
worker_gpu | No | The number of GPU cores used by the worker node. By default, this parameter is left empty. | STRING | Yes |
worker_gpu_type | No | The GPU type used by the worker node. By default, this parameter is left empty. Example: GU50. | STRING | Yes |
worker_memory | No | The amount of memory used by the worker node. By default, this parameter is left empty. Examples: 500Mi and 1Gi. | STRING | Yes |
worker_shared_memory | No | The amount of memory shared by the worker node. By default, this parameter is left empty. Examples: 500Mi and 1Gi. | STRING | Yes |
Examples
Run a command that contains related parameters to submit an XGBoost training job.
./dlc submit xgboostjob --name=test_xgboost \
  --workers=1 \
  --worker_spec=ecs.gn6e-c12g1.3xlarge \
  --worker_image=xgboost-training:1.6.0-cpu-py36-ubuntu18.04 \
  --command="python /root/code/horovod/xgboost/main.py --job_type=Train --xgboost_parameter=objective:multi:softprob,num_class:3 --n_estimators=50 --model_path=autoAI/xgb-opt/2" \
  --workspace_id=*****
The system displays information similar to the following output:
+----------------------------------+--------------------------------------+
| JobId                            | RequestId                            |
+----------------------------------+--------------------------------------+
| dlc1nvu3gli0****                 | xxxxxxxx-79AF-4EFC-9CE9-xxxxxxxxxxxx |
+----------------------------------+--------------------------------------+
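The same job can also be described in a parameter description file and submitted by using a command such as ./dlc submit xgboostjob --job_file=<file name>. The following sketch mirrors the preceding command; the workspace ID is a placeholder.
name=test_xgboost
workers=1
worker_spec=ecs.gn6e-c12g1.3xlarge
worker_image=xgboost-training:1.6.0-cpu-py36-ubuntu18.04
command=python /root/code/horovod/xgboost/main.py --job_type=Train --xgboost_parameter=objective:multi:softprob,num_class:3 --n_estimators=50 --model_path=autoAI/xgb-opt/2
workspace_id=*****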
Advanced parameters that are used to submit training jobs
Specify nodes when submitting jobs
You can configure parameters to specify nodes when submitting training jobs with Lingjun or general computing resource quotas by using the DLC client.
This feature is available only for users in a whitelist. Contact your account manager to add your account to the whitelist.
Parameters
Parameter | Description | Example |
--allow_nodes="${allow_nodes}" | The list of allowed nodes. Separate multiple nodes with commas (,). We recommend that you do not include spaces between nodes. | lingjuc47iextvg9-***,lingjuc47iextvg9-*** |
--deny_nodes="${deny_nodes}" | The list of denied nodes. Separate multiple nodes with commas (,). We recommend that you do not include spaces between nodes. | lingjuc47iextvg9-***,lingjuc47iextvg9-*** |
Examples
Command line parameters
Sample command:
No nodes specified
./dlc submit pytorchjob --name=assign_node_test_no_node \
  --workers=1 \
  --worker_image=dsw-registry-vpc.****.cr.aliyuncs.com/pai/easyanimate:1.1.5-pytorch2.2.0-gpu-py310-cu118-ubuntu22.04 \
  --command="sleep 1000" \
  --workspace_id='****' \
  --resource_id='quotau2h98mt****' \
  --worker_cpu="1" \
  --worker_memory='2Gi'
Specify allowed nodes
./dlc submit pytorchjob --name=assign_node_test_2_allow_nodes \
  --workers=1 \
  --worker_image=dsw-registry-vpc.****.cr.aliyuncs.com/pai/easyanimate:1.1.5-pytorch2.2.0-gpu-py310-cu118-ubuntu22.04 \
  --command="sleep 1000" \
  --workspace_id='****' \
  --resource_id='quotau2h98mt****' \
  --worker_cpu="1" \
  --worker_memory='2Gi' \
  --allow_nodes="lingjuc47iextvg9-****,lingjuc47iextvg9-****"
Specify denied nodes
./dlc submit pytorchjob --name=assign_node_test_two_deny_nodes \
  --workers=1 \
  --worker_image=dsw-registry-vpc.****.cr.aliyuncs.com/pai/easyanimate:1.1.5-pytorch2.2.0-gpu-py310-cu118-ubuntu22.04 \
  --command="sleep 1000" \
  --workspace_id='****' \
  --resource_id='quotau2h98mt****' \
  --worker_cpu="1" \
  --worker_memory='2Gi' \
  --deny_nodes="lingjuc47iextvg9-****,lingjuc47iextvg9-****"
Specify allowed and denied nodes
./dlc submit pytorchjob --name=assign_node_test_two_allow_two_deny \
  --workers=1 \
  --worker_image=dsw-registry-vpc.****.cr.aliyuncs.com/pai/easyanimate:1.1.5-pytorch2.2.0-gpu-py310-cu118-ubuntu22.04 \
  --command="sleep 1000" \
  --workspace_id='****' \
  --resource_id='quotau2h98mt****' \
  --worker_cpu="1" \
  --worker_memory='2Gi' \
  --allow_nodes="lingjuc47iextvg9-****,lingjuc47iextvg9-****" \
  --deny_nodes="lingjuc47iextvg9-****,lingjuc47iextvg9-****"
Read file
Sample command:
./dlc submit pytorchjob -f job_file
Example of job parameter configuration file, job_file:
No nodes specified
name=assign_node_test_no_node
workers=1
worker_image=dsw-registry-vpc.****.cr.aliyuncs.com/pai/easyanimate:1.1.5-pytorch2.2.0-gpu-py310-cu118-ubuntu22.04
command=sleep 1000
workspace_id=****
resource_id=quotau2h98mt****
worker_cpu=1
worker_memory=2Gi
Specify allowed nodes
name=assign_node_test_2_allow_nodes
workers=1
worker_image=dsw-registry-vpc.****.cr.aliyuncs.com/pai/easyanimate:1.1.5-pytorch2.2.0-gpu-py310-cu118-ubuntu22.04
command=sleep 1000
workspace_id=****
resource_id=quotau2h98mt****
worker_cpu=1
worker_memory=2Gi
allow_nodes=lingjuc47iextvg9-****,lingjuc47iextvg9-****
Specify denied nodes
name=assign_node_test_two_deny_nodes
workers=1
worker_image=dsw-registry-vpc.****.cr.aliyuncs.com/pai/easyanimate:1.1.5-pytorch2.2.0-gpu-py310-cu118-ubuntu22.04
command=sleep 1000
workspace_id=****
resource_id=quotau2h98mt****
worker_cpu=1
worker_memory=2Gi
deny_nodes=lingjuc47iextvg9-****,lingjuc47iextvg9-****
Specify allowed and denied nodes
name=assign_node_test_two_allow_two_deny
workers=1
worker_image=dsw-registry-vpc.****.cr.aliyuncs.com/pai/easyanimate:1.1.5-pytorch2.2.0-gpu-py310-cu118-ubuntu22.04
command=sleep 1000
workspace_id=****
resource_id=quotau2h98mt****
worker_cpu=1
worker_memory=2Gi
allow_nodes=lingjuc47iextvg9-****,lingjuc47iextvg9-****
deny_nodes=lingjuc47iextvg9-****,lingjuc47iextvg9-****
Disable pay-as-you-go inventory check when submitting jobs
You can configure the disable_ecs_stock_check parameter to disable the pay-as-you-go inventory check when you submit training jobs by using the DLC client.
Parameters
Parameter | Description | Example |
disable_ecs_stock_check | Specifies whether to disable the pay-as-you-go inventory check. Valid values: false (default): enables the pay-as-you-go inventory check. true: disables the pay-as-you-go inventory check. | true or false |
Examples
Command line parameters
Sample command:
Enable pay-as-you-go inventory check
./dlc submit pytorchjob \
  --name=test_skip_checking3 \
  --command='sleep 1000' \
  --workspace_id=**** \
  --priority=1 \
  --workers=1 \
  --worker_image=registry.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12PAI-gpu-py36-cu101-ubuntu18.04 \
  --worker_spec=ecs.g6.xlarge
Disable pay-as-you-go inventory check
./dlc submit pytorchjob \
  --name=test_skip_checking3 \
  --command='sleep 1000' \
  --workspace_id=**** \
  --priority=1 \
  --workers=1 \
  --worker_image=registry.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12PAI-gpu-py36-cu101-ubuntu18.04 \
  --worker_spec=ecs.g6.xlarge \
  --disable_ecs_stock_check=true
Read file
Sample command:
./dlc submit pytorchjob -f job_file
Example of job parameter configuration file, job_file:
Enable pay-as-you-go inventory check
name=test_skip_checking3
workers=1
worker_image=registry.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12PAI-gpu-py36-cu101-ubuntu18.04
command=sleep 1000
workspace_id=****
worker_spec=ecs.g6.xlarge
Disable pay-as-you-go inventory check
name=test_skip_checking3
workers=1
worker_image=registry.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12PAI-gpu-py36-cu101-ubuntu18.04
command=sleep 1000
workspace_id=****
worker_spec=ecs.g6.xlarge
disable_ecs_stock_check=true
References
After you submit a job, you can use the DLC client to manage the job. For more information, see Command used to stop training jobs and Commands used to query logs or jobs.
You can also manage submitted jobs in the PAI console. For more information, see Manage training jobs.