AutoML is an enhanced machine learning service provided by Platform for AI (PAI). It integrates multiple algorithms and distributed computing resources. With AutoML, you can create experiments that fine-tune model hyperparameters without writing code, which improves the efficiency and performance of machine learning. This topic describes how to create an experiment.
Background information
How AutoML works:
An experiment generates multiple hyperparameter combinations based on the configured algorithm and creates a trial for each combination. Depending on the execution configuration of the experiment, each trial corresponds to one Deep Learning Containers (DLC) task or to one or more MaxCompute tasks. The system runs each trial as the configured task, schedules multiple trials, and compares their results to find the optimal hyperparameter combination. For more information, see How AutoML works.
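The following is a minimal conceptual sketch, in Python, of the trial-scheduling loop described above. It uses random search over a toy search space; all names and the placeholder metric are illustrative and are not part of a PAI API.
import random

# Conceptual sketch of the search loop that an experiment runs.
# The search space and the toy metric below are illustrative only.
search_space = {"lr": [0.001, 0.01, 0.1], "batch_size": [32, 64, 128]}

def sample_combination():
    # One hyperparameter combination, which corresponds to one trial.
    return {name: random.choice(values) for name, values in search_space.items()}

def run_trial(params):
    # Stand-in for the DLC or MaxCompute task; returns the trial metric.
    return -abs(params["lr"] - 0.01) - abs(params["batch_size"] - 64) / 1000

trials = []
for _ in range(10):  # corresponds to Maximum Trials
    params = sample_combination()
    trials.append((params, run_trial(params)))

best_params, best_metric = max(trials, key=lambda t: t[1])
print(best_params, best_metric)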
Prerequisites
The permissions required to use AutoML are granted to your account. This prerequisite must be met if you use AutoML for the first time. For more information, see Grant permissions that are required to use AutoML.
A workspace is created. For more information, see Create a workspace.
The following operations are performed before you create a DLC task:
The permissions required to use DLC are granted to your account. For more information, see Grant the permissions that are required to use DLC.
A resource group is prepared. For more information about how to prepare a public resource group or a dedicated resource group for general computing resources, see Create a resource group and purchase general computing resources. For more information about how to prepare a Lingjun resource group, see Lingjun resource quotas.
MaxCompute resources are prepared and associated with the created workspace. This prerequisite must be met if you want to create a MaxCompute task. For more information, see MaxCompute resource quotas.
Procedure
Go to the AutoML page.
Log on to the PAI console.
In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.
In the left-side navigation pane, choose AutoML.
On the AutoML page, click Create Experiment.
On the Create Experiment page, configure the parameters.
Parameters in the Basic Information section
Parameter
Description
Name
The name of the experiment. Specify the name as prompted.
Description
A brief description of the experiment, which helps distinguish it from other experiments.
Visibility
The visibility of the experiment. Valid values:
Visible to Me: The experiment is visible only to your account and the administrators of the current workspace.
Visible to Current Workspace: The experiment is visible to all users in the workspace.
Parameters in the Execution Configurations section
Job Type: the execution environment of the trial. You can select DLC or MaxCompute.
DLC: DLC tasks are executed for hyperparameter fine-tuning. For more information about DLC tasks, see Submit training jobs.
MaxCompute: SQL commands or PAI commands of Machine Learning Designer components are run by consuming MaxCompute computing resources to perform hyperparameter fine-tuning. For more information about Machine Learning Designer components and the PAI commands supported by each component, see Component reference: Overview of all components.
DLC
If you select DLC for Job Type, configure the following parameters.
Parameter
Description
Framework
The supported framework. Valid values:
TensorFlow
PyTorch
Datasets
The datasets that you prepared. For more information about how to configure datasets, see Create and manage datasets.
Code
The repository that stores the code file of the task. Specify the repository of the code file that you prepared. For more information, see Code builds.
Note: DLC downloads the code to the specified working directory. Therefore, you must have permissions to access the code repository.
Resource Group
The public resource group or a dedicated resource group that you have purchased. For more information about how to prepare a resource group, see Create a resource group and purchase general computing resources and Lingjun resource quotas.
Instance Type
The instance type that is required to run the task. The prices of instances vary based on their types. For billing details of each instance type, see Billing of general computing resources (DSW/DLC).
Node Image
The image used by the worker nodes. Valid values:
Community Image: a standard image provided by the community. For more information about different community images, see the "Community images (open source standard images)" section in Public images.
PAI Image: an image provided by Alibaba Cloud PAI. PAI images support different types of resources, Python versions, and deep learning frameworks (TensorFlow and PyTorch). For more information about PAI images, see Public images.
Custom Image: a custom image that you add to PAI. Before you select a custom image, you must add the custom image to PAI. For more information, see Custom images.
Image Address: the address of a custom, community, or PAI image. If you select Image Address, you must also specify the URL of the Docker registry image that you want to access over the Internet.
Nodes
The number of compute nodes that are used in the DLC task.
Important: If you configure multiple nodes, each node is billed separately at the price of the selected instance type. When you specify this parameter, estimate the cost of every node and weigh costs against performance.
vCPUs
Memory (GiB)
Shared Memory (GiB)
GPUs
If you select a purchased dedicated resource group from the Resource Group drop-down list, specify these parameters based on the specifications of the purchased resources.
Advanced Settings
Advanced settings help you improve training flexibility or adapt to specific training scenarios. If you select PyTorch from the Framework drop-down list, you can configure advanced settings. For more information about the supported advanced parameters and their valid values, see the "Appendix 1: Advanced parameters" section in Submit training jobs.
Node Startup Command
The command that is run to start a node. You must specify ${Custom hyperparameter variables} in the command to configure hyperparameter variables (see the script sketch after this table). Example:
python /mnt/data/examples/search/dlc_mnist/mnist.py --data_dir=/mnt/data/examples/search/data --save_model=/mnt/data/examples/search/model/model_${exp_id}_${trial_id} --batch_size=${batch_size} --lr=${lr} --metric_filepath=/mnt/data/examples/search/metric/metric_${exp_id}_${trial_id}
In the preceding command, ${batch_size} and ${lr} are the hyperparameter variables that you define.
Hyperparameter
The hyperparameter list is automatically loaded based on the hyperparameter variables configured in the startup command. You must specify Constraint Type and Search Space for each hyperparameter.
Constraint Type: the constraint that is imposed on the hyperparameter. You can move the pointer over the icon next to Constraint Type to view the supported constraint types and relevant descriptions.
Search Space: the value range of the hyperparameter. The method to configure a search space varies based on the constraint type of a hyperparameter. You can click the icon and add a value as prompted.
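For reference, the following is a hypothetical sketch of what the mnist.py entry script from the example startup command might look like. The flag names mirror the example command; the training logic is elided, and the metric is written as a simple JSON key-value file (the exact file format that the experiment expects depends on the Metric Type that you configure in the Trial Configuration section).
import argparse
import json

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--data_dir", required=True)
    parser.add_argument("--save_model", required=True)
    parser.add_argument("--batch_size", type=int)   # filled in by ${batch_size}
    parser.add_argument("--lr", type=float)         # filled in by ${lr}
    parser.add_argument("--metric_filepath", required=True)
    args = parser.parse_args()

    # ... train the model with args.batch_size and args.lr,
    # then save it to args.save_model ...
    accuracy = 0.9  # placeholder for the real validation metric

    # Write the final metric where the experiment can collect it.
    with open(args.metric_filepath, "w") as f:
        json.dump({"accuracy": accuracy}, f)

if __name__ == "__main__":
    main()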
MaxCompute
If you select MaxCompute for Job Type, configure the following parameters.
Parameter
Description
Command
The SQL command or the PAI command of a specific Machine Learning Designer component. You must specify ${Custom hyperparameter variables} in the command to configure hyperparameter variables. Example:
pai -name kmeans -project algo_public -DinputTableName=pai_kmeans_test_input -DselectedColNames=f0,f1 -DappendColNames=f0,f1 -DcenterCount=${centerCount} -Dloop=10 -Daccuracy=0.01 -DdistanceType=${distanceType} -DinitCenterMethod=random -Dseed=1 -DmodelName=pai_kmeans_test_output_model_${exp_id}_${trial_id} -DidxTableName=pai_kmeans_test_output_idx_${exp_id}_${trial_id} -DclusterCountTableName=pai_kmeans_test_output_couter_${exp_id}_${trial_id} -DcenterTableName=pai_kmeans_test_output_center_${exp_id}_${trial_id};
In the preceding command, ${centerCount} and ${distanceType} are the hyperparameter variables that you define. For more configuration examples, see Appendix: References in this topic.
Hyperparameter
The hyperparameter list is automatically loaded based on the hyperparameter variables configured in the command. You must specify Constraint Type and Search Space for each hyperparameter.
Constraint Type: the constraint that is imposed on the hyperparameter. You can move the pointer over the icon next to Constraint Type to view the supported constraint types and relevant descriptions.
Search Space: the value range of the hyperparameter. The method to configure a search space varies based on the constraint type of a hyperparameter. You can click the icon and add a value as prompted.
Parameters in the Trial Configuration section
Configure the following parameters to specify how each trial, which runs a task with a specific hyperparameter combination, is evaluated.
Parameter
Description
Metric Type
The type of the metric that is used to evaluate the trial. Valid values:
summary: The final metric values are extracted from the TensorFlow summary file that is obtained from Object Storage Service (OSS).
table: The final metric values are extracted from a MaxCompute table.
stdout: The final metric values are extracted from stdout in the running process.
json: The final metric values are stored in OSS as JSON files.
Method
The method that is used to calculate the final metric value of a trial from the intermediate metric values that are gradually generated during task execution. Valid values:
final: The last metric value is used as the final metric value of an entire trial.
best: The optimal metric value that is obtained during the task execution process is used as the final metric value of an entire trial.
avg: The average value of all intermediate metric values that are obtained during the task execution process is used as the final metric value of an entire trial.
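For example, if a trial reports the intermediate metric values 0.80, 0.85, and 0.83 for a metric that is maximized, the final metric value of the trial is 0.83 with final, 0.85 with best, and approximately 0.827 with avg.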
Metric Weight
If you need to consider multiple metrics at the same time, you can configure the names and weights of the metrics. The system then uses the weighted sum value as the final metric value.
key: the name of a metric. Regular expressions are supported.
value: the weight of a metric.
Note: The weight can be a negative value, and the sum of the weights does not need to equal 1. You can configure custom values.
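For example (hypothetical values), if a trial produces accuracy = 0.92 and loss = 0.30, and you configure the weights accuracy: 1.0 and loss: -0.5, the final metric value is 0.92 × 1.0 + 0.30 × (-0.5) = 0.77.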
Metric Source
The metric source.
If you select summary or json from the Metric Type drop-down list, you must configure a file path. Example: oss://examplebucket/examples/search/pai/model/model_${exp_id}_${trial_id}.
If you select table from the Metric Type drop-down list, you must configure an SQL statement that returns a specific result. Example: select GET_JSON_OBJECT(summary, '$.calinhara') as vrc from pai_ft_cluster_evaluation_out_${exp_id}_${trial_id}.
If you select stdout from the Metric Type drop-down list, you must configure a command keyword in the format cmdx or cmdx;xxx, such as cmd1;worker.
Optimization
The optimization goal that is used to evaluate the trial result. Valid values:
Maximize
Minimize
Model Storage Path
The path where the model is stored. The path must contain ${exp_id}_${trial_id} to distinguish between models that are generated by different hyperparameter combinations. Example: oss://examplebucket/examples/search/pai/model/model_${exp_id}_${trial_id}.
Parameters in the Search Configurations section
Parameter
Description
Search Algorithm
The automated machine learning algorithm. Based on this algorithm, the hyperparameter search space, and the results and metrics of completed trials, the system selects the hyperparameter combination for the next trial. Valid values:
TPE
Random
GridSearch
Evolution
GP
PBT
For more information about the search algorithms, see the "Supported search algorithms" section in Limits and usage notes of AutoML.
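For example, GridSearch exhaustively enumerates the search space, so the number of trials equals the product of the sizes of all hyperparameter search spaces. A minimal Python sketch with hypothetical values:
from itertools import product

# GridSearch runs one trial per combination of search space values.
search_space = {"lr": [0.001, 0.01, 0.1], "batch_size": [32, 64, 128, 256]}
combinations = list(product(*search_space.values()))
print(len(combinations))  # 3 x 4 = 12 trials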
Maximum Trials
The maximum number of trials that can be run in the experiment.
Maximum Concurrent Trials
The maximum number of trials that can be concurrently run in the experiment.
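For example, if you set Maximum Trials to 20 and Maximum Concurrent Trials to 4, the experiment runs at most 4 trials in parallel and stops scheduling new trials after 20 trials have run.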
Click Submit.
You can view the created experiment in the experiment list.
What to do next
You can view the experiment details at any time to obtain the progress of the experiment. You can view the execution result of each trial to obtain the optimal hyperparameter combination. For more information, see View experiment details.
You can manage experiments. For more information, see Manage experiments.
Appendix: References
The following configuration example is provided for your reference when you use a MaxCompute task to perform hyperparameter fine-tuning.
Machine Learning Designer components: K-means Clustering and Clustering Model Evaluation.
The following code shows the cmd1 and cmd2 commands that are used for the two components, listed in execution order. For the detailed procedure, see Best practice for running the K-means Clustering component.
cmd1
pai -name kmeans -project algo_public -DinputTableName=pai_kmeans_test_input -DselectedColNames=f0,f1 -DappendColNames=f0,f1 -DcenterCount=${centerCount} -Dloop=10 -Daccuracy=0.01 -DdistanceType=${distanceType} -DinitCenterMethod=random -Dseed=1 -DmodelName=pai_kmeans_test_output_model_${exp_id}_${trial_id} -DidxTableName=pai_kmeans_test_output_idx_${exp_id}_${trial_id} -DclusterCountTableName=pai_kmeans_test_output_couter_${exp_id}_${trial_id} -DcenterTableName=pai_kmeans_test_output_center_${exp_id}_${trial_id};
cmd2
PAI -name cluster_evaluation -project algo_public -DinputTableName=pai_cluster_evaluation_test_input -DselectedColNames=f0,f1 -DmodelName=pai_kmeans_test_output_model_${exp_id}_${trial_id} -DoutputTableName=pai_ft_cluster_evaluation_out_${exp_id}_${trial_id};
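In this example, cmd1 trains the K-means model and cmd2 evaluates the clustering result. Because both commands append the ${exp_id}_${trial_id} suffix to their model and table names, each trial writes to its own tables, and the Metric Source SQL statement shown in the Trial Configuration section can read the evaluation result of each trial from pai_ft_cluster_evaluation_out_${exp_id}_${trial_id}.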