
Platform for AI: Create an experiment

Last Updated: Mar 18, 2024

AutoML is an enhanced machine learning service that is provided by Platform for AI (PAI). It integrates multiple algorithms with distributed computing resources. With AutoML, you can create experiments that fine-tune model hyperparameters without writing code, which improves the efficiency and performance of machine learning. This topic describes how to create an experiment.

Background information

How AutoML works:

An experiment generates multiple hyperparameter combinations based on the configured algorithm. The experiment creates a trial for each hyperparameter combination. Each trial may correspond to one Deep Learning Containers (DLC) task or one or more MaxCompute tasks. The task type varies based on the execution configuration of the experiment. The system runs a trial based on the configured task. The experiment schedules and runs multiple trials and compares the results of these trials to find the optimal hyperparameter combination. For more information about how AutoML works, see How AutoML works.
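
The following minimal Python sketch mimics this loop by using random search and a placeholder objective. It is a conceptual illustration of the workflow under stated assumptions, not the AutoML implementation: a real trial runs a DLC or MaxCompute task and reports a metric such as accuracy.

    import random

    # Hypothetical search space: each hyperparameter maps to candidate values.
    search_space = {
        "batch_size": [32, 64, 128],
        "lr": [0.0001, 0.001, 0.01],
    }

    def run_trial(params):
        # Placeholder objective; a real trial runs a training task and
        # reports a metric such as accuracy.
        return 1.0 - abs(params["lr"] - 0.001) - abs(params["batch_size"] - 64) / 1000.0

    best_params, best_metric = None, float("-inf")
    for trial_id in range(5):  # corresponds to Maximum Trials
        # Random search: sample one hyperparameter combination per trial.
        params = {name: random.choice(values) for name, values in search_space.items()}
        metric = run_trial(params)
        if metric > best_metric:
            best_params, best_metric = params, metric

    print("best combination:", best_params, "metric:", best_metric)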

Prerequisites

Procedure

  1. Go to the AutoML page.

    1. Log on to the PAI console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.

    3. In the left-side navigation pane, choose Model Development and Training > AutoML.

  2. On the AutoML page, click Create Experiment.

  3. On the Create Experiment page, configure the parameters.

    • Parameters in the Basic Information section

      • Name: The name of the experiment. Specify the name as prompted.

      • Description: A brief description of the experiment. The description helps you distinguish between experiments.

      • Visibility: The visibility of the experiment. Valid values:

        • Visible to Me: The experiment is visible only to your account and the administrators of the current workspace.

        • Visible to Current Workspace: The experiment is visible to all users in the workspace.

    • Parameters in the Execution Configurations section

      Job Type: the execution environment of the trial. You can select DLC or MaxCompute.

      • DLC: DLC tasks are executed for hyperparameter fine-tuning. For more information about DLC tasks, see Submit training jobs.

      • MaxCompute: SQL commands or the PAI commands of Machine Learning Designer components are run on MaxCompute computing resources to perform hyperparameter fine-tuning. For more information about Machine Learning Designer components and the PAI commands that each component supports, see Component reference: Overview of all components.

      DLC

      If you select DLC for Job Type, configure the following parameters.

      • Framework: The supported framework. Valid values: TensorFlow and PyTorch.

      • Datasets: The datasets that you prepared. For more information about how to configure datasets, see Create and manage datasets.

      • Code: The repository that stores the code file of the task. Specify the repository of the code file that you prepared. For more information about the configuration method, see Code builds.

        Note: DLC downloads the code to a specified working directory. Therefore, you must have the permissions to access the code repository.

      • Resource Group: The public resource group or a dedicated resource group that you purchased. For more information about how to prepare a resource group, see Create a resource group and purchase general computing resources and Lingjun resource quotas.

      • Instance Type: The instance type that is used to run the task. The prices of instances vary based on their types. For the billing details of each instance type, see Billing of general computing resources (DSW/DLC).

      • Node Image: The image used by the worker nodes. Valid values:

        • Community Image: a standard image provided by the community. For more information about the community images, see the "Community images (open source standard images)" section in Public images.

        • PAI Image: an image provided by Alibaba Cloud PAI. PAI images support different resource types, Python versions, and deep learning frameworks (TensorFlow and PyTorch). For more information about PAI images, see Public images.

        • Custom Image: a custom image that you added to PAI. Before you can select a custom image, you must add the image to PAI. For more information, see Custom images.

        • Image Address: the address of a custom, community, or PAI image. If you select Image Address, you must specify the URL of a Docker registry image that is accessible over the Internet.

      • Nodes: The number of compute nodes that are used by the DLC task.

        Important: If you configure multiple nodes, each node is billed separately, not as a single instance. When you specify this parameter, estimate the cost of each node and weigh the trade-off between cost and performance.

      • vCPUs, Memory (GiB), Shared Memory (GiB), and GPUs: If you select a purchased dedicated resource group from the Resource Group drop-down list, specify these parameters based on the specifications of the purchased resources.

      • Advanced Settings: Advanced settings help you improve training flexibility or adapt to specific training scenarios. If you select PyTorch from the Framework drop-down list, you can configure advanced settings. For more information about the supported advanced parameters and their valid values, see the "Appendix 1: Advanced parameters" section in Submit training jobs.

      • Node Startup Command: The command that is run to start a node. You must specify ${Custom hyperparameter variables} in the command to configure hyperparameter variables. Example:

        python /mnt/data/examples/search/dlc_mnist/mnist.py --data_dir=/mnt/data/examples/search/data --save_model=/mnt/data/examples/search/model/model_${exp_id}_${trial_id} --batch_size=${batch_size} --lr=${lr} --metric_filepath=/mnt/data/examples/search/metric/metric_${exp_id}_${trial_id}

        In the preceding command, ${batch_size} and ${lr} are the hyperparameter variables that you define. For a sketch of how a training script can consume these values, see the example after this list.

      • Hyperparameter: The hyperparameter list is automatically loaded based on the hyperparameter variables that are configured in the startup command. You must specify Constraint Type and Search Space for each hyperparameter.

        • Constraint Type: the constraint that is imposed on the hyperparameter. Move the pointer over the icon next to Constraint Type to view the supported constraint types and their descriptions.

        • Search Space: the value range of the hyperparameter. The method that you use to configure a search space varies based on the constraint type of the hyperparameter. Click the add icon and add values as prompted.
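
      For reference, the following hypothetical Python sketch shows how a training script such as the mnist.py file in the preceding example might consume the substituted values and report a metric. The argument names mirror the startup command, but the script body and the metric file format are assumptions, not a documented contract.

        import argparse

        # Hypothetical training script; the real mnist.py lives in your code repository.
        parser = argparse.ArgumentParser()
        parser.add_argument("--data_dir", required=True)
        parser.add_argument("--save_model", required=True)
        parser.add_argument("--batch_size", type=int, required=True)  # filled from ${batch_size}
        parser.add_argument("--lr", type=float, required=True)        # filled from ${lr}
        parser.add_argument("--metric_filepath", required=True)
        args = parser.parse_args()

        # ... train a model with args.batch_size and args.lr,
        # then save it to args.save_model ...
        accuracy = 0.9  # placeholder for the real evaluation result

        # Write the metric so that AutoML can read it. The exact format that
        # AutoML expects depends on the Metric Type and Metric Source that you
        # configure; the line format below is an assumption for illustration.
        with open(args.metric_filepath, "w") as f:
            f.write(f"accuracy={accuracy}\n")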

      MaxCompute

      If you select MaxCompute for Job Type, configure the following parameters.

      • Command: The SQL command or the PAI command of a specific Machine Learning Designer component. You must specify ${Custom hyperparameter variables} in the command to configure hyperparameter variables. Example:

        pai -name kmeans
            -project algo_public
            -DinputTableName=pai_kmeans_test_input
            -DselectedColNames=f0,f1
            -DappendColNames=f0,f1
            -DcenterCount=${centerCount}
            -Dloop=10
            -Daccuracy=0.01
            -DdistanceType=${distanceType}
            -DinitCenterMethod=random
            -Dseed=1
            -DmodelName=pai_kmeans_test_output_model_${exp_id}_${trial_id}
            -DidxTableName=pai_kmeans_test_output_idx_${exp_id}_${trial_id}
            -DclusterCountTableName=pai_kmeans_test_output_counter_${exp_id}_${trial_id}
            -DcenterTableName=pai_kmeans_test_output_center_${exp_id}_${trial_id};

        In the preceding command, ${centerCount} and ${distanceType} are the hyperparameter variables that you define. For a sketch of how these variables are substituted before a trial runs, see the example after this list.

        For more configuration examples, see Appendix: References in this topic.

      • Hyperparameter: The hyperparameter list is automatically loaded based on the hyperparameter variables that are configured in the command. You must specify Constraint Type and Search Space for each hyperparameter.

        • Constraint Type: the constraint that is imposed on the hyperparameter. Move the pointer over the icon next to Constraint Type to view the supported constraint types and their descriptions.

        • Search Space: the value range of the hyperparameter. The method that you use to configure a search space varies based on the constraint type of the hyperparameter. Click the add icon and add values as prompted.
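
      Conceptually, before each trial starts, the ${...} variables in the command are replaced with the values that the search algorithm chose for that trial. The following Python sketch reproduces this substitution step with the standard string.Template class, whose ${name} placeholder syntax matches the preceding examples. The trial values are hypothetical, and this is an illustration of the concept, not the platform's actual mechanism.

        from string import Template

        # A shortened version of the preceding PAI command, kept as a template.
        command = Template(
            "pai -name kmeans -project algo_public "
            "-DcenterCount=${centerCount} -DdistanceType=${distanceType} "
            "-DmodelName=pai_kmeans_test_output_model_${exp_id}_${trial_id}"
        )

        # Values chosen by the search algorithm for one trial (hypothetical).
        trial_values = {
            "centerCount": 4,
            "distanceType": "euclidean",
            "exp_id": "exp001",
            "trial_id": "trial003",
        }

        # The substituted command is what the trial actually runs.
        print(command.substitute(trial_values))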

    • Parameters in the Trial Configuration section

      If you need to run a task by using a specific hyperparameter combination, configure the following parameters.

      • Metric Type: The type of the metric that is used to evaluate the trial. Valid values:

        • summary: The final metric values are extracted from the TensorFlow summary file that is obtained from Object Storage Service (OSS).

        • table: The final metric values are extracted from a MaxCompute table.

        • stdout: The final metric values are extracted from the stdout of the running process.

        • json: The final metric values are extracted from JSON files that are stored in OSS.

      • Method: The method that is used to calculate the final metric value when multiple intermediate metric values are generated during task execution. Valid values:

        • final: The last metric value is used as the final metric value of the trial.

        • best: The optimal metric value that is obtained during task execution is used as the final metric value of the trial.

        • avg: The average of all intermediate metric values that are obtained during task execution is used as the final metric value of the trial.

      • Metric Weight: If you want to evaluate multiple metrics at the same time, configure the names and weights of the metrics. The system uses the weighted sum of the metrics as the final metric value. For a sketch of this calculation, see the example after this list.

        • key: the name of a metric. Regular expressions are supported.

        • value: the weight of the metric.

        Note: A weight can be a negative value, and the sum of the weights does not need to be 1. You can configure custom values.

      • Metric Source: The source of the metric.

        • If you select summary or json from the Metric Type drop-down list, you must configure a file path. Example: oss://examplebucket/examples/search/pai/model/model_${exp_id}_${trial_id}.

        • If you select table from the Metric Type drop-down list, you must configure an SQL statement that returns a specific result. Example: select GET_JSON_OBJECT(summary, '$.calinhara') as vrc from pai_ft_cluster_evaluation_out_${exp_id}_${trial_id}.

        • If you select stdout from the Metric Type drop-down list, you must configure a command keyword in the cmdx or cmdx;xxx format, such as cmd1;worker.

      • Optimization: The optimization goal that is used to evaluate the trial result. Valid values: Maximize and Minimize.

      • Model Storage Path: The path where the model is stored. The path must contain ${exp_id}_${trial_id} to distinguish between the models that are generated by different hyperparameter combinations. Example: oss://examplebucket/examples/search/pai/model/model_${exp_id}_${trial_id}.
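
      To illustrate how Metric Weight and Method can combine, the following Python sketch computes a weighted final metric from two hypothetical intermediate metric series. The key patterns, weights, and metric values are assumptions for demonstration only.

        import re

        # Hypothetical intermediate metrics reported during one trial.
        intermediate = {
            "accuracy": [0.81, 0.88, 0.90],
            "val_loss": [0.52, 0.41, 0.39],
        }

        # key is a regular expression on the metric name; value is its weight.
        # A weight can be negative, and the weights do not need to sum to 1.
        weights = {r"^accuracy$": 1.0, r"^val_loss$": -0.5}

        def reduce_series(values, method="final"):
            # final: last value; best: optimal value (maximum shown here for a
            # metric that is maximized); avg: mean of all intermediate values.
            if method == "final":
                return values[-1]
            if method == "best":
                return max(values)
            return sum(values) / len(values)

        final_metric = sum(
            weight * reduce_series(values)
            for pattern, weight in weights.items()
            for name, values in intermediate.items()
            if re.search(pattern, name)
        )
        print(final_metric)  # 0.90 * 1.0 + 0.39 * -0.5 = 0.705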

    • Parameters in the Search Configurations section

      • Search Algorithm: The automated machine learning algorithm. Based on this algorithm, the system uses the hyperparameter search space together with the execution results and metrics of completed trials to find the optimal hyperparameter combination for the next trial. Valid values:

        • TPE

        • Random

        • GridSearch

        • Evolution

        • GP

        • PBT

        For more information about the search algorithms, see the "Supported search algorithms" section in Limits and usage notes of AutoML.

      • Maximum Trials: The maximum number of trials that can be run in the experiment. For a sketch of how this limit and the concurrency limit bound a GridSearch run, see the example after this list.

      • Maximum Concurrent Trials: The maximum number of trials that can run concurrently in the experiment.
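
      As a rough illustration of how Maximum Trials and Maximum Concurrent Trials bound a GridSearch run, the following Python sketch enumerates a grid of combinations, truncates it at the trial limit, and runs the trials with bounded concurrency. The search space and objective function are placeholders.

        import itertools
        from concurrent.futures import ThreadPoolExecutor

        search_space = {"batch_size": [32, 64, 128], "lr": [0.0001, 0.001, 0.01]}
        max_trials = 5       # Maximum Trials
        max_concurrent = 2   # Maximum Concurrent Trials

        # GridSearch enumerates every combination; the trial limit truncates the grid.
        grid = (dict(zip(search_space, combo))
                for combo in itertools.product(*search_space.values()))
        trials = list(itertools.islice(grid, max_trials))

        def run_trial(params):
            return 1.0 - abs(params["lr"] - 0.001)  # placeholder objective

        # Run at most max_concurrent trials at the same time.
        with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
            results = list(pool.map(run_trial, trials))

        best = max(range(len(trials)), key=lambda i: results[i])
        print(trials[best], results[best])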

  4. Click Submit.

    You can view the created experiment in the experiment list.

What to do next

  • You can view the experiment details at any time to obtain the progress of the experiment. You can view the execution result of each trial to obtain the optimal hyperparameter combination. For more information, see View experiment details.

  • You can manage experiments. For more information, see Manage experiments.

Appendix: References

The following configuration example is provided for your reference when you use a MaxCompute task to perform hyperparameter fine-tuning.

  • Machine Learning Designer components: K-means Clustering and Clustering Model Evaluation.

  • The following code shows the cmd1 and cmd2 commands that are used for the two components, listed in execution order. For the detailed procedure, see Best practice for running the K-means Clustering component.

    • cmd1

      pai -name kmeans
          -project algo_public
          -DinputTableName=pai_kmeans_test_input
          -DselectedColNames=f0,f1
          -DappendColNames=f0,f1
          -DcenterCount=${centerCount}
          -Dloop=10
          -Daccuracy=0.01
          -DdistanceType=${distanceType}
          -DinitCenterMethod=random
          -Dseed=1
          -DmodelName=pai_kmeans_test_output_model_${exp_id}_${trial_id}
          -DidxTableName=pai_kmeans_test_output_idx_${exp_id}_${trial_id}
          -DclusterCountTableName=pai_kmeans_test_output_counter_${exp_id}_${trial_id}
          -DcenterTableName=pai_kmeans_test_output_center_${exp_id}_${trial_id};
    • cmd2

      pai -name cluster_evaluation
          -project algo_public
          -DinputTableName=pai_cluster_evaluation_test_input
          -DselectedColNames=f0,f1
          -DmodelName=pai_kmeans_test_output_model_${exp_id}_${trial_id}
          -DoutputTableName=pai_ft_cluster_evaluation_out_${exp_id}_${trial_id};