How to use Machine Learning Designer components to fine-tune hyperparameters - Platform For AI

This topic describes how to submit a hyperparameter tuning experiment that runs the K-means Clustering and Clustering Model Evaluation components of Platform for AI (PAI) on MaxCompute resources. The experiment searches for the optimal hyperparameter combination for the K-means Clustering algorithm.

Step 1: Prepare data

You can prepare test data and evaluation data by referring to the examples in the Clustering Model Evaluation topic.

The sample tables pai_online_project.pai_kmeans_test_input and pai_online_project.pai_cluster_evaluation_test_input used in this example are from an open source data source and can be used directly.

Step 2: Create an experiment

Go to the Create Experiment page. For more information, see Create an experiment.

On the Create Experiment page, configure the parameters. The following tables describe the key parameters. For information about other parameters, see Create an experiment.

Execution Configurations

Parameter	Description
Metric Type	Select MaxCompute.
Command	Configure the following commands. The commands run in sequence.
	Command 1: Run the K-means Clustering component to build a clustering model by using the prepared test data. For information about how to configure the parameters, see the "Method 2: Run PAI commands" section in the K-means Clustering topic.
	`pai -name kmeans -project algo_public -DinputTableName=pai_online_project.pai_kmeans_test_input -DselectedColNames=f0,f1 -DappendColNames=f0,f1 -DcenterCount=${centerCount} -Dloop=10 -Daccuracy=0.01 -DdistanceType=${distanceType} -DinitCenterMethod=random -Dseed=1 -DmodelName=pai_kmeans_test_output_model_${exp_id}_${trial_id} -DidxTableName=pai_kmeans_test_output_idx_${exp_id}_${trial_id} -DclusterCountTableName=pai_kmeans_test_output_couter_${exp_id}_${trial_id} -DcenterTableName=pai_kmeans_test_output_center_${exp_id}_${trial_id};`
	In the preceding command, ${centerCount} and ${distanceType} are the hyperparameter variables that you define.
	Command 2: Run the Clustering Model Evaluation component based on the clustering result generated by Command 1 to evaluate the performance of the clustering model. For information about how to configure the parameters, see the "Method 2: Use PAI commands" section in the Clustering Model Evaluation topic.
	`PAI -name cluster_evaluation -project algo_public -DinputTableName=pai_online_project.pai_cluster_evaluation_test_input -DselectedColNames=f0,f1 -DmodelName=pai_kmeans_test_output_model_${exp_id}_${trial_id} -DoutputTableName=pai_ft_cluster_evaluation_out_${exp_id}_${trial_id};`
Hyperparameter	Configure the constraint type and valid values of each hyperparameter:
	centerCount: Constraint Type: choice. Valid Values: Click the icon to add the following enumeration values: 2, 3, 4, and 5.
	distanceType: Constraint Type: choice. Valid Values: Click the icon to add the following enumeration values: euclidean, cosine, and cityblock.
	This configuration defines 12 hyperparameter combinations (4 centerCount values × 3 distanceType values). The system creates trials from these combinations, up to the number that you specify for Maximum Trials. In each trial, the system runs the K-means Clustering component and the Clustering Model Evaluation component by using one hyperparameter combination.
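The search space defined by the two choice hyperparameters can be enumerated offline to check the count of 12. The following Python snippet is a minimal sketch for illustration only. It is not part of PAI, and the variable names are assumptions.

```python
from itertools import product

# Search space copied from the Hyperparameter configuration above.
search_space = {
    "centerCount": [2, 3, 4, 5],
    "distanceType": ["euclidean", "cosine", "cityblock"],
}

# Cartesian product of all choice values: 4 * 3 = 12 combinations.
names = list(search_space)
combinations = [
    dict(zip(names, values)) for values in product(*search_space.values())
]

print(len(combinations))
for combo in combinations:
    print(combo)
```

Each dictionary corresponds to one candidate assignment of ${centerCount} and ${distanceType} in Command 1.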

Trial Configuration

Field	Description
Metric Type	Select table.
Method	Select best.
Metric Weight	Key: vrc Value: 1
Metric Source	Set the parameter to `select GET_JSON_OBJECT(summary, '$.calinhara') as vrc from pai_ft_cluster_evaluation_out_${exp_id}_${trial_id};`.
Optimization	Select Maximize.
Model Name	Set the parameter to `pai_kmeans_test_output_model_${exp_id}_${trial_id}`.
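The ${exp_id} and ${trial_id} placeholders give each trial its own model and output table, and the Metric Source query reads the calinhara field from the JSON summary column. The following Python sketch imitates both steps locally. The identifier values and the summary content are hypothetical placeholders: the real identifiers are assigned by the system, and only the calinhara key is taken from this topic.

```python
import json
from string import Template

# Name patterns copied from the tables above.
MODEL_NAME = Template("pai_kmeans_test_output_model_${exp_id}_${trial_id}")
EVAL_TABLE = Template("pai_ft_cluster_evaluation_out_${exp_id}_${trial_id}")

# Hypothetical identifiers, used only to show the substitution.
ids = {"exp_id": "exp001", "trial_id": "trial003"}

model_name = MODEL_NAME.substitute(ids)
eval_table = EVAL_TABLE.substitute(ids)

# The Metric Source query with the table name resolved for this trial.
query = (
    "select GET_JSON_OBJECT(summary, '$.calinhara') as vrc "
    f"from {eval_table};"
)

# Hypothetical summary value; the number is a placeholder, not a real result.
summary = '{"calinhara": 123.4}'
vrc = json.loads(summary)["calinhara"]  # local equivalent of '$.calinhara'

print(model_name)
print(query)
print(vrc)
```

Because the names include both identifiers, the evaluation command and the metric query of a trial always read the outputs of that same trial.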

Search Configurations

Parameter	Description
Search Algorithm	Select TPE.
Maximum Trials	Set the parameter to 6.
Maximum Concurrent Trials	Set the parameter to 3.

Click Submit.
The system starts to create the experiment. You can view the experiment on the AutoML page.

Step 3: View the experiment details and results

On the AutoML page, click the name of the experiment to go to the Experiment Details page.
On the Experiment Details page, you can view the execution progress and status of the trials.
In this example, the system creates six trials based on the search algorithm and the maximum number of trials that you specified.
On the Trials tab, you can view the trials that the system generated. You can also view the execution status, final metric, and hyperparameter combination of each trial.
In this example, the Optimization parameter is set to Maximize. Therefore, the optimal hyperparameter combination is the one with the highest Final Metric, which is 59089. Optimal combination: centerCount: 2, distanceType: cityblock.
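Selecting the optimal combination under Maximize amounts to taking the trial with the largest Final Metric. The following Python snippet is a minimal sketch of that selection. Only the 59089 entry comes from this example. The other two trials and their metric values are made-up placeholders, not real results.

```python
# One known trial from this example plus two placeholder trials.
trials = [
    {"centerCount": 2, "distanceType": "cityblock", "final_metric": 59089.0},
    {"centerCount": 3, "distanceType": "euclidean", "final_metric": 1000.0},  # placeholder
    {"centerCount": 4, "distanceType": "cosine", "final_metric": 500.0},  # placeholder
]

# Optimization = Maximize: pick the trial with the highest final metric.
best = max(trials, key=lambda trial: trial["final_metric"])

print(best["centerCount"], best["distanceType"], best["final_metric"])
```

If Optimization were set to minimize the metric instead, the same sketch would use min in place of max.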
