Platform for AI: Use the PS-SMART Binary Classification component based on MaxCompute resources

Last Updated: Apr 25, 2024

This topic describes how to submit a hyperparameter tuning experiment that uses MaxCompute computing resources to run the PS-SMART Binary Classification, Prediction, and Binary Classification Evaluation components to obtain an optimal hyperparameter combination for the PS-SMART component algorithm.

Prerequisites

  • The permissions that are required to use AutoML are granted to your account. This is required the first time you use AutoML. For more information, see Grant permissions that are required to use AutoML.

  • A workspace is created and associated with the MaxCompute resources. For more information, see Create a workspace.

Step 1: Prepare data

In this example, a feature-engineered dataset for predicting bank customer product subscriptions is used. Prepare the training dataset and test dataset by performing the following operations:

  1. Run the following SQL commands on the MaxCompute client to create a table named bank_train_data and a table named bank_test_data. For information about how to install and configure the MaxCompute client, see MaxCompute client (odpscmd).

    create table bank_train_data(
    id bigint, age double, job double, marital double, education double, default double, housing double, loan double, contact double,
    month double, day_of_week double, duration double, campaign double, pdays double, previous double, poutcome double, emp_var_rate double,
    cons_price_index double, cons_conf_index double, lending_rate3m double, nr_employed double, subscribe bigint
    );
    create table bank_test_data(
    id bigint, age double, job double, marital double, education double, default double, housing double, loan double, contact double,
    month double, day_of_week double, duration double, campaign double, pdays double, previous double, poutcome double, emp_var_rate double,
    cons_price_index double, cons_conf_index double, lending_rate3m double, nr_employed double, subscribe bigint
    );
  2. Run the following Tunnel commands on the MaxCompute client to upload the training dataset to the bank_train_data table and the test dataset to the bank_test_data table. For information about how to use Tunnel commands, see Tunnel commands.

    -- Upload the training dataset to the bank_train_data table. Replace xx/train_data.csv with the path of the train_data.csv file.
    tunnel upload xx/train_data.csv bank_train_data;
    -- Upload the test dataset to the bank_test_data table. Replace xx/test_data.csv with the path of the test_data.csv file.
    tunnel upload xx/test_data.csv bank_test_data;
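
Before you create the experiment, it can help to confirm that both uploads succeeded. The following queries are a minimal sketch of such a check and are not part of the original procedure. They use only the table and column names defined above, and they assume that subscribe holds the binary label, as it does in the commands later in this topic.

```sql
-- Row counts should match the number of records in each CSV file.
select count(*) as train_rows from bank_train_data;
select count(*) as test_rows from bank_test_data;

-- Label distribution of the training data. A binary classification
-- task expects exactly two distinct values in the subscribe column.
select subscribe, count(*) as cnt
from bank_train_data
group by subscribe;
```

If a count is 0 or lower than expected, rerun the corresponding tunnel upload command before you continue.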

Step 2: Create an experiment

Go to the Create Experiment page, and perform the following steps to configure key parameters. For information about the settings of other parameters, see Create an experiment. After you configure the parameters, click Submit.

  1. Configure parameters in the Execution Configurations section.

    Parameter

    Description

    Job Type

    The type of job to run, which determines the computing resources that are used. Select MaxCompute.

    Command

    Configure the following five commands in order. The commands are executed in the order in which they are listed.

    • cmd1: Run the PS-SMART Binary Classification component by using the training data that you prepared to build a binary classification model. For information about the parameters, see PS-SMART Binary Classification.

      PAI -name ps_smart
          -project algo_public
          -DinputTableName='bank_train_data'
          -DmodelName='bi_ps_${exp_id}_${trial_id}'
          -DoutputTableName='bi_model_output_${exp_id}_${trial_id}'
          -DoutputImportanceTableName='bi_imp_${exp_id}_${trial_id}'
          -DlabelColName='subscribe'
          -DfeatureColNames='age,job,marital,education,default,housing,loan,contact,month,day_of_week,duration,campaign,pdays,previous,poutcome,emp_var_rate,cons_price_index,cons_conf_index,lending_rate3m,nr_employed'
          -DenableSparse='false'
          -Dobjective='binary:logistic'
          -Dmetric='error'
          -DfeatureImportanceType='gain'
          -DtreeCount='${tree_count}'
          -DmaxDepth='${max_depth}'
          -Dshrinkage="0.3"
          -Dl2="1.0"
          -Dl1="0"
          -Dlifecycle="3"
          -DsketchEps="0.03"
          -DsampleRatio="1.0"
          -DfeatureRatio="1.0"
          -DbaseScore="0.5"
          -DminSplitLoss="0";
    • cmd2: Delete the prediction result table.

      drop table if exists bi_output_${exp_id}_${trial_id};
    • cmd3: Run the Prediction component based on the model generated by cmd1 to generate predictions for the test data. For information about the parameters, see Prediction.

      PAI -name prediction
          -project algo_public
          -DinputTableName='bank_test_data'
          -DmodelName='bi_ps_${exp_id}_${trial_id}'
          -DoutputTableName='bi_output_${exp_id}_${trial_id}'
          -DfeatureColNames='age,job,marital,education,default,housing,loan,contact,month,day_of_week,duration,campaign,pdays,previous,poutcome,emp_var_rate,cons_price_index,cons_conf_index,lending_rate3m,nr_employed'
          -DappendColNames='subscribe,age,job,marital,education,default,housing,loan,contact,month,day_of_week,duration,campaign,pdays,previous,poutcome,emp_var_rate,cons_price_index,cons_conf_index,lending_rate3m,nr_employed'
          -DenableSparse='false'
          -Dlifecycle='3';
    • cmd4: Run the Binary Classification Evaluation component based on the prediction result generated by cmd3. For information about the parameters, see Binary Classification Evaluation.

      PAI -name evaluate -project algo_public
          -DoutputDetailTableName='bi_0804_${exp_id}_${trial_id}_outputDetailTable'
          -DoutputMetricTableName='bi_0804_${exp_id}_${trial_id}_outputMetricTable'
          -DlabelColName='subscribe'
          -DscoreColName='prediction_score'
          -DpositiveLabel='1'
          -DbinCount='1000'
          -DdetailColName='prediction_detail'
          -DlabelMatch='true'
          -DinputTableName='bi_output_${exp_id}_${trial_id}';
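    • cmd5: Extract the evaluation metrics from the evaluation result tables generated by cmd4 and write them to the pt='${exp_id}_${trial_id}' partition of the ps_smart_classification_metrics table.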

      INSERT OVERWRITE TABLE ps_smart_classification_metrics PARTITION(pt='${exp_id}_${trial_id}')
          SELECT /*+MAPJOIN(b,c,d)*/ REGEXP_EXTRACT(a.data_range, '\\\((.*?),') as threshold,
          a.recall, a.precision, a.f1_score, c.value as auc, d.value as ks
          FROM (SELECT recall, precision, f1_score, data_range, 'AUC' auc, 'KS' ks from bi_0804_${exp_id}_${trial_id}_outputDetailTable) a
          JOIN bi_0804_${exp_id}_${trial_id}_outputMetricTable b
          on b.name='F1 Score' AND a.f1_score=b.value
          JOIN bi_0804_${exp_id}_${trial_id}_outputMetricTable c
          ON c.name=a.auc
          JOIN bi_0804_${exp_id}_${trial_id}_outputMetricTable d
          ON d.name=a.ks;

    Hyperparameter

    The following section lists the constraint type and search space of the hyperparameters:

    • tree_count:

      • Constraint Type: choice.

      • Search Space: Add the following enumeration values: 50, 100, and 150.

    • max_depth:

      • Constraint Type: choice.

      • Search Space: Add the following enumeration values: 6, 8, and 10.

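    The preceding configuration defines a search space of nine hyperparameter combinations (three values of tree_count multiplied by three values of max_depth). Each trial uses one combination from this space to run the commands, including the PS-SMART Binary Classification, Prediction, and Binary Classification Evaluation components. The number of trials that actually run is limited by the Maximum Trials parameter in the Search Configurations section.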
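
cmd5 writes its results to a partition of the ps_smart_classification_metrics table, and this topic does not show how that table is created. If the table does not already exist in your project, the following statement is a minimal sketch of a definition that matches the columns selected by cmd5. The column types are assumptions: threshold is taken as string because it is the result of REGEXP_EXTRACT, and the metric columns are taken as double. Adjust the types if the evaluation output tables in your project differ.

```sql
-- Assumed schema for the metrics table that cmd5 populates.
-- Column order follows the SELECT list in cmd5:
-- threshold, recall, precision, f1_score, auc, ks.
create table if not exists ps_smart_classification_metrics (
    threshold string,
    recall    double,
    precision double,
    f1_score  double,
    auc       double,
    ks        double
)
partitioned by (pt string);
```

Each trial writes to its own pt partition, so concurrent trials do not overwrite each other's results.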

  2. Configure parameters in the Trial Configuration section.

    Parameter        Description
    Metric Type      Select table.
    Method           Select best.
    Metric Weight    Add the following key-value pairs: recall: 0.5, precision: 0.25, and auc: 0.25.
    Metric Source    Set the value to the following query: select * from ps_smart_classification_metrics where pt='${exp_id}_${trial_id}';
    Optimization     Select Maximize.
    Model Name       Set the value to bi_ps_${exp_id}_${trial_id}.

  3. Configure parameters in the Search Configurations section.

    Parameter                    Description
    Search Algorithm             Select TPE.
    Maximum Trials               Set the value to 5.
    Maximum Concurrent Trials    Set the value to 2.

Step 3: View the experiment details and results

  1. On the AutoML page, click the name of the experiment to go to the Experiment Details page. On the Experiment Details page, you can view the execution progress and status of trials. The system automatically creates five trials for the experiment based on the specified search algorithm and the maximum number of trials.

  2. Click Trials to go to the Trials tab. You can view the trials that are automatically generated for the experiment and the execution status, final metric, and hyperparameter combination of each trial. In this example, the Optimization parameter is set to Maximize, so the trial with the highest final metric is optimal. That trial has a final metric of 0.688894 and uses the hyperparameter combination tree_count: 50, max_depth: 8.
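
To cross-check the values shown on the Trials tab, you can query the metrics table directly. The following query is a sketch under two assumptions that this topic does not confirm: that the final metric is the weighted sum of recall, precision, and auc with the weights from the Trial Configuration section, and that the best method takes the highest value when a partition contains more than one row. Replace <exp_id> with the ID of your experiment.

```sql
-- Weighted metric per trial, highest first.
-- Weights: recall 0.5, precision 0.25, auc 0.25 (from Metric Weight).
select pt,
       max(0.5 * recall + 0.25 * precision + 0.25 * auc) as weighted_metric
from ps_smart_classification_metrics
where pt like '<exp_id>_%'   -- partitions are named ${exp_id}_${trial_id}
group by pt
order by weighted_metric desc
limit 10;
```

If the values differ from the final metrics on the Trials tab, AutoML is combining the metrics in a different way than this sketch assumes.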