The Binary Classification Evaluation component calculates the AUC, KS, and F1 score metrics and generates Kolmogorov–Smirnov (KS) curves, precision-recall (P-R) curves, ROC curves, lift charts, and gain charts.
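
To make the three metrics concrete, the following is a minimal Python sketch of how AUC, KS, and F1 score can be computed from labels and prediction scores. It is an illustration under simple assumptions, not the component's implementation: the function name `binary_metrics`, the fixed `threshold` of 0.5 used for F1, the sample data, and the exact (unbinned) calculation are all assumptions made for this example.

```python
def binary_metrics(labels, scores, positive_label=1, threshold=0.5):
    """Return (auc, ks, f1) for binary labels and prediction scores.

    Hypothetical helper for illustration only; the component itself
    works on binned data and may differ in detail.
    """
    pos = [s for y, s in zip(labels, scores) if y == positive_label]
    neg = [s for y, s in zip(labels, scores) if y != positive_label]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    # AUC: probability that a random positive sample scores higher than
    # a random negative sample (ties count as one half).
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))

    # KS: largest gap between the true positive rate and the false
    # positive rate over all candidate thresholds.
    ks = 0.0
    for t in sorted(set(scores)):
        tpr = sum(p >= t for p in pos) / len(pos)
        fpr = sum(n >= t for n in neg) / len(neg)
        ks = max(ks, abs(tpr - fpr))

    # F1 score at a fixed threshold (assumed to be 0.5 here).
    tp = sum(p >= threshold for p in pos)
    fp = sum(n >= threshold for n in neg)
    fn = len(pos) - tp
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return auc, ks, f1


# Made-up sample data: label 1 marks a positive sample.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
auc, ks, f1 = binary_metrics(labels, scores)
print(f"AUC={auc:.3f} KS={ks:.3f} F1={f1:.3f}")
```

The pairwise AUC loop is quadratic in the number of samples, which is acceptable for a small illustration but not for production-scale tables.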
Configure the component
You can use one of the following methods to configure the Binary Classification Evaluation component.
Method 1: Configure the component on the pipeline page
You can configure the parameters of the Binary Classification Evaluation component on the pipeline page of Machine Learning Designer in Machine Learning Platform for AI (PAI). Machine Learning Designer was formerly known as Machine Learning Studio. The following table describes the parameters.
Parameter | Description |
---|---|
Original Label Column | The name of the objective column. |
Score Column | The prediction score column. Default value: prediction_score. |
Positive Sample Label | The label value that identifies positive samples. |
Number of Bins with Same Frequency when Calculating Indexes such as KS and PR | The number of equal-frequency bins used to calculate metrics such as KS and PR. |
Grouping Column | The group ID column. This parameter is used to calculate evaluation metrics for each group. |
Advanced Options | If you select Advanced Options, the Prediction Result Detail Column, Prediction Targets Consistent with Evaluation Targets, and Save Performance Index parameters take effect. |
Prediction Result Detail Column | The name of the prediction result detail column. |
Prediction Targets Consistent with Evaluation Targets | Specifies whether the prediction objective is consistent with the evaluation objective. Consistent example: in a financial scenario, a model is trained to predict the probability that a customer is bad. A higher probability means that the customer is more likely to be bad, and metrics such as lift evaluate the bad-customer detection rate. Inconsistent example: in a credit scoring scenario, a model is trained to predict the probability that a customer is good. A higher probability means that the customer is more likely to be good, but the metrics still evaluate the bad-customer detection rate. |
Save Performance Index | Specifies whether to save performance metrics. |
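
The binning and target-consistency parameters can be illustrated with a short Python sketch. It ranks samples by score, splits them into equal-frequency bins, and reports the cumulative lift through each bin. The function name `lift_by_bin`, the `targets_consistent` flag, and the idea of reversing the scores as `1 - score` when the targets are inconsistent are assumptions made for this example. They show the concept only and do not describe what the component does internally.

```python
def lift_by_bin(labels, scores, bin_count, positive_label=1, targets_consistent=True):
    """Return the cumulative lift through each equal-frequency bin.

    Hypothetical helper for illustration only.
    """
    if not 0 < bin_count <= len(labels):
        raise ValueError("bin_count must be between 1 and the number of samples")

    # Assumption: if the model scores the opposite class (for example,
    # "good customer") while the evaluation targets "bad customer",
    # reverse the scores so that a higher value means "more likely positive".
    if not targets_consistent:
        scores = [1.0 - s for s in scores]

    # Rank samples from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    total_pos = sum(y == positive_label for _, y in ranked)
    if total_pos == 0:
        raise ValueError("no positive samples found")
    base_rate = total_pos / n

    lifts = []
    for b in range(1, bin_count + 1):
        cutoff = b * n // bin_count  # number of samples in bins 1..b
        hits = sum(y == positive_label for _, y in ranked[:cutoff])
        lifts.append((hits / cutoff) / base_rate)
    return lifts


# Made-up data: label 1 marks a bad customer (the positive class).
labels = [1, 1, 0, 1, 0, 0, 0, 0]
bad_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]   # P(bad): consistent
good_scores = [1.0 - s for s in bad_scores]              # P(good): inconsistent

consistent_lift = lift_by_bin(labels, bad_scores, bin_count=4)
inconsistent_lift = lift_by_bin(labels, good_scores, bin_count=4,
                                targets_consistent=False)
print(consistent_lift)
print(inconsistent_lift)
```

With the flag set correctly, both score columns produce the same lift values, because the ranking of bad customers is the same in both cases.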
Method 2: Use PAI commands
Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.
```
PAI -name=evaluate -project=algo_public
    -DoutputMetricTableName=output_metric_table
    -DoutputDetailTableName=output_detail_table
    -DinputTableName=input_data_table
    -DlabelColName=label
    -DscoreColName=score
```
Parameter | Required | Description | Default value |
---|---|---|---|
inputTableName | Yes | The name of the input table. | N/A |
inputTablePartitions | No | The partitions that are selected from the input table for evaluation. | Full table |
labelColName | Yes | The name of the objective column. | N/A |
scoreColName | Yes | The name of the score column. | N/A |
groupColName | No | The name of the group column. This parameter is used for the evaluation of each group. | N/A |
binCount | No | The number of bins obtained by using the equal frequency binning method during the calculation of metrics such as KS and PR. | 1000 |
outputMetricTableName | Yes | The name of the output metric table. The metrics include AUC, KS, and F1 score. | N/A |
outputDetailTableName | No | The name of the output detail table. | N/A |
positiveLabel | No | The label value that identifies positive samples. | 1 |
lifecycle | No | The lifecycle of the output table. | N/A |
coreNum | No | The number of cores. | Determined by the system |
memSizePerCore | No | The memory size of each core. | Determined by the system |
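
The optional parameters in the preceding table are passed in the same `-D<name>=<value>` form as the required ones. The following Python sketch assembles such a command string, which you could then paste into an SQL Script component. The helper `build_pai_evaluate_command`, the `region` column, and all parameter values are hypothetical and chosen only for illustration. The helper is not part of PAI.

```python
def build_pai_evaluate_command(required, optional=None):
    """Assemble a PAI evaluate command string from parameter dictionaries.

    Hypothetical helper for illustration only.
    """
    # Parameters marked as required in the parameter table.
    needed = ["inputTableName", "labelColName", "scoreColName",
              "outputMetricTableName"]
    missing = [name for name in needed if name not in required]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")

    params = {**required, **(optional or {})}
    lines = ["PAI -name=evaluate -project=algo_public"]
    lines += [f"    -D{name}={value}" for name, value in params.items()]
    return "\n".join(lines)


command = build_pai_evaluate_command(
    required={
        "inputTableName": "input_data_table",
        "labelColName": "label",
        "scoreColName": "score",
        "outputMetricTableName": "output_metric_table",
    },
    optional={
        "outputDetailTableName": "output_detail_table",
        "groupColName": "region",   # hypothetical grouping column
        "binCount": 500,            # illustrative value; the default is 1000
        "positiveLabel": 1,
    },
)
print(command)
```

Keeping the required and optional parameters in separate dictionaries makes it easy to fail early when a required parameter is missing, before the command is submitted.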