The Confusion Matrix component is suitable for supervised learning and corresponds to the matching matrix in unsupervised learning. In accuracy evaluation, the Confusion Matrix component compares the classification results with the actual values and displays the accuracy of the classification results in a matrix. This topic describes how to configure the Confusion Matrix component in Platform for AI (PAI).
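To make the idea concrete, the following Python sketch counts how often each actual label is paired with each predicted label. It illustrates the technique only and is not PAI's implementation. The function name `confusion_matrix` and the orientation (rows are actual labels, columns are predicted labels) are assumptions.

```python
from collections import Counter


def confusion_matrix(labels, predictions):
    """Count (actual, predicted) pairs and lay them out as a square matrix.

    Rows are actual labels and columns are predicted labels, both sorted.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    classes = sorted(set(labels) | set(predictions))
    counts = Counter(zip(labels, predictions))
    # Missing pairs default to 0 because Counter returns 0 for unknown keys.
    matrix = [[counts[(actual, predicted)] for predicted in classes] for actual in classes]
    return classes, matrix


classes, matrix = confusion_matrix(["cat", "cat", "dog", "dog"], ["cat", "dog", "dog", "dog"])
print(classes)  # ['cat', 'dog']
print(matrix)   # [[1, 1], [0, 2]]
```

Each diagonal cell counts correct predictions, and each off-diagonal cell counts one specific kind of misclassification.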
Limits
The Confusion Matrix component can run only on MaxCompute computing resources.
Configure the component
You can use one of the following methods to configure the Confusion Matrix component.
Method 1: Configure the component in the PAI console
You can configure the parameters of the Confusion Matrix component in Machine Learning Designer. The following table describes the parameters.
Parameter | Description |
Original Label Column | The column that contains the actual labels. Columns of numeric data types are supported. |
Prediction Result Label Column | The column that contains the predicted labels. This parameter is required if the Threshold parameter is not specified. |
Threshold | The threshold used to determine positive samples. Samples whose values are greater than this threshold are treated as positive samples. |
Prediction Result Detail Column | The column that contains the prediction details. This parameter is required if the Threshold parameter is specified. You can configure only one of the Prediction Result Detail Column and Prediction Result Label Column parameters. |
Positive Sample Label | The label value of positive samples. This parameter is required if the Threshold parameter is specified. |
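The following Python sketch illustrates how Threshold, Prediction Result Detail Column, and Positive Sample Label can work together in binary classification. It assumes that each detail value is a JSON object that maps every label to a score, and that a sample that is not positive receives the other (negative) label. The actual format of the detail column and the internal logic of PAI may differ. The function name and sample values are hypothetical.

```python
import json


def label_from_detail(detail_json, positive_label, negative_label, threshold):
    """Turn a prediction detail value into a predicted label (hypothetical helper).

    Assumes detail_json is a JSON object that maps each label to its score.
    A sample is positive only if its positive-label score is strictly
    greater than threshold; otherwise it gets negative_label.
    """
    scores = json.loads(detail_json)
    score = float(scores.get(positive_label, 0.0))
    return positive_label if score > threshold else negative_label


details = ['{"N": 0.92, "R": 0.08}', '{"N": 0.55, "R": 0.45}', '{"N": 0.10, "R": 0.90}']
print([label_from_detail(d, "N", "R", 0.8) for d in details])  # ['N', 'R', 'R']
```

After the labels are derived this way, the confusion matrix is built from the actual and derived labels as in the label-column case.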
Method 2: Configure the component by using PAI commands
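The following commands show how to call the Confusion Matrix component without and with a threshold, and the table after them describes the parameters. You can use SQL scripts to call PAI commands. For more information, see SQL Script.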
Threshold not specified
pai -name confusionmatrix -project algo_public -DinputTableName=wpbc_pred -DoutputTableName=wpbc_confu -DlabelColName=label -DpredictionColName=prediction_result;
Threshold specified
pai -name confusionmatrix -project algo_public -DinputTableName=wpbc_pred -DoutputTableName=wpbc_confu -DlabelColName=label -DpredictionDetailColName=prediction_detail -Dthreshold=0.8 -DgoodValue=N;
Parameter | Required | Description | Default value |
inputTableName | Yes | The name of the input table, which is typically the output table of a prediction component. | N/A |
inputTablePartition | No | The partitions of the input table that are used for computation. | Full table |
outputTableName | Yes | The name of the output table. The output table is used to store the confusion matrix. | N/A |
labelColName | Yes | The name of the original label column. | N/A |
predictionColName | No | The name of the prediction result column. This parameter is required if the threshold parameter is not specified. | N/A |
predictionDetailColName | No | The name of the prediction result detail column. This parameter is required if the threshold parameter is specified. | N/A |
threshold | No | The threshold used to determine positive samples. | 0.5 |
goodValue | No | The label value of positive samples in binary classification. This parameter corresponds to Positive Sample Label in the console and is required if the threshold parameter is specified. | N/A |
coreNum | No | The number of cores used in computing. | Automatically allocated |
memSizePerCore | No | The memory size of each core. Unit: MB. | Automatically allocated |
lifecycle | No | The lifecycle of the output table. Unit: days. | N/A |
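The required-parameter rules in this table can be checked before you submit a command. The following Python sketch is a hypothetical helper that only formats a command string in the same shape as the two commands shown earlier and enforces the conditional requirements from the table; it does not call PAI, and the function name is an assumption.

```python
def build_confusionmatrix_command(params):
    """Assemble a `pai -name confusionmatrix` command string (hypothetical helper).

    Checks the parameter combinations described in the table, then formats
    each parameter as -Dkey=value in the order given.
    """
    for key in ("inputTableName", "outputTableName", "labelColName"):
        if key not in params:
            raise ValueError(f"missing required parameter: {key}")
    if "threshold" in params:
        # Threshold mode: the detail column and the positive label are required.
        for key in ("predictionDetailColName", "goodValue"):
            if key not in params:
                raise ValueError(f"{key} is required when threshold is specified")
    elif "predictionColName" not in params:
        raise ValueError("predictionColName is required when threshold is not specified")
    options = " ".join(f"-D{key}={value}" for key, value in params.items())
    return f"pai -name confusionmatrix -project algo_public {options};"


print(build_confusionmatrix_command({
    "inputTableName": "wpbc_pred",
    "outputTableName": "wpbc_confu",
    "labelColName": "label",
    "predictionColName": "prediction_result",
}))
```

With these inputs, the printed string matches the "Threshold not specified" command above.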
Examples
Create a table named test_data by using the MaxCompute client. The columns of the table are id, label, and prediction_result, and the types of the columns are bigint, string, and string. For information about how to install and configure the MaxCompute client, see MaxCompute client (odpscmd). For information about how to create a table, see Create tables.
Import the following test data to the test_data table. For more information about how to import data, see Import data to tables.
id | label | prediction_result |
0 | A | A |
1 | A | B |
2 | A | A |
3 | A | A |
4 | B | B |
5 | B | B |
6 | B | A |
7 | B | B |
8 | B | A |
9 | A | A |
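Before you run the pipeline, you can work out the expected result so that you can check the component output. The following Python sketch counts the (label, prediction_result) pairs in the test data. It assumes that rows are actual labels and columns are predicted labels; the layout that PAI displays may differ.

```python
from collections import Counter

# Rows of the test_data table: (id, label, prediction_result).
rows = [
    (0, "A", "A"), (1, "A", "B"), (2, "A", "A"), (3, "A", "A"), (4, "B", "B"),
    (5, "B", "B"), (6, "B", "A"), (7, "B", "B"), (8, "B", "A"), (9, "A", "A"),
]

# Count each (actual, predicted) pair.
counts = Counter((label, pred) for _id, label, pred in rows)
classes = sorted({label for _id, label, _pred in rows} | {pred for _id, _label, pred in rows})

# Rows of the matrix are actual labels; columns are predicted labels.
matrix = [[counts[(actual, pred)] for pred in classes] for actual in classes]
correct = sum(matrix[i][i] for i in range(len(classes)))

print(classes)              # ['A', 'B']
print(matrix)               # [[4, 1], [2, 3]]
print(correct / len(rows))  # 0.7
```

Four of the five A samples and three of the five B samples are predicted correctly, so 7 of the 10 samples are classified correctly.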
Create a pipeline as shown in the following figure and run the pipeline. For more information, see Algorithm modeling.
Drag the Read Table and Confusion Matrix components from the list on the left to the canvas.
Connect the components as shown in the preceding figure to build a pipeline.
Configure the component parameters.
Click the Read Table -1 component on the canvas. On the Select Table tab on the right, set the Table Name parameter to test_data.
Click the Confusion Matrix -1 component on the canvas and configure the parameters. The following table describes key parameters. Use the default values for other parameters.
Parameter | Description |
Original Label Column | Select the label column. |
Prediction Result Label Column | Enter prediction_result. |
After you configure the parameters, click the Run icon to run the pipeline.
After you run the pipeline, right-click the Confusion Matrix -1 component and select Visual Analysis to view the output of the component.
Click the Confusion Matrix tab to view the output confusion matrix.
Click the Statistics tab to view the statistics about the model.
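As a hedged illustration of the kind of statistics that can be derived from a confusion matrix, the following Python sketch computes accuracy and per-class precision, recall, and F1. The exact set of statistics that PAI displays, and how it computes them, may differ. The input uses the counts that the test data in this example should produce, assuming rows are actual labels and columns are predicted labels. The function name is hypothetical.

```python
def per_class_metrics(classes, matrix):
    """Compute accuracy plus precision, recall, and F1 for each class.

    matrix[i][j] is the number of samples whose actual label is classes[i]
    and whose predicted label is classes[j].
    """
    n = len(classes)
    metrics = {}
    for i, name in enumerate(classes):
        tp = matrix[i][i]
        predicted_total = sum(matrix[r][i] for r in range(n))  # column total
        actual_total = sum(matrix[i])  # row total
        precision = tp / predicted_total if predicted_total else 0.0
        recall = tp / actual_total if actual_total else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[name] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(matrix[i][i] for i in range(n)) / sum(map(sum, matrix))
    return accuracy, metrics


accuracy, metrics = per_class_metrics(["A", "B"], [[4, 1], [2, 3]])
print(accuracy)  # 0.7
print({name: {k: round(v, 3) for k, v in m.items()} for name, m in metrics.items()})
```

For class A, precision is 4/6 because six samples are predicted as A and four of them are actually A; recall is 4/5 because four of the five actual A samples are found.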
References
For information about Machine Learning Designer components, see Overview of Machine Learning Designer.
Machine Learning Designer provides various preset algorithm components. You can select a component for data processing based on your business requirements. For more information, see Component reference: Overview of all components.