Configure the Confusion Matrix component - Platform for AI

A confusion matrix is used in supervised learning. Its counterpart in unsupervised learning is the matching matrix. In accuracy evaluation, the Confusion Matrix component compares classification results with the actual (ground truth) values and displays the accuracy of the classification results in a matrix. This topic describes how to configure the Confusion Matrix component in Platform for AI (PAI).

Limits

The Confusion Matrix component runs only on MaxCompute computing resources.

Configure the component

You can use one of the following methods to configure the Confusion Matrix component.

Method 1: Configure the component in the PAI console

You can configure the parameters of the Confusion Matrix component in Machine Learning Designer. The following table describes the parameters.

Parameter	Description
Original Label Column	The columns of numeric data types are supported.
Prediction Result Label Column	This parameter is required if the Threshold parameter is not specified.
Threshold	The threshold used to determine positive samples. Samples whose values are greater than this threshold are treated as positive samples.
Prediction Result Detail Column	You can configure only one of the Prediction Result Detail Column and Prediction Result Label Column parameters. This parameter is required if the Threshold parameter is specified.
Positive Sample Label	This parameter is required if the Threshold parameter is specified.
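
The interaction between the Threshold, Prediction Result Detail Column, and Positive Sample Label parameters can be sketched in Python. This is an illustration under stated assumptions, not the component's implementation: it assumes that each value in the detail column is a JSON object that maps a label to a score, and the function name label_from_detail and the negative_label argument are hypothetical.

```python
import json


def label_from_detail(detail_json, positive_label, negative_label, threshold):
    """Return the predicted label for one row of the detail column.

    Assumption: detail_json is a JSON object that maps label -> score,
    for example '{"N": 0.9, "R": 0.1}'. The real column format may differ.
    """
    scores = json.loads(detail_json)
    # Only scores strictly greater than the threshold count as positive.
    if scores.get(positive_label, 0.0) > threshold:
        return positive_label
    return negative_label


# Hypothetical detail values for three rows.
rows = ['{"N": 0.9, "R": 0.1}', '{"N": 0.8, "R": 0.2}', '{"N": 0.3, "R": 0.7}']
predicted = [label_from_detail(r, "N", "R", 0.8) for r in rows]
print(predicted)
```

In this sketch, a score equal to the threshold is not treated as positive, which follows the "greater than" wording in the table.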

Method 2: Configure the component by using PAI commands

The following section describes the parameters. You can use SQL scripts to call PAI commands. For more information, see SQL Script.

Threshold not specified

pai -name confusionmatrix -project algo_public
    -DinputTableName=wpbc_pred
    -DoutputTableName=wpbc_confu
    -DlabelColName=label
    -DpredictionColName=prediction_result;

Threshold specified

pai -name confusionmatrix -project algo_public
    -DinputTableName=wpbc_pred
    -DoutputTableName=wpbc_confu
    -DlabelColName=label
    -DpredictionDetailColName=prediction_detail
    -Dthreshold=0.8
    -DgoodValue=N;

Parameter	Required	Description	Default value
inputTableName	Yes	The name of the input table, which is the output table of the prediction step.	N/A
inputTablePartition	No	The partitions that are selected from the input table as the input data.	Full table
outputTableName	Yes	The name of the output table. The output table is used to store the confusion matrix.	N/A
labelColName	Yes	The name of the original label column.	N/A
predictionColName	No	The name of the prediction result column. This parameter is required if the threshold parameter is not specified.	N/A
predictionDetailColName	No	The name of the prediction result detail column. This parameter is required if the threshold parameter is specified.	N/A
threshold	No	The threshold used to determine positive samples.	0.5
goodValue	No	The label value of positive samples in binary classification. This parameter is required if the threshold parameter is specified.	N/A
coreNum	No	The number of cores used in computing.	Automatically allocated
memSizePerCore	No	The memory size of each core. Unit: MB.	Automatically allocated
lifecycle	No	The lifecycle of the output table.	N/A
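
The dependency rules in the table (predictionColName without a threshold; predictionDetailColName and goodValue with a threshold) can be checked before a command is submitted. The following Python helper is a minimal sketch: build_confusion_matrix_command is a hypothetical name, it is not part of any PAI SDK, and it only assembles the command text shown in the two examples above.

```python
def build_confusion_matrix_command(input_table, output_table, label_col,
                                   prediction_col=None, detail_col=None,
                                   threshold=None, good_value=None):
    """Assemble a confusionmatrix PAI command string (hypothetical helper)."""
    params = {
        "inputTableName": input_table,
        "outputTableName": output_table,
        "labelColName": label_col,
    }
    if threshold is None:
        # Without a threshold, the prediction result column is required.
        if prediction_col is None:
            raise ValueError("predictionColName is required without a threshold")
        params["predictionColName"] = prediction_col
    else:
        # With a threshold, the detail column and goodValue are required.
        if detail_col is None or good_value is None:
            raise ValueError(
                "predictionDetailColName and goodValue are required with a threshold")
        params["predictionDetailColName"] = detail_col
        params["threshold"] = threshold
        params["goodValue"] = good_value
    lines = ["pai -name confusionmatrix -project algo_public"]
    lines += [f"    -D{name}={value}" for name, value in params.items()]
    return "\n".join(lines) + ";"


cmd_plain = build_confusion_matrix_command(
    "wpbc_pred", "wpbc_confu", "label", prediction_col="prediction_result")
cmd_threshold = build_confusion_matrix_command(
    "wpbc_pred", "wpbc_confu", "label",
    detail_col="prediction_detail", threshold=0.8, good_value="N")
print(cmd_plain)
print(cmd_threshold)
```

The generated text can then be pasted into an SQL script. Optional parameters such as coreNum and lifecycle are left out of the sketch for brevity.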

Examples

Create a table named test_data by using the MaxCompute client. The table contains the id, label, and prediction_result columns, which are of the bigint, string, and string types, respectively. For information about how to install and configure the MaxCompute client, see MaxCompute client (odpscmd). For information about how to create a table, see Create tables.
Import the following test data to the test_data table. For more information about how to import data, see Import data to tables.
id	label	prediction_result
0	A	A
1	A	B
2	A	A
3	A	A
4	B	B
5	B	B
6	B	A
7	B	B
8	B	A
9	A	A
Create a pipeline as shown in the following figure and run the pipeline. For more information, see Algorithm modeling.
1. Drag the Read Table and Confusion Matrix components from the list on the left to the canvas.
2. Connect the components as shown in the preceding figure to build a pipeline.
3. Configure the component parameters.
  - Click the Read Table -1 component on the canvas. On the Select Table tab on the right, set the Table Name parameter to test_data.
  - Click the Confusion Matrix -1 component on the canvas and configure the parameters. The following table describes key parameters. Use the default values for other parameters.
    Parameter	Description
    Original Label Column	Select the label column.
    Prediction Result Label Column	Enter prediction_result.
4. After you configure the parameters, click the icon to run the pipeline.
After you run the pipeline, right-click the Confusion Matrix -1 component and select Visual Analysis to view the output of the component.
- Click the Confusion Matrix tab to view the output confusion matrix.
- Click the Statistics tab to view the statistics about the model.
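
To cross-check the output, the confusion matrix for the test_data rows can be computed by hand. The following Python sketch counts (label, prediction_result) pairs and derives accuracy, precision, and recall. It is an independent illustration: the layout and the set of statistics shown in Visual Analysis may differ, and the helper names precision and recall are chosen here for illustration only.

```python
from collections import Counter

# (id, label, prediction_result) rows from the test_data table.
rows = [
    (0, "A", "A"), (1, "A", "B"), (2, "A", "A"), (3, "A", "A"),
    (4, "B", "B"), (5, "B", "B"), (6, "B", "A"), (7, "B", "B"),
    (8, "B", "A"), (9, "A", "A"),
]

# Count each (actual label, predicted label) pair.
counts = Counter((label, pred) for _, label, pred in rows)
labels = sorted({label for _, label, _ in rows} | {pred for _, _, pred in rows})

# matrix[i][j]: rows whose actual label is labels[i] and predicted label is labels[j].
matrix = [[counts[(actual, pred)] for pred in labels] for actual in labels]

# Share of rows on the diagonal, that is, correctly classified rows.
accuracy = sum(counts[(label, label)] for label in labels) / len(rows)


def precision(label):
    """Correct predictions of a label divided by all predictions of that label."""
    predicted_total = sum(counts[(actual, label)] for actual in labels)
    return counts[(label, label)] / predicted_total


def recall(label):
    """Correct predictions of a label divided by all rows that carry that label."""
    actual_total = sum(counts[(label, pred)] for pred in labels)
    return counts[(label, label)] / actual_total


print(labels)
for actual, row in zip(labels, matrix):
    print(actual, row)
print("accuracy:", accuracy)
```

For this test data, the row for actual label A contains 4 and 1, the row for actual label B contains 2 and 3, and the accuracy is 0.7.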

References

For information about Machine Learning Designer components, see Overview of Machine Learning Designer.
Machine Learning Designer provides various preset algorithm components. You can select a component for data processing based on your business requirements. For more information, see Component reference: Overview of all components.
