Platform for AI: Confusion Matrix

Last Updated: Nov 14, 2024

The Confusion Matrix component applies to supervised learning. Its counterpart in unsupervised learning is the matching matrix. When you evaluate the precision of a classifier, the component compares the classification results with the actual measured values and displays the outcome in a matrix, so you can see how many samples of each class were classified correctly or incorrectly. This topic describes how to configure the Confusion Matrix component in Platform for AI (PAI).

Limits

The Confusion Matrix component runs only on MaxCompute computing resources.

Configure the component

You can use one of the following methods to configure the Confusion Matrix component.

Method 1: Configure the component in the PAI console

You can configure the parameters of the Confusion Matrix component in Machine Learning Designer. The following table describes the parameters.

| Parameter | Description |
| --- | --- |
| Original Label Column | The columns of numeric data types are supported. |
| Prediction Result Label Column | This parameter is required if the Threshold parameter is not specified. |
| Threshold | The threshold used to determine positive samples. Samples whose sample values are greater than the value of this parameter are positive samples. |
| Prediction Result Detail Column | You can configure only one of the Prediction Result Detail Column and Prediction Result Label Column parameters. This parameter is required if the Threshold parameter is specified. |
| Positive Sample Label | This parameter is required if the Threshold parameter is specified. |
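
The threshold rule above can be illustrated with a short Python sketch. It is not the component's implementation. It assumes that each value in the prediction result detail column is a JSON string that maps labels to scores (for example `{"N": 0.9, "Y": 0.1}`), and the function names `is_positive` and `binary_confusion` are hypothetical. A sample is treated as positive only when its score for the positive sample label is strictly greater than the threshold.

```python
import json


def is_positive(detail_json, positive_label, threshold):
    """Apply the threshold rule to one prediction detail value.

    Assumption: detail_json is a JSON object mapping label -> score.
    """
    scores = json.loads(detail_json)
    # Strictly greater than the threshold, as described for the Threshold parameter.
    return scores.get(positive_label, 0.0) > threshold


def binary_confusion(rows, positive_label, threshold):
    """Count TP/FP/FN/TN for (actual_label, detail_json) pairs."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for actual, detail in rows:
        predicted_positive = is_positive(detail, positive_label, threshold)
        actual_positive = actual == positive_label
        if predicted_positive and actual_positive:
            counts["TP"] += 1
        elif predicted_positive:
            counts["FP"] += 1
        elif actual_positive:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# Made-up illustration data: (actual label, prediction detail).
rows = [
    ("N", '{"N": 0.9, "Y": 0.1}'),
    ("N", '{"N": 0.7, "Y": 0.3}'),
    ("Y", '{"N": 0.85, "Y": 0.15}'),
    ("Y", '{"N": 0.2, "Y": 0.8}'),
    ("N", '{"N": 0.8, "Y": 0.2}'),  # equal to the threshold, so not positive
]
counts = binary_confusion(rows, positive_label="N", threshold=0.8)
print(counts)
```

Raising the threshold moves borderline samples from the positive to the negative prediction, which typically trades false positives for false negatives.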

Method 2: Configure the component by using PAI commands

You can use SQL scripts to call PAI commands. For more information, see SQL Script. The following commands show the two ways to call the component, and the table after them describes the parameters.

  • Threshold not specified

    pai -name confusionmatrix -project algo_public
        -DinputTableName=wpbc_pred
        -DoutputTableName=wpbc_confu
        -DlabelColName=label
        -DpredictionColName=prediction_result;

  • Threshold specified

    pai -name confusionmatrix -project algo_public
        -DinputTableName=wpbc_pred
        -DoutputTableName=wpbc_confu
        -DlabelColName=label
        -DpredictionDetailColName=prediction_detail
        -Dthreshold=0.8
        -DgoodValue=N;

| Parameter | Required | Description | Default value |
| --- | --- | --- | --- |
| inputTableName | Yes | The name of the input table. This is the table that contains the prediction output. | N/A |
| inputTablePartition | No | The partitions that are selected from the input table for the computation. | Full table |
| outputTableName | Yes | The name of the output table. The output table is used to store the confusion matrix. | N/A |
| labelColName | Yes | The name of the original label column. | N/A |
| predictionColName | No | The name of the prediction result column. This parameter is required if the threshold parameter is not specified. | N/A |
| predictionDetailColName | No | The name of the prediction result detail column. This parameter is required if the threshold parameter is specified. | N/A |
| threshold | No | The threshold used to determine positive samples. | 0.5 |
| goodValue | No | The label value that corresponds to the training coefficient in binary classification. This parameter is required if the threshold parameter is specified. | N/A |
| coreNum | No | The number of cores used in computing. | Automatically allocated |
| memSizePerCore | No | The memory size of each core. Unit: MB. | Automatically allocated |
| lifecycle | No | The lifecycle of the output table. | N/A |
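
The dependencies between predictionColName, predictionDetailColName, threshold, and goodValue are easy to get wrong when commands are generated by a script. The following Python sketch assembles the command text and checks those rules before anything is submitted. The helper name `build_confusionmatrix_command` is hypothetical and is not part of PAI or MaxCompute. The sketch only builds a string, and it treats `threshold=None` as "threshold not specified".

```python
def build_confusionmatrix_command(input_table, output_table, label_col,
                                  prediction_col=None,
                                  prediction_detail_col=None,
                                  threshold=None, good_value=None):
    """Assemble a `pai -name confusionmatrix` command string (hypothetical helper)."""
    params = [
        ("inputTableName", input_table),
        ("outputTableName", output_table),
        ("labelColName", label_col),
    ]
    if threshold is None:
        # Threshold not specified: the prediction result column is required.
        if prediction_col is None:
            raise ValueError("predictionColName is required when threshold is not specified")
        if prediction_detail_col is not None:
            raise ValueError("configure only one of predictionColName and predictionDetailColName")
        params.append(("predictionColName", prediction_col))
    else:
        # Threshold specified: the detail column and goodValue are required.
        if prediction_col is not None:
            raise ValueError("configure only one of predictionColName and predictionDetailColName")
        if prediction_detail_col is None or good_value is None:
            raise ValueError("predictionDetailColName and goodValue are required when threshold is specified")
        params += [
            ("predictionDetailColName", prediction_detail_col),
            ("threshold", threshold),
            ("goodValue", good_value),
        ]
    lines = ["pai -name confusionmatrix -project algo_public"]
    lines += [f"    -D{name}={value}" for name, value in params]
    return "\n".join(lines) + ";"


# Reproduces the "Threshold specified" command from this topic.
cmd = build_confusionmatrix_command(
    "wpbc_pred", "wpbc_confu", "label",
    prediction_detail_col="prediction_detail", threshold=0.8, good_value="N",
)
print(cmd)
```

Failing early with a `ValueError` keeps an incomplete parameter combination from reaching MaxCompute, where the error would only surface after the job is submitted.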

Examples

  1. Create a table named test_data by using the MaxCompute client. The table has three columns: id (bigint), label (string), and prediction_result (string). For information about how to install and configure the MaxCompute client, see MaxCompute client (odpscmd). For information about how to create a table, see Create tables.

  2. Import the following test data to the test_data table. For more information about how to import data, see Import data to tables.

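    | id | label | prediction_result |
    | --- | --- | --- |
    | 0 | A | A |
    | 1 | A | B |
    | 2 | A | A |
    | 3 | A | A |
    | 4 | B | B |
    | 5 | B | B |
    | 6 | B | A |
    | 7 | B | B |
    | 8 | B | A |
    | 9 | A | A |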

  3. Create a pipeline as shown in the following figure and run the pipeline. For more information, see Algorithm modeling.

    [Figure: Confusion matrix experiment]

    1. Drag the Read Table and Confusion Matrix components from the list on the left to the canvas.

    2. Connect the components as shown in the preceding figure to build a pipeline.

    3. Configure the component parameters.

      • Click the Read Table -1 component on the canvas. On the Select Table tab on the right, set the Table Name parameter to test_data.

      • Click the Confusion Matrix -1 component on the canvas and configure the parameters. The following table describes key parameters. Use the default values for other parameters.

        | Parameter | Description |
        | --- | --- |
        | Original Label Column | Select the label column. |
        | Prediction Result Label Column | Enter prediction_result. |

    4. After you configure the parameters, click the run icon to run the pipeline.

  4. After you run the pipeline, right-click the Confusion Matrix -1 component and select Visual Analysis to view the output of the component.

    • Click the Confusion Matrix tab to view the output confusion matrix.

    • Click the Statistics tab to view the statistics about the model.
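
To check the console output against the test data from step 2, the counts can be reproduced offline. The following Python sketch is not the component's implementation. The row and column orientation (rows are actual labels, columns are predicted labels) and the choice of statistics (accuracy, per-label precision and recall) are assumptions, so the layout and metrics shown in the console may differ.

```python
from collections import Counter

# Test data from step 2: (id, label, prediction_result).
rows = [
    (0, "A", "A"), (1, "A", "B"), (2, "A", "A"), (3, "A", "A"), (4, "B", "B"),
    (5, "B", "B"), (6, "B", "A"), (7, "B", "B"), (8, "B", "A"), (9, "A", "A"),
]

# Count each (actual, predicted) pair.
pair_counts = Counter((label, pred) for _, label, pred in rows)
labels = sorted({label for _, label, _ in rows} | {pred for _, _, pred in rows})

# matrix[i][j] = number of samples with actual labels[i] predicted as labels[j].
matrix = [[pair_counts[(actual, pred)] for pred in labels] for actual in labels]

# Accuracy: share of samples on the diagonal.
accuracy = sum(pair_counts[(label, label)] for label in labels) / len(rows)


def precision_recall(label):
    """Return (precision, recall) for one label, or 0.0 where undefined."""
    tp = pair_counts[(label, label)]
    predicted = sum(pair_counts[(actual, label)] for actual in labels)
    actual_total = sum(pair_counts[(label, pred)] for pred in labels)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual_total if actual_total else 0.0
    return precision, recall


print(labels)   # ['A', 'B']
print(matrix)   # [[4, 1], [2, 3]]
print(accuracy) # 0.7
for label in labels:
    print(label, precision_recall(label))
```

For this data, four of the five A samples and three of the five B samples are classified correctly, which gives an accuracy of 7/10.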
