This topic describes the Chi-square Goodness of Fit Test component provided by Machine Learning Designer. The Chi-square Goodness of Fit Test component is used in scenarios in which categorical variables are used. This component is used to determine the difference between the observed frequency and expected frequency for each classification of a single multiclass categorical variable. The null hypothesis assumes that the observed frequency and expected frequency are the same.
Configure the component
You can configure the Chi-square Goodness of Fit Test component by using one of the following methods:
Method 1: Configure the component in Machine Learning Designer
Configure the component on the pipeline configuration tab of Machine Learning Designer in the Machine Learning Platform for AI console.
Parameter | Description |
Input Column | The column on which you want to perform a chi-square test. |
Class Probability | The class probability configuration. Specify this parameter in the |
Method 2: Run Machine Learning Platform for AI commands
Configure the component parameters by using a Machine Learning Platform for AI command. You can use the SQL Script component to run Machine Learning Platform for AI commands. For more information, see SQL Script. The following table describes the parameters of the command that is used to configure this component.
PAI -name chisq_test
-project algo_public
-DinputTableName=pai_chisq_test_input
-DcolName=f0
-DprobConfig=0:0.3,1:0.7
-DoutputTableName=pai_chisq_test_output0
-DoutputDetailTableName=pai_chisq_test_output0_detail
Parameter | Required | Description | Default value |
inputTableName | Yes | The name of the input table. | None. |
colName | Yes | The name of the column. | None. |
outputTableName | Yes | The name of the output table. | None. |
outputDetailTableName | Yes | The name of the output details table. | None. |
inputTablePartitions | No | The partition that is selected from the input table for training. The following formats are supported:
Note If you specify multiple partitions, separate them with commas (,). | By default, this parameter is left empty. |
probConfig | No | The class probability configuration. Specify this parameter in the | By default, this parameter is not specified, and all the probability values are the same. |
Example
Test data
create table pai_chisq_test_input as select * from ( select '1' as f0,'2' as f1 union all select '1' as f0,'3' as f1 union all select '1' as f0,'4' as f1 union all select '0' as f0,'3' as f1 union all select '0' as f0,'4' as f1 )tmp;
PAI command
PAI -name chisq_test -project algo_public -DinputTableName=pai_chisq_test_input -DcolName=f0 -DprobConfig=0:0.3,1:0.7 -DoutputTableName=pai_chisq_test_output0 -DoutputDetailTableName=pai_chisq_test_output0_detail
Output description
The output table that is specified by the outputTableName parameter is in the JSON format. The table contains only one row and one column.
{ "Chi-Square": { "comment": "Pearson's chi-square test", "df": 1, "p-value": 0.75, "value": 0.2380952380952381 } }
The following table describes the columns in the output detail table that is specified by the outputDetailTableName parameter.
column name
comment
colName
The data source class.
observed
The observed frequency.
expected
The expected frequency.
residuals
The standard residuals, which are calculated by using the following expression:
(Standard residuals = (Observed frequency - Expected frequency)/sqrt(Expected frequency)
.Generated data