The Regression Model Evaluation component assesses how well regression models perform by comparing their prediction results with the original values. It generates evaluation metrics and a histogram of residuals.
You can use one of the following methods to configure the Regression Model Evaluation component.
Method 1: Configure the component on the pipeline page
You can configure the parameters of the Regression Model Evaluation component on the pipeline page of Machine Learning Designer in Machine Learning Platform for AI (PAI). Machine Learning Designer was formerly known as Machine Learning Studio. The following table describes the parameters.
Tab | Parameter | Description |
---|---|---|
Fields Setting | Original Regression Value | The columns of numeric data types are supported. |
Fields Setting | Predicted Regression Value | The columns of numeric data types are supported. |
Tuning | Worker number | The number of cores. Valid values: 1 to 9999. This parameter must be used with the Memory Size per Node parameter. |
Tuning | Memory Size per Node | The memory size of each core. Valid values: 1024 to 64 × 1024. Unit: MB. |
Method 2: Use PAI commands
Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.
PAI -name regression_evaluation -project algo_public
-DinputTableName=input_table
-DyColName=y_col
-DpredictionColName=prediction_col
-DindexOutputTableName=index_output_table
-DresidualOutputTableName=residual_output_table;
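If the command is generated from a script rather than typed by hand, the text can be assembled programmatically. The following Python sketch only builds the command string shown above. It does not submit anything to PAI. The function name `build_regression_evaluation_command` is hypothetical and is not part of any PAI SDK, and optional parameters are passed through unchanged without validation.

```python
def build_regression_evaluation_command(
    input_table,
    y_col,
    prediction_col,
    index_output_table,
    residual_output_table,
    **optional_params,
):
    """Assemble the PAI command text for the regression_evaluation component.

    Hypothetical helper for illustration only: it formats a string and
    performs no validation of table names, columns, or value ranges.
    """
    params = {
        "inputTableName": input_table,
        "yColName": y_col,
        "predictionColName": prediction_col,
        "indexOutputTableName": index_output_table,
        "residualOutputTableName": residual_output_table,
    }
    # Optional parameters such as intervalNum or lifecycle are appended as given.
    params.update(optional_params)

    lines = ["PAI -name regression_evaluation -project algo_public"]
    lines += [f"    -D{name}={value}" for name, value in params.items()]
    return "\n".join(lines) + ";"


if __name__ == "__main__":
    print(
        build_regression_evaluation_command(
            "input_table",
            "y_col",
            "prediction_col",
            "index_output_table",
            "residual_output_table",
            intervalNum=50,
        )
    )
```

The resulting text can then be pasted into an SQL Script component.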
Parameter | Required | Description | Default value |
---|---|---|---|
inputTableName | Yes | The name of the input table. | N/A |
inputTablePartitions | No | The partitions that are selected from the input table for computing. | Full table |
yColName | Yes | The name of the column that contains original dependent variables in the input table. The columns of numeric data types are supported. | N/A |
predictionColName | Yes | The name of the column that contains dependent variables in the prediction result. The columns of numeric data types are supported. | N/A |
indexOutputTableName | Yes | The name of the output table of regression metrics. | N/A |
residualOutputTableName | Yes | The name of the output table of the histogram of residuals. | N/A |
intervalNum | No | The number of intervals of the histogram. | 100 |
lifecycle | No | The lifecycle of the output table. The value of this parameter must be a positive integer. | N/A |
coreNum | No | The number of cores. Valid values: 1 to 9999. | Determined by the system |
memSizePerCore | No | The memory size of each core. Valid values: 1024 to 64 × 1024. Unit: MB. | Determined by the system |
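To give a feel for what the intervalNum parameter controls, the following Python sketch counts residuals in a fixed number of intervals. It is an illustration under stated assumptions, not the component's actual implementation: it assumes a residual is the original value minus the predicted value, and that the intervals divide the range between the smallest and largest residual into equal widths. The documentation above does not confirm either choice, and the function name `residual_histogram` is hypothetical.

```python
def residual_histogram(y_true, y_pred, interval_num=100):
    """Count residuals in `interval_num` equal-width intervals.

    Assumptions (not confirmed by the PAI documentation): a residual is
    y_true - y_pred, and the intervals evenly divide [min, max] of the
    residuals. Inputs are plain Python lists of numbers.
    """
    if interval_num < 1:
        raise ValueError("interval_num must be a positive integer")
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    residuals = [t - p for t, p in zip(y_true, y_pred)]
    low, high = min(residuals), max(residuals)
    width = (high - low) / interval_num

    counts = [0] * interval_num
    for r in residuals:
        if width == 0:
            index = 0  # all residuals are identical
        else:
            # The largest residual falls into the last interval.
            index = min(int((r - low) / width), interval_num - 1)
        counts[index] += 1

    # interval_num + 1 boundaries delimit interval_num intervals.
    edges = [low + i * width for i in range(interval_num + 1)]
    return edges, counts


if __name__ == "__main__":
    edges, counts = residual_histogram(
        [1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0], interval_num=3
    )
    print(edges, counts)
```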
Output
The output table of regression metrics is generated in the JSON format and contains the following fields.
Parameter | Description |
---|---|
SST | The total sum of squares. |
SSE | The sum of squared errors. |
SSR | The sum of squares due to regression. |
R2 | The coefficient of determination. |
R | The multiple correlation coefficient. |
MSE | The mean-square error. |
RMSE | The root-mean-square error. |
MAE | The mean absolute error. |
MAD | The mean error. |
MAPE | The mean absolute percentage error. |
count | The number of rows. |
yMean | The mean of original dependent variables. |
predictionMean | The mean of prediction results. |
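For readers who want to sanity-check these values, the following Python sketch computes most of the fields from two lists using the common textbook definitions. These definitions are assumptions, because the table above does not state the exact formulas that the component uses. In particular, R is taken as the square root of R2 when R2 is non-negative, MAPE is returned as a fraction rather than a percentage and skips rows whose original value is 0, and MAD is omitted because its formula is not specified here. The function name `regression_metrics` is hypothetical.

```python
import json
import math


def regression_metrics(y_true, y_pred):
    """Compute regression metrics with textbook definitions (assumed, not
    confirmed to match the PAI component). Inputs are lists of numbers."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    y_mean = sum(y_true) / n
    prediction_mean = sum(y_pred) / n

    sst = sum((y - y_mean) ** 2 for y in y_true)             # total sum of squares
    sse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # sum of squared errors
    ssr = sum((p - y_mean) ** 2 for p in y_pred)             # sum of squares due to regression

    # R2 is undefined when all original values are identical (SST == 0).
    r2 = 1 - sse / sst if sst > 0 else None
    # Assumption: R is sqrt(R2), reported only when R2 is non-negative.
    r = math.sqrt(r2) if r2 is not None and r2 >= 0 else None

    mse = sse / n
    mae = sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n

    # Assumption: MAPE as a fraction, skipping rows where the original value is 0.
    ratios = [abs((y - p) / y) for y, p in zip(y_true, y_pred) if y != 0]
    mape = sum(ratios) / len(ratios) if ratios else None

    return {
        "SST": sst,
        "SSE": sse,
        "SSR": ssr,
        "R2": r2,
        "R": r,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": mae,
        "MAPE": mape,
        "count": n,
        "yMean": y_mean,
        "predictionMean": prediction_mean,
    }


if __name__ == "__main__":
    metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
    print(json.dumps(metrics, indent=2))
```

Values computed this way may differ from the component's output wherever its definitions differ from the ones assumed here.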