The Linear Model Feature Importance component is used to calculate the feature importance for a linear model, such as linear regression and logistic regression for binary classification. Both the sparse and dense data formats are supported. This topic describes how to configure the Linear Model Feature Importance component.
Limits
The Linear Model Feature Importance component runs only on MaxCompute computing resources.
Configure the component
You can configure the component by using one of the following methods:
Method 1: Configure the component in the Platform for AI (PAI) console
Configure the component parameters in Machine Learning Designer. The following table describes the parameters.
Tab | Parameter | Description |
Fields Setting | Feature Columns | Select the feature columns for training from the input table. Optional. By default, all columns except the label column are selected. |
Fields Setting | Target Column | Required. The label column. Click Select Fields. In the Select Fields dialog box, enter the keyword of the column that you want to search for. Select the column and click OK. |
Fields Setting | Input Sparse Format Data | Optional. Specifies whether data in the input table is sparse. |
Tuning | Cores | Optional. The number of cores used in computing. |
Tuning | Memory Size per Core | Optional. The memory size of each core. Unit: MB. |
Method 2: Use PAI commands
Configure the component parameters by using PAI commands. The following section describes the parameters. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.
PAI -name regression_feature_importance -project algo_public
-DmodelName=xlab_m_logisticregressi_20317_v0
-DoutputTableName=pai_temp_2252_20321_1
-DlabelColName=y
-DfeatureColNames=pdays,previous,emp_var_rate,cons_price_idx,cons_conf_idx,euribor3m,nr_employed,age,campaign
-DenableSparse=false -DinputTableName=pai_dense_10_9;
Parameter | Required | Description | Default value |
inputTableName | Yes | The name of the input table. | None |
outputTableName | Yes | The name of the output table. | None |
labelColName | Yes | The label column that is selected from the input table. | None |
modelName | Yes | The name of the input model. | None |
featureColNames | No | The feature columns that are selected from the input table. | All columns other than the label column |
inputTablePartitions | No | The partitions that are selected from the input table. | Full table |
enableSparse | No | Specifies whether data in the input table is sparse. | false |
itemDelimiter | No | The delimiter that is used to separate key-value pairs when data in the input table is sparse. | Backspace |
kvDelimiter | No | The delimiter that is used to separate keys and values when data in the input table is sparse. | Colons (:) |
lifecycle | No | The lifecycle of the output table. | Not specified |
coreNum | No | The number of cores. | Determined by the system |
memSizePerCore | No | The memory size of each core. | Determined by the system |
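The itemDelimiter and kvDelimiter parameters control how a sparse row is split into key-value pairs and then into keys and values. The following Python sketch illustrates that two-level split. It is not the component's implementation: the function name parse_sparse_row, the sample row, and the comma used as the item delimiter are assumptions chosen for illustration, and both delimiters are passed explicitly rather than relying on any default.

```python
def parse_sparse_row(row, item_delimiter, kv_delimiter=":"):
    """Split one sparse row into a {key: value} dict.

    Hypothetical helper for illustration only. item_delimiter separates
    key-value pairs; kv_delimiter separates a key from its value.
    """
    features = {}
    for item in row.split(item_delimiter):
        item = item.strip()
        if not item:
            continue  # skip empty fragments left by repeated delimiters
        key, value = item.split(kv_delimiter, 1)
        features[key] = float(value)
    return features


# Assumed sample row: three non-zero features keyed by feature index.
row = "1:0.5,3:2.0,7:1.25"
parsed = parse_sparse_row(row, item_delimiter=",")
print(parsed)
```

Features that do not appear in a sparse row are treated as absent, which is why this format is compact for tables in which most values are zero.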
Example
Create a table named bank_data and import data to the table. For more information, see Create tables and Import data to tables.
Execute the following SQL statements to generate training data:
create table if not exists pai_dense_10_9 as select age,campaign,pdays, previous, emp_var_rate, cons_price_idx, cons_conf_idx, euribor3m, nr_employed, fixed_deposit from bank_data limit 10;
Create a pipeline shown in the following figure and run the component. For more information, see Algorithm modeling.
In the left-side component list of Machine Learning Designer, separately search for the Read Table, Logistic Regression for Multiclass Classification, and Linear Model Feature Importance components, and drag the components to the canvas on the right.
Connect nodes by drawing lines to organize the nodes into a pipeline that includes upstream and downstream relationships based on the preceding figure.
Configure the component parameters.
On the canvas, click the Read Table-1 component. On the Select Table tab in the right pane, set Table Name to pai_dense_10_9, the training table generated in the preceding step.
On the canvas, click the Logistic Regression for Multiclass Classification-1 component. On the Fields Setting tab, select age, campaign, pdays, previous, emp_var_rate, cons_price_idx, cons_conf_idx, euribor3m, and nr_employed for the Training Feature Columns parameter. Set the Target Columns parameter to fixed_deposit. Retain the default values for the remaining parameters.
On the canvas, click the Linear Model Feature Importance-1 component. On the Fields Setting tab, set the Target Column parameter to fixed_deposit. Retain the default values for the remaining parameters.
After the parameter configuration is complete, click the button to run the pipeline.
After the pipeline is run, right-click the Linear Model Feature Importance-1 component and view its output table. The following table describes the calculation formulas for metrics.
Column name | Formula |
weight | abs(w_j) |
importance | abs(w_j) * STD(f_j) |
Note: abs(w_j) indicates the absolute value of the coefficient of feature j. STD(f_j) indicates the standard deviation of feature j in the training data.
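As a worked illustration of the two metrics, the following Python sketch computes weight as the absolute value of a coefficient and importance as that value multiplied by the standard deviation of the corresponding feature column. The coefficients, the column values, and the function name linear_feature_importance are made up for this example. The sketch uses the population standard deviation (statistics.pstdev); whether the component uses the population or the sample form is not stated here, so treat that choice as an assumption.

```python
import statistics


def linear_feature_importance(weights, columns):
    """Return weight and importance for each feature.

    weights: {feature name: model coefficient}
    columns: {feature name: list of training values}
    Hypothetical helper; assumes the population standard deviation.
    """
    result = {}
    for name, w in weights.items():
        std = statistics.pstdev(columns[name])
        result[name] = {"weight": abs(w), "importance": abs(w) * std}
    return result


# Made-up coefficients and training columns for two features.
weights = {"age": -0.5, "campaign": 2.0}
columns = {
    "age": [30.0, 40.0, 50.0, 60.0],
    "campaign": [1.0, 1.0, 3.0, 3.0],
}

scores = linear_feature_importance(weights, columns)
for name, metrics in scores.items():
    print(name, metrics["weight"], round(metrics["importance"], 4))
```

In this made-up data, age has the smaller coefficient but the larger importance because its values are spread more widely. Scaling by the standard deviation makes coefficients of features on different scales comparable.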
Right-click the Linear Model Feature Importance-1 component and select View Analytics Report to view the reports for visualized data analysis.
References
For more information about the components provided by Machine Learning Designer, see Overview of Machine Learning Designer.
Machine Learning Designer provides various preset algorithm components. You can use a component to process data based on your business requirements. For more information, see Component reference: Overview of all components.