Linear Model Feature Importance

The Linear Model Feature Importance component is used to calculate the feature importance for a linear model, such as linear regression and logistic regression for binary classification. Both the sparse and dense data formats are supported. This topic describes how to configure the Linear Model Feature Importance component.

Limits

You can use the Linear Model Feature Importance component based only on the computing resources of MaxCompute.

Configure the component

You can configure the component by using one of the following methods:

Method 1: Configure the component in the Platform for AI (PAI) console

Configure the component parameters in Machine Learning Designer. The following table describes the parameters.

Tab	Parameter	Description
Fields Setting	Feature Columns	Optional. The feature columns that are selected from the input table for training. By default, all columns except the label column are selected.
	Target Column	Required. The label column. Click Select Fields. In the Select Fields dialog box, enter the keyword of the column that you want to search for. Select the column and click OK.
	Input Sparse Format Data	Optional. Specifies whether data in the input table is sparse.
Tuning	Cores	Optional. The number of cores used in computing.
	Memory Size per Core	Optional. The memory size of each core. Unit: MB.

Method 2: Use PAI commands

Configure the component parameters by using PAI commands. The following section describes the parameters. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.

PAI -name regression_feature_importance -project algo_public
    -DmodelName=xlab_m_logisticregressi_20317_v0
    -DoutputTableName=pai_temp_2252_20321_1
    -DlabelColName=y
    -DfeatureColNames=pdays,previous,emp_var_rate,cons_price_idx,cons_conf_idx,euribor3m,nr_employed,age,campaign
    -DenableSparse=false -DinputTableName=pai_dense_10_9;

Parameter	Required	Description	Default value
inputTableName	Yes	The name of the input table.	None
outputTableName	Yes	The name of the output table.	None
labelColName	Yes	The label column that is selected from the input table.	None
modelName	Yes	The name of the input model.	None
featureColNames	No	The feature columns that are selected from the input table.	All columns other than the label column
inputTablePartitions	No	The partitions that are selected from the input table.	Full table
enableSparse	No	Specifies whether data in the input table is sparse.	false
itemDelimiter	No	The delimiter that is used to separate key-value pairs when data in the input table is sparse.	Space
kvDelimiter	No	The delimiter that is used to separate keys and values when data in the input table is sparse.	Colon (:)
lifecycle	No	The lifecycle of the output table.	Not specified
coreNum	No	The number of cores.	Determined by the system
memSizePerCore	No	The memory size of each core.	Determined by the system
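
When enableSparse is set to true, itemDelimiter and kvDelimiter determine how each sparse value is split into key-value pairs. The following Python sketch illustrates that splitting with the default delimiters (space and colon). The function name parse_sparse_row and the sample strings are hypothetical and are not part of PAI. The sketch only shows how the two delimiters interact and may not match how the component parses data internally.

```python
def parse_sparse_row(row, item_delimiter=" ", kv_delimiter=":"):
    """Split one sparse-format string into a {key: value} dictionary.

    item_delimiter separates key-value pairs (itemDelimiter, default: space).
    kv_delimiter separates a key from its value (kvDelimiter, default: colon).
    """
    features = {}
    for item in row.split(item_delimiter):
        if not item:
            # Skip empty pieces produced by repeated delimiters.
            continue
        key, value = item.split(kv_delimiter, 1)
        features[key] = float(value)
    return features


# Illustrative input that uses the default delimiters.
print(parse_sparse_row("1:0.5 3:1.2"))

# Illustrative input that uses custom delimiters.
print(parse_sparse_row("age=35,campaign=2", item_delimiter=",", kv_delimiter="="))
```

Keys that do not appear in a row are simply absent from the result, which is what makes the format compact for data that contains mostly zeros.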

Example

Create a table named bank_data and import data to the table. For more information, see Create tables and Import data to tables.

Execute the following SQL statements to generate training data:

create table if not exists pai_dense_10_9 as
select
    age,campaign,pdays, previous, emp_var_rate, cons_price_idx, cons_conf_idx, euribor3m, nr_employed, fixed_deposit
from  bank_data limit 10;

Create a pipeline shown in the following figure and run the component. For more information, see Algorithm modeling.
1. In the left-side component list of Machine Learning Designer, separately search for the Read Table, Logistic Regression for Multiclass Classification, and Linear Model Feature Importance components, and drag the components to the canvas on the right.
2. Connect nodes by drawing lines to organize the nodes into a pipeline that includes upstream and downstream relationships based on the preceding figure.
3. Configure the component parameters.
  - On the canvas, click the Read Table-1 component. On the Select Table tab in the right pane, set Table Name to bank_data.
  - On the canvas, click the Logistic Regression for Multiclass Classification-1 component. On the Fields Setting tab, select age, campaign, pdays, previous, emp_var_rate, cons_price_idx, cons_conf_idx, euribor3m, and nr_employed for the Training Feature Columns parameter. Set the Target Columns parameter to fixed_deposit. Retain the default values for the remaining parameters.
  - On the canvas, click the Linear Model Feature Importance-1 component. On the Fields Setting tab, set the Target Column parameter to fixed_deposit. Retain the default values for the remaining parameters.
4. After the parameter configuration is complete, click the button to run the pipeline.
After the pipeline is run, right-click the Linear Model Feature Importance-1 component and choose View Data > Model Importance Table.
The following table describes the calculation formulas for metrics.
Column name	Formula
weight	abs(w_j)
importance	abs(w_j) * STD(f_j)
Note: abs(w_j) indicates the absolute value of the coefficient of feature j. STD(f_j) indicates the standard deviation of feature j in the training data.
Right-click the Linear Model Feature Importance-1 component and select View Analytics Report to view the reports for visualized data analysis.
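
The weight and importance formulas can be reproduced outside the component for a quick sanity check. The following Python sketch computes both metrics, assuming that the standard deviation is taken per feature column. The coefficients and column values are made-up illustration data, the function name linear_feature_importance is hypothetical, and the use of the population standard deviation is an assumption: the component may use the sample standard deviation instead.

```python
import statistics


def linear_feature_importance(weights, columns):
    """Return {feature: (weight, importance)} for a linear model.

    weights: {feature_name: model coefficient}
    columns: {feature_name: list of training values for that feature}
    """
    result = {}
    for name, coefficient in weights.items():
        # Assumption: population standard deviation of the feature column.
        std = statistics.pstdev(columns[name])
        weight = abs(coefficient)
        result[name] = (weight, weight * std)
    return result


# Made-up coefficients and training values, for illustration only.
weights = {"age": -0.5, "campaign": 2.0}
columns = {
    "age": [30.0, 40.0, 50.0, 60.0],
    "campaign": [1.0, 1.0, 1.0, 1.0],
}

for name, (weight, importance) in linear_feature_importance(weights, columns).items():
    print(name, weight, importance)
```

In this illustration, campaign has the larger coefficient but a constant value, so its importance is zero. Scaling by the standard deviation makes features that are measured on different scales comparable.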

References

For more information about the components provided by Machine Learning Designer, see Overview of Machine Learning Designer.
Machine Learning Designer provides various preset algorithm components. You can use a component to process data based on your business requirements. For more information, see Component reference: Overview of all components.
