The Feature Scaling component can scale dense or sparse numeric data by using common scaling functions.
Overview
The Feature Scaling component has the following characteristics:
- Supports common scaling functions such as log2, log10, ln, abs, and sqrt.
- Supports dense and sparse data.
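As a rough illustration of what these functions compute on a dense numeric column, the following Python sketch maps each documented function name to a standard math operation. The mapping and the helper name `scale_column` are assumptions made for illustration and are not the component's implementation. This topic does not state how the component handles zero or negative inputs; in this sketch, the logarithm and square root functions raise `ValueError` for invalid inputs.

```python
import math

# Assumed mapping from the documented function names to standard math operations.
SCALING_FUNCTIONS = {
    "log2": math.log2,
    "log10": math.log10,
    "ln": math.log,   # natural logarithm
    "abs": abs,
    "sqrt": math.sqrt,
}


def scale_column(values, method="log2"):
    """Apply the named scaling function to every value in a dense column."""
    func = SCALING_FUNCTIONS[method]
    return [func(v) for v in values]


print(scale_column([1.0, 4.0, 16.0], "log2"))  # [0.0, 2.0, 4.0]
```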
Configure the component
You can use one of the following methods to configure the Feature Scaling component.
Method 1: Configure the component on the pipeline page
You can configure the parameters of the Feature Scaling component on the pipeline page of Machine Learning Designer of Machine Learning Platform for AI (PAI). Machine Learning Designer is formerly known as Machine Learning Studio. The following table describes the parameters.
Tab | Parameter | Description |
---|---|---|
Fields Setting | Scaled Features | The features that you want to scale. |
Fields Setting | Label Column | The label column. If this parameter is specified, you can view the x-y histogram that displays the relationship between the features and the objective variables. |
Fields Setting | Sparse Features (K:V,K:V) | Specifies whether the training data is sparse. If the data is sparse, all features of a data record are stored as key-value pairs in a single field. |
Fields Setting | Reserve Converted Features | Specifies whether to prefix new features with scale_. |
Parameters Setting | Scaling Function | The scaling function to apply. The Feature Scaling component supports the following scaling functions: log2, log10, ln, abs, and sqrt. |
Method 2: Use PAI commands
Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.
PAI -name fe_scale_runner -project algo_public
-Dlifecycle=28
-DscaleMethod=log2
-DscaleCols=nr_employed
-DinputTable=pai_dense_10_1
-DoutputTable=pai_temp_2262_20380_1;
Parameter | Required | Description | Default value |
---|---|---|---|
inputTable | Yes | The name of the input table. | None |
inputTablePartitions | No | The partitions that are selected from the input table for training. Set this parameter in the Partition_name=value format. To specify multi-level partitions, set this parameter in the … format. If you specify multiple partitions, separate them with commas (,). | All partitions in the input table |
outputTable | Yes | The output table after scaling. | None |
scaleCols | Yes | The features that you want to scale. Sparse features are automatically displayed. You can select only the features of numeric data types. | None |
labelCol | No | The label column. If this parameter is specified, the x-y histogram that displays the relationship between the features and the objective variables can be viewed. | None |
categoryCols | No | The selected fields that are processed as enumerated features. These fields do not support scaling. | "" |
scaleMethod | No | The method that is used for scaling. Valid values: log2, log10, ln, abs, and sqrt. | log2 |
scaleTopN | No | If you do not set the scaleCols parameter, the system automatically selects the top N features that require scaling. | 10 |
isSparse | No | Specifies whether features are sparse features in the key-value format. | Dense data |
itemSpliter | No | The delimiter that is used to separate sparse key-value pairs. | , |
kvSpliter | No | The delimiter that is used to separate sparse keys and values. | : |
lifecycle | No | The lifecycle of the output table. | 7 |
coreNum | No | The number of cores. The value of this parameter must be a positive integer. Valid values: [1,9999]. This parameter must be used together with the memSizePerCore parameter. | Determined by the system |
memSizePerCore | No | The memory size of each core. Unit: MB. The value of this parameter must be a positive integer. Valid values: [2048,64 × 1024]. | Determined by the system |
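To make the sparse parameters more concrete, the following Python sketch shows one way a key-value row could be split with the `itemSpliter` and `kvSpliter` delimiters and scaled key by key. The sample row, the helper name `scale_sparse_row`, and the choice to pass unselected keys through unchanged are assumptions for illustration. The component's actual handling of sparse rows is not described in this topic.

```python
import math


def scale_sparse_row(row, scale_keys, func=math.log2,
                     item_spliter=",", kv_spliter=":"):
    """Scale selected keys in one sparse 'k:v,k:v' string.

    Keys that are not in scale_keys pass through unchanged (an assumption).
    """
    scaled_items = []
    for item in row.split(item_spliter):
        key, value = item.split(kv_spliter, 1)
        if key in scale_keys:
            value = repr(func(float(value)))
        scaled_items.append(key + kv_spliter + value)
    return item_spliter.join(scaled_items)


# Hypothetical sparse row: keys 1 and 2 are scaled with log2, key 7 is kept.
print(scale_sparse_row("1:8,2:0.5,7:3", scale_keys={"1", "2"}))
```

The default delimiters in the sketch match the documented defaults of `itemSpliter` (,) and `kvSpliter` (:).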
Examples
- Input data
Execute the following SQL statements to generate input data:
create table if not exists pai_dense_10_1 as select nr_employed from bank_data limit 10;
- Parameter settings
On the Fields Setting tab, set the Scaled Features parameter to nr_employed. Only the features of the numeric data types are supported. On the Parameters Setting tab, set the Scaling Function parameter to log2, as shown in the following figure.
- Results
nr_employed |
---|
12.352071021075528 |
12.34313018339218 |
12.285286613666395 |
12.316026916036957 |
12.309533196497519 |
12.352071021075528 |
12.316026916036957 |
12.316026916036957 |
12.309533196497519 |
12.316026916036957 |
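The raw nr_employed values are not listed in this topic. As a sanity check, the log2 results can be inverted with a power of 2 to estimate the inputs. The following Python sketch assumes only that the output column contains log2 of each input value. The variable names are illustrative.

```python
import math

# Scaled values copied from the result table above.
results = [
    12.352071021075528, 12.34313018339218, 12.285286613666395,
    12.316026916036957, 12.309533196497519, 12.352071021075528,
    12.316026916036957, 12.316026916036957, 12.309533196497519,
    12.316026916036957,
]

# Invert log2: x = 2 ** log2(x).
recovered = [2 ** r for r in results]

for scaled, raw in zip(results, recovered):
    # Re-applying log2 should reproduce the scaled value up to rounding error.
    consistent = math.isclose(math.log2(raw), scaled, rel_tol=1e-9)
    print(f"{scaled} -> {raw:.1f} (round trip consistent: {consistent})")
```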