This topic describes the Data Type Conversion component provided by Machine Learning Designer. You can use this component to convert features of any data type into features of the STRING, DOUBLE, or INT data type. If a value cannot be converted, the component fills in a default value that you specify.
Background information
You can convert the data types of table fields.
You can convert multiple table fields to different data types at the same time.
You can convert fields of ODPS 2.0 numeric data types, such as DECIMAL, FLOAT, and INT.
Note: This feature is available only in the China (Beijing), China (Shanghai), China (Hangzhou), China (Shenzhen), China (Zhangjiakou), and China (Chengdu) regions.
You can choose whether to retain the original columns.
Configure the component
You can use one of the following methods to configure the component parameters.
Method 1: Using the Machine Learning Platform for AI (PAI) console
Configure the component parameters on the pipeline page of Machine Learning Designer.
Tab | Parameter | Description |
Fields Setting | Convert to Double Type Columns | The columns to convert to the DOUBLE data type. |
Fields Setting | Default Imputed Value When Conversion Fails | The default value that is filled in when conversion to the DOUBLE data type fails. |
Fields Setting | Convert to Int Type Columns | The columns to convert to the INT data type. |
Fields Setting | Default Imputed Value When Conversion Fails | The default value that is filled in when conversion to the INT data type fails. |
Fields Setting | Convert to String Type Columns | The columns to convert to the STRING data type. |
Fields Setting | Default Imputed Value When Conversion Fails | The default value that is filled in when conversion to the STRING data type fails. |
Fields Setting | Reserve Original Columns | Specifies whether to retain the original columns. If you retain them, the names of the converted columns are prefixed with typed_. |
Tuning | Memory Size per Node | The memory size of each node. Valid values: 1024 to 65536 (64 × 1024). Unit: MB. |
Tuning | Cores | The number of cores used in computing. This parameter must be used together with the Memory Size per Node parameter. Valid values: [1,9999]. |
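The resource parameters have fixed ranges and must be set as a pair. If you generate pipeline settings from a script, a small client-side check like the following Python sketch can catch mistakes early. The function name check_resources and the message texts are hypothetical and are not part of any PAI SDK; only the ranges and the pairing rule come from this topic.

```python
# Hypothetical client-side check for the Memory Size per Node and Cores settings.
# Ranges are taken from the parameter table; the helper itself is illustrative.

MIN_MEMORY_MB = 1024
MAX_MEMORY_MB = 64 * 1024  # 65536
MIN_CORES = 1
MAX_CORES = 9999


def check_resources(memory_mb=None, cores=None):
    """Return a list of problems; an empty list means the settings look valid.

    Leaving both values unset is allowed, in which case the system chooses them.
    """
    problems = []
    # The two parameters must be used together: both set or both unset.
    if (memory_mb is None) != (cores is None):
        problems.append("memory and cores must be specified together")
    if memory_mb is not None and not MIN_MEMORY_MB <= memory_mb <= MAX_MEMORY_MB:
        problems.append(f"memory must be in [{MIN_MEMORY_MB}, {MAX_MEMORY_MB}] MB")
    if cores is not None and not MIN_CORES <= cores <= MAX_CORES:
        problems.append(f"cores must be in [{MIN_CORES}, {MAX_CORES}]")
    return problems


print(check_resources(2048, 4))  # []
print(check_resources(512, 4))   # ['memory must be in [1024, 65536] MB']
```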
Method 2: Using PAI commands
Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.
pai -project algo_public
-name type_transform_v1
-DinputTable=type_test
-Dcols_to_string="f0"
-Ddefault_double_value=0.0
-DoutputTable=type_test_output;
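The command above passes each component parameter as a -D&lt;name&gt;=&lt;value&gt; argument. If you assemble such commands programmatically, for example before pasting them into the SQL Script component, a helper like the following Python sketch keeps the format consistent. The function build_pai_command is hypothetical and only builds a string; it does not submit anything to PAI.

```python
# Hypothetical helper that assembles a single-line PAI command string.
# It mirrors the layout of the example command; it does not call PAI.

def build_pai_command(project, name, params):
    """Build a PAI command from a mapping of parameter names to values.

    Values are inserted verbatim, so add quotes yourself where the
    documentation uses them (for example, '"f0"').
    """
    parts = ["pai", "-project", project, "-name", name]
    for key, value in params.items():  # dicts preserve insertion order
        parts.append(f"-D{key}={value}")
    return " ".join(parts) + ";"


command = build_pai_command(
    "algo_public",
    "type_transform_v1",
    {
        "inputTable": "type_test",
        "cols_to_string": '"f0"',
        "default_double_value": "0.0",
        "outputTable": "type_test_output",
    },
)
print(command)
```

This prints the example command above on a single line.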
Parameter | Required | Description | Default value |
inputTable | Yes | The name of the input table. | None |
inputTablePartitions | No | The partitions of the input table to convert. Specify this parameter in one of the following formats: Partition_name=value, or name1=value1/name2=value2 for multi-level partitions. If you specify multiple partitions, separate them with commas (,). | All partitions |
outputTable | Yes | The output table of data type conversion. | None |
reserveOldFeat | No | Specifies whether to retain the original columns. | None |
cols_to_double | No | The feature columns whose data types need to be converted into DOUBLE. | None |
cols_to_string | No | The feature columns whose data types need to be converted into STRING. | None |
cols_to_int | No | The feature columns whose data types need to be converted into INT. | None |
default_int_value | No | The default value that is filled in when conversion of a column in cols_to_int fails. | 0 |
default_double_value | No | The default value that is filled in when conversion of a column in cols_to_double fails. | 0.0 |
default_string_value | No | The default value that is filled in when conversion of a column in cols_to_string fails. | "" |
coreNum | No | The number of cores. This parameter must be used with the memSizePerCore parameter. Valid values: [1,9999]. | Determined by the system |
memSizePerCore | No | The memory size of each core. Valid values: 1024 to 65536 (64 × 1024). Unit: MB. | Determined by the system |
lifecycle | No | The lifecycle of the output table. Unit: days. | 7 |
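The default_int_value, default_double_value, and default_string_value parameters determine what is written when a value cannot be converted. The following Python sketch approximates that behavior for a single value. It is not the component's implementation: the function name convert, the treatment of a missing (NULL) value as a failed conversion, and the parsing rules (booleans mapped to 1 and 0, doubles truncated toward zero when converted to integers) are assumptions chosen to be consistent with the example in this topic.

```python
# Hypothetical approximation of per-value conversion with a fallback default.
# The defaults match the documented default values; the parsing rules are assumptions.

DEFAULTS = {"int": 0, "double": 0.0, "string": ""}


def convert(value, target, default=None):
    """Convert value to 'int', 'double', or 'string'; return default on failure."""
    if target not in DEFAULTS:
        raise ValueError(f"unsupported target type: {target}")
    if default is None:
        default = DEFAULTS[target]
    if value is None:
        return default  # assumption: a missing value counts as a failed conversion
    if isinstance(value, bool):
        value = int(value)  # assumption: true -> 1, false -> 0, as in the example
    try:
        if target == "int":
            return int(float(value))  # assumption: truncate toward zero, e.g. 3.0 -> 3
        if target == "double":
            return float(value)
        return str(value)
    except (TypeError, ValueError, OverflowError):
        return default  # conversion failed: fall back to the configured default


print(convert(True, "double"))   # 1.0
print(convert("abc", "int"))     # 0
print(convert("abc", "double", default=-1.0))  # -1.0
```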
Examples
Test data
create table transform_test as
select * from (
    select true as f0, 2.0 as f1, 1 as f2
    union all select false as f0, 3.0 as f1, 1 as f2
    union all select false as f0, 4.0 as f1, 1 as f2
    union all select true as f0, 3.0 as f1, 1 as f2
    union all select false as f0, 3.0 as f1, 1 as f2
    union all select false as f0, 4.0 as f1, 1 as f2
    union all select true as f0, 3.0 as f1, 1 as f2
    union all select false as f0, 5.0 as f1, 1 as f2
    union all select false as f0, 3.0 as f1, 1 as f2
    union all select true as f0, 4.0 as f1, 1 as f2
    union all select false as f0, 3.0 as f1, 1 as f2
    union all select true as f0, 4.0 as f1, 1 as f2
) tmp;
Input data
f0 | f1 | f2 |
false | 3.0 | 1 |
false | 3.0 | 1 |
true | 2.0 | 1 |
true | 4.0 | 1 |
false | 4.0 | 1 |
false | 3.0 | 1 |
false | 3.0 | 1 |
true | 3.0 | 1 |
false | 4.0 | 1 |
true | 4.0 | 1 |
false | 5.0 | 1 |
true | 3.0 | 1 |
PAI command
pai -project projectxlib4 -name type_transform_v1 -DinputTable=transform_test -Dcols_to_double=f0 -Dcols_to_int=f1 -Dcols_to_string=f2 -DoutputTable=trans_test_output;
Output
Result table
f0 | f1 | f2 |
0.0 | 3 | 1 |
0.0 | 3 | 1 |
1.0 | 2 | 1 |
1.0 | 4 | 1 |
0.0 | 4 | 1 |
0.0 | 3 | 1 |
1.0 | 3 | 1 |
0.0 | 4 | 1 |
0.0 | 3 | 1 |
0.0 | 5 | 1 |
1.0 | 3 | 1 |
1.0 | 4 | 1 |
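As a cross-check, the following Python sketch applies the same three conversions (f0 to DOUBLE, f1 to INT, f2 to STRING) to the rows of transform_test and compares the outcome with the result table. It assumes the mappings visible in the example (true and false to 1.0 and 0.0, 3.0 to 3, 1 to "1") and compares rows as multisets, because the input and result tables list their rows in different orders.

```python
# Sketch that reproduces the example conversion in plain Python.
from collections import Counter

# Rows of transform_test, as created by the test-data SQL statement.
input_rows = [
    (True, 2.0, 1), (False, 3.0, 1), (False, 4.0, 1), (True, 3.0, 1),
    (False, 3.0, 1), (False, 4.0, 1), (True, 3.0, 1), (False, 5.0, 1),
    (False, 3.0, 1), (True, 4.0, 1), (False, 3.0, 1), (True, 4.0, 1),
]

# cols_to_double=f0, cols_to_int=f1, cols_to_string=f2
output_rows = [(float(f0), int(f1), str(f2)) for f0, f1, f2 in input_rows]

# The result table from this topic, transcribed row by row.
expected_rows = [
    (0.0, 3, "1"), (0.0, 3, "1"), (1.0, 2, "1"), (1.0, 4, "1"),
    (0.0, 4, "1"), (0.0, 3, "1"), (1.0, 3, "1"), (0.0, 4, "1"),
    (0.0, 3, "1"), (0.0, 5, "1"), (1.0, 3, "1"), (1.0, 4, "1"),
]

# Compare as multisets, since row order is not the same in both tables.
print(Counter(output_rows) == Counter(expected_rows))  # True
```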