Platform For AI: Data Type Conversion

Last Updated: May 17, 2024

This topic describes the Data Type Conversion component provided by Machine Learning Designer. You can use the component to convert features of any data type to the STRING, DOUBLE, or INT data type. You can also specify a default value that replaces any value that fails to convert.

Background information

  • You can convert the data types of table fields.

  • You can convert multiple table fields to different data types at the same time.

  • You can convert fields of ODPS 2.0 numeric data types, such as DECIMAL, FLOAT, and INT.

    Note

    This feature is available only in the China (Beijing), China (Shanghai), China (Hangzhou), China (Shenzhen), China (Zhangjiakou), and China (Chengdu) regions.

  • You can select whether to reserve original columns.

Configure the component

You can use one of the following methods to configure the component parameters.

Method 1: Using the Machine Learning Platform for AI (PAI) console

Configure the component parameters on the pipeline page of Machine Learning Designer.

| Tab | Parameter | Description |
| --- | --- | --- |
| Fields Setting | Convert to Double Type Columns | The columns to convert to the DOUBLE data type. |
| | Default Imputed Value When Conversion Fails | The default value that is used when conversion to the DOUBLE data type fails. |
| | Convert to Int Type Columns | The columns to convert to the INT data type. |
| | Default Imputed Value When Conversion Fails | The default value that is used when conversion to the INT data type fails. |
| | Convert to String Type Columns | The columns to convert to the STRING data type. |
| | Default Imputed Value When Conversion Fails | The default value that is used when conversion to the STRING data type fails. |
| | Reserve Original Columns | Specifies whether to reserve the original columns. When original columns are reserved, the names of the converted columns are prefixed with typed_. |
| | Memory Size per Node | The memory size of each node. Valid values: 1024 to 65536 (64 × 1024). Unit: MB. |
| | Cores | The number of cores used for computing. This parameter must be used together with the Memory Size per Node parameter. Valid values: [1,9999]. |

Method 2: Using PAI commands

Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.

pai -project algo_public
    -name type_transform_v1
    -DinputTable=type_test
    -Dcols_to_string="f0"
    -Ddefault_double_value=0.0
    -DoutputTable=type_test_output;

| Parameter | Required | Description | Default value |
| --- | --- | --- | --- |
| inputTable | Yes | The name of the input table. | None |
| inputTablePartitions | No | The partitions selected from the input table for training. Specify this parameter in one of the following formats: Partition_name=value, or name1=value1/name2=value2 for multi-level partitions. If you specify multiple partitions, separate them with commas (,). | All partitions |
| outputTable | Yes | The output table of data type conversion. | None |
| reserveOldFeat | No | Specifies whether to reserve original columns. | None |
| cols_to_double | No | The feature columns to convert to the DOUBLE data type. | None |
| cols_to_string | No | The feature columns to convert to the STRING data type. | None |
| cols_to_int | No | The feature columns to convert to the INT data type. | None |
| default_int_value | No | The default value that is used when conversion to the INT data type fails. | 0 |
| default_double_value | No | The default value that is used when conversion to the DOUBLE data type fails. | 0.0 |
| default_string_value | No | The default value that is used when conversion to the STRING data type fails. | "" |
| coreNum | No | The number of cores. This parameter must be used together with the memSizePerCore parameter. Valid values: [1,9999]. | Determined by the system |
| memSizePerCore | No | The memory size of each core. Valid values: 1024 to 65536 (64 × 1024). Unit: MB. | Determined by the system |
| lifecycle | No | The lifecycle of the output table. | 7 |
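
If you generate the PAI command from a script, the constraints in the table above can be checked before the command is submitted. The following Python sketch is one possible way to do that. The function build_type_transform_command is hypothetical and not part of any PAI SDK, and joining multiple column names with commas is an assumption that the table above does not confirm.

```python
from typing import List, Optional, Sequence


def build_type_transform_command(
    input_table: str,
    output_table: str,
    *,
    project: str = "algo_public",
    cols_to_double: Sequence[str] = (),
    cols_to_int: Sequence[str] = (),
    cols_to_string: Sequence[str] = (),
    core_num: Optional[int] = None,
    mem_size_per_core: Optional[int] = None,
) -> str:
    """Hypothetical helper: return a type_transform_v1 command as text."""
    # coreNum and memSizePerCore must be used together.
    if (core_num is None) != (mem_size_per_core is None):
        raise ValueError("coreNum and memSizePerCore must be set together")
    if core_num is not None and not 1 <= core_num <= 9999:
        raise ValueError("coreNum must be in [1, 9999]")
    if mem_size_per_core is not None and not 1024 <= mem_size_per_core <= 65536:
        raise ValueError("memSizePerCore must be in [1024, 65536] MB")

    args: List[str] = ["-name type_transform_v1", f"-DinputTable={input_table}"]
    for key, cols in (
        ("cols_to_double", cols_to_double),
        ("cols_to_int", cols_to_int),
        ("cols_to_string", cols_to_string),
    ):
        if cols:
            # Assumption: multiple column names are separated by commas.
            args.append(f"-D{key}={','.join(cols)}")
    if core_num is not None:
        args.append(f"-DcoreNum={core_num}")
        args.append(f"-DmemSizePerCore={mem_size_per_core}")
    args.append(f"-DoutputTable={output_table}")

    return f"pai -project {project}\n" + "\n".join("    " + a for a in args) + ";"
```

The returned text follows the layout of the sample command in this section and can be pasted into the SQL Script component. The helper only formats and validates text; it does not run the command.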

Examples

  • Test data

    create table transform_test as
    select * from
    (
    select true as f0,2.0 as f1,1 as f2 union all
    select false as f0,3.0 as f1,1 as f2 union all
    select false as f0,4.0 as f1,1 as f2 union all
    select true as f0,3.0 as f1,1 as f2 union all
    select false as f0,3.0 as f1,1 as f2 union all
    select false as f0,4.0 as f1,1 as f2 union all
    select true as f0,3.0 as f1,1 as f2 union all
    select false as f0,5.0 as f1,1 as f2 union all
    select false as f0,3.0 as f1,1 as f2 union all
    select true as f0,4.0 as f1,1 as f2 union all
    select false as f0,3.0 as f1,1 as f2 union all
    select true as f0,4.0 as f1,1 as f2
    )tmp;
  • Training data

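    | f0 | f1 | f2 |
    | --- | --- | --- |
    | false | 3.0 | 1 |
    | false | 3.0 | 1 |
    | true | 2.0 | 1 |
    | true | 4.0 | 1 |
    | false | 4.0 | 1 |
    | false | 3.0 | 1 |
    | false | 3.0 | 1 |
    | true | 3.0 | 1 |
    | false | 4.0 | 1 |
    | true | 4.0 | 1 |
    | false | 5.0 | 1 |
    | true | 3.0 | 1 |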

  • PAI command for training

    pai -project projectxlib4
        -name type_transform_v1
        -DinputTable=transform_test
        -Dcols_to_double=f0
        -Dcols_to_int=f1
        -Dcols_to_string=f2
        -DoutputTable=trans_test_output;
  • Output

    Result table

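    | f0 | f1 | f2 |
    | --- | --- | --- |
    | 0.0 | 3 | 1 |
    | 0.0 | 3 | 1 |
    | 1.0 | 2 | 1 |
    | 1.0 | 4 | 1 |
    | 0.0 | 4 | 1 |
    | 0.0 | 3 | 1 |
    | 1.0 | 3 | 1 |
    | 0.0 | 4 | 1 |
    | 0.0 | 3 | 1 |
    | 0.0 | 5 | 1 |
    | 1.0 | 3 | 1 |
    | 1.0 | 4 | 1 |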
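
To check the result table by hand, the same three conversions can be applied to the test rows in plain Python. This is only an illustrative sketch: it uses Python's built-in casts as a stand-in for the component, and the variable names are made up for this example.

```python
from collections import Counter

# The 12 rows inserted by the create table statement: (f0 BOOLEAN, f1 DOUBLE, f2 INT).
source_rows = [
    (True, 2.0, 1), (False, 3.0, 1), (False, 4.0, 1), (True, 3.0, 1),
    (False, 3.0, 1), (False, 4.0, 1), (True, 3.0, 1), (False, 5.0, 1),
    (False, 3.0, 1), (True, 4.0, 1), (False, 3.0, 1), (True, 4.0, 1),
]

# f0 -> DOUBLE, f1 -> INT, f2 -> STRING, as in the PAI command for this example.
converted_rows = [(float(f0), int(f1), str(f2)) for f0, f1, f2 in source_rows]

# Row order in a table is not guaranteed, so compare rows as a multiset.
row_counts = Counter(converted_rows)

print(converted_rows[0])  # (1.0, 2, '1')
```

Every row in row_counts corresponds to a row of the result table: true becomes 1.0, false becomes 0.0, the fractional part of f1 is dropped, and f2 is rendered as text.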