All Products
Search
Document Center

Platform For AI:Table to KV

Last Updated:May 17, 2024

This topic describes how to use the Table to KV component provided by Machine Learning Designer to convert a table in the common format into a table in the key-value format.

Limits

  • The output table does not show the null values in the input table. You can specify the columns that you want to retain in the output table. The specified columns are retained in their original formats.

  • If an input key_map table exists, the columns that are converted into the output table are the columns whose keys exist in both the key_map and key-value tables.

  • If an input key_map table exists and a data type specified in the key_map table is different from the data type of a specific column in the input table, the output key_map table uses the data type that you specified.

  • The columns that you want to convert into the key-value format in the input table must be of the BIGINT or DOUBLE data type.

Configure the Table to KV component

You can use one of the following methods to configure the Table to KV component.

Method 1: Configure the Table to KV component in a visualized manner

You can configure the parameters of the Table to KV component on the pipeline page of Machine Learning Designer of Machine Learning Platform for AI (PAI). Machine Learning Designer is formerly known as Machine Learning Studio. The following table describes the parameters.

Tab

Parameter

Description

Fields Setting

Columns to Convert

The names of the columns that you want to convert.

Reserved Columns

The names of the columns that you want to retain.

KV Delimiter

The delimiters that are used between keys and values. Colons (:) are used by default.

KV Pair Delimiter

The delimiters that are used between key-value pairs. Commas (,) are used by default.

Parameters Setting

Convert Columns to IDs

Specifies whether to convert columns into IDs. Valid values:

  • Yes

  • No

Tuning

Cores

The number of cores. The system automatically allocates cores that are used for training based on the amount of input data.

Memory Size

The memory size of each core. The system automatically allocates the memory based on the amount of input data. Unit: MB.

Method 2: Configure the Table to KV component by using PAI commands

You can configure the parameters of the Table to KV component by using PAI commands. You can use the SQL Script component to run PAI commands. For more information, see SQL Script. The following table describes the parameters.

PAI -name TableToKV
    -project algo_public
    -DinputTableName=maple_tabletokv_basic_input
    -DoutputTableName=maple_tabletokv_basic_output
    -DselectedColNames=col0,col1,col2
    -DappendColNames=rowid;

Parameter

Required

Description

Default value

inputTableName

Yes

The name of the input table.

No default value

inputTablePartitions

No

The partitions that are selected from the input table for training. You can specify this parameter in the Partition_name=value format.

If multi-level partitions exist, you can specify this parameter in the name1=value1/name2=value2; format.

Separate multiple partitions with commas (,).

All partitions selected

selectedColNames

No

The names of the columns that you select to convert. The data types of the columns that you select must be BIGINT or DOUBLE.

All columns selected

appendColNames

No

The names of the columns that you want to retain. The specified columns are retained in their original formats.

No default value

outputTableName

Yes

The name of the output table.

No default value

kvDelimiter

No

The delimiters that are used between keys and values.

Colons (:)

itemDelimiter

No

The delimiters that are used between key-value pairs.

Commas (,)

convertColToIndexId

No

Specifies whether to convert columns into IDs. Valid values:

  • 1: Columns are converted into IDs.

  • 0: Columns are not converted into IDs.

0

inputKeyMapTableName

No

The name of the input index table.

This parameter takes effect only when the convertColToIndexId parameter is set to 1. If this parameter is not specified, the system automatically calculates a set of IDs.

null

outputKeyMapTableName

Determined based on convertColToIndexId

The name of the output index table. This parameter is required only when the convertColToIndexId parameter is set to 1.

No default value

lifecycle

No

The lifecycle of the output table. The value of this parameter must be a positive integer.

No default value

coreNum

No

The number of cores. The value of this parameter must be a positive integer. Valid values: [1,9999]. This parameter must be used together with the memSizePerCore parameter.

Determined by the system

memSizePerCore

No

The memory size of each core. Unit: MB. The value of this parameter must be a positive integer. Valid values: [1024,64 × 1024].

Determined by the system

Example

  • Input data

    drop table if exists test;
    create table test as
    select * from
        (
            select 0 as rowid, 1 as col0, 1.1 as col1, 2 as col2 union all
            select 1 as rowid, 0 as col0, 1.2 as col1, 3 as col2 union all
            select 2 as rowid, 1 as col0, 2.3 as col1, 4 as col2 union all
            select 3 as rowid, 1 as col0, 0.0 as col1, 5 as col2
        ) tmp;
  • PAI commands

    PAI -name TableToKV
        -project algo_public
        -DinputTableName=test
        -DoutputTableName=test_output
        -DselectedColNames=col0,col1,col2
        -DconvertColToIndexId=1
        -DoutputKeyMapTableName=test_key_map
        -DappendColNames=rowid;
  • Output tables

    Output table test_output

    rowid

    kv

    0

    0:1,1:1.1,2:2

    1

    0:0,1:1.2,2:3

    2

    0:1,1:2.3,2:4

    3

    0:1,1:0,2:5

    Output table test_key_map

    col_name

    col_index

    col_datatype

    col0

    0

    bigint

    col1

    1

    double

    col2

    2

    bigint