All Products
Search
Document Center

Platform For AI:Convert Row, Column, and Value to KV Pair

Last Updated:May 17, 2024

This topic describes the Convert Row, Column, and Value to KV Pair component provided by Machine Learning Designer (formerly known as Machine Learning Studio).

The component converts a trituple table (row,col,value) to a key-value table (row,[col_id:value]).

The data type of the trituple table (row,col,value) is "XXD" or "XXL". "X" represents any data type, "D" represents the DOUBLE data type, and "L" represents the BIGINT data type. In the key-value table generated after conversion, the data types of row and value are the same as those of the original input data. The generated index table of col maps to the col_id column. The data type of the col_id column is BIGINT.

The following tables provide a conversion example.

  • The following table is a trituple table.

    id

    word

    count

    01

    a

    10

    01

    b

    20

    01

    c

    30

  • The following table is the generated key-value table.

    id

    key_value

    01

    1:10;2:20;3:30

    Note

    You can customize the delimiter between a key and a value, as well as between two key-value pairs in the key_value column.

  • The following table is the generated index table.

    key

    key_id

    a

    1

    b

    2

    c

    3

You can configure the component by using the Machine Learning Platform for AI (PAI) console or a PAI command.

Configure the component

You can use one of the following methods to configure the Convert Row, Column, and Value to KV Pair component.

Method 1: Configure the component on the pipeline page

You can configure the parameters of the Row, Column, and Value to KV Pair component on the pipeline page of Machine Learning Designer of Machine Learning Platform for AI (PAI). Machine Learning Designer is formerly known as Machine Learning Studio. The following table describes the parameters.

Tab

Parameter

Description

Fields Setting

Columns Reserved During KV Conversion

The name of the column that remains unchanged after a trituple table is converted to a key-value table.

Output Keys

The keys in the key-value table.

Output Values

The values in the key-value table.

Key Column in Input Index Table

The name of the key column in the index table.

Index ID Column of Key in Input Index Table

The name of the index column in the index table.

KV Delimiter

The delimiter used to separate a key and a value in the key-value table. The default delimiter is a colon (:).

KV Pair Delimiter

The delimiter used to separate key-value pairs. Default value: comma (,).

Tuning

Total Number of Instances

The total number of instances. The value must be a positive integer. By default, the system calculates the value based on the amount of input data.

Memory Size (MB)

The total memory. The value must be a positive integer. By default, the system calculates the value based on the amount of input data.

Method 2: Use PAI commands

Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.

PAI -name triple_to_kv
    -project algo_public
    -DinputTableName=test_data
    -DoutputTableName=test_kv_out
    -DindexOutputTableName=test_index_out
    -DidColName=id
    -DkeyColName=word
    -DvalueColName=count
    -DinputTablePartitions=ds=test1
    -DindexInputTableName=test_index_input
    -DindexInputKeyColName=word
    -DindexInputKeyIdColName=word_id
    -DkvDelimiter=:
    -DpairDelimiter=;
    -Dlifecycle=3

Parameter

Required

Description

Default value

inputTableName

Yes

The name of the input table.

No default value

idColName

Yes

The name of the column that remains unchanged after a trituple table is converted to a key-value table.

No default value

keyColName

Yes

The name of the column that lists keys in the key-value table.

No default value

valueColName

Yes

The name of the column that lists values in the key-value table.

No default value

outputTableName

Yes

The name of the generated key-value table.

No default value

indexOutputTableName

Yes

The name of the generated index table.

No default value

indexInputTableName

No

The name of the existing index table. The table must contain data.

No default value

indexInputKeyColName

No

The name of the key column in the index table. This parameter is required if you set the indexInputTableName parameter.

No default value

indexInputKeyIdColName

No

The name of the index column in the index table. This parameter is required if you set the indexInputTableName parameter.

No default value

inputTablePartitions

No

The partition names of the input table. You can enter only one partition name.

No default value

kvDelimiter

No

The delimiter used to separate a key and a value in the key-value table.

Colons (:)

pairDelimiter

No

The delimiter used to separate key-value pairs.

Comma (,)

lifecycle

No

The lifecycle of the output table.

No default value

coreNum

No

The total number of instances. The value must be a positive integer.

Automatically calculated based on the amount of input data

memSizePerCore

No

The total memory. The value must be a positive integer.

Automatically calculated based on the amount of input data

Examples

  • Input

    drop table if exists triple2kv_test_input;
    create table triple2kv_test_input as
    select
      *
    from
    (
      select '01' as id, 'a' as word, 10 as count
        union all
          select '01' as id, 'b' as word, 20 as count
        union all
          select '01' as id, 'c' as word, 30 as count
        union all
          select '02' as id, 'a' as word, 100 as count
        union all
          select '02' as id, 'd' as word, 200 as count
        union all
          select '02' as id, 'e' as word, 300 as count
    ) tmp;
  • PAI command

    PAI -name triple_to_kv
        -project algo_public
        -DinputTableName=triple2kv_test_input
        -DoutputTableName=triple2kv_test_input_out
        -DindexOutputTableName=triple2kv_test_input_index_out
        -DidColName=id
        -DkeyColName=word
        -DvalueColName=count
        -Dlifecycle=1;
  • Output

    • Key-value table triple2kv_test_input_out

      +------------+------------+
      | id         | key_value  |
      +------------+------------+
      | 02         | 1:100;4:200;5:300 |
      | 01         | 1:10;2:20;3:30 |
      +------------+------------+
    • Index table triple2kv_test_input_index_out

      +------------+------------+
      | key        | key_id     |
      +------------+------------+
      | a          | 1          |
      | b          | 2          |
      | c          | 3          |
      | d          | 4          |
      | e          | 5          |
      +------------+------------+