All Products
Search
Document Center

Platform For AI:KV2Table

Last Updated:Oct 18, 2024

You can use the KV to Table component to convert a table in the key-value format into a table in a common format. The key is converted into a column name of the table, and the value is converted into the value of the column that corresponds to the row.

Background information

Definition of the key-value table format: The key represents the column name index, and the value supports data of the BIGINT or DOUBLE type. You can import a custom key_map table into this component. The key_map table contains the mappings of column names and keys. Regardless of whether a key_map table is imported, this component exports a key_map table to record the mappings of column names and keys after conversion, such as 1:10, 2:20, and 3:30.

Definition of the key_map table format: The key_map table format contains the mappings of column names and indexes and data type information, which correspond to the col_name, col_index, and col_datatype parameters. The values of the three parameters must be of the STRING data type. If you do not specify the col_datatype parameter, the default value of this parameter is double.

col_name

col_index

col_datatype

col1

1

bigint

col2

2

double

Component parameters

You can use one of the following methods to configure the KV to Table component:

Method 1: Configure the component on the pipeline page

Configure the component on the pipeline page of Machine Learning Designer.

Tab

Parameter

Description

Fields Setting

KV Column

The names of the columns that contain data in the key-value format.

Appended Columns

The names of appended columns.

KV Delimiter

The delimiter used between keys and values. By default, colons (:) are used.

KV Pair Delimiter

The delimiter used between key-value pairs. By default, commas (,) are used.

Parameters Setting

Reserve the First 1,200 Columns

Specifies whether to reserve only the first 1,200 columns if the number of columns in the key_map table after conversion exceeds 1,200.

Tuning

Computing Cores

The number of cores. The system automatically allocates cores used for training based on the volume of input data.

Memory Size per Core

The memory size of each core. The system automatically allocates the memory based on the volume of input data. Unit: MB.

Method 2: Use PAI commands

Configure the component by using PAI commands. You can use the SQL Script component to run PAI commands. For more information, see SQL Script.

PAI -name KVToTable
    -project algo_public
    -DinputTableName=test
    -DoutputTableName=test_out
    -DoutputKeyMapTableName=test_keymap_out
    -DkvColName=kv;

Parameter

Required

Description

Default value

inputTableName

Yes

The name of the input table.

N/A

kvColName

Yes

The names of the columns that contain data in the key-value format.

N/A

outputTableName

Yes

The name of the output table.

N/A

outputKeyMapTableName

Yes

The name of the output index table.

N/A

inputKeyMapTableName

No

The name of the input index table.

N/A

appendColName

No

The names of appended columns.

N/A

inputTablePartitions

No

The partitions that are selected from the input table for training. Specify this parameter in one of the following formats:

  • Partition_name=value

  • name1=value1/name2=value2: multi-level partitions

Note

If you specify multiple partitions, separate them with commas (,).

All partitions

kvDelimiter

No

The delimiter used between keys and values.

Colons (:)

itemDelimiter

No

The delimiter used between key-value pairs.

Commas (,)

top1200

No

Specifies whether to reserve only the first 1,200 columns. Valid values:

  • true

  • false

true

lifecycle

No

The lifecycle of the output table.

N/A

coreNum

No

The number of cores used in computing. The value must be a positive integer.

Determined by the system

memSizePerCore

No

The memory size of each core. Unit: MB. Valid values: (100,64 × 1024).

Determined by the system

Examples

  • Generate input data

    drop table if exists test;
    create table test as
    select * from
        (
            select '1:1,2:2,3:-3.3' as kv union all
            select '1:10,2:20,3:-33.3' as kv
        ) tmp;
  • Run PAI commands

     PAI -name KVToTable
        -project algo_public
        -DinputTableName=test
        -DoutputTableName=test_out
        -DoutputKeyMapTableName=test_keymap_out
        -DkvColName=kv;
  • Output description

    • Output table

      +------------+------------+------------+
      | kv_1       | kv_2       | kv_3       |
      +------------+------------+------------+
      | 1.0        | 2.0        | -3.3       |
      | 10.0       | 20.0       | -33.3      |
      +------------+------------+------------+
    • Output mapping table

      +------------+------------+------------+
      | col_name   | col_index  | col_type   |
      +------------+------------+------------+
      | kv_1       | 1          | double     |
      | kv_2       | 2          | double     |
      | kv_3       | 3          | double     |
      +------------+------------+------------+

Algorithm scale

The converted columns include the appended columns and the columns converted by using the KV to Table component. The columns converted by using the KV to Table component are exported before the appended columns. If the number of converted columns exceeds the maximum number of columns of the key_map table, and the top1200 parameter is set to true, the maximum number of columns is exported. Otherwise, an error is reported. A maximum of 1,200 columns can be exported.

The number of data records cannot exceed 100 million.

FAQ

  • Q: Which columns are converted if the input includes a key_map table?

    A: The converted columns are the columns whose keys exist in both the key_map and key-value tables.

  • Q: What is the data type of the key column after conversion if the input includes a key_map table?

    A: The data type of the converted key column is the same as that of the key_map table. If the data type of the key_map table is not specified, the data type of the converted key column is DOUBLE.

  • Q: What is the naming convention for the converted key column if a key_map table is imported?

    A: The naming convention is Name of the column in the key:value format + "" + Key.

    The following characters are not supported:

    %&()*+-./;<>=?

  • Q: Why are column names conflict with each other?

    A: If you specify an appended column and the name of the appended column is the same as that of the converted key column, an error is reported.

  • Q: Which types of columns can be converted?

    A: Only columns of the numeric data types can be converted.

  • Q: What would happen if the length of the column name exceeds 128 characters?

    A: Only the first 128 characters of the column name are reserved.

  • Q: What do I do if a row contains duplicate keys?

    A: Calculate the sum of the values that correspond to these keys.