This topic describes how to use the Table to KV component provided by Machine Learning Designer to convert a table in the common format into a table in the key-value format.
Limits
The output table does not show the null values in the input table. You can specify the columns that you want to retain in the output table. The specified columns are retained in their original formats.
If an input key_map table exists, the columns that are converted into the output table are the columns whose keys exist in both the key_map and key-value tables.
If an input key_map table exists and a data type specified in the key_map table is different from the data type of a specific column in the input table, the output key_map table uses the data type that you specified.
The columns that you want to convert into the key-value format in the input table must be of the BIGINT or DOUBLE data type.
Configure the Table to KV component
You can use one of the following methods to configure the Table to KV component.
Method 1: Configure the Table to KV component in a visualized manner
You can configure the parameters of the Table to KV component on the pipeline page of Machine Learning Designer of Machine Learning Platform for AI (PAI). Machine Learning Designer is formerly known as Machine Learning Studio. The following table describes the parameters.
Tab | Parameter | Description |
Fields Setting | Columns to Convert | The names of the columns that you want to convert. |
Reserved Columns | The names of the columns that you want to retain. | |
KV Delimiter | The delimiters that are used between keys and values. Colons (:) are used by default. | |
KV Pair Delimiter | The delimiters that are used between key-value pairs. Commas (,) are used by default. | |
Parameters Setting | Convert Columns to IDs | Specifies whether to convert columns into IDs. Valid values:
|
Tuning | Cores | The number of cores. The system automatically allocates cores that are used for training based on the amount of input data. |
Memory Size | The memory size of each core. The system automatically allocates the memory based on the amount of input data. Unit: MB. |
Method 2: Configure the Table to KV component by using PAI commands
You can configure the parameters of the Table to KV component by using PAI commands. You can use the SQL Script component to run PAI commands. For more information, see SQL Script. The following table describes the parameters.
PAI -name TableToKV
-project algo_public
-DinputTableName=maple_tabletokv_basic_input
-DoutputTableName=maple_tabletokv_basic_output
-DselectedColNames=col0,col1,col2
-DappendColNames=rowid;
Parameter | Required | Description | Default value |
inputTableName | Yes | The name of the input table. | No default value |
inputTablePartitions | No | The partitions that are selected from the input table for training. You can specify this parameter in the If multi-level partitions exist, you can specify this parameter in the Separate multiple partitions with commas (,). | All partitions selected |
selectedColNames | No | The names of the columns that you select to convert. The data types of the columns that you select must be BIGINT or DOUBLE. | All columns selected |
appendColNames | No | The names of the columns that you want to retain. The specified columns are retained in their original formats. | No default value |
outputTableName | Yes | The name of the output table. | No default value |
kvDelimiter | No | The delimiters that are used between keys and values. | Colons (:) |
itemDelimiter | No | The delimiters that are used between key-value pairs. | Commas (,) |
convertColToIndexId | No | Specifies whether to convert columns into IDs. Valid values:
| 0 |
inputKeyMapTableName | No | The name of the input index table. This parameter takes effect only when the | null |
outputKeyMapTableName | Determined based on convertColToIndexId | The name of the output index table. This parameter is required only when the convertColToIndexId parameter is set to 1. | No default value |
lifecycle | No | The lifecycle of the output table. The value of this parameter must be a positive integer. | No default value |
coreNum | No | The number of cores. The value of this parameter must be a positive integer. Valid values: [1,9999]. This parameter must be used together with the memSizePerCore parameter. | Determined by the system |
memSizePerCore | No | The memory size of each core. Unit: MB. The value of this parameter must be a positive integer. Valid values: [1024,64 × 1024]. | Determined by the system |
Example
Input data
drop table if exists test; create table test as select * from ( select 0 as rowid, 1 as col0, 1.1 as col1, 2 as col2 union all select 1 as rowid, 0 as col0, 1.2 as col1, 3 as col2 union all select 2 as rowid, 1 as col0, 2.3 as col1, 4 as col2 union all select 3 as rowid, 1 as col0, 0.0 as col1, 5 as col2 ) tmp;
PAI commands
PAI -name TableToKV -project algo_public -DinputTableName=test -DoutputTableName=test_output -DselectedColNames=col0,col1,col2 -DconvertColToIndexId=1 -DoutputKeyMapTableName=test_key_map -DappendColNames=rowid;
Output tables
Output table test_output
rowid
kv
0
0:1,1:1.1,2:2
1
0:0,1:1.2,2:3
2
0:1,1:2.3,2:4
3
0:1,1:0,2:5
Output table test_key_map
col_name
col_index
col_datatype
col0
0
bigint
col1
1
double
col2
2
bigint