This topic describes the Convert Row, Column, and Value to KV Pair component provided by Machine Learning Designer (formerly known as Machine Learning Studio).
The component converts a trituple table (row,col,value) to a key-value table (row,[col_id:value]).
The data type of the trituple table (row,col,value) is "XXD" or "XXL". "X" represents any data type, "D" represents the DOUBLE data type, and "L" represents the BIGINT data type. In the key-value table generated after conversion, the data types of row and value are the same as those of the original input data. The generated index table of col maps to the col_id column. The data type of the col_id column is BIGINT.
The following tables provide a conversion example.
The following table is a trituple table.
id
word
count
01
a
10
01
b
20
01
c
30
The following table is the generated key-value table.
id
key_value
01
1:10;2:20;3:30
NoteYou can customize the delimiter between a key and a value, as well as between two key-value pairs in the key_value column.
The following table is the generated index table.
key
key_id
a
1
b
2
c
3
You can configure the component by using the Machine Learning Platform for AI (PAI) console or a PAI command.
Configure the component
You can use one of the following methods to configure the Convert Row, Column, and Value to KV Pair component.
Method 1: Configure the component on the pipeline page
You can configure the parameters of the Row, Column, and Value to KV Pair component on the pipeline page of Machine Learning Designer of Machine Learning Platform for AI (PAI). Machine Learning Designer is formerly known as Machine Learning Studio. The following table describes the parameters.
Tab | Parameter | Description |
Fields Setting | Columns Reserved During KV Conversion | The name of the column that remains unchanged after a trituple table is converted to a key-value table. |
Output Keys | The keys in the key-value table. | |
Output Values | The values in the key-value table. | |
Key Column in Input Index Table | The name of the key column in the index table. | |
Index ID Column of Key in Input Index Table | The name of the index column in the index table. | |
KV Delimiter | The delimiter used to separate a key and a value in the key-value table. The default delimiter is a colon (:). | |
KV Pair Delimiter | The delimiter used to separate key-value pairs. Default value: comma (,). | |
Tuning | Total Number of Instances | The total number of instances. The value must be a positive integer. By default, the system calculates the value based on the amount of input data. |
Memory Size (MB) | The total memory. The value must be a positive integer. By default, the system calculates the value based on the amount of input data. |
Method 2: Use PAI commands
Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.
PAI -name triple_to_kv
-project algo_public
-DinputTableName=test_data
-DoutputTableName=test_kv_out
-DindexOutputTableName=test_index_out
-DidColName=id
-DkeyColName=word
-DvalueColName=count
-DinputTablePartitions=ds=test1
-DindexInputTableName=test_index_input
-DindexInputKeyColName=word
-DindexInputKeyIdColName=word_id
-DkvDelimiter=:
-DpairDelimiter=;
-Dlifecycle=3
Parameter | Required | Description | Default value |
inputTableName | Yes | The name of the input table. | No default value |
idColName | Yes | The name of the column that remains unchanged after a trituple table is converted to a key-value table. | No default value |
keyColName | Yes | The name of the column that lists keys in the key-value table. | No default value |
valueColName | Yes | The name of the column that lists values in the key-value table. | No default value |
outputTableName | Yes | The name of the generated key-value table. | No default value |
indexOutputTableName | Yes | The name of the generated index table. | No default value |
indexInputTableName | No | The name of the existing index table. The table must contain data. | No default value |
indexInputKeyColName | No | The name of the key column in the index table. This parameter is required if you set the indexInputTableName parameter. | No default value |
indexInputKeyIdColName | No | The name of the index column in the index table. This parameter is required if you set the indexInputTableName parameter. | No default value |
inputTablePartitions | No | The partition names of the input table. You can enter only one partition name. | No default value |
kvDelimiter | No | The delimiter used to separate a key and a value in the key-value table. | Colons (:) |
pairDelimiter | No | The delimiter used to separate key-value pairs. | Comma (,) |
lifecycle | No | The lifecycle of the output table. | No default value |
coreNum | No | The total number of instances. The value must be a positive integer. | Automatically calculated based on the amount of input data |
memSizePerCore | No | The total memory. The value must be a positive integer. | Automatically calculated based on the amount of input data |
Examples
Input
drop table if exists triple2kv_test_input; create table triple2kv_test_input as select * from ( select '01' as id, 'a' as word, 10 as count union all select '01' as id, 'b' as word, 20 as count union all select '01' as id, 'c' as word, 30 as count union all select '02' as id, 'a' as word, 100 as count union all select '02' as id, 'd' as word, 200 as count union all select '02' as id, 'e' as word, 300 as count ) tmp;
PAI command
PAI -name triple_to_kv -project algo_public -DinputTableName=triple2kv_test_input -DoutputTableName=triple2kv_test_input_out -DindexOutputTableName=triple2kv_test_input_index_out -DidColName=id -DkeyColName=word -DvalueColName=count -Dlifecycle=1;
Output
Key-value table triple2kv_test_input_out
+------------+------------+ | id | key_value | +------------+------------+ | 02 | 1:100;4:200;5:300 | | 01 | 1:10;2:20;3:30 | +------------+------------+
Index table triple2kv_test_input_index_out
+------------+------------+ | key | key_id | +------------+------------+ | a | 1 | | b | 2 | | c | 3 | | d | 4 | | e | 5 | +------------+------------+