You can use the KV to Table component to convert a table in the key-value format into a table in a common format. The key is converted into a column name of the table, and the value is converted into the value of the column that corresponds to the row.
Background information
Definition of the key-value table format: The key represents the column name index, and the value supports data of the BIGINT or DOUBLE type. You can import a custom key_map table into this component. The key_map table contains the mappings of column names and keys. Regardless of whether a key_map table is imported, this component exports a key_map table to record the mappings of column names and keys after conversion, such as 1:10, 2:20, and 3:30.
Definition of the key_map table format: The key_map table format contains the mappings of column names and indexes and data type information, which correspond to the col_name, col_index, and col_datatype parameters. The values of the three parameters must be of the STRING data type. If you do not specify the col_datatype parameter, the default value of this parameter is double.
col_name | col_index | col_datatype |
col1 | 1 | bigint |
col2 | 2 | double |
Component parameters
You can use one of the following methods to configure the KV to Table component:
Method 1: Configure the component on the pipeline page
Configure the component on the pipeline page of Machine Learning Designer.
Tab | Parameter | Description |
Fields Setting | KV Column | The names of the columns that contain data in the key-value format. |
Appended Columns | The names of appended columns. | |
KV Delimiter | The delimiter used between keys and values. By default, colons (:) are used. | |
KV Pair Delimiter | The delimiter used between key-value pairs. By default, commas (,) are used. | |
Parameters Setting | Reserve the First 1,200 Columns | Specifies whether to reserve only the first 1,200 columns if the number of columns in the key_map table after conversion exceeds 1,200. |
Tuning | Computing Cores | The number of cores. The system automatically allocates cores used for training based on the volume of input data. |
Memory Size per Core | The memory size of each core. The system automatically allocates the memory based on the volume of input data. Unit: MB. |
Method 2: Use PAI commands
Configure the component by using PAI commands. You can use the SQL Script component to run PAI commands. For more information, see SQL Script.
PAI -name KVToTable
-project algo_public
-DinputTableName=test
-DoutputTableName=test_out
-DoutputKeyMapTableName=test_keymap_out
-DkvColName=kv;
Parameter | Required | Description | Default value |
inputTableName | Yes | The name of the input table. | N/A |
kvColName | Yes | The names of the columns that contain data in the key-value format. | N/A |
outputTableName | Yes | The name of the output table. | N/A |
outputKeyMapTableName | Yes | The name of the output index table. | N/A |
inputKeyMapTableName | No | The name of the input index table. | N/A |
appendColName | No | The names of appended columns. | N/A |
inputTablePartitions | No | The partitions that are selected from the input table for training. Specify this parameter in one of the following formats:
Note If you specify multiple partitions, separate them with commas (,). | All partitions |
kvDelimiter | No | The delimiter used between keys and values. | Colons (:) |
itemDelimiter | No | The delimiter used between key-value pairs. | Commas (,) |
top1200 | No | Specifies whether to reserve only the first 1,200 columns. Valid values:
| true |
lifecycle | No | The lifecycle of the output table. | N/A |
coreNum | No | The number of cores used in computing. The value must be a positive integer. | Determined by the system |
memSizePerCore | No | The memory size of each core. Unit: MB. Valid values: (100,64 × 1024). | Determined by the system |
Examples
Generate input data
drop table if exists test; create table test as select * from ( select '1:1,2:2,3:-3.3' as kv union all select '1:10,2:20,3:-33.3' as kv ) tmp;
Run PAI commands
PAI -name KVToTable -project algo_public -DinputTableName=test -DoutputTableName=test_out -DoutputKeyMapTableName=test_keymap_out -DkvColName=kv;
Output description
Output table
+------------+------------+------------+ | kv_1 | kv_2 | kv_3 | +------------+------------+------------+ | 1.0 | 2.0 | -3.3 | | 10.0 | 20.0 | -33.3 | +------------+------------+------------+
Output mapping table
+------------+------------+------------+ | col_name | col_index | col_type | +------------+------------+------------+ | kv_1 | 1 | double | | kv_2 | 2 | double | | kv_3 | 3 | double | +------------+------------+------------+
Algorithm scale
The converted columns include the appended columns and the columns converted by using the KV to Table component. The columns converted by using the KV to Table component are exported before the appended columns. If the number of converted columns exceeds the maximum number of columns of the key_map table, and the top1200 parameter is set to true, the maximum number of columns is exported. Otherwise, an error is reported. A maximum of 1,200 columns can be exported.
The number of data records cannot exceed 100 million.
FAQ
Q: Which columns are converted if the input includes a key_map table?
A: The converted columns are the columns whose keys exist in both the key_map and key-value tables.
Q: What is the data type of the key column after conversion if the input includes a key_map table?
A: The data type of the converted key column is the same as that of the key_map table. If the data type of the key_map table is not specified, the data type of the converted key column is DOUBLE.
Q: What is the naming convention for the converted key column if a key_map table is imported?
A: The naming convention is Name of the column in the key:value format + "" + Key.
The following characters are not supported:
%&()*+-./;<>=?
Q: Why are column names conflict with each other?
A: If you specify an appended column and the name of the appended column is the same as that of the converted key column, an error is reported.
Q: Which types of columns can be converted?
A: Only columns of the numeric data types can be converted.
Q: What would happen if the length of the column name exceeds 128 characters?
A: Only the first 128 characters of the column name are reserved.
Q: What do I do if a row contains duplicate keys?
A: Calculate the sum of the values that correspond to these keys.