This topic describes the Add ID Column component provided by Machine Learning Studio. This component adds an ID column as the first column of a table.
Background information
The Add ID Column component supports tables that contain up to 1,000,000,000 rows and 1,023 columns.
Configure the component
You can use one of the following methods to configure the Add ID Column component.
Method 1: Configure the component on the pipeline page
Configure the component parameters on the pipeline page of Machine Learning Designer.
Tab | Parameter | Description |
---|---|---|
Parameters Setting | All Selected by Default | By default, all columns in the input table are selected. Specific columns may not be used for training. These columns do not affect the prediction result. |
Parameters Setting | ID Column | The default value of this parameter is append_id. |
Tuning | Cores | The number of cores. |
Tuning | Memory Size per Core | The memory size of each core. Unit: MB. Valid values: (1,65536). |
Method 2: Use PAI commands
Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.
PAI -name AppendId
-project algo_public
-DinputTableName=maple_test_appendid_basic_input
-DoutputTableName=maple_test_appendid_basic_output;
Parameter | Required | Description | Default value |
---|---|---|---|
inputTableName | Yes | The name of the input table. | No default value |
selectedColNames | No | The columns that are selected from the input table for training. The column names must be separated by commas (,). Columns of the INT and DOUBLE types are supported. If the input data is in the sparse format, columns of the STRING type are supported. | All columns |
inputTablePartitions | No | The partitions that are selected from the input table for training. If you specify multiple partitions, separate them with commas (,). | All partitions |
outputTableName | Yes | The name of the output table. | No default value |
IDColName | No | The name of the added ID column. | append_id |
lifecycle | No | The lifecycle of the output table. | No default value |
coreNum | No | The number of cores. | Determined by the system |
memSizePerCore | No | The memory size of each core. Unit: MB. Valid values: (1,65536). | Determined by the system |
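The way the optional parameters in the preceding table map onto `-D` flags can be sketched with a small Python helper. This is only an illustration: `build_append_id_command` is a hypothetical function, not part of PAI, and the value `row_id` is made up for the example. The helper only assembles the command text and does not submit anything.

```python
def build_append_id_command(input_table, output_table, **optional):
    """Assemble a PAI AppendId command string (hypothetical helper).

    Only builds text; it does not submit anything to PAI.
    """
    # Optional parameter names taken from the parameter table above.
    allowed = {
        "selectedColNames", "inputTablePartitions", "IDColName",
        "lifecycle", "coreNum", "memSizePerCore",
    }
    unknown = set(optional) - allowed
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")

    lines = [
        "PAI -name AppendId",
        "-project algo_public",
        f"-DinputTableName={input_table}",
        f"-DoutputTableName={output_table}",
    ]
    # Optional parameters are emitted only when the caller sets them,
    # so omitted ones fall back to the defaults listed in the table.
    for name, value in optional.items():
        if isinstance(value, (list, tuple)):
            # Column and partition lists are comma-separated.
            value = ",".join(str(v) for v in value)
        lines.append(f"-D{name}={value}")
    return "\n".join(lines) + ";"


if __name__ == "__main__":
    print(build_append_id_command(
        "maple_test_appendid_basic_input",
        "maple_test_appendid_basic_output",
        IDColName="row_id",               # hypothetical column name
        selectedColNames=["col0", "col1"],
    ))
```

The generated text can then be pasted into the SQL Script component in the same way as a hand-written command.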
Example
PAI -name AppendId
-project algo_public
-DinputTableName=maple_test_appendid_basic_input
-DoutputTableName=maple_test_appendid_basic_output;
- Input data
col0 | col1 | col2 | col3 | col4 |
---|---|---|---|---|
10 | 0.0 | aaaa | Thu Oct 01 00:00:00 CST 2015 | true |
11 | 1.0 | aaaa | Thu Oct 01 00:00:00 CST 2015 | false |
12 | 2.0 | aaaa | Thu Oct 01 00:00:00 CST 2015 | true |
13 | 3.0 | aaaa | Thu Oct 01 00:00:00 CST 2015 | true |
14 | 4.0 | aaaa | Thu Oct 01 00:00:00 CST 2015 | true |
- Output table
append_id | col0 | col1 | col2 | col3 | col4 |
---|---|---|---|---|---|
0 | 10 | 0.0 | aaaa | Thu Oct 01 00:00:00 CST 2015 | true |
1 | 11 | 1.0 | aaaa | Thu Oct 01 00:00:00 CST 2015 | false |
2 | 12 | 2.0 | aaaa | Thu Oct 01 00:00:00 CST 2015 | true |
3 | 13 | 3.0 | aaaa | Thu Oct 01 00:00:00 CST 2015 | true |
4 | 14 | 4.0 | aaaa | Thu Oct 01 00:00:00 CST 2015 | true |
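The transformation in this example can be mimicked locally with a few lines of Python. This is a minimal sketch, not the component's implementation: `add_id_column` is a hypothetical function, and it assumes that IDs are sequential integers starting from 0 in input order, which matches the sample output above. How the component assigns IDs on large distributed tables is not stated here.

```python
def add_id_column(header, rows, id_col_name="append_id"):
    """Return a new (header, rows) pair with an ID column placed first.

    Assumption: IDs are 0-based and follow the input row order,
    as in the sample output of this topic.
    """
    new_header = [id_col_name] + list(header)
    new_rows = [[row_id] + list(row) for row_id, row in enumerate(rows)]
    return new_header, new_rows


if __name__ == "__main__":
    header = ["col0", "col1", "col2", "col3", "col4"]
    timestamp = "Thu Oct 01 00:00:00 CST 2015"
    rows = [
        [10, 0.0, "aaaa", timestamp, True],
        [11, 1.0, "aaaa", timestamp, False],
        [12, 2.0, "aaaa", timestamp, True],
        [13, 3.0, "aaaa", timestamp, True],
        [14, 4.0, "aaaa", timestamp, True],
    ]
    out_header, out_rows = add_id_column(header, rows)
    print(out_header)
    for row in out_rows:
        print(row)
```

The input lists are left unchanged; the function builds new lists so that the original table can still be compared against the result.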