The Random Sampling component draws a random sample from the input data. You can specify either the number of samples or the sampling proportion. Each sample is drawn independently of the others.
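The behavior described above can be illustrated with a minimal Python sketch. This is not the component's implementation. The function name `random_sample` and its arguments are hypothetical. The sketch also assumes that exactly one of the size or the proportion is given, and that a proportion is converted to a row count by rounding.

```python
import random


def random_sample(rows, sample_size=None, sample_ratio=None, replace=False, seed=None):
    """Illustrative stand-in for sampling by size or by proportion (hypothetical helper)."""
    rows = list(rows)
    # Assumption: exactly one of sample_size / sample_ratio is supplied.
    if (sample_size is None) == (sample_ratio is None):
        raise ValueError("specify exactly one of sample_size or sample_ratio")

    if sample_size is not None:
        if sample_size <= 0:
            raise ValueError("sample_size must be a positive integer")
        k = sample_size
    else:
        if not 0 < sample_ratio < 1:
            raise ValueError("sample_ratio must be in the open interval (0, 1)")
        # Assumption: the proportion is turned into a count by rounding.
        k = round(len(rows) * sample_ratio)

    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    if replace:
        # With replacement: the same row may be drawn more than once.
        return rng.choices(rows, k=k)
    if k > len(rows):
        raise ValueError("without replacement, k cannot exceed the number of rows")
    # Without replacement: every drawn row is distinct.
    return rng.sample(rows, k)
```

For example, `random_sample(range(100), sample_ratio=0.25, seed=1)` returns 25 distinct values, while `random_sample([1, 2, 3], sample_size=10, replace=True)` returns 10 values drawn from the three inputs.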
Configure the component
You can use one of the following methods to configure the Random Sampling component.
Method 1: Configure the component on the pipeline page
Configure the component parameters on the pipeline page of Machine Learning Designer.
Tab | Parameter | Description |
Parameters Setting | Sample Size | The value must be a positive integer. |
Parameters Setting | Sampling Fraction | The value must be a floating-point number. Valid values: (0,1). |
Parameters Setting | Sampling with Replacement | By default, this check box is not selected. If you select this check box, sampling with replacement is enabled. |
Parameters Setting | Random Seed | By default, the system determines the value. |
Tuning | Cores | The value must be a positive integer. By default, the system determines the value. |
Tuning | Memory Size per Core | The value must be a positive integer. Unit: MB. Valid values: (1,65536). By default, the system determines the value. |
Method 2: Use PAI commands
Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.
PAI -name RandomSample
-project algo_public
-Dlifecycle="28"
-DoutputTableName="test2"
-Dreplace="false"
-DsampleSize="500"
-DinputPartitions="pt=20150501"
-DinputTableName="bank_data_partition";
Parameter | Required | Description | Default value |
inputTableName | Yes | The name of the input table. | No default value |
inputTablePartitions | No | The partitions that are selected from the input table for sampling, for example pt=20150501. Note: Separate multiple partitions with commas (,). | No default value |
outputTableName | Yes | The name of the output table. | No default value |
sampleSize | No | The number of samples. The value must be a positive integer. | No default value |
sampleRatio | No | The sampling proportion. The value must be a floating-point number. Valid values: (0,1). | No default value |
replace | No | Specifies whether to enable sampling with replacement. The value must be of the BOOLEAN type. | false |
randomSeed | No | The random seed. The value must be a positive integer. | Determined by the system |
lifecycle | No | The lifecycle of the output table. Valid values: [1,3650]. | No default value |
coreNum | No | The number of cores used in computing. The value must be a positive integer. | Determined by the system |
memSizePerCore | No | The memory size of each core. Valid values: (1,65536). Unit: MB. | Determined by the system |
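The constraints in the parameter table can be checked before a command is submitted. The following Python sketch assembles a command string in the same shape as the example above and rejects values outside the documented ranges. The function `build_random_sample_command` is a hypothetical helper, not part of PAI, and it covers only a subset of the parameters.

```python
def build_random_sample_command(input_table, output_table, sample_size=None,
                                sample_ratio=None, replace=False, lifecycle=None,
                                core_num=None, mem_size_per_core=None):
    """Build a RandomSample PAI command string (hypothetical helper)."""
    params = {"inputTableName": input_table, "outputTableName": output_table}

    if sample_size is not None:
        # sampleSize: positive integer.
        if not isinstance(sample_size, int) or sample_size <= 0:
            raise ValueError("sampleSize must be a positive integer")
        params["sampleSize"] = sample_size
    if sample_ratio is not None:
        # sampleRatio: open interval (0,1).
        if not 0 < sample_ratio < 1:
            raise ValueError("sampleRatio must be in (0,1)")
        params["sampleRatio"] = sample_ratio
    if lifecycle is not None:
        # lifecycle: closed interval [1,3650].
        if not 1 <= lifecycle <= 3650:
            raise ValueError("lifecycle must be in [1,3650]")
        params["lifecycle"] = lifecycle
    if core_num is not None:
        if not isinstance(core_num, int) or core_num <= 0:
            raise ValueError("coreNum must be a positive integer")
        params["coreNum"] = core_num
    if mem_size_per_core is not None:
        # memSizePerCore: open interval (1,65536), in MB.
        if not 1 < mem_size_per_core < 65536:
            raise ValueError("memSizePerCore must be in (1,65536)")
        params["memSizePerCore"] = mem_size_per_core

    params["replace"] = "true" if replace else "false"

    lines = ["PAI -name RandomSample", "-project algo_public"]
    lines += [f'-D{name}="{value}"' for name, value in params.items()]
    return "\n".join(lines) + ";"


if __name__ == "__main__":
    print(build_random_sample_command("bank_data_partition", "test2",
                                      sample_size=500, lifecycle=28))
```

The resulting string could then be pasted into an SQL Script component. Validating locally only catches range errors early; the component itself remains the authority on which values it accepts.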