This topic describes the Covariance component provided by Machine Learning Designer (formerly known as Machine Learning Studio).
In probability theory and statistics, covariance measures the joint variability of two random variables. Variance is the special case of covariance in which the two variables are the same. If the expected values are E(X) = μ and E(Y) = ν, the covariance between real-valued random variables X and Y is given by the following expression: cov(X, Y) = E((X - μ)(Y - ν)).
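The expression above can be illustrated with a minimal Python sketch. It is not the component's implementation: the function name `covariance` and the sample data are hypothetical, and the code uses the population form (dividing by n), which matches the expectation in the formula. This topic does not state whether the component divides by n or by n - 1.

```python
from statistics import fmean, pvariance


def covariance(xs, ys):
    """Population covariance E((X - mu)(Y - nu)) over paired samples.

    Illustrative helper only; the name and normalization are assumptions.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    mu, nu = fmean(xs), fmean(ys)
    # Average the products of the deviations from each mean.
    return fmean((x - mu) * (y - nu) for x, y in zip(xs, ys))


# Hypothetical sample data.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]

cov_xy = covariance(x, y)
# Variance is the special case in which both variables are the same.
var_x = covariance(x, x)
```

With this data, `covariance(x, x)` agrees with `statistics.pvariance(x)`, which reflects the statement that variance is covariance of a variable with itself.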
Configure the component
You can use one of the following methods to configure the Covariance component.
Method 1: Configure the component on the pipeline page
You can configure the parameters of the Covariance component on the pipeline page of Machine Learning Designer in Machine Learning Platform for AI (PAI). The following table describes the parameters.
Tab | Parameter | Description |
---|---|---|
Fields Setting | Input Columns | The input columns. You can select only BIGINT- or DOUBLE-type columns. |
Tuning | Cores | The number of cores used in computing. If you do not specify this parameter, the system automatically allocates the number of cores. |
Tuning | Memory Size | The memory size of each core. If you do not specify this parameter, the system automatically allocates the memory size. Unit: MB. |
Method 2: Use PAI commands
Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.
PAI -name cov
-project algo_public
-DinputTableName=maple_test_cov_basic12x10_input
-DoutputTableName=maple_test_cov_basic12x10_output
-DcoreNum=6
-DmemSizePerCore=110;
Parameter | Required | Description | Default value |
---|---|---|---|
inputTableName | Yes | The name of the input table. | No default value |
inputTablePartitions | No | The partitions that are selected from the input table for the computation. Note: If you specify multiple partitions, separate them with commas (,). | All partitions of the input table |
outputTableName | Yes | The name of the output table. | No default value |
selectedColNames | No | The columns selected from the input table. | All columns |
lifecycle | No | The lifecycle of the output table. | No default value |
coreNum | No | The number of cores used in computing. The value must be a positive integer. Valid values: 1 to 9999. | Determined by the system |
memSizePerCore | No | The memory size of each core. Valid values: 1 to 65536. Unit: MB. | Determined by the system |
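The parameter table above can be turned into a small command builder. The following Python sketch is hypothetical and is not part of PAI or any SDK: the function `build_cov_command` and its argument names are assumptions made for illustration. It only assembles the command text shown in Method 2 and checks the documented ranges for coreNum and memSizePerCore. Optional values are passed through verbatim.

```python
def build_cov_command(input_table, output_table, selected_col_names=None,
                      lifecycle=None, core_num=None, mem_size_per_core=None):
    """Assemble a `PAI -name cov` command string (illustrative helper only)."""
    # Ranges taken from the parameter table: coreNum 1 to 9999,
    # memSizePerCore 1 to 65536 (MB).
    if core_num is not None and not 1 <= core_num <= 9999:
        raise ValueError("coreNum must be an integer from 1 to 9999")
    if mem_size_per_core is not None and not 1 <= mem_size_per_core <= 65536:
        raise ValueError("memSizePerCore must be from 1 to 65536 (MB)")

    parts = [
        "PAI -name cov",
        "-project algo_public",
        f"-DinputTableName={input_table}",
        f"-DoutputTableName={output_table}",
    ]
    # Optional parameters are emitted only when the caller supplies them,
    # so the system defaults apply otherwise.
    if selected_col_names is not None:
        parts.append(f"-DselectedColNames={selected_col_names}")
    if lifecycle is not None:
        parts.append(f"-Dlifecycle={lifecycle}")
    if core_num is not None:
        parts.append(f"-DcoreNum={core_num}")
    if mem_size_per_core is not None:
        parts.append(f"-DmemSizePerCore={mem_size_per_core}")
    return "\n".join(parts) + ";"


# Reproduces the sample command from Method 2.
cmd = build_cov_command(
    "maple_test_cov_basic12x10_input",
    "maple_test_cov_basic12x10_output",
    core_num=6,
    mem_size_per_core=110,
)
```

The resulting string can be pasted into an SQL Script component. Validating the ranges before submission is a design choice of this sketch, intended to catch out-of-range values early.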