Use the Ridge Regression Training component to train models

Tikhonov regularization is the most common regularization method used to deal with ill-posed problems. The Ridge Regression Training component is developed based on Tikhonov regularization. The component supports sparse and dense data and allows weighted data samples to be used for training. This topic describes how to configure the Ridge Regression Training component.

Limits

The Ridge Regression Training component runs only on the following computing resources: MaxCompute, Realtime Compute for Apache Flink, or Deep Learning Containers (DLC) of Platform for AI (PAI).

How Tikhonov regularization works

Tikhonov regularization, also known as ridge regression, is a biased estimation regression method designed for analyzing collinear data. It is essentially an improved least squares method. By giving up the unbiasedness of least squares, it obtains regression coefficients that are more realistic and reliable, and it fits ill-conditioned data better than ordinary least squares. The trade-off is some loss of information and reduced accuracy.
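
As a minimal sketch of this idea, the following pure-Python example solves the regularized normal equations (X^T X + lam * I) w = X^T y for two features without an intercept. It is not the component's implementation. The function name ridge_fit_2d and the sample data are hypothetical and are chosen only to show how a positive regularization coefficient makes a perfectly collinear problem solvable.

```python
def ridge_fit_2d(rows, targets, lam):
    """Solve (X^T X + lam * I) w = X^T y for two features, no intercept."""
    # Entries of the symmetric 2x2 matrix X^T X + lam * I.
    a = sum(x0 * x0 for x0, _ in rows) + lam
    b = sum(x0 * x1 for x0, x1 in rows)
    d = sum(x1 * x1 for _, x1 in rows) + lam
    # Right-hand side X^T y.
    r0 = sum(x0 * y for (x0, _), y in zip(rows, targets))
    r1 = sum(x1 * y for (_, x1), y in zip(rows, targets))
    det = a * d - b * b
    if det == 0:
        raise ValueError("normal equations are singular; increase lam")
    # Cramer's rule for the 2x2 system.
    return ((d * r0 - b * r1) / det, (a * r1 - b * r0) / det)


# Hypothetical, perfectly collinear data: x1 = 2 * x0 and y = x0 + x1.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
targets = [3.0, 6.0, 9.0]

try:
    ridge_fit_2d(rows, targets, 0.0)  # plain least squares
except ValueError as err:
    print("lam = 0:", err)

print("lam = 0.1:", ridge_fit_2d(rows, targets, 0.1))
```

With lam set to 0, the two collinear features make the system singular, so least squares has no unique solution. With lam set to 0.1, the solver returns weights close to (0.6, 1.2), slightly shrunk toward zero, which reflects the bias that regularization introduces.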

Configure the component in the PAI console

Input ports
Input port (from left to right)	Data type	Recommended upstream component	Required
data	N/A	Read Table, Feature engineering, Prepare and preprocess data	Yes
model	N/A	Read Table	No

Component parameters

Tab	Parameter	Description
Field Setting	labelCol	The name of the label column in the input table.
	featureCols	The feature columns that are used for training. Note: The featureCols and vectorCol parameters are mutually exclusive. Set only one of them to describe the input features of the algorithm.
	vectorCol	The name of the vector column. Note: The vectorCol and featureCols parameters are mutually exclusive. Set only one of them to describe the input features of the algorithm.
	weightCol	The name of the weight column.
Parameter Setting	lambda	The regularization coefficient. The value is of the DOUBLE type.
	epsilon	The convergence tolerance: the threshold at which the training result is considered good enough and iteration stops. Default value: 1.0E-6.
	learningRate	The parameter update speed during model training. Default value: 0.1.
	maxIter	The maximum number of iterations. Default value: 100.
	optimMethod	The optimization method used to solve the problem. Valid values: LBFGS, GD, Newton, SGD, and OWLQN.
Execution Tuning	Number of Workers	The number of cores. This parameter must be used together with the Memory per worker, unit MB parameter. The value must be a positive integer. Valid values: 1 to 9999.
	Memory per worker, unit MB	The memory size of each worker. Valid values: 1024 to 65536. Unit: MB.
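
How lambda, epsilon, learningRate, and maxIter interact can be sketched with plain batch gradient descent, which corresponds loosely to the GD option of optimMethod. This is an illustration under stated assumptions, not the component's code: the loss normalization, the stopping rule based on the change in loss, and the names ridge_gd, lam, and learning_rate are all hypothetical. The optional weights argument plays the role of weightCol.

```python
def ridge_gd(rows, targets, weights=None, lam=0.1,
             learning_rate=0.1, epsilon=1.0e-6, max_iter=100):
    """Batch gradient descent on a weighted ridge loss (no intercept).

    Assumed loss:
      sum_i s_i * (w . x_i - y_i)^2 / (2 * sum_i s_i) + lam / 2 * ||w||^2
    Returns the weight vector and the number of iterations performed.
    """
    if weights is None:
        weights = [1.0] * len(rows)
    total = sum(weights)
    w = [0.0] * len(rows[0])

    def predict(w, x):
        return sum(wj * xj for wj, xj in zip(w, x))

    def loss(w):
        sq = sum(s * (predict(w, x) - y) ** 2
                 for x, y, s in zip(rows, targets, weights))
        return sq / (2 * total) + 0.5 * lam * sum(wj * wj for wj in w)

    prev = loss(w)
    for iteration in range(1, max_iter + 1):
        # Gradient of the penalty term, then of the weighted squared error.
        grad = [lam * wj for wj in w]
        for x, y, s in zip(rows, targets, weights):
            err = predict(w, x) - y
            for j, xj in enumerate(x):
                grad[j] += s * err * xj / total
        # learning_rate controls the size of each parameter update.
        w = [wj - learning_rate * gj for wj, gj in zip(w, grad)]
        cur = loss(w)
        # Assumed stopping rule: the loss changed by less than epsilon.
        if abs(prev - cur) < epsilon:
            return w, iteration
        prev = cur
    return w, max_iter


# Hypothetical data generated from y = 2 * x0 - x1.
rows = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
targets = [2.0, -1.0, 1.0, 3.0]
w, iterations = ridge_gd(rows, targets, lam=0.1, max_iter=1000)
print(w, iterations)
```

In this sketch, a larger lam pulls the weights further away from the least squares solution (2, -1), while a smaller epsilon or a larger max_iter lets the loop run longer before it stops.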

Output ports
Output port (from left to right)	Data type	Downstream component
model	Regression model	Ridge Regression Prediction
model information	N/A	N/A
Feature importance	N/A	N/A
linear model weight	N/A	N/A

Configure the component by using code

You can copy the following code to the code editor of the PyAlink Script component. This allows the PyAlink Script component to serve the same purpose as the Ridge Regression Training component.

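from pyalink.alink import *

def main(sources, sinks, parameter):
    # Use the first input of the PyAlink Script component as the training data.
    batchData = sources[0]
    # Configure ridge regression training. The column names "f0", "f1", and
    # "label" are examples; replace them with columns from your input table.
    ridge = RidgeRegTrainBatchOp()\
        .setLambda(0.1)\
        .setFeatureCols(["f0", "f1"])\
        .setLabelCol("label")
    # Train the model and write it to the first output.
    model = batchData.link(ridge)
    model.link(sinks[0])
    # Trigger execution of the batch job.
    BatchOperator.execute()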
