Tikhonov regularization is the most commonly used regularization method for ill-posed problems. The Ridge Regression Training component is based on Tikhonov regularization. It supports sparse and dense data and allows weighted samples to be used for training. This topic describes how to configure the Ridge Regression Training component.
Limits
The Ridge Regression Training component runs only on one of the following computing resources: MaxCompute, Realtime Compute for Apache Flink, or Deep Learning Containers (DLC) of Platform for AI (PAI).
How Tikhonov regularization works
Tikhonov regularization, also known as ridge regression, is a biased-estimation regression method designed for analyzing collinear data. It is essentially an improved least squares method: it adds a penalty on the size of the regression coefficients to the least squares objective. By giving up the unbiasedness of least squares, it yields regression coefficients that are more stable and realistic, and it fits ill-conditioned data better than ordinary least squares. The cost is some loss of information and reduced accuracy.
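As a rough illustration of this trade-off, the following NumPy sketch solves the ridge objective in closed form, w = (XᵀX + λI)⁻¹Xᵀy, on two nearly collinear features. It is not the component's implementation. The function name ridge_closed_form, the synthetic data, and the choice to omit an intercept are assumptions made only for this example.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve min_w ||Xw - y||^2 + lam * ||w||^2 in closed form (no intercept)."""
    n_features = X.shape[1]
    # Adding lam * I improves the conditioning of the normal-equation matrix.
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Synthetic data (assumed for illustration): two almost identical features.
rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
x1 = x0 + 1e-6 * rng.normal(size=200)
X = np.column_stack([x0, x1])
y = x0 + x1 + 0.1 * rng.normal(size=200)

w_ols = ridge_closed_form(X, y, lam=0.0)    # ordinary least squares
w_ridge = ridge_closed_form(X, y, lam=0.1)  # ridge with lambda = 0.1

print("least squares coefficients:", w_ols)
print("ridge coefficients:", w_ridge)
```

On data like this, the unregularized solve tends to return large coefficients that offset each other, while the regularized solve stays close to 1 for both features, at the price of a small shrinkage bias.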
Configure the component in the PAI console
Input ports
| Input port (from left to right) | Data type | Recommended upstream component | Required |
| --- | --- | --- | --- |
| data | N/A | | Yes |
| model | N/A | | No |
Component parameters
| Tab | Parameter | Description |
| --- | --- | --- |
| Field Setting | labelCol | The name of the label column in the input table. |
| | featureCols | The feature columns that are used for training. Note: The featureCols and vectorCol parameters are mutually exclusive. If vectorCol is set, this parameter cannot be set. Use only one of them to describe the input features of the algorithm. |
| | vectorCol | The name of the vector column. Note: The featureCols and vectorCol parameters are mutually exclusive. If featureCols is set, this parameter cannot be set. Use only one of them to describe the input features of the algorithm. |
| | weightCol | The name of the weight column. |
| Parameter Setting | lambda | The regularization coefficient. The value is of the DOUBLE type. |
| | epsilon | The convergence tolerance. Training stops iterating once the result converges to within this value. Default value: 1.0E-6. |
| | learningRate | The rate at which parameters are updated during model training. Default value: 0.1. |
| | maxIter | The maximum number of iterations. Default value: 100. |
| | optimMethod | The optimization method used to solve the training problem. Valid values: LBFGS, GD, Newton, SGD, OWLQN. |
| Execution Tuning | Number of Workers | The number of workers. This parameter must be used together with the Memory per worker, unit MB parameter. The value must be a positive integer. Valid values: [1,9999]. |
| | Memory per worker, unit MB | The memory size of each worker. Valid values: 1024 to 65536. Unit: MB. |
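To make the roles of lambda, epsilon, learningRate, maxIter, and weightCol more concrete, the following NumPy sketch trains a weighted ridge regression with plain gradient descent. It only illustrates how such parameters typically interact. The component's actual optimizers, loss scaling, and stopping rule are not described in this topic, so the function ridge_gd, the weight-normalized loss, and the stopping test on the size of the update are all assumptions.

```python
import numpy as np

def ridge_gd(X, y, lam=0.1, learning_rate=0.1, max_iter=100,
             epsilon=1e-6, sample_weight=None):
    """Weighted ridge regression by gradient descent (illustrative sketch).

    Minimizes sum_i s_i * (x_i . w - y_i)^2 / sum_i s_i + lam * ||w||^2
    and returns the weights plus the number of iterations actually run.
    """
    n_samples, n_features = X.shape
    if sample_weight is None:
        s = np.ones(n_samples)
    else:
        s = np.asarray(sample_weight, dtype=float)
    w = np.zeros(n_features)
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        residual = X @ w - y
        # Gradient of the weighted squared loss plus the L2 penalty.
        grad = 2.0 * (X.T @ (s * residual)) / s.sum() + 2.0 * lam * w
        step = learning_rate * grad
        w = w - step
        # Assumed stopping rule: stop once the update is smaller than epsilon.
        if np.linalg.norm(step) < epsilon:
            break
    return w, n_iter

# Synthetic data (assumed for illustration).
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 2))
y = X @ np.array([2.0, -1.0]) + 0.05 * rng.normal(size=500)

w, n_iter = ridge_gd(X, y, lam=0.1, learning_rate=0.1, max_iter=100, epsilon=1e-6)
print("weights:", w, "iterations:", n_iter)
```

In this sketch, a larger lambda pulls the weights toward zero, a smaller epsilon or learning rate generally means more iterations before stopping, and maxIter caps the work regardless of convergence.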
Output ports
| Output port (from left to right) | Data type | Downstream component |
| --- | --- | --- |
| model | Regression model | |
| model information | N/A | N/A |
| Feature importance | N/A | N/A |
| linear model weight | N/A | N/A |
Configure the component by using code
You can copy the following code to the code editor of the PyAlink Script component. This allows the PyAlink Script component to serve the same purpose as the Ridge Regression Training component.
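from pyalink.alink import *

def main(sources, sinks, parameter):
    # The first input port supplies the training table.
    batchData = sources[0]
    # Configure ridge regression on features f0 and f1 with the label column "label".
    ridge = RidgeRegTrainBatchOp()\
        .setLambda(0.1)\
        .setFeatureCols(["f0","f1"])\
        .setLabelCol("label")
    model = batchData.link(ridge)
    # Send the trained model to the first output port.
    model.link(sinks[0])
    BatchOperator.execute()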