The Least Absolute Shrinkage and Selection Operator (LASSO) regression algorithm performs shrinkage estimation. The Lasso Regression Training component is built on the LASSO algorithm. It supports both sparse and dense data and allows weighted data samples to be used for training. This topic describes how to configure the Lasso Regression Training component.
Limits
You can use the Lasso Regression Training component based only on one of the following computing resources: MaxCompute, Realtime Compute for Apache Flink, or Deep Learning Containers (DLC) of Platform for AI (PAI).
How LASSO works
LASSO adds a penalty on the absolute values of the regression coefficients to obtain a more refined model. The penalty shrinks the regression coefficients and sets some of them to exactly zero. Equivalently, the sum of the absolute values of the coefficients is constrained to be less than a fixed value. This way, LASSO retains the beneficial features of subset selection and provides a biased estimator that suits data with multicollinearity.
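The shrink-to-zero behavior can be illustrated with a minimal sketch in plain Python. It assumes the common objective (1/2n) * sum((y - x·w)^2) + lam * sum(|w_j|) and solves it by cyclic coordinate descent with soft thresholding. The function names, the toy data, and the choice of solver are assumptions made for illustration; the source does not describe how the component itself solves the problem.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values inside the band become 0.0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(rows, labels, lam, sweeps=100):
    """Illustrative solver (hypothetical helper, not the component's code)."""
    n = len(rows)
    d = len(rows[0])
    weights = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            rho = 0.0   # correlation of feature j with the residual that ignores feature j
            norm = 0.0  # squared norm of feature j
            for x, y in zip(rows, labels):
                partial = sum(x[k] * weights[k] for k in range(d) if k != j)
                rho += x[j] * (y - partial)
                norm += x[j] * x[j]
            if norm == 0.0:
                continue  # a constant-zero feature carries no information
            weights[j] = soft_threshold(rho / n, lam) / (norm / n)
    return weights


# Toy data: the label is 2 * f0 + 0.1 * f1, so f1 is only weakly informative.
rows = [[1.0, 1.0], [2.0, -1.0], [3.0, 1.0], [4.0, -1.0]]
labels = [2.1, 3.9, 6.1, 7.9]

ols_weights = lasso_coordinate_descent(rows, labels, lam=0.0)    # no penalty
lasso_weights = lasso_coordinate_descent(rows, labels, lam=0.5)  # with L1 penalty
print("without penalty:", ols_weights)
print("with penalty:   ", lasso_weights)
```

With the penalty switched on, the coefficient of the weak feature f1 is set to exactly 0.0 and the coefficient of f0 is shrunk slightly below its unpenalized value, which is the selection and shrinkage effect described above.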
Configure the component in the PAI console
Input ports
| Input port (left-to-right) | Data type | Recommended upstream component | Required |
| --- | --- | --- | --- |
| data | None | | Yes |
| model | LASSO model (for incremental training) | Read Table (for reading model data), Lasso Regression Training | No |
Component parameters
| Tab | Parameter | Description |
| --- | --- | --- |
| Field Setting | labelCol | The name of the label column in the input table. |
| Field Setting | featureCols | The feature columns that are used for training. This parameter cannot be set if the vectorCol parameter is set. |
| Field Setting | vectorCol | The name of the vector column. This parameter cannot be set if the featureCols parameter is set. |
| Field Setting | weightCol | The name of the weight column. |
| Parameter Setting | lambda | The regularization coefficient. Data type: DOUBLE. |
| Parameter Setting | epsilon | The convergence threshold. Iteration stops when the change between successive iterations falls below this value. Default value: 1.0E-6. |
| Parameter Setting | LearningRate | The rate at which parameters are updated during model training. Default value: 0.1. |
| Parameter Setting | maxIter | The maximum number of iterations. Default value: 100. |
| Parameter Setting | optimMethod | The optimization method used to solve the training problem. Valid values: LBFGS, GD, Newton, SGD, and OWLQN. |
| Execution Tuning | Number of Workers | The number of workers. This parameter must be used together with the Memory per worker, unit MB parameter. The value must be a positive integer. Valid values: [1,9999]. |
| Execution Tuning | Memory per worker, unit MB | The memory size of each worker. Valid values: 1024 to 65536. Unit: MB. |

Note: The featureCols and vectorCol parameters are mutually exclusive. You can use only one of them to describe the input features of the algorithm.
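How the training parameters interact can be sketched with a small proximal gradient loop in plain Python. This is an illustration only: the function `train_lasso`, its argument names, the 1/(2 * total weight) loss scaling, and the stopping rule based on the change in the objective are assumptions, and the component's optimizers (LBFGS, OWLQN, and so on) are not reproduced here. The defaults for `epsilon`, `learning_rate`, and `max_iter` mirror the documented defaults; the `lam` default of 0.1 is an arbitrary choice for the sketch, and `sample_weights` plays the role of the weight column.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values inside the band become 0.0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def train_lasso(rows, labels, sample_weights=None, lam=0.1,
                epsilon=1.0e-6, learning_rate=0.1, max_iter=100):
    """Hypothetical trainer: weighted squared loss plus lam * sum(|w_j|)."""
    n_features = len(rows[0])
    if sample_weights is None:
        sample_weights = [1.0] * len(rows)  # unweighted samples
    total_weight = sum(sample_weights)
    weights = [0.0] * n_features

    def predict(x, w):
        return sum(xi * wi for xi, wi in zip(x, w))

    def objective(w):
        loss = sum(sw * (predict(x, w) - y) ** 2
                   for x, y, sw in zip(rows, labels, sample_weights))
        return loss / (2.0 * total_weight) + lam * sum(abs(c) for c in w)

    previous = objective(weights)
    for iteration in range(1, max_iter + 1):
        # Gradient of the weighted squared loss (the L1 term is handled below).
        gradient = [0.0] * n_features
        for x, y, sw in zip(rows, labels, sample_weights):
            error = predict(x, weights) - y
            for j in range(n_features):
                gradient[j] += sw * error * x[j] / total_weight
        # Gradient step scaled by learning_rate, then shrinkage by learning_rate * lam.
        weights = [soft_threshold(weights[j] - learning_rate * gradient[j],
                                  learning_rate * lam)
                   for j in range(n_features)]
        current = objective(weights)
        if abs(previous - current) < epsilon:
            return weights, iteration  # converged before reaching max_iter
        previous = current
    return weights, max_iter  # stopped by the iteration cap


rows = [[1.0, 1.0], [2.0, -1.0], [3.0, 1.0], [4.0, -1.0]]
labels = [2.1, 3.9, 6.1, 7.9]

coefficients, iterations = train_lasso(rows, labels, lam=0.5)
capped, capped_iterations = train_lasso(rows, labels, lam=0.5, max_iter=3)
print(coefficients, iterations)
print(capped, capped_iterations)
```

In this sketch a larger `lam` pushes more coefficients to zero, a smaller `epsilon` demands more iterations before stopping, and `max_iter` bounds the run regardless of convergence. A `learning_rate` that is too large for the data can make this kind of loop diverge, which is one reason to scale features before training.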
Output ports
| Output port | Data type | Downstream component |
| --- | --- | --- |
| model | Regression model | |
| model information | None | None |
| feature importance | None | None |
| linear model weight | None | None |
Configure the component by coding
You can copy the following code to the code editor of the PyAlink Script component. This allows the PyAlink Script component to function like the Lasso Regression Training component.
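from pyalink.alink import *

def main(sources, sinks, parameter):
    # sources[0] is the training table connected to the first input port.
    batchData = sources[0]
    # Configure LASSO training: regularization coefficient, feature columns, label column.
    lasso = LassoRegTrainBatchOp()\
        .setLambda(0.1)\
        .setFeatureCols(["f0","f1"])\
        .setLabelCol("label")
    # Train the model and write it to the first output port.
    model = batchData.link(lasso)
    model.link(sinks[0])
    BatchOperator.execute()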