Platform for AI: Lasso Regression Training

Last Updated: Aug 26, 2024

The Least Absolute Shrinkage and Selection Operator (LASSO) regression algorithm performs shrinkage estimation. The Lasso Regression Training component is built on the LASSO algorithm. The component supports sparse and dense data and allows weighted data samples to be used for training. This topic describes how to configure the Lasso Regression Training component.

Limits

You can use the Lasso Regression Training component only with one of the following computing resources: MaxCompute, Realtime Compute for Apache Flink, or Deep Learning Containers (DLC) of Platform for AI (PAI).

How LASSO works

LASSO adds a penalty function to the regression objective to obtain a more compact model. The penalty shrinks some regression coefficients and sets others exactly to zero. Equivalently, the sum of the absolute values of the coefficients is constrained to be less than a fixed value. This way, LASSO retains the benefits of subset selection and provides a biased estimator suited to data with multicollinearity.
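The shrink-to-zero behavior described above can be illustrated with a minimal pure-Python sketch. It is not the component's implementation: it assumes a squared-error loss with an L1 penalty, no intercept, no all-zero feature column, and a plain coordinate descent solver. The names soft_threshold, lasso_fit, lam, max_iter, and epsilon are hypothetical and only loosely mirror the component parameters.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values with |z| <= gamma become exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_fit(X, y, lam, max_iter=100, epsilon=1e-6):
    """Coordinate descent for (1/(2n)) * sum((y - Xw)^2) + lam * sum(|w|).

    Illustrative only: no intercept, and every feature column must be non-zero.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(max_iter):
        largest_change = 0.0
        for j in range(d):
            rho = 0.0   # correlation of feature j with the residual that ignores feature j
            norm = 0.0  # squared length of feature j
            for i in range(n):
                partial = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - partial)
                norm += X[i][j] ** 2
            new_wj = soft_threshold(rho / n, lam) / (norm / n)
            largest_change = max(largest_change, abs(new_wj - w[j]))
            w[j] = new_wj
        if largest_change < epsilon:
            break  # coefficients stopped moving, so stop early
    return w


# Toy data: the label is driven almost entirely by the first feature.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.05, -1.95, 1.95, -2.05]

unpenalized = lasso_fit(X, y, lam=0.0)
penalized = lasso_fit(X, y, lam=0.1)
print("lam=0.0:", unpenalized)
print("lam=0.1:", penalized)
```

On this toy data the unpenalized fit keeps a small weight of about 0.05 on the second feature, while the penalized fit shrinks the first weight to about 1.9 and sets the second to exactly zero.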

Configure the component in the PAI console

  • Input ports

    Input port (left-to-right) | Data type | Recommended upstream component | Required
    data | None | | Yes
    model | LASSO model (for incremental training) | Read Table (for reading model data); Lasso Regression Training | No

  • Component parameters

    Tab | Parameter | Description
    Field Setting | labelCol | The name of the label column in the input table.
    Field Setting | featureCols | The feature columns that are used for training. This parameter cannot be set if the vectorCol parameter is set.
    Field Setting | vectorCol | The name of the vector column. This parameter cannot be set if the featureCols parameter is set.
    Field Setting | weightCol | The name of the weight column.
    Parameter Setting | lambda | The regularization coefficient. The value is of the DOUBLE type.
    Parameter Setting | epsilon | The convergence threshold. Training stops iterating once the change between successive iterations falls below this value. Default value: 1.0E-6.
    Parameter Setting | LearningRate | The rate at which parameters are updated during model training. Default value: 0.1.
    Parameter Setting | maxIter | The maximum number of iterations. Default value: 100.
    Parameter Setting | optimMethod | The optimization method used to solve the training problem. Valid values: LBFGS, GD, Newton, SGD, and OWLQN.
    Execution Tuning | Number of Workers | The number of cores. This parameter must be used together with the Memory per worker, unit MB parameter. The value must be a positive integer. Valid values: 1 to 9999.
    Execution Tuning | Memory per worker, unit MB | The memory size of each worker. Valid values: 1024 to 65536. Unit: MB.

    Note

    The featureCols and vectorCol parameters are mutually exclusive. You can use only one of them to describe the input features of the algorithm.
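    The constraints stated for these parameters can be checked before a job is submitted. The following is a minimal sketch of such a check, not part of PAI or PyAlink: the helper validate_lasso_config and the dictionary keys workers and memoryPerWorkerMB are hypothetical names, and treating labelCol as mandatory is an assumption.

```python
VALID_OPTIM_METHODS = {"LBFGS", "GD", "Newton", "SGD", "OWLQN"}


def validate_lasso_config(config):
    """Return a list of problems found in a dict of component settings (hypothetical helper)."""
    problems = []
    # Assumption: a label column is always needed for regression training.
    if not config.get("labelCol"):
        problems.append("labelCol is missing")
    # featureCols and vectorCol are mutually exclusive.
    if config.get("featureCols") and config.get("vectorCol"):
        problems.append("featureCols and vectorCol are mutually exclusive")
    method = config.get("optimMethod")
    if method is not None and method not in VALID_OPTIM_METHODS:
        problems.append(f"unknown optimMethod: {method}")
    workers = config.get("workers")
    memory = config.get("memoryPerWorkerMB")
    # The two execution tuning parameters must be set together.
    if (workers is None) != (memory is None):
        problems.append("Number of Workers and Memory per worker must be set together")
    if workers is not None and not (isinstance(workers, int) and 1 <= workers <= 9999):
        problems.append("Number of Workers must be an integer from 1 to 9999")
    if memory is not None and not (1024 <= memory <= 65536):
        problems.append("Memory per worker must be from 1024 to 65536 MB")
    return problems


good = {"labelCol": "label", "featureCols": ["f0", "f1"], "optimMethod": "OWLQN",
        "workers": 4, "memoryPerWorkerMB": 8192}
bad = {"labelCol": "label", "featureCols": ["f0", "f1"], "vectorCol": "vec",
       "optimMethod": "Adam", "workers": 4}

print(validate_lasso_config(good))
for problem in validate_lasso_config(bad):
    print(problem)
```

    The helper returns a list rather than raising on the first failure, so that every misconfiguration is reported in one pass.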

  • Output ports

    Output port | Data type | Downstream component
    model | Regression model | Lasso Regression Prediction
    model information | None | None
    feature importance | None | None
    linear model weight | None | None

Configure the component by coding

You can copy the following code to the code editor of the PyAlink Script component. This allows the PyAlink Script component to function like the Lasso Regression Training component.

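from pyalink.alink import *

def main(sources, sinks, parameter):
    # The first input port carries the training table.
    batchData = sources[0]
    # Configure LASSO training; lambda is the regularization coefficient.
    lasso = LassoRegTrainBatchOp()\
        .setLambda(0.1)\
        .setFeatureCols(["f0","f1"])\
        .setLabelCol("label")
    # Train on the input data and write the model to the first output port.
    model = batchData.link(lasso)
    model.link(sinks[0])
    BatchOperator.execute()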