Platform For AI: Ridge Regression Training

Last Updated: Aug 26, 2024

Tikhonov regularization is the most common regularization method used to deal with ill-posed problems. The Ridge Regression Training component is developed based on Tikhonov regularization. The component supports sparse and dense data and allows weighted data samples to be used for training. This topic describes how to configure the Ridge Regression Training component.

Limits

The Ridge Regression Training component can run only on one of the following computing resources: MaxCompute, Realtime Compute for Apache Flink, or Deep Learning Containers (DLC) of Platform for AI (PAI).

How Tikhonov regularization works

Tikhonov regularization, known in statistics as ridge regression, is a biased estimation method for regression on collinear data. It is essentially an improved least squares method that adds a penalty on the squared L2 norm of the regression coefficients to the least squares objective. By giving up the unbiasedness of least squares, it yields more stable and reliable regression coefficients and fits ill-conditioned data better than ordinary least squares. The trade-off is that some information is lost and accuracy is reduced.
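
The effect on collinear data can be sketched with NumPy. The example below is an illustration only, not the component's implementation: it assumes the textbook objective ||y - Xw||^2 + lam * ||w||^2 without an intercept, and the helper name ridge_fit and the synthetic data are made up for this sketch. The objective scaling and intercept handling used by the component are not stated in this topic.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y for w (no intercept term)."""
    n_features = X.shape[1]
    gram = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ y)

# Synthetic data: x2 is almost an exact copy of x1, so the columns are collinear.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 1e-6 * rng.normal(size=200)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=200)

# Ordinary least squares: unbiased, but unstable on ill-conditioned data.
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]
# Ridge: biased, but the penalty keeps the coefficients small and stable.
w_ridge = ridge_fit(X, y, lam=0.1)

print("least squares:", w_ols)
print("ridge:        ", w_ridge)
```

In this setup the least squares coefficients are typically large and of opposite sign, while the ridge coefficients split the true effect roughly evenly between the two nearly identical columns.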

Configure the component in the PAI console

  • Input ports

    | Input port (from left to right) | Data type | Recommended upstream component | Required |
    |---|---|---|---|
    | data | N/A | | Yes |
    | model | N/A | Read Table | No |

  • Component parameters

    | Tab | Parameter | Description |
    |---|---|---|
    | Field Setting | labelCol | The name of the label column in the input table. |
    | Field Setting | featureCols | The feature columns that are used for training. This parameter and vectorCol are mutually exclusive: set only one of them to describe the input features of the algorithm. |
    | Field Setting | vectorCol | The name of the vector column. This parameter and featureCols are mutually exclusive: set only one of them to describe the input features of the algorithm. |
    | Field Setting | weightCol | The name of the weight column. |
    | Parameter Setting | lambda | The regularization coefficient. The value is of the DOUBLE type. |
    | Parameter Setting | epsilon | The convergence tolerance at which iteration stops. Default value: 1.0E-6. |
    | Parameter Setting | LearningRate | The rate at which parameters are updated during model training. Default value: 0.1. |
    | Parameter Setting | maxIter | The maximum number of iterations. Default value: 100. |
    | Parameter Setting | optimMethod | The optimization method used to solve the problem. Valid values: LBFGS, GD, Newton, SGD, and OWLQN. |
    | Execution Tuning | Number of Workers | The number of cores. The value must be a positive integer in the range [1, 9999]. This parameter must be used together with the Memory per worker, unit MB parameter. |
    | Execution Tuning | Memory per worker, unit MB | The memory size of each worker. Valid values: 1024 to 65536. Unit: MB. |

  • Output ports

    | Output port (from left to right) | Data type | Downstream component |
    |---|---|---|
    | model | Regression model | Ridge Regression Prediction |
    | model information | N/A | N/A |
    | feature importance | N/A | N/A |
    | linear model weight | N/A | N/A |
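
To make the roles of lambda, LearningRate, maxIter, epsilon, and the weight column more concrete, the following NumPy sketch runs plain gradient descent on a ridge objective. It is a minimal illustration under stated assumptions, not the component's solver: the function name ridge_gd, the loss scaling (weighted mean of squared errors, halved, plus 0.5 * lam * ||w||^2), and the stopping rule (stop when the loss changes by less than epsilon) are all assumptions, and the optimizers listed for optimMethod, such as LBFGS or OWLQN, work differently.

```python
import numpy as np

def ridge_gd(X, y, lam=0.1, learning_rate=0.1, max_iter=100, epsilon=1e-6,
             sample_weight=None):
    """Gradient descent on a weighted ridge objective (illustrative only)."""
    n_samples, n_features = X.shape
    if sample_weight is None:
        sample_weight = np.ones(n_samples)  # unweighted training
    weight_sum = sample_weight.sum()
    w = np.zeros(n_features)
    prev_loss = np.inf
    steps = 0
    for _ in range(max_iter):               # maxIter caps the number of updates
        residual = X @ w - y
        loss = 0.5 * np.sum(sample_weight * residual ** 2) / weight_sum \
            + 0.5 * lam * (w @ w)
        if abs(prev_loss - loss) < epsilon:  # assumed convergence test
            break
        gradient = X.T @ (sample_weight * residual) / weight_sum + lam * w
        w = w - learning_rate * gradient     # step size set by the learning rate
        prev_loss = loss
        steps += 1
    return w, steps

X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
w, steps = ridge_gd(X, y)
w_strong, _ = ridge_gd(X, y, lam=1.0)
print("lambda=0.1:", w, "after", steps, "updates")
print("lambda=1.0:", w_strong)
```

A larger lambda pulls the coefficient further toward zero, a smaller epsilon or a larger maxIter allows more updates before the loop stops, and a learning rate that is too large for the data can make the updates diverge.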

Configure the component by using code

You can copy the following code to the code editor of the PyAlink Script component so that the PyAlink Script component works in the same way as the Ridge Regression Training component.

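from pyalink.alink import *

def main(sources, sinks, parameter):
    # The first input port carries the training data.
    batchData = sources[0]
    # Configure ridge regression training: regularization coefficient,
    # feature columns, and label column.
    ridge = RidgeRegTrainBatchOp()\
        .setLambda(0.1)\
        .setFeatureCols(["f0","f1"])\
        .setLabelCol("label")
    # Train the model and write it to the first output port.
    model = batchData.link(ridge)
    model.link(sinks[0])
    # Trigger execution of the batch job.
    BatchOperator.execute()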
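
The other parameters described in this topic can be set in the same way. The variant below is a hedged sketch rather than a verified configuration: the column names "vec" and "weight" are hypothetical, and the setter names are assumed to follow Alink's set<ParameterName> convention, so check them against the PyAlink version that you use. Because featureCols and vectorCol are mutually exclusive, the sketch calls setVectorCol and does not call setFeatureCols.

```python
def main(sources, sinks, parameter):
    # Explicit imports show exactly which PyAlink names this sketch relies on.
    from pyalink.alink import BatchOperator, RidgeRegTrainBatchOp

    batchData = sources[0]
    ridge = (
        RidgeRegTrainBatchOp()
        .setLambda(0.1)
        .setVectorCol("vec")       # hypothetical vector column (instead of featureCols)
        .setLabelCol("label")
        .setWeightCol("weight")    # hypothetical sample weight column
        .setOptimMethod("LBFGS")   # one of the valid values listed for optimMethod
        .setMaxIter(100)           # documented default
        .setEpsilon(1e-6)          # documented default
    )
    # Train the model and write it to the first output port.
    batchData.link(ridge).link(sinks[0])
    BatchOperator.execute()
```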