Platform For AI: SVD

Last Updated: May 17, 2024

Singular value decomposition (SVD) is an important matrix factorization in linear algebra. It generalizes the eigendecomposition (unitary diagonalization) of normal matrices to arbitrary m × n matrices. SVD is widely used in fields such as signal processing and statistics.

Background information

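The SVD of a matrix X is X = U S V', where the columns of U and V are orthonormal (the left and right singular vectors), S is a diagonal matrix of non-negative singular values sorted in descending order, and V' is the transpose of V.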
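For reference, the formula can be written with explicit dimensions, using s for the number of computed singular values (called sgNum in the output table descriptions of this topic). This is standard linear algebra, not a description of how the component computes the decomposition internally.

```latex
\begin{aligned}
X &\approx U S V^{\top}, \qquad X \in \mathbb{R}^{m \times n},\; U \in \mathbb{R}^{m \times s},\; S \in \mathbb{R}^{s \times s},\; V \in \mathbb{R}^{n \times s},\\
U^{\top} U &= V^{\top} V = I_s, \qquad S = \operatorname{diag}(\sigma_1, \dots, \sigma_s), \qquad \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_s \ge 0.
\end{aligned}
```

Here V^T is the transpose written as V' above. When s equals the rank of X, the equality is exact; when s is smaller, U S V' is the best rank-s approximation of X in the Frobenius norm (Eckart–Young theorem).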

Configure the component

You can configure the component by using one of the following methods:

  • Use the Machine Learning Platform for AI console

    Tab: Fields Setting

    • Feature Columns: The columns that store the input data as key-value pairs. Keys and values are separated by colons (:), and key-value pairs are separated by a delimiter (a space in the sample data in the Example section, which is also the default of the itemDelimiter command parameter).

    Tab: Parameters Setting

    • Number of Reserved Singular Values: The number N of largest singular values, together with their singular vectors, to retain. By default, all singular values are computed.

    • Accuracy Error: The allowed convergence error (tolerance).

    Tab: Tuning

    • Memory Size per Node: The memory size of each node. Unit: MB. This parameter must be used with the Number of Nodes parameter. The value must be a positive integer. Valid values: [1024, 64 × 1024].

    • Number of Nodes: The number of nodes. This parameter must be used with the Memory Size per Node parameter. The value must be a positive integer. Valid values: [1, 9999].

    • Lifetime: The lifecycle of the output tables, in days.

  • Use commands

    PAI -name svd
        -project algo_public
        -DinputTableName=bank_data
        -DselectedColNames=col0
        -DenableSparse=true
        -Dk=5
        -DoutputUTableName=u_table
        -DoutputVTableName=v_table
        -DoutputSTableName=s_table;

    The command supports the following parameters:

    • inputTableName (required; default value: N/A): The input table that is used for training.

    • selectedColNames (optional; default value: all columns): The columns that are selected from the input table for training. Separate multiple columns with commas (,). If the input data is in sparse format (enableSparse=true), columns of the STRING type are supported. If the input is a dense data table, columns of the INT and DOUBLE types are supported.

    • inputTablePartitions (optional; default value: all partitions): The partitions that are selected from the input table for training, in the partition_name=value format. To specify multi-level partitions, use the name1=value1/name2=value2 format. Separate multiple partitions with commas (,).

    • outputUTableName (required; default value: N/A): The output table of the U matrix (left singular vectors). Its dimensions are m × sgNum, where m is the number of rows of the input table and sgNum is the number of computed singular values.

    • outputSTableName (required; default value: N/A): The output table of the S matrix, the diagonal matrix of singular values. Its dimensions are sgNum × sgNum, where sgNum is the number of computed singular values.

    • outputVTableName (required; default value: N/A): The output table of the V matrix (right singular vectors). Its dimensions are n × sgNum, where n is the number of columns of the input matrix and sgNum is the number of computed singular values.

    • k (required; default value: N/A): The number of singular values to compute. The number of singular values actually generated (sgNum) is a positive integer that may be less than k.

    • tol (optional; default value: 1.0e-06): The convergence error (tolerance).

    • enableSparse (optional; default value: false): Specifies whether data in the input table is in sparse format. Valid values: true and false.

    • itemDelimiter (optional; default value: a space): The delimiter that separates key-value pairs when data in the input table is in sparse format.

    • kvDelimiter (optional; default value: a colon (:)): The delimiter that separates each key from its value when data in the input table is in sparse format.

    • coreNum (optional; default value: determined by the system): The number of cores. This parameter must be used with the memSizePerCore parameter. The value must be a positive integer. Valid values: [1, 9999].

    • memSizePerCore (optional; default value: determined by the system): The memory size of each core. Unit: MB. The value must be a positive integer. Valid values: [1024, 64 × 1024].

    • lifecycle (optional; default value: N/A): The lifecycle of the output tables, in days. The value must be a positive integer.
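
The roles of k and tol can be illustrated with a minimal, standard-library Python sketch of truncated SVD. It uses power iteration on X'X with deflation, which is only one way to compute the leading singular triplets and is not necessarily the algorithm that the PAI svd component uses. The function name truncated_svd, the fixed starting vector, and the max_iter limit are hypothetical choices for illustration. The sketch shows how a tolerance decides when iteration stops, and why fewer than k singular values can be returned when the rank of the matrix is lower than k.

```python
import math


def mat_vec(a, v):
    """Return the product of matrix a (a list of rows) and vector v."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def gram(x):
    """Return X'X (n x n) for an m x n matrix X stored as a list of rows."""
    n = len(x[0])
    return [[sum(row[i] * row[j] for row in x) for j in range(n)] for i in range(n)]


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(c * c for c in v))


def truncated_svd(x, k, tol=1e-6, max_iter=1000):
    """Hypothetical sketch: top-k singular triplets via power iteration + deflation.

    Returns (sigmas, us, vs). Fewer than k values are returned when the rank
    of x is lower than k, similar to sgNum being less than k.
    """
    a = gram(x)
    n = len(a)
    sigmas, us, vs = [], [], []
    for _ in range(min(k, n)):
        # Assumed fixed start vector, assumed not orthogonal to the target vector.
        start = [1.0 + 0.1 * i for i in range(n)]
        start_norm = norm(start)
        v = [c / start_norm for c in start]
        for _ in range(max_iter):
            w = mat_vec(a, v)
            w_norm = norm(w)
            if w_norm < 1e-12:  # remaining spectrum is numerically zero
                break
            w = [c / w_norm for c in w]
            converged = norm([p - q for p, q in zip(w, v)]) < tol
            v = w
            if converged:  # successive iterates differ by less than tol
                break
        lam = sum(p * q for p, q in zip(v, mat_vec(a, v)))  # eigenvalue of X'X = sigma^2
        if lam <= 1e-12:
            break  # rank exhausted: stop with fewer than k values
        sigma = math.sqrt(lam)
        sigmas.append(sigma)
        vs.append(v)  # right singular vector (a column of V)
        us.append([c / sigma for c in mat_vec(x, v)])  # left singular vector u = X v / sigma
        # Deflation: remove the found component so the next pass finds the next one.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return sigmas, us, vs
```

For example, truncated_svd([[1.0, 1.0], [1.0, 1.0]], k=2) returns a single singular value of about 2.0, because this matrix has rank 1.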

Example

  • Generate input data

    drop table if exists svd_test_input;
    create table svd_test_input as
    select
        *
    from
    (
        select '0:3.9079 2:0.0009 3:0.0416 4:0.17664 6:0.36460 8:0.091330' as col0
        union all
        select '0:0.09229 2:0.4872172 5:0.5267 8:0.4544 9:0.23317' as col0
        union all
        select '1:0.8312 3:0.9317 5:0.5680 7:0.5560 9:0.0508' as col0
        union all
        select '2:0.767 5:0.01891 8:0.25235 ' as col0
        union all
        select '0:0.29819 2:0.87598086 6:0.5315568 ' as col0
        union all
        select '0:0.920260 2:0.5154311513 4:0.8104 5:0.188420 8:0.88' as col0
    ) a;

  • Run the command

    PAI -name svd
        -project algo_public
        -DinputTableName=svd_test_input
        -DselectedColNames=col0
        -DenableSparse=true
        -Dk=5
        -DoutputUTableName=u_table
        -DoutputVTableName=v_table
        -DoutputSTableName=s_table;
  • Supported analysis scale: up to 100,000 columns
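
In the sample data, each col0 value holds key:value pairs separated by spaces, which matches the default itemDelimiter (a space) and kvDelimiter (:). The following standard-library Python sketch shows one way to expand rows in this format into a dense matrix. It assumes that keys are 0-based integer column indices and that the number of columns is the largest key plus one; the function names parse_sparse_row and to_dense are hypothetical, and this is not the component's own parsing code.

```python
def parse_sparse_row(text, item_delimiter=" ", kv_delimiter=":"):
    """Parse one sparse STRING value into a {column_index: value} dict."""
    row = {}
    for item in text.split(item_delimiter):
        item = item.strip()
        if not item:  # skip empty items, for example from a trailing delimiter
            continue
        key, value = item.split(kv_delimiter, 1)
        row[int(key)] = float(value)
    return row


def to_dense(texts, item_delimiter=" ", kv_delimiter=":", n_cols=None):
    """Expand sparse rows into an m x n list of lists; absent keys become 0.0.

    Assumes 0-based column indices; by default n = largest key + 1.
    """
    rows = [parse_sparse_row(t, item_delimiter, kv_delimiter) for t in texts]
    if n_cols is None:
        n_cols = 1 + max((k for r in rows for k in r), default=-1)
    return [[r.get(j, 0.0) for j in range(n_cols)] for r in rows]


# The six col0 values from svd_test_input (trailing spaces kept as in the SQL).
sample = [
    "0:3.9079 2:0.0009 3:0.0416 4:0.17664 6:0.36460 8:0.091330",
    "0:0.09229 2:0.4872172 5:0.5267 8:0.4544 9:0.23317",
    "1:0.8312 3:0.9317 5:0.5680 7:0.5560 9:0.0508",
    "2:0.767 5:0.01891 8:0.25235 ",
    "0:0.29819 2:0.87598086 6:0.5315568 ",
    "0:0.920260 2:0.5154311513 4:0.8104 5:0.188420 8:0.88",
]
dense = to_dense(sample)
print(len(dense), "x", len(dense[0]))  # prints "6 x 10"
```

Under this assumption, the sample input is a 6 × 10 matrix, so m = 6 and n = 10 in the output table dimensions described above.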