SVD

Singular value decomposition (SVD) is an important matrix decomposition in linear algebra. It is a generalization of the diagonalization of normal matrices in matrix analysis. SVD is widely used in fields such as signal processing and statistics.

Background information

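Formula for singular value decomposition: X = U S V', where X is the m × n input matrix, the columns of U and V are the left and right singular vectors, S is the diagonal matrix of singular values, and V' is the transpose of V.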
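To make the formula concrete, the following minimal sketch estimates the largest singular value of a small matrix, together with its singular vectors, by power iteration on X'X. It is an illustration only and is not the algorithm that the component uses. The function names `matvec`, `norm`, and `top_singular_triplet` are hypothetical, and the sketch assumes that the start vector is not orthogonal to the leading right singular vector.

```python
import math

def matvec(A, x):
    """Multiply matrix A (a list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def norm(x):
    """Euclidean norm of vector x."""
    return math.sqrt(sum(c * c for c in x))

def top_singular_triplet(X, tol=1.0e-06, max_iter=1000):
    """Return (u, s, v) for the largest singular value of matrix X.

    Illustrative power iteration; assumes X'X v is nonzero for the start vector.
    """
    Xt = [list(col) for col in zip(*X)]      # X' (transpose of X)
    v = [1.0] + [0.0] * (len(X[0]) - 1)      # arbitrary unit start vector
    s = 0.0
    for _ in range(max_iter):
        w = matvec(Xt, matvec(X, v))         # one power step: w = X'X v
        w_norm = norm(w)
        v = [c / w_norm for c in w]          # normalize to unit length
        s_new = norm(matvec(X, v))           # current singular value estimate
        converged = abs(s_new - s) < tol
        s = s_new
        if converged:
            break
    u = [c / s for c in matvec(X, v)]        # left singular vector: u = X v / s
    return u, s, v

X = [[3.0, 0.0], [4.0, 5.0]]
u, s, v = top_singular_triplet(X)
print(s)  # close to sqrt(45), the largest singular value of X
```

For this X, the product X'X has eigenvalues 45 and 5, so the singular values are sqrt(45) and sqrt(5). The sketch recovers only the first of them.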

Configure the component

You can configure the component by using one of the following methods:

Use the Machine Learning Platform for AI console

Tab	Parameter	Description
Fields Setting	Feature Columns	The columns that are used to store key-value pairs. The keys and values are separated by colons (:), and multiple key-value pairs are separated by commas (,).
Parameters Setting	Number of Reserved Singular Values	The number of top singular values, with their singular vectors, to compute. By default, all singular values are computed.
Parameters Setting	Accuracy Error	The error precision that is allowed.
Tuning	Memory Size per Node	The memory size of each node. Unit: MB. This parameter must be used with the Number of Nodes parameter. The value of this parameter must be a positive integer. Valid values: [1024,64 × 1024].
Tuning	Number of Nodes	The number of nodes. The value of this parameter must be a positive integer. Valid values: [1,9999].
Tuning	Lifetime	The lifecycle of the output table.

Use commands

PAI -name svd
    -project algo_public
    -DinputTableName=bank_data
    -DselectedColNames=col0
    -DenableSparse=true
    -Dk=5
    -DoutputUTableName=u_table
    -DoutputVTableName=v_table
    -DoutputSTableName=s_table;

Parameter	Required	Description	Default value
inputTableName	Yes	The input table that is used for training.	N/A
selectedColNames	No	The columns that are selected from the input table for training. Separate the columns with commas (,). If the input is in the sparse format, columns of the STRING type are supported. If the input is a dense table, columns of the INT and DOUBLE types are supported.	All columns
inputTablePartitions	No	The partitions that are selected from the input table for training. Set this parameter in the `Partition_name=value` format. To specify multi-level partitions, set this parameter in the `name1=value1/name2=value2;` format. If you specify multiple partitions, separate them with commas (,).	All partitions
outputUTableName	Yes	The output table of the U matrix (left singular vectors). The matrix has `m * sgNum` dimensions. m represents the number of rows of the input table, and sgNum represents the number of calculated singular values.	N/A
outputSTableName	Yes	The output table of the S matrix, which is the diagonal matrix of singular values. The matrix has `sgNum * sgNum` dimensions. sgNum represents the number of calculated singular values.	N/A
outputVTableName	Yes	The output table of the V matrix (right singular vectors). The matrix has `n * sgNum` dimensions. n represents the number of columns of the input matrix, and sgNum represents the number of calculated singular values.	N/A
k	Yes	The number of singular values to compute. The number of singular values that are actually generated may be smaller than the value of k.	N/A
tol	No	The convergence error tolerance.	1.0e-06
enableSparse	No	Specifies whether data in the input table is in the sparse format. Valid values: true and false.	false
itemDelimiter	No	The delimiter that is used to separate key-value pairs when data in the input table is in the sparse format.	Space
kvDelimiter	No	The delimiter that is used to separate keys and values when data in the input table is in the sparse format.	:
coreNum	No	The number of cores. This parameter must be used with the memSizePerCore parameter. The value of this parameter must be a positive integer. Valid values: [1,9999].	Determined by the system
memSizePerCore	No	The memory size of each core. Unit: MB. The value of this parameter must be a positive integer. Valid values: [1024,64 × 1024].	Determined by the system
lifecycle	No	The lifecycle of the output table. The value must be a positive integer.	N/A
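The roles of the itemDelimiter and kvDelimiter parameters can be illustrated with a short sketch that parses one sparse-format row into a mapping from column index to value. The function `parse_sparse_row` is hypothetical and is not part of the component. The sketch assumes that keys are integer column indices, as in the example data in this topic.

```python
def parse_sparse_row(row, item_delimiter=" ", kv_delimiter=":"):
    """Parse one sparse row such as '0:3.9 2:0.1' into {column_index: value}.

    The defaults mirror the documented defaults of itemDelimiter (space)
    and kvDelimiter (:). Integer keys are an assumption of this sketch.
    """
    entries = {}
    for item in row.split(item_delimiter):
        if not item:
            # Skip empty items produced by leading or trailing delimiters.
            continue
        key, value = item.split(kv_delimiter)
        entries[int(key)] = float(value)
    return entries

parsed = parse_sparse_row("2:0.767 5:0.01891 8:0.25235 ")
print(parsed)  # {2: 0.767, 5: 0.01891, 8: 0.25235}
```

Columns that do not appear in a row are treated as zero in a sparse matrix, which is why only the nonzero entries need to be stored.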

Example

Generate input data

drop table if exists svd_test_input;
create table svd_test_input
as
select
    *
from
(
    select
        '0:3.9079 2:0.0009 3:0.0416 4:0.17664 6:0.36460 8:0.091330' as col0
    union all
    select
        '0:0.09229 2:0.4872172 5:0.5267 8:0.4544 9:0.23317' as col0
    union all
    select
        '1:0.8312 3:0.9317 5:0.5680 7:0.5560 9:0.0508' as col0
    union all
    select
        '2:0.767 5:0.01891 8:0.25235 ' as col0
    union all
    select
        '0:0.29819 2:0.87598086 6:0.5315568 ' as col0
    union all
    select
        '0:0.920260 2:0.5154311513 4:0.8104 5:0.188420 8:0.88' as col0
) a;

Run commands

PAI -name svd
    -project algo_public
    -DinputTableName=svd_test_input
    -DselectedColNames=col0
    -DenableSparse=true
    -Dk=5
    -DoutputUTableName=u_table
    -DoutputVTableName=v_table
    -DoutputSTableName=s_table;
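As a rough check of what this run should produce, the following sketch derives m and n from the six example rows and applies the dimension rules from the parameter table. It assumes that keys are 0-based column indices, so that n is the largest key plus 1, and it uses the fact that a matrix has at most min(m, n) nonzero singular values. The actual sgNum may be smaller, and the variable names are illustrative.

```python
rows = [
    "0:3.9079 2:0.0009 3:0.0416 4:0.17664 6:0.36460 8:0.091330",
    "0:0.09229 2:0.4872172 5:0.5267 8:0.4544 9:0.23317",
    "1:0.8312 3:0.9317 5:0.5680 7:0.5560 9:0.0508",
    "2:0.767 5:0.01891 8:0.25235 ",
    "0:0.29819 2:0.87598086 6:0.5315568 ",
    "0:0.920260 2:0.5154311513 4:0.8104 5:0.188420 8:0.88",
]
k = 5  # value passed as -Dk

m = len(rows)  # number of rows in the input table
# Assumption: keys are 0-based column indices, so n = largest key + 1.
n = 1 + max(int(item.split(":")[0]) for row in rows for item in row.split())

sg_max = min(k, m, n)  # upper bound on sgNum
shapes = {
    "u_table": (m, sg_max),       # m * sgNum
    "s_table": (sg_max, sg_max),  # sgNum * sgNum
    "v_table": (n, sg_max),       # n * sgNum
}
print(m, n, shapes)
```

Under these assumptions, m is 6 and n is 10, so the U, S, and V matrices are at most 6 × 5, 5 × 5, and 10 × 5.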

Analysis scale: 100,000 columns