Singular value decomposition (SVD) is an important matrix decomposition in linear algebra. It is a generalization of the diagonalization of normal matrices in matrix analysis. SVD is widely used in fields such as signal processing and statistics.
Background information
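Formula for singular value decomposition: X = U S V', where the columns of U and V are the left and right singular vectors, S is a diagonal matrix that contains the singular values, and V' is the transpose of V.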
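As a minimal illustration of the formula, the following NumPy sketch decomposes a small matrix and rebuilds it from the three factors. It runs locally and does not call the PAI component. The matrix values are made up for illustration only.

```python
import numpy as np

# Hypothetical 3 x 2 matrix, used only for illustration.
X = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

# full_matrices=False returns the compact form:
# U is 3 x 2, s holds 2 singular values, Vt is 2 x 2.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# S is the diagonal matrix of singular values; Vt is V' (the transpose of V).
S = np.diag(s)
X_rebuilt = U @ S @ Vt

# Check that U S V' reproduces X up to floating-point error.
print(np.allclose(X, X_rebuilt))
```

NumPy returns the singular values as a one-dimensional array in descending order, so `np.diag` is needed to obtain the S matrix that appears in the formula.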
Configure the component
You can configure the component by using one of the following methods:
Use the Machine Learning Platform for AI console
Tab
Parameter
Description
Fields Setting
Feature Columns
The columns that are used to store key-value pairs. The keys and values are separated by colons (:), and multiple key-value pairs are separated by commas (,).
Parameters Setting
Number of Reserved Singular Values
The number of top singular values, together with their singular vectors, that you want to compute. By default, all singular values are computed.
Accuracy Error
The maximum convergence error that is allowed.
Tuning
Memory Size per Node
The memory size of each node. Unit: MB. This parameter must be used with the Number of Nodes parameter. The value of this parameter must be a positive integer. Valid values: [1024,64 × 1024].
Number of Nodes
The number of nodes. This parameter must be used with the Memory Size per Node parameter. The value of this parameter must be a positive integer. Valid values: [1,9999].
Lifetime
The lifecycle of the output table.
Use commands
PAI -name svd -project algo_public -DinputTableName=bank_data -DselectedColNames=col0 -DenableSparse=true -Dk=5 -DoutputUTableName=u_table -DoutputVTableName=v_table -DoutputSTableName=s_table;
Parameter
Required
Description
Default value
inputTableName
Yes
The input table that is used for training.
N/A
selectedColNames
No
The columns that are selected from the input table for training. Separate the columns with commas (,).
If the input data is in the sparse format, columns of the STRING type are supported. If the input data is a dense table, columns of the INT and DOUBLE types are supported.
All columns
inputTablePartitions
No
The partitions that are selected from the input table for training. Specify a partition in the Partition_name=value format. To specify multi-level partitions, use the name1=value1/name2=value2 format. If you specify multiple partitions, separate them with commas (,).
All partitions
outputUTableName
Yes
The output table of the U matrix (the left singular vectors). The table has m * sgNum dimensions, where m is the number of rows in the data table and sgNum is the number of calculated singular values.
N/A
outputSTableName
Yes
The output table of the S matrix, which is the diagonal matrix of singular values. The table has sgNum * sgNum dimensions, where sgNum is the number of calculated singular values.
N/A
outputVTableName
Yes
The output table of the V matrix (the right singular vectors). The table has n * sgNum dimensions, where n is the number of columns in the matrix and sgNum is the number of calculated singular values.
N/A
k
Yes
The number of singular values that you want to compute. The value must be a positive integer.
The number of singular values that are actually generated may be less than the value of k.
N/A
tol
No
The convergence error.
1.0e-06
enableSparse
No
Specifies whether data in the input table is in the sparse format. Valid values: true and false.
false
itemDelimiter
No
The delimiter that is used to separate key-value pairs when data in the input table is in the sparse format.
Space
kvDelimiter
No
The delimiter that is used to separate keys and values when data in the input table is in the sparse format.
:
coreNum
No
The number of cores. This parameter must be used with the memSizePerCore parameter. The value of this parameter must be a positive integer. Valid values: [1,9999].
Determined by the system
memSizePerCore
No
The memory size of each core. Unit: MB. The value of this parameter must be a positive integer. Valid values: [1024,64 × 1024].
Determined by the system
lifecycle
No
The lifecycle of the output table. The value must be a positive integer.
N/A
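The sparse input format and the output dimensions described in the preceding parameters can be sketched locally with NumPy. This is an illustration under stated assumptions, not the component's implementation: the helper `parse_sparse_row` and its `n_cols` argument are hypothetical, keys are assumed to be zero-based column indexes, and the sample rows are made up.

```python
import numpy as np

def parse_sparse_row(text, n_cols, item_delimiter=" ", kv_delimiter=":"):
    """Hypothetical helper: turn one sparse row such as '0:1.5 2:0.25'
    into a dense vector of length n_cols.
    Assumption: each key is a zero-based column index."""
    row = np.zeros(n_cols)
    for item in text.split(item_delimiter):
        if not item:  # skip empty pieces caused by trailing delimiters
            continue
        key, value = item.split(kv_delimiter)
        row[int(key)] = float(value)
    return row

# Made-up sample rows that use the default delimiters (space and colon).
rows = ["0:1.5 2:0.25", "1:2.0 3:4.0 "]
X = np.vstack([parse_sparse_row(r, n_cols=4) for r in rows])

# The same first row written with a comma as the item delimiter.
row_comma = parse_sparse_row("0:1.5,2:0.25", n_cols=4, item_delimiter=",")

k = 2  # plays the role of the k parameter
U, s, Vt = np.linalg.svd(X, full_matrices=False)
sg_num = min(k, len(s))       # calculated singular values can be fewer than k

U_k = U[:, :sg_num]           # m x sgNum, like the U output table
S_k = np.diag(s[:sg_num])     # sgNum x sgNum, like the S output table
V_k = Vt[:sg_num].T           # n x sgNum, like the V output table

print(U_k.shape, S_k.shape, V_k.shape)
```

Here m is 2 and n is 4, so the three shapes follow the m * sgNum, sgNum * sgNum, and n * sgNum pattern listed for the output tables.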
Example
Generate input data
drop table if exists svd_test_input;
create table svd_test_input as
select * from (
  select '0:3.9079 2:0.0009 3:0.0416 4:0.17664 6:0.36460 8:0.091330' as col0
  union all
  select '0:0.09229 2:0.4872172 5:0.5267 8:0.4544 9:0.23317' as col0
  union all
  select '1:0.8312 3:0.9317 5:0.5680 7:0.5560 9:0.0508' as col0
  union all
  select '2:0.767 5:0.01891 8:0.25235 ' as col0
  union all
  select '0:0.29819 2:0.87598086 6:0.5315568 ' as col0
  union all
  select '0:0.920260 2:0.5154311513 4:0.8104 5:0.188420 8:0.88' as col0
) a;
Run commands
PAI -name svd -project algo_public -DinputTableName=svd_test_input -DselectedColNames=col0 -DenableSparse=true -Dk=5 -DoutputUTableName=u_table -DoutputVTableName=v_table -DoutputSTableName=s_table;
Analysis scale: 100,000 columns
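To sanity-check the example locally, the following NumPy sketch builds a dense matrix from the six sample rows and keeps the top 5 singular values, mirroring -Dk=5. It rests on two assumptions: keys are zero-based column indexes, and the matrix has 10 columns because the largest key in the sample is 9. The component may use a different algorithm, so the values in u_table, s_table, and v_table can differ from this sketch in sign or ordering.

```python
import numpy as np

# Sample rows copied from the svd_test_input table.
rows = [
    "0:3.9079 2:0.0009 3:0.0416 4:0.17664 6:0.36460 8:0.091330",
    "0:0.09229 2:0.4872172 5:0.5267 8:0.4544 9:0.23317",
    "1:0.8312 3:0.9317 5:0.5680 7:0.5560 9:0.0508",
    "2:0.767 5:0.01891 8:0.25235 ",
    "0:0.29819 2:0.87598086 6:0.5315568 ",
    "0:0.920260 2:0.5154311513 4:0.8104 5:0.188420 8:0.88",
]
k = 5
n_cols = 10  # assumption: zero-based keys, largest key in the sample is 9

# Fill a dense 6 x 10 matrix from the key:value pairs.
X = np.zeros((len(rows), n_cols))
for i, text in enumerate(rows):
    for item in text.split():  # whitespace split ignores trailing spaces
        key, value = item.split(":")
        X[i, int(key)] = float(value)

# Compact SVD, then keep only the k largest singular values and vectors.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k].T

# Rank-k approximation and its spectral-norm error,
# which equals the first discarded singular value.
X_k = U_k @ np.diag(s_k) @ V_k.T
residual = np.linalg.norm(X - X_k, 2)

print(U_k.shape, s_k.shape, V_k.shape)
print(s_k)
print(residual)
```

Because the sample matrix has only six rows, the full decomposition yields six singular values, and truncating to five discards the smallest one.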