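Alternating Least Squares (ALS) is a matrix factorization algorithm commonly used in recommender systems, particularly for collaborative filtering. It factorizes a user-item rating matrix into the product of two low-rank matrices, which reduces dimensionality, fills in missing ratings, and reveals latent user preferences and item features.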
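To make the alternating step concrete, here is a minimal NumPy sketch of explicit-feedback ALS. It is not the component's implementation: the function name `als_explicit`, the plain L2 penalty, the random initialization, and the toy ratings are all assumptions made for illustration. The defaults mirror the component's documented defaults (10 factors, 10 iterations, regularization 0.1).

```python
import numpy as np

def als_explicit(ratings, num_factors=10, num_iters=10, reg=0.1, seed=0):
    """Hypothetical helper. ratings: list of (user_index, item_index, rating)
    using dense 0-based indices. Returns (user_factors, item_factors)."""
    rng = np.random.default_rng(seed)
    num_users = 1 + max(u for u, _, _ in ratings)
    num_items = 1 + max(i for _, i, _ in ratings)
    user_f = rng.normal(scale=0.1, size=(num_users, num_factors))
    item_f = rng.normal(scale=0.1, size=(num_items, num_factors))

    # Group the observed ratings by user and by item.
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))

    def solve_side(target, fixed, groups):
        # With one side fixed, each row of the other side is an independent
        # ridge regression: (F^T F + reg * I) x = F^T r.
        penalty = reg * np.eye(num_factors)
        for idx, observed in groups.items():
            cols = [j for j, _ in observed]
            vals = np.array([r for _, r in observed], dtype=float)
            F = fixed[cols]  # factors of the counterpart rows, shape (n_obs, k)
            target[idx] = np.linalg.solve(F.T @ F + penalty, F.T @ vals)

    for _ in range(num_iters):
        solve_side(user_f, item_f, by_user)  # fix items, update users
        solve_side(item_f, user_f, by_item)  # fix users, update items
    return user_f, item_f

# Toy data, made up for illustration: (user_index, item_index, rating).
toy = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
user_f, item_f = als_explicit(toy, num_factors=2)
# The product fills every cell, including user-item pairs with no observed rating.
print(user_f @ item_f.T)
```

Each half-step has a closed-form solution because fixing one factor matrix turns the problem into least squares in the other, which is where the name of the algorithm comes from.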
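Supported computing resources
MaxCompute/Flink
Input/Output
Input port
Supported upstream components:
Output port
The output user factors and item factors are consumed by the downstream component: ALS Rating
Configure the component
On the Designer workflow page, add the ALS Matrix Factorization component and configure its parameters in the pane on the right: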
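Parameter type | Parameter | Description |
Field settings | User column name | The name of the user ID column in the input data source. The column must be of type BIGINT. |
Field settings | Item column name | The name of the item ID column in the input data source. The column must be of type BIGINT. |
Field settings | Rating column name | The name of the column in the input data source that holds users' ratings of items. The column must be numeric. |
Parameter settings | Number of factors | Default: 10. Valid range: (0, +∞). |
Parameter settings | Number of iterations | Default: 10. Valid range: (0, +∞). |
Parameter settings | Regularization coefficient | Default: 0.1. Valid range: (0, +∞). |
Parameter settings | Checkbox | Specifies whether to use the implicit preference model. |
Parameter settings | Implicit preference coefficient | Default: 40. Valid range: (0, +∞). |
Parameter settings | Output table lifecycle | The lifecycle of the output model tables, in days. |
Execution tuning | Number of nodes | Valid range: 1 to 9999. |
Execution tuning | Memory size per node | Valid range: 1024 MB to 64*1024 MB. |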
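This page does not state the exact objective the component optimizes. As a reading aid, and assuming the component follows the standard ALS formulation, the parameters above would enter as shown below. The symbols are notation introduced for this sketch only: $k$ is the number of factors, $\lambda$ the regularization coefficient, $\alpha$ the implicit preference coefficient, $x_u, y_i \in \mathbb{R}^k$ the user and item factor vectors, $r_{ui}$ the value in the rating column, and $\Omega$ the set of observed user-item pairs.

```latex
% Explicit ratings (checkbox cleared): fit only the observed entries.
\min_{X,Y} \sum_{(u,i)\in\Omega} \left( r_{ui} - x_u^{\top} y_i \right)^2
  + \lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right)

% Implicit preference model (checkbox selected): r_{ui} is treated as
% interaction strength, split into a binary preference and a confidence weight.
p_{ui} = \begin{cases} 1 & r_{ui} > 0 \\ 0 & \text{otherwise} \end{cases},
\qquad
c_{ui} = 1 + \alpha \, r_{ui}

\min_{X,Y} \sum_{u,i} c_{ui} \left( p_{ui} - x_u^{\top} y_i \right)^2
  + \lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right)
```

Under this reading, the number of iterations is the number of alternating sweeps over $X$ and $Y$, and a larger $\alpha$ makes observed interactions count more heavily relative to unobserved ones.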
Usage example
Use the following data as the input of the ALS algorithm template to obtain the output user factors and item factors:
Input data source
user_id | item_id | rating |
10944750 | 13451 | 0 |
10944751 | 13452 | 1 |
10944752 | 13453 | 2 |
10944753 | 13454 | 2 |
10944754 | 13455 | 4 |
... ... | ... ... | ... ... |
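Before such rows are passed to any ALS routine, it can help to check them against the field constraints above and to map the raw IDs to dense indices. The sketch below does that for the five sample rows. The helper names `check_row` and `build_index` are hypothetical, and BIGINT is assumed to mean a signed 64-bit integer.

```python
# The five sample rows from the input table: (user_id, item_id, rating).
rows = [
    (10944750, 13451, 0),
    (10944751, 13452, 1),
    (10944752, 13453, 2),
    (10944753, 13454, 2),
    (10944754, 13455, 4),
]

INT64_MIN, INT64_MAX = -2**63, 2**63 - 1  # assumed BIGINT range

def check_row(user_id, item_id, rating):
    """Raise if a row violates the documented column types."""
    for value in (user_id, item_id):
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"ID must be an integer, got {value!r}")
        if not INT64_MIN <= value <= INT64_MAX:
            raise ValueError(f"ID out of assumed BIGINT range: {value}")
    if isinstance(rating, bool) or not isinstance(rating, (int, float)):
        raise TypeError(f"rating must be numeric, got {rating!r}")

def build_index(ids):
    """Map each distinct ID to a 0-based position, in first-seen order."""
    index = {}
    for value in ids:
        index.setdefault(value, len(index))
    return index

for row in rows:
    check_row(*row)

user_index = build_index(u for u, _, _ in rows)
item_index = build_index(i for _, i, _ in rows)
# Dense (user_index, item_index, rating) triples suitable for a factorization routine.
triples = [(user_index[u], item_index[i], float(r)) for u, i, r in rows]
print(triples)
```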
Output user factor table
user_id | factors |
8528750 | [0.026986524,0.03350178,0.03532385,0.019542359,0.020429865,0.02046867,0.022253247,0.027391396,0.018985065,0.04889483] |
282500 | [0.116156064,0.07193632,0.090851225,0.017075706,0.025412979,0.047022138,0.12534861,0.05869226,0.11170533,0.1640192] |
4895250 | [0.038429666,0.061858658,0.04236993,0.055866677,0.031814687,0.0417443,0.012085311,0.0379342,0.10767074,0.028392972] |
... ... | ... ... |
Output item factor table
item_id | factors |
24601 | [0.0063337763,0.026349949,0.0064828005,0.01734504,0.022049638,0.0059205987,0.008568814,0.0015981696,0.0,0.013601779] |
26699 | [0.0027524426,0.0043066847,0.0031336215,0.00269448,0.0022347474,0.0020477585,0.0027995422,0.0025390312,0.0033011117,0.003957773] |
20751 | [0.03902271,0.050952066,0.032981463,0.03862796,0.048720762,0.027976315,0.02721664,0.018149626,0.0149896275,0.026251089] |
... ... | ... ... |
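In a matrix factorization model, the predicted score for a user-item pair is the dot product of the two factor vectors. The sketch below applies that to the first row of each output table. It assumes the `factors` column is stored as a bracketed, comma-separated string, as displayed above. The helpers `parse_factors` and `predict` are hypothetical, and the exact behavior of the downstream ALS Rating component is not described on this page.

```python
import json
import numpy as np

# First row of each output table above, keyed by ID.
user_factors = {
    8528750: "[0.026986524,0.03350178,0.03532385,0.019542359,0.020429865,"
             "0.02046867,0.022253247,0.027391396,0.018985065,0.04889483]",
}
item_factors = {
    24601: "[0.0063337763,0.026349949,0.0064828005,0.01734504,0.022049638,"
           "0.0059205987,0.008568814,0.0015981696,0.0,0.013601779]",
}

def parse_factors(text):
    """Parse a '[v1,v2,...]' string into a float vector."""
    return np.array(json.loads(text), dtype=float)

def predict(user_id, item_id):
    """Dot product of the user's and the item's factor vectors."""
    return float(parse_factors(user_factors[user_id]) @ parse_factors(item_factors[item_id]))

score = predict(8528750, 24601)
print(score)
```

Both vectors have 10 entries, which matches the default number of factors.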