Collaborative Filtering (etrec)

The Collaborative Filtering (etrec) component provided by Platform for AI (PAI) uses etrec, an item-based collaborative filtering algorithm. The algorithm takes two input columns, a user column and an item column, and for each item returns the top N items with the highest similarity.
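To make the idea of item-based collaborative filtering concrete, the following Python sketch computes item-to-item similarity from (user, item) pairs and keeps the top N neighbors of each item. It uses the textbook Jaccard similarity (shared users divided by all users of either item). This is an illustration only: the function name `top_n_similar_items` is hypothetical, and the exact formulas that etrec uses internally are not described on this page.

```python
from collections import defaultdict
from itertools import combinations


def top_n_similar_items(pairs, top_n=2000):
    """Return {item: [(other_item, jaccard), ...]}, most similar first.

    Illustrative only: textbook Jaccard similarity, not etrec's internals.
    """
    # Collect the set of users who interacted with each item.
    users_by_item = defaultdict(set)
    for user, item in pairs:
        users_by_item[item].add(user)

    similar = defaultdict(list)
    for a, b in combinations(sorted(users_by_item), 2):
        shared = users_by_item[a] & users_by_item[b]
        if not shared:
            continue  # items with no common users get no similarity entry
        score = len(shared) / len(users_by_item[a] | users_by_item[b])
        similar[a].append((b, score))
        similar[b].append((a, score))

    # Keep only the top_n most similar items for each item.
    return {
        item: sorted(neighbors, key=lambda kv: (-kv[1], kv[0]))[:top_n]
        for item, neighbors in similar.items()
    }


# Two users who both interacted with items "0" and "1".
pairs = [("0", "0"), ("0", "1"), ("1", "0"), ("1", "1")]
print(top_n_similar_items(pairs))
# {'0': [('1', 1.0)], '1': [('0', 1.0)]}
```

Because both items are used by exactly the same users, each item is the other's only neighbor with a similarity of 1.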

Configure the component

You can use one of the following methods to configure the Collaborative Filtering (etrec) component.

Method 1: Configure the component in the PAI console

You can configure the parameters of the Collaborative Filtering (etrec) component on the pipeline page of Machine Learning Designer of PAI. The following table describes the parameters.

Tab	Parameter	Description
Fields Setting	User Column	The name of the user column.
	Item Column	The name of the item column.
	Delimiter between items in the output table	The delimiter that is used to separate items in the output table. The default delimiter is a space.
	Delimiter between key-value in the output table	The delimiter that is used to separate keys and values in the output table. The default delimiter is a colon (:). Spaces are not supported.
Parameters Setting	Similarity Type	The type of similarity. Valid values: WbCosine, asymcosine, and Jaccard.
	TopN	The maximum number of similar items that are retained for each item in the output table.
	Calculation Behavior	The method used to calculate the payload when an item of a user appears multiple times. Valid values: Add, Mul, Min, and Max. Note: This parameter is deprecated and has no effect.
	Minimum Item Value	If a user has fewer items than the value of this parameter, the behavior of the user is ignored.
	Maximum Item Value	If a user has more items than the value of this parameter, the behavior of the user is ignored.
	Smoothing Factor	The smoothing factor. This parameter takes effect only if the Similarity Type parameter is set to asymcosine.
	Weighting Coefficient	The weighting coefficient. This parameter takes effect only if the Similarity Type parameter is set to asymcosine.
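The effect of the Minimum Item Value and Maximum Item Value parameters can be sketched in Python as a pre-filter on the input pairs. This is a minimal sketch under assumptions: the function name `filter_users_by_item_count` is hypothetical, the sketch counts distinct items per user (the page does not say whether the component counts distinct items or raw records), and the defaults of 2 and 500 are taken from the minUserBehavior and maxUserBehavior command parameters.

```python
from collections import defaultdict


def filter_users_by_item_count(pairs, min_items=2, max_items=500):
    """Drop all records of users whose item count is outside [min_items, max_items].

    Assumption: counts distinct items per user; the component may count differently.
    """
    items_by_user = defaultdict(set)
    for user, item in pairs:
        items_by_user[user].add(item)

    # Users with too few or too many items are ignored entirely.
    kept_users = {
        user
        for user, items in items_by_user.items()
        if min_items <= len(items) <= max_items
    }
    return [(user, item) for user, item in pairs if user in kept_users]


pairs = [("u1", "a"), ("u1", "b"), ("u2", "a")]
print(filter_users_by_item_count(pairs))
# [('u1', 'a'), ('u1', 'b')]
```

In this example, user `u2` has only one item, which is below the minimum of 2, so that user's record does not contribute to the similarity calculation.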

Method 2: Configure the parameters by using PAI commands

Configure the component parameters by using a PAI command. You can use SQL scripts to call PAI commands. For more information, see SQL Script. The table that follows the sample command describes the parameters.

PAI -name pai_etrec
    -project algo_public
    -DsimilarityType="wbcosine"
    -Dweight="1"
    -DminUserBehavior="2"
    -Dlifecycle="28"
    -DtopN="2000"
    -Dalpha="0.5"
    -DoutputTableName="etrec_test_result"
    -DmaxUserBehavior="500"
    -DinputTableName="etrec_test_input"
    -DuserColName="user"
    -DitemColName="item"

Parameter	Required	Description	Default value
inputTableName	Yes	The name of the input table.	None
userColName	Yes	The name of the user column in the input table.	None
itemColName	Yes	The name of the item column in the input table.	None
inputTablePartitions	No	The partitions that are selected from the input table for training.	Full table
outputTableName	Yes	The name of the output table.	None
outputTablePartition	No	The partitions in the output table.	None
similarityType	No	The type of similarity. Valid values: wbcosine, asymcosine, and jaccard.	wbcosine
topN	No	The maximum number of most similar items that are retained for each item in the output table. Valid values: 1 to 10000.	2000
minUserBehavior	No	The minimum number of user behavior records.	2
maxUserBehavior	No	The maximum number of user behavior records.	500
itemDelimiter	No	The delimiter that is used to separate items in the output table.	Space
kvDelimiter	No	The delimiter that is used to separate keys and values in the output table.	Colon (:)
alpha	No	The smoothing factor when the similarityType parameter is set to asymcosine. Valid values: (0,1).	0.5
weight	No	The weighting coefficient when the similarityType parameter is set to asymcosine.	1.0
lifecycle	No	The lifecycle of the output table.	1
coreNum	No	The number of cores.	Determined by the system
memSizePerCore	No	The memory size of each core. Unit: MB.	Determined by the system
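If you generate PAI commands from a script, you can check values against the ranges in the preceding table before submission. The following Python sketch assembles the command text and validates similarityType, topN, and alpha. The helper name `build_etrec_command` is hypothetical and is not part of any PAI SDK. The sketch only produces a string, and it assumes that (0,1) denotes an open interval.

```python
VALID_SIMILARITY_TYPES = {"wbcosine", "asymcosine", "jaccard"}


def build_etrec_command(input_table, output_table, user_col, item_col, **options):
    """Return the text of a pai_etrec command (hypothetical helper, string only)."""
    similarity = options.get("similarityType", "wbcosine")
    if similarity not in VALID_SIMILARITY_TYPES:
        raise ValueError(f"unsupported similarityType: {similarity!r}")

    top_n = int(options.get("topN", 2000))
    if not 1 <= top_n <= 10000:
        raise ValueError("topN must be between 1 and 10000")

    # Assumption: (0,1) in the parameter table means 0 < alpha < 1.
    alpha = float(options.get("alpha", 0.5))
    if not 0 < alpha < 1:
        raise ValueError("alpha must be greater than 0 and less than 1")

    params = {
        "inputTableName": input_table,
        "userColName": user_col,
        "itemColName": item_col,
        "outputTableName": output_table,
        **options,  # optional parameters are passed through unchanged
    }
    lines = ["PAI -name pai_etrec", "    -project algo_public"]
    lines += [f'    -D{name}="{value}"' for name, value in params.items()]
    return "\n".join(lines) + ";"


print(build_etrec_command(
    "etrec_test_input", "etrec_test_result", "user", "item",
    similarityType="jaccard", topN=100,
))
```

The returned text can be pasted into an SQL script. Parameters that are not passed to the helper are omitted from the command, so the component falls back to its defaults.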

Examples

Execute the following SQL statements to generate training data:

drop table if exists etrec_test_input;
create table etrec_test_input
as
select
    *
from
(
    select
        cast(0 as string) as user,
        cast(0 as string) as item
    union all
        select
            cast(0 as string) as user,
            cast(1 as string) as item
    union all
        select
            cast(1 as string) as user,
            cast(0 as string) as item
    union all
        select
            cast(1 as string) as user,
            cast(1 as string) as item
) a;

A training data table named etrec_test_input is generated.

user	item
0	0
0	1
1	0
1	1

Run the following PAI command to submit training parameters:

drop table if exists etrec_test_result;
PAI -name pai_etrec
    -project algo_public
    -DsimilarityType="wbcosine"
    -Dweight="1"
    -DminUserBehavior="2"
    -Dlifecycle="28"
    -DtopN="2000"
    -Dalpha="0.5"
    -DoutputTableName="etrec_test_result"
    -DmaxUserBehavior="500"
    -DinputTableName="etrec_test_input"
    -DuserColName="user"
    -DitemColName="item";

View the output table named etrec_test_result. Each row contains an item and its most similar items in item:similarity format.

itemid	similarity
0	1:1
1	0:1
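To consume the similarity column downstream, split each value on the item delimiter and then on the key-value delimiter. The following Python sketch assumes the defaults described earlier (a space between items and a colon between key and value). The function name `parse_similarity` is hypothetical.

```python
def parse_similarity(value, item_delimiter=" ", kv_delimiter=":"):
    """Parse a similarity string such as "2:0.5 3:0.25" into (item, score) tuples."""
    result = []
    for entry in value.split(item_delimiter):
        if not entry:
            continue  # skip empty fragments caused by repeated delimiters
        # Split on the last delimiter so the score is always the final field.
        item, _, score = entry.rpartition(kv_delimiter)
        result.append((item, float(score)))
    return result


print(parse_similarity("1:1"))  # [('1', 1.0)]
print(parse_similarity("2:0.5 3:0.25"))  # [('2', 0.5), ('3', 0.25)]
```

If you set itemDelimiter or kvDelimiter to other values in the PAI command, pass the same values to the parser.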