The Collaborative Filtering (etrec) component of Platform for AI (PAI) uses etrec, an item-based collaborative filtering algorithm. The algorithm takes two input columns, a user column and an item column, and returns for each item the top N items with the highest similarity.
Configure the component
You can use one of the following methods to configure the Collaborative Filtering (etrec) component.
Method 1: Configure the component in the PAI console
You can configure the parameters of the Collaborative Filtering (etrec) component on the pipeline page of Machine Learning Designer of PAI. The following table describes the parameters.
Tab | Parameter | Description |
Fields Setting | User Column | The name of the user column. |
Fields Setting | Item Column | The name of the item column. |
Fields Setting | Delimiter between items in the output table | The delimiter that is used to separate items in the output table. The default delimiter is a space. |
Fields Setting | Delimiter between key-value in the output table | The delimiter that is used to separate keys and values in the output table. The default delimiter is a colon (:). Spaces are not supported. |
Parameters Setting | Similarity Type | The type of similarity. Valid values: WbCosine, asymcosine, and Jaccard. |
Parameters Setting | TopN | The maximum number of similar items that are retained for each item in the output table. |
Parameters Setting | Calculation Behavior | The method used to calculate the payload when an item of a user appears multiple times. Valid values: Add, Mul, Min, and Max. Note: This parameter is deprecated and has no effect. |
Parameters Setting | Minimum Item Value | If the number of items of a user is less than the value of this parameter, the behavior of the user is ignored. |
Parameters Setting | Maximum Item Value | If the number of items of a user is greater than the value of this parameter, the behavior of the user is ignored. |
Parameters Setting | Smoothing Factor | The smoothing factor. This parameter is valid only if the Similarity Type parameter is set to asymcosine. |
Parameters Setting | Weighting Coefficient | The weighting coefficient. This parameter is valid only if the Similarity Type parameter is set to asymcosine. |
Method 2: Configure the parameters by using PAI commands
You can call PAI commands from an SQL script. For more information, see SQL Script. The following sample command and table describe the parameters.
PAI -name pai_etrec
-project algo_public
-DsimilarityType="wbcosine"
-Dweight="1"
-DminUserBehavior="2"
-Dlifecycle="28"
-DtopN="2000"
-Dalpha="0.5"
-DoutputTableName="etrec_test_result"
-DmaxUserBehavior="500"
-DinputTableName="etrec_test_input"
-DuserColName="user"
-DitemColName="item"
Parameter | Required | Description | Default value |
inputTableName | Yes | The name of the input table. | None |
userColName | Yes | The name of the user column in the input table. | None |
itemColName | Yes | The name of the item column in the input table. | None |
inputTablePartitions | No | The partitions that are selected from the input table for training. | Full table |
outputTableName | Yes | The name of the output table. | None |
outputTablePartition | No | The partitions in the output table. | None |
similarityType | No | The type of similarity. Valid values: wbcosine, asymcosine, and jaccard. | wbcosine |
topN | No | The maximum number of most similar items that are retained for each item in the output table. Valid values: 1 to 10000. | 2000 |
minUserBehavior | No | The minimum number of user behavior records. | 2 |
maxUserBehavior | No | The maximum number of user behavior records. | 500 |
itemDelimiter | No | The delimiter that is used to separate items in the output table. | Space |
kvDelimiter | No | The delimiter that is used to separate keys and values in the output table. | Colon (:) |
alpha | No | The smoothing factor. This parameter takes effect only if the similarityType parameter is set to asymcosine. Valid values: (0,1). | 0.5 |
weight | No | The weighting coefficient. This parameter takes effect only if the similarityType parameter is set to asymcosine. | 1.0 |
lifecycle | No | The lifecycle of the output table. | 1 |
coreNum | No | The number of cores. | Determined by the system |
memSizePerCore | No | The memory size of each core. Unit: MB. | Determined by the system |
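The effect of minUserBehavior and maxUserBehavior can be illustrated with a minimal Python sketch. It is not the etrec implementation: the function name filter_users is hypothetical, and the sketch assumes that the thresholds are compared against the number of distinct items per user, which this topic does not state explicitly.

```python
from collections import defaultdict


def filter_users(rows, min_user_behavior=2, max_user_behavior=500):
    """Drop all rows of users whose item count falls outside the bounds.

    Assumption: the count is the number of distinct items per user.
    """
    items_by_user = defaultdict(set)
    for user, item in rows:
        items_by_user[user].add(item)

    # Users with too few or too many items are ignored entirely.
    kept_users = {
        user
        for user, items in items_by_user.items()
        if min_user_behavior <= len(items) <= max_user_behavior
    }
    return [(user, item) for user, item in rows if user in kept_users]


rows = [
    ("u1", "a"), ("u1", "b"),               # 2 items: kept
    ("u2", "a"),                            # 1 item: below the minimum
    ("u3", "a"), ("u3", "b"), ("u3", "c"),  # 3 items: above the maximum
]
filtered = filter_users(rows, min_user_behavior=2, max_user_behavior=2)
print(filtered)
```

With both thresholds set to 2, only the rows of u1 remain. With the default values (2 and 500), u3 is kept as well and only u2 is dropped.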
Examples
Execute the following SQL statements to generate training data:
drop table if exists etrec_test_input;
create table etrec_test_input
as
select
    *
from
(
    select cast(0 as string) as user, cast(0 as string) as item
    union all
    select cast(0 as string) as user, cast(1 as string) as item
    union all
    select cast(1 as string) as user, cast(0 as string) as item
    union all
    select cast(1 as string) as user, cast(1 as string) as item
) a;
A training data table named etrec_test_input is generated.
user | item |
0 | 0 |
0 | 1 |
1 | 0 |
1 | 1 |
Run the following PAI command to submit training parameters:
drop table if exists etrec_test_result;
PAI -name pai_etrec
-project algo_public
-DsimilarityType="wbcosine"
-Dweight="1"
-DminUserBehavior="2"
-Dlifecycle="28"
-DtopN="2000"
-Dalpha="0.5"
-DoutputTableName="etrec_test_result"
-DmaxUserBehavior="500"
-DinputTableName="etrec_test_input"
-DuserColName="user"
-DitemColName="item";
View the output table etrec_test_result.
itemid | similarity |
0 | 1:1 |
1 | 0:1 |
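To make the output format easier to read, the following Python sketch computes item-to-item similarity on the same four rows and formats each result as item, key-value delimiter, score. It is an illustration only: the function name jaccard_top_n is hypothetical, and the sketch uses plain Jaccard similarity over user sets rather than wbcosine, whose formula is not given in this topic. The two measures happen to agree here because both items were chosen by exactly the same users.

```python
from collections import defaultdict


def jaccard_top_n(rows, top_n=2000, item_delimiter=" ", kv_delimiter=":"):
    """Return {item: "other<kv>score<item_delim>..."} using Jaccard similarity.

    Illustrative stand-in only; this is not the etrec algorithm itself.
    """
    users_by_item = defaultdict(set)
    for user, item in rows:
        users_by_item[item].add(user)

    result = {}
    for item, users in users_by_item.items():
        scored = []
        for other, other_users in users_by_item.items():
            if other == item:
                continue
            shared = len(users & other_users)
            if shared == 0:
                continue  # items with no common user are not similar
            scored.append((other, shared / len(users | other_users)))
        # Highest similarity first; ties are broken by item ID.
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        result[item] = item_delimiter.join(
            f"{other}{kv_delimiter}{score:g}" for other, score in scored[:top_n]
        )
    return result


# Same rows as etrec_test_input: (user, item)
rows = [("0", "0"), ("0", "1"), ("1", "0"), ("1", "1")]
similar = jaccard_top_n(rows)
for item_id, similarity in sorted(similar.items()):
    print(item_id, similarity)
```

Each value in the similarity column is therefore a list of similar items and their scores: item 0 is most similar to item 1 with a score of 1, and the reverse holds for item 1. If an item has more than one similar item, the entries are joined with the item delimiter.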