Swing is an item recall algorithm. You can use the Swing Train component of Platform for AI (PAI) to measure the similarity of items based on user-item-user principles. This topic describes how to configure the Swing Train component.
Limits
You can use the Swing Train component based on the computing resources of MaxCompute and Realtime Compute for Apache Flink.
Configure the component
You can configure the component by using one of the following methods:
Method 1: Configure the component in the PAI console
Configure the Swing Train component on the pipeline page of Machine Learning Designer. The following table describes the parameters.
Tab | Parameter | Description |
Field Setting | itemCol | The name of the item column. |
Field Setting | userCol | The name of the user column. |
Parameter Setting | alpha | The alpha parameter, which is a smoothing factor. Default value: 1.0. |
Parameter Setting | maxItemNumber | The maximum number of users who use an item for the calculation. Default value: 1000. Note: If an item occurs more times than this value, the algorithm randomly samples this number of users from all users of the item. |
Parameter Setting | maxUserItems | The maximum number of items used by a user for the calculation. Default value: 1000. Note: If a user has more items than this value, the user is excluded from the calculation. |
Parameter Setting | minUserItems | The minimum number of items used by a user for the calculation. Default value: 10. Note: If a user has fewer items than this value, the user is excluded from the calculation. |
Parameter Setting | resultNormalize | Specifies whether to normalize the results. |
Parameter Setting | userAlpha | The alpha parameter for users. Default value: 5.0. |
Parameter Setting | userBeta | The beta parameter for users. Default value: -0.35. |
Execute Tuning | Number of Workers | The number of worker nodes. The value must be a positive integer from 1 to 9999. This parameter must be used together with the Memory per worker parameter. |
Execute Tuning | Memory per worker | The memory size of each worker node, in MB. The value must be a positive integer from 1024 to 65536. |
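The effect of minUserItems, maxUserItems, and maxItemNumber can be pictured with a small pure-Python sketch. This is an illustration of the limits as described in the table, not the component's implementation. The function name `prefilter`, the order of the two steps, and the choice to count distinct items per user are all assumptions.

```python
import random
from collections import defaultdict


def prefilter(interactions, min_user_items=10, max_user_items=1000,
              max_item_number=1000, seed=0):
    """Return {item: set(users)} after applying the three limits.

    Hypothetical helper: the step order and distinct-item counting
    are assumptions, not documented behavior of the component.
    """
    user_items = defaultdict(set)
    for user, item in interactions:
        user_items[user].add(item)

    # Users with too few or too many items are excluded entirely.
    kept = {u: items for u, items in user_items.items()
            if min_user_items <= len(items) <= max_user_items}

    item_users = defaultdict(set)
    for user, items in kept.items():
        for item in items:
            item_users[item].add(user)

    # Items with too many users keep a random sample of max_item_number users.
    rng = random.Random(seed)
    capped = {}
    for item, users in item_users.items():
        if len(users) > max_item_number:
            users = set(rng.sample(sorted(users), max_item_number))
        capped[item] = users
    return capped


interactions = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"),
                ("u3", "a"), ("u3", "b"), ("u4", "a")]
result = prefilter(interactions, min_user_items=2, max_user_items=5,
                   max_item_number=2)
print(result)
```

In this toy input, u4 has only one item and is dropped, and each item is capped at two of its three remaining users.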
Method 2: Configure the component by using Python code
You can configure the Swing Train component by using the PyAlink Script component to call Python code. For more information, see the PyAlink script documentation.
Parameter | Required | Description | Default value |
itemCol | Yes | The name of the item column. | N/A |
userCol | Yes | The name of the user column. | N/A |
alpha | No | The alpha parameter, which is a smoothing factor. | 1.0 |
userAlpha | No | The alpha parameter for users. Note: This parameter is used to calculate the weight of a user by using the following formula: User weight = 1.0 / (userAlpha + userClickCount)^userBeta. | 5.0 |
userBeta | No | The beta parameter for users. Note: This parameter is used to calculate the weight of a user by using the following formula: User weight = 1.0 / (userAlpha + userClickCount)^userBeta. | -0.35 |
resultNormalize | No | Specifies whether to normalize the results. | false |
maxItemNumber | No | The maximum number of users who use an item for the calculation. Note: If an item occurs more times than this value, the algorithm randomly samples this number of users from all users of the item. | 1000 |
minUserItems | No | The minimum number of items used by a user for the calculation. Note: If a user has fewer items than this value, the user is excluded from the calculation. | 10 |
maxUserItems | No | The maximum number of items used by a user for the calculation. Note: If a user has more items than this value, the user is excluded from the calculation. | 1000 |
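The user weight formula in the table can be evaluated directly. The following sketch only transcribes that formula. The function name `user_weight` is hypothetical, and the click counts are made-up inputs.

```python
def user_weight(click_count, user_alpha=5.0, user_beta=-0.35):
    """User weight = 1.0 / (userAlpha + userClickCount)^userBeta."""
    return 1.0 / (user_alpha + click_count) ** user_beta


# Evaluate the formula with the default userAlpha and userBeta.
for clicks in (1, 10, 100):
    print(clicks, user_weight(clicks))
```

Because the default userBeta is negative, the expression as written grows with the click count. A positive userBeta makes it shrink instead.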
Sample Python code:
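from pyalink.alink import *
import pandas as pd

# For a local PyAlink session, initialize the environment first, for example:
# useLocalEnv(1)

# Sample interactions: user ID, item ID, rating.
df_data = pd.DataFrame([
    ["a1", "11L", 2.2],
    ["a1", "12L", 2.0],
    ["a2", "11L", 2.0],
    ["a2", "12L", 2.0],
    ["a3", "12L", 2.0],
    ["a3", "13L", 2.0],
    ["a4", "13L", 2.0],
    ["a4", "14L", 2.0],
    ["a5", "14L", 2.0],
    ["a5", "15L", 2.0],
    ["a6", "15L", 2.0],
    ["a6", "16L", 2.0],
])

data = BatchOperator.fromDataframe(df_data, schemaStr='user string, item string, rating double')

# Train the Swing model. minUserItems is lowered to 1 so that the
# two-item users in this small dataset are not filtered out.
model = SwingTrainBatchOp()\
    .setUserCol("user")\
    .setItemCol("item")\
    .setMinUserItems(1)\
    .linkFrom(data)

model.print()

# Recommend similar items for each item in the input data.
predictor = SwingRecommBatchOp()\
    .setItemCol("item")\
    .setRecommCol("prediction_result")

predictor.linkFrom(model, data).print()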
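To build intuition for the scores that the model produces, the following pure-Python sketch computes the commonly cited unweighted form of the Swing score on the same sample data: for each item pair, sum 1 / (alpha + overlap) over every pair of users who interacted with both items, where overlap is the number of items the two users share. This is an assumption about the general algorithm. It omits user weights and normalization, so the component's actual output may differ. The function name `swing_scores` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def swing_scores(interactions, alpha=1.0):
    """Unweighted Swing scores for item pairs (illustrative sketch only)."""
    user_items = defaultdict(set)
    item_users = defaultdict(set)
    for user, item in interactions:
        user_items[user].add(item)
        item_users[item].add(user)

    scores = defaultdict(float)
    for item_i, item_j in combinations(sorted(item_users), 2):
        common_users = sorted(item_users[item_i] & item_users[item_j])
        # Each pair of users who both touched item_i and item_j contributes,
        # damped by how many items the two users have in common.
        for u, v in combinations(common_users, 2):
            overlap = len(user_items[u] & user_items[v])
            scores[(item_i, item_j)] += 1.0 / (alpha + overlap)
    return dict(scores)


interactions = [
    ("a1", "11L"), ("a1", "12L"), ("a2", "11L"), ("a2", "12L"),
    ("a3", "12L"), ("a3", "13L"), ("a4", "13L"), ("a4", "14L"),
    ("a5", "14L"), ("a5", "15L"), ("a6", "15L"), ("a6", "16L"),
]
scores = swing_scores(interactions)
print(scores)
```

In this sketch, only the pair (11L, 12L) receives a nonzero score, because a1 and a2 are the only two users who share the same pair of items. Every other item pair has at most one user in common.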
Examples
The following figure shows a sample pipeline in which the Swing Train component is used. In this example, perform the following steps to configure the components shown in the figure:
1. Prepare a training dataset and a test dataset.
   Create two MaxCompute tables named Table 1 and Table 2. Table 1 contains the userid and itemid fields, and Table 2 contains the itemid field. The fields are of the STRING type.
   Run the tunnel command on the MaxCompute client to upload the training dataset to Table 1 and the test dataset to Table 2.
   Then, set the Table Name parameter of the Read Table-1 component to Table 1 and the Table Name parameter of the Read Table-2 component to Table 2.
   For information about how to install and configure the MaxCompute client, see MaxCompute client (odpscmd). For information about Tunnel commands, see Tunnel commands.
2. Import the training dataset to the Swing Train component and configure the component parameters. For more information, see the Method 1: Configure the component in the PAI console section of this topic.
3. Use the test dataset and the trained model as input to the Swing Recommendation component to perform prediction.
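For the dataset preparation step, the local files to upload could be produced as follows. This is a minimal sketch: the file names, the sample rows, and the choice to write CSV files without a header row are assumptions. The upload itself is still done with the Tunnel commands on the MaxCompute client.

```python
import csv
import tempfile
from pathlib import Path


def write_csv(path, rows):
    """Write rows to a CSV file without a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


# Hypothetical sample rows that match the table layouts in the steps above.
train_rows = [("a1", "11L"), ("a1", "12L"), ("a2", "11L"), ("a2", "12L")]  # userid, itemid
test_rows = [("11L",), ("12L",)]                                          # itemid

out_dir = Path(tempfile.mkdtemp())
train_path = out_dir / "swing_train.csv"  # to be uploaded to Table 1
test_path = out_dir / "swing_test.csv"    # to be uploaded to Table 2
write_csv(train_path, train_rows)
write_csv(test_path, test_rows)
print(train_path, test_path)
```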