Parameter configuration of the Swing Train component - Platform for AI

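Swing is an item recall algorithm. You can use the Swing Train component of Platform for AI (PAI) to measure the similarity between items based on user-item-user structures: if two users both interacted with the same two items, the items are considered related, and the fewer other items the two users have in common, the stronger this evidence is. This topic describes how to configure the Swing Train component.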
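To make the user-item-user idea concrete, the following minimal sketch computes an unweighted Swing score in plain Python, using the commonly published form of the formula: every unordered pair of distinct users u and v who both interacted with items i and j adds 1 / (alpha + |I_u ∩ I_v|) to the score of (i, j), where I_u is the item set of user u. This is not PAI's implementation. The function name swing_scores is hypothetical, and the user weights (userAlpha, userBeta) and the user and item limits described later are deliberately omitted.

```python
from collections import defaultdict
from itertools import combinations


def swing_scores(interactions, alpha=1.0):
    """Unweighted Swing score for every item pair (simplified sketch).

    interactions: iterable of (user, item) pairs.
    Each unordered pair of distinct users u, v adds 1 / (alpha + |I_u & I_v|)
    to every item pair (i, j) that both users interacted with, where I_u is
    the set of items of user u.
    """
    user_items = defaultdict(set)
    for user, item in interactions:
        user_items[user].add(item)

    scores = defaultdict(float)
    for u, v in combinations(sorted(user_items), 2):
        common = user_items[u] & user_items[v]
        if len(common) < 2:
            continue  # the two users must share at least two items to form a swing
        contribution = 1.0 / (alpha + len(common))
        for i, j in combinations(sorted(common), 2):
            scores[(i, j)] += contribution
    return dict(scores)


# (user, item) pairs taken from the sample data used later in this topic
sample = [("a1", "11L"), ("a1", "12L"), ("a2", "11L"), ("a2", "12L"),
          ("a3", "12L"), ("a3", "13L"), ("a4", "13L"), ("a4", "14L"),
          ("a5", "14L"), ("a5", "15L"), ("a6", "15L"), ("a6", "16L")]

print(swing_scores(sample))  # {('11L', '12L'): 0.3333333333333333}
```

In this sample, only users a1 and a2 share two items, so only the item pair 11L and 12L receives a score, and that score is 1 / (1.0 + 2).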

Limits

The Swing Train component supports the following computing resources: MaxCompute and Realtime Compute for Apache Flink.

Configure the component

You can configure the component by using one of the following methods:

Method 1: Configure the component in the PAI console

Configure the Swing Train component on the pipeline page of Machine Learning Designer. The following table describes the parameters.

Tab	Parameter	Description
Field Setting	itemCol	The name of the item column.
Field Setting	userCol	The name of the user column.
Parameter Setting	alpha	The alpha parameter, which is a smoothing factor. Default value: 1.0.
Parameter Setting	maxItemNumber	The maximum number of users per item that are used for the calculation. Default value: 1000. Note: If an item has more users than this value, the algorithm randomly samples this number of users from all users of the item.
Parameter Setting	maxUserItems	The maximum number of items that a user can have to be included in the calculation. Default value: 1000. Note: Users who have more items than this value are excluded from the calculation.
Parameter Setting	minUserItems	The minimum number of items that a user must have to be included in the calculation. Default value: 10. Note: Users who have fewer items than this value are excluded from the calculation.
Parameter Setting	resultNormalize	Specifies whether to normalize the results.
Parameter Setting	userAlpha	The alpha parameter for users. Default value: 5.0.
Parameter Setting	userBeta	The beta parameter for users. Default value: -0.35.
Execute Tuning	Number of Workers	The number of worker nodes. The value must be a positive integer from 1 to 9999. This parameter must be used together with the Memory per worker parameter.
Execute Tuning	Memory per worker	The memory size of each worker node. Unit: MB. The value must be a positive integer from 1024 to 65536.
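The notes for minUserItems, maxUserItems, and maxItemNumber describe a filtering and sampling step. The following sketch illustrates that step in plain Python. It is a hypothetical illustration, not PAI's code: the function name filter_interactions, the order of operations (users are filtered before per-item sampling), and the seeded random sampler are all assumptions.

```python
import random
from collections import defaultdict


def filter_interactions(interactions, min_user_items=10, max_user_items=1000,
                        max_item_number=1000, seed=0):
    """Hypothetical pre-filtering that mirrors minUserItems, maxUserItems,
    and maxItemNumber as described in the table. Returns item -> list of users."""
    user_items = defaultdict(set)
    for user, item in interactions:
        user_items[user].add(item)

    # Exclude users with fewer than min_user_items or more than max_user_items items
    item_users = defaultdict(list)
    for user in sorted(user_items):
        items = user_items[user]
        if min_user_items <= len(items) <= max_user_items:
            for item in sorted(items):
                item_users[item].append(user)

    # For items with more than max_item_number users, randomly sample that many users
    rng = random.Random(seed)
    sampled = {}
    for item, users in item_users.items():
        if len(users) > max_item_number:
            users = sorted(rng.sample(users, max_item_number))
        sampled[item] = users
    return sampled


data = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"),
        ("u3", "i1"), ("u3", "i2"), ("u3", "i3")]
# u2 has only one item and is excluded; i1 and i2 each keep one sampled user
print(filter_interactions(data, min_user_items=2, max_user_items=3, max_item_number=1))
```

The thresholds are inclusive, matching the notes: a user is excluded only when the item count is strictly below minUserItems or strictly above maxUserItems.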

Method 2: Configure the component by using Python code

You can configure the Swing Train component by using the PyAlink Script component to call Python code. For more information, see the PyAlink script documentation.

Parameter	Required	Description	Default value
itemCol	Yes	The name of the item column.	N/A
userCol	Yes	The name of the user column.	N/A
alpha	No	The alpha parameter, which is a smoothing factor.	1.0
userAlpha	No	The alpha parameter for users. Note: This parameter is used to calculate the weight of a user by using the following formula: User weight = 1.0/(userAlpha + userClickCount)^userBeta.	5.0
userBeta	No	The beta parameter for users. Note: This parameter is used to calculate the weight of a user by using the following formula: User weight = 1.0/(userAlpha + userClickCount)^userBeta.	-0.35
resultNormalize	No	Specifies whether to normalize the results.	false
maxItemNumber	No	The maximum number of users per item that are used for the calculation. Note: If an item has more users than this value, the algorithm randomly samples this number of users from all users of the item.	1000
minUserItems	No	The minimum number of items that a user must have to be included in the calculation. Note: Users who have fewer items than this value are excluded from the calculation.	10
maxUserItems	No	The maximum number of items that a user can have to be included in the calculation. Note: Users who have more items than this value are excluded from the calculation.	1000
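The user weight formula in the userAlpha and userBeta rows can be evaluated directly, as in the following sketch. user_weight implements only the documented formula. weighted_pair_contribution is a hypothetical helper that assumes, as in the commonly published Swing formulation, that the weights of two users multiply the smoothed term 1 / (alpha + overlap) for that user pair; PAI may combine the weights differently.

```python
def user_weight(click_count, user_alpha=5.0, user_beta=-0.35):
    """User weight as documented: 1.0 / (userAlpha + userClickCount) ** userBeta."""
    return 1.0 / (user_alpha + click_count) ** user_beta


def weighted_pair_contribution(count_u, count_v, overlap, alpha=1.0,
                               user_alpha=5.0, user_beta=-0.35):
    """Hypothetical: contribution of one user pair (u, v) to an item pair's score,
    assuming the two user weights multiply 1 / (alpha + overlap), where overlap
    is the number of items that u and v have in common."""
    w_u = user_weight(count_u, user_alpha, user_beta)
    w_v = user_weight(count_v, user_alpha, user_beta)
    return w_u * w_v / (alpha + overlap)


# Weights for a few click counts under the default userAlpha and userBeta
for clicks in (1, 10, 100):
    print(clicks, round(user_weight(clicks), 3))
```

Because the default userBeta is negative, the formula reduces to (userAlpha + userClickCount)^0.35, so the weight grows slowly as the click count increases. A positive userBeta would instead give less weight to users with many clicks.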

Sample Python code:

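import pandas as pd
from pyalink.alink import *

# Outside the PyAlink Script component, initialize an execution environment first, for example: useLocalEnv(1)

# Sample user-item interactions: user, item, rating
df_data = pd.DataFrame([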
    ["a1", "11L", 2.2],
    ["a1", "12L", 2.0],
    ["a2", "11L", 2.0],
    ["a2", "12L", 2.0],
    ["a3", "12L", 2.0],
    ["a3", "13L", 2.0],
    ["a4", "13L", 2.0],
    ["a4", "14L", 2.0],
    ["a5", "14L", 2.0],
    ["a5", "15L", 2.0],
    ["a6", "15L", 2.0],
    ["a6", "16L", 2.0],
])

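# Convert the DataFrame into a batch data source
data = BatchOperator.fromDataframe(df_data, schemaStr='user string, item string, rating double')

# Train the Swing model. minUserItems is lowered to 1 because each user in this
# small dataset has only two items, which is below the default value of 10.
model = SwingTrainBatchOp()\
    .setUserCol("user")\
    .setItemCol("item")\
    .setMinUserItems(1)\
    .linkFrom(data)

model.print()

# Recommend similar items for each item in the input data
predictor = SwingRecommBatchOp()\
    .setItemCol("item")\
    .setRecommCol("prediction_result")

predictor.linkFrom(model, data).print()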
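Conceptually, the recommendation step looks up, for each input item, the items with the highest similarity scores in the trained model. The following sketch shows that lookup over a plain Python dictionary. It is not how SwingRecommBatchOp is implemented: the function name recommend_similar, the dictionary model format, and the scores in the example are hypothetical.

```python
import heapq


def recommend_similar(similarity, query_item, k=10):
    """Return up to k (item, score) pairs most similar to query_item.

    similarity: dict mapping (item_a, item_b) -> score; each pair is treated
    as symmetric, so (a, b) also counts as (b, a).
    """
    candidates = {}
    for (a, b), score in similarity.items():
        if a == query_item:
            candidates[b] = score
        elif b == query_item:
            candidates[a] = score
    return heapq.nlargest(k, candidates.items(), key=lambda kv: kv[1])


# Hypothetical scores for illustration only, not output of the sample above
similarity = {("11L", "12L"): 0.5, ("12L", "13L"): 0.2, ("13L", "14L"): 0.1}
print(recommend_similar(similarity, "12L", k=2))  # [('11L', 0.5), ('13L', 0.2)]
```

heapq.nlargest keeps only the top k candidates, which avoids sorting the full candidate list when an item has many neighbors.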

Examples

The following figure shows a sample pipeline in which the Swing Train component is used.

[Figure: Usage example]

In this example, the following steps are performed to configure the components in the preceding figure:

1. Prepare a training dataset and a test dataset. Create two MaxCompute tables named Table 1 and Table 2. Table 1 contains the userid and itemid fields, and Table 2 contains the itemid field. All fields are of the STRING type. Run Tunnel commands on the MaxCompute client to upload the training dataset to Table 1 and the test dataset to Table 2. Then, set the Table Name parameter of the Read Table-1 component to Table 1 and the Table Name parameter of the Read Table-2 component to Table 2. For information about how to install and configure the MaxCompute client, see MaxCompute client (odpscmd). For information about Tunnel commands, see Tunnel commands.
2. Connect the training dataset to the Swing Train component and configure the component parameters. For more information, see the Method 1: Configure the component in the PAI console section of this topic.
3. Use the test dataset and the trained model as input to the Swing Recommendation component to perform prediction.
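Before the Tunnel upload in the first step, the two datasets must exist as local files whose columns match Table 1 (userid, itemid) and Table 2 (itemid). The following sketch writes such CSV files with the Python standard library. The function name write_datasets and the file names are hypothetical, the test set is assumed to be the distinct items of the training data, and no header row is written; adjust these choices to match your data and your Tunnel upload options.

```python
import csv
import os
import tempfile


def write_datasets(interactions, train_path, test_path):
    """Write a training file (userid, itemid) and a test file (itemid)."""
    with open(train_path, "w", newline="") as f:
        writer = csv.writer(f)
        for user_id, item_id in interactions:
            writer.writerow([user_id, item_id])

    # Assumption: the test set is the distinct items for which similar items are requested
    test_items = sorted({item_id for _, item_id in interactions})
    with open(test_path, "w", newline="") as f:
        writer = csv.writer(f)
        for item_id in test_items:
            writer.writerow([item_id])
    return test_items


sample = [("a1", "11L"), ("a1", "12L"), ("a2", "11L")]
out_dir = tempfile.mkdtemp()
train_file = os.path.join(out_dir, "train.csv")
test_file = os.path.join(out_dir, "test.csv")
items = write_datasets(sample, train_file, test_file)
print(items)  # ['11L', '12L']
```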