Factorization Machine (FM) is a nonlinear model designed to capture interactions between features. It is well suited to recommendation scenarios such as e-commerce, advertising, and live streaming. Machine Learning Designer provides an Alink-based template that helps you build FM models and deploy recommendation systems. This topic describes how to use the preset FM algorithm template in Machine Learning Designer to build recommendation models.
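To make "interactions between features" concrete, the following minimal sketch scores one sparse sample with a standard second-order FM: a bias, a linear term, and a pairwise term in which each pair of features is weighted by the dot product of their factor vectors. It is an illustration of the general FM formula only, not the implementation used by the template. The function name fm_score and all parameter names are hypothetical.

```python
from typing import Dict, List


def fm_score(x: Dict[int, float], w0: float, w: Dict[int, float],
             v: Dict[int, List[float]], k: int) -> float:
    """Score one sparse sample x (feature index -> value) with a second-order FM."""
    # Linear term: sum of w_i * x_i over the non-zero features.
    linear = sum(w.get(i, 0.0) * xi for i, xi in x.items())
    # Pairwise term, using the identity
    # sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)
    pairwise = 0.0
    for f in range(k):
        s = 0.0
        s_sq = 0.0
        for i, xi in x.items():
            t = v[i][f] * xi
            s += t
            s_sq += t * t
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


# Two active features with 2-dimensional factor vectors (made-up numbers).
score = fm_score(
    x={0: 1.0, 1: 2.0},
    w0=0.5,
    w={0: 0.1, 1: 0.2},
    v={0: [1.0, 0.0], 1: [0.5, 1.0]},
    k=2,
)
print(score)
```

The identity in the comment keeps the cost linear in the number of non-zero features, which is why FM models remain practical on very sparse recommendation data.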
Prerequisites
A workspace is created. For more information, see Create a workspace.
MaxCompute resources are associated with the workspace. For more information, see Manage workspaces.
Fully-managed Flink resources are purchased and associated with the workspace. For more information, see Flink resource quotas.
Procedure
Go to the Machine Learning Designer page.
Log on to the PAI console.
In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.
In the left-side navigation pane, go to the Machine Learning Designer page.
Create a pipeline.
On the Visualized Modeling (Designer) page, click the Preset Templates tab.
On this tab, find the [Alink]FM-Embedding for Rec-System template and click Create.
In the Create Pipeline dialog box, configure the parameters. You can use their default values.
The Pipeline Data Path parameter specifies the Object Storage Service (OSS) bucket path that stores the temporary data and models generated while the pipeline runs.
Click OK.
Creating the pipeline takes about 10 seconds.
On the Pipelines tab, click the created pipeline. In the Basic Information section on the right side, click Open.
You can view the created pipeline on a canvas, as shown in the following figure.
This template provides two methods to use Alink for FM training and prediction:
Method 1: Use encapsulated Alink components
Alink provides encapsulated components for FM training and prediction, which are marked with a purple dot in the pipeline. Alink components can run in groups. For information about how to run Alink components in groups and the advantages and disadvantages of doing so, see Advanced feature: Alink components run in groups.
Method 2: Use custom PyAlink components
You can use custom PyAlink components to perform FM training and prediction by using Python code. This method implements the same functionality as Method 1.
Set the parameters of the FM Training-1 component.
Click the FM Training-1 component on the canvas.
On the Fields Setting tab in the right-side pane, set the fields listed in the following table.
Field | Description
Feature Columns | The name of the feature column. Values in this column must be in the key:value format, with multiple key-value pairs separated by commas (,).
Label Column | The name of the label column. The label column must be of the DOUBLE data type.
The FM algorithm provided by Machine Learning Designer requires data in the LIBSVM format. To convert data from other formats to LIBSVM, use a one-hot encoding component and make sure that the input data includes a feature column and a label column, as shown in the following figure.
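As a rough illustration of the target format, the following sketch one-hot encodes categorical values and writes each sample as comma-separated key:value pairs, as described for the Feature Columns field. It is not the one-hot encoding component itself. The helper names build_index and to_kv_string, the sample categories, and the choice to start feature indexes at 0 are all assumptions made for this example.

```python
from typing import Dict, List


def build_index(categories: List[str]) -> Dict[str, int]:
    """Assign each distinct category a stable feature index (sorted order)."""
    return {name: idx for idx, name in enumerate(sorted(set(categories)))}


def to_kv_string(active: List[str], index: Dict[str, int]) -> str:
    """Encode the active categories of one sample as 'index:1.0' pairs joined by commas."""
    ids = sorted(index[name] for name in active if name in index)
    return ",".join(f"{i}:1.0" for i in ids)


# Hypothetical vocabulary of user and item categories.
index = build_index(["user=u1", "user=u2", "item=i7", "item=i9"])

# One sample: user u2 interacting with item i7.
row = to_kv_string(["user=u2", "item=i7"], index)
print(row)
```

Each sample then needs only its non-zero features listed, which keeps the feature column compact even when the vocabulary is very large.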
On the Parameters Setting and Tuning tabs in the right-side pane, configure the training parameters.
For example, if the pipeline involves 120 million sample data records and 1.3 million feature data records, we recommend that you use the values in the following table and keep the default values for the other parameters. Adjust the values based on the amount of data that is involved.
Tab | Field | Description
Parameters Setting | Learning rate | The learning rate. Recommended value: 0.005. If the training diverges, set this parameter to a smaller value.
Parameters Setting | Dimensions | The dimensions of the feature, specified as a three-element array. Recommended value: 1,1,16.
Parameters Setting | Block size | The size of the block. If fewer than two million feature data records are involved, we recommend that you set this parameter to 1000000. If two million or more feature data records are involved, you do not need to configure this parameter.
Tuning | Number of Workers | The number of workers to be used. Recommended value: 32. If a large amount of data is involved, set this parameter to a greater value.
Tuning | Memory Size per Node (MB) | The memory size to be allocated to each node. Recommended value: 16384. Unit: MB.
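The recommended values can be sanity-checked with some quick arithmetic. The sketch below assumes that the three elements of Dimensions are the sizes of the bias term, the per-feature linear weight, and the per-feature factor vector. This topic does not spell out that meaning, so treat it as an assumption. The function name fm_param_count is hypothetical.

```python
def fm_param_count(num_features: int, dims: str) -> int:
    """Count FM parameters for a 'bias,linear,factors' dimension string (assumed meaning)."""
    bias_dim, linear_dim, factor_dim = (int(part) for part in dims.split(","))
    return bias_dim + num_features * linear_dim + num_features * factor_dim


# 1.3 million features with the recommended Dimensions value.
params = fm_param_count(1_300_000, "1,1,16")

# Total memory requested: Number of Workers x Memory Size per Node (MB).
total_memory_mb = 32 * 16384

print(params)           # 22100001
print(total_memory_mb)  # 524288
```

Under that assumption, the model holds about 22 million parameters, and the recommended Tuning values request 524,288 MB (512 GB) of memory across all workers.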
Add code to the PyAlink-FM Training and PyAlink-FM Prediction components.
Click the PyAlink-FM Training component and paste the following code into the Code Editor.
from pyalink.alink import *

def main(sources, sinks, parameter):
    print('start')
    # Method 1
    # train = HugeFmTrainBatchOp().setVectorCol('features').setLabelCol('label').linkFrom(sources[0])
    # Method 2
    train = HugeFmTrainBatchOp(
        vectorCol='features',
        labelCol='label',
        task='binary_classification',
        numEpochs=10)
    # Obtain the training data from input port 0. The trained model is generated from output port 0 and passed downstream.
    sources[0].link(train).link(sinks[0])
    BatchOperator.execute()
    print('end')
Click the PyAlink-FM Prediction component and paste the following code into the Code Editor.
from pyalink.alink import *

def main(sources, sinks, parameter):
    predictor = HugeFmPredictBatchOp().setPredictionCol("prediction_result")\
        .setPredictionDetailCol("prediction_detail").setReservedCols(["label"])
    output = predictor.linkFrom(sources[0], sources[1])
    # The prediction result is generated from the first output port and passed downstream.
    output.link(sinks[0])
    BatchOperator.execute()
    print('predict end')
Set the computing resources used for algorithm execution.
Click the blank area on the canvas. On the Pipeline Attributes tab in the right-side pane, select Flink from the Default Resource Preferred by Alink or FlinkML drop-down list.
On the canvas, click the PyAlink-FM Training component and then the PyAlink-FM Prediction component. For each component, modify the following parameters on the Tuning tab in the right-side pane.
Choose Running Mode: Select Flink (Distributed).
The number of workers: Select 2.
In the upper-left corner of the canvas, click Save.
In the upper-left corner of the canvas, click the Run icon to run the pipeline.
After the pipeline is run, right-click the Binary Classification Evaluation-1 component on the canvas and select Visual Analysis.
With the sample data included in the template, the FM algorithm provided by Machine Learning Designer produces a model with an area under the curve (AUC) close to 0.92.
Evaluation chart generated by using Method 1
Evaluation chart generated by using Method 2
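To help interpret the AUC value in the evaluation charts, the following minimal sketch computes AUC from labels and predicted scores by using the rank-sum formulation: the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one, with ties counted as 0.5. It is a generic illustration with made-up data, not the code behind the Binary Classification Evaluation component. The function name auc is hypothetical.

```python
from typing import List


def auc(labels: List[int], scores: List[float]) -> float:
    """AUC for binary labels (1 = positive, 0 = negative) via average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # average 1-based rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


result = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(result)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so a value close to 0.92 means that the model ranks positives above negatives in most pairs.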
References
For more information about algorithm components, see the following topics: