Platform For AI: DSSM vector recall

Last Updated: Dec 22, 2023

This topic describes how to perform vector recall with the Deep Structured Semantic Model (DSSM) in a customized recommendation solution.

Prerequisites

A feature engineering workflow has been run and the following datasets for vector recall have been generated. For more information, see Feature engineering.

  • rec_sln_demo_user_table_preprocess_all_feature_v2

  • rec_sln_demo_item_table_preprocess_all_feature_v2

  • rec_sln_demo_behavior_table_preprocess_v2

Procedure

  1. Go to the Machine Learning Designer page

    1. Log on to the Machine Learning Platform for AI (PAI) console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.

    3. In the left-side navigation pane, choose Model Development and Training > Visualized Modeling (Designer).

  2. Create a pipeline

    1. On the Visualized Modeling (Designer) page, click the Preset Templates tab.

    2. In the Recommended Solution - Vector Recall section of the template list, click Create.

    3. In the Create Pipeline dialog box, configure the parameters. You can use the default values.

      The Pipeline Data Path parameter specifies an Object Storage Service (OSS) bucket path that stores the temporary data and models generated while the pipeline runs.

    4. Click OK.

      It takes about 10 seconds to create the pipeline.

    5. In the pipeline list, double-click Recommended Solution - Vector Recall to enter the pipeline.

    6. View the components of the pipeline on the canvas, as shown in the following figure. The system automatically creates the pipeline based on the preset template.

      [Figure: pipeline canvas with numbered components]

      Component number | Description
      1 | The sample model DSSM_Recall for vector recall.
      2 | Processes the sample model using feature generation (FG).
      3 | Creates a positive sample table and uses the positive samples for negative sampling training.
      4 | Uses equal-frequency binning of numeric features to set the boundaries of the model.
      5 | Uses the number of unique values of each enumeration feature to set the embedding_dim and hash_bucket_size of the model.
      6 | Processes item features using FG.
      7 | Processes user features using FG.
      8 | Summarizes the results of the rec_sln_demo_dssm_recall_30d_binning_v1 table and the rec_sln_demo_dssm_recall_30d_count_v1 table to calculate the feature configuration information and step configuration information.
      9 | Creates an item table for negative sampling.
      10 | Discretizes the 30-day sample data of the DSSM_Recall model to generate training samples.
      11 | Specifies the EasyRec configuration file based on the calculation result of component 8.
      12 | Model training. Before you run this component, run component 11 to generate the EasyRec configuration file.
      13 | Uses the split item model to perform inference on the item feature table rec_sln_demo_dssm_recall_item_feature_fg_encoded_v1 to obtain the item vector.
      14 | Uses the split user model to perform inference on the user feature table rec_sln_demo_dssm_recall_user_feature_fg_encoded_v1 to obtain the user vector.
      15 | Creates a sequence table and uses hit_rate to evaluate the model. Note: New users and new items that first appear on the day of evaluation are excluded from the evaluation.
      18 | Uses hit_rate@top200 to evaluate the model.
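To illustrate what component 4 does conceptually, the following is a minimal Python sketch of equal-frequency binning. It is not the implementation used by the pipeline. The function names (equal_frequency_boundaries, bucketize), the quantile rule, and the sample data are assumptions made for illustration only.

```python
from bisect import bisect_right


def equal_frequency_boundaries(values, num_bins):
    """Return up to num_bins - 1 cut points so that each bin holds roughly
    the same number of values. (Hypothetical helper, not a PAI or EasyRec API.)"""
    if num_bins < 2:
        raise ValueError("num_bins must be at least 2")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    n = len(ordered)
    boundaries = []
    for i in range(1, num_bins):
        # Take the value located at the i/num_bins position of the sorted data.
        cut = ordered[min(n - 1, (i * n) // num_bins)]
        # Skip duplicate cut points caused by repeated values.
        if not boundaries or cut > boundaries[-1]:
            boundaries.append(cut)
    return boundaries


def bucketize(value, boundaries):
    """Map a numeric value to the index of the bin it falls into."""
    return bisect_right(boundaries, value)


# Illustrative data only: a skewed numeric feature.
prices = [1, 2, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]
boundaries = equal_frequency_boundaries(prices, 4)
bin_ids = [bucketize(p, boundaries) for p in prices]
print(boundaries)  # [3, 13, 55]
print(bin_ids)     # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
```

Unlike equal-width binning, the cut points follow the data distribution, so a skewed feature still spreads evenly across bins before it is discretized.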

  3. Run the pipeline and view the results

    1. In the top toolbar of the canvas, click Run.

    2. After the pipeline is run, view the output.

      • Right-click component 18 (18_rec_sln_demo_recall_total_hit_rate_v1_2) on the canvas and choose View Data > hit_rate_detail from the shortcut menu to view the vector recall hit rate details.

      • Right-click component 18 (18_rec_sln_demo_recall_total_hit_rate_v1_2) on the canvas and choose View Data > total_hit_rate from the shortcut menu to view the overall vector recall hit rate.
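To help interpret the hit_rate_detail and total_hit_rate outputs, the following Python sketch shows one common way to compute a hit rate at top k from user and item vectors. It is an assumption, not the pipeline's confirmed formula: it ranks items by inner product, counts how many of a user's ground-truth items appear in the top k, and divides by the number of ground-truth items. The function names and toy vectors are hypothetical.

```python
def top_k_items(user_vec, item_vecs, k):
    """Return the ids of the k items with the largest inner product with user_vec."""
    scores = {
        item_id: sum(u * v for u, v in zip(user_vec, vec))
        for item_id, vec in item_vecs.items()
    }
    # Sort by score in descending order; break ties by item id for determinism.
    ranked = sorted(scores, key=lambda item_id: (-scores[item_id], item_id))
    return ranked[:k]


def hit_rate_at_k(user_vecs, item_vecs, ground_truth, k):
    """Return (per-user hit rate, total hit rate).

    Assumed definition: hits / number of ground-truth items,
    per user and summed over all users.
    """
    detail = {}
    total_hits = 0
    total_truth = 0
    for user_id, truth in ground_truth.items():
        if not truth:
            continue  # Users without ground-truth items cannot be evaluated.
        recalled = set(top_k_items(user_vecs[user_id], item_vecs, k))
        hits = len(recalled & set(truth))
        detail[user_id] = hits / len(truth)
        total_hits += hits
        total_truth += len(truth)
    total = total_hits / total_truth if total_truth else 0.0
    return detail, total


# Toy vectors for illustration only; the pipeline evaluates with k = 200.
user_vecs = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
item_vecs = {
    "i1": [0.9, 0.1],
    "i2": [0.8, 0.3],
    "i3": [0.1, 0.9],
    "i4": [0.2, 0.7],
}
ground_truth = {"u1": ["i1", "i3"], "u2": ["i3", "i4"]}

detail, total = hit_rate_at_k(user_vecs, item_vecs, ground_truth, k=2)
print(detail)  # {'u1': 0.5, 'u2': 1.0}
print(total)   # 0.75
```

In this sketch, the per-user values play the role of the detail output and the aggregated value plays the role of the total hit rate. A brute-force scan over all items is used for clarity; a production recall system would typically use an approximate nearest neighbor index instead.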