The Model Hitrate Eval component of Platform for AI (PAI) uses the hit_rate_pai.py script to implement vector recall evaluation. This topic describes how to configure the Model Hitrate Eval component.
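Hit rate here measures how many of a user's ground-truth items appear among the top_k items retrieved by vector similarity. The following is a minimal, hypothetical Python sketch of the u2i case, not the actual hit_rate_pai.py implementation; all names are illustrative:

```python
import numpy as np

def topk_hitrate(user_embs, item_embs, ground_truth, top_k=200):
    """Toy u2i hit rate: for each user embedding, retrieve the top_k
    most similar items by inner product and count how many of that
    user's ground-truth items appear among them."""
    scores = user_embs @ item_embs.T          # (num_users, num_items)
    # Indices of the top_k highest-scoring items per user.
    topk = np.argsort(-scores, axis=1)[:, :top_k]
    hits = total = 0
    for retrieved, truth in zip(topk, ground_truth):
        truth = set(truth)
        hits += len(truth & set(retrieved.tolist()))
        total += len(truth)
    return hits / total if total else 0.0
```

The component performs this evaluation in a distributed manner on MaxCompute and writes per-sample details and the aggregate hit rate to the two output tables.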
Limits
You can use the Model Hitrate Eval component only with MaxCompute resources.
Configure the component
You can use one of the following methods to configure the Model Hitrate Eval component.
Method 1: Configure the component in the PAI console
Input ports
| Input port (from left to right) | Recommended upstream component | Parameter of PAI commands | Required |
| --- | --- | --- | --- |
| item embedding table name (data type: MaxCompute table) | SQL Script and Read Table | Passed in as the tables parameter | Yes |
| ground truth table name (data type: MaxCompute table) | SQL Script and Read Table | Passed in as the tables parameter | Yes |
Component parameters
| Tab | Parameter | Required | Description | Parameter of PAI commands | Default value |
| --- | --- | --- | --- | --- | --- |
| Parameters Setting | recall_type | No | The type of recall. Type: STRING. Valid values: u2i (user-to-item retrieval) and i2i (item-to-item retrieval). | recall_type | u2i |
| Parameters Setting | top_k hitrate | No | The number of items to recall. Type: STRING. | top_k | 200 |
| Parameters Setting | embedding dimension | No | The dimension of the vectors in the embedding table. Type: INT. | emb_dim | 32 |
| Parameters Setting | knn_metric | No | The similarity metric used for recall. Type: STRING. Valid values: 0 (L2 distance) and 1 (inner-product similarity). | knn_metric | 0 |
| Parameters Setting | use exact search: knn_strict | No | Type: BOOL. Valid values: True (uses exact KNN search, which increases the amount of computation) and False (does not use exact KNN search). | knn_strict | True |
| Parameters Setting | batch_size | No | The number of samples that are processed at a time. Type: INT. | batch_size | 1024 |
| Parameters Setting | max number of interests: num_interests | No | The maximum number of interest vectors. Type: INT. | num_interests | 1 |
| Parameters Setting | Specify the algorithm version | Yes | The algorithm package to run. Generate an EasyRec TAR package (for more information, see Release & Upgrade in the EasyRec documentation), upload the package to an OSS path (for more information, see Upload objects), and then select the uploaded package. | script | N/A |
| Tuning | ps Count | No | The number of parameter server (PS) nodes. | cluster | 2 |
| Tuning | ps CPU | No | The number of vCPUs for each PS node. A value of 1 indicates one vCPU. | cluster | 10 |
| Tuning | ps Memory | No | The memory size of each PS node. Unit: MB. | cluster | 40000 |
| Tuning | Worker Count | No | The number of worker nodes. | cluster | 6 |
| Tuning | Worker CPU | No | The number of vCPUs for each worker node. A value of 1 indicates one vCPU. | cluster | 8 |
| Tuning | Worker Memory | No | The memory size of each worker node. Unit: MB. | cluster | 40000 |
| Tuning | Worker GPU | No | The number of GPUs for each worker node. GPUs are not required in most EasyRec training jobs. | cluster | 0 |

All parameters on the Tuning tab are passed in as the cluster parameter.
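The knn_metric and knn_strict settings can be illustrated with a brute-force search, which corresponds to the exact (knn_strict=True) case. The following Python sketch is illustrative only and is not the component's implementation:

```python
import numpy as np

def knn_search(query, items, k, knn_metric=0):
    """Brute-force (exact, i.e. knn_strict=True) nearest-neighbor search
    for one query vector. knn_metric=0 ranks by L2 distance (smaller is
    closer); knn_metric=1 ranks by inner product (larger is closer)."""
    if knn_metric == 0:
        # Euclidean distance to every item, ascending order.
        order = np.argsort(np.linalg.norm(items - query, axis=1))
    else:
        # Inner-product similarity, descending order.
        order = np.argsort(-(items @ query))
    return order[:k]
```

Note that the two metrics can rank items differently: an item with a large norm can have a high inner product with the query while being far away in L2 distance.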
Output ports
| Output port (from left to right) | Data type | Parameter of PAI commands | Required |
| --- | --- | --- | --- |
| hit_rate_details | MaxCompute table | Passed in as the outputs parameter | Yes |
| total_hit_rate | MaxCompute table | Passed in as the outputs parameter | Yes |
Method 2: Configure the component by using PAI commands
You can use SQL scripts to call PAI commands. For more information, see SQL Script. The following sample command shows how to configure the component, and the table that follows describes the options.
PAI -name tensorflow1120_cpu_ext
-project algo_public
-Darn="acs:ram::xxx:role/aliyunodpspaidefaultrole"
-Dbuckets="oss://examplebucket/"
-DossHost="oss-cn-hangzhou-internal.aliyuncs.com"
-DentryFile="easy_rec/python/tools/hit_rate_pai.py"
-Dcluster="{\"ps\": {\"count\": 2, \"cpu\": 1000, \"memory\": 40000}, \"worker\": {\"count\": 6, \"cpu\": 800, \"gpu\": 0, \"memory\": 40000}}"
-Dtables="odps://pai_hangzhou/tables/pai_temp_flow_vjmgur2q5ca5lz****_node_j5c8mx2h26wqxu****_outputTable/,odps://pai_hangzhou/tables/pai_temp_flow_hfd4fk1z1ba9z5****_node_msqfceossxpy7v****_outputTable/"
-Doutputs="odps://pai_hangzhou/tables/pai_temp_flow_1y1h6j8bnl94ao****_node_74kp8xcaugwmy8****_hit_rate_details,odps://pai_hangzhou/tables/pai_temp_flow_1y1h6j8bnl94ao****_node_74kp8xcaugwmy8****_total_hit_rate"
-DuserDefinedParameters="--recall_type=u2i --top_k=200 --emb_dim=32 --knn_metric=0 --knn_strict=True --batch_size=1024 --num_interests=1"
-Dscript="oss://examplebucket/easy_rec_ext_0.6.1_res.tar.gz"

| Option | Description | Required |
| --- | --- | --- |
| entryFile | The entry file to run: easy_rec/python/tools/hit_rate_pai.py. | Yes |
| tables | The input tables: the item embedding table and the ground truth table. Separate the two tables with a comma (,). | Yes |
| outputs | The output tables: hit_rate_details and total_hit_rate. Separate the two tables with a comma (,). | Yes |
| arn | The authorization information of the resource group. To obtain the ARN, log on to the PAI console and choose Activation & Authorization > Dependent Services. In the Designer section, click View Authorization in the Actions column. | Yes |
| ossHost | The endpoint of OSS. For more information about endpoints, see Regions and endpoints. | Yes |
| buckets | The OSS bucket in which the model file is stored. If multiple buckets are used, separate them with commas (,). Example: oss://examplebucket/. | Yes |
| userDefinedParameters | Additional parameters that are not specified in the pipeline, such as recall_type, top_k, emb_dim, knn_metric, knn_strict, batch_size, and num_interests. | No |
| script | The OSS path of the EasyRec TAR package. For more information about how to generate a TAR package, see Release & Upgrade in the EasyRec documentation. | No |
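In the sample command, the Tuning-tab values map to the cluster option as a JSON string, and the cpu fields appear to be expressed in units of 1/100 vCPU (10 vCPUs for each PS node becomes cpu=1000, 8 vCPUs for each worker becomes cpu=800). The following Python sketch assembles that value under this assumption; the function name and unit convention are inferred from the sample, not from an official specification:

```python
import json

def build_cluster_param(ps_count=2, ps_vcpu=10, ps_mem=40000,
                        worker_count=6, worker_vcpu=8, worker_mem=40000,
                        worker_gpu=0):
    """Build the -Dcluster value from the Tuning-tab defaults. Assumes the
    cpu field is expressed in units of 1/100 vCPU, matching the sample
    command (10 vCPUs -> cpu=1000, 8 vCPUs -> cpu=800)."""
    return json.dumps({
        "ps": {"count": ps_count, "cpu": ps_vcpu * 100, "memory": ps_mem},
        "worker": {"count": worker_count, "cpu": worker_vcpu * 100,
                   "gpu": worker_gpu, "memory": worker_mem},
    })
```

With the defaults, this produces the same cluster configuration as the sample command above.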
Examples
On the MaxCompute client, run the following commands to create the item embedding and ground truth tables. For information about how to use the MaxCompute client, see MaxCompute client (odpscmd).

CREATE TABLE IF NOT EXISTS dssm_recall_item_embedding_tmp_for_eval_v1 (
    item_id bigint,
    item_emb string COMMENT 'Item Embedding'
);
CREATE TABLE IF NOT EXISTS dssm_recall_vector_recall_sample_eval_sequence_v1 (
    requestid string,
    item_ids string,
    user_emb string COMMENT 'User Embedding',
    emb_num bigint
);

Upload the downloaded training data (item_embedding.csv) and test data (ground.csv) to the created MaxCompute tables. For information about how to use the MaxCompute client to upload data, see Tunnel commands. You can specify the directory in which the CSV file resides or the absolute path of the CSV file when you run the Tunnel command.

tunnel upload item_embedding.csv dssm_recall_item_embedding_tmp_for_eval_v1 -fd \t;
tunnel upload ground.csv dssm_recall_vector_recall_sample_eval_sequence_v1 -fd \t;

Create a pipeline as shown in the following figure.
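The tunnel commands use a tab field delimiter (-fd \t). The following Python sketch writes a hypothetical item_embedding.csv in that layout; the item IDs are made up, and the comma-separated embedding string is only an assumption about the export format, which depends on how the embeddings were generated:

```python
import csv

# Hypothetical rows for item_embedding.csv: one item ID and one embedding
# stored as a single string column (assumed comma-separated floats here).
# The file is tab-delimited to match `tunnel upload ... -fd \t`.
rows = [
    (1001, "0.12,0.05,-0.33"),
    (1002, "-0.08,0.21,0.10"),
]
with open("item_embedding.csv", "w", newline="") as f:
    csv.writer(f, delimiter="\t").writerows(rows)
```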

Section
Description
①
Set the Table Name parameter of the Read Table-3 component to dssm_recall_item_embedding_tmp_for_eval_v1.
②
Set the Table Name parameter of the Read Table-4 component to dssm_recall_vector_recall_sample_eval_sequence_v1.
③
Connect the left-side input port to the Read Table-3 component and the right-side input port to the Read Table-4 component. Set the Specify the algorithm version parameter to the OSS path of the uploaded EasyRec TAR package. For more information about how to prepare a TAR package, see the "Configure the component in the PAI console" section of this topic. This example uses the TAR package from 18_rec_sln_demo_dssm_recall_total_hit_rate_v1_2. For more information, see DSSM vector recall.
Click the run icon to run the pipeline. After the pipeline run is complete, right-click the Model Hitrate Eval component on the canvas and select View Data. You can view the following data results.
hit_rate_details

total_hit_rate

References
For more information, see 18_rec_sln_demo_dssm_recall_total_hit_rate_v1 in the DSSM vector recall topic.