Benefits of uploading behavioral data to OpenSearch
Behavioral data shows how users react to search results, for example by browsing, clicking, dwelling, liking, sharing, adding to favorites, or purchasing. You can use this data as guidance when you optimize search results.
The report statistics feature of OpenSearch allows you to view various search reports for applications, such as the reports of page views (PVs), item page views (IPVs), and click-through rate (CTR). You can improve your business operations based on the reports.
OpenSearch provides an algorithm platform, which allows you to use feedback data of search behavior to train search and sort algorithm models. This helps you improve your search effects.
Usage notes
The data collection feature is automatically enabled after an application is created.
Data refers to the feedback data of user reactions to search results.
Collection refers to the process of uploading search behavioral data to OpenSearch by using OpenSearch SDKs. In the latest version, OpenSearch allows you to collect search behavioral data only by using a server SDK. The features of collecting search behavioral data by using a mobile SDK or web SDK are under development.
Compared with earlier data collection features, the data collection V2.0 feature allows you to pass parameters and use SDKs with ease. If you are new to OpenSearch, you can use OpenSearch SDKs to upload behavioral data by using the fields that are described in this topic. Note: The SDK for Java 3.4.0 and SDK for PHP 3.2.0 support data collection V2.0.
Upload behavioral data
Note: After you enable the feature of collecting behavioral data in the OpenSearch console, we recommend that you upload behavioral data by using SDKs. The fields that are used to upload behavioral data are described later in this topic. Take note of the following items:
To upload behavioral data by using SDKs, you must specify the following fields: imei or user_id, biz_id, trace_id, rn, bhv_type, bhv_time, item_id, and item_type.
To upload behavioral data by calling API operations, you must specify the reach_time field in addition to the preceding fields.
For more information about the demos for uploading behavioral data by using SDKs or calling API operations, see SDKs for data collection V2.0.
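The required-field rules above can be checked before a record is sent. The following is a minimal Python sketch of such a check. It is not part of any OpenSearch SDK: the function name missing_required_fields, the via_api flag, and the use of a plain dictionary as the record are assumptions made for illustration.

```python
# Hypothetical pre-upload check; not part of any OpenSearch SDK.
REQUIRED_FIELDS = (
    "biz_id", "trace_id", "rn", "bhv_type", "bhv_time", "item_id", "item_type",
)


def missing_required_fields(record, via_api=False):
    """Return the names of required fields that are absent or empty in record."""
    missing = [name for name in REQUIRED_FIELDS if not record.get(name)]
    # Either imei or user_id must be present to identify the user.
    if not (record.get("imei") or record.get("user_id")):
        missing.append("imei or user_id")
    # reach_time is only the caller's responsibility for direct API calls.
    if via_api and not record.get("reach_time"):
        missing.append("reach_time")
    return missing
```

An empty list means that the record carries every field named in the list above. Setting via_api=True adds the reach_time check that applies when you call API operations directly.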
Description of behavioral data fields
ID | Field | Type | Description | Value | Required |
1 | app_version | STRING | The version number of the website or mobile app that collects behavioral data. | | No |
2 | sdk_type | STRING | The type of the SDK that is used to upload behavioral data. OpenSearch uses this field to distinguish whether behavioral data is uploaded or collected by using a server SDK or mobile SDK. | If you upload behavioral data by using OpenSearch SDKs, this field is set to opensearch_sdk by default. | No |
3 | sdk_version | STRING | The version number of the SDK that is used to upload behavioral data. | If you upload behavioral data by using OpenSearch SDKs, this field is specified by default. | No |
4 | login | STRING | Specifies whether the user has logged on to the website or mobile app that collects behavioral data. | Valid values: 0 and 1. 0: indicates that the user has not logged on. 1: indicates that the user has logged on. | No |
5 | user_id | STRING | The ID that is used to uniquely identify the user. | | No. However, you must specify either the imei field or the user_id field. |
6 | imei | STRING | The ID of the user device. | The value can be an IMEI, a device ID, or an IDFA. | No. However, you must specify either the imei field or the user_id field. |
7 | biz_id | STRING | A numeric ID that is used to distinguish between different search services. Generally, a biz_id field represents an OpenSearch application. You can specify multiple biz_id fields to represent web, iOS, and Android applications. These fields can be used to divide traffic and run tests in subsequent steps. | If you do not distinguish search services, we recommend that you set this field to default. If you distinguish search services, you can set this field to pc, ios, or android based on your business requirements. | Yes |
8 | trace_id | STRING | The provider of the search service from which the document is searched and collected. | If the document is searched and collected from OpenSearch, set this field to Alibaba. If the document is searched and collected from another service provider, specify this field based on your business requirements. | Yes |
9 | trace_info | STRING | The value of this field is the value of the ops_request_misc parameter that OpenSearch returns in the search results. Pass in the value of the ops_request_misc parameter as it is. | | No. Note: You must pass in this field if the trace_id field is set to Alibaba. This field is used to check whether the search results are provided by OpenSearch. |
10 | rn | STRING | This field is used to identify a PV. The value of this field is the value of the request_id parameter that OpenSearch returns in the search results. Pass in the value of the request_id parameter as it is. | | Yes |
11 | item_id | STRING | The primary key value of a document. The value of this field is the primary key value of the primary table in the OpenSearch application. | | Yes |
12 | item_type | STRING | The business type of the document. | For more information about valid values of this field, see the Description of the item_type field section of this topic. | Yes |
13 | bhv_type | STRING | The type of the behavior, such as expose, dwell, browse, add to favorites, and download. | For more information about valid values of this field, see the Common behavior types section of this topic. | Yes |
14 | bhv_value | STRING | The value that is used to measure the behavior, such as the dwell time and number of items that are purchased. | For more information about valid values of this field, see the Common behavior types section of this topic. | No |
15 | bhv_time | STRING | The time when the behavior occurs. The value is a UNIX timestamp that is accurate to the second. | | Yes |
16 | bhv_detail | STRING | The detailed description of the behavior. | The format of this field is key=value{,key=value}. The value can contain one or more key=value pairs. | No |
17 | ip | STRING | The IP address of the mobile phone or terminal device on which the behavior occurs. | | No. However, we recommend that you specify this field. |
18 | longitude | STRING | The longitude of the location at which the behavior occurs. | | No. However, we recommend that you specify this field. |
19 | latitude | STRING | The latitude of the location at which the behavior occurs. | | No. However, we recommend that you specify this field. |
20 | session_id | STRING | The ID of a user session. | | No. However, we recommend that you specify this field. |
21 | spm | STRING | This field is used to track the page module at which the behavior occurs. | The encoding format of this field is a.b.c.d, which indicates the site ID, page ID, module ID, and location ID. | No |
22 | report_src | STRING | This field is used to identify the method that is used to upload behavioral data. | Valid values: 1, 2, 3, and patch_data. | No |
23 | mac | STRING | The media access control (MAC) address of the mobile phone or terminal device that collects behavioral data. | | No |
24 | brand | STRING | The brand of the mobile phone or terminal device that collects behavioral data. | | No. However, we recommend that you specify this field. |
25 | device_model | STRING | The model of the mobile phone or terminal device that collects behavioral data. | | No |
26 | resolution | STRING | The screen resolution of the mobile phone or terminal device that collects behavioral data. | | No |
27 | carrier | STRING | The carrier of the mobile phone or terminal device that collects behavioral data. | | No |
28 | access | STRING | The network connected to the mobile phone or terminal device that collects behavioral data. | | No |
29 | access_subtype | STRING | The type of the network connected to the mobile phone or terminal device that collects behavioral data. | | No |
30 | os | STRING | The operating system of the mobile phone or terminal device that collects behavioral data. | | No |
31 | os_version | STRING | The version of the operating system of the mobile phone or terminal device that collects behavioral data. | | No |
32 | language | STRING | The language that is configured for the mobile phone or terminal device that collects behavioral data. | | No |
33 | phone_md5 | STRING | The MD5 hash value of a mobile phone number. | | No |
34 | reserve1 | STRING | A reserved field. | | No |
35 | reserve2 | STRING | A reserved field. If the report_src field is set to patch_data, you must set the reserve2 field to the value of the raw_query field. | | No |
36 | reach_time | BIGINT | The time when the data is received by the server. The value is a UNIX timestamp that is accurate to the second. | | Yes. If you upload behavioral data by using OpenSearch SDKs, this field is automatically configured by the SDKs. If you upload behavioral data by calling API operations of OpenSearch, you must specify this field. |
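Two fields in the preceding table use a structured string format: bhv_detail (key=value{,key=value}) and spm (a.b.c.d). The following Python sketch shows one way to build both strings. The helper names format_bhv_detail and format_spm are hypothetical and are not part of any OpenSearch SDK. Rejecting commas and extra equal signs inside a pair is an assumption, because this topic does not describe an escaping rule.

```python
# Hypothetical helpers; names are illustrative and not part of any OpenSearch SDK.

def format_bhv_detail(pairs):
    """Join a dict into the key=value{,key=value} form used by bhv_detail."""
    parts = []
    for key, value in pairs.items():
        text = f"{key}={value}"
        # Assumption: no escaping rule is documented, so a comma or an extra
        # '=' inside a key or value is rejected instead of being sent as-is.
        if "," in text or text.count("=") != 1:
            raise ValueError(f"unsupported character in pair: {text!r}")
        parts.append(text)
    return ",".join(parts)


def format_spm(site_id, page_id, module_id, location_id):
    """Build the a.b.c.d value used by spm from its four IDs."""
    return ".".join(str(part) for part in (site_id, page_id, module_id, location_id))
```

For example, format_bhv_detail({"buy_price": 12, "price_unit": "RMB"}) produces the same string as the bhv_detail example for the buy behavior in the Common behavior types section.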
Description of the item_type field
ID | item_type | Description |
1 | goods | Goods and commodities |
2 | article | Articles, blogs, and fictions |
3 | ask | Q&A |
4 | bbs | Forum posts |
5 | download | Item downloads |
6 | image | Images |
7 | media | Multimedia such as movies, TV plays, and music |
8 | recipe | Food and recipes |
9 | news | News and information |
10 | institution | Organizations |
11 | other | Others |
Common behavior types
ID | bhv_type | Description | bhv_value | bhv_detail |
1 | expose | The behavior to expose an item. | Empty. | Empty |
2 | stay | The behavior to dwell on a page. | The dwell time. Unit: seconds. | Empty |
3 | click | The behavior to click an item. | The number of clicks. Default value: 1. | Empty |
4 | cart | The behavior to add an item to a shopping cart, bookshelf, or playlist. | Empty. | Empty |
5 | buy | The behavior to purchase an item. | The number of items that are purchased. Default value: 1. | Example: buy_price=12,price_unit=RMB |
6 | collect | The behavior to add an item to favorites. | Empty. | Empty |
7 | like | The behavior to like an item. | The number of likes. Default value: 1. | Empty |
8 | dislike | The behavior to dislike an item. | The number of dislikes. Default value: 1. | Empty |
9 | comment | The behavior to comment on an item. | The number of comments. Default value: 1. | Empty |
10 | share | The behavior to share or forward an item. | The number of shares or forwards. Default value: 1. | Empty |
11 | subscribe | The behavior to follow or subscribe to an item. | Empty. | Empty |
12 | gift | The behavior to send gifts. | Empty. | Empty |
13 | download | The behavior to download an item. | Empty. | Empty |
14 | read | The behavior to read an item. | Empty. | Empty |
15 | tip | The behavior to reward an item. | Empty. | Empty |
16 | complain | The behavior to complain about an item. | Empty. | Empty |
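The bhv_value conventions in the preceding table can be applied in one place before upload. The following Python sketch is an illustration only. The function name resolve_bhv_value and the two set names are hypothetical. Requiring a dwell time for stay and passing through unlisted behavior types unchanged are assumptions, because the table does not state a default for stay and lists only common types.

```python
# Hypothetical lookup built from the table above; not part of any OpenSearch SDK.
COUNTED_BEHAVIORS = {"click", "buy", "like", "dislike", "comment", "share"}
EMPTY_VALUE_BEHAVIORS = {
    "expose", "cart", "collect", "subscribe", "gift",
    "download", "read", "tip", "complain",
}


def resolve_bhv_value(bhv_type, bhv_value=None):
    """Return the bhv_value string to send for the given bhv_type."""
    if bhv_type in EMPTY_VALUE_BEHAVIORS:
        # The table lists bhv_value as empty for these types; any input is ignored.
        return ""
    if bhv_type in COUNTED_BEHAVIORS:
        # Counts default to 1 when the caller does not supply a value.
        return str(1 if bhv_value is None else bhv_value)
    if bhv_type == "stay":
        # Assumption: a dwell event without a duration is treated as an error.
        if bhv_value is None:
            raise ValueError("stay requires the dwell time in seconds")
        return str(bhv_value)
    # Assumption: types outside the table are passed through unchanged.
    return "" if bhv_value is None else str(bhv_value)
```

The function returns strings because bhv_value is a STRING field.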
View a data report
After you enable the data collection feature and upload a specific amount of behavioral data, you can view the data status and quality on the data collection page.
Data status
Data can be in the Normal (Available) or Abnormal (Unavailable) state. Normal (Available) indicates that no quality issue occurs on the behavioral data and the behavioral data is verified. Abnormal (Unavailable) indicates that a quality issue occurs on the behavioral data.
If data is in the Abnormal (Unavailable) state, the creation and training of popularity models and category prediction may be affected.
Abnormal data
Normal data
Data quality
If the quality check on the behavioral data fails, an error message appears on the Data Verification page in the OpenSearch console. If the quality check is passed, no error message appears on the Data Verification page. Note: A sample quality check is performed at the beginning of each hour. The sample data that is checked in the preceding figure is the behavioral data that was synchronized to OpenSearch within the hour before the check.
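To estimate which uploads a given check covers, the sampling window described in the note can be computed from the check time. The following Python sketch assumes that the window is exactly the full hour that ends at the start of the hour in which the check runs. The function name sample_window is hypothetical, and the exact boundaries that OpenSearch uses are not stated in this topic.

```python
from datetime import datetime, timedelta


def sample_window(check_time):
    """Return (start, end) of the hour of data assumed to be sampled by the
    check that runs at the beginning of the hour containing check_time."""
    # Truncate to the top of the hour: the moment the check is assumed to run.
    end = check_time.replace(minute=0, second=0, microsecond=0)
    # Assumption: the sample covers the one hour immediately before that moment.
    return end - timedelta(hours=1), end
```

Under this assumption, data uploaded at 09:40 would first be covered by the check that runs at 10:00.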