Functions that can be used as both feature functions and functionality functions
Some functions can be used as both feature functions and functionality functions. You can use such functions in filter clauses, sort clauses, and sort expressions.
The fields that you reference in parameters of such functions must be configured as index or attribute fields based on the description of each function.
tag_match: matches query clauses with documents based on tags and scores the documents by calculating the weights of matched tags
1. Overview
The tag_match function is suitable for most scenarios in which you need to provide personalized searches by matching query clauses with documents. For example, stores that a user has liked can be listed first, or sports and entertainment news that a user is likely to enjoy can be recommended. To use the tag_match function, store an array of key-value pairs in each document and use the kvpairs clause to define the key-value pairs in the query clause. The tag_match function matches the keys in the query clause with the keys in each document, calculates the score for each pair of matched keys, and then merges these scores into the final score of the document. The final score can be used to sort or filter documents.
The following figure shows how the tag_match function calculates the final score.
2. Syntax
Regular syntax:
tag_match(query_key, doc_field, kv_op, merge_op)
Advanced syntax:
tag_match(query_key, doc_field, kv_op, merge_op, has_default, doc_kv, max_kv_count)
3. Parameters
query_key: defines the key-value pairs in a query clause. You must specify this parameter by using a kvpairs clause. In each key-value pair, the key and value are separated by an equal sign (=). Multiple key-value pairs are separated by colons (:). Example: kvpairs=query_tags:10=0.67:960=0.85:1=48. The keys are 10, 960, and 1. The values of the three keys are 0.67, 0.85, and 48. You can also specify only a list of keys. Example: kvpairs=cats:10:960:1.
doc_field: the name of the document field that stores the key-value pairs. The field must be of the INT_ARRAY, FLOAT_ARRAY, or DOUBLE_ARRAY type. If the field is of the FLOAT_ARRAY type, the keys are converted to 64-bit integers for matching. Counting from 1, keys occupy the odd positions in the array and values occupy the even positions. Sample array: [key0 value0 key1 value1…].
kv_op: the operation that is performed on the two values if a key in the query clause matches a key in the document. You can set this parameter to max, min, sum, avg, mul, query_value, doc_value, or a constant number. The query_value operation returns the value of the matched key in the query clause. The doc_value operation returns the value of the matched key in the document. If you specify a constant number, the constant is used as the score of each pair of matched keys.
merge_op: the operation that merges the scores of all matched keys into one score. If multiple keys in the query clause match keys in the document, the operation specified by the kv_op parameter calculates a score for each pair of matched keys, and the operation specified by the merge_op parameter is then performed on these scores. You can set the merge_op parameter to max, min, sum, avg, or first_match. The first_match operation returns only the score that is calculated for the first pair of matched keys.
has_default: specifies whether to use the initial score. The default value is false. If this parameter is set to true, the first value of the doc_field parameter is the initial score. Example of the doc_field parameter: [init_score k0 v0 k1 v1…]. The initial score can be considered the base score.
doc_kv: specifies whether the value of the doc_field parameter consists of key-value pairs. The default value is true. If this parameter is set to false, the value of the doc_field parameter consists of only keys.
max_kv_count: the maximum number of key-value pairs that can be passed from the query clause. The default value is 50. You can change the value to a number that is smaller than or equal to 5120.
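The layouts of the query_key and doc_field parameters can be illustrated with a short Python sketch. This is not OpenSearch code. The helper names parse_kvpairs_entry and split_doc_field are hypothetical, key-only entries are represented as None values, and keys are converted with int(), which is an assumption about how the conversion to integers behaves.

```python
from typing import Dict, Optional, Sequence, Tuple


def parse_kvpairs_entry(entry: str) -> Tuple[str, Dict[int, Optional[float]]]:
    """Parse one kvpairs entry such as 'query_tags:10=0.67:960=0.85:1=48'.

    Returns the entry name and a {key: value} mapping. Key-only lists such as
    'cats:10:960:1' yield None for every value.
    """
    name, _, body = entry.partition(":")
    pairs: Dict[int, Optional[float]] = {}
    for item in body.split(":"):
        key, sep, value = item.partition("=")
        pairs[int(key)] = float(value) if sep else None
    return name, pairs


def split_doc_field(values: Sequence[float], has_default: bool = False,
                    doc_kv: bool = True):
    """Split a doc_field array into (initial_score, {key: value})."""
    values = list(values)
    # With has_default=true, the first element is the initial score.
    init_score = values.pop(0) if has_default else None
    if doc_kv:
        # Layout: [key0 value0 key1 value1 ...]
        pairs = {int(k): float(v) for k, v in zip(values[0::2], values[1::2])}
    else:
        # Layout: [key0 key1 ...], keys only
        pairs = {int(k): None for k in values}
    return init_score, pairs


print(parse_kvpairs_entry("query_tags:10=0.67:960=0.85:1=48"))
print(parse_kvpairs_entry("cats:10:960:1"))
print(split_doc_field([1, 0.5, 5, 0.5, 3, 0.1]))
```

The sketch only mirrors the formats described in the parameter list. It does not enforce the max_kv_count limit.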
4. Return value
The return value is of the DOUBLE type and indicates the final score of a document. If no keys are matched and the has_default parameter is set to false or is not set, 0 is returned. If you want to return a 64-bit integer, you must use the int_tag_match function. Except for the return value, the int_tag_match function is used in the same way as the tag_match function. The int_tag_match function cannot be used in sort expressions.
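The two-stage calculation (kv_op for each matched key, then merge_op over the per-key scores) can be modeled roughly in Python. This is a simplified sketch, not the OpenSearch implementation. The function name tag_match_model is hypothetical. It assumes that max, min, sum, avg, and mul combine the query value and the document value of a matched key, that first_match follows the key order of the query, and that 0 is returned when nothing matches. The has_default and max_kv_count parameters are not modeled because the description above does not state how the initial score is combined with the matched scores. The sample document array is also hypothetical.

```python
from typing import Dict, Optional, Sequence, Union

Number = Union[int, float]

# Assumed meaning of the named kv_op operations: q is the query value and
# d is the document value of one matched key.
KV_OPS = {
    "max": lambda q, d: max(q, d),
    "min": lambda q, d: min(q, d),
    "sum": lambda q, d: q + d,
    "avg": lambda q, d: (q + d) / 2,
    "mul": lambda q, d: q * d,
    "query_value": lambda q, d: q,
    "doc_value": lambda q, d: d,
}

MERGE_OPS = {
    "max": max,
    "min": min,
    "sum": sum,
    "avg": lambda scores: sum(scores) / len(scores),
    "first_match": lambda scores: scores[0],
}


def tag_match_model(query: Dict[int, Optional[float]],
                    doc_field: Sequence[Number],
                    kv_op: Union[str, Number],
                    merge_op: str,
                    doc_kv: bool = True) -> float:
    """Simplified model of tag_match(query_key, doc_field, kv_op, merge_op)."""
    if doc_kv:
        doc = {int(k): float(v) for k, v in zip(doc_field[0::2], doc_field[1::2])}
    else:
        doc = {int(k): None for k in doc_field}

    scores = []
    for key, q_value in query.items():  # query order (assumption)
        if key not in doc:
            continue
        if isinstance(kv_op, (int, float)):
            scores.append(float(kv_op))  # constant score per matched key
        else:
            scores.append(float(KV_OPS[kv_op](q_value, doc[key])))

    if not scores:
        return 0.0  # nothing matched and no initial score is modeled
    return float(MERGE_OPS[merge_op](scores))


# Query values from the query_key example; the document array is hypothetical.
query = {10: 0.67, 960: 0.85, 1: 48.0}
print(tag_match_model(query, [10, 2.0, 1, 0.5], "mul", "max"))
print(tag_match_model(query, [7, 1.0], "mul", "sum"))
```

With the hypothetical document [10 2.0 1 0.5], keys 10 and 1 match, and the max merge keeps the larger of the two products. The second call matches nothing and returns 0.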
5. Scenarios
Scenario 1: Posts on a large, comprehensive forum carry different tags, such as funny, sports, news, music, and science. When you push documents to OpenSearch, you can assign an ID to each tag. For example, the IDs of the funny, sports, news, and music tags are 1, 5, 3, and 6. These tags are stored in the tag field. You can also obtain the weight of each tag for each post through preprocessing. For example, for a post, the weights of the funny, sports, and news tags are 0.5, 0.5, and 0.1. In this case, the value of the tag field is [1 0.5 5 0.5 3 0.1]. By analyzing the searches that forum members perform over a long period of time, you can determine the favorite post tags of each member.
For example, the member nba_fans is interested in sports and funny content, and the weights of the sports and funny tags for this member are 0.6 and 0.3. Then, you can use a kvpairs clause to define the tag-weight pairs as key-value pairs and pass the key-value pairs to the query clause when this member searches for posts. If the field name defined in the kvpairs clause is user_tag, the value of the user_tag field for this member is 5=0.6:1=0.3. This way, if you use the tag_match(user_tag, tag, mul, sum) function in a fine sort expression, your search service can calculate the weights of posts in which the member is interested and list the posts with high weights first.
For example, when this member searches for the preceding post, both the funny and sports tags can be matched. You can set the kv_op parameter to mul to obtain the product of the value of each key in the query clause and that of each matched key in the document. In this example, the score of the sports tag is 0.5 × 0.6 = 0.3. The score of the funny tag is 0.5 × 0.3 = 0.15. You can set the merge_op parameter to calculate the sum of scores of two tags by using the following formula: 0.3 + 0.15 = 0.45. Then, the sum is added to the final sorting score. This way, you can sort the posts in which this member is interested by calculating weights.
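The arithmetic in Scenario 1 can be checked with a few lines of Python. This is only a worked calculation with the values from the text (tag field [1 0.5 5 0.5 3 0.1] and user_tag 5=0.6:1=0.3), not a call to OpenSearch. The variable names are illustrative.

```python
# Document side: the tag field of the post, laid out as [key value key value ...].
doc_tag = [1, 0.5, 5, 0.5, 3, 0.1]
doc_weights = dict(zip(doc_tag[0::2], doc_tag[1::2]))

# Query side: user_tag for the member nba_fans (5 = sports, 1 = funny).
user_tag = {5: 0.6, 1: 0.3}

# kv_op = mul: multiply the query value by the document value of each matched key.
per_tag = {key: user_tag[key] * doc_weights[key]
           for key in user_tag if key in doc_weights}

# merge_op = sum: add up the per-tag scores.
score = sum(per_tag.values())

print(per_tag)
print(round(score, 2))
```

The news tag (ID 3) exists only in the document, so it does not contribute to the score. The rounding in the last line only hides binary floating-point noise in the sum.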
Scenario 2:
Goods can have multiple attribute tags. For example, 1 indicates young (age), 2 indicates middle-aged (age), 3 indicates fresh (style), 4 indicates fashion (style), 5 indicates women (gender), and 6 indicates men (gender).
You may want to only match tags without calculating tag weights for sorting. In this case, you can use the options field to store tags. If a piece of clothing has the young, fashion, and women tags, the value of the options field is [1 4 5]. This value consists of only keys. Users also have attribute tags that are similar to the attribute tags of goods. For example, the transaction history of a young female user shows that she purchased fresh-style clothes. In this case, user_options:1:3:5 can be passed in the kvpairs clause of the query when this user searches for clothes. The field that is defined by the kvpairs clause consists of only keys.
If you want to sort goods that have the favorite tags of users by calculating weights, you can use the tag_match(user_options, options, 10, sum, false, false) function in a sort expression. The user_options parameter stores the tags in a query clause, and the options parameter stores the tags in a document. The value 10 of the kv_op parameter indicates that 10 is the score for each pair of matched keys. The value false of the has_default parameter indicates that the initial score is not used. The value false of the doc_kv parameter indicates that the value of the doc_field parameter consists of only keys.
When the preceding young female user searches for the preceding clothes, both the women and young tags can be matched, and the scores of both tags are 10. After the sum operation specified by the merge_op parameter is performed on the two scores, the final score of the clothes is 20. This way, you can sort documents by weight without the weight information about tags.
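The key-only matching in Scenario 2 can be sketched as a ranking over a small catalog. The first item uses the options value from the text. The other two items and all item names are hypothetical and serve only to show the resulting order. The helper constant_sum_score is an illustrative stand-in for tag_match(user_options, options, 10, sum, false, false) and assumes that keys are not repeated within a document.

```python
# Query side: key-only tags of the young female user (user_options 1:3:5).
user_options = {1, 3, 5}

# Document side: key-only options arrays. Only the first value comes from the text.
goods = {
    "clothes_from_text": [1, 4, 5],
    "hypothetical_menswear": [2, 4, 6],
    "hypothetical_dress": [1, 3, 5],
}


def constant_sum_score(doc_keys, query_keys, constant):
    """Score = constant for each matched key, merged with sum."""
    matched = set(doc_keys) & set(query_keys)
    return float(constant * len(matched))


scores = {name: constant_sum_score(keys, user_options, 10)
          for name, keys in goods.items()}

# Sort in descending order of score, as a descending sort expression would.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for name, score in ranked:
    print(name, score)
```

The item from the text matches the young and women tags and scores 20, which agrees with the calculation above.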
Usage notes
The fields that you reference in the parameters of the function must be configured as attribute fields.
If the tag_match function is used in a filter clause or a sort clause, the query_key, kv_op, merge_op, has_default, and doc_kv parameters must be enclosed in double quotation marks ("). Example: sort=-tag_match("user_options", options, "mul", "sum", "false", "true", 100).
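If you build query strings in application code, a small helper can apply the quoting rule consistently. The following Python sketch is one possible approach. The function name tag_match_clause is hypothetical. It leaves a numeric kv_op constant unquoted, which follows the sort clause examples in this topic and is otherwise an assumption.

```python
def tag_match_clause(query_key, doc_field, kv_op, merge_op,
                     has_default=None, doc_kv=None, max_kv_count=None):
    """Build a tag_match call in the form used in filter and sort clauses."""

    def quoted(value):
        return '"{}"'.format(value)

    def flag(value):
        return quoted("true" if value else "false")

    # The optional parameters are positional, so none of them can be skipped.
    if has_default is None and (doc_kv is not None or max_kv_count is not None):
        raise ValueError("has_default is required when doc_kv or max_kv_count is set")
    if doc_kv is None and max_kv_count is not None:
        raise ValueError("doc_kv is required when max_kv_count is set")

    # Numeric constants stay unquoted (assumption based on the examples).
    kv_arg = str(kv_op) if isinstance(kv_op, (int, float)) else quoted(kv_op)
    args = [quoted(query_key), doc_field, kv_arg, quoted(merge_op)]
    if has_default is not None:
        args.append(flag(has_default))
    if doc_kv is not None:
        args.append(flag(doc_kv))
    if max_kv_count is not None:
        args.append(str(max_kv_count))
    return "tag_match({})".format(", ".join(args))


# Descending sort by the tag_match score.
clause = "sort=-" + tag_match_clause("user_options", "options", "mul", "sum",
                                     has_default=False, doc_kv=True,
                                     max_kv_count=100)
print(clause)
```

The doc_field argument is a field name and is therefore not quoted.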
The tag_match function matches only integer keys. Therefore, the keys in a query clause and those in a document must be integers. If the keys are floating-point numbers, the tag_match function forcibly converts them to integers.
Examples
Assume that your documents use the following 10 tags:
1: finance and economics
2: technology
3: sports
4: entertainment
5: fashion
6: education
7: traveling
8: games
9: science
10: medical
Example 1: Sort titles with the same keyword but different tags
When you search for "chiji", two documents are retrieved, as shown in the preceding figure. The tags of the two documents are different. The tag ID of the first document is 1, which indicates finance and economics. The tag ID of the second document is 8, which indicates games. If you want the document with the games tag to be listed first, you can use the tag_match function. The following examples show how the tag_match function is used in a sort expression and a sort clause:
kvpairs clause: type:8
Sort expression: tag_match(type, type_arr, 10, max,false,false)
Sort clause: tag_match("type", type_arr, 10, "max","false","false")
The following figure shows the search results obtained by using the sort expression.
The following figure shows the search results obtained by using the sort clause.
Example 2: Sort titles by calculating the final score based on the weights of multiple tags for personalized recommendation
If the first-level tags are the same, as shown in the preceding figure, you need to calculate the scores of the second-level tags. The following examples show how the tag_match function is used in a sort expression and a sort clause:
kvpairs clause: type:3=2:10=1
Sort expression: tag_match(type, type_arr, 10, sum,false,true)
Sort clause: tag_match("type", type_arr, 10, "sum","false","true")
Example 3: Sort titles with the same tags that have different weights
The documents that are framed in red in the preceding figure have the same tags but the tags have different weights. The following examples show how the tag_match function is used in a sort expression and a sort clause:
kvpairs clause: type:3=2:9=2
Sort expression: tag_match(type, type_arr, sum, sum,false,true)
Sort clause: tag_match("type", type_arr, "sum", "sum","false","true")
Sample code of using the SDK for Java: Search demo