A match phrase query is similar to a match query, except that it also evaluates the positions of tokens. A row meets the query conditions only if the tokens in the row appear in the same order and at the same relative positions as the tokens in the keyword. If the field that you want to query uses fuzzy tokenization, a match phrase query has lower latency than a wildcard query.
Scenarios
You can use match phrase query to search for data that contains a specific phrase in which the words are arranged in a specific order. You can use match phrase query together with tokenization to perform full-text search in specific scenarios, such as big data analysis, content search, and personalized recommendation. For example, you can query sentences that contain a specific phrase in content search and locate messages that are arranged in a specific sequence in chat records.
Features
A match phrase query uses approximate matches to query data and evaluates the positions of tokens. For example, suppose that the value of a TEXT column in a row is "Hangzhou West Lake Scenic Area" and the keyword that you specify is "Hangzhou Scenic Area". A match query returns the row, but a match phrase query does not. In the keyword, the distance between "Hangzhou" and "Scenic Area" is 0. In the column value, the distance is 2 because the two words "West" and "Lake" appear between "Hangzhou" and "Scenic Area".
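The difference between the two query types in this example can be sketched in a few lines of Python. This is a minimal illustration, not Tablestore's implementation. It assumes lowercase whitespace tokenization and an OR-style match query, and the function names tokenize, match_any, match_phrase, and gap_between are hypothetical.

```python
def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real analyzers differ.
    return text.lower().split()


def match_any(field_value, keyword):
    # Match-query-like check (assumed OR semantics): positions are ignored.
    tokens = set(tokenize(field_value))
    return any(token in tokens for token in tokenize(keyword))


def match_phrase(field_value, keyword):
    # Phrase check: the keyword tokens must appear consecutively, in order.
    doc = tokenize(field_value)
    phrase = tokenize(keyword)
    n = len(phrase)
    if n == 0:
        return False
    return any(doc[i:i + n] == phrase for i in range(len(doc) - n + 1))


def gap_between(field_value, first, second):
    # Number of tokens that sit between the first occurrences of two tokens.
    doc = tokenize(field_value)
    return doc.index(second.lower()) - doc.index(first.lower()) - 1


value = "Hangzhou West Lake Scenic Area"
keyword = "Hangzhou Scenic Area"

print(match_any(value, keyword))                  # position-insensitive check
print(match_phrase(value, keyword))               # position-sensitive check
print(gap_between(value, "Hangzhou", "Scenic"))   # tokens between the two words
```

In this sketch, the position-insensitive check accepts the row, while the phrase check rejects it because "West" and "Lake" separate "Hangzhou" from "Scenic Area".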
When you use match phrase query, you must specify the name of the field that you want to query and the keyword.
When you perform a match phrase query, you can also specify the following settings: the weight of the field that you want to query, which is used to calculate the BM25-based keyword relevance score; the columns that you want to return; whether to return the total number of rows that meet the query conditions; and the method that is used to sort the returned rows.
API operation
You can call the Search or ParallelScan operation and set the query type to MatchPhraseQuery to perform a match phrase query.
Parameters
Parameter | Description |
fieldName | The name of the field that you want to match. You can perform match phrase queries on TEXT fields. |
text | The keyword that is used to match the value of the field when you perform a match phrase query. If the field that you want to match is a TEXT field, the keyword is tokenized into multiple tokens based on the analyzer type that you specify when you create the search index. If you do not specify the analyzer type when you create the search index, single-word tokenization is performed. For more information, see Tokenization. For example, if you perform a match phrase query by using the phrase "this is", rows that contain "..., this is tablestore" or "this is a table" are returned, but rows that contain "this table is ..." or "is this a table" are not returned. |
query | The type of the query. Set the query parameter to matchPhraseQuery. |
offset | The position from which the current query starts. |
limit | The maximum number of rows that you want the current query to return. To query only the number of rows that meet the query conditions without specific data, set the limit parameter to 0. |
getTotalCount | Specifies whether to return the total number of rows that meet the query conditions. The default value of this parameter is false, which specifies that the total number of rows that meet the query conditions is not returned. Setting this parameter to true degrades query performance. |
weight | The weight that you want to assign to the field that you want to query to calculate the BM25-based keyword relevance score. This parameter is used in full-text search scenarios. If you specify a higher weight for the field that you want to query, the BM25-based keyword relevance score for the field is higher. The value of this parameter is a positive floating point number. This parameter does not affect the number of rows that are returned. However, this parameter affects the BM25-based keyword relevance scores of the query results. |
tableName | The name of the data table. |
indexName | The name of the search index. |
columnsToGet | Specifies whether to return all columns of each row that meets the query conditions. You can specify the returnAll and columns fields for the columnsToGet parameter. The default value of the returnAll field is false, which specifies that not all columns are returned. In this case, you can use the columns field to specify the columns that you want to return. If you do not specify the columns that you want to return, only the primary key columns are returned. If you set the returnAll field to true, all columns are returned. |
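The way the parameters in the preceding table interact can be sketched with a small in-memory model. This is a toy illustration only: it does not call Tablestore or any SDK. The function toy_search, the helper phrase_in, the primary key name "pk", and the default limit of 10 are all assumptions made for the example, and tokenization is assumed to be lowercase whitespace splitting.

```python
def phrase_in(text, phrase):
    # Assumption: whitespace tokenization; tokens must be consecutive and in order.
    doc, want = text.lower().split(), phrase.lower().split()
    n = len(want)
    return n > 0 and any(doc[i:i + n] == want for i in range(len(doc) - n + 1))


def toy_search(rows, field_name, text, offset=0, limit=10,
               get_total_count=False, return_all=False, columns=None,
               primary_key="pk"):
    # Hypothetical stand-in for a search request; not a Tablestore API.
    hits = [row for row in rows if phrase_in(row.get(field_name, ""), text)]
    page = hits[offset:offset + limit]  # limit=0 yields no row data
    if return_all:
        projected = [dict(row) for row in page]
    else:
        # Without returnAll, keep the primary key plus any requested columns.
        keep = [primary_key] + list(columns or [])
        projected = [{k: row[k] for k in keep if k in row} for row in page]
    total = len(hits) if get_total_count else None  # not reported by default
    return {"rows": projected, "total_count": total}


rows = [
    {"pk": 1, "col": "..., this is tablestore"},
    {"pk": 2, "col": "this is a table"},
    {"pk": 3, "col": "this table is ..."},
    {"pk": 4, "col": "is this a table"},
]

# Count only: limit=0 together with get_total_count=True.
print(toy_search(rows, "col", "this is", limit=0, get_total_count=True))

# Default projection returns only the primary key of each matching row.
print(toy_search(rows, "col", "this is"))
```

In this model, setting limit to 0 and getTotalCount to true reports how many rows match without returning row data, and omitting both returnAll and columns leaves only the primary key in each returned row.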
Usage notes
You can use search indexes to sort query results based on the BM25-based keyword relevance score. You cannot specify custom relevance scores to sort query results.
Methods
You can use the Tablestore console, Tablestore CLI, or Tablestore SDKs to perform a match phrase query.
Before you perform a match phrase query, make sure that the following preparations are made:
You have an Alibaba Cloud account or a RAM user that has permissions to perform operations on Tablestore. For information about how to grant Tablestore operation permissions to a RAM user, see Use a RAM policy to grant permissions to a RAM user.
If you want to use Tablestore SDKs or the Tablestore CLI to perform a query, an AccessKey pair is created for your Alibaba Cloud account or RAM user. For more information, see Create an AccessKey pair.
A data table is created. For more information, see Operations on tables.
A search index is created for the data table. For more information, see Create a search index.
If you want to use Tablestore SDKs to perform a query, an OTSClient instance is initialized. For more information, see Initialize an OTSClient instance.
If you want to use the Tablestore CLI to perform a query, the Tablestore CLI is downloaded and started, and information about the instance that you want to access and the data table are configured. For more information, see Download the Tablestore CLI, Start the Tablestore CLI and configure access information, and Operations on data tables.
Billing rules
When you use a search index to query data, you are charged for the read throughput that is consumed. For more information, see Billable items of search indexes.
References
When you use a search index to query data, you can use the following query methods: term query, terms query, match all query, match query, match phrase query, prefix query, range query, wildcard query, fuzzy query, Boolean query, geo query, nested query, KNN vector query, and exists query. You can select query methods based on your business requirements to query data from multiple dimensions.
You can sort or paginate rows that meet the query conditions by using the sorting and paging features. For more information, see Perform sorting and paging.
You can use the collapse (distinct) feature to collapse the result set based on a specific column. This way, data of the specified type appears only once in the query results. For more information, see Collapse (distinct).
If you want to analyze data in a data table, you can use the aggregation feature of the Search operation or execute SQL statements. For example, you can obtain the minimum and maximum values, sum, and total number of rows. For more information, see Aggregation and SQL query.
If you want to obtain all rows that meet the query conditions without the need to sort the rows, you can call the ParallelScan and ComputeSplits operations to use the parallel scan feature. For more information, see Parallel scan.