How do I use search indexes to perform multi-dimensional data queries and common data statistical analysis? - Tablestore

Tablestore provides the following core features: query based on random primary key columns and non-primary key columns, Boolean query, geo query, full-text search, fuzzy query, prefix query, nested query, collapse (distinct), sorting, match all query, and aggregation. You can use these features based on your business requirements. Some features of search indexes are equivalent to specific SQL statements.

Core features

Search indexes support multi-dimensional data queries and common statistical analysis. The following table describes the core features of search indexes.

Feature	Description	References
Queries based on primary key columns and non-primary key columns	Queries that are based on only primary key columns or prefixes of primary key columns cannot meet requirements in some scenarios. If you want to query data based on a non-primary key column, you can include the column in a search index and perform queries based on the non-primary key column.	Documentation for query methods provided by search indexes, such as Term query
Boolean query	Boolean queries are suitable for order scenarios. In order scenarios, a table may contain dozens of fields. It may be difficult to determine how to combine the fields required for queries when you create a table. Even if you can determine how to combine the required fields, hundreds of combinations may be available. If you use a relational database service, you may have to create hundreds of indexes. In addition, if no index is created for a specific combination in advance, you cannot query the required data. You can use Tablestore to create a search index that includes all the fields that may be required and freely combine these fields in queries. Search indexes also support logical operators such as AND, OR, and NOT.	Boolean query
Geo query	As mobile devices gain popularity, geographical location data becomes increasingly important. Geographical location data is used in various apps such as social media apps, food delivery apps, sports apps, and Internet of Vehicles (IoV) apps. These apps require query features that support geographical location data. Search indexes support queries based on the following geographical location data: Near: queries points within a specific radius based on a central point, such as the People Nearby feature in WeChat. Within: queries points within a specific rectangular or polygonal area. Tablestore allows you to use these features to query geographical location data without the need to use other database services or search engines.	Geo-distance query Geo-bounding box query Geo-polygon query
Full-text search	Search indexes can tokenize data to support full-text search. You can use search indexes to sort query results based on only the BM25-based keyword relevance score. If you want to sort query results based on other relevance scores, we recommend that you use search systems. Search indexes support the following tokenization methods: single-word tokenization, delimiter tokenization, minimum semantic unit tokenization, maximum semantic unit tokenization, and fuzzy tokenization. You can select tokenization methods based on your requirements. To highlight specific keywords in the query output, you can enable the highlight feature.	Tokenization Highlight keywords Match query Match phrase query
KNN vector query	Search indexes provide the KNN vector query feature. You can find the most similar data items in large-scale datasets by using vectors to perform approximate nearest neighbor searches. This feature is suitable for scenarios such as recommendation systems, image and video retrieval, and natural language processing (NLP).	Overview of KNN vector query
Fuzzy query	Search indexes support queries based on wildcards. This feature is similar to the LIKE operator in relational databases. You can specify characters and wildcards such as question marks (`?`) or asterisks (`*`) to perform a wildcard query that is similar to the LIKE operator.	Wildcard query Fuzzy query
Prefix query	Search indexes support prefix queries. For example, in a query based on the prefix `apple`, results such as `apple6s` and `applexr` are returned. This feature is suitable for all natural languages.	Prefix query
Exists query	An exists query is also called a NULL query or a NULL-value query. This type of query is used in sparse data to determine whether a column of a row exists.	Exists query
Nested query	In addition to a flat structure, the data produced by online applications, such as tagged images, often has a complex and multilayered structure. For example, a database stores a large number of images, and each image has multiple elements such as houses, cars, and people. Each element in an image has a unique weight score. The score is evaluated based on the size and position of an element in an image. Therefore, each image has multiple tags. Each tag has a name and a weight score. You can use nested queries to query data based on the tag conditions. Image tags are stored in the JSON format. Example: `{ "tags": [ { "name": "car", "score": 0.78 }, { "name": "tree", "score": 0.24 } ] }` Nested queries can be used to query data that has multiple layers of logical relationships. This greatly facilitates the modeling of complex data. For Nested fields in complex data structures such as JSON, you can also enable the highlight feature to precisely locate the required information.	Array and Nested data types Nested query
Collapse (distinct)	Search indexes support the collapse (distinct) feature. You can use the collapse (distinct) feature to collapse the result set based on a specific column. This way, data of the specified type appears only once in the query results to ensure the diversity of the result types. For example, in e-commerce scenarios, if you search for `laptop`, the first page may display laptops of the same brand so that the results are not user-friendly. In this case, you can use the collapse (distinct) feature to ensure that the first page displays laptops from distinct brands.	Collapse (distinct)
Sorting	Tablestore provides the sorting feature based on the primary key of the data. If you want to sort data based on other fields, you can use the sorting feature of search indexes. Search indexes support sorting in ascending order or descending order based on one or more conditions. The sort operation is globally performed on all data in a search index. By default, the returned results of a search index are sorted based on the primary key in the data table.	Perform sorting and paging Documentation for query methods provided by search indexes, such as Term query
Match all query	You can use search indexes to return the total number of rows that match the query conditions. This feature applies to data verification and data-driven operations. If you configure an empty query condition when you perform a match all query by using a search index, all data in the search index matches the query condition. In this case, the total number of rows that are returned is the total number of rows in the search index. If you do not write data during the query and all data in the data table is indexed, the total number of rows that are returned is the total number of rows in the data table.	Match all query Documentation for query methods provided by search indexes, such as Term query
Aggregation	Search indexes allow you to perform aggregation operations to obtain the maximum value, minimum value, average value, sum, count, and distinct count of rows, percentiles, and histogram statistics. You can also use search indexes to group results. This allows you to perform lightweight statistical analysis.	Aggregation

Mappings between SQL statements and the features of search indexes

Some features of search indexes have equivalent SQL statements. The following table describes the mappings between SQL statements and the features of search indexes.

SQL	Search index feature	References
Show	API operation: DescribeSearchIndex	DescribeSearchIndex
Select	Parameter: ColumnsToGet	Documentation for query methods provided by search indexes, such as Term query
From	Parameter: index name Important Single-column indexes are supported. Multi-column indexes are not supported.	Documentation for query methods provided by search indexes, such as Term query
Where	Query: a variety of query methods such as term query	Documentation for query methods provided by search indexes, such as Term query
Order by	Parameter: sort	Perform sorting and paging
Limit	Parameter: limit	Perform sorting and paging
Delete	API operation: DeleteRow after a query	Obtain the primary key of rows Delete data
Like	Query: WildcardQuery	Wildcard query
And	Parameter: operator = and	Boolean query
Or	Parameter: operator = or
Not	Query: BoolQuery(mustNotQueries)
Between	Query: RangeQuery	Range query
Null	Query: ExistsQuery	Exists query
In	Query: TermsQuery	Terms query
Min	Aggregation: min	Aggregation
Max	Aggregation: max
Avg	Aggregation: avg
Count	Aggregation: count
Count(distinct)	Aggregation: distinctCount
Sum	Aggregation: sum
Group By	GroupBy