This topic provides a detailed analysis of queries based on the primary key of data tables, queries based on secondary indexes, and queries based on search indexes. This topic also provides suggestions on how to select between a secondary index and a search index.
Background information
To query data in a Tablestore data table, you must specify a primary key or a primary key value range. You can call the GetRow operation to query a row based on the specified primary key or call the GetRange operation to query the rows whose primary key values are within the specified range. The filter feature is used to query data in attribute columns. If attribute columns contain a large amount of data, the query efficiency of the filter feature is low and even a full table scan is required. For more information, see Read data.
In addition to the unsatisfactory query efficiency of the filter feature for a large amount of data, queries based on the primary key of data tables may not meet your business requirements in actual scenarios. To help you query data, Tablestore provides secondary indexes and search indexes.
Overview
Tablestore provides secondary indexes and search indexes to accelerate data queries.
Queries based on secondary indexes
After you create a secondary index for a data table, an index table is created for the data table. The model of the index table is the same as the model of the data table. The index table provides a custom schema that can be used to improve query efficiency based on your business scenario. For more information, see Overview.
You can use an index table to query data based on the specified primary key, primary key value range, or prefix range of primary key columns. Tablestore automatically adds the primary key columns of a data table that are not specified as index key columns to an index table as the primary key columns of the index table. This ensures that the primary key value of each row in the index table is unique.
In Tablestore, secondary indexes are classified into global secondary indexes and local secondary indexes. If you use the global secondary index feature, Tablestore automatically synchronizes data from the indexed columns and primary key columns of a data table for which an index table is created to the index table in asynchronous mode. The synchronization latency is within a few milliseconds.
To meet user requirements on strong consistency during data queries, Tablestore provides the local secondary index feature. If you use the local secondary index feature, Tablestore automatically synchronizes data from the indexed columns and primary key columns of a data table for which an index table is created to the index table in synchronous mode. After data is written to the data table, you can immediately query the data from the index table.
The following table describes the differences between global secondary indexes and local secondary indexes in terms of their synchronization mode, requirements on the first primary key column, synchronization latency, and read consistency.
Item | Global secondary index | Local secondary index |
Synchronization mode | Asynchronous mode | Synchronous mode |
Requirements on the first primary key column | The first primary key column of a global secondary index can be any primary key column or predefined column of the data table for which the secondary index is created. | The first primary key column of a local secondary index must be the same as the first primary key column of the data table for which the secondary index is created. |
Synchronization latency | Within a few milliseconds | Real-time |
Read consistency | Eventual consistency | Strong consistency |
Queries based on search indexes
Search indexes support various index schemas, such as inverted indexes and spatio-temporal indexes. Therefore, different from queries based on the primary key of data tables and queries based on secondary indexes, queries based on search indexes support multiple query methods, such as Boolean query, fuzzy query, geo query, and full-text search. For more information, see Overview.
Search indexes provide more features than secondary indexes. You can specify various conditions to query data from multiple dimensions by using a search index.
Select an index
Check whether you need to use indexes based on query conditions.
In most cases, you do not need to use indexes in the following query scenarios:
If queries based on the specified primary key or primary key value range can meet your business requirements, you do not need to create an index. For more information, see Read data.
If you want to filter data in a range within which the amount of data is small or if you do not frequently perform queries, you can use the filter feature. You do not need to create an index. For more information, see Configure a filter.
If you do not frequently perform complex queries and are not sensitive to latency, you can execute SQL statements to query data in Tablestore. For more information, see Query data.
If you need to use an index, select a secondary index or a search index based on your business requirements.
A secondary index is an index table that is similar to a data table. Secondary indexes provide a data distribution mode that uses a specific method to arrange the primary key columns of index tables. One index supports one query condition. Secondary indexes arrange data based on the specified query condition to improve query performance. An index table supports the same amount of data as a data table. You must consider hashing when you specify the primary key for secondary indexes.
A search index contains a combination of schemas. Each column supports schemas such as inverted indexes. You can sort query results by column. One search index can support multiple query conditions. You do not need to create multiple search indexes for different query conditions.
Compared with secondary indexes, search indexes support multiple query methods, such as Boolean query, fuzzy query, full-text search, and geo query. Search indexes also support multiple schemas to accelerate data filtering but are less efficient than secondary indexes when data is read in a fixed order. The query performance of search indexes varies based on the size of a data table. If a data table contains tens of billions of rows, we recommend that you use the routing key to shard data. You can also use the routing key to reduce the amount of data in a query.