Use Tablestore SDK for Java to create a search index

You can call the CreateSearchIndex operation to create one or more search indexes for a data table. When you create a search index, you can add the fields that you want to query to the search index and configure advanced settings for the search index. For example, you can configure the routing key and presorting settings.

Prerequisites

An OTSClient instance is initialized. For more information, see Initialize an OTSClient instance.
A data table for which the maxVersions parameter is set to 1 is created. The timeToLive parameter of the data table must meet one of the following conditions. For more information, see Create a data table.
- The timeToLive parameter is set to -1, which specifies that data in the data table never expires.
- The timeToLive parameter is set to a value other than -1, and update operations on the data table are prohibited.
You are familiar with the field types supported by search indexes and the mappings between the field types in search indexes and the field types in data tables. For more information, see Data types.

Usage notes

The data types of the fields in a search index must match the data types of the fields in the data table for which the search index is created. For more information, see Data types.
To specify a value other than -1 for the timeToLive parameter of a search index, you must disable the UpdateRow operation on the data table for which the search index is created. In addition, the time to live (TTL) value of the search index must be less than or equal to the TTL value of the data table. For more information, see Configure the TTL of a search index.
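
The TTL rules above, together with the value range listed for the timeToLive parameter later in this topic, can be written down as a small check. The following is a minimal sketch only: SearchIndexTtlCheck and isAllowed are hypothetical names that are not part of the Tablestore SDK, and the sketch assumes that a data table TTL of -1 places no upper limit on the search index TTL.

```java
/** Hypothetical helper that mirrors the TTL rules in this topic; not part of the Tablestore SDK. */
final class SearchIndexTtlCheck {
    static final int NEVER_EXPIRES = -1;
    static final int MIN_TTL_SECONDS = 86400; // one day

    /**
     * @param indexTtl          requested TTL of the search index, in seconds
     * @param tableTtl          TTL of the data table, in seconds
     * @param updateRowDisabled whether update operations on the data table are prohibited
     */
    static boolean isAllowed(int indexTtl, int tableTtl, boolean updateRowDisabled) {
        boolean tableNeverExpires = tableTtl == NEVER_EXPIRES;
        if (indexTtl == NEVER_EXPIRES) {
            // Prerequisite: the table never expires, or updates on the table are prohibited.
            return tableNeverExpires || updateRowDisabled;
        }
        // A finite index TTL requires that updates are prohibited and that the TTL is at least one day.
        if (!updateRowDisabled || indexTtl < MIN_TTL_SECONDS) {
            return false;
        }
        // The index TTL must not exceed the table TTL (assumption: -1 on the table means no limit).
        return tableNeverExpires || indexTtl <= tableTtl;
    }
}
```

For example, isAllowed(604800, 86400, true) returns false because the seven-day index TTL exceeds the one-day table TTL.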

API operation

public class CreateSearchIndexRequest implements Request {
    /** The name of the data table. */
    private String tableName;
    /** The name of the search index. */
    private String indexName;
    /** The schema of the search index. */
    private IndexSchema indexSchema;
    /**
     * In most cases, you do not need to specify this parameter. 
     * You can use the setter method to specify this parameter only when the schema of the search index is dynamically modified. The parameter value is the name of the source search index used for reindexing. 
     */
    private String sourceIndexName;
    /** The TTL of data in the search index. Unit: seconds. After you create the search index, you can call the UpdateSearchIndex operation to dynamically modify this parameter. */
    private Integer timeToLive;
}

public class IndexSchema implements Jsonizable {
    /** The settings of the search index. */
    private IndexSetting indexSetting;
    /** The schema of all fields in the search index. */
    private List<FieldSchema> fieldSchemas;
    /** The presorting settings of the search index. */
    private Sort indexSort;
}

Parameters

When you create a search index, you must configure the tableName, indexName, and indexSchema parameters. In the indexSchema parameter, configure fieldSchemas and, if required, indexSetting and indexSort. The following table describes the parameters.

Parameter	Description
tableName	The name of the data table.
indexName	The name of the search index.
fieldSchemas	The list of field schemas. In each field schema, configure the following parameters:
- fieldName: required. The name of the field in the search index. The value is used as a column name. Type: String. A field in a search index can be a primary key column or an attribute column of the data table.
- fieldType: required. The type of the field. Specify the type in the FieldType.XXX format. For more information, see Data types.
  Note:
  - To store and query data in multi-level logical relationships, use Nested fields. For more information, see Array and Nested data types.
  - To store and query data in the JSON format, store the JSON-formatted data in String fields in the data table. Then, map the String fields to Array or Nested fields in the search index that is created for the data table, and use the Array or Nested fields to query the JSON-formatted data in a flexible manner. For more information, see Array and Nested data types.
  - To query data that is related to geographical locations, store the data in Geo-point fields.
- index: optional. Specifies whether to enable indexing for the field. Type: Boolean. Default value: true. A value of true specifies that Tablestore builds an inverted index or a spatio-temporal index for the field. A value of false specifies that indexing is disabled for the field.
- enableHighlighting: optional. Specifies whether to enable the highlight feature. Type: Boolean. Default value: false, which specifies that the highlight feature is disabled. Set this parameter to true to use the highlight feature. Only Text fields support the highlight feature. For more information, see Highlight the query results.
- analyzer: optional. The type of analyzer that you want to use. You can configure this parameter only if you set the fieldType parameter to Text. If you do not configure this parameter, single-word tokenization is used by default. For more information, see Tokenization.
- analyzerParameter: optional. The settings of the analyzer. Configure this parameter based on the type of analyzer that you specified. If you configure the analyzer parameter for the field, you must configure this parameter. For more information, see Tokenization.
- enableSortAndAgg: optional. Specifies whether to enable sorting and aggregation. Type: Boolean. Default value: true. A value of true specifies that sorting and aggregation are enabled. Only fields for which the enableSortAndAgg parameter is set to true can be sorted. For more information, see Perform sorting and paging.
  Important: Sorting and aggregation are not supported for Text fields. To perform sorting or aggregation on a Text field, use the virtual column feature and set the type of the virtual column to which the Text field is mapped to Keyword. For more information, see Virtual columns.
- isAnArray: optional. Specifies whether the value is an array. Type: Boolean. If you set this parameter to true, the field stores data as an array, and data written to the field must be a JSON array. Example: `["a","b","c"]`. Nested values are always arrays, so skip this parameter if you set the fieldType parameter to Nested.
- subFieldSchemas: the list of field schemas for subfields. If the field is a Nested field, you must specify this parameter to configure the index types of the subfields in the Nested field.
- isVirtualField: optional. Specifies whether the field is a virtual column. Type: Boolean. Default value: false. Set this parameter to true to use a virtual column. For more information, see Virtual columns.
- sourceFieldName: optional. The name of the source field to which the virtual column is mapped in the data table. Type: String. If you set the isVirtualField parameter to true, you must configure this parameter.
- dateFormats: optional. The format of dates. Type: String. If you set the fieldType parameter to Date, you must configure this parameter. For more information, see Date data type.
- vectorOptions: optional. The properties of the Vector field. If you set the fieldType parameter to Vector, you must configure this parameter. You can use the following parameters to specify the properties of the Vector field:
  - dataType: the type of vector data. Only float32 is supported. If you want to use other types of vector data, submit a ticket.
  - dimension: the number of dimensions of the vector. For information about the limits on the number of dimensions of a vector, see Search index limits.
  - metricType: the algorithm that you want to use to measure the distance between vectors. Valid values: euclidean, cosine, and dot_product. For more information, see Appendix: distance measurement algorithms for vectors.
    - euclidean: the Euclidean distance algorithm that measures the shortest path between two vectors in a multi-dimensional space. For better performance, the Euclidean distance algorithm in Tablestore does not perform the final square root calculation. A greater value that is obtained by using the Euclidean distance algorithm indicates a higher similarity between two vectors.
    - cosine: the cosine similarity algorithm that calculates the cosine of the angle between two vectors in a vector space. A greater value that is obtained by using the cosine similarity algorithm indicates a higher similarity between two vectors. In most cases, the algorithm is used to calculate the similarity between text data.
    - dot_product: the dot product algorithm that multiplies the corresponding coordinates of two vectors of the same dimension and adds the products. A greater value that is obtained by using the dot product algorithm indicates a higher similarity between two vectors.
indexSetting	The settings of the search index, including the routingFields parameter.
- routingFields: optional. The custom routing fields. You can specify specific primary key columns as routing fields. Tablestore distributes data that is written to a search index across different partitions based on the specified routing fields. Data with the same routing field values is distributed to the same partition.
indexSort	The presorting settings of the search index, including the sorters parameter. If you do not configure the indexSort parameter, field values are sorted by primary key.
Note: You can skip the presorting settings for search indexes that contain Nested fields.
- sorters: optional. The presorting method for the search index. Valid values: PrimaryKeySort and FieldSort. For more information, see Perform sorting and paging.
  - PrimaryKeySort: sorts data by primary key. You can configure the following parameter for PrimaryKeySort:
    - order: the sort order. Data can be sorted in ascending or descending order. Default value: SortOrder.ASC, which specifies that data is sorted in ascending order.
  - FieldSort: sorts data by the value of one or more fields. Only fields for which indexing is enabled and enableSortAndAgg is set to true can be presorted. You can configure the following parameters for FieldSort:
    - fieldName: the name of the field that you want to use to sort data.
    - order: the sort order. Data can be sorted in ascending or descending order. Default value: SortOrder.ASC, which specifies that data is sorted in ascending order.
    - mode: the sorting method that you want to use when the field contains multiple values.
sourceIndexName	This parameter is optional. In most cases, you do not need to configure this parameter. You can use the setter method to specify this parameter only when the schema of the search index is dynamically modified. The parameter value is the name of the source search index used for reindexing.
timeToLive	This parameter is optional and specifies the retention period of data in the search index. Unit: seconds. Default value: -1, which specifies that data in the search index never expires. Valid values: -1 or a value that is greater than or equal to 86400 (one day). Data whose retention period exceeds the value of the timeToLive parameter expires, and Tablestore automatically deletes the expired data. For more information, see Configure the TTL of a search index.
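
To illustrate the routingFields behavior described in the table above, the following sketch maps routing field values to a partition number. It is an illustration only: this topic does not describe how Tablestore actually assigns partitions, so the hash-and-modulo scheme, the partition count, and the RoutingSketch and partitionFor names are all assumptions. The only property taken from this topic is that rows with the same routing field values land in the same partition.

```java
import java.util.List;

/** Hypothetical model of routing; the real partitioning function of Tablestore is not documented here. */
final class RoutingSketch {
    /**
     * Returns a partition number in the range [0, partitionCount) for the given routing field values.
     * Equal lists of values always produce the same partition number.
     */
    static int partitionFor(List<String> routingValues, int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        // List.hashCode() depends only on the elements and their order.
        return Math.floorMod(routingValues.hashCode(), partitionCount);
    }
}
```

A query that supplies the routing field values can, under this model, be directed to a single partition instead of all partitions.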

Examples

Create a search index by using the default configurations

The following sample code shows how to create a search index by using the default configurations. The search index contains the following fields: Col_Keyword of the Keyword type, Col_Long of the Long type, and Col_Vector of the Vector type. Data in the search index is presorted by the primary key of the data table and never expires.

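// Imports required by this example.
import com.alicloud.openservices.tablestore.SyncClient;
import com.alicloud.openservices.tablestore.model.search.CreateSearchIndexRequest;
import com.alicloud.openservices.tablestore.model.search.FieldSchema;
import com.alicloud.openservices.tablestore.model.search.FieldType;
import com.alicloud.openservices.tablestore.model.search.IndexSchema;
import com.alicloud.openservices.tablestore.model.search.vector.VectorDataType;
import com.alicloud.openservices.tablestore.model.search.vector.VectorMetricType;
import com.alicloud.openservices.tablestore.model.search.vector.VectorOptions;
import java.util.Arrays;

private static void createSearchIndex(SyncClient client) {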
    CreateSearchIndexRequest request = new CreateSearchIndexRequest();
    // Specify the name of the data table. 
    request.setTableName("<TABLE_NAME>"); 
    // Specify the name of the search index. 
    request.setIndexName("<SEARCH_INDEX_NAME>"); 
    IndexSchema indexSchema = new IndexSchema();
    indexSchema.setFieldSchemas(Arrays.asList(
            // Specify the names and types of the fields. 
            new FieldSchema("Col_Keyword", FieldType.KEYWORD), 
            new FieldSchema("Col_Long", FieldType.LONG),
            // Specify the type of the vector. 
            new FieldSchema("Col_Vector", FieldType.VECTOR).setIndex(true)
                    // Set the number of dimensions to 4 and the distance measurement algorithm for vectors to the dot product algorithm. 
                    .setVectorOptions(new VectorOptions(VectorDataType.FLOAT_32, 4, VectorMetricType.DOT_PRODUCT))
    ));
    request.setIndexSchema(indexSchema);
    // Call the client to create the search index. 
    client.createSearchIndex(request); 
}
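
As a rough illustration of the metricType values described in the parameter table, the following self-contained sketch computes the three raw measures for two float vectors. VectorMetricSketch and its methods are hypothetical names that are not part of the Tablestore SDK, and the sketch does not show how Tablestore converts these measures into a similarity score.

```java
/** Hypothetical helper that computes raw vector measures; not part of the Tablestore SDK. */
final class VectorMetricSketch {
    /** Euclidean distance without the final square root, as described for the euclidean metric. */
    static double squaredEuclidean(float[] a, float[] b) {
        checkSameDimension(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = (double) a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Sum of the products of corresponding coordinates. */
    static double dotProduct(float[] a, float[] b) {
        checkSameDimension(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    /** Cosine of the angle between the two vectors. */
    static double cosine(float[] a, float[] b) {
        double norms = Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b));
        if (norms == 0.0) {
            throw new IllegalArgumentException("cosine is undefined for a zero vector");
        }
        return dotProduct(a, b) / norms;
    }

    private static void checkSameDimension(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
    }
}
```

The vectors passed to these methods must have the same number of dimensions, which corresponds to the dimension value (4 in the example above) that is fixed when the Vector field is created.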

Create a search index with the indexSort parameter specified

The following sample code shows how to create a search index with the indexSort parameter specified. The search index contains the following fields: Col_Keyword of the Keyword type, Col_Long of the Long type, Col_Text of the Text type, and Timestamp of the Long type. Data in the search index is presorted by the Timestamp field.

private static void createSearchIndexWithIndexSort(SyncClient client) {
    CreateSearchIndexRequest request = new CreateSearchIndexRequest();
    // Specify the name of the data table. 
    request.setTableName("<TABLE_NAME>"); 
    // Specify the name of the search index. 
    request.setIndexName("<SEARCH_INDEX_NAME>"); 
    IndexSchema indexSchema = new IndexSchema();
    indexSchema.setFieldSchemas(Arrays.asList(
            new FieldSchema("Col_Keyword", FieldType.KEYWORD),
            new FieldSchema("Col_Long", FieldType.LONG),
            new FieldSchema("Col_Text", FieldType.TEXT),
            new FieldSchema("Timestamp", FieldType.LONG)
                    .setEnableSortAndAgg(true)));
    // Presort data based on the Timestamp field. 
    indexSchema.setIndexSort(new Sort(
            Arrays.<Sort.Sorter>asList(new FieldSort("Timestamp", SortOrder.ASC))));
    request.setIndexSchema(indexSchema);
    // Call the client to create the search index. 
    client.createSearchIndex(request);
}
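
To make the effect of FieldSort("Timestamp", SortOrder.ASC) concrete, the following sketch models the requested order with a plain Comparator. PresortSketch, Row, Order, and presort are hypothetical names used only for illustration, and the sketch says nothing about how Tablestore stores or sorts data internally.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical in-memory model of presorting by one Long field; not part of the Tablestore SDK. */
final class PresortSketch {
    enum Order { ASC, DESC }

    /** A minimal row with a primary key and the field used for presorting. */
    static final class Row {
        final String primaryKey;
        final long timestamp;

        Row(String primaryKey, long timestamp) {
            this.primaryKey = primaryKey;
            this.timestamp = timestamp;
        }
    }

    /** Returns a new list ordered by timestamp in the given order; the input list is not modified. */
    static List<Row> presort(List<Row> rows, Order order) {
        Comparator<Row> byTimestamp = Comparator.comparingLong(r -> r.timestamp);
        if (order == Order.DESC) {
            byTimestamp = byTimestamp.reversed();
        }
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(byTimestamp);
        return sorted;
    }
}
```

With Order.ASC, rows with smaller Timestamp values come first, which is the order that the indexSort setting in the example above requests.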

Create a search index with the TTL specified

Important

Make sure that update operations on the data table are prohibited.

The following sample code shows how to create a search index with the TTL specified. The search index contains the following fields: Col_Keyword of the Keyword type and Col_Long of the Long type. The TTL of the search index is seven days.

// Use Tablestore SDK for Java V5.12.0 or later to create a search index. 
public static void createIndexWithTTL(SyncClient client) {
    int days = 7;
    CreateSearchIndexRequest request = new CreateSearchIndexRequest();
    // Specify the name of the data table. 
    request.setTableName("<TABLE_NAME>");
    // Specify the name of the search index. 
    request.setIndexName("<SEARCH_INDEX_NAME>");
    IndexSchema indexSchema = new IndexSchema();
    indexSchema.setFieldSchemas(Arrays.asList(
            // Specify the names and types of the fields. 
            new FieldSchema("Col_Keyword", FieldType.KEYWORD), 
            new FieldSchema("Col_Long", FieldType.LONG)));
    request.setIndexSchema(indexSchema);
    // Specify the TTL for the search index. 
    request.setTimeToLiveInDays(days);
    // Call the client to create the search index. 
    client.createSearchIndex(request);
}
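
The timeToLive parameter is expressed in seconds, whereas the example above passes a number of days. The following sketch shows the conversion by using the JDK only. TtlUnitsSketch and daysToSeconds are hypothetical names, and the assumption that setTimeToLiveInDays stores the equivalent number of seconds is not confirmed by this topic.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper that converts a TTL in days to seconds; not part of the Tablestore SDK. */
final class TtlUnitsSketch {
    /** Converts whole days to seconds by using the JDK TimeUnit conversion. */
    static long daysToSeconds(int days) {
        return TimeUnit.DAYS.toSeconds(days);
    }
}
```

For a TTL of seven days, daysToSeconds(7) returns 604800, which is above the minimum of 86400 seconds that the timeToLive parameter accepts.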

Create a search index with virtual columns specified

The following sample code shows how to create a search index with virtual columns specified. The search index contains the following fields: Col_Keyword of the Keyword type and Col_Long of the Long type. In addition, two virtual columns are created: Col_Keyword_Virtual_Long of the Long type, which is mapped to the Col_Keyword column in the data table, and Col_Long_Virtual_Keyword of the Keyword type, which is mapped to the Col_Long column in the data table.

private static void createSearchIndexWithVirtualColumns(SyncClient client) {
    CreateSearchIndexRequest request = new CreateSearchIndexRequest();
    // Specify the name of the data table. 
    request.setTableName("<TABLE_NAME>"); 
    // Specify the name of the search index. 
    request.setIndexName("<SEARCH_INDEX_NAME>"); 
    IndexSchema indexSchema = new IndexSchema();
    indexSchema.setFieldSchemas(Arrays.asList(
        // Specify the name and type of the field. 
        new FieldSchema("Col_Keyword", FieldType.KEYWORD), 
        // Specify the name and type of the field. 
        new FieldSchema("Col_Keyword_Virtual_Long", FieldType.LONG) 
             // Specify whether the field is a virtual column. 
            .setVirtualField(true) 
             // Specify the name of the source field to which the virtual column is mapped in the data table. 
            .setSourceFieldName("Col_Keyword"), 
        new FieldSchema("Col_Long", FieldType.LONG),
        new FieldSchema("Col_Long_Virtual_Keyword", FieldType.KEYWORD)
            .setVirtualField(true)
            .setSourceFieldName("Col_Long")));
    request.setIndexSchema(indexSchema);
    // Call the client to create the search index. 
    client.createSearchIndex(request); 
}

Create a search index with the highlight feature enabled

The following sample code shows how to create a search index with the highlight feature enabled. The search index contains the following fields: Col_Keyword of the Keyword type, Col_Long of the Long type, and Col_Text of the Text type. The highlight feature is enabled for the Col_Text field.

private static void createSearchIndexWithHighlighting(SyncClient client) {
    CreateSearchIndexRequest request = new CreateSearchIndexRequest();
    // Specify the name of the data table. 
    request.setTableName("<TABLE_NAME>"); 
    // Specify the name of the search index. 
    request.setIndexName("<SEARCH_INDEX_NAME>"); 
    IndexSchema indexSchema = new IndexSchema();
    indexSchema.setFieldSchemas(Arrays.asList(
            // Specify the names and types of the fields. 
            new FieldSchema("Col_Keyword", FieldType.KEYWORD), 
            new FieldSchema("Col_Long", FieldType.LONG),
            // Enable the highlight feature for the Col_Text field. 
            new FieldSchema("Col_Text", FieldType.TEXT).setIndex(true).setEnableHighlighting(true)
    ));
    request.setIndexSchema(indexSchema);
    // Call the client to create the search index. 
    client.createSearchIndex(request); 
}

References

After you create a search index, you can use the query methods provided by the search index to query data from multiple dimensions based on your business requirements. When you use a search index to query data, you can use the following query methods: term query, terms query, match all query, match query, match phrase query, prefix query, suffix query, range query, wildcard query, geo query, Boolean query, KNN vector query, nested query, and exists query.
When you call the Search operation to query data, you can filter the result set.
- You can sort or paginate rows that meet the query conditions by using the sorting and paging features. For more information, see Perform sorting and paging.
- You can use the highlight feature to highlight the query strings in the query results. For more information, see Highlight the query results.
- You can use the collapse (distinct) feature to collapse the result set based on a specific column. This way, data of the specified type appears only once in the query results. For more information, see Collapse (distinct).
After you create a search index, you can manage the search index based on your business requirements.
- You can dynamically modify the schema of a search index to add, update, or remove index fields from the search index. For more information, see Dynamically modify the schema of a search index.
- You can modify the TTL of a search index to delete historical data in the search index or extend the retention period of data in the search index. For more information, see Configure the TTL of a search index.
- You can call the ListSearchIndex operation to query all search indexes that are created for a data table. For more information, see List search indexes.
- You can call the DescribeSearchIndex operation to query the description of a search index. For example, you can query the field information and search index configurations. For more information, see Query the description of a search index.
- You can delete a search index that you no longer require. For more information, see Delete search indexes.
If you want to analyze data in a table, you can call the Search operation to use the aggregation feature or use the SQL query feature. For example, you can query the maximum and minimum values, the sum of the values, and the number of rows. For more information, see Aggregation and SQL query.
If you want to obtain all rows that meet the query conditions without the need to sort the rows, you can call the ParallelScan and ComputeSplits operations to use the parallel scan feature. For more information, see Parallel scan.