Tablestore: Create a search index

Last Updated: Dec 06, 2025

Use the CreateSearchIndex method to create a search index for a data table. A data table can have multiple search indexes. When you create a search index, add the fields that you want to query to the index. You can also configure advanced options, such as custom routing keys and pre-sorting.

Prerequisites

  • Initialize the Tablestore client. For more information, see Initialize Tablestore Client.

  • Create a data table that meets the following conditions:

    • The max versions value is 1.

    • The time to live (TTL) is -1, or updates to the table are disabled.

Notes

  • When you create a search index, the data type of a field in the index must match the data type of the corresponding field in the data table.

  • If you want to set a specific TTL for a search index (a value other than -1), you must disable the UpdateRow write feature for the data table. The TTL of the search index must be less than or equal to the TTL of the data table. For more information, see Lifecycle management.
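The prerequisites and the TTL notes above can be expressed as a small pre-flight check. The following is a minimal sketch rather than part of the Tablestore SDK: the function name check_index_prerequisites and its arguments are hypothetical, and it assumes that a TTL of -1 means the data never expires, so -1 is treated as larger than any finite TTL.

```python
import math


def _effective_ttl(ttl_seconds):
    # Assumption: -1 means "never expires", so treat it as infinity.
    return math.inf if ttl_seconds == -1 else ttl_seconds


def check_index_prerequisites(max_versions, table_ttl, allow_update, index_ttl=-1):
    """Return a list of rule violations; an empty list means all checks pass."""
    problems = []
    if max_versions != 1:
        problems.append("max versions must be 1")
    if table_ttl != -1 and allow_update:
        problems.append("table TTL must be -1 unless updates are disabled")
    if index_ttl != -1:
        if allow_update:
            problems.append("a specific index TTL requires UpdateRow writes to be disabled")
        if _effective_ttl(index_ttl) > _effective_ttl(table_ttl):
            problems.append("index TTL must not exceed table TTL")
    return problems


# A table with default options passes; an index TTL longer than the table TTL does not.
print(check_index_prerequisites(1, -1, True))
print(check_index_prerequisites(1, 86400, False, index_ttl=172800))
```

In practice the three table values would come from the table's current options. How you read them depends on your SDK version, so that step is left out of the sketch.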

Parameters

When you create a search index, specify the table name (table_name), the search index name (index_name), and the index schema (schema). The schema includes field schemas (field_schemas), index settings (index_setting), and index pre-sorting settings (index_sort). The following table describes these parameters.

Component

Description

table_name

The name of the data table.

index_name

The name of the search index.

field_schemas

A list of field_schema objects. Each field_schema contains the following parameters:

  • field_name (Required): The name of the field to add to the search index. This is the column name. The type is String.

    The field can be a primary key column or an attribute column.

  • field_type (Required): The data type of the field. The value is a member of the FieldType enumeration, such as FieldType.KEYWORD or FieldType.TEXT.

  • is_array (Optional): Specifies whether the field is an array. The type is Boolean.

    If you set this to True, the column is an array. When you write data, it must be in JSON array format, such as ["a","b","c"].

    Because the Nested type is an array, you do not need to set this parameter when field_type is Nested.

  • index (Optional): Specifies whether to create an index for the field. The type is Boolean.

    The default value is True, which means an inverted index or a spatial index is created for the column. If you set this to False, no index is created for the column.

  • analyzer (Optional): The tokenizer type. You can set this parameter when the field type is Text. If you do not set this parameter, the default tokenizer is single-word tokenization.

  • enable_sort_and_agg (Optional): Specifies whether to enable sorting and statistical aggregation. The type is Boolean.

    Only fields with enable_sort_and_agg set to True can be used for sorting.

    Important

    Nested fields do not support sorting and statistical aggregation. However, sub-columns within a Nested field support this feature.

  • sub_field_schemas (Optional): When the field type is Nested, use this parameter to set the index types for sub-columns in the nested document. The type is a list of field_schema objects.

  • is_virtual_field (Optional): Specifies whether the field is a virtual column. The type is Boolean. The default value is False. To use a virtual column, set this parameter to True.

  • source_field_name (Optional): The name of the field in the data table. The type is String.

    Important

    This parameter is required when is_virtual_field is set to True.

  • date_formats (Optional): The date format. The type is String. For more information, see Date and time types.

    Important

    This parameter is required when the field type is Date.

  • enable_highlighting (Optional): Specifies whether to enable the summary and highlighting feature. The type is Boolean. The default value is False. To use summary and highlighting, set this parameter to True. Only Text fields support this feature.

    Important

    The Tablestore Python SDK supports this feature starting from version 6.0.0.

  • vector_options (Optional): The property parameters for a vector field. This parameter is required when the field type is Vector. It includes the following content:

    • data_type: The data type of the vector. Currently, only float32 is supported. If you require other types, submit a ticket to contact us.

    • dimension: The vector dimensions. The maximum number of dimensions supported for a vector field is 4096.

    • metric_type: The algorithm used to measure the distance between vectors. Supported algorithms include Euclidean distance (euclidean), cosine similarity (cosine), and dot product (dot_product).

      • euclidean: the Euclidean distance algorithm, which measures the straight-line distance between two vectors in a multi-dimensional space. For better performance, the Euclidean distance algorithm in Tablestore does not perform the final square root calculation. A greater score that is obtained by using the Euclidean distance algorithm indicates a higher similarity between two vectors.

      • cosine: the cosine similarity algorithm that calculates the cosine of the angle between two vectors in a vector space. A greater value that is obtained by using the cosine similarity algorithm indicates a higher similarity between two vectors. In most cases, the algorithm is used to calculate the similarity between text data.

      • dot_product: the dot product algorithm that multiplies the corresponding coordinates of two vectors of the same dimension and adds the products. A greater value that is obtained by using the dot product algorithm indicates a higher similarity between two vectors.

      For more information about selecting a distance measure algorithm, see Distance measure algorithms.

  • json_type (Optional): The index type for JSON data. OBJECT and NESTED are supported. This parameter is required when the field type is JSON.
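To make the three metric_type options described above concrete, the following sketch computes the raw quantities in plain Python. It is an illustration only and does not call Tablestore: the function names are hypothetical, and the way Tablestore converts these raw values into a relevance score is not covered here. For the squared Euclidean distance, a smaller raw value means the vectors are closer together.

```python
import math


def _check_same_length(a, b):
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")


def squared_euclidean(a, b):
    # Euclidean distance without the final square root, as described above.
    _check_same_length(a, b)
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot_product(a, b):
    # Multiply corresponding coordinates and add the products.
    _check_same_length(a, b)
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    # Cosine of the angle between the two vectors.
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(a, b) / (norm_a * norm_b)


print(squared_euclidean([1, 2], [4, 6]))    # 3**2 + 4**2
print(dot_product([1, 2, 3], [4, 5, 6]))    # 4 + 10 + 18
print(cosine_similarity([1, 0], [0, 1]))    # orthogonal vectors
```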

index_setting

Index settings, which include the routing_fields setting.

routing_fields (Optional): Custom routing fields. You can select one or more primary key columns as routing fields. In most cases, one routing field is enough. If you set multiple routing fields, the system concatenates their values into a single value.

When writing index data, the system calculates the data distribution based on the values of the routing fields. Records with the same routing field values are indexed into the same data partition.
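The routing behavior described above can be pictured with a short sketch. This is a conceptual model only: the separator, the SHA-256 hash, and the names routing_value and partition_for are assumptions made for illustration, and Tablestore's actual partitioning algorithm is not documented here. The point it shows is that rows sharing the same routing field values always map to the same partition.

```python
import hashlib


def routing_value(row, routing_fields):
    # Concatenate the values of the routing fields into a single value.
    # The NUL separator is an arbitrary choice for this sketch.
    return "\x00".join(str(row[name]) for name in routing_fields)


def partition_for(row, routing_fields, partition_count):
    # Hash the concatenated value and map it onto one of the partitions.
    digest = hashlib.sha256(routing_value(row, routing_fields).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


row_a = {"PK1": "user-42", "PK2": 1}
row_b = {"PK1": "user-42", "PK2": 2}

# Both rows share PK1, so routing on PK1 places them in the same partition.
print(partition_for(row_a, ["PK1"], 8) == partition_for(row_b, ["PK1"], 8))
```

Routing on a column that your queries always filter by can let a query target a single partition, which is the usual reason for setting routing_fields.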

index_sort

Index pre-sorting settings, which include the sorters setting. If you do not set this, the data is sorted by primary key by default.

Note

Search indexes that contain Nested fields do not support index_sort. No pre-sorting is performed for these indexes.

sorters (Required): The pre-sorting method for the index. You can sort by primary key or by field value. For more information about sorting, see Sorting and pagination.

  • PrimaryKeySort sorts the data by primary key and includes the following setting:

    sort_order: The sort order. You can sort in ascending (SortOrder.ASC) or descending (SortOrder.DESC) order. The default is ascending.

  • FieldSort sorts the data by field value and includes the following settings:

    Only fields that are indexed and have sorting and statistical aggregation enabled can be used for pre-sorting.

    • field_name: The name of the field to sort by.

    • sort_order: The sort order. You can sort in ascending (SortOrder.ASC) or descending (SortOrder.DESC) order. The default is ascending.

    • sort_mode: The sorting method to use when a field has multiple values.
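The effect of sort_order and sort_mode on a multi-valued field can be illustrated without the SDK. The sketch below is a conceptual model, not Tablestore code: the presort function is hypothetical, and the mode names "min", "max", and "avg" are assumed here to stand for picking the smallest, largest, or average value of an array field as its sort key.

```python
from statistics import mean

# Assumed sort modes for this sketch: how to reduce an array to one sort key.
REDUCERS = {"min": min, "max": max, "avg": mean}


def presort(rows, field_name, descending=False, sort_mode="min"):
    reduce_values = REDUCERS[sort_mode]

    def sort_key(row):
        value = row[field_name]
        # Array fields are reduced to a single value; scalar fields are used as-is.
        return reduce_values(value) if isinstance(value, list) else value

    return sorted(rows, key=sort_key, reverse=descending)


rows = [
    {"id": 1, "la": [5, 1]},
    {"id": 2, "la": [3, 2]},
    {"id": 3, "la": [4]},
]

print([r["id"] for r in presort(rows, "la", sort_mode="min")])
print([r["id"] for r in presort(rows, "la", sort_mode="max")])
print([r["id"] for r in presort(rows, "la", descending=True, sort_mode="avg")])
```

The same rows come out in a different order for each mode, which is why sort_mode matters only when the sort field holds more than one value.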

Examples

Specify an analyzer when creating a search index

The following example shows how to specify tokenizers when you create a search index. The search index contains six fields: k (Keyword), t (Text), g (Geopoint), ka (Keyword array), la (Long array), and n (Nested). The n field has three sub-fields: nk (Keyword), nl (Long), and nt (Text).

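from tablestore import *  # Provides FieldSchema, FieldType, AnalyzerType, IndexSetting, SearchIndexMeta, and the classes used in the commented-out lines.

def create_search_index(client):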
    # A Keyword field. Create an index and enable statistical aggregation.
    field_a = FieldSchema('k', FieldType.KEYWORD, index=True, enable_sort_and_agg=True)
    # A Text field. Create an index and use single-word tokenization.
    field_b = FieldSchema('t', FieldType.TEXT, index=True, analyzer=AnalyzerType.SINGLEWORD)
    # A Text field. Create an index and use fuzzy tokenization.
    #field_b = FieldSchema('t', FieldType.TEXT, index=True, analyzer=AnalyzerType.FUZZY,analyzer_parameter=FuzzyAnalyzerParameter(1, 6))
    # A Text field. Create an index and use a custom separator (a comma) for tokenization.
    #field_b = FieldSchema('t', FieldType.TEXT, index=True, analyzer=AnalyzerType.SPLIT, analyzer_parameter = SplitAnalyzerParameter(","))
    # A Geopoint field. Create an index.
    field_c = FieldSchema('g', FieldType.GEOPOINT, index=True)
    # A Keyword array field. Create an index.
    field_d = FieldSchema('ka', FieldType.KEYWORD, index=True, is_array=True)
    # A Long array field. Create an index.
    field_e = FieldSchema('la', FieldType.LONG, index=True, is_array=True)

    # A Nested field that includes three sub-fields: nk (Keyword), nl (Long), and nt (Text).
    field_n = FieldSchema('n', FieldType.NESTED, sub_field_schemas=[
        FieldSchema('nk', FieldType.KEYWORD, index=True),
        FieldSchema('nl', FieldType.LONG, index=True),
        FieldSchema('nt', FieldType.TEXT, index=True),
    ])

    fields = [field_a, field_b, field_c, field_d, field_e, field_n]

    index_setting = IndexSetting(routing_fields=['PK1']) 
    index_sort = None # When a search index contains a Nested field, you cannot set index pre-sorting.
    #index_sort = Sort(sorters=[PrimaryKeySort(SortOrder.ASC)])
    index_meta = SearchIndexMeta(fields, index_setting=index_setting, index_sort=index_sort)
    client.create_search_index('<TABLE_NAME>', '<SEARCH_INDEX_NAME>', index_meta)
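Because the ka and la fields in this example set is_array=True, the values written to those columns must be JSON array strings, as described for the is_array parameter. The following sketch shows one way to build such values with the standard json module. The helper name array_attribute and the (name, value) tuple layout for attribute columns are assumptions made for illustration.

```python
import json


def array_attribute(values):
    # Serialize a Python sequence as a compact JSON array string.
    return json.dumps(list(values), separators=(",", ":"), ensure_ascii=False)


# Hypothetical attribute columns for one row of the data table.
attribute_columns = [
    ("ka", array_attribute(["a", "b", "c"])),  # Keyword array
    ("la", array_attribute([1, 2, 3])),        # Long array
]

for name, value in attribute_columns:
    print(name, value)
```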

Create a search index and configure vector fields

The following example shows how to create a search index. The search index contains three fields: col_keyword (Keyword), col_long (Long), and col_vector (Vector). The distance measure algorithm for the vector field is the dot product.

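from tablestore import *  # Provides FieldSchema, FieldType, SearchIndexMeta, VectorOptions, VectorDataType, and VectorMetricType.

def create_search_index(client):
    index_meta = SearchIndexMeta([
        # A Keyword field. Create an index and enable sorting and statistical aggregation.
        FieldSchema('col_keyword', FieldType.KEYWORD, index=True, enable_sort_and_agg=True),
        # A Long field. Create an index.
        FieldSchema('col_long', FieldType.LONG, index=True),
        # A Vector field with 4 dimensions that uses dot product as the distance measure.
        FieldSchema('col_vector', FieldType.VECTOR,
                    vector_options=VectorOptions(
                        data_type=VectorDataType.VD_FLOAT_32,
                        dimension=4,
                        metric_type=VectorMetricType.VM_DOT_PRODUCT
                    )),
    ])
    # Replace the placeholders with the names of your data table and search index.
    client.create_search_index('<TABLE_NAME>', '<SEARCH_INDEX_NAME>', index_meta)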
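Before writing data for a vector field such as col_vector, it can help to check each vector against the dimension declared in the index. The sketch below is a hypothetical helper, not an SDK function. It enforces the 4096-dimension limit stated above and the declared dimension, and it assumes that vector values are written as a JSON array of floats stored in a string column, which you should confirm against the vector search documentation.

```python
import json

MAX_VECTOR_DIMENSION = 4096  # Upper limit stated in the parameter description.


def encode_vector(values, dimension):
    """Validate a vector and return it as a JSON array string (assumed format)."""
    if not 1 <= dimension <= MAX_VECTOR_DIMENSION:
        raise ValueError(f"dimension must be between 1 and {MAX_VECTOR_DIMENSION}")
    if len(values) != dimension:
        raise ValueError(f"expected {dimension} values, got {len(values)}")
    return json.dumps([float(v) for v in values])


# Matches the 4-dimensional col_vector field in the example above.
print(encode_vector([1, 2, 3, 4], dimension=4))
```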

Enable summary and highlighting when creating a search index

The following example shows how to enable summary and highlighting when you create a search index. The search index contains three fields: k (Keyword), t (Text), and n (Nested). The n field has three sub-fields: nk (Keyword), nl (Long), and nt (Text). The summary and highlighting feature is enabled for the t field and the nt sub-field of the n field.

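from tablestore import *  # Provides FieldSchema, FieldType, AnalyzerType, IndexSetting, and SearchIndexMeta.

def create_search_index0905(client):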
    # A Keyword field. Create an index and enable statistical aggregation.
    field_a = FieldSchema('k', FieldType.KEYWORD, index=True, enable_sort_and_agg=True)
    # A Text field. Create an index, use single-word tokenization, and enable summary and highlighting for the field.
    field_b = FieldSchema('t', FieldType.TEXT, index=True, analyzer=AnalyzerType.SINGLEWORD,
                        enable_highlighting=True)

    # A Nested field that includes three sub-fields: nk (Keyword), nl (Long), and nt (Text). The summary and highlighting feature is enabled for the nt sub-column.
    field_n = FieldSchema('n', FieldType.NESTED, sub_field_schemas=[
        FieldSchema('nk', FieldType.KEYWORD, index=True),
        FieldSchema('nl', FieldType.LONG, index=True),
        FieldSchema('nt', FieldType.TEXT, index=True, enable_highlighting=True),
    ])

    fields = [field_a, field_b, field_n]

    index_setting = IndexSetting(routing_fields=['id'])
    index_sort = None  # When a search index contains a Nested field, you cannot set index pre-sorting.
    # index_sort = Sort(sorters=[PrimaryKeySort(SortOrder.ASC)])
    index_meta = SearchIndexMeta(fields, index_setting=index_setting, index_sort=index_sort)
    client.create_search_index('pythontest', 'pythontest_0905', index_meta)
