Create an index - Lindorm - Alibaba Cloud Documentation Center

Lindorm provides two types of efficient and easy-to-use indexes: secondary indexes and search indexes. These indexes are individually applicable to queries in which the primary key is not used for matching and multi-dimensional queries. When you use the CREATE INDEX syntax to create an index, you can specify the index type and add attributes to the index.

Applicable engines and versions

The CREATE INDEX syntax is applicable to all versions of LindormTable.

Important

To use the CREATE INDEX syntax to create a search index or columnar index, the Lindorm SQL version must be 2.6.1 or later. For more information about how to view the Lindorm SQL version, see SQL versions.

Syntax

create_index_statement   ::=  CREATE INDEX [IF NOT EXISTS] [ index_identifier ]
                              [ USING index_method_definition ]
                              ON table_identifier '(' index_key_expression ')'
                              [ INCLUDE include_expression]
                              [ PARTITION BY partition_definition ]
                              [ { ASYNC | SYNC} ]
                              [ WITH '(' index_options  ')' ]
index_method_definition  ::=  { KV | SEARCH | COLUMNAR }
index_key_expression     ::=  '('
                                  index_key_definition 
                                  |wildcard_string_literal                                 
                              ')'

index_key_definition     ::= {
                                  column_identifier [ DESC ]
                                  | column_identifier'(' column_options ')'
                                  | function_expression
                             }
                             [ (',' index_key_definition)* ]
column_options    ::=  '(' 
                           option_definition (',' option_definition )* 
                       ')'
function_expression      ::=  function_identifier 
                              '(' 
                                  column_identifer ( ',' column_identifer )* 
                              ')'
option_definition        ::=  option_identifer '=' string_literal
include_expression       ::=  '(' 
                                  column_identifier( ',' column_identifier )*
                              ')'
partition_definition     ::= {
                                {RANGE TIME} 
                                     '(' 
                                          column_identifer 
                                     ')' [ PARTITIONS number_literal ]
                                |
                                HASH '(' 
                                          column_identifer 
                                          ( ',' column_identifer )*
                                     ')' [ PARTITIONS number_literal ]
                                |
                                ENUMERABLE 
                                     '(' 
                                          column_identifer 
                                          ( ',' column_identifer )*
index_options            ::=  '(' 
                                  option_definition (',' option_definition )* 
                              ')'

Supported parameters

LindormTable allows you to use the CREATE INDEX syntax to create two types of indexes: secondary indexes and search indexes.

The following table describes the parameters that are supported when you create each type of indexes.

Parameter	Secondary index	Search index
Index type (index_method_definition)	〇	〇
Index key expression (index_key_expression)	〇	〇
Included column (include_expression)	〇	✖️
Index partitioning (partition_definition)	✖️	〇
Index creation method (ASYNC\|SYNC) Important Only LindormTable 2.6.3 and later versions support the `SYNC` creation method.	〇	〇
Index attributes (index_options)	〇	〇

Parameters

You can create up to three secondary indexes and one search index for a wide table.

Index type (index_method_definition)

You can specify the type of an index by using the USING keyword when you create the index. The following table describes the values that you can configure for this parameter when you create different types of indexes supported by LindormTable.

Parameter value

Index type

Description

Secondary index

By default, if you do not explicitly specify an index type in the CREATE INDEX statement, a secondary index is created. For more information about secondary indexes, see Secondary indexes.

Important

Only eight secondary indexes can be created for an instance at the same time. If eight secondary indexes are being created when you try to create a secondary index, the index fails to be created.

Search index

Search indexes are used for full-text queries based on LindormSearch. Search indexes are suitable for complex multi-dimensional query scenarios, such as word segmentation, fuzzy queries, aggregate analysis, and sorting and paging. For more information, see Overview.

Search indexes support all basic data types except for DATE, TIME, and DECIMAL. For more information about the supported data types, see Basic data types.

Important

Before you create a search index, you must enable search indexes for the instance. After you enable search indexes, you are charged fees for LindormSearch nodes and LTS nodes. For more information, see Enable the search index feature.
The key of a search index must contain a column that is not contained in the primary key.
The number of search indexes that you can create in a Lindorm instance is not limited.

Index key expression (index_key_expression)

You can define one column or multiple columns as an index key. An index that consists of multiple index keys is also referred as a federated index.

Index key definition (index_key_definition)

You can use this parameter to add attributes for the index key of a search index. You can also use this parameter to specify an index key as a function expression.

Index key attributes for search indexes (option_definition)

You can specify attributes for index keys when you use the ALTER INDEX syntax to add index columns. For example, you can specify c3(type=text,analyzer=ik) in the ALTER INDEX statement to create an index for the c3 column and specify that the IK analyzer is used for the c3 column.

The following table describes the index key attributes that are supported by search indexes.

Attribute	Data type	Description
indexed	STRING	Specifies whether to create an inverted index for the specified column in the index key. true (default): creates an inverted index. false: does not create an inverted index.
rowStored	STRING	Specifies whether to store raw data. Valid values: true: stores raw data. false (default): does not store raw data.
columnStored	STRING	Specifies whether to use the columnar structure to accelerate data sorting and analysis. Valid values: true (default): uses the columnar structure. false: does not use the columnar structure.
type	STRING	The data type. When analyzers are used, set this attribute to text for the related fields. In other scenarios, set the value of this attribute to the data type in your wide table by default. Important The type attribute must be configured together with the analyzer attribute.
analyzer	STRING	The analyzer that you want to use. Valid values: standard english ik whitespace comma Important The analyzer attribute must be configured together with the type attribute.
mapping	STRING	The custom index key attribute. The value of this attribute is a string that represents a JSON object. This attribute is compatible with the syntax of LindormSearch (Compatible with Elasticsearch). Important The mapping attribute is applicable only to LindormSearch (Compatible with Elasticsearch). If you specify the mapping attribute, other attributes of the index key become invalid.

Secondary index function expression (function_expression)

When you create a secondary index, you can specify the index key as a function expression. The following five functions are supported: Z-ORDER, S2, CAST, MD5, and SHA256.
Important
You can specify the index key as an expression that contains the MD5 or SHA256 function only in LindormTable 2.6.7.5 and later versions. If you cannot upgrade LindormTable in the console, contact the technical support (DingTalk ID: s0s3eg3).
- Z-ORDER: creates spatio-temporal secondary indexes for one or more spatio-temporal columns. Syntax:
```
Z-ORDER '(' column_identifier ( ',' column_identifer )*  ')'
```
  The column specified by column_identifer must be a spatio-temporal column. For more information about spatio-temporal indexes, see Spatio-temporal indexes.
- CAST: creates indexes for a column whose data type is converted. Syntax:
```
CAST(column_identifier AS type)
```
  In the preceding syntax, type specifies the data type to which the data in the column is converted. For more information, see Basic data types.
- S2: creates an S2 secondary index for a column that contains spatio-temporal data. Syntax:
```
S2 '(' column_identifier, level ')'
```
  The data type of column_identifer must be POLYGON or MULTIPOLYGON. The value range of level is [1,30]. For more information, see S2 indexing function.
- MD5: creates an index for the MD5 hash of a column that contains VARCHAR data. Syntax:
```
MD5(column_identifier)
```
  For more information, see MD5.
- SHA256: creates an index for the SHA256 hash of a column that contains VARCHAR data. Syntax:
```
SHA256(column_identifier)
```
  For more information, see SHA256.

Wildcard constant (wildcard_string_literal)

Only search indexes support the wildcard constant (*).

The wildcard constant (*) can be used to create an index for all columns. Example: CREATE INDEX IF NOT EXISTS idx5 USING SEARCH ON test(*);.

Important

Columns that are added to the table after the index is created are not automatically added to the search index or columnar index. You must execute the ALTER INDEX statement to manually add the columns to the index.
Dynamic columns are not contained in indexes. For more information, see Dynamic columns

Included column (include_expression)

Included columns are index columns that are the same as those in the base table. Indexes with Included columns are also referred as covering indexes. A covering index can provide all data required for a query that hits the index without the need to access the base table. This way, the query performance can be ensured.

Note

You can use the WITH keyword to specify dynamic columns as included columns in secondary indexes by adding table attributes. For more information, see Secondary indexes.

Index partitioning (partition_definition)

Only search indexes support index partitioning.

Index partitioning is an index management policy based on which the server automatically splits and stores data into different partitions and prunes partitions when the data is queried.

Search indexes support range partitioning and hash partitioning. For more information, see Partition index.

Index creation method (ASYNC|SYNC)

When you use the CREATE INDEX syntax to create an index, you can specify the ASYNC or SYNC keyword to specify whether to create the index synchronously or asynchronously.

ASYNC: creates the index asynchronously. If you use this method, the index starts to be created immediately after the CREATE INDEX statement is executed. The response to the CREATE INDEX statement is immediately returned regardless of whether the index is successfully created.
SYNC: creates the index synchronously. If you use this method, the index starts to be created immediately after the CREATE INDEX statement is executed. No results are returned for the CREATE INDEX statement until the index is successfully created.

The following table shows the creation methods supported by secondary indexes and search indexes.

Creation method	Secondary indexes	Search indexes
ASYNC Important In LindormTable 2.6.1 and later versions, the default index creation method for the `CREATE INDEX` syntax is ASYNC.	〇	〇
SYNC Important The SYNC method is supported only in LindormTable 2.6.3 and later versions.	〇	〇

Index attributes (index_options)

When you use the CREATE INDEX syntax to create an index, you can use the WITH keyword to specify attributes for the index. The following tables describe the attributes supported by different types of indexes.

Secondary index

Attribute	Data type	Description
COMPRESSION	STRING	The compression algorithm you want to use for the index table. Valid values: SNAPPY ZSTD LZ4
INDEX_COVERED_TYPE	STRING	The included columns in the index. Valid values: COVERED_ALL_COLUMNS_IN_SCHEMA: All predefined non-primary key columns in the base table are included in the index. COVERED_DYNAMIC_COLUMNS: All predefined non-primary key columns and dynamic columns are included in the index. Note You do not need to specify the INCLUDE clause when you specify INDEX_COVERED_TYPE in the statement. Before you include dynamic columns in the index, make sure that dynamic columns are enabled. For more information, see Dynamic columns. You can use the `INCLUDE` keyword to include specified columns in the index.
STARTKEY	STRING	The start key of the index. Important Timestamp columns and spatio-temporal columns cannot be set to the start key of the index.
ENDKEY	STRING	The end key of the index. Important Timestamp columns and spatio-temporal columns cannot be set to the end key of the index.
NUMREGIONS	INTEGER	The estimated number of partitions of the index table. Important This attribute is not supported by Timestamp columns and spatio-temporal columns.

Search index

Attribute	Data type	Description
indexState	STRING	The status of the search index. Valid values: ACTIVE INACTIVE DISABLED
numShards	INTEGER	The number of shards. The default value is equal to twice the number of search nodes.
RANGE_TIME_PARTITION_START	INTEGER	The number of days before you create the index. The first partition is created based on the point in time n days before the point in time when you create the index. The number of days is the value of n. This attribute can be specified in scenarios in which your table contains historical data. If the timestamp of a row of historical data is earlier than the point in time based on which the first partition is created, an error message is returned. Note This attribute is required if you create a partitioned index.
RANGE_TIME_PARTITION_INTERVAL	INTEGER	The interval at which a new partition is created. For example, If you set `RANGE_TIME_PARTITION_INTERVAL` to '7', a partition is created at an interval of seven days. Note This attribute is required if you create a partitioned index.
RANGE_TIME_PARTITION_TTL	INTEGER	The retention period of your data in a partition. For example, if you set `RANGE_TIME_PARTITION_TTL` to '180', the retention period is 180 days and the data that is retained for more than 180 days in a partition is automatically removed. If you do not specify this attribute, partitions are not automatically deleted. Note This attribute is required if you create a partitioned index.
RANGE_TIME_PARTITION_MAX_OVERLAP	INTEGER	If the timestamp of the data that you write indicates a time in the future, this option specifies the maximum time period between the point in time when data is written and the point in time when you create the index. Unit: days. If you do not specify this attribute, data of up to one day in the future can be written by default.
RANGE_TIME_PARTITION_FIELD_TIMEUNIT	LONG	The unit of values in the partition key column. Default value: ms. The value ms means milliseconds. If you set the value of this attribute to s, each value in the partition key column is 10 digits in length. The value s means seconds. If you set the value of this attribute to ms, each value in the partition key column is 13 digits in length.
RANGE_TIME_PARTITION_CHS	INTEGER	The hot and cold data boundary. Data that has been stored for a period longer than the specified boundary is archived to cold storage. Unit: seconds. For example, if you set this attribute to `864000`, data that has been stored for a period longer than 10 days is archived to cold storage. Note If you do not specify this attribute, the hot and cold data separation feature is not enabled. In this case, all data is stored in hot storage by default.
INDEX_SETTINGS	STRING	The custom index attribute. The value of this attribute is a string that represents a JSON object. This attribute is compatible with the syntax of LindormSearch (Compatible with Elasticsearch). Important The INDEX_SETTINGS attribute is applicable only to LindormSearch (Compatible with Elasticsearch).

Examples

In the following examples, a base table named test is created by executing the following statement:

CREATE TABLE test (
  p1 VARCHAR NOT NULL,
  p2 INTEGER NOT NULL,
  c1 BIGINT,
  c2 DOUBLE,
  c3 VARCHAR,
  c4 TIMESTAMP,
  c5 GEOMETRY(POINT),
  PRIMARY KEY(p1, p2)
) WITH (CONSISTENCY = 'strong', MUTABILITY='MUTABLE_LATEST');

Examples related to secondary indexes

Create a secondary index asynchronously

By default, if you do not specify the creation method, the index is created asynchronously.

CREATE INDEX idx1 ON test(c1 desc) include(c3,c4) WITH (COMPRESSION='ZSTD');

Verify the result

You can execute the SHOW INDEX FROM test; statement to check whether the index is created.

Create a federated index

Create a federated index for the c1, c2, and c3 columns.

CREATE INDEX idx1 ON test(c1, c2, c3) include(c4) SYNC WITH ( COMPRESSION ='ZSTD');

Verify the result

You can execute the SHOW INDEX FROM test; statement to check whether the index is created.

Include all columns in the index

CREATE INDEX idx1 ON test(c4 desc) WITH (INDEX_COVERED_TYPE ='COVERED_ALL_COLUMNS_IN_SCHEMA');

Verify the result

You can execute the SHOW INDEX FROM test; statement to check whether the index is created.

Include all dynamic columns in the index

CREATE INDEX idx1 ON test(c4 desc) WITH (INDEX_COVERED_TYPE='COVERED_DYNAMIC_COLUMNS');

Verify the result

You can execute the SHOW INDEX FROM test; statement to check whether the index is created.

Configure the estimated number of partitions of the index table

Set the estimated number of partitions of the index table to 32.

CREATE INDEX idx1 ON test(c4 desc) include(c5,c6)  WITH (NUMREGIONS ='32');

Verify the result

You can execute the SHOW INDEX FROM test; statement to check whether the index is created.

Specify the start and end keys of the index table

Specify that the index table is split into 32 partition in the range from the start key 11111111 to the end key9999999.

CREATE INDEX idx1 ON test(c3 desc) include(c5,c6) WITH (NUMREGIONS ='32', STARTKEY ='11111111', ENDKEY = '9999999');

Verify the result

You can execute the SHOW INDEX FROM test; statement to check whether the index is created.

Create a Z-ORDER secondary index

Create a Z-ORDER secondary index for the c5 column that stores POINT data.

CREATE INDEX idx1 ON test(Z-ORDER(c5));

Verify the result

You can execute the SHOW INDEX FROM test; statement to check whether the index is created.

Create an S2 secondary index

Create an S2 secondary index for the c5 column that stores POINT data. The index can be created only in asynchronous mode.
```
CREATE INDEX idx1 ON test(S2(c5, 10));
```
Build the S2 secondary index.
```
BUILD INDEX s2_idx ON test;
```

Verify the result

You can execute the SHOW INDEX FROM test; statement to check whether the index is created.

Convert the data type of the specified column

Create a secondary index for the c3 column after converting its data type to INTEGER.

CREATE INDEX idx1 ON test(CAST(c3 AS INTEGER));

Verify the result

You can execute the SHOW INDEX FROM test; statement to check whether the index is created.

Examples related to search indexes

Create a secondary index asynchronously

CREATE INDEX IF NOT EXISTS idx2 USING SEARCH ON test(p1, p2, c1, c2, c3);

Verify the result

You can execute the SHOW INDEX FROM test; statement to check whether the index is created.

Create a search index on all columns

Create a search index for all columns without specifying column attributes. In this case, the default values of all column attributes are retained.

CREATE INDEX IF NOT EXISTS idx2 USING SEARCH ON test('*');

Verify the result

You can execute the SHOW INDEX FROM test; statement to check whether the index is created.

Add attributes for index keys

Add non-custom attributes for index keys
Create a search index for all columns. If the c3 column is included in the index, set the following attributes for all index columns: type=text, analyzer=ik, indexed=true.
```
CREATE INDEX IF NOT EXISTS idx2 USING SEARCH ON test('*',c3(type=text,analyzer=ik,indexed=true));
```
Verify the result
You can execute the SHOW INDEX FROM test; statement to check whether the index is created.
Add custom attributes for index keys
In this example, a sample table named test is created by executing the following statement:
```
CREATE TABLE test (
  p1 VARCHAR NOT NULL,
  p2 INTEGER NOT NULL,
  c1 BIGINT,
  c2 DOUBLE,
  c3 VARCHAR,
  c4 TIMESTAMP,
  PRIMARY KEY(p1, p2)
) WITH (CONSISTENCY = 'strong', MUTABILITY='MUTABLE_LATEST');
```
Create a search index for all columns. Configure the following attributes for the c3 column: type=text, analyzer=ik_max_word.
```
CREATE INDEX IF NOT EXISTS idx2 USING SEARCH ON test('*',c3(mapping='{
  "type": "text",
  "analyzer": "ik_max_word"
}'));
```
Important
- The mapping attribute is applicable only to LindormSearch (Compatible with Elasticsearch).
- If you specify the mapping attribute, other attributes of the index key become invalid.
Verify the result
You can execute the SHOW INDEX FROM test; statement to check whether the index is created.

Configure the index status

Create a search index on the specified columns, specify index column attributes, and set the status of the search index to ACTIVE.

CREATE INDEX IF NOT EXISTS idx2 USING SEARCH ON test(c1, c3(type=text,analyzer=ik)) WITH (indexState=ACTIVE,numShards=4);

Verify the result

You can execute the SHOW INDEX FROM test; statement to check whether the index is created.

Configure custom index attributes

Create a search index and set the status of the search index to ACTIVE. Configure custom attributes to set the number of shards to 4, the compression algorithm to ZSTD, and the interval at which data is updated to 10 seconds.

CREATE INDEX IF NOT EXISTS idx2 USING SEARCH ON test(c1, c3(type=text,analyzer=ik)) WITH (indexState=ACTIVE,INDEX_SETTINGS='{
  "index": {
    "codec": "zstd",
    "refresh_interval": "10s",
    "number_of_shards": 4
   }
}');

Verify the result

You can execute the SHOW INDEX FROM test; statement to check whether the index is created.

Configure time-based partitioning

Perform partitioning based on the time column c4. Partitioning is automatically performed every seven days from 30 days ago. By default, partitions are retained for 90 days.

CREATE INDEX IF NOT EXISTS idx2 USING SEARCH ON test (c1, c2, c3, c4)
PARTITION BY RANGE TIME(c4) PARTITIONS 16
WITH (indexState=ACTIVE, RANGE_TIME_PARTITION_START='30', RANGE_TIME_PARTITION_INTERVAL='7', RANGE_TIME_PARTITION_TTL='90', RANGE_TIME_PARTITION_MAX_OVERLAP='90');

Verify the result

You can execute the SHOW INDEX FROM test; statement to check whether the index is created.