Simple Log Service: Create indexes

Last Updated: Dec 20, 2024

To effectively query and analyze logs collected in a Logstore, you must create indexes. This topic describes indexes in Simple Log Service, including index types, how to create and disable indexes, configuration examples, and billing.

Why you need to create indexes

Keywords are typically used to retrieve specific content from raw logs, for example, entries that contain curl in a log such as curl: curl/7.74.0. If the log is not split into words, the entire log text is treated as a single entity; the keyword curl matches only part of it, so Simple Log Service cannot retrieve the log.

For efficient retrieval, logs must be split into searchable words. Logs are divided using delimiters, which determine the split points. For instance, with delimiters such as \n\t\r,;[]{}()&^*#@~=<>/\?:'", the log splits into words like curl and 7.74.0. Simple Log Service then creates indexes based on these words, enabling log queries and analyses.
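
The following minimal Python sketch is for illustration only (it is not a Simple Log Service API); it mimics how delimiter-based splitting turns a raw log line into searchable words. The delimiter set is taken from the example above, with a space added:

  import re

  # Illustrative only: Simple Log Service performs this splitting on the server
  # side when it builds indexes. The delimiter set mirrors the example above.
  DELIMITERS = "\n\t\r ,;[]{}()&^*#@~=<>/\\?:'\""

  def split_into_words(log_line):
      pattern = "[" + re.escape(DELIMITERS) + "]+"
      return [word for word in re.split(pattern, log_line) if word]

  print(split_into_words("curl: curl/7.74.0"))
  # ['curl', 'curl', '7.74.0'] -> the keyword curl now matches a whole word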

Simple Log Service projects support the creation of both full-text and field indexes. If both are created, field index configurations take priority.

Index types

Full-text indexes

Full-text indexes split the entire log into words of the text type based on the specified delimiters. After full-text indexes are created, you can perform keyword-based log queries. For instance, the query Chrome or Safari retrieves logs that contain either Chrome or Safari.

Important
  • Delimiters are not compatible with Chinese characters. Enabling the Include Chinese option allows Simple Log Service to automatically segment Chinese text according to the rules of Chinese grammar.

  • If only full-text indexes are set up, you can use only full-text query capabilities. For more details, see Query syntax and features.

Field indexes

Field indexes categorize logs by field names (KEY) and then segment the content within fields using delimiters. They support four data types: text, long, double, and JSON. For more information, see Data types. After setting up field indexes, you can specify field names and values (Key:Value) for log queries or use SELECT statements. For more details, see Field queries.

Important
  • To query or analyze fields by using SELECT statements, you must create field indexes. If both full-text and field indexes are created, the field index settings take priority.

  • Fields of the text type can be queried using full-text, field, and analytic (SELECT) query statements.

    • Without full-text indexes, full-text query statements return results from all text type fields.

    • With full-text indexes, full-text query statements return results from all logs.

  • Fields of the long and double types can be queried and analyzed using field and analytic (SELECT) query statements.
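
To illustrate the two query styles, the following sketch uses the Simple Log Service Python SDK (aliyun-log-python-sdk). The endpoint, credentials, project, Logstore, and the get_log parameter order shown here are assumptions for illustration; verify them against the SDK reference before use.

  import time
  from aliyun.log import LogClient

  # All values below are placeholders; replace them with your own.
  client = LogClient("cn-hangzhou.log.aliyuncs.com", "<access_key_id>", "<access_key_secret>")
  project, logstore = "my-project", "my-logstore"
  now = int(time.time())

  # Field query: requires a field index of the long or double type on request_time.
  res = client.get_log(project, logstore, now - 900, now, query="request_time > 100")
  res.log_print()

  # Analytic (SELECT) statement: requires a field index on status with statistics enabled.
  res = client.get_log(project, logstore, now - 900, now,
                       query="* | SELECT status, count(*) AS PV GROUP BY status")
  res.log_print()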

Create indexes

Important
  • Different index configurations yield varying query and analysis outcomes. Set up indexes according to your business needs. Indexes typically take effect within one minute of creation.

  • Indexes are applicable only to new logs. To query historical logs, use the Reindex feature.

  • Simple Log Service automatically indexes certain reserved fields. For more information, see Reserved fields.

    The delimiters for the __topic__ and __source__ indexes are empty. Exact keyword matches are required when querying these fields.

  • Fields prefixed with __tag__ do not support full-text indexing. Before querying or analyzing such fields, you must create field indexes, for example, * | select "__tag__:__receive_time__".

  • If a log contains two fields that have the same name, for example, two fields both named request_time, Simple Log Service displays one of them as request_time_0. However, the original field name request_time is still used in storage. Use the original name when you create indexes and when you query, analyze, ship, or transform logs.

Console method

  1. Log on to the Simple Log Service console.

  2. In the Projects section, click the project that you want to manage.

  3. Choose Log Storage > Logstores. On the Logstores tab, click the Logstore that you want to manage.

  4. On the Query And Analysis page of the Logstore, click Enable Index.

    Note

    You can access the latest log data approximately 1 minute after enabling the index.


  5. (Optional) Disable automatic index updates

    When the Logstore is dedicated to a cloud service or is an internal Logstore, the Auto Update switch is enabled by default. This allows automatic updates to the latest built-in index version. To create custom indexes, disable the Auto Update switch on the Query And Analysis panel.

    Warning

    Removing indexes from a dedicated Logstore for a cloud service may impact related features such as reports and alerts.


  6. Create indexes

    Create full-text indexes

    After you click Enable Index, the Full-text Index feature is activated automatically. You can optionally turn on additional features such as Log Clustering, Case Sensitivity, and Include Chinese, and you can keep the default delimiters or specify custom ones.

    The configuration items are described as follows:

    Log Clustering

    Activating the Log Clustering feature allows Simple Log Service to automatically group similar logs during text log collection. This process identifies common log patterns, facilitating a quick overview of the log landscape. For more information, see Log Clustering.

    Case-sensitive

    Determines if searches are case-sensitive.

    • Enabling the Case-sensitive option will make searches distinguish between uppercase and lowercase letters. For instance, if a log contains internalError, it can only be retrieved by entering the exact keyword internalError.

    • Disabling the Case-sensitive option makes searches case-insensitive. For example, logs containing internalError can be retrieved using either INTERNALERROR or internalerror.

    Contains Chinese

    Determines whether searches should differentiate between Chinese and English content.

    • Enabling the Contains Chinese feature means that Chinese content in logs will be segmented according to Chinese syntax rules, while English content will be split using the specified delimiters.

      Important

      Note that enabling segmentation of Chinese content may reduce the write speed. Adjust this setting carefully based on your business needs.

    • Disabling the Contains Chinese feature will result in all content being split based on the specified delimiters, regardless of language.

    Delimiters

    Delimiters are used to segment log content into individual words. Simple Log Service's default delimiters include , '";=()[]{}?@&<>/:\n\t\r. You can customize these if the defaults do not suit your needs. Any ASCII character can be used as a delimiter.

    Setting the Delimiter to empty treats the field value as a single entity, requiring a complete string match or a fuzzy query for log retrieval.

    For example, consider the log content /url/pic/abc.gif.

    • Without set delimiters, the log is considered a single word /url/pic/abc.gif. To locate the log, you must use either the exact string /url/pic/abc.gif or employ a fuzzy query such as /url/pic/*.

    • When the delimiter is a forward slash (/), the log is split into three words: url, pic, and abc.gif. You can search for the log by using any of these words, a fuzzy query such as pi*, or the full path /url/pic/abc.gif.

    • Setting delimiters to both a forward slash (/) and a period (.) splits the log into url, pic, abc, and gif, allowing for a more detailed search using any of the resulting words or a fuzzy query.
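
    To make the effect of different delimiter sets concrete, here is a small illustrative Python sketch (again, not a Simple Log Service API; the service splits field values on the server side):

      import re

      def split_words(value, delimiters):
          # Illustrative only: split a value on the given delimiter characters.
          if not delimiters:
              return [value]  # empty delimiter set: the value stays a single word
          pattern = "[" + re.escape(delimiters) + "]+"
          return [w for w in re.split(pattern, value) if w]

      log = "/url/pic/abc.gif"
      print(split_words(log, ""))    # ['/url/pic/abc.gif'] -> exact match or fuzzy query required
      print(split_words(log, "/"))   # ['url', 'pic', 'abc.gif']
      print(split_words(log, "/."))  # ['url', 'pic', 'abc', 'gif']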

    Create field indexes

    After clicking Enable Index, you may select Query Analysis followed by Auto-generate Index. Simple Log Service will then automatically generate field indexes using the first log from the preview results of data collection. To customize field indexes, select + at the bottom of the page. For detailed information on specific fields, refer to Configuration Item Descriptions.

    The configuration items for field indexes are described as follows:

    Field Name

    The name of the log field (KEY), such as client_ip. The name can contain only letters, digits, and underscores (_) and must begin with a letter or an underscore (_).

    Important
    • For __tag__ fields like public IP addresses and Unix timestamps, set the Field Name as __tag__:KEY, for example, __tag__:__receive_time__. See Reserved Fields for more details.

    • Numeric indexes are not supported for the __tag__ fields. Ensure that the Type for all __tag__ field indexes is set to text.

    Type

    The data type of the log field value, with valid options including text, long, double, and json. Refer to Data Types for further information.

    Settings such as Case-sensitive, Contains Chinese, and Delimiter are not applicable to long and double types.

    Alias

    An alias for the field, such as ip for the client_ip field. The alias can contain only letters, digits, and underscores (_) and must begin with a letter or an underscore (_).

    Important

    Field aliases are only usable within analytic statements. The original field name must be used in search statements. For additional details, see Column Aliases.

    Case-sensitive

    Determines if searches are case-sensitive.

    • Enabling the Case-sensitive option will make searches distinguish between uppercase and lowercase letters. For instance, if a log contains internalError, it can only be retrieved by entering the exact keyword internalError.

    • Disabling the Case-sensitive option makes searches case-insensitive. For example, logs containing internalError can be retrieved using either INTERNALERROR or internalerror.

    Delimiter

    Delimiters are used to segment log content into individual words. Simple Log Service's default delimiters include , '";=()[]{}?@&<>/:\n\t\r. You can customize these if the defaults do not suit your needs. Any ASCII character can be used as a delimiter.

    Setting the Delimiter to empty treats the field value as a single entity, requiring a complete string match or a fuzzy query for log retrieval.

    For example, consider the log content /url/pic/abc.gif.

    • Without set delimiters, the log is considered a single word /url/pic/abc.gif. To locate the log, you must use either the exact string /url/pic/abc.gif or employ a fuzzy query such as /url/pic/*.

    • When the delimiter is a forward slash (/), the log is split into three words: url, pic, and abc.gif. You can search for the log by using any of these words, a fuzzy query such as pi*, or the full path /url/pic/abc.gif.

    • Setting delimiters to both a forward slash (/) and a period (.) splits the log into url, pic, abc, and gif, allowing for a more detailed search using any of the resulting words or a fuzzy query.

    Contains Chinese

    Determines whether searches should differentiate between Chinese and English content.

    • Enabling the Contains Chinese feature means that Chinese content in logs will be segmented according to Chinese syntax rules, while English content will be split using the specified delimiters.

      Important

      Note that enabling segmentation of Chinese content may reduce the write speed. Adjust this setting carefully based on your business needs.

    • Disabling the Contains Chinese feature will result in all content being split based on the specified delimiters, regardless of language.

    Enable Statistics

    Turn on Enable Statistics to allow the field to be analyzed in SELECT statements.

  7. (Optional) Set the maximum length of the field

    During SQL analysis, fields are truncated to a default length. The standard configuration for Simple Log Service is 2048 bytes, or 2 KB. To modify this limit, navigate to the bottom of the Query Analysis page and adjust the setting under Maximum Length Of Statistical Fields (text). The permissible range is from 64 to 16384 bytes. Remember, changes to the index configuration will only affect incremental data.

    Important

    Should a single field value exceed the set maximum length, the portion beyond the limit will be truncated and excluded from the analysis.


API methods

Simple Log Service allows you to manage indexes by calling API operations. For more information, see the API references for index management.

SDK method

Simple Log Service offers index management capabilities through its multi-language SDKs. Below are some commonly used SDKs. For more information, see the SDK reference overview.

Java

For detailed instructions on managing indexes with the Simple Log Service Java SDK, refer to Manage indexes using the Java SDK.

Python

For detailed instructions on managing indexes with the Simple Log Service Python SDK, refer to Manage indexes using the Python SDK.
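
The following is a minimal sketch of creating an index with the Python SDK. It assumes the IndexConfig, IndexLineConfig, and IndexKeyConfig helpers of aliyun-log-python-sdk and the parameter names shown below; verify the exact constructors and argument names against Manage indexes using the Python SDK before use.

  from aliyun.log import LogClient, IndexConfig, IndexLineConfig, IndexKeyConfig

  # All values below are placeholders; replace them with your own.
  client = LogClient("cn-hangzhou.log.aliyuncs.com", "<access_key_id>", "<access_key_secret>")
  project, logstore = "my-project", "my-logstore"

  token_list = list(", '\";=()[]{}?@&<>/:\n\t\r")  # default delimiters described in this topic

  # Full-text index: case-insensitive, Chinese segmentation disabled.
  line_config = IndexLineConfig(token_list=token_list, case_sensitive=False, chn=False)

  # Field indexes: a text field with an alias, and a long field with statistics enabled.
  key_configs = {
      "client_ip": IndexKeyConfig(token_list=token_list, case_sensitive=False,
                                  index_type="text", doc_value=True, alias="ip"),
      "request_time": IndexKeyConfig(index_type="long", doc_value=True),
  }

  index_config = IndexConfig(line_config=line_config, key_config_list=key_configs)
  client.create_index(project, logstore, index_config)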

Simple Log Service is also compatible with Alibaba Cloud SDKs. For more information, see Simple Log Service_SDK Center_Alibaba Cloud OpenAPI Explorer.

CLI method

Simple Log Service provides a command-line interface (CLI) for index management. For more information, see the CLI references for index management.

Disable indexes

Important

After you disable indexes, the storage space used by historical indexes is automatically cleared after the data retention period of the Logstore ends.

Procedure

Navigate to the Query and Analysis page of the desired Logstore, and select Query Analysis Properties > Disable Indexes.


Index configuration examples

Example 1

The log content includes a request_time field. To query this field, execute the statement request_time>100.

  • Configuring only full-text indexes will return logs containing the terms request_time, >, and 100.

  • Configuring only field indexes for double and long types will return logs where request_time exceeds 100.

  • Configuring both full-text and field indexes for double and long types will invalidate the full-text index for request_time, returning logs where request_time exceeds 100.

Example 2

The log content includes a request_time field. To perform a full-text query, use the statement request_time.

  • Configuring only field indexes for double and long types will not return any related logs.

  • Configuring only full-text indexes will query logs containing request_time from all log texts.

  • Configuring only field indexes for the text type will query logs containing request_time from fields indexed as text.

Example 3

The log content includes a status field. To analyze this field, execute the statement * | SELECT status, count(*) AS PV GROUP BY status.

  • Configuring only full-text indexes will not return any related logs.

  • Configuring a field index for status will yield the different status codes and their corresponding total page views (PVs).

Index traffic descriptions

Full-text index

In a full-text index, both field names and field values are stored as text and are included in the index traffic.

Field index

Index traffic calculation depends on the data type of the field.

  • Text type: Index traffic includes both field names and field values.

  • Long and double types: Field names do not contribute to index traffic. Each field value contributes a fixed 8 bytes of index traffic.

    For instance, when an index is created for the status field (long type), if the field value is 200, the term status does not contribute to the index size, whereas the index size for the value 200 is consistently 8 bytes.

  • JSON type: Index traffic encompasses both field names and field values, including child nodes not indexed. For more information, see calculating index traffic for JSON type fields.

    • Child nodes without indexes are treated as text type for index traffic calculation.

    • Child nodes with indexes have their index traffic calculated according to their data type, whether text, long, or double.
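
As a rough, illustrative estimate based on the rules above (actual billed traffic is metered by the service), the following sketch computes the index traffic contributed by one log with a long field status and a text field client_ip:

  # Illustrative estimate only, based on the rules described above.
  def text_field_traffic(name, value):
      # Text type: both the field name and the field value count toward index traffic.
      return len(name.encode("utf-8")) + len(str(value).encode("utf-8"))

  def numeric_field_traffic(_value):
      # Long and double types: the field name is excluded; each value counts as 8 bytes.
      return 8

  log = {"status": 200, "client_ip": "192.168.0.1"}
  traffic = numeric_field_traffic(log["status"]) + text_field_traffic("client_ip", log["client_ip"])
  print(traffic)  # 8 + (9 + 11) = 28 bytes for this log entry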

Billing instructions

Logstore billed by data write volume

Logstore billed by feature usage

What to do next

FAQ

  • What should I do if I'm unable to query logs after importing them into Simple Log Service?

    • Verify that the specified delimiters conform to the requirements.

    • Indexes are only applied to new logs. To query and analyze historical logs, you must reindex them. For more information, see Reindex.

  • How can I query logs using two conditions?

    To query logs using two conditions, specify both statements simultaneously. For instance, to find logs with a status other than OK or Unknown in a Logstore, use the query not OK not Unknown.

  • How do I search for logs containing multiple keywords?

    To search logs for multiple keywords, use the http_user_agent field as an example:

    • Phrase query: http_user_agent:#"like Gecko". For more information, see Phrase query.

    • LIKE clause: * | SELECT * WHERE http_user_agent LIKE '%like Gecko%'

  • How do I query logs using a keyword that includes spaces?

    For instance, querying logs with the keyword POS version will return logs containing the exact phrase "POS version," as opposed to logs containing either POS or version separately.

  • FAQ about LogSearch

  • Common errors in querying and analyzing logs

  • How to perform a fuzzy query on logs

  • FAQ about querying and analyzing JSON logs

  • How to download logs to a local device

  • Why field values are truncated during query and analysis