Create indexes

Updated at: 2025-01-14 09:42
Important

This topic contains important information on necessary precautions. We recommend that you read this topic carefully before proceeding.

If you want to query and analyze the collected logs in a Logstore, you must create indexes for the Logstore. This topic describes the definition, types, and billing of indexes that are supported by Simple Log Service. This topic also describes how to create indexes and disable the indexing feature, and provides examples on index creation.

Why do I need to create indexes?

In most cases, you can use keywords to query raw logs. For example, you want to query the curl/7.74.0 log by using the curl keyword. If the log is not split, the system treats the log as a whole and does not associate it with the curl keyword. In this case, you cannot find the log in Simple Log Service.

To search for the log, you must split the log into separate and searchable words. You can split a log by using delimiters. Delimiters determine the positions at which a log is split. In this example, you can use the following delimiters to split the preceding log: \n\t\r,;[]{}()&^*#@~=<>/\?:'". The log is split into curl and 7.74.0. Simple Log Service creates indexes based on the words that are obtained after log splitting. After indexes are created, you can query and analyze the log.
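The splitting described above can be sketched in a few lines of Python. This is an illustrative tokenizer built from the delimiter list in this example, not the service's internal implementation:

```python
import re

# Delimiter set from the example above. The period (.) is not in the
# set, so the version number 7.74.0 stays intact after splitting.
DELIMITERS = "\n\t\r,;[]{}()&^*#@~=<>/\\?:'\""

def tokenize(log: str) -> list:
    # Split at any run of delimiter characters and drop empty tokens.
    pattern = "[" + re.escape(DELIMITERS) + "]+"
    return [token for token in re.split(pattern, log) if token]

print(tokenize("curl/7.74.0"))  # -> ['curl', '7.74.0']
```

After this kind of splitting, a search for the curl keyword can match the log because curl is now a standalone word.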

Simple Log Service supports full-text indexes and field indexes. If you create both full-text indexes and field indexes, the field indexes take precedence.

Index types

Full-text indexes

Simple Log Service splits a log into multiple words that are of the TEXT type by using delimiters. After you create full-text indexes, you can query logs by using keywords. For example, you can query logs that contain Chrome or Safari based on the following search statement: Chrome or Safari.

Important
  • Delimiters cannot split Chinese content. If you want to split Chinese content, turn on Include Chinese. Then, Simple Log Service automatically splits the Chinese content based on Chinese grammar.

  • If you create only full-text indexes for your Logstore, you can use only the full-text search syntax to specify query conditions. For more information, see Query syntax and functions.

Field indexes

Simple Log Service distinguishes logs by field name and then splits the fields by using delimiters. Supported field types are TEXT, LONG, DOUBLE, and JSON. For more information, see Data types. After you create field indexes, you can specify field names and field values in the key:value format to query logs. You can also use a SELECT statement to query logs. For more information, see Field-specific search.

Important
  • If you want to query and analyze fields, you must create field indexes and use a SELECT statement. Field indexes have a higher priority than full-text indexes: if you create both, the field indexes take precedence.

  • Fields of the TEXT type: You can use full text-based search statements, field-based search statements, and analytic statements to query and analyze data. An analytic statement includes a SELECT statement.

    • If full-text indexing is not enabled, full text-based search statements query data from all fields of the TEXT type.

    • If full-text indexing is enabled, full text-based search statements query data from all logs.

  • Fields of the LONG or DOUBLE type: You can use field-based search statements and analytic statements to query and analyze data. An analytic statement includes a SELECT statement.
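As a rough illustration of the key:value search format, the following toy in-memory model (not the service API) shows how a field-based search matches logs whose named field equals the queried value:

```python
# Toy model of a key:value field search (illustrative only): a log
# matches status:200 if its parsed status field equals the value.
logs = [
    {"status": "200", "request_method": "GET"},
    {"status": "404", "request_method": "POST"},
]

def field_search(logs, key, value):
    # Return every log whose field named `key` equals `value`.
    return [log for log in logs if log.get(key) == value]

print(field_search(logs, "status", "200"))
```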

Create indexes

Important
  • Query and analysis results vary based on index configurations. You must create indexes based on your business requirements. After indexes are created, the indexes take effect within approximately 1 minute.

  • New indexes take effect only for new logs. To query historical logs, you must reindex the logs. For more information, see Reindex logs for a Logstore.

  • Simple Log Service automatically creates indexes for specific reserved fields. For more information, see Reserved fields.

    Simple Log Service leaves delimiters empty when it creates indexes for the __topic__ and __source__ reserved fields. Therefore, only exact match is supported when you specify keywords to query the two fields.

  • Fields that are prefixed with __tag__ do not support full-text indexes. If you want to query and analyze fields that are prefixed with __tag__, you must create field indexes. Sample query statement: * | select "__tag__:__receive_time__".

  • If a log contains two fields whose names are the same, such as request_time, Simple Log Service displays one of the fields as request_time_0. The two fields are still stored as request_time in Simple Log Service. If you want to query, analyze, ship, transform, or create indexes for the fields, you must use request_time.

Console

  1. Log on to the Simple Log Service console.

  2. In the Projects section, click the project that you want to manage.

  3. On the Log Storage > Logstores tab, click the Logstore that you want to manage.

  4. On the query and analysis page of the Logstore, click Enable.

    Note

    You can query the latest data approximately 1 minute after you click Enable.


  5. Optional. Turn off Auto Update.

    If a Logstore is a dedicated Logstore for a cloud service or an internal Logstore, Auto Update is automatically turned on. In this case, the built-in indexes of the Logstore are automatically updated to the latest version. If you want to create indexes in the preceding scenario, turn off Auto Update in the Search & Analysis panel.

    Warning

    If you delete the indexes of a dedicated Logstore for a cloud service, features such as reports and alerting that are enabled for the Logstore may be affected.


  6. Create indexes.

    Create full-text indexes

    After you click Enable, Full-text Index is automatically turned on. You can turn on LogReduce, Case Sensitive, and Include Chinese based on your business requirements. You can use default delimiters or custom delimiters.

    The following table describes the parameters.


    LogReduce

    If you turn on LogReduce, Simple Log Service automatically clusters highly similar text logs during collection and extracts patterns from the logs. This way, you can have a comprehensive understanding of the logs. For more information, see LogReduce.

    Case Sensitive

    Specifies whether searches are case-sensitive.

    • If you turn on Case Sensitive, searches are case-sensitive. For example, if a log contains internalError, you can search for the log by using only the internalError keyword.

    • If you turn off Case Sensitive, searches are not case-sensitive. For example, if a log contains internalError, you can search for the log by using the INTERNALERROR or internalerror keyword.
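The behavior of the Case Sensitive switch can be modeled as follows. This is an illustrative sketch, not the service implementation:

```python
# Illustrative model of case sensitivity: a case-insensitive search
# normalizes both the indexed word and the query keyword to lowercase.
def matches(word: str, keyword: str, case_sensitive: bool) -> bool:
    if case_sensitive:
        return word == keyword
    return word.lower() == keyword.lower()

print(matches("internalError", "INTERNALERROR", case_sensitive=False))  # True
print(matches("internalError", "INTERNALERROR", case_sensitive=True))   # False
```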

    Include Chinese

    Specifies whether to distinguish between Chinese content and English content in searches.

    • If you turn on Include Chinese and a log contains Chinese characters, the Chinese content is split based on Chinese grammar. The English content is split by using specified delimiters.

      Important

      When Chinese content is split, the write speed is reduced. Proceed with caution.

    • If you turn off Include Chinese, all content of a log is split by using specified delimiters.

    Delimiter

    The delimiters that are used to split the content of a log into multiple words. By default, Simple Log Service uses the following delimiters: , '";=()[]{}?@&<>/:\n\t\r. If the default delimiters do not meet your business requirements, you can specify custom delimiters. All ASCII codes can be specified as delimiters.

    If you leave Delimiter empty, Simple Log Service considers an entire log as a whole. In this case, you can search for the log only by using a complete string or by performing fuzzy match.

    For example, the content of a log is /url/pic/abc.gif.

    • If you do not specify a delimiter, the content of the log is considered as a single word /url/pic/abc.gif. You can search for the log only by using the /url/pic/abc.gif keyword or by using /url/pic/* to perform fuzzy match.

    • If you set Delimiter to a forward slash (/), the content of the log is split into the following three words: url, pic, and abc.gif. You can search for the log by using the url, abc.gif, or /url/pic/abc.gif keyword, or by using pi* to perform fuzzy match.

    • If you set the Delimiter parameter to a forward slash (/) and a period (.), the content of the log is split into the following four words: url, pic, abc, and gif. You can search for the log by using one of the preceding words or by performing fuzzy match.
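The three delimiter cases above can be verified with a small sketch. The splitting behavior is assumed from the descriptions in this topic, not taken from the service code:

```python
import re

def split_log(content: str, delimiters: str) -> list:
    if not delimiters:
        # An empty delimiter set keeps the entire log as a single word.
        return [content]
    pattern = "[" + re.escape(delimiters) + "]+"
    return [token for token in re.split(pattern, content) if token]

print(split_log("/url/pic/abc.gif", ""))    # ['/url/pic/abc.gif']
print(split_log("/url/pic/abc.gif", "/"))   # ['url', 'pic', 'abc.gif']
print(split_log("/url/pic/abc.gif", "/."))  # ['url', 'pic', 'abc', 'gif']
```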

    Create field indexes

    After you click Enable, you can click Automatic Index Generation in the Search & Analysis panel. Simple Log Service automatically generates field indexes based on the first log in the preview results of data collection. If you want to create custom field indexes, click the plus sign (+). For more information, see Parameters.

    The first time you open the Search & Analysis panel, the following settings are displayed.

    The following table describes the parameters.


    Field Name

    The name of the log field. Example: client_ip.

    The name can contain only letters, digits, and underscores (_). It must start with a letter or an underscore (_).

    Important
    • If you want to create an index for a __tag__ field, such as a public IP address or UNIX timestamp, you must set the Field Name parameter to a value in the __tag__:KEY format. Example: __tag__:__receive_time__. For more information, see Reserved fields.

    • __tag__ fields do not support numeric indexes. When you create an index for a __tag__ field, you must set the Type parameter to text.

    Type

    The data type of the field value. Valid values: text, long, double, and json. For more information, see Data types.

    If you set the data type for a field to long or double, you cannot configure the Case Sensitive, Include Chinese, or Delimiter parameter for the field.

    Alias

    The alias of the field. For example, you can set the alias of the client_ip field to ip.

    The alias can contain only letters, digits, and underscores (_). It must start with a letter or an underscore (_).

    Important

    You can use the alias of a field only in an analytic statement. You must use the original name of a field in a search statement. An analytic statement includes a SELECT statement. For more information, see Column aliases.

    Case Sensitive

    Specifies whether searches are case-sensitive.

    • If you turn on Case Sensitive, searches are case-sensitive. For example, if a log contains internalError, you can search for the log by using only the internalError keyword.

    • If you turn off Case Sensitive, searches are not case-sensitive. For example, if a log contains internalError, you can search for the log by using the INTERNALERROR or internalerror keyword.

    Delimiter

    The delimiters that are used to split the content of a log into multiple words. By default, Simple Log Service uses the following delimiters: , '";=()[]{}?@&<>/:\n\t\r. If the default delimiters do not meet your business requirements, you can specify custom delimiters. All ASCII codes can be specified as delimiters.

    If you leave Delimiter empty, Simple Log Service considers an entire log as a whole. In this case, you can search for the log only by using a complete string or by performing fuzzy match.

    For example, the content of a log is /url/pic/abc.gif.

    • If you do not specify a delimiter, the content of the log is considered as a single word /url/pic/abc.gif. You can search for the log only by using the /url/pic/abc.gif keyword or by using /url/pic/* to perform fuzzy match.

    • If you set Delimiter to a forward slash (/), the content of the log is split into the following three words: url, pic, and abc.gif. You can search for the log by using the url, abc.gif, or /url/pic/abc.gif keyword, or by using pi* to perform fuzzy match.

    • If you set the Delimiter parameter to a forward slash (/) and a period (.), the content of the log is split into the following four words: url, pic, abc, and gif. You can search for the log by using one of the preceding words or by performing fuzzy match.

    Include Chinese

    Specifies whether to distinguish between Chinese content and English content in searches.

    • If you turn on Include Chinese and a log contains Chinese characters, the Chinese content is split based on Chinese grammar. The English content is split by using specified delimiters.

      Important

      When Chinese content is split, the write speed is reduced. Proceed with caution.

    • If you turn off Include Chinese, all content of a log is split by using specified delimiters.

    Enable Analytics

    You can perform statistical analysis on a field only if you turn on Enable Analytics for the field.

  7. Optional. Specify the maximum length of a field value.

    During SQL analysis, field values are truncated to a maximum length. By default, the maximum length of a field value that can be retained for analysis is 2,048 bytes, which is equivalent to 2 KB. You can change this limit by using the Maximum Statistics Field Length parameter in the lower part of the Search & Analysis panel. Valid values: 64 to 16384. Unit: bytes. The new setting takes effect only for new logs.

    Important

    If the length of a field value exceeds the value of this parameter, the field value is truncated and the excess part is not involved in analysis.

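The truncation behavior can be sketched as follows. This is illustrative; byte-level truncation is an assumption based on the description above:

```python
# Illustrative sketch of the Maximum Statistics Field Length behavior:
# only the first max_len bytes of a field value are retained for analysis,
# and the excess part is not involved in analysis.
def truncate_for_analysis(value: str, max_len: int = 2048) -> str:
    data = value.encode("utf-8")
    return data[:max_len].decode("utf-8", errors="ignore")

long_value = "x" * 3000
print(len(truncate_for_analysis(long_value)))  # 2048
```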

API

Simple Log Service allows you to call API operations to manage indexes. For more information, see the following topics:

SDK

Simple Log Service allows you to use SDKs for multiple programming languages to manage indexes. The following section describes some commonly used SDKs. For more information, see Overview of Simple Log Service SDKs.

Java

You can use Simple Log Service SDK for Java to manage indexes. For more information, see Use Simple Log Service SDK for Java to manage indexes.

Python

You can use Simple Log Service SDK for Python to manage indexes. For more information, see Use Simple Log Service SDK for Python to manage indexes.

Simple Log Service is also compatible with Alibaba Cloud SDKs. For more information, see Simple Log Service_SDK Center_Alibaba Cloud OpenAPI Explorer.

CLI

You can use Simple Log Service CLIs to manage indexes. For more information, see the following topics:

Update indexes

Procedure

On the query and analysis page of the Logstore that you want to manage, choose Index Attributes > Attributes. Query and analysis results vary based on index configurations. You must update indexes based on your business requirements. After indexes are updated, the new indexes take effect within approximately 1 minute.


Disable the indexing feature

Important

After you disable the indexing feature for a Logstore, the storage space that is occupied by historical indexes is automatically released after the data retention period of the Logstore elapses.

Procedure

On the query and analysis page of the Logstore that you want to manage, choose Index Attributes > Disable.


Index configuration examples

Example 1

A log contains the request_time field, and the request_time>100 field-based search statement is executed.

  • If only full-text indexes are created, logs that contain request_time, >, and 100 are returned. The greater-than sign (>) is not a delimiter.

  • If only field indexes are created and the field types are DOUBLE and LONG, logs whose request_time field value is greater than 100 are returned.

  • If both full-text indexes and field indexes are created and the field types are DOUBLE and LONG, the full-text indexes do not take effect for the request_time field and logs whose request_time field value is greater than 100 are returned.

Example 2

A log contains the request_time field, and the request_time full text-based search statement is executed.

  • If only field indexes are created and the field types are DOUBLE and LONG, no logs are returned.

  • If only full-text indexes are created, logs that contain the request_time field are returned. In this case, the statement queries data from all logs.

  • If only field indexes are created and the field type is TEXT, logs that contain the request_time field are returned. In this case, the statement queries data from all fields of the TEXT type.

Example 3

A log contains the status field, and the * | SELECT status, count(*) AS PV GROUP BY status query statement is executed.

  • If only full-text indexes are created, no logs are returned.

  • If an index is created for the status field, the total numbers of page views (PVs) for different status codes are returned.
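Example 3 can be illustrated with a short sketch of what the analytic statement computes. The sample logs are hypothetical, and this is only a model of the aggregation, not the service's query engine:

```python
from collections import Counter

# What * | SELECT status, count(*) AS PV GROUP BY status computes:
# the number of page views (PV) per status code. This works only if
# a field index with statistics enabled exists for the status field.
logs = [{"status": 200}, {"status": 200}, {"status": 404}, {"status": 500}]
pv_by_status = Counter(log["status"] for log in logs)
print(dict(pv_by_status))  # {200: 2, 404: 1, 500: 1}
```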

Index traffic descriptions

Full-text indexes

All field names and field values are stored as text. In this case, field names and field values are both included in the calculation of index traffic.

Field indexes

The method that is used to calculate index traffic varies based on the data type of a field.

  • TEXT type: Field names and field values are both included in the calculation of index traffic.

  • LONG and DOUBLE types: Field names are not included in the calculation of index traffic. Each field value is counted as 8 bytes in index traffic.

    For example, if you create an index for the status field of the LONG type and the field value is 200, the string status is not included in the calculation of index traffic and the value 200 is counted as 8 bytes in index traffic.

  • JSON type: Field names and field values are both included in the calculation of index traffic. The subfields that are not indexed are also included. For more information, see Why is index traffic generated for JSON subfields that are not indexed?

    • If a subfield is not indexed, index traffic is calculated by regarding the data type of the subfield as TEXT.

    • If a subfield is indexed, index traffic is calculated based on the data type of the subfield. The data type can be TEXT, LONG, or DOUBLE.
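The calculation rules above can be summarized in a small sketch. The helper function is hypothetical and only restates the rules for TEXT, LONG, and DOUBLE fields; it is not a service API:

```python
# Illustrative index traffic calculation: TEXT fields count both the
# field name and the field value; LONG and DOUBLE fields exclude the
# field name and count a fixed 8 bytes per value.
def index_traffic_bytes(field_name: str, value, field_type: str) -> int:
    if field_type in ("long", "double"):
        return 8
    return len(field_name.encode("utf-8")) + len(str(value).encode("utf-8"))

# status indexed as LONG: the value 200 counts as 8 bytes.
print(index_traffic_bytes("status", 200, "long"))   # 8
# status indexed as TEXT: name "status" (6 bytes) + value "200" (3 bytes).
print(index_traffic_bytes("status", "200", "text"))  # 9
```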

Billing overview

Logstores that use the pay-by-ingested-data billing mode

Logstores that use the pay-by-feature billing mode

  • Indexes occupy storage space. For more information about storage types, see Configure intelligent tiered storage.

  • When you create indexes, index traffic is generated. You are charged based on the Index traffic of log data and Index traffic of log data in Query Logstores billable items. For more information, see Billable items of pay-by-feature. For more information about how to reduce index traffic, see How do I reduce index traffic fees?

  • Reindexing generates fees. During reindexing, you are charged based on the same billable items and prices as when you create indexes.

What to do next

FAQ
