Simple Log Service: Create indexes

Last Updated: Dec 20, 2024

To effectively query and analyze logs collected in a Logstore, you must create indexes. This topic describes indexes in Simple Log Service, including index types, how to create and disable indexes, configuration examples, and billing.

Why you need to create indexes

Keywords are typically used to retrieve specific content from raw logs, for example, entries that contain curl in a log such as curl: curl/7.74.0. If the log is not split into words, the entire log text is treated as a single entity; the keyword curl matches only part of it, so Simple Log Service cannot retrieve the log.

For efficient retrieval, logs must be split into searchable words. Logs are divided using delimiters, which determine the split points. For instance, with delimiters such as \n\t\r,;[]{}()&^*#@~=<>/\?:'", the log splits into words like curl and 7.74.0. Simple Log Service then creates indexes based on these words, enabling log queries and analyses.
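
The following minimal Python sketch is for illustration only (it is not a Simple Log Service API); it mimics how delimiter-based splitting turns a raw log line into searchable words. The delimiter set is taken from the example above, with a space added:

  import re

  # Illustrative only: Simple Log Service performs this splitting on the server
  # side when it builds indexes. The delimiter set mirrors the example above.
  DELIMITERS = "\n\t\r ,;[]{}()&^*#@~=<>/\\?:'\""

  def split_into_words(log_line):
      pattern = "[" + re.escape(DELIMITERS) + "]+"
      return [word for word in re.split(pattern, log_line) if word]

  print(split_into_words("curl: curl/7.74.0"))
  # ['curl', 'curl', '7.74.0'] -> the keyword curl now matches a whole word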

Simple Log Service projects support the creation of both full-text and field indexes. If both are created, field index configurations take priority.

Index types

Full-text indexes

Full-text indexes split the entire log into words of the text type based on the specified delimiters. After full-text indexes are created, you can perform keyword-based log queries. For instance, the query Chrome or Safari retrieves logs that contain either Chrome or Safari.

Important
  • Delimiters are not compatible with Chinese characters. Enabling the Include Chinese option allows Simple Log Service to automatically segment Chinese text according to the rules of Chinese grammar.

  • If only full-text indexes are set up, you can use only full-text query capabilities. For more details, see Query syntax and features.

Field indexes

Field indexes categorize logs by field names (KEY) and then segment the content within fields using delimiters. They support four data types: text, long, double, and JSON. For more information, see Data types. After setting up field indexes, you can specify field names and values (Key:Value) for log queries or use SELECT statements. For more details, see Field queries.

Important
  • To query or analyze fields by using SELECT statements, you must create field indexes. If both full-text and field indexes are created, the field index settings take priority.

  • Fields of the text type can be queried using full-text, field, and analytic (SELECT) query statements.

    • Without full-text indexes, full-text query statements return results from all text type fields.

    • With full-text indexes, full-text query statements return results from all logs.

  • Fields of the long and double types can be queried and analyzed using field and analytic (SELECT) query statements.
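
To illustrate the two query styles, the following sketch uses the Simple Log Service Python SDK (aliyun-log-python-sdk). The endpoint, credentials, project, Logstore, and the get_log parameter order shown here are assumptions for illustration; verify them against the SDK reference before use.

  import time
  from aliyun.log import LogClient

  # All values below are placeholders; replace them with your own.
  client = LogClient("cn-hangzhou.log.aliyuncs.com", "<access_key_id>", "<access_key_secret>")
  project, logstore = "my-project", "my-logstore"
  now = int(time.time())

  # Field query: requires a field index of the long or double type on request_time.
  res = client.get_log(project, logstore, now - 900, now, query="request_time > 100")
  res.log_print()

  # Analytic (SELECT) statement: requires a field index on status with statistics enabled.
  res = client.get_log(project, logstore, now - 900, now,
                       query="* | SELECT status, count(*) AS PV GROUP BY status")
  res.log_print()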

Create indexes

Important
  • Different index configurations yield varying query and analysis outcomes. Set up indexes according to your business needs. Indexes typically take effect within one minute of creation.

  • Indexes are applicable only to new logs. To query historical logs, use the Reindex feature.

  • Simple Log Service automatically indexes certain reserved fields. For more information, see Reserved fields.

    The delimiters for the __topic__ and __source__ indexes are empty. Exact keyword matches are required when querying these fields.

  • Fields prefixed with __tag__ do not support full-text indexing. Before querying or analyzing such fields, you must create field indexes, for example, * | select "__tag__:__receive_time__".

  • If a log contains two fields that have the same name, for example, two fields both named request_time, Simple Log Service displays one of them as request_time_0. However, the original field name request_time is still used in storage. Use the original name when you create indexes and when you query, analyze, ship, or transform logs.

Console method

  1. Log on to the Simple Log Service console.

  2. In the Projects section, click the project that you want to manage.

  3. Choose Log Storage > Logstores. On the Logstores tab, click the Logstore that you want to manage.

  4. On the Query And Analysis page of the Logstore, click Enable Index.

    Note

    You can access the latest log data approximately 1 minute after enabling the index.


  5. (Optional) Disable automatic index updates

    When the Logstore is dedicated to a cloud service or is an internal Logstore, the Auto Update switch is enabled by default. This allows automatic updates to the latest built-in index version. To create custom indexes, disable the Auto Update switch on the Query And Analysis panel.

    Warning

    Removing indexes from a dedicated Logstore for a cloud service may impact related features such as reports and alerts.


  6. Create indexes

    Create full-text indexes

    After you click Enable Index, the Full-text Index feature is activated automatically. You can optionally turn on additional features such as Log Clustering, Case Sensitivity, and Include Chinese, and you can keep the default delimiters or specify custom ones.

    The configuration items are described as follows:

    Log Clustering

    Activating the Log Clustering feature allows Simple Log Service to automatically group similar logs during text log collection. This process identifies common log patterns, facilitating a quick overview of the log landscape. For more information, see Log Clustering.

    Case-sensitive

    Determines if searches are case-sensitive.

    • Enabling the Case-sensitive option will make searches distinguish between uppercase and lowercase letters. For instance, if a log contains internalError, it can only be retrieved by entering the exact keyword internalError.

    • Disabling the Case-sensitive option makes searches case-insensitive. For example, logs containing internalError can be retrieved using either INTERNALERROR or internalerror.

    Contains Chinese

    Determines whether searches should differentiate between Chinese and English content.

    • Enabling the Contains Chinese feature means that Chinese content in logs will be segmented according to Chinese syntax rules, while English content will be split using the specified delimiters.

      Important

      Note that enabling segmentation of Chinese content may reduce the write speed. Adjust this setting carefully based on your business needs.

    • Disabling the Contains Chinese feature will result in all content being split based on the specified delimiters, regardless of language.

    Delimiters

    Delimiters are used to segment log content into individual words. Simple Log Service's default delimiters include , '";=()[]{}?@&<>/:\n\t\r. You can customize these if the defaults do not suit your needs. Any ASCII character can be used as a delimiter.

    Setting the Delimiter to empty treats the field value as a single entity, requiring a complete string match or a fuzzy query for log retrieval.

    For example, consider the log content /url/pic/abc.gif.

    • Without set delimiters, the log is considered a single word /url/pic/abc.gif. To locate the log, you must use either the exact string /url/pic/abc.gif or employ a fuzzy query such as /url/pic/*.

    • When the delimiter is a forward slash (/), the log is split into three words: url, pic, and abc.gif. You can search for the log by using any of these words, a fuzzy query such as pi*, or the full path /url/pic/abc.gif.

    • Setting delimiters to both a forward slash (/) and a period (.) splits the log into url, pic, abc, and gif, allowing for a more detailed search using any of the resulting words or a fuzzy query.
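
    To make the effect of different delimiter sets concrete, here is a small illustrative Python sketch (again, not a Simple Log Service API; the service splits field values on the server side):

      import re

      def split_words(value, delimiters):
          # Illustrative only: split a value on the given delimiter characters.
          if not delimiters:
              return [value]  # empty delimiter set: the value stays a single word
          pattern = "[" + re.escape(delimiters) + "]+"
          return [w for w in re.split(pattern, value) if w]

      log = "/url/pic/abc.gif"
      print(split_words(log, ""))    # ['/url/pic/abc.gif'] -> exact match or fuzzy query required
      print(split_words(log, "/"))   # ['url', 'pic', 'abc.gif']
      print(split_words(log, "/."))  # ['url', 'pic', 'abc', 'gif']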

    Create field indexes

    After clicking Enable Index, you may select Query Analysis followed by Auto-generate Index. Simple Log Service will then automatically generate field indexes using the first log from the preview results of data collection. To customize field indexes, select + at the bottom of the page. For detailed information on specific fields, refer to Configuration Item Descriptions.

    The configuration items for field indexes are described as follows:

    Field Name

    The name of the log field (KEY), such as client_ip. The name can contain only letters, digits, and underscores (_) and must begin with a letter or an underscore (_).

    Important
    • For __tag__ fields like public IP addresses and Unix timestamps, set the Field Name as __tag__:KEY, for example, __tag__:__receive_time__. See Reserved Fields for more details.

    • Numeric indexes are not supported for the __tag__ fields. Ensure that the Type for all __tag__ field indexes is set to text.

    Type

    The data type of the log field value, with valid options including text, long, double, and json. Refer to Data Types for further information.

    Settings such as Case-sensitive, Contains Chinese, and Delimiter are not applicable to long and double types.

    Alias

    An alias for the field, such as ip for the client_ip field. The alias can contain only letters, digits, and underscores (_) and must begin with a letter or an underscore (_).

    Important

    Field aliases are only usable within analytic statements. The original field name must be used in search statements. For additional details, see Column Aliases.

    Case-sensitive

    Determines if searches are case-sensitive.

    • Enabling the Case-sensitive option will make searches distinguish between uppercase and lowercase letters. For instance, if a log contains internalError, it can only be retrieved by entering the exact keyword internalError.

    • Disabling the Case-sensitive option makes searches case-insensitive. For example, logs containing internalError can be retrieved using either INTERNALERROR or internalerror.

    Delimiter

    Delimiters are used to segment log content into individual words. Simple Log Service's default delimiters include , '";=()[]{}?@&<>/:\n\t\r. You can customize these if the defaults do not suit your needs. Any ASCII character can be used as a delimiter.

    Setting the Delimiter to empty treats the field value as a single entity, requiring a complete string match or a fuzzy query for log retrieval.

    For example, consider the log content /url/pic/abc.gif.

    • Without set delimiters, the log is considered a single word /url/pic/abc.gif. To locate the log, you must use either the exact string /url/pic/abc.gif or employ a fuzzy query such as /url/pic/*.

    • When the delimiter is a forward slash (/), the log is split into three words: url, pic, and abc.gif. You can search for the log by using any of these words, a fuzzy query such as pi*, or the full path /url/pic/abc.gif.

    • Setting delimiters to both a forward slash (/) and a period (.) splits the log into url, pic, abc, and gif, allowing for a more detailed search using any of the resulting words or a fuzzy query.

    Contains Chinese

    Determines whether searches should differentiate between Chinese and English content.

    • Enabling the Contains Chinese feature means that Chinese content in logs will be segmented according to Chinese syntax rules, while English content will be split using the specified delimiters.

      Important

      Note that enabling segmentation of Chinese content may reduce the write speed. Adjust this setting carefully based on your business needs.

    • Disabling the Contains Chinese feature will result in all content being split based on the specified delimiters, regardless of language.

    Enable Statistics

    Turn on Enable Statistics to allow the field to be analyzed in SELECT statements.

  7. (Optional) Set the maximum length of the field

    During SQL analysis, fields are truncated to a default length. The standard configuration for Simple Log Service is 2048 bytes, or 2 KB. To modify this limit, navigate to the bottom of the Query Analysis page and adjust the setting under Maximum Length Of Statistical Fields (text). The permissible range is from 64 to 16384 bytes. Remember, changes to the index configuration will only affect incremental data.

    Important

    Should a single field value exceed the set maximum length, the portion beyond the limit will be truncated and excluded from the analysis.


API methods

Simple Log Service allows you to manage indexes by calling API operations. For more information, see the API references for index management.

SDK method

Simple Log Service offers index management capabilities through its multi-language SDKs. Below are some commonly used SDKs. For more information, see the SDK reference overview.

Java

For detailed instructions on managing indexes with the Simple Log Service Java SDK, refer to Manage indexes using the Java SDK.

Python

For detailed instructions on managing indexes with the Simple Log Service Python SDK, refer to Manage indexes using the Python SDK.
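
The following is a minimal sketch of creating an index with the Python SDK. It assumes the IndexConfig, IndexLineConfig, and IndexKeyConfig helpers of aliyun-log-python-sdk and the parameter names shown below; verify the exact constructors and argument names against Manage indexes using the Python SDK before use.

  from aliyun.log import LogClient, IndexConfig, IndexLineConfig, IndexKeyConfig

  # All values below are placeholders; replace them with your own.
  client = LogClient("cn-hangzhou.log.aliyuncs.com", "<access_key_id>", "<access_key_secret>")
  project, logstore = "my-project", "my-logstore"

  token_list = list(", '\";=()[]{}?@&<>/:\n\t\r")  # default delimiters described in this topic

  # Full-text index: case-insensitive, Chinese segmentation disabled.
  line_config = IndexLineConfig(token_list=token_list, case_sensitive=False, chn=False)

  # Field indexes: a text field with an alias, and a long field with statistics enabled.
  key_configs = {
      "client_ip": IndexKeyConfig(token_list=token_list, case_sensitive=False,
                                  index_type="text", doc_value=True, alias="ip"),
      "request_time": IndexKeyConfig(index_type="long", doc_value=True),
  }

  index_config = IndexConfig(line_config=line_config, key_config_list=key_configs)
  client.create_index(project, logstore, index_config)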

Simple Log Service is also compatible with Alibaba Cloud SDKs. For more information, see Simple Log Service_SDK Center_Alibaba Cloud OpenAPI Explorer.

CLI method

Simple Log Service provides a command-line interface (CLI) for index management. For more information, see the CLI references for index management.

Disable indexes

Important

After you disable indexes, the storage space used by historical indexes is automatically cleared after the data retention period of the Logstore ends.

Procedure

Navigate to the Query and Analysis page of the desired Logstore, and select Query Analysis Properties > Disable Indexes.


Index configuration examples

Example 1

The log content includes a request_time field. To query this field, execute the statement request_time>100.

  • Configuring only full-text indexes will return logs containing the terms request_time, >, and 100.

  • Configuring only field indexes for double and long types will return logs where request_time exceeds 100.

  • Configuring both full-text and field indexes for double and long types will invalidate the full-text index for request_time, returning logs where request_time exceeds 100.

Example 2

The log content includes a request_time field. To perform a full-text query, use the statement request_time.

  • Configuring only field indexes for double and long types will not return any related logs.

  • Configuring only full-text indexes will query logs containing request_time from all log texts.

  • Configuring only field indexes for the text type will query logs containing request_time from fields indexed as text.

Example 3

The log content includes a status field. To analyze this field, execute the statement * | SELECT status, count(*) AS PV GROUP BY status.

  • Configuring only full-text indexes will not return any related logs.

  • Configuring a field index for status will yield the different status codes and their corresponding total page views (PVs).

Index traffic descriptions

Full-text index

In a full-text index, both field names and field values are stored as text and are included in the index traffic.

Field index

Index traffic calculation depends on the data type of the field.

  • Text type: Index traffic includes both field names and field values.

  • Long and double types: Field names do not contribute to index traffic. Each field value contributes a fixed 8 bytes of index traffic.

    For instance, when an index is created for the status field (long type), if the field value is 200, the term status does not contribute to the index size, whereas the index size for the value 200 is consistently 8 bytes.

  • JSON type: Index traffic encompasses both field names and field values, including child nodes not indexed. For more information, see calculating index traffic for JSON type fields.

    • Child nodes without indexes are treated as text type for index traffic calculation.

    • Child nodes with indexes have their index traffic calculated according to their data type, whether text, long, or double.
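
As a rough, illustrative estimate based on the rules above (actual billed traffic is metered by the service), the following sketch computes the index traffic contributed by one log with a long field status and a text field client_ip:

  # Illustrative estimate only, based on the rules described above.
  def text_field_traffic(name, value):
      # Text type: both the field name and the field value count toward index traffic.
      return len(name.encode("utf-8")) + len(str(value).encode("utf-8"))

  def numeric_field_traffic(_value):
      # Long and double types: the field name is excluded; each value counts as 8 bytes.
      return 8

  log = {"status": 200, "client_ip": "192.168.0.1"}
  traffic = numeric_field_traffic(log["status"]) + text_field_traffic("client_ip", log["client_ip"])
  print(traffic)  # 8 + (9 + 11) = 28 bytes for this log entry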

Billing instructions

Logstore billed by data write volume

Logstore billed by feature usage

What to do next

FAQ

  • What should I do if I'm unable to query logs after importing them into Simple Log Service?

    • Verify that the specified delimiters conform to the requirements.

    • Indexes are only applied to new logs. To query and analyze historical logs, you must reindex them. For more information, see Reindex.

  • How can I query logs using two conditions?

    To query logs using two conditions, specify both statements simultaneously. For instance, to find logs with a status other than OK or Unknown in a Logstore, use the query not OK not Unknown.

  • How do I search for logs containing multiple keywords?

    To search logs for multiple keywords, use the http_user_agent field as an example:

    • Phrase query: http_user_agent:#"like Gecko". For more information, see Phrase query.

    • LIKE clause: * | SELECT * WHERE http_user_agent LIKE '%like Gecko%'

  • How do I query logs using a keyword that includes spaces?

    For instance, querying logs with the keyword POS version will return logs containing the exact phrase "POS version," as opposed to logs containing either POS or version separately.

  • FAQ about LogSearch

  • Common errors in querying and analyzing logs

  • How to perform a fuzzy query on logs

  • FAQ about querying and analyzing JSON logs

  • How to download logs to a local device

  • Why field values are truncated during query and analysis