An inverted index is a data storage structure that consists of keywords and logical pointers that map to the actual data. You can use keywords to quickly locate the data rows that contain specific text in logs. An index is similar to a data catalog. You can query and analyze logs only after you create indexes. This topic describes the definition and types of indexes that are supported by Simple Log Service, describes how to create indexes, and provides examples.
Prerequisites
Before you can analyze logs, you must store the logs in a Standard Logstore. For more information, see Data collection overview and Manage a Logstore.
If you want to use a Resource Access Management (RAM) user to create indexes, make sure that the RAM user is granted the required permissions. For more information about how to grant permissions, see Grant permissions to a RAM user. For more information about policies, see Overview.
Definition and types of indexes
Definition
In most cases, you use keywords to query data from raw logs. For example, assume that you want to obtain the following log, which contains the Chrome keyword. If the log is not split, it is treated as a whole string and the system does not associate it with the Chrome keyword. In this case, you cannot obtain the log.
Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.2 (KHTML, like Gecko) Chrome/192.0.2.0 Safari/537.2
To search for the log, you must split it into separate, searchable words. You can split a log by using delimiters. Delimiters determine the positions at which a log is split. In this example, you can use the following delimiters to split the preceding log: \n\t\r,;[]{}()&^*#@~=<>/\?:'". The log is split into the following words: Mozilla, 5.0, Windows, NT, 6.1, AppleWebKit, 537.2, KHTML, like, Gecko, Chrome, 192.0.2.0, Safari, and 537.2.
Simple Log Service creates indexes based on the words that are obtained after log splitting. You can use indexes to quickly locate specific information in a large number of logs.
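Because indexes are created on whole words, the keyword Chrome can locate the preceding log after the log is split. An incomplete keyword such as Chrom does not match, because it is not one of the words that are produced by splitting.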
Index types
Indexes are classified into full-text indexes and field indexes. Chinese content cannot be split by using delimiters. If you want to split Chinese content, turn on Include Chinese. Simple Log Service then automatically splits the Chinese content based on Chinese grammar.
Full-text indexes: Simple Log Service splits a log into multiple words of the Text type by using delimiters. You can then query logs by keyword. For example, you can query logs that contain Chrome or Safari by using the following search statement: Chrome or Safari. For more information, see Search syntax.
Field indexes: Simple Log Service distinguishes logs by field name and then splits the field values by using delimiters. Supported field types are Text, Long, Double, and JSON. After you create field indexes, you can specify field names and field values in the key:value format to query logs. You can also use SELECT statements to analyze logs. For more information, see Field-specific search syntax and Log analysis overview.
Note: When you collect logs to Simple Log Service or ship logs from Simple Log Service to other cloud services, Simple Log Service adds fields such as the log source and timestamp to logs in the key-value format. These fields are considered reserved fields of Simple Log Service.
After you create field indexes, you can use the following search or query statements to query data:
Search statement: request_method:GET and status in [200 299]. This search statement queries logs that record successful GET requests. GET requests whose status code is in the range of 200 to 299 are considered successful.
Search statement: request_method:GET not region:cn-hangzhou. This search statement queries logs that record GET requests from regions other than the China (Hangzhou) region.
Query statement: * | SELECT status_code FROM web_logs. This query statement returns the status_code field values of all logs.
Query statement: level: ERROR | SELECT status_code FROM web_logs. This query statement returns the status_code field values of the logs whose level field contains ERROR.
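You can also combine a search statement and an analytic statement in a single query statement. The following statement is for illustration only and assumes that field indexes are created for hypothetical request_method and region fields:
request_method:GET | SELECT region, count(*) AS pv GROUP BY region ORDER BY pv DESC LIMIT 10
The search statement before the vertical bar (|) filters logs that record GET requests, and the analytic statement after the vertical bar (|) counts the filtered logs for each region.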
Policies used to create indexes
Configured indexes take effect only for new logs. To query and analyze historical logs, you must reindex the logs. After you create indexes, they take effect within approximately 1 minute. For configuration examples of field indexes, see Query and analyze JSON logs and Query and analyze website logs.
Query and analysis results vary based on index configurations. You must create indexes based on your business requirements. If you create both full-text indexes and field indexes, the field indexes take precedence.
If only full-text indexes are configured, you can use only search syntax to query logs. For more information, see Search syntax.
If field indexes are configured, the query statement that you can use to query and analyze logs varies based on the data types of fields in the logs.
Fields of the Long and Double types: You can use field-based search statements and analytic statements to query and analyze data. An analytic statement includes a SELECT statement.
Fields of the Text type: You can use full text-based search statements, field-based search statements, and analytic statements to query and analyze data. If full-text indexing is not enabled, full text-based search statements query data from all fields of the Text type. If full-text indexing is enabled, full text-based search statements query data from all logs.
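For example, assume that a field index of the Long type is created for a hypothetical request_time field. You can then use a field-based search statement such as the first of the following statements and an analytic statement such as the second:
request_time > 100
* | SELECT avg(request_time) AS avg_request_time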
Index configuration examples
Example 1: A log contains the request_time field, and the request_time>100 field-based search statement is executed.
If only full-text indexes are configured, logs that contain request_time, >, and 100 are returned. The greater-than sign (>) is not a delimiter.
If only field indexes are configured and the field type is Double or Long, logs whose request_time field value is greater than 100 are returned.
If both full-text indexes and field indexes are configured and the field type is Double or Long, the full-text indexes do not take effect for the request_time field, and logs whose request_time field value is greater than 100 are returned.
Example 2: A log contains the request_time field, and the request_time full text-based search statement is executed.
If only field indexes are configured and the field type is Double or Long, no logs are returned.
If only full-text indexes are configured, logs that contain the request_time field are returned. In this case, the statement queries data from all logs.
If only field indexes are configured and the field type is Text, logs that contain the request_time field are returned. In this case, the statement queries data from all fields of the Text type.
Example 3: A log contains the status field, and the * | SELECT status, count(*) AS PV GROUP BY status query statement is executed.
If only full-text indexes are configured, no logs are returned.
If an index is configured for the status field, the number of page views (PVs) for each status code is returned.
Index traffic
Index traffic for full-text indexes: All field names and field values are stored as text. In this case, field names and field values are both included in the calculation of index traffic.
Index traffic for field indexes: The method that is used to calculate index traffic varies based on the data type of a field.
Text: Field names and field values are both included in the calculation of index traffic.
Long and Double: Field names are not included in the calculation of index traffic. Each field value is counted as 8 bytes in index traffic.
For example, if you create an index for the status field of the Long type and the field value is 400, the string status is not included in the calculation of index traffic, and the value 400 is counted as 8 bytes in index traffic.
JSON: Field names and field values are both included in the calculation of index traffic. Subfields that are not indexed are also included. For more information, see Why is index traffic generated for JSON subfields that are not indexed?
If a subfield is not indexed, index traffic is calculated by regarding the data type of the subfield as Text.
If a subfield is indexed, index traffic is calculated based on the data type of the subfield. The data type can be Text, Long, or Double.
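As an illustration of these rules (assuming single-byte characters and counting only the field names and values described above): a Text field named request_method with the value GET generates approximately 17 bytes of index traffic, which is 14 bytes for the field name plus 3 bytes for the field value. In contrast, a Long field named status with the value 400 generates 8 bytes, because the field name is excluded and each numeric value is counted as 8 bytes.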
Billing description
Logstores support the following billing modes: pay-by-ingested-data and pay-by-feature. For more information, see Manage a Logstore, Billable items of pay-by-feature, and Billable items of pay-by-ingested-data.
Logstore that uses the pay-by-ingested-data billing mode
Indexes occupy storage space. For more information about storage types, see Configure intelligent tiered storage.
Reindexing does not generate fees.
Logstore that uses the pay-by-feature billing mode
Indexes occupy storage space. For more information about storage types, see Configure intelligent tiered storage.
When you create indexes, index traffic is generated. For more information about the billing of index traffic, see the index traffic of log data and the log index traffic of Query Logstores billable items in Billable items of pay-by-feature. For more information about how to reduce index traffic, see the References section of this topic.
Reindexing generates fees. During reindexing, you are charged based on the same billable items and prices as when you create indexes.
Procedure
Step 1: Create indexes
Go to the query and analysis page.
Log on to the Simple Log Service console.
In the Projects section, click the project that you want to manage.
On the Logstores tab, click the Logstore that you want to manage.
On the query and analysis page, choose Index Attributes > Attributes. If no indexes are created, click Enable.
Turn off Auto Update. If a Logstore is a dedicated Logstore for a cloud service or an internal Logstore, Auto Update is turned on by default. In this case, the built-in indexes of the Logstore are automatically updated to the latest version. If you want to create indexes in the preceding scenario, turn off Auto Update in the Search & Analysis panel.
Warning: If you delete the indexes of a dedicated Logstore for a cloud service, features that are enabled for the Logstore may be affected. These features include reports and alerting.
Create indexes.
Configure the index parameters. If you want to analyze fields, you must create field indexes and include a SELECT statement in your query statement. Field indexes have a higher priority than full-text indexes. After indexes are created, they take effect within approximately 1 minute.
Important: Simple Log Service automatically creates indexes for specific reserved fields. For more information, see Reserved fields.
Simple Log Service leaves delimiters empty when it creates indexes for the __topic__ and __source__ reserved fields. Therefore, only exact match is supported when you specify keywords to query the two fields.
Fields that are prefixed with __tag__ do not support full-text indexes. If you want to query and analyze fields that are prefixed with __tag__, you must create field indexes. Sample query statement: * | select "__tag__:__receive_time__".
If a log contains two fields whose names are the same, such as request_time, Simple Log Service displays one of the fields as request_time_0. The two fields are still stored as request_time in Simple Log Service. If you want to query, analyze, ship, or transform the fields, or create indexes for the fields, you must use request_time.
Step 2: Reindex logs
Simple Log Service provides the reindexing feature that you can use to configure or modify indexes for historical data. You can reindex the logs of a specified time range in a Logstore based on the most recent indexing rules. For more information, see Reindex logs for a Logstore and Function overview.
What to do next
Query and analyze logs
For more information about how to query and analyze logs, see Query and analyze logs. For more information about the examples of query and analysis, see Query and analyze website logs, Query and analyze JSON logs, Collect, query, and analyze NGINX monitoring logs, and Analyze Layer 7 access logs of SLB.
Specify the maximum length of a field value
The default maximum length of a field value that can be retained for analysis is 2,048 bytes, which is equivalent to 2 KB. You can change the value of Maximum Statistics Field Length. Valid values: 64 to 16384. Unit: bytes.
If the length of a field value exceeds the value of this parameter, the field value is truncated, and the excess part is not involved in analysis.
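For example, if you keep the default value of 2,048 bytes and a field value is 3,000 bytes long, only the first 2,048 bytes of the value are involved in analysis.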
LogReduce
If you turn on LogReduce, Simple Log Service automatically clusters highly similar text logs during collection and extracts patterns from the logs. This can help you fully understand the logs. For more information, see LogReduce.
Disable indexing
After you disable the indexing feature for a Logstore, the storage space that is occupied by historical indexes is automatically released after the data retention period of the Logstore elapses.
References
For more information about how to improve query performance, see Accelerate the query and analysis of logs.
For more information about how to query and analyze JSON-formatted website logs, see Query and analyze JSON logs.
FAQ:
How do I resolve common errors that may occur when I query and analyze logs?
Why are field values truncated when I query and analyze logs?
Related operations
CreateIndex: creates indexes for a Logstore.
DeleteIndex: deletes the indexes of a Logstore.
GetIndex: queries the indexes of a Logstore.
UpdateIndex: updates the indexes of a Logstore.