This topic describes the usage of Simple Log Service Processing Language (SPL) in different scenarios.
SPL in different scenarios
The capabilities of SPL vary based on the scenario in which SPL is used.
| SPL entry point | Use of Logstore index-based filtering results as input | Case sensitivity of field names | Full-text field of __line__ |
| --- | --- | --- | --- |
| Real-time consumption | Not supported. If you use an asterisk (*), all data is used as input. | Case-sensitive. | Not supported. |
| Scan-based query | Supported. You can perform index-based filtering and then use SPL to process the filtering results. | Not case-sensitive. | Supported. |
| Logtail collection (SPL is specified when you create Logtail configurations) | Not supported. If you use an asterisk (*), all collected data is used as input. | Case-sensitive. | Not supported. |
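For example, in a scan-based query, the part before the first pipe can be an index-based filter instead of an asterisk, and the SPL instructions that follow process only the filtering results. The following statement is a minimal sketch; the Status and Method fields are illustrative:
Status: 200 | where Method like 'Post%'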
Process special fields
Time fields
During SPL execution, the data type of a Simple Log Service time field is always INTEGER or BIGINT. The time fields of Simple Log Service are __time__ and __time_ns_part__. The __time__ field specifies a timestamp in seconds, and the __time_ns_part__ field specifies the nanosecond part of the time value.
If you want to update the time of data, you must use the extend instruction and make sure that the data type of the new time value is INTEGER or BIGINT. You cannot use other instructions to manage time fields. Other instructions handle the time fields in the following ways:
project, project-away, and project-rename: By default, time fields are retained. The fields cannot be renamed or overwritten.
parse-regexp and parse-json: If time fields are extracted, the fields are ignored.
Examples
Extract time fields from an existing time string.
SPL statement
* | parse-regexp time, '([\d\-\s:]+)\.(\d+)' as ts, ms | extend ts=date_parse(ts, '%Y-%m-%d %H:%i:%S') | extend __time__=cast(to_unixtime(ts) as INTEGER) | extend __time_ns_part__=cast(ms as INTEGER) * 1000000 | project-away ts, ms
Input data
time: '2023-11-11 01:23:45.678'
Output result
__time__: 1699637025 __time_ns_part__: 678000000 time: '2023-11-11 01:23:45.678'
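The following minimal sketch illustrates the retention rule for the project instruction: the time fields are kept even though they are not listed. The status field is illustrative.
SPL statement
* | project status
Input data
__time__: 1699637025 status: '200'
Output result
__time__: 1699637025 status: '200'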
Fields whose names contain special characters
If a log contains a field whose name includes spaces or special characters, you can enclose the field name in double quotation marks (") to reference the field. For example, if a log contains a field named A B, whose name contains a space, you can specify "A B" in an SPL statement to reference the field. Example:
* | where "A B" like '%error%'
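As a further sketch, assuming that double-quoted field names are also accepted in expression contexts such as the extend instruction, you can copy the field to a new field whose name does not require quoting. The copy field name is hypothetical:
SPL statement
* | extend copy="A B" -- Copy the value of the quoted field into a new field. The field names are hypothetical.
Input data
A B: 'this is an error log'
Output result
A B: 'this is an error log' copy: 'this is an error log'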
Fields whose names are not case-sensitive
If you use SPL to perform a scan-based query, the field names that you reference in SPL instructions are not case-sensitive. For example, if a log contains a field named Method, you can specify method or METHOD in an SPL instruction to reference the field.
This behavior applies only to the scan-based query feature of Simple Log Service. For more information, see Scan-based query overview.
Examples
Specify a field name in the where instruction. The field name is not case-sensitive.
SPL statement
* | where METHOD like 'Post%'
Input data
Method: 'PostLogstoreLogs'
Output result
Method: 'PostLogstoreLogs'
Handle field name conflicts
During log upload or SPL execution, fields whose names differ only in case may conflict. For example, if a raw log contains both the Method and method fields, a conflict exists. SPL resolves field name conflicts in different ways in different scenarios.
To prevent the following conflicts, we recommend that you eliminate duplicate fields in raw logs.
Conflicts in input data
A raw log can contain fields whose names differ only in case. For example, if a log contains both the Status and status fields, SPL randomly selects one of the fields as input and discards the other. Example:
SPL statement
* | extend status_cast = cast(status as bigint)
Input data
Status: '200' status: '404'
Output result
Possibility 1: The value of the Status field is retained.
Status: '200' -- The first field is retained, and the second field is discarded. status_cast: '200'
Possibility 2: The value of the status field is retained.
status: '404' -- The second field is retained, and the first field is discarded. status_cast: '404'
Conflicts in output results
Scenario 1: Existing data fields conflict.
During SPL execution, fields whose names differ only in case may be generated. In this case, SPL randomly selects one of the fields as output. For example, a log contains a field whose value is a JSON string. During the execution of the parse-json instruction, fields whose names differ only in case may be extracted. Example:
SPL statement
* | parse-json content
Input data
content: '{"Method": "PostLogs", "method": "GetLogs", "status": "200"}'
Output result
Possibility 1: The Method field is retained.
content: '{"Method": "PostLogs", "method": "GetLogs", "status": "200"}' Method: 'PostLogs' -- The Method field is retained. status: '200'
Possibility 2: The method field is retained.
content: '{"Method": "PostLogs", "method": "GetLogs", "status": "200"}' method: 'GetLogs' -- The method field is retained. status: '200'
Scenario 2: New data fields conflict.
SPL retains the capitalization of the names of new data fields that are generated during SPL execution, including the field names that are specified in the extend instruction and the field names that follow the as keyword in the parse-regexp and parse-csv instructions. This helps prevent conflicts.
For example, if you specify the Method field name in the extend instruction, the field name remains Method in the output result.
SPL statement
* | extend Method = 'Post'
Input data
Status: '200'
Output result
Status: '200' Method: 'Post'
Handle reserved field conflicts
This behavior applies to the real-time consumption and scan-based query features of Simple Log Service.
For more information about the reserved fields of Simple Log Service, see Reserved fields. SPL reads data from a LogGroup of Simple Log Service as input. For more information about the definition of a LogGroup, see Data encoding. If the raw data that is written to Simple Log Service does not comply with the encoding specifications of a LogGroup, or if reserved fields are included in LogContent instead of LogGroup, SPL reads the reserved fields based on the following policies:
For the __source__, __topic__, __time__, and __time_ns_part__ fields, SPL reads field values from LogGroup and ignores the values of duplicate fields in LogContent.
For tag fields whose names are prefixed with __tag__:, SPL preferentially reads field values from LogGroup. If no field values are found, SPL reads field values from LogContent. For example, for the __tag__:ip field, SPL preferentially reads the field whose key is ip from the LogTag list. If no value is found, SPL reads the field whose key is __tag__:ip from the custom fields in LogContent.
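For illustration, a tag field that SPL reads in this way can be referenced like any other field. Because the field name contains a colon, it must be enclosed in double quotation marks, as described in the preceding quoting rule. The IP address value is illustrative:
* | where "__tag__:ip" = '192.0.2.1'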
Full-text field of __line__
This field applies only to the scan-based query feature.
If you want to filter raw logs in the Simple Log Service console or by calling the GetLogstoreLogs operation, you can use the __line__ field to match against the full text of a log.
Examples
Search for logs by using the keyword error.
* | where __line__ like '%error%'
If a log contains a field named __line__, you must enclose the field name in grave accents (`), as in `__line__`, to reference the field itself instead of performing a full-text match.
* | where `__line__` ='20'
Retain and overwrite old and new values
If an output field is named the same as an existing field in the input data after an SPL instruction is executed, the value of the field is determined based on the following policies.
The following policies do not apply to the extend instruction. If you use the extend instruction and an output field is named the same as an input field, the new field value is used.
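The following minimal sketch shows the extend exception; the field name and values are illustrative:
SPL statement
* | extend status='500' -- The status field already exists. The extend instruction always uses the new value.
Input data
status: '200'
Output result
status: '500'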
Inconsistent data types between old and new values
The value of the input field is retained.
Examples
Example 1: After the project instruction is executed, the output field is named the same as an input field.
SPL statement
* | extend status=cast(status as BIGINT) -- Convert the data type of the status field into BIGINT. | project code=status -- The old data type of the code field is VARCHAR, and the new data type of the field is BIGINT. Retain the old value of the field.
Input data
status: '200' code: 'Success'
Output result
code: 'Success'
Example 2: After the parse-json instruction is executed, an extracted field is named the same as an input field.
SPL statement
* | extend status=cast(status as BIGINT) -- Convert the data type of the status field into BIGINT. | parse-json content -- The old data type of the status field is BIGINT, and the new data type of the field is VARCHAR. Retain the old value of the field.
Input data
status: '200' content: '{"status": "Success", "body": "this is test"}'
Output result
content: '{"status": "Success", "body": "this is test"}' status: 200 body: 'this is test'
Consistent data types between old and new values
If an output field is named the same as an input field whose value is null, the new field value is used. If the input field value is not null, the value of the output field is determined by the mode parameter of the executed instruction.
If the mode parameter is not included in your instruction, the default value overwrite is used.
| Value of mode | Description |
| --- | --- |
| overwrite | Overwrites the old value with the new value. |
| preserve | Retains the old value and ignores the new value. |
Examples
Example 1: After the project instruction is executed, an output field is named the same as an input field. The two fields are of the same data type, and the default value of the mode parameter is used. The default value is overwrite.
SPL statement
* | project code=status -- The old and new data types of the code field are VARCHAR. In overwrite mode, the new value is used.
Input data
status: '200' code: 'Success'
Output result
code: '200'
Example 2: After the parse-json instruction is executed, an extracted output field is named the same as an input field. The two fields are of the same data type, and the default value of the mode parameter is used. The default value is overwrite.
SPL statement
* | parse-json content -- The old and new data types of the status field are VARCHAR. In overwrite mode, the new value is used.
Input data
status: '200' content: '{"status": "Success", "body": "this is test"}'
Output result
content: '{"status": "Success", "body": "this is test"}' status: 'Success' body: 'this is test'
Example 3: After the parse-json instruction is executed, an extracted output field is named the same as an input field. The two fields are of the same data type, and the mode parameter is set to preserve.
SPL statement
* | parse-json -mode='preserve' content -- The old and new data types of the status field are VARCHAR. In preserve mode, the old value is used.
Input data
status: '200' content: '{"status": "Success", "body": "this is test"}'
Output result
content: '{"status": "Success", "body": "this is test"}' status: '200' body: 'this is test'
Convert data types
Initial type
When you use SPL, the initial type of all input fields is VARCHAR, except for the time fields of logs. If strongly typed data is involved in subsequent processing, data type conversion is required.
Examples
If you want to filter access logs by using the status code 5xx, you must convert the data type of the status field into BIGINT and then compare data.
* -- The initial type of the status field is VARCHAR.
| where cast(status as BIGINT) >= 500 -- Convert the data type of the status field into BIGINT and then compare data.
Type retention
If you use the extend instruction to convert the data type of a field during SPL execution, the new data type is used in subsequent processing.
Examples
* -- A Logstore is used as the input data. The initial type of all input fields is VARCHAR, except for the time fields.
| where __source__='127.0.0.1' -- Use the __source__ field for filtering.
| extend status=cast(status as BIGINT) -- Convert the data type of the status field into BIGINT.
| project status, content
| where status>=500 -- The data type of the status field is retained as BIGINT. The field value can be directly compared with the number 500.
Process null values for SPL expressions
Generate null values
During SPL execution, null values are generated in the following scenarios:
If a field specified in an SPL expression does not exist in the input data, the system considers the value of the field as null for calculation.
If an error occurs during the calculation of an SPL expression, the system returns null as the calculation result. Such errors include a data type conversion failure in the cast function and an array index out of bounds error.
Examples
If a field does not exist, the value null is used for calculation.
SPL statement
* | extend withoutStatus=(status is null)
Input data
# Entry 1
status: '200'
code: 'Success'
# Entry 2
code: 'Success'
Output result
# Entry 1
status: '200'
code: 'Success'
withoutStatus: false
# Entry 2
code: 'Success'
withoutStatus: true
If an error occurs during the calculation process, the calculation result is null.
SPL statement
* | extend code=cast(code as BIGINT) -- The value of the code field cannot be converted into BIGINT. | extend values=json_parse(values) | extend values=cast(values as ARRAY(BIGINT)) | extend last=values[10] -- An Array Index Out Of Bounds error occurs.
Input data
status: '200' code: 'Success' values: '[1,2,3]'
Output result
status: '200' code: null values: [1, 2, 3] last: null
Eliminate null values
To eliminate null values during calculation, you can use the COALESCE expression to evaluate multiple expressions in priority order and use the first non-null value as the final calculation result. If all expressions return null, you can specify a default value as the final calculation result.
Examples
Read the last element of an array. If the array is empty, the default value 0 is returned.
SPL statement
* | extend values=json_parse(values) | extend values=cast(values as ARRAY(BIGINT)) | extend last=COALESCE(values[3], values[2], values[1], 0)
Input data
# Entry 1 values: '[1, 2, 3]' # Entry 2 values: '[]'
Output result
# Entry 1 values: [1, 2, 3] last: 3 # Entry 2 values: [] last: 0
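As an alternative sketch, assuming that the Presto element_at function is available in your environment, a negative index reads the last element without hard-coding the array length, and the function returns null instead of raising an out-of-bounds error when the array is empty:
SPL statement
* | extend values=cast(json_parse(values) as ARRAY(BIGINT)) | extend last=COALESCE(element_at(values, -1), 0) -- element_at(values, -1) returns the last element, or null for an empty array.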
Handle errors
Syntax error
An SPL syntax error occurs when an SPL statement does not conform to SPL syntax. Examples include an invalid instruction name, an incorrectly quoted keyword, and an invalid type. If a syntax error occurs, SPL does not process data. You must fix the error based on the reported error message.
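For example, the following hypothetical statement is rejected with a syntax error because projection is not a valid SPL instruction name; no data is processed until the statement is corrected:
* | projection status -- Syntax error: invalid instruction name.
* | project status -- Corrected statement.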
Data error
If an error occurs in a function or conversion during SPL execution, SPL sets the related result fields to null and reports a data error. A data error may occur in each row of data, and SPL randomly samples and returns specific errors.
Data errors do not affect the overall execution of an SPL statement. The statement still returns processing results, and the values of the fields with errors are null. You can ignore data errors or modify your SPL statement based on the actual situation.
Timeout error
An SPL statement contains different instructions, which consume different time in specific data scenarios. If the entire execution time of an SPL statement exceeds the default timeout period, the system stops the execution of the SPL statement and returns a timeout error. In this case, the execution result of the SPL statement is empty. The default timeout period may vary in scan-based query, real-time consumption, and Logtail collection scenarios.
If a timeout error occurs, we recommend that you modify your SPL statement to reduce its complexity and the number of pipelines. For example, you can simplify the regular expressions that are used in the statement.
Out-of-memory error
An SPL statement contains different instructions, which consume different memory resources in specific data scenarios. If the amount of memory resources consumed by an SPL statement exceeds the default memory quota, the system stops the execution of the SPL statement and returns an out-of-memory error. In this case, the execution result of the SPL statement is empty. The default memory quota may vary in scan-based query, real-time consumption, and Logtail collection scenarios.
If an out-of-memory error occurs, we recommend that you modify your SPL statement to reduce the complexity of the statement and the number of pipelines and check whether the volume of raw data is excessively large.