This topic describes the usage of Simple Log Service Processing Language (SPL) in different scenarios.
SPL in different scenarios
The capabilities of SPL vary based on the scenario in which SPL is used.
| SPL entry point | Use of Logstore index-based filtering results as input | Case sensitivity of field names | Full-text field of __line__ |
| --- | --- | --- | --- |
| Real-time consumption | Not supported. If you use an asterisk (*), all data is used as input. | Case-sensitive. | Not supported. |
| Scan-based query | Supported. You can perform index-based filtering and then use SPL to process the filtering results. | Not case-sensitive. | Supported. |
| Logtail collection (SPL is specified when you create Logtail configurations) | Not supported. If you use an asterisk (*), all collected data is used as input. | Case-sensitive. | Not supported. |
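For example, in a scan-based query, the part before the first pipe can be an index-based filter instead of an asterisk, and the SPL instructions that follow process only the filtering results. The following statement is a minimal sketch; the Status and Method fields are illustrative:
Status: 200 | where Method like 'Post%'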
Process special fields
Time fields
During SPL execution, the data type of a Simple Log Service time field is always INTEGER or BIGINT. The time fields of Simple Log Service are __time__ and __time_ns_part__. The __time__ field specifies a timestamp in seconds, and the __time_ns_part__ field specifies the nanosecond part of the time value.
If you want to update the time of data, you must use the extend instruction and make sure that the data type of the new time value is INTEGER or BIGINT. You cannot use other instructions to manage time fields. Other instructions handle the time fields in the following ways:
project, project-away, and project-rename: By default, time fields are retained. The fields cannot be renamed or overwritten.
parse-regexp and parse-json: If time fields are extracted, the fields are ignored.
Examples
Extract time fields from an existing time string.
SPL statement
* | parse-regexp time, '([\d\-\s:]+)\.(\d+)' as ts, ms | extend ts=date_parse(ts, '%Y-%m-%d %H:%i:%S') | extend __time__=cast(to_unixtime(ts) as INTEGER) | extend __time_ns_part__=cast(ms as INTEGER) * 1000000 | project-away ts, ms
Input data
time: '2023-11-11 01:23:45.678'
Output result
__time__: 1699637025 __time_ns_part__: 678000000 time: '2023-11-11 01:23:45.678'
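The following minimal sketch illustrates the retention rule for the project instruction: the time fields are kept even though they are not listed. The status field is illustrative.
SPL statement
* | project status
Input data
__time__: 1699637025 status: '200'
Output result
__time__: 1699637025 status: '200'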
Fields whose names contain special characters
If a log contains a field whose name includes spaces or special characters, you can enclose the field name in double quotation marks (") to reference the field. For example, if a log contains a field named A B, whose name contains a space, you can specify "A B" in an SPL statement to reference the field. Example:
* | where "A B" like '%error%'
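As a further sketch, assuming that double-quoted field names are also accepted in expression contexts such as the extend instruction, you can copy the field to a new field whose name does not require quoting. The copy field name is hypothetical:
SPL statement
* | extend copy="A B" -- Copy the value of the quoted field into a new field. The field names are hypothetical.
Input data
A B: 'this is an error log'
Output result
A B: 'this is an error log' copy: 'this is an error log'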
Fields whose names are not case-sensitive
If you use SPL to perform a scan-based query, the field names that you reference in SPL instructions are not case-sensitive. For example, if a log contains a field named Method, you can specify method or METHOD in an SPL instruction to reference the field.
This behavior applies only to the scan-based query feature of Simple Log Service. For more information, see Scan-based query overview.
Examples
Specify a field name in the where instruction. The field name is not case-sensitive.
SPL statement
* | where METHOD like 'Post%'
Input data
Method: 'PostLogstoreLogs'
Output result
Method: 'PostLogstoreLogs'
Handle field name conflicts
During log upload or SPL execution, fields whose names differ only in case may conflict. For example, if a raw log contains both the Method and method fields, a conflict exists. SPL resolves field name conflicts in different ways in different scenarios.
To prevent the following conflicts, we recommend that you eliminate duplicate fields in raw logs.
Conflicts in input data
A raw log can contain fields whose names differ only in case. For example, if a log contains both the Status and status fields, SPL randomly selects one of the fields as input and discards the other. Example:
SPL statement
* | extend status_cast = cast(status as bigint)
Input data
Status: '200' status: '404'
Output result
Possibility 1: The value of the Status field is retained.
Status: '200' -- The first field is retained, and the second field is discarded. status_cast: '200'
Possibility 2: The value of the status field is retained.
status: '404' -- The second field is retained, and the first field is discarded. status_cast: '404'
Conflicts in output results
Scenario 1: Existing data fields conflict.
During SPL execution, fields whose names differ only in case may be generated. In this case, SPL randomly selects one of the fields as output. For example, a log contains a field whose value is a JSON string. During the execution of the parse-json instruction, fields whose names differ only in case may be extracted. Example:
SPL statement
* | parse-json content
Input data
content: '{"Method": "PostLogs", "method": "GetLogs", "status": "200"}'
Output result
Possibility 1: The Method field is retained.
content: '{"Method": "PostLogs", "method": "GetLogs", "status": "200"}' Method: 'PostLogs' -- The Method field is retained. status: '200'
Possibility 2: The method field is retained.
content: '{"Method": "PostLogs", "method": "GetLogs", "status": "200"}' method: 'GetLogs' -- The method field is retained. status: '200'
Scenario 2: New data fields conflict.
SPL retains the capitalization of the names of new data fields that are generated during SPL execution, including the field names that are specified in the extend instruction and the field names that follow the as keyword in the parse-regexp and parse-csv instructions. This helps prevent conflicts.
For example, if you specify the Method field name in the extend instruction, the field name remains Method in the output result.
SPL statement
* | extend Method = 'Post'
Input data
Status: '200'
Output result
Status: '200' Method: 'Post'
Handle reserved field conflicts
This behavior applies to the real-time consumption and scan-based query features of Simple Log Service.
For more information about the reserved fields of Simple Log Service, see Reserved fields. SPL reads data from a LogGroup of Simple Log Service as input. For more information about the definition of a LogGroup, see Data encoding. If the raw data that is written to Simple Log Service does not comply with the encoding specifications of a LogGroup, or if reserved fields are included in LogContent instead of LogGroup, SPL reads the reserved fields based on the following policies:
For the __source__, __topic__, __time__, and __time_ns_part__ fields, SPL reads field values from LogGroup and ignores the values of duplicate fields in LogContent.
For tag fields whose names are prefixed with __tag__:, SPL preferentially reads field values from LogGroup. If no field values are found, SPL reads field values from LogContent. For example, for the __tag__:ip field, SPL preferentially reads the field whose key is ip from the LogTag list. If no value is found, SPL reads the field whose key is __tag__:ip from the custom fields in LogContent.
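For illustration, a tag field that SPL reads in this way can be referenced like any other field. Because the field name contains a colon, it must be enclosed in double quotation marks, as described in the preceding quoting rule. The IP address value is illustrative:
* | where "__tag__:ip" = '192.0.2.1'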
Full-text field of __line__
This field applies only to the scan-based query feature.
If you want to filter raw logs in the Simple Log Service console or by calling the GetLogstoreLogs operation, you can use the __line__ field to match against the full text of a log.
Examples
Search for logs by using the keyword error.
* | where __line__ like '%error%'
If a log contains a field named __line__, you must enclose the field name in grave accents (`), as in `__line__`, to reference the field itself instead of performing a full-text match.
* | where `__line__` ='20'
Retain and overwrite old and new values
If an output field is named the same as an existing field in the input data after an SPL instruction is executed, the value of the field is determined based on the following policies.
The following policies do not apply to the extend instruction. If you use the extend instruction and an output field is named the same as an input field, the new field value is used.
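The following minimal sketch shows the extend exception; the field name and values are illustrative:
SPL statement
* | extend status='500' -- The status field already exists. The extend instruction always uses the new value.
Input data
status: '200'
Output result
status: '500'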
Inconsistent data types between old and new values
The value of the input field is retained.
Examples
Example 1: After the project instruction is executed, the output field is named the same as an input field.
SPL statement
* | extend status=cast(status as BIGINT) -- Convert the data type of the status field into BIGINT. | project code=status -- The old data type of the code field is VARCHAR, and the new data type of the field is BIGINT. Retain the old value of the field.
Input data
status: '200' code: 'Success'
Output result
code: 'Success'
Example 2: After the parse-json instruction is executed, an extracted field is named the same as an input field.
SPL statement
* | extend status=cast(status as BIGINT) -- Convert the data type of the status field into BIGINT. | parse-json content -- The old data type of the status field is BIGINT, and the new data type of the field is VARCHAR. Retain the old value of the field.
Input data
status: '200' content: '{"status": "Success", "body": "this is test"}'
Output result
content: '{"status": "Success", "body": "this is test"}' status: 200 body: 'this is test'
Consistent data types between old and new values
If an output field is named the same as an input field whose value is null, the new field value is used. If the input field value is not null, the value of the output field is determined by the mode parameter of the executed instruction.
If the mode parameter is not included in your instruction, the default value overwrite is used.
| Value of mode | Description |
| --- | --- |
| overwrite | Overwrites the old value with the new value. |
| preserve | Retains the old value and ignores the new value. |
Examples
Example 1: After the project instruction is executed, an output field is named the same as an input field. The two fields are of the same data type, and the default value of the mode parameter is used. The default value is overwrite.
SPL statement
* | project code=status -- The old and new data types of the code field are VARCHAR. In overwrite mode, the new value is used.
Input data
status: '200' code: 'Success'
Output result
code: '200'
Example 2: After the parse-json instruction is executed, an extracted output field is named the same as an input field. The two fields are of the same data type, and the default value of the mode parameter is used. The default value is overwrite.
SPL statement
* | parse-json content -- The old and new data types of the status field are VARCHAR. In overwrite mode, the new value is used.
Input data
status: '200' content: '{"status": "Success", "body": "this is test"}'
Output result
content: '{"status": "Success", "body": "this is test"}' status: 'Success' body: 'this is test'
Example 3: After the parse-json instruction is executed, an extracted output field is named the same as an input field. The two fields are of the same data type, and the mode parameter is set to preserve.
SPL statement
* | parse-json -mode='preserve' content -- The old and new data types of the status field are VARCHAR. In preserve mode, the old value is used.
Input data
status: '200' content: '{"status": "Success", "body": "this is test"}'
Output result
content: '{"status": "Success", "body": "this is test"}' status: '200' body: 'this is test'
Convert data types
Initial type
When you use SPL, the initial type of all input fields is VARCHAR, except for the time fields of logs. If strongly typed data is involved in subsequent processing, data type conversion is required.
Examples
If you want to filter access logs by using the status code 5xx, you must convert the data type of the status field into BIGINT and then compare data.
* -- The initial type of the status field is VARCHAR.
| where cast(status as BIGINT) >= 500 -- Convert the data type of the status field into BIGINT and then compare data.
Type retention
If you use the extend instruction to convert the data type of a field during SPL execution, the new data type is used in subsequent processing.
Examples
* -- A Logstore is used as the input data. The initial type of all input fields is VARCHAR, except for the time fields.
| where __source__='127.0.0.1' -- Use the __source__ field for filtering.
| extend status=cast(status as BIGINT) -- Convert the data type of the status field into BIGINT.
| project status, content
| where status>=500 -- The data type of the status field is retained as BIGINT. The field value can be directly compared with the number 500.
Process null values for SPL expressions
Generate null values
During SPL execution, null values are generated in the following scenarios:
If a field specified in an SPL expression does not exist in the input data, the system considers the value of the field as null for calculation.
If an error occurs during the calculation of an SPL expression, the system returns null as the calculation result. Such errors include a data type conversion failure in the cast function and an array index out of bounds error.
Examples
If a field does not exist, the value null is used for calculation.
SPL statement
* | extend withoutStatus=(status is null)
Input data
# Entry 1
status: '200'
code: 'Success'
# Entry 2
code: 'Success'
Output result
# Entry 1
status: '200'
code: 'Success'
withoutStatus: false
# Entry 2
code: 'Success'
withoutStatus: true
If an error occurs during the calculation process, the calculation result is null.
SPL statement
* | extend code=cast(code as BIGINT) -- The value of the code field cannot be converted into BIGINT. | extend values=json_parse(values) | extend values=cast(values as ARRAY(BIGINT)) | extend last=values[10] -- An Array Index Out Of Bounds error occurs.
Input data
status: '200' code: 'Success' values: '[1,2,3]'
Output result
status: '200' code: null values: [1, 2, 3] last: null
Eliminate null values
To eliminate null values during calculation, you can use the COALESCE expression to evaluate multiple expressions in priority order and use the first non-null value as the final calculation result. If all expressions return null, you can specify a default value as the final calculation result.
Examples
Read the last element of an array. If the array is empty, the default value 0 is returned.
SPL statement
* | extend values=json_parse(values) | extend values=cast(values as ARRAY(BIGINT)) | extend last=COALESCE(values[3], values[2], values[1], 0)
Input data
# Entry 1 values: '[1, 2, 3]' # Entry 2 values: '[]'
Output result
# Entry 1 values: [1, 2, 3] last: 3 # Entry 2 values: [] last: 0
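As an alternative sketch, assuming that the Presto element_at function is available in your environment, a negative index reads the last element without hard-coding the array length, and the function returns null instead of raising an out-of-bounds error when the array is empty:
SPL statement
* | extend values=cast(json_parse(values) as ARRAY(BIGINT)) | extend last=COALESCE(element_at(values, -1), 0) -- element_at(values, -1) returns the last element, or null for an empty array.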
Handle errors
Syntax error
An SPL syntax error occurs when an SPL statement does not conform to SPL syntax. Examples include an invalid instruction name, an incorrectly quoted keyword, and an invalid type. If a syntax error occurs, SPL does not process data. You must fix the error based on the reported error message.
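For example, the following hypothetical statement is rejected with a syntax error because projection is not a valid SPL instruction name; no data is processed until the statement is corrected:
* | projection status -- Syntax error: invalid instruction name.
* | project status -- Corrected statement.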
Data error
If an error occurs in a function or conversion during SPL execution, SPL sets the related result fields to null and reports a data error. A data error may occur in each row of data, and SPL randomly samples and returns specific errors.
Data errors do not affect the overall execution of an SPL statement. The statement still returns processing results, and the values of the fields with errors are null. You can ignore data errors or modify your SPL statement based on the actual situation.
Timeout error
An SPL statement contains different instructions, which consume different time in specific data scenarios. If the entire execution time of an SPL statement exceeds the default timeout period, the system stops the execution of the SPL statement and returns a timeout error. In this case, the execution result of the SPL statement is empty. The default timeout period may vary in scan-based query, real-time consumption, and Logtail collection scenarios.
If a timeout error occurs, we recommend that you modify your SPL statement to reduce its complexity and the number of pipelines. For example, you can simplify the regular expressions that are used in the statement.
Out-of-memory error
An SPL statement contains different instructions, which consume different memory resources in specific data scenarios. If the amount of memory resources consumed by an SPL statement exceeds the default memory quota, the system stops the execution of the SPL statement and returns an out-of-memory error. In this case, the execution result of the SPL statement is empty. The default memory quota may vary in scan-based query, real-time consumption, and Logtail collection scenarios.
If an out-of-memory error occurs, we recommend that you modify your SPL statement to reduce the complexity of the statement and the number of pipelines and check whether the volume of raw data is excessively large.