This topic describes the Simple Log Service Processing Language (SPL) instructions.
Parameter type description
The table below details the data types for parameters in SPL instructions.
| Parameter type | Description |
| --- | --- |
| Bool | The parameter specifies a Boolean value. This type of parameter is a switch in SPL instructions. |
| Char | The parameter specifies an ASCII character. You must use single quotation marks ('') to enclose the character. |
| Integer | The parameter specifies an integer value. |
| String | The parameter specifies a string. You must use single quotation marks ('') to enclose the string. |
| RegExp | The parameter specifies a regular expression. The RE2 syntax is supported. You must use single quotation marks ('') to enclose the regular expression. For more information, see Syntax. |
| JSONPath | The parameter specifies a JSON path. You must use single quotation marks ('') to enclose the JSON path. For more information, see JsonPath. |
| Field | The parameter specifies a field name. If the field name contains characters other than letters, digits, and underscores, you must use double quotation marks ("") to enclose the field name. **Note**: For more information about the case sensitivity of field names, see SPL functionality definitions in different scenarios. |
| FieldPattern | The parameter specifies a field name or a combination of a field name and a wildcard character. An asterisk (*) can be used as a wildcard character, which matches zero or more characters. You must use double quotation marks ("") to enclose the field pattern. **Note**: For more information about the case sensitivity of field names, see SPL functionality definitions in different scenarios. |
| SPLExp | The parameter specifies an SPL expression. |
| SQLExp | The parameter specifies an SQL expression. |
List of SPL instructions
| Instruction category | Instruction name | Description |
| --- | --- | --- |
| Field processing instructions | project | This instruction retains the fields that match the specified pattern and renames the specified fields. During instruction execution, all retain-related expressions are executed before rename-related expressions. |
| | project-away | This instruction removes the fields that match the specified pattern and retains all other fields as they are. |
| | project-rename | This instruction renames the specified fields and retains all other fields as they are. |
| | expand-values | This instruction expands the first-layer JSON object of the specified field and generates multiple results. |
| SQL calculation instructions on structured data | extend | This instruction creates fields based on the result of SQL expression-based data calculation. For more information about the supported SQL functions, see List of SQL functions supported by SPL. |
| | where | This instruction filters data based on the result of SQL expression-based data calculation. Data that matches the specified SQL expression is retained. For more information about the supported SQL functions, see List of SQL functions supported by SPL. |
| Extraction instructions on semi-structured data | parse-regexp | This instruction extracts the information that matches groups in the specified regular expression from the specified field. |
| | parse-csv | This instruction extracts information in the CSV format from the specified field. |
| | parse-json | This instruction extracts the first-layer JSON information from the specified field. |
| | parse-kv | This instruction extracts key-value pair information from the specified field. |
Field processing instructions
project
The project instruction retains fields that match a specified pattern and renames designated fields. Retain-related expressions are executed prior to rename-related expressions during the execution of the instruction.
By default, the time fields __time__ and __time_ns_part__ are preserved and cannot be renamed or overwritten. For more information, see Time fields.
Syntax
| project -wildcard <field-pattern>, <output>=<field>, ...
Parameter description
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| wildcard | Bool | No | Specifies whether to enable the wildcard match mode. By default, the exact match mode is used. If you want to enable the wildcard match mode, you must configure this parameter. |
| field-pattern | FieldPattern | Yes | The name of the field to retain, or a combination of a field name and a wildcard character. All matched fields are processed. |
| output | Field | Yes | The new name of the field to rename. You cannot rename multiple fields to the same name. **Important**: If the new field name is the same as an existing field name in the input data, see Retention and overwrite of old and new values. |
| field | Field | Yes | The original name of the field to rename. |
Sample statement
Example 1: Retain a field.
* | project level, err_msg
Example 2: Rename a field.
* | project log_level=level, err_msg
Example 3: Retain the field that exactly matches __tag__:*.
* | project "__tag__:*"
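Example 4: Retain all fields that match a wildcard pattern. This is an illustrative sketch that assumes the input data contains several fields whose names start with __tag__:; enabling the -wildcard switch makes the asterisk match any characters.
* | project -wildcard "__tag__:*"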
project-away
The project-away instruction removes fields that match a specified pattern, keeping all other fields unchanged.
By default, the time fields __time__ and __time_ns_part__ are retained. For more information, see Time fields.
Syntax
| project-away -wildcard <field-pattern>, ...
Parameter description
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| wildcard | Bool | No | Specifies whether to enable the wildcard match mode. By default, the exact match mode is used. If you want to enable the wildcard match mode, you must configure this parameter. |
| field-pattern | FieldPattern | Yes | The name of the field to remove, or a combination of a field name and a wildcard character. All matched fields are processed. |
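Sample statement
The following examples are illustrative sketches; the field names level and err_msg and the __tag__:* pattern are assumed sample fields.
Example 1: Remove specific fields.
* | project-away level, err_msg
Example 2: Remove all fields that match a wildcard pattern.
* | project-away -wildcard "__tag__:*"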
project-rename
The project-rename instruction renames specified fields while retaining all others as is.
By default, the time fields __time__ and __time_ns_part__ are preserved and cannot be renamed or overwritten. For more information, see Time fields.
Syntax
| project-rename <output>=<field>, ...
Parameter description
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| output | Field | Yes | The new name of the field to rename. You cannot rename multiple fields to the same name. **Important**: If the new field name is the same as an existing field name in the input data, see Retention and overwrite of old and new values. |
| field | Field | Yes | The original name of the field to rename. |
Example
Rename the specified fields.
* | project-rename log_level=level, log_err_msg=err_msg
expand-values
This instruction expands the first-layer JSON object in a specified field, generating multiple results.
The output field data type is VARCHAR. If the new field name conflicts with an existing field name in the input data, refer to Retention and overwrite of old and new values.
You cannot perform operations on the time fields __time__ and __time_ns_part__. For more information, see Time fields.
Supported scenarios include data manipulation (new version). For SPL functionality definitions in different scenarios, see SPL functionality definitions in different scenarios.
Syntax
| expand-values -path=<path> -limit=<limit> -keep <field> as <output>
Parameter description
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| path | JSONPath | No | The JSON path in the specified field. The JSON path is used to locate the information that you want to extract. The default value is an empty string. If you use the default value, the complete data of the specified field is extracted. |
| limit | Integer | No | The maximum number of entries that can be expanded from each piece of raw data. The value is an integer from 1 to 10. The default value is 10. |
| keep | Bool | No | Specifies whether to retain the original field after expansion. By default, the original field is not retained. If you want to retain the original field, you must enable this switch. |
| field | Field | Yes | The original name of the field to expand. The data type must be |
| output | Field | No | The name of the field to create. If you do not specify this parameter, the output result is written to the input field by default. The expansion logic for the original content is as follows. JSON array: the array is expanded based on its elements. JSON dictionary: the dictionary is expanded based on its key-value pairs. Other JSON types: the initial value is returned. Invalid JSON: |
Sample statement
Example 1: Expand an array, outputting multiple result sets.
SPL statement
* | expand-values y
Input data
x: 'abc'
y: '[0,1,2]'
Output data: The array is expanded into three data sets.
# Entry 1
x: 'abc'
y: '0'
# Entry 2
x: 'abc'
y: '1'
# Entry 3
x: 'abc'
y: '2'
Example 2: Expand a dictionary, outputting multiple result sets.
SPL statement
* | expand-values y
Input data
x: 'abc' y: '{"a": 1, "b": 2}'
Output data: The dictionary is expanded into two data sets.
# Entry 1
x: 'abc'
y: '{"a": 1}'
# Entry 2
x: 'abc'
y: '{"b": 2}'
Example 3: Expand content under a specified JSON path, outputting results to a new field.
SPL statement
* | expand-values -keep content -path='$.body' as body
Input data
content: '{"body": [0, {"a": 1, "b": 2}]}'
Output data: The content is expanded into two data sets.
# Entry 1
content: '{"body": [0, {"a": 1, "b": 2}]}'
body: '0'
# Entry 2
content: '{"body": [0, {"a": 1, "b": 2}]}'
body: '{"a": 1, "b": 2}'
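Example 4: Limit the number of expanded entries. This is an illustrative sketch that assumes -limit keeps only the leading entries of the expansion, in line with the parameter description above.
SPL statement
* | expand-values -limit=2 y
Input data
x: 'abc'
y: '[0,1,2,3]'
Output data: Only the first two elements are expanded because -limit=2.
# Entry 1
x: 'abc'
y: '0'
# Entry 2
x: 'abc'
y: '1'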
SQL calculation instructions on structured data
extend
This instruction creates fields based on SQL expression-based data calculations. For a list of supported SQL functions, see List of SQL functions supported by SPL.
Syntax
| extend <output>=<sql-expr>, ...
Parameter description
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| output | Field | Yes | The name of the field to create. You cannot use the same field name to store the results of multiple expressions. **Important**: If the new field name is the same as an existing field name in the input data, the new field overwrites the existing field based on the data type and value. |
| sql-expr | SQLExp | Yes | The data processing expression. **Important**: For more information about null value processing, see Null value processing in SPL expressions. |
Sample statement
Example 1: Apply a computation expression.
* | extend Duration = EndTime - StartTime
Example 2: Utilize a regular expression.
* | extend server_protocol_version=regexp_extract(server_protocol, '\d+')
Example 3: Extract JSONPath content and convert a field's data type.
SPL statement
* | extend a=json_extract(content, '$.body.a'), b=json_extract(content, '$.body.b') | extend b=cast(b as BIGINT)
Input data
content: '{"body": {"a": 1, "b": 2}}'
Output results
content: '{"body": {"a": 1, "b": 2}}' a: '1' b: 2
where
This instruction filters data based on SQL expression-based calculations, retaining data that matches the specified SQL expression. For a list of supported SQL functions, see List of SQL functions supported by SPL.
Syntax
| where <sql-expr>
Parameter description
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| sql-expr | SQLExp | Yes | The SQL expression. Data that matches this expression is retained. **Important**: For more information about null value processing in SQL expressions, see Null value processing in SPL expressions. |
Sample statement
Example 1: Filter data based on field content.
* | where userId='123'
Example 2: Filter data by matching a field value against a regular expression.
* | where regexp_like(server_protocol, '\d+')
Example 3: Convert a field's data type to match all server error data.
* | where cast(status as BIGINT) >= 500
Extraction instructions on semi-structured data
parse-regexp
This instruction extracts information matching groups in a specified regular expression from a field.
The output field data type is VARCHAR. If the new field name conflicts with an existing field name in the input data, refer to Retention and overwrite of old and new values.
Operations on time fields __time__ and __time_ns_part__ are not permitted. For more information, see Time fields.
Syntax
| parse-regexp <field>, <pattern> as <output>, ...
Parameter description
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| field | Field | Yes | The original name of the field from which you want to extract information. Make sure that this field is included in the input data, the data type is |
| pattern | RegExp | Yes | The regular expression. The RE2 syntax is supported. |
| output | Field | No | The name of the output field that stores the result extracted by the regular expression. |
Sample statement
Example 1: Employ exploratory match mode.
SPL statement
* | parse-regexp content, '(\S+)' as ip -- Generate the ip: 10.0.0.0 field.
| parse-regexp content, '\S+\s+(\w+)' as method -- Generate the method: GET field.
Input data
content: '10.0.0.0 GET /index.html 15824 0.043'
Output results
content: '10.0.0.0 GET /index.html 15824 0.043'
ip: '10.0.0.0'
method: 'GET'
Example 2: Utilize full pattern match mode with unnamed capturing groups.
SPL statement
* | parse-regexp content, '(\S+)\s+(\w+)' as ip, method
Input data
content: '10.0.0.0 GET /index.html 15824 0.043'
Output results
content: '10.0.0.0 GET /index.html 15824 0.043'
ip: '10.0.0.0'
method: 'GET'
parse-csv
This instruction extracts CSV-formatted information from a specified field.
The output field data type is VARCHAR. If the new field name conflicts with an existing field name in the input data, refer to Retention and overwrite of old and new values.
Operations on time fields __time__ and __time_ns_part__ are not permitted. For more information, see Time fields.
Syntax
| parse-csv -delim=<delim> -quote=<quote> -strict <field> as <output>, ...
Parameter description
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| delim | String | No | The delimiter of the data content. The delimiter can be one to three valid ASCII characters. You can use escape characters to indicate special characters. For example, \t indicates the tab character, \11 indicates the ASCII character whose serial number corresponds to the octal number 11, and \x09 indicates the ASCII character whose serial number corresponds to the hexadecimal number 09. You can also use a combination of multiple characters as the delimiter. The default value is a comma (,). |
| quote | Char | No | The quote of the data content. The quote is a single valid ASCII character and is used when the data content contains a delimiter. For example, you can specify double quotation marks (""), single quotation marks (''), or an unprintable character (0x01). By default, no quote is used. **Important**: This parameter takes effect only if you set the delim parameter to a single character. You must specify different values for the quote and delim parameters. |
| strict | Bool | No | Specifies whether to enable strict pairing when the number of values in the data content is different from the number of fields specified in the output parameter. Default value: False. If you want to enable strict pairing, you must configure this parameter. |
| field | Field | Yes | The name of the field that you want to parse. Make sure that the data content includes this field, the type must be |
| output | Field | Yes | The name of the field that you want to use to store the parsing result of the input data. |
Sample statement
Example 1: Match data in simple mode.
SPL statement
* | parse-csv content as x, y, z
Input data
content: 'a,b,c'
Output results
content: 'a,b,c'
x: 'a'
y: 'b'
z: 'c'
Example 2: Use double quotes as the quote character to match data containing special characters.
SPL statement
* | parse-csv -quote='"' content as ip, time, host
Input data
content: '192.168.0.100,"10/Jun/2019:11:32:16,127 +0800",example.aliyundoc.com'
Output results
content: '192.168.0.100,"10/Jun/2019:11:32:16,127 +0800",example.aliyundoc.com'
ip: '192.168.0.100'
time: '10/Jun/2019:11:32:16,127 +0800'
host: 'example.aliyundoc.com'
Example 3: Employ a combination of multiple characters as the separator.
SPL statement
* | parse-csv -delim='||' content as time, ip, req
Input data
content: '05/May/2022:13:30:28||127.0.0.1||POST /put?a=1&b=2'
Output results
content: '05/May/2022:13:30:28||127.0.0.1||POST /put?a=1&b=2'
time: '05/May/2022:13:30:28'
ip: '127.0.0.1'
req: 'POST /put?a=1&b=2'
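Example 4: Use an escape sequence as the delimiter. This is an illustrative sketch that assumes the content field holds tab-separated values; \t in the data below denotes a literal tab character, as described for the delim parameter.
SPL statement
* | parse-csv -delim='\t' content as x, y, z
Input data
content: 'a\tb\tc'
Output results
content: 'a\tb\tc'
x: 'a'
y: 'b'
z: 'c'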
parse-json
This instruction extracts first-layer JSON information from a specified field.
The output field data type is VARCHAR. If the new field name conflicts with an existing field name in the input data, refer to Retention and overwrite of old and new values.
Operations on time fields __time__ and __time_ns_part__ are not permitted. For more information, see Time fields.
Syntax
| parse-json -mode=<mode> -path=<path> -prefix=<prefix> <field>
Parameter description
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| mode | String | No | The mode that is used to extract information when the name of the output field is the same as an existing field name in the input data. The default value is overwrite. |
| path | JSONPath | No | The JSON path in the specified field. The JSON path is used to locate the information that you want to extract. The default value is an empty string. If you use the default value, the complete data of the specified field is extracted. |
| prefix | String | No | The prefix of the fields that are generated by expanding a JSON structure. The default value is an empty string. |
| field | Field | Yes | The name of the field that you want to parse. Make sure that this field is included in the input data and the field value is a non-null value that meets one of the following conditions. Otherwise, the extraction is not performed. |
Sample statement
Example 1: Extract all keys and values from the 'y' field.
SPL statement
* | parse-json y
Input data
x: '0'
y: '{"a": 1, "b": 2}'
Output results
x: '0'
y: '{"a": 1, "b": 2}'
a: '1'
b: '2'
Example 2: Extract the 'body' key's value from the 'content' field as separate fields.
SPL statement
* | parse-json -path='$.body' content
Input data
content: '{"body": {"a": 1, "b": 2}}'
Output results
content: '{"body": {"a": 1, "b": 2}}' a: '1' b: '2'
Example 3: Extract information in preserve mode, retaining the original value for existing fields.
SPL statement
* | parse-json -mode='preserve' y
Input data
a: 'xyz'
x: '0'
y: '{"a": 1, "b": 2}'
Output results
x: '0'
y: '{"a": 1, "b": 2}'
a: 'xyz'
b: '2'
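Example 4: Add a prefix to the generated fields. This is an illustrative sketch based on the -prefix parameter; the prefix value y_ is an assumed sample.
SPL statement
* | parse-json -prefix='y_' y
Input data
x: '0'
y: '{"a": 1, "b": 2}'
Output results
x: '0'
y: '{"a": 1, "b": 2}'
y_a: '1'
y_b: '2'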
parse-kv
This instruction extracts key-value pair information from a specified field.
The output field data type is VARCHAR. If the new field name conflicts with an existing field name in the input data, refer to Retention and overwrite of old and new values.
Operations on time fields __time__ and __time_ns_part__ are not permitted. For more information, see Time fields.
Syntax
| parse-kv -mode=<mode> -prefix=<prefix> -regexp <field>, <pattern>
Parameter description
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| mode | String | No | The data overwrite mode that is used when the new field name is the same as an existing field name in the input data. The default value is overwrite. For more information, see the field value overwrite mode. |
| prefix | String | No | The prefix of the output field name. The default value is an empty string. |
| regexp | Bool | Yes | Enables the regular expression extraction mode. |
| field | Field | Yes | The original name of the field from which you want to extract information. Make sure that this field is included in the input data, the data type is |
| pattern | RegExp | Yes | The regular expression, which must contain two capturing groups: the first capturing group extracts the field name, and the second capturing group extracts the field value. The RE2 syntax is supported. |
Sample statement
Example 1: In regular extraction mode, process complex delimiters between key-value pairs and separators between keys and values.
SPL statement
* | parse-kv -regexp content, '([^&?]+)(?:=|:)([^&?]+)'
Input data
content: 'k1=v1&k2=v2?k3:v3'
k1: 'xyz'
Output data
content: 'k1=v1&k2=v2?k3:v3'
k1: 'v1'
k2: 'v2'
k3: 'v3'
Example 2: In regular extraction mode, extract information in preserve mode, retaining the original value for existing fields.
SPL statement
* | parse-kv -regexp -mode='preserve' content, '([^&?]+)(?:=|:)([^&?]+)'
Input data
content: 'k1=v1&k2=v2?k3:v3'
k1: 'xyz'
Output results
content: 'k1=v1&k2=v2?k3:v3'
k1: 'xyz'
k2: 'v2'
k3: 'v3'
Example 3: In regular extraction mode, handle complex unstructured data where the value is a number or a string enclosed in double quotes.
SPL statement
* | parse-kv -regexp content, '([\w\-]+)="?([^"]*)"?'
Input data
content: 'verb="GET" URI="/healthz" latency="45.911µs" userAgent="kube-probe/1.30+" audit-ID="" srcIP="192.168.123.45:40092" contentType="text/plain; charset=utf-8" resp=200'
Output results
content: 'verb="GET" URI="/healthz" latency="45.911µs" userAgent="kube-probe/1.30+" audit-ID="" srcIP="192.168.123.45:40092" contentType="text/plain; charset=utf-8" resp=200' verb: 'GET' URI: '/healthz' latency: '45.911µs' userAgent: 'kube-probe/1.30+' audit-ID: '' srcIP: '192.168.123.45:40092' contentType: 'text/plain; charset=utf-8' resp: '200'