This topic describes the Simple Log Service Processing Language (SPL) instructions.
Parameter type description
The table below details the data types for parameters in SPL instructions.
| Parameter type | Description |
| --- | --- |
| Bool | The parameter specifies a Boolean value. This type of parameter is a switch in SPL instructions. |
| Char | The parameter specifies an ASCII character. You must use single quotation marks ('') to enclose the character. |
| Integer | The parameter specifies an integer value. |
| String | The parameter specifies a string. You must use single quotation marks ('') to enclose the string. |
| RegExp | The parameter specifies a regular expression. The RE2 syntax is supported. You must use single quotation marks ('') to enclose the regular expression. For more information, see Syntax. |
| JSONPath | The parameter specifies a JSON path. You must use single quotation marks ('') to enclose the JSON path. For more information, see JsonPath. |
| Field | The parameter specifies a field name. If the field name contains characters other than letters, digits, and underscores, you must use double quotation marks ("") to enclose the field name. **Note**: For more information about the case sensitivity of field names, see SPL functionality definitions in different scenarios. |
| FieldPattern | The parameter specifies a field name or a combination of a field name and a wildcard character. An asterisk (*) can be used as a wildcard character, which matches zero or more characters. You must use double quotation marks ("") to enclose the field pattern. **Note**: For more information about the case sensitivity of field names, see SPL functionality definitions in different scenarios. |
| SPLExp | The parameter specifies an SPL expression. |
| SQLExp | The parameter specifies an SQL expression. |
List of SPL instructions
| Instruction category | Instruction name | Description |
| --- | --- | --- |
| Field processing instructions | project | This instruction retains the fields that match the specified pattern and renames the specified fields. During instruction execution, all retain-related expressions are executed before rename-related expressions. |
| | project-away | This instruction removes the fields that match the specified pattern and retains all other fields as they are. |
| | project-rename | This instruction renames the specified fields and retains all other fields as they are. |
| | expand-values | This instruction expands the first-layer JSON object of the specified field and generates multiple results. |
| SQL calculation instructions on structured data | extend | This instruction creates fields based on the result of SQL expression-based data calculation. For more information about the supported SQL functions, see List of SQL functions supported by SPL. |
| | where | This instruction filters data based on the result of SQL expression-based data calculation. Data that matches the specified SQL expression is retained. For more information about the supported SQL functions, see List of SQL functions supported by SPL. |
| Extraction instructions on semi-structured data | parse-regexp | This instruction extracts the information that matches groups in the specified regular expression from the specified field. |
| | parse-csv | This instruction extracts information in the CSV format from the specified field. |
| | parse-json | This instruction extracts the first-layer JSON information from the specified field. |
| | parse-kv | This instruction extracts key-value pair information from the specified field. |
Field processing instructions
project
The project instruction retains fields that match a specified pattern and renames designated fields. Retain-related expressions are executed prior to rename-related expressions during the execution of the instruction.
By default, the time fields __time__ and __time_ns_part__ are preserved and cannot be renamed or overwritten. For more information, see Time fields.
Syntax
| project -wildcard <field-pattern>, <output>=<field>, ...
Parameter description
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| wildcard | Bool | No | Specifies whether to enable the wildcard match mode. By default, the exact match mode is used. If you want to enable the wildcard match mode, you must configure this parameter. |
| field-pattern | FieldPattern | Yes | The name of the field to retain, or a combination of a field name and a wildcard character. All matched fields are processed. |
| output | Field | Yes | The new name of the field to rename. You cannot rename multiple fields to the same name. **Important**: If the new field name is the same as an existing field name in the input data, see Retention and overwrite of old and new values. |
| field | Field | Yes | The original name of the field to rename. |
Sample statement
Example 1: Retain a field.
* | project level, err_msg
Example 2: Rename a field.
* | project log_level=level, err_msg
Example 3: Retain the field that exactly matches __tag__:*.
* | project "__tag__:*"
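Example 4: Retain all fields that match a wildcard pattern. This is an illustrative sketch that assumes the input data contains several fields whose names start with __tag__:; enabling the -wildcard switch makes the asterisk match any characters.
* | project -wildcard "__tag__:*"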
project-away
The project-away instruction removes fields that match a specified pattern, keeping all other fields unchanged.
By default, the time fields __time__ and __time_ns_part__ are retained. For more information, see Time fields.
Syntax
| project-away -wildcard <field-pattern>, ...
Parameter description
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| wildcard | Bool | No | Specifies whether to enable the wildcard match mode. By default, the exact match mode is used. If you want to enable the wildcard match mode, you must configure this parameter. |
| field-pattern | FieldPattern | Yes | The name of the field to remove, or a combination of a field name and a wildcard character. All matched fields are processed. |
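Sample statement
The following examples are illustrative sketches; the field names level and err_msg and the __tag__:* pattern are assumed sample fields.
Example 1: Remove specific fields.
* | project-away level, err_msg
Example 2: Remove all fields that match a wildcard pattern.
* | project-away -wildcard "__tag__:*"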
project-rename
The project-rename instruction renames specified fields while retaining all others as is.
By default, the time fields __time__ and __time_ns_part__ are preserved and cannot be renamed or overwritten. For more information, see Time fields.
Syntax
| project-rename <output>=<field>, ...
Parameter description
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| output | Field | Yes | The new name of the field to rename. You cannot rename multiple fields to the same name. **Important**: If the new field name is the same as an existing field name in the input data, see Retention and overwrite of old and new values. |
| field | Field | Yes | The original name of the field to rename. |
Example
Rename the specified fields.
* | project-rename log_level=level, log_err_msg=err_msg
expand-values
This instruction expands the first-layer JSON object in a specified field, generating multiple results.
The output field data type is VARCHAR. If the new field name conflicts with an existing field name in the input data, refer to Retention and overwrite of old and new values.
You cannot perform operations on the time fields __time__ and __time_ns_part__. For more information, see Time fields.
Supported scenarios include data manipulation (new version). For SPL functionality definitions in different scenarios, see SPL functionality definitions in different scenarios.
Syntax
| expand-values -path=<path> -limit=<limit> -keep <field> as <output>
Parameter description
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| path | JSONPath | No | The JSON path in the specified field. The JSON path is used to locate the information that you want to extract. The default value is an empty string. If you use the default value, the complete data of the specified field is extracted. |
| limit | Integer | No | The maximum number of entries that can be expanded from each piece of raw data. The value is an integer from 1 to 10. The default value is 10. |
| keep | Bool | No | Specifies whether to retain the original field after expansion. By default, the original field is not retained. If you want to retain the original field, you must enable this switch. |
| field | Field | Yes | The original name of the field to expand. The data type must be |
| output | Field | No | The name of the field to create. If you do not specify this parameter, the output result is written to the input field by default. The expansion logic for the original content is as follows. JSON array: the array is expanded based on its elements. JSON dictionary: the dictionary is expanded based on its key-value pairs. Other JSON types: the initial value is returned. Invalid JSON: |
Sample statement
Example 1: Expand an array, outputting multiple result sets.
SPL statement
* | expand-values y
Input data
x: 'abc'
y: '[0,1,2]'
Output data: The array is expanded into three data sets.
# Entry 1
x: 'abc'
y: '0'
# Entry 2
x: 'abc'
y: '1'
# Entry 3
x: 'abc'
y: '2'
Example 2: Expand a dictionary, outputting multiple result sets.
SPL statement
* | expand-values y
Input data
x: 'abc' y: '{"a": 1, "b": 2}'
Output data: The dictionary is expanded into two data sets.
# Entry 1
x: 'abc'
y: '{"a": 1}'
# Entry 2
x: 'abc'
y: '{"b": 2}'
Example 3: Expand content under a specified JSON path, outputting results to a new field.
SPL statement
* | expand-values -keep content -path='$.body' as body
Input data
content: '{"body": [0, {"a": 1, "b": 2}]}'
Output data: The content is expanded into two data sets.
# Entry 1
content: '{"body": [0, {"a": 1, "b": 2}]}'
body: '0'
# Entry 2
content: '{"body": [0, {"a": 1, "b": 2}]}'
body: '{"a": 1, "b": 2}'
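Example 4: Limit the number of expanded entries. This is an illustrative sketch that assumes -limit keeps only the leading entries of the expansion, in line with the parameter description above.
SPL statement
* | expand-values -limit=2 y
Input data
x: 'abc'
y: '[0,1,2,3]'
Output data: Only the first two elements are expanded because -limit=2.
# Entry 1
x: 'abc'
y: '0'
# Entry 2
x: 'abc'
y: '1'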
SQL calculation instructions on structured data
extend
This instruction creates fields based on SQL expression-based data calculations. For a list of supported SQL functions, see List of SQL functions supported by SPL.
Syntax
| extend <output>=<sql-expr>, ...
Parameter description
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| output | Field | Yes | The name of the field to create. You cannot use the same field name to store the results of multiple expressions. **Important**: If the new field name is the same as an existing field name in the input data, the new field overwrites the existing field based on the data type and value. |
| sql-expr | SQLExp | Yes | The data processing expression. **Important**: For more information about null value processing, see Null value processing in SPL expressions. |
Sample statement
Example 1: Apply a computation expression.
* | extend Duration = EndTime - StartTime
Example 2: Utilize a regular expression.
* | extend server_protocol_version=regexp_extract(server_protocol, '\d+')
Example 3: Extract JSONPath content and convert a field's data type.
SPL statement
* | extend a=json_extract(content, '$.body.a'), b=json_extract(content, '$.body.b') | extend b=cast(b as BIGINT)
Input data
content: '{"body": {"a": 1, "b": 2}}'
Output results
content: '{"body": {"a": 1, "b": 2}}' a: '1' b: 2
where
This instruction filters data based on SQL expression-based calculations, retaining data that matches the specified SQL expression. For a list of supported SQL functions, see List of SQL functions supported by SPL.
Syntax
| where <sql-expr>
Parameter description
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| sql-expr | SQLExp | Yes | The SQL expression. Data that matches this expression is retained. **Important**: For more information about null value processing in SQL expressions, see Null value processing in SPL expressions. |
Sample statement
Example 1: Filter data based on field content.
* | where userId='123'
Example 2: Filter data by matching a field value against a regular expression.
* | where regexp_like(server_protocol, '\d+')
Example 3: Convert a field's data type to match all server error data.
* | where cast(status as BIGINT) >= 500
Extraction instructions on semi-structured data
parse-regexp
This instruction extracts information matching groups in a specified regular expression from a field.
The output field data type is VARCHAR. If the new field name conflicts with an existing field name in the input data, refer to Retention and overwrite of old and new values.
Operations on time fields __time__ and __time_ns_part__ are not permitted. For more information, see Time fields.
Syntax
| parse-regexp <field>, <pattern> as <output>, ...
Parameter description
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| field | Field | Yes | The original name of the field from which you want to extract information. Make sure that this field is included in the input data, the data type is |
| pattern | RegExp | Yes | The regular expression. The RE2 syntax is supported. |
| output | Field | No | The name of the output field that stores the result extracted by the regular expression. |
Sample statement
Example 1: Employ exploratory match mode.
SPL statement
* | parse-regexp content, '(\S+)' as ip -- Generate the ip: 10.0.0.0 field.
| parse-regexp content, '\S+\s+(\w+)' as method -- Generate the method: GET field.
Input data
content: '10.0.0.0 GET /index.html 15824 0.043'
Output results
content: '10.0.0.0 GET /index.html 15824 0.043'
ip: '10.0.0.0'
method: 'GET'
Example 2: Utilize full pattern match mode with unnamed capturing groups.
SPL statement
* | parse-regexp content, '(\S+)\s+(\w+)' as ip, method
Input data
content: '10.0.0.0 GET /index.html 15824 0.043'
Output results
content: '10.0.0.0 GET /index.html 15824 0.043'
ip: '10.0.0.0'
method: 'GET'
parse-csv
This instruction extracts CSV-formatted information from a specified field.
The output field data type is VARCHAR. If the new field name conflicts with an existing field name in the input data, refer to Retention and overwrite of old and new values.
Operations on time fields __time__ and __time_ns_part__ are not permitted. For more information, see Time fields.
Syntax
| parse-csv -delim=<delim> -quote=<quote> -strict <field> as <output>, ...
Parameter description
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| delim | String | No | The delimiter of the data content. The delimiter can be one to three valid ASCII characters. You can use escape characters to indicate special characters. For example, \t indicates the tab character, \11 indicates the ASCII character whose serial number corresponds to the octal number 11, and \x09 indicates the ASCII character whose serial number corresponds to the hexadecimal number 09. You can also use a combination of multiple characters as the delimiter. The default value is a comma (,). |
| quote | Char | No | The quote of the data content. The quote is a single valid ASCII character and is used when the data content contains a delimiter. For example, you can specify double quotation marks (""), single quotation marks (''), or an unprintable character (0x01). By default, no quote is used. **Important**: This parameter takes effect only if you set the delim parameter to a single character. You must specify different values for the quote and delim parameters. |
| strict | Bool | No | Specifies whether to enable strict pairing when the number of values in the data content is different from the number of fields specified in the output parameter. Default value: False. If you want to enable strict pairing, you must configure this parameter. |
| field | Field | Yes | The name of the field that you want to parse. Make sure that the data content includes this field, the type must be |
| output | Field | Yes | The name of the field that you want to use to store the parsing result of the input data. |
Sample statement
Example 1: Match data in simple mode.
SPL statement
* | parse-csv content as x, y, z
Input data
content: 'a,b,c'
Output results
content: 'a,b,c'
x: 'a'
y: 'b'
z: 'c'
Example 2: Use double quotes as the quote character to match data containing special characters.
SPL statement
* | parse-csv -quote='"' content as ip, time, host
Input data
content: '192.168.0.100,"10/Jun/2019:11:32:16,127 +0800",example.aliyundoc.com'
Output results
content: '192.168.0.100,"10/Jun/2019:11:32:16,127 +0800",example.aliyundoc.com'
ip: '192.168.0.100'
time: '10/Jun/2019:11:32:16,127 +0800'
host: 'example.aliyundoc.com'
Example 3: Employ a combination of multiple characters as the separator.
SPL statement
* | parse-csv -delim='||' content as time, ip, req
Input data
content: '05/May/2022:13:30:28||127.0.0.1||POST /put?a=1&b=2'
Output results
content: '05/May/2022:13:30:28||127.0.0.1||POST /put?a=1&b=2'
time: '05/May/2022:13:30:28'
ip: '127.0.0.1'
req: 'POST /put?a=1&b=2'
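Example 4: Use an escape sequence as the delimiter. This is an illustrative sketch that assumes the content field holds tab-separated values; \t in the data below denotes a literal tab character, as described for the delim parameter.
SPL statement
* | parse-csv -delim='\t' content as x, y, z
Input data
content: 'a\tb\tc'
Output results
content: 'a\tb\tc'
x: 'a'
y: 'b'
z: 'c'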
parse-json
This instruction extracts first-layer JSON information from a specified field.
The output field data type is VARCHAR. If the new field name conflicts with an existing field name in the input data, refer to Retention and overwrite of old and new values.
Operations on time fields __time__ and __time_ns_part__ are not permitted. For more information, see Time fields.
Syntax
| parse-json -mode=<mode> -path=<path> -prefix=<prefix> <field>
Parameter description
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| mode | String | No | The mode that is used to extract information when the name of the output field is the same as an existing field name in the input data. The default value is overwrite. |
| path | JSONPath | No | The JSON path in the specified field. The JSON path is used to locate the information that you want to extract. The default value is an empty string. If you use the default value, the complete data of the specified field is extracted. |
| prefix | String | No | The prefix of the fields that are generated by expanding a JSON structure. The default value is an empty string. |
| field | Field | Yes | The name of the field that you want to parse. Make sure that this field is included in the input data and the field value is a non-null value that meets one of the following conditions. Otherwise, the extraction is not performed. |
Sample statement
Example 1: Extract all keys and values from the 'y' field.
SPL statement
* | parse-json y
Input data
x: '0'
y: '{"a": 1, "b": 2}'
Output results
x: '0'
y: '{"a": 1, "b": 2}'
a: '1'
b: '2'
Example 2: Extract the 'body' key's value from the 'content' field as separate fields.
SPL statement
* | parse-json -path='$.body' content
Input data
content: '{"body": {"a": 1, "b": 2}}'
Output results
content: '{"body": {"a": 1, "b": 2}}' a: '1' b: '2'
Example 3: Extract information in preserve mode, retaining the original value for existing fields.
SPL statement
* | parse-json -mode='preserve' y
Input data
a: 'xyz'
x: '0'
y: '{"a": 1, "b": 2}'
Output results
x: '0'
y: '{"a": 1, "b": 2}'
a: 'xyz'
b: '2'
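Example 4: Add a prefix to the generated fields. This is an illustrative sketch based on the -prefix parameter; the prefix value y_ is an assumed sample.
SPL statement
* | parse-json -prefix='y_' y
Input data
x: '0'
y: '{"a": 1, "b": 2}'
Output results
x: '0'
y: '{"a": 1, "b": 2}'
y_a: '1'
y_b: '2'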
parse-kv
This instruction extracts key-value pair information from a specified field.
The output field data type is VARCHAR. If the new field name conflicts with an existing field name in the input data, refer to Retention and overwrite of old and new values.
Operations on time fields __time__ and __time_ns_part__ are not permitted. For more information, see Time fields.
Syntax
| parse-kv -mode=<mode> -prefix=<prefix> -regexp <field>, <pattern>
Parameter description
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| mode | String | No | The data overwrite mode that is used when the new field name is the same as an existing field name in the input data. The default value is overwrite. For more information, see the field value overwrite mode. |
| prefix | String | No | The prefix of the output field name. The default value is an empty string. |
| regexp | Bool | Yes | Enables the regular expression extraction mode. |
| field | Field | Yes | The original name of the field from which you want to extract information. Make sure that this field is included in the input data, the data type is |
| pattern | RegExp | Yes | The regular expression, which must contain two capturing groups: the first capturing group extracts the field name, and the second capturing group extracts the field value. The RE2 syntax is supported. |
Sample statement
Example 1: In regular extraction mode, process complex delimiters between key-value pairs and separators between keys and values.
SPL statement
* | parse-kv -regexp content, '([^&?]+)(?:=|:)([^&?]+)'
Input data
content: 'k1=v1&k2=v2?k3:v3'
k1: 'xyz'
Output data
content: 'k1=v1&k2=v2?k3:v3'
k1: 'v1'
k2: 'v2'
k3: 'v3'
Example 2: In regular extraction mode, extract information in preserve mode, retaining the original value for existing fields.
SPL statement
* | parse-kv -regexp -mode='preserve' content, '([^&?]+)(?:=|:)([^&?]+)'
Input data
content: 'k1=v1&k2=v2?k3:v3'
k1: 'xyz'
Output results
content: 'k1=v1&k2=v2?k3:v3'
k1: 'xyz'
k2: 'v2'
k3: 'v3'
Example 3: In regular extraction mode, handle complex unstructured data where the value is a number or a string enclosed in double quotes.
SPL statement
* | parse-kv -regexp content, '([\w\-]+)="?([^"]*)"?'
Input data
content: 'verb="GET" URI="/healthz" latency="45.911µs" userAgent="kube-probe/1.30+" audit-ID="" srcIP="192.168.123.45:40092" contentType="text/plain; charset=utf-8" resp=200'
Output results
content: 'verb="GET" URI="/healthz" latency="45.911µs" userAgent="kube-probe/1.30+" audit-ID="" srcIP="192.168.123.45:40092" contentType="text/plain; charset=utf-8" resp=200' verb: 'GET' URI: '/healthz' latency: '45.911µs' userAgent: 'kube-probe/1.30+' audit-ID: '' srcIP: '192.168.123.45:40092' contentType: 'text/plain; charset=utf-8' resp: '200'