You can use data transformation functions provided by Simple Log Service to cleanse large amounts of log data. This way, the formats of the log data are standardized. This topic describes how to use functions to cleanse data in various scenarios.
Scenario 1: Filter logs by using the e_keep and e_drop functions
You can filter logs by using the e_drop or e_keep function. You can also filter logs by combining the e_if function and the DROP parameter or combining the e_if_else function and the DROP parameter.
Common transformation rules:
e_keep(e_search(...)): Logs that meet the specified conditions are retained, whereas logs that do not meet the specified conditions are discarded.
e_drop(e_search(...)): Logs that meet the specified conditions are discarded, whereas logs that do not meet the specified conditions are retained.
e_if_else(e_search("..."), KEEP, DROP): Logs that meet the specified conditions are retained, whereas logs that do not meet the specified conditions are discarded.
e_if(e_search("not ..."), DROP): Logs that meet the specified conditions are discarded, whereas logs that do not meet the specified conditions are retained.
e_if(e_search("..."), KEEP): This transformation rule is invalid.
Example:
Raw log
# Log 1
__source__: 192.168.0.1
__tag__:__client_ip__: 192.168.0.2
__tag__:__receive_time__: 1597214851
__topic__: app
class: test_case
id: 7992
test_string: <function test1 at 0x1027401e0>
# Log 2
__source__: 192.168.0.1
class: produce_case
id: 7990
test_string: <function test1 at 0x1020401e0>
Transformation rule
Discard the logs that do not contain the __topic__ field or the __tag__:__receive_time__ field.
e_if(e_not_has("__topic__"),e_drop())
e_if(e_not_has("__tag__:__receive_time__"),e_drop())
Result
__source__: 192.168.0.1
__tag__:__client_ip__: 192.168.0.2
__tag__:__receive_time__: 1597214851
__topic__: app
class: test_case
id: 7992
test_string: <function test1 at 0x1027401e0>
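The filtering behavior in this example can be mimicked in plain Python. The following is a minimal sketch, not the Simple Log Service implementation: logs are modeled as dictionaries, and drop_if_missing is a hypothetical helper that stands in for chaining e_if(e_not_has(...), e_drop()) once per required field.

```python
# Plain-Python sketch of the filter rule. Logs are modeled as dicts.
# drop_if_missing is a hypothetical helper, not a Simple Log Service function.

def drop_if_missing(logs, required_fields):
    """Keep only the logs that contain every field in required_fields."""
    return [log for log in logs if all(f in log for f in required_fields)]

log1 = {
    "__source__": "192.168.0.1",
    "__tag__:__receive_time__": "1597214851",
    "__topic__": "app",
    "id": "7992",
}
log2 = {"__source__": "192.168.0.1", "id": "7990"}

kept = drop_if_missing([log1, log2], ["__topic__", "__tag__:__receive_time__"])
print([log["id"] for log in kept])
```

Only the first log survives, because the second log lacks both required fields.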
Scenario 2: Assign values to empty fields in logs by using the e_set function
You can assign values to empty fields in logs by using the e_set function.
Sub-scenario 1: Assign a value to a field if the field does not exist or is empty.
e_set("result", "......value......", mode="fill")
For more information about the mode parameter value, see Field extraction check and overwrite modes.
Example:
Raw log
name:
Transformation rule
e_set("name", "aspara2.0", mode="fill")
Result
name: aspara2.0
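As a rough mental model of the fill behavior described in this sub-scenario, the following plain-Python sketch sets a field only when it is missing or empty. The set_fill helper is hypothetical and only approximates what mode="fill" does. It is not the DSL implementation.

```python
# Plain-Python sketch of "fill" semantics. set_fill is a hypothetical helper.

def set_fill(log, field, value):
    """Return a copy of log in which field is set only if missing or empty."""
    result = dict(log)
    if result.get(field, "") == "":
        result[field] = value
    return result

print(set_fill({"name": ""}, "name", "aspara2.0"))
print(set_fill({"name": "existing"}, "name", "aspara2.0"))
```

The second call leaves the field untouched because it already has a non-empty value.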
Sub-scenario 2: Simplify a regular expression and extract field values by using the Grok function.
Example:
Raw log
content:"ip address: 192.168.1.1"
Transformation rule
Capture and extract the IP address in the content field by using the Grok function.
e_regex("content", grok(r"(%{IP})"),"addr")
Result
addr: 192.168.1.1
content:"ip address: 192.168.1.1"
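To see what the Grok pattern saves you from writing, the following plain-Python sketch performs a comparable extraction with a hand-written regular expression. It rests on two assumptions: the IPV4 pattern is a deliberately simplified stand-in for %{IP} (it covers only IPv4 and does not validate octet ranges), and extract_ip is a hypothetical helper, not a DSL function.

```python
import re

# Simplified IPv4 pattern: an assumption for illustration, looser than %{IP}.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def extract_ip(log, source="content", target="addr"):
    """Return a copy of log with the first IPv4-like match stored in target."""
    result = dict(log)
    match = IPV4.search(result.get(source, ""))
    if match:
        result[target] = match.group(0)
    return result

print(extract_ip({"content": "ip address: 192.168.1.1"}))
```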
Sub-scenario 3: Assign values to multiple fields.
e_set("k1", "v1", "k2", "v2", "k3", "v3", ......)
Example:
Raw log
__source__: 192.168.0.1
__topic__:
__tag__:
__receive_time__:
id: 7990
test_string: <function test1 at 0x1020401e0>
Transformation rule
Assign values to the __topic__, __tag__, and __receive_time__ fields.
e_set("__topic__","app", "__tag__","stu","__receive_time__","1597214851")
Result
__source__: 192.168.0.1
__topic__: app
__tag__: stu
__receive_time__: 1597214851
id: 7990
test_string: <function test1 at 0x1020401e0>
Scenario 3: Delete a field and rename a field by using the e_search, e_rename, and e_compose functions
In most cases, we recommend that you use the e_if function to evaluate data based on specified conditions and the e_compose function to group the operations that are performed when the conditions are met.
Example:
Raw log
content: 123
age: 23
name: twiss
Transformation rule
If the value of the content field is 123, delete the age and name fields. Then, rename the content field to ctx.
e_if(e_search("content==123"),e_compose(e_drop_fields("age|name"), e_rename("content", "ctx")))
Result
ctx: 123
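The conditional drop-and-rename in this example can be sketched in plain Python as follows. This is an illustration of the behavior only: drop_and_rename is a hypothetical helper, logs are modeled as dictionaries, and field values are kept as strings because the DSL treats them as strings.

```python
# Plain-Python sketch of the rule: if content is "123", drop age and name,
# then rename content to ctx. drop_and_rename is a hypothetical helper.

def drop_and_rename(log):
    """Apply both operations only when the condition is met."""
    if log.get("content") != "123":
        return dict(log)  # condition not met: log is returned unchanged
    result = {k: v for k, v in log.items() if k not in ("age", "name")}
    result["ctx"] = result.pop("content")
    return result

print(drop_and_rename({"content": "123", "age": "23", "name": "twiss"}))
```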
Scenario 4: Convert the data types of fields in logs by using the v, ct_int, and dt_totimestamp functions
The fields and field values in logs are processed as strings during data transformation. Data of a non-string type is automatically converted to data of the string type. When you call a function, take note of the data types that are supported by the function. For more information, see Syntax overview.
Sub-scenario 1: Concatenate strings and sum up data by using the op_add function.
The op_add function supports the string and numeric types. Therefore, data type conversion is not required.
Example:
Raw log
a: 1
b: 2
Transformation rule
e_set("d",op_add(v("a"), v("b")))
e_set("e",op_add(ct_int(v("a")), ct_int(v("b"))))
Result
a: 1
b: 2
d: 12
e: 3
Sub-scenario 2: Convert data types by using the v and ct_int functions and call the op_mul function to multiply data.
Example:
Raw log
a: 2
b: 5
Transformation rule
v("a") and v("b") are of the string type. The second parameter of the op_mul function can be only of a numeric type. In this case, you must convert a string to an integer by using the ct_int function and pass the integer to the op_mul function.
e_set("c",op_mul(ct_int(v("a")), ct_int(v("b"))))
e_set("d",op_mul(v("a"), ct_int(v("b"))))
Result
a: 2
b: 5
c: 10
d: 22222
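The results in Sub-scenario 1 and Sub-scenario 2 match how Python's own + and * operators treat strings and integers, which is a convenient way to predict the output. The following sketch uses only built-in Python operators as an analogy. It does not call the DSL functions.

```python
# Analogy only: Python's + and * reproduce the results shown above.

a, b = "1", "2"
concatenated = a + b            # string + string joins the text
summed = int(a) + int(b)        # converting first gives arithmetic addition

x, y = "2", "5"
product = int(x) * int(y)       # integer * integer multiplies
repeated = x * int(y)           # string * integer repeats the string

print(concatenated, summed, product, repeated)  # 12 3 10 22222
```

The repeated value shows why you convert both operands before multiplying: a string first operand is repeated rather than multiplied.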
Sub-scenario 3: Convert a string or datetime to a standard time by using the dt_parse and dt_parsetimestamp functions.
The dt_totimestamp function accepts only datetime objects, not strings. You must therefore call the dt_parse function to convert time1 from a string to a datetime object before you pass it to the dt_totimestamp function. Alternatively, use the dt_parsetimestamp function, which accepts both datetime objects and strings. For more information, see Date and time functions.
Example:
Raw log
time1: 2020-09-17 9:00:00
Transformation rule
Convert the datetime that is specified by time1 to a UNIX timestamp.
e_set("time1", "2019-06-03 2:41:26")
e_set("time2", dt_totimestamp(dt_parse(v("time1"))))
or
e_set("time2", dt_parsetimestamp(v("time1")))
Result
time1: 2019-06-03 2:41:26
time2: 1559529686
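The two-step conversion (parse the string into a datetime object, then convert the object to a timestamp) can be reproduced with the Python standard library. This is a sketch under one assumption: the string is interpreted as UTC, which is the interpretation that yields the timestamp shown in the result above. The to_unix helper is hypothetical.

```python
from datetime import datetime, timezone

def to_unix(text):
    """Parse 'YYYY-MM-DD H:MM:SS' as UTC and return the UNIX timestamp."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")      # string -> datetime
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())  # datetime -> int

print(to_unix("2019-06-03 2:41:26"))  # 1559529686
```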
Scenario 5: Pass default values to the fields that do not exist in logs by configuring the default parameter
Some expression functions provided by the domain-specific language (DSL) for Simple Log Service have specific requirements for input parameters. If the input parameters do not meet the requirements, the data transformation rules that use the functions return the default values or an error. If a rule reads a log field that may not exist or may be empty, you can configure the default parameter of the v function to supply a fallback value before the value is passed to another function, such as the op_len function.
If default values are passed to subsequent functions, errors may occur. We recommend that you handle the errors at the earliest opportunity.
Raw log
data_len: 1024
Transformation rule
e_set("data_len", op_len(v("data", default="")))
Result
data: 0
data_len: 0
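The role of the default value can be illustrated with a dictionary lookup in plain Python. In this sketch, field_length is a hypothetical helper that approximates op_len(v(field, default=...)). It is not the DSL implementation.

```python
# Plain-Python sketch: a fallback value keeps the length call from failing
# when the field is absent. field_length is a hypothetical helper.

def field_length(log, field, default=""):
    """Return the length of the field value, or of default if it is missing."""
    return len(log.get(field, default))

log = {"data_len": "1024"}            # the "data" field does not exist
print(field_length(log, "data"))      # 0
```

Without a fallback, the lookup would yield None in this sketch, and calling len on None raises a TypeError. This mirrors why the rule above passes default="".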
Scenario 6: Evaluate logs based on specified conditions and add fields based on the evaluation result by using the e_if and e_switch functions
We recommend that you evaluate logs by using the e_if or e_switch function. For more information, see Flow control functions.
e_if function
e_if(Condition 1, Operation 1, Condition 2, Operation 2, Condition 3, Operation 3, ....)
e_switch function
When you use the e_switch function, you must specify condition-operation pairs. The e_switch function evaluates the conditions in sequence. If a condition is met, its paired operation is performed and the operation result is returned. If a condition is not met, its paired operation is not performed and the next condition is evaluated. If no conditions are met and the default parameter is specified, the operation that is specified by default is performed and the operation result is returned.
e_switch(Condition 1, Operation 1, Condition 2, Operation 2, Condition 3, Operation 3, ...., default=None)
Example:
Raw log
status1: 200
status2: 404
e_if function
Transformation rule
e_if(e_match("status1", "200"), e_set("status1_info", "normal"), e_match("status2", "404"), e_set("status2_info", "error"))
Result
status1: 200
status2: 404
status1_info: normal
status2_info: error
e_switch function
Transformation rule
e_switch(e_match("status1", "200"), e_set("status1_info", "normal"), e_match("status2", "404"), e_set("status2_info", "error"))
Result
The e_switch function evaluates the conditions in sequence. If a condition is met, the operation result is returned and no more conditions are evaluated.
status1: 200
status2: 404
status1_info: normal
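The difference between the two functions in this example comes down to whether evaluation stops at the first match. The following plain-Python sketch models that difference. The run_if and run_switch helpers are hypothetical approximations of the behavior described above, not the DSL implementation.

```python
# Plain-Python sketch of the two evaluation strategies.
# run_if and run_switch are hypothetical helpers.

def run_if(log, pairs):
    """Perform every operation whose condition is met."""
    result = dict(log)
    for condition, operation in pairs:
        if condition(result):
            operation(result)
    return result

def run_switch(log, pairs, default=None):
    """Perform only the first operation whose condition is met."""
    result = dict(log)
    for condition, operation in pairs:
        if condition(result):
            operation(result)
            return result          # stop after the first match
    if default is not None:
        default(result)            # no condition met: run the default
    return result

pairs = [
    (lambda log: log.get("status1") == "200",
     lambda log: log.update(status1_info="normal")),
    (lambda log: log.get("status2") == "404",
     lambda log: log.update(status2_info="error")),
]
raw = {"status1": "200", "status2": "404"}

print(run_if(raw, pairs))
print(run_switch(raw, pairs))
```

With the same pairs, run_if adds both status fields, whereas run_switch adds only status1_info.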
Scenario 7: Convert UNIX timestamps to log time values that are accurate to the nanosecond
In some data transformation scenarios, the timestamp of data must be accurate to the nanosecond. If a raw log contains a field whose value is a UNIX timestamp, you can use field processing functions to convert the field value into a log time that is accurate to the nanosecond.
Raw log
{ "__source__": "1.2.3.4", "__time__": 1704983810, "__topic__": "test", "log_time_nano":"1705043680630940602" }
Transformation rule
e_set(
    "__time__",
    op_div_floor(ct_int(v("log_time_nano")), 1000000000),
)
e_set(
    "__time_ns_part__",
    op_mod(ct_int(v("log_time_nano")), 1000000000),
)
Result
{ "__source__": "1.2.3.4", "__time__": 1705043680, "__time_ns_part__": 630940602, "__topic__": "test", "log_time_nano":"1705043680630940602" }
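The arithmetic behind this rule is a floor division and a modulo by 10^9. The following plain-Python sketch reproduces it with the built-in divmod function. The split_nanos helper is hypothetical and is shown only to make the calculation explicit.

```python
# Plain-Python sketch of the seconds / nanoseconds split.
# split_nanos is a hypothetical helper.

NANOS_PER_SECOND = 1_000_000_000

def split_nanos(value):
    """Split a nanosecond UNIX timestamp into (seconds, nanosecond part)."""
    seconds, nanos = divmod(int(value), NANOS_PER_SECOND)
    return seconds, nanos

print(split_nanos("1705043680630940602"))  # (1705043680, 630940602)
```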
Scenario 8: Convert time strings that follow the ISO 8601 standard to log time values that are accurate to the microsecond
In some data transformation scenarios, high-precision timestamps are required. If a raw log contains a field whose value follows the ISO 8601 standard, you can use field processing functions to convert the field value into a log time that is accurate to the microsecond.
Raw log
{ "__source__": "1.2.3.4", "__time__": 1704983810, "__topic__": "test", "log_time":"2024-01-11 23:10:43.992847200" }
Transformation rule
e_set(
    "__time__",
    dt_parsetimestamp(v("log_time"), tz="Asia/Shanghai"),
    mode="overwrite",
)
e_set("tmp_ms", dt_prop(v("log_time"), "microsecond"))
e_set(
    "__time_ns_part__",
    op_mul(ct_int(v("tmp_ms")), 1000),
)
Result
{ "__source__": "1.2.3.4", "__time__": 1704985843, "__time_ns_part__": 992847000, "__topic__": "test", "log_time": "2024-01-11 23:10:43.992847200", "tmp_ms": "992847" }
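The same conversion can be checked with the Python standard library. This sketch makes two assumptions that are not part of the rule above: Asia/Shanghai is modeled as a fixed UTC+8 offset, and the fractional part is truncated to six digits to obtain microseconds. The split_time helper is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Asia/Shanghai is modeled as a fixed UTC+8 offset.
SHANGHAI = timezone(timedelta(hours=8))

def split_time(text, tz=SHANGHAI):
    """Return (UNIX seconds, nanosecond part at microsecond precision)."""
    whole, _, fraction = text.partition(".")
    parsed = datetime.strptime(whole, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
    # Keep the first six fractional digits (microseconds); pad short fractions.
    micros = int(fraction[:6].ljust(6, "0")) if fraction else 0
    return int(parsed.timestamp()), micros * 1000

print(split_time("2024-01-11 23:10:43.992847200"))  # (1704985843, 992847000)
```

The last three digits of the original fraction (200) are lost, which is why the result is accurate to the microsecond rather than the nanosecond.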