This topic describes how to use the data transformation feature of Log Service to process complex JSON data.
Processing complex JSON data with multiple array child keys
Logs built by a program are often written in a statistics-style JSON format that contains some base information plus several child keys whose values are arrays. For example, a server writes one log entry per minute that contains its current status as well as status statistics for the related server and client nodes.
Log sample
__source__: 192.0.2.1
__topic__:
content: {
    "service": "search_service",
    "overal_status": "yellow",
    "servers": [
        {"host": "192.0.2.1", "status": "green"},
        {"host": "192.0.2.2", "status": "green"}
    ],
    "clients": [
        {"host": "192.0.2.3", "status": "green"},
        {"host": "192.0.2.4", "status": "red"}
    ]
}
Transformation requirements
Split the original log by topic into three logs whose topics are overall_type, client_status, and server_status, and store different information under each topic:
- overall_type: keep the server count, the client count, the overal_status color, and the service information.
- client_status: keep the host address, the status, and the service information.
- server_status: keep the host address, the status, and the service information.
Expected results
__source__: 192.0.2.1
__topic__: overall_type
client_count: 2
overal_status: yellow
server_count: 2
service: search_service

__source__: 192.0.2.1
__topic__: client_status
host: 192.0.2.4
status: red
service: search_service

__source__: 192.0.2.1
__topic__: client_status
host: 192.0.2.3
status: green
service: search_service

__source__: 192.0.2.1
__topic__: server_status
host: 192.0.2.1
status: green
service: search_service

__source__: 192.0.2.1
__topic__: server_status
host: 192.0.2.2
status: green
service: search_service
Solution
1. Split one log into three. Assign three different values to the log topic and then split on it; after the split, the three logs are identical except for the __topic__ field.
   e_set("__topic__", "server_status,client_status,overall_type")
   e_split("__topic__")
The processed log format is as follows:
__source__: 192.0.2.1
__topic__: server_status    // the other two logs have client_status and overall_type; everything else is the same
content: {...as above...}
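To make the fan-out concrete, here is a minimal Python sketch of the assumed e_split semantics on a comma-separated field; split_event and the sample event are illustrative names, not part of the DSL:

# Minimal sketch (not the Log Service implementation) of how e_split("__topic__")
# fans one event out into several: one copy per comma-separated value.
def split_event(event, field, sep=","):
    return [{**event, field: value.strip()} for value in event[field].split(sep)]

event = {
    "__source__": "192.0.2.1",
    "__topic__": "server_status,client_status,overall_type",
    "content": "{...}",
}
for e in split_event(event, "__topic__"):
    print(e["__topic__"])  # server_status, client_status, overall_type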
2. Expand the first level of the JSON content in the content field, and then delete the content field.
   e_json('content', depth=1)
   e_drop_fields("content")
The processed log format is as follows:
__source__: 192.0.2.1
__topic__: overall_type    // the other two logs have client_status and server_status; everything else is the same
clients: [{"host": "192.0.2.3", "status": "green"}, {"host": "192.0.2.4", "status": "red"}]
overal_status: yellow
servers: [{"host": "192.0.2.1", "status": "green"}, {"host": "192.0.2.2", "status": "green"}]
service: search_service
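The depth=1 behavior can be mimicked in plain Python. The sketch below assumes that only the first level is expanded while nested containers stay serialized; expand_depth1 is an illustrative helper, not a DSL function:

import json

# Rough sketch of the assumed e_json('content', depth=1) semantics: expand only
# the first level of the JSON value; nested arrays/objects remain JSON strings.
def expand_depth1(event, field):
    for key, value in json.loads(event[field]).items():
        event[key] = value if isinstance(value, (str, int, float, bool)) else json.dumps(value)
    return event

event = {"content": '{"service": "search_service", "servers": [{"host": "192.0.2.1"}]}'}
expand_depth1(event, "content")
print(event["service"])  # search_service
print(event["servers"])  # [{"host": "192.0.2.1"}] -- still a JSON string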
3. For logs whose topic is overall_type, compute client_count and server_count.
   e_if(e_search("__topic__==overall_type"),
        e_compose(
            e_set("client_count", json_select(v("clients"), "length([*])", default=0)),
            e_set("server_count", json_select(v("servers"), "length([*])", default=0))
        ))
The processed log is:
__topic__: overall_type
server_count: 2
client_count: 2
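json_select evaluates a JMESPath expression against the field value. To check what length([*]) returns outside of the DSL, the standalone jmespath Python package, which implements the same expression language, can be used:

import json
import jmespath  # pip install jmespath

servers = json.loads('[{"host": "192.0.2.1", "status": "green"},'
                     ' {"host": "192.0.2.2", "status": "green"}]')
# length([*]) counts the elements of the top-level array, mirroring
# json_select(v("servers"), "length([*])") above.
print(jmespath.search("length([*])", servers))  # 2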
Then drop the fields that are no longer needed:
e_if(e_search("__topic__==overall_type"), e_drop_fields("clients", "servers"))
4. Split logs whose topic is server_status further.
   e_if(e_search("__topic__==server_status"),
        e_compose(
            e_split("servers"),
            e_json("servers", depth=1)
        ))
The processed logs are the following two:
__topic__: server_status
servers: {"host": "192.0.2.1", "status": "green"}
host: 192.0.2.1
status: green

__topic__: server_status
servers: {"host": "192.0.2.2", "status": "green"}
host: 192.0.2.2
status: green
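For reference, a simplified Python sketch of what e_split does here under the assumed semantics: one output event per element of the JSON array stored in the field (split_on_array_field is an illustrative name):

import json

def split_on_array_field(event, field):
    # One copy of the event per array element; the element replaces the array.
    return [{**event, field: json.dumps(elem)} for elem in json.loads(event[field])]

event = {"__topic__": "server_status",
         "servers": '[{"host": "192.0.2.1", "status": "green"},'
                    ' {"host": "192.0.2.2", "status": "green"}]'}
for e in split_on_array_field(event, "servers"):
    print(e["servers"])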
Then keep only the relevant fields by dropping the leftover servers field:
e_if(e_search("__topic__==server_status"), e_drop_fields("servers"))
5. Split logs whose topic is client_status further, and then delete the redundant field.
   e_if(e_search("__topic__==client_status"),
        e_compose(
            e_split("clients"),
            e_json("clients", depth=1),
            e_drop_fields("clients")
        ))
The processed logs are the following two:
__topic__: client_status
host: 192.0.2.3
status: green

__topic__: client_status
host: 192.0.2.4
status: red
In summary, the complete LOG DSL rules are:
# Overall split
e_set("__topic__", "server_status,client_status,overall_type")
e_split("__topic__")
e_json('content', depth=1)
e_drop_fields("content")

# Process overall_type logs
e_if(e_search("__topic__==overall_type"),
     e_compose(
         e_set("client_count", json_select(v("clients"), "length([*])", default=0)),
         e_set("server_count", json_select(v("servers"), "length([*])", default=0))
     ))
e_if(e_search("__topic__==overall_type"), e_drop_fields("clients", "servers"))

# Process server_status logs
e_if(e_search("__topic__==server_status"),
     e_compose(
         e_split("servers"),
         e_json("servers", depth=1)
     ))
e_if(e_search("__topic__==server_status"), e_drop_fields("servers"))

# Process client_status logs
e_if(e_search("__topic__==client_status"),
     e_compose(
         e_split("clients"),
         e_json("clients", depth=1),
         e_drop_fields("clients")
     ))
Solution optimization
The preceding solution has issues when content.servers or content.clients is empty. Assume that the raw log is:
__source__: 192.0.2.1
__topic__:
content:{
"service": "search_service",
"overal_status": "yellow",
"servers": [ ],
"clients": [ ]
}
According to the preceding solution, the log is split into three logs, and the logs whose topics are client_status and server_status have empty content.
__source__: 192.0.2.1
__topic__: overall_type
client_count: 0
overal_status: yellow
server_count: 0
service: search_service
__source__: 192.0.2.1
__topic__: client_status
service: search_service
__source__: 192.0.2.1
__topic__: server_status
service: search_service
Solution 1
After the initial split, and before processing the server_status and client_status logs, check for and discard the empty events.
# Process server_status: drop empty events (keep non-empty ones)
e_keep(op_and(e_search("__topic__==server_status"), json_select(v("servers"), "length([*])")))
# Process client_status: drop empty events (keep non-empty ones)
e_keep(op_and(e_search("__topic__==client_status"), json_select(v("clients"), "length([*])")))
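The keep condition relies on length([*]) evaluating to 0 for an empty array, which the condition treats as false. A quick check with the standalone jmespath Python package illustrates this assumption:

import jmespath  # pip install jmespath

print(jmespath.search("length([*])", []))                       # 0 -> event dropped
print(jmespath.search("length([*])", [{"host": "192.0.2.3"}]))  # 1 -> event kept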
In summary, the LOG DSL rules are:
# Overall split
e_set("__topic__", "server_status,client_status,overall_type")
e_split("__topic__")
e_json('content', depth=1)
e_drop_fields("content")

# Process overall_type logs
e_if(e_search("__topic__==overall_type"),
     e_compose(
         e_set("client_count", json_select(v("clients"), "length([*])", default=0)),
         e_set("server_count", json_select(v("servers"), "length([*])", default=0))
     ))
e_if(e_search("__topic__==overall_type"), e_drop_fields("clients", "servers"))

# New: pre-process server_status: drop empty events (keep non-empty ones)
e_keep(op_and(e_search("__topic__==server_status"), json_select(v("servers"), "length([*])")))

# Process server_status logs
e_if(e_search("__topic__==server_status"),
     e_compose(
         e_split("servers"),
         e_json("servers", depth=1)
     ))
e_if(e_search("__topic__==server_status"), e_drop_fields("servers"))

# New: pre-process client_status: drop empty events (keep non-empty ones)
e_keep(op_and(e_search("__topic__==client_status"), json_select(v("clients"), "length([*])")))

# Process client_status logs
e_if(e_search("__topic__==client_status"),
     e_compose(
         e_split("clients"),
         e_json("clients", depth=1),
         e_drop_fields("clients")
     ))
Solution 2
Make the check during the initial split: split out a log only when the corresponding array is non-empty.
# Initial topic
e_set("__topic__", "server_status")

# If content.servers is non-empty, split one log out of server_status
e_if(json_select(v("content"), "length(servers[*])"),
     e_compose(
         e_set("__topic__", "server_status,overall_type"),
         e_split("__topic__")
     ))

# If content.clients is non-empty, split one more log out of overall_type
e_if(op_and(e_search("__topic__==overall_type"), json_select(v("content"), "length(clients[*])")),
     e_compose(
         e_set("__topic__", "client_status,overall_type"),
         e_split("__topic__")
     ))
In summary, the LOG DSL rules are:
# Overall split
e_set("__topic__", "server_status")

# If content.servers is non-empty, split one log out of server_status
e_if(json_select(v("content"), "length(servers[*])"),
     e_compose(
         e_set("__topic__", "server_status,overall_type"),
         e_split("__topic__")
     ))

# If content.clients is non-empty, split one more log out of overall_type
e_if(op_and(e_search("__topic__==overall_type"), json_select(v("content"), "length(clients[*])")),
     e_compose(
         e_set("__topic__", "client_status,overall_type"),
         e_split("__topic__")
     ))

e_json('content', depth=1)
e_drop_fields("content")

# Process overall_type logs
e_if(e_search("__topic__==overall_type"),
     e_compose(
         e_set("client_count", json_select(v("clients"), "length([*])", default=0)),
         e_set("server_count", json_select(v("servers"), "length([*])", default=0))
     ))
e_if(e_search("__topic__==overall_type"), e_drop_fields("clients", "servers"))

# Process server_status logs
e_if(e_search("__topic__==server_status"),
     e_compose(
         e_split("servers"),
         e_json("servers", depth=1)
     ))
e_if(e_search("__topic__==server_status"), e_drop_fields("servers"))

# Process client_status logs
e_if(e_search("__topic__==client_status"),
     e_compose(
         e_split("clients"),
         e_json("clients", depth=1),
         e_drop_fields("clients")
     ))
Solution comparison
Solution 1 deletes the empty logs after the split. The extra split is logically redundant, but the rules are simple and easy to maintain. This is the recommended default.
Solution 2 makes the check before the split, which is more efficient, but the rules are slightly more verbose. It is recommended only in specific scenarios, for example, when the initial split could produce a large number of extra events.
Processing complex JSON data with multiple levels of nested arrays and objects
Take a complex object that contains multiple levels of nested arrays as the sample, and split each login entry in login_histories under each object in users into its own login event.
Raw log
__source__: 192.0.2.1
__topic__:
content: {
    "users": [
        {
            "name": "user1",
            "login_histories": [
                {"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"},
                {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"},
                { ...more login information... }
            ]
        },
        {
            "name": "user2",
            "login_histories": [
                {"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"},
                {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"},
                { ...more login information... }
            ]
        },
        { ...more users... }
    ]
}
Expected split logs
__source__: 192.0.2.1
name: user1
date: 2019-10-10 1:0:0
login_ip: 192.0.2.6

__source__: 192.0.2.1
name: user1
date: 2019-10-10 0:0:0
login_ip: 192.0.2.6

__source__: 192.0.2.1
name: user2
date: 2019-10-11 0:0:0
login_ip: 192.0.2.7

__source__: 192.0.2.1
name: user2
date: 2019-10-11 1:0:0
login_ip: 192.0.2.9

...more logs...
Solution
1. Split and expand users in content.
   e_split("content", jmes='users[*]', output='item')
   e_json("item", depth=1)
The processed logs are:
__source__: 192.0.2.1
__topic__:
content: {...as above...}
item: {"name": "user1", "login_histories": [{"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"}, {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"}]}
login_histories: [{"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"}, {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"}]
name: user1

__source__: 192.0.2.1
__topic__:
content: {...as above...}
item: {"name": "user2", "login_histories": [{"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"}, {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"}]}
login_histories: [{"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"}, {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"}]
name: user2
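Here is a Python sketch of the assumed jmes/output semantics of e_split: evaluate the JMESPath expression against the parsed field, then emit one event per match, writing each match into the output field (split_with_jmes and the trimmed sample event are illustrative):

import json
import jmespath  # pip install jmespath

def split_with_jmes(event, field, jmes, output):
    # One event per JMESPath match, with the match serialized into `output`.
    matches = jmespath.search(jmes, json.loads(event[field])) or []
    return [{**event, output: json.dumps(m)} for m in matches]

event = {"content": '{"users": [{"name": "user1"}, {"name": "user2"}]}'}
for e in split_with_jmes(event, "content", "users[*]", "item"):
    print(e["item"])  # {"name": "user1"} then {"name": "user2"}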
2. Split login_histories first and then expand it.
   e_split("login_histories")
   e_json("login_histories", depth=1)
The processed logs are:
__source__: 192.0.2.1
__topic__:
content: {...as above...}
date: 2019-10-11 0:0:0
item: {"name": "user2", "login_histories": [{"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"}, {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"}]}
login_histories: {"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"}
login_ip: 192.0.2.7
name: user2

__source__: 192.0.2.1
__topic__:
content: {...as above...}
date: 2019-10-11 1:0:0
item: {"name": "user2", "login_histories": [{"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"}, {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"}]}
login_histories: {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"}
login_ip: 192.0.2.9
name: user2

__source__: 192.0.2.1
__topic__:
content: {...as above...}
date: 2019-10-10 1:0:0
item: {"name": "user1", "login_histories": [{"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"}, {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"}]}
login_histories: {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"}
login_ip: 192.0.2.6
name: user1

__source__: 192.0.2.1
__topic__:
content: {...as above...}
date: 2019-10-10 0:0:0
item: {"name": "user1", "login_histories": [{"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"}, {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"}]}
login_histories: {"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"}
login_ip: 192.0.2.6
name: user1
3. Delete the irrelevant fields.
   e_drop_fields("content", "item", "login_histories")
The processed logs are:
__source__: 192.0.2.1
__topic__:
name: user1
date: 2019-10-10 1:0:0
login_ip: 192.0.2.6

__source__: 192.0.2.1
__topic__:
name: user1
date: 2019-10-10 0:0:0
login_ip: 192.0.2.6

__source__: 192.0.2.1
__topic__:
name: user2
date: 2019-10-11 0:0:0
login_ip: 192.0.2.7

__source__: 192.0.2.1
__topic__:
name: user2
date: 2019-10-11 1:0:0
login_ip: 192.0.2.9
In summary, the LOG DSL rules can be written as follows:
e_split("content", jmes='users[*]', output='item') e_json("item",depth=1) e_split("login_histories") e_json("login_histories", depth=1) e_drop_fields("content", "item", "login_histories")
Summary: for requirements like the above, split first, then expand, and finally delete the irrelevant information.