
Simple Log Service: Transforming complex JSON data

Updated: Jun 30, 2024

This topic describes how to use the data transformation feature of Log Service to transform complex JSON data.

Transforming complex JSON data whose sub-keys are arrays

Program-generated logs are often written in a statistics-style JSON format: some base information plus several sub-keys whose values are arrays. For example, a server writes one log entry per minute that contains its current status together with status statistics for the related server and client nodes.

  • Log example

    __source__:  192.0.2.1
    __topic__:  
    content:{
         "service": "search_service",
         "overal_status": "yellow",
         "servers": [
             {
                 "host": "192.0.2.1",
                 "status": "green"
             },
             {
                 "host": "192.0.2.2",
                 "status": "green"
             }
         ],
         "clients": [
             {
                 "host": "192.0.2.3",
                 "status": "green"
             },
             {
                 "host": "192.0.2.4",
                 "status": "red"
             }
         ]
    }
  • Transformation requirements

    1. Split the original log by topic into three logs with topics overall_type, client_status, and server_status.

    2. Store different information under each topic.

      • overall_type: keep the server count, the client count, the overal_status color, and the service information.

      • client_status: keep the host address, the status, and the service information.

      • server_status: keep the host address, the status, and the service information.

  • Expected results

    __source__:  192.0.2.1
    __topic__:  overall_type
    client_count:  2
    overal_status:  yellow
    server_count:  2
    service:  search_service
    
    
    __source__:  192.0.2.1
    __topic__:  client_status
    host:  192.0.2.4
    status:  red
    service:  search_service
    
    
    __source__:  192.0.2.1
    __topic__:  client_status
    host:  192.0.2.3
    status:  green
    service:  search_service
    
    
    __source__:  192.0.2.1
    __topic__:  server_status
    host:  192.0.2.1
    status:  green
    service:  search_service
    
    
    __source__:  192.0.2.1
    __topic__:  server_status
    host:  192.0.2.2
    status:  green
    service:  search_service
  • Solution

    1. Split one log into three: assign __topic__ a list of three values, then split on it. After the split, the three logs are identical except for __topic__.

      e_set("__topic__", "server_status,client_status,overall_type")
      e_split("__topic__")

      The log format after processing:

      __source__:  192.0.2.1
      __topic__:  server_status         // the other 2 logs have client_status and overall_type; everything else is identical
      content:  {
          ...as above...
      }
    2. Expand the first level of the JSON in content, then delete the content field.

      e_json('content',depth=1)
      e_drop_fields("content")

      The log format after processing:

      __source__:  192.0.2.1
      __topic__:  overall_type              // the other 2 logs have client_status and server_status; everything else is identical
      clients:  [{"host": "192.0.2.3", "status": "green"}, {"host": "192.0.2.4", "status": "red"}]
      overal_status:  yellow
      servers:  [{"host": "192.0.2.1", "status": "green"}, {"host": "192.0.2.2", "status": "green"}]
      service:  search_service
    3. For logs with topic overall_type, compute client_count and server_count.

      e_if(e_search("__topic__==overall_type"), 
           e_compose(
              e_set("client_count", json_select(v("clients"), "length([*])", default=0)), 
              e_set("server_count", json_select(v("servers"), "length([*])", default=0))
        ))

      The log after processing:

      __topic__:  overall_type
      server_count:  2
      client_count:  2
    4. Drop the fields that are no longer needed:

      e_if(e_search("__topic__==overall_type"), e_drop_fields("clients", "servers"))
    5. For logs with topic server_status, split further.

      e_if(e_search("__topic__==server_status"), 
           e_compose(
              e_split("servers"), 
              e_json("servers", depth=1)
        ))

      After processing, the following two logs result:

      __topic__:  server_status
      servers:  {"host": "192.0.2.1", "status": "green"}
      host: 192.0.2.1
      status: green
      __topic__:  server_status
      servers:  {"host": "192.0.2.2", "status": "green"}
      host: 192.0.2.2
      status: green
    6. Drop the redundant servers field from the server_status logs:

      e_if(e_search("__topic__==server_status"), e_drop_fields("servers"))
    7. For logs with topic client_status, split further, then delete the redundant field.

      e_if(e_search("__topic__==client_status"), 
           e_compose(
              e_split("clients"), 
              e_json("clients", depth=1),
              e_drop_fields("clients")
        ))

      After processing, the following two logs result:

      __topic__:  client_status
      host: 192.0.2.3
      status: green
      __topic__:  client_status
      host: 192.0.2.4
      status: red
    8. The complete LOG DSL rules:

      # overall split
      e_set("__topic__", "server_status,client_status,overall_type")
      e_split("__topic__")
      e_json('content',depth=1)
      e_drop_fields("content")
      
      # process overall_type logs
      e_if(e_search("__topic__==overall_type"), 
           e_compose(
              e_set("client_count", json_select(v("clients"), "length([*])", default=0)),
              e_set("server_count", json_select(v("servers"), "length([*])", default=0))
      ))
      e_if(e_search("__topic__==overall_type"), e_drop_fields("clients", "servers"))
      
      # process server_status logs
      e_if(e_search("__topic__==server_status"), 
           e_compose(
              e_split("servers"), 
              e_json("servers", depth=1)
        ))
      e_if(e_search("__topic__==server_status"), e_drop_fields("servers"))
      
      
      # process client_status logs
      e_if(e_search("__topic__==client_status"), 
           e_compose(
              e_split("clients"), 
              e_json("clients", depth=1),
              e_drop_fields("clients")
        ))
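As a cross-check of the intended fan-out, the split-and-expand pipeline above can be sketched in plain Python. This is illustrative only: the real e_set/e_split/e_json operators run inside the Log Service transformation engine, and the transform helper below is hypothetical.

```python
import json

def transform(event):
    """Simulate the DSL pipeline: one input event becomes one overall_type
    event plus one event per element of servers and clients."""
    content = json.loads(event["content"])
    results = []

    # overall_type: keep counts, overal_status, and service
    results.append({
        "__source__": event["__source__"],
        "__topic__": "overall_type",
        "service": content["service"],
        "overal_status": content["overal_status"],
        "server_count": len(content.get("servers", [])),
        "client_count": len(content.get("clients", [])),
    })

    # server_status / client_status: one event per array element
    for topic, key in (("server_status", "servers"), ("client_status", "clients")):
        for node in content.get(key, []):
            results.append({
                "__source__": event["__source__"],
                "__topic__": topic,
                "service": content["service"],
                "host": node["host"],
                "status": node["status"],
            })
    return results

raw = {
    "__source__": "192.0.2.1",
    "content": json.dumps({
        "service": "search_service",
        "overal_status": "yellow",
        "servers": [{"host": "192.0.2.1", "status": "green"},
                    {"host": "192.0.2.2", "status": "green"}],
        "clients": [{"host": "192.0.2.3", "status": "green"},
                    {"host": "192.0.2.4", "status": "red"}],
    }),
}

events = transform(raw)
# 1 overall_type event + 2 server_status + 2 client_status = 5 events
```

The sample input yields the five logs shown under "Expected results" above.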

Optimizing the solution

The solution above has some problems when content.servers or content.clients is empty. Suppose the original log is:

__source__:  192.0.2.1
__topic__:  
content:{
            "service": "search_service",
            "overal_status": "yellow",
            "servers": [ ],
            "clients": [ ]
}

Following the solution above, the log still splits into three, and the logs with topics client_status and server_status are essentially empty:

__source__:  192.0.2.1
__topic__:  overall_type
client_count:  0
overal_status:  yellow
server_count:  0
service:  search_service


__source__:  192.0.2.1
__topic__:  client_status
service:  search_service


__source__:  192.0.2.1
__topic__:  server_status
service:  search_service
  • Solution 1

    After the initial split, and before processing the server_status and client_status logs, check for and drop the empty events.

    # server_status: drop if empty (keep if non-empty)
    e_keep(op_and(e_search("__topic__==server_status"), json_select(v("servers"), "length([*])")))
    
    # client_status: drop if empty (keep if non-empty)
    e_keep(op_and(e_search("__topic__==client_status"), json_select(v("clients"), "length([*])")))

    The complete LOG DSL rules:

    # overall split
    e_set("__topic__", "server_status,client_status,overall_type")
    e_split("__topic__")
    e_json('content',depth=1)
    e_drop_fields("content")
    
    # process overall_type logs
    e_if(e_search("__topic__==overall_type"), 
         e_compose(
           e_set("client_count", json_select(v("clients"), "length([*])", default=0)),
           e_set("server_count", json_select(v("servers"), "length([*])", default=0))
    ))
    e_if(e_search("__topic__==overall_type"), e_drop_fields("clients", "servers"))
    
    # added: pre-filter server_status: drop if empty (keep if non-empty)
    e_keep(op_and(e_search("__topic__==server_status"), json_select(v("servers"), "length([*])")))
    
    # process server_status logs
    e_if(e_search("__topic__==server_status"), 
         e_compose(
            e_split("servers"), 
            e_json("servers", depth=1)
      ))
    e_if(e_search("__topic__==server_status"), e_drop_fields("servers"))
    
    
    # added: pre-filter client_status: drop if empty (keep if non-empty)
    e_keep(op_and(e_search("__topic__==client_status"), json_select(v("clients"), "length([*])")))
    
    # process client_status logs
    e_if(e_search("__topic__==client_status"), 
         e_compose(
            e_split("clients"), 
            e_json("clients", depth=1),
            e_drop_fields("clients")
      ))
  • Solution 2

    Perform the check during the initial split: a topic is split out only when its corresponding data is non-empty.

    # initial topic
    e_set("__topic__", "server_status")
    
    # if content.servers is non-empty, split 1 log out of server_status
    e_if(json_select(v("content"), "length(servers[*])"),
       e_compose(
          e_set("__topic__", "server_status,overall_type"),
          e_split("__topic__")
       ))
    
    # if content.clients is non-empty, split 1 more log out of overall_type
    e_if(op_and(e_search("__topic__==overall_type"), json_select(v("content"), "length(clients[*])")),
       e_compose(
          e_set("__topic__", "client_status,overall_type"),
          e_split("__topic__")
       ))

    The complete LOG DSL rules:

    # overall split
    e_set("__topic__", "server_status")
    
    # if content.servers is non-empty, split 1 log out of server_status
    e_if(json_select(v("content"), "length(servers[*])"),
       e_compose(
          e_set("__topic__", "server_status,overall_type"),
          e_split("__topic__")
       ))
    
    # if content.clients is non-empty, split 1 more log out of overall_type
    e_if(op_and(e_search("__topic__==overall_type"), json_select(v("content"), "length(clients[*])")),
       e_compose(
          e_set("__topic__", "client_status,overall_type"),
          e_split("__topic__")
       ))
    
    # expand content (needed before the fields below can be read)
    e_json('content',depth=1)
    e_drop_fields("content")
    
    # process overall_type logs
    e_if(e_search("__topic__==overall_type"), 
         e_compose(
            e_set("client_count", json_select(v("clients"), "length([*])", default=0)),
            e_set("server_count", json_select(v("servers"), "length([*])", default=0))
    ))
    e_if(e_search("__topic__==overall_type"), e_drop_fields("clients", "servers"))
    
    # process server_status logs
    e_if(e_search("__topic__==server_status"), 
         e_compose(
            e_split("servers"), 
            e_json("servers", depth=1)
      ))
    e_if(e_search("__topic__==server_status"), e_drop_fields("servers"))
    
    
    # process client_status logs
    e_if(e_search("__topic__==client_status"), 
         e_compose(
            e_split("clients"), 
            e_json("clients", depth=1),
            e_drop_fields("clients")
      ))
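The non-empty check that both plans rely on can be illustrated with a small pure-Python sketch. The split_topics helper is hypothetical; the actual rules express this with e_keep/e_if and json_select as shown above.

```python
def split_topics(content):
    """Plan-2-style splitting: emit a status topic only when its source
    array is non-empty, so no empty events are created in the first place."""
    topics = ["overall_type"]
    if content.get("servers"):
        topics.append("server_status")
    if content.get("clients"):
        topics.append("client_status")
    return topics

# Log with both arrays empty: only the overall_type event survives.
empty = {"service": "search_service", "overal_status": "yellow",
         "servers": [], "clients": []}

# Log with one server and no clients: client_status is never split out.
full = {"service": "search_service", "overal_status": "yellow",
        "servers": [{"host": "192.0.2.1", "status": "green"}],
        "clients": []}
```

Plan 1 would instead generate all three topics and then drop the empty ones, which is simpler to maintain but does more work per event.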

Comparing the solutions

  • Solution 1 deletes the empty logs only after splitting them out. The logic is slightly redundant, but the rules are simple and easy to maintain. It is the recommended default.

  • Solution 2 checks before splitting, so it processes fewer events, but the rules are somewhat more verbose. It is recommended only in specific scenarios, for example when the initial split would otherwise generate a large number of extra events.

Transforming complex JSON data with multi-level nested arrays of objects

Taking a complex object that contains multi-level nested arrays as the sample, split every login record in login_histories, under each object in users, into its own login event.

  • Original log

    __source__:  192.0.2.1
    __topic__:  
    content:{
      "users": [
        {
            "name": "user1",
            "login_histories": [
              {
                "date": "2019-10-10 0:0:0",
                "login_ip": "192.0.2.6"
              },
              {
                "date": "2019-10-10 1:0:0",
                "login_ip": "192.0.2.6"
              },
          {
          ...more login records...
          }
            ]
        },
        {
            "name": "user2",
            "login_histories": [
              {
                "date": "2019-10-11 0:0:0",
                "login_ip": "192.0.2.7"
              },
              {
                "date": "2019-10-11 1:0:0",
                "login_ip": "192.0.2.9"
              },
          {
          ...more login records...
          }     
            ]
        },
      {
        ....more users....
      }
      ]
    }
  • Expected logs after the split

    __source__:  192.0.2.1
    name:  user1
    date:  2019-10-10 1:0:0
    login_ip:  192.0.2.6
    
    __source__: 192.0.2.1
    name:  user1
    date:  2019-10-10 0:0:0
    login_ip:  192.0.2.6
    
    __source__:  192.0.2.1
    name:  user2
    date:  2019-10-11 0:0:0
    login_ip:  192.0.2.7
    
    __source__: 192.0.2.1
    name:  user2
    date:  2019-10-11 1:0:0
    login_ip:  192.0.2.9  
    
    ....more logs....
  • Solution

    1. Split and expand users in content.

      e_split("content", jmes='users[*]', output='item')
      e_json("item",depth=1)

      The logs returned after processing:

      __source__:  192.0.2.1
      __topic__:  
      content:{...as before...}
      item:  {"name": "user1", "login_histories": [{"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"}, {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"}]}
      login_histories:  [{"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"}, {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"}]
      name:  user1
      
      __source__:  192.0.2.1
      __topic__:  
      content:{...as before...}
      item:  {"name": "user2", "login_histories": [{"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"}, {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"}]}
      login_histories:  [{"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"}, {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"}]
      name:  user2
    2. Split login_histories, then expand it.

      e_split("login_histories")
      e_json("login_histories", depth=1)

      The logs returned after processing:

      __source__:  192.0.2.1
      __topic__: 
      content: {...as before...}
      date:  2019-10-11 0:0:0
      item:  {"name": "user2", "login_histories": [{"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"}, {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"}]}
      login_histories:  {"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"}
      login_ip:  192.0.2.7
      name:  user2
      
      __source__:  192.0.2.1
      __topic__: 
      content: {...as before...}
      date:  2019-10-11 1:0:0
      item:  {"name": "user2", "login_histories": [{"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"}, {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"}]}
      login_histories:  {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"}
      login_ip:  192.0.2.9
      name:  user2
      
      __source__: 192.0.2.1
      __topic__:  
      content: {...as before...}
      date:  2019-10-10 1:0:0
      item:  {"name": "user1", "login_histories": [{"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"}, {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"}]}
      login_histories:  {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"}
      login_ip:  192.0.2.6
      name:  user1
      
      __source__: 192.0.2.1
      __topic__:  
      content: {...as before...}
      date:  2019-10-10 0:0:0
      item:  {"name": "user1", "login_histories": [{"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"}, {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"}]}
      login_histories:  {"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"}
      login_ip:  192.0.2.6
      name:  user1
    3. Delete the irrelevant fields.

      e_drop_fields("content", "item", "login_histories")

      The logs returned after processing:

      __source__: 192.0.2.1
      __topic__:
      name:  user1
      date:  2019-10-10 1:0:0
      login_ip:  192.0.2.6
      
      __source__:  192.0.2.1
      __topic__:
      name:  user1
      date:  2019-10-10 0:0:0
      login_ip:  192.0.2.6
      
      __source__:  192.0.2.1
      __topic__:
      name:  user2
      date:  2019-10-11 0:0:0
      login_ip:  192.0.2.7
      
      __source__: 192.0.2.1
      __topic__:
      name:  user2
      date:  2019-10-11 1:0:0
      login_ip:  192.0.2.9
    4. In summary, the LOG DSL rules can take the following form:

      e_split("content", jmes='users[*]', output='item')
      e_json("item",depth=1)
      e_split("login_histories")
      e_json("login_histories", depth=1)
      e_drop_fields("content", "item", "login_histories")
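The five DSL statements above can be mirrored in plain Python to verify the intended fan-out: one event per login record, carrying the user's name. The flatten_logins helper is hypothetical; the real transformation is performed by the DSL operators inside Log Service.

```python
import json

def flatten_logins(event):
    """Split users, then split each user's login_histories, producing one
    flat event per login record (mirrors the split/expand/drop steps)."""
    content = json.loads(event["content"])
    out = []
    for user in content["users"]:
        for login in user["login_histories"]:
            out.append({
                "__source__": event["__source__"],
                "name": user["name"],
                "date": login["date"],
                "login_ip": login["login_ip"],
            })
    return out

raw = {
    "__source__": "192.0.2.1",
    "content": json.dumps({
        "users": [
            {"name": "user1", "login_histories": [
                {"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"},
                {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"}]},
            {"name": "user2", "login_histories": [
                {"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"},
                {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"}]},
        ]
    }),
}

events = flatten_logins(raw)
# 2 users x 2 logins each = 4 flat login events
```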

Summary: for requirements like the above, first split the events, then expand them, and finally delete the irrelevant information.