logstash-input-sls plug-in - Elasticsearch

The logstash-input-sls plug-in is a default plug-in built into Alibaba Cloud Logstash. As a Logstash input plug-in, it retrieves logs from Log Service.

Note: logstash-input-sls is an open source plug-in maintained by Alibaba Cloud. For details, see logstash-input-logservice.

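Features

Distributed collaborative consumption: Multiple servers can be configured to consume the same Logstore at the same time.
Note: When multiple Logstash servers consume a Logstore collaboratively, a limitation of the logstash-input-sls plug-in requires that each server run only one input sls pipeline. If a single server runs more than one input sls pipeline, duplicate data may appear in the output.
High performance: The plug-in is built on the Java ConsumerGroup, and a single core can consume data at up to 20 MB/s.
High reliability: Consumption progress is saved on the server side. After a crash, consumption automatically resumes from the last checkpoint.
Automatic load balancing: Shards are assigned automatically based on the number of consumers, and are rebalanced automatically when a consumer joins or leaves.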

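Prerequisites

You have completed the following operations:

Install the logstash-input-sls plug-in.
For the procedure, see "Install or uninstall a plug-in".
Create a Log Service project and a Logstore, and collect data.
For the procedure, see the Log Service quick start tutorial.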

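Use the logstash-input-sls plug-in

After the prerequisites are met, you can create a pipeline task by managing pipelines with configuration files. When you create the pipeline task, configure the pipeline parameters as described below. After you save and deploy the configuration, Logstash starts to retrieve logs from Log Service.

Important: Before a RAM user uses the logstash-input-sls plug-in, permission policies for consumer groups must also be configured in Log Service. For details, see "Consume data through a consumer group".

The following sample configuration uses Alibaba Cloud Logstash to consume a Logstore and write the logs to Elasticsearch.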

input {
  logservice {
    endpoint => "your project endpoint"
    access_id => "your access id"
    access_key => "your access key"
    project => "your project name"
    logstore => "your logstore name"
    consumer_group => "consumer group name"
    consumer_name => "consumer name"
    position => "end"
    checkpoint_second => 30
    include_meta => true
    consumer_name_with_ip => true
  }
}

output {
  elasticsearch {
    hosts => ["http://es-cn-***.elasticsearch.aliyuncs.com:9200"]
    index => "<your_index>"
    user => "elastic"
    password => "changeme"
  }
}

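Assume that a Logstore has 10 shards, each with a data rate of 1 MB/s (10 MB/s in total), and that each Alibaba Cloud Logstash server can process 3 MB/s. You can allocate five Alibaba Cloud Logstash servers and create one input sls pipeline on each server. Configure the same consumer_group and consumer_name in the pipeline on every server, and set consumer_name_with_ip to true.

In this case, each server is assigned two shards and processes 2 MB/s of data.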
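
The scenario above can be sketched as a single pipeline input that is deployed unchanged on each of the five servers. This is a minimal sketch, not a verified deployment: the consumer group name "shared_group" and the consumer name "logstash_consumer" are hypothetical, and the remaining quoted values are placeholders to replace with your own. With consumer_name_with_ip set to true, the consumer name includes the server's IP address, which is presumably what keeps the names distinct within the group even though every server uses the same consumer_name.

```logstash
input {
  logservice {
    # Placeholder connection settings; replace with your own values.
    endpoint => "your project endpoint"
    access_id => "your access id"
    access_key => "your access key"
    project => "your project name"
    logstore => "your logstore name"
    # Hypothetical names, identical on all five servers.
    consumer_group => "shared_group"
    consumer_name => "logstash_consumer"
    position => "end"
    # Required for distributed collaborative consumption.
    consumer_name_with_ip => true
  }
}
```

Only one such input sls pipeline should run on each server; running more than one on a single server may produce duplicate data in the output.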

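Parameters

logstash-input-sls supports the following parameters.

Parameter	Type	Required	Description
endpoint	String	Yes	Endpoint of the Log Service project in the VPC network. For details, see "Private network".
access_id	String	Yes	Alibaba Cloud AccessKey ID. The AccessKey must have ConsumerGroup permissions. For details, see "Consume data through a consumer group".
access_key	String	Yes	Alibaba Cloud AccessKey Secret. The AccessKey must have ConsumerGroup permissions. For details, see "Consume data through a consumer group".
project	String	Yes	Name of the Log Service project.
logstore	String	Yes	Name of the Log Service Logstore.
consumer_group	String	Yes	Custom consumer group name.
consumer_name	String	Yes	Custom consumer name. Consumer names must be unique within a consumer group; otherwise, the behavior is undefined.
position	String	Yes	Start position for consumption. Valid values: begin (start from the first log written to the Logstore), end (start from the current time), or yyyy-MM-dd HH:mm:ss (start from the specified time).
checkpoint_second	Number	No	Interval between checkpoints, in seconds. Recommended range: 10 to 60. Minimum: 10. Default: 30.
include_meta	Boolean	No	Whether incoming logs include metadata: the log source, time, tag, and topic. Default: true.
consumer_name_with_ip	Boolean	No	Whether the consumer name includes the IP address. Default: true. Must be true for distributed collaborative consumption.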
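
As an illustration of the parameters in the table above, the following sketch starts consumption from a fixed point in time and uses the shortest checkpoint interval the table allows. The timestamp, consumer group name, and consumer name are hypothetical, and the other quoted values are placeholders. This document does not state which time zone the position timestamp is interpreted in, so treat that as something to verify. The optional parameters include_meta and consumer_name_with_ip are omitted and therefore keep their default value of true.

```logstash
input {
  logservice {
    # Placeholder connection settings; replace with your own values.
    endpoint => "your project endpoint"
    access_id => "your access id"
    access_key => "your access key"
    project => "your project name"
    logstore => "your logstore name"
    # Hypothetical names for illustration only.
    consumer_group => "replay_group"
    consumer_name => "replay_consumer"
    # Hypothetical start time in yyyy-MM-dd HH:mm:ss format.
    position => "2024-01-01 00:00:00"
    # Checkpoint every 10 seconds, the documented minimum.
    checkpoint_second => 10
  }
}
```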

Performance benchmark information

Test environment
- Processor: Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz, 4 cores
- Memory: 8 GB
- Environment: Linux

Alibaba Cloud Logstash configuration

input {
  logservice {
    endpoint => "cn-hangzhou-intranet.log.aliyuncs.com"
    access_id => "***"
    access_key => "***"
    project => "test-project"
    logstore => "logstore1"
    consumer_group => "consumer_group1"
    consumer_name => "consumer1"
    position => "end"
    checkpoint_second => 30
    include_meta => true
    consumer_name_with_ip => true
  }
}
output {
  elasticsearch {
    hosts => ["http://es-cn-***.elasticsearch.aliyuncs.com:9200"]
    index => "myindex"
    user => "elastic"
    password => "changeme"
  }
}

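Test procedure
1. Use a Java producer to send data to the Logstore at 2 MB, 4 MB, 8 MB, 16 MB, and 32 MB per second in turn.
  Each log is about 500 bytes and contains 10 key-value pairs.
2. Start Alibaba Cloud Logstash to consume the data in the Logstore, and make sure that the consumption latency does not increase, that is, consumption keeps pace with production.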

Test results
Traffic (MB/s)	CPU utilization (%)	Memory usage (GB)
32	170.3	1.3
16	83.3	1.3
8	41.5	1.3
4	21.0	1.3
2	11.3	1.3
