logstash-input-sls is a built-in input plug-in of Alibaba Cloud Logstash. This plug-in retrieves logs from Log Service.
Features
- Distributed collaborative consumption: Multiple servers can be configured to consume log data in a Logstore at the same time, as shown in the sketch after this list.
Note If you want to use multiple Logstash servers for distributed collaborative consumption, make sure that only one pipeline that uses logstash-input-sls is deployed on each server. This limit is imposed by the logstash-input-sls plug-in: if multiple pipelines that use logstash-input-sls run on the same server, duplicate data may appear in the output.
- High performance: logstash-input-sls consumes data by using a Java consumer group. The consumption speed of a single CPU core can reach 20 MB/s.
- High reliability: logstash-input-sls saves the consumption progress on the server side. If a consumer recovers from an exception, it resumes consumption from the last saved checkpoint.
- Automatic load balancing: Shards are automatically allocated based on the number of consumers in a consumer group. If consumers join or leave the consumer group, logstash-input-sls automatically re-allocates the shards.
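The following sketch shows a pipeline configuration for distributed collaborative consumption across two or more Logstash servers. It is a minimal example, assuming that each server runs exactly one pipeline with this configuration and that all servers use the same consumer_group value; the endpoint, AccessKey pair, project, and Logstore names are placeholders, and the stdout output is used only for illustration.
# Deploy the same pipeline on each Logstash server. Shards are balanced
# across the servers because they share one consumer group.
input {
  logservice {
    endpoint => "your project endpoint"
    access_id => "your access id"
    access_key => "your access key"
    project => "your project name"
    logstore => "your logstore name"
    consumer_group => "shared consumer group"   # identical on every server
    consumer_name => "consumer"                 # consumer_name_with_ip keeps the name unique per server
    position => "end"
    consumer_name_with_ip => true
  }
}
output {
  stdout { codec => rubydebug }   # placeholder output for testing; replace with your real destination
}
Because consumer_name_with_ip is set to true, each server registers a distinct consumer, and logstash-input-sls automatically distributes the shards of the Logstore across those consumers.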
Prerequisites
- The logstash-input-sls plug-in is installed.
For more information, see Install and remove a plug-in.
- A Log Service project and a Logstore are created. Data is collected.
For more information, see Quick start.
logstash-input-sls application
The following example shows how to use Alibaba Cloud Logstash to consume log data in a Logstore and transfer the data to Alibaba Cloud Elasticsearch:
input {
  logservice {
    endpoint => "your project endpoint"
    access_id => "your access id"
    access_key => "your access key"
    project => "your project name"
    logstore => "your logstore name"
    consumer_group => "consumer group name"
    consumer_name => "consumer name"
    position => "end"
    checkpoint_second => 30
    include_meta => true
    consumer_name_with_ip => true
  }
}
output {
  elasticsearch {
    hosts => ["http://es-cn-***.elasticsearch.aliyuncs.com:9200"]
    index => "<your_index>"
    user => "elastic"
    password => "changeme"
  }
}
Make the following assumptions: The Logstore has 10 shards. The data traffic of each shard is 1 MB/s. The processing capacity of each Logstash server is 3 MB/s. Five Logstash servers are allocated, and one pipeline with logstash-input-sls installed is deployed on each server. Each server uses the same settings of consumer_group and consumer_name. consumer_name_with_ip is set to true.
In this case, each server is allocated 10 ÷ 5 = 2 shards and consumes 2 × 1 MB/s = 2 MB/s of data, which is within the 3 MB/s processing capacity of each server.
Parameters
The following table describes the parameters supported by logstash-input-sls. A sketch that adjusts the position and checkpoint_second parameters follows the table.
Parameter | Type | Required | Description |
---|---|---|---|
endpoint | String | Yes | The endpoint of the Log Service project in a virtual private cloud (VPC). For more information, see Internal Log Service endpoints. |
access_id | String | Yes | The AccessKey ID of your account that has permissions on the consumer group. For more information, see Use consumer groups to consume logs. |
access_key | String | Yes | The AccessKey secret of your account that has permissions on the consumer group. For more information, see Use consumer groups to consume logs. |
project | String | Yes | The name of the Log Service project. |
logstore | String | Yes | The name of the Logstore. |
consumer_group | String | Yes | The name of the consumer group, which can be customized. |
consumer_name | String | Yes | The name of a consumer, which can be customized. The name of each consumer must be unique in a consumer group. Otherwise, undefined behavior may occur. |
position | String | Yes | The start position of log consumption. Valid values: begin (consume from the earliest data in the Logstore) and end (consume from the latest data). |
checkpoint_second | Number | No | The interval at which consumption checkpoints are saved. Unit: seconds. We recommend a value from 10 to 60. Minimum value: 10. Default value: 30. |
include_meta | Boolean | No | Specifies whether input logs contain metadata, such as the log source, time, tag, and topic. Default value: true. |
consumer_name_with_ip | Boolean | No | Specifies whether the consumer name contains an IP address. Default value: true. For distributed collaborative consumption, set the value to true. |
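As an example of adjusting these parameters, the following hypothetical input block sets position to begin so that consumption starts from the earliest data in the Logstore and saves a checkpoint every 10 seconds; the connection values are placeholders.
input {
  logservice {
    endpoint => "your project endpoint"
    access_id => "your access id"
    access_key => "your access key"
    project => "your project name"
    logstore => "your logstore name"
    consumer_group => "consumer group name"
    consumer_name => "consumer name"
    position => "begin"        # start from the earliest data instead of the latest
    checkpoint_second => 10    # save the consumption checkpoint every 10 seconds (minimum value)
    include_meta => true
    consumer_name_with_ip => true
  }
}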
Performance benchmark test
- Test environment
- CPU: 4-core Intel(R) Xeon(R) Platinum 8163 @ 2.50 GHz
- Memory: 8 GB
- Operating system: Linux
- Configuration of a Logstash pipeline
input {
  logservice {
    endpoint => "cn-hangzhou-intranet.log.aliyuncs.com"
    access_id => "***"
    access_key => "***"
    project => "test-project"
    logstore => "logstore1"
    consumer_group => "consumer_group1"
    consumer_name => "consumer1"
    position => "end"
    checkpoint_second => 30
    include_meta => true
    consumer_name_with_ip => true
  }
}
output {
  elasticsearch {
    hosts => ["http://es-cn-***.elasticsearch.aliyuncs.com:9200"]
    index => "myindex"
    user => "elastic"
    password => "changeme"
  }
}
- Test procedure
- Use a Java producer to send 2 MB, 4 MB, 8 MB, 16 MB, and 32 MB of data to the Logstore every second.
Each log entry contains 10 key-value pairs and is about 500 bytes in length.
- Start Logstash to consume data in the Logstore. Make sure that the consumption latency does not increase and the consumption speed can keep pace with the production speed.
- Test results
Traffic (MB/s) | CPU utilization (%) | Memory usage (GB) |
---|---|---|
32 | 170.3 | 1.3 |
16 | 83.3 | 1.3 |
8 | 41.5 | 1.3 |
4 | 21.0 | 1.3 |
2 | 11.3 | 1.3 |