The logstash-input-oss plug-in is developed based on Alibaba Cloud Simple Message Queue (SMQ, formerly MNS). When an associated object in Object Storage Service (OSS) is updated, SMQ sends a notification to the plug-in, which then triggers Alibaba Cloud Logstash to read the latest data from OSS. You can configure event notification rules on the event notification page of the OSS console so that OSS automatically sends a notification to SMQ when an associated object is updated.
logstash-input-oss is an open source plug-in. For more information, see logstash-input-oss.
Usage notes
After logstash-input-oss receives a notification from SMQ, Logstash extracts the key of the associated object (for example, an object created by the PutObject or AppendObject operation) from the notification and synchronizes all data in the object.
If the associated object has the .gz or .gzip file name extension, Logstash processes the object as a gzip-compressed text object. Objects with other extensions are processed as plain text.
Because objects are read as text, objects in formats that cannot be parsed as text, such as the .jar or .bin format, may be read as garbled characters.
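For example, a minimal input configuration for plain-text log objects might look like the following sketch. The bucket, credential, and queue values are placeholders. Because decompression is decided by the .gz or .gzip extension, no additional setting is required for compressed text logs:

```
input {
  oss {
    endpoint => "oss-cn-hangzhou-internal.aliyuncs.com"
    bucket => "my-log-bucket"                 # hypothetical bucket name
    access_key_id => "******"
    access_key_secret => "******"
    mns_settings => {
      endpoint => "******.mns.cn-hangzhou-internal.aliyuncs.com"
      queue => "my-log-queue"                 # hypothetical queue name
    }
    codec => plain {                          # read each object as plain text
      charset => "UTF-8"
    }
  }
}
```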
Prerequisites
The logstash-input-oss plug-in is installed.
For more information, see Install or remove a Logstash plug-in.
OSS and SMQ are activated, and OSS buckets are in the same region as SMQ queues or topics.
For more information, see Activate OSS and Activate SMQ and authorize RAM users to access SMQ.
Event notification rules are configured in OSS.
For more information, see Configure event notification rules.
Use logstash-input-oss
Create a pipeline by following the instructions provided in Use configuration files to manage pipelines. When you create the pipeline, configure the pipeline parameters that are described in the table of the Parameters section. After you configure the parameters, save the settings and deploy the pipeline. This way, Logstash can be triggered to obtain data from OSS.
The following code provides a pipeline configuration example. In the example, the pipeline is used to obtain data from OSS and write the data to Alibaba Cloud Elasticsearch.
input {
  oss {
    endpoint => "oss-cn-hangzhou-internal.aliyuncs.com"
    bucket => "zl-ossou****"
    access_key_id => "******"
    access_key_secret => "*********"
    prefix => "file-sample-prefix"
    mns_settings => {
      endpoint => "******.mns.cn-hangzhou-internal.aliyuncs.com"
      queue => "aliyun-es-sample-mns"
    }
    codec => json {
      charset => "UTF-8"
    }
  }
}
output {
  elasticsearch {
    hosts => ["http://es-cn-***.elasticsearch.aliyuncs.com:9200"]
    index => "aliyun-es-sample"
    user => "elastic"
    password => "changeme"
  }
}
The SMQ endpoint cannot contain the http:// prefix and must be an internal endpoint. Otherwise, an error is reported.
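For example, assuming the account ID 1234567890 and the China (Hangzhou) region, the mns_settings endpoint would be written as follows:

```
# Incorrect: carries an http:// prefix and uses the public endpoint
# endpoint => "http://1234567890.mns.cn-hangzhou.aliyuncs.com"

# Correct: internal endpoint without a protocol prefix
endpoint => "1234567890.mns.cn-hangzhou-internal.aliyuncs.com"
```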
Parameters
The following table describes the parameters supported by logstash-input-oss.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| endpoint | string | Yes | The endpoint that is used to access OSS. For more information, see Regions, endpoints and open ports. |
| bucket | string | Yes | The name of the OSS bucket. |
| access_key_id | string | Yes | The AccessKey ID of your Alibaba Cloud account. |
| access_key_secret | string | Yes | The AccessKey secret of your Alibaba Cloud account. |
| prefix | string | No | The prefix that the name of an object or directory in the OSS bucket must match. The prefix is not a regular expression. You can configure this parameter to enable Logstash to read data only from specific directories in the OSS bucket. |
| additional_oss_settings | hash | No | Additional configurations for the OSS client. |
| delete | boolean | No | Specifies whether to delete processed objects from the original OSS bucket. Valid values: true (delete processed objects) and false (retain processed objects). |
| backup_to_bucket | string | No | The name of the OSS bucket that stores the backups of processed objects. |
| backup_to_dir | string | No | The local directory that stores the backups of processed objects. |
| backup_add_prefix | string | No | The prefix to add to the key of an object after the object is processed. The key is the full path of the object in OSS, including the object name. If you back up data to the same or another OSS bucket, you can use this parameter to specify a folder for the backups. |
| include_object_properties | boolean | No | Specifies whether to include the properties of an OSS object, such as last_modified, content_type, and metadata, in [@metadata][oss]. Valid values: true (include the properties) and false (exclude the properties). Regardless of this setting, [@metadata][oss][key] always exists. |
| exclude_pattern | string | No | A Ruby regular expression that matches the keys of the objects that you want to exclude from the OSS bucket. |
| mns_settings | hash | Yes | The configurations of SMQ, such as the endpoint and queue sub-parameters shown in the preceding pipeline example. For more information about ReceiveMessage, see ReceiveMessage. |
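As an illustration, the optional backup and filtering parameters can be combined as follows. This is a sketch: the backup bucket name, prefix, and exclusion pattern are hypothetical, and the required parameters are elided.

```
input {
  oss {
    # ... required parameters such as endpoint, bucket, credentials, and mns_settings ...
    delete => true                            # remove objects from the source bucket after processing
    backup_to_bucket => "my-backup-bucket"    # hypothetical backup bucket
    backup_add_prefix => "processed/"         # store backups under the processed/ folder
    exclude_pattern => "\.tmp$"               # skip objects whose keys end with .tmp
  }
}
```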
FAQ
Q: Why is the logstash-input-oss plug-in developed based on SMQ?
A: A mechanism is required to notify clients of OSS object updates, and OSS object update events can be seamlessly written to SMQ.
Q: Why is the ListObjects API operation of OSS not used to obtain updated objects?
A: A solution based on ListObjects must use local storage to record which objects have and have not been processed. As the number of objects grows, this storage overhead increases and the performance of ListObjects decreases. For similar reasons, open source plug-ins for other object storage services, such as Amazon S3, have also replaced ListObjects with a message notification mechanism.
Q: Logstash is triggered to obtain data from OSS while data is still being written to OSS. What does logstash-input-oss do? Can data be lost?
A: The data that has been written to OSS is recorded in the SMQ queue, and logstash-input-oss transmits that data to Elasticsearch through the Logstash pipeline. The data that has not yet been written continues to be written to OSS, and logstash-input-oss obtains it the next time Logstash is triggered.