The logstash-input-oss plug-in is developed based on Alibaba Cloud Simple Message Queue (SMQ, formerly MNS). When an associated object in Object Storage Service (OSS) is updated, SMQ sends a notification to the plug-in, which then triggers Alibaba Cloud Logstash to read the latest data from OSS. You can configure event notification rules on the event notification page of the OSS console so that OSS automatically sends a notification to SMQ when an associated object is updated.
logstash-input-oss is an open source plug-in. For more information, see logstash-input-oss.
Usage notes
After logstash-input-oss receives a notification from SMQ, Logstash extracts the key of the associated object (for example, an object created by the PutObject or AppendObject operation) from the notification and synchronizes all data in the object.
If the associated object has the .gz or .gzip file name extension, Logstash processes the object as a gzip-compressed text object. Objects with other extensions are processed as plain text.
Because objects are read as text, objects in formats that cannot be parsed as text, such as the .jar or .bin format, may be read as garbled characters.
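For example, a minimal input configuration for plain-text log objects might look like the following sketch. The bucket, credential, and queue values are placeholders. Because decompression is decided by the .gz or .gzip extension, no additional setting is required for compressed text logs:

```
input {
  oss {
    endpoint => "oss-cn-hangzhou-internal.aliyuncs.com"
    bucket => "my-log-bucket"                 # hypothetical bucket name
    access_key_id => "******"
    access_key_secret => "******"
    mns_settings => {
      endpoint => "******.mns.cn-hangzhou-internal.aliyuncs.com"
      queue => "my-log-queue"                 # hypothetical queue name
    }
    codec => plain {                          # read each object as plain text
      charset => "UTF-8"
    }
  }
}
```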
Prerequisites
The logstash-input-oss plug-in is installed.
For more information, see Install or remove a Logstash plug-in.
OSS and SMQ are activated, and OSS buckets are in the same region as SMQ queues or topics.
For more information, see Activate OSS and Activate SMQ and authorize RAM users to access SMQ.
Event notification rules are configured in OSS.
For more information, see Configure event notification rules.
Use logstash-input-oss
Create a pipeline by following the instructions provided in Use configuration files to manage pipelines. When you create the pipeline, configure the pipeline parameters that are described in the table of the Parameters section. After you configure the parameters, save the settings and deploy the pipeline. This way, Logstash can be triggered to obtain data from OSS.
The following code provides a pipeline configuration example. In the example, the pipeline is used to obtain data from OSS and write the data to Alibaba Cloud Elasticsearch.
input {
  oss {
    endpoint => "oss-cn-hangzhou-internal.aliyuncs.com"
    bucket => "zl-ossou****"
    access_key_id => "******"
    access_key_secret => "*********"
    prefix => "file-sample-prefix"
    mns_settings => {
      endpoint => "******.mns.cn-hangzhou-internal.aliyuncs.com"
      queue => "aliyun-es-sample-mns"
    }
    codec => json {
      charset => "UTF-8"
    }
  }
}
output {
  elasticsearch {
    hosts => ["http://es-cn-***.elasticsearch.aliyuncs.com:9200"]
    index => "aliyun-es-sample"
    user => "elastic"
    password => "changeme"
  }
}
The SMQ endpoint cannot contain the http:// prefix and must be an internal endpoint. Otherwise, an error is reported.
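For example, assuming the account ID 1234567890 and the China (Hangzhou) region, the mns_settings endpoint would be written as follows:

```
# Incorrect: carries an http:// prefix and uses the public endpoint
# endpoint => "http://1234567890.mns.cn-hangzhou.aliyuncs.com"

# Correct: internal endpoint without a protocol prefix
endpoint => "1234567890.mns.cn-hangzhou-internal.aliyuncs.com"
```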
Parameters
The following table describes the parameters supported by logstash-input-oss.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| endpoint | string | Yes | The endpoint that is used to access OSS. For more information, see Regions, endpoints and open ports. |
| bucket | string | Yes | The name of the OSS bucket. |
| access_key_id | string | Yes | The AccessKey ID of your Alibaba Cloud account. |
| access_key_secret | string | Yes | The AccessKey secret of your Alibaba Cloud account. |
| prefix | string | No | The prefix that the name of an object or directory in the OSS bucket must match. The prefix is not a regular expression. You can configure this parameter to enable Logstash to read data only from specific directories in the OSS bucket. |
| additional_oss_settings | hash | No | Additional configurations for the OSS client. |
| delete | boolean | No | Specifies whether to delete processed objects from the original OSS bucket. Valid values: true (delete processed objects) and false (retain processed objects). |
| backup_to_bucket | string | No | The name of the OSS bucket that stores the backups of processed objects. |
| backup_to_dir | string | No | The local directory that stores the backups of processed objects. |
| backup_add_prefix | string | No | The prefix to add to the key of an object after the object is processed. The key is the full path of the object in OSS, including the object name. If you back up data to the same or another OSS bucket, you can use this parameter to specify a folder for the backups. |
| include_object_properties | boolean | No | Specifies whether to include the properties of an OSS object, such as last_modified, content_type, and metadata, in [@metadata][oss]. Valid values: true (include the properties) and false (exclude the properties). Regardless of this setting, [@metadata][oss][key] always exists. |
| exclude_pattern | string | No | A Ruby regular expression that matches the keys of the objects that you want to exclude from the OSS bucket. |
| mns_settings | hash | Yes | The configurations of SMQ, such as the endpoint and queue sub-parameters shown in the preceding pipeline example. For more information about ReceiveMessage, see ReceiveMessage. |
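As an illustration, the optional backup and filtering parameters can be combined as follows. This is a sketch: the backup bucket name, prefix, and exclusion pattern are hypothetical, and the required parameters are elided.

```
input {
  oss {
    # ... required parameters such as endpoint, bucket, credentials, and mns_settings ...
    delete => true                            # remove objects from the source bucket after processing
    backup_to_bucket => "my-backup-bucket"    # hypothetical backup bucket
    backup_add_prefix => "processed/"         # store backups under the processed/ folder
    exclude_pattern => "\.tmp$"               # skip objects whose keys end with .tmp
  }
}
```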
FAQ
Q: Why is the logstash-input-oss plug-in developed based on SMQ?
A: A mechanism is required to notify clients of OSS object updates, and OSS object update events can be seamlessly written to SMQ.
Q: Why is the ListObjects API operation of OSS not used to obtain updated objects?
A: A solution based on ListObjects must use local storage to record which objects have and have not been processed. As the number of objects grows, this storage overhead increases and the performance of ListObjects decreases. For similar reasons, open source plug-ins for other object storage services, such as Amazon S3, have also replaced ListObjects with a message notification mechanism.
Q: Logstash is triggered to obtain data from OSS while data is still being written to OSS. What does logstash-input-oss do? Can data be lost?
A: The data that has been written to OSS is recorded in the SMQ queue, and logstash-input-oss transmits that data to Elasticsearch through the Logstash pipeline. The data that has not yet been written continues to be written to OSS, and logstash-input-oss obtains it the next time Logstash is triggered.