All Products
Search
Document Center

Elasticsearch:Use the logstash-input-oss plug-in

Last Updated:Oct 15, 2024

The logstash-input-oss plug-in is developed based on Alibaba Cloud Simple Message Queue (formerly MNS) (SMQ). When an associated object in Object Storage Service (OSS) is updated, SMQ sends a notification to the plug-in. After the plug-in receives the notification, the plug-in triggers Alibaba Cloud Logstash to read the latest data from OSS. You can configure event notification rules on the event notification page of OSS. This way, OSS can automatically send a notification to SMQ when an associated object in OSS is updated.

Note

logstash-input-oss is an open source plug-in. For more information, see logstash-output-oss.

Usage notes

  • After logstash-input-oss receives a notification from SMQ, Logstash extracts the associated object (such as an object created by using the PutObject or AppendObject operation) from the notification and synchronizes all data in the object.

  • If the associated object is a text object in the .gz or .gzip format, Logstash processes the object as an object in the .gzip format. Objects in other formats are processed as objects in the text format.

  • Objects are read in the text format. If an object is in a format that cannot be parsed, such as the .jar or .bin format, the object may be read as garbled characters.

Prerequisites

Use logstash-input-oss

Create a pipeline by following the instructions provided in Use configuration files to manage pipelines. When you create the pipeline, configure the pipeline parameters that are described in the table of the Parameters section. After you configure the parameters, save the settings and deploy the pipeline. This way, Logstash can be triggered to obtain data from OSS.

The following code provides a pipeline configuration example. In the example, the pipeline is used to obtain data from OSS and write the data to Alibaba Cloud Elasticsearch.

input {
 oss {
  endpoint => "oss-cn-hangzhou-internal.aliyuncs.com"
  bucket => "zl-ossou****"
  access_key_id => "******"
  access_key_secret => "*********"
  prefix => "file-sample-prefix"
  mns_settings => {
   endpoint => "******.mns.cn-hangzhou-internal.aliyuncs.com"
   queue => "aliyun-es-sample-mns"
  }
  codec => json {
   charset => "UTF-8"
  }
 }
}

output {
  elasticsearch {
    hosts => ["http://es-cn-***.elasticsearch.aliyuncs.com:9200"]
    index => "aliyun-es-sample"
    user => "elastic"
    password => "changeme"
  }
}
Important

The SMQ endpoint cannot be prefixed with http and must be an internal endpoint. Otherwise, an error is reported.

Parameters

The following table describes the parameters supported by logstash-input-oss.

Parameter

Type

Required

Description

endpoint

string

Yes

The endpoint that is used to access OSS. For more information, see Regions, endpoints and open ports.

bucket

string

Yes

The name of the OSS bucket.

access_key_id

string

Yes

The AccessKey ID of your Alibaba Cloud account.

access_key_secret

string

Yes

The AccessKey secret of your Alibaba Cloud account.

prefix

string

No

If you configure this parameter, you must make sure that the value of this parameter matches the prefix of the name of an object or directory in the OSS bucket. The prefix is not a regular expression. You can configure this parameter to enable Logstash to read data from one or more specific directories in the OSS bucket.

additional_oss_settings

hash

No

Additional OSS client configurations. You can configure the following parameters:

  • secure_connection_enabled: specifies whether to enable secure connections.

  • max_connections_to_oss: specifies the maximum number of connections to OSS.

delete

boolean

No

Specifies whether to delete processed objects from the original OSS bucket. Valid values:

  • true

  • false (default value)

backup_to_bucket

string

No

The name of the OSS bucket that stores the backups of processed objects.

backup_to_dir

string

No

The local directory that stores the backups of processed files.

backup_add_prefix

string

No

The prefix of a key after an object is processed. The key is a full path that includes the object name in OSS. If you back up data to the same or another OSS bucket, you can use this parameter to specify a new folder to store backups.

include_object_properties

boolean

No

Specifies whether to include the properties of an OSS object in [@metadata][oss]. The properties refer to last_modified, content_type, and metadata. Valid values:

  • true

  • false

If this parameter is not configured, [@metadata][oss][key] always exists.

exclude_pattern

string

No

The Ruby-based regular expression of the key that you want to exclude from the OSS bucket.

mns_settings

hash

Yes

The configurations of SMQ. Valid values:

  • endpoint: the endpoint that is used to access SMQ. The endpoint cannot be prefixed with http and must be an internal endpoint. Otherwise, an error is reported.

  • queue: the name of an SMQ queue.

  • poll_interval_seconds: the maximum waiting time for a ReceiveMessage request when the queue contains no messages. Default value: 10s.

  • wait_seconds: the maximum polling waiting time for a ReceiveMessage request. Unit: seconds.

For more information about ReceiveMessage, see ReceiveMessage.

FAQ

  • Q: Why is the logstash-input-oss plug-in developed based on SMQ?

    A: A mechanism is required to notify clients of OSS object updates. The events about OSS object updates can be seamlessly written to SMQ.

  • Q: Why is the ListObjects API operation of OSS not used to obtain updated objects?

    A: OSS consumes more local storage when it records unprocessed and processed objects. When the local storage consumption is high, the performance of ListObjects decreases. Other file storage systems, such as the Amazon S3 open source community, also replaced ListObjects with the message notification mechanism.

  • Q: Logstash is triggered to obtain data from OSS when OSS is writing data. What does logstash-input-oss do? Can some data be lost?

    A: logstash-input-oss records the data that has been written in the SMQ queue and transmits the data to Elasticsearch through the Logstash pipeline. The data that has not been written continues to be written to OSS. logstash-input-oss obtains the data from OSS until Logstash is triggered to obtain data from OSS again.