Simple Log Service: Use Flume to consume log data

Last Updated: Sep 01, 2024

This topic describes how to use Flume to consume log data. You can use the aliyun-log-flume plug-in to connect Simple Log Service with Flume and write log data to Simple Log Service or consume log data from Simple Log Service.

Background information

The aliyun-log-flume plug-in connects Simple Log Service with Flume. Through Flume, Simple Log Service can then exchange data with other systems such as Hadoop Distributed File System (HDFS) and Kafka. The plug-in provides a sink and a source:

  • Sink: reads data from other data sources and writes the data to Simple Log Service.

  • Source: consumes log data from Simple Log Service and writes the log data to other systems.

For more information, see aliyun-log-flume.

Procedure

  1. Download and install Flume.

    For more information, see Apache Flume.

  2. Download the aliyun-log-flume plug-in and save the plug-in in the ***/flume/lib directory.

    To download the plug-in, click aliyun-log-flume-1.3.jar.

  3. In the ***/flume/conf directory, create a configuration file named flumejob.conf.

    • For information about how to configure a sink, see Sink.

    • For information about how to configure a source, see Source.

  4. Start Flume.
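
As a rough sketch of what flumejob.conf might look like, a Flume configuration file names the sources, channels, and sinks of an agent and then configures each one. The agent name `agent` and the component names `mySource`, `memoryChannel`, and `mySink` below are placeholders chosen for illustration, not names required by the plug-in. The netcat source and logger sink are standard Flume components, used here only to keep the skeleton complete.

```properties
# Hypothetical agent named "agent" with one source, one channel, and one sink.
agent.sources = mySource
agent.channels = memoryChannel
agent.sinks = mySink

# Stand-in source: a standard Flume netcat source listening on a local port.
agent.sources.mySource.type = netcat
agent.sources.mySource.bind = localhost
agent.sources.mySource.port = 44444
agent.sources.mySource.channels = memoryChannel

# In-memory channel that buffers events between the source and the sink.
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 10000
agent.channels.memoryChannel.transactionCapacity = 1000

# Stand-in sink: a standard Flume logger sink that writes events to the Flume log.
agent.sinks.mySink.type = logger
agent.sinks.mySink.channel = memoryChannel
```

With a file like this in place, a stock Flume installation is typically started with a command along the lines of `bin/flume-ng agent --conf conf --conf-file conf/flumejob.conf --name agent`, where the value of `--name` matches the agent name used in the file. To work with Simple Log Service, replace the stand-in source or sink with the source or sink described in the following sections.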

Sink

You can configure a sink to write data from other data sources to Simple Log Service by using Flume. The following modes are supported for parsing:

  • SIMPLE: A Flume event is written to Simple Log Service as a field.

  • DELIMITED: A Flume event is parsed into fields based on the configured column names and written to Simple Log Service.
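
To make the difference between the two modes concrete, the fragment below sketches a sink in DELIMITED mode. The agent name `agent`, the sink name `slsSink`, the column names, and the sample event are all made up for illustration. The `serializer` and `columns` parameters are the ones described in the parameter table in this section.

```properties
# Hypothetical sink "slsSink" on an agent named "agent".
# DELIMITED mode: split each event body into the named columns, in order.
agent.sinks.slsSink.serializer = DELIMITED
agent.sinks.slsSink.columns = user,status,path

# Illustrative event body:   alice,200,/index.html
# Expected resulting fields: user=alice, status=200, path=/index.html
#
# With serializer = SIMPLE (the default), the same event body would instead
# be written to Simple Log Service as the value of a single field.
```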

The following table describes the parameters of a sink.

| Parameter | Required | Description |
| --- | --- | --- |
| type | Yes | The type of the sink. Set the value to com.aliyun.loghub.flume.sink.LoghubSink. |
| endpoint | Yes | The endpoint of the Simple Log Service project. Example: http://cn-qingdao.log.aliyuncs.com. Enter an endpoint based on your business scenario. For more information, see Endpoints. |
| project | Yes | The name of the project. |
| logstore | Yes | The name of the Logstore. |
| accessKeyId | Yes | The AccessKey ID provided by Alibaba Cloud. The AccessKey ID is used to identify the user. To ensure the security of your account, we recommend that you use the AccessKey pair of a RAM user. For more information about how to obtain an AccessKey pair, see AccessKey pair. |
| accessKey | Yes | The AccessKey secret provided by Alibaba Cloud. The AccessKey secret is used to authenticate the user. To ensure the security of your account, we recommend that you use the AccessKey pair of a RAM user. For more information about how to obtain an AccessKey pair, see AccessKey pair. |
| batchSize | No | The number of data entries that are written to Simple Log Service at a time. Default value: 1000. |
| maxBufferSize | No | The maximum number of data entries in the cache queue. Default value: 1000. |
| serializer | No | The serialization mode of Flume events. Valid values: DELIMITED (delimiter mode), SIMPLE (single-line mode, the default), JSON (JSON mode), or the fully qualified class name of a custom serializer. |
| columns | No | The column names. Required if you set the serializer parameter to DELIMITED. Separate multiple columns with commas (,). List the columns in the same order as the values appear in the data entries. |
| separatorChar | No | The delimiter, which must be a single character. Applies when the serializer parameter is set to DELIMITED. Default value: a comma (,). |
| quoteChar | No | The quote character. Applies when the serializer parameter is set to DELIMITED. Default value: a double quotation mark ("). |
| escapeChar | No | The escape character. Applies when the serializer parameter is set to DELIMITED. Default value: a double quotation mark ("). |
| useRecordTime | No | Specifies whether to use the value of the timestamp field in the data entries as the log time when data is written to Simple Log Service. Default value: false, which indicates that the current time is used as the log time. |

For a configuration example of a sink, visit GitHub.
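
As a hedged illustration of how the sink parameters fit together, the following sketch wires a standard Flume netcat source through a memory channel into the Simple Log Service sink. The agent and component names are hypothetical, the angle-bracket values are placeholders to fill in, and the sink class name is written with an all-lowercase package, which is an assumption worth checking against the plug-in JAR that you downloaded.

```properties
# Hypothetical agent "agent": netcat source -> memory channel -> Simple Log Service sink.
agent.sources = netcatSource
agent.channels = memoryChannel
agent.sinks = slsSink

# Standard Flume netcat source: each line received on the port becomes one event.
agent.sources.netcatSource.type = netcat
agent.sources.netcatSource.bind = localhost
agent.sources.netcatSource.port = 44444
agent.sources.netcatSource.channels = memoryChannel

# Memory channel. transactionCapacity is set to match the sink batchSize below
# so that one batch can be taken from the channel in a single transaction.
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 10000
agent.channels.memoryChannel.transactionCapacity = 1000

# Simple Log Service sink. Parameter names come from the sink parameter table.
# Class name casing is an assumption; verify it against the plug-in.
agent.sinks.slsSink.type = com.aliyun.loghub.flume.sink.LoghubSink
agent.sinks.slsSink.endpoint = http://cn-qingdao.log.aliyuncs.com
agent.sinks.slsSink.project = <your-project>
agent.sinks.slsSink.logstore = <your-logstore>
agent.sinks.slsSink.accessKeyId = <your-access-key-id>
agent.sinks.slsSink.accessKey = <your-access-key-secret>
agent.sinks.slsSink.batchSize = 1000
agent.sinks.slsSink.serializer = SIMPLE
agent.sinks.slsSink.channel = memoryChannel
```

The netcat source is used only because it is easy to test by hand. Any other Flume source can feed the same sink.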

Source

You can configure a source to ship log data from Simple Log Service to other systems by using Flume. Log data can be written to Flume in the following modes:

  • DELIMITED: Log data is written to Flume in delimiter mode.

  • JSON: Log data is written to Flume in JSON mode.
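
The fragment below sketches how the two modes might be selected on a source. The agent name `agent`, the source name `slsSource`, the column names, and the sample log are hypothetical, and the event bodies shown in the comments are an assumption about the general shape of the output rather than the plug-in's exact format.

```properties
# Hypothetical source "slsSource" on an agent named "agent".
# DELIMITED mode: emit the listed columns of each log, in order, joined by the delimiter.
agent.sources.slsSource.deserializer = DELIMITED
agent.sources.slsSource.columns = user,status,path
agent.sources.slsSource.separatorChar = ,

# Illustrative log fields:      user=alice, status=200, path=/index.html
# Expected DELIMITED body:      alice,200,/index.html
#
# JSON mode alternative (no columns needed):
# agent.sources.slsSource.deserializer = JSON
# Expected JSON body, roughly:  {"user":"alice","status":"200","path":"/index.html"}
```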

The following table describes the parameters of a source.

| Parameter | Required | Description |
| --- | --- | --- |
| type | Yes | The type of the source. Set the value to com.aliyun.loghub.flume.source.LoghubSource. |
| endpoint | Yes | The endpoint of the Simple Log Service project. Example: http://cn-qingdao.log.aliyuncs.com. Enter an endpoint based on your business scenario. For more information, see Endpoints. |
| project | Yes | The name of the project. |
| logstore | Yes | The name of the Logstore. |
| accessKeyId | Yes | The AccessKey ID provided by Alibaba Cloud. The AccessKey ID is used to identify the user. To ensure the security of your account, we recommend that you use the AccessKey pair of a RAM user. For more information about how to obtain an AccessKey pair, see AccessKey pair. |
| accessKey | Yes | The AccessKey secret provided by Alibaba Cloud. The AccessKey secret is used to authenticate the user. To ensure the security of your account, we recommend that you use the AccessKey pair of a RAM user. For more information about how to obtain an AccessKey pair, see AccessKey pair. |
| heartbeatIntervalMs | No | The interval at which the client sends heartbeat messages to Simple Log Service. Default value: 30000. Unit: milliseconds. |
| fetchIntervalMs | No | The interval at which data is read from Simple Log Service. Default value: 100. Unit: milliseconds. |
| fetchInOrder | No | Specifies whether to consume log data in the order in which the log data was written to Simple Log Service. Default value: false. |
| batchSize | No | The number of data entries that are read at a time. Default value: 100. |
| consumerGroup | No | The name of the consumer group that is used to read log data. |
| initialPosition | No | The starting point from which data is read. Valid values: begin, end, and timestamp. Default value: begin. Note: If a checkpoint exists on Simple Log Service, the checkpoint is used preferentially. |
| timestamp | No | The UNIX timestamp. Required if you set the initialPosition parameter to timestamp. |
| deserializer | Yes | The deserialization mode of events. Valid values: DELIMITED (delimiter mode, the default), JSON (JSON mode), or the fully qualified class name of a custom deserializer. |
| columns | No | The column names. Required if you set the deserializer parameter to DELIMITED. Separate multiple columns with commas (,). List the columns in the same order as the values appear in the data entries. |
| separatorChar | No | The delimiter, which must be a single character. Applies when the deserializer parameter is set to DELIMITED. Default value: a comma (,). |
| quoteChar | No | The quote character. Applies when the deserializer parameter is set to DELIMITED. Default value: a double quotation mark ("). |
| escapeChar | No | The escape character. Applies when the deserializer parameter is set to DELIMITED. Default value: a double quotation mark ("). |
| appendTimestamp | No | Specifies whether to append the timestamp as a field to each log. Applies when the deserializer parameter is set to DELIMITED. Default value: false. |
| sourceAsField | No | Specifies whether to add the log source as a field named __source__. Applies when the deserializer parameter is set to JSON. Default value: false. |
| tagAsField | No | Specifies whether to add the log tag as a field. The field is named in the format of __tag__:{Tag name}. Applies when the deserializer parameter is set to JSON. Default value: false. |
| timeAsField | No | Specifies whether to add the log time as a field named __time__. Applies when the deserializer parameter is set to JSON. Default value: false. |
| useRecordTime | No | Specifies whether to use the value of the timestamp field in the logs as the log time when log data is read from Simple Log Service. Default value: false, which indicates that the current time is used as the log time. |

For a configuration example of a source, visit GitHub.
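
As a hedged illustration of how the source parameters fit together, the following sketch consumes logs from Simple Log Service in JSON mode and hands them to a standard Flume logger sink through a memory channel. The agent and component names and the consumer group name are hypothetical, the angle-bracket values are placeholders, and the source class name is written with an all-lowercase package, which is an assumption worth checking against the plug-in JAR that you downloaded.

```properties
# Hypothetical agent "agent": Simple Log Service source -> memory channel -> logger sink.
agent.sources = slsSource
agent.channels = memoryChannel
agent.sinks = loggerSink

# Simple Log Service source. Parameter names come from the source parameter table.
# Class name casing is an assumption; verify it against the plug-in.
agent.sources.slsSource.type = com.aliyun.loghub.flume.source.LoghubSource
agent.sources.slsSource.endpoint = http://cn-qingdao.log.aliyuncs.com
agent.sources.slsSource.project = <your-project>
agent.sources.slsSource.logstore = <your-logstore>
agent.sources.slsSource.accessKeyId = <your-access-key-id>
agent.sources.slsSource.accessKey = <your-access-key-secret>
agent.sources.slsSource.consumerGroup = flume-consumer-demo
agent.sources.slsSource.initialPosition = begin
agent.sources.slsSource.batchSize = 100
agent.sources.slsSource.deserializer = JSON
agent.sources.slsSource.sourceAsField = true
agent.sources.slsSource.timeAsField = true
agent.sources.slsSource.channels = memoryChannel

# To start from a point in time instead, the table suggests a pair like:
# agent.sources.slsSource.initialPosition = timestamp
# agent.sources.slsSource.timestamp = <unix-timestamp>

# Memory channel that buffers events between the source and the sink.
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 10000
agent.channels.memoryChannel.transactionCapacity = 1000

# Standard Flume logger sink, used here only to make consumed events visible.
agent.sinks.loggerSink.type = logger
agent.sinks.loggerSink.channel = memoryChannel
```

The logger sink is a convenient way to confirm that events arrive. In practice it would be swapped for a sink that targets the downstream system, such as HDFS or Kafka.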