This topic describes how to ship logs from Simple Log Service to Splunk by using Alibaba Cloud Log Service Add-on for Splunk.
Architecture
The following list describes how to ship logs by using the add-on.
Create consumer groups by using Splunk data inputs and use the consumer groups to consume logs in Simple Log Service in real time.
Splunk heavy forwarders use the Splunk private protocol or HTTP Event Collector (HEC) to forward the logs to Splunk indexers.
The add-on only collects data. You must install the add-on on Splunk heavy forwarders. You do not need to install the add-on on Splunk indexers or search heads.
Terms
A data input is a consumer that consumes logs.
A consumer group contains multiple consumers. Each consumer in a consumer group consumes different logs that are stored in a Logstore.
A Logstore contains multiple shards.
Each shard can be allocated to only one consumer.
A consumer can consume data from multiple shards.
The name of a consumer contains the name of the consumer group to which the consumer belongs, hostname, process name, and protocol that is used to send Splunk events. This naming convention ensures that the name of each consumer in a consumer group is unique.
For more information about consumer groups, see Use consumer groups to consume data.
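The relationship between shards and consumers described above can be illustrated with a small sketch. It is not the balancing algorithm that Simple Log Service actually uses. It only shows the two stated rules: each shard has exactly one owner, and one consumer may own several shards. The `consumer_name` helper and its `-` separator are hypothetical. The text only says which components the name contains, not how they are joined.

```python
from collections import defaultdict


def consumer_name(group: str, hostname: str, process: str, protocol: str) -> str:
    """Join the four components the text lists; the separator is hypothetical."""
    return "-".join([group, hostname, process, protocol])


def assign_shards(shard_ids, consumers):
    """Give every shard exactly one owner, round-robin over sorted consumers."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    owners = sorted(consumers)
    assignment = defaultdict(list)
    for i, shard in enumerate(sorted(shard_ids)):
        assignment[owners[i % len(owners)]].append(shard)
    return dict(assignment)


if __name__ == "__main__":
    names = [
        consumer_name("splunk-group", "host-a", "1201", "hec"),
        consumer_name("splunk-group", "host-b", "1202", "hec"),
    ]
    print(assign_shards(range(4), names))
```

If there are more consumers than shards, the extra consumers receive no shards in this sketch, which matches the rule that a shard cannot be shared.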
Preparations
Obtain an AccessKey pair that is used to access Simple Log Service.
You can use the AccessKey pair of a Resource Access Management (RAM) user to access a Simple Log Service project. For more information, see AccessKey pair and Access data by using AccessKey pairs.
You can use the permission assistant feature to grant permissions to a RAM user. For more information, see Configure the permission assistant feature. The following example shows a common policy.
Note: <Project name> specifies the name of your Simple Log Service project. <Logstore name> specifies the name of your Simple Log Service Logstore. Replace the values with the actual values. You can use wildcard characters to specify the names, and the asterisk (*) is supported.
{
  "Version": "1",
  "Statement": [
    {
      "Action": [
        "log:ListShards",
        "log:GetCursorOrData",
        "log:GetConsumerGroupCheckPoint",
        "log:UpdateConsumerGroup",
        "log:ConsumerGroupHeartBeat",
        "log:ConsumerGroupUpdateCheckPoint",
        "log:ListConsumerGroup",
        "log:CreateConsumerGroup"
      ],
      "Resource": [
        "acs:log:*:*:project/<Project name>/logstore/<Logstore name>",
        "acs:log:*:*:project/<Project name>/logstore/<Logstore name>/*"
      ],
      "Effect": "Allow"
    }
  ]
}
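If you manage several projects or Logstores, it can help to generate the policy document instead of editing the placeholders by hand. The following sketch only fills the two placeholders of the policy shown above. The function name `build_policy` and the sample names are illustrative and not part of any Alibaba Cloud tool.

```python
import json

# Actions copied from the example policy in this topic.
ACTIONS = [
    "log:ListShards",
    "log:GetCursorOrData",
    "log:GetConsumerGroupCheckPoint",
    "log:UpdateConsumerGroup",
    "log:ConsumerGroupHeartBeat",
    "log:ConsumerGroupUpdateCheckPoint",
    "log:ListConsumerGroup",
    "log:CreateConsumerGroup",
]


def build_policy(project: str, logstore: str) -> dict:
    """Return the example policy with the project and Logstore names filled in."""
    base = f"acs:log:*:*:project/{project}/logstore/{logstore}"
    return {
        "Version": "1",
        "Statement": [
            {"Action": ACTIONS, "Resource": [base, base + "/*"], "Effect": "Allow"}
        ],
    }


if __name__ == "__main__":
    # "my-project" and "my-logstore" are placeholder names.
    print(json.dumps(build_policy("my-project", "my-logstore"), indent=2))
```

Passing `"*"` as the Logstore name produces the wildcard form mentioned in the note.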
Check the version of Splunk and the operating system on which Splunk is run.
Make sure that the latest version of the add-on is used.
Make sure that the operating system is Linux, macOS, or Windows.
Make sure that the version of Splunk heavy forwarders is 8.0 or later and the version of Splunk indexers is 7.0 or later.
Configure HEC on Splunk. For more information, see Configure HTTP Event Collector on Splunk Enterprise.
If you use HEC to send events to Splunk indexers, make sure that HEC is configured on Splunk. If you use the Splunk private protocol to send events to Splunk indexers, skip this step.
Note: When you create an HEC token, do not enable the indexer acknowledgment feature for the token.
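Before you configure the add-on, you may want to confirm that the HEC token works. The following sketch builds a test request for the `/services/collector` path that also appears in the error logs in the FAQ of this topic, and uses the `Authorization: Splunk <token>` header scheme of HEC. The host, port, token, and event content are placeholders. The sketch builds the request without sending it, so you can inspect it first.

```python
import json
import urllib.request


def build_hec_request(host, port, token, event, index="main", use_https=False):
    """Build (but do not send) a POST request that carries one HEC event."""
    scheme = "https" if use_https else "http"
    url = f"{scheme}://{host}:{port}/services/collector"
    body = json.dumps({"event": event, "index": index}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; replace them with your HEC host, port, and token.
    req = build_hec_request("127.0.0.1", 8088, "<HEC token>", {"content": "hello"})
    print(req.get_method(), req.full_url)
    # To send the event, call: urllib.request.urlopen(req, timeout=120)
```

A 403 response to such a request usually points to a token or HEC configuration problem rather than to the add-on.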
Install the add-on
You can log on to the Splunk web interface and install the add-on. To install the add-on, we recommend that you use one of the following methods.
The add-on only collects data. You must install the add-on on Splunk heavy forwarders. You do not need to install the add-on on Splunk indexers or search heads.
Method 1
Click the icon.
On the Apps page, click Find More Apps.
On the Browse More Apps page, search for Alibaba Cloud Log Service Add-on for Splunk and click Install.
After the add-on is installed, restart Splunk as prompted.
Method 2
Click the icon.
On the Apps page, click Install app from file.
On the Upload app page, select the TGZ file that you want to upload and click Upload.
You can click App Search Results and download the required TGZ file.
Click Install.
After the add-on is installed, restart Splunk as prompted.
Configure the add-on
If Splunk is not running on an Elastic Compute Service (ECS) instance, you can use an AccessKey pair of your Alibaba Cloud account to access Simple Log Service. Procedure:
On the Splunk web interface, click Alibaba Cloud Log Service Add-on for Splunk.
Configure a global account.
On the page that appears, open the Configuration page. On the Configuration page, click the Account tab. On this tab, click Add. In the Add Account dialog box, specify an AccessKey pair that can be used to access Simple Log Service.
Note: You must enter an AccessKey ID in the Username field and the related AccessKey secret in the Password field.
Specify the level of add-on logs.
Open the Configuration page. On the Configuration page, click the Logging tab. On this tab, select a level from the Log level drop-down list.
Create a data input.
Click inputs to open the inputs page.
Click Create New Input. In the Add sls_datainput dialog box, configure the parameters of a data input.
Table 1. Parameters of a data input
| Parameter | Required and data type | Description | Example |
| --- | --- | --- | --- |
| Name | Yes, string | The name of the data input. The name must be globally unique. | None. |
| Interval | Yes, integer | The period of time that is required to restart the Splunk data input process after the process stops. Unit: seconds. | Default value: 10. |
| Index | Yes, string | The Splunk index. | None. |
| SLS AccessKey | Yes, string | The Alibaba Cloud AccessKey pair that consists of an AccessKey ID and an AccessKey secret. Note: You must enter an AccessKey ID in the Username field and the related AccessKey secret in the Password field. | The AccessKey pair that you enter when you configure the global account. |
| SLS endpoint | Yes, string | The Simple Log Service endpoint. For more information, see Endpoints. | cn-huhehaote.log.aliyuncs.com or https://cn-huhehaote.log.aliyuncs.com |
| SLS project | Yes, string | The name of the Simple Log Service project. For more information, see Manage a project. | None. |
| SLS logstore | Yes, string | The name of the Simple Log Service Logstore. For more information, see Manage a Logstore. | None. |
| SLS consumer group | Yes, string | The name of the Simple Log Service consumer group. If you want to use multiple data inputs to consume data that is stored in the same Logstore, you must specify the same consumer group name for the data inputs. For more information, see Use consumer groups to consume data. | None. |
| SLS cursor start time | Yes, string | The start time of log data consumption. This parameter is valid only when you use a new consumer group. If you use an existing consumer group, data is consumed from the last checkpoint. Note: The start time is the log receiving time. | Valid values: begin, end, and a time value in the ISO 8601 format. Example: 2018-12-26 0:0:0+8:00. |
| SLS heartbeat interval | Yes, integer | The interval at which a heartbeat message is sent between the consumer and the server. Unit: seconds. | Default value: 60. |
| SLS data fetch interval | Yes, integer | The interval at which logs are pulled from Simple Log Service. If logs are generated at a low frequency, we recommend that you do not set this parameter to a small value. Unit: seconds. | Default value: 1. |
| Topic filter | No, string | The string that is used to filter logs by topic. Separate multiple topics with semicolons (;). If the topic of a log is matched, the log is ignored and not sent to Splunk. | TopicA;TopicB. The value specifies that logs whose topic is TopicA or TopicB are ignored. |
| Unfolded fields | No, JSON | The mapping relationship between the topic of a log in the JSON format and a list of fields. {"topicA": ["field_nameA1", "field_nameA2", ...], "topicB": ["field_nameB1", "field_nameB2", ...], ...} | {"actiontrail_audit_event": ["event"] }. The value specifies that in a log whose topic is actiontrail_audit_event, the JSON strings of the specified fields are expanded and stored in the event field. |
| Event source | No, string | The source of Splunk events. | None. |
| Event source type | No, string | The type of the source for Splunk events. | None. |
| Event retry times | No, integer | The number of retries to consume log data. | Default value: 0, which specifies unlimited retries. |
| Event protocol | Yes, string | The protocol that is used to send Splunk events. If you use the Splunk private protocol to send Splunk events, you do not need to configure the following parameters in the table. | HTTP for HEC, HTTPS for HEC, or Private protocol |
| HEC host | Yes, string | The HEC host. This parameter is valid only if you use HEC to send Splunk events. For more information, see Set up and use HTTP Event Collector in Splunk Web. | None. |
| HEC port | Yes, integer | The HEC port. This parameter is valid only if you use HEC to send Splunk events. | None. |
| HEC token | Yes, string | The HEC token. This parameter is valid only if you use HEC to send Splunk events. For more information, see HEC token. | None. |
| HEC timeout | Yes, integer | The HEC timeout period. This parameter is valid only if you use HEC to send Splunk events. Unit: seconds. | Default value: 120. |
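The Topic filter and Unfolded fields parameters can be easier to reason about with a small model of what they do to one log. The following sketch is an interpretation of the parameter descriptions, not the code of the add-on. It assumes that the topic of a log is stored in the `__topic__` key, as in the sample logs in the FAQ of this topic, and that "unfolding" means parsing the JSON string of a listed field into a nested object. All function names are hypothetical.

```python
import json


def parse_topic_filter(value: str) -> set:
    """Split a 'TopicA;TopicB' string into the set of topics to ignore."""
    return {t.strip() for t in value.split(";") if t.strip()}


def should_forward(log: dict, ignored_topics: set) -> bool:
    """A log is dropped when its topic matches the filter."""
    return log.get("__topic__", "") not in ignored_topics


def unfold_fields(log: dict, unfolded: dict) -> dict:
    """Parse the JSON strings of the fields listed for this log's topic."""
    result = dict(log)
    for field in unfolded.get(log.get("__topic__", ""), []):
        raw = result.get(field)
        if isinstance(raw, str):
            try:
                result[field] = json.loads(raw)
            except ValueError:
                pass  # leave values that are not valid JSON unchanged
    return result


if __name__ == "__main__":
    ignored = parse_topic_filter("TopicA;TopicB")
    log = {"__topic__": "actiontrail_audit_event", "event": '{"user": "alice"}'}
    if should_forward(log, ignored):
        print(unfold_fields(log, {"actiontrail_audit_event": ["event"]}))
```

Fields that are not listed for the topic are passed through unchanged, even if they contain JSON text.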
If Splunk is running on an ECS instance, you can attach a RAM role to the ECS instance. Then, the ECS instance can assume the RAM role to access Simple Log Service. Procedure:
Make sure that Splunk is running on the ECS instance to which the required RAM role is attached.
Attach a RAM role to an ECS instance.
In this step, create a RAM role, grant permissions to the RAM role, and then attach the RAM role to an ECS instance. For more information, see Configure an instance RAM role.
For more information about the policy of the RAM role, see the policy in Preparations.
On the Splunk web interface, click Alibaba Cloud Log Service Add-on for Splunk.
Configure a global account.
On the page that appears, open the Configuration page. On the Configuration page, click the Account tab. On this tab, click Add. In the Add Account dialog box, specify the RAM role that is attached to the ECS instance. In this example, enter ECS_RAM_ROLE in the Username field and enter the name of the RAM role that is created in Step 1 in the Password field.
Create a data input.
Click inputs to open the inputs page.
Click Create New Input. In the Add sls_datainput dialog box, configure the parameters of a data input.
You must set the SLS AccessKey parameter to the global account that is created in Step 3. For more information about other parameters, see Parameters of a data input.
Related operations
Query data
Make sure that the data input is in the Enabled state. On the Splunk web interface, click Search & Reporting. On the App: Search & Reporting page, query the collected audit logs.
Query Simple Log Service operational logs
To query the info logs of Simple Log Service, enter the following statement in the search box:
index="_internal" | search "SLS info"
To query the error logs of Simple Log Service, enter the following statement in the search box:
index="_internal" | search "error"
Performance and security
Performance
The performance of the add-on and data transmission throughput vary based on the following factors:
Endpoint: You can access Simple Log Service by using a public, classic network, virtual private cloud (VPC), or global acceleration endpoint. In most cases, we recommend that you use a classic network endpoint or a VPC endpoint. For more information, see Endpoints.
Bandwidth: The bandwidth of data transmission between Simple Log Service and Splunk heavy forwarders and between Splunk heavy forwarders and indexers affects the performance.
Processing capability of Splunk indexers: The capabilities of indexers to receive data from Splunk heavy forwarders affect the performance.
Number of shards: A larger number of shards in a Logstore indicates a higher data transmission capability. You must specify the number of shards based on the speed at which raw logs are generated. For more information, see Manage shards.
Number of Splunk data inputs: A larger number of data inputs in a consumer group that is configured for a Logstore indicates a higher throughput.
Note: The concurrent consumption of log data varies based on the number of shards in a Logstore.
Number of CPU cores and memory resources occupied by Splunk heavy forwarders: In most cases, one Splunk data input consumes 1 GB to 2 GB of memory resources and 1 CPU core.
If sufficient memory and CPU resources are allocated, one Splunk data input can consume log data at a speed of 1 MB to 2 MB per second. You must specify the number of shards based on the speed at which raw logs are generated.
For example, if logs are written to a Logstore at a speed of 10 MB per second, you must create at least 10 shards in the Logstore and configure 10 data inputs in the add-on. If you deploy the add-on on a single server, make sure that the server has 10 idle CPU cores and 12 GB of available memory resources.
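The sizing rule above can be written down as a short calculation. The following sketch assumes the conservative end of the stated throughput (1 MB per second per data input), one CPU core per data input, and 1 GB to 2 GB of memory per data input. The function name and the returned keys are illustrative. For a write speed of 10 MB per second, it yields 10 data inputs, 10 shards, 10 cores, and a memory range of 10 GB to 20 GB, which contains the 12 GB figure in the example.

```python
import math


def plan_capacity(write_mb_per_s: float, input_mb_per_s: float = 1.0) -> dict:
    """Estimate data inputs, shards, CPU cores, and memory for a write speed."""
    if write_mb_per_s < 0 or input_mb_per_s <= 0:
        raise ValueError("speeds must be positive")
    inputs = max(1, math.ceil(write_mb_per_s / input_mb_per_s))
    return {
        "data_inputs": inputs,
        "min_shards": inputs,            # one shard per data input at minimum
        "cpu_cores": inputs,             # about 1 core per data input
        "memory_gb_range": (inputs * 1, inputs * 2),  # 1 GB to 2 GB per data input
    }


if __name__ == "__main__":
    print(plan_capacity(10.0))
```

Treat the result as a starting point and adjust it after you measure the actual throughput of your data inputs.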
High availability
A consumer group stores checkpoints on the server side. When a consumer stops consuming data, another consumer continues to consume data from the last checkpoint. You can create Splunk data inputs on multiple servers. If a server stops running or is damaged, a Splunk data input on another server continues to consume data from the last checkpoint. In theory, the number of Splunk data inputs launched on multiple servers can be greater than the number of shards, which ensures that data is consumed from the last checkpoint if an exception occurs.
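The failover behavior can be modeled with an in-memory stand-in for the server-side checkpoint storage. The following sketch is a simulation only. The `CheckpointStore` class and the `consume` function are hypothetical and do not call Simple Log Service. It shows that a second consumer that reads the same checkpoint continues where the first one stopped, without gaps or repeats.

```python
class CheckpointStore:
    """Stands in for the server-side checkpoint storage of a consumer group."""

    def __init__(self):
        self._positions = {}

    def load(self, shard: int) -> int:
        return self._positions.get(shard, 0)

    def save(self, shard: int, position: int) -> None:
        self._positions[shard] = position


def consume(store, shard, logs, batch_size, max_batches=None):
    """Read batches from the last checkpoint and save progress after each one."""
    position = store.load(shard)
    consumed = []
    batches = 0
    while position < len(logs) and (max_batches is None or batches < max_batches):
        batch = logs[position:position + batch_size]
        consumed.extend(batch)
        position += len(batch)
        store.save(shard, position)
        batches += 1
    return consumed


if __name__ == "__main__":
    store = CheckpointStore()
    logs = [f"log-{i}" for i in range(10)]
    first = consume(store, 0, logs, batch_size=3, max_batches=2)  # consumer A stops early
    second = consume(store, 0, logs, batch_size=3)                # consumer B takes over
    print(first + second == logs)
```

In a real deployment, a consumer that fails between reading a batch and saving the checkpoint can cause that batch to be read again, which this simplified model does not show.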
HTTPS-based data transmission
Simple Log Service
To use HTTPS to encrypt the data that is transmitted between your program and Simple Log Service, make sure that your endpoint is prefixed with https://. Example: https://cn-beijing.log.aliyuncs.com.
The *.aliyuncs.com server certificate is issued by GlobalSign. By default, most Linux and Windows servers are preconfigured to trust this certificate. If a server does not trust this certificate, see Install a trusted root CA or self-signed certificate.
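If endpoints are entered by hand, a small check can make sure that the `https://` prefix is present before the value is used. The following helper is a hypothetical convenience function and not part of the add-on. It adds the prefix to a bare host name and upgrades an `http://` prefix.

```python
def ensure_https(endpoint: str) -> str:
    """Return the endpoint with an https:// prefix."""
    endpoint = endpoint.strip()
    if endpoint.startswith("https://"):
        return endpoint
    if endpoint.startswith("http://"):
        return "https://" + endpoint[len("http://"):]
    return "https://" + endpoint


if __name__ == "__main__":
    print(ensure_https("cn-beijing.log.aliyuncs.com"))
```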
Splunk
To use HTTPS-based HEC, enable the SSL feature when you enable HEC in the Global Settings dialog box. For more information, see Configure HTTP Event Collector on Splunk Enterprise.
AccessKey pair storage protection
The AccessKey pair that you use to access Simple Log Service and HEC tokens are encrypted and stored in Splunk. This prevents data leaks.
FAQ
What do I do if a configuration error occurs?
A configuration error may occur in a data input when you create or modify the data input. In this case, check the basic configuration of the data input. For more information about parameters, see Parameters of a data input.
A configuration error may occur in Simple Log Service. For example, the system failed to create a consumer group. In this case, check the configuration of Simple Log Service.
Command:
index="_internal" | search "error"
Error log:
aliyun.log.consumer.exceptions.ClientWorkerException: error occour when create consumer group, errorCode: LogStoreNotExist, errorMessage: logstore xxxx does not exist
ConsumerGroupQuotaExceed error
You can configure up to 20 consumer groups for a Logstore. We recommend that you view consumer groups in the Simple Log Service console and delete the consumer groups that are no longer required. If more than 20 consumer groups are configured for a Logstore, the ConsumerGroupQuotaExceed error is reported.
What do I do if a permission error occurs?
You are not authorized to access Simple Log Service. Check your permissions.
Command:
index="_internal" | search "error"
Error log:
aliyun.log.consumer.exceptions.ClientWorkerException: error occour when create consumer group, errorCode: SignatureNotMatch, errorMessage: signature J70VwxYH0+W/AciA4BdkuWxK6W8= not match
RAM authentication for your ECS instance failed.
Command:
index="_internal" | search "error"
Error log:
ECS RAM Role detected in user config, but failed to get ECS RAM credentials. Please check if ECS instance and RAM role 'ECS-Role' are configured appropriately.
ECS-Role is a placeholder for the name of the RAM role that you created. In your error log, the actual role name is displayed.
Troubleshooting:
Check whether the SLS AccessKey parameter of your data input is configured as the global account that has a RAM role.
Check whether the RAM role is properly configured for the global account. The username must be set to ECS_RAM_ROLE, and the password must be set to the name of the RAM role.
Check whether the RAM role is attached to the ECS instance.
Check whether the trusted entity type of the RAM role is set to Alibaba Cloud Service. Check whether the selected trusted service is ECS.
Check whether the ECS instance to which the RAM role is attached is the ECS instance on which Splunk is running.
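To narrow down the checks above, you can test from the ECS instance itself whether temporary credentials are available for the role. The following sketch assumes that the ECS instance metadata service is reachable at `100.100.100.200` and exposes role credentials under `/latest/meta-data/ram/security-credentials/<role name>`. Verify the address and path against the ECS documentation before you rely on them. Whether the add-on uses the same path internally is not stated in this topic. The function names are hypothetical.

```python
import urllib.request
from urllib.parse import quote

# Assumed metadata address and path; confirm them in the ECS documentation.
METADATA_BASE = "http://100.100.100.200/latest/meta-data/ram/security-credentials/"


def credentials_url(role_name: str) -> str:
    """Build the metadata URL for the given RAM role name."""
    return METADATA_BASE + quote(role_name, safe="")


def role_is_reachable(role_name: str, timeout: float = 2.0) -> bool:
    """Return True if the metadata service answers with HTTP 200 for the role."""
    try:
        with urllib.request.urlopen(credentials_url(role_name), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, and socket timeouts
        return False


if __name__ == "__main__":
    # "ECS-Role" is the placeholder role name used in the error log above.
    print(credentials_url("ECS-Role"))
    print(role_is_reachable("ECS-Role"))
```

A `False` result on the instance on which Splunk runs suggests that the role is not attached to that instance or that the role name is misspelled.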
You are not authorized to access HEC.
Command:
index="_internal" | search "error"
Error log:
ERROR HttpInputDataHandler - Failed processing http input, token name=n/a, channel=n/a, source_IP=127.0.0.1, reply=4, events_processed=0, http_input_body_size=369 WARNING pid=48412 tid=ThreadPoolExecutor-0_1 file=base_modinput.py:log_warning:302 | SLS info: Failed to write [{"event": "{\"__topic__\": \"topic_test0\", \"__source__\": \"127.0.0.1\", \"__tag__:__client_ip__\": \"10.10.10.10\", \"__tag__:__receive_time__\": \"1584945639\", \"content\": \"goroutine id [0, 1584945637]\", \"content2\": \"num[9], time[2020-03-23 14:40:37|1584945637]\"}", "index": "main", "source": "sls log", "sourcetype": "http of hec", "time": "1584945637"}] remote Splunk server (http://127.0.0.1:8088/services/collector) using hec. Exception: 403 Client Error: Forbidden for url: http://127.0.0.1:8088/services/collector, times: 3
Possible causes:
HEC is not configured or started.
The HEC-related parameters of data inputs are invalid. For example, if you use HTTPS-based HEC, you must enable the SSL feature.
The indexer acknowledgment feature is enabled for your HEC tokens. This feature must be disabled, as described in Preparations.
What do I do if a consumption latency exists?
You can check the status of your consumer group in the Simple Log Service console. For more information, see Step 3: View the status of a consumer group.
We recommend that you increase the number of shards or create more data inputs in the same consumer group. For more information, see Performance and security.
What do I do if network jitter occurs?
Command:
index="_internal" | search "SLS info: Failed to write"
Error log:
WARNING pid=58837 tid=ThreadPoolExecutor-0_0 file=base_modinput.py:log_warning:302 | SLS info: Failed to write [{"event": "{\"__topic__\": \"topic_test0\", \"__source__\": \"127.0.0.1\", \"__tag__:__client_ip__\": \"10.10.10.10\", \"__tag__:__receive_time__\": \"1584951417\", \"content2\": \"num[999], time[2020-03-23 16:16:57|1584951417]\", \"content\": \"goroutine id [0, 1584951315]\"}", "index": "main", "source": "sls log", "sourcetype": "http of hec", "time": "1584951417"}] remote Splunk server (http://127.0.0.1:8088/services/collector) using hec. Exception: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer')), times: 3
If network jitter occurs, Splunk events are automatically retransmitted. If the issue persists, contact your network administrator.
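The retransmission behavior can be pictured as a retry loop around the send call. The following sketch is not the implementation of the add-on. It assumes a fixed delay between attempts and follows the convention of the Event retry times parameter, where 0 specifies unlimited retries. The names `send_with_retry`, `send`, and `delay_s` are hypothetical.

```python
import time


def send_with_retry(send, payload, retry_times=0, delay_s=1.0, sleep=time.sleep):
    """Call send(payload) and retry on connection errors.

    retry_times=0 means unlimited retries; otherwise give up after that
    many retries following the first attempt.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return send(payload)
        except ConnectionError:
            if retry_times and attempts > retry_times:
                raise
            sleep(delay_s)


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_send(event):
        # Fails twice with the error from the log above, then succeeds.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionResetError(54, "Connection reset by peer")
        return "sent " + event

    print(send_with_retry(flaky_send, "event-1", delay_s=0.1))
```

With unlimited retries, a permanently unreachable indexer blocks the loop, which is why a persistent failure needs attention from your network administrator.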
How do I change the start time of consumption?
Note: The SLS cursor start time parameter is valid only when you use a new consumer group. If you use an existing consumer group, data is consumed from the last checkpoint.
On the inputs page of the Splunk web interface, disable the related data input.
Log on to the Simple Log Service console. Find the Logstore from which data is consumed and delete the related consumer group in the Data Consumption section.
On the inputs page of the Splunk web interface, find the data input and open it for editing. In the dialog box that appears, modify the SLS cursor start time parameter. Then, enable the data input.
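Before you enter a new value for the SLS cursor start time parameter, you can check that it is one of the three accepted kinds: begin, end, or a time value. The following sketch is a strict validator written for illustration. It requires a fully formed ISO 8601 value with a UTC offset, such as `2018-12-26T00:00:00+08:00`. The add-on itself may accept looser forms, such as the example in the parameter table, so a rejection by this sketch does not prove that the add-on rejects the value. The function name is hypothetical.

```python
from datetime import datetime


def parse_cursor_start(value: str):
    """Return ('begin', None), ('end', None), or ('time', unix_seconds)."""
    text = value.strip()
    if text.lower() in ("begin", "end"):
        return (text.lower(), None)
    moment = datetime.fromisoformat(text)  # raises ValueError on malformed input
    if moment.tzinfo is None:
        raise ValueError("include a UTC offset, for example +08:00")
    return ("time", int(moment.timestamp()))


if __name__ == "__main__":
    print(parse_cursor_start("begin"))
    print(parse_cursor_start("2018-12-26T00:00:00+08:00"))
```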