You can import Amazon Simple Storage Service (S3) objects to Simple Log Service. After the objects are imported, you can query, analyze, and transform the log data in Simple Log Service. A single S3 object that you import cannot exceed 5 GB in size. If you import a compressed object, the 5 GB limit applies to the size of the object after compression.
Prerequisites
Log files are uploaded to S3.
A project and a Logstore are created. For more information, see Create a project and Create a Logstore.
A custom policy that grants permissions to manage S3 resources is created. For more information, see Custom permissions. The following sample code provides an example of the custom policy.
Note: After the custom policy is created, S3 objects can be imported to Simple Log Service.
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your_bucket_name",
        "arn:aws:s3:::your_bucket_name/*"
      ]
    }
  ]
}
```
Create a data import configuration
Log on to the Simple Log Service console.
On the right side of the page that appears, click Quick Data Import. On the Data Import tab of the Import Data dialog box, click S3 - Data Import.
Select the project and Logstore. Then, click Next.
In the Import Configuration step, configure the following parameters to create a data import configuration.
| Parameter | Description |
| --- | --- |
| Job Name | The name of the data import job. |
| Display Name | The display name of the data import job. |
| Job Description | The description of the data import job. |
| S3 Region | The region where the S3 bucket that stores the objects to import resides. |
| AWS AccessKey ID | The AccessKey ID of your AWS account. Important: Make sure that your AccessKey pair has permissions to access the AWS resources that you want to manage. |
| AWS Secret AccessKey | The Secret AccessKey of your AWS account. |
| File Path Prefix Filter | The directory of the S3 objects. If you configure this parameter, the system finds the S3 objects to import more efficiently. For example, if the objects that you want to import are stored in the csv/ directory, set this parameter to csv/. If you leave this parameter empty, the system traverses the entire S3 bucket to find the objects. Note: We recommend that you configure this parameter. The more objects an S3 bucket contains, the less efficient the import is when the entire bucket is traversed. |
| File Path Regex Filter | The regular expression that is used to filter S3 objects by path. Only the objects whose names, including paths, match the regular expression are imported. This parameter is empty by default, which indicates that no filtering is performed. For example, if an object that you want to import is named testdata/csv/bill.csv, you can set this parameter to (testdata/csv/)(.*). For more information about how to debug a regular expression, see How do I debug a regular expression? For a sketch of how the prefix and regular expression filters select objects, see the example after this table. |
| File Modification Time Filter | The modification time based on which S3 objects are filtered. Valid values: All (import all objects that meet the specified conditions), From Specific Time (import only objects that are modified after a point in time), and Specific Time Range (import only objects that are modified within a time range). |
| Data Format | The format of the S3 objects. Valid values: CSV (the first line of an object can be used as field names, or you can specify custom field names, and the remaining lines are parsed as field values), Single-line JSON (an object is read line by line, and each line is parsed as a JSON object whose fields become log fields), Single-line Text Log (each line in an object is parsed as a log), and Multi-line Text Logs (multiple lines in an object are parsed as one log, and you can specify a regular expression that matches the first or last line of a log). |
| Compression Format | The compression format of the S3 objects. Simple Log Service decompresses the objects based on the specified format before the data is read. |
| Encoding Format | The encoding format of the S3 objects. UTF-8 and GBK are supported. |
| New File Check Cycle | If new objects are continuously generated in the specified directory, configure this parameter based on your business requirements. The data import job then keeps running in the background and automatically detects and reads new objects at the specified interval. The system ensures that data in an S3 object is not repeatedly written to Simple Log Service. If new objects are no longer generated, set this parameter to Never Check. The data import job then automatically exits after all objects that meet the specified conditions are read. |
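The File Path Prefix Filter and File Path Regex Filter settings work together to narrow down which objects are read. The following Python sketch, referenced in the table above, mimics that selection; the object keys, prefix, and pattern are illustrative assumptions, not required values.
```python
# Minimal sketch of how File Path Prefix Filter and File Path Regex Filter
# narrow down the objects to import. Object keys and patterns are assumptions.
import re

object_keys = [
    "testdata/csv/bill.csv",
    "testdata/csv/usage.csv",
    "testdata/json/events.json",
    "backup/old.csv",
]

prefix = "testdata/csv/"                      # File Path Prefix Filter
pattern = re.compile(r"(testdata/csv/)(.*)")  # File Path Regex Filter

selected = [
    key for key in object_keys
    if key.startswith(prefix) and pattern.fullmatch(key)
]
print(selected)   # ['testdata/csv/bill.csv', 'testdata/csv/usage.csv']
```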
Log Time Configuration
| Parameter | Description |
| --- | --- |
| Time Field | The name of the time column in an S3 object. This parameter specifies the log time. If you set Data Format to CSV or Single-line JSON, you must configure this parameter. |
| Regular Expression to Extract Time | The regular expression that is used to extract the log time. For example, if a sample log is 127.0.0.1 - - [10/Sep/2018:12:36:49 +0800] "GET /index.html HTTP/1.1", you can set this parameter to [0-9]{0,2}\/[0-9a-zA-Z]+\/[0-9:,]+. Note: For data in other formats, you can also specify a regular expression to extract part of the time field. |
| Time Field Format | The time format that is used to parse the value of the time field. You can specify a time format that is supported by the Java SimpleDateFormat class, for example, yyyy-MM-dd HH:mm:ss. For more information about the time format syntax, see Class SimpleDateFormat. For more information about common time formats, see Time formats. You can also specify an epoch time format: epoch, epochMillis, epochMicro, or epochNano. For a sketch of how the time settings are applied, see the example after this table. |
| Time Zone | The time zone of the value of the time field. If the value of Time Field Format is an epoch time format, you do not need to configure this parameter. If you want daylight saving time to be taken into account when logs are parsed, select a UTC time zone. Otherwise, select a GMT time zone. Note: By default, UTC+8 is used. |
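The following Python sketch, referenced in the table above, illustrates how Regular Expression to Extract Time and Time Field Format could be applied to a raw log line. The console itself uses Java SimpleDateFormat patterns; the Python strptime directives shown here are only an approximate equivalent, and the sample line is the one from the table.
```python
# Minimal sketch of the log time settings applied to a sample Apache-style log
# line. Names and values are illustrative only.
import re
from datetime import datetime

log_line = '127.0.0.1 - - [10/Sep/2018:12:36:49 +0800] "GET /index.html HTTP/1.1"'

# Regular Expression to Extract Time: pull the time string out of the raw line.
match = re.search(r"[0-9]{0,2}/[0-9a-zA-Z]+/[0-9:,]+", log_line)
time_text = match.group(0)          # "10/Sep/2018:12:36:49"

# Time Field Format: the console accepts Java SimpleDateFormat patterns such as
# "dd/MMM/yyyy:HH:mm:ss"; the equivalent Python strptime directives are used here.
log_time = datetime.strptime(time_text, "%d/%b/%Y:%H:%M:%S")
print(log_time.isoformat())
```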
If you set Data Format to CSV or Multi-line Text Logs, you must configure additional parameters. The following tables describe these parameters, and an example sketch follows each table.
Additional parameters when you set Data Format to CSV
| Parameter | Description |
| --- | --- |
| Delimiter | The delimiter for logs. The default value is a comma (,). |
| Quote | The quote character that is used to enclose a CSV-formatted string. |
| Escape Character | The escape character for logs. The default value is a backslash (\). |
| First Line as Field Name | If you turn on First Line as Field Name, the first line of a CSV file is used to extract field names. |
| Custom Fields | If you turn off First Line as Field Name, you can specify custom field names. Separate multiple field names with commas (,). |
| Lines to Skip | The number of lines to skip. For example, if you set this parameter to 1, the first line of a CSV file is skipped, and log collection starts from the second line. |
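The following Python sketch shows how the CSV settings above could map to parsing behavior. It is a minimal illustration that uses Python's csv module; the file name and the concrete settings are assumptions, not values read from your import job.
```python
# Minimal sketch of the CSV settings: delimiter, quote, escape character,
# first line as field names, and lines to skip. The file name is an assumption.
import csv

with open("bill.csv", newline="", encoding="utf-8") as f:
    lines_to_skip = 0                      # Lines to Skip (set to 1 to skip the first line)
    for _ in range(lines_to_skip):
        next(f)

    reader = csv.reader(
        f,
        delimiter=",",                     # Delimiter
        quotechar='"',                     # Quote
        escapechar="\\",                   # Escape Character
    )
    field_names = next(reader)             # First Line as Field Name turned on
    for row in reader:
        log = dict(zip(field_names, row))  # remaining lines become field values
        print(log)
```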
Additional parameters when you set Data Format to Multi-line Text Logs
| Parameter | Description |
| --- | --- |
| Position to Match Regular Expression | Specifies whether the regular expression matches the first line or the last line of a log. Regular Expression to Match First Line: the regular expression is used to match the first line of a log, and unmatched lines are collected as part of that log until the specified maximum number of lines is reached. Regular Expression to Match Last Line: the regular expression is used to match the last line of a log, and unmatched lines are collected as part of the next log until the specified maximum number of lines is reached. |
| Regular Expression | The regular expression. Specify a regular expression based on the log content. For more information about how to debug a regular expression, see How do I debug a regular expression? |
| Maximum Lines | The maximum number of lines allowed for a log. |
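The following Python sketch illustrates how a first-line regular expression and a maximum line count could group raw lines into multi-line logs, for example Java-style stack traces. The sample lines and the pattern are assumptions for illustration only.
```python
# Minimal sketch: group raw lines into multi-line logs by matching the first
# line of each log. The sample data and the regular expression are assumptions.
import re

first_line_pattern = re.compile(r"^\[\d{4}-\d{2}-\d{2}")  # Regular Expression to Match First Line
max_lines = 10                                            # Maximum Lines

raw_lines = [
    "[2024-01-01 08:00:00] ERROR something failed",
    "    at com.example.Foo.bar(Foo.java:42)",
    "    at com.example.Main.main(Main.java:10)",
    "[2024-01-01 08:00:05] INFO recovered",
]

logs, current = [], []
for line in raw_lines:
    if first_line_pattern.match(line) and current:
        logs.append("\n".join(current))   # a new first line closes the previous log
        current = []
    current.append(line)
    if len(current) >= max_lines:         # flush when the line limit is reached
        logs.append("\n".join(current))
        current = []
if current:
    logs.append("\n".join(current))

for log in logs:
    print(repr(log))
```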
Click Preview to preview the import result.
After you confirm the result, click Next.
Preview data, configure indexes, and then click Next.
By default, full-text indexing is enabled in Simple Log Service. You can also configure field indexes based on collected logs in manual or automatic mode. To configure field indexes in automatic mode, click Automatic Index Generation. This way, Simple Log Service automatically creates field indexes. For more information, see Create indexes.
Important: If you want to query and analyze logs, you must enable full-text indexing or field indexing. If you enable both full-text indexing and field indexing, the system uses only field indexes.
Click Query Log. On the query and analysis page that appears, check whether S3 data is imported.
Wait for approximately 1 minute. If the required S3 data exists, the data is imported.
View a data import configuration
After you create a data import configuration, you can view the configuration details and related statistical reports in the Simple Log Service console.
In the Projects section, click the project to which the data import configuration belongs.
On the Logstores tab, click the Logstore to which the data import configuration belongs, choose Data Import, and then click the name of the data import configuration.
On the Import Configuration Overview page, view the basic information and statistical reports of the data import configuration.
On the Import Configuration Overview page, you can perform the following operations on the data import configuration:
Modify the data import configuration
To modify the data import configuration, click Edit Configurations. For more information, see Import configuration.
Start a data import job
To start or resume a data import job, click Start.
Stop a data import job
To stop a data import job, click Stop.
Delete the data import configuration
To delete the data import configuration, click Delete Configuration.
Warning: After the data import configuration is deleted, it cannot be restored.
Billing
You are not charged by Simple Log Service for the data import feature. However, the feature calls AWS API operations, and AWS charges you for the traffic and requests that are generated. The daily fee for importing S3 objects depends on the items in the following table, and a rough estimation sketch follows the table. You can view the actual fees in your AWS bill.
| Field | Description |
| --- | --- |
| Daily data volume | The total size of data that is imported from S3 to Simple Log Service per day. Unit: GB. |
| Outbound traffic price | The fee per GB of outbound data that flows over the Internet. |
| PUT request price | The fee per 10,000 PUT requests. |
| GET request price | The fee per 10,000 GET requests. |
| New file check cycle | The interval at which the system detects new objects. Unit: minutes. You can configure New File Check Cycle to specify the interval when you create a data import configuration. |
| Number of objects | The number of objects that are obtained based on File Path Prefix Filter. |
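The exact billing formula is not reproduced here. The following Python sketch only shows one assumed way that the items in the table could combine into a rough daily estimate; the prices, counts, and the way they are combined are placeholders and assumptions, and your AWS bill is the authoritative source.
```python
# Rough daily-cost estimate. The combination below is an assumption for
# illustration, not the official billing formula; all prices are placeholders.
daily_data_gb = 50.0              # total data imported per day (GB)
price_per_gb_outbound = 0.09      # assumed fee per GB of Internet-outbound traffic
price_per_10k_get = 0.004         # assumed fee per 10,000 GET requests
object_count = 20_000             # objects matched by File Path Prefix Filter

traffic_fee = daily_data_gb * price_per_gb_outbound
request_fee = object_count / 10_000 * price_per_10k_get  # assume one GET per object per day
print(f"estimated daily fee: {traffic_fee + request_fee:.4f} USD")
```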
FAQ
| Problem description | Possible cause | Solution |
| --- | --- | --- |
| No data is displayed during preview. | The S3 bucket contains no objects, the objects contain no data, or no objects meet the filter conditions. | Check whether the S3 bucket contains objects that contain data and whether the filter conditions are configured as expected. Then, preview the data again. |
| Garbled characters exist. | The data format, compression format, or encoding format is not configured as expected. | Check the actual format of the S3 objects and modify Data Format, Compression Format, or Encoding Format. To handle the existing garbled characters, create a new Logstore and a new data import configuration. |
| The log time displayed in Simple Log Service is different from the actual log time. | No time field is specified in the data import configuration, or the specified time format or time zone is invalid. | Specify a time field or specify a valid time format and time zone. For more information, see Log Time Configuration. |
| After data is imported, the data cannot be queried or analyzed. | Full-text indexing and field indexing are not enabled for the Logstore, or the indexes have not taken effect. | Enable full-text indexing or configure field indexes, and then wait approximately 1 minute before you query the data. For more information, see Create indexes. |
| The number of imported data entries is less than expected. | Some S3 objects contain lines that are larger than 3 MB in size. These lines are discarded during the import. For more information, see Limits on collection. | When you write data to an S3 object, make sure that the size of a single line does not exceed 3 MB. |
| The number of S3 objects and the total volume of data are large, but the import speed does not meet expectations. In most cases, the import speed can reach 80 MB/s. | The number of shards in the Logstore is excessively small. For more information, see Limits on performance. | If the number of shards in the Logstore is small, increase the number of shards to 10 or more and check whether the latency decreases. For more information, see Manage shards. |
| Some S3 objects failed to be imported to Simple Log Service. | The settings of the filter conditions are invalid, or the size of an object exceeds 5 GB. For more information, see Limits on collection. | Modify the filter conditions, and make sure that the size of each S3 object does not exceed 5 GB. |
| An error occurred in parsing an S3 object that is in the Multi-line Text Logs format. | The regular expression that is specified to match the first line or the last line of a log is invalid. | Check whether the regular expression that is specified to match the first line or the last line of a log is valid. |
| The latency to import new S3 objects is higher than expected. | The number of existing S3 objects that are obtained based on File Path Prefix Filter exceeds the upper limit. | If the number of existing S3 objects that are obtained based on File Path Prefix Filter exceeds one million, specify a more precise value for File Path Prefix Filter and create more data import jobs. Otherwise, new objects are discovered at low efficiency. |
Error handling
| Error | Description |
| --- | --- |
| File read failure | If an S3 object cannot be completely read because a network exception occurs or the object is damaged, the data import job automatically retries reading the object. If the object still cannot be read after three retries, the object is skipped. The retry interval is the same as the value of New File Check Cycle. If New File Check Cycle is set to Never Check, the retry interval is 5 minutes. |
| Compression format parsing error | If an S3 object is in an invalid compression format, the data import job skips the object during decompression. |
| Data format parsing error | If data fails to be parsed, the data import job stores the original text content in the content field of logs. |
| S3 bucket absence | The data import job periodically retries. After the S3 bucket is re-created, the job automatically resumes the import. |
| Permission error | If a permission error occurs when data is read from the S3 bucket or written to the Simple Log Service Logstore, the data import job periodically retries and does not skip any S3 objects. After the error is fixed, the job automatically resumes the import and imports data from the unprocessed objects in the S3 bucket to the Logstore. |