
Simple Log Service: Import data from OSS to Simple Log Service

Last Updated: Sep 18, 2024

You can import log data from Object Storage Service (OSS) buckets to Simple Log Service, and then query, analyze, and transform the data in Simple Log Service. You can import only OSS objects that do not exceed 5 GB in size. For a compressed object, the 5 GB limit applies to the size after compression.

Billing

You are not charged for the data import feature of Simple Log Service. However, the feature calls OSS API operations, and you are charged by OSS for the traffic and requests that are generated. For more information about the pricing of the related billable items, see Pricing of OSS. The daily OSS fee that is generated when you import data from OSS can be estimated by using the following formula:

Daily fee ≈ T × p_read + (N / 10,000) × p_get + (1,440 / M) × (Total number of objects / 1,000, rounded up) / 10,000 × p_put

(The original formula image is unavailable; this expression is reconstructed from the field definitions below.)

Fields related to billing:

  • N: The number of objects that are imported from OSS to Simple Log Service per day.

  • T: The total size of data that is imported from OSS to Simple Log Service per day. Unit: GB.

  • p_read: The traffic fee per GB of data.

    • If you import data from OSS to Simple Log Service in the same region, outbound traffic over the internal network is generated, which is free of charge.

    • If you import data from OSS to Simple Log Service across regions, outbound traffic over the Internet is generated.

  • p_put: The fee per 10,000 PUT requests. Simple Log Service calls the ListObjects operation to query the objects in a bucket, and these calls are billed as PUT requests in your OSS bills. Each call returns up to 1,000 entries, so importing 1 million new objects requires 1,000 calls.

  • p_get: The fee per 10,000 GET requests.

  • M: The interval, in minutes, at which the system detects new objects. You can configure New File Check Cycle to specify the interval when you create a data import configuration.
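
As a rough illustration, the following Python sketch computes the estimated daily fee from the reconstructed formula above. The function name and all unit prices are hypothetical placeholders; check Pricing of OSS for the actual prices in your region.

    def estimate_daily_oss_fee(n_new_objects, t_data_gb, total_objects,
                               m_check_minutes, p_read, p_get, p_put):
        """Estimate the daily OSS fee for a data import job (reconstruction)."""
        # Traffic: every imported GB is read from OSS once.
        traffic_fee = t_data_gb * p_read
        # GET requests: roughly one read per imported object, billed per 10,000.
        get_fee = n_new_objects / 10_000 * p_get
        # PUT requests: ListObjects returns up to 1,000 entries per call, and the
        # bucket is scanned once per check cycle (1,440 minutes in a day).
        list_calls_per_day = (1_440 / m_check_minutes) * -(-total_objects // 1_000)
        put_fee = list_calls_per_day / 10_000 * p_put
        return traffic_fee + get_fee + put_fee

    # Example: 1 million new objects (100 GB) in a same-region bucket (p_read = 0),
    # checked every 10 minutes; the unit prices are placeholders.
    print(estimate_daily_oss_fee(1_000_000, 100, 1_000_000, 10,
                                 p_read=0, p_get=0.01, p_put=0.01))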

Prerequisites

  • Log files are uploaded to an OSS bucket. For more information, see Upload objects. (A sketch that covers this prerequisite and the next one follows the policy example below.)

  • A project and a Logstore are created. For more information, see Create a project and Create a Logstore.

  • Simple Log Service is authorized to assume the AliyunLogImportOSSRole role to access your OSS resources. You can complete authorization on the Cloud Resource Access Authorization page.

  • The Resource Access Management (RAM) user that you want to use is granted the oss:ListBuckets permission to access OSS buckets. For more information, see Attach a custom policy to a RAM user.

    If you use a RAM user, you must grant the PassRole permission to the RAM user. The following example shows a policy that you can use to grant the permission. For more information, see Create custom policies and Grant permissions to a RAM user.

    {
      "Statement": [
        {
          "Effect": "Allow",
          "Action": "ram:PassRole",
          "Resource": "acs:ram:*:*:role/aliyunlogimportossrole"
        },
        {
          "Effect": "Allow",
          "Action": "oss:GetBucketWebsite",
          "Resource": "*"
        },
        {
          "Effect": "Allow",
          "Action": "oss:ListBuckets",
          "Resource": "*"
        }
      ],
      "Version": "1"
    }    
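
    The following Python sketch illustrates the first two prerequisites, assuming the oss2 and aliyun-log-python-sdk packages. All endpoints, bucket names, project names, and credentials are placeholders.

    import oss2
    from aliyun.log import LogClient

    ACCESS_KEY_ID = "<your-access-key-id>"          # placeholder credentials
    ACCESS_KEY_SECRET = "<your-access-key-secret>"

    # Prerequisite 1: upload a log file to an OSS bucket.
    auth = oss2.Auth(ACCESS_KEY_ID, ACCESS_KEY_SECRET)
    bucket = oss2.Bucket(auth, "https://oss-cn-hangzhou.aliyuncs.com", "my-log-bucket")
    bucket.put_object_from_file("csv/bill.csv", "bill.csv")

    # Prerequisite 2: create a project and a Logstore in Simple Log Service.
    client = LogClient("cn-hangzhou.log.aliyuncs.com", ACCESS_KEY_ID, ACCESS_KEY_SECRET)
    client.create_project("my-import-project", "project for OSS import")
    client.create_logstore("my-import-project", "my-logstore", ttl=30, shard_count=2)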

Create a data import configuration

Important

If new data is appended to an OSS object that has already been imported to Simple Log Service, all data of the object is re-imported the next time the data import job reads the object.

  1. Log on to the Simple Log Service console.

  2. In the Import Data section, click the Data Import tab. Then, click OSS - Data Import.

  3. Select the project and Logstore. Then, click Next.

  4. In the Import Configuration step, create a data import configuration.

    1. In the Import Configuration step, configure the following parameters.

      Parameters

      Parameter

      Description

      Job Name

      The name of the job. The name must be globally unique.

      Display Name

      The display name of the job.

      Job Description

      The description of the job.

      OSS Region

      The region where the OSS bucket resides. The OSS bucket stores the OSS objects that you want to import to Simple Log Service.

      If the OSS bucket and the Simple Log Service project reside in the same region, no Internet traffic is generated, and data is transferred at a high speed.

      Bucket

      The OSS bucket.

      File Path Prefix Filter

      The directory of the OSS objects. If you configure this parameter, the system can find the OSS objects that you want to import in a more efficient manner. For example, if the OSS objects that you want to import are stored in the csv/ directory, you can set this parameter to csv/.

      If you leave this parameter empty, the system traverses the entire OSS bucket to find the OSS objects.

      Note

      We recommend that you configure this parameter. The more objects the bucket contains, the less efficient the import becomes if the entire bucket must be traversed.

      File Path Regex Filter

      The regular expression that you want to use to filter OSS objects by directory. If you configure this parameter, the system can find the OSS objects that you want to import in a more efficient manner. Only the objects whose names match the regular expression are imported. The names include the paths of the objects. By default, this parameter is empty, which indicates that no filtering is performed.

      For example, if an OSS object that you want to import is named testdata/csv/bill.csv, you can set this parameter to (testdata/csv/)(.*).

      For more information about how to debug a regular expression, see How do I debug a regular expression?

      File Modification Time Filter

      The modification time based on which you want to filter OSS objects. If you configure this parameter, the system can find the OSS objects that you want to import in a more efficient manner. Valid values:

      • All: To import all OSS objects that meet specified conditions, select this option.

      • From Specific Time: To import OSS objects that are modified after a point in time, select this option.

      • Specific Time Range: To import OSS objects that are modified within a time range, select this option.

      Data Format

      The format of the OSS objects. Valid values:

      • CSV: You can specify the first line of an OSS object as field names or specify custom field names. All lines except the first line are parsed as the values of log fields.

      • Single-line JSON: An OSS object is read line by line. Each line is parsed as a JSON object. The fields in JSON objects are log fields.

      • Single-line Text Log: Each line in an OSS object is parsed as a log.

      • Multi-line Text Logs: Multiple lines in an OSS object are parsed as a log. You can specify a regular expression to match the first line or the last line of a log.

      • ORC: An OSS object of the Optimized Row Columnar (ORC) format is automatically parsed into the format that is supported by Simple Log Service. You do not need to configure further settings.

      • Parquet: An OSS object of the Parquet format is automatically parsed into the format that is supported by Simple Log Service. You do not need to configure further settings.

      • Alibaba Cloud OSS Access Log: An OSS object is parsed as an access log of Alibaba Cloud OSS. For more information, see Logging.

      • Alibaba Cloud CDN Download Log: An OSS object is parsed as a download log of Alibaba Cloud CDN. For more information, see Download offline logs.

      Compression Format

      The compression format of the OSS objects. Simple Log Service decompresses the OSS objects based on the specified format to read data.

      Encoding Format

      The encoding format of the OSS objects. UTF-8 and GBK are supported.

      New File Check Cycle

      If new objects are constantly generated in the specified directory of OSS objects, you can configure New File Check Cycle based on your business requirements. After you configure this parameter, the data import job is continuously running in the background, and new objects are automatically detected and read at regular intervals. The system ensures that data in an OSS object is not repeatedly written to Simple Log Service.

      If new objects are no longer generated in the specified directory of OSS objects, you can change the value of New File Check Cycle to Never Check. Then, the data import job automatically exits after all objects that meet specified conditions are read.

      Import Archive Files

      If the OSS objects that you want to import are of the Archive or Cold Archive storage class, Simple Log Service can read data from the objects only after the objects are restored. If you turn on this switch, Archive and Cold Archive objects are automatically restored. This switch does not support Deep Cold Archive objects.

      Note
      • It takes approximately 1 minute to restore Archive objects, which may cause the first preview to time out. If the first preview times out, you must wait for a period of time and try again.

      • It takes approximately 1 hour to restore Cold Archive objects. If the preview times out, you can skip the preview or wait for an hour and try again.

        By default, restored Cold Archive objects remain valid for seven days. This allows sufficient time for the system to import Cold Archive objects to Simple Log Service.

      Log Time Configuration

      Time Field

      The time field. Enter the name of a time column in an OSS object. If you set Data Format to CSV, Single-line JSON, ORC, Parquet, Alibaba Cloud OSS Access Log, or Alibaba Cloud CDN Download Log, you must configure this parameter. This parameter specifies the log time.

      Regular Expression to Extract Time

      The regular expression that you want to use to extract log time. If you set Data Format to Single-line Text Log or Multi-line Text Logs, you must configure this parameter.

      For example, if a sample log is 127.0.0.1 - - [10/Sep/2018:12:36:49 +0800] "GET /index.html HTTP/1.1", you can set Regular Expression to Extract Time to [0-9]{0,2}\/[0-9a-zA-Z]+\/[0-9\: +]+.

      Note

      For other data types, if you want to extract part of the time field, you can specify a regular expression.

      Time Field Format

      The time format that you want to use to parse the value of the time field.

      • You can specify a time format that is supported by the Java SimpleDateFormat class. Example: yyyy-MM-dd HH:mm:ss. For more information about the time format syntax, see Class SimpleDateFormat. For more information about common time formats, see Time formats.

      • You can specify an epoch time format, which can be epoch, epochMillis, epochMicro, or epochNano. (A short parsing sketch appears after this parameter list.)

      Time Zone

      The time zone for the value of the time field. If the value of Time Field Format is an epoch time format, you do not need to configure this parameter.

      If you want to use daylight saving time when you parse logs, you can select a time zone in UTC. Otherwise, select a time zone in GMT.

      Advanced Settings

      OSS Metadata Indexing

      If the number of OSS objects exceeds one million, we recommend that you turn on the switch. Otherwise, the system requires a long period of time to find new objects. If you turn on this switch, the system can find new objects in an OSS bucket within seconds. This way, data in the new objects can be written to Simple Log Service in near real time.

      Before you turn on this switch, you must enable the metadata management feature in OSS. For more information, see Data indexing.
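
      As a rough aid for the Log Time Configuration parameters above, the following Python sketch shows how a SimpleDateFormat-style pattern and the epoch formats relate to the same timestamp. The Python strptime pattern is only an approximate equivalent of the Java pattern, not what Simple Log Service runs internally.

      from datetime import datetime, timezone

      raw = "2018-09-10 12:36:49"

      # Time Field Format: yyyy-MM-dd HH:mm:ss (Java SimpleDateFormat)
      # Closest Python equivalent: %Y-%m-%d %H:%M:%S
      parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)

      # The same instant in the supported epoch formats:
      epoch = int(parsed.timestamp())      # epoch       (seconds)
      epoch_millis = epoch * 1_000         # epochMillis (milliseconds)
      epoch_micro = epoch * 1_000_000      # epochMicro  (microseconds)
      epoch_nano = epoch * 1_000_000_000   # epochNano   (nanoseconds)
      print(epoch, epoch_millis, epoch_micro, epoch_nano)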

      If you set Data Format to CSV or Multi-line Text Logs, you must configure additional parameters. The following tables describe the parameters.

      CSV

      Parameter

      Description

      Delimiter

      The delimiter for logs. The default value is a comma (,).

      Quote

      The quote that is used to enclose a CSV-formatted string.

      Escape Character

      The escape character for logs. The default value is a backslash (\).

      Maximum Lines

      The maximum number of lines allowed for a log if the original log has multiple lines. Default value: 1.

      First Line as Field Name

      If you turn on First Line as Field Name, the first line of a CSV file is parsed to extract field names.

      Custom Fields

      If you turn off First Line as Field Name, you can specify custom field names. Separate multiple field names with commas (,).

      Lines to Skip

      The number of lines that are skipped. For example, if you set this parameter to 1, the first line of a CSV file is skipped, and log collection starts from the second line.
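
      To illustrate how the CSV parameters interact, the following Python sketch parses a sample line with the csv module as a stand-in for the importer. The sample data and parameter values are illustrative only.

      import csv
      import io

      sample = 'time,level,message\n"2024-01-01 00:00:00",INFO,"hello, world"\n'

      reader = csv.reader(
          io.StringIO(sample),
          delimiter=",",    # Delimiter (default: comma)
          quotechar='"',    # Quote that encloses CSV-formatted strings
          escapechar="\\",  # Escape Character (default: backslash)
      )
      rows = list(reader)

      field_names = rows[0]   # First Line as Field Name is turned on
      for row in rows[1:]:    # the remaining lines become log field values
          print(dict(zip(field_names, row)))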

      Multi-line Text Logs

      Parameter

      Description

      Position to Match Regular Expression

      The usage of a regular expression.

      • Regular Expression to Match First Line: If you select this option, the regular expression that you specify is used to match the first line of a log. The unmatched lines are collected as a part of the log until the maximum number of lines that you specify is reached.

      • Regular Expression to Match Last Line: If you select this option, the regular expression that you specify is used to match the last line of a log. The unmatched lines are collected as a part of the next log until the maximum number of lines that you specify is reached.

      Regular Expression

      The regular expression. You can specify a regular expression based on the log content.

      For more information about how to debug a regular expression, see How do I debug a regular expression?

      Maximum Lines

      The maximum number of lines allowed for a log.
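
      To illustrate Regular Expression to Match First Line, the following Python sketch splits a text stream into logs: every line that matches the pattern starts a new log, and unmatched lines are appended to the current one until Maximum Lines is reached. The pattern and sample data are assumptions for demonstration.

      import re

      first_line = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

      def split_logs(text, max_lines=10):
          logs, current = [], []
          for line in text.splitlines():
              if first_line.match(line) or len(current) >= max_lines:
                  if current:
                      logs.append("\n".join(current))
                  current = [line]
              else:
                  current.append(line)   # unmatched lines belong to the current log
          if current:
              logs.append("\n".join(current))
          return logs

      sample = ("2024-01-01 00:00:00 ERROR stack trace\n"
                "  at main()\n"
                "2024-01-01 00:00:01 INFO ok\n")
      print(split_logs(sample))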

    2. Click Preview to preview the import result.

    3. After you confirm the result, click Next.

  5. Create indexes and preview data. Then, click Next. By default, full-text indexing is enabled in Simple Log Service. You can also manually create field indexes for the collected logs or click Automatic Index Generation. Then, Simple Log Service generates field indexes. For more information, see Create indexes.

    Important

    If you want to query all fields in logs, we recommend that you use full-text indexes. If you want to query only specific fields, we recommend that you use field indexes. This helps reduce index traffic. If you want to analyze fields, you must create field indexes. You must include a SELECT statement in your query statement for analysis.

  6. Click Query Log. On the query and analysis page that appears, check whether OSS data is imported.

    Wait approximately 1 minute. If the expected OSS data is displayed, the import is successful.
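
    You can also check the import programmatically instead of in the console. The following sketch queries the Logstore, assuming the aliyun-log-python-sdk package; the endpoint, project, and Logstore names are placeholders.

    import time
    from aliyun.log import LogClient

    client = LogClient("cn-hangzhou.log.aliyuncs.com",
                       "<your-access-key-id>", "<your-access-key-secret>")

    now = int(time.time())
    # Query the last 15 minutes of the Logstore that receives the import.
    resp = client.get_log("my-import-project", "my-logstore",
                          from_time=now - 900, to_time=now, query="*")
    resp.log_print()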

What to do next

After you create a data import configuration, you can view the configuration details and related statistical reports in the Simple Log Service console.

  1. In the Projects section, click the project to which the data import configuration belongs.

  2. On the Log Storage > Logstores tab, click the Logstore to which the data import configuration belongs, choose Data Collection > Data Import, and then click the name of the data import configuration.

  3. View, modify, delete, or stop the data import job.

    • View: On the Import Configuration Overview page, view the basic information and statistical reports of the data import configuration.

    • Modify: To modify the data import configuration, click Edit Configurations. For more information, see Create a data import configuration.

    • Delete: To delete the data import configuration, click Delete Configuration.

      Warning: After the data import configuration is deleted, it cannot be restored.

    • Stop: To stop the data import job, click Stop.

FAQ

  • Problem: No data is displayed during preview.

    Possible cause: The OSS bucket contains no objects, the objects contain no data, or no objects meet the filter conditions.

    Solution:

      • Check whether the OSS bucket contains objects that are not empty and whether the CSV files contain only a header line. If no OSS objects contain data, wait until data is written to the objects and then import them.

      • Modify File Path Prefix Filter, File Path Regex Filter, and File Modification Time Filter.

  • Problem: Garbled characters exist.

    Possible cause: The data format, compression format, or encoding format is not configured as expected.

    Solution: Check the actual format of the OSS objects and modify Data Format, Compression Format, or Encoding Format. To handle garbled characters that are already imported, create a new Logstore and a new data import configuration.

  • Problem: The log time displayed in Simple Log Service is different from the actual log time.

    Possible cause: No time field is specified in the data import configuration, or the specified time format or time zone is invalid.

    Solution: Specify a time field, or specify a valid time format and time zone. For more information, see Create a data import configuration.

  • Problem: After data is imported, the data cannot be queried or analyzed.

    Possible cause: The data is not within the query time range, no indexes are configured, or the configured indexes do not take effect.

    Solution:

      • Check whether the time of the log that you want to query is within the query time range. If no, adjust the query time range and query the data again.

      • Check whether indexes are configured for the Logstore to which the objects are imported. If no, configure indexes. For more information, see Create indexes and Reindex logs for a Logstore.

      • If indexes are configured for the Logstore and the volume of imported data is displayed as expected on the Data Processing Insight dashboard, the indexes may not have taken effect. In this case, reconfigure the indexes. For more information, see Reindex logs for a Logstore.

  • Problem: The number of imported data entries is less than expected.

    Possible cause: Some OSS objects contain lines that are greater than 3 MB in size. These lines are discarded during the import. For more information, see Limits on collection.

    Solution: When you write data to an OSS object, make sure that the size of each line does not exceed 3 MB.
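
    As a precaution, you can scan a local log file for oversized lines before uploading it to OSS. The following sketch assumes a placeholder file path and uses the 3 MB per-line limit mentioned above.

    MAX_LINE_BYTES = 3 * 1024 * 1024   # 3 MB per-line limit

    def find_oversized_lines(path):
        with open(path, "rb") as f:
            for number, line in enumerate(f, start=1):
                if len(line) > MAX_LINE_BYTES:
                    yield number, len(line)

    for number, size in find_oversized_lines("bill.csv"):
        print(f"line {number} is {size} bytes and would be discarded")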

  • Problem: The number of OSS objects and the total volume of data are large, but the import speed does not meet expectations. In most cases, the import speed can reach 80 MB/s.

    Possible cause: The number of shards in the Logstore is too small. For more information, see Limits on performance.

    Solution: Increase the number of shards in the Logstore to 10 or more and check whether the latency improves. For more information, see Manage shards.

  • Problem: OSS buckets cannot be selected when you create a data import configuration.

    Possible cause: The AliyunLogImportOSSRole role is not assigned to Simple Log Service.

    Solution: Complete authorization as described in the "Prerequisites" section of this topic.

  • Problem: Some OSS objects failed to be imported to Simple Log Service.

    Possible cause: The filter conditions are invalid, or the size of an object exceeds 5 GB. For more information, see Limits on collection.

    Solution:

      • Check whether the OSS objects that you want to import meet the filter conditions. If no, modify the filter conditions.

      • Check whether the size of each OSS object that you want to import is less than 5 GB. If no, reduce the size of the object.

  • Problem: Archive objects are not imported to Simple Log Service.

    Possible cause: Import Archive Files is turned off. For more information, see Limits on collection.

    Solution: Modify the data import configuration and turn on Import Archive Files, or create a new data import configuration with Import Archive Files turned on.
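
    Alternatively, you can trigger the restore of an Archive object yourself with the oss2 SDK, as in the following sketch; the endpoint, bucket, and object names are placeholders.

    import oss2

    auth = oss2.Auth("<your-access-key-id>", "<your-access-key-secret>")
    bucket = oss2.Bucket(auth, "https://oss-cn-hangzhou.aliyuncs.com", "my-log-bucket")

    key = "csv/bill.csv"
    meta = bucket.head_object(key)
    if meta.headers.get("x-oss-storage-class") == "Archive":
        bucket.restore_object(key)  # restoring an Archive object takes about 1 minute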

  • Problem: An error occurred when an OSS object in the Multi-line Text Logs format was parsed.

    Possible cause: The regular expression that is specified to match the first line or the last line of a log is invalid.

    Solution: Check whether the regular expression that is specified to match the first line or the last line of a log is valid.

  • Problem: The latency to import new OSS objects is higher than expected.

    Possible cause: The number of existing OSS objects that meet the File Path Prefix Filter condition exceeds the upper limit, and OSS Metadata Indexing is turned off in the data import configuration.

    Solution: If more than one million existing OSS objects meet the File Path Prefix Filter condition, turn on OSS Metadata Indexing in the data import configuration. Otherwise, new objects are discovered at low efficiency.
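
    To estimate whether OSS Metadata Indexing is worth enabling, you can count the objects under the configured prefix, as in the following oss2 sketch; the endpoint, bucket, and prefix are placeholders.

    import oss2

    auth = oss2.Auth("<your-access-key-id>", "<your-access-key-secret>")
    bucket = oss2.Bucket(auth, "https://oss-cn-hangzhou.aliyuncs.com", "my-log-bucket")

    # Count the objects under the File Path Prefix Filter directory.
    count = sum(1 for _ in oss2.ObjectIterator(bucket, prefix="csv/"))
    print(f"{count} objects under csv/")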

Error handling

Error

Description

File read failure

If an OSS object fails to be completely read because a network exception occurs or the object is damaged, the data import job automatically retries the read. If the object still cannot be read after three retries, the object is skipped.

The retry interval is the same as the value of New File Check Cycle. If New File Check Cycle is set to Never Check, the retry interval is 5 minutes.

Compression format parsing error

If an OSS object cannot be decompressed in the specified compression format, the data import job skips the object.

Data format parsing error

  • If an OSS object that is in a binary format such as ORC or Parquet fails to be parsed, the data import job skips the OSS object.

  • If data in other formats fails to be parsed, the data import job stores the original text content in the content field of logs.

OSS bucket absence

If the OSS bucket does not exist, the data import job periodically retries. After the bucket is re-created, the job automatically resumes the import.

Permission error

If a permission error occurs when data is read from an OSS bucket or data is written to a Simple Log Service Logstore, the data import job periodically retries. After the error is fixed, the data import job automatically resumes the import.

If a permission error occurs, the data import job does not skip any OSS objects. After the error is fixed, the data import job automatically imports data from the unprocessed objects in the OSS bucket to the Simple Log Service Logstore.