After Simple Log Service collects logs, you can ship the logs to an Object Storage Service (OSS) bucket for data storage and analysis. This topic describes how to create an OSS data shipping job (new version).
Prerequisites
A project and a Logstore are created. For more information, see Step 1: Create a project and a Logstore.
Data is collected. For more information, see Data collection overview.
An OSS bucket is created in the region where the Simple Log Service project resides. For more information, see Create buckets.
Supported regions
Simple Log Service ships data to an OSS bucket that resides in the same region as the specified Simple Log Service project.
You can use the new version of the data shipping feature to ship data to OSS only in the following regions: China (Hangzhou), China (Shanghai), China (Qingdao), China (Beijing), China (Zhangjiakou), China (Hohhot), China (Ulanqab), China (Chengdu), China (Shenzhen), China (Heyuan), China (Guangzhou), China (Hong Kong), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Philippines (Manila), Thailand (Bangkok), Japan (Tokyo), US (Silicon Valley), and US (Virginia).
Create a data shipping job
Log on to the Simple Log Service console.
In the Projects section, click the project you want.
On the tab, find the Logstore, click >, and then choose .
Move the pointer over Object Storage Service (OSS) and click the + icon.
In the Create Data Shipping Job panel, select OSS Export and click OK.
In the Data Shipping to OSS panel, configure the parameters and click OK.
Important: After you create an OSS data shipping job, how often the data in a shard is shipped to the OSS bucket depends on the values of the Batch Size and Batch Interval parameters that you configure for the job. Data is shipped as soon as either condition is met.
After you create an OSS data shipping job, you can check whether the job meets your requirements based on the status of the job and the data that is shipped to OSS.
Parameter
Description
Job Name
The name of the data shipping job.
Display Name
The display name of the data shipping job.
Job Description
The description of the data shipping job.
OSS Bucket
The name of the OSS bucket to which you want to ship data.
Important: You can ship data only to an existing OSS bucket for which the Write Once Read Many (WORM) feature is disabled. The bucket must reside in the same region as your Simple Log Service project. For more information about the WORM feature, see Retention policies.
You can ship data to an OSS bucket of the Standard, Infrequent Access (IA), Archive, Cold Archive, or Deep Cold Archive storage class. By default, the storage class of the generated OSS objects that store the shipped data is the same as the storage class of the specified OSS bucket. For more information, see Overview of storage classes.
OSS buckets of storage classes other than Standard are subject to a minimum storage duration and a minimum billable size. We recommend that you choose the storage class based on your business requirements when you create the OSS bucket. For more information, see Differences between storage classes.
File Delivery Directory
The directory to which you want to ship data in the OSS bucket. The directory name cannot start with a forward slash (/) or a backslash (\).
After you create the OSS data shipping job, the data in the Logstore is shipped to the directory.
Object Suffix
The suffix of the OSS objects in which the shipped data is stored. If you do not specify an object suffix, Simple Log Service automatically generates an object suffix based on the storage format and compression type that you specify. Example: .suffix.
Partition Format
The partition format that is used to generate subdirectories in the OSS bucket. A subdirectory is dynamically generated based on the shipping time. The default partition format is %Y/%m/%d/%H/%M. The partition format cannot start with a forward slash (/). For more information about partition format examples, see Partition formats. For more information about the parameters of partition formats, see strptime API.
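The partition format uses the same %-directives as Python's datetime.strftime, so you can preview locally which subdirectory a format produces for a given shipping time. This is a minimal sketch for illustration, not Simple Log Service's own implementation; the function name partition_subdirectory is hypothetical.

```python
from datetime import datetime

DEFAULT_PARTITION_FORMAT = "%Y/%m/%d/%H/%M"

def partition_subdirectory(shipping_time: datetime,
                           partition_format: str = DEFAULT_PARTITION_FORMAT) -> str:
    """Preview the subdirectory that a partition format yields for a shipping time."""
    # Partition formats cannot start with a forward slash, so reject them here too.
    if partition_format.startswith("/"):
        raise ValueError("The partition format cannot start with a forward slash (/).")
    # %Y, %m, %d, %H, and %M expand to year, month, day, hour, and minute.
    return shipping_time.strftime(partition_format)

shipping_time = datetime(2022, 1, 20, 19, 50, 43)
print(partition_subdirectory(shipping_time))                  # 2022/01/20/19/50
print(partition_subdirectory(shipping_time, "ds=%Y%m%d/%H"))  # ds=20220120/19
```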
OSS Write RAM Role
The method that is used to authorize the OSS data shipping job to write data to the OSS bucket. Valid values:
Default Role: specifies that the OSS data shipping job assumes the AliyunLogDefaultRole system role to write data to the OSS bucket. If you select this option, the Alibaba Cloud Resource Name (ARN) of the AliyunLogDefaultRole system role is automatically specified. For more information about how to obtain the ARN, see Access data by using a default role.
Custom Role: specifies that the OSS data shipping job assumes a custom role to write data to the OSS bucket.
If you select this option, you must grant the custom role the permissions to write data to the OSS bucket. Then, enter the ARN of the custom role in the OSS Write RAM Role field. For information about how to obtain the ARN, see one of the following topics based on your business scenario:
If the Logstore and the OSS bucket belong to the same Alibaba Cloud account, obtain the ARN by following the instructions that are provided in Step 2: Grant the RAM role the permissions to write data to an OSS bucket.
If the Logstore and the OSS bucket belong to different Alibaba Cloud accounts, obtain the ARN by following the instructions that are provided in Step 2: Grant RAM Role B the permissions to write data to an OSS bucket.
Logstore Read RAM Role
The method that is used to authorize the OSS data shipping job to read data from the Logstore. Valid values:
Default Role: specifies that the OSS data shipping job assumes the AliyunLogDefaultRole system role to read data from the Logstore. If you select this option, the ARN of the AliyunLogDefaultRole system role is automatically specified. For more information about how to obtain the ARN, see Access data by using a default role.
Custom Role: specifies that the OSS data shipping job assumes a custom role to read data from the Logstore.
If you select this option, you must grant the custom role the permissions to read data from the Logstore. Then, enter the ARN of the custom role in the Logstore Read RAM Role field. For information about how to obtain the ARN, see one of the following topics based on your business scenario:
If the Logstore and the OSS bucket belong to the same Alibaba Cloud account, obtain the ARN by following the instructions that are provided in Step 1: Grant the RAM role the permissions to read data from a Logstore.
If the Logstore and the OSS bucket belong to different Alibaba Cloud accounts, obtain the ARN by following the instructions that are provided in Step 1: Grant RAM Role A the permissions to read data from a Logstore.
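Both role fields take the ARN of a RAM role. The sketch below assumes the common RAM role ARN layout acs:ram::&lt;account-id&gt;:role/&lt;role-name&gt;; the helper ram_role_arn and the account ID 123456789012 are hypothetical, and the authoritative value is always the ARN that the RAM console shows for the role.

```python
def ram_role_arn(account_id: str, role_name: str) -> str:
    """Build a RAM role ARN in the assumed acs:ram::<account-id>:role/<role-name> form."""
    # The account ID is the numeric ID of the Alibaba Cloud account that owns the role.
    if not account_id.isdigit():
        raise ValueError("account_id must be a numeric Alibaba Cloud account ID")
    if not role_name:
        raise ValueError("role_name must not be empty")
    return f"acs:ram::{account_id}:role/{role_name}"

# Hypothetical account ID; copy the real ARN from the RAM console instead.
print(ram_role_arn("123456789012", "aliyunlogdefaultrole"))
# acs:ram::123456789012:role/aliyunlogdefaultrole
```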
Storage Format
The storage format of data. After data is shipped from Simple Log Service to OSS, the data can be stored in different formats. For more information, see CSV format, JSON format, Parquet format, and ORC format.
Compress
Specifies whether to compress data that is shipped to OSS. Valid values:
No Compress(none): Data is not compressed.
Compress(snappy): Data is compressed by using the snappy algorithm. This way, less storage space is occupied in the OSS bucket. For more information, see snappy.
Compress(zstd): Data is compressed by using the zstd algorithm. This way, less storage space is occupied in the OSS bucket.
Compress(gzip): Data is compressed by using the gzip algorithm. This way, less storage space is occupied in the OSS bucket.
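If you select Compress(gzip), a downloaded object can be decompressed with Python's standard gzip module. This sketch assumes the object uses the JSON storage format with one JSON document per line; that layout, the function name read_gzip_json_lines, and the sample records are assumptions for illustration. Objects compressed with snappy or zstd need third-party libraries and are not covered here.

```python
import gzip
import json
import os
import tempfile

def read_gzip_json_lines(path: str) -> list:
    """Decompress a gzip object and parse one JSON document per non-empty line."""
    records = []
    # "rt" opens the gzip stream in text mode and decodes the bytes as UTF-8.
    with gzip.open(path, mode="rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # ignore blank lines
                records.append(json.loads(line))
    return records

# Self-contained demo: write a small gzip file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.json.gz")
    with gzip.open(sample, mode="wt", encoding="utf-8") as f:
        f.write('{"level": "INFO", "msg": "started"}\n')
        f.write('{"level": "WARN", "msg": "slow request"}\n')
    print(read_gzip_json_lines(sample))
```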
Ship Tag
Reserved field. For more information, see Reserved fields.
Batch Size
The job starts to ship data when the volume of log data obtained from the shard reaches the value of this parameter. The value also determines the size of raw data in each OSS object. Valid values: 5 to 256. Unit: MB. Data is shipped as soon as either the Batch Size condition or the Batch Interval condition is met.
Batch Interval
The job starts to ship data when the time difference between the first log and the nth log obtained from the shard reaches or exceeds the value of this parameter. Valid values: 300 to 900. Unit: seconds. Data is shipped as soon as either the Batch Size condition or the Batch Interval condition is met.
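The two batch conditions combine with a logical OR. The following is a simplified model of that trigger, not the service's implementation: the function name should_ship and its arguments are hypothetical, times are in seconds, and 1 MB is assumed to be 1,048,576 bytes.

```python
MB = 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes

def should_ship(buffered_bytes: int, first_log_time: float, nth_log_time: float,
                batch_size_mb: int, batch_interval_s: int) -> bool:
    """Return True when either the Batch Size or the Batch Interval condition is met."""
    if not 5 <= batch_size_mb <= 256:
        raise ValueError("Batch Size must be between 5 and 256 MB")
    if not 300 <= batch_interval_s <= 900:
        raise ValueError("Batch Interval must be between 300 and 900 seconds")
    # Size condition: the accumulated raw data reaches Batch Size.
    size_reached = buffered_bytes >= batch_size_mb * MB
    # Time condition: the span from the first log to the nth log reaches Batch Interval.
    interval_reached = nth_log_time - first_log_time >= batch_interval_s
    return size_reached or interval_reached

print(should_ship(6 * MB, 0, 120, batch_size_mb=5, batch_interval_s=300))  # True (size)
print(should_ship(1 * MB, 0, 300, batch_size_mb=5, batch_interval_s=300))  # True (interval)
print(should_ship(1 * MB, 0, 120, batch_size_mb=5, batch_interval_s=300))  # False
```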
Shipping Time
The interval between two operations that ship the data of a shard. Valid values: 300 to 900. Default value: 300. Unit: seconds.
Shipping Latency
The latency of data shipping. Unit: seconds. For example, if you set the value to 3600, data is shipped after a delay of 1 hour: the data that is generated at 10:00:00 on June 5, 2023 is not written to the specified OSS bucket until 11:00:00 on June 5, 2023. For more information about limits, see Configuration items.
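The example above is plain time arithmetic. This sketch reproduces it; the function name earliest_write_time is hypothetical, and the function models only the latency setting, not the batch conditions.

```python
from datetime import datetime, timedelta

def earliest_write_time(generated_at: datetime, shipping_latency_s: int) -> datetime:
    """Earliest time at which data generated at generated_at can be written to OSS."""
    # Shipping Latency is in seconds; 3600 seconds is 1 hour.
    return generated_at + timedelta(seconds=shipping_latency_s)

print(earliest_write_time(datetime(2023, 6, 5, 10, 0, 0), 3600))  # 2023-06-05 11:00:00
```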
Start Time Range
The time range of data that the OSS data shipping job ships. The time range is based on the time when logs are received. Valid values:
All: The OSS data shipping job ships data in the Logstore from the first log until the job is manually stopped.
From Specific Time: The OSS data shipping job ships data in the Logstore from the log that is received at the specified start time until the job is manually stopped.
Specific Time Range: The OSS data shipping job ships data in the Logstore from the log that is received at the specified start time to the log that is received at the specified end time.
Note: The time range refers to __tag__:__receive_time__. For more information, see Reserved fields.
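The three Start Time Range options differ only in which bounds apply to a log's receive time. The sketch below maps them to optional start and end bounds; the function name in_start_time_range is hypothetical, and treating the end time as exclusive is an assumption, because the exact boundary behavior is not specified here.

```python
from datetime import datetime
from typing import Optional

def in_start_time_range(receive_time: datetime,
                        start: Optional[datetime] = None,
                        end: Optional[datetime] = None) -> bool:
    """Check whether a log's receive time falls inside the configured range.

    All:                 start=None, end=None
    From Specific Time:  start set,  end=None
    Specific Time Range: start set,  end set
    """
    if start is not None and receive_time < start:
        return False
    # Assumption: the end time is exclusive.
    if end is not None and receive_time >= end:
        return False
    return True

t = datetime(2023, 6, 5, 10, 0, 0)
print(in_start_time_range(t))                                       # All
print(in_start_time_range(t, start=datetime(2023, 6, 5, 11)))      # From Specific Time
print(in_start_time_range(t, datetime(2023, 6, 5), datetime(2023, 6, 6)))  # Specific Time Range
```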
Time Zone
The time zone that is used to format the time.
If you configure the Time Zone and Partition Format parameters, the system generates subdirectories in the OSS bucket based on the partition format, with times expressed in the specified time zone.
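The same instant can land in different subdirectories depending on the time zone. This sketch illustrates the effect with fixed UTC offsets (to avoid depending on a time zone database); the function name partition_in_time_zone is hypothetical and does not show how Simple Log Service applies the setting internally.

```python
from datetime import datetime, timedelta, timezone

def partition_in_time_zone(instant: datetime, partition_format: str,
                           utc_offset_hours: int) -> str:
    """Render a partition format after converting an aware datetime to a fixed UTC offset."""
    if instant.tzinfo is None:
        raise ValueError("instant must be timezone-aware")
    tz = timezone(timedelta(hours=utc_offset_hours))
    return instant.astimezone(tz).strftime(partition_format)

instant = datetime(2022, 1, 20, 16, 30, tzinfo=timezone.utc)
print(partition_in_time_zone(instant, "%Y/%m/%d/%H", 0))  # 2022/01/20/16
print(partition_in_time_zone(instant, "%Y/%m/%d/%H", 8))  # 2022/01/21/00
```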
View OSS data
After data is shipped to OSS, you can view the data in the OSS console. You can also view the data by using other methods, such as OSS API or OSS SDK. For more information, see Manage objects.
The URL of an OSS object is in the following format:
oss://OSS-BUCKET/OSS-PREFIX/PARTITION-FORMAT_RANDOM-ID
OSS-BUCKET is the name of the OSS bucket.
OSS-PREFIX is the specified directory in the OSS bucket.
PARTITION-FORMAT is the partition format that is used to generate subdirectories. A subdirectory is generated based on the shipping time by using the strptime function. For more information about the strptime function, see strptime API.
RANDOM-ID is the unique identifier of a shipping operation.
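The URL components combine mechanically, with the object suffix appended at the end as in the examples under Partition formats. This sketch assembles a URL from its parts; the function name oss_object_url is hypothetical, and because RANDOM-ID is generated by Simple Log Service, the value used here is copied from the example table rather than computed.

```python
from datetime import datetime

def oss_object_url(bucket: str, prefix: str, partition_format: str,
                   shipped_at: datetime, random_id: str, suffix: str = "") -> str:
    """Assemble oss://OSS-BUCKET/OSS-PREFIX/PARTITION-FORMAT_RANDOM-ID plus an optional suffix."""
    partition = shipped_at.strftime(partition_format)  # e.g. "2022/01/20/19/50"
    return f"oss://{bucket}/{prefix}/{partition}_{random_id}{suffix}"

print(oss_object_url("test-bucket", "test-table", "%Y/%m/%d/%H/%M",
                     datetime(2022, 1, 20, 19, 50, 43),
                     "1484913043351525351_2850008", ".suffix"))
# oss://test-bucket/test-table/2022/01/20/19/50_1484913043351525351_2850008.suffix
```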
In a data shipping job, data is shipped to OSS in multiple shipping operations. Each operation stores the data that it ships in a separate OSS object. The path to an OSS object is determined by the earliest point in time at which Simple Log Service received the data in that object. This point in time is specified by receive_time. Take note of the following scenarios when you ship data from Simple Log Service to OSS:
Real-time data is shipped. For example, real-time data is shipped at 5-minute intervals, and a shipping operation is performed at 00:00:00 on January 22, 2022. This operation ships to OSS the data that was written to a shard after 23:55:00 on January 21, 2022. To analyze all data from January 22, 2022, you must check all OSS objects in the 2022/01/22 subdirectory, and also check whether the most recent OSS objects in the 2022/01/21 subdirectory include data from January 22, 2022.
Historical data is shipped. If the Logstore stores a small volume of data, a single shipping operation may pull data from multiple days. In this case, the OSS objects in the 2022/01/22 subdirectory may include all the data from January 23, 2022, while no OSS objects exist in the 2022/01/23 subdirectory.
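Because an object's path reflects only the earliest receive time of its data, a reliable daily analysis scans more than one subdirectory and then filters records by receive time. This sketch assumes the default %Y/%m/%d/... partition format and records that carry a receive time in Unix seconds under a hypothetical receive_time key; the function names are also hypothetical. For real-time shipping, the previous day's subdirectory plus the target day's subdirectory is usually enough; for historical shipping, scan a wider set of subdirectories and rely on the filter.

```python
from datetime import date, datetime, timedelta, timezone

def candidate_prefixes(day: date, oss_prefix: str) -> list:
    """Subdirectories to scan for one day's data under a %Y/%m/%d/... partition format."""
    previous_day = day - timedelta(days=1)
    # The previous day's most recent objects may hold data received after midnight.
    return [f"{oss_prefix}/{d:%Y/%m/%d}/" for d in (previous_day, day)]

def records_for_day(records: list, day: date, tz: timezone = timezone.utc) -> list:
    """Keep records whose receive time (Unix seconds) falls on the given calendar day."""
    return [r for r in records
            if datetime.fromtimestamp(r["receive_time"], tz).date() == day]

print(candidate_prefixes(date(2022, 1, 22), "test-table"))
# ['test-table/2022/01/21/', 'test-table/2022/01/22/']
```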
Partition formats
Each shipping operation corresponds to an OSS object URL, which is in the oss://OSS-BUCKET/OSS-PREFIX/PARTITION-FORMAT_RANDOM-ID format. The following table describes various partition formats for a shipping operation that was performed at 19:50:43 on January 20, 2022.
OSS Bucket | OSS Prefix | Partition format | Object suffix | URL of the OSS object |
test-bucket | test-table | %Y/%m/%d/%H/%M | .suffix | oss://test-bucket/test-table/2022/01/20/19/50_1484913043351525351_2850008.suffix |
test-bucket | log_ship_oss_example | year=%Y/mon=%m/day=%d/log_%H%M | .suffix | oss://test-bucket/log_ship_oss_example/year=2022/mon=01/day=20/log_1950_1484913043351525351_2850008.suffix |
test-bucket | log_ship_oss_example | ds=%Y%m%d/%H | .suffix | oss://test-bucket/log_ship_oss_example/ds=20220120/19_1484913043351525351_2850008.suffix |
test-bucket | log_ship_oss_example | %Y%m%d/ | .suffix | oss://test-bucket/log_ship_oss_example/20220120/_1484913043351525351_2850008.suffix (Note: If you use this format, platforms such as Hive may fail to parse data in the OSS bucket. We recommend that you do not use this format.) |
test-bucket | log_ship_oss_example | %Y%m%d%H | .suffix | oss://test-bucket/log_ship_oss_example/2022012019_1484913043351525351_2850008.suffix |
You can use big data platforms such as Hive, MaxCompute, or Data Lake Analytics (DLA) to analyze OSS data. To use partition information as columns, specify PARTITION-FORMAT in the key=value format. Example URL of an OSS object: oss://test-bucket/log_ship_oss_example/year=2022/mon=01/day=20/log_195043_1484913043351525351_2850008.parquet. In this example, year, mon, and day are the partition key columns.
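Platforms that read key=value directories derive partition columns from the path segments. This sketch shows the idea by extracting those columns from an object URL; the function name partition_columns is hypothetical, and it only mimics the Hive-style convention rather than calling any platform's API.

```python
from urllib.parse import urlparse

def partition_columns(object_url: str) -> dict:
    """Extract Hive-style key=value partition columns from an OSS object URL."""
    # urlparse puts the bucket in netloc and the object key in path.
    path = urlparse(object_url).path
    directories = path.strip("/").split("/")[:-1]  # drop the object name
    columns = {}
    for segment in directories:
        key, sep, value = segment.partition("=")
        if sep:  # only key=value segments become partition columns
            columns[key] = value
    return columns

url = ("oss://test-bucket/log_ship_oss_example/year=2022/mon=01/day=20/"
       "log_195043_1484913043351525351_2850008.parquet")
print(partition_columns(url))  # {'year': '2022', 'mon': '01', 'day': '20'}
```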