After logs are shipped from Simple Log Service to Object Storage Service (OSS), the logs are saved as files in one of several formats. This topic describes the Optimized Row Columnar (ORC) format.
Parameters
If you set Storage Format to orc when you create an OSS data shipping job of the new version, you must configure the parameters, as shown in the following figure. For more information, see Create an OSS data shipping job (new version).
The following table describes the parameters.
| Parameter | Description |
| --- | --- |
| Key Name | The names of the log fields that you want to ship to OSS. You can view log fields on the Raw Logs tab of a Logstore. We recommend that you add log field names one by one. The data shipping job organizes ORC data in the same sequence and uses the log field names as the column names of the ORC file. The log fields that you can ship to OSS include reserved fields such as `__time__`, `__topic__`, and `__source__`. For more information, see Reserved fields. **Note**: If a log does not contain a specified field, or if the field value fails to be converted to the specified data type, the corresponding column value in the ORC file is null. |
| Type | The data type of the specified log field. The following data types can be stored in an ORC file: STRING, BOOLEAN, INT32, INT64, FLOAT, and DOUBLE. When logs are shipped from Simple Log Service to OSS, log fields are converted from the STRING type to the data type that you specify. If a log field fails to be converted, the value of the column is null. |
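The null rule above can be sketched in a few lines of Python. This is a hypothetical helper that mirrors the documented behavior (missing field or failed conversion yields null), not part of any Simple Log Service SDK:

```python
# Sketch of the documented null rule: each log field arrives as a STRING
# and is converted to the configured ORC column type; a missing field or
# a failed conversion produces a null column value.
_CASTS = {
    "STRING": str,
    "BOOLEAN": lambda s: {"true": True, "false": False}[s.lower()],
    "INT32": int,
    "INT64": int,
    "FLOAT": float,
    "DOUBLE": float,
}

def to_orc_value(log, field, orc_type):
    """Return the ORC column value for one log entry, or None (null) on failure."""
    if field not in log:
        return None                 # field absent from this log -> null
    try:
        return _CASTS[orc_type](log[field])
    except (ValueError, KeyError):  # conversion failed -> null
        return None
```

For example, `to_orc_value({"status": "200"}, "status", "INT64")` yields `200`, while `to_orc_value({"status": "abc"}, "status", "INT64")` yields `None`.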
Sample URLs of OSS objects
After logs are shipped to OSS, the logs are stored in OSS buckets. The following table provides the sample URLs of the OSS objects that store the logs.
If you specify an object suffix when you create a data shipping job, the OSS objects use the suffix.
If you do not specify an object suffix when you create a data shipping job, the OSS objects use the suffix that is generated based on the compression type.
| Compression type | Object suffix | Sample URL |
| --- | --- | --- |
| Not compressed | If you specify an object suffix, the specified suffix takes effect. Example: `.suffix`. | `oss://oss-shipper-chengdu/ecs_test/2022/01/26/20/54_1453812893059571256_937.suffix` |
| Not compressed | If you do not specify an object suffix, `.orc` is used as the object suffix. | `oss://oss-shipper-chengdu/ecs_test/2022/01/26/20/54_1453812893059571256_937.orc` |
| Snappy | If you specify an object suffix, the specified suffix takes effect. Example: `.suffix`. | `oss://oss-shipper-chengdu/ecs_test/2022/01/26/20/54_1453812893059571256_937.suffix` |
| Snappy | If you do not specify an object suffix, `.snappy.orc` is used as the object suffix. | `oss://oss-shipper-chengdu/ecs_test/2022/01/26/20/54_1453812893059571256_937.snappy.orc` |
| Zstandard | If you specify an object suffix, the specified suffix takes effect. Example: `.suffix`. | `oss://oss-shipper-chengdu/ecs_test/2022/01/26/20/54_1453812893059571256_937.suffix` |
| Zstandard | If you do not specify an object suffix, `.zst.orc` is used as the object suffix. | `oss://oss-shipper-chengdu/ecs_test/2022/01/26/20/54_1453812893059571256_937.zst.orc` |

You can download an OSS object to your computer and use ORC tools to open it.
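The suffix rules in the table reduce to a simple lookup. A minimal sketch (the function and the compression-type keys are illustrative, not an API of the shipping job):

```python
# Suffix rules from the table: a user-specified suffix always wins;
# otherwise the suffix is derived from the compression type.
SUFFIX_BY_COMPRESSION = {
    "none": ".orc",
    "snappy": ".snappy.orc",
    "zstd": ".zst.orc",
}

def object_suffix(compression, custom_suffix=None):
    """Return the suffix an OSS object gets under the documented rules."""
    if custom_suffix:
        return custom_suffix
    return SUFFIX_BY_COMPRESSION[compression]
```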
Data consumption
You can consume data that is shipped to OSS by using E-MapReduce, Spark, or Hive. For more information, see LanguageManual DDL.
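For example, on E-MapReduce you can map the shipped objects to a Hive external table and query them with Hive or Spark SQL. The following DDL is a minimal sketch: the table name, the columns, and the OSS location are illustrative assumptions, not values generated by the shipping job.

```sql
-- Illustrative only: table name, columns, and OSS path are assumptions.
CREATE EXTERNAL TABLE oss_logs (
  `__time__`  BIGINT,
  `__topic__` STRING,
  request_uri STRING
)
STORED AS ORC
LOCATION 'oss://oss-shipper-chengdu/ecs_test/';
```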
You can also consume data by using inspection tools.
You can use ORC tools to view the metadata of ORC files and read their data. To verify the consumption result, download orc-tools-1.7.2-uber.jar from the Maven repository.
View metadata
Run the following command:
```shell
java -jar ~/Downloads/orc-tools-1.7.2-uber.jar meta -p file.orc
```
Output:
```
Processing data file /Users/xx/file.orc [length: 200779]
Structure for /Users/xx/file.orc
File Version: 0.12 with ORC_CPP_ORIGINAL by ORC C++ 1.7.2
Rows: 124022
Compression: ZSTD
Compression size: 65536
Calendar: Julian/Gregorian
Type: struct<bucket:string,bucket_region:string>

Stripe Statistics:
  Stripe 1:
    Column 0: count: 124022 hasNull: false
    Column 1: count: 124022 hasNull: false min: bucket0 max: sls-training-data sum: 1468133
    Column 2: count: 0 hasNull: true

File Statistics:
  Column 0: count: 124022 hasNull: false
  Column 1: count: 124022 hasNull: false min: bucket0 max: sls-training-data sum: 1468133
  Column 2: count: 0 hasNull: true

Stripes:
  Stripe: offset: 3 data: 199856 rows: 124022 tail: 97 index: 578
    Stream: column 0 section ROW_INDEX start: 3 length 102
    Stream: column 1 section ROW_INDEX start: 105 length 367
    Stream: column 2 section ROW_INDEX start: 472 length 109
    Stream: column 0 section PRESENT start: 581 length 25
    Stream: column 1 section PRESENT start: 606 length 25
    Stream: column 1 section LENGTH start: 631 length 38989
    Stream: column 1 section DATA start: 39620 length 160794
    Stream: column 2 section PRESENT start: 200414 length 23
    Stream: column 2 section LENGTH start: 200437 length 0
    Stream: column 2 section DATA start: 200437 length 0
    Encoding column 0: DIRECT
    Encoding column 1: DIRECT_V2
    Encoding column 2: DIRECT_V2

File length: 200779 bytes
Padding length: 0 bytes
Padding ratio: 0%
```
Read data
Run the following command:
```shell
java -jar ~/Downloads/orc-tools-1.7.2-uber.jar data -n 5 file.orc
```
Output:
```
Processing data file /Users/xx/file.orc [length: 200779]
{"bucket":"bucket3","bucket_region":"cn-hangzhou"}
{"bucket":"bucket3","bucket_region":"cn-hangzhou"}
{"bucket":"bucket4","bucket_region":"cn-hangzhou"}
{"bucket":"dashboard-bucket","bucket_region":"cn-hangzhou"}
{"bucket":"bucket2","bucket_region":null}
```
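Each row that the `data` subcommand prints is a JSON object, so you can check the consumption result with a short stdlib Python script. This sketch reuses the five sample rows shown above:

```python
import json

# The five sample rows printed by the orc-tools "data" command above.
raw = """\
{"bucket":"bucket3","bucket_region":"cn-hangzhou"}
{"bucket":"bucket3","bucket_region":"cn-hangzhou"}
{"bucket":"bucket4","bucket_region":"cn-hangzhou"}
{"bucket":"dashboard-bucket","bucket_region":"cn-hangzhou"}
{"bucket":"bucket2","bucket_region":null}
"""

rows = [json.loads(line) for line in raw.splitlines()]
# JSON null becomes Python None, matching the null-column rule described earlier.
null_regions = sum(1 for r in rows if r["bucket_region"] is None)
print(len(rows), null_regions)  # 5 rows, 1 null region
```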
For more information, run the `java -jar orc-tools-1.7.2-uber.jar` command to view the built-in help, or see the ORC tools documentation.