After logs are shipped from Simple Log Service to Object Storage Service (OSS), the logs are saved as files in one of several formats. This topic describes the Optimized Row Columnar (ORC) format.
Parameters
If you set Storage Format to orc when you create an OSS data shipping job of the new version, you must configure the parameters, as shown in the following figure. For more information, see Create an OSS data shipping job (new version).
The following table describes the parameters.
| Parameter | Description |
| --- | --- |
| Key Name | The names of the log fields that you want to ship to OSS. You can view log fields on the Raw Logs tab of a Logstore. We recommend that you add log field names one by one. The data shipping job organizes ORC data in the same sequence and uses the log field names as the column names of the ORC file. The log fields that you can ship to OSS include reserved fields such as `__time__`, `__topic__`, and `__source__`. For more information, see Reserved fields. **Note**: If a log does not contain a specified field, or if the field value fails to be converted to the specified data type, the corresponding column value in the ORC file is null. |
| Type | The data type of the specified log field. The following data types can be stored in an ORC file: STRING, BOOLEAN, INT32, INT64, FLOAT, and DOUBLE. When logs are shipped from Simple Log Service to OSS, log fields are converted from the STRING type to the data type that you specify. If a log field fails to be converted, the value of the column is null. |
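The null rule above can be sketched in a few lines of Python. This is a hypothetical helper that mirrors the documented behavior (missing field or failed conversion yields null), not part of any Simple Log Service SDK:

```python
# Sketch of the documented null rule: each log field arrives as a STRING
# and is converted to the configured ORC column type; a missing field or
# a failed conversion produces a null column value.
_CASTS = {
    "STRING": str,
    "BOOLEAN": lambda s: {"true": True, "false": False}[s.lower()],
    "INT32": int,
    "INT64": int,
    "FLOAT": float,
    "DOUBLE": float,
}

def to_orc_value(log, field, orc_type):
    """Return the ORC column value for one log entry, or None (null) on failure."""
    if field not in log:
        return None                 # field absent from this log -> null
    try:
        return _CASTS[orc_type](log[field])
    except (ValueError, KeyError):  # conversion failed -> null
        return None
```

For example, `to_orc_value({"status": "200"}, "status", "INT64")` yields `200`, while `to_orc_value({"status": "abc"}, "status", "INT64")` yields `None`.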
Sample URLs of OSS objects
After logs are shipped to OSS, the logs are stored in OSS buckets. The following table provides the sample URLs of the OSS objects that store the logs.
If you specify an object suffix when you create a data shipping job, the OSS objects use the suffix.
If you do not specify an object suffix when you create a data shipping job, the OSS objects use the suffix that is generated based on the compression type.
| Compression type | Object suffix | Sample URL |
| --- | --- | --- |
| Not compressed | If you specify an object suffix, the specified suffix takes effect. Example: `.suffix`. | `oss://oss-shipper-chengdu/ecs_test/2022/01/26/20/54_1453812893059571256_937.suffix` |
| Not compressed | If you do not specify an object suffix, `.orc` is used as the object suffix. | `oss://oss-shipper-chengdu/ecs_test/2022/01/26/20/54_1453812893059571256_937.orc` |
| Snappy | If you specify an object suffix, the specified suffix takes effect. Example: `.suffix`. | `oss://oss-shipper-chengdu/ecs_test/2022/01/26/20/54_1453812893059571256_937.suffix` |
| Snappy | If you do not specify an object suffix, `.snappy.orc` is used as the object suffix. | `oss://oss-shipper-chengdu/ecs_test/2022/01/26/20/54_1453812893059571256_937.snappy.orc` |
| Zstandard | If you specify an object suffix, the specified suffix takes effect. Example: `.suffix`. | `oss://oss-shipper-chengdu/ecs_test/2022/01/26/20/54_1453812893059571256_937.suffix` |
| Zstandard | If you do not specify an object suffix, `.zst.orc` is used as the object suffix. | `oss://oss-shipper-chengdu/ecs_test/2022/01/26/20/54_1453812893059571256_937.zst.orc` |

You can download an OSS object to your computer and use ORC tools to open it.
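The suffix rules in the table reduce to a simple lookup. A minimal sketch (the function and the compression-type keys are illustrative, not an API of the shipping job):

```python
# Suffix rules from the table: a user-specified suffix always wins;
# otherwise the suffix is derived from the compression type.
SUFFIX_BY_COMPRESSION = {
    "none": ".orc",
    "snappy": ".snappy.orc",
    "zstd": ".zst.orc",
}

def object_suffix(compression, custom_suffix=None):
    """Return the suffix an OSS object gets under the documented rules."""
    if custom_suffix:
        return custom_suffix
    return SUFFIX_BY_COMPRESSION[compression]
```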
Data consumption
You can consume data that is shipped to OSS by using E-MapReduce, Spark, or Hive. For more information, see LanguageManual DDL.
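For example, on E-MapReduce you can map the shipped objects to a Hive external table and query them with Hive or Spark SQL. The following DDL is a minimal sketch: the table name, the columns, and the OSS location are illustrative assumptions, not values generated by the shipping job.

```sql
-- Illustrative only: table name, columns, and OSS path are assumptions.
CREATE EXTERNAL TABLE oss_logs (
  `__time__`  BIGINT,
  `__topic__` STRING,
  request_uri STRING
)
STORED AS ORC
LOCATION 'oss://oss-shipper-chengdu/ecs_test/';
```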
You can also consume data by using inspection tools.
You can use ORC tools to view the metadata of ORC files and read their data. To verify the consumption result, download orc-tools-1.7.2-uber.jar from the Maven repository.
View metadata
Run the following command:
```shell
java -jar ~/Downloads/orc-tools-1.7.2-uber.jar meta -p file.orc
```
Output:
```
Processing data file /Users/xx/file.orc [length: 200779]
Structure for /Users/xx/file.orc
File Version: 0.12 with ORC_CPP_ORIGINAL by ORC C++ 1.7.2
Rows: 124022
Compression: ZSTD
Compression size: 65536
Calendar: Julian/Gregorian
Type: struct<bucket:string,bucket_region:string>

Stripe Statistics:
  Stripe 1:
    Column 0: count: 124022 hasNull: false
    Column 1: count: 124022 hasNull: false min: bucket0 max: sls-training-data sum: 1468133
    Column 2: count: 0 hasNull: true

File Statistics:
  Column 0: count: 124022 hasNull: false
  Column 1: count: 124022 hasNull: false min: bucket0 max: sls-training-data sum: 1468133
  Column 2: count: 0 hasNull: true

Stripes:
  Stripe: offset: 3 data: 199856 rows: 124022 tail: 97 index: 578
    Stream: column 0 section ROW_INDEX start: 3 length 102
    Stream: column 1 section ROW_INDEX start: 105 length 367
    Stream: column 2 section ROW_INDEX start: 472 length 109
    Stream: column 0 section PRESENT start: 581 length 25
    Stream: column 1 section PRESENT start: 606 length 25
    Stream: column 1 section LENGTH start: 631 length 38989
    Stream: column 1 section DATA start: 39620 length 160794
    Stream: column 2 section PRESENT start: 200414 length 23
    Stream: column 2 section LENGTH start: 200437 length 0
    Stream: column 2 section DATA start: 200437 length 0
    Encoding column 0: DIRECT
    Encoding column 1: DIRECT_V2
    Encoding column 2: DIRECT_V2

File length: 200779 bytes
Padding length: 0 bytes
Padding ratio: 0%
```
Read data
Run the following command:
```shell
java -jar ~/Downloads/orc-tools-1.7.2-uber.jar data -n 5 file.orc
```
Output:
```
Processing data file /Users/xx/file.orc [length: 200779]
{"bucket":"bucket3","bucket_region":"cn-hangzhou"}
{"bucket":"bucket3","bucket_region":"cn-hangzhou"}
{"bucket":"bucket4","bucket_region":"cn-hangzhou"}
{"bucket":"dashboard-bucket","bucket_region":"cn-hangzhou"}
{"bucket":"bucket2","bucket_region":null}
```
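Each row that the `data` subcommand prints is a JSON object, so you can check the consumption result with a short stdlib Python script. This sketch reuses the five sample rows shown above:

```python
import json

# The five sample rows printed by the orc-tools "data" command above.
raw = """\
{"bucket":"bucket3","bucket_region":"cn-hangzhou"}
{"bucket":"bucket3","bucket_region":"cn-hangzhou"}
{"bucket":"bucket4","bucket_region":"cn-hangzhou"}
{"bucket":"dashboard-bucket","bucket_region":"cn-hangzhou"}
{"bucket":"bucket2","bucket_region":null}
"""

rows = [json.loads(line) for line in raw.splitlines()]
# JSON null becomes Python None, matching the null-column rule described earlier.
null_regions = sum(1 for r in rows if r["bucket_region"] is None)
print(len(rows), null_regions)  # 5 rows, 1 null region
```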
For more information, run the `java -jar orc-tools-1.7.2-uber.jar` command to view the built-in help, or see the ORC tools documentation.