This topic describes how to create, read from, and write to OSS foreign tables in ORC format.
Scope
OSS external tables do not support the cluster property.
A single file cannot exceed 2 GB. If a file is too large, split it.
MaxCompute and OSS must be in the same region.
Supported data types
For more information about MaxCompute data types, see Data Types Version 1.0 and Data Types Version 2.0.
Java Native Interface (JNI) mode (the Native ORC Reader is not used to read tables):
set odps.ext.oss.orc.native=false;. Both read and write operations are supported.Native mode (the Native ORC Reader is used to read tables):
set odps.ext.oss.orc.native=true;. Only read operations are supported.
Pattern | Java mode (read/write) | Native mode (read-only) |
TINYINT |
|
|
SMALLINT |
|
|
INT |
|
|
BIGINT |
|
|
BINARY |
|
|
FLOAT |
|
|
DOUBLE |
|
|
DECIMAL(precision,scale) |
|
|
VARCHAR(n) |
|
|
CHAR(n) |
|
|
STRING |
|
|
DATE |
|
|
DATETIME |
|
|
TIMESTAMP |
|
|
TIMESTAMP_NTZ |
|
|
BOOLEAN |
|
|
ARRAY |
|
|
MAP |
|
|
STRUCT |
|
|
JSON |
|
|
Supported compression formats
To read from or write to compressed OSS files, you must add the
with serdepropertiesproperty to the CREATE TABLE statement. For more information, see Parameters for WITH SERDEPROPERTIES.You can read from and write to ORC files that are compressed using SNAPPY or ZLIB.
Create a foreign table
Syntax
If the schema in the ORC file is inconsistent with the foreign table schema:
Inconsistent number of columns: If an ORC file has fewer columns than the foreign table defined in the Data Definition Language (DDL) statement, MaxCompute fills the missing columns with NULL values when reading the data. If the ORC file has more columns, the extra columns are discarded.
Inconsistent column types: MaxCompute lets you read INT data from an ORC file as the STRING type, but this practice is not recommended. When you attempt to read STRING data as the INT type, string values are converted to NULL, whereas numeric values are processed correctly.
Basic syntax
CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name>
(
<col_name> <data_type>,
...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
STORED AS orc
LOCATION '<oss_location>';Full syntax
CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name>
(
<col_name> <data_type>,
...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
WITH serdeproperties(
'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
STORED AS orc
LOCATION '<oss_location>'
tblproperties (
'<xxx>'='<yyy>'
);Common parameters
For information about common parameters, see Basic syntax parameters.
Unique parameters
Parameters for WITH SERDEPROPERTIES
property_name | Scenario | Description | property_value | Default value |
mcfed.orc.schema.resolution | Add this property when the data in the same OSS foreign table has different schemas. | Specifies the parsing method for ORC files. The value | name | By column position. By column position. This is equivalent to setting |
Parameters for TBLPROPERTIES
property_name | Scenario | Description | property_value | Default value |
mcfed.orc.compress | Add this property to write ORC data to OSS in a compressed format. | ORC compression property. Specifies the compression method for ORC data. |
| None |
io.compression.codecs | Add this property if the OSS data file is in Raw-Snappy format. | If you set this parameter to `True`, MaxCompute can read compressed data. Otherwise, the read operation fails. | com.aliyun.odps.io.compress.SnappyRawCodec | None |
odps.external.data.output.prefix (Compatible with odps.external.data.prefix) | Add this property to specify a custom prefix for the names of output files. |
| A valid string, such as 'mc_'. | None |
odps.external.data.enable.extension | Add this property to display the extension of output files. | If this property is set to True, the file extension is displayed. Otherwise, the file extension is not displayed. |
| False |
odps.external.data.output.suffix | Add this property to specify a custom suffix for the names of output files. | The suffix can contain only digits, letters, and underscores (_). | A valid string, such as '_hangzhou'. | None |
odps.external.data.output.explicit.extension | Add this property to specify a custom extension for output files. |
| A valid string, such as "jsonl". | None |
mcfed.orc.batch.size | Controls the number of records processed at a time. This affects memory usage and processing efficiency. | ORC tuning property. Defines the default batch size for ORC in rows. | A non-negative integer | 1000 |
Write data
For more information about the write syntax in MaxCompute, see Write syntax.
Query and analysis
For more information about the SELECT syntax, see Query syntax.
For more information about optimizing query plans, see Query optimization.
ORC foreign tables support query optimization by enabling predicate pushdown (PPD). To enable PPD, add the following parameters before your SQL statement:
The PPD parameter is used only in Native mode, which requires the Native switch to be set to true.
-- Enable the ORC native reader. SET odps.ext.oss.orc.native=true; -- Enable ORC PPD. SET odps.storage.orc.use.predicate.pushdown=true;
Scenarios
This section describes how to create an ORC foreign table that uses SNAPPY compression, and then read data from and write data to the table.
Prerequisites
You have created a MaxCompute project.
Prepare an OSS bucket and folder. For more information, see Create a bucket and Manage folders.
MaxCompute is deployed only in some regions. To prevent data connectivity issues, use an OSS bucket in the same region as your MaxCompute project.
Authorization
You must have permissions to access OSS. You can use an Alibaba Cloud account, a Resource Access Management (RAM) user, or a RAM role to access OSS foreign tables. For more information about authorization, see STS authorization for OSS.
You must have the CreateTable permission in the MaxCompute project. For more information about table operation permissions, see MaxCompute permissions.
Prepare a data file in SNAPPY format.
In the
oss-mc-testbucket for the Sample data, create theorc_snappy/dt=20250526folder hierarchy, and upload the snappy file to thedt=20250526partition folder.Create an ORC foreign table with SNAPPY compression.
CREATE EXTERNAL TABLE orc_data_type_snappy ( vehicleId INT, recordId INT, patientId INT, calls INT, locationLatitute DOUBLE, locationLongitude DOUBLE, recordTime STRING, direction STRING ) PARTITIONED BY (dt STRING ) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' WITH serdeproperties ( 'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole' ) STORED AS ORC LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/orc_snappy/' tblproperties ( 'mcfed.orc.compress'='SNAPPY');Import partition data. If an OSS foreign table is a partitioned table, you must perform an additional step to import the partition data. For more information, see Syntax for adding partition data to an OSS foreign table.
-- Import partition data MSCK REPAIR TABLE orc_data_type_snappy ADD PARTITIONS;Read data from the ORC foreign table.
SELECT * FROM orc_data_type_snappy WHERE dt=20250526 LIMIT 10;The following result is returned:
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+ | vehicleid | recordid | patientid | calls | locationlatitute | locationlongitude | recordtime | direction | dt | +------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+ | 1 | 12 | 76 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:10 | SW | 20250526 | | 1 | 1 | 51 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | S | 20250526 | | 1 | 2 | 13 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:01 | NE | 20250526 | | 1 | 3 | 48 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:02 | NE | 20250526 | | 1 | 4 | 30 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:03 | W | 20250526 | | 1 | 5 | 47 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:04 | S | 20250526 | | 1 | 6 | 9 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:05 | S | 20250526 | | 1 | 7 | 53 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:06 | N | 20250526 | | 1 | 8 | 63 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:07 | SW | 20250526 | | 1 | 9 | 4 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:08 | NE | 20250526 | | 1 | 10 | 31 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:09 | N | 20250526 | +------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+Write data to the ORC foreign table and query the data.
INSERT INTO orc_data_type_snappy PARTITION (dt ='20250526') VALUES (1,16,76,1,46.81006,-92.08174,'9/14/2014 0:10','SW'); -- Query the newly written data SELECT * FROM orc_data_type_snappy WHERE dt = '20250526' AND recordid=16;The following result is returned:
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+ | vehicleid | recordid | patientid | calls | locationlatitute | locationlongitude | recordtime | direction | dt | +------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+ | 1 | 16 | 76 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:10 | SW | 20250526 | +------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+