
How to Load OSS gz Data into MaxCompute with DataWorks Scheduled Job

This article describes how to load gzip-compressed (gz) data from OSS into MaxCompute with a DataWorks scheduled job.

By Haoran Wang, Sr. Big Data Solution Architect of Alibaba Cloud

Create an External Table with an OSS Location

First, create an external table over the OSS folder that holds the gz files. The built-in CsvStorageHandler parses the CSV content, and the two gzip options tell MaxCompute that the files are gzip-compressed.

CREATE EXTERNAL TABLE IF NOT EXISTS mc_oss_tbl
(
    mydata  STRING  -- the raw record content, parsed later by a UDF
)
PARTITIONED BY
(
    pt      STRING
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'  -- built-in CSV extractor
WITH SERDEPROPERTIES (
    'odps.properties.rolearn' = 'acs:ram::1753425463711063:role/aliyunodpsdefaultrole',  -- RAM role MaxCompute assumes to access OSS
    'odps.text.option.gzip.input.enabled' = 'true',   -- read gzip-compressed files
    'odps.text.option.gzip.output.enabled' = 'true'   -- write gzip-compressed files
)
LOCATION 'oss://oss-us-east-1-internal.aliyuncs.com/hrw-bucket/folder1/'
;

The partition directories already exist in OSS, so run MSCK REPAIR TABLE to discover them and register them in the table's metadata:

MSCK REPAIR TABLE mc_oss_tbl ADD PARTITIONS
WITH PROPERTIES ('odps.msck.partition.column.mapping'='pt:$pt');
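
To confirm that the partitions were picked up, you can list them:

SHOW PARTITIONS mc_oss_tbl;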

You can verify the external table by querying it directly.
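
For example, a quick sanity check (the partition value below is only an illustration):

SELECT  mydata
FROM    mc_oss_tbl
WHERE   pt = '20230101'
LIMIT   10;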

Create an Internal Table

CREATE TABLE ods_tbl1_internal
(
    ac       INT
    ,sbid    DOUBLE
    ,adomain STRING
)
PARTITIONED BY
(
    pt       STRING
)
;
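
You can double-check the schema with DESC:

DESC ods_tbl1_internal;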

Configure an Hourly Scheduled Job

Create an ODPS SQL job.

-- UDF1 is assumed to be a user-defined function that extracts a named field
-- from the raw line stored in mydata; adjust it to your own parsing logic.
INSERT OVERWRITE TABLE ods_tbl1_internal PARTITION (pt = '${bizdate}')
SELECT  CAST(UDF1(mydata, 'ac') AS INT) AS ac
        ,CAST(UDF1(mydata, 'sbid') AS DOUBLE) AS sbid
        ,UDF1(mydata, 'adomain') AS adomain
FROM    mc_oss_tbl
WHERE   pt = '${bizdate}'
;
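
After a run finishes, a per-partition row count is a quick way to confirm the load (again, the partition value is illustrative):

SELECT  COUNT(*) AS cnt
FROM    ods_tbl1_internal
WHERE   pt = '20230101';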

Configure the node to run hourly, and assign the bizdate scheduling parameter (for example, bizdate=$bizdate) in the node's scheduling properties. Note that with pt = '${bizdate}', every hourly run overwrites that day's partition with the latest data from OSS.

Finally, submit the job and deploy it to the production environment.

Reference:

https://www.alibabacloud.com/help/en/maxcompute/latest/access-oss-data-by-using-a-built-in-extractor
