By Long Qingyun
It is very complex for most storage formats in big data to support random updates, because it needs to scan large files. Therefore, MaxCompute provides the latest feature: transactional tables to update and delete statements.
However, update and delete statements do not apply to frequent updates, data deletion, or real-time writing to target tables. Moreover, the statements are not supported for non-transactional tables. This article describes methods to update data through the insert overwrite statement.
create table update_table(ID int,
tranValue string,
last_update_user string) PARTITIONED by(dt STRING ) LIFECYCLE 1;
INSERT INTO update_table PARTITION (dt="20210510") VALUES
(1, 'value_01', 'creation'),
(2, 'value_02', 'creation'),
(3, 'value_03', 'creation'),
(4, 'value_04', 'creation'),
(5, 'value_05', 'creation'),
(6, 'value_06', 'creation'),
(7, 'value_07', 'creation'),
(8, 'value_08', 'creation'),
(9, 'value_09', 'creation'),
(10, 'value_10','creation');
Update the string to value_011 if the id is 1.
--Update a data record
INSERT OVERWRITE TABLE update_table PARTITION( dt)
SELECT id
,CASE WHEN id=1 THEN "value_011"
ELSE TranValue
END TranValue
,last_update_user
,dt
FROM update_table
WHERE dt = "20210510"
;
Create an incremental table and insert data based on the update.
create table update_table_inc(ID int,
TranValue string,
last_update_user string) LIFECYCLE 1;
INSERT INTO update_table_inc VALUES
(5, 'value_11', 'creation'),
(6, NULL, '20170410'),
(7, 'value22', '20170413');
If the id is 5 and 7, update the TranValue, because the TranValue is null and not updated when the id is 6
INSERT OVERWRITE TABLE update_table PARTITION( dt)
SELECT a.id
,CASE WHEN a.id=b.id and b.TranValue is not null THEN b.TranValue
ELSE a.TranValue
END TranValue
,CASE WHEN a.id=b.id and b.TranValue is not null THEN b.last_update_user
ELSE a.last_update_user
END last_update_user
,dt
FROM update_table a
LEFT JOIN update_table_inc b
ON a.id = b.id
WHERE a.dt = "20210510"
;
--Delete data
INSERT OVERWRITE TABLE update_table PARTITION( dt)
SELECT *
FROM update_table
WHERE dt = "20210510" and id !=4
;
137 posts | 19 followers
FollowAlibaba Cloud MaxCompute - January 15, 2019
Alibaba Cloud MaxCompute - July 20, 2022
Alibaba Cloud MaxCompute - January 18, 2019
Alibaba Cloud Community - May 7, 2024
Alibaba Cloud MaxCompute - December 6, 2021
Jack008 - June 10, 2020
137 posts | 19 followers
FollowAlibaba Cloud provides big data consulting services to help enterprises leverage advanced data technology.
Learn MoreConduct large-scale data warehousing with MaxCompute
Learn MoreAlibaba Cloud experts provide retailers with a lightweight and customized big data consulting service to help you assess your big data maturity and plan your big data journey.
Learn MoreBuild a Data Lake with Alibaba Cloud Object Storage Service (OSS) with 99.9999999999% (12 9s) availability, 99.995% SLA, and high scalability
Learn MoreMore Posts by Alibaba Cloud MaxCompute