If you need to perform the INSERT
, UPDATE
, and DELETE
operations on a transactional table or a Delta table, you can encapsulate the operations in one MERGE INTO
statement. Then, you can execute the MERGE INTO statement to perform these operations on the table based on the join condition with a source table. This way, all data in the table is scanned only once and execution efficiency is improved.
Prerequisites
You are granted the Select and Update permissions on the destination transactional table. For more information about how to grant permissions, see MaxCompute permissions.
Feature description
MaxCompute allows you to perform the DELETE
or UPDATE
operation on a table to delete data from or update data in the table. If you want to perform INSERT
, UPDATE
, and DELETE
operations on the table at the same time, you must write and execute a statement for each operation. In this case, full table scan operations are performed multiple times. To improve execution efficiency, MaxCompute allows you to execute a MERGE INTO
statement to perform INSERT, UPDATE, and DELETE operations on a table at the same time. This way, full table scan operations are performed only once. This method is more efficient than the method in which you separately perform the INSERT
, UPDATE
, and DELETE
operations.
The MERGE INTO
statement ensures the atomicity of operations. A job is successful only after all the INSERT
, UPDATE
, and DELETE
operations in the job are successful. If an operation fails, the job also fails.
If you separately perform the INSERT
, UPDATE
, and DELETE
operations, specific operations may fail. The data on which successful operations are performed cannot be restored. To prevent this issue, you can execute the MERGE INTO
statement.
Limits
You cannot perform multiple INSERT
or UPDATE
operations on the same rows in a table by using a single MERGE INTO
statement.
Syntax
merge into <target_table> as <alias_name_t> using <source expression|table_name> as <alias_name_s>
-- The ON clause specifies the JOIN conditions of the source table and the destination table.
on <boolean expression1>
-- The WHEN MATCHED...THEN clause specifies the operation to be performed when the result of the ON clause is True. The operations of multiple WHEN MATCHED...THEN clauses cannot be performed on the same data.
when matched [and <boolean expression2>] then update set <set_clause_list>
when matched [and <boolean expression3>] then delete
-- The WHEN MATCHED...THEN clause specifies the operation to be performed when the result of the ON clause is False.
when not matched [and <boolean expression4>] then insert values <value_list>
target_table: required. The name of the destination table, which must be an existing table.
alias_name_t: required. The alias of the destination table.
source expression|table_name: required. The name of the source table, view, or subquery that you want to join with the destination table.
alias_name_s: required. The alias of the source table, view, or subquery that you want to join with the destination table.
boolean expression1: required. A condition that returns a value of the BOOLEAN type. The value must be True or False.
boolean expression2, boolean expression3, and boolean expression4: optional. You can specify a condition for each of the
UPDATE
,DELETE
, andINSERT
operations that you want to perform. The condition must return a value of the BOOLEAN type. Take note of the following items:If a MERGE INTO statement has three WHEN clauses, you can perform each of the
UPDATE
,DELETE
, andINSERT
operations only once.If both the
UPDATE
andDELETE
operations are included in a MERGE INTO statement, you must specify a[and <boolean expression>]
condition for the operation that needs to be performed first.The
WHEN NOT MATCHED
clause can be used only as the last WHEN clause and supports only theINSERT
operation.
set_clause_list: If you want to perform the
UPDATE
operation, this parameter is required. This parameter specifies the data that you want to update. For more information aboutUPDATE
, see the "UPDATE" section in UPDATE and DELETE.value_list: If you want to perform the
INSERT
operation, this parameter is required. This parameter specifies the data that you want to insert. For more information aboutvalues
, see VALUES.
Examples
Example 1: Create a destination table named acid_address_book_base1 and a source table named tmp_table1, and insert data into the tables. Execute the
MERGE INTO
statement on the destination table. The data entries in the source table that meet the specifiedON
join condition are used to update the joined data entries in the destination table. The data entries that do not meet the specifiedON
join condition and whose value of the event_type column is I are inserted into the destination table. Sample statements:-- Create a destination table named acid_address_book_base1. create table if not exists acid_address_book_base1 (id bigint,first_name string,last_name string,phone string) partitioned by(year string, month string, day string, hour string) tblproperties ("transactional"="true"); -- Create a source table named tmp_table1. create table if not exists tmp_table1 (id bigint, first_name string, last_name string, phone string, _event_type_ string); -- Insert test data into the acid_address_book_base1 table. insert overwrite table acid_address_book_base1 partition(year='2020', month='08', day='20', hour='16') values (4, 'nihaho', 'li', '222'), (5, 'tahao', 'ha', '333'), (7, 'djh', 'hahh', '555'); -- Query data from the destination table and check whether the INSERT operation is successfully performed. set odps.sql.allow.fullscan=true; select * from acid_address_book_base1; +------------+------------+------------+------------+------------+------------+------------+------------+ | id | first_name | last_name | phone | year | month | day | hour | +------------+------------+------------+------------+------------+------------+------------+------------+ | 4 | nihaho | li | 222 | 2020 | 08 | 20 | 16 | | 5 | tahao | ha | 333 | 2020 | 08 | 20 | 16 | | 7 | djh | hahh | 555 | 2020 | 08 | 20 | 16 | +------------+------------+------------+------------+------------+------------+------------+------------+ -- Insert test data into the tmp_table1 table. insert overwrite table tmp_table1 values (1, 'hh', 'liu', '999', 'I'), (2, 'cc', 'zhang', '888', 'I'), (3, 'cy', 'zhang', '666', 'I'),(4, 'hh', 'liu', '999', 'U'), (5, 'cc', 'zhang', '888', 'U'),(6, 'cy', 'zhang', '666', 'U'); -- Query data from the source table and check whether the INSERT operation is successfully performed. select * from tmp_table1; +------------+------------+------------+------------+--------------+ | id | first_name | last_name | phone | _event_type_ | +------------+------------+------------+------------+--------------+ | 1 | hh | liu | 999 | I | | 2 | cc | zhang | 888 | I | | 3 | cy | zhang | 666 | I | | 4 | hh | liu | 999 | U | | 5 | cc | zhang | 888 | U | | 6 | cy | zhang | 666 | U | +------------+------------+------------+------------+--------------+ -- Execute the MERGE INTO statement. merge into acid_address_book_base1 as t using tmp_table1 as s on s.id = t.id and t.year='2020' and t.month='08' and t.day='20' and t.hour='16' when matched then update set t.first_name = s.first_name, t.last_name = s.last_name, t.phone = s.phone when not matched and (s._event_type_='I') then insert values(s.id, s.first_name, s.last_name,s.phone,'2020','08','20','16'); -- Query data from the destination table and check whether the MERGE INTO operation is successfully performed. select * from acid_address_book_base1; +------------+------------+------------+------------+------------+------------+------------+------------+ | id | first_name | last_name | phone | year | month | day | hour | +------------+------------+------------+------------+------------+------------+------------+------------+ | 4 | hh | liu | 999 | 2020 | 08 | 20 | 16 | | 5 | cc | zhang | 888 | 2020 | 08 | 20 | 16 | | 7 | djh | hahh | 555 | 2020 | 08 | 20 | 16 | | 1 | hh | liu | 999 | 2020 | 08 | 20 | 16 | | 2 | cc | zhang | 888 | 2020 | 08 | 20 | 16 | | 3 | cy | zhang | 666 | 2020 | 08 | 20 | 16 | +------------+------------+------------+------------+------------+------------+------------+------------+
Example 2: Create a destination table named merge_acid_dp and a source table named merge_acid_source. Insert data into the tables. Then, execute the
MERGE INTO
statement without specifying a partition to update data in or insert data into the destination table. The MERGE INTO operation takes effect on all partitions.-- Create a destination table named merge_acid_dp. create table if not exists merge_acid_dp(c1 bigint not null, c2 bigint not null) partitioned by (dd string, hh string) tblproperties ("transactional" = "true"); -- Create a source table named merge_acid_source. create table if not exists merge_acid_source(c1 bigint not null, c2 bigint not null, c3 string, c4 string) lifecycle 30; -- Insert test data into the destination table merge_acid_dp. insert overwrite table merge_acid_dp partition (dd='01', hh='01') values (1, 1), (2, 2); insert overwrite table merge_acid_dp partition (dd='02', hh='02') values (4, 1), (3, 2); -- Query data from the destination table and check whether the INSERT operation is successfully performed. set odps.sql.allow.fullscan=true; select * from merge_acid_dp; +------------+------------+----+----+ | c1 | c2 | dd | hh | +------------+------------+----+----+ | 1 | 1 | 01 | 01 | | 2 | 2 | 01 | 01 | | 4 | 1 | 02 | 02 | | 3 | 2 | 02 | 02 | +------------+------------+----+----+ -- Insert test data into the source table merge_acid_source. insert overwrite table merge_acid_source values(8, 2, '03', '03'), (5, 5, '05', '05'), (6, 6, '02', '02'); -- Query data from the source table to check whether the INSERT operation is successfully performed. select * from merge_acid_source; +------------+------------+----+----+ | c1 | c2 | c3 | c4 | +------------+------------+----+----+ | 8 | 2 | 03 | 03 | | 5 | 5 | 05 | 05 | | 6 | 6 | 02 | 02 | +------------+------------+----+----+ -- Execute the MERGE INTO statement. merge into merge_acid_dp tar using merge_acid_source src on tar.c2 = src.c2 when matched then update set tar.c1 = src.c1 when not matched then insert values(src.c1, src.c2, src.c3, src.c4); -- Query data from the destination table to check whether the MERGE INTO operation is successfully performed. select * from merge_acid_dp; +------------+------------+----+----+ | c1 | c2 | dd | hh | +------------+------------+----+----+ | 6 | 6 | 02 | 02 | | 5 | 5 | 05 | 05 | | 8 | 2 | 02 | 02 | | 8 | 2 | 01 | 01 | | 1 | 1 | 01 | 01 | | 4 | 1 | 02 | 02 | +------------+------------+----+----+
Example 3: Create a destination table named merge_acid_sp and a source table named merge_acid_source. Insert data into the tables. Execute the
MERGE INTO
statement to update data in or insert data into a specified partition of the destination table.-- Create a destination table named merge_acid_sp. create table if not exists merge_acid_sp(c1 bigint not null, c2 bigint not null) partitioned by (dd string, hh string) tblproperties ("transactional" = "true"); -- Create a source table named merge_acid_source. create table if not exists merge_acid_source(c1 bigint not null, c2 bigint not null, c3 string, c4 string) lifecycle 30; -- Insert test data into the destination table merge_acid_sp. insert overwrite table merge_acid_sp partition (dd='01', hh='01') values (1, 1), (2, 2); insert overwrite table merge_acid_sp partition (dd='02', hh='02') values (4, 1), (3, 2); -- Query data from the destination table and check whether the INSERT operation is successfully performed. set odps.sql.allow.fullscan=true; select * from merge_acid_sp; +------------+------------+----+----+ | c1 | c2 | dd | hh | +------------+------------+----+----+ | 1 | 1 | 01 | 01 | | 2 | 2 | 01 | 01 | | 4 | 1 | 02 | 02 | | 3 | 2 | 02 | 02 | +------------+------------+----+----+ -- Insert test data into the source table merge_acid_source. insert overwrite table merge_acid_source values(8, 2, '03', '03'), (5, 5, '05', '05'), (6, 6, '02', '02'); -- Query data from the source table to check whether the INSERT operation is successfully performed. select * from merge_acid_source; +------------+------------+----+----+ | c1 | c2 | c3 | c4 | +------------+------------+----+----+ | 8 | 2 | 03 | 03 | | 5 | 5 | 05 | 05 | | 6 | 6 | 02 | 02 | +------------+------------+----+----+ -- Specify the dd = '01' and hh = '01' partitions in the ON clause of the MERGE INTO statement and execute the statement to update or insert data in the specified partitions of the destination table. merge into merge_acid_sp tar using merge_acid_source src on tar.c2 = src.c2 and tar.dd = '01' and tar.hh = '01' when matched then update set tar.c1 = src.c1 when not matched then insert values(src.c1, src.c2, src.c3, src.c4); -- Query data from the destination table to check whether the MERGE INTO operation is successfully performed. select * from merge_acid_sp; +------------+------------+----+----+ | c1 | c2 | dd | hh | +------------+------------+----+----+ | 5 | 5 | 05 | 05 | | 6 | 6 | 02 | 02 | | 8 | 2 | 01 | 01 | | 1 | 1 | 01 | 01 | | 4 | 1 | 02 | 02 | | 3 | 2 | 02 | 02 | +------------+------------+----+----+
Example 4: Create a Delta table named mf_tt6 as the destination table and create a source table named mf_delta. Insert data into the source table and destination table. Specify partitions in the
MERGE INTO
statement and execute the statement to perform the UPDATE, INSERT, or DELETE operation on the specified partitions of the destination table.NoteThe Delta table feature is in invitational preview. By default, you cannot directly use this feature. To use this feature, click the link to apply for trial use.
-- Create a Delta table named mf_tt6 as the destination table.
create table if not exists mf_tt6 (pk bigint not null primary key,
val bigint not null)
partitioned by (dd string, hh string)
tblproperties ("transactional"="true");
-- Insert test data into the destination table mf_tt6.
insert overwrite table mf_tt6 partition (dd='01', hh='02') values (1, 1), (2, 2), (3, 3);
insert overwrite table mf_tt6 partition (dd='01', hh='01') values (1, 10), (2, 20), (3, 30);
-- Enable a full table scan only for the current session. Execute the SELECT statement to query data from the mf_tt6 table.
set odps.sql.allow.fullscan=true;
select * from mf_tt6;
+------------+------------+----+----+
| pk | val | dd | hh |
+------------+------------+----+----+
| 1 | 10 | 01 | 01 |
| 3 | 30 | 01 | 01 |
| 2 | 20 | 01 | 01 |
| 1 | 1 | 01 | 02 |
| 3 | 3 | 01 | 02 |
| 2 | 2 | 01 | 02 |
+------------+------------+----+----+
-- Create a source table named mf_delta and insert test data into the table.
create table if not exists mf_delta as select pk, val from values (1, 10), (2, 20), (6, 60) t (pk, val);
-- Query data from the source table and confirm the result of the INSERT operation.
select * from mf_delta;
+------+------+
| pk | val |
+------+------+
| 1 | 10 |
| 2 | 20 |
| 6 | 60 |
+------+------+
-- Specify the dd = '01' and hh = '02' partitions in the ON clause of the MERGE INTO statement and execute the statement to perform the UPDATE, INSERT, or DELETE operation on the specified partitions in the destination table mf_tt6.
merge into mf_tt6 using mf_delta
on mf_tt6.pk = mf_delta.pk and mf_tt6.dd='01' and mf_tt6.hh='02'
when matched and (mf_tt6.pk > 1) then
update set mf_tt6.val = mf_delta.val
when matched then delete
when not matched then
insert values (mf_delta.pk, mf_delta.val, '01', '02');
-- Query data from the destination table and check whether the MERGE INTO operation is successfully performed.
set odps.sql.allow.fullscan=true;
select * from mf_tt6;
+------------+------------+----+----+
| pk | val | dd | hh |
+------------+------------+----+----+
| 1 | 10 | 01 | 01 |
| 3 | 30 | 01 | 01 |
| 2 | 20 | 01 | 01 |
| 3 | 3 | 01 | 02 |
| 6 | 60 | 01 | 02 |
| 2 | 20 | 01 | 02 |
+------------+------------+----+----+