UPDATE - MaxCompute - Alibaba Cloud Documentation Center

Updates the values of one or more columns of the rows that meet the specified conditions in partitioned transactional tables or non-partitioned transactional tables.

Prerequisites

You are granted the Select and Update permissions on the transactional table on which you want to execute the DELETE or UPDATE statement. For more information, see MaxCompute permissions.

Limits

The DELETE statement, UPDATE statement, and transactional table on which the DELETE or UPDATE statement is executed have the following limits:

You can execute the DELETE and UPDATE statements only on transactional tables. For more information about how to create a transactional table, see CREATE TABLE.
When you create a clustered table or an external table, you cannot set the clustered table or external table to a transactional table.
You cannot convert between transactional tables and MaxCompute internal tables, external tables, or clustered tables.
Jobs from other systems, such as MaxCompute Spark, Machine Learning Platform for AI, and Graph, cannot access transactional tables.
CLONE TABLE and MERGE PARTITION operations are not supported.
Before you execute the UPDATE, DELETE, or INSERT OVERWRITE statement on important data in transactional tables, you must execute the SELECT and INSERT statements to back up the data to other tables.

Usage notes

When you execute the DELETE or UPDATE statement to delete or update data in tables or partitions of the tables, take note of the following items:

In specific scenarios, you may want to execute the DELETE or UPDATE statement for a small amount of data in a table and infrequently perform read and other operations in subsequent procedures. To reduce the storage space that is occupied by the table, we recommend that you merge the base files with all delta files after you execute the DELETE or UPDATE statement for the table several times. For more information, see Merge files of a transactional table.
In specific scenarios, you may want to delete or update more than 5% of the data in a table or a partition of the table at a low frequency and perform frequent read operations in subsequent procedures. We recommend that you execute the INSERT OVERWRITE or INSERT INTO statement in such scenarios. For more information, see INSERT INTO|OVERWRITE.
For example, you want to perform delete or update operations for 10% of data 10 times each day. In this case, we recommend that you estimate the total cost and the consumption of the subsequent read performance if you execute the DELETE or UPDATE statement on the table. Then, compare the estimated result with that of executing the INSERT OVERWRITE or INSERT INTO statement. This helps you choose an efficient method.
MaxCompute executes multiple DELETE and UPDATE statements in jobs at a time. Each statement consumes resources and incurs fees. We recommend that you delete or update a batch of data at a time. For example, if you run a Python script to generate and submit a large number of row-level update jobs, and each statement is executed for only one row or a small number of rows of data, each statement incurs fees that correspond to the amount of input data scanned by the SQL statement and consumes the related computing resources. When multiple statements are accumulated, the costs are significantly increased and the system efficiency is reduced. Sample statements:
```
-- We recommend that you execute the following statement: 
update table1 set col1= (select value1 from table2 where table1.id = table2.id and table1.region = table2.region);

-- We recommend that you do not execute the following statements: 
update table1 set col1=1 where id='2021063001'and region='beijing';                  
update table1 set col1=2 where id='2021063002'and region='beijing';
```

Syntax

--1
update <table_name> set <col1_name> = <value1> [, <col2_name> = <value2> ...] [WHERE <where_condition>];

--2
update <table_name> set (<col1_name> [, <col2_name> ...]) = (<value1> [, <value2> ...])[WHERE <where_condition>];

--3
UPDATE <table_name>
       SET <col1_name> = <value1> [ , <col2_name> = <value2> , ... ]
        [ FROM <additional_tables> ]
        [ WHERE <where_condition> ]

Parameters

table_name: required. The name of the transactional table on which you want to execute the UPDATE statement.
col1_name and col2_name: the columns that you want to update. You must specify at least one column.
value1 and value2: the new values that you want to assign to the specified columns. You must update the value of at least one column.
where_condition: optional. A WHERE clause that is used to filter data based on conditions. For more information, see WHERE clause (where_condition). If you execute the UPDATE statement on a table without a WHERE clause, all data in the table is updated.

additional_tables: optional. A from clause.

The UPDATE statement can be used with the from clause. The from clause makes the use of the UPDATE statement more convenient. The following table describes the UPDATE statements that are used with and without the from clause.

Scenario Sample code

Without the from clause

Scenario	Sample code
Without the from clause	`update target set v = (select min(v) from src group by k where target.k = src.key) where target.k in (select k from src);`
With the from clause	`update target set v = b.v from (select k, min(v) v from src group by k) b where target.k = b.k;`

update target set v =
  (select min(v) from src group by k where target.k = src.key)
  where target.k in (select k from src);

With the from clause

update target set v = b.v
  from (select k, min(v) v from src group by k) b
  where target.k = b.k;

The following conclusions are obtained based on the preceding sample code:

If you use multiple rows of data in the source table to update one row of data in the destination table, you must write aggregate operations to ensure the uniqueness of the data source. The code of aggregate operations is simpler and easier to understand when you use the from clause than that when you do not use the from clause.
If you need to only update the intersection data during join operations, you must use a from clause or a where clause. The where clause is more complex than the from clause.

Examples

Example 1: Create a non-partitioned table named acid_update and insert data into the table. Then, execute the UPDATE statement to update the columns of the rows that meet the specified conditions in the table. The following statements show an example:

-- Create a non-partitioned table named acid_update.   
create table if not exists acid_update(id bigint) tblproperties ("transactional"="true");

-- Insert data into the table. 
insert overwrite table acid_update values(1),(2),(3),(2);

-- Query the table to check whether data is inserted. 
select * from acid_update; 

+------------+
| id         |
+------------+
| 1          |
| 2          |
| 3          |
| 2          |
+------------+

-- Update the value 2 in the id column to 4. 
update acid_update set id = 4 where id = 2; 

-- Query the table to check whether the value 2 is changed to 4 in the id column. 
select * from acid_update; 

+------------+
| id         |
+------------+
| 1          |
| 3          |
| 4          |
| 4          |
+------------+

Example 2: Create a partitioned table named acid_update and insert data into the table. Then, execute the UPDATE statement to update the columns of the rows that meet the specified conditions in the table. The following statements show an example:

-- Create a partitioned table named acid_update_pt.  
create table if not exists acid_update_pt(id bigint) partitioned by(ds string) tblproperties ("transactional"="true");

-- Add a partition to the table. 
alter table acid_update_pt add if not exists partition (ds= '2019');

-- Insert data into the added partition. 
insert overwrite table acid_update_pt partition (ds='2019') values(1),(2),(3);
-- Query the partition to check whether data is inserted.
select * from acid_update_pt where ds = '2019';

+------------+------------+
| id         | ds         |
+------------+------------+
| 1          | 2019       |
| 2          | 2019       |
| 3          | 2019       |
+------------+------------+

-- Update the value 2 of the id column in the 2019 partition to 4. 
update acid_update_pt set id = 4 where ds = '2019' and id = 2; 

-- Query the partition to check whether the value 2 is changed to 4 in the id column. 
select * from acid_update_pt where ds = '2019';

+------------+------------+
| id         | ds         |
+------------+------------+
| 4          | 2019       |
| 1          | 2019       |
| 3          | 2019       |
+------------+------------+

Example 3: Create a transactional table named acid_update_t that you want to update and an associated table named acid_update_s. Then, update the values of multiple columns at a time in the acid_update_t table. The following statements show an example:

-- Create a transactional table named acid_update_t that you want to update and an associated table named acid_update_s. 
create table if not exists acid_update_t(id int,value1 int,value2 int) tblproperties ("transactional"="true");
create table if not exists acid_update_s(id int,value1 int,value2 int);

-- Insert data into the tables. 
insert overwrite table acid_update_t values(2,20,21),(3,30,31),(4,40,41);
insert overwrite table acid_update_s values(1,100,101),(2,200,201),(3,300,301);
--Method 1: Update the values of specific columns with constants. 
update acid_update_t set (value1, value2) = (60,61);
-- Query the acid_update_t table to check whether data is updated as expected. 
select * from acid_update_t;

+------------+------------+------------+
| id         | value1     | value2     |
+------------+------------+------------+
| 2          | 60         | 61         |
| 3          | 60         | 61         |
| 4          | 60         | 61         |
+------------+------------+------------+

--Method 2: Use the data in the acid_update_s table to update all rows in the acid_update_t table. If specific rows in the acid_update_t table cannot be matched, null values are assigned to the rows. 
update acid_update_t set (value1, value2) = (select value1, value2 from acid_update_s where acid_update_t.id = acid_update_s.id);
-- Query the acid_update_t table to check whether data is updated as expected. 
select * from acid_update_t;

+------------+------------+------------+
| id         | value1     | value2     |
+------------+------------+------------+
| 2          | 200        | 201        |
| 3          | 300        | 301        |
| 4          | NULL       | NULL       |
+------------+------------+------------+

-- Method 3: Use the data in the acid_update_s table to update the acid_update_t table. Add filter conditions for the acid_update_t table to update only the rows that intersect with those in the acid_update_s table. 
update acid_update_t set (value1, value2) = (select value1, value2 from acid_update_s where acid_update_t.id = acid_update_s.id) where acid_update_t.id in (select id from acid_update_s);
-- Query the acid_update_t table to check whether data is updated as expected. 
select * from acid_update_t;
+------------+------------+------------+
| id         | value1     | value2     |
+------------+------------+------------+
| 4          | 40         | 41         |
| 2          | 200        | 201        |
| 3          | 300        | 301        |
+------------+------------+------------+

-- Method 4: Use the aggregate results of the acid_update_t and acid_update_s tables to update the acid_update_t table. 
update acid_update_t set (id, value1, value2) = (select id, max(value1),max(value2) from acid_update_s where acid_update_t.id = acid_update_s.id group by acid_update_s.id) where acid_update_t.id in (select id from acid_update_s);
-- Query the acid_update_t table to check whether data is updated as expected. 
select * from acid_update_t;

+------------+------------+------------+
| id         | value1     | value2     |
+------------+------------+------------+
| 4          | 40         | 41         |
| 2          | 200        | 201        |
| 3          | 300        | 301        |
+------------+------------+------------+

Example 4: Perform join operations on two tables. Sample statements:

-- Create a destination table named acid_update_t and create a table named acid_update_s for joins.
create table if not exists acid_update_t(id bigint,value1 bigint,value2 bigint) tblproperties ("transactional"="true");
create table if not exists acid_update_s(id bigint,value1 bigint,value2 bigint);
-- Insert data into these tables.
insert overwrite table acid_update_t values(2,20,21),(3,30,31),(4,40,41);
insert overwrite table acid_update_s values(1,100,101),(2,200,201),(3,300,301);
-- Query data from these tables.
select * from acid_update_t;
-- The following result is returned:
+------------+------------+------------+
| id         | value1     | value2     |
+------------+------------+------------+
| 2          | 20         | 21         |
| 3          | 30         | 31         |
| 4          | 40         | 41         |
+------------+------------+------------+
select * from acid_update_s;
-- The following result is returned:
+------------+------------+------------+
| id         | value1     | value2     |
+------------+------------+------------+
| 1          | 100        | 101        |
| 2          | 200        | 201        |
| 3          | 300        | 301        |
+------------+------------+------------+

-- Use the data in the acid_update_s table to update the acid_update_t table. Add filter conditions for the acid_update_t table to update only the intersection of the two tables.
update acid_update_t set value1 = b.value1, value2 = b.value2
from acid_update_s b where acid_update_t.id = b.id;

-- Value 20 is updated to 200, value 21 is updated to 201, value 30 is updated to 300, and value 31 is updated to 301.
select * from acid_update_t;
+------------+------------+------------+
| id         | value1     | value2     |
+------------+------------+------------+
| 4          | 40         | 41         |
| 2          | 200        | 201        |
| 3          | 300        | 301        |
+------------+------------+------------+

Example 5: Perform complex join operations on more than two tables. Sample statements:

-- Create a destination table named acid_update_t and create a table named acid_update_s for joins.
create table if not exists acid_update_t(id bigint,value1 bigint,value2 bigint) tblproperties ("transactional"="true");
create table if not exists acid_update_s(id bigint,value1 bigint,value2 bigint);
create table if not exists acid_update_m(id bigint,value1 bigint,value2 bigint);
-- Insert data into the tables.
insert overwrite table acid_update_t
values(2,20,21),(3,30,31),(4,40,41),(5,50,51);
insert overwrite table acid_update_s
values (1,100,101),(2,200,201),(3,300,301),(4,400,401),(5,500,501);
insert overwrite table acid_update_m
values(3,30,101),(4,400,201),(5,300,301);

-- Query data from the table.
select * from acid_update_t;
-- The following result is returned:
+------------+------------+------------+
| id         | value1     | value2     |
+------------+------------+------------+
| 2          | 20         | 21         |
| 3          | 30         | 31         |
| 4          | 40         | 41         |
| 5          | 50         | 51         |
+------------+------------+------------+
select * from acid_update_s;
-- The following result is returned:
+------------+------------+------------+
| id         | value1     | value2     |
+------------+------------+------------+
| 1          | 100        | 101        |
| 2          | 200        | 201        |
| 3          | 300        | 301        |
| 4          | 400        | 401        |
| 5          | 500        | 501        |
+------------+------------+------------+
select * from acid_update_m;
-- The following result is returned:
+------------+------------+------------+
| id         | value1     | value2     |
+------------+------------+------------+
| 3          | 30         | 101        |
| 4          | 400        | 201        |
| 5          | 300        | 301        |
+------------+------------+------------+
-- Use the data in the acid_update_s table to update the acid_update_t table. Use the where clause to filter values in the acid_update_s, acid_update_t, and acid_update_m tables.
update acid_update_t
set value1 = acid_update_s.value1, value2 = acid_update_s.value2
from acid_update_s
where acid_update_t.id = acid_update_s.id
and acid_update_s.id > 2
and acid_update_t.value1 not in
(select value1 from acid_update_m where id = acid_update_t.id)
and acid_update_s.value1 not in
(select value1 from acid_update_m where id = acid_update_s.id);

-- Only the row of data that contains the value of 5 in the id column of the acid_update_t table meets the condition. 
The value in the value1 column is updated to 500, and the value in the value2 column is updated to 501.
select * from acid_update_t;
+------------+------------+------------+
| id         | value1     | value2     |
+------------+------------+------------+
| 5          | 500        | 501        |
| 2          | 20         | 21         |
| 3          | 30         | 31         |
| 4          | 40         | 41         |
+------------+------------+------------+

Related statements

DELETE: Deletes one or more rows of data that meets the specified conditions in partitioned transactional tables or non-partitioned transactional tables.
ALTER TABLE: Merges transactional table files.