更新或刪除資料（UPDATE | DELETE） - MaxCompute

MaxCompute支援通過delete、update操作，在行層級刪除或更新Transactional表中的資料。

本文中的命令您可以在如下工具平台執行：

前提條件

執行delete、update操作前需要具備目標Transactional表的讀取表資料許可權（Select）及更新表資料許可權（Update）。授權操作請參見MaxCompute許可權。

功能介紹

MaxCompute的delete、update功能具備與傳統資料庫用法類似的刪除或更新表中指定行的能力。

實際使用delete、update功能時，系統會針對每一次刪除或更新操作自動產生使用者不可見的Delta檔案，用於記錄刪除或更新的資料資訊。具體實現原理如下：

delete：Delta檔案中使用txnid(bigint)和rowid(bigint)欄位標識Transactional表的Base檔案（表在系統中的儲存形式）中的記錄在哪次刪除操作中被刪除。
例如，表t1的Base檔案為f1，且內容為a, b, c, a, b，當執行delete from t1 where c1='a';命令後，系統會產生一個單獨的f1.delta檔案。假設txnid是t0，則f1.delta的內容是((0, t0), (3, t0))，標識行0和行3，在txnt0中被刪除了。如果再執行一次delete操作，系統會又產生一個f2.delta檔案，該檔案仍然是根據Base檔案f1編號，讀取檔案時，基於Base檔案f1和當前所有Delta檔案的共同表示結果，讀取沒有被刪除的資料。
update：update操作會轉換為delete+insert into的實現邏輯。

delete、update功能具備的優勢如下：

寫資料量下降
此前，MaxCompute通過insert into或insert overwrite操作方式刪除或更新表資料，更多資訊，請參見插入或覆寫資料（INSERT INTO | INSERT OVERWRITE）。當使用者需要更新表或分區中的少量資料時，如果通過insert操作實現，需要先讀取表的全量資料，然後通過select操作更新資料，最後通過insert操作將全量資料寫回表中，效率較低。使用delete、update功能後，系統無需寫回全部資料，寫資料量會顯著下降。
說明
- 對於隨用隨付情境，delete、update和insert overwrite任務的寫資料部分不收費，但是delete、update任務需要按分區過濾讀取需要變更的資料，用於標註刪除的記錄或寫回更新的記錄，而讀取資料部分依然遵照SQL作業隨用隨付模型收費，所以delete、update任務相比insert overwrite任務，費用並不能因為寫資料量減少而減少。
- 對於訂用帳戶情境，delete、update減少了寫資料資源消耗，與insert overwrite相比，相同資源可以運行更多的任務。
可直接讀取最新狀態的表
此前，MaxCompute在批次更新表資料情境使用的是拉鏈表，該方式需要在表中增加start_date和end_date輔助列，標識某一行記錄的生命週期。當查詢表的最新狀態時，系統需要從大量資料中根據時間戳記擷取表的最新狀態，使用起來不夠直觀。使用delete、update功能後，系統可以基於表的Base檔案和Delta檔案的共同表示結果，直接讀取最新狀態的表。

重要

多次delete、update操作會使Transactional表的底層儲存增大，會提高儲存和後續查詢費用，且影響後續查詢效率，建議定期合并（Compact）後台資料。更多合併作業資訊，請參見合并Transactional表檔案。

當作業並發運行且操作的目標表相同時，可能會出現作業衝突問題，更多資訊，請參見ACID語義。

應用情境

delete、update功能適用於隨機、低頻刪除或更新表或分區中的少量資料。例如，按照T+1周期性地批量對錶或分區中5%以下的行刪除或更新資料。

delete、update功能不適用於高頻更新、刪除資料或即時寫入目標表情境。

使用限制

delete、update功能及對應Transactional表、Delta Table表的使用限制如下：
- 在對Transactional表的重要資料執行update、delete或insert overwrite操作前需要手動通過select+insert操作將資料備份至其他表中。
- Delta Table表Update文法不支援修改PK列。

注意事項

通過delete、update操作刪除或更新表或分區內的資料時，注意事項如下：

如果需要對錶中較少資料進行刪除或更新操作，且操作和後續讀資料的頻率也不頻繁，建議使用delete、update操作，並且在多次執行刪除或更新操作之後，請合并表的Base檔案和Delta檔案，降低表的實際儲存。更多資訊，請參見合并Transactional表檔案。
如果刪除或更新行數較多（超過5%）並且操作不頻繁，但後續對該表的讀操作比較頻繁，建議使用insert overwrite或insert into操作。更多資訊，請參見插入或覆寫資料（INSERT INTO | INSERT OVERWRITE）。
例如，某業務情境為每次刪除或更新10%的資料，一天更新10次。建議根據實際情況評估delete、update操作產生的費用及後續對讀效能的消耗是否小於每次使用insert overwrite或insert into操作產生的費用及後續對讀效能的消耗，比較兩種方式在具體情境中的效率，選擇更優方案。
刪除資料會產生Delta檔案，所以刪除資料不一定能降低儲存，如果您希望通過delete操作刪除資料來降低儲存，請合并表的Base檔案和Delta檔案，降低表的實際儲存。更多資訊，請參見合并Transactional表檔案。
MaxCompute會按照批處理方式執行delete、update作業，每一條語句都會使用資源併產生費用，建議您使用批量方式刪除或更新資料。例如您通過Python指令碼產生並提交了大量行層級更新作業，且每條語句只操作一行或者少量行資料，則每條語句都會產生與SQL掃描輸入資料量對應的費用，並使用相應的計算資源，多條語句累加時將明顯增加費用成本，降低系統效率。命令樣本如下。
- 推薦方案：
```
UPDATE table1 SET col1= (SELECT value1 FROM table2 WHERE table1.id = table2.id AND table1.region = table2.region);
```
- 不推薦方案：
```
UPDATE table1 SET col1=1 WHERE id='2021063001'AND region='beijing';                  
UPDATE table1 SET col1=2 WHERE id='2021063002'AND region='beijing';
```

刪除資料（DELETE）

delete操作用於刪除Transactional或Delta Table表中滿足指定條件的單行或多行資料。

命令格式

delete from <table_name> [where <where_condition>];

參數說明
- table_name：必填。待執行delete操作的Transactional或Delta Table表名稱。
- where_condition：可選。WHERE子句，用於篩選滿足條件的資料。更多WHERE子句資訊，請參見WHERE子句（where_condition）。如果不帶WHERE子句，會刪除表中的所有資料。

使用樣本

樣本1：建立非分區表acid_delete，並匯入資料，執行delete操作刪除滿足指定條件的行資料。命令樣本如下：

--建立Transactional表acid_delete。
create table if not exists acid_delete(id bigint) tblproperties ("transactional"="true"); 

--插入資料。
insert overwrite table acid_delete values(1),(2),(3),(2); 

--查看插入結果。
select * from acid_delete; 
--返回結果
+------------+
| id         |
+------------+
| 1          |
| 2          |
| 3          |
| 2          |
+------------+

--刪除id為2的行，如果在MaxCompute用戶端（odpscmd）執行，需要輸入yes|no確認。
delete from acid_delete where id = 2; 

--查看結果表中資料只有1、3。
select * from acid_delete; 
--返回結果
+------------+
| id         |
+------------+
| 1          |
| 3          |
+------------+

樣本2：建立分區表acid_delete_pt，並匯入資料，執行delete操作刪除滿足指定條件的行。命令樣本如下：

--建立Transactional表acid_delete_pt。 
create table if not exists acid_delete_pt(id bigint) partitioned by(ds string) tblproperties ("transactional"="true");

--添加分區。
alter table acid_delete_pt add if not exists partition (ds= '2019');
alter table acid_delete_pt add if not exists partition (ds= '2018');

--插入資料。
insert overwrite table acid_delete_pt partition (ds='2019') values(1),(2),(3);
insert overwrite table acid_delete_pt partition (ds='2018') values(1),(2),(3);

--查看插入結果。
select * from acid_delete_pt;
--返回結果
+------------+------------+
| id         | ds         |
+------------+------------+
| 1          | 2018       |
| 2          | 2018       |
| 3          | 2018       |
| 1          | 2019       |
| 2          | 2019       |
| 3          | 2019       |
+------------+------------+

--刪除分區為2019且id為2的資料，如果在MaxCompute用戶端（odpscmd）執行，需要輸入yes|no確認。
delete from acid_delete_pt where ds='2019' and id = 2;

--查看結果表中已刪除分區為2019且id為2的資料。
select * from acid_delete_pt;
--返回結果
+------------+------------+
| id         | ds         |
+------------+------------+
| 1          | 2018       |
| 2          | 2018       |
| 3          | 2018       |
| 1          | 2019       |
| 3          | 2019       |
+------------+------------+

樣本3：建立目標表acid_delete_t和關聯表acid_delete_s，通過關聯操作刪除滿足指定條件的行。命令樣本如下：

--建立目標Transactional表acid_delete_t和關聯表acid_delete_s。
create table if not exists acid_delete_t(id int,value1 int,value2 int) tblproperties ("transactional"="true");
create table if not exists acid_delete_s(id int,value1 int,value2 int);

--插入資料。
insert overwrite table acid_delete_t values(2,20,21),(3,30,31),(4,40,41);
insert overwrite table acid_delete_s values(1,100,101),(2,200,201),(3,300,301);

--刪除acid_delete_t表中id與acid_delete_s表中id不匹配的行。如果在MaxCompute用戶端（odpscmd）執行，需要輸入yes|no確認。
delete from acid_delete_t where not exists (select * from acid_delete_s where acid_delete_t.id=acid_delete_s.id);

--查看結果表中只有id為2、3的資料。
select * from acid_delete_t;

--返回結果
+------------+------------+------------+
| id         | value1     | value2     |
+------------+------------+------------+
| 2          | 20         | 21         |
| 3          | 30         | 31         |
+------------+------------+------------+

樣本4：建立Delta Table表mf_dt，並匯入資料，執行delete操作刪除滿足指定條件的行。命令樣本如下：

--建立目標Delta Table表mf_dt。
create table if not exists mf_dt (pk bigint not null primary key, 
                  val bigint not null) 
                  partitioned by (dd string, hh string) 
                  tblproperties ("transactional"="true");

--插入資料
insert overwrite table mf_dt partition (dd='01', hh='02') values (1, 1), (2, 2), (3, 3);

--查看插入結果                 
select * from mf_dt where dd='01' and hh='02';

--返回結果
+------------+------------+----+----+
| pk         | val        | dd | hh |
+------------+------------+----+----+
| 1          | 1          | 01 | 02 |
| 3          | 3          | 01 | 02 |
| 2          | 2          | 01 | 02 |
+------------+------------+----+----+

--刪除分區為01和02，且val為2的資料。
delete from mf_dt where val = 2  and dd='01' and hh='02';

--查看結果表中只有val為1、3的資料
select * from mf_dt where dd='01' and hh='02';

--返回結果
+------------+------------+----+----+
| pk         | val        | dd | hh |
+------------+------------+----+----+
| 1          | 1          | 01 | 02 |
| 3          | 3          | 01 | 02 |
+------------+------------+----+----+

清空列資料

使用clear column命令清空普通表的列，將不再使用的列資料從磁碟刪除共置NULL，從而達到降低儲存成本的目的。

命令格式

ALTER TABLE <table_name> 
           [partition ( <pt_spec>[, <pt_spec>....] )] 
					 CLEAR COLUMN column1[, column2, column3, ...]
			                         [without touch];

參數說明
- table_name：將要執行清空列資料的表名稱。
- column1 , column2 ... ：將要被清空資料的列名稱。
- partition：指定分區，若未指定，則表示操作所有分區。
- pt_spec：分區描述，格式為(partition_col1 = partition_col_value1, partition_col2 = partition_col_value2, ...)。
- without touch：表示不更新LastDataModifiedTime；若未指定，則會更新LastDataModifiedTime。
說明
目前預設指定without touch，計劃在下一階段支援未指定without touch的清空列資料行為，即若未指定without touch，則會更新LastDataModifiedTime。
使用限制
- 不支援對具有非空屬性的列進行clear column操作，可以手動取消not nullable屬性：
```
alter table <table_name> change column <old_col_name> null;
```
- ACID表不支援清空列資料。
- Cluster不支援清空列資料。
- 不支援對巢狀型別內部執行清空列資料操作。
- 不支援清空所有列資料（DROP TABLE可以起到相同效果，且效能更好）。
注意事項
- Clear Column操作不會改變表的Archive屬性。
- 對巢狀型別的列執行Clear Column操作可能會失敗。
  失敗的情況為在列式（Columnar）巢狀型別關閉的情況下對含有Columnar巢狀型別的表做Clear Column操作。
- Clear Column命令執行需要依賴Storage Service線上服務，在作業量多的情況下可能需要排隊導致任務變慢。
- Clear Column操作需要使用計算資源對資料進行讀取與寫入，所以針對訂用帳戶使用者，會佔用計算資源，對隨用隨付使用者，會產生和SQL一樣的費用。（目前正在邀測中，暫時不進行收費。）

使用樣本

--建立表
create table if not exists mf_cc(key string, 
     value string, 
     a1 BIGINT , 
     a2 BIGINT , 
     a3 BIGINT , 
     a4 BIGINT)  
     partitioned by(ds string, hr string);
                 
--添加分區
alter table mf_cc add if not exists partition (ds='20230509', hr='1641');

--插入資料
insert into mf_cc partition (ds='20230509', hr='1641') 
     values("key","value",1,22,3,4);
     
--查詢資料     
select * from mf_cc where ds='20230509' and  hr='1641';
--返回結果
+-----+-------+------------+------------+--------+------+---------+-----+
| key | value | a1         | a2         | a3     | a4   | ds      | hr  |
+-----+-------+------------+------------+--------+------+---------+-----+
| key | value | 1          | 22         | 3      | 4    | 20230509| 1641|
+-----+-------+------------+------------+--------+------+---------+-----+
--清空列資料
alter table mf_cc partition(ds='20230509', hr='1641') 
                  clear column key,a1 
    without touch;
--查詢資料     
select * from mf_cc where ds='20230509' and  hr='1641';
--返回結果，key和a1的資料已經變成null
+-----+-------+------------+------------+--------+------+---------+-----+
| key | value | a1         | a2         | a3     | a4   | ds      | hr  |
+-----+-------+------------+------------+--------+------+---------+-----+
| null| value | null       | 22         | 3      | 4    | 20230509| 1641|
+-----+-------+------------+------------+--------+------+---------+-----+

下圖為lineitem表（aliorc格式）的總儲存大小隨著對每一列進行Clear Column後的變化過程。lineitem表共16列，有Bigint、Decimal、Char、Date、Varchar幾種類型。
可以看出隨著表的16個列陸續被Clear Column命令置為Null，儲存空間總共下降了99.97%（由最初的186783526byte下降到了236715byte）。
說明
- 列的資料類型和實際儲存的值和Clear Column操作節省的空間有關，比如在這個例子中，對Decimal類型的l_extendedprice列，Clear Column操作節省了24.2%的空間（146538799 byte -> 111138117 byte）明顯好於平均水平。
- 當所有列被置Null後表的大小是236715 byte，而不是0。這是因為表的檔案結構還存在，Null欄位會佔用少量的儲存空間，系統也需要保留檔案Footer資訊。

更新資料（UPDATE）

update操作用於將Transactional表或Delta Table表中行對應的單列或多列資料更新為新值。

命令格式

--方式1
update <table_name> set <col1_name> = <value1> [, <col2_name> = <value2> ...] [WHERE <where_condition>];

--方式2
update <table_name> set (<col1_name> [, <col2_name> ...]) = (<value1> [, <value2> ...])[WHERE <where_condition>];

--方式3
UPDATE <table_name>
       SET <col1_name> = <value1> [ , <col2_name> = <value2> , ... ]
        [ FROM <additional_tables> ]
        [ WHERE <where_condition> ]

參數說明

table_name：必填。待執行update操作的Transactional表名稱。
col1_name、col2_name：至少更新一個。待修改行對應的列名稱。
value1、value2：至少更新一個列值。修改後的新值。
where_condition：可選。WHERE子句，用於篩選滿足條件的資料。更多WHERE子句資訊，請參見WHERE子句（where_condition）。如果不帶WHERE子句，會更新表中的所有資料。

additional_tables：可選，from子句。

update支援from子句，使用from子句時，update的使用會更加方便，不使用from子句與使用from子句的對比樣本如下。

情境

範例程式碼

不使用from子句

update target set v =
  (select min(v) from src group by k where target.k = src.key)
  where target.k in (select k from src);

使用from子句

update target set v = b.v
  from (select k, min(v) v from src group by k) b
  where target.k = b.k;

從上述範例程式碼可見：

當用源表的多行資料更新目標表的一行資料的時，由於不知道使用哪條源表的資料進行更新，所以遇到這種情況需要使用者寫彙總操作來保證資料來源的unique性，可以看出不使用from子句時，代碼不夠簡潔，用from子句的寫比較簡潔易懂。
關聯更新的時候，如果只更新交集資料，不使用from子句時需要寫額外的where條件，相對於from文法而言也不太簡潔。

使用樣本

樣本1：建立非分區表acid_update，並匯入資料，執行update操作更新滿足指定條件的行對應的列資料。命令樣本如下：

--建立Transactional表acid_update。  
create table if not exists acid_update(id bigint) tblproperties ("transactional"="true");

--插入資料。
insert overwrite table acid_update values(1),(2),(3),(2);

--查看插入結果。
select * from acid_update; 

--返回結果
+------------+
| id         |
+------------+
| 1          |
| 2          |
| 3          |
| 2          |
+------------+

--將所有id為2的行，id值更新為4。
update acid_update set id = 4 where id = 2; 

--查看更新結果，2被更新為4。
select * from acid_update; 

--返回結果
+------------+
| id         |
+------------+
| 1          |
| 3          |
| 4          |
| 4          |
+------------+

樣本2：建立分區表acid_update，並匯入資料，執行update操作更新滿足指定條件的行對應的列資料。命令樣本如下：

--建立Transactional表acid_update_pt。 
create table if not exists acid_update_pt(id bigint) partitioned by(ds string) tblproperties ("transactional"="true");

--添加分區。
alter table acid_update_pt add if not exists partition (ds= '2019');

--插入資料。
insert overwrite table acid_update_pt partition (ds='2019') values(1),(2),(3);

--查看插入結果
select * from acid_update_pt where ds = '2019';

--返回結果
+------------+------------+
| id         | ds         |
+------------+------------+
| 1          | 2019       |
| 2          | 2019       |
| 3          | 2019       |
+------------+------------+

--更新指定行的一列資料，將分區為2019的所有id=2的行，id值更新為4。
update acid_update_pt set id = 4 where ds = '2019' and id = 2; 

--查看更新結果，2被更新為4。
select * from acid_update_pt where ds = '2019';

--返回結果
+------------+------------+
| id         | ds         |
+------------+------------+
| 4          | 2019       |
| 1          | 2019       |
| 3          | 2019       |
+------------+------------+

樣本3：建立目標表acid_update_t和關聯表acid_update_s，實現同時更新多列值。命令樣本如下：

--建立待更新目標Transactional表acid_update_t和關聯表acid_update_s。
create table if not exists acid_update_t(id int,value1 int,value2 int) tblproperties ("transactional"="true");
create table if not exists acid_update_s(id int,value1 int,value2 int);

--插入資料。
insert overwrite table acid_update_t values(2,20,21),(3,30,31),(4,40,41);
insert overwrite table acid_update_s values(1,100,101),(2,200,201),(3,300,301);

--方式一：用常量更新。
update acid_update_t set (value1, value2) = (60,61);
--查詢方式一目標表結果資料。
select * from acid_update_t;

--返回結果
+------------+------------+------------+
| id         | value1     | value2     |
+------------+------------+------------+
| 2          | 60         | 61         |
| 3          | 60         | 61         |
| 4          | 60         | 61         |
+------------+------------+------------+

--方式二：關聯更新，規則為acid_update_t表左關聯acid_update_s表。
update acid_update_t set (value1, value2) = (select value1, value2 from acid_update_s where acid_update_t.id = acid_update_s.id);
--查詢方式二目標表結果資料。
select * from acid_update_t;

--返回結果
+------------+------------+------------+
| id         | value1     | value2     |
+------------+------------+------------+
| 2          | 200        | 201        |
| 3          | 300        | 301        |
| 4          | NULL       | NULL       |
+------------+------------+------------+

--方式三：關聯更新，規則為增加過濾條件，只更新交集。
update acid_update_t set (value1, value2) = (select value1, value2 from acid_update_s where acid_update_t.id = acid_update_s.id) where acid_update_t.id in (select id from acid_update_s);
--查詢方式三目標表結果資料。
select * from acid_update_t;

--返回結果
+------------+------------+------------+
| id         | value1     | value2     |
+------------+------------+------------+
| 2          | 200        | 201        |
| 3          | 300        | 301        |
| 4          | NULL       | NULL       |
+------------+------------+------------+

--方式四：用匯總結果關聯更新。
update acid_update_t set (id, value1, value2) = (select id, max(value1),max(value2) from acid_update_s where acid_update_t.id = acid_update_s.id group by acid_update_s.id) where acid_update_t.id in (select id from acid_update_s);
--查詢方式四目標表結果資料。
select * from acid_update_t;

--返回結果
+------------+------------+------------+
| id         | value1     | value2     |
+------------+------------+------------+
| 2          | 200        | 201        |
| 3          | 300        | 301        |
| 4          | NULL       | NULL       |
+------------+------------+------------+

樣本4：涉及兩個表的簡單關聯查詢，樣本命令如下：

--建立更新目標表acid_update_t和關聯表acid_update_s
create table if not exists acid_update_t(id bigint,value1 bigint,value2 bigint) tblproperties ("transactional"="true");
create table if not exists acid_update_s(id bigint,value1 bigint,value2 bigint);

--插入資料
insert overwrite table acid_update_t values(2,20,21),(3,30,31),(4,40,41);
insert overwrite table acid_update_s values(1,100,101),(2,200,201),(3,300,301);

--查詢acid_update_t表資料
select * from acid_update_t;
--返回結果
+------------+------------+------------+
| id         | value1     | value2     |
+------------+------------+------------+
| 2          | 20         | 21         |
| 3          | 30         | 31         |
| 4          | 40         | 41         |
+------------+------------+------------+

--查詢acid_update_s表資料
select * from acid_update_s;
--返回結果
+------------+------------+------------+
| id         | value1     | value2     |
+------------+------------+------------+
| 1          | 100        | 101        |
| 2          | 200        | 201        |
| 3          | 300        | 301        |
+------------+------------+------------+

--關聯更新，目標表增加過濾條件，只取交集
update acid_update_t set value1 = b.value1, value2 = b.value2
from acid_update_s b where acid_update_t.id = b.id;

--查看更新結果，20被更新為200，21被更新為201，30被更新為300，31被更新為301
select * from acid_update_t;
--返回結果
+------------+------------+------------+
| id         | value1     | value2     |
+------------+------------+------------+
| 4          | 40         | 41         |
| 2          | 200        | 201        |
| 3          | 300        | 301        |
+------------+------------+------------+

樣本5：涉及多個表的複雜關聯查詢，樣本命令如下：

--建立更新目標表acid_update_t和關聯表acid_update_s
create table if not exists acid_update_t(id bigint,value1 bigint,value2 bigint) tblproperties ("transactional"="true");
create table if not exists acid_update_s(id bigint,value1 bigint,value2 bigint);
create table if not exists acid_update_m(id bigint,value1 bigint,value2 bigint);

--插入資料
insert overwrite table acid_update_t
values(2,20,21),(3,30,31),(4,40,41),(5,50,51);
insert overwrite table acid_update_s
values (1,100,101),(2,200,201),(3,300,301),(4,400,401),(5,500,501);
insert overwrite table acid_update_m
values(3,30,101),(4,400,201),(5,300,301);

--查詢acid_update_t表資料
select * from acid_update_t;
--返回結果
+------------+------------+------------+
| id         | value1     | value2     |
+------------+------------+------------+
| 2          | 20         | 21         |
| 3          | 30         | 31         |
| 4          | 40         | 41         |
| 5          | 50         | 51         |
+------------+------------+------------+

--查詢acid_update_s表資料
select * from acid_update_s;
--返回結果
+------------+------------+------------+
| id         | value1     | value2     |
+------------+------------+------------+
| 1          | 100        | 101        |
| 2          | 200        | 201        |
| 3          | 300        | 301        |
| 4          | 400        | 401        |
| 5          | 500        | 501        |
+------------+------------+------------+

--查詢acid_update_m表資料
select * from acid_update_m;
--返回結果
+------------+------------+------------+
| id         | value1     | value2     |
+------------+------------+------------+
| 3          | 30         | 101        |
| 4          | 400        | 201        |
| 5          | 300        | 301        |
+------------+------------+------------+

--關聯更新，並且在where中同時對原表和目標表進行過濾
update acid_update_t
set value1 = acid_update_s.value1, value2 = acid_update_s.value2
from acid_update_s
where acid_update_t.id = acid_update_s.id
and acid_update_s.id > 2
and acid_update_t.value1 not in
(select value1 from acid_update_m where id = acid_update_t.id)
and acid_update_s.value1 not in
(select value1 from acid_update_m where id = acid_update_s.id);

--查看更新結果，acid_update_t表只有id為5的資料符合條件，對應value1被更新尾500，valu2被更新尾501
select * from acid_update_t;
--返回結果
+------------+------------+------------+
| id         | value1     | value2     |
+------------+------------+------------+
| 5          | 500        | 501        |
| 2          | 20         | 21         |
| 3          | 30         | 31         |
| 4          | 40         | 41         |
+------------+------------+------------+

樣本6：建立Delta Table表mf_dt，並匯入資料，執行update操作刪除滿足指定條件的行。命令樣本如下：

--建立目標Delta Table表mf_dt。
create table if not exists mf_dt (pk bigint not null primary key, 
                  val bigint not null) 
                  partitioned by (dd string, hh string) 
                  tblproperties ("transactional"="true");

--插入資料
insert overwrite table mf_dt partition (dd='01', hh='02') 
                 values (1, 1), (2, 2), (3, 3);

--查看插入結果                 
select * from mf_dt where dd='01' and hh='02';
--返回結果
+------------+------------+----+----+
| pk         | val        | dd | hh |
+------------+------------+----+----+
| 1          | 1          | 01 | 02 |
| 3          | 3          | 01 | 02 |
| 2          | 2          | 01 | 02 |
+------------+------------+----+----+

--更新指定行的一列資料，將分區為01和02的所有pk=3的行，val值更新為30。
--方法一
update mf_dt set val = 30 where pk = 3 and dd='01' and hh='02';

--方法二
update mf_dt set val = delta.val 
              from (select pk, val from values (3, 30) t (pk, val)) delta 
              where delta.pk = mf_dt.pk and mf_dt.dd='01' and mf_dt.hh='02';

--查看更新結果。
select * from mf_dt where dd='01' and hh='02';
--返回結果，pk=3的行val值被更新為30。
+------------+------------+----+----+
| pk         | val        | dd | hh |
+------------+------------+----+----+
| 1          | 1          | 01 | 02 |
| 3          | 30         | 01 | 02 |
| 2          | 2          | 01 | 02 |
+------------+------------+----+----+

合并Transactional表檔案

Transactional表底層實體儲存體為不支援直接讀取的Base檔案和Delta檔案。對Transactional表執行update或delete操作，不會修改Base檔案，只會追加Delta檔案，所以會出現更新或刪除次數越多，表實際佔用儲存越大的情況，多次累積的Delta檔案會產生較高的儲存和後續查詢費用。

對同一表或分區，執行多次update或delete操作，會產生較多Delta檔案。系統讀資料時，需要載入這些Delta檔案來確定哪些行被更新或刪除，較多的Delta檔案會影響資料讀取效率。此時您可以將Base檔案和Delta合并，減少儲存以便提升資料讀取效率。

命令格式

alter table <table_name> [partition (<partition_key> = '<partition_value>' [, ...])] compact {minor|major};

參數說明
- table_name：必填。待合并檔案的Transactional表名稱。
- partition_key：可選。當Transactional表為分區表時，指定分區列名。
- partition_value：可選。當Transactional表為分區表時，指定分區列名對應的列值。
- major|minor：至少選擇其中一個。二者的區別是：
  - minor：只將Base檔案及其下所有的Delta檔案合并，消除Delta檔案。
  - major：不僅將Base檔案及其下所有的Delta檔案合并，消除Delta檔案，還會把表對應的Base檔案中的小檔案進行合并。當Base檔案較小（小於32 MB）或有Delta檔案的情況下，等價於重新對錶執行insert overwrite操作，但當Base檔案足夠大（大於等於32 MB ），且不存在Delta檔案的情況下，不會重寫。

使用樣本

樣本1：基於Transactional表acid_delete，合并表檔案。命令樣本如下：

alter table acid_delete compact minor;

返回結果如下：

Summary:
Nothing found to merge, set odps.merge.cross.paths=true if cross path merge is permitted.
OK

樣本2：基於Transactional表acid_update_pt，合并表檔案。命令樣本如下：

alter table acid_update_pt partition (ds = '2019') compact major;

返回結果如下：

Summary:
table name: acid_update_pt /ds=2019  instance count: 2  run time: 6
  before merge, file count:        8  file size: 2613  file physical size: 7839
   after merge, file count:        2  file size: 679  file physical size: 2037

OK

常見問題

問題一：
- 問題現象：執行update操作時，報錯ODPS-0010000:System internal error - fuxi job failed, caused by: Data Set should contain exactly one row。
- 問題原因：待更新的行資料與子查詢結果中的資料無法一一對應，系統無法判斷對哪一行資料進行更新。命令樣本如下：
```
update store set (s_county, s_manager) = (select d_country, d_manager from store_delta sd where sd.s_store_sk = store.s_store_sk) where s_store_sk in (select s_store_sk from store_delta);
```
  通過子查詢select d_country, d_manager from store_delta sd where sd.s_store_sk = store.s_store_sk與store_delta關聯，並用store_delta的資料更新store。假設store的s_store_sk中有[1, 2, 3]三行資料，如果store_delta的s_store_sk有[1, 1]兩行資料，資料無法一一對應，執行報錯。
- 解決措施：確保待更新的行資料與子查詢結果中的資料一一對應。
問題二：
- 問題現象：在DataWorks DataStudio中使用compact命令時，報錯ODPS-0130161:[1,39] Parse exception - invalid token 'minor', expect one of 'StringLiteral','DoubleQuoteStringLiteral'。
- 問題原因：DataWorks獨享資源群組中的MaxCompute用戶端版本不支援compact命令。
- 解決措施：請通過DataWorks交流群聯絡支援人員團隊升級獨享資源群組中的MaxCompute用戶端版本。