MaxCompute allows you to execute the INSERT INTO
or INSERT OVERWRITE
statement to insert data into or overwrite data in a table or a static partition.
You can execute the statements on the following platforms:
Prerequisites
Before you execute the INSERT INTO
or INSERT OVERWRITE
statement, make sure that you are granted the Update permission on the destination table and the Select permission on the metadata of the source table. For more information, see MaxCompute permissions.
Features
When you use MaxCompute SQL to process data, you can execute the INSERT INTO
or INSERT OVERWRITE
statement to save the execution results of the SELECT
statements to the destination table. Differences between the two statements:
INSERT INTO
: inserts data into a table or a static partition of a table. You can specify the values of partition key columns in this statement to insert data into a specified partition.If you want to insert a small amount of test data, you can use this statement with VALUES.
INSERT OVERWRITE
: clears a specified table or static partitions and inserts data into the table or the static partitions of the table.NoteThe
INSERT
syntax in MaxCompute is different from that in MySQL or Oracle.You must add the
TABLE
keyword andtable_name
toINSERT OVERWRITE
. You do not need to add theTABLE
keyword toINSERT INTO
.If you execute the
INSERT OVERWRITE
statement on the same partition several times, the size of the partition into which data is inserted may be different each time you run theDESC
command. This is because the logic to split files changes after you execute theSELECT
andINSERT OVERWRITE
statements in sequence for the same partition in a table. After you execute theINSERT OVERWRITE
statement, the total length of data remains the same. This does not affect the storage fees.
For information about how to insert data into a dynamic partition, see Insert or overwrite data into dynamic partitions (DYNAMIC PARTITION).
Limits
When you execute the
INSERT INTO
statement or theINSERT OVERWRITE
statement to insert or update data into a table or a static partition of a table, take note of the following limits:INSERT INTO
: This statement cannot be used to insert data into a clustered table.INSERT OVERWRITE
: This statement does not allow you to specify the columns into which you want to insert data. You can execute theINSERT INTO
statement to specify the columns. For example, if you executecreate table t(a string, b string); insert into t(a) values ('1');
, 1 is inserted into Column a, and NULL or the default value is inserted into Column b.MaxCompute does not provide the locking mechanism for the tables for which INSERT operations are being performed. We recommend that you do not execute the
INSERT INTO
statement or theINSERT OVERWRITE
statement for a table at the same time.
The INSERT INTO statement and the INSERT OVERWRITE statement have the following limits on Delta tables:
When you execute the
INSERT OVERWRITE
statement to insert data into a Delta table, the system removes duplicate data from multiple rows of data that have the same primary key value and inserts only the row that is ranked first into the table during the computing process. You cannot specify the data record that will be inserted into the table. The INSERT OVERWRITE statement inserts full data into the Delta table. Default deduplication can ensure the uniqueness of primary key values.When you execute the
INSERT INTO
statement to insert data into a Delta table, the system does not deduplicate data that has the same primary key value but inserts all the data into the table by default. If you set Flag(odps.sql.insert.acidtable.deduplicate.enable
) to true, the system deduplicates the data first.
Syntax
insert {into|overwrite} table <table_name> [partition (<pt_spec>)] [(<col_name> [,<col_name> ...)]]
<select_statement>
from <from_statement>
[zorder by <zcol_name> [, <zcol_name> ...]];
table_name: required. The names of the tables into which you want to insert data.
pt_spec: optional. The partition into which you want to insert data. Only constants are supported. Expressions, such as functions, are not supported. The value of this parameter is in the
(partition_col1 = partition_col_value1, partition_col2 = partition_col_value2, ...)
format.col_name: optional. The name of the column in the table into which you want to insert data.
INSERT OVERWRITE
does not allow you to specify[(<col_name> [,<col_name> ...)]
.select_statement: required. The
SELECT
clause that is used to query the data that you want to insert into the destination table from the source table. For more information about theSELECT
clause, see SELECT syntax.NoteThe mappings between the source and destination tables are based on the column sequence in
select_statement
, instead of the mappings between column names in the tables.If the destination table has static partitions and you want to insert data into a static partition, you cannot include partition key columns in
select_statement
.
from_statement: required. The
FROM
clause. This clause specifies the data source. For example, you can specify the name of a source table in this clause.zorder by <zcol_name> [, <zcol_name> ...]: optional. If you write data to a table or a partition, you can use this clause to co-locate rows that have similar data records based on the columns specified in select_statement. This improves filtering performance for queries and reduces storage costs. The
ORDER BY x, y
clause sorts data records based on the sequence of x coming before y. TheZORDER BY x, y
clause co-locates rows that have similar x values and rows that have similar y values. For column-based data filtering and sorting in an SQL SELECT statement, theORDER BY
clause filters and sorts data based on x, whereas theZORDER BY
clause filters and sorts data based on x or based on both x and y. This increases the column compression ratio.
The
ZORDER BY
clause supports two modes: local Z-Ordering and global Z-Ordering. The default mode islocal Z-Ordering
. In local Z-Ordering mode, only data in a single file, instead of global data, is sorted based on the ZORDER BY clause. If data is distributed in multiple files, data is loosely aggregated. As a result, data skipping may be performed in a less effective manner. To resolve this issue, MaxCompute of the latest version supports theglobal Z-Ordering
mode.Local Z-Ordering.
Global Z-Ordering
: To use this mode, you must add the configurationset odps.sql.default.zorder.type=global;
.
The
SORT BY
clause specifies the method used to sort data in a single file. If you do not specify theSORT BY
clause, data in a single file is sorted inlocal Z-Ordering
mode.When you use the
ZORDER BY
clause, take note of the following limits:You can execute the
ZORDER BY
clause to sort data in only one partition in a partitioned table at a time.The number of fields for which you execute the
ZORDER BY
clause must range from 2 to 4.
If the destination table is a clustered table, the
ZORDER BY
clause is not supported.ZORDER BY
can be used together withDISTRIBUTE BY
but cannot be used together withORDER BY
,CLUSTER BY
, orSORT BY
.
NoteIf you use the
ZORDER BY
clause to insert data, more resources and time are consumed.
Examples: common tables
Example 1: Execute the
INSERT INTO
statement to append data to a non-partitioned table namedwebsites
. Sample statements:-- Create a non-partitioned table named websites. create table if not exists websites (id int, name string, url string ); -- Create a non-partitioned table named apps. create table if not exists apps (id int, app_name string, url string ); -- Append data to the apps table. The abbreviated form of INSERT INTO TABLE table_name is INSERT INTO table_name. insert into apps (id,app_name,url) values (1,'Aliyun','https://www.aliyun.com'); -- Copy data from the apps table and append the data to the websites table. insert into websites (id,name,url) select id,app_name,url from apps; -- Execute the SELECT statement to view data in the websites table. select * from websites; -- The following result is returned: +------------+------------+------------+ | id | name | url | +------------+------------+------------+ | 1 | Aliyun | https://www.aliyun.com | +------------+------------+------------+
Example 2: Execute the
INSERT INTO
statement to append data to a partitioned table namedsale_detail
. Sample statements:-- Create a partitioned table named sale_detail. create table if not exists sale_detail ( shop_name string, customer_id string, total_price double ) partitioned by (sale_date string, region string); -- Add a partition to the sale_detail table. This operation is optional. If you do not create a partition in advance, a partition is automatically created when you write data to the table. alter table sale_detail add partition (sale_date='2013', region='china'); -- Append data to the sale_detail table. The abbreviated form of INSERT INTO TABLE table_name is INSERT INTO table_name. The TABLE keyword in INSERT OVERWRITE TABLE table_name cannot be omitted. insert into sale_detail partition (sale_date='2013', region='china') values ('s1','c1',100.1),('s2','c2',100.2),('s3','c3',100.3); -- Enable a full table scan only for the current session. Execute the SELECT statement to view data in the sale_detail table. set odps.sql.allow.fullscan=true; select * from sale_detail; -- The following result is returned: +------------+-------------+-------------+------------+------------+ | shop_name | customer_id | total_price | sale_date | region | +------------+-------------+-------------+------------+------------+ | s1 | c1 | 100.1 | 2013 | china | | s2 | c2 | 100.2 | 2013 | china | | s3 | c3 | 100.3 | 2013 | china | +------------+-------------+-------------+------------+------------+
Example 3: Execute the
INSERT OVERWRITE
statement to overwrite the data in thesale_detail_insert
table. Sample statements:-- Create the sale_detail_insert table that has the same schema as the sale_detail table. create table sale_detail_insert like sale_detail; -- Add a partition to the sale_detail_insert table. This operation is optional. If you do not create a partition in advance, a partition is automatically created when you write data to the table. alter table sale_detail_insert add partition (sale_date='2013', region='china'); -- Extract data from the sale_detail table and insert the data into the sale_detail_insert table. Names of partition key columns in the sale_detail_insert table do not need to be declared and cannot be rearranged. -- If the sale_detail_insert table contains static partitions, the values of partition key columns are declared in PARTITION(). These values do not need to be included in select_statement. You need to search for column names based on only the sequence of common columns in the sale_detail_insert table and sequentially map the declared column values to the columns in the sale_detail_insert table. If the sale_detail_insert table contains dynamic partitions, the names of partition key columns must be included in select_statement. For more information, see Insert or overwrite data into dynamic partitions (DYNAMIC PARTITION). insert overwrite table sale_detail_insert partition (sale_date='2013', region='china') select shop_name, customer_id, total_price from sale_detail zorder by customer_id, total_price; -- Enable a full table scan only for the current session. Execute the SELECT statement to view the data in the sale_detail_insert table. set odps.sql.allow.fullscan=true; select * from sale_detail_insert; -- The following result is returned: +------------+-------------+-------------+------------+------------+ | shop_name | customer_id | total_price | sale_date | region | +------------+-------------+-------------+------------+------------+ | s1 | c1 | 100.1 | 2013 | china | | s2 | c2 | 100.2 | 2013 | china | | s3 | c3 | 100.3 | 2013 | china | +------------+-------------+-------------+------------+------------+
Example 4: Execute the
INSERT OVERWRITE
statement to overwrite the data in thesale_detail_insert
table and adjust the sequence of columns inselect_statement
. The mappings between the source and destination tables are based on the sequence of columns inselect_statement
, instead of the mappings between column names in the two tables. Sample statements:insert overwrite table sale_detail_insert partition (sale_date='2013', region='china') select customer_id, shop_name, total_price from sale_detail; set odps.sql.allow.fullscan=true; select * from sale_detail_insert;
The following result is returned:
+------------+-------------+-------------+------------+------------+ | shop_name | customer_id | total_price | sale_date | region | +------------+-------------+-------------+------------+------------+ | c1 | s1 | 100.1 | 2013 | china | | c2 | s2 | 100.2 | 2013 | china | | c3 | s3 | 100.3 | 2013 | china | +------------+-------------+-------------+------------+------------+
When you create the
sale_detail_insert
table, the column sequence is defined asshop_name string, customer_id string, and then total_price bigint
. However, data is inserted from thesale_detail
table into thesale_detail_insert
table based on the sequence ofcustomer_id, shop_name, and then total_price
. As a result, the data in thesale_detail.customer_id
column is inserted into thesale_detail_insert.shop_name
column, and the data in thesale_detail.shop_name
column is inserted into thesale_detail_insert.customer_id
column.Example 5: If you insert data into a partition, the partition key columns of the partition cannot be included in
select_statement
. After the following statement is executed, an error is returned. This is becausesale_date
andregion
are partition key columns. These columns cannot be included inselect_statement
if the INSERT OVERWRITE or INSERT INTO statement is used to insert or update data into a static partition. Sample statement of incorrect usage:insert overwrite table sale_detail_insert partition (sale_date='2013', region='china') select shop_name, customer_id, total_price, sale_date, region from sale_detail;
Example 6: pt_spec in
PARTITION()
must be constants instead of expressions. Sample statement of incorrect usage:insert overwrite table sale_detail_insert partition (sale_date=datepart('2016-09-18 01:10:00', 'yyyy') , region='china') select shop_name, customer_id, total_price from sale_detail;
Example 7: Execute the
INSERT OVERWRITE
statement to overwrite data in themf_src
table and data to themf_zorder_src
table and sort data in themf_zorder_src
table in global Z-Ordering mode. Sample statements:-- Create the mf_src table, insert data into the table, and then query data from the table. create table mf_src (key string, value string); insert overwrite table mf_src select a, b from values ('1', '1'),('3', '3'),('2', '2') as t(a, b); select * from mf_src; -- The following result is returned: +-----+-------+ | key | value | +-----+-------+ | 1 | 1 | | 3 | 3 | | 2 | 2 | +-----+-------+ -- Create the mf_zorder_src table that has the same schema as the mf_src table. create table mf_zorder_src like mf_src; -- Insert data into the mf_zorder_src table, sort data in the table in global Z-Ordering mode, and then query data from the table. set odps.sql.default.zorder.type=global; insert overwrite table mf_zorder_src select key, value from mf_src zorder by key, value; select * from mf_zorder_src; -- The following result is returned: +-----+-------+ | key | value | +-----+-------+ | 1 | 1 | | 2 | 2 | | 3 | 3 | +-----+-------+
Example 8: Execute the
INSERT OVERWRITE
statement to overwrite data in an existing table namedtarget
. Sample statements:-- Overwrite data in an existing table named target. set odps.sql.default.zorder.type=global; insert overwrite table target select key, value from target zorder by key, value;
Examples: Delta tables
Create a Delta table named mf_dt and execute the INSERT
statements to insert data into and overwrite data in the table.
-- Create a Delta table named mf_dt.
create table if not exists mf_dt (pk bigint not null primary key,
val bigint not null)
partitioned by (dd string, hh string)
tblproperties ("transactional"="true");
-- Insert test data into the partition whose partition key columns dd and hh are both 01 in the mf_dt table.
insert overwrite table mf_dt partition (dd='01', hh='01')
values (1, 1), (2, 2), (3, 3);
-- Query data from the destination partition in the mf_dt table.
select * from mf_dt where dd='01' and hh='01';
-- The following result is returned:
+------------+------------+----+----+
| pk | val | dd | hh |
+------------+------------+----+----+
| 1 | 1 | 01 | 01 |
| 3 | 3 | 01 | 01 |
| 2 | 2 | 01 | 01 |
+------------+------------+----+----+
-- Execute the INSERT INTO statement to append data to the destination partition in the mf_dt table.
insert into table mf_dt partition(dd='01', hh='01')
values (3, 30), (4, 4), (5, 5);
select * from mf_dt where dd='01' and hh='01';
-- The following result is returned:
+------------+------------+----+----+
| pk | val | dd | hh |
+------------+------------+----+----+
| 1 | 1 | 01 | 01 |
| 3 | 30 | 01 | 01 |
| 4 | 4 | 01 | 01 |
| 5 | 5 | 01 | 01 |
| 2 | 2 | 01 | 01 |
+------------+------------+----+----+
-- Execute the INSERT OVERWRITE statement to overwrite data in the destination partition of the mf_dt table.
insert overwrite table mf_dt partition (dd='01', hh='01')
values (1, 1), (2, 2), (3, 3);
select * from mf_dt where dd='01' and hh='02';
-- The following result is returned:
+------------+------------+----+----+
| pk | val | dd | hh |
+------------+------------+----+----+
| 1 | 1 | 01 | 01 |
| 3 | 3 | 01 | 01 |
| 2 | 2 | 01 | 01 |
+------------+------------+----+----+
-- Execute the INSERT INTO statement to write data to the partition whose partition key column dd is 01 and partition key column hh is 02 in the mf_dt table.
insert overwrite table mf_dt partition (dd='01', hh='02')
values (1, 11), (2, 22), (3, 32);
select * from mf_dt where dd='01' and hh='02';
-- The following result is returned:
+------------+------------+----+----+
| pk | val | dd | hh |
+------------+------------+----+----+
| 1 | 11 | 01 | 02 |
| 3 | 32 | 01 | 02 |
| 2 | 22 | 01 | 02 |
+------------+------------+----+----+
-- Enable a full table scan only for the current session. Execute the SELECT statement to query data in the mf_dt table.
set odps.sql.allow.fullscan=true;
select * from mf_dt;
-- The following result is returned:
+------------+------------+----+----+
| pk | val | dd | hh |
+------------+------------+----+----+
| 1 | 11 | 01 | 02 |
| 3 | 32 | 01 | 02 |
| 2 | 22 | 01 | 02 |
| 1 | 1 | 01 | 01 |
| 3 | 3 | 01 | 01 |
| 2 | 2 | 01 | 01 |
+------------+------------+----+----+
Best practices
The Z-Ordering feature is not suitable for all business scenarios. MaxCompute does not provide a guideline on whether and how to use the Z-Ordering feature. In most cases, you must determine whether to use the Z-Ordering feature based on your business requirements. You must also comprehensively evaluate whether the additional computing costs that are generated when you sort data by using the Z-Ordering feature is less than the costs that are reduced in terms of storage and downstream consumption and computing. The following descriptions provide suggestions on using the Z-Ordering feature based on experience. You can also provide your suggestions and feedback.
Scenarios in which you need to preferentially use clustered indexing rather than Z-Ordering
If a filter condition is composed of combinations of prefixes, such as a, the combination of a and b, or the combination of a, b, and c, clustered indexing (ORDER BY a, b, c) is more effective than the ZORDER BY clause. The ORDER BY clause provides a better sorting effect on the first field but has small impacts on the remaining fields. The ZORDER BY clause provides the same weight for each field, and its sorting result on a specific field is less effective than the sorting result of the ORDER BY clause on the first column.
If some fields are frequently used as join keys, range clustering or hash clustering is more suitable for the fields. The Z-Ordering feature of MaxCompute can be used to sort data only in a single file. The SQL engine is imperceptible to data that is sorted by using the Z-Ordering feature but is perceptible to data that is sorted by using clustered indexing. You can better optimize the performance of joins by using clustered indexing in the stage of query planning.
If you want to frequently execute the GROUP BY or ORDER BY clause for specific fields, clustered indexing can provide better performance.
Suggestions on using the Z-Ordering feature
Select the fields that are frequently used in filter conditions, especially fields that are frequently joined for filtering.
Make sure that the number of fields for which you execute the ZORDER BY clause does not exceed 4. If you execute the ZORDER BY clause on a large number of fields, the sorting performance for each field becomes poor. If only one field needs to be sorted, we recommend that you use clustered indexing rather than Z-Ordering.
The number of distinct values of the fields that are selected cannot be excessively large or small. For example, the gender field has only two distinct values. In this case, sorting of the values is meaningless. However, if a field has an excessively large number of distinct values, sorting of the values generates high costs because the ZORDER BY clause needs to cache all values of the field in the memory to calculate the Z value.
Make sure that the data amount of the table that you want to sort is not excessively large or small. If the data amount of the table that you want to sort is excessively small, the effect of sorting by using Z-Ordering is not apparent. If the data amount of the table that you want to sort is excessively large, high costs are generated when data is sorted by using Z-Ordering. For example, the output of a baseline task may be delayed.