MaxCompute allows you to execute the INSERT INTO or INSERT OVERWRITE statement to insert or update data into a table or a static partition.

Prerequisites

Before you execute the INSERT INTO or INSERT OVERWRITE statement, make sure that you are granted the Alter permission on the destination table and the Describe permission on the metadata of the source table. For more information, see Permissions.

Description

When you use MaxCompute SQL to process data, you can execute the INSERT INTO or INSERT OVERWRITE statement to save the execution results of the SELECT statements to the destination table. Differences between the two statements:
  • INSERT INTO: inserts data into a table or a static partition of a table. You can specify the values of partition key columns in this statement to insert data into a specified partition. If you want to insert a small amount of test data, you can use this statement together with VALUES.
  • INSERT OVERWRITE: clears a specified table and inserts data into the table or the static partitions of the table.
    Note
    • The INSERT syntax in MaxCompute is different from that in MySQL or Oracle. You must add the TABLE keyword and table_name to INSERT OVERWRITE. You do not need to add the TABLE keyword to INSERT INTO.
    • If you execute the INSERT OVERWRITE statement on the same partition several times, the size of the partition into which data is inserted may be different each time you execute the DESC statement. This is because the logic to split files changes after you execute the SELECT and INSERT OVERWRITE statements in sequence for the same partition in a table. After you execute the INSERT OVERWRITE statement, the total length of data remains the same. This does not affect the storage fees.

For more information about how to insert data into a dynamic partition, see Insert or overwrite data into dynamic partitions (DYNAMIC PARTITION).

Limits

When you execute the INSERT INTO statement or the INSERT OVERWRITE statement to insert or update data into a table or a static partition of a table, take note of the following limits:
  • INSERT INTO: This statement cannot be used to insert data into a clustered table.
  • INSERT OVERWRITE: This statement does not allow you to specify the columns into which you want to insert data. You can execute the INSERT INTO statement to specify the columns. For example, if you execute create table t(a string, b string); insert into t(a) values ('1');, 1 is inserted into Column a, and NULL or the default value is inserted into Column b.
  • MaxCompute does not provide the locking mechanism for the tables for which INSERT operations are being performed. We recommend that you do not execute the INSERT INTO statement or the INSERT OVERWRITE statement for a table at the same time.

Syntax

insert {into|overwrite} table <table_name> [partition (<pt_spec>)] [(<col_name> [,<col_name> ...)]]
<select_statement>
from <from_statement>
[zorder by <zcol_name> [, <zcol_name> ...]];
  • table_name: required. The name of the table into which you want to insert data.
  • pt_spec: optional. The partition into which you want to insert data. Only constants are allowed. Expressions, such as functions, are not allowed. The value of this parameter is in the (partition_col1 = partition_col_value1, partition_col2 = partition_col_value2, ...) format.
  • col_name: optional. The name of the column in the table into which you want to insert data. INSERT OVERWRITE does not allow you to specify [(<col_name> [,<col_name> ...)].
  • select_statement: required. The SELECT clause that is used to query the data that you want to insert into the destination table from the source table. For more information about SELECT clauses, see SELECT syntax.
    Note
    • The mappings between the source and destination tables are based on the column sequence in select_statement, instead of the mappings between column names in the tables.
    • If the destination table has static partitions and you want to insert data into a static partition, partition key columns cannot be included in select_statement.
  • from_statement: required. The FROM clause. This clause indicates the data source. For example, you can specify the name of a source table in this clause.
  • zorder by <zcol_name> [, <zcol_name> ...]: optional. If you write data to a table or a partition, you can use this clause to co-locate rows with similar data records based on the columns specified in select_statement. This improves filtering performance for queries and reduces storage costs. The ORDER BY x, y clause sorts data records based on the sequence of x coming before y. The ZORDER BY x, y clause co-locates rows with similar x values and rows with similar y values. For column-based data filtering and sorting in an SQL SELECT statement, the ORDER BY clause filters and sorts data based on x, whereas the ZORDER BY clause filters and sorts data based on x or based on both x and y. This increases the column compression ratio.
    When you use ZORDER BY, take note of the following limits:
    • If the destination table is a clustered table, the ZORDER BY clause is not supported.
    • ZORDER BY can be used together with DISTRIBUTE BY but cannot be used together with ORDER BY, CLUSTER BY, or SORT BY.
    Note If you use the ZORDER BY clause to write data, more resources and time are consumed.

Examples

  • Example 1: Execute the INSERT INTO statement to append data to a non-partitioned table named websites. Sample statements:
    -- Create a non-partitioned table named websites. 
    create table if not exists websites
    (id int,
     name string,
     url string
    );
    -- Create a non-partitioned table named apps.
    create table if not exists apps
    (id int,
     app_name string,
     url string
    );
    -- Append data to the apps table. The abbreviated form of INSERT INTO TABLE table_name is INSERT INTO table_name.
    insert into apps (id,app_name,url) values 
    (1,'Aliyun','https://www.aliyun.com');
    -- Copy data from the apps table and append the data to the websites table.
    insert into websites (id,name,url) select id,app_name,url
    from  apps;
    -- Execute the SELECT statement to view data in the websites table. 
    select * from websites;
    -- The following result is returned: 
    +------------+------------+------------+
    | id         | name       | url        |
    +------------+------------+------------+
    | 1          | Aliyun     | https://www.aliyun.com |
    +------------+------------+------------+
  • Example 2: Execute the INSERT INTO statement to append data to a partitioned table named sale_detail. Sample statements:
    -- Create a partitioned table named sale_detail. 
    create table if not exists sale_detail
    (
    shop_name     string,
    customer_id   string,
    total_price   double
    )
    partitioned by (sale_date string, region string);
    
    -- Add partitions to the sale_detail table. 
    alter table sale_detail add partition (sale_date='2013', region='china');
    
    -- Insert data into the sale_detail table. The abbreviated form of INSERT INTO TABLE table_name is INSERT INTO table_name. The TABLE keyword in INSERT OVERWRITE TABLE table_name cannot be omitted. 
    insert into sale_detail partition (sale_date='2013', region='china') values ('s1','c1',100.1),('s2','c2',100.2),('s3','c3',100.3);
    
    -- Enable a full table scan only for the current session. Execute the SELECT statement to view data in the sale_detail table. 
    set odps.sql.allow.fullscan=true; 
    select * from sale_detail;
    
    -- The following result is returned: 
    +------------+-------------+-------------+------------+------------+
    | shop_name  | customer_id | total_price | sale_date  | region     |
    +------------+-------------+-------------+------------+------------+
    | s1         | c1          | 100.1       | 2013       | china      |
    | s2         | c2          | 100.2       | 2013       | china      |
    | s3         | c3          | 100.3       | 2013       | china      |
    +------------+-------------+-------------+------------+------------+
  • Example 3: Execute the INSERT OVERWRITE statement to update the data in the sale_detail_insert table. Sample statement:
    -- Create the sale_detail_insert table that has the same schema as the sale_detail table. 
    create table sale_detail_insert like sale_detail;
    
    -- Add partitions to the sale_detail_insert table. 
    alter table sale_detail_insert add partition (sale_date='2013', region='china');
    
    -- Extract data from the sale_detail table and insert the data into the sale_detail_insert table. Names of partition key columns in the sale_detail_insert table do not need to be declared and cannot be rearranged. 
    -- If the sale_detail_insert table contains static partitions, the values of partition key columns are declared in PARTITION(). These values do not need to be included in select_statement. You need to search for column names based only on the sequence of common columns in the sale_detail_insert table and sequentially map the declared column values to the columns in the sale_detail_insert table. If the sale_detail_insert table contains dynamic partitions, the names of partition key columns must be included in select_statement. For more information, see Insert or overwrite data into dynamic partitions (DYNAMIC PARTITION). 
    insert overwrite table sale_detail_insert partition (sale_date='2013', region='china')
      select 
      shop_name, 
      customer_id,
      total_price 
      from sale_detail
      zorder by customer_id, total_price;
    
    -- Enable a full table scan only for the current session. Execute the SELECT statement to view the data in the sale_detail_insert table. 
    set odps.sql.allow.fullscan=true;
    select * from sale_detail_insert;
    
    -- The following result is returned: 
    +------------+-------------+-------------+------------+------------+
    | shop_name  | customer_id | total_price | sale_date  | region     |
    +------------+-------------+-------------+------------+------------+
    | s1         | c1          | 100.1       | 2013       | china      |
    | s2         | c2          | 100.2       | 2013       | china      |
    | s3         | c3          | 100.3       | 2013       | china      |
    +------------+-------------+-------------+------------+------------+
  • Example 4: Execute the INSERT OVERWRITE statement to update the data in the sale_detail_insert table and adjust the sequence of columns in select_statement. The mappings between the source and destination tables are based on the sequence of columns in select_statement, instead of the mappings between column names in the two tables. Sample statement:
    insert overwrite table sale_detail_insert partition (sale_date='2013', region='china')
        select customer_id, shop_name, total_price from sale_detail;    
    select * from sale_detail_insert;                  
    The following code shows a sample query result:
    +------------+-------------+-------------+------------+------------+
    | shop_name  | customer_id | total_price | sale_date  | region     |
    +------------+-------------+-------------+------------+------------+
    | c1         | s1          | 100.1       | 2013       | china      |
    | c2         | s2          | 100.2       | 2013       | china      |
    | c3         | s3          | 100.3       | 2013       | china      |
    +------------+-------------+-------------+------------+------------+

    When you create the sale_detail_insert table, the column sequence is defined as shop_name string, customer_id string, and then total_price bigint. However, data is inserted from the sale_detail table into the sale_detail_insert table based on the sequence of customer_id, shop_name, and then total_price. As a result, the data in the sale_detail.customer_id column is inserted into the sale_detail_insert.shop_name column, and the data in the sale_detail.shop_name column is inserted into the sale_detail_insert.customer_id column.

  • Example 5: If you insert data into a partition, the partition key columns of the partition cannot be included in select_statement. After the following statement is executed, an error is returned. This is because sale_date and region are partition key columns. These columns cannot be included in select_statement if INSERT OVERWRITE or INSERT INTO statement is used to insert or update data into a static partition. Sample statement of incorrect usage:
    insert overwrite table sale_detail_insert partition (sale_date='2013', region='china')
       select shop_name, customer_id, total_price, sale_date, region from sale_detail;
  • Example 6: pt_spec in PARTITION() must be constants instead of expressions. Sample statement of incorrect usage:
    insert overwrite table sale_detail_insert partition (sale_date=datepart('2016-09-18 01:10:00', 'yyyy') , region='china')
       select shop_name, customer_id, total_price from sale_detail;