All Products
Search
Document Center

AnalyticDB:INSERT OVERWRITE SELECT

Last Updated:Nov 04, 2024

This topic describes the INSERT OVERWRITE SELECT statement, which is a high-performance data writing method used in AnalyticDB for MySQL.

How it works

The INSERT OVERWRITE SELECT statement clears existing data in a partition and then batch writes data to the partition.

  • For partitioned tables, the INSERT OVERWRITE SELECT statement overwrites data only in partitions that are involved in the statement.

  • For non-partitioned tables, the INSERT OVERWRITE SELECT statement clears the entire table and batch writes data.

Data writing tasks are executed in series for each table. Only a single writing task can be executed at a time for each table. This setting cannot be modified. To ensure the data writing performance of each task and prevent high cluster loads, two tasks can be executed concurrently for each cluster by default. We recommend that you do not modify this default setting.

Note

If you need to modify the number of concurrent writing tasks, Submit a ticket.

Characteristics and scenarios

The INSERT OVERWRITE SELECT statement has the following characteristics:

  • During a data writing task by using the INSERT OVERWRITE SELECT statement, large amounts of cluster resources are consumed to ensure high performance. We recommend that you use this statement during off-peak hours.

  • The written data is invisible when the data writing task is in progress, and becomes visible only after the task is completed.

  • If a partition is written to AnalyticDB for MySQL by using INSERT OVERWRITE SELECT, data of the existing partition that has the same name is overwritten.

  • When a data writing task is completed, table indexes are created to improve query performance.

INSERT OVERWRITE SELECT is suitable for the following scenarios:

  • Data is written to a table by partition.

  • All data of a table is initialized.

  • Large amounts of data are written to a table.

Precautions

Do not execute the INSERT OVERWRITE SELECT and real-time writing statements (INSERT INTO, REPLACE INTO, DELETE, or UPDATE) on the same table at the same time. Otherwise, the data written in real time is discarded.

Syntax

INSERT OVERWRITE table_name (column_name[,...])
select_statement 

Parameters

  • table_name: the name of the destination table.

  • column_name: the name of the column in the destination table.

  • select_statement: the SELECT statement.

    The data type of each column in the SELECT statement must match that of each column in the destination table.

    If the number of columns in the SELECT statement is larger than the number of columns in the destination table, the data writing fails. If the number of columns in the SELECT statement is smaller than the number of columns in the destination table, the additional columns in the destination table are automatically filled with the default values. If no default value is specified, NULL is used.

Examples

Sample data

Prepare a table and data to test overwriting.

  1. Create the source table named test_source and the destination table named test_target.

    CREATE TABLE test_source (a BIGINT, b BIGINT) 
    DISTRIBUTED BY HASH(a);
    CREATE TABLE test_target (a BIGINT, b BIGINT) 
    DISTRIBUTED BY HASH(a) 
    PARTITION BY VALUE(b) LIFECYCLE 10;
  2. Initialize the test_source source table.

    INSERT INTO test_source VALUES (1,1);  
    INSERT INTO test_source VALUES (1,2); 
    INSERT INTO test_source VALUES (1,3);  
    INSERT INTO test_source VALUES (2,1); 
    INSERT INTO test_source VALUES (2,2); 
    INSERT INTO test_source VALUES (2,3);  

Overwrite

  1. Initialize the test_target destination table.

    INSERT OVERWRITE test_target 
    SELECT * FROM test_source WHERE a = 1;

    Query the test_target table. The following information is returned:

    +-----+------+
    |1    |1     |
    |1    |2     |
    |1    |3     |
    +-----+------+
  2. Write data to the test_target table by overwriting partition 1.

    INSERT OVERWRITE test_target (a,b)
    SELECT a,b FROM test_source 
    WHERE a = 2 AND b = 1;

    Query the test_target table. The following information is returned:

    +-----+------+
    |2    |1     |
    |1    |2     |
    |1    |3     |
    +-----+------+
  3. Write data to the test_target table by overwriting partitions 2 and 3.

    INSERT OVERWRITE test_target
    SELECT * FROM test_source 
    WHERE a = 2 AND b >= 2 AND b <= 3;

    Query the test_target table. The following information is returned:

    +-----+------+
    |2    |1     |
    |2    |2     |
    |2    |3     |
    +-----+------+

Asynchronous writing

Submit a task

Execute the SUBMIT JOB statement to submit an asynchronous task. Sample statement:

SUBMIT JOB
INSERT OVERWRITE adb_table
SELECT * FROM adb_external_table;

Optimize writing performance

Add a hint (/* direct_batch_load=true*/) to the beginning of a data writing statement to accelerate the writing task while saving resources. Sample statement:

/* direct_batch_load=true*/
SUBMIT JOB
INSERT OVERWRITE adb_table
SELECT * FROM adb_external_table;
Note

The /* direct_batch_load=true*/ hint is supported only for AnalyticDB for MySQL V3.1.5 and later. If performance is not improved, Submit a ticket. For information about how to query the minor version of a cluster, see How do I view the minor version of a cluster?

Query the progress

After you submit a data writing task by using SUBMIT JOB, a job ID is returned. Use the job ID to query the status of the writing task. Sample statement:

SHOW JOB STATUS WHERE job='<job_id>';

If the writing task is completed, SUCCEEDED is displayed in the status column.