This topic describes INSERT OVERWRITE SELECT, a high-performance method for writing data to tables in AnalyticDB for MySQL. It covers use cases, how it works, precautions, syntax, and asynchronous writes.
How it works
The INSERT OVERWRITE SELECT statement first clears all existing data from a partition and then writes new data to it in batches.
If the target table is a partitioned table,
INSERT OVERWRITE SELECToverwrites only the partitions to which the new data belongs. Other partitions are not affected.If the target table is a non-partitioned table,
INSERT OVERWRITE SELECTclears all existing data in the table and then writes the new data in batches.
Write jobs for a single table are executed sequentially, which means the write concurrency for a single table is 1 and cannot be changed. The default write concurrency for a cluster is 2 to ensure single-job performance and prevent excessive cluster load. Avoid adjusting this value because it may impact performance and increase the load on the cluster.
To adjust the write concurrency, Submit a ticket to technical support for an evaluation.
Overview
The features of INSERT OVERWRITE SELECT are as follows:
High resource consumption:
INSERT OVERWRITE SELECTis a resource-intensive operation. Perform this operation during off-peak hours to avoid impacting cluster performance.Batch visibility: Data written by the job is not visible until the job completes, at which point all the data becomes visible at once.
Partition overwrite:
INSERT OVERWRITE SELECToverwrites data in the corresponding partitions of the target table.Automatic index building: The system builds the index synchronously when data is written. After the write job is complete, the target table has an index, which can improve query performance.
Common use cases for INSERT OVERWRITE SELECT are as follows:
Partition-level data writes.
Data initialization (full data write).
Large-scale bulk data writes. This method is not recommended for small data volumes.
Precautions
Do not use INSERT OVERWRITE SELECT and real-time write methods (such as INSERT INTO, REPLACE INTO, DELETE, and UPDATE) to write to the same table concurrently. Otherwise, data from the real-time writes will be lost.
Syntax
INSERT OVERWRITE table_name (column_name[,...])
select_statementParameters
table_name: The name of the target table.column_name: The name of a column in the target table.select_statement: The SELECT statement.The data type of each column in the SELECT statement must match the data type of the corresponding column in the target table.
If the number of columns in the SELECT statement is greater than that in the target table, the write operation fails. If the number of columns in the SELECT statement is less than that in the target table, the remaining columns in the target table are populated with their default values. If a column has no default value, the system populates it with NULL.
Examples
Sample data
Prepare test tables and data for the overwrite examples.
Create a source table named
test_sourceand a target table namedtest_target.CREATE TABLE test_source (a BIGINT, b BIGINT) DISTRIBUTED BY HASH(a);CREATE TABLE test_target (a BIGINT, b BIGINT) DISTRIBUTED BY HASH(a) PARTITION BY VALUE(b) LIFECYCLE 10;Write test data into the source table
test_source.INSERT INTO test_source VALUES (1,1); INSERT INTO test_source VALUES (1,2); INSERT INTO test_source VALUES (1,3); INSERT INTO test_source VALUES (2,1); INSERT INTO test_source VALUES (2,2); INSERT INTO test_source VALUES (2,3);
Overwrite data
Insert initial data into the
test_targettable.INSERT OVERWRITE test_target SELECT * FROM test_source WHERE a = 1;Query the
test_targettable. The query returns the following result.+-----+------+ |1 |1 | |1 |2 | |1 |3 | +-----+------+Overwrite the partition where b=1 in the
test_targettable.INSERT OVERWRITE test_target (a,b) SELECT a,b FROM test_source WHERE a = 2 AND b = 1;After the overwrite, querying
test_targetreturns:+-----+------+ |2 |1 | |1 |2 | |1 |3 | +-----+------+Overwrite the partitions where b=2 and b=3 in the
test_targettable.INSERT OVERWRITE test_target SELECT * FROM test_source WHERE a = 2 AND b >= 2 AND b <= 3;After the overwrite, querying
test_targetreturns:+-----+------+ |2 |1 | |2 |2 | |2 |3 | +-----+------+
Asynchronous writes
Submit a job
SUBMIT JOB submits an asynchronous job that is scheduled and run in the background. For example:
SUBMIT JOB
INSERT OVERWRITE adb_table
SELECT * FROM adb_external_table;Optimize write performance
Adding the hint /* direct_batch_load=true */ before a write operation can accelerate the write task. This hint significantly improves write performance while consuming fewer resources. For example:
/* direct_batch_load=true*/
SUBMIT JOB
INSERT OVERWRITE adb_table
SELECT * FROM adb_external_table;/* direct_batch_load=true*/ is supported only in kernel versions 3.1.5 and later. If no significant performance improvement is observed after enabling it, Submit a ticket for assistance. To check your cluster's kernel version, see View instance version information.
Query job progress
After you submit a job with SUBMIT JOB, the system returns a job_id. You can use this job_id to query the job status. For example:
SHOW JOB STATUS WHERE job='<job_id>';If the status column in the result shows SUCCEEDED, the job is complete.