Data in specific columns in Delta tables can be updated. If you want to insert data into a table or update the data of a table, you need to only specify the columns that you want to modify and the new data in an SQL statement. This way, the operation does not need to be performed on all columns of the table. During data insertion, columns without values are automatically filled with NULL. During data update, columns without values remain unchanged. This helps improve operation efficiency and reduce storage costs in most scenarios. This topic provides examples on how to insert or update data in specific columns of a Delta table.
For example, some typical data warehouse business involves some data tables that use the star schema and the primary keys of the tables are the same. If all tables must be merged into a large wide table based on the primary keys for subsequent calculation and statistics, data from different tables can be separately written to their related columns in parallel. When the system reads the large wide table, the system merges the data into complete data rows for output. Compared with the method of writing data to all columns, the method of updating data in specific columns can significantly improve the write and read performance and reduce storage costs.
Precautions
The table type must be a Delta table.
You can update data in specific columns of partitioned and non-partitioned tables.
Before you update data in specific columns of a table, you must configure
SET odps.sql.upsertable.table.enable=true;
to support updates of specific columns. The default value is true.
Sample SQL statements
Sample code
Create a Delta table named delta_target
.
CREATE TABLE delta_target
(
key BIGINT NOT NULL PRIMARY KEY,
b STRING,
c BIGINT
)
TBLPROPERTIES ("acid.partial.fields.update.enable" = "true","transactional" = "true")
;
INSERT OVERWRITE/INSERT INTO
The sample code shows how to use the INSERT OVERWRITE or INSERT INTO statement to insert data into specific columns of a Delta table.
The default column is Column
c
. NULL is automatically filled in Columnc
.INSERT INTO TABLE delta_target(key, b) VALUES(1, '1'); SELECT * FROM delta_target;
The following result is returned:
+------------+------------+------------+ | key | b | c | +------------+------------+------------+ | 1 | 1 | NULL | +------------+------------+------------+
The default column is Column
b
. The operation of inserting data that has the same primary key is equivalent to the operation of updating the current data.INSERT INTO TABLE delta_target(key, c) VALUES(1, 1); SELECT * FROM delta_target;
The following result is returned:
+------------+------------+------------+ | key | b | c | +------------+------------+------------+ | 1 | 1 | 1 | +------------+------------+------------+
The default column is Column
c
. The operation of inserting data that has the same primary key is equivalent to the operation of updating the current data.INSERT INTO TABLE delta_target(key, b) VALUES(1, '11'); SELECT * FROM delta_target;
The following result is returned:
+------------+------------+------------+ | key | b | c | +------------+------------+------------+ | 1 | 11 | 1 | +------------+------------+------------+
No default column is provided. The inserted data has a different primary key. In this case, a new piece of data is added.
INSERT INTO TABLE delta_target VALUES(2, '2', 2); SELECT * FROM delta_target;
The following result is returned:
+------------+------------+------------+ | key | b | c | +------------+------------+------------+ | 2 | 2 | 2 | | 1 | 11 | 1 | +------------+------------+------------+
For more information about INSERT OVERWRITE and INSERT INTO, see Insert data into or overwrite data in a table or a static partition (INSERT INTO and INSERT OVERWRITE).
UPDATE or DELETE
The following sample code shows how to use the UPDATE or DELETE statement to update data in specific columns based on the examples of INSERT OVERWRITE/INSERT INTO.
The default column is Column
c
. The data rows in which the value of key is 1 are updated.UPDATE delta_target SET b='111' WHERE key=1; SELECT * FROM delta_target;
The following result is returned:
+------------+------------+------------+ | key | b | c | +------------+------------+------------+ | 2 | 2 | 2 | | 1 | 111 | 1 | +------------+------------+------------+
The default column is Column
c
. The data rows in which the value of key is 2 are updated.UPDATE delta_target SET b='222' WHERE key=2; SELECT * FROM delta_target;
The following result is returned:
+------------+------------+------------+ | key | b | c | +------------+------------+------------+ | 2 | 222 | 2 | | 1 | 111 | 1 | +------------+------------+------------+
For more information about UPDATE and DELETE, see UPDATE and DELETE.
MERGE INTO
The following sample code shows how to use the MERGE INTO statement to merge data based on the examples of UPDATE and DELETE.
-- Create the acid2_dml_pu_source table.
CREATE TABLE acid2_dml_pu_source AS
SELECT
key,b,c
FROM VALUES
(1,'10',10),
(2,'20',20),
(3,'30',30),
(4,'40',40),
(5,'50',50),
(6,'60',60) t (key,b,c);
-- Perform a merge operation.
MERGE INTO delta_target AS t USING acid2_dml_pu_source AS s ON s.key = t.key WHEN matched THEN UPDATE SET t.b = s.b WHEN NOT matched THEN INSERT (key, b) VALUES(s.key, s.b);
The following result is returned:
+------------+------------+------------+
| key | b | c |
+------------+------------+------------+
| 3 | 30 | NULL |
| 4 | 40 | NULL |
| 5 | 50 | NULL |
| 6 | 60 | NULL |
| 2 | 20 | 2 |
| 1 | 10 | 1 |
+------------+------------+------------+
For more information about MERGE INTO, see MERGE INTO.