By Bineli Manga, Alibaba Cloud Community Blog author.
In any of the SQL languages, the command MERGE
is an important statement, as it allows you to insert or update data in a table or database. It also allows you to avoid making too many requests, adapting the usage of one request to add or another to update existing data. For this reason, this command is sometimes referred to as UPSERT
.
In this article, we will look into what is a MERGE
statement, why does it exist, and how you can efficiently use the it for your data projects. Also, throughout this article, we will learn more about MERGE
statement through answering a series of questions.
To answer this question, we'll need to consider the following situation:
When starting any data project, you have to first collect, clean and structure your data. To do so, you will need to take data to and from many different tables to build a Data lake or a Data Warehouse. This is where the MERGE
statement comes to play.
You can use the command MERGE
to merge data coming from diverse sources before manipulating this data. Then, after that's done, you'll be able to further apply Machine learning algorithms to the data, which, of course, will enable you to learn valuable insights from your data. So, with that said. merging data through using a smart SQL statement like MERGE
is a important thing, and even a valuable skill, for any Data engineer or developer.
Luckily, through this article, you'll learn how to leverage the MERGE
statement to merge your data for a more efficient data manipulation.
The command MERGE
is an important SQL statement command because It allows you to solve the recurring problem of blind data insert in databases. This issue refers to a data insert that is done regardless of the existence of previous data. This command allows you to prevent the need for going to a higher level language like PL/SQL, for example, allowing you to be able to write SQL directly.
The update and insertion inside the target table is constrained by the clause ON
. For any entry inside the target table that verifies the constraint, the corresponding entry in the source table is modified (through the UPDATE
operation). The data in the target table that doesn't verify the constraint start an insertion (INSERT
) in the target table, based on the entries of the table.
Nowadays, the MERGE
statement has been added to the standard SQL:2003, but not every Database Management System (DBMS) also include the upsert
feature in the MERGE
statement.
To be able to use the UPSERT
function, you have to have or complete the following things:
UPDATE
and INSERT
privileges on the target table and the privilege SELECT
on the source table. Next, to specify aDELETE
clause, which you'll need to do, you also need the privileges for the clause_update_merge
clause and the object privilege DELETE
on the target table.UPDATE
statement that allows you to make changes on existing entries.INSERT
statement that allows you to insert one entry in the existing table or many line in a single stepAlso, you'll need all of these things if you'd like to dangle in the examples given below.
To learn more about about the MERGE
statement, you can check out Oracle's Database SQL Reference page on MERGE.
Below is the basic structure you can use to apply the MERGE
statement:
MERGE [indice]
INTO [schéma.] {table | voir} [t_alias]
USAGE [schéma.] {table | voir | sous-requête}
[t_alias]
ON (condition)
[update_clause_merge]
[insert_clause_merge]
[logging_clause_error];
Now consider these:
update_clause_merge
: used when the MERGE
corresponds to an UPDATE
.
UPDATE SET column = {expr | DEFAULT }
[, column = {expr | DEFAUT }] ...
[where_clause]
[DELETE where_clause]
inset_clause_merge
: when the MERGE
corresponds to an INSERT
.
INSERT [(column [, colonne] ...)]
values ({expr [, expr] ... | DEFAULT })
[where_clause]
logging_clause_error
: this one is used to log execution errors.
[Table INTO [schema.]]
[(simple_expression)]
[REJECT LIMIT {entier | ILLIMITÉ }]
Use the clause INTO
to specify the target table in which you want to update or insert data. To merge the data in a view, that view must be able to be updated. You can take a look at "Views that can be updated" section to get more details.
The clause USING
is used to specify the source of data that needs to be inserted or updated. The source can be a table, a view, or the result of a request.
We use the clause ON
to specify the condition on which the MERGE
operation to update or insert data will be applied. For every entry of the target table, for which the research condition is true, the Oracle database updates the target entry with the corresponding data of the source table. If the condition is not true for any line, the database is inserted in the target table based on the corresponding entry in the source table.
The clause clause_update_merge
specifies the new values of the target column in the target table. Oracle's Database management system makes this update if the condition of the ON
clause is true. If the update clause is ran, all the update hooks defined in the table will be activated.
You have to specify the where_clause
if you want the database to run the update operation, but this can only be the case if the specified condition is true. The condition can have a reference to the source of data or to the target table. If the condition is not true, the database ignores the update operation on the merge of the entries of the table.
You have to specify where_clause DELETE where_clause
to clean the data of a table when the insertion or the update. The entries affected by this clause are the lines of the destination table updated by the merge operation. In other words, DELETE WHERE
evaluates the value updated but not the original value evaluated by the condition UPDATE SET ... WHERE
. The entry of the destination table satisfies the condition DELETE
, but is not included in the joint defined by the clause ON
, the entry is not deleted. All the deletion hooks defined on the target table will be activated for every deletion of an entry.
You can specify this clause single or with the clause clause_insert_merge
. If you specify both, they can be in one or another order.
Remarks:
ON
.DEFAULT
when updating a view.The clause insert_clause_merge
specifies the values to insert in the column of the target table if the condition of the ON
clause is false. If the insert clause is run, all the insertion hooks defined on the table will be activated. If you omit the list of columns after the key word INSERT
, the number of columns of the target table must correspond to the number of values of the VALUES
clause.
In order to insert all the entries of the source table, you can use the constant filter predicate in the clause ON
. An example of constant filter predicate is ON
(0=1). Oracle database recognizes such a predicate and makes an unconditional insertion of all the source table entries. This approach is different from the omission of the clause_update_merge
. In this case, the database must always make the joint. With the constant filter predicate, no joint is made.
You have to specify the clause where_clause
if you want the Oracle database to run the operation only if the condition specified is true. The condition cannot be referenced to the source table. Oracle database ignores the insert operation for all the entries for which the condition is not true.*
You can specify this clause alone or with the clause clause_update_merge
. If you specified both, they can be in any order.
Remarks:
DEFAULT
when updating a view.The clause logging_clause_error
has the same behavior in the statement MERGE
as in a statement INSERT
. You have to report to the statement INSERT
logging_clause_error
for more details.
The database management systems can implement this functionality maybe in a standard manner, using a synonymous statement or using non-standard syntax. The options are listed below:
UPSERT
: Microsoft SQL Azure and MongoDBTake the following example where the data table HR_RECORDS
will be merged with the table EMPLOYEES
.
MERGE INTO employees e
USING hr_records h
ON (e.id = h.emp_id)
WHEN MATCHED THEN
UPDATE SET e.address = h.address
WHEN NOT MATCHED THEN
INSERT (id, address)
VALUES (h.emp_id, h.address);
The source of the data can also be a request, as shown in the following example:
MERGE INTO employees e
USING (SELECT * FROM hr_records WHERE start_date > ADD_MONTHS(SYSDATE, -1)) h
ON (e.id = h.emp_id)
WHEN MATCHED THEN
UPDATE SET e.address = h.address
WHEN NOT MATCHED THEN
INSERT (id, address)
VALUES (h.emp_id, h.address);
Throughout this article, you have learned what the MERGE
statement is, why it is used, and you have also learn about its syntax, have applied it to an example. I hope that this article is useful for those using Alibaba Cloud's big data services. The goal of this article was to shows how you can update your data in a table with one single step, avoiding the headaches you could have by trying to update data manually or by writing so many requests. Hopefully, this article has also showed you how you can optimize your data warehouse before running Alibaba's big data and machine learning algorithms.
Alibaba Shines as Center of Innovation with Research Presented at SIGCOMM 2019
2,599 posts | 762 followers
FollowApache Flink Community China - February 19, 2021
Apache Flink Community China - June 15, 2021
Alibaba EMR - May 16, 2022
Hologres - June 30, 2021
Alibaba Cloud MaxCompute - June 25, 2024
Alibaba Cloud MaxCompute - April 25, 2019
2,599 posts | 762 followers
FollowA platform that provides enterprise-level data modeling services based on machine learning algorithms to quickly meet your needs for data-driven operations.
Learn MoreAlibaba Cloud provides big data consulting services to help enterprises leverage advanced data technology.
Learn MoreAlibaba Cloud experts provide retailers with a lightweight and customized big data consulting service to help you assess your big data maturity and plan your big data journey.
Learn MoreAccelerate AI-driven business and AI model training and inference with Alibaba Cloud GPU technology
Learn MoreMore Posts by Alibaba Clouder