How Does MySQL Implement Concurrency Control? (Part II)

This article introduces the concurrent access control features of MySQL database systems, specifically focusing on page-level and row-level concurrent access control.

By Guoqing Ma (Yiyun)

Preface

When you get started to learn about databases, you will always be asked a question: What is the biggest advantage of database systems over file systems?. There are many specific advantages, and one of the most important parts is that the database system can perform better concurrent access control.

So, how does the database system perform concurrent access control?

This series of articles takes MySQL 8.0.35 code as an example, and is divided into two parts to provide an overall introduction to concurrent access control in MySQL. The previous article mainly introduces table-level concurrent access control, and this article is part two, which will focus on page-level and row-level concurrent access control.

Page-level Concurrent Access Control

Basic Structure of B+tree

The InnoDB engine stores data through B+tree. There are many articles on the Internet about B+tree. You can search and learn by yourself. Here is a brief introduction to the basic structure of B+tree for subsequent description.

The preceding figure shows a typical three-layer B+tree, in which:

The top layer is the root node (ROOT), and each B+tree has only one root node.
The bottom layer is the leaf node (LEAF), which is also the node that actually stores data.
The middle layer is the non-leaf node (the root node is also a non-leaf node), which stores index data. Depending on the size of the B+tree itself, there may be zero to multiple middle layers.

As can be seen from the above figure, there is a one-way pointer between layers (from the upper layer to the lower layer), and there is a two-way pointer between different nodes at the same layer. Each node in the B+tree is a data page (Page), which is also the smallest unit for reading and writing data in InnoDB. The default page size in InnoDB is 16KB.

For InnoDB tables, we often hear a concept called "index organization tree". From the author's perspective, each index of each InnoDB table is a B+tree and the data is stored on the B+tree. The concepts of more indexes in InnoDB, including primary key indexes, secondary indexes, clustered indexes, and covering indexes, are not the focus of this article, so they are not elaborated here. Interested readers can query and learn by themselves.

Locking Process of B+tree

Page-level concurrent access control is mainly realized through locks on index and page, which is actually the locking process of B+tree. Before introducing the locking process, let's first look at the access path of B+tree based on the data.

The primary key information (ID) is added to the previous B+tree structure. Assuming that the data to be accessed now is a row with ID = 400, the access path of B+tree is shown in the above figure. As you can see, first access the root node, then find the non-leaf node at the next layer according to the primary key, and then continue to find the corresponding leaf nodes to read the data.

In fact, the locking process of B+tree is carried out according to the above access path. Taking the above query as an example, the locking process of B+tree is shown in the following figure:

The specific steps are as follows:

Add the S lock on the index.
Add the S lock on the root node.
Add the S lock on the non-leaf node.
Add the S lock on the leaf node.
Release the S locks on the index and all non-leaf nodes.

Similarly, if it is an optimistic update on the page (or an insert in the page), the locking process of B+tree is shown in the following figure:

The specific steps are as follows:

Add the S lock on the index.
Add the S lock on the root node.
Add the S lock on the non-leaf node.
Add the X lock on the leaf node.
Release the S locks on the index and all non-leaf nodes.

It can be seen that if it is an in-page modification, the locking logic is actually similar to the locking of the read process, except that the type of locking on the leaf node is different.

SMO Issues

The above section introduces the query process and the locking process of B+tree when an in-page modification occurs. If the updated data cannot be completed within the page, or the modification will cause a change of the B+tree structure (SMO, Structure Modify Operation), how should the locking be carried out?

When InnDB performs a data update operation, it first tries to use the optimistic update (MODIFY LEAF). If the optimistic update fails, it enters the logic of the pessimistic update (MODIFY TREE). The following figure shows the locking process of the pessimistic update.

The specific steps are as follows:

Add the SX lock on the index.
The root node is not locked.
Non-leaf nodes are not locked, but all passed nodes are searched.
Add the X lock on non-leaf nodes that may be modified and add the SX lock on the root node.
Add the X lock on leaf nodes, including front and back leaf nodes.

It can be seen that, unlike the previous process, when entering the pessimistic update logic, the SX lock will be directly added to the index (in the version earlier than 5.7, the X lock was directly added, the 5.7 version introduced the SX lock, the SX lock and the S lock are not mutually exclusive, so it can be read at this time). Therefore, in the subsequent B+tree traversal process, only the nodes passed by the index will be collected first and there is no direct locking. Only when passing the leaf node to be modified will it determine which non-leaf nodes may also be modified, thus adding the X lock.

Therefore, during the entire SMO period, except for leaf nodes and non-leaf nodes that may be modified, X locks are not added to other nodes (index and root nodes are added with SX locks), and read operations on non-modified nodes can be performed normally. However, there can only be one SMO on a B+tree at the same time.

B+tree Locking Process in a Write Process

The whole B+tree locking process is more complicated. Here, the main code stack is explained through the insert process on a primary key:

|--> row_ins_clust_index_entry
|    |--> row_ins_clust_index_entry_low(..., BTR_MODIFY_LEAF, ...) // Optimistic
|    |    |--> pcur.open(index, ...)
|    |    |    |--> btr_cur_search_to_nth_level // Traverse B+tree
|    |    |    |    |--> // switch (latch_mode)
|    |    |    |    |--> // default
|    |    |    |    |--> mtr_s_lock(dict_index_get_lock(index), ...) // Add the S lock on the index
|    |    |    |    |--> btr_cur_latch_for_root_leaf
|    |    |    |    |
|    |    |    |    |--> // search_loop
|    |    |    |    |--> // retry_page_get
|    |    |    |    |--> buf_page_get_gen(..., rw_latch, ...)
|    |    |    |    |    |--> mtr_add_page // Lock the page by type
|    |
|    |--> row_ins_clust_index_entry_low(..., BTR_MODIFY_TREE, ...) // Pessimistic
|    |    |--> pcur.open(index, ...)
|    |    |    |--> btr_cur_search_to_nth_level // Traverse B+tree
|    |    |    |    |--> // switch (latch_mode)
|    |    |    |    |--> // BTR_MODIFY_TREE
|    |    |    |    |--> mtr_sx_lock(dict_index_get_lock(index), ...) // Add the SX lock on the index
|    |    |    |    |--> btr_cur_latch_for_root_leaf
|    |    |    |    |
|    |    |    |    |--> // search_loop
|    |    |    |    |--> // retry_page_get
|    |    |    |    |--> buf_page_get_gen(..., rw_latch, ...)
|    |    |    |    |    |--> mtr_add_page // Lock the page by type

The above section is only an introduction to the B+tree locking process. The detailed locking logic can be developed by itself through the above entry, and will not be further elaborated here.

Summary of B+tree Locking Process

The above content is the B+tree locking process. Let's make a summary:

Page-level concurrent access control occurs in the B+tree traversal process, that is, the B+tree locking process.
Locked objects include index and page.
The types of locks include S, SX, and X, where the S lock and the SX lock are not mutually exclusive.
The query process only adds the S lock.
During the modification, the locking process varies based on the type of modification. If the data on the page is modified, the optimistic update logic is applied, and only the modified leaf nodes are added with X locks. If it is the pessimistic update logic, index and root nodes need to be added with SX locks, and nodes whose indexes may be modified need to be added with X locks.

Row-level Concurrent Access Control

An Interesting Deadlock Problem

Before introducing row-level concurrent access control, let's first look at an interesting example:

CREATE TABLE `t1` (
  `id` int NOT NULL,
  `c1` int DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

INSERT INTO t1 VALUES (1, 10);
INSERT INTO t1 VALUES (2, 20);
INSERT INTO t1 VALUES (3, 30);

In the preceding example, the transaction isolation level is RC (READ-COMMITTED) by default. Session 1 starts a transaction, then inserts a row of data, and does not commit it. Session 2 and session 3 subsequently insert the same data, but both are blocked. Finally, if session 1 performs a rollback operation, what happens to session 2 and session 3 respectively?

Readers who know about the principle of MySQL row locks may give the following answer: session 2 inserts successfully and session 3 reports an error with the type 'Duplicate key'. So is that really the case?

The author directly posts the execution results on MySQL 8.0.35 here.

[session 2] Execution Result:

[session 3] Execution Result:

As you can see, session 2 does insert successfully, and session 3 also reports an error, but this error seems to be not in line with our general understanding. Why is the Deadlock found, and where does the deadlock come from?

To figure out this problem, the author turned off the deadlock detection logic on MySQL (innodb_deadlock_detect was set to OFF), and then tried the above operation again. The results show that session 2 and session 3 are indeed stuck. View the data_locks table under the performance_schema mentioned earlier:

According to the lock wait relationship in the data_locks table, it is found that both session 2 (THREAD_ID=69) and session 3 (THREAD_ID=70) are waiting for the intention lock. The implied semantics is that both sessions hold the Gap Lock, which causes a deadlock.

Therefore, as you can see from the above example, even a simple primary key insert at the RC isolation level does not just lock a single row of records, and may also cause a deadlock.

Basic Concepts of Row Locks

As you can see from the above example, row locks in InnoDB are not just locks on a single row of records. In fact, row locks are divided into the following types within InnoDB:

1. Rec Lock: the lock added to a single row of records. Its name in the official code is LOCK_REC_NOT_GAP. In terms of the lock type, Rec Lock can be divided into the record read lock (S lock) and the record write lock (X lock).

2. Gap Lock: the lock added to the gap between rows of records. Its name in the official code is LOCK_GAP. (In addition, many articles on the Internet claim that Gap Lock is to solve the phantom read problem of write at RR (REPEATABLE-READ) isolation level. In fact, it is not entirely true, as can be illustrated by the previous example of inserting a deadlock. The issue of transaction isolation levels is not the focus of this article and will not be elaborated here.)

3. Next-Key Lock: the lock that can be simply understood as the combination of Rec Lock and Gap Lock (gap before the record). Its name in the official code is LOCK_ORDINARY.

4. Insert Intention Lock: if the Gap Lock has been added in the inserted position by another transaction, then the current operation needs to wait. At this time, an Insert Intention Lock will be generated, and its name in the official code is LOCK_INSERT_INTENTION.

The above is the basic concept of row locks in InnoDB, which looks very simple, but it is not easy to really understand, so the following section will provide a detailed analysis mainly through some typical cases.

Locking Process in a Write Process

Before entering the case analysis, take a write process as an example to explain the main logic based on the code. The main code stack is as follows:

|--> ha_innobase::write_row
|    |--> row_insert_for_mysql
|    |    |--> row_insert_for_mysql_using_ins_graph
|    |    |    |--> // run_again
|    |    |    |--> row_ins_step
|    |    |    |    |--> row_ins
|    |    |    |    |    |--> row_ins_index_entry_step
|    |    |    |    |    |    |--> row_ins_index_entry
|    |    |    |    |    |    |    |--> row_ins_clust_index_entry // Insert the primary key
|    |    |    |    |    |    |    |    |--> row_ins_clust_index_entry_low
|    |    |    |    |    |    |    |
|    |    |    |    |    |    |    |--> row_ins_sec_index_entry // Insert a secondary index
|    |    |    |
|    |    |    |--> row_mysql_handle_errors
|    |    |    |    |--> lock_wait_suspend_thread // Lock wait, wake up and enter run_again

|--> row_ins_clust_index_entry_low
|    |--> btr_pcur_t::open  // Traverse B+tree
|    |
|    |--> row_ins_duplicate_error_in_clust // The first insert will not enter (implicit lock)
|    |    |--> row_ins_set_rec_lock
|    |    |    |--> lock_clust_rec_read_check_and_lock
|    |    |    |    |--> lock_rec_convert_impl_to_expl // Convert implicit lock to explicit lock
|    |    |    |    |    |--> lock_rec_convert_impl_to_expl_for_trx
|    |    |    |    |    |    |--> lock_rec_add_to_queue
|    |    |    |    |    |    |    |--> rec_lock.create // RecLock::create
|    |    |    |    |    |    |    |    |--> lock_alloc
|    |    |    |    |    |    |    |    |--> lock_add
|    |    |    |    |    |    |    |    |    |--> // No lock wait
|    |    |    |    |    |    |    |    |    |--> lock_rec_insert_to_granted
|    |    |    |    |    |    |    |    |    |--> locksys::add_to_trx_locks
|    |    |    |    |--> lock_rec_lock // Construct a lock wait
|    |    |    |    |    |--> lock_rec_lock_fast
|    |    |    |    |    |    |--> rec_lock.create // RecLock::create
|    |    |    |    |    |--> lock_rec_lock_slow
|    |    |    |    |    |    |--> lock_rec_has_expl
|    |    |    |    |    |    |--> lock_rec_other_has_conflicting // Check for conflicts
|    |    |    |    |    |    |--> rec_lock.add_to_waitq
|    |    |    |    |    |    |    |--> create // RecLock::create
|    |    |    |    |    |    |    |    |--> lock_alloc
|    |    |    |    |    |    |    |    |--> lock_add
|    |    |    |    |    |    |    |    |    |--> // Wait
|    |    |    |    |    |    |    |    |    |--> lock_rec_insert_to_waiting
|    |    |    |    |    |    |    |    |    |--> locksys::add_to_trx_locks
|    |    |    |    |    |    |    |    |    |--> lock_set_lock_and_trx_wait
|    |
|    |--> btr_cur_optimistic_insert // Optimistic insert
|    |    |--> btr_cur_ins_lock_and_undo
|    |    |    |--> lock_rec_insert_check_and_lock // Check for lock conflicts before insertion
|    |    |    |    |--> lock_rec_other_has_conflicting
|    |    |    |    |--> rec_lock.add_to_waitq
|    |    |    |--> trx_undo_report_row_operation
|    |--> btr_cur_pessimistic_insert // Pessimistic insert

From the above code, the insert process in the initial example should be like this:

1. When session 1 inserts, because it is the first time to insert, it inserts directly without the need to explicitly create a lock.

2. When the session 2 inserts and the row_ins_duplicate_error_in_clust function is used for conflict check:

It is found that the record already exists and the corresponding transaction is an active transaction. At this time, the logic of converting implicit lock to explicit lock will be triggered. In short, session 2 creates a Rec X Lock for session 1 (trx 1 to be exact). Because there is no wait relationship at this time, the lock can be obtained directly.
Session 2 continues to create a Rec S Lock. Since it conflicts with the previous Rec X Lock, it will be added to the waiting queue, skip the subsequent insert operation, and finally enter the lock_wait_suspend_thread function to wait.

3. When session 3 inserts, the basic process is the same as that of session 2, except that Rec X Lock already exists. There is no need to trigger the logic of converting implicit lock to explicit lock. It directly creates a Rec S Lock for itself and enters waiting.

At this point, the waiting logic of session 2 and session 3 is relatively clear. But why do session 2 and session 3 form a deadlock after session 1 rolls back? According to the previous analysis, the rollback of session 1 releases Rec X Lock, so session 2 and session 3 are awakened. The thread that enters the insert logic first inserts successfully and the other thread fails. In addition, as mentioned earlier, session 2 and session 3 finally wait for the Insert Intention Lock. Theoretically, the occurrence of this lock requires other threads to hold the Gap Lock. How does this wait relationship occur?

The author also thought about this question for a long time and finally had to find the answer by debugging the source code. Simply put:

During rollback, session 1 does not simply release Rec X Lock and then wake up Session 2 and Session 3.
In the rollback logic of session 1. there is a very important step lock_rec_inherit_to_gap, which converts Rec Lock on session 2 and session 3 to Gap Lock.
After session 2 and session 3 are awakened, no matter which thread enters the insert logic first, it will find the other session's Gap Lock in the lock conflict check (lock_rec_insert_check_and_lock) before insertion, and then generate the Insert Intention Lock.

Regarding the locking process in the write process, the above section is only a very simple introduction with the help of the case of deadlock caused by the insert. The key point is to clarify the basic logic of row lock generation and lock wait generation in the insert process. The knowledge of InnoDB row lock is very rich, which will not be further elaborated here, and can be introduced separately later when possible.

Typical Deadlock Problem

Note: The following scenarios are mainly derived from: https://www.modb.pro/db/1703591734429175808

The table structures used in each scenario are as follows:

DROP TABLE IF EXISTS `t1`;

CREATE TABLE `t1` (
  `id` int NOT NULL AUTO_INCREMENT,
  `a` int DEFAULT NULL,
  `b` int DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `uk_a` (`a`)
) ENGINE=InnoDB;

INSERT INTO t1 values (1, 10, 0);
INSERT INTO t1 values (2, 20, 0);
INSERT INTO t1 values (3, 30, 0);
INSERT INTO t1 values (4, 40, 0);
INSERT INTO t1 values (5, 50, 0);

Note: You must disable the deadlock detection logic on MySQL (innodb_deadlock_detect is set to OFF).

Scenario 1

T1: Session 1 inserts for the first time and does not explicitly create a lock.

T2: Session 2 inserts duplicate rows. It first creates a Next-Key Lock (X) on the UK (30, 35] for session 1 (trx 1), and then creates a Next-Key Lock (S) on the UK (30, 35] for itself. At this time, the Gap Lock does not conflict, so the session is the Rec Lock (S) waiting on 35.

T3: Session 1 inserts again. Since the inserted interval is still (30, 35), which conflicts with the interval lock held by session 2, an Insert Intention Lock is generated, resulting in a deadlock.

Scenario 2

T1: When session 1 is updated, a UK conflict is detected and the Next-Key Lock (X) is created on the UK (30, 40] and (40, 50].

T2: When session 2 is updated, a UK conflict is also detected, and the Next-Key Lock (X) needs to be created on the UK (20, 30] and (30, 40]. Since session 1 already holds the Rec Lock (X) on 40, session 2 can only wait for the Rec Lock (X) on 40.

T3: When session 3 is updated, a UK conflict is also detected, and the Next-Key Lock (X) needs to be created on the UK (30, 40] and (40, 50]. Since session 1 already holds the Rec Lock (X) on 40, session 3 can only wait for the Rec Lock (X) on 40 and the Next-Key Lock (X) on (40,50] has not yet been processed.

T4: Session 1 rolls back and releases the lock; session 2 obtains the Rec Lock (X) on 40, but since the inserted interval is (20,40) and session 3 already holds the Gap Lock on (30,40), an Insert Intention Lock needs to be generated, resulting in a deadlock.

Note: There is actually an interesting question here. Why does the update on the UK require two Next-Key Locks? Interested readers can refer to the following article.

🔗 References: http://mysql.taobao.org/monthly/2022/05/02/

Scenario 3

T1: Session 1 executes the query. Due to for UPDATE protection, a Rec Lock (X) on 40 needs to be added.

Subsequent steps are the same as those in scenario 2, and details are not described again.

Scenario 4

This scenario is the same as the deadlock scenario explained earlier (An Interesting Deadlock Problem) and will not be analyzed again.

Troubleshooting of Deadlock Problems

The above example describes some deadlock problems caused by row-level concurrent access control. Finally, let's briefly describe the troubleshooting ideas for deadlock problems:

By default, MySQL 8.0 enables deadlock detection (innodb_deadlock_detect). In principle, we recommend that you do not manually disable it. In addition, we recommend that you do not set the innodb_lock_wait_timeout parameter too large.
When a deadlock occurs, if the performance_schema is enabled, you can query the data_locks table under the performance_schema to view the lock wait relationship, and then manually process it. Similar to the MDL lock wait processing logic, if you do not want to analyze the lock wait relationship, you can kill all the involved connections in the data_locks table.
If a deadlock does occur, the lock wait relationship is printed in the MySQL error log. You can analyze the lock wait relationship to optimize the write logic on the business side.

Summary of the Row-level Locking Process

The above content introduces the row-level locking process, and let's make a summary:

Row locks are not just locks on row records. Row locks include Rec Lock, Gap Lock, Next-Key Lock, and Insert Intention Lock.
Row locks are created on demand. If it is inserted for the first time, the row lock is not added by default (implicit lock). Only when a conflict occurs, the row lock is promoted to an explicit lock.
Only S lock and S lock are compatible on the Rec Lock.
S lock and X lock are compatible, and X lock and X lock are also compatible on Gap Lock.
Next-Key Lock is a combination of Rec Lock and Gap Lock, which is also separated when processed.
Insert Intention Lock is generated because there are other transactions holding a Gap Lock to be inserted into the gap.
All locks are released when the transaction is committed. To reduce deadlocks, we recommend that you commit the transaction as soon as possible.

Summary

This series of articles is mainly a compilation of the concurrent access control and locking logic on tables, pages, and rows in MySQL. In general:

Tables, pages, and rows are the basic processes of MySQL data processing.
Concurrency control at the table level, or the table-level lock, mainly protects table structures. In MySQL 8.0, table structure protection is completed by MDL locks. Non-InnDB tables (CSV tables) also rely on table-level locks at the Server layer for concurrency control, while InnoDB tables do not need to be locked at the Server layer.
Concurrency control at the page level, or locks on index and page, is mainly to protect the security of B+tree. Under optimistic write, only leaf nodes need to be added with X locks. Under pessimistic write (SMO), all nodes whose indexes may be modified need to be added with X locks. The introduction of SX locks increases read and write concurrency, but SMO still cannot be concurrent.
Concurrency control at the low level, or row locks, is mainly to protect the consistency of row records. Another important point of concurrency control at the row level is MVCC. Interested readers can learn it by themselves.

Community

How Does MySQL Implement Concurrency Control? (Part II)

Preface

Page-level Concurrent Access Control

Basic Structure of B+tree

Locking Process of B+tree

SMO Issues

B+tree Locking Process in a Write Process

Summary of B+tree Locking Process

Row-level Concurrent Access Control

An Interesting Deadlock Problem

Basic Concepts of Row Locks

Locking Process in a Write Process

Typical Deadlock Problem

Scenario 1

Scenario 2

Scenario 3

Scenario 4

Troubleshooting of Deadlock Problems

Summary of the Row-level Locking Process

Summary

Read previous post:

Read next post:

ApsaraDB

You may also like

Comments

ApsaraDB

Related Products

PolarDB for MySQL

AnalyticDB for MySQL

ApsaraDB RDS for MySQL

ApsaraDB for MyBase