In one of my business scenarios, I needed to query the latest five audit data records.
SELECT `id`, `title`
FROM `th_content`
WHERE `audit_time` < 1541984478
AND `status` = 'ONLINE'
ORDER BY `audit_time` DESC, `id` DESC
LIMIT 5;
The monitoring data showed that the CPU usage at the time exceeded 100% and I noticed that many similar queries had the status of create sort index
after running show processlist
.
View the table structure
CREATE TABLE `th_content` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(500) CHARACTER SET utf8 NOT NULL DEFAULT '' COMMENT '内容标题',
`content` mediumtext CHARACTER SET utf8 NOT NULL COMMENT '正文内容',
`audit_time` int(11) unsigned NOT NULL DEFAULT '0' COMMENT '审核时间',
`last_edit_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '最近编辑时间',
`status` enum('CREATED','CHECKING','IGNORED','ONLINE','OFFLINE') CHARACTER SET utf8 NOT NULL DEFAULT 'CREATED' COMMENT '资讯状态',
PRIMARY KEY (`id`),
KEY `idx_at_let` (`audit_time`,`last_edit_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
The index has a Composite index with audit_time
on the left side and does not have an index related to status
.
Analyze the execution logic of the preceding SQL statement:
ONLINE
".Although only five rows are required to be retrieved, in the preceding example that I gave, the query actually scans 1 million rows and sorts 500,000 rows in the database memory due to the large data volume.
Therefore, the query performance is extremely low.
I have drawn the following diagram to show the query process in step 1, where the pink parts indicate data rows that need to be retrieved in the table.
I have created and filled in some data according to the index and storage pattern. If you find any errors, feel free to leave a comment and let me know. I hope that this diagram can help you understand how the Composite index is stored and how to use indexes in queries.
It is not always easy to use indexes in range queries. If a Composite index with audit_time
and status
is added, what improvements can we obtain?
ALTER TABLE `th_content` ADD INDEX `idx_audit_status` (`audit_time`, `status`);
mysql> explain select `id`, `title` from `th_content` where `audit_time` < 1541984478 and `status` = 'ONLINE' order by `audit_time` desc, `id` desc limit 5;
+----+-------------+------------+-------+------------------------------------------+------------------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+------------------------------------------+------------------+---------+------+--------+-------------+
| 1 | SIMPLE | th_content | range | idx_at_ft_pt_let,idx_audit_status | idx_audit_status | 4 | NULL | 209754 | Using where |
+----+-------------+------------+-------+------------------------------------------+------------------+---------+------+--------+-------------+
Details: Because audit_time
is a range query, the index on the second column is unnecessary and only audit_time
will be used. Therefore, key_len
is 4. In the following Improvement idea 2, where the same two fields are included, key_len
is 5.
Let's also analyze the execution process after this index is added:
audit_time
value that is lower than this audit time value from the Composite index.audit_time
values than that audit time value. Because < audit_time is a range query and the values of the index on the second column is scattered, we need to further search for matching rows one by one and find all the index rows matching the criterion (status='ONLINE
') until we obtain the fifth matching row.In the preceding diagram, pink indicates rows that meet the requirement of the first index column. Continue the query and three records are found on this leaf node. Move to the left and perform query on the previous leaf node. When five matching rows are found, retrieve the corresponding data records in the table.
Because the index contains status
values, it is not required to retrieve corresponding data during the process of filtering rows with the status of "ONLINE". The iops
is significantly reduced because only five data rows need to be further retrieved in the table.
If fives rows scanned in idx_audit_status
all have the status
of ONLINE
, only five rows need to be scanned.
If only four rows of the first 1 million rows scanned in idx_audit_status
have the status
of ONLINE
, 1 million and 1 rows need to be scanned to obtain the required five rows. The number of rows to be scanned by the index is unknown.
ALTER TABLE `th_content` DROP INDEX `idx_audit_status`;
ALTER TABLE `th_content` ADD INDEX `idx_status_audit` (`status`, `audit_time`);
This allows both sorting and data retrieval in the table to be highly efficient.
2 posts | 1 followers
FollowAlibaba Cloud Community - September 10, 2024
digoal - July 25, 2019
ApsaraDB - January 3, 2024
digoal - April 12, 2019
ApsaraDB - June 4, 2024
ApsaraDB - November 17, 2020
2 posts | 1 followers
FollowAlibaba Cloud provides products and services to help you properly plan and execute data backup, massive data archiving, and storage-level disaster recovery.
Learn MoreA low-code, high-availability, and secure platform for enterprise file management and application
Learn MoreCustomized infrastructure to ensure high availability, scalability and high-performance
Learn MoreSDDP automatically discovers sensitive data in a large amount of user-authorized data, and detects, records, and analyzes sensitive data consumption activities.
Learn MoreMore Posts by zhoumengkang