By Daoke
This article analyzes and summarizes the source code of MySQL 8.0.25. The MySQL Server layer refers to the optimizer and executor of MySQL. Our understanding of MySQL is based on versions 5.6 and 5.7. We mainly compare it PostgreSQL or traditional databases. However, starting from MySQL 8.0, the continuous iteration and refactoring efforts every three months have brought a qualitative leap to the overall architecture of MySQL Server layer. Let's look at the architecture of MySQL 8.0.25.
The MySQL hierarchical architecture is not very different from other databases. It is worth noting that MySQL is now focusing more on enhancing the evolution of InnoDB, NDB clusters, and the memory cluster architecture of RAPID (HeatWave clusters), as shown in the figure. Let's take a look at the details. We do not follow the sequence of the official Feature implementation and refactoring. Instead, this article takes the perspective of the process of optimizer and executor.
The official MySQL 8.0 uses Bison to rewrite the parser and generate Parser Tree. At the same time, Parser Tree will contextualize to generate a MySQL abstract syntax tree (AST).
MySQL AST is somewhat different from other databases. It is composed of complex SELECT_LEX_UNIT/SELECT_LEX classes alternately. However, these two structures have been renamed to standard Query_expression and Query_block in this version, respectively. Query_block represents a query block, while Query_expression is a query expression that contains multiple query blocks, including the UNION AND/OR query block (such as SELECT FROM t1 union SELECT FROM t2) or multi-level ORDER BY/LIMIT (such as SELECT * FROM t1 ORDER BY a LIMIT 10) and ORDER BY b LIMIT 5.
For example, let's look at a complex nested query:
(SELECT *
FROM ttt1)
UNION ALL
(SELECT *
FROM
(SELECT *
FROM ttt2) AS a,
(SELECT *
FROM ttt3
UNION ALL SELECT *
FROM ttt4) AS b)
MySQL can be express it in the following way:
The parsed and converted syntax tree is still built under the framework of Query_block and Query_expression, but some levels of query blocks have been eliminated or merged. We will not discuss it in detail here.
Next, we will go through the resolve and transform process: Query_expression::prepare->Query_block::prepare. This process includes (by function instead of completely following the execution order):
• setup_tables: Set up table leaves in the query block based on list of tables
• resolve_placeholder_tables/merge_derived/setup_table_function/setup_materialized_derived: Resolve derived table, view, or table function references in query block
• setup_natural_join_row_types: Compute and store the row types of the top-most NATURAL/USING joins
• setup_wild: Expand all '*' in the list of expressions with the matching column references
• setup_base_ref_items: Set query_block's base_ref_items
• setup_fields: Check that all given fields exist and fill struct with current data
• setup_conds: Resolve WHERE condition and join conditions
• setup_group: Resolve and set up the GROUP BY list
• m_having_cond->fix_fields: Set up the HAVING clause
• resolve_rollup: Resolve items in SELECT list and ORDER BY list for rollup processing
• resolve_rollup_item: Resolve an item (and its tree) for rollup processing by replacing items matching grouped expressions with Item_rollup_group_items and updating properties (m_nullable, PROP_ROLLUP_FIELD). Check any GROUPING function for incorrect columnsas well.
• setup_order: Set up the ORDER BY clause
• resolve_limits: Resolve OFFSET and LIMIT clauses
• Window::setup_windows1: Set up windows after setup_order() and before setup_order_final()
• setup_order_final: Do the final setup of the ORDER BY clause after the query block is fully resolved
• setup_ftfuncs: Set up full-text functions after resolving HAVING
• resolve_rollup_wfs: Replace group by field references inside window functions with references in the presence of ROLLUP
• remove_redundant_subquery_clause: Permanently remove redundant parts from the query if:
1) This is a subquery.
2) Not normalizing a view.
Removal should take place when a query involving a view is optimized, not when the view is created.
• remove_base_options: Remove SELECT_DISTINCT options from a query block if distinct can be skipped.
• resolve_subquery: Resolve predicate involving subquery and perform early unconditional subquery transformations using one of the options below:
⚬ Convert subquery predicate into semi-join
⚬ Mark the subquery for execution using materialization
⚬ Perform IN->EXISTS transformation
⚬ Perform more/less ALL/ANY -> MIN/MAX rewrite
⚬ Substitute trivial scalar-context subquery with its value
• transform_scalar_subqueries_to_join_with_derived: Transform eligible scalar subqueries to derived tables.
• flatten_subqueries: Convert semi-join subquery predicates into semi-join join nests. Convert candidate subquery predicates into semi-join join nests. This transformation is performed once in query lifetime and is irreversible.
• apply_local_transforms:
⚬ delete_unused_merged_columns: If query block contains one or more merged derived tables/views, walk through lists of columns in select lists and remove unused columns.
⚬ simplify_joins: Convert all outer joins to inner joins if possible
⚬ prune_partitions: Perform partition pruning for a given table and condition
• push_conditions_to_derived_tables: Push-down of conditions to derived tables must be done after validity checks of grouped queries done by apply_local_transforms().
• Window::eliminate_unused_objects: Eliminate objects like unused window definitions and redundant sorts.
Due to space limitations, we will focus on the simple_joins function, which is related to top_join_list, and the simplifying process of nested joins in Query_block.
We have quoted the three procedures of PostgreSQL below to have a clearer understanding of the standard database approach:
In the following figure, Parser generates a parse tree based on the SQL statement:
testdb=# SELECT id, data FROM tbl_a WHERE id < 300 ORDER BY data;
The following figure shows how the analyzer/analyser of PostgreSQL uses the parse tree to generate a query tree after semantic analysis.
Rewriter transforms the query tree based on the rules in the rule system:
sampledb=# CREATE VIEW employees_list
sampledb-# AS SELECT e.id, e.name, d.name AS department
sampledb-# FROM employees AS e, departments AS d WHERE e.department_id = d.id;
The following example shows how a query tree containing view is expanded into a new query tree:
sampledb=# SELECT * FROM employees_list;
Next, we entered the process of generating physical plans from logical plans. This article still focuses on the analysis of structures instead of introducing the details of generation. MySQL used to rely mainly on JOIN and QEP_TAB before version 8.0.22. JOIN corresponds to Query_block, while QEP_TAB corresponds to the sequence, method, and execution plan of the specific "table" involved in Query_block. However, after version 8.0.22, the new Hypergraph-based optimizer algorithm successfully abandoned the QEP_TAB structure of expressing the execution plan of the left deep tree. The algorithm directly used the HyperNode/HyperEdge graph to represent the execution plan.
In the following example, you can see the different plan presentations of left deep tree presented in data structure and the bushy tree presented in the hypergraph structure.
| -> Inner hash join (no condition) (cost=1.40 rows=1)
-> Table scan on R4 (cost=0.35 rows=1)
-> Hash
-> Inner hash join (no condition) (cost=1.05 rows=1)
-> Table scan on R3 (cost=0.35 rows=1)
-> Hash
-> Inner hash join (no condition) (cost=0.70 rows=1)
-> Table scan on R2 (cost=0.35 rows=1)
-> Hash
-> Table scan on R1 (cost=0.35 rows=1)
| -> Nested loop inner join (cost=0.55..0.55 rows=0)
-> Nested loop inner join (cost=0.50..0.50 rows=0)
-> Table scan on R4 (cost=0.25..0.25 rows=1)
-> Filter: (R4.c1 = R3.c1) (cost=0.35..0.35 rows=0)
-> Table scan on R3 (cost=0.25..0.25 rows=1)
-> Nested loop inner join (cost=0.50..0.50 rows=0)
-> Table scan on R2 (cost=0.25..0.25 rows=1)
-> Filter: (R2.c1 = R1.c1) (cost=0.35..0.35 rows=0)
-> Table scan on R1 (cost=0.25..0.25 rows=1)
MySQL 8.0.2x introduces a new class AccessPath to be more compatible with the two optimizers. It can be considered as a Plan Tree abstracted by MySQL to decouple the executor from different optimizers.
The old optimizer still uses JOIN::optimize to convert query blocks to query execution plans (QEPs).
Some logic rewriting work is still done in this phase. The conversion in this phase can be understood as preparing for cost-based optimization. The detailed steps are listed below:
• Logical Transformations
• optimize_derived: Optimize the query expression representing a derived table/view
• optimize_cond: Equality/constant propagation
• prune_table_partitions: Partition pruning
• optimize_aggregated_query: COUNT(*), MIN(), MAX() constant substitution in case of implicit grouping
• substitute_gc: ORDER BY optimization. Substitute all expressions in the WHERE condition and ORDER/GROUP lists that match generated columns (GC) expressions with GC fields, if any.
• Perform cost-based optimization of table order and access path selection.
• JOIN::make_join_plan(): Set up join order and initial access paths
• Post-join order optimization
• substitute_for_best_equal_field: Create optimal table conditions from the where clause and the join conditions
• make_join_query_block: Inject outer-join guarding conditions
• Adjust data access methods after determining the table condition (several times)
• optimize_distinct_group_order: Optimize ORDER BY/DISTINCT
• optimize_fts_query: Perform FULLTEXT search before all regular searches
• remove_eq_conds: Remove const and eq items. Return the new item or nullptr if no condition
• replace_index_subquery/create_access_paths_for_index_subquery: See if this subquery can be evaluated with subselect_indexsubquery_engine
• setup_join_buffering: Check whether join cache could be used.
• Code Generation
• alloc_qep(tables): Create QEP_TAB array
• test_skip_sort: Try to optimize away sorting/distinct
• make_join_readinfo: Plan refinement stage: Do various setup things for the executor
• make_tmp_tables_info: Set up temporary table usage for grouping and/or sorting
• push_to_engines: Push (parts of) the query execution down to the storage engines if they can provide faster execution of the query or part of it
• create_access_paths: Generated ACCESS_PATH
The new optimizer is not enabled by default and must be enabled using set optimizer_switch="hypergraph_optimizer=on". This is mainly implemented by the FindBestQueryPlan function. The logic is listed below:
• First, determine whether it belongs to the Query syntax (CheckSupportedQuery) that the new optimizer can support. If it is not supported, it will return an error: ER_HYPERGRAPH_NOT_SUPPORTED_YET.
• Convert top_join_list to a JoinHypergraph structure. Since Hypergraph is relatively independent algorithm-level implementation, the JoinHypergraph structure is used to better encapsulate the structure of the database into the concepts of Hypergraph edges and nodes.
• The DPhyp algorithm in the paper is implemented by EnumerateAllConnectedPartitions.
• The CostingReceiver class contains the main logic of the previous JOIN planning, including selecting the corresponding access path based on cost, evaluating the sub-plan generated by DPhyp, and retaining the sub-plan with the smallest cost.
• After the root_path is obtained, handle group/agg/having/sort/limit. Hypergraph uses sorting first + streaming aggregation for the Group BY operation.
Here's an example to help us understand the relationship between Plan (AccessPath) and SQL:
Finally, the Iterator execution carrier required by the Iterator executor framework is generated. One AccessPath corresponds to one iterator.
Query_expression::m_root_iterator = CreateIteratorFromAccessPath(......)
unique_ptr_destroy_only<RowIterator> CreateIteratorFromAccessPath(
THD *thd, AccessPath *path, JOIN *join, bool eligible_for_batch_mode) {
......
switch (path->type) {
case AccessPath::TABLE_SCAN: {
const auto ¶m = path->table_scan();
iterator = NewIterator<TableScanIterator>(
thd, param.table, path->num_output_rows, examined_rows);
break;
}
case AccessPath::INDEX_SCAN: {
const auto ¶m = path->index_scan();
if (param.reverse) {
iterator = NewIterator<IndexScanIterator<true>>(
thd, param.table, param.idx, param.use_order, path->num_output_rows,
examined_rows);
} else {
iterator = NewIterator<IndexScanIterator<false>>(
thd, param.table, param.idx, param.use_order, path->num_output_rows,
examined_rows);
}
break;
}
case AccessPath::REF: {
......
}
testdb=# EXPLAIN SELECT * FROM tbl_a WHERE id < 300 ORDER BY data;
QUERY PLAN
---------------------------------------------------------------
Sort (cost=182.34..183.09 rows=300 width=8)
Sort Key: data
-> Seq Scan on tbl_a (cost=0.00..170.00 rows=300 width=8)
Filter: (id < 300)
(4 rows)
This article focuses on the official source code of MySQL 8.0.25, mainly analyzing the changes and connections in the structure of the official refactoring in each stage. We mainly intend to introduce the development of a brand-new MySQL.
ApsaraDB - February 4, 2024
ApsaraDB - March 10, 2022
Morningking - September 26, 2023
ApsaraDB - February 22, 2022
ApsaraDB - October 21, 2020
ApsaraDB - November 22, 2024
Alibaba Cloud PolarDB for PostgreSQL is an in-house relational database service 100% compatible with PostgreSQL and highly compatible with the Oracle syntax.
Learn MoreAlibaba Cloud PolarDB for MySQL is a cloud-native relational database service 100% compatible with MySQL.
Learn MoreAn online MPP warehousing service based on the Greenplum Database open source program
Learn MoreAnalyticDB for MySQL is a real-time data warehousing service that can process petabytes of data with high concurrency and low latency.
Learn MoreMore Posts by ApsaraDB