A Deep Dive into MySQL - Overview of Semijoin

This article delves into the MySQL Semijoin process, from identification to the optimizer selecting the best execution strategy based on cost and mode.

By Boge

MySQL Semijoin mainly solves the existence of records of one or more tables in outer queries in the results of subqueries containing one or more tables, such as the IN/EXIST subquery. If the query is executed based on the original semantics of the IN/EXIST predicate and the result of the IN/EXIST predicate is calculated by each row of records in the outer query, the content of the subquery needs to be executed separately. In the case of an associated subquery, the subquery needs to be executed repeatedly, resulting in low overall execution efficiency. Part of the existence issues (SPJ subquery) is similar to the JOIN issue of multiple tables of the outer query and those of the subquery (hereinafter referred to as outer tables and inner tables). If it can be joined, it exists. However, the difference from common JOIN is that when an outer table is joined with an inner table, each row of the outer table may be joined with multiple rows of the inner table. This results in duplicate records of rows in the outer table. However, the IN/EXIST subquery only filters the rows of the outer table based on the original semantics. The data can only be reduced but not bloated.

How to ensure this while joining outer and inner tables? MySQL implements four execution strategies to avoid duplicate rows: Materialize, DuplicateWeedOut, FirstMatch, and LooseScan. The following is a summary and schematic diagrams of four execution strategies.

The diagrams use this SQL as an example:

select * from Country 
where Country.code IN (select City.Country
                        from City
                        where City.Population > 7*1000*1000)
      and Country.continent='Europe'

• Materialize

Materialize is an execution mode that materializes the result of an independent JOIN operation on the inner table to the temporary table and deduplicates and then joins the materialized table with other outer tables. Since the materialized table has been deduplicated, JOIN with the outer table does not cause data bloat. Therefore, it becomes a common JOIN and can be adjusted with JOIN ORDER of the outer table.

• DuplicateWeedOut

DuplicateWeedout is to create a temporary table. During the process of writing the primary key recorded by the outer table to the rowids field with a unique index on the temporary table, it deduplicates the rows that are generated after the outer table is joined with the inner table.

• FirstMatch

FirstMatch is used to skip all subsequent rows of data in the inner table and read the next row of the outer table for execution after a row of records in the outer table is joined with the inner table to get a row of output. This prevents duplicate rows from being generated in the outer table.

• LooseScan

When an index exists in the inner table, LooseScan performs on the inner table (only scanning rows with different index records) to deduplicate the records of the inner table and joins with the outer table. This also prevents duplicate rows from being generated in the outer table.

Flexibility of Semijoin

Why are there four execution strategies? This is because in the case that there are different JOIN ORDER, index existence, and JOIN conditions in the outer table and the inner table, only specific execution strategies can be applied. The combination of the four strategies forms the flexibility of semijoin JOIN ORDER. MySQL selects JOIN ORDER and the corresponding execution strategy based on the cost. Use the following SQL as an example (the same schema is used in subsequent articles). The ct represents the correlated table, which is correlated with the subquery in the outer query; it represents the inner table of the subquery.

CREATE TABLE t1 (
  a INT,
  b INT,
  KEY(a),
  KEY(b)
);

INSERT INTO t1 VALUES (1, 2), (3, 4), (5, 6), (7, 8), (9, 10), (2, 1), (4, 3), (5, 6);

CREATE TABLE ct1 LIKE t1;
INSERT INTO ct1 SELECT * FROM t1;

CREATE TABLE ct2 LIKE t1;
INSERT INTO ct2 SELECT * FROM t1;

CREATE TABLE it1 LIKE t1;
INSERT INTO it1 SELECT * FROM t1;

CREATE TABLE it2 LIKE t1;
INSERT INTO it2 SELECT * FROM t1;

SELECT * FROM ct1, ct2 WHERE ct1.a IN (SELECT it1.a FROM it1, it2 WHERE it2.b = ct2.b);

Pay attention to the relative order of the outer table and the inner table in JOIN ORDER (the join order of tables in parentheses is variable). The execution strategy supports JOIN ORDER as follows:

[^1]: Inner tables are not joined together.
[^2]: MaterializeLookup requires all dependent tables are in front of the inner tables. Use MaterializeScan instead.
[^3]: FirstMatch requires all dependent tables are in front of the inner tables.
[^4]: FirstMatch doesn't allow any outer tables to be interleaved within inner tables.
[^5]: LooseScan requires there exists at least a correlated outer table appears after the LooseScan inner table.
[^6]: LooseScan requires all semijoin conditions after the LooseScan table are only dependent on index columns of LooseScan.
[^7]: LooseScan requires inner tables appear continuously.

It can be seen that DuplicateWeedOut is the most versatile strategy and other strategies can only be applied to specific JOIN ORDER. The table does not cover the order exchanges between it tables and between ot tables. If the original SQL contains other outer tables that are not correlated with the IN/EXIST subquery, in most cases, the outer table can flexibly exchange the order between ct tables or at any position outside the range of semijoin. Therefore, each execution strategy supports a wider range of JOIN ORDER in actual scenarios.

The following dives into the kernel to explore the application conditions of semijoin and how the optimizer determines the execution strategy of semijoin based on the cost.

Prepare Phase

Collection of Semijoin candidate subqueries

In the process of the predicate resolve/fix_fields() of the outer query, the subquery Query_block::prepare recursively calls Query_block::resolve_subquery by itself to determine the basic conditions that can be transformed to semijoin. If the basic conditions are met, Item_exists_subselect (the base class of Item_in_subselect) corresponding to the subquery will be added to the Query_block::sj_candidates array of the outer query. After the recursion returns, the outer query calls Query_block::flatten_subqueries to process.

The basic conditions include:

The subquery is of the IN/EXIST type and =ANY is also the IN subquery.
It is a simple query block (not composed of set operators such as UNION, INTERSECT, and EXCEPT).
There are no explicit or implicit aggregation functions and no HAVING and WINDOW functions (Semijoin is to extract the table in the subquery and join it with the outer table before performing subsequent operations, and these operators need to read the inner table to perform operations before calculating the predicate in the outer layer).
The subquery is in the ON/WHERE syntax and is the top level joined by the AND predicate.
The predicates True/False and nullability are transformation-compatible.

Simply put, the main condition is the IN/EXIST subquery of SPJ.

The outer query pulls up the inner table to eliminate the original subqueries

During resolve of bottom-up, the outer query block containing the semijoin candidates pulls up the inner table and eliminates the subqueries. This process is in Query_block::flatten_subqueries.

for SELECT#1 WHERE X IN (SELECT #2 WHERE Y IN (SELECT#3)) :

Query_block::prepare() (select#1)
   -> fix_fields() on IN condition
       -> Query_block::prepare() on subquery (select#2)
           -> fix_fields() on IN condition
                -> Query_block::prepare() on subquery (select#3)
                <- Query_block::prepare()
           <- fix_fields()
           -> flatten_subqueries: merge #3 in #2
           <- flatten_subqueries
       <- Query_block::prepare()
   <- fix_fields()
   -> flatten_subqueries: merge #2 in #1

In the outer query block, traverse all semijoin candidate subqueries in sequence:

1. Replace their predicate position (WHERE condition or ON condition nested by join) in the outer query block with a constant True expression.

2. Create a semijoin nested Table_ref structure and add it to the nested join nest structure or outer-most join list, hereinafter referred to as (sj-nest).

3. Add all inner tables to Table_ref::NESTED_JOIN, link to leaf_tables of the outer query, and renumber these tables based on tableno of the outer layer.

4. Build NESTED_JOIN::sj_outer_exprs and NESTED_JOIN::sj_inner_exprs vectors. IN subquery is the one-to-one correspondence between the left expression and the projection of the subquery. EXIST subquery is the join condition in the WHERE/ON conditions (IN subquery also collects these join conditions to facilitate execution). The original positions of the collected conditions leave empty or True, such as the following predicate:

ot1.a = ANY(SELECT it1.a FROM it1, it2) is collected to sj_outer_exprs: ot1.a; sj_inner_exprs: it1.a
ot1.a = ANY(SELECT it1.a FROM it1, it2 WHERE it1.b = ot2.b) is collected to sj_outer_exprs: ot1.a, ot2.b; sj_inner_exprs: it1.a, it1.b

5. Determine the outer table set NESTED_JOIN::sj_corr_tables (non-simple join conditions, used to determine whether the semijoin conditions can be calculated during optimization) on which the remaining WHERE/ON conditions in the subquery depend.

6. Build multiple Item_func_eq expressions with sj_outer_exprs and sj_inner_exprs, combine them with the remaining WHERE conditions of the subquery, and put them into the corresponding JOIN/WHERE conditions of the outer query.

7. NESTED_JOIN::sj_depends_on contains all semijoin conditions and the dependency of the subquery WHERE conditions on the table in the original outer query. Mark the query block has_sj_nests/has_aj_nests as True.

After that, in Query_block::apply_local_transforms -> Query_block::simplify_joins, the nested semijoin/anti-join will be simply processed to flatten the nested structure, because when the outer structure is semijoin, whether the output result of the inner layer is repeated does not affect the result, such as A SJ (B SJ (C)) -> A SJ (B JOIN C) and A AJ (B SJ (C)) -> A AJ (B JOIN C). Move the semijoin table of the inner layer to the join list of the outer layer and remove the original inner layer (sj-nest). Record (sj-nest) Table_ref::sj_inner_tables as used_tables of all inner tables in the original subqueries. Add all (sj-nest) to the Query_block::sj_nests array of the current query block.

Optimize Phase (Optimizer)

MySQL supports different execution strategies for semijoin. Different execution strategies have different JOIN ORDER for inner tables and outer tables, which provides more flexible JOIN ORDER options. Different execution strategies achieve the goal of deduplication through different modes to avoid the original semijoin transforming into data bloat after JOIN. Different execution strategies can utilize different index or table access paths. These execution strategies include FirstMatch, DuplicateWeedout, LooseScan, and Materialize. Materialize is divided into MaterializeScan and MaterializeLookup based on the different JOIN ORDER of materialized tables and outer tables. The goal of the optimizer is to evaluate the feasibility of these execution strategies (such as whether the switch and index support) and select the strategy with the lowest estimated cost.

Cost selection for execution strategies

This section describes the common processing logic of these execution strategies. The differences and optimization details of each execution strategy are described in the following sections for each execution strategy.

The main process is in JOIN::make_join_plan:

• pull_out_semijoin_tables

After update_ref_and_keys, if the inner table of semijoin can be eq_ref (unique index) outer table, and no other inner table in (sj-nest) depends on it, then the table can be moved out of (sj-nest) and placed in join list of the outer layer because joining such table with the outer table does not cause data bloat. This simplifies the (sj-nest) inner table structure. If all inner tables can be extracted, then (sj-nest) can be removed directly, thus eliminating semijoin.

• optimize_semijoin_nests_for_materialization

Before the outer optimization, optimize JOIN ORDER of the inner table for (sj-nest) that may select Materialize to execute and calculate the cost and fanout. Based on these data, the cost can be calculated during the optimization process with the outer table. See the Materialize section for details.

• Optimize_table_order::best_extension_by_limited_search -> Optimize_table_order::advance_sj_state

Every time a new table (outer table + inner table of subquery) is tried to be added to the JOIN sequence and its best_access_path is selected, Optimize_table_order::advance_sj_state is called to consider the available semijoin execution strategies at this time, calculate the cost of different strategies, and select the strategy with the lowest cost as the best strategy for the current JOIN sequence. It stores the strategy in POSITION of the current table (different strategies have different state variables) and updates the cost and cardinality (prefix_cost and prefix_rowcount) under the best strategy.

Note that for some execution modes, such as Materilize and FirstMatch, the access path of each table within a certain range is re-evaluated to calculate the cost of selecting the best access path under the current strategy. However, the reconsideration of these access paths does not persist to the JOIN::positions enumeration state, avoiding the impact on subsequent greedy search in the outer layer. The prefix_cost/prefix_rowcount and the variables selected by the semijoin execution strategy recorded in POSITION of the current table will persist. If enumerating the complete JOIN sequence is still the best plan, it will be persisted to best_positions. Therefore, when the entire greedy search ends, the strategy selection result about semijoin in POSITION is identified in Optimize_table_order::fix_semijoin_strategies and the access path of the table involved needs to be selected again. In this case, the access path is persisted.

Specific consideration for the cost of each execution strategy is elaborated in the following sections.

Materialize

Materialize is an execution mode that materializes the JOIN result of the (sj-nest) inner table into the temporary table and deduplicates, and then joins the materialized table with other outer tables. Since the materialized table has been deduplicated, JOIN with the outer table does not cause data bloat. Therefore, it becomes a common JOIN and can be adjusted with JOIN ORDER of the outer table. Materialize is divided into MaterializeScan and MaterializeLookup based on whether the materialized table is scanned entirely or is indexed eq_ref lookup on the temporary table. The optimization process includes:

• optimize_semijoin_nests_for_materialization

JOIN::make_join_plan -> optimize_semijoin_nests_for_materialization
|-- Traverse all (sj-nest).
|-- If NESTED_JOIN::sj_corr_tables is not empty, the subquery has complex join conditions that cannot be extracted. In this case, the subquery cannot be materialized independently and is skipped.
|-- semijoin_types_allow_materialization/types_allow_materialization checks whether the type can be materialized and whether the index length of the materialized table exceeds the maximum value.
|-- Optimize_table_order::choose_table_order // Select the best join order for the inner table set of semijoin.
|-- calculate_materialization_costs // Calculate NDV of the materialized table, the cost of materialization, the cost of scanning the entire materialized table, and the cost of the materialized table lookup, 
// Store the results in NESTED_JOIN::Semijoin_mat_optimize to facilitate the subsequent cost comparison of different execution strategies.
|-- Calculate the materialization cost and the number of fanout rows based on read_cost and evaluate the cost of each table in best_positions acquired by optimization.
|-- Take the smaller value of the number of fanout rows multiplied by the number of output rows of each table as the output NDV of the materialized table.
|-- Cost_model_server::tmptable_readwrite_cost calculates the cost of materialized table operations.
|-- Store the best_positions information in the Semijoin_mat_optimize::positions array as the best plan for the (sj-nest) inner table.

MaterializeScan

SET optimizer_switch='semijoin=on,materialization=on,loosescan=off,firstmatch=off,duplicateweedout=off';

EXPLAIN SELECT /*+ JOIN_SUFFIX(ct1, ct2) */ * FROM ct1, ct2 WHERE ct1.a IN (SELECT it1.a FROM it1, it2 WHERE it2.b = ct2.b);
+----+--------------+-------------+------------+-------+---------------+------+---------+---------------+------+----------+--------------------------------------------+
| id | select_type  | table       | partitions | type  | possible_keys | key  | key_len | ref           | rows | filtered | Extra                                      |
+----+--------------+-------------+------------+-------+---------------+------+---------+---------------+------+----------+--------------------------------------------+
|  1 | SIMPLE       | <subquery2> | NULL       | ALL   | NULL          | NULL | NULL    | NULL          | NULL |   100.00 | Using where                                |
|  1 | SIMPLE       | ct1         | NULL       | ref   | a             | a    | 5       | <subquery2>.a |    1 |   100.00 | NULL                                       |
|  1 | SIMPLE       | ct2         | NULL       | ref   | b             | b    | 5       | <subquery2>.b |    1 |   100.00 | NULL                                       |
|  2 | MATERIALIZED | it1         | NULL       | index | a             | a    | 5       | NULL          |   16 |   100.00 | Using index                                |
|  2 | MATERIALIZED | it2         | NULL       | index | b             | b    | 5       | NULL          |   16 |   100.00 | Using index; Using join buffer (hash join) |
+----+--------------+-------------+------------+-------+---------------+------+---------+---------------+------+----------+--------------------------------------------+

| -> Nested loop inner join  (cost=259 rows=334)
    -> Nested loop inner join  (cost=142 rows=293)
        -> Filter: ((`<subquery2>`.a is not null) and (`<subquery2>`.b is not null))  (cost=53.2..39.2 rows=256)
            -> Table scan on <subquery2>  (cost=53.3..59 rows=256)
                -> Materialize with deduplication  (cost=53.3..53.3 rows=256)
                    -> Filter: ((it1.a is not null) and (it2.b is not null))  (cost=27.7 rows=256)
                        -> Inner hash join (no condition)  (cost=27.7 rows=256)
                            -> Index scan on it2 using b  (cost=0.116 rows=16)
                            -> Hash
                                -> Index scan on it1 using a  (cost=1.85 rows=16)
        -> Index lookup on ct1 using a (a=`<subquery2>`.a)  (cost=0.286 rows=1.14)
    -> Index lookup on ct2 using b (b=`<subquery2>`.b)  (cost=73.2 rows=1.14)
 |

• Optimize_table_order::advance_sj_state

The prerequisites (semijoin_order_allows_materialization) for MaterializeScan are that the current table is the (sj-nest) inner table and all other inner tables appear in the preorder and are consecutive. In the code, POSITION::sjm_scan_last_inner is used to represent the last position of the inner table and the range from the subscript sjm_scan_last_inner - Table_ref::sj_inner_tables + 1 to sjm_scan_last_inner is the range of POSITION of the inner table.

From this position, set POSITION::sjm_scan_need_tables as the set of all sj_inner_tables and NESTED_JOIN::sj_depends_on (the outer table on which the semijoin conditions depend). It represents that JOIN of MaterializeScan can be performed only when these tables are in the preorder of JOIN. If the condition is met, the access paths of all outer tables from sjm_scan_last_inner to the current position will be reevaluated based on fanout of the materialized table (traverse the table and call best_access_path to pass in a new fanout) to calculate the cost of executing MaterializeScan. This process is in Optimize_table_order::semijoin_mat_scan_access_paths.

• If MaterializeScan is the strategy with the lowest cost under the current JOIN prefix, the current outer table will be marked: POSITION::sj_strategy

When the next outer table is added for SJ_OPT_MATERIALIZE_SCAN at the same time, the cost of the MaterializeScan access path is not reconsidered because POSITION::prefix_cost/prefix_rowcount of the previously selected SJ_OPT_MATERIALIZE_SCAN has been set to the result value of MaterializeScan. Therefore, the access paths of subsequent tables are evaluated based on cardinality and the cost of the MaterializeScan plan and do not need to be reevaluated.

• Otherwise, if MaterializeScan is not the current best semijoin strategy, it will be considered repeatedly when the next new table is searched by greedy search, because cardinality of the preorder under MaterializeScan may affect the access paths of subsequent tables.

MaterializeLookup

SET optimizer_switch='semijoin=on,materialization=on,loosescan=off,firstmatch=off,duplicateweedout=off';

EXPLAIN SELECT * FROM ct1, ct2 WHERE ct1.a IN (SELECT it1.a FROM it1, it2 WHERE it2.b = ct2.b);
+----+--------------+-------------+------------+--------+---------------------+---------------------+---------+-----------------------+------+----------+------------------------------------------------------+
| id | select_type  | table       | partitions | type   | possible_keys       | key                 | key_len | ref                   | rows | filtered | Extra                                                |
+----+--------------+-------------+------------+--------+---------------------+---------------------+---------+-----------------------+------+----------+------------------------------------------------------+
|  1 | SIMPLE       | ct1         | NULL       | ALL    | a                   | NULL                | NULL    | NULL                  |    8 |   100.00 | Using where                                          |
|  1 | SIMPLE       | ct2         | NULL       | range  | b                   | b                   | 5       | NULL                  |    8 |   100.00 | Using index condition; Using join buffer (hash join) |
|  1 | SIMPLE       | <subquery2> | NULL       | eq_ref | <auto_distinct_key> | <auto_distinct_key> | 10      | test.ct1.a,test.ct2.b |    1 |   100.00 | NULL                                                 |
|  2 | MATERIALIZED | it1         | NULL       | index  | a                   | a                   | 5       | NULL                  |   16 |   100.00 | Using index                                          |
|  2 | MATERIALIZED | it2         | NULL       | index  | b                   | b                   | 5       | NULL                  |   16 |   100.00 | Using index; Using join buffer (hash join)           |
+----+--------------+-------------+------------+--------+---------------------+---------------------+---------+-----------------------+------+----------+------------------------------------------------------+

| -> Nested loop inner join  (cost=1653 rows=16384)
    -> Inner hash join (no condition)  (cost=7.7 rows=64)
        -> Index range scan on ct2 using b over (NULL < b), with index condition: (ct2.b is not null)  (cost=0.131 rows=8)
        -> Hash
            -> Filter: (ct1.a is not null)  (cost=1.05 rows=8)
                -> Table scan on ct1  (cost=1.05 rows=8)
    -> Single-row index lookup on <subquery2> using <auto_distinct_key> (a=ct1.a, b=ct2.b)  (cost=53.3..53.3 rows=1)
        -> Materialize with deduplication  (cost=53.3..53.3 rows=256)
            -> Filter: ((it1.a is not null) and (it2.b is not null))  (cost=27.7 rows=256)
                -> Inner hash join (no condition)  (cost=27.7 rows=256)
                    -> Index scan on it2 using b  (cost=0.116 rows=16)
                    -> Hash
                        -> Index scan on it1 using a  (cost=1.85 rows=16)
 |

• Optimize_table_order::advance_sj_state

The MaterializeLookup execution strategy is based on the conditions that the (sj-nest) inner tables are closely joined in the preorder of JOIN and the outer tables on which the semijoin conditions depend also appear. In this case, the JOIN of eq_ref can be used on the materialized table. In the semijoin_mat_lookup_access_paths function, the cost of the materialized table lookup is calculated based on prefix_rowcount of the last outer table in the preorder, and the cost of the materialized table setup is added as the final cost of this execution mode.

DuplicateWeedOut

DuplicateWeedout applies to a wider range, and JOIN ORDER between the original outer table and the inner table can be more flexible. The implementation principle is to create a temporary table to deduplicate the data bloat caused by the JOIN of the outer table with the inner table. When the outer table and the (sj-nest) inner table join, the primary keys of the outer table involved in semijoin are closely joined together to form a string and written to the rowids field of the weedout-tmp temporary table. There is a unique index in this field. If the write operation succeeds, it means that the current row combination appears for the first time, and the JOIN operation continues. Otherwise, the row is skipped.

SET optimizer_switch='semijoin=on,materialization=off,loosescan=off,firstmatch=off,duplicateweedout=on';

# In EXPLAIN of the traditional format, the more obvious identifier is Start temporary/End temporary. 
EXPLAIN SELECT * FROM ct1, ct2 WHERE ct1.a IN (SELECT it1.a FROM it1, it2 WHERE it2.b = ct2.b);
+----+-------------+-------+------------+-------+---------------+------+---------+------------+------+----------+------------------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref        | rows | filtered | Extra                                                |
+----+-------------+-------+------------+-------+---------------+------+---------+------------+------+----------+------------------------------------------------------+
|  1 | SIMPLE      | ct1   | NULL       | ALL   | a             | NULL | NULL    | NULL       |    8 |   100.00 | Using where; Start temporary                         |
|  1 | SIMPLE      | it1   | NULL       | ref   | a             | a    | 5       | test.ct1.a |    2 |   100.00 | Using index                                          |
|  1 | SIMPLE      | ct2   | NULL       | range | b             | b    | 5       | NULL       |    8 |   100.00 | Using index condition; Using join buffer (hash join) |
|  1 | SIMPLE      | it2   | NULL       | ref   | b             | b    | 5       | test.ct2.b |    2 |   100.00 | Using index; End temporary                           |
+----+-------------+-------+------------+-------+---------------+------+---------+------------+------+----------+------------------------------------------------------+

| -> Remove duplicate (ct1, ct2) rows using temporary table (weedout)  (cost=89.8 rows=334)
    -> Nested loop inner join  (cost=89.8 rows=334)
        -> Inner hash join (no condition)  (cost=19.8 rows=146)
            -> Index range scan on ct2 using b over (NULL < b), with index condition: (ct2.b is not null)  (cost=0.0574 rows=8)
            -> Hash
                -> Nested loop inner join  (cost=4.88 rows=18.3)
                    -> Filter: (ct1.a is not null)  (cost=1.05 rows=8)
                        -> Table scan on ct1  (cost=1.05 rows=8)
                    -> Covering index lookup on it1 using a (a=ct1.a)  (cost=0.279 rows=2.29)
        -> Covering index lookup on it2 using b (b=ct2.b)  (cost=1.32 rows=2.29)
 |

• Optimize_table_order::advance_sj_state

The condition for DuplicateWeedOut is simple, that is, all tables (outer tables and inner tables) on which sj-cond depends have already appeared in the JOIN prefix. Then the result of JOIN can be written to the temporary table for deduplication. POSITION::dupsweedout_tables records the combination of these dependency tables. POSITION::first_dupsweedout_table records the position of the first (sj-nest) inner table. This record is used to calculate the impact of the existence of inner_tables on the JOIN sequence fanout.

After the conditions are met, each time a new table is added to the JOIN prefix, the cost of DuplicateWeedOut is reconsidered, because different cardinalities generated by these JOIN prefixes result in different costs for reading and writing the temporary table. In semijoin_dupsweedout_access_paths, the access path of the table is not reconsidered, but the cost of fanout and temporary table write /read after deduplication is mainly calculated. The calculation of fanout is not very accurate, and the fanout with the maximum number of table rows is used as a limit. Call Cost_model_server::tmptable_readwrite_cost to estimate the read and write costs of the temporary table. The number of rows read is the number of cardinality rows generated by the inner and outer JOIN operations and the number of rows written is the estimated number of rows generated by the final outer table JOIN operations.

• Execution process

WeedoutIterator::Read
  |--m_source -> RowIterator::Read // Call the sub-operator to complete the JOIN process in the preorder.
  |--do_sj_dups_weedout
|-- Use memcpy to closely fill the primary key of the outer table involved.
    |--handler::ha_write_row // Write the rowids field to the temporary table for deduplication.

FirstMatch

FirstMatch is used to JOIN with the (sj-nest) inner table to get an output, skip all the subsequent data rows of inner_tables, jump directly to the position of the last outer_table in the preorder of JOIN, and read the next row for joining. The JOIN sequence range of FirstMatch must end with all consecutive inner_tables without any outer table. Suppose that the sequence is ct1, it1, ct2, and it2, if it jumps to ct2, duplicate rows will be generated due to the JOIN of it1. If it jumps to ct1, other rows that may be generated by joining the remaining data of ct1 and ct2 will be lost.

SET optimizer_switch='semijoin=on,materialization=off,loosescan=off,firstmatch=on,duplicateweedout=off';

EXPLAIN SELECT * FROM ct1, ct2 WHERE ct1.a IN (SELECT it1.a FROM it1, it2 WHERE it2.b = ct2.b);
+----+-------------+-------+------------+-------+---------------+------+---------+------------+------+----------+------------------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref        | rows | filtered | Extra                                                |
+----+-------------+-------+------------+-------+---------------+------+---------+------------+------+----------+------------------------------------------------------+
|  1 | SIMPLE      | ct1   | NULL       | ALL   | a             | NULL | NULL    | NULL       |    8 |   100.00 | Using where                                          |
|  1 | SIMPLE      | ct2   | NULL       | range | b             | b    | 5       | NULL       |    8 |   100.00 | Using index condition; Using join buffer (hash join) |
|  1 | SIMPLE      | it1   | NULL       | ref   | a             | a    | 5       | test.ct1.a |    2 |   100.00 | Using index                                          |
|  1 | SIMPLE      | it2   | NULL       | ref   | b             | b    | 5       | test.ct2.b |    2 |   100.00 | Using index; FirstMatch(ct2)                         |
+----+-------------+-------+------------+-------+---------------+------+---------+------------+------+----------+------------------------------------------------------+

| -> Nested loop semijoin  (cost=57.2 rows=334)
    -> Inner hash join (no condition)  (cost=7.7 rows=64)
        -> Index range scan on ct2 using b over (NULL < b), with index condition: (ct2.b is not null)  (cost=0.131 rows=8)
        -> Hash
            -> Filter: (ct1.a is not null)  (cost=1.05 rows=8)
                -> Table scan on ct1  (cost=1.05 rows=8)
    -> Nested loop inner join  (cost=37.4 rows=5.22)
        -> Covering index lookup on it1 using a (a=ct1.a)  (cost=0.254 rows=2.29)
        -> Covering index lookup on it2 using b (b=ct2.b)  (cost=1.32 rows=2.29)
 |

• Optimize_table_order::advance_sj_state

When all outer tables on which (sj-nest) depends are in the preorder of JOIN, all (sj-nest) inner tables appear consecutively, and the current table is the last inner table, consider the execution strategy of FirstMatch. POSITION::first_firstmatch_table stores the position of the first inner table and jumps to its previous outer table during final execution. When there is an inner table that is not consecutive, this variable is reset to avoid the consideration of FirstMatch. POSITION::dups_producing_tables represents an inner table that may JOIN duplicate data in the preorder. If this value is not 0, it means that it is in the middle of a semijoin and a new semijoin execution strategy cannot be enabled.

In Optimize_table_order::semijoin_firstmatch_loosescan_access_paths, the cost of FirstMatch is calculated and the table range is considered from the first table to the last table of inner_tables. The reason for reevaluating the access paths of these tables is that after the first row of data is generated by joining, it is possible to directly jump back to the next row of data in the last outer table. Therefore, if some inner tables select JOIN BUFFER, the cached data is wasted. Therefore, only for the inner table that selects JOIN BUFFER, JOIN BUFFER is prohibited from re-calling best_access_path to determine the access path and the cost. If there is only one inner table, the preceding outer table can use JOIN BUFFER.

For cost calculation, the optimizer believes that each inner table only needs to be accessed once on average due to the existence of the fast jump-back execution mechanism. Therefore, POSITION::read_cost is added up, the number of records that need to be evaluated is the fanout product of the inner and outer tables and the final cardinality is the fanout product of the outer table.

• Execution process

To implement FirstMatch in the executor, you only need to set the JOIN type to semijoin or some state variables on the corresponding JOIN iterator. After the inner operator JOINs a row of data, it can jump out of the inner JOIN and directly read the next row of data in the outer table. For example:

NestedLoopIterator::Read
  |--m_source_outer -> RowIterator::Read // Call the sub-operator to read a row of data in the outer table.
  |--m_source_inner -> RowIterator::Read // Read a row of data in the inner table.
|-- Generally, the INNER JOIN state is set to READING_INNER_ROWS to continuously access the data in the inner table.
|-- When the type is JoinType::SEMI, set the state to NEEDS_OUTER_ROW so that the next iteration call will read the next row of data in the outer table.

LooseScan

LooseScan is used to deduplicate by performing LooseScan on the first inner table when inner tables have indexes. Then, it is joined with the outer table. Subsequent semijoin conditions need to be based on the index field of the first inner table.

SET optimizer_switch='semijoin=on,materialization=off,loosescan=on,firstmatch=off,duplicateweedout=off';

EXPLAIN SELECT * FROM ct1, ct2 WHERE ct1.a IN (SELECT it1.a FROM it1, it2 WHERE it2.b = ct2.b);
+----+-------------+-------+------------+-------+---------------+------+---------+------------+------+----------+-------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref        | rows | filtered | Extra                               |
+----+-------------+-------+------------+-------+---------------+------+---------+------------+------+----------+-------------------------------------+
|  1 | SIMPLE      | ct1   | NULL       | ALL   | a             | NULL | NULL    | NULL       |    8 |   100.00 | Using where                         |
|  1 | SIMPLE      | it2   | NULL       | index | b             | b    | 5       | NULL       |   16 |    43.75 | Using where; Using index; LooseScan |
|  1 | SIMPLE      | it1   | NULL       | ref   | a             | a    | 5       | test.ct1.a |    2 |   100.00 | Using index; FirstMatch(it2)        |
|  1 | SIMPLE      | ct2   | NULL       | ref   | b             | b    | 5       | test.it2.b |    1 |   100.00 | NULL                                |
+----+-------------+-------+------------+-------+---------------+------+---------+------------+------+----------+-------------------------------------+

| -> Nested loop inner join  (cost=129 rows=146)
    -> Nested loop inner join  (cost=30.6 rows=128)
        -> Filter: (ct1.a is not null)  (cost=1.05 rows=8)
            -> Table scan on ct1  (cost=1.05 rows=8)
        -> Nested loop semijoin with duplicate removal on b  (cost=33.8 rows=16)
            -> Filter: (it2.b is not null)  (cost=0.119 rows=7)
                -> Index scan on it2 using b  (cost=0.119 rows=16)
            -> Covering index lookup on it1 using a (a=ct1.a)  (cost=0.252 rows=2.29)
    -> Index lookup on ct2 using b (b=it2.b)  (cost=10.5 rows=1.14)
 |

• Optimize_table_order::advance_sj_state

Conditions for possible LooseScan states:

The newly added table is the first (sj-nest) inner table.
Keyuse is not empty (with index selection information).
The tables (sj_corr_tables) that cannot be extracted as the join conditions of sj-cond are already in the JOIN prefix (because even if the result of LooseScan scan is deduplicated, it is still possible to generate multiple records by joining with the outer table in complex conditions).
There are tables on which sj-cond depends in subsequent tables.

LooseScan must be all consecutive inner_tables, followed by some outer_tables. All tables on which it depends are considered once after the preorder of JOIN. When there are multiple tables in (sj-nest), LooseScan will combine the execution mode similar to FirstMatch to set the jump position of the last inner table to the LooseScan table, that is, the first inner table (this is completely different from the execution strategy of semijoin FirstMatch mentioned above, which is to jump to the outer table). The benefit of FirstMatch is that there can be sj-cond with non-LooseScan inner tables, but the outer tables on which it depends must appear in the JOIN prefix (so that the join condition can be calculated when scanning non-LooseScan tables and the existence of FirstMatch ensures that for one record on the LooseScan table, other inner_tables will JOIN one record at most), and the remaining sj-cond only depends on the index column on the LooseScan table to ensure that JOIN does not cause data bloat.

Because of this jump-style execution mode. LooseScan does not allow JOIN BUFFER to appear before the last inner table. Thus it makes sense to cache subsequent outer table data. In semijoin_firstmatch_loosescan_access_paths, best_access_path is called to evaluate all index access paths of the LooseScan table to avoid overlooking some index prefixes due to the cost or heuristic rules, because LooseScan first focuses on feasibility.

Judgment of feasibility in semijoin_loosescan_fill_driving_table_position relies on the following concepts.

After all inner_tables JOIN, the sj-cond set that has been satisfied (all outer tables on which it depends are in the JOIN prefix) is represented by bound_sj_equalities (due to the implementation of FirstMatch, there is only one row of values that satisfy this part of sj-cond);
Traverse the Key-use of the same index. If it comes from sj-cond and depends on the subsequent tables, it is sj-cond processed based on the output column of LooseScan and is represented by handled_sj_equalities. The corresponding key_parts are represented by handled_keyparts.

Based on the preceding concepts, the conditions that need to be met can be summarized as follows:

The set of bound_sj_equalities and handled_sj_equalities forms a complete sj-cond.
The information of Keyuse (ref preorder table and LooseScan output column) forms the index prefix so that LooseScan can be used to deduplicate.
If bound_keyparts includes the first field of the index, it represents that the access path is ref and only the columns on LooseScan need to be output (the join with the preorder table is on ref). Otherwise, it needs to be a covering index so that there is no need to go back to the table to evaluate the join condition with the preorder table. In both cases, the corresponding cost values are calculated separately and the minimum Keyuse is taken as the index of LooseScan. Before execution, in setup_semijoin_dups_elimination, set firstmatch_return of the last inner table to the subscript of the LooseScan table as the jump position for FirstMatch.

• Execution process

The executor completes the goal of LooseScan in SQL. When a record is scanned and the previous row is joined, it saves the current index key and then compares the index keys when reading the next record. If the index keys are the same, it continues to read the next row. The underlying layer of InnoDB is the index scan.

NestedLoopSemiJoinWithDuplicateRemovalIterator::Init
  |--IndexScanIterator<false>::Init -> ha_innobase::index_init

NestedLoopSemiJoinWithDuplicateRemovalIterator::Read
  |--m_source_outer->Read() // Read a row in the LooseScan table.
|-- If m_deduplicate_against_previous_row is True and the index keys compared are the same, continue to read the next row of records.
  |--m_source_inner->Init() & m_source_inner->Read() // Read a row of JOIN operations in the inner table. Since for each row of LooseScan records, a row of the inner table is re-Inited and read, there is no loop reading of the inner table. In this way, FirstMatch is implemented.
|-- Mark m_deduplicate_against_previous_row as True. It represents that whether the Keys are the same needs to be compared and the current Keys to be stored.

Prepare for execution based on the determined strategy

• Optimize_table_order::fix_semijoin_strategies

In greedy_search, whenever a new table is added to the preorder of JOIN, the semijoin strategy is considered once, and the access path of the table affected by the strategy is reconsidered to calculate the best cost. Therefore, the semijoin execution strategy with the best cost may be different in the process of adding different tables. The best execution strategy acquired at each phase is recorded on POSITION of the corresponding table, as follows:

[ot1  it1  it2  ot2  ot3]
           sjX  sjY  -- sjX and sjY are different strategies

After acquiring the best JOIN ORDER, the sj-strategy at each POSITION needs to be traversed from back to front, because the back position records the best strategy within the larger JOIN prefix range. In the optimization process, the access path of the tables within the range is not recorded in POSITION (only the total cost is calculated). Here, these access paths should be persisted based on the determined execution strategy.

The execution strategy of Materialize needs to copy the POSITION information corresponding to the best JOIN ORDER of the inner table recorded in NESTED_JOIN::sjm.

MaterializeScan and FirstMatch/LooseScan respectively call semijoin_mat_scan_access_paths and semijoin_firstmatch_loosescan_access_paths to recalculate the best access paths for each table to the semijoin range and store them in JOIN::best_positions.

In addition, each execution strategy records the corresponding strategy sj_strategy/n_sj_tables to POSITION of the first table within the range, so that the semijoin strategy can be preferentially discovered when traversing POSITION in the preorder. It also resets other sj_strategy of tables within the range.

• make_join_readinfo -> setup_semijoin_dups_elimination

Traverse QEP_TAB sequentially so that you will first see the sj_strategy corresponding to the first table within the range, and make the final settings for each execution strategy.

DuplicateWeedOut creates a corresponding temporary table and a unique index on rowids in create_sj_tmp_table based on the total length of the primary key of the outer table.

FirstMatch sets QEP_TAB::firstmatch_return on the last inner table of inner tables to the last outer table before inner tables. It represents that after all inner tables JOIN a record, it directly jumps to the next row of the last outer table for execution.

LooseScan uses QEP_TAB::set_index to set the corresponding index on the first inner table. If there are multiple inner tables, LooseScan sets QEP_TAB::firstmatch_return of the last inner table to the inner table of LooseScan.

Summary

The above is the whole process of MySQL semijoin from identification to the optimizer determining the best execution strategy based on the cost. It can be seen that the implementation module of semijoin is relatively centralized and logically clear. The execution process of semijoin is similar to that of common JOIN. However, in the case of a specific strategy, subtle operations are performed at the computing layer to achieve deduplication and ensure the correctness of the semijoin results. The kernel of MySQL is extensive and profound. If there are any omissions, welcome to discuss them with the author.

References

[1] Source code mysql / mysql-server 8.0.33: https://github.com/mysql/mysql-server/tree/mysql-8.0.33
[2] MySQL Source Code Analysis: Execution Code Analysis for Semi-join Optimizing :https://zhuanlan.zhihu.com/p/382416772
[3] MariaDB Semi-join Subquery Optimizations: https://mariadb.com/kb/en/semi-join-subquery-optimizations/

Try out database products for free:

Community

A Deep Dive into MySQL - Overview of Semijoin

Flexibility of Semijoin

Prepare Phase

Collection of Semijoin candidate subqueries

The outer query pulls up the inner table to eliminate the original subqueries

Optimize Phase (Optimizer)

Cost selection for execution strategies

Materialize

MaterializeScan

MaterializeLookup

DuplicateWeedOut

FirstMatch

LooseScan

Prepare for execution based on the determined strategy

Summary

References

Read previous post:

Read next post:

ApsaraDB

You may also like

Comments

ApsaraDB

Related Products

PolarDB for MySQL

AnalyticDB for MySQL

Database for FinTech Solution

ApsaraDB RDS for MySQL