In which scenarios can LEFT JOIN be changed to RIGHT JOIN to optimize query performance - AnalyticDB

LEFT JOIN is a type of outer join that returns all records from the left table and the matching records from the right table. The hash join algorithm uses the right table to build a hash table, and LEFT JOIN does not re-order the left and right tables. This way, large amounts of data in the right table may cause issues such as slow execution and excessive memory consumption. This topic uses examples to describe the scenarios in which LEFT JOIN can be replaced with RIGHT JOIN.

Background information

By default, AnalyticDB for MySQL uses hash joins to join tables. The hash join algorithm uses the right table to build a hash table. This consumes a large amount of resources. Different from inner joins, outer joins that include LEFT JOIN and RIGHT JOIN cannot exchange the order of left and right tables. In scenarios in which the right table contains large amounts of data, slow execution and high memory consumption may occur. In extreme scenarios, the cluster performance is affected or the Out of Memory Pool size pre cal error message is returned. In such cases, you can use the optimization methods that are provided in this topic to reduce resource consumption.

Scenarios

You can change LEFT JOIN to RIGHT JOIN by modifying an SQL statement or adding a hint before the statement. This way, the original left table is changed to the right table to build a hash table. If the new right table contains a large amount of data, performance is affected. Therefore, we recommend that you change LEFT JOIN to RIGHT JOIN in the case of a smaller left table and a larger right table.

The size comparison between tables is related to the joined columns and cluster resources. You can execute the EXPLAIN ANALYZE statement to query the parameters of an execution plan, and determine whether RIGHT JOIN can be used based on the changes of parameters such as PeakMemory and WallTime. For more information, see Use the EXPLAIN and EXPLAIN ANALYZE commands to analyze execution plans.

Methods

You can use one of the following methods to change LEFT JOIN to RIGHT JOIN:

Modify an SQL statement. For example, change a left join b on a.col1 = b.col2 to b right join a on a.col1 = b.col2.
Add a hint before an SQL statement to specify an optimizer. The optimizer determines whether to change LEFT JOIN to RIGHT JOIN based on the estimated sizes of the left and right tables.
- For AnalyticDB for MySQL clusters of V3.1.8 or later, the table change feature is enabled by default. If this feature is disabled, add the following hint before the SQL statement to enable the feature: /*+O_CBO_RULE_SWAP_OUTER_JOIN=true*/.
- For AnalyticDB for MySQL clusters earlier than V3.1.8, the table change feature is disabled by default. Add the following hint before the SQL statement to enable the feature: /*+LEFT_TO_RIGHT_ENABLED=true*/.

Note

For information about how to query the minor version of a cluster, see How do I query the version of an AnalyticDB for MySQL cluster? To update the minor version of a cluster, contact technical support.

Examples

In this example, nation is a small table that has 25 rows, and customer is a large table that has 15,000,000 rows. You can execute the EXPLAIN ANALYZE statement to query the execution plan of an SQL statement that contains LEFT JOIN.

explain analyze
SELECT
  COUNT(*)
FROM
  nation t1
  left JOIN customer t2 ON t1.n_nationkey = t2.c_nationkey

The following query result shows the execution plan of stage 2 in which joins are performed. The LEFT JOIN operator contains the following information:

PeakMemory: 515MB (93.68%), WallTime: 4.34s (43.05%): The peak memory usage is 93.68%. This indicates that LEFT JOIN is the performance bottleneck of the SQL statement.
Left (probe) Input avg.: 0.52 rows; Right (build) Input avg.: 312500.00 rows: The right table is a large table, and the left table is a small table.

In this scenario, you can change LEFT JOIN to RIGHT JOIN to optimize the SQL statement.

Fragment 2 [HASH]
    Output: 48 rows (432B), PeakMemory: 516MB, WallTime: 6.52us, Input: 15000025 rows (200.27MB); per task: avg.: 2500004.17 std.dev.: 2410891.74
    Output layout: [count_0_2]
    Output partitioning: SINGLE []
    Aggregate(PARTIAL)
    │   Outputs: [count_0_2:bigint]
    │   Estimates: {rows: ? (?)}
    │   Output: 96 rows (864B), PeakMemory: 96B (0.00%), WallTime: 88.21ms (0.88%)
    │   count_2 := count(*)
    └─ LEFT Join[(`n_nationkey` = `c_nationkey`)][$hashvalue, $hashvalue_0_4]
       │   Outputs: []
       │   Estimates: {rows: 15000000 (0B)}
       │   Output: 30000000 rows (200.27MB), PeakMemory: 515MB (93.68%), WallTime: 4.34s (43.05%)
       │   Left (probe) Input avg.: 0.52 rows, Input std.dev.: 379.96%
       │   Right (build) Input avg.: 312500.00 rows, Input std.dev.: 380.00%
       │   Distribution: PARTITIONED
       ├─ RemoteSource[3]
       │      Outputs: [n_nationkey:integer, $hashvalue:bigint]
       │      Estimates: 
       │      Output: 25 rows (350B), PeakMemory: 64KB (0.01%), WallTime: 63.63us (0.00%)
       │      Input avg.: 0.52 rows, Input std.dev.: 379.96%
       └─ LocalExchange[HASH][$hashvalue_0_4] ("c_nationkey")
          │   Outputs: [c_nationkey:integer, $hashvalue_0_4:bigint]
          │   Estimates: {rows: 15000000 (57.22MB)}
          │   Output: 30000000 rows (400.54MB), PeakMemory: 10MB (1.84%), WallTime: 1.81s (17.93%)
          └─ RemoteSource[4]
                 Outputs: [c_nationkey:integer, $hashvalue_0_5:bigint]
                 Estimates: 
                 Output: 15000000 rows (200.27MB), PeakMemory: 3MB (0.67%), WallTime: 191.32ms (1.90%)
                 Input avg.: 312500.00 rows, Input std.dev.: 380.00%

Modify the SQL statement to change LEFT JOIN to RIGHT JOIN

SELECT
  COUNT(*)
FROM
  customer t2
  right JOIN nation t1 ON t1.n_nationkey = t2.c_nationkey

Add a hint before the SQL statement to change LEFT JOIN to RIGHT JOIN

For AnalyticDB for MySQL clusters of V3.1.8 or later, execute the following statement:

/*+O_CBO_RULE_SWAP_OUTER_JOIN=true*/
SELECT
  COUNT(*)
FROM
  nation t1
  left JOIN customer t2 ON t1.n_nationkey = t2.c_nationkey

For AnalyticDB for MySQL clusters earlier than V3.1.8, execute the following statement:

/*+LEFT_TO_RIGHT_ENABLED=true*/
SELECT
  COUNT(*)
FROM
  nation t1
  left JOIN customer t2 ON t1.n_nationkey = t2.c_nationkey

After you execute the EXPLAIN ANALYZE statement on the preceding SQL statements, you can see that LEFT JOIN is changed to RIGHT JOIN in the execution plan. In this case, the hint takes effect. The value of the PeakMemory parameter is reduced from 515 MB to 889 KB. RIGHT JOIN does not consume large amounts of resources.

Fragment 2 [HASH]
    Output: 96 rows (864B), PeakMemory: 12MB, WallTime: 4.27us, Input: 15000025 rows (200.27MB); per task: avg.: 2500004.17 std.dev.: 2410891.74
    Output layout: [count_0_2]
    Output partitioning: SINGLE []
    Aggregate(PARTIAL)
    │   Outputs: [count_0_2:bigint]
    │   Estimates: {rows: ? (?)}
    │   Output: 192 rows (1.69kB), PeakMemory: 456B (0.00%), WallTime: 5.31ms (0.08%)
    │   count_2 := count(*)
    └─ RIGHT Join[(`c_nationkey` = `n_nationkey`)][$hashvalue, $hashvalue_0_4]
       │   Outputs: []
       │   Estimates: {rows: 15000000 (0B)}
       │   Output: 15000025 rows (350B), PeakMemory: 889KB (3.31%), WallTime: 3.15s (48.66%)
       │   Left (probe) Input avg.: 312500.00 rows, Input std.dev.: 380.00%
       │   Right (build) Input avg.: 0.52 rows, Input std.dev.: 379.96%
       │   Distribution: PARTITIONED
       ├─ RemoteSource[3]
       │      Outputs: [c_nationkey:integer, $hashvalue:bigint]
       │      Estimates: 
       │      Output: 15000000 rows (200.27MB), PeakMemory: 3MB (15.07%), WallTime: 634.81ms (9.81%)
       │      Input avg.: 312500.00 rows, Input std.dev.: 380.00%
       └─ LocalExchange[HASH][$hashvalue_0_4] ("n_nationkey")
          │   Outputs: [n_nationkey:integer, $hashvalue_0_4:bigint]
          │   Estimates: {rows: 25 (100B)}
          │   Output: 50 rows (700B), PeakMemory: 461KB (1.71%), WallTime: 942.37us (0.01%)
          └─ RemoteSource[4]
                 Outputs: [n_nationkey:integer, $hashvalue_0_5:bigint]
                 Estimates: 
                 Output: 25 rows (350B), PeakMemory: 64KB (0.24%), WallTime: 76.34us (0.00%)
                 Input avg.: 0.52 rows, Input std.dev.: 379.96%