All Products
Search
Document Center

AnalyticDB:Use primary and foreign key constraints to eliminate unnecessary joins

Last Updated:Nov 04, 2024

This topic describes how to use the constraint relationships between primary and foreign keys to eliminate unnecessary joins and optimize execution plans.

Prerequisites

An AnalyticDB for MySQL cluster of V3.1.10 or later is created.

Note

To query the minor version of an AnalyticDB for MySQL Data Lakehouse Edition cluster, execute the SELECT adb_version(); statement. To update the minor version of a cluster, contact technical support.

Overview

In most cases, large volumes of data are stored in large-scale database tables to perform complex query and analysis operations. You can use the JOIN statement to combine rows of two or more tables. However, join operations may degrade query performance. To optimize query performance, you can use join elimination. One of the optimization methods is to eliminate unnecessary joins by using primary and foreign key constraints. The constraints provide information about the relationships between tables to eliminate unnecessary joins, reduce query time, and improve database performance.

AnalyticDB for MySQL allows you to use the /*+ PK_FK_DEPENDENCY_ENABLED*/ hint to enable or disable the feature of using the primary and foreign key constraint information to eliminate unnecessary joins.

  • /*+ PK_FK_DEPENDENCY_ENABLED = true*/: enables the feature of obtaining the primary and foreign key constraint information. After the feature is enabled, the system applies join elimination rules based on the primary and foreign key constraint information.

  • /*+ PK_FK_DEPENDENCY_ENABLED = false*/: disables the feature of obtaining the primary and foreign key constraint information. After the feature is disabled, the system cannot apply join elimination rules based on the primary and foreign key constraint information.

Note
  • PK: the primary key.

  • FK: the foreign key.

Procedure

  1. Declare foreign keys.

  2. Apply join elimination rules.

Declare foreign keys

  • When you create a table, use the CREATE TABLE statement to add a foreign key.

  • For an existing table, use the ALTER TABLE statement to add or remove a foreign key.

Important
  • You cannot use multiple columns as a foreign key. Example: FOREIGN KEY (sr_item_sk, sr_ticket_number) REFERENCES store_sales(ss_item_sk,d_date_sk).

  • You must check the data constraint relationships between primary keys and foreign keys. AnalyticDB for MySQL does not check data constraints.

  • When you use the DROP TABLE statement to remove foreign keys, AnalyticDB for MySQL does not check this operation or report an error.

  • You cannot add foreign key constraints for external tables.

Use the CREATE TABLE statement to add a foreign key

Syntax

For information about the syntax, see CREATE TABLE.

Example

  1. Create a database named db and create a table named item in the database. Set the primary key to i_item_sk.

    CREATE DATABASE db;
    USE db;
    CREATE TABLE item
    (
      i_item_sk bigint NOT NULL,
      i_current_price bigint,
      PRIMARY KEY(i_item_sk)
    )
    DISTRIBUTED BY HASH(i_item_sk);
  2. Use the CREATE TABLE statement to add a foreign key.

    • Add a foreign key for the same database.

      Create a table named store_returns in the db database. Use the FOREIGN KEY clause to associate the sr_item_sk column of the store_returns table with the primary key column i_item_sk of the item table.

      CREATE TABLE store_returns
      (
        sr_sale_id bigint,
        sr_store_sk bigint,
        sr_item_sk bigint NOT NULL,
        FOREIGN KEY (sr_item_sk) REFERENCES item (i_item_sk)
      );
      Note

      The symbol parameter is not specified. In this case, the parser automatically uses sr_item_sk_fk as the name of the foreign key constraint.

    • Add a foreign key for another database.

      Create a database named db2, create a table named store_sales in the database, and then add a foreign key column named ss_item_sk.

      CREATE DATABASE db2;
      USE db2;
      CREATE TABLE store_sales
      (
        ss_sale_id bigint,
        ss_store_sk bigint,
        ss_item_sk bigint not null,
        CONSTRAINT fk_constraint FOREIGN KEY (ss_item_sk) REFERENCES db.item (i_item_sk)
      );

Use the CREATE TABLE statement to declare multiple foreign keys for the same table

Example:

  1. Create two tables named customer and vendor.

    USE db;
    CREATE TABLE customer
    (
      i_customer_sk bigint NOT NULL,
      i_current_price bigint,
      PRIMARY KEY(i_customer_sk)
    )
    DISTRIBUTED BY HASH(i_customer_sk);
    
    CREATE TABLE vendor
    (
      id bigint primary key,
      name varchar(5) not null
    );
  2. Create a table named store_product. Associate the sr_sale_id column of the store_product table with the primary key column id of the vendor table. Then, associate the sr_customer_sk column of the store_product table with the primary key column i_customer_sk of the customer table.

    CREATE TABLE store_product
    (
      sr_sale_id bigint,
      sr_store_sk bigint,
      sr_customer_sk bigint NOT NULL,
      FOREIGN KEY (sr_sale_id) REFERENCES vendor (id),
      FOREIGN KEY (sr_customer_sk) REFERENCES customer (i_customer_sk)
    );

Query foreign keys

Use the SHOW CREATE TABLE statement to query foreign keys.

Example:

USE db;
SHOW CREATE TABLE store_returns;

Sample result:

-- Result (other statements omitted):
-- CONSTRAINT `sr_item_sk_fk` FOREIGN KEY (`sr_item_sk`) REFERENCES `db`.`item`(`i_item_sk`)

Use the ALTER TABLE statement to add or remove a foreign key

Syntax

For information about the syntax, see ALTER TABLE.

Examples

In the following examples, a database named db is used. The table for which you want to add a foreign key is named store_returns. The primary table is named item.

  • Remove a foreign key from the store_returns table.

    USE db;
    ALTER TABLE store_returns DROP FOREIGN KEY sr_item_sk_fk;
  • Add a foreign key for the store_returns table.

    USE db;
    ALTER TABLE store_returns ADD CONSTRAINT sr_item_fk FOREIGN KEY (sr_item_sk) REFERENCES item (i_item_sk);

Use the ALTER TABLE statement to declare multiple foreign keys for the same table

You can use one ALTER TABLE statement to add a foreign key for a table at a time. To add multiple foreign keys for a table, execute multiple ALTER TABLE statements.

Example:

  1. Create a table named store in the db2 database.

    USE db2;
    CREATE TABLE store
    (
      id bigint primary key,
      name varchar(5) not null
    );
  2. Add a foreign key for the store_returns table in the db database.

    ALTER TABLE db.store_returns ADD FOREIGN KEY (sr_store_sk) REFERENCES store(id);

Apply join elimination rules

If you want to query data from only one of the joined tables, you can use primary and foreign key constraints to eliminate unnecessary joins, simplify execution plans, and improve query performance. Join elimination is commonly used in the following scenarios:

Two tables in the same database are joined

For example, you want to query data from the store_returns table after you perform a join operation on the store_returns and item tables.

/*+ PK_FK_DEPENDENCY_ENABLED = true*/
EXPLAIN
SELECT
  s.sr_sale_id,
  s.sr_store_sk,
  s.sr_item_sk
FROM
  store_returns s,
  item
WHERE
  sr_item_sk = i_item_sk;

Sample result:

+---------------+
| Plan Summary  |
+---------------+
 1- Output[ Query plan ] {Est rowCount: 1.0}
 2    -> Exchange[GATHER] {Est rowCount: 1.0}
 3        - TableScan {table: store_returns, Est rowCount: 1.0} 

The execution plan does not contain the INNER JOIN operator. In this case, the optimizer can use the constraint relationship between the sr_item_sk and i_item_sk columns to eliminate the join between the store_returns and item tables.

Two tables across databases are joined

For example, you want to query data from the store_sales table after you perform a join operation on the item table in the db database and the store_sales table in the db2 database.

USE db2;
-- Sample query statement
/*+ PK_FK_DEPENDENCY_ENABLED = true*/
EXPLAIN
SELECT
  s.ss_sale_id,
  s.ss_item_sk
FROM
  store_sales s,
  db.item
WHERE
  ss_item_sk = i_item_sk;

Sample result:

+---------------+
| Plan Summary  |
+---------------+
 1- Output[ Query plan ] {Est rowCount: 1.0}
 2    -> Exchange[GATHER] {Est rowCount: 1.0}
 3        - TableScan {table: store_sales, Est rowCount: 1.0} 

Multiple tables are joined

For example, you want to query data from the store_returns table after you perform a join operation on the store_returns, item, and store tables.

USE db;
-- Sample query statement
/*+ PK_FK_DEPENDENCY_ENABLED = true*/
EXPLAIN
SELECT
  s.sr_sale_id,
  s.sr_store_sk,
  s.sr_item_sk
FROM
  store_returns s,
  item,
  db2.store
WHERE
  sr_item_sk = i_item_sk
  AND sr_store_sk = id;

Sample result:

+---------------+
| Plan Summary  |
+---------------+
 1- Output[ Query plan ] {Est rowCount: 1.0}
 2    -> Exchange[GATHER] {Est rowCount: 1.0}
 3        - TableScan {table: store_returns, Est rowCount: 1.0} 

A view is created based on joined tables

A view may be created based on two or more tables that are joined. When you use the view in another query statement that does not reference specific information about the view, the optimizer can eliminate unnecessary joins. For example, create a view that contains information about the store_returns and item tables.

CREATE VIEW sr_item_v AS
SELECT
  s.sr_store_sk AS store_name,
  s.sr_sale_id AS sale_id,
  s.sr_item_sk AS sr_item_id,
  item.i_current_price AS item_price,
  item.i_item_sk as item_id
FROM
  store_returns s,
  item
WHERE
  sr_item_sk = i_item_sk;

If a query does not reference the item_price column of the item table, the optimizer can scan only the store_returns table, not the item table.

/*+ PK_FK_DEPENDENCY_ENABLED = true*/
EXPLAIN
SELECT store_name, sr_item_id, sale_id
FROM sr_item_v;

Sample result:

+---------------+
| Plan Summary  |
+---------------+
 1- Output[ Query plan ] {Est rowCount: 1.0}
 2    -> Exchange[GATHER] {Est rowCount: 1.0}
 3        - TableScan {table: store_returns, Est rowCount: 1.0} 

A foreign key constraint exists between the item.i_item_sk and store_returns.sr_item_sk columns. The optimizer can replace the reference to the item.i_item_sk column with the reference to the store_returns.sr_item_sk column. In this case, the optimizer can query data from columns of the store_returns table and eliminate unnecessary joins.

Exceptional case of join elimination

In the preceding view example, if a query references columns of the item table, you cannot eliminate joins.

-- Sample query statement
/*+ PK_FK_DEPENDENCY_ENABLED = true*/
EXPLAIN
SELECT store_name, sr_item_id, sale_id, item_price
FROM sr_item_v;

Sample result:

+---------------+
| Plan Summary  |
+---------------+
 1- Output[ Query plan ] {Est rowCount: 1.0}
 2    -> Exchange[GATHER] {Est rowCount: 1.0}
 3        -> InnerJoin[Hash Join] {Est rowCount: 1.0}
 4            -> Project {Est rowCount: 1.0} 
 5                -> Exchange[REPARTITION] {Est rowCount: 1.0}
 6                    - TableScan {table: store_returns, Est rowCount: 1.0} 
 7            -> LocalExchange[HASH] {Est rowCount: 1.0}
 8                -> ScanProject {table: item, Est rowCount: 1.0} 
 9                    - TableScan {table: item, Est rowCount: 1.0}