All Products
Search
Document Center

PolarDB:Configure a filter algorithm for IMCI-based queries

Last Updated:May 23, 2024

You can use the In-Memory Column Index (IMCI) pruner feature in queries to filter out data blocks that do not need to be accessed in databases for better query performance This topic describes the IMCI pruner feature.

Background information

If IMCIs are created, data in a column is stored as blocks. By default, a data block contains 64,000 rows. You can estimate the number of data blocks by using the following formula: Total number of rows in the table/64,000. All data blocks related to the specified column are scanned to obtain data that meets the filter conditions. High scan costs are generated when you query a large table. If the data in the table cannot be stored in the memory, the scan costs are further increased. You can accelerate queries by accessing statistics and specifying conditions to filter out data blocks not to be accessed. Pruners are used to implement this process In the IMCIs of PolarDB. The following types of pruners are provided:

  • Bloom filters

    This pruner type uses a BIT array to represent a collection and can determine whether an element belongs to this collection.

  • Minmax indexes

    This pruner type checks the maximum and minimum numbers of data blocks. Minmax indexes examine the maximum and minimum numbers of data blocks against filter conditions to determine whether data blocks need to be scanned.

  • token bloom filter

    This pruner type filters strings and splits strings by non-alphanumeric characters. For example, this pruner splits the string "I am IMCI" into I |am|IMCI, which is suitable for LIKE queries.

  • ngram bloom filter

    This pruner type filters strings and splits strings by a specified length. For example, if you set the value of ngram to 3, this pruner splits the string "I am IMCI" into I a| am|am |m I| IM|IMC|MCI, which is suitable for LIKE queries.

Scenarios

  • Bloom filters are suitable for equality and IN conditions, and ideal for equality conditions that have high filterability. For example, you can use string IDs to filter equality.

  • Minmax indexes are suitable for scenarios where column data has high locality, and ideal for range conditions and equality conditions, such as the WHERE condition that contains data or sort fields.

  • Token bloom filter and ngram bloom filter are suitable for LIKE queries to quickly filter missed data blocks.

Storage overheads

For string data, if you enable the query optimization feature of the columnstore index pruner, the storage overheads and memory space are consumed. You can select bloom filters, minmax indexes, token bloom filter, or ngram bloom filter for the specified columns based on your business scenario. The following formulas can be used to calculate the memory usage of bloom filters and minmax indexes:

  • bloom filter/token bloom filter/ngram bloom filter

    • The default size of a data block is 64 KB. If the ratio of distinct values to total rows is greater than 3%, the following formula is used:

      Memory usage = 1.2 × Number of columns for which bloom filters are created × Number of table rows (unit: bytes)

    • The default size of a data block is 64 KB. If the ratio of distinct values to total rows is less than or equal to 3%, the following formula is used:

      Memory usage = 1.2 × Number of columns for which bloom filters are created × Number of distinct values (unit: bytes)

      In this case, the results of bloom filters depend on data locality. Bloom filters are not suitable for scenarios where data is evenly distributed.

  • Minmax indexes

    Memory usage = 2 × Number of columns for which minmax indexes are created × (Number of table rows/Data block size) × Prefix length × Character set length

    For example, if the number of rows in a table is 2 billion, minmax indexes are created for 10 columns, the prefix length is 20, the block size is 64 KB, and the default character set utf8mb4 that is 4 bytes in length is used. Approximately, 46 MB of memory is occupied.

Usage notes

  • If the version of your PolarDB for MySQL cluster is 8.0.1.1.32 and earlier or 8.0.2.2.13 and earlier, no pruners are created for null data blocks, and the IS NULL or IS NOT NULL filter condition is not supported.

  • By default, if the version of your PolarDB for MySQL cluster is 8.0.1.1.35 or later or 8.0.2.2.16 or later, pruners are created for string fields when IMCI are being created. Least Recently Used Cache (LRU Cache) is also used to manage the memory usage of bloom filters. For a cluster that is upgraded to 8.0.1.1.35 or later or 8.0.2.2.16 or later, IMCIs must be recreated. This way, pruners can be created for string columns.

  • If the version of your PolarDB for MySQL cluster is 8.0.1.1.34 and earlier or 8.0.2.2.15 and earlier, existing pruners are resident in memory. By default, no pruners are created for string fields when IMCIs are being created.

  • If you want to create pruners for string fields, make sure that strings in these fields do not contain "\0". Example: polar\0db.

  • By default, the system creates minmax indexes for numeric data such as INT, DECIMAL, and DATETIME.

  • Minmax indexes cannot be created for the fields of the JSON and GEOMETRY types.

  • Bloom filters cannot be created for the fields of the numeric type such as INT, DECIMAL, and DATETIME, the JSON type, the BLOB type, and the TEXT type.

Syntax

You can create pruners when you create a table, or create or delete pruners on an existing table. When you create or delete pruners on an existing table, you must delete IMCIs and then recreate IMCIs.

Note
  • You can execute a DDL statement to modify the COMMENT attribute in the schema of a table to create or delete pruners for a string column in the table.

  • The COMMENT attribute at the column level always takes precedence over the COMMENT attribute at the table level.

Create pruners when you create a table

  • Create bloom filters

    • Create bloom filters for all supported fields. Sample code:

      CREATE TABLE
        t1 (
          id INT PRIMARY KEY,
          str_col1 char(10),
          str_col2 varchar(10)
        ) ENGINE InnoDB COMMENT "COLUMNAR=1 PRUNER_BLOOM=1"; /* The comment contains the pruner_bloom attribute.*/
    • Create bloom filters for a column in the table. Sample code:

      CREATE TABLE
        t1 (
          id INT PRIMARY KEY,
          str_col1 char(10) "PRUNER_BLOOM=1", /* The comment contains the pruner_bloom attribute.*/
          str_col2 varchar(10)
        ) ENGINE InnoDB COMMENT "COLUMNAR=1";
    Note

    The PRUNER_BLOOM attribute is supported for a PolarDB for MySQL 8.0 cluster whose revision version is 8.0.1.1.32 or later or 8.0.2.2.13 or later.

  • Create minmax indexes

    A string field may contain a large number of characters. To reduce the memory usage of minmax indexes, the system automatically extracts the first 20 characters of the field and up to 255 characters to compare the characters with the maximum and minimum values. You can configure the PRUNER_MINMAX attribute to specify whether to generate the string "minmax" and configure the PREFIX_LEN attribute to specify the prefix length.

    Note

    The string length is independent of the character set length. For example, the first two characters of PolarDB are Po and the first five characters are Polar.

    • Create minmax indexes for all string columns in the table. Sample code:

      CREATE TABLE
        t1 (
          id INT PRIMARY KEY,
          str_col1 char(10),
          str_col2 varchar(10)
        ) ENGINE InnoDB COMMENT "COLUMNAR=1 PRUNER_MINMAX=1 PREFIX_LEN=30"; /* The comment contains the pruner_minmax attribute and the prefix is 30 characters in length.*/
    • Create minmax indexes for a column in the table. Sample code:

      CREATE TABLE
        t1 (
          id INT PRIMARY KEY,
          str_col1 char(10) "PRUNER_MINMAX=1 PREFIX_LEN=30", /* The comment contains the pruner_minmax attribute and the prefix is 30 characters in length.*/
          str_col2 varchar the (10) "PRUNER_MINMAX=1 PREFIX_LEN=10" /*The comment contains the pruner_minmax attribute and the prefix is 10 characters in length.*/
        ) ENGINE InnoDB COMMENT "COLUMNAR=1";
    Note

    The PRUNER_MINMAX and PREFIX_LEN attributes are supported for a PolarDB for MySQL 8.0 cluster whose revision version is 8.0.1.1.32 or later or 8.0.2.2.13 or later.

  • Create token bloom filters

    • Create token bloom filters for all supported fields. Sample code:

      CREATE TABLE
        t1 (
          id INT PRIMARY KEY,
          str_col1 char(10),
          str_col2 varchar(10)
        ) ENGINE InnoDB COMMENT "COLUMNAR=1 PRUNER_TOKEN_BLOOM=1";
    • Create token bloom filters for a column in the table. Sample code:

      CREATE TABLE
        t1 (
          id INT PRIMARY KEY,
          str_col1 char(10) "PRUNER_TOKEN_BLOOM=1",
          str_col2 varchar(10)
        ) ENGINE InnoDB COMMENT "COLUMNAR=1";
    Note

    The PRUNER_TOKEN_BLOOM attribute is supported for a PolarDB for MySQL cluster that meets one of the following version requirements:

    • A PolarDB for MySQL 8.0.1 cluster whose revision version is 8.0.1.1.39 or later.

    • A PolarDB for MySQL 8.0.2 cluster whose revision version is 8.0.2.2.20 or later.

  • Create ngram bloom filters

    • Create ngram bloom filters for all supported fields. Sample code:

      CREATE TABLE
        t1 (
          id INT PRIMARY KEY,
          str_col1 char(10),
          str_col2 varchar(10)
        ) ENGINE InnoDB COMMENT "COLUMNAR=1 PRUNER_NGRAM_BLOOM=2";
    • Create ngram bloom filters for a column in a table. Sample code:

      CREATE TABLE
        t1 (
          id INT PRIMARY KEY,
          str_col1 char(10) "PRUNER_NGRAM_BLOOM=3",
          str_col2 varchar(10)
        ) ENGINE InnoDB COMMENT "COLUMNAR=1";
    Note
    • The PRUNER_TOKEN_BLOOM attribute is supported for a PolarDB for MySQL cluster that meets one of the following version requirements:

      • A PolarDB for MySQL 8.0.1 cluster whose revision version is 8.0.1.1.39 or later.

      • A PolarDB for MySQL 8.0.2 cluster whose revision version is 8.0.2.2.20 or later.

    • You can set PRUNER_NGRAM_BLOOM=N. If N is greater than or equal to 2, it is the length used for splitting strings. You need to set N to a value that is no smaller than the length of the string that is specified in the LIKE "% string %" query. Otherwise, ngram bloom filters cannot be used.

Create or delete pruners on an existing table

If you want to create or delete pruners on an existing table, you need to recreate IMCIs. Therefore, you must delete IMCIs and then recreate IMCIs. Before you create IMCIs, you must add or remove the pruner attribute to or from the comment. The following examples describe how to add or remove the PRUNER_MINMAX attribute.

  • Create pruners

    A table uses the following structure:

           Table: t1
    Create Table: CREATE TABLE `t1` (
      `id` int(11) NOT NULL,
      `str_col1` char(10) DEFAULT NULL,
      `str_col2` varchar(10) DEFAULT NULL,
      PRIMARY KEY (`id`),
      COLUMNAR INDEX (`id`,`str_col1`,`str_col2`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='COLUMNAR=1'
    • To create minmax pruners for all string columns in the t1 table, perform the following steps:

      1. Execute the following statement to delete IMCIs from the t1 table:

        ALTER TABLE t1 COMMENT = "COLUMNAR=0";
      2. Execute the following statement to create minmax pruners for all string columns in the t1 table:

        ALTER TABLE t1 COMMENT = "COLUMNAR=1 PRUNER_MINMAX=1"; 
      3. (Optional) Execute the following statement to view the table structure after pruners are created:

        SHOW CREATE TABLE t1 FULL \G

        Sample result:

        *************************** 1. row ***************************
               Table: t1
        Create Table: CREATE TABLE `t1` (
          `id` int(11) NOT NULL,
          `str_col1` char(10) DEFAULT NULL,
          `str_col2` varchar(10) DEFAULT NULL,
          PRIMARY KEY (`id`),
          COLUMNAR INDEX (`id`,`str_col1`,`str_col2`)
        ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='COLUMNAR=1 PRUNER_MINMAX=1'

        The preceding table structure shows that the PRUNER_MINMAX attribute has been added to the t1 table.

    • To create minmax pruners for the str_col1 column in the t1 table, perform the following steps:

      1. Execute the following statement to create pruners for the str_col1 column:

        ALTER TABLE t1 MODIFY COLUMN str_col1 char(10) COMMENT 'PRUNER_MINMAX=1';
      2. Execute the following statements in sequence to recreate IMCIs for the t1 table:

        ALTER TABLE t1 COMMENT = "COLUMNAR=0";
        ALTER TABLE t1 COMMENT = "COLUMNAR=1";
      3. (Optional) Execute the following statement to view the table structure after IMCIs are recreated:

        SHOW CREATE TABLE t1 FULL \G

        Sample result:

        *************************** 1. row ***************************
               Table: t1
        Create Table: CREATE TABLE `t1` (
          `id` int(11) NOT NULL,
          `str_col1` char(10) DEFAULT NULL COMMENT 'PRUNER_MINMAX=1',
          `str_col2` varchar(10) DEFAULT NULL,
          PRIMARY KEY (`id`),
          COLUMNAR INDEX (`id`,`str_col1`,`str_col2`)
        ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='COLUMNAR=1'

        The preceding table structure shows that the PRUNER_MINMAX attribute has been added to the str_col1 column.

  • Delete pruners

    • Delete the PRUNER_MINMAX attribute from a table.

      For example, a table named t1 has the following structure:

      SHOW CREATE TABLE t1 full \G
      *************************** 1. row ***************************
             Table: t1
      Create Table: CREATE TABLE `t1` (
        `id` int(11) NOT NULL,
        `str_col1` char(10) DEFAULT NULL,
        `str_col2` varchar(10) DEFAULT NULL,
        PRIMARY KEY (`id`),
        COLUMNAR INDEX (`id`,`str_col1`,`str_col2`)
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='COLUMNAR=1 PRUNER_MINMAX=1'

      To delete the PRUNER_MINMAX attribute from the t1 table, perform the following steps:

      1. Execute the following statement to delete IMCIs from the t1 table:

        ALTER TABLE t1 COMMENT = "COLUMNAR=0";
      2. Execute the following statement to set the PRUNER_MINMAX attribute in COMMENT to 0 and recreate IMCIs:

        ALTER TABLE t1 COMMENT = "COLUMNAR=1 PRUNER_MINMAX=0";
      3. (Optional) Execute the following statement to view the table structure after IMCIs are recreated:

        SHOW CREATE TABLE t1 FULL \G

        Sample result:

        *************************** 1. row ***************************
               Table: t1
        Create Table: CREATE TABLE `t1` (
          `id` int(11) NOT NULL,
          `str_col1` char(10) DEFAULT NULL,
          `str_col2` varchar(10) DEFAULT NULL,
          PRIMARY KEY (`id`),
          COLUMNAR INDEX (`id`,`str_col1`,`str_col2`)
        ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='COLUMNAR=1'

        The preceding table structure shows that the PRUNER_MINMAX attribute has been deleted from the t1 table.

    • Delete the PRUNER attribute from a column.

      For example, the PRUNER_MINMAX attribute exists on the str_col1 column in a table named t1. The table uses the following structure:

             Table: t1
      Create Table: CREATE TABLE `t1` (
        `id` int(11) NOT NULL,
        `str_col1` char(10) DEFAULT NULL COMMENT 'PRUNER_MINMAX=1',
        `str_col2` varchar(10) DEFAULT NULL,
        PRIMARY KEY (`id`),
        COLUMNAR INDEX (`id`,`str_col1`,`str_col2`)
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='COLUMNAR=1'

      To delete the PRUNER_MINMAX attribute from the str_col1 column, perform the following steps:

      1. Execute the following statement to delete the PRUNER_MINMAX attribute from the str_col1 column:

        ALTER TABLE t1 MODIFY COLUMN str_col1 char(10) COMMENT 'PRUNER_MINMAX=0';
      2. Execute the following statements in sequence to recreate IMCIs:

        ALTER TABLE t1 COMMENT = "COLUMNAR=0";
        ALTER TABLE t1 COMMENT = "COLUMNAR=1";
      3. (Optional) Execute the following statement to view the table structure after IMCIs are recreated:

        SHOW CREATE TABLE t1 FULL \G

        Sample result:

        *************************** 1. row ***************************
               Table: t1
        Create Table: CREATE TABLE `t1` (
          `id` int(11) NOT NULL,
          `str_col1` char(10) DEFAULT NULL,
          `str_col2` varchar(10) DEFAULT NULL,
          PRIMARY KEY (`id`),
          COLUMNAR INDEX (`id`,`str_col1`,`str_col2`)
        ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='COLUMNAR=1'

        The preceding table structure shows that the PRUNER_MINMAX attribute has been deleted from the str_col1 column.

Check whether the IMCI pruner feature takes effect

  • Check whether pruners are created for a string column in a table

    You can check whether pruners are created for a string column in a table named imci_secondary_indexes based on the status values of the STR_BLOOM_PRUNER and STR_MINMAX_PRUNER fields in the table. If the fields are set to 1, pruners are created for the column. Examples:

    SELECT * FROM information_schema.imci_secondary_indexes WHERE schema_name='test_tmp' AND table_name='t1'\G

    Sample result:

    *************************** 1. row ***************************
             TABLE_ID: 1091
          SCHEMA_NAME: test_tmp
           TABLE_NAME: t1
          COLUMN_NAME: str_col1
     STR_BLOOM_PRUNER: 1 -- Bloom filters are created for the str_col1 column.
    STR_MINMAX_PRUNER: 1 -- Minmax indexes are created for the str_col1 column.
        SINDEX_SWITCH: 0
    *************************** 2. row ***************************
             TABLE_ID: 1091
          SCHEMA_NAME: test_tmp
           TABLE_NAME: t1
          COLUMN_NAME: str_col2
     STR_BLOOM_PRUNER: 1 -- Bloom filters are created for the str_col2 column.
    STR_MINMAX_PRUNER: 1 -- Minmax indexes are created for the str_col2 column.
        SINDEX_SWITCH: 0
    2 rows in set (0.00 sec)

    The preceding results show that the STR_BLOOM_PRUNER field is set to 1. This indicates that bloom filters are created for the str_col1 and str_col2 columns. The STR_MINMAX_PRUNER field is also set to 1, which indicates that minmax indexes are created for the str_col1 and str_col2 columns.

  • Check whether the IMCI pruner feature takes effect on an SQL statement

    Before and after you execute an SQL statement, you can execute the SHOW STATUS LIKE 'imci_pruner%' statement to view the filtering of data blocks and check whether the IMCI pruner feature takes effect on the statement. The result contains the following status values:

    • imci_pruner_accepted: the number of data blocks that meet the filter conditions.

    • imci_pruner_rejected: the number of data blocks that do not meet the filter conditions.

    Number of skipped data blocks = Number of accepted data blocks + Number of rejected data blocks

    Accepted data blocks do not need to be filtered by conditions. If some columns need to be materialized, the accepted data blocks are still accessed. Rejected data blocks are not scanned. Therefore, no I/O operations are performed.

    Examples

    The following example describes how to check whether the IMCI pruner feature takes effect on an SQL statement executed on a table named t1. The t1 table uses the following structure:

           Table: t1
    Create Table: CREATE TABLE `t1` (
      `id` int(11) NOT NULL,
      `str_col1` char(10) DEFAULT NULL,
      `str_col2` varchar(10) DEFAULT NULL,
      PRIMARY KEY (`id`),
      COLUMNAR INDEX (`id`,`str_col1`,`str_col2`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='COLUMNAR=1 PRUNER=1'

    For example, the table contains 10 data blocks and records that meet the str_col1='polardb' condition are included in one of the data blocks. Perform the following steps to check whether the IMCI pruner feature takes effect on the SELECT COUNT(1) FROM t1 WHERE str_col1='polardb' statement.

    1. Execute the following statement to view the pruner status:

      SHOW STATUS LIKE  'imci_pruner%';

      Sample result:

      +----------------------+-------+
      | Variable_name        | Value |
      +----------------------+-------+
      | imci_pruner_accepted | 0     |
      | imci_pruner_rejected | 0     |
      +----------------------+-------+
      2 rows in set (0.00 sec)
    2. Execute the following statement to view the number of records that meet the str_col1='polardb' condition:

      SELECT COUNT(1) FROM t1 WHERE str_col1='polardb';

      Sample result:

      +----------+
      | count(1) |
      +----------+
      |        1 |
      +----------+
      1 row in set (0.01 sec)
    3. Execute the following statement to view the pruner status again:

      SHOW STATUS LIKE  'imci_pruner%';

      Sample result:

      +----------------------+-------+
      | Variable_name        | Value |
      +----------------------+-------+
      | imci_pruner_accepted | 0     |
      | imci_pruner_rejected | 9     |
      +----------------------+-------+
      2 rows in set (0.00 sec)

      The result shows that the imci_pruner_accepted value is 0 and the imci_pruner_rejected value is 9. Nine data blocks are skipped during the query. The IMCI pruner feature takes effect on the query.

Performance test

The test uses a table that contains 120 million rows of data and approximately 1,800 data blocks and a cluster that has 2 cores and 4 GB of memory to compare the performance before and after bloom filters are created for the col column in the table. The distinct value for the col column of the string type is 80 million. Execute the following statement:

SELECT COUNT(1) FROM t1 WHERE col='xxx'

The following table lists the query time.

Bloom filters created

Bloom filters not created

0.15s

18.6s

The col='xxx' condition in the statement and the bloom filters prevent most data blocks from being scanned. Only a few data blocks are scanned in the actual query process, which improves the query performance.