You can use the In-Memory Column Index (IMCI) pruner feature in queries to filter out data blocks that do not need to be accessed in databases for better query performance This topic describes the IMCI pruner feature.
Background information
If IMCIs are created, data in a column is stored as blocks. By default, a data block contains 64,000 rows. You can estimate the number of data blocks by using the following formula: Total number of rows in the table/64,000
. All data blocks related to the specified column are scanned to obtain data that meets the filter conditions. High scan costs are generated when you query a large table. If the data in the table cannot be stored in the memory, the scan costs are further increased. You can accelerate queries by accessing statistics and specifying conditions to filter out data blocks not to be accessed. Pruners are used to implement this process In the IMCIs of PolarDB. The following types of pruners are provided:
Bloom filters
This pruner type uses a BIT array to represent a collection and can determine whether an element belongs to this collection.
Minmax indexes
This pruner type checks the maximum and minimum numbers of data blocks. Minmax indexes examine the maximum and minimum numbers of data blocks against filter conditions to determine whether data blocks need to be scanned.
token bloom filter
This pruner type filters strings and splits strings by non-alphanumeric characters. For example, this pruner splits the string "I am IMCI" into
I |am|IMCI
, which is suitable for LIKE queries.ngram bloom filter
This pruner type filters strings and splits strings by a specified length. For example, if you set the value of ngram to 3, this pruner splits the string "I am IMCI" into
I a| am|am |m I| IM|IMC|MCI
, which is suitable for LIKE queries.
Scenarios
Bloom filters are suitable for equality and IN conditions, and ideal for equality conditions that have high filterability. For example, you can use string IDs to filter equality.
Minmax indexes are suitable for scenarios where column data has high locality, and ideal for range conditions and equality conditions, such as the WHERE condition that contains data or sort fields.
Token bloom filter and ngram bloom filter are suitable for LIKE queries to quickly filter missed data blocks.
Storage overheads
For string data, if you enable the query optimization feature of the columnstore index pruner, the storage overheads and memory space are consumed. You can select bloom filters, minmax indexes, token bloom filter, or ngram bloom filter for the specified columns based on your business scenario. The following formulas can be used to calculate the memory usage of bloom filters and minmax indexes:
bloom filter/token bloom filter/ngram bloom filter
The default size of a data block is 64 KB. If the ratio of distinct values to total rows is greater than 3%, the following formula is used:
Memory usage = 1.2 × Number of columns for which bloom filters are created × Number of table rows (unit: bytes)
The default size of a data block is 64 KB. If the ratio of distinct values to total rows is less than or equal to 3%, the following formula is used:
Memory usage = 1.2 × Number of columns for which bloom filters are created × Number of distinct values (unit: bytes)
In this case, the results of bloom filters depend on data locality. Bloom filters are not suitable for scenarios where data is evenly distributed.
Minmax indexes
Memory usage = 2 × Number of columns for which minmax indexes are created × (Number of table rows/Data block size) × Prefix length × Character set length
For example, if the number of rows in a table is 2 billion, minmax indexes are created for 10 columns, the prefix length is 20, the block size is 64 KB, and the default character set utf8mb4 that is 4 bytes in length is used. Approximately, 46 MB of memory is occupied.
Usage notes
If the version of your PolarDB for MySQL cluster is 8.0.1.1.32 and earlier or 8.0.2.2.13 and earlier, no pruners are created for null data blocks, and the
IS NULL
orIS NOT NULL
filter condition is not supported.By default, if the version of your PolarDB for MySQL cluster is 8.0.1.1.35 or later or 8.0.2.2.16 or later, pruners are created for string fields when IMCI are being created. Least Recently Used Cache (LRU Cache) is also used to manage the memory usage of bloom filters. For a cluster that is upgraded to 8.0.1.1.35 or later or 8.0.2.2.16 or later, IMCIs must be recreated. This way, pruners can be created for string columns.
If the version of your PolarDB for MySQL cluster is 8.0.1.1.34 and earlier or 8.0.2.2.15 and earlier, existing pruners are resident in memory. By default, no pruners are created for string fields when IMCIs are being created.
If you want to create pruners for string fields, make sure that strings in these fields do not contain "\0". Example: polar\0db.
By default, the system creates minmax indexes for numeric data such as INT, DECIMAL, and DATETIME.
Minmax indexes cannot be created for the fields of the JSON and GEOMETRY types.
Bloom filters cannot be created for the fields of the numeric type such as INT, DECIMAL, and DATETIME, the JSON type, the BLOB type, and the TEXT type.
Syntax
You can create pruners when you create a table, or create or delete pruners on an existing table. When you create or delete pruners on an existing table, you must delete IMCIs and then recreate IMCIs.
You can execute a DDL statement to modify the COMMENT attribute in the schema of a table to create or delete pruners for a string column in the table.
The COMMENT attribute at the column level always takes precedence over the COMMENT attribute at the table level.
Create pruners when you create a table
Create bloom filters
Create bloom filters for all supported fields. Sample code:
CREATE TABLE t1 ( id INT PRIMARY KEY, str_col1 char(10), str_col2 varchar(10) ) ENGINE InnoDB COMMENT "COLUMNAR=1 PRUNER_BLOOM=1"; /* The comment contains the pruner_bloom attribute.*/
Create bloom filters for a column in the table. Sample code:
CREATE TABLE t1 ( id INT PRIMARY KEY, str_col1 char(10) "PRUNER_BLOOM=1", /* The comment contains the pruner_bloom attribute.*/ str_col2 varchar(10) ) ENGINE InnoDB COMMENT "COLUMNAR=1";
NoteThe
PRUNER_BLOOM
attribute is supported for a PolarDB for MySQL 8.0 cluster whose revision version is 8.0.1.1.32 or later or 8.0.2.2.13 or later.Create minmax indexes
A string field may contain a large number of characters. To reduce the memory usage of minmax indexes, the system automatically extracts the first 20 characters of the field and up to 255 characters to compare the characters with the maximum and minimum values. You can configure the PRUNER_MINMAX attribute to specify whether to generate the string "minmax" and configure the PREFIX_LEN attribute to specify the prefix length.
NoteThe string length is independent of the character set length. For example, the first two characters of PolarDB are Po and the first five characters are Polar.
Create minmax indexes for all string columns in the table. Sample code:
CREATE TABLE t1 ( id INT PRIMARY KEY, str_col1 char(10), str_col2 varchar(10) ) ENGINE InnoDB COMMENT "COLUMNAR=1 PRUNER_MINMAX=1 PREFIX_LEN=30"; /* The comment contains the pruner_minmax attribute and the prefix is 30 characters in length.*/
Create minmax indexes for a column in the table. Sample code:
CREATE TABLE t1 ( id INT PRIMARY KEY, str_col1 char(10) "PRUNER_MINMAX=1 PREFIX_LEN=30", /* The comment contains the pruner_minmax attribute and the prefix is 30 characters in length.*/ str_col2 varchar the (10) "PRUNER_MINMAX=1 PREFIX_LEN=10" /*The comment contains the pruner_minmax attribute and the prefix is 10 characters in length.*/ ) ENGINE InnoDB COMMENT "COLUMNAR=1";
NoteThe
PRUNER_MINMAX
andPREFIX_LEN
attributes are supported for a PolarDB for MySQL 8.0 cluster whose revision version is 8.0.1.1.32 or later or 8.0.2.2.13 or later.Create token bloom filters
Create token bloom filters for all supported fields. Sample code:
CREATE TABLE t1 ( id INT PRIMARY KEY, str_col1 char(10), str_col2 varchar(10) ) ENGINE InnoDB COMMENT "COLUMNAR=1 PRUNER_TOKEN_BLOOM=1";
Create token bloom filters for a column in the table. Sample code:
CREATE TABLE t1 ( id INT PRIMARY KEY, str_col1 char(10) "PRUNER_TOKEN_BLOOM=1", str_col2 varchar(10) ) ENGINE InnoDB COMMENT "COLUMNAR=1";
NoteThe
PRUNER_TOKEN_BLOOM
attribute is supported for a PolarDB for MySQL cluster that meets one of the following version requirements:A PolarDB for MySQL 8.0.1 cluster whose revision version is 8.0.1.1.39 or later.
A PolarDB for MySQL 8.0.2 cluster whose revision version is 8.0.2.2.20 or later.
Create ngram bloom filters
Create ngram bloom filters for all supported fields. Sample code:
CREATE TABLE t1 ( id INT PRIMARY KEY, str_col1 char(10), str_col2 varchar(10) ) ENGINE InnoDB COMMENT "COLUMNAR=1 PRUNER_NGRAM_BLOOM=2";
Create ngram bloom filters for a column in a table. Sample code:
CREATE TABLE t1 ( id INT PRIMARY KEY, str_col1 char(10) "PRUNER_NGRAM_BLOOM=3", str_col2 varchar(10) ) ENGINE InnoDB COMMENT "COLUMNAR=1";
NoteThe
PRUNER_TOKEN_BLOOM
attribute is supported for a PolarDB for MySQL cluster that meets one of the following version requirements:A PolarDB for MySQL 8.0.1 cluster whose revision version is 8.0.1.1.39 or later.
A PolarDB for MySQL 8.0.2 cluster whose revision version is 8.0.2.2.20 or later.
You can set
PRUNER_NGRAM_BLOOM=N
. If N is greater than or equal to 2, it is the length used for splitting strings. You need to set N to a value that is no smaller than the length of the string that is specified in theLIKE "% string %"
query. Otherwise, ngram bloom filters cannot be used.
Create or delete pruners on an existing table
If you want to create or delete pruners on an existing table, you need to recreate IMCIs. Therefore, you must delete IMCIs and then recreate IMCIs. Before you create IMCIs, you must add or remove the pruner attribute to or from the comment. The following examples describe how to add or remove the PRUNER_MINMAX
attribute.
Create pruners
A table uses the following structure:
Table: t1 Create Table: CREATE TABLE `t1` ( `id` int(11) NOT NULL, `str_col1` char(10) DEFAULT NULL, `str_col2` varchar(10) DEFAULT NULL, PRIMARY KEY (`id`), COLUMNAR INDEX (`id`,`str_col1`,`str_col2`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='COLUMNAR=1'
To create minmax pruners for all string columns in the
t1
table, perform the following steps:Execute the following statement to delete IMCIs from the
t1
table:ALTER TABLE t1 COMMENT = "COLUMNAR=0";
Execute the following statement to create minmax pruners for all string columns in the
t1
table:ALTER TABLE t1 COMMENT = "COLUMNAR=1 PRUNER_MINMAX=1";
(Optional) Execute the following statement to view the table structure after pruners are created:
SHOW CREATE TABLE t1 FULL \G
Sample result:
*************************** 1. row *************************** Table: t1 Create Table: CREATE TABLE `t1` ( `id` int(11) NOT NULL, `str_col1` char(10) DEFAULT NULL, `str_col2` varchar(10) DEFAULT NULL, PRIMARY KEY (`id`), COLUMNAR INDEX (`id`,`str_col1`,`str_col2`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='COLUMNAR=1 PRUNER_MINMAX=1'
The preceding table structure shows that the
PRUNER_MINMAX
attribute has been added to thet1
table.
To create minmax pruners for the
str_col1
column in thet1
table, perform the following steps:Execute the following statement to create pruners for the
str_col1
column:ALTER TABLE t1 MODIFY COLUMN str_col1 char(10) COMMENT 'PRUNER_MINMAX=1';
Execute the following statements in sequence to recreate IMCIs for the
t1
table:ALTER TABLE t1 COMMENT = "COLUMNAR=0"; ALTER TABLE t1 COMMENT = "COLUMNAR=1";
(Optional) Execute the following statement to view the table structure after IMCIs are recreated:
SHOW CREATE TABLE t1 FULL \G
Sample result:
*************************** 1. row *************************** Table: t1 Create Table: CREATE TABLE `t1` ( `id` int(11) NOT NULL, `str_col1` char(10) DEFAULT NULL COMMENT 'PRUNER_MINMAX=1', `str_col2` varchar(10) DEFAULT NULL, PRIMARY KEY (`id`), COLUMNAR INDEX (`id`,`str_col1`,`str_col2`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='COLUMNAR=1'
The preceding table structure shows that the
PRUNER_MINMAX
attribute has been added to thestr_col1
column.
Delete pruners
Delete the
PRUNER_MINMAX
attribute from a table.For example, a table named
t1
has the following structure:SHOW CREATE TABLE t1 full \G *************************** 1. row *************************** Table: t1 Create Table: CREATE TABLE `t1` ( `id` int(11) NOT NULL, `str_col1` char(10) DEFAULT NULL, `str_col2` varchar(10) DEFAULT NULL, PRIMARY KEY (`id`), COLUMNAR INDEX (`id`,`str_col1`,`str_col2`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='COLUMNAR=1 PRUNER_MINMAX=1'
To delete the
PRUNER_MINMAX
attribute from thet1
table, perform the following steps:Execute the following statement to delete IMCIs from the
t1
table:ALTER TABLE t1 COMMENT = "COLUMNAR=0";
Execute the following statement to set the
PRUNER_MINMAX
attribute in COMMENT to 0 and recreate IMCIs:ALTER TABLE t1 COMMENT = "COLUMNAR=1 PRUNER_MINMAX=0";
(Optional) Execute the following statement to view the table structure after IMCIs are recreated:
SHOW CREATE TABLE t1 FULL \G
Sample result:
*************************** 1. row *************************** Table: t1 Create Table: CREATE TABLE `t1` ( `id` int(11) NOT NULL, `str_col1` char(10) DEFAULT NULL, `str_col2` varchar(10) DEFAULT NULL, PRIMARY KEY (`id`), COLUMNAR INDEX (`id`,`str_col1`,`str_col2`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='COLUMNAR=1'
The preceding table structure shows that the
PRUNER_MINMAX
attribute has been deleted from thet1
table.
Delete the
PRUNER
attribute from a column.For example, the
PRUNER_MINMAX
attribute exists on thestr_col1
column in a table namedt1
. The table uses the following structure:Table: t1 Create Table: CREATE TABLE `t1` ( `id` int(11) NOT NULL, `str_col1` char(10) DEFAULT NULL COMMENT 'PRUNER_MINMAX=1', `str_col2` varchar(10) DEFAULT NULL, PRIMARY KEY (`id`), COLUMNAR INDEX (`id`,`str_col1`,`str_col2`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='COLUMNAR=1'
To delete the
PRUNER_MINMAX
attribute from thestr_col1
column, perform the following steps:Execute the following statement to delete the
PRUNER_MINMAX
attribute from thestr_col1
column:ALTER TABLE t1 MODIFY COLUMN str_col1 char(10) COMMENT 'PRUNER_MINMAX=0';
Execute the following statements in sequence to recreate IMCIs:
ALTER TABLE t1 COMMENT = "COLUMNAR=0"; ALTER TABLE t1 COMMENT = "COLUMNAR=1";
(Optional) Execute the following statement to view the table structure after IMCIs are recreated:
SHOW CREATE TABLE t1 FULL \G
Sample result:
*************************** 1. row *************************** Table: t1 Create Table: CREATE TABLE `t1` ( `id` int(11) NOT NULL, `str_col1` char(10) DEFAULT NULL, `str_col2` varchar(10) DEFAULT NULL, PRIMARY KEY (`id`), COLUMNAR INDEX (`id`,`str_col1`,`str_col2`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='COLUMNAR=1'
The preceding table structure shows that the
PRUNER_MINMAX
attribute has been deleted from thestr_col1
column.
Check whether the IMCI pruner feature takes effect
Check whether pruners are created for a string column in a table
You can check whether pruners are created for a string column in a table named
imci_secondary_indexes
based on the status values of theSTR_BLOOM_PRUNER
andSTR_MINMAX_PRUNER
fields in the table. If the fields are set to 1, pruners are created for the column. Examples:SELECT * FROM information_schema.imci_secondary_indexes WHERE schema_name='test_tmp' AND table_name='t1'\G
Sample result:
*************************** 1. row *************************** TABLE_ID: 1091 SCHEMA_NAME: test_tmp TABLE_NAME: t1 COLUMN_NAME: str_col1 STR_BLOOM_PRUNER: 1 -- Bloom filters are created for the str_col1 column. STR_MINMAX_PRUNER: 1 -- Minmax indexes are created for the str_col1 column. SINDEX_SWITCH: 0 *************************** 2. row *************************** TABLE_ID: 1091 SCHEMA_NAME: test_tmp TABLE_NAME: t1 COLUMN_NAME: str_col2 STR_BLOOM_PRUNER: 1 -- Bloom filters are created for the str_col2 column. STR_MINMAX_PRUNER: 1 -- Minmax indexes are created for the str_col2 column. SINDEX_SWITCH: 0 2 rows in set (0.00 sec)
The preceding results show that the
STR_BLOOM_PRUNER
field is set to 1. This indicates that bloom filters are created for thestr_col1
andstr_col2
columns. TheSTR_MINMAX_PRUNER
field is also set to 1, which indicates that minmax indexes are created for thestr_col1
andstr_col2
columns.Check whether the IMCI pruner feature takes effect on an SQL statement
Before and after you execute an SQL statement, you can execute the
SHOW STATUS LIKE 'imci_pruner%'
statement to view the filtering of data blocks and check whether the IMCI pruner feature takes effect on the statement. The result contains the following status values:imci_pruner_accepted
: the number of data blocks that meet the filter conditions.imci_pruner_rejected
: the number of data blocks that do not meet the filter conditions.
Number of skipped data blocks = Number of accepted data blocks + Number of rejected data blocks
Accepted data blocks do not need to be filtered by conditions. If some columns need to be materialized, the accepted data blocks are still accessed. Rejected data blocks are not scanned. Therefore, no I/O operations are performed.
Examples
The following example describes how to check whether the IMCI pruner feature takes effect on an SQL statement executed on a table named
t1
. The t1 tableuses the following structure:
Table: t1 Create Table: CREATE TABLE `t1` ( `id` int(11) NOT NULL, `str_col1` char(10) DEFAULT NULL, `str_col2` varchar(10) DEFAULT NULL, PRIMARY KEY (`id`), COLUMNAR INDEX (`id`,`str_col1`,`str_col2`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='COLUMNAR=1 PRUNER=1'
For example, the table contains 10 data blocks and records that meet the
str_col1='polardb'
condition are included in one of the data blocks. Perform the following steps to check whether the IMCI pruner feature takes effect on theSELECT COUNT(1) FROM t1 WHERE str_col1='polardb'
statement.Execute the following statement to view the pruner status:
SHOW STATUS LIKE 'imci_pruner%';
Sample result:
+----------------------+-------+ | Variable_name | Value | +----------------------+-------+ | imci_pruner_accepted | 0 | | imci_pruner_rejected | 0 | +----------------------+-------+ 2 rows in set (0.00 sec)
Execute the following statement to view the number of records that meet the
str_col1='polardb'
condition:SELECT COUNT(1) FROM t1 WHERE str_col1='polardb';
Sample result:
+----------+ | count(1) | +----------+ | 1 | +----------+ 1 row in set (0.01 sec)
Execute the following statement to view the pruner status again:
SHOW STATUS LIKE 'imci_pruner%';
Sample result:
+----------------------+-------+ | Variable_name | Value | +----------------------+-------+ | imci_pruner_accepted | 0 | | imci_pruner_rejected | 9 | +----------------------+-------+ 2 rows in set (0.00 sec)
The result shows that the
imci_pruner_accepted
value is 0 and theimci_pruner_rejected
value is 9. Nine data blocks are skipped during the query. The IMCI pruner feature takes effect on the query.
Performance test
The test uses a table that contains 120 million rows of data and approximately 1,800 data blocks and a cluster that has 2 cores and 4 GB of memory to compare the performance before and after bloom filters are created for the col
column in the table. The distinct value for the col column of the string type is 80 million. Execute the following statement:
SELECT COUNT(1) FROM t1 WHERE col='xxx'
The following table lists the query time.
Bloom filters created | Bloom filters not created |
0.15s | 18.6s |
The col='xxx'
condition in the statement and the bloom filters prevent most data blocks from being scanned. Only a few data blocks are scanned in the actual query process, which improves the query performance.