When a query that involves a large amount of data is being executed, large amounts of compute resources are consumed as the SQL layer reads data from the storage engine layer and performs computation. To improve query performance, PolarDB for MySQL pushes bloom filters down to the storage engine layer. This topic describes the bloom filter pushdown feature.
Prerequisites
- The version of your PolarDB for MySQL cluster is 8.0, and the revision version is 8.0.2.2.3 or later. For more information about how to view the cluster version, see Query the engine version.
- The data is of the INT type.
Background
Bloom filter is a proven method that can reduce data read from the storage engine and improve computing efficiency. PolarDB for MySQL accelerates hash joins by using bloom filters. When a hash table is created, the system creates bloom filters and pushes them down to the probe engine. The probe engine uses the bloom filters to filter out the data that is unnecessary for the query. This method can significantly improve query performance by reducing the amounts of data transmitted between the SQL layer and storage engine layer and reducing the computing workloads at the SQL layer.
Use the bloom filter pushdown feature
Parameter | Level | Description |
---|---|---|
loose_bloom_filter_enabled | Global and session | Specifies whether bloom filters are enabled. Default value: ON. Valid values:
|
Performance test
hash join with bloom filter
of the Extra
parameter indicates that bloom filters are used.
- Q3
EXPLAIN SELECT l_orderkey, SUM(l_extendedprice * (1 - l_discount)) AS revenue, o_orderdate, o_shippriority FROM customer, orders, lineitem WHERE c_mktsegment = 'MACHINERY' AND c_custkey = o_custkey AND l_orderkey = o_orderkey AND o_orderdate < '1995-03-10' AND l_shipdate > '1995-03-10' GROUP BY l_orderkey, o_orderdate, o_shippriority ORDER BY revenue DESC, o_orderdate LIMIT 10\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: customer partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 148463 filtered: 10.00 Extra: Using where; Using temporary; Using filesort *************************** 2. row *************************** id: 1 select_type: SIMPLE table: orders partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 1486962 filtered: 3.33 Extra: Using where; Using join buffer (hash join with bloom filter) *************************** 3. row *************************** id: 1 select_type: SIMPLE table: lineitem partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5948979 filtered: 3.33 Extra: Using where; Using join buffer (hash join with bloom filter)
- Q11
EXPLAIN SELECT ps_partkey, SUM(ps_supplycost * ps_availqty) AS value FROM partsupp, supplier, nation WHERE ps_suppkey = s_suppkey AND s_nationkey = n_nationkey AND n_name = 'INDIA' GROUP BY ps_partkey HAVING SUM(ps_supplycost * ps_availqty) > (SELECT SUM(ps_supplycost * ps_availqty) * 0.0001000000 FROM partsupp, supplier, nation WHERE ps_suppkey = s_suppkey AND s_nationkey = n_nationkey AND n_name = 'INDIA' ) ORDER BY value DESC\G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: nation partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 25 filtered: 10.00 Extra: Using where; Using temporary; Using filesort *************************** 2. row *************************** id: 1 select_type: PRIMARY table: supplier partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 10000 filtered: 10.00 Extra: Using where; Using join buffer (hash join with bloom filter) *************************** 3. row *************************** id: 1 select_type: PRIMARY table: partsupp partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 791815 filtered: 10.00 Extra: Using where; Using join buffer (hash join with bloom filter) *************************** 4. row *************************** id: 2 select_type: SUBQUERY table: nation partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 25 filtered: 10.00 Extra: Using where *************************** 5. row *************************** id: 2 select_type: SUBQUERY table: supplier partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 10000 filtered: 10.00 Extra: Using where; Using join buffer (hash join with bloom filter) *************************** 6. row *************************** id: 2 select_type: SUBQUERY table: partsupp partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 791815 filtered: 10.00 Extra: Using where; Using join buffer (hash join with bloom filter)
- Q16
EXPLAIN SELECT p_brand, p_type, p_size, COUNT(DISTINCT ps_suppkey) AS supplier_cnt FROM partsupp, part WHERE p_partkey = ps_partkey AND p_brand <> 'Brand#33' AND p_type NOT LIKE 'PROMO POLISHED%' AND p_size IN (34, 45, 33, 42, 9, 24, 26, 7) AND ps_suppkey NOT IN (SELECT s_suppkey FROM supplier WHERE s_comment LIKE '%Customer%Complaints%' ) GROUP BY p_brand, p_type, p_size ORDER BY supplier_cnt DESC, p_brand, p_type, p_size\G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: part partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 198116 filtered: 40.00 Extra: Using where; Using temporary; Using filesort *************************** 2. row *************************** id: 1 select_type: PRIMARY table: partsupp partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 791815 filtered: 10.00 Extra: Using where; Using join buffer (hash join with bloom filter) *************************** 3. row *************************** id: 2 select_type: SUBQUERY table: supplier partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 10000 filtered: 11.11 Extra: Using where