All Products
Search
Document Center

:Use the OSS file filter feature to accelerate cold data queries

Last Updated:Sep 04, 2024

This topic describes how to use the OSS file filter feature to accelerate cold data queries.

Background information

With the widespread application of OSS, more and more data is stored on OSS. The low query performance issue is steadily worsening and affects user experience. Although parallel query execution on cold data can accelerate queries, parallel query execution on cold data consumes a lot of memory, connections, and network bandwidth resources during the process of scanning OSS tables. User experience is still affected. Therefore, a method of quickly filtering OSS table data is required to improve query performance.

The OSS file filter feature is therefore introduced. This feature excludes OSS data blocks that do not need to be scanned based on query conditions. This reduces the amount of scanned data and greatly improves query performance. This feature not only optimizes resource efficiency, but also significantly enhances user experience.

Feature overview

The OSS file filter feature collects statistics on each archived data block, generates filter data, and stores the filter data in OSS. The system excludes the data blocks that do not need to be scanned based on the query conditions that are pushed and the filter data. This reduces the amount of scanned data and shortens the query time. OSS cold data can be filtered by the OSS file filter that you create. Different filtering methods are used for different data types: numeric data is filtered by comparing the minimum and maximum statistics of data blocks. String data is filtered by comparing the character map. If the BLOOM data type is specified, the BLOOM filter is used.

The OSS file filter feature performs better for ordered or partially ordered numeric data. However, for globally unordered data, the BLOOM filter is recommended for better filtering efficiency.

Because archived data typically has good time series, filtering columns of time types for can significantly improve query performance.

Prerequisites

  • The cluster must be of MySQL 8.0.2 with revision version 8.0.2.2.25 or later.

  • The cluster must have the cold data archiving feature enabled. For more information, see Enable cold data archiving.

  • A connection to the cluster is established. For more information, see Connect to a cluster.

  • You can create OSS file filters only for OSS cold data in the CSV format.

  • To enable the OSS file filter feature on a table, the table must have the OSS META = 1 option. You can execute the SHOW CREATE TABLE statement to check whether OSS META is enabled for the current table in the CSV format. If the result contains OSS META=1, OSS META is enabled for the current table. For more information, see Perform DDL operations on cold data.

    SHOW CREATE TABLE t;
    *************************** 1. row ***************************
           Table: t
    Create Table: CREATE TABLE `t` (
      `id` int(11) DEFAULT NULL
    ) /*!99990 800020213 STORAGE OSS */ ENGINE=CSV DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci /*!99990 800020204 NULL_MARKER='NULL' */ /*!99990 800020223 OSS META=1 */
    
    1 row in set (0.00 sec)

Limits

  • You cannot enable the OSS file filter feature after the ePQ feature is enabled.

  • The following functions are supported in the OSS file filter feature: =, <=>, <, <=, >=,>, BETWEEN, LIKE, IS NULL, and IS NOT NULL. The percent sign can only be added to the right of a variable in LIKE (LIKE 'ABC%').

  • The following table lists the data types supported in the OSS file filter feature.

    Category

    Data type

    Integer

    TINYINT, SMALLINT, MEDIUMINT, INT, BIGINT, TINYINT UNSIGNED, SMALLINT UNSIGNED, MEDIUMINT UNSIGNED, INT UNSIGNE, BIGINT UNSIGNED

    Floating-point

    FLOAT, DOUBLE

    Fixed-point

    DECIMAL

    Time

    DATE, DATETIME, TIME, TIMESTAMP

    String

    CHAR, VARCHAR

  • All data types are supported in a BLOOM filter.

  • If UUIDs are stored in strings, we recommend that you create a BLOOM filter.

  • For columns of the string type, only case-sensitive comparisons are supported.

  • The AND operation between multiple conditions is supported, but the OR operation between multiple conditions is not supported.

  • If the OSS table does not have data, you cannot use the OSS file filter feature for query acceleration.

Procedure

  1. In the console, set the cluster parameter loose_use_oss_meta to ON to enable the USE_OSS_META feature. For more information about how to configure cluster parameters, see Configure cluster and node parameters.

  2. In the console, set the cluster parameter loose_optimizer_switch to ENGINE_CONDITION_PUSHDOWN=ON. For more information about how to configure cluster parameters, see Configure cluster and node parameters.

  3. In the console, set the cluster parameter loose_csv_oss_file_filter to ON to enable the OSS file filter feature. For more information about how to configure cluster parameters, see Configure cluster and node parameters.

  4. Make sure that the cluster is connected. If not, you can connect to the cluster as specified in Connect to a cluster. Check whether the preceding parameters are configured.

    -- Check whether the loose_csv_oss_file_filter parameter is set to ON
    SHOW VARIABLES LIKE 'oss_file_filter';
    -- Check whether the use_oss_meta parameter is set to ON.
    SHOW VARIABLES LIKE 'use_oss_meta';
    -- Check whether the loose_optimizer_switch parameter is set to ENGINE_CONDITION_PUSHDOWN=ON.
    SHOW VARIABLES LIKE 'optimizer_switch';

OSS file filter types

You can specify OSS file filter data for an archived table by adding the OSS_FILE_FILTER option. The OSS_FILE_FILTER option is in the following format:

OSS_FILE_FILTER = 'field_filter[,field_filter]'
field_filter := field_name[:filter_type]
filter_type := bloom

Default OSS file filter types are defined for different data types. By default, numeric and time data is filtered by comparing the minimum and maximum statistics. String data is filtered by comparing the character map. All data types can be filtered by using the BLOOM filter.

Examples

Create an OSS file filter when manually archiving a table

You can create an OSS file filter when you manually archive a table.

-- Create a table that uses the specified structure.
CREATE TABLE `lineitem` (
  `L_ORDERKEY` int(11) NOT NULL,
  `L_PARTKEY` int(11) NOT NULL,
  `L_SUPPKEY` int(11) NOT NULL,
  `L_LINENUMBER` int(11) NOT NULL,
  `L_QUANTITY` decimal(15,2) NOT NULL,
  `L_EXTENDEDPRICE` decimal(15,2) NOT NULL,
  `L_DISCOUNT` decimal(15,2) NOT NULL,
  `L_TAX` decimal(15,2) NOT NULL,
  `L_RETURNFLAG` char(1) COLLATE utf8_bin NOT NULL,
  `L_LINESTATUS` char(1) COLLATE utf8_bin NOT NULL,
  `L_SHIPDATE` date NOT NULL,
  `L_COMMITDATE` date NOT NULL,
  `L_RECEIPTDATE` date NOT NULL,
  `L_SHIPINSTRUCT` char(25) COLLATE utf8_bin NOT NULL,
  `L_SHIPMODE` char(10) COLLATE utf8_bin NOT NULL,
  `L_COMMENT` varchar(44) COLLATE utf8_bin NOT NULL
) ENGINE = innodb;

-- Archive the table and create an OSS file filter in the L_ORDERKEY, L_LINENUMBER, and L_SHIPDATE columns of the table.
ALTER TABLE lineitem ENGINE = CSV STORAGE OSS
  OSS_FILE_FILTER = 'L_ORDERKEY,L_LINENUMBER,L_SHIPDATE';

Create an OSS file filter in an archived table

You can create an OSS file filter in an archived table.

-- Create a table that uses the specified structure.
CREATE TABLE `lineitem` (
  `L_ORDERKEY` int(11) NOT NULL,
  `L_PARTKEY` int(11) NOT NULL,
  `L_SUPPKEY` int(11) NOT NULL,
  `L_LINENUMBER` int(11) NOT NULL,
  `L_QUANTITY` decimal(15,2) NOT NULL,
  `L_EXTENDEDPRICE` decimal(15,2) NOT NULL,
  `L_DISCOUNT` decimal(15,2) NOT NULL,
  `L_TAX` decimal(15,2) NOT NULL,
  `L_RETURNFLAG` char(1) COLLATE utf8_bin NOT NULL,
  `L_LINESTATUS` char(1) COLLATE utf8_bin NOT NULL,
  `L_SHIPDATE` date NOT NULL,
  `L_COMMITDATE` date NOT NULL,
  `L_RECEIPTDATE` date NOT NULL,
  `L_SHIPINSTRUCT` char(25) COLLATE utf8_bin NOT NULL,
  `L_SHIPMODE` char(10) COLLATE utf8_bin NOT NULL,
  `L_COMMENT` varchar(44) COLLATE utf8_bin NOT NULL
) ENGINE = CSV STORAGE OSS;

-- Create an OSS file filter in the L_ORDERKEY, L_LINENUMBER, L_SHIPDATE, and L_SHIPINSTRUCT columns of the archived table. The L_SHIPINSTRUCT column uses the BLOOM data type.
ALTER TABLE lineitem OSS_FILE_FILTER = 'L_ORDERKEY,L_LINENUMBER,L_SHIPDATE,L_SHIPINSTRUCT:BLOOM';

Use the OSS file filter

If an OSS file filter is used in query conditions, the query conditions are pushed down to the OSS cold data in the CSV format. You can query the execution plan to view the conditions that are pushed down.

-- Create a table that uses the specified structure.
CREATE TABLE `lineitem` (
  `L_ORDERKEY` int(11) NOT NULL,
  `L_PARTKEY` int(11) NOT NULL,
  `L_SUPPKEY` int(11) NOT NULL,
  `L_LINENUMBER` int(11) NOT NULL,
  `L_QUANTITY` decimal(15,2) NOT NULL,
  `L_EXTENDEDPRICE` decimal(15,2) NOT NULL,
  `L_DISCOUNT` decimal(15,2) NOT NULL,
  `L_TAX` decimal(15,2) NOT NULL,
  `L_RETURNFLAG` char(1) COLLATE utf8_bin NOT NULL,
  `L_LINESTATUS` char(1) COLLATE utf8_bin NOT NULL,
  `L_SHIPDATE` date NOT NULL,
  `L_COMMITDATE` date NOT NULL,
  `L_RECEIPTDATE` date NOT NULL,
  `L_SHIPINSTRUCT` char(25) COLLATE utf8_bin NOT NULL,
  `L_SHIPMODE` char(10) COLLATE utf8_bin NOT NULL,
  `L_COMMENT` varchar(44) COLLATE utf8_bin NOT NULL
) ENGINE = CSV STORAGE OSS OSS_FILE_FILTER = 'L_ORDERKEY,L_LINENUMBER,L_SHIPDATE';

-- Query the execution plan.
explain select * from lineitem where l_orderkey=96;
+----+-------------+----------+------------+------+---------------+------+---------+------+---------+----------+---------------------------------------------------------------------------------+
| id | select_type | table    | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra                                                                           |
+----+-------------+----------+------------+------+---------------+------+---------+------+---------+----------+---------------------------------------------------------------------------------+
|  1 | SIMPLE      | lineitem | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 6001215 |    10.00 | Using where; With pushed engine condition (`test`.`lineitem`.`L_ORDERKEY` = 96) |
+----+-------------+----------+------------+------+---------------+------+---------+------+---------+----------+---------------------------------------------------------------------------------+

explain format = tree select * from lineitem where l_orderkey=96 \G
*************************** 1. row ***************************
EXPLAIN: -> Filter: (lineitem.L_ORDERKEY = 96)  (cost=15010.00 rows=10000)
    -> Table scan on lineitem, extra (oss_file_filter conditions: (lineitem.L_ORDERKEY = 96))  (cost=15010.00 rows=100000)

-- Execute the SQL query statement.
select count(*) from lineitem where l_orderkey=96;