This topic describes how to use the OSS file filter feature to accelerate cold data queries.
Background information
With the widespread application of OSS, more and more data is stored on OSS. The low query performance issue is steadily worsening and affects user experience. Although parallel query execution on cold data can accelerate queries, parallel query execution on cold data consumes a lot of memory, connections, and network bandwidth resources during the process of scanning OSS tables. User experience is still affected. Therefore, a method of quickly filtering OSS table data is required to improve query performance.
The OSS file filter feature is therefore introduced. This feature excludes OSS data blocks that do not need to be scanned based on query conditions. This reduces the amount of scanned data and greatly improves query performance. This feature not only optimizes resource efficiency, but also significantly enhances user experience.
Feature overview
The OSS file filter feature collects statistics on each archived data block, generates filter data, and stores the filter data in OSS. The system excludes the data blocks that do not need to be scanned based on the query conditions that are pushed and the filter data. This reduces the amount of scanned data and shortens the query time. OSS cold data can be filtered by the OSS file filter that you create. Different filtering methods are used for different data types: numeric data is filtered by comparing the minimum and maximum statistics of data blocks. String data is filtered by comparing the character map. If the BLOOM data type is specified, the BLOOM filter is used.
The OSS file filter feature performs better for ordered or partially ordered numeric data. However, for globally unordered data, the BLOOM filter is recommended for better filtering efficiency.
Because archived data typically has good time series, filtering columns of time types for can significantly improve query performance.
Prerequisites
The cluster must be of MySQL 8.0.2 with revision version 8.0.2.2.25 or later.
The cluster must have the cold data archiving feature enabled. For more information, see Enable cold data archiving.
A connection to the cluster is established. For more information, see Connect to a cluster.
You can create OSS file filters only for OSS cold data in the CSV format.
To enable the
OSS file filter
feature on a table, the table must have theOSS META = 1
option. You can execute theSHOW CREATE TABLE
statement to check whetherOSS META
is enabled for the current table in the CSV format. If the result containsOSS META=1
,OSS META
is enabled for the current table. For more information, see Perform DDL operations on cold data.SHOW CREATE TABLE t; *************************** 1. row *************************** Table: t Create Table: CREATE TABLE `t` ( `id` int(11) DEFAULT NULL ) /*!99990 800020213 STORAGE OSS */ ENGINE=CSV DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci /*!99990 800020204 NULL_MARKER='NULL' */ /*!99990 800020223 OSS META=1 */ 1 row in set (0.00 sec)
Limits
You cannot enable the OSS file filter feature after the ePQ feature is enabled.
The following functions are supported in the OSS file filter feature: =, <=>, <, <=, >=,>, BETWEEN, LIKE, IS NULL, and IS NOT NULL. The percent sign can only be added to the right of a variable in LIKE (LIKE 'ABC%').
The following table lists the data types supported in the OSS file filter feature.
Category
Data type
Integer
TINYINT
,SMALLINT
,MEDIUMINT
,INT
,BIGINT
,TINYINT UNSIGNED
,SMALLINT UNSIGNED
,MEDIUMINT UNSIGNED
,INT UNSIGNE
,BIGINT UNSIGNED
Floating-point
FLOAT
,DOUBLE
Fixed-point
DECIMAL
Time
DATE
,DATETIME
,TIME
,TIMESTAMP
String
CHAR
,VARCHAR
All data types are supported in a BLOOM filter.
If UUIDs are stored in strings, we recommend that you create a BLOOM filter.
For columns of the string type, only case-sensitive comparisons are supported.
The AND operation between multiple conditions is supported, but the OR operation between multiple conditions is not supported.
If the OSS table does not have data, you cannot use the OSS file filter feature for query acceleration.
Procedure
In the console, set the cluster parameter
loose_use_oss_meta
to ON to enable theUSE_OSS_META
feature. For more information about how to configure cluster parameters, see Configure cluster and node parameters.In the console, set the cluster parameter
loose_optimizer_switch
toENGINE_CONDITION_PUSHDOWN=ON
. For more information about how to configure cluster parameters, see Configure cluster and node parameters.In the console, set the cluster parameter
loose_csv_oss_file_filter
to ON to enable theOSS file filter
feature. For more information about how to configure cluster parameters, see Configure cluster and node parameters.Make sure that the cluster is connected. If not, you can connect to the cluster as specified in Connect to a cluster. Check whether the preceding parameters are configured.
-- Check whether the loose_csv_oss_file_filter parameter is set to ON SHOW VARIABLES LIKE 'oss_file_filter'; -- Check whether the use_oss_meta parameter is set to ON. SHOW VARIABLES LIKE 'use_oss_meta'; -- Check whether the loose_optimizer_switch parameter is set to ENGINE_CONDITION_PUSHDOWN=ON. SHOW VARIABLES LIKE 'optimizer_switch';
OSS file filter types
You can specify OSS file filter data for an archived table by adding the OSS_FILE_FILTER option. The OSS_FILE_FILTER option is in the following format:
OSS_FILE_FILTER = 'field_filter[,field_filter]'
field_filter := field_name[:filter_type]
filter_type := bloom
Default OSS file filter types are defined for different data types. By default, numeric and time data is filtered by comparing the minimum and maximum statistics. String data is filtered by comparing the character map. All data types can be filtered by using the BLOOM filter.
Examples
Create an OSS file filter when manually archiving a table
You can create an OSS file filter when you manually archive a table.
-- Create a table that uses the specified structure.
CREATE TABLE `lineitem` (
`L_ORDERKEY` int(11) NOT NULL,
`L_PARTKEY` int(11) NOT NULL,
`L_SUPPKEY` int(11) NOT NULL,
`L_LINENUMBER` int(11) NOT NULL,
`L_QUANTITY` decimal(15,2) NOT NULL,
`L_EXTENDEDPRICE` decimal(15,2) NOT NULL,
`L_DISCOUNT` decimal(15,2) NOT NULL,
`L_TAX` decimal(15,2) NOT NULL,
`L_RETURNFLAG` char(1) COLLATE utf8_bin NOT NULL,
`L_LINESTATUS` char(1) COLLATE utf8_bin NOT NULL,
`L_SHIPDATE` date NOT NULL,
`L_COMMITDATE` date NOT NULL,
`L_RECEIPTDATE` date NOT NULL,
`L_SHIPINSTRUCT` char(25) COLLATE utf8_bin NOT NULL,
`L_SHIPMODE` char(10) COLLATE utf8_bin NOT NULL,
`L_COMMENT` varchar(44) COLLATE utf8_bin NOT NULL
) ENGINE = innodb;
-- Archive the table and create an OSS file filter in the L_ORDERKEY, L_LINENUMBER, and L_SHIPDATE columns of the table.
ALTER TABLE lineitem ENGINE = CSV STORAGE OSS
OSS_FILE_FILTER = 'L_ORDERKEY,L_LINENUMBER,L_SHIPDATE';
Create an OSS file filter in an archived table
You can create an OSS file filter in an archived table.
-- Create a table that uses the specified structure.
CREATE TABLE `lineitem` (
`L_ORDERKEY` int(11) NOT NULL,
`L_PARTKEY` int(11) NOT NULL,
`L_SUPPKEY` int(11) NOT NULL,
`L_LINENUMBER` int(11) NOT NULL,
`L_QUANTITY` decimal(15,2) NOT NULL,
`L_EXTENDEDPRICE` decimal(15,2) NOT NULL,
`L_DISCOUNT` decimal(15,2) NOT NULL,
`L_TAX` decimal(15,2) NOT NULL,
`L_RETURNFLAG` char(1) COLLATE utf8_bin NOT NULL,
`L_LINESTATUS` char(1) COLLATE utf8_bin NOT NULL,
`L_SHIPDATE` date NOT NULL,
`L_COMMITDATE` date NOT NULL,
`L_RECEIPTDATE` date NOT NULL,
`L_SHIPINSTRUCT` char(25) COLLATE utf8_bin NOT NULL,
`L_SHIPMODE` char(10) COLLATE utf8_bin NOT NULL,
`L_COMMENT` varchar(44) COLLATE utf8_bin NOT NULL
) ENGINE = CSV STORAGE OSS;
-- Create an OSS file filter in the L_ORDERKEY, L_LINENUMBER, L_SHIPDATE, and L_SHIPINSTRUCT columns of the archived table. The L_SHIPINSTRUCT column uses the BLOOM data type.
ALTER TABLE lineitem OSS_FILE_FILTER = 'L_ORDERKEY,L_LINENUMBER,L_SHIPDATE,L_SHIPINSTRUCT:BLOOM';
Use the OSS file filter
If an OSS file filter is used in query conditions, the query conditions are pushed down to the OSS cold data in the CSV format. You can query the execution plan to view the conditions that are pushed down.
-- Create a table that uses the specified structure.
CREATE TABLE `lineitem` (
`L_ORDERKEY` int(11) NOT NULL,
`L_PARTKEY` int(11) NOT NULL,
`L_SUPPKEY` int(11) NOT NULL,
`L_LINENUMBER` int(11) NOT NULL,
`L_QUANTITY` decimal(15,2) NOT NULL,
`L_EXTENDEDPRICE` decimal(15,2) NOT NULL,
`L_DISCOUNT` decimal(15,2) NOT NULL,
`L_TAX` decimal(15,2) NOT NULL,
`L_RETURNFLAG` char(1) COLLATE utf8_bin NOT NULL,
`L_LINESTATUS` char(1) COLLATE utf8_bin NOT NULL,
`L_SHIPDATE` date NOT NULL,
`L_COMMITDATE` date NOT NULL,
`L_RECEIPTDATE` date NOT NULL,
`L_SHIPINSTRUCT` char(25) COLLATE utf8_bin NOT NULL,
`L_SHIPMODE` char(10) COLLATE utf8_bin NOT NULL,
`L_COMMENT` varchar(44) COLLATE utf8_bin NOT NULL
) ENGINE = CSV STORAGE OSS OSS_FILE_FILTER = 'L_ORDERKEY,L_LINENUMBER,L_SHIPDATE';
-- Query the execution plan.
explain select * from lineitem where l_orderkey=96;
+----+-------------+----------+------------+------+---------------+------+---------+------+---------+----------+---------------------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------+---------+------+---------+----------+---------------------------------------------------------------------------------+
| 1 | SIMPLE | lineitem | NULL | ALL | NULL | NULL | NULL | NULL | 6001215 | 10.00 | Using where; With pushed engine condition (`test`.`lineitem`.`L_ORDERKEY` = 96) |
+----+-------------+----------+------------+------+---------------+------+---------+------+---------+----------+---------------------------------------------------------------------------------+
explain format = tree select * from lineitem where l_orderkey=96 \G
*************************** 1. row ***************************
EXPLAIN: -> Filter: (lineitem.L_ORDERKEY = 96) (cost=15010.00 rows=10000)
-> Table scan on lineitem, extra (oss_file_filter conditions: (lineitem.L_ORDERKEY = 96)) (cost=15010.00 rows=100000)
-- Execute the SQL query statement.
select count(*) from lineitem where l_orderkey=96;