All Products
Search
Document Center

PolarDB:Configure sort keys for columnstore indexes

Last Updated:Nov 21, 2025

This topic describes the sorting process for columnstore index data and how to use the sorting feature. It also compares the time required to build and query sorted columnstore index data.

Introduction

Columnstore index data is organized by row groups. By default, each row group contains 64,000 rows. Within each row group, different columns are packaged into column data blocks. These blocks are built in parallel based on the primary key order of the original row store data. Because updated data is written in append order, the data is generally unsorted.

Columnstore indexes support rough indexes. The metadata of each column data block contains information, such as the minimum and maximum values of all data in the block. When you query data, all column data blocks of a specified column are typically traversed. After you enable the IMCI pruner, all column data blocks are classified into three categories based on query conditions and metadata: relevant, possibly relevant, and irrelevant. Only relevant and possibly relevant column data blocks are read. Column data blocks can be organized in different orders, which produces different combinations of column data block collections and affects the filtering efficiency of the IMCI pruner. Therefore, you can change the arrangement of column data blocks based on query conditions to improve query performance.

Columnstore index resortingAs shown in the preceding figure, when the following SQL statement is executed on an unordered set of data blocks, all column data blocks must be loaded and processed. For an ordered set of data blocks, the first column data block can be filtered out based on the maximum and minimum values in its metadata. Only the second column data block needs to be processed.

SELECT * FROM t WHERE c >= 8;

Applicability

  • To use the Data sorting when creating a new columnstore index feature, you need an Enterprise Edition cluster that meets one of the following requirements:

    • PolarDB for MySQL 8.0.1 whose revision version is 8.0.1.1.32 or later.

    • PolarDB for MySQL 8.0.2 whose revision version is 8.0.2.2.12 or later.

  • To use the Incremental data sorting feature, you need an Enterprise Edition cluster that meets one of the following requirements:

    • PolarDB for MySQL 8.0.1 whose revision version is 8.0.1.1.39.1 or later.

    • PolarDB for MySQL 8.0.2 whose revision version is 8.0.2.2.20.1 or later.

You can check the cluster version. For more information, see Query the version number.

Precautions

  • Columns of the BLOB, JSON, or GEOMETRY data types cannot be used as sort keys.

  • The incremental data sorting feature does not support sort keys of the unsigned integer or Decimal data types.

  • Incremental sorting maintains the order based only on the first column of the sort key.

  • Incremental data sorting consumes a certain amount of resources. Therefore, when the write workload of the cluster is high, the speed of incremental data sorting slows down to release more resources for data write operations.

Sorting procedures

  • Data sorting procedure when you create a new columnstore index

    The data sorting for a columnstore index is implemented in a way similar to the sorting algorithm for secondary indexes during a Data Definition Language (DDL) process. Both single-threaded and multi-threaded sorting are supported. Single-threaded sorting uses a standard two-way merge sort. Multi-threaded sorting uses a k-way external merge sort with a loser tree and supports a sampling sort policy. The overall process is as follows:

    1. Traverse the data based on the primary key index and save the complete data that is read to data files. Then, add the sort key columns to the sorting buffer. Each thread uses a different data file and writes data after a specific amount accumulates.

    2. Continuously traverse and insert data into the sorting buffer. When the sorting buffer is full, sort the data in memory based on the sort key combination and save the data to merge files.

    3. After the traversal is complete, sort the merge files segment by segment in pairs. Save the sorted data to temporary files. Then, replace the merge files with the temporary files.

    4. Repeat Step 3 until the merge files are sorted. Then, read each record in the merge files, retrieve the corresponding record from the data files based on the offset value, and append it to the columnstore index.

  • Incremental data sorting procedure

    The sorting process for incremental data is progressive and does not guarantee that the data is completely sorted. The overall process is as follows:

    1. Group all data blocks in pairs and select multiple groups of data blocks with a high degree of overlap in their timestamp ranges.

    2. Perform a merge sort on each group of data blocks to generate two new sorted data blocks.

    3. Repeat Step 2 until all data blocks are sorted.

Parameter descriptions

You must set the parameters in the following table in the database to enable or disable the columnstore index sorting feature and configure the number of threads as needed.

Note

The method for modifying PolarDB cluster parameters differs between the console and a database session. The differences are as follows:

  • In the PolarDB console

    • Compatibility: Some cluster parameters in the PolarDB console have the loose_ prefix for compatibility with MySQL configuration files.

    • Procedure: Find and modify the parameters that have the loose_ prefix.

  • In a database session (using the command line or a client)

    • Procedure: When you connect to the database and use the SET command to modify a parameter, remove the loose_ prefix and use the original parameter name.

Parameter

Description

loose_imci_enable_pack_order_key

Controls whether to sort data when a new columnstore index is created. Valid values:

  • ON (default): enables the data sorting feature when you create a new columnstore index.

  • OFF: disables the data sorting feature when you create a new columnstore index.

loose_imci_enable_pack_order_key_changed_rebuild

Specifies whether to rebuild the table when the sort order of the columnstore index changes. Valid values:

  • ON: The table needs to be rebuilt when the sort order of the columnstore index changes.

  • OFF (default): The table does not need to be rebuilt when the sort order of the columnstore index changes.

loose_imci_parallel_build_threads_per_table

The number of threads used to build columnstore index data for a single table.

Value range: 1 to 128. Default value: 8.

Usage notes

You can use the columnstore index data sorting feature by following these steps:

  1. Enable the columnstore index sorting feature.

    Set the imci_enable_pack_order_key parameter to ON to enable the Data sorting when creating a new columnstore index feature.

  2. Add the order_key property to the comment of the following SQL statement to build sorted columnstore index data.

    ALTER TABLE table_name COMMENT 'columnar=1 order_key=column_name[,column_name]';

    Parameter descriptions:

    Parameter

    Description

    table_name

    The table name.

    column_name

    The column name. You can configure multiple column names. Separate multiple column names with commas (,).

    You can monitor the progress of IMCI creation in the INFORMATION_SCHEMA.IMCI_ASYNC_DDL_STATS table. For more information about the INFORMATION_SCHEMA.IMCI_ASYNC_DDL_STATS table, see View DDL execution speed and build progress for IMCIs.

    SELECT * FROM INFORMATION_SCHEMA.IMCI_ASYNC_DDL_STATS;

Differences Between Columnstore Index Data Sorting And DDL Sorting

Columnstore index data sorting is essentially sorting by a specified key combination, which is similar to the DDL sorting process for other secondary indexes. However, they differ in the following aspects:

  • Columnstore index sorting does not use index columns as sort keys. Instead, you can specify any combination of columns as sort keys.

  • After columnstore index sorting, the complete data must be read. For a secondary index DDL, only the index part of the data needs to be saved. For example, for a VARCHAR field, only the prefix is saved as index data.

Comparison of build and query times for sorted columnstore index data

This example uses a 100 GB TPC-H dataset to test the time required to build and query sorted columnstore index data.

  • Test the build time of sorted columnstore index data.

    This example builds sorted columnstore index data for the lineitem table with 16 parallel threads. The following is a sample statement:

    ALTER TABLE lineitem COMMENT='columnar=1 order_key=l_receiptdate,l_shipmode';

    The build times are as follows:

    Unordered dataset

    Ordered dataset

    6 minutes

    35 minutes

  • Test the query time of sorted columnstore index data.

    This example runs the TPC-H Q12 query. The Least Recently Used (LRU) cache and executor memory are both set to 10 GB. The following is the query statement:

    SELECT
        l_shipmode,
        SUM(CASE
            WHEN o_orderpriority = '1-URGENT' OR o_orderpriority = '2-HIGH'
            THEN 1
            ELSE 0
            END) AS high_line_count,
        SUM(CASE
            WHEN o_orderpriority <> '1-URGENT' AND o_orderpriority <> '2-HIGH'
            THEN 1
            ELSE 0
            END) AS low_line_count
        FROM
            orders,
            lineitem
        WHERE
            o_orderkey = l_orderkey
            AND l_shipmode in ('MAIL', 'SHIP')
            AND l_commitdate < l_receiptdate
            AND l_shipdate < l_commitdate
            AND l_receiptdate >= date '1994-01-01'
            AND l_receiptdate < date '1994-01-01' + interval '1' year
        GROUP BY
            l_shipmode
        ORDER BY
            l_shipmode;

    The query times are as follows:

    Unordered dataset

    Ordered dataset

    7.47s

    1.25s

Comparison of query times with sort keys and partitioned tables

This test uses a 1 TB TPC-H dataset on a node with 32 cores and 256 GB of memory. It tests the query performance when the columnstore index feature is enabled (column store) and when the columnstore index feature is enabled with partitions and sort key columns.

The following standard table creation statements are used for the test:

CREATE TABLE region ( r_regionkey  BIGINT NOT NULL,
                      r_name       CHAR(25) NOT NULL,
                      r_comment    VARCHAR(152)) COMMENT 'COLUMNAR=1';

CREATE TABLE nation ( n_nationkey  BIGINT NOT NULL,
                      n_name       CHAR(25) NOT NULL,
                      n_regionkey  BIGINT NOT NULL,
                      n_comment    VARCHAR(152)) COMMENT 'COLUMNAR=1';

CREATE TABLE part ( p_partkey     BIGINT NOT NULL,
                    p_name        VARCHAR(55) NOT NULL,
                    p_mfgr        CHAR(25) NOT NULL,
                    p_brand       CHAR(10) NOT NULL,
                    p_type        VARCHAR(25) NOT NULL,
                    p_size        BIGINT NOT NULL,
                    p_container   CHAR(10) NOT NULL,
                    p_retailprice DECIMAL(15,2) NOT NULL,
                    p_comment     VARCHAR(23) NOT NULL) COMMENT 'COLUMNAR=1';

CREATE TABLE supplier ( s_suppkey     BIGINT NOT NULL,
                        s_name        CHAR(25) NOT NULL,
                        s_address     VARCHAR(40) NOT NULL,
                        s_nationkey   BIGINT NOT NULL,
                        s_phone       CHAR(15) NOT NULL,
                        s_acctbal     DECIMAL(15,2) NOT NULL,
                        s_comment     VARCHAR(101) NOT NULL) COMMENT 'COLUMNAR=1';

CREATE TABLE partsupp ( ps_partkey     BIGINT NOT NULL,
                        ps_suppkey     BIGINT NOT NULL,
                        ps_availqty    BIGINT NOT NULL,
                        ps_supplycost  DECIMAL(15,2)  NOT NULL,
                        ps_comment     VARCHAR(199) NOT NULL) COMMENT 'COLUMNAR=1';

CREATE TABLE customer ( c_custkey     BIGINT NOT NULL,
                        c_name        VARCHAR(25) NOT NULL,
                        c_address     VARCHAR(40) NOT NULL,
                        c_nationkey   BIGINT NOT NULL,
                        c_phone       CHAR(15) NOT NULL,
                        c_acctbal     DECIMAL(15,2)   NOT NULL,
                        c_mktsegment  CHAR(10) NOT NULL,
                        c_comment     VARCHAR(117) NOT NULL) COMMENT 'COLUMNAR=1';

CREATE TABLE orders ( o_orderkey       BIGINT NOT NULL,
                      o_custkey        BIGINT NOT NULL,
                      o_orderstatus    CHAR(1) NOT NULL,
                      o_totalprice     DECIMAL(15,2) NOT NULL,
                      o_orderdate      DATE NOT NULL,
                      o_orderpriority  CHAR(15) NOT NULL,
                      o_clerk          CHAR(15) NOT NULL,
                      o_shippriority   BIGINT NOT NULL,
                      o_comment        VARCHAR(79) NOT NULL) COMMENT 'COLUMNAR=1'
                      PARTITION BY RANGE (year(`o_orderdate`))
                      (PARTITION p0 VALUES LESS THAN (1992) ENGINE = InnoDB,
                       PARTITION p1 VALUES LESS THAN (1993) ENGINE = InnoDB,
                       PARTITION p2 VALUES LESS THAN (1994) ENGINE = InnoDB,
                       PARTITION p3 VALUES LESS THAN (1995) ENGINE = InnoDB,
                       PARTITION p4 VALUES LESS THAN (1996) ENGINE = InnoDB,
                       PARTITION p5 VALUES LESS THAN (1997) ENGINE = InnoDB,
                       PARTITION p6 VALUES LESS THAN (1998) ENGINE = InnoDB,
                       PARTITION p7 VALUES LESS THAN (1999) ENGINE = InnoDB);

CREATE TABLE lineitem ( l_orderkey       BIGINT NOT NULL,
                        l_partkey        BIGINT NOT NULL,
                        l_suppkey        BIGINT NOT NULL,
                        l_linenumber     BIGINT NOT NULL,
                        l_quantity       DECIMAL(15,2) NOT NULL,
                        l_extendedprice  DECIMAL(15,2) NOT NULL,
                        l_discount       DECIMAL(15,2) NOT NULL,
                        l_tax            DECIMAL(15,2) NOT NULL,
                        l_returnflag     CHAR(1) NOT NULL,
                        l_linestatus     CHAR(1) NOT NULL,
                        l_shipdate       DATE NOT NULL,
                        l_commitdate     DATE NOT NULL,
                        l_receiptdate    DATE NOT NULL,
                        l_shipinstruct   CHAR(25) NOT NULL,
                        l_shipmode       CHAR(10) NOT NULL,
                        l_comment        VARCHAR(44) NOT NULL) COMMENT 'COLUMNAR=1'
                        PARTITION BY RANGE (year(`l_shipdate`))
                        (PARTITION p0 VALUES LESS THAN (1992) ENGINE = InnoDB,
                         PARTITION p1 VALUES LESS THAN (1993) ENGINE = InnoDB,
                         PARTITION p2 VALUES LESS THAN (1994) ENGINE = InnoDB,
                         PARTITION p3 VALUES LESS THAN (1995) ENGINE = InnoDB,
                         PARTITION p4 VALUES LESS THAN (1996) ENGINE = InnoDB,
                         PARTITION p5 VALUES LESS THAN (1997) ENGINE = InnoDB,
                         PARTITION p6 VALUES LESS THAN (1998) ENGINE = InnoDB,
                         PARTITION p7 VALUES LESS THAN (1999) ENGINE = InnoDB);

After the data is imported, set the sort key columns for the tables. For more information, see Configure sort keys for columnstore indexes.

ALTER TABLE customer COMMENT='COLUMNAR=1 order_key=c_mktsegment';
ALTER TABLE nation COMMENT='COLUMNAR=1 order_key=n_name';
ALTER TABLE part COMMENT='COLUMNAR=1 order_key=p_brand,p_container,p_type';
ALTER TABLE region COMMENT='COLUMNAR=1 order_key=r_name';
ALTER TABLE orders COMMENT='COLUMNAR=1 order_key=o_orderkey,o_custkey,o_orderdate';
ALTER TABLE lineitem COMMENT='COLUMNAR=1 order_key=l_orderkey,l_linenumber,l_receiptdate,l_shipdate,l_partkey';

A selection of TPC-H query statements are used for the test. The query times are shown in the following table:

Query SQL

Unordered dataset (seconds)

Ordered dataset (with partitions and sort key columns) (seconds)

Q3

71.951

36.566

Q4

46.679

32.015

Q6

34.652

4.4

Q7

74.749

34.166

Q12

86.742

28.586

Q14

50.248

12.56

Q15

79.22

21.113

Q20

51.746

10.178

Q21

216.942

148.459