Use a columnstore index on a partitioned table - PolarDB

In PostgreSQL, partitioned tables are an effective way to manage data growth, and partition pruning helps accelerate queries. The columnstore index in PolarDB for PostgreSQL also supports partitioned tables to further meet the statistical and analytical needs of these tables.

Background

As business systems run, they generate large amounts of historical data, which causes tables to grow. Data is usually partitioned by dimensions such as time or user_id, and each partition stores only a subset of the data. Native PostgreSQL also uses partition pruning during queries to avoid reading irrelevant data.

The columnstore index in PolarDB for PostgreSQL also supports accelerated analytics on partitioned tables. You can use it in the same way as existing indexes on partitioned tables.

Results

With a degree of parallelism of 4, the columnstore index performs over 35 times faster than native PostgreSQL parallel execution for all three queries.

Query	Native PostgreSQL parallel execution	Columnstore index
Q1	2.13 s	0.05 s
Q2	6.42 s	0.18 s
Q3	10.51 s	0.30 s

Procedure

Step 1: Prepare the environment

Confirm that your cluster version and configuration meet the following requirements:
- Cluster versions:
  - PostgreSQL 16 (minor engine version 2.0.16.8.3.0 or later)
  - PostgreSQL 14 (minor engine version 2.0.14.10.20.0 or later)
  Note
  You can view the minor engine version number in the console or by running the SHOW polardb_version; statement. If the minor engine version does not meet the requirement, you must upgrade the minor engine version.
- The source table must have a primary key. The primary key column must be included when you create the columnstore index.
- The wal_level parameter must be set to logical. This adds the information required to support logical replication to the write-ahead logging (WAL).
  Note
  You can set the wal_level parameter in the console. Modifying this parameter restarts the cluster. Plan your business operations accordingly and proceed with caution.

Enable the columnstore index feature.

The method for enabling the columnstore index feature varies depending on the minor engine version of your PolarDB for PostgreSQL cluster:

PostgreSQL 16 (2.0.16.9.8.0 or later) or PostgreSQL 14 (2.0.14.17.35.0 or later)

For PolarDB for PostgreSQL clusters with these versions, two methods are available. The differences are outlined in the following table. You can choose the method that best fits your needs.

Comparison item	[Recommended] Add a columnstore index read-only node	Directly use the pre-installed columnstore index extension
Method	You can add a columnstore index node manually through a visual interface in the console.	No action is required. You can use the extension directly.
Resource allocation	The columnstore engine uses all resources exclusively and can fully utilize the available memory.	The columnstore engine can use only 25% of the memory. The remaining memory is allocated to the row store engine.
Business impact	Transactional processing (TP) and analytical processing (AP) workloads are isolated on different nodes and do not affect each other.	TP and AP workloads run on the same node and affect each other.
Costs	In-Memory Column Index (IMCI) read-only nodes incur additional charges and are billed at the same rate as regular compute nodes.	No additional cost.

Add a columnstore index read-only node

You can add a columnstore index read-only node in one of the following two ways:

Note

The cluster must contain at least one read-only node. You cannot add a columnstore index read-only node to a single-node cluster.

Add in the console

Log on to the PolarDB console and select the region where the cluster is located. You can open the Add/Remove Node wizard in one of the following ways:
- On the Clusters page, click Add/Remove Node in the Actions column.
- On the Basic Information page of the target cluster, click Add/Remove Node in the Database Nodes section.
Select Add Read-only IMCI Node and click OK.
On the cluster upgrade/downgrade page, add the columnstore index read-only node and complete the payment.
1. Click Add an IMCI Node and select the node specifications.
2. Select a switchover time.
3. (Optional) Review the Product Terms of Service and Service Level Agreement.
4. Click Buy Now.
After the payment is complete, return to the cluster details page and wait for the columnstore index read-only node to be added. The node is added when its status changes to Running.

Add during purchase

On the PolarDB purchase page, select the number of Nodes for the IMCI Read-Only Nodes parameter.

PostgreSQL 16 (2.0.16.8.3.0 to 2.0.16.9.8.0) or PostgreSQL 14 (2.0.14.10.20.0 to 2.0.14.17.35.0)

For PolarDB for PostgreSQL clusters with these versions, the columnstore index is deployed as the polar_csi extension in the database cluster. Before you can use the columnstore index, you must create the extension in the specified database.

Note

The polar_csi extension is scoped to the database level. To use the columnstore index in multiple databases within a cluster, you must create the polar_csi extension for each database.
The database account used to install the extension must be a privileged account.

You can install the polar_csi extension in one of the following two ways.

Install from the console

Log on to the PolarDB console. In the navigation pane on the left, click Clusters. Select the region where your cluster is located, and then click the cluster ID to go to the cluster details page.
In the navigation pane on the left, choose Settings and Management > Extension Management. On the Extension Management tab, select Uninstalled Extensions.
In the upper-right corner of the page, select the target database. In the row for the polar_csi extension, click Install in the Actions column. In the Install Extension dialog box that appears, select the target Database Account and click OK to install the extension in the target database.

Install from the command line

Connect to the database cluster and execute the following statement in a target database where you have the required permissions to create the polar_csi extension.

CREATE EXTENSION polar_csi;

Step 2: Prepare the data

In this example, you will create a multi-level partitioned table and insert 320 million rows of simulated data (about 16 GB). Then, you will perform statistical analysis based on partition conditions.

The schema of the test partitioned table is as follows:

sales: The primary table.
sales_2023: Partitioned by year.
- sales_2023_a: Partitioned by month. This partition contains data for months 1 to 6.
- sales_2023_b: Partitioned by month. This partition contains data for months 7 to 12.
sales_2024: Partitioned by year.
- sales_2024_a: Partitioned by month. This partition contains data for months 1 to 6.
- sales_2024_b: Partitioned by month. This partition contains data for months 7 to 12.

Create a multi-level partitioned table named sales. Use the sale_date column as the partition key. The definition is as follows.

CREATE TABLE sales (
    sale_id serial,
    product_id int NOT NULL,
    sale_date date NOT NULL,
    amount numeric(10,2) NOT NULL,
    primary key(sale_id, sale_date)
) PARTITION BY RANGE (sale_date);

CREATE TABLE sales_2023 PARTITION OF sales
    FOR VALUES FROM ('2023-1-1') TO ('2024-1-1')
    PARTITION BY RANGE (sale_date);
CREATE TABLE sales_2023_a PARTITION OF sales_2023
    FOR VALUES FROM ('2023-1-1') TO ('2023-7-1');
CREATE TABLE sales_2023_b PARTITION OF sales_2023
    FOR VALUES FROM ('2023-7-1') TO ('2024-1-1');

CREATE TABLE sales_2024 PARTITION OF sales
    FOR VALUES FROM ('2024-1-1') TO ('2025-1-1')
    PARTITION BY RANGE (sale_date);
CREATE TABLE sales_2024_a PARTITION OF sales_2024
    FOR VALUES FROM ('2024-1-1') TO ('2024-7-1');
CREATE TABLE sales_2024_b PARTITION OF sales_2024
    FOR VALUES FROM ('2024-7-1') TO ('2025-1-1');

Generate and insert about 16 GB of data into the partitioned table.

INSERT INTO sales (product_id, sale_date, amount)
SELECT
  (random()*100)::int AS product_id,
  '2023-01-1'::date + i/3200000*7 AS sale_date,
  (random()*1000)::numeric(10,2) AS amount
FROM
  generate_series(1, 320000000) i;

Create a columnstore index for the table. Add the sale_id, product_id, sale_date, and amount fields to the columnstore index.
```
CREATE INDEX ON sales USING CSI(sale_id, product_id, sale_date, amount);
```

Step 3: Execute queries

Run queries using different execution engines. Three queries, Q1, Q2, and Q3, are generated based on different partition conditions.

Use the columnstore index

--- Enable the columnstore index and set the degree of parallelism for queries to 4.
SET polar_csi.enable_query to on;
SET polar_csi.exec_parallel to 4;

--- Q1
EXPLAIN ANALYZE SELECT sale_date, COUNT(*) FROM sales WHERE sale_date BETWEEN '2023-1-1' and '2023-3-1' AND amount > 100 GROUP BY sale_date;
--- Q2
EXPLAIN ANALYZE SELECT sale_date, COUNT(*) FROM sales WHERE sale_date BETWEEN '2023-1-1' and '2023-9-1' AND amount > 100 GROUP BY sale_date;
--- Q3
EXPLAIN ANALYZE SELECT sale_date, COUNT(*) FROM sales WHERE sale_date BETWEEN '2023-1-1' and '2024-3-1' AND amount > 100 GROUP BY sale_date;

Disable the columnstore index and use the row-store engine

--- Disable the columnstore index, use the row-store engine, and set the degree of parallelism for queries to 4.
SET polar_csi.enable_query to off;
SET max_parallel_workers_per_gather to 4;

--- Q1
EXPLAIN ANALYZE SELECT sale_date, COUNT(*) FROM sales WHERE sale_date BETWEEN '2023-1-1' and '2023-3-1' AND amount > 100 GROUP BY sale_date;
--- Q2
EXPLAIN ANALYZE SELECT sale_date, COUNT(*) FROM sales WHERE sale_date BETWEEN '2023-1-1' and '2023-9-1' AND amount > 100 GROUP BY sale_date;
--- Q3
EXPLAIN ANALYZE SELECT sale_date, COUNT(*) FROM sales WHERE sale_date BETWEEN '2023-1-1' and '2024-3-1' AND amount > 100 GROUP BY sale_date;