In PostgreSQL, partitioned tables are an effective way to manage data growth, and partition pruning helps accelerate queries. The columnstore index in PolarDB for PostgreSQL also supports partitioned tables to further meet the statistical and analytical needs of these tables.
Background
As business systems run, they generate large amounts of historical data, which causes tables to grow. Data is usually partitioned by dimensions such as time or user_id, and each partition stores only a subset of the data. Native PostgreSQL also uses partition pruning during queries to avoid reading irrelevant data.
The columnstore index in PolarDB for PostgreSQL also supports accelerated analytics on partitioned tables. You can use it in the same way as existing indexes on partitioned tables.
Results
With a degree of parallelism of 4, the columnstore index performs over 35 times faster than native PostgreSQL parallel execution for all three queries.
Query | Native PostgreSQL parallel execution | Columnstore index |
Q1 | 2.13 s | 0.05 s |
Q2 | 6.42 s | 0.18 s |
Q3 | 10.51 s | 0.30 s |
Procedure
Step 1: Prepare the environment
Confirm that your cluster version and configuration meet the following requirements:
Cluster versions:
PostgreSQL 16 (minor engine version 2.0.16.8.3.0 or later)
PostgreSQL 14 (minor engine version 2.0.14.10.20.0 or later)
NoteYou can view the minor engine version number in the console or by running the
SHOW polardb_version;statement. If the minor engine version does not meet the requirement, you must upgrade the minor engine version.The source table must have a primary key. The primary key column must be included when you create the columnstore index.
The
wal_levelparameter must be set tological. This adds the information required to support logical replication to the write-ahead logging (WAL).NoteYou can set the wal_level parameter in the console. Modifying this parameter restarts the cluster. Plan your business operations accordingly and proceed with caution.
Enable the columnstore index feature.
The method for enabling the columnstore index feature varies depending on the minor engine version of your PolarDB for PostgreSQL cluster:
Step 2: Prepare the data
In this example, you will create a multi-level partitioned table and insert 320 million rows of simulated data (about 16 GB). Then, you will perform statistical analysis based on partition conditions.
The schema of the test partitioned table is as follows:
sales: The primary table.sales_2023: Partitioned by year.sales_2023_a: Partitioned by month. This partition contains data for months 1 to 6.sales_2023_b: Partitioned by month. This partition contains data for months 7 to 12.
sales_2024: Partitioned by year.sales_2024_a: Partitioned by month. This partition contains data for months 1 to 6.sales_2024_b: Partitioned by month. This partition contains data for months 7 to 12.
Create a multi-level partitioned table named
sales. Use thesale_datecolumn as the partition key. The definition is as follows.CREATE TABLE sales ( sale_id serial, product_id int NOT NULL, sale_date date NOT NULL, amount numeric(10,2) NOT NULL, primary key(sale_id, sale_date) ) PARTITION BY RANGE (sale_date); CREATE TABLE sales_2023 PARTITION OF sales FOR VALUES FROM ('2023-1-1') TO ('2024-1-1') PARTITION BY RANGE (sale_date); CREATE TABLE sales_2023_a PARTITION OF sales_2023 FOR VALUES FROM ('2023-1-1') TO ('2023-7-1'); CREATE TABLE sales_2023_b PARTITION OF sales_2023 FOR VALUES FROM ('2023-7-1') TO ('2024-1-1'); CREATE TABLE sales_2024 PARTITION OF sales FOR VALUES FROM ('2024-1-1') TO ('2025-1-1') PARTITION BY RANGE (sale_date); CREATE TABLE sales_2024_a PARTITION OF sales_2024 FOR VALUES FROM ('2024-1-1') TO ('2024-7-1'); CREATE TABLE sales_2024_b PARTITION OF sales_2024 FOR VALUES FROM ('2024-7-1') TO ('2025-1-1');Generate and insert about 16 GB of data into the partitioned table.
INSERT INTO sales (product_id, sale_date, amount) SELECT (random()*100)::int AS product_id, '2023-01-1'::date + i/3200000*7 AS sale_date, (random()*1000)::numeric(10,2) AS amount FROM generate_series(1, 320000000) i;Create a columnstore index for the table. Add the
sale_id,product_id,sale_date, andamountfields to the columnstore index.CREATE INDEX ON sales USING CSI(sale_id, product_id, sale_date, amount);
Step 3: Execute queries
Run queries using different execution engines. Three queries, Q1, Q2, and Q3, are generated based on different partition conditions.
Use the columnstore index
--- Enable the columnstore index and set the degree of parallelism for queries to 4. SET polar_csi.enable_query to on; SET polar_csi.exec_parallel to 4; --- Q1 EXPLAIN ANALYZE SELECT sale_date, COUNT(*) FROM sales WHERE sale_date BETWEEN '2023-1-1' and '2023-3-1' AND amount > 100 GROUP BY sale_date; --- Q2 EXPLAIN ANALYZE SELECT sale_date, COUNT(*) FROM sales WHERE sale_date BETWEEN '2023-1-1' and '2023-9-1' AND amount > 100 GROUP BY sale_date; --- Q3 EXPLAIN ANALYZE SELECT sale_date, COUNT(*) FROM sales WHERE sale_date BETWEEN '2023-1-1' and '2024-3-1' AND amount > 100 GROUP BY sale_date;Disable the columnstore index and use the row-store engine
--- Disable the columnstore index, use the row-store engine, and set the degree of parallelism for queries to 4. SET polar_csi.enable_query to off; SET max_parallel_workers_per_gather to 4; --- Q1 EXPLAIN ANALYZE SELECT sale_date, COUNT(*) FROM sales WHERE sale_date BETWEEN '2023-1-1' and '2023-3-1' AND amount > 100 GROUP BY sale_date; --- Q2 EXPLAIN ANALYZE SELECT sale_date, COUNT(*) FROM sales WHERE sale_date BETWEEN '2023-1-1' and '2023-9-1' AND amount > 100 GROUP BY sale_date; --- Q3 EXPLAIN ANALYZE SELECT sale_date, COUNT(*) FROM sales WHERE sale_date BETWEEN '2023-1-1' and '2024-3-1' AND amount > 100 GROUP BY sale_date;





