ePQ supports parallel scans and batch writes for the statements to create materialized views - PolarDB

This topic describes how to use ePQ parallel scans and batch writes for the statement to create materialized views.

Prerequisites

The feature is supported on the PolarDB for PostgreSQL clusters that run the following engine:

PostgreSQL 11 (revision version 1.1.22 or later)
PostgreSQL 14 (revision version 14.6.6.0 or later)

Note

You can execute one of the following statements to view the revision version of a PolarDB for PostgreSQL cluster:

PostgreSQL 14
```
SELECT version();
```
PostgreSQL 11
```
SHOW polar_version;
```

Background information

Materialized views are database objects that contain query results. Unlike normal views, materialized views store both view definitions and data copies when they are created. If the data in the materialized view is inconsistent with that in the view definition, you can refresh the materialized view to keep the data consistent. Materialized views essentially pre-compute queries in view definitions so that they can be reused during queries.

The CREATE TABLE AS statement is used to create a table which uses the same structure as the output columns of a query.

The SELECT INTO statement is used to create a new table and write the queried data to the table instead of returning the queried data to the client. The new table uses exactly the same structure as the output columns of the query.

How it works

The PostgreSQL kernel uses the same set of code logics to handle the statements to create and refresh materialized views and the CREATE TABLE AS and SELECT INTO statements, because these statements perform similar steps at the database level. The following steps are usually included:

Data scans: Execute the query specified in the view definition or in the CREATE TABLE AS or SELECT INTO statement to scan the data that meets the conditions.
Data writes: Write the data scanned in the preceding step to a new materialized view or a new table.

To optimize the preceding steps, PolarDB for PostgreSQL introduces ePQ parallel scans and batch writes. When a large amount of data are to be scanned or written, ePQ parallel scans and batch writes can improve the performance of these DDL statements and shorten the execution time.

ePQ parallel scans: The ePQ feature allows you to use the I/O bandwidth and computing resources of multiple compute nodes to execute queries in view definitions in parallel. This improves the utilization of computing resources and bandwidth.
Batch writes: A certain number of tuples that are accumulated in the memory are written to the table or materialized view at a time. This reduces the overheads for recording WAL logs and the frequency of locking pages.

Usage

ePQ parallel scans
You can use the polar_px_enable_create_table_as parameter to specify whether to enable ePQ parallel scans to speed up queries for the CREATE TABLE AS statement. Default value: ON. Valid values:
- ON: enables ePQ parallel scans to speed up queries for the CREATE TABLE AS statement.
- OFF: disables ePQ parallel scans to speed up queries for the CREATE TABLE AS statement.
```
SET polar_px_enable_create_table_as = ON;
```
Note
- This parameter is valid only if the ePQ feature is enabled (the polar_enable_px parameter is set to ON).
- Due to limits of the ePQ feature, you cannot enable ePQ parallel scans to speed up queries for the CREATE TABLE AS ... WITH OIDS statement. The processing for the CREATE TABLE AS ... WITH OIDS statement: use the built-in optimizer of PostgreSQL to generate an execution plan for the query in the DDL definition and use the single-node executor of PostgreSQL to perform the query.
Batch writes
You can use the polar_enable_create_table_as_bulk_insert parameter to specify whether to enable batch writes for the SELECT INTO statement. Default value: ON. Valid values:
- ON: enables batch writes for the SELECT INTO statement.
- OFF: disables batch writes for the SELECT INTO statement.
```
SET polar_enable_create_table_as_bulk_insert = ON;
```