This topic describes how to use ePQ parallel scans and batch writes for the statement to create materialized views.
Prerequisites
The feature is supported on the PolarDB for PostgreSQL clusters that run the following engine:
PostgreSQL 11 (revision version 1.1.22 or later)
PostgreSQL 14 (revision version 14.6.6.0 or later)
You can execute one of the following statements to view the revision version of a PolarDB for PostgreSQL cluster:
PostgreSQL 14
SELECT version();
PostgreSQL 11
SHOW polar_version;
Background information
Materialized views are database objects that contain query results. Unlike normal views, materialized views store both view definitions and data copies when they are created. If the data in the materialized view is inconsistent with that in the view definition, you can refresh the materialized view to keep the data consistent. Materialized views essentially pre-compute queries in view definitions so that they can be reused during queries.
The CREATE TABLE AS statement is used to create a table which uses the same structure as the output columns of a query.
The SELECT INTO statement is used to create a new table and write the queried data to the table instead of returning the queried data to the client. The new table uses exactly the same structure as the output columns of the query.
How it works
The PostgreSQL kernel uses the same set of code logics to handle the statements to create and refresh materialized views and the CREATE TABLE AS
and SELECT INTO
statements, because these statements perform similar steps at the database level. The following steps are usually included:
Data scans: Execute the query specified in the view definition or in the
CREATE TABLE AS
orSELECT INTO
statement to scan the data that meets the conditions.Data writes: Write the data scanned in the preceding step to a new materialized view or a new table.
To optimize the preceding steps, PolarDB for PostgreSQL introduces ePQ parallel scans and batch writes. When a large amount of data are to be scanned or written, ePQ parallel scans and batch writes can improve the performance of these DDL statements and shorten the execution time.
ePQ parallel scans: The ePQ feature allows you to use the I/O bandwidth and computing resources of multiple compute nodes to execute queries in view definitions in parallel. This improves the utilization of computing resources and bandwidth.
Batch writes: A certain number of tuples that are accumulated in the memory are written to the table or materialized view at a time. This reduces the overheads for recording WAL logs and the frequency of locking pages.
Usage
ePQ parallel scans
You can use the
polar_px_enable_create_table_as
parameter to specify whether to enable ePQ parallel scans to speed up queries for theCREATE TABLE AS
statement. Default value: ON. Valid values:ON: enables ePQ parallel scans to speed up queries for the
CREATE TABLE AS
statement.OFF: disables ePQ parallel scans to speed up queries for the
CREATE TABLE AS
statement.
SET polar_px_enable_create_table_as = ON;
NoteThis parameter is valid only if the ePQ feature is enabled (the
polar_enable_px
parameter is set to ON).Due to limits of the ePQ feature, you cannot enable ePQ parallel scans to speed up queries for the
CREATE TABLE AS ... WITH OIDS
statement. The processing for the CREATE TABLE AS ... WITH OIDS statement: use the built-in optimizer of PostgreSQL to generate an execution plan for the query in the DDL definition and use the single-node executor of PostgreSQL to perform the query.
Batch writes
You can use the
polar_enable_create_table_as_bulk_insert
parameter to specify whether to enable batch writes for theSELECT INTO
statement. Default value: ON. Valid values:ON: enables batch writes for the
SELECT INTO
statement.OFF: disables batch writes for the
SELECT INTO
statement.
SET polar_enable_create_table_as_bulk_insert = ON;