Tips for Executing SQL Statements in Batch: Which Is Better, gexec or Dynamic SQL?

This article introduces two methods for executing SQL statements in batch: using the gexec convenient command in psql and using dynamic SQL.

By digoal

Background

In some scenarios, we will use SQL statements to generate SQLs that need to be executed in batch, and then execute the generated SQLs. There are two relatively simple methods to do this.

Method 1

Use the gexec convenient command in psql. For example, we create a partitioned table,

create table tbl (id int, info text, ts timestamp) partition by hash (id);

and then we create N corresponding partitions.

select format('create table tbl_%s PARTITION OF tbl FOR VALUES WITH (MODULUS %s, REMAINDER %s);', i, 4, i) from generate_series(0,3) i;  
  
                                    format                                       
-------------------------------------------------------------------------------  
create table tbl_0 PARTITION OF tbl FOR VALUES WITH (MODULUS 4, REMAINDER 0);  
create table tbl_1 PARTITION OF tbl FOR VALUES WITH (MODULUS 4, REMAINDER 1);  
create table tbl_2 PARTITION OF tbl FOR VALUES WITH (MODULUS 4, REMAINDER 2);  
create table tbl_3 PARTITION OF tbl FOR VALUES WITH (MODULUS 4, REMAINDER 3);  
(4 rows)

Call the gexec command in psql to execute the result of the previous command as SQL.

\gexec  
  
postgres=# \gexec  
CREATE TABLE  
CREATE TABLE  
CREATE TABLE  
CREATE TABLE

The partition is created.

postgres=# \d+ tbl  
                                               Partitioned table "public.tbl"  
 Column |            Type             | Collation | Nullable | Default | Storage  | Compression | Stats target | Description   
--------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------  
 id     | integer                     |           |          |         | plain    |             |              |   
 info   | text                        |           |          |         | extended |             |              |   
 ts     | timestamp without time zone |           |          |         | plain    |             |              |   
Partition key: HASH (id)  
Partitions: tbl_0 FOR VALUES WITH (modulus 4, remainder 0),  
            tbl_1 FOR VALUES WITH (modulus 4, remainder 1),  
            tbl_2 FOR VALUES WITH (modulus 4, remainder 2),  
            tbl_3 FOR VALUES WITH (modulus 4, remainder 3)

For information on using gexec, refer to the man psql.

\gexec  
    Sends the current query buffer to the server, then treats each column of each row of the query's output (if any) as an SQL statement to be executed. For example, to create an index on  
    each column of my_table:  
  
        => SELECT format('create index on my_table(%I)', attname)  
        -> FROM pg_attribute  
        -> WHERE attrelid = 'my_table'::regclass AND attnum > 0  
        -> ORDER BY attnum  
        -> \gexec  
        CREATE INDEX  
        CREATE INDEX  
        CREATE INDEX  
        CREATE INDEX  
  
    The generated queries are executed in the order in which the rows are returned, and left-to-right within each row if there is more than one column. NULL fields are ignored. The  
    generated queries are sent literally to the server for processing, so they cannot be psql meta-commands nor contain psql variable references. If any individual query fails, execution of  
    the remaining queries continues unless ON_ERROR_STOP is set. Execution of each query is subject to ECHO processing. (Setting ECHO to all or queries is often advisable when using  
    \gexec.) Query logging, single-step mode, timing, and other query execution features apply to each generated query as well.  
  
If the current query buffer is empty, the most recently sent query is re-executed instead.

Method 2

Use dynamic SQL.

drop table tbl;  
  
do language plpgsql $$  
declare  
  p int := 4;  
begin  
  execute 'create table tbl (id int, info text, ts timestamp) partition by hash (id);';
  for i in 0 .. (p-1)   
  loop  
    execute format('create table tbl_%s PARTITION OF tbl FOR VALUES WITH (MODULUS %s, REMAINDER %s);', i, p, i);  
  end loop;  
end;  
$$;

The partition is created.

postgres=# \d+ tbl  
                                               Partitioned table "public.tbl"  
Column |            Type             | Collation | Nullable | Default | Storage  | Compression | Stats target | Description   
--------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------  
 id     | integer                     |           |          |         | plain    |             |              |   
 info   | text                        |           |          |         | extended |             |              |   
 ts     | timestamp without time zone |           |          |         | plain    |             |              |   
Partition key: HASH (id)  
Partitions: tbl_0 FOR VALUES WITH (modulus 4, remainder 0),  
            tbl_1 FOR VALUES WITH (modulus 4, remainder 1),  
            tbl_2 FOR VALUES WITH (modulus 4, remainder 2),  
            tbl_3 FOR VALUES WITH (modulus 4, remainder 3)

Difference between the Two Methods

gexec uses the multi-transaction mode.
Dynamic SQL encapsulated within a function is a single transaction. If there are a large number of statements, it may lead to transaction failures due to lock slot overflow. This depends on the two parameters: max_locks_per_transaction and max_connections.

Observations are as follows.

Dynamic SQL uses a single transaction.

postgres=# drop table tbl;  
DROP TABLE  
postgres=# select * from txid_current_snapshot();  
txid_current_snapshot   
-----------------------  
766:766:  
(1 row)  
  
postgres=# do language plpgsql $$  
declare  
  p int := 4;  
begin  
  execute 'create table tbl (id int, info text, ts timestamp) partition by hash (id);';
  for i in 0 .. (p-1)
  loop
    execute format('create table tbl_%s PARTITION OF tbl FOR VALUES WITH (MODULUS %s, REMAINDER %s);', i, p, i);  
  end loop;  
end;  
$$;  
DO  
postgres=# select * from txid_current_snapshot();  
 txid_current_snapshot   
-----------------------  
767:767:  
(1 row)

gexec uses a separate transaction for each SQL statement.

postgres=# drop table tbl;  
DROP TABLE  
postgres=# select * from txid_current_snapshot();  
 txid_current_snapshot   
-----------------------  
760:760:  
(1 row)  
  
postgres=# create table tbl (id int, info text, ts timestamp) partition by hash (id);
  
select format('create table tbl_%s PARTITION OF tbl FOR VALUES WITH (MODULUS %s, REMAINDER %s);', i, 4, i) from generate_series(0,3) i;  
CREATE TABLE  
                                    format                                       
-------------------------------------------------------------------------------  
 create table tbl_0 PARTITION OF tbl FOR VALUES WITH (MODULUS 4, REMAINDER 0);  
 create table tbl_1 PARTITION OF tbl FOR VALUES WITH (MODULUS 4, REMAINDER 1);  
 create table tbl_2 PARTITION OF tbl FOR VALUES WITH (MODULUS 4, REMAINDER 2);  
 create table tbl_3 PARTITION OF tbl FOR VALUES WITH (MODULUS 4, REMAINDER 3);  
(4 rows)  
  
postgres=# \gexec  
CREATE TABLE  
CREATE TABLE  
CREATE TABLE  
CREATE TABLE  
postgres=# select * from txid_current_snapshot();  
 txid_current_snapshot   
-----------------------  
765:765:  
(1 row)

0 0 0

Share on

: digoal

286 posts | 25 followers
Follow

PolarDB for PostgreSQL

Alibaba Cloud PolarDB for PostgreSQL is an in-house relational database service 100% compatible with PostgreSQL and highly compatible with the Oracle syntax.

Learn More

PolarDB for Xscale

Alibaba Cloud PolarDB for Xscale (PolarDB-X) is a cloud-native high-performance distributed database service independently developed by Alibaba Cloud.

Learn More