By digoal
With PostgreSQL, you may often deliberate over one question: how do you write data into a PostgreSQL database quickly? The database system offers several optimizations, such as asynchronous commit, deferred analysis, deferred index creation, lengthened checkpoint intervals, and a larger shared buffer. Beyond these settings, the write method you choose also matters.
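For reference, the server-side optimizations above map to settings along these lines. This is a sketch of illustrative postgresql.conf values, not tuned recommendations; adjust them for your workload and hardware:

```
# postgresql.conf -- illustrative values only
synchronous_commit = off       # asynchronous commit: do not wait for WAL flush at commit
checkpoint_timeout = 30min     # lengthen the interval between checkpoints
max_wal_size = 16GB            # let more WAL accumulate before a forced checkpoint
shared_buffers = 8GB           # larger shared buffer cache
```

Note that synchronous_commit = off trades a small window of potential transaction loss on a crash for much higher write throughput.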
The main write methods are as follows:
1) COPY delivers the best performance.
Example:
db2=> \h copy
Command: COPY
Description: copy data between a file and a table
Syntax:
COPY table_name [ ( column_name [, ...] ) ]
FROM { 'filename' | PROGRAM 'command' | STDIN }
[ [ WITH ] ( option [, ...] ) ]
[ WHERE condition ]
COPY { table_name [ ( column_name [, ...] ) ] | ( query ) }
TO { 'filename' | PROGRAM 'command' | STDOUT }
[ [ WITH ] ( option [, ...] ) ]
where option can be one of:
FORMAT format_name
FREEZE [ boolean ]
DELIMITER 'delimiter_character'
NULL 'null_string'
HEADER [ boolean ]
QUOTE 'quote_character'
ESCAPE 'escape_character'
FORCE_QUOTE { ( column_name [, ...] ) | * }
FORCE_NOT_NULL ( column_name [, ...] )
FORCE_NULL ( column_name [, ...] )
ENCODING 'encoding_name'
URL: https://www.postgresql.org/docs/12/sql-copy.html
db2=> create table t(id int);
CREATE TABLE
db2=> copy t from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1
>> 2
>> 3
>> 4
>> 5
>> 6
>> \.
COPY 6
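In practice the data usually comes from a file rather than STDIN. A sketch, assuming a CSV file /tmp/t.csv exists (the path is a placeholder):

```sql
-- server-side COPY: the file must be readable by the database server process
copy t from '/tmp/t.csv' with (format csv);

-- client-side \copy in psql: the file is read by the client and streamed to the server
\copy t from '/tmp/t.csv' csv
```

\copy is convenient when you cannot place files on the database host; server-side COPY avoids the extra client hop and is typically faster.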
2) Multi-row inserts, insert into ... values (),(),...;, perform almost as well as COPY.
db2=> insert into t values (1),(2),(3);
INSERT 0 3
db2=> prepare a (int,int,int) as insert into t values ($1),($2),($3);
PREPARE
db2=> execute a(1,2,3);
INSERT 0 3
db2=> select * from t;
id
----
1
2
3
(3 rows)
3) Place multiple statements in one transaction to reduce synchronous waits on write-ahead log (WAL) flushes and WAL input/output (I/O).
begin
...
commit;
db2=> begin;
BEGIN
db2=> insert into t values (4);
INSERT 0 1
db2=> insert into t values (5);
INSERT 0 1
db2=> insert into t values (6);
INSERT 0 1
db2=> commit;
COMMIT
4) Group commit.
When commit_delay is set, a committing backend waits briefly before flushing WAL; if at least commit_siblings other transactions are active at that moment, several commits can be flushed in a single WAL I/O.
This improves autocommit workloads under high concurrency but has little effect at low concurrency.
Check the current parameter values:
db2=> show commit_delay ;
commit_delay
--------------
0
(1 row)
db2=> show commit_siblings ;
commit_siblings
-----------------
5
(1 row)
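To experiment with group commit, the parameters can be changed per session or instance-wide. A sketch (the delay is in microseconds; changing commit_delay requires superuser privileges):

```sql
-- session level
set commit_delay = 10;       -- wait up to 10 microseconds before flushing WAL
set commit_siblings = 5;     -- only delay if at least 5 other transactions are active

-- instance-wide; takes effect after a configuration reload
alter system set commit_delay = 10;
select pg_reload_conf();
```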
5) JDBC batch rewriting (reWriteBatchedInserts). This changes batched inserts from insert into foo (col1, col2, col3) values (1,2,3) into multi-row statements such as insert into foo (col1, col2, col3) values (1,2,3), (4,5,6), which typically provides a 2-3x performance improvement. Consider a Hibernate example:
@Entity(name = "Post")
@Table(name = "post")
public class Post {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    private Long id;

    private String title;

    public Post() {}

    public Post(String title) {
        this.title = title;
    }

    //Getters and setters omitted for brevity
}
Hibernate statement batching is enabled with a configuration property:
<property
    name="hibernate.jdbc.batch_size"
    value="10"
/>
Persisting 10 entities now groups the inserts into a single JDBC batch:
for (int i = 0; i < 10; i++) {
    entityManager.persist(
        new Post(String.format("Post no. %d", i + 1))
    );
}
Query: ["insert into post (title, id) values (?, ?)"],
Params:[(Post no. 1, 1), (Post no. 2, 2), (Post no. 3, 3),
(Post no. 4, 4), (Post no. 5, 5), (Post no. 6, 6),
(Post no. 7, 7), (Post no. 8, 8), (Post no. 9, 9),
(Post no. 10, 10)
]
On the database side, however, the batch is still split into 10 individual inserts, which the server log reveals with:
log_statement = 'all'
LOG: execute S_2: insert into post (title, id) values ($1, $2)
DETAIL: parameters: $1 = 'Post no. 1', $2 = '1'
LOG: execute S_2: insert into post (title, id) values ($1, $2)
DETAIL: parameters: $1 = 'Post no. 2', $2 = '2'
LOG: execute S_2: insert into post (title, id) values ($1, $2)
DETAIL: parameters: $1 = 'Post no. 3', $2 = '3'
LOG: execute S_2: insert into post (title, id) values ($1, $2)
DETAIL: parameters: $1 = 'Post no. 4', $2 = '4'
LOG: execute S_2: insert into post (title, id) values ($1, $2)
DETAIL: parameters: $1 = 'Post no. 5', $2 = '5'
LOG: execute S_2: insert into post (title, id) values ($1, $2)
DETAIL: parameters: $1 = 'Post no. 6', $2 = '6'
LOG: execute S_2: insert into post (title, id) values ($1, $2)
DETAIL: parameters: $1 = 'Post no. 7', $2 = '7'
LOG: execute S_2: insert into post (title, id) values ($1, $2)
DETAIL: parameters: $1 = 'Post no. 8', $2 = '8'
LOG: execute S_2: insert into post (title, id) values ($1, $2)
DETAIL: parameters: $1 = 'Post no. 9', $2 = '9'
LOG: execute S_2: insert into post (title, id) values ($1, $2)
DETAIL: parameters: $1 = 'Post no. 10', $2 = '10'
To enable statement rewriting, configure the PostgreSQL JDBC data source:
PGSimpleDataSource dataSource =
    (PGSimpleDataSource) super.dataSource();
dataSource.setReWriteBatchedInserts(true);
The JDBC batch API (PreparedStatement#addBatch()) enables the driver to send multiple query executions in a single network round trip. The current implementation, however, still splits large batches into smaller ones to avoid a transmission control protocol (TCP) deadlock. With reWriteBatchedInserts enabled, the driver instead rewrites the batch into multi-row insert into ... values (),(),()... statements, as the server log shows:
LOG: execute <unnamed>: insert into post (title, id) values ($1, $2),($3, $4),($5, $6),($7, $8),($9, $10),($11, $12),($13, $14),($15, $16)
DETAIL: parameters: $1 = 'Post no. 1', $2 = '1', $3 = 'Post no. 2', $4 = '2', $5 = 'Post no. 3', $6 = '3', $7 = 'Post no. 4', $8 = '4', $9 = 'Post no. 5', $10 = '5', $11 = 'Post no. 6', $12 = '6', $13 = 'Post no. 7', $14 = '7', $15 = 'Post no. 8', $16 = '8'
LOG: execute <unnamed>: insert into post (title, id) values ($1, $2),($3, $4)
DETAIL: parameters: $1 = 'Post no. 9', $2 = '9', $3 = 'Post no. 10', $4 = '10'
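Alternatively, the same flag can be supplied as a connection property in the JDBC URL. The host, port, and database name below are placeholders:

```
jdbc:postgresql://localhost:5432/db2?reWriteBatchedInserts=true
```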