By digoal
With PostgreSQL, you may often deliberate over one question: how do you write data into a PostgreSQL database quickly? The database system offers several optimizations, such as asynchronous commit, deferred analysis, deferred index creation, lengthened checkpoint intervals, and a larger shared buffer. Beyond these settings, the write method you choose also matters.
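For reference, the server-side optimizations above map to settings along these lines. This is a sketch of illustrative postgresql.conf values, not tuned recommendations; adjust them for your workload and hardware:

```
# postgresql.conf -- illustrative values only
synchronous_commit = off       # asynchronous commit: do not wait for WAL flush at commit
checkpoint_timeout = 30min     # lengthen the interval between checkpoints
max_wal_size = 16GB            # let more WAL accumulate before a forced checkpoint
shared_buffers = 8GB           # larger shared buffer cache
```

Note that synchronous_commit = off trades a small window of potential transaction loss on a crash for much higher write throughput.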
The main write methods are as follows:
1) COPY delivers the best performance.
Example:
db2=> \h copy
Command: COPY
Description: copy data between a file and a table
Syntax:
COPY table_name [ ( column_name [, ...] ) ]
FROM { 'filename' | PROGRAM 'command' | STDIN }
[ [ WITH ] ( option [, ...] ) ]
[ WHERE condition ]
COPY { table_name [ ( column_name [, ...] ) ] | ( query ) }
TO { 'filename' | PROGRAM 'command' | STDOUT }
[ [ WITH ] ( option [, ...] ) ]
where option can be one of:
FORMAT format_name
FREEZE [ boolean ]
DELIMITER 'delimiter_character'
NULL 'null_string'
HEADER [ boolean ]
QUOTE 'quote_character'
ESCAPE 'escape_character'
FORCE_QUOTE { ( column_name [, ...] ) | * }
FORCE_NOT_NULL ( column_name [, ...] )
FORCE_NULL ( column_name [, ...] )
ENCODING 'encoding_name'
URL: https://www.postgresql.org/docs/12/sql-copy.html
db2=> create table t(id int);
CREATE TABLE
db2=> copy t from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1
>> 2
>> 3
>> 4
>> 5
>> 6
>> \.
COPY 6
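In practice the data usually comes from a file rather than STDIN. A sketch, assuming a CSV file /tmp/t.csv exists (the path is a placeholder):

```sql
-- server-side COPY: the file must be readable by the database server process
copy t from '/tmp/t.csv' with (format csv);

-- client-side \copy in psql: the file is read by the client and streamed to the server
\copy t from '/tmp/t.csv' csv
```

\copy is convenient when you cannot place files on the database host; server-side COPY avoids the extra client hop and is typically faster.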
2) Multi-row inserts, insert into ... values (),(),...;, perform almost as well as COPY.
db2=> insert into t values (1),(2),(3);
INSERT 0 3
db2=> prepare a (int,int,int) as insert into t values ($1),($2),($3);
PREPARE
db2=> execute a(1,2,3);
INSERT 0 3
db2=> select * from t;
id
----
1
2
3
(3 rows)
3) Place multiple statements in one transaction to reduce synchronous waits on write-ahead log (WAL) flushes and WAL input/output (I/O).
begin
...
commit;
db2=> begin;
BEGIN
db2=> insert into t values (4);
INSERT 0 1
db2=> insert into t values (5);
INSERT 0 1
db2=> insert into t values (6);
INSERT 0 1
db2=> commit;
COMMIT
4) Group commit.
When commit_delay is set, a committing backend waits briefly before flushing WAL; if at least commit_siblings other transactions are active at that moment, several commits can be flushed in a single WAL I/O.
This improves autocommit workloads under high concurrency but has little effect at low concurrency.
Check the current parameter values:
db2=> show commit_delay ;
commit_delay
--------------
0
(1 row)
db2=> show commit_siblings ;
commit_siblings
-----------------
5
(1 row)
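To experiment with group commit, the parameters can be changed per session or instance-wide. A sketch (the delay is in microseconds; changing commit_delay requires superuser privileges):

```sql
-- session level
set commit_delay = 10;       -- wait up to 10 microseconds before flushing WAL
set commit_siblings = 5;     -- only delay if at least 5 other transactions are active

-- instance-wide; takes effect after a configuration reload
alter system set commit_delay = 10;
select pg_reload_conf();
```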
5) JDBC batch rewriting (reWriteBatchedInserts). This changes batched inserts from insert into foo (col1, col2, col3) values (1,2,3) into multi-row statements such as insert into foo (col1, col2, col3) values (1,2,3), (4,5,6), which typically provides a 2-3x performance improvement. Consider a Hibernate example:
@Entity(name = "Post")
@Table(name = "post")
public class Post {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    private Long id;

    private String title;

    public Post() {}

    public Post(String title) {
        this.title = title;
    }

    //Getters and setters omitted for brevity
}
Hibernate statement batching is enabled with a configuration property:
<property
    name="hibernate.jdbc.batch_size"
    value="10"
/>
Persisting 10 entities now groups the inserts into a single JDBC batch:
for (int i = 0; i < 10; i++) {
    entityManager.persist(
        new Post(String.format("Post no. %d", i + 1))
    );
}
Query: ["insert into post (title, id) values (?, ?)"],
Params:[(Post no. 1, 1), (Post no. 2, 2), (Post no. 3, 3),
(Post no. 4, 4), (Post no. 5, 5), (Post no. 6, 6),
(Post no. 7, 7), (Post no. 8, 8), (Post no. 9, 9),
(Post no. 10, 10)
]
On the database side, however, the batch is still split into 10 individual inserts, which the server log reveals with:
log_statement = 'all'
LOG: execute S_2: insert into post (title, id) values ($1, $2)
DETAIL: parameters: $1 = 'Post no. 1', $2 = '1'
LOG: execute S_2: insert into post (title, id) values ($1, $2)
DETAIL: parameters: $1 = 'Post no. 2', $2 = '2'
LOG: execute S_2: insert into post (title, id) values ($1, $2)
DETAIL: parameters: $1 = 'Post no. 3', $2 = '3'
LOG: execute S_2: insert into post (title, id) values ($1, $2)
DETAIL: parameters: $1 = 'Post no. 4', $2 = '4'
LOG: execute S_2: insert into post (title, id) values ($1, $2)
DETAIL: parameters: $1 = 'Post no. 5', $2 = '5'
LOG: execute S_2: insert into post (title, id) values ($1, $2)
DETAIL: parameters: $1 = 'Post no. 6', $2 = '6'
LOG: execute S_2: insert into post (title, id) values ($1, $2)
DETAIL: parameters: $1 = 'Post no. 7', $2 = '7'
LOG: execute S_2: insert into post (title, id) values ($1, $2)
DETAIL: parameters: $1 = 'Post no. 8', $2 = '8'
LOG: execute S_2: insert into post (title, id) values ($1, $2)
DETAIL: parameters: $1 = 'Post no. 9', $2 = '9'
LOG: execute S_2: insert into post (title, id) values ($1, $2)
DETAIL: parameters: $1 = 'Post no. 10', $2 = '10'
To enable statement rewriting, configure the PostgreSQL JDBC data source:
PGSimpleDataSource dataSource =
    (PGSimpleDataSource) super.dataSource();
dataSource.setReWriteBatchedInserts(true);
The JDBC batch API (PreparedStatement#addBatch()) enables the driver to send multiple query executions in a single network round trip. The current implementation, however, still splits large batches into smaller ones to avoid a transmission control protocol (TCP) deadlock. With reWriteBatchedInserts enabled, the driver instead rewrites the batch into multi-row insert into ... values (),(),()... statements, as the server log shows:
LOG: execute <unnamed>: insert into post (title, id) values ($1, $2),($3, $4),($5, $6),($7, $8),($9, $10),($11, $12),($13, $14),($15, $16)
DETAIL: parameters: $1 = 'Post no. 1', $2 = '1', $3 = 'Post no. 2', $4 = '2', $5 = 'Post no. 3', $6 = '3', $7 = 'Post no. 4', $8 = '4', $9 = 'Post no. 5', $10 = '5', $11 = 'Post no. 6', $12 = '6', $13 = 'Post no. 7', $14 = '7', $15 = 'Post no. 8', $16 = '8'
LOG: execute <unnamed>: insert into post (title, id) values ($1, $2),($3, $4)
DETAIL: parameters: $1 = 'Post no. 9', $2 = '9', $3 = 'Post no. 10', $4 = '10'
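Alternatively, the same flag can be supplied as a connection property in the JDBC URL. The host, port, and database name below are placeholders:

```
jdbc:postgresql://localhost:5432/db2?reWriteBatchedInserts=true
```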