How Does PolarDB-X Optimize Batch Insert?

The Batch Insert statement is a common way to write data to databases. The PolarDB-X is compatible with the MySQL protocol and syntax. The batch insert syntax is listed below:

INSERT [IGNORE] [INTO] table_name(column_name, ...) VALUES (value1, ...), (value2, ...), ...;

The following factors can affect the performance of a batch insert query:

Batch size
The degree of parallelism (DOP)
The number of shards
The number of columns
The number of global secondary indexes
The number of sequences

Factors such as the number of shards, the number of columns, the number of GSIs, and the number of sequences are domestic demand factors. They are set according to actual requirements and often affect the read performance. For example, if the number of GSIs is large, the write performance will decrease, but the read performance will be improved. This topic does not describe the impacts of the factors. This topic describes how to specify an appropriate batch size and DOP based on your business requirements.

Test Environment

The following table displays the settings used in the test:

Specification	Parameter
PolarDB-X Version	polarx-kernel_5.4.11-16279028_xcluster-20210802
Node Specifications	16 CPU cores, 64 GB memory
Nodes	4

You can use the following statement to create a table:

CREATE TABLE `sbtest1` (
    `id` int(11) NOT NULL,
    `k` int(11) NOT NULL DEFAULT '0',
    `c` char(120) NOT NULL DEFAULT '',
    `pad` char(60) NOT NULL DEFAULT '',
    PRIMARY KEY (`id`),
    KEY `k_1` (`k`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;

BATCH_INSERT_POLICY=SPLIT

PolarDB-X is optimized for batch data writing. Batch Insert is optimized to ensure better concurrency. When the size of a single Batch Insert statement exceeds 256K, PolarDB-X dynamically splits the Batch Insert statement into multiple small batches and executes them in series between multiple small batches. This feature is called SPLIT.

The best performance is ensured through the BATCH_INSERT_POLICY=SPLIT mechanism while reducing the cost of batch inserts PolarDB-X in parallel execution. The load imbalance of multiple nodes is avoided in distributed areas as much as possible.

Parameters:

BATCH_INSERT_POLICY: You can set the value to SPLIT or NONE. The default value is SPLIT. If the value of this parameter is set to SPLIT, the automatic split feature is enabled.
MAX_BATCH_INSERT_SQL_LENGTH: This parameter specifies the value of a size threshold for SQL statements. The default value is 256, and the unit is KB. If the default value is used, the system automatically splits batch insert statements whose size exceeds 256 KB.
BATCH_INSERT_CHUNK_SIZE_DEFAULT: This parameter specifies the maximum size of each batch after the statement is split. The default value is 200. They represent the batch size of small batches after each split when the dynamic split batch is triggered.

You can add the following hint statement: /*+ TDDL:CMD_EXTRA(BATCH_INSERT_POLICY=NONE)*/ to disable the BATCH_INSERT_POLICY=SPLIT mechanism. The goal of this parameter is to disable the BATCH_INSERT_POLICY policy. It ensures the batch size is not automatically split when the PolarDB-X is executed. It can be used to verify the performance when the batch size is 2000, 5000, and 10000. The test results show the increase is not obvious after the batch size exceeds 1000.

Performance Benchmark for a Single Table

Background

In distributed scenarios, an order table is only placed on one host. Its performance can be used as a basic performance baseline to evaluate the horizontal scaling capability of the partition table. Data are evenly distributed to multiple hosts through a partition table.

The test method is to perform the Batch Insert operation on a single table in the PolarDB-X. The data of a single table only exists in one data storage node. PolarDB-X writes data to the corresponding data storage node based on the table definition.

Scenario 1: Batch Size

Parameters: [parallelism: 16] [column: 4] [gsi: none] [sequence: none]

	batch size	1	10	100	500	1000	2000	5000	10000
PolarDB-X [Single Table]	Performance (Rows per Second)	5397	45653	153216	211976	210644	215103	221919	220529

Scenario 2: Parallelism

Parameters: [batch size:1,000] [column: 4] [gsi: none] [sequence: none]

	thread	1	2	4	8	16	32	64	128
PolarDB-X [Single Table]	Performance (Rows per Second)	22625	41326	76052	127646	210644	223431	190138	160858

Summary

We recommend specifying a batch size of 1,000 rows and setting the DOP to a value in the range of 16 to 32 to test the performance of a table.

Note: When the batch size is 2,000, 5,000, and 10,000, you need to add a hint parameter to disable the SPLIT feature. According to the test results, the increase is not obvious after the batch size exceeds 1000. Example: /*+ TDDL:CMD_EXTRA(BATCH_INSERT_POLICY=NONE)*/

Benchmark Testing for the Performance of Sharded Tables

The batch size and DOP affect the performance of batch insert queries. The tests described in this section test query performance based on batch size and DOP.

Scenario 1: Batch Size

When a batch insert statement is executed on a sharded table, the sharding function used to shard the table shards the values specified in the statement. The batch size of the data that is pushed down to data nodes is different from the specified batch size. The following figure shows the process:

Therefore, under data shards, the Batch Insert statement of the PolarDB-X can be larger, or the data of the same physical table can be put into one Batch Insert statement as much as possible. This ensures that the batch size that is pushed down to a single data shard after the shard function separates values is appropriate to improve the performance of the storage node.

Sub-Scenario 1: BATCH_INSERT_POLICY=SPLIT

Parameters: [BATCH_INSERT_POLICY: on] [parallelism: 32] [number of parts: 32] [column: 4] [gsi: none] [sequence: none]

	batch size	1	10	100	500	1000	2000	5000	10000
PolarDB-X [Number of Parts: 32]	Performance (Rows per Second)	12804	80987	229995	401215	431579	410120	395398	389176

Note: If batch size >= 2000, the BATCH_INSERT_POLICY policy is triggered.

Sub-Scenario 2: BATCH_INSERT_POLICY=NONE

Parameters: [BATCH_INSERT_POLICY: disabled] – The other parameters are the same.

	batch size	1000	2000	5000	10000	20000	30000	50000
PolarDB-X [Number of Parts: 32]	Performance (Rows per Second)	431579	463112	490350	526751	549990	595026	685500

Summary

BATCH_INSERT_POLICY=SPLIT, batch size is 1000, and the overall performance is 43w rows per second, which is twice that of a single table.
BATCH_INSERT_POLICY=NODE, the values in this test are randomly distributed, and the splitting function is Hash, so the data distributed to each shard is uniform. In theory, the performance will be better under the batch size with the number of shards * 1,000. When the maximum batch size is 50,000, the overall performance will be 680,000 rows per second.

Scenario 2: Parallelism

The appropriate criterion for determining the degree of parallelism is to fill up the CPU utilization of PolarDB-X data nodes or fill up the IOPS to achieve better performance. Since the Batch Insert statement has no calculation, the overhead of PolarDB-X compute nodes is small. The main overhead is in PolarDB-X data nodes. If the degree of parallelism is too small or too large, the performance will be affected. The factors that affect the value of parallelism include the number of nodes, node specifications (number of cores and CPU), and thread pool pressure. Therefore, it is difficult to obtain an exact number of parallelism. We recommend testing it in a practical environment to find out the best parallelism suitable for this environment.

Sub-Scenario 1: Test the different parallelism of the 1,000 batch size under four nodes.

Parameters: [batch size:1000] [column: 4] [gsi: none] [sequence: none]

	thread	1	2	4	8	16	32	64	80	96
PolarDB-X [Number of Parts: 32]	Performance (Rows per Second)	40967	80535	151415	246062	367720	431579	478876	499918	487173

Summary

In the test, the performance reaches the peak value when the value of DOP is in the range of 64 to 80. The highest performance is approximately 500,000 rows per second.

Sub-Scenario 2: A batch insert statement is executed on different sharded tables deployed on different groups of nodes.

Parameters: [2 nodes: 2CN*2DN] [batch size: 20000] [column: 4] [gsi: none] [sequence: none]

	thread	4	8	12	16
PolarDB-X [Number of Parts: 16]	Performance (Rows per Second)	159794	302754	296298	241444

Parameters: [3 nodes: 3CN*3DN] [batch size: 20000] [column: 4] [gsi: none] [sequence: none]

	thread	9	12	15	18
PolarDB-X [Number of Parts: 24]	Performance (Rows per Second)	427212	456050	378420	309052

Parameters: [4 nodes: 4CN*4DN] [batch size:20000] [column: 4] [gsi: none] [sequence: none]

	thread	16	32	40	64
PolarDB-X [Number of Parts: 32]	Performance (Rows per Second)	464612	549990	551992	373268

Summary

When the number of database nodes increases, the value of DOP needs to be increased to ensure high performance. The peak value under two nodes is 8 concurrent 30w. The peak value under three nodes is 12 concurrent 45w. The peak value under four nodes is 32 concurrent 55w. The overall linear rate is about 0.9~1 with the performance improvement of the number of nodes.

Sub-Scenario 3: The Degree of Parallelism Test under Different Node Specifications

Parameters: [batch size: 20000] [column: 4] [gsi: none] [sequence: none]

	thread	4	8	10	12	16
PolarDB-X [4-core 16GB]	Performance (Rows per Second)	165674	288828	276837	264873	204738

	thread	8	10	12	16
PolarDB-X [8-core 32GB]	Performance (Rows per Second)	292780	343498	315982	259892

	thread	16	32	40	64
PolarDB- [16-core 64GB]	Performance (Rows per Second)	464612	549990	551992	373268

Summary

If the node specifications are upgraded, the value of DOP needs to be increased to ensure high performance. If the statement is executed on a node whose specification is 4 cores and 16 GB of memory, the system reaches the highest performance when the DOP is set to 8. The highest performance is approximately 280,000 rows per second. If the statement is executed on a node whose specification is 8 cores and 32 GB of memory, the system reaches the highest performance when the DOP is set to 10. The highest performance is approximately 340,000 rows per second. If the statement is executed on a node whose specification is 16 cores and 64 GB of memory, the system reaches the highest performance when the DOP is set to 32. The highest performance is approximately 550,000 rows per second. The linear rate of the performance increase is approximately from 50% to 60% when the specification of the node is upgraded to the nearest specification.

Test Summary

We recommend specifying a batch size of 1,000 rows and setting the DOP to a value in the range of 16 to 32 to ensure high concurrency and resource usage of your PolarDB-X instance.
If you want to increase the batch size, we recommend setting the batch size to the number of shards * [100 to 1,000], such as 20,000 to 50,000. Make sure the size of each statement used to insert a batch of data is in the range of 2 MB to 8 MB. The maximum size is 16 MB. If you do not want to use the automatic split feature, you must use a hint to specify BATCH_INSERT_POLICY=NONE. If you use an SQL statement to query a large size of data, high workloads exist on each compute node in a distributed system, and issues may occur (such as high memory usage and unbalanced loads among multiple nodes).
Batch Insert queries consume IOPS resources. CPU utilization and memory usage are not the primary factors that cause performance bottleneck issues. If you want to increase the resources of your PolarDB-X instance to improve the performance of queries, we recommend adding nodes to your PolarDB-X instance.
We recommend using Batch Tool, a PolarDB-X supporting import, and export tool for batch import of offline text data. This tool will be open-source soon. Please visit this link for more information about how to use it.

Community

How Does PolarDB-X Optimize Batch Insert?

Test Environment

BATCH_INSERT_POLICY=SPLIT

Performance Benchmark for a Single Table

Background

Scenario 1: Batch Size

Scenario 2: Parallelism

Summary

Benchmark Testing for the Performance of Sharded Tables

Scenario 1: Batch Size

Sub-Scenario 1: BATCH_INSERT_POLICY=SPLIT

Sub-Scenario 2: BATCH_INSERT_POLICY=NONE

Summary

Scenario 2: Parallelism

Sub-Scenario 1: Test the different parallelism of the 1,000 batch size under four nodes.

Summary

Sub-Scenario 2: A batch insert statement is executed on different sharded tables deployed on different groups of nodes.

Summary

Sub-Scenario 3: The Degree of Parallelism Test under Different Node Specifications

Summary

Test Summary

Read previous post:

Read next post:

ApsaraDB

You may also like

Comments

ApsaraDB

Related Products

PolarDB for MySQL

PolarDB for PostgreSQL

PolarDB for Xscale

Architecture and Structure Design