×
Community Blog Does a Single Table with 5 Million Rows Need Database Sharding and Table Sharding?

Does a Single Table with 5 Million Rows Need Database Sharding and Table Sharding?

This article evaluates the optimization practices in the context of PolarDB and provides insights into large table optimization practices for developers and database administrators.

By Rongbiao Xie (Guimo)

1. Traditional Pain Points of Database Sharding and Table Sharding

In business architecture design, one of the most critical aspects is database modeling. Due to limitations in single-machine computing and storage capacity, many architects have to split large tables by using middleware or other methods to distribute data across multiple databases or tables, ultimately achieving resource scalability through sharding. As a result, it is often said within the MySQL ecosystem that a single table should not exceed 5 million rows.

In general, a sharding solution addresses the following issues:

• Insufficient local storage
• Insufficient computing power
• Operational challenges with large tables

When the data volume increases, the O&M team has to consider data splitting to ensure business stability. Due to the compatibility challenges of database sharding and table sharding, modification to the business code is required. The development team must participate in architecture transformation and rewrite SQL statements to adapt to the new database and table structure. The business team focuses on business growth and has little time to participate in system splitting in most cases. As a result, system splitting is postponed until it ultimately must be implemented. During this period, the stability of the overall system is constantly at risk.

The rule of no more than 5 million rows in a table has deep historical roots in the MySQL ecosystem. In terms of resources, the I/O capacity of the early server was relatively low. If a single table was too large, the height of the B-tree index would be increased, leading to I/O problems. Additionally, disk capacity was limited, so storage limits and backup space were concerns. Operationally, older versions of MySQL (V5.5 and earlier) did not support online DDL, which could cause service disruptions when maintaining large tables.

Compared with traditional MySQL database systems, the cloud-native database PolarDB owned by Alibaba Cloud ApsaraDB uses a shared distributed storage system that provides significantly more powerful capabilities for processing large tables. Most traditional scenarios of database sharding and table sharding are no longer applicable. A large number of users have more than 10 TB or 1 billion rows of data in a single table in the PolarDB public cloud. Performing database sharding and table sharding on data that involves such single tables is considerably complex. Therefore, we recommend that you do not perform excessive optimization at the earliest opportunity unless necessary.

2. Large Table Optimization Practices in PolarDB

2.1 Provide Large Elastic Storage

During rapid growth phases, many businesses realized that data splitting was necessary not because they hit a computational bottleneck, but because the storage capacity for massive datasets reached its limit within a single instance. This situation often arises when the initial design does not account for large-scale data usage patterns or when the business logic does not allow for data cleanup or archiving.

PolarDB for MySQL uses a compute-storage separation architecture and distributed shared storage. Compared with traditional standalone MySQL instances, distributed shared storage can store up to 500 TB of data in a single PolarDB for MySQL cluster.

With a large amount of storage space, users do not have to worry about disk capacity restrictions and scaling issues. They only need to focus on the design of business requirements and do not need to split data into databases and tables. This greatly reduces the complexity of business systems.

1

2.2 Optimize Computing Capabilities

One of the biggest factors that limit standalone instances is the performance of large tables. To prevent large tables from becoming the bottleneck in database operations, PolarDB has conducted extensive research and validation. To address the pain points in data insertion scenarios of large tables, we studied the mechanisms of each module that introduced bottlenecks and optimized them accordingly. Through benchmark tests and actual business scenarios, we can achieve a performance improvement of up to 10 times, including:

2.2.1 Optimize Index Concurrency Control

PolarDB optimizes the large index lock for a single instance and innovatively proposes a Polar index that allows concurrent splitting. This reduces the conflict overhead during high-concurrency access. In an actual online business scenario, the performance can be improved by 3 times, and in TPCC scenarios, the performance can be improved by 11 times.

2

2.2.2 Optimize Table File Extension

Table files require to be extended because data is continuously written. High-frequency data insertion also increases the frequency of tablespace extension. In MySQL, tablespace extension is a resource-intensive operation. Therefore, highly frequent extension operations affect the execution performance of transactions. To reduce the overhead of file extension in data insertion scenarios of large tables, PolarDB utilizes its self-developed distributed file system. During the process of table file extension, it minimally modifies file system metadata, ensuring that lock overhead does not become a bottleneck. This greatly improves the performance of PolarDB in scenarios where large amounts of data are written.

3

2.2.3 Optimize Redo Log Writes

High-load insertions inevitably generate large volumes of redo logs. To ensure the durability of a transaction, the transaction can be returned only after all redo logs are stored on the disk. Therefore, the latency and throughput of redo log writes directly affect database performance. PolarDB innovatively introduces a parallel redo log write mechanism to address the pain points of writing large volumes of redo logs for large tables to disk. Currently, redo log throughput can reach up to 4 GB/s, with a single instance's I/O throughput peaking at 5.2 GB/s.

4

2.2.4 Optimize Logical Prefetching, I/O Scheduling, and Parallel Secondary Index Insertion

Actual experience and system tests indicate that to address the challenges of cloud database systems, the performance of PolarDB in handling large tables in a single instance can meet most business requirements.

5

2.2.5 Performance Data

We use sysbench to insert 40 billion rows of data (10 TB). As a comparison, PolarDB also inserts 4 billion rows of data (1 TB) and 5 million rows of data (125 GB). On the whole, the performance of a single large table is similar to that of a single small table in high-concurrency scenarios. In addition, in the real production environment, such intensive writes do not cause dirty pages to remain at a very high level as in the stress testing scenario, and can fully carry general business traffic.

6
Performance comparison between 40 billion rows and 4 billion rows

7
Performance comparison between 4 billion rows and 5 million rows

2.3 Optimize DDL O&M Capabilities

2.3.1 Parallel DDL

Traditional DDL operations are designed based on a single core and traditional hard disks. As a result, it takes a long time to execute the DDL operations on large tables, and the latency is high. For example, when you perform DDL operations to create secondary indexes, subsequent DML queries that depend on the new indexes are blocked due to the high latency of the DDL operations. The multi-core processors provide hardware support for the parallel DDL feature to use more threads. In addition, Solid State Disks (SSDs) are widely used so that random access latency is close to sequential access latency. In this case, the parallel DDL feature can be used to accelerate the process of creating indexes on large tables.

The parallel DDL feature of PolarDB can maximize the usage of hardware resources to significantly reduce the execution time of DDL operations. This prevents the blockage of subsequent DML operations and shortens the windows of DDL operations.

Since the launch of parallel DDL, it has actually helped a large number of customers solve the pain points of adding indexes to large tables. User cases show that secondary indexes can be created for a large table that is 5 TB in size and has 6 billion rows within only one hour after the parallel DDL feature is enabled.

The following figure shows the increased rates of execution efficiency of the DDL operations that create secondary indexes on the b field of the INT type for tables of various sizes after the parallel DDL feature is enabled. Different numbers of parallel threads are used. The results show that a performance boost of up to 2000% can be achieved when 32 threads are enabled.

8
Comparison of DDL performance improvements with different numbers of parallel threads

2.3.2 Second-level DDL (add or modify fields)

Typically, if you want to add columns to a table, the entire table must be rebuilt. This consumes a large amount of system resources. PolarDB for MySQL supports the instant ADD COLUMN feature. When you add columns, you need to change only the definition of the table, rather than the existing data. This allows you to instantly add columns to a table regardless of the size of the table.

You can add columns to a table within seconds regardless of the size of the table.

9

2.3.3 Character set conversion within seconds

The UTF-8 code set is a commonly used code set. By default, the utf8mb3 character set is used as the UTF-8 encoding format in MySQL Community Edition. This character set uses up to 3 bytes per character. If you need to store information such as emoji, you may need to convert it to the utf8mb4 character set. Generally, the conversion of character sets requires table reconstruction, which is time-consuming and has a significant impact on your business. PolarDB for MySQL supports conversion within seconds from utf8mb3 to utf8mb4. With this feature, you can easily change the utf8mb3 character set to the utf8mb4 character set.

ALTER TABLE tablename MODIFY COLUMN test_column varchar(60) CHARACTER SET utf8mb4, ALGORITHM = INPLACE;

If the ERROR 1846 (0A000): ALGORITHM=INPLACE is not supported. Reason: Cannot change column type INPLACE. Try ALGORITHM=COPY. error message is returned after you execute the preceding statement, it indicates that the current operation cannot be performed by using an in-place algorithm. We recommend you carefully review the usage limitations.

If you set ALGORITHM to DEFAULT or do not specify ALGORITHM, PolarDB will select the fastest algorithm to change a column character set. Example:

ALTER TABLE tablename MODIFY COLUMN test_column varchar(60) CHARACTER SET utf8mb4, ALGORITHM = DEFAULT;
ALTER TABLE tablename MODIFY COLUMN test_column varchar(60) CHARACTER SET utf8mb4;

Effect: Character set conversion from utf8mb3 into utf8mb4 can be completed within seconds regardless of the table size.

2.3.4 SaaS function: faster TRUNCATE/DROP TABLE

When you execute the TRUNCATE TABLE and DROP TABLE statements on MySQL Community 5.7, the entire buffer pool is scanned and all data pages corresponding to the tablespace are removed from the LRU list and FLUSH list. When the buffer pool is large, this process takes a long time. PolarDB optimizes the buffer pool management mechanism for DDL statements to improve the scan efficiency of buffer pools and the execution efficiency of the TRUNCATE TABLE and DROP TABLE statements.

Procedure:

The parameter loose_innodb_flush_pages_using_space_id is set to ON to enable this function.

Effect:

For clusters of different specifications, record the execution time (in seconds) required to execute the TRUNCATE TABLE statement on the t1 and t2 tables after the faster TRUNCATE/DROP TABLE function is enabled and disabled. In this example, 8,196 rows of data are inserted into the t1 table to simulate the TRUNCATE operation on a small table and 2,097,152 rows of data are inserted into the t2 table to simulate the TRUNCATE operation on a large table. The following table lists the execution results.

10

The preceding table shows that enabling the faster TRUNCATE/DROP TABLE feature can significantly improve the execution efficiency of the TRUNCATE TABLE statement.

2.3.5 Avoid business avalanche: nonblocking DDL

If a blocking DDL statement is submitted and the table it affects has uncommitted transactions or queries, the DDL statement keeps waiting for the MDL-X lock. While the DDL statement waits, transactions that operate data in the same table may still be submitted. However, because MDL-X locks in PolarDB have the highest priority, the new transactions must wait until this blocking DDL statement is completed. As a result, the connections are congested, which may cause the entire business system to fail.

To address this, PolarDB allows you to enable the nonblocking DDL feature for DDL operations. After this feature is enabled, the locking policy of DDL statements is consistent with that of tools (gh-ost / DMS lock-free change). This achieves lock-free/lossless performance on business. In addition, the performance of PolarDB is significantly better than that of peripheral tools because the DDL process is native to the kernel. When this feature is enabled, DDL operations can maintain the high performance of the kernel and have lossless features similar to the use of peripheral tools. This greatly reduces the impact of DDL operations on your business.

Procedure:
Use the loose_polar_nonblock_ddl_mode parameter to enable the nonblocking DDL feature. After enabling this feature, you can execute DDL statements during peak business hours. Even if the DDL cannot obtain a lock, it will not affect your business.

Effect:

1.  When nonblocking DDL statements are disabled, the TPS decreases to zero and stays at zero for a long period of time. The default timeout is 31,536,000 seconds, which severely impacts the business.

11

2.  When nonblocking DDL statements are enabled, the TPS decreases in a periodical manner, but never decreases to zero. The lock wait only slightly affects the business, and the stability of the system is ensured.

12

3.  When gh-ost is used to modify the table schema, the TPS decreases to zero in a periodical manner. In this case, the business is severely affected, which is caused by the temporary table lock in the cut-over step.

13

4.  Performance comparison: When the oltp_read_write script of Sysbench is used to simulate business workloads, nonblocking DDL statements with the INSTANT, INPLACE, and COPY keywords specified are still faster than the gh-ost tool.

14
The nonblocking DDL feature enabled
Comparison of native kernel DDL duration and GH-OST performance

3. Summary

15

Most major cloud service providers have launched highly compatible solutions and positioned them as core products of their cloud platforms.

Compatibility is one of the most important requirements for cloud service providers. For many small and medium-sized customers, 100% compatibility is a critical reference indicator. When customers migrate services to the cloud, they consider existing technology stacks and code. Incompatibility can impose additional learning costs on developers and increase the costs of transforming existing businesses, resulting in additional expenses. In the future, as your business grows, large tables may affect business development, performance may no longer meet your requirements, and no data cleanup solutions are available. You can consider related issues at that time.

0 1 0
Share on

ApsaraDB

464 posts | 102 followers

You may also like

Comments

ApsaraDB

464 posts | 102 followers

Related Products