By Muxin and Tangjian
A typical scenario during RDS usage is that tables storing important historical business data rarely get modified after a certain time point and are only queried occasionally. These tables cannot be dropped as they store important business data. However, the data accumulates over time, and the cost for enterprise SSDs (ESSDs) is relatively high, resulting in increased storage costs of the corresponding database. If such infrequently accessed data can be stored in lower-cost Object Storage Service (OSS), more expenses can be reduced for customers. As a result, the "data archiving feature" of ApsaraDB RDS was developed to meet this need.
As a core feature of the cold storage layer in the three-layer storage architecture of ApsaraDB RDS general ESSDs, the ApsaraDB RDS data archiving feature allows users to transfer infrequently modified online tables from a cloud disk to OSS through DDL operations, converting the tables into archived tables. Archived tables also support the read-only mode, allowing users to access their data in the tables while saving cloud disk storage costs. In addition, OSS storage space has no capacity limits, so there is no need to worry about running out of space for archived tables. The following table compares ESSD and OSS across several aspects.
Storage medium |
ESSD |
OSS |
Performance: bandwidth (MB/s) |
350 |
100 |
Data: reliability |
99.9999999% (nine 9s) |
99.9999999999% (twelve 9s) |
Data: capacity (GB) |
40-65536 |
Pay-as-you-go |
Cost: unit price for storage (USD/GB/month) (Taking Singapore as an example) |
$0.152 |
$0.0200 |
To implement the data archiving feature, two key technical questions need to be addressed:
How does RDS use OSS?
Once this question is addressed, the data archiving feature will be available. This means the database engine can store data files in OSS without affecting normal access to them. Two methods are provided for implementing this: implementation within the engine and implementation outside the engine. Implementation within the engine involves enabling the RDS engine kernel to directly access OSS and convert online data to archived data in the kernel, thus utilizing OSS. Implementation outside the engine involves using an external file system to provide POSIX compatibility for OSS, allowing the RDS engine to use OSS as if it were a local directory.
How is data in RDS archived tables backed up and restored?
Once this question is addressed, data consistency can be ensured for archived tables, and archived data can be backed up and restored along with online data.
Implementation of RDS data archiving
ApsaraDB RDS for MySQL primarily uses OSS through implementation within the engine, that is, OSS is used directly in the kernel. The kernel obtains the required OSS information based on parameters, and accesses and uses OSS within the engine. The following diagram shows the data stream of data archiving in ApsaraDB RDS for MySQL.
Data archiving in ApsaraDB RDS for MySQL
When a user enables the data archiving feature and executes an ALTER TABLE statement to archive a table, the MySQL kernel will perform the following operations to upload the specified table to OSS and convert it into an archived table. The user can access data in the archived table by executing a SELECT statement.
Examples of ALTER TABLE archiving and retrieval statements (ApsaraDB RDS for MySQL)
-- Archive
ALTER TABLE `tableName` ENGINE_ATTRIBUTE='{"OSS":"Y"}';
-- Retrieve
ALTER TABLE `tableName` ENGINE_ATTRIBUTE='{"OSS":"N"}';
ApsaraDB RDS for PostgreSQL primarily uses OSS through implementation outside the engine, that is, OSS is used based on the Grail file system. The Grail file system allows the engine to access data in OSS just like accessing local disks by providing a complete POSIX protocol. The following diagram shows the data stream of data archiving in ApsaraDB RDS for PostgreSQL.
Data archiving in ApsaraDB RDS for PostgreSQL
When a user enables the data archiving feature, in addition to the data directory, an archive directory named /cold-data is automatically created in the ApsaraDB RDS for PostgreSQL instance, and the corresponding tablespace rds_oss is created. The user can transfer the table to be archived to the rds_oss tablespace by executing an ALTER TABLE archiving statement.
Data in the rds_oss tablespace is uploaded to OSS through the Grail file system, ensuring that it does not occupy the storage space of ESSDs. All tables in the rds_oss tablespace are archived tables, and the user can query data in those tables with normal query statements.
Examples of ALTER TABLE archiving and retrieval statements (ApsaraDB RDS for PostgreSQL)
-- Archive
alter table tableName set tablespace rds_oss
-- Retrieve
alter table tableName set tablespace pg_default;
For the RDS data archiving feature, it is crucial to ensure data consistency alongside feature availability. This requires consistent backup and restore of data in the archived tables. The backup and restore of data in RDS archived tables are performed through snapshots and snapshot clones of the Grail file system.
With Alibaba Cloud EBS and OSS as its underlying storage, the Grail file system provides database engine-friendly storage layer solutions such as resource management, data compression, transmission buffer, and verification. These solutions are tailored to meet the SLA targets, data formats, and traffic characteristics of cloud database services.
Each Grail file system consists of data and metadata. Here, data refers to all objects stored in source OSS after each file in the file system is split; and metadata records the data organization format and the storage path of each data block in OSS for each file. From the preceding composition of the Grail file system, it can be seen that any file in the file system can be restored based on corresponding metadata, as long as the data is available in OSS. The process of creating a Grail snapshot in an RDS data archiving scenario is outlined in the following diagram.
Process of creating a Grail snapshot for an RDS archived table
The data archiving feature is now available in both ApsaraDB RDS for MySQL and ApsaraDB RDS for PostgreSQL.
• Eligibility for using the data archiving feature
Information |
ApsaraDB RDS for MySQL |
ApsaraDB RDS for PostgreSQL |
Major version |
8 |
15 |
Minor version |
MySQL 8.020240131 or later |
PostgreSQL 15.0 20240229 or later |
Instance billing method |
Subscription/Pay-as-you-go (Serverless is not supported yet) |
Subscription/Pay-as-you-go (Serverless is not supported yet) |
Product edition |
Basic Edition/High-availability Edition/Cluster Edition |
Basic Edition/High-availability Edition |
Product type |
Standard/YiTian |
Standard/YiTian |
Storage type |
General ESSD |
General ESSD |
Specifications |
Unlimited |
8 CPU cores or more |
Region and zone (for reference only; please refer to the console for actual use) |
All regions and zones supported by general ESSDs |
All regions and zones supported by general ESSDs |
Read/Write mode |
Read-only |
Read-only |
Note
As header files are retained locally, ApsaraDB RDS for MySQL requires that a data table to be archived must be larger than 6 MB.
• Enable or disable data archiving
• Query archived data
You can view the archived tables of an RDS instance that uses general ESSDs on the cold storage management page of the instance in the console.
• Use data archiving
-- Archive
ALTER TABLE `tableName` ENGINE_ATTRIBUTE='{"OSS":"Y"}';
-- Retrieve
ALTER TABLE `tableName` ENGINE_ATTRIBUTE='{"OSS":"N"}';
-- Archive
alter table tableName set tablespace rds_oss
-- Retrieve
alter table tableName set tablespace pg_default;
Here, we take ApsaraDB RDS for MySQL as an example to test the read-only performance before and after cold data archiving.
Preparations
Product type: ApsaraDB RDS for MySQL
Test approach: Perform read-only operations on the same instance, and compare the performance before and after cold data archiving.
Amount of test data: 285 GB (10 tables × 120,000,000 rows)
Test tool: sysbench
1. Prepare test data: Run the following command to prepare 285 GB of data in the system (10 tables, each of which contains 120 million rows).
sysbench oltp_read_only --tables=10 --table_size=120000000 --rand-type=special --rand-spec-pct=15 --time=300 --mysql-host=$DB_HOST --mysql-port=$DB_PORT --mysql-user=$DB_USER --mysql-password=$DB_PASS --mysql-db=$DB_NAME --threads=$THREAD_COUNT prepare
2. Archive the data: Use the following SQL statement to transfer the data in ESSDs to OSS.
ATLTER TABLE $table_name ENGINE_ATTRIBUTE='{"OSS":"Y"}';
3. Test the read-only performance: Run the following command to test the read performance of the instance.
sysbench oltp_read_only --tables=10 --table_size=120000000 --rand-type=special --rand-spec-pct=15 --time=300 --mysql-host=$DB_HOST --mysql-port=$DB_PORT --mysql-user=$DB_USER --mysql-password=$DB_PASS --mysql-db=$DB_NAME --threads=$THREAD_COUNT run
Note
Default parameter settings for data archiving in ApsaraDB RDS for MySQL are suitable for low-frequency accesses to cold data. For the sysbench test model, modifying some OSS parameters helps improve the performance (though with higher CPU consumption). Meanwhile, the I/O acceleration feature of general ESSDs may help enhance database performance. Therefore, during the test, we will enable and disable the I/O acceleration feature and adjust the OSS parameters to test performance from multiple aspects.
Testing scenario/parameters |
Maximum OSS connections (oss_max_connections) |
OSS prefetch (innodb_oss_prefetch) |
Thread pool (thread_pool_enabled) |
Default configurations (OSS_default) |
64 |
ON |
OFF |
Optimized configurations (OSS_opt) |
512 |
OFF |
ON |
Here, we present the test results in two scenarios: one with I/O acceleration disabled and the other with I/O acceleration enabled. In each scenario, default and optimized OSS configurations are respectively used for performance comparison.
• The read-only performance test results for a 8-core 16 GB standard ApsaraDB RDS for MySQL instance that runs High-availability Edition and uses a general-purpose instance type are as follows:
• With default configurations, the QPS for archived data can reach 8.4% of the QPS for data in PL1 ESSDs.
• With optimized configurations, the QPS for archived data can reach 51.2% of the QPS for data in PL1 ESSDs.
Comparison of read-only performance between archived tables and normal tables with I/O acceleration disabled
The read-only performance test results for a 8-core 16 GB standard ApsaraDB RDS for MySQL instance that runs High-availability Edition and uses a general-purpose instance type are as follows:
• With default configurations, the QPS for archived data can reach 59.7% of the QPS for data in PL1 ESSDs.
• With optimized configurations, the QPS for archived data can reach 76.4% of the QPS for data in PL1 ESSDs.
• When I/O acceleration is enabled, the read-only performance of archived data is improved by 15 times with default configurations and by 2.5 times with optimized configurations.
Comparison of read-only performance between archived tables and normal tables with I/O acceleration enabled
As a core capability of RDS general ESSDs, the data archiving feature significantly reduces storage costs by using OSS archiving storage for low-frequency data access scenarios, such as when an access cycle is measured in months or years. In addition, with the I/O acceleration feature provided by general ESSDs, the performance for querying archived data is significantly improved to over 50% of that for querying data in ESSDs. This breaks the stereotype that access to data in OSS is slow.
In short, the combination of "data archiving + I/O acceleration" represents a significant innovation, emerging as a dark horse in the field. This approach not only revolutionizes traditional database storage solutions but also offers a valuable opportunity for users to optimize their system architecture when building enterprise services on cloud databases. It is a solution well worth exploring.
Try out database products for free:
[Infographic] Highlights | Database New Features in July 2024
Optimizing Performance and Cost Efficiency: Advanced ApsaraDB for RDS Storage Architecture
Alibaba Cloud Community - August 30, 2024
ApsaraDB - August 21, 2024
ApsaraDB - November 17, 2020
ApsaraDB - July 23, 2024
ApsaraDB - July 23, 2021
Alibaba Cloud MaxCompute - April 25, 2019
An on-demand database hosting service for MySQL with automated monitoring, backup and disaster recovery capabilities
Learn MoreAn on-demand database hosting service for PostgreSQL with automated monitoring, backup and disaster recovery capabilities
Learn MoreAn on-demand database hosting service for SQL Server with automated monitoring, backup and disaster recovery capabilities
Learn MoreApsaraDB RDS for MariaDB supports multiple storage engines, including MySQL InnoDB to meet different user requirements.
Learn MoreMore Posts by ApsaraDB