Skyrocketing Data Storage Costs? Do Not Worry! RDS Data Archiving Is Here for You

The article introduces the data archiving feature in ApsaraDB RDS to address the increasing data storage costs for infrequently accessed historical business data.

By Muxin and Tangjian

1. Why Is Data Archiving Developed?

A typical scenario during RDS usage is that tables storing important historical business data rarely get modified after a certain time point and are only queried occasionally. These tables cannot be dropped as they store important business data. However, the data accumulates over time, and the cost for enterprise SSDs (ESSDs) is relatively high, resulting in increased storage costs of the corresponding database. If such infrequently accessed data can be stored in lower-cost Object Storage Service (OSS), more expenses can be reduced for customers. As a result, the "data archiving feature" of ApsaraDB RDS was developed to meet this need.

As a core feature of the cold storage layer in the three-layer storage architecture of ApsaraDB RDS general ESSDs, the ApsaraDB RDS data archiving feature allows users to transfer infrequently modified online tables from a cloud disk to OSS through DDL operations, converting the tables into archived tables. Archived tables also support the read-only mode, allowing users to access their data in the tables while saving cloud disk storage costs. In addition, OSS storage space has no capacity limits, so there is no need to worry about running out of space for archived tables. The following table compares ESSD and OSS across several aspects.

Storage medium	ESSD	OSS
Performance: bandwidth (MB/s)	350	100
Data: reliability	99.9999999% (nine 9s)	99.9999999999% (twelve 9s)
Data: capacity (GB)	40-65536	Pay-as-you-go
Cost: unit price for storage (USD/GB/month) (Taking Singapore as an example)	$0.152	$0.0200

2. How to Implement Data Archiving?

To implement the data archiving feature, two key technical questions need to be addressed:

How does RDS use OSS?

Once this question is addressed, the data archiving feature will be available. This means the database engine can store data files in OSS without affecting normal access to them. Two methods are provided for implementing this: implementation within the engine and implementation outside the engine. Implementation within the engine involves enabling the RDS engine kernel to directly access OSS and convert online data to archived data in the kernel, thus utilizing OSS. Implementation outside the engine involves using an external file system to provide POSIX compatibility for OSS, allowing the RDS engine to use OSS as if it were a local directory.

How is data in RDS archived tables backed up and restored?

Once this question is addressed, data consistency can be ensured for archived tables, and archived data can be backed up and restored along with online data.

_01
Implementation of RDS data archiving

2.1 Implementation within the Engine

ApsaraDB RDS for MySQL primarily uses OSS through implementation within the engine, that is, OSS is used directly in the kernel. The kernel obtains the required OSS information based on parameters, and accesses and uses OSS within the engine. The following diagram shows the data stream of data archiving in ApsaraDB RDS for MySQL.

_02
Data archiving in ApsaraDB RDS for MySQL

When a user enables the data archiving feature and executes an ALTER TABLE statement to archive a table, the MySQL kernel will perform the following operations to upload the specified table to OSS and convert it into an archived table. The user can access data in the archived table by executing a SELECT statement.

Split the .ibd file of the table to be archived into file blocks with a size of 4 MB (specified by block_size).
Use the SDK of OSS to upload the file blocks to OSS.
Keep header files in ESSDs for easy access.

Examples of ALTER TABLE archiving and retrieval statements (ApsaraDB RDS for MySQL)

-- Archive
ALTER TABLE `tableName` ENGINE_ATTRIBUTE='{"OSS":"Y"}';
-- Retrieve
ALTER TABLE `tableName` ENGINE_ATTRIBUTE='{"OSS":"N"}';

2.2 Implementation Outside the Engine

ApsaraDB RDS for PostgreSQL primarily uses OSS through implementation outside the engine, that is, OSS is used based on the Grail file system. The Grail file system allows the engine to access data in OSS just like accessing local disks by providing a complete POSIX protocol. The following diagram shows the data stream of data archiving in ApsaraDB RDS for PostgreSQL.

_03
Data archiving in ApsaraDB RDS for PostgreSQL

When a user enables the data archiving feature, in addition to the data directory, an archive directory named /cold-data is automatically created in the ApsaraDB RDS for PostgreSQL instance, and the corresponding tablespace rds_oss is created. The user can transfer the table to be archived to the rds_oss tablespace by executing an ALTER TABLE archiving statement.

Data in the rds_oss tablespace is uploaded to OSS through the Grail file system, ensuring that it does not occupy the storage space of ESSDs. All tables in the rds_oss tablespace are archived tables, and the user can query data in those tables with normal query statements.

Examples of ALTER TABLE archiving and retrieval statements (ApsaraDB RDS for PostgreSQL)

-- Archive
alter table tableName set tablespace rds_oss
-- Retrieve
alter table tableName set tablespace pg_default;

2.3 Backup of Archived Tables

For the RDS data archiving feature, it is crucial to ensure data consistency alongside feature availability. This requires consistent backup and restore of data in the archived tables. The backup and restore of data in RDS archived tables are performed through snapshots and snapshot clones of the Grail file system.

With Alibaba Cloud EBS and OSS as its underlying storage, the Grail file system provides database engine-friendly storage layer solutions such as resource management, data compression, transmission buffer, and verification. These solutions are tailored to meet the SLA targets, data formats, and traffic characteristics of cloud database services.

Each Grail file system consists of data and metadata. Here, data refers to all objects stored in source OSS after each file in the file system is split; and metadata records the data organization format and the storage path of each data block in OSS for each file. From the preceding composition of the Grail file system, it can be seen that any file in the file system can be restored based on corresponding metadata, as long as the data is available in OSS. The process of creating a Grail snapshot in an RDS data archiving scenario is outlined in the following diagram.

_04
Process of creating a Grail snapshot for an RDS archived table

The service initiates a snapshot creation request.
GRAIL SNAPSHOT SERVER issues the Create Snapshot request, locks the DDL of RDS, and obtains OSS metadata from ApsaraDB RDS for MySQL instances.
For ApsaraDB RDS for MySQL instances, GRAIL SNAPSHOT SERVER will convert the format of the metadata obtained from OSS. For ApsaraDB RDS for PostgreSQL instances, there is no need to convert the metadata format.
During backup for the first time, all data in OSS is copied and the metadata is dumped to OSS to generate snapshot v0. For subsequent backups, the incremental data in OSS is copied based on metadata and the metadata is dumped to generate snapshots v1, v2, v3, etc.
GRAIL SNAPSHOT SERVER switches the data path and unlocks DDL of RDS. Subsequent data will be written to the path of the new version.
The snapshot is created.

3. How to Use Data Archiving?

The data archiving feature is now available in both ApsaraDB RDS for MySQL and ApsaraDB RDS for PostgreSQL.

• Eligibility for using the data archiving feature

Information	ApsaraDB RDS for MySQL	ApsaraDB RDS for PostgreSQL
Major version	8	15
Minor version	MySQL 8.020240131 or later	PostgreSQL 15.0 20240229 or later
Instance billing method	Subscription/Pay-as-you-go (Serverless is not supported yet)	Subscription/Pay-as-you-go (Serverless is not supported yet)
Product edition	Basic Edition/High-availability Edition/Cluster Edition	Basic Edition/High-availability Edition
Product type	Standard/YiTian	Standard/YiTian
Storage type	General ESSD	General ESSD
Specifications	Unlimited	8 CPU cores or more
Region and zone (for reference only; please refer to the console for actual use)	All regions and zones supported by general ESSDs	All regions and zones supported by general ESSDs
Read/Write mode	Read-only	Read-only

Note

As header files are retained locally, ApsaraDB RDS for MySQL requires that a data table to be archived must be larger than 6 MB.

• Enable or disable data archiving

With the console: If you are creating a new RDS instance, you can enable or disable the data archiving feature in the configuration bar on the creation page. If you have purchased an RDS instance eligible for using the data archiving feature, you can go to the basic information page of the instance, and enable or disable the data archiving feature in general ESSD configurations.
With OpenAPI: For instances eligible for using the data archiving feature, you can call ModifyDBInstanceSpec and pass in ColdDataEnabled=true/false to enable or disable the data archiving feature.

• Query archived data

You can view the archived tables of an RDS instance that uses general ESSDs on the cold storage management page of the instance in the console.

• Use data archiving

ApsaraDB RDS for MySQL: Archive and retrieve data with DDL statements, as follows:

-- Archive
ALTER TABLE `tableName` ENGINE_ATTRIBUTE='{"OSS":"Y"}'; 
-- Retrieve
ALTER TABLE `tableName` ENGINE_ATTRIBUTE='{"OSS":"N"}';

ApsaraDB RDS for PostgreSQL: Archive and retrieve data with DDL statements, as follows:

-- Archive
alter table tableName set tablespace rds_oss
-- Retrieve
alter table tableName set tablespace pg_default;

4. Data Archiving Test

Here, we take ApsaraDB RDS for MySQL as an example to test the read-only performance before and after cold data archiving.

Preparations

Product type: ApsaraDB RDS for MySQL

Test approach: Perform read-only operations on the same instance, and compare the performance before and after cold data archiving.

Amount of test data: 285 GB (10 tables × 120,000,000 rows)

Test tool: sysbench

4.1 Test Approach

1. Prepare test data: Run the following command to prepare 285 GB of data in the system (10 tables, each of which contains 120 million rows).

sysbench oltp_read_only --tables=10 --table_size=120000000  --rand-type=special --rand-spec-pct=15 --time=300 --mysql-host=$DB_HOST --mysql-port=$DB_PORT --mysql-user=$DB_USER --mysql-password=$DB_PASS --mysql-db=$DB_NAME --threads=$THREAD_COUNT prepare

2. Archive the data: Use the following SQL statement to transfer the data in ESSDs to OSS.

ATLTER TABLE $table_name ENGINE_ATTRIBUTE='{"OSS":"Y"}';

3. Test the read-only performance: Run the following command to test the read performance of the instance.

sysbench oltp_read_only --tables=10 --table_size=120000000 --rand-type=special --rand-spec-pct=15 --time=300 --mysql-host=$DB_HOST --mysql-port=$DB_PORT --mysql-user=$DB_USER --mysql-password=$DB_PASS --mysql-db=$DB_NAME --threads=$THREAD_COUNT run

Note

Default parameter settings for data archiving in ApsaraDB RDS for MySQL are suitable for low-frequency accesses to cold data. For the sysbench test model, modifying some OSS parameters helps improve the performance (though with higher CPU consumption). Meanwhile, the I/O acceleration feature of general ESSDs may help enhance database performance. Therefore, during the test, we will enable and disable the I/O acceleration feature and adjust the OSS parameters to test performance from multiple aspects.

Testing scenario/parameters	Maximum OSS connections (oss_max_connections)	OSS prefetch (innodb_oss_prefetch)	Thread pool (thread_pool_enabled)
Default configurations (OSS_default)	64	ON	OFF
Optimized configurations (OSS_opt)	512	OFF	ON

4.2 Test Results

Here, we present the test results in two scenarios: one with I/O acceleration disabled and the other with I/O acceleration enabled. In each scenario, default and optimized OSS configurations are respectively used for performance comparison.

I/O acceleration disabled

• The read-only performance test results for a 8-core 16 GB standard ApsaraDB RDS for MySQL instance that runs High-availability Edition and uses a general-purpose instance type are as follows:

• With default configurations, the QPS for archived data can reach 8.4% of the QPS for data in PL1 ESSDs.

• With optimized configurations, the QPS for archived data can reach 51.2% of the QPS for data in PL1 ESSDs.

Comparison of read-only performance between archived tables and normal tables with I/O acceleration disabled

I/O acceleration enabled

The read-only performance test results for a 8-core 16 GB standard ApsaraDB RDS for MySQL instance that runs High-availability Edition and uses a general-purpose instance type are as follows:

• With default configurations, the QPS for archived data can reach 59.7% of the QPS for data in PL1 ESSDs.

• With optimized configurations, the QPS for archived data can reach 76.4% of the QPS for data in PL1 ESSDs.

• When I/O acceleration is enabled, the read-only performance of archived data is improved by 15 times with default configurations and by 2.5 times with optimized configurations.

Comparison of read-only performance between archived tables and normal tables with I/O acceleration enabled

Explanation of the results

Since OSS has much higher latency compared to ESSD, the read-only performance of archived data is significantly lower than that of data in ESSDs under low concurrency. You can shut down the thread pool (or increase the size of the thread pool) and increase the number of concurrent threads to improve read-only performance of archived data. However, this will consume more CPU resources. (CPU utilization reaches 55% with 256 concurrent threads when I/O acceleration is disabled, and 100% with 256 concurrent threads when I/O acceleration is enabled.)
Actual test results can vary because there are differences in OSS latency.
Stress testing for archived data requires more CPU and memory resources. During the test, please monitor the CPU and memory usage.

5. Summary

As a core capability of RDS general ESSDs, the data archiving feature significantly reduces storage costs by using OSS archiving storage for low-frequency data access scenarios, such as when an access cycle is measured in months or years. In addition, with the I/O acceleration feature provided by general ESSDs, the performance for querying archived data is significantly improved to over 50% of that for querying data in ESSDs. This breaks the stereotype that access to data in OSS is slow.

In short, the combination of "data archiving + I/O acceleration" represents a significant innovation, emerging as a dark horse in the field. This approach not only revolutionizes traditional database storage solutions but also offers a valuable opportunity for users to optimize their system architecture when building enterprise services on cloud databases. It is a solution well worth exploring.

Try out database products for free:

Community

Skyrocketing Data Storage Costs? Do Not Worry! RDS Data Archiving Is Here for You

1. Why Is Data Archiving Developed?

2. How to Implement Data Archiving?

2.1 Implementation within the Engine

2.2 Implementation Outside the Engine

2.3 Backup of Archived Tables

3. How to Use Data Archiving?

4. Data Archiving Test

4.1 Test Approach

4.2 Test Results

I/O acceleration disabled

I/O acceleration enabled

Explanation of the results

5. Summary

Read previous post:

Read next post:

ApsaraDB

You may also like

Comments

ApsaraDB

Related Products

ApsaraDB RDS for MySQL

ApsaraDB RDS for PostgreSQL

ApsaraDB RDS for SQL Server

ApsaraDB RDS for MariaDB