Policies and methods for hot and cold data storage - AnalyticDB

AnalyticDB for MySQL supports tiered storage of hot and cold data. You can move the hot and cold data to different storage media to ensure high query performance of hot data and reduce storage costs of cold data.

Overview

In a big data scenario, a business table can contain large volumes of business data, such as log data, order data, or monitoring data. Over time, specific data is less frequently accessed and is considered cold. However, the cold data occupies the storage space. As a result, the storage costs are increased.

AnalyticDB for MySQL tiered storage of hot and cold data is applicable to tables partitioned by date and time. An AnalyticDB for MySQL cluster sorts the partitions in descending order based on the specified number of hot partitions and the partition key values. The largest N partitions are considered hot partitions, and the other partitions are considered cold partitions. The data of cold partitions can be moved to the lower-cost Object Storage Service (OSS) to reduce storage costs. The data of hot partitions remains in SSDs to ensure high query performance.

Prerequisites

Before you use tiered storage of hot and cold data in an AnalyticDB for MySQL cluster, make sure that the following requirements are met:

The cluster is of the Enterprise Edition, Basic Edition, Data Lakehouse Edition, or Data Warehouse Edition Elastic mode.

The minor version of the cluster is 3.1.3.3 or later.
Note
To view and update the minor version of an AnalyticDB for MySQL cluster, log on to the AnalyticDB for MySQL console and go to the Configuration Information section of the Cluster Information page.

Billing rules

When you use a cluster, you are charged for the storage of hot and cold data based on the pay-as-you-go billing method. For more information, see Pricing.

Note

You can use storage plans to offset data storage costs in AnalyticDB for MySQL.

Storage policies of hot and cold data

AnalyticDB for MySQL provides three policies for hot and cold data storage, which are cold storage, hot storage, and mixed storage.

Cold storage is a cost-effective storage policy. When this storage policy is used, all data is stored in OSS in zone-redundant storage (ZRS) mode.
Hot storage is a storage policy that can meet high access performance requirements. When this storage policy is used, all data is stored in SSDs. This policy provides the best query performance but requires the highest storage costs.
Mixed storage is a storage policy that allows hot partitions to be stored in SSDs and cold partitions to be stored in OSS. This policy ensures high query performance of hot partitions and reduces storage costs of cold partitions. The following section describes the principle of the mixed storage policy:
When the mixed storage policy is used, you must specify the number of hot partitions. Partitions are sorted by partition key values in descending order. If the number of the hot partitions is set to N, the first N partitions stored on SSDs are hot partitions and the others stored in OSS are cold partitions. The distribution of hot and code partitions may be changed when the number of partitions is changed. For more information, see the "Impact of partition quantity changes on the distribution of hot and cold partitions" section of this topic.

Impact of partition quantity changes on the distribution of hot and cold partitions

Assume that the number of hot partitions is N. When a new partition is inserted, all partitions are re-sorted to ensure that only N hot partitions exist.
Click to view the diagram
As show in the following figure, the number of hot partitions is set to 5, and a new partition 20241226 is inserted into a table. This partition has the largest partition key value among all partitions of the table. After a BUILD job, this partition must be placed among hot partitions. To ensure that the number of hot partitions is 5, the system moves partition 20241221 that has the smallest partition key value from hot partitions to cold partitions and places partition 20241226 among hot partitions.
Assume that the number of hot partitions is N and you want to change the number of hot partitions to M.
- If M is greater than N, the data of M - N partitions is moved from cold partitions to hot partitions.
  Click to view the diagram
  As shown in the following figure, the number of hot partitions is changed from 5 to 6. This way, the system moves partition 20241220 that has the largest partition key value from cold partitions to hot partitions.
- If M is less than N, the data of N - M partitions is moved from hot partitions to cold partitions.
  Click to view the diagram
  For example, the number of hot partitions is changed from 5 to 4. This way, the system moves partition 20241221 that has the smallest partition key value from hot partitions to cold partitions.

Specify a storage policy of hot and cold data

When you create a table, you can specify a storage policy of hot and cold data by using the storage_policy parameter.
For an existing table, you can specify a storage policy of hot and cold data by executing the ALTER TABLE statement.

Query the storage policy of hot and cold data

Syntax

Query the storage policy of hot and cold data for all tables.
```
SELECT * FROM information_schema.table_usage;
```

Query the storage policy of hot and cold data for a specific table.

SELECT * FROM information_schema.table_usage WHERE table_schema='<schema_name>' AND table_name='<table_name>';

Response parameters

Parameter	Description
table_schema	The name of the database.
table_name	The name of the table.
storage_policy	The storage policy. Valid values: HOT COLD MIXED
hot_partition_count	The number of hot partitions.
cold_partition_count	The number of cold partitions.
rt_total_size	The total size of real-time data, which is the sum of the rt_data_size and rt_index_size parameters. Unit: bytes.
rt_data_size	The size of real-time data. Unit: bytes.
rt_index_size	The size of primary key and index data in real-time data. Unit: bytes.
hot_total_size	The total size of data in hot partitions, which is the sum of the hot_data_size and hot_index_size parameters. Unit: bytes.
hot_data_size	The size of data in hot partitions. Unit: bytes.
hot_index_size	The size of primary key and index data in hot partitions. Unit: bytes.
cold_total_size	The total size of data in cold partitions, which is the sum of the cold_data_size and cold_index_size parameters. Unit: bytes.
cold_data_size	The size of data in cold partitions. Unit: bytes.
cold_index_size	The size of primary key and index data in cold partitions. Unit: bytes.

Note:

Values of the rt_total_size, rt_data_size, rt_index_size, hot_total_size, hot_data_size, hot_index_size, cold_total_size, cold_data_size, and cold_index_size parameters vary based on the execution of INSERT, UPDATE, DELETE, and BUILD statements.
If the values of the hot_total_size and cold_total_size parameters are both 0 after you write data, the data is synchronized in real time. The rt_total_size parameter indicates the size of real-time data. You can execute the BUILD statement to convert real-time data into historical data. After the BUILD job is complete, you can view the hot_total_size and cold_total_size parameters.
The defined hot_partition_count parameter specifies the number of hot partitions in a single shard, whereas the queried hot_partition_count parameter indicates the number of hot partitions after shards are unioned. If data partitions are differently distributed across shards, the value of the queried hot_partition_count parameter may be greater than the value of the defined hot_partition_count parameter.
For example, Table A contains Shard 1 and Shard 2, and hot_partition_count is set to 2. The following figure shows the distribution of data partitions in Table A.
Shard 1: P4 and P5 are hot partitions. P1, P2, and P3 are cold partitions.
Shard 2: P3 and P4 are hot partitions. P1 and P2 are cold partitions.
The actual number of hot partitions is calculated by using the following formula: (P4, P5) Union (P3, P4) = (P3, P4, P5). Therefore, the actual value of the hot_partition_count parameter is 3.

Query the change progress of the storage policy of hot and cold data

After you change the storage policy of hot and cold data by executing the ALTER TABLE statement, you can query the change progress of the storage policy from the storage_policy_modify_progress table.

Syntax

Query the change progress of the storage policy of hot and cold data for all tables.
```
SELECT * FROM information_schema.storage_policy_modify_progress;
```

Query the change progress of the storage policy of hot and cold data for a specific table.

SELECT * FROM information_schema.storage_policy_modify_progress WHERE table_schema='<schema_name>' AND table_name='<table_name>';

Response parameters

Parameter	Description
table_schema	The name of the database.
table_name	The name of the table.
task_id	The ID of the storage policy change job.
source_storage_policy	The original storage policy. Valid values: HOT COLD MIXED
source_hot_partition_count	The number of original hot partitions.
dest_storage_policy	The new storage policy. Valid values: HOT COLD MIXED
dest_hot_partition_count	The number of new hot partitions.
hot_to_cold_partition_count	The number of partitions changed from the hot partition to the cold partition.
cold_to_hot_partition_count	The number of partitions changed from the cold partition to the hot partition.
hot_to_cold_data_size	The size of data changed from hot partition to cold partition. Unit: bytes.
cold_to_hot_data_size	The size of data changed from hot partition to cold partition. Unit: bytes.
hot_data_size_before_change	The size of hot data before the storage policy is changed. Unit: bytes.
cold_data_size_before_change	The size of cold data before the storage policy is changed. Unit: bytes.
hot_data_size_after_change	The size of hot data after the storage policy is changed. Unit: bytes.
cold_data_size_after_change	The size of cold data after the storage policy is changed. Unit: bytes.
start_time	The start of the time range within which the storage policy is changed.
update_time	The end of the time range within which the storage policy is changed.
progress	The change progress of the storage policy. Unit: %.
status	The change status of the storage policy. Valid values: INIT: No change starts. RUNNING: The change is in progress. FINISH: The change is complete.