AnalyticDB for MySQL supports tiered storage of hot and cold data. You can move the hot and cold data to different storage media to ensure high query performance of hot data and reduce storage costs of cold data.
Overview
In a big data scenario, a business table can contain large volumes of business data, such as log data, order data, or monitoring data. Over time, specific data is less frequently accessed and is considered cold. However, the cold data occupies the storage space. As a result, the storage costs are increased.
AnalyticDB for MySQL tiered storage of hot and cold data is applicable to tables partitioned by date and time. An AnalyticDB for MySQL cluster sorts the partitions in descending order based on the specified number of hot partitions and the partition key values. The largest N partitions are considered hot partitions, and the other partitions are considered cold partitions. The data of cold partitions can be moved to the lower-cost Object Storage Service (OSS) to reduce storage costs. The data of hot partitions remains in SSDs to ensure high query performance.
Prerequisites
Before you use tiered storage of hot and cold data in an AnalyticDB for MySQL cluster, make sure that the following requirements are met:
A Data Lakehouse Edition cluster or a Data Warehouse Edition cluster in elastic mode is created.
The minor version of the cluster is 3.1.3.3 or later.
NoteTo query the minor version of an AnalyticDB for MySQL Data Lakehouse Edition cluster, execute the
SELECT adb_version();
statement. To update the minor version of a cluster, contact technical support.For information about how to view and update the minor version of an AnalyticDB for MySQL Data Warehouse Edition cluster, see Update the minor version of a cluster.
Billing rules
When you use a cluster, you are charged for the storage of hot and cold data based on the pay-as-you-go billing method. For more information, see Pricing.
You can use storage plans to offset data storage costs in AnalyticDB for MySQL. For more information, see Storage plans.
Storage policies of hot and cold data
AnalyticDB for MySQL provides three policies for hot and cold data storage, which are cold storage, hot storage, and mixed storage.
Cold storage is a cost-effective storage policy. When this storage policy is used, all data is stored in OSS in locally redundant storage (LRS) mode.
Hot storage is a storage policy that can meet high access performance requirements. When this storage policy is used, all data is stored in SSDs. This policy provides the best query performance but requires the highest storage costs.
Mixed storage is a storage policy that allows hot partitions to be stored in SSDs and cold partitions to be stored in OSS. This policy ensures high query performance of hot partitions and reduces storage costs of cold partitions. The following section describes the principle of the mixed storage policy:
When the mixed storage policy is used, you must specify the number of hot partitions. Partitions are sorted by partition key values in descending order. If the number of the hot partitions is set to N, the first N partitions stored on SSDs are hot partitions and the others stored in OSS are cold partitions. The distribution of hot and code partitions may be changed when the number of partitions is changed. For more information, see the "Impact of partition quantity changes on the distribution of hot and cold partitions" section of this topic.
Impact of partition quantity changes on the distribution of hot and cold partitions
Assume that the number of hot partitions is N. When a new partition is inserted, all partitions are re-sorted to ensure that only N hot partitions exist.
Assume that the number of hot partitions is N and you want to change the number of hot partitions to M.
If M is greater than N, the data of
M - N
partitions is moved from cold partitions to hot partitions.If M is less than N, the data of
N - M
partitions is moved from hot partitions to cold partitions.
Specify a storage policy of hot and cold data
When you create a table, you can specify a storage policy of hot and cold data by using the
storage_policy
parameter. For more information, see CREATE TABLE.For an existing table, you can specify a storage policy of hot and cold data by executing the
ALTER TABLE
statement. For more information, see the "Change the tiered storage policy of hot and cold data" section of the ALTER TABLE topic.
Query the storage policy of hot and cold data
Syntax
Query the storage policy of hot and cold data for all tables.
SELECT * FROM information_schema.table_usage;
Query the storage policy of hot and cold data for a specific table.
SELECT * FROM information_schema.table_usage WHERE table_schema='<schema_name>' AND table_name='<table_name>';
Response parameters
Parameter | Description |
table_schema | The name of the database. |
table_name | The name of the table. |
storage_policy | The storage policy. Valid values:
|
hot_partition_count | The number of hot partitions. |
cold_partition_count | The number of cold partitions. |
rt_total_size | The total size of real-time data, which is the sum of the rt_data_size and rt_index_size parameters. Unit: bytes. |
rt_data_size | The size of real-time data. Unit: bytes. |
rt_index_size | The size of primary key and index data in real-time data. Unit: bytes. |
hot_total_size | The total size of data in hot partitions, which is the sum of the hot_data_size and hot_index_size parameters. Unit: bytes. |
hot_data_size | The size of data in hot partitions. Unit: bytes. |
hot_index_size | The size of primary key and index data in hot partitions. Unit: bytes. |
cold_total_size | The total size of data in cold partitions, which is the sum of the cold_data_size and cold_index_size parameters. Unit: bytes. |
cold_data_size | The size of data in cold partitions. Unit: bytes. |
cold_index_size | The size of primary key and index data in cold partitions. Unit: bytes. |
Note:
Values of the rt_total_size, rt_data_size, rt_index_size, hot_total_size, hot_data_size, hot_index_size, cold_total_size, cold_data_size, and cold_index_size parameters vary based on the execution of INSERT, UPDATE, DELETE, and BUILD statements.
If the values of the hot_total_size and cold_total_size parameters are both 0 after you write data, the data is synchronized in real time. The rt_total_size parameter indicates the size of real-time data. You can execute the BUILD statement to convert real-time data into historical data. After the BUILD job is complete, you can view the hot_total_size and cold_total_size parameters. For more information, see BUILD.
The defined hot_partition_count parameter specifies the number of hot partitions in a single shard, whereas the queried hot_partition_count parameter indicates the number of hot partitions after shards are unioned. If data partitions are differently distributed across shards, the value of the queried hot_partition_count parameter may be greater than the value of the defined hot_partition_count parameter.
For example, Table A contains Shard 1 and Shard 2, and hot_partition_count is set to 2. The following figure shows the distribution of data partitions in Table A.
Shard 1: P4 and P5 are hot partitions. P1, P2, and P3 are cold partitions.
Shard 2: P3 and P4 are hot partitions. P1 and P2 are cold partitions.
The actual number of hot partitions is calculated by using the following formula: (P4, P5) Union (P3, P4) = (P3, P4, P5). Therefore, the actual value of the hot_partition_count parameter is 3.
Query the change progress of the storage policy of hot and cold data
After you change the storage policy of hot and cold data by executing the ALTER TABLE
statement, you can query the change progress of the storage policy from the storage_policy_modify_progress
table.
Syntax
Query the change progress of the storage policy of hot and cold data for all tables.
SELECT * FROM information_schema.storage_policy_modify_progress;
Query the change progress of the storage policy of hot and cold data for a specific table.
SELECT * FROM information_schema.storage_policy_modify_progress WHERE table_schema='<schema_name>' AND table_name='<table_name>';
Response parameters
Parameter | Description |
table_schema | The name of the database. |
table_name | The name of the table. |
task_id | The ID of the storage policy change job. |
source_storage_policy | The original storage policy. Valid values:
|
source_hot_partition_count | The number of original hot partitions. |
dest_storage_policy | The new storage policy. Valid values:
|
dest_hot_partition_count | The number of new hot partitions. |
hot_to_cold_partition_count | The number of partitions changed from the hot partition to the cold partition. |
cold_to_hot_partition_count | The number of partitions changed from the cold partition to the hot partition. |
hot_to_cold_data_size | The size of data changed from hot partition to cold partition. Unit: bytes. |
cold_to_hot_data_size | The size of data changed from hot partition to cold partition. Unit: bytes. |
hot_data_size_before_change | The size of hot data before the storage policy is changed. Unit: bytes. |
cold_data_size_before_change | The size of cold data before the storage policy is changed. Unit: bytes. |
hot_data_size_after_change | The size of hot data after the storage policy is changed. Unit: bytes. |
cold_data_size_after_change | The size of cold data after the storage policy is changed. Unit: bytes. |
start_time | The start of the time range within which the storage policy is changed. |
update_time | The end of the time range within which the storage policy is changed. |
progress | The change progress of the storage policy. Unit: %. |
status | The change status of the storage policy. Valid values:
|