AnalyticDB for MySQL V3.1.3.3 or later in elastic mode for Cluster Edition supports tiered storage of hot and cold data at the table or partition level.
Prerequisites
Before you use tiered storage of hot and cold data, make sure that the following requirements are met:
An AnalyticDB for MySQL cluster in elastic mode for Cluster Edition is created.
The minor version of the cluster is 3.1.3.3 or later.
NoteFor information about how to view the minor version of an AnalyticDB for MySQL cluster, see How do I view the minor version of a cluster?
To update the minor version of an AnalyticDB for MySQL cluster, Submit a ticket and provide a time window for the update. A minor version update may take about 20 minutes to complete. During the update, your cluster can run as expected but may encounter transient connections several times. We recommend that you update the minor version of your cluster during off-peak hours and make sure that your application is configured to automatically reconnect to your cluster.
Billing rules
When you use a cluster, you are charged for the storage of hot and cold data based on the pay-as-you-go billing method. For more information, see Billable items of Data Warehouse Edition and Pricing for Data Warehouse Edition.
For subscription clusters, you can purchase storage plans to offset the costs of hot data storage and cold data storage. You are charged for the excess storage space based on the pay-as-you-go billing method. For more information, see Storage plans.
Storage policies
AnalyticDB for MySQL provides three policies for hot and cold data storage, which are cold storage, hot storage, and mixed storage.
Cold storage is a cost-effective storage policy. When this storage policy is used, all data is stored in Object Storage Service (OSS).
NoteAnalyticDB for MySQL stores cold data in OSS in locally redundant storage (LRS) mode.
Hot storage is a storage policy that can meet high access performance requirements. When this storage policy is used, all data is stored in SSDs.
Mixed storage is a storage policy that allows specific partitions to be stored in SSDs and other partitions to be stored in OSS.
Specify a storage policy for hot and cold data
When you execute the CREATE TABLE statement to create a table, you can use the storage_policy
parameter to specify a storage policy for hot and cold data of the table.
To modify the storage policy for hot and cold data of an existing table, execute the ALTER TABLE table_name storage_policy;
statement. For more information, see the "Storage policies" section of the ALTER TABLE topic.
Principles of mixed storage
When the mixed storage policy is used, you must specify the number of hot partitions by using the hot_partition_count
parameter. For more information about the hot_partition_count
parameter, see CREATE TABLE.
Partitions are sorted by partition key values in descending order. If the hot_partition_count parameter is set to N, the first N partitions stored on SSDs are hot partitions and the others stored in OSS are cold partitions.
For example, assume that the number of hot partitions is set to 4, and partitions are sorted by partition key values in the following descending order: 20201110, 20201109, 20201108, 20201107, 20201106, 20201105, and 20201104. In this case, the first four partitions are specified as hot partitions and the others are cold partitions.
The distribution of hot and cold partitions may be changed in the following scenarios:
Data is added, modified, or deleted. For more information, see the "Impact of data changes on the distribution of hot and cold partitions" section of this topic.
The number of hot partitions is changed. For more information, see the "Impact of hot partition quantity changes on the distribution of hot and cold partitions" section of this topic.
Impact of data changes on the distribution of hot and cold partitions
When a new partition is inserted, all partitions are sorted again so that only N hot partitions exist.
In the following example, a new partition 20201110 is inserted into a table. This partition has the largest partition key value among all partitions of the table. As a result, the data in partition 20201105 that has the smallest partition key value is migrated from hot partitions to cold partitions and the data in partition 20201110 is migrated to hot partitions.
Impact of hot partition quantity changes on the distribution of hot and cold partitions
Assume that the number of hot partitions is N and you want to change the number of hot partitions to M.
If M is greater than N, the data of M - N partitions is migrated from cold partitions to hot partitions.
In the following example, the number of hot partitions is changed from five to six. As a result, the data in partition 20201104 that has the largest partition key value is migrated from cold partitions to hot partitions.
If M is less than N, the data of N - M partitions is migrated from hot partitions to cold partitions.
For example, the number of hot partitions is changed from five to four. As a result, the data in the partition that has the smallest partition key value is migrated from hot partitions to cold partitions.
Query the distribution of hot and cold data storage
You can use the table_usage
table to query the distribution of hot and cold data storage. Examples:
Query the distribution of hot and cold data storage in all tables.
select * from information_schema.table_usage;
Query the distribution of hot and cold data storage in a specific table.
select * from information_schema.table_usage where table_schema='<schema_name>' and table_name='<table_name>';
The following table describes the fields involved in the table_usage table.
Field | Description |
table_schema | The name of the database. |
table_name | The name of the table. |
storage_policy | The storage policy. Valid values:
|
hot_partition_count | The number of hot partitions. |
cold_partition_count | The number of cold partitions. |
rt_total_size | The total volume of real-time data, which is the sum of the rt_data_size and rt_index_size fields. Unit: bytes. |
rt_data_size | The volume of real-time data. Unit: bytes. |
rt_index_size | The volume of primary key and index data in real-time data. Unit: bytes. |
hot_total_size | The total volume of data in hot partitions, which is the sum of the hot_data_size and hot_index_size fields. Unit: bytes. |
hot_data_size | The volume of data in hot partitions. Unit: bytes. |
hot_index_size | The volume of primary key and index data in hot partitions. Unit: bytes. |
cold_total_size | The total volume of data in cold partitions, which is the sum of the cold_data_size and cold_index_size fields. Unit: bytes. |
cold_data_size | The volume of data in cold partitions. Unit: bytes. |
cold_index_size | The volume of primary key and index data in cold partitions. Unit: bytes. |
Note:
The table_usage table is updated in real time. Values of the rt_total_size, rt_data_size, rt_index_size, hot_total_size, hot_data_size, hot_index_size, cold_total_size, cold_data_size, and cold_index_size fields vary based on the execution of INSERT, UPDATE, DELETE, and BUILD statements.
If the values of the hot_total_size and cold_total_size fields are both 0 after data is loaded, data is still stored in the real-time engine and the value of the rt_total_size field indicates the volume of real-time data. You can execute the following BUILD statement to convert real-time data to partitioned data and then query the values of the hot_total_size and cold_total_size fields:
build table <table_name>;
A user-defined hot_partition_count field indicates the number of hot partitions in a single shard after list partitioning, whereas the hot_partition_count field queried from the table_usage table indicates the number of hot partitions after shards are unioned. If data partitions are differently distributed across shards, the value of the hot_partition_count field queried from the table_usage table may be greater than that of the user-defined hot_partition_count field.
For example, assume that Table A contains Shard 1 and Shard 2, and hot_partition_count is set to 2. The following figure shows the distribution of data partitions in Table A.
Shard 1: P4 and P5 are hot partitions. P1, P2, and P3 are cold partitions.
Shard 2: P3 and P4 are hot partitions. P1 and P2 are cold partitions.
The actual number of hot partitions is calculated by using the following formula: (P4, P5) Union (P3, P4) = (P3, P4, P5). Therefore, the actual value of hot_partition_count is 3.
Query the change progress of the storage policy
You can execute the ALTER TABLE
statement to change the storage policy of a table. For more information, see ALTER TABLE. The storage_policy_modify_progress
table can be used to query the change progress of the storage policy.
Query the change progress of the storage policy for all tables involved in the current cluster.
select * from information_schema.storage_policy_modify_progress;
Query the change progress of the storage policy for a specific table.
select * from information_schema.storage_policy_modify_progress where table_schema='<schema_name>' and table_name='<table_name>';
The following table describes the fields involved in the storage_policy_modify_progress
table.
Field | Description |
table_schema | The name of the database. |
table_name | The name of the table. |
task_id | The ID of the storage policy change job. |
source_storage_policy | The original storage policy. Valid values:
|
source_hot_partition_count | The number of original hot partitions. |
dest_storage_policy | The new storage policy. Valid values:
|
dest_hot_partition_count | The number of new hot partitions. |
hot_to_cold_partition_count | The number of partitions that are changed from hot storage to cold storage. |
cold_to_hot_partition_count | The number of partitions that are changed from cold storage to hot storage. |
hot_to_cold_data_size | The volume of data that is changed from hot storage to cold storage. Unit: bytes. |
cold_to_hot_data_size | The volume of data that is changed from cold storage to hot storage. Unit: bytes. |
hot_data_size_before_change | The volume of hot data before the storage policy is changed. Unit: bytes. |
cold_data_size_before_change | The volume of cold data before the storage policy is changed. Unit: bytes. |
hot_data_size_after_change | The volume of hot data after the storage policy is changed. Unit: bytes. |
cold_data_size_after_change | The volume of cold data after the storage policy is changed. Unit: bytes. |
start_time | The start of the time range within which the storage policy is changed. |
update_time | The end of the time range within which the storage policy is changed. |
progress | The change progress of the storage policy. Unit: %. |
status | The change status of the storage policy. Valid values:
|