All Products
Search
Document Center

AnalyticDB:Tiered storage of hot and cold data

更新時間:Sep 04, 2024

AnalyticDB for MySQL V3.1.3.3 or later in elastic mode for Cluster Edition supports tiered storage of hot and cold data at the table or partition level.

Prerequisites

Before you use tiered storage of hot and cold data, make sure that the following requirements are met:

  • An AnalyticDB for MySQL cluster in elastic mode for Cluster Edition is created.

  • The minor version of the cluster is 3.1.3.3 or later.

    Note
    • For information about how to view the minor version of an AnalyticDB for MySQL cluster, see How do I view the minor version of a cluster?

    • To update the minor version of an AnalyticDB for MySQL cluster, Submit a ticket and provide a time window for the update. A minor version update may take about 20 minutes to complete. During the update, your cluster can run as expected but may encounter transient connections several times. We recommend that you update the minor version of your cluster during off-peak hours and make sure that your application is configured to automatically reconnect to your cluster.

Billing rules

When you use a cluster, you are charged for the storage of hot and cold data based on the pay-as-you-go billing method. For more information, see Billable items of Data Warehouse Edition and Pricing for Data Warehouse Edition.

Note

For subscription clusters, you can purchase storage plans to offset the costs of hot data storage and cold data storage. You are charged for the excess storage space based on the pay-as-you-go billing method. For more information, see Storage plans.

Storage policies

AnalyticDB for MySQL provides three policies for hot and cold data storage, which are cold storage, hot storage, and mixed storage.

  • Cold storage is a cost-effective storage policy. When this storage policy is used, all data is stored in Object Storage Service (OSS).

  • Hot storage is a storage policy that can meet high access performance requirements. When this storage policy is used, all data is stored in SSDs.

  • Mixed storage is a storage policy that allows specific partitions to be stored in SSDs and other partitions to be stored in OSS.

image.png

Specify a storage policy for hot and cold data

When you execute the CREATE TABLE statement to create a table, you can use the storage_policy parameter to specify a storage policy for hot and cold data of the table.

To modify the storage policy for hot and cold data of an existing table, execute the ALTER TABLE table_name storage_policy; statement. For more information, see the "Storage policies" section of the ALTER TABLE topic.

Principles of mixed storage

When the mixed storage policy is used, you must specify the number of hot partitions by using the hot_partition_count parameter. For more information about the hot_partition_count parameter, see CREATE TABLE.

Partitions are sorted by partition key values in descending order. If the hot_partition_count parameter is set to N, the first N partitions stored on SSDs are hot partitions and the others stored in OSS are cold partitions.

For example, assume that the number of hot partitions is set to 4, and partitions are sorted by partition key values in the following descending order: 20201110, 20201109, 20201108, 20201107, 20201106, 20201105, and 20201104. In this case, the first four partitions are specified as hot partitions and the others are cold partitions.

The distribution of hot and cold partitions may be changed in the following scenarios:

Impact of data changes on the distribution of hot and cold partitions

When a new partition is inserted, all partitions are sorted again so that only N hot partitions exist.

In the following example, a new partition 20201110 is inserted into a table. This partition has the largest partition key value among all partitions of the table. As a result, the data in partition 20201105 that has the smallest partition key value is migrated from hot partitions to cold partitions and the data in partition 20201110 is migrated to hot partitions.热冷迁移

Impact of hot partition quantity changes on the distribution of hot and cold partitions

Assume that the number of hot partitions is N and you want to change the number of hot partitions to M.

  • If M is greater than N, the data of M - N partitions is migrated from cold partitions to hot partitions.

    In the following example, the number of hot partitions is changed from five to six. As a result, the data in partition 20201104 that has the largest partition key value is migrated from cold partitions to hot partitions.冷热迁移

  • If M is less than N, the data of N - M partitions is migrated from hot partitions to cold partitions.

    For example, the number of hot partitions is changed from five to four. As a result, the data in the partition that has the smallest partition key value is migrated from hot partitions to cold partitions.

Query the distribution of hot and cold data storage

You can use the table_usage table to query the distribution of hot and cold data storage. Examples:

  • Query the distribution of hot and cold data storage in all tables.

    select * from information_schema.table_usage;
  • Query the distribution of hot and cold data storage in a specific table.

    select * from information_schema.table_usage where table_schema='<schema_name>' and table_name='<table_name>';

The following table describes the fields involved in the table_usage table.

Field

Description

table_schema

The name of the database.

table_name

The name of the table.

storage_policy

The storage policy. Valid values:

  • HOT

  • COLD

  • MIXED

hot_partition_count

The number of hot partitions.

cold_partition_count

The number of cold partitions.

rt_total_size

The total volume of real-time data, which is the sum of the rt_data_size and rt_index_size fields. Unit: bytes.

rt_data_size

The volume of real-time data. Unit: bytes.

rt_index_size

The volume of primary key and index data in real-time data. Unit: bytes.

hot_total_size

The total volume of data in hot partitions, which is the sum of the hot_data_size and hot_index_size fields. Unit: bytes.

hot_data_size

The volume of data in hot partitions. Unit: bytes.

hot_index_size

The volume of primary key and index data in hot partitions. Unit: bytes.

cold_total_size

The total volume of data in cold partitions, which is the sum of the cold_data_size and cold_index_size fields. Unit: bytes.

cold_data_size

The volume of data in cold partitions. Unit: bytes.

cold_index_size

The volume of primary key and index data in cold partitions. Unit: bytes.

Note:

  • The table_usage table is updated in real time. Values of the rt_total_size, rt_data_size, rt_index_size, hot_total_size, hot_data_size, hot_index_size, cold_total_size, cold_data_size, and cold_index_size fields vary based on the execution of INSERT, UPDATE, DELETE, and BUILD statements.

  • If the values of the hot_total_size and cold_total_size fields are both 0 after data is loaded, data is still stored in the real-time engine and the value of the rt_total_size field indicates the volume of real-time data. You can execute the following BUILD statement to convert real-time data to partitioned data and then query the values of the hot_total_size and cold_total_size fields:

    build table <table_name>;
  • A user-defined hot_partition_count field indicates the number of hot partitions in a single shard after list partitioning, whereas the hot_partition_count field queried from the table_usage table indicates the number of hot partitions after shards are unioned. If data partitions are differently distributed across shards, the value of the hot_partition_count field queried from the table_usage table may be greater than that of the user-defined hot_partition_count field.

    For example, assume that Table A contains Shard 1 and Shard 2, and hot_partition_count is set to 2. The following figure shows the distribution of data partitions in Table A.shard

    Shard 1: P4 and P5 are hot partitions. P1, P2, and P3 are cold partitions.

    Shard 2: P3 and P4 are hot partitions. P1 and P2 are cold partitions.

    The actual number of hot partitions is calculated by using the following formula: (P4, P5) Union (P3, P4) = (P3, P4, P5). Therefore, the actual value of hot_partition_count is 3.

Query the change progress of the storage policy

You can execute the ALTER TABLE statement to change the storage policy of a table. For more information, see ALTER TABLE. The storage_policy_modify_progress table can be used to query the change progress of the storage policy.

  • Query the change progress of the storage policy for all tables involved in the current cluster.

    select * from information_schema.storage_policy_modify_progress;
  • Query the change progress of the storage policy for a specific table.

    select * from information_schema.storage_policy_modify_progress where table_schema='<schema_name>' and table_name='<table_name>';

The following table describes the fields involved in the storage_policy_modify_progress table.

Field

Description

table_schema

The name of the database.

table_name

The name of the table.

task_id

The ID of the storage policy change job.

source_storage_policy

The original storage policy. Valid values:

  • HOT

  • COLD

  • MIXED

source_hot_partition_count

The number of original hot partitions.

dest_storage_policy

The new storage policy. Valid values:

  • HOT

  • COLD

  • MIXED

dest_hot_partition_count

The number of new hot partitions.

hot_to_cold_partition_count

The number of partitions that are changed from hot storage to cold storage.

cold_to_hot_partition_count

The number of partitions that are changed from cold storage to hot storage.

hot_to_cold_data_size

The volume of data that is changed from hot storage to cold storage. Unit: bytes.

cold_to_hot_data_size

The volume of data that is changed from cold storage to hot storage. Unit: bytes.

hot_data_size_before_change

The volume of hot data before the storage policy is changed. Unit: bytes.

cold_data_size_before_change

The volume of cold data before the storage policy is changed. Unit: bytes.

hot_data_size_after_change

The volume of hot data after the storage policy is changed. Unit: bytes.

cold_data_size_after_change

The volume of cold data after the storage policy is changed. Unit: bytes.

start_time

The start of the time range within which the storage policy is changed.

update_time

The end of the time range within which the storage policy is changed.

progress

The change progress of the storage policy. Unit: %.

status

The change status of the storage policy. Valid values:

  • INIT: No change starts.

  • RUNNING: The change is in progress.

  • FINISH: The change is complete.