Automatic data migration from hot storage to cold storage

Updated at: 2024-10-11 08:26

AnalyticDB for PostgreSQL provides the feature of tiered storage of hot and cold data that allows you to store infrequently accessed data in cold storage to reduce storage costs. In addition to manual movement of data from hot storage to cold storage by using specific functions, AnalyticDB for PostgreSQL V7.0 provides automatic hot-to-cold migration policies for partitioned tables. After you configure a policy, data is automatically migrated from hot storage to cold storage based on the policy. This topic describes how to use Hot Partition Number (HPN) and time to live (TTL) policies to automatically migrate data from hot storage to cold storage.

Version limits

You can configure automatic hot-to-cold migration policies only for AnalyticDB for PostgreSQL V7.0 instances of V7.0.6.5 or later.

Usage limits

  • You can configure automatic hot-to-cold migration policies for a partitioned table but not a non-partitioned table.

  • You cannot configure automatic hot-to-cold migration policies for a multi-level partitioned table.

  • You cannot configure automatic hot-to-cold migration policies for a hash partitioned table.

  • You cannot configure automatic hot-to-cold migration policies for a table that has a primary key.

  • You cannot configure automatic hot-to-cold migration policies for a partitioned table that has multiple partition key columns.

  • After you configure an automatic hot-to-cold migration policy, data that is migrated from hot storage to cold storage cannot be migrated back to hot storage.

Scheduling policies

AnalyticDB for PostgreSQL supports the following automatic hot-to-cold migration policies: HPN and TTL. Both types of policies are applicable to list partitions and range partitions.

HPN policies

An HPN policy specifies that only M hot partitions are retained. AnalyticDB for PostgreSQL sorts the partitions in lexicographical order based on partition key values. During scheduling, the first M partitions that have the largest partition key values are retained in hot storage, whereas the other partitions are automatically moved to cold storage. Among these partitions, the DEFAULT partition is the last to be moved to cold storage, and the NULL partition is moved to cold storage before the DEFAULT partition.

Value format for the storage_policy parameter when you configure an HPN policy: HPN:M. HPN is case-sensitive. M must be a non-negative integer. No spaces are allowed.

Valid examples:

'HPN:3' 
'HPN:0'

Invalid examples:

'HPN:-3' 
'hpn:3'
'HPN: 3'
'HPN:3 '

TTL policies

You can configure TTL policies only for partitioned tables whose partition keys are fields of the DATE, TIMESTAMP, or TIMESTAMP WITH TIMEZONE type. Partitions that exceed the specified retention period are automatically moved to cold storage based on a TTL policy. For example, if the current date is 2024-08-10 and the retention period is set to 8 days, the partitions whose partition key values are 2024-08-02 or earlier are automatically moved to cold storage.

Take note of the following items:

  • If the difference between the upper range value of a range partition and the current date is greater than the specified retention period, the partition is automatically moved to cold storage.

  • If the difference between the partition key value of a list partition and the current date is greater than the specified retention period, the partition is automatically moved to cold storage.

  • The DEFAULT and NULL partitions are not moved to cold storage.

Value format for the storage_policy parameter when you configure a TTL policy: 'TTL:N [YEAR | MONTH | DAY]'. N must be a non-negative integer. No extra spaces are allowed. YEAR, MONTH, and DAY specify the time units and are case-sensitive. If you do not specify a unit, the default unit DAY is used.

Valid examples:

'TTL:3' 
'TTL:3 YEAR'
'TTL:3 MONTH'
'TTL:3 DAY'

Invalid examples:

'TTL:-3' 
'TTL:3 day'
'TTL:3DAY'
'TTL:3 DAY '

Syntax and examples

Create and configure an automatic hot-to-cold migration policy

CREATE TABLE [IF NOT EXISTS] tbl_name
(create_definition,...)
[WITH(tiered_storage_cooldown_policy='XXX')]
PARTITION BY {LIST|RANGE} (PARTITION columns)
[distribution_options]

When you create a table, you can specify a value for the tiered_storage_cooldown_policy parameter in the HPN:M or TTL:N format to configure an automatic hot-to-cold migration policy. For information about other parameters of the table creation statement, see SQL syntax.

Examples

CREATE TABLE tiered_storage_partition1 (a INT, b DATE) WITH(tiered_storage_cooldown_policy='TTL:2')  PARTITION BY LIST (b)
(
    VALUES ('2024-06-10'),
    VALUES ('2024-06-07'),
    VALUES ('2024-06-06')
);
CREATE TABLE tiered_storage_partition1 (a INT, b DATE) WITH(tiered_storage_cooldown_policy='HPN:3')  PARTITION BY LIST (b)
(
    VALUES ('2024-06-10'),
    VALUES ('2024-06-07'),
    VALUES ('2024-06-06')
);

Modify an automatic hot-to-cold migration policy

ALTER TABLE tbl_name SET(tiered_storage_cooldown_policy='XXX');    -- Modify a migration policy.
ALTER TABLE tbl_name SET(tiered_storage_cooldown_policy='');       -- Delete a migration policy.

Example

ALTER TABLE test_tbl SET(tiered_storage_cooldown_policy='HPN:5'); 

Scheduling parameters

The automatic hot-to-cold migration feature provided by AnalyticDB for PostgreSQL allows you to configure O&M parameters such as the maximum number of concurrent storage tiering tasks, the scheduler trigger interval, and the scheduling window.

To view the current parameter values, execute the SHOW statement. To change the parameter values, submit a ticket. The modified parameters take effect the next time the scheduler is triggered. You do not need to restart the AnalyticDB for PostgreSQL instance.

Maximum number of concurrent storage tiering tasks

You can execute the SHOW tiered_storage.adb_tiered_storage_max_worker statement to query the maximum number of concurrent storage tiering tasks that move partitions to cold storage in the background. For tables that have multiple partitions to be moved to cold storage, a higher concurrency allows more tables to be simultaneously moved to cold storage and increases the data migration speed. However, increasing the concurrency does not change the storage tiering speed of individual tables. The parameter is of the INT type. The default value is 5. Setting an excessively high concurrency may cause excessive CPU consumption during the storage tiering process.

Scheduler trigger interval

You can execute the SHOW tiered_storage.adb_tiered_storage_worker_launch_interval statement to query the scheduling interval at which the background scheduler runs the storage tiering tasks. The scheduler migrates data from hot storage to cold storage at a specific interval. The parameter is of the INT type. The default value is 600, which specifies that the scheduler is triggered every 10 minutes. Setting an excessively long interval may slow down the overall storage tiering speed, whereas setting an excessively short interval may consume excessive system resources.

Scheduling window

You can execute the SHOW tiered_storage.adb_worker_time_window_str statement to query the daily time periods during which the background scheduler can run scheduled storage tiering tasks. The parameter is of the STRING type. The parameter is empty by default, which specifies that the scheduler can run tasks from 00:00:00 to 23:59:59 every day. This parameter specifies the start time and end time for running scheduled storage tiering tasks every day. The parameter value must be in the HH:MM-HH:MM format. Example: 02:00-04:15. Invalid parameter values do not take effect. Scheduled tasks that are already in the storage tiering process do not automatically stop running if they run beyond the scheduling window.

Examples

  • Create a range partitioned table that is partitioned by month and retains two hot partitions. Configure the scheduler to run scheduled storage tiering tasks from 02:00 to 04:00 every day. Set the tiered_storage.adb_worker_time_window_str parameter to 02:00-04:00.

    CREATE TABLE tiered_storage_partition4 (a INT, b DATE) WITH(tiered_storage_cooldown_policy='HPN:2') PARTITION BY RANGE(b)
    (
        START ('2023-11-01'::date) END ('2023-12-01'::date),
        START ('2023-12-01'::date) END ('2024-01-01'::date),
        START ('2024-01-01'::date) END ('2024-02-01'::date),
        START ('2024-02-01'::date) END ('2024-03-01'::date)
    );

    Based on the preceding policy, the partitions of the partitioned table are sorted by the b field of the DATE type. The first two partitions that have the largest partition key values are retained in hot storage and the other partitions are automatically moved to cold storage. You can create new partitions before 02:00 on the first day of each month based on your business requirements. This way, the new partitions are identified as hot partitions when scheduled storage tiering tasks start at 02:00.

  • Create a partitioned table that is partitioned by day and configure the partitions whose partition key values or upper range values are greater than 90 days away from the current date to be moved to cold storage. Configure the scheduler to run scheduled storage tiering tasks from 23:00 to 23:59 every day.

    LIST partitioning
    RANGE partitioning
    CREATE TABLE tiered_storage_partition1 (a INT, b DATE) WITH(tiered_storage_cooldown_policy='TTL:90')  PARTITION BY LIST (b)
    (
        VALUES ('2024-06-10'),
        VALUES ('2024-06-09'),
        VALUES ('2024-06-08'),
        VALUES ('2024-06-07'),
        VALUES ('2024-06-06'),
        ...    --Replace the sample values with the actual partition key values.
        VALUES ('2023-11-01')
    );
    CREATE TABLE tiered_storage_partition3 (a INT, b DATE) WITH(tiered_storage_cooldown_policy='TTL:90') PARTITION BY RANGE(b)
    (
        START ('2023-11-01'::date) END ('2023-12-01'::date),
        START ('2023-12-01'::date) END ('2024-01-01'::date),
        START ('2024-01-01'::date) END ('2024-02-01'::date),
        ...  --Replace the sample values with the actual partition range values.
        START ('2024-05-01'::date) END ('2024-06-01'::date)
        
    );

Based on the preceding policy, AnalyticDB for PostgreSQL identifies the partitions whose partition key values or upper range values are greater than 90 days away from the current date as cold data and automatically moves the data to cold storage. This ensures that hot data can be retained for 90 days.

FAQ

  • Can I configure scheduled hot-to-cold migration for a non-partitioned table?

    Yes, you can use the pg_cron extension to configure scheduled hot-to-cold migration for a non-partitioned table. For more information, see Tiered storage of hot and cold data.

  • Can I configure an automatic hot-to-cold migration policy for a partitioned table that involves a business-related field that is not used as the partition key?

    No, you can configure only automatic hot-to-cold migration policies that involve partition keys. For example, you can configure the retention period for hot data based on the differences between the partition key values or upper range values and the current date.

If you encounter other issues during the automatic data migration from hot storage to cold storage, submit a ticket or join the AnalyticDB for PostgreSQL DingTalk group 11700737 for technical support.

  • On this page (1, T)
  • Version limits
  • Usage limits
  • Scheduling policies
  • HPN policies
  • TTL policies
  • Syntax and examples
  • Create and configure an automatic hot-to-cold migration policy
  • Modify an automatic hot-to-cold migration policy
  • Scheduling parameters
  • Examples
  • FAQ
Feedback
phone Contact Us

Chat now with Alibaba Cloud Customer Service to assist you in finding the right products and services to meet your needs.

alicare alicarealicarealicare