To reduce data storage costs and improve query performance and system resource utilization, ApsaraDB for ClickHouse provides tiered storage of hot data and cold data. You can store data on different types of disks based on how frequently the data is queried. This topic describes how to enable tiered storage of hot data and cold data and how to configure tiered storage policies.
Prerequisites
An ApsaraDB for ClickHouse cluster of V20.8 or later is created.
If you want to use tiered storage of hot data and cold data on an ApsaraDB for ClickHouse cluster whose version does not support this feature, you can create an ApsaraDB for ClickHouse cluster of V20.8 or later and migrate data to the new cluster. For more information, see Migrate data between ApsaraDB for ClickHouse Community-compatible Edition clusters.
ApsaraDB for ClickHouse Enterprise Edition automatically stores hot data in the cache and cold data in shared storage. This behavior is independent of the tiered storage policies described in this topic.
Background information
Hot data is data that is frequently accessed. It is stored on the enhanced SSDs (ESSDs) or ultra disks that you select when you create a cluster. Hot data storage is suitable for business scenarios that require high-performance queries.
Cold data is data that is infrequently accessed. It is stored on cold data disks at lower costs to meet cost-effective storage requirements.
Tiered storage of hot data and cold data provides the following storage policies.
Storage policy | Description |
Default storage policy | Newly written data is stored on hot data disks to support efficient queries. If the amount of stored hot data reaches the storage threshold, the largest part data file on the hot data disk is automatically migrated to a cold data disk. This way, newly written hot data can still be stored on hot data disks. |
Storage policy based on time to live (TTL) | Adds a TTL clause to the default storage policy. All data that was generated before the period specified in the TTL clause is automatically migrated to a cold data disk. |
Precautions
If you have enabled the data backup feature for an ApsaraDB for ClickHouse cluster, you cannot enable tiered storage of hot data and cold data for the cluster.
We recommend that you enable tiered storage of hot data and cold data only during off-peak hours. Enabling tiered storage of hot data and cold data restarts the cluster. During the restart, you can only view the cluster information. You cannot upgrade, scale out, or scale in the cluster, or modify parameter configurations.
When you use tiered storage of hot data and cold data, you must specify storage policies for tables.
When you create a table, you can add the following clause to the CREATE TABLE statement to specify the data storage policy:
SETTINGS storage_policy = 'hot_to_cold'
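For example, the following statement is a minimal sketch that creates a table that uses the hot_to_cold policy. The table name and columns are illustrative:
CREATE TABLE demo_tbl
(
    `id` UInt64,
    `event_date` Date
)
ENGINE = MergeTree()
ORDER BY id
SETTINGS storage_policy = 'hot_to_cold';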
You can also modify the data storage policy of an existing table.
Clusters with tiered storage of hot data and cold data enabled after January 1, 2024
Execute the following statement to modify the data storage policy:
ALTER TABLE <table_name> MODIFY SETTING storage_policy = 'hot_to_cold';
The following table describes the parameters in the preceding statement.
Parameter | Description |
table_name | The name of the table. |
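For example, the following statement changes the storage policy of a hypothetical table named test_tbl:
ALTER TABLE test_tbl MODIFY SETTING storage_policy = 'hot_to_cold';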
Clusters with tiered storage of hot data and cold data enabled before January 1, 2024
You can submit a ticket to apply for adding a hot and cold data storage policy. This way, you can use the policy to migrate data from a hot data disk to a cold data disk.
After you enable the tiered storage of hot data and cold data feature, you cannot disable it.
Billing
You are billed for the tiered storage of hot data and cold data feature based on the following rules.
Storage type
Billing details
Hot data storage
The storage space that you purchase for a cluster is used to store hot data. You do not need to separately purchase storage for hot data.
Cold data storage
You are billed for cold data storage based on the pay-as-you-go billing method. You do not need to purchase storage for cold data before you use the storage. You are billed for cold data storage on an hourly basis based on the used storage.
For more information about the unit prices of cold data storage, see Storage fees of ApsaraDB for ClickHouse clusters.
Billing formula: Fee of cold storage space = Unit price of cold data storage × Used cold data storage × Usage duration.
For example, if the unit price of cold data storage is USD 0.000026 per GB-hour and 2,000 GB of cold data is stored from 19:00 to 20:00, the total fee is calculated based on the following formula: USD 0.000026 per GB-hour × 2,000 GB × 1 hour = USD 0.052. If you release 500 GB of storage space for cold data between 20:00 and 21:00, only 1,500 GB storage is used. In this case, the total fee incurred between 20:00 and 21:00 is calculated based on the following formula: USD 0.000026 per GB-hour × 1,500 GB × 1 hour = USD 0.039.
The preceding prices are provided only for your reference. In actual billing scenarios, the prices displayed in the ApsaraDB for ClickHouse console shall prevail.
Enable tiered storage of hot data and cold data
Log on to the ApsaraDB for ClickHouse console.
In the top navigation bar, select the region where your cluster is deployed.
On the Clusters page, click the Clusters of Community-compatible Edition tab and click the ID of the cluster that you want to manage.
In the left-side navigation pane, click Cold and hot layered storage.
Click Click to Enable Layered Storage.
In the Enable Layered Storage message, click OK.
Then, the system restarts the cluster. After the state of the cluster changes from Changing Configuration to Running, tiered storage of hot data and cold data is enabled.
Default storage policy used after tiered storage of hot data and cold data is enabled
The following table describes the parameters for the default storage policy that is used after tiered storage of hot data and cold data is enabled.
Parameter | Description |
move_factor | If the ratio of the available hot data storage to the total hot data storage is smaller than the value of the move_factor parameter, the largest part data file on the hot data disk is automatically migrated to a cold data disk. Valid values: 0 to 1. Note: For more information about how to view the disk space, see the View the disk space section of this topic. |
prefer_not_to_merge | Specifies whether to disable the merging of data parts on a cold data disk. If this parameter is set to true, data parts on the cold data disk are not merged. |
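To check the values that are configured for the hot_to_cold policy, you can query the system.storage_policies system table. The following statement is a sketch; the available columns may vary based on the cluster version:
SELECT policy_name, volume_name, disks, move_factor, prefer_not_to_merge FROM system.storage_policies;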
Add a TTL-based tiered storage policy
Add a TTL clause to the default storage policy. This way, all data generated before the specified period in the TTL clause is automatically migrated to a cold data disk.
The following sample code provides an example on how to add a TTL clause:
TTL <time_column> + INTERVAL <number> TO DISK 'cold_disk'
The following table describes the parameters.
Parameter | Description |
time_column | The column of the DATE or DATETIME data type. |
number | The length of the period. Data that was generated before this period is migrated to a cold data disk. Common units include DAY, WEEK, MONTH, and YEAR. |
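For example, for a hypothetical DATETIME column named event_time, the following clause migrates data that was generated more than one month ago:
TTL event_time + INTERVAL 1 MONTH TO DISK 'cold_disk'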
For more information about TTL clauses, see TTL.
Example
In this example, all data that was generated more than 90 days ago is migrated to a cold data disk based on the date column, which contains the date when the data was generated. Execute the following statement to create the table:
CREATE TABLE ttl_test_tbl
(
`f1` String,
`f2` String,
`f3` Int64,
`f4` Float64,
`date` Date
)
ENGINE = MergeTree()
PARTITION BY date
ORDER BY f1
TTL date + INTERVAL 90 DAY TO DISK 'cold_disk'
SETTINGS storage_policy = 'hot_to_cold';
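After data is written to the table, you can check the disk on which each active data part is stored. The following statements are a usage sketch, and the inserted values are illustrative. Parts whose date value is more than 90 days in the past are moved to the cold data disk by a background task, so the move may not be immediate:
INSERT INTO ttl_test_tbl VALUES ('k1', 'v1', 1, 1.0, today() - 100);
SELECT partition, name, disk_name FROM system.parts WHERE table = 'ttl_test_tbl' AND active = 1;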
Modify TTL-based tiered storage of hot data and cold data
You can change the column of the DATE or DATETIME data type and the period in a TTL clause to modify the TTL-based tiered storage policy.
By default, historical data and new data are stored based on the new storage policy after the TTL-based tiered storage policy is modified.
If you do not want the new storage policy to apply to historical data, execute the following statement before you modify the TTL-based tiered storage policy:
set materialize_ttl_after_modify=0;
After you execute this statement and then modify the TTL-based tiered storage policy, historical data is still stored based on the original storage policy, and only new data is stored based on the new storage policy.
After you modify the TTL-based tiered storage policy, data that was migrated to a cold data disk based on the original storage policy is not automatically migrated back to a hot data disk. If you want to migrate such cold data to a hot data disk, see the Migrate data on hot data disks and cold data disks section of this topic.
The following sample code provides an example on how to modify a TTL-based tiered storage policy:
ALTER TABLE <table_name> ON CLUSTER default MODIFY TTL <time_column> + INTERVAL <number> TO DISK 'cold_disk';
The following table describes the parameters.
Parameter | Description |
table_name | The table name. |
time_column | The new column of the DATE or DATETIME data type. |
number | The new length of the period. Data that was generated before this period is migrated to a cold data disk. Common units include DAY, WEEK, MONTH, and YEAR. |
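For example, the following statement changes the TTL of the ttl_test_tbl table that is created in the preceding example so that data generated more than 30 days ago is migrated. The new period is illustrative:
ALTER TABLE ttl_test_tbl ON CLUSTER default MODIFY TTL date + INTERVAL 30 DAY TO DISK 'cold_disk';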
Migrate data on hot data disks and cold data disks
Execute the following statement to migrate data from a hot data disk to a cold data disk:
ALTER TABLE <table_name> ON CLUSTER default MOVE PARTITION <partition> TO DISK 'cold_disk';
Execute the following statement to migrate data from a cold data disk to a hot data disk:
ALTER TABLE <table_name> ON CLUSTER default MOVE PARTITION <partition> TO DISK 'default';
Note: partition specifies the name of a partition. You can query partition names from the system.parts system table.
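For example, the following statements are a sketch that queries the partitions of the ttl_test_tbl table and then moves one partition to the cold data disk. The database name and partition value are illustrative:
SELECT DISTINCT partition FROM system.parts WHERE database = 'default' AND table = 'ttl_test_tbl' AND active = 1;
ALTER TABLE ttl_test_tbl ON CLUSTER default MOVE PARTITION '2024-01-01' TO DISK 'cold_disk';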
View data on hot data disks and cold data disks
Execute the following statement to view data on a hot data disk:
SELECT * FROM system.parts WHERE database = '<db_name>' AND table = '<tbl_name>' AND disk_name = 'default' AND active = 1;
Execute the following statement to view data on a cold data disk:
SELECT * FROM system.parts WHERE database = '<db_name>' AND table = '<tbl_name>' AND disk_name = 'cold_disk' AND active = 1;
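You can also check how much data of a table is stored on each disk. The following statement is a sketch; replace the database and table names with your own values:
SELECT disk_name, formatReadableSize(sum(bytes_on_disk)) AS size FROM system.parts WHERE database = '<db_name>' AND table = '<tbl_name>' AND active = 1 GROUP BY disk_name;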
View the disk space
After tiered storage of hot data and cold data is enabled, you can execute the following statement to view the disk space:
SELECT * FROM system.disks;
The following result is returned:
┌─name─────┬─────────path───────────────────────────┬─free_space───────────┬─total_space ─────────┬─used_space───┬─keep_free_space───┬──type──┐
│ cold_disk│ /clickhouse/data/data/disks/cold_disk/ │ 18446744073709551615 │ 18446744073709551615 │ 115312080698 │ 0 │ OSS │
│ default │ /clickhouse/data/data/ │ 201663209472 │ 207083249664 │ 5420040192 │ 0 │ local │
└──────────┴────────────────────────────────────────┴──────────────────────┴──────────────────────┴──────────────┴───────────────────┴────────┘
The following table describes the parameters in the returned result.
Parameter | Description |
name | The disk name. Valid values: default (the hot data disk) and cold_disk (the cold data disk). |
path | The path in which data on the disk is stored. |
free_space | The available space on the disk. The available space on the cold data disk is not limited. Unit: bytes. |
total_space | The total disk space. The total space of the cold data disk is not limited. Unit: bytes. |
used_space | The space used on the disk. Unit: bytes. |
keep_free_space | The reserved available space on the disk. Unit: bytes. |
type | The type of the storage medium. Valid values: local (the hot data disk, which uses local storage) and OSS (the cold data disk, which uses Object Storage Service). |
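To make the values in the preceding output easier to read, you can format the sizes. The following statement is a sketch that is based on the columns shown in the preceding output:
SELECT name, path, formatReadableSize(free_space) AS free, formatReadableSize(total_space) AS total, formatReadableSize(used_space) AS used FROM system.disks;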