To reduce data storage costs and improve query performance and system resource utilization, ApsaraDB for ClickHouse provides tiered storage of hot data and cold data. You can store data on different types of disks based on how frequently the data is queried. This topic describes how to enable tiered storage of hot data and cold data and how to configure tiered storage policies.
Prerequisites
An ApsaraDB for ClickHouse cluster of V20.8 or later is created.
If you want to use tiered storage of hot data and cold data on an ApsaraDB for ClickHouse cluster whose version does not support this feature, you can create an ApsaraDB for ClickHouse cluster of V20.8 or later and migrate data to the new cluster. For more information, see Migrate data between ApsaraDB for ClickHouse Community-compatible Edition clusters.
ApsaraDB for ClickHouse Enterprise Edition automatically stores hot data in the cache and cold data in shared storage. This behavior is independent of the tiered storage policies described in this topic.
Background information
Hot data is data that is frequently accessed. It is stored on the enhanced SSDs (ESSDs) or ultra disks that you select when you create a cluster. Hot data storage is suitable for business scenarios that require high-performance queries.
Cold data is data that is infrequently accessed. It is stored on cold data disks at lower costs to meet cost-effective storage requirements.
Tiered storage of hot data and cold data provides the following storage policies.
Storage policy | Description |
Default storage policy | Newly written data is stored on hot data disks to support efficient queries. If the amount of stored hot data reaches the storage threshold, the largest part data file on the hot data disk is automatically migrated to a cold data disk. This way, newly written hot data can still be stored on hot data disks. |
Storage policy based on time to live (TTL) | Adds a TTL clause to the default storage policy. All data that was generated before the period specified in the TTL clause is automatically migrated to a cold data disk. |
Precautions
If you have enabled the data backup feature for an ApsaraDB for ClickHouse cluster, you cannot enable tiered storage of hot data and cold data for the cluster.
We recommend that you enable tiered storage of hot data and cold data only during off-peak hours. Enabling tiered storage of hot data and cold data restarts the cluster. During the restart, you can only view the cluster information. You cannot upgrade, scale out, or scale in the cluster, or modify parameter configurations.
When you use tiered storage of hot data and cold data, you must specify storage policies for tables.
When you create a table, you can add the following clause to the CREATE TABLE statement to specify the data storage policy:
SETTINGS storage_policy = 'hot_to_cold'
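For example, the following statement is a minimal sketch that creates a table that uses the hot_to_cold policy. The table name and columns are illustrative:
CREATE TABLE demo_tbl
(
    `id` UInt64,
    `event_date` Date
)
ENGINE = MergeTree()
ORDER BY id
SETTINGS storage_policy = 'hot_to_cold';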
You can also modify the data storage policy of an existing table.
Clusters with tiered storage of hot data and cold data enabled after January 1, 2024
Execute the following statement to modify the data storage policy:
ALTER TABLE <table_name> MODIFY SETTING storage_policy = 'hot_to_cold';
The following table describes the parameters in the preceding statement.
Parameter | Description |
table_name | The name of the table. |
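For example, the following statement changes the storage policy of a hypothetical table named test_tbl:
ALTER TABLE test_tbl MODIFY SETTING storage_policy = 'hot_to_cold';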
Clusters with tiered storage of hot data and cold data enabled before January 1, 2024
You can submit a ticket to apply for adding a hot and cold data storage policy. This way, you can use the policy to migrate data from a hot data disk to a cold data disk.
After you enable the tiered storage of hot data and cold data feature, you cannot disable it.
Billing
You are billed for the tiered storage of hot data and cold data feature based on the following rules.
Storage type
Billing details
Hot data storage
The storage space that you purchase for a cluster is used to store hot data. You do not need to separately purchase storage for hot data.
Cold data storage
You are billed for cold data storage based on the pay-as-you-go billing method. You do not need to purchase storage for cold data before you use the storage. You are billed for cold data storage on an hourly basis based on the used storage.
For more information about the unit prices of cold data storage, see Storage fees of ApsaraDB for ClickHouse clusters.
Billing formula: Fee of cold storage space = Unit price of cold data storage × Used cold data storage × Usage duration.
For example, if the unit price of cold data storage is USD 0.000026 per GB-hour and 2,000 GB of cold data is stored from 19:00 to 20:00, the total fee is calculated based on the following formula: USD 0.000026 per GB-hour × 2,000 GB × 1 hour = USD 0.052. If you release 500 GB of storage space for cold data between 20:00 and 21:00, only 1,500 GB storage is used. In this case, the total fee incurred between 20:00 and 21:00 is calculated based on the following formula: USD 0.000026 per GB-hour × 1,500 GB × 1 hour = USD 0.039.
The preceding prices are provided only for your reference. In actual billing scenarios, the prices displayed in the ApsaraDB for ClickHouse console shall prevail.
Enable tiered storage of hot data and cold data
Log on to the ApsaraDB for ClickHouse console.
In the top navigation bar, select the region where your cluster is deployed.
On the Clusters page, click the Clusters of Community-compatible Edition tab and click the ID of the cluster that you want to manage.
In the left-side navigation pane, click Cold and hot layered storage.
Click Click to Enable Layered Storage.
In the Enable Layered Storage message, click OK.
Then, the system restarts the cluster. After the state of the cluster changes from Changing Configuration to Running, tiered storage of hot data and cold data is enabled.
Default storage policy used after tiered storage of hot data and cold data is enabled
The following table describes the parameters for the default storage policy that is used after tiered storage of hot data and cold data is enabled.
Parameter | Description |
move_factor | If the ratio of the available hot data storage to the total hot data storage is smaller than the value of the move_factor parameter, the largest part data file on the hot data disk is automatically migrated to a cold data disk. Valid values: 0 to 1. Note: For more information about how to view the disk space, see the View the disk space section of this topic. |
prefer_not_to_merge | Specifies whether to disable the merging of data parts on a cold data disk. If this parameter is set to true, data parts on the cold data disk are not merged. |
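To check the values that are configured for the hot_to_cold policy, you can query the system.storage_policies system table. The following statement is a sketch; the available columns may vary based on the cluster version:
SELECT policy_name, volume_name, disks, move_factor, prefer_not_to_merge FROM system.storage_policies;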
Add a TTL-based tiered storage policy
Add a TTL clause to the default storage policy. This way, all data generated before the specified period in the TTL clause is automatically migrated to a cold data disk.
The following sample code provides an example on how to add a TTL clause:
TTL <time_column> + INTERVAL <number> TO DISK 'cold_disk'
The following table describes the parameters.
Parameter | Description |
time_column | The column of the DATE or DATETIME data type. |
number | The length of the period. Data that was generated before this period is migrated to a cold data disk. Common units include DAY, WEEK, MONTH, and YEAR. |
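For example, for a hypothetical DATETIME column named event_time, the following clause migrates data that was generated more than one month ago:
TTL event_time + INTERVAL 1 MONTH TO DISK 'cold_disk'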
For more information about TTL clauses, see TTL.
Example
In this example, all data that was generated more than 90 days ago is migrated to a cold data disk based on the date column, which contains the date when the data was generated. Execute the following statement to create the table:
CREATE TABLE ttl_test_tbl
(
`f1` String,
`f2` String,
`f3` Int64,
`f4` Float64,
`date` Date
)
ENGINE = MergeTree()
PARTITION BY date
ORDER BY f1
TTL date + INTERVAL 90 DAY TO DISK 'cold_disk'
SETTINGS storage_policy = 'hot_to_cold';
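After data is written to the table, you can check the disk on which each active data part is stored. The following statements are a usage sketch, and the inserted values are illustrative. Parts whose date value is more than 90 days in the past are moved to the cold data disk by a background task, so the move may not be immediate:
INSERT INTO ttl_test_tbl VALUES ('k1', 'v1', 1, 1.0, today() - 100);
SELECT partition, name, disk_name FROM system.parts WHERE table = 'ttl_test_tbl' AND active = 1;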
Modify TTL-based tiered storage of hot data and cold data
You can change the column of the DATE or DATETIME data type and the period in a TTL clause to modify the TTL-based tiered storage policy.
By default, historical data and new data are stored based on the new storage policy after the TTL-based tiered storage policy is modified.
If you do not want the new storage policy to apply to historical data, execute the following statement before you modify the TTL-based tiered storage policy:
set materialize_ttl_after_modify=0;
After you execute this statement and then modify the TTL-based tiered storage policy, historical data is still stored based on the original storage policy, and only new data is stored based on the new storage policy.
After you modify the TTL-based tiered storage policy, data that was migrated to a cold data disk based on the original storage policy is not automatically migrated back to a hot data disk. If you want to migrate such cold data to a hot data disk, see the Migrate data on hot data disks and cold data disks section of this topic.
The following sample code provides an example on how to modify a TTL-based tiered storage policy:
ALTER TABLE <table_name> ON CLUSTER default MODIFY TTL <time_column> + INTERVAL <number> TO DISK 'cold_disk';
The following table describes the parameters.
Parameter | Description |
table_name | The table name. |
time_column | The new column of the DATE or DATETIME data type. |
number | The new length of the period. Data that was generated before this period is migrated to a cold data disk. Common units include DAY, WEEK, MONTH, and YEAR. |
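For example, the following statement changes the TTL of the ttl_test_tbl table that is created in the preceding example so that data generated more than 30 days ago is migrated. The new period is illustrative:
ALTER TABLE ttl_test_tbl ON CLUSTER default MODIFY TTL date + INTERVAL 30 DAY TO DISK 'cold_disk';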
Migrate data on hot data disks and cold data disks
Execute the following statement to migrate data from a hot data disk to a cold data disk:
ALTER TABLE <table_name> ON CLUSTER default MOVE PARTITION <partition> TO DISK 'cold_disk';
Execute the following statement to migrate data from a cold data disk to a hot data disk:
ALTER TABLE <table_name> ON CLUSTER default MOVE PARTITION <partition> TO DISK 'default';
Note: partition specifies the name of a partition. You can query partition names from the system.parts system table.
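For example, the following statements are a sketch that queries the partitions of the ttl_test_tbl table and then moves one partition to the cold data disk. The database name and partition value are illustrative:
SELECT DISTINCT partition FROM system.parts WHERE database = 'default' AND table = 'ttl_test_tbl' AND active = 1;
ALTER TABLE ttl_test_tbl ON CLUSTER default MOVE PARTITION '2024-01-01' TO DISK 'cold_disk';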
View data on hot data disks and cold data disks
Execute the following statement to view data on a hot data disk:
SELECT * FROM system.parts WHERE database = '<db_name>' AND table = '<tbl_name>' AND disk_name = 'default' AND active = 1;
Execute the following statement to view data on a cold data disk:
SELECT * FROM system.parts WHERE database = '<db_name>' AND table = '<tbl_name>' AND disk_name = 'cold_disk' AND active = 1;
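You can also check how much data of a table is stored on each disk. The following statement is a sketch; replace the database and table names with your own values:
SELECT disk_name, formatReadableSize(sum(bytes_on_disk)) AS size FROM system.parts WHERE database = '<db_name>' AND table = '<tbl_name>' AND active = 1 GROUP BY disk_name;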
View the disk space
After tiered storage of hot data and cold data is enabled, you can execute the following statement to view the disk space:
SELECT * FROM system.disks;
The following result is returned:
┌─name─────┬─────────path───────────────────────────┬─free_space───────────┬─total_space ─────────┬─used_space───┬─keep_free_space───┬──type──┐
│ cold_disk│ /clickhouse/data/data/disks/cold_disk/ │ 18446744073709551615 │ 18446744073709551615 │ 115312080698 │ 0 │ OSS │
│ default │ /clickhouse/data/data/ │ 201663209472 │ 207083249664 │ 5420040192 │ 0 │ local │
└──────────┴────────────────────────────────────────┴──────────────────────┴──────────────────────┴──────────────┴───────────────────┴────────┘
The following table describes the parameters in the returned result.
Parameter | Description |
name | The disk name. Valid values: default (the hot data disk) and cold_disk (the cold data disk). |
path | The path in which data on the disk is stored. |
free_space | The available space on the disk. The available space on the cold data disk is not limited. Unit: bytes. |
total_space | The total disk space. The total space of the cold data disk is not limited. Unit: bytes. |
used_space | The space used on the disk. Unit: bytes. |
keep_free_space | The reserved available space on the disk. Unit: bytes. |
type | The type of the storage medium. Valid values: local (the hot data disk, which uses local storage) and OSS (the cold data disk, which uses Object Storage Service). |
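To make the values in the preceding output easier to read, you can format the sizes. The following statement is a sketch that is based on the columns shown in the preceding output:
SELECT name, path, formatReadableSize(free_space) AS free, formatReadableSize(total_space) AS total, formatReadableSize(used_space) AS used FROM system.disks;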