All Products
Search
Document Center

Hologres:Tiered storage of hot data and cold data

Last Updated:Oct 15, 2024

Hologres V1.3.37 and later support tiered storage of hot data and cold data to help customers manage data in a scientific manner. This topic describes how to use tiered storage of hot data and cold data in Hologres.

Storage types

The data storage modes of Hologres include standard storage and Infrequent Access (IA) storage.

  • Standard storage is called solid state drive (SSD) hot storage. This is the default storage mode that Hologres uses, and this storage mode meets the requirements for low-latency and high-performance data access. Standard storage is the most efficient and cost-effective choice in most scenarios.

  • IA storage is called hard disk drive (HDD) cold storage. This storage mode meets the requirements for cost-effective storage of IA data. IA storage is the best choice for ultra-large datasets that are insensitive to latency or that are infrequently accessed.

  • If you store data in primary and secondary instances, we recommend that you upgrade your Hologres instances to V1.3.55 or later and use HDD cold storage.

Hologres also supports tiered storage of hot data and cold data in a table. You can use the CREATE PARTITION TABLE statement in Hologres to configure storage modes for child tables.

Hologres V2.1 and later support SSD cache-based acceleration to accelerate access to data in cold storage. Currently, access to data only in the default cache space can be accelerated. Cache acceleration is enabled by default, and the access performance is expected to be improved by more than twice.

Prerequisites

  • Only Hologres V1.3.37 and later support tiered storage of hot data and cold data. If the version of your Hologres instance is earlier than V1.3.37, manually upgrade your Hologres instance in the Hologres console or join a Hologres DingTalk group to apply for an instance upgrade. For more information about how to manually upgrade a Hologres instance, see Instance upgrades. For more information about how to join a Hologres DingTalk group, see Obtain online support for Hologres.

  • For an instance that uses the subscription billing method, the default quota for IA storage of the instance is 0 GB after you upgrade the instance to V1.3.37.

Usage notes

  • A Hologres table consists of user data and metadata. In tiered storage of hot data and cold data, you can set the storage mode only for user data. Metadata must be stored in the hot storage tier to ensure performance. Even if the storage mode is set to cold storage for the table, metadata is stored in the hot storage tier. This incurs a small amount of hot storage fees.

  • A low priority is configured for hot data and cold data migration tasks to minimize the impact on user services. After you change the storage mode for user data, the configuration does not immediately take effect. Hologres migrates data at the backend in the asynchronous mode. By default, the migration starts at 00:00 on the day. In Hologres V2.2 and later, the migration time can be customized. For more information, see Dynamic partitioning.

  • When you write new data to the cold storage tier of the table, Hologres first writes the data to the hot storage tier, and then migrates the data to the cold storage tier by using an asynchronous task at the backend. This incurs a certain amount of hot storage fees.

  • Due to the limit of the disk seek time, we recommend that you do not set the storage mode to cold storage if point queries are performed on the tables. For example, we recommend that you do not apply cold storage to Hologres dimension tables in Realtime Compute for Apache Flink and to serving tables. The number of table rows that Hologres reads from the cold storage tier per second is 100 times lower than that Hologres reads from the hot storage tier per second.

  • We recommend that you do not set the storage mode to cold storage for row-oriented tables.

  • If you migrate data between the code storage tier and hot storage tier in batches, the tablets of all tables are enabled. This increases the memory usage. We recommend that you migrate data in a maximum of 200 tables at a time.

Configure tiered storage of hot data and cold data for a table

When you create a table, you can configure the storage_mode parameter in the SET_TABLE_PROPERTY statement to specify a storage mode for the table. For more information, see CREATE TABLE.

Non-partitioned table

Specify a storage mode for a non-partitioned table

You can configure the storage_mode parameter in the SET_TABLE_PROPERTY statement to specify a storage mode for a non-partitioned table. Valid values of this parameter:

  • hot

  • cold

For example, you can create a table named tbl1 and set the storage mode of the table to cold storage. Sample statements:

-- Specify the storage mode as cold storage when you create the table.
BEGIN;
CREATE TABLE tbl1 (
 "id" int NOT NULL,
 "name" text NOT NULL
);
CALL set_table_property('tbl1', 'storage_mode', 'cold');
COMMIT;

Change the storage mode of the non-partitioned table

Hologres allows you to change the storage mode after you create the table. For example, you can change the storage mode of the tbl1 table to hot storage. The system asynchronously migrates data to the hot storage tier. Sample statements:

-- Change the storage mode to hot storage after you create the table.
CALL set_table_property('tbl1', 'storage_mode', 'hot');

Partitioned table

Specify a storage mode for a partitioned table

You can specify the SET_TABLE_PROPERTY parameter to configure the storage mode for a partitioned table. Child tables inherit the storage mode of their parent tables by default. You can also specify storage modes for the child tables separately. Valid values:

  • hot

  • cold

For example, you can set the storage mode of the parent table named tbl2 to cold storage. When you write data to the child table named tbl2_v1 of the parent table, the data is written to the cold storage tier. Sample statements:

-- Specify the storage mode as cold storage when you create the table.
BEGIN;
CREATE TABLE tbl2(
  c1 text NOT NULL,
  c2 text
)
PARTITION BY LIST(c2);
CALL set_table_property('tbl2', 'storage_mode', 'cold');
CREATE TABLE tbl2_v1 PARTITION OF tbl2 FOR VALUES IN ('v1');
COMMIT;

Change the storage mode of a partitioned table

Hologres allows you to change the storage mode after you create the table. For example, you can change the storage mode of the tbl2 parent table to hot storage. The system asynchronously migrates data in all child tables of this parent table to the hot storage tier. Sample statements:

-- Change the storage mode to hot storage after you create the table.
CALL set_table_property('tbl2', 'storage_mode', 'hot');

For example, you can change the storage mode of a child table to cold storage by setting the storage_mode parameter to cold for the child table. Sample statements:

-- Create child tables named tbl2_v2 and tbl2_v3 for the tbl2 parent table.

CREATE TABLE tbl2_v2 PARTITION OF tbl2 FOR VALUES IN ('v2');
CREATE TABLE tbl2_v3 PARTITION OF tbl2 FOR VALUES IN ('v3');

-- Query the storage mode of the child tables. The child tables inherit the storage mode of their parent table by default.

SELECT * FROM hg_table_storage_status('public', 'tbl2');

-- Change the storage mode of the tbl2_v3 child table to cold storage.

CALL set_table_property('tbl2_v3', 'storage_mode', 'cold');
                

Dynamically manage partitioned tables

You can use the dynamic partitioning feature to configure rules of migrating data between the hot storage tier and cold storage tier. This helps implement tiered storage of hot data and cold data in an intelligent manner to balance costs and query performance. For more information about dynamic partitioning, see Dynamic partitioning.

Query the storage status of data in a table

You can call the hg_table_storage_status function to query the storage status of data in a table. The returned result of the hg_table_storage_status function includes the storage size of user data, not the storage size of metadata. Sample statement:

SELECT * FROM hg_table_storage_status('<schema_name>', '<table_name>');

Parameter

Description

schema_name

The name of the schema to which the table belongs.

table_name

The name of the table.

The following table describes columns in the result returned by the function.

Column

Description

table_name

  • For a non-partitioned table, the name of the table is returned.

  • For a parent table, the names of its child tables are returned in separate lines. Each line represents a child table.

  • For a child table, the name of the child table is returned.

hot_size

The size of data in the hot storage tier. Unit: bytes.

cold_size

The size of data in the cold storage tier. Unit: bytes.

status

The storage status of data. Valid values:

  • hot: The data is stored in the hot storage tier.

  • cold: The data is stored in the cold storage tier.

  • transferring: The data is being migrated.

Sample statements:

-- Query the storage status of data in a non-partitioned table.
SELECT * FROM hg_table_storage_status('public', 'tbl1'); -- The returned data size is in bytes.
 table_name | hot_size |   cold_size   | status
------------+----------+---------------+--------
 tbl1       |  145643  |      3685     | transferring

-- Query the storage status of data in a partitioned table.
SELECT * FROM hg_table_storage_status('public', 'tbl2');-- The returned data size is in bytes.
   table_name    | hot_size | cold_size | status
-----------------+----------+-----------+--------
 tbl2_2022062222 |        0 |         0 | hot
 tbl2_2022062221 |     1125 |         0 | hot
 tbl2_2022062220 |     1245 |         0 | hot
 tbl2_2022062219 |     1358 |         0 | hot
 tbl2_2022062218 |        0 |      1875 | cold
 tbl2_2022062217 |        0 |      1143 | cold
 tbl2_2022062216 |        0 |      1299 | cold

Query the access frequency of table data

Hologres V1.3.37 and later collect statistics on tables in your Hologres instance on a daily basis and store the statistics in the system table named hologres.hg_table_info. This way, you can query and analyze the statistics on your tables and take optimization measures based on the results of statistical analysis. For more information, see Query and analyze table statistics. You can query this system table to obtain information such as the volumes of hot data and cold data stored in the tables, the access frequency of table data, and the access frequency of partition data. Then, you can determine whether to migrate hot data and cold data based on the obtained information. The following sample code provides examples on how to query statistics on tables in an instance by using this system table.

Non-partitioned table

SELECT a.table_name,
       (a.total_read_count - b.total_read_count) AS read_count,
       (a.total_write_count - b.total_write_count) AS write_count,
       a.hot_storage_size
FROM (SELECT * FROM hologres.hg_table_info
      WHERE type='TABLE' AND collect_time::DATE = CURRENT_DATE - interval '1 day') a
JOIN
    (SELECT * FROM hologres.hg_table_info
    WHERE type='TABLE' AND collect_time::DATE = CURRENT_DATE - interval '${days} day') b
ON a.table_name = b.table_name
ORDER BY hot_storage_size DESC;

Partitioned table

Query the access frequency of all partitions:

SELECT parent_table_name,count(*) AS partition_cnt, 
       sum(hot_storage_size)/1024/1024/1024 AS hot_size_gb 
FROM hologres.hg_table_info
WHERE type = 'PARTITION' AND collect_time::DATE = CURRENT_DATE - interval '1 day'
GROUP BY parent_table_name
ORDER BY hot_size_gb DESC;

Query the access frequency of the specified partitions:

SELECT a.table_name,
      (a.total_read_count - b.total_read_count) as read_count,
      (a.total_write_count - b.total_write_count) as write_count,
       a.hot_storage_size
FROM (SELECT * FROM hologres.hg_table_info
      WHERE type = 'PARTITION'
      AND parent_table_name = '${p_table_name}'
      AND collect_time::DATE = CURRENT_DATE - interval '1 day') a
JOIN
     (SELECT * FROM hologres.hg_table_info
     WHERE type = 'PARTITION'
     AND parent_table_name = '${p_table_name}'
     AND collect_time::DATE = CURRENT_DATE - interval '${days} day') b
ON a.table_name = b.table_name
ORDER BY table_name DESC;