Separate hot and cold data based on a custom time column - Lindorm

LindormTable lets you archive hot and cold data based on a custom time column. After you purchase storage-optimized Lindorm instances, you can specify a time column in a table or secondary index to separate hot and cold data. This method stores data on different media to improve query efficiency for hot data and reduce storage costs for cold data. This topic describes how to separate hot and cold data based on a custom time column and provides related usage notes.

Prerequisites

The cold storage feature is enabled. For more information about how to enable this feature, see Storage-optimized Lindorm instances.
The LindormTable version is 2.4.3 or later. For more information about how to view or upgrade the version, see LindormTable version guide and Perform a minor version update.
You are connected to LindormTable using Lindorm-cli. For more information, see Connect to and use LindormTable using Lindorm-cli.
The storage property STORAGE_POLICY of the table or column family is not set to COLD. This means the table for which you want to enable hot and cold data separation cannot be a cold storage table. For more information about how to modify the storage property, see Configure cold storage.

Usage notes

This feature is supported only for Lindorm SQL. It is not supported for HBase-compatible operations.
When the value in the custom time column of a row crosses the specified boundary for hot data, the entire row is archived to cold storage.
The following rules apply when you set a custom time column:
- The custom time column must be a primary key column.
- The custom time column cannot be the first column of the primary key.
- The custom time column supports only the BIGINT and TIMESTAMP data types. Data cannot be separated if the specified column does not exist, if no data is written to the column, or if the data written to the column is of an incorrect data type. In these cases, all data is written to hot storage.

Procedure

Set the hot and cold data boundary for a table

Set the custom time column and the hot and cold data boundary.

Method 1: Set them when you create the table.

Create a table named dt. Set the hot and cold data boundary to one day, and use the p2 column for hot and cold data separation.

-- Set the hot and cold data boundary to one day and use the p2 column for hot and cold data separation. The time unit for the value in the p2 column defaults to milliseconds (ms).
CREATE TABLE dt (p1 integer, p2 bigint, p3 bigint, c1 varchar,  constraint pk PRIMARY KEY(p1, p2, p3)) WITH(COMPRESSION = 'ZSTD', CHS ='86400', CHS_L2 = 'storagetype=COLD', CHS_COLUMN = 'COLUMN=p2');

-- Set the hot and cold data boundary to one day and use the p2 column for hot and cold data separation. Set the time unit for the value in the p2 column to seconds (s).
CREATE TABLE dt (p1 integer, p2 bigint, p3 bigint, c1 varchar,  constraint pk PRIMARY KEY(p1, p2, p3)) WITH(COMPRESSION = 'ZSTD', CHS ='86400', CHS_L2 = 'storagetype=COLD', CHS_COLUMN = 'COLUMN=p2|TIMEUNIT=SECONDS');

-- Set the hot and cold data boundary to one day and use the p2 column for hot and cold data separation. Set the time unit for the value in the p2 column to minutes (min).
CREATE TABLE dt (p1 integer, p2 bigint, p3 bigint, c1 varchar,  constraint pk PRIMARY KEY(p1, p2, p3)) WITH(COMPRESSION = 'ZSTD', CHS ='86400', CHS_L2 = 'storagetype=COLD', CHS_COLUMN = 'COLUMN=p2|TIMEUNIT=MINUTES');

Parameters

Parameter	Description
CHS	The hot and cold data boundary, in seconds (s). For example, if CHS is set to 86400, data with a timestamp in the custom time column older than 86,400 seconds (one day) is automatically archived to cold storage.
COMPRESSION	The compression algorithm. This applies to the entire table. The algorithm name is not case-sensitive. The default configuration is none.
CHS_L2	The layer-2 property. This is typically configured to specify the storage type: `storagetype=COLD`.
CHS_COLUMN	The custom time column. The default unit for its value is milliseconds (ms). Other supported time units include SECONDS, MINUTES, MICROSECONDS, and NANOSECONDS. The format is `COLUMN=<custom_time_column>\|TIMEUNIT=<time_unit>`. For example, `COLUMN=p2\|TIMEUNIT=SECONDS`. Important When you set CHS_COLUMN, do not add spaces on either side of the equal sign (=). Otherwise, an error occurs.

Verify the results

You can use one of the following methods to verify the settings:

Execute the SHOW VARIABLES statement to view table properties. For example: SHOW TABLE VARIABLES FROM dt LIKE 'CHS_COLUMN';.
On the Overview page of the cluster management system, click the name of the target table in the target database. In the Current Table Details area, you can view the values of parameters such as CHS and CHS_COLUMN.

Method 2: If hot and cold data separation was not enabled when the table was created, use the ALTER TABLE statement to add the related properties.

-- For an existing data table where hot and cold data separation was not enabled at creation, for example:
-- CREATE TABLE dt (p1 integer, p2 bigint, p3 bigint, c1 varchar,  constraint pk primary key(p1, p2, p3));

-- Enable hot and cold data separation for the dt table based on a custom column. Set the hot and cold data boundary to one day, use the p2 column for separation, and use the default unit of milliseconds (ms).
ALTER TABLE dt SET 'CHS' ='86400', 'CHS_L2' = 'storagetype=COLD', 'CHS_COLUMN' = 'COLUMN=p2';

-- Enable hot and cold data separation for the dt table based on a custom column. Set the hot and cold data boundary to one day, use the p2 column for separation, and change the time unit to seconds (s).
ALTER TABLE dt SET 'CHS' ='86400', 'CHS_L2' = 'storagetype=COLD', 'CHS_COLUMN' = 'COLUMN=p2|TIMEUNIT=SECONDS';

-- Enable hot and cold data separation for the dt table based on a custom column. Set the hot and cold data boundary to one day, use the p2 column for separation, and change the time unit to minutes (min).
ALTER TABLE dt SET 'CHS' ='86400', 'CHS_L2' = 'storagetype=COLD', 'CHS_COLUMN' = 'COLUMN=p2|TIMEUNIT=MINUTES';

Verify the results

You can use one of the following methods to verify the settings:

Execute the SHOW VARIABLES statement to view table properties. For example: SHOW TABLE VARIABLES FROM dt LIKE 'CHS_COLUMN';.
On the Overview page of the cluster management system, click the name of the target table under the target database. In the Current Table Details area, view the values of parameters such as CHS and CHS_COLUMN.

Optional: Modify the hot and cold data boundary and custom time column of the table.
- Modify the hot and cold data boundary.
```
ALTER TABLE dt SET 'CHS'='1000';
```
- Modify the custom time column.
```
ALTER TABLE dt SET 'CHS_COLUMN'='COLUMN=p3'; 
```
Optional: Cancel hot and cold data separation for the table.
```
ALTER TABLE dt SET 'CHS'='', 'CHS_L2' = '', 'CHS_COLUMN'='';
```
Note
After you modify the hot and cold data boundary or cancel hot and cold data separation, you must wait for the system to perform a compaction in the background before data can be moved from cold storage back to hot storage. To return the data to hot storage immediately, you can manually execute the major_compact command.

Set the hot and cold data boundary for a secondary index

By default, a secondary index is stored in a table. Therefore, you can separate hot and cold data for a secondary index in the same way as for a table.

The following example creates a secondary index for the dt table from the preceding section.

Create a secondary index and enable hot and cold data separation.

Method 1: Set the custom time column and hot and cold data boundary when you create the secondary index.
```
CREATE INDEX idx on dt (c1) WITH(CHS = '86400', CHS_L2 = 'storagetype=COLD', CHS_COLUMN='COLUMN=p2');
```
Note
You cannot specify a custom time column for a secondary index. The CHS_COLUMN must be set to the custom time column of the base table.

Method 2: If the settings were not configured when the secondary index was created, use the ALTER TABLE statement to set the custom time column and hot and cold data boundary.

-- For an existing secondary index where hot and cold data separation was not enabled at creation, for example:
-- CREATE INDEX idx on dt (c1);

-- Enable hot and cold data separation for the secondary index table. Set the hot and cold data boundary to one day and use the p2 column for separation.
ALTER TABLE `dt.idx` SET 'CHS' = '86400', 'CHS_L2' = 'storagetype=COLD', 'CHS_COLUMN'='COLUMN=p2';

Optional: Modify the hot and cold data boundary and custom time column of the secondary index.
- Modify the hot and cold data boundary.
```
ALTER TABLE `dt.idx` SET 'CHS'='10000';
```
- Modify the custom time column.
```
ALTER TABLE `dt.idx` SET 'CHS_COLUMN'='COLUMN=p2'; 
```
Note
The name of a secondary index table uses the format TableName.IndexName. Because the name contains a period (.), which is a special character, you must enclose the name in backticks (`) to escape it. For example, if the table name is test.idx, the escaped name is `test.idx`.
Optional: Cancel hot and cold data separation for the secondary index.
```
ALTER TABLE `dt.idx` SET 'CHS'='', 'CHS_L2' = '', 'CHS_COLUMN'='';
```

Write data

The method for writing data to a table that has hot and cold data separation enabled is the same as for a standard table. Data is first stored in hot storage (Medium or Compute-optimized). Over time, if a row of data meets the Current time - Value of the time column > Value of CHS condition, it is archived to cold storage when a compaction is performed.

Query data

Because hot and cold data reside in the same table, all query operations are performed on that single table. When you query data, you can specify a time range for the custom time column to limit the query scope. The system then determines the query mode based on the specified time range and can query only the hot storage area, only the cold storage area, or both. If you do not limit the time range in your query, the query might hit cold data. This can cause the query throughput to be limited by the cold storage. For more information, see Storage-optimized Lindorm instances.

If you cannot set the custom time column in the query conditions, you can also use a HINT to set the _l_hot_only_ parameter to query only hot data.

Query examples

Random query (Get)

-- p2 is the custom time column.
SELECT * FROM dt WHERE p1 = 10 AND p2 = 10;

Range query
```
-- p2 is the custom time column.
SELECT * FROM dt WHERE p2 > 10 AND p2 < 1000;
```
Important
By default, LindormTable does not execute a SELECT statement identified as an inefficient query. Instead, it throws an exception. For more information, see SELECT.

Use a HINT to query only hot data

SELECT /*+ _l_hot_only_(true) */ * FROM dt WHERE p1>1;

Best practices

In Internet of Vehicles (IoV) scenarios, the primary key of a table typically consists of the vin (Vehicle Identification Number) and timestamp columns. The timestamp column can be defined as the custom time column for hot and cold data separation. When you query data for a vehicle within a specific time range, the system can determine whether to query cold data or only hot data based on the query conditions.

USE test;
CREATE TABLE dt (
  vin varchar, ts bigint, c1 varchar, c2 varchar, constraint pk primary key(vin, ts)) WITH
(COMPRESSION = 'ZSTD', CHS ='86400', CHS_L2 = 'storagetype=COLD', CHS_COLUMN = 'COLUMN=ts'); // The time column must be a primary key.
-- Query data for a specific vehicle within a time range.
SELECT * FROM dt WHERE vin='xxxx' AND ts > 1675065122000 AND ts < 1675975222000;

FAQ

Q: If I update existing cold data, does the updated data remain cold?
A: Yes, if the update does not change the value in the custom time column, the data remains cold. However, if the update changes the value in the custom time column, the data is re-evaluated as hot or cold based on the new value. For example, assume a table has primary key columns p1 and p2, and non-primary key columns c1 and c2. A row has the values p1=row1, p2=Jan-28-2023, c1="c1", and c2="c2". The hot and cold boundary (CHS) is 1 day, and the current date is Jan-30-2023. This row is cold data. If you update the values of c1 and c2, the row remains cold data. If you update the value of p2 to Jan-30-2023, the row becomes hot data. It will become cold data again on Feb-01-2023.
Note
You cannot update the value in a custom time column that is part of the primary key.
Q: If a row does not have a value in the custom time column, is the data separated into hot and cold?
A: No, it is not. The custom time column is the basis for hot and cold data separation. If a row does not have a value in the custom time column, it is kept in hot storage.