LindormTable allows you to separately store hot and cold data based on custom time columns. After you purchase Capacity storage for a Lindorm instance, you can specify a custom time column in a table or secondary index and separately store hot and cold data based on the column. This improves the query efficiency of hot data while reducing the storage costs of cold data. This topic describes how to separately store hot data and cold data based on custom time columns and usage notes.
Prerequisites
Cold storage is enabled for the Lindorm instance. For more information, see Overview.
The version of LindormTable is 2.4.3 or later. For more information about how to view or upgrade the version of LindormTable, see Release notes of LindormTable and Upgrade the minor engine version of a Lindorm instance.
You are connected to LindormTable by using Lindorm-cli. For more information, see Use Lindorm-cli to connect to and use LindormTable.
The STORAGE_POLICY attribute of the tables or column families is not
COLD
, which indicates that the table for which you want to enable hot and cold data separation is not a table in the cold storage. For more information about how to modify the attribute of column families, see Configure cold storage.
Usage notes
Hot and cold data separation supports only tables created by using SQL statements. Tables created by using a HBase shell are not supported.
When the value in the custom time column of a row exceeds the specified hot and cold data boundary, the entire row is archived in cold storage.
Take note of the following rules when you configure custom time columns:
The custom time column must be a primary key column.
The custom time column cannot be the first column of the primary key.
The data type of the custom time column must be BIGINT. The values in the custom time column are UNIX timestamps representing the number of milliseconds that have elapsed since the epoch time January 1, 1970, 00:00:00 UTC. If the specified custom time column does not exist, data is not written to the column, or the type of data in the column is incorrect, Lindorm cannot determine whether to store the row in cold storage or hot storage. In this case, all data in the row is stored in hot storage.
Procedure
Specify a hot and cold data boundary for a table
Specify a custom time column and hot and cold data boundary.
Method 1: Specify the column and boundary for a table when you create the table.
You can execute the following statement to create a table named dt, set the hot and cold data boundary to one day, and specify p2 as the column based on which hot and cold data is separately stored:
CREATE TABLE dt (p1 integer, p2 bigint, p3 bigint, c1 varchar, constraint pk primary key(p1, p2, p3)) WITH(COMPRESSION = 'ZSTD', CHS ='86400', CHS_L2 = 'storagetype=COLD', CHS_COLUMN = 'COLUMN=p2');
Parameters
CHS: The hot and cold data boundary for the table. Data that has been stored in the table for a period longer than this period is archived in cold storage. The unit of the value of this parameter is seconds. For example, if the value of this parameter is 86400, data is archived to cold storage after the data has been written to the table for 86,400 seconds, which is equal to one day.
COMPRESSION: The algorithm used to compress the data. This parameter applies to all data in the table. The value of this parameter is not case-sensitive. Default value: NONE.
CHS_L2: The layer-2 attribute. In general, this parameter is configured to specify the storage type. Set this parameter to
storagetype=COLD
.CHS_COLUMN: The custom time column.
Method 2: Use the
ALTER TABLE
syntax to specify the column and boundary for an existing table for which hot and cold data separation is not enabled when the table is created.-- In this example, hot and cold data separation is not enabled for the dt table when the table is created. -- CREATE TABLE dt (p1 integer, p2 bigint, p3 bigint, c1 varchar, constraint pk primary key(p1, p2, p3)); -- Enable hot and cold data separation for the dt table, set the hot and cold data boundary to one day, and specify p2 as the column based on which hot and cold data is separately stored. ALTER TABLE dt SET 'CHS' ='86400', 'CHS_L2' = 'storagetype=COLD', 'CHS_COLUMN' = 'COLUMN=p2';
(Optional) Modify the hot and cold data boundary and custom time column of the table.
You can execute the following statement to modify the hot and cold data boundary of the table:
ALTER TABLE dt SET 'CHS'='1000';
You can execute the following statement to modify the custom time column of the table:
ALTER TABLE dt SET 'CHS_COLUMN'='COLUMN=p3';
(Optional) Disable hot and cold data separation for the table.
ALTER TABLE dt SET 'CHS'='', 'CHS_L2' = '', 'CHS_COLUMN'='';
NoteAfter you change the hot and cold data boundary or disable hot and cold data separation, the data is transferred from cold storage to hot storage after Lindorm performs the
compaction
operation. To immediately transfer the data from cold storage to hot storage, manually run the major_compact command.
Configure a hot and cold data boundary for a secondary index
By default, secondary indexes are stored in a table. Therefore, you can enable hot and cold data separation for a secondary index.
In this example, a secondary index is created for the dt table created in the preceding section.
Create a secondary index and enable hot and cold data separation.
Method 1: Specify a custom time column and configure a hot and cold data boundary for the secondary index when you create the secondary index.
CREATE INDEX idx on dt (c1) WITH(CHS = '86400', CHS_L2 = 'storagetype=COLD', CHS_COLUMN='COLUMN=p2');
NoteThe custom time column of a secondary index must be the same as the custom time column of the base table.
Method 2: Use the
ALTER TABLE
syntax to specify the column and boundary for an existing secondary index for which hot and cold data separation is not enabled when the index is created.-- In this example, hot and cold data separation is not enabled for the secondary index named idx when the index is created. -- CREATE INDEX idx on dt (c1); -- Enable hot and cold data separation for the idx secondary index, set the hot and cold data boundary to one day, and specify p2 as the column based on which hot and cold data is separately stored. ALTER TABLE `dt.idx` SET 'CHS' = '86400', 'CHS_L2' = 'storagetype=COLD', 'CHS_COLUMN'='COLUMN=p2';
(Optional) Modify the hot and cold data boundary and custom time column of the secondary index table.
You can execute the following statement to modify the hot and cold data boundary of the secondary index table:
ALTER TABLE `dt.idx` SET 'CHS'='10000';
You can execute the following statement to modify the custom time column of the secondary index table:
ALTER TABLE `dt.idx` SET 'CHS_COLUMN'='COLUMN=p2';
NoteThe name of a secondary index table is in the following format:
<Table name>.<Secondary index name>
. The name contains a period (.), which is a special character. Therefore, you must enclose the name of the table name with a pair of backticks (`) to escape it when you specify the table name in the statement. For example, if the name of the table istest.idx
, the table name that you specify in the statement must be`test.idx`
.(Optional) Disable hot and cold data separation for the secondary index table.
ALTER TABLE `dt.idx` SET 'CHS'='', 'CHS_L2' = '', 'CHS_COLUMN'='';
Write data
You can write data to a table that separately stores hot data and cold data in the same manner as that you write data to a standard table. Data that is written to a table with hot and cold data separation enabled is stored in hot storage first, such as standard and performance-based storage. When a row of data has been stored in hot storage for a period that is longer than the specified boundary, the data is archived to cold storage when the compaction
operation is performed. The period for which a row of data is stored can be calculated by using the following formula: Current time - Value of the time column in the row
.
Query data
Lindorm uses the same table to store hot data and cold data. This way, you can query all data within one table. We recommend that you specify a time range for the value of the custom time column as a query condition. Lindorm determines whether to query only the hot storage or cold storage, or query both the hot and cold storage based on the time range that you specify. If you do not specify a time range as a condition in a query, cold data may be queried. In this case, the performance of the query is throttled by the throughput of the cold storage. For more information, see Overview.
In the value of the custom time column cannot be specified as a query condition, you can also use the _l_hot_only_ hint in the query to query only data in the hot storage.
Examples
Use the GET method to query data
-- p2 indicates the custom time column. SELECT * FROM dt WHERE p1 = 10 AND p2 = 10;
Use the SCAN method to query data within a specified range
-- p2 indicates the custom time column. SELECT * FROM dt WHERE p2 > 10 AND p2 < 1000;
ImportantIf LindormTable determines that a SELECT query is an inefficient query, LindormTable does not process the query by default and throws an exception. For more information, see SELECT.
Use hints to query only hot data
SELECT /*+ _l_hot_only_(true) */ * FROM dt WHERE p1>1;
Best practices
In Internet of Vehicles (IoV) scenarios, the primary key of a table generally consists of the Vehicle Identification Number (VIN) column and the timestamp column
. The timestamp column is usually specified as the custom time column based on which hot and cold data is determined. When you query the data of a vehicle within a specific time range, Lindorm can determine whether to query cold data or query only hot data based on the specified time range.
USE test;
CREATE TABLE dt (
vin varchar, ts bigint, c1 varchar, c2 varchar, constraint pk primary key(vin, ts)) WITH
(COMPRESSION = 'ZSTD', CHS ='86400', CHS_L2 = 'storagetype=COLD', CHS_COLUMN = 'COLUMN=ts'); // The time column must be included in the primary key.
-- Query the data of a vehicle within a time range.
SELECT * FROM dt WHERE vin='xxxx' AND ts > 1675065122000 AND ts < 1675975222000;
FAQ
Q: Is a row of cold data still stored in the cold storage after it is updated?
A: If the data that you update is not in the custom time column, the row is stilled stored in the cold storage. If the data that you update is in the custom time column, Lindorm determines whether to store this row in the cold storage based on the updated data in the custom time column. For example, a table contains the following columns: p1, p2, c1, and c2. The primary key of the table includes the p1 and p2 columns. In a row of data, the values in the p1, p2, c1, and c2 columns are row1, 2023.1.28, c1, and c2 individually. The hot and cold data boundary for the table is one day, and the current date is January 30, 2023. In this case, this row is determined as cold data and is stored in the cold storage. If you update data in the c1 and c2 columns, this row is still stored in the cold storage. If you update the value in p2 to 2023.1.30, this row is determined as hot data until Feb 2, 2023 based on the hot and cold data boundary.
NoteIf a custom time column is specified as the primary key, the values in the column cannot be updated.
Q: Is a row archived to the cold storage if no value is specified for the custom time column of the row?
A: No, the row is not archived to the cold storage. Hot and cold data separation is implemented based on the value in the custom time column. A row with the custom time column unspecified is stored in the hot storage.