This topic describes how to separately store hot data and cold data in Lindorm based on timestamps.
Background information
Lindorm can archive data to cold storage based on the timestamp and the specified hot and cold data boundary. If you do not specify a timestamp, the time when the data is written to the table is used to determine whether to archive data to cold storage.
Prerequisites
Cold storage is enabled for the Lindorm instance. For more information, see Overview.
The version of LindormTable is 2.1.8 and later. For more information about how to view or upgrade the version of LindormTable, see Release notes of LindormTable and Upgrade the minor engine version of a Lindorm instance.
You are connected to LindormTable.
For more information about how to use HBase Shell to connect to LindormTable, see Use Lindorm Shell to connect to LindormTable.
For more information about how to use the ApsaraDB for HBase API for Java to connect to LindormTable, see Use ApsaraDB for HBase API for Java to develop applications.
For more information about how to use Lindorm-cli to connect to LindormTable, see Use Lindorm-cli to connect to and use LindormTable.
The STORAGE_POLICY attribute of the tables or column families is not
COLD
, which indicates the table for which you want to enable hot and cold data separation is not a table in the cold storage. For more information about how to modify the attribute of column families, see Configure cold storage.
Procedure
Method 1: Use Apache HBase Shell to implement hot and cold data separation
Create a table and specify a hot and cold data boundary for the table.
HBase(main):002:0> create 'chsTable', {NAME=>'f', COLD_BOUNDARY=>'86400'}
Parameters
NAME: The name of the column family for which you want to implement hot and cold data separation.
COLD_BOUNDARY: The boundary based on which hot and cold data separation is performed. The unit of the value of this parameter is seconds. For example, if the value of the COLD_BOUNDARY parameter is set to 86400, data is archived to cold storage after the data has been stored in hot storage for 86,400 seconds, which is equal to one day.
(Optional) Change the hot and cold data boundary.
HBase(main):005:0> alter 'chsTable', {NAME=>'f', COLD_BOUNDARY=>'42300'}
(Optional) Disable hot and cold data separation.
HBase(main):004:0> alter 'chsTable', {NAME=>'f', COLD_BOUNDARY=>""}
NoteAfter you change the hot and cold data boundary or disable hot and cold data separation, the data is transferred from cold storage to hot storage after Lindorm performs the
compaction
operation. To immediately transfer the data from cold storage to hot storage, manually run the major_compact command.
Method 2: Use ApsaraDB for HBase API for Java to implement hot and cold data separation
Create a table and specify a hot and cold data boundary for the table.
Admin admin = connection.getAdmin(); TableName tableName = TableName.valueOf("chsTable"); HTableDescriptor descriptor = new HTableDescriptor(tableName); HColumnDescriptor cf = new HColumnDescriptor("f"); cf.setValue(AliHBaseConstants.COLD_BOUNDARY, "86400"); descriptor.addFamily(cf); admin.createTable(descriptor);
(Optional) Change the hot and cold data boundary.
HTableDescriptor descriptor = admin .getTableDescriptor(tableName); HColumnDescriptor cf = descriptor.getFamily("f".getBytes()); cf.setValue(AliHBaseConstants.COLD_BOUNDARY, "86400"); admin.modifyTable(tableName, descriptor);
(Optional) Disable hot and cold data separation.
HTableDescriptor descriptor = admin .getTableDescriptor(tableName); HColumnDescriptor cf = descriptor.getFamily("f".getBytes()); cf.setValue(AliHBaseConstants.COLD_BOUNDARY, null); admin.modifyTable(tableName, descriptor);
NoteAfter you change the hot and cold data boundary or disable hot and cold data separation, the data is transferred from cold storage to hot storage after Lindorm performs the
compaction
operation.
Method 3: Use lindorm-cli to implement hot and cold data separation
Create a table and specify a hot and cold data boundary for the table.
Enable hot and cold data separation and specify a hot and cold data boundary for a table when you create the table.
CREATE TABLE dt ( p1 integer, p2 integer, c1 varchar, c2 bigint, constraint pk primary key(p1 desc)) WITH (COMPRESSION = 'ZSTD', CHS = '86400', CHS_L2 = 'storagetype=COLD');
Parameters
CHS: The hot and cold data boundary for the table. Data that has been stored in the table for a period longer than this period is archived in cold storage. The unit of the value of this parameter is seconds. For example, if the value of this parameter is 86400, data is archived to cold storage after the data has been written to the table for 86,400 seconds, which is equal to one day.
COMPRESSION: The algorithm used to compress the data. This parameter applies to all data in the table. The value of this parameter is not case-sensitive. Default value: NONE.
CHS_L2: The layer-2 attribute. In general, this parameter is configured to specify the storage type. Set this parameter to storagetype=COLD.
Use the
ALTER TABLE
syntax to specify a hot and cold data boundary for an existing table.-- In this example, hot and cold data separation is not enabled for the sample table dt when the table is created. -- CREATE TABLE dt (p1 integer, p2 integer, c1 varchar, c2 bigint, constraint pk primary key(p1 desc)) WITH (COMPRESSION = 'ZSTD'); -- Enable hot and cold data separation for the dt table. ALTER TABLE dt SET 'CHS' ='86400', 'CHS_L2' = 'storagetype=COLD';
(Optional) Change the hot and cold data boundary.
ALTER TABLE dt SET 'CHS'='1000';
(Optional) Disable hot and cold data separation.
ALTER TABLE dt SET 'CHS'='', 'CHS_L2' = '';
NoteAfter you change the hot and cold data boundary or disable hot and cold data separation, the data is transferred from cold storage to hot storage after Lindorm performs the
compaction
operation. To immediately transfer the data from cold storage to hot storage, manually run themajor_compact
command. For more information about the syntax used to perform this operation, see ALTER TABLE.
Write data
You can write data to a wide table that separately stores hot data and cold data in the same manner as that in which you write data to a standard table. By default, the timestamp of data is the current system time when the data is written. If you use the ApsaraDB for HBase API to write data, you can specify a custom timestamp based on your requirement. When data is written to a table that separately stores hot data and cold data, it is first stored in hot storage that is of the Standard or Performance type. When the data has been stored for a period that is longer than the value specified by COLD_BOUNDARY or CHS, the data is archived to cold storage when Lindorm performs the compaction
operation.
The timestamp of data is used by Lindorm to determine whether the data needs to be archived to cold storage. Lindorm compares the current time and the timestamp of data. If the timestamp of data is a point in time three days after the current system time, the data is archived to cold storage three days later than data whose timestamp is the current system time. If the timestamp of data is a point in time three days before the current system time and the hot and cold data boundary is set to three days, the data is asynchronously archived to cold storage after it is written to the table.
Query data
Lindorm uses the same table to store hot data and cold data. This way, you can query all data within one table. When you query data, you can configure the TimeRange parameter to specify a time range for the query. Lindorm determines whether to query only hot storage or cold storage, or query both hot and cold storage based on the time range that you specify. If you do not specify a time range in a query, cold data may be queried. In this case, the throughput of the query is throttled by the specification of cold storage. For more information, see Overview.
If you want to query data in hot storage, you can use the _l_hot_only_ hint in the query or configure the HOT_ONLY parameter to query only data in hot storage.
Data in cold storage is intended for archiving and should be rarely accessed. If data in cold storage is frequently accessed by a large number of requests, check whether the hot and cold data boundary specified by COLD_BOUNDARY is properly configured. If large amounts of data that needs to be frequently queried is archived to cold storage based on the boundary, the performance of the queries may be throttled.
If a field in a row of data stored in cold storage is updated, the updated field is stored in hot storage. If you specify the HOT_ONLY or TimeRange parameter to query only data in hot storage, only the updated field is returned. If you want LindormTable to return the data of the entire row, you must perform the query without specifying the HOT_ONLY or TimeRange parameter, or make sure that the time range specified by TimeRange covers the time period from the point in time when the row was inserted to the point in time when the row was last updated. Therefore, we recommend that you do not update data stored in cold storage.
Currently, the GET and SCAN methods are supported to query data.
Use the GET method to query data
Method 1: Use Apache HBase Shell to query data
Query data without specifying the HOT_ONLY parameter. In this case, data in cold storage may be queried.
HBase(main):013:0> get 'chsTable', 'row1'
Query data by specifying the HOT_ONLY parameter. In this case, only data in hot storage is queried.
HBase(main):015:0> get 'chsTable', 'row1', {HOT_ONLY=>true}
Query data by specifying the TimeRange parameter. Lindorm compares the values of TimeRange and COLD_BOUNDARY to determine whether to query only hot storage or cold storage, or query both hot and cold storage.
HBase(main):016:0> get 'chsTable', 'row1', {TIMERANGE => [0, 1568203111265]}
Parameters
TimeRange: the time range within which the query is performed. The time in the range is a UNIX timestamp representing the number of milliseconds that have elapsed since the epoch time January 1, 1970, 00:00:00 UTC.
Method 2: Use ApsaraDB for HBase API for Java to query data
Query data without specifying the HOT_ONLY parameter. In this case, data in cold storage may be queried.
Get get = new Get("row1".getBytes()); System.out.println("result: " + table.get(get));
Query data by specifying the HOT_ONLY parameter. In this case, only data in hot storage is queried.
get = new Get("row1".getBytes()); get.setAttribute(AliHBaseConstants.HOT_ONLY, Bytes.toBytes(true));
Query data by specifying the TimeRange parameter. Lindorm compares the values of TimeRange and COLD_BOUNDARY to determine whether to query only hot storage or cold storage, or query both hot and cold storage.
get = new Get("row1".getBytes()); get.setTimeRange(0, 1568203111265)
Method 3: Use Lindorm SQL to query data
SELECT /*+ _l_hot_only_ */ * FROM dt WHERE pk IN (1, 2, 3);
Use the SCAN method to query data within a specified range
If you do not configure the HOT_ONLY and TimeRange parameters, or the value of TimeRange covers the timestamps of data in cold storage, Lindorm queries data in both cold storage and hot storage and merges the query results.
Only Apache HBase Shell and ApsaraDB for HBase API for Java support queries within specified ranges.
Method 1: Use Apache HBase Shell to query data
Query data without specifying the HOT_ONLY parameter. In this case, data in cold storage is queried.
Lindorm(main):017:0> scan 'chsTable', {STARTROW =>'row1', STOPROW=>'row9'}
Query data by specifying the HOT_ONLY parameter. In this case, only data in hot storage is queried.
Lindorm(main):018:0> scan 'chsTable', {STARTROW =>'row1', STOPROW=>'row9', HOT_ONLY=>true}
Query data by specifying the TimeRange parameter. Lindorm compares the values of TimeRange and COLD_BOUNDARY to determine whether to query only hot storage or cold storage, or query both hot and cold storage.
Lindorm(main):019:0> scan 'chsTable', {STARTROW =>'row1', STOPROW=>'row9', TIMERANGE => [0, 1568203111265]}
Method 2: Use ApsaraDB for HBase API for Java to query data
TableName tableName = TableName.valueOf("chsTable"); Table table = connection.getTable(tableName); // In this example, the HOT_ONLY parameter is not configured. Data in cold storage is queried. Scan scan = new Scan(); ResultScanner scanner = table.getScanner(scan); for (Result result : scanner) { System.out.println("scan result:" + result); } // In this example, the HOT_ONLY parameter is configured. Only data in hot storage is queried. scan = new Scan(); scan.setAttribute(AliLindormConstants.HOT_ONLY, Bytes.toBytes(true)); // In this example, the TimeRange parameter is configured. Lindorm compares the values of TimeRange and COLD_BOUNDARY to determine whether to query only hot storage or cold storage, or query both hot and cold storage. scan = new Scan(); scan.setTimeRange(0, 1568203111265);
Preferentially query hot data
In a SCAN query that is performed to query information such as all orders or chat records of a customer, LindormTable may scan data in both hot storage and cold storage. The query results are paginated based on the timestamps when the data rows were written to the table in descending order. In most cases, hot data appears before cold data. If you do not configure the HOT_ONLY parameter in a SCAN query, LindormTable scans both hot data and cold data. As a result, the query response time increases. If you enable the hot data prioritization feature, LindormTable preferentially queries data in hot storage. Data in cold storage is queried only when the number of rows in hot storage is less than the minimum number of data rows to query. This way, access to data in cold storage is reduced and the response speed is improved.
You can use only Apache HBase Shell or ApsaraDB for HBase API for Java to preferentially query hot data.
Method 1: Use Apache HBase Shell to query data
Lindorm(main):002:0> scan 'chsTable', {COLD_HOT_MERGE=>true}
Parameters
COLD_HOT_MERGE: Specifies whether to enable the hot data prioritization feature.
true: enable the hot data prioritization feature. When the COLD_HOT_MERGE parameter is set to true, LindormTable preferentially scans hot data. LindormTable scans cold data only when the number of data rows in hot storage is less than the number of rows that you want to query.
false: disable the hot data prioritization feature.
Method 2: Use ApsaraDB for HBase API for Java to query data
scan = new Scan(); scan.setAttribute(AliHBaseConstants.COLD_HOT_MERGE, Bytes.toBytes(true)); scanner = table.getScanner(scan);
Usage notes
In scenarios where data in specific fields of a row is updated, the row stores hot data and cold data at the same time. If you enable the hot data prioritization feature, the query results are returned in two batches. You can find two results of the same row key in the result set.
After you enable the hot data prioritization feature, hot data is returned before cold data. Therefore, the row key values of specific returned rows of cold data may be smaller than the row key values of specific returned rows of hot data because LindormTable returns hot data before cold data. The results that are returned for a SCAN query are not sequentially sorted. The rows of hot data and the rows of cold data are separately sorted based on row key values. For more information about how returned rows are sorted, see the following sample results. In some scenarios, you can specify a row key to ensure the order of the results of a SCAN query. For example, if you use a table to store the information about orders that your customers make, you can specify a row key that consists of the column that store customer IDs and the column that stores the order creation time. This way, when you query the orders that were made by a customer, the returned orders are sorted based on the points in time when the orders were created.
// In the following example, the row whose row key value is coldRow stores cold data and the row whose row key value is hotRow stores hot data. // In most cases, the coldRow row is returned before the hotRow row because rows in Lindorm are sorted in lexicographic order. HBase(main):001:0> scan 'chsTable' ROW COLUMN+CELL coldRow column=f:value, timestamp=1560578400000, value=cold_value hotRow column=f:value, timestamp=1565848800000, value=hot_value 2 row(s) // If you set the value of the COLD_HOT_MERGE parameter to true, LindormTable scans the row whose row key value is hotRow first. As a result, the hotRow row is returned before the coldRow row. HBase(main):002:0> scan 'chsTable', {COLD_HOT_MERGE=>true} ROW COLUMN+CELL hotRow column=f:value, timestamp=1565848800000, value=hot_value coldRow column=f:value, timestamp=1560578400000, value=cold_value 2 row(s)
FAQ
Q: Is a row of cold data still cold data after it is updated?
A: No. After a row of cold data is updated, the timestamp of this row is updated. Therefore, the cold data becomes hot data.
Q: Why cold data is returned for my query even if I want to query only hot data?
A: You can configure the HOT_ONLY parameter or the _l_hot_only_ hint to query only hot data. Data is periodically archived to cold storage based on its timestamps. Therefore, some cold data may have not been archived to cold storage yet when you query the data. In this case, cold data is returned. To resolve this issue, you can specify a time range for the hot data that you want to query. The following example shows how to use this function:
// You must use the _l_ts_min_ and _l_ts_max_ hints to specify a time range. The _l_ts_min_ hint indicates the difference between the current system time and the hot and cold data boundary. The _l_ts_max_ hint indicates the current system time. The unit of the hints must be the same. SELECT /*+ _l_hot_only_(true), _l_ts_min_(1000), _l_ts_max_(2001) */ * FROM test WHERE p1>1;
Q: Why does my query time out even if I specify a time range and use the HOT_ONLY hint in my query?
A: This issue generally occurs after you migrate data to a table or enable hot and cold separation for a table. In this case, cold data is not completely archived to cold storage and large mounts of cold data is still stored in hot storage. Therefore, the query may time out. To solve this issue, you must perform the
major compaction
operation on the table. For more information about the syntax used to perform this operation, see ALTER TABLE.