ApsaraDB for HBase provides a cold storage medium for storing cold data. Cold storage delivers write performance equivalent to that of ultra disks at one third of the storage cost. You can query the data in cold storage at any time.
Background information
When you purchase an ApsaraDB for HBase cluster, you can select the cold storage medium as an additional storage space, and execute table creation statements to store cold data on the medium. In addition, ApsaraDB for HBase Performance-enhanced Edition allows you to separate cold data from hot data in the same table. The system can automatically store hot data in hot storage with fast read/write speed and store infrequently accessed data in cold storage to reduce costs.
Usage notes
The read Input/Output Operations Per Second (IOPS) of cold storage is low (up to 25 operations per second per node), so cold storage suits only data that is queried infrequently.
The write throughput of cold storage equals the throughput of the ultra disks that are used for hot storage.
Cold storage is not suitable for workloads that issue a large number of concurrent read requests. Such requests may fail with errors.
If you purchase a very large cold storage capacity, the read IOPS limit can be adjusted based on your business requirements. To request an adjustment, submit a ticket to technical support.
We recommend that you store no more than 30 TB of cold data on each core node. To increase the cold storage capacity of each core node, submit a ticket for optimization suggestions.
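The two numeric limits in the usage notes can be turned into a rough sizing check. The following is a minimal sketch, not an official sizing tool: the class and method names are hypothetical, and it assumes that the 25 IOPS limit applies to each core node independently and that reads are spread evenly across nodes.

```java
/** Hypothetical helper: back-of-the-envelope sizing based on the usage notes. */
final class ColdStorageSizing {
    // Figures taken from the usage notes: read IOPS limit and recommended capacity per core node.
    static final int READ_IOPS_PER_NODE = 25;
    static final double RECOMMENDED_MAX_TB_PER_NODE = 30.0;

    /** Smallest number of core nodes that keeps each node at or below 30 TB of cold data. */
    static int minCoreNodes(double coldDataTb) {
        if (coldDataTb < 0) {
            throw new IllegalArgumentException("coldDataTb must not be negative");
        }
        return Math.max(1, (int) Math.ceil(coldDataTb / RECOMMENDED_MAX_TB_PER_NODE));
    }

    /** Upper bound on cluster-wide cold read IOPS (assumption: evenly distributed reads). */
    static int aggregateReadIops(int coreNodes) {
        if (coreNodes < 1) {
            throw new IllegalArgumentException("coreNodes must be at least 1");
        }
        return coreNodes * READ_IOPS_PER_NODE;
    }

    public static void main(String[] args) {
        double coldDataTb = 100.0; // example value, not from the documentation
        int nodes = minCoreNodes(coldDataTb);
        System.out.printf("%.0f TB of cold data: at least %d core nodes, at most about %d cold reads per second%n",
                coldDataTb, nodes, aggregateReadIops(nodes));
    }
}
```

If the expected cold read rate exceeds the estimated upper bound, the data is probably not a good fit for cold storage without an IOPS adjustment.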
Prerequisites
Cold storage is supported only on ApsaraDB for HBase Performance-enhanced Edition V2.1.8 and later. If your ApsaraDB for HBase Performance-enhanced Edition cluster is of a version earlier than V2.1.8, the cluster is automatically upgraded to the latest version when you activate cold storage for your cluster. The version of the client dependency AliHBase-Connector must be later than V1.0.7 or V2.0.7. The version of HBase Shell must be later than alihbase-2.0.7-bin.tar.gz.
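Version prerequisites like these are easy to get wrong when versions are compared as plain strings, because "2.1.10" sorts before "2.1.8" alphabetically. The following is a minimal sketch of a numeric comparison. The class and method names are hypothetical, and it assumes versions are purely numeric dotted strings with an optional leading "V".

```java
/** Hypothetical helper: numeric comparison of dotted version strings. */
final class ColdStorageVersionCheck {
    /** Returns a negative value, zero, or a positive value if a is older than, equal to, or newer than b. */
    static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        int n = Math.max(x.length, y.length);
        for (int i = 0; i < n; i++) {
            // Missing components count as 0, so "2.2" is treated as "2.2.0".
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    private static int[] parse(String version) {
        String v = version.trim();
        if (v.startsWith("V") || v.startsWith("v")) {
            v = v.substring(1);
        }
        String[] parts = v.split("\\.");
        int[] out = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    /** True if the cluster version is V2.1.8 or later, the minimum stated in the prerequisites. */
    static boolean supportsColdStorage(String clusterVersion) {
        return compare(clusterVersion, "2.1.8") >= 0;
    }

    public static void main(String[] args) {
        System.out.println(supportsColdStorage("V2.1.10"));
        System.out.println(supportsColdStorage("V2.1.7"));
    }
}
```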
Scenarios
Cold storage is applicable to cold data scenarios such as data archiving and storing infrequently accessed data.
Activate cold storage
Method 1: When you create an ApsaraDB for HBase Performance-enhanced Edition cluster, you can choose whether to purchase cold storage and the capacity of cold storage on the buy page. For more information, see Purchase a cluster.
Method 2:
Log on to the ApsaraDB for HBase console.
On the Clusters page, find the cluster that you want to manage and click the cluster ID.
In the left-side navigation pane, click Cold Storage.
Click Activate Now.
While cold storage is being activated, service access may experience latency jitter. We recommend that you activate cold storage during off-peak hours.
Cold storage is supported only on ApsaraDB for HBase Performance-enhanced Edition V2.1.8 and later. If your ApsaraDB for HBase Performance-enhanced Edition cluster is of a version earlier than V2.1.8, the cluster is automatically upgraded to the latest version when you activate cold storage for your cluster.
Use cold storage
ApsaraDB for HBase Performance-enhanced Edition allows you to set storage properties at the column family level. You can set the storage policy of one column family, or of all column families of a table, to COLD. All data of those column families is then stored in cold storage and does not occupy the Hadoop Distributed File System (HDFS) space of the cluster. You can specify the property when you create a table, or modify the property of a column family after the table is created.
You can use the Java API or HBase Shell to create a table and modify the table properties. If you use the Java API, you must first install the SDK for Java and configure the parameters. For more information, see Use the HBase Java API to access ApsaraDB for HBase Performance-enhanced Edition clusters. If you use HBase Shell, follow the steps in Use HBase Shell to access an ApsaraDB for HBase Performance-enhanced Edition instance to download and configure HBase Shell.
Create a table that uses cold storage
HBase Shell
hbase(main):001:0> create 'coldTable', {NAME => 'f', STORAGE_POLICY => 'COLD'}
Java API
Admin admin = connection.getAdmin();
HTableDescriptor descriptor = new HTableDescriptor(TableName.valueOf("coldTable"));
HColumnDescriptor cf = new HColumnDescriptor("f");
cf.setValue("STORAGE_POLICY", AliHBaseConstants.STORAGETYPE_COLD);
descriptor.addFamily(cf);
admin.createTable(descriptor);
Modify the table property to use cold storage
If you have created a table, you can modify the property of a column family in the table to use cold storage. If the column family contains data, the data is archived to cold storage only after a major compaction.
HBase Shell
hbase(main):011:0> alter 'coldTable', {NAME=>'f', STORAGE_POLICY => 'COLD'}
Java API
Admin admin = connection.getAdmin();
TableName tableName = TableName.valueOf("coldTable");
HTableDescriptor descriptor = admin.getTableDescriptor(tableName);
HColumnDescriptor cf = descriptor.getFamily("f".getBytes());
// Set the storage type of column family f to cold storage.
cf.setValue("STORAGE_POLICY", AliHBaseConstants.STORAGETYPE_COLD);
admin.modifyTable(tableName, descriptor);
Modify the table property to use hot storage
If a column family of the table uses cold storage, you can switch it back to hot storage by modifying the column family property. If the column family contains data, the data is moved back to hot storage only after a major compaction.
HBase Shell
hbase(main):014:0> alter 'coldTable', {NAME=>'f', STORAGE_POLICY => 'DEFAULT'}
Java API
// Obtain an Admin instance from an existing connection.
Admin admin = connection.getAdmin();
TableName tableName = TableName.valueOf("coldTable");
HTableDescriptor descriptor = admin.getTableDescriptor(tableName);
HColumnDescriptor cf = descriptor.getFamily("f".getBytes());
// Set the storage type of column family f to the default storage. By default, hot storage is used.
cf.setValue("STORAGE_POLICY", AliHBaseConstants.STORAGETYPE_DEFAULT);
admin.modifyTable(tableName, descriptor);
View the cold storage status
You can view the cold storage status on the Cold Storage page in the console and expand the capacity of the cold storage by clicking Cold Storage Scaling on the same page. You can check the sizes of cold and hot data in a table on the User tables tab of the cluster management system.
Performance testing
Runtime environment overview
Master: ecs.c5.xlarge, 4 cores, 8 GB of memory, and a 20 GB ultra disk.
RegionServer (4 nodes): ecs.c5.xlarge, 4 cores, 8 GB of memory, and a 20 GB ultra disk.
Test machine: ecs.c5.xlarge, 4 cores, and 8 GB of memory.
Write performance
Storage type | Average response time | P99 response time
Hot storage | 1736 μs | 4811 μs
Cold storage | 1748 μs | 5243 μs
Each data record includes 10 columns with 100 bytes of data in each column, so each row stores about 1 KB of data. The test writes data in 16 parallel threads.
Random GET performance
Storage type | Average response time | P99 response time
Hot storage | 1704 μs | 5923 μs
Cold storage | 14738 μs | 31519 μs
BlockCache is disabled in this test, so every request reads data from disk. Each data record includes 10 columns with 100 bytes of data in each column, so each row stores about 1 KB of data. The test runs 8 parallel threads, and each request reads 1 KB of data.
Scan performance within a specified range
Storage type | Average response time | P99 response time
Hot storage | 6222 μs | 20975 μs
Cold storage | 51134 μs | 115967 μs
BlockCache is disabled in this test. Each data record includes 10 columns with 100 bytes of data in each column, so each row stores about 1 KB of data. The test scans 1 KB rows in 8 parallel threads, with the scan Caching parameter set to 30.
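To make the three result tables easier to compare, the cold-to-hot latency ratio can be computed for each test. The following is a minimal sketch: the class name and layout are illustrative, and the only inputs are the response times reported in the tables above.

```java
/** Hypothetical helper: cold-to-hot latency ratios from the published test results. */
final class ColdVsHotLatency {
    /** Ratio of cold storage latency to hot storage latency; both values use the same unit. */
    static double ratio(double coldMicros, double hotMicros) {
        if (hotMicros <= 0) {
            throw new IllegalArgumentException("hot latency must be positive");
        }
        return coldMicros / hotMicros;
    }

    public static void main(String[] args) {
        String[] tests = {"Write", "Random GET", "Range scan"};
        // Each row: hot average, cold average, hot P99, cold P99, in microseconds.
        double[][] micros = {
            {1736, 1748, 4811, 5243},
            {1704, 14738, 5923, 31519},
            {6222, 51134, 20975, 115967},
        };
        for (int i = 0; i < tests.length; i++) {
            double avgRatio = ratio(micros[i][1], micros[i][0]);
            double p99Ratio = ratio(micros[i][3], micros[i][2]);
            System.out.printf("%-10s average: %.2fx, P99: %.2fx%n", tests[i], avgRatio, p99Ratio);
        }
    }
}
```

On these figures, write latency is nearly identical for both storage types, while average read latency on cold storage is roughly 8 to 9 times that of hot storage for both random GET and range scan.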