This topic describes how to use data compression and data block encoding in ApsaraDB for HBase in practice.
Compression Algorithm
ApsaraDB for HBase currently supports the following compression algorithms: LZO, ZSTD, GZ, LZ4, SNAPPY, and NONE. NONE means that compression is disabled. The following table compares the compression ratios and decompression speeds of LZO, ZSTD, and LZ4 in different business scenarios.
Business type | Uncompressed table size | LZO (compression ratio / decompression speed in MB/s) | ZSTD (compression ratio / decompression speed in MB/s) | LZ4 (compression ratio / decompression speed in MB/s) |
---|---|---|---|---|
Monitoring | 419.75 TB | 5.82/372 | 13.09/256 | 5.19/463.8 |
Logs | 77.26 TB | 4.11/333 | 6.0/287 | 4.16/496.1 |
Risk control | 147.83 TB | 4.29/297.7 | 5.93/270 | 4.19/441.38 |
Transaction records | 108.04 TB | 5.93/316.8 | 10.51/288.3 | 5.55/520.3 |
Note
- We recommend that you use the LZ4 compression algorithm for scenarios that require low response time (RT), because it has the highest decompression speed in the preceding table.
- We recommend that you use the ZSTD compression algorithm for scenarios that can tolerate higher RT, such as monitoring and Internet of Things (IoT) scenarios, because it has the highest compression ratio in the preceding table.
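To make the trade-off in the table concrete, the following sketch estimates the on-disk size of each table after compression. It assumes that the first figure in each cell is the ratio of uncompressed size to compressed size; the names `TABLES` and `compressed_tb` are illustrative only and are not part of ApsaraDB for HBase.

```ruby
# Figures copied from the preceding table. Assumption: "ratio" means
# uncompressed size divided by compressed size.
TABLES = {
  "Monitoring"          => { size_tb: 419.75, lzo: 5.82, zstd: 13.09, lz4: 5.19 },
  "Logs"                => { size_tb: 77.26,  lzo: 4.11, zstd: 6.0,   lz4: 4.16 },
  "Risk control"        => { size_tb: 147.83, lzo: 4.29, zstd: 5.93,  lz4: 4.19 },
  "Transaction records" => { size_tb: 108.04, lzo: 5.93, zstd: 10.51, lz4: 5.55 },
}

# Estimated compressed size in TB for one row of the table and one algorithm.
def compressed_tb(row, algo)
  (row[:size_tb] / row[algo]).round(2)
end

# Print the estimated compressed size per business type and algorithm.
TABLES.each do |name, row|
  sizes = [:lzo, :zstd, :lz4].map { |a| "#{a.to_s.upcase}=#{compressed_tb(row, a)} TB" }
  puts "#{name}: #{sizes.join(', ')}"
end
```

Under this assumption, the monitoring table shrinks to roughly 32 TB with ZSTD and roughly 81 TB with LZ4, which is the storage cost of choosing the faster algorithm.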
Encoding
ApsaraDB for HBase supports DataBlockEncoding, which reduces data size by eliminating the parts that are duplicated across adjacent HBase KeyValue entries. We recommend that you set DATA_BLOCK_ENCODING to DIFF.
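The idea of removing duplicated parts can be illustrated with a simplified shared-prefix encoder. This is only a sketch of the general principle, not the actual DIFF on-disk format; the key strings and the function names `prefix_encode` and `prefix_decode` are hypothetical.

```ruby
# Simplified illustration of difference-based key encoding.
# This is NOT the HBase DIFF format; it only shows the shared-prefix idea.

# Encode each key as [shared_prefix_length, remaining_suffix]
# relative to the previous key in sorted order.
def prefix_encode(sorted_keys)
  prev = ""
  sorted_keys.map do |key|
    shared = 0
    shared += 1 while shared < prev.length && shared < key.length && prev[shared] == key[shared]
    prev = key
    [shared, key[shared..-1]]
  end
end

# Rebuild the original keys from the [shared, suffix] pairs.
def prefix_decode(encoded)
  prev = ""
  encoded.map do |shared, suffix|
    prev = prev[0, shared] + suffix
  end
end

# Hypothetical keys flattened to "row/family:qualifier" strings.
keys = ["user0001/f:age", "user0001/f:name", "user0002/f:age"]
encoded = prefix_encode(keys)
p encoded
p prefix_decode(encoded) == keys
```

Because HBase stores KeyValue entries in sorted order, adjacent keys tend to share long prefixes, so storing only the differing suffix saves space before the compression algorithm is applied.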
Procedure
- Modify the COMPRESSION and DATA_BLOCK_ENCODING properties of the column family. The following example applies to column family f of table test.
alter 'test', {NAME => 'f', COMPRESSION => 'lz4', DATA_BLOCK_ENCODING => 'DIFF'}
- The modifications do not take effect immediately. You must perform a major compaction for them to take effect. Major compactions are time-consuming, so we recommend that you perform them during off-peak hours.
major_compact 'test'
Note For more information, see Exploration of ApsaraDB for HBase compression encoding.