The benchmark tests in this topic compare the throughput, response latency, and compression ratio of an Apache HBase cluster and a Lindorm cluster.
The throughput test runs the same number of client threads against each cluster. The response latency test applies the same workload to each cluster. The compression ratio test writes the same amount of data to each cluster and compares the resulting table sizes.
Create tables
Create a table in the Apache HBase cluster and a table in the Lindorm cluster. All tests use the same table schema. When you create a table, pre-split it into 200 partitions based on the Yahoo Cloud Serving Benchmark (YCSB) data.
For more information about how to use HBase Shell to create tables, see Use Lindorm Shell to connect to LindormTable.
The Lindorm cluster uses the INDEX encoding and Zstandard (ZSTD) compression algorithms, both of which Lindorm supports. INDEX encoding is an encoding algorithm exclusive to Lindorm. If you set DATA_BLOCK_ENCODING to DIFF for a Lindorm table, Lindorm uses the INDEX encoding algorithm. Execute the following statement to create the table in the Lindorm cluster:
create 'test', {NAME => 'f', DATA_BLOCK_ENCODING => 'DIFF', COMPRESSION => 'ZSTD'}, {SPLITS => (1..199).map{|i| "user#{(i * ((2**63-1)/199)).to_s.rjust(19, "0")}"} }
The Apache HBase cluster uses the DIFF encoding and SNAPPY compression algorithms that are recommended by Apache HBase. Execute the following statement to create the table:
create 'test', {NAME => 'f', DATA_BLOCK_ENCODING => 'DIFF', COMPRESSION => 'SNAPPY'}, {SPLITS => (1..199).map{|i| "user#{(i * ((2**63-1)/199)).to_s.rjust(19, "0")}"} }
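The SPLITS expression in both statements is ordinary Ruby that HBase Shell evaluates. The following standalone sketch evaluates the same expression outside the shell to show what it produces. The variable names max_long, step, and split_keys are illustrative and do not appear in the original statements.

```ruby
# Same arithmetic as the SPLITS clause, evaluated in plain Ruby.
max_long = 2**63 - 1     # largest signed 64-bit integer
step = max_long / 199    # integer division: distance between split points

split_keys = (1..199).map do |i|
  # Zero-pad to 19 digits so that the keys sort lexicographically.
  "user#{(i * step).to_s.rjust(19, '0')}"
end

puts split_keys.length   # number of split points
puts split_keys.first    # lowest split key
puts split_keys.last     # highest split key
```

The expression yields 199 evenly spaced split keys, from user0046348603200275255 to user9223372036854775745, which divide the key space into the 200 partitions described above.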
Prepare data
Prepare the data to be read from individual rows and from multiple rows.
Each table contains 2 billion rows. Each row contains 20 columns. The size of each column is 20 bytes.
The following code block shows the YCSB profile:
recordcount=2000000000
operationcount=150000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=false
fieldcount=20
fieldlength=20
readproportion=1.0
updateproportion=0.0
scanproportion=0
insertproportion=0
requestdistribution=uniform
Run the following command to load the data:
bin/ycsb load hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -s
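As a rough sanity check on the data volume, the following sketch multiplies out the recordcount, fieldcount, and fieldlength values from the profile above. It counts only the field values. Row keys, column qualifiers, timestamps, and storage overhead are not included, and the result is the size before encoding and compression, so the on-disk size of the table will differ.

```ruby
# Back-of-the-envelope size of the raw field values in one table.
rows = 2_000_000_000   # recordcount
fieldcount = 20        # columns per row
fieldlength = 20       # bytes per column

bytes_per_row = fieldcount * fieldlength
total_bytes = rows * bytes_per_row

puts "#{bytes_per_row} bytes of field values per row"
puts "#{total_bytes / 10**9} GB of field values in total"
```

Under these assumptions, each row carries 400 bytes of field values, and each table holds about 800 GB of field values.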
Throughput test
The throughput test compares the throughput of the Apache HBase cluster with that of the Lindorm cluster based on the same number of threads. The test includes four scenarios. The scenarios are independent of each other.
Read data from an individual row
The table contains 2 billion rows. Each row contains 20 columns. The size of each column is 20 bytes. The query range is 10 million rows. After the data is prepared, perform a major compaction and wait for it to complete. Run a warm-up test for 20 minutes, and then run a formal test for 20 minutes.
The following code block shows the workload configuration in the YCSB profile:
recordcount=10000000
operationcount=2000000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=false
fieldcount=20
fieldlength=20
readproportion=1.0
updateproportion=0.0
scanproportion=0
insertproportion=0
requestdistribution=uniform
Run the following command to perform a stress test:
bin/ycsb run hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -p maxexecutiontime=1200
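The warm-up run and the formal run use the same command, and the 20-minute duration corresponds to maxexecutiontime=1200 because YCSB expects the value in seconds. The following sketch shows one way to script the two runs. The helper ycsb_run_args and the profile file name workload_read are hypothetical and are used only for illustration. The sketch prints the commands instead of executing them.

```ruby
# Hypothetical helper: builds the argument list for one timed YCSB run.
def ycsb_run_args(workload_file, threads:, minutes: 20)
  [
    'bin/ycsb', 'run', 'hbase10',
    '-P', workload_file,
    '-p', 'table=test',
    '-threads', threads.to_s,
    '-p', 'columnfamily=f',
    '-p', "maxexecutiontime=#{minutes * 60}" # minutes converted to seconds
  ]
end

# 'workload_read' is an assumed name for the profile file shown above.
args = ycsb_run_args('workload_read', threads: 200)

%w[warm-up formal].each do |phase|
  puts "#{phase}: #{args.join(' ')}"
  # To actually start the run, pass the list to Kernel#system:
  # system(*args)
end
```

Only the results of the second run are meant to be compared. The first run serves to warm up the caches of the cluster.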
Read data from a specified range
The table contains 2 billion rows. Each row contains 20 columns. The size of each column is 20 bytes. The query range is 10 million rows, and each scan reads 50 rows. After the data is prepared, perform a major compaction and wait for it to complete. Run a warm-up test for 20 minutes, and then run a formal test for 20 minutes.
The following code block shows the workload configuration in the YCSB profile:
recordcount=10000000
operationcount=2000000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=false
fieldcount=20
fieldlength=20
readproportion=0.0
updateproportion=0.0
scanproportion=1.0
insertproportion=0
requestdistribution=uniform
maxscanlength=50
Lindorm.usepagefilter=false
Run the following command to perform a stress test:
bin/ycsb run hbase10 -P <workload> -p table=test -threads 100 -p columnfamily=f -p maxexecutiontime=1200
Insert data into an individual row
Insert one column into the table in each operation. The size of the inserted column is 20 bytes. Run the test for 20 minutes.
The following code block shows the workload configuration in the YCSB profile:
recordcount=2000000000
operationcount=100000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=false
fieldcount=1
fieldlength=20
readproportion=0.0
updateproportion=0.0
scanproportion=0
insertproportion=1.0
requestdistribution=uniform
Run the following command to perform a stress test:
bin/ycsb run hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -p maxexecutiontime=1200
Insert data into multiple rows
Insert one column into each row. The size of the inserted column is 20 bytes. Each batch inserts data into 100 rows. Run the test for 20 minutes.
The following code block shows the workload configuration in the YCSB profile:
recordcount=2000000000
operationcount=10000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
fieldcount=1
fieldlength=20
cyclickey=true
readallfields=false
readproportion=0
updateproportion=0
scanproportion=0
insertproportion=0.0
batchproportion=1.0
batchsize=100
requestdistribution=uniform
Run the following command to perform a stress test:
bin/ycsb run hbase10 -P <workload> -p table=test -threads 100 -p columnfamily=f -p maxexecutiontime=1200
Response latency test
The response latency test compares the response latency of the Apache HBase cluster with that of the Lindorm cluster at the same number of operations per second (OPS).
Read data from an individual row
The table contains 2 billion rows. Each row contains 20 columns. The size of each column is 20 bytes. The query range is 10 million rows. The maximum OPS is 5000. After the data is prepared, perform a major compaction and wait for it to complete. Run a warm-up test for 20 minutes, and then run a formal test for 20 minutes.
The following code block shows the workload configuration in the YCSB profile:
recordcount=10000000
operationcount=2000000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=false
fieldcount=20
fieldlength=20
readproportion=1.0
updateproportion=0.0
scanproportion=0
insertproportion=0
requestdistribution=uniform
Run the following command to perform a stress test:
bin/ycsb run hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -p maxexecutiontime=1200 -p target=5000
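The target property caps the overall request rate of the YCSB client. The following sketch works out what that cap means for a single client thread, assuming that the target rate is shared evenly by all threads. The variable names are illustrative.

```ruby
# Assumption: the target rate is shared evenly by all client threads.
target_ops = 5000   # -p target=5000
threads = 200       # -threads 200

ops_per_thread = target_ops.to_f / threads   # operations per second per thread
interval_ms = 1000.0 / ops_per_thread        # pacing interval per thread

puts "each thread issues at most #{ops_per_thread} ops/s"
puts "one operation every #{interval_ms} ms per thread"
```

Under this assumption, each of the 200 threads issues at most 25 operations per second, or one operation every 40 ms. A cluster whose response latency stays well below that interval can sustain the target rate, which keeps the offered load identical for both clusters.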
Read data from a specified range
The table contains 2 billion rows. Each row contains 20 columns. The size of each column is 20 bytes. The query range is 10 million rows, and each scan reads 50 rows. The maximum OPS is 5000. After the data is prepared, perform a major compaction and wait for it to complete. Run a warm-up test for 20 minutes, and then run a formal test for 20 minutes.
The following code block shows the workload configuration in the YCSB profile:
recordcount=10000000
operationcount=2000000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=false
fieldcount=20
fieldlength=20
readproportion=0.0
updateproportion=0.0
scanproportion=1.0
insertproportion=0
requestdistribution=uniform
maxscanlength=50
Lindorm.usepagefilter=false
Run the following command to perform a stress test:
bin/ycsb run hbase10 -P <workload> -p table=test -threads 100 -p columnfamily=f -p maxexecutiontime=1200 -p target=5000
Insert data into an individual row
Insert one column into the table in each operation. The size of the inserted column is 20 bytes. The maximum OPS is 50000. Run the test for 20 minutes.
The following code block shows the workload configuration in the YCSB profile:
recordcount=2000000000
operationcount=100000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=false
fieldcount=1
fieldlength=20
readproportion=0.0
updateproportion=0.0
scanproportion=0
insertproportion=1.0
requestdistribution=uniform
Run the following command to perform a stress test:
bin/ycsb run hbase10 -P <workload> -p table=testwrite -threads 200 -p columnfamily=f -p maxexecutiontime=1200 -p target=50000
Insert data into multiple rows
Insert one column into each row. The size of the inserted column is 20 bytes. Each batch inserts data into 100 rows. The maximum OPS is 2000. Run the test for 20 minutes.
The following code block shows the workload configuration in the YCSB profile:
recordcount=2000000000
operationcount=10000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
fieldcount=1
fieldlength=20
cyclickey=true
readallfields=false
readproportion=0
updateproportion=0
scanproportion=0
insertproportion=0.0
batchproportion=1.0
batchsize=100
requestdistribution=uniform
Run the following command to perform a stress test:
bin/ycsb run hbase10 -P <workload> -p table=testwrite -threads 100 -p columnfamily=f -p maxexecutiontime=1200 -p target=2000
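Because one operation touches a different number of rows in each scenario, the same OPS cap translates into different row rates. The following sketch multiplies the maximum OPS of each response latency scenario by the number of rows per operation stated above. The scenario labels and variable names are illustrative.

```ruby
# Maximum OPS and rows per operation, as stated for each scenario.
scenarios = {
  'single-row read'   => { target_ops: 5_000,  rows_per_op: 1 },
  'range read'        => { target_ops: 5_000,  rows_per_op: 50 },
  'single-row insert' => { target_ops: 50_000, rows_per_op: 1 },
  'batch insert'      => { target_ops: 2_000,  rows_per_op: 100 }
}

# Upper bound on rows read or written per second in each scenario.
rows_per_second = scenarios.transform_values do |s|
  s[:target_ops] * s[:rows_per_op]
end

rows_per_second.each do |name, rate|
  puts "#{name}: at most #{rate} rows/s"
end
```

The range read and batch insert scenarios have the lowest operation rates but the highest row rates, at up to 250,000 and 200,000 rows per second.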
Compression ratio test
All of the following compression ratio tests follow the same procedure. Use YCSB to insert 5 million rows into the table, and then manually trigger a flush and a major compaction. After the major compaction is complete, check the size of the table. The tests cover the following row layouts:
Number of columns in each row | Size of each column (bytes)
1 | 10
20 | 10
20 | 20
100 | 10
The following code block shows the workload configuration in the YCSB profile:
recordcount=5000000
operationcount=150000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=false
fieldcount=<Number of columns in each row>
fieldlength=<Size of each column>
readproportion=1.0
requestdistribution=uniform
Run the following command to insert data:
bin/ycsb load hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -s
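The profile above is a template with two placeholders, one pair of values per row of the table. The following sketch fills in the placeholders for the four layouts and estimates the raw size of the field values in each case. The names cases, rows, and profile_for are illustrative, and the size estimate ignores row keys, column qualifiers, timestamps, and storage overhead.

```ruby
# [fieldcount, fieldlength in bytes] for each tested row layout.
cases = [[1, 10], [20, 10], [20, 20], [100, 10]]
rows = 5_000_000   # recordcount

# Fills the two placeholders of the profile template.
profile_for = lambda do |fieldcount, fieldlength|
  <<~PROFILE
    recordcount=#{rows}
    operationcount=150000000
    workload=com.yahoo.ycsb.workloads.CoreWorkload
    readallfields=false
    fieldcount=#{fieldcount}
    fieldlength=#{fieldlength}
    readproportion=1.0
    requestdistribution=uniform
  PROFILE
end

cases.each do |fieldcount, fieldlength|
  raw_mb = rows * fieldcount * fieldlength / 10**6
  puts "# #{fieldcount} columns x #{fieldlength} bytes: about #{raw_mb} MB of raw field values"
  puts profile_for.call(fieldcount, fieldlength)
end
```

Comparing the table size that each cluster reports against the same raw estimate gives a common baseline for the compression ratio of each layout.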