Count the number of rows in a table

This topic describes how to count the number of rows in a Lindorm wide table.

Usage notes

Lindorm is a NoSQL database built on the log-structured merge-tree (LSM tree) storage structure. Querying the exact number of rows in a Lindorm wide table requires a full table scan, which takes a long time when the table contains a large amount of data. Therefore, we recommend that you do not perform frequent COUNT operations on Lindorm wide tables. If you want to obtain the exact number of rows in a wide table, use one of the following methods:
Use HBase Shell to count the number of rows in a wide table
Use HBase RowCounter to count the number of rows in a wide table
Use Lindorm SQL statements to count the number of rows in a wide table

If you only need the estimated number of rows in a wide table, view the data on the Overview page in the cluster management system of Lindorm. For more information, see Use the cluster management system to view the estimated number of rows in a wide table.

Use HBase Shell to count the number of rows in a wide table

Before you count the number of rows in a Lindorm wide table, make sure that you are connected to LindormTable by using HBase Shell. For more information, see Use Lindorm Shell to connect to LindormTable.

To count the exact number of rows in a wide table, you can use HBase Shell to run the COUNT command. The COUNT command scans the entire wide table in batches and collects the results to compute the total.

We recommend that you run the COUNT command on an Elastic Compute Service (ECS) client that is in the same virtual private cloud (VPC) as LindormTable. If you run the COUNT command over the Internet, the scan consumes a large amount of network bandwidth and the count is slow.

The time required to scan a table varies based on the schema of the table. When the COUNT command scans an entire table, it processes fewer than 100,000 rows per second.

You can run the following command to count the number of rows in a wide table named table:

```
count 'table'
```

The command output displays the statistical result, that is, the number of rows in the table.
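Because the scan rate stays below 100,000 rows per second, you can work out a lower bound on how long a COUNT in HBase Shell takes before you start it. The following Python snippet is a minimal sketch of that arithmetic. The function name and the constant are illustrative and are not part of Lindorm or HBase.

```python
import math

# Upper bound on the scan rate stated in this topic for the COUNT command.
MAX_ROWS_PER_SECOND = 100_000


def min_scan_seconds(row_count: int, rows_per_second: int = MAX_ROWS_PER_SECOND) -> int:
    """Return a lower bound, in whole seconds, on the duration of a full scan."""
    if row_count < 0:
        raise ValueError("row_count must not be negative")
    if rows_per_second <= 0:
        raise ValueError("rows_per_second must be positive")
    # The real rate is lower than the bound, so the real duration is longer.
    return math.ceil(row_count / rows_per_second)


for rows in (1_000_000, 1_000_000_000):
    seconds = min_scan_seconds(rows)
    print(f"{rows:,} rows: at least {seconds:,} seconds")
```

The actual duration depends on the table schema, so treat the result as a floor rather than a prediction.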

Use HBase RowCounter to count the number of rows in a wide table

Before you count the number of rows in a Lindorm wide table, make sure that you are connected to LindormTable by using HBase Shell. For more information, see Use Lindorm Shell to connect to LindormTable.

RowCounter runs a MapReduce job in pseudo-distributed mode on the local client to perform the COUNT operation. By default, the job uses a single thread, so its efficiency is similar to that of the COUNT command in HBase Shell. To count rows more efficiently, use multiple threads: set the -Dmapreduce.local.map.tasks.maximum option to the number of threads that you want to run concurrently. Take note of the following items before you specify the number of threads:

Set the number of threads to a value that is less than or equal to the number of regions in your wide table.
A larger number of threads places a heavier load on your cluster and may negatively affect online services. We recommend that you specify the number of threads based on your business requirements.
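The first guideline amounts to clamping the requested thread count to the number of regions. The following Python snippet is a minimal sketch that applies the clamp and assembles the RowCounter command line in the form used in this topic. The helper name and its parameters are hypothetical, and the region count must be supplied by you because the sketch does not query the cluster.

```python
from typing import List, Optional

# Path and class name as they appear in the commands in this topic.
HBASE_BIN = "./alihbase-2.0.18/bin/hbase"
ROWCOUNTER_CLASS = "org.apache.hadoop.hbase.mapreduce.RowCounter"


def build_rowcounter_command(
    table: str,
    threads: int = 1,
    region_count: Optional[int] = None,
    namespace: Optional[str] = None,
) -> List[str]:
    """Build the argument list for a RowCounter run (hypothetical helper)."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    if region_count is not None:
        if region_count < 1:
            raise ValueError("region_count must be at least 1")
        # Threads beyond the number of regions bring no benefit.
        threads = min(threads, region_count)
    full_name = f"{namespace}:{table}" if namespace else table
    command = [HBASE_BIN, ROWCOUNTER_CLASS]
    if threads > 1:
        command.append(f"-Dmapreduce.local.map.tasks.maximum={threads}")
    command.append(full_name)
    return command


# Requesting 16 threads for a table with 8 regions yields 8 threads.
print(" ".join(build_rowcounter_command("table", threads=16, region_count=8)))
```

The sketch only builds the command. Whether the resulting load is acceptable for your online services is still a judgment that you must make for your own cluster.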

Run the following commands on the client where HBase Shell is installed to count the number of rows in a specified wide table in different scenarios:

Count the number of rows in a table named table.

```
./alihbase-2.0.18/bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter "table"
```

Count the number of rows in a table named table by using 16 concurrent threads.

```
./alihbase-2.0.18/bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter -Dmapreduce.local.map.tasks.maximum=16 "table"
```

Count the number of rows in a table named table in the ns namespace.

```
./alihbase-2.0.18/bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter "ns:table"
```

The output is stored in the hbase.log file in the log directory.
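To pull the result out of the log, you can search for the job counter. The following Python snippet is a minimal sketch that assumes the log contains a counter line of the form ROWS=<number>, which is how Apache HBase RowCounter reports its total. The exact log layout of your client is an assumption here, so verify it against your own hbase.log file. The function names and the sample text are hypothetical.

```python
import re
from typing import Optional

# Assumption: the job counters in hbase.log include a "ROWS=<number>" entry.
ROWS_PATTERN = re.compile(r"\bROWS=(\d+)\b")


def last_row_count(log_text: str) -> Optional[int]:
    """Return the most recent ROWS counter in the log text, or None."""
    matches = ROWS_PATTERN.findall(log_text)
    return int(matches[-1]) if matches else None


def read_row_count(log_path: str) -> Optional[int]:
    """Read a log file and return the most recent ROWS counter, or None."""
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        return last_row_count(handle.read())


# Hypothetical excerpt for illustration only. This is not real log output.
sample = "job started\n\t\tROWS=16000\njob finished\n"
print(last_row_count(sample))
```

Taking the last match means that a log file that holds several runs returns the result of the most recent run.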

Use Lindorm SQL statements to count the number of rows in a wide table

Before you count the number of rows in a Lindorm wide table, make sure that you are connected to LindormTable by using Lindorm-cli. For more information, see Use Lindorm-cli to connect to and use LindormTable.

Lindorm SQL statements count the rows in a wide table faster than HBase RowCounter. Lindorm automatically distributes the logical operations of COUNT to each process, which is similar to using multiple threads for statistical analysis. In HBase Shell, only a single thread is used.

However, a COUNT operation that is performed by using a Lindorm SQL statement still requires a full table scan. The default timeout period for the execution of a Lindorm SQL statement is 120 seconds. If the result of a COUNT statement cannot be returned within 120 seconds, a timeout error is returned.

When you use Lindorm SQL statements to count rows, each server scans about 100,000 rows per second. Because the COUNT operation is distributed, the processing speed increases with the number of servers in your cluster.
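The timeout and the per-server scan rate together give a rough ceiling on the table size that a single COUNT statement can handle. The following Python snippet is a minimal sketch of that estimate. It assumes that throughput scales linearly with the number of servers and that data is evenly distributed, which this topic does not guarantee. The function names are illustrative.

```python
# Figures stated in this topic.
SQL_TIMEOUT_SECONDS = 120
ROWS_PER_SERVER_PER_SECOND = 100_000


def max_rows_within_timeout(
    server_count: int,
    timeout_seconds: int = SQL_TIMEOUT_SECONDS,
    rows_per_server_per_second: int = ROWS_PER_SERVER_PER_SECOND,
) -> int:
    """Estimate how many rows can be scanned before the statement times out."""
    if server_count < 1:
        raise ValueError("server_count must be at least 1")
    # Assumption: linear scaling across servers with evenly distributed data.
    return server_count * rows_per_server_per_second * timeout_seconds


def likely_to_time_out(row_count: int, server_count: int) -> bool:
    """Return True if the table is larger than the estimated ceiling."""
    return row_count > max_rows_within_timeout(server_count)


print(max_rows_within_timeout(1))
print(likely_to_time_out(50_000_000, 4))
```

Under these assumptions, one server handles about 12 million rows within the default timeout. For a table near or above the ceiling, the estimate suggests planning for a timeout error.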

You can execute the following statement to count the number of rows in a wide table named table:

```
SELECT COUNT(*) FROM table;
```

The following output is returned:

```
+--------+
| EXPR$0 |
+--------+
| 16000  |
+--------+
```
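If a script captures this output, the count can be extracted from the table text. The following Python snippet is a minimal sketch that assumes the single-column layout shown above. The function name is hypothetical and is not part of Lindorm-cli.

```python
def parse_count_output(output: str) -> int:
    """Extract the count from a single-column result table."""
    cells = []
    for line in output.splitlines():
        line = line.strip()
        # Data and header rows are delimited by "|"; border rows start with "+".
        if line.startswith("|") and line.endswith("|"):
            cells.append(line.strip("|").strip())
    if len(cells) < 2:
        raise ValueError("expected a header row and a value row")
    # cells[0] is the column header, cells[1] is the count.
    return int(cells[1])


sample = """
+--------+
| EXPR$0 |
+--------+
| 16000  |
+--------+
"""
print(parse_count_output(sample))
```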

Important

A full table scan is required to count the number of rows. Make sure that you perform this operation only when it is necessary. If more than one million rows are contained in the table, we recommend that you use a search index to accelerate the count operation. For more information, see Query data in a wide table by using a search index.

Use the cluster management system to view the estimated number of rows in a wide table

Before you count the number of rows in a Lindorm wide table, make sure that you are logged on to the cluster management system of Lindorm. For more information, see Log on to the cluster management system.

In the cluster management system of Lindorm, you can view the estimated number of rows in your wide table on the Overview page. The value in the EstimateRowCount column is calculated by summing the row counts recorded in the metadata of each data file. This metadata is collected when the data files are generated.

The estimated number of rows is a rough value and may be inaccurate for the following reasons:
If you perform update operations and delete operations, the same row of data may be stored in multiple data files and is counted more than once.
If you enable the time-to-live (TTL) feature, specific data in these files may expire after the row counts are recorded.

If the data in your wide table is not updated or deleted and the data does not expire, the estimated number of rows that is displayed on the Overview page is accurate. You can use the estimated value to check data integrity after historical data migration is complete.
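The effect of updates on the estimate can be shown with a toy model. The following Python snippet is a minimal sketch in which each data file is a list of row keys: the estimate sums the per-file row counts, and the exact count takes the distinct keys across all files. The model and its function names are illustrative and do not reflect how Lindorm stores data. Deletes and TTL expiration are not modeled.

```python
from typing import List


def estimated_row_count(data_files: List[List[str]]) -> int:
    """Sum the row counts recorded per data file."""
    return sum(len(set(row_keys)) for row_keys in data_files)


def exact_row_count(data_files: List[List[str]]) -> int:
    """Count the distinct row keys across all data files."""
    distinct_keys = set()
    for row_keys in data_files:
        distinct_keys.update(row_keys)
    return len(distinct_keys)


# Row r2 was written once and updated later, so it appears in two files.
files = [["r1", "r2", "r3"], ["r2", "r4"]]
print(estimated_row_count(files), exact_row_count(files))
```

When no row key appears in more than one file, the two functions return the same value, which corresponds to the case in which the estimate is accurate.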

To view the estimated number of rows in your wide table, perform the following operations:
In the left-side navigation pane of the cluster management system, click Overview.
In the Current IDC section, click View in the EstimateRowCount column corresponding to the wide table.

Note

If your table contains data but the estimated number of rows that is displayed is 0, an earlier minor version of LindormTable may be used. In this case, update the minor version of LindormTable. For more information, see Upgrade the minor engine version of a Lindorm instance.