Use Tablestore SDK for Node.js to perform parallel scan - Tablestore

If you do not have requirements on the order of query results, you can use the parallel scan feature to obtain query results in an efficient manner.

Prerequisites

An OTSClient instance is initialized. For more information, see Initialize an OTSClient instance.
A data table is created and data is written to the data table. For more information, see Create data tables and Write data.
A search index is created for the data table. For more information, see Create a search index.

Parameters

Parameter		Description
tableName		The name of the data table.
indexName		The name of the search index.
scanQuery	query	The query statement for the search index. The operation supports term query, fuzzy query, range query, geo query, and nested query, which are similar to those of the Search operation.
	limit	The maximum number of rows that can be returned by each ParallelScan call.
	maxParallel	The maximum number of parallel scan tasks per request. The maximum number of parallel scan tasks per request varies based on the data volume. A larger volume of data requires more parallel scan tasks per request. You can use the ComputeSplits operation to query the maximum number of parallel scan tasks per request.
	currentParallelId	The ID of the parallel scan task in the request. Valid values: [0, Value of maxParallel)
	token	The token that is used to paginate query results. The results of the ParallelScan request contain the token for the next page. You can use the token to retrieve the next page.
	aliveTime	The validity period of the current parallel scan task. This validity period is also the validity period of the token. Unit: seconds. Default value: 60. Unit: seconds. We recommend that you use the default value. If the next request is not initiated within the validity period, no more data can be queried. The validity time of the token is refreshed each time you send a request. Note Sessions expire ahead of time if switch indexes are dynamically changed in schemas, a single server fails, or a server-end load balancing is performed. In this case, you must recreate sessions.
columnsToGet		The name of the column to be returned in the grouping result. You can add the column name to Columns. If you want all columns to be returned in the search index, you can use the more concise ReturnAllFromIndex operation. Important ReturnAll cannot be used here.
sessionId		The session ID of the parallel scan task. You can call the ComputeSplits operation to create a session and query the maximum number of parallel scan tasks that are supported by the parallel scan request.

Examples

The following sample code provides an example on how to perform parallel scan:

// 1. Obtain the session ID. 
let computeSplits = await new Promise((resolve, reject) => {
    client.computeSplits({
        tableName: tableName,
        searchIndexSplitsOptions: {
            indexName: indexName,
        }
    }, function (err, data) {
        if (err) {
            console.log('computeSplits error:', err.toString());
            reject(err);
        } else {
            console.log('computeSplits success:', data);
            resolve(data)
        }
    })
})

// 2. Specify query conditions. 
const scanQuery = {
    query: {
        queryType: TableStore.QueryType.MATCH_ALL_QUERY,
    },
    limit: 1000,
    aliveTime: 30,
    token: undefined,
    currentParallelId: 0,
    maxParallel: 1,
}

// 3. Create a ParallelScan request. In this example, a synchronous request is created. You can create an asynchronous request based on your business requirement. 
const parallelScanPromise = function () {
    return new Promise(function (resolve, reject) {
        client.parallelScan({
            tableName: tableName,
            indexName: indexName,
            columnToGet: {
                returnType: TableStore.ColumnReturnType.RETURN_ALL_FROM_INDEX,
            },
            sessionId: computeSplits.sessionId,
            scanQuery: scanQuery,
        }, function (err, data) {
            if (err) {
                console.log('parallelScan error:', err.toString());
                reject(err);
            } else {
                console.log("parallelScan, rows:", data.rows.length)
                resolve(data)
            }
        });
    })
}
let totalCount = 0 // Specify that the total number of rows that meet the query conditions is returned. 
let parallelScanResponse = await parallelScanPromise()
totalCount = totalCount + parallelScanResponse.rows.length
// 4. Pull data by using an iterator until all data is pulled. 
while (parallelScanResponse.nextToken !== null && parallelScanResponse.nextToken.length > 0) {
    scanQuery.token = parallelScanResponse.nextToken
    parallelScanResponse = await parallelScanPromise()
    totalCount += parallelScanResponse.rows.length
}
console.log("total rows:", totalCount)

FAQ

What do I do if no data can be found by calling the Search operation?

References

The following query types are supported by search indexes: term query, terms query, match all query, match query, match phrase query, prefix query, range query, wildcard query, Boolean query, geo query, nested query, vector query, and exists query. You can select a query type to query data based on your business requirements.
If you want to sort or paginate the rows that meet the query conditions, you can use the sorting and paging feature. For more information, see Sorting and paging.
If you want to collapse the result set based on a specific column, you can use the collapse (distinct) feature. This way, data of the specified type appears only once in the query results. For more information, see Collapse (distinct).
If you want to analyze data in a data table, such as obtaining the extreme values, sum, and total number of rows, you can perform aggregation operations or execute SQL statements. For more information, see Aggregation and SQL query.
If you want to quickly obtain all rows that meet the query conditions without the need to sort the rows, you can call the ParallelScan and ComputeSplits operations to use the parallel scan feature. For more information, see Parallel scan.