Tablestore: Perform a parallel scan

Last Updated: Sep 18, 2024

If the order of query results does not matter, you can use the parallel scan feature to retrieve the results efficiently.

Prerequisites

Parameters

Parameter

Description

TableName

The name of the data table.

IndexName

The name of the search index.

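ScanQuery

The scan settings. ScanQuery contains the Query, Limit, MaxParallel, CurrentParallelId, Token, and AliveTime parameters described below.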

Query

The query statement for the search index. The supported query types, such as term query, fuzzy query, range query, geo query, and nested query, are the same as those supported by the Search operation.

Limit

The maximum number of rows that can be returned by each ParallelScan call.

MaxParallel

The maximum number of parallel scan tasks per request. This value depends on the data volume: a larger volume of data requires more parallel scan tasks. You can call the ComputeSplits operation to query the value.

CurrentParallelId

The ID of the parallel scan task in the request. Valid values: [0, MaxParallel), that is, integers from 0 to MaxParallel - 1.

Token

The token that is used for paging. Each ParallelScan response contains the token for the next page. Pass that token in the next request to retrieve the next page.

AliveTime

The validity period of the current parallel scan task. This validity period is also the validity period of the token. Default value: 60. Unit: seconds. We recommend that you use the default value. If the next request is not initiated within the validity period, no more data can be queried. The validity time of the token is refreshed each time you send a request.

Note

A session can expire early if the search index is switched because its schema is dynamically modified, a single server fails, or load balancing is performed on the server side. In this case, you must recreate the session.

ColumnsToGet

The columns to return for each row. To return specific columns, set the Columns parameter to the column names.

If you want to return all columns in the search index, you can specify the ReturnAllFromIndex parameter.

Important

The ReturnAll parameter is not supported.

SessionId

The session ID of the parallel scan task. You can call the ComputeSplits operation to create a session and query the maximum number of parallel scan tasks that are supported by the parallel scan request.
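The note under AliveTime says that an expired session must be recreated. The following is a minimal sketch of that pattern, not the SDK's own mechanism. SessionExpiredException, createSession, and scanWithSession are hypothetical stand-ins: the first for whatever error the SDK reports when a session expires, the second for a ComputeSplits call, and the third for a complete token-paging scan. Restarting the scan from the beginning after a session is recreated is an assumption, based on the fact that the token shares the validity period of the session.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical exception: stands in for the error raised when a session has expired.
public class SessionExpiredException : Exception { }

public static class SessionRetrySketch
{
    // createSession: stand-in for a ComputeSplits call; returns a new session ID.
    // scanWithSession: stand-in for a full scan that pages with tokens in that session.
    public static List<T> ScanWithRetry<T>(
        Func<string> createSession,
        Func<string, List<T>> scanWithSession,
        int maxAttempts)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            // Every attempt starts with a new session.
            string sessionId = createSession();
            try
            {
                return scanWithSession(sessionId);
            }
            catch (SessionExpiredException)
            {
                // Assumption: partial results are discarded and the scan restarts.
                if (attempt == maxAttempts)
                {
                    throw;
                }
            }
        }

        // Reached only when maxAttempts is less than 1.
        throw new ArgumentOutOfRangeException(nameof(maxAttempts), "maxAttempts must be at least 1.");
    }
}
```

Bounding the number of attempts keeps a persistently failing scan from looping forever. The exception of the last attempt is rethrown to the caller.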

Example

The following sample code shows how to scan data by using a single thread.

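using System;
using System.Collections.Generic;
using Newtonsoft.Json;
// Also import the Tablestore SDK namespaces that define OTSClient, Row, and the search types.

/// <summary>
/// Perform a parallel scan to scan data by using a single thread.
/// </summary>
public class ParallelScan
{
    // Placeholder names. Replace them with the names of your data table and search index.
    private const string TableName = "<your_table_name>";
    private const string IndexName = "<your_search_index_name>";

    public static void ParallelScanwithSingleThread(OTSClient otsClient)
    {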
        SearchIndexSplitsOptions options = new SearchIndexSplitsOptions
        {
            IndexName = IndexName
        };

        ComputeSplitsRequest computeSplitsRequest = new ComputeSplitsRequest
        {
            TableName = TableName,
            SplitOptions = options
        };

        ComputeSplitsResponse computeSplitsResponse = otsClient.ComputeSplits(computeSplitsRequest);

        MatchAllQuery matchAllQuery = new MatchAllQuery();

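        // Configure the scan: session validity period, query, parallelism, and page size.
        ScanQuery scanQuery = new ScanQuery();
        scanQuery.AliveTime = 60; // Validity period of the session and token, in seconds.
        scanQuery.Query = matchAllQuery;
        scanQuery.MaxParallel = computeSplitsResponse.SplitsSize; // Maximum parallelism returned by ComputeSplits.
        scanQuery.Limit = 10; // Maximum number of rows returned by each ParallelScan call.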

        ParallelScanRequest parallelScanRequest = new ParallelScanRequest();
        parallelScanRequest.TableName = TableName;
        parallelScanRequest.IndexName = IndexName;
        parallelScanRequest.ScanQuery = scanQuery;
        parallelScanRequest.ColumnToGet = new ColumnsToGet { ReturnAllFromIndex = true };
        parallelScanRequest.SessionId = computeSplitsResponse.SessionId;

        int total = 0;
        List<Row> result = new List<Row>();

        ParallelScanResponse parallelScanResponse = otsClient.ParallelScan(parallelScanRequest);

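        // Collect the rows before checking NextToken so that rows returned
        // together with a null NextToken are not dropped.
        while (true)
        {
            List<Row> rows = new List<Row>(parallelScanResponse.Rows);

            total += rows.Count;
            result.AddRange(rows);

            // A null NextToken indicates that all data has been scanned.
            if (parallelScanResponse.NextToken == null)
            {
                break;
            }

            parallelScanRequest.ScanQuery.Token = parallelScanResponse.NextToken;

            parallelScanResponse = otsClient.ParallelScan(parallelScanRequest);
        }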

        foreach (Row row in result)
        {
            Console.WriteLine(JsonConvert.SerializeObject(row));
        }
        Console.WriteLine("Total Row Count: {0}", total);
    }
}
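The sample above uses a single thread. With multiple threads, each worker would take its own CurrentParallelId in [0, MaxParallel) and page with its own token until no token is returned. The sketch below illustrates only that control flow. FakeIndex is a hypothetical in-memory stand-in, not part of the Tablestore SDK, and its integer token and round-robin row distribution are assumptions made for the illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for a search index that is split into maxParallel slices.
public class FakeIndex
{
    private readonly List<int>[] slices;

    public FakeIndex(int rowCount, int maxParallel)
    {
        slices = new List<int>[maxParallel];
        for (int i = 0; i < maxParallel; i++) slices[i] = new List<int>();
        // Distribute the rows round-robin across the slices.
        for (int row = 0; row < rowCount; row++) slices[row % maxParallel].Add(row);
    }

    // Returns up to `limit` rows of one slice and the token for the next page.
    // A null token means that the slice is exhausted.
    public (List<int> Rows, int? NextToken) Scan(int currentParallelId, int? token, int limit)
    {
        List<int> slice = slices[currentParallelId];
        int start = token ?? 0;
        List<int> rows = slice.Skip(start).Take(limit).ToList();
        int next = start + rows.Count;
        return (rows, next < slice.Count ? next : (int?)null);
    }
}

public static class MultiThreadScanSketch
{
    public static List<int> ScanAll(FakeIndex index, int maxParallel, int limit)
    {
        ConcurrentBag<int> result = new ConcurrentBag<int>();

        // Start one task per CurrentParallelId in [0, maxParallel).
        Task[] workers = Enumerable.Range(0, maxParallel).Select(id => Task.Run(() =>
        {
            int? token = null;
            do
            {
                // Each worker keeps its own token and pages independently.
                var page = index.Scan(id, token, limit);
                foreach (int row in page.Rows) result.Add(row);
                token = page.NextToken;
            } while (token != null);
        })).ToArray();

        Task.WaitAll(workers);
        // Parallel scans return rows in no particular order, so sort for display.
        return result.OrderBy(r => r).ToList();
    }
}
```

For example, MultiThreadScanSketch.ScanAll(new FakeIndex(25, 4), 4, 3) collects all 25 rows by using four workers that each read pages of up to three rows. With the real SDK, each worker would presumably send its own ParallelScanRequest with the same SessionId and a distinct CurrentParallelId.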