
Tablestore: Parallel scan

Last Updated: Aug 13, 2024

If the order of query results does not matter, you can use the parallel scan feature to retrieve the results more efficiently.

Important

Tablestore SDK for PHP V5.1.0 or later supports the parallel scan feature. Before you use the feature, make sure that Tablestore SDK for PHP V5.1.0 or later is installed. For information about the version history of Tablestore SDK for PHP, see Version history of Tablestore SDK for PHP.

Prerequisites

Parameters

| Parameter | Sub-parameter | Description |
| --- | --- | --- |
| table_name | | The name of the data table. |
| index_name | | The name of the search index. |
| scan_query | query | The query statement for the search index. Term, fuzzy, range, geo, and nested queries are supported, as in the Search operation. |
| | limit | The maximum number of rows that each ParallelScan call can return. |
| | max_parallel | The maximum number of parallel scan tasks per request. The maximum varies with the data volume: a larger volume of data requires more parallel scan tasks. You can call the ComputeSplits operation to query this value. |
| | current_parallel_id | The ID of the parallel scan task in the request. Valid values: [0, max_parallel). |
| | token | The token that is used to paginate query results. Each ParallelScan response contains the token for the next page. Pass that token in the next request to retrieve the next page. |
| | alive_time | The validity period of the current parallel scan task, which is also the validity period of the token. Unit: seconds. Default value: 60. We recommend that you use the default value. If the next request is not sent within the validity period, no more data can be queried. The validity period of the token is refreshed each time you send a request. Note: The server processes expired tasks asynchronously. A task does not expire within its validity period, but Tablestore does not guarantee that the task expires as soon as the validity period ends. |
| columns_to_get | | The columns to return. Parallel scan can read data only from the search index. To use parallel scan for a search index, you must set store to true when you create the search index. |
| session_id | | The session ID of the parallel scan task. You can call the ComputeSplits operation to create a session and to query the maximum number of parallel scan tasks that the parallel scan request supports. |
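
The relationship between max_parallel and current_parallel_id can be sketched as follows. This is a minimal illustration, not part of the SDK: buildScanQueries is a hypothetical helper, and the query body passed in the usage lines is a placeholder string rather than an SDK constant. It builds one scan_query array per task so that the task IDs cover [0, max_parallel). In real code, the max_parallel value would come from the ComputeSplits response.

```php
<?php
// Hypothetical helper (not part of the Tablestore SDK): builds one scan_query
// array per parallel task, with task IDs covering [0, $maxParallel).
function buildScanQueries(array $query, int $maxParallel, int $limit, int $aliveTime = 60): array
{
    if ($maxParallel < 1) {
        throw new InvalidArgumentException('max_parallel must be at least 1');
    }

    $scanQueries = array();
    for ($id = 0; $id < $maxParallel; $id++) {
        $scanQueries[] = array(
            'query' => $query,
            'limit' => $limit,
            'alive_time' => $aliveTime,
            'token' => null,               // the first page of each task has no token
            'current_parallel_id' => $id,  // unique per task
            'max_parallel' => $maxParallel // identical for all tasks in the session
        );
    }
    return $scanQueries;
}

// Usage: the query body below is a placeholder for illustration only.
$queries = buildScanQueries(array('query_type' => 'PLACEHOLDER_QUERY'), 3, 100);
foreach ($queries as $q) {
    echo $q['current_parallel_id'] . '/' . $q['max_parallel'] . PHP_EOL;
}
```

Each array can then be used as the scan_query of a separate ParallelScan request that shares the same session_id, with every task paginating through its own tokens.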

Examples

The following sample code shows how to use the parallel scan feature of search indexes:

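use Aliyun\OTS\Consts\ColumnReturnTypeConst;
use Aliyun\OTS\Consts\QueryTypeConst;

// $otsClient is assumed to be an initialized OTSClient instance.

// 1. Obtain the session ID.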
$computeSplitsPointReq = array(
    'table_name' => 'php_sdk_test',
    'search_index_splits_options' => array(
        'index_name' => 'test_create_search_index'
    )
);

$computeSplits = $otsClient->computeSplits($computeSplitsPointReq);
print json_encode($computeSplits, JSON_PRETTY_PRINT);


// 2. Specify query conditions. 
$scanQuery = array(
    'query' => array(
        'query_type' => QueryTypeConst::MATCH_ALL_QUERY
    ),
    'limit' => 2,
    'alive_time' => 30,
    'token' => null,
    'current_parallel_id' => 0,
    'max_parallel' => 1
);

// 3. Construct a ParallelScan request.
$parallelScanReq = array(
    'table_name' => 'php_sdk_test',
    'index_name' => 'test_create_search_index',
    'columns_to_get' => array(
        'return_type' => ColumnReturnTypeConst::RETURN_ALL_FROM_INDEX, // RETURN_ALL is not allowed in parallel scan; use RETURN_ALL_FROM_INDEX instead.
        'return_names' => array('geo', 'text', 'long', 'keyword')
    ),
    'session_id' => $computeSplits['session_id'],
    'scan_query' => $scanQuery
);

$parallelScanRes = $otsClient->parallelScan($parallelScanReq);
print json_encode($parallelScanRes['rows'], JSON_PRETTY_PRINT);

// 4. Export data by using token-based pagination. In this example, only the total number of rows is counted.
$totalCount = count($parallelScanRes['rows']);
while (!is_null($parallelScanRes['next_token'])) {
    $parallelScanReq['scan_query']['token'] = $parallelScanRes['next_token'];
    $parallelScanRes = $otsClient->parallelScan($parallelScanReq);
    print json_encode($parallelScanRes['rows'], JSON_PRETTY_PRINT);

    $totalCount += count($parallelScanRes['rows']);
}
print "TotalCount: " . $totalCount;
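
The token-based pagination in step 4 can also be wrapped in a reusable generator. The following is a minimal sketch under stated assumptions: scanRows is a hypothetical helper, not an SDK function, and the $fetchPage callable stands in for a call such as $otsClient->parallelScan($request). A stub with two fake pages replaces the real client so that the sketch runs without a Tablestore instance.

```php
<?php
// Hypothetical helper: yields rows one by one and follows next_token until it is null.
// $fetchPage stands in for a call such as $otsClient->parallelScan($request).
function scanRows(callable $fetchPage, array $request): Generator
{
    do {
        $response = $fetchPage($request);
        foreach ($response['rows'] as $row) {
            yield $row;
        }
        // Carry the token of the next page into the next request.
        $token = $response['next_token'] ?? null;
        $request['scan_query']['token'] = $token;
    } while ($token !== null);
}

// Stub that imitates two pages of results (fake data, for illustration only).
$fakePages = array(
    ''   => array('rows' => array('r1', 'r2'), 'next_token' => 't1'),
    't1' => array('rows' => array('r3'), 'next_token' => null),
);
$fetchPage = function (array $request) use ($fakePages) {
    return $fakePages[$request['scan_query']['token'] ?? ''];
};

$request = array('scan_query' => array('token' => null));
$total = 0;
foreach (scanRows($fetchPage, $request) as $row) {
    $total++;
}
echo "TotalCount: " . $total . PHP_EOL;
```

With a real client, the callable could be written as function ($req) use ($otsClient) { return $otsClient->parallelScan($req); } and the request would be the $parallelScanReq array from step 3. Because rows are yielded as they arrive, the caller does not need to hold all pages in memory.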