If you do not require query results to be returned in a specific order, you can use the parallel scan feature to obtain query results in an efficient manner.
Prerequisites
An OTSClient instance is initialized. For more information, see Initialize an OTSClient instance.
A data table is created and data is written to the data table. For more information, see Create a data table and Write data.
A search index is created for the data table. For more information, see Create search indexes.
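The sample code in this topic assumes that the client mentioned in the prerequisites is available. The following fragment is only a minimal initialization sketch based on the Tablestore Go SDK (github.com/aliyun/aliyun-tablestore-go-sdk/tablestore); the endpoint, instance name, and credentials are placeholders that you must replace with your own values.
// A minimal initialization sketch. Replace the placeholders with the endpoint and
// name of your instance and with your own credentials.
client := tablestore.NewClient("<ENDPOINT>", "<INSTANCE_NAME>", "<ACCESS_KEY_ID>", "<ACCESS_KEY_SECRET>")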
Parameters
Parameter | Sub-parameter | Description
TableName | | The name of the data table.
IndexName | | The name of the search index.
ScanQuery | Query | The query statement for the search index. ParallelScan supports the same query types as the Search operation, such as term query, fuzzy query, range query, geo query, and nested query.
ScanQuery | Limit | The maximum number of rows that each ParallelScan call can return.
ScanQuery | MaxParallel | The maximum number of parallel scan tasks per request. This value varies based on the data volume: a larger volume of data requires more parallel scan tasks. You can call the ComputeSplits operation to query this value.
ScanQuery | CurrentParallelID | The ID of the parallel scan task in the request. Valid values: [0, MaxParallel).
ScanQuery | Token | The token that is used to paginate query results. The response to a ParallelScan request contains the token for the next page. You can pass this token in the next request to retrieve the next page.
ScanQuery | AliveTime | The validity period of the current parallel scan task, which is also the validity period of the token. Unit: seconds. Default value: 60. We recommend that you use the default value. If the next request is not initiated within the validity period, no more data can be queried. The validity period of the token is refreshed each time you send a request. Note: Sessions may expire ahead of schedule if the index schema is dynamically modified (index switching), a single server fails, or server load balancing is performed. In this case, you must recreate the session.
ColumnsToGet | | The columns that you want to return. To return specific columns, set the Columns parameter to the column names. To return all columns in the search index, set the ReturnAllFromIndex parameter to true. Important: The ReturnAll parameter is not supported.
SessionId | | The session ID of the parallel scan task. You can call the ComputeSplits operation to create a session and query the maximum number of parallel scan tasks that are supported by the parallel scan request.
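The following sketch is not part of the official examples; it only illustrates how the parameters in the preceding table map to the request builders of the Go SDK (the tablestore and tablestore/search packages). The table name, the search index name, and the column name col1 are placeholders, and sessionId and splitsSize are assumed to come from a ComputeSplits call as shown in the examples below.
// Illustrative sketch: build a ParallelScanRequest from the parameters described above.
// sessionId and splitsSize are assumed to be taken from a ComputeSplitsResponse.
func buildParallelScanRequest(sessionId []byte, splitsSize int32) *tablestore.ParallelScanRequest {
    query := search.NewScanQuery().
        SetQuery(&search.MatchAllQuery{}). // Query: the query statement for the search index
        SetLimit(2000).                    // Limit: the maximum number of rows per call (example value)
        SetMaxParallel(splitsSize).        // MaxParallel: returned by ComputeSplits
        SetCurrentParallelID(0)            // CurrentParallelID: a value in [0, MaxParallel)
    // Token is taken from the NextToken of the previous response when you paginate.
    // AliveTime defaults to 60 seconds.
    req := &tablestore.ParallelScanRequest{}
    req.SetTableName("<TABLE_NAME>").        // TableName
        SetIndexName("<SEARCH_INDEX_NAME>"). // IndexName
        SetColumnsToGet(&tablestore.ColumnsToGet{Columns: []string{"col1"}}). // ColumnsToGet
        SetScanQuery(query).                 // ScanQuery
        SetSessionId(sessionId)              // SessionId: returned by ComputeSplits
    return req
}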
Examples
You can scan data by using a single thread or multiple threads based on your business requirements.
Scan data by using a single thread
When you use the parallel scan feature, the code for a single-threaded request is simpler than that for a multi-threaded request because the currentParallelId and maxParallel parameters are not required. A single-threaded ParallelScan request provides higher throughput than a Search request, but lower throughput than a multi-threaded ParallelScan request.
// computeSplits calls the ComputeSplits operation to create a session and query the
// maximum number of parallel scan tasks that are supported for the search index.
func computeSplits(client *tablestore.TableStoreClient, tableName string, indexName string) (*tablestore.ComputeSplitsResponse, error) {
    req := &tablestore.ComputeSplitsRequest{}
    req.
        SetTableName(tableName).
        SetSearchIndexSplitsOptions(tablestore.SearchIndexSplitsOptions{IndexName: indexName})
    res, err := client.ComputeSplits(req)
    if err != nil {
        return nil, err
    }
    return res, nil
}
// ParallelScanSingleConcurrency performs a parallel scan to scan data by using a single thread.
func ParallelScanSingleConcurrency(client *tablestore.TableStoreClient, tableName string, indexName string) {
    // Query the session ID and the maximum number of parallel scan tasks.
    computeSplitsResp, err := computeSplits(client, tableName, indexName)
    if err != nil {
        fmt.Printf("%#v", err)
        return
    }
    // Build the scan query. MatchAllQuery matches all rows, and Limit specifies the
    // maximum number of rows that each ParallelScan call can return.
    query := search.NewScanQuery().SetQuery(&search.MatchAllQuery{}).SetLimit(2)
    req := &tablestore.ParallelScanRequest{}
    req.SetTableName(tableName).
        SetIndexName(indexName).
        SetColumnsToGet(&tablestore.ColumnsToGet{ReturnAllFromIndex: false}).
        SetScanQuery(query).
        SetSessionId(computeSplitsResp.SessionId)
    res, err := client.ParallelScan(req)
    if err != nil {
        fmt.Printf("%#v", err)
        return
    }
    total := len(res.Rows)
    // Continue scanning until NextToken is nil.
    for res.NextToken != nil {
        req.SetScanQuery(query.SetToken(res.NextToken))
        res, err = client.ParallelScan(req)
        if err != nil {
            fmt.Printf("%#v", err)
            return
        }
        total += len(res.Rows) // Process the rows that are returned in this round.
    }
    fmt.Println("total: ", total)
}
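In the preceding example, ComputeSplits is called to create a session, and the session ID is attached to the ParallelScan request. The loop then resends the request with the NextToken of the previous response until NextToken is nil, which indicates that all data that matches the query has been scanned.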
Scan data by using multiple threads
// computeSplits calls the ComputeSplits operation to create a session and query the
// maximum number of parallel scan tasks that are supported for the search index.
func computeSplits(client *tablestore.TableStoreClient, tableName string, indexName string) (*tablestore.ComputeSplitsResponse, error) {
    req := &tablestore.ComputeSplitsRequest{}
    req.
        SetTableName(tableName).
        SetSearchIndexSplitsOptions(tablestore.SearchIndexSplitsOptions{IndexName: indexName})
    res, err := client.ComputeSplits(req)
    if err != nil {
        return nil, err
    }
    return res, nil
}
// ParallelScanMultiConcurrency performs a parallel scan to scan data by using multiple threads.
func ParallelScanMultiConcurrency(client *tablestore.TableStoreClient, tableName string, indexName string) {
    // Query the session ID and the maximum number of parallel scan tasks.
    computeSplitsResp, err := computeSplits(client, tableName, indexName)
    if err != nil {
        fmt.Printf("%#v", err)
        return
    }
    // Start one goroutine for each parallel scan task and wait for all of them to complete.
    var wg sync.WaitGroup
    wg.Add(int(computeSplitsResp.SplitsSize))
    for i := int32(0); i < computeSplitsResp.SplitsSize; i++ {
        current := i
        go func() {
            defer wg.Done()
            // Each goroutine specifies a distinct CurrentParallelID in [0, MaxParallel).
            query := search.NewScanQuery().
                SetQuery(&search.MatchAllQuery{}).
                SetCurrentParallelID(current).
                SetMaxParallel(computeSplitsResp.SplitsSize).
                SetLimit(2)
            req := &tablestore.ParallelScanRequest{}
            req.SetTableName(tableName).
                SetIndexName(indexName).
                SetColumnsToGet(&tablestore.ColumnsToGet{ReturnAllFromIndex: false}).
                SetScanQuery(query).
                SetSessionId(computeSplitsResp.SessionId)
            res, err := client.ParallelScan(req)
            if err != nil {
                fmt.Printf("%#v", err)
                return
            }
            total := len(res.Rows)
            // Continue scanning until NextToken is nil.
            for res.NextToken != nil {
                req.SetScanQuery(query.SetToken(res.NextToken))
                res, err = client.ParallelScan(req)
                if err != nil {
                    fmt.Printf("%#v", err)
                    return
                }
                total += len(res.Rows) // Process the rows that are returned in this round.
            }
            fmt.Println("total: ", total)
        }()
    }
    wg.Wait()
}
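In the multi-threaded example, each goroutine handles one parallel scan task and pages through its results independently. All tasks share the session ID that is returned by ComputeSplits, and sync.WaitGroup ensures that the function returns only after all tasks are complete.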
FAQ
What do I do if no data can be found by calling the Search operation?
References
When you use a search index to query data, the following query methods are supported: term query, terms query, match all query, match query, match phrase query, prefix query, range query, wildcard query, geo query, Boolean query, KNN vector query, nested query, and exists query. You can combine these query methods to query data from multiple dimensions based on your business requirements.
You can sort or paginate rows that meet the query conditions by using the sorting and paging features. For more information, see Sorting and paging.
You can use the collapse (distinct) feature to collapse the result set based on a specific column. This way, data of the specified type appears only once in the query results. For more information, see Collapse (distinct).
If you want to analyze data in a data table, you can use the aggregation feature of the Search operation or execute SQL statements. For example, you can obtain the minimum and maximum values, sum, and total number of rows. For more information, see Aggregation and SQL query.
If you want to obtain all rows that meet the query conditions without the need to sort the rows, you can call the ParallelScan and ComputeSplits operations to use the parallel scan feature. For more information, see Perform a parallel scan.