Tablestore: Read data

Last Updated: Mar 13, 2024

Tablestore provides multiple operations for you to read data from tables. Specifically, you can read a single row of data, read multiple rows of data at a time, read data whose primary key values are in the specified range, read data by using an iterator, and read data in parallel queries.

Query methods

Tablestore provides the GetRow, BatchGetRow, and GetRange operations that you can call to read data. Before you read data, select an appropriate query method based on the actual query scenario.

Important

If you want to read data from a table that contains an auto-increment primary key column, make sure that you have obtained the values of all primary key columns, including the value of the auto-increment primary key column. For more information, see Configure an auto-increment primary key column. If the value of the auto-increment primary key column was not recorded, you can call the GetRange operation to read data within a range that is determined by the value of the first primary key column.

Query method

Description

Scenario

Read a single row of data

You can call the GetRow operation to read a single row of data.

This method is applicable to scenarios in which all primary key columns of a table can be determined and the number of rows to be read is small.

Read multiple rows of data at a time

You can call the BatchGetRow operation to read multiple rows of data from one or more tables at a time.

A BatchGetRow operation consists of multiple GetRow operations, and you construct each of them in the same way as a standalone GetRow request.

This method is applicable to scenarios in which all primary key columns of a table can be determined and the number of rows to be read is large or data is to be read from multiple tables.

Read data whose primary key values are within a specific range

You can call the GetRange operation to read data whose primary key values are in the specified range.

The GetRange operation allows you to read data whose primary key values are in the specified range in a forward or backward direction. You can also specify the number of rows to read. If the range is large and the number of scanned rows or the volume of scanned data exceeds the upper limit, the scan stops, and the rows that are read and information about the primary key of the next row are returned. You can initiate a request to start from where the last operation left off and read the remaining rows based on the information about the primary key of the next row returned by the previous operation.

This method is applicable to scenarios in which the range of all primary key columns of a table or the prefix of primary key columns can be determined.

Important

If you cannot determine the prefix of primary key columns, you can set every column of the start primary key to a value of the INF_MIN type and every column of the end primary key to a value of the INF_MAX type so that the range covers all primary key values in the table. This scans all data in the table and consumes a large amount of computing resources. Proceed with caution.

Read data whose primary key values are in the specified range by using an iterator

You can call the createRangeIterator operation to read data whose primary key values are in the specified range by using an iterator.

This method is applicable to scenarios in which the range of all primary key columns of a table or the prefix of primary key columns can be determined, and an iterator is required to read data.

Read data in parallel queries

Tablestore SDK for Java provides the TableStoreReader class that encapsulates the BatchGetRow operation. You can use this class to concurrently query data of a data table. TableStoreReader also supports multi-table queries, statistics on query status, row-level callback, and custom configurations.

This method is applicable to scenarios in which all primary key columns of a table can be determined and the number of rows to be read is large or data is to be read from multiple tables.

Prerequisites

  • An OTSClient instance is initialized. For more information, see Initialize a client.

  • A data table is created, and data is written to the data table.

Read a single row of data

You can call the GetRow operation to read a single row of data. After you call the GetRow operation, one of the following results may be returned:

  • If the row exists, the primary key columns and attribute columns of the row are returned.

  • If the row does not exist, no row is returned and no error is reported.

Parameters

Parameter

Description

tableName

The name of the data table.

primaryKey

The primary key information of the row. The value of this parameter contains the name of each primary key column, the type of the primary key column, and the primary key value.

Important

The number and types of primary key columns that you specify must be the same as the actual number and types of primary key columns in the data table.

columnsToGet

The columns that you want to read. You can specify the names of primary key columns or attribute columns.

  • If you do not specify a column, all data in the row is returned.

  • If you specify columns but the row does not contain the specified columns, the return value is null. If the row contains some of the specified columns, the data in those columns of the row is returned.

Note
  • By default, Tablestore returns the data from all columns of a row when you query the row. You can specify the columnsToGet parameter to return specific columns. For example, if col0 and col1 are set for the columnsToGet parameter, only the values of the col0 and col1 columns are returned.

  • If you configure the columnsToGet and filter parameters, Tablestore queries the columns that are specified by the columnsToGet parameter, and then returns the rows that meet the filter conditions.

maxVersions

The maximum number of data versions that you can read.

Important

You must configure at least one of the following parameters: maxVersions and timeRange.

  • If only the maxVersions parameter is specified, data of the specified number of versions is returned from the most recent data entry to the earliest data entry.

  • If only the timeRange parameter is specified, all data whose versions are in the specified time range or data of the specified version is returned.

  • If the maxVersions and timeRange parameters are specified, data of the specified number of versions in the specified time range is returned from the most recent data entry to the earliest data entry.

timeRange

The range of versions or a specific version that you want to read. For more information, see TimeRange.

Important

You must configure at least one of the following parameters: maxVersions and timeRange.

  • If only the maxVersions parameter is specified, data of the specified number of versions is returned from the most recent data entry to the earliest data entry.

  • If only the timeRange parameter is specified, all data whose versions are in the specified time range or data of the specified version is returned.

  • If the maxVersions and timeRange parameters are specified, data of the specified number of versions in the specified time range is returned from the most recent data entry to the earliest data entry.

  • To query data whose versions are in the specified time range, you must configure the start and end parameters. The start parameter specifies the start timestamp. The end parameter specifies the end timestamp. The specified range is a left-closed and right-open interval that is in the [start, end) format.

  • To query data of a specific version, you must specify the timestamp parameter. The timestamp parameter specifies a specific timestamp.

Configure either the timestamp parameter or the [start, end) range, but not both.

Valid values of the timeRange parameter: 0 to Long.MAX_VALUE. Unit: millisecond.

filter

The filter that you want to use to filter the query results on the server side. Only rows that meet the filter conditions are returned. For more information, see Configure a filter.

Note

If you configure the columnsToGet and filter parameters, Tablestore queries the columns that are specified by the columnsToGet parameter, and then returns the rows that meet the filter conditions.
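The interaction between maxVersions and timeRange described in the table above can be sketched in plain Java. This is only a model under stated assumptions: the class and method names (VersionSelectionSketch, select) are hypothetical, the code does not call the Tablestore SDK, and the versions of one column are represented as a TreeMap keyed by timestamp in milliseconds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Plain-Java model of version selection. Hypothetical helper, not part of the Tablestore SDK. */
class VersionSelectionSketch {

    /**
     * Returns the timestamps of the selected versions, newest first.
     *
     * @param versions    all versions of one column, keyed by timestamp (milliseconds)
     * @param maxVersions maximum number of versions to return, or null if not set
     * @param start       inclusive start of the time range, or null if no range is set
     * @param end         exclusive end of the time range, or null if no range is set
     */
    static List<Long> select(TreeMap<Long, String> versions, Integer maxVersions, Long start, Long end) {
        boolean hasRange = start != null && end != null;
        if (maxVersions == null && !hasRange) {
            // Mirrors the rule that at least one of maxVersions and timeRange must be configured.
            throw new IllegalArgumentException("Set maxVersions, a time range, or both");
        }
        List<Long> selected = new ArrayList<>();
        // Walk from the most recent version to the earliest one.
        for (Long ts : versions.descendingKeySet()) {
            // The time range is left-closed and right-open: [start, end).
            if (hasRange && (ts < start || ts >= end)) {
                continue;
            }
            if (maxVersions != null && selected.size() >= maxVersions) {
                break;
            }
            selected.add(ts);
        }
        return selected;
    }
}
```

With versions at timestamps 100, 200, 300, and 400, setting only maxVersions to 2 selects 400 and 300, setting only the range [200, 400) selects 300 and 200, and combining maxVersions 1 with the range [100, 300) selects 200.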

Sample code

You can specify the data version and the columns that you want to read, and filter the data by using a filter or a regular expression.

Read data of the latest version from the specified columns of a row

The following sample code provides an example on how to read data of the latest version from the specified columns of a row in a data table.

private static void getRow(SyncClient client, String pkValue) {
    // Construct the primary key. 
    PrimaryKeyBuilder primaryKeyBuilder = PrimaryKeyBuilder.createPrimaryKeyBuilder();
    primaryKeyBuilder.addPrimaryKeyColumn("pk", PrimaryKeyValue.fromString(pkValue));
    PrimaryKey primaryKey = primaryKeyBuilder.build();

    // Specify the table name and primary key to read a row of data. 
    SingleRowQueryCriteria criteria = new SingleRowQueryCriteria("<TABLE_NAME>", primaryKey);
    // Set the MaxVersions parameter to 1 to read the latest version of data. 
    criteria.setMaxVersions(1);
    GetRowResponse getRowResponse = client.getRow(new GetRowRequest(criteria));
    Row row = getRowResponse.getRow();

    System.out.println("Read complete. Result:");
    System.out.println(row);

    // Specify the columns that you want to read. 
    criteria.addColumnsToGet("Col0");
    getRowResponse = client.getRow(new GetRowRequest(criteria));
    row = getRowResponse.getRow();

    System.out.println("Read complete. Result:");
    System.out.println(row);
} 

Use a filter to filter data that is read

The following sample code provides an example on how to read data of the latest version from a row in a data table and use a filter to filter data based on the value of the Col0 column.

private static void getRow(SyncClient client, String pkValue) {
    // Construct the primary key. 
    PrimaryKeyBuilder primaryKeyBuilder = PrimaryKeyBuilder.createPrimaryKeyBuilder();
    primaryKeyBuilder.addPrimaryKeyColumn("pk", PrimaryKeyValue.fromString(pkValue));
    PrimaryKey primaryKey = primaryKeyBuilder.build();

    // Specify the table name and primary key to read a row of data. 
    SingleRowQueryCriteria criteria = new SingleRowQueryCriteria("<TABLE_NAME>", primaryKey);
    // Set the MaxVersions parameter to 1 to read the latest version of data. 
    criteria.setMaxVersions(1);

    // Configure a filter to return a row in which the value of the Col0 column is 0. 
    SingleColumnValueFilter singleColumnValueFilter = new SingleColumnValueFilter("Col0",
            SingleColumnValueFilter.CompareOperator.EQUAL, ColumnValue.fromLong(0));
    // If the Col0 column does not exist, the row is not returned. 
    singleColumnValueFilter.setPassIfMissing(false);
    criteria.setFilter(singleColumnValueFilter);

    GetRowResponse getRowResponse = client.getRow(new GetRowRequest(criteria));
    Row row = getRowResponse.getRow();

    System.out.println("Read complete. Result:");
    System.out.println(row);
}

Use a regular expression to filter data that is read

The following sample code provides an example on how to read the data of the Col1 column from a row in a data table and use a regular expression to filter data in the column.

private static void getRow(SyncClient client, String pkValue) {
    // Specify the name of the data table. 
    SingleRowQueryCriteria criteria = new SingleRowQueryCriteria("<TABLE_NAME>");
 
    // Construct the primary key. 
    PrimaryKey primaryKey = PrimaryKeyBuilder.createPrimaryKeyBuilder()
        .addPrimaryKeyColumn("pk", PrimaryKeyValue.fromString(pkValue))
        .build();
    criteria.setPrimaryKey(primaryKey);
 
    // Set the MaxVersions parameter to 1 to read the latest version of data. 
    criteria.setMaxVersions(1);
 
    // Configure a filter. A row is returned when cast<int>(regex(Col1)) is greater than 100. 
    RegexRule regexRule = new RegexRule("t1:([0-9]+),", RegexRule.CastType.VT_INTEGER);
    SingleColumnValueRegexFilter filter = new SingleColumnValueRegexFilter("Col1",
        regexRule, SingleColumnValueRegexFilter.CompareOperator.GREATER_THAN, ColumnValue.fromLong(100));
    criteria.setFilter(filter);
 
    GetRowResponse getRowResponse = client.getRow(new GetRowRequest(criteria));
    Row row = getRowResponse.getRow();

    System.out.println("Read complete. Result:");
    System.out.println(row);
}
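To make the rule cast<int>(regex(Col1)) > 100 more concrete, the following plain-Java sketch applies the same pattern on the client side with java.util.regex. It is an illustration only: the class name RegexRuleSketch is hypothetical, the code does not use the Tablestore SDK, and it assumes that a value that does not match the pattern does not pass the filter. The server-side filter may differ in details.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Client-side illustration of the rule cast<int>(regex(Col1)) > 100. Not SDK code. */
class RegexRuleSketch {

    // Same pattern as in the sample above: capture the digits between "t1:" and ",".
    private static final Pattern RULE = Pattern.compile("t1:([0-9]+),");

    /** Returns true if the number captured from the column value is greater than the threshold. */
    static boolean matches(String columnValue, long threshold) {
        Matcher m = RULE.matcher(columnValue);
        if (!m.find()) {
            // Assumption: a value that does not match the pattern does not pass.
            return false;
        }
        // Cast the captured substring to an integer before comparing.
        // A digit run too long for a long would throw NumberFormatException in this sketch.
        return Long.parseLong(m.group(1)) > threshold;
    }
}
```

For example, with a threshold of 100, the value "t1:120,t2:5," passes because the captured number is 120, whereas "t1:99," and "t2:500," do not.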

Read multiple rows of data at a time

You can call the BatchGetRow operation to read multiple rows of data from one or more tables at a time. A BatchGetRow operation consists of multiple GetRow operations, and you construct each of them in the same way as a standalone GetRow request.

If you call the BatchGetRow operation, each GetRow operation is separately performed, and Tablestore separately returns the response to each GetRow operation.

Usage notes

  • The BatchGetRow operation applies the same parameter settings to all rows that are read from a table. For example, if the ColumnsToGet parameter is set to [colA], only the value of the colA column is read from all rows.

  • When you call the BatchGetRow operation to read multiple rows at a time, some rows may fail to be read. In this case, Tablestore does not throw an exception. Instead, it returns a BatchGetRowResponse that includes the error messages of the failed rows. Therefore, always check the response after you call the BatchGetRow operation: use the isAllSucceed method of BatchGetRowResponse to check whether all rows were read, or use the getFailedRows method of BatchGetRowResponse to obtain information about the failed rows.

  • You can call the BatchGetRow operation to read a maximum of 100 rows at a time.
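Because a single BatchGetRow call can read at most 100 rows, a longer list of primary keys has to be split into several requests. The following sketch shows one way to do the splitting in plain Java. The class BatchChunker and its members are hypothetical helpers, not part of the Tablestore SDK, and the keys are generic so that the sketch runs without the SDK.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits a list of keys into batches that respect a per-request row limit. Hypothetical helper. */
class BatchChunker {

    /** Upper limit on rows per BatchGetRow call, as stated in the usage notes. */
    static final int MAX_ROWS_PER_BATCH = 100;

    /** Returns consecutive sublists of at most batchSize elements, in the original order. */
    static <T> List<List<T>> chunk(List<T> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < keys.size(); from += batchSize) {
            int to = Math.min(from + batchSize, keys.size());
            // Copy the view so that each batch is independent of the input list.
            batches.add(new ArrayList<>(keys.subList(from, to)));
        }
        return batches;
    }
}
```

Each resulting batch could then be turned into one request, for example by adding its keys to a MultiRowQueryCriteria as in the sample code in this section.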

Parameters

For more information, see the Parameters section of this topic.

Sample code

The following sample code provides an example on how to configure the version conditions, columns to read, and filters to read 10 rows of data.

private static void batchGetRow(SyncClient client) {
    // Specify the name of the data table. 
    MultiRowQueryCriteria multiRowQueryCriteria = new MultiRowQueryCriteria("<TABLE_NAME>");
    // Specify the 10 rows that you want to read. 
    for (int i = 0; i < 10; i++) {
        PrimaryKeyBuilder primaryKeyBuilder = PrimaryKeyBuilder.createPrimaryKeyBuilder();
        primaryKeyBuilder.addPrimaryKeyColumn("pk", PrimaryKeyValue.fromString("pk" + i));
        PrimaryKey primaryKey = primaryKeyBuilder.build();
        multiRowQueryCriteria.addRow(primaryKey);
    }
    // Add conditions. 
    multiRowQueryCriteria.setMaxVersions(1);
    multiRowQueryCriteria.addColumnsToGet("Col0");
    multiRowQueryCriteria.addColumnsToGet("Col1");
    SingleColumnValueFilter singleColumnValueFilter = new SingleColumnValueFilter("Col0",
            SingleColumnValueFilter.CompareOperator.EQUAL, ColumnValue.fromLong(0));
    singleColumnValueFilter.setPassIfMissing(false);
    multiRowQueryCriteria.setFilter(singleColumnValueFilter);

    BatchGetRowRequest batchGetRowRequest = new BatchGetRowRequest();
    // BatchGetRow allows you to read data from multiple tables. Each multiRowQueryCriteria parameter specifies query conditions for one table. You can add multiple multiRowQueryCriteria parameters to read data from multiple tables. 
    batchGetRowRequest.addMultiRowQueryCriteria(multiRowQueryCriteria);

    BatchGetRowResponse batchGetRowResponse = client.batchGetRow(batchGetRowRequest);

    System.out.println("Whether all operations are successful:" + batchGetRowResponse.isAllSucceed());
    System.out.println("Read complete. Result:");
    for (BatchGetRowResponse.RowResult rowResult : batchGetRowResponse.getSucceedRows()) {
        System.out.println(rowResult.getRow());
    }
    if (!batchGetRowResponse.isAllSucceed()) {
        for (BatchGetRowResponse.RowResult rowResult : batchGetRowResponse.getFailedRows()) {
            System.out.println("Failed rows:" + batchGetRowRequest.getPrimaryKey(rowResult.getTableName(), rowResult.getIndex()));
            System.out.println("Cause of failures:" + rowResult.getError());
        }

        /**
         * You can use the createRequestForRetry method to construct another request to retry the operations on failed rows. Only the retry request is constructed here. 
         * We recommend that you use the custom retry policy in Tablestore SDKs as the retry method. This feature allows you to retry failed rows after batch operations. After you set the retry policy, you do not need to add retry code to call the operation. 
         */
        BatchGetRowRequest retryRequest = batchGetRowRequest.createRequestForRetry(batchGetRowResponse.getFailedRows());
    }
}

For the complete sample code, see BatchGetRow on GitHub.

Read data whose primary key values are within a specific range

You can call the GetRange operation to read data whose primary key values are in the specified range.

The GetRange operation allows you to read data whose primary key values are in the specified range in a forward or backward direction. You can also specify the number of rows to read. If the range is large and the number of scanned rows or the volume of scanned data exceeds the upper limit, the scan stops, and the rows that are read and information about the primary key of the next row are returned. You can initiate a request to start from where the last operation left off and read the remaining rows based on the information about the primary key of the next row returned by the previous operation.

Note

In Tablestore tables, all rows are sorted by primary key. The primary key of a table consists of all primary key columns in sequence, so rows are sorted by the primary key as a whole, not by any single primary key column.

Usage notes

The GetRange operation follows the leftmost matching principle: Tablestore compares primary key values column by column, from the first primary key column to the last, to determine whether a row is in the specified range. For example, the primary key of a data table consists of the following primary key columns: PK1, PK2, and PK3. Tablestore first compares the PK1 value of a row with the PK1 values of the start and end primary keys. If the PK1 value is strictly between the two boundary values, the row is in the range, and the values of PK2 and PK3 are not compared. If the PK1 value is equal to a boundary value, Tablestore compares the PK2 value of the row with the PK2 value of that boundary in the same manner, and then PK3 if necessary. If the PK1 value is outside the two boundary values, the row is not in the range.
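One way to read the leftmost matching principle is as a column-by-column comparison of whole primary keys. The following plain-Java sketch models that reading. It assumes that every primary key column is a string and that rows are ordered by comparing columns in sequence. The class PrimaryKeyOrderSketch is hypothetical and does not use the Tablestore SDK.

```java
import java.util.List;

/** Models primary keys as lists of column values compared from the first column onward. */
class PrimaryKeyOrderSketch {

    /** Compares two primary keys column by column; the first differing column decides the order. */
    static int compare(List<String> a, List<String> b) {
        if (a.size() != b.size()) {
            throw new IllegalArgumentException("Primary keys must have the same number of columns");
        }
        for (int i = 0; i < a.size(); i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    /** Returns true if key lies in the left-closed, right-open range [inclusiveStart, exclusiveEnd). */
    static boolean inRange(List<String> key, List<String> inclusiveStart, List<String> exclusiveEnd) {
        return compare(key, inclusiveStart) >= 0 && compare(key, exclusiveEnd) < 0;
    }
}
```

With the range [("b", "2", "x"), ("d", "5", "x")), the key ("c", "9", "z") is in the range because its first column alone decides the comparison, whereas ("b", "1", "z") is not, because its first column equals the start boundary and its second column is smaller.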

If one of the following conditions is met, the GetRange operation may stop and return data:

  • The amount of scanned data reaches 4 MB.

  • The number of scanned rows reaches 5,000.

  • The number of returned rows reaches the upper limit.

  • The read throughput is insufficient to read the next row of data because all reserved read throughput is consumed.

Each GetRange call scans data once. If the amount of data that you want to scan is large, the scan stops when the number of scanned rows reaches 5,000 or the size of scanned data reaches 4 MB, and the response does not include the remaining data that meets the query conditions. To obtain the remaining data, send further requests by using the paging method.
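The paging behavior can be modeled without the SDK. In the following sketch, a TreeMap stands in for a table with a single string primary key column, scanOnce simulates one range call that stops after a fixed number of rows, and readAll repeats the call from the returned key until no key is returned. All names (RangePagingSketch, Page, scanOnce, readAll) are hypothetical, and the row cap is a parameter of the model rather than the real 5,000-row or 4 MB limits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

/** Plain-Java model of range paging. Does not call the Tablestore SDK. */
class RangePagingSketch {

    /** One page of rows plus the key to resume from (null when the range is exhausted). */
    static final class Page {
        final List<String> rows = new ArrayList<>();
        String nextStartKey;
    }

    /** Simulates one range call over [inclusiveStart, exclusiveEnd) that returns at most maxRows rows. */
    static Page scanOnce(NavigableMap<String, String> table, String inclusiveStart,
                         String exclusiveEnd, int maxRows) {
        if (maxRows <= 0) {
            throw new IllegalArgumentException("maxRows must be positive");
        }
        Page page = new Page();
        for (Map.Entry<String, String> e : table.subMap(inclusiveStart, true, exclusiveEnd, false).entrySet()) {
            if (page.rows.size() == maxRows) {
                // The cap is reached: report where the next call should start.
                page.nextStartKey = e.getKey();
                break;
            }
            page.rows.add(e.getValue());
        }
        return page;
    }

    /** Repeats scanOnce from the returned key until no next start key is returned. */
    static List<String> readAll(NavigableMap<String, String> table, String inclusiveStart,
                                String exclusiveEnd, int maxRows) {
        List<String> all = new ArrayList<>();
        String cursor = inclusiveStart;
        while (cursor != null) {
            Page page = scanOnce(table, cursor, exclusiveEnd, maxRows);
            all.addAll(page.rows);
            cursor = page.nextStartKey;
        }
        return all;
    }
}
```

The loop in readAll has the same shape as a client loop that keeps calling GetRange while the next start primary key in the response is not null.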

Parameters

Parameter

Description

tableName

The name of the data table.

direction

The order in which you want to sort the rows in the response.

  • If you set this parameter to FORWARD, the value of the inclusiveStartPrimaryKey parameter must be smaller than the value of the exclusiveEndPrimaryKey parameter, and the rows in the response are sorted in ascending order of primary key values.

  • If you set this parameter to BACKWARD, the value of the inclusiveStartPrimaryKey parameter must be greater than the value of the exclusiveEndPrimaryKey parameter, and the rows in the response are sorted in descending order of primary key values.

For example, a table has two primary key values A and B, and Value A is smaller than Value B. If you set the direction parameter to FORWARD and specify a [A, B) range for the table, Tablestore returns the rows whose primary key values are greater than or equal to Value A but smaller than Value B in ascending order from Value A to Value B. If you set the direction parameter to BACKWARD and specify a [B, A) range for the table, Tablestore returns the rows whose primary key values are smaller than or equal to Value B and greater than Value A in descending order from Value B to Value A.

inclusiveStartPrimaryKey and exclusiveEndPrimaryKey

The start primary key and end primary key of the range that you want to read. Each column of the start and end primary keys must be either a valid primary key value or a virtual value of the INF_MIN or INF_MAX type. The start and end primary keys must contain the same number of columns as the primary key of the specified table.

INF_MIN indicates an infinitely small value. All values of other types are greater than a value of the INF_MIN type. INF_MAX indicates an infinitely large value. All values of other types are smaller than a value of the INF_MAX type.

  • The inclusiveStartPrimaryKey parameter specifies the start primary key. If a row that contains the start primary key exists, the row of data is returned.

  • The exclusiveEndPrimaryKey parameter specifies the end primary key. Regardless of whether a row that contains the end primary key exists, the row of data is not returned.

The rows in a data table are sorted in ascending order based on primary key values. The range that is used to read data is a left-closed and right-open interval. If data is read in the forward direction, the rows whose primary key values are greater than or equal to the start primary key value but smaller than the end primary key value are returned.

limit

The maximum number of rows that can be returned. The value of this parameter must be greater than 0.

Tablestore stops an operation after the maximum number of rows that can be returned in the forward or backward direction is reached, even if some rows in the specified range are not returned. You can use the value of the nextStartPrimaryKey parameter returned in the response to read data in the next request.

columnsToGet

The columns that you want to read. You can specify the names of primary key columns or attribute columns.

  • If you do not specify a column, all data in the row is returned.

  • If you specify columns but the row does not contain the specified columns, the return value is null. If the row contains some of the specified columns, the data in those columns of the row is returned.

Note
  • By default, Tablestore returns the data from all columns of a row when you query the row. You can specify the columnsToGet parameter to return specific columns. For example, if col0 and col1 are set for the columnsToGet parameter, only the values of the col0 and col1 columns are returned.

  • If a row is in the specified range that you want to read based on primary key values but does not contain the specified columns that you want to return, the response excludes the row.

  • If you configure the columnsToGet and filter parameters, Tablestore queries the columns that are specified by the columnsToGet parameter, and then returns the rows that meet the filter conditions.

maxVersions

The maximum number of data versions that you can read.

Important

You must configure at least one of the following parameters: maxVersions and timeRange.

  • If only the maxVersions parameter is specified, data of the specified number of versions is returned from the most recent data entry to the earliest data entry.

  • If only the timeRange parameter is specified, all data whose versions are in the specified time range or data of the specified version is returned.

  • If the maxVersions and timeRange parameters are specified, data of the specified number of versions in the specified time range is returned from the most recent data entry to the earliest data entry.

timeRange

The range of versions or a specific version that you want to read. For more information, see TimeRange.

Important

You must configure at least one of the following parameters: maxVersions and timeRange.

  • If only the maxVersions parameter is specified, data of the specified number of versions is returned from the most recent data entry to the earliest data entry.

  • If only the timeRange parameter is specified, all data whose versions are in the specified time range or data of the specified version is returned.

  • If the maxVersions and timeRange parameters are specified, data of the specified number of versions in the specified time range is returned from the most recent data entry to the earliest data entry.

  • To query data whose versions are in the specified time range, you must configure the start and end parameters. The start parameter specifies the start timestamp. The end parameter specifies the end timestamp. The specified range is a left-closed and right-open interval that is in the [start, end) format.

  • To query data of a specific version, you must specify the timestamp parameter. The timestamp parameter specifies a specific timestamp.

Configure either the timestamp parameter or the [start, end) range, but not both.

Valid values of the timeRange parameter: 0 to Long.MAX_VALUE. Unit: millisecond.

filter

The filter that you want to use to filter the query results on the server side. Only rows that meet the filter conditions are returned. For more information, see Configure a filter.

Note

If you configure the columnsToGet and filter parameters, Tablestore queries the columns that are specified by the columnsToGet parameter, and then returns the rows that meet the filter conditions.

nextStartPrimaryKey

The start primary key value of the next read request. The value of the nextStartPrimaryKey parameter can be used to determine whether all data is read.

  • If the value of the nextStartPrimaryKey parameter is not empty in the response, the value can be used as the start primary key value for the next GetRange operation.

  • If the value of the nextStartPrimaryKey parameter is empty in the response, all data in the specified range is returned.
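The effect of the direction and limit parameters on a left-closed, right-open range can be modeled in plain Java. The following sketch uses integers as stand-ins for primary key values. The class DirectionSketch and its members are hypothetical and do not use the Tablestore SDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

/** Plain-Java model of FORWARD and BACKWARD range reads with a row limit. */
class DirectionSketch {

    enum Direction { FORWARD, BACKWARD }

    /** Returns at most limit keys from the range, starting at inclusiveStart and moving toward exclusiveEnd. */
    static List<Integer> read(NavigableSet<Integer> keys, Direction direction,
                              int inclusiveStart, int exclusiveEnd, int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be greater than 0");
        }
        NavigableSet<Integer> view;
        if (direction == Direction.FORWARD) {
            if (inclusiveStart >= exclusiveEnd) {
                throw new IllegalArgumentException("FORWARD requires start < end");
            }
            // Ascending order: start <= key < end.
            view = keys.subSet(inclusiveStart, true, exclusiveEnd, false);
        } else {
            if (inclusiveStart <= exclusiveEnd) {
                throw new IllegalArgumentException("BACKWARD requires start > end");
            }
            // Descending order: end < key <= start.
            view = keys.subSet(exclusiveEnd, false, inclusiveStart, true).descendingSet();
        }
        List<Integer> result = new ArrayList<>();
        for (Integer key : view) {
            if (result.size() == limit) {
                break;
            }
            result.add(key);
        }
        return result;
    }
}
```

With keys 1 to 10, a FORWARD read of [3, 7) yields 3, 4, 5, 6, a BACKWARD read of [7, 3) yields 7, 6, 5, 4, and the same BACKWARD read with a limit of 2 yields 7, 6.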

Sample code

Read data whose primary key values are in the specified range

The following sample code provides an example on how to read data whose primary key values are in the specified range in the forward direction. If the value of the nextStartPrimaryKey parameter is empty in the response, all data whose primary key values are in the specified range is read.

private static void getRange(SyncClient client, String startPkValue, String endPkValue) {
    // Specify the name of the data table. 
    RangeRowQueryCriteria rangeRowQueryCriteria = new RangeRowQueryCriteria("<TABLE_NAME>");

    // Specify the start primary key. 
    PrimaryKeyBuilder primaryKeyBuilder = PrimaryKeyBuilder.createPrimaryKeyBuilder();
    primaryKeyBuilder.addPrimaryKeyColumn("pk", PrimaryKeyValue.fromString(startPkValue));
    rangeRowQueryCriteria.setInclusiveStartPrimaryKey(primaryKeyBuilder.build());

    // Specify the end primary key. 
    primaryKeyBuilder = PrimaryKeyBuilder.createPrimaryKeyBuilder();
    primaryKeyBuilder.addPrimaryKeyColumn("pk", PrimaryKeyValue.fromString(endPkValue));
    rangeRowQueryCriteria.setExclusiveEndPrimaryKey(primaryKeyBuilder.build());

    rangeRowQueryCriteria.setMaxVersions(1);

    System.out.println("GetRange result:");
    while (true) {
        GetRangeResponse getRangeResponse = client.getRange(new GetRangeRequest(rangeRowQueryCriteria));
        for (Row row : getRangeResponse.getRows()) {
            System.out.println(row);
        }

        // If the value of the nextStartPrimaryKey parameter is not null, continue the read operation. 
        if (getRangeResponse.getNextStartPrimaryKey() != null) {
            rangeRowQueryCriteria.setInclusiveStartPrimaryKey(getRangeResponse.getNextStartPrimaryKey());
        } else {
            break;
        }
    }
}         

Read data within the range determined by the value of the first primary key column

The following sample code provides an example on how to read data in the forward direction within the range determined by the value of the first primary key column. The start value of the second primary key column is of the INF_MIN type. The end value of the second primary key column is of the INF_MAX type. If the value of the nextStartPrimaryKey parameter is null in the response, all data in the specified range is read.

private static void getRange(SyncClient client, String startPkValue, String endPkValue) {
    // Specify the name of the data table. 
    RangeRowQueryCriteria rangeRowQueryCriteria = new RangeRowQueryCriteria("<TABLE_NAME>");
    // Specify the start primary key. In this example, two primary key columns are used. 
    PrimaryKeyBuilder primaryKeyBuilder = PrimaryKeyBuilder.createPrimaryKeyBuilder();
    primaryKeyBuilder.addPrimaryKeyColumn("pk1", PrimaryKeyValue.fromString(startPkValue)); // Set the value of the first primary key column to a specific value. 
    primaryKeyBuilder.addPrimaryKeyColumn("pk2", PrimaryKeyValue.INF_MIN); // Set the value of the second primary key column to an infinitely small value. 
    rangeRowQueryCriteria.setInclusiveStartPrimaryKey(primaryKeyBuilder.build());

    // Specify the end primary key. 
    primaryKeyBuilder = PrimaryKeyBuilder.createPrimaryKeyBuilder();
    primaryKeyBuilder.addPrimaryKeyColumn("pk1", PrimaryKeyValue.fromString(endPkValue)); // Set the value of the first primary key column to a specific value. 
    primaryKeyBuilder.addPrimaryKeyColumn("pk2", PrimaryKeyValue.INF_MAX); // Set the value of the second primary key column to an infinitely large value. 
    rangeRowQueryCriteria.setExclusiveEndPrimaryKey(primaryKeyBuilder.build());

    rangeRowQueryCriteria.setMaxVersions(1);

    System.out.println("GetRange result:");
    while (true) {
        GetRangeResponse getRangeResponse = client.getRange(new GetRangeRequest(rangeRowQueryCriteria));
        for (Row row : getRangeResponse.getRows()) {
            System.out.println(row);
        }

        // If the value of the nextStartPrimaryKey parameter is not null, continue the read operation. 
        if (getRangeResponse.getNextStartPrimaryKey() != null) {
            rangeRowQueryCriteria.setInclusiveStartPrimaryKey(getRangeResponse.getNextStartPrimaryKey());
        } else {
            break;
        }
    }
}    

Read data whose primary key values are in the specified range and use a regular expression to filter data in the specified column

The following sample code provides an example on how to read data in which the value of the pk primary key column is in the range of ["2020-01-01.log", "2021-01-01.log") and use a regular expression to filter data in the Col1 column.

private static void getRange(SyncClient client) {
    // Specify the name of the data table. 
    RangeRowQueryCriteria criteria = new RangeRowQueryCriteria("<TABLE_NAME>");
 
    // Specify ["2020-01-01.log", "2021-01-01.log") as the range of the pk primary key column for the data that you want to read. The range is a left-closed and right-open interval. 
    PrimaryKey pk0 = PrimaryKeyBuilder.createPrimaryKeyBuilder()
        .addPrimaryKeyColumn("pk", PrimaryKeyValue.fromString("2020-01-01.log"))
        .build();
    PrimaryKey pk1 = PrimaryKeyBuilder.createPrimaryKeyBuilder()
        .addPrimaryKeyColumn("pk", PrimaryKeyValue.fromString("2021-01-01.log"))
        .build();
    criteria.setInclusiveStartPrimaryKey(pk0);
    criteria.setExclusiveEndPrimaryKey(pk1);
 
    // Set the MaxVersions parameter to 1 to read the latest version of data. 
    criteria.setMaxVersions(1);
 
    // Configure a filter. The regular expression extracts the digits that follow "t1:" in the Col1 column, and the result is cast to an integer. 
    // A row is returned when cast<int>(regex(Col1)) is greater than 100. 
    RegexRule regexRule = new RegexRule("t1:([0-9]+),", RegexRule.CastType.VT_INTEGER);
    SingleColumnValueRegexFilter filter = new SingleColumnValueRegexFilter("Col1",
        regexRule, SingleColumnValueRegexFilter.CompareOperator.GREATER_THAN, ColumnValue.fromLong(100));
    criteria.setFilter(filter);

    while (true) {
        GetRangeResponse resp = client.getRange(new GetRangeRequest(criteria));
        for (Row row : resp.getRows()) {
            // Process each returned row. In this example, the row is printed. 
            System.out.println(row);
        }
        if (resp.getNextStartPrimaryKey() != null) {
            criteria.setInclusiveStartPrimaryKey(resp.getNextStartPrimaryKey());
        } else {
            break;
        }
    }
}

For the complete sample code, see GetRange on GitHub.
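
To see what the condition cast<int>(regex(Col1)) > 100 means for a concrete value, the following sketch evaluates the same pattern locally with java.util.regex. It is only an illustration: the RegexFilterSketch class and the sample values are hypothetical, and the sketch assumes that the filter compares the first capture group, cast to an integer, with 100. Server-side matching may differ in details, for example in how values that do not match the pattern are handled.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexFilterSketch {
    // Same pattern as in the RegexRule of the sample: capture the digits that follow "t1:".
    private static final Pattern RULE = Pattern.compile("t1:([0-9]+),");

    // Extracts the first capture group and converts it to a long, if the pattern matches.
    static OptionalLong extract(String columnValue) {
        Matcher m = RULE.matcher(columnValue);
        if (!m.find()) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(Long.parseLong(m.group(1)));
    }

    // Mirrors the GREATER_THAN comparison with 100.
    // Assumption: values that do not match the pattern are rejected.
    static boolean passes(String columnValue) {
        OptionalLong value = extract(columnValue);
        return value.isPresent() && value.getAsLong() > 100;
    }

    public static void main(String[] args) {
        System.out.println(passes("t1:2048,t2:5")); // extracted value is 2048
        System.out.println(passes("t1:99,t2:5"));   // extracted value is 99
        System.out.println(passes("t2:500,"));      // pattern does not match
    }
}
```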

Read data whose primary key values are in the specified range by using an iterator

The following sample code provides an example on how to call the createRangeIterator operation to read data whose primary key values are in the specified range by using an iterator.

private static void getRangeByIterator(SyncClient client, String startPkValue, String endPkValue) {
    // Specify the name of the data table. 
    RangeIteratorParameter rangeIteratorParameter = new RangeIteratorParameter("<TABLE_NAME>");

    // Specify the start primary key. 
    PrimaryKeyBuilder primaryKeyBuilder = PrimaryKeyBuilder.createPrimaryKeyBuilder();
    primaryKeyBuilder.addPrimaryKeyColumn("pk", PrimaryKeyValue.fromString(startPkValue));
    rangeIteratorParameter.setInclusiveStartPrimaryKey(primaryKeyBuilder.build());

    // Specify the end primary key. 
    primaryKeyBuilder = PrimaryKeyBuilder.createPrimaryKeyBuilder();
    primaryKeyBuilder.addPrimaryKeyColumn("pk", PrimaryKeyValue.fromString(endPkValue));
    rangeIteratorParameter.setExclusiveEndPrimaryKey(primaryKeyBuilder.build());

    rangeIteratorParameter.setMaxVersions(1);

    Iterator<Row> iterator = client.createRangeIterator(rangeIteratorParameter);

    System.out.println("Results obtained when an iterator is used in a GetRange operation:");
    while (iterator.hasNext()) {
        Row row = iterator.next();
        System.out.println(row);
    }
}           

For the complete sample code, see GetRangeByIterator on GitHub.
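
An iterator of this kind hides the paging loop from the caller. The following sketch shows the general pattern with an in-memory source. It is not the implementation used by the SDK, and PagingIterator and its page-fetching function are hypothetical names chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

// Hypothetical iterator that pulls rows page by page and exposes them one at a time.
class PagingIterator<T> implements Iterator<T> {
    // Returns the page at the given index, or an empty list when no data is left.
    private final IntFunction<List<T>> fetchPage;
    private final Deque<T> buffer = new ArrayDeque<>();
    private int nextPage = 0;
    private boolean exhausted = false;

    PagingIterator(IntFunction<List<T>> fetchPage) {
        this.fetchPage = fetchPage;
    }

    @Override
    public boolean hasNext() {
        // Fetch lazily: request another page only when the buffer is empty.
        while (buffer.isEmpty() && !exhausted) {
            List<T> page = fetchPage.apply(nextPage++);
            if (page.isEmpty()) {
                exhausted = true;
            } else {
                buffer.addAll(page);
            }
        }
        return !buffer.isEmpty();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return buffer.removeFirst();
    }

    public static void main(String[] args) {
        List<List<String>> pages = Arrays.asList(
                Arrays.asList("row1", "row2"), Arrays.asList("row3"));
        Iterator<String> iterator = new PagingIterator<String>(
                i -> i < pages.size() ? pages.get(i) : Collections.<String>emptyList());
        while (iterator.hasNext()) {
            System.out.println(iterator.next());
        }
    }
}
```

The caller sees a single stream of rows, while requests are issued only when the buffered rows run out.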

Read data in parallel queries

Tablestore SDK for Java provides the TableStoreReader class that encapsulates the BatchGetRow operation. You can use this class to concurrently query data of a data table. TableStoreReader also supports multi-table queries, statistics on query status, row-level callback, and custom configurations.

Important

TableStoreReader is supported by Tablestore SDK for Java V5.16.1 and later. Make sure that you use a valid SDK version. For more information about the version history of Tablestore SDK for Java, see Version history of Tablestore SDK for Java.

Get started

  1. Construct the TableStoreReader.

    String endpoint = "<ENDPOINT>";
    String accessKeyId = System.getenv("OTS_AK_ENV");
    String accessKeySecret = System.getenv("OTS_SK_ENV");
    String instanceName = "<INSTANCE_NAME>";
    
    AsyncClientInterface client = new AsyncClient(endpoint, accessKeyId, accessKeySecret, instanceName);
    TableStoreReaderConfig config = new TableStoreReaderConfig();
    ThreadPoolExecutor executor = new ThreadPoolExecutor(4, 4, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>(1024));
    
    TableStoreReader reader = new DefaultTableStoreReader(client, config, executor, null);
  2. Construct the request.

    Add the primary keys of the rows that you want to query to the in-memory cache of the reader. You can add one or more primary keys at a time.

    PrimaryKey pk1 = PrimaryKeyBuilder.createPrimaryKeyBuilder()
            .addPrimaryKeyColumn("pk1", PrimaryKeyValue.fromLong(0))
            .addPrimaryKeyColumn("pk2", PrimaryKeyValue.fromLong(0))
            .build();
    // Add the primary key of the row that you want to query in the specified table. 
    Future<ReaderResult> readerResult = reader.addPrimaryKeyWithFuture("<TABLE_NAME1>", pk1);
    // You can also pass a list to add the primary keys of multiple rows at a time. 
    // Add the primary keys of the rows that you want to query to primaryKeyList before you pass the list. 
    List<PrimaryKey> primaryKeyList = new ArrayList<PrimaryKey>();
    Future<ReaderResult> batchReaderResult = reader.addPrimaryKeysWithFuture("<TABLE_NAME2>", primaryKeyList);
  3. Query data.

    Send a request to query the data that is cached in the memory. You can query data in blocking mode or non-blocking mode.

    • Query data in blocking mode

      reader.flush();
    • Query data in non-blocking mode

      reader.send();
  4. Obtain the result of the query.

    // Display the information about successful and failed queries. 
    for (RowReadResult success : readerResult.get().getSucceedRows()) {
        System.out.println(success.getRowResult());
    }
    
    for (RowReadResult fail : readerResult.get().getFailedRows()) {
        System.out.println(fail.getRowResult());
    }
  5. Close the TableStoreReader.

    reader.close();
    // Close the client and executor based on your actual requirements. 
    client.shutdown();
    executor.shutdown();
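
Step 4 calls readerResult.get(), which blocks until the result is available. Assuming that the returned object is a standard java.util.concurrent.Future, as the declared type suggests, the timed get overload can be used to bound the wait. The following sketch demonstrates this with a plain ExecutorService task that stands in for the reader. FutureTimeoutSketch, getOrFallback, and the simulated delay are hypothetical and are not part of Tablestore SDK for Java.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class FutureTimeoutSketch {
    // Waits up to timeoutMillis for the result and returns fallback if it is not ready in time.
    static String getOrFallback(Future<String> future, long timeoutMillis, String fallback) {
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // Give up on the pending work.
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // Preserve the interrupt status.
            throw new IllegalStateException("Interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Query failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // A task that completes immediately.
            Future<String> fast = pool.submit(() -> "row data");
            System.out.println(getOrFallback(fast, 1000, "timed out"));

            // A task that takes much longer than the allowed wait.
            Future<String> slow = pool.submit(() -> {
                Thread.sleep(5000); // Simulated slow query.
                return "late row data";
            });
            System.out.println(getOrFallback(slow, 100, "timed out"));
        } finally {
            pool.shutdownNow();
        }
    }
}
```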

Parameters

You can customize the configurations of the TableStoreReader by modifying the TableStoreReaderConfig.

Parameter

Description

checkTableMeta

Specifies whether to check the structure of the table when you add the rows to be queried. Default value: true.

Set this parameter to false if you do not need to check the table structure when you add the rows to be queried.

bucketCount

The number of cache buckets in the memory of the reader. Default value: 4.

bufferSize

The size of the RingBuffer for each bucket. Default value: 1024.

concurrency

The maximum concurrency that is allowed for the batchGetRow operation. Default value: 10.

maxBatchRowsCount

The maximum number of rows that can be queried by calling the batchGetRow operation. Default value: 100. Maximum value: 100.

defaultMaxVersions

The maximum number of data versions that can be queried by calling the getRow operation. Default value: 1.

flushInterval

The interval at which the cached requests are automatically flushed. Default value: 10000. Unit: millisecond.

logInterval

The interval at which the status of tasks is automatically printed. Default value: 10000. Unit: millisecond.
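
As a rough illustration of how maxBatchRowsCount relates to the number of requests, the following sketch splits a list of keys into batches of at most 100 entries. It assumes that the reader groups queued rows into batches that are no larger than maxBatchRowsCount. The actual grouping inside TableStoreReader may also depend on bucketing and flush timing, which this sketch ignores. BatchPlanSketch and partition are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class BatchPlanSketch {
    // Splits keys into consecutive batches that each hold at most maxBatchRowsCount entries.
    static <T> List<List<T>> partition(List<T> keys, int maxBatchRowsCount) {
        if (maxBatchRowsCount < 1) {
            throw new IllegalArgumentException("maxBatchRowsCount must be at least 1");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < keys.size(); from += maxBatchRowsCount) {
            int to = Math.min(from + maxBatchRowsCount, keys.size());
            batches.add(new ArrayList<>(keys.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> keys = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            keys.add(i);
        }
        // With a limit of 100 rows per batch, 250 keys are split into batches of 100, 100, and 50.
        for (List<Integer> batch : partition(keys, 100)) {
            System.out.println(batch.size());
        }
    }
}
```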

Specify query conditions

You can specify table-level parameters to query data, such as the maximum number of data versions, the columns to be queried, and the time range within which you want to query data.

// Query a maximum of 10 versions of data in the col1 column of the specified table within the previous 60 seconds. 
// Specify the name of the data table. 
RowQueryCriteria criteria = new RowQueryCriteria("<TABLE_NAME>");
// Specify the columns to be returned. 
criteria.addColumnsToGet("col1");
// Specify the maximum number of versions to be returned. 
criteria.setMaxVersions(10);
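// Specify the time range of the data to be returned. In this example, the range is the previous 60 seconds. 
criteria.setTimeRange(new TimeRange(System.currentTimeMillis() - 60 * 1000, System.currentTimeMillis()));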
reader.setRowQueryCriteria(criteria);

Sample code

public class TableStoreReaderDemo {
    private static final String endpoint = "<ENDPOINT>";
    private static final String accessKeyId = System.getenv("OTS_AK_ENV");
    private static final String accessKeySecret = System.getenv("OTS_SK_ENV");
    private static final String instanceName = "<INSTANCE_NAME>";
    private static AsyncClientInterface client;
    private static ExecutorService executor;
    private static AtomicLong succeedRows = new AtomicLong();
    private static AtomicLong failedRows = new AtomicLong();

    public static void main(String[] args) throws ExecutionException, InterruptedException {
        /**
         * Step 1: Construct the TableStoreReader. 
         */
        // Construct the AsyncClient. 
        client = new AsyncClient(endpoint, accessKeyId, accessKeySecret, instanceName);
        // Construct the configuration class of the reader. 
        TableStoreReaderConfig config = new TableStoreReaderConfig();
        {
            // The following settings are optional. If you do not configure them, the default values are used. 
            // Check the structure of the table before you add the data that you want to query to the reader. 
            config.setCheckTableMeta(true);  
            // The maximum number of rows that can be queried in a request. The maximum value is 100. 
            config.setMaxBatchRowsCount(100);    
            // The default maximum number of column versions that can be obtained. 
            config.setDefaultMaxVersions(1);
            // The total number of concurrent requests that can be sent. 
            config.setConcurrency(16); 
            // The number of cache buckets in the memory. 
            config.setBucketCount(4);      
            // The interval at which all cached data is sent. 
            config.setFlushInterval(10000);      
            // The interval at which the status of the reader is recorded. 
            config.setLogInterval(10000);                   
        }
        // Construct an executor that is used to send the request. 
        ThreadFactory threadFactory = new ThreadFactory() {
            private final AtomicInteger counter = new AtomicInteger(1);

            @Override
            public Thread newThread(Runnable r) {
                return new Thread(r, "reader-" + counter.getAndIncrement());
            }
        };
        executor = new ThreadPoolExecutor(4, 4, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>(1024), threadFactory, new ThreadPoolExecutor.CallerRunsPolicy());

        // Construct the callback function of the reader. 
        TableStoreCallback<PrimaryKeyWithTable, RowReadResult> callback = new TableStoreCallback<PrimaryKeyWithTable, RowReadResult>() {
            @Override
            public void onCompleted(PrimaryKeyWithTable req, RowReadResult res) {
                succeedRows.incrementAndGet();
            }

            @Override
            public void onFailed(PrimaryKeyWithTable req, Exception ex) {
                failedRows.incrementAndGet();
            }
        };
        TableStoreReader reader = new DefaultTableStoreReader(client, config, executor, callback);
        /**
         * Step 2: Construct the request. 
         */
        // Add the primary key of a row that you want to query to the memory. 
        PrimaryKey pk1 = PrimaryKeyBuilder.createPrimaryKeyBuilder()
                .addPrimaryKeyColumn("pk1", PrimaryKeyValue.fromLong(0))
                .addPrimaryKeyColumn("pk2", PrimaryKeyValue.fromLong(0))
                .build();
        reader.addPrimaryKey("<TABLE_NAME1>", pk1);

        // Add the primary key of another row to the memory and obtain a Future object that holds the query result. 
        PrimaryKey pk2 = PrimaryKeyBuilder.createPrimaryKeyBuilder()
                .addPrimaryKeyColumn("pk1", PrimaryKeyValue.fromLong(0))
                .addPrimaryKeyColumn("pk2", PrimaryKeyValue.fromLong(0))
                .build();
        Future<ReaderResult> readerResult = reader.addPrimaryKeyWithFuture("<TABLE_NAME2>", pk2);
        /**
         * Step 3: Query data. 
         */
        // Send data from the memory in non-blocking mode. 
        reader.send();
        /**
         * Step 4: Obtain the result of the query. 
         */
        // Display the information about successful and failed queries. 
        for (RowReadResult success : readerResult.get().getSucceedRows()) {
            System.out.println(success.getRowResult());
        }
        for (RowReadResult fail : readerResult.get().getFailedRows()) {
            System.out.println(fail.getRowResult());
        }
        /**
         * Step 5: Close the TableStoreReader. 
         */
        reader.close();
        client.shutdown();
        executor.shutdown();
    }
}
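
The sample calls executor.shutdown(), which stops the executor from accepting new tasks but does not wait for running tasks to finish. If the application must wait for in-flight work before it exits, the standard ExecutorService shutdown pattern can be applied. The following sketch shows that pattern with only JDK classes. ShutdownSketch and shutdownAndAwait are hypothetical names, and the timeout value is an arbitrary example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ShutdownSketch {
    // Returns true if all tasks finished within the timeout. Otherwise, cancels the remaining tasks.
    static boolean shutdownAndAwait(ExecutorService executor, long timeoutMillis) {
        executor.shutdown(); // Stop accepting new tasks. Tasks that are already queued still run.
        try {
            if (executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
            executor.shutdownNow(); // Interrupt tasks that are still running.
            return false;
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt(); // Preserve the interrupt status.
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        executor.execute(() -> System.out.println("task done"));
        System.out.println("terminated cleanly: " + shutdownAndAwait(executor, 5000));
    }
}
```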

References

  • If you want to accelerate data queries, you can use secondary indexes or search indexes. For more information, see Secondary Index or Search Index.

  • If you want to visualize the data of a table, you can connect Tablestore to DataV or Grafana. For more information, see Data visualization tools.

  • If you want to download data from a table to a local file, you can use DataX or the Tablestore CLI. For more information, see Download data in Tablestore to a local file.

  • If you want to perform computing or analytics on data in Tablestore, you can use the SQL query feature of Tablestore. For more information, see Overview.

    Note

    You can also use compute engines such as MaxCompute, Spark, Hive, Hadoop MapReduce, Function Compute, and Realtime Compute for Apache Flink to compute and analyze data in a table. For more information, see Overview.