
Tablestore: Batch read offline data

Last Updated: May 31, 2024

Tablestore provides the BulkExport operation to batch read offline data from a data table in big data scenarios. After data is written to a data table, you can batch read the rows whose primary key values are within a specific range.

Prerequisites

  • An OTSClient instance is initialized. For more information, see Initialize an OTSClient instance.

  • A data table is created, and data is written to the data table.

Parameters

tableName

The name of the data table.

inclusiveStartPrimaryKey

exclusiveEndPrimaryKey

The start and end primary keys for the batch read operation. The start and end primary keys must be valid primary keys or virtual points that consist of values of the INF_MIN and INF_MAX types. The number of columns in a virtual point must be the same as the number of primary key columns in the data table.

INF_MIN indicates an infinitely small value; all values of other types are greater than a value of the INF_MIN type. INF_MAX indicates an infinitely great value; all values of other types are smaller than a value of the INF_MAX type. For a sketch that uses virtual points, see the example after this parameter list.

  • The inclusiveStartPrimaryKey parameter specifies the start primary key. If a row that contains the start primary key exists, the row is returned.

  • The exclusiveEndPrimaryKey parameter specifies the end primary key. Regardless of whether a row that contains the end primary key exists, the row is not returned.

The rows in a data table are sorted in ascending order of primary key values, and the read range is a left-closed, right-open interval: when data is read in the forward direction, the rows whose primary key values are greater than or equal to the start primary key value and smaller than the end primary key value are returned.
columnsToGet

The columns that you want to read. You can specify the names of primary key columns or attribute columns.

  • If you do not specify a column, all data in the row is returned.

  • If you specify columns and the row contains some of the specified columns, the data in those columns is returned. If the row does not contain any of the specified columns, the row is not returned.

Note
  • By default, Tablestore returns the data from all columns of a row when you query the row. You can specify the columnsToGet parameter to return specific columns. For example, if col0 and col1 are set for the columnsToGet parameter, only the values of the col0 and col1 columns are returned.

  • If a row is within the specified primary key range but does not contain any of the specified columns, the row is excluded from the response.

  • If you configure the columnsToGet and filter parameters, Tablestore queries the columns that are specified by the columnsToGet parameter, and then returns the rows that meet the filter conditions.

filter

The filter that you want to use to filter the query results on the server side. Only rows that meet the filter conditions are returned. For more information, see Configure a filter. The sketch after this parameter list shows how a filter can be attached to the query criteria.

Note

If you configure the columnsToGet and filter parameters, Tablestore queries the columns that are specified by the columnsToGet parameter, and then returns the rows that meet the filter conditions.

dataBlockType

The encoding of the data that is returned for the read request. Valid values: PlainBuffer and SimpleRowMatrix.
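
To illustrate the range and filter parameters described above, the following hedged sketch exports the entire table by using INF_MIN and INF_MAX virtual points and filters rows on the server side. It assumes the PrimaryKeyValue.INF_MIN and PrimaryKeyValue.INF_MAX constants, the SingleColumnValueFilter class, and a setFilter method on BulkExportQueryCriteria in the Tablestore SDK for Java; the table name, the pk primary key column, and the DC1 attribute column are placeholders.

private static void bulkExportFullRange(SyncClient client) {
    // Build virtual points that span the entire table: INF_MIN is smaller than
    // any primary key value and INF_MAX is greater than any primary key value.
    // Each virtual point contains the same number of columns as the primary key.
    PrimaryKeyBuilder startBuilder = PrimaryKeyBuilder.createPrimaryKeyBuilder();
    startBuilder.addPrimaryKeyColumn("pk", PrimaryKeyValue.INF_MIN);

    PrimaryKeyBuilder endBuilder = PrimaryKeyBuilder.createPrimaryKeyBuilder();
    endBuilder.addPrimaryKeyColumn("pk", PrimaryKeyValue.INF_MAX);

    BulkExportQueryCriteria criteria = new BulkExportQueryCriteria("<TABLE_NAME>");
    criteria.setInclusiveStartPrimaryKey(startBuilder.build());
    criteria.setExclusiveEndPrimaryKey(endBuilder.build());
    criteria.setDataBlockType(DataBlockType.DBT_PLAIN_BUFFER);

    // Server-side filter: return only rows whose DC1 value is greater than 10.
    // DC1 is a hypothetical attribute column used for illustration.
    SingleColumnValueFilter filter = new SingleColumnValueFilter("DC1",
            SingleColumnValueFilter.CompareOperator.GREATER_THAN, ColumnValue.fromLong(10));
    filter.setPassIfMissing(false); // exclude rows that do not contain DC1
    criteria.setFilter(filter);

    BulkExportRequest request = new BulkExportRequest();
    request.setBulkExportQueryCriteria(criteria);
    BulkExportResponse response = client.bulkExport(request);

    for (Row row : new PlainBufferBlockParser(response.getRows()).getRows()) {
        System.out.println(row);
    }
}

Because the virtual points contain the same number of columns as the primary key (one, in this sketch), they form a valid left-closed, right-open range that covers every row in the table.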

Examples

The following sample code provides an example of how to batch read data whose primary key values are within a specific range:

private static void bulkExport(SyncClient client, String start, String end){
    // Specify the start primary key. 
    PrimaryKeyBuilder startPrimaryKeyBuilder = PrimaryKeyBuilder.createPrimaryKeyBuilder();
    startPrimaryKeyBuilder.addPrimaryKeyColumn("pk", PrimaryKeyValue.fromString(start));
    PrimaryKey startPrimaryKey = startPrimaryKeyBuilder.build();

    // Specify the end primary key. 
    PrimaryKeyBuilder endPrimaryKeyBuilder = PrimaryKeyBuilder.createPrimaryKeyBuilder();
    endPrimaryKeyBuilder.addPrimaryKeyColumn("pk", PrimaryKeyValue.fromString(end));
    PrimaryKey endPrimaryKey = endPrimaryKeyBuilder.build();

    // Create a bulkExportRequest. 
    BulkExportRequest bulkExportRequest = new BulkExportRequest();
    // Create a bulkExportQueryCriteria. 
    BulkExportQueryCriteria bulkExportQueryCriteria = new BulkExportQueryCriteria("<TABLE_NAME>");

    bulkExportQueryCriteria.setInclusiveStartPrimaryKey(startPrimaryKey);
    bulkExportQueryCriteria.setExclusiveEndPrimaryKey(endPrimaryKey);
    // Use the DBT_PLAIN_BUFFER encoding method. 
    bulkExportQueryCriteria.setDataBlockType(DataBlockType.DBT_PLAIN_BUFFER);
    // If you want to use the DBT_SIMPLE_ROW_MATRIX encoding method, use the following code. 
    // bulkExportQueryCriteria.setDataBlockType(DataBlockType.DBT_SIMPLE_ROW_MATRIX);
    bulkExportQueryCriteria.addColumnsToGet("pk");
    bulkExportQueryCriteria.addColumnsToGet("DC1");
    bulkExportQueryCriteria.addColumnsToGet("DC2");

    bulkExportRequest.setBulkExportQueryCriteria(bulkExportQueryCriteria);
    // Obtain the bulkExportResponse. 
    BulkExportResponse bulkExportResponse = client.bulkExport(bulkExportRequest);
    
    // If you set DataBlockType to DBT_SIMPLE_ROW_MATRIX, you need to use the following code to print the result. 
    //{
    //    SimpleRowMatrixBlockParser parser = new SimpleRowMatrixBlockParser(bulkExportResponse.getRows());
    //    List<Row> rows = parser.getRows();
    //    for (int i = 0; i < rows.size(); i++){
    //        System.out.println(rows.get(i));
    //    }
    //}

    // Set DataBlockType to DBT_PLAIN_BUFFER and print the result. 
    {
        PlainBufferBlockParser parser = new PlainBufferBlockParser(bulkExportResponse.getRows());
        List<Row> rows = parser.getRows();
        for (int i = 0; i < rows.size(); i++){
            System.out.println(rows.get(i));
        }
    }
}
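
A single BulkExport call may not return all rows in a large range. The following sketch pages through a range by restarting each call at the position where the previous one stopped. It assumes that BulkExportResponse exposes a getNextStartPrimaryKey() accessor that returns the resume position, or null when the range is exhausted, analogous to GetRangeResponse; check BulkExportResponse.java, linked in the references below, for the exact signature.

private static void bulkExportAllPages(SyncClient client, PrimaryKey startPrimaryKey, PrimaryKey endPrimaryKey) {
    PrimaryKey next = startPrimaryKey;
    while (next != null) {
        BulkExportQueryCriteria criteria = new BulkExportQueryCriteria("<TABLE_NAME>");
        criteria.setInclusiveStartPrimaryKey(next);
        criteria.setExclusiveEndPrimaryKey(endPrimaryKey);
        criteria.setDataBlockType(DataBlockType.DBT_PLAIN_BUFFER);

        BulkExportRequest request = new BulkExportRequest();
        request.setBulkExportQueryCriteria(criteria);
        BulkExportResponse response = client.bulkExport(request);

        for (Row row : new PlainBufferBlockParser(response.getRows()).getRows()) {
            System.out.println(row);
        }

        // Assumed accessor: returns where the next call should start, or null
        // when all rows in the range have been returned.
        next = response.getNextStartPrimaryKey();
    }
}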

References

  • For more information about the API operation, see BulkExport.

  • For more information about how to call the operation to batch read offline data, see BulkExportRequest.java and BulkExportResponse.java.

  • If you want to accelerate data queries, you can use secondary indexes or search indexes. For more information, see Secondary Index or Search indexes.

  • If you want to visualize the data of a table, you can connect Tablestore to DataV or Grafana. For more information, see Data visualization tools.

  • If you want to download data from a table to a local file, you can use DataX or the Tablestore CLI. For more information, see Download data in Tablestore to a local file.

  • If you want to perform computing or analytics on data in Tablestore, you can use the SQL query feature of Tablestore. For more information, see Overview.

    Note

    You can also use compute engines such as MaxCompute, Spark, Hive, Hadoop MapReduce, Function Compute, and Realtime Compute for Apache Flink to compute and analyze data in a table. For more information, see Overview.