
Tablestore:ComputeSplitPointsBySize

Last Updated:Aug 23, 2024

Divides data in a table into several logical splits whose sizes are approximately the specified value. The split points between the splits and the information about hosts on which the splits reside are returned. This operation is used by compute engines to determine execution plans such as concurrency plans.

Request syntax

message ComputeSplitPointsBySizeRequest {
    required string table_name = 1;
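    required int64 split_size = 2;               // approximate split size, in units of 100 MB
    optional int64 split_size_unit_in_byte = 3;  // size unit used for split_size, in bytes
    optional int32 split_point_limit = 4;        // maximum number of split points to return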
}

table_name (string, required): The name of the table whose data you want to divide.

split_size (int64, required): The approximate size of each split. Unit: 100 MB.

split_size_unit_in_byte (int64, optional): The size unit of split_size, in bytes. Split point calculation uses this unit so that the calculation is performed with the correct unit and remains precise.

split_point_limit (int32, optional): The maximum number of split points that the calculation returns.
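Because split_size is expressed in units of 100 MB, a caller that thinks in bytes has to convert the value first. The following Python sketch shows one way to do this. The helper name `split_size_for` is hypothetical, and treating 1 MB as 1,048,576 bytes is an assumption, because this reference does not define the byte size of the unit.

```python
# Assumption: "100 MB" is taken to mean 100 * 1024 * 1024 bytes.
# This reference does not state whether MB means 10**6 or 2**20 bytes.
SPLIT_SIZE_UNIT_BYTES = 100 * 1024 * 1024


def split_size_for(target_bytes: int) -> int:
    """Hypothetical helper: convert a desired split size in bytes into a
    split_size value expressed in units of 100 MB.

    The result is rounded up so that each requested split is at least
    roughly target_bytes in size.
    """
    if target_bytes <= 0:
        raise ValueError("target_bytes must be a positive number of bytes")
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return -(-target_bytes // SPLIT_SIZE_UNIT_BYTES)


if __name__ == "__main__":
    one_gib = 1024 * 1024 * 1024
    print(split_size_for(one_gib))  # 1 GiB is 10.24 units, rounded up to 11
```

Rounding up rather than down is a design choice: it errs toward fewer, slightly larger splits instead of more, smaller ones.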

Response syntax

message ComputeSplitPointsBySizeResponse {
    required ConsumedCapacity consumed = 1;
    repeated PrimaryKeySchema schema = 2;

    /**
     * Split points between splits, in ascending order.
     *
     * A split is a contiguous range of primary keys whose data size is
     * approximately the split_size specified in the request.
     * The size cannot always be precise.
     *
     * A split point is an array of primary key column values that follows
     * the table schema and is never longer than the primary key.
     * Trailing -inf values are omitted to reduce the transmitted payload.
     */
    repeated bytes split_points = 3;

    /**
     * Locations of the splits.
     *
     * Because Tablestore is a managed service, these locations are only hints.
     * If a location cannot be exposed, an empty string is returned in its place.
     */
    repeated SplitLocation locations = 4;
}

consumed (ConsumedCapacity): The number of capacity units (CUs) consumed by this request.

schema (repeated PrimaryKeySchema): The primary key schema of the table, which is the same as the schema defined when the table was created.

split_points (repeated bytes): The split points between splits, in ascending order. Each split point is a row in the PlainBuffer format that contains only primary key columns. Trailing -inf values in each split point are omitted to reduce the amount of transmitted data.

locations (repeated SplitLocation): The hosts on which the splits reside. These locations are only hints. If the location of a split cannot be exposed, an empty string is returned for it.
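Because locations are only hints, a compute engine might use them for locality-aware scheduling and fall back to any worker when a hint is empty. The following Python sketch groups split indexes by host. It assumes the locations have already been expanded to one host string per split, in split order; the function name and the `ANY_HOST` bucket are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

# Hypothetical bucket for splits whose location hint is an empty string.
ANY_HOST = "<any>"


def group_splits_by_host(locations: Sequence[str]) -> Dict[str, List[int]]:
    """Map each host hint to the indexes of the splits that reside on it.

    Assumes one location string per split, in the same order as the splits.
    An empty string means no hint is available, so the split is placed in
    the ANY_HOST bucket and can be scheduled on any worker.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for index, host in enumerate(locations):
        groups[host or ANY_HOST].append(index)
    return dict(groups)


if __name__ == "__main__":
    hints = ["machine-A", "machine-A", "", "machine-B"]
    print(group_splits_by_host(hints))
    # {'machine-A': [0, 1], '<any>': [2], 'machine-B': [3]}
```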

For example, assume that a table has three primary key columns and that the first primary key column is of the string type. Calling this operation returns the following five splits: (-inf,-inf,-inf) to ("a",-inf,-inf), ("a",-inf,-inf) to ("b",-inf,-inf), ("b",-inf,-inf) to ("c",-inf,-inf), ("c",-inf,-inf) to ("d",-inf,-inf), and ("d",-inf,-inf) to (+inf,+inf,+inf). The first three splits reside on machine-A and the other two reside on machine-B. In this case, the value of split_points is [("a"),("b"),("c"),("d")], and the value of locations is "machine-A"*3, "machine-B"*2, which means machine-A for each of the first three splits followed by machine-B for each of the last two.
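The example above can be reproduced with a short Python sketch that rebuilds the splits from the split points. It assumes the split points have already been decoded from PlainBuffer into tuples of primary key values (decoding is not shown), represents -inf and +inf with hypothetical sentinel objects, and reads the "machine-A"*3 notation as (host, count) pairs. The function names are illustrative and are not part of any Tablestore SDK.

```python
from typing import List, Sequence, Tuple


class _Infinity:
    """Sentinel standing in for the -inf and +inf primary key values."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return self.name


NEG_INF = _Infinity("-inf")
POS_INF = _Infinity("+inf")

Key = Tuple[object, ...]


def splits_from_points(points: Sequence[Key], pk_count: int) -> List[Tuple[Key, Key]]:
    """Turn ascending split points into (start, end) primary key ranges.

    n split points yield n + 1 splits. The first split starts at all -inf and
    the last one ends at all +inf. Trailing columns omitted from a split point
    are padded back with -inf.
    """
    bounds: List[Key] = [(NEG_INF,) * pk_count]
    for point in points:
        if len(point) > pk_count:
            raise ValueError("split point is longer than the primary key")
        bounds.append(tuple(point) + (NEG_INF,) * (pk_count - len(point)))
    bounds.append((POS_INF,) * pk_count)
    # Consecutive bounds form the [start, end) range of each split.
    return list(zip(bounds[:-1], bounds[1:]))


def expand_locations(runs: Sequence[Tuple[str, int]]) -> List[str]:
    """Expand (host, count) pairs such as ("machine-A", 3) into one host per split."""
    return [host for host, count in runs for _ in range(count)]


if __name__ == "__main__":
    splits = splits_from_points([("a",), ("b",), ("c",), ("d",)], pk_count=3)
    hosts = expand_locations([("machine-A", 3), ("machine-B", 2)])
    for (start, end), host in zip(splits, hosts):
        print(start, "to", end, "on", host)
```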

Use Tablestore SDKs

Tablestore SDK for Java: Split data into shards of a specific size

CU consumption

The number of read CUs that are consumed is the same as the number of splits. No write CUs are consumed.
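Because each split costs one read CU and n split points divide the key range into n + 1 splits, the cost of a call follows directly from the response. The Python sketch below computes that value and also gives a rough estimate from a known table size before calling. Both function names are hypothetical, the estimate assumes 1 MB = 1,048,576 bytes, and it is only approximate because split sizes are themselves approximate.

```python
from typing import Sequence

# Assumption: one split_size unit is 100 * 1024 * 1024 bytes.
BYTES_PER_SPLIT_SIZE_UNIT = 100 * 1024 * 1024


def read_cus_from_response(split_points: Sequence[bytes]) -> int:
    """Read CUs consumed by a call: one per split, and splits = split points + 1."""
    return len(split_points) + 1


def estimate_read_cus(table_bytes: int, split_size: int) -> int:
    """Rough estimate before calling; actual split sizes are only approximate."""
    if table_bytes < 0 or split_size <= 0:
        raise ValueError("table_bytes must be >= 0 and split_size must be > 0")
    split_bytes = split_size * BYTES_PER_SPLIT_SIZE_UNIT
    # Assumption: even an empty table yields one split covering (-inf, +inf).
    return max(1, -(-table_bytes // split_bytes))
```

In the three-column example, four split points produce five splits, so that call consumes five read CUs.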