Divides the data in a table into several logical splits, each approximately the specified size. The operation returns the split points between the splits and information about the hosts on which the splits reside. Compute engines use this operation to determine execution plans, such as concurrency plans.
Request syntax
message ComputeSplitPointsBySizeRequest {
    required string table_name = 1;
    required int64 split_size = 2; // in 100MB
    optional int64 split_size_unit_in_byte = 3;
    optional int32 split_point_limit = 4;
}
Parameter | Type | Required | Description |
table_name | string | Yes | The name of the table whose data you want to divide. |
split_size | int64 | Yes | The approximate size of each split. Unit: 100 MB, as noted in the request syntax. |
split_size_unit_in_byte | int64 | No | The size unit, in bytes, to be used in splitting. This parameter is used in split point calculation to ensure calculation accuracy. |
split_point_limit | int32 | No | The maximum number of split points to return. Use this parameter to limit the size of the returned result. |
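The following minimal Python sketch shows one way to reason about the approximate split size that a request asks for. It rests on two assumptions that this page does not confirm: that the effective size is split_size multiplied by the unit, and that the default unit of "100MB" means 100 × 1024 × 1024 bytes. The helper name approximate_split_bytes is hypothetical and is not part of any SDK.

```python
# Assumption: the default unit "100MB" is 100 * 1024 * 1024 bytes.
DEFAULT_UNIT_BYTES = 100 * 1024 * 1024


def approximate_split_bytes(split_size, split_size_unit_in_byte=None):
    """Return the approximate size of one split in bytes.

    Assumption: effective size = split_size * unit, where the unit is
    split_size_unit_in_byte when given and 100 MB otherwise.
    """
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    if split_size_unit_in_byte is not None and split_size_unit_in_byte <= 0:
        raise ValueError("split_size_unit_in_byte must be positive")
    unit = (DEFAULT_UNIT_BYTES
            if split_size_unit_in_byte is None
            else split_size_unit_in_byte)
    return split_size * unit


# split_size = 2 with the default unit: about 200 MB per split.
default_request_bytes = approximate_split_bytes(2)

# split_size = 64 with a 1 MB unit: about 64 MB per split.
fine_grained_bytes = approximate_split_bytes(64, 1024 * 1024)
```

Under these assumptions, a smaller unit lets a caller request split sizes that are not multiples of 100 MB.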
Response syntax
message ComputeSplitPointsBySizeResponse {
    required ConsumedCapacity consumed = 1;
    repeated PrimaryKeySchema schema = 2;
    /**
     * Split points between splits, in increasing order.
     *
     * A split is a consecutive range of primary keys
     * whose data size is about the split_size specified in the request.
     * The size is approximate and may not be precise.
     *
     * A split point is an array of primary-key columns w.r.t. the table schema,
     * which is never longer than that of the table schema.
     * Trailing -inf values are omitted to reduce transmission payloads.
     */
    repeated bytes split_points = 3;
    /**
     * Locations where the splits lie.
     *
     * By the managed nature of TableStore, these locations are no more than hints.
     * If a location is not suitable to be seen, an empty string will be placed.
     */
    repeated SplitLocation locations = 4;
}
Parameter | Type | Description |
consumed | ConsumedCapacity | The number of capacity units (CUs) that are consumed by this request. |
schema | repeated PrimaryKeySchema | The primary key schema of the table. The schema is the same as the schema that was defined when the table was created. |
split_points | repeated bytes | The split points between splits, returned in increasing order. Each split point is a row of data in the PlainBuffer format and contains only primary key columns. Trailing -inf values in each split point are omitted to reduce the amount of transmitted data. |
locations | repeated SplitLocation | The information about the hosts on which the splits reside. These locations are only hints. If a location cannot be exposed, an empty string is returned in its place. |
For example, if a table contains three primary key columns and the data type of the first primary key column is string, the following splits are obtained after you call this operation: (-inf,-inf,-inf) to ("a",-inf,-inf), ("a",-inf,-inf) to ("b",-inf,-inf), ("b",-inf,-inf) to ("c",-inf,-inf), ("c",-inf,-inf) to ("d",-inf,-inf), and ("d",-inf,-inf) to (+inf,+inf,+inf). The first three splits reside on machine-A and the other two splits reside on machine-B. In this case, the value of split_points is [("a"),("b"),("c"),("d")], and the value of locations is "machine-A"*3, "machine-B"*2.
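The example above can be sketched in Python to show how a client might turn split_points and locations back into key ranges. This is only an illustration under stated assumptions: split points are shown as already-decoded tuples rather than PlainBuffer bytes, each location is modeled as a hypothetical (host, repeat) pair to mirror the "machine-A"*3 notation, and the names pad and build_splits are invented for this sketch.

```python
NEG_INF = "-inf"  # placeholder for the minimum primary key value
POS_INF = "+inf"  # placeholder for the maximum primary key value


def pad(point, pk_columns):
    """Restore the omitted trailing -inf values of a split point."""
    return tuple(point) + (NEG_INF,) * (pk_columns - len(point))


def build_splits(split_points, locations, pk_columns):
    """Return (lower_bound, upper_bound, host) for every split.

    split_points: decoded split points in increasing order.
    locations: (host, repeat) pairs; this shape is an assumption.
    pk_columns: number of primary key columns in the table schema.
    """
    first = (NEG_INF,) * pk_columns
    last = (POS_INF,) * pk_columns
    bounds = [first] + [pad(p, pk_columns) for p in split_points] + [last]
    ranges = list(zip(bounds[:-1], bounds[1:]))
    # Expand the run-length style location list to one host per split.
    hosts = [host for host, repeat in locations for _ in range(repeat)]
    if len(hosts) != len(ranges):
        raise ValueError("locations do not cover every split")
    return [(lo, hi, host) for (lo, hi), host in zip(ranges, hosts)]


splits = build_splits(
    split_points=[("a",), ("b",), ("c",), ("d",)],
    locations=[("machine-A", 3), ("machine-B", 2)],
    pk_columns=3,
)
for lower, upper, host in splits:
    print(lower, "to", upper, "on", host)
```

Four split points yield five splits, and each split is paired with a host hint. Because locations are only hints, a real client should also tolerate an empty host string.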
Use Tablestore SDKs
Tablestore SDK for Java: Split data into shards of a specific size
CU consumption
The number of read CUs that are consumed is the same as the number of splits. No write CUs are consumed.
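As a rough illustration of this rule, the following Python sketch derives the read CU count from a response. It assumes that n split points always delimit n + 1 splits, as in the example on this page (four split points, five splits). The function name read_cus_consumed is hypothetical, and in practice the authoritative value is the consumed field of the response.

```python
def read_cus_consumed(split_points):
    """Estimate the read CUs: one per split, and n split points give n + 1 splits."""
    return len(split_points) + 1


# The split points from the example: ("a"), ("b"), ("c"), ("d").
example_points = [("a",), ("b",), ("c",), ("d",)]
estimated_read_cus = read_cus_consumed(example_points)
write_cus = 0  # this operation consumes no write CUs
```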