Terms
Term | Description |
---|---|
project | An organizational unit in DataHub. Each project contains one or more topics. DataHub projects are independent of MaxCompute projects. You cannot use MaxCompute projects as DataHub projects. |
topic | The minimum unit for data subscription and publishing in DataHub. You can use topics to distinguish different types of streaming data. For more information about the limits on the number of projects and topics, see Limits. |
time-to-live (TTL) period of a topic | The period for which each record is retained in a topic. Unit: day. Valid values: 1 to 7. |
shard | A channel through which data is written to a topic concurrently. Each shard has a unique ID. A shard can be in one of several states, which are described in the table in the "Shard states" section of this topic. Each active shard consumes server resources, so we recommend that you create only as many shards as you need. |
shard hash key range | The range of hash key values that a shard covers, expressed as the half-open interval [starting hash key, ending hash key): the starting hash key is included and the ending hash key is excluded. The hashing mechanism ensures that all records with the same shard key are written to the same shard. For more information, see DataHub SDK for Java. |
shard merge | The operation that merges two adjacent shards. Two shards are considered adjacent if the hash key ranges for the two shards form a contiguous set with no gaps. For more information, see Manage shards. |
shard split | The operation that splits one shard into two adjacent shards. |
record | A unit of data that is written to DataHub. |
record type | The data type of records in a topic. Valid values: TUPLE and BLOB. Each record in a TUPLE topic is a sequence of immutable objects. Each record in a BLOB topic is a chunk of binary data stored as a single entity. |
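The following Python sketch illustrates how half-open hash key ranges route records to shards and how split and merge operations keep the ranges contiguous. It assumes that hash keys are 128-bit integers derived from the MD5 digest of the shard key. The `Shard` class and the `route`, `split`, and `merge` helpers are hypothetical and are not part of DataHub SDK for Java or any other DataHub SDK.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass

# Assumption: hash keys are 128-bit integers and a record's hash key is the
# MD5 digest of its shard key. This is an illustration, not DataHub's
# documented hashing scheme.
MAX_HASH_KEY = 2 ** 128


@dataclass(frozen=True)
class Shard:
    shard_id: str
    begin: int  # starting hash key (inclusive)
    end: int    # ending hash key (exclusive)

    def contains(self, hash_key: int) -> bool:
        # Half-open interval [begin, end)
        return self.begin <= hash_key < self.end


def hash_key_of(shard_key: str) -> int:
    """Map a shard key to a 128-bit hash key (hypothetical helper)."""
    digest = hashlib.md5(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def route(shards: list[Shard], shard_key: str) -> Shard:
    """Return the shard whose range contains the key's hash value."""
    hash_key = hash_key_of(shard_key)
    for shard in shards:
        if shard.contains(hash_key):
            return shard
    raise ValueError("hash key is not covered by any shard")


def are_adjacent(a: Shard, b: Shard) -> bool:
    """Two ranges are adjacent if one ends exactly where the other begins."""
    return a.end == b.begin or b.end == a.begin


def merge(a: Shard, b: Shard, new_id: str) -> Shard:
    """Merge two adjacent shards into one shard that covers both ranges."""
    if not are_adjacent(a, b):
        raise ValueError("only adjacent shards can be merged")
    return Shard(new_id, min(a.begin, b.begin), max(a.end, b.end))


def split(shard: Shard, split_key: int, left_id: str, right_id: str) -> tuple[Shard, Shard]:
    """Split one shard into two adjacent shards at split_key."""
    if not shard.begin < split_key < shard.end:
        raise ValueError("split key must fall strictly inside the shard range")
    return Shard(left_id, shard.begin, split_key), Shard(right_id, split_key, shard.end)
```

For example, splitting a shard that covers the full range at `MAX_HASH_KEY // 2` yields two shards whose ranges meet at that key, and merging them restores the original range. Because `route` depends only on the hash of the shard key, the same key always maps to the same shard as long as the shard layout does not change.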
Data types
The following table describes the data types that are supported in a topic of the TUPLE type.
Type | Description | Valid value |
---|---|---|
BIGINT | An 8-byte signed integer. | -9223372036854775807 to 9223372036854775807 |
DOUBLE | A double-precision floating-point number. It is eight bytes in length. | -1.0 × 10^308 to 1.0 × 10^308 |
BOOLEAN | The Boolean data type. | True and False, true and false, or 0 and 1 |
TIMESTAMP | The timestamp data type. | The value is accurate to microseconds. |
STRING | A string. Only UTF-8 encoding is supported. | The size of a string must not exceed 2 MB. |
TINYINT | A 1-byte signed integer. | -128 to 127 |
SMALLINT | A 2-byte signed integer. | -32768 to 32767 |
INTEGER | A 4-byte signed integer. | -2147483648 to 2147483647 |
FLOAT | A single-precision floating-point number. It is four bytes in length. | -3.40282347 × 10^38 to 3.40282347 × 10^38 |
DECIMAL | A decimal number. | -10^38 + 1 to 10^38 - 1 |
DataHub SDK for Java supports the TINYINT, SMALLINT, INTEGER, and FLOAT data types in V2.16.1-public and later.
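The following sketch checks a value against the ranges in the data type table before a record is written. `is_valid` is a hypothetical client-side helper, not a DataHub API. It assumes that 2 MB means 2 × 1,024 × 1,024 bytes and omits TIMESTAMP. The checks that DataHub itself performs may differ.

```python
from decimal import Decimal

# Inclusive integer bounds copied from the data type table.
INT_BOUNDS = {
    "TINYINT": (-128, 127),
    "SMALLINT": (-32768, 32767),
    "INTEGER": (-2147483648, 2147483647),
    "BIGINT": (-9223372036854775807, 9223372036854775807),
}
FLOAT_MAX = 3.40282347e38
DOUBLE_MAX = 1.0e308
# Built from exact ints: Decimal arithmetic would round to 28 digits.
DECIMAL_MAX = Decimal(10 ** 38 - 1)
DECIMAL_MIN = Decimal(-(10 ** 38 - 1))
STRING_MAX_BYTES = 2 * 1024 * 1024  # assumption: 2 MB = 2 * 1024 * 1024 bytes
BOOLEAN_LITERALS = {"True", "False", "true", "false", "0", "1"}


def is_valid(type_name: str, value) -> bool:
    """Hypothetical check of a value against the documented ranges."""
    if type_name in INT_BOUNDS:
        low, high = INT_BOUNDS[type_name]
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool) and low <= value <= high
    if type_name == "FLOAT":
        return abs(float(value)) <= FLOAT_MAX
    if type_name == "DOUBLE":
        return abs(float(value)) <= DOUBLE_MAX
    if type_name == "DECIMAL":
        # Comparisons do not round, so large values are checked exactly.
        return DECIMAL_MIN <= Decimal(value) <= DECIMAL_MAX
    if type_name == "STRING":
        # The limit applies to the UTF-8 encoded size, not the character count.
        return isinstance(value, str) and len(value.encode("utf-8")) <= STRING_MAX_BYTES
    if type_name == "BOOLEAN":
        return isinstance(value, bool) or str(value) in BOOLEAN_LITERALS
    raise ValueError(f"unsupported type: {type_name}")
```

Note that BIGINT in the table excludes -9223372036854775808, so `is_valid("BIGINT", -9223372036854775808)` returns `False` even though the value fits in 8 bytes.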
In a topic of the BLOB type, each record stores one chunk of binary data. Such records are Base64-encoded when they are written to DataHub.
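The following sketch shows Base64 encoding and decoding of a binary payload with the Python standard `base64` module. `encode_blob` and `decode_blob` are hypothetical helpers; whether a DataHub SDK performs this step for you is not covered in this topic.

```python
import base64


def encode_blob(data: bytes) -> str:
    """Base64-encode a binary payload into an ASCII string."""
    return base64.b64encode(data).decode("ascii")


def decode_blob(encoded: str) -> bytes:
    """Decode a Base64 string back into the original bytes.

    validate=True rejects characters outside the Base64 alphabet.
    """
    return base64.b64decode(encoded, validate=True)
```

For example, `encode_blob(b"hello")` returns `"aGVsbG8="`, and `decode_blob` restores the original bytes, including non-printable ones such as `b"\x00\xff"`.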
Shard states
State | Description |
---|---|
Opening | The shard is being activated. All shards in a topic are in this state while the topic is being created. You cannot read data from or write data to a shard in this state. |
Active | Read and write operations are allowed when a shard is in the Active state. |
Closing | When a shard is being split or two shards are being merged, the shards are in the Closing state. You cannot perform read or write operations on shards in this state. |
Closed | A shard enters the Closed state after the split or merge operation is complete. A shard in the Closed state is read-only. |
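The following sketch encodes the read and write permissions from the shard states table as a small state machine. The `ShardState` enum and the `can_read`, `can_write`, and `advance` helpers are hypothetical. The allowed transitions are an assumption inferred from the state descriptions (Opening to Active at topic creation, and Active to Closing to Closed during a split or merge); they are not a documented DataHub API.

```python
from enum import Enum


class ShardState(Enum):
    OPENING = "Opening"
    ACTIVE = "Active"
    CLOSING = "Closing"
    CLOSED = "Closed"


# (readable, writable) for each state, as described in the shard states table.
PERMISSIONS = {
    ShardState.OPENING: (False, False),
    ShardState.ACTIVE: (True, True),
    ShardState.CLOSING: (False, False),
    ShardState.CLOSED: (True, False),
}

# Assumption: transitions inferred from the state descriptions.
TRANSITIONS = {
    ShardState.OPENING: {ShardState.ACTIVE},
    ShardState.ACTIVE: {ShardState.CLOSING},
    ShardState.CLOSING: {ShardState.CLOSED},
    ShardState.CLOSED: set(),
}


def can_read(state: ShardState) -> bool:
    return PERMISSIONS[state][0]


def can_write(state: ShardState) -> bool:
    return PERMISSIONS[state][1]


def advance(current: ShardState, target: ShardState) -> ShardState:
    """Move to the target state, rejecting transitions not listed in TRANSITIONS."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

In this model, a writer would check `can_write` before sending data; an attempt to write to a Closed shard corresponds to the InvalidShardOperation error described in the "Error codes" section.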
Error codes
Error code | HTTP status code | Description |
---|---|---|
InvalidUriSpec | 400 | The specified URI is invalid. |
InvalidParameter | 400 | The specified parameter is invalid. Check the returned error message for details. |
Unauthorized | 401 | The request signature is invalid. |
NoPermission | 403 | The account does not have the permissions to perform the operation. |
InvalidSchema | 400 | The schema format is invalid. |
InvalidCursor | 400 | The cursor is invalid or has expired. |
NoSuchProject | 404 | The specified project does not exist. |
NoSuchTopic | 404 | The specified topic does not exist. |
NoSuchShard | 404 | The specified shard ID does not exist. |
ProjectAlreadyExist | 400 | A project with the specified name already exists. |
TopicAlreadyExist | 400 | A topic with the specified name already exists. |
InvalidShardOperation | 405 | The operation is not allowed on the shard. For example, you cannot write data to a shard in the Closed state. |
LimitExceeded | 400 | A specified threshold is exceeded. For example, this error is returned if you try to create more than 512 shards in a topic. |
InternalServerError | 500 | An unknown or internal error occurred, or the system is being updated. |
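The following sketch maps each error code to its HTTP status code and chooses a follow-up action. The status codes come from the table above. The `next_action` policy is a hypothetical example, not documented DataHub behavior: it retries server-side (5xx) errors, obtains a new cursor for InvalidCursor, and otherwise stops so that the request can be corrected.

```python
# HTTP status codes from the error code table.
ERROR_STATUS = {
    "InvalidUriSpec": 400,
    "InvalidParameter": 400,
    "Unauthorized": 401,
    "NoPermission": 403,
    "InvalidSchema": 400,
    "InvalidCursor": 400,
    "NoSuchProject": 404,
    "NoSuchTopic": 404,
    "NoSuchShard": 404,
    "ProjectAlreadyExist": 400,
    "TopicAlreadyExist": 400,
    "InvalidShardOperation": 405,
    "LimitExceeded": 400,
    "InternalServerError": 500,
}


def next_action(code: str) -> str:
    """Hypothetical policy for handling a returned error code."""
    status = ERROR_STATUS.get(code)
    if status is None:
        raise KeyError(f"unknown error code: {code}")
    if status >= 500:
        return "retry"           # internal error or system update: try again later
    if code == "InvalidCursor":
        return "refresh_cursor"  # cursor is invalid or expired: obtain a new one
    return "fail"                # client-side problem: fix the request first
```

A real client would typically add backoff and a retry limit to the "retry" branch; those details are outside the scope of this sketch.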