This topic describes the stability and limits of the new version of data shipping from Simple Log Service (SLS) to MaxCompute.
Stability
Reading from SLS
| Item | Description |
| --- | --- |
| Availability | High availability. If an error occurs in SLS and data cannot be read, the MaxCompute data shipping task retries at least 10 times internally. If the task still fails, an error is reported and the task restarts. |
Writing to MaxCompute
| Item | Description |
| --- | --- |
| Concurrency | Shipping instances are created based on SLS shards, which supports rapid scale-out. If a source Logstore in SLS performs a shard split, the shipping instances can be scaled out within seconds to accelerate data export. |
| No data loss | MaxCompute data shipping tasks are built on consumer groups to ensure consistency. The offset is committed only after the data is delivered, so the offset is never committed before the data is written to MaxCompute. This prevents data loss. |
| Schema changes | If you add a column to a MaxCompute table during data shipping, the new column is written only to new partitions. It is not written to old partitions or the current partition. Note: Due to MaxCompute limits, you cannot insert, update, or delete columns, or change the column order in the table schema during data shipping. If you perform these operations, the data shipping task becomes abnormal and cannot be recovered. For more information, see MaxCompute limits. |
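The offset-after-write ordering behind the no-data-loss guarantee can be sketched as follows. This is a minimal illustration of at-least-once delivery, not the SLS or MaxCompute API; `read_batch`, `write_to_maxcompute`, and `commit_offset` are hypothetical stand-ins.

```python
def ship_batch(read_batch, write_to_maxcompute, commit_offset):
    """Ship one batch with at-least-once semantics.

    The consumer offset is committed only AFTER the batch is written,
    so a crash between the write and the commit can duplicate data on
    replay but can never lose it.
    """
    records, next_offset = read_batch()
    write_to_maxcompute(records)   # may raise; the offset is then NOT committed
    commit_offset(next_offset)     # committed only after a successful write
    return next_offset
```

If the write raises, the offset stays at its previous value, so the same batch is read again after the task restarts.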
Handling dirty data
| Error type | Counted as a failed record | Description |
| --- | --- | --- |
| Partition error | Yes | Common causes include an invalid partition or a non-existent partition key column. The record is not written to MaxCompute. |
| Invalid data column | No | Common causes include a data type mismatch or a type conversion failure. The data in this column is not written to MaxCompute. Data in the other columns is written to MaxCompute as normal. |
| Data column too long | No | A common cause is data that exceeds the length limit of the string or varchar type. The data in this column is truncated and then written to MaxCompute. Data in the other columns is written to MaxCompute as normal. |
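The three rules above can be modeled roughly as follows. This is an illustrative simulation, not the real shipper: the record layout and the `valid_partition` flag are assumptions, and a non-string value here stands in for any type-conversion failure.

```python
def apply_dirty_data_rules(record, valid_partition, max_len):
    """Apply the dirty-data rules from the table above to one record.

    record: dict mapping column name -> value.
    Returns the cleaned dict, or None when the whole record is dropped.
    """
    if not valid_partition:
        return None                      # partition error: drop the record
    cleaned = {}
    for col, value in record.items():
        if not isinstance(value, str):
            continue                     # invalid column: drop this column only
        cleaned[col] = value[:max_len]   # too long: truncate, keep the rest
    return cleaned
```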
Monitoring and alerts
| Item | Description |
| --- | --- |
| Monitoring and alerts | Data shipping provides comprehensive monitoring to track metrics such as the latency and traffic of shipping tasks in real time. Configure custom alerts based on your business needs to promptly detect shipping issues, such as insufficient export instances or network quota limits. For more information, see Set an alert for a MaxCompute data shipping task (new version). |
Restarting a task
| Item | Description |
| --- | --- |
| Too many partitions | When a task restarts, if there are too many partitions and the write operation does not complete within 5 minutes, data duplication may occur. |
| Data write failure | When a task restarts and fails to write data to MaxCompute due to an authorization or network error, partial data duplication may occur. |
Limits
Network
| Item | Description |
| --- | --- |
| Network for intra-region shipping | When data is shipped within the same region, it is transmitted over the Alibaba Cloud internal network, which provides better network stability and speed. |
Read traffic
| Item | Description |
| --- | --- |
| Read traffic | A maximum traffic limit applies to a single project and to a single shard. For more information, see Data reads and writes. If the limit is exceeded, split the shard or request an increase of the read traffic limit for the project. Exceeding the limit causes the MaxCompute data shipping task to fail when reading data. The task retries at least 10 times internally. If it still fails, an error is reported and the task restarts. |
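The retry-then-restart behavior described above might look like this in outline. This is a sketch only; `read_once`, the error type, and the restart signal are hypothetical stand-ins, not the actual task internals.

```python
import time

def read_with_retries(read_once, retries=10, backoff_seconds=0.0):
    """Call read_once until it succeeds, retrying on read errors.

    After `retries` consecutive failures, raise to signal that the
    shipping task should report an error and restart.
    """
    for attempt in range(retries):
        try:
            return read_once()
        except IOError:
            # Back off before retrying, e.g. when the source is throttled.
            time.sleep(backoff_seconds * (attempt + 1))
    raise RuntimeError("read failed after %d retries; restarting task" % retries)
```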
Writing to MaxCompute
| Item | Description |
| --- | --- |
| Concurrent instances | The maximum number of concurrent export instances is 64. If the number of shards exceeds 64, multiple shards are merged into one instance for export. The system tries to keep the number of shards in each instance the same. |
| Write threshold | Important: If you exceed the MaxCompute write limit, writing data to MaxCompute becomes unstable and triggers throttling on the MaxCompute side. This can cause FlowExceeded or SlotExceed errors. Contact the MaxCompute on-duty engineers to resolve the issue. |
| Prohibition of table schema modification | MaxCompute data shipping (new version) uses MaxCompute stream writing. During stream writing, the MaxCompute Tunnel service prohibits schema modifications such as inserting, updating, or deleting columns, or changing the column order in the target table. For more information, see Overview of MaxCompute Tunnel. Due to this restriction, you cannot use MaxCompute data shipping (new version) and MaxCompute data shipping (legacy version) to write data to the same MaxCompute table at the same time. |
| Unsupported special tables | You cannot write data to MaxCompute external tables, transactional tables, or clustered tables. |
| Table schema changes | If your MaxCompute table schema changes, you must pause the MaxCompute data shipping task for 20 minutes and then restart it for the schema change to take effect. |
| Start time | Note: Due to the slot and queries per second (QPS) limits of MaxCompute, shipping historical data can easily exceed the MaxCompute write threshold. Therefore, shipping historical data is no longer supported. |
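The shard-to-instance merging can be illustrated with a simple round-robin grouping. The distribution strategy here is an assumption for illustration; the real scheduler may assign shards differently, but it likewise keeps per-instance shard counts even.

```python
def assign_shards(shard_ids, max_instances=64):
    """Group shards into at most max_instances export instances,
    keeping the per-instance shard counts as even as possible."""
    count = min(len(shard_ids), max_instances)
    instances = [[] for _ in range(count)]
    for i, shard in enumerate(shard_ids):
        instances[i % count].append(shard)  # round-robin: sizes differ by <= 1
    return instances
```

For example, 130 shards map to 64 instances, each handling 2 or 3 shards; 10 shards map to 10 single-shard instances.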
Permission management
| Item | Description |
| --- | --- |
| Write authorization | MaxCompute write authorization supports both Resource Access Management (RAM) users and RAM roles. You must grant the write permissions separately in MaxCompute. |
Data types
Regular columns
| Type | Example | Description |
| --- | --- | --- |
| string | "hello" | Maximum length: 8 MB. |
| datetime | "2021-12-22 05:00:00" | The data in SLS must meet the data format requirements of MaxCompute. |
| date | "2021-12-22" | The data in SLS must meet the data format requirements of MaxCompute. |
| timestamp | 1648544867 | Millisecond or second precision. |
| decimal | 1.2 | The data in SLS must meet the data format requirements of MaxCompute. |
| char | "hello" | Maximum length: 255 bytes. |
| varchar | "hello" | Maximum length: 65,535 bytes. |
| binary | "hello" | Maximum length: 8 MB. |
| bigint | 123 | Supports up to int64. |
| boolean | 1 | 1, t, T, true, TRUE, and True are parsed as True. 0, f, F, false, FALSE, and False are parsed as False. |
| double | 1.2 | Supports up to 64-bit floating-point numbers. |
| float | 1.2 | Supports up to 32-bit floating-point numbers. |
| integer | 123 | Supports up to int32. |
| smallint | 12 | Supports up to int16. |
| tinyint | 12 | Supports up to int8. |
Partition key columns
| Item | Description |
| --- | --- |
| Partition key column | Treated as a string. Must meet the format requirements for MaxCompute partition key columns. |
| Log fields other than __partition_time__ and __receive_time__ | If you configure a log field other than __partition_time__ or __receive_time__ as a partition key column, data shipping performance may be affected. |
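Rendering a log timestamp into a partition value might look like this. The format string is illustrative only; use the partition format configured for your shipping task, and keep the output within MaxCompute's format requirements for partition key values.

```python
from datetime import datetime, timezone

def partition_value(epoch_seconds, fmt="%Y_%m_%d_%H"):
    """Render a partition key value from a log timestamp as a string.
    Digits and underscores are used here to stay within characters that
    partition values commonly allow (an assumption; check your table)."""
    t = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return t.strftime(fmt)
```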
Manage shipping
| Item | Description |
| --- | --- |
| Pausing a data shipping task | A data shipping task records the log cursor of the last delivery. When the task resumes, it continues shipping from the recorded cursor. |
MaxCompute IP whitelist
| Item | Description |
| --- | --- |
| Enabling an IP address whitelist in MaxCompute project management, such as a classic network IP address whitelist, may cause data shipping to fail | Run commands in MaxCompute to resolve the data shipping failure caused by the whitelist. For more information, see What do I do if data fails to be shipped to a MaxCompute project after I enable an IP address whitelist for the classic network in the MaxCompute project? |