You can ship logs from Simple Log Service to MaxCompute. This topic describes the stability and limits of the new version of the data shipping feature.
Stability
Data read from Simple Log Service
Item | Description |
Availability | High availability is provided. If a MaxCompute data shipping job fails to read data from Simple Log Service due to an error in Simple Log Service, the job is retried at least 10 times. If the job still fails, an error is reported, and the job is restarted. |
Data write to MaxCompute
Item | Description |
Concurrency | Data shipping instances can be created based on shards, and the resources that are used for data shipping can be scaled out. If shards in the source Logstore of a data shipping instance are split, the required resources can be scaled out within a few seconds to accelerate the data export process. |
Data consistency | Resource scaling is implemented based on consumer groups to ensure data consistency. For more information, see Consumer groups. An offset is committed only after data is shipped to MaxCompute, which helps ensure that all data is shipped to MaxCompute (see the sketch after this table). |
Changes in the schema of a table | If you change the schema of the table during data shipping, the change applies only to new partitions. |
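The consistency guarantee above comes from committing the consumer-group offset only after a successful write. The following minimal sketch illustrates that ordering; the pull_batch, write_to_maxcompute, and commit_checkpoint callables are hypothetical placeholders, not Simple Log Service or MaxCompute SDK calls.

```python
from typing import Callable, List, Tuple

def ship_shard(
    shard_id: int,
    pull_batch: Callable[[int], Tuple[List[dict], str]],    # hypothetical: returns (logs, next_cursor)
    write_to_maxcompute: Callable[[List[dict]], None],      # hypothetical: raises if the write fails
    commit_checkpoint: Callable[[int, str], None],          # hypothetical: persists the consumer-group offset
) -> None:
    """Ship one shard with at-least-once semantics: a crash before the
    checkpoint is committed can cause re-delivery, but never data loss."""
    while True:
        batch, next_cursor = pull_batch(shard_id)
        if not batch:
            return
        write_to_maxcompute(batch)                 # nothing is committed if this raises
        commit_checkpoint(shard_id, next_cursor)   # the offset advances only after a successful write
```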
Processing of dirty data
Item | Counted in failed logs | Description |
Partition errors | Yes | A partition is invalid or a specified partition key column does not exist. The related log is not written to MaxCompute. |
Invalid columns | No | The data type of a column in Simple Log Service does not match the data type of the corresponding column in MaxCompute or a data type conversion fails. In this case, the data in the column is not written to MaxCompute. Only the data in valid columns is written to MaxCompute. |
Excess length of column data | No | The length of the data in a column of the string or varchar type exceeds the limit. In this case, the data in the column is truncated and then written to MaxCompute together with the data in the other valid columns (see the sketch after this table). |
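As a hedged illustration of the three cases above, the following sketch shows how a single log row could be handled: a partition error drops the whole log and counts it as failed, a failed type conversion drops only that column, and an oversized string value is truncated before it is written. The handle_row helper and its parameters are hypothetical, not the shipping job's actual implementation.

```python
VARCHAR_MAX_LENGTH = 65535   # varchar limit from the data type table later in this topic

def handle_row(row: dict, partition_keys: list, column_types: dict):
    """Return (row_to_write, counted_as_failed) for one log row."""
    # Partition error: the whole log is dropped and counted as a failed log.
    if not all(key in row for key in partition_keys):
        return None, True

    cleaned = {}
    for column, value in row.items():
        converter = column_types.get(column)
        if converter is None:
            continue
        try:
            converted = converter(value)
        except (TypeError, ValueError):
            continue                                     # invalid column: only this column is skipped
        if isinstance(converted, str) and len(converted) > VARCHAR_MAX_LENGTH:
            converted = converted[:VARCHAR_MAX_LENGTH]   # oversized value: truncated, then written
        cleaned[column] = converted
    return cleaned, False

# Example: the bad "count" value is dropped, and the long "message" value is truncated.
row = {"ds": "20211222", "count": "not-a-number", "message": "x" * 70000}
result, failed = handle_row(row, ["ds"], {"ds": str, "count": int, "message": str})
print(failed, sorted(result), len(result["message"]))   # False ['ds', 'message'] 65535
```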
Monitoring and alerting
Item | Description |
Monitoring and alerting | You can monitor data shipping jobs in real time based on metrics such as the latency and traffic of data shipping jobs. You can configure custom alert rules based on your business requirements to identify exceptions that occur during data shipping at the earliest opportunity. For example, if the data shipping instances that are used to export data are insufficient or the network quota limit is exceeded, alerts are triggered. For more information, see Configure alerts for a MaxCompute data shipping job (new version). |
Job restart
Item | Description |
Excess number of partitions | If the number of partitions is larger than expected when a job is restarted, data write to MaxCompute may not be completed within 5 minutes. As a result, duplicate data may be written to MaxCompute. |
Data write failure | If an authorization error or a network error occurs when a job is restarted, data may fail to be written to MaxCompute. In this case, duplicate data may be written to MaxCompute. |
Limits
Network
Item | Description |
Network for data shipping within a region | If you ship data within a region, data is transmitted over the Alibaba Cloud internal network, which ensures network stability and speed. |
Read traffic
Item | Description |
Read traffic | Simple Log Service sets upper limits on the read traffic of a single project and a single shard. For more information, see Data read and write. If a limit is exceeded, you must split shards or apply to increase the limit of your project. If a job fails to read data because a limit is exceeded, the read is retried at least 10 times (see the sketch after this table). If the job still fails, an error is reported, and the job is restarted. |
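The retry behavior described above can be sketched as follows. ReadQuotaExceeded and read_logs are hypothetical placeholders for a throttling error and a read call, not Simple Log Service SDK names, and the backoff interval is an assumption for illustration.

```python
import time

class ReadQuotaExceeded(Exception):
    """Hypothetical stand-in for a read-throttling error."""

def read_with_retries(read_logs, max_retries: int = 10, base_delay: float = 0.5):
    """Retry a throttled read with exponential backoff before surfacing an error."""
    for attempt in range(max_retries):
        try:
            return read_logs()
        except ReadQuotaExceeded:
            time.sleep(base_delay * (2 ** attempt))   # back off before the next attempt
    raise RuntimeError("read still throttled after retries; the job reports an error and restarts")
```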
Data write to MaxCompute
Item | Description |
Number of concurrent instances | You can run a maximum of 64 data shipping instances concurrently to export data. If the number of shards in Simple Log Service exceeds 64, multiple shards are grouped into one instance, and the shards are distributed as evenly as possible across instances (see the sketch after this table). |
Write speed | Important: If the limit is exceeded, data write to MaxCompute becomes unstable, throttling is triggered on the MaxCompute side, and the FlowExceeded or SlotExceed error is reported. In this case, contact MaxCompute technical support to resolve the issue. |
Prohibition of data modification | The new version of the data shipping feature for MaxCompute allows you to write data to MaxCompute in streaming mode. When data is written to MaxCompute in streaming mode, the MaxCompute Streaming Tunnel service prohibits data update, delete, and insert operations in the MaxCompute table. Therefore, you cannot use the new and old versions of the data shipping feature for MaxCompute at the same time to write data to a MaxCompute table. |
Data write to special tables | Data cannot be written to the external tables, transactional tables, or clustered tables of MaxCompute. |
Changes in the schema of a table | If the schema of your MaxCompute table is changed, you must pause your data shipping job for 20 minutes and then resume it for the change to take effect. |
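The following sketch illustrates one way shards could be spread across at most 64 shipping instances, as described in the "Number of concurrent instances" row above. The round-robin grouping is an assumption for illustration, not the service's published allocation algorithm.

```python
MAX_INSTANCES = 64   # upper limit on concurrent data shipping instances

def group_shards(shard_ids: list) -> list:
    """Round-robin shards into at most MAX_INSTANCES groups of near-equal size."""
    instance_count = min(MAX_INSTANCES, len(shard_ids))
    groups = [[] for _ in range(instance_count)]
    for index, shard_id in enumerate(sorted(shard_ids)):
        groups[index % instance_count].append(shard_id)
    return groups

# Example: 100 shards are handled by 64 instances, each shipping 1 or 2 shards.
sizes = [len(group) for group in group_shards(list(range(100)))]
print(len(sizes), max(sizes), min(sizes))   # 64 2 1
```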
Permission management
Item | Description |
Write permissions | You can grant the write permissions on MaxCompute to Resource Access Management (RAM) users and RAM roles. You must complete the authorization in the MaxCompute console. |
Data types
Common column
Type | Example | Description |
string | "hello" | The maximum length is 8 MB. |
datetime | "2021-12-22 05:00:00" | The data in Simple Log Service must meet the data format requirements of MaxCompute. |
date | "2021-12-22" | The data in Simple Log Service must meet the data format requirements of MaxCompute. |
timestamp | 1648544867 | The time is accurate to the millisecond or second. |
decimal | 1.2 | The data in Simple Log Service must meet the data format requirements of MaxCompute. |
char | "hello" | The maximum length is 255 bytes. |
varchar | "hello" | The maximum length is 65,535 bytes. |
binary | "hello" | The maximum length is 8 MB. |
bigint | 123 | 64-bit signed integers are supported. |
boolean | 1 | 1, t, T, true, TRUE, and True are parsed into True. 0, f, F, false, FALSE, and False are parsed into False (see the example after this table). |
double | 1.2 | 64-bit floating-point numbers are supported. |
float | 1.2 | 32-bit floating-point numbers are supported. |
integer | 123 | 32-bit signed integers are supported. |
smallint | 12 | 16-bit signed integers are supported. |
tinyint | 12 | 8-bit signed integers are supported. |
Partition key column
Item | Description |
Partition key column | Data is processed as strings and must meet the data format requirements of MaxCompute. |
Data shipping management
Item | Description |
Pause of a data shipping job | If you pause a data shipping job, the job records the cursor of the last log that is shipped. After you resume the job, the job continues to ship logs from the recorded cursor. |
MaxCompute IP address whitelist
Item | Description |
MaxCompute IP address whitelist for the classic network | If you enable an IP address whitelist for the classic network in a MaxCompute project, data may fail to be shipped to the MaxCompute project. To resolve this issue, you can update the IP address whitelist settings of the project on the MaxCompute side. |