The data transmission service is the most important data tunnel of MaxCompute. It provides a regular tunnel for batch operations and a stream tunnel for writing data in streaming mode. It also provides shared resource groups, which you can use free of charge until the specified quotas are used up.
Operation types
Data transfer by running commands
Currently, you can run commands for data transmission only on the MaxCompute client.
Batch operations on data
You can use a regular tunnel to perform batch operations on offline data, such as uploading data to a table, downloading data from a table, and downloading the query results of an instance.
Data writing in streaming mode
You can use a stream tunnel to write data in streaming mode to tables in micro-batches.
Limits on the data transmission service
Limits on the regular tunnel
Data uploads
Item | Limit |
Lifecycle of an UploadSession | 24 hours |
Number of blocks written by an UploadSession | 20,000 |
Speed of writing a block | 10 MB/s |
Amount of data in a block | 100 GB |
Number of UploadSessions created for a table | 500 UploadSessions per 5 minutes |
Number of blocks written to a table | 500 blocks per 5 minutes |
Number of concurrent UploadSessions submitted for a table | 32 |
Number of blocks that are concurrently written | Subject to the number of concurrent slots. A slot is occupied when a block is written. |
Concurrent writes | MaxCompute ensures that concurrent writes follow the principles of atomicity, consistency, isolation, and durability (ACID). For more information about the semantics of ACID, see ACID semantics. |
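As a rough illustration of how the upload limits interact, the following sketch checks a planned block size against the per-session block count and per-block size limits from the table above. The function name plan_upload is hypothetical, the byte conversions assume binary units, and the 10 MB/s figure is treated as a ceiling for a single block. This is a planning aid under those assumptions, not part of any MaxCompute SDK.

```python
import math

# Limits copied from the table above.
MAX_BLOCKS_PER_SESSION = 20_000
MAX_BLOCK_BYTES = 100 * 1024**3           # 100 GB, assuming binary units
BLOCK_WRITE_BYTES_PER_SEC = 10 * 1024**2  # 10 MB/s, treated as a per-block ceiling


def plan_upload(total_bytes: int, block_bytes: int) -> dict:
    """Return the block count and a rough lower bound on the time per full block.

    Hypothetical helper for planning only; it does not call MaxCompute.
    """
    if not 0 < block_bytes <= MAX_BLOCK_BYTES:
        raise ValueError("block size must be greater than 0 and at most 100 GB")
    blocks = math.ceil(total_bytes / block_bytes)
    if blocks > MAX_BLOCKS_PER_SESSION:
        raise ValueError(f"{blocks} blocks exceeds the per-session limit")
    return {
        "blocks": blocks,
        "seconds_per_full_block": block_bytes / BLOCK_WRITE_BYTES_PER_SEC,
    }
```

For example, plan_upload(1024**3, 64 * 1024**2) splits 1 GB into 64 MB blocks. If the call raises ValueError because of the block count, a larger block size brings the upload back under the 20,000-block limit.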
Data downloads
Item | Limit |
Lifecycle of a DownloadSession | 24 hours |
Lifecycle of an InstanceDownloadSession | 24 hours, subject to the lifecycle of an instance. |
Number of InstanceDownloadSessions created for a project | 200 InstanceDownloadSessions per 5 minutes |
Number of DownloadSessions created for a table | 200 DownloadSessions per 5 minutes |
Speed of a single download request | 10 MB/s |
Number of DownloadSessions that can be concurrently created | Subject to the number of concurrent slots. A slot is occupied when a DownloadSession is created. |
Number of InstanceDownloadSessions that can be concurrently created | Subject to the number of concurrent slots. A slot is occupied when an InstanceDownloadSession is created. |
Number of concurrent download requests | Subject to the number of concurrent slots. A slot is occupied by a data download request. |
Note: The Delta Table Upsert feature is supported for batch data uploads and downloads. This feature has the following limits:
Lifecycle of an UpsertSession: 24 hours
Maximum data write speed of an UpsertSession: number of buckets in the table or partition × 10 MB/s
Maximum number of slots that can be occupied: number of buckets in the table or partition
Commit frequency of an UpsertSession: each partition in a Delta Table allows only one commit per minute. If two commits are less than 1 minute apart, the system reports the following error: ErrorCode=FlowExceeded, ErrorMessage=CommitUpsert QPS Quota exceeded
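One way to avoid the FlowExceeded error is to track commit times on the client and wait until a partition is eligible again. The following is a minimal sketch of that idea. The CommitGate class and its method names are hypothetical and not part of any MaxCompute SDK, and the gate only knows about commits made by the process that owns it.

```python
import time

MIN_COMMIT_INTERVAL_SEC = 60.0  # one commit per partition per minute


class CommitGate:
    """Tracks the last commit time per partition (client-side only, hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock        # injectable so the gate can be tested
        self._last_commit = {}     # partition spec -> time of last commit

    def seconds_until_allowed(self, partition: str) -> float:
        """Return how long to wait before committing to this partition again."""
        last = self._last_commit.get(partition)
        if last is None:
            return 0.0
        return max(0.0, MIN_COMMIT_INTERVAL_SEC - (self._clock() - last))

    def record_commit(self, partition: str) -> None:
        """Call this right after a commit to the partition succeeds."""
        self._last_commit[partition] = self._clock()
```

A caller would sleep for seconds_until_allowed(partition) before committing and call record_commit(partition) afterward. Commits from other clients to the same partition are not visible to this gate, so the error can still occur and should be handled.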
Limits on the stream tunnel
Item | Limit |
Speed of writing data to a slot | 1 MB/s |
Number of write requests in a slot | 10 requests per second |
Number of partitions concurrently written to a table | 64 |
Maximum number of available slots in a partition | 32 |
Number of slots occupied by StreamUploadSessions | Subject to the number of concurrent slots. You can specify the number of slots when you create a StreamUploadSession. |
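Because you specify the number of slots when you create a StreamUploadSession, it can help to estimate that number from the per-slot limits above. The sketch below does this arithmetic for a single partition. The function name slots_needed is hypothetical, and the estimate assumes that load is spread evenly across slots, which the limits above do not state.

```python
import math

# Per-slot and per-partition limits copied from the table above.
SLOT_MB_PER_SEC = 1
SLOT_REQUESTS_PER_SEC = 10
MAX_SLOTS_PER_PARTITION = 32


def slots_needed(target_mb_per_sec: float, requests_per_sec: float) -> int:
    """Estimate slots for one partition so that both per-slot limits hold.

    Assumes traffic is evenly distributed across slots (an assumption).
    """
    by_bandwidth = math.ceil(target_mb_per_sec / SLOT_MB_PER_SEC)
    by_requests = math.ceil(requests_per_sec / SLOT_REQUESTS_PER_SEC)
    slots = max(1, by_bandwidth, by_requests)
    if slots > MAX_SLOTS_PER_PARTITION:
        raise ValueError("target exceeds what the 32 slots of one partition allow")
    return slots
```

For a writer that sends 5 MB/s in 120 requests per second, the request rate is the binding limit, so batching records into fewer, larger requests lowers the number of slots required.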
Limits on data upload
The size of each field cannot exceed the limit of the field. For more information, see Data type editions.
Note: The size of a field of the STRING type cannot exceed 8 MB.
During an upload, multiple data entries are bundled into a single package file.
Network limits on exclusive resource groups of the data transmission service
Only access over virtual private clouds (VPCs) is supported. Access over the Internet is not supported.
Data transmission is supported in the same region. Cross-region data transmission is not supported.
Network bandwidth strongly affects the upload and download speed. In most cases, the speed ranges from 1 MB/s to 20 MB/s. If uploads are slow, you can upload data by using multiple threads.
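A multi-threaded upload can be sketched with a thread pool that writes several blocks at the same time. In the example below, upload_block is a hypothetical callable that you supply; it stands in for whatever SDK call writes one block and is not a MaxCompute API. The default worker count is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_blocks(blocks, upload_block, max_workers=4):
    """Upload (block_id, payload) pairs concurrently and return the uploaded ids.

    upload_block(block_id, payload) is a hypothetical caller-supplied function.
    """
    def _one(item):
        block_id, payload = item
        upload_block(block_id, payload)
        return block_id

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() keeps input order; a worker's exception is re-raised here
        # when its result is consumed.
        return list(pool.map(_one, blocks))
```

Each block that is being written occupies a slot, so max_workers should stay within the number of slots available to the project.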
Shared resource groups of the data transmission service
The following table describes the maximum number of shared resources (slots) that can be assigned to different regions at the project level. The shared resources are free of charge.
Site | Region | Number of slots |
China | China (Hangzhou) | 300 |
China | China (Shanghai) | 600 |
China | China East 2 Finance | 50 |
China | China (Beijing) | 300 |
China | China North 2 Ali Gov | 100 |
China | China (Zhangjiakou) | 300 |
China | China (Ulanqab) | 300 |
China | China (Shenzhen) | 150 |
China | China South 1 Finance | 50 |
China | China (Chengdu) | 150 |
China | China (Hong Kong) | 50 |
Asia Pacific | Singapore | 100 |
Asia Pacific | Malaysia (Kuala Lumpur) | 50 |
Asia Pacific | Indonesia (Jakarta) | 50 |
Asia Pacific | Japan (Tokyo) | 50 |
Europe and Americas | Germany (Frankfurt) | 50 |
Europe and Americas | US (Silicon Valley) | 100 |
Europe and Americas | US (Virginia) | 50 |
Europe and Americas | UK (London) | 50 |
Middle East and India | UAE (Dubai) | 50 |
Valid status codes of the data transmission service
Status code | Status code name |
200 | HTTP_OK |
201 | HTTP_CREATED |
400 | HTTP_BAD_REQUEST |
401 | HTTP_UNAUTHORIZED |
403 | HTTP_FORBIDDEN |
404 | HTTP_NOT_FOUND |
405 | HTTP_METHOD_NOT_ALLOWED |
409 | HTTP_CONFLICT |
422 | HTTP_UNPROCESSABLE_ENTITY |
429 | HTTP_TOO_MANY_REQUESTS |
499 | HTTP_CLIENT_CLOSED_REQUEST |
500 | HTTP_INTERNAL_SERVER_ERROR |
502 | HTTP_BAD_GATEWAY |
503 | HTTP_SERVICE_UNAVAILABLE |
504 | HTTP_GATEWAY_TIME_OUT |
Retry policy for failed requests
If a request fails, the client needs to wait for a period of time before reinitiating the request.
The waiting time between consecutive failed requests must increase exponentially, starting from a minimum of 1 second. Example: 1s, 2s, 4s, 8s, 16s, 32s, and 32s. In this example, the waiting time stops increasing at 32 seconds.
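The retry policy above can be sketched as a small exponential backoff helper. The function names, the 32-second cap (taken from the example sequence), and the default of seven retries are assumptions made for illustration. The request argument is a hypothetical zero-argument callable that raises an exception on failure.

```python
import time


def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 32.0):
    """Yield waiting times of 1, 2, 4, ... seconds, never exceeding cap."""
    for attempt in range(max_retries):
        yield min(cap, base * 2 ** attempt)


def call_with_retry(request, max_retries=7, sleep=time.sleep):
    """Call request() and retry with exponential backoff if it fails.

    request is a hypothetical callable; narrow the except clauses to the
    error type that your client actually raises.
    """
    try:
        return request()
    except Exception as exc:
        last_error = exc
    for delay in backoff_delays(max_retries):
        sleep(delay)  # wait before reinitiating the request
        try:
            return request()
        except Exception as exc:
            last_error = exc
    raise last_error
```

With the defaults, list(backoff_delays(7)) reproduces the sequence in the example. The sleep parameter is injectable so that the helper can be exercised without real waiting.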
Repeated requests
Requests with the same URL.
Requests that are consecutively initiated by clients of the same IP address.
Valid requests
Requests that return valid status codes and comply with the retry policy.
Invalid requests
Requests that return valid status codes but do not comply with the retry policy.
Note: Invalid requests are not guaranteed by the service level agreement (SLA).
Attack requests
Requests that return status code 429 or 503 and do not comply with the retry policy. Status codes 429 and 503 are returned for flow control.
For attack requests, MaxCompute isolates the IP address, UID, and project of the client that initiates the attacks. Isolated objects cannot access MaxCompute as expected.
Note: Attack requests are not guaranteed by the SLA.
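The request categories above can be summarized as a small decision function. This is only a reading of the definitions in this section, not how MaxCompute classifies traffic internally. The function name and the "unknown" result for status codes outside the table of valid status codes are assumptions.

```python
# Status codes from the table of valid status codes.
VALID_STATUS_CODES = {
    200, 201, 400, 401, 403, 404, 405, 409, 422, 429, 499, 500, 502, 503, 504,
}
FLOW_CONTROL_CODES = {429, 503}


def classify_request(status_code: int, follows_retry_policy: bool) -> str:
    """Map a request to 'valid', 'invalid', 'attack', or 'unknown'."""
    if status_code not in VALID_STATUS_CODES:
        return "unknown"  # assumption: this section does not define this case
    if follows_retry_policy:
        return "valid"
    if status_code in FLOW_CONTROL_CODES:
        return "attack"   # 429 or 503 retried without backing off
    return "invalid"
```

The order of the checks matters: a 429 or 503 response that is retried without backoff also matches the definition of an invalid request, and the sketch reports the more severe category.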
FAQ
Why does the speed of data transmission slow down?
Due to the limits of the service architecture, requests to the MaxCompute Tunnel service may occasionally be delayed. In this case, the time required for a single upload or download of 10 MB of data may increase from seconds to minutes. This issue may occur in the following scenarios:
The shared tunnel resources, including the CPU and network bandwidth, are used up
Duration: minutes to hours.
Due to the limits of the service architecture, this issue cannot be prevented. If you have high requirements for tunnel resources, you can purchase exclusive tunnel resources.
The network connection from the client to the tunnel is unstable, for example, when data is uploaded or downloaded over the Internet
Duration: cannot be evaluated.
The stability of the Internet cannot be guaranteed. If you have high requirements for network stability, we recommend that you use Alibaba Cloud VPCs.
The client resources, including the CPU and network bandwidth, are fully utilized
Duration: cannot be evaluated.
You need to fully evaluate the physical resources of your client.
The client code logic is inefficient, for example, it uses long-lived connections that spend a large amount of time on uploads and downloads
Duration: cannot be evaluated.
You need to fully consider the data transmission performance when designing code.