MaxCompute: Overview of the data transmission service

Last Updated: Nov 19, 2024

The data transmission service is the primary data tunnel of MaxCompute. It supports a regular tunnel, which is suited to batch operations, and a stream tunnel, which is suited to writing data in streaming mode. The service also provides shared resource groups, which you can use free of charge as long as the specified quotas are not used up.

Operation types

  • Data transfer by running commands

    Currently, you can run commands for data transmission only on the MaxCompute client.

  • Batch operations on data

    You can use a regular tunnel to perform batch operations on offline data, such as uploading data to a table, downloading data from a table, and downloading the query results of an instance.

  • Data writing in streaming mode

    You can use a stream tunnel to write data in streaming mode to tables in micro-batches.

[Figure: Tunnel service]

Limits on the data transmission service

  • Limits on the regular tunnel

    • Data uploads

      | Item | Limit |
      | --- | --- |
      | Lifecycle of an UploadSession | 24 hours |
      | Number of blocks written by an UploadSession | 20,000 |
      | Speed of writing a block | 10 MB/s |
      | Amount of data in a block | 100 GB |
      | Number of UploadSessions created for a table | 500 UploadSessions per 5 minutes |
      | Number of blocks written to a table | 500 blocks per 5 minutes |
      | Number of concurrent UploadSessions submitted for a table | 32 |
      | Number of blocks that are concurrently written | Subject to the number of concurrent slots. A slot is occupied when a block is written. |
      | Concurrent writes | MaxCompute guarantees atomicity, consistency, isolation, and durability (ACID) for concurrent writes. For more information about the semantics of ACID, see ACID semantics. |

    • Data downloads

      | Item | Limit |
      | --- | --- |
      | Lifecycle of a DownloadSession | 24 hours |
      | Lifecycle of an InstanceDownloadSession | 24 hours, subject to the lifecycle of the instance. |
      | Number of InstanceDownloadSessions created for a project | 200 InstanceDownloadSessions per 5 minutes |
      | Number of DownloadSessions created for a table | 200 DownloadSessions per 5 minutes |
      | Speed of a single download request | 10 MB/s |
      | Number of DownloadSessions that can be concurrently created | Subject to the number of concurrent slots. A slot is occupied when a DownloadSession is created. |
      | Number of InstanceDownloadSessions that can be concurrently created | Subject to the number of concurrent slots. A slot is occupied when an InstanceDownloadSession is created. |
      | Number of concurrent download requests | Subject to the number of concurrent slots. A slot is occupied by each data download request. |

    Note

    The Delta Table Upsert feature is supported for batch data uploads and downloads. This feature has the following limits:

    • Lifecycle of an UpsertSession: 24 hours

    • Maximum data write speed of an UpsertSession: number of buckets in the table or partition × 10 MB/s

    • Maximum number of slots that can be occupied: the number of buckets in the table or partition

    • Commit frequency of an UpsertSession: each partition in a Delta Table allows only one commit per minute. If the interval between two commits is less than 1 minute, the system reports an error with the message ErrorCode=FlowExceeded, ErrorMessage=CommitUpsert QPS Quota exceeded.

  • Limits on the stream tunnel

    | Item | Limit |
    | --- | --- |
    | Speed of writing data to a slot | 1 MB/s |
    | Number of write requests in a slot | 10 requests per second |
    | Number of partitions concurrently written to a table | 64 |
    | Maximum number of available slots in a partition | 32 |
    | Number of slots occupied by StreamUploadSessions | Subject to the number of concurrent slots. You can specify the number of slots when you create a StreamUploadSession. |

  • Limits on data upload

    • The size of each field cannot exceed the limit of the field. For more information, see Data type editions.

      Note

      The size of a field of the STRING type cannot exceed 8 MB.

    • During an upload, multiple data entries are packed together into a single package file.

  • Network limits on exclusive resource groups of the data transmission service

    • Only access over virtual private clouds (VPCs) is supported. Access over the Internet is not supported.

    • Data transmission is supported in the same region. Cross-region data transmission is not supported.
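As a rough illustration of how the limits above interact, the following Python sketch checks a planned batch upload and a planned streaming workload against the quoted numbers. The function and constant names are hypothetical and are not part of any MaxCompute SDK. The sketch also assumes that 1 MB means 1024 × 1024 bytes, which the limits above do not specify.

```python
import math

# Limits quoted from the tables above. Binary units are an assumption.
MAX_BLOCKS_PER_SESSION = 20_000
MAX_BLOCK_BYTES = 100 * 1024**3           # 100 GB per block
BLOCK_WRITE_BYTES_PER_S = 10 * 1024**2    # 10 MB/s per block
STREAM_SLOT_BYTES_PER_S = 1 * 1024**2     # 1 MB/s per slot
STREAM_SLOT_REQUESTS_PER_S = 10           # 10 requests per second per slot
MAX_SLOTS_PER_PARTITION = 32


def plan_batch_upload(total_bytes: int, block_bytes: int) -> dict:
    """Return the block count and a lower bound on the time to write one block."""
    if block_bytes <= 0 or block_bytes > MAX_BLOCK_BYTES:
        raise ValueError("block size must be greater than 0 and at most 100 GB")
    blocks = math.ceil(total_bytes / block_bytes)
    if blocks > MAX_BLOCKS_PER_SESSION:
        raise ValueError(f"{blocks} blocks exceed the 20,000-block session limit")
    return {
        "blocks": blocks,
        # A block cannot be written faster than 10 MB/s.
        "min_seconds_per_block": block_bytes / BLOCK_WRITE_BYTES_PER_S,
    }


def stream_slots_needed(bytes_per_s: float, requests_per_s: float) -> int:
    """Return the number of slots one partition needs for a streaming workload."""
    slots = max(
        1,
        math.ceil(bytes_per_s / STREAM_SLOT_BYTES_PER_S),
        math.ceil(requests_per_s / STREAM_SLOT_REQUESTS_PER_S),
    )
    if slots > MAX_SLOTS_PER_PARTITION:
        raise ValueError(f"{slots} slots exceed the 32-slot partition limit")
    return slots
```

For example, under these assumptions a 1 TB upload split into 64 MB blocks fits in a single UploadSession, whereas 32 MB blocks would exceed the block limit and require larger blocks or several sessions.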

Note

Network bandwidth has a great impact on the upload and download speed of data transmission. In most cases, the speed is in the range of 1 MB/s to 20 MB/s. If the upload speed is slow, you can use the multi-thread upload method.
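The multi-thread upload method mentioned above can be sketched in Python as follows. This is a minimal sketch rather than SDK code: `upload_block` is a hypothetical callable that stands in for whatever call writes one block through your tunnel client, and the helper only shows how rows might be split into blocks and written concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def upload_in_parallel(
    rows: Sequence,
    rows_per_block: int,
    upload_block: Callable[[int, Sequence], None],  # hypothetical writer for one block
    max_workers: int = 4,
) -> List[int]:
    """Split rows into blocks, write the blocks concurrently, and return the block IDs."""
    if rows_per_block <= 0:
        raise ValueError("rows_per_block must be positive")
    blocks = [rows[i:i + rows_per_block] for i in range(0, len(rows), rows_per_block)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(upload_block, block_id, chunk)
            for block_id, chunk in enumerate(blocks)
        ]
        for future in futures:
            future.result()  # re-raise any exception from a worker thread
    # The caller would commit these block IDs with the upload session.
    return list(range(len(blocks)))
```

Because a slot is occupied while each block is written, `max_workers` should stay within the number of slots available to the project.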

Shared resource groups of the data transmission service

The following table lists the maximum number of shared resources (slots) that a project can be assigned in each region. The shared resources are free of charge.

| Site | Region | Number of slots |
| --- | --- | --- |
| China | China (Hangzhou) | 300 |
| China | China (Shanghai) | 600 |
| China | China East 2 Finance | 50 |
| China | China (Beijing) | 300 |
| China | China North 2 Ali Gov | 100 |
| China | China (Zhangjiakou) | 300 |
| China | China (Ulanqab) | 300 |
| China | China (Shenzhen) | 150 |
| China | China South 1 Finance | 50 |
| China | China (Chengdu) | 150 |
| China | China (Hong Kong) | 50 |
| Asia Pacific | Singapore | 100 |
| Asia Pacific | Malaysia (Kuala Lumpur) | 50 |
| Asia Pacific | Indonesia (Jakarta) | 50 |
| Asia Pacific | Japan (Tokyo) | 50 |
| Europe and Americas | Germany (Frankfurt) | 50 |
| Europe and Americas | US (Silicon Valley) | 100 |
| Europe and Americas | US (Virginia) | 50 |
| Europe and Americas | UK (London) | 50 |
| Middle East and India | UAE (Dubai) | 50 |
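To see whether a planned workload stays within the shared quota, the following Python sketch adds up the slots that concurrent operations would occupy and compares the total with the per-region figures listed above. It is a simplified illustration: the names are hypothetical, only a few regions are included, and it assumes that each concurrent block write and each concurrent download request occupies exactly one slot.

```python
# Per-project shared slot quotas for a few regions, copied from the table above.
SHARED_SLOT_QUOTA = {
    "China (Hangzhou)": 300,
    "China (Shanghai)": 600,
    "Singapore": 100,
    "Germany (Frankfurt)": 50,
}


def slots_required(block_writes: int, download_requests: int, stream_slots: int) -> int:
    """Total slots occupied by the given numbers of concurrent operations."""
    return block_writes + download_requests + stream_slots


def fits_shared_quota(region: str, block_writes: int,
                      download_requests: int, stream_slots: int) -> bool:
    """Return True if the planned concurrency fits the region's shared quota."""
    quota = SHARED_SLOT_QUOTA[region]  # raises KeyError for regions not listed here
    return slots_required(block_writes, download_requests, stream_slots) <= quota
```

Because the figures are maximums, staying within the quota does not by itself guarantee that the slots are available at a given moment.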

Valid status codes of the data transmission service

| Status code | Status code name |
| --- | --- |
| 200 | HTTP_OK |
| 201 | HTTP_CREATED |
| 400 | HTTP_BAD_REQUEST |
| 401 | HTTP_UNAUTHORIZED |
| 403 | HTTP_FORBIDDEN |
| 404 | HTTP_NOT_FOUND |
| 405 | HTTP_METHOD_NOT_ALLOWED |
| 409 | HTTP_CONFLICT |
| 422 | HTTP_UNPROCESSABLE_ENTITY |
| 429 | HTTP_TOO_MANY_REQUESTS |
| 499 | HTTP_CLIENT_CLOSED_REQUEST |
| 500 | HTTP_INTERNAL_SERVER_ERROR |
| 502 | HTTP_BAD_GATEWAY |
| 503 | HTTP_SERVICE_UNAVAILABLE |
| 504 | HTTP_GATEWAY_TIME_OUT |

  • Retry policy for failed requests

    • If a request fails, the client must wait for a period of time before it sends the request again.

    • The waiting time for consecutive failed requests must increase exponentially. The shortest waiting time is 1 second. Example: 1s, 2s, 4s, 8s, 16s, 32s, and 32s.

  • Repeated requests

    • Requests with the same URL.

    • Requests that are consecutively initiated by clients of the same IP address.

  • Valid requests

    Requests whose status codes are valid and meet the requirements for the retry policy.

  • Invalid requests

    Requests whose status codes are valid but do not meet the requirements for the retry policy.

    Note

    Invalid requests are not guaranteed by the service level agreement (SLA).

  • Attack requests

    • Requests that receive status code 429 or 503 and do not comply with the retry policy. Status codes 429 and 503 are returned for flow control.

    • For attack requests, MaxCompute isolates the IP address, UID, and project of the client that initiates the attacks. Isolated objects cannot access MaxCompute as expected.

    Note

    Attack requests are not guaranteed by the SLA.
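The retry policy described above can be sketched in Python as follows. This is an illustrative sketch, not the behavior of any official client: `send` is a hypothetical callable that issues one request and returns its HTTP status code, the set of retryable status codes is an assumption, and the 32-second cap is inferred from the example sequence.

```python
import time
from typing import Callable, Iterator

# Assumption: flow-control (429, 503) and server-side errors are worth retrying.
RETRYABLE = {429, 500, 502, 503, 504}


def backoff_delays(base: float = 1.0, cap: float = 32.0) -> Iterator[float]:
    """Yield waiting times that double after each failure, up to the cap."""
    delay = base
    while True:
        yield delay
        delay = min(delay * 2, cap)


def call_with_retry(
    send: Callable[[], int],               # hypothetical: sends one request, returns its status
    max_attempts: int = 7,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send a request, waiting with exponential backoff between failed attempts."""
    delays = backoff_delays()
    status = send()
    for _ in range(max_attempts - 1):
        if status not in RETRYABLE:
            break
        sleep(next(delays))                # wait 1s, 2s, 4s, ... before the next attempt
        status = send()
    return status
```

The `sleep` parameter is injectable so that the waiting behavior can be tested without real delays.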

FAQ

Why does the speed of data transmission slow down?

Due to the limits of the service architecture, requests to the MaxCompute Tunnel service may occasionally be delayed. In this case, the time required to upload or download 10 MB of data in a single request may increase from seconds to minutes. This issue may occur in the following scenarios:

  • The shared tunnel resources, including the CPU and network bandwidth, are used up

    • Duration: minutes to hours.

    • Due to the limits of the service architecture, this issue cannot be prevented. If you have high requirements for tunnel resources, you can purchase exclusive tunnel resources.

  • The network connection from the client to the tunnel is unstable, for example, when data is uploaded or downloaded over the Internet

    • Duration: cannot be evaluated.

    • The stability of the Internet cannot be guaranteed. If you have high requirements for network stability, we recommend that you use Alibaba Cloud VPCs.

  • The client resources, including the CPU and network bandwidth, are fully utilized

    • Duration: cannot be evaluated.

    • You need to fully evaluate the physical resources of your client.

  • The client code is poorly designed for upload and download over long-lived connections that take a long time to complete

    • Duration: cannot be evaluated.

    • You need to fully consider data transmission performance when you design your code.

FAQ about other issues