MaxCompute: Near real-time incremental import

Last Updated: May 27, 2024

In MaxCompute, you can write incremental data to a delta table in near real-time mode, or write full data to a delta table in a single batch. This topic describes the architecture design for high-concurrency, near real-time incremental write scenarios.

Real-world data processing involves a wide range of data sources, such as databases, log systems, and message queue systems. To help you write data from these sources to delta tables, MaxCompute provides an open source Flink connector plug-in. You can use the plug-in together with Data Integration of DataWorks and other data import tools to meet the requirements for low latency and high data accuracy in scenarios that involve high concurrency, fault tolerance, and transaction commits.

[Figure: business data processing]

The preceding figure shows business data processing.

  • The data import tool integrates the SDK client provided by the Tunnel service of MaxCompute, which supports high-concurrency, minute-level data writes to the Tunnel server. The Tunnel server then starts multiple worker nodes that write data in parallel to the data files of each bucket.

  • You can configure the write.bucket.num parameter to specify the write concurrency. Higher concurrency generally yields higher write throughput. For more information about the benefits provided by buckets, see Table data format.

  • The data writing interface that is provided by Tunnel SDK supports only UPSERT and DELETE operations.

  • Calling the commit interface atomically commits all data that was written before the call.

    • If the call succeeds, the written data becomes queryable and meets the read/write snapshot isolation requirements.

    • If the call fails, you can retry the commit. If the failure was not caused by an unrecoverable error, such as data corruption, the retry may succeed and you do not need to rewrite the data. Otherwise, you must rewrite and recommit the data.
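The write-and-commit behavior described in the preceding list can be sketched as follows. This is a minimal, self-contained simulation and does not use the real Tunnel SDK: the FakeSession class, the two exception types, and the commit_with_retry helper are all hypothetical names introduced for illustration. The sketch only assumes what the list states: writes are UPSERT or DELETE operations, a commit is atomic, a failed commit can be retried without rewriting the data unless the error is unrecoverable, and an unrecoverable error requires the data to be rewritten and recommitted.

```python
import time


class RetryableCommitError(Exception):
    """Hypothetical: transient failure; the written data is still intact."""


class UnrecoverableCommitError(Exception):
    """Hypothetical: for example data corruption; the data must be rewritten."""


class FakeSession:
    """In-memory stand-in for a write session. Not the real Tunnel SDK."""

    def __init__(self, commit_outcomes):
        # Scripted results, one per commit() call. None means success.
        self._outcomes = list(commit_outcomes)
        self.staged = []   # written but not yet visible to readers
        self.visible = []  # visible to readers after a successful commit

    def upsert(self, record):
        self.staged.append(("UPSERT", record))

    def delete(self, record):
        self.staged.append(("DELETE", record))

    def commit(self):
        outcome = self._outcomes.pop(0) if self._outcomes else None
        if outcome is not None:
            if isinstance(outcome, UnrecoverableCommitError):
                # Simulate losing the written data on an unrecoverable error.
                self.staged.clear()
            raise outcome
        # Atomic commit: everything written so far becomes visible at once.
        self.visible.extend(self.staged)
        self.staged.clear()


def write_batch(session, batch):
    """Write every (operation, record) pair; only UPSERT and DELETE are allowed."""
    for op, record in batch:
        if op == "UPSERT":
            session.upsert(record)
        elif op == "DELETE":
            session.delete(record)
        else:
            raise ValueError("unsupported operation: %s" % op)


def commit_with_retry(session, batch, max_attempts=3, backoff_seconds=0.0):
    """Write a batch, then commit. Returns the number of commit attempts used."""
    write_batch(session, batch)
    for attempt in range(1, max_attempts + 1):
        try:
            session.commit()
            return attempt
        except RetryableCommitError:
            # The data is still written, so only the commit is retried.
            time.sleep(backoff_seconds)
        except UnrecoverableCommitError:
            # The data must be rewritten before the next commit attempt.
            write_batch(session, batch)
            time.sleep(backoff_seconds)
    raise RuntimeError("commit did not succeed after %d attempts" % max_attempts)


if __name__ == "__main__":
    batch = [("UPSERT", {"id": 1, "val": 10}), ("DELETE", {"id": 2})]
    session = FakeSession([RetryableCommitError("timeout"), None])
    attempts = commit_with_retry(session, batch)
    print(attempts, session.visible)
```

In this sketch the first commit fails with a transient error and the second succeeds, so the batch becomes visible exactly once. A real client would map the errors reported by the SDK onto these two cases; which errors count as unrecoverable depends on the SDK and is not specified here.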