Realtime Compute for Apache Flink: Data latency and consistency

Last Updated: Jul 17, 2024

This topic describes the data latency and consistency in Apache Paimon (Paimon).

Snapshot files

Snapshot files provide access to the data in a Paimon table. A snapshot file captures the data in a Paimon table at a specific point in time. You can use snapshot files generated at different points in time to change the consumer offset in streaming mode and implement the time travel feature in batch mode.

For information about how to query the snapshot files of a Paimon table and their creation time, see Snapshots table.
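For example, the following Flink SQL statements sketch how you might query the snapshots system table and read a historical snapshot in batch mode. The table name my_table and the snapshot ID 5 are placeholders for illustration only.

-- Query the IDs and commit times of the snapshots of a Paimon table.
SELECT snapshot_id, commit_time FROM `my_table$snapshots`;

-- Time travel in batch mode: read the data of the table as of snapshot 5.
SELECT * FROM my_table /*+ OPTIONS('scan.snapshot-id' = '5') */;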

Note

Snapshot files are used to consume data from Paimon tables. By default, a snapshot file is retained for 1 hour before it is deleted. If the retention period is excessively short or the consumption speed is low, a snapshot file may be deleted while it is still in use, which results in an error. To resolve the preceding issue, you can change the retention period of snapshot files, specify a consumer ID, or optimize the performance of Paimon tables.
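For reference, the following Flink SQL statements sketch how these adjustments might look. The table name my_table, the retention period of 2 hours, and the consumer ID my-consumer are examples only; adapt them to your scenario.

-- Extend the retention period of snapshot files. The default value is 1 hour.
ALTER TABLE my_table SET ('snapshot.time-retained' = '2 h');

-- Specify a consumer ID for the streaming read so that snapshot files that the
-- consumer has not finished reading are not deleted while they are still in use.
SELECT * FROM my_table /*+ OPTIONS('consumer-id' = 'my-consumer') */;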

Latency

When data is being written to a Paimon table, the writer caches the data in memory and in temporary files. After the Flink deployment triggers a checkpoint, the writer commits the data to generate a snapshot file. In streaming mode, a downstream consumer monitors the list of snapshot files and reads data only from the most recent snapshot file that is detected.

Data latency refers to the delay between the time when data is written and the time when it can be consumed. Data latency depends on the frequency at which snapshot files are generated. If no backpressure occurs in the Flink deployment, a snapshot file is generated at each checkpoint. In this case, the data latency of a Paimon table is equal to the checkpoint interval. Take note that an excessively small checkpoint interval may affect the performance of the Flink deployment. We recommend that you set the checkpoint interval to a value between 1 and 10 minutes, and increase the interval within this range to improve read and write efficiency based on your business requirements.
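For example, the following statement sets the checkpoint interval of a Flink SQL deployment to 3 minutes, an illustrative value within the recommended range. With this setting, a snapshot file is generated approximately every 3 minutes if no backpressure occurs.

-- Generate a checkpoint, and therefore a new snapshot file, every 3 minutes.
SET 'execution.checkpointing.interval' = '3min';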

Consistency

Paimon uses a two-phase commit protocol to atomically commit data. If you use two Flink deployments to concurrently write data into a Paimon table, take note of the following points:

  • We recommend that the two deployments do not modify the same bucket. In this case, the deployments can commit data concurrently, and sequential consistency is guaranteed. For a sketch of this setup, see the example after this list.

  • If the two deployments modify the same bucket, Paimon resolves data conflicts by triggering a failover, and only snapshot isolation is guaranteed. No data changes are lost, but the final state of the Paimon table may contain interleaved commits from both deployments.
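The following sketch shows the recommended setup, assuming that my_table is partitioned by a dt column. Because each deployment writes to a different partition, the two deployments never modify the same bucket. The table and source names are placeholders.

-- Deployment 1: writes only the partition dt = '20240716'.
INSERT INTO my_table SELECT * FROM source_a WHERE dt = '20240716';

-- Deployment 2: writes only the partition dt = '20240717'.
INSERT INTO my_table SELECT * FROM source_b WHERE dt = '20240717';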