To ensure the atomicity, consistency, isolation, and durability (ACID) semantics of Delta tables, MetaService manages all data modification operations on Delta tables as transactions in a unified manner. The multiversion concurrency control (MVCC) model is used to ensure the isolation of read and write operations on snapshots, and the optimistic concurrency control (OCC) model is used to control optimistic transaction concurrency.
Conflict detection rules
The following table describes the rules that are used to handle the conflicts between concurrent jobs on the same non-partitioned table or partition.
Job type | INSERT OVERWRITE /TRUNCATE (job that ends later) | INSERT INTO (job that ends later) | UPDATE / DELETE (job that ends later) | MINOR COMPACT (job that ends later) | MAJOR COMPACT (job that ends later) |
INSERT OVERWRITE /TRUNCAT (job that ends earlier) | Both jobs succeed. The result data of the job that ends later overwrites that of the job that ends earlier. | The job that ends earlier succeeds. An error is returned for the job that ends later. | The job that ends earlier succeeds. An error is returned for the job that ends later. | The job that ends earlier succeeds. An error is returned for the job that ends later. | The job that ends earlier succeeds. An error is returned for the job that ends later. |
INSERT INTO (job that ends earlier) | Both jobs succeed. The result data of the job that ends later overwrites that of the job that ends earlier. | The job that ends earlier succeeds. An error is returned for the job that ends later. | The job that ends earlier succeeds. An error is returned for the job that ends later. | Both jobs succeed. The compaction operation of the job that ends later succeeds. | The job that ends earlier succeeds. An error is returned for the job that ends later. |
UPDATE /DELETE (job that ends earlier) | Both jobs succeed. The result data of the job that ends later overwrites that of the job that ends earlier. | The job that ends earlier succeeds. An error is returned for the job that ends later. | The job that ends earlier succeeds. An error is returned for the job that ends later. | Both jobs succeed. The compaction operation of the job that ends later succeeds. | The job that ends earlier succeeds. An error is returned for the job that ends later. |
MINOR COMPACT (job that ends earlier) | Both jobs succeed. The result data of the job that ends later overwrites that of the job that ends earlier. | Both jobs succeed. New data is written for the job that ends later. | Both jobs succeed. New data is written for the job that ends later. | The job that ends earlier succeeds. An error is returned for the job that ends later. | Both jobs succeed. The compaction operation of the job that ends later succeeds. |
MAJOR COMPACT (job that ends earlier) | Both jobs succeed. The result data of the job that ends later overwrites that of the job that ends earlier. | Both jobs succeed. New data is written for the job that ends later. | Both jobs succeed. New data is written for the job that ends later. | The job that ends earlier succeeds. An error is returned for the job that ends later. | The job that ends earlier succeeds. An error is returned for the job that ends later. |
Concurrency conflict optimization
The preceding conflict detection rules do not apply to concurrent operations at the row or file level. A sequence of data processing operations are managed as a transaction. For specific frequently performed operations, the transaction conflict handling logic is optimized based on the semantics of the operations on the premise of data correctness. This better supports concurrency control. For example, you perform a clustering operation and an INSERT INTO operation in a transaction at the same time. The transaction does not fail even if the start time and commit time of the transaction overlap. This is because the clustering operation changes the method of organizing data but does not change the data status. The clustering operation and the INSERT INTO operation do not cause status inconsistency and can be concurrently performed. The transaction conflict handling logic will continue to be optimized to adapt to more scenarios.
If a conflict detection fails, a metadata-level retry is performed on the premise of data correctness. Data does not need to be read or written again. This improves user experience and reduces resource consumption.
Metadata is atomically updated to ensure data consistency.
The concurrency conflict optimization applies to only single-table transactions.
Management of data file versions
Each transaction generates a batch of new data files. These data files are associated with the following transaction versions:
Time version: indicates the transaction commit time of the TIMESTAMP type. A new time version is generated only for operations that are triggered by users and have logical data changes. Clustering and compaction operations only reorganize and optimize physical data, and do not add or modify data. Therefore, no new time version is generated for clustering and compaction operations. This way, when users perform incremental queries based on time versions, data files generated by clustering and compaction operations are not queried. This meets business requirements.
ID version: an auto-increment integer. An incremental ID version is generated for any data operation in a transaction, including clustering and compaction operations performed within the engine. The ID version is mainly used for internal transaction management. The ID version is also public to users and can be used for incremental queries.