This topic describes the architecture of data organization optimization for Delta tables.
Clustering
Pain points
Delta tables support minute-level, near-real-time import of incremental data. In scenarios with high write traffic, the number of small incremental files can grow quickly, which increases access load and storage costs. A large number of small files also causes other issues, such as frequent metadata updates, slow data analysis, and low I/O efficiency. To resolve these issues, you can use the clustering operation to automatically merge the small files in these scenarios.
Solution
The storage service of MaxCompute performs the clustering operation to merge small files. The clustering operation only merges files; the intermediate status of all data is retained and remains unchanged.
Process of the clustering operation
The following figure shows the process of the clustering operation.
A clustering policy can be configured based on typical read and write business scenarios. After the configuration, MaxCompute periodically merges data files in a hierarchical manner based on multiple dimensions, such as the size and number of data files. In the preceding figure, the original Delta files that are highlighted in blue at Level 0 are merged into medium-sized Delta files that are highlighted in yellow at Level 1. If the number of medium-sized Delta files at Level 1 reaches the specified threshold, these files are further merged into large-sized Delta files that are highlighted in orange at Level 2.
Data files that exceed a specific size are isolated and excluded from further merging, which prevents unnecessary read and write amplification. In the preceding figure, the t8 data file in Bucket 3 is not merged with other files. In addition, only files whose generation time falls within a specified time range are merged into one file. If files with a large time difference were merged into one file, a time travel query or an incremental query might read a large amount of historical data outside the queried time range, which also causes unnecessary read amplification.
Data of a Transaction Table 2.0 table is split by BucketIndex for storage. Therefore, the clustering operation is performed concurrently at the bucket granularity, which significantly reduces the overall runtime.
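The following Python sketch models the hierarchical merging policy described above. It is only an illustration of the idea, not the MaxCompute implementation: the DataFile structure, the size and count thresholds, and the time window are all assumed values for demonstration.

```python
from dataclasses import dataclass
from typing import List

# Illustrative thresholds; the real values are decided by the configured clustering policy.
MAX_MERGE_SIZE = 1 << 30       # files larger than this are isolated and never re-merged
MERGE_TIME_WINDOW = 3600       # only files generated within the same window are merged (seconds)
MIN_FILES_TO_MERGE = 4         # number of files at one level that triggers a merge

@dataclass
class DataFile:
    path: str
    size: int         # bytes
    level: int        # 0 = original Delta file, 1 = medium-sized, 2 = large-sized
    commit_time: int  # epoch seconds

def plan_cluster_merges(bucket_files: List[DataFile]) -> List[List[DataFile]]:
    """Group the files of one bucket into merge candidates, level by level."""
    merge_groups: List[List[DataFile]] = []
    for level in (0, 1):  # Level 0 -> Level 1, Level 1 -> Level 2
        # Oversized files are excluded so that they are not rewritten again,
        # which prevents unnecessary read and write amplification.
        candidates = sorted(
            (f for f in bucket_files if f.level == level and f.size < MAX_MERGE_SIZE),
            key=lambda f: f.commit_time,
        )
        group: List[DataFile] = []
        for f in candidates:
            # Only files whose generation time falls within the same time range are
            # merged into one file, to avoid read amplification for time travel
            # and incremental queries.
            if group and f.commit_time - group[0].commit_time > MERGE_TIME_WINDOW:
                if len(group) >= MIN_FILES_TO_MERGE:
                    merge_groups.append(group)
                group = []
            group.append(f)
        if len(group) >= MIN_FILES_TO_MERGE:
            merge_groups.append(group)
    return merge_groups
```

Because each bucket is planned independently, such merge plans can be executed for all buckets in parallel, which matches the bucket-level concurrency described above.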
The clustering operation interacts with Meta Service to obtain the list of tables or partitions on which clustering needs to be performed. After clustering is complete, the information about the old and new data files is passed to Meta Service. Meta Service detects transaction conflicts of the clustering operation, atomically updates the metadata of the data files, and reclaims the old data files in a secure manner.
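The interaction with Meta Service can be pictured as a simple commit protocol. The following sketch uses a hypothetical MetaServiceStub class to model only that contract; the real Meta Service API is not shown in this topic, and all names here are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

@dataclass
class CommitRequest:
    table: str
    removed_files: List[str]  # old Delta files that were merged by clustering
    added_files: List[str]    # new merged files produced by clustering

class MetaServiceStub:
    """Hypothetical stand-in that models only the commit contract described above."""

    def __init__(self) -> None:
        self.visible_files: Dict[str, Set[str]] = {}  # table -> currently visible files
        self.pending_gc: List[str] = []               # old files waiting to be reclaimed

    def commit(self, req: CommitRequest) -> bool:
        visible = self.visible_files.setdefault(req.table, set())
        # Transaction conflict detection: if another transaction already removed or
        # rewrote any of the input files, this clustering commit is rejected.
        if not set(req.removed_files) <= visible:
            return False
        # Atomic metadata update: old files become invisible, new files become visible.
        visible.difference_update(req.removed_files)
        visible.update(req.added_files)
        # Old files are reclaimed later in a secure manner instead of being deleted inline.
        self.pending_gc.extend(req.removed_files)
        return True
```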
The clustering operation resolves the read and write efficiency issues caused by a large number of data files. However, it also consumes computing and I/O resources, because the data involved is completely read and rewritten each time the operation is performed, which causes read and write amplification. Therefore, MaxCompute automatically triggers the clustering operation based on the system status to ensure that the operation runs efficiently.
Compaction
Pain points
Delta tables allow you to write data of the UPDATE and DELETE types. If a large amount of such data is written to a Delta table, a large amount of redundant intermediate status data is generated, which increases storage and computing costs and reduces query efficiency. Therefore, we recommend that you perform the compaction operation based on your business requirements to delete the intermediate status and optimize performance in this scenario.
Solution
The storage service of MaxCompute performs the compaction operation to merge data files. You can manually execute SQL statements to trigger the compaction operation. You can also configure table property parameters to automatically trigger the compaction operation based on dimensions, such as the time frequency and the number of commits. The compaction operation merges the selected data files, including base files and Delta files, to delete the intermediate status of data of the UPDATE and DELETE types. If multiple rows of data have the same primary key value, only the row with the latest status is retained. This way, a new base file in which all data is of the INSERT type is generated.
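The merge semantics can be sketched as follows. The row model (primary key, commit time, operation type, value) is a simplification for illustration; it is not the actual MaxCompute storage format.

```python
from typing import Dict, List, Tuple

# A row is modeled as (primary_key, commit_time, op, value), where op is
# "INSERT", "UPDATE", or "DELETE". This is a simplified model for illustration.
Row = Tuple[str, int, str, dict]

def compact(base_rows: List[Row], delta_rows: List[Row]) -> List[Row]:
    """Merge a base file with Delta files and delete all intermediate status."""
    latest: Dict[str, Row] = {}
    # Replay rows in commit-time order so that only the latest status of each
    # primary key survives.
    for row in sorted(base_rows + delta_rows, key=lambda r: r[1]):
        latest[row[0]] = row
    new_base: List[Row] = []
    for pk, ts, op, value in latest.values():
        if op == "DELETE":
            continue  # deleted primary keys do not appear in the new base file
        # Every surviving row is written as INSERT data in the new base file.
        new_base.append((pk, ts, "INSERT", value))
    return new_base
```

For example, if a primary key is inserted at t1 and updated at t2 and t3, only the t3 status appears in the new base file; if the key is deleted at t3, it does not appear in the new base file at all.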
Process of the compaction operation
The following figure shows the process of the compaction operation.
From t1 to t3, a number of Delta files are written to the buckets, which triggers the compaction operation. All Delta files in each bucket are merged, and a new base file is generated for each bucket. From t4 to t6, a number of new Delta files are written to the buckets, and the compaction operation is triggered again. In each bucket, the existing base file and the new Delta files are merged to generate a new base file.
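The automatic trigger condition can be illustrated as a simple check. The thresholds below are illustrative assumptions; in MaxCompute they come from table property parameters, whose names are not listed in this topic.

```python
import time

# Illustrative thresholds; in practice they are taken from table property parameters.
MAX_DELTA_COMMITS = 5          # trigger compaction after this many Delta commits
MAX_INTERVAL_SECONDS = 3600    # or after this much time since the last compaction

def should_compact(delta_commits_since_last_base: int, last_compaction_time: float) -> bool:
    """Decide whether to trigger the compaction operation for a bucket."""
    elapsed = time.time() - last_compaction_time
    return (delta_commits_since_last_base >= MAX_DELTA_COMMITS
            or elapsed >= MAX_INTERVAL_SECONDS)
```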
The compaction operation also interacts with Meta Service to obtain the list of tables or partitions on which compaction needs to be performed. This interaction follows the same process as the clustering operation: after compaction is complete, the information about the old and new data files is passed to Meta Service, which detects transaction conflicts, atomically updates the metadata of the data files, and reclaims the old data files.
The compaction operation deletes the intermediate status of data, which helps reduce computing and storage costs and effectively improves the efficiency of full snapshot queries. However, frequent compaction operations consume a large amount of computing and I/O resources, and the new base files occupy additional storage because the historical Delta files may still be required by time travel queries and cannot be deleted immediately. This increases storage costs. Therefore, determine the frequency of the compaction operation based on your business requirements and data characteristics. In scenarios where a large amount of data of the UPDATE or DELETE type exists and full data is frequently queried, you can increase the compaction frequency to improve query efficiency.
Data reclamation
Time travel queries and incremental queries read the historical status of data. Therefore, historical data must be retained for a specific period of time. You can configure the acid.data.retain.hours table property to specify the period during which historical data is retained. If historical data was generated earlier than the retention period specified by acid.data.retain.hours, the system automatically reclaims and deletes the data, including both the operation logs and the data files. After the data is deleted, you can no longer use the time travel feature to query its historical status.
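The retention rule can be expressed as a simple time comparison. acid.data.retain.hours is the table property described above; the function and its parameters are otherwise illustrative assumptions.

```python
import time

def is_out_of_retention(data_commit_time: float, acid_data_retain_hours: int) -> bool:
    """Return True if historical data was generated earlier than the retention
    period specified by acid.data.retain.hours and can therefore be reclaimed."""
    retention_seconds = acid_data_retain_hours * 3600
    return time.time() - data_commit_time > retention_seconds
```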
The Purge command is also provided to help you manually trigger a forceful clearing of the historical data in special scenarios.
If new Delta files are continuously written to buckets, none of the Delta files can be deleted, because the later Delta files may depend on the earlier ones. To remove these dependencies, you can perform the compaction operation or the INSERT OVERWRITE operation. The data files generated by these operations do not depend on the previous Delta files. After the time travel retention period of those Delta files elapses, they can be deleted.
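Putting the two conditions together, a Delta file becomes deletable only after a later compaction or INSERT OVERWRITE operation has removed the dependency on it and the file has fallen out of the time travel retention period. The following sketch is an illustrative approximation of that rule, not the exact reclamation logic.

```python
import time

def can_delete_delta_file(delta_commit_time: float,
                          latest_base_commit_time: float,
                          acid_data_retain_hours: int) -> bool:
    """A Delta file is deletable only when a newer base file (from compaction or
    INSERT OVERWRITE) no longer depends on it and the file is outside the
    time travel retention period."""
    superseded = latest_base_commit_time > delta_commit_time
    out_of_retention = time.time() - delta_commit_time > acid_data_retain_hours * 3600
    return superseded and out_of_retention
```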
To prevent an unlimited amount of historical data from accumulating in extreme scenarios where users never perform the compaction operation, the MaxCompute backend system periodically compacts the base files or Delta files that can no longer be queried by using the time travel feature. This way, the historical data files that have been compacted can be reclaimed as expected.