MaxCompute provides efficient support for typical business scenarios of Time travel queries and incremental queries on Delta tables. You can perform time travel queries to query data of historical versions. For example, you can backtrack business data at a historical point in time, or restore data in the historical state for data correction when a data error occurs. You can also directly perform the restore
operation to restore data to a specific historical version.
Query process
The following figure shows how to query data from a Delta table by using the time travel feature.
After you enter an SQL statement, the engine obtains the version of the data that you want to query after parsing, searches for the most recent base file within the time range of the version, searches for the delta files to which data is written after the base file is generated, and then merges the data in the base file and delta files to generate the output. The base file can be used to improve query and read efficiency.
The preceding figure shows how to query data in a transactional table named src.
The schema of the table consists of a pk column and a val column. The left part of the preceding figure shows the data change process. The time points t1 to t5 represent the time versions of transactions. Five data write transactions are executed, and five Delta files are generated. The compaction operation is performed at the time points t2 and t4, and two base files b1 and b2 are generated. The historical intermediate state record (2,a) is deleted from the base file b1 and only the most recent state record (2,b) is retained in the base file b1.
For example, if you want to query the historical data at the time point t1, MaxCompute needs to only read the Delta file d1 and generate an output. If you want to query the historical data at the time point t2, MaxCompute needs to only read the base file b1 and generate the output that contains three records. If you want to query the historical data at the time point t3, MaxCompute merges the data in the base file b1 and the data in the Delta file d3 and generates an output. You can follow the same rule to query data at other time points. Base files can be used to accelerate queries. However, queries of base files trigger compaction operations that consume a large number of resources. You need to select an appropriate trigger policy based on your business scenarios.
The time travel feature helps you query transaction versions. Two types of transaction versions are supported: time version and ID version. In terms of the SQL syntax, the get_latest_timestamp and get_latest_version functions are developed in addition to specific constants and common functions. The second parameter of the get_latest_timestamp and get_latest_version functions indicates the number of times that data is last committed. This helps you obtain the internal data version of MaxCompute for precise queries and improves user experience.
Period of time during which historical data can be queried
The historical status of data can be queried by using the time travel feature only if the historical status is saved. You can configure the table property acid.data.retain.hours to specify the period of time during which historical data is retained. The maximum retention period is seven days. We recommend that you specify a reasonable period of time based on your business requirements. A long data retention period increases the storage fee. If you do not need to use the time travel feature, we recommend that you set the acid.data.retain.hours property to 0 to disable this feature. This can greatly reduce the storage costs of historical data.