In MaxCompute, you can perform time travel queries and incremental queries on Delta tables. If you perform a time travel query, you can query historical data based on the snapshot of the source table at a historical point in time or of a historical version. If you perform an incremental query, you can query historical incremental data within a historical period of time or between two versions of a source table. This topic describes the usage notes and limits of queries on Delta tables.
Syntax
[WITH <cte>[, ...] ]
SELECT [ALL | DISTINCT] <select_expr>[, <except_expr>)][, <replace_expr>] ...
FROM <table_reference>
[TIMESTAMP | VERSION AS OF expr]
[TIMESTAMP | VERSION BETWEEN start_expr AND end_expr]
[WHERE <where_condition>]
[GROUP BY {<col_list> | ROLLUP(<col_list>)}]
[HAVING <having_condition>]
[ORDER BY <order_condition>]
[DISTRIBUTE BY <distribute_condition> [SORT BY <sort_condition>]|[ CLUSTER BY <cluster_condition>] ]
[LIMIT <number>]
[WINDOW <window_clause>]
In most cases, you can execute data query language (DQL) statements to query Delta tables in all scenarios. The syntax and limits of DQL statements basically comply with the syntax and limits of MaxCompute DQL. The only difference is that the syntax of the FROM TABLE clause is optimized in DQL statements. The optimized FROM TABLE clause provides a fixed-format expression that allows you to specify a historical point in time or historical version for time travel queries or specify a time range or version range for incremental queries.
Parameters and limits of time travel queries
If you perform a time travel query on Delta tables, you can query historical data based on snapshots of source tables at a historical point in time or of a historical version. You can use the [TIMESTAMP | VERSION AS OF expr]
syntax to configure time travel queries.
TIMESTAMP AS OF expr
Parameters
TIMESTAMP AS OF
: a fixed syntax format, which is followed by a historical point in time. Historical data that was stored before the historical point in time is queried based on snapshots.expr
: a value of the standard TIMESTAMP, DATETIME, or DATE type that is supported by MaxCompute. The value can be in one of the following formats:Date string constants:
The following table provides examples of string constants of the TIMESTAMP, DATETIME, or DATE type.
Data type
Example
TIMESTAMP
'2023-01-01 00:00:00.123'
DATETIME
'2023-01-01 00:00:00'
DATE
'2023-01-01'
Built-in time functions of MaxCompute:
current_timestamp()
,getDate() + N: current_timestamp()
andgetDate()
.In the preceding functions, N is expressed in seconds. If N is set to a negative number, the time that is N seconds before the current time is specified. If N is set to a positive number, the time that is N seconds after the current time is specified.
Syntax for Delta tables:
get_latest_timestamp(string tablename [, bigint <number>])
.For cross-project access, the value of the
tablename
parameter must be formatted asProjectName.TableName
.For a three-layer model, the value of the
tablename
parameter must be formatted asProjectName.SchemaName.TableName
.The
number
parameter is optional. The default value is 1. This parameter specifies the sequence number of a data operation in reverse chronological order. For example, to obtain the time when the last data operation is committed, set this parameter to 1. Data operations include data modifications that are initiated by users and data sorting and distribution that are initiated by the system. The returned TIMESTAMP may be the same even if the number parameter is set to different values.
Limits
The query range for historical snapshot data is
[CreateTableTimestamp, expr]
. The expr parameter specifies the time when a DML operation is committed, and the CreateTableTimestamp parameter specifies the time when the table creation operation is committed.If the time specified by the
expr
parameter is more than N hours ago or is earlier than the time when the Delta table is created, an error is returned because specific historical data may not exist. N is specified by theacid.data.retain.hours
parameter that is configured during table creation. For example, if the acid.data.retain.hours parameter is set to 72, and the expr parameter is set to a time more than 72 hours ago, an error is returned.If the time specified by the
expr
parameter is N hours ago, an error may be returned. N is specified by theacid.data.retain.hours
parameter that is configured during table creation. This is because second-level latency may occur when internal systems interact with each other. Therefore, we recommend that you do not use an error-prone syntax that is similar toTIMESTAMP AS OF current_timestamp() - time travel period
.
VERSION AS OF expr
Parameters
VERSION AS OF
: a fixed syntax format, which is followed by a historical data operation version. Historical data that was stored in versions before the historical data operation version is queried based on snapshots.expr
: a value of the BIGINT type that is supported by MaxCompute. The value supports the following formats:Constant of the BIGINT type:
For example, the constant
3
.Syntax for Delta tables:
get_latest_version(string tablename [, bigint <number>])
For cross-project access, the value of the
tablename
parameter must be formatted asProjectName.TableName
.For a three-layer model, the value of the
tablename
parameter must be formatted asProjectName.SchemaName.TableName
.The
number
parameter is optional. The default value is 1. This parameter specifies the sequence number of a data operation in reverse chronological order. For example, to obtain the version of the last data operation, set this parameter to 1. Data operations include data modifications that are initiated by users and data sorting and distribution that are initiated by the system. The returnedVERSION
varies with the value of the number parameter.
Limits
Each DML operation generates a strictly incrementing version number. You can use
SHOW HISTORY FOR TABLE/PARTITION
to view all DML operation information and obtain the corresponding version.The historical time range for time travel queries is
[CreateTableVersion, expr]
. The expr parameter specifies the version of a DML operation. TheCreateTableVersion
parameter specifies the version of the table creation operation. The default value is 1.The system obtains the time when a DML operation is committed based on the version specified by the expr parameter. If the time is more than N hours ago or the version is less than 1, an error is returned. N is specified by the
acid.data.retain.hours
parameter that is configured during table creation.If the version specified by the expr parameter is later than the version of the last DML operation, an error is returned. We recommend that you obtain the version by using the
get_latest_version
function.
Parameters and limits of incremental queries
Only Delta tables support incremental queries. If you perform an incremental query, you can query historical incremental data within a historical period of time or between two versions of a source table. You can use the [TIMESTAMP | VERSION BETWEEN start_expr AND end_expr]
syntax to configure incremental queries.
TIMESTAMP BETWEEN start_expr AND end_expr
Parameters
TIMESTAMP BETWEEN AND
: a fixed syntax format, which specifies a historical time range. Historical incremental data that was generated within the time range is queried.start_expr
andend_expr
: The usage and limits of the two parameters are consistent with those of theEXPR
parameter in theTIMESTAMP AS OF
syntax.
Limits
The time range specified by
(start_expr, end_expr]
is a left-open, right-closed interval. The start_expr and end_expr parameters specify the time when DML operations are committed.If the time specified by the
start_expr
parameter is more than N hours ago or is earlier than the table creation time, an error is returned. N is specified by theacid.data.retain.hours
parameter.If the time specified by the
end_expr
parameter is later than the time when the last DML operation is committed, the query result varies based on the value of theacid.incremental.query.out.of.time.range.enabled
parameter:If this parameter is set to the default value false, an error is returned.
If this parameter is set to true, all historical incremental data within the time range specified by
(start_expr,end_expr]
is returned.You can execute the
ALTER TABLE
statement to modify the value of this parameter. Sample statement:ALTER TABLE mf_tt2 SET tblproperties("acid.incremental.query.out.of.time.range.enabled"="true");
VERSION BETWEEN start_expr AND end_expr
Parameters
VERSION BETWEEN AND
: a fixed syntax format, which specifies a version range of historical DML operations. Historical incremental data that was generated within the version range is returned.start_expr
andend_expr
: The usage and limits of the two parameters are consistent with those of theexpr
parameter in theVERSION AS OF
syntax.
Limits
The version range specified by (start_expr, end_expr] is a left-open, right-closed interval. The start_expr and end_expr parameters specify the versions of DML operations.
The system obtains the time when a DML operation is committed based on the version specified by the
start_expr
parameter. If the time is more than N hours ago or the version is less than 1, an error is returned. N is specified by theacid.data.retain.hours
parameter that is configured for time travel queries.If the version specified by the
end_expr
parameter is later than the version of the last DML operation, the query result varies based on the value of theacid.incremental.query.out.of.time.range.enabled
parameter. If this parameter is set to the default value false, an error is returned. If this parameter is set to true, all historical incremental data that was generated within the version range specified by (start_expr, end_expr] is returned.
Others
If multiple rows of records that have the same key are queried, only the latest row is returned. If the latest row is in the DELETE state, this row is filtered out. The feature of querying the update status of data in formats similar to Change Data Capture (CDC) is planned to be provided in the future.
You cannot query the historical tables that do not exist. For example, after you perform the drop or rename operation on a table, you cannot query the historical data of the original table.
If you want to query the historical data in this type of table, you can restore a table and then perform queries.
Only Delta tables support time travel queries and incremental queries.
If you want to perform a time travel query or an incremental query on the same table in an SQL statement, you must set the timestamps or versions for the queries to the same values.
If you want to perform queries on a partitioned table, we recommend that you specify a partition for the query. This way, only the specified partition is queried, and less time is consumed.
The Multiversion Concurrency Control (MVCC) model is used for concurrent read and write transactions. This ensures that read and write operations are isolated and do not affect each other. The ReadCommitted level is supported.
Data that is generated by the compaction operation is not considered new data. Therefore, this type of data is not returned when you perform incremental queries.
Examples
Example data.
-- The version of the following table creation operation is 1. You can execute show history for table mf_tt2 to query the version; CREATE TABLE mf_tt2 ( pk bigint NOT NULL PRIMARY KEY, val bigint NOT NULL) PARTITIONED BY (dd string, hh string) tblproperties ("transactional"="true"); -- The version of the following DML operation is 2 INSERT OVERWRITE TABLE mf_tt2 PARTITION (dd='01', hh='01') VALUES (1, 1), (2, 2), (3, 3); -- The version of the following DML operation is 3 INSERT INTO TABLE mf_tt2 PARTITION (dd='01', hh='01') VALUES (3, 30), (4, 4), (5, 5);
Query table data.
Query the table creation time, which is used as a reference when you specify a historical point in time.
DESC EXTENDED mf_tt2;
The following result is returned.
+------------------------------------------------------------------------------------+ | Owner: ALIYUN$****_doctest@test.aliyunid.com | Project: doc_test_prod | | TableComment: | +------------------------------------------------------------------------------------+ | CreateTime: 2023-06-26 09:31:38 | | LastDDLTime: 2023-06-26 09:31:38 | | LastModifiedTime: 2023-06-26 09:32:31 | +------------------------------------------------------------------------------------+ | InternalTable: YES | Size: 8541 | +------------------------------------------------------------------------------------+ | Native Columns: | +------------------------------------------------------------------------------------+ | Field | Type | Label | ExtendedLabel | Nullable | DefaultValue | Comment | +------------------------------------------------------------------------------------+ | pk | bigint | | | false | NULL | | | val | bigint | | | false | NULL | | +------------------------------------------------------------------------------------+ | Partition Columns: | +------------------------------------------------------------------------------------+ | dd | string | | | hh | string | | +------------------------------------------------------------------------------------+ | Extended Info: | +------------------------------------------------------------------------------------+ | TableID: bec515a56cc9492c8f906a224c62**** | | IsArchived: false | | PhysicalSize: 25623 | | FileNum: 9 | | StoredAs: AliOrc | | CompressionStrategy: normal | | ClusterType: hash | | BucketNum: 16 | | ClusterColumns: [pk] | | SortColumns: [pk ASC] | +------------------------------------------------------------------------------------+
Query the version of a historical data operation.
SHOW HISTORY FOR TABLE mf_tt2 PARTITION (dd='01',hh='01');
The following result is returned.
ID = 20230626021756157ghict5k**** ObjectType ObjectId ObjectName VERSION(LSN) Time Operation PARTITION 4764c8e1cb634a4fb9c21f3fc850**** dd=01/hh=01 0000000000000002 2023-06-26 09:31:56 CREATE PARTITION 4764c8e1cb634a4fb9c21f3fc850**** dd=01/hh=01 0000000000000003 2023-06-26 09:32:32 APPEND
Examples of time travel queries.
Query all historical data that was stored before a specific point in time, such as the time specified by a string constant of the DATETIME type.
SELECT * FROM mf_tt2 TIMESTAMP AS OF '2023-06-26 09:33:00' WHERE dd = '01' AND hh = '01';
The following result is returned.
+------------+------------+----+----+ | pk | val | dd | hh | +------------+------------+----+----+ | 1 | 1 | 01 | 01 | | 3 | 30 | 01 | 01 | | 4 | 4 | 01 | 01 | | 5 | 5 | 01 | 01 | | 2 | 2 | 01 | 01 | +------------+------------+----+----+
Query all historical data that was stored before a version specified by a constant.
SELECT * FROM mf_tt2 VERSION AS OF 2 WHERE dd = '01' AND hh = '01';
The following result is returned.
+------------+------------+----+----+ | pk | val | dd | hh | +------------+------------+----+----+ | 1 | 1 | 01 | 01 | | 3 | 3 | 01 | 01 | | 2 | 2 | 01 | 01 | +------------+------------+----+----+
Query all historical data that was stored until the current time.
SELECT * FROM mf_tt2 TIMESTAMP AS OF current_timestamp() WHERE dd = '01' AND hh = '01';
The following result is returned.
+------------+------------+----+----+ | pk | val | dd | hh | +------------+------------+----+----+ | 1 | 1 | 01 | 01 | | 3 | 30 | 01 | 01 | | 4 | 4 | 01 | 01 | | 5 | 5 | 01 | 01 | | 2 | 2 | 01 | 01 | +------------+------------+----+----+
Query all historical data that was stored up to 10 seconds ago.
SELECT * FROM mf_tt2 TIMESTAMP AS OF current_timestamp() - 10 WHERE dd = '01' AND hh = '01';
The following result is returned.
+------------+------------+----+----+ | pk | val | dd | hh | +------------+------------+----+----+ | 1 | 1 | 01 | 01 | | 3 | 30 | 01 | 01 | | 4 | 4 | 01 | 01 | | 5 | 5 | 01 | 01 | | 2 | 2 | 01 | 01 | +------------+------------+----+----+
Query all historical data that was stored up to the time when the last second operation was committed.
SELECT * FROM mf_tt2 TIMESTAMP AS OF get_latest_timestamp('mf_tt2', 2) WHERE dd = '01' AND hh = '01';
The following result is returned.
+------------+------------+------------+------------+ | pk | val | dd | hh | +------------+------------+------------+------------+ | 1 | 1 | 01 | 01 | | 3 | 3 | 01 | 01 | | 2 | 2 | 01 | 01 | +------------+------------+------------+------------+
Query all historical data that was stored up to the version of the second most recent operation.
SELECT * FROM mf_tt2 VERSION AS OF get_latest_version('mf_tt2', 2) WHERE dd = '01' AND hh = '01';
The following result is returned.
+------------+------------+----+----+ | pk | val | dd | hh | +------------+------------+----+----+ | 1 | 1 | 01 | 01 | | 3 | 3 | 01 | 01 | | 2 | 2 | 01 | 01 | +------------+------------+----+----+
Examples of incremental queries
Query the historical incremental data within a specified period of time. For example, set the start time and end time to string constants of the DATETIME type based on the time when operations are committed.
SELECT * FROM mf_tt2 TIMESTAMP BETWEEN '2023-06-26 09:31:40' AND '2023-06-26 09:32:00' WHERE dd = '01' AND hh = '01';
The following result is returned.
+------------+------------+----+----+ | pk | val | dd | hh | +------------+------------+----+----+ | 1 | 1 | 01 | 01 | | 3 | 3 | 01 | 01 | | 2 | 2 | 01 | 01 | +------------+------------+----+----+
Query historical incremental data between two specified versions.
SELECT * FROM mf_tt2 version BETWEEN 2 AND 3 WHERE dd = '01' AND hh = '01';
The following result is returned.
+------------+------------+----+----+ | pk | val | dd | hh | +------------+------------+----+----+ | 3 | 30 | 01 | 01 | | 4 | 4 | 01 | 01 | | 5 | 5 | 01 | 01 | +------------+------------+----+----+
Query historical incremental data within the most recent 300 seconds.
Example: Use the default value false for the
acid.incremental.query.out.of.time.range.enabled
property of themf_tt2
table.SELECT * FROM mf_tt2 TIMESTAMP BETWEEN current_timestamp() - 301 AND current_timestamp() WHERE dd = '01' AND hh='01';
An error is returned in the following result.
FAILED: ODPS-0130071:[0,0] Semantic analysis exception - physical plan generation failed: com.aliyun.odps.meta.exception.MetaException: com.aliyun.odps.meta.exception.MetaException: com.aliyun.odps.metadata.common.MetastoreServerException: Incremental query can't exceed current version. Current version timestamp: 2023-06-26 09:32:32, input timestamp is: 2023-06-26 10:47:55
Example: Modify the value of the
acid.incremental.query.out.of.time.range.enabled
property of the table to true.ALTER TABLE mf_tt2 SET tblproperties("acid.incremental.query.out.of.time.range.enabled"="true");
Example: Perform the query again.
SELECT * FROM mf_tt2 TIMESTAMP BETWEEN current_timestamp() - 301 AND current_timestamp() WHERE dd = '01' AND hh='01';
The following result is displayed:
+------------+------------+----+----+ | pk | val | dd | hh | +------------+------------+----+----+ +------------+------------+----+----+
Query all historical data within the range of the third most recent commit and the first most recent commit time.
SELECT * FROM mf_tt2 TIMESTAMP BETWEEN get_latest_timestamp('mf_tt2', 3) AND get_latest_timestamp('mf_tt2') WHERE dd = '01' AND hh = '01';
The following result is returned.
+------------+------------+----+----+ | pk | val | dd | hh | +------------+------------+----+----+ | 1 | 1 | 01 | 01 | | 3 | 30 | 01 | 01 | | 4 | 4 | 01 | 01 | | 5 | 5 | 01 | 01 | | 2 | 2 | 01 | 01 | +------------+------------+----+----+
Query all historical data within the range of the third most recent and the first most recent operation versions.
SELECT * FROM mf_tt2 VERSION BETWEEN get_latest_version('mf_tt2', 3) AND get_latest_version('mf_tt2') WHERE dd = '01' AND hh = '01';
The following result is returned.
+------------+------------+----+----+ | pk | val | dd | hh | +------------+------------+----+----+ | 1 | 1 | 01 | 01 | | 3 | 30 | 01 | 01 | | 4 | 4 | 01 | 01 | | 5 | 5 | 01 | 01 | | 2 | 2 | 01 | 01 | +------------+------------+----+----+