You can configure the lifecycle for a table based on your business requirements and data usage frequency to perform automatic data reclamation. The proper configuration of the lifecycle for a table allows you to flexibly manage the storage period of data, ensure the timeliness and validity of data, and minimize resource usage and costs.
Lifecycle
If data in a table or partition does not change after a specified period of time from the last update, the table or partition is automatically reclaimed by MaxCompute. The lifecycle specifies the period of time. A lifecycle-based table scan is performed at a scheduled time each day to scan all the partitions of a table.
If the value of the LastModifiedTime parameter exceeds the specified lifecycle for a non-partitioned table, MaxCompute reclaims the data of the table and deletes the table.
For a partitioned table, MaxCompute determines whether to reclaim the data of a partition based on the value of the LastModifiedTime parameter that is configured for each partition of the table. When the last partition is reclaimed, the table is not deleted.
If you want to automatically delete the table after the last partition is reclaimed, you can use one of the following methods:
Note
If project-level and table-level settings conflict, the table-level settings take precedence.
Project-level settings
setproject odps.table.lifecycle.deletemeta.on.expiration=true;
setproject odps.table.lifecycle.deletemeta.on.expiration=false;
Table-level settings
CREATE TABLE <Table_Name>(id int, name string)
partitioned BY (ds string)
tblproperties ('lifecycle.deletemeta'='true')
lifecycle 1;
ALTER TABLE <table_name> SET tblproperties('lifecycle.deletemeta'='true');
ALTER TABLE <table_name> SET tblproperties('lifecycle.deletemeta'='false');
The following table describes lifecycle-related operations.
Operation | Description | Role | Platform |
Operation | Description | Role | Platform |
Configure the lifecycle for a table | Configures the lifecycle for a table when you create the table or configures or modifies the lifecycle for an existing table. | Users who have the Alter permission on tables | You can execute the statements on the following platforms: |
Disable or restore the lifecycle | Disables or restores the lifecycle configured for a specified table or partition. |
Precautions
Task scheduling dependencies do not affect the reclamation of table data when the lifecycle of the table elapsed.
If you configure the lifecycle for a partitioned table, the lifecycle applies to the entire table, including all partitions of the table. You cannot configure or modify the lifecycle at the partition level.
After data is reclaimed, it is retained in the recycle bin for one day by default. The data is permanently deleted after one day and cannot be recovered.
The lifecycle feature allows MaxCompute to periodically reclaim a table or a partition. The availability of the system determines whether MaxCompute can immediately reclaim a table or a partition when the period after the time specified by LastModifiedTime exceeds the lifecycle of the table or the partition. Therefore, MaxCompute cannot always reclaim a table or a partition immediately after the lifecycle elapsed.
When data is written, updated, or deleted, the value of the LastModifiedTime parameter is updated. When the lifecycle is modified, the value of the LastModifiedTime parameter does not change.
Configure the lifecycle for a table
Configure the lifecycle for a table when you create the table
Syntax
CREATE [external] TABLE [if not exists] <table_name>
[(<col_name> <data_type> [default <default_value>] [comment <col_comment>], ...)]
[comment <table_comment>]
[partitioned BY (<col_name> <data_type> [comment <col_comment>], ...)]
[clustered BY | range clustered BY (<col_name> [, <col_name>, ...]) [sorted BY (<col_name> [ASC | DESC] [, <col_name> [ASC | DESC] ...])] INTO <number_of_buckets> buckets]
[stored by StorageHandler]
[with serdeproperties (options)]
[location <osslocation>]
lifecycle <days>;
create table [if not exists] <table_name> like <existing_table_name> [lifecycle <days>];
Parameters
Example
CREATE TABLE test_lifecycle (key string) lifecycle 100;
Configure or modify the lifecycle of an existing table
You can use one of the following methods to configure or modify the lifecycle for an existing table.
Syntax
ALTER TABLE <Table_Name> SET lifecycle <days>;
Parameters
Example
ALTER TABLE test_lifecycle set lifecycle 50;
Disable or restore the lifecycle for a table
Syntax
ALTER TABLE <table_name> [<pt_spec>] {enable|disable} lifecycle;
Parameters
table_name: required. The name of the table whose lifecycle you want to disable or restore.
pt_spec: optional. The partition of the table whose lifecycle you want to disable or restore. If a table has multi-level partitions, you must specify the values of all the partition key columns. The value of this parameter is in the partition_col1=col1_value1, partition_col2=col2_value1...
format.
enable: restores the lifecycle for a table or a specified partition of a table.
Note
You can apply the default lifecycle for a table or partitions after you restore the lifecycle for the table or partitions. You must learn the lifecycle that is configured before you restore the lifecycle. This prevents data from being mistakenly reclaimed due to the use of previous settings.
disable: disables the lifecycle for a table or a specified partition.
Note
The setting of disabling the lifecycle for a table takes precedence over the setting of restoring the lifecycle for the table. For example, if you configure table disable lifecycle
, the configuration of pt_spec enable lifecycle
does not take effect.
After you disable the lifecycle for a table, MaxCompute still records the previous lifecycle settings and allows you to modify the lifecycle settings.