You can configure the lifecycle for a table based on your business requirements and data usage frequency to perform automatic data reclamation. The proper configuration of the lifecycle for a table allows you to flexibly manage the storage period of data, ensure the timeliness and validity of data, and minimize resource usage and costs.
Lifecycle
If data in a table or partition does not change after a specified period of time from the last update, the table or partition is automatically reclaimed by MaxCompute. The lifecycle specifies the period of time. A lifecycle-based table scan is performed at a scheduled time each day to scan all the partitions of a table.
If the value of the LastModifiedTime parameter exceeds the specified lifecycle for a non-partitioned table, MaxCompute reclaims the data of the table and deletes the table.
For a partitioned table, MaxCompute determines whether to reclaim the data of a partition based on the value of the LastModifiedTime parameter that is configured for each partition of the table. When the last partition is reclaimed, the table is not deleted.
If you want to automatically delete the table after the last partition is reclaimed, you can use one of the following methods:
NoteIf project-level and table-level settings conflict, the table-level settings take precedence.
Project-level settings
-- Automatically delete the table after the last partition of the table is reclaimed. setproject odps.table.lifecycle.deletemeta.on.expiration=true; -- Retain the table by default after the last partition is reclaimed. setproject odps.table.lifecycle.deletemeta.on.expiration=false;
Table-level settings
--- Create a table and configure a lifecycle-related parameter to allow the table to be automatically deleted after the last partition of the table is reclaimed. CREATE TABLE <Table_Name>(id int, name string) partitioned BY (ds string) tblproperties ('lifecycle.deletemeta'='true') lifecycle 1; --- If a table exists, configure a lifecycle-related parameter to allow the table to be automatically deleted after the last partition of the table is reclaimed. ALTER TABLE <table_name> SET tblproperties('lifecycle.deletemeta'='true'); -- Retain the table by default after the last partition is reclaimed. ALTER TABLE <table_name> SET tblproperties('lifecycle.deletemeta'='false');
The following table describes lifecycle-related operations.
Operation | Description | Role | Platform |
Configures the lifecycle for a table when you create the table or configures or modifies the lifecycle for an existing table. | Users who have the Alter permission on tables | You can execute the statements on the following platforms: | |
Disables or restores the lifecycle configured for a specified table or partition. |
Precautions
Task scheduling dependencies do not affect the reclamation of table data when the lifecycle of the table elapsed.
If you configure the lifecycle for a partitioned table, the lifecycle applies to the entire table, including all partitions of the table. You cannot configure or modify the lifecycle at the partition level.
After data is reclaimed, it is retained in the recycle bin for one day by default. The data is permanently deleted after one day and cannot be recovered.
The lifecycle feature allows MaxCompute to periodically reclaim a table or a partition. The availability of the system determines whether MaxCompute can immediately reclaim a table or a partition when the period after the time specified by LastModifiedTime exceeds the lifecycle of the table or the partition. Therefore, MaxCompute cannot always reclaim a table or a partition immediately after the lifecycle elapsed.
When data is written, updated, or deleted, the value of the LastModifiedTime parameter is updated. When the lifecycle is modified, the value of the LastModifiedTime parameter does not change.
Configure the lifecycle for a table
Configure the lifecycle for a table when you create the table
Syntax
-- Create a table. CREATE [external] TABLE [if not exists] <table_name> [(<col_name> <data_type> [default <default_value>] [comment <col_comment>], ...)] [comment <table_comment>] [partitioned BY (<col_name> <data_type> [comment <col_comment>], ...)] -- Configure the shuffle and sort properties of a clustered table that you want to create. [clustered BY | range clustered BY (<col_name> [, <col_name>, ...]) [sorted BY (<col_name> [ASC | DESC] [, <col_name> [ASC | DESC] ...])] INTO <number_of_buckets> buckets] -- Used only for external tables. [stored by StorageHandler] -- Used only for external tables. [with serdeproperties (options)] -- Used only for external tables. [location <osslocation>] -- Configure the lifecycle for the table. lifecycle <days>; -- Create a table based on an existing table and replicate the schema of the existing table. You can execute the CREATE TABLE statement to create a table based on an external table or an existing table of an external project that is used to implement the data lakehouse solution. create table [if not exists] <table_name> like <existing_table_name> [lifecycle <days>];
Parameters
table_name: required. The name of the table for which you want to configure the lifecycle.
days: required. The lifecycle after modification. The value must be a positive integer. Unit: days.
NoteThe preceding section describes only the lifecycle-related parameters. For more information about the parameters for creating a table, see Create a table.
Example
-- Create the test_lifecycle table with the lifecycle of 100 days. CREATE TABLE test_lifecycle (key string) lifecycle 100;
Configure or modify the lifecycle of an existing table
You can use one of the following methods to configure or modify the lifecycle for an existing table.
Syntax
ALTER TABLE <Table_Name> SET lifecycle <days>;
Parameters
table_name: required. The name of the table for which you want to modify the lifecycle.
days: required. The lifecycle after modification. The value must be a positive integer. Unit: days.
Example
-- Change the lifecycle of the test_lifecycle table to 50 days. ALTER TABLE test_lifecycle set lifecycle 50;
Disable or restore the lifecycle for a table
Syntax
ALTER TABLE <table_name> [<pt_spec>] {enable|disable} lifecycle;
Parameters
table_name: required. The name of the table whose lifecycle you want to disable or restore.
pt_spec: optional. The partition of the table whose lifecycle you want to disable or restore. If a table has multi-level partitions, you must specify the values of all the partition key columns. The value of this parameter is in the
partition_col1=col1_value1, partition_col2=col2_value1...
format.enable: restores the lifecycle for a table or a specified partition of a table.
NoteYou can apply the default lifecycle for a table or partitions after you restore the lifecycle for the table or partitions. You must learn the lifecycle that is configured before you restore the lifecycle. This prevents data from being mistakenly reclaimed due to the use of previous settings.
disable: disables the lifecycle for a table or a specified partition.
NoteThe setting of disabling the lifecycle for a table takes precedence over the setting of restoring the lifecycle for the table. For example, if you configure
table disable lifecycle
, the configuration ofpt_spec enable lifecycle
does not take effect.After you disable the lifecycle for a table, MaxCompute still records the previous lifecycle settings and allows you to modify the lifecycle settings.
Examples
Example 1: Disable the lifecycle for the
trans
table.ALTER TABLE trans disable lifecycle;
Example 2: Disable the lifecycle for the dt='20141111' partition in the
trans
table.ALTER TABLE trans partition (dt='20141111') disable lifecycle;