Data Lake Formation: Overview of lifecycle management

Last Updated: May 14, 2024

Lifecycle management supports multiple types of lifecycle rules. You can easily manage data lifecycles in data lakes by creating different lifecycle rules to save storage costs. This topic describes the basic operations on lifecycle rules.

Feature description

You can use lifecycle management to configure data management rules for databases and tables in a data lake. You can convert the storage class of data on a regular basis based on the following rule types: Last Access Time of Data, Partition Value By Time, Last Partition/Table Update Time, and Partition/Table Creation Time. This reduces data storage costs. You can also convert the storage class of a table to Standard by restoring the table.

Scenarios

  • A large amount of historical database or table data exists in data lakes. The historical data is no longer used for your business over time. In this case, you want to convert the storage class of the historical data to Infrequent Access (IA), Archive, or Cold Archive to save costs. Examples:

    • Order tables. The partitions of order tables are created based on the time, such as the partition named 20220101. Only the order table data of the last three years needs to be analyzed. The storage class of the historical partition data needs to be changed to Cold Archive to reduce storage costs. In this case, you can configure periodic archiving based on the Partition Value By Time rule type.

    • Database A of Business A. Business A is no longer under active development, so the historical data in Database A needs to be temporarily archived. You can change the storage class of data in Database A from IA, Standard, or Archive to Cold Archive.
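The order table scenario can be sketched in a few lines of Python. This is only an illustration of the selection logic, not how DLF evaluates the Partition Value By Time rule type. The function name `partitions_to_archive`, the `keep_years` parameter, and the assumption that partition values use the `yyyyMMdd` format (as in 20220101) are hypothetical.

```python
from datetime import date, datetime


def partitions_to_archive(partition_values, today, keep_years=3):
    """Return the partition values that fall outside the retention window.

    partition_values: strings such as "20220101" (yyyyMMdd is an assumption).
    today: the date on which the check runs.
    keep_years: number of years of partitions that keep their storage class.
    """
    # The cut-off is the same calendar day, keep_years earlier.
    # February 29 falls back to February 28 in a non-leap target year.
    try:
        cutoff = today.replace(year=today.year - keep_years)
    except ValueError:
        cutoff = today.replace(year=today.year - keep_years, day=28)

    expired = []
    for value in partition_values:
        partition_date = datetime.strptime(value, "%Y%m%d").date()
        if partition_date < cutoff:
            expired.append(value)
    return expired


candidates = partitions_to_archive(
    ["20200315", "20220101", "20240501"], today=date(2024, 5, 14)
)
print(candidates)
```

With a three-year window and a run date of May 14, 2024, only the partition 20200315 is older than the cut-off, so it is the only candidate for conversion to Cold Archive.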

Limits

  1. Lifecycle management applies only to data whose metadata is managed by using Data Lake Formation (DLF) and that is stored in Object Storage Service (OSS).

  2. Unstructured data management is not supported. If you require unstructured data management, see Lifecycle in OSS documentation.

Billing

If you want to use the lifecycle management feature, the following fees are involved:

  1. The lifecycle management feature of DLF is in public preview. At present, this feature is free of charge.

  2. For more information about the fees related to lifecycle rules, see Fees related to lifecycle rules in OSS documentation.

Precautions

  1. If the storage class of data is changed to Archive or Cold Archive, the data cannot be accessed by compute engines. You must manually restore the data before you use the data, and related fees are generated. For more information, see the following topics:

    1. Overview

    2. Convert storage classes

     Configure lifecycle rules based on your business requirements. Proceed with caution.

  2. If the storage class of data is changed to IA, the performance of the data is degraded when the data is accessed by compute engines. Configure lifecycle rules based on your business requirements. Proceed with caution.

  3. After you turn on the Execution Scheduling switch for lifecycle rule tasks, the lifecycle rule tasks are periodically executed every night and take effect before 08:00 the next day. For lifecycle rule tasks that are manually executed, the lifecycle rule tasks immediately take effect after they are executed.
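The access impact described in these precautions can be expressed as a small pre-flight check that a job might run before it reads a table. This is a minimal sketch under stated assumptions: the function names `access_impact` and `blocked_partitions` and the input mapping are hypothetical, and the storage class strings simply reuse the names in this topic rather than any API values.

```python
# Storage class names reuse the wording of this topic (not API constants).
NEEDS_RESTORE = {"Archive", "Cold Archive"}
DEGRADED = {"IA"}


def access_impact(storage_class):
    """Classify how a storage class affects reads by compute engines."""
    if storage_class in NEEDS_RESTORE:
        return "restore-required"
    if storage_class in DEGRADED:
        return "degraded"
    if storage_class == "Standard":
        return "normal"
    raise ValueError(f"unknown storage class: {storage_class!r}")


def blocked_partitions(partition_classes):
    """Return the partitions that cannot be read until they are restored."""
    return sorted(
        name
        for name, storage_class in partition_classes.items()
        if access_impact(storage_class) == "restore-required"
    )


print(blocked_partitions(
    {"20200101": "Cold Archive", "20230101": "IA", "20240101": "Standard"}
))
```

A check of this kind makes the cost of a rule visible before a job fails: partitions reported as blocked must be restored manually, which generates fees.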

Instructions

Prerequisites

  1. OSS is activated. If you have not activated OSS, go to the OSS console to activate OSS.

  2. The permissions on databases and tables for lifecycle management are subject to data permission control enforced by DLF. Consequently, you can configure lifecycle rules only for databases and tables within your authorized permissions.

Create a lifecycle rule

Perform the following steps to create a lifecycle rule:

  1. Log on to the DLF console. Choose Lake Management > Lifecycle Management.

  2. On the Lifecycle Management page, click Create Rule to create a lifecycle rule.

a) Set the Name, Description, and Resource Type parameters.

You can set the Resource Type parameter to Database or Table. If you set the Resource Type parameter to Database, a lifecycle rule is configured for a metadatabase. If you set the Resource Type parameter to Table, a lifecycle rule is configured for a metadata table.

b) Set the Rule Type parameter. DLF supports the following rule types:

  • Last Access Time of Data: You can define a lifecycle based on the time when the data is last accessed. If a table has partitions, the last partition access time at the finest granularity is used. If a table does not have partitions, the last table access time is used.

  • Partition Value By Time: You can define a lifecycle based on partition values. This rule type is suitable for tables whose level-1 partitions contain time formats.

  • Last Partition/Table Update Time: You can define a lifecycle based on the time when partitions or tables are modified. If a table has partitions, the last partition update time at the finest granularity is used. If a table does not have partitions, the last table update time is used.

  • Partition/Table Creation Time: You can define a lifecycle based on the time when partitions or tables are created. If a table has partitions, the partition creation time at the finest granularity is used. If a table does not have partitions, the table creation time is used.
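The three timestamp-based rule types share one pattern: a partitioned table is evaluated per finest-granularity partition, and a table without partitions is evaluated as a whole. The following sketch illustrates that pattern only. The `Timestamps` and `Table` classes, the `RULE_FIELDS` mapping, and the `reference_times` function are hypothetical and do not reflect the DLF data model. Partition Value By Time is left out because it reads partition values instead of timestamps.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict

# Hypothetical mapping from rule type to the timestamp that the rule reads.
RULE_FIELDS = {
    "Last Access Time of Data": "last_access",
    "Last Partition/Table Update Time": "last_update",
    "Partition/Table Creation Time": "created",
}


@dataclass
class Timestamps:
    last_access: datetime
    last_update: datetime
    created: datetime


@dataclass
class Table:
    times: Timestamps
    # Keyed by finest-granularity partition name; empty if not partitioned.
    partitions: Dict[str, Timestamps] = field(default_factory=dict)


def reference_times(table, rule_type):
    """Return {unit name: reference time} for one table under a rule type."""
    attr = RULE_FIELDS[rule_type]
    if table.partitions:
        # Partitioned table: each partition is judged by its own timestamp.
        return {name: getattr(ts, attr) for name, ts in table.partitions.items()}
    # Table without partitions: the table-level timestamp is used.
    return {"<table>": getattr(table.times, attr)}
```

For a table partitioned by year, month, and day, the keys would be the day-level partitions, so two partitions of the same table can end up in different storage classes.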

c) Select the interval at which the storage class of the data is converted to IA, Archive, or Cold Archive.

d) Configure the rule execution mechanism. If you want DLF to automatically execute the current lifecycle rule every day, turn on the Execution Scheduling switch. If the current lifecycle rule does not need to be automatically executed every day, you can click Manual Execution on the Lifecycle Management page to manually execute the lifecycle rule after the lifecycle rule is created. The periodic execution is completed before 08:00 every day.

  3. Click Next. On the page that appears, select the metadatabase or the metadata table that you want to archive.

a) Click Add Database Resources. In the Add Database Resources dialog box, select the resources that you want to associate. You can search for resources or select resources on different pages.

b) Click Add. Then, click OK. In the message that appears, click OK. The resource association result is displayed.

If the association succeeds, you can check the number of resources that are associated.

If the association fails, you can check the failure cause.

Note

  1. If you set the Resource Type parameter to Database, you can add database resources. If you set the Resource Type parameter to Table, you can add table resources.

  2. The priority of a table rule is higher than that of a database rule. If a table that is already covered by a database rule is associated with a table rule, the table rule replaces the database rule for that table.

  3. Each database or table can be associated with only one lifecycle rule at a time.

  4. Each lifecycle rule can be associated with a maximum of 1,000 resources.

  5. You can configure a lifecycle rule and then associate resources with the lifecycle rule. After you associate resources with the lifecycle rule, click Save.
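The association constraints in the preceding note can be modeled as follows. This is a sketch for reasoning about the constraints, not a description of how DLF stores associations. The class `RuleAssociations` and its methods are hypothetical. The sketch also assumes that associating a resource with a second rule replaces the first association; the console may instead reject the request.

```python
MAX_RESOURCES_PER_RULE = 1000  # limit stated in the note


class RuleAssociations:
    """Toy model of rule-to-resource associations (hypothetical)."""

    def __init__(self):
        self._database_rule = {}  # database name -> rule ID
        self._table_rule = {}     # (database, table) -> rule ID
        self._count = {}          # rule ID -> number of associated resources

    def _assign(self, mapping, key, rule_id):
        previous = mapping.get(key)
        if previous == rule_id:
            return  # already associated with this rule
        if self._count.get(rule_id, 0) >= MAX_RESOURCES_PER_RULE:
            raise ValueError(
                f"rule {rule_id} already has {MAX_RESOURCES_PER_RULE} resources"
            )
        if previous is not None:
            # One rule per resource: the earlier association is dropped
            # (an assumption of this sketch).
            self._count[previous] -= 1
        mapping[key] = rule_id
        self._count[rule_id] = self._count.get(rule_id, 0) + 1

    def associate_database(self, database, rule_id):
        self._assign(self._database_rule, database, rule_id)

    def associate_table(self, database, table, rule_id):
        self._assign(self._table_rule, (database, table), rule_id)

    def effective_rule(self, database, table):
        """A table rule takes priority over the rule of its database."""
        table_rule = self._table_rule.get((database, table))
        if table_rule is not None:
            return table_rule
        return self._database_rule.get(database)
```

For example, after `associate_database("sales", "rule-db")` and `associate_table("sales", "orders", "rule-tbl")`, the table `orders` follows `rule-tbl`, while every other table in `sales` still follows `rule-db`.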

Edit a lifecycle rule

To modify a lifecycle rule, find the rule on the Lifecycle Management page and click Edit.

Important

  1. After the lifecycle rule is modified, the modifications take effect on the next day if you turn on the Execution Scheduling switch.

  2. After the modified lifecycle rule is executed again, all resources that are associated with the lifecycle rule are affected. The following items are the impacts:

    1. If the data is already in the IA, Archive, or Cold Archive storage class, its current storage class remains unchanged.

    2. If the data is not in the IA, Archive, or Cold Archive storage class, the modified lifecycle rule takes effect.
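The effect of running a modified rule can be sketched as a single pass over the associated resources. The function `apply_modified_rule`, its parameters, and the `matches` predicate that stands in for the rule's time condition are hypothetical and serve only to illustrate the two cases above.

```python
COLD_CLASSES = {"IA", "Archive", "Cold Archive"}


def apply_modified_rule(current_classes, target_class, matches):
    """Return the storage class of each resource after a modified rule runs.

    current_classes: {resource name: current storage class}
    target_class: storage class that the modified rule converts data to
    matches: callable that tells whether the rule's time condition holds
    """
    result = {}
    for name, storage_class in current_classes.items():
        if storage_class in COLD_CLASSES:
            # Already converted by an earlier run: left unchanged.
            result[name] = storage_class
        elif matches(name):
            # Still in Standard and the condition holds: the rule applies.
            result[name] = target_class
        else:
            result[name] = storage_class
    return result


after = apply_modified_rule(
    {"p1": "IA", "p2": "Standard", "p3": "Standard"},
    target_class="Archive",
    matches=lambda name: name != "p3",
)
print(after)
```

In this example, `p1` stays in IA even though the modified rule targets Archive, `p2` is converted to Archive, and `p3` stays in Standard because it does not meet the condition.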

View a lifecycle rule

  1. Log on to the DLF console. Choose Lake Management > Lifecycle Management.

  2. Find the desired lifecycle rule, click the ID of the lifecycle rule, and then view the information about the lifecycle rule.

  • Basic Information: On the Basic Information tab, you can check the basic information about the lifecycle rule, details about the lifecycle rule, and the execution mechanism.

  • Resource Information: On the Resource Information tab, you can check the information about the associated databases or tables.

  • Execution History: On the Execution History tab, you can check the information about manual or periodic rule execution.

Delete a lifecycle rule

  1. Log on to the DLF console. Choose Lake Management > Lifecycle Management.

  2. Find the lifecycle rule that you want to delete and click Delete in the Actions column. In the message that appears, click Delete.

Note

  1. After you delete a lifecycle rule, the lifecycle rule can no longer be executed, either manually or on a schedule.

  2. After you delete a lifecycle rule, the data that was affected by the lifecycle rule retains its current storage class.

Manually execute a lifecycle rule

  1. Log on to the DLF console. Choose Lake Management > Lifecycle Management.

  2. Find the lifecycle rule that you want to manually execute and click Manual Execution in the Actions column. In the message that appears, click OK to manually execute the lifecycle rule.

Important

After the preceding steps are performed, the lifecycle rule is immediately executed. Data about the associated resources is affected, and business access may be affected. Before you manually execute the lifecycle rule, evaluate risks in advance.

View task execution records

  1. Log on to the DLF console. Choose Lake Management > Lifecycle Management.

  2. Click the Execution History tab. You can query all historical archiving tasks and view their execution logs.

  3. Click the ID of the desired task and view the task execution information and the execution log.

Restore a table

  1. DLF supports table restoration. After you click Restore Table on the Storage Rule tab, the storage class of the table is changed to Standard.

  2. If you want to convert cold data to hot data, see the following topics: