
Data Lake Formation: Lake format management

Last Updated: Nov 22, 2024

This topic describes the automatic optimization policies provided by the lake format management feature.

Feature introduction

The following table describes the automatic optimization policies provided by the lake format management feature.

| Policy | Type | Default threshold | Description |
|---|---|---|---|
| AutoOptimizeByCommitVersion | OPTIMIZE | 17 | Triggers an OPTIMIZE task at fixed commit version intervals. |
| AutoVacuumByCommitVersion | CLEAN | 13 | Triggers a CLEAN task that removes expired files at fixed commit version intervals. |
| AutoOptimizeWithZorderByCommitVersion | OPTIMIZE | 17 | Triggers an OPTIMIZE task with Z-order on lake tables at fixed commit version intervals. |
| AutoOptimizeForFinishedPartition | OPTIMIZE | - | Automatically optimizes completed time partitions. |
| AutoOptimizeForCurrentPartition | OPTIMIZE | 17 | Automatically optimizes the current time partition. |
| HudiAutoExecuteCompaction | COMPACTION | - | Automatically runs compaction on Hudi tables. |

Note

Only the Delta Lake format is supported.
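The Z-order policy clusters data by several columns at once. As a rough illustration of the idea only, the following Python sketch computes a Z-order (Morton) value for two small non-negative integer columns by interleaving their bits, then sorts rows by that value so that rows close in both columns end up near each other. The function names and sample rows are hypothetical, and this is not how DLF or Delta Lake computes Z-order values internally.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Z-order (Morton) value of two non-negative integers below 2**bits.

    Bit i of x goes to position 2*i and bit i of y to position 2*i + 1.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def zorder_sort(rows):
    """Sort (a, b) pairs by their Z-order value so nearby pairs stay together."""
    return sorted(rows, key=lambda r: interleave_bits(r[0], r[1]))


rows = [(3, 0), (0, 3), (1, 1), (0, 0)]
print(zorder_sort(rows))  # [(0, 0), (1, 1), (3, 0), (0, 3)]
```

Sorting on a single column keeps only that column's values together; interleaving bits gives both columns equal weight, which is why a Z-ordered layout can help queries that filter on either column.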

Scenario descriptions

  • In some scenarios, such as streaming writes, writing data to lake format tables produces many small files, which reduces the efficiency of subsequent queries.

  • Lake format tables keep multiple versions or snapshots. If expired data from historical versions in the data catalog is not deleted promptly, it wastes storage resources.
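Both problems can be illustrated with a small Python sketch. `plan_compaction` groups small files into bins no larger than a target size using a simple greedy next-fit pass, roughly the kind of rewrite an OPTIMIZE task performs; `expired_files` finds files that no retained version references, which is what a CLEAN task could safely delete. All names, the toy sizes, and the "keep the newest N versions" retention rule are assumptions for illustration; this topic does not specify how DLF plans compaction or decides that a file is expired.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    size_mb: int


def plan_compaction(files, target_size_mb):
    """Greedily group small files into bins of at most target_size_mb.

    Files at or above the target size are left alone. Only bins with two or
    more files are returned, because rewriting a single file gains nothing.
    """
    # Largest small files first, then fill bins in order (next-fit).
    small = sorted(
        (f for f in files if f.size_mb < target_size_mb),
        key=lambda f: f.size_mb,
        reverse=True,
    )
    bins, current, current_size = [], [], 0
    for f in small:
        if current and current_size + f.size_mb > target_size_mb:
            bins.append(current)
            current, current_size = [], 0
        current.append(f)
        current_size += f.size_mb
    if current:
        bins.append(current)
    return [b for b in bins if len(b) > 1]


def expired_files(versions, retain_latest):
    """Return files that no retained version references.

    versions maps a version number to the set of file paths live in that
    snapshot; only the newest retain_latest versions are kept.
    """
    if retain_latest < 1:
        raise ValueError("retain_latest must be >= 1")
    retained = sorted(versions)[-retain_latest:]
    live = set().union(*(versions[v] for v in retained))
    every_file = set().union(*versions.values())
    return every_file - live


# Toy sizes in MB; part-4 is already large enough to leave alone.
files = [DataFile(f"part-{i}", s) for i, s in enumerate([10, 20, 30, 40, 200])]
print([[f.path for f in b] for b in plan_compaction(files, target_size_mb=64)])
# [['part-2', 'part-1', 'part-0']]

versions = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d"}}
print(expired_files(versions, retain_latest=2))  # {'a'}
```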

Procedure

View optimization policies

  1. Log on to the Data Lake Formation console.

  2. In the left-side navigation pane, click Lake Management > Lake Format Management to view the list of optimization policies.

Set optimization policy thresholds

  1. On the Optimization Policy page, click Set Thresholds in the Actions column.

  2. In the dialog box that appears, enter the policy threshold and click OK. After the threshold is reached, the optimization policy runs automatically.
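For the policies that trigger at fixed commit version intervals, the threshold can be modeled as a counter over commit versions. The following Python sketch assumes that "the threshold is reached" means at least `threshold` versions have been committed since the task last ran; the class and method names are hypothetical, and only the default thresholds 17 and 13 come from the policy table.

```python
class CommitVersionTrigger:
    """Hypothetical model of a policy that runs once every `threshold` versions."""

    def __init__(self, threshold: int, start_version: int = 0):
        if threshold < 1:
            raise ValueError("threshold must be >= 1")
        self.threshold = threshold
        self.last_run_version = start_version

    def on_commit(self, version: int) -> bool:
        """Return True if a task should run after `version` is committed."""
        if version - self.last_run_version >= self.threshold:
            self.last_run_version = version
            return True
        return False


optimize = CommitVersionTrigger(threshold=17)  # AutoOptimizeByCommitVersion default
clean = CommitVersionTrigger(threshold=13)  # AutoVacuumByCommitVersion default

optimize_runs = [v for v in range(1, 41) if optimize.on_commit(v)]
clean_runs = [v for v in range(1, 41) if clean.on_commit(v)]
print(optimize_runs)  # [17, 34]
print(clean_runs)  # [13, 26, 39]
```

A higher threshold runs tasks less often, which saves maintenance compute but lets more small files or expired data accumulate between runs.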

Disable optimization policies

  1. On the Optimization Policy page, if the policy is enabled, click Disable in the Actions column.

  2. In the pop-up dialog box, click OK to disable the optimization policy.

Enable optimization policies

  1. On the Optimization Policy page, if the policy is disabled, click Enable in the Actions column.

  2. In the pop-up dialog box, click OK to enable the optimization policy.