This topic describes the automatic optimization policies provided by the lake format management feature.
Feature introduction
The lake format management feature provides the following automatic optimization policies.
Policy | Type | Default threshold | Description |
AutoOptimizeByCommitVersion | OPTIMIZE | 17 | Triggers an OPTIMIZE task at fixed version intervals. |
AutoVacuumByCommitVersion | CLEAN | 13 | Triggers a CLEAN task for expired files at fixed version intervals. |
AutoOptimizeWithZorderByCommitVersion | OPTIMIZE | 17 | Triggers an OPTIMIZE task with Z-order on lake tables at fixed version intervals. |
AutoOptimizeForFinishedPartition | OPTIMIZE | - | Automatically optimizes completed time partitions. |
AutoOptimizeForCurrentPartition | OPTIMIZE | 17 | Automatically optimizes the current time partition. |
HudiAutoExecuteCompaction | COMPACTION | - | Automatically runs compaction on Hudi tables. |
Only the Delta Lake format is supported.
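The commit-version policies in the table can be pictured with a small model. The following Python sketch is illustrative only. It assumes that "fixed version intervals" means a task fires whenever the commit version of a table is a multiple of the threshold, which this topic does not confirm. The class and function names are hypothetical and are not part of any Data Lake Formation API.

```python
from dataclasses import dataclass


@dataclass
class CommitVersionPolicy:
    """Hypothetical model of a policy that fires at fixed commit-version intervals."""
    name: str
    task_type: str
    threshold: int
    enabled: bool = True

    def should_trigger(self, commit_version: int) -> bool:
        # Assumption: the task fires when the commit version is a
        # positive multiple of the threshold. A disabled policy never fires.
        if not self.enabled or commit_version <= 0:
            return False
        return commit_version % self.threshold == 0


def triggered_versions(policy: CommitVersionPolicy, up_to: int) -> list:
    """List the commit versions in 1..up_to at which the policy would fire."""
    return [v for v in range(1, up_to + 1) if policy.should_trigger(v)]


# Default thresholds taken from the table above.
optimize = CommitVersionPolicy("AutoOptimizeByCommitVersion", "OPTIMIZE", 17)
vacuum = CommitVersionPolicy("AutoVacuumByCommitVersion", "CLEAN", 13)

print(triggered_versions(optimize, 40))  # [17, 34]
print(triggered_versions(vacuum, 40))    # [13, 26, 39]
```

Under this assumption, a lower threshold makes the task run more often, and a disabled policy never triggers regardless of its threshold.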
Scenario descriptions
In some scenarios, such as streaming writes, writing to lake format tables generates many small files, which slows down subsequent queries.
Lake format tables keep multiple versions or snapshots. If expired data from historical versions in the data catalog is not deleted in a timely manner, it wastes storage resources.
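To make the two scenarios concrete, the following Python sketch shows in simplified form what an OPTIMIZE task and a CLEAN task conceptually do. It is not the algorithm that Data Lake Formation uses. The function names, the 128 MB target file size, and the rule of keeping the latest N versions are all assumptions chosen for illustration.

```python
def plan_compaction(file_sizes_mb, target_mb=128):
    """Greedily group small files so that each group holds at most target_mb.

    Each returned group stands for one larger file that a compaction could write.
    """
    groups = []
    current, current_size = [], 0
    for size in sorted(file_sizes_mb):
        # Close the current group when adding this file would exceed the target.
        if current and current_size + size > target_mb:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups


def expired_files(version_files, keep_latest):
    """Return files that are referenced only by versions outside the retained set.

    version_files maps a version number to the set of files that version references.
    """
    versions = sorted(version_files)
    kept = versions[-keep_latest:] if keep_latest > 0 else []
    live = set()
    for v in kept:
        live |= version_files[v]
    all_files = set()
    for files in version_files.values():
        all_files |= files
    return all_files - live


# Scenario 1: 60 streaming micro-batches that each wrote a 4 MB file.
groups = plan_compaction([4] * 60)
print(len(groups))  # 2

# Scenario 2: three versions, only the latest two are retained.
history = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d"}}
print(expired_files(history, keep_latest=2))  # {'a'}
```

In the first example, 60 small files collapse into 2 larger ones, so a query opens far fewer files. In the second, file `a` is referenced only by the expired version 1, so deleting it frees storage without affecting the retained versions.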
Procedure
View optimization policies
Log on to the Data Lake Formation console.
In the left-side navigation pane, open the Optimization Policy page to view the list of optimization policies.
Set optimization policy thresholds
On the Optimization Policy page, click Set Thresholds in the Actions column.
In the dialog box that appears, enter the policy threshold and click OK. The optimization policy then runs automatically each time the threshold is met.
Disable optimization policies
On the Optimization Policy page, if the policy is enabled, click Disable in the Actions column.
In the pop-up dialog box, click OK to disable the optimization policy.
Enable optimization policies
On the Optimization Policy page, if the policy is disabled, click Enable in the Actions column.
In the pop-up dialog box, click OK to enable the optimization policy.