This topic describes the automatic lake format optimization policies provided by the lake format management feature.
Overview
The following table lists each policy with its type, default threshold, and behavior.
| Policy | Type | Default threshold (commit versions) | Description |
| --- | --- | --- | --- |
| AutoOptimizeByCommitVersion | Optimize | 17 | An optimization task is triggered at a fixed version interval. |
| AutoVacuumByCommitVersion | Clean | 13 | A task for cleaning expired files is triggered at a fixed version interval. |
The lake format management feature currently applies only to tables in the Delta Lake format.
The lake format management feature is in public preview and is currently free of charge.
Description
In some scenarios, such as streaming writes, a large number of small files are generated when data is written to a data lake table. These small files reduce the efficiency of subsequent queries.
Data lake tables keep multiple versions, or snapshots. Expired data of historical versions must be deleted from data catalogs in a timely manner so that it does not waste storage resources.
Procedure
View optimization policies
Log on to the DLF console.
In the left-side navigation tree, choose Lake Management > Lake Format Management.
View the optimization policies on the page that appears. See the following figure.
Set a threshold for an optimization policy
In the Optimization Policy section, find the desired optimization policy and click Set Thresholds in the Actions column. The optimization policy is executed automatically each time the version interval specified by the threshold is reached.
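The effect of a threshold can be illustrated with a minimal sketch. It is a hypothetical model, not the DLF API or implementation: the class VersionIntervalPolicy and its fields are invented for illustration, and it assumes that a policy runs once the number of commit versions since its last run reaches the threshold.

```python
from dataclasses import dataclass


@dataclass
class VersionIntervalPolicy:
    """Hypothetical model of a version-interval policy (not the DLF API)."""
    name: str
    threshold: int             # commit versions between two runs (assumed meaning)
    enabled: bool = True
    last_run_version: int = 0  # table version at which the policy last ran

    def should_trigger(self, current_version: int) -> bool:
        # A disabled policy never runs, whatever the version is.
        if not self.enabled:
            return False
        # Assumption: run once `threshold` versions have accumulated.
        return current_version - self.last_run_version >= self.threshold

    def record_run(self, current_version: int) -> None:
        self.last_run_version = current_version


# Simulate 40 commits against the default AutoOptimizeByCommitVersion threshold.
optimize = VersionIntervalPolicy("AutoOptimizeByCommitVersion", threshold=17)
triggered_at = []
for version in range(1, 41):
    if optimize.should_trigger(version):
        optimize.record_run(version)
        triggered_at.append(version)

print(triggered_at)  # [17, 34]
```

Under this assumption, lowering the threshold makes the task run more often, and a disabled policy does not run at all.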
Disable an optimization policy
In the Optimization Policy section, find the desired optimization policy and click Disable in the Actions column to disable the optimization policy.
Enable an optimization policy
In the Optimization Policy section, find the desired optimization policy and click Enable in the Actions column to enable the optimization policy.