
Data Lake Formation: Lake format management

Last Updated: Aug 09, 2024

This topic describes the automatic lake format optimization policies provided by the lake format management feature.

Overview

The following table describes the automatic lake format optimization policies provided by the lake format management feature.

| Policy | Type | Default threshold | Description |
| --- | --- | --- | --- |
| AutoOptimizeByCommitVersion | Optimize | 17 | An optimization task is triggered at a fixed version interval. |
| AutoVacuumByCommitVersion | Clean | 13 | A task for cleaning expired files is triggered at a fixed version interval. |

Note
  • The lake format management feature currently applies only to tables in the Delta Lake format.

  • The lake format management feature is in public preview and currently free of charge.

Description

The optimization policies address two common problems with data lake tables:

  1. In some scenarios, such as streaming writes, writing to a data lake table generates many small files, which slows down subsequent queries.

  2. Data lake tables keep multiple versions, or snapshots. Expired data of historical versions must be deleted from data catalogs promptly so that it does not waste storage resources.
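For comparison, the two problems above are what the open-source Delta Lake commands for compaction and vacuuming address when run by hand. The following is a minimal sketch, not a description of how DLF runs its tasks internally. It assumes the delta-spark package (version 2.0 or later) and a `DeltaTable` obtained elsewhere, for example with `DeltaTable.forPath(spark, "<table path>")`. The function name `compact_and_vacuum` and the `retention_hours` parameter are hypothetical and used only for illustration.

```python
def compact_and_vacuum(delta_table, retention_hours=168.0):
    """Manually compact small files, then remove expired files.

    `delta_table` is assumed to be a delta.tables.DeltaTable from the
    delta-spark package; this helper itself is hypothetical.
    """
    # Rewrite many small data files into fewer, larger ones.
    delta_table.optimize().executeCompaction()
    # Delete files that the table no longer references and that are older
    # than the retention window (168 hours is Delta Lake's default).
    delta_table.vacuum(retention_hours)
```

With the automatic policies enabled, these manual calls should not be needed for tables managed by the feature; the sketch only shows what each policy type corresponds to.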

Procedure

View optimization policies

  1. Log on to the DLF console.

  2. In the left-side navigation tree, choose Lake Management > Lake Format Management.

  3. View the optimization policies on the page that appears. See the following figure.

[Figure: optimization policies on the Lake Format Management page]

Set a threshold for an optimization policy

  1. In the Optimization Policy section, find the desired optimization policy and click Set Thresholds in the Actions column to set a threshold for the policy. The threshold specifies a version interval: each time that interval is reached, the optimization policy is executed automatically.
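The threshold behavior can be pictured with a small simulation. This is only a sketch under stated assumptions: it assumes the interval is counted in committed versions since the policy last ran, which the policy names suggest but this topic does not spell out. The class `VersionIntervalPolicy` and its fields are hypothetical, and the thresholds are the defaults listed in the Overview table.

```python
from dataclasses import dataclass


@dataclass
class VersionIntervalPolicy:
    """Hypothetical model of a policy triggered at a fixed version interval."""
    name: str
    threshold: int             # number of commits between two runs
    enabled: bool = True
    last_run_version: int = 0  # table version at which the policy last ran

    def on_commit(self, version: int) -> bool:
        """Return True if the policy should run at this table version."""
        if not self.enabled:
            return False
        if version - self.last_run_version >= self.threshold:
            self.last_run_version = version
            return True
        return False


# Simulate 39 commits against the two default thresholds.
optimize = VersionIntervalPolicy("AutoOptimizeByCommitVersion", threshold=17)
vacuum = VersionIntervalPolicy("AutoVacuumByCommitVersion", threshold=13)
fired = [
    (version, policy.name)
    for version in range(1, 40)
    for policy in (optimize, vacuum)
    if policy.on_commit(version)
]
```

Under these assumptions, the optimize policy fires at versions 17 and 34 and the vacuum policy at versions 13, 26, and 39. A lower threshold makes a policy run more often, and a disabled policy never fires.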


Disable an optimization policy

  1. In the Optimization Policy section, find the desired optimization policy and click Disable in the Actions column to disable the optimization policy.


Enable an optimization policy

  1. In the Optimization Policy section, find the desired optimization policy and click Enable in the Actions column to enable the optimization policy.
