Clean up expired data of TTL tables

Updated at: 2024-10-25 03:00

PolarDB-X 2.0 allows you to manage the cleanup jobs for expired data in time to live (TTL) tables. You can monitor the progress of and information about cleanup jobs and change the cleanup speed and cleanup range based on actual conditions to balance cleanup jobs and running business.

Prerequisites

  • A PolarDB-X instance that meets one of the following requirements is created:

    • The engine version of the instance is MySQL 5.7 and the instance version is polardb-2.4.0_5.4.19-20240927_xcluster5.4.19-20240920 or later.

    • The engine version of the instance is MySQL 8.0 and the instance version is polardb-2.4.0_5.4.19-20240927_xcluster8.4.19-20240924 or later.

    Note

    For more information about version number, see Release notes.

    For more information about how to check the instance version, see View and update the version of an instance.

  • A partitioned table that belongs to a database in AUTO mode and is not partitioned by using the LOCAL PARTITION BY clause is used.

Start a cleanup job

To start a cleanup job for expired data in TTL tables, you can use one of the following methods:

  • Scheduled start: Configure the TTL_JOB attribute to allow the system to automatically start a cleanup job as scheduled.

  • Manual start: Manually execute the ALTER TABLE CLEANUP EXPIRED DATA statement to start a cleanup job.

Important

The system allows cleanup jobs to simultaneously run on up to two TTL tables. Other cleanup jobs must be queued for execution.

Usage

  • Scheduled start

    Examples:

    ALTER TABLE `my_ttl_tbl` MODIFY TTL SET TTL_ENABLE = 'ON';
    Note

    The scheduled cleanup job of the my_ttl_tbl TTL table is enabled. By default, the cleanup job is disabled.

    ALTER TABLE `my_ttl_tbl` MODIFY TTL SET TTL_JOB = CRON '0 0 2 */1 * ?  *' TIMEZONE '+08:00';
    Note
    • The schedule on which the cleanup job of the my_ttl_tbl TTL table runs is modified.

    • The schedule specified by the TTL_JOB attribute is an expected time value. The actual execution time of a cleanup job varies based on whether preceding jobs are complete. If cleanup jobs of multiple TTL tables are running and queued, the cleanup job must wait for the running and queued cleanup jobs to complete.

  • Manual start

    Example:

    /*+TDDL:cmd_extra(ENABLE_ASYNC_DDL=true, PURE_ASYNC_DDL_MODE=true)*/
    ALTER TABLE `my_ttl_tbl` CLEANUP EXPIRED DATA;
    Note
    • /*+TDDL:cmd_extra(ENABLE_ASYNC_DDL=true, PURE_ASYNC_DDL_MODE=true)*/ is used to configure the DDL statement to be asynchronously executed in the background. You do not need to wait for the result to return in the current session.

    • The ALTER TABLE CLEANUP EXPIRED DATA statement is supported only for TTL tables.

Calculate the cleanup time range

A TTL table uses an earliest-to-most recent batch data cleanup algorithm to remove data. The cleanup process starts with the earliest data and removes data within a fixed time interval during each cycle.

Execute the following statement to modify the data retention period of the my_ttl_tbl table in its TTL definition:

ALTER TABLE `my_ttl_tbl` 
MODIFY TTL 
SET 
TTL_EXPR = `time` EXPIRE AFTER 1 MONTH TIMEZONE '+08:00'
Note
  • The data retention period of the table is one month.

  • The cleanup time range for a cleanup job is three times the data retention period. In this example, the cleanup time range is three months.

For example, the current time is 2023-10-01 and an online table needs to retain only the data of the most recent month. The following figure shows the cleanup process.

image
Note
  • Day 1:

    In this example, the minimum time value of the TTL-defined time column is 2022-10-05(MinValue), and the data cleanup time interval (CleanupDataInterval) is 3 months. For more information about the data cleanup time interval, see Modify the data cleanup time interval. Based on the minimum time value and the data cleanup time interval, the first cleanup time range is from October 5, 2022 (inclusive) to January 1, 2023 (exclusive). The subsequent cleanup time ranges are from January 1, 2023 (inclusive) to April 1, 2023 (exclusive), from April 1, 2023 (inclusive) to July 1, 2023 (exclusive) and from July 1, 2023 (inclusive) to September 1, 2023 (exclusive).

  • Day 2:

    Cleans up data in the time range of January 1, 2023 (inclusive) to April 1, 2023 (exclusive). April 1, 2023 is the upper boundary (CleanupUpperBound) of the cleanup job.

  • Day 3:

    Cleans up data in the time range of April 1, 2023 (inclusive) to July 1, 2023 (exclusive). July 1, 2023 is the upper boundary (CleanupUpperBound) of the cleanup job.

  • Day 4:

    Cleans up data in the time range of July 1, 2023 (inclusive) to September 1, 2023 (exclusive). The time span is 2 months. In this example, the current date is October 1, 2023, and the TTL table is configured to retain data for the most recent month (ExpiredDataInterval). Therefore, the cleanup job cleans up data earlier than September 1, 2023, which is upper boundary (CleanupUpperBound) of the cleanup job.

The upper boundary (CleanupUpperBound) of each cleanup job can be calculated by using the following formula: CleanupUpperBound = Min (MinValue + CleanupDataInterval, ExpiredDataInterval).

Query the progress of a cleanup job

Query definitions of TTL tables

Execute the following statement to query the definitions and enabling statuses of the cleanup jobs for all TTL tables in the current database from the INFORMATION_SCHEMA.TTL_INFO view:

SELECT  * 
FROM INFORMATION_SCHEMA.TTL_INFO 
WHERE TABLE_SCHEMA='ttldb' AND TABLE_NAME = 'my_ttl_tbl';
*************************** 1. row ***************************
               TABLE_SCHEMA: ttldb
                 TABLE_NAME: my_ttl_tbl
                 TTL_ENABLE: OFF
                    TTL_COL: date_field
                   TTL_EXPR: `date_field` EXPIRE AFTER 2 MONTH TIMEZONE '+08:00'
                   TTL_CRON: 0 0 1 * * ?  *
               ARCHIVE_TYPE: 
       ARCHIVE_TABLE_SCHEMA: NULL
         ARCHIVE_TABLE_NAME: NULL
 ARCHIVE_TABLE_PRE_ALLOCATE: 12
ARCHIVE_TABLE_POST_ALLOCATE: 64
1 row in set (0.07 sec)

The following table describes the fields.

Field

Description

Field

Description

TABLE_SCHEMA

The logical name of the database to which the TTL table belongs.

TABLE_NAME

The logical name of the TTL table.

TTL_ENABLE

Indicates whether the scheduled cleanup job of the TTL table is enabled.

TTL_COL

The time column.

TTL_EXPR

The data retention period.

TTL_CRON

The schedule on which the cleanup job of the TTL table runs. The value is a CRON expression of the Quartz framework.

ARCHIVE_TYPE

The archive type. Default value: Columnar, which indicates that data is archived by row.

ARCHIVE_TABLE_SCHEMA

The logical name of the database to which the archive table bound to the TTL table belongs. If the TTL table is not bound to an archive table, NULL is returned.

ARCHIVE_TABLE_NAME

The logical name of the archive table bound to the TTL table. If the TTL table is not bound to an archive table, NULL is returned.

ARCHIVE_TABLE_PRE_ALLOCATE

The number of partitions that are pre-created for the TTL table based on partitioning rules.

ARCHIVE_TABLE_POST_ALLOCATE

The number of partitions that are created for the TTL table.

Query information about the cleanup job of a TTL table

You can query the real-time information about the running cleanup job of a TTL table from the INFORMATION_SCHEMA.TTL_SCHEDULE view.

Execute the following statement to query information about the running cleanup job of the my_ttl_tbl TTL table:

// Manually start the cleanup job.
ALTER TABLE `my_ttl_tbl` CLEANUP EXPIRED DATA;  
SELECT  * 
FROM INFORMATION_SCHEMA.TTL_SCHEDULE 
WHERE TABLE_SCHEMA='ttldb' AND TABLE_NAME = 'my_ttl_tbl';
+--------------+------------+--------------------------------------+---------------------+
| TABLE_SCHEMA | TABLE_NAME | METRIC_KEY                           | METRIC_VAL          |
+--------------+------------+--------------------------------------+---------------------+
| ttldb        | my_ttl_tbl | SCHEDULE_STATUS                      | DISABLED            |
| ttldb        | my_ttl_tbl | SCHEDULE_EXPR                        | 0 0 1 * * ?  *       |
| ttldb        | my_ttl_tbl | SCHEDULE_COMMENT                     | at 01:00            |
| ttldb        | my_ttl_tbl | SCHEDULE_TIMEZONE                    | +08:00              |
| ttldb        | my_ttl_tbl | SCHEDULE_LAST_FIRE_TIME              | 1970-01-01 08:00:00 |
| ttldb        | my_ttl_tbl | SCHEDULE_NEXT_FIRE_TIME              | 1970-01-01 08:00:00 |
| ttldb        | my_ttl_tbl | TTL_CURR_TTL_COL_MIN_VAL             | null                |
| ttldb        | my_ttl_tbl | TTL_CURR_CLEANUP_BOUND               | 2024-05-01 00:00:00 |
| ttldb        | my_ttl_tbl | TTL_CURR_CLEANUP_UPPER_BOUND         | 2024-05-01 00:00:00 |
| ttldb        | my_ttl_tbl | TTL_CURR_NEW_DATETIME_VAL            | 2024-07-18 18:22:54 |
| ttldb        | my_ttl_tbl | TTL_CURR_JOB_STAGE                   | Finished            |
| ttldb        | my_ttl_tbl | TTL_CURR_JOB_BEGIN_TS                | 1721298173321       |
| ttldb        | my_ttl_tbl | TTL_CURR_JOB_END_TS                  | 0                   |
| ttldb        | my_ttl_tbl | TTL_CURR_JOB_FROM_SCHEDULER          | false               |
| ttldb        | my_ttl_tbl | TTL_CURR_JOB_STOP_BY_MAINTAIN_WINDOW | false               |
| ttldb        | my_ttl_tbl | TTL_CURR_DN_ROWS_SPEED_LIMIT         | 0                   |
| ttldb        | my_ttl_tbl | TTL_CURR_CLEANED_PHY_PART_COUNT      | 8                   |
| ttldb        | my_ttl_tbl | TTL_CURR_TOTAL_PHY_PART_COUNT        | 8                   |
| ttldb        | my_ttl_tbl | TTL_CURR_JOB_PERCENT                 | 100                 |
| ttldb        | my_ttl_tbl | TTL_ACQUIRE_PERMITS_AVG_RT_NANO      | 0                   |
| ttldb        | my_ttl_tbl | TTL_CURR_CLEANUP_TIMECOST            | 6044                |
| ttldb        | my_ttl_tbl | TTL_CURR_CLEANUP_ROWS                | 0                   |
| ttldb        | my_ttl_tbl | TTL_CURR_CLEANUP_ROWS_SPEED          | 0                   |
| ttldb        | my_ttl_tbl | TTL_CURR_CLEANUP_DATA_LENGTH         | 0                   |
| ttldb        | my_ttl_tbl | TTL_CURR_CLEANUP_SPEED               | 0                   |
| ttldb        | my_ttl_tbl | TTL_CURR_SELECT_SQL_AVG_RT           | 844                 |
| ttldb        | my_ttl_tbl | TTL_CURR_DELETE_SQL_AVG_RT           | 2857                |
| ttldb        | my_ttl_tbl | TTL_CURR_OPTIMIZE_SQL_AVG_RT         | 0                   |
| ttldb        | my_ttl_tbl | TTL_CURR_ADD_PART_AVG_RT             | 0                   |
| ttldb        | my_ttl_tbl | TTL_CURR_DATA_FREE_PERCENT_LIMIT     | -1                  |
| ttldb        | my_ttl_tbl | TTL_CURR_TTL_TBL_DATA_FREE_PERCENT   | 0.00                |
| ttldb        | my_ttl_tbl | TTL_CURR_OPTIMIZE_TTL_TBL_PROGRESS   | 0                   |
| ttldb        | my_ttl_tbl | TTL_LAST_JOB_FROM_SCHEDULER          | false               |
| ttldb        | my_ttl_tbl | TTL_LAST_TTL_COL_MIN_VAL             |                     |
| ttldb        | my_ttl_tbl | TTL_LAST_CLEANUP_BOUND               |                     |
| ttldb        | my_ttl_tbl | TTL_LAST_CLEANUP_UPPER_BOUND         |                     |
| ttldb        | my_ttl_tbl | TTL_LAST_JOB_BEGIN_TS                | 0                   |
| ttldb        | my_ttl_tbl | TTL_LAST_JOB_END_TS                  | 0                   |
| ttldb        | my_ttl_tbl | TTL_LAST_SELECT_SQL_AVG_RT           | 0                   |
| ttldb        | my_ttl_tbl | TTL_LAST_DELETE_SQL_AVG_RT           | 0                   |
| ttldb        | my_ttl_tbl | TTL_LAST_OPTIMIZE_SQL_AVG_RT         | 0                   |
| ttldb        | my_ttl_tbl | TTL_LAST_ADD_PARTS_SQL_AVG_RT        | 0                   |
| ttldb        | my_ttl_tbl | TTL_LAST_CLEANUP_TIMECOST            | 0                   |
| ttldb        | my_ttl_tbl | TTL_LAST_CLEANUP_ROWS                | 0                   |
| ttldb        | my_ttl_tbl | TTL_LAST_CLEANUP_ROWS_SPEED          | 0                   |
| ttldb        | my_ttl_tbl | TTL_LAST_CLEANUP_DATA_LENGTH         | 0                   |
| ttldb        | my_ttl_tbl | TTL_LAST_CLEANUP_SPEED               | 0                   |
| ttldb        | my_ttl_tbl | TTL_LAST_TTL_TBL_DATA_FREE_PERCENT   | 0                   |
+--------------+------------+--------------------------------------+---------------------+
48 rows in set (0.03 sec)

The following table describes specific fields.

Field

Description

Field

Description

SCHEDULE_STATUS

Indicates whether the scheduled cleanup job is enabled. Valid values:

  • DISABLED

  • ENABLED

SCHEDULE_EXPR

The schedule on which the cleanup job runs. You can modify the schedule. For more information, see the "Modify the execution schedule of the cleanup job" section of the Definition and creation of TTL tables topic.

SCHEDULE_COMMENT

The description of the value of the SCHEDULE_EXPR field.

SCHEDULE_TIMEZONE

The time zone for scheduling the cleanup job.

SCHEDULE_LAST_FIRE_TIME

The time when automatic scheduling was last triggered. The initial value is 1970-01-01 08:00:00, which indicates that automatic scheduling is not triggered.

SCHEDULE_NEXT_FIRE_TIME

The time when automatic scheduling is triggered next time. The initial value is 1970-01-01 08:00:00, which indicates that automatic scheduling is not triggered.

TTL_CURR_TTL_COL_MIN_VAL

The minimum value in the time column specified in the TTL definition of the cleanup job.

TTL_CURR_JOB_STAGE

The current phase of the cleanup job.

TTL_CURR_CLEANUP_BOUND

The value in the time column that is being cleaned up by the cleanup job.

TTL_CURR_CLEANUP_UPPER_BOUND

The upper bound of the cleanup job. The time values of historical data to be cleaned up are earlier than the upper bound.

TTL_CURR_NEW_DATETIME_VAL

The current time.

TTL_CURR_DN_ROWS_SPEED_LIMIT

The speed limit of the cleanup job, which is the speed at which the DELETE statement is executed on each storage node. The value is the number of rows that are cleaned up per second.

TTL_CURR_CLEANUP_ROWS

The total number of rows that are cleaned up by the cleanup job.

TTL_CURR_CLEANUP_ROWS_SPEED

The real-time cleanup speed of the cleanup job. The value is the number of rows that are cleaned up per second.

TTL_CURR_CLEANUP_DATA_LENGTH

The estimated total number of bytes that are cleaned up by the cleanup job.

TTL_CURR_CLEANUP_SPEED

The real-time cleanup speed of the cleanup job. The value is the number of estimated bytes that are cleaned up per second.

TTL_CURR_SELECT_SQL_AVG_RT

The average response time of the SELECT statement executed by the cleanup job. Unit: milliseconds.

TTL_CURR_DELETE_SQL_AVG_RT

The average response time of the DELETE statement executed by the cleanup job. Unit: milliseconds.

TTL_CURR_JOB_PERCENT

The completion percentage of the cleanup job.

TTL_CURR_CLEANED_PHY_PART_COUNT

The number of physical partitions that are cleaned up by the cleanup job.

TTL_CURR_TOTAL_PHY_PART_COUNT

The total number of physical partitions that need to be cleaned up by the cleanup job.

Manage cleanup jobs

You can manage cleanup jobs to clean up data at an appropriate time and speed. This minimizes the impact of cleanup jobs on your business.

Modify the maintenance window

The default maintenance window for background cleanup jobs in PolarDB-X 2.0 is 02:00 to 05:00 every day. You can execute the following statements to change the maintenance window to 01:00 to 06:00 every day:

set global MAINTENANCE_TIME_START = '01:00';
set global MAINTENANCE_TIME_END = '06:00';

Modify the cleanup speed limit

By default, the cleanup speed limit for a single storage node is 1,000 rows per second. The total cleanup speed of all cleanup jobs on the node does not exceed the limit. If you increase the number of storage nodes in an instance, the total cleanup speed is increased.

You can execute the following statement to change the cleanup speed limit of a single storage node to 10,000 rows per second:

set global TTL_ENABLE_CLEANUP_ROWS_SPEED_LIMIT=10000
Important
  • If you increase the cleanup speed limit, more CPU and IOPS resources are consumed. We recommend that you specify an appropriate cleanup speed limit based on the actual performance of storage nodes. This prevents cleanup jobs from consuming excessive CPU and IOPS resources on storage nodes and affecting your business.

  • When the CPU and IOPS resources of a storage node are at full capacity, increasing the cleanup speed limit by executing the preceding statement may not improve the cleanup speed.

Modify the cleanup time range

By default, the time range of data to be cleaned up by the cleanup job of a TTL table is mainly controlled by the TTL_CLEANUP_BOUND_INTERVAL_COUNT parameter.

You can execute the following statement to modify the time range of data to be cleaned up:

set global TTL_CLEANUP_BOUND_INTERVAL_COUNT = 6;
Note

The default value of the TTL_CLEANUP_BOUND_INTERVAL_COUNT parameter is 3, which specifies that the cleanup time range for the cleanup job is three times the data retention period. You can execute the preceding statement to set the cleanup time range to six times the data retention period. The default cleanup range varies based on the time unit that you specified in the TTL definition. Examples:

  • If you specify the time unit in the TTL definition as DAY by setting the TTL_EXPR attribute to 'time' EXPIRE AFTER NUM DAY TIMEZONE '+08:00', the default cleanup range is six days of data each time. Data whose time value meets the following condition is cleaned up: MinValue ≤ Time < (MinValue + 6 Days).

  • If you specify the time unit in the TTL definition as MONTH by setting the TTL_EXPR attribute to 'time' EXPIRE AFTER NUM MONTH TIMEZONE '+08:00', the default cleanup range is six months of data each time. Data whose time value meets the following condition is cleaned up: MinValue ≤ Time < (MinValue + 6 Months).

  • If you specify the time unit in the TTL definition as YEAR by setting the TTL_EXPR attribute to 'time' EXPIRE AFTER NUM YEAR TIMEZONE '+08:00', the default cleanup range is six years of data each time. Data whose time value meets the following condition is cleaned up: MinValue ≤ Time < (MinValue + 6 Years).

Pause a cleanup job

  1. Execute the following SHOW FULL DDL statement to query the job IDs of all cleanup jobs:

    show full ddl\G;
    *************************** 1. row ***************************
                       JOB_ID: 1771694362409848832
                OBJECT_SCHEMA: ttldb
                  OBJECT_NAME: my_ttl_tbl
                       ENGINE: DAG
                     DDL_TYPE: ALTER_TABLE
                        STATE: RUNNING
      TOTAL_BACKFILL_PROGRESS: --
     CURRENT_PHY_DDL_PROGRESS: 0%
                     PROGRESS: 33%
         FASTCHECKER_TASK_NUM: 0
    FASTCHECKER_TASK_FINISHED: 0
                   START_TIME: 2024-09-14 15:55:12.991
                     END_TIME: 2024-09-14 15:55:16.959
             ELAPSED_TIME(MS): 3968
                  PHY_PROCESS:
                   CANCELABLE: true
                PARENT_JOB_ID: --
                RESPONSE_NODE: 10.57.104.154:3067
               EXECUTION_NODE: 10.57.104.154:3067
                     TRACE_ID: 189652a2bc805000
                     DDL_STMT: alter table my_ttl_tbl cleanup expired data
                       REMARK: --
           LEGACY_ENGINE_INFO: --
    1 row in set (0.01 sec)
  2. Obtain the job ID of the TTL table that you want to manage from the results in Step 1. Use the obtained job ID to execute the following statement to pause the cleanup job of the TTL table:

    pause ddl 1771694362409848832;
Note

A cleanup job can be paused or rolled back in real time. If you roll back a cleanup job, only the job is rolled back. The data that is cleaned up cannot be rolled back. For more information, see DDL management statements.

Restart a cleanup job

Execute the following statement to restart a paused cleanup job:

CONTINUE DDL 1771694362409848832;
  • On this page (1, T)
  • Prerequisites
  • Start a cleanup job
  • Usage
  • Calculate the cleanup time range
  • Query the progress of a cleanup job
  • Query definitions of TTL tables
  • Query information about the cleanup job of a TTL table
  • Manage cleanup jobs
  • Modify the maintenance window
  • Modify the cleanup speed limit
  • Modify the cleanup time range
  • Pause a cleanup job
  • Restart a cleanup job
Feedback