PolarDB: Manually archive cold data in the CSV or ORC format

Last Updated: Nov 26, 2024

This topic describes how to archive cold data in the comma-separated values (CSV) or Optimized Row Columnar (ORC) format and import data back from Object Storage Service (OSS) tables to PolarStore.

Prerequisites

  • To archive cold data in the CSV format for a cluster, make sure that the cluster meets the following engine version requirements. For information about how to query the engine version of a cluster, see Query the engine version.

    • If the cluster is of Cluster Edition, it must run one of the following database engine versions:

      • PolarDB for MySQL 8.0.1 whose revision version is 8.0.1.1.32 or later

      • PolarDB for MySQL 8.0.2 whose revision version is 8.0.2.2.10 or later

        Note

        Manual archiving of cold data is not recorded in binary logs if the cluster runs one of the following database engine versions:

        • PolarDB for MySQL 8.0.1 whose revision version is 8.0.1.1.33 or later

        • PolarDB for MySQL 8.0.2 whose revision version is 8.0.2.2.11.1 or later

    • If the cluster is of Multi-master Cluster (Database/Table) Edition, it must run PolarDB for MySQL 8.0.1.0.13 or later.

  • To archive cold data in the ORC format for a cluster, make sure that the cluster runs PolarDB for MySQL 8.0.2 whose revision version is 8.0.2.2.16.2 or later. For information about how to query the engine version of a cluster, see Query the engine version.

  • The cold data archiving feature is enabled. For more information, see Enable cold data archiving.

  • The cluster is connected. For more information, see Connect to a cluster.

Usage notes

Archive common tables

Cold data archiving archives the cold data stored in tables. The tables that store the archived cold data are referred to as archived tables; they are read-only and use the OSS engine. The data files of archived tables are stored in OSS. After cold data is archived, the PolarFileSystem space occupied by the original table is released.

Syntax

  • Archive cold data in the CSV format:

    ALTER TABLE table_name ENGINE = CSV CONNECTION = 'default_oss_server';

    If the cluster runs PolarDB for MySQL 8.0.1 whose revision version is 8.0.1.1.33 or later, PolarDB for MySQL 8.0.2 whose revision version is 8.0.2.2.13 or later, or Multi-master Cluster (Database/Table) Edition whose revision version is 8.0.1.1.15 or later, you can also execute the following statement to manually archive cold data:

    ALTER TABLE table_name ENGINE = CSV STORAGE OSS;
  • Archive cold data in the ORC format:

    ALTER TABLE table_name ENGINE = ORC STORAGE OSS;
Note

If OSS contains a file that has the same name as the file to be archived, the system returns an error message indicating that the file already exists in OSS. Example:

Target file for archived table exists on oss.

You can add the FORCE keyword to the preceding statement to overwrite the existing file in OSS. The following example shows how to add the FORCE keyword to the statement that archives cold data in the CSV format:

ALTER TABLE table_name ENGINE = CSV CONNECTION = 'default_oss_server' , FORCE;

Parameters

  • table_name: The name of the table to be archived to OSS.

Precautions

  • Only data in tables that use the InnoDB storage engine or X-Engine can be archived.

  • You cannot modify the table by executing DDL or DML statements during cold data archiving.

  • Data files cannot be archived to custom OSS servers.

  • To archive cold data in a table that uses the InnoDB storage engine, make sure that the table has a primary key.

  • After cold data is archived, the table in OSS becomes read-only, and the query speed for this table is slow. Before you perform data archiving, we recommend that you use test data to evaluate whether the query performance on the archived table meets your business requirements.
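
Before you archive a table, you can verify that it meets the engine and primary key requirements listed above. The following check is a minimal sketch that assumes the oss_test database and the t table used in the example below; it relies only on standard information_schema views:

-- Check the storage engine of the table.
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'oss_test' AND TABLE_NAME = 't';

-- Check that the table has a primary key (required for InnoDB tables).
SELECT COUNT(*) AS has_primary_key
FROM information_schema.TABLE_CONSTRAINTS
WHERE TABLE_SCHEMA = 'oss_test' AND TABLE_NAME = 't'
  AND CONSTRAINT_TYPE = 'PRIMARY KEY';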

Examples

Archive data in the t table to OSS in the CSV or ORC format.

  1. Create a table named t in the oss_test database. The table uses the InnoDB storage engine.

    CREATE TABLE t(a int, b int, c int, primary key(a)) ENGINE = INNODB;
  2. Insert data into the t table.

    INSERT INTO t VALUES (1,2,3);
  3. Execute the ALTER statement to archive cold data.

    • Archive data in the CSV format:

      ALTER TABLE t ENGINE = CSV CONNECTION = 'default_oss_server';

      If the cluster runs PolarDB for MySQL 8.0.1 whose revision version is 8.0.1.1.33 or later, PolarDB for MySQL 8.0.2 whose revision version is 8.0.2.2.13 or later, or Multi-master Cluster (Database/Table) Edition whose revision version is 8.0.1.1.15 or later, execute the following statement:

      ALTER TABLE t ENGINE = CSV STORAGE OSS; 
    • Archive data in the ORC format:

      ALTER TABLE t ENGINE = ORC STORAGE OSS;
  4. After the data is archived, you can log on to the PolarDB console to view the archived databases and tables in OSS, or execute SQL statements to view data in the archived tables.

    • To view the archived databases and tables in OSS, log on to the PolarDB console. Find the cluster that you want to manage and choose Settings and Management > Cold Data Archive to view information about the archived databases and tables in OSS.

    • To view data in the archived tables, you can execute SQL statements in the same way as you execute SQL statements before the archiving operations. Example:

      SELECT * FROM t;
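
After archiving, you can also confirm that the table definition now references the OSS-based engine. This is a minimal check that assumes the t table from the preceding example; based on the sample output later in this topic, the definition of an archived table is expected to include ENGINE=CSV together with a STORAGE OSS comment:

SHOW CREATE TABLE t;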

Archive table partitions to an OSS foreign table

Note
  • The cluster must run PolarDB for MySQL 8.0.2 whose revision version is 8.0.2.2.25 or later.

  • Set the partition_level_mdl_enabled cluster parameter to ON in the PolarDB console to enable the partition-level metadata lock (MDL) feature. For more information about how to configure cluster parameters, see Configure cluster and node parameters.

  • Set the loose_use_oss_meta cluster parameter to ON in the PolarDB console to enable the use_oss_meta feature. For more information about how to configure cluster parameters, see Configure cluster and node parameters.
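
After you set the parameters in the console, you can verify them from a SQL session. This is a minimal sketch; the exact runtime variable names are an assumption (the loose_ prefix is typically dropped at runtime), so adjust the LIKE patterns as needed:

-- Verify that the partition-level MDL feature is enabled.
SHOW VARIABLES LIKE 'partition_level_mdl_enabled';
-- Verify that the use_oss_meta feature is enabled.
SHOW VARIABLES LIKE '%use_oss_meta%';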

Syntax

CALL dbms_dlm.archive_partition2table('source_db', 'source_tb', 'source_part', 'archive_db', 'archive_table', 'archive_table_file_filter');

Parameters

  • source_db: The name of the source database.

  • source_tb: The name of the source table.

  • source_part: The partitions of the source table that you want to archive. Separate multiple partitions with commas (,).

  • archive_db: The name of the destination database.

  • archive_table: The name of the destination table.

  • archive_table_file_filter: The columns of the destination table on which an OSS file filter is created.

Precautions

  • If the destination OSS table does not exist, the system automatically creates the OSS table and creates an OSS file filter on the columns specified in the archive_table_file_filter parameter. The file filter accelerates query performance. The primary key and partition key are automatically added to the archive_table_file_filter parameter to accelerate queries.

  • If the destination OSS table exists, the system checks whether the column names and types in the source and destination tables are the same. If they are the same, the specified partitions are archived to the destination table. Otherwise, the archiving operation fails. You can use DDL statements to ensure that the definitions of the two tables are consistent. For more information, see Perform DDL operations on cold data. If the OSS file filter that exists on the destination table has a different definition from the columns specified in the dbms_dlm.archive_partition2table call, the file filter on the destination table takes precedence.

  • If the destination OSS table does not exist but a file that has the same name as the destination OSS table exists in OSS, an error message similar to the following one is returned when you perform the archiving operation:

    mysql> CALL dbms_dlm.archive_partition2table('test', 'sales', 'p0', 'test', 'sales_history', 'id');
    ERROR 8181 (HY000): [Data Lifecycle Management] errmsg: Target file for archived table exists on oss, please remove it first, or use flag 'FORCE' to overwrite on existing files.

    If you no longer require the remnant file, you can use a stored procedure to delete the file from OSS and then perform data archiving.

    -- Delete OSS data.
    mysql> CALL dbms_oss.delete_table_file('test', 'sales_history');
    
    Query OK, 0 rows affected (0.76 sec)
    
    -- Perform data archiving.
    mysql> CALL dbms_dlm.archive_partition2table('test', 'sales', 'p0', 'test', 'sales_history', 'id');
    
    Query OK, 0 rows affected (4.24 sec)
  • Data can be archived to the destination OSS table only in the CSV format.

  • The archived partitioned table must contain at least one InnoDB partition.

  • The data of the archived partitions is combined into one table. If you want to restore the data to the partitioned table, execute an INSERT ... SELECT statement, as sketched after this list.

  • You cannot archive individual level-2 partitions. However, you can archive all level-2 partitions under a level-1 partition.

  • The following table describes the partition types that can be archived to OSS foreign tables.

    Level-1 partitions    Level-2 partitions    Archivable to OSS foreign tables
    HASH                  All types             No
    LIST                  All types             Yes
    RANGE                 All types             Yes
    KEY                   All types             Yes
    LIST DEFAULT HASH     All types             No
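
As noted above, the archived partitions are merged into a single OSS table. If you later need the data back in the partitioned source table, an INSERT ... SELECT statement can copy it. The following is a minimal sketch that uses the sales and sales_history tables from the example below; the column list and the WHERE condition (which matches the range of the p0 partition) are illustrative:

-- Copy the archived rows back into the InnoDB partitioned table.
-- This assumes the corresponding rows were already removed from the source table,
-- because the primary key (order_time) would otherwise cause duplicate-key errors.
-- Rows are routed to the correct partitions based on the partitioning key.
INSERT INTO sales (id, name, order_time)
SELECT id, name, order_time
FROM sales_history
WHERE order_time < '2022-01-01';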

Examples

  1. Create an InnoDB partitioned table and insert data into the table.

    DROP TABLE IF EXISTS `sales`;
    -- Create a partitioned table.
    CREATE TABLE `sales` (
      `id` int DEFAULT NULL,
      `name` varchar(20) DEFAULT NULL,
      `order_time` datetime NOT NULL,
      primary key (order_time)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
    PARTITION BY RANGE  COLUMNS(order_time) INTERVAL(month, 1)
    (PARTITION p0 VALUES LESS THAN ('2022-01-01') ENGINE = InnoDB,
     PARTITION p1 VALUES LESS THAN ('2023-01-01 00:00:00') ENGINE = InnoDB,
     PARTITION p2 VALUES LESS THAN ('2024-01-01 00:00:00') ENGINE = InnoDB);
    
    DROP PROCEDURE IF EXISTS proc_batch_insert;
    delimiter $$
    CREATE PROCEDURE proc_batch_insert(IN begin INT, IN end INT, IN name VARCHAR(20))
      BEGIN
      SET @insert_stmt = concat('INSERT INTO ', name, ' VALUES(?  , ?, ?);');
      PREPARE stmt from @insert_stmt;
      WHILE begin <= end DO
        SET @ID1 = begin;
        SET @NAME = CONCAT(begin+begin*281313, '@stiven');
        SET @TIME = from_days(begin + 738368);
        EXECUTE stmt using @ID1, @NAME, @TIME;
        SET begin = begin + 1;
        END WHILE;
      END;
    $$
    delimiter ;
    
    CALL proc_batch_insert(1, 1000, 'sales');
  2. Archive the p0 partition of the partitioned table to an OSS table.

    1. To view the schema of the sales table, execute the following statement:

      -- View the status of the InnoDB table.
      mysql> SHOW CREATE TABLE sales;

      Sample result:

      *************************** 1. row ***************************
             Table: sales
      Create Table: CREATE TABLE `sales` (
        `id` int(11) DEFAULT NULL,
        `name` varchar(20) DEFAULT NULL,
        `order_time` datetime NOT NULL,
        PRIMARY KEY (`order_time`)
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
      /*!50500 PARTITION BY RANGE  COLUMNS(order_time) */ /*!99990 800020200 INTERVAL(MONTH, 1) */
      /*!50500 (PARTITION p0 VALUES LESS THAN ('2022-01-01') ENGINE = InnoDB,
       PARTITION p1 VALUES LESS THAN ('2023-01-01 00:00:00') ENGINE = InnoDB,
       PARTITION p2 VALUES LESS THAN ('2024-01-01 00:00:00') ENGINE = InnoDB,
       PARTITION _p20240201000000 VALUES LESS THAN ('2024-02-01 00:00:00') ENGINE = InnoDB,
       PARTITION _p20240301000000 VALUES LESS THAN ('2024-03-01 00:00:00') ENGINE = InnoDB,
       PARTITION _p20240401000000 VALUES LESS THAN ('2024-04-01 00:00:00') ENGINE = InnoDB,
       PARTITION _p20240501000000 VALUES LESS THAN ('2024-05-01 00:00:00') ENGINE = InnoDB) */
      1 row in set (0.03 sec)
    2. Execute the following statement to archive the p0 partition to an OSS table named sales_history:

      -- Archive the p0 partition to the sales_history OSS table and create an OSS file filter on the id column of the table.
      mysql> CALL dbms_dlm.archive_partition2table('test', 'sales', 'p0', 'test', 'sales_history', 'id');
      Query OK, 0 rows affected (1.86 sec)
    3. To view the schema of the sales_history table, execute the following statement:

      SHOW CREATE TABLE sales_history;

      Sample result:

      *************************** 1. row ***************************
             Table: sales_history
      Create Table: CREATE TABLE `sales_history` (
        `id` int(11) DEFAULT NULL,
        `name` varchar(20) DEFAULT NULL,
       `order_time` datetime DEFAULT NULL,
        PRIMARY KEY (`order_time`)
      ) /*!99990 800020213 STORAGE OSS */ ENGINE=CSV DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci /*!99990 800020204 NULL_MARKER='NULL' */ /*!99990 800020223 OSS META=1 */ /*!99990 800020224 OSS_FILE_FILTER='id,order_time' */
      1 row in set (0.00 sec)
  3. Query data on the OSS table.

    Note

    You can enable the OSS file filter feature to accelerate queries. For more information, see Use the OSS file filter feature to accelerate cold data queries.

    mysql> explain SELECT * FROM sales_history WHERE id = 100;
    +----+-------------+---------------+------------+------+---------------+------+---------+------+------+----------+-------------------------------------------------------------------------------+
    | id | select_type | table         | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                                                         |
    +----+-------------+---------------+------------+------+---------------+------+---------+------+------+----------+-------------------------------------------------------------------------------+
    |  1 | SIMPLE      | sales_history | NULL       | ALL  | NULL          | NULL | NULL    | NULL |  152 |    10.00 | Using where; With pushed engine condition (`test`.`sales_history`.`id` = 100) |
    +----+-------------+---------------+------------+------+---------------+------+---------+------+------+----------+-------------------------------------------------------------------------------+
    1 row in set, 1 warning (0.00 sec)
    
    mysql> SELECT * FROM sales_history WHERE id = 100;
    +------+-----------------+---------------------+
    | id   | name            | order_time          |
    +------+-----------------+---------------------+
    |  100 | 28131400@stiven | 2021-11-09 00:00:00 |
    +------+-----------------+---------------------+
    1 row in set (0.24 sec)

Archive partitioned tables

Note
  • The partitioned table archiving feature is in canary release. To use this feature, go to Quota Center. Find the quota name that corresponds to the polardb_mysql_hybrid_partition quota ID. Then, click Apply in the Actions column.

  • Your cluster must run PolarDB for MySQL 8.0.2 whose revision version is 8.0.2.2.14 or later.

Partitioned table archiving is the process of archiving data in partitions. The archived table is a hybrid partitioned table. The data files of the archived partitions are stored in OSS. After cold data is archived, the PolarStore space occupied by the original partitioned table is released.

Syntax

  • Archive cold data in the CSV format:

    ALTER TABLE table_name CHANGE PARTITION part_name ENGINE = CSV;
  • Archive cold data in the ORC format:

    ALTER TABLE table_name CHANGE PARTITION part_name ENGINE = ORC;
Note

If OSS contains a file that has the same name as the file to be archived, the system returns an error message indicating that the file already exists in OSS. Example:

Target file for archived table exists on oss.

You can add the FORCE keyword to the preceding statement to overwrite the existing file in OSS. The following example shows how to add the FORCE keyword to the statement that archives cold data in the CSV format:

ALTER TABLE table_name CHANGE PARTITION part_name ENGINE = CSV FORCE;

Parameters

  • table_name: The name of the table to be archived to OSS.

  • part_name: The name of the partition to be archived to OSS.

Precautions

  • The cold data archiving feature is supported only for partitioned tables that use the InnoDB storage engine.

  • When you archive data in a partitioned table, make sure that at least one partition remains on the InnoDB storage engine. This means that you cannot archive the last remaining InnoDB partition.

  • The archived table is a hybrid partitioned table. You cannot execute DDL statements on a hybrid partitioned table. For more information, see Create a hybrid partitioned table.

  • You cannot modify the data of archived partitions.

  • Cold data archiving is not supported for subpartitions in a partitioned table.

  • You cannot archive cold data in the default partitions of a list default hash partitioned table.

  • You cannot archive cold data in a hash or key partitioned table.

  • You cannot archive cold data for the entire partitioned table.

  • If OSS contains a file that has the same name as the file in the partition to be archived, the execution fails and an error message that is similar to the following message is returned:

    Target file for archived table exists on oss.

    To overwrite the existing file in OSS, execute the following statement. Before you execute the statement, replace table_name and part_name based on your environment and specify either CSV or ORC as the engine.

    ALTER TABLE table_name CHANGE PARTITION part_name ENGINE = CSV/ORC FORCE;

Examples

Archive data in the p1 and p2 partitions of the t table to OSS in the CSV or ORC format.

  1. Create a table named t in the database. The table uses the InnoDB storage engine.

    CREATE TABLE t(a int, b int, c int, primary key(a))
    PARTITION BY RANGE(a)
    (PARTITION p1 values less than(100),
     PARTITION p2 values less than(200),
     PARTITION p3 values less than MAXVALUE
    );
  2. Insert data into the t table.

    INSERT INTO t VALUES(1,1,1);
    INSERT INTO t VALUES(10,10,10);
    INSERT INTO t VALUES(100,100,100);
    INSERT INTO t VALUES(150,150,150);
    INSERT INTO t VALUES(200,200,200);
    INSERT INTO t VALUES(1000,1000,1000);
  3. Execute the following statements to archive data in the p1 and p2 partitions to OSS:

    • Archive data in the CSV format:

      ALTER TABLE t CHANGE PARTITION p1 ENGINE = csv;
      ALTER TABLE t CHANGE PARTITION p2 ENGINE = csv;
    • Archive data in the ORC format:

      ALTER TABLE t CHANGE PARTITION p1 ENGINE = ORC;
      ALTER TABLE t CHANGE PARTITION p2 ENGINE = ORC;
  4. After the data is archived, you can log on to the PolarDB console to view the archived databases and tables in OSS, or execute SQL statements to view data in the hybrid partitioned table.

  • To view the archived databases and tables in OSS, log on to the PolarDB console. Find the cluster that you want to manage and choose Settings and Management > Cold Data Archive to view information about the archived databases and tables in OSS.

  • For information about how to view data in a hybrid partitioned table, see Query data in a hybrid partitioned table.
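
You can also confirm which partitions were switched to the OSS-based engine by checking the table definition. This is a minimal sketch, assuming the t table from this example; the archived partitions are expected to show a CSV or ORC engine, while p3 remains on InnoDB:

-- View the table definition; archived partitions no longer use the InnoDB engine.
SHOW CREATE TABLE t;

-- Data in both the archived partitions and the remaining InnoDB partition stays queryable.
SELECT * FROM t;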

Import data from OSS tables to PolarStore

Import data from a common table that is archived in OSS

Before you modify the cold data archived in OSS tables, you can execute the ALTER ENGINE statement to import data from OSS to PolarStore. Then, you can modify the data in the imported table. After data is imported to PolarStore, cold data in OSS is deleted. After the data is modified, you can archive the modified table to OSS again.

Syntax
ALTER TABLE table_name ENGINE [=] engine_name;
Parameters

  • table_name: The name of the OSS table to be imported.

  • engine_name: The engine of the imported table.

Precautions

OSS tables are in the read-only state and cannot be modified by executing the INSERT, UPDATE, or DELETE statement. To modify archived cold data, convert OSS tables into tables that can be read and written, such as InnoDB tables. When you modify an OSS table in the read-only state, the following error message is reported:

1036 - Table 't1' is read only
Examples

Import data from the OSS table named t to PolarStore in the oss_test database.

ALTER TABLE `oss_test`.`t` ENGINE = InnoDB;
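
After the import, the t table is a regular InnoDB table again and supports DML. The following modification is a hypothetical example; the column values are illustrative:

-- The table is writable again, so DML statements succeed.
UPDATE `oss_test`.`t` SET b = 20, c = 30 WHERE a = 1;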

After the data is modified, archive the t table to OSS again. Examples:

ALTER TABLE t ENGINE = CSV CONNECTION = 'default_oss_server';

Or

ALTER TABLE t ENGINE = CSV STORAGE OSS;

Import data from a hybrid partitioned table that is archived in OSS

Before you modify data in the partitions of a hybrid partitioned table that are archived in OSS, you can execute the ALTER statement to import the data from OSS to PolarStore. After the data is imported to PolarStore, the cold data in OSS is deleted.

Syntax
ALTER TABLE table_name REORGANIZE PARTITION part_name INTO (partition_definition);
Parameters

  • table_name: The name of the OSS table to be imported.

  • part_name: The name of the partition to be imported.

  • partition_definition: The partition definition. The value must be the same as the partition_definition value of the partition to be imported.

Examples

Import data from the p1 partition of the partitioned table t that is archived in OSS back to PolarStore.

ALTER TABLE t REORGANIZE PARTITION p1 INTO(PARTITION p1 values less than(100));
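
You can then verify that the p1 partition is served from PolarStore again, for example with a partition-selective query. This is a minimal check that assumes the t table and data from the preceding examples (rows with a less than 100 belong to p1):

-- Query only the p1 partition of the partitioned table.
SELECT * FROM t PARTITION (p1);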

Delete the corresponding file in OSS

Note
  • You can delete the corresponding file in OSS only if your PolarDB for MySQL cluster runs one of the following versions:

    • PolarDB for MySQL 8.0.1 whose revision version is 8.0.1.1.42 or later

    • PolarDB for MySQL 8.0.2 whose revision version is 8.0.2.2.23 or later

  • If the revision version of your cluster does not meet the preceding requirements, you cannot delete the corresponding files from OSS. We recommend that you update the revision version of your cluster. For more information, see Minor version update.

After you drop a table that is archived in OSS or import a table from OSS back to PolarStore, the corresponding file in OSS is retained. If you confirm that the data is no longer required, you can use the following statement to delete the corresponding file from OSS:

CALL dbms_oss.delete_table_file('database_name', 'table_name');

OSS files are deleted asynchronously. A file is deleted only after all nodes in the cluster no longer depend on it, and this can take longer during heavy traffic. If the preceding statement fails and the "OSS files are still in use" error message appears, wait for a period of time and then execute the statement again.