全部產品
Search
文件中心

DataWorks:MySQL分區表寫入最佳實務

更新時間:Jun 19, 2024

為了在您使用MySQL處理龐巨量資料集時最佳化寫入查詢效能,本文詳細介紹了DataWorksData Integration將資料寫入MySQL分區表的最佳實務。

前提條件

已具備MySQL資料庫,並在DataWorks中添加為資料來源。具體操作,請參見MySQL資料來源

準備工作

  1. 建立源表,用於DataWorks讀取來源資料。

    CREATE TABLE `user_tb1` (
      `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'id',
      `name` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '名字',
      `update_time` datetime DEFAULT NULL,
      PRIMARY KEY (`id`)
    );
    INSERT INTO user_tb1 (name, update_time) VALUES ('u1', '2023-12-28 09:52:05');
    INSERT INTO user_tb1 (name, update_time) VALUES ('u2', '2023-12-25 09:52:05');
    INSERT INTO user_tb1 (name, update_time) VALUES ('u3', '2023-12-26 09:52:05');
    INSERT INTO user_tb1 (name, update_time) VALUES ('u3', '2023-12-27 09:52:05');
    INSERT INTO user_tb1 (name, update_time) VALUES ('u4', '2023-12-28 09:52:05');
    INSERT INTO user_tb1 (name, update_time) VALUES ('u5', '2023-12-28 09:52:05');
    INSERT INTO user_tb1 (name, update_time) VALUES ('u6', '2023-12-28 09:52:05');
    INSERT INTO user_tb1 (name, update_time) VALUES ('u7', '2023-12-28 09:52:05');
  2. 建立目標表,用於將資料寫入分區表。

    本文以RANGE分區表和KEY分區表為例,需要分別建立如下表:

    • RANGE分區表

      CREATE TABLE `pt_write_test_tb1` (
        `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'id',
        `name` varchar(100)  COMMENT '名字',
        `update_time` datetime  COMMENT '更新時間',
        PRIMARY KEY (`id`,`update_time`),
        KEY `idx_update_time` (`update_time`)
      )
      PARTITION BY RANGE COLUMNS(update_time) (
       PARTITION pt20231216 VALUES LESS THAN ('20231216'),
       PARTITION pt20231217 VALUES LESS THAN ('20231217'),
       PARTITION pt20231218 VALUES LESS THAN ('20231218'),
       PARTITION pt20231219 VALUES LESS THAN ('20231219'),
       PARTITION pt20231220 VALUES LESS THAN ('20231220'),
       PARTITION pt20231221 VALUES LESS THAN ('20231221'),
       PARTITION pt20231222 VALUES LESS THAN ('20231222'),
       PARTITION pt20231223 VALUES LESS THAN ('20231223'),
       PARTITION pt20231224 VALUES LESS THAN ('20231224'),
       PARTITION pt20231225 VALUES LESS THAN ('20231225'),
       PARTITION pt20231226 VALUES LESS THAN ('20231226'),
       PARTITION pt20231227 VALUES LESS THAN ('20231227'),
       PARTITION pt20231228 VALUES LESS THAN ('20231228'),
       PARTITION pt20231229 VALUES LESS THAN ('20231229')
      );
    • KEY分區表

      CREATE TABLE `pt_write_test_tb2` (
        `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'id',
        `name` varchar(100)  COMMENT '名字',
        `update_time` datetime  COMMENT '更新時間',
        PRIMARY KEY (`id`)
      )
      PARTITION BY KEY(`id`)
      PARTITIONS 16;

分區表寫入

RANGE分區表寫入

本樣本通過日期update_time欄位作為RANGE分區條件,示範如何把資料寫入MySQL分區表。

  1. 配置離線同步任務。

    配置要點:

    • 資料來源配置為已建立的源表user_tb1

    • 資料去向配置為已建立的目標RANGE分區表pt_write_test_tb1

    image

    說明

    您也可以通過資料去向匯入前準備語句配置項,讓同步任務在寫入資料之前,先建立當天的分區:

    1. 準備一個調度參數,例如pt_date=$[yyyymmdd]

    2. 匯入前準備語句中配置如下SQL:

      ALTER TABLE `pt_write_test_tb1` 
      ADD PARTITION (PARTITION pt${pt_date} VALUES LESS THAN ('${pt_date}'));
  2. 查詢目標表中被寫入分區表的資料。

    • 查詢整表資料:

      SELECT * FROM pt_write_test_tb1;

      返回樣本:

      +----+------+---------------------+
      | id | name | update_time         |
      +----+------+---------------------+
      |  2 | u2   | 2023-12-25 09:52:05 |
      |  3 | u3   | 2023-12-26 09:52:05 |
      |  4 | u3   | 2023-12-27 09:52:05 |
      |  1 | u1   | 2023-12-28 09:52:05 |
      |  5 | u4   | 2023-12-28 09:52:05 |
      |  6 | u5   | 2023-12-28 09:52:05 |
      |  7 | u6   | 2023-12-28 09:52:05 |
      |  8 | u7   | 2023-12-28 09:52:05 |
      +----+------+---------------------+
      8 rows in set (0.01 sec)
    • 查詢單分區:

      SELECT * FROM pt_write_test_tb1 partition (pt20231229);

      返回樣本:

      +----+------+---------------------+
      | id | name | update_time         |
      +----+------+---------------------+
      |  1 | u1   | 2023-12-28 09:52:05 |
      |  5 | u4   | 2023-12-28 09:52:05 |
      |  6 | u5   | 2023-12-28 09:52:05 |
      |  7 | u6   | 2023-12-28 09:52:05 |
      |  8 | u7   | 2023-12-28 09:52:05 |
      +----+------+---------------------+
      5 rows in set (0.00 sec)

KEY分區表寫入

本樣本通過主鍵id欄位作為KEY分區條件,示範如何把資料寫入MySQL分區表。

  1. 配置離線同步任務。

    配置要點:

    • 資料來源配置為已建立的源表user_tb1

    • 資料去向配置為已建立的目標KEY分區表pt_write_test_tb2

    image

  2. 查詢目標表中被寫入分區表的資料。

    • 查詢整表資料:

      select * from pt_write_test_tb2 order by id;

      返回樣本:

      +----+------+---------------------+
      | id | name | update_time         |
      +----+------+---------------------+
      |  1 | u1   | 2023-12-28 09:52:05 |
      |  2 | u2   | 2023-12-25 09:52:05 |
      |  3 | u3   | 2023-12-26 09:52:05 |
      |  4 | u3   | 2023-12-27 09:52:05 |
      |  5 | u4   | 2023-12-28 09:52:05 |
      |  6 | u5   | 2023-12-28 09:52:05 |
      |  7 | u6   | 2023-12-28 09:52:05 |
      |  8 | u7   | 2023-12-28 09:52:05 |
      +----+------+---------------------+
      8 rows in set (0.00 sec)
    • 查看各個分區的資料:

      SELECT
      	TABLE_NAME ,
      	PARTITION_NAME ,
      	TABLE_ROWS,
      	UPDATE_TIME
      FROM
      	information_schema.partitions
      WHERE
      	TABLE_NAME = 'pt_write_test_tb2';

      返回樣本:

      +-------------------+----------------+------------+---------------------+
      | TABLE_NAME        | PARTITION_NAME | TABLE_ROWS | UPDATE_TIME         |
      +-------------------+----------------+------------+---------------------+
      | pt_write_test_tb2 | p0             |          0 | NULL                |
      | pt_write_test_tb2 | p1             |          0 | NULL                |
      | pt_write_test_tb2 | p10            |          0 | NULL                |
      | pt_write_test_tb2 | p11            |          1 | 2024-02-29 14:49:14 |
      | pt_write_test_tb2 | p12            |          0 | NULL                |
      | pt_write_test_tb2 | p13            |          0 | NULL                |
      | pt_write_test_tb2 | p14            |          1 | 2024-02-29 14:49:15 |
      | pt_write_test_tb2 | p15            |          1 | 2024-02-29 14:49:16 |
      | pt_write_test_tb2 | p2             |          1 | 2024-02-29 14:49:17 |
      | pt_write_test_tb2 | p3             |          0 | NULL                |
      | pt_write_test_tb2 | p4             |          1 | 2024-02-29 14:49:14 |
      | pt_write_test_tb2 | p5             |          1 | 2024-02-29 14:49:15 |
      | pt_write_test_tb2 | p6             |          0 | NULL                |
      | pt_write_test_tb2 | p7             |          0 | NULL                |
      | pt_write_test_tb2 | p8             |          1 | 2024-02-29 14:49:16 |
      | pt_write_test_tb2 | p9             |          1 | 2024-02-29 14:49:17 |
      +-------------------+----------------+------------+---------------------+
      16 rows in set (0.01 sec)