All Products
Search
Document Center

AnalyticDB:BUILD

Last Updated:Jul 03, 2024

BUILD jobs are used to rebuild partitions that have newly written data. This can help improve read performance. During the rebuild process, the system creates indexes, clears redundant data, and asynchronously executes DDL statements.

Usage notes

During a BUILD job, you cannot execute INSERT OVERWRITE SELECT statements until the BUILD job is complete.

Overview

During a BUILD job, the system merges the data that is written in real time and the historical partitions related to the data, creates indexes, and asynchronously executes DDL statements. After INSERT, UPDATE, or DELETE statements are executed on the data in a partition, a BUILD job can be performed to rebuild the partition. BUILD jobs do not rebuild the partitions whose data has not been modified.

After a BUILD job starts, you cannot stop the job.

BUILD jobs are performed at the table level. After a BUILD job starts on a table, the job is split into subtasks at the shard level. All three replicas of a shard each perform a subtask. The BUILD job is complete when all subtasks are complete. Up to three BUILD jobs can be performed in parallel, each on a different table.

Automatic trigger of BUILD jobs

A BUILD job can be automatically triggered when one of the following conditions is met:

  • The period of time that has elapsed since the most recent BUILD job reaches the minimum time interval between BUILD jobs and at least 50,000 rows of data have been added to a shard of a table. The minimum time interval is 0.5 hours for Data Warehouse Edition (V3.0) clusters in reserved mode and 1.5 hours for Data Warehouse Edition (V3.0) clusters in elastic mode and Data Lakehouse Edition (V3.0) clusters.

  • Twenty-four hours have elapsed since the most recent BUILD job, and at least one row of data has been modified.

Manually trigger a BUILD job

You can use one of the following methods to manually trigger a BUILD job:

  • BUILD

    BUILD TABLE <table_name>;

    This method triggers a BUILD job only on the partitions whose data is modified.

  • Execute a BUILD statement on an entire table

    BUILD TABLE <table_name> force = true;

    This method triggers a BUILD job on all partitions of a table.

  • Execute a BUILD statement on specified partitions

    BUILD TABLE test force=true partitions='partition1,partition2';
    Important

    You can execute a BUILD statement on specified partitions only for AnalyticDB for MySQL clusters of V3.1.6.0 or later. For information about how to query the minor version of an AnalyticDB for MySQL cluster, see How do I query the version of an AnalyticDB for MySQL cluster? To update the minor version of a cluster, contact technical support.

    If a table contains large volumes of data, the BUILD TABLE <table_name> force = true; statement may require a long period of time to execute. AnalyticDB for MySQL allows you to specify the partitions on which you want to perform a BUILD job. This helps reduce the usage of resources and improves the performance of BUILD jobs.

Configure automatic scheduling for BUILD jobs

  • Configuration method

    SET ADB_CONFIG RC_CSTORE_BUILD_SCHEDULE_PERIOD=`<start>,<end>`;

    BUILD jobs are scheduled on the hour within the time ranges specified by the start and end parameters. Set the values of the start and end parameters to integers within the range of 0 to 24. You can configure multiple time ranges. Separate multiple time ranges with semicolons (;).

    Important

    The specified time ranges are used for scheduling jobs, not for executing jobs. Therefore, after jobs are scheduled, the system may not complete executing the jobs when the time ranges end.

  • Example

    Configure BUILD jobs to be scheduled within the following time ranges: 00:00:00 to 06:00:00 and 18:00:00 to 24:00:00.

    SET ADB_CONFIG RC_CSTORE_BUILD_SCHEDULE_PERIOD=`0,6;18,24`;

Configure scheduling priorities for BUILD jobs

The order in which BUILD jobs are scheduled is based on the amount of data that is added to individual shards of tables. The larger the amount of data added to a shard of a table, the earlier the job is scheduled and executed. You can also use a hint or execute the SET ADB_CONFIG statement to configure the scheduling priorities of BUILD jobs for tables.

Important
  • You can configure the scheduling priorities of BUILD jobs only for AnalyticDB for MySQL clusters of V3.1.5.0 or later. For information about how to query the minor version of an AnalyticDB for MySQL cluster, see How do I query the version of an AnalyticDB for MySQL cluster? To update the minor version of a cluster, contact technical support.

  • A hint supports only one table. The configured priority takes effect only on the current BUILD job.

  • The SET ADB_CONFIG statement supports one or more tables. The configured priorities are effective until you configure new scheduling priorities of BUILD jobs for the tables.

  • If you use a hint and the SET ADB_CONFIG statement to configure different scheduling priorities of BUILD jobs for a table, the hint-based scheduling priority takes precedence in the current job.

The task_priority parameter specifies the scheduling priority of BUILD jobs. The value must be an integer. The default value is 30. A greater value specifies a higher scheduling priority. After you configure priorities for BUILD jobs, the frontend nodes schedule BUILD jobs based on the priorities. If you set the task_priority parameter to a value that is less than 0 for a table, automatic scheduling of BUILD tasks is disabled for the table and the frontend nodes do not schedule BUILD tasks for the table.

Configuration methods:

  • Use a hint

    /*+ build_task_priority = <task_priority> */ BUILD TABLE <db_name>.<table_name>;

    For example, set the scheduling priority of BUILD jobs to 30 for the test table in the adb_demo database.

    /*+ build_task_priority = 30 */ Build TABLE adb_demo.test;
  • Use the SET ADB_CONFIG statement

    • Configure scheduling priorities of BUILD jobs for multiple tables in different databases.

      SET ADB_CONFIG RC_BUILD_TASK_PRIORITY_LIST = `<db1_name>.<table1_name>.<task_priority>;<db2_name>.<table2_name>.<task_priority>`;
      Note

      Separate multiple tables with semicolons (;).

      For example, set the scheduling priority of BUILD jobs to 30 for the test1 table in the adb_demo1 database, and 10 for the test2 table in the adb_demo2 database.

      SET ADB_CONFIG RC_BUILD_TASK_PRIORITY_LIST = `adb_demo1.test1.30;adb_demo2.test2.10`;
    • Configure the same scheduling priority of BUILD jobs for all tables in a database.

      SET ADB_CONFIG RC_BUILD_TASK_PRIORITY_LIST = `<db1_name>.*.<task_priority>`;

      For example, set the scheduling priority of BUILD jobs to 30 for all tables in the adb_demo1 database.

      SET ADB_CONFIG RC_BUILD_TASK_PRIORITY_LIST = `adb_demo1.*.30`;
    • Configure different scheduling priorities of BUILD jobs between a table and other tables in the same database.

      SET ADB_CONFIG RC_BUILD_TASK_PRIORITY_LIST = `<db1_name>.*.<task_priority>;<db1_name>.<table_name>.<task_priority>`;

      For example, set the scheduling priority of BUILD jobs to 30 for the test1 table in the adb_demo1 database, and 10 for all other tables in the database.

      SET ADB_CONFIG RC_BUILD_TASK_PRIORITY_LIST = `adb_demo1.*.10;adb_demo1.test1.30`;

You can execute the SHOW ADB_CONFIG statement to query the current scheduling priority of BUILD jobs.

Query the status of a BUILD job

After a BUILD job is automatically or manually triggered, you can execute the following SQL statement to query the status of the BUILD job within the last three days:

SELECT table_name, schema_name, status FROM INFORMATION_SCHEMA.KEPLER_META_BUILD_TASK ORDER BY create_time DESC LIMIT 10;

The following table describes the status of BUILD jobs that correspond to different values of status.

Value of status

Description

INIT

The job is being initialized.

RUNNING

The job is running.

FINISH

The job is complete.