BUILD jobs are used to rebuild partitions that have newly written data. This can help improve read performance. During the rebuild process, the system creates indexes, clears redundant data, and asynchronously executes DDL statements.
Usage notes
During a BUILD job, you cannot execute INSERT OVERWRITE SELECT
statements until the BUILD job is complete.
Overview
During a BUILD
job, the system merges the data that is written in real time and the historical partitions related to the data, creates indexes, and asynchronously executes DDL statements. After INSERT, UPDATE, or DELETE statements are executed on the data in a partition, a BUILD job can be performed to rebuild the partition. BUILD jobs do not rebuild the partitions whose data has not been modified.
After a BUILD job starts, you cannot stop the job.
BUILD jobs are performed at the table level. After a BUILD job starts on a table, the job is split into subtasks at the shard level. All three replicas of a shard each perform a subtask. The BUILD job is complete when all subtasks are complete. Up to three BUILD jobs can be performed in parallel, each on a different table.
Automatic trigger of BUILD jobs
A BUILD job can be automatically triggered when one of the following conditions is met:
The period of time that has elapsed since the most recent BUILD job reaches the minimum time interval between BUILD jobs and at least 50,000 rows of data have been added to a shard of a table. The minimum time interval is 0.5 hours for Data Warehouse Edition (V3.0) clusters in reserved mode and 1.5 hours for Data Warehouse Edition (V3.0) clusters in elastic mode and Data Lakehouse Edition (V3.0) clusters.
Twenty-four hours have elapsed since the most recent BUILD job, and at least one row of data has been modified.
Manually trigger a BUILD job
You can use one of the following methods to manually trigger a BUILD job:
BUILD
BUILD TABLE <table_name>;
This method triggers a BUILD job only on the partitions whose data is modified.
Execute a BUILD statement on an entire table
BUILD TABLE <table_name> force = true;
This method triggers a BUILD job on all partitions of a table.
Execute a BUILD statement on specified partitions
BUILD TABLE test force=true partitions='partition1,partition2';
ImportantYou can execute a BUILD statement on specified partitions only for AnalyticDB for MySQL clusters of V3.1.6.0 or later. For information about how to query the minor version of an AnalyticDB for MySQL cluster, see How do I query the version of an AnalyticDB for MySQL cluster? To update the minor version of a cluster, contact technical support.
If a table contains large volumes of data, the
BUILD TABLE <table_name> force = true;
statement may require a long period of time to execute. AnalyticDB for MySQL allows you to specify thepartitions
on which you want to perform a BUILD job. This helps reduce the usage of resources and improves the performance of BUILD jobs.
Configure automatic scheduling for BUILD jobs
Configuration method
SET ADB_CONFIG RC_CSTORE_BUILD_SCHEDULE_PERIOD=`<start>,<end>`;
BUILD jobs are scheduled on the hour within the time ranges specified by the
start
andend
parameters. Set the values of the start and end parameters to integers within the range of 0 to 24. You can configure multiple time ranges. Separate multiple time ranges with semicolons (;).ImportantThe specified time ranges are used for scheduling jobs, not for executing jobs. Therefore, after jobs are scheduled, the system may not complete executing the jobs when the time ranges end.
Example
Configure BUILD jobs to be scheduled within the following time ranges: 00:00:00 to 06:00:00 and 18:00:00 to 24:00:00.
SET ADB_CONFIG RC_CSTORE_BUILD_SCHEDULE_PERIOD=`0,6;18,24`;
Configure scheduling priorities for BUILD jobs
The order in which BUILD jobs are scheduled is based on the amount of data that is added to individual shards of tables. The larger the amount of data added to a shard of a table, the earlier the job is scheduled and executed. You can also use a hint or execute the SET ADB_CONFIG
statement to configure the scheduling priorities of BUILD jobs for tables.
You can configure the scheduling priorities of BUILD jobs only for AnalyticDB for MySQL clusters of V3.1.5.0 or later. For information about how to query the minor version of an AnalyticDB for MySQL cluster, see How do I query the version of an AnalyticDB for MySQL cluster? To update the minor version of a cluster, contact technical support.
A hint supports only one table. The configured priority takes effect only on the current BUILD job.
The
SET ADB_CONFIG
statement supports one or more tables. The configured priorities are effective until you configure new scheduling priorities of BUILD jobs for the tables.If you use a hint and the
SET ADB_CONFIG
statement to configure different scheduling priorities of BUILD jobs for a table, the hint-based scheduling priority takes precedence in the current job.
The task_priority
parameter specifies the scheduling priority of BUILD jobs. The value must be an integer. The default value is 30. A greater value specifies a higher scheduling priority. After you configure priorities for BUILD jobs, the frontend nodes schedule BUILD jobs based on the priorities. If you set the task_priority
parameter to a value that is less than 0 for a table, automatic scheduling of BUILD tasks is disabled for the table and the frontend nodes do not schedule BUILD tasks for the table.
Configuration methods:
Use a hint
/*+ build_task_priority = <task_priority> */ BUILD TABLE <db_name>.<table_name>;
For example, set the scheduling priority of BUILD jobs to 30 for the
test
table in theadb_demo
database./*+ build_task_priority = 30 */ Build TABLE adb_demo.test;
Use the SET ADB_CONFIG statement
Configure scheduling priorities of BUILD jobs for multiple tables in different databases.
SET ADB_CONFIG RC_BUILD_TASK_PRIORITY_LIST = `<db1_name>.<table1_name>.<task_priority>;<db2_name>.<table2_name>.<task_priority>`;
NoteSeparate multiple tables with semicolons (;).
For example, set the scheduling priority of BUILD jobs to 30 for the
test1
table in theadb_demo1
database, and 10 for thetest2
table in theadb_demo2
database.SET ADB_CONFIG RC_BUILD_TASK_PRIORITY_LIST = `adb_demo1.test1.30;adb_demo2.test2.10`;
Configure the same scheduling priority of BUILD jobs for all tables in a database.
SET ADB_CONFIG RC_BUILD_TASK_PRIORITY_LIST = `<db1_name>.*.<task_priority>`;
For example, set the scheduling priority of BUILD jobs to 30 for all tables in the
adb_demo1
database.SET ADB_CONFIG RC_BUILD_TASK_PRIORITY_LIST = `adb_demo1.*.30`;
Configure different scheduling priorities of BUILD jobs between a table and other tables in the same database.
SET ADB_CONFIG RC_BUILD_TASK_PRIORITY_LIST = `<db1_name>.*.<task_priority>;<db1_name>.<table_name>.<task_priority>`;
For example, set the scheduling priority of BUILD jobs to 30 for the
test1
table in theadb_demo1
database, and 10 for all other tables in the database.SET ADB_CONFIG RC_BUILD_TASK_PRIORITY_LIST = `adb_demo1.*.10;adb_demo1.test1.30`;
You can execute the SHOW ADB_CONFIG statement to query the current scheduling priority of BUILD jobs.
Query the status of a BUILD job
After a BUILD job is automatically or manually triggered, you can execute the following SQL statement to query the status of the BUILD job within the last three days:
SELECT table_name, schema_name, status FROM INFORMATION_SCHEMA.KEPLER_META_BUILD_TASK ORDER BY create_time DESC LIMIT 10;
The following table describes the status of BUILD jobs that correspond to different values of status
.
Value of | Description |
INIT | The job is being initialized. |
RUNNING | The job is running. |
FINISH | The job is complete. |