This topic describes the policies that you can use to collect statistics and provides suggestions on how to configure statistics collection.
The quality of SQL query execution plans significantly affects the database performance. To provide accurate statistics for the cost-based optimizer (CBO) to generate optimal execution plans, PolarDB-X collects statistics in the background.
Statistics collection policies
The following table describes the statistics collection policies supported by PolarDB-X.
Policy | Collection frequency | Type of SQL statement | Parameter |
Auto analyze | Once a week, scheduled between 02:00 and 05:00 on a specific day. |
| ENABLE_BACKGROUND_STATISTIC_COLLECTION |
Queries on INFORMATION_SCHEMA | Once a day, scheduled between 02:00 and 05:00. | SELECT table_schema, table_name, table_rows FROM information_schema.tables WHERE table_schema IN (xxx,xxx,xxx....) AND table_name IN (yyy,yyy,yyy...) | N/A |
Real-time feedback collection | Statistics collection is triggered when the number of data rows that are inserted or deleted exceeds 20% of the number of data rows that are previously stored in the table. | The requirements are the same as the requirements of the auto analyze policy. | N/A |
Issues
If the data volume of a table is large, the IOPS on data nodes increases during the specified hours for statistics collection.
If the IOPS increases during non-specified statistics collection hours, the following issues may occur:
The data volume of the table is excessively large. As a result, the statistics collection task requires an excessively long period of time to collect data from a single logical table. The statistics collection task runs from the specified hours to the non-specified hours.
Real-time statistics collection is being performed. If an excessively large amount of data is updated and the system requires an excessively long period of time to import the updated data, real-time statistics collection is triggered.
Suggestions
If you do not want to collect statistics during early morning hours, set the ENABLE_BACKGROUND_STATISTIC_COLLECTION parameter to False on the Parameter Settings page in the PolarDB-X console. This setting disables the scheduled statistic collection feature and real-time statistics collection feature. If the two features are disabled, we recommend that you manually execute the
analyze table [table_name];
statement on a regular basis to collect statistics during off-peak hours.To immediately stop a collection task, perform the following steps:
Execute the
show full physical_processlist where info like '%rand%';
statement to view the SQL statements that are being executed on the data node.Execute a kill statement to terminate the SQL query that is performed to collect statistics. For example, you can execute
kill '0-0-222';
to terminate the corresponding SQL query. In '0-0-222', the first 0 indicates the ID of the data node, the second 0 indicates the ID of the database shard, and 222 indicates the ID of the SQL statement whose execution you want to terminate.