All Products
Search
Document Center

PolarDB:Specify a statistics collection policy

Last Updated:Nov 08, 2024

This topic describes the policies that you can use to collect statistics and provides suggestions on how to configure statistics collection.

The quality of SQL query execution plans significantly affects the database performance. To provide accurate statistics for the cost-based optimizer (CBO) to generate optimal execution plans, PolarDB-X collects statistics in the background.

Statistics collection policies

The following table describes the statistics collection policies supported by PolarDB-X.

Policy

Collection frequency

Type of SQL statement

Parameter

Auto analyze

Once a week, scheduled between 02:00 and 05:00 on a specific day.

  1. SQL statements that include the WHERE (RAND() < 3.9623395E-4) clause.

  2. SQL statements in which the IP address is specified as null in the comment.

  3. SQL statements that include the HYPERLOGLOG function.

ENABLE_BACKGROUND_STATISTIC_COLLECTION

Queries on INFORMATION_SCHEMA

Once a day, scheduled between 02:00 and 05:00.

SELECT table_schema, table_name, table_rows FROM information_schema.tables WHERE table_schema IN (xxx,xxx,xxx....) AND table_name IN (yyy,yyy,yyy...)

N/A

Real-time feedback collection

Statistics collection is triggered when the number of data rows that are inserted or deleted exceeds 20% of the number of data rows that are previously stored in the table.

The requirements are the same as the requirements of the auto analyze policy.

N/A

Issues

  • If the data volume of a table is large, the IOPS on data nodes increases during the specified hours for statistics collection.

  • If the IOPS increases during non-specified statistics collection hours, the following issues may occur:

    1. The data volume of the table is excessively large. As a result, the statistics collection task requires an excessively long period of time to collect data from a single logical table. The statistics collection task runs from the specified hours to the non-specified hours.

    2. Real-time statistics collection is being performed. If an excessively large amount of data is updated and the system requires an excessively long period of time to import the updated data, real-time statistics collection is triggered.

Suggestions

  1. If you do not want to collect statistics during early morning hours, set the ENABLE_BACKGROUND_STATISTIC_COLLECTION parameter to False on the Parameter Settings page in the PolarDB-X console. This setting disables the scheduled statistic collection feature and real-time statistics collection feature. If the two features are disabled, we recommend that you manually execute the analyze table [table_name]; statement on a regular basis to collect statistics during off-peak hours.

  2. To immediately stop a collection task, perform the following steps:

    1. Execute the show full physical_processlist where info like '%rand%'; statement to view the SQL statements that are being executed on the data node.

    2. Execute a kill statement to terminate the SQL query that is performed to collect statistics. For example, you can execute kill '0-0-222'; to terminate the corresponding SQL query. In '0-0-222', the first 0 indicates the ID of the data node, the second 0 indicates the ID of the database shard, and 222 indicates the ID of the SQL statement whose execution you want to terminate.