An ApsaraDB RDS for PostgreSQL instance is a large application. If your RDS instance processes a large number of requests, it consumes a large number of memory, CPU, I/O, and network resources. SQL optimization is an effective instance optimization method. To achieve the best results of SQL optimization, you must identify the SQL statements that consume the most resources, such as I/O resources.
Instance resources include CPU resources, memory resources, and I/O resources. You can use the pg_stat_statements extension to collect statistics on the consumed resources of your RDS instance and analyze the executed SQL statements to identify the SQL statements that consume the most CPU, memory, or I/O resources.
This topic describes how to create the pg_stat_statements extension, analyze the SQL statements that consume the most resources, and reset the statistics of resource consumption.
You can enable the SQL Explorer and Audit feature to record the execution of SQL statements and perform aggregate analysis on the execution. After you enable the feature, the feature automatically records SQL statements from the database kernel as well as statement information, such as execution details, execution accounts, and IP addresses. This does not affect instance performance. For more information, see Use the SQL Explorer and Audit feature.
Execute the following statement to create the pg_stat_statements extension in your RDS instance:
CREATE EXTENSION pg_stat_statements;
Resource consumption statistics generated by the pg_stat_statements extension
You can query the resource consumption statistics from the view that is generated by the pg_stat_statements extension. Some filter conditions in SQL statements are replaced with variables in the pg_stat_statements extension to reduce duplicate statistics.
The view that is generated by the pg_stat_statements extension provides the following important information:
Information about each SQL statement, including the number of times that the SQL statement is executed, the total execution duration, the shortest execution duration, the longest execution duration, the average execution duration, the execution duration variance, the total number of rows that are scanned, the total number of rows that are returned, and the total number of rows that are processed. The execution duration variance is used to reflect network jitters.
Usage of the shared buffer, including the hit ratio, the miss ratio, the number of dirty data blocks that are generated, and the number of dirty data blocks that are evicted.
Usage of the local buffer, including the hit ratio, the miss ratio, the number of dirty data blocks that are generated, and the number of dirty data blocks that are evicted.
Usage of the temp buffer, including the number of dirty data blocks that are read and the number of dirty data blocks that are evicted.
The durations of read and write operations on each data block in your RDS instance.
The following table describes the parameters in the resource consumption statistics that are generated by the pg_stat_statements extension.
Parameter | Type | Example | Description |
userid | oid | pg_authid.oid | OID of user who executed the statement. |
dbid | oid | pg_database.oid | OID of database in which the statement was executed. |
queryid | bigint | None | Internal hash code, computed from the statement's parse tree. |
query | text | None | Text of a representative statement. |
calls | bigint | None | Number of times executed. |
total_time | double precision | None | Total time spent in the statement, in milliseconds. |
min_time | double precision | None | Minimum time spent in the statement, in milliseconds. |
max_time | double precision | None | Maximum time spent in the statement, in milliseconds. |
mean_time | double precision | None | Mean time spent in the statement, in milliseconds. |
stddev_time | double precision | None | Population standard deviation of time spent in the statement, in milliseconds. |
rows | bigint | None | Total number of rows retrieved or affected by the statement. |
shared_blks_hit | bigint | None | Total number of shared block cache hits by the statement. |
shared_blks_read | bigint | None | Total number of shared blocks read by the statement. |
shared_blks_dirtied | bigint | None | Total number of shared blocks dirtied by the statement. |
shared_blks_written | bigint | None | Total number of shared blocks written by the statement. |
local_blks_hit | bigint | None | Total number of local block cache hits by the statement. |
local_blks_read | bigint | None | Total number of local blocks read by the statement. |
local_blks_dirtied | bigint | None | Total number of local blocks dirtied by the statement. |
local_blks_written | bigint | None | Total number of local blocks written by the statement. |
temp_blks_read | bigint | None | Total number of temp blocks read by the statement. |
temp_blks_written | bigint | None | Total number of temp blocks written by the statement. |
blk_read_time | double precision | None | Total time the statement spent reading blocks, in milliseconds (if track_io_timing is enabled, otherwise zero). |
blk_write_time | double precision | None | Total time the statement spent writing blocks, in milliseconds (if track_io_timing is enabled, otherwise zero). |
Analyze SQL statements that consume the most resources
SQL statements that consume the most I/O resources
Execute the following statement to view the top five SQL statements that consume the most I/O resources in one call:
SELECT userid::regrole, dbid, query FROM pg_stat_statements ORDER BY (blk_read_time+blk_write_time)/calls DESC LIMIT 5;
Execute the following statement to view the top five SQL statements that consume the most I/O resources in total:
SELECT userid::regrole, dbid, query FROM pg_stat_statements ORDER BY (blk_read_time+blk_write_time) DESC LIMIT 5;
SQL statements that consume the most time
Execute the following statement to view the top five SQL statements that consume the most time in one call:
SELECT userid::regrole, dbid, query FROM pg_stat_statements ORDER BY mean_time DESC LIMIT 5;
Execute the following statement to view the top five SQL statements that consume the most time in total:
SELECT userid::regrole, dbid, query FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5;
SQL statements with the most severe response jitter
Execute the following statement to view the top five SQL statements with the most severe response jitter:
SELECT userid::regrole, dbid, query FROM pg_stat_statements ORDER BY stddev_time DESC LIMIT 5;
SQL statements that consume the most shared memory resources
Execute the following statement to view the top five SQL statements that consume the most shared memory resources:
SELECT userid::regrole, dbid, query FROM pg_stat_statements ORDER BY (shared_blks_hit+shared_blks_dirtied) DESC LIMIT 5;
SQL statements consume the most temporary space
Execute the following statement to view the top five SQL statements that consume the most temporary space:
SELECT userid::regrole, dbid, query FROM pg_stat_statements ORDER BY temp_blks_written DESC LIMIT 5;
Reset resource consumption statistics
The pg_stat_statements extension collects accumulative statistics. To view the statistics over a specific period of time, you must query the snapshots of RDS the instance. For more information, see PostgreSQL AWR report (for ApsaraDB PgSQL).
You can execute the following statement to delete historical statistics on a regular basis:
SELECT pg_stat_statements_reset();
References
For more information, see PostgreSQL 9.6.2 Documentation - F.29. pg_stat_statements.