If your Hologres instance responds slowly or queries take too long, you can view and analyze slow query logs to quickly locate the issue. This topic describes how to view and analyze slow query logs in Hologres so that you can identify, diagnose, and optimize slow or failed queries and improve system performance.
Version guide
| Hologres version | Feature | Description |
| --- | --- | --- |
| V0.10 | Added the feature to view and analyze slow query logs. | |
| V2.2.7 | Optimized the default value of the log_min_duration_statement parameter. | Slow query logs record all DDL statements, all failed SQL statements, and SQL statements (such as INSERT, SELECT, UPDATE, and DELETE) that run longer than the value of the log_min_duration_statement parameter. Starting from Hologres V2.2.7, the default value of this parameter is optimized. |
Limits
The following limits apply when you query slow query logs in Hologres:
- Slow query logs are retained for one month by default. You can query data from any time period within that month.
- To ensure system stability and prevent traffic overload, a single query can return a maximum of 10,000 slow query logs. Some fields in the slow query logs have length limits. For more information, see the field descriptions of the query_log table.
- Slow query logs are part of the Hologres metadata warehouse. A failed slow query log search does not affect the business queries in your instance. Therefore, the stability of slow query logs is not covered by the product's Service-Level Agreement (SLA).
View the query_log table
Hologres stores slow query logs in the hologres.hg_query_log system table. The hg_query_log table records only completed SQL statements. Queries that are still running are not written to this table. This behavior is consistent across Hologres V2, V3, and later versions. In addition:
- After you upgrade an instance to V0.10, Hologres by default collects logs for DML queries that run longer than 100 ms and for all DDL operations.
- Starting from V3.0.2, Hologres supports aggregated records in the hologres.hg_query_log table for DML and DQL operations that run for less than 100 ms:
  - The system aggregates records for successful DQL and DML queries that have a fingerprint and run for less than 100 ms.
  - The primary aggregation keys are server_addr, usename, datname, warehouse_id, application_name, and digest.
  - Each connection summarizes and reports aggregated data once per minute.

The following table describes the fields in the hologres.hg_query_log system table.
| Field | Data type | Description for detailed DML and DQL records (over 100 ms) | Description for aggregated DML and DQL records (under 100 ms) |
| --- | --- | --- | --- |
| usename | text | The username that ran the query. | The username that ran the query. |
| status | text | The final status of the query: SUCCESS or FAILED. | The final status of the query. The value is always SUCCESS because aggregated data includes only successful records. |
| query_id | text | The unique ID of the query. A failed query always has a query_id. A successful query might not have one. | The query ID of the first query with the same primary aggregation key within the aggregation period. |
| digest | text | The SQL fingerprint. For SELECT, INSERT, DELETE, and UPDATE queries, the system calculates an MD5 hash value as the SQL fingerprint. For the rules for collecting and calculating SQL fingerprints, see the SQL fingerprint section of this topic. Note: The digest column is added in Hologres V2.2. If your instance is V2.1 or earlier, contact Hologres technical support to upgrade your instance. | The SQL fingerprint. |
| datname | text | The name of the database for the query. | The name of the database for the query. |
| command_tag | text | The type of the query, such as SELECT or INSERT. | The query type of the first query with the same primary aggregation key within the aggregation period. |
| warehouse_id | integer | The ID of the virtual warehouse used by the query. | The virtual warehouse ID of the first query with the same primary aggregation key within the aggregation period. |
| warehouse_name | text | The name of the virtual warehouse used by the query. | The virtual warehouse name of the first query with the same primary aggregation key within the aggregation period. |
| warehouse_cluster_id | integer | The ID of the cluster in the virtual warehouse used by the query. Cluster IDs in each virtual warehouse start from 1. This field is added in Hologres V3.0.2. | The cluster ID of the first query with the same primary aggregation key within the aggregation period. |
| duration | integer | The duration of the query, in milliseconds (ms). By default, the system records all DDL statements and only the DML statements whose duration exceeds the threshold specified by the log_min_duration_statement GUC parameter. You can change the collection threshold by using that parameter. | The average duration of all queries with the same primary aggregation key within the aggregation period. |
| message | text | The error message, if any. | Empty. |
| query_start | timestamptz | The start time of the query. | The start time of the first query with the same primary aggregation key within the aggregation period. |
| query_date | text | The start date of the query. | The start date of the first query with the same primary aggregation key within the aggregation period. |
| query | text | The text of the query. A query can be up to 51,200 characters in length. Longer queries may be truncated. | The text of the first query with the same primary aggregation key within the aggregation period. |
| result_rows | bigint | The number of rows returned. For an INSERT statement, this is the number of inserted rows. | The average value of result_rows for all queries with the same primary aggregation key within the aggregation period. |
| result_bytes | bigint | The number of bytes returned. | The average value of result_bytes for all queries with the same primary aggregation key within the aggregation period. |
| read_rows | bigint | The number of rows read. This is not an exact value. If a bitmap index is used, this value may differ from the actual number of scanned rows. | The average value of read_rows for all queries with the same primary aggregation key within the aggregation period. |
| read_bytes | bigint | The number of bytes read. | The average value of read_bytes for all queries with the same primary aggregation key within the aggregation period. |
| affected_rows | bigint | The number of rows affected by a DML statement. | The average value of affected_rows for all queries with the same primary aggregation key within the aggregation period. |
| affected_bytes | bigint | The number of bytes affected by a DML statement. | The average value of affected_bytes for all queries with the same primary aggregation key within the aggregation period. |
| memory_bytes | bigint | The cumulative peak memory usage across all nodes on which the query ran (not exact). It roughly reflects the amount of data read by the query. | The average value of memory_bytes for all queries with the same primary aggregation key within the aggregation period. |
| shuffle_bytes | bigint | The estimated number of bytes shuffled (not exact). It roughly reflects the amount of data transferred over the network. | The average value of shuffle_bytes for all queries with the same primary aggregation key within the aggregation period. |
| cpu_time_ms | bigint | The total CPU time, in milliseconds (not exact). It is the sum of the computation time across all CPU cores and roughly reflects the complexity of the query. | The average value of cpu_time_ms for all queries with the same primary aggregation key within the aggregation period. |
| physical_reads | bigint | The number of physical reads, which is the number of times that record batches were read from disk. It roughly reflects the number of memory cache misses. | The average value of physical_reads for all queries with the same primary aggregation key within the aggregation period. |
| pid | integer | The process ID of the query service. | The query service process ID of the first query with the same primary aggregation key within the aggregation period. |
| application_name | text | The type of the application that issued the query. For applications that are not automatically identified, explicitly specify the application_name parameter in the connection configuration. | The application type of the query. |
| engine_type | text[] | The engines used by the query. | The engines used by the first query with the same primary aggregation key within the aggregation period. |
| client_addr | text | The source address of the query. This is the egress IP address of the application and may not be the actual IP address of the application. | The source address of the first query with the same primary aggregation key within the aggregation period. |
| table_write | text | The table to which the SQL statement writes data. | The table to which the first query with the same primary aggregation key writes data within the aggregation period. |
| table_read | text[] | The tables from which the SQL statement reads data. | The tables from which the first query with the same primary aggregation key reads data within the aggregation period. |
| session_id | text | The session ID. | The session ID of the first query with the same primary aggregation key within the aggregation period. |
| session_start | timestamptz | The start time of the session, which is the time when the connection was established. | The session start time of all queries with the same primary aggregation key within the aggregation period. |
| command_id | text | The ID of the command or statement. | The command or statement ID of all queries with the same primary aggregation key within the aggregation period. |
| optimization_cost | integer | The time taken to generate the execution plan. A high value usually indicates a complex SQL statement. | The execution plan generation time for all queries with the same primary aggregation key within the aggregation period. |
| start_query_cost | integer | The query startup time. A high value usually indicates that the query is waiting for locks or resources. | The query startup time for all queries with the same primary aggregation key within the aggregation period. |
| get_next_cost | integer | The query execution time. A high value usually indicates a large amount of computation and a long execution time. You can optimize the SQL statement based on your business requirements. | The query execution time for all queries with the same primary aggregation key within the aggregation period. |
| extended_cost | text | Other detailed timing information for the query, in addition to optimization_cost, start_query_cost, and get_next_cost. | Other detailed costs of the first query with the same primary aggregation key within the aggregation period. |
| plan | text | The execution plan of the query. An execution plan can be up to 102,400 characters in length. Longer plans may be truncated. You can change the length limit by using a GUC parameter. | The execution plan of the first query with the same primary aggregation key within the aggregation period. |
| statistics | text | The execution statistics of the query. Statistics can be up to 102,400 characters in length. Longer statistics may be truncated. You can change the length limit by using a GUC parameter. | The execution statistics of the first query with the same primary aggregation key within the aggregation period. |
| visualization_info | text | The visualization information of the query plan. | The query plan visualization information of the first query with the same primary aggregation key within the aggregation period. |
| query_detail | text | Other extended information about the query, in JSON format. The extended information can be up to 10,240 characters in length. Longer information may be truncated. | Other extended information about the first query with the same primary aggregation key within the aggregation period. |
| query_extinfo | text[] | Other extended information about the query. Note: The AccessKey ID is not recorded for logons with a local account, a Service-Linked Role (SLR), or Security Token Service (STS). If you log on with a temporary account, only the temporary AccessKey ID is recorded. | Other extended information about the first query with the same primary aggregation key within the aggregation period. |
| calls | integer | The value is 1 because detailed records are not aggregated. | The number of queries with the same primary aggregation key within the aggregation period. |
| agg_stats | jsonb | Empty. | Statistical values such as MIN, MAX, and AVG of numeric fields such as duration, memory_bytes, cpu_time_ms, physical_reads, optimization_cost, start_query_cost, and get_next_cost for queries with the same primary aggregation key within the aggregation period. |
| extended_info | jsonb | Other extended information, such as query queue and Serverless Computing information. | Empty. |
The calls and agg_stats fields are added in V3.0.2.
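Starting from V3.0.2, the calls and agg_stats fields let you find hot sub-100 ms query patterns. The following statement is a minimal sketch: the `calls > 1` filter keeps aggregated patterns that actually repeated within an aggregation period (detailed records always have `calls = 1`).

```sql
-- List the most frequently aggregated query patterns in the last hour.
-- agg_stats carries the MIN/MAX/AVG statistics of the pattern as JSONB.
SELECT digest,
       usename,
       datname,
       calls,
       agg_stats
FROM hologres.hg_query_log
WHERE query_start >= now() - interval '1 h'
  AND calls > 1
ORDER BY calls DESC
LIMIT 10;
```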
Grant permissions
You need specific permissions to view slow query logs. The permission rules and authorization methods are as follows:
- View slow query logs for all databases in an instance.
  - Grant the superuser permission to the user.
    A superuser account can view all slow query logs of all databases in an instance. After you grant the superuser permission to a user, the user can view these logs. When you grant the permission, replace "Alibaba Cloud account ID" with the actual username. For a RAM user, replace the Alibaba Cloud account ID with "p4_AccountID". Note that this is the account ID, not the name of the RAM user. You can find the account ID on the account page.

    ```sql
    ALTER USER "Alibaba Cloud account ID" SUPERUSER;
    ```

  - Add the user to the pg_read_all_stats user group.
    Besides superusers, users in the pg_read_all_stats group can also view slow query logs for all databases. A regular user who needs to view all logs can ask a superuser to add them to this group. The authorization statements are as follows:

    ```sql
    GRANT pg_read_all_stats TO "Alibaba Cloud account ID";          -- standard PostgreSQL authorization model
    CALL spm_grant('pg_read_all_stats', 'Alibaba Cloud account ID');  -- simple permission model
    CALL slpm_grant('pg_read_all_stats', 'Alibaba Cloud account ID'); -- schema-level permission model
    ```

- View slow query logs for the current database.
  Enable the simple permission model (SPM) or the schema-level permission model (SLPM) and add the user to the db_admin user group. The db_admin role can view the query logs of the current database.

  ```sql
  CALL spm_grant('<db_name>_admin', 'Alibaba Cloud account ID');   -- simple permission model
  CALL slpm_grant('<db_name>.admin', 'Alibaba Cloud account ID');  -- schema-level permission model
  ```

- Regular users can query only the slow query logs of the queries that they executed in the current database.
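Even without any of the grants above, a regular user can inspect the slow queries that they ran in the current database. The following statement is a minimal sketch; the `current_user` filter is illustrative, because permission checks already limit the visible rows to the user's own queries.

```sql
-- Recent slow queries executed by the current user in this database.
SELECT query_id, status, duration AS duration_ms, query_start, query::char(50)
FROM hologres.hg_query_log
WHERE usename = current_user
  AND query_start >= now() - interval '1 day'
ORDER BY query_start DESC;
```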
View slow queries in HoloWeb
You can use HoloWeb to view slow query logs.
- HoloWeb can display historical slow query logs for up to the previous seven days.
- Only superuser accounts can view this information. Other authorized accounts must use SQL statements instead.

To view slow queries in HoloWeb, perform the following steps:

1. Log on to the HoloWeb console. For more information, see Connect to HoloWeb and run a query.
2. In the top navigation bar, click Diagnostics and Optimization.
3. In the left-side navigation pane, click Historical Slow Query.
4. At the top of the Historical Slow Query page, set the query conditions. For a description of the parameters, see Historical Slow Queries.
5. Click Search. The results are displayed in the Query Trend Analysis and Queries sections.
   - Query Trend Analysis: displays the recent trends of slow and failed queries. You can monitor periods with a high frequency of slow queries to better identify and resolve issues.
   - Queries: displays detailed information about slow and failed queries. For a description of the parameters, see Historical Slow Queries. You can also click Customize Columns to select the columns to display in the query list.
SQL fingerprint
Starting from Hologres V2.2, the hologres.hg_query_log system table, which stores slow query logs, includes the digest column to display SQL fingerprints. For SELECT, INSERT, DELETE, and UPDATE queries, the system calculates an MD5 hash value as the SQL fingerprint. This helps you categorize and analyze queries.
The rules for collecting and calculating SQL fingerprints are as follows:
- By default, fingerprints are collected only for SELECT, INSERT, DELETE, and UPDATE queries.
- For INSERT statements that insert constant values, the SQL fingerprint is not affected by the number of rows inserted.
- When the SQL fingerprint is calculated, letter case in SQL statements is handled in the same way as in Hologres queries.
- The calculation ignores all whitespace characters in the query, such as spaces, line breaks, and tabs. It considers only the structural information of the SQL statement.
- The calculation ignores the specific constant values in the query. For example, `SELECT * FROM t WHERE a > 1;` and `SELECT * FROM t WHERE a > 2;` have the same SQL fingerprint.
- For array constants in a query, the calculation is not affected by the number of elements in the array. For example, `SELECT * FROM t WHERE a IN (1, 2);` and `SELECT * FROM t WHERE a IN (3, 4, 5);` have the same SQL fingerprint.
- The calculation takes the database name into account and automatically completes the schema of each table. It distinguishes tables in different queries by the table name and its properties. For example, `SELECT * FROM t;` and `SELECT * FROM public.t;` have the same SQL fingerprint only if table `t` is in the `public` schema, so that both queries refer to the same table.
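Because the fingerprint normalizes constants and whitespace, you can group log records by digest to find the query patterns that occur most often. The following statement is a minimal sketch; the column choices are illustrative.

```sql
-- Count slow-log entries per SQL fingerprint over the last day and
-- show one sample statement for each pattern.
SELECT digest,
       min(query::char(60)) AS sample_query,
       count(*)             AS executions,
       avg(duration)        AS avg_duration_ms
FROM hologres.hg_query_log
WHERE digest IS NOT NULL
  AND query_start >= now() - interval '1 day'
GROUP BY digest
ORDER BY executions DESC
LIMIT 10;
```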
Query diagnosis
You can query the hologres.hg_query_log table to retrieve slow query logs. The following examples show common SQL statements for diagnosing slow queries:
- Query the total number of queries in query_log (by default, data from the last month).

  ```sql
  SELECT count(*) FROM hologres.hg_query_log;
  ```

  The following result indicates that 44 queries with a duration greater than the specified threshold were executed in the last month:

  ```
   count
  -------
      44
  (1 row)
  ```

- Query the number of slow queries for each user.

  ```sql
  SELECT usename AS "User", count(1) AS "Query Count"
  FROM hologres.hg_query_log
  GROUP BY usename
  ORDER BY count(1) DESC;
  ```

  The following result is returned. The value of count(1) is the number of queries.

  ```
         User        | Query Count
  -------------------+-------------
   1111111111111111  |          27
   2222222222222222  |          11
   3333333333333333  |           4
   4444444444444444  |           2
  (4 rows)
  ```

- Query the details of a specific slow query.

  ```sql
  SELECT * FROM hologres.hg_query_log WHERE query_id = '13001450118416xxxx';
  ```

  The following result is returned. For more information about the fields, see the query_log table described earlier in this topic.

  ```
  usename | status | query_id | datname | command_tag | duration | message | query_start | query_date | query | result_rows | result_bytes | read_rows | read_bytes | affected_rows | affected_bytes | memory_bytes | shuffle_bytes | cpu_time_ms | physical_reads | pid | application_name | engine_type | client_addr | table_write | table_read | session_id | session_start | command_id | optimization_cost | start_query_cost | get_next_cost | extended_cost | plan | statistics | visualization_info | query_detail | query_extinfo
  --------------------+---------+--------------------+---------+-------------+----------+---------+------------------------+------------+----------------------------------------------------------------------------------------+-------------+--------------+-----------+------------+---------------+----------------+--------------+---------------+-------------+----------------+---------+------------------+-------------+---------------+-------------+------------+-----------------+------------------------+------------+-------------------+------------------+---------------+---------------+------+------------+--------------------+--------------+---------------
   p4_11111111111xxxx | SUCCESS | 13001450118416xxxx | dbname  | SELECT      |      149 |         | 2021-03-30 23:45:01+08 | 20210330   | explain analyze SELECT * FROM tablename WHERE user_id = '20210330010xxxx' limit 1000;  |        1000 |       417172 |         0 |          0 |            -1 |             -1 |     26731376 |     476603616 |      321626 |              0 | 1984913 | psql             | {HQE}       | 33.41.xxx.xxx |             |            | 6063475a.1e4991 | 2021-03-30 23:44:26+08 | 0          |                58 |               22 |            67 |               |      |            |                    |              |
  (1 row)
  ```

- Query high-consumption queries from a recent period, such as the last 10 minutes. You can change the time period as needed.

  ```sql
  SELECT status AS "Status",
         duration AS "Duration (ms)",
         query_start AS "Start Time",
         (read_bytes/1048576)::text || ' MB' AS "Data Read",
         (memory_bytes/1048576)::text || ' MB' AS "Memory",
         (shuffle_bytes/1048576)::text || ' MB' AS "Shuffle",
         (cpu_time_ms/1000)::text || ' s' AS "CPU Time",
         physical_reads AS "Disk Reads",
         query_id AS "QueryID",
         query::char(30)
  FROM hologres.hg_query_log
  WHERE query_start >= now() - interval '10 min'
  ORDER BY duration DESC,
           read_bytes DESC,
           shuffle_bytes DESC,
           memory_bytes DESC,
           cpu_time_ms DESC,
           physical_reads DESC
  LIMIT 100;
  ```

  The following result is returned:

  ```
   Status  | Duration (ms) | Start Time             | Data Read | Memory | Shuffle | CPU Time | Disk Reads | QueryID            | query
  ---------+---------------+------------------------+-----------+--------+---------+----------+------------+--------------------+--------------------------------
   SUCCESS |           149 | 2021-03-30 23:45:01+08 | 0 MB      | 25 MB  | 454 MB  | 321 s    |          0 | 13001450118416xxxx | explain analyze SELECT * FROM
   SUCCESS |           137 | 2021-03-30 23:49:18+08 | 247 MB    | 21 MB  | 213 MB  | 803 s    |       7771 | 13001491818416xxxx | explain analyze SELECT * FROM
   FAILED  |            53 | 2021-03-30 23:48:43+08 | 0 MB      | 0 MB   | 0 MB    | 0 s      |          0 | 13001484318416xxxx | SELECT ds::bigint / 0 FROM pub
  (3 rows)
  ```

- View the number of new queries from yesterday.
  - Count the total number of new queries from yesterday.

    ```sql
    SELECT COUNT(1)
    FROM (
      SELECT DISTINCT t1.digest
      FROM hologres.hg_query_log t1
      WHERE t1.query_start >= CURRENT_DATE - INTERVAL '1 day'
        AND t1.query_start < CURRENT_DATE
        AND NOT EXISTS (
          SELECT 1
          FROM hologres.hg_query_log t2
          WHERE t2.digest = t1.digest
            AND t2.query_start < CURRENT_DATE - INTERVAL '1 day')
        AND digest IS NOT NULL
    ) AS a;
    ```

    The following result indicates that a total of 10 new queries were added yesterday:

    ```
     count
    -------
        10
    (1 row)
    ```

  - Group and count the number of new queries from yesterday by query type (command_tag).

    ```sql
    SELECT a.command_tag, COUNT(1)
    FROM (
      SELECT DISTINCT t1.digest, t1.command_tag
      FROM hologres.hg_query_log t1
      WHERE t1.query_start >= CURRENT_DATE - INTERVAL '1 day'
        AND t1.query_start < CURRENT_DATE
        AND NOT EXISTS (
          SELECT 1
          FROM hologres.hg_query_log t2
          WHERE t2.digest = t1.digest
            AND t2.query_start < CURRENT_DATE - INTERVAL '1 day')
        AND t1.digest IS NOT NULL
    ) AS a
    GROUP BY 1
    ORDER BY 2 DESC;
    ```

    The following result indicates that 8 new INSERT queries and 2 new SELECT queries were added yesterday:

    ```
     command_tag | count
    -------------+-------
     INSERT      |     8
     SELECT      |     2
    (2 rows)
    ```

- View the details of new queries from yesterday.

  ```sql
  SELECT a.usename, a.status, a.query_id, a.digest, a.datname,
         a.command_tag, a.query, a.cpu_time_ms, a.memory_bytes
  FROM (
    SELECT DISTINCT t1.usename, t1.status, t1.query_id, t1.digest,
           t1.datname, t1.command_tag, t1.query, t1.cpu_time_ms,
           t1.memory_bytes
    FROM hologres.hg_query_log t1
    WHERE t1.query_start >= CURRENT_DATE - INTERVAL '1 day'
      AND t1.query_start < CURRENT_DATE
      AND NOT EXISTS (
        SELECT 1
        FROM hologres.hg_query_log t2
        WHERE t2.digest = t1.digest
          AND t2.query_start < CURRENT_DATE - INTERVAL '1 day')
      AND t1.digest IS NOT NULL
  ) AS a;
  ```

  The following result is returned:

  ```
   usename          | status  | query_id            | digest                              | datname | command_tag | query                             | cpu_time_ms | memory_bytes
  ------------------+---------+---------------------+-------------------------------------+---------+-------------+-----------------------------------+-------------+--------------
   111111111111xxxx | SUCCESS | 100100425827776xxxx | md58cf93d91c36c6bc9998add971458ba1a | dbname  | INSERT      | INSERT INTO xxx SELECT * FROM xxx |        1748 |    898808596
   111111111111xxxx | SUCCESS | 100100425827965xxxx | md5f7e87e2c9e0b3d9eddcd6b3bc7f04b3b | dbname  | INSERT      | INSERT INTO xxx SELECT * FROM xxx |       59891 |   6819529886
   111111111111xxxx | SUCCESS | 100100425829654xxxx | md55612dc09d2d81074fd5deed1aa3eca9b | dbname  | INSERT      | INSERT INTO xxx SELECT * FROM xxx |           3 |      2100039
   111111111111xxxx | SUCCESS | 100100425829664xxxx | md58d3bf67fbdf2247559bc916586d40011 | dbname  | INSERT      | INSERT INTO xxx SELECT * FROM xxx |       10729 |   2052861937
   111111111111xxxx | SUCCESS | 100100425830099xxxx | md503bd45d6b2d7701c2617d079b4d55a10 | dbname  | INSERT      | INSERT INTO xxx SELECT * FROM xxx |        2196 |    897948034
   111111111111xxxx | SUCCESS | 100100425830186xxxx | md5c62169eaf3ea3a0c59bdc834a8141ac4 | dbname  | INSERT      | INSERT INTO xxx SELECT * FROM xxx |        5268 |   1734305972
   111111111111xxxx | SUCCESS | 100100425830448xxxx | md59aa0c73b24c9c9eba0b34c8fdfc23bb0 | dbname  | INSERT      | INSERT INTO xxx SELECT * FROM xxx |           2 |      2098402
   111111111111xxxx | SUCCESS | 100100425830459xxxx | md57d22c1d37b68984e9472f11a4c9fd04e | dbname  | INSERT      | INSERT INTO xxx SELECT * FROM xxx |         113 |     76201984
   111111111111xxxx | SUCCESS | 100100525468694xxxx | md5ac7d6556fae123e9ea9527d8f1c94b1c | dbname  | SELECT      | SELECT * FROM xxx limit 200       |           6 |      1048576
   111111111111xxxx | SUCCESS | 100101025538840xxxx | md547d09cdad4d5b5da74abaf08cba79ca0 | dbname  | SELECT      | SELECT * FROM xxx limit 200       |          \N |           \N
  (10 rows)
  ```

- View the trend of new queries from yesterday, displayed by hour.

  ```sql
  SELECT to_char(a.query_start, 'HH24') AS query_start_hour,
         a.command_tag,
         COUNT(1)
  FROM (
    SELECT DISTINCT t1.query_start, t1.digest, t1.command_tag
    FROM hologres.hg_query_log t1
    WHERE t1.query_start >= CURRENT_DATE - INTERVAL '1 day'
      AND t1.query_start < CURRENT_DATE
      AND NOT EXISTS (
        SELECT 1
        FROM hologres.hg_query_log t2
        WHERE t2.digest = t1.digest
          AND t2.query_start < CURRENT_DATE - INTERVAL '1 day')
      AND t1.digest IS NOT NULL
  ) AS a
  GROUP BY 1, 2
  ORDER BY 3 DESC;
  ```

  The following result indicates that 8 new INSERT queries were added at 21:00 yesterday, and 1 new SELECT query each was added at 11:00 and 13:00:

  ```
   query_start_hour | command_tag | count
  ------------------+-------------+-------
   21               | INSERT      |     8
   11               | SELECT      |     1
   13               | SELECT      |     1
  (3 rows)
  ```

- Count slow queries by fingerprint and query type.

  ```sql
  SELECT digest, command_tag, count(1)
  FROM hologres.hg_query_log
  WHERE query_start >= CURRENT_DATE - INTERVAL '1 day'
    AND query_start < CURRENT_DATE
  GROUP BY 1, 2
  ORDER BY 3 DESC;
  ```

- Query the top 10 queries with the highest average CPU time in the last day.

  ```sql
  SELECT digest, avg(cpu_time_ms)
  FROM hologres.hg_query_log
  WHERE query_start >= CURRENT_DATE - INTERVAL '1 day'
    AND query_start < CURRENT_DATE
    AND digest IS NOT NULL
    AND usename != 'system'
    AND cpu_time_ms IS NOT NULL
  GROUP BY 1
  ORDER BY 2 DESC
  LIMIT 10;
  ```

- View the top 10 queries with the highest average memory consumption (memory_bytes) in the last week.

  ```sql
  SELECT digest, avg(memory_bytes)
  FROM hologres.hg_query_log
  WHERE query_start >= CURRENT_DATE - INTERVAL '7 day'
    AND query_start < CURRENT_DATE
    AND digest IS NOT NULL
    AND memory_bytes IS NOT NULL
  GROUP BY 1
  ORDER BY 2 DESC
  LIMIT 10;
  ```

- View the query volume and total data read per hour for the last 3 hours to check for changes in data volume.

  ```sql
  SELECT date_trunc('hour', query_start) AS query_start,
         count(1) AS query_count,
         sum(read_bytes) AS read_bytes,
         sum(cpu_time_ms) AS cpu_time_ms
  FROM hologres.hg_query_log
  WHERE query_start >= now() - interval '3 h'
  GROUP BY 1;
  ```

- Compare the data access volume of the past 3 hours with the same period yesterday to check for changes in data access volume.

  ```sql
  SELECT query_date,
         count(1) AS query_count,
         sum(read_bytes) AS read_bytes,
         sum(cpu_time_ms) AS cpu_time_ms
  FROM hologres.hg_query_log
  WHERE query_start >= now() - interval '3 h'
  GROUP BY query_date
  UNION ALL
  SELECT query_date,
         count(1) AS query_count,
         sum(read_bytes) AS read_bytes,
         sum(cpu_time_ms) AS cpu_time_ms
  FROM hologres.hg_query_log
  WHERE query_start >= now() - interval '1d 3h'
    AND query_start <= now() - interval '1d'
  GROUP BY query_date;
  ```

- Query queries with a high duration in each stage from a recent period, such as the last 10 minutes. You can change the time period as needed.

  ```sql
  SELECT status AS "Status",
         duration AS "Duration (ms)",
         optimization_cost AS "Optimization Cost (ms)",
         start_query_cost AS "Startup Cost (ms)",
         get_next_cost AS "Execution Cost (ms)",
         duration - optimization_cost - start_query_cost - get_next_cost AS "Other Cost (ms)",
         query_id AS "QueryID",
         query::char(30)
  FROM hologres.hg_query_log
  WHERE query_start >= now() - interval '10 min'
  ORDER BY duration DESC,
           start_query_cost DESC,
           optimization_cost DESC,
           get_next_cost DESC,
           duration - optimization_cost - start_query_cost - get_next_cost DESC
  LIMIT 100;
  ```

  The following result is returned:

  ```
   Status  | Duration (ms) | Optimization Cost (ms) | Startup Cost (ms) | Execution Cost (ms) | Other Cost (ms) | QueryID            | query
  ---------+---------------+------------------------+-------------------+---------------------+-----------------+--------------------+--------------------------------
   SUCCESS |          4572 |                    521 |               320 |                3726 |               5 | 6000260625679xxxx  | -- /* user: wang ip: xxx.xx.x
   SUCCESS |          1490 |                    538 |                98 |                 846 |               8 | 12000250867886xxxx | -- /* user: lisa ip: xxx.xx.x
   SUCCESS |          1230 |                    502 |                95 |                 625 |               8 | 26000512070295xxxx | -- /* user: zhang ip: xxx.xx.
  (3 rows)
  ```

- Query the earliest failed queries in a specific time range.

  ```sql
  SELECT status AS "Status",
         regexp_replace(message, '\n', ' ')::char(150) AS "Error Message",
         duration AS "Duration (ms)",
         query_start AS "Start Time",
         query_id AS "QueryID",
         query::char(100) AS "Query"
  FROM hologres.hg_query_log
  WHERE query_start BETWEEN '2021-03-25 17:00:00'::timestamptz
        AND '2021-03-25 17:42:00'::timestamptz + interval '2 min'
    AND status = 'FAILED'
  ORDER BY query_start ASC
  LIMIT 100;
  ```

  The following result is returned:

  ```
   Status | Error Message                                                                                                                                           | Duration (ms) | Start Time             | QueryID           | Query
  --------+---------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+------------------------+-------------------+-------
   FAILED | Query:[1070285448673xxxx] code: kActorInvokeError msg: "[holo_query_executor.cc:330 operator()] HGERR_code XX000 HGERR_msge internal error: status { c |          1460 | 2021-03-25 17:28:54+08 | 1070285448673xxxx | S...
   FAILED | Query:[1016285560553xxxx] code: kActorInvokeError msg: "[holo_query_executor.cc:330 operator()] HGERR_code XX000 HGERR_msge internal error: status { c |           131 | 2021-03-25 17:28:55+08 | 1016285560553xxxx | S...
  (2 rows)
  ```
Modify the lifecycle of slow query logs
Starting from Hologres V3.0.27, you can modify the lifecycle of slow query logs at the database level by running the following SQL statement:

```sql
ALTER DATABASE <db_name> SET hg_query_log_retention_time_sec = 2592000;
```

Take note of the following items:

- The parameter is in seconds. You can set the lifecycle to a value from 3 to 30 days, which corresponds to a range of 259,200 to 2,592,000 seconds.
- The modified lifecycle applies only to new logs. Existing logs still follow the previous configuration.
- The modified lifecycle applies only to new connections.
- After you set the lifecycle with this parameter, expired logs are cleaned up immediately, not asynchronously.
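For example, to shorten the retention period to the 3-day minimum (3 × 86,400 = 259,200 seconds) for a hypothetical database named my_db:

```sql
ALTER DATABASE my_db SET hg_query_log_retention_time_sec = 259200; -- 3 days
```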
Export slow query logs
Hologres lets you use INSERT statements to export data from slow query logs (hg_query_log) to your custom internal tables or to foreign tables such as MaxCompute or OSS.
- Notes

  To export data from slow query logs correctly and efficiently, note the following:

  - The account that runs `INSERT INTO ... SELECT ... FROM hologres.hg_query_log;` must have the required access permissions on the `hologres.hg_query_log` table. For more information, see Grant permissions. To export slow query logs for the entire instance, the account that runs the `INSERT` command must have superuser or pg_read_all_stats permissions. Otherwise, the data queried from the `hologres.hg_query_log` table is incomplete, and so is the exported data.
  - `query_start` is an index column of the slow query log table. To export data for a specific time range, include the `query_start` column in your filter conditions. This improves performance and reduces resource consumption.
  - When you use `query_start` to filter by time range, do not wrap `query_start` in expressions, because that prevents the index from being used. For example, instead of `WHERE to_char(query_start, 'yyyymmdd') = '20220101';`, write `WHERE query_start >= to_date('20220101', 'yyyymmdd') AND query_start < to_date('20220102', 'yyyymmdd');`.
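When export jobs are generated programmatically (for example, one export per day), the index-friendly filter described above can be built with a small helper. The following Python sketch is an illustration only (the helper name is hypothetical, not a Hologres API); it keeps `query_start` unwrapped in each comparison so the index on the column can be used:

```python
from datetime import date, timedelta


def query_start_range(day: date) -> str:
    """Build an index-friendly, half-open time-range predicate on query_start.

    query_start is compared directly (no function is applied to the column),
    so the index on query_start can still be hit.
    """
    next_day = day + timedelta(days=1)
    return (f"query_start >= '{day.isoformat()}' "
            f"AND query_start < '{next_day.isoformat()}'")


# Example: predicate covering all of 2022-01-01.
print(query_start_range(date(2022, 1, 1)))
# query_start >= '2022-01-01' AND query_start < '2022-01-02'
```

Using a half-open range (`>=` start, `<` next day) also avoids off-by-one issues at day boundaries, since `query_start` is a timestamp rather than a date.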
- Example 1: Export slow query logs to a Hologres internal table.

  Run the following statements in Hologres to export slow query logs to the `query_log_download` internal table:

  ```sql
  -- Hologres SQL
  CREATE TABLE query_log_download (
      usename text,
      status text,
      query_id text,
      datname text,
      command_tag text,
      duration integer,
      message text,
      query_start timestamp with time zone,
      query_date text,
      query text,
      result_rows bigint,
      result_bytes bigint,
      read_rows bigint,
      read_bytes bigint,
      affected_rows bigint,
      affected_bytes bigint,
      memory_bytes bigint,
      shuffle_bytes bigint,
      cpu_time_ms bigint,
      physical_reads bigint,
      pid integer,
      application_name text,
      engine_type text[],
      client_addr text,
      table_write text,
      table_read text[],
      session_id text,
      session_start timestamp with time zone,
      trans_id text,
      command_id text,
      optimization_cost integer,
      start_query_cost integer,
      get_next_cost integer,
      extended_cost text,
      plan text,
      statistics text,
      visualization_info text,
      query_detail text,
      query_extinfo text[]
  );

  INSERT INTO query_log_download
  SELECT usename, status, query_id, datname, command_tag, duration, message,
         query_start, query_date, query, result_rows, result_bytes, read_rows,
         read_bytes, affected_rows, affected_bytes, memory_bytes, shuffle_bytes,
         cpu_time_ms, physical_reads, pid, application_name, engine_type,
         client_addr, table_write, table_read, session_id, session_start,
         trans_id, command_id, optimization_cost, start_query_cost,
         get_next_cost, extended_cost, plan, statistics, visualization_info,
         query_detail, query_extinfo
  FROM hologres.hg_query_log
  WHERE query_start >= '2022-08-03'
    AND query_start < '2022-08-04';
  ```
- Example 2: Export slow query logs to a MaxCompute foreign table.

  1. Run the following statements in MaxCompute to create a table to receive the data:

     ```sql
     CREATE TABLE IF NOT EXISTS mc_holo_query_log (
         username           STRING COMMENT 'The username for the query'
         ,status            STRING COMMENT 'The final status of the query: success or failed'
         ,query_id          STRING COMMENT 'The query ID'
         ,datname           STRING COMMENT 'The name of the database for the query'
         ,command_tag       STRING COMMENT 'The type of query'
         ,duration          BIGINT COMMENT 'The query duration in milliseconds (ms)'
         ,message           STRING COMMENT 'The error message'
         ,query             STRING COMMENT 'The text content of the query'
         ,read_rows         BIGINT COMMENT 'The number of rows read by the query'
         ,read_bytes        BIGINT COMMENT 'The number of bytes read by the query'
         ,memory_bytes      BIGINT COMMENT 'The peak memory consumption on a single node (not exact)'
         ,shuffle_bytes     BIGINT COMMENT 'The estimated number of bytes for data shuffle (not exact)'
         ,cpu_time_ms       BIGINT COMMENT 'The total CPU time in milliseconds (not exact)'
         ,physical_reads    BIGINT COMMENT 'The number of physical reads'
         ,application_name  STRING COMMENT 'The query application type'
         ,engine_type       ARRAY<STRING> COMMENT 'The engine used for the query'
         ,table_write       STRING COMMENT 'The table to which the SQL statement writes data'
         ,table_read        ARRAY<STRING> COMMENT 'The table from which the SQL statement reads data'
         ,plan              STRING COMMENT 'The execution plan for the query'
         ,optimization_cost BIGINT COMMENT 'The time to generate the query execution plan'
         ,start_query_cost  BIGINT COMMENT 'The query startup time'
         ,get_next_cost     BIGINT COMMENT 'The query execution duration'
         ,extended_cost     STRING COMMENT 'Other detailed costs of the query'
         ,query_detail      STRING COMMENT 'Other extended information about the query (stored in JSON format)'
         ,query_extinfo     ARRAY<STRING> COMMENT 'Other extended information about the query (stored in ARRAY format)'
         ,query_start       STRING COMMENT 'The query start time'
         ,query_date        STRING COMMENT 'The query start date'
     )
     COMMENT 'hologres instance query log daily'
     PARTITIONED BY (ds STRING COMMENT 'stat date')
     LIFECYCLE 365;

     ALTER TABLE mc_holo_query_log ADD PARTITION (ds=20220803);
     ```
  2. Run the following statements in Hologres to export slow query logs to the MaxCompute foreign table:

     ```sql
     IMPORT FOREIGN SCHEMA project_name LIMIT TO (mc_holo_query_log)
         FROM SERVER odps_server INTO public;

     INSERT INTO mc_holo_query_log
     SELECT usename AS username, status, query_id, datname, command_tag,
            duration, message, query, read_rows, read_bytes, memory_bytes,
            shuffle_bytes, cpu_time_ms, physical_reads, application_name,
            engine_type, table_write, table_read, plan, optimization_cost,
            start_query_cost, get_next_cost, extended_cost, query_detail,
            query_extinfo, query_start, query_date, '20220803'
     FROM hologres.hg_query_log
     WHERE query_start >= '2022-08-03'
       AND query_start < '2022-08-04';
     ```
Configuration items
Hologres provides the following configuration items to help you record specific queries.
log_min_duration_statement
- Description:

  This item specifies the minimum duration for a query to be displayed as a slow query. By default, the system collects queries that run longer than 100 ms, but only queries that run longer than 1 s are displayed in query results. You can use this item to change the display threshold. Note:

  - Changes to this item apply only to new queries. Previously collected queries are still displayed based on the original default value.
  - Only a superuser can change this setting at the database level.
  - Set it to `-1` to stop logging any queries. If you set it to a positive value, the minimum effective value is 100 ms.
- Example:

  Set the threshold so that all SQL statements that run for 250 ms or longer are logged and can be queried:

  ```sql
  -- Set at the database level (requires superuser)
  ALTER DATABASE dbname SET log_min_duration_statement = '250ms';
  -- Set at the current session level (regular users can run this)
  SET log_min_duration_statement = '250ms';
  ```
log_min_duration_query_stats
- Description:

  This item specifies the minimum duration for recording the execution statistics of a query. By default, the system records statistics for queries that run longer than 10 s. Note:

  - Changes to this item apply only to new queries. Previously recorded queries are displayed based on the original default value.
  - Set it to `-1` to stop recording query statistics.
  - Execution statistics require a large amount of storage, and recording too many of them can slow down the analysis of slow query logs. Set the threshold below 10 s only when troubleshooting specific issues; otherwise, do not lower it.
- Example:

  Set the system to record statistics for queries that run longer than 20 s:

  ```sql
  -- Set at the database level (requires superuser)
  ALTER DATABASE dbname SET log_min_duration_query_stats = '20s';
  -- Set at the current session level (regular users can run this)
  SET log_min_duration_query_stats = '20s';
  ```
log_min_duration_query_plan
- Description:

  This item specifies the minimum duration for recording the execution plan of a query. By default, the execution plan is recorded for slow queries that run 10 s or longer. Note:

  - Changes to this item apply only to new queries. Previously recorded queries are displayed based on the original default value.
  - If a statement runs for at least the configured duration, its execution plan is recorded in the slow query log.
  - In general, you can obtain the execution plan instantly by running `EXPLAIN`, so you usually do not need to record it separately.
  - Set it to `-1` to stop recording query execution plans.
- Example:

  Set the system to record the execution plan for queries that run longer than 10 s:

  ```sql
  -- Set at the database level (requires superuser)
  ALTER DATABASE dbname SET log_min_duration_query_plan = '10s';
  -- Set at the current session level (regular users can run this)
  SET log_min_duration_query_plan = '10s';
  ```
FAQ
- Symptom:

  When you view slow query logs in Hologres V1.1, information such as the number of rows read and the number of rows returned is not displayed.
- Cause:

  Slow query log collection is incomplete.
- Solution:

  In Hologres V1.1.36 to V1.1.49, you can use the following GUC parameter to display the complete information. This information is displayed by default in Hologres V1.1.49 and later.

  Note: If your Hologres instance version is earlier than V1.1.36, see Common upgrade preparation errors or join the Hologres DingTalk group for feedback. For more information, see How do I get more online support?

  ```sql
  -- (Recommended) Set at the database level. Set this once for each database.
  ALTER DATABASE <db_name> SET hg_experimental_force_sync_collect_execution_statistics = ON;
  -- Set at the session level
  SET hg_experimental_force_sync_collect_execution_statistics = ON;
  ```

  Replace db_name with the name of the database.
References
To diagnose and manage queries in your instance, see Manage queries.