Hologres V1.3 and later collect statistics on tables in your Hologres instance on a daily basis and store the statistics in the system table named hologres.hg_table_info. This way, you can query and analyze the statistics on your tables and take optimization measures based on the results of statistical analysis. This topic describes how to query and analyze table statistics in Hologres.
Limits
Hologres V1.3 and later allow you to query table statistics. If the version of your Hologres instance is earlier than V1.3, manually upgrade your Hologres instance in the Hologres console or join a Hologres DingTalk group to apply for an instance upgrade. For more information about how to manually upgrade a Hologres instance, see Upgrade instances. For more information about how to join a Hologres DingTalk group, see Obtain online support for Hologres.
The hologres.hg_table_info table is updated with a delay of one day. Data of the current day may be updated before 05:00 on the next day. On the day when your Hologres instance is upgraded from V1.1 to V1.3, table statistics are not generated. If you query the table statistics on this day, the following error message is returned:
meta warehouse store currently not available
. In this case, you need to query table statistics the next day after the upgrade is complete.
Usage notes
By default, table statistics of the previous 30 days are retained.
For a non-partitioned internal table in Hologres that is specified by the type='TABLE' configuration, you can query the detailed statistics on the table, including the storage space, number of files, accumulated number of times that the table was accessed, and number of rows.
For other objects in Hologres, such as views, materialized views, foreign tables, and parent tables, you can query only the basic information. For example, you can query the number of partitions of a parent table, the name of the table to which a foreign table maps, and the definition of a view or a materialized view.
The hologres.hg_table_info table belongs to the meta warehouse system of Hologres. Failure to query the hologres.hg_table_info table does not affect queries in instances. Therefore, the stability of the hologres.hg_table_info table is not specified in the service level agreement (SLA) of Hologres.
hologres.hg_table_info table
The following table describes the fields that are contained in the hologres.hg_table_info
table.
Table statistics are stored in the system table named hologres.hg_table_info. If the version of your Hologres instance is V1.3 or later, table statistics are collected on a daily basis by default.
Specific fields may be empty. If a table was created before the version of the relevant Hologres instance was upgraded to V1.3 or later, the statistics about the creation of the table failed to be collected. This type of statistics can be collected for tables that are created after the version of the relevant Hologres instance is upgraded to V1.3 or later.
Field | Data type | Description | Remarks |
db_name | text | The name of the database to which the table belongs. | None. |
schema_name | text | The name of the schema to which the table belongs. | None. |
table_name | text | The name of the table. | None. |
table_id | text | The unique ID of the table. If the table is a foreign table, the unique ID is in the Database name.Schema name.Table name format. | None. |
type | text | The type of the table. Valid values:
|
|
partition_spec | text | The partitioning condition. This field applies to partitioned child tables. | None. |
is_partition | boolean | Indicates whether the table is a partitioned child table. | None. |
owner_name | text | The username of the table owner. This field corresponds to the usename field of the hologres.hg_query_log table. You can join the two tables based on the owner_name and usename fields. | None. |
create_time | timestamp with time zone | The time when the table was created. | None. |
last_ddl_time | timestamp with time zone | The time when the table statistics were last updated. | None. |
last_modify_time | timestamp with time zone | The time when the table data was last modified. | None. |
last_access_time | timestamp with time zone | The time when the table was last accessed. | None. |
view_def | text | The definition of the view. | This field applies only to views. |
comment | text | The description of the table or view. | None. |
hot_storage_size | bigint | The storage space that is occupied by the hot data of the table. Unit: bytes. | It is normal if the storage space in the hologres.hg_table_info table differs from the storage space returned by the pg_relation_size function. This is because statistics in the hologres.hg_table_info table are collected on a daily basis and the query result of the pg_relation_size function excludes the storage space of binary logs. |
cold_storage_size | bigint | The storage space that is occupied by the cold data of the table. Unit: bytes. | It is normal if the storage space in the hologres.hg_table_info table differs from the storage space returned by the pg_relation_size function. This is because statistics in the hologres.hg_table_info table are collected on a daily basis and the query result of the pg_relation_size function excludes the storage space of binary logs. |
hot_file_count | bigint | The number of hot data files of the table. | None. |
cold_file_count | bigint | The number of cold data files of the table. | None. |
table_meta | jsonb | The metadata of the table. The value is in the JSONB format. | None. |
row_count | bigint | The number of rows in the table or the partition. | For a partitioned parent table, the value of this field indicates the total number of rows in all partitioned child tables. |
collect_time | timestamp with time zone | The time when the current statistics were collected. | None. |
partition_count | bigint | The number of partitioned child tables. | This field applies only to partitioned parent tables. |
parent_schema_name | text | The name of the schema to which the partitioned parent table belongs. The partitioned child table is associated with the partitioned parent table. | This field applies only to partitioned child tables. |
parent_table_name | text | The name of the partitioned parent table with which the partitioned child table is associated. | This field applies only to partitioned child tables. |
total_read_count | bigint | The accumulated number of times that data was read from the table. The number can be affected by SELECT, INSERT, UPDATE, and DELETE operations and does not indicate an accurate value. | We recommend that you do not use the value of this field. |
total_write_count | bigint | The accumulated number of times that data was written to the table. The number can be affected by INSERT, UPDATE, and DELETE operations and does not indicate an accurate value. | We recommend that you do not use the value of this field. |
read_sql_count_1d | bigint | The total number of times that data was read from the table from 00:00 to 24:00 on the T-1 day (UTC+8). |
|
write_sql_count_1d | bigint | The total number of times that data was written to the table from 00:00 to 24:00 on the T-1 day (UTC+8). |
|
Grant query permissions to a user
Only an authorized user can query table statistics. The following section describes how to grant query permissions to a user in different scenarios.
Query the table statistics of all databases in a Hologres instance.
Assign the superuser role to a user.
After you assign the superuser role to a user, the user can view the table statistics of all databases in the Hologres instance.
-- Replace Alibaba Cloud account ID with the Alibaba Cloud account ID of the user. For a RAM user, add the p4_. prefix to the account ID. ALTER USER "Alibaba Cloud account ID" SUPERUSER;
Add users to the
pg_stat_scan_table
user group.Superusers and users in the
pg_stat_scan_tables
(for versions earlier than Hologres V1.3.44) orpg_read_all_stats
(for Hologres V1.3.44 and later) user group have permissions to query the table statistics of all databases in the relevant Hologres instance. If you are a superuser, you can add regular users to the user group. Syntax:-- For versions earlier than Hologres V1.3.44 GRANT pg_stat_scan_tables TO "Alibaba Cloud account ID";-- Use the standard PostgreSQL authorization model to grant relevant permissions to the user. CALL spm_grant('pg_stat_scan_tables', 'Alibaba Cloud account ID'); -- Use the simple permission model (SPM) to grant relevant permissions to the user. CALL slpm_grant('pg_stat_scan_tables', 'Alibaba Cloud account ID'); -- Use the schema-level permission model (SLPM) to grant relevant permissions to the user. -- For Hologres V1.3.44 and later GRANT pg_read_all_stats TO "Alibaba Cloud account ID"; -- Use the standard PostgreSQL authorization model to grant relevant permissions to the user. CALL spm_grant('pg_read_all_stats', 'Alibaba Cloud account ID'); -- Use the SPM to grant relevant permissions to the user. CALL slpm_grant('pg_read_all_stats', 'Alibaba Cloud account ID'); -- Use the SLPM to grant relevant permissions to the user.
Query the table statistics of a database.
Users in the
db_admin
user group have the permissions to query the table statistics of the relevant database. If you are a superuser, you can add regular users to this user group by using the SPM or SLPM.NoteA regular user can query statistics only on the tables owned by the user in the relevant database.
CALL spm_grant('<db_name>_admin', 'Account ID'); -- Use the SPM to grant related permissions to the user. CALL slpm_grant('<db_name>.admin', 'Account ID'); -- Use the SLPM to grant related permissions to the user.
Execute SQL statements to query the trends of table statistics
Scenario 1: Query the access trends of Hologres internal tables
-- Query the trends of statistics on all internal tables in an instance. The statistics include the occupied storage space, file quantity, accumulated numbers of times that data was read from tables, accumulated numbers of times that data was written to tables, and number of row records.
SELECT
db_name,
schema_name,
table_name,
collect_time :: date AS collect_date,
hot_storage_size,
cold_storage_size,
hot_file_count,
cold_file_count,
read_sql_count_1d,
write_sql_count_1d,
row_count
FROM
hologres.hg_table_info
WHERE
collect_time > (current_date-interval '1 week')::timestamptz -- Specify the previous week.
AND type ='TABLE'
ORDER BY collect_date desc ;
Scenario 2: Query the access information of tables that occupy large disk space
-- Query the access information of the top 10 tables that occupy large disk space.
SELECT
db_name,
schema_name,
table_name,
hot_storage_size + cold_storage_size AS total_storage_size,
row_count,
sum(read_sql_count_1d) AS total_read_count,
sum(write_sql_count_1d) AS total_write_count
FROM
hologres.hg_table_info
WHERE
collect_time > (current_date-interval '1 week')::timestamptz -- Specify the previous week.
AND type = 'TABLE'
AND (
cold_storage_size IS NOT NULL
OR hot_storage_size IS NOT NULL
)
GROUP BY db_name,schema_name,table_name,total_storage_size,row_count
ORDER BY total_storage_size DESC
LIMIT 10;
Scenario 3: Query the access trends and data size change trends of the top 10 tables that occupy large disk space
-- Query the previous-week access trends, storage space trends, and data size change trends of the top 10 tables that occupy large disk space on the previous day.
WITH top10_table AS (SELECT
db_name,
schema_name,
table_name,
hot_storage_size + cold_storage_size AS total_storage_size
FROM
hologres.hg_table_info
WHERE
collect_time >= (current_date - interval '1 day')::timestamptz -- Specify the previous day.
AND collect_time < current_date
AND type = 'TABLE'
AND ( cold_storage_size IS NOT NULL OR hot_storage_size IS NOT NULL )
ORDER BY total_storage_size DESC
LIMIT 10
)
SELECT
base.db_name,
base.schema_name,
base.table_name,
base.hot_storage_size + cold_storage_size AS total_storage_size,
base.row_count,
base.read_sql_count_1d,
base.write_sql_count_1d,
base.collect_time :: date AS collect_date
FROM
hologres.hg_table_info AS base
LEFT JOIN
top10_table
ON
base.db_name = top10_table.db_name
AND base.schema_name = top10_table.schema_name
AND base.table_name = top10_table.table_name
WHERE
collect_time > (current_date-interval '1 week')::timestamptz -- Specify the previous week.
AND type = 'TABLE'
AND ( cold_storage_size IS NOT NULL OR hot_storage_size IS NOT NULL )
ORDER BY total_storage_size DESC , collect_date DESC;
Scenario 4: Query the access trends and data size change trends of tables that occupy small disk space
-- Query the previous-week access trends, storage space trends, and data size change trends of the top 10 tables that occupy small disk space on the previous day.
WITH top10_table AS (SELECT
db_name,
schema_name,
table_name,
hot_storage_size + cold_storage_size AS total_storage_size
FROM
hologres.hg_table_info
WHERE
collect_time >= (current_date - interval '1 day')::timestamptz -- Specify the previous day.
AND collect_time < current_date
AND type = 'TABLE'
ORDER BY total_storage_size ASC LIMIT 10
)
SELECT
base.db_name,
base.schema_name,
base.table_name,
base.hot_storage_size + cold_storage_size AS total_storage_size,
base.row_count,
base.read_sql_count_1d,
base.write_sql_count_1d,
base.collect_time :: date AS collect_date
FROM
hologres.hg_table_info AS base
RIGHT JOIN
top10_table
ON
base.db_name = top10_table.db_name
AND base.schema_name = top10_table.schema_name
AND base.table_name = top10_table.table_name
WHERE
collect_time > (current_date-interval '1 week')::timestamptz -- Specify the previous week.
AND type = 'TABLE'
ORDER BY total_storage_size ASC , collect_date DESC ;
Scenario 5: Query tables that occupy large disk space due to an excessive number of small files
-- Query the number of files for each table and the disk space occupied by the table, and sort the tables based on the average file size.
-- The table_group parameter displays only the shard count of the current database. The parameter is left empty for other databases.
SELECT
db_name,
schema_name,
table_name,
cold_storage_size + hot_storage_size AS total_storage_size,
cold_file_count + hot_file_count AS total_file_count,
(cold_storage_size + hot_storage_size) / (cold_file_count + hot_file_count) AS avg_file_size,
tmp_table_info.table_meta ->> 'table_group' AS table_group,
tg_info.shard_count
FROM
hologres.hg_table_info tmp_table_info
LEFT JOIN (
SELECT
tablegroup_name,
property_value AS shard_count
FROM
hologres.hg_table_group_properties
WHERE
property_key = 'shard_count'
) tg_info ON tmp_table_info.table_meta ->> 'table_group' = tg_info.tablegroup_name
WHERE
collect_time > (current_date - interval '1 day')::timestamptz
AND type = 'TABLE'
AND (
cold_storage_size IS NOT NULL
OR hot_storage_size IS NOT NULL
)
AND (
cold_file_count IS NOT NULL
OR hot_file_count IS NOT NULL
)
AND cold_file_count + hot_file_count <> 0
ORDER BY avg_file_size;
Scenario 6: Query the number of rows that changes on the day when the table was last modified
-- Query the time when the table was last modified, and the change in the number of rows compared with the modification before the last modification.
-- If the instance contains a large number of tables, we recommend that you filter the tmp_table_info table to prevent an excessively long query time due to an excessive amount of data.
WITH tmp_table_info AS (
SELECT
db_name,
schema_name,
table_name,
row_count,
collect_time,
last_modify_time
FROM
hologres.hg_table_info
WHERE
last_modify_time IS NOT NULL
AND type = 'TABLE'
-- Add filter conditions to limit the size of the tmp_table_info table.
-- Example 1: collect_time > (current_date - interval '14 day'):: timestamptz
-- Example 2: table_name like ''
-- Example 3: type = 'PARTITION'
)
SELECT
end_data.db_name AS db_name,
end_data.schema_name AS schema_name,
end_data.table_name AS table_name,
(end_data.row_count - start_data.row_count) AS modify_row_count,
end_data.row_count AS current_rows,
end_data.last_modify_time AS last_modify_time
FROM
(
SELECT
db_name,
schema_name,
table_name,
row_count,
last_modify_time
FROM
tmp_table_info
WHERE
collect_time > (current_date-interval '1 day')::timestamptz -- Query the last modification time of the table that was recorded on the previous day.
) end_data
LEFT JOIN (
SELECT
db_name,
schema_name,
table_name,
row_count,
collect_time
FROM
tmp_table_info
) start_data ON (
end_data.db_name = start_data.db_name
AND end_data.schema_name = start_data.schema_name
AND end_data.table_name = start_data.table_name
AND end_data.last_modify_time::date = (start_data.collect_time + interval '1 day')::date
);