All Products
Search
Document Center

Hologres:Query and analyze table statistics

Last Updated:Oct 21, 2024

Hologres V1.3 and later collect statistics on tables in your Hologres instance on a daily basis and store the statistics in the system table named hologres.hg_table_info. This way, you can query and analyze the statistics on your tables and take optimization measures based on the results of statistical analysis. This topic describes how to query and analyze table statistics in Hologres.

Limits

  • Hologres V1.3 and later allow you to query table statistics. If the version of your Hologres instance is earlier than V1.3, manually upgrade your Hologres instance in the Hologres console or join a Hologres DingTalk group to apply for an instance upgrade. For more information about how to manually upgrade a Hologres instance, see Upgrade instances. For more information about how to join a Hologres DingTalk group, see Obtain online support for Hologres.

  • The hologres.hg_table_info table is updated with a delay of one day. Data of the current day may be updated before 05:00 on the next day. On the day when your Hologres instance is upgraded from V1.1 to V1.3, table statistics are not generated. If you query the table statistics on this day, the following error message is returned: meta warehouse store currently not available. In this case, you need to query table statistics the next day after the upgrade is complete.

Usage notes

  • By default, table statistics of the previous 30 days are retained.

  • For a non-partitioned internal table in Hologres that is specified by the type='TABLE' configuration, you can query the detailed statistics on the table, including the storage space, number of files, accumulated number of times that the table was accessed, and number of rows.

  • For other objects in Hologres, such as views, materialized views, foreign tables, and parent tables, you can query only the basic information. For example, you can query the number of partitions of a parent table, the name of the table to which a foreign table maps, and the definition of a view or a materialized view.

  • The hologres.hg_table_info table belongs to the meta warehouse system of Hologres. Failure to query the hologres.hg_table_info table does not affect queries in instances. Therefore, the stability of the hologres.hg_table_info table is not specified in the service level agreement (SLA) of Hologres.

hologres.hg_table_info table

The following table describes the fields that are contained in the hologres.hg_table_info table.

Note
  • Table statistics are stored in the system table named hologres.hg_table_info. If the version of your Hologres instance is V1.3 or later, table statistics are collected on a daily basis by default.

  • Specific fields may be empty. If a table was created before the version of the relevant Hologres instance was upgraded to V1.3 or later, the statistics about the creation of the table failed to be collected. This type of statistics can be collected for tables that are created after the version of the relevant Hologres instance is upgraded to V1.3 or later.

Field

Data type

Description

Remarks

db_name

text

The name of the database to which the table belongs.

None.

schema_name

text

The name of the schema to which the table belongs.

None.

table_name

text

The name of the table.

None.

table_id

text

The unique ID of the table. If the table is a foreign table, the unique ID is in the Database name.Schema name.Table name format.

None.

type

text

The type of the table. Valid values:

  • TABLE

  • PARTITION TABLE

  • FOREIGN TABLE

  • VIEW

  • MATERIALIZED VIEW

  • If the type field is set to VIEW, the create_time and last_ddl_time fields are empty.

  • If the type field is set to VIEW, FOREIGN TABLE, or PARTITION TABLE, the last_modify_time, last_access_time, hot_file_count, cold_file_count, total_read_count, and total_write_count fields are empty.

partition_spec

text

The partitioning condition. This field applies to partitioned child tables.

None.

is_partition

boolean

Indicates whether the table is a partitioned child table.

None.

owner_name

text

The username of the table owner. This field corresponds to the usename field of the hologres.hg_query_log table. You can join the two tables based on the owner_name and usename fields.

None.

create_time

timestamp with time zone

The time when the table was created.

None.

last_ddl_time

timestamp with time zone

The time when the table statistics were last updated.

None.

last_modify_time

timestamp with time zone

The time when the table data was last modified.

None.

last_access_time

timestamp with time zone

The time when the table was last accessed.

None.

view_def

text

The definition of the view.

This field applies only to views.

comment

text

The description of the table or view.

None.

hot_storage_size

bigint

The storage space that is occupied by the hot data of the table. Unit: bytes.

It is normal if the storage space in the hologres.hg_table_info table differs from the storage space returned by the pg_relation_size function. This is because statistics in the hologres.hg_table_info table are collected on a daily basis and the query result of the pg_relation_size function excludes the storage space of binary logs.

cold_storage_size

bigint

The storage space that is occupied by the cold data of the table. Unit: bytes.

It is normal if the storage space in the hologres.hg_table_info table differs from the storage space returned by the pg_relation_size function. This is because statistics in the hologres.hg_table_info table are collected on a daily basis and the query result of the pg_relation_size function excludes the storage space of binary logs.

hot_file_count

bigint

The number of hot data files of the table.

None.

cold_file_count

bigint

The number of cold data files of the table.

None.

table_meta

jsonb

The metadata of the table. The value is in the JSONB format.

None.

row_count

bigint

The number of rows in the table or the partition.

For a partitioned parent table, the value of this field indicates the total number of rows in all partitioned child tables.

collect_time

timestamp with time zone

The time when the current statistics were collected.

None.

partition_count

bigint

The number of partitioned child tables.

This field applies only to partitioned parent tables.

parent_schema_name

text

The name of the schema to which the partitioned parent table belongs. The partitioned child table is associated with the partitioned parent table.

This field applies only to partitioned child tables.

parent_table_name

text

The name of the partitioned parent table with which the partitioned child table is associated.

This field applies only to partitioned child tables.

total_read_count

bigint

The accumulated number of times that data was read from the table. The number can be affected by SELECT, INSERT, UPDATE, and DELETE operations and does not indicate an accurate value.

We recommend that you do not use the value of this field.

total_write_count

bigint

The accumulated number of times that data was written to the table. The number can be affected by INSERT, UPDATE, and DELETE operations and does not indicate an accurate value.

We recommend that you do not use the value of this field.

read_sql_count_1d

bigint

The total number of times that data was read from the table from 00:00 to 24:00 on the T-1 day (UTC+8).

  • This field is supported in Hologres V3.0 and later.

  • For a partitioned table, if a partitioned child table is hit based on the SQL statement, this field indicates the value only for the partitioned child table but not the partitioned parent table.

write_sql_count_1d

bigint

The total number of times that data was written to the table from 00:00 to 24:00 on the T-1 day (UTC+8).

  • This field is supported in Hologres V3.0 and later.

  • For a partitioned table, if a partitioned child table is hit based on the SQL statement, this field indicates the value only for the partitioned child table but not the partitioned parent table.

Grant query permissions to a user

Only an authorized user can query table statistics. The following section describes how to grant query permissions to a user in different scenarios.

  • Query the table statistics of all databases in a Hologres instance.

    • Assign the superuser role to a user.

      After you assign the superuser role to a user, the user can view the table statistics of all databases in the Hologres instance.

      -- Replace Alibaba Cloud account ID with the Alibaba Cloud account ID of the user. For a RAM user, add the p4_. prefix to the account ID. 
      ALTER USER "Alibaba Cloud account ID" SUPERUSER;
    • Add users to the pg_stat_scan_table user group.

      Superusers and users in the pg_stat_scan_tables (for versions earlier than Hologres V1.3.44) or pg_read_all_stats (for Hologres V1.3.44 and later) user group have permissions to query the table statistics of all databases in the relevant Hologres instance. If you are a superuser, you can add regular users to the user group. Syntax:

      -- For versions earlier than Hologres V1.3.44
      GRANT pg_stat_scan_tables TO "Alibaba Cloud account ID";-- Use the standard PostgreSQL authorization model to grant relevant permissions to the user.
      CALL spm_grant('pg_stat_scan_tables', 'Alibaba Cloud account ID'); -- Use the simple permission model (SPM) to grant relevant permissions to the user.
      CALL slpm_grant('pg_stat_scan_tables', 'Alibaba Cloud account ID'); -- Use the schema-level permission model (SLPM) to grant relevant permissions to the user.
      
      -- For Hologres V1.3.44 and later
      GRANT pg_read_all_stats TO "Alibaba Cloud account ID"; -- Use the standard PostgreSQL authorization model to grant relevant permissions to the user.
      CALL spm_grant('pg_read_all_stats', 'Alibaba Cloud account ID');  -- Use the SPM to grant relevant permissions to the user.
      CALL slpm_grant('pg_read_all_stats', 'Alibaba Cloud account ID'); -- Use the SLPM to grant relevant permissions to the user.
  • Query the table statistics of a database.

    Users in the db_admin user group have the permissions to query the table statistics of the relevant database. If you are a superuser, you can add regular users to this user group by using the SPM or SLPM.

    Note

    A regular user can query statistics only on the tables owned by the user in the relevant database.

    CALL spm_grant('<db_name>_admin', 'Account ID'); -- Use the SPM to grant related permissions to the user.
    CALL slpm_grant('<db_name>.admin', 'Account ID'); -- Use the SLPM to grant related permissions to the user.

Execute SQL statements to query the trends of table statistics

Scenario 1: Query the access trends of Hologres internal tables

-- Query the trends of statistics on all internal tables in an instance. The statistics include the occupied storage space, file quantity, accumulated numbers of times that data was read from tables, accumulated numbers of times that data was written to tables, and number of row records.
SELECT
  db_name,
  schema_name,
  table_name,
  collect_time :: date AS collect_date,
  hot_storage_size,
  cold_storage_size,
  hot_file_count,
  cold_file_count,
  read_sql_count_1d,
  write_sql_count_1d,
  row_count
FROM
  hologres.hg_table_info
WHERE
  collect_time > (current_date-interval '1 week')::timestamptz -- Specify the previous week.
  AND type ='TABLE'
  ORDER  BY  collect_date desc ;

Scenario 2: Query the access information of tables that occupy large disk space

-- Query the access information of the top 10 tables that occupy large disk space.
SELECT
  db_name,
  schema_name,
  table_name,
  hot_storage_size + cold_storage_size AS total_storage_size,
  row_count,
  sum(read_sql_count_1d) AS total_read_count,
  sum(write_sql_count_1d) AS total_write_count
FROM
  hologres.hg_table_info
WHERE
  collect_time > (current_date-interval '1 week')::timestamptz -- Specify the previous week.
  AND type = 'TABLE'
  AND (
    cold_storage_size IS NOT NULL
    OR hot_storage_size IS NOT NULL
  )
GROUP BY db_name,schema_name,table_name,total_storage_size,row_count
ORDER BY total_storage_size DESC
LIMIT 10;

Scenario 3: Query the access trends and data size change trends of the top 10 tables that occupy large disk space

-- Query the previous-week access trends, storage space trends, and data size change trends of the top 10 tables that occupy large disk space on the previous day.
WITH top10_table AS (SELECT
  db_name,
  schema_name,
  table_name,
  hot_storage_size + cold_storage_size AS total_storage_size
FROM
  hologres.hg_table_info
WHERE
  collect_time >= (current_date - interval '1 day')::timestamptz -- Specify the previous day.
  AND collect_time < current_date
  AND type = 'TABLE'
  AND ( cold_storage_size IS NOT NULL OR hot_storage_size IS NOT NULL )
ORDER BY total_storage_size DESC
LIMIT 10
)
SELECT
  base.db_name,
  base.schema_name,
  base.table_name,
  base.hot_storage_size + cold_storage_size AS total_storage_size,
  base.row_count,
  base.read_sql_count_1d,
  base.write_sql_count_1d,
  base.collect_time :: date AS collect_date
FROM
  hologres.hg_table_info AS base
LEFT JOIN
  top10_table
ON
  base.db_name = top10_table.db_name
  AND base.schema_name = top10_table.schema_name
  AND base.table_name = top10_table.table_name
WHERE
  collect_time > (current_date-interval '1 week')::timestamptz -- Specify the previous week.
  AND type = 'TABLE'
  AND ( cold_storage_size IS NOT NULL OR hot_storage_size IS NOT NULL )
ORDER BY total_storage_size DESC , collect_date DESC;

Scenario 4: Query the access trends and data size change trends of tables that occupy small disk space

-- Query the previous-week access trends, storage space trends, and data size change trends of the top 10 tables that occupy small disk space on the previous day.
WITH top10_table AS (SELECT
  db_name,
  schema_name,
  table_name,
  hot_storage_size + cold_storage_size AS total_storage_size
FROM
  hologres.hg_table_info
WHERE
  collect_time >= (current_date - interval '1 day')::timestamptz -- Specify the previous day.
  AND collect_time < current_date
  AND type = 'TABLE'
ORDER BY total_storage_size ASC LIMIT 10
)
SELECT
  base.db_name,
  base.schema_name,
  base.table_name,
  base.hot_storage_size + cold_storage_size AS total_storage_size,
  base.row_count,
  base.read_sql_count_1d,
  base.write_sql_count_1d,
  base.collect_time :: date AS collect_date
FROM
  hologres.hg_table_info AS base
RIGHT JOIN
  top10_table
ON
  base.db_name = top10_table.db_name
  AND base.schema_name = top10_table.schema_name
  AND base.table_name = top10_table.table_name
WHERE
  collect_time > (current_date-interval '1 week')::timestamptz -- Specify the previous week.
  AND type = 'TABLE'
ORDER BY total_storage_size ASC  , collect_date DESC ;

Scenario 5: Query tables that occupy large disk space due to an excessive number of small files

-- Query the number of files for each table and the disk space occupied by the table, and sort the tables based on the average file size.
-- The table_group parameter displays only the shard count of the current database. The parameter is left empty for other databases.
SELECT
  db_name,
  schema_name,
  table_name,
  cold_storage_size + hot_storage_size AS total_storage_size,
  cold_file_count + hot_file_count AS total_file_count,
  (cold_storage_size + hot_storage_size) / (cold_file_count + hot_file_count) AS avg_file_size,
  tmp_table_info.table_meta ->> 'table_group' AS table_group,
  tg_info.shard_count
FROM
  hologres.hg_table_info tmp_table_info
  LEFT JOIN (
    SELECT
      tablegroup_name,
      property_value AS shard_count
    FROM
      hologres.hg_table_group_properties
    WHERE
      property_key = 'shard_count'
  ) tg_info ON tmp_table_info.table_meta ->> 'table_group' = tg_info.tablegroup_name
WHERE
  collect_time > (current_date - interval '1 day')::timestamptz
  AND type = 'TABLE'
  AND (
    cold_storage_size IS NOT NULL
    OR hot_storage_size IS NOT NULL
  )
  AND (
    cold_file_count IS NOT NULL
    OR hot_file_count IS NOT NULL
  )
  AND cold_file_count + hot_file_count <> 0
ORDER BY avg_file_size;

Scenario 6: Query the number of rows that changes on the day when the table was last modified

-- Query the time when the table was last modified, and the change in the number of rows compared with the modification before the last modification.
-- If the instance contains a large number of tables, we recommend that you filter the tmp_table_info table to prevent an excessively long query time due to an excessive amount of data.
WITH tmp_table_info AS (
  SELECT
    db_name,
    schema_name,
    table_name,
    row_count,
    collect_time,
    last_modify_time
  FROM
    hologres.hg_table_info
  WHERE
    last_modify_time IS NOT NULL
    AND type = 'TABLE'
    -- Add filter conditions to limit the size of the tmp_table_info table.
    -- Example 1: collect_time > (current_date - interval '14 day'):: timestamptz
    -- Example 2: table_name like ''
    -- Example 3: type = 'PARTITION'
)
SELECT
  end_data.db_name AS db_name,
  end_data.schema_name AS schema_name,
  end_data.table_name AS table_name,
  (end_data.row_count - start_data.row_count) AS modify_row_count,
  end_data.row_count AS current_rows,
  end_data.last_modify_time AS last_modify_time
FROM
  (
    SELECT
      db_name,
      schema_name,
      table_name,
      row_count,
      last_modify_time
    FROM
      tmp_table_info
    WHERE
      collect_time > (current_date-interval '1 day')::timestamptz -- Query the last modification time of the table that was recorded on the previous day.
  ) end_data
  LEFT JOIN (
    SELECT
      db_name,
      schema_name,
      table_name,
      row_count,
      collect_time
    FROM
      tmp_table_info
  ) start_data ON (
    end_data.db_name = start_data.db_name
    AND end_data.schema_name = start_data.schema_name
    AND end_data.table_name = start_data.table_name
    AND end_data.last_modify_time::date = (start_data.collect_time + interval '1 day')::date
  );