Hologres: Hologres Monitoring Metrics in the Console

Last Updated: Feb 27, 2026

Hologres exposes monitoring metrics through the console and Cloud Monitor so you can track resource usage, query execution, and system health in real time.

Metrics overview

| Category | Metric | Description | Supported instance types | Notes |
| --- | --- | --- | --- | --- |
| CPU | Instance CPU Usage (%) | CPU usage of the instance. | General-purpose, follower, compute group | -- |
| CPU | Worker Node CPU Usage (%) | CPU usage of each Worker node. | General-purpose, follower, compute group | V1.1+ |
| CPU | Cluster CPU Usage (%) | CPU usage of each Cluster in the compute group. | Compute group | V4.0+ |
| Memory | Instance Memory Usage (%) | Total memory usage of the instance. | General-purpose, follower, compute group | -- |
| Memory | Worker Node Memory Usage (%) | Memory usage of each Worker node. | General-purpose, follower, compute group | V1.1+ |
| Memory | Detailed Compute Group Memory Usage (%) | Memory usage broken down by System, Cache, Meta, Query, Background, and Memtable. | General-purpose, follower, compute group | V2.0+ |
| Memory | QE Query Memory Usage (bytes) | Memory used by QE engine queries. | General-purpose, follower, compute group | V2.0.44+ / V2.1.22+ |
| Memory | QE Query Memory Usage (%) | Percentage of memory used by QE engine queries. | General-purpose, follower, compute group | V2.0.44+ / V2.1.22+ |
| Memory | Cluster Memory Usage (%) | Memory usage of each Cluster in the compute group. | Compute group | V4.0+ |
| Query QPS and RPS | Query QPS (count/s) | Total queries per second. Query QPS >= QE QPS + FixedQE QPS. | General-purpose, follower, compute group, shared cluster | -- |
| Query QPS and RPS | QE Query QPS (count/s) | Queries per second executed by the QE engine. | General-purpose, follower, compute group | V2.2+ |
| Query QPS and RPS | FixedQE Query QPS (count/s) | Queries per second executed by the FixedQE engine. | General-purpose, follower, compute group | V2.2+ |
| Query QPS and RPS | DML RPS (count/s) | Total rows per second for DML operations. DML RPS = QE RPS + FixedQE RPS. | General-purpose, compute group | -- |
| Query QPS and RPS | QE DML RPS (count/s) | DML rows per second by the QE engine. | General-purpose, compute group | V2.2+ |
| Query QPS and RPS | FixedQE DML RPS (count/s) | DML rows per second by the FixedQE engine. | General-purpose, compute group | V2.2+ |
| Query Latency | Query Latency (milliseconds) | Average latency of all queries. Query Latency >= MAX(QE Latency, FixedQE Latency). | General-purpose, follower, compute group, shared cluster | -- |
| Query Latency | QE Query Latency (milliseconds) | Average latency of QE engine queries. | General-purpose, follower, compute group | V2.2+ |
| Query Latency | FixedQE Query Latency (milliseconds) | Average latency of FixedQE engine queries. | General-purpose, follower, compute group | V2.2+ |
| Query Latency | Optimization Phase Duration (milliseconds) | Time spent in the query optimization phase. | General-purpose, follower, compute group, shared cluster | V2.0.44+ / V2.1.22+ |
| Query Latency | Start Query Phase Duration (milliseconds) | Time spent in query initialization (locking, schema alignment). | General-purpose, follower, compute group, shared cluster | V2.0.44+ / V2.1.22+ |
| Query Latency | Get Next Phase Duration (milliseconds) | Time from initialization to result delivery. | General-purpose, follower, compute group, shared cluster | V2.0.44+ / V2.1.22+ |
| Query Latency | Query P99 Latency (milliseconds) | 99th percentile latency of all queries. | General-purpose, follower, compute group, shared cluster | -- |
| Query Latency | Longest Running Query Duration in This Instance (milliseconds) | Duration of the longest-running active query. | General-purpose, follower, compute group, shared cluster | V1.1+ |
| Failed Query QPS | Failed Query QPS (count/s) | Total failed queries per second. Failed QPS >= QE Failed QPS + FixedQE Failed QPS. | General-purpose, follower, compute group, shared cluster | -- |
| Failed Query QPS | QE Failed Query QPS (count/s) | Failed queries per second by the QE engine. | General-purpose, follower, compute group | V2.2+ |
| Failed Query QPS | FixedQE Failed Query QPS (count/s) | Failed queries per second by the FixedQE engine. | General-purpose, compute group | V2.2+ |
| Locks | Maximum FE Lock Wait Time (milliseconds) | DDL lock wait time on FE nodes. | General-purpose, follower, compute group | V2.0.44+ / V2.1.22+ |
| Locks | FixedQE Backend Lock Wait Time (milliseconds) | Lock wait time for FixedQE (typically HQE locks). | General-purpose, follower, compute group | V2.0.44+ / V2.1.22+ |
| Locks | Total Backend Lock Wait Time for Instance (milliseconds) | Total HQE lock wait time, including FixedQE lock waits. | General-purpose, follower, compute group | V2.0.44+ / V2.1.22+ |
| Connection | Total Connections (count) | Total active connections in the instance. | General-purpose, follower, compute group, shared cluster | -- |
| Connection | Connections by Database (count) | Connections aggregated by database. | General-purpose, follower, compute group | -- |
| Connection | Connections by FE (count) | Connections aggregated by FE node. | General-purpose, follower, compute group | -- |
| Connection | Connection Usage Rate of FE with Highest Usage (%) | Peak connection usage rate across all FE nodes. | General-purpose, follower, compute group | -- |
| Query Queue | Queued Queries Count (count) | Queries waiting to be executed. | General-purpose, follower, compute group | V3.0+ |
| Query Queue | Query Queue Entry QPS (count/s) | Queries submitted to the queue per second. | General-purpose, follower, compute group | V3.0+ |
| Query Queue | Queries Transitioned from Queued to Running QPS (count/s) | Queries moving from waiting to running per second. | General-purpose, follower, compute group | V3.0+ |
| Query Queue | QPS by State for Queries That Started Running (count/s) | Per-second count of queries grouped by execution state. | General-purpose, follower, compute group | V3.0+ |
| Query Queue | Average Query Queue Wait Time (milliseconds) | Average time from queue entry to processing start. | General-purpose, follower, compute group | V3.0+ |
| Query Queue | Query Queue Auto-Rate-Limit Max Concurrency (count) | Maximum concurrency for auto-rate-limited queues. | Compute group | V3.1+ |
| I/O | Standard I/O Read Throughput (bytes/s) | Read throughput for Standard storage. | General-purpose, follower, compute group | -- |
| I/O | Standard I/O Write Throughput (bytes/s) | Write throughput for Standard storage. | General-purpose, compute group | -- |
| I/O | Low-Frequency IO Read Throughput (bytes/s) | Read throughput for IA storage. | General-purpose, follower, compute group | -- |
| I/O | Write throughput for low-frequency I/O (bytes/s) | Write throughput for IA storage. | General-purpose, compute group | -- |
| Storage | Standard Storage Used Capacity (bytes) | Capacity used in Standard storage. | General-purpose, compute group | -- |
| Storage | Standard Storage Usage (%) | Usage percentage of Standard storage. | General-purpose, compute group | -- |
| Storage | IA Storage Used Capacity (bytes) | Capacity used in IA storage. | General-purpose, compute group | -- |
| Storage | IA Storage Usage (%) | Usage percentage of IA storage. | General-purpose, compute group | -- |
| Storage | Recycle Bin Storage Usage (bytes) | Storage consumed by the recycle bin. | General-purpose, compute group | V3.1+ |
| Framework | FE Replay Delay (milliseconds) | Replay delay for each FE node. | General-purpose, follower, compute group | V2.2+ |
| Framework | Shard Multi-Replica Sync Delay (milliseconds) | Sync delay between Shard replicas. | General-purpose, follower, compute group | -- |
| Framework | Primary-Follower Sync Delay (milliseconds) | Data sync delay from primary to follower instance. | General-purpose, follower, compute group | -- |
| Framework | Cross-Instance File Sync Delay (milliseconds) | File sync delay between disaster recovery instances. | General-purpose | -- |
| Auto Analyze | Tables Missing Statistics per Database (count) | Tables lacking statistics in each database. | General-purpose, compute group | V2.2+ |
| Serverless Computing | Longest Running Serverless Computing Query Duration (milliseconds) | Longest-running Serverless Computing query. | General-purpose, compute group | V2.1+ |
| Serverless Computing | Serverless Computing Query Queue Count (count) | Queries queued in the Serverless Computing pool. | General-purpose, compute group | V2.2+ |
| Serverless Computing | Serverless Computing Resource Quota Usage (%) | Ratio of used to maximum allocatable Serverless Computing resources. | General-purpose, compute group | V2.2+ |
| Binary Logging | Binlog Consumption Rate (count/s) | Binlog entries consumed per second. | General-purpose, follower, compute group | V2.2+ |
| Binary Logging | Binlog Consumption Rate (bytes/s) | Bytes consumed from Binlog per second. | General-purpose, follower, compute group | V2.2+ |
| Binary Logging | WAL Sender Count per FE (count) | WAL senders used per FE node. | General-purpose, follower, compute group | V2.2+ |
| Binary Logging | WAL Sender Usage Rate of FE with Highest Usage (%) | Peak WAL sender usage across FE nodes. | General-purpose, follower, compute group | V2.2+ |
| Computing Resource | Elastic Core Count for Compute Groups | Cores added by time-based scaling. | Compute group | V2.2.21+ |
| Computing Resource | Compute Group Auto-Elastic Core Count (count) | Cores added by auto-scaling. | Compute group | V4.0+ |
| Gateway | Gateway CPU Usage (%) | CPU usage of each Gateway. | Compute group | V2.0+ |
| Gateway | Gateway Memory Usage (%) | Memory usage of each Gateway. | Compute group | V2.0+ |
| Gateway | Gateway New Connection Requests per Second (count/s) | New connections established per second. | Compute group | V2.1.12+ |
| Gateway | Gateway Inbound Traffic Rate (B/s) | Data entering through the Gateway per second. | Compute group | V2.1+ |
| Gateway | Gateway Outbound Traffic Rate (B/s) | Data sent from the Gateway per second. | Compute group | V2.1+ |
| Dynamic Table | Instance-Level Dynamic Table Refresh Failure QPS (count/s) | Refresh failure rate across all Dynamic Tables. | General-purpose, compute group | V4.0.8+ |
| Dynamic Table | Dynamic Table Data Latency (seconds) | Latency relative to the latest upstream data. | General-purpose, compute group | V4.0.8+ |
| Dynamic Table | Dynamic Table Current Refresh Duration (milliseconds) | Duration of the ongoing refresh task. | General-purpose, compute group | V4.0.8+ |
| Dynamic Table | Dynamic Table Refresh Failure QPM (count/minute) | Refresh failures per minute per Dynamic Table. | General-purpose, compute group | V4.0.8+ |

Cloud Monitor metric IDs

Each metric has a unique ID in Cloud Monitor. The ID prefix varies by instance type:

| Instance type | Prefix | Metric reference |
| --- | --- | --- |
| General-purpose instance | standard_ | General-purpose instance metrics |
| Follower instance | follower_ | Follower instance metrics |
| Compute group instance | warehouse_ | Compute group instance metrics |
| Lakehouse Acceleration (Shared Cluster) | shared_ | Shared cluster metrics |
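
As a rough illustration of how the prefixes in the table combine with a metric name, the following Python sketch builds an ID by simple concatenation. The instance-type keys, the `metric_id` helper, and the `cpu_usage` suffix are assumptions made for this example; the authoritative IDs are the ones listed in the metric reference for each instance type.

```python
# Prefixes come from the table above. The dictionary keys are labels
# invented for this sketch, not identifiers used by Cloud Monitor.
METRIC_ID_PREFIX = {
    "general-purpose": "standard_",
    "follower": "follower_",
    "compute-group": "warehouse_",
    "shared-cluster": "shared_",
}


def metric_id(instance_type: str, metric_suffix: str) -> str:
    """Join the instance-type prefix and a metric suffix (assumed scheme)."""
    try:
        prefix = METRIC_ID_PREFIX[instance_type]
    except KeyError:
        raise ValueError(f"unknown instance type: {instance_type!r}") from None
    return prefix + metric_suffix


# "cpu_usage" is a made-up suffix used only to show the concatenation.
print(metric_id("compute-group", "cpu_usage"))  # prints warehouse_cpu_usage
```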

Engine categories and command types

Engine categories in monitoring metrics:

  • QE is a collective term for Hologres proprietary vector compute engines (HQE, SQE) under the XQE engine family. In slow query logs, queries with Engine Type={XQE} map to the QE category.

  • FixedQE refers to queries that use the Fixed Plan path. In slow query logs, queries with Engine Type={FixedQE} (or SDK in versions earlier than V2.2) map to the FixedQE category.

Command Type classification:

  • Command Type matches the SQL statement type. For example, both INSERT xxx and INSERT xxx ON CONFLICT DO UPDATE/NOTHING are classified as INSERT.

  • UNKNOWN: SQL statements that the DPI engine cannot recognize due to syntax errors.

  • UTILITY: Administrative, definition, and control commands other than INSERT, UPDATE, DELETE, and SELECT, including:

    • DDL: CREATE, ALTER, DROP, TRUNCATE, COMMENT

    • TCL: BEGIN, COMMIT, ROLLBACK, SAVEPOINT

    • Administration and maintenance: ANALYZE, VACUUM, EXPLAIN, SET, SHOW, COPY, REFRESH

    • Execution and procedural control: PREPARE, EXECUTE, DEALLOCATE, CALL, DECLARE CURSOR

    • Others: LOCK TABLE, LISTEN, NOTIFY
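
The classification rules above can be approximated with a first-keyword lookup. This is only a sketch: the `command_type` function is hypothetical, it inspects the leading keyword alone, and statements such as WITH ... SELECT would need real parsing, so the classification that Hologres itself applies may differ.

```python
# Statement types that keep their own Command Type.
DML_AND_QUERY = {"INSERT", "UPDATE", "DELETE", "SELECT"}

# Leading keywords of the UTILITY commands listed above.
UTILITY_KEYWORDS = {
    "CREATE", "ALTER", "DROP", "TRUNCATE", "COMMENT",                # DDL
    "BEGIN", "COMMIT", "ROLLBACK", "SAVEPOINT",                      # TCL
    "ANALYZE", "VACUUM", "EXPLAIN", "SET", "SHOW", "COPY", "REFRESH",
    "PREPARE", "EXECUTE", "DEALLOCATE", "CALL", "DECLARE",
    "LOCK", "LISTEN", "NOTIFY",
}


def command_type(sql: str) -> str:
    """Classify a statement by its first keyword (simplified assumption)."""
    words = sql.strip().split(None, 1)
    if not words:
        return "UNKNOWN"
    keyword = words[0].upper().rstrip(";")
    if keyword in DML_AND_QUERY:
        return keyword
    if keyword in UTILITY_KEYWORDS:
        return "UTILITY"
    return "UNKNOWN"


print(command_type("INSERT INTO t VALUES (1) ON CONFLICT DO NOTHING"))  # prints INSERT
print(command_type("truncate t"))                                       # prints UTILITY
print(command_type("SELEC 1"))                                          # prints UNKNOWN
```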

Access control

The Hologres console monitoring page retrieves data from Cloud Monitor. Resource Access Management (RAM) users need one of the following permissions to view monitoring information:

| Permission policy | Access level |
| --- | --- |
| AliyunCloudMonitorFullAccess | Full management permissions for Cloud Monitor |
| AliyunCloudMonitorReadOnlyAccess | Read-only access to Cloud Monitor |

For details on granting permissions, see Grant permissions to RAM users.

General notes

  • If a metric shows no data, the instance version may not support it, or there has been no activity for an extended period.

  • Monitoring data is retained for up to 30 days.

  • Metrics are reported every minute.

CPU

Instance CPU Usage (%)

The overall CPU load of the instance.

Background processes and asynchronous compaction tasks consume CPU even without active queries, so some usage during idle periods is normal. Hologres uses multi-core parallel computing, which means a single query can push CPU usage to 100% -- this indicates full utilization of compute resources, not necessarily an issue.

When to investigate: If CPU usage remains near 100% for 3 hours or above 90% for 12 hours, the instance is heavily loaded and CPU is likely the bottleneck. Consider whether:

  • Large offline data imports (INSERT) are running with growing data volumes.

  • High-QPS queries or writes are consuming all CPU resources.

  • Hybrid workloads combine the above scenarios.

If sustained high CPU is expected for your business, scale up the instance to handle larger workloads.

For more information, see FAQ for monitoring metrics.
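
The "when to investigate" thresholds for Instance CPU Usage can be expressed as a check over per-minute samples, since metrics are reported every minute. The sketch below is illustrative only: the function names are hypothetical, and treating "near 100%" as 99% or higher is an assumption, not a documented cutoff.

```python
def sustained_at_or_above(samples, threshold, minutes):
    """True if the most recent `minutes` samples all meet the threshold."""
    if len(samples) < minutes:
        return False
    return all(value >= threshold for value in samples[-minutes:])


def cpu_needs_investigation(samples, near_full=99.0):
    """Apply the two rules: near 100% for 3 hours, or above 90% for 12 hours.

    `samples` holds one CPU usage percentage per minute, oldest first.
    `near_full` is an assumed stand-in for "near 100%".
    """
    return (
        sustained_at_or_above(samples, near_full, 3 * 60)
        or sustained_at_or_above(samples, 90.0, 12 * 60)
    )


print(cpu_needs_investigation([99.5] * 180))  # prints True
print(cpu_needs_investigation([95.0] * 180))  # prints False
print(cpu_needs_investigation([95.0] * 720))  # prints True
```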

Worker Node CPU Usage (%)

The CPU load on each Worker node. The number of Worker nodes varies by instance type. For more information, see Instance management.

Version: V1.1+

  • If all Worker nodes show sustained CPU usage near 100%, the instance is heavily loaded. Optimize queries or scale up the instance.

  • If only some Worker nodes show high CPU usage, a resource skew exists. For common causes and troubleshooting, see FAQ for monitoring metrics.
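
The distinction between an overloaded instance and resource skew can be sketched as a simple classification over per-Worker readings. The `classify_worker_load` helper, the Worker names, and the 95% cutoff for "near 100%" are assumptions chosen for illustration.

```python
def classify_worker_load(usage_by_worker, high=95.0):
    """Return "normal", "overloaded" (all Workers high), or "skewed" (some high).

    `usage_by_worker` maps a Worker name to its latest usage percentage.
    `high` is an assumed cutoff for "near 100%".
    """
    hot = [name for name, usage in usage_by_worker.items() if usage >= high]
    if not hot:
        return "normal"
    if len(hot) == len(usage_by_worker):
        return "overloaded"
    return "skewed"


print(classify_worker_load({"worker-0": 98.0, "worker-1": 97.5}))  # prints overloaded
print(classify_worker_load({"worker-0": 98.0, "worker-1": 35.0}))  # prints skewed
print(classify_worker_load({"worker-0": 40.0, "worker-1": 35.0}))  # prints normal
```

The same check could be applied to Worker Node Memory Usage by passing a lower cutoff, for example `high=80.0`.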

Cluster CPU Usage (%)

The CPU usage of each Cluster in the compute group.

Version: V4.0+. Compute group instances only.

Memory

Instance Memory Usage (%)

The overall memory consumption of the instance.

Hologres reserves memory for metadata, indexes, and data caches to accelerate queries. Idle memory usage of 30% to 40% is typical. If memory usage steadily climbs toward 80%, memory may become a bottleneck.

Use memory distribution metrics together with QPS and other indicators to identify high-memory consumers. For more information, see Troubleshooting guide for out-of-memory issues.

Worker Node Memory Usage (%)

The memory load on each Worker node. The number of Worker nodes varies by instance type. For more information, see Instance management.

Version: V1.1+

  • If all Worker nodes show sustained memory usage near 80%, the instance is heavily loaded. Optimize queries or scale up the instance.

  • If only some Worker nodes show high memory usage, a resource skew exists. For common causes and troubleshooting, see FAQ for monitoring metrics.

Detailed Compute Group Memory Usage (%)

Version: V2.0+ (memory distribution metrics available from V2.0.15)

Hologres divides memory into six categories:

| Category | What it tracks | Typical behavior |
| --- | --- | --- |
| System | Holohub, Gateway, and FE (FE Master + FE Query) | Fluctuates with query activity |
| Cache | SQL caches (result cache, block cache) and Meta cache (schema/file metadata) | Fixed size, typically ~30% of total instance memory. Some usage persists when idle (mainly Meta cache). Higher cache hit rates improve query performance -- smaller Physical read bytes values in EXPLAIN ANALYZE indicate better hit rates. |
| Meta | Metadata and files. Uses lazy open mode -- frequently accessed metadata stays in memory, infrequently accessed metadata does not. | Keep under 30% of total memory. High Meta usage suggests many files or partitioned tables. Use Table statistics overview and analysis to investigate. |
| Query | Memory consumed during SQL execution, including Fixed Plan, HQE, and SQE. | Elastic allocation: minimum 20 GB per Worker, maximum depends on available free memory. High usage in other categories reduces Query memory. |
| Background | Compaction and flush tasks. | Typically under 5%. Temporarily increases during index changes, bulk writes, or updates. |
| Memtable | In-memory tables for real-time writes, updates, and deletes. | Typically under 5%. |

Troubleshooting: High Query memory usage or out-of-memory (OOM) events typically indicate complex queries or high concurrency. For optimization guidance, see Optimize query performance.

QE Query Memory Usage (bytes)

The memory (in bytes) used by queries executed by HQE, SQE, or other XQE engines.

Version: V2.0.44+ / V2.1.22+

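In memory breakdowns, Query memory usage is greater than or equal to QE Query memory usage, because the Query category covers all engine types (including Fixed Plan) while this metric covers only the QE engines. Higher QE Query memory usage indicates more complex queries that require more memory.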

QE Query Memory Usage (%)

The percentage of memory used by QE engine queries.

Version: V2.0.44+ / V2.1.22+

High usage may lead to OOM errors. Optimize queries or scale up the instance.

Cluster Memory Usage (%)

The memory usage of each Cluster in the compute group.

Version: V4.0+. Compute group instances only.

Query QPS and RPS

Query QPS (count/s)

The average number of SQL statements executed per second across the instance, including SELECT, INSERT, UPDATE, DELETE, UTILITY, and UNKNOWN statements.

Relationship: Query QPS >= QE Query QPS + FixedQE Query QPS

The total QPS includes all queries (such as UNKNOWN, UTILITY, and Engine Type={PG}), so it is greater than or equal to the sum of QE and FixedQE QPS.
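
Given the relationship above, the QPS that belongs to neither QE nor FixedQE (for example UNKNOWN, UTILITY, and PG-engine statements) can be estimated by subtraction. The `other_qps` helper is hypothetical, and clamping the result at zero is an assumption to absorb small misalignments between separately sampled series.

```python
def other_qps(total_qps, qe_qps, fixedqe_qps):
    """Estimate QPS not attributed to the QE or FixedQE engines."""
    remainder = total_qps - (qe_qps + fixedqe_qps)
    # Assumed safeguard: the three series may not be sampled at the same instant.
    return max(remainder, 0.0)


print(other_qps(120.0, 80.0, 30.0))  # prints 10.0
```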

QE Query QPS (count/s)

Queries per second executed by the QE engine, including SELECT, INSERT, UPDATE, and DELETE statements.

Version: V2.2+

FixedQE Query QPS (count/s)

Queries per second executed by the FixedQE engine (Fixed Plan path, formerly SDK), including SELECT, INSERT, UPDATE, and DELETE statements.

Version: V2.2+

DML RPS (count/s)

The average number of data records imported or updated per second, including INSERT, UPDATE, and DELETE statements.

Relationship: DML RPS = QE DML RPS + FixedQE DML RPS

QE DML RPS (count/s)

Data records imported or updated per second by the QE engine, including INSERT, UPDATE, and DELETE statements.

Version: V2.2+

Common QE scenarios:

  • Batch import or update from MaxCompute or OSS external tables

  • Batch write or update using COPY

  • Batch import between Hologres tables

FixedQE DML RPS (count/s)

Data records imported or updated per second by the FixedQE engine (formerly SDK), including INSERT, UPDATE, and DELETE statements.

Version: V2.2+

Common FixedQE scenarios:

Query Latency

Query Latency (milliseconds)

The average latency of all queries in the instance, including SELECT, INSERT, UPDATE, DELETE, UTILITY, and UNKNOWN statements.

Relationship: Query Latency >= MAX(QE Query Latency, FixedQE Query Latency)

QE Query Latency (milliseconds)

The average latency of queries executed by the QE engine, including SELECT, INSERT, UPDATE, and DELETE statements.

Version: V2.2+

To troubleshoot elevated QE Query latency, check the Optimization Phase Duration, Start Query Phase Duration, Get Next Phase Duration, and QE QPS metrics.

FixedQE Query Latency (milliseconds)

The average latency of queries executed by the FixedQE engine, including SELECT, INSERT, UPDATE, and DELETE statements.

Version: V2.2+

Troubleshooting:

  • Occasional spikes: May indicate HQE locks. Check whether the FixedQE Backend Lock Wait Time has increased. If so, use Query Insight to identify the locking queries.

  • Persistent high latency: May result from a suboptimal table design or interference from complex queries. See Common issues and diagnostics for Blink and Flink.

Optimization Phase Duration (milliseconds)

The time spent in the Optimization phase, where the optimizer parses the SQL statement and generates a physical plan.

Version: V2.0.44+ / V2.1.22+

Long Optimization durations suggest complex queries. If queries differ only in their parameters, use Prepared Statements to reduce optimization overhead. For more information, see JDBC.

Start Query Phase Duration (milliseconds)

The time spent in the Start Query phase -- the initialization before actual query execution, including locking and schema version alignment.

Version: V2.0.44+ / V2.1.22+

Long Start Query durations often result from lock waits or high CPU usage. Use execution plans for deeper analysis.

Get Next Phase Duration (milliseconds)

The time from the end of the Start Query phase until all results are returned, including computation and result delivery.

Version: V2.0.44+ / V2.1.22+

Long Get Next durations often reflect complex computations. Correlate with QE memory usage and QE QPS. If no anomalies exist in those metrics, the client may simply be slow to consume the results.

Query P99 Latency (milliseconds)

The 99th percentile latency of all queries in the instance, including SELECT, INSERT, UPDATE, UTILITY, and system queries.

Longest Running Query Duration in This Instance (milliseconds)

The duration of the longest-running query currently executing in the instance, covering SELECT, INSERT, UPDATE, DELETE, UTILITY, and UNKNOWN statements.

Version: V1.1+

Hologres is a distributed system with multiple Worker nodes. Queries are distributed across Workers, and this metric reports the longest-running query across all Workers. For example, if Workers are running queries of 10 minutes, 5 minutes, and 30 seconds, the reported value is 10 minutes.

Combine this metric with active queries or slow query logs to diagnose long-running queries and resolve deadlocks.

Because metrics are reported once per minute, the reported running duration can lag slightly behind the query's actual start. Use this metric for anomaly detection, not for precise timing.

Failed Query QPS

Failed Query QPS (count/s)

The average number of failed SQL statements per second in the instance, including SELECT, INSERT, UPDATE, DELETE, UTILITY, and UNKNOWN statements.

Relationship: Failed Query QPS >= QE Failed Query QPS + FixedQE Failed Query QPS

The total failed QPS includes all failed queries (such as UNKNOWN, UTILITY, and Engine Type={PG}), so it is greater than or equal to the sum of QE and FixedQE failed QPS.

Use the failed query type and frequency to find failing queries in the slow query logs, then analyze root causes to improve availability.

QE Failed Query QPS (count/s)

Failed queries per second executed by the QE engine, including SELECT, INSERT, UPDATE, and DELETE statements.

Version: V2.2+

FixedQE Failed Query QPS (count/s)

Failed queries per second executed by the FixedQE engine, including SELECT, INSERT, UPDATE, and DELETE statements.

Version: V2.2+

Locks

Maximum FE Lock Wait Time (milliseconds)

Hologres has multiple FE nodes that parse, dispatch, and route SQL statements. When multiple connections on the same FE perform DDL operations on the same table (such as CREATE or DROP), FE locks occur. This metric shows the DDL lock wait time per FE.

Version: V2.2+

If the FE lock wait time exceeds 5 minutes and the FE Replay Delay also spikes, a DDL operation may be stuck. Use Manage queries to find and terminate long-running queries.

FixedQE Backend Lock Wait Time (milliseconds)

INSERT, DELETE, or UPDATE queries that run on HQE take table locks, while Fixed Plan queries take row locks. This metric increases when Fixed Plan queries wait for row locks because HQE queries hold table locks on the same table.

Version: V2.2+

If this value is high, check slow query logs for slow FixedQE queries, then use Query Insight to identify the locking HQE queries.

Total Backend Lock Wait Time for Instance (milliseconds)

The total lock wait time for INSERT, DELETE, or UPDATE queries in the instance, including both FixedQE and HQE lock waits.

Version: V2.2+

If this value is high, check slow query logs for slow INSERT, DELETE, or UPDATE queries, then use Query Insight to identify the locking HQE queries.

Connection

Total Connections (count)

All active connections in the instance, including those in active, idle, and idle-in-transaction states. Hologres sets default connection limits based on instance type. For more information, see Instance management.

Use Manage queries to view current usage. Kill idle connections if available connections are low.

Connections by Database (count)

Connections aggregated by database, for assessing per-database connection usage.

  • Default connection limit per database: 128. For more information, see Instance management.

  • If connections approach the limit, review idle versus business connections. Use Connection management to clean up idle connections or scale up.

  • If connection load skews across Workers, use Connection management to rebalance.

Connections by FE (count)

Connections aggregated by FE node, for assessing per-FE connection usage.

  • Default connection limit per FE node: 128. For more information, see Instance management.

  • If connections approach the limit, review idle versus business connections. Use Connection management to clean up idle connections or scale up.

  • If connection load skews across Workers, use Connection management to rebalance.

Connection Usage Rate of FE with Highest Usage (%)

Reports the highest connection usage rate among all FE nodes: Max(frontend_connection_used_rate). FE nodes use round-robin load balancing to distribute new connections evenly.

Use Manage queries to view current usage. Kill idle connections if available connections are low.
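
The Max(frontend_connection_used_rate) calculation can be sketched from per-FE connection counts. The example assumes the default limit of 128 connections per FE node mentioned for Connections by FE; the function name and FE labels are hypothetical.

```python
def max_fe_connection_usage(connections_by_fe, limit_per_fe=128):
    """Return the highest per-FE connection usage rate, as a percentage.

    `connections_by_fe` maps an FE name to its current connection count.
    `limit_per_fe` defaults to 128; pass the real limit of your instance.
    """
    if not connections_by_fe:
        return 0.0
    return max(count / limit_per_fe * 100 for count in connections_by_fe.values())


print(max_fe_connection_usage({"fe-0": 32, "fe-1": 96}))  # prints 75.0
```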

Query Queue

Queued Queries Count (count)

The number of query requests waiting to be executed.

Version: V3.0+

Query Queue Entry QPS (count/s)

Queries submitted to the system queue per second. Use this to gauge the system load and query frequency.

Version: V3.0+

Queries Transitioned from Queued to Running QPS (count/s)

Queries moving from the waiting state to the running state per second.

Version: V3.0+

QPS by State for Queries That Started Running (count/s)

Per-second count of queries in the query queue, grouped by state:

  • kReadyToRun -- qualified to run

  • kQueueTimeout -- failed due to queue timeout

  • kCanceled -- failed due to cancellation

  • kExceedConcurrencyLimit -- failed due to concurrency limit

Version: V3.0+
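
One way to read the per-state series is as a failure ratio: the share of dequeued queries that ended in one of the three failure states rather than kReadyToRun. The `queue_failure_ratio` helper is an illustrative assumption, not a metric that the console reports.

```python
# States described above as failures.
FAILED_STATES = {"kQueueTimeout", "kCanceled", "kExceedConcurrencyLimit"}


def queue_failure_ratio(qps_by_state):
    """Fraction of queue exits per second that were failures (0.0 to 1.0)."""
    total = sum(qps_by_state.values())
    if total == 0:
        return 0.0
    failed = sum(qps for state, qps in qps_by_state.items() if state in FAILED_STATES)
    return failed / total


sample = {
    "kReadyToRun": 90.0,
    "kQueueTimeout": 5.0,
    "kCanceled": 3.0,
    "kExceedConcurrencyLimit": 2.0,
}
print(queue_failure_ratio(sample))  # prints 0.1
```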

Average Query Queue Wait Time (milliseconds)

The average time from queue entry to processing start. This does not include actual query execution time.

Version: V3.0+

Query Queue Auto-Rate-Limit Max Concurrency (count)

The maximum concurrency for auto-rate-limited query queues.

Version: V3.1+. Compute group instances only.

I/O

I/O throughput reflects disk I/O activity. Note: 1 GiB = 1024 MiB = 1024 x 1024 KiB.

I/O throughput limits:

  • Standard storage (hot): I/O throughput is not fixed. It depends primarily on CPU load.

  • IA storage (cold): Maximum I/O throughput is 80 MB/s x (number of cores / 16).
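
The IA storage ceiling scales linearly with the instance's core count. A minimal sketch of the formula above follows; the function name is hypothetical, and the result stays in MB/s because the text does not state whether MB is decimal or binary, so no conversion to the metric's bytes/s unit is attempted.

```python
def ia_max_throughput_mb_per_s(cores):
    """Maximum IA storage I/O throughput: 80 MB/s x (number of cores / 16)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return 80.0 * (cores / 16)


print(ia_max_throughput_mb_per_s(32))  # prints 160.0
print(ia_max_throughput_mb_per_s(64))  # prints 320.0
```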

Standard I/O Read Throughput (bytes/s)

Read throughput for Standard storage data.

Standard I/O Write Throughput (bytes/s)

Write throughput for Standard storage data.

Low-Frequency IO Read Throughput (bytes/s)

Read throughput for IA storage data.

Write throughput for low-frequency I/O (bytes/s)

Write throughput for IA storage data.

Storage

The logical disk space used by instance data -- the sum of all database storage, including the recycle bin. Note: 1 GiB = 1024 MiB = 1024 x 1024 KiB. Hologres storage grows continuously with no hard cap.

For subscription instances, storage exceeding the purchased amount is automatically billed on a pay-as-you-go basis. This does not affect system stability or usability. After exceeding the storage capacity, promptly upgrade storage or delete unused data to avoid unnecessary costs.

Use pg_relation_size to view table and database storage sizes. Use Table Info for fine-grained table management.

Standard Storage Used Capacity (bytes)

The capacity used in Standard storage. Scale up storage if usage exceeds the purchased capacity.

Standard Storage Usage (%)

The usage percentage of Standard storage capacity. Scale up storage if usage exceeds the purchased capacity.

IA Storage Used Capacity (bytes)

The capacity used in IA storage. Scale up storage if usage exceeds the purchased capacity.

IA Storage Usage (%)

The usage percentage of IA storage capacity. Scale up storage if usage exceeds the purchased capacity.

Recycle Bin Storage Usage (bytes)

Version: V3.1+

Hologres supports a table recycle bin starting in V3.1. Tables dropped with DROP remain in the recycle bin for a retention period, allowing recovery of accidentally dropped tables. These tables still consume instance storage.

Monitor recycle bin usage per database. If frequent table drops cause high recycle bin usage, configure tables to skip the recycle bin upon deletion.

Framework

FE Replay Delay (milliseconds)

Version: V2.2+

Hologres has multiple FE nodes. For DDL operations, Hologres executes the operation on one FE and replays it on the others. Millisecond- or second-level replay delays are normal.

If an FE's replay delay exceeds several minutes, too many DDL operations may be overwhelming the replay process. If the delay continues to increase, a query may be stuck. Use hg_stat_activity to find and terminate long-running queries.

Shard Multi-Replica Sync Delay (milliseconds)

The sync delay between Shard replicas after Replication is enabled.

The typical Shard replica delay is in milliseconds. Heavy data writes, updates, or frequent DDL operations may increase the sync delay.

Primary-Follower Sync Delay (milliseconds)

The delay when a follower instance reads data from the primary instance. This metric appears only for follower instances, not primary instances.

  • Data appears only after a follower instance is bound to a primary instance (0 ms initially). The sync delay fluctuates when the primary instance receives writes.

  • Normal sync delay is in milliseconds. Occasional jitter from primary DDL operations is safe to ignore. Persistent high delay of more than a few seconds may indicate a high instance load or resource shortage -- check CPU and memory usage and scale up if needed.

  • Sync delay may spike to several minutes during restarts or upgrades and then recover automatically.

Cross-Instance File Sync Delay (milliseconds)

The file sync delay between disaster recovery instances. This metric appears only on follower instances (read-only followers).

Auto Analyze

Tables Missing Statistics per Database (count)

The number of tables lacking statistics in each database.

Version: V2.2+

For Hologres V2.0 and later, Auto Analyze runs by default. After table creation or bulk writes/updates, statistics may temporarily lag -- observe for a short period first.

If a database consistently lacks statistics for hours or days, Auto Analyze may not have been triggered. Use the HG_STATS_MISSING view to list affected tables, then manually run ANALYZE. For more information, see ANALYZE and AUTO ANALYZE.

Serverless Computing

Longest Running Serverless Computing Query Duration (milliseconds)

The duration of the longest-running query in Serverless Computing. Serverless Computing runs specific queries in a dedicated resource pool, isolated from the main instance.

Version: V2.1+

Use hg_stat_activity to inspect the status of Serverless Computing queries.

Serverless Computing Query Queue Count (count)

The number of queries queued in the Serverless Computing resource pool.

Version: V2.2+

Serverless Computing Resource Quota Usage (%)

The ratio of actual Serverless Computing resources used to the maximum allocatable resources.

Version: V2.2+

Binary Logging

Binlog Consumption Rate (count/s)

The number of Binlog entries consumed per second. Hologres supports subscribing to Hologres Binlog for real-time data tiering and accelerated data forwarding.

Version: V2.2+

Binlog Consumption Rate (bytes/s)

The bytes consumed from Binlog per second. Larger fields or higher data volumes increase the byte count.

Version: V2.2+

WAL Sender Count per FE (count)

The number of WAL senders used per FE node. Each shard of each table consumes one WAL sender connection when consuming Binlog using JDBC. WAL sender connections are independent of regular connections and have a default limit.

Version: V2.2+
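
Because each shard of each consumed table takes one WAL sender, the demand from a single JDBC consumer job can be estimated by summing shard counts. The table names and shard counts below are made up, and the sketch assumes one consumer per table; it does not model how senders are spread across FE nodes.

```python
def wal_senders_needed(shard_count_by_table):
    """Estimate WAL senders used when every listed table is consumed once.

    `shard_count_by_table` maps a table name to its number of shards.
    """
    return sum(shard_count_by_table.values())


# Hypothetical tables: one with 20 shards, one with 40 shards.
print(wal_senders_needed({"orders": 20, "events": 40}))  # prints 60
```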

WAL Sender Usage Rate of FE with Highest Usage (%)

The peak WAL sender utilization across all FE nodes.

Version: V2.2+

If WAL sender usage reaches the limit, see Consume Hologres Binlog via JDBC for troubleshooting.

Computing Resource

Elastic Core Count for Compute Groups

The number of cores added by time-based scaling in the compute group. For more information, see Time-based elasticity (Beta).

Version: V2.2.21+. Compute group instances only.

Compute Group Auto-Elastic Core Count (count)

The number of cores added by auto-scaling in the compute group. For more information, see Multi-cluster and auto-elasticity (Beta).

Version: V4.0+. Compute group instances only.

Gateway

Gateway CPU Usage (%)

The CPU usage of each Gateway in the instance.

Version: V2.0+. Compute group instances only.

Gateways use round-robin traffic forwarding, so CPU usage occurs even without new connections. Starting in V2.2.22, Gateways launch more worker threads by default to improve connection handling, which increases baseline CPU usage.

Gateway Memory Usage (%)

The memory usage of each Gateway in the instance.

Version: V2.0+. Compute group instances only.

Gateway New Connection Requests per Second (count/s)

The maximum number of new connections that the system can accept and successfully establish per second.

Version: V2.1.12+. Compute group instances only.

A single Gateway handles approximately 100 new connections per second. If new connection requests approach 100 x Gateway count, the Gateways are the bottleneck. Configure a connection pool or scale up the number of Gateways.

Gateway Inbound Traffic Rate (B/s)

The volume of data entering through the Gateway per second.

Version: V2.1+. Compute group instances only.

If inbound traffic approaches 200 MiB/s x Gateway count, the Gateway network capacity is the bottleneck. Scale up the number of Gateways.

Gateway Outbound Traffic Rate (B/s)

The volume of data sent from the Gateway per second.

Version: V2.1+. Compute group instances only.

If outbound traffic approaches 200 MiB/s x Gateway count, the Gateway network capacity is the bottleneck. Scale up the number of Gateways.
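
The Gateway capacity figures above (about 100 new connections per second and 200 MiB/s of traffic per Gateway) can be turned into utilization ratios. This is a sketch under those stated figures; the `gateway_utilization` helper and its return keys are hypothetical.

```python
NEW_CONNECTIONS_PER_GATEWAY = 100                    # per second, approximate
TRAFFIC_BYTES_PER_GATEWAY = 200 * 1024 * 1024        # 200 MiB/s


def gateway_utilization(gateway_count, new_conn_per_s, inbound_bps, outbound_bps):
    """Return each observed rate as a fraction of the combined Gateway capacity.

    Values close to 1.0 suggest the Gateways are becoming the bottleneck.
    `inbound_bps` and `outbound_bps` are in bytes per second.
    """
    if gateway_count <= 0:
        raise ValueError("gateway_count must be positive")
    return {
        "new_connections": new_conn_per_s / (NEW_CONNECTIONS_PER_GATEWAY * gateway_count),
        "inbound": inbound_bps / (TRAFFIC_BYTES_PER_GATEWAY * gateway_count),
        "outbound": outbound_bps / (TRAFFIC_BYTES_PER_GATEWAY * gateway_count),
    }


usage = gateway_utilization(2, 150, 200 * 1024 * 1024, 100 * 1024 * 1024)
print(usage)  # prints {'new_connections': 0.75, 'inbound': 0.5, 'outbound': 0.25}
```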

Dynamic Table monitoring and alerting

Starting in Hologres V4.0.8, Dynamic Tables offer monitoring metrics for managing refresh tasks. For more information, see Monitoring and alerting.

Common monitoring metric issues

The FAQ for monitoring metrics topic covers common issues, root causes, and fixes.

Monitoring metric alerting

Set alerts for monitoring metrics in Cloud Monitor to detect anomalies early. For more information, see Cloud Monitor.