ApsaraDB for ClickHouse provides a wide range of performance monitoring metrics that you can use to check the status of your clusters. This topic describes how to view cluster monitoring data in the ApsaraDB for ClickHouse console.
Prerequisites
ARMS Prometheus is activated.
The ApsaraDB for ClickHouse cluster is updated to a version that meets the monitoring requirements.
The first time you use cluster monitoring, the Monitoring and Alerting page prompts you to activate ARMS Prometheus and update the ApsaraDB for ClickHouse cluster. If monitoring data is already displayed on the Monitoring and Alerting page, ARMS Prometheus is activated and the cluster version meets the monitoring requirements.
Procedure
Log on to the ApsaraDB for ClickHouse console.
In the upper-left corner, select the region where the cluster you want to manage is located.
On the Clusters page, find the cluster that you want to manage and click the cluster ID.
In the left-side navigation pane, click Monitoring and Alerting.
On the Monitoring and Alerting page, click the Cluster Monitoring tab. By default, cluster monitoring data generated in the last hour is displayed.
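The default one-hour window can be expressed as a start time, an end time, and a sampling step, which is also the shape of a range query in the standard Prometheus HTTP API (`query_range` takes `start`, `end`, and `step`). The following Python sketch only builds such parameters. Whether your ARMS Prometheus instance can be queried directly for ClickHouse metrics is an assumption not covered by this topic, and the function name and the 60-second step are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def last_hour_range(now=None, step_seconds=60):
    """Build start/end/step values that cover the last hour.

    `now` defaults to the current UTC time. `step_seconds` is a
    hypothetical sampling interval; the console's actual interval
    is not documented in this topic.
    """
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(hours=1)
    return {
        "start": start.timestamp(),  # Unix timestamp in seconds
        "end": end.timestamp(),
        "step": step_seconds,
    }


if __name__ == "__main__":
    print(last_hour_range())
```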
The ApsaraDB for ClickHouse console has been optimized to improve user experience. More comprehensive performance monitoring metrics are provided for ApsaraDB for ClickHouse clusters that were purchased after December 1, 2021.
Note: In this topic, the console before optimization is called the old console, and the console after optimization is called the new console. This classification applies only to the monitoring and alerting feature.
For information about the monitoring metrics of ApsaraDB for ClickHouse clusters that were purchased after December 1, 2021, see Monitoring metrics supported in the new console. For information about the monitoring metrics of ApsaraDB for ClickHouse clusters that were purchased before December 1, 2021, see Monitoring metrics supported in the old console.
Monitoring metrics supported in the new console
Metric | Description |
Disk throughput | The disk read and write throughput (bandwidth). |
Disk IOPS | The count of read and write operations on the disk per second. |
Disk usage | The size of the used disk space. Unit: MB. |
Disk usage | The ratio of the used disk space to the maximum available disk space. Unit: %. |
inode usage | The ratio of the number of used inodes to the maximum number of available inodes. Note: In Linux, inodes are used to keep track of files and directories. |
Data Part number | The total number of data parts. |
Memory usage | The memory resources used by each node in the cluster. Unit: MB. |
Number of Inactive Data Parts | The number of inactive data parts. |
CPU usage | The average CPU utilization of each node in the cluster. |
Memory usage | The average memory usage of each node in the cluster. Unit: %. |
Write size per second | The size of data written to each node in the cluster per second. Unit: bytes. |
Network throughput | The network bandwidth. Unit: bytes. |
QPS | The number of queries processed per second. |
Number of rows written per second | The number of rows written to each node in the cluster per second. |
Number of TCP connections | The number of TCP connections to the cluster. |
TPS | The number of transactions processed per second. |
Number of Query runs | The number of running query statements. |
Number of Init Query runs | The number of running initial queries, that is, queries submitted directly by clients rather than sub-queries generated for distributed execution. |
Number of running Mutation | The number of running mutation tasks (data modification operations such as ALTER TABLE ... UPDATE and ALTER TABLE ... DELETE). |
HTTP connections | The number of HTTP connections to the cluster. |
Number of distributed DDL | The number of distributed DDL statements. |
Number of failed queries | The number of failed query statements. |
Number of MaterializeMySQL | The number of databases that use MaterializeMySQL for data synchronization. |
Number of failed Insert queries | The number of failed Insert statements. |
Number of Kafka external tables | The number of created Kafka external tables. |
Number of failed Select queries | The number of failed Select statements. |
Cold storage usage | The amount of cold data stored. Unit: bytes. |
Number of merge runs | The number of running merge tasks. |
Number of MaterializeMySQL synchronization failures | The number of synchronization failures in databases that use MaterializeMySQL. |
Number of delayed Insert | The number of Insert statements that were delayed. |
Number of Kafka external table consumption errors | The number of tables that use the Kafka engine and failed to synchronize data. |
Number of temporary files in distributed tables | The number of temporary files in a distributed table. |
ZooKeeper CPU usage | The average CPU utilization of a ZooKeeper node in the cluster. |
ZooKeeper memory usage | The average memory usage of a ZooKeeper node in the cluster. |
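Several metrics in the preceding table are ratios (for example, disk usage in % and inode usage), and others are per-second rates (for example, QPS, TPS, and the number of rows written per second). The following Python sketch illustrates the arithmetic behind such values. It does not describe how the console computes them; the function names and sample numbers are hypothetical.

```python
def usage_ratio(used, total):
    """Return used/total as a percentage, e.g. for disk or inode usage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def per_second_rate(count_start, count_end, interval_seconds):
    """Return the average per-second rate between two counter readings."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (count_end - count_start) / interval_seconds


if __name__ == "__main__":
    # Hypothetical sample: 50 units used out of 200 available.
    print(usage_ratio(50, 200))             # 25.0
    # Hypothetical sample: 600 queries completed over 60 seconds.
    print(per_second_rate(1000, 1600, 60))  # 10.0
```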
Monitoring metrics supported in the old console
Metric | Description |
CPU Usage | The CPU utilization of each node in the cluster. |
Memory Usage | The memory usage of each node in the cluster. Unit: %. |
Memory Usage | The memory resources used by each node in the cluster. Unit: MB. |
Disk Usage Ratio | The ratio of the used disk space to the maximum available disk space. Unit: %. |
Disk Usage | The size of the used disk space. Unit: MB. |
Disk IOPS | The count of read and write operations on the disk per second. |
Disk Throughput | The size of data read from and written to the disk per second. Unit: MB. |
Database Connection Usage | The ratio of connections used by the database to the maximum number of available connections. |
Database Connections | The number of connections used by the database. |
TPS | The number of transactions processed per second. |
Rows Written Per Second | The number of rows written to each node per second. |
Size of Data Written per second | The size of data written to each node per second. Unit: MB. |
QPS | The number of queries processed per second. |
Average ZooKeeper wait time | The average time spent waiting for a response from ZooKeeper. This metric indicates how quickly ZooKeeper responds. Unit: milliseconds. |
Average I/O wait time | The average time spent waiting for a response from the I/O system. This metric indicates how quickly the I/O system responds. Unit: milliseconds. |
Average CPU wait time | The average time spent waiting for a response from the CPU. This metric indicates how quickly the CPU responds. Unit: milliseconds. |
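The old console reports values such as the size of data written per second in MB, whereas the corresponding new console metric is reported in bytes. A small helper such as the following can make the two comparable. This topic does not state whether 1 MB means 1,048,576 bytes or 1,000,000 bytes, so the `binary` flag in this sketch is an assumption, and the function name is hypothetical.

```python
def bytes_to_mb(num_bytes, binary=True):
    """Convert a byte count to MB.

    binary=True assumes 1 MB = 1024 * 1024 bytes;
    binary=False assumes 1 MB = 1000 * 1000 bytes.
    Which convention the console uses is not documented here.
    """
    divisor = 1024 * 1024 if binary else 1000 * 1000
    return num_bytes / divisor


if __name__ == "__main__":
    print(bytes_to_mb(5 * 1024 * 1024))          # 5.0
    print(bytes_to_mb(5_000_000, binary=False))  # 5.0
```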