Data monitoring

You can monitor the capacity and performance metrics of a Cloud Parallel File Storage (CPFS) for Lingjun file system to view its capacity usage, read/write throughput, and read/write IOPS. You can also configure alert rules for important metrics so that you are notified of exceptions and can handle them promptly. This topic describes the metrics that CPFS for Lingjun supports and the alert rule settings that are available for those metrics.

Background

Cloud Monitor is a service that monitors Internet applications and Alibaba Cloud resources. You can use Cloud Monitor to monitor the metrics of Alibaba Cloud resources and configure alert rules for specific metrics. This way, you can monitor the usage of your Alibaba Cloud resources and the status of your applications. You can also handle alerts at the earliest opportunity to ensure the availability of your applications. For more information, see What is Cloud Monitor?

Retention policy of monitoring data

Monitoring data is retained for 90 days. After the retention period expires, the monitoring data is automatically cleared. The retention period starts when data is generated.
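The retention rule above can be expressed as a small calculation. The following Python snippet is a minimal sketch, not part of Cloud Monitor: the function names are hypothetical, and it assumes that a data point is cleared at exactly 90 days after it is generated.

```python
from datetime import datetime, timedelta, timezone

# Retention period stated in this topic.
RETENTION_DAYS = 90


def expiry_time(generated_at: datetime) -> datetime:
    """Return the time at which a data point leaves the retention window."""
    return generated_at + timedelta(days=RETENTION_DAYS)


def is_retained(generated_at: datetime, now: datetime) -> bool:
    """Return True while the data point is still inside the retention window.

    Assumption: the data point is treated as expired at exactly 90 days.
    """
    return now < expiry_time(generated_at)


generated = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(expiry_time(generated).date())  # 2024-03-31
```

Because the retention period starts when each data point is generated, the window slides: on any given day, only the most recent 90 days of data can be queried.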

Metrics

Cloud Monitor monitors the capacity and performance of CPFS for Lingjun file systems. Cloud Monitor also monitors the performance of clients on a compute node.

Capacity monitoring

Type	Metric	Metric name	Unit	Description
File system	CPFS Capacity	Total Storage Space	Bytes	The total storage space of a file system within a specific period of time.
	CPFS Capacity Used	Data Volume	Bytes	The amount of data that is actually used by a file system within a specific period of time.
	CPFS Inode Limit	Maximum Number of Files	Count	The maximum number of files that can be used by a file system within a specific period of time.
	CPFS Inode Alloc	Number of Allocated Files	Count	The number of files that are allocated by a file system within a specific period of time.
	CPFS Inode Used	Number of Used Files	Count	The number of files that are used by a file system within a specific period of time.
Fileset	BMCPFSFsetCapacityLimit	Allocated Capacity	Bytes	The maximum storage space that can be used by a fileset to write data. If the size of written data reaches the upper limit, the fileset cannot write data.
	BMCPFSFsetCapacityUsed	Used Capacity	Bytes	The storage space that is actually used by a fileset.
	BMCPFSFsetInodeLimit	Number of Files Allocated by Fileset	Count	The maximum number of files that can be used by a fileset to write data. If the number of used files reaches the upper limit, the fileset cannot write data.
	BMCPFSFsetInodeUsed	Number of Files Used by Fileset	Count	The number of files that are actually used by a fileset.
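The capacity metrics are reported as absolute values, so a usage ratio has to be derived from a used value and its matching limit, for example CPFS Capacity Used against CPFS Capacity, or BMCPFSFsetInodeUsed against BMCPFSFsetInodeLimit. The following Python snippet is a minimal sketch of that arithmetic. The function names and sample values are hypothetical, and the snippet assumes that you have already retrieved the metric values.

```python
def usage_percent(used: float, limit: float) -> float:
    """Return used/limit as a percentage.

    `used` and `limit` must share a unit (Bytes for capacity, Count for files).
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    return 100.0 * used / limit


def remaining(used: float, limit: float) -> float:
    """Return how much capacity (or how many files) is left before the limit."""
    return max(limit - used, 0.0)


TIB = 2 ** 40  # bytes in one tebibyte

# Hypothetical sample: a fileset that has written 30 TiB of a 100 TiB limit.
print(usage_percent(30 * TIB, 100 * TIB))    # 30.0
print(remaining(30 * TIB, 100 * TIB) / TIB)  # 70.0
```

Tracking the percentage rather than the raw byte count makes it easier to choose one alert threshold that works for filesets of different sizes.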

Performance monitoring

Type	Metric	Metric name	Unit	Description
File system	ThruputRead	Read throughput	Bytes/s	The average read throughput per second of a file system within a specific period of time.
	ThruputWrite	Write throughput	Bytes/s	The average write throughput per second of a file system within a specific period of time.
	IopsRead	Read IOPS	Count/s	The average read IOPS of a file system over a specific period of time.
	IopsWrite	Write IOPS	Count/s	The average write IOPS of a file system over a specific period of time.
Dataflow	ThroughputImport	Import throughput	Bytes/s	The average import throughput per second of a dataflow within a specific period of time.
	ThroughputExport	Export throughput	Bytes/s	The average export throughput per second of a dataflow within a specific period of time.
	QPSImportMeta	Metadata QPS for Data Flow Import	Count/s	The average number of requests that are sent by a data import task for metadata per second within a specific period of time.
	QPSExportMeta	Metadata QPS for Data Flow Export	Count/s	The average number of requests that are sent by a data export task for metadata per second within a specific period of time.
	IOPSImport	Import IOPS	Count/s	The average IOPS of a data import task over a specific period of time.
	IOPSEXport	Export IOPS	Count/s	The average IOPS of a data export task over a specific period of time.
	LatencyImport	Import latency	Microseconds	The average latency of a data import task over a specific period of time.
	LatencyExport	Export latency	Microseconds	The average latency of a data export task over a specific period of time.
Client	ClientReadIops	Client Read IOPS	Count/s	The average read IOPS of a client within a specific period of time.
	ClientWriteIops	Client Write IOPS	Count/s	The average write IOPS of a client within a specific period of time.
	ClientReadLatency	Client Average Read Latency	Microseconds	The average read latency of a client within a specific period of time.
	ClientWriteLatency	Client Average Write Latency	Microseconds	The average write latency of a client within a specific period of time.
	ClientReadThroughput	Client Read Throughput	Bytes/s	The average read throughput per second of a client within a specific period of time.
	ClientWriteThroughput	Client Write Throughput	Bytes/s	The average write throughput per second of a client within a specific period of time.
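Throughput and IOPS values for the same period can be combined to estimate the average I/O size, and the raw units (Bytes/s, microseconds) are often easier to read after conversion. The following Python snippet is a minimal sketch of these calculations. The helper names and sample values are hypothetical and are not part of any Cloud Monitor API.

```python
MIB = 1024 * 1024  # bytes in one mebibyte


def to_mib_per_s(bytes_per_s: float) -> float:
    """Convert a throughput value from Bytes/s to MiB/s."""
    return bytes_per_s / MIB


def us_to_ms(microseconds: float) -> float:
    """Convert a latency value from microseconds to milliseconds."""
    return microseconds / 1000.0


def average_io_size(bytes_per_s: float, iops: float) -> float:
    """Estimate the average bytes per I/O from throughput and IOPS.

    Both values must cover the same period. Returns 0.0 for an idle period.
    """
    if iops == 0:
        return 0.0
    return bytes_per_s / iops


# Hypothetical sample: 1,000 MiB/s of reads at 1,000 read IOPS.
read_throughput = 1000 * MIB
read_iops = 1000
print(to_mib_per_s(read_throughput))                     # 1000.0
print(average_io_size(read_throughput, read_iops) / MIB)  # 1.0
print(us_to_ms(2500))                                    # 2.5
```

A large average I/O size with low IOPS suggests a throughput-bound workload, while a small average I/O size with high IOPS suggests that latency matters more.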

Note

Elastic File Client (EFC) is a client developed by the CPFS team. EFC is installed on a compute node to connect to the CPFS for Lingjun file system.
You can log on to the Cloud Monitor console or call the Cloud Monitor API to view the performance monitoring data of the client. For more information, see the Use the Cloud Monitor console or Use the Cloud Monitor API section of the "View the performance monitoring data of a CPFS file system" topic.
If you use the CPFS for Lingjun file system in the EFC or single-tenant Lingjun resources, hostname is the host name of a node.
If you use the CPFS for Lingjun file system in the general computing resources or Lingjun resources, hostname is the pod ID of a task.

Alert rules

You can configure alert rules for various metrics in the Cloud Monitor console. If the metric value of a resource meets the alert condition, Cloud Monitor automatically sends notifications to the specified recipients. The following table describes the alert severity, notification method, and alert condition that you can configure for alert rules.

Alert severity	Notification method	Alert condition
Critical	Phone call, text message, email, and DingTalk chatbot	The average value of the metric reaches the specified threshold for N consecutive cycles. You can configure the value of N based on the alert severity.
Warning	Text message, email, and DingTalk chatbot	Same as Critical.
Info	Email and DingTalk chatbot	Same as Critical.

Note: The alert condition varies based on the type of the metric that is used.
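The alert condition in the table depends on consecutive cycles, so a single spike does not trigger an alert. The following Python snippet is a minimal sketch of that logic and is not Cloud Monitor's implementation. The function name is hypothetical, and the snippet assumes that "reaches the threshold" means greater than or equal to. The actual comparison depends on how the alert rule is configured.

```python
from typing import Sequence


def should_alert(cycle_averages: Sequence[float], threshold: float, n: int) -> bool:
    """Return True if the per-cycle average meets the threshold for n cycles in a row.

    Assumption: the condition is "average >= threshold". The check fires as soon
    as any run of n consecutive matching cycles appears in the series.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    streak = 0
    for value in cycle_averages:
        # Extend the streak on a match, reset it on a miss.
        streak = streak + 1 if value >= threshold else 0
        if streak >= n:
            return True
    return False


# Hypothetical capacity usage percentages, one value per cycle.
print(should_alert([70, 85, 90, 95], threshold=80, n=3))  # True
print(should_alert([85, 90, 70, 95], threshold=80, n=3))  # False
```

A larger N reduces noise from short bursts but delays the notification. A smaller N for Critical rules and a larger N for Info rules is one possible trade-off.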