You can monitor the capacity and performance metrics of a Cloud Parallel File Storage (CPFS) for Lingjun file system to view its capacity usage, read/write throughput, and read/write IOPS. You can also configure alert rules for important metrics of the file system so that you are notified of exceptions and can handle them at the earliest opportunity. This topic describes the metrics that CPFS for Lingjun supports and how alert rules are configured for them.
Background
Cloud Monitor is a service that monitors Internet applications and Alibaba Cloud resources. You can use Cloud Monitor to monitor the metrics of Alibaba Cloud resources and configure alert rules for specific metrics. This way, you can monitor the usage of your Alibaba Cloud resources and the status of your applications. You can also handle alerts at the earliest opportunity to ensure the availability of your applications. For more information, see What is Cloud Monitor?
Retention policy of monitoring data
Monitoring data is retained for 90 days. After the retention period expires, the monitoring data is automatically cleared. The retention period starts when data is generated.
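Because the retention period is counted from the time each data point is generated, you can work out when a given data point will be cleared. The following is a minimal sketch of that arithmetic. The helper names `is_retained` and `expiry_time` are hypothetical, and treating a data point as expired at exactly 90 days is an assumption; the topic does not specify how the boundary is handled.

```python
from datetime import datetime, timedelta, timezone

# Retention period stated in this topic: 90 days from the time a data point is generated.
RETENTION = timedelta(days=90)


def expiry_time(generated_at: datetime) -> datetime:
    """Return the time at which a data point becomes eligible for automatic clearing."""
    return generated_at + RETENTION


def is_retained(generated_at: datetime, now: datetime) -> bool:
    """Return True if the data point is still within the retention period at `now`.

    Assumption: a data point that is exactly 90 days old is treated as expired.
    """
    return now - generated_at < RETENTION


generated = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(expiry_time(generated))  # 2024-03-31 00:00:00+00:00
```

For example, a data point generated on January 1, 2024 (UTC) can still be queried on March 30, 2024, but not on April 1, 2024.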
Metrics
Cloud Monitor monitors the capacity and performance of CPFS for Lingjun file systems. Cloud Monitor also monitors the performance of clients on a compute node.
Capacity monitoring
Type | Metric | Metric name | Unit | Description |
File system | CPFS Capacity | Total Storage Space | Bytes | The total storage space of a file system within a specific period of time. |
File system | CPFS Capacity Used | Data Volume | Bytes | The amount of data that is actually stored in a file system within a specific period of time. |
File system | CPFS Inode Limit | Maximum Number of Files | Count | The maximum number of files that a file system can use within a specific period of time. |
File system | CPFS Inode Alloc | Number of Allocated Files | Count | The number of files that are allocated by a file system within a specific period of time. |
File system | CPFS Inode Used | Number of Used Files | Count | The number of files that are used by a file system within a specific period of time. |
Fileset | BMCPFSFsetCapacityLimit | Allocated Capacity | Bytes | The maximum storage space that a fileset can use to write data. When the written data reaches this limit, the fileset can no longer write data. |
Fileset | BMCPFSFsetCapacityUsed | Used Capacity | Bytes | The storage space that is actually used by a fileset. |
Fileset | BMCPFSFsetInodeLimit | Number of Files Allocated by Fileset | Count | The maximum number of files that a fileset can use to write data. When the number of used files reaches this limit, the fileset can no longer write data. |
Fileset | BMCPFSFsetInodeUsed | Number of Files Used by Fileset | Count | The number of files that are actually used by a fileset. |
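A fileset stops accepting writes when either its capacity or its file count reaches the limit, so it is useful to compare each "used" metric with its matching "limit" metric. The following is a minimal sketch, assuming you have already retrieved the four fileset values as plain numbers; the `FilesetSample` type, the helper names, and the 90% threshold are hypothetical. The same ratio applies at the file system level (CPFS Capacity Used divided by CPFS Capacity, and CPFS Inode Used divided by CPFS Inode Limit).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FilesetSample:
    """One set of fileset metric values (hypothetical container)."""
    capacity_limit: int  # BMCPFSFsetCapacityLimit, bytes
    capacity_used: int   # BMCPFSFsetCapacityUsed, bytes
    inode_limit: int     # BMCPFSFsetInodeLimit, count
    inode_used: int      # BMCPFSFsetInodeUsed, count


def usage_ratio(used: float, limit: float) -> float:
    """Return used/limit as a fraction.

    Assumption: a zero or negative limit is treated as fully used.
    """
    if limit <= 0:
        return 1.0
    return used / limit


def fileset_near_limit(s: FilesetSample, threshold: float = 0.9) -> List[str]:
    """Return which limits ("capacity", "inodes") are at or above `threshold`.

    When either ratio reaches 1.0, the fileset can no longer write data.
    """
    hits = []
    if usage_ratio(s.capacity_used, s.capacity_limit) >= threshold:
        hits.append("capacity")
    if usage_ratio(s.inode_used, s.inode_limit) >= threshold:
        hits.append("inodes")
    return hits


# 9.5 TiB used out of 10 TiB, and 400,000 of 1,000,000 files used.
sample = FilesetSample(
    capacity_limit=10 * 2**40,
    capacity_used=9 * 2**40 + 2**39,
    inode_limit=1_000_000,
    inode_used=400_000,
)
print(fileset_near_limit(sample))  # ['capacity']
```

Checking both ratios matters because a fileset that holds many small files can hit its file limit long before its capacity limit.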
Performance monitoring
Type | Metric | Metric name | Unit | Description |
File system | ThruputRead | Read throughput | Bytes/s | The average read throughput per second of a file system within a specific period of time. |
File system | ThruputWrite | Write throughput | Bytes/s | The average write throughput per second of a file system within a specific period of time. |
File system | IopsRead | Read IOPS | Count/s | The average read IOPS of a file system within a specific period of time. |
File system | IopsWrite | Write IOPS | Count/s | The average write IOPS of a file system within a specific period of time. |
Dataflow | ThroughputImport | Import throughput | Bytes/s | The average import throughput per second of a dataflow within a specific period of time. |
Dataflow | ThroughputExport | Export throughput | Bytes/s | The average export throughput per second of a dataflow within a specific period of time. |
Dataflow | QPSImportMeta | Metadata QPS for Data Flow Import | Count/s | The average number of metadata requests per second that a data import task sends within a specific period of time. |
Dataflow | QPSExportMeta | Metadata QPS for Data Flow Export | Count/s | The average number of metadata requests per second that a data export task sends within a specific period of time. |
Dataflow | IOPSImport | Import IOPS | Count/s | The average IOPS of a data import task within a specific period of time. |
Dataflow | IOPSEXport | Export IOPS | Count/s | The average IOPS of a data export task within a specific period of time. |
Dataflow | LatencyImport | Import latency | Microseconds | The average latency of a data import task within a specific period of time. |
Dataflow | LatencyExport | Export latency | Microseconds | The average latency of a data export task within a specific period of time. |
Client | ClientReadIops | Client Read IOPS | Count/s | The average number of read operations per second that a client performs within a specific period of time. |
Client | ClientWriteIops | Client Write IOPS | Count/s | The average number of write operations per second that a client performs within a specific period of time. |
Client | ClientReadLatency | Client Average Read Latency | Microseconds | The average latency of the read operations that a client performs within a specific period of time. |
Client | ClientWriteLatency | Client Average Write Latency | Microseconds | The average latency of the write operations that a client performs within a specific period of time. |
Client | ClientReadThroughput | Client Read Throughput | Bytes/s | The average read throughput per second of a client within a specific period of time. |
Client | ClientWriteThroughput | Client Write Throughput | Bytes/s | The average write throughput per second of a client within a specific period of time. |
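Throughput (Bytes/s) divided by IOPS (Count/s) over the same period gives the approximate average size of one I/O request, which helps you tell whether a workload is dominated by small or large requests. The following is a minimal sketch of that arithmetic; the helper name `average_io_size` is hypothetical, and because both metrics are averages over a period, the result is only an estimate. It applies equally to ThruputRead/IopsRead at the file system level and to ClientReadThroughput/ClientReadIops for a single client.

```python
from typing import Optional


def average_io_size(throughput: float, iops: float) -> Optional[float]:
    """Estimate the average I/O size in bytes.

    `throughput` is in Bytes/s and `iops` is in Count/s, both sampled over the
    same period. Returns None when no I/O occurred, because the size is undefined.
    """
    if iops <= 0:
        return None
    return throughput / iops


# Example: 400 MiB/s of read throughput at 3,200 read IOPS.
size = average_io_size(400 * 2**20, 3200)
print(size / 2**10, "KiB")  # 128.0 KiB
```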
Note
Elastic File Client (EFC) is a client developed by the CPFS team. EFC is installed on a compute node to connect the node to the CPFS for Lingjun file system.
You can log on to the Cloud Monitor console or call the Cloud Monitor API to view the performance monitoring data of the client. For more information, see the "Use the Cloud Monitor console" and "Use the Cloud Monitor API" sections of the "View the performance monitoring data of a CPFS file system" topic.
If you use the CPFS for Lingjun file system with EFC or on single-tenant Lingjun resources, the value of hostname is the host name of the node.
If you use the CPFS for Lingjun file system on general computing resources or Lingjun resources, the value of hostname is the pod ID of the task.
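Because each client data point is identified by hostname (a node host name or a task pod ID, depending on the environment), you can group client metrics by hostname to find the clients that experience the highest latency. The following is a minimal sketch, assuming the ClientReadLatency or ClientWriteLatency values have already been retrieved as `(hostname, latency_us)` pairs; the helper name, the input format, and the sample hostnames are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def slowest_clients(samples, limit=3):
    """Return up to `limit` (hostname, mean latency in ms) pairs, slowest first.

    `samples` is an iterable of (hostname, latency_us) tuples, where latency_us
    is a client latency value in microseconds.
    """
    per_host = defaultdict(list)
    for hostname, latency_us in samples:
        per_host[hostname].append(latency_us)
    ranked = sorted(
        ((host, mean(values) / 1000) for host, values in per_host.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:limit]


samples = [
    ("node-a", 800), ("node-a", 1200),  # hostname is a node host name
    ("pod-1234", 4500),                 # hostname is a task pod ID
    ("node-b", 300),
]
print(slowest_clients(samples, limit=2))  # [('pod-1234', 4.5), ('node-a', 1.0)]
```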
Alert rules
You can configure alert rules for various metrics in the Cloud Monitor console. If the metric value of a resource meets the alert condition, Cloud Monitor automatically sends notifications to the specified recipients. The following table describes the alert severity, notification method, and alert condition that you can configure for alert rules.
Alert severity | Notification method | Alert condition |
Critical | Phone call, text message, email, and DingTalk chatbot | The average value of the metric reaches the specified threshold for N consecutive cycles. |
Warning | Text message, email, and DingTalk chatbot | The average value of the metric reaches the specified threshold for N consecutive cycles. |
Info | Email and DingTalk chatbot | The average value of the metric reaches the specified threshold for N consecutive cycles. |
Note
You can configure the value of N separately for each alert severity. The alert condition varies based on the type of the metric.
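The alert condition can be read as: take the average value of the metric in each cycle, and trigger the alert when the last N cycle averages all reach the threshold. The following is a minimal sketch of that evaluation for illustration only, not Cloud Monitor's implementation. The N values per severity, the threshold values, and the use of "greater than or equal to" as the comparison are assumptions; as noted above, the actual condition depends on the metric type.

```python
from typing import Dict, Optional, Sequence

# Hypothetical N (consecutive cycles) configured for each alert severity.
CONSECUTIVE_CYCLES = {"Critical": 1, "Warning": 3, "Info": 5}


def breached(cycle_averages: Sequence[float], threshold: float, n: int) -> bool:
    """Return True if the last `n` per-cycle averages all reach `threshold`.

    Assumption: "reaches" means greater than or equal to.
    """
    if n <= 0 or len(cycle_averages) < n:
        return False
    return all(value >= threshold for value in cycle_averages[-n:])


def highest_severity(
    cycle_averages: Sequence[float], thresholds: Dict[str, float]
) -> Optional[str]:
    """Return the most severe triggered alert level, or None if none is triggered."""
    for severity in ("Critical", "Warning", "Info"):
        if severity in thresholds and breached(
            cycle_averages, thresholds[severity], CONSECUTIVE_CYCLES[severity]
        ):
            return severity
    return None


# Per-cycle averages of a capacity usage ratio (hypothetical values).
history = [0.82, 0.86, 0.88, 0.91]
print(highest_severity(history, {"Critical": 0.95, "Warning": 0.85, "Info": 0.8}))  # Warning
```

Requiring several consecutive cycles for lower severities filters out short spikes, while a Critical rule with a small N reacts quickly.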