After you install the CloudMonitor agent on a GPU-accelerated compute optimized Elastic Compute Service (ECS) instance, CloudMonitor collects GPU metrics from the instance. You can also create alert rules for these metrics. If the value of a metric meets the specified alert condition, an alert is triggered and CloudMonitor sends an alert notification. This helps you monitor the metric status in real time.
Prerequisites
A GPU-accelerated compute optimized ECS instance is created. The required GPU driver is installed on the instance. For more information, see Create a GPU-accelerated elastic container instance.
Note: If you install the CloudMonitor agent before you install the GPU driver, you must restart the CloudMonitor agent. For more information about how to restart the CloudMonitor agent, see How can I restart the CloudMonitor agent for C++?
The CloudMonitor agent is installed on the ECS instance. For more information, see Install and uninstall the CloudMonitor agent for C++.
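Before you install or restart the agent, it can help to confirm that the GPU driver is actually present on the instance. The following is a minimal sketch, assuming an NVIDIA GPU whose driver provides the `nvidia-smi` command-line tool. The function name `gpu_driver_available` is hypothetical and is not part of CloudMonitor.

```python
import shutil
import subprocess


def gpu_driver_available(which=shutil.which, run=subprocess.run):
    """Return True if nvidia-smi is on PATH and exits successfully.

    Assumption: the instance uses an NVIDIA GPU, whose driver installs
    nvidia-smi. `which` and `run` are injectable so that the check can be
    exercised on a machine without a GPU.
    """
    path = which("nvidia-smi")
    if path is None:
        # Tool not found, so the driver is probably not installed.
        return False
    try:
        # "nvidia-smi -L" lists the GPUs that the driver can see.
        result = run([path, "-L"], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

If `gpu_driver_available()` returns False, install the GPU driver first and then restart the CloudMonitor agent as described in the note above.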
GPU metrics
You can view GPU metrics based on GPUs, instances, and application groups. The following table lists the GPU metrics.
Metric | Unit | MetricName | Dimensions |
(Agent)gpu_decoder_utilization | % | gpu_decoder_utilization | userId, instanceId, and gpuId |
(Agent)gpu_encoder_utilization | % | gpu_encoder_utilization | userId, instanceId, and gpuId |
(Agent)gpu_gpu_temperature | °C | gpu_gpu_temperature | userId, instanceId, and gpuId |
(Agent)gpu_gpu_usedutilization | % | gpu_gpu_usedutilization | userId, instanceId, and gpuId |
(Agent)gpu_memory_freespace | Byte | gpu_memory_freespace | userId, instanceId, and gpuId |
(Agent)gpu_memory_freeutilization | % | gpu_memory_freeutilization | userId, instanceId, and gpuId |
(Agent)gpu_memory_usedspace | Byte | gpu_memory_usedspace | userId, instanceId, and gpuId |
(Agent)gpu_memory_usedutilization | % | gpu_memory_usedutilization | userId, instanceId, and gpuId |
(Agent)gpu_power_readings_power_draw | W | gpu_power_readings_power_draw | userId, instanceId, and gpuId |
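The MetricName and Dimensions columns are the values you would typically pass when you query these metrics programmatically. The following sketch only builds a request parameter set and makes no network call. It assumes the CloudMonitor `DescribeMetricLast` operation with the `acs_ecs_dashboard` namespace and a JSON-encoded `Dimensions` parameter; verify these against the CloudMonitor API reference before use. The helper name `build_gpu_metric_query` is hypothetical.

```python
import json

# MetricName values and units copied from the table above.
GPU_METRIC_UNITS = {
    "gpu_decoder_utilization": "%",
    "gpu_encoder_utilization": "%",
    "gpu_gpu_temperature": "°C",
    "gpu_gpu_usedutilization": "%",
    "gpu_memory_freespace": "Byte",
    "gpu_memory_freeutilization": "%",
    "gpu_memory_usedspace": "Byte",
    "gpu_memory_usedutilization": "%",
    "gpu_power_readings_power_draw": "W",
}


def build_gpu_metric_query(metric_name, instance_id, gpu_id=None, period=60):
    """Build request parameters for one GPU metric of one instance.

    Assumptions: the namespace "acs_ecs_dashboard" and the parameter names
    follow the CloudMonitor DescribeMetricLast operation. The userId
    dimension is omitted because it is normally implied by the caller's
    account.
    """
    if metric_name not in GPU_METRIC_UNITS:
        raise ValueError(f"unknown GPU metric: {metric_name}")
    dimension = {"instanceId": instance_id}
    if gpu_id is not None:
        # Narrow the query to a single GPU on the instance.
        dimension["gpuId"] = str(gpu_id)
    return {
        "Namespace": "acs_ecs_dashboard",  # assumed namespace for ECS metrics
        "MetricName": metric_name,
        "Period": str(period),  # aggregation period in seconds
        "Dimensions": json.dumps([dimension]),
    }
```

For example, `build_gpu_metric_query("gpu_gpu_temperature", "i-example", gpu_id=0)` yields a parameter set scoped to GPU 0 of a placeholder instance ID. Omitting `gpu_id` leaves only the instance dimension.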
View GPU metric data in the CloudMonitor console
Log on to the CloudMonitor console.
In the left-side navigation pane, go to the Host Monitoring page.
On the Host Monitoring page, click the host name or click Monitoring Charts in the Actions column of the host.
Click the GPU Monitoring tab.
On the GPU Monitoring tab, view the monitoring charts for GPU metrics.
You can view the GPU metrics of the host. You can also configure alert rules for specific GPU metrics and view alerts. For more information, see Step 2: Create an alert rule for the host and Step 3: View host alerts.
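To illustrate what an alert condition on a GPU metric means, the following sketch evaluates a simple threshold rule against a series of samples. It assumes a rule that fires only when the most recent samples all meet the condition for a given number of consecutive periods. The class and field names are hypothetical, and this is not CloudMonitor's actual evaluation logic.

```python
import operator
from dataclasses import dataclass

# Supported comparison operators for the hypothetical rule.
_COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass
class ThresholdRule:
    metric_name: str      # for example "gpu_gpu_temperature"
    comparator: str       # one of the keys in _COMPARATORS
    threshold: float
    consecutive: int = 3  # number of consecutive samples that must match

    def is_triggered(self, samples):
        """Return True if the last `consecutive` samples all meet the condition."""
        if self.consecutive < 1:
            raise ValueError("consecutive must be at least 1")
        if len(samples) < self.consecutive:
            return False  # not enough data to decide yet
        compare = _COMPARATORS[self.comparator]
        recent = samples[-self.consecutive:]
        return all(compare(value, self.threshold) for value in recent)
```

For example, a rule such as `ThresholdRule("gpu_gpu_temperature", ">=", 85.0)` is triggered by the samples `[70, 86, 90, 88]` but not by `[90, 86, 80, 88]`, because the last three samples of the second series do not all reach the threshold.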