ApsaraMQ for RocketMQ can integrate with Managed Service for Prometheus and Managed Service for Grafana, which are provided by Application Real-Time Monitoring Service (ARMS), to provide the dashboard feature. Managed Service for Prometheus is used to collect and store metrics, and Managed Service for Grafana is used to display metrics. The dashboard feature allows you to monitor metrics and collect metric data in an all-in-one, comprehensive, and multi-dimensional manner. This helps you quickly obtain information about your business status. This topic describes the scenarios, background information, metric details, billing, and query methods of the dashboard feature.
Scenarios
Scenario 1: You need to receive alerts and locate issues in a timely manner when exceptions occur during online message consumption.
Scenario 2: You need to check whether messages are sent as expected in the messaging system when the status of specific online orders is abnormal.
Scenario 3: You need to analyze the change trend of message traffic, the characteristics of traffic distribution, or message volume to help you analyze the business trend and make business plans.
Scenario 4: You need to view and analyze the upstream and downstream dependency topologies of applications to upgrade, optimize, or transform the architecture.
Background information
When you use ApsaraMQ for RocketMQ to send and receive messages, key metrics, such as accumulated messages, buffering, and processing duration in a queue, can reflect the business performance and broker status. The key metrics of ApsaraMQ for RocketMQ are used in the following business scenarios.
Message accumulation
The following figure shows the status of each message in a queue of a specific topic.
In the preceding figure, ApsaraMQ for RocketMQ calculates the number of messages and the processing duration at different processing stages. The metrics that are used in this process reflect the processing rate and message accumulation in the queue. By monitoring the metrics, you can determine whether exceptions occur during consumption. The following table describes the details of the metrics and the formulas that are used to calculate the metrics.
Category | Metric | Description | Calculation formula |
Message quantity | Inflight messages | The messages that a consumer client is processing and for which the client has not returned the consumption results. | Number of inflight messages = Offset of the latest pulled message - Offset of the latest acknowledged message |
Message quantity | Ready messages | The messages that are visible to consumers and are ready for consumption on the ApsaraMQ for RocketMQ broker. | Number of ready messages = Maximum offset - Offset of the latest pulled message |
Message quantity | Consumer lag | The total number of messages that are being processed or are ready to be processed. | Consumer lag = Number of inflight messages + Number of ready messages |
Duration | Ready time | The point in time when a message becomes visible to consumers and ready for consumption on the ApsaraMQ for RocketMQ broker. | N/A |
Duration | Ready message queue time | The interval between the current point in time and the ready time of the earliest ready message. This metric indicates how soon a consumer pulls messages. | Ready message queue time = Current time - Ready time of the earliest ready message |
Duration | Consumer lag time | The interval between the current point in time and the ready time of the earliest unacknowledged message. This metric indicates how soon a consumer processes messages. | Consumer lag time = Current time - Ready time of the earliest unacknowledged message |
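As a quick illustration of the preceding formulas, the following sketch computes the number of inflight messages, the number of ready messages, and the consumer lag from hypothetical offsets of a single queue:

```java
public class ConsumerLagExample {
    public static void main(String[] args) {
        // Hypothetical offsets of a single queue, for illustration only.
        long maxOffset = 1200;     // offset of the newest message stored in the queue
        long pulledOffset = 1150;  // offset of the latest message pulled by the consumer
        long ackedOffset = 1100;   // offset of the latest acknowledged message

        long inflightMessages = pulledOffset - ackedOffset;  // 50 messages are being processed
        long readyMessages = maxOffset - pulledOffset;       // 50 messages are ready for consumption
        long consumerLag = inflightMessages + readyMessages; // consumer lag = 100 messages

        System.out.printf("inflight=%d, ready=%d, consumerLag=%d%n",
                inflightMessages, readyMessages, consumerLag);
    }
}
```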
Consumption in push mode
For PushConsumer, real-time message processing is based on the typical Reactor thread model of the SDK. The SDK has a built-in long polling thread that pulls messages and stores them in a local buffer queue. The messages are then delivered from the queue to separate message consumption threads, and the message listener that you register processes the messages based on your consumption logic. The following figure shows the message consumption process of PushConsumer consumers.
For more information, see PushConsumer.
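The following snippet is a minimal sketch of this consumption model, assuming the Apache RocketMQ 5.x Java client (rocketmq-client-java). The endpoint, topic, and consumer group names are placeholders that you must replace with the values of your instance.

```java
import java.util.Collections;
import org.apache.rocketmq.client.apis.ClientConfiguration;
import org.apache.rocketmq.client.apis.ClientException;
import org.apache.rocketmq.client.apis.ClientServiceProvider;
import org.apache.rocketmq.client.apis.consumer.ConsumeResult;
import org.apache.rocketmq.client.apis.consumer.FilterExpression;
import org.apache.rocketmq.client.apis.consumer.FilterExpressionType;
import org.apache.rocketmq.client.apis.consumer.PushConsumer;

public class PushConsumerSketch {
    public static void main(String[] args) throws ClientException {
        ClientServiceProvider provider = ClientServiceProvider.loadService();
        ClientConfiguration configuration = ClientConfiguration.newBuilder()
                .setEndpoints("your-instance-endpoint:8080") // placeholder endpoint
                .build();
        FilterExpression filter = new FilterExpression("*", FilterExpressionType.TAG);
        PushConsumer pushConsumer = provider.newPushConsumerBuilder()
                .setClientConfiguration(configuration)
                .setConsumerGroup("YourConsumerGroup") // placeholder consumer group
                .setSubscriptionExpressions(Collections.singletonMap("YourTopic", filter))
                // The listener is invoked by the SDK's consumption threads after the
                // long polling thread pulls messages into the local buffer queue.
                .setMessageListener(messageView -> {
                    // Your message consumption logic goes here.
                    return ConsumeResult.SUCCESS;
                })
                .build();
        // Keep the process running so that the consumer continues to receive messages.
        // Call pushConsumer.close() when you shut down the application.
    }
}
```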
The following items describe metrics that are related to local buffer queues when you consume messages in push mode:
Message quantity: the total number of messages in local buffer queues.
Message size: the total size of messages in local buffer queues.
Waiting duration: the duration for which a message is stored in a local buffer queue before the message is processed.
Metric details
The values of metrics that are related to messaging transactions per second (TPS), API calls for messaging, and message volume are calculated based on a normal message whose size is 4 KB. For large messages and featured messages, the metric values are calculated by using multiples. For more information, see Computing specifications.
The following table describes the fields that are related to the metrics of ApsaraMQ for RocketMQ.
Field | Valid value |
Metric type | Gauge, Counter, or Histogram. The type of each metric is listed in the Type column of the following tables. |
Label | The labels that identify the object to which a metric belongs, such as the instance, topic, or consumer group. |
Metrics related to brokers
Type | Name | Unit | Description | Label |
Gauge | rocketmq_instance_requests_max | count/s | The maximum value of messaging TPS in the instance per minute. Throttled requests are excluded. Rule for determining the value: The system collects one sample every second based on a 1-minute cycle. The maximum value among the 60 samples is used. | |
Gauge | rocketmq_instance_requests_in_max | count/s | The maximum value of message sending TPS in the instance per minute. Throttled requests are excluded. Rule for determining the value: The system collects one sample every second based on a 1-minute cycle. The maximum value among the 60 samples is used. | |
Gauge | rocketmq_instance_requests_out_max | count/s | The maximum value of message receiving TPS in the instance per minute. Throttled requests are excluded. Rule for determining the value: The system collects one sample every second based on a 1-minute cycle. The maximum value among the 60 samples is used. | |
Gauge | rocketmq_topic_requests_max | count/s | The maximum value of message sending TPS in the topics of the instance per minute. Throttled requests are excluded. Rule for determining the value: The system collects one sample every second based on a 1-minute cycle. The maximum value among the 60 samples is used. | |
Gauge | rocketmq_group_requests_max | count/s | The maximum value of message receiving TPS in the consumer groups of the instance. Throttled requests are excluded. Rule for determining the value: The system collects one sample every second based on a 1-minute cycle. The maximum value among the 60 samples is used. | |
Gauge | rocketmq_instance_requests_in_threshold | count/s | The throttling threshold for message sending in the instance. | |
Gauge | rocketmq_instance_requests_out_threshold | count/s | The throttling threshold for message receiving in the instance. | |
Gauge | rocketmq_throttled_requests_in | count | The number of throttled requests during message sending. | |
Gauge | rocketmq_throttled_requests_out | count | The number of throttled requests during message receiving. | |
Gauge | rocketmq_instance_elastic_requests_max | count/s | The maximum scaling value of messaging TPS in the instance. | |
Counter | rocketmq_requests_in_total | count | The number of API calls initiated to send messages. | |
Counter | rocketmq_requests_out_total | count | The number of API calls initiated to receive messages. | |
Counter | rocketmq_messages_in_total | message | The number of messages that producers send to the broker. | |
Counter | rocketmq_messages_out_total | message | The number of messages that the broker delivers to consumers. The messages include messages that are being processed, successfully processed, and failed to be processed. | |
Counter | rocketmq_throughput_in_total | byte | The throughput when producers send messages to the broker. | |
Counter | rocketmq_throughput_out_total | byte | The throughput when the broker delivers messages to consumers. The messages include messages that are being processed, successfully processed, and failed to be processed. | |
Counter | rocketmq_internet_throughput_out_total | byte | The amount of outbound Internet traffic that is used for messaging. | |
Histogram | rocketmq_message_size | byte | The distribution of message sizes. Data is collected for this metric only when messages are sent. The metric values are distributed across preset size ranges. | |
Gauge | rocketmq_consumer_ready_messages | message | The number of ready messages. Ready messages are messages that are ready on the broker and can be consumed by consumers. This metric reflects the number of messages that are not processed by consumers. | |
Gauge | rocketmq_consumer_inflight_messages | message | The number of inflight messages. This metric reflects the total number of messages that consumer clients are processing and for which the clients have not returned the consumption results. | |
Gauge | rocketmq_consumer_queueing_latency | ms | The queuing time of ready messages in a consumer group, which is the interval between the current point in time and the ready time of the earliest ready message. This metric indicates how soon a consumer pulls messages. | |
Gauge | rocketmq_consumer_lag_latency | ms | The delayed time before messages are consumed, which is the interval between the ready time of the earliest unacknowledged message and the current time. This metric indicates how soon a consumer processes messages. | |
Counter | rocketmq_send_to_dlq_messages | message | The number of new dead-letter messages per minute. A dead-letter message is a message that still fails to be consumed after the maximum number of retries is reached. Dead-letter messages are saved to a specific topic or discarded based on the dead-letter policy that is configured for the consumer group. | |
Gauge | rocketmq_storage_size | byte | The size of the storage space that is used by the instance, including the storage space that is used by all files. | |
Metrics related to producers
Type | Name | Unit | Description | Label |
Histogram | rocketmq_send_cost_time | ms | The distribution of the time consumed to successfully call the API operation to send messages. The metric values are distributed across preset time ranges. | |
Metrics related to consumers
Type | Name | Unit | Description | Label |
Histogram | rocketmq_process_time | ms | The distribution of the time consumed by push consumers to process messages, including messages that are successfully processed and messages that fail to be processed. Calculation formula: Processing time = Time when message processing is complete - Time when message processing starts. The metric values are distributed across preset time ranges. | |
Gauge | rocketmq_consumer_cached_messages | message | The number of messages in the local buffer queues of push consumers. | |
Gauge | rocketmq_consumer_cached_bytes | byte | The total size of messages in the local buffer queues of push consumers. | |
Histogram | rocketmq_await_time | ms | The distribution of the queuing time of messages in the local buffer queues of push consumers. Calculation formula: Queuing time = Time when message processing starts - Time when the message arrives in the local buffer queue. The metric values are distributed across preset time ranges. | |
Billing
Dashboard metrics that are used in ApsaraMQ for RocketMQ are basic metrics in Managed Service for Prometheus. You are not charged for basic metrics in Managed Service for Prometheus. Therefore, you can use the dashboard feature of ApsaraMQ for RocketMQ free of charge.
For more information, see Metrics and Pay-as-you-go.
Prerequisites
Managed Service for Prometheus is activated. For more information, see Activate ARMS.
The following service-linked role is created:
Role name: AliyunServiceRoleForOns.
Role policy name: AliyunServiceRolePolicyForOns.
Permission description: Allow ApsaraMQ for RocketMQ to assume the role to access CloudMonitor and ARMS to implement the monitoring, alerting, and dashboard features.
For more information, see Service-linked roles.
View dashboard metrics
You can view dashboard metrics on the following pages in the ApsaraMQ for RocketMQ console:
Dashboard page: displays metrics about all topics and consumer groups on an instance.
Instance Details page: displays the producer overview, billing metrics, and throttling metrics of the specified instance.
Topic Details page: displays metrics that are related to message production and producer clients of the specified topic.
Group Details page: displays metrics that are related to message accumulation and consumer clients of the specified consumer group.
Log on to the ApsaraMQ for RocketMQ console. In the left-side navigation pane, click Instances.
In the top navigation bar, select a region, such as China (Hangzhou). On the Instances page, click the name of the instance that you want to manage.
Use one of the following methods to view the dashboard:
On the Instance Details page, click the Dashboard tab.
In the left-side navigation pane of the Instance Details page, click Dashboard.
In the left-side navigation pane of the Instance Details page, click Topics. On the page that appears, click the name of the topic that you want to manage. On the Topic Details page, click the Dashboard tab.
In the left-side navigation pane of the Instance Details page, click Groups. On the page that appears, click the name of the group that you want to manage. On the Group Details page, click the Dashboard tab.
FAQ about the dashboard
How do I obtain metrics on the dashboard?
Log on to the ARMS console by using your Alibaba Cloud account.
In the left-side navigation pane, click Integration Center.
On the Integration Center page, enter RocketMQ in the search field and click the search icon.
In the search results, select the cloud service whose monitoring data you want to integrate into ARMS. Example: Aliyun RocketMQ (5.0) Service. For more information, see Step 1: Integrate the monitoring data of the cloud service into Managed Service for Prometheus.
After you integrate the monitoring data of the cloud service into ARMS, click Integration Management in the left-side navigation pane.
On the Cloud Service Region tab, click the name of the environment that you want to manage.
In the Basic Information section of the Component Management tab, click the cloud service region next to Default Metric Storage.
On the Settings tab of the page that appears, view the methods used to access different types of data.
How do I integrate metric data provided by the dashboard of ApsaraMQ for RocketMQ into a self-managed Grafana system?
All metric data on the dashboard of ApsaraMQ for RocketMQ is stored in Alibaba Cloud Managed Service for Prometheus. To integrate this data into a self-managed Grafana system, follow the procedure in the "How do I obtain metrics on the dashboard?" section to integrate the monitoring data of ApsaraMQ for RocketMQ into Managed Service for Prometheus and obtain the environment name and HTTP API URL. Then, use the HTTP API URL to connect your self-managed Grafana system to the Prometheus instance. For more information, see Use an HTTP API URL to connect a Prometheus instance to a self-managed Grafana system.
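Before you configure Grafana, you can verify that the metric data is accessible by sending a request to the standard Prometheus HTTP API through the HTTP API URL. The following sketch is for illustration only: the URL is a placeholder, rocketmq_consumer_lag_latency is one of the dashboard metrics described in this topic, and any authentication requirements depend on how your Prometheus instance is configured.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PrometheusHttpApiCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder: replace with the HTTP API URL that you obtained from the ARMS console.
        String httpApiUrl = "https://your-prometheus-http-api-url";
        // Standard Prometheus instant-query endpoint; queries the consumer lag metric.
        String queryUrl = httpApiUrl + "/api/v1/query?query=rocketmq_consumer_lag_latency";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(queryUrl)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

        // The response body is a JSON document in the standard Prometheus query-result format.
        System.out.println(response.body());
    }
}
```

If the query returns data, you can add the same HTTP API URL as a Prometheus data source in your self-managed Grafana system.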
What is the maximum TPS of an instance?
Maximum TPS: The system collects one TPS value every second based on a 1-minute cycle. The maximum value among the 60 values is known as the maximum TPS of the minute.
Example:
An ApsaraMQ for RocketMQ instance produces 60 normal messages in a specific minute. If each message is 4 KB in size, the message production rate of the instance is 60 messages per minute. The following items describe how to calculate the maximum TPS of the instance:
If all 60 messages are sent in the first second, the TPS value for the first second is 60, and the TPS values for the other 59 seconds are all 0.
In this case, the maximum TPS of the instance is 60.
If 40 messages are sent in the first second and 20 messages are sent in the second second, the TPS value for the first second is 40, the TPS value for the second second is 20, and the TPS values for the other 58 seconds are all 0.
In this case, the maximum TPS of the instance is 40.
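The following sketch illustrates the rule by using hypothetical per-second counts that correspond to the second case:

```java
import java.util.Arrays;

public class MaxTpsExample {
    public static void main(String[] args) {
        // Hypothetical per-second message counts within one minute:
        // 40 messages in the first second, 20 in the second second, 0 in the remaining 58 seconds.
        int[] perSecondCounts = new int[60];
        perSecondCounts[0] = 40;
        perSecondCounts[1] = 20;

        // The maximum TPS of the minute is the largest value among the 60 samples.
        int maxTps = Arrays.stream(perSecondCounts).max().orElse(0);
        System.out.println("Maximum TPS of the minute: " + maxTps); // prints 40
    }
}
```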