ApsaraMQ for RocketMQ: Dashboard

Last Updated: Dec 04, 2024

ApsaraMQ for RocketMQ integrates with Managed Service for Prometheus and Managed Service for Grafana, both of which are provided by Application Real-Time Monitoring Service (ARMS), to provide the dashboard feature. Managed Service for Prometheus collects and stores metrics, and Managed Service for Grafana displays them. The dashboard feature lets you monitor metrics and collect metric data across multiple dimensions in one place, which helps you quickly assess your business status. This topic describes the scenarios, background information, metric details, billing, and query methods of the dashboard feature.

Scenarios

  • Scenario 1: You need to receive alerts and locate issues in a timely manner when exceptions occur during online message consumption.

  • Scenario 2: You need to check whether messages are sent as expected in the messaging system when the status of specific online orders is abnormal.

  • Scenario 3: You need to analyze message traffic trends, traffic distribution characteristics, or message volume to understand business trends and make business plans.

  • Scenario 4: You need to view and analyze the upstream and downstream dependency topologies of applications to upgrade, optimize, or transform the architecture.

Background information

When you use ApsaraMQ for RocketMQ to send and receive messages, key metrics, such as accumulated messages, buffering, and processing duration in a queue, can reflect the business performance and broker status. The key metrics of ApsaraMQ for RocketMQ are used in the following business scenarios.

Message accumulation

The following figure shows the status of each message in a queue of a specific topic.

Figure: Status of messages in a queue

In the preceding figure, ApsaraMQ for RocketMQ calculates the number of messages and the processing duration at different processing stages. The metrics that are used in this process reflect the processing rate and message accumulation in the queue. By monitoring the metrics, you can determine whether exceptions occur during consumption. The following table describes the details of the metrics and the formulas that are used to calculate the metrics.

Category

Metric

Description

Calculation formula

Message quantity

Inflight messages

The messages that a consumer client is processing and for which the client has not returned the consumption results.

Number of inflight messages = Offset of the latest pulled message - Offset of the latest acknowledged message

Ready messages

The messages that are visible to consumers and are ready for consumption on the ApsaraMQ for RocketMQ broker.

Number of ready messages = Maximum offset - Offset of the latest pulled message

Consumer lag

The messages that are being processed plus the messages that are ready to be processed.

Consumer lag = Number of inflight messages + Number of ready messages

Duration

Ready time

  • For a normal message or an ordered message, the ready time is the time when the message is stored in the broker.

  • For a scheduled message, the ready time is the time that is scheduled for the broker to deliver the message. For a delayed message, the ready time is the time when the specified delay period elapses.

  • For a transactional message, the ready time is the time when a transaction is committed.

N/A

Ready message queue time

The interval between the current point in time and the ready time of the earliest ready message.

This metric indicates how soon a consumer pulls messages.

Ready message queue time = Current time - Ready time of the earliest ready message

Consumer lag time

The interval between the ready time of the earliest unacknowledged message and the current time.

This metric indicates how soon a consumer processes messages.

Consumer lag time = Current time - Ready time of the earliest unacknowledged message
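
The formulas in the preceding table can be sketched in a few lines of Python. This is a minimal illustration only: the QueueSnapshot class, its field names, and the sample values are hypothetical and are not part of any ApsaraMQ for RocketMQ SDK or API. Timestamps are assumed to be in milliseconds.

```python
from dataclasses import dataclass


@dataclass
class QueueSnapshot:
    """Hypothetical snapshot of one queue as seen by one consumer group."""
    max_offset: int                # maximum offset in the queue
    pulled_offset: int             # offset of the latest pulled message
    acked_offset: int              # offset of the latest acknowledged message
    earliest_ready_time_ms: int    # ready time of the earliest ready message
    earliest_unacked_time_ms: int  # ready time of the earliest unacknowledged message


def inflight_messages(s: QueueSnapshot) -> int:
    # Pulled by a consumer but not yet acknowledged.
    return s.pulled_offset - s.acked_offset


def ready_messages(s: QueueSnapshot) -> int:
    # Stored on the broker but not yet pulled.
    return s.max_offset - s.pulled_offset


def consumer_lag(s: QueueSnapshot) -> int:
    return inflight_messages(s) + ready_messages(s)


def ready_message_queue_time_ms(s: QueueSnapshot, now_ms: int) -> int:
    return now_ms - s.earliest_ready_time_ms


def consumer_lag_time_ms(s: QueueSnapshot, now_ms: int) -> int:
    return now_ms - s.earliest_unacked_time_ms


# Illustrative values only.
snapshot = QueueSnapshot(
    max_offset=1000,
    pulled_offset=940,
    acked_offset=900,
    earliest_ready_time_ms=170_000,
    earliest_unacked_time_ms=165_000,
)
print(inflight_messages(snapshot), ready_messages(snapshot), consumer_lag(snapshot))
print(ready_message_queue_time_ms(snapshot, 180_000), consumer_lag_time_ms(snapshot, 180_000))
```

With these sample values, 40 messages are inflight, 60 are ready, and the consumer lag is 100. A growing ready message queue time suggests that consumers pull too slowly, while a growing consumer lag time suggests that processing is slow.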

Consumption in push mode

For PushConsumer, real-time message processing is based on the typical Reactor thread model of the SDK. The SDK has a built-in long polling thread, which pulls messages and stores them in a local buffer queue. The messages are then delivered from the queue to individual message consumption threads, where the message listener runs the message consumption logic. The following figure shows the message consumption process of PushConsumer consumers.

Figure: Message consumption process of PushConsumer

For more information, see PushConsumer.

The following items describe metrics that are related to local buffer queues when you consume messages in push mode:

  • Message quantity: the total number of messages in local buffer queues.

  • Message size: the total size of messages in local buffer queues.

  • Waiting duration: the duration for which a message is stored in a local buffer queue before the message is processed.
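
To make the three local buffer metrics concrete, the following toy model tracks them for a single in-memory queue. It is a sketch under stated assumptions, not the SDK implementation: the LocalBuffer class and its methods are hypothetical, and the optional `now` argument (in seconds) exists only so that the example is deterministic.

```python
import time
from collections import deque
from typing import Optional, Tuple


class LocalBuffer:
    """Toy stand-in for a push consumer's local buffer queue (hypothetical)."""

    def __init__(self) -> None:
        self._items = deque()  # entries are (arrival_time_seconds, payload)
        self._bytes = 0

    def put(self, payload: bytes, now: Optional[float] = None) -> None:
        """Store a pulled message together with its arrival time."""
        arrival = time.monotonic() if now is None else now
        self._items.append((arrival, payload))
        self._bytes += len(payload)

    def take(self, now: Optional[float] = None) -> Tuple[bytes, float]:
        """Remove the oldest message and return it with its waiting duration in ms."""
        arrival, payload = self._items.popleft()
        self._bytes -= len(payload)
        start = time.monotonic() if now is None else now
        return payload, (start - arrival) * 1000.0

    @property
    def message_quantity(self) -> int:
        return len(self._items)

    @property
    def message_size(self) -> int:
        return self._bytes


buf = LocalBuffer()
buf.put(b"a" * 100, now=0.0)
buf.put(b"b" * 50, now=0.2)
print(buf.message_quantity, buf.message_size)  # 2 150
payload, waited_ms = buf.take(now=0.5)
print(len(payload), waited_ms)  # 100 500.0
```

If the message quantity or waiting duration keeps growing, the consumption threads are likely processing messages more slowly than the long polling thread pulls them.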

Metric details

Important

The values of metrics that are related to messaging transactions per second (TPS), API calls for messaging, and message volume are calculated based on a normal message whose size is 4 KB. When you calculate metric values for large messages and featured messages, multiples are used. For more information, see Computing specifications.

The following table describes the fields that are related to the metrics of ApsaraMQ for RocketMQ.

Field

Valid value

Metric type

  • Counter: a cumulative metric whose value only increases. Example: the number of produced messages.

  • Gauge: a metric whose value can increase or decrease. The value of a gauge indicates the instantaneous value of a statistical object. Example: the TPS for API calls.

  • Histogram: a histogram that measures the value distribution of a metric. Example: the distribution of message sizes.

Label

  • instance_id: the ID of the ApsaraMQ for RocketMQ instance.

  • topic: the ApsaraMQ for RocketMQ topic.

  • message_type: the message type. The value normal indicates that the message is a normal message. The value fifo indicates that the message is an ordered message. The value transaction indicates that the message is a transactional message. The value delay indicates that the message is a delayed or scheduled message.

  • fifo_enable: indicates whether the ApsaraMQ for RocketMQ broker delivers messages for consumption in the same order as they are produced. The value true indicates that messages are delivered in order. The value false indicates that messages are delivered concurrently.

  • uid: the ID of your Alibaba Cloud account.

  • client_id: the ID of the ApsaraMQ for RocketMQ client.

  • invocation_status: the response of the API call that is initiated to send messages. The value success indicates that the call is successful. The value failure indicates that the call failed.
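
Because a Counter only increases, its raw value is rarely useful on its own. A rate is usually derived from the difference between two samples. The following sketch shows that arithmetic. The function name and the sample values are illustrative, and treating a drop in value as a reset to zero is an assumption of this sketch rather than documented behavior of the service.

```python
def per_second_rate(earlier: float, later: float, interval_seconds: float) -> float:
    """Average per-second increase of a counter between two samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if later < earlier:
        # A counter never decreases, so a drop is treated here as a reset to zero.
        return later / interval_seconds
    return (later - earlier) / interval_seconds


# Two hypothetical samples of rocketmq_messages_in_total taken 60 seconds apart.
print(per_second_rate(12_000, 12_600, 60))  # 10.0
```

A Gauge, by contrast, can be read directly because each sample is already an instantaneous value.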

Metrics related to brokers

Type

Name

Unit

Description

Label

Gauge

rocketmq_instance_requests_max

count/s

The maximum value of messaging TPS in the instance per minute. Throttled requests are excluded.

Rule for determining the value: The system collects one sample every second based on a 1-minute cycle. The maximum value among the 60 samples is used.

  • uid

  • instance_id

Gauge

rocketmq_instance_requests_in_max

count/s

The maximum value of message sending TPS in the instance per minute. Throttled requests are excluded.

Rule for determining the value: The system collects one sample every second based on a 1-minute cycle. The maximum value among the 60 samples is used.

  • uid

  • instance_id

Gauge

rocketmq_instance_requests_out_max

count/s

The maximum value of message receiving TPS in the instance per minute. Throttled requests are excluded.

Rule for determining the value: The system collects one sample every second based on a 1-minute cycle. The maximum value among the 60 samples is used.

  • uid

  • instance_id

Gauge

rocketmq_topic_requests_max

count/s

The maximum value of message sending TPS in the topics of the instance per minute. Throttled requests are excluded.

Rule for determining the value: The system collects one sample every second based on a 1-minute cycle. The maximum value among the 60 samples is used.

  • uid

  • instance_id

  • topic

Gauge

rocketmq_group_requests_max

count/s

The maximum value of message receiving TPS in the consumer groups of the instance per minute. Throttled requests are excluded.

Rule for determining the value: The system collects one sample every second based on a 1-minute cycle. The maximum value among the 60 samples is used.

  • uid

  • instance_id

  • consumer_group

Gauge

rocketmq_instance_requests_in_threshold

count/s

The throttling threshold for message sending in the instance.

  • uid

  • instance_id

Gauge

rocketmq_instance_requests_out_threshold

count/s

The throttling threshold for message receiving in the instance.

  • uid

  • instance_id

Gauge

rocketmq_throttled_requests_in

count

The number of throttled requests during message sending.

  • uid

  • instance_id

  • topic

  • message_type

Gauge

rocketmq_throttled_requests_out

count

The number of throttled requests during message receiving.

  • uid

  • instance_id

  • topic

  • fifo_enable

  • consumer_group

Gauge

rocketmq_instance_elastic_requests_max

count/s

The maximum scaling value of messaging TPS in the instance.

  • uid

  • instance_id

Counter

rocketmq_requests_in_total

count

The number of API calls initiated to send messages.

  • uid

  • instance_id

  • topic

  • message_type

Counter

rocketmq_requests_out_total

count

The number of API calls initiated to receive messages.

  • uid

  • instance_id

  • topic

  • consumer_group

  • fifo_enable

Counter

rocketmq_messages_in_total

message

The number of messages that producers send to the broker.

  • uid

  • instance_id

  • topic

  • message_type

Counter

rocketmq_messages_out_total

message

The number of messages that the broker delivers to consumers. This includes messages that are being processed, messages that were processed successfully, and messages that failed to be processed.

  • uid

  • instance_id

  • topic

  • consumer_group

  • fifo_enable

Counter

rocketmq_throughput_in_total

byte

The throughput when producers send messages to the broker.

  • uid

  • instance_id

  • topic

  • message_type

Counter

rocketmq_throughput_out_total

byte

The throughput when the broker delivers messages to consumers. This includes messages that are being processed, messages that were processed successfully, and messages that failed to be processed.

  • uid

  • instance_id

  • topic

  • consumer_group

  • fifo_enable

Counter

rocketmq_internet_throughput_out_total

byte

The amount of outbound Internet traffic that is used for messaging.

  • uid

  • instance_id

  • topic

  • message_type

Histogram

rocketmq_message_size

byte

The distribution of message sizes. Data is collected for this metric only when messages are sent.

The following items describe the distribution ranges:

  • le_1_kb: ≤ 1 KB

  • le_4_kb: ≤ 4 KB

  • le_512_kb: ≤ 512 KB

  • le_1_mb: ≤ 1 MB

  • le_2_mb: ≤ 2 MB

  • le_4_mb: ≤ 4 MB

  • le_overflow: > 4 MB

  • uid

  • instance_id

  • topic

  • message_type

Gauge

rocketmq_consumer_ready_messages

message

The number of ready messages.

Ready messages are messages that are ready on the broker and can be consumed by consumers.

This metric reflects the number of messages that are not processed by consumers.

  • uid

  • instance_id

  • topic

  • consumer_group

Gauge

rocketmq_consumer_inflight_messages

message

The number of inflight messages.

This metric reflects the total number of messages that consumer clients are processing and for which the client has not returned the consumption results.

  • uid

  • instance_id

  • topic

  • consumer_group

Gauge

rocketmq_consumer_queueing_latency

ms

The queuing time for ready messages in a consumer group.

The time difference between the current point in time and the point in time when the earliest message is ready.

This metric indicates how soon a consumer pulls messages.

  • uid

  • instance_id

  • topic

  • consumer_group

Gauge

rocketmq_consumer_lag_latency

ms

The delayed time before messages are consumed.

The interval between the ready time of the earliest unacknowledged message and the current time.

This metric indicates how soon a consumer processes messages.

  • uid

  • instance_id

  • topic

  • consumer_group

Counter

rocketmq_send_to_dlq_messages

message

The number of new dead-letter messages per minute.

A dead-letter message is a message that fails to be delivered after the maximum number of retries is reached.

Dead-letter messages are saved to a specific topic or discarded based on the dead-letter policy that is configured for the consumer group.

  • uid

  • instance_id

  • topic

  • consumer_group

Gauge

rocketmq_storage_size

byte

The size of the storage space that is used by the instance, including the storage space that is used by all files.

  • uid

  • instance_id
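
The distribution ranges of rocketmq_message_size in the preceding table can be read as a lookup from a message size to a range label. The following sketch shows one way to perform that lookup. It assumes that 1 KB equals 1,024 bytes and that each message is assigned to the smallest range that contains it. Neither assumption is stated in this topic, and the function name is hypothetical.

```python
import bisect

KB = 1024
MB = 1024 * KB

# Inclusive upper bounds and labels, taken from the rocketmq_message_size ranges.
BOUNDS = [1 * KB, 4 * KB, 512 * KB, 1 * MB, 2 * MB, 4 * MB]
LABELS = ["le_1_kb", "le_4_kb", "le_512_kb", "le_1_mb", "le_2_mb", "le_4_mb", "le_overflow"]


def size_bucket(size_bytes: int) -> str:
    """Return the label of the smallest range that contains size_bytes."""
    # bisect_left returns the index of the first bound >= size_bytes,
    # or len(BOUNDS) when the size exceeds every bound (le_overflow).
    return LABELS[bisect.bisect_left(BOUNDS, size_bytes)]


for size in (300, 4 * KB, 4 * KB + 1, 5 * MB):
    print(size, size_bucket(size))
```

The same lookup pattern applies to the time-based histograms, such as rocketmq_send_cost_time, with the bounds expressed in milliseconds.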

Metrics related to producers

Type

Name

Unit

Description

Label

Histogram

rocketmq_send_cost_time

ms

The distribution of the time consumed to successfully call the API operation to send messages.

The following items describe the distribution ranges:

  • le_1_ms

  • le_5_ms

  • le_10_ms

  • le_20_ms

  • le_50_ms

  • le_200_ms

  • le_500_ms

  • le_overflow

  • uid

  • instance_id

  • topic

  • client_id

  • invocation_status

Metrics related to consumers

Type

Name

Unit

Description

Label

Histogram

rocketmq_process_time

ms

The distribution of the time consumed by push consumers to process messages, including successful and failed processing.

The value of this metric is calculated by using the following formula: rocketmq_process_time = Process end time - Process start time

The following items describe the distribution ranges:

  • le_1_ms

  • le_5_ms

  • le_10_ms

  • le_100_ms

  • le_10000_ms

  • le_60000_ms

  • le_overflow

  • uid

  • instance_id

  • consumer_group

  • topic

  • client_id

  • invocation_status

Gauge

rocketmq_consumer_cached_messages

message

The number of messages in the local buffer queues of push consumers.

  • uid

  • instance_id

  • consumer_group

  • topic

  • client_id

Gauge

rocketmq_consumer_cached_bytes

byte

The total size of messages in the local buffer queues of push consumers.

  • uid

  • instance_id

  • consumer_group

  • topic

  • client_id

Histogram

rocketmq_await_time

ms

The distribution of queuing time for messages in the local buffer queues of push consumers.

The value of this metric is calculated by using the following formula: rocketmq_await_time = Process start time - Arrival time

The following items describe the distribution ranges:

  • le_1_ms

  • le_5_ms

  • le_20_ms

  • le_100_ms

  • le_1000_ms

  • le_5000_ms

  • le_10000_ms

  • le_overflow

  • uid

  • instance_id

  • consumer_group

  • topic

  • client_id

Billing

Dashboard metrics that are used in ApsaraMQ for RocketMQ are basic metrics in Managed Service for Prometheus. You are not charged for basic metrics in Managed Service for Prometheus. Therefore, you can use the dashboard feature of ApsaraMQ for RocketMQ free of charge.

For more information, see Metrics and Pay-as-you-go.

Prerequisites

  • Managed Service for Prometheus is activated. For more information, see Activate ARMS.

  • The following service-linked role is created:

    • Role name: AliyunServiceRoleForOns.

    • Role policy name: AliyunServiceRolePolicyForOns.

    • Permission description: Allow ApsaraMQ for RocketMQ to assume the role to access CloudMonitor and ARMS to implement the monitoring, alerting, and dashboard features.

    • For more information, see Service-linked roles.

View dashboard metrics

You can view dashboard metrics on the following pages in the ApsaraMQ for RocketMQ console:

  • Dashboard page: displays metrics about all topics and consumer groups on an instance.

  • Instance Details page: displays the producer overview, billing metrics, and throttling metrics of the specified instance.

  • Topic Details page: displays metrics that are related to message production and producer clients of the specified topic.

  • Group Details page: displays metrics that are related to message accumulation and consumer clients of the specified consumer group.

  1. Log on to the ApsaraMQ for RocketMQ console. In the left-side navigation pane, click Instances.

  2. In the top navigation bar, select a region, such as China (Hangzhou). On the Instances page, click the name of the instance that you want to manage.

  3. Use one of the following methods to view the dashboard:

    • On the Instance Details page, click the Dashboard tab.

    • In the left-side navigation pane of the Instance Details page, click Dashboard.

    • In the left-side navigation pane of the Instance Details page, click Topics. On the page that appears, click the name of the topic that you want to manage. On the Topic Details page, click the Dashboard tab.

    • In the left-side navigation pane of the Instance Details page, click Groups. On the page that appears, click the name of the group that you want to manage. On the Group Details page, click the Dashboard tab.

FAQ about the dashboard

How do I obtain metrics on the dashboard?

  1. Log on to the ARMS console by using your Alibaba Cloud account.

  2. In the left-side navigation pane, click Integration Center.

  3. On the Integration Center page, enter RocketMQ in the search field and click the search icon.

  4. In the search result, select the cloud service whose monitoring data you want to integrate into ARMS. Example: Aliyun RocketMQ (5.0) Service. For more information, see Step 1: Integrate the monitoring data of the cloud service into Managed Service for Prometheus.

  5. After you integrate the monitoring data of the cloud service into ARMS, click Integration Management in the left-side navigation pane.

  6. On the Cloud Service Region tab, click the name of the environment that you want to manage.

  7. In the Basic Information section of the Component Management tab, click the cloud service region next to Default Metric Storage.

  8. On the Settings tab of the page that appears, view the methods used to access different types of data.

How do I integrate metric data provided by the dashboard of ApsaraMQ for RocketMQ into a self-managed Grafana system?

All metric data on the dashboard of ApsaraMQ for RocketMQ is stored in Alibaba Cloud Managed Service for Prometheus. Follow the procedure in the "How do I obtain metrics on the dashboard?" section to integrate the monitoring data of ApsaraMQ for RocketMQ into Managed Service for Prometheus and obtain the environment name and HTTP API URL. Then, use the HTTP API URL to integrate the metric data on the dashboard of ApsaraMQ for RocketMQ into a self-managed Grafana system. For more information, see Use an HTTP API URL to connect a Prometheus instance to a self-managed Grafana system.
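
Before you configure Grafana, it can help to check a metric with a direct query. The following sketch builds an instant-query URL. It assumes that the HTTP API URL accepts the standard Prometheus instant-query path /api/v1/query. The URL, the instance ID, and the function name are placeholders, and any authentication that your environment requires is not shown.

```python
from urllib.parse import urlencode


def build_instant_query_url(http_api_url: str, promql: str) -> str:
    """Build an instant-query URL for a Prometheus-compatible HTTP API."""
    base = http_api_url.rstrip("/")
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"


# Placeholder values: replace them with the HTTP API URL shown on the Settings
# tab and with the ID of your own instance.
HTTP_API_URL = "https://prometheus.example.invalid/your-environment"
PROMQL = 'sum by (topic) (rate(rocketmq_messages_in_total{instance_id="rmq-example"}[5m]))'

print(build_instant_query_url(HTTP_API_URL, PROMQL))
```

The sample query returns the per-second message production rate of each topic, averaged over 5 minutes. The same expression can be used in a Grafana panel after the data source is connected.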

What is the maximum TPS of an instance?

Maximum TPS: The system collects one TPS value every second based on a 1-minute cycle. The maximum value among the 60 values is known as the maximum TPS of the minute.

Example:

An ApsaraMQ for RocketMQ instance produces 60 normal messages in a specific minute. If each message is 4 KB in size, the message production rate of the instance is 60 messages per minute. The following items describe how to calculate the maximum TPS of the instance:

  • If all 60 messages are sent in the first second, the TPS value for the first second is 60, and the TPS values for the other 59 seconds are all 0.

    In this case, the maximum TPS of the instance is 60.

  • If 40 messages are sent in the first second and 20 messages are sent in the following second, the TPS value for the first second is 40, the TPS value for the following second is 20, and the TPS values for the other 58 seconds are all 0.

    In this case, the maximum TPS of the instance is 40.
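
The two cases above can be reproduced with a short calculation. This is an illustrative sketch of the stated rule (one sample per second, the maximum of the 60 samples). The function name is hypothetical.

```python
from typing import Sequence


def max_tps_of_minute(per_second_counts: Sequence[int]) -> int:
    """Return the maximum of the 60 per-second TPS samples of one minute."""
    if len(per_second_counts) != 60:
        raise ValueError("expected exactly 60 per-second samples")
    return max(per_second_counts)


burst = [60] + [0] * 59      # all 60 messages are sent in the first second
split = [40, 20] + [0] * 58  # 40 messages, then 20 messages, then nothing

print(max_tps_of_minute(burst))  # 60
print(max_tps_of_minute(split))  # 40
```

Both minutes carry the same 60 messages, but the maximum TPS differs because it reflects the busiest second rather than the per-minute average.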