Generate metrics precise to the second - ApsaraMQ for RabbitMQ

This topic describes how to generate metrics precise to the second by using the message log management feature.

Background information

CloudMonitor provides charts that display minute-level statistics, averaged per minute, for ApsaraMQ for RabbitMQ instances. CloudMonitor does not provide charts of transactions per second (TPS), which are statistics calculated per second. The TPS of an ApsaraMQ for RabbitMQ instance is calculated from the number of requests that clients initiate per second by calling Advanced Message Queuing Protocol (AMQP) methods.

The following AMQP methods are involved in TPS calculation:

- ConnectionOpen and ChannelOpen
- QueueDeclare, QueueDelete, QueueBind, and QueueUnbind
- ExchangeDeclare and ExchangeDelete
- ExchangeBind and ExchangeUnBind
- SendMessage, BasicConsume, BasicGet, BasicAck, BasicReject, BasicNack, and BasicRecover

For more information, see Request methods.
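To make the definition concrete, the following Python sketch counts TPS-relevant requests in each one-second bucket. It assumes that each log entry is a dict with an `Action` field and a `microtime` field that holds a Unix timestamp in microseconds (the queries in this topic divide `microtime` by 1000 twice to get seconds). The function name `requests_per_second` and the sample entries are hypothetical.

```python
from collections import Counter

# AMQP methods that count toward TPS, as listed above.
TPS_ACTIONS = {
    "ConnectionOpen", "ChannelOpen",
    "QueueDeclare", "QueueDelete", "QueueBind", "QueueUnbind",
    "ExchangeDeclare", "ExchangeDelete",
    "ExchangeBind", "ExchangeUnBind",
    "SendMessage", "BasicConsume", "BasicGet", "BasicAck",
    "BasicReject", "BasicNack", "BasicRecover",
}


def requests_per_second(entries):
    """Count TPS-relevant requests in each one-second bucket.

    Each entry is assumed to be a dict with an "Action" string and a
    "microtime" integer that holds a Unix timestamp in microseconds.
    """
    counts = Counter()
    for entry in entries:
        if entry["Action"] in TPS_ACTIONS:
            # Floor division drops the sub-second part, which is what
            # microtime / 1000 / 1000 does when integer division applies.
            counts[entry["microtime"] // 1_000_000] += 1
    return dict(sorted(counts.items()))


logs = [
    {"Action": "SendMessage", "microtime": 1_700_000_000_100_000},
    {"Action": "BasicAck", "microtime": 1_700_000_000_900_000},
    {"Action": "PushMessage", "microtime": 1_700_000_000_950_000},  # not listed
    {"Action": "ChannelOpen", "microtime": 1_700_000_001_200_000},
]
print(requests_per_second(logs))  # {1700000000: 2, 1700000001: 1}
```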

Procedure

Enable the message log feature and configure an index.
Create a Metricstore to store metric data that has been cleansed.
1. Log on to the Simple Log Service console. In the Projects section of the page that appears, click the name of the project that you want to manage. On the project details page, select the icon and click Create Now.
2. In the Create Metricstore panel, specify the basic information about the Metricstore that you want to create.
Create a cleansing task.
1. Go to the Search & Analysis page of the logstore that you want to manage and enter a query statement. The following example cleanses the error codes of an ApsaraMQ for RabbitMQ instance:
```
* | SELECT Code, count(*) as num, microtime / 1000 / 1000 as timeSecond group by Code, timeSecond limit 1000000
```
  The preceding statement uses the Search statement|Analytic statement format. The search statement specifies the filter conditions, and the analytic statement is a standard SQL statement. Data can be written to the Metricstore only if the query result contains the following items: the labels that you require, the metric value of each label, and the time. In the preceding statement, Code is the label, which indicates the response code of each request; num is the value of each label; and timeSecond is the time in seconds.
  The following figure shows a sample query result.
2. In the query result, click the Graph tab and then click Save as Scheduled SQL Job. In the Compute Settings step of the wizard that appears, configure the following parameters and click Next.
  Note
  When you configure the preceding parameters, specify the Metricstore that you created as the target (destination) store. The Source Project/Logstore parameter specifies the logstore in which the message logs are stored.
3. In the Scheduling Settings step, specify the scheduling interval and click OK.
Query the distribution of metric values in the Metricstore.
The following figure shows a sample query result.
(Optional) Integrate data in the Metricstore into Grafana or Simple Log Service and display the data on a dashboard.
- For information about how to integrate data into Grafana, see Send time series data from Simple Log Service to Grafana.
- For information about the visualization feature of Simple Log Service, see Overview of visualization.
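In the cleansing task, the Scheduled SQL job writes each row of the query result to the Metricstore. The following Python sketch only illustrates the conceptual mapping from a row of the example query (Code, num, timeSecond) to a data point that has a label, a value, and a timestamp. The `MetricPoint` type, the `rows_to_points` function, and the metric name `rabbitmq_code_num` are hypothetical and do not describe the storage format that Simple Log Service uses internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricPoint:
    name: str        # metric name (hypothetical)
    labels: tuple    # (label_name, label_value) pairs
    value: float     # metric value
    timestamp: int   # Unix time in seconds


def rows_to_points(rows, metric_name="rabbitmq_code_num"):
    """Map rows shaped like the example query result to metric points."""
    points = []
    for row in rows:
        points.append(
            MetricPoint(
                name=metric_name,
                labels=(("Code", str(row["Code"])),),  # Code is the label
                value=float(row["num"]),               # num is the value
                timestamp=int(row["timeSecond"]),      # timeSecond is the time
            )
        )
    return points


rows = [
    {"Code": 200, "num": 120, "timeSecond": 1700000000},
    {"Code": 404, "num": 3, "timeSecond": 1700000000},
]
for point in rows_to_points(rows):
    print(point)
```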

Note

In the preceding example, the error codes of an ApsaraMQ for RabbitMQ instance are cleansed. You can also cleanse other data, such as the messaging rate of each channel on each remote client, the workload and health status of each queue per second, the total number of sent and received messages per second, and the number of calls for each API operation per second.

Common statements

Query the TPS chart of an instance

```
* | select microtime/1000/1000 as time, sum(count) as tps
from
  (SELECT microtime, if(Action!='SendMessage', 1, tps) as count
   from log
   Where InstanceId='amqp-xx-xxx'
     and Action in ('SendMessage', 'ConnectionOpen', 'ChannelOpen', 'ExchangeDeclare', 'QueueBind', 'QueueDeclare', 'QueueDelete', 'ExchangeDelete', 'QueueUnBind', 'ExchangeBind', 'ExchangeUnBind', 'BasicConsume', 'BasicReject', 'BasicRecover', 'BasicAck', 'BasicNAck', 'PullMessage')
   limit 90000000)
GROUP by time ORDER by time limit 90000000
```

The following figure shows a sample query result.

Replace amqp-xx-xxx in the preceding statement with the ID of the instance whose TPS chart you want to query.
If a client calls BasicNack with multiple=false, each call corresponds to one request. If a client calls BasicNack with multiple=true, each call corresponds to multiple requests. However, Simple Log Service (SLS) generates only one log entry for each BasicNack call, regardless of how many requests the call corresponds to. As a result, the TPS in the chart can be lower than the actual TPS.
If your instance handles a large amount of traffic, we recommend that you set the time range of the query to 1 hour or less and keep limit 90000000 in the SQL statement. You can also replace 90000000 with the largest value that is allowed.
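The following Python sketch mirrors what the TPS statement computes, which may help when you check its results. It assumes that SendMessage log entries carry a numeric `tps` field, because the statement reads that field; this topic does not describe how that field is derived. The function `instance_tps` and the sample entries are hypothetical. Each entry other than SendMessage adds exactly 1, which is why a BasicNack call with multiple=true is undercounted.

```python
from collections import defaultdict

# Action values copied from the IN list of the TPS statement above.
TPS_QUERY_ACTIONS = {
    "SendMessage", "ConnectionOpen", "ChannelOpen", "ExchangeDeclare",
    "QueueBind", "QueueDeclare", "QueueDelete", "ExchangeDelete",
    "QueueUnBind", "ExchangeBind", "ExchangeUnBind", "BasicConsume",
    "BasicReject", "BasicRecover", "BasicAck", "BasicNAck", "PullMessage",
}


def instance_tps(entries, instance_id):
    """Return (second, tps) pairs for one instance, ordered by second."""
    totals = defaultdict(int)
    for e in entries:
        # WHERE InstanceId = ... AND Action IN (...)
        if e["InstanceId"] != instance_id or e["Action"] not in TPS_QUERY_ACTIONS:
            continue
        # if(Action != 'SendMessage', 1, tps): SendMessage entries contribute
        # their tps field, every other entry contributes exactly 1.
        weight = e["tps"] if e["Action"] == "SendMessage" else 1
        # GROUP BY microtime/1000/1000, assuming integer division.
        totals[e["microtime"] // 1_000_000] += weight
    # ORDER BY time
    return sorted(totals.items())


logs = [
    {"InstanceId": "amqp-xx-xxx", "Action": "SendMessage", "tps": 3, "microtime": 10_200_000},
    # One log entry per BasicNack call, even if the call covered several messages.
    {"InstanceId": "amqp-xx-xxx", "Action": "BasicNAck", "microtime": 10_700_000},
    {"InstanceId": "amqp-xx-xxx", "Action": "ChannelOpen", "microtime": 11_000_000},
    {"InstanceId": "amqp-other", "Action": "SendMessage", "tps": 5, "microtime": 10_300_000},
]
print(instance_tps(logs, "amqp-xx-xxx"))  # [(10, 4), (11, 1)]
```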

Query the total number of sent messages by exchange and routing key

```
* and Action : SendMessage and Code : 200 |
select
  InstanceId as instance_id,
  VHost as virtual_host,
  split_part(ResourceName,',',2) as exchange_name,
  split_part(ResourceName,',',3) as routing_key,
  count(*) as send_total_num
group by
  instance_id,
  virtual_host,
  exchange_name,
  routing_key
order by
  send_total_num
limit 10000000
```

The following figure shows a sample query result.
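The statement relies on split_part, which splits a string by a delimiter and returns the field at a 1-based index. The following Python sketch mimics Presto-style split_part (NULL, here None, when the index is out of range) together with the grouping. It assumes that ResourceName holds comma-separated fields with the exchange in field 2 and the routing key in field 3, as the statement implies. The content of field 1 is not described in this topic, so the sample values, including the placeholder `r`, and the helper `entry` are made up.

```python
from collections import Counter


def split_part(s, delimiter, index):
    """Mimic split_part(): 1-based field index, None if out of range."""
    parts = s.split(delimiter)
    return parts[index - 1] if 1 <= index <= len(parts) else None


def send_totals(entries):
    """Count successful SendMessage entries per exchange and routing key."""
    counts = Counter()
    for e in entries:
        if e["Action"] != "SendMessage" or e["Code"] != 200:
            continue
        key = (
            e["InstanceId"],
            e["VHost"],
            split_part(e["ResourceName"], ",", 2),  # exchange_name
            split_part(e["ResourceName"], ",", 3),  # routing_key
        )
        counts[key] += 1
    # ORDER BY send_total_num (ascending, as in the statement above)
    return sorted(counts.items(), key=lambda kv: kv[1])


def entry(resource_name, code=200):
    """Build a hypothetical SendMessage log entry."""
    return {"Action": "SendMessage", "Code": code, "InstanceId": "amqp-xx-xxx",
            "VHost": "vh1", "ResourceName": resource_name}


logs = [
    entry("r,ex.orders,order.created"),
    entry("r,ex.orders,order.created"),
    entry("r,ex.orders,order.paid"),
    entry("r,ex.orders,order.paid", code=404),  # filtered out by Code : 200
]
for key, total in send_totals(logs):
    print(key, total)
```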

Query the message sending rate per second by exchange and routing key

```
* and Action : SendMessage and Code : 200 |
select
  InstanceId as instance_id,
  VHost as virtual_host,
  split_part(ResourceName,',',2) as exchange_name,
  split_part(ResourceName,',',3) as routing_key,
  microtime / 1000 / 1000 as time_second,
  count(*) as send_qps
group by
  instance_id,
  virtual_host,
  exchange_name,
  routing_key,
  time_second
order by
  time_second,
  send_qps
limit 10000000
```

The following figure shows a sample query result.

Query the total number of messages consumed by each queue

```
* and Action : PushMessage and Code : 200 |
select
  InstanceId as instance_id,
  VHost as virtual_host,
  Queue as queue_name,
  count(*) as push_total_num
group by
  instance_id,
  virtual_host,
  queue_name
order by
  push_total_num
limit 10000000
```

The following figure shows a sample query result.

Query the message consumption rate of each queue per second

```
* and Action : PushMessage and Code : 200 |
select
  InstanceId as instance_id,
  VHost as virtual_host,
  Queue as queue_name,
  microtime / 1000 / 1000 as time_second,
  count(*) as push_qps
group by
  instance_id,
  virtual_host,
  queue_name,
  time_second
order by
  time_second,
  push_qps
limit 10000000
```

The following figure shows a sample query result.

Query the number of messages sent by each client per second

```
* and Action : SendMessage and Code : 200 |
select
  InstanceId as instance_id,
  VHost as virtual_host,
  RemoteAddress as client_ip_port,
  microtime / 1000 / 1000 as time_second,
  count(*) as send_qps
group by
  instance_id,
  virtual_host,
  client_ip_port,
  time_second
order by
  time_second,
  send_qps
limit 10000000
```

The following figure shows a sample query result.

Query the number of messages consumed by each client per second

```
* and Action : PushMessage and Code : 200 |
select
  InstanceId as instance_id,
  VHost as virtual_host,
  RemoteAddress as client_ip_port,
  microtime / 1000 / 1000 as time_second,
  count(*) as push_qps
group by
  instance_id,
  virtual_host,
  client_ip_port,
  time_second
order by
  time_second,
  push_qps
limit 10000000
```

The following figure shows a sample query result.

Query the rate at which an operation is performed on each client per second

To query the queries per second (QPS) of a specific operation on each client, copy the following statement and replace {action_name} with the operation name. The following operations are supported:

- ConnectionOpen and ChannelOpen
- QueueDeclare, QueueDelete, QueueBind, and QueueUnbind
- ExchangeDeclare and ExchangeDelete
- ExchangeBind and ExchangeUnBind
- SendMessage, BasicConsume, BasicGet, BasicAck, BasicReject, BasicNack, and BasicRecover

```
* and Action : {action_name} and Code : 200 |
select
  InstanceId as instance_id,
  VHost as virtual_host,
  RemoteAddress as client_ip_port,
  microtime / 1000 / 1000 as time_second,
  count(*) as {action_name}_qps
group by
  instance_id,
  virtual_host,
  client_ip_port,
  time_second
order by
  time_second,
  {action_name}_qps
limit 10000000
```

For example, you can use the following statement to query the QPS of ConnectionOpen requests on each client:

```
* and Action : ConnectionOpen and Code : 200 |
select
  InstanceId as instance_id,
  VHost as virtual_host,
  RemoteAddress as client_ip_port,
  microtime / 1000 / 1000 as time_second,
  count(*) as connection_open_qps
group by
  instance_id,
  virtual_host,
  client_ip_port,
  time_second
order by
  time_second,
  connection_open_qps
limit 10000000
```

The following figure shows a sample query result.
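If you query several operations, you may prefer to generate the statement instead of editing the template by hand. The following Python sketch fills in the template for one operation, rejects names that are not in the preceding list, and derives a snake_case column alias so that the output for ConnectionOpen matches the ConnectionOpen statement above. The names `build_action_qps_query`, `to_snake_case`, and `QUERY_TEMPLATE` are hypothetical helpers, not part of any SDK.

```python
import re

# Operation names listed above.
SUPPORTED_ACTIONS = {
    "ConnectionOpen", "ChannelOpen",
    "QueueDeclare", "QueueDelete", "QueueBind", "QueueUnbind",
    "ExchangeDeclare", "ExchangeDelete",
    "ExchangeBind", "ExchangeUnBind",
    "SendMessage", "BasicConsume", "BasicGet", "BasicAck",
    "BasicReject", "BasicNack", "BasicRecover",
}

# The template from this section, with {action_name} split into the
# Action filter value and the column alias.
QUERY_TEMPLATE = """\
* and Action : {action} and Code : 200 |
select
  InstanceId as instance_id,
  VHost as virtual_host,
  RemoteAddress as client_ip_port,
  microtime / 1000 / 1000 as time_second,
  count(*) as {alias}_qps
group by
  instance_id,
  virtual_host,
  client_ip_port,
  time_second
order by
  time_second,
  {alias}_qps
limit 10000000"""


def to_snake_case(name):
    """ConnectionOpen -> connection_open, used only for the column alias."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def build_action_qps_query(action):
    """Return the per-client QPS statement for one listed operation."""
    if action not in SUPPORTED_ACTIONS:
        raise ValueError(f"unsupported operation: {action}")
    return QUERY_TEMPLATE.format(action=action, alias=to_snake_case(action))


print(build_action_qps_query("ConnectionOpen"))
```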

Query the QPS of each operation

You can use the following statement to query the QPS of all operations at once:

```
* and Code : 200 |
select
  InstanceId as instance_id,
  VHost as virtual_host,
  Action as action_type,
  RemoteAddress as client_ip_port,
  microtime / 1000 / 1000 as time_second,
  count(*) as action_qps
group by
  instance_id,
  virtual_host,
  client_ip_port,
  action_type,
  time_second
order by
  time_second,
  action_qps
limit 10000000
```

The following figure shows a sample query result.

Query the occurrence frequency of each error

```
* and not Code = 200 |
select
  Code as error_code,
  VHost as virtual_host,
  split_part(split_part(Info, '[', 1), 'Req', 1) as error_info,
  microtime / 1000 / 1000 as time_second,
  count(*) as error_num
group by
  virtual_host,
  error_code,
  time_second,
  error_info
order by
  time_second,
  error_num
limit 10000000
```

The following figure shows a sample query result.
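The error_info column keeps only the part of Info that precedes the first [ and then the part that precedes Req, which groups errors that differ only in their request details. The following Python sketch reproduces that extraction and the grouping. The Info values are invented, because this topic does not document the Info format; the function names are hypothetical.

```python
from collections import Counter


def split_part(s, delimiter, index):
    """Mimic split_part(): 1-based field index, None if out of range."""
    parts = s.split(delimiter)
    return parts[index - 1] if 1 <= index <= len(parts) else None


def error_info(info):
    """split_part(split_part(Info, '[', 1), 'Req', 1)"""
    before_bracket = split_part(info, "[", 1)    # text before the first '['
    return split_part(before_bracket, "Req", 1)  # then text before 'Req'


def error_frequency(entries):
    """Count failed requests per vhost, code, second, and error text."""
    counts = Counter()
    for e in entries:
        if e["Code"] == 200:  # not Code = 200
            continue
        key = (e["VHost"], e["Code"], e["microtime"] // 1_000_000, error_info(e["Info"]))
        counts[key] += 1
    return counts


logs = [
    {"VHost": "vh1", "Code": 404, "microtime": 5_000_000, "Info": "no queue q1 [ReqId:a1]"},
    {"VHost": "vh1", "Code": 404, "microtime": 5_400_000, "Info": "no queue q1 [ReqId:b2]"},
    {"VHost": "vh1", "Code": 200, "microtime": 5_500_000, "Info": "ok"},
]
print(error_frequency(logs))
```

Because Req is matched as a plain substring, error text that contains a word such as Request is also truncated at that point: error_info("Invalid Request frame") returns "Invalid ".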

Query the average size of message bodies

```
* and Action : SendMessage and Code: 200 |
select
  InstanceId as instance_id,
  VHost as virtual_host,
  split_part(Queue, ';', 1) as queue_name,
  microtime / 1000 / 1000 as time_second,
  avg(cast(split_part(ResourceName, 'bodySize=', 2) as bigint)) as avg_body_size
group by
  instance_id,
  virtual_host,
  queue_name,
  time_second
order by
  time_second,
  avg_body_size
limit 10000000
```

The following figure shows a sample query result.
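The cast in this statement works only if nothing but an integer follows bodySize= in ResourceName. The following Python sketch mirrors the statement under that assumption: it takes everything after bodySize=, converts it to an integer, skips entries without bodySize= (avg ignores NULL values), and averages per instance, vhost, first queue name, and second. The ResourceName and Queue sample values are invented, and the function names are hypothetical.

```python
from collections import defaultdict


def split_part(s, delimiter, index):
    """Mimic split_part(): 1-based field index, None if out of range."""
    parts = s.split(delimiter)
    return parts[index - 1] if 1 <= index <= len(parts) else None


def body_size(resource_name):
    """cast(split_part(ResourceName, 'bodySize=', 2) as bigint)"""
    raw = split_part(resource_name, "bodySize=", 2)
    # int() raises ValueError if raw is not a plain integer, for example when
    # more text follows the number; the cast is expected to fail similarly.
    return None if raw is None else int(raw)


def avg_body_size(entries):
    """Average body size per instance, vhost, first queue name, and second."""
    sums = defaultdict(lambda: [0, 0])  # key -> [total_size, count]
    for e in entries:
        size = body_size(e["ResourceName"])
        if size is None:  # avg() ignores NULL values
            continue
        key = (
            e["InstanceId"],
            e["VHost"],
            split_part(e["Queue"], ";", 1),  # queue_name
            e["microtime"] // 1_000_000,     # time_second
        )
        sums[key][0] += size
        sums[key][1] += 1
    return {key: total / n for key, (total, n) in sums.items()}


logs = [
    {"InstanceId": "amqp-xx-xxx", "VHost": "vh1", "Queue": "q1;q2",
     "microtime": 7_100_000, "ResourceName": "r,ex,rk,bodySize=100"},
    {"InstanceId": "amqp-xx-xxx", "VHost": "vh1", "Queue": "q1",
     "microtime": 7_900_000, "ResourceName": "r,ex,rk,bodySize=300"},
]
print(avg_body_size(logs))
```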

Query the number of times that each message is pushed

```
* and Action : PushMessage and Code : 200 |
select
  InstanceId as instance_id,
  VHost as virtual_host,
  split_part(split_part(ResourceName, ',', 1), '=', 2) as msg_id,
  count(*) as push_times
group by
  instance_id,
  virtual_host,
  msg_id
order by
  push_times desc
limit 1000000
```

The following figure shows a sample query result.
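The nested split_part in this statement takes the first comma-separated field of ResourceName and returns the text after =, which the statement treats as the message ID. The following Python sketch mirrors the counting and the descending order, and then reports messages that were pushed more than once. It assumes that the first field has the form key=message ID; the key name msgId, the sample values, and the helper names are hypothetical.

```python
from collections import Counter


def split_part(s, delimiter, index):
    """Mimic split_part(): 1-based field index, None if out of range."""
    parts = s.split(delimiter)
    return parts[index - 1] if 1 <= index <= len(parts) else None


def push_times(entries):
    """Count successful PushMessage entries per message ID, most pushed first."""
    counts = Counter()
    for e in entries:
        if e["Action"] != "PushMessage" or e["Code"] != 200:
            continue
        # split_part(split_part(ResourceName, ',', 1), '=', 2)
        msg_id = split_part(split_part(e["ResourceName"], ",", 1), "=", 2)
        counts[(e["InstanceId"], e["VHost"], msg_id)] += 1
    # ORDER BY push_times DESC
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)


def push(msg_id):
    """Build a hypothetical PushMessage log entry."""
    return {"Action": "PushMessage", "Code": 200, "InstanceId": "amqp-xx-xxx",
            "VHost": "vh1", "ResourceName": f"msgId={msg_id},q1"}


logs = [push("m1"), push("m2"), push("m1")]
for (instance, vhost, msg_id), times in push_times(logs):
    if times > 1:
        print(f"{msg_id} was pushed {times} times")  # m1 was pushed 2 times
```

A push count greater than 1 means that the same message was pushed more than once, for example after a consumer rejected it with requeue enabled.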