Message sending retry and message throttling

Updated at: 2025-01-06 02:56

This topic describes the message sending retry and message throttling mechanisms of ApsaraMQ for RocketMQ.

Background Information

Message sending retry

The message sending retry mechanism of ApsaraMQ for RocketMQ answers the following questions:

  • If exceptions occur on specific nodes, can messages still be sent as expected?

  • Does a message retry affect the execution of the message consumption logic?

  • What are the disadvantages of message retries?

Message throttling

The message throttling mechanism of ApsaraMQ for RocketMQ answers the following questions:

  • When is message throttling triggered?

  • What is the client behavior when message throttling is triggered?

  • How do I prevent message throttling from being triggered and handle unexpected message throttling?

Message sending retry

What is message sending retry?

When a producer client of ApsaraMQ for RocketMQ initiates a request to send messages to the broker, the request may fail due to issues such as network failures and service exceptions. To ensure message reliability, ApsaraMQ for RocketMQ provides a built-in logic in the client SDK to retry failed requests.

You can use the message sending retry mechanism in synchronous and asynchronous transmission modes.

Trigger conditions

Message sending retry is triggered when the following conditions are met:

  • The message sending request from the client fails or times out.

    • A connection fails or a request times out due to a network exception.

    • A connection fails due to the restart or undeployment of the broker.

    • A request times out due to the slow running of the broker.

  • The broker returns an error code.

    • System logic error: an error caused by incorrect running logic.

    • System throttling error: an error caused by the system capacity being exceeded.

Note

For transactional messages, only transparent retries are performed. No retries are performed in network exception or timeout scenarios. For more information about transparent retries, see Transparent Retries.

Retry process

You can specify the maximum number of retries when you initialize a producer. When one of the preceding trigger conditions occurs, the producer client attempts to re-send the failed message until the message is sent or the maximum number of retries is reached. If the message still fails to be sent in the last retry, an error is returned.

  • Synchronous transmission: The call thread is blocked until a retry succeeds or the last retry fails. If the last retry fails, the system returns an error code and throws an exception.

  • Asynchronous transmission: The call thread is not blocked. The call result is returned as an exception event or success event.
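
The following example shows how the maximum number of retries can be specified when a producer is initialized and how a synchronous send failure can be handled. This is a minimal sketch that assumes the Apache RocketMQ 5.x gRPC client for Java (rocketmq-client-java); the endpoint, topic name, and retry count are placeholders, and authentication settings are omitted.

  import java.nio.charset.StandardCharsets;
  import org.apache.rocketmq.client.apis.ClientConfiguration;
  import org.apache.rocketmq.client.apis.ClientException;
  import org.apache.rocketmq.client.apis.ClientServiceProvider;
  import org.apache.rocketmq.client.apis.message.Message;
  import org.apache.rocketmq.client.apis.producer.Producer;
  import org.apache.rocketmq.client.apis.producer.SendReceipt;

  public class ProducerRetryExample {
      public static void main(String[] args) throws ClientException {
          ClientServiceProvider provider = ClientServiceProvider.loadService();
          ClientConfiguration configuration = ClientConfiguration.newBuilder()
                  .setEndpoints("rmq-xxx.aliyuncs.com:8080") // placeholder endpoint
                  .build();
          // setMaxAttempts(3) allows the first attempt plus up to two retries.
          Producer producer = provider.newProducerBuilder()
                  .setClientConfiguration(configuration)
                  .setTopics("TestTopic")                    // placeholder topic
                  .setMaxAttempts(3)
                  .build();
          Message message = provider.newMessageBuilder()
                  .setTopic("TestTopic")
                  .setBody("Hello RocketMQ".getBytes(StandardCharsets.UTF_8))
                  .build();
          try {
              // Synchronous transmission: this call blocks until the message is
              // sent or the last retry fails, in which case an exception is thrown.
              SendReceipt receipt = producer.send(message);
              System.out.println("Message sent, message ID: " + receipt.getMessageId());
          } catch (ClientException e) {
              // All retries failed; the caller must handle the failure itself.
              e.printStackTrace();
          }
      }
  }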

Retry interval

  • A client immediately retries a failed message, except when the client receives the system throttling error.

  • If the client receives the system throttling error, the client retries the failed message based on the interval specified in the exponential backoff retry policy.

    The exponential backoff algorithm uses the following parameters to control retry behavior:

    • INITIAL_BACKOFF: specifies the interval between the first failure and the first retry. Default value: 1 second.

    • MULTIPLIER: specifies the factor by which the interval is multiplied after each failed retry. Default value: 1.6.

    • JITTER: specifies the factor by which intervals are randomized. Default value: 0.2.

    • MAX_BACKOFF: specifies the upper limit of an interval. Default value: 120 seconds.

    • MIN_CONNECT_TIMEOUT: specifies the minimum interval. Default value: 20 seconds.

    The following algorithm is recommended:

    ConnectWithBackoff()
      current_backoff = INITIAL_BACKOFF
      current_deadline = now() + INITIAL_BACKOFF
      while (TryConnect(Max(current_deadline, now() + MIN_CONNECT_TIMEOUT)) != SUCCESS)
        SleepUntil(current_deadline)
        current_backoff = Min(current_backoff * MULTIPLIER, MAX_BACKOFF)
        current_deadline = now() + current_backoff + UniformRandom(-JITTER * current_backoff, JITTER * current_backoff)

    For more information, see connection-backoff.md.
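
    To make the growth of the retry interval concrete, the following snippet computes the first few intervals that the parameters above produce. It is only an illustration of the algorithm with the default values (without jitter, the sequence is 1 s, 1.6 s, 2.56 s, 4.1 s, ...); it is not part of the SDK.

      // Illustrative only: prints randomized retry intervals produced by the
      // exponential backoff parameters described above.
      public class BackoffIntervals {
          public static void main(String[] args) {
              double backoff = 1.0;          // INITIAL_BACKOFF, in seconds
              final double multiplier = 1.6; // MULTIPLIER
              final double jitter = 0.2;     // JITTER
              final double maxBackoff = 120; // MAX_BACKOFF, in seconds
              for (int attempt = 1; attempt <= 5; attempt++) {
                  double randomized = backoff + (Math.random() * 2 - 1) * jitter * backoff;
                  System.out.printf("retry %d: ~%.2f s%n", attempt, randomized);
                  backoff = Math.min(backoff * multiplier, maxBackoff);
              }
          }
      }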

Feature limits

  • Link blocking evaluation: In the message sending retry mechanism of ApsaraMQ for RocketMQ, a producer can configure only the maximum number of retries. If a system exception triggers the built-in retry logic in the SDK, the caller must wait for the final retry result. This may block the request link. Therefore, you must evaluate the timeout duration and the maximum number of retries for each request to prevent retries from blocking the request link.

  • Handling of exceptions: The built-in message sending retry mechanism of an ApsaraMQ for RocketMQ client does not ensure that a failed message is successfully sent. If a message still fails to be sent in the last retry, the caller must capture the exception and provide redundancy protection to prevent inconsistency in message sending results.

  • Duplicated messages: If an ApsaraMQ for RocketMQ producer client re-sends a message due to request timeout, the client does not know the processing result of the message on the broker. As a result, duplicate messages may exist on the broker. Make sure that your message consumption logic can properly handle duplicated messages.
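
The following listener sketch shows one way to make message consumption tolerant of duplicated messages. It assumes the MessageListener interface of the Apache RocketMQ 5.x Java client (rocketmq-client-java); the in-memory set is a placeholder, and a production system would typically deduplicate against a shared store (for example, a database or Redis) keyed by a business-level unique ID rather than the message ID alone.

  import java.util.Collections;
  import java.util.Set;
  import java.util.concurrent.ConcurrentHashMap;
  import org.apache.rocketmq.client.apis.consumer.ConsumeResult;
  import org.apache.rocketmq.client.apis.consumer.MessageListener;
  import org.apache.rocketmq.client.apis.message.MessageView;

  public class DeduplicatingListener implements MessageListener {
      // In-memory deduplication set, for illustration only.
      private final Set<String> processedMessageIds =
              Collections.newSetFromMap(new ConcurrentHashMap<>());

      @Override
      public ConsumeResult consume(MessageView messageView) {
          String messageId = messageView.getMessageId().toString();
          if (!processedMessageIds.add(messageId)) {
              // Duplicate delivery caused by a sending retry: acknowledge and skip.
              return ConsumeResult.SUCCESS;
          }
          // ... business logic goes here ...
          return ConsumeResult.SUCCESS;
      }
  }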

Message throttling

What is message throttling?

In ApsaraMQ for RocketMQ, message throttling is triggered to prevent excessively high workloads on underlying resources when the system capacity becomes insufficient or the system usage exceeds the specified threshold. When message throttling is triggered, the ApsaraMQ for RocketMQ broker immediately fails the request and returns the system throttling error to the client.

Trigger conditions

Message throttling can be triggered in ApsaraMQ for RocketMQ in the following scenarios:

  • High storage pressure: By default, a consumer group starts to consume messages from the maximum offset of the queue. For more information, see Consumer progress management. In scenarios such as business rollouts, a consumer group may need to backtrack and consume messages from a specified point in time. In this case, the storage pressure on the queue surges, and message throttling is triggered.

  • Excessive unconsumed messages on the broker: If consumers cannot consume messages at the same rate at which the messages are sent, a large number of messages may be accumulated in the queue. If the number of accumulated messages exceeds the threshold, message throttling is triggered to alleviate workload on the downstream system.

Behavior

When message throttling is triggered, the client receives the system throttling error. The following items describe the error codes and error messages received by different types of clients:

  • gRPC

    • Error code: 530

    • Error message keyword: TOO_MANY_REQUESTS

    When a client receives the system throttling error code, the client retries the failed message based on the interval specified in the exponential backoff retry policy. For more information, see Message sending retry.

  • Remoting

    • Error code: 215

    • Error message keyword: messages flow control

    If a client uses ApsaraMQ for RocketMQ TCP client SDK for Java of a version earlier than 1.9.0.Final, the client does not retry the failed message after it receives the system throttling error code. If the client uses version 1.9.0.Final or later, the client retries the failed message based on the exponential backoff retry policy. For more information, see Message sending retry.

    When a producer client that uses an open source Apache RocketMQ SDK receives the system throttling error code, the client does not retry the failed message. When a consumer client that uses an open source Apache RocketMQ SDK receives the system throttling error code, the client retries the failed message based on the exponential backoff retry policy.

Note

For information about the supported versions of gRPC and Remoting clients, see SDK compatibility description.

Suggestions

  • How to prevent throttling from being triggered: Use the observability feature to monitor the system usage and capacity. This helps ensure that underlying resources are sufficient.

  • How to handle unexpected message throttling: If unexpected message throttling is triggered and the built-in retry process fails in the client, we recommend that you temporarily switch requests to another system.
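
If the built-in retries are exhausted because throttling persists, the send call ultimately fails with an exception that the application can use to switch to a fallback path. The following sketch assumes the Apache RocketMQ 5.x Java client; sendToFallbackSystem is a hypothetical method that stands in for whatever degraded path (another queue, another region, or local storage) your application provides.

  import org.apache.rocketmq.client.apis.ClientException;
  import org.apache.rocketmq.client.apis.message.Message;
  import org.apache.rocketmq.client.apis.producer.Producer;
  import org.apache.rocketmq.client.apis.producer.SendReceipt;

  public class ThrottlingFallback {
      // Tries the normal send path first; if the SDK's built-in retries are
      // exhausted, hands the message to a fallback path instead of failing
      // the business request outright.
      public static void sendWithFallback(Producer producer, Message message) {
          try {
              SendReceipt receipt = producer.send(message);
              System.out.println("Sent, message ID: " + receipt.getMessageId());
          } catch (ClientException e) {
              // The SDK has already retried; treat this as a persistent failure.
              sendToFallbackSystem(message); // hypothetical degraded path
          }
      }

      private static void sendToFallbackSystem(Message message) {
          // Placeholder: write to local storage, another queue, or another system.
      }
  }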
