If a consumer encounters an exception, ApsaraMQ for RocketMQ redelivers the message based on a consumption retry policy to enable fault recovery. This topic describes the scenarios, principles, version compatibility, and recommendations for the message consumption retry feature.
Scenarios
The message consumption retry feature of ApsaraMQ for RocketMQ primarily addresses consumption integrity issues that are caused by business logic failures. This feature is a fallback strategy for your business and should not be used for business flow control.
- We recommend that you use message retry in the following scenarios:
  - Business processing fails due to reasons related to the current message content. For example, the transaction resolution for the message is not yet available, but you expect it to succeed after a short period.
  - The cause of the consumption failure is not persistent. This means the failure is a low-probability event for the current message, not a regular occurrence. Subsequent messages are likely to be consumed successfully. In this case, you can retry consuming the current message to avoid blocking the process.
- We do not recommend that you use message retry in the following scenarios:
  - Using consumption failure for conditional branching in your processing logic is not a recommended practice. This is because the processing logic already anticipates that this branch will be taken frequently.
  - Using consumption failure for rate limiting in your processing logic is not a recommended practice. The purpose of rate limiting is to temporarily stack excess messages in the queue for peak shaving, not to send them to the retry process.
Purpose
A typical problem in asynchronous decoupling with middleware is ensuring the integrity of the entire call chain if a downstream service fails to process a message event. As a financial-grade and reliable business messaging middleware, ApsaraMQ for RocketMQ is designed with a reliable transmission policy. It uses comprehensive acknowledgment and retry mechanisms to ensure that every message is processed as your business expects.
Understanding the message acknowledgment mechanism and consumption retry policy of ApsaraMQ for RocketMQ can help you address the following issues:
- How to ensure complete message processing for your business: Understanding the consumption retry policy helps you ensure the integrity of each message when you design and implement consumer logic. This prevents messages from being ignored when exceptions occur, which could lead to inconsistent business states.
- How to recover the state of in-process messages during system exceptions: Understanding the policy helps you determine how messages that are being processed recover their state when a system exception such as a crash occurs, and whether state inconsistencies can arise.
Consumption retry policy
A consumption retry policy defines the retry interval and the maximum number of retries for a message after a consumer fails to consume it.
Triggers for message retry
- Consumption fails. This includes cases where the consumer returns a failure status or throws an unexpected exception.
- Message processing times out. This includes queuing timeouts in a PushConsumer.
Main behaviors of message retry
- Retry process state machine: Controls the state and transition logic of a message during the retry process.
- Retry interval: The time between the last consumption failure or timeout and when the message can be consumed again.
- Maximum number of retries: The maximum number of times a message can be retried.
Differences in retry policies
The internal mechanisms and configuration methods of the message retry policy vary depending on the consumer type. The following table describes the differences.
| Consumer type | Retry process state machine | Retry interval | Maximum number of retries |
| --- | --- | --- | --- |
| PushConsumer | Ready, Inflight, WaitingRetry, Commit, DLQ, Discard | Controlled by the metadata of the consumer group at creation. | Set in the console or using an OpenAPI operation. |
| SimpleConsumer | Ready, Inflight, Commit, DLQ, Discard | Modified using an API operation to change the invisibility duration when receiving messages. | Set in the console or using an OpenAPI operation. |
For more information about specific retry policies, see PushConsumer consumption retry policy and SimpleConsumer consumption retry policy.
PushConsumer consumption retry policy
Retry state machine
When a PushConsumer consumes a message, the message goes through the following main states:
- Ready: The message is ready on the ApsaraMQ for RocketMQ server and can be consumed by a consumer.
- Inflight: The message has been received by the consumer client and is being processed. The consumption result has not been returned.
- WaitingRetry: This state is unique to PushConsumer. When message processing fails or times out, the consumption retry logic is triggered. If the current number of retries has not reached the maximum, the message enters the WaitingRetry state. After the retry interval elapses, the message returns to the Ready state and can be consumed again. The interval between retries can grow with each attempt to prevent frequent, futile retries.
- Commit: The message has been consumed successfully. The consumer returns a success response, which ends the message's state machine.
- DLQ: The dead-letter state, which is the final fallback mechanism for consumption logic. If a message fails to be consumed after the maximum number of retries and the dead-letter message feature is enabled, the failed message is delivered to a dead-letter topic. You can then consume messages from the dead-letter topic to perform business recovery. For more information, see Dead-letter messages.
- Discard: If a message fails to be consumed after the maximum number of retries and the dead-letter message feature is disabled, the failed message is discarded.

For example, the preceding figure shows the retry process for a message. Assume that the message stays in the Ready state for 5 s and the processing time is 6 s.
Each time the message is retried, its state changes from Ready to Inflight to WaitingRetry. The retry interval is the time between the last consumption failure or timeout and the time when the message can be consumed again. The actual time between two consumption attempts also includes the processing time and the duration in the Ready state. For example:
- The message enters the Ready state at 0 s for the first consumption attempt.
- Due to the consumer's processing speed, the message is not pulled for consumption until 5 s has passed. After 6 s of processing, an exception occurs, and the client returns a consumption failure at 11 s.
- The message cannot be retried immediately. It must wait for the retry interval to elapse before it can be consumed again.
- At 21 s, after a retry interval of 10 s, the message returns to the Ready state.
- After another 5 s, the client starts to consume the message again.
Therefore, the actual interval between two consumption attempts is: Processing time + Retry interval + Duration in Ready state = 21 s.
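The arithmetic in this example can be written out as a short sketch. The class and method names below are hypothetical and are not part of any RocketMQ SDK. The 10 s retry interval is an assumption derived from the timeline above (failure at 11 s, back to Ready at 21 s).

```java
import java.time.Duration;

// Hypothetical helper for illustration only; not an SDK class.
final class RetryTimeline {
    /**
     * Time between the start of one consumption attempt and the start of the next:
     * processing time + retry interval + time spent waiting in the Ready state.
     */
    static Duration gapBetweenAttempts(Duration processing, Duration retryInterval, Duration readyWait) {
        return processing.plus(retryInterval).plus(readyWait);
    }
}
```

With the values from the example, `RetryTimeline.gapBetweenAttempts(Duration.ofSeconds(6), Duration.ofSeconds(10), Duration.ofSeconds(5))` evaluates to 21 s.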
Retry interval
- Unordered messages: The retry interval uses a tiered schedule. The following table lists the specific intervals.

  | Retry attempt | Retry interval | Retry attempt | Retry interval |
  | --- | --- | --- | --- |
  | 1 | 10 seconds | 9 | 7 minutes |
  | 2 | 30 seconds | 10 | 8 minutes |
  | 3 | 1 minute | 11 | 9 minutes |
  | 4 | 2 minutes | 12 | 10 minutes |
  | 5 | 3 minutes | 13 | 20 minutes |
  | 6 | 4 minutes | 14 | 30 minutes |
  | 7 | 5 minutes | 15 | 1 hour |
  | 8 | 6 minutes | 16 | 2 hours |

  Note: If the number of retries exceeds 16, the retry interval is 2 hours for each subsequent attempt.
- Ordered messages: The retry interval is fixed. For more information about the value, see Parameter limits.
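As a minimal sketch, the tiered schedule for unordered messages can be expressed as a lookup that caps at the last tier. The class `TieredRetrySchedule` is hypothetical and only mirrors the table above. The actual schedule is applied by the server, not by client code.

```java
import java.time.Duration;

// Hypothetical lookup that mirrors the documented schedule; not an SDK class.
final class TieredRetrySchedule {
    // Intervals for retry attempts 1 through 16, copied from the table above.
    private static final Duration[] INTERVALS = {
        Duration.ofSeconds(10), Duration.ofSeconds(30),
        Duration.ofMinutes(1), Duration.ofMinutes(2), Duration.ofMinutes(3), Duration.ofMinutes(4),
        Duration.ofMinutes(5), Duration.ofMinutes(6), Duration.ofMinutes(7), Duration.ofMinutes(8),
        Duration.ofMinutes(9), Duration.ofMinutes(10), Duration.ofMinutes(20), Duration.ofMinutes(30),
        Duration.ofHours(1), Duration.ofHours(2)
    };

    /** Returns the interval before the given retry attempt (numbered from 1). */
    static Duration intervalFor(int retryAttempt) {
        if (retryAttempt < 1) {
            throw new IllegalArgumentException("retry attempts are numbered from 1");
        }
        // Attempts beyond the 16th reuse the last tier (2 hours).
        int index = Math.min(retryAttempt, INTERVALS.length) - 1;
        return INTERVALS[index];
    }
}
```

For example, `TieredRetrySchedule.intervalFor(3)` yields 1 minute, and any attempt number above 16 yields 2 hours.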
Maximum number of retries
Default: 16.
Maximum: 1,000.
The maximum number of retries for a PushConsumer is controlled by the metadata of the consumer group. For more information about how to modify this value, see Modify the maximum number of retries.
For example, if the maximum number of retries is 3, the message can be delivered up to 4 times: 1 delivery of the original message and 3 retries.
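The PushConsumer states and the retry limit described in this section can be modeled roughly as follows. This is an illustrative sketch under stated assumptions: `PushRetryStateMachine` and its methods are hypothetical names, and the real state transitions are managed by the server and the SDK.

```java
// Hypothetical model of the PushConsumer retry states; not an SDK class.
final class PushRetryStateMachine {
    enum State { READY, INFLIGHT, WAITING_RETRY, COMMIT, DLQ, DISCARD }

    private final int maxRetries;
    private final boolean deadLetterEnabled;
    private State state = State.READY;
    private int retries = 0;
    private int deliveries = 0;

    PushRetryStateMachine(int maxRetries, boolean deadLetterEnabled) {
        this.maxRetries = maxRetries;
        this.deadLetterEnabled = deadLetterEnabled;
    }

    /** READY -> INFLIGHT: the consumer client receives the message. */
    void deliver() {
        require(State.READY);
        deliveries++;
        state = State.INFLIGHT;
    }

    /** INFLIGHT -> COMMIT on success; on failure or timeout -> WAITING_RETRY, DLQ, or DISCARD. */
    void complete(boolean success) {
        require(State.INFLIGHT);
        if (success) {
            state = State.COMMIT;
        } else if (retries < maxRetries) {
            retries++;
            state = State.WAITING_RETRY;
        } else {
            state = deadLetterEnabled ? State.DLQ : State.DISCARD;
        }
    }

    /** WAITING_RETRY -> READY: the retry interval has elapsed. */
    void retryIntervalElapsed() {
        require(State.WAITING_RETRY);
        state = State.READY;
    }

    State state() { return state; }

    int deliveries() { return deliveries; }

    private void require(State expected) {
        if (state != expected) {
            throw new IllegalStateException("expected " + expected + " but was " + state);
        }
    }
}
```

With `maxRetries` set to 3 and every attempt failing, the model delivers the message 4 times (1 original delivery and 3 retries) before it ends in DLQ or Discard.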
Usage example
To trigger a message retry for a PushConsumer, you can return a consumption failure status code. The SDK also catches unexpected exceptions.
// Sample code: Use a PushConsumer to consume normal messages. If consumption fails, return a failure result to trigger a retry.
MessageListener messageListener = new MessageListener() {
    @Override
    public ConsumeResult consume(MessageView messageView) {
        System.out.println(messageView);
        // Return a consumption failure to automatically trigger a retry until the maximum number of retries is reached.
        return ConsumeResult.FAILURE;
    }
};
View consumption retry logs
Retries for ordered consumption by a PushConsumer occur on the consumer client. The server cannot retrieve detailed logs for consumption retries. If the delivery result of an ordered message in the message trace is "failed", check the consumer client logs for information such as the maximum number of retries and the consumer client.
For more information about the consumer client log path, see Log configurations.
You can search for the following keywords in the client logs to quickly locate content related to consumption failures:
Message listener raised an exception while consuming messages
Failed to consume fifo message finally, run out of attempt times
SimpleConsumer consumption retry policy
Retry state machine
When a SimpleConsumer consumes a message, the message goes through the following main states:
- Ready: The message is ready on the ApsaraMQ for RocketMQ server and can be consumed by a consumer.
- Inflight: The message has been received by the consumer client and is being processed. The consumption result has not been returned.
- Commit: The message has been consumed successfully. The consumer returns a success response, which ends the message's state machine.
- DLQ: The dead-letter state, which is the final fallback mechanism for consumption logic. If a message fails to be consumed after the maximum number of retries and the dead-letter message feature is enabled, the failed message is delivered to a dead-letter topic. You can then consume messages from the dead-letter topic to perform business recovery. For more information, see Dead-letter messages.
- Discard: If a message fails to be consumed after the maximum number of retries and the dead-letter message feature is disabled, the failed message is discarded.
Unlike the PushConsumer retry policy, the retry interval for a SimpleConsumer is pre-allocated. Each time a message is received, the consumer sets an invisibility duration parameter, InvisibleDuration, when it calls the API. This duration is the maximum processing time for the message. If a consumption failure triggers a retry, you do not need to set the next retry interval because the value of the invisibility duration parameter is reused.

Because the invisibility duration is pre-allocated, it may differ significantly from the actual message processing time in your business. You can use an API operation to modify the invisibility duration.
For example, if you set the maximum processing time to 20 ms but the message cannot be processed within that time, you can modify the message's invisibility duration to extend the processing time. This prevents the message from triggering the retry mechanism.
To modify the invisibility duration of a message, the following conditions must be met:
- Message processing has not timed out.
- The consumption status of the message has not been committed.
As shown in the following figure, the change to the invisibility duration takes effect immediately. The invisibility duration is recalculated from the moment the API is called.
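The two conditions and the recalculation rule can be illustrated with a small model. `InvisibilityWindow` is a hypothetical class written for this sketch. It does not call the SDK, and it assumes that the invisibility deadline is the only timeout involved.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical model of a received message's invisibility window; not an SDK class.
final class InvisibilityWindow {
    private Instant visibleAt;          // when the message can be received again
    private boolean committed = false;  // whether the consumption result was committed

    InvisibilityWindow(Instant receivedAt, Duration invisibleDuration) {
        this.visibleAt = receivedAt.plus(invisibleDuration);
    }

    void commit() {
        committed = true;
    }

    /**
     * Tries to change the invisibility duration at time {@code now}.
     * Returns false if processing has already timed out or the result is committed.
     */
    boolean changeInvisibleDuration(Instant now, Duration newDuration) {
        boolean timedOut = !now.isBefore(visibleAt);
        if (timedOut || committed) {
            return false;
        }
        // The new duration is counted from the moment of the call, not from the receive time.
        visibleAt = now.plus(newDuration);
        return true;
    }

    Instant visibleAt() {
        return visibleAt;
    }
}
```

For example, if a message is received at time 0 with a 20 ms duration and the duration is changed to 30 ms at 15 ms, the message stays invisible until 45 ms.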

Message retry interval
Message retry interval = Invisibility duration - Actual message processing time
The consumption retry interval for a SimpleConsumer is controlled by the message's invisibility duration. For example, if the invisibility duration is 30 ms and the message processing takes 10 ms before a failure response is returned, the next retry occurs after 20 ms. In this case, the message retry interval is 20 ms. If the message is not processed and no result is returned after 30 ms, the message times out and is retried immediately. In this case, the retry interval is 0 ms.
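The formula and both cases from the example can be checked with a short sketch. The helper below is hypothetical and only restates the formula. It treats a timeout as a retry interval of 0.

```java
import java.time.Duration;

// Hypothetical helper that restates the formula above; not an SDK class.
final class SimpleConsumerRetryInterval {
    /** Retry interval = invisibility duration - actual processing time, never below zero. */
    static Duration retryInterval(Duration invisibleDuration, Duration processingTime) {
        Duration remaining = invisibleDuration.minus(processingTime);
        // If processing used up the whole invisibility duration, the message is retried immediately.
        return remaining.isNegative() ? Duration.ZERO : remaining;
    }
}
```

With an invisibility duration of 30 ms, a failure returned after 10 ms gives a retry interval of 20 ms, and a message that is still unprocessed after 30 ms gives 0 ms.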
Maximum number of retries
Default: 16.
Maximum: 1,000.
The maximum number of retries for a SimpleConsumer is controlled by the metadata of the consumer group at creation. For more information about how to modify this value, see Modify the maximum number of retries.
For example, if the maximum number of retries is 3, the message can be delivered up to 4 times: 1 delivery of the original message and 3 retries.
Usage example
To trigger a message retry for a SimpleConsumer, you can wait for the message to time out.
// Sample code: Use a SimpleConsumer to consume normal messages. To trigger a retry, simply wait for the message to time out, and the server will automatically retry.
List<MessageView> messageViewList = null;
try {
    messageViewList = simpleConsumer.receive(10, Duration.ofSeconds(30));
    messageViewList.forEach(messageView -> {
        System.out.println(messageView);
        // If processing fails and you want the server to retry, simply ignore the message. You can try to receive it again after its invisibility duration expires.
    });
} catch (ClientException e) {
    // If the pull fails due to system throttling or other reasons, you need to initiate a new request to receive messages.
    e.printStackTrace();
}
Modify the maximum number of retries
You can modify the maximum number of consumption retries for PushConsumers and SimpleConsumers in the following ways.
1. If your client uses the Remoting protocol, the actual maximum number of retries is determined by the client-side settings. The configuration in the console does not take effect. If your client uses the gRPC protocol, the maximum number of retries is determined by the configuration in the console.
2. The consumption retry policies (tiered backoff and fixed interval) apply only to clients that use the gRPC protocol. They do not apply to clients that use the Remoting protocol.
gRPC SDK
- Call the UpdateConsumerGroup OpenAPI operation.
- Modify in the console. You can perform this operation as follows:
  - On the Instances page, click the name of the target instance.
  - In the navigation pane on the left, click Groups. On the Groups page, click Create Group.
Remoting SDK
- Modify using a Remoting SDK parameter: Modify the value of the `maxReconsumeTimes` property of the consumer.
Recommendations
Use retries appropriately and do not trigger them for purposes such as rate limiting
As mentioned in the Scenarios section, message retry is suitable for scenarios where business processing fails and the current consumption failure is a low-probability event. It is not suitable for scenarios with persistent failures, such as rate limiting.
- Incorrect example: If the current consumption speed is too high and triggers rate limiting, return a consumption failure and wait for the next retry.
- Correct example: If the current consumption speed is too high and triggers rate limiting, delay receiving messages and consume them later.
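One way to follow the correct example is to pace the receive calls on the client instead of failing messages. The sketch below is an illustration under stated assumptions: `ReceivePacer` is a hypothetical fixed-window limiter, not an SDK feature, and the limit and window size are values you would choose yourself.

```java
import java.time.Duration;

// Hypothetical fixed-window pacer for receive calls; not an SDK class.
final class ReceivePacer {
    private final int maxMessagesPerWindow;
    private final Duration window;
    private long windowStartMillis;
    private int consumedInWindow = 0;

    ReceivePacer(int maxMessagesPerWindow, Duration window, long nowMillis) {
        this.maxMessagesPerWindow = maxMessagesPerWindow;
        this.window = window;
        this.windowStartMillis = nowMillis;
    }

    /** Records consumed messages and returns how long to wait before the next receive call. */
    Duration pauseBeforeNextReceive(int justConsumed, long nowMillis) {
        if (nowMillis - windowStartMillis >= window.toMillis()) {
            // The previous window is over, so counting starts again.
            windowStartMillis = nowMillis;
            consumedInWindow = 0;
        }
        consumedInWindow += justConsumed;
        if (consumedInWindow < maxMessagesPerWindow) {
            return Duration.ZERO;
        }
        // Limit reached: wait until the current window ends instead of returning a failure.
        return Duration.ofMillis(windowStartMillis + window.toMillis() - nowMillis);
    }
}
```

The caller sleeps for the returned duration before it receives again. Messages that are not yet received stay in the queue and do not use up any retry attempts.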
Message retry FAQ
How do I set the consumption timeout?
The consumption timeout is set on the consumer client. The following sections describe the parameter settings.
gRPC protocol
- SimpleConsumer: The timeout can be set to a maximum of 12 hours and a minimum of 10 seconds.
  The following code provides an example:
  private long minInvisiableTimeMillsForRecv = Duration.ofSeconds(10).toMillis();
  private long maxInvisiableTimeMills = Duration.ofHours(12).toMillis();
- PushConsumer: The default timeout is 230 minutes and cannot be modified.
Remoting protocol
consumer.setConsumeTimeout(15); // The unit is minutes. The minimum value is 1 minute, and the maximum value is 180 minutes.