If a rebalance occurs on a consumer client, you can view the rebalance details, such as the start time, duration, and cause of the rebalance. You can also check the number of times that rebalances occurred and whether consumers are added to a consumer group after a rebalance.
Background information
In ApsaraMQ for Kafka, a rebalance is the process in which partitions are reassigned among the consumers in a consumer group. A rebalance may be triggered on a consumer client by any of the following causes:
The subscriptions of a consumer group change. For example, a consumer in the group subscribes to or unsubscribes from a topic.
The number of partitions in a topic changes.
The number of consumers in a consumer group increases or decreases.
Slow message processing causes consumer heartbeats to time out. A rebalance is triggered to remove the consumers whose heartbeats timed out from the group.
The client does not pull messages within the time specified by the max.poll.interval.ms parameter. As a result, the client leaves the consumer group and a rebalance is triggered. The default value of the max.poll.interval.ms parameter is 5 minutes.
The number of consumers in the consumer group is too large. To save topic and partition resources, some consumers need to be shut down, which triggers a rebalance.
The number of consumers in the consumer group is insufficient, which delays the consumption of messages in topics and partitions. To reduce the delay, more consumers must be added, which triggers a rebalance.
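To make "reassigning partitions among consumers" concrete, the following minimal sketch deals the partitions of a hypothetical topic named orders to the consumers of a group, then recomputes the assignment after a third consumer joins. It uses a simplified round-robin strategy chosen for illustration only. Real Kafka clients use configurable assignors (for example, range, round-robin, or sticky strategies), and this is not the exact algorithm that ApsaraMQ for Kafka or any client applies. The function names and consumer IDs are hypothetical.

```python
def assign_round_robin(partitions, consumers):
    """Deal sorted partitions to sorted consumers in turn (simplified round-robin)."""
    if not consumers:
        return {}
    ordered = sorted(consumers)
    assignment = {consumer: [] for consumer in ordered}
    for i, partition in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(partition)
    return assignment


def moved_partitions(before, after):
    """List the partitions whose owning consumer differs between two assignments."""
    owner_before = {p: c for c, ps in before.items() for p in ps}
    owner_after = {p: c for c, ps in after.items() for p in ps}
    return sorted(p for p, c in owner_after.items() if owner_before.get(p) != c)


# Hypothetical topic "orders" with six partitions, as (topic, partition) pairs.
partitions = [("orders", p) for p in range(6)]
before = assign_round_robin(partitions, ["consumer-a", "consumer-b"])
# A third consumer joins the group, which triggers a rebalance.
after = assign_round_robin(partitions, ["consumer-a", "consumer-b", "consumer-c"])

print(after)
print(moved_partitions(before, after))  # [('orders', 2), ('orders', 3), ('orders', 4), ('orders', 5)]
```

In this toy case, four of the six partitions change owners when one consumer joins. Large-scale movement of this kind is part of why frequent rebalances disrupt consumption. Sticky assignment strategies exist to reduce it.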
Procedure
Log on to the ApsaraMQ for Kafka console. In the left-side navigation pane, click Instances.
In the top navigation bar, select the region where the instance that you want to manage resides. On the Instances page, click the name of the instance that you want to manage.
In the left-side navigation pane, click Groups. On the page that appears, find the group that you want to manage and click the name of the group.
On the Group Details page, click the Rebalance Details tab.
Why do rebalances frequently occur on a client?
Possible causes
This issue may occur due to the following causes:
For clients earlier than version 0.10.2: The consumer does not have a separate thread for sending heartbeats. Heartbeats are sent only when the poll interface is called. If message processing is stuck, heartbeats time out and a rebalance occurs.
For clients of version 0.10.2 and later: The client does not pull messages within the time specified by the max.poll.interval.ms parameter. As a result, the client leaves the consumer group and a rebalance is triggered. The default value of the max.poll.interval.ms parameter is 5 minutes.
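The two causes above can be summarized as a simplified model: which timeout a long gap between two poll calls runs into depends on the client version. The following sketch assumes exactly the behavior described in this topic. It ignores process crashes and network faults, and the ConsumerSettings class and triggers_rebalance function are hypothetical names rather than part of any Kafka client API. Versions are written as tuples so that they compare in order.

```python
from dataclasses import dataclass


@dataclass
class ConsumerSettings:
    """Hypothetical holder for the two timeouts discussed in this topic."""
    session_timeout_ms: int = 10_000     # default stated in this topic for 0.10.2 and later
    max_poll_interval_ms: int = 300_000  # 5 minutes, the default stated above


def triggers_rebalance(poll_gap_ms, client_version, settings):
    """Simplified model: would a gap of poll_gap_ms between two poll() calls
    get the consumer removed from its group? Crashes and network faults are ignored."""
    if client_version >= (0, 10, 2):
        # A background thread keeps heartbeating, so only the poll interval limit applies.
        return poll_gap_ms > settings.max_poll_interval_ms
    # Older clients heartbeat only from poll(), so a long gap also means no heartbeats.
    return poll_gap_ms > settings.session_timeout_ms


settings = ConsumerSettings()
# A batch that takes 40 seconds to process before the next poll():
print(triggers_rebalance(40_000, (0, 10, 0), settings))  # True: exceeds session.timeout.ms
print(triggers_rebalance(40_000, (2, 8, 0), settings))   # False: within max.poll.interval.ms
print(triggers_rebalance(360_000, (2, 8, 0), settings))  # True: exceeds 5 minutes
```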
Solutions
You must understand the following parameters:
session.timeout.ms: specifies the heartbeat timeout period. If no heartbeat is received from a consumer within this period, the consumer is removed from the group and a rebalance is triggered. You can specify a custom value based on your business requirements.
max.poll.records: specifies the maximum number of messages returned by each poll.
For clients earlier than version 0.10.2: Heartbeats are sent by using the poll interface. The consumer does not have a separate thread for sending heartbeats.
For clients of version 0.10.2 and later: You can use the max.poll.interval.ms parameter to specify the maximum interval between two polls. If a client does not poll within this interval, the client is considered to have stopped consuming messages and is removed from the group.
Use the following guidelines to set the parameter values:
session.timeout.ms: For clients earlier than version 0.10.2, set a value larger than the time required to consume a batch of messages but smaller than 30 seconds. We recommend that you set the value to 25 seconds. For clients of version 0.10.2 and later, keep the default value of 10 seconds.
max.poll.records: Set a value far smaller than the number of messages consumed per thread per second × the number of consumer threads × the value of the max.poll.interval.ms parameter converted to seconds.
max.poll.interval.ms: Set a value larger than the value of max.poll.records / (the number of messages consumed per thread per second × the number of consumer threads). The result is in seconds, so convert it to milliseconds.
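The guidelines above can be turned into simple arithmetic. The following sketch assumes a workload in which one consumer thread processes 20 messages per second and max.poll.interval.ms keeps its 5-minute default. Both workload figures are illustrative, not measured values. The function names are hypothetical. For clients earlier than version 0.10.2, the sketch sticks to the recommended 25 seconds and rejects batches that take at least that long. Any value between the batch processing time and 30 seconds would also satisfy the guidance.

```python
def session_timeout_ms(client_version, batch_processing_ms):
    """Return a session.timeout.ms value per the guidance above, or None to keep
    the client default (clients of version 0.10.2 and later)."""
    if client_version >= (0, 10, 2):
        return None
    # Older clients: above the batch processing time, below 30 s; 25 s is recommended.
    if batch_processing_ms >= 25_000:
        raise ValueError("batch takes too long; lower max.poll.records first")
    return 25_000


def max_poll_records_ceiling(msgs_per_thread_per_sec, threads, max_poll_interval_ms):
    """Value that max.poll.records should stay far below."""
    return msgs_per_thread_per_sec * threads * (max_poll_interval_ms / 1000)


def max_poll_interval_floor_ms(max_poll_records, msgs_per_thread_per_sec, threads):
    """Value (in ms) that max.poll.interval.ms should stay above."""
    seconds = max_poll_records / (msgs_per_thread_per_sec * threads)
    return seconds * 1000


# Assumed workload: one thread that processes 20 messages per second.
print(max_poll_records_ceiling(20, 1, 300_000))  # 6000.0
print(max_poll_interval_floor_ms(500, 20, 1))    # 25000.0
print(session_timeout_ms((0, 10, 0), 12_000))    # 25000
```

With max.poll.records set to an assumed 500, one batch takes about 25 seconds to process. That is well within the 300,000 ms (5-minute) max.poll.interval.ms, and 500 is far smaller than the ceiling of 6,000 messages.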
Improve the consumption speed of the client and allocate a separate thread to process the consumption logic.
Make sure that each group subscribes to no more than five topics. We recommend that each group subscribe to only one topic.
Upgrade the client to version 0.10.2 or later.
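Allocating a separate thread for the consumption logic lets the polling loop hand messages off quickly and return to poll before max.poll.interval.ms elapses. The following is a minimal sketch of that pattern, assuming that a plain list of batches stands in for repeated poll calls. It does not use any Kafka client library, and the run and worker functions are hypothetical. A production consumer would also need to bound the hand-off queue (for example, by pausing partitions while the worker catches up) and commit offsets only after messages are processed. Both concerns are omitted here.

```python
import queue
import threading

STOP = object()  # sentinel that tells the worker thread to exit


def worker(handoff, processed):
    """Runs the consumption logic on its own thread, off the polling loop."""
    while True:
        message = handoff.get()
        if message is STOP:
            break
        processed.append(message.upper())  # placeholder for real processing


def run(batches):
    handoff = queue.Queue()  # unbounded for simplicity; see the note above
    processed = []
    thread = threading.Thread(target=worker, args=(handoff, processed))
    thread.start()
    for batch in batches:  # stands in for repeated consumer poll() calls
        for message in batch:
            handoff.put(message)  # hand off quickly so the loop can poll again
    handoff.put(STOP)
    thread.join()  # wait until every handed-off message is processed
    return processed


print(run([["a", "b"], ["c"]]))  # ['A', 'B', 'C']
```

A single worker reading from a FIFO queue preserves message order. If you add more worker threads, ordering within a partition is no longer guaranteed unless messages are routed to workers by partition.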