A consumer subscribed to topics with incoming messages may pull messages slowly or not at all, even though the consumer has not caught up to the latest offset. This typically happens when consumption traffic exceeds the available network bandwidth, especially over the Internet.
Possible causes
Three bandwidth-related conditions cause slow message pulling:
| Cause | Description |
|---|---|
| Bandwidth saturation | Total consumption traffic from the instance has reached the network bandwidth limit |
| Oversized single message | An individual message is too large for the available network bandwidth to deliver promptly |
| Batch fetch exceeds bandwidth | The combined size of messages pulled in a single fetch request exceeds the available bandwidth |
Consumer configuration parameters
The following consumer configuration parameters control how much data the consumer pulls per request:
| Parameter | What it controls |
|---|---|
| max.poll.records | Maximum number of messages returned to the consumer by a single poll |
| fetch.max.bytes | Maximum number of bytes of messages that the consumer can pull in a single fetch request |
| max.partition.fetch.bytes | Maximum number of bytes of messages that the consumer can pull from a single partition in a single fetch request |
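To make the interaction between the two byte limits concrete, the following minimal sketch estimates the largest payload one fetch response can carry. The default values are those of the Apache Kafka Java consumer; other clients may use different defaults. The helper name `max_fetch_response_bytes` and the `partitions_per_fetch` argument are hypothetical, introduced only for illustration. The estimate is approximate: a single record batch larger than either limit can still be returned so that the consumer keeps making progress.

```python
# Defaults of the Apache Kafka Java consumer (other clients may differ).
DEFAULTS = {
    "max.poll.records": 500,
    "fetch.max.bytes": 50 * 1024 * 1024,           # 50 MiB
    "max.partition.fetch.bytes": 1 * 1024 * 1024,  # 1 MiB
}


def max_fetch_response_bytes(config, partitions_per_fetch):
    """Approximate upper bound, in bytes, on the payload of one fetch response.

    partitions_per_fetch is a hypothetical input: the number of partitions
    served by one fetch request.
    """
    per_partition_total = config["max.partition.fetch.bytes"] * partitions_per_fetch
    return min(config["fetch.max.bytes"], per_partition_total)


if __name__ == "__main__":
    print(max_fetch_response_bytes(DEFAULTS, partitions_per_fetch=12))  # 12582912
```

With the defaults, max.partition.fetch.bytes is the binding limit until a fetch covers 50 or more partitions; beyond that, fetch.max.bytes caps the response.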
Diagnose and resolve the issue
Step 1: Confirm that messages exist in the topic
Log on to the ApsaraMQ for Kafka console.
Query messages for the target topic.
If no messages are returned, the issue is on the producer side rather than the consumer side. The remaining steps apply only when messages exist but the consumer cannot pull them fast enough.
Step 2: Check whether consumption traffic has reached the bandwidth limit
In the left-side navigation pane of the Instances page, choose Observability > CloudMonitor.
Click the Monitoring Chart tab.
Locate the instance_internet_rx.rate(bit/s) chart and check whether consumption traffic has reached the bandwidth ceiling.
If traffic is at the limit, increase the instance network bandwidth.
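The comparison in this step can be sketched as a small calculation. This is an illustrative helper, not part of the console or any SDK: the function names, the 90% threshold, and the assumption that 1 Mbit/s equals 1,000,000 bit/s are all hypothetical choices. The observed rate is the value read manually from the monitoring chart, in bit/s.

```python
def bandwidth_utilization(observed_bps, limit_mbps):
    """Return observed traffic as a fraction of the bandwidth limit.

    observed_bps: rate read from the monitoring chart, in bit/s.
    limit_mbps:   purchased bandwidth in Mbit/s (assumed 1 Mbit/s = 1,000,000 bit/s).
    """
    limit_bps = limit_mbps * 1_000_000
    return observed_bps / limit_bps


def is_saturated(observed_bps, limit_mbps, threshold=0.9):
    """Treat traffic at or above the (hypothetical) threshold as saturated."""
    return bandwidth_utilization(observed_bps, limit_mbps) >= threshold


if __name__ == "__main__":
    # Example: chart shows 19.4 Mbit/s on a 20 Mbit/s instance.
    print(is_saturated(19_400_000, 20))
```

A sustained value close to 1.0 over the observation window, rather than a brief spike, is the signal that the bandwidth limit is the bottleneck.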
Step 3: Check whether a single message exceeds the bandwidth
Check whether any individual message in the topic is large enough to saturate the available bandwidth on its own.
If so, increase the network bandwidth or reduce the message size at the producer -- for example, compress payloads or split large messages into smaller ones.
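One way to judge whether a message is "large enough" is to estimate how long it would take to transfer at the available bandwidth. The sketch below is a back-of-the-envelope calculation under stated assumptions: the function name is hypothetical, 1 Mbit/s is taken as 1,000,000 bit/s, and protocol overhead and competing traffic are ignored, so real transfers take longer.

```python
def transfer_seconds(message_bytes, bandwidth_mbps):
    """Estimate the seconds needed to transfer one message at full bandwidth.

    Ignores protocol overhead and any other traffic sharing the link.
    """
    bits = message_bytes * 8
    return bits / (bandwidth_mbps * 1_000_000)


if __name__ == "__main__":
    # Example: a 10 MiB message over a 20 Mbit/s link.
    seconds = transfer_seconds(10 * 1024 * 1024, 20)
    print(f"{seconds:.2f} s")
```

If the estimate for the largest message approaches or exceeds the time the consumer is willing to wait for a fetch response, the message size is a likely cause.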
Step 4: Reduce the batch fetch size
If neither a single oversized message nor overall bandwidth saturation is the cause, the combined size of messages in a single fetch request may be exceeding the bandwidth limit. Adjust the following parameters:
fetch.max.bytes -- Set this to a value, in bytes, lower than the network bandwidth. Bandwidth is usually reported in bit/s, so divide it by 8 before comparing.
max.partition.fetch.bytes -- Set this to a value lower than the per-partition limit, calculated as:
limit = network bandwidth / number of partitions the consumer subscribes to
The meaning of "network bandwidth" depends on how the consumer connects to the broker:
Through a virtual private cloud (VPC): network bandwidth refers to the maximum write traffic of elastic network interfaces (ENIs) on the instance.
Over the Internet: network bandwidth refers to the Internet bandwidth of the instance.
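The sizing rules in this step can be sketched as follows. This is a minimal illustration, not an official formula: the function name, the 80% headroom, and the conversion of 1 Mbit/s to 1,000,000 bit/s are assumptions. The result uses the Java-style property names; adapt the keys to the naming of your client library.

```python
def fetch_limits(bandwidth_mbps, partitions, headroom_percent=80):
    """Suggest fetch sizes that stay below the bandwidth-derived limits.

    bandwidth_mbps:   network bandwidth in Mbit/s (integer).
    partitions:       number of partitions the consumer subscribes to.
    headroom_percent: hypothetical safety margin so values stay below the limit.
    """
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    # Convert bit/s to bytes per second, since both parameters are in bytes.
    bandwidth_bytes = bandwidth_mbps * 1_000_000 // 8
    fetch_max = bandwidth_bytes * headroom_percent // 100
    # Per-partition limit = network bandwidth / partitions, with the same margin.
    per_partition = fetch_max // partitions
    return {
        "fetch.max.bytes": fetch_max,
        "max.partition.fetch.bytes": per_partition,
    }


if __name__ == "__main__":
    # Example: 20 Mbit/s bandwidth, consumer subscribed to 10 partitions.
    print(fetch_limits(20, 10))
    # {'fetch.max.bytes': 2000000, 'max.partition.fetch.bytes': 200000}
```

Integer arithmetic is used throughout so the suggested values are exact byte counts that can be pasted into a consumer configuration.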