When a consumer writes messages to a network file system (NFS) synchronously in the main poll loop, every write delays the next poll. This slows message processing and can stall consumption entirely.
Why NFS blocks consumption
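Kafka consumers use a pull-based model: the consumer calls poll() to fetch a batch of messages, processes them, and calls poll() again. Any slow operation inside the poll loop -- such as a synchronous NFS write -- delays the next poll() call and reduces throughput. If the gap between two poll() calls exceeds the consumer's max.poll.interval.ms setting, the consumer leaves its group and its partitions are rebalanced, so it consumes nothing until it rejoins.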
NFS compounds this problem in two ways:
NFS is slower than local storage. Write latency on a shared network file system is higher than on a locally attached disk, which directly increases per-message processing time.
Multiple consumers compete for NFS resources. NFS supports simultaneous access from multiple consumers, but each consumer contends for the same network bandwidth and disk I/O. The more consumers share the NFS, the worse each consumer performs.
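To make the cost concrete, the following back-of-the-envelope sketch estimates how many messages per second a single synchronous poll loop can sustain. The function name and every latency figure are illustrative assumptions, not measurements of any particular NFS deployment.

```python
def sync_loop_throughput(batch_size, fetch_s, write_s_per_msg):
    """Estimate messages/second for a loop that writes every message
    synchronously before it calls poll() again.

    All arguments are hypothetical inputs:
      batch_size      -- messages returned by one poll()
      fetch_s         -- seconds spent inside one poll() call
      write_s_per_msg -- latency of one synchronous write, in seconds
    """
    loop_time_s = fetch_s + batch_size * write_s_per_msg
    return batch_size / loop_time_s


# Assumed figures: 500 messages per poll, 50 ms per poll() call.
local_disk = sync_loop_throughput(500, 0.050, 0.0002)  # 0.2 ms per write
nfs = sync_loop_throughput(500, 0.050, 0.005)          # 5 ms per write

print(f"local disk: {local_disk:.0f} msg/s")
print(f"NFS:        {nfs:.0f} msg/s")
```

Under these assumed numbers, one pass through the loop takes about 0.15 s with local writes and about 2.55 s with NFS writes, so the write latency, not the fetch, dominates the time between poll() calls.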
Decouple consumption from storage
Two approaches address this problem. Use them independently or together.
Separate consumption and storage into two threads (recommended)
Decouple message pulling from message storage by using two independent threads:
Consumption thread -- Calls poll(), processes messages, and places results into an in-memory queue (such as BlockingQueue in Java or queue.Queue in Python).
Storage thread -- Reads from the in-memory queue and writes results to NFS.
This design prevents NFS writes from blocking the consumption thread, which continues pulling messages at full speed.
+-----------------------+      +----------------+      +-----------------------+
|  Consumption thread   |      |   In-memory    |      |    Storage thread     |
|                       |----->|     queue      |----->|                       |
|   poll() + process    |      | BlockingQueue  |      |     Write to NFS      |
+-----------------------+      +----------------+      +-----------------------+

Monitor the in-memory queue size. If the storage thread cannot keep up, the queue grows unbounded. Set a maximum queue capacity and define a backpressure strategy, such as blocking the consumption thread when the queue is full.
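The two-thread design with a bounded queue can be sketched in Python as follows. This is a minimal illustration, not a production consumer: poll_fn, process_fn, and write_fn are hypothetical stand-ins for the Kafka client's poll(), the application's processing step, and the NFS write. The stop sentinel and the max_queue value are likewise assumptions made for the example.

```python
import queue
import threading

_STOP = object()  # sentinel that tells the storage thread to exit


def consume(poll_fn, process_fn, results):
    """Consumption thread: poll, process, enqueue.

    poll_fn is a hypothetical stand-in for the Kafka client's poll().
    Here it returns a list of messages, or None when the demo input is
    exhausted (a real consumer would keep polling).
    """
    while True:
        batch = poll_fn()
        if batch is None:
            break
        for message in batch:
            # put() blocks while the queue is full. This is the
            # backpressure strategy: the consumer waits for the writer.
            results.put(process_fn(message))
    results.put(_STOP)


def store(results, write_fn):
    """Storage thread: drain the queue and perform the slow writes."""
    while True:
        item = results.get()
        if item is _STOP:
            break
        write_fn(item)  # for example, append to a file on the NFS mount


def run_pipeline(poll_fn, process_fn, write_fn, max_queue=1000):
    """Run both threads until the input is exhausted and the queue drained."""
    results = queue.Queue(maxsize=max_queue)  # bounded on purpose
    consumer = threading.Thread(target=consume, args=(poll_fn, process_fn, results))
    writer = threading.Thread(target=store, args=(results, write_fn))
    consumer.start()
    writer.start()
    consumer.join()
    writer.join()


# Demo with in-memory stand-ins for the Kafka client and the NFS write.
batches = iter([[1, 2, 3], [4, 5], None])
written = []
run_pipeline(lambda: next(batches), lambda m: m * 10, written.append, max_queue=2)
print(written)  # [10, 20, 30, 40, 50]
```

Because results waiting in the queue exist only in memory, a consumer that must not lose data would need to commit offsets only after the storage thread has written the corresponding results. That coordination is outside the scope of this sketch.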
Use local cloud disks with asynchronous NFS sync
Attach an ultra disk or solid-state drive (SSD) to each consumer and write results to local storage instead of directly to NFS. Then use a separate asynchronous thread or tool to sync data from local cloud disks to NFS in the background.
This approach provides two benefits:
Eliminates NFS contention during consumption. Each consumer writes to its own local disk, so consumers no longer compete for shared NFS resources.
Prevents synchronous NFS writes from blocking the poll loop. The asynchronous sync process runs independently, so NFS latency does not affect message processing.
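One way to picture this approach is a local spool directory that a background thread drains to NFS. The sketch below assumes results are written as individual files. The function names, the .tmp naming convention, and the sync interval are all hypothetical choices for illustration, and temporary directories stand in for the local cloud disk and the NFS mount.

```python
import os
import shutil
import tempfile
import threading
from pathlib import Path


def write_local(spool_dir, name, data):
    """Called from the poll loop: write to the local disk only.

    The file is written under a temporary name and renamed once complete,
    so the sync thread never picks up a half-written file.
    """
    tmp = Path(spool_dir) / (name + ".tmp")
    tmp.write_bytes(data)
    os.replace(tmp, Path(spool_dir) / name)  # atomic within one file system


def sync_once(spool_dir, nfs_dir):
    """Copy every finished file to NFS, then delete the local copy."""
    copied = []
    for path in sorted(Path(spool_dir).iterdir()):
        if path.suffix == ".tmp" or not path.is_file():
            continue  # still being written, or not a regular file
        shutil.copy2(path, Path(nfs_dir) / path.name)
        path.unlink()
        copied.append(path.name)
    return copied


def sync_forever(spool_dir, nfs_dir, stop, interval_s=5.0):
    """Background thread body: sync periodically until stop is set."""
    while not stop.is_set():
        sync_once(spool_dir, nfs_dir)
        stop.wait(interval_s)
    sync_once(spool_dir, nfs_dir)  # final pass on shutdown


# Demo: temporary directories stand in for the local disk and the NFS mount.
with tempfile.TemporaryDirectory() as spool, tempfile.TemporaryDirectory() as nfs:
    stop = threading.Event()
    syncer = threading.Thread(target=sync_forever, args=(spool, nfs, stop, 0.1))
    syncer.start()
    write_local(spool, "batch-0001.dat", b"processed results")
    stop.set()
    syncer.join()
    synced = sorted(p.name for p in Path(nfs).iterdir())

print(synced)  # ['batch-0001.dat']
```

The poll loop only ever calls write_local(), so its latency depends on the local disk alone. The copy in sync_once() is not atomic on the NFS side, so readers of the shared directory would need their own way to detect completed files.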