Partition skew occurs when partitions are unevenly distributed across the brokers in your ApsaraMQ for Kafka cluster. Symptoms include:
Uneven disk usage: Some disks run at high utilization while others remain underused, wasting disk performance and capacity.
Single-node throttling: When a high-traffic topic has more partitions on one node than on the others, that node can reach its throttling threshold first, which reduces overall cluster throughput.
Cause
Partition skew happens when a topic is created with a partition count that is not a multiple of the system-recommended number. Kafka distributes partitions across brokers in a round-robin fashion, so a non-aligned partition count results in some brokers holding more partitions than others.
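The effect of a non-aligned partition count can be illustrated with a minimal sketch. It assumes a simplified round-robin rule (partition i goes to broker i modulo the broker count) and a hypothetical three-broker cluster whose recommended number is 3. The actual placement logic of your instance, and the actual recommended number, may differ.

```python
from collections import Counter


def round_robin_assignment(partition_count: int, broker_count: int) -> Counter:
    """Count partitions per broker when partition i is placed on broker i % broker_count.

    Simplified model for illustration only; it ignores replicas and any
    starting offset the real assignment may use.
    """
    return Counter(p % broker_count for p in range(partition_count))


# Hypothetical cluster: 3 brokers, recommended number assumed to be 3.
skewed = round_robin_assignment(partition_count=10, broker_count=3)
aligned = round_robin_assignment(partition_count=12, broker_count=3)

print(sorted(skewed.items()))   # [(0, 4), (1, 3), (2, 3)]
print(sorted(aligned.items()))  # [(0, 4), (1, 4), (2, 4)]
```

With 10 partitions, broker 0 holds one more partition than the others. With 12 partitions, every broker holds the same number.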
Solution
To fix partition skew, adjust the partition count for the affected topic to a multiple of the recommended number.
Adding partitions does not redistribute existing data. If your producers use key-based partitioning (hash(key) % number_of_partitions), changing the partition count changes the key-to-partition mapping, so new messages for a key can land in a different partition than earlier messages for the same key. Consumers that rely on specific partition assignments may be affected. If your workload depends on key ordering, plan this change during a low-traffic window.
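To gauge how many keys are remapped, the following sketch applies the hash(key) % number_of_partitions rule before and after a change. It uses zlib.crc32 as a stand-in hash, which is an assumption; your producer client may use a different hash function. The key names and partition counts are hypothetical.

```python
import zlib


def partition_for(key: str, partition_count: int) -> int:
    # zlib.crc32 is a stand-in for the producer's real hash function (assumption).
    return zlib.crc32(key.encode("utf-8")) % partition_count


def moved_fraction(keys, old_count: int, new_count: int) -> float:
    """Return the share of keys whose partition changes between the two counts."""
    moved = sum(
        1 for k in keys
        if partition_for(k, old_count) != partition_for(k, new_count)
    )
    return moved / len(keys)


# Hypothetical keys and partition counts.
keys = [f"order-{i}" for i in range(10_000)]
print(f"{moved_fraction(keys, 10, 12):.0%} of keys change partition")
```

For a uniformly distributed hash, going from 10 to 12 partitions keeps a key in place only when its hash modulo 60 is below 10, so roughly five in six keys move. Most per-key ordering guarantees therefore span two partitions until the older messages expire.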
Prerequisites
Before you begin, make sure you have:
Access to the ApsaraMQ for Kafka console
The name of the topic with skewed partitions
The system-recommended partition count for your instance
Adjust partitions for non-serverless instances
Log on to the ApsaraMQ for Kafka console. In the left-side navigation pane, click Instances.
In the top navigation bar, select the region where your instance resides. On the Instances page, click the name of the instance.
In the left-side navigation pane, click Topics. On the page that appears, click the name of the topic with skewed partitions.
On the Topic Details page, click the Configuration Information tab.
Next to the Partitions parameter, click Increase Partitions.
In the Increase Partitions panel, use the arrows in the Partitions field to set the partition count to a multiple of the recommended number.
If you cannot increase partitions for a topic that uses cloud storage, use the partition rebalancing feature instead. Partition rebalancing redirects new data writes to disks with lower usage. After messages on the original disk expire, the data is automatically deleted and disk usage drops. For details, see Partition rebalancing and traffic redirection.
Adjust partitions for serverless instances
Log on to the ApsaraMQ for Kafka console. In the left-side navigation pane, click Instances.
In the top navigation bar, select the region where your instance resides. On the Instances page, click the name of the instance.
In the left-side navigation pane, click Topics. On the page that appears, click the name of the topic with skewed partitions.
On the Topic Details page, next to the Partition Replicas parameter, click Edit.
In the Edit Partition panel, use the arrows in the Partition field to set the partition count to a multiple of the recommended number.
Verify the result
After you adjust the partition count, confirm that skew is resolved:
On the Topic Details page, verify that the new partition count is a multiple of the recommended number.
Monitor disk usage across cluster nodes over the next few hours to confirm that usage levels are converging.
Check that throttling alerts on individual nodes have stopped.
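The disk usage check can be sketched as a comparison of the spread between the most and least used nodes over time. The node names and usage percentages below are hypothetical values that you would read from your own monitoring data. The function names are illustrative and not part of any ApsaraMQ for Kafka API.

```python
def usage_spread(usage_by_node: dict) -> float:
    """Difference between the highest and lowest disk usage (in percent)."""
    values = usage_by_node.values()
    return max(values) - min(values)


def is_converging(samples: list) -> bool:
    """True if the spread never grows between samples and ends lower than it started."""
    spreads = [usage_spread(s) for s in samples]
    never_grows = all(later <= earlier for earlier, later in zip(spreads, spreads[1:]))
    return never_grows and spreads[-1] < spreads[0]


# Hypothetical disk usage samples (percent), oldest first.
samples = [
    {"node-1": 82.0, "node-2": 41.0, "node-3": 43.0},
    {"node-1": 76.0, "node-2": 47.0, "node-3": 48.0},
    {"node-1": 68.0, "node-2": 55.0, "node-3": 54.0},
]

print([usage_spread(s) for s in samples])  # [41.0, 29.0, 14.0]
print(is_converging(samples))              # True
```

Because existing data is not redistributed, the spread typically narrows gradually as old messages expire rather than immediately after the change.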
Best practices
To prevent partition skew when you create topics:
Always set the partition count to a multiple of the system-recommended number for your instance.
Monitor partition distribution periodically to catch imbalances early, before they cause disk pressure or throttling.
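Because partition counts can only be increased, the target is the smallest multiple of the recommended number that is not below the current count. A minimal helper for that arithmetic might look like the following. The function name and the example values are hypothetical; use the recommended number shown for your instance.

```python
def next_multiple(current: int, recommended: int) -> int:
    """Smallest multiple of `recommended` that is >= `current`."""
    if recommended <= 0:
        raise ValueError("recommended must be a positive integer")
    if current <= 0:
        raise ValueError("current must be a positive integer")
    # Ceiling division using integer arithmetic only.
    return -(-current // recommended) * recommended


# Hypothetical values: recommended number assumed to be 6.
print(next_multiple(10, 6))  # 12
print(next_multiple(12, 6))  # 12 (already aligned, no change needed)
print(next_multiple(13, 6))  # 18
```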