By Zikui
Dubbo is an RPC framework for service governance and communication in microservice architectures. Its strengths include ease of use, proven operation at very large scale, adaptation to cloud-native infrastructure, and security. Used incorrectly, however, Dubbo can destabilize both the Dubbo application and the ZooKeeper registry. In a recent case, a customer's production ZooKeeper cluster became unavailable during a service release because a Dubbo Reference was initialized repeatedly. Service registration and subscription failed, causing extensive service failures.
Exception Logs in ZooKeeper ↓
The ZooKeeper cluster remained unavailable and could not recover on its own.
A Dubbo Reference is the consumer-side proxy for a service provider in the Dubbo framework. When a Dubbo Reference is initialized, the consumer is registered in the consumer list of the subscribed service. If an application instantiates multiple Dubbo References of the same interface, each one creates its own Znode in that service's consumer list in ZooKeeper. The paths of these Znodes are identical except for the timestamp field.
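To make the "identical except for the timestamp" point concrete, here is a minimal sketch in plain Java. It is not Dubbo code: it only mimics the general shape of a consumer Znode path under Dubbo's default `/dubbo` root, with a much shorter parameter list than a real consumer URL carries. The host address and the helper names `consumerPath` and `stripTimestamp` are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Illustrative only: mimics the shape of a Dubbo consumer Znode path. Not Dubbo code. */
final class ConsumerPathSketch {

    // Builds a path shaped like /dubbo/<interface>/consumers/<URL-encoded consumer URL>.
    // Real consumer URLs carry many more parameters; this keeps only a few for readability.
    static String consumerPath(String iface, String host, String app, long timestamp) {
        String url = "consumer://" + host + "/" + iface
                + "?application=" + app
                + "&interface=" + iface
                + "&side=consumer"
                + "&timestamp=" + timestamp;
        return "/dubbo/" + iface + "/consumers/"
                + URLEncoder.encode(url, StandardCharsets.UTF_8);
    }

    // Masks the (URL-encoded) timestamp parameter so two paths can be compared without it.
    static String stripTimestamp(String path) {
        return path.replaceAll("timestamp%3D\\d+", "timestamp%3D*");
    }

    public static void main(String[] args) {
        // Same application, same interface, two Reference initializations (hypothetical values).
        String first = consumerPath("com.demo.provider", "192.168.0.10", "test", 1700000000000L);
        String second = consumerPath("com.demo.provider", "192.168.0.10", "test", 1700000000123L);

        System.out.println(first);
        System.out.println(second);
        System.out.println("identical apart from timestamp: "
                + stripTimestamp(first).equals(stripTimestamp(second)));
    }
}
```

Because every initialization yields a distinct path, ZooKeeper treats each one as a separate child node rather than overwriting the previous one.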
Dubbo represents the actual subscription relationships this way. However, incorrect client usage can destabilize both the Dubbo application and ZooKeeper. See https://github.com/apache/dubbo/issues/4587.
For example, in versions earlier than Dubbo 2.7.9, initializing multiple Dubbo References of the same interface in one application may cause a memory overflow.
On the ZooKeeper side, when servers synchronize data with each other, the size of each synchronization packet is strictly checked against the jute.maxbuffer limit. If a packet exceeds the limit, the follower and the leader disconnect. When an application keeps initializing a Dubbo Reference for the same interface due to incorrect usage, it creates a large number of ephemeral (temporary) nodes. After the application crashes, these nodes cause the ZooKeeper cluster to keep crashing.
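As a rough back-of-the-envelope illustration of how quickly long consumer node names add up against jute.maxbuffer, the sketch below estimates how many names of a given length fit under the limit. It assumes ZooKeeper's documented default of 0xfffff bytes and a simple length-prefixed encoding (a 4-byte count, then a 4-byte length plus UTF-8 bytes per name). It models only a flat list of names, not the actual packets exchanged between servers, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Rough estimate only: models a length-prefixed list of node names, not real sync packets. */
final class JuteBufferEstimate {

    // ZooKeeper's documented default for jute.maxbuffer: 0xfffff bytes, just under 1 MiB.
    static final int DEFAULT_JUTE_MAXBUFFER = 0xfffff;

    // Approximate size of a list of `count` names that are each as long as sampleName:
    // 4 bytes for the element count, plus (4-byte length prefix + UTF-8 bytes) per name.
    static long approxChildListBytes(int count, String sampleName) {
        int nameBytes = sampleName.getBytes(StandardCharsets.UTF_8).length;
        return 4L + (long) count * (4L + nameBytes);
    }

    // Largest number of such names whose approximate size still fits within limitBytes.
    static int maxChildrenWithin(int limitBytes, String sampleName) {
        int nameBytes = sampleName.getBytes(StandardCharsets.UTF_8).length;
        return (int) ((limitBytes - 4L) / (4L + nameBytes));
    }

    public static void main(String[] args) {
        // Assumption: an encoded consumer URL of a few hundred bytes; real lengths vary.
        String sampleName = "x".repeat(296);
        int max = maxChildrenWithin(DEFAULT_JUTE_MAXBUFFER, sampleName);
        System.out.println("names that fit under the default limit: " + max);
        System.out.println("approximate bytes at that count: "
                + approxChildListBytes(max, sampleName));
    }
}
```

With names of a few hundred bytes each, a single application that re-initializes a Reference in a loop can plausibly reach the limit after only a few thousand registrations.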
If ZooKeeper is used as the registry, you can increase the jute.maxbuffer parameter, as suggested in the jute.maxbuffer article, to delay the problem, but this is not a fundamental fix. MSE ZooKeeper provides a throttling mechanism that prevents a client from repeatedly registering the same consumer, whether through misuse or unexpected exceptions. This keeps the ZooKeeper cluster stable, and the observability features of MSE ZooKeeper let you inspect the registration information of specific applications.
MSE ZooKeeper Troubleshooting Steps:
For example, an application named test repeatedly initialized the Dubbo Reference of the interface com.demo.provider because of an improper initialization pattern. A registration error was reported some time after the application started. By that point, MSE ZooKeeper had already throttled this client's registrations, keeping the ZooKeeper server stable. The problematic application can then be located using the monitoring and push track information in the MSE console.
Log on to the MSE console. On the Instance Details page, choose Observation Analysis > Monitoring Center > TopN Monitoring:
Use the client TPS TopN view in TopN Monitoring to find the session ID that wrote most frequently within the time range. Then use this session ID to query the corresponding data operation records in Data Management > Data Trace.
The query results show that a specific machine has performed multiple consumer registrations.
Upgrade Dubbo to the latest stable version, pay attention to how Dubbo References are initialized, and avoid unnecessary duplicate Dubbo References of the same interface. Dubbo References are relatively heavyweight, and each additional one consumes machine resources.
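One way to avoid duplicate References is to create each one once and reuse it. The following is a minimal sketch of that idea in plain Java, not Dubbo's own API: `ReferenceHolder` and its factory are hypothetical, and in a real application the factory would wrap whatever call actually builds the reference (for example `ReferenceConfig#get()`). Dubbo 2.7.x also ships its own cache utility, `ReferenceConfigCache`, which may be a better fit than a hand-rolled holder.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical helper: creates at most one reference object per interface and reuses it. */
final class ReferenceHolder {

    private final Map<Class<?>, Object> cache = new ConcurrentHashMap<>();
    private final Function<Class<?>, Object> factory;

    // The factory stands in for the expensive step that builds a reference.
    ReferenceHolder(Function<Class<?>, Object> factory) {
        this.factory = factory;
    }

    // Returns the cached reference for iface, invoking the factory only on first use.
    <T> T get(Class<T> iface) {
        return iface.cast(cache.computeIfAbsent(iface, factory));
    }

    public static void main(String[] args) {
        AtomicInteger created = new AtomicInteger();

        // Stand-in factory: counts creations and returns a dummy Runnable as the "reference".
        ReferenceHolder holder = new ReferenceHolder(iface -> {
            created.incrementAndGet();
            return (Runnable) () -> { };
        });

        Runnable first = holder.get(Runnable.class);
        Runnable second = holder.get(Runnable.class);

        System.out.println("same instance reused: " + (first == second));
        System.out.println("factory invocations: " + created.get());
    }
}
```

Keying the cache by interface class is the simplest choice. If an application needs References to the same interface with different groups or versions, the key would have to include those attributes as well.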
In day-to-day development, stability problems in the business and in the middleware it depends on, whether caused by framework misuse or by bugs, need to be diagnosed quickly so that they can be contained before they get worse. MSE ZooKeeper provides a variety of statistical aggregation capabilities for many usage scenarios to help users troubleshoot more efficiently. It offers rich monitoring metrics for various ZooKeeper usage scenarios and is deeply optimized on top of the Dragonwell JDK. It also provides multi-availability-zone disaster recovery, O&M-free operation, and high availability to help you build stable and efficient microservice applications.