Practical Experience of Implementing Microservices in Low-Fault-Tolerance Business Scenarios
Founded in 2014, Helian Health is a health management service company focused on physical examination scenarios. For hospitals, Helian provides a set of SaaS services covering the stages before, during, and after an examination. For enterprises, it provides group physical examinations and health management; Li Jinji and PwC are among its customers. For families, it provides a health management app. Helian currently covers more than 200 cities and more than 2,000 hospitals nationwide.
What technological development stages has Helian Health experienced?
The first stage: monolithic application. Going from 0 to 1, iteration was fast, but failures were frequent. The business needed Helian to iterate and validate quickly. To move fast, Helian used a container management service provided by Alibaba Cloud Jushita, an early form of containerization. In summary, this stage prioritized speed, at the cost of technical debt, frequent failures, and unmet business expectations.
The second stage: microservices. As more and more hospitals connected to Helian, failures increased and customers complained a lot. Developers spent their days "fighting fires". Helian then began modular decoupling and service splitting, and introduced Dubbo and Nacos. However, the team's understanding of the business was still not deep enough, so the services were split poorly. This led to many cross-service calls and to "super services" that almost every interface called, which hurt stability. In summary, splitting microservices without a deep understanding of the business treats the symptoms rather than the root causes.
The third stage: microservice reconstruction. Focusing on horizontal capabilities such as orders and data synchronization, Helian reorganized its modules and services, moved the deployment architecture to K8s, and replaced some of the middleware used for service governance with cloud services such as Alibaba Cloud Microservices Engine (MSE) [1]. After this, the overall system became more stable. In summary, building microservices around the business and combining them with the advantages of the cloud improved development and operations efficiency as well as online stability.
What technical challenges are specific to low-fault-tolerance medical services?
Low fault tolerance is the defining feature of Helian's business. For example, when users go to a hospital for a physical examination, any examination item that cannot be completed because of an IT problem seriously harms the user experience. This is true not only of physical examinations but of the medical industry as a whole. In addition, most people have a physical examination only once or twice a year, so this is a very low-frequency scenario and traffic is correspondingly low. The problem with low traffic is that grayscale releases are almost ineffective, and even a full release may not surface bugs. Some bugs are not found until a year after the code is released.
Therefore, Helian first had to bring its complex logic under control, which required modularization and decoupling.
If business decoupling were the only goal, modularization would be enough. In Java, for example, modules can be split into JAR packages, with Maven managing the dependencies between them. However, many early architectures served different businesses from a single deployable package that contained many business modules and offered no isolation between them. Without splitting into microservices, a defect in the enterprise-facing business code could bring down the hospital-facing business, which has low fault tolerance. That is unacceptable to the business.
Therefore, Helian went directly to a service-oriented architecture and separated its services. Shared basic services can be called by all businesses, and different businesses do not affect each other. Service orientation delivers not only business decoupling but also service layering, which protects the performance of core services. For example, for a business with very low fault tolerance, dedicated supporting services can be built specifically for its problem scenarios. Each service can also be quality-checked independently, which is impossible when everything is packaged together.
There are two main ways to split services: by business and by capability. Different businesses can call each other. Helian's resulting architecture is shown in the figure above. It is split mainly by capability, supplemented by business. For example, the front end is the web service, the blue blocks are the business services where core business iteration happens, and the layer below them is split by capability into order, payment, and message services. The lowest layer is furthest from the business and includes the hospital data synchronization service and the manual contract fulfillment service, which are independent, self-built services.
The services that iterate most frequently are separated from the relatively stable services, and the two sides communicate over HTTP. Within the business cluster, Dubbo is used for RPC, Nacos as the registry and configuration center, and RocketMQ for asynchronous messaging.
Practical experience in the evolution of microservices
For microservices, Helian uses the Dubbo + Nacos technology stack.
Dubbo is an RPC framework built around Java interfaces. A Java class becomes a microservice with a few simple annotations, which made Dubbo easy to promote within the team. Calls are also almost non-invasive: changing @Autowired to @DubboReference is enough to inject a remote service. Nacos integrates well with Dubbo and needs only a few lines of configuration, and its console is simple and easy to use. Like Dubbo, it comes from a Chinese community, which lowers the barrier for Helian's programmers.
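To make the "interface-based, non-invasive" point concrete, here is a minimal sketch that uses only the JDK's dynamic proxy. It is not Dubbo's implementation: the OrderService interface, the in-memory FAKE_REGISTRY, and the refer helper are all hypothetical stand-ins for what a framework would do over the network.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Map;

// Illustrative sketch only: interface-based remote invocation with a JDK
// dynamic proxy. All names are hypothetical; this is not Dubbo's code.
class InterfaceRpcSketch {

    // The shared contract: provider and consumer depend only on this interface.
    interface OrderService {
        String findOrder(String orderId);
    }

    // Provider-side implementation of the contract.
    static class OrderServiceImpl implements OrderService {
        @Override
        public String findOrder(String orderId) {
            return "order:" + orderId;
        }
    }

    // Stand-in for the network: maps an interface name to a provider instance.
    static final Map<String, Object> FAKE_REGISTRY =
            Map.of(OrderService.class.getName(), new OrderServiceImpl());

    // Builds a proxy that forwards every call to the "remote" provider.
    // A real framework would serialize the call and send it over the wire here.
    @SuppressWarnings("unchecked")
    static <T> T refer(Class<T> iface) {
        InvocationHandler handler = (proxy, method, args) -> {
            Object provider = FAKE_REGISTRY.get(iface.getName());
            return method.invoke(provider, args);
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    public static void main(String[] args) {
        // Consumer code sees only the interface, as it would with an injected reference.
        OrderService orders = refer(OrderService.class);
        System.out.println(orders.findOrder("42"));
    }
}
```

Because the consumer is written against the interface alone, switching from a local bean to a remote reference changes how the field is obtained, not how it is called.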
In the early days, Helian ran its own deployment of the community edition of Nacos, which hit a major performance bottleneck. At that time, the Dubbo 2 service model was based on interfaces: each interface, even one exposing a single function, was registered as its own service, which generated very heavy traffic. Alibaba Cloud Microservices Engine (MSE) helped Helian absorb the pressure from Dubbo, and it offered good compatibility. Later, Helian followed the community and upgraded to Dubbo 3, which solved the problem with the Dubbo 2 service model. In addition, on the memory side, MSE's tuning capabilities improved business performance fourfold and reduced resource costs.
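The difference between the two service models can be shown with simple arithmetic. The sketch below compares how many entries a registry holds when every instance registers each interface separately with when every instance registers once per application, which is the application-level approach Dubbo 3 introduced. The instance and interface counts are hypothetical and are not Helian's figures.

```java
// Illustrative arithmetic only; the numbers in main are hypothetical.
class RegistryLoadSketch {

    // Interface-level model: each instance registers one entry per exposed interface.
    static long interfaceLevelEntries(int instances, int interfacesPerInstance) {
        return (long) instances * interfacesPerInstance;
    }

    // Application-level model: each instance registers a single entry.
    static long applicationLevelEntries(int instances) {
        return instances;
    }

    public static void main(String[] args) {
        int instances = 50;              // hypothetical number of running instances
        int interfacesPerInstance = 200; // hypothetical interfaces exposed by each

        System.out.println("interface-level entries:   "
                + interfaceLevelEntries(instances, interfacesPerInstance));
        System.out.println("application-level entries: "
                + applicationLevelEntries(instances));
    }
}
```

With these assumed numbers, the interface-level model holds 10,000 entries and the application-level model holds 50, which is why the registry load falls so sharply after the upgrade.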
Helian serves a large number of hospitals, and each hospital's requirements are different and hard to predict, so the system carries a large number of feature switches. Operating such switches is risky, and they are generally configured by developers. MSE handles this well: its feature switches can be changed dynamically without restarting the application. They can also be combined with Alibaba Cloud Key Management Service (KMS) to encrypt stored data transparently to users.
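As a rough illustration of per-hospital feature switches that change without a restart, the sketch below keeps switch values in a concurrent map that a configuration listener could update at runtime. It does not use the MSE or Nacos client API; the class, the method names, and the "hospitalId:feature" key format are assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: an in-memory feature-switch store. In a real setup
// onConfigChanged would be driven by a configuration-center listener.
class FeatureSwitchSketch {

    // Key format "hospitalId:feature" is an assumption for this example.
    private final Map<String, Boolean> switches = new ConcurrentHashMap<>();

    // Called whenever a new value is pushed; takes effect immediately, no restart.
    void onConfigChanged(String hospitalId, String feature, boolean enabled) {
        switches.put(hospitalId + ":" + feature, enabled);
    }

    // Unknown switches default to off, the safer choice when fault tolerance is low.
    boolean isEnabled(String hospitalId, String feature) {
        return switches.getOrDefault(hospitalId + ":" + feature, false);
    }

    public static void main(String[] args) {
        FeatureSwitchSketch store = new FeatureSwitchSketch();
        System.out.println(store.isEnabled("hospital-1", "online-report"));

        // Simulate a pushed configuration change for one hospital only.
        store.onConfigChanged("hospital-1", "online-report", true);
        System.out.println(store.isEnabled("hospital-1", "online-report"));
        System.out.println(store.isEnabled("hospital-2", "online-report"));
    }
}
```

Defaulting to "off" means a missing or mistyped key never silently enables behavior for a hospital that did not ask for it.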
The HTTP gateway mainly solves the problem of protocol conversion. The business logic in Helian's app front end is heavy, so the back end does not need to wrap results; it only needs to expose service capabilities. Helian therefore customized the open-source Apache ShenYu gateway to convert HTTP requests into Dubbo calls, support POST and GET, and move authentication and authorization logic into the gateway.
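The gateway's two responsibilities described here, authentication at the edge and mapping an HTTP request onto a back-end service call, can be sketched in a few lines. This is not Apache ShenYu code; the route table, the token check, and all names are hypothetical, and the back-end call is a plain function standing in for a Dubbo invocation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Illustrative sketch only: a gateway that authenticates a request and then
// dispatches "METHOD path" to a registered back-end call.
class HttpToRpcGatewaySketch {

    static final class Response {
        final int status;
        final String body;

        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    // Route table: "GET /order/detail" -> function standing in for an RPC call.
    private final Map<String, Function<Map<String, String>, String>> routes = new HashMap<>();
    private final Set<String> validTokens;

    HttpToRpcGatewaySketch(Set<String> validTokens) {
        this.validTokens = new HashSet<>(validTokens);
    }

    void register(String httpMethod, String path,
                  Function<Map<String, String>, String> backendCall) {
        routes.put(httpMethod + " " + path, backendCall);
    }

    Response handle(String httpMethod, String path, String token,
                    Map<String, String> params) {
        // Authentication lives in the gateway, so back-end services stay free of it.
        if (!validTokens.contains(token)) {
            return new Response(401, "unauthorized");
        }
        Function<Map<String, String>, String> call = routes.get(httpMethod + " " + path);
        if (call == null) {
            return new Response(404, "no route");
        }
        // The result is returned as-is, without extra wrapping.
        return new Response(200, call.apply(params));
    }

    public static void main(String[] args) {
        HttpToRpcGatewaySketch gateway = new HttpToRpcGatewaySketch(Set.of("token-1"));
        gateway.register("GET", "/order/detail", params -> "order:" + params.get("id"));

        Response ok = gateway.handle("GET", "/order/detail", "token-1", Map.of("id", "42"));
        System.out.println(ok.status + " " + ok.body);

        Response denied = gateway.handle("GET", "/order/detail", "bad-token", Map.of("id", "42"));
        System.out.println(denied.status + " " + denied.body);
    }
}
```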
In terms of DevOps, Helian uses ACK [2] for K8s and image-based release and rollback, and Alibaba Cloud Yunxiao CI for continuous integration, which together give Helian high release efficiency. At peak, Helian releases 20-30 times a week, and the time for a single release has dropped from 2-3 hours to 8 minutes. In addition, Helian isolates services based on Dubbo. For example, the same service can be deployed twice, with the same code and usage but as separate instances, each with its own memory, so that when one fails the other is not affected. However, this capability is still weak, and strengthening control capabilities is a direction for future development.
Future planning of microservices
In the future, Helian hopes to adopt a Service Mesh control plane.
As shown in the figure above, when a marked request req* arrives, it should be routed to the special version ServiceA*. Likewise, a message sent through MQ while handling that request should be consumed by the marked Service* rather than by the ordinary Service, so that routing holds across the whole call chain. At present, Alibaba Cloud ASM provides managed Istio with these capabilities, as well as basic Dubbo governance capabilities [3]. Helian will later explore how to integrate with and evolve on ASM.
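The messaging half of this requirement can be sketched as a consumer-side filter: each consumer instance decides from the message's tag whether the message belongs to it. This is a simplified illustration, not ASM or RocketMQ behavior; the "baseline" tag name, the shouldConsume helper, and the set of tags that have their own consumer are assumptions.

```java
import java.util.Set;

// Illustrative sketch only: tag-aware message consumption. A tagged consumer
// takes messages carrying its own tag; the baseline consumer takes the rest.
class MqTagFilterSketch {

    static final String BASELINE = "baseline"; // hypothetical default tag

    static boolean shouldConsume(String consumerTag, String messageTag,
                                 Set<String> tagsWithOwnConsumer) {
        // Untagged messages are treated as baseline traffic.
        String effectiveTag = (messageTag == null) ? BASELINE : messageTag;
        if (consumerTag.equals(effectiveTag)) {
            return true;
        }
        // The baseline consumer picks up tags that no dedicated consumer claims.
        return BASELINE.equals(consumerTag) && !tagsWithOwnConsumer.contains(effectiveTag);
    }

    public static void main(String[] args) {
        Set<String> dedicated = Set.of("team-a");

        System.out.println(shouldConsume("team-a", "team-a", dedicated));   // marked consumer, marked message
        System.out.println(shouldConsume(BASELINE, "team-a", dedicated));   // baseline must skip it
        System.out.println(shouldConsume(BASELINE, "team-b", dedicated));   // no dedicated consumer, baseline takes it
    }
}
```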
The purpose of adopting Service Mesh is to reduce the cost of test environments. At present, Helian's large cluster hosts 7-8 complete test environments, one per business group. They do not interfere with each other, but the cost is too high. If full-link routing can be achieved, each development team can deploy only its own services to a test environment and reach them through tagged traffic.
Following current industry practice, full-link grayscale routing identifies and tags traffic at the gateway, with a separate tag for each test environment. Every hop in a service call chain passes the traffic tag along, and at each hop a route is chosen according to policies that match the traffic tag against the tags of the target instances. In the end, Helian can build each environment by deploying only the services modified for that environment and reusing the baseline environment's services as much as possible, which reduces the overall cost.
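The routing rule described above (prefer an instance whose tag matches the traffic tag, otherwise fall back to the baseline) can be sketched as follows. The service names, tag names, and instance identifiers are hypothetical, and a real mesh would apply this rule in the sidecar or client rather than in application code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: tag-based routing with baseline fallback,
// applied at every hop of a call chain.
class TagRoutingSketch {

    static final String BASELINE = "baseline"; // hypothetical name of the shared environment

    // Chooses the instance for one hop: the tagged one if deployed, else baseline.
    static String pickInstance(Map<String, String> instancesByTag, String trafficTag) {
        String tagged = (trafficTag == null) ? null : instancesByTag.get(trafficTag);
        return (tagged != null) ? tagged : instancesByTag.get(BASELINE);
    }

    // Simulates a call chain in which the traffic tag is passed unchanged to each hop.
    static List<String> routeChain(Map<String, Map<String, String>> deployments,
                                   List<String> callChain, String trafficTag) {
        List<String> hops = new ArrayList<>();
        for (String service : callChain) {
            hops.add(pickInstance(deployments.get(service), trafficTag));
        }
        return hops;
    }

    public static void main(String[] args) {
        // Only the order service has a modified version in the "team-a" environment.
        Map<String, Map<String, String>> deployments = Map.of(
                "web", Map.of(BASELINE, "web-baseline"),
                "order", Map.of(BASELINE, "order-baseline", "team-a", "order-team-a"),
                "payment", Map.of(BASELINE, "payment-baseline"));
        List<String> chain = List.of("web", "order", "payment");

        System.out.println(routeChain(deployments, chain, "team-a"));
        System.out.println(routeChain(deployments, chain, null));
    }
}
```

In this example the "team-a" environment consists of a single extra deployment; every other hop reuses the baseline, which is where the cost saving comes from.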
In addition, Helian plans to implement a full HTTP gateway. The trend is for the front end to carry more and more logic, so the back end no longer needs its own web layer and can expose its services directly to the front end. Helian is therefore considering replacing all web layers with BFF gateways, and it looks forward to keeping pace with the community and growing together with the cloud native community.