By Jian Chao, from Idle Fish Technology
In 2018, we proposed and put into practice an R&D solution featuring cloud-edge unification based on Flutter + Dart FaaS. This solution lowers the R&D threshold of the service-side business assembly layer by using the light (focusing on business), fast (single interface and single function, fast R&D, and fast deployment), and NoOps (O&M platform) capabilities of Serverless. Client-side developers also get the opportunity to participate in service-side business development, which eases client-server collaboration bottlenecks and improves the iteration efficiency of emerging businesses. In the traditional application architecture of Idle Fish, there is also a similar business assembly layer called idleapi.
Because the vertical business boundaries of applications and the hierarchical design of the architecture are not clear, almost all businesses are iterated on idleapi. New businesses keep accumulating, old businesses keep iterating, and expired businesses are not cleaned up in time, so the application keeps growing. According to statistics, as of the Double 11 Global Shopping Festival in 2020, idleapi provided more than 1,200 gateway interfaces. More than 500 of them have no business traffic (the business is disabled), but their code is still running and has not been cleaned up. In total, idleapi has more than 700,000 lines of code, over 2,000 business switches, and hundreds of business modules. With so many businesses, so much code, and so many developers coupled in one application, a series of isolation problems arises.
Hundreds of business modules run in one application process and interfere with each other. For example, if one business module misbehaves (exhausting memory or thread pool resources), the other business modules deployed on the same machine have no resources available and start denying service, so core businesses deployed on that machine fail as well. Such incidents occur every year.
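The failure mode described above can be reproduced in miniature. The following is a minimal Java sketch, not idleapi code: the class name, the tiny pool size, and the "faulty" and "healthy" modules are all assumptions made for illustration. It shows that when two modules share one bounded thread pool, a module that blocks its tasks causes the other module's work to be rejected, while separate pools keep the healthy module unaffected.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class SharedPoolDemo {

    // A deliberately tiny pool (hypothetical sizing): 1 worker, 1 queue slot, reject when full.
    static ExecutorService newTinyPool() {
        return new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1), new ThreadPoolExecutor.AbortPolicy());
    }

    // Returns true if the pool accepted the task, false if it rejected it.
    static boolean trySubmit(ExecutorService pool, Runnable task) {
        try {
            pool.execute(task);
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }

    // Simulates a faulty module: submits tasks that block until released,
    // and keeps submitting until the pool starts rejecting work.
    static void exhaust(ExecutorService pool, CountDownLatch release) {
        Runnable stuck = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        while (trySubmit(pool, stuck)) {
            // keep going until the worker and the queue slot are both occupied
        }
    }

    // Reports whether a healthy module can still submit work after the faulty one misbehaves.
    static boolean healthyModuleAccepted(boolean sharedPool) {
        CountDownLatch release = new CountDownLatch(1);
        ExecutorService faultyPool = newTinyPool();
        ExecutorService healthyPool = sharedPool ? faultyPool : newTinyPool();
        try {
            exhaust(faultyPool, release);
            return trySubmit(healthyPool, () -> { });
        } finally {
            release.countDown();
            faultyPool.shutdownNow();
            healthyPool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("shared pool, healthy module accepted: " + healthyModuleAccepted(true));
        System.out.println("separate pools, healthy module accepted: " + healthyModuleAccepted(false));
    }
}
```

With the shared pool the healthy module's task is rejected; with separate pools it is accepted. Real incidents involve memory, connections, and other resources as well, so this only illustrates the thread pool case.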
Dozens of R&D personnel develop and maintain hundreds of business modules, and each release involves more than ten branches. Every additional business branch adds to the risk of code conflicts. The greater the gap between the baseline version of one branch and the baseline versions of the others, the more conflicts have to be resolved and the longer it takes. According to statistics, one pre-release of idleapi takes about 30 minutes, of which 20 minutes are spent waiting for developers to resolve conflicts, so development efficiency is low.
Idle Fish adjusted its personnel structure along business domains so that teams develop businesses and track business indicators, but the application structure did not follow in time. Although each business group can be autonomous, cohesive, and communicate effectively internally, cross-group collaboration still takes a lot of energy while all businesses are coupled in one application.
"The structures of large systems tend to disintegrate during development, qualitatively more so than with small systems." "Organizations that design systems are constrained to produce designs that are copies of the communication structures of these organizations." According to Conway's Law, large systems always tend to be decomposed and reorganized during development until the system architecture mirrors the personnel structure. To solve the problems with idleapi, we decided to split it. Several issues must be considered in advance of the split.
The problems above are the key points of the splitting process and determine whether the splitting scheme can be implemented successfully. Next, let's analyze them one by one.
The first problem to be solved by the splitting: What is the target splitting product? There are roughly two ideas:
Based on the exploration and comparison over the past few years, we believe that the idea of FaaS functions is very suitable for solving the problems encountered by idleapi.
First of all, in a traditional application, multiple interfaces are developed in parallel on one application during the debugging period. When code from different branches is released together, there is a risk of merge conflicts, and one pre-release deployment takes about 30 minutes.
For FaaS, one gateway interface corresponds to one FaaS function. Each FaaS function has an independent Git repository and deployment environment, so functions are independent of each other and physically isolated. Developers can safely modify their code and baseline versions and can initiate remote debugging at any time without hindering other developers. Moreover, each FaaS function focuses on only one service gateway interface, so the amount of code and the number of internal services it depends on are much smaller than in a traditional application. As a result, a pre-release deployment takes only about three minutes, nearly ten times faster than the traditional application.
During the operating period, each FaaS function runs on a different cluster. This natural physical isolation prevents one FaaS function from causing failures in another. If a FaaS function exhausts its thread pool or disk resources, functions deployed on other clusters are not affected (except for associated businesses).
Although FaaS functions have advantages in the debugging period, operating period, and O&M period, traditional monolithic applications have advantages in the encoding period. For example:
After the splitting solution is determined, idleapi will be split into hundreds of FaaS functions based on gateway interfaces from a giant monolithic application. It is unrealistic to re-implement so many businesses, so the best way is to reuse the business code in the monolithic application.
After analyzing the code, we found that the business code in idleapi references each other extensively, forming an intricate, giant mesh structure. One business interface is associated with the code of five (or even ten) other business interfaces, involving nearly 1,000 source files, which is about 1/4 of all idleapi source files. Splitting along interface boundaries alone therefore does not make the business code any simpler. Besides the business gateway entry, there are also various implicit function entries. For example, JSON serialization automatically calls the set functions and bean initialization functions of a class. All of this makes manual splitting of business code very challenging.
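To see why such implicit entries are easy to miss, consider the minimal Java sketch below. It is an illustration under assumptions, not idleapi code: `ItemDTO` and `populate` are hypothetical, and `populate` only imitates what a JSON library does when it fills a bean through reflection. No line in the program calls `setTitle` by name, yet the setter runs, so an analysis that follows only explicit calls would wrongly treat it as dead code.

```java
import java.lang.reflect.Method;
import java.util.Map;

class ImplicitEntryDemo {

    // Hypothetical data type; nothing in this file calls setTitle explicitly.
    public static class ItemDTO {
        private String title;

        public void setTitle(String title) {
            this.title = title.trim(); // business logic hidden inside a setter
        }

        public String getTitle() {
            return title;
        }
    }

    // Simplified stand-in for bean population in a JSON library (assumption):
    // for every field name, look up the matching String setter and invoke it reflectively.
    static <T> T populate(Class<T> type, Map<String, String> fields) {
        try {
            T bean = type.getDeclaredConstructor().newInstance();
            for (Map.Entry<String, String> field : fields.entrySet()) {
                String name = field.getKey();
                String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
                Method method = type.getMethod(setter, String.class);
                method.invoke(bean, field.getValue());
            }
            return bean;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot populate " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        ItemDTO item = populate(ItemDTO.class, Map.of("title", "  used bike  "));
        System.out.println(item.getTitle()); // the setter ran and trimmed the value
    }
}
```

A splitting approach therefore has to treat such setters and initializers as additional entry points, or it risks removing code that is only reached at runtime.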
To this end, we designed and implemented a code splitting tool. It analyzes the classes, methods, and properties that a business entry function depends on within the interwoven code and excludes those that are never called. The tool reduces the number of source files that a single business entry depends on to about 100 (70% of which are interface data types). Combined with the FaaS business framework that we designed and implemented, developers migrating a business can split the business code, create the FaaS function, and deploy it to the pre-release environment with one click. The whole process takes less than half an hour. For business switch configurations, we also provide a migration tool that migrates online or pre-release configurations to new functions in batches with one click, which removes the copying and approval steps of manual migration.
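The core idea of such dependency pruning can be sketched as a reachability search over a dependency graph. The Java example below is a simplified illustration under assumptions, not the actual tool: it works on class names only, the graph is given by hand instead of being extracted from source code, and names such as `ItemDetailApi` are hypothetical. Everything reachable from the entry points is kept and everything else is dropped.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ReachabilitySketch {

    // deps maps each class to the classes it references directly.
    // roots are the entry points: the gateway entry plus any implicit entries.
    static Set<String> reachable(Map<String, List<String>> deps, Collection<String> roots) {
        Set<String> seen = new TreeSet<>(roots);
        Deque<String> queue = new ArrayDeque<>(roots);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (String next : deps.getOrDefault(current, List.of())) {
                if (seen.add(next)) { // first visit: remember it and explore its references later
                    queue.add(next);
                }
            }
        }
        return seen;
    }

    // Hypothetical dependency graph with two business interfaces sharing one utility.
    static Map<String, List<String>> sampleDeps() {
        return Map.of(
                "ItemDetailApi", List.of("ItemService", "ItemDTO"),
                "ItemService", List.of("ItemDTO", "PriceUtil"),
                "OrderApi", List.of("OrderService", "PriceUtil"),
                "OrderService", List.of("OrderDTO"));
    }

    public static void main(String[] args) {
        // Keeps ItemDetailApi, ItemService, ItemDTO, and PriceUtil; the Order* classes are excluded.
        System.out.println(reachable(sampleDeps(), List.of("ItemDetailApi")));
    }
}
```

A real tool would need finer granularity (methods and properties, as described above) and would have to add reflection-driven entries such as setters to the root set.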
Testing is the last barrier to ensuring the quality of the split business code. To reduce the extra workload that application splitting brings to business personnel and developers, we collaborated with the FaaS platform and the automated regression testing platform to adapt recording, playback, and other regression testing functions to the SideCar and Pod architectures of the FaaS platform. After the FaaS function is released, developers only need to record online traffic in the traditional application and then import it into the FaaS function under test for automated regression testing.
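The comparison at the heart of record-and-playback testing can be sketched in a few lines of Java. This is a simplified illustration under assumptions: the real platform records and replays traffic at the SideCar level, whereas here a recorded call is just a request string plus the response the monolith returned, and `RecordedCall`, `sampleTraffic`, and `splitFunction` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class ReplaySketch {

    // One recorded call: the online request and the response the monolithic application returned.
    static final class RecordedCall {
        final String request;
        final String expectedResponse;

        RecordedCall(String request, String expectedResponse) {
            this.request = request;
            this.expectedResponse = expectedResponse;
        }
    }

    // Replays every recorded request against the candidate function and
    // returns the requests whose responses differ from the recorded ones.
    static List<String> replay(List<RecordedCall> traffic, Function<String, String> candidate) {
        List<String> mismatches = new ArrayList<>();
        for (RecordedCall call : traffic) {
            String actual = candidate.apply(call.request);
            if (!call.expectedResponse.equals(actual)) {
                mismatches.add(call.request);
            }
        }
        return mismatches;
    }

    // Hypothetical recorded traffic.
    static List<RecordedCall> sampleTraffic() {
        return List.of(
                new RecordedCall("item?id=1", "{\"id\":1}"),
                new RecordedCall("item?id=2", "{\"id\":2}"));
    }

    // Hypothetical split FaaS function that echoes the id from the request.
    static String splitFunction(String request) {
        String id = request.substring(request.indexOf('=') + 1);
        return "{\"id\":" + id + "}";
    }

    public static void main(String[] args) {
        List<String> mismatches = replay(sampleTraffic(), ReplaySketch::splitFunction);
        System.out.println("mismatched requests: " + mismatches);
    }
}
```

An empty mismatch list suggests the split function behaves like the original for the recorded traffic. In practice, fields such as timestamps or trace IDs would have to be ignored during comparison, which this sketch does not attempt.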
Developers can complete the regression testing of the business by themselves by combining it with the automated testing platform. This reduces the risk of business migration and the pressure on testing personnel and improves migration efficiency.
In terms of the O&M of the FaaS business, we try our best to preserve developers' O&M habits. The split FaaS function retains the log names, log format, and code from the monolithic application. It also retains developers' ability to log in to the remote machine. In addition, we adapted the personalized business logs to the white-screen log function of the FaaS platform. Developers can search all logs on any machine through the control platform, which is much more efficient than logging in to machines and checking them one by one. The log-based monitoring and alerting system only needs to update the monitored business log paths to complete the monitoring migration.
There are two ways to solve the problem of business code reuse after the application is split into fine-grained FaaS functions:
The first solution is governance before splitting. First, transform and restructure the monolithic application and extract the code reused by each business (into a public internal package or the service layer of the business domain). Then, split the monolithic application into multiple FaaS functions.
There are two problems with the solution above:
The second solution is splitting before governance. First, the monolithic application is split by business, and the problem of code reuse is temporarily ignored. After the split, developers consolidate and refactor reused code as real-world business needs arise during subsequent development, encapsulating it in an internal package or moving it into the domain service.
Compared to the first solution, sorting out reusability issues among clearly isolated function code bases will be less complex and risky. Therefore, we chose the second solution.
Over 30 gateway interfaces have been split from the monolithic application and handed over for business development and maintenance. This verifies that the solution is feasible for splitting and governing monolithic applications. Later, we will make the splitting solution available so that developers can split and migrate their businesses by themselves. After the split, the business retains its original development and O&M habits.
At the same time, one service gateway interface corresponds to one function, so that one FaaS function only focuses on one service gateway interface. This solves the problem of the continuous expansion of traditional applications in scenarios where services continue to innovate. This focus also makes the amount of function code less than 3% of traditional applications (mostly data code), and it only takes five minutes for a business release (Java).
In general, with the help of the automated splitting tools, developers can split a business interface and deploy it to pre-release with one click within half an hour. The process requires no manual intervention, and the split function preserves the original development and maintenance habits, so the migration cost is low enough to be acceptable to developers. Moreover, because one interface corresponds to one function, each function is not interfered with by other services during development, so testability and deployment speed are high. During the operating period, each function runs on different physical machines. This natural physical isolation improves stability during the operating period and reduces the O&M costs of the business.
The FaaS function platform is still developing rapidly and can still be improved.