AGEIPort (GEI), an import and export framework, is officially open source
What is AGEIPort?
AGEIPort is a data import and export solution incubated in Alibaba's digital supply chain business and widely used across Alibaba Group. It emphasizes performance, stability, reliability, rich functionality, extensibility, and a complete ecosystem. It aims to help developers quickly deliver data import and export features that are fast, pleasant to use, and easy to maintain in complex ToB business scenarios, such as uploading and downloading Excel/CSV files from user-facing pages.
Within Alibaba Group, departments including Hema, Cainiao, Local Life, Alibaba Health, Nai Nai, and Taoshi use it more and more, and it has become the foundation of multiple technical components. It has been through many 618 and Double 11 promotions and steadily imports and exports 30 to 40 billion records per month.
What pain points does AGEIPort solve?
If the data import and export features of your business system suffer from any of the following pain points, AGEIPort is a strong choice:
1. Users often import or export data in batches with Excel, anywhere from thousands to millions of rows. Processing is slow, and users complain about it frequently.
2. The progress shown on the page for asynchronous tasks is mocked. 99% of tasks behave like Schrödinger's cat: execution may end the next moment or take more than ten minutes, and no one knows when it will finish.
3. Utility classes for reading and writing files, and interfaces/abstract classes for data processing, are rewritten in every application and wrapped layer upon layer, yet they never cover the more complex business scenarios. Once enough classes depend on them, changing a shared class ripples through the whole codebase. A clever programmer sidesteps the risk by copying the class and using the copy, but the application ends up patched in many places, and one slip causes damage.
4. Import and export code is either too specific to reuse or too abstract to understand and maintain. When the inputs and return values of an interface or abstract class are defined as "universal" JSON objects or Maps, whoever inherits the code knows they have drawn the short straw.
5. Every application writes this code differently. Writing the same feature in a new application means learning yet another style from scratch.
6. Day-to-day development is bogged down by concerns unrelated to the business, such as performance, style handling, data reading and writing, mapping, and interaction. Developers want to focus on business problems and let the framework handle the rest.
7. Complex ToB scenarios call for PaaS/SaaS products, so the code must be extensible and configurable enough that the same code can meet the customization requirements of different scenarios.
8. Import and export relies on centralized middleware. Even after capacity is prepared for each big promotion, it is still affected by shared resources, and when business volume surges, import and export performance is capped by the middleware's processing capacity.
9. Adding or changing import and export tasks depends on consoles and Nacos configuration. Every rollback turns into a scramble, and you wish you never had to use a console without rollback support again.
What are the advantages of AGEIPort?
The framework is built on an event-driven architecture and follows these design principles:
1. Cluster/stand-alone execution and serial/parallel execution are transparent to developers and can greatly improve data processing performance. Developers only need to focus on business logic.
2. Task progress is calculated and reported in real time, which avoids mocked progress and improves user experience.
3. For complex ToB scenarios, individual requirements can be met in several ways (declarative definition, dynamic definition) and along several dimensions (configuration, plug-ins, policies, SPI), so the framework can serve as the foundation for platforms and PaaS/SaaS products.
4. Components, scenarios, and functions accumulated from production use are available out of the box.
5. Following the GitOps approach, the relevant immutable infrastructure is kept in the application's Git repository, which makes releasing and rolling back deliverables faster, more stable, and more secure.
6. The architecture is decentralized: each business application forms its own cluster with isolated resources, which ensures isolation, scalability, and availability of business functions.
7. Task processing and code structure are standardized. The framework defines the stages of a data processing task and the interfaces users implement, separates responsibilities between interfaces, and standardizes how import and export code is written, which improves maintainability.
8. Business domain objects are defined through generic interfaces, so import and export code works with a domain model. This avoids heavy use of Map and JSON parameters in business code and improves maintainability.
9. The execution of business code is recorded to help optimize its performance.
Architecture Design of AGEIPort
The architecture and functional design of AGEIPort are shown in the figure below:
Event driven
AGEIPort is built on an event-driven architecture so that tasks run fully asynchronously with good performance, and so that processing logic is decoupled for better extensibility.
When a main task runs in stand-alone mode, one node is selected at random from the application cluster to execute the main task and the subtasks it is split into.
When a main task runs in cluster mode, one node is selected at random from the application cluster as the master node of the main task. The main task is split into subtasks that are distributed to other nodes in the application cluster, and those nodes act as slave nodes for that main task.
The framework has a built-in LocalEventBus (a single-machine EventBus that handles stand-alone tasks) and a RemoteEventBus (a distributed EventBus that handles multi-machine tasks). The EventBus routes each Event to the Listener on the master node according to the current context. The Monitor calculates task progress, evaluates the current status, and then either sends the next Event or does nothing.
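The interplay between an event bus and a monitor can be illustrated with a minimal in-process sketch. This is not AGEIPort's LocalEventBus; every name here (SketchEventBus, SubTaskFinished, AllSubTasksFinished, SketchMonitor) is hypothetical, and delivery is synchronous for simplicity.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical in-process event bus; illustrative only.
final class SketchEventBus {
    private final Map<Class<?>, List<Consumer<Object>>> listeners = new ConcurrentHashMap<>();

    // Register a listener for one event type.
    <E> void subscribe(Class<E> type, Consumer<E> listener) {
        listeners.computeIfAbsent(type, k -> new CopyOnWriteArrayList<>())
                .add(event -> listener.accept(type.cast(event)));
    }

    // Deliver the event synchronously to listeners registered for its exact class.
    void post(Object event) {
        for (Consumer<Object> listener : listeners.getOrDefault(event.getClass(), List.of())) {
            listener.accept(event);
        }
    }
}

// Hypothetical events.
final class SubTaskFinished {
    final String mainTaskId;
    SubTaskFinished(String mainTaskId) { this.mainTaskId = mainTaskId; }
}

final class AllSubTasksFinished {
    final String mainTaskId;
    AllSubTasksFinished(String mainTaskId) { this.mainTaskId = mainTaskId; }
}

// Hypothetical monitor: tracks progress and either posts the next event or does nothing.
final class SketchMonitor {
    private final SketchEventBus bus;
    private final int totalSubTasks;
    private int finishedSubTasks;

    SketchMonitor(SketchEventBus bus, int totalSubTasks) {
        this.bus = bus;
        this.totalSubTasks = totalSubTasks;
        bus.subscribe(SubTaskFinished.class, this::onSubTaskFinished);
    }

    private synchronized void onSubTaskFinished(SubTaskFinished event) {
        finishedSubTasks++;
        if (finishedSubTasks == totalSubTasks) {
            bus.post(new AllSubTasksFinished(event.mainTaskId)); // send the next event
        }
        // Otherwise do nothing and wait for more subtask events.
    }

    // Fraction of subtasks that have finished, between 0.0 and 1.0.
    synchronized double progress() {
        return (double) finishedSubTasks / totalSubTasks;
    }
}
```

Because progress is derived from events that actually occurred, the number shown to the user reflects real work rather than a mocked estimate.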
An example of an event driven process is shown in the figure below:
1) One subtask fails to execute, one subtask executes normally, and the main task fails to execute
2) After the normal execution of all subtasks, the main task is merged
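The two outcomes listed above can be expressed as a small rule that derives the main task's status from its subtasks. The following is a sketch under assumptions: the enum names and values are hypothetical, and it assumes a single failed subtask is enough to fail the main task, as the first example suggests.

```java
import java.util.List;

// Hypothetical status values; not taken from the AGEIPort source.
enum SubTaskStatus { EXECUTING, FINISHED, FAILED }

enum MainTaskStatus { EXECUTING, FAILED, READY_TO_MERGE }

final class MainTaskStatusRule {
    // Derive the main task status from the statuses of its subtasks.
    static MainTaskStatus derive(List<SubTaskStatus> subTasks) {
        // Assumption: any failed subtask fails the whole main task.
        if (subTasks.contains(SubTaskStatus.FAILED)) {
            return MainTaskStatus.FAILED;
        }
        // The merge step can start only once every subtask has finished normally.
        boolean allFinished = !subTasks.isEmpty()
                && subTasks.stream().allMatch(s -> s == SubTaskStatus.FINISHED);
        return allFinished ? MainTaskStatus.READY_TO_MERGE : MainTaskStatus.EXECUTING;
    }
}
```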
Concurrent and asynchronous
Concurrency and asynchrony mainly address the performance, user experience, and development efficiency of import and export features.
When a main task runs in stand-alone mode, one node is selected at random from the application, and the subtasks run in parallel in that node's subtask thread pool.
When a main task runs in multi-machine mode, multiple slave nodes are selected from the application, and the subtasks run in parallel in the subtask thread pool of each slave.
The framework uses the Reactor threading model. When a task runs on the master node, it is first placed in a back-pressure queue, then taken from that queue and handed to the main task thread pool for execution. The main task is split into subtasks, which are distributed to the slave nodes and executed in each slave node's subtask thread pool.
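The flow from back-pressure queue to main task pool to subtask pool can be sketched in a single JVM. This is an illustration under assumptions, not the framework's implementation: the class name, pool sizes, and queue capacity are hypothetical, subtasks run locally instead of on slave nodes, and back pressure is modelled with a bounded work queue whose rejection handler blocks the submitter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BackPressureSketch {

    // Runs one main task per row count and returns the rows processed by each (sliceSize must be > 0).
    static List<Integer> run(List<Integer> mainTaskRowCounts, int sliceSize) {
        // Bounded queue in front of the main task pool: when it is full,
        // the handler blocks the submitting thread until space frees up.
        ThreadPoolExecutor mainTaskPool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(2),
                (task, pool) -> {
                    try {
                        pool.getQueue().put(task);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new RejectedExecutionException(e);
                    }
                });
        ExecutorService subTaskPool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> mainTasks = new ArrayList<>();
            for (int rows : mainTaskRowCounts) {
                Callable<Integer> mainTask = () -> runMainTask(rows, sliceSize, subTaskPool);
                mainTasks.add(mainTaskPool.submit(mainTask));
            }
            List<Integer> processed = new ArrayList<>();
            for (Future<Integer> f : mainTasks) {
                processed.add(f.get());
            }
            return processed;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("main task failed", e);
        } finally {
            mainTaskPool.shutdown();
            subTaskPool.shutdown();
        }
    }

    // Splits a main task into slices and runs each slice as a subtask.
    private static int runMainTask(int rows, int sliceSize, ExecutorService subTaskPool)
            throws InterruptedException, ExecutionException {
        List<Future<Integer>> subTasks = new ArrayList<>();
        for (int offset = 0; offset < rows; offset += sliceSize) {
            int count = Math.min(sliceSize, rows - offset);
            Callable<Integer> subTask = () -> count; // stand-in for real row processing
            subTasks.add(subTaskPool.submit(subTask));
        }
        int total = 0;
        for (Future<Integer> f : subTasks) {
            total += f.get();
        }
        return total;
    }
}
```

Keeping main tasks and subtasks in separate pools means a main task that waits for its subtasks never occupies a thread those subtasks need.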
In addition, the framework's concurrency and asynchrony are transparent to developers. Developers do not need to handle performance concerns themselves; they only implement the framework's callback interface to process a page of data or return one (for an export, this means implementing a paged query). The framework calls the developer's callback in parallel from multiple threads.
Import core interface:
BizImportResult write(BizUser user, QUERY query, List data) throws BizException;
Export core interface:
List queryData(BizUser user, QUERY query, BizExportPage bizExportPage) throws BizException;
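To show how a paged export callback might be driven in parallel, here is a minimal sketch. The types SketchUser, SketchPage, and SketchExportCallback are hypothetical stand-ins loosely modelled on the signature above; the real BizUser, BizExportPage, and BizException types are not shown in this article, so their fields are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-ins for the framework's user and page types.
final class SketchUser {
    final String id;
    SketchUser(String id) { this.id = id; }
}

final class SketchPage {
    final int offset;
    final int size;
    SketchPage(int offset, int size) { this.offset = offset; this.size = size; }
}

// The developer implements only this paged query.
interface SketchExportCallback<QUERY, DATA> {
    List<DATA> queryData(SketchUser user, QUERY query, SketchPage page);
}

final class PagedExportSketch {
    // Calls the callback once per page on a thread pool and concatenates the pages in order.
    static <Q, D> List<D> export(SketchExportCallback<Q, D> callback, SketchUser user,
                                 Q query, int totalRows, int pageSize) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<List<D>>> pages = new ArrayList<>();
            for (int offset = 0; offset < totalRows; offset += pageSize) {
                SketchPage page = new SketchPage(offset, Math.min(pageSize, totalRows - offset));
                Callable<List<D>> job = () -> callback.queryData(user, query, page);
                pages.add(pool.submit(job));
            }
            List<D> rows = new ArrayList<>();
            for (Future<List<D>> f : pages) {
                rows.addAll(f.get()); // futures are read in submission order
            }
            return rows;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("export failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Since the callback runs on several threads at once, an implementation along these lines should avoid shared mutable state or guard it explicitly.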
Decentralization
The decentralized design mainly improves the availability and elasticity of the import and export function: the import and export cluster scales with the business to improve performance, resources are isolated between applications, and stability improves.
The approach is to let business application nodes organize themselves into clusters, so that applications do not need to rely on external systems. Nodes within an application cluster communicate with each other to schedule, distribute, and execute tasks.
The result is that once the Jar package is added to a business application, a cluster forms automatically. Import and export tasks running in the cluster do not depend on external middleware or services and complete in a closed loop. Nodes in the cluster can register, subscribe, and communicate with one another (to schedule main tasks and subtasks and to synchronize their progress). There is no fixed master node in the cluster; all nodes are peers, and as long as one node is available, the function as a whole remains available.
The scheme uses the SideCar mechanism and introduces service discovery and communication mechanisms so that application nodes can organize themselves into clusters and communicate with each other, completing task scheduling and progress reporting within the application cluster. When an import/export task runs, any node of the application can act as a master node or a slave node, which is what makes the design decentralized. For any single import/export task, exactly one master node is responsible for processing it.
This decentralized, self-organized approach to task distribution and execution has three benefits. First, the import and export cluster scales elastically with the business application, which improves utilization of the application cluster's resources. Second, the clusters of different applications are isolated from each other and independent of external systems, which improves each application's SLA. Finally, because any node can act as a master, overall availability improves: import/export tasks become unavailable only when every node in the cluster is down.
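The per-task role assignment can be sketched as follows. This is a simplified illustration, not the framework's scheduling code: the names TaskRoles and RoleAssignmentSketch are hypothetical, node discovery is replaced by a plain list of live node names, and with a single live node the master is assumed to run the subtasks itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Random;

// Hypothetical result of assigning roles for one task.
final class TaskRoles {
    final String master;
    final List<String> slaves;
    TaskRoles(String master, List<String> slaves) {
        this.master = master;
        this.slaves = slaves;
    }
}

final class RoleAssignmentSketch {
    // Picks a random master for one task; every other live node becomes a slave.
    // Returns empty only when no node is alive, i.e. the function is unavailable.
    static Optional<TaskRoles> assign(List<String> liveNodes, Random random) {
        if (liveNodes.isEmpty()) {
            return Optional.empty();
        }
        String master = liveNodes.get(random.nextInt(liveNodes.size()));
        List<String> slaves = new ArrayList<>(liveNodes);
        slaves.remove(master);
        return Optional.of(new TaskRoles(master, slaves));
    }
}
```

Because the master is chosen per task rather than per cluster, losing any one node only affects the tasks it was handling at that moment.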
GitOps
GitOps is a development and operations practice that uses Git to manage infrastructure and application configuration. It is a cloud-native approach to continuous delivery. The core idea is to store the declarative, immutable infrastructure and applications of a system in a Git repository and to treat Git as the single source of truth for them. The main advantages are higher development efficiency and more controllable, reliable, stable, consistent, and convenient releases, which support release automation, continuous delivery, and cloud-native deployment.
Imagine that a rollback only requires redeploying or reverting to the previous version in the Git repository, instead of editing configuration in various consoles and worrying about whether the configuration is complete. Isn't that much easier?
The framework supports GitOps-style delivery to improve development and delivery efficiency and the SLA of the import/export function. Here GitOps means that the infrastructure an import/export task needs in order to run, together with the import/export business code written by the user, is stored in the Git repository of the user's application, where it can be reviewed and version-controlled and deployed automatically through a pipeline.
Non GitOps import/export function development, publishing and rollback process
GitOps import/export function development, publishing and rollback process
Standardized process and domain oriented model coding
You have probably seen the "power" of using large numbers of Maps or JSON objects as function parameters. They can easily erase your domain model and leave new developers miserable and overly cautious.
Where other frameworks use Map/JSON interfaces, AGEIPort uses generic interfaces. The main purpose is to let interfaces declare domain objects, so that developers code directly against the business domain model, which ultimately improves the readability and maintainability of the code.
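The contrast between Map-style and generic interfaces can be made concrete with a short sketch. All names here (OrderQuery, OrderRow, MapStyleExporter, TypedExporter, OrderExporter) are hypothetical and only illustrate the idea; they are not AGEIPort interfaces.

```java
import java.util.List;
import java.util.Map;

// Hypothetical domain objects.
final class OrderQuery {
    final String warehouse;
    OrderQuery(String warehouse) { this.warehouse = warehouse; }
}

final class OrderRow {
    final String orderId;
    final int quantity;
    OrderRow(String orderId, int quantity) {
        this.orderId = orderId;
        this.quantity = quantity;
    }
}

// Map style: keys and value types are only discoverable by reading the implementation.
interface MapStyleExporter {
    List<Map<String, Object>> query(Map<String, Object> params);
}

// Generic style: the domain model is part of the signature and checked by the compiler.
interface TypedExporter<QUERY, DATA> {
    List<DATA> query(QUERY query);
}

final class OrderExporter implements TypedExporter<OrderQuery, OrderRow> {
    @Override
    public List<OrderRow> query(OrderQuery query) {
        // Stand-in for a real lookup; returns one fixed row for the requested warehouse.
        return List.of(new OrderRow(query.warehouse + "-001", 3));
    }
}
```

With the typed version, renaming a field or changing its type produces a compile error at every call site instead of a runtime surprise.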
The framework provides a standard task processing flow, which means developers implement a standard set of interfaces to build an import or export feature. Its main purpose is to give import and export code a consistent structure: it improves readability and maintainability and ensures that every developer's code structure and logic are consistent and constrained by the framework's semantics, so developers avoid context switching when they work on import and export features in different applications or teams.
In addition, the standardized flow separates CPU-intensive stages from IO-intensive stages, which lets the framework improve performance transparently.