Alibaba Cloud FaaS Architecture Design
1. ECS-Based FaaS
In Alibaba Cloud's traditional architecture, users reach a load-balancing layer over the Internet, which then distributes requests across different machines. This architecture has several problems. On one hand, the resource ratio across multiple applications easily becomes unbalanced, wasting resources; on the other hand, image upgrades are cumbersome, the startup process takes minutes end to end, and scale-out is slow.
(1) Architecture Design
The ECS-based FaaS architecture is likewise accessed over the Internet, with traffic landing on SLB. SLB is a load-balancing system deployed inside Alibaba Cloud that mainly absorbs DDoS attacks and balances requests across multiple api_server instances. The api_server then handles CRUD operations on functions and applies to the Scheduler for containers.
The Scheduler manages the placement of containers on workers and routes incoming requests to those containers. A worker is what we call a computing node. If a function needs to reach the user's VPC environment, the computing node connects to that VPC through an ENI (elastic network interface).
(2) Support for multi-tenant and multi-application deployment
Namespaces are a resource-isolation mechanism that Linux introduced years ago: kernel-level settings that confine a group of processes to their own view of the system. cgroups complement them by controlling how much of each resource those processes may use. Containers are derived from this combination of namespaces and cgroups. The Docker scheme commonly used in the community packages an operating-system userland into an image, so the user sees a fairly complete operating system. Isolation exists at two levels: at the virtual-machine level, a VM (equivalent to an ECS instance) shields the entire CPU, memory, and device set for a single tenant; at the operating-system level, a cgroup layer wraps the same resources, which corresponds to a Docker container.
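The cgroup side of this isolation can be made concrete. The sketch below (ours, not Function Compute's actual implementation) shows the control-file writes that would cap a 128 MB function instance at a tenth of a CPU, using the cgroup v2 interface; the group name and the pids cap are illustrative assumptions:

```python
def cgroup_settings(mem_bytes: int, cpu_fraction: float, period_us: int = 100_000):
    """(file, value) pairs that would be written under /sys/fs/cgroup/<group>/."""
    quota_us = int(period_us * cpu_fraction)
    return [
        ("memory.max", str(mem_bytes)),          # hard memory ceiling
        ("cpu.max", f"{quota_us} {period_us}"),  # CPU quota per scheduling period
        ("pids.max", "256"),                     # cap the process count
    ]

settings = cgroup_settings(mem_bytes=128 * 1024 * 1024, cpu_fraction=0.1)
for name, value in settings:
    # In real code each pair would be written to the control file, e.g.
    # open("/sys/fs/cgroup/fn-instance/" + name, "w").write(value)
    print(name, value)
```

Namespaces then give the confined processes their own PID, mount, and network views; the cgroup limits above only govern how much they may consume.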
Application placement strategies include user-exclusive virtual machines, VPC-exclusive virtual machines, and co-locating apps with the same resource-access permissions on one machine. Mixing two different users in the same VM, that is, the same ECS instance, is risky because they share a kernel. To shield against that risk, we place only one tenant on each ECS instance. This approach has its own problems, the most prominent being low resource utilization for infrequently called functions.
(3) Rapid horizontal elastic expansion
How to achieve horizontal elastic expansion?
① Application container deployment: special language runtimes and generic libs/SDKs can be customized into the container while staying consistent with the community ecosystem. This eliminates extra downloads, is easier for users, and enables very fast startup.
② Build the public container image into an ECS image, boot machines from that ECS image, and quickly replenish the machine pool. The machine resource pool stays controllable, balancing performance and cost.
③ On pooled machines, pre-create containers, delay the mounting of code directories, start runtimes early, and run health checks early, so that little work remains when a user request arrives.
④ Control application size: limit user application size, encourage splitting business logic, and build common SDKs/libs into the image.
⑤ Accelerate image downloads through P2P image distribution, avoiding pressure on the download service; load on demand to reduce download latency and improve startup speed.
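Points ② and ③ amount to keeping a pool of generic, pre-warmed containers and late-binding the user's code. A toy sketch of that idea follows; the class, its fields, and the code path are our own illustrative inventions:

```python
from collections import deque

class WarmPool:
    """Pool of pre-started generic containers; code is mounted on demand."""

    def __init__(self, size: int):
        # Pre-boot containers: runtime started and health-checked,
        # but no user code directory mounted yet.
        self.idle = deque(self._boot_container() for _ in range(size))

    def _boot_container(self) -> dict:
        return {"runtime": "started", "healthy": True, "code": None}

    def acquire(self, code_dir: str) -> dict:
        # Fast path: grab a warm container and late-mount the code,
        # then replenish the pool in the background.
        c = self.idle.popleft() if self.idle else self._boot_container()
        c["code"] = code_dir                       # delayed code-directory mount
        self.idle.append(self._boot_container())   # keep the pool topped up
        return c

pool = WarmPool(size=4)
c = pool.acquire("/code/user-fn")   # hypothetical code path
print(c["code"], len(pool.idle))
```

The point of the design is that everything code-independent (boot, runtime start, health check) happens before any request arrives, leaving only the cheap mount on the request path.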
How to improve resource utilization?
During development we found that, for the same QPS, how requests are distributed within each time slice significantly affects resource consumption, so scheduling itself can improve resource utilization. For example, in the figure below, the overall TPS looks very stable at the macro level; zoom in to the millisecond level, however, and it is actually quite uneven. What impact does this non-uniformity have?
Assume the maximum concurrency of each container is 1, that is, a container can process only one task at a time. The figure below shows how the number of containers needed changes when requests a, b, c, d, e, and f arrive at different points in time.
In Scenario 1, when requests arrive evenly, only one container is needed at any time. This is the ideal case.
In Scenario 2, a scheduling delay can cause earlier and later requests to pile up at some point in time, doubling the number of containers; in the gaps between bursts, those containers sit underutilized, wasting resources.
In Scenario 3, if containers are slow to start or calls run long, request b overlaps in time with request a, forcing a new container to be created. If that new container has a long cold start, it in turn overlaps with request c. A poorly implemented scheduling system can thus avalanche: resource usage surges while actual utilization stays extremely low.
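The three scenarios can be checked with a tiny simulation: when each container's concurrency is capped at 1, the number of containers needed equals the peak number of overlapping requests. The timestamps below are illustrative:

```python
def containers_needed(requests):
    """requests: list of (start, duration); returns peak concurrent requests."""
    events = []
    for start, duration in requests:
        events.append((start, 1))              # request begins
        events.append((start + duration, -1))  # request ends
    # Ends sort before starts at the same instant, so back-to-back
    # requests can reuse one container.
    events.sort(key=lambda e: (e[0], e[1]))
    peak = active = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak

even = [(t, 1) for t in range(6)]           # Scenario 1: evenly spaced
print(containers_needed(even))               # prints 1
bursty = [(0, 1), (0, 1), (2, 1), (2, 1)]    # Scenario 2: requests collide
print(containers_needed(bursty))             # prints 2
```

The same arrival rate thus costs twice the containers once requests bunch up, which is exactly the waste the scheduler tries to avoid.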
These scenarios suggest two optimization directions for resource cost:
1. Make scheduling as uniform and reasonable as possible, avoiding bursts that spin up extra containers.
2. Reduce cold-start time as much as possible, avoiding the creation of many containers in a short window and the pointless scheduling overhead that comes with it.
Beyond these, high-density deployment can be considered to further improve per-machine resource utilization.
How to withstand disasters and prevent avalanches?
When an exception occurs in production, a user request may fail; after the failure, restarting or provisioning new resources to create a new container increases overall latency. Users retry again and again, the retries raise the load, the higher load causes more exceptions, and the system enters a vicious cycle. Avalanches can be prevented by optimizing startup speed, deploying across multiple partitions for disaster recovery, retrying with exponential backoff, using circuit breakers to block abnormal requests, preparing multiple availability zones, and relying on SLB to absorb DDoS attacks.
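Of the countermeasures listed, exponential backoff is the easiest to sketch in code. The helper below is our own sketch (`invoke` stands in for any function call): each retry waits roughly twice as long as the last, with jitter added so that many clients do not retry in lockstep:

```python
import random

def backoff_delays(retries: int, base: float = 0.1, cap: float = 10.0):
    """Seconds to wait before each retry: capped exponential plus jitter."""
    delays = []
    for attempt in range(retries):
        backoff = min(cap, base * (2 ** attempt))        # 0.1, 0.2, 0.4, ...
        delays.append(backoff + random.uniform(0, backoff))  # add jitter
    return delays

def call_with_backoff(invoke, retries: int = 5):
    """Call invoke(); on failure wait an exponentially growing delay and retry."""
    for delay in [0.0] + backoff_delays(retries):
        # In real code: time.sleep(delay) before this attempt.
        try:
            return invoke()
        except Exception:
            continue
    raise RuntimeError("all retries exhausted")
```

A circuit breaker would sit in front of `call_with_backoff`, cutting off retries entirely once the error rate crosses a threshold, so that backoff handles transient errors and the breaker handles sustained ones.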
2. FaaS Based on DPCA High-Density Deployment
(1) Why do we need to do high-density deployment?
First, elastic startup speed requirements are high: we hope to start 10,000 container instances per second, keep startup latency within 300 milliseconds, support container lifetimes at the minute level, and allocate resources at a granularity of 128 MB;
Second, cost must be lower. Because of the security-isolation constraints of the ECS architecture, there is heavy resource fragmentation and high latency on call bursts, which inflates the amount of resources needed;
Third, performance: ECS instances have smaller per-machine caches, more request-latency spikes, and high maximum request latency;
Fourth, stability: high concurrency stresses the system, resources are created and deleted frequently, and pressure on the ECS control plane makes the blast radius hard to contain.
(2) Technical challenges brought by high-density deployment architecture
The high-density deployment architecture poses several technical challenges:
The first is the security risk of single-machine multi-tenancy. Without solving isolation, secure high-density multi-tenant deployment on one machine is impossible, and resource-utilization density cannot be improved effectively;
The second is high-concurrency startup speed. As noted earlier, long cold starts seriously increase resource overhead and seriously hurt the user's latency experience;
Third, single-machine multi-tenant VPC network connectivity and security matter greatly. Establishing VPC network connections on ECS is very slow, which also seriously affects users' cold starts and resource utilization;
Finally, a disaster-tolerance design for high-density deployment is needed, because a failure on any one computing node can cause service exceptions for a large number of users.
(3) Optimization Based on Secure-Container Template Technology
How do we optimize with secure-container templates? Each container gets an exclusive virtual-machine sandbox, equivalent to an independent virtual machine with its own Linux kernel, so containers are securely isolated at the kernel level. When a DPCA machine starts, a large number of virtual machines are created from templates to improve startup speed. User code directories are delay-mounted via virtiofs, and users are isolated by virtual-machine microkernels. Each microkernel takes roughly 20 MB of memory, a single machine holds at least 2,000 containers, and cold start is kept around 250 milliseconds. A scheduling algorithm lets us use resources reasonably while honoring each user's resource quota.
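A quick back-of-the-envelope check of these figures, using only the numbers stated above (20 MB per microkernel, 2,000 containers, 128 MB resource granularity):

```python
MB = 1024 * 1024
kernel_per_container = 20 * MB   # microkernel memory overhead per container
containers = 2000                # minimum containers per machine
function_mem = 128 * MB          # resource granularity from section 2(1)

total_overhead = containers * kernel_per_container
print(total_overhead // (1024 ** 3))        # prints 39 (GiB of kernel overhead)
print(kernel_per_container / function_mem)  # prints 0.15625 (per 128 MB slot)
```

So kernel-level isolation for 2,000 containers costs on the order of 40 GiB of machine memory, about 16% on top of each 128 MB function slot, which is the price paid for per-container kernels instead of shared-kernel cgroup isolation.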
(4) On-Demand Code Loading
On-demand code loading rests on several observations and mechanisms: user containers running the same function reuse the same code, so a single DPCA machine needs to download it only once; scripting-language packages contain a large amount of code that is never executed; FUSE (Filesystem in Userspace) mediates the actual file reads in an intermediate layer; NAS underneath provides low-latency downloads; and OSS (Alibaba Cloud Object Storage) provides high-bandwidth downloads. Note that we mix NAS and OSS to load code: NAS has lower access latency and loads small files faster, so at the start of loading we asynchronously kick off a full download from OSS, while data that must be accessed immediately is read from NAS. The whole user code directory is packed into two files: one holds the directory index, the other the file contents. Because NAS access latency is low, small file contents can be fetched from the data file with a GetRange-style read. This loads user code as fast as possible and achieves a quick cold start.
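The two-file layout can be sketched as follows. This is our own illustrative pack format, not Function Compute's actual on-disk format: an index maps each path to an (offset, length) pair in a single data blob, so any small file can be served with one GetRange-style read:

```python
import io
import json

def pack(tree: dict) -> tuple:
    """tree: {path: bytes}. Returns (index_json, data_blob)."""
    index, blob = {}, io.BytesIO()
    for path, content in tree.items():
        index[path] = (blob.tell(), len(content))  # record offset and length
        blob.write(content)
    return json.dumps(index).encode(), blob.getvalue()

def get_range(blob: bytes, index: dict, path: str) -> bytes:
    """Serve one file with a single GetRange-style slice of the blob."""
    offset, length = index[path]
    return blob[offset:offset + length]

index_bytes, blob = pack({"handler.py": b"def f(): pass\n",
                          "lib/util.py": b"X = 1\n"})
index = json.loads(index_bytes)
print(get_range(blob, index, "lib/util.py"))   # prints b'X = 1\n'
```

In the real system the slice would be a ranged read against NAS rather than a byte slice in memory, while the full blob streams down from OSS in the background.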
(5) VPC network optimization
VPC gateway proxies, built on a network service mesh, provide security isolation for user VPC networks. In the ECS solution, attaching and detaching ENIs was very time-consuming: at least 2 to 3 seconds, with P99 even reaching 6 to 8 seconds. With high-density deployment on DPCA, we no longer attach an ENI for every secure container. Instead, containers on a DPCA machine connect uniformly to a gateway proxy, and the user's ENI resides on the gateway cluster, which makes network setup much faster. This is a huge optimization for both user experience and resource overhead.
(6) Resource allocation rate
Deployment density is improved by mixing different kinds of multi-tenant workloads and reasonably packing containers with different resource requirements onto one physical DPCA (bare-metal) machine, thereby improving the resource allocation rate.
3. Summary
Lecturer profile: Zhu Peng, Alibaba Cloud Serverless technical expert
Responsible for the design and development of scheduling in Alibaba Cloud Function Compute, he has participated in the design, development, and rollout of Function Compute in several areas, including high concurrency, disaster tolerance, cold-start optimization, scheduling and resource management, and the DPCA bare-metal architecture. He is one of the leading promoters of Function Compute's high-density deployment architecture on DPCA. He currently focuses on improving resource utilization and on researching and designing low-latency resource-scheduling solutions for large-scale concurrency.