By Cloud-Native Special Interest Group (SIG)
In May 2015, engineers from Hyper.sh and the Intel Open Source Technology Center independently released two virtualized container projects, runV and Clear Containers, respectively. These two projects, the predecessors of Kata Containers [1], exchanged ideas extensively. After two and a half years of independent development, they were merged into the Kata Containers project at the end of 2017, and the new project was donated to the OpenStack Foundation for management. It was also the OpenStack Foundation's first pilot project. In April 2019, the OpenStack Foundation recognized Kata Containers as its second top-level project; in the previous ten years, the Foundation had only one top-level project.
The advent of Kata Containers solved many problems that ordinary containers cannot:
Beyond these advantages, secure containers strive to make virtualization extremely lightweight, so that their overall resource consumption and elasticity approach those of runC-based containers. This achieves the goal of "Secure as VM, Fast as Container."
In May 2018, during the Kata 1.x era, Kata and the containerd community jointly formulated the shimv2 interface specification, and Kata Containers was the first to support it. In November 2018, Kata used containerd-shim-v2 and vsock to significantly reduce its number of components. With a lightweight hypervisor and a simplified kernel, Kata significantly reduces memory overhead and container startup time. More importantly, lower deployment complexity also significantly improves stability, especially when the system is overloaded.
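To make the shimv2 idea more concrete, here is a minimal Rust sketch, assuming a heavily trimmed task interface: one shim object serves every container in a sandbox, instead of each container needing its own shim. The trait `TaskService`, the type `SandboxShim`, and their methods are hypothetical. The real shim v2 API is a ttrpc service defined by containerd with many more calls, and this sketch only tracks container state in memory.

```rust
use std::collections::HashMap;

// Lifecycle states tracked for each container (simplified).
#[derive(Debug, Clone, Copy, PartialEq)]
enum TaskState {
    Created,
    Running,
    Stopped,
}

// Hypothetical, heavily trimmed stand-in for the shim v2 task interface.
trait TaskService {
    fn create(&mut self, id: &str) -> Result<(), String>;
    fn start(&mut self, id: &str) -> Result<(), String>;
    fn kill(&mut self, id: &str) -> Result<(), String>;
}

// One shim per sandbox (pod): every container in the pod is tracked here,
// instead of each container getting its own shim process.
#[derive(Default)]
struct SandboxShim {
    tasks: HashMap<String, TaskState>,
}

impl SandboxShim {
    fn state(&self, id: &str) -> Option<TaskState> {
        self.tasks.get(id).copied()
    }

    // Move a container from one state to another, rejecting invalid moves.
    fn transition(&mut self, id: &str, from: TaskState, to: TaskState) -> Result<(), String> {
        let state = self
            .tasks
            .get_mut(id)
            .ok_or_else(|| format!("no such container: {}", id))?;
        if *state != from {
            return Err(format!("container {} is {:?}, expected {:?}", id, *state, from));
        }
        *state = to;
        Ok(())
    }
}

impl TaskService for SandboxShim {
    fn create(&mut self, id: &str) -> Result<(), String> {
        if self.tasks.contains_key(id) {
            return Err(format!("container {} already exists", id));
        }
        self.tasks.insert(id.to_string(), TaskState::Created);
        Ok(())
    }

    fn start(&mut self, id: &str) -> Result<(), String> {
        self.transition(id, TaskState::Created, TaskState::Running)
    }

    fn kill(&mut self, id: &str) -> Result<(), String> {
        self.transition(id, TaskState::Running, TaskState::Stopped)
    }
}

fn main() {
    let mut shim = SandboxShim::default();
    for id in ["pause", "app", "sidecar"] {
        shim.create(id).unwrap();
        shim.start(id).unwrap();
    }
    shim.kill("sidecar").unwrap();
    println!("containers served by one shim: {}", shim.tasks.len());
    println!("sidecar state: {:?}", shim.state("sidecar"));
}
```

Keeping every container of a pod behind a single shim is what removes the per-container host processes in this model; the small state machine simply rejects out-of-order calls.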
(Figure 1: Kata 1.x Architecture Diagram)
In 2019, Kata was upgraded from 1.x to 2.x, bringing very important technical progress. kata-agent was rewritten in Rust, which significantly reduced its memory overhead and its overall attack surface.
Starting with version 2.x, Kata gradually became widely known and began to influence the technical evolution of upstream communities. Kata's overall architecture matured in version 2.x. From here on, further improvements in Kata's capabilities must come from optimizing its own dedicated components, one breakthrough at a time. Rewriting kata-agent in Rust in version 2.x to reduce memory overhead is a good example.
While Kata was developing rapidly, a team at Alibaba Cloud named Kangaroo was building secret weapons based on Kata for cloud-native scenarios.
Alibaba Cloud has spent many years building Kangaroo, an underlying cloud-native system that addresses the technical challenges of cloud-native workloads, such as high density and high concurrency. Kangaroo's secure container solution is built on the Kata Containers project with further optimizations. For example, Kangaroo rewrote the Go runtime of Kata Containers 2.0 in Rust to reduce the memory overhead of the container runtime. The team also developed Dragonball, a lightweight VMM for Kata Containers that is deeply optimized for container scenarios. Through the integrated design of the container runtime and the VMM, the overall Kata experience has reached a new level.
What capabilities can Kangaroo secure containers provide?
Simply put: ultimate density and elasticity.
Kangaroo can launch 3,000 secure containers in six seconds and run more than 4,000 secure containers on one host at the same time. Kangaroo supports nearly 12 billion calls per day for Alibaba Cloud Function Compute (FC) and the creation of more than one million container instances per day for Elastic Container Instance (ECI). This extreme performance significantly strengthens the core competitiveness of these businesses. Beyond raw performance, Kangaroo secure containers also enable hybrid deployment, substantially reducing resource costs.
Kangaroo's internal achievements are closely tied to the development of the Kata community. The Kangaroo team has therefore contributed the internal system it honed over many years to the Kata community, to jointly promote the community's development.
In cloud-native scenarios, the industry demands ever faster container startup, lower resource consumption, and higher stability. These are also the challenges secure containers face compared with ordinary containers. Years ago, the interaction between runV and Clear Containers gave birth to Kata Containers, a top-level project. Kangaroo and Kata are now coming together in a similar way: having been proven in online production, Kangaroo's internal system will be open-sourced to the Kata community. This will help Kata advance to version 3.0, improving overall user experience and stability while reducing resource consumption and startup time.
In summary, Kata 3.0 will add the following new features:
(Kata 3.0's extensible architecture also lets users select other VMMs through configuration options to meet specific needs.)
The rest of this article expands on the key updates in Kata 3.0.
(Figure 2: Kata 3.0 Architecture Diagram)
The Dragonball sandbox is a lightweight KVM-based VMM tailored for Kata. Beyond the capabilities of conventional hypervisors, it includes optimizations for container workloads:
Why do we need a built-in VMM?
(Figure 3: Architecture Diagram of Runtime and VMM of Kata 2.x)
As shown in the figure, in Kata 2.x and earlier the runtime and the VMM run as separate processes: the runtime process forks the VMM process and interacts with it through RPC. Inter-process interaction generally consumes more resources than in-process interaction, so efficiency is relatively low. Operations and maintenance (O&M) costs also matter. For example, when reclaiming resources after a failure, other components must detect that a process has failed and then trigger the corresponding reclamation; every additional process makes recovery harder.
In addition, each Kata version must be adapted to specific VMM versions. VMM version mismatches can break Kata, forcing users to spend extra time adjusting versions and configurations.
Although secure containers have become the industry's standard solution for multi-SLO hybrid deployment and public-cloud multi-tenancy, there is still no VMM tailored to the secure container ecosystem. Existing VMMs are positioned for other domains. For example, QEMU is comprehensive, but its code base exceeds one million lines, its startup is slow, and its resource consumption is relatively high. Firecracker is extremely lightweight, but it lacks many virtualization capabilities, so it cannot cover complex and varied secure container scenarios.
A common solution is needed for the problems of multi-process interaction, complex O&M, VMM version adaptation, and the lack of a VMM dedicated to secure containers. We need a VMM focused on secure container scenarios that provides the best virtualization solution for the secure container ecosystem. Dragonball VMM was created for this purpose. Kata 3.0 will ship Dragonball VMM built into the Kata ecosystem, and the VMM will evolve together with Kata to solve the problems above. We will introduce Dragonball in subsequent articles. Please stay tuned!
The Dragonball sandbox enables the built-in VMM by packaging the VMM's features as a Rust library, which the runtime uses directly. Because the runtime and the VMM run in the same process, they gain faster message processing and easier API synchronization. The lifecycles of the runtime and the VMM also stay consistent, which makes resource reclamation and exception handling simpler, as shown in Figure 4:
(Figure 4: Diagram of Kata 3.0 Built-in VMM)
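The following sketch illustrates, under stated assumptions, what running the VMM inside the runtime process buys: requests travel over an in-process channel instead of cross-process RPC, the VMM event loop runs on its own OS thread, and the VMM's lifetime is tied to the Rust value that owns it, so dropping that value stops and joins the VMM thread. `BuiltinVmm`, `VmmRequest`, and the vCPU counter are hypothetical stand-ins, not Dragonball's API, and no real virtual machine is created.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

// Hypothetical requests the runtime can send to the in-process VMM.
enum VmmRequest {
    AddVcpus(u32),
    // Carries a reply channel so the caller can wait for the answer.
    QueryVcpus(Sender<u32>),
}

// Hypothetical stand-in for a VMM linked into the runtime as a library.
// The VMM event loop runs on its own OS thread inside the same process.
struct BuiltinVmm {
    tx: Option<Sender<VmmRequest>>,
    handle: Option<JoinHandle<u32>>,
}

impl BuiltinVmm {
    fn start(initial_vcpus: u32) -> Self {
        let (tx, rx) = mpsc::channel::<VmmRequest>();
        let handle = thread::Builder::new()
            .name("vmm".to_string())
            .spawn(move || {
                let mut vcpus = initial_vcpus;
                // The loop ends once every Sender has been dropped.
                for req in rx {
                    match req {
                        VmmRequest::AddVcpus(n) => vcpus += n,
                        VmmRequest::QueryVcpus(reply) => {
                            let _ = reply.send(vcpus);
                        }
                    }
                }
                vcpus
            })
            .expect("failed to spawn VMM thread");
        BuiltinVmm { tx: Some(tx), handle: Some(handle) }
    }

    fn sender(&self) -> &Sender<VmmRequest> {
        self.tx.as_ref().expect("VMM already shut down")
    }

    fn add_vcpus(&self, n: u32) {
        self.sender().send(VmmRequest::AddVcpus(n)).expect("VMM thread exited");
    }

    fn vcpus(&self) -> u32 {
        let (reply_tx, reply_rx) = mpsc::channel();
        self.sender().send(VmmRequest::QueryVcpus(reply_tx)).expect("VMM thread exited");
        reply_rx.recv().expect("VMM thread exited")
    }

    // Explicit shutdown: close the channel, then wait for the VMM thread.
    fn shutdown(mut self) -> u32 {
        drop(self.tx.take());
        self.handle.take().expect("already joined").join().expect("VMM thread panicked")
    }
}

// Dropping the owner also stops the VMM, so both lifecycles end together.
impl Drop for BuiltinVmm {
    fn drop(&mut self) {
        drop(self.tx.take());
        if let Some(handle) = self.handle.take() {
            let _ = handle.join();
        }
    }
}

fn main() {
    let vmm = BuiltinVmm::start(1);
    vmm.add_vcpus(2);
    println!("vCPUs: {}", vmm.vcpus()); // prints "vCPUs: 3"
    println!("VMM exited with {} vCPUs", vmm.shutdown());
}
```

Because the channel is FIFO, a query always observes earlier requests from the same sender, and no separate process has to be monitored or reaped when something goes wrong.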
The Rust version of kata-runtime provides a scalable framework for Service, Runtime, and Hypervisor. It also includes configuration logic for different scenarios.
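A hypothetical sketch of the Hypervisor layer of such a framework: a trait that every VMM backend implements, plus a factory that picks a backend from a configuration value. All names here (`Hypervisor`, `Dragonball`, `Qemu`, `new_hypervisor`) are illustrative assumptions rather than the actual runtime-rs types, and the string argument stands in for a value read from Kata's configuration.

```rust
// Hypothetical common interface that every VMM backend implements.
trait Hypervisor {
    fn name(&self) -> &'static str;
    // Whether the VMM runs inside the runtime process.
    fn built_in(&self) -> bool;
}

// Built-in VMM, linked into the runtime as a library.
struct Dragonball;
// Example of an external VMM that runs as a separate process.
struct Qemu;

impl Hypervisor for Dragonball {
    fn name(&self) -> &'static str {
        "dragonball"
    }
    fn built_in(&self) -> bool {
        true
    }
}

impl Hypervisor for Qemu {
    fn name(&self) -> &'static str {
        "qemu"
    }
    fn built_in(&self) -> bool {
        false
    }
}

// Pick a backend from a configuration value; the rest of the runtime
// only sees the trait object.
fn new_hypervisor(config_name: &str) -> Result<Box<dyn Hypervisor>, String> {
    match config_name {
        "dragonball" => Ok(Box::new(Dragonball)),
        "qemu" => Ok(Box::new(Qemu)),
        other => Err(format!("unsupported hypervisor: {}", other)),
    }
}

fn main() {
    for name in ["dragonball", "qemu", "unknown-vmm"] {
        match new_hypervisor(name) {
            Ok(h) => println!("{}: built-in = {}", h.name(), h.built_in()),
            Err(e) => println!("error: {}", e),
        }
    }
}
```

Adding another VMM in this design means one new trait implementation and one new match arm; callers that hold a `Box<dyn Hypervisor>` do not change.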
Our use cases involve various resources, each with its own subtypes, and operations differ between subtypes, especially for Virt-Container. Resources may also depend on each other: for example, share-fs rootfs and share-fs volumes use the share-fs resource to share files with the VM. Currently, network and share-fs are treated as sandbox resources, while rootfs, volume, and cgroup are treated as container resources. We therefore abstract a common interface for all resources and let each subtype implement its own operations.
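A minimal sketch of such a common resource interface, assuming a simplified model: each resource reports its kind, its scope (sandbox or container), and the kinds it depends on, so share-fs rootfs and share-fs volume can require share-fs. The trait and type names are hypothetical, `setup` only checks dependencies rather than mounting anything, and the rule "set up all sandbox resources before container resources" is an assumption of the sketch.

```rust
use std::collections::HashSet;

// Where a resource lives, following the split described in the text.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Scope {
    Sandbox,
    Container,
}

// Hypothetical common interface; each subtype supplies its own behavior.
trait Resource {
    fn kind(&self) -> &'static str;
    fn scope(&self) -> Scope;
    // Kinds that must already be set up before this resource.
    fn depends_on(&self) -> &'static [&'static str] {
        &[]
    }
    // Placeholder setup: only verifies that dependencies are ready.
    fn setup(&self, ready: &HashSet<&'static str>) -> Result<(), String> {
        for dep in self.depends_on() {
            if !ready.contains(dep) {
                return Err(format!("{} requires {}", self.kind(), dep));
            }
        }
        Ok(())
    }
}

struct Network;
struct ShareFs;
struct ShareFsRootfs;
struct ShareFsVolume;
struct Cgroup;

impl Resource for Network {
    fn kind(&self) -> &'static str { "network" }
    fn scope(&self) -> Scope { Scope::Sandbox }
}
impl Resource for ShareFs {
    fn kind(&self) -> &'static str { "share-fs" }
    fn scope(&self) -> Scope { Scope::Sandbox }
}
impl Resource for ShareFsRootfs {
    fn kind(&self) -> &'static str { "share-fs-rootfs" }
    fn scope(&self) -> Scope { Scope::Container }
    fn depends_on(&self) -> &'static [&'static str] { &["share-fs"] }
}
impl Resource for ShareFsVolume {
    fn kind(&self) -> &'static str { "share-fs-volume" }
    fn scope(&self) -> Scope { Scope::Container }
    fn depends_on(&self) -> &'static [&'static str] { &["share-fs"] }
}
impl Resource for Cgroup {
    fn kind(&self) -> &'static str { "cgroup" }
    fn scope(&self) -> Scope { Scope::Container }
}

// Set up sandbox-scoped resources first, then container-scoped ones.
fn setup_all(resources: &[Box<dyn Resource>]) -> Result<Vec<&'static str>, String> {
    let mut ready = HashSet::new();
    let mut order = Vec::new();
    for scope in [Scope::Sandbox, Scope::Container] {
        for r in resources.iter().filter(|r| r.scope() == scope) {
            r.setup(&ready)?;
            ready.insert(r.kind());
            order.push(r.kind());
        }
    }
    Ok(order)
}

fn main() {
    let resources: Vec<Box<dyn Resource>> = vec![
        Box::new(ShareFsRootfs),
        Box::new(Cgroup),
        Box::new(Network),
        Box::new(ShareFs),
    ];
    match setup_all(&resources) {
        // prints: setup order: ["network", "share-fs", "share-fs-rootfs", "cgroup"]
        Ok(order) => println!("setup order: {:?}", order),
        Err(e) => println!("setup failed: {}", e),
    }
}
```

Shared logic such as dependency checking lives in the trait's default methods, while each subtype overrides only what differs, which mirrors the "common interface plus subtype-specific operations" split described above.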
Compared with Go, Rust offers better performance and lower resource consumption, and in specific scenarios it is even more efficient than C++. Rust also eliminates many memory safety risks (such as null pointers, dangling pointers, and out-of-bounds memory access) and makes memory leaks less likely, which significantly reduces program crashes. These properties are especially important for system software such as Kata. Go's advantage, however, is its built-in support for writing concurrent programs (such as goroutines), which significantly reduces CPU and memory overhead, especially for workloads with many I/O-intensive tasks. We built Async Rust Runtime to combine the concurrency and asynchrony advantages of Go with the safety and low overhead of Rust.
In kata-runtime, the number of OS worker threads is controlled by the TOKIO_RUNTIME_WORKER_THREADS environment variable and defaults to two. TTRPC-related and container-related tasks all run on these tokio threads. Related dependencies (such as Timer, File, and Netlink) had to be switched to Async Rust Runtime, which readily supports non-blocking I/O and timers. For now, however, Async Rust Runtime is used only in kata-runtime: the built-in VMM still runs on its own OS threads, which keeps those threads controllable.
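A minimal std-only sketch of how the worker thread count described above could be read: parse TOKIO_RUNTIME_WORKER_THREADS and fall back to two. The helper names are hypothetical, and treating empty, unparsable, or zero values as "use the default" is an assumption of this sketch (tokio's `Builder::worker_threads` panics if given zero). Since tokio is an external crate, the hand-off to it is shown only in a comment.

```rust
use std::env;

// Default from the text: two worker threads.
const DEFAULT_WORKER_THREADS: usize = 2;

// Hypothetical helper: turn an optional raw value into a thread count.
// Unset, unparsable, or zero values fall back to the default (assumption).
fn worker_threads_from(raw: Option<&str>) -> usize {
    raw.and_then(|v| v.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(DEFAULT_WORKER_THREADS)
}

// Read the environment variable named in the text.
fn worker_threads() -> usize {
    let raw = env::var("TOKIO_RUNTIME_WORKER_THREADS").ok();
    worker_threads_from(raw.as_deref())
}

fn main() {
    let n = worker_threads();
    // With the tokio crate (not used here), n would be handed to
    // tokio::runtime::Builder::new_multi_thread().worker_threads(n).
    println!("async runtime worker threads: {}", n);
}
```

Separating parsing from the environment lookup keeps the fallback rule testable without mutating process-wide environment variables.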
After the introduction above, the meaning of "out-of-the-box" should now be clearer. Here is a summary of the concept:
By building the Dragonball VMM into kata-runtime, the underlying infrastructure of the secure container ecosystem forms a closed loop. The extensible Rust runtime supports different forms of secure containers, such as confidential containers. Users no longer need to deal with complicated adaptation: they only need to download, compile, and run Kata, and everything becomes simpler and easier.
The out-of-the-box experience will bring a leap in ease of use, maintainability, stability, and other aspects, both for newcomers who want to get started with Kata quickly and for engineering teams that need production-ready Kata. Every technology ultimately evolves toward simplicity, so we firmly believe that simplicity is the future direction of Kata.
Kata 3.0 has completed its first stage of development, and its basic features have been implemented. The remaining features will be implemented in the second and third stages. The first alpha version of Kata 3.0 has been available since July 25, 2022. We look forward to your participation in building and trying out Kata 3.0.
The release schedule for subsequent Kata 3.0 versions is shown below. If you are interested in Kata, please join us in building it.
The Cloud-Native SIG will integrate the cloud-native strengths of OpenAnolis, deliver an out-of-the-box cloud-native release of OpenAnolis, and provide solutions for scenarios such as big data and hybrid deployment based on user needs. This will help users build application clusters faster and better with cloud-native technologies. As one of the OpenAnolis cloud-native projects, Kata will also be used to jointly build cloud-native services based on secure containers.
Capabilities from subsequent Kata versions will be rolled out across the cloud-native ecosystem of the OpenAnolis Cloud-Native SIG. If you are interested, please join us to build a cloud-native ecosystem together.
[1] Kata Containers Project: https://github.com/kata-containers/kata-containers
[2] Nydus Project: https://github.com/dragonflyoss/image-service