AI Software Stack-Oriented Optimized Design - AI Ecosystem Construction in OpenAnolis

The article explores how the growing AI industry is influencing operating systems, including AI-driven design, AI-assisted user experience, and AI-aided OS development.

By Lin Yan

As the artificial intelligence (AI) industry grows rapidly, AI servers will make up a growing share of data centers, possibly more than half of all servers. How does this trend affect operating systems? Lin Yan, a maintainer of the OpenAnolis AI SIG and a senior technical expert at Alibaba Cloud, wrote an article sharing OpenAnolis's practice and ideas on AI ecosystem construction from four aspects: the impact of AI on operating systems, operating system design optimized for the AI software stack, AI technologies that assist users in using operating systems, and the use of AI for operating system development.

The article is shared below.

Impact of AI on Operating Systems

As the AI industry grows rapidly, AI servers will make up a growing share of data centers, possibly more than half of all servers. How does this trend affect operating systems? What challenges and opportunities will operating systems face in the future?

The first aspect is operating system design optimized for the AI software stack, that is, System for AI. However far AI advances, it still has to run on an operating system. It is therefore worth considering how operating systems can be better optimized to run AI workloads and work in closer synergy with them.

The second aspect is whether diverse AI technologies can be integrated into operating systems. Doing so would give operating system users a new entry point with powerful capabilities, such as intelligent tuning or autonomous decision-making for system control, and a better overall experience.

The third aspect is to use AI for operating system development. As a result of the rapid development of AI technologies, AI programming is booming. AI can be used to fix bugs and autonomously optimize operating system components. Therefore, we venture to predict that AI will be directly used for operating system development. In fact, there are already some signs of this trend.

The following sections present what OpenAnolis has done in each of these three areas.

AI Software Stack-oriented Optimized Design of Operating Systems

[Figure: Overall framework of AI applications in OpenAnolis]

The preceding figure shows the overall framework of AI applications in OpenAnolis, from infrastructure platforms to distribution channels. OpenAnolis will keep optimizing the infrastructure, including AI-oriented infrastructure, while taking into account the characteristics of AI containers and other containers. The AI hardware ecosystem is currently fragmented, so OpenAnolis has carried out optimization for all mainstream AI chips to provide full hardware support.

Today, very few distributions ship software such as PyTorch as RPM packages or bundle it with the operating system. Why? AI software evolves quickly, and developers still install it in the usual ways: container images, pip, or conda. These approaches have inherent problems, and the software supply chain faces severe security challenges. Developers often find that an image built not long ago no longer runs, because upstream changes are outside their control and downstream developers and users must redo compatibility adjustments and verification each time. There are serious security risks from a product perspective as well. Some mining scripts are even disguised as well-known software and published in upstream open source communities. Once installed and distributed, they expose users to severe risk. The main problem, therefore, is that software from upstream ecosystems cannot be used directly and requires extensive security analysis before it is used.

OpenAnolis's principles of reproducible builds and security-oriented selection can be applied to software package distribution. Anolis OS ships mainstream AI software that can be installed with the yum install command. After installation, Anolis OS further packages the software into a container so that it can run in any environment. In addition, OpenAnolis will provide AI-related practices and virtual machine (VM) images for newcomers to AI.

OpenAnolis container images aim for the broadest possible coverage of mainstream AI ecosystems from a hardware perspective, including optimization for AI-oriented data types and instruction set extensions such as bfloat16 (BF16), INT8, AMX, and VNNI on both GPUs and CPUs. Collaborative software and hardware optimization is a natural strength of operating systems, because AI developers and users are usually less familiar with the instruction sets of the underlying chips than operating system developers are. This is where OpenAnolis has an advantage. The AI container images provided by OpenAnolis go a long way toward easing the fragmentation of the AI hardware ecosystem. With OpenAnolis images, AI developers can quickly find a suitable platform for development and operation and get a well-optimized container image without having to learn about the underlying hardware.
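To make the hardware side concrete, the sketch below checks which of these extensions a Linux CPU reports. It is only an illustration, not part of any OpenAnolis tooling: the function name and the display labels are made up for this example, and it covers CPUs only. The flag strings are the ones the Linux kernel prints in /proc/cpuinfo ("flags" on x86, "Features" on Arm).

```python
# Kernel flag names as printed in /proc/cpuinfo, mapped to readable labels.
# The labels are illustrative; only the keys come from the kernel.
AI_FLAGS = {
    "amx_tile": "AMX",
    "amx_bf16": "AMX BF16",
    "amx_int8": "AMX INT8",
    "avx512_vnni": "AVX-512 VNNI",
    "avx512_bf16": "AVX-512 BF16",
    "bf16": "Arm BF16",
    "i8mm": "Arm INT8 matrix multiply",
}


def detect_ai_features(cpuinfo_text):
    """Return a sorted list of AI-related CPU features found in cpuinfo text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 lists capabilities under "flags", Arm under "Features".
        if sep and key.strip().lower() in ("flags", "features"):
            for flag in value.split():
                if flag in AI_FLAGS:
                    found.add(AI_FLAGS[flag])
    return sorted(found)


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the kernel's CPU description (Linux only)."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```

On a Linux host, `detect_ai_features(read_cpuinfo())` lists whatever the machine supports. An image selector could use a result like this to choose between hardware-specific image variants, which is the kind of decision the OpenAnolis images are meant to take off the developer's hands.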

Currently, OpenAnolis focuses on training and inference, including compatibility with ModelScope. Some of the large models in ModelScope can run on all hardware platforms that are compatible with OpenAnolis. These are the projects the OpenAnolis AI SIG is working on. You are welcome to visit our official website and try out the AI container images: https://cr.openanolis.cn/mirror

AI container images have been released one after another, and they offer six major benefits.

First, rich diversity. For example, container images specifically optimized for different hardware are provided.

Second, container image scanning. This feature checks whether an image contains malware to ensure basic security.

Third, timely updates. A major version is released upstream every three months. Most AI application developers would want to experience the AI features of the latest version. We will use the production platform of OpenAnolis containers to automatically generate new AI images when each version is released, so that users can immediately experience the latest upstream software.

Fourth, timely troubleshooting and Common Vulnerabilities and Exposures (CVE) patching from OpenAnolis. OpenAnolis has an operating system that is provided with long-term support and continuously updated. All CVEs of the operating system, including critical security vulnerabilities, are fixed at the earliest opportunity.

Fifth, optimal performance on each supported platform.

Sixth, guaranteed image supply chain security and a software bill of materials (SBOM) security platform. In non-RPM or otherwise uncontrolled software ecosystems, supply chain security incidents abound, including serious cases such as vendors going dark and software poisoning. OpenAnolis hopes to establish an SBOM security platform so that AI developers and OpenAnolis users can rely on the secure container images provided by OpenAnolis.
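As a rough illustration of how an SBOM supports this kind of check, the sketch below reads a CycloneDX-style JSON document and flags components that appear on a denylist. It is a sketch under stated assumptions, not the OpenAnolis platform: the function name, the denylist shape, and the package names in the usage note are hypothetical, and only the top-level "components" array with "name" and "version" fields is taken from the CycloneDX JSON format.

```python
import json


def find_denied_components(sbom_json, denylist):
    """Return (name, version) pairs from the SBOM that match the denylist.

    denylist maps a package name to a set of versions known to be bad.
    An empty set means every version of that package is rejected.
    (This denylist shape is an assumption made for this example.)
    """
    sbom = json.loads(sbom_json)
    hits = []
    for component in sbom.get("components", []):
        name = component.get("name")
        version = component.get("version")
        if name in denylist:
            bad_versions = denylist[name]
            if not bad_versions or version in bad_versions:
                hits.append((name, version))
    return hits
```

For example, given an SBOM that lists a package posing as a well-known library, a call such as `find_denied_components(sbom_text, {"fake-miner": set()})` would return that entry so the image can be blocked before distribution. A real platform would also need vulnerability data and signature checks, which are out of scope here.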

For example, OpenAnolis adds BF16 support on Arm to optimize the performance of PyTorch inference. The performance of the default version shipped with Anolis OS has been greatly improved through compilation optimization, operator optimization, computing library optimization, and basic library optimization.
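For readers unfamiliar with BF16, the sketch below shows what the format is: a 32-bit float reduced to its upper 16 bits (1 sign bit, 8 exponent bits, 7 mantissa bits), here with round-to-nearest-even. It only illustrates the number format in plain Python. It is not the OpenAnolis or PyTorch implementation, and the function names are made up for this example.

```python
import struct


def float_to_bf16_bits(x):
    """Round a float to bfloat16 and return the 16-bit pattern as an int."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # NaN: keep it a NaN by forcing a mantissa bit in the upper half.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return (bits >> 16) | 0x0040
    # Round to nearest, ties to even, on the 16 bits being dropped.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(pattern):
    """Expand a 16-bit bfloat16 pattern back to a Python float."""
    (x,) = struct.unpack("<f", struct.pack("<I", (pattern & 0xFFFF) << 16))
    return x
```

Because BF16 keeps the full 8-bit exponent of float32, it covers the same range while halving memory traffic, at the cost of precision: `bf16_bits_to_float(float_to_bf16_bits(3.14159))` gives 3.140625. Hardware BF16 instructions let inference kernels work on these values directly instead of emulating them.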

In addition to the AI software stack, OpenAnolis has collaborated with Nvidia to support Anolis OS as the default operating system running on Nvidia data processing units (DPUs). Anolis OS 8.6 is currently one of the operating systems officially supported by Nvidia's BlueField bootstream (BFB) images.

In the future, OpenAnolis will continue to explore fields such as AI networking and storage, and to incubate and improve solutions in the open source field. It will address existing problems in the loading, distribution, and storage of large AI models, in particular the significant problem of distributing and loading models that are tens of gigabytes in size. OpenAnolis also plans to optimize distributed file system technologies such as CephFS.

Apart from AI networking and storage, OpenAnolis will round out the AI technology stack with an AI IDE and other software required for MLOps, integrating and promoting these components step by step. OpenAnolis aims to provide comprehensive solutions for every stage of AI development, including frontend data processing, intermediate training, and final inference deployment, and ultimately to make Anolis OS a more user-friendly operating system.

OpenAnolis has been continuously promoting AI practices that help AI users and newcomers easily run the AI software or large models they are interested in, such as those for graphics, image, audio, and singing-voice processing. A series of such practices has been carried out on Anolis OS. With Anolis OS, AI users can quickly launch the AI components they care about and benefit from well-optimized hardware support and performance.

AI Technologies Assisting Users in Using Operating Systems

Having covered operating system design optimized for the AI software stack, let's now look at AI integration, that is, the AI technologies built into Anolis OS.

Copilot and KeenTune are two important AI components provided by OpenAnolis. Copilot will serve as the portal for operations and maintenance (O&M) troubleshooting and for the knowledge base, and may even become the main entry point to Anolis OS. When users run into problems with the operating system or with software installation, they can ask Copilot directly.

KeenTune is an intelligent tuning component that OpenAnolis develops specifically to maximize system performance. It uses AI algorithms to help tune the operating system. KeenTune went into use earlier than Copilot. It is open source in OpenAnolis and has been used commercially in many areas. As an AI tuning tool, KeenTune adjusts the operating system to suit the upper-layer workload or application so that users get the best possible performance.
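The general shape of automated tuning can be sketched as a loop that proposes a configuration, measures it, and keeps the best one. The example below uses plain random search as a stand-in. It is not KeenTune's algorithm or API, and the function names, search space, and toy benchmark are all hypothetical. The two knob names are real Linux sysctl parameters, used here only as labels.

```python
import random


def tune(benchmark, space, trials=50, seed=0):
    """Random search: sample knob settings and keep the best-scoring one.

    benchmark(config) must return a score where higher is better.
    space maps each knob name to the list of values it may take.
    """
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(trials):
        config = {name: rng.choice(values) for name, values in space.items()}
        score = benchmark(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_benchmark(config):
    """Stand-in for a real workload measurement (purely illustrative)."""
    return config["net.core.somaxconn"] / 4096 - config["vm.swappiness"] / 100


SEARCH_SPACE = {
    "vm.swappiness": [0, 10, 30, 60],
    "net.core.somaxconn": [128, 1024, 4096],
}
```

Calling `tune(toy_benchmark, SEARCH_SPACE)` returns the best configuration seen and its score. In a real tuner the benchmark would apply the settings and run the actual workload, and the sampling step would be replaced by a smarter search strategy that learns from earlier trials.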

Using AI for Operating System Development

OpenAnolis has made some progress in using AI for operating system development.

AI has a natural advantage in testing and in generating test cases because it has no emotions. AI automation helps operating system developers write component code more efficiently, and it can also detect and fix bugs in operating systems autonomously. We believe that as AI evolves, these capabilities will gradually reach real production environments.

Some overseas communities are also migrating traditional operating system software written to the C89 and C99 standards to modern programming languages. The potential of AI in this area is therefore well worth exploring.
