By Devin Xu, eBPF Technology Exploration SIG Maintainer & Linux Kernel Security Researcher in OpenAnolis.
This article shares the author's thoughts and insights on kernel security from the PODS 2022 Conference. It covers three topics: monitoring and observability, eBPF security observability analysis, and the outlook for kernel security observability.
As shown in the figure below, monitoring is only the tip of the iceberg of observability. Most of the deep-seated problems hidden under the water cannot be solved by monitoring.
Monitoring is becoming increasingly visual, but the vast majority of it still works by defining parameters in advance and analyzing logs after the fact. The disadvantages of monitoring include:
This is also the main difference between observability and monitoring.
Metrics are measurements of a particular kind of system information, such as CPU, memory, network throughput, disk I/O, and disk usage. When a metric crosses its abnormal threshold, the system raises an alarm or takes action on its own, such as killing or isolating processes.
Purpose: monitoring and alerting
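The threshold behavior described above can be sketched in a few lines of Python. This is only an illustration: the metric names, the limits, and the `alert`/`kill` actions are hypothetical and do not come from any particular monitoring product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str    # hypothetical metric name, e.g. "cpu_percent"
    limit: float   # value above which the metric counts as abnormal
    action: str    # hypothetical reaction: "alert" or "kill"


def evaluate(sample, thresholds):
    """Return (metric, action) for every threshold the sample exceeds."""
    triggered = []
    for t in thresholds:
        value = sample.get(t.metric)
        # Metrics that were not collected are skipped, not treated as abnormal.
        if value is not None and value > t.limit:
            triggered.append((t.metric, t.action))
    return triggered


# Thresholds are defined in advance, which is what makes this monitoring.
THRESHOLDS = [
    Threshold("cpu_percent", 90.0, "alert"),
    Threshold("memory_percent", 95.0, "kill"),
]

sample = {"cpu_percent": 97.5, "memory_percent": 60.0, "disk_io_mbps": 120.0}
print(evaluate(sample, THRESHOLDS))
```

Note that `disk_io_mbps` never triggers anything, because no threshold was defined for it ahead of time. Monitoring only reacts to the conditions someone anticipated.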
Let's summarize the difference between monitoring and observability:
Monitoring: It collects and analyzes system data, checks the current status of the system, and analyzes and handles foreseeable problems.
Observability: It observes the system and measures its internal state by inferring, from the data the system outputs externally, what state the system is in at a given moment, especially for the scenarios and events we care about.
| Dimension | Monitoring | Observability |
| --- | --- | --- |
| Scalability | Scope Limitations | Flexible and Extensive |
| Validation Cycle | Long Cycle | Immediate |
| Data Source | Limited Area | Customization |
| Observation Type | Predictability | Burstability |
Security: the state in which an object, or an attribute of that object, is not under threat.
eBPF security observability is characterized by a very small footprint in the kernel combined with extremely powerful observational capabilities:
eBPF programs run sandboxed, and in safe mode that sandbox depends on the eBPF verifier, whose most important check is bounds checking:
There are three main types of application scenarios for eBPF observability:
With the rapid development of cloud, edge, and endpoint computing, attention is increasingly focused on cloud-native scenarios, currently the most popular of all. Falco, Tracee, Tetragon, Datadog-agent, and KubeArmor are several popular runtime protection solutions for cloud-native scenarios.
These solutions mainly use eBPF to attach to kernel functions and write filtering policies. When an attack occurs at the kernel layer, predefined policies are triggered and directly raise alarms or even block the operation, without first returning a message to the user layer.
They enforce security policies across the operating system preventatively rather than reacting asynchronously to events. In addition to specifying allow lists for multiple levels of access control, these solutions can automatically detect privilege and capability escalation or namespace escalation (container escape) and terminate the affected processes.
Security policies can be injected through systems such as Kubernetes (CRDs), a JSON API, or Open Policy Agent (OPA).
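To make the idea of predefined policies concrete, here is a minimal user-space sketch in Python of a policy delivered as JSON and evaluated against events. It is a simulation only: the JSON field names (`hook`, `path_prefix`, `action`) and the event shape are hypothetical and are not the policy format of Falco, Tracee, Tetragon, Datadog-agent, or KubeArmor. The hook names are used purely as illustrative labels.

```python
import json

# Hypothetical policy document, as it might arrive through a JSON API.
POLICY_JSON = """
{
  "rules": [
    {"hook": "security_file_open", "path_prefix": "/etc/shadow", "action": "block"},
    {"hook": "security_bprm_check", "path_prefix": "/tmp/", "action": "alert"}
  ]
}
"""

VALID_ACTIONS = ("alert", "block")


def load_policy(text):
    """Parse the JSON policy and reject rules with unknown actions."""
    rules = json.loads(text)["rules"]
    for rule in rules:
        if rule["action"] not in VALID_ACTIONS:
            raise ValueError(f"unknown action: {rule['action']!r}")
    return rules


def decide(event, rules):
    """Return the action of the first matching rule, or "allow" if none match."""
    for rule in rules:
        if event["hook"] == rule["hook"] and event["path"].startswith(rule["path_prefix"]):
            return rule["action"]
    return "allow"


rules = load_policy(POLICY_JSON)
print(decide({"hook": "security_file_open", "path": "/etc/shadow"}, rules))
print(decide({"hook": "security_bprm_check", "path": "/tmp/payload"}, rules))
print(decide({"hook": "security_file_open", "path": "/etc/hostname"}, rules))
```

In a real deployment the matching would run inside the kernel in an eBPF program, which is what allows the decision to be made before the operation completes. The Python version only shows the shape of the rule evaluation.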
Such solutions run at the application and system call layers, and their observability approaches vary. They all have a user space agent, but the agent relies on observability data collected according to predefined definitions and then reacts to that data. Such solutions cannot observe kernel-level events.
This type of solution operates directly at the kernel layer, mainly for runtime hardening, and its observation capability is weak or even absent. The kernel's built-in mechanisms provide many policy enforcement options, but the kernel was built to focus only on access control and is very difficult to extend. For example, the kernel is not aware of Kubernetes or containers. Although kernel modules solve the extensibility problem, they are often not a wise choice because of the security risks they introduce.
Young kernel subsystems such as LSM-eBPF are very powerful and promising, but they require a recent kernel (≥ 5.7).
The following is a discussion of traditional kernel security, Android kernel security, and KRSI.
As Linus Torvalds once said, most security risks are caused by bugs, and bugs are part of the software development process; no bugs, no software.
Whether or not a given bug is a security vulnerability, the kernel community's approach is to test as much as possible and uncover more potential vulnerabilities, which is similar to the blacklist approach.
The process of submitting kernel code is relatively complicated, and applying that code to a specific kernel version involves long cycles and version adaptation. As a result, kernel security develops more slowly than other modules. At the same time, with the rapid development of intelligence, digitization, and cloudification, there are now tens of billions of Linux-based devices worldwide, and their security depends mainly on the security and robustness of the mainline kernel. When a kernel LTS version ships with vulnerabilities, the machines running it can be breached and maliciously exploited, and the loss is incalculable.
Today, a growing number of smart terminals worldwide, including mobile phones, TVs, SmartBox devices, IoT devices, cars, and multimedia devices, run the Android operating system, which is built on the Linux kernel. This means the security of the Linux kernel has a significant impact on Android.
For historical reasons, Google's approach to open-sourcing the Android kernel does not align with that of the Linux kernel community. As a result, a number of Android-specific kernel modifications cannot be merged upstream, and the Android kernel differs from the Linux kernel in parts of its security implementation and in its security focus.
At the operating system level, the Android platform builds on the security features of the Linux kernel and provides a secure inter-process communication (IPC) mechanism so that applications running in different processes can communicate securely. These operating-system-level features are designed to ensure that even native code is confined by the application sandbox. Whether the code in question reflects the application's own behavior or the exploitation of an application vulnerability, the system can prevent a rogue application from harming other applications, the Android system, or the device itself.
Android kernel security features:
The prototype of Kernel Runtime Security Instrumentation (KRSI) is implemented as a Linux security module (LSM), which allows eBPF programs to be attached to the kernel's security hooks. Kernel security mainly involves two inseparable aspects: signals and mitigations.
KRSI is implemented based on LSM, which means KRSI can make access control policy decisions. However, this is not the focus of KRSI's work. The focus is mainly to comprehensively monitor the system behavior to detect attacks. (It's the most important application scenario, but currently, it mainly does detection because it may be more dangerous to block processes rashly.) From this perspective, KRSI can be said to be an extension of the kernel audit mechanism, using eBPF to provide a higher level of configurability than the current kernel audit subsystem.
1) KRSI allows suitably privileged users to attach BPF programs to any of the hundreds of hooks provided by the LSM subsystem.
2) To simplify this step, KRSI exports a new file system hierarchy under /sys/kernel/security/bpf, in which each hook corresponds to a file.
3) BPF programs (of a new type, BPF_PROG_TYPE_LSM) can be attached to these hooks through the bpf() system call, and multiple programs can be attached to any given hook.
4) Whenever a security hook is triggered, all attached BPF programs are called in turn. If any BPF program returns an error, the requested operation is rejected.
5) KRSI can block operations at the function level, which is finer-grained and much less dangerous than blocking whole processes.
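The hook semantics in points 3 to 5 can be illustrated with a small user-space simulation in Python. This is a sketch under stated assumptions, not kernel code: `SecurityHook`, `attach`, `trigger`, and the two example programs are hypothetical names, and the sketch follows the description above literally by running every attached program and rejecting the operation if any of them returned an error.

```python
import errno


class SecurityHook:
    """Hypothetical stand-in for one LSM hook with attached programs."""

    def __init__(self, name):
        self.name = name
        self.programs = []

    def attach(self, program):
        # Multiple programs may be attached to the same hook.
        self.programs.append(program)

    def trigger(self, ctx):
        """Call every program in turn; return 0 or the first error seen."""
        verdict = 0
        for program in self.programs:
            ret = program(ctx)
            if ret != 0 and verdict == 0:
                verdict = ret
        return verdict


audit_log = []


def audit_open(ctx):
    # Signal: record the event, never block.
    audit_log.append(ctx["path"])
    return 0


def deny_shadow(ctx):
    # Mitigation: reject one specific operation with a negative errno.
    return -errno.EPERM if ctx["path"] == "/etc/shadow" else 0


file_open = SecurityHook("file_open")
file_open.attach(deny_shadow)
file_open.attach(audit_open)

for path in ("/etc/hostname", "/etc/shadow"):
    ret = file_open.trigger({"path": path})
    print(path, "allowed" if ret == 0 else "rejected")
```

Only the single open of `/etc/shadow` is rejected, and the calling process keeps running. That is the sense in which blocking at the function level is finer-grained than terminating a process. The audit program still records the rejected attempt, which shows signals and mitigations sharing the same hook.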
Kernel security is crucial because a single change can affect the entire system, especially where runtime security is concerned. The new eBPF program type described above provides a unified API strategy for signals and mitigations, improves the kernel LSM framework, and addresses the problem that system call events are easily lost under existing mechanisms. By blocking at the level of individual function calls, it enables a finer-grained and more reasonable detection scheme. It also leaves room for further development in kernel Livepatch, vulnerability detection, and defense against related attack methods such as privilege escalation. The combination of eBPF and LSM is still evolving, and its functionality and performance are gradually improving.
Everyone interested in the topic is welcome to join the eBPF Special Interest Group (SIG) of OpenAnolis. You are welcome to discuss and share your views and solutions on eBPF technology. Let’s start a magical journey of eBPF with SIG members!
eBPF SIG Address:
https://openanolis.cn/sig/ebpfresearch