By Wupeng
This article is part of the Kubernetes Stability Assurance Handbook series.
With the increasing emphasis on stability and the popularity of community observability projects, observability has become a hot topic. People understand it differently depending on their perspective.
This article builds a high-level understanding of observability starting from the software development lifecycle, and then looks at how observability is understood and practiced from the perspectives of SRE and Serverless.
Wikipedia defines observability as, "In control theory, observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs."
Consider a physical system modeled in a state-space representation. A system is said to be observable if, for any possible evolution of state and control vectors, the current state can be estimated using only the information from outputs. Physically, this generally corresponds to information obtained by sensors. In other words, one can determine the behavior of the entire system from the system's outputs. On the other hand, if the system is not observable, there are state trajectories that cannot be distinguished by measuring the outputs alone.
In short, observability is a method to derive the internal state of the system from the external output of the system.
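The control-theory definition above can be made concrete with the standard Kalman rank test for a linear system. The sketch below is illustrative only and assumes a discrete-time model x[k+1] = A·x[k], y[k] = C·x[k]; the matrices and the helper names (`mat_mul`, `rank`, `is_observable`) are hypothetical and not taken from the original article.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rank(matrix):
    """Rank via Gaussian elimination over exact fractions."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    pivots = 0
    for col in range(len(rows[0])):
        # Look for a row at or below the pivot position with a non-zero entry.
        pivot = next((r for r in range(pivots, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[pivots], rows[pivot] = rows[pivot], rows[pivots]
        for r in range(pivots + 1, len(rows)):
            factor = rows[r][col] / rows[pivots][col]
            rows[r] = [x - factor * p for x, p in zip(rows[r], rows[pivots])]
        pivots += 1
    return pivots


def is_observable(a, c):
    """Kalman rank test: stack C, CA, ..., CA^(n-1) and check for rank n."""
    n = len(a)
    block, stacked = c, []
    for _ in range(n):
        stacked.extend(block)
        block = mat_mul(block, a)
    return rank(stacked) == n


# Assumed toy system: state = (position, velocity), position += velocity each step.
A = [[1, 1], [0, 1]]
position_sensor = [[1, 0]]   # output exposes position only
velocity_sensor = [[0, 1]]   # output exposes velocity only

position_ok = is_observable(A, position_sensor)
velocity_ok = is_observable(A, velocity_sensor)
```

With the position sensor, velocity can be inferred from successive readings, so the full state is recoverable. With only the velocity sensor, two trajectories that differ in starting position produce identical outputs, which is the "not distinguishable" case described above.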
The following figure simplifies the system composition and interaction between systems:
From the interaction diagram above, the interaction behavior of the system has the following forms:
The internal state of the system can be understood from its external output, which takes the following two forms of information:
The core of observability is to meet the needs of different people to understand the state of the system through observational data. The lifecycle of observational data is abstracted in the following diagram:
Observational data is generated by applications, stored after intermediate processing, and then queried by consumers.
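The lifecycle described above (generation, processing, storage, and use) can be sketched as a tiny in-memory pipeline. This is a minimal illustration under stated assumptions, not a real telemetry stack: the `Event` and `Store` types, the sample records, and the rule of dropping debug-level records are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    service: str
    level: str
    message: str


@dataclass
class Store:
    """Storage: keeps processed events and answers simple equality queries."""
    events: list = field(default_factory=list)

    def write(self, event):
        self.events.append(event)

    def query(self, **filters):
        return [e for e in self.events
                if all(getattr(e, key) == value for key, value in filters.items())]


def generate():
    """Generation: the application emits raw records (hard-coded here)."""
    yield {"service": "cart", "level": "info", "message": " item added "}
    yield {"service": "cart", "level": "ERROR", "message": "payment timeout"}
    yield {"service": "search", "level": "debug", "message": "cache hit"}


def process(raw_records):
    """Processing: normalize fields and drop debug noise before storage."""
    for raw in raw_records:
        level = raw["level"].lower()
        if level == "debug":
            continue
        yield Event(raw["service"], level, raw["message"].strip())


store = Store()
for event in process(generate()):
    store.write(event)

# Use: a consumer asks for the errors of one service.
errors = store.query(service="cart", level="error")
```

Keeping the four stages as separate functions mirrors the problem domains listed next: each stage can be changed or scaled independently of the others.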
Observational data serves different types of consumers, such as product users, businesses, R&D personnel, and site reliability engineers (SREs). Different consumers use the data in different forms, including service level agreements (SLAs), service level objectives (SLOs), service level indicators (SLIs), and alerts.
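To show how the same observational data can be consumed as an SLI, compared with an SLO, and turned into an alert, here is a minimal sketch. The request counts, the targets, and the function names are assumptions chosen for illustration; real systems usually evaluate these over rolling time windows.

```python
def availability_sli(outcomes):
    """SLI: the fraction of requests that succeeded (True means success)."""
    return sum(outcomes) / len(outcomes)


def error_budget_remaining(sli, slo):
    """Share of the error budget (1 - SLO) still unused; negative means overspent."""
    budget = 1 - slo
    return (budget - (1 - sli)) / budget


def should_alert(sli, slo):
    """Alert: fire when the measured indicator falls below the objective."""
    return sli < slo


# Assumed sample: 1000 requests, 2 of which failed.
outcomes = [True] * 998 + [False] * 2
sli = availability_sli(outcomes)

strict_alert = should_alert(sli, slo=0.999)    # objective missed
relaxed_alert = should_alert(sli, slo=0.99)    # objective met
budget_left = error_budget_remaining(sli, slo=0.99)
```

The same measurement leads to different conclusions depending on the objective, which is why each consumer needs the data expressed in the form that matches its own commitments.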
Based on the lifecycle of observational data, the problem domains of observability are roughly summarized below:
Generation
Processing
Storage
Use
From the project perspective, the software development lifecycle involves the following steps:
The steps can be refined as follows:
Four types of roles take part in the software development lifecycle, and each has different observability objectives:
Note:
Basic Services:
OpenTelemetry can be used as a basis to implement the items above. For more details, please see: A Brief Look at OpenTelemetry.
Additionally, visual stability assurance services can be explored, which can help discover, locate, and solve problems quickly from a global perspective. The diagram below shows the health status of components themselves and interactions between them:
On this basis, you can maintain an overall view of the cluster status and correlate exception information, so that problems can be solved in a targeted manner.
Serverless computing is a promising cloud computing execution model. Alibaba Cloud provides various related products:
One of the main differences between Serverless computing environments is the duration of the runtime environment. Starting from this difference, the core of observability in a Serverless computing environment can be abstracted, and corresponding solutions can then be proposed:
Depending on the persistence of the runtime environment, the execution duration can be divided into three types:
All of these runtime environments can be implemented using technologies such as virtual machines, containers, and WebAssembly. The difference lies in the duration of the runtime environment as defined by the business layer.
The core concerns of the platform and its users may change depending on the duration of the runtime environment:
For the FaaS scenario, the Thundra demo is a good reference. Three examples are excerpted below:
An in-depth understanding of the concept of observability, its problem domains, and the requirements at different levels deepens your appreciation of observability. Building on that appreciation, observability can be integrated with the business to make it more competitive, and as understanding improves through iteration, technology and business reinforce each other.