By Zongzheng Xi, from Alibaba Cloud Storage Team
Flame Graph, created by Brendan Gregg in 2011, is a visual program performance analysis tool that helps developers track and display information about function calls and the time taken by the calls.
The basic idea of a flame graph is to convert a program's function call stacks into a "flame"-shaped image built from rectangles. The width of each rectangle represents the proportion of the measured resource (for example, sampled CPU time) attributed to the function and its callees, and its vertical position represents the call depth of the function (that is, how many frames lie between it and the root of the stack). By comparing flame graphs taken at different points in time, you can quickly locate a program's performance bottleneck and optimize it in a targeted way. In general, a wide rectangle at the top of a stack indicates a performance bottleneck that needs to be analyzed and optimized.
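The mapping from call stacks to rectangles can be sketched in a few lines of Python. This is only an illustration, not the code of any particular tool: it assumes the input is in the collapsed "folded" text form (one `frame;frame;frame count` line per unique stack), and the function names `parse_folded` and `layout` are made up for this example.

```python
from collections import defaultdict


def parse_folded(text):
    """Parse lines such as 'main;compute;sort 5' into {('main', 'compute', 'sort'): 5}."""
    samples = defaultdict(int)
    for line in text.strip().splitlines():
        stack, _, count = line.rpartition(" ")
        samples[tuple(stack.split(";"))] += int(count)
    return dict(samples)


def layout(samples):
    """Return rectangles as (depth, name, x_start, width), with x scaled to [0, 1]."""
    total = sum(samples.values())
    rects = []

    def place(prefix, x):
        # Sum the samples of every stack that passes through `prefix`,
        # grouped by the next frame in the stack.
        children = defaultdict(int)
        for stack, count in samples.items():
            if len(stack) > len(prefix) and stack[:len(prefix)] == prefix:
                children[stack[len(prefix)]] += count
        # Siblings are placed in alphabetical order; the x-axis is not a timeline.
        for name in sorted(children):
            width = children[name] / total
            rects.append((len(prefix), name, x, width))
            place(prefix + (name,), x)
            x += width

    place((), 0.0)
    return rects


folded = """
main;parse 3
main;compute;sort 5
main;compute 2
"""
for depth, name, x, width in layout(parse_folded(folded)):
    print(f"depth={depth} {name:<8} x={x:.2f} width={width:.2f}")
```

In this sketch `compute` gets a width of 0.7 because it appears in 7 of the 10 samples, either directly or through `sort`. Drawing depth 0 at the bottom gives a flame graph, and drawing it at the top gives an icicle graph.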
In the broad sense, flame graphs come in two layouts: the flame graph in the narrow sense and the icicle graph. In a flame graph (narrow sense), the root is at the bottom and each child node is drawn above its parent. In an icicle graph, the root is at the top and each child node is drawn below its parent. The two layouts differ only in presentation and name, and both are usually referred to simply as flame graphs (broad sense).
Following the classification given by Gregg, the creator of the flame graph, there are five common types: CPU, Off-CPU, Memory, Hot/Cold, and Differential.
Type | Horizontal Axis | Vertical Axis | Problems Addressed | Sampling Method |
CPU | CPU usage time | Call stack | Identifying problem functions with high CPU usage and analyzing hot code paths. | Sample the CPU call stack at a fixed frequency. |
Off-CPU | Blocking time | Call stack | Performance degradation caused by I/O or network blocking, lock contention, or deadlock. | Sample the call stack of blocking events at a fixed frequency. |
Memory | Number of calls to memory allocation/release functions, or total number of bytes allocated | Call stack | Memory leaks, objects with high memory usage, functions that allocate large amounts of memory, and virtual or physical memory leaks. | Trace malloc/free, brk, mmap, or page faults. |
Hot/Cold | Combined CPU and Off-CPU time | Call stack | Scenarios where CPU usage and blocking must be analyzed together, or where Off-CPU analysis alone cannot pinpoint the problem. | Combination of the CPU and Off-CPU methods. |
Differential | Difference between the before and after flame graphs | Call stack | Performance regressions and evaluation of tuning results. | Same as the before and after flame graphs being compared. |
The CPU flame graph shows what is happening on the CPU, which is the red part in the figure below. The Off-CPU flame graph shows what happens off the CPU, that is, the time spent waiting while blocked on I/O, locks, timers, paging/swapping, and so on. It is shown in blue in the figure below.
I/O here includes both file I/O and block device I/O. By collecting the call stack at the moment a process gives up the CPU, you can see which functions wait on other events so often that they must yield the CPU. By collecting the call stack at the moment a process is woken up, you can see which functions keep the process waiting for a long time.
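As a rough sketch of how blocking time can be attributed to the stack captured when a thread leaves the CPU, consider the Python below. The event tuple layout, the `switch_out`/`switch_in` labels, and the function name `off_cpu_folded` are assumptions made for illustration; real tracers expose this information in their own formats.

```python
from collections import defaultdict


def off_cpu_folded(events):
    """Sum blocked time per call stack.

    `events` is a time-ordered iterable of (timestamp_us, thread_id, kind, stack),
    where kind is "switch_out" (thread leaves the CPU, stack is recorded)
    or "switch_in" (thread runs again, stack is ignored).
    """
    pending = {}               # thread_id -> (time it left the CPU, stack at that moment)
    blocked = defaultdict(int)
    for ts, tid, kind, stack in events:
        if kind == "switch_out":
            pending[tid] = (ts, stack)
        elif kind == "switch_in" and tid in pending:
            start, off_stack = pending.pop(tid)
            blocked[";".join(off_stack)] += ts - start
    return dict(blocked)


events = [
    (0,   1, "switch_out", ("main", "read")),
    (150, 1, "switch_in",  None),
    (200, 1, "switch_out", ("main", "lock")),
    (230, 1, "switch_in",  None),
    (300, 2, "switch_out", ("main", "read")),
    (400, 2, "switch_in",  None),
]
print(off_cpu_folded(events))
```

The result is again a stack-to-value mapping, except that the value is microseconds spent blocked rather than a CPU sample count, so it can be drawn with the same flame graph layout.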
Both hot/cold and differential flame graphs involve a "comparison," but along different dimensions.
The hot/cold flame graph compares the On-CPU and Off-CPU portions of a performance analysis. With the original flame graph tool suite, the two can only be scaled to the same x-axis, and a relatively large Off-CPU time usually squeezes the On-CPU portion. Vladimir Kirillov integrated blocking data with the CPU profile in eflame by including blocking calls and merging ancestors, so that blocking functions appear in blue on top of the warm stacks.
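The squeezing effect is easy to see with a little arithmetic. The sketch below assumes that CPU data arrives as sample counts taken at a known frequency and that Off-CPU data arrives as microseconds per stack; the function names and the `("on" | "off", stack)` key layout are hypothetical and are not taken from eflame or any other tool.

```python
def merge_hot_cold(cpu_samples, sample_hz, off_cpu_us):
    """Put On-CPU sample counts and Off-CPU durations on one microsecond scale.

    Keys of the result are (state, stack) so that a renderer could colour
    "on" frames red and "off" frames blue.
    """
    period_us = 1_000_000 / sample_hz  # time represented by one CPU sample
    merged = {}
    for stack, count in cpu_samples.items():
        merged[("on", stack)] = count * period_us
    for stack, micros in off_cpu_us.items():
        merged[("off", stack)] = micros
    return merged


def on_cpu_share(merged):
    """Fraction of the shared x-axis occupied by On-CPU time."""
    total = sum(merged.values())
    on_cpu = sum(v for (state, _), v in merged.items() if state == "on")
    return on_cpu / total if total else 0.0


merged = merge_hot_cold({"main;compute": 50}, 100, {"main;read": 9_500_000})
print(f"On-CPU share of the x-axis: {on_cpu_share(merged):.0%}")
```

With 0.5 seconds on the CPU and 9.5 seconds blocked, the hot part occupies only 5% of the width, which is why a merged view needs extra care to keep the On-CPU frames readable.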
The differential flame graph compares the differences between two performance analyses. After the flame graph from the first analysis shows how the program behaves at run time, the next step is to tune the program in a targeted manner. A second analysis is then performed after the change to generate another flame graph. Comparing the flame graphs before and after the change shows whether the tuning was effective.
Sometimes, you may find that some metrics suddenly increase after a system upgrade. Then, you can compare the flame graphs before and after the upgrade to find those functions that take more time.
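A before/after comparison of this kind can be sketched as a per-stack subtraction. The example below assumes both profiles are already available as stack-to-count mappings collected over comparable workloads (profiles with very different totals would need normalizing first); `diff_folded` and `regressions` are illustrative names, not functions from an existing tool.

```python
def diff_folded(before, after):
    """Return {stack: after - before} for every stack seen in either profile."""
    stacks = set(before) | set(after)
    return {s: after.get(s, 0) - before.get(s, 0) for s in stacks}


def regressions(before, after, top=3):
    """List the stacks whose value grew the most, largest growth first."""
    delta = diff_folded(before, after)
    grown = [(s, d) for s, d in delta.items() if d > 0]
    return sorted(grown, key=lambda item: (-item[1], item[0]))[:top]


before = {"main;parse": 30, "main;compute;sort": 50, "main;log": 20}
after = {"main;parse": 31, "main;compute;sort": 85, "main;compress": 10}
for stack, growth in regressions(before, after):
    print(f"+{growth:>3}  {stack}")
```

A differential flame graph renders the same deltas as colours on top of one of the two profiles, so stacks that grew and stacks that shrank stand out at a glance.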
Continuous Profiling is a technique that continuously collects line-level performance data from any environment, including production. Visualizations of the data are then provided so that developers can analyze, troubleshoot, and optimize their code.
Unlike traditional static analysis techniques, Continuous Profiling obtains performance data from the actual running environment without significantly affecting the performance of the application. This makes it possible to analyze performance issues more accurately and to tune and debug in real-world deployment environments. When developers practice continuous integration and deployment to production, profiling data from production flows back to them through the continuous profiler, forming a feedback loop.
From an implementation point of view, a flame graph is a rendering of a "stack-value" data structure. Any data that fits this structure can be displayed as a flame graph, so the CPU, Off-CPU, and Memory types given by Gregg leave plenty of room for further variations. Take Pyroscope as an example: it consists of the Pyroscope Server and the Pyroscope Agent. The Agent records and aggregates what the application is doing and sends the data to the Server, which processes, aggregates, and stores it so that it can be queried quickly by time range. Different agents can therefore be designed for different languages to provide more detailed performance monitoring.
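To make the "stack-value" idea concrete, the sketch below folds a stack-to-value mapping into a nested tree. The `name`/`value`/`children` field names and the function `to_tree` are assumptions chosen for illustration; this is not Pyroscope's internal representation.

```python
import json


def to_tree(samples, root_name="root"):
    """Merge {'a;b;c': value} entries into a nested name/value/children tree.

    Each node's value is inclusive: it covers the node and everything below it.
    """
    root = {"name": root_name, "value": 0, "children": []}
    for stack, value in samples.items():
        node = root
        node["value"] += value
        for frame in stack.split(";"):
            child = next((c for c in node["children"] if c["name"] == frame), None)
            if child is None:
                child = {"name": frame, "value": 0, "children": []}
                node["children"].append(child)
            child["value"] += value
            node = child
    return root


samples = {"main;parse": 3, "main;compute;sort": 5, "main;compute": 2}
print(json.dumps(to_tree(samples), indent=2))
```

Nothing in the tree says what `value` measures. It could be CPU samples, blocked microseconds, or allocated bytes, which is why the same rendering code can serve every flame graph type.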
• Brendan Gregg, the creator of the flame graph, publishes his flame graph tools in a Git repository.
https://github.com/brendangregg/FlameGraph
• Martin Spier, a colleague of Gregg's who works on the Netflix performance engineering team, created d3-flame-graph based on the d3.js framework.
https://github.com/spiermar/d3-flame-graph
• FlameBearer, a fast flame graph tool for Node and V8, designed to generate lightweight flame graphs that remain responsive even with large inputs. Pyroscope's flame graph was further developed on the basis of FlameBearer.
https://github.com/mapbox/flamebearer
• react-flame-graph, a React implementation of the flame graph.
https://github.com/bvaughn/react-flame-graph
• The implementation of the flame graph in Clinic.js.
https://github.com/clinicjs/node-clinic-flame
• The flame graph code in the Pyroscope project.
https://github.com/pyroscope-io/pyroscope/tree/main/packages/pyroscope-flamegraph
>> Next article: How is the Flame Graph Created? Exploring Flame Graphs in Pyroscope Source Code (2)
Disclaimer: The views expressed herein are for reference only and don't necessarily represent the official views of Alibaba Cloud.