How is the Flame Graph Created? Exploring Flame Graphs in Pyroscope Source Code (1)

By Zongzheng Xi, from Alibaba Cloud Storage Team

Brief Description of Flame Graph

The flame graph, created by Brendan Gregg in 2011, is a visualization for program performance analysis. It helps developers see which functions are called and how much time those calls take.

General Interpretation

The basic idea of the flame graph is to draw a program's sampled call stacks as stacked rectangles whose outline resembles a flame. The width of each rectangle represents the proportion of samples (for example, CPU time) in which the function appears, and its vertical position represents the call depth, that is, the number of frames between the function and the root of the stack. By comparing flame graphs taken at different points in time, you can quickly locate a performance bottleneck and optimize it in a targeted way. In most cases, a wide rectangle at the top of a stack indicates a bottleneck that deserves analysis and optimization.
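To make the width and depth rules concrete, here is a minimal sketch in Python. It assumes the samples are already in a "folded" text form (frames joined by `;`, followed by a sample count). The sample data and the helper name `frame_widths` are made up for illustration and are not taken from Pyroscope.

```python
from collections import Counter

# Hypothetical samples in "folded" form: frames from root to leaf joined by ';',
# followed by the number of times that stack was sampled.
FOLDED = """\
main;parse;read_file 2
main;parse;tokenize 5
main;render 3
"""

def frame_widths(folded: str) -> dict:
    """Map each call path (root..frame) to its share of all samples."""
    inclusive = Counter()
    total = 0
    for line in folded.splitlines():
        stack, count = line.rsplit(" ", 1)
        frames = stack.split(";")
        total += int(count)
        # A sample counts toward every frame on its stack, not just the leaf.
        for depth in range(1, len(frames) + 1):
            inclusive[tuple(frames[:depth])] += int(count)
    return {path: n / total for path, n in inclusive.items()}

widths = frame_widths(FOLDED)
for path, share in sorted(widths.items()):
    # len(path) is the call depth; share is the rectangle's relative width.
    print(f"depth={len(path)} {path[-1]:<10} width={share:.0%}")
```

In this made-up profile, `tokenize` sits at the top of its stack and covers half of all samples, so it would be the wide top-of-stack rectangle to look at first.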

In the broad sense, flame graphs come in two layouts: the flame graph in the narrow sense and the icicle graph. In a flame graph (narrow sense), the root is at the bottom and each child node is drawn above its parent. In an icicle graph, the root is at the top and each child node is drawn below its parent. The two layouts differ only in orientation and name, and both are usually just called flame graphs.
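The difference between the two layouts comes down to which way the depth axis points. A small sketch, assuming rows are numbered from the top of the image and depth 0 is the root (the function name `row_for_depth` is hypothetical):

```python
def row_for_depth(depth: int, max_depth: int, icicle: bool) -> int:
    """Return the row (0 = top of the image) on which a frame at `depth` is drawn."""
    # Icicle: root (depth 0) on the top row. Flame: root on the bottom row.
    return depth if icicle else max_depth - depth

FRAMES = ["main", "parse", "tokenize"]  # one stack, root first
max_depth = len(FRAMES) - 1

flame_rows = {name: row_for_depth(d, max_depth, icicle=False)
              for d, name in enumerate(FRAMES)}
icicle_rows = {name: row_for_depth(d, max_depth, icicle=True)
               for d, name in enumerate(FRAMES)}

print("flame: ", flame_rows)
print("icicle:", icicle_rows)
```

The widths and the parent-child relationships are identical in both cases; only the row assignment is mirrored.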

Flame Graph Type

Brendan Gregg, the creator of the flame graph, describes five common types: CPU, Off-CPU, Memory, Hot/Cold, and Differential.

Type	Horizontal Axis	Vertical Axis	Problems Addressed	Sampling Method
CPU	CPU time used	Call stack	Identifying functions with high CPU usage and analyzing hot code paths.	Sample the on-CPU call stack at a fixed frequency.
Off-CPU	Blocked time	Call stack	Performance degradation caused by I/O or network blocking; performance degradation caused by lock contention and deadlock.	Sample the call stacks of blocking events at a fixed frequency.
Memory	Number of calls to memory allocation/release functions, or total bytes allocated	Call stack	Memory leaks, objects with high memory usage, functions that allocate a lot of memory, and virtual or physical memory leaks.	Trace malloc/free, trace brk, trace mmap, or trace page faults.
Hot/Cold	CPU and Off-CPU combined	Call stack	Cases where CPU usage and blocking must be analyzed together, or where the Off-CPU view alone cannot pinpoint the problem.	CPU and Off-CPU sampling combined.
Differential	Difference between the "before" and "after" flame graphs	Call stack	Performance regressions and evaluating the effect of tuning.	Same as the two flame graphs being compared.

About On-CPU and Off-CPU

The CPU flame graph shows what is happening on the CPU, which is the red part in the figure below. The Off-CPU flame graph shows what happens off the CPU, that is, the time spent waiting while blocked on I/O, locks, timers, paging/swapping, and so on. It is shown in blue in the figure below.

I/O here includes both file I/O and block device I/O. By collecting the call stack at the moment a process gives up the CPU, you can see which functions wait on other events so often that they have to yield the CPU. By collecting the call stack when the process is woken up, you can see which functions keep the process waiting the longest.
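As a rough illustration of how blocked time can be attributed to call stacks, the sketch below pairs each "gave up the CPU" event with the following wakeup and adds the elapsed time to the stack recorded at switch-out. The event format, the event names, and the trace itself are hypothetical. A real tracer would take these from scheduler events and would also have to account for the delay between wakeup and actually running again.

```python
from collections import defaultdict

# Hypothetical trace for one thread: (timestamp in ms, event, stack at switch-out).
EVENTS = [
    (0,   "switch_out", "main;read_config;read"),
    (40,  "wakeup",     None),
    (55,  "switch_out", "main;worker;lock"),
    (60,  "wakeup",     None),
    (90,  "switch_out", "main;read_config;read"),
    (100, "wakeup",     None),
]

def off_cpu_time(events):
    """Sum the blocked time (ms) per call stack recorded at switch-out."""
    blocked = defaultdict(int)
    pending = None  # (start timestamp, stack) of the current blocked interval
    for ts, kind, stack in events:
        if kind == "switch_out":
            pending = (ts, stack)
        elif kind == "wakeup" and pending is not None:
            start, blocked_stack = pending
            blocked[blocked_stack] += ts - start
            pending = None
    return dict(blocked)

blocked_ms = off_cpu_time(EVENTS)
for stack, ms in sorted(blocked_ms.items(), key=lambda kv: -kv[1]):
    print(f"{ms:>4} ms  {stack}")
```

The resulting stack-to-milliseconds mapping has the same shape as a CPU profile, so it can be drawn with the same flame graph code, only with blocked time on the horizontal axis.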

About Hot/Cold Flame Graph and Differential Flame Graph

Both hot/cold and differential flame graphs involve a comparison, but they compare along different dimensions.

The hot/cold flame graph combines the On-CPU and Off-CPU views of a performance analysis. With the original flame graph tools, the two can only be drawn to the same x-axis scale, and a relatively large amount of Off-CPU time usually squeezes the On-CPU time into a narrow strip. Vladimir Kirillov integrated blocking data with the CPU profile: his eflame includes blocking calls and implements merged ancestry, so that blocking functions appear in blue on top of the warm stacks.

The differential flame graph compares two performance analyses. After the flame graph from the first analysis shows how the program behaves at run time, the next step is to tune the program in a targeted way. A second analysis is then run after the change to produce another flame graph. Comparing the flame graphs from before and after the change shows whether the tuning was effective.

Sometimes a metric suddenly increases after a system upgrade. In that case, you can compare the flame graphs from before and after the upgrade to find the functions that now take more time.
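The comparison itself can be sketched as a per-stack subtraction of two profiles. The two profiles below are invented sample counts, and `diff_profiles` is a hypothetical helper, not the API of any particular tool.

```python
# Hypothetical sample counts per stack, before and after an upgrade.
BEFORE = {"main;parse": 50, "main;render": 30, "main;log": 20}
AFTER = {"main;parse": 50, "main;render": 70, "main;compress": 10}

def diff_profiles(before, after):
    """Return {stack: after - before} for every stack seen in either profile."""
    stacks = set(before) | set(after)
    # A stack missing from one profile is treated as having zero samples there.
    return {s: after.get(s, 0) - before.get(s, 0) for s in sorted(stacks)}

delta = diff_profiles(BEFORE, AFTER)

# Stacks that gained samples are the candidates for a regression.
regressions = sorted((s for s in delta if delta[s] > 0),
                     key=lambda s: -delta[s])
for stack in regressions:
    print(f"+{delta[stack]:<4} {stack}")
```

A differential flame graph typically draws one of the two profiles and colors each frame by its delta, so growth and shrinkage stand out at a glance. If the two profiles contain very different total sample counts, the counts should be normalized before subtracting.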

Application of Flame Graph in Continuous Profiling

Continuous Profiling is a technique that continuously collects line-level performance data from any environment, including production. Visualizations of the data are then provided so that developers can analyze, troubleshoot, and optimize their code.

Unlike traditional static analysis techniques, Continuous Profiling obtains performance data in the actual operating environment without significantly affecting the performance of the application. This allows more accurate analysis of performance issues, as well as tuning and debugging in real-world deployments. Developers ship code to production through continuous integration and deployment. Production then feeds profiling data to the continuous profiler, which closes the loop by returning that data to developers.

More Types

From an implementation point of view, a flame graph is a drawing of a "stack-value" data structure. Any data that fits this structure can be displayed as a flame graph, so the CPU, Off-CPU, and Memory types described by Gregg leave plenty of room for new types. Take Pyroscope as an example. It consists of the Pyroscope Server and the Pyroscope Agent. The Agent records and aggregates what the application does and sends the result to the Server, which processes, aggregates, and stores the Agent's data so that it can be queried quickly by time range. Different agents can therefore be designed for different languages to provide more detailed performance monitoring.
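To illustrate the "stack-value" idea, the sketch below turns a mapping from call stacks to values into a nested tree that a renderer could walk. The value here is bytes allocated rather than CPU samples, to show that the structure does not care what is being measured. The `name`/`value`/`children` node shape and the helper `build_tree` are assumptions made for this example and are not Pyroscope's internal format.

```python
import json

# Hypothetical "stack-value" data: bytes allocated per call stack.
PROFILE = {
    "main;load;malloc": 4096,
    "main;load;decode;malloc": 1024,
    "main;render;malloc": 2048,
}

def build_tree(profile):
    """Merge stacks that share a prefix into one tree of inclusive values."""
    root = {"name": "root", "value": 0, "children": []}
    for stack, value in profile.items():
        root["value"] += value
        node = root
        for frame in stack.split(";"):
            # Reuse the child if this frame was already seen under this parent.
            child = next((c for c in node["children"] if c["name"] == frame), None)
            if child is None:
                child = {"name": frame, "value": 0, "children": []}
                node["children"].append(child)
            child["value"] += value
            node = child
    return root

tree = build_tree(PROFILE)
print(json.dumps(tree, indent=2))
```

Each node's `value` is the sum over every stack that passes through it, which is exactly what the rectangle width encodes. Swapping the byte counts for sample counts or blocked milliseconds yields a CPU or Off-CPU flame graph from the same code.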

[Table 1]

Open-source Repositories Related to Flame Graphs

• Brendan Gregg, the creator of the flame graph, maintains his flame graph tools in this Git repository.
https://github.com/brendangregg/FlameGraph
• Martin Spier, Gregg's colleague on the Netflix performance engineering team, created d3-flame-graph on top of the d3.js framework.
https://github.com/spiermar/d3-flame-graph
• FlameBearer, a fast flame graph tool for Node and V8, is designed to generate lightweight flame graphs that remain responsive even with large inputs. Pyroscope's flame graph is developed on top of FlameBearer.
https://github.com/mapbox/flamebearer
• react-flame-graph, a React implementation of the flame graph.
https://github.com/bvaughn/react-flame-graph
• The flame graph implementation in Clinic.js.
https://github.com/clinicjs/node-clinic-flame
• The flame graph code in the Pyroscope project.
https://github.com/pyroscope-io/pyroscope/tree/main/packages/pyroscope-flamegraph

>> Next article: How is the Flame Graph Created? Exploring Flame Graphs in Pyroscope Source Code (2)

Disclaimer: The views expressed herein are for reference only and don't necessarily represent the official views of Alibaba Cloud.
