An Analysis of the Exploration and Practice of Python Startup Acceleration

This article summarizes the talk "Exploration and Practice of Python Startup Acceleration" given by Yichen Yan, Senior Engineer at Alibaba Cloud, at PyCon China 2022. The talk covers related work in the CPython community, the design and implementation of the solution, and its integration at the business level.

The content of the talk follows:

1. Python Startup Speed Analysis

We start by profiling the startup of an empty Python 3 interpreter. As the profile shows, most of the time is spent on work related to package loading.

Package loading accounts for about 30% of the CPU time, and disk I/O related to package loading accounts for about 37% of the time.

Those familiar with Python's import mechanism know that when Python loads a package, it first searches for the corresponding pyc file, which holds the bytecode in a serialized format. If the pyc file is found, it is deserialized and the code inside is executed. If the pyc file does not exist, the py source file is compiled to obtain the bytecode, which is then serialized to a pyc file for persistent storage. The optimization targets this package loading process and aims to avoid the overhead of searching, reading, and deserialization.
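The pyc round trip described above can be reproduced with the standard library. The following is a minimal sketch, not the interpreter's actual import code: it compiles a temporary module (the name hello_mod is hypothetical), reads the pyc back, skips the 16-byte header used since Python 3.7, and deserializes the code object with marshal.

```python
import importlib.util
import marshal
import py_compile
import tempfile
from pathlib import Path


def load_code_from_pyc(source_path):
    """Compile source_path to a pyc, then read the code object back from it."""
    # py_compile.compile writes the cached bytecode and returns its path.
    pyc_path = py_compile.compile(str(source_path), doraise=True)
    data = Path(pyc_path).read_bytes()
    # Since Python 3.7 the pyc header is 16 bytes: magic number, flags,
    # then either mtime + source size or a source hash.
    assert data[:4] == importlib.util.MAGIC_NUMBER
    return marshal.loads(data[16:])


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "hello_mod.py"  # hypothetical module name
        source.write_text("GREETING = 'hello'\n")
        code = load_code_from_pyc(source)
        namespace = {}
        exec(code, namespace)  # running the bytecode populates the namespace
        return namespace["GREETING"]
```

Every step here (locating the file, reading it, and calling marshal.loads) is paid on each interpreter start, which is the cost the optimization tries to remove.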

Let’s take Python 3.10 as an example. Here is the time the Python interpreter takes to start and run an empty statement, with -X importtime used to print the time consumed by loading each package. As you can see, package loading accounts for about 30% of the total time. This situation is similar to that of the Java Virtual Machine: Java source code is compiled into Java bytecode, which is then executed by the java command.
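A measurement of this kind can be scripted. The sketch below is an assumption about how one might collect the numbers, not the tooling used in the talk: it runs the current interpreter with -X importtime on an empty statement and parses the report that CPython writes to stderr.

```python
import subprocess
import sys


def measure_import_times():
    """Run an empty statement under -X importtime and parse the report."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", "pass"],
        capture_output=True, text=True, check=True,
    )
    rows = []
    for line in result.stderr.splitlines():
        # Lines look like: "import time:   self_us | cumulative_us | name"
        if not line.startswith("import time:"):
            continue
        fields = line[len("import time:"):].split("|")
        if len(fields) != 3 or not fields[0].strip().isdigit():
            continue  # skip the header row
        self_us, cumulative_us = int(fields[0]), int(fields[1])
        rows.append((fields[2].strip(), self_us, cumulative_us))
    return rows
```

Summing the self times over all rows gives the total time spent in imports, which can then be compared with the wall-clock time of the whole run.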

Startup speed is not one of Java's strengths, and this process is one of the reasons. How does Java partially solve this problem?

2. PyCDS (Code Object Sharing) Design and Implementation

Java has a mechanism called CDS/AppCDS. It persists Java bytecode and some auxiliary data and loads them with mmap on subsequent startups, which saves the overhead of disk I/O and of parsing and verifying class files.

If we want to use similar techniques in Python, we should target Python bytecode.

By default, Python imports a module from its py file. The logic is shown on the left side of the preceding figure. The import system obtains the corresponding module spec based on the specified name and then tries to find the pyc file or recompile the source. Finally, it creates the module by calling the built-in exec function with the code object and an empty dict, and adds the module to the runtime.

What we do can be simplified to the logic on the right side: based on the package name, try to load the code object from the memory map. If that succeeds, the same code object can be used to initialize the module.
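The simplified path can be illustrated with a meta path finder. This is only a sketch under stated assumptions: ArchiveFinder and the module name cds_demo are hypothetical, and a plain dict of code objects stands in for the memory-mapped image. It is not how PyCDS itself hooks into the import system.

```python
import importlib.abc
import importlib.util
import sys


class ArchiveFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve modules from a name -> code object table (hypothetical archive)."""

    def __init__(self, archive):
        self.archive = archive

    def find_spec(self, fullname, path=None, target=None):
        if fullname in self.archive:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # fall through to the normal py/pyc import path

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        # Same final step as a normal import: run the code in the module dict.
        exec(self.archive[module.__name__], module.__dict__)


# Stand-in for the memory-mapped image: built in-process by compiling source.
archive = {"cds_demo": compile("ANSWER = 6 * 7\n", "<archive>", "exec")}
sys.meta_path.insert(0, ArchiveFinder(archive))

import cds_demo  # resolved by ArchiveFinder, no file search or unmarshalling
```

Names missing from the archive return None from find_spec, so they still go through the regular finders, which mirrors the fallback on the left side of the figure.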

What are the immediate obstacles?

As the figure shows, the C data structure of a Python code object references Python objects of several types (such as consts, strings, and bytes).

Taking the code objects in use as roots, we serialize the data they reference and store it in a memory map.

In this step, the most direct problem is the memory address randomization mechanism. The header of every Python object, including those referenced by code objects, holds a pointer to the corresponding type information in the current process. The runtime uses this pointer to determine the type of the object, and its value can differ from one process to the next.
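The type pointer in the object header can be observed from Python with ctypes. The sketch below assumes a standard CPython build (not a debug or free-threaded build), where the header consists of a reference count followed by the type pointer, and relies on id() returning the object's address, which is a CPython implementation detail.

```python
import ctypes


def type_pointer_of(obj):
    """Read the ob_type pointer stored in a CPython object header.

    Assumes a standard (non-debug, non-free-threaded) CPython build, where
    the header is a reference count followed by the type pointer.
    """
    offset = ctypes.sizeof(ctypes.c_ssize_t)  # skip ob_refcnt
    return ctypes.c_void_p.from_address(id(obj) + offset).value


code = compile("pass", "<demo>", "exec")
# The pointer in the header is the address of the type object in this process.
same_type = type_pointer_of(code) == id(type(code))
```

Because the address of the type object is only meaningful inside the process that produced it, writing the header bytes to a file as-is would store a pointer that may be wrong in the next process.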

Let’s take PyCode_Type as an example. If nothing is done, the type information (the offset shown in red) is lost. To solve this problem, we save the pointers of the involved objects in the image file we create and patch the related pointers dynamically during loading.
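The idea of patching pointers at load time can be simulated in pure Python. This is an illustrative sketch, not the PyCDS image format: KNOWN_TYPES, build_image, and patch_image are hypothetical names, each object is reduced to a single pointer-sized slot, and id() of the type stands in for its address in the current process.

```python
import struct

# Types the image may reference; the list index is a process-independent tag.
KNOWN_TYPES = [type(None), bool, int, float, str, bytes, tuple]
SLOT = struct.Struct("<Q")  # one 8-byte, pointer-sized slot


def build_image(objects):
    """Write one slot per object holding a type tag, and record its offset."""
    image = bytearray()
    relocations = []
    for obj in objects:
        relocations.append(len(image))
        image += SLOT.pack(KNOWN_TYPES.index(type(obj)))
    return image, relocations


def patch_image(image, relocations):
    """Replace each tag with the type's address in the current process."""
    for offset in relocations:
        (tag,) = SLOT.unpack_from(image, offset)
        SLOT.pack_into(image, offset, id(KNOWN_TYPES[tag]))
    return image
```

The image built by build_image contains no process-specific addresses, so it could be written to disk; patch_image is the step that would run once per process after the image is mapped.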

The following Python types are involved:

Constant (bool/None/Ellipsis)
Literal (float/complex)
Variables to be Additionally Allocated (long/bytes/str)
Container (tuple/frozenset)

Constants and literals can be saved by assigning them directly after space is allocated in the memory map. For variables and containers, you need to simulate Python's initialization logic for the object, allocate memory of the appropriate size, and write the data to the corresponding location. In addition, for specific types, you need to add extra reference counts to the objects in the memory map so that Python does not reclaim them by accident.
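To see which of these categories appear in real code, one can walk the constants reachable from a code object. The sketch below is an illustration only: classify and walk_constants are hypothetical helpers, the category names follow the list above, and only co_consts is traversed (other fields of the code object are ignored).

```python
import types


def classify(value):
    """Map a constant to one of the categories listed above."""
    # bool must be checked before int, because bool is a subclass of int.
    if value is None or value is Ellipsis or isinstance(value, bool):
        return "constant"
    if isinstance(value, (float, complex)):
        return "literal"
    if isinstance(value, (int, bytes, str)):
        return "allocated"
    if isinstance(value, (tuple, frozenset)):
        return "container"
    if isinstance(value, types.CodeType):
        return "code"
    return "other"


def walk_constants(code):
    """Collect every constant reachable from a code object, by category."""
    found = {}
    stack = list(code.co_consts)
    while stack:
        value = stack.pop()
        category = classify(value)
        found.setdefault(category, []).append(value)
        if category == "container":
            stack.extend(value)  # containers hold further constants
        elif category == "code":
            stack.extend(value.co_consts)  # nested functions and classes
    return found
```

Calling walk_constants on the result of compile() for a module's source gives a quick picture of how many objects of each kind a serializer would have to handle.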

That covers the general design of this project. For specific usage, please visit the PyCDS project home page.

PyCDS Homepage: https://github.com/alibaba/code-data-share-for-python
