An Illustration of JVM and the Java Program Operation Principle

By Feng Jin (Xiangliang)

1. The Characteristics of the Java Language

Before getting down to business, I want to ask a cliché question: what advantage does the Java language have over the C programming language? Anyone who has studied Java knows that the first class in college and the first chapter of most Java books talk about Write Once Run Anywhere: Java programs can run across platforms.

Let me ask again: why can Java run across platforms?

Most people know that Java runs across platforms thanks to the Java Virtual Machine (JVM).

I had learned that this works because there are different versions of the JVM for different platforms. But what is the underlying principle?

Write Once Run Anywhere is a feature of Java. Programming languages like C and C++ don't have this feature.

Through the following introduction, I believe you will have a deeper understanding.

Java is a cross-platform programming language. First, we need to know what a platform is: together, the CPU and the operating system form a platform.

The CPU is the brain of the computer. It performs calculations and controls the computer system by executing the instructions of its instruction set.

Instruction sets fall into two families: reduced instruction set computing (RISC) and complex instruction set computing (CISC). Each CPU implements a specific instruction set.

To develop a program that runs directly on the hardware, we must know which CPU the program will run on, which means we must know that CPU's instruction set.

The operating system is the interface software between the user and the computer. Different operating systems support different CPUs or, strictly speaking, different CPU instruction sets. For example, the original Mac operating system supported only PowerPC and could not be installed on Intel machines, so when Apple switched to Intel processors, it had to rewrite its operating system to support the new instruction set. In short, each operating system supports specific CPU instruction sets. Today, the instruction sets of Intel and AMD CPUs are supported on Windows, Linux, Mac, and Solaris.

If you want to develop a program, you must determine the following:

CPU type: the instruction set the program targets
Operating system: the operating system the program runs on

This combination of hardware and software is called a platform; that is, platform = CPU + OS. Since mainstream operating systems support mainstream CPUs, the operating system alone is sometimes called the platform.

2. How to Achieve Cross-platform Running

The Java source code we write is compiled into a class file, also called a bytecode file. The JVM then translates the bytecode into machine code for the specific platform and runs it. In short, Java programs run across platforms because there is a different version of the JVM for each platform: as long as the corresponding JVM is installed on a platform, it can run the same bytecode files (.class), and therefore the same Java program, without any changes. This is how Java truly realizes Write Once Run Anywhere. The JVM acts as a bridge, or middleware, between the compiled program and the platform, and it is the key to cross-platform running. Because the result of compilation is bytecode rather than machine code, every Java program needs a JVM to translate it before it can run. Even if you package a Java program as an executable file (such as an .exe file), a JVM is still required.

Note: Compilation does not generate machine code; it generates bytecode. Bytecode cannot run directly and must be converted into machine code by the JVM. The bytecode produced by compilation is the same on every platform, but the machine code the JVM produces from it differs from platform to platform.
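As a minimal sketch of Write Once Run Anywhere (the class name PlatformInfo is ours, not from the article), the program below reads a few standard system properties. Compiled once with javac, the same PlatformInfo.class file can be copied to, say, a Windows machine and a Linux machine and run with `java PlatformInfo` on each; the bytecode is identical, and only the values reported by the local JVM differ.

```java
// Hypothetical example class: the same compiled PlatformInfo.class runs unchanged
// on any platform that has a JVM; only the values reported at runtime differ.
class PlatformInfo {

    // Builds a one-line description of the platform the current JVM runs on.
    static String describe() {
        String os = System.getProperty("os.name");      // e.g. "Linux" or "Windows 11"
        String arch = System.getProperty("os.arch");    // e.g. "amd64" or "aarch64"
        String vm = System.getProperty("java.vm.name"); // name of the JVM implementation
        return "OS=" + os + ", arch=" + arch + ", JVM=" + vm;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

The one caveat is the class-file version: the target JVM must be at least as new as the release the class was compiled for (javac's `--release` option can target an older one).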

3. An Introduction to JVM

The JVM is the foundation of the Java platform. Like a physical machine, it has its own instruction set (just as a CPU runs a program by executing instructions) and manages several memory areas at runtime (the JVM memory system). The JVM sits on top of the operating system (as shown in the following figure). It loads the bytecode produced by the javac compiler into its memory areas, and its interpreter translates that bytecode, instruction by instruction, into machine code the CPU can execute. The JVM specification defines every bytecode instruction in detail, including how it fetches and processes its operands and where it places the result.
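To make the JVM's "own instruction set" concrete, here is a tiny static method with the bytecode that javac typically emits for it shown in the comments (you can check it with `javap -c`; the class name Add is only for illustration). Each line is the kind of behavior the JVM specification pins down: where an instruction takes its operands from and where it puts its result.

```java
// Illustrative class: a small method and, in comments, the bytecode javac typically
// generates for it (as printed by `javap -c Add`). Output may vary slightly by compiler.
class Add {

    static int add(int a, int b) {
        // 0: iload_0   push local variable slot 0 (a) onto the operand stack
        // 1: iload_1   push local variable slot 1 (b) onto the operand stack
        // 2: iadd      pop two ints, push their sum
        // 3: ireturn   pop the int result and return it to the caller
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(3, 5)); // prints 8
    }
}
```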

The JVM runs on top of the operating system and does not interact with the hardware directly.

4. Memory Structure of JVM

Java source files are compiled into bytecode that the JVM can recognize. When a Java program runs, the class loader loads this bytecode into the memory of the JVM. JVM memory is a logical concept, an abstraction of main memory, so the data is still physically stored in main memory. Please see the following figure for more information:

While executing a Java program, the JVM divides the memory it manages into several data areas, each with its own role.

Analyzing the JVM memory structure means analyzing the JVM runtime data areas, which mainly include the heap, the stack, the method area, and the program counter. JVM tuning focuses mainly on the areas shared by all threads: the heap and the method area.
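The sketch below (all names are hypothetical) annotates which runtime data area each piece of data is associated with, following the model used in this article. Treat it as a mental map rather than a guarantee about physical placement: a JIT compiler is free to optimize some allocations, but the program's observable behavior matches this picture.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example mapping Java constructs to the runtime data areas described above.
class DataAreas {

    // Static field: class-level data, associated with the method area.
    static int instanceCount = 0;

    // Instance field: stored inside each object on the heap.
    private final String name;

    DataAreas(String name) {
        this.name = name;
        instanceCount++;
    }

    static List<String> createAll(String... names) {
        // 'result' is a local variable: the reference sits in this method's stack frame,
        // while the ArrayList object it points to lives on the heap.
        List<String> result = new ArrayList<>();
        for (String n : names) {
            DataAreas obj = new DataAreas(n); // new object allocated on the heap
            result.add(obj.name);
        }
        // This frame is popped on return; the list survives because it is on the heap
        // and the caller receives a reference to it.
        return result;
    }

    public static void main(String[] args) {
        System.out.println(createAll("a", "b", "c")); // [a, b, c]
        System.out.println(instanceCount);            // 3
    }
}
```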

4.1 Method Area

Also known as non-heap, the method area stores the class information loaded by the virtual machine, constants, static variables, and code compiled by the just-in-time (JIT) compiler. Its best-known content is the Class object, which holds the metadata of a class, including the class name, its class loader, its methods, and its annotations.

When we create an object with the new operator or reference a static member variable, the class loader subsystem of the JVM loads the corresponding class into the JVM. The JVM then uses the type information in the class's Class object to create the instance we need or to supply the value of the static variable. Note: no matter how many instances of a class we create, the JVM holds only one Class object for that class (per class loader), as shown in the figure:
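The one-Class-object rule can be checked from Java itself: calling getClass() on any two instances returns the very same java.lang.Class object, which is also what the class literal yields. Point and OneClassObject are hypothetical names used only for this sketch.

```java
// Hypothetical example: many instances, one Class object.
class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

class OneClassObject {

    static boolean sameClassObject() {
        Point p1 = new Point(1, 2);
        Point p2 = new Point(3, 4);
        // Reference comparison (==), not equals(): it is literally the same object.
        return p1.getClass() == p2.getClass() && p1.getClass() == Point.class;
    }

    public static void main(String[] args) {
        System.out.println(sameClassObject()); // true
    }
}
```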

All classes are loaded into the JVM dynamically when they are first used: a class is loaded (its bytecode file is read) the first time the program references one of its static members. Note: creating an instance with the new operator counts as such a first use as well. This is sometimes explained by calling the constructor a static method of the class, although strictly speaking a constructor is an instance initialization method rather than a static method.
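A static initializer block makes on-demand loading visible. Strictly speaking, the block shows when a class is initialized, which happens after it is loaded; the JVM specification allows loading itself to happen somewhat earlier, but initialization is guaranteed to wait for the first active use, such as reading a non-constant static field or using new. LoadLog, Lazy, and LazyLoadingDemo are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: records events so we can see when Lazy's static initializer runs.
class LoadLog {
    static final List<String> events = new ArrayList<>();
}

class Lazy {
    static {
        // Runs exactly once, when the JVM initializes Lazy.
        LoadLog.events.add("Lazy initialized");
    }

    static int value = 42; // not a compile-time constant, so reading it triggers initialization
}

class LazyLoadingDemo {

    static List<String> run() {
        LoadLog.events.add("main started");   // Lazy has not been initialized yet
        int v = Lazy.value;                   // first active use: Lazy is initialized here
        LoadLog.events.add("read value " + v);
        new Lazy();                           // already initialized, so the static block does not run again
        return LoadLog.events;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Running LazyLoadingDemo prints "main started", then "Lazy initialized", then "read value 42"; the later `new Lazy()` adds nothing because a class is initialized only once.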

This means a Java program is not loaded into memory in its entirety before it starts running; its parts are loaded on demand. When a class is used, the class loader first checks whether its Class object has already been loaded (instances are created from the type information in the Class object). If not, the default class loader looks up the .class file by the class name (a compiled class is saved in a .class file with the same name). Once the bytecode of the class is loaded, it must pass verification to ensure that it is not corrupted and does not contain malicious Java code (this is part of Java's security mechanism). If verification passes, the class's Class object is created in memory (after all, the .class bytecode file is what describes the class), and all instances of the class can then be created from this Class object.
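The "check first, load only if needed" step can also be observed. In this sketch (LoaderDemo is a hypothetical name), asking the application class loader and Class.forName for a class that is already loaded both return the cached Class object rather than a new one. Core classes such as String come from the bootstrap class loader, which Java code sees as null.

```java
// Hypothetical example: an already-loaded class is returned from the loader's cache.
class LoaderDemo {

    static boolean loadedOnce() {
        String name = LoaderDemo.class.getName();
        ClassLoader loader = LoaderDemo.class.getClassLoader(); // the loader that loaded this class
        try {
            Class<?> viaLoader = loader.loadClass(name); // already loaded: cached Class returned
            Class<?> viaForName = Class.forName(name);   // same lookup through Class.forName
            return viaLoader == LoaderDemo.class && viaForName == LoaderDemo.class;
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean coreClassUsesBootstrapLoader() {
        // Core classes are loaded by the bootstrap loader, reported as null.
        return String.class.getClassLoader() == null;
    }

    public static void main(String[] args) {
        System.out.println(loadedOnce());                   // true
        System.out.println(coreClassUsesBootstrapLoader()); // true
    }
}
```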

4.2 Heap

All instances and arrays are stored in heap memory, which is the largest memory area managed by the JVM and is shared by all threads.
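Heap sharing is easy to demonstrate: in the sketch below (names are hypothetical), two threads write into the same array object. Each thread holds its own reference on its own stack, but both references point to one object on the heap, so the main thread sees all four values after join().

```java
// Hypothetical example: one heap object, written by two threads.
class SharedHeapDemo {

    static int[] fillFromTwoThreads() {
        int[] shared = new int[4]; // allocated once, on the heap

        // Each thread writes a different half of the same array.
        Thread first = new Thread(() -> { shared[0] = 1; shared[1] = 2; });
        Thread second = new Thread(() -> { shared[2] = 3; shared[3] = 4; });
        first.start();
        second.start();
        try {
            first.join();  // join() also makes each thread's writes visible here
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return shared;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(fillFromTwoThreads())); // [1, 2, 3, 4]
    }
}
```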

The garbage collector reclaims the memory occupied by objects on the heap according to its GC algorithm. The heap is divided into the young (new) generation and the old generation, and different garbage collectors and collection algorithms are used for the different generations (described in detail in the GC section).
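The generational layout can be inspected at runtime through the standard java.lang.management API. This sketch (HeapPools is a hypothetical name) lists the heap memory pools of the running JVM. The exact names depend on the garbage collector; with G1, for example, the pools are reported as G1 Eden Space, G1 Survivor Space, and G1 Old Gen, which correspond to the young and old generations.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: list the heap memory pools reported by the running JVM.
class HeapPools {

    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) { // skip non-heap pools such as Metaspace
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        heapPoolNames().forEach(System.out::println);
    }
}
```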

4.3 Stack

The stack in the JVM consists of the JVM stack and the native method stack. The former serves the execution of Java methods, while the latter serves the native methods the JVM uses. Their roles are very similar, and this article focuses on the JVM stack (referred to simply as the stack).

The stack is private to each thread and is created together with the thread, so there is one stack per thread. It is the memory model for executing Java methods: each method invocation creates a stack frame that stores the method's local variable table, operand stack, dynamic linking information, and return address. Each method, from invocation to completion, corresponds to a stack frame being pushed onto and popped off the virtual machine stack. The local variable table in a stack frame can hold primitive values or references to objects. For example, when a method executes new Object(), a reference to the instance in heap memory is stored in the local variable table of the current method's stack frame, as shown in the following figure:
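Because every call pushes a frame onto a finite, per-thread stack, unbounded recursion eventually fails with StackOverflowError. The sketch below (StackDepthProbe is a hypothetical name) counts how many frames fit before that happens. The number varies with the stack size (-Xss), the JVM, and the size of each frame, so treat it only as an illustration.

```java
// Hypothetical example: each recursive call pushes one more frame until the stack runs out.
class StackDepthProbe {

    private static int depth = 0;

    private static void recurse() {
        depth++;   // one more frame is now on this thread's stack
        recurse(); // push another frame; never returns normally
    }

    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // By the time we get here, all recursive frames have been popped.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("frames before overflow: " + measureDepth());
    }
}
```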

4.4 Program Counter

The program counter is a small memory area that stores the address of the next bytecode instruction to be executed; each thread has its own. It is the same concept as the program counter in a CPU.
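A branch in bytecode is simply an instruction that writes a new value into the program counter. The comments below show the bytecode that javac typically generates for a simple loop (check with `javap -c`; offsets may differ across compilers, and the class name is hypothetical). When the loop ends, if_icmpgt at offset 6 sets the PC to 19; otherwise, goto at offset 16 sets it back to 4 for the next iteration.

```java
// Hypothetical example: bytecode offsets of a loop, showing how branches change the PC.
class LoopBytecode {

    static int sumTo(int n) {
        int s = 0;                      //  0: iconst_0      1: istore_1
        for (int i = 1; i <= n; i++) {  //  2: iconst_1      3: istore_2
                                        //  4: iload_2       5: iload_0
                                        //  6: if_icmpgt 19  (loop done: PC := 19)
            s += i;                     //  9: iload_1  10: iload_2  11: iadd  12: istore_1
        }                               // 13: iinc 2, 1     16: goto 4  (next iteration: PC := 4)
        return s;                       // 19: iload_1      20: ireturn
    }

    public static void main(String[] args) {
        System.out.println(sumTo(10)); // prints 55
    }
}
```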

5. How the Java Program Is Executed within JVM

Having described the memory structure of the JVM, let's look at how a Java program runs inside it.

The execution process mainly includes the following steps:

(1) The Java source code is compiled into bytecode.

(2) The class loader loads the bytecode into JVM memory, where it is verified.

(3) After loading, a Class object is created for each class and placed in the method area.

(4) The bytecode instructions and data are initialized in memory.

(5) The main() method is located, and a stack frame is created for it.

(6) The program counter is initialized to the address of the first instruction of the main() method.

(7) The program counter advances continuously as the bytecode instructions are executed one by one. Data produced while an instruction executes is pushed onto the operand stack; when an operation completes, its result is popped from the operand stack and stored in the local variable table. When an object is created, a contiguous block of heap memory is allocated to hold it, and the local variable table on the stack stores a reference to that heap memory. When a method is called, another stack frame is created and pushed on top of the current one.
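The last two steps can be mimicked with a toy interpreter. This is a hypothetical, heavily simplified stack machine, not how HotSpot is implemented: its instruction names only echo the JVM's iconst, istore, iload, iadd, and ireturn. It shows the core loop of fetching the instruction the PC points to, advancing the PC, and moving values between the operand stack and the local variable table.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy stack machine (NOT HotSpot): fetch at the PC, advance the PC,
// and move values between the operand stack and the local variable table.
class ToyInterpreter {

    // Mini instruction set whose names echo JVM opcodes.
    enum Op { ICONST, ILOAD, ISTORE, IADD, IRETURN }

    static final class Insn {
        final Op op;
        final int arg; // constant for ICONST, local slot for ILOAD/ISTORE, unused otherwise

        Insn(Op op, int arg) {
            this.op = op;
            this.arg = arg;
        }
    }

    static int run(Insn[] code, int maxLocals) {
        int[] locals = new int[maxLocals];            // local variable table
        Deque<Integer> operands = new ArrayDeque<>(); // operand stack
        int pc = 0;                                   // program counter

        while (true) {
            Insn insn = code[pc++]; // fetch the instruction, then advance the PC
            switch (insn.op) {
                case ICONST:  operands.push(insn.arg); break;
                case ILOAD:   operands.push(locals[insn.arg]); break;
                case ISTORE:  locals[insn.arg] = operands.pop(); break;
                case IADD:    operands.push(operands.pop() + operands.pop()); break;
                case IRETURN: return operands.pop();
                default:      throw new IllegalStateException("unknown op: " + insn.op);
            }
        }
    }

    // Equivalent of: int a = 3; int b = 5; return a + b;
    static Insn[] sampleProgram() {
        return new Insn[] {
            new Insn(Op.ICONST, 3), new Insn(Op.ISTORE, 0),
            new Insn(Op.ICONST, 5), new Insn(Op.ISTORE, 1),
            new Insn(Op.ILOAD, 0),  new Insn(Op.ILOAD, 1),
            new Insn(Op.IADD, 0),   new Insn(Op.IRETURN, 0),
        };
    }

    public static void main(String[] args) {
        System.out.println(run(sampleProgram(), 2)); // prints 8
    }
}
```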

Let's look at an actual code example to see how a program runs inside the JVM.
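To make the walkthrough below easier to follow in code, here is a hypothetical class shaped after it: only the names JvmDetailClass, detail, Sum, and getSum come from the text, while the field and the method bodies are assumptions (Sum keeps its capitalized name to match the text). Compiled with javac, its main() should show the instruction sequence described below (new, dup, invokespecial, astore_1, aload_1, iconst_3, iconst_5, invokevirtual) with stack=3 and locals=2, although the constant-pool numbers will not necessarily match #1, #2, and #5.

```java
// Hypothetical reconstruction: names from the text, bodies and field assumed.
class JvmDetailClass {

    private int sum; // assumed field holding the result of Sum

    // No return value, matching the description of detail.Sum.
    void Sum(int a, int b) {
        sum = a + b;
    }

    // Returns a value, matching the description of detail.getSum.
    int getSum() {
        return sum;
    }

    public static void main(String[] args) {
        JvmDetailClass detail = new JvmDetailClass(); // new, dup, invokespecial, astore_1
        detail.Sum(3, 5);                             // aload_1, iconst_3, iconst_5, invokevirtual
        detail.getSum();                              // aload_1, invokevirtual, pop (result unused)
    }
}
```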

First, we use the javap command to display the bytecode corresponding to the code above. The following figure shows the class and method references initialized in the constant pool of the method area after the JVM loads the class into memory. Here, focus on #1, #2, and #5: these constant pool entries hold references to the class and its methods, and they are used by the bytecode that follows.

Then, the interpreter in the execution engine starts working: it interprets the bytecode in the class file instruction by instruction and translates it into machine code, relying on the program counter and the operand stack in the runtime data area.

The following figure shows the bytecode instructions of the main() method. We will analyze them in terms of JVM memory.

The preceding figure shows stack=3 and locals=2. stack=3 is the maximum depth of the operand stack, and locals=2 is the number of slots in the local variable table (here, one for args and one for detail).

Explanation of Sample Program Execution

The following figure shows the JVM memory structure before the bytecode in the main() method reaches the call to detail.Sum.

The specific execution process is listed below:

First, the stack frame of the main() method is pushed onto the Java stack. Then, the program counter is set to the memory address of the new instruction; for convenience, this example represents that address as 0. The interpreter then decodes and executes the bytecode. The three instructions new, dup, and invokespecial together create the object: new (whose operand #5 refers to the class JvmDetailClass) allocates the object on the heap and pushes a reference to it onto the operand stack, dup duplicates that reference, and invokespecial pops one copy and calls the constructor on it. astore_1 then pops the remaining reference off the operand stack and stores it in slot 1 of the local variable table, and aload_1 pushes that reference from the local variable table back onto the operand stack when it is needed again.

iconst_3 and iconst_5 push the constants 3 and 5 onto the operand stack.

Next, let's look at how the JVM memory structure is used in method calls. The code above makes two method calls, detail.Sum and detail.getSum, and detail.getSum returns a value. We take the call to detail.getSum as the example to see how a call is executed inside the JVM.

When the interpreter executes a method call, it sets the program counter to the address of the first instruction of the called method and pushes a stack frame for the getSum method onto the stack. After pushing the frame, it stores a reference to the object the method is invoked on (this) in the local variable table. If the call passes parameters, they are also stored in the local variable table.

After the getSum method finishes, its stack frame is popped, and two more steps take place:

(1) The program counter is set to the address of the instruction that follows the call to getSum in the main() method.

(2) The method's return value is pushed onto the operand stack in the stack frame of the main() method.
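The call-and-return sequence can be summarized in a small model. The classes below are hypothetical and handle only int values (a real instance-method call would also place this in slot 0); they mirror the steps just described, not HotSpot's internal data structures.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of a method call and return on one thread's JVM stack.
class CallModel {

    static final class Frame {
        final String method;
        final int[] locals;                                // local variable table
        final Deque<Integer> operands = new ArrayDeque<>(); // operand stack
        int returnPc;                                      // where the caller resumes

        Frame(String method, int maxLocals) {
            this.method = method;
            this.locals = new int[maxLocals];
        }
    }

    final Deque<Frame> stack = new ArrayDeque<>(); // the thread's JVM stack
    int pc;                                         // the thread's program counter

    // Pushes a callee frame; argCount values move from the caller's operand stack
    // into the callee's local variable table.
    void invoke(String method, int maxLocals, int argCount, int calleeEntryPc) {
        Frame caller = stack.peek();
        Frame callee = new Frame(method, maxLocals);
        for (int i = argCount - 1; i >= 0; i--) {
            callee.locals[i] = caller.operands.pop(); // the last argument is on top
        }
        callee.returnPc = pc; // assumes pc already points past the invoke instruction
        stack.push(callee);
        pc = calleeEntryPc;   // jump to the callee's first instruction
    }

    // Pops the callee frame, restores the PC, and hands the result to the caller.
    void returnValue(int value) {
        Frame callee = stack.pop();
        pc = callee.returnPc;
        stack.peek().operands.push(value);
    }

    public static void main(String[] args) {
        CallModel vm = new CallModel();
        Frame mainFrame = new Frame("main", 2);
        vm.stack.push(mainFrame);
        mainFrame.operands.push(3);
        mainFrame.operands.push(5);
        vm.pc = 12;                    // hypothetical offset just after the invoke instruction
        vm.invoke("add", 2, 2, 0);     // callee locals become [3, 5]; pc becomes 0
        Frame callee = vm.stack.peek();
        vm.returnValue(callee.locals[0] + callee.locals[1]);
        System.out.println(vm.pc + " " + mainFrame.operands.peek()); // prints "12 8"
    }
}
```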

Disclaimer: The views expressed herein are for reference only and don't necessarily represent the official views of Alibaba Cloud.
