By Denghui Dong and Tianxiao Gu
Cross-language programming is an important branch of modern programming languages and is widely used in the design and implementation of complex systems. This article summarizes the talk Alibaba FFI – Exploration of Cross-Language Programming, given at GIAC 2021 (Global Internet Architecture Conference) by Denghui Dong and Tianxiao Gu. Denghui Dong is the Head of the Java SIG (reliability, availability, and serviceability) of the Anolis Community. Tianxiao Gu is a core member of the Java SIG (reliability, availability, and serviceability) of the Anolis Community.
Java is undoubtedly one of the most popular application programming languages in the industry. In addition to the good performance of its mainstream implementation (OpenJDK HotSpot) and its mature development ecosystem (such as Spring), its success rests on the low learning threshold of the language compared to C/C++. A beginner can quickly build a working application from existing project scaffolding. As a result, many Java programmers are unfamiliar with how their programs actually execute. This article explores a technology that most Java development work rarely touches: cross-language programming.
Many years ago, when I first used Java to print Hello World on the console, I looked through the JDK source code out of curiosity to find out how it was implemented. (In C, we can use the printf function, but the implementation of printf in turn depends on operating system interfaces.) After a lot of reading, I ended up at a native method whose implementation I could not see.
I think many Java beginners still know little about how native methods are called. After all, we rarely need to implement a custom native method in day-to-day development. In short, a native method is Java's interface for making cross-language calls, and it is part of the Java Native Interface specification.
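To make the idea of a native method concrete, the following minimal sketch uses reflection to check whether a JDK method is declared with the native keyword. The class name NativeMethodProbe and its helper are hypothetical names chosen for illustration, and the result for System.currentTimeMillis assumes an OpenJDK-based runtime, where that method is declared native.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class NativeMethodProbe {
    /** Returns true if the first public method with the given name is declared native. */
    public static boolean isNative(Class<?> type, String name) {
        for (Method m : type.getMethods()) {
            if (m.getName().equals(name)) {
                return Modifier.isNative(m.getModifiers());
            }
        }
        throw new IllegalArgumentException("no public method named " + name);
    }

    public static void main(String[] args) {
        // Assumption: on OpenJDK-based runtimes, currentTimeMillis is a native method.
        System.out.println("System.currentTimeMillis native? "
                + isNative(System.class, "currentTimeMillis"));
        // String.length is implemented in plain Java.
        System.out.println("String.length native? " + isNative(String.class, "length"));
    }
}
```

A method reported as native has no Java body at all; its implementation lives in a library written in another language.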
Before introducing Java cross-language programming technology, let's analyze the scenarios that require it. There are four:
1. Rely on capabilities that bytecode does not support.
What capabilities does standard bytecode provide today? According to the JVM specification, bytecode can create Java objects, access object fields and methods, perform regular calculations (addition, subtraction, multiplication, and division), make comparisons, jump, handle exceptions, and perform lock operations. However, bytecode does not directly support higher-level functions such as the console output mentioned at the beginning, nor functions such as obtaining the current time, allocating off-heap memory, or rendering graphics. It is difficult to implement such capabilities in a pure Java method (by combining these bytecodes) because they usually require interaction with system resources. In these scenarios, we need cross-language programming technology to integrate implementations written in other languages.
2. System-level languages (C, C++, and assembly) are used to implement the critical path of the system.
Not having to release objects explicitly is one of the reasons Java has a low learning threshold, but it requires a garbage collector (GC) to clean up objects that the program no longer needs. In mainstream JVM implementations, GC can pause the whole application, which affects overall system performance, including response time and throughput.
Therefore, compared with C/C++, Java reduces the research and development burden of programmers and improves the efficiency of product research and development, but it introduces runtime overhead. (Software engineering is largely the art of balancing competing trade-offs.)
The core portion of a system's critical path (such as some complex algorithms) may show unstable performance when implemented in Java. In such cases, we can implement this part of the logic in a lower-level programming language to achieve stable performance and low resource consumption.
3. Java is called from other languages.
Most Java programmers may feel that they have never encountered this situation, but in fact we experience it almost every day.
For example, running a Java program with the java command involves calling Java from C. This will be covered later.
4. Historical Legacy Libraries
Some high-performance libraries, whether internal to a company or open source, are written in C/C++, and rewriting them in Java and maintaining them afterward would be costly. When Java applications need the capabilities these libraries provide, we need cross-language programming technology to reuse them.
Alibaba Grape
Let's briefly look at an internal scenario at Alibaba, the Alibaba Grape project. Grape was the first business team to work with our team on cross-language programming technology.
Grape is an open-source parallel graph computing framework (a related paper won the ACM SIGMOD Best Paper Award). It is mainly written in C++ and makes heavy use of templates. Developers interested in Grape can refer to the documentation on GitHub; it is not described in detail here.
Inside Alibaba, many business teams that use Grape write their applications mainly in Java. Developers therefore have to wrap the Grape library in a Java SDK that upper-layer Java applications can call. In practice, there are two problems:
The two teams worked together to solve these problems, and the Alibaba FFI project officially got under way. Currently, the project mainly targets the scenario of Java calling C++.
The following describes some Java cross-language calling technologies that are relatively mature and widely used in the industry.
When it comes to Java cross-language programming, the first thing I have to mention is Java Native Interface (JNI). The JNA/JNR and JavaCPP technologies mentioned later all depend on JNI. First, let's briefly review it through two examples.
Example of Console Output
System.out.println("hello ffi");
System.out allows us to implement console output. I believe that many curious developers will be concerned about how this call implements the output. After looking through the source code, we will finally see such a native method:
private native void writeBytes(byte b[], int off, int len, boolean append) throws IOException;
This method is implemented by JDK. Please refer to this link for the specific implementation. Can we implement such a function by ourselves? The answer is yes. The following lists the general steps, but some details are omitted:
1) First, we define a Java native method, which requires the native keyword and does not provide a specific implementation. The native method can be overloaded.
static native void myHelloFFI();
2) Run the javah command, or javac -h on JDK 10 and later (where javah has been removed), to generate the header file that the subsequent steps depend on. The header can be used by C or C++ programs.
/* DO NOT EDIT THIS FILE - it is machine generated */
#include <jni.h>
/* Header for class HelloFFI */
#ifndef _Included_HelloFFI
#define _Included_HelloFFI
#ifdef __cplusplus
extern "C" {
#endif
/*
* Class: HelloFFI
* Method: myHelloFFI
* Signature: ()V
*/
JNIEXPORT void JNICALL Java_HelloFFI_myHelloFFI
(JNIEnv *, jclass);
#ifdef __cplusplus
}
#endif
#endif
3) Implement the function declared in the header file. Here, we use the printf function to output "hello ffi" on the console.
#include <stdio.h>
#include "HelloFFI.h" /* the header generated in step 2 */
JNIEXPORT void JNICALL Java_HelloFFI_myHelloFFI
(JNIEnv * env, jclass c) {
    printf("hello ffi");
}
4) Compile the implementation into a library file with a C/C++ compiler (such as gcc or clang).
5) Use the -Djava.library.path=... parameter to specify the library file path, and call System.loadLibrary at runtime to load the library generated in the previous step. The Java program can then call our myHelloFFI method normally.
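The Java side of steps 1 and 5 can be sketched as follows. The library name helloffi is a hypothetical name chosen for illustration (the article does not give one), and the tryLoad helper is likewise an assumption that makes the failure mode visible rather than part of the original example.

```java
public class HelloFFI {
    // Step 1: declared in Java, implemented in C as Java_HelloFFI_myHelloFFI.
    static native void myHelloFFI();

    /** Tries to load a native library by its platform-independent name; returns false if it is not found. */
    public static boolean tryLoad(String libName) {
        try {
            System.loadLibrary(libName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            // mapLibraryName shows the file name the JVM looks for on this platform.
            System.err.println("Could not load " + System.mapLibraryName(libName)
                    + " from java.library.path=" + System.getProperty("java.library.path"));
            return false;
        }
    }

    public static void main(String[] args) {
        // "helloffi" is a hypothetical library name; use the name of the library built in step 4.
        if (tryLoad("helloffi")) {
            myHelloFFI();
        }
    }
}
```

Declaring a native method does not require the library to be present; the JVM only resolves the native implementation when the method is first called, which is why the load failure can be handled before the call.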
The example above shows a Java method calling a C function. Through JNI, we can also call Java methods from C programs. Two concepts are involved: the Invocation API and JNI functions. In the following code example, the steps to initialize the virtual machine are omitted, and only the final two steps that perform the call are shown.
// Init jvm ...
// Get method id
jmethodID mainID = (*env)->GetStaticMethodID(env, mainClass, "main", "([Ljava/lang/String;)V");
/* Invoke */
(*env)->CallStaticVoidMethod(env, mainClass, mainID, mainArgs);
The example first gets the "id" of the method with GetStaticMethodID and then calls the method with CallStaticVoidMethod. Both are JNI functions.
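The string "([Ljava/lang/String;)V" passed to GetStaticMethodID is a JVM method descriptor. As a small aid, the sketch below derives such descriptors from Java types with the standard java.lang.invoke.MethodType API; the class name JniDescriptors and its helper are hypothetical names used only for illustration.

```java
import java.lang.invoke.MethodType;

public class JniDescriptors {
    /** Builds the method descriptor string that JNI lookups such as GetStaticMethodID expect. */
    public static String descriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(descriptor(void.class, String[].class)); // ([Ljava/lang/String;)V
        System.out.println(descriptor(void.class));                 // ()V
        System.out.println(descriptor(int.class, long.class));      // (J)I
    }
}
```

The second descriptor, ()V, is the same signature that appears in the generated header for myHelloFFI.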
As mentioned earlier, running a Java program with java <Main Class> is a scenario where another language calls Java. The java command uses a procedure similar to the preceding code to call the main method of the main class. Incidentally, some diagnostic commands commonly used in day-to-day development, such as jcmd, jmap, and jstack, are built from the same source code as the java command (the figure shows that these binaries are about the same size), but with different build parameters.
What is JNI? The following is my understanding:
As the previous introduction shows, implementing a Java-to-C call with JNI takes many tedious steps. The open-source community therefore created the Java Native Access (JNA) and Java Native Runtime (JNR) projects to make Java cross-language programming (here, Java calling C/C++) easier. Both are still built on JNI underneath, so they do not outperform JNI at runtime.
Because JNA and JNR wrap C/C++ functions for them, developers do not need to generate or write the underlying glue code themselves and can implement cross-language calls quickly. Both also provide other features, such as Crash Protection (described later). In its implementation, JNR dynamically generates stubs at runtime to optimize performance.
The following shows the relationship between JNA/JNR and JNI:
The following is an example given by JNR. First, create a LibC interface to encapsulate the target C function. Then, call the API of the LibraryLoader to create a specific instance of LibC. Finally, use the interface to complete the call:
import jnr.ffi.LibraryLoader;
public class HelloWorld {
    public interface LibC { // A representation of libC in Java
        int puts(String s); // mapping of the puts function, in C `int puts(const char *s);`
    }
    public static void main(String[] args) {
        LibC libc = LibraryLoader.create(LibC.class).load("c"); // load the "c" library into the libc variable
        libc.puts("Hello World!"); // prints "Hello World!" to console
    }
}
Unfortunately, JNA and JNR offer limited support for C++, which restricts their use in scenarios that call C++ libraries.
JavaCPP describes itself as the missing bridge between Java and native C++. If JNA/JNR optimizes the experience of Java calling C, then JavaCPP's goal is to optimize the experience of Java calling C++. The project is widely used in the industry. JavaCPP supports most C++ features, such as overloaded operators, class and function templates, and callbacks through function pointers. Like JNA/JNR, JavaCPP is built on JNI underneath; the glue code and some build scripts are generated automatically through mechanisms such as annotation processing. In addition, the project provides presets for common C++ libraries, such as LLVM and Caffe. The following shows an example of using JavaCPP to wrap std::vector:
@Platform(include="<vector>")
public class VectorTest {
    @Name("std::vector<std::vector<void*> >")
    public static class PointerVectorVector extends Pointer {
        static { Loader.load(); }
        public PointerVectorVector() { allocate(); }
        public PointerVectorVector(long n) { allocate(n); }
        public PointerVectorVector(Pointer p) { super(p); } // this = (vector<vector<void*> >*)p
        /**
        other methods ....
        */
        public native @Index void resize(long i, long n); // (*this)[i].resize(n)
        public native @Index Pointer get(long i, long j); // return (*this)[i][j]
        public native void put(long i, long j, Pointer p); // (*this)[i][j] = p
    }
    public static void main(String[] args) {
        PointerVectorVector v = new PointerVectorVector(13);
        v.resize(0, 42); // v[0].resize(42)
        Pointer p = new Pointer() { { address = 0xDEADBEEFL; } };
        v.put(0, 0, p); // v[0][0] = p
        PointerVectorVector v2 = new PointerVectorVector().put(v);
        Pointer p2 = v2.get(0).get(); // p2 = *(&v[0][0])
        System.out.println(v2.size() + " " + v2.size(0) + " " + p2);
        v2.at(42);
    }
}
Graal and Panama are two active community projects related to cross-language programming. However, neither has been verified at large scale in production environments yet, so they are not described here. We may introduce them separately in the future.
FBJNI is an open-source framework from Facebook that helps C++ developers work with JNI. Most of the technologies above focus on letting Java users access native methods, but in cross-language scenarios C++ users also need to access Java code safely and conveniently. Alibaba FFI, by contrast, focuses on letting Java access C++ quickly. For example, suppose the requirement is to allow C++ users to access Java's List interface. Rather than manipulating Java List objects through JNI interface functions, Alibaba FFI would convert a C++ std::vector into a Java interface through the FFI package.
The core reason for the JVM's high performance is its powerful built-in just-in-time (JIT) compiler. The JIT compiles hot methods into executable machine code while the program runs, so these methods execute directly instead of being interpreted bytecode by bytecode. Many optimizations are applied during compilation, and inlining is one of the most important. Simply put, inlining embeds the body of the called method into the caller. It eliminates the overhead of the method call and enables further optimizations.
However, in the current HotSpot implementation, the JIT can only inline Java methods. If a Java method calls a native method, the native method cannot be inlined.
This raises a question: can commonly used native methods, such as System.currentTimeMillis, be inlined? For native methods that applications use frequently, HotSpot uses intrinsics to improve call performance. (Non-native methods can also be intrinsics.) An intrinsic is somewhat like a built-in: when the JIT encounters a call to such a method, it embeds a predefined implementation directly in the generated code. However, adding intrinsic support for a method usually requires modifying the JVM itself.
Another overhead of JNI is parameter passing (including return values). Calling conventions differ from language to language, so arguments have to be converted when a Java method calls a native method, as shown in the following figure (for x64 platforms):
According to the JNI specification, the JVM first needs to put JNIEnv * into the first parameter register (rdi). Then, it puts the remaining parameters, including this (the receiver), into the corresponding registers. To make this process as fast as possible, HotSpot dynamically generates an efficient conversion stub based on the method signature.
A state switch is involved when execution moves from a Java method into a native method, and another when the native method finishes and returns to the Java method, as the following figure shows:
Implementing these state switches requires a memory barrier and a safepoint check.
Another overhead of JNI exists in accessing Java objects in native methods.
Let's imagine we need to access a Java object in a C function. The most direct way is to obtain the pointer of the object and then access it. However, Java objects might be moved due to GC. Therefore, a mechanism is required to make the logic of accessing objects in native methods address-independent.
All problems in CS can be solved with another level of indirection.
In the actual implementation, the problem is solved by adding an intermediate layer, the JNI handle, and by using JNI functions to access objects. This solution inevitably introduces overhead.
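The indirection idea can be illustrated with a toy model. The sketch below is not how HotSpot implements JNI handles; it is a simplified simulation under our own assumptions, in which an array index stands in for an object address and a handle table is patched whenever a simulated compaction moves an object. All names (HandleTableDemo, allocate, compact, resolve) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of handle-based access: callers keep a handle, never a raw address. */
public class HandleTableDemo {
    private final Object[] heap = new Object[8];              // simulated heap; an index stands in for an address
    private final List<Integer> handles = new ArrayList<>();  // handle -> current slot

    /** Places an object in the given slot and returns a stable handle to it. */
    public int allocate(int slot, Object value) {
        heap[slot] = value;
        handles.add(slot);
        return handles.size() - 1;
    }

    /** Simulates a compacting GC: slides live objects to the front and patches the handle table. */
    public void compact() {
        int next = 0;
        for (int slot = 0; slot < heap.length; slot++) {
            if (heap[slot] == null) continue;
            if (slot != next) {
                heap[next] = heap[slot];
                heap[slot] = null;
                for (int h = 0; h < handles.size(); h++) {
                    if (handles.get(h) == slot) handles.set(h, next);
                }
            }
            next++;
        }
    }

    public int slotOf(int handle) { return handles.get(handle); }

    /** Every access pays for one extra lookup through the handle table. */
    public Object resolve(int handle) { return heap[handles.get(handle)]; }

    public static void main(String[] args) {
        HandleTableDemo vm = new HandleTableDemo();
        int h = vm.allocate(5, "payload");
        int before = vm.slotOf(h);
        vm.compact();
        // prints "slot 5 -> 0, value=payload"
        System.out.println("slot " + before + " -> " + vm.slotOf(h) + ", value=" + vm.resolve(h));
    }
}
```

The handle stays valid although the object moved, and the price is the extra lookup in resolve on every access, which is the kind of overhead described above.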
From the previous introduction, we know that current mainstream Java cross-language programming technology has two problems: programming difficulty and runtime overhead.
We can use JNA/JNR and JavaCPP to solve the first problem. Do we have a corresponding optimization plan for the second problem?
The Alibaba FFI project is dedicated to solving problems encountered in Java cross-language programming. As a whole, the project is divided into the following two modules:
1) FFI (for the difficulty of programming)
2) LLVM4JNI (for overhead issues at runtime)
Currently, Alibaba FFI is mainly aimed at C++, and the following mainly uses C++ as the target communication language.
Workflow for cross-language programming with Alibaba FFI:
Note: The solid line indicates the relationship between the source code and the product in the pre-run stage. The dashed line indicates the relationship between the application, the library, and the product in the run stage.
The FFI module provides a set of annotations and types to encapsulate the interfaces of other languages. As you can see in the following figure, the top level is an FFIType (FFI -> Foreign function interface) interface.
In the C++-oriented implementation, an underlying C++ object is mapped to a Java object, so the Java object needs to hold the address of the C++ object. Since C++ objects are not moved, we can store raw pointers in Java objects.
Essentially, the FFI module uses an annotation processor to generate the code required for cross-language calls. Users only need to depend on the FFI libraries (plug-ins) and use the API provided by FFI to wrap the target functions they want to call.
The following is the process of wrapping std::vector.
a. Wrap the underlying functions to be called using annotations and types:
@FFIGen(library = "stdcxx-demo")
@CXXHead(system = {"vector", "string"})
@FFITypeAlias("std::vector")
@CXXTemplate(cxx="jint", java="Integer")
@CXXTemplate(cxx="jbyte", java="Byte")
public interface StdVector<E> extends CXXPointer {
    @FFIFactory
    interface Factory<E> {
        StdVector<E> create();
    }
    long size();
    @CXXOperator("[]") @CXXReference E get(long index);
    @CXXOperator("[]") void set(long index, @CXXReference E value);
    void push_back(@CXXValue E e);
    long capacity();
    void reserve(long size);
    void resize(long size);
}
The real implementation of the interface:
public class StdVector_cxx_0x6b0caae2 extends FFIPointerImpl implements StdVector<Byte> {
    static {
        FFITypeFactory.loadNativeLibrary(StdVector_cxx_0x6b0caae2.class, "stdcxx-demo");
    }
    public StdVector_cxx_0x6b0caae2(final long address) {
        super(address);
    }
    public long capacity() {
        return nativeCapacity(address);
    }
    public static native long nativeCapacity(long ptr);
    ...
    public long size() {
        return nativeSize(address);
    }
    public static native long nativeSize(long ptr);
}
Glue code of JNI:
#include <jni.h>
#include <new>
#include <vector>
#include <string>
#include "stdcxx_demo.h"
#ifdef __cplusplus
extern "C" {
#endif
JNIEXPORT
jbyte JNICALL Java_com_alibaba_ffi_samples_StdVector_1cxx_10x6b0caae2_nativeGet(JNIEnv* env, jclass cls, jlong ptr, jlong arg0 /* index0 */) {
    return (jbyte)((*reinterpret_cast<std::vector<jbyte>*>(ptr))[arg0]);
}
JNIEXPORT
jlong JNICALL Java_com_alibaba_ffi_samples_StdVector_1cxx_10x6b0caae2_nativeSize(JNIEnv* env, jclass cls, jlong ptr) {
    return (jlong)(reinterpret_cast<std::vector<jbyte>*>(ptr)->size());
}
......
#ifdef __cplusplus
}
#endif
We introduced several optimization mechanisms as the project evolved, such as the handling of temporary objects returned by C++ functions and the conversion of exceptions. Here, we introduce Crash Protection. It solves a problem that users encountered in practice, and JNA and JNR handle it in a similar way.
Sometimes, the C++ libraries that a Java application depends on are upgraded to new versions. We need a protection mechanism to prevent a bug in a C++ library from crashing the entire application. (A bug in Java code usually surfaces as an exception and, in most cases, does not bring down the application as a whole.)
JNIEXPORT void JNICALL Java_Demo_crash(JNIEnv* env, jclass) {
void* addr = 0;
*(int*)addr = 0; // (Crash)
}
An invalid memory access (a write through a null pointer) occurs at line 3. Without special handling, the application will crash. We introduced a protection mechanism to isolate the problem. The following is the implementation on Linux:
PROTECTION_START // macro inserted into the generated glue code
void* addr = 0;
*(int*)addr = 0;
PROTECTION_END // macro
// The implementation after macro expansion is as follows
// Pseudo code
// register signal handlers
signal(sig, signal_handler);
int sigsetjmp_rv;
sigsetjmp_rv = sigsetjmp(acquire_sigjmp_buf(), 1);
if (!sigsetjmp_rv) {
void* addr = 0;
*(int*)addr = 0;
}
release_sigjmp_buf();
// restore handler ...
if (sigsetjmp_rv) {
handle_crash(env, sigsetjmp_rv);
}
The crash is intercepted by installing signal handlers and using the sigsetjmp/siglongjmp mechanism. Note: HotSpot installs its own signal handlers (for example, for safepoint checks and implicit null checks), so libjsig.so must be preloaded (on Linux) at startup to prevent conflicts. Finally, handle_crash can throw a Java exception for subsequent troubleshooting and analysis.
LLVM4JNI translates bitcode to bytecode. In this way, a native function is converted into a Java method, which eliminates several of the overheads mentioned earlier.
The translation is completed before the application runs. At its core, it implements the semantics of the bitcode using bytecode. The implementation details are not covered in this article. (They will be introduced in detail after the project is open-sourced.) The following section demonstrates the translation results for a few simple procedures.
1. Simple Basic Operations:
int v1 = i + j;
int v2 = i - j;
int v3 = i * j;
int v4 = i / j;
return v1 + v2 + v3 + v4;
%5 = sdiv i32 %2, %3
%6 = add i32 %3, 2
%7 = mul i32 %6, %2
%8 = add nsw i32 %5, %7
ret i32 %8
Code:
stack=2, locals=6, args_size=3
0: iload_1
1: iload_2
2: idiv
3: istore_3
4: iload_2
5: ldc #193 // int 2
7: iadd
8: iload_1
9: imul
10: istore 5
12: iload_3
13: iload 5
15: iadd
16: ireturn
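The bitcode and bytecode above contain one division, one addition of the constant 2, one multiplication, and one final addition, although the source performs four separate operations. This is because (i + j) + (i - j) + i * j + i / j simplifies to i * (j + 2) + i / j. The sketch below writes both forms in Java to check that they agree; the mapping of %2 to i and %3 to j is our reading of the listing, and the class and method names are hypothetical.

```java
public class SimpleOps {
    /** The arithmetic as written in the source snippet. */
    public static int source(int i, int j) {
        int v1 = i + j;
        int v2 = i - j;
        int v3 = i * j;
        int v4 = i / j;
        return v1 + v2 + v3 + v4;
    }

    /** The arithmetic as it appears in the bitcode and the translated bytecode. */
    public static int translated(int i, int j) {
        int quotient = i / j;       // sdiv  -> idiv
        int product = (j + 2) * i;  // add, mul -> iadd, imul
        return quotient + product;  // add   -> iadd
    }

    public static void main(String[] args) {
        // prints "37 37"
        System.out.println(source(7, 3) + " " + translated(7, 3));
    }
}
```

Because Java int arithmetic wraps around on overflow, the two forms also agree for large operands, not only for small ones.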
2. Conversion of JNI Functions: Currently, over 90 functions are supported. In the future, this feature will be integrated with fbjni and other similar frameworks. It will break the code boundary between Java and Native. It will also eliminate the extra overhead of method calls.
jclass objectClass = env->FindClass("java/util/List");
return env->IsInstanceOf(arg, objectClass);
Code:
stack=1, locals=3, args_size=2
0: ldc #205 // class java/util/List
2: astore_2
3: aload_1
4: instanceof #205 // class java/util/List
7: i2b
8: ireturn
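Read back as Java source, the translated bytecode above corresponds roughly to the following method. This is our own reading of the listing for illustration, not output produced by LLVM4JNI, and the class name InstanceOfDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class InstanceOfDemo {
    /** Java-source reading of the translated bytecode. */
    public static boolean isList(Object arg) {
        Class<?> objectClass = List.class; // ldc + astore_2, from FindClass("java/util/List")
        return arg instanceof List;        // instanceof, from IsInstanceOf(arg, objectClass)
    }

    public static void main(String[] args) {
        System.out.println(isList(new ArrayList<String>())); // true
        System.out.println(isList("not a list"));            // false
    }
}
```

The two JNI function calls have become an ordinary bytecode instruction, so the JIT can optimize the method like any other Java code.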
3. C++ Object Access: An additional benefit of Alibaba FFI is the ability to develop Java off-heap applications in an object-oriented manner. (C++ is an object-oriented language.) Currently, most Java-based big data platforms require off-heap-enabled data modules to reduce the pressure of garbage collection. However, manual development of off-heap modules requires careful handling of the underlying offsets and alignments across platforms and architectures. This process is error-prone and time-consuming. Alibaba FFI can use C++ to develop an object model. Then, we can use Alibaba FFI to expose it to Java users.
class Pointer {
public:
int _x;
int _y;
Pointer(): _x(0), _y(0) {}
const int x() { return _x; }
const int y() { return _y; }
};
JNIEXPORT
jint JNICALL Java_Pointer_1cxx_10x4b57d61d_nativeX(JNIEnv*, jclass, jlong ptr) {
return (jint)(reinterpret_cast<Pointer*>(ptr)->x());
}
JNIEXPORT
jint JNICALL Java_Pointer_1cxx_10x4b57d61d_nativeY(JNIEnv*, jclass, jlong ptr) {
return (jint)(reinterpret_cast<Pointer*>(ptr)->y());
}
define i32 @Java_Pointer_1cxx_10x4b57d61d_nativeX
%4 = inttoptr i64 %2 to %class.Pointer*
%5 = getelementptr inbounds %class.Pointer, %class.Pointer* %4, i64 0, i32 0
%6 = load i32, i32* %5, align 4, !tbaa !3
ret i32 %6
define i32 @Java_Pointer_1cxx_10x4b57d61d_nativeY
%4 = inttoptr i64 %2 to %class.Pointer*
%5 = getelementptr inbounds %class.Pointer, %class.Pointer* %4, i64 0, i32 1
%6 = load i32, i32* %5, align 4, !tbaa !8
ret i32 %6
public int y();
descriptor: ()I
flags: ACC_PUBLIC
Code:
stack=2, locals=1, args_size=1
0: aload_0
1: getfield #36 // Field address:J
4: invokestatic #84 // Method nativeY:(J)I
7: ireturn
LineNumberTable:
line 70: 0
public static int nativeY(long);
descriptor: (J)I
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=4, locals=2, args_size=1
0: lload_0
1: ldc2_w #85 // long 4l
4: ladd
5: invokestatic #80 // Method com/alibaba/llvm4jni/runtime/JavaRuntime.getInt:(J)I
8: ireturn
public class JavaRuntime {
public static final Unsafe UNSAFE;
...
public static int getInt(long address) {
return UNSAFE.getInt(address);
}
...
}
To access the fields of C++ objects, the translated code uses the Unsafe API to read off-heap memory directly. Thus, the native method call is avoided.
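The same offset-based access can be tried without Unsafe. The sketch below swaps in a direct ByteBuffer from java.nio as the off-heap memory, so it is not the code LLVM4JNI generates; it only mirrors the layout of the Pointer class above, with _x at offset 0 and _y at offset 4. The class name OffHeapPoint is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class OffHeapPoint {
    private static final int X_OFFSET = 0; // int _x
    private static final int Y_OFFSET = 4; // int _y, 4 bytes after _x

    // Direct buffers are allocated outside the Java heap; native byte order matches what C++ would see.
    private final ByteBuffer memory = ByteBuffer.allocateDirect(8).order(ByteOrder.nativeOrder());

    public OffHeapPoint(int x, int y) {
        memory.putInt(X_OFFSET, x);
        memory.putInt(Y_OFFSET, y);
    }

    /** Reads _x by absolute offset, like the load at field index 0 in the bitcode. */
    public int x() { return memory.getInt(X_OFFSET); }

    /** Reads _y by absolute offset, like "address + 4" in the translated nativeY. */
    public int y() { return memory.getInt(Y_OFFSET); }

    public static void main(String[] args) {
        OffHeapPoint p = new OffHeapPoint(3, 7);
        System.out.println(p.x() + " " + p.y()); // prints "3 7"
    }
}
```

Writing such offsets by hand is exactly the error-prone work described earlier; deriving them from a C++ object model removes that burden from the Java developer.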
The performance data of the SSSP (single-source shortest path algorithm) implemented by Grape in Alibaba FFI is listed below:
Three modes are compared here:
Here, we take the completion time of the algorithm (Job Time) as the indicator and normalize the final result in terms of C++ computation time.
Cross-language programming is an important direction of modern programming languages. There are many schemes in the community to implement the communication process for different languages.
Alibaba FFI currently focuses on C++. In the future, we aim to implement and optimize the communication process between Java and other languages. The project will also be open-sourced. We welcome your continued attention.
Developers are welcome to join the Java language and virtual machine SIG through our website.
Denghui Dong has been working on Java since 2015. In 2017, he joined Alibaba JVM team, mainly focusing on RAS. He is an OpenJDK committer and the project lead of Eclipse Jifa.