Techniques to Construct String Objects and Access Internal Members Quickly

This article presents techniques, with examples, for constructing String objects quickly across multiple JDK versions.

By Wen Shaojin (Gaotie)

1. JDK String Implementation Background

The String implementation differs between JDK 8 and JDK 9 and later versions. The String structure in JDK 8 is listed below:

1.1 String Implementation in JDK 8

class String {
    char[] value;

    // Public constructor: copies the array.
    public String(char value[]) {
        this.value = Arrays.copyOf(value, value.length);
    }

    // Package-private constructor: shares the array without copying.
    String(char[] value, boolean share) {
        // assert share : "unshared not supported";
        this.value = value;
    }
}

1.2 String Implementation in JDK 9 and Later Versions

class String {
    static final byte LATIN1 = 0;
    static final byte UTF16  = 1;
    
    byte coder;
    byte[] value;

    // No Copy Constructor
    String(byte[] value, byte coder) {
        this.value = value;
        this.coder = coder;
    }
}

From JDK 9 onward, the value is stored in a byte[], and the coder field distinguishes between LATIN1 and UTF16. Most strings are LATIN1. As a result, when constructing strings or encoding them into binary, ZeroCopy can be used to achieve the best performance.
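The relationship between content and coder can be sketched with public APIs only. The class and method names below (CoderSketch, expectedCoder, expectedValueLength) are hypothetical and only illustrate the rule that a string whose characters all fit in one byte can be stored as LATIN1; they do not read the real internal field.

```java
final class CoderSketch {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    // Returns the coder a compact-strings JVM would be expected to use:
    // LATIN1 when every char fits in one byte, UTF16 otherwise.
    static byte expectedCoder(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return UTF16;
            }
        }
        return LATIN1;
    }

    // Expected size of the backing byte[]: 1 byte per char for LATIN1, 2 for UTF16.
    static int expectedValueLength(String s) {
        return s.length() << expectedCoder(s);
    }
}
```

For a date string such as "2022-07-15", expectedCoder returns LATIN1, so the backing array would be expected to hold 10 bytes rather than 20.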

2. Unsafe-Related Knowledge

sun.misc.Unsafe, available in JDK 8 and later, can perform some native operations with better performance. Incorrect calls can crash the JVM, but when used correctly, Unsafe improves performance and can bypass access restrictions.

import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class UnsafeUtils {
    public static final Unsafe UNSAFE;
    
    static {
        Unsafe unsafe = null;
        try {
            Field theUnsafeField = Unsafe.class.getDeclaredField("theUnsafe");
            theUnsafeField.setAccessible(true);
            unsafe = (Unsafe) theUnsafeField.get(null);
        } catch (Throwable ignored) {
            // ignored
        }
        UNSAFE = unsafe;
    }
}
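As a small illustration of what an Unsafe instance is used for, the sketch below reads a private field by its memory offset. Point and readX are hypothetical names invented for this example, and the reflection fallback is an assumption added so the code still works on JVMs where Unsafe cannot be obtained.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

final class UnsafeFieldSketch {
    // Hypothetical class used only for this illustration.
    static final class Point {
        private int x = 42;
    }

    static final Unsafe UNSAFE;
    static final long X_OFFSET;

    static {
        Unsafe unsafe = null;
        long offset = -1;
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            unsafe = (Unsafe) f.get(null);
            // Resolve the field offset once; later reads skip reflection entirely.
            offset = unsafe.objectFieldOffset(Point.class.getDeclaredField("x"));
        } catch (Throwable ignored) {
            unsafe = null; // readX falls back to plain reflection
        }
        UNSAFE = unsafe;
        X_OFFSET = offset;
    }

    static int readX(Point p) {
        if (UNSAFE != null) {
            return UNSAFE.getInt(p, X_OFFSET);
        }
        try {
            Field f = Point.class.getDeclaredField("x");
            f.setAccessible(true);
            return f.getInt(p);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Computing the offset once in the static initializer and reusing it on every read is the same pattern the later sections apply to the String value field.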

3. Trusted MethodHandles.Lookup-Related Knowledge

JDK 8 introduced lambda support, which makes it easy to map a method to a lambda function and avoid reflection overhead. java.lang.invoke.LambdaMetafactory can do this, but it is limited by visibility, which means non-public methods cannot be called. There is a trick that combines it with Unsafe to construct a trusted MethodHandles.Lookup on different JDK versions, bypassing the visibility restriction so that any JDK internal method can be called:

import static com.alibaba.fastjson2.util.UnsafeUtils.UNSAFE;

static final MethodHandles.Lookup IMPL_LOOKUP;

static {
    MethodHandles.Lookup trustedLookup = null;
    try {
        Class<?> lookupClass = MethodHandles.Lookup.class;
        Field implLookup = lookupClass.getDeclaredField("IMPL_LOOKUP");
        long fieldOffset = UNSAFE.staticFieldOffset(implLookup);
        trustedLookup = (MethodHandles.Lookup) UNSAFE.getObject(lookupClass, fieldOffset);
    } catch (Throwable ignored) {
        // ignored: IMPL_LOOKUP stays null
    }
    IMPL_LOOKUP = trustedLookup;
}

static MethodHandles.Lookup trustedLookup(Class<?> objectClass) throws Exception {
    return IMPL_LOOKUP.in(objectClass);
}

Note: In IBM OpenJ9 JDK 8/11, this approach is limited by visibility and requires additional processing. Refer to the FASTJSON2 JDKUtils#trustedLookup code for more information.
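For comparison, the basic LambdaMetafactory mechanism can be sketched with an ordinary lookup and a public method, where no visibility trick is needed. LambdaMappingSketch and lengthFunction are hypothetical names; the example maps String.length() to a ToIntFunction<String>.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.util.function.ToIntFunction;

import static java.lang.invoke.MethodType.methodType;

final class LambdaMappingSketch {
    // Maps the public method String.length() to a ToIntFunction<String>.
    @SuppressWarnings("unchecked")
    static ToIntFunction<String> lengthFunction() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle length = lookup.findVirtual(
                    String.class, "length", methodType(int.class));
            CallSite site = LambdaMetafactory.metafactory(
                    lookup,
                    "applyAsInt",                         // interface method name
                    methodType(ToIntFunction.class),      // factory type: () -> ToIntFunction
                    methodType(int.class, Object.class),  // erased signature of applyAsInt
                    length,                               // method to invoke
                    methodType(int.class, String.class)   // specialized signature
            );
            return (ToIntFunction<String>) site.getTarget().invokeExact();
        } catch (Throwable e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The resulting function is invoked like any lambda, for example lengthFunction().applyAsInt("hello"). Replacing the ordinary lookup with a trusted one is what allows the same call shape to reach non-public members.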

4. ZeroCopy Constructing String Object

The key to constructing strings quickly is to reduce copying or to use ZeroCopy. The implementation differs between JDK 8, JDK 9 to 15, and JDK 16 and later versions.

4.1 Implementing ZeroCopy String Object Construction in JDK 8

In JDK 8, you need to call the String(char[], boolean) constructor to construct a String object with ZeroCopy. For example:

BiFunction<char[], Boolean, String> stringCreatorJDK8
    = (char[] value, Boolean share) -> new String(value, share);

Since the String(char[], boolean) constructor is not public, the preceding code will report an error. Instead, construct a TRUSTED MethodHandles.Lookup through reflection, use it to look up the internal String constructor, and map that constructor to a BiFunction. The code is listed below:

import com.alibaba.fastjson2.util.JDKUtils;

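import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.util.function.BiFunction;
import static java.lang.invoke.MethodType.methodType;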

MethodHandles.Lookup caller = JDKUtils.trustedLookup(String.class);

MethodHandle handle = caller.findConstructor(
        String.class, 
        methodType(void.class, char[].class, boolean.class)
);

CallSite callSite = LambdaMetafactory.metafactory(
        caller,
        "apply",
        methodType(BiFunction.class),
        methodType(Object.class, Object.class, Object.class),
        handle,
        methodType(String.class, char[].class, boolean.class)
);
BiFunction<char[], Boolean, String>  STRING_CREATOR_JDK8 
    = (BiFunction<char[], Boolean, String>) 
      callSite.getTarget().invokeExact();

4.2 Implementing ZeroCopy String Object Construction in JDK 9 and Later Versions

From JDK 9 to JDK 15, we want to construct a function like this to create a String object with ZeroCopy:

BiFunction<byte[], Byte, String> STRING_CREATOR_JDK11
    = (byte[] value, Byte coder) -> new String(value, coder);

Similarly, the String(byte[], byte) constructor in JDK 9 is not public and cannot be called directly, so the preceding code will report an error. Use a TRUSTED MethodHandles.Lookup to look up the internal String constructor and map it to a BiFunction, as shown below:

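import com.alibaba.fastjson2.util.JDKUtils;

import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.util.function.BiFunction;
import static java.lang.invoke.MethodType.methodType;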

MethodHandles.Lookup caller = JDKUtils.trustedLookup(String.class);
MethodHandle handle = caller.findConstructor(
        String.class, 
        methodType(void.class, byte[].class, byte.class)
);
CallSite callSite = LambdaMetafactory.metafactory(
        caller,
        "apply",
        methodType(BiFunction.class),
        methodType(Object.class, Object.class, Object.class),
        handle,
        methodType(String.class, byte[].class, Byte.class)
);
BiFunction<byte[], Byte, String> STRING_CREATOR_JDK11 
    = (BiFunction<byte[], Byte, String>) 
      callSite.getTarget().invokeExact();

Note: The preceding method does not work when the user configures the JVM parameter -XX:-CompactStrings.
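One way to guard against configurations where the internal constructor is unavailable or behaves differently is to probe the creator once and fall back to the public, copying constructor. The sketch below is an assumption-laden illustration, not the FASTJSON2 implementation: Latin1StringSketch, tryCreate, and latin1 are hypothetical names, and the probe-and-fallback strategy is this example's own choice.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Field;
import java.nio.charset.StandardCharsets;
import java.util.function.BiFunction;
import sun.misc.Unsafe;

import static java.lang.invoke.MethodType.methodType;

final class Latin1StringSketch {
    static final Byte LATIN1 = 0;
    // Null when the internal constructor is unreachable or fails the probe.
    static final BiFunction<byte[], Byte, String> CREATOR = tryCreate();

    @SuppressWarnings("unchecked")
    private static BiFunction<byte[], Byte, String> tryCreate() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            Unsafe unsafe = (Unsafe) f.get(null);

            Field implLookup = MethodHandles.Lookup.class.getDeclaredField("IMPL_LOOKUP");
            MethodHandles.Lookup trusted = (MethodHandles.Lookup) unsafe.getObject(
                    unsafe.staticFieldBase(implLookup),
                    unsafe.staticFieldOffset(implLookup));
            MethodHandles.Lookup caller = trusted.in(String.class);

            MethodHandle ctor = caller.findConstructor(
                    String.class, methodType(void.class, byte[].class, byte.class));
            CallSite site = LambdaMetafactory.metafactory(
                    caller, "apply",
                    methodType(BiFunction.class),
                    methodType(Object.class, Object.class, Object.class),
                    ctor,
                    methodType(String.class, byte[].class, Byte.class));
            BiFunction<byte[], Byte, String> fn =
                    (BiFunction<byte[], Byte, String>) site.getTarget().invokeExact();
            // Probe: reject the creator if a known value does not round-trip.
            return "ok".equals(fn.apply(new byte[] {'o', 'k'}, LATIN1)) ? fn : null;
        } catch (Throwable ignored) {
            return null; // JDK 8, restricted runtime, or unsupported JVM
        }
    }

    // Builds a String from Latin-1 bytes, without copying when CREATOR is usable.
    static String latin1(byte[] bytes) {
        return CREATOR != null
                ? CREATOR.apply(bytes, LATIN1)
                : new String(bytes, StandardCharsets.ISO_8859_1);
    }
}
```

When the ZeroCopy path is taken, the String shares the array passed in, so the caller must not modify that array afterwards.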

4.3 Application Examples to Construct String Objects Quickly

static BiFunction<char[], Boolean, String> STRING_CREATOR_JDK8 = ...
static BiFunction<byte[], Byte, String> STRING_CREATOR_JDK11 = ...

static String formatYYYYMMDD(LocalDate date) {
    int year = date.getYear();
    int month = date.getMonthValue();
    int dayOfMonth = date.getDayOfMonth();

    int y0 = year / 1000 + '0';
    int y1 = (year / 100) % 10 + '0';
    int y2 = (year / 10) % 10 + '0';
    int y3 = year % 10 + '0';
    int m0 = month / 10 + '0';
    int m1 = month % 10 + '0';
    int d0 = dayOfMonth / 10 + '0';
    int d1 = dayOfMonth % 10 + '0';

    String str;
    if (STRING_CREATOR_JDK11 != null) {
        byte[] bytes = new byte[10];
        bytes[0] = (byte) y0;
        bytes[1] = (byte) y1;
        bytes[2] = (byte) y2;
        bytes[3] = (byte) y3;
        bytes[4] = '-';
        bytes[5] = (byte) m0;
        bytes[6] = (byte) m1;
        bytes[7] = '-';
        bytes[8] = (byte) d0;
        bytes[9] = (byte) d1;
        str = STRING_CREATOR_JDK11.apply(bytes, JDKUtils.LATIN1);
    } else {
        char[] chars = new char[10];
        chars[0] = (char) y0;
        chars[1] = (char) y1;
        chars[2] = (char) y2;
        chars[3] = (char) y3;
        chars[4] = '-';
        chars[5] = (char) m0;
        chars[6] = (char) m1;
        chars[7] = '-';
        chars[8] = (char) d0;
        chars[9] = (char) d1;

        if (STRING_CREATOR_JDK8 != null) {
            str = STRING_CREATOR_JDK8.apply(chars, Boolean.TRUE);
        } else {
            str = new String(chars);
        }
    }
    return str;
}

In the preceding example, a char[] is created directly on JDK 8 and a byte[] on JDK 9 and later, depending on the JDK version, and the String object is then constructed with ZeroCopy. This formats a LocalDate to a String quickly and is faster than SimpleDateFormat, java.time.format.DateTimeFormatter, and other implementations.
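To check the digit arithmetic independently of the ZeroCopy creators, the same formatting can be written with only the public, copying String constructor. DateFormatSketch is a hypothetical name, and the explicit range check on the year is an assumption added here because the four-digit arithmetic only holds for years 0 to 9999.

```java
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;

final class DateFormatSketch {
    // Same digit arithmetic as formatYYYYMMDD, using the public copying constructor.
    static String formatYYYYMMDD(LocalDate date) {
        int year = date.getYear();
        int month = date.getMonthValue();
        int day = date.getDayOfMonth();
        if (year < 0 || year > 9999) {
            throw new IllegalArgumentException("year out of range: " + year);
        }
        byte[] bytes = {
            (byte) (year / 1000 + '0'),
            (byte) (year / 100 % 10 + '0'),
            (byte) (year / 10 % 10 + '0'),
            (byte) (year % 10 + '0'),
            '-',
            (byte) (month / 10 + '0'),
            (byte) (month % 10 + '0'),
            '-',
            (byte) (day / 10 + '0'),
            (byte) (day % 10 + '0')
        };
        // ISO_8859_1 maps each byte to the char with the same value.
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }
}
```

Comparing its output with DateTimeFormatter.ISO_LOCAL_DATE for a few dates is a quick way to validate the byte layout before switching to a ZeroCopy creator.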

5. Directly Access the Internal Members of the String Object

5.1 Quick Access to Value in JDK 8

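static final Field FIELD_STRING_VALUE;
static final long FIELD_STRING_VALUE_OFFSET;
// Set to true once direct field access has failed, so later calls use the fallback.
static volatile boolean FIELD_STRING_ERROR;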

static {
    Field field = null;
    long fieldOffset = -1;
    try {
        field = String.class.getDeclaredField("value");
        fieldOffset = UnsafeUtils.UNSAFE.objectFieldOffset(field);
    } catch (Exception ignored) {
        FIELD_STRING_ERROR = true;
    }

    FIELD_STRING_VALUE = field;
    FIELD_STRING_VALUE_OFFSET = fieldOffset;
}

public static char[] getCharArray(String str) {
    if (!FIELD_STRING_ERROR) {
        try {
            return (char[]) UnsafeUtils.UNSAFE.getObject(
                str, 
                FIELD_STRING_VALUE_OFFSET
            );
        } catch (Exception ignored) {
            FIELD_STRING_ERROR = true;
        }
    }

    return str.toCharArray();
}

5.2 Direct Access to Coder and Value in JDK 9 and Later Versions

We need to construct the following functions:

ToIntFunction<String> stringCoder = (String str) -> str.coder();
Function<String, byte[]> stringValue = (String str) -> str.value();

However, since the String.coder and String.value methods are not public (similar to 4.2), the functions need to be constructed through a TRUSTED MethodHandles.Lookup, as shown below:

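import com.alibaba.fastjson2.util.JDKUtils;

import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.util.function.Function;
import java.util.function.ToIntFunction;
import static java.lang.invoke.MethodType.methodType;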

MethodHandles.Lookup lookup = JDKUtils.trustedLookup(String.class);
MethodHandle coder = lookup.findSpecial(
        String.class,
        "coder",
        methodType(byte.class),
        String.class
);
CallSite applyAsInt = LambdaMetafactory.metafactory(
        lookup,
        "applyAsInt",
        methodType(ToIntFunction.class),
        methodType(int.class, Object.class),
        coder,
        methodType(byte.class, String.class)
);
ToIntFunction<String> STRING_CODER 
    = (ToIntFunction<String>) applyAsInt.getTarget().invokeExact();

MethodHandle value = lookup.findSpecial(
        String.class,
        "value",
        methodType(byte[].class),
        String.class
);
CallSite apply = LambdaMetafactory.metafactory(
        lookup,
        "apply",
        methodType(Function.class),
        methodType(Object.class, Object.class),
        value,
        methodType(byte[].class, String.class)
);
Function<String, byte[]> STRING_VALUE 
     = (Function<String, byte[]>) apply.getTarget().invokeExact();

5.3 An Example of Direct Access

static Byte LATIN1 = 0;
static ToIntFunction<String> STRING_CODER = ...
static Function<String, byte[]> STRING_VALUE = ...

byte[] buf = ...;
int off;

void writeString(String str) {
    if (STRING_CODER != null && STRING_VALUE != null) {
        // improved for JDK 9 LATIN1
        int coder = STRING_CODER.applyAsInt(str);
        if (coder == LATIN1) {
            // equivalent to: str.getBytes(0, str.length(), buf, off);
            byte[] value = STRING_VALUE.apply(str);
            System.arraycopy(value, 0, buf, off, value.length);
            off += value.length; // advance the write position
            return;
        }
    }
    // normal logic
}

5.4 The String.getBytes Method

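String has a deprecated getBytes(int, int, byte[], int) method. When the string contains non-LATIN1 characters, the result is incorrect because only the low byte of each character is kept. However, when the coder is LATIN1, the method can be used to copy the value directly. The JDK 8 implementation is listed below, followed by an example that uses it: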

class String {
    @Deprecated
    public void getBytes(int srcBegin, int srcEnd, byte dst[], int dstBegin) {
        int j = dstBegin;
        int n = srcEnd;
        int i = srcBegin;
        char[] val = value;   /* avoid getfield opcode */

        while (i < n) {
            dst[j++] = (byte)val[i++];
        }
    }
}

static Byte LATIN1 = 0;
static ToIntFunction<String> STRING_CODER = ...

byte[] buf = ...;
int off;

void writeString(String str) {
    if (STRING_CODER != null) {
        // improved for JDK 9 LATIN1
        int coder = STRING_CODER.applyAsInt(str);
        if (coder == LATIN1) {
            str.getBytes(0, str.length(), buf, off);
            off += str.length(); // advance the write position
            return;
        }
    }
    // normal logic
}
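The behavior of the deprecated method can be observed directly, since it is a public API. In the sketch below, DeprecatedGetBytesSketch and lowBytes are hypothetical names; the example shows that a Latin-1 character survives the copy while a character outside Latin-1 is truncated to its low byte.

```java
final class DeprecatedGetBytesSketch {
    // Copies the low 8 bits of each char of s into a new byte array.
    @SuppressWarnings("deprecation")
    static byte[] lowBytes(String s) {
        byte[] dst = new byte[s.length()];
        s.getBytes(0, s.length(), dst, 0);
        return dst;
    }
}
```

For "caf\u00e9" the last byte is 0xE9, which is the correct Latin-1 encoding. For "\u4e2d" the single byte is 0x2D, which is why the coder must be checked before taking this path.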

Reference Implementation

The FASTJSON2 project uses these techniques. They are implemented in its JDKUtils and UnsafeUtils classes.

Note

These techniques are not recommended for beginners. You need to understand the underlying principles before using them.
