1G Memory Usage Optimization with One Line of Code

The "one line of code" in the title is a call to String.intern(). Getting that line invoked in the right places took several dozen additional lines of code.

By Zhiqi

Background

We have a project that uses a full in-memory caching mechanism. Its main goal is excellent RT (response time), and the data volume is small enough for a standard 4C8G container (4 cores, 8 GB of memory) to handle easily. One day, however, the staging environment started firing frequent Full GC alerts, which we traced back to the cache having grown too large.

Body Text

We usually load about 100 configuration items into memory. Recently, a new requirement expanded the configuration data to 100,000 items, which significantly increased memory usage. Analysis showed that the information entropy of this data is low: most of the JSON stores strings drawn from a limited set of permutations and combinations, yet the deserialization framework loads every occurrence into the heap as a new String.
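To make the duplication concrete, here is a small simulation. The five-value vocabulary and the loadValues helper are hypothetical stand-ins for the real configuration data and the Fastjson deserializer; the sketch only assumes that the deserializer allocates a fresh String for every value it reads. Counting objects by identity (not by content) shows how many copies end up on the heap with and without interning.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class DuplicateStringDemo {

    // Hypothetical small vocabulary standing in for low-entropy configuration values
    static final String[] VOCABULARY = {"ENABLED", "DISABLED", "GRAY", "ONLINE", "OFFLINE"};

    // Simulates a deserializer that allocates a new String for every value it reads,
    // optionally replacing it with the pooled instance via intern()
    static List<String> loadValues(int count, boolean intern) {
        List<String> values = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            String value = new String(VOCABULARY[i % VOCABULARY.length]);
            values.add(intern ? value.intern() : value);
        }
        return values;
    }

    // Counts distinct String objects by reference identity, not by equals()
    static int distinctInstances(List<String> values) {
        Set<String> identities = Collections.newSetFromMap(new IdentityHashMap<>());
        identities.addAll(values);
        return identities.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctInstances(loadValues(100_000, false))); // 100000
        System.out.println(distinctInstances(loadValues(100_000, true)));  // 5
    }
}
```

Without interning, 100,000 equal-but-separate objects stay reachable from the cache; with interning, every value collapses onto one of five shared instances.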

Inspired by the JVM's string constant pool, I decided to use it to solve the problem without changing the business logic or design.

The Fastjson serialization library we use does not intern field values. This makes sense: a value can be almost anything, and putting every incoming string into the pool would hurt the system. In our specific business scenario, however, certain values come from a limited set and do not need to be reclaimed by Young GC. We therefore need to turn these specific values into pooled constants by explicitly calling String.intern().

So everything starts with String.intern(); the question is where to call it.

Fastjson deserializes each field with a matching ObjectDeserializer, and the @JSONField(deserializeUsing = xxx.class) annotation lets us plug in our own. We therefore wrote a custom deserializer that calls intern():

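import com.alibaba.fastjson.JSONException;
import com.alibaba.fastjson.parser.DefaultJSONParser;
import com.alibaba.fastjson.parser.JSONToken;
import com.alibaba.fastjson.parser.deserializer.ObjectDeserializer;

import java.lang.reflect.Type;

// Deserializes a String field and interns the result so that equal values share one instance
public class StringPoolDeserializer implements ObjectDeserializer {

    @SuppressWarnings("unchecked")
    @Override
    public <T> T deserialze(DefaultJSONParser parser, Type type, Object fieldName) {

        if (!String.class.equals(type)) {
            throw new JSONException("StringPoolDeserializer can only deserialize String");
        }

        Object value = parser.parse(fieldName);
        // A JSON null stays null; calling intern() on it would throw a NullPointerException
        if (value == null) {
            return null;
        }
        // Non-string literals (e.g. numbers) are converted to their text form before interning
        return (T) value.toString().intern();
    }

    @Override
    public int getFastMatchToken() {
        // This deserializer expects a string literal token
        return JSONToken.LITERAL_STRING;
    }
}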

After this optimization, about 800 MB of heap memory was freed, and metaspace usage barely grew. After all, our data is highly repetitive and low in entropy, so the pool only holds a relatively small number of distinct strings.

However, the remaining footprint was still larger than expected. We later found that this approach cannot handle the values inside fields of type Map<String, String>.

Let's take a closer look at how Maps are processed. Fastjson deserializes Map fields with its internal MapDeserializer. However, this deserializer is relatively complex, and the methods at the core of its mechanism are declared final, so inheriting from it and overriding or replacing those methods is not a practical way to solve the problem. Eventually, we found an entry point elsewhere in the code.

Along the Map path, we can control the Map type that gets instantiated and override its put method, which gives us a suitable call site for String.intern().

Note:

Besides put, other operations such as putAll and constructing a Map from another Map can also add entries. These operations do not reuse put; instead, they share HashMap's putVal method, which is final. Strictly speaking, because putVal cannot be overridden, those other methods would need overrides as well. However, MapDeserializer only calls put, and the other methods are more complicated to reimplement, so only put is overridden here.
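The caveat about putVal can be checked directly. The sketch below uses a hypothetical CountingMap that simply counts how often its overridden put runs. On OpenJDK 8 and later, HashMap.putAll (via putMapEntries) and HashMap.putIfAbsent call putVal directly, so neither reaches the override. This is an implementation detail of OpenJDK's HashMap, not something the Map contract guarantees.

```java
import java.util.HashMap;
import java.util.Map;

public class PutBypassDemo {

    // Hypothetical HashMap subclass that counts invocations of its overridden put
    static class CountingMap extends HashMap<String, String> {
        int putCalls = 0;

        @Override
        public String put(String key, String value) {
            putCalls++;
            return super.put(key, value);
        }
    }

    // Adds four entries through three different methods
    static CountingMap fill() {
        Map<String, String> source = new HashMap<>();
        source.put("a", "1");
        source.put("b", "2");

        CountingMap target = new CountingMap();
        target.putAll(source);        // OpenJDK 8+: putMapEntries -> putVal, bypasses put
        target.putIfAbsent("c", "3"); // OpenJDK 8+: calls putVal directly, bypasses put
        target.put("d", "4");         // the only call that reaches the override
        return target;
    }

    public static void main(String[] args) {
        CountingMap map = fill();
        System.out.println(map.size() + " entries, " + map.putCalls + " put call(s)"); // 4 entries, 1 put call(s)
    }
}
```

Any String.intern() logic placed only in put would therefore miss entries added through putAll or putIfAbsent, which is acceptable here only because MapDeserializer goes through put.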

My solution is to override put directly, which is simple and easy, and then change the declared type of the JavaBean member from HashMap to StringPoolMap:

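import java.util.HashMap;

// A HashMap whose put interns keys and values so that equal strings share one instance.
// Only put is overridden, because Fastjson's MapDeserializer adds entries through put.
public class StringPoolMap extends HashMap<String, String> {

    private static final long serialVersionUID = 1L;

    @Override
    public String put(String key, String value) {

        // Replace the key with its pooled instance (HashMap permits a null key)
        if (key != null) {
            key = key.intern();
        }

        // Replace the value with its pooled instance
        if (value != null) {
            value = value.intern();
        }

        return super.put(key, value);
    }
}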
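A usage sketch of StringPoolMap as a JavaBean member follows. The ConfigItem bean, its field names, and the load helper are hypothetical; load simulates a deserializer that creates a fresh String for every key and value. The StringPoolMap class is repeated in compact form so the example compiles on its own.

```java
import java.util.HashMap;

public class StringPoolMapDemo {

    // Same behavior as StringPoolMap above, repeated so this example is self-contained
    static class StringPoolMap extends HashMap<String, String> {
        @Override
        public String put(String key, String value) {
            return super.put(key == null ? null : key.intern(),
                             value == null ? null : value.intern());
        }
    }

    // Hypothetical JavaBean whose member is declared as StringPoolMap instead of HashMap
    static class ConfigItem {
        StringPoolMap attributes = new StringPoolMap();
    }

    // Simulates a deserializer that allocates new Strings for every key and value
    static ConfigItem load(String region, String status) {
        ConfigItem item = new ConfigItem();
        item.attributes.put(new String("region"), new String(region));
        item.attributes.put(new String("status"), new String(status));
        return item;
    }

    public static void main(String[] args) {
        ConfigItem first = load("cn-hangzhou", "ONLINE");
        ConfigItem second = load("cn-hangzhou", "ONLINE");
        // Equal values loaded separately now refer to one shared instance
        System.out.println(first.attributes.get("status") == second.attributes.get("status")); // true
    }
}
```

Because the bean only changes its declared field type, the business code that reads the map is unaffected.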

At this point the optimization was complete: memory usage dropped further from 800 MB to 619 MB, saving about 1 GB compared with the original 1.6 GB+.
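Taking the reported figures at face value and assuming 1 GB = 1024 MB, the two steps add up as follows (the original footprint was at least 1.6 GB, so the savings are lower bounds):

$$
\begin{aligned}
1.6\,\text{GB} &\approx 1.6 \times 1024\,\text{MB} \approx 1638\,\text{MB} \\
\text{freed by interning String fields: } 1638 - 800 &\approx 838\,\text{MB} \\
\text{freed in total after StringPoolMap: } 1638 - 619 &\approx 1019\,\text{MB} \approx 1\,\text{GB}
\end{aligned}
$$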

Summary

The essence of this problem is not the String.intern() call itself, but the fact that low-entropy data was not compressed well. The next iteration will therefore revisit the problem starting from the design of the data structure.

More about the Implementation of String.intern()

I've been reading the JDK (OpenJDK) source code recently, so I'd like to dig a little deeper into how String.intern() works.

String.intern() is a native method with the following semantics:

Try to put this string into a pool. If a string with the same content already exists there, return a reference to the existing string.
If no such string exists, add this string to the pool and return a reference to it.
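Both branches are part of the documented contract of String.intern(), so they can be observed from plain Java. In the sketch below, the "config-value" literal and the method names are illustrative only. A string built at runtime with a unique suffix is assumed not to be in the pool yet, so interning it returns the very same object; a copy of an existing literal returns the pooled literal instead.

```java
public class InternSemantics {

    // Branch 2: the content is not pooled yet, so "this" is added and returned
    static boolean firstInternReturnsSelf() {
        // Built at runtime with a unique suffix, so no equal string should be pooled yet
        String fresh = new StringBuilder("intern-demo-").append(System.nanoTime()).toString();
        return fresh.intern() == fresh;
    }

    // Branch 1: an equal string is already pooled, so the existing reference is returned
    static boolean laterInternReturnsExisting() {
        String pooled = "config-value";           // string literals are interned automatically
        String copy = new String("config-value"); // a distinct heap object with the same content
        return copy != pooled && copy.intern() == pooled;
    }

    public static void main(String[] args) {
        System.out.println(firstInternReturnsSelf());     // true
        System.out.println(laterInternReturnsExisting()); // true
    }
}
```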

The corresponding native source code is as follows:

Source code corresponding to String.intern()

#include "jvm.h"
#include "java_lang_String.h"

JNIEXPORT jobject JNICALL
Java_java_lang_String_intern(JNIEnv *env, jobject this)
{
    return JVM_InternString(env, this);
}

It simply calls JVM_InternString, passing in the "this" object.

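JVM_InternString

JVM_InternString (in HotSpot's jvm.cpp) resolves the JNI handle to the underlying string object and passes it to StringTable::intern, which extracts the characters and ends up in the overload shown below:

JVM_ENTRY(jstring, JVM_InternString(JNIEnv *env, jstring str))
  JvmtiVMObjectAllocEventCollector oam;
  if (str == nullptr) return nullptr;
  oop string = JNIHandles::resolve_non_null(str);
  oop result = StringTable::intern(string, CHECK_NULL);
  return (jstring) JNIHandles::make_local(THREAD, result);
JVM_END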

StringTable

oop StringTable::intern(Handle string_or_null_h, const jchar* name, int len, TRAPS) {
  
  unsigned int hash = java_lang_String::hash_code(name, len);

  // Check the shared table and the local table for the string
  // Return quickly if found
  
  oop found_string = lookup_shared(name, len, hash);
  if (found_string != nullptr) {
    return found_string;
  }
  if (_alt_hash) {
    hash = hash_string(name, len, true);
  }
  found_string = do_lookup(name, len, hash);
  if (found_string != nullptr) {
    return found_string;
  }

  // If not, create one and insert it
  return do_intern(string_or_null_h, name, len, hash, THREAD);
}
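The lookup order in StringTable::intern (shared table first, then the local table, then insert) can be mirrored at user level. The sketch below is a hypothetical Java analogue, not HotSpot code: the read-only map stands in for the shared table (in HotSpot, presumably the strings archived with CDS), and a ConcurrentHashMap stands in for the local table. Unlike HotSpot, which performs do_lookup and do_intern as separate steps, this version uses putIfAbsent so the lookup and insert happen atomically.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical user-level analogue of the lookup order in StringTable::intern
public class TwoLevelStringPool {

    // Read-only table filled once up front; plays the role of the shared table
    private final Map<String, String> shared;
    // Growable table for everything else; plays the role of the local table
    private final ConcurrentHashMap<String, String> local = new ConcurrentHashMap<>();

    public TwoLevelStringPool(Set<String> preloaded) {
        Map<String, String> table = new HashMap<>();
        for (String s : preloaded) {
            table.put(s, s);
        }
        this.shared = Collections.unmodifiableMap(table);
    }

    public String intern(String s) {
        Objects.requireNonNull(s, "ConcurrentHashMap does not accept null keys");
        // Check the shared table first and return quickly if found
        String found = shared.get(s);
        if (found != null) {
            return found;
        }
        // Otherwise look up the local table and insert s itself if it is absent
        String existing = local.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public static void main(String[] args) {
        TwoLevelStringPool pool = new TwoLevelStringPool(Collections.singleton("ONLINE"));
        String a = pool.intern(new String("GRAY"));
        String b = pool.intern(new String("GRAY"));
        System.out.println(a == b); // true
    }
}
```

One design difference worth noting: HotSpot's StringTable holds its entries weakly, so interned strings that are no longer referenced can be garbage collected, whereas this sketch keeps strong references for as long as the pool itself is alive.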

Disclaimer: The views expressed herein are for reference only and don't necessarily represent the official views of Alibaba Cloud.
