Serialization and deserialization are common techniques in daily data persistence and network transmission. However, the wide variety of serialization frameworks makes it hard to choose the right one for a given scenario. This article compares open-source serialization frameworks by universality, usability, scalability, performance, and support for Java data types and syntax.
The frameworks tested and compared below are JDK Serializable, FST, Kryo, Protobuf, Thrift, Hessian, and Avro.
JDK Serializable is the built-in serialization mechanism of Java. Users enable it by implementing `java.io.Serializable` or `java.io.Externalizable`; only classes that implement one of these interfaces can be serialized or deserialized. `ObjectOutputStream` and `ObjectInputStream` are then used to serialize and deserialize objects.
The following demo shows encoding and decoding using JDK Serializable:
```java
/**
 * Encoding
 */
public static byte[] encoder(Object ob) throws Exception {
    // Buffer that collects the serialized bytes
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    // Serialize the object
    ObjectOutputStream objectOutputStream = new ObjectOutputStream(byteArrayOutputStream);
    objectOutputStream.writeObject(ob);
    objectOutputStream.flush(); // Flush buffered data before reading the byte array
    byte[] result = byteArrayOutputStream.toByteArray();
    // Close the streams
    objectOutputStream.close();
    byteArrayOutputStream.close();
    return result;
}

/**
 * Decoding
 */
public static <T> T decoder(byte[] bytes) throws Exception {
    ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);
    ObjectInputStream objectInputStream = new ObjectInputStream(byteArrayInputStream);
    T object = (T) objectInputStream.readObject();
    objectInputStream.close();
    byteArrayInputStream.close();
    return object;
}
```
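A quick round trip with equivalent helpers can verify the encode/decode pair works; the class and variable names in this sketch are illustrative, not part of the original demo:

```java
import java.io.*;

public class JdkRoundTrip {
    // Same idea as the encoder above, using try-with-resources for cleanup
    static byte[] encoder(Object ob) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(ob);
        }
        return bos.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static <T> T decoder(byte[] bytes) throws Exception {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (T) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        String original = "hello";
        byte[] bytes = encoder(original);
        String restored = decoder(bytes);
        System.out.println(original.equals(restored)); // true
    }
}
```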
JDK Serializable is built into Java, so cross-language serialization and deserialization are not supported.
It can complete serialization without referencing any external dependencies. However, it is harder to use than the open-source frameworks: as the demo above shows, `ByteArrayOutputStream` and `ByteArrayInputStream` must be wired up manually to convert between objects and bytes.
JDK Serializable uses `serialVersionUID` to control the version of a serialized class. If the versions of the serialized and deserialized classes differ, a `java.io.InvalidClassException` is thrown, indicating that the `serialVersionUID` of the serialized class is inconsistent with that of the deserialized class:

```
java.io.InvalidClassException: com.yjz.serialization.java.UserInfo; local class incompatible: stream classdesc serialVersionUID = -5548195544707231683, local class serialVersionUID = -5194320341014913710
```

The exception above occurs because `serialVersionUID` is not defined explicitly, so JDK Serializable generates it automatically by hashing the class structure. When the class changes, the generated values no longer match.
Users can define `serialVersionUID` explicitly and keep it stable across serialization and deserialization to avoid this problem. By doing so, JDK Serializable can support field extension.

```java
private static final long serialVersionUID = 1L;
```
Although JDK Serializable is exclusive to Java, its performance is not good. The following test sample will also be used for all the other frameworks.
```java
public class MessageInfo implements Serializable {
    private String username;
    private String password;
    private int age;
    private HashMap<String, Object> params;
    ...

    public static MessageInfo buildMessage() {
        MessageInfo messageInfo = new MessageInfo();
        messageInfo.setUsername("abcdefg");
        messageInfo.setPassword("123456789");
        messageInfo.setAge(27);
        HashMap<String, Object> map = new HashMap<>();
        for (int i = 0; i < 20; i++) {
            map.put(String.valueOf(i), "a");
        }
        messageInfo.setParams(map); // Assign the map; the original snippet omitted this step
        return messageInfo;
    }
}
```
The byte size after serialization by JDK Serializable is 432. This number will be compared to the other serialization frameworks.
Now, perform serialization and deserialization on the test sample 10 million times and then calculate the total time consumption:
| Time Consumed for Serialization (ms) | Time Consumed for Deserialization (ms) |
| --- | --- |
| 38,952 | 96,508 |
The results will also be compared with those of other serialization frameworks.
JDK Serializable supports most Java data types and syntax.
One exception: WeakHashMap does not implement the Serializable interface.
Note: Consider serializing the following lambda:

```java
Runnable runnable = () -> System.out.println("Hello");
```

Direct serialization results in a `java.io.NotSerializableException` naming the synthetic lambda class:

```
com.yjz.serialization.SerializerFunctionTest$$Lambda$1/189568618
```

The lambda does not implement Serializable. Modify the code as follows to serialize the lambda expression:

```java
Runnable runnable = (Runnable & Serializable) () -> System.out.println("Hello");
```
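The intersection-cast trick can be verified with a self-contained round trip; this sketch uses a `Function` instead of `Runnable` so the result is observable, and all names are illustrative:

```java
import java.io.*;
import java.util.function.Function;

public class LambdaSerialization {
    public static void main(String[] args) throws Exception {
        // The intersection cast makes the synthetic lambda class implement Serializable
        Function<Integer, Integer> addOne =
                (Function<Integer, Integer> & Serializable) x -> x + 1;

        // Serialize the lambda
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(addOne);
        }

        // Deserialize and invoke it
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            @SuppressWarnings("unchecked")
            Function<Integer, Integer> restored = (Function<Integer, Integer>) ois.readObject();
            System.out.println(restored.apply(41)); // 42
        }
    }
}
```

Deserializing a lambda requires the capturing class to be present on the classpath, which is the case here since everything runs in one JVM.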
Fast-serialization (FST) is a Java serialization framework that is fully compatible with the JDK serialization protocol. Its serialization speed is up to ten times faster than JDK Serializable, while the byte size is only about 1/3 the size of the JDK Serializable result. The latest FST version is 2.56, and FST has supported Android since version 2.17.
The following demo shows how to use FST for serialization. One `FSTConfiguration` instance can be called by multiple threads. However, to prevent a performance bottleneck caused by contention, `ThreadLocal` is usually used to assign an `FSTConfiguration` to each thread.
```java
private final ThreadLocal<FSTConfiguration> conf = ThreadLocal.withInitial(() -> {
    FSTConfiguration conf = FSTConfiguration.createDefaultConfiguration();
    return conf;
});

public byte[] encoder(Object object) {
    return conf.get().asByteArray(object);
}

public <T> T decoder(byte[] bytes) {
    Object ob = conf.get().asObject(bytes);
    return (T) ob;
}
```
FST is also a serialization framework developed for Java, so it does not support cross-language serialization either.
In terms of usability, FST is much better than JDK Serializable. Its syntax is extremely simple because `FSTConfiguration` encapsulates most methods.
FST supports compatibility between new fields and old data streams through the `@Version` annotation. Every newly added field must be annotated with `@Version`; fields without the annotation implicitly have version 0.

```java
private String origiField;

@Version(1)
private String addField;
```
Note: The `@Version` annotation does not work together with custom `readObject` and `writeObject` methods. On the whole, FST is scalable, but using it this way is still somewhat complicated.
Use FST to serialize the same test sample used for JDK Serializable. The byte size is 172, almost 1/3 of the JDK Serializable result. The following table shows the time consumption of serialization and deserialization.

| Time Consumed for Serialization (ms) | Time Consumed for Deserialization (ms) |
| --- | --- |
| 13,587 | 19,031 |
FST can be optimized further by disabling circular references and pre-registering the serialized classes.
```java
private static final ThreadLocal<FSTConfiguration> conf = ThreadLocal.withInitial(() -> {
    FSTConfiguration conf = FSTConfiguration.createDefaultConfiguration();
    conf.registerClass(MessageInfo.class); // Pre-register the test sample class
    conf.setShareReferences(false);        // Disable circular-reference tracking
    return conf;
});
```
After the optimization above, the time consumption is listed below:

| Time Consumed for Serialization (ms) | Time Consumed for Deserialization (ms) |
| --- | --- |
| 7,609 | 17,792 |
The serialization time consumption has decreased by nearly half, but the byte size has increased to 191.
FST is developed based on JDK Serializable. Therefore, they support the same Java data types and syntax.
Kryo is a fast and effective Java binary serialization framework. It relies on the underlying ASM library to generate bytecode, so it runs quickly. Kryo aims to provide a serialization framework with fast serialization speed, small result size, and simple APIs. Kryo supports automatic deep copy and shallow copy and realizes deep copy in object → object mode instead of object → byte → object mode.
The following demo shows how to use Kryo for serialization:
```java
private static final ThreadLocal<Kryo> kryoLocal = ThreadLocal.withInitial(() -> {
    Kryo kryo = new Kryo();
    kryo.setRegistrationRequired(false); // No need to pre-register the class
    return kryo;
});

public static byte[] encoder(Object object) {
    Output output = new Output(4096, -1); // Initial 4 KB buffer, unbounded growth
    kryoLocal.get().writeClassAndObject(output, object); // Must pair with readClassAndObject
    output.flush();
    return output.toBytes();
}

public static <T> T decoder(byte[] bytes) {
    Input input = new Input(bytes);
    Object ob = kryoLocal.get().readClassAndObject(input);
    return (T) ob;
}
```
Note: The corresponding `Input.readXxx` function must match the `Output.writeXxx` function used. For example, `Output.writeClassAndObject()` must be paired with `Input.readClassAndObject()`.
On the official website, Kryo is described as a Java binary serialization framework. In addition, no cross-language practices of Kryo are found on the Internet. Although some articles have mentioned that the cross-language use of Kryo is very complicated, no related implementation in other languages is found.
In terms of usage, the APIs provided by Kryo are also very simple and easy to use. The Input and Output encapsulate almost all stream operations. Kryo provides rich and flexible configurations, such as serializer customization and default serializer setting, but they are difficult to use.
The default Kryo serializer, `FieldSerializer`, does not support field extension. To support field extension, another serializer, such as `TaggedFieldSerializer` or `CompatibleFieldSerializer`, must be set as the default.
For example:

```java
private static final ThreadLocal<Kryo> kryoLocal = ThreadLocal.withInitial(() -> {
    Kryo kryo = new Kryo();
    kryo.setRegistrationRequired(false);
    kryo.setDefaultSerializer(TaggedFieldSerializer.class);
    return kryo;
});
```
With Kryo, the byte size after serialization is 172, the same as FST before optimization. The time consumption is listed below:

| Time Consumed for Serialization (ms) | Time Consumed for Deserialization (ms) |
| --- | --- |
| 13,550 | 14,315 |
After disabling circular references and pre-registering the serialized classes, the byte size after serialization drops to 120 because the serialized class is identified by a number instead of its full class name. The time consumption is listed below:

| Time Consumed for Serialization (ms) | Time Consumed for Deserialization (ms) |
| --- | --- |
| 11,799 | 11,584 |
Kryo requires serialized classes to provide a no-arg constructor because Kryo uses it to create objects during deserialization.
Protocol Buffer (Protobuf) is a language-neutral, platform-independent, and scalable serialization framework. Unlike the frameworks above, Protobuf requires the schema to be defined in advance.
The following demo shows how to use Protobuf:
(1) Prepare the .proto description file:

```proto
syntax = "proto3";

option java_package = "com.yjz.serialization.protobuf3";

message MessageInfo
{
    string username = 1;
    string password = 2;
    int32 age = 3;
    map<string,string> params = 4;
}
```
(2) Generate the Java code:

```shell
protoc --java_out=./src/main/java message.proto
```
(3) The generated Java code already contains encoding and decoding methods:

```java
// Encoding: toByteArray() is an instance method on the generated class
byte[] bytes = messageInfo.toByteArray();
// Decoding
MessageInfo decoded = MessageInfo.parseFrom(bytes);
```
Protobuf is designed as a language-independent serialization framework. Currently, it supports Java, Python, C++, Go, and C# and provides third-party packages for many other languages. Therefore, in terms of universality, Protobuf is very powerful.
Protobuf uses interface definition language (IDL) to define the schema description file. After defining the description file, the protoc compiler can be used to generate serialization and deserialization code directly. Therefore, to use Protobuf, users simply need to prepare the description file.
Scalability is also one of the goals of Protobuf design. The .proto files can be modified easily.
Add fields: Make sure that each new field has a corresponding default value so new messages can interoperate with old code. A message generated by the new protocol can then still be parsed by the old protocol.
Delete fields: A deleted field's number or name must not be reused in subsequent updates. The `reserved` keyword can be used to enforce this.
```proto
message userinfo {
    reserved 3, 7;          // Reserve the field numbers of deleted fields
    reserved "age", "sex";  // Reserve the names of deleted fields
}
```
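To illustrate the add-field rule, a hypothetical second version of the earlier .proto file could add a field with a fresh tag number; the `email` field here is invented purely for this example:

```proto
// Hypothetical v2 of the message. Old parsers skip the unknown tag 5;
// new parsers see "" (the proto3 default) when reading data written
// by the old protocol, so both directions remain compatible.
message MessageInfo
{
    string username = 1;
    string password = 2;
    int32 age = 3;
    map<string,string> params = 4;
    string email = 5; // newly added field
}
```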
Protobuf also supports many scalar value types, such as int32, uint32, int64, uint64, and bool. These varint-encoded types are wire-compatible, so a field can be changed among them as needed.
Protobuf has made a lot of efforts in scalability so it can support protocol extensions.
Perform the same serialization operation on the test sample using Protobuf. The byte size after serialization is 192. The following table lists the corresponding time consumption:
Time Consumed for Serialization (ms) | Time Consumed for Deserialization (ms) |
14,235 | 30,694 |
The deserialization performance of Protobuf is worse than FST and Kryo.
Protobuf does not support defining Java methods because it uses IDL to define schemas. The following table displays Protobuf's support for data types:
Note: List, Set, and Queue collection classes are defined and tested through the "repeated" modifier in Protobuf. Any class that implements the Iterable interface can use the repeated list.
Thrift is an efficient remote procedure call (RPC) framework developed by Facebook that supports multiple languages. Facebook later open-sourced Thrift to Apache. As an RPC framework, Thrift is often used for serialization because it provides RPC services across many languages.
To perform serialization using Thrift, create the Thrift IDL file first and then compile it to generate Java code. Next, use `TSerializer` and `TDeserializer` to serialize and deserialize objects.
(1) Use IDL to define the .thrift file:

```thrift
namespace java com.yjz.serialization.thrift

struct MessageInfo{
    1: string username;
    2: string password;
    3: i32 age;
    4: map<string,string> params;
}
```
(2) Use the Thrift compiler to generate Java code:

```shell
thrift --gen java message.thrift
```
(3) Use `TSerializer` and `TDeserializer` for encoding and decoding:

```java
public static byte[] encoder(MessageInfo messageInfo) throws Exception {
    TSerializer serializer = new TSerializer();
    return serializer.serialize(messageInfo);
}

public static MessageInfo decoder(byte[] bytes) throws Exception {
    TDeserializer deserializer = new TDeserializer();
    MessageInfo messageInfo = new MessageInfo();
    deserializer.deserialize(messageInfo, bytes);
    return messageInfo;
}
```
Similar to Protobuf, Thrift also uses IDL to define the description file. This is an effective method to implement cross-language serialization/RPC. Thrift supports C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml, Delphi, and many other languages. So, Thrift is very universal.
Thrift is similar to Protobuf in terms of usability. Both require the same three steps: define the IDL description file, run the compiler to generate Java code, and call the serialization APIs. One difference is that the generated Protobuf classes contain built-in serialization and deserialization methods, while Thrift requires an explicit serializer (`TSerializer`/`TDeserializer`) to encode and decode objects.
Thrift supports field extension. Note the following issue when extending fields: do not reuse the integer tag of a deleted field, or deserialization may be affected.
For the test sample, the byte size after serialization using Thrift is 257. The corresponding time consumption is listed below:
| Time Consumed for Serialization (ms) | Time Consumed for Deserialization (ms) |
| --- | --- |
| 28,634 | 20,722 |
Thrift and Protobuf are close in total time consumption: Protobuf consumes less time in serialization, while Thrift performs better in deserialization.
Thrift uses IDL to define the serialization class. Thrift supports the following Java data types:
Thrift does not support defining Java methods.
Hessian is a lightweight RPC framework developed by Caucho. It uses the HTTP protocol to transmit data and supports binary serialization.
Hessian is often used as a serialization framework because it is cross-language and provides an efficient binary serialization protocol. The Hessian serialization protocol has two versions, Hessian 1.0 and Hessian 2.0. Hessian 2.0 optimizes the serialization process, and its performance is significantly better than Hessian 1.0.
It is very simple to serialize objects with Hessian: only `Hessian2Output` and `Hessian2Input` (or `HessianOutput` and `HessianInput` for Hessian 1.0) are needed. The following demo shows how to use Hessian 2.0 for serialization:
```java
public static <T> byte[] encoder2(T obj) throws Exception {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    Hessian2Output hessian2Output = new Hessian2Output(bos);
    hessian2Output.writeObject(obj);
    hessian2Output.flush(); // Flush the internal buffer before reading the bytes
    return bos.toByteArray();
}

public static <T> T decoder2(byte[] bytes) throws Exception {
    ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
    Hessian2Input hessian2Input = new Hessian2Input(bis);
    Object obj = hessian2Input.readObject();
    return (T) obj;
}
```
Like Protobuf and Thrift, Hessian supports RPC communication across languages. One of its main advantages over other cross-language RPC frameworks is that it does not use IDL to define data and services; instead, it defines services by self-description. Currently, Hessian supports Java, Flash/Flex, Python, C++, .Net/C#, D, Erlang, PHP, Ruby, and Objective-C.
Hessian does not need IDL to define data and services. It only needs to implement the Serializable interface for serialized data. Therefore, Hessian is easier to use compared to Protobuf and Thrift.
Although classes must implement the Serializable interface to be serialized by Hessian, Hessian is not affected by `serialVersionUID` and supports field extension easily.
The byte size after serialization is 277 using Hessian 1.0 and 178 using Hessian 2.0.
The time consumption of serialization and deserialization is listed below:
| | Time Consumed for Serialization (ms) | Time Consumed for Deserialization (ms) |
| --- | --- | --- |
| Hessian 1.0 | 57,648 | 55,261 |
| Hessian 2.0 | 38,823 | 17,682 |
The results show that Hessian 2.0 is much better than Hessian 1.0 in both byte size and time consumption.
As Hessian uses Java self-description to serialize classes, the native data types, collection classes, custom classes, and enumeration types of Java are mostly supported (SynchronousQueue is not supported.) Java syntax is also supported.
Avro is a data serialization framework and a sub-project of Apache Hadoop, developed by Doug Cutting while he was in charge of Hadoop. Avro is designed to support data-intensive applications and is suitable for large-scale data exchange and storage, both remote and local.
Use Avro to serialize objects in the following three steps:
(1) Define the avsc file:
{
"namespace": "com.yjz.serialization.avro",
"type": "record",
"name": "MessageInfo",
"fields": [
{"name": "username","type": "string"},
{"name": "password","type": "string"},
{"name": "age","type": "int"},
{"name": "params","type": {"type": "map","values": "string"}
}
]
}
(2) Use avro-tools.jar or Maven to compile the schema and generate Java code:

```shell
java -jar avro-tools-1.8.2.jar compile schema src/main/resources/avro/Message.avsc ./src/main/java
```
(3) Use `BinaryEncoder` and `BinaryDecoder` for encoding and decoding:

```java
public static byte[] encoder(MessageInfo obj) throws Exception {
    DatumWriter<MessageInfo> datumWriter = new SpecificDatumWriter<>(MessageInfo.class);
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    BinaryEncoder binaryEncoder = EncoderFactory.get().directBinaryEncoder(outputStream, null);
    datumWriter.write(obj, binaryEncoder);
    return outputStream.toByteArray();
}

public static MessageInfo decoder(byte[] bytes) throws Exception {
    DatumReader<MessageInfo> datumReader = new SpecificDatumReader<>(MessageInfo.class);
    BinaryDecoder binaryDecoder = DecoderFactory.get().directBinaryDecoder(new ByteArrayInputStream(bytes), null);
    return datumReader.read(new MessageInfo(), binaryDecoder);
}
```
Avro defines the data structure through schemas. Currently, Avro supports Java, C, C++, C#, Python, PHP, and Ruby, so Avro is universal among these languages.
Avro does not need to generate code for dynamic languages. However, for static languages, such as Java, avro-tools.jar is still necessary to compile and generate Java code. It is more complicated to write a Schema in Avro than in Thrift and Protobuf.
The byte size after serialization using Avro is 111. The following table lists the time consumption:
| | Time Consumed for Serialization (ms) | Time Consumed for Deserialization (ms) |
| --- | --- | --- |
| Generated Java code | 26,565 | 45,383 |
Avro requires schemas to be written with its supported data types. Avro supports the basic types (null, boolean, int, long, float, double, bytes, and string) and complex types (record, enum, array, map, union, and fixed).
Avro generates code automatically or by using schemas. Java methods cannot be defined in serialized classes.
The following table compares the universality of different serialization frameworks. Protobuf is the most universal, supporting the largest number of programming languages.
The following table compares the API usability of different serialization frameworks. All of them provide easy-to-use APIs except JDK Serializable.
The following table compares the scalability of the serialization frameworks. Protobuf's scalability is the most convenient and natural; the other frameworks require extra configuration or annotations to achieve it.
The following figure compares the byte size in different serialization frameworks after serialization. The serialization results of the Kryo pre-registering feature (pre-register the serialized class) and Avro are both very good. So, if the byte size after serialization is restricted, choose Kryo or Avro.
The following figure shows the serialization and deserialization time consumption. The Kryo and FST pre-registering features both perform excellently: FST has the shortest serialization time, while Kryo's serialization and deserialization times are almost the same. Therefore, if time consumption is a key metric, choose Kryo or FST.
Java Data types supported by the serialization frameworks:
Note: The collection class tests cover most corresponding implementation classes.
The following table lists the data types and syntax supported by serialization frameworks.
Protobuf and Thrift use IDL to define class files and then use compilers to generate Java code. IDL does not provide syntax to define the static internal classes or non-static internal classes. Therefore, these functions cannot be tested.