Serialization is the process of dumping an object to a temporary buffer or a permanent file and later recovering the object from that buffer or file. It serves two purposes. First, it lets different applications share and transfer data, which decouples them across applications, languages, and platforms. Second, when an application misbehaves or crashes at a customer site, it lets the application immediately save the contents of its data structures to a file; once that file is sent back to the developers, the values can be recovered to help analyze and locate the cause.
Generic programming implements one piece of functionality abstractly over many data types (the STL source code is a typical example). The compiler deduces the concrete types and generates the implementation code at compile time. Where a particular type has special properties or optimization needs, features such as full specialization, partial specialization, and template metaprogramming allow a type-specific implementation.
#include <iostream>

int main(int argc, char* argv[])
{
    std::cout << "Hello World!" << std::endl;
    return 0;
}
Generic programming is all around us. Many of the functions and classes in the std namespace that we use every day, including the STL, are implemented with it. For example, in the code above, std::cout is an object of type std::ostream, and std::ostream is an alias for a specialization of the class template std::basic_ostream:
namespace std
{
    typedef basic_ostream<char> ostream;
}
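The relationship between the object, the alias, and the class template can be checked at compile time. The following is a minimal sketch; the helper name cout_is_basic_ostream_char is hypothetical and exists only to expose the check as a value.

```cpp
#include <cassert>
#include <iostream>
#include <type_traits>

// std::ostream is an alias for the char instantiation of the class template.
static_assert(std::is_same<std::ostream, std::basic_ostream<char>>::value,
              "std::ostream should be std::basic_ostream<char>");

// std::cout is an object, and its declared type is std::ostream.
// (Hypothetical helper, not part of any library.)
inline bool cout_is_basic_ostream_char()
{
    return std::is_same<decltype(std::cout), std::basic_ostream<char>>::value;
}
```

Calling cout_is_basic_ostream_char() returns true, which confirms that std::cout is an instance of the specialization rather than a template itself.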
In addition to std::cout and std::basic_ostream mentioned above, C++ provides other input and output class templates, such as std::basic_istream, std::basic_ifstream, std::basic_ofstream, std::basic_istringstream, and std::basic_ostringstream. These mainly implement input and output for built-in types, so a string such as Hello World can be written directly. For a custom type, however, the operators >> and << must be overloaded, as for the custom class below.
class MyClip
{
    bool mValid;
    int mIn;
    int mOut;
    std::string mFilePath;
};
Writing such an object to std::cout as follows produces a series of compilation errors.
MyClip clip;
std::cout << clip;
The errors essentially say that clip does not support the << operator, and that clip cannot be converted to any of the built-in types that cout supports, such as void* and int.
To fix the compilation errors, the class MyClip must support the input and output operators >> and <<. An implementation looks like this:
inline std::istream& operator>>(std::istream& st, MyClip& clip)
{
    st >> clip.mValid;
    st >> clip.mIn >> clip.mOut;
    st >> clip.mFilePath;
    return st;
}

inline std::ostream& operator<<(std::ostream& st, MyClip const& clip)
{
    st << clip.mValid << ' ';
    st << clip.mIn << ' ' << clip.mOut << ' ';
    st << clip.mFilePath << ' ';
    return st;
}
To let these operators access the private member variables of the class, we also declare the serialization and deserialization functions as friends inside the custom type. (Recall why friend functions are needed here instead of member operator overloads: the left operand of >> and << is the stream, not the object.) For example:
friend std::istream& operator>>(std::istream& st, MyClip& clip);
friend std::ostream& operator<<(std::ostream& st, MyClip const& clip);
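Putting the pieces together, the following is a minimal self-contained sketch of the class with its two friend operators and a round trip through a string stream. The constructors, the equals member, and the round_trip helper are hypothetical additions made only so the result can be checked; they are not part of the original class.

```cpp
#include <cassert>
#include <sstream>
#include <string>

class MyClip
{
public:
    MyClip() : mValid(false), mIn(0), mOut(0) {}
    MyClip(bool valid, int in, int out, std::string const& path)
        : mValid(valid), mIn(in), mOut(out), mFilePath(path) {}

    // Hypothetical comparison helper, added only to verify the round trip.
    bool equals(MyClip const& other) const
    {
        return mValid == other.mValid && mIn == other.mIn &&
               mOut == other.mOut && mFilePath == other.mFilePath;
    }

private:
    // The friend declarations give both operators access to private members.
    friend std::istream& operator>>(std::istream& st, MyClip& clip);
    friend std::ostream& operator<<(std::ostream& st, MyClip const& clip);

    bool mValid;
    int mIn;
    int mOut;
    std::string mFilePath;
};

inline std::istream& operator>>(std::istream& st, MyClip& clip)
{
    st >> clip.mValid;
    st >> clip.mIn >> clip.mOut;
    st >> clip.mFilePath;
    return st;
}

inline std::ostream& operator<<(std::ostream& st, MyClip const& clip)
{
    st << clip.mValid << ' ';
    st << clip.mIn << ' ' << clip.mOut << ' ';
    st << clip.mFilePath << ' ';
    return st;
}

// Hypothetical helper: dump a clip to a string stream and recover a copy from it.
inline MyClip round_trip(MyClip const& clip)
{
    std::ostringstream out;
    out << clip;
    std::istringstream in(out.str());
    MyClip restored;
    in >> restored;
    return restored;
}
```

Note that this whitespace-separated text format only round-trips file paths that contain no spaces, because operator>> for std::string stops at the first whitespace character.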
This way of implementing serialization is intuitive and easy to understand. Its drawback shows in large projects, where the number of custom types may reach tens of thousands or more: every type needs two functions, one to serialize (dump) the data and one to deserialize (recover) it. This increases the amount of code to write, and both functions must be updated whenever the member variables of a class change later.
More complex custom types must also be considered, such as types that use inheritance or that have member variables of other custom types.
class MyVideo : public MyClip
{
    std::list<MyFilter> mFilters;
};
As the code above shows, things become more complicated when dumping and recovering an object of the class MyVideo, because the base class part must be dumped and recovered as well. In addition, the member variable combines the STL container template list with the custom class MyFilter, so dumping and recovering must be defined for that combination too.
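To make the extra work concrete, here is a hand-written sketch using the standard stream operators. It is a simplification under stated assumptions: the types are plain structs with public members, the single field of MyFilter (mStrength) is hypothetical because the original does not show that class, and round_trip_video is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <sstream>
#include <string>

// Simplified stand-ins with public members; MyFilter's field is hypothetical.
struct MyFilter { int mStrength = 0; };
struct MyClip { bool mValid = false; int mIn = 0; int mOut = 0; std::string mFilePath; };
struct MyVideo : public MyClip { std::list<MyFilter> mFilters; };

inline std::ostream& operator<<(std::ostream& st, MyFilter const& f)
{
    return st << f.mStrength << ' ';
}
inline std::istream& operator>>(std::istream& st, MyFilter& f)
{
    return st >> f.mStrength;
}

inline std::ostream& operator<<(std::ostream& st, MyClip const& c)
{
    return st << c.mValid << ' ' << c.mIn << ' ' << c.mOut << ' ' << c.mFilePath << ' ';
}
inline std::istream& operator>>(std::istream& st, MyClip& c)
{
    return st >> c.mValid >> c.mIn >> c.mOut >> c.mFilePath;
}

inline std::ostream& operator<<(std::ostream& st, MyVideo const& v)
{
    st << static_cast<MyClip const&>(v);   // dump the base class part first
    st << v.mFilters.size() << ' ';        // element count, needed when loading
    for (MyFilter const& f : v.mFilters)
        st << f;
    return st;
}
inline std::istream& operator>>(std::istream& st, MyVideo& v)
{
    st >> static_cast<MyClip&>(v);         // recover the base class part first
    std::size_t count = 0;
    st >> count;
    v.mFilters.clear();
    for (std::size_t i = 0; i < count && st; ++i)
    {
        MyFilter f;
        st >> f;
        v.mFilters.push_back(f);
    }
    return st;
}

// Hypothetical helper: dump a video to a string stream and recover a copy.
inline MyVideo round_trip_video(MyVideo const& v)
{
    std::ostringstream out;
    out << v;
    std::istringstream in(out.str());
    MyVideo restored;
    in >> restored;
    return restored;
}
```

Even in this small case, three pairs of operators are needed, and the container requires its own size-prefixed encoding, which illustrates how quickly the hand-written approach grows.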
Given these problems, is there a way to reduce the amount of code that must be written and modified while keeping it easy to understand and maintain?
For the problems encountered with the C++ standard input and output approach, Boost provides a good solution: the dumping and recovering operations of a type are abstracted into one function, which is easy to understand. For the type above, the two friend functions are simply replaced with the following single friend function.
template<typename Archive> friend void serialize(Archive&, MyClip&, unsigned int const);
The implementation of the friend functions is similar to the following:
template <typename A>
void serialize(A& ar, MyClip& clip, unsigned int const ver)
{
    ar & BOOST_SERIALIZATION_NVP(clip.mValid);
    ar & BOOST_SERIALIZATION_NVP(clip.mIn);
    ar & BOOST_SERIALIZATION_NVP(clip.mOut);
    ar & BOOST_SERIALIZATION_NVP(clip.mFilePath);
}
BOOST_SERIALIZATION_NVP is a macro defined by Boost whose main job is to wrap a variable together with its name as a name-value pair.
Dumping and recovering are then invoked directly through the operators << and >>. For example:
// store
MyClip clip;
...
std::ostringstream ostr;
boost::archive::text_oarchive oa(ostr);
oa << clip;
// load
std::istringstream istr(ostr.str());
boost::archive::text_iarchive ia(istr);
ia >> clip;
Here std::istringstream and std::ostringstream are used to recover data from the string stream and to dump the data of class objects into the string stream, respectively.
The classes MyFilter and MyVideo are handled the same way: each one adds an implementation of the template friend function "serialize". As for the std::list class template, Boost already provides the implementation.
At this point, for each class we define, all we need to do is declare one template friend function inside the class and implement it outside the class. When member variables are later added, deleted, or renamed, only that one function has to be modified.
The Boost serialization library works well, but the story is not over.
While developing on the terminal, we found several challenges in referencing the Boost serialization library.
To solve these problems encountered in using Boost, we think it is necessary to re-implement the serialization library to remove the dependence on Boost and meet the following requirements at the same time:
To stay compatible with the existing Boost-based code and preserve developers' current habits, while minimizing the refactoring workload, we keep the template function "serialize". Inside the template function, the following definition is used so that member variables are passed through directly without being repackaged, which improves efficiency.
#define BOOST_SERIALIZATION_NVP(value) value
Calls to the dumping and recovering interface stay the same as before, except that the input and output classes are changed to the following:
alivc::text_oarchive oa(ostr);
alivc::text_iarchive ia(istr);
So far, the external interface of the serialization library has been completed, and the rest is the internal work. How should the internal framework of the serialization library be redesigned and implemented to meet the requirements?
First, let's look at the process flow chart of the current design architecture.
For example, the dumping class text_oarchive must support at least the following interfaces:
explicit text_oarchive(std::ostream& ost, unsigned int version = 0);
template <typename T> text_oarchive& operator<<(T& v);
template <typename T> text_oarchive& operator&(T& v);
When developers call the operator <<, it first calls back into the template function "serialize" of the corresponding type.
template <typename T>
text_oarchive& operator<<(T& v)
{
    serialize(*this, v, mversion);
    return *this;
}
Inside "serialize", each member of the type is then processed with the operator &, and a decision is needed for each one. If the member variable is of a built-in type, it is serialized directly. If it is of a custom type, the call goes back into the template function "serialize" of that type.
template <typename T>
text_oarchive& operator&(T& v)
{
    basic_save<T>::invoke(*this, v, mversion);
    return *this;
}
In the code above, basic_save::invoke is resolved through template type deduction at compile time. It either dumps a built-in type directly or calls back into the "serialize" function of the member variable's type, which repeats the process described above.
Because the number of built-in types is limited, we make the callback into the "serialize" function of the corresponding type the default behavior of the class template basic_save.
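#include <type_traits> // std::is_enum

// Default behavior: call back into "serialize" for the type T.
template <typename T, bool E = false>
struct basic_load_save
{
    template <typename A>
    static void invoke(A& ar, T& v, unsigned int version)
    {
        serialize(ar, v, version);
    }
};

// E is true only when T is an enumerated type.
template <typename T>
struct basic_save : public basic_load_save<T, std::is_enum<T>::value>
{
};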
In the code above, the template has an additional parameter E, which exists to give enumerated types special handling. The implementation using partial specialization is as follows:
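template <typename T>
struct basic_load_save<T, true>
{
    template <typename A>
    static void invoke(A& ar, T& v, unsigned int version)
    {
        // Go through int so the same code works for dumping and recovering.
        int tmp = static_cast<int>(v);
        ar & tmp;
        v = static_cast<T>(tmp);
    }
};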
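The selection mechanism itself can be shown in isolation. The sketch below uses hypothetical names (path_kind, path_selector, Color, Widget) and reduces each path to a number, so that it only demonstrates how the bool parameter picks the primary template or the partial specialization.

```cpp
#include <cassert>
#include <type_traits>

// Primary template: chosen when E is false, that is, T is not an enum.
template <typename T, bool E = false>
struct path_kind
{
    static int value() { return 0; }   // 0 stands for the "serialize" callback path
};

// Partial specialization: chosen when E is true, that is, T is an enum.
template <typename T>
struct path_kind<T, true>
{
    static int value() { return 1; }   // 1 stands for the enum-as-int path
};

// std::is_enum<T>::value supplies the bool argument at compile time.
template <typename T>
struct path_selector : public path_kind<T, std::is_enum<T>::value>
{
};

enum Color { Red, Green };   // hypothetical enumerated type
struct Widget {};            // hypothetical class type
```

With these definitions, path_selector<Color>::value() yields 1 and path_selector<Widget>::value() yields 0, and the choice costs nothing at run time because it is made during template instantiation.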
At this point, we have completed the default behavior of the overloaded operator &, which keeps recursing into the template function "serialize" of each member variable's type. For a built-in type such as int, however, the recursion must stop.
template <typename T>
struct basic_pod_save
{
    template <typename A>
    static void invoke(A& ar, T const& v, unsigned int)
    {
        // Write the value itself; no further callback into "serialize".
        ar.save(v);
    }
};

// Full specialization: int is dumped directly.
template <>
struct basic_save<int> : public basic_pod_save<int>
{
};
For the int type, the integer value is dumped directly into the output stream. To do that, a final dump function needs to be added to text_oarchive.
template <typename T>
void save(T const& v)
{
    most << v << ' ';
}
In the save member function, the value of the specific member variable is finally written to the stream. Other built-in types are treated the same way, and their handling can be implemented by referring to the source code of std::basic_ostream.
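The dumping pieces described so far can be assembled into one small program. This is a sketch that follows the fragments above, not the actual library: everything is placed in a hypothetical namespace sketch, the specializations for bool and std::string are assumed by analogy with int, and the Codec and Clip types and the dump helper are invented for the demonstration.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>

namespace sketch {

// Default path: call back into serialize() for the type (found via ADL).
template <typename T, bool E = false>
struct basic_load_save
{
    template <typename A>
    static void invoke(A& ar, T& v, unsigned int version) { serialize(ar, v, version); }
};

// Enum path: go through int.
template <typename T>
struct basic_load_save<T, true>
{
    template <typename A>
    static void invoke(A& ar, T& v, unsigned int)
    {
        int tmp = static_cast<int>(v);
        ar & tmp;
        v = static_cast<T>(tmp);
    }
};

template <typename T>
struct basic_save : public basic_load_save<T, std::is_enum<T>::value> {};

// Built-in path: stop recursing and write the value.
template <typename T>
struct basic_pod_save
{
    template <typename A>
    static void invoke(A& ar, T const& v, unsigned int) { ar.save(v); }
};

template <> struct basic_save<int> : public basic_pod_save<int> {};
template <> struct basic_save<bool> : public basic_pod_save<bool> {};                 // assumed
template <> struct basic_save<std::string> : public basic_pod_save<std::string> {};   // assumed

class text_oarchive
{
public:
    explicit text_oarchive(std::ostream& ost, unsigned int version = 0)
        : most(ost), mversion(version) {}

    template <typename T> text_oarchive& operator<<(T& v)
    {
        serialize(*this, v, mversion);
        return *this;
    }
    template <typename T> text_oarchive& operator&(T& v)
    {
        basic_save<T>::invoke(*this, v, mversion);
        return *this;
    }
    template <typename T> void save(T const& v) { most << v << ' '; }

private:
    std::ostream& most;
    unsigned int mversion;
};

} // namespace sketch

// Hypothetical user types for the demonstration.
enum Codec { H264 = 0, HEVC = 1 };
struct Clip { bool valid; int in; int out; Codec codec; std::string path; };

template <typename A>
void serialize(A& ar, Clip& c, unsigned int)
{
    ar & c.valid;
    ar & c.in;
    ar & c.out;
    ar & c.codec;
    ar & c.path;
}

// Hypothetical helper: dump a Clip and return the resulting text.
inline std::string dump(Clip& c)
{
    std::ostringstream out;
    sketch::text_oarchive oa(out);
    oa << c;
    return out.str();
}
```

For a Clip holding true, 10, 250, HEVC, and "a.mp4", dump produces the text "1 10 250 1 a.mp4 ": the bool and the enum are written as integers, and every value is followed by one space.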
Correspondingly, the operation flow of text_iarchive for the recovery operation is as follows:
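A loading side that mirrors the dumping side could look like the sketch below. The original does not show this code, so all names here are assumptions made by analogy with the save path: the namespace sketch, basic_load, basic_pod_load, the load member function, the mist member, and the Clip and restore helpers.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

namespace sketch {

// Default path: call back into serialize() for the type (found via ADL).
template <typename T>
struct basic_load
{
    template <typename A>
    static void invoke(A& ar, T& v, unsigned int version) { serialize(ar, v, version); }
};

// Built-in path: stop recursing and read the value from the stream.
template <typename T>
struct basic_pod_load
{
    template <typename A>
    static void invoke(A& ar, T& v, unsigned int) { ar.load(v); }
};

template <> struct basic_load<int> : public basic_pod_load<int> {};
template <> struct basic_load<bool> : public basic_pod_load<bool> {};
template <> struct basic_load<std::string> : public basic_pod_load<std::string> {};

class text_iarchive
{
public:
    explicit text_iarchive(std::istream& ist, unsigned int version = 0)
        : mist(ist), mversion(version) {}

    template <typename T> text_iarchive& operator>>(T& v)
    {
        serialize(*this, v, mversion);
        return *this;
    }
    template <typename T> text_iarchive& operator&(T& v)
    {
        basic_load<T>::invoke(*this, v, mversion);
        return *this;
    }
    template <typename T> void load(T& v) { mist >> v; }

private:
    std::istream& mist;
    unsigned int mversion;
};

} // namespace sketch

// Hypothetical user type; the same serialize() serves dumping and recovering.
struct Clip { bool valid; int in; int out; std::string path; };

template <typename A>
void serialize(A& ar, Clip& c, unsigned int)
{
    ar & c.valid;
    ar & c.in;
    ar & c.out;
    ar & c.path;
}

// Hypothetical helper: recover a Clip from its text form.
inline Clip restore(std::string const& text)
{
    std::istringstream in(text);
    sketch::text_iarchive ia(in);
    Clip c = {false, 0, 0, ""};
    ia >> c;
    return c;
}
```

Because the operator & is overloaded on both archive classes, the single "serialize" function written for a type drives both directions, which is the property the design sets out to keep.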
We have conducted a comparative test on using the Boost serialization library and the reimplemented serialization library. The results are as follows:
Boost, and replace the Boost related namespaces with the macros of alivc, BOOST_SERIALIZATION_FUNCTION, and BOOST_SERIALIZATION_NVP.
Because of the needs of the current project, the re-implemented serialization library does not yet support dumping and recovering the memory data that a pointer points to. However, the current design framework has taken this extensibility into account, and it may be supported in the future.