>> Watch the full presentation here.
Hi everyone, my name is Jiong Xie. I'm a senior staff engineer at Alibaba DAMO Academy and Alibaba Cloud. I'm very honored to give this talk at VLDB 2022 about Ganos, a multi-dimensional, dynamic, and scene-oriented cloud-native spatial database engine developed by our team. Thanks, everybody, for joining. Let's get going.
First of all, I will give some background on recent progress in this field. In recent years, we have entered a new era of city digital twins. So the first question is: what are digital twins? City digital twin technology develops digital copies of our cities and manages them for bi-directional interaction between the digital and real worlds.
It has broad applications in urban planning, smart traffic management, automated environment monitoring, and so forth. MDS data is a very important kind of data that is growing fast in city digital twin applications. Here, MDS refers to multi-dimensional, dynamic, and scene-oriented spatial data. Compared with traditional spatial data, which is often static and two-dimensional, MDS data can model real-life objects much more precisely. Here are two cases of MDS data. The first case is about building information modeling (BIM). BIM data represents real-world objects as 3D entities with textures and materials, and textures and materials are visual information that is rarely considered by traditional spatial data. The second case is about UAV trajectories. The trajectory data is composed of 3D geo-positions that change over time, and it also carries scene-oriented information, such as take-off and landing events and the remote sensing images collected along the way, with timestamp information.
So, what challenges does supporting MDS data pose for the design of a DBMS (database management system)? The first is about data types. MDS data has much more complex data structures and a much larger data size than traditional spatial data. For instance, the size of a BIM object can be a few orders of magnitude larger than that of a POI (point of interest) object. The second is about query types. Many different types of queries should be supported, including spatial, spatio-temporal, scene-oriented, and cross-model queries. The last point is about efficiency. The large scale and complex data structures result in long query times. We found that a big query can take hours to finish, so solving the query performance problem is a big challenge for us. As far as we know, traditional spatial RDBMSs (relational database management systems) have limited support for MDS data in both data types and operations.
To overcome these challenges, we developed a new cloud-native spatial RDBMS engine called Ganos. The name comes from Gaia, the goddess of the earth, and Chronos, the god of time. It's built on PolarDB for PostgreSQL, the cloud-native relational database developed by Alibaba Cloud. I'd like to give a brief description of the features of Ganos. It treats MDS data as first-class citizens: we present a new multi-dimensional data hierarchy and provide a systematic framework to manage MDS data. It also uses cloud-native approaches to handle big storage and big queries. I will introduce these matters in more detail as we go on.
The figure shows the architecture of Ganos and the relation between Ganos and PolarDB. PolarDB adopts a shared-storage architecture that decouples computing from storage. At the computation layer, PolarDB has one primary node that supports both reads and writes, and many read-only nodes. Ganos is a spatial database engine of PolarDB. It enables PolarDB to store, index, and query MDS data. Ganos extends PolarDB in four aspects: data models, access methods, extended storage, and query processing. Let's look at the details.
On this page, I would like to introduce the four main MDS data types supported by Ganos. 3D mesh is used to represent a 3D object with shape, visuals, and general information, such as a 3D building. Visuals contain textures, materials, UV coordinates, and even animation. Trajectory is used to represent a moving object whose location changes over time. It contains a series of key points and events. Raster is used to represent gridded data where each grid cell is associated with a geolocation, for example, remote sensing images. It contains a footprint, a timestamp, and a grid matrix. Point cloud is used to represent a collection of 3D points with n-dimensional attributes. We often use it for high-definition map generation in autonomous driving.
Each MDS data type is implemented as a compact binary sequence called a SLOB. SLOB stands for spatial large object. Each SLOB is divided into two parts: profile and details. The profile stores a summary of an object, such as its spatial and spatio-temporal metadata, and is used for filtering. The details part contains the detailed information of the object. Here is an example showing how Ganos stores a building. A building is decomposed into multiple components. For example, a building has a big roof, many doors, many tables, and so on. Each component can be stored as a SLOB of the 3D mesh type, composed of shape, textures, materials, and so forth, in the format of a compact binary sequence. The components share the same building ID, indicating that they belong to the same building.
For access methods, we actually just use a very mature and classic spatial index, the n-dimensional R-tree, based on the GiST+ framework. GiST refers to the Generalized Search Tree, originally provided by PostgreSQL. GiST+ extends GiST so that it can select among indexes on the fly when answering a query.
This is the extended storage mechanism of Ganos. Ganos allows storing the SLOB profile in a database table and storing the details on OSS. OSS refers to Object Storage Service, similar to Amazon S3. We support two extended storage methods for different applications: hot/cold data separation and heterogeneous file access. For example, the last row in the figure uses heterogeneous file access. We build direct links to external files on OSS, such as remote sensing image files and 3D model files. We store the profile and the OSS information of the external file inline in the table. The spatial object locator hides the underlying storage details and allows the query processor to access the data in different ways. So extended storage can achieve a decent trade-off between storage cost and query performance.
Ganos supports spatial queries, spatio-temporal queries, scene-oriented queries, and cross-model queries. Traditional spatial RDBMSs are very mature in spatial queries, but they have limited support for spatio-temporal and scene-oriented queries. Ganos has implemented hundreds of operations to support these queries. The table lists some examples of the supported queries.
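The extended storage idea described above can be sketched in a few lines of Python. This is an illustration only, not the Ganos implementation: the `Row` and `ObjectLocator` classes, the `query` helper, and the use of a plain dictionary as a stand-in for an object storage client are all assumptions. It shows a locator that hides whether the details live inline in the table or behind an OSS key, and a query that filters on profiles first so that only matching rows trigger a remote read.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Row:
    profile: dict                           # small summary kept in the table
    inline_details: Optional[bytes] = None  # hot data stored in the table
    oss_key: Optional[str] = None           # cold data or external file on OSS


class ObjectLocator:
    """Hypothetical locator that resolves details wherever they are stored."""

    def __init__(self, object_store: Dict[str, bytes]):
        self._store = object_store  # stand-in for a real object storage client
        self.remote_reads = 0       # counts simulated (slow) OSS reads

    def load_details(self, row: Row) -> bytes:
        if row.inline_details is not None:
            return row.inline_details
        if row.oss_key is None:
            raise ValueError("row has neither inline details nor an OSS key")
        self.remote_reads += 1
        return self._store[row.oss_key]


def query(rows: List[Row], locator: ObjectLocator,
          predicate: Callable[[dict], bool]) -> List[bytes]:
    """Filter on profiles first; only matching rows touch the storage layer."""
    return [locator.load_details(r) for r in rows if predicate(r.profile)]
```

Because the predicate only sees the profile, rows that fail the filter never cost a remote read, which is one way to picture the trade-off between storage cost and query performance.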
Ganos extends the parallelism mechanism of PolarDB to accelerate query processing and implements spatial-oriented multi-level parallelism, including IQP and IFP. IQP parallelizes a big query by assigning data slices to many RO (read-only) nodes. Data slice assignment can be hash-based or dynamic. IFP is a finer-granularity parallelism compared to IQP. Let's look at the picture on the right. If row number one is a huge cell, for example the big roof of a building, IFP will divide it into four sub-objects and create four sub-processes to process them in parallel. So what is IFP for? IFP is used to mitigate the potential load imbalance problem caused by spatial objects with drastically different sizes.
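The two levels of parallelism can be pictured with a small Python sketch. It is a simplification under stated assumptions, not how Ganos or PolarDB schedule work: `assign_slices` uses a modulo as a trivial stand-in for hash-based slice assignment, `split_object` cuts one large object into near-equal sub-objects, and a thread pool stands in for the sub-processes mentioned above. The per-object function `total_height` is a made-up placeholder for a real spatial operation.

```python
from concurrent.futures import ThreadPoolExecutor


def assign_slices(slice_ids, n_nodes):
    """IQP-style static assignment: slice i goes to node (i mod n_nodes)."""
    plan = {node: [] for node in range(n_nodes)}
    for sid in slice_ids:
        plan[sid % n_nodes].append(sid)
    return plan


def split_object(vertices, parts):
    """IFP-style split: cut one large object into near-equal sub-objects."""
    if not vertices:
        return []
    step = -(-len(vertices) // parts)  # ceiling division
    return [vertices[i:i + step] for i in range(0, len(vertices), step)]


def total_height(vertices):
    """Placeholder per-object computation: sum of z over (x, y, z) tuples."""
    return sum(z for _x, _y, z in vertices)


def parallel_total_height(vertices, parts=4):
    """Run the computation on each sub-object concurrently, then combine."""
    chunks = split_object(vertices, parts)
    # Threads are used here only to keep the sketch self-contained;
    # they stand in for the sub-processes described in the talk.
    with ThreadPoolExecutor(max_workers=parts) as pool:
        return sum(pool.map(total_height, chunks))
```

Splitting only pays off when the per-object function can be computed piecewise and recombined, as the sum here can. That is also why it helps with load imbalance: one huge object no longer pins a single worker while the others sit idle.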
Here we give a simple use case of a cross-model query that checks the trajectory of a UAV and its relation to buildings and the ground in a city digital twin scene with different data types. Let's look at the picture. We use the 3D geometry type to model the no-fly zone, the raster type to model the digital elevation model, the 3D mesh type to model the buildings, and the trajectory type to model the track of the UAV.
Here we give two MDS SQL examples. You can see our paper for details. The bottom line is that, by using simple SQL to perform complex cross-model queries in digital twin applications, users can reduce both the development cost and the complexity of their systems.
This page gives some key evaluation results for Ganos. The data sets we use include OSM data and real BIM data. There are three key conclusions we can draw from the results. First, OSS can reduce storage cost with an acceptable sacrifice of QPS. Second, although reading data from OSS is slow, with the help of the indexes, the performance of spatio-temporal queries is still acceptable. Third, spatial-oriented multi-level parallelism with IQP plus IFP can significantly accelerate the processing of big queries on MDS data.
Ganos has been offered as a service on Alibaba Cloud for over four years, and it has been applied to a total of 45 application areas. In this section, I will introduce several novel applications. The first case is about 3D scenes and analytics with the integration of BIM and 3D GIS. We accelerated computation by nearly 100 times in the urban planning and construction of a new administrative district in China. The second and third cases are about querying dynamic data and databases for GeoAI. They are very important application areas for modern spatial databases. You can see our paper for details and for the lessons we learned from our customers.
In conclusion, with the rapid development of smart cities, digital twins, and cloud computing, existing spatial relational databases cannot meet the requirements of modern applications for MDS data processing. Ganos provides a systematic framework of data models, access methods, and operations for MDS data. In particular, Ganos optimizes the processing of queries on MDS data through cloud-native capabilities, which provides a new practice for moving from traditional on-premises spatial databases to cloud-native spatial databases.
In future work, we are considering leveraging GPU resources on the cloud to accelerate Ganos and using the PolarDB serverless framework to achieve better dynamic resource provisioning.
Thank you for your attention!
Read Full Paper: https://www.vldb.org/pvldb/vol15/p3483-chen.pdf