By Zhaofeng Zhou (Muluo)
Table Store is Alibaba Cloud's first distributed multi-model database, and is a NoSQL database. Many application developers are familiar with NoSQL databases. Many application systems are no longer reliant at a low level on relational databases alone, but using different databases for different business scenarios. For example, cache KeyValue data is stored in Redis, documentary data is stored in MongoDB, and graphical data is stored in Neo4J.
Looking back at the development of NoSQL: NoSQL was created in the Web 2.0 era, and the rapid development of the Internet has also brought about an explosion of Internet data. Traditional relational databases cannot handle such massive amounts of data, which requires a highly-scalable, distributed database. However, it is very challenging to implement a highly-available, scalable distributed database based on the conventional model of relational data. Most data on the Internet is based on a simple model and does not need a relational model to model it. If data could be modeled with a simpler data model than the relational model, weakening transactions and constraints, and aim at high availability and scalability, then databases designed with these in mind would better meet business requirements. Such a concept is behind the development of NoSQL.
In summary, the development of NoSQL is based on the new challenges of business and the new requirements for databases in the Internet era. Developed on this basis, NoSQL has these distinctive features:
There has been significant development in all types of NoSQL databases in the past few years, which is apparent from development trend statistics related to DBEngines. Alibaba Cloud's Table Store is a distributed NoSQL database that uses a multi-model architecture for its data model, supporting both Wide Column and Timeline.
The wide column storage model, is a classic model, originally seen in Bigtable and subsequently widely adopted by other systems of the same type. Currently most semi-structured and structured data in the world is stored in systems based on this model. In addition to the Wide Column model, we have also introduced another completely new data model: Timeline, which is a new generation model for message data, suited to the storage and synchronization of messages in messaging systems such as IM, Feeds, and IoT device pushdowns, and is now seeing widespread application. Next, we describe these two models in detail.
The above is a schematic diagram of the wide column model. We take a relational model for comparison to help us better understand this model. A relational model can be simply understood as a two-dimensional model consisting of rows and columns, with a fixed schema for each row. So the features of a relational model are: two dimensions and fixed schema. This is the simplest understanding, aside from transactions and constraints. The wide column model is a three-dimensional model with the additional dimension of time added to the two dimensions of row and column. The time dimension is reflected in the attribute column, which has multiple values, each value corresponding to a timestamp for the version. And each row is schema free, with no strong schema defined. So the differences between the wide column model and the relational model are: three-dimension, schema free, and simplified transactions and constraints.
Taking a closer look at the composition of this model, it has several main parts:
The features of the Wide Column model can be summarized as follows: three-dimensional structure (row, column, and time), wide row, multi-version data, and TTL management. Meanwhile, in terms of data operation, the Wide Column model provides two types of data access APIs: Data API and Stream API.
For more detailed Data API documentation, see here. The Data API is a standard data API that provides online data read-write, including:
In relational model databases, there are no definitions of standard APIs for the incremental data in the database; while in many application scenarios of traditional relational databases, the use of incremental data (binlog) cannot be ignored. This is widely used inside Alibaba, and provides the DRC with this type of middleware to fully utilize this part of the data capability. After the incremental data has been fully utilized, there are many things we can do in terms of the technical architecture:
However, even though the incremental data of a relational database is useful, the industry does not have a standard API definition to obtain this data. Table Store has long recognized the value of this data, and so it provides a standard API that allows this data to be fully utilized. Refer to our Stream API documentation to learn more.
The Stream API generally includes:
The implementation of Table Store Stream is much more complicated than MySQL Binlog, as Table Store has a distributed architecture, and Stream also has a distributed incremental data consumption framework. The data consumption of the stream must be obtained in an order-preserving manner. The shards of the stream correspond to the partitions of the table inside the Table Store. The partition of the table may be split and merged. To ensure that data consumption for the old and newly-added shards, after partition splitting and merging, is order-preserving, we have designed a more sophisticated mechanism. The design of Table Store Stream is not described at length here, but we will release more detailed design documentation at a later time.
Due to the complexity of Stream's current internal architecture, which has also been introduced into Stream's data consumption side, using the Stream API is not all that simple. This year, we have also been planning the launch of a brand new data consumption service, which will simplify the data consumption of Stream and provide a simpler and easier to use API. Be sure to keep an eye out for its release.
The Timeline model is a new data model that we have created for message data scenarios. It is able to meet the special requirements needed for message data scenarios, such as message order preserving, massive message storage, and real-time synchronization.
The above is a schematic diagram of the Timeline model, which abstracts the data in a large table into multiple timelines. There is no upper limit on the number of timelines a large table can hold.
A timeline consists of:
The Timeline model is similar to the message queue in terms of logic, and a timeline is similar to the topic in a message queue. The difference is that the Table Store Timeline places greater emphasis on the scale of topics. In an instant messaging scenario, both the inbox and outbox of a user is a topic. In an IoT message scenario, each device corresponds to a topic, and the magnitude of topics will reach an order of tens of millions or even hundreds of millions. Table Store Timeline is based on an underlying distributed engine. Thus, a single table can theoretically support an unlimited number of timelines (topics), and the queue's Pub/Sub model has been simplified. It also supports message order preservation, random positioning, and ascending or descending order scans, while better meeting the requirements of massive message data scenarios, such as instant messaging (IM), feeds, and IoT messaging systems.
Timeline is a data model that was just launched last year and is constantly being improved on. Based on this model, we have helped DingTalk, Cainiao Smart Customer Service, Taopiaopiao Xiaojuchang, Smart Device Management, and other services to build messaging systems for instant messaging, feeds, and IoT messages. We invite you to give it a try.
57 posts | 12 followers
FollowAlibaba Cloud Storage - May 8, 2019
Alibaba Cloud Storage - February 27, 2020
Alibaba Cloud Storage - May 8, 2019
Alibaba Cloud Storage - November 8, 2018
Alibaba Cloud Storage - February 27, 2020
zhuoran - February 5, 2021
57 posts | 12 followers
FollowA fully managed NoSQL cloud database service that enables storage of massive amount of structured and semi-structured data
Learn MoreBuild a Data Lake with Alibaba Cloud Object Storage Service (OSS) with 99.9999999999% (12 9s) availability, 99.995% SLA, and high scalability
Learn MoreMulti-source metrics are aggregated to monitor the status of your business and services in real time.
Learn MoreBuild business monitoring capabilities with real time response based on frontend monitoring, application monitoring, and custom business monitoring capabilities
Learn MoreMore Posts by Alibaba Cloud Storage