The concept of relational databases may seem a little "antiquated" in this day and age, since its history begins with the information technology of half a century ago. Realistically, though, the technology has always been at the core of modern society, driving most developments in commercial technology. Three core technical fields (CPUs, operating systems, and databases) are the cornerstones of information processing, computing power, and artificial intelligence.
From the publication of E.F. Codd's paper "A Relational Model of Data for Large Shared Data Banks" in 1970, to the arrival in the early '80s of DB2, a commercial relational database supporting SQL, to Oracle's start and the birth of SQL Server in the '90s, the successes of relational databases span the decades.
Today, with the development of the World Wide Web and the broad application of big data, more and more new types of databases are cropping up. However, relational databases continue to dominate the space. One of the primary reasons for the prevalence of relational databases is their adoption of SQL standards. This advanced, non-procedural programming interface language perfectly combines computer science and human-comprehensible data management methods, and remains difficult to surpass even today.
SQL (Structured Query Language) was invented by Boyce and Chamberlin in 1974 to act as a bridge between relational algebra and relational calculus. In essence, it is a language that uses keywords and grammar resembling natural speech to define operations on data and to handle data storage, querying, and management.
This abstract programming interface decouples the specific data problem from the details of data storage and query implementation, allowing large swaths of commercial business logic and information management computing models to be applied at scale. This has unleashed productivity and significantly driven forward the development of commercial relational database systems.
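To make the "non-procedural" point concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` table and its rows are invented for illustration. The SQL query only states what result is wanted, while the loop below it spells out how to compute the same totals by hand.

```python
import sqlite3

# In-memory database with a hypothetical "orders" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0)],
)

# Declarative: describe the result; the engine decides how to scan and group.
declarative_totals = dict(
    conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer")
)

# Procedural equivalent: the application walks the rows and aggregates itself.
procedural_totals = {}
for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
    procedural_totals[customer] = procedural_totals.get(customer, 0.0) + amount
```

Both dictionaries hold the same totals, but only the first version leaves the storage layout and access path entirely to the database.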
Looking at the continued development and growth of SQL, it's not hard to see why it has already become the top choice in the world of relational databases. Even today, this programming language still has yet to be replaced by an alternative.
In 1976, Jim Gray published a paper called "Granularity of Locks and Degrees of Consistency in a Shared Database", in which he formally defined the concept of database transactions and the data consistency mechanisms that apply to them. OLTP (online transaction processing) is a classic application of relational databases; it involves transaction processing, primarily of basic, everyday operations such as transactions in a bank.
Transaction processing must follow ACID, four principles that ensure data accuracy. ACID stands for Atomicity, Consistency, Isolation, and Durability. Performance indicators used to measure the processing power of OLTP include response time and throughput.
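As a small illustration of atomicity and consistency, the sketch below uses Python's built-in `sqlite3` module with a hypothetical `accounts` table. The `transfer` helper and the balances are assumptions made for the example, not part of any product mentioned here. A transfer either applies both updates or neither.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is the consistency rule: no negative balances.
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst atomically; return True on success."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back every statement in it if an exception is raised.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer(conn, "alice", "bob", 30)       # fits within alice's balance
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
```

In the failed transfer, the credit to `bob` runs first and is then undone by the rollback, so no partial update is ever visible after the call returns.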
In our brief overview of the history, position, and developmental phases of relational databases, we came across the names Oracle, SQL Server, and DB2, all of which are relational databases that still hold top positions among databases worldwide. Though they were once household names in the tech world, Informix and Sybase have already faded from public awareness.
However, beginning in the 1990s, a renewed wave of information sharing and the spirit of free and open software became a popular trend, bringing with it names like Linux, MySQL, PostgreSQL, and other open source software. The appearance of this trend and the strength with which it has grown have released a veritable tsunami of growth in society as these freely shared technical advances encourage massive growth in global Internet technology companies.
This progress belongs to the whole of society, but the credit belongs to pioneering open-source developers such as Richard Stallman, Linus Torvalds, and Michael Widenius. Of course, more and more Chinese companies have become active participants in the open source community over the past few years, freely contributing their own technological advancements to the rest of the open source world.
In reaction to the popularity of green computing and the shared economy, we need not just cloud servers, cloud data, networks, hardware chips, and other integrations of hardware and software, but also to continue putting the needs of users at the center of technology. With service that is focused on the user, technology will spread across public consciousness and further drive forward the development of computing efficiency and intelligence.
We are currently in a phase of vigorous development of the so-called "Cloud 2.0". This phase has seen the rise of a number of issues relating to the management of relational databases. This is precisely why AWS, Amazon's cloud computing unit, announced Aurora on November 12, 2014. Launched at the AWS re:Invent 2014 conference, Aurora is a new-generation cloud-hosted relational database. The release of this new generation of databases heralds a new phase not only in the age of cloud computing but in the evolution of core technologies given to us by the IT era.
At the SIGMOD conference in 2017, Amazon published a paper entitled "Amazon Aurora: Design Considerations for High Throughput Cloud Native Relational Databases", which further explained how this relational database, designed for the cloud environment and described as cloud-native, came about.
Cloud computing has provided more computing capability, and more creative power, to propel the Internet era. Relational databases are something few applications can do without. Cloud databases that are ready to use out of the box and feature high performance to cost ratios have found favor among developers all over the world.
Early versions of the MySQL database were optimized for the systems and hardware of their time, but they did not take into account the kinds of systems and hardware that are becoming popular now. They therefore leave a lot to be desired in high-concurrency situations. Furthermore, unlike other relational databases, MySQL needs to write two logs for the sake of compatibility (a transaction log and a replication log), which lowers its performance in comparison to other commercial databases. These complaints all come from real customer cases. To put it simply, the underlying structure of traditional cloud databases gives rise to the following problems:
1. Read/write instances and read-only instances each have their own independent copy of the data, so a customer who purchases a new read-only instance must pay not only for the computing resources but also for the corresponding storage resources.
2. Since traditional backup techniques also involve copying data and uploading it to cheap storage, the speed of the operation is bottlenecked by the speed of the network.
3. Since read/write instances and read-only instances each have their own copy of the data, creating a new read-only instance also involves re-copying all of the data. Given the limited speed of data flow across the network, the operation will inevitably be slow.
4. Early versions of the MySQL database were optimized for the systems and hardware of their time, but they did not take into account optimizations for the kinds of systems and hardware that are becoming popular now. They therefore leave a lot to be desired in high-concurrency situations. Furthermore, unlike other relational databases, MySQL uses two logs for the sake of compatibility (a transaction log and a replication log), which hurts its performance in comparison to other commercial databases.
5. Because of the limits of physical disks and backup strategies, the size of the database can't be too large without making O&M a disaster.
6. Read/write instances and read-only instances synchronize through incremental logical data, so all of the SQL in a read/write instance needs to be re-executed on read-only instances (including steps like SQL parsing and SQL optimization). At the same time, the concurrency of replication is based on table dimensions, which affects all kinds of task switching.
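A rough back-of-the-envelope calculation shows why items 2 and 3 hurt as data grows. The sketch below assumes a full copy limited only by network bandwidth; the data sizes and link speeds are hypothetical, and protocol overhead and disk throughput are ignored, so real copies would take longer.

```python
def copy_time_hours(data_tb: float, bandwidth_gbps: float) -> float:
    """Ideal time to move data_tb terabytes over a bandwidth_gbps link."""
    bits = data_tb * 1e12 * 8                 # decimal terabytes to bits
    seconds = bits / (bandwidth_gbps * 1e9)   # gigabits per second to bits per second
    return seconds / 3600


# Hypothetical scenarios: (data size in TB, link speed in Gbit/s).
for size_tb, gbps in [(1, 1), (5, 1), (5, 10)]:
    hours = copy_time_hours(size_tb, gbps)
    print(f"{size_tb} TB over {gbps} Gbit/s: about {hours:.1f} hours")
```

Because the copy time scales linearly with data size, any operation that re-copies the full data set (a backup, or a new read-only instance) gets slower in direct proportion to database growth.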
As the database grows, so do these “small” annoyances, which can plague DBAs and CTOs. Today, the problems that have tripped us up for years are all solved in Alibaba Cloud’s new PolarDB. Note that each issue is solved at its root, not with hacked-together workarounds.
PolarDB is the next-generation relational database based on the cloud computing framework. Currently, PolarDB supports only MySQL; PostgreSQL support is under development. The most notable features are as follows:
With these features, PolarDB satisfies both the elastic expandability needs of public cloud computing environments and the high availability needs of the database server for users on the Internet. The expansion time of read-only instances is no longer related to data size and the service can now continue even in the time between a server crash and restart.
PolarDB also features a complete management system based on Docker to handle the instance creation, deletion, and account creation tasks submitted by the user. It also includes a complete and detailed monitoring system and reliable, high-availability switching. The management system also maintains a set of metabases used to record the location of each data block, which it provides to PolarSwitch, which then forwards requests to the appropriate destination. It can be said that the entire PolarDB project uses several new technologies to provide users with fast performance (6x the performance of MySQL), large capacity (up to 100 TB), and cheap resources (about 1/10 the cost of other commercial databases).
This article provides an in-depth insight into cloud-native database technology, focusing on the core functions and implementation principles of transparent sharding middleware.
The development and transformation of database technology is on the rise. NewSQL has emerged to combine various technologies, and the core functions implemented by the combination of these technologies have promoted the development of the cloud-native database.
Among the three types of NewSQL, the new architecture and Database-as-a-Service types involve many underlying implementations related to the database, and thus will not be elaborated here. This article focuses on the core functions and implementation principles of transparent sharding middleware. The core functions of the other two NewSQL types are similar to those of sharding middleware but have different implementation principles.
Regarding performance and availability, traditional solutions that store data on a single data node in a centralized manner can no longer adapt to the massive data scenarios created by the Internet. Most relational database products use B+ tree indexes. When the data volume exceeds a threshold, the increase in index depth leads to a higher disk I/O count, substantially degrading query performance. In addition, highly concurrent access requests also turn the centralized database into the biggest bottleneck of the system.
Since traditional relational databases cannot meet the requirements of the Internet, increasing numbers of attempts have been made to store data in NoSQL databases that natively support data distribution. However, NoSQL is not compatible with SQL and its ecosystem is yet to be improved. Therefore, NoSQL cannot replace relational databases, and the position of relational databases is secure.
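The relationship between data volume and index depth can be approximated with a simplified model. The sketch below assumes every node of the tree, leaves included, holds the same number of entries (`fanout`); the fanout of 500 is an illustrative figure and not a measurement of any particular database.

```python
def btree_levels(rows: int, fanout: int) -> int:
    """Smallest number of levels h such that fanout ** h >= rows."""
    levels = 1
    capacity = fanout
    while capacity < rows:
        capacity *= fanout
        levels += 1
    return levels


# Each extra level means roughly one more page read per uncached lookup.
for rows in (1_000_000, 100_000_000, 1_000_000_000):
    print(f"{rows:>13,} rows -> {btree_levels(rows, fanout=500)} levels")
```

Under these assumptions the tree stays at three levels up to 125 million rows and needs a fourth level beyond that, which is the kind of threshold effect the paragraph above describes.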
Sharding refers to the distribution of the data stored in a single database to multiple databases or tables based on a certain dimension to improve the overall performance and availability. Effective sharding measures include database sharding and table sharding of relational databases. Both sharding methods can effectively prevent query bottlenecks caused by a huge data volume that exceeds the threshold.
In addition, database sharding can effectively distribute the access requests that would otherwise hit a single database, while table sharding can convert distributed transactions into local transactions whenever possible. The multi-master-and-multi-slave sharding method can effectively prevent single points of failure for data and enhance the availability of the data architecture.
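A minimal sketch of how a sharding layer might route a row, assuming simple modulo sharding on a numeric key. The names `ds_0`, `t_order_0`, and the `route` function are hypothetical and only illustrate the combination of database sharding and table sharding; real sharding middleware offers configurable strategies.

```python
from typing import Tuple


def route(user_id: int, db_count: int = 2, tables_per_db: int = 4) -> Tuple[str, str]:
    """Map a sharding key to a (database, table) pair using modulo sharding."""
    slot = user_id % (db_count * tables_per_db)
    db_index = slot // tables_per_db    # database sharding
    table_index = slot % tables_per_db  # table sharding inside that database
    return f"ds_{db_index}", f"t_order_{table_index}"


# All rows for one user land in the same database and table, so a
# transaction touching only that user's orders remains a local transaction.
target_a = route(10)
target_b = route(13)
```

With the defaults above, `route(10)` yields `("ds_0", "t_order_2")` and `route(13)` yields `("ds_1", "t_order_1")`, so requests for different users are spread across both databases.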
This article introduces the challenges emerging in the cloud native era, and discusses how database technologies should adapt to face these challenges.
We are now in an all-cloud era, full of new technologies, innovations, and challenges. More importantly, we now face some important questions that can redefine the way we deal with database technologies. What reforms will be made in the database market in this era? How can cloud service providers offer more efficient and cost-effective database solutions to help more enterprise users seize opportunities presented by cloud migration?
At the database session of the 2019 Alibaba Cloud Summit held in Beijing, Feifei Li, Vice President of Alibaba Group, Chief Database Scientist of Alibaba DAMO Academy, and Head of the Database Business Group of Alibaba Cloud Intelligence, gave an insightful presentation on next-generation cloud-native database technologies and the challenges they face.
According to a database market analysis report released by DB-Engines in January 2019, relational database products are still dominant in the database market. Meanwhile, more market segments, such as the graph database, document database, and NoSQL database segments, are forming in the database market. Another trend in this market is the continuous decline in the market share of the traditional commercial database giants. By contrast, the market shares of open-source and third-party databases keep expanding.
After over 40 years of evolution, database technology is still developing vigorously. Cloud computing vendors have reached a consensus that databases are an important component in the connection of IaaS and intelligent cloud applications. Therefore, the vendors need to improve their capabilities throughout the entire data lifecycle, including data production, storage, and consumption, enabling users to connect IaaS and intelligent applications.
Thanks to the constantly developing database technology, we now have online transaction processing (OLTP) systems to record real-time transaction data and online analytical processing (OLAP) systems to analyze massive amounts of data in real time. OLTP and OLAP systems require the support of database services and management tools. Given these circumstances, NoSQL database solutions have been developed to store semi-structured and non-structured data.
From the late 1970s to early 1980s, relational databases came into being, and later the SQL query language and OLTP systems were developed. The explosive growth in data volumes and the demand for complex data analysis gave rise to data warehousing, as well as OLAP, extract-transform-load (ETL), and other data processing technologies. With the continuous increase of multi-source heterogeneous data, such as graphs, documents, spatial-temporal data, and time series data, non-relational NoSQL and NewSQL database systems have also emerged.
Traditional databases typically use a single-node architecture, whereas cloud-native databases usually use a shared storage architecture. Alibaba Cloud PolarDB establishes a shared storage architecture over a high-speed network. This architecture separates storage from computing to enable the fast scale-out of computing nodes. In addition, PolarDB allows for the rapid scaling of storage and computing capabilities based on the actual needs of customers. Customers can use this shared storage database to complete a non-intrusive data migration without any change to the original business logic.
In addition to the cloud-native shared storage technology, a distributed architecture is required to handle highly concurrent access to massive amounts of data. For example, Alibaba is exploring the use of a distributed architecture to cope with the challenges posed by Double Eleven every year. Also, Alibaba Cloud wants to provide different query interfaces, such as SQL, to support queries of data in multiple models and states. Concerning the storage system, Alibaba Cloud hopes to allow users to store their data in different locations and use a unified interface like SQL to query all types of data. Alibaba Cloud Data Lake Analytics (DLA) is a cloud-native technology developed for this application scenario.
Traditional solutions isolate read and write conflicts by using the OLTP system to process transactions and the OLAP system to analyze huge volumes of data. In the cloud native era, Alibaba Cloud will minimize the cost of data migration by taking advantage of the technical benefits delivered by new hardware devices. This can be done by integrating transaction processing and data analytics features in one engine so that these two needs can be addressed seamlessly by one system.
Alibaba Cloud serves a large number of enterprises, which use our cloud resource pools based on a virtualized architecture that separates storage from computing. Therefore, we need to monitor and schedule all off-premises resources in an intelligent way to quickly respond to customer requirements and deliver optimal service quality. To achieve the necessary intelligence, we need to use machine learning and AI to enable automatic sensing, decision-making, recovery, and optimization in all sectors, including data migration, data protection, and elastic scheduling.
Learn about the basic concepts of RDS, the benefits of using it (compared to conventional database solutions) and understand its key features. It also includes demos that further introduce database/account management, security settings, read-only instances, database backups and third-party tool integration.
Distributed databases solve problems of traditional databases such as capacity bottlenecks, difficulty in expansion, and high cost. However, the configuration and management of distributed databases require higher technical capabilities. Alibaba Cloud's DRDS service helps you solve the problem of creating and managing distributed databases on the cloud. Through this course, you will understand the benefits and features of DRDS and how to use the DRDS console to easily build and manage distributed database systems on Alibaba Cloud.
Alibaba Cloud Relational Database Service is our cloud database offering. Through this course, you will learn about the design architecture and applicable scenarios of Alibaba Cloud Relational Database Service, and by watching product console demos, you will become familiar with its major functions and operation details.
Postgres Professional is a key contributor to the PostgreSQL community.
At Postgres Professional we develop Postgres Pro Database, a private PostgreSQL fork.
Each new Postgres Pro Database version is the latest PostgreSQL database version with patches committed to the PostgreSQL community and Postgres Pro extensions/patches, which are open source in most cases.
A database created on DRDS must be built on an RDS instance. For the stability of the OLTP service, we recommend that you select a new RDS instance for the creation of the DRDS database.
You can create DRDS databases only on the console; DRDS does not support creating databases with SQL commands.
This topic describes how to use DMS and a client to connect to a PolarDB cluster compatible with Oracle.
You have created a privileged account or standard account for a database cluster. For more information, see Create a database account.
Data Management (DMS) provides an integrated solution for data management. DMS supports data management, schema management, access control, BI charts, trend analysis, data tracing, performance optimization, and server management. DMS supports relational databases such as MySQL, SQL Server, and PostgreSQL, as well as NoSQL databases such as MongoDB and Redis. DMS also supports the management of Linux servers.
Mitigate the scalability problem of single machine relational databases for large-scale online databases.
A cloud-native database management platform that allows you to manage on-premises databases in the same way as in Alibaba Cloud.